While running the rcutorture test on qemu_arm64 and on an arm64 Juno-r2 device, the following kernel crash was noticed. It occurs on linux-next from tag next-20210111 through next-20210121.
metadata:
git branch: master
git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git describe: next-20210111
kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
output log:
[ 621.538050] mem_dump_obj() slab test: rcu_torture_stats = ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z = ffff8000091ab8e0
[ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
[ 621.546696] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 621.555431] Mem abort info:
[ 621.557243] ESR = 0x96000006
[ 621.559074] EC = 0x25: DABT (current EL), IL = 32 bits
[ 621.562240] SET = 0, FnV = 0
[ 621.563626] EA = 0, S1PTW = 0
[ 621.565134] Data abort info:
[ 621.566425] ISV = 0, ISS = 0x00000006
[ 621.568064] CM = 0, WnR = 0
[ 621.569571] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000101ef0000
[ 621.572446] [0000000000000008] pgd=0000000102ee1003, p4d=0000000102ee1003, pud=0000000100b25003, pmd=0000000000000000
[ 621.577007] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 621.579359] Modules linked in: rcutorture(-) torture rfkill crct10dif_ce fuse
[ 621.582549] CPU: 2 PID: 422 Comm: rcu_torture_sta Not tainted 5.11.0-rc2-next-20210111 #2
[ 621.585294] Hardware name: linux,dummy-virt (DT)
[ 621.586671] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[ 621.588497] pc : kmem_valid_obj+0x58/0xa8
[ 621.589748] lr : kmem_valid_obj+0x40/0xa8
[ 621.591022] sp : ffff800012debdc0
[ 621.592026] x29: ffff800012debdc0 x28: 0000000000000000
[ 621.593652] x27: ffff800012e8b988 x26: ffff0000c634dbc0
[ 621.595287] x25: 0000000000000000 x24: ffff8000091a1e60
[ 621.596882] x23: ffff8000091ab8e0 x22: ffff0000c0a3ac40
[ 621.598464] x21: ffff0000c1f44100 x20: ffff8000091a5e90
[ 621.600070] x19: 0000000000000000 x18: 0000000000000010
[ 621.601692] x17: 0000000000007fff x16: 00000000ffffffff
[ 621.603303] x15: ffff0000c0a3b0b8 x14: 66203d2070687226
[ 621.604866] x13: 202c303463613361 x12: 2c30346562656432
[ 621.606455] x11: ffff80001246cbc0 x10: ffff800012454b80
[ 621.608064] x9 : ffff8000100370c8 x8 : 0000000100000000
[ 621.609649] x7 : 0000000000000018 x6 : ffff800012816348
[ 621.611253] x5 : ffff800012816348 x4 : 0000000000000001
[ 621.612849] x3 : 0000000000000001 x2 : 0000000140000000
[ 621.614455] x1 : 0000000000000000 x0 : fffffc0000000000
[ 621.616062] Call trace:
[ 621.616816] kmem_valid_obj+0x58/0xa8
[ 621.617933] mem_dump_obj+0x20/0xc8
[ 621.619015] rcu_torture_stats+0xf0/0x298 [rcutorture]
[ 621.620578] kthread+0x120/0x158
[ 621.621583] ret_from_fork+0x10/0x34
[ 621.622685] Code: 8b000273 b25657e0 d34cfe73 8b131813 (f9400660)
[ 621.624570] ---[ end trace 2a00688830f37ea1 ]---
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Full test log: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210111/tes... https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210121/tes...
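A note on reading the oops above: the parenthesised word on the "Code:" line, f9400660, is the faulting instruction. The sketch below decodes it as an A64 64-bit LDR with an unsigned immediate offset, which works out to "ldr x0, [x19, #8]". With x19 shown as 0 in the register dump, that load targets address 8, matching the reported fault address. The struct and function names here are hypothetical helpers written for this illustration, not kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of an A64 "LDR Xt, [Xn, #imm]" (64-bit, unsigned offset). */
struct ldr_uimm {
	unsigned rt;      /* destination register number */
	unsigned rn;      /* base register number */
	uint64_t offset;  /* byte offset: imm12 scaled by 8 */
};

/* Returns true and fills *out if insn is a 64-bit LDR with unsigned offset. */
static bool decode_ldr_x_uimm(uint32_t insn, struct ldr_uimm *out)
{
	/* Bits 31:22 are fixed at 1111100101 for this encoding. */
	if ((insn & 0xffc00000u) != 0xf9400000u)
		return false;
	out->rt = insn & 0x1f;
	out->rn = (insn >> 5) & 0x1f;
	out->offset = (uint64_t)((insn >> 10) & 0xfff) << 3;
	return true;
}
```

Passing 0xf9400660 yields rt = 0, rn = 19 and offset = 8. The preceding instructions in the "Code:" line are not matched by this decoder, since it only recognises this one load form.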
On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
[...]
Huh. I am relying on virt_addr_valid() rejecting NULL pointers and things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks like your configuration rejects NULL as an invalid virtual address, but does not reject ZERO_SIZE_PTR. Is this the intent, given that you are not allowed to dereference a ZERO_SIZE_PTR?
Adding the ARM64 guys on CC for their thoughts.
It is easy enough for me to make kmem_valid_obj() return false for any address less than (say) PAGE_SIZE for the upcoming merge window, but I figured I should check for the longer term.
Thanx, Paul
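For readers unfamiliar with ZERO_SIZE_PTR, here is a minimal userspace model of the point being made above. The two macros are copied from include/linux/slab.h; the two helper functions are hypothetical names invented for this sketch. It shows that a plain NULL test lets ZERO_SIZE_PTR through, while the slab header's own ZERO_OR_NULL_PTR() test catches both.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same definitions as include/linux/slab.h, copied for a userspace model. */
#define ZERO_SIZE_PTR ((void *)16)
#define ZERO_OR_NULL_PTR(x) \
	((unsigned long)(x) <= (unsigned long)ZERO_SIZE_PTR)

/* A plain NULL test: ZERO_SIZE_PTR is non-NULL, so it passes. */
static bool passes_null_check(const void *p)
{
	return p != NULL;
}

/* Treats both NULL and ZERO_SIZE_PTR as "no object to look at". */
static bool is_usable_allocation(const void *p)
{
	return !ZERO_OR_NULL_PTR(p);
}
```

Nothing here dereferences the pointer; the model only compares addresses, which is all the validity checks under discussion do.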
On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
[...]
Huh. I am relying on virt_addr_valid() rejecting NULL pointers and things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks like your configuration rejects NULL as an invalid virtual address, but does not reject ZERO_SIZE_PTR. Is this the intent, given that you are not allowed to dereference a ZERO_SIZE_PTR?
Adding the ARM64 guys on CC for their thoughts.
Spooky timing, there was a thread _today_ about that:
https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
Will
On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
Spooky timing, there was a thread _today_ about that:
https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
Very good, then my workaround (shown below for Naresh's ease of testing) is only a short-term workaround. Yay! ;-)
Thanx, Paul
------------------------------------------------------------------------
diff --git a/mm/slab_common.c b/mm/slab_common.c
index cefa9ae..a8375d1 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -550,7 +550,8 @@ bool kmem_valid_obj(void *object)
 {
 	struct page *page;
 
-	if (!virt_addr_valid(object))
+	/* Some arches consider ZERO_SIZE_PTR to be a valid address. */
+	if (object < (void *)PAGE_SIZE || !virt_addr_valid(object))
 		return false;
 	page = virt_to_head_page(object);
 	return PageSlab(page);
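The effect of the added guard can be sketched in userspace as follows. This is only a model under stated assumptions: model_virt_addr_valid() is a hypothetical stand-in for an architecture whose virt_addr_valid() rejects NULL but accepts other small addresses (the behaviour described earlier in the thread), and MODEL_PAGE_SIZE assumes the 4k pages reported in the oops. The function names are invented for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL         /* assumption: 4k pages, as in the log */
#define ZERO_SIZE_PTR   ((void *)16)   /* as in include/linux/slab.h */

/* Hypothetical stand-in for an arch virt_addr_valid() that rejects only NULL. */
static bool model_virt_addr_valid(const void *p)
{
	return p != 0;
}

/* Check before the patch: relies entirely on virt_addr_valid(). */
static bool old_check_rejects(const void *object)
{
	return !model_virt_addr_valid(object);
}

/* Check after the patch: anything below one page is rejected up front. */
static bool new_check_rejects(const void *object)
{
	return (uintptr_t)object < MODEL_PAGE_SIZE ||
	       !model_virt_addr_valid(object);
}
```

Under this model the old check lets ZERO_SIZE_PTR through to the page lookup, while the new check rejects it before virt_addr_valid() is even consulted.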
On Fri, 22 Jan 2021 at 03:13, Paul E. McKenney <paulmck@kernel.org> wrote:
Very good, then my workaround (shown below for Naresh's ease of testing) is only a short-term workaround. Yay! ;-)
Paul, thanks for your (short-term workaround) patch.
I have applied your patch and run the rcutorture test on qemu_arm64, and the reported issue has been fixed.
- Naresh
On Fri, Jan 22, 2021 at 03:21:07PM +0530, Naresh Kamboju wrote:
Paul, thanks for your (short-term workaround) patch.
I have applied your patch and run the rcutorture test on qemu_arm64, and the reported issue has been fixed.
May I add your Tested-by?
And before I forget again, good to see the rcutorture testing on a non-x86 platform!
Thanx, Paul
On Fri, 22 Jan 2021 at 21:07, Paul E. McKenney <paulmck@kernel.org> wrote:
May I add your Tested-by?
Yes. Please add Reported-by and Tested-by.
And before I forget again, good to see the rcutorture testing on a non-x86 platform!
We are running rcutorture tests on arm, arm64, i386 and x86_64.
Happy to test !
- Naresh
On Fri, Jan 22, 2021 at 09:16:38PM +0530, Naresh Kamboju wrote:
May I add your Tested-by?
Yes. Please add Reported-by and Tested-by.
Very good! I have added:
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Because I folded the workaround into the first commit in the series, instead of adding your Reported-by, I added the following to that commit:
[ paulmck: Explicitly check for small pointers per Naresh Kamboju. ]
And before I forget again, good to see the rcutorture testing on a non-x86 platform!
We are running rcutorture tests on arm, arm64, i386 and x86_64.
Nice!!!
Some ARMv8 people are getting bogus (but harmless) error messages because parts of rcutorture think that all the world is an x86. I am looking at a fix, but need to work out what the system is. To that end, could you please run the following on the arm, arm64, and i386 systems and tell me what the output is?
gcc -dumpmachine
Happy to test !
And thank you very much for your testing efforts!!!
Thanx, Paul
On Thu, Jan 21, 2021 at 01:43:14PM -0800, Paul E. McKenney wrote:
Very good, then my workaround (shown below for Naresh's ease of testing) is only a short-term workaround. Yay! ;-)
Hopefully, though we might need to check other architectures beyond arm64, ppc, and x86, to be certain!
Is there any other latent use of virt_addr_valid() that needs this semantic? If so we'll probably want to backport the changes to arm64's implementation, at least for v5.10.
Vincenzo, would you mind taking a look?
Thanks, Mark.
On 1/22/21 10:02 AM, Mark Rutland wrote:
Hopefully, though we might need to check other architectures beyond arm64, ppc, and x86, to be certain!
Which other architectures do you propose to verify?
Is there any other latent use of virt_addr_valid() that needs this semantic? If so we'll probably want to backport the changes to arm64's implementation, at least for v5.10.
Vincenzo, would you mind taking a look?
I am happy to have a look at it, but due to previous commitments I will only be able to get to it after -rc1. A quick grep shows that there are ~32 cases in the common code (leaving out arch/ and drivers/) that might be affected by the same semantic. I will post the improvement for arm64 in the meantime though.
Thanks, Mark.