On 1/22/21 10:02 AM, Mark Rutland wrote:
On Thu, Jan 21, 2021 at 01:43:14PM -0800, Paul E. McKenney wrote:
On Thu, Jan 21, 2021 at 09:31:10PM +0000, Will Deacon wrote:
On Thu, Jan 21, 2021 at 10:55:21AM -0800, Paul E. McKenney wrote:
On Thu, Jan 21, 2021 at 10:37:21PM +0530, Naresh Kamboju wrote:
While running the rcu-torture test on qemu_arm64 and an arm64 Juno-r2 device, the following kernel crash was noticed. It has been happening from the linux-next tag next-20210111 through next-20210121.
metadata:
  git branch: master
  git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git describe: next-20210111
  kernel-config: https://builds.tuxbuild.com/1muTTn7AfqcWvH5x2Alxifn7EUH/config
output log:
[ 621.538050] mem_dump_obj() slab test: rcu_torture_stats = ffff0000c0a3ac40, &rhp = ffff800012debe40, rhp = ffff0000c8cba000, &z = ffff8000091ab8e0
[ 621.546662] mem_dump_obj(ZERO_SIZE_PTR):
[ 621.546696] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[...]
Huh. I am relying on virt_addr_valid() rejecting NULL pointers and things like ZERO_SIZE_PTR, which is defined as ((void *)16). It looks like your configuration rejects NULL as an invalid virtual address, but does not reject ZERO_SIZE_PTR. Is this the intent, given that you are not allowed to dereference a ZERO_SIZE_PTR?
Adding the ARM64 guys on CC for their thoughts.
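For readers following along, the pointer values in question can be sketched in plain userspace C. This is only an illustration, not the workaround mentioned in this thread: the two macros mirror the kernel's ZERO_SIZE_PTR and ZERO_OR_NULL_PTR() definitions from include/linux/slab.h, while ptr_may_be_dereferenced() is a hypothetical helper name invented here to show the kind of explicit check a caller could make instead of relying on virt_addr_valid() alone.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the kernel definition quoted above: the value kmalloc(0) returns. */
#define ZERO_SIZE_PTR ((void *)16)

/*
 * Mirrors the kernel's ZERO_OR_NULL_PTR(): true for NULL and for every
 * address up to and including ZERO_SIZE_PTR.
 */
#define ZERO_OR_NULL_PTR(x) ((uintptr_t)(x) <= (uintptr_t)ZERO_SIZE_PTR)

/*
 * Hypothetical helper (not a kernel API): reject NULL and ZERO_SIZE_PTR
 * explicitly before any further validity check or dereference.
 */
bool ptr_may_be_dereferenced(const void *p)
{
	return !ZERO_OR_NULL_PTR(p);
}
```

With this check, both NULL and ZERO_SIZE_PTR are rejected regardless of what the architecture's virt_addr_valid() says about address 16, whereas an ordinary object address passes through to whatever validation follows.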
Spooky timing, there was a thread _today_ about that:
https://lore.kernel.org/r/ecbc7651-82c4-6518-d4a9-dbdbdf833b5b@arm.com
Very good, then my workaround (shown below for Naresh's ease of testing) is only a short-term workaround. Yay! ;-)
Hopefully, though we might need to check other architectures beyond arm64, ppc, and x86, to be certain!
Which other architectures do you propose to verify?
Is there any other latent use of virt_addr_valid() that needs this semantic? If so we'll probably want to backport the changes to arm64's implementation, at least for v5.10.
Vincenzo, would you mind taking a look?
I am happy to have a look at it, but due to previous commitments I will only be able to get to it after -rc1. A quick grep shows that there are ~32 cases in the common code (leaving out arch/ and drivers/) that might be affected by the same semantic. I will post the improvement for arm64 in the meantime though.
Thanks, Mark.