On Mon, Feb 17, 2025 at 05:00:43PM +0530, Naresh Kamboju wrote:
On Sat, 8 Feb 2025 at 16:54, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
[...]
We observed a kernel warning on QEMU-ARM64 and FVP while running the newly added selftest: arm64: check_hugetlb_options. This issue appears on 6.6.76 onward and 6.12.13 onward, as reported in the stable review [1]. However, the test case passes successfully on stable 6.13.
The selftests: arm64: check_hugetlb_options test was introduced following the recent upgrade of kselftest test sources to the stable 6.13 branch. As you are aware, LKFT runs the latest kselftest sources (from stable 6.13.x) on 6.12.x, 6.6.x, and older kernels for validation purposes.
From Anders' bisection results, we identified that the missing patch on 6.12 is likely causing this regression:
First fixed commit: [25c17c4b55def92a01e3eecc9c775a6ee25ca20f] hugetlb: arm64: add MTE support
I wouldn't backport this and it's definitely not a fix for the problem reported.
Could you confirm whether this patch is eligible for backporting to 6.12 and 6.6 kernels? If backporting is not an option, we will need to skip running this test case on older kernels.
Regression on qemu-arm64 and FVP: this kernel warning appears when running the selftests: arm64: check_hugetlb_options test case on 6.6.76-rc1 and 6.6.76-rc2.
Test regression: WARNING-arch-arm64-mm-copypage-copy_highpage
------------[ cut here ]------------
[ 96.920028] WARNING: CPU: 1 PID: 3611 at arch/arm64/mm/copypage.c:29 copy_highpage (arch/arm64/include/asm/mte.h:87)
[ 96.922100] Modules linked in: crct10dif_ce sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight ip_tables x_tables
[ 96.925603] CPU: 1 PID: 3611 Comm: check_hugetlb_o Not tainted 6.6.76-rc2 #1
[ 96.926956] Hardware name: linux,dummy-virt (DT)
[ 96.927695] pstate: 43402009 (nZcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 96.928687] pc : copy_highpage (arch/arm64/include/asm/mte.h:87)
[ 96.929037] lr : copy_highpage (arch/arm64/include/asm/alternative-macros.h:232 arch/arm64/include/asm/cpufeature.h:443 arch/arm64/include/asm/cpufeature.h:504 arch/arm64/include/asm/cpufeature.h:814 arch/arm64/mm/copypage.c:27)
[ 96.929399] sp : ffff800088aa3ab0
[ 96.930232] x29: ffff800088aa3ab0 x28: 00000000000001ff x27: 0000000000000000
[ 96.930784] x26: 0000000000000000 x25: 0000ffff9b800000 x24: 0000ffff9b9ff000
[ 96.931402] x23: fffffc0003257fc0 x22: ffff0000c95ff000 x21: ffff0000c93ff000
[ 96.932054] x20: fffffc0003257fc0 x19: fffffc000324ffc0 x18: 0000ffff9b800000
[ 96.933357] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 96.934091] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[ 96.935095] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[ 96.935982] x8 : 0bfffc0001800000 x7 : 0000000000000000 x6 : 0000000000000000
[ 96.936536] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 96.937089] x2 : 0000000000000000 x1 : ffff0000c9600000 x0 : ffff0000c9400080
[ 96.939431] Call trace:
[ 96.939920] copy_highpage (arch/arm64/include/asm/mte.h:87)
[ 96.940443] copy_user_highpage (arch/arm64/mm/copypage.c:40)
[ 96.940963] copy_user_large_folio (mm/memory.c:5977 mm/memory.c:6109)
[ 96.941535] hugetlb_wp (mm/hugetlb.c:5701)
[ 96.941948] hugetlb_fault (mm/hugetlb.c:6237)
[ 96.942344] handle_mm_fault (mm/memory.c:5330)
[ 96.942794] do_page_fault (arch/arm64/mm/fault.c:513 arch/arm64/mm/fault.c:626)
[ 96.943341] do_mem_abort (arch/arm64/mm/fault.c:846)
[ 96.943797] el0_da (arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:144 arch/arm64/kernel/entry-common.c:547)
[ 96.944229] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:0)
[ 96.944765] el0t_64_sync (arch/arm64/kernel/entry.S:599)
[ 96.945383] ---[ end trace 0000000000000000 ]---
Prior to commit 25c17c4b55de ("hugetlb: arm64: add mte support"), there was no hugetlb support with MTE, so the above code path should not happen - it seems to get a PROT_MTE hugetlb page which should have been prevented by arch_validate_flags(). Or something else corrupts the page flags and we end up with some random PG_mte_tagged set.
Does this happen with vanilla 6.6? I wonder whether we always had this issue, only that we haven't noticed until the hugetlb MTE kselftest. There were some backports in this area but I don't see how they would have caused this.
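For reference, the arch_validate_flags() check mentioned above can be sketched as a small userspace model. This is only an illustration of the intended gating, not kernel code: the bit values are placeholders, and model_vm_flags() with its kernel_has_hugetlb_mte parameter is a hypothetical helper that assumes a kernel without hugetlb MTE support never marks a hugetlb mapping as MTE-capable.

```c
#include <stdbool.h>

/*
 * Illustrative userspace model only. The bit positions below are
 * placeholders, not the kernel's real VM_* definitions.
 */
#define VM_MTE          (1UL << 0)  /* mapping was requested with PROT_MTE */
#define VM_MTE_ALLOWED  (1UL << 1)  /* backing memory may carry MTE tags */

/*
 * Hypothetical helper (not a kernel function): build the flags a mapping
 * would end up with, assuming hugetlb only becomes MTE-capable once the
 * kernel has hugetlb MTE support.
 */
static unsigned long model_vm_flags(bool prot_mte, bool hugetlb,
				    bool kernel_has_hugetlb_mte)
{
	unsigned long vm_flags = 0;

	if (prot_mte)
		vm_flags |= VM_MTE;
	if (!hugetlb || kernel_has_hugetlb_mte)
		vm_flags |= VM_MTE_ALLOWED;
	return vm_flags;
}

/* Model of the check: VM_MTE is only valid together with VM_MTE_ALLOWED. */
static bool model_validate_flags(unsigned long vm_flags)
{
	return !(vm_flags & VM_MTE) || (vm_flags & VM_MTE_ALLOWED);
}
```

Under these assumptions, model_validate_flags(model_vm_flags(true, true, false)) is false, i.e. a PROT_MTE hugetlb mapping is rejected before any fault can reach copy_highpage(). Seeing the warning therefore suggests either that the mapping was allowed through this gate, or that PG_mte_tagged was set on the page by some other path.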