Running sha224_kunit on a KMSAN-enabled kernel results in a crash in kmsan_internal_set_shadow_origin():

    BUG: unable to handle page fault for address: ffffbc3840291000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 1810067 P4D 1810067 PUD 192d067 PMD 3c17067 PTE 0
    Oops: 0000 [#1] SMP NOPTI
    CPU: 0 UID: 0 PID: 81 Comm: kunit_try_catch Tainted: G N 6.17.0-rc3 #10 PREEMPT(voluntary)
    Tainted: [N]=TEST
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
    RIP: 0010:kmsan_internal_set_shadow_origin+0x91/0x100
    [...]
    Call Trace:
    <TASK>
    __msan_memset+0xee/0x1a0
    sha224_final+0x9e/0x350
    test_hash_buffer_overruns+0x46f/0x5f0
    ? kmsan_get_shadow_origin_ptr+0x46/0xa0
    ? __pfx_test_hash_buffer_overruns+0x10/0x10
    kunit_try_run_case+0x198/0xa00

This occurs when memset() is called on a buffer that is not 4-byte aligned and extends to the end of a page that is followed by a guard page, i.e. the next page is unmapped.
The bug is that the loop at the end of kmsan_internal_set_shadow_origin() accesses the wrong shadow memory bytes when the address is not 4-byte aligned. Since each 4 bytes are associated with an origin, it rounds the address and size so that it can access all the origins that contain the buffer. However, when it checks the corresponding shadow bytes for a particular origin, it incorrectly uses the original unrounded shadow address. This results in reads from shadow memory beyond the end of the buffer's shadow memory, which crashes when that memory is not mapped.
To fix this, correctly align the shadow address before accessing the 4 shadow bytes corresponding to each origin.

Fixes: 2ef3cec44c60 ("kmsan: do not wipe out origin when doing partial unpoisoning")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
---
 mm/kmsan/core.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/kmsan/core.c b/mm/kmsan/core.c
index 1ea711786c522..8bca7fece47f0 100644
--- a/mm/kmsan/core.c
+++ b/mm/kmsan/core.c
@@ -193,11 +193,12 @@ depot_stack_handle_t kmsan_internal_chain_origin(depot_stack_handle_t id)
 
 void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
 				      u32 origin, bool checked)
 {
 	u64 address = (u64)addr;
-	u32 *shadow_start, *origin_start;
+	void *shadow_start;
+	u32 *aligned_shadow, *origin_start;
 	size_t pad = 0;
 
 	KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
 	shadow_start = kmsan_get_metadata(addr, KMSAN_META_SHADOW);
 	if (!shadow_start) {
@@ -212,13 +213,16 @@ void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
 		}
 		return;
 	}
 	__memset(shadow_start, b, size);
 
-	if (!IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+	if (IS_ALIGNED(address, KMSAN_ORIGIN_SIZE)) {
+		aligned_shadow = shadow_start;
+	} else {
 		pad = address % KMSAN_ORIGIN_SIZE;
 		address -= pad;
+		aligned_shadow = shadow_start - pad;
 		size += pad;
 	}
 	size = ALIGN(size, KMSAN_ORIGIN_SIZE);
 	origin_start =
 		(u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
@@ -228,11 +232,11 @@ void kmsan_internal_set_shadow_origin(void *addr, size_t size, int b,
 	 * and unconditionally overwrite the old origin slot.
 	 * If the new origin is zero, overwrite the old origin slot iff the
 	 * corresponding shadow slot is zero.
 	 */
 	for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++) {
-		if (origin || !shadow_start[i])
+		if (origin || !aligned_shadow[i])
 			origin_start[i] = origin;
 	}
 }
 
 struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)

base-commit: 1b237f190eb3d36f52dffe07a40b5eb210280e00
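
The address arithmetic described in the commit message can be checked outside the kernel. The following is a minimal user-space sketch, not kernel code: `shadow_read_end()` and `shadow_read_end_example()` are hypothetical names, `ORIGIN_SIZE` stands in for `KMSAN_ORIGIN_SIZE`, and shadow bytes are assumed to be addressed by the same number as the data byte they describe. It only models how far the origin loop reads, with and without the realigned shadow pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ORIGIN_SIZE 4u /* stands in for KMSAN_ORIGIN_SIZE */

/*
 * Model of the origin loop's shadow reads (hypothetical helper).
 * Assumption: a shadow byte has the same "address" as its data byte.
 * Returns the address one past the last shadow byte the loop reads.
 */
uintptr_t shadow_read_end(uintptr_t addr, size_t size, bool fixed)
{
	uintptr_t pad = addr % ORIGIN_SIZE;
	/* Same rounding as the patch context: size += pad; ALIGN(size, 4). */
	size_t rounded = (size + pad + ORIGIN_SIZE - 1) / ORIGIN_SIZE * ORIGIN_SIZE;
	/* The old loop indexed 4-byte slots from the unrounded shadow address. */
	uintptr_t base = fixed ? addr - pad : addr;

	return base + rounded;
}

/* A 3-byte buffer that ends exactly at a 4096-byte page boundary. */
void shadow_read_end_example(void)
{
	const uintptr_t page_end = 4096;

	/* Unrounded base: the 4-byte read spills one byte into the next page. */
	assert(shadow_read_end(page_end - 3, 3, false) == page_end + 1);
	/* Realigned base: the read stops at the page boundary. */
	assert(shadow_read_end(page_end - 3, 3, true) == page_end);
}
```

With an aligned start address both variants agree, which is consistent with the crash only appearing for buffers that are not 4-byte aligned.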
On Fri, Aug 29, 2025 at 09:45:00AM -0700, Eric Biggers wrote:
> Running sha224_kunit on a KMSAN-enabled kernel results in a crash in kmsan_internal_set_shadow_origin():
> [...]

Any thoughts on this patch from the KMSAN folks? I'd love to add CONFIG_KMSAN=y to my crypto subsystem testing, but unfortunately the kernel crashes due to this bug :-(
- Eric
On Wed, Sep 10, 2025 at 9:49 PM Eric Biggers <ebiggers@kernel.org> wrote:
> Any thoughts on this patch from the KMSAN folks? I'd love to add CONFIG_KMSAN=y to my crypto subsystem testing, but unfortunately the kernel crashes due to this bug :-(
>
> - Eric

Sorry, I was out in August and missed this email when digging through my inbox.
Curiously, I couldn't find any relevant crashes on the KMSAN syzbot instance, but the issue is legit. Thank you so much for fixing this!
Any chance you can add a test case for it to mm/kmsan/kmsan_test.c?
--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
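
For readers who want to see the buffer layout that triggers the bug, here is a rough user-space analogue. It is not a mm/kmsan/kmsan_test.c test case and does not exercise KMSAN; it only sets up a buffer whose last byte sits directly before an inaccessible page and calls memset() on every tail of it, so that most start addresses are not 4-byte aligned. The function names are hypothetical, and the 128-byte sweep is an arbitrary choice for illustration.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map two pages, make the second one inaccessible, then memset() the last
 * `size` bytes of the first page for every size in 0..max_size.
 * Returns 0 on success, -1 on any failure.
 */
int memset_tails_before_guard_page(size_t max_size)
{
	long page_size = sysconf(_SC_PAGESIZE);
	if (page_size <= 0 || max_size > (size_t)page_size)
		return -1;
	size_t page = (size_t)page_size;

	unsigned char *map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;
	/* The second page plays the role of the unmapped guard page. */
	if (mprotect(map + page, page, PROT_NONE) != 0) {
		munmap(map, 2 * page);
		return -1;
	}

	int ret = 0;
	for (size_t size = 0; size <= max_size; size++) {
		/* Buffer ends exactly at the page boundary. */
		unsigned char *buf = map + page - size;

		memset(buf, 0xff, size);
		if (size > 0 && (buf[0] != 0xff || buf[size - 1] != 0xff))
			ret = -1;
	}

	munmap(map, 2 * page);
	return ret;
}

/* Example call: sweep tail sizes 0..128. */
void guard_page_example(void)
{
	assert(memset_tails_before_guard_page(128) == 0);
}
```

In user space this simply succeeds; the point is the layout, in which any metadata read that runs past the end of the buffer would land in the inaccessible page.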
On Thu, Sep 11, 2025 at 11:09:17AM +0200, Alexander Potapenko wrote:
> Sorry, I was out in August and missed this email when digging through my inbox.
>
> Curiously, I couldn't find any relevant crashes on the KMSAN syzbot instance, but the issue is legit. Thank you so much for fixing this!
>
> Any chance you can add a test case for it to mm/kmsan/kmsan_test.c?

Unfortunately most of the KMSAN test cases already fail on upstream, which makes it difficult to develop new ones:

[ 1.322395] KTAP version 1
[ 1.322899] 1..1
[ 1.323644] KTAP version 1
[ 1.324142] # Subtest: kmsan
[ 1.324650] # module: kmsan_test
[ 1.324667] 1..24
[ 1.325990] # test_uninit_kmalloc: uninitialized kmalloc test (UMR report)
[ 1.327078] *ptr is true
[ 1.327525] # test_uninit_kmalloc: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:173
    Expected report_matches(&expect) to be true, but is false
[ 1.330117] not ok 1 test_uninit_kmalloc
[ 1.330474] # test_init_kmalloc: initialized kmalloc test (no reports)
[ 1.332129] *ptr is false
[ 1.333384] ok 2 test_init_kmalloc
[ 1.333729] # test_init_kzalloc: initialized kzalloc test (no reports)
[ 1.335285] *ptr is false
[ 1.339418] ok 3 test_init_kzalloc
[ 1.339791] # test_uninit_stack_var: uninitialized stack variable (UMR report)
[ 1.341484] cond is false
[ 1.341927] # test_uninit_stack_var: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:211
    Expected report_matches(&expect) to be true, but is false
[ 1.344844] not ok 4 test_uninit_stack_var
[ 1.345262] # test_init_stack_var: initialized stack variable (no reports)
[ 1.347083] cond is true
[ 1.347847] ok 5 test_init_stack_var
[ 1.348145] # test_params: uninit passed through a function parameter (UMR report)
[ 1.349926] arg1 is false
[ 1.350338] arg2 is false
[ 1.350746] arg is false
[ 1.351154] arg1 is false
[ 1.351561] arg2 is true
[ 1.351987] # test_params: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:262
    Expected report_matches(&expect) to be true, but is false
[ 1.354751] not ok 6 test_params
[ 1.355229] # test_uninit_multiple_params: uninitialized local passed to fn (UMR report)
[ 1.357056] signed_sum3(a, b, c) is true
[ 1.357677] # test_uninit_multiple_params: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:282
    Expected report_matches(&expect) to be true, but is false
[ 1.360393] not ok 7 test_uninit_multiple_params
[ 1.360676] # test_uninit_kmsan_check_memory: kmsan_check_memory() called on uninit local (UMR report)
[ 1.362916] # test_uninit_kmsan_check_memory: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:309
    Expected report_matches(&expect) to be true, but is false
[ 1.365946] not ok 8 test_uninit_kmsan_check_memory
[ 1.366415] # test_init_kmsan_vmap_vunmap: pages initialized via vmap (no reports)
[ 1.368805] ok 9 test_init_kmsan_vmap_vunmap
[ 1.369223] # test_init_vmalloc: vmalloc buffer can be initialized (no reports)
[ 1.371106] buf[0] is true
[ 1.371937] ok 10 test_init_vmalloc
[ 1.372396] # test_uaf: use-after-free in kmalloc-ed buffer (UMR report)
[ 1.374021] value is true
[ 1.374463] # test_uaf: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:378
    Expected report_matches(&expect) to be true, but is false
[ 1.376867] not ok 11 test_uaf
[ 1.377229] # test_percpu_propagate: uninit local stored to per_cpu memory (UMR report)
[ 1.378951] check is false
[ 1.379432] # test_percpu_propagate: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:396
    Expected report_matches(&expect) to be true, but is false
[ 1.382201] not ok 12 test_percpu_propagate
[ 1.382625] # test_printk: uninit local passed to pr_info() (UMR report)
[ 1.384329] ffffc900002bfcd4 contains 0
[ 1.384933] # test_printk: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:418
    Expected report_matches(&expect) to be true, but is false
[ 1.387474] not ok 13 test_printk
[ 1.387824] # test_init_memcpy: memcpy()ing aligned initialized src to aligned dst (no reports)
[ 1.390061] ok 14 test_init_memcpy
[ 1.390327] # test_memcpy_aligned_to_aligned: memcpy()ing aligned uninit src to aligned dst (UMR report)
[ 1.392359] # test_memcpy_aligned_to_aligned: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:459
    Expected report_matches(&expect) to be true, but is false
[ 1.395181] not ok 15 test_memcpy_aligned_to_aligned
[ 1.395467] # test_memcpy_aligned_to_unaligned: memcpy()ing aligned uninit src to unaligned dst (UMR report)
[ 1.397845] # test_memcpy_aligned_to_unaligned: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:483
    Expected report_matches(&expect) to be true, but is false
[ 1.400221] # test_memcpy_aligned_to_unaligned: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:486
    Expected report_matches(&expect) to be true, but is false
[ 1.403059] not ok 16 test_memcpy_aligned_to_unaligned
[ 1.403437] # test_memcpy_initialized_gap: unaligned 4-byte initialized value gets a nonzero origin after memcpy() - (2 UMR reports)
[ 1.406077] # test_memcpy_initialized_gap: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:532
    Expected report_matches(&expect) to be true, but is false
[ 1.408340] # test_memcpy_initialized_gap: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:538
    Expected report_matches(&expect) to be true, but is false
[ 1.411063] not ok 17 test_memcpy_initialized_gap
[ 1.411338] # test_memset16: memset16() should initialize memory
[ 1.413393] ok 18 test_memset16
[ 1.413651] # test_memset32: memset32() should initialize memory
[ 1.415427] ok 19 test_memset32
[ 1.415739] # test_memset64: memset64() should initialize memory
[ 1.417513] ok 20 test_memset64
[ 1.417783] # test_long_origin_chain: origin chain exceeding KMSAN_MAX_ORIGIN_DEPTH (UMR report)
[ 1.419805] # test_long_origin_chain: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:584
    Expected report_matches(&expect) to be true, but is false
[ 1.422415] not ok 21 test_long_origin_chain
[ 1.422752] # test_stackdepot_roundtrip: testing stackdepot roundtrip (no reports)
[ 1.424598] kunit_try_run_case+0x19d/0xa50
[ 1.425243] kunit_generic_run_threadfn_adapter+0x62/0xe0
[ 1.426252] kthread+0x8cd/0xb40
[ 1.426747] ret_from_fork+0x189/0x2b0
[ 1.427320] ret_from_fork_asm+0x1a/0x30
[ 1.428245] ok 22 test_stackdepot_roundtrip
[ 1.428519] # test_unpoison_memory: unpoisoning via the instrumentation vs. kmsan_unpoison_memory() (2 UMR reports)
[ 1.430771] =====================================================
[ 1.431682] BUG: KMSAN: uninit-value in test_unpoison_memory+0x146/0x3e0
[ 1.432705] test_unpoison_memory+0x146/0x3e0
[ 1.433356] kunit_try_run_case+0x19d/0xa50
[ 1.433979] kunit_generic_run_threadfn_adapter+0x62/0xe0
[ 1.434773] kthread+0x8cd/0xb40
[ 1.435263] ret_from_fork+0x189/0x2b0
[ 1.435846] ret_from_fork_asm+0x1a/0x30

[ 1.436692] Local variable a created at:
[ 1.437270] test_unpoison_memory+0x41/0x3e0
[ 1.437903] kunit_try_run_case+0x19d/0xa50

[ 1.438766] Bytes 0-2 of 3 are uninitialized
[ 1.439433] Memory access of size 3 starts at ffffc90000347cd5

[ 1.440517] CPU: 3 UID: 0 PID: 99 Comm: kunit_try_catch Tainted: G N 6.17.0-rc5-00110-ge59a039119c3 #3 PREEMPT(none)
[ 1.442247] Tainted: [N]=TEST
[ 1.442725] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 1.444376] =====================================================
[ 1.445263] Disabling lock debugging due to kernel taint
[ 1.446103] =====================================================
[ 1.447007] BUG: KMSAN: uninit-value in test_unpoison_memory+0x23f/0x3e0
[ 1.447996] test_unpoison_memory+0x23f/0x3e0
[ 1.448650] kunit_try_run_case+0x19d/0xa50
[ 1.449319] kunit_generic_run_threadfn_adapter+0x62/0xe0
[ 1.450122] kthread+0x8cd/0xb40
[ 1.450611] ret_from_fork+0x189/0x2b0
[ 1.451181] ret_from_fork_asm+0x1a/0x30

[ 1.452010] Local variable b created at:
[ 1.452894] test_unpoison_memory+0x56/0x3e0
[ 1.453537] kunit_try_run_case+0x19d/0xa50

[ 1.454407] Bytes 0-2 of 3 are uninitialized
[ 1.455043] Memory access of size 3 starts at ffffc90000347cd1

[ 1.456182] CPU: 3 UID: 0 PID: 99 Comm: kunit_try_catch Tainted: G B N 6.17.0-rc5-00110-ge59a039119c3 #3 PREEMPT(none)
[ 1.457925] Tainted: [B]=BAD_PAGE, [N]=TEST
[ 1.458545] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
[ 1.460239] =====================================================
[ 1.461617] ok 23 test_unpoison_memory
[ 1.462056] # test_copy_from_kernel_nofault: testing copy_from_kernel_nofault with uninitialized memory
[ 1.464122] ret is false
[ 1.464538] # test_copy_from_kernel_nofault: EXPECTATION FAILED at mm/kmsan/kmsan_test.c:656
    Expected report_matches(&expect) to be true, but is false
[ 1.467250] not ok 24 test_copy_from_kernel_nofault
[ 1.482563] # kmsan: pass:11 fail:13 skip:0 total:24
[ 1.483790] # Totals: pass:11 fail:13 skip:0 total:24
[ 1.484532] not ok 1 kmsan

On Thu, Sep 11, 2025 at 10:51:45AM -0700, Eric Biggers wrote:
> Unfortunately most of the KMSAN test cases already fail on upstream, which makes it difficult to develop new ones:

The KMSAN test failures bisect to the following commit:

    commit f90b474a35744b5d43009e4fab232e74a3024cae
    Author: Vlastimil Babka <vbabka@suse.cz>
    Date:   Mon Mar 10 13:40:17 2025 +0100

        mm: Fix the flipped condition in gfpflags_allow_spinning()

I'm not sure why. Apparently something related to lib/stackdepot.c.
Reverting that commit on top of upstream fixes the KMSAN tests.
- Eric
On Thu, Sep 11, 2025 at 12:03:02PM -0700, Eric Biggers wrote:
> The KMSAN test failures bisect to the following commit:
>
>     commit f90b474a35744b5d43009e4fab232e74a3024cae
>     Author: Vlastimil Babka <vbabka@suse.cz>
>     Date:   Mon Mar 10 13:40:17 2025 +0100
>
>         mm: Fix the flipped condition in gfpflags_allow_spinning()
>
> I'm not sure why. Apparently something related to lib/stackdepot.c.
>
> Reverting that commit on top of upstream fixes the KMSAN tests.

Rolling back all the BPF (?) related changes that were made to lib/stackdepot.c in v6.15 fixes this too. Looks like there was a regression where stack traces stopped being saved in some cases.

diff --git a/lib/stackdepot.c b/lib/stackdepot.c
index de0b0025af2b9..99e374d35b61d 100644
--- a/lib/stackdepot.c
+++ b/lib/stackdepot.c
@@ -638,12 +638,11 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 	struct list_head *bucket;
 	struct stack_record *found = NULL;
 	depot_stack_handle_t handle = 0;
 	struct page *page = NULL;
 	void *prealloc = NULL;
-	bool allow_spin = gfpflags_allow_spinning(alloc_flags);
-	bool can_alloc = (depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC) && allow_spin;
+	bool can_alloc = depot_flags & STACK_DEPOT_FLAG_CAN_ALLOC;
 	unsigned long flags;
 	u32 hash;
 
 	if (WARN_ON(depot_flags & ~STACK_DEPOT_FLAGS_MASK))
 		return 0;
@@ -678,11 +677,11 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 						   DEPOT_POOL_ORDER);
 		if (page)
 			prealloc = page_address(page);
 	}
 
-	if (in_nmi() || !allow_spin) {
+	if (in_nmi()) {
 		/* We can never allocate in NMI context. */
 		WARN_ON_ONCE(can_alloc);
 		/* Best effort; bail if we fail to take the lock. */
 		if (!raw_spin_trylock_irqsave(&pool_lock, flags))
 			goto exit;
@@ -719,14 +718,11 @@ depot_stack_handle_t stack_depot_save_flags(unsigned long *entries,
 	printk_deferred_exit();
 	raw_spin_unlock_irqrestore(&pool_lock, flags);
 exit:
 	if (prealloc) {
 		/* Stack depot didn't use this memory, free it. */
-		if (!allow_spin)
-			free_pages_nolock(virt_to_page(prealloc), DEPOT_POOL_ORDER);
-		else
-			free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER);
+		free_pages((unsigned long)prealloc, DEPOT_POOL_ORDER);
 	}
 	if (found)
 		handle = found->handle.handle;
 	return handle;
 }