Kunit test cases failed and found warnings while booting Linux next version 6.1.0-rc5-next-20221117 on qemu-x86_64 [1].
It was working on Linux next-20221116 tag.
[ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120 [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120 [ 0.664465] kfence: kfence_init failed
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Regressions found on qemu_x86_64:[2]
- kunit/test_use_after_free_read - kunit/test_invalid_access - kunit/test_double_free-memcache - kunit/test_krealloc - kunit/test_shrink_memcache - kunit/test_memcache_typesafe_by_rcu - kunit/test_use_after_free_read-memcache - kunit/test_invalid_addr_free-memcache - kunit/test_out_of_bounds_read-memcache - kunit/test_memcache_alloc_bulk - kunit/test_out_of_bounds_read - kunit/test_memcache_ctor - kunit/test_corruption-memcache - kunit/test_gfpzero - kunit/test_out_of_bounds_write - kunit/test_out_of_bounds_write-memcache - kunit/test_kmalloc_aligned_oob_read - kunit/test_free_bulk-memcache
[ 0.663758] ------------[ cut here ]------------ [ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120 [ 0.663782] Modules linked in: [ 0.663788] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rc5-next-20221117 #1 [ 0.663795] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 0.663801] RIP: 0010:kfence_protect+0x7b/0x120 [ 0.663811] Code: f1 f1 f1 f1 c7 40 04 04 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 e8 e0 0d ba ff 48 85 c0 74 06 83 7d a0 01 74 17 <0f> 0b 0f 0b c6 05 cb 97 1d 03 00 45 31 c0 c6 05 c0 97 1d 03 01 eb [ 0.663819] RSP: 0000:ffffffff9c407d98 EFLAGS: 00010002 [ 0.663826] RAX: ffff8880adc01020 RBX: 00000000000001ff RCX: 80000001000001e3 [ 0.663830] RDX: dffffc0000000000 RSI: ffff88811ac00000 RDI: ffff8880adc01020 [ 0.663836] RBP: ffffffff9c407e18 R08: ffffffff997c8ef8 R09: ffffea00046b7f87 [ 0.663841] R10: fffff940008d6ff0 R11: 0000000000000001 R12: 1ffffffff3880fb3 [ 0.663845] R13: ffff88811ac00000 R14: ffffea00046b7fc0 R15: 0000000000000200 [ 0.663852] FS: 0000000000000000(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 [ 0.663859] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.663864] CR2: ffff88813ffff000 CR3: 00000000ac814000 CR4: 00000000000406b0 [ 0.663869] Call Trace: [ 0.663871] <TASK> [ 0.663876] ? __pfx_kfence_protect+0x10/0x10 [ 0.663886] ? __pfx_set_memory_4k+0x10/0x10 [ 0.663899] kfence_init_pool+0x1ea/0x350 [ 0.663909] ? __pfx_kfence_init_pool+0x10/0x10 [ 0.663919] ? random_init+0xe9/0x13b [ 0.663930] ? __pfx_random_init+0x10/0x10 [ 0.663936] ? _find_next_bit+0x46/0xe0 [ 0.663947] kfence_init+0x42/0x1e8 [ 0.663959] start_kernel+0x1fd/0x3a6 [ 0.663970] x86_64_start_reservations+0x28/0x2e [ 0.663978] x86_64_start_kernel+0x96/0xa0 [ 0.663986] secondary_startup_64_no_verify+0xe0/0xeb [ 0.664001] </TASK> [ 0.664003] ---[ end trace 0000000000000000 ]--- [ 0.664032] ------------[ cut here ]------------ [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120 [ 0.664045] Modules linked in: [ 0.664049] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.1.0-rc5-next-20221117 #1 [ 0.664055] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 [ 0.664058] RIP: 0010:kfence_protect+0x7d/0x120 [ 0.664068] Code: f1 f1 c7 40 04 04 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 e8 e0 0d ba ff 48 85 c0 74 06 83 7d a0 01 74 17 0f 0b <0f> 0b c6 05 cb 97 1d 03 00 45 31 c0 c6 05 c0 97 1d 03 01 eb 4a 48 [ 0.664074] RSP: 0000:ffffffff9c407d98 EFLAGS: 00010002 [ 0.664080] RAX: ffff8880adc01020 RBX: 00000000000001ff RCX: 80000001000001e3 [ 0.664085] RDX: dffffc0000000000 RSI: ffff88811ac00000 RDI: ffff8880adc01020 [ 0.664090] RBP: ffffffff9c407e18 R08: ffffffff997c8ef8 R09: ffffea00046b7f87 [ 0.664095] R10: fffff940008d6ff0 R11: 0000000000000001 R12: 1ffffffff3880fb3 [ 0.664099] R13: ffff88811ac00000 R14: ffffea00046b7fc0 R15: 0000000000000200 [ 0.664105] FS: 0000000000000000(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 [ 0.664112] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 0.664116] CR2: ffff88813ffff000 CR3: 00000000ac814000 CR4: 00000000000406b0 [ 0.664121] Call Trace: [ 0.664123] <TASK> [ 0.664126] ? __pfx_kfence_protect+0x10/0x10 [ 0.664136] ? __pfx_set_memory_4k+0x10/0x10 [ 0.664146] kfence_init_pool+0x1ea/0x350 [ 0.664156] ? __pfx_kfence_init_pool+0x10/0x10 [ 0.664166] ? random_init+0xe9/0x13b [ 0.664172] ? __pfx_random_init+0x10/0x10 [ 0.664179] ? _find_next_bit+0x46/0xe0 [ 0.664186] kfence_init+0x42/0x1e8 [ 0.664196] start_kernel+0x1fd/0x3a6 [ 0.664205] x86_64_start_reservations+0x28/0x2e [ 0.664213] x86_64_start_kernel+0x96/0xa0 [ 0.664221] secondary_startup_64_no_verify+0xe0/0xeb [ 0.664232] </TASK> [ 0.664235] ---[ end trace 0000000000000000 ]--- [ 0.664465] kfence: kfence_init failed
metadata: git_ref: master git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: af37ad1e01c72483c4ee8453d9d9bac95d35f023 git_describe: next-20221117 kernel_version: 6.1.0-rc5 kernel-config: https://builds.tuxbuild.com/2Hfb6n1z0frt4iBlIvqUzjMHiLm/config build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/697483979 artifact-location: https://builds.tuxbuild.com/2Hfb6n1z0frt4iBlIvqUzjMHiLm toolchain: gcc-11
[1] https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20221117/tes... [2] https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20221117/tes... [3] https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20221117/tes...
-- Linaro LKFT https://lkft.linaro.org
On Thu, Nov 17, 2022 at 05:01PM +0530, Naresh Kamboju wrote:
Kunit test cases failed and found warnings while booting Linux next version 6.1.0-rc5-next-20221117 on qemu-x86_64 [1].
It was working on Linux next-20221116 tag.
[ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120 [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120 [ 0.664465] kfence: kfence_init failed
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
[...]
[ 0.663758] ------------[ cut here ]------------ [ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
[...]
[ 0.664465] kfence: kfence_init failed
metadata: git_ref: master git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: af37ad1e01c72483c4ee8453d9d9bac95d35f023 git_describe: next-20221117 kernel_version: 6.1.0-rc5 kernel-config: https://builds.tuxbuild.com/2Hfb6n1z0frt4iBlIvqUzjMHiLm/config build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/697483979 artifact-location: https://builds.tuxbuild.com/2Hfb6n1z0frt4iBlIvqUzjMHiLm toolchain: gcc-11
I bisected this to:
commit 127960a05548ea699a95791669e8112552eb2452 Author: Peter Zijlstra peterz@infradead.org Date: Thu Nov 10 13:33:57 2022 +0100
x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
There is a cludge in change_page_attr_set_clr() that inhibits propagating NX changes to the aliases (directmap and highmap) -- this is a cludge twofold:
- it also inhibits the primary checks in __change_page_attr(); - it hard depends on single bit changes.
The introduction of set_memory_rox() triggered this last issue for clearing both _PAGE_RW and _PAGE_NX.
Explicitly ignore _PAGE_NX in cpa_process_alias() instead.
Fixes: b38994948567 ("x86/mm: Implement native set_memory_rox()") Reported-by: kernel test robot oliver.sang@intel.com Debugged-by: Dave Hansen dave.hansen@intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20221110125544.594991716%40infradead.org
A simple revert of this commit fixes the issue.
Since all this seems to be about set_memory_rox(), and this is a fix commit, the fix itself missed something?
Thanks, -- Marco
On 11/17/22 05:58, Marco Elver wrote:
[ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120 [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120 [ 0.664465] kfence: kfence_init failed
Any chance you could add some debugging and figure out what actually made kfence call over? Was it the pte or the level?
if (WARN_ON(!pte || level != PG_LEVEL_4K)) return false;
I can see how the thing you bisected to might lead to a page table not being split, which could mess with the 'level' check.
Also, is there a reason this code is mucking with the page tables directly? It seems, uh, rather wonky. This, for instance:
if (protect) set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); else set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); /* * Flush this CPU's TLB, assuming whoever did the allocation/free is * likely to continue running on this CPU. */ preempt_disable(); flush_tlb_one_kernel(addr); preempt_enable();
Seems rather broken. I assume the preempt_disable() is there to get rid of some warnings. But, there is nothing I can see to *keep* the CPU that did the free from being different from the one where the TLB flush is performed until the preempt_disable(). That makes the flush_tlb_one_kernel() mostly useless.
Is there a reason this code isn't using the existing page table manipulation functions and tries to code its own? What prevents it from using something like the attached patch?
On Thu, Nov 17, 2022 at 06:34AM -0800, Dave Hansen wrote:
On 11/17/22 05:58, Marco Elver wrote:
[ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120 [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120 [ 0.664465] kfence: kfence_init failed
Any chance you could add some debugging and figure out what actually made kfence call over? Was it the pte or the level?
if (WARN_ON(!pte || level != PG_LEVEL_4K)) return false;
I can see how the thing you bisected to might lead to a page table not being split, which could mess with the 'level' check.
Yes - it's the 'level != PG_LEVEL_4K'.
We do actually try to split the pages in arch_kfence_init_pool() (above this function) - so with "x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()" this somehow fails...
Also, is there a reason this code is mucking with the page tables directly? It seems, uh, rather wonky. This, for instance:
if (protect) set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); else set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); /* * Flush this CPU's TLB, assuming whoever did the allocation/free is * likely to continue running on this CPU. */ preempt_disable(); flush_tlb_one_kernel(addr); preempt_enable();
Seems rather broken. I assume the preempt_disable() is there to get rid of some warnings. But, there is nothing I can see to *keep* the CPU that did the free from being different from the one where the TLB flush is performed until the preempt_disable(). That makes the flush_tlb_one_kernel() mostly useless.
Is there a reason this code isn't using the existing page table manipulation functions and tries to code its own? What prevents it from using something like the attached patch?
Yes, see the comment below - it's to avoid the IPIs and TLB shoot-downs, because KFENCE _can_ tolerate the inaccuracy even if we hit the wrong TLB or other CPUs' TLBs aren't immediately flushed - we trade a few false negatives for minimizing performance impact.
diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h index ff5c7134a37a..5cdb3a1f3995 100644 --- a/arch/x86/include/asm/kfence.h +++ b/arch/x86/include/asm/kfence.h @@ -37,34 +37,13 @@ static inline bool arch_kfence_init_pool(void) return true; } -/* Protect the given page and flush TLB. */ static inline bool kfence_protect_page(unsigned long addr, bool protect) {
- unsigned int level;
- pte_t *pte = lookup_address(addr, &level);
- if (WARN_ON(!pte || level != PG_LEVEL_4K))
return false;
- /*
* We need to avoid IPIs, as we may get KFENCE allocations or faults
* with interrupts disabled. Therefore, the below is best-effort, and
* does not flush TLBs on all CPUs. We can tolerate some inaccuracy;
* lazy fault handling takes care of faults after the page is PRESENT.
*/
^^ See this comment. Additionally there's a real performance concern, and the inaccuracy is something that we deliberately accept.
if (protect)
set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
elseset_memory_np(addr, addr + PAGE_SIZE);
set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
set_memory_p(addr, addr + PAGE_SIZE);
Isn't this going to do tons of IPIs and shoot down other CPU's TLBs? KFENCE shouldn't incur this overhead on large machines with >100 CPUs if we can avoid it.
What does "x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()" do that suddenly makes all this fail?
What solution do you prefer that both fixes the issue and avoids the IPIs?
Thanks, -- Marco
On 11/17/22 15:23, Marco Elver wrote:
Yes - it's the 'level != PG_LEVEL_4K'.
That plus the bisect made it pretty easy to find, thanks for the effort!
Could you double-check that the attached patch fixes it? It seemed to for me.
The issue was that the new "No changes, easy!" check in the suspect commit didn't check the cpa->force_split option. It didn't split down to 4k and then all hell broke loose.
Oh, and I totally misread the kfence ability to tolerate partial TLB flushes. Sorry for the noise there!
On Thu, Nov 17, 2022 at 03:54PM -0800, Dave Hansen wrote:
On 11/17/22 15:23, Marco Elver wrote:
Yes - it's the 'level != PG_LEVEL_4K'.
That plus the bisect made it pretty easy to find, thanks for the effort!
Could you double-check that the attached patch fixes it? It seemed to for me.
Yes, that works - thanks!
The issue was that the new "No changes, easy!" check in the suspect commit didn't check the cpa->force_split option. It didn't split down to 4k and then all hell broke loose.
Oh, and I totally misread the kfence ability to tolerate partial TLB flushes. Sorry for the noise there!
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 220361ceb997..9b4e2ad957f6 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -1727,7 +1727,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary) /* * No changes, easy! */
- if (!(pgprot_val(cpa->mask_set) | pgprot_val(cpa->mask_clr)))
- if (!(pgprot_val(cpa->mask_set) | pgprot_val(cpa->mask_clr))
return ret;&& !cpa->force_split)
while (rempages) {
Tested-by: Marco Elver elver@google.com
On Thu, Nov 17, 2022 at 03:54:21PM -0800, Dave Hansen wrote:
On 11/17/22 15:23, Marco Elver wrote:
Yes - it's the 'level != PG_LEVEL_4K'.
That plus the bisect made it pretty easy to find, thanks for the effort!
Could you double-check that the attached patch fixes it? It seemed to for me.
The issue was that the new "No changes, easy!" check in the suspect commit didn't check the cpa->force_split option. It didn't split down to 4k and then all hell broke loose.
Oh, and I totally misread the kfence ability to tolerate partial TLB flushes. Sorry for the noise there!
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 220361ceb997..9b4e2ad957f6 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -1727,7 +1727,8 @@ static int __change_page_attr_set_clr(struct cpa_data *cpa, int primary) /* * No changes, easy! */
- if (!(pgprot_val(cpa->mask_set) | pgprot_val(cpa->mask_clr)))
- if (!(pgprot_val(cpa->mask_set) | pgprot_val(cpa->mask_clr))
&& !cpa->force_split)
(operators go at the end of the previous line)
return ret;
while (rempages) {
Urgh.. sorry about that.
On Thu, 17 Nov 2022 at 20:04, Dave Hansen dave.hansen@intel.com wrote:
On 11/17/22 05:58, Marco Elver wrote:
[ 0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120 [ 0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120 [ 0.664465] kfence: kfence_init failed
Any chance you could add some debugging and figure out what actually made kfence call over? Was it the pte or the level?
if (WARN_ON(!pte || level != PG_LEVEL_4K)) return false;
I can see how the thing you bisected to might lead to a page table not being split, which could mess with the 'level' check.
Also, is there a reason this code is mucking with the page tables directly? It seems, uh, rather wonky. This, for instance:
if (protect) set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); else set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); /* * Flush this CPU's TLB, assuming whoever did the allocation/free is * likely to continue running on this CPU. */ preempt_disable(); flush_tlb_one_kernel(addr); preempt_enable();
Seems rather broken. I assume the preempt_disable() is there to get rid of some warnings. But, there is nothing I can see to *keep* the CPU that did the free from being different from the one where the TLB flush is performed until the preempt_disable(). That makes the flush_tlb_one_kernel() mostly useless.
Is there a reason this code isn't using the existing page table manipulation functions and tries to code its own? What prevents it from using something like the attached patch?
I have applied this patch and found build warnings / errors.
In file included from mm/kfence/core.c:34: arch/x86/include/asm/kfence.h: In function 'kfence_protect_page': arch/x86/include/asm/kfence.h:45:17: error: implicit declaration of function 'set_memory_p'; did you mean 'set_memory_np'? [-Werror=implicit-function-declaration] 45 | set_memory_p(addr, addr + PAGE_SIZE); | ^~~~~~~~~~~~ | set_memory_np cc1: all warnings being treated as errors make[4]: *** [scripts/Makefile.build:250: mm/kfence/core.o] Error 1 In file included from mm/kfence/report.c:20: arch/x86/include/asm/kfence.h: In function 'kfence_protect_page': arch/x86/include/asm/kfence.h:45:17: error: implicit declaration of function 'set_memory_p'; did you mean 'set_memory_np'? [-Werror=implicit-function-declaration] 45 | set_memory_p(addr, addr + PAGE_SIZE); | ^~~~~~~~~~~~ | set_memory_np cc1: all warnings being treated as errors make[4]: *** [scripts/Makefile.build:250: mm/kfence/report.o] Error 1 In file included from mm/kfence/kfence_test.c:26: arch/x86/include/asm/kfence.h: In function 'kfence_protect_page': arch/x86/include/asm/kfence.h:45:17: error: implicit declaration of function 'set_memory_p'; did you mean 'set_memory_np'? [-Werror=implicit-function-declaration] 45 | set_memory_p(addr, addr + PAGE_SIZE); | ^~~~~~~~~~~~ | set_memory_np cc1: all warnings being treated as errors make[4]: *** [scripts/Makefile.build:250: mm/kfence/kfence_test.o] Error 1
ref: https://builds.tuxbuild.com/2HqMWcweeInju7rqVgGdNge7gby/
- Naresh