On Thu, Nov 17, 2022 at 06:34AM -0800, Dave Hansen wrote:
> On 11/17/22 05:58, Marco Elver wrote:
> > [    0.663761] WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/kfence.h:46 kfence_protect+0x7b/0x120
> > [    0.664033] WARNING: CPU: 0 PID: 0 at mm/kfence/core.c:234 kfence_protect+0x7d/0x120
> > [    0.664465] kfence: kfence_init failed
> Any chance you could add some debugging and figure out what actually made kfence fall over? Was it the pte or the level?
> 	if (WARN_ON(!pte || level != PG_LEVEL_4K))
> 		return false;
> I can see how the thing you bisected to might lead to a page table not being split, which could mess with the 'level' check.
Yes - it's the 'level != PG_LEVEL_4K'.
We do actually try to split the pages in arch_kfence_init_pool() (above this function) - so with "x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()" this somehow fails...
> Also, is there a reason this code is mucking with the page tables directly? It seems, uh, rather wonky. This, for instance:
> 	if (protect)
> 		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> 	else
> 		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> 
> 	/*
> 	 * Flush this CPU's TLB, assuming whoever did the allocation/free is
> 	 * likely to continue running on this CPU.
> 	 */
> 	preempt_disable();
> 	flush_tlb_one_kernel(addr);
> 	preempt_enable();
> Seems rather broken. I assume the preempt_disable() is there to get rid of some warnings. But there is nothing I can see to *keep* the CPU that did the free from being different from the one where the TLB flush is performed until the preempt_disable(). That makes the flush_tlb_one_kernel() mostly useless.
> Is there a reason this code isn't using the existing page table manipulation functions and tries to code its own? What prevents it from using something like the attached patch?
Yes, see the comment below - it's to avoid the IPIs and TLB shoot-downs, because KFENCE _can_ tolerate the inaccuracy even if we hit the wrong TLB or other CPUs' TLBs aren't immediately flushed - we trade a few false negatives for minimizing performance impact.
> diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
> index ff5c7134a37a..5cdb3a1f3995 100644
> --- a/arch/x86/include/asm/kfence.h
> +++ b/arch/x86/include/asm/kfence.h
> @@ -37,34 +37,13 @@ static inline bool arch_kfence_init_pool(void)
>  	return true;
>  }
>  
> -/* Protect the given page and flush TLB. */
>  static inline bool kfence_protect_page(unsigned long addr, bool protect)
>  {
> -	unsigned int level;
> -	pte_t *pte = lookup_address(addr, &level);
> -
> -	if (WARN_ON(!pte || level != PG_LEVEL_4K))
> -		return false;
> -
> -	/*
> -	 * We need to avoid IPIs, as we may get KFENCE allocations or faults
> -	 * with interrupts disabled. Therefore, the below is best-effort, and
> -	 * does not flush TLBs on all CPUs. We can tolerate some inaccuracy;
> -	 * lazy fault handling takes care of faults after the page is PRESENT.
> -	 */
^^ See this comment. Additionally there's a real performance concern, and the inaccuracy is something that we deliberately accept.
>  	if (protect)
> -		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> +		set_memory_np(addr, addr + PAGE_SIZE);
>  	else
> -		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> +		set_memory_p(addr, addr + PAGE_SIZE);
Isn't this going to do tons of IPIs and shoot down other CPUs' TLBs? KFENCE shouldn't incur this overhead on large machines with >100 CPUs if we can avoid it.
What does "x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()" do that suddenly makes all this fail?
What solution do you prefer that both fixes the issue and avoids the IPIs?
Thanks,
-- Marco