On 7/24/23 04:32, Valentin Schneider wrote:
> AFAICT the only reasonable way to go about the deferral is to prove that no such access happens before the deferred @operation is done. We got to prove that for sync_core() deferral, cf. PATCH 18.
> I'd like to reason about it for deferring vunmap TLB flushes:
> What addresses in VMAP range, other than the stack, can early entry code access? Yes, the ranges can be checked at runtime, but is there any chance of figuring this out e.g. at build-time?
Nadav was touching on a very important point: TLB flushes for addresses are relatively easy to defer. You just need to ensure that the CPU deferring the flush does an actual flush before it might architecturally consume the contents of the flushed entry.
TLB flushes for freed page tables are another game entirely. The CPU is free to cache any part of the paging hierarchy it wants at any time. It's also free to set accessed and dirty bits at any time, even for instructions that may never execute architecturally.
That basically means that if you have *ANY* freed page table page *ANYWHERE* in the page table hierarchy of any CPU at any time ... you're screwed.
There's no reasoning about accesses or ordering. As soon as the CPU does *anything*, it's out to get you.
You're going to need to do something a lot more radical to deal with free page table pages.