On 02/05/25 06:53, Dave Hansen wrote:
On 5/2/25 02:55, Valentin Schneider wrote:
My gripe with that was having two separate mechanisms:
- super early entry around SWITCH_TO_KERNEL_CR3
- later entry at context tracking
What do you mean by "later entry"?
I meant the point at which the deferred operation is run in the current patches, i.e. ct_kernel_enter() - kernel entry from the PoV of context tracking.
All of the paths to enter the kernel from userspace have some SWITCH_TO_KERNEL_CR3 variant. If they didn't, the userspace that they entered from could have attacked the kernel with Meltdown.
I'm theorizing that if this is _just_ about avoiding TLB flush IPIs, you can get away with a single mechanism.
So right now there would indeed be the TLB flush IPIs, but also the text_poke() ones (sync_core() after patching text).
These are the two NOHZ-breaking IPIs that show up on my HP box, and they are also the ones I got reports about from folks using NOHZ_FULL + CPU isolation in production, mostly on SPR "edge enhanced" systems.
There have been other sources of IPIs that were fixed with ad-hoc solutions: either disabling the mechanism for NOHZ_FULL CPUs, or reworking it so that an IPI isn't required, e.g.:
https://lore.kernel.org/lkml/ZJtBrybavtb1x45V@tpad/
While I don't expect the list to grow much, it's unfortunately not just the TLB flush IPIs.
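For readers following along, here is a minimal userspace sketch of the deferral idea discussed above. While an isolated CPU runs in userspace, senders set a pending-work bit instead of sending an IPI, and the CPU drains those bits when it enters the kernel (ct_kernel_enter() in the current patches). Everything in it is hypothetical: struct cpu_state, the WORK_* bits, send_or_defer(), user_enter() and kernel_enter() are illustrative names, not the kernel's API. "Sending an IPI" is simulated by running the work directly. It models a single target CPU, and its plain counters assume it is driven from one thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical work bits (illustrative names, not the kernel's). */
enum {
    WORK_TLB_FLUSH = 1u << 0, /* deferred TLB flush */
    WORK_SYNC_CORE = 1u << 1, /* deferred sync_core() after text_poke() */
};

/* Top bit marks "CPU is running in userspace"; low bits hold pending work. */
#define STATE_IN_USER (1u << 31)

struct cpu_state {
    _Atomic unsigned int state;
    unsigned int ipis_sent;   /* requests that could not be deferred */
    unsigned int tlb_flushes; /* flushes actually performed */
    unsigned int core_syncs;  /* serializations actually performed */
};

static void do_work(struct cpu_state *cpu, unsigned int work)
{
    if (work & WORK_TLB_FLUSH)
        cpu->tlb_flushes++;
    if (work & WORK_SYNC_CORE)
        cpu->core_syncs++;
}

/* Sender side: returns true if the work was deferred, false if "IPI'd". */
static bool send_or_defer(struct cpu_state *cpu, unsigned int work)
{
    unsigned int old = atomic_load(&cpu->state);

    /* Check "in userspace" and queue the work in one atomic step. */
    while (old & STATE_IN_USER) {
        if (atomic_compare_exchange_weak(&cpu->state, &old, old | work))
            return true; /* the target runs it on its next kernel entry */
    }
    /* Target is in the kernel: simulate an immediate IPI handler. */
    cpu->ipis_sent++;
    do_work(cpu, work);
    return false;
}

/* Target side: return to userspace with no pending work. */
static void user_enter(struct cpu_state *cpu)
{
    unsigned int old = atomic_fetch_or(&cpu->state, STATE_IN_USER);
    /* Senders only queue bits while IN_USER is set, so nothing is pending. */
    assert(old == 0);
}

/* Target side: leave userspace and drain all queued work exactly once. */
static void kernel_enter(struct cpu_state *cpu)
{
    unsigned int old = atomic_exchange(&cpu->state, 0u);
    do_work(cpu, old & ~STATE_IN_USER);
}
```

Driving it as user_enter(), then three send_or_defer() calls (TLB flush, sync_core, TLB flush), then kernel_enter() sends no IPI and performs one flush and one sync on entry, since repeated requests of the same kind coalesce into a single bit. A further send_or_defer() while the CPU is in the kernel falls back to an immediate IPI. The compare-and-swap loop is what prevents work from being queued just after the target has already drained its pending bits and left userspace.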