On Jul 23, 2020, at 9:52 AM, Fenghua Yu fenghua.yu@intel.com wrote:
Hi, Dave,
On Thu, Jul 23, 2020 at 09:23:13AM -0700, Dave Hansen wrote:
On 7/23/20 9:18 AM, Fenghua Yu wrote:
The PKRS MSR has been preserved in thread_info during kernel entry. We don't need to preserve it in another place (i.e. idtentry_state).
I'm missing how the PKRS MSR gets preserved in thread_info. Could you explain the mechanism by which this happens and point to the code implementing it, please?
[Sorry, my mistake: I mean "thread_struct" instead of "thread_info". Hopefully the typo doesn't change the essential part in my last email.]
The "saved_pkrs" is defined in thread_struct and context switched in patch 04/17: https://lore.kernel.org/lkml/20200717072056.73134-5-ira.weiny@intel.com/
Because there is no XSAVE support for the PKRS MSR, we preserve it in "saved_pkrs" in thread_struct. It's initialized to 0 (init state, no protection key) in fork() or exec(). It's updated to the right protection value when a driver calls the updating API. When switching to a task, the PKRS MSR is restored from "saved_pkrs" (the write is skipped as an optimization if the cached MSR value is the same as the saved one).
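As a rough illustration of the scheme described above, here is a minimal user-space sketch. It is not the code from patch 04/17: the simulated MSR, the per-CPU cache variable, and every function and struct name other than "saved_pkrs" are hypothetical stand-ins chosen for this example.

```c
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from the patch set. */
static uint32_t fake_pkrs_msr;   /* models the hardware PKRS MSR */
static uint32_t cached_pkrs;     /* models a cache of the last value written */
static unsigned int msr_writes;  /* counts simulated MSR writes */

struct thread_struct_sketch {
	uint32_t saved_pkrs;     /* per-task PKRS value, 0 = init state */
};

/* Write the simulated MSR, skipping the write if the cache already matches. */
static void write_pkrs(uint32_t val)
{
	if (val == cached_pkrs)
		return;
	fake_pkrs_msr = val;
	cached_pkrs = val;
	msr_writes++;
}

/* fork()/exec() analogue: start from the init state. */
static void init_task_pkrs(struct thread_struct_sketch *t)
{
	t->saved_pkrs = 0;
}

/* Updating-API analogue; assumes t is the currently running task. */
static void update_task_pkrs(struct thread_struct_sketch *t, uint32_t val)
{
	t->saved_pkrs = val;
	write_pkrs(val);
}

/* Context-switch analogue: load the incoming task's saved value. */
static void switch_to_task(const struct thread_struct_sketch *next)
{
	write_pkrs(next->saved_pkrs);
}
```

In this sketch, switching between two tasks that hold the same saved value never touches the simulated MSR. Note that there is exactly one saved slot per task.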
Suppose some kernel code (a syscall or kernel thread) changes PKRS and then takes a page fault. The page fault handler needs a fresh PKRS. Then the page fault handler (say a VMA’s .fault handler) changes PKRS. Then we get an interrupt. The interrupt *also* needs a fresh PKRS and the page fault value needs to be saved somewhere.
So we have more than one saved value per thread, and thread_struct isn’t going to solve this problem.
But idtentry_state is also not great for a couple reasons. Not all entries have idtentry_state, and the unwinder can’t find it for debugging. For that matter, the page fault logic probably wants to know the previous PKRS, so it should either be stashed somewhere findable or it should be explicitly passed around.
My suggestion is to enlarge pt_regs. The save and restore logic can probably be in C, but pt_regs is the logical place to put a register that is saved and restored across all entries.
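To make the nesting argument concrete, here is a small user-space sketch of per-entry save and restore. Everything in it is an assumption for illustration: the frame struct stands in for an enlarged pt_regs, the variable models the MSR, and the "fresh" value of 0 is just a placeholder for whatever default the real code would pick.

```c
#include <stdint.h>

#define PKRS_FRESH_SKETCH 0u     /* assumed "fresh" value, placeholder only */

static uint32_t sim_pkrs;        /* models the hardware PKRS MSR */

/* Hypothetical per-entry frame; stands in for an enlarged pt_regs. */
struct entry_frame_sketch {
	uint32_t saved_pkrs;     /* PKRS value of the interrupted context */
	/* ... the usual saved registers would live here too ... */
};

/* On entry: stash the interrupted context's PKRS, then start fresh. */
static void entry_save_pkrs(struct entry_frame_sketch *frame)
{
	frame->saved_pkrs = sim_pkrs;
	sim_pkrs = PKRS_FRESH_SKETCH;
}

/* On exit: put back whatever the interrupted context was using. */
static void exit_restore_pkrs(const struct entry_frame_sketch *frame)
{
	sim_pkrs = frame->saved_pkrs;
}
```

With one frame per entry, the syscall -> page fault -> interrupt case works out: the page fault's frame holds the syscall's value, the interrupt's frame holds the .fault handler's value, and unwinding restores each in turn. Anything walking the frames can also read the previous PKRS out of them.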
Whoever does this work will have the delightful job of figuring out whether BPF thinks that the layout of pt_regs is ABI and, if so, fixing the resulting mess.
The fact that the new fields will go at the beginning of pt_regs will make this an entertaining prospect.