On Fri, Jul 24, 2020 at 2:25 PM Thomas Gleixner tglx@linutronix.de wrote:
Ira,
Thomas Gleixner tglx@linutronix.de writes:
Ira Weiny ira.weiny@intel.com writes:
On Thu, Jul 23, 2020 at 09:53:20PM +0200, Thomas Gleixner wrote: I think, after fixing my code (see below), using idtentry_state could still work. If the per-cpu cache and the MSR is updated in idtentry_exit() that should carry the state to the new cpu, correct?
I'm way too tired to think about that now. Will have a look tomorrow with brain awake.
Not that I'm way more awake now, but at least I have the feeling that my brain is not completely useless.
Let me summarize what I understood:
A per CPU cache which shadows the current state of the MSR, i.e. the current valid key. You use that to avoid costly MSR writes if the key does not change.
On idtentry you store the key on entry in idtentry_state, clear it in the MSR and shadow state if necessary and restore it on exit.
On context switch out you save the per CPU cache value in the task and on context switch in you restore it from there.
Yes, that works (see below for #2) and sorry for my confusion yesterday about storing this in task state.
#2 requires to handle the exceptions which do not go through idtentry_enter/exit() seperately, but that's a manageable amount. It's the ones which use IDTENTRY_RAW or a variant of it.
#BP, #MC, #NMI, #DB, #DF need extra local storage as all the kernel entries for those use nmi_enter()/exit(). So you just can create wrappers around those. Somehting like this
static __always_inline idtentry_state_t idtentry_nmi_enter(void) { idtentry_state_t state = {};
nmi_enter(); instrumentation_begin(); state.key = save_and_clear_key(); instrumentation_end();
}
static __always_inline void idtentry_nmi_exit(idtentry_state_t state) { instrumentation_begin(); restore_key(state.key); instrumentation_end(); nmi_exit(); }
#UD and #PF are using the raw entry variant as well but still invoke idtentry_enter()/exit(). #PF does not need any work. #UD handles WARN/BUG without going through idtentry_enter() first, but I don't think that's an issue unless a not 0 key would prevent writing to the console device. You surely can figure that out.
Putting on my mm maintainer hat for a moment, I really think that we want oopses to print PKRS along with all the other registers when we inevitably oops due to a page fault. And we probably also want it in the nasty nested case where we get infinite page faults and eventually double fault.
I'm sure it's *possible* to wire this up if we stick it in idtentry_state, but it's trivial if we stick it in pt_regs. I'm okay with doing the save/restore in C (in fact, I prefer that), but I think that either the value should be stuck in pt_regs or we should find a way to teach the unwinder to locate idtentry_state.
And, if we go with idtentry_state, we should make a signature change to nmi_enter() to also provide idtentry_state or some equivalent object.
--Andy