On Tue, Jul 21, 2020 at 10:27:09PM -0700, Ira Weiny wrote:
I've been really digging into this today and I'm very concerned that I'm completely missing something WRT idtentry_enter() and idtentry_exit().
I've instrumented idt_{save,restore}_pkrs(), and __dev_access_{en,dis}able() with trace_printk()'s.
With this debug code, I have found an instance where it seems like idtentry_enter() is called without a corresponding idtentry_exit(). This has left the thread ref counter at 0 which results in very bad things happening when __dev_access_disable() is called and the ref count goes negative.
Effectively this seems to be happening:
... // ref == 0 dev_access_enable() // ref += 1 ==> disable protection // exception (which one I don't know) idtentry_enter() // ref = 0 _handler() // or whatever code... // *_exit() not called [at least there is no trace_printk() output]... // Regardless of trace output, the ref is left at 0 dev_access_disable() // ref -= 1 ==> -1 ==> does not enable protection (Bad stuff is bound to happen now...) ...
The ref count ends up completely messed up after this sequence... and the PKRS register gets out of sync as well. This is starting to make some sense of how I was getting what seemed like random crashes before.
Unfortunately I don't understand the idt entry/exit code well enough to see clearly what is going on. Is there any reason idtentry_exit() is not called after idtentry_enter()? Perhaps some NMI/MCE or 'not normal' exception code path I'm not seeing? In my searches I see a corresponding exit for every enter. But MCE is pretty hard to follow.
Also is there any chance that the process could be getting scheduled and that is causing an issue?
Ooh, I think I see the problem. We also use idtentry_enter() for #PF, and #PF can schedule, exactly as you observed. Argh!
This then means you need to go frob something in arch/x86/include/asm/idtentry.h, which is somewhat unfortunate.
Thomas, would it make sense to add a type parameter to idtentry_{enter,exit}() ?