On Oct 1, 2021, at 2:29 PM, Thomas Gleixner tglx@linutronix.de wrote:
On Fri, Oct 01 2021 at 08:13, Andy Lutomirski wrote:
On Fri, Oct 1, 2021, at 2:56 AM, Thomas Gleixner wrote: On Thu, Sep 30 2021 at 21:41, Andy Lutomirski wrote:
On Thu, Sep 30, 2021, at 5:01 PM, Thomas Gleixner wrote:
Now that I read the docs some more, I'm seriously concerned about this XSAVE design. XSAVES with UINTR is destructive -- it clears UINV. If we actually use this, then the whole last_cpu "preserve the state in registers" optimization goes out the window. So does anything that happens to assume that merely saving the state doesn't destroy it on respectable modern CPUs XRSTORS will #GP if you XRSTORS twice, which makes me nervous and would need a serious audit of our XRSTORS paths.
I have no idea what you are fantasizing about. You can XRSTORS five times in a row as long as your XSTATE memory image is correct.
I'm just reading TFM, which is some kind of dystopian fantasy.
11.8.2.4 XRSTORS
Before restoring the user-interrupt state component, XRSTORS verifies that UINV is 0. If it is not, XRSTORS causes a general-protection fault (#GP) before loading any part of the user-interrupt state component. (UINV is IA32_UINTR_MISC[39:32]; XRSTORS does not check the contents of the remainder of that MSR.)
Duh. I was staring at the SDM and searching for a hint. Stupid me!
So if UINV is set in the memory image and you XRSTORS five times in a row, the first one will work assuming UINV was zero. The second one will #GP.
Yes. I can see what you mean now :)
11.8.2.3 XSAVES After saving the user-interrupt state component, XSAVES clears UINV. (UINV is IA32_UINTR_MISC[39:32]; XSAVES does not modify the remainder of that MSR.)
So if we're running a UPID-enabled user task and we switch to a kernel thread, we do XSAVES and UINV is cleared. Then we switch back to the same task and don't do XRSTORS (or otherwise write IA32_UINTR_MISC) and UINV is still clear.
Yes, that has to be mopped up on the way to user space.
And we had better clear UINV when running a kernel thread because the UPID might get freed or the kernel thread might do some CPL3 shenanigans (via EFI, perhaps? I don't know if any firmwares actually do this).
Right. That's what happens already with the current pile.
So all this seems to put UINV into the "independent" category of feature along with LBR. And the 512-byte wastes from extra copies of the legacy area and the loss of the XMODIFIED optimization will just be collateral damage.
So we'd end up with two XSAVES on context switch. We can simply do:
XSAVES(); fpu.state.xtsate.uintr.uinv = 0;
Could work. As long as UINV is armed, RR can change at any time (maybe just when IF=1? The manual is unclear). But the first XSAVES disarms UINV, so maybe this won’t confuse any callers.
which allows to do as many XRSTORS in a row as we want. Only the final one on the way to user space will have to restore the real vector if the register state is not valid:
if (fpu_state_valid()) { if (needs_uinv(current) wrmsrl(UINV, vector); } else { if (needs_uinv(current) fpu.state.xtsate.uintr.uinv = vector; XRSTORS(); }
Hmm?
I like it better than anything else I’ve seen.
Thanks,
tglx