On 25/10/2024 13:33, Dave Martin wrote:
On Fri, Oct 25, 2024 at 10:24:41AM +0200, Kevin Brodsky wrote:
> @@ -907,6 +964,7 @@ SYSCALL_DEFINE0(rt_sigreturn) > { > struct pt_regs *regs = current_pt_regs(); > struct rt_sigframe __user *frame; > + struct user_access_state ua_state; > > /* Always make any pending restarted system calls return -EINTR */ > current->restart_block.fn = do_no_restart_syscall; > @@ -923,12 +981,14 @@ SYSCALL_DEFINE0(rt_sigreturn) > if (!access_ok(frame, sizeof (*frame))) > goto badframe; > > - if (restore_sigframe(regs, frame)) > + if (restore_sigframe(regs, frame, &ua_state)) > goto badframe; > > if (restore_altstack(&frame->uc.uc_stack)) > goto badframe; > > + restore_user_access_state(&ua_state); > + > return regs->regs[0]; > > badframe: The saving part I'm fine with. For restoring, I was wondering whether we can get a more privileged POR_EL0 if reading the frame somehow failed. This is largely theoretical, there are other ways to attack like writing POR_EL0 directly than unmapping/remapping the signal stack.
What I'd change here is always restore_user_access_state() to POR_EL0_INIT. Maybe just initialise ua_state above and add the function call after the badframe label.
I'm not sure I understand. When we enter this function, POR_EL0 is set to whatever the signal handler set it to (POR_EL0_INIT by default). There are then two cases:
- Everything succeeds, including reading the saved POR_EL0 from the
frame. We then call restore_user_access_state(), setting POR_EL0 to the value we've read, and return to userspace. 2) Any uaccess fails (for instance reading POR_EL0). In that case we leave POR_EL0 unchanged and deliver SIGSEGV.
In case 2 POR_EL0 is most likely already set to POR_EL0_INIT, or whatever the signal handler set it to. It's not clear to me that forcing it to POR_EL0_INIT helps much. Either way it's doubtful that the SIGSEGV handler will be able to recover, since the new signal frame we will create for it may be a mix of interrupted state and signal handler state (depending on exactly where we fail).
If the SIGSEGV delivery succeeds, returning would restore the POR_EL0 set up by the previous signal handler, potentially more privileged. Does it matter? Can it return all the way to the original context?
What we store into the signal frame when delivering that SIGSEGV is a mixture of the original state (up to the point of failure) and the signal handler's state (what we couldn't restore). It's hard to reason about how that SIGSEGV handler could possibly handle this, but in any case it would have to massage its signal frame so that the next sigreturn does the right thing. Restoring only part of the frame records is bound to cause trouble and that's true for POR_EL0 as well - I doubt there's much value in special-casing it.
This feels like a simplification?
We can leave a mix of restored and unrestored state when generating the SIGSEGV signal frame, providing that those changes will make no difference when the rt_sigreturn is replayed.
I'm not sure I understand what this means. If the SIGSEGV handler were to sigreturn without touching its signal frame, things are likely to explode: it may be returning to the point where the original handler called sigreturn, for instance (if the first uaccess failed during that sigreturn call).
POR_EL0 will make a difference, though.
The POR_EL0 image in the SIGSEGV signal frame needs be the same value that caused the original rt_sigreturn to barf (if this is what caused the barf). It should be up to the SIGSEGV handler to decide what (if anything) to do about that. The kernel can't know what userspace intended.
Unless I'm missing something this is exactly what happens now: what we store in the SIGSEGV frame is the POR_EL0 value the original handler was using.
Note that for this to work, the SIGSEGV stack (whether main or alternate) must be accessible with POR_EL0_INIT permissions, or the SIGSEGV handler must start with a (gross) asm shim to establish a usable POR_EL0. But that's not really our problem here.
This is indeed orthogonal - the SIGSEGV handler will be run with POR_EL0_INIT, like any other handler. The value we store in the frame is unrelated.
(I'm not saying that the kernel necessarily fails to do this -- I haven't checked -- but just trying to understand the problem here.)
The actual problem here is that if the SIGSEGV handler wants to bail out with a siglongjmp(), there is no way to determine the correct value of POR_EL0 to restore.
Correct, but again this is true of any other record - for instance TPIDR2.
I wonder whether POR_EL0 should be saved in sigjmp_buf (depending on whether sigjmp_buf is horribly inextensible and also full up, or merely horribly inextensible).
It very much feels that this is the case - if a handler relies on longjmp() or setcontext() to restore a known state, then POR_EL0 should be part of that state.
Does anyone know whether PKRU in sigjmp_buf on x86?
I can't say for sure but I don't see PKRU being handled in setjmp/longjmp in glibc at least.
Kevin