On Fri, Aug 23, 2024 at 11:01:13PM +0100, Mark Brown wrote:
On Fri, Aug 23, 2024 at 04:59:11PM +0100, Catalin Marinas wrote:
On Fri, Aug 23, 2024 at 11:25:30AM +0100, Mark Brown wrote:
We could store either the cap token or the interrupted GCSPR_EL0 (the address below the cap token). It felt more joined up to go with the cap token, since notionally signal return is consuming the cap token, but either way would work; we could just add an offset when looking at the pointer.
In a hypothetical sigaltshadowstack() scenario, would the cap go on the new signal shadow stack or on the old one? I assume on the new one, but in sigcontext we'd save the original GCSPR_EL0. In such a hypothetical case, the original GCSPR_EL0 would not need 8 subtracted.
I would have put the token on the old stack since that's what we'd be returning to.
After some more spec reading, your approach makes sense, as it matches the GCSSS[12] instructions where the outgoing, rather than incoming, shadow stack is capped. So all good, I think. There's a bit more below on the restore order, though (it's OK but a bit confusing).
This raises interesting questions about what happens if the reason for the signal is that we just overflowed the normal stack (this is among the issues that have got in the way of working out if or how we do something with sigaltshadowstack).
That's not that different from the classic case where we get an error trying to set up the frame: signal_setup_done() handles it by forcing a SIGSEGV. I'd say we do the same here.
I'm not clear what the purpose of the token would be on the new stack, the token basically says "this is somewhere we can sigreturn to", that's not the case for the alternative stack.
Yeah, I thought we had to somehow mark the top of the stack with this token. But looking at the architectural stack switching, it caps the outgoing stack (in our case this would be the interrupted one). So that's settled.
On the patch itself, I think there are some small inconsistencies in how it reads GCSPR_EL0: preserve_gcs_context() does a gcs_preserve_current_state() and subsequently reads the value from the thread structure, while a bit later gcs_signal_entry() goes for the sysreg directly. I don't think that's a problem even if the thread gets preempted, but it would be nice to be consistent. Maybe leave gcs_preserve_current_state() as a context-switch-only thing. Would it work if we don't touch the thread structure at all in the signal code? We wouldn't deliver a signal in the middle of the switch_to() code, so any value we write into the thread struct would be overridden at the next switch anyway.
If GCS is disabled for the thread, we save GCSPR_EL0 with the cap size subtracted but no cap is actually written. In restore_gcs_context() it doesn't look like we add the cap size back when writing GCSPR_EL0. If GCS is enabled, we do consume the cap and add 8, but otherwise it looks like we keep decreasing GCSPR_EL0 with every signal. I think we should only subtract the cap size when GCS is enabled. This code could do with some refactoring, as I find it hard to follow (not sure exactly how, maybe just comments will do).
I'd also keep a single write to GCSPR_EL0 on the return path, though I'm OK with two if we need to cope with GCS being disabled while GCSPR_EL0 is still saved/restored.
Another aspect of gcs_restore_signal(): I think it makes more sense for the cap to be consumed _after_ restoring the sigcontext, since that holds the actual gcspr_el0 where we stored the cap and represents the original stack. If we ever get an alternative shadow stack, the current GCSPR_EL0 on sigreturn would point to that alternative shadow stack rather than the original one. That's what confused me when reviewing the patch and made me think the cap goes at the top of the signal stack.