On Fri, Aug 23, 2024 at 04:59:11PM +0100, Catalin Marinas wrote:
On Fri, Aug 23, 2024 at 11:25:30AM +0100, Mark Brown wrote:
We could store either the cap token or the interrupted GCSPR_EL0 (the address below the cap token). It felt more joined up to go with the cap token since notionally signal return is consuming the cap token but either way would work, we could just add an offset when looking at the pointer.
In a hypothetical sigaltshadowstack() scenario, would the cap go on the new signal shadow stack or on the old one? I assume on the new one but in sigcontext we'd save the original GCSPR_EL0. In such hypothetical case, the original GCSPR_EL0 would not need 8 subtracted.
I would have put the token on the old stack since that's what we'd be returning to. This raises interesting questions about what happens if the reason for the signal is that we just overflowed the normal stack (which are among the issues that have got in the way of working out if or how we do something with sigaltshadowstack). I'm not clear what the purpose of the token would be on the new stack, the token basically says "this is somewhere we can sigreturn to", that's not the case for the alternative stack.
I need to think some more about this. The gcs_restore_signal() function makes sense, it starts with the current GCSPR_EL0 on the signal stack and consumes the token, adds 8 to the shadow stack pointer. The restore_gcs_context() one is confusing as it happens before consuming the cap token and assumes that the GCSPR_EL0 value actually points to the signal stack. If we ever implement an alternative shadow stack, the original GCSPR_EL0 of the interrupted context would be lost. I know it's not planned for now but the principles should be the same. The sigframe.uc should store the interrupted state.
I think the issues you're pointing out here go to the thing with the cap token marking a place we can sigreturn to and therefore being on the original stack.
To me the order for sigreturn should be first to consume the cap token, validate it etc. and then restore GCSPR_EL0 to whatever was saved in the sigframe.uc prior to the signal being delivered.
To me what we're doing here is that the signal frame says where userspace wants to point GCSPR_EL0 in the returned context, we then go and confirm that this is a valid address by looking at it and checking for a token. The token serves to validate what was saved in sigframe.uc so that it can't just be pointed at some random address.
restore_gcs_context() is loading values from the signal frame in memory (which will only happen if a GCS context is present) then gcs_restore_signal() consumes the token at the top of the stack. The split is because userspace can skip the restore_X_context() functions for the optional signal frame elements by removing them from the context but we want to ensure that we always consume a token.
I agree we should always consume a token but this should be done from the actual hardware GCSPR_EL0 value on the sigreturn call rather than the one restored from sigframe.uc. The restoring should be the last step.
If we look for a token at the GCSPR_EL0 on sigreturn then it wouldn't be valid to call sigreturn() (well, without doing something first to unwind the stack which seems hostile and would require code to become GCS aware that wouldn't otherwise), if you do a sigreturn from somewhere other than the vDSO then you'd have at least the vDSO signal frame left in the GCS. You'd also be able to point GCSPR_EL0 to anywhere since we'd just load a value from the signal frame but not do the cap verification.