On Tue, Aug 13, 2024 at 05:25:47PM +0100, Catalin Marinas wrote:
However, the x86 would be slightly inconsistent here between clone() and clone3(). I guess it depends how you look at it. The classic clone() syscall, if one doesn't pass CLONE_VM but does set new stack, there's no new shadow stack allocated which I'd expect since it's a new stack. Well, I doubt anyone cares about this scenario. Are there real cases of !CLONE_VM but with a new stack?
ISTR the concerns were around someone being clever with vfork() but I don't remember anything super concrete. In terms of the inconsistency here that was actually another thing that came up - if userspace specifies a stack for clone3() it'll just get used even with CLONE_VFORK so it seemed to make sense to do the same thing for the shadow stack. This was part of the thinking when we were looking at it, if you can specify a regular stack you should be able to specify a shadow stack.
I'm confused that we need to consume the token here. I could not find the default shadow stack allocation doing this, only setting it via create_rstor_token() (or I did not search enough). In the default case,
As discussed for a couple of previous versions if we don't have the token and userspace can specify any old shadow stack page as the shadow stack this allows clone3() to be used to overwrite the shadow stack of another thread, you can point to a shadow stack page which is currently
IIUC, the kernel-allocated shadow stack will have the token always set while the user-allocated one will be cleared. I was looking to
No, when the kernel allocates we don't bother with tokens at all. We only look for and clear a token with the user specified shadow stack.
understand the inconsistency between these two cases in terms of the final layout of the new shadow stack: one with the token, the other without. I can see the need for checking but maybe start with requiring it to be 0 and setting the token before returning, for consistency with clone().
The layout should be the same, the shadow stack will point to where the token would be - the only difference is if we checked to see if there was a token there. Since we either clear the token on use or allocate a fresh page in both cases the value there will be 0.
In the kernel-allocated shadow stack, is the token used for anything? I can see it's used for signal delivery and return but I couldn't figure out what it is used for in a thread's shadow stack.
For arm64 we place differently formatted tokens there during signal handling, and a token is placed at the top of the stack as part of the architected stack pivoting instructions (and a token at the destination consumed). I believe x86 has the same pivoting behaviour but ICBW. A user specified shadow stack is handled in a very similar way to what would happen if the newly created thread immediately pivoted to the specified stack.
Also, can one not use the clone3() to point to the clone()-allocated shadow stack? Maybe that's unlikely as an app tends to stick to one syscall flavour or the other.
A valid token will only be present on an inactive stack. If a thread pivots away from a kernel allocated stack then another thread could be started using the original kernel allocated stack, any program doing this should think carefully about the lifecycle of the kernel allocated stack but it's possible. If a thread has not pivoted away from it's stack then there won't be a token at the top of the stack and it won't be possible to pivot to it.
/*
* For CLONE_VFORK the child will share the parents
* shadow stack. Make sure to clear the internal
* tracking of the thread shadow stack so the freeing
* logic run for child knows to leave it alone.
*/
if (clone_flags & CLONE_VFORK) {
shstk->base = 0;
shstk->size = 0;
return 0;
}
I think we should leave the CLONE_VFORK check on its own independent of the clone3() arguments. If one passes both CLONE_VFORK and specific shadow stack address/size, they should be ignored (or maybe return an error if you want to make it stricter).
This is existing logic from the current x86 code that's been reindented due to the addition of explicitly specified shadow stacks, it's not new behaviour. It is needed to stop the child thinking it has the parent's shadow stack in the CLONE_VFORK case.
I figured that. But similar to the current !CLONE_VM behaviour where no new shadow stack is allocated even if a new stack is passed to clone(), I was thinking of something similar here for consistency: don't set up a shadow stack in the CLONE_VFORK case or at least allow it only if a new stack is being set up (if we extend this to clone(), it would be a small ABI change).
We could restrict specifying a shadow stack to only be supported when a regular stack is also specified, if we're doing that I'd prefer to do it in all cases rather than only for vfork() since that reduces the number of special cases and we don't restrict normal stacks like that.
This is again all existing behaviour for the case where the user has not specified a shadow stack reindented, as mentioned above if the user has specified one explicitly then we just do what we were asked. The existing behaviour is to only create a new shadow stack for the child in the CLONE_VM case and leave the child using the same shadow stack as the parent in the copied mm for !CLONE_VM.
I guess I was rather questioning the current choices than the new clone3() ABI. But even for the new clone3() ABI, does it make sense to set up a shadow stack if the current stack isn't changed? We'll end up with a lot of possible combinations that will never get tested but potentially become obscure ABI. Limiting the options to the sane choices only helps with validation and unsurprising changes later on.
OTOH if we add the restrictions it's more code (and more test code) to check, and thinking about if we've missed some important use case. Not that it's a *huge* amount of code, like I say I'd not be too unhappy with adding a restriction on having a regular stack specified in order to specify a shadow stack.