On Sun, 2025-09-21 at 14:21 +0100, Mark Brown wrote:
During the discussion of the clone3() support for shadow stacks, concerns were raised from the glibc side that, since it is not possible to reuse the allocated shadow stack[1], the benefit of being able to manage allocations is greatly reduced; for example, it is not possible to integrate the shadow stacks into the glibc thread stack cache. The stack can be inspected, but otherwise it would have to be unmapped and remapped before it could be used again, and it's not clear that this is better than managing things in the kernel.
In that discussion I suggested that we could enable reuse by writing a token to the shadow stack of exiting threads, mirroring how the userspace stack pivot instructions write a token to the outgoing stack. As mentioned by Florian[2], glibc already unwinds the stack and exits the thread from the start routine, which would integrate nicely with this: the shadow stack pointer will be at the same place as it was when the thread started.
This would not write a token if the thread doesn't exit cleanly, but that seems viable to me - users should probably handle this by double checking that a token is present after waiting for the thread.
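Something like the below (a minimal sketch of the consumer side, assuming the token is left at a known location on the thread's shadow stack; shstk_expected_token() is a hypothetical helper standing in for whatever encoding the kernel ends up writing):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct cached_shstk {
	void *base;		/* mapping from map_shadow_stack() */
	size_t size;		/* mapping size */
	uint64_t token_addr;	/* where the exit token should appear (assumed) */
};

/* Hypothetical: the token value expected at token_addr. */
extern uint64_t shstk_expected_token(uint64_t token_addr);

/*
 * Call only after the exiting thread has been reaped (e.g. after the
 * CLONE_CHILD_CLEARTID futex fires), so the kernel is done with the stack.
 */
static bool shstk_reusable(const struct cached_shstk *ss)
{
	uint64_t val = *(const volatile uint64_t *)ss->token_addr;

	/*
	 * No token means the thread did not exit cleanly through the start
	 * routine; unmap instead of recycling into the stack cache.
	 */
	return val == shstk_expected_token(ss->token_addr);
}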
This is tagged as an RFC since I put it together fairly quickly to demonstrate the proposal, and the suggestion hasn't had much response either way from the glibc developers. At the very least we don't currently handle scheduling during exit(), or distinguish why the thread is exiting. I've also not done anything about x86.
Security-wise, it seems reasonable that if you are leaving a shadow stack you could leave a token behind. But I have some doubts about the userspace scheme of backing up the SSP by doing a longjmp() or similar. IIRC there were some cross-stack edge cases that we never figured out how to handle.
As far as re-using allocated shadow stacks goes, there is always the option of enabling WRSS (or similar), which allows writing the shadow stack as well as longjmp()ing at will.
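On x86 that looks roughly like the below (a sketch, assuming a kernel with CONFIG_X86_USER_SHADOW_STACK and headers that expose the ARCH_SHSTK_* arch_prctl codes; build with -mshstk for the _wrssq() intrinsic):

#include <immintrin.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <asm/prctl.h>		/* ARCH_SHSTK_ENABLE, ARCH_SHSTK_WRSS */

static int shstk_enable_wrss(void)
{
	/* No glibc wrapper for these arch_prctl() codes, so go via syscall(). */
	return syscall(SYS_arch_prctl, ARCH_SHSTK_ENABLE, ARCH_SHSTK_WRSS);
}

static void shstk_write(void *slot, uint64_t val)
{
	/* Faults unless WRSS is enabled and slot is shadow stack memory. */
	_wrssq(val, slot);
}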
I think we should see a fuller solution from the glibc side before adding new kernel features like this (apologies if I missed it). I wonder if we are building something that will have an extremely complicated set of rules for what types of stack operations should be expected to work.
Sort of related, I think we might want to think about msealing shadow stacks, which will have trouble with a lot of these user-managed shadow stack schemes. The reason is that as long as shadow stacks can be unmapped while a thread is on them (say a sleeping thread), a new shadow stack can be allocated in the same place with a token. Then a second thread can consume the token and possibly corrupt the shadow stack of the other thread with its own calls. I don't know how realistic it is in practice, but it's something that guard gaps can't totally prevent.
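For concreteness, the sequence looks something like this from userspace (a sketch only; whether the fresh mapping actually lands at the old address depends on address space layout and how the addr hint is treated, and the syscall number fallback is just for older headers):

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_map_shadow_stack
#define __NR_map_shadow_stack 453
#endif
#ifndef SHADOW_STACK_SET_TOKEN
#define SHADOW_STACK_SET_TOKEN (1ULL << 0)
#endif

static void *remap_with_token(void *old, size_t size)
{
	/* Thread A may still be on 'old' (e.g. sleeping in a syscall). */
	if (munmap(old, size))
		return NULL;

	/*
	 * Ask for a fresh shadow stack at the same place, complete with a
	 * token that a second thread could then consume by pivoting to it.
	 */
	return (void *)syscall(__NR_map_shadow_stack, (uintptr_t)old, size,
			       SHADOW_STACK_SET_TOKEN);
}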
But for automatically thread-created shadow stacks, there is no need to allow userspace to unmap a shadow stack, so the automatically created stacks could simply be msealed on creation and unmapped by the kernel. For a lot of apps (most?) this would work perfectly fine.
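The userspace-visible effect would be roughly as if the kernel had called mseal() on the mapping at creation, so later unmap attempts fail with EPERM (sketch below, with a syscall number fallback for pre-6.10 headers):

#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_mseal
#define __NR_mseal 462
#endif

/* What the kernel would in effect do to the shadow stack it created. */
static int seal_shstk(void *addr, size_t len)
{
	return syscall(__NR_mseal, addr, len, 0UL);
}

/* A later unmap from userspace is then expected to fail with EPERM. */
static int shstk_try_unmap(void *addr, size_t len)
{
	return munmap(addr, len);	/* -1, errno == EPERM on a sealed mapping */
}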
I think we don't want 100 modes of shadow stack. If we have two, I'd think:

1. Msealed: a simple, more locked down, kernel-allocated shadow stack, with limited or no userspace-managed shadow stacks.

2. WRSS enabled: a clone3-preferred, maximum compatibility shadow stack. Longjmp works via token writes, and you don't even have to think about taking signals while unwinding across stacks, or whatever other edge cases.
This RFC seems to be going down the path of addressing one edge case at a time. Alone it's fine, but I'd rather punt these types of usages to (2) by default.
Thoughts?