On Wed, Feb 21, 2024 at 06:53:44PM +0000, Edgecombe, Rick P wrote:
On Wed, 2024-02-21 at 13:30 -0500, dalias@libc.org wrote:
I think 3 is the cleanest and safest, and the thinking was that it might not need kernel help, thanks to a scheme Florian had for directing signals to specific threads. It's my preference at this point.
The operations where the shadow stack has to be processed need to be executable from async-signal context, so this imposes a requirement to block all signals around the lock. This makes every longjmp a heavy, multi-syscall operation rather than an O(1) userspace operation. I do not think this is an acceptable implementation, especially when there are clearly superior alternatives without that cost or invasiveness.
That is a good point. Could the per-thread locks be nestable to address this? We just need to know if a thread *might* be using shadow stacks. So we really just need a per-thread count.
No, this does not suffice, because signal frames can nest arbitrarily. An interrupted operation holding the lock could be delayed arbitrarily, or never resume at all, making any call to dlopen deadlock.
1 and 2 are POCed here, if you are interested: https://github.com/rpedgeco/linux/commits/shstk_suppress_rfc/
I'm not clear why 2 (suppression of #CP) is desirable at all. If shadow stack is being disabled, it should just be disabled, with minimal fault handling to paper over any racing operations at the moment it's disabled. Leaving it on with extreme slowness to make it not actually do anything does not seem useful.
The benefit is that code that is using shadow stack instructions won't crash if it relies on them working. For example, RDSSP turns into a NOP if shadow stack is disabled, and the intrinsic is written such that a NULL pointer is returned if shadow stack is disabled. The shadow stack is normally readable, and glibc sometimes reads it. So if there was code like:
long foo = *(long *)_get_ssp();
...then it could suddenly dereference a NULL pointer if shadow stack got disabled. (Notice that it's not even a "shadow stack access" fault-wise.) So 2 was looked at as somewhat more robust. But neither 1 nor 2 is perfect for apps that are manually using shadow stack instructions.
It's fine to turn RDSSP into an actual emulated read of the SSP, or at least an emulated load of zero so that uninitialized data is not left in the target register. If doing the latter, code working with the shadow stack just needs to be prepared for the possibility that it could be async-disabled, and check the return value.
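As a rough illustration of "check the return value", the read can be split into obtaining the SSP and a guarded dereference. The helper name `shstk_peek` is hypothetical. In real code the `ssp` argument would be the result of `_get_ssp()`, and the sketch assumes that a disabled shadow stack shows up as a NULL result, as described above. It only covers that NULL case, not a disable that races in after the SSP has been read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read the top shadow stack entry through an SSP value
   previously obtained from _get_ssp().
   Returns 1 and stores the entry in *out on success.
   Returns 0 and leaves *out untouched if ssp is NULL, which is taken here
   to mean that shadow stack is (or has become) disabled. */
static int shstk_peek(const void *ssp, long *out)
{
    assert(out != NULL);

    if (ssp == NULL)
        return 0;               /* disabled: nothing to read */

    *out = *(const long *)ssp;  /* plain load; the shadow stack is readable */
    return 1;
}
```

The caller then branches on the result, for example `if (!shstk_peek(ssp, &foo))` followed by a non-shadow-stack fallback, instead of dereferencing unconditionally.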
I have not looked at all the instructions that become #UD, but I suspect they all have reasonably trivial ways to implement a "disabled" version that userspace can act on sensibly.
Rich