On Wed, 2024-02-21 at 13:30 -0500, dalias@libc.org wrote:
I think 3 is the cleanest and safest, and the thinking was that it might not need kernel help, thanks to a scheme Florian had for directing signals to specific threads. It's my preference at this point.
The operations where the shadow stack has to be processed need to be executable from async-signal context, so this imposes a requirement to block all signals around the lock. This makes every longjmp a heavy, multi-syscall operation rather than an O(1) userspace operation. I do not think this is an acceptable implementation, especially when there are clearly superior alternatives without that cost or invasiveness.
That is a good point. Could the per-thread locks be nestable to address this? We just need to know if a thread *might* be using shadow stacks. So we really just need a per-thread count.
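To make the per-thread count idea concrete, here is a rough sketch, not a proposal for the real implementation. All names (shstk_busy_depth, shstk_busy_enter, and so on) are made up for illustration, and how another thread or the kernel would observe the count is left out entirely. It only shows that a nestable counter can be bumped from async-signal context without blocking signals, as long as every enter is paired with an exit:

```c
#include <signal.h>
#include <stdatomic.h>

/* Hypothetical per-thread nesting count. Nonzero means this thread
 * *might* be in the middle of an operation that touches the shadow stack. */
static _Thread_local volatile sig_atomic_t shstk_busy_depth;

/* Enter a (possibly nested) shadow stack section. A signal handler that
 * interrupts this thread may call enter/exit itself; as long as its calls
 * are balanced, the count is back to the interrupted value when it returns. */
static inline void shstk_busy_enter(void)
{
    shstk_busy_depth++;
    /* Compiler-only barrier: keep the section's accesses after the increment. */
    atomic_signal_fence(memory_order_seq_cst);
}

/* Leave the section; the count returns to zero only at the outermost exit. */
static inline void shstk_busy_exit(void)
{
    atomic_signal_fence(memory_order_seq_cst);
    shstk_busy_depth--;
}

/* Nonzero if this thread might currently be using its shadow stack. */
static inline int shstk_might_be_busy(void)
{
    return shstk_busy_depth != 0;
}
```

A longjmp out of a signal handler would skip the matching exit, so a real scheme would have to save and restore the count along with the jump buffer. That is not shown here.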
1 and 2 are POCed here, if you are interested: https://github.com/rpedgeco/linux/commits/shstk_suppress_rfc/
I'm not clear why 2 (suppression of #CP) is desirable at all. If shadow stack is being disabled, it should just be disabled, with minimal fault handling to paper over any racing operations at the moment it's disabled. Leaving it on with extreme slowness to make it not actually do anything does not seem useful.
The benefit is that code that is using shadow stack instructions won't crash if it relies on them working. For example RDSSP turns into a NOP if shadow stack is disabled, and the intrinsic is written such that a NULL pointer is returned if shadow stack is disabled. The shadow stack is normally readable, and this happens in glibc sometimes. So if there was code like:
long foo = *(long *)_get_ssp();
...then it could suddenly read a NULL pointer if shadow stack got disabled. (Notice, it's not even a "shadow stack access" fault-wise.) So option 2 was looked at as somewhat more robust. But neither 1 nor 2 is perfect for apps that are manually using shadow stack instructions.
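For comparison, code that wanted to tolerate shadow stack being turned off underneath it would have to check the RDSSP result before dereferencing it. A minimal sketch follows, with a hypothetical helper name (read_shstk_top). The SSP value is passed in as a parameter, which is an assumption made so the sketch builds without -mshstk; a real caller would pass what the intrinsic returned:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the word at the top of the shadow stack.
 * `ssp` is the value obtained from RDSSP, which is 0 when shadow stack
 * is disabled. Returns 0 and stores the word in *out on success, or -1
 * if there is no shadow stack to read. */
static inline int read_shstk_top(uint64_t ssp, uint64_t *out)
{
    if (ssp == 0)
        return -1;  /* shadow stack is off: do not dereference NULL */

    *out = *(const uint64_t *)(uintptr_t)ssp;
    return 0;
}
```

Already-built binaries don't have a check like this, of course, which is why the unguarded read above is a problem for option 1.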
Is there some way folks have in mind to use option 2 to lazily disable shadow stack once the first SS-incompatible code is executed, when execution is then known not to be in the middle of a SS-critical section, instead of doing it right away? I don't see how this could work, since the SS-incompatible code could be running from a signal handler that interrupted an SS-critical section.
The idea was to disable it without critical sections, and it could be more robust, but not perfect. Between 1 and 2, I preferred option 1, which was closer to your original suggestion. But it has problems like the example I gave above. I agree that, of the two, 1 is relatively simpler for the kernel.
If folks on the kernel side are not going to be amenable to doing the things that are easy for the kernel to make it work without breaking compatibility with existing interfaces, but that are impossible or near-impossible for userspace to do, this seems like a dead-end. And I suspect an operation to "disable shadow stack, but without making threads still in SS-critical sections crash" is going to be necessary.
I think we have to work through all the alternatives before we can accuse the kernel of not being amenable. Is there something that you would like to see out of this conversation that is not happening?
No, I was just interpreting "uphill battle". I really do not want to engage in an uphill battle for the sake of making it practical to support something that was never my goal to begin with. If I'm misreading this, or if others are willing to put the effort into that "battle", I'd be happy to be mistaken about "not amenable".
I don't think the x86 maintainers have put a foot down on anything around this, at least. They would normally have concerns about complexity and maintainability. So if we have something that has lower value (an imperfect solution) and high complexity, it starts to look like a less promising path.