On Tue, 2024-02-20 at 20:27 -0500, dalias@libc.org wrote:
Then I think WRSS might fit your requirements better than what glibc did. It was considered a reduced security mode that made libc's job much easier and had better compatibility, but the last discussion was to try to do it without WRSS.
Where can I read more about this? Some searches I tried didn't turn up much useful information.
There never was any proposal written down AFAIK. In the past we have had a couple "shadow stack meetup" calls where folks who are working on shadow stack got together to hash out some things. We discussed it there.
But briefly, in the Intel SDM (and other places) there is documentation on the special shadow stack instructions. The two key ones for this are WRSS and RSTORSSP. WRSS is an instruction which can be enabled by the kernel (and there is upstream support for this). It can write to shadow stack memory, which ordinary stores cannot do.
RSTORSSP can be used to consume a restore token, which is a special value on the shadow stack. When this operation happens, the SSP is moved adjacent to the token that was just consumed. So between the two of them, the SSP can be moved to a specific spot on the current shadow stack or on another shadow stack.
Today when you longjmp() with shadow stack in glibc, INCSSP is used to move the SSP back to the spot on the shadow stack where setjmp() was called. But this algorithm doesn't always work, for example when longjmp()ing between stacks. To work around this, glibc uses a scheme where it searches from the target SSP for a shadow stack restore token, consumes it, and then INCSSPs back to the target SSP. It just barely, miraculously, worked in most cases.
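To make the search-and-consume scheme above a bit more concrete, here is a toy C model of it. This is only a sketch under loud assumptions: shadow stack memory is a plain array, the token encoding, the helper names (token_for, push, incssp, rstorssp, toy_longjmp_ssp) and the same_stack flag are all made up for illustration, and none of it is glibc's actual code or the exact SDM semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not real CET: shadow stack memory is an array of 8-byte slots,
 * stacks grow toward lower indices, and ssp is the index of the newest entry. */
#define MEM_SLOTS 32
#define TOKEN_TAG 1u /* hypothetical tag bit marking a restore token */

static uint64_t mem[MEM_SLOTS];
static size_t ssp;

/* A token encodes the slot just above itself, so it is only valid in place. */
static uint64_t token_for(size_t idx)
{
    return ((uint64_t)(idx + 1) << 3) | TOKEN_TAG;
}

/* Stand-in for a call pushing a return address. */
static void push(uint64_t value)
{
    assert(ssp > 0);
    mem[--ssp] = value;
}

/* INCSSP stand-in: pop n entries without looking at them. */
static void incssp(size_t n)
{
    assert(ssp + n <= MEM_SLOTS);
    ssp += n;
}

/* RSTORSSP stand-in: consume a valid token and move the SSP onto its slot. */
static bool rstorssp(size_t idx)
{
    if (idx >= MEM_SLOTS || mem[idx] != token_for(idx))
        return false;
    mem[idx] = 0; /* consumed: the same token cannot be used twice */
    ssp = idx;
    return true;
}

/* Move the SSP to `target`. `lo` is the lowest slot of the target's stack;
 * `same_stack` (an assumption of this model) says whether the current SSP
 * is already on that stack. */
static bool toy_longjmp_ssp(size_t target, size_t lo, bool same_stack)
{
    if (same_stack) {
        if (target < ssp)
            return false;     /* INCSSP can only pop, never un-pop */
        incssp(target - ssp); /* the simple case: just unwind */
        return true;
    }
    /* Different stack: search below the target for a token left behind. */
    for (size_t idx = target; idx-- > lo; ) {
        if (rstorssp(idx)) {
            incssp(target - idx); /* walk back up to the target */
            return true;
        }
    }
    return false; /* nobody left a token: this scheme cannot get there */
}
```

The false return at the end is the open case described next: if whoever switched away from the target stack did not leave a token, the search has nothing to consume.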
Some specific cases that were still open were longjmp()ing off of a custom userspace threading library stack, which may not have left a token behind when it jumped to a new stack, and also, potentially, off of an alt shadow stack in the future, depending on whether it leaves a restore token when handling a signal. (The problem there is what to do if there is no room to leave it.)
So that is how x86 glibc works, and I think arm was thinking along the same lines. But if you have WRSS (or arm's version of it), you could just write a restore token, or anything else you need to fix up, on the shadow stack. Then you could longjmp() in one go without any high wire acts. It's much simpler and more robust, and it would avoid needing to leave a restore token when handling a signal on an alt shadow stack. Although, nothing was ever prototyped. So "in theory".
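As a rough illustration of the WRSS idea, here is the same kind of toy C model. Again this is a sketch under assumptions, not a prototype: the array-backed shadow stack, the token encoding and the helper names (token_for, push, incssp, rstorssp, wrss, toy_longjmp_wrss) are hypothetical, and the real instructions have more rules than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model, not real CET: shadow stack memory is an array of 8-byte slots,
 * stacks grow toward lower indices, and ssp is the index of the newest entry. */
#define MEM_SLOTS 32
#define TOKEN_TAG 1u /* hypothetical tag bit marking a restore token */

static uint64_t mem[MEM_SLOTS];
static size_t ssp;

/* A token encodes the slot just above itself, so it is only valid in place. */
static uint64_t token_for(size_t idx)
{
    return ((uint64_t)(idx + 1) << 3) | TOKEN_TAG;
}

/* Stand-in for a call pushing a return address. */
static void push(uint64_t value)
{
    assert(ssp > 0);
    mem[--ssp] = value;
}

/* INCSSP stand-in: pop n entries. */
static void incssp(size_t n)
{
    assert(ssp + n <= MEM_SLOTS);
    ssp += n;
}

/* RSTORSSP stand-in: consume a valid token and move the SSP onto its slot. */
static bool rstorssp(size_t idx)
{
    if (idx >= MEM_SLOTS || mem[idx] != token_for(idx))
        return false;
    mem[idx] = 0;
    ssp = idx;
    return true;
}

/* WRSS stand-in: a direct write into shadow stack memory. */
static void wrss(size_t idx, uint64_t value)
{
    assert(idx < MEM_SLOTS);
    mem[idx] = value;
}

/* Move the SSP to `target` in one go: write a token into the slot just
 * below the target, consume it, and pop it. `lo` is the lowest slot of
 * the target's stack. */
static bool toy_longjmp_wrss(size_t target, size_t lo)
{
    if (target <= lo || target > MEM_SLOTS)
        return false; /* no slot below the target to hold the token */
    size_t idx = target - 1;
    wrss(idx, token_for(idx)); /* overwrites a frame longjmp discards anyway */
    if (!rstorssp(idx))
        return false;
    incssp(1); /* step off the consumed token onto the target */
    return true;
}
```

In this model the jump no longer depends on anyone having left a token behind. The slot being overwritten is newer than the target, so it belongs to a frame that the longjmp() is throwing away.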
But that is all about moving the SSP where you need it. It doesn't resolve any of the allocation lifecycle issues. I think for those the solutions are:
1. Not supporting ucontext/sigaltstack and shadow stack
2. Stefan's idea
3. A new interface that takes user allocated shadow stacks for those operations
My preference has been a combination of 1 and 3. For threads, I think Mark's clone3 enhancements will help.
Anyway, there is an attempt at a summary. I'd also point you to HJ for more glibc context, as I mostly worked on the kernel side.