Hi,
I worked on the x86 kernel shadow stack support. I think it is an interesting suggestion. Some questions below, and I will think more on it.
On Tue, 2024-02-20 at 11:36 -0500, Stefan O'Rear wrote:
While discussing the ABI implications of shadow stacks in the context of Zicfiss and musl a few days ago, I had the following idea for how to solve the source compatibility problems with shadow stacks in POSIX.1-2004 and POSIX.1-2017:
- Introduce a "flexible shadow stack handling" option. For what
follows, it doesn't matter if this is system-wide, per-mm, or per-vma.
- Shadow stack faults on non-shadow stack pages, if flexible shadow
stack handling is in effect, cause the affected page to become a shadow stack page. When this happens, the page is filled with invalid address tokens.
Hmm, could the shadow stack underflow onto the real stack then? Not sure how bad that is. INCSSP (incrementing the SSP register on x86) loops are not rare so it seems like something that could happen.
Faults from non-shadow-stack accesses to a shadow-stack page which was created by the previous paragraph will cause the page to revert to non-shadow-stack usage, with or without clearing.
Won't this prevent catching stack overflows when they happen? An overflow will just turn the shadow stack into normal stack and only get detected when the shadow stack unwinds?
A related question would be how to handle the expanding nature of the initial stack. I guess the initial stack could be special and have a separate shadow stack.
Important: a shadow stack operation can only load a valid address from a page if that page has been in continuous shadow stack use since the address was written by another shadow stack operation; the flexibility delays error reporting in cases of stray writes but it never allows for corruption of shadow stack operation.
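To make the invariant above concrete, here is a toy userspace model of the proposed page transitions. It is only a sketch under stated assumptions, not kernel code: the names (struct page, ss_store, ss_load, normal_store), the 8-slot page, and the use of UINT64_MAX as the "invalid address token" are all hypothetical and chosen for illustration. The model takes the "without clearing" option on revert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOTS 8                   /* toy page size in 8-byte slots (assumption) */
#define INVALID_TOKEN UINT64_MAX  /* hypothetical invalid-address marker */

enum page_use { PAGE_NORMAL, PAGE_SHADOW };

struct page {
    enum page_use use;
    uint64_t slot[SLOTS];
};

/* Shadow stack fault on a non-shadow page: convert it and poison every slot. */
static void fault_to_shadow(struct page *p)
{
    for (size_t i = 0; i < SLOTS; i++)
        p->slot[i] = INVALID_TOKEN;
    p->use = PAGE_SHADOW;
}

/* Shadow stack write (e.g. a call pushing a return address). */
static void ss_store(struct page *p, size_t i, uint64_t addr)
{
    assert(i < SLOTS);
    if (p->use != PAGE_SHADOW)
        fault_to_shadow(p);
    p->slot[i] = addr;
}

/* Shadow stack read; returns false if the slot holds no valid address. */
static bool ss_load(struct page *p, size_t i, uint64_t *addr)
{
    assert(i < SLOTS);
    if (p->use != PAGE_SHADOW)
        fault_to_shadow(p);
    if (p->slot[i] == INVALID_TOKEN)
        return false;
    *addr = p->slot[i];
    return true;
}

/* Ordinary store: a shadow page reverts to normal use (contents kept here). */
static void normal_store(struct page *p, size_t i, uint64_t value)
{
    assert(i < SLOTS);
    if (p->use == PAGE_SHADOW)
        p->use = PAGE_NORMAL;
    p->slot[i] = value;
}
```

In this model a stray normal_store() flips the page back to normal use, so the next ss_load() re-poisons the whole page and fails. A forged value is never returned as a valid address, but the error is only reported at the later shadow stack read, which is the delayed reporting described above.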
Shadow stacks currently have automatic guard gaps to try to prevent one thread from overflowing onto another thread's shadow stack. This would somewhat open that up, as the stack guard gaps are usually maintained by userspace for new threads. It would have to be thought through whether these could still be enforced with checking at additional spots.
- Standards-defined operations which use a user-provided stack
(makecontext, sigaltstack, pthread_attr_setstack) use a subrange of the provided stack for shadow stack storage. I propose to use a shadow stack size of 1/32 of the provided stack size, rounded up to a positive integer number of pages, and place the shadow stack allocation at the lowest page-aligned address inside the provided stack region.
Since page usage is flexible, no change in page permissions is immediately needed; this merely sets the initial shadow stack pointer for the new context.
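As a rough illustration of the sizing rule above, the carve-out could be computed as follows. This is a minimal sketch, assuming a 4 KiB page size; shstk_carve() and struct shstk_range are hypothetical names, not part of any libc or kernel interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE ((size_t)4096)  /* assumed page size */

struct shstk_range {
    uintptr_t base;  /* lowest page-aligned address inside the stack */
    size_t size;     /* shadow stack size in bytes, a whole number of pages */
};

/*
 * Compute the shadow stack subrange of a user-provided stack.
 * Returns 0 on success, -1 if the region is too small to hold it.
 */
static int shstk_carve(uintptr_t stack_base, size_t stack_size,
                       struct shstk_range *out)
{
    assert(out != NULL);

    /* 1/32 of the stack size, rounded up to whole pages, at least one. */
    size_t chunk = 32 * PAGE_SIZE;
    size_t pages = stack_size / chunk + (stack_size % chunk != 0);
    if (pages == 0)
        pages = 1;
    size_t size = pages * PAGE_SIZE;

    /* Round the base up to the next page boundary. */
    uintptr_t base = (stack_base + PAGE_SIZE - 1) & ~(uintptr_t)(PAGE_SIZE - 1);
    if (base < stack_base)
        return -1;  /* address arithmetic wrapped around */

    size_t skipped = base - stack_base;
    if (skipped > stack_size || size > stack_size - skipped)
        return -1;  /* aligned carve-out does not fit */

    out->base = base;
    out->size = size;
    return 0;
}
```

With a misaligned base, up to PAGE_SIZE - 1 bytes are lost to alignment before the first shadow stack page, so a minimum-sized carve-out can need almost two pages of the provided region.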
If the shadow stack grew in the opposite direction to the architectural stack, it would not be necessary to pick a fixed direction.
- SIGSTKSZ and MINSIGSTKSZ are increased by 2 pages to provide
sufficient space for a minimum-sized shadow stack region and worst case alignment.
Do all makecontext() callers ensure the size is greater than this?
I guess glibc's makecontext() could do this scheme to prevent leaking without any changes to the kernel. Basically steal a little of the stack address range and overwrite it with a shadow stack mapping. But only if the apps leave enough room. If they need to be updated, then they could be updated to manage their own shadow stacks too I think.
_Without_ doing this, sigaltstack cannot be used to recover from stack overflows if the shadow stack limit is reached first, and makecontext cannot be supported without memory leaks and unreportable error conditions.
FWIW, I think the makecontext() shadow stack leaking is a bad idea. I would prefer the existing makecontext() interface just didn't support shadow stack, rather than the leaking solution glibc does today.
The situation (for arm and riscv too I think?) is that some applications will just not work automatically due to custom stack switching implementations (user-level threading libraries, JITs, etc.). So I think it should be OK to ask apps to change in order to enable shadow stack, and we should avoid doing anything too awkward in pursuit of getting it to work completely transparently.
For ucontext, there was some discussion about changing the makecontext() interface so that the app can allocate and manage its own shadow stacks. The app would then be responsible for allocating and freeing the shadow stacks. It seems a little more straightforward.
For x86, due to some existing GCC binaries that jumped ahead of the kernel support, it will likely require an ABI opt-in to enable alt shadow stacks. So alt shadow stack support design is still pretty open on the x86 side. Very glad to get broader input on it.
Kernel-allocated shadow stacks with a unique VM type are still useful since they allow stray writes to crash at the time the stray write is performed, rather than delaying the crash until the next shadow stack read.
The pthread and makecontext changes could be purely libc side, but we would need kernel support for sigaltstack and page usage changes.
Luckily, there is no need to support stacks which are simultaneously used from more than one process, so "is this a shadow stack page" can be tracked purely at the vma/pte level without any need to involve the inode. POSIX explicitly allows using mmap to obtain stack memory and does not forbid MAP_SHARED; I consider this sufficiently perverse application behavior that it is not necessary to ensure exclusive use of the underlying pages while a shadow stack pte exists. (Applications that use MAP_SHARED for stacks do not get the full benefit of the shadow stack but they keep POSIX.1-2004 conformance; applications that allocate stacks exclusively in MAP_PRIVATE memory lose no security.)
On x86 we don't support MAP_SHARED shadow stacks. There is a whole snarl around the dirty bit in the PTE. I'm not sure it's impossible but it was gladly avoided. There is also a benefit in avoiding having them get mapped as writable in a different context.
The largest complication of this scheme is likely to be that the shadow stack usage property of a page needs to survive the page being swapped out and back in, which likely means that it must be present in the swap PTE.
I am substantially less familiar with GCS and SHSTK than with Zicfiss. It is likely that a syscall or other mechanism is needed to initialize the shadow stack in flexible memory for makecontext.
The ucontext stacks (and alt shadow stacks is the plan) need to have a "restore token". So, yea, you would probably need some syscall to "convert" the normal stack memory into shadow stack with a restore token.
Is there interest on the kernel side in having mechanisms to fully support POSIX.1-2004 with GCS or Zicfiss enabled?
Can you clarify, is the goal to meet compatibility with the spec or try to make more apps run with shadow stack automatically?