The 11/21/2023 11:17, Christian Brauner wrote:
On Mon, Nov 20, 2023 at 11:54:28PM +0000, Mark Brown wrote:
The kernel has recently added support for shadow stacks, currently x86 only using their CET feature but both arm64 and RISC-V have equivalent features (GCS and Zicfiss respectively), I am actively working on GCS[1]. With shadow stacks the hardware maintains an additional stack containing only the return addresses for branch instructions which is not generally writeable by userspace and ensures that any returns are to the recorded addresses. This provides some protection against ROP attacks and making it easier to collect call stacks. These shadow stacks are allocated in the address space of the userspace process.
Our API for shadow stacks does not currently offer userspace any flexiblity for managing the allocation of shadow stacks for newly created threads, instead the kernel allocates a new shadow stack with the same size as the normal stack whenever a thread is created with the feature enabled. The stacks allocated in this way are freed by the kernel when the thread exits or shadow stacks are disabled for the thread. This lack of flexibility and control isn't ideal, in the vast majority of cases the shadow stack will be over allocated and the implicit allocation and deallocation is not consistent with other interfaces. As far as I can tell the interface is done in this manner mainly because the shadow stack patches were in development since before clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a shadow stack when creating a new thread or process in a similar manner
So while I made clone3() readily extensible I don't want it to ever devolve into a fancier version of a prctl().
I would really like to see a strong reason for allowing userspace to configure the shadow stack size at this point in time.
I have a few questions that are probably me just not knowing much about shadow stacks so hopefully I'm not asking you write a thesis by accident:
(1) What does it mean for a shadow stack to be over allocated and is over-allocation really that much of a problem out in the wild that we need to give I userspace a knob to control a kernel security feature?
over-allocation: allocating 100M shadow stack (RLIMIT_DATA, RLIMIT_AS) for a thread that keeps arrays of data on the stack, instead of say 8k, and thus running into resource limits.
under-allocation: small thread stack with runaway recursion, but large sigaltstack: the stack overflow handler can run out of space because it uses the same shadow stack as the thread.
(2) With what other interfaces is implicit allocation and deallocation not consistent? I don't understand this argument. The kernel creates a shadow stack as a security measure to store return addresses. It seems to me exactly that the kernel should implicitly allocate and deallocate the shadow stack and not have userspace muck around with its size?
the kernel is not supposed to impose stack size policy or a particular programming model that limits the stack management options nor prevent the handling of stack overflows.
(3) Why is it safe for userspace to request the shadow stack size? What if they request a tiny shadow stack size? Should this interface require any privilege?
user can allocate huge or tiny stacks already.
and i think userspace can take control over shadow stack management: it can disable signals, start a clone child with stack_size == 1 page, map_shadow_stack and switch to it, enable signals. however this is complicated, leaks 1 page of kernel allocated shadow stack (+reserved guard page, i guess userspace could unmap, not sure if that works currently) and requires additional syscalls.
(4) Why isn't the @stack_size argument I added for clone3() enough? If it is specified can't the size of the shadow stack derived from it?
shadow stack only contains return addresses so it is proportional to the number of stack frames, not the stack size and it must account for sigaltstack too, not just the thread stack.
if you make minimal assumptions about stack usage and ignore the sigaltstack issue then the worst case shadow stack requirement is indeed proportional to the stack_size, but this upper bound can be pessimistic and userspace knows the tradeoffs better.
And my current main objection is that shadow stacks were just released to userspace. There can't be a massive amount of users yet - outside of maybe early adopters.
no upstream libc has code to enable shadow stacks at this point so there are exactly 0 users in the open. (this feature requires runtime support)
the change is expected to allow wider deployability. (e.g. not just in glibc)
The fact that there are other architectures that bring in a similar feature makes me even more hesitant. If they have all agreed _and_ implemented shadow stacks and have unified semantics then we can consider exposing control knobs to userspace that aren't implicitly architecture specific currently.
So I don't have anything against the patches per obviously but with the wider context.
Thanks!