On Mon, Nov 20, 2023 at 11:54:28PM +0000, Mark Brown wrote:
> The kernel has recently added support for shadow stacks, currently x86 only using their CET feature, but both arm64 and RISC-V have equivalent features (GCS and Zicfiss respectively). I am actively working on GCS[1]. With shadow stacks the hardware maintains an additional stack containing only the return addresses for branch instructions which is not generally writeable by userspace and ensures that any returns are to the recorded addresses. This provides some protection against ROP attacks and makes it easier to collect call stacks. These shadow stacks are allocated in the address space of the userspace process.
> Our API for shadow stacks does not currently offer userspace any flexibility for managing the allocation of shadow stacks for newly created threads. Instead the kernel allocates a new shadow stack with the same size as the normal stack whenever a thread is created with the feature enabled. The stacks allocated in this way are freed by the kernel when the thread exits or shadow stacks are disabled for the thread. This lack of flexibility and control isn't ideal: in the vast majority of cases the shadow stack will be over allocated, and the implicit allocation and deallocation is not consistent with other interfaces. As far as I can tell the interface is done in this manner mainly because the shadow stack patches were in development since before clone3() was implemented.
> Since clone3() is readily extensible let's add support for specifying a shadow stack when creating a new thread or process in a similar manner
So while I made clone3() readily extensible I don't want it to ever devolve into a fancier version of a prctl().
I would really like to see a strong reason for allowing userspace to configure the shadow stack size at this point in time.
I have a few questions that are probably me just not knowing much about shadow stacks so hopefully I'm not asking you to write a thesis by accident:
(1) What does it mean for a shadow stack to be over allocated and is over-allocation really that much of a problem out in the wild that we need to give userspace a knob to control a kernel security feature?
(2) With what other interfaces is implicit allocation and deallocation not consistent? I don't understand this argument. The kernel creates a shadow stack as a security measure to store return addresses. It seems to me that this is exactly why the kernel should implicitly allocate and deallocate the shadow stack and not have userspace muck around with its size?
(3) Why is it safe for userspace to request the shadow stack size? What if they request a tiny shadow stack size? Should this interface require any privilege?
(4) Why isn't the @stack_size argument I added for clone3() enough? If it is specified can't the size of the shadow stack be derived from it?
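To make (4) concrete, here is a rough sketch of the kind of derivation I mean. It is not taken from any kernel tree: the function names, the 4096-byte page size, the 4 GiB cap and the fallback to a stack rlimit value are all assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL       /* assumed page size */
#define SKETCH_MAX_SHSTK (4ULL << 30)  /* assumed 4 GiB upper bound */

/* Round a size up to the next multiple of the assumed page size. */
uint64_t sketch_page_align(uint64_t size)
{
	uint64_t aligned = (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);

	assert(aligned % SKETCH_PAGE_SIZE == 0);
	return aligned;
}

/*
 * Hypothetical policy: derive the shadow stack size from the
 * @stack_size passed to clone3(). If the caller passed 0, fall back
 * to the stack rlimit. The result is capped and page aligned.
 */
uint64_t sketch_shadow_stack_size(uint64_t stack_size, uint64_t rlimit_stack)
{
	uint64_t size = stack_size ? stack_size : rlimit_stack;

	/* Cap before aligning so the rounding cannot overflow. */
	if (size > SKETCH_MAX_SHSTK)
		size = SKETCH_MAX_SHSTK;

	return sketch_page_align(size);
}
```

With a policy along these lines, a caller that already sets @stack_size for its thread stack would get a shadow stack sized from that value without any new clone3() field.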
And my current main objection is that shadow stacks were just released to userspace. There can't be a massive number of users yet, outside of maybe early adopters.
The fact that there are other architectures that bring in a similar feature makes me even more hesitant. If they have all agreed _and_ implemented shadow stacks and have unified semantics then we can consider exposing control knobs to userspace that aren't implicitly architecture specific currently.
So I don't have anything against the patches per se, obviously, but rather with the wider context.
Thanks!