Hi Mark,
Thanks for putting this together and sorry it took me some time to catch up (well, still not fully, so rather more questions below).
On Tue, Nov 28, 2023 at 06:22:38PM +0000, Mark Brown wrote:
Since clone3() is readily extensible let's add support for specifying a shadow stack when creating a new thread or process in a similar manner to how the normal stack is specified, keeping the current implicit allocation behaviour if one is not specified either with clone3() or through the use of clone(). Unlike normal stacks only the shadow stack size is specified, similar issues to those that lead to the creation of map_shadow_stack() apply.
My hope when looking at the arm64 patches was that we can completely avoid the kernel allocation/deallocation of the shadow stack since it doesn't need to do this for the normal stack either. Could someone please summarise why we dropped the shadow stack pointer after v1? IIUC there was a potential security argument but I don't think it was a very strong one. Also what's the threat model for this feature? I thought it's mainly mitigating stack corruption. If some rogue code can do syscalls, we have bigger problems than clone3() taking a shadow stack pointer.
My (probably wrong) mental model was that libc can do an mmap() for normal stack, a map_shadow_stack() for the shadow one and invoke clone3() with both these pointers and sizes. There is an overhead of an additional syscall but if some high-performance app needs to spawn threads quickly, it would most likely do some pooling.
I'm not against clone3() getting a shadow_stack_size argument but asking some more questions. If we won't pass a pointer as well, is there any advantage in expanding this syscall vs a specific prctl() option? Do we need a different size per thread or do all threads have the same shadow stack size? A new RLIMIT doesn't seem to map well though, it is more like an upper limit rather than a fixed/default size (glibc I think uses it for thread stacks but bionic or musl don't AFAIK).
Another dumb question on arm64 - is GCSPR_EL0 writeable by the user? If yes, can the libc wrapper for threads allocate a shadow stack via map_shadow_stack() and set it up in the thread initialisation handler before invoking the thread function?
Thanks.