On Tue, Oct 03, 2023 at 03:26:51PM +0100, Mark Brown wrote:
> On Tue, Oct 03, 2023 at 09:45:56AM +0100, Szabolcs Nagy wrote:
> > clone3 seems to have features that are only available in clone3 and not exposed (reasonably) in libc apis so ppl will use clone3 directly and those will be hard to fix for gcs (you have to convince upstream to add future arm64 arch specific changes that they cannot test).
>
> Ah, I hadn't realised that there were things that weren't available via libc - that does change the calculation a bit here. I would hope that anything we do for clone3() would work just as well for x86 so the test side should be a bit easier there than if it were a future arm64 thing, though obviously it wouldn't be mandatory on x86 in the way that Catalin wanted it for arm64.
I haven't checked how many uses of clone() or clone3() there are outside libc (I tried a quick search in Debian but did not dig into the specifics to see how generic that code is). I agree that having to change valid cases outside of libc is not ideal. Even if we have the same clone3() interface for x86 and arm64, we'd still have other architectures that need #ifdef'ing.
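For reference, a direct clone3() use outside libc looks roughly like the sketch below. It is only an illustration: raw_clone3() and spawn_and_wait() are made-up names, and the call is fork-like (no CLONE_VM), so stack and stack_size stay zero. A thread-like caller would fill in those two fields itself, and that is the only stack information such code hands to the kernel today.

```c
#define _GNU_SOURCE
#include <linux/sched.h>   /* struct clone_args */
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around the raw syscall. */
static pid_t raw_clone3(struct clone_args *args)
{
	return (pid_t)syscall(SYS_clone3, args, sizeof(*args));
}

/*
 * Hypothetical helper: create a fork-like child that exits with
 * child_status, then reap it. Returns the child's exit status, or -1
 * if clone3() or waitpid() failed (e.g. clone3 blocked by seccomp).
 */
static int spawn_and_wait(int child_status)
{
	struct clone_args args;
	int status;
	pid_t pid;

	memset(&args, 0, sizeof(args));
	args.exit_signal = SIGCHLD;
	/* args.stack and args.stack_size left at 0: no new stack supplied */

	pid = raw_clone3(&args);
	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(child_status);	/* child */

	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Code of this shape never goes through libc's thread creation path, so anything GCS needs from it has to come either from a kernel default or from a change in every such caller.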
So I'm slightly warming up to the idea of having a default shadow stack size (either RLIMIT_STACK or the clone3() stack size, following x86). A clone3() extension can be added on top, though I wonder whether anyone will use it if the kernel allocates a shadow stack by default.
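To make the default concrete, here is a user-space sketch of the sizing rule as I understand the x86 behaviour: use the clone3() stack_size if one was given, otherwise RLIMIT_STACK, rounded up to a page. The 4GB cap on the rlimit path is from my memory of the x86 code, so treat it as an assumption. default_shadow_stack_size() is a made-up name, not a kernel function.

```c
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

#define SZ_4G (1ULL << 32)

/* Round n up to a multiple of page (page must be a power of two). */
static uint64_t page_align(uint64_t n, uint64_t page)
{
	return (n + page - 1) & ~(page - 1);
}

/*
 * Hypothetical helper mirroring the assumed default:
 *  - non-zero clone3() stack_size: use it, page aligned
 *  - otherwise: min(RLIMIT_STACK, 4GB), page aligned
 * Returns 0 if the rlimit cannot be read.
 */
static uint64_t default_shadow_stack_size(uint64_t clone_stack_size)
{
	uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);
	struct rlimit rl;
	uint64_t lim;

	if (clone_stack_size)
		return page_align(clone_stack_size, page);

	if (getrlimit(RLIMIT_STACK, &rl) != 0)
		return 0;

	/* RLIM_INFINITY is a very large value, so the cap handles it too. */
	lim = (uint64_t)rl.rlim_cur;
	if (lim > SZ_4G)
		lim = SZ_4G;
	return page_align(lim, page);
}
```

With the usual 8MB RLIMIT_STACK, a caller that passes no stack_size would get an 8MB shadow stack per thread under this rule, which is address space rather than committed memory until it is touched.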
It's not just the default size that I dislike (I think the x86 RLIMIT_STACK or clone3() stack_size is probably good enough) but the kernel allocating the shadow stack and inserting it into the user address space. The actual thread stack is managed by the user but the shadow stack is not (and we don't do this very often). Anyway, I don't have a better solution for direct uses of clone() or clone3(), other than running those threads with the shadow stack disabled. Not sure that's desirable.