On Thu, Sep 28, 2023 at 05:59:25PM +0100, Szabolcs Nagy wrote:
The 08/23/2023 14:11, Catalin Marinas wrote:
and there is user code doing raw clone threads (such threads are technically not allowed to call into libc); it's not immediately clear to me if having gcs in those threads is better or worse.
i think raw clone / clone3 users may be relevant so we need a solution such that they don't fail when gcs args are missing.
Are we sure about that? Old binaries shouldn't be affected since they won't turn GCS on, so we're just talking about new binaries here. Are there really so many of them that we won't be able to get them all converted over to clone3() and GCS in the timescales we're talking about for GCS deployment? I obviously don't particularly mind having the default size logic, but if we allow clone() then that's keeping the existing behaviour and layering allocation via clone3() on top of it, which Catalin didn't want. Catalin?
userspace allocated gcs works for me, but maybe the alternative with size only is more consistent (thread gcs is kernel mapped with fallback size logic if gcs size is missing):
If we have size only then the handling of GCS and normal stack in struct clone_args would be inconsistent. Given that, it seems better to have the field present; we can allow it to be NULL and do the allocation with the specified size, but it should be there.
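To make the cases concrete, here is a rough sketch of the decision being discussed: a user-supplied GCS is used as-is, a NULL pointer with a size means the kernel allocates that size, and a NULL pointer with no size falls back to default size logic. The names (struct gcs_args, plan_gcs(), the enum values) are hypothetical and not fields of the real struct clone_args, and the fallback rule (half the stack rlimit, at least one 4K page) is only an assumption for illustration, not what any kernel does.

```c
#include <stdint.h>

/* Hypothetical GCS arguments for clone3(); illustrative names only,
 * not part of the real struct clone_args. */
struct gcs_args {
	uint64_t gcs;      /* user-supplied GCS address, 0 (NULL) if none */
	uint64_t gcs_size; /* requested size in bytes, 0 if unspecified */
};

enum gcs_source {
	GCS_USER_PROVIDED,  /* userspace allocated the GCS itself */
	GCS_KERNEL_SIZED,   /* kernel allocates, size given by the caller */
	GCS_KERNEL_DEFAULT, /* kernel allocates, fallback size logic */
};

struct gcs_plan {
	enum gcs_source source;
	uint64_t size;
};

/* Assumed page size for the sketch. */
#define SKETCH_PAGE_SIZE 4096ULL

static uint64_t page_align(uint64_t n)
{
	return (n + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Decide where the new thread's GCS comes from and how big it is. */
static struct gcs_plan plan_gcs(const struct gcs_args *args,
				uint64_t stack_rlimit)
{
	struct gcs_plan plan;

	if (args->gcs != 0) {
		/* Pointer present: use the caller's GCS unchanged. */
		plan.source = GCS_USER_PROVIDED;
		plan.size = args->gcs_size;
	} else if (args->gcs_size != 0) {
		/* NULL pointer but a size: kernel maps that much. */
		plan.source = GCS_KERNEL_SIZED;
		plan.size = page_align(args->gcs_size);
	} else {
		/* Nothing given: assumed fallback of half the stack rlimit. */
		plan.source = GCS_KERNEL_DEFAULT;
		plan.size = page_align(stack_rlimit / 2);
		if (plan.size == 0)
			plan.size = SKETCH_PAGE_SIZE;
	}
	return plan;
}
```

With this shape a raw clone() caller, which cannot pass either field, always lands in the third case, which is the existing behaviour referred to above.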
An alternative would be for clone3() to provide an address _hint_ and size for GCS, and it would still be the kernel doing the mmap (and munmap on clearing). But at least the user has some control over the placement of the GCS and its size (and maybe providing the address has MAP_FIXED semantics).
the main thread gcs is still special: the size is provided via prctl (if at all).
Either that or we have it do a map_shadow_stack() but that's an extra syscall during startup.