On Tue, Dec 05, 2023 at 10:23:08PM +0000, Edgecombe, Rick P wrote:
On Tue, 2023-12-05 at 15:51 +0000, Mark Brown wrote:
Hrm, right. And we then can't use do_mmap() either. I'd be somewhat tempted to disallow that specific case for now rather than deal with it though that's not really in the spirit of just always following what the user asked for.
Oh, yea. What a pain. It doesn't seem like we could easily even add a do_mmap() variant that takes an mm either.
I did a quick logging test on a Fedora userspace. systemd (I think) appears to do a clone(!CLONE_VM) with a stack passed. So maybe the combo might actually get used with a shadow_stack_size if it used clone3 some day. At the same time, fixing clone to mmap() in the child doesn't seem straightforward at all. Checking with some of our MM folks, the suggestion was to look at doing the child's shadow stack mapping in dup_mm() to avoid tripping over complications that happen when a remote MM becomes more "live".
Yeah, I can't see anything that looks particularly tasteful.
If we just punt on this combination for now, then the documented rules for args->shadow_stack_size would be something like: clone3 will use the parent's shadow stack when CLONE_VM is not present. If CLONE_VFORK is set then it will use the parent's shadow stack only when args->shadow_stack_size is non-zero. In the cases when the parent's shadow stack is not used, args->shadow_stack_size is used for the size whenever non-zero.
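Those rules, taken literally as worded above, could be sketched roughly as follows. This is only an illustration and not the kernel's implementation: shstk_choose() and struct shstk_choice are hypothetical names, and treating a size of 0 as "fall back to some default" is an assumption, since the rules do not say what happens in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values as in the Linux UAPI; defined here only if not already visible. */
#ifndef CLONE_VM
#define CLONE_VM    0x00000100ULL
#endif
#ifndef CLONE_VFORK
#define CLONE_VFORK 0x00004000ULL
#endif

/* Hypothetical result type, for illustration only. */
struct shstk_choice {
	bool use_parent;  /* child uses the parent's shadow stack */
	uint64_t size;    /* requested size for a new shadow stack;
	                   * 0 = none requested (assumed: a default applies) */
};

/* Hypothetical helper encoding the three rules as literally stated. */
static struct shstk_choice shstk_choose(uint64_t flags,
					 uint64_t shadow_stack_size)
{
	struct shstk_choice c = { .use_parent = false, .size = 0 };

	/* Rule 1: without CLONE_VM the parent's shadow stack is used. */
	if (!(flags & CLONE_VM)) {
		c.use_parent = true;
		return c;
	}

	/* Rule 2: with CLONE_VFORK the parent's shadow stack is used
	 * only when shadow_stack_size is non-zero. */
	if ((flags & CLONE_VFORK) && shadow_stack_size != 0) {
		c.use_parent = true;
		return c;
	}

	/* Rule 3: otherwise shadow_stack_size is the size whenever non-zero. */
	c.size = shadow_stack_size;
	return c;
}
```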
I guess it doesn't seem too overly complicated. But I'm not thinking any of the options seem great. I'd unhappily lean towards not
Indeed, it's all really hard to get enthusiastic about.
supporting shadow_stack_size!=0 && !CLONE_VM for now. But it seems like there may be a user for the unsupported case, so this would be just improving things a little and kicking the can down the road. I also wonder if this is a sign to reconsider the earlier token consuming design.
In the case where we have !CLONE_VM it should actually be possible to reuse the token (since the user is in at least some sense the child process rather than the parent) so it's less pure overhead, providing you don't mind the children of a given parent all using the same addresses for their initial shadow stack.
I'll have a poke at the various options and come up with something, hopefully this month but it's getting a bit busy so might be early next year instead.