Validate the stack arguments and setup the stack depening on whether or not it is growing down or up.
Legacy clone() required userspace to know in which direction the stack is growing and pass down the stack pointer appropriately. To make things more confusing microblaze uses a variant of the clone() syscall selected by CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument. IA64 has a separate clone2() syscall which also takes an additional stack_size argument. Finally, parisc has a stack that is growing upwards. Userspace therefore has a lot nasty code like the following:
#define __STACK_SIZE (8 * 1024 * 1024) pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd) { pid_t ret; void *stack;
stack = malloc(__STACK_SIZE); if (!stack) return -ENOMEM;
#ifdef __ia64__ ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #elif defined(__parisc__) /* stack grows up */ ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd); #else ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #endif return ret; }
or even crazier variants such as [3].
With clone3() we have the ability to validate the stack. We can check that when stack_size is passed, the stack pointer is valid and the other way around. We can also check that the memory area userspace gave us is fine to use via access_ok(). Furthermore, we probably should not require userspace to know in which direction the stack is growing. It is easy for us to do this in the kernel and I couldn't find the original reasoning behind exposing this detail to userspace.
/* Intentional user visible API change */ clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that trying to change clone3() to setup the stack instead of requiring userspace to do this is the right course of action. Note, that this is an explicit change in user visible behavior we introduce with this patch. If it breaks someone's use-case we will revert! (And then e.g. place the new behavior under an appropriate flag.) Breaking someone's use-case is very unlikely though. First, neither glibc nor musl currently expose a wrapper for clone3(). Second, there is no real motivation for anyone to use clone3() directly since it does not provide features that legacy clone doesn't. New features for clone3() will first happen in v5.5 which is why v5.4 is still a good time to try and make that change now and backport it to v5.3. Searches on [4] did not reveal any packages calling clone3().
[1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17T... [2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein [3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa3... [4]: https://codesearch.debian.net Fixes: 7f192e3cd316 ("fork: add clone3") Cc: Arnd Bergmann arnd@arndb.de Cc: Kees Cook keescook@chromium.org Cc: Jann Horn jannh@google.com Cc: David Howells dhowells@redhat.com Cc: Ingo Molnar mingo@redhat.com Cc: Oleg Nesterov oleg@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Florian Weimer fweimer@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: linux-api@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 5.3 Cc: GNU C Library libc-alpha@sourceware.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com --- include/uapi/linux/sched.h | 4 ++++ kernel/fork.c | 33 ++++++++++++++++++++++++++++++++- 2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 99335e1f4a27..25b4fa00bad1 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@ * sent when the child exits. * @stack: Specify the location of the stack for the * child process. + * Note, @stack is expected to point to the + * lowest address. The stack direction will be + * determined by the kernel and set up + * appropriately based on @stack_size. * @stack_size: The size of the stack for the child process. * @tls: If CLONE_SETTLS is set, the tls descriptor * is set to tls. diff --git a/kernel/fork.c b/kernel/fork.c index bcdf53125210..55af6931c6ec 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, return 0; }
-static bool clone3_args_valid(const struct kernel_clone_args *kargs) +/** + * clone3_stack_valid - check and prepare stack + * @kargs: kernel clone args + * + * Verify that the stack arguments userspace gave us are sane. + * In addition, set the stack direction for userspace since it's easy for us to + * determine. + */ +static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{ + if (kargs->stack == 0) { + if (kargs->stack_size > 0) + return false; + } else { + if (kargs->stack_size == 0) + return false; + + if (!access_ok((void __user *)kargs->stack, kargs->stack_size)) + return false; + +#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64) + kargs->stack += kargs->stack_size; +#endif + } + + return true; +} + +static bool clone3_args_valid(struct kernel_clone_args *kargs) { /* * All lower bits of the flag word are taken. @@ -2581,6 +2609,9 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs) kargs->exit_signal) return false;
+ if (!clone3_stack_valid(kargs)) + return false; + return true; }
From: Christian Brauner
Sent: 31 October 2019 11:36
...
/* Intentional user visible API change */ clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that trying to change clone3() to setup the stack instead of requiring userspace to do this is the right course of action.
What happens if someone 'accidentally' runs a program compiled for the new API on a system running the existing 5.3 kernel?
While it won't work, it needs to die reasonably gracefully.
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 2019-10-31, Christian Brauner christian.brauner@ubuntu.com wrote:
Validate the stack arguments and setup the stack depening on whether or not it is growing down or up.
Legacy clone() required userspace to know in which direction the stack is growing and pass down the stack pointer appropriately. To make things more confusing microblaze uses a variant of the clone() syscall selected by CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument. IA64 has a separate clone2() syscall which also takes an additional stack_size argument. Finally, parisc has a stack that is growing upwards. Userspace therefore has a lot nasty code like the following:
#define __STACK_SIZE (8 * 1024 * 1024) pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd) { pid_t ret; void *stack;
stack = malloc(__STACK_SIZE); if (!stack) return -ENOMEM;
#ifdef __ia64__ ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #elif defined(__parisc__) /* stack grows up */ ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd); #else ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd); #endif return ret; }
or even crazier variants such as [3].
With clone3() we have the ability to validate the stack. We can check that when stack_size is passed, the stack pointer is valid and the other way around. We can also check that the memory area userspace gave us is fine to use via access_ok(). Furthermore, we probably should not require userspace to know in which direction the stack is growing. It is easy for us to do this in the kernel and I couldn't find the original reasoning behind exposing this detail to userspace.
/* Intentional user visible API change */ clone3() was released with 5.3. Currently, it is not documented and very unclear to userspace how the stack and stack_size argument have to be passed. After talking to glibc folks we concluded that trying to change clone3() to setup the stack instead of requiring userspace to do this is the right course of action. Note, that this is an explicit change in user visible behavior we introduce with this patch. If it breaks someone's use-case we will revert! (And then e.g. place the new behavior under an appropriate flag.) Breaking someone's use-case is very unlikely though. First, neither glibc nor musl currently expose a wrapper for clone3(). Second, there is no real motivation for anyone to use clone3() directly since it does not provide features that legacy clone doesn't. New features for clone3() will first happen in v5.5 which is why v5.4 is still a good time to try and make that change now and backport it to v5.3. Searches on [4] did not reveal any packages calling clone3().
Fixes: 7f192e3cd316 ("fork: add clone3") Cc: Arnd Bergmann arnd@arndb.de Cc: Kees Cook keescook@chromium.org Cc: Jann Horn jannh@google.com Cc: David Howells dhowells@redhat.com Cc: Ingo Molnar mingo@redhat.com Cc: Oleg Nesterov oleg@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Florian Weimer fweimer@redhat.com Cc: Peter Zijlstra peterz@infradead.org Cc: linux-api@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # 5.3 Cc: GNU C Library libc-alpha@sourceware.org Signed-off-by: Christian Brauner christian.brauner@ubuntu.com
Looks reasonable, and it'll be lovely to not have to worry about stack ordering anymore. As for the UAPI breakage, I agree that nobody is using clone3() yet and so we still have a bit of lee-way to fix this behaviour.
Acked-by: Aleksa Sarai cyphar@cyphar.com
include/uapi/linux/sched.h | 4 ++++ kernel/fork.c | 33 ++++++++++++++++++++++++++++++++- 2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 99335e1f4a27..25b4fa00bad1 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@
sent when the child exits.
- @stack: Specify the location of the stack for the
child process.
Note, @stack is expected to point to the
lowest address. The stack direction will be
determined by the kernel and set up
appropriately based on @stack_size.
- @stack_size: The size of the stack for the child process.
- @tls: If CLONE_SETTLS is set, the tls descriptor
is set to tls.
diff --git a/kernel/fork.c b/kernel/fork.c index bcdf53125210..55af6931c6ec 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, return 0; } -static bool clone3_args_valid(const struct kernel_clone_args *kargs) +/**
- clone3_stack_valid - check and prepare stack
- @kargs: kernel clone args
- Verify that the stack arguments userspace gave us are sane.
- In addition, set the stack direction for userspace since it's easy for us to
- determine.
- */
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{
- if (kargs->stack == 0) {
if (kargs->stack_size > 0)
return false;
- } else {
if (kargs->stack_size == 0)
return false;
if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
return false;
+#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64)
kargs->stack += kargs->stack_size;
+#endif
- }
- return true;
+}
+static bool clone3_args_valid(struct kernel_clone_args *kargs) { /* * All lower bits of the flag word are taken. @@ -2581,6 +2609,9 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs) kargs->exit_signal) return false;
- if (!clone3_stack_valid(kargs))
return false;
- return true;
}
From Christian Brauner Sent: 31 October 2019 11:36
Validate the stack arguments and setup the stack depening on whether or not it is growing down or up.
...
-static bool clone3_args_valid(const struct kernel_clone_args *kargs) +/**
- clone3_stack_valid - check and prepare stack
- @kargs: kernel clone args
- Verify that the stack arguments userspace gave us are sane.
- In addition, set the stack direction for userspace since it's easy for us to
- determine.
- */
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{
- if (kargs->stack == 0) {
if (kargs->stack_size > 0)
return false;
- } else {
if (kargs->stack_size == 0)
return false;
if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
return false;
Does access_ok() do anything useful here? It only verifies that the buffer isn't in kernel space.
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
On 10/31, Christian Brauner wrote:
--- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@
sent when the child exits.
- @stack: Specify the location of the stack for the
child process.
Note, @stack is expected to point to the
lowest address. The stack direction will be
determined by the kernel and set up
appropriately based on @stack_size.
I can't review this patch, I have no idea what does stack_size mean if !arch/x86.
x86 doesn't use stack_size unless a kthread does kernel_thread(), so this change is probably fine...
Hmm. Off-topic question, why did 7f192e3cd3 ("fork: add clone3") add "& ~CSIGNAL" in kernel_thread() ? This looks pointless and confusing to me...
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{
- if (kargs->stack == 0) {
if (kargs->stack_size > 0)
return false;
- } else {
if (kargs->stack_size == 0)
return false;
So to implement clone3_wrapper(void *bottom_of_stack) you need to do
clone3_wrapper(void *bottom_of_stack) { struct clone_args args = { ... // make clone3_stack_valid() happy .stack = bottom_of_stack - 1, .stack_size = 1, }; }
looks a bit strange. OK, I agree, this example is very artificial. But why do you think clone3() should nack stack_size == 0 ?
if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
return false;
Why?
Oleg.
On Thu, Oct 31, 2019 at 05:46:53PM +0100, Oleg Nesterov wrote:
On 10/31, Christian Brauner wrote:
--- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@
sent when the child exits.
- @stack: Specify the location of the stack for the
child process.
Note, @stack is expected to point to the
lowest address. The stack direction will be
determined by the kernel and set up
appropriately based on @stack_size.
I can't review this patch, I have no idea what does stack_size mean if !arch/x86.
In short: nothing at all if it weren't for ia64 (and maybe parisc). But let me provide some (hopefully useful) context. (Probably most of that is well-know, so sorry for superflous info. :))
The stack and stack_size argument are used in copy_thread_tls() and in copy_thread(). What the arch will end up calling depends on CONFIG_HAVE_COPY_THREAD. Afaict, mips, powerpc, s390, and x86 call copy_thread_tls(). The other arches call copy_thread(). On all arches _except_ IA64 copy_thread{_tls}() just assigns "stack" to the right register and is done with it. On all arches _except_ parisc "stack" needs to point to the highest address. On parisc it needs to point to the lowest (CONFIG_STACK_GROWSUP). IA64 has a downwards growing stack like all the other architectures but it expects "stack" to poin to the _lowest_ address nonetheless. In contrast to all the other arches it does:
child_ptregs->r12 = user_stack_base + user_stack_size - 16;
so ia64 sets up the stack pointer itself.
So now we have: parisc -> upwards growing stack, stack_size _unused_ for user stacks !parisc -> downwards growing stack, stack_size _unused_ for user stacks ia64 -> downwards growing stack, stack_size _used_ for user stacks
Now it gets more confusing since the clone() syscall layout is arch dependent as well. Let's ignore the case of arches that have a clone syscall version with switched flags and stack argument and only focus on arches with an _additional_ stack_size argument:
microblaze -> clone(stack, stack_size)
Then there's clone2() for ia64 which is a _separate_ syscall with an additional stack_size argument:
ia64 -> clone2(stack, stack_size)
Now, contrary to what you'd expect, microblaze ignores the stack_size argument.
So the stack_size argument _would_ be completely meaningless if it weren't for ia64 and parisc.
x86 doesn't use stack_size unless a kthread does kernel_thread(), so this change is probably fine...
Hmm. Off-topic question, why did 7f192e3cd3 ("fork: add clone3") add "& ~CSIGNAL" in kernel_thread() ? This looks pointless and confusing to me...
(Can we discuss this over a patch that removes this restriction if we think this is pointless?)
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{
- if (kargs->stack == 0) {
if (kargs->stack_size > 0)
return false;
- } else {
if (kargs->stack_size == 0)
return false;
So to implement clone3_wrapper(void *bottom_of_stack) you need to do
clone3_wrapper(void *bottom_of_stack) { struct clone_args args = { ... // make clone3_stack_valid() happy .stack = bottom_of_stack - 1, .stack_size = 1, }; }
looks a bit strange. OK, I agree, this example is very artificial. But why do you think clone3() should nack stack_size == 0 ?
In short, consistency. I think prior clone() versions (on accident) have exposed the stack direction as an implementation detail to userspace. Userspace clone() code wrapping code is _wild_ and buggy partially because of that.
The best thing imho, is to clearly communicate to userspace that stack needs to point to the lowest address and stack_size to the initial range of the stack pointer or both are 0.
The alternative is to let userspace either give us a stack pointer that we expect to be setup correctly by userspace or a stack pointer to the lowest address and a stack_size argument. That's just an invitation for more confusion and we have proof with legacy clone that this is not a good idea.
if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
return false;
Why?
It's nice of us to tell userspace _before_ we have created a thread that it messed up its parameters instead of starting a thread that then immediately crashes.
Christian
On 11/01, Christian Brauner wrote:
On Thu, Oct 31, 2019 at 05:46:53PM +0100, Oleg Nesterov wrote:
On 10/31, Christian Brauner wrote:
--- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@
sent when the child exits.
- @stack: Specify the location of the stack for the
child process.
Note, @stack is expected to point to the
lowest address. The stack direction will be
determined by the kernel and set up
appropriately based on @stack_size.
I can't review this patch, I have no idea what does stack_size mean if !arch/x86.
In short: nothing at all if it weren't for ia64 (and maybe parisc). But let me provide some (hopefully useful) context.
Thanks...
(Probably most of that is well-know,
Certainly not to me ;) Thanks.
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{
- if (kargs->stack == 0) {
if (kargs->stack_size > 0)
return false;
- } else {
if (kargs->stack_size == 0)
return false;
So to implement clone3_wrapper(void *bottom_of_stack) you need to do
clone3_wrapper(void *bottom_of_stack) { struct clone_args args = { ... // make clone3_stack_valid() happy .stack = bottom_of_stack - 1, .stack_size = 1, }; }
looks a bit strange. OK, I agree, this example is very artificial. But why do you think clone3() should nack stack_size == 0 ?
In short, consistency.
And in my opinion this stack_size == 0 check destroys the consistency, see below.
But just in case, let me say that overall I personally like this change.
The best thing imho, is to clearly communicate to userspace that stack needs to point to the lowest address and stack_size to the initial range of the stack pointer
Agreed.
But the kernel can't verify that "stack" actually points to the lowest address and stack_size is actually the stack size. Consider another artificial
clone3_wrapper(void *bottom_of_stack, unsigned long offs) { struct clone_args args = { ... // make clone3_stack_valid() happy .stack = bottom_of_stack - offs, .stack_size = offs, }; sys_clone3(args); } Now,
clone3_wrapper(bottom_of_stack, offs);
is same thing for _any_ offs except offs == 0 will fail. Why? To me this is not consistent, I think the "stack_size == 0" check buys nothing and only adds some confusion.
Say, stack_size == 1 is "obviously wrong" too, this certainly means that "stack" doesn't point to the lowest address (or the child will corrupt the memory), but it works.
OK, I won't insist. Perhaps it can help to detect the case when a user forgets to pass the correct stack size.
if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
return false;
Why?
It's nice of us to tell userspace _before_ we have created a thread that it messed up its parameters instead of starting a thread that then immediately crashes.
Heh. Then why this code doesn't verify that at least stack + stack_size is properly mmaped with PROT_READ|WRITE?
Oleg.
On Fri, Nov 01, 2019 at 01:32:57PM +0100, Oleg Nesterov wrote:
On 11/01, Christian Brauner wrote:
On Thu, Oct 31, 2019 at 05:46:53PM +0100, Oleg Nesterov wrote:
On 10/31, Christian Brauner wrote:
--- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@
sent when the child exits.
- @stack: Specify the location of the stack for the
child process.
Note, @stack is expected to point to the
lowest address. The stack direction will be
determined by the kernel and set up
appropriately based on @stack_size.
I can't review this patch, I have no idea what does stack_size mean if !arch/x86.
In short: nothing at all if it weren't for ia64 (and maybe parisc). But let me provide some (hopefully useful) context.
Thanks...
(Probably most of that is well-know,
Certainly not to me ;) Thanks.
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{
- if (kargs->stack == 0) {
if (kargs->stack_size > 0)
return false;
- } else {
if (kargs->stack_size == 0)
return false;
So to implement clone3_wrapper(void *bottom_of_stack) you need to do
clone3_wrapper(void *bottom_of_stack) { struct clone_args args = { ... // make clone3_stack_valid() happy .stack = bottom_of_stack - 1, .stack_size = 1, }; }
looks a bit strange. OK, I agree, this example is very artificial. But why do you think clone3() should nack stack_size == 0 ?
In short, consistency.
And in my opinion this stack_size == 0 check destroys the consistency, see below.
But just in case, let me say that overall I personally like this change.
The best thing imho, is to clearly communicate to userspace that stack needs to point to the lowest address and stack_size to the initial range of the stack pointer
Agreed.
But the kernel can't verify that "stack" actually points to the lowest address and stack_size is actually the stack size. Consider another artificial
Sure, but that's the similar to other structs that are passed via a pointer and come with a size. You could pass:
setxattr(..., ..., value - size, size, ...);
and the kernel would be confused as well.
clone3_wrapper(void *bottom_of_stack, unsigned long offs) { struct clone_args args = { ... // make clone3_stack_valid() happy .stack = bottom_of_stack - offs, .stack_size = offs, }; sys_clone3(args); }
Now,
clone3_wrapper(bottom_of_stack, offs);
is same thing for _any_ offs except offs == 0 will fail. Why? To me this is not consistent, I think the "stack_size == 0" check buys nothing and only adds some confusion.
I disagree. It's a very easy contract: pass a stack and a size or request copy-on-write by passing both as 0. Sure, you can flaunt that contract but that's true of every other pointer + size api. The point is: the api we endorse should be simple and stack + stack_size is very simple.
Say, stack_size == 1 is "obviously wrong" too, this certainly means that "stack" doesn't point to the lowest address (or the child will corrupt the memory), but it works.
OK, I won't insist. Perhaps it can help to detect the case when a user forgets to pass the correct stack size.
if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
return false;
Why?
It's nice of us to tell userspace _before_ we have created a thread that it messed up its parameters instead of starting a thread that then immediately crashes.
Heh. Then why this code doesn't verify that at least stack + stack_size is properly mmaped with PROT_READ|WRITE?
access_ok() is uncomplicated. The other check makes a lot more assumptions. Theare are users that might want to have a PROT_NONE part of their stack as their own "private" guard page (Jann just made that point) and there are other corner cases.
Christian
On 31/10/2019 11:36, Christian Brauner wrote:
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 99335e1f4a27..25b4fa00bad1 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@
sent when the child exits.
- @stack: Specify the location of the stack for the
child process.
Note, @stack is expected to point to the
lowest address. The stack direction will be
determined by the kernel and set up
appropriately based on @stack_size.
- @stack_size: The size of the stack for the child process.
- @tls: If CLONE_SETTLS is set, the tls descriptor
is set to tls.
diff --git a/kernel/fork.c b/kernel/fork.c index bcdf53125210..55af6931c6ec 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, return 0; } -static bool clone3_args_valid(const struct kernel_clone_args *kargs) +/**
- clone3_stack_valid - check and prepare stack
- @kargs: kernel clone args
- Verify that the stack arguments userspace gave us are sane.
- In addition, set the stack direction for userspace since it's easy for us to
- determine.
- */
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{
- if (kargs->stack == 0) {
if (kargs->stack_size > 0)
return false;
- } else {
if (kargs->stack_size == 0)
return false;
if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
return false;
+#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64)
kargs->stack += kargs->stack_size;
+#endif
- }
from the description it is not clear whose responsibility it is to guarantee the alignment of sp on entry.
i think 0 stack size may work if signals are blocked and then prohibiting it might not be the right thing.
it's not clear how libc should deal with v5.3 kernels which don't have the stack+=stack_size logic.
On Fri, Nov 01, 2019 at 02:57:10PM +0000, Szabolcs Nagy wrote:
On 31/10/2019 11:36, Christian Brauner wrote:
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h index 99335e1f4a27..25b4fa00bad1 100644 --- a/include/uapi/linux/sched.h +++ b/include/uapi/linux/sched.h @@ -51,6 +51,10 @@
sent when the child exits.
- @stack: Specify the location of the stack for the
child process.
Note, @stack is expected to point to the
lowest address. The stack direction will be
determined by the kernel and set up
appropriately based on @stack_size.
- @stack_size: The size of the stack for the child process.
- @tls: If CLONE_SETTLS is set, the tls descriptor
is set to tls.
diff --git a/kernel/fork.c b/kernel/fork.c index bcdf53125210..55af6931c6ec 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs, return 0; } -static bool clone3_args_valid(const struct kernel_clone_args *kargs) +/**
- clone3_stack_valid - check and prepare stack
- @kargs: kernel clone args
- Verify that the stack arguments userspace gave us are sane.
- In addition, set the stack direction for userspace since it's easy for us to
- determine.
- */
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs) +{
- if (kargs->stack == 0) {
if (kargs->stack_size > 0)
return false;
- } else {
if (kargs->stack_size == 0)
return false;
if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
return false;
+#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64)
kargs->stack += kargs->stack_size;
+#endif
- }
from the description it is not clear whose responsibility it is to guarantee the alignment of sp on entry.
Userspace.
i think 0 stack size may work if signals are blocked and then prohibiting it might not be the right thing.
Note that stack size 0 is allowed:
struct clone_args args = { .exit_signal = SIGCHLD, };
clone3(&args, sizeof(args));
will just work fine.
it's not clear how libc should deal with v5.3 kernels which don't have the stack+=stack_size logic.
stable is already Cced and the change will be backported to v5.3. Nearly all distros track pull in stable updates.
Florian, thoughts on this?
Christian
On Thu, Oct 31, 2019 at 12:36 PM Christian Brauner christian.brauner@ubuntu.com wrote:
Validate the stack arguments and setup the stack depening on whether or not it is growing down or up.
Signed-off-by: Christian Brauner christian.brauner@ubuntu.com
Acked-by: Arnd Bergmann arnd@arndb.de
On Tue, 5 Nov 2019 at 15:25, Arnd Bergmann arnd@arndb.de wrote:
On Thu, Oct 31, 2019 at 12:36 PM Christian Brauner christian.brauner@ubuntu.com wrote:
Validate the stack arguments and setup the stack depening on whether or not it is growing down or up.
Signed-off-by: Christian Brauner christian.brauner@ubuntu.com
Acked-by: Arnd Bergmann arnd@arndb.de
and Acked-by: Michael Kerrisk mtk.manpages@gmail.com
linux-stable-mirror@lists.linaro.org