On 11/19/21 5:29 PM, Peter Zijlstra wrote:
On Thu, Nov 18, 2021 at 06:04:27PM -0800, Josh Poimboeuf wrote:
On Thu, Nov 18, 2021 at 01:11:09PM +0100, Peter Zijlstra wrote:
I now have the below, the only thing missing is that there's a user_mode() call on a stack based regs. Now on x86_64 we can __get_kernel_nofault() regs->cs and call it a day, but on i386 we have to also fetch regs->flags.
Is this really the way to go?
Please no. Can we just add a check in unwind_start() to ensure the caller did try_get_task_stack()?
I tried; but at best it's fundamentally racy and in practice it's worse because init_task doesn't seem to believe in refcounts and kthreads are odd for some raisin. Now those are fixable, but given the fundamental races, I don't see how it's ever going to be reliable.
I don't mind the __get_kernel_nofault() usage and think I can do a better implementation that will allow us to get rid of the pagefault_{dis,en}able() sprinkling, but that's for another day. It's just the user_mode(regs) usage that's going to be somewhat ugleh.
Anyway, below is the minimal fix for the situation at hand. I'm not going to be around much today, so if Linus wants to pick that up instead of mass-reverting things that's obviously fine too.
Subject: x86: Pin task-stack in __get_wchan()
When commit 5d1ceb3969b6 ("x86: Fix __get_wchan() for !STACKTRACE") moved from stacktrace to native unwind_*() usage, the try_get_task_stack() got lost, leading to use-after-free issues for dying tasks.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
 arch/x86/kernel/process.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index e9ee8b526319..04143a653a8a 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -964,6 +964,9 @@ unsigned long __get_wchan(struct task_struct *p)
 	struct unwind_state state;
 	unsigned long addr = 0;
 
+	if (!try_get_task_stack(p))
+		return 0;
+
 	for (unwind_start(&state, p, NULL, NULL); !unwind_done(&state);
 	     unwind_next_frame(&state)) {
 		addr = unwind_get_return_address(&state);
@@ -974,6 +977,8 @@ unsigned long __get_wchan(struct task_struct *p)
 		break;
 	}
 
+	put_task_stack(p);
+
 	return addr;
 }
This implementation is very similar to stack_trace_save_tsk(). Maybe we can just move stack_trace_save_tsk() out of CONFIG_STACKTRACE and reuse it.
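
For anyone following along, the pin/walk/unpin shape that the patch restores can be modelled in userspace roughly as below. This is only a sketch under stated assumptions: struct fake_task, fake_try_get_task_stack(), fake_put_task_stack() and fake_get_wchan() are hypothetical stand-ins, not kernel code, the increment-unless-zero loop merely mimics how a refcounted stack would be pinned, and the unwind loop is replaced by a placeholder value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for task_struct; only a stack refcount is modelled. */
struct fake_task {
	atomic_int stack_refcount;	/* 0 means the stack is already gone */
};

/* Take a reference unless the count has already dropped to zero. */
bool fake_try_get_task_stack(struct fake_task *t)
{
	int old = atomic_load(&t->stack_refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded and the zero check repeats. */
		if (atomic_compare_exchange_weak(&t->stack_refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop the reference taken by fake_try_get_task_stack(). */
void fake_put_task_stack(struct fake_task *t)
{
	int old = atomic_fetch_sub(&t->stack_refcount, 1);

	assert(old > 0);	/* catch an unbalanced put */
}

/* Same shape as the fixed __get_wchan(): pin, walk, unpin. */
unsigned long fake_get_wchan(struct fake_task *t)
{
	unsigned long addr = 0;

	if (!fake_try_get_task_stack(t))
		return 0;	/* dying task: never touch its stack */

	addr = 0x1234;		/* placeholder for the unwind loop */

	fake_put_task_stack(t);
	return addr;
}
```

With a count of 1 the call returns the placeholder address and leaves the count at 1; with a count of 0 it returns 0 without ever reaching the walk, which is the behaviour that went missing when the try_get_task_stack() call was lost.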