rr, a userspace record and replay debugger[0], is completely broken on 5.16rc1. I bisected this to 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7.
That patch makes two changes, it blocks sigaction from changing signal handlers once the kernel has decided to force the program to take a signal and it also stops notifying ptracers of the signal in the same circumstances. The latter behavior is just wrong. There's no reason that ptrace should not be able to observe and even change (non-SIGKILL) forced signals. It should be reverted.
This behavior change is also observable in gdb. If you take a program that sets SIGSYS to SIG_IGN and then raises a SIGSYS via SECCOMP_RET_TRAP and run it under gdb on a good kernel gdb will stop when the SIGSYS is raised, let you inspect program state, etc. After the SA_IMMUTABLE change gdb won't stop until the program has already died of SIGSYS.
- Kyle
On Wed, Nov 17, 2021 at 10:47:13AM -0800, Kyle Huey wrote:
rr, a userspace record and replay debugger[0], is completely broken on 5.16rc1. I bisected this to 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7.
That patch makes two changes, it blocks sigaction from changing signal handlers once the kernel has decided to force the program to take a signal and it also stops notifying ptracers of the signal in the same circumstances. The latter behavior is just wrong. There's no reason that ptrace should not be able to observe and even change (non-SIGKILL) forced signals. It should be reverted.
This behavior change is also observable in gdb. If you take a program that sets SIGSYS to SIG_IGN and then raises a SIGSYS via SECCOMP_RET_TRAP and run it under gdb on a good kernel gdb will stop when the SIGSYS is raised, let you inspect program state, etc. After the SA_IMMUTABLE change gdb won't stop until the program has already died of SIGSYS.
Ah, hm, this was trying to fix the case where a program trips SECCOMP_RET_KILL (which is a "fatal SIGSYS"), and had been unobservable before. I guess the fix was too broad...
On Wed, Nov 17, 2021 at 10:51 AM Kees Cook keescook@chromium.org wrote:
On Wed, Nov 17, 2021 at 10:47:13AM -0800, Kyle Huey wrote:
rr, a userspace record and replay debugger[0], is completely broken on 5.16rc1. I bisected this to 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7.
That patch makes two changes, it blocks sigaction from changing signal handlers once the kernel has decided to force the program to take a signal and it also stops notifying ptracers of the signal in the same circumstances. The latter behavior is just wrong. There's no reason that ptrace should not be able to observe and even change (non-SIGKILL) forced signals. It should be reverted.
This behavior change is also observable in gdb. If you take a program that sets SIGSYS to SIG_IGN and then raises a SIGSYS via SECCOMP_RET_TRAP and run it under gdb on a good kernel gdb will stop when the SIGSYS is raised, let you inspect program state, etc. After the SA_IMMUTABLE change gdb won't stop until the program has already died of SIGSYS.
Ah, hm, this was trying to fix the case where a program trips SECCOMP_RET_KILL (which is a "fatal SIGSYS"), and had been unobservable before. I guess the fix was too broad...
Perhaps I don't understand precisely what you mean by this, but gdb's behavior for a program that is SECCOMP_RET_KILLed was not changed by this patch (the SIGSYS is not observed until after program exit before or after this change).
- Kyle
On Wed, Nov 17, 2021 at 11:05 AM Kyle Huey me@kylehuey.com wrote:
On Wed, Nov 17, 2021 at 10:51 AM Kees Cook keescook@chromium.org wrote:
On Wed, Nov 17, 2021 at 10:47:13AM -0800, Kyle Huey wrote:
rr, a userspace record and replay debugger[0], is completely broken on 5.16rc1. I bisected this to 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7.
That patch makes two changes, it blocks sigaction from changing signal handlers once the kernel has decided to force the program to take a signal and it also stops notifying ptracers of the signal in the same circumstances. The latter behavior is just wrong. There's no reason that ptrace should not be able to observe and even change (non-SIGKILL) forced signals. It should be reverted.
This behavior change is also observable in gdb. If you take a program that sets SIGSYS to SIG_IGN and then raises a SIGSYS via SECCOMP_RET_TRAP and run it under gdb on a good kernel gdb will stop when the SIGSYS is raised, let you inspect program state, etc. After the SA_IMMUTABLE change gdb won't stop until the program has already died of SIGSYS.
Ah, hm, this was trying to fix the case where a program trips SECCOMP_RET_KILL (which is a "fatal SIGSYS"), and had been unobservable before. I guess the fix was too broad...
Perhaps I don't understand precisely what you mean by this, but gdb's behavior for a program that is SECCOMP_RET_KILLed was not changed by this patch (the SIGSYS is not observed until after program exit before or after this change).
Ah, maybe that behavior changed in 5.15 (my "before" here is a 5.14 kernel). I would argue that the debugger seeing the SIGSYS for SECCOMP_RET_KILL is desirable though ...
- Kyle
Kyle Huey me@kylehuey.com writes:
On Wed, Nov 17, 2021 at 11:05 AM Kyle Huey me@kylehuey.com wrote:
On Wed, Nov 17, 2021 at 10:51 AM Kees Cook keescook@chromium.org wrote:
On Wed, Nov 17, 2021 at 10:47:13AM -0800, Kyle Huey wrote:
rr, a userspace record and replay debugger[0], is completely broken on 5.16rc1. I bisected this to 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7.
That patch makes two changes, it blocks sigaction from changing signal handlers once the kernel has decided to force the program to take a signal and it also stops notifying ptracers of the signal in the same circumstances. The latter behavior is just wrong. There's no reason that ptrace should not be able to observe and even change (non-SIGKILL) forced signals. It should be reverted.
This behavior change is also observable in gdb. If you take a program that sets SIGSYS to SIG_IGN and then raises a SIGSYS via SECCOMP_RET_TRAP and run it under gdb on a good kernel gdb will stop when the SIGSYS is raised, let you inspect program state, etc. After the SA_IMMUTABLE change gdb won't stop until the program has already died of SIGSYS.
Ah, hm, this was trying to fix the case where a program trips SECCOMP_RET_KILL (which is a "fatal SIGSYS"), and had been unobservable before. I guess the fix was too broad...
Perhaps I don't understand precisely what you mean by this, but gdb's behavior for a program that is SECCOMP_RET_KILLed was not changed by this patch (the SIGSYS is not observed until after program exit before or after this change).
Ah, maybe that behavior changed in 5.15 (my "before" here is a 5.14 kernel). I would argue that the debugger seeing the SIGSYS for SECCOMP_RET_KILL is desirable though ...
This is definitely worth discussing, and probably in need of fixing (aka something in rr seems to have broken).
We definitely need protection against the race with sigaction.
The fundamental question becomes does it make sense and is it safe to allow a debugger to stop at, and possibly change these signals.
Stopping at something SA_IMMUTABLE as long as the signal is allowed to continue and kill the process when PTRACE_CONT happens seems harmless.
Allowing the debugger to change the signal, or change it's handling I don't know.
All of this is channeled through the following function.
static int force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool sigdfl) { unsigned long int flags; int ret, blocked, ignored; struct k_sigaction *action; int sig = info->si_signo;
spin_lock_irqsave(&t->sighand->siglock, flags); action = &t->sighand->action[sig-1]; ignored = action->sa.sa_handler == SIG_IGN; blocked = sigismember(&t->blocked, sig); if (blocked || ignored || sigdfl) { action->sa.sa_handler = SIG_DFL; action->sa.sa_flags |= SA_IMMUTABLE; if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t); } } /* * Don't clear SIGNAL_UNKILLABLE for traced tasks, users won't expect * debugging to leave init killable. */ if (action->sa.sa_handler == SIG_DFL && !t->ptrace) t->signal->flags &= ~SIGNAL_UNKILLABLE; ret = send_signal(sig, info, t, PIDTYPE_PID); spin_unlock_irqrestore(&t->sighand->siglock, flags);
return ret; }
Right now we have 3 conditions that trigger SA_IMMUTABLE. - The sigdfl parameter is passed asking that userspace not be able to change the handling of the signal.
- A synchronous exception is taken and the signal is blocked.
- A synchronous exception is taken and the signal is ignored.
Today because of how things are implemented the code most change the userspace state to allow the signal to kill the process. I really want to get rid of that, because that has other side effects. As part of getting rid of changing the state it is my plan to get rid of SA_IMMUTABLE as well. If I don't have to allow the debugger to stop and observe what is happening with the signal that change is much easier to implement.
The classic trigger of sigdfl is a recursive SIGSEGV.
However we have other cases like SECCOMP_RET_KILL where the kernel has never allowed userspace to intercept the killing of the process. Things that have messages like: "seccomp tried to change syscall nr or ip"
My brain is drawing a blank on how to analyze those.
Kees I am back to asking the question I had before I figured out SA_IMMUTABLE. Are there security concerns with debuggers intercepting SECCOMP_RET_KILL.
I think I can modify dequeue_synchronous_signal so that we can perform the necessary logic in get_signal rather than hack up the signal handling state in force_sig_info_to_task.
Except for the cases like SECCOMP_RET_KILL where the kernel has never allowed userspace to intercept the handling. I don't see any fundamental reason why ptrace could not intercept the signal. The handling is overriden to force the process to die, because the way userspace is currently configured to handle the signal does not work so it is necessary to kill the process.
I think there are cases where the userspace state is known to be sufficiently wrong that the kernel can not safely allow anything more than inspecting the state.
I can revisit the code to see if the kernel will get confused if something more is allowed. Still I really like the current semantics of SA_IMMUTABLE because these are cases where something wrong. If someone miscalculates how things are wrong it could result in the kernel getting confused and doing the wrong thing. Allowing the debugger to intercept the signal requires we risk miscalculating what is wrong.
Kyle how exactly is rr broken? Certainly a historical usage does not work. How does this affect actual real world debugging sessions?
You noticed this and bisected the change quickly so I fully expect this does affect real world debugging sessions. I just want to know exactly how so that exactly what is wrong can be fixed.
As far as I can tell SA_IMMUTABLE has only been backported to v5.15.x where in cleaning things up I made SECCOMP_RET_KILL susceptible to races with sigaction, and ptrace. Those races need to be closed or we need to decide that we don't actually care if the debugger does things.
Eric
On Wed, Nov 17, 2021 at 03:04:31PM -0600, Eric W. Biederman wrote:
Kyle Huey me@kylehuey.com writes:
On Wed, Nov 17, 2021 at 11:05 AM Kyle Huey me@kylehuey.com wrote:
On Wed, Nov 17, 2021 at 10:51 AM Kees Cook keescook@chromium.org wrote:
On Wed, Nov 17, 2021 at 10:47:13AM -0800, Kyle Huey wrote:
rr, a userspace record and replay debugger[0], is completely broken on 5.16rc1. I bisected this to 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7.
That patch makes two changes, it blocks sigaction from changing signal handlers once the kernel has decided to force the program to take a signal and it also stops notifying ptracers of the signal in the same circumstances. The latter behavior is just wrong. There's no reason that ptrace should not be able to observe and even change (non-SIGKILL) forced signals. It should be reverted.
This behavior change is also observable in gdb. If you take a program that sets SIGSYS to SIG_IGN and then raises a SIGSYS via SECCOMP_RET_TRAP and run it under gdb on a good kernel gdb will stop when the SIGSYS is raised, let you inspect program state, etc. After the SA_IMMUTABLE change gdb won't stop until the program has already died of SIGSYS.
Ah, hm, this was trying to fix the case where a program trips SECCOMP_RET_KILL (which is a "fatal SIGSYS"), and had been unobservable before. I guess the fix was too broad...
Perhaps I don't understand precisely what you mean by this, but gdb's behavior for a program that is SECCOMP_RET_KILLed was not changed by this patch (the SIGSYS is not observed until after program exit before or after this change).
The SA_IMMUTABLE change was to deal with failures seen in the seccomp test suite after the recent fatal signal refactoring. Mainly that a process that should have effectively performed do_exit() was suddenly visible to the tracer.
Ah, maybe that behavior changed in 5.15 (my "before" here is a 5.14 kernel). I would argue that the debugger seeing the SIGSYS for SECCOMP_RET_KILL is desirable though ...
This is definitely worth discussing, and probably in need of fixing (aka something in rr seems to have broken).
We definitely need protection against the race with sigaction.
The fundamental question becomes does it make sense and is it safe to allow a debugger to stop at, and possibly change these signals.
I have no problem with a debugger getting notified about a fatal (SECCOMP_RET_KILL*-originated) SIGSYS. But whatever happens, the kernel needs to make sure the process does not continue. (i.e. signal can't be changed/removed/etc.)
Stopping at something SA_IMMUTABLE as long as the signal is allowed to continue and kill the process when PTRACE_CONT happens seems harmless.
Allowing the debugger to change the signal, or change it's handling I don't know.
Right -- I'm fine with a visibility change (the seccomp test suite is just checking for various expected state machine changes across the various signal/death cases: as long as it _dies_, that's what we want. If a extra notification appears before it dies, that's okay, it just needs the test suite to change).
[...] Kees I am back to asking the question I had before I figured out SA_IMMUTABLE. Are there security concerns with debuggers intercepting SECCOMP_RET_KILL.
I see no problem with allowing a tracer to observe the signal, but the signalled process must have no way to continue running. If we end up in such a state, then a seccomp process with access to clone() and ptrace() can escape the seccomp sandbox. This is why seccomp had been using the big do_exit() hammer -- I really want to absolutely never have a bug manifest with a bypassed SECCOMP_RET_KILL: having a completely unavoidable "dying" state is needed.
On Wed, Nov 17, 2021 at 1:54 PM Kees Cook keescook@chromium.org wrote:
The SA_IMMUTABLE change was to deal with failures seen in the seccomp test suite after the recent fatal signal refactoring. Mainly that a process that should have effectively performed do_exit() was suddenly visible to the tracer.
I think this basically shows that the conversion from do_exit() to fatal_signal() was just wrong. The "do_exit()" wasn't really a signal, and can't be treated as such.
That said, instead of reverting, maybe we can just mark the cases where it really is about sending a synchronous signal, vs sending an explicitly fatal signal.
It's basically the "true" condition to force_sig_info_to_task(), so the fix might be just
@@ -1323,7 +1323,8 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool blocked = sigismember(&t->blocked, sig); if (blocked || ignored || sigdfl) { action->sa.sa_handler = SIG_DFL; - action->sa.sa_flags |= SA_IMMUTABLE; + if (sigdfl) + action->sa.sa_flags |= SA_IMMUTABLE; if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t);
Kyle, does that fix your test-case? And Kees - yours?
Linus
On Wed, Nov 17, 2021 at 03:24:24PM -0800, Linus Torvalds wrote:
On Wed, Nov 17, 2021 at 1:54 PM Kees Cook keescook@chromium.org wrote:
The SA_IMMUTABLE change was to deal with failures seen in the seccomp test suite after the recent fatal signal refactoring. Mainly that a process that should have effectively performed do_exit() was suddenly visible to the tracer.
I think this basically shows that the conversion from do_exit() to fatal_signal() was just wrong. The "do_exit()" wasn't really a signal, and can't be treated as such.
That said, instead of reverting, maybe we can just mark the cases where it really is about sending a synchronous signal, vs sending an explicitly fatal signal.
It's basically the "true" condition to force_sig_info_to_task(), so the fix might be just
@@ -1323,7 +1323,8 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool blocked = sigismember(&t->blocked, sig); if (blocked || ignored || sigdfl) { action->sa.sa_handler = SIG_DFL;
action->sa.sa_flags |= SA_IMMUTABLE;
if (sigdfl)
action->sa.sa_flags |= SA_IMMUTABLE; if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t);
Kyle, does that fix your test-case? And Kees - yours?
Yup, the correct behavior is retained for me with this change.
(nit: should the "sigdfl" argument be renamed "immutable" for clarity in this function?)
On Wed, Nov 17, 2021 at 4:05 PM Kees Cook keescook@chromium.org wrote:
(nit: should the "sigdfl" argument be renamed "immutable" for clarity in this function?)
I don't think that would necessarily clarify anything. Neither "sigdfl" nor "immutable" makes at least me go "Ahh, that explains things".
Both "sigdfl" and "immutable" are about random internal implementation choices ("force SIGDFL" and "set SA_IMMUTABLE" respectively).
I think naming things by random internal implementation things is questionable in general, but it's particularly questionable when they aren't even some really fundamental thing.
I think you generally want to name things not by how they do something, but by *WHAT* they do.
So I think the proper name for it would be "fatal" or something like that. It's basically saying "This signal is fatal, even if you have a handler for it or not". That "set it to SIGDFL" just happens to be how we made it fatal.
And then we should perhaps also make such a signal uncatchable by the debugger (rather than just "debugger cannot undo or modify it" like the SA_IMMUTABLE bit does).
Anybody want to take on that renaming / uncatchable part? Please take my (now at least tested by Kees) patch and make it your own.
Linus
On Wed, Nov 17, 2021 at 3:24 PM Linus Torvalds torvalds@linux-foundation.org wrote:
On Wed, Nov 17, 2021 at 1:54 PM Kees Cook keescook@chromium.org wrote:
The SA_IMMUTABLE change was to deal with failures seen in the seccomp test suite after the recent fatal signal refactoring. Mainly that a process that should have effectively performed do_exit() was suddenly visible to the tracer.
I think this basically shows that the conversion from do_exit() to fatal_signal() was just wrong. The "do_exit()" wasn't really a signal, and can't be treated as such.
That said, instead of reverting, maybe we can just mark the cases where it really is about sending a synchronous signal, vs sending an explicitly fatal signal.
It's basically the "true" condition to force_sig_info_to_task(), so the fix might be just
@@ -1323,7 +1323,8 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool blocked = sigismember(&t->blocked, sig); if (blocked || ignored || sigdfl) { action->sa.sa_handler = SIG_DFL;
action->sa.sa_flags |= SA_IMMUTABLE;
if (sigdfl)
action->sa.sa_flags |= SA_IMMUTABLE; if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t);
Kyle, does that fix your test-case? And Kees - yours?
This fixes most of the issues with rr, but it still changes the ptrace behavior for the double-SIGSEGV case (yes, we have a test for that too). The second SIGSEGV isn't reported to the ptracer and the program just skips straight to the PTRACE_EVENT_EXIT. This is visible in gdb as well (only the first SIGSEGV is caught).
- Kyle
On Wed, Nov 17, 2021 at 4:37 PM Kyle Huey me@kylehuey.com wrote:
This fixes most of the issues with rr, but it still changes the ptrace behavior for the double-SIGSEGV case
Hmm. I think that's because of how "force_sigsgv()" works.
I absolutely detest that function.
So we have signal_setup_done() doing that
if (failed) force_sigsegv(ksig->sig);
and then force_sigsegv() has that completely insane
if (sig == SIGSEGV) force_fatal_sig(SIGSEGV); else force_sig(SIGSEGV);
behavior.
And I think I know the _reason_ for that complete insanity: when SIGSEGV takes a SIGSEGV, and there is a handler, we need to stop trying to send more SIGSEGV's.
But it does mean that with my change, that second SIGSEGV now ends up being that SA_IMMUTABLE kind, so yeah, it broke the debugger test - where catching the second SIGSEGV is actually somewhat sensible (ok, not really, but at least understandable)
End result: I think we want not a boolean, but a three-way choice for that force_sig_info_to_task() thing:
- unconditionally fatal (for things that just want to force an exit and used to do do_exit())
- ignore valid and unblocked handler (for that SIGSEGV recursion case, aka force "sigdfl")
- catching signal ok
So my one-liner isn't sufficient. It wants some kind of nasty enum.
At least the enum can be entirely internal to kernel/signal.c, I think. No need to expose this all to anything else.
Linus
On Wed, Nov 17, 2021 at 5:11 PM Linus Torvalds torvalds@linux-foundation.org wrote:
On Wed, Nov 17, 2021 at 4:37 PM Kyle Huey me@kylehuey.com wrote:
This fixes most of the issues with rr, but it still changes the ptrace behavior for the double-SIGSEGV case
Hmm. I think that's because of how "force_sigsgv()" works.
I absolutely detest that function.
So we have signal_setup_done() doing that
if (failed) force_sigsegv(ksig->sig);
and then force_sigsegv() has that completely insane
if (sig == SIGSEGV) force_fatal_sig(SIGSEGV); else force_sig(SIGSEGV);
behavior.
And I think I know the _reason_ for that complete insanity: when SIGSEGV takes a SIGSEGV, and there is a handler, we need to stop trying to send more SIGSEGV's.
Right, in our test we setup a SIGSEGV handler on an alt stack that doesn't actually exist, and then overflow the regular stack, and that would loop forever trying to setup SIGSEGV handlers if it weren't for force_sigsegv and the sigdfl=true stuff.
But it does mean that with my change, that second SIGSEGV now ends up being that SA_IMMUTABLE kind, so yeah, it broke the debugger test - where catching the second SIGSEGV is actually somewhat sensible (ok, not really, but at least understandable)
End result: I think we want not a boolean, but a three-way choice for that force_sig_info_to_task() thing:
with the following clarifications, yes
- unconditionally fatal (for things that just want to force an exit
and used to do do_exit())
no matter what the ptracer wants
- ignore valid and unblocked handler (for that SIGSEGV recursion
case, aka force "sigdfl")
but following the usual ptrace rules
- catching signal ok
So my one-liner isn't sufficient. It wants some kind of nasty enum.
At least the enum can be entirely internal to kernel/signal.c, I think. No need to expose this all to anything else.
Linus
Yeah that's one way to solve the problem. I think you're right that fundamentally the problem here is that what SECCOMP_RET_KILL wants is not really a signal. To the extent that it wants a signal, what it really wants is SIGKILL, and the problem here is the code trying to act like SIGKILL but call it SIGSYS. I assume the ship for fixing that sailed years ago though.
- Kyle
On Wed, Nov 17, 2021 at 05:20:33PM -0800, Kyle Huey wrote:
Yeah that's one way to solve the problem. I think you're right that fundamentally the problem here is that what SECCOMP_RET_KILL wants is not really a signal. To the extent that it wants a signal, what it really wants is SIGKILL, and the problem here is the code trying to act like SIGKILL but call it SIGSYS. I assume the ship for fixing that sailed years ago though.
Yeah, this was IIRC, a specific design choice (to distinguish a seccomp KILL from a SIGKILL), as desired by the sandboxing folks, and instead of using two different signals (one for KILL and one for TRAP), both used SIGSYS, with the KILL variant being uncatchable.
Kees Cook keescook@chromium.org writes:
On Wed, Nov 17, 2021 at 05:20:33PM -0800, Kyle Huey wrote:
Yeah that's one way to solve the problem. I think you're right that fundamentally the problem here is that what SECCOMP_RET_KILL wants is not really a signal. To the extent that it wants a signal, what it really wants is SIGKILL, and the problem here is the code trying to act like SIGKILL but call it SIGSYS. I assume the ship for fixing that sailed years ago though.
Yeah, this was IIRC, a specific design choice (to distinguish a seccomp KILL from a SIGKILL), as desired by the sandboxing folks, and instead of using two different signals (one for KILL and one for TRAP), both used SIGSYS, with the KILL variant being uncatchable.
I see a general consensus on how to fix the regression. Linus patch plus some tweaks. I will get to work on that today.
For v5.15 I think all that needs to get fixed is what Linus fixed and the force_sigsegv case. That is my priority.
For v5.16-rc1+ the instances that became force_fatal_signal need a careful review to figure out which semantics we want.
Having a clear distinction between which forced signals we can let the debugger intercept and which ones we can not seems to be what needs to be added.
Kyle thank you for your explanation of what breaks. For future kernels I do need to do some work in this area and I will copy on the patches going forward. In particular I strongly suspect that changing the sigaction and blocked state of the signal for these synchronous signals is the wrong thing to do, especially if the process is not killed. I want to find another solution that does not break things but that also does not change the program state behind the programs back so things work differently under the debugger.
Eric
On Thu, Nov 18, 2021 at 8:12 AM Eric W. Biederman ebiederm@xmission.com wrote:
Kyle thank you for your explanation of what breaks. For future kernels I do need to do some work in this area and I will copy on the patches going forward. In particular I strongly suspect that changing the sigaction and blocked state of the signal for these synchronous signals is the wrong thing to do, especially if the process is not killed. I want to find another solution that does not break things but that also does not change the program state behind the programs back so things work differently under the debugger.
The heads up in the future is appreciated, thanks.
- Kyle
On Fri, Nov 19, 2021 at 08:07:36AM -0800, Kyle Huey wrote:
On Thu, Nov 18, 2021 at 8:12 AM Eric W. Biederman ebiederm@xmission.com wrote:
Kyle thank you for your explanation of what breaks. For future kernels I do need to do some work in this area and I will copy on the patches going forward. In particular I strongly suspect that changing the sigaction and blocked state of the signal for these synchronous signals is the wrong thing to do, especially if the process is not killed. I want to find another solution that does not break things but that also does not change the program state behind the programs back so things work differently under the debugger.
The heads up in the future is appreciated, thanks.
Yeah, I wonder if we could add you as a Reviewer in the MAINTAINERS file for ptrace/signal stuff? Then anyone using scripts/get_maintainers.pl would have a CC to you added.
Also, are there more instructions about running the rr tests? When the execve refactoring was happening, I tried it[1], but the results were unclear (there seemed to be a lot of warnings and it made me think I'd done something wrong on my end).
-Kees
[1] https://github.com/rr-debugger/rr/wiki/Building-And-Installing#tests
On Fri, Nov 19, 2021 at 8:36 AM Kees Cook keescook@chromium.org wrote:
On Fri, Nov 19, 2021 at 08:07:36AM -0800, Kyle Huey wrote:
On Thu, Nov 18, 2021 at 8:12 AM Eric W. Biederman ebiederm@xmission.com wrote:
Kyle thank you for your explanation of what breaks. For future kernels I do need to do some work in this area and I will copy on the patches going forward. In particular I strongly suspect that changing the sigaction and blocked state of the signal for these synchronous signals is the wrong thing to do, especially if the process is not killed. I want to find another solution that does not break things but that also does not change the program state behind the programs back so things work differently under the debugger.
The heads up in the future is appreciated, thanks.
Yeah, I wonder if we could add you as a Reviewer in the MAINTAINERS file for ptrace/signal stuff? Then anyone using scripts/get_maintainers.pl would have a CC to you added.
I don't object to that. I guess we'll see how manageable the email load is.
Also, are there more instructions about running the rr tests? When the execve refactoring was happening, I tried it[1], but the results were unclear (there seemed to be a lot of warnings and it made me think I'd done something wrong on my end).
It's a standard cmake test suite. The easiest way to run it is just to run `make check`, wait a while, and see what gets printed out at the end as failing. There's a couple thousand tests that run and they print all sorts of output ... some of them even crash intentionally to make sure we can record specific types of crashes, so the ctest pass/fail output at the very end is the only reliable indicator.
If you have specific issues you're seeing I'm happy to follow up here or off list.
- Kyle
-Kees
[1] https://github.com/rr-debugger/rr/wiki/Building-And-Installing#tests
-- Kees Cook
SA_IMMUTABLE fixed issues with force_sig_seccomp and the introduction for force_sig_fatal where the exit previously could not be interrupted but now it can. Unfortunately it added that behavior to all force_sig functions under the right conditions which debuggers usage of SIG_TRAP and debuggers handling of SIGSEGV.
Solve that by limiting SA_IMMUTABLE to just the cases that historically debuggers have not been able to intercept.
The first patch changes force_sig_info_to_task to take a flag that requests which behavior is desired.
The second patch adds force_exit_sig which replaces force_fatal_sig in the cases where historically userspace would only find out about the ``signal'' after the process has exited.
The first one with the hunk changing force_fatal_sig removed should be suitable for backporting to v5.15. v5.15 does not implement force_fatal_sig.
This should be enough to fix the regressions.
Kyle if you can double check me that I have properly fixed these issues that would be appreciated.
Any other review or suggestions to improve the names would be appreciated. I think I have named things reasonably well but I am very close to the code so it is easy for me to miss things.
Eric W. Biederman (2): signal: Don't always set SA_IMMUTABLE for forced signals signal: Replace force_fatal_sig with force_exit_sig when in doubt
arch/m68k/kernel/traps.c | 2 +- arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/kernel/signal_64.c | 4 ++-- arch/s390/kernel/traps.c | 2 +- arch/sparc/kernel/signal_32.c | 4 ++-- arch/sparc/kernel/windows.c | 2 +- arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/vm86_32.c | 2 +- include/linux/sched/signal.h | 1 + kernel/entry/syscall_user_dispatch.c | 4 ++-- kernel/signal.c | 36 ++++++++++++++++++++++++++++------- 11 files changed, 42 insertions(+), 19 deletions(-)
Eric
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Update force_sig_to_task to support both the case when we can allow the debugger to intercept and possibly ignore the signal and the case when it is not safe to let userspace known about the signal until the process has exited.
Reported-by: Kyle Huey me@kylehuey.com Reported-by: kernel test robot oliver.sang@intel.com Cc: stable@vger.kernel.org [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpj... [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-902 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com --- kernel/signal.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c index 7c4b7ae714d4..02058c983bd6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1298,6 +1298,12 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p return ret; }
+enum sig_handler { + HANDLER_CURRENT, /* If reachable use the current handler */ + HANDLER_SIG_DFL, /* Always use SIG_DFL handler semantics */ + HANDLER_EXIT, /* Only visible as the proces exit code */ +}; + /* * Force a signal that the process can't ignore: if necessary * we unblock the signal and change any SIG_IGN to SIG_DFL. @@ -1310,7 +1316,8 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p * that is why we also clear SIGNAL_UNKILLABLE. */ static int -force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool sigdfl) +force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, + enum sig_handler handler) { unsigned long int flags; int ret, blocked, ignored; @@ -1321,9 +1328,10 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool action = &t->sighand->action[sig-1]; ignored = action->sa.sa_handler == SIG_IGN; blocked = sigismember(&t->blocked, sig); - if (blocked || ignored || sigdfl) { + if (blocked || ignored || (handler != HANDLER_CURRENT)) { action->sa.sa_handler = SIG_DFL; - action->sa.sa_flags |= SA_IMMUTABLE; + if (handler == HANDLER_EXIT) + action->sa.sa_flags |= SA_IMMUTABLE; if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t); @@ -1343,7 +1351,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool
int force_sig_info(struct kernel_siginfo *info) { - return force_sig_info_to_task(info, current, false); + return force_sig_info_to_task(info, current, HANDLER_CURRENT); }
/* @@ -1660,7 +1668,7 @@ void force_fatal_sig(int sig) info.si_code = SI_KERNEL; info.si_pid = 0; info.si_uid = 0; - force_sig_info_to_task(&info, current, true); + force_sig_info_to_task(&info, current, HANDLER_SIG_DFL); }
/* @@ -1693,7 +1701,7 @@ int force_sig_fault_to_task(int sig, int code, void __user *addr info.si_flags = flags; info.si_isr = isr; #endif - return force_sig_info_to_task(&info, t, false); + return force_sig_info_to_task(&info, t, HANDLER_CURRENT); }
int force_sig_fault(int sig, int code, void __user *addr @@ -1813,7 +1821,8 @@ int force_sig_seccomp(int syscall, int reason, bool force_coredump) info.si_errno = reason; info.si_arch = syscall_get_arch(current); info.si_syscall = syscall; - return force_sig_info_to_task(&info, current, force_coredump); + return force_sig_info_to_task(&info, current, + force_coredump ? HANDLER_EXIT : HANDLER_CURRENT); }
/* For the crazy architectures that include trap information in
On Thu, Nov 18, 2021 at 04:04:58PM -0600, Eric W. Biederman wrote:
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Update force_sig_to_task to support both the case when we can allow the debugger to intercept and possibly ignore the signal and the case when it is not safe to let userspace known about the signal until the process has exited.
Reported-by: Kyle Huey me@kylehuey.com Reported-by: kernel test robot oliver.sang@intel.com Cc: stable@vger.kernel.org [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpj... [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-902 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
Thanks! This passes the seccomp self-tests.
Reviewed-by: Kees Cook keescook@chromium.org Tested-by: Kees Cook keescook@chromium.org
-Kees
kernel/signal.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c index 7c4b7ae714d4..02058c983bd6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1298,6 +1298,12 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p return ret; } +enum sig_handler {
- HANDLER_CURRENT, /* If reachable use the current handler */
- HANDLER_SIG_DFL, /* Always use SIG_DFL handler semantics */
- HANDLER_EXIT, /* Only visible as the proces exit code */
+};
/*
- Force a signal that the process can't ignore: if necessary
- we unblock the signal and change any SIG_IGN to SIG_DFL.
@@ -1310,7 +1316,8 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
- that is why we also clear SIGNAL_UNKILLABLE.
*/ static int -force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool sigdfl) +force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
- enum sig_handler handler)
{ unsigned long int flags; int ret, blocked, ignored; @@ -1321,9 +1328,10 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool action = &t->sighand->action[sig-1]; ignored = action->sa.sa_handler == SIG_IGN; blocked = sigismember(&t->blocked, sig);
- if (blocked || ignored || sigdfl) {
- if (blocked || ignored || (handler != HANDLER_CURRENT)) { action->sa.sa_handler = SIG_DFL;
action->sa.sa_flags |= SA_IMMUTABLE;
if (handler == HANDLER_EXIT)
if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t);action->sa.sa_flags |= SA_IMMUTABLE;
@@ -1343,7 +1351,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool int force_sig_info(struct kernel_siginfo *info) {
- return force_sig_info_to_task(info, current, false);
- return force_sig_info_to_task(info, current, HANDLER_CURRENT);
} /* @@ -1660,7 +1668,7 @@ void force_fatal_sig(int sig) info.si_code = SI_KERNEL; info.si_pid = 0; info.si_uid = 0;
- force_sig_info_to_task(&info, current, true);
- force_sig_info_to_task(&info, current, HANDLER_SIG_DFL);
} /* @@ -1693,7 +1701,7 @@ int force_sig_fault_to_task(int sig, int code, void __user *addr info.si_flags = flags; info.si_isr = isr; #endif
- return force_sig_info_to_task(&info, t, false);
- return force_sig_info_to_task(&info, t, HANDLER_CURRENT);
} int force_sig_fault(int sig, int code, void __user *addr @@ -1813,7 +1821,8 @@ int force_sig_seccomp(int syscall, int reason, bool force_coredump) info.si_errno = reason; info.si_arch = syscall_get_arch(current); info.si_syscall = syscall;
- return force_sig_info_to_task(&info, current, force_coredump);
- return force_sig_info_to_task(&info, current,
force_coredump ? HANDLER_EXIT : HANDLER_CURRENT);
} /* For the crazy architectures that include trap information in -- 2.20.1
On Thu, Nov 18, 2021 at 04:04:58PM -0600, Eric W. Biederman wrote:
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Update force_sig_to_task to support both the case when we can allow the debugger to intercept and possibly ignore the signal and the case when it is not safe to let userspace known about the signal until the process has exited.
Reported-by: Kyle Huey me@kylehuey.com Reported-by: kernel test robot oliver.sang@intel.com Cc: stable@vger.kernel.org [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpj... [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-902 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
kernel/signal.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c index 7c4b7ae714d4..02058c983bd6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1298,6 +1298,12 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p return ret; } +enum sig_handler {
- HANDLER_CURRENT, /* If reachable use the current handler */
- HANDLER_SIG_DFL, /* Always use SIG_DFL handler semantics */
- HANDLER_EXIT, /* Only visible as the proces exit code */
Oh, I just noticed this typo "proces" -> "process"
-Kees
Kees Cook keescook@chromium.org writes:
On Thu, Nov 18, 2021 at 04:04:58PM -0600, Eric W. Biederman wrote:
diff --git a/kernel/signal.c b/kernel/signal.c index 7c4b7ae714d4..02058c983bd6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1298,6 +1298,12 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p return ret; } +enum sig_handler {
- HANDLER_CURRENT, /* If reachable use the current handler */
- HANDLER_SIG_DFL, /* Always use SIG_DFL handler semantics */
- HANDLER_EXIT, /* Only visible as the proces exit code */
Oh, I just noticed this typo "proces" -> "process"
Fixed. Thank you.
Eric
On Thu, Nov 18, 2021 at 2:05 PM Eric W. Biederman ebiederm@xmission.com wrote:
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Update force_sig_to_task to support both the case when we can allow the debugger to intercept and possibly ignore the signal and the case when it is not safe to let userspace known about the signal until the process has exited.
s/known/know/
Reported-by: Kyle Huey me@kylehuey.com Reported-by: kernel test robot oliver.sang@intel.com Cc: stable@vger.kernel.org [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpj... [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-902
This link doesn't work.
Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
kernel/signal.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c index 7c4b7ae714d4..02058c983bd6 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1298,6 +1298,12 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p return ret; }
+enum sig_handler {
HANDLER_CURRENT, /* If reachable use the current handler */
HANDLER_SIG_DFL, /* Always use SIG_DFL handler semantics */
HANDLER_EXIT, /* Only visible as the proces exit code */
+};
/*
- Force a signal that the process can't ignore: if necessary
- we unblock the signal and change any SIG_IGN to SIG_DFL.
@@ -1310,7 +1316,8 @@ int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p
- that is why we also clear SIGNAL_UNKILLABLE.
*/ static int -force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool sigdfl) +force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t,
enum sig_handler handler)
{ unsigned long int flags; int ret, blocked, ignored; @@ -1321,9 +1328,10 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool action = &t->sighand->action[sig-1]; ignored = action->sa.sa_handler == SIG_IGN; blocked = sigismember(&t->blocked, sig);
if (blocked || ignored || sigdfl) {
if (blocked || ignored || (handler != HANDLER_CURRENT)) { action->sa.sa_handler = SIG_DFL;
action->sa.sa_flags |= SA_IMMUTABLE;
if (handler == HANDLER_EXIT)
action->sa.sa_flags |= SA_IMMUTABLE; if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t);
@@ -1343,7 +1351,7 @@ force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool
int force_sig_info(struct kernel_siginfo *info) {
return force_sig_info_to_task(info, current, false);
return force_sig_info_to_task(info, current, HANDLER_CURRENT);
}
/* @@ -1660,7 +1668,7 @@ void force_fatal_sig(int sig) info.si_code = SI_KERNEL; info.si_pid = 0; info.si_uid = 0;
force_sig_info_to_task(&info, current, true);
force_sig_info_to_task(&info, current, HANDLER_SIG_DFL);
}
/* @@ -1693,7 +1701,7 @@ int force_sig_fault_to_task(int sig, int code, void __user *addr info.si_flags = flags; info.si_isr = isr; #endif
return force_sig_info_to_task(&info, t, false);
return force_sig_info_to_task(&info, t, HANDLER_CURRENT);
}
int force_sig_fault(int sig, int code, void __user *addr @@ -1813,7 +1821,8 @@ int force_sig_seccomp(int syscall, int reason, bool force_coredump) info.si_errno = reason; info.si_arch = syscall_get_arch(current); info.si_syscall = syscall;
return force_sig_info_to_task(&info, current, force_coredump);
return force_sig_info_to_task(&info, current,
force_coredump ? HANDLER_EXIT : HANDLER_CURRENT);
}
/* For the crazy architectures that include trap information in
2.20.1
- Kyle
Kyle Huey me@kylehuey.com writes:
On Thu, Nov 18, 2021 at 2:05 PM Eric W. Biederman ebiederm@xmission.com wrote:
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Update force_sig_to_task to support both the case when we can allow the debugger to intercept and possibly ignore the signal and the case when it is not safe to let userspace known about the signal until the process has exited.
s/known/know/
Fixed.
Reported-by: Kyle Huey me@kylehuey.com Reported-by: kernel test robot oliver.sang@intel.com Cc: stable@vger.kernel.org [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpj... [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-902
This link doesn't work.
Shame. I missed a trailing 0, but unfortunately that request did not go to list that is archived on lore. I will keep the link on the chance the message winds up in a lore archive in the future.
Eric
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals.
This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept.
In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis.
Reported-by: Kyle Huey me@kylehuey.com Reported-by: kernel test robot oliver.sang@intel.com [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpj... [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-902 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: a3616a3c0272 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com --- arch/m68k/kernel/traps.c | 2 +- arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/kernel/signal_64.c | 4 ++-- arch/s390/kernel/traps.c | 2 +- arch/sparc/kernel/signal_32.c | 4 ++-- arch/sparc/kernel/windows.c | 2 +- arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/vm86_32.c | 2 +- include/linux/sched/signal.h | 1 + kernel/entry/syscall_user_dispatch.c | 4 ++-- kernel/signal.c | 13 +++++++++++++ 11 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c index 99058a6da956..34d6458340b0 100644 --- a/arch/m68k/kernel/traps.c +++ b/arch/m68k/kernel/traps.c @@ -1145,7 +1145,7 @@ asmlinkage void set_esp0(unsigned long ssp) */ asmlinkage void fpsp040_die(void) { - force_fatal_sig(SIGSEGV); + force_exit_sig(SIGSEGV); }
#ifdef CONFIG_M68KFPU_EMU diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index 00a9c9cd6d42..3e053e2fd6b6 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -1063,7 +1063,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx, * We kill the task with a SIGSEGV in this situation. */ if (do_setcontext(new_ctx, regs, 0)) { - force_fatal_sig(SIGSEGV); + force_exit_sig(SIGSEGV); return -EFAULT; }
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index ef518535d436..d1e1fc0acbea 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -704,7 +704,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx, */
if (__get_user_sigset(&set, &new_ctx->uc_sigmask)) { - force_fatal_sig(SIGSEGV); + force_exit_sig(SIGSEGV); return -EFAULT; } set_current_blocked(&set); @@ -713,7 +713,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx, return -EFAULT; if (__unsafe_restore_sigcontext(current, NULL, 0, &new_ctx->uc_mcontext)) { user_read_access_end(); - force_fatal_sig(SIGSEGV); + force_exit_sig(SIGSEGV); return -EFAULT; } user_read_access_end(); diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c index 035705c9f23e..2b780786fc68 100644 --- a/arch/s390/kernel/traps.c +++ b/arch/s390/kernel/traps.c @@ -84,7 +84,7 @@ static void default_trap_handler(struct pt_regs *regs) { if (user_mode(regs)) { report_user_fault(regs, SIGSEGV, 0); - force_fatal_sig(SIGSEGV); + force_exit_sig(SIGSEGV); } else die(regs, "Unknown program exception"); } diff --git a/arch/sparc/kernel/signal_32.c b/arch/sparc/kernel/signal_32.c index cd677bc564a7..ffab16369bea 100644 --- a/arch/sparc/kernel/signal_32.c +++ b/arch/sparc/kernel/signal_32.c @@ -244,7 +244,7 @@ static int setup_frame(struct ksignal *ksig, struct pt_regs *regs, get_sigframe(ksig, regs, sigframe_size);
if (invalid_frame_pointer(sf, sigframe_size)) { - force_fatal_sig(SIGILL); + force_exit_sig(SIGILL); return -EINVAL; }
@@ -336,7 +336,7 @@ static int setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs, sf = (struct rt_signal_frame __user *) get_sigframe(ksig, regs, sigframe_size); if (invalid_frame_pointer(sf, sigframe_size)) { - force_fatal_sig(SIGILL); + force_exit_sig(SIGILL); return -EINVAL; }
diff --git a/arch/sparc/kernel/windows.c b/arch/sparc/kernel/windows.c index bbbd40cc6b28..8f20862ccc83 100644 --- a/arch/sparc/kernel/windows.c +++ b/arch/sparc/kernel/windows.c @@ -122,7 +122,7 @@ void try_to_clear_window_buffer(struct pt_regs *regs, int who) if ((sp & 7) || copy_to_user((char __user *) sp, &tp->reg_window[window], sizeof(struct reg_window32))) { - force_fatal_sig(SIGILL); + force_exit_sig(SIGILL); return; } } diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index 0b6b277ee050..fd2ee9408e91 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -226,7 +226,7 @@ bool emulate_vsyscall(unsigned long error_code, if ((!tmp && regs->orig_ax != syscall_nr) || regs->ip != address) { warn_bad_vsyscall(KERN_DEBUG, regs, "seccomp tried to change syscall nr or ip"); - force_fatal_sig(SIGSYS); + force_exit_sig(SIGSYS); return true; } regs->orig_ax = -1; diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c index cce1c89cb7df..c21bcd668284 100644 --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -160,7 +160,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval) user_access_end(); Efault: pr_alert("could not access userspace vm86 info\n"); - force_fatal_sig(SIGSEGV); + force_exit_sig(SIGSEGV); goto exit_vm86; }
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 23505394ef70..33a50642cf41 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -352,6 +352,7 @@ extern __must_check bool do_notify_parent(struct task_struct *, int); extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent); extern void force_sig(int); extern void force_fatal_sig(int); +extern void force_exit_sig(int); extern int send_sig(int, struct task_struct *, int); extern int zap_other_threads(struct task_struct *p); extern struct sigqueue *sigqueue_alloc(void); diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c index 4508201847d2..0b6379adff6b 100644 --- a/kernel/entry/syscall_user_dispatch.c +++ b/kernel/entry/syscall_user_dispatch.c @@ -48,7 +48,7 @@ bool syscall_user_dispatch(struct pt_regs *regs) * the selector is loaded by userspace. */ if (unlikely(__get_user(state, sd->selector))) { - force_fatal_sig(SIGSEGV); + force_exit_sig(SIGSEGV); return true; }
@@ -56,7 +56,7 @@ bool syscall_user_dispatch(struct pt_regs *regs) return false;
if (state != SYSCALL_DISPATCH_FILTER_BLOCK) { - force_fatal_sig(SIGSYS); + force_exit_sig(SIGSYS); return true; } } diff --git a/kernel/signal.c b/kernel/signal.c index 02058c983bd6..fe7ba05145d4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1671,6 +1671,19 @@ void force_fatal_sig(int sig) force_sig_info_to_task(&info, current, HANDLER_SIG_DFL); }
+void force_exit_sig(int sig) +{ + struct kernel_siginfo info; + + clear_siginfo(&info); + info.si_signo = sig; + info.si_errno = 0; + info.si_code = SI_KERNEL; + info.si_pid = 0; + info.si_uid = 0; + force_sig_info_to_task(&info, current, HANDLER_EXIT); +} + /* * When things go south during signal handling, we * will force a SIGSEGV. And if the signal that caused
On Thu, Nov 18, 2021 at 04:05:31PM -0600, Eric W. Biederman wrote:
Recently to prevent issues with SECCOMP_RET_KILL and similar signals being changed before they are delivered SA_IMMUTABLE was added.
Unfortunately this broke debuggers[1][2] which reasonably expect to be able to trap synchronous SIGTRAP and SIGSEGV even when the target process is not configured to handle those signals.
Add force_exit_sig and use it instead of force_fatal_sig where historically the code has directly called do_exit. This has the implementation benefits of going through the signal exit path (including generating core dumps) without the danger of allowing userspace to ignore or change these signals.
This avoids userspace regressions as older kernels exited with do_exit which debuggers also can not intercept.
In the future is should be possible to improve the quality of implementation of the kernel by changing some of these force_exit_sig calls to force_fatal_sig. That can be done where it matters on a case-by-case basis with careful analysis.
Reported-by: Kyle Huey me@kylehuey.com Reported-by: kernel test robot oliver.sang@intel.com [1] https://lkml.kernel.org/r/CAP045AoMY4xf8aC_4QU_-j7obuEPYgTcnQQP3Yxk=2X90jtpj... [2] https://lkml.kernel.org/r/20211117150258.GB5403@xsang-OptiPlex-902 Fixes: 00b06da29cf9 ("signal: Add SA_IMMUTABLE to ensure forced siganls do not get changed") Fixes: a3616a3c0272 ("signal/m68k: Use force_sigsegv(SIGSEGV) in fpsp040_die") Fixes: 83a1f27ad773 ("signal/powerpc: On swapcontext failure force SIGSEGV") Fixes: 9bc508cf0791 ("signal/s390: Use force_sigsegv in default_trap_handler") Fixes: 086ec444f866 ("signal/sparc32: In setup_rt_frame and setup_fram use force_fatal_sig") Fixes: c317d306d550 ("signal/sparc32: Exit with a fatal signal when try_to_clear_window_buffer fails") Fixes: 695dd0d634df ("signal/x86: In emulate_vsyscall force a signal instead of calling do_exit") Fixes: 1fbd60df8a85 ("signal/vm86_32: Properly send SIGSEGV when the vm86 state cannot be saved.") Fixes: 941edc5bf174 ("exit/syscall_user_dispatch: Send ordinary signals on failure") Signed-off-by: "Eric W. Biederman" ebiederm@xmission.com
Reviewed-by: Kees Cook keescook@chromium.org Tested-by: Kees Cook keescook@chromium.org
Thanks!
-Kees
arch/m68k/kernel/traps.c | 2 +- arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/kernel/signal_64.c | 4 ++-- arch/s390/kernel/traps.c | 2 +- arch/sparc/kernel/signal_32.c | 4 ++-- arch/sparc/kernel/windows.c | 2 +- arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/vm86_32.c | 2 +- include/linux/sched/signal.h | 1 + kernel/entry/syscall_user_dispatch.c | 4 ++-- kernel/signal.c | 13 +++++++++++++ 11 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/arch/m68k/kernel/traps.c b/arch/m68k/kernel/traps.c index 99058a6da956..34d6458340b0 100644 --- a/arch/m68k/kernel/traps.c +++ b/arch/m68k/kernel/traps.c @@ -1145,7 +1145,7 @@ asmlinkage void set_esp0(unsigned long ssp) */ asmlinkage void fpsp040_die(void) {
- force_fatal_sig(SIGSEGV);
- force_exit_sig(SIGSEGV);
} #ifdef CONFIG_M68KFPU_EMU diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c index 00a9c9cd6d42..3e053e2fd6b6 100644 --- a/arch/powerpc/kernel/signal_32.c +++ b/arch/powerpc/kernel/signal_32.c @@ -1063,7 +1063,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx, * We kill the task with a SIGSEGV in this situation. */ if (do_setcontext(new_ctx, regs, 0)) {
force_fatal_sig(SIGSEGV);
return -EFAULT; }force_exit_sig(SIGSEGV);
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c index ef518535d436..d1e1fc0acbea 100644 --- a/arch/powerpc/kernel/signal_64.c +++ b/arch/powerpc/kernel/signal_64.c @@ -704,7 +704,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx, */ if (__get_user_sigset(&set, &new_ctx->uc_sigmask)) {
force_fatal_sig(SIGSEGV);
return -EFAULT; } set_current_blocked(&set);force_exit_sig(SIGSEGV);
@@ -713,7 +713,7 @@ SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx, return -EFAULT; if (__unsafe_restore_sigcontext(current, NULL, 0, &new_ctx->uc_mcontext)) { user_read_access_end();
force_fatal_sig(SIGSEGV);
return -EFAULT; } user_read_access_end();force_exit_sig(SIGSEGV);
diff --git a/arch/s390/kernel/traps.c b/arch/s390/kernel/traps.c index 035705c9f23e..2b780786fc68 100644 --- a/arch/s390/kernel/traps.c +++ b/arch/s390/kernel/traps.c @@ -84,7 +84,7 @@ static void default_trap_handler(struct pt_regs *regs) { if (user_mode(regs)) { report_user_fault(regs, SIGSEGV, 0);
force_fatal_sig(SIGSEGV);
} else die(regs, "Unknown program exception");force_exit_sig(SIGSEGV);
} diff --git a/arch/sparc/kernel/signal_32.c b/arch/sparc/kernel/signal_32.c index cd677bc564a7..ffab16369bea 100644 --- a/arch/sparc/kernel/signal_32.c +++ b/arch/sparc/kernel/signal_32.c @@ -244,7 +244,7 @@ static int setup_frame(struct ksignal *ksig, struct pt_regs *regs, get_sigframe(ksig, regs, sigframe_size); if (invalid_frame_pointer(sf, sigframe_size)) {
force_fatal_sig(SIGILL);
return -EINVAL; }force_exit_sig(SIGILL);
@@ -336,7 +336,7 @@ static int setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs, sf = (struct rt_signal_frame __user *) get_sigframe(ksig, regs, sigframe_size); if (invalid_frame_pointer(sf, sigframe_size)) {
force_fatal_sig(SIGILL);
return -EINVAL; }force_exit_sig(SIGILL);
diff --git a/arch/sparc/kernel/windows.c b/arch/sparc/kernel/windows.c index bbbd40cc6b28..8f20862ccc83 100644 --- a/arch/sparc/kernel/windows.c +++ b/arch/sparc/kernel/windows.c @@ -122,7 +122,7 @@ void try_to_clear_window_buffer(struct pt_regs *regs, int who) if ((sp & 7) || copy_to_user((char __user *) sp, &tp->reg_window[window], sizeof(struct reg_window32))) {
force_fatal_sig(SIGILL);
} }force_exit_sig(SIGILL); return;
diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index 0b6b277ee050..fd2ee9408e91 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -226,7 +226,7 @@ bool emulate_vsyscall(unsigned long error_code, if ((!tmp && regs->orig_ax != syscall_nr) || regs->ip != address) { warn_bad_vsyscall(KERN_DEBUG, regs, "seccomp tried to change syscall nr or ip");
force_fatal_sig(SIGSYS);
return true; } regs->orig_ax = -1;force_exit_sig(SIGSYS);
diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c index cce1c89cb7df..c21bcd668284 100644 --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -160,7 +160,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval) user_access_end(); Efault: pr_alert("could not access userspace vm86 info\n");
- force_fatal_sig(SIGSEGV);
- force_exit_sig(SIGSEGV); goto exit_vm86;
} diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 23505394ef70..33a50642cf41 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -352,6 +352,7 @@ extern __must_check bool do_notify_parent(struct task_struct *, int); extern void __wake_up_parent(struct task_struct *p, struct task_struct *parent); extern void force_sig(int); extern void force_fatal_sig(int); +extern void force_exit_sig(int); extern int send_sig(int, struct task_struct *, int); extern int zap_other_threads(struct task_struct *p); extern struct sigqueue *sigqueue_alloc(void); diff --git a/kernel/entry/syscall_user_dispatch.c b/kernel/entry/syscall_user_dispatch.c index 4508201847d2..0b6379adff6b 100644 --- a/kernel/entry/syscall_user_dispatch.c +++ b/kernel/entry/syscall_user_dispatch.c @@ -48,7 +48,7 @@ bool syscall_user_dispatch(struct pt_regs *regs) * the selector is loaded by userspace. */ if (unlikely(__get_user(state, sd->selector))) {
force_fatal_sig(SIGSEGV);
}force_exit_sig(SIGSEGV); return true;
@@ -56,7 +56,7 @@ bool syscall_user_dispatch(struct pt_regs *regs) return false; if (state != SYSCALL_DISPATCH_FILTER_BLOCK) {
force_fatal_sig(SIGSYS);
} }force_exit_sig(SIGSYS); return true;
diff --git a/kernel/signal.c b/kernel/signal.c index 02058c983bd6..fe7ba05145d4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1671,6 +1671,19 @@ void force_fatal_sig(int sig) force_sig_info_to_task(&info, current, HANDLER_SIG_DFL); } +void force_exit_sig(int sig) +{
- struct kernel_siginfo info;
- clear_siginfo(&info);
- info.si_signo = sig;
- info.si_errno = 0;
- info.si_code = SI_KERNEL;
- info.si_pid = 0;
- info.si_uid = 0;
- force_sig_info_to_task(&info, current, HANDLER_EXIT);
+}
/*
- When things go south during signal handling, we
- will force a SIGSEGV. And if the signal that caused
-- 2.20.1
On Thu, Nov 18, 2021 at 1:58 PM Eric W. Biederman ebiederm@xmission.com wrote:
SA_IMMUTABLE fixed issues with force_sig_seccomp and the introduction for force_sig_fatal where the exit previously could not be interrupted but now it can. Unfortunately it added that behavior to all force_sig functions under the right conditions which debuggers usage of SIG_TRAP and debuggers handling of SIGSEGV.
Solve that by limiting SA_IMMUTABLE to just the cases that historically debuggers have not been able to intercept.
The first patch changes force_sig_info_to_task to take a flag that requests which behavior is desired.
The second patch adds force_exit_sig which replaces force_fatal_sig in the cases where historically userspace would only find out about the ``signal'' after the process has exited.
The first one with the hunk changing force_fatal_sig removed should be suitable for backporting to v5.15. v5.15 does not implement force_fatal_sig.
This should be enough to fix the regressions.
Kyle if you can double check me that I have properly fixed these issues that would be appreciated.
Any other review or suggestions to improve the names would be appreciated. I think I have named things reasonably well but I am very close to the code so it is easy for me to miss things.
Eric W. Biederman (2): signal: Don't always set SA_IMMUTABLE for forced signals signal: Replace force_fatal_sig with force_exit_sig when in doubt
arch/m68k/kernel/traps.c | 2 +- arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/kernel/signal_64.c | 4 ++-- arch/s390/kernel/traps.c | 2 +- arch/sparc/kernel/signal_32.c | 4 ++-- arch/sparc/kernel/windows.c | 2 +- arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/vm86_32.c | 2 +- include/linux/sched/signal.h | 1 + kernel/entry/syscall_user_dispatch.c | 4 ++-- kernel/signal.c | 36 ++++++++++++++++++++++++++++------- 11 files changed, 42 insertions(+), 19 deletions(-)
Eric
rr's test suite passes with both diffs applied
Tested-by: Kyle Huey khuey@kylehuey.com
- Kyle
Linus,
Please pull the SA_IMMUTABLE-fixes-for-v5.16-rc2 branch from the git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git SA_IMMUTABLE-fixes-for-v5.16-rc2
HEAD: fcb116bc43c8c37c052530ead79872f8b2615711 signal: Replace force_fatal_sig with force_exit_sig when in doubt
This is just a small set of changes where debuggers were no longer able to intercept synchronous SIGTRAP and SIGSEGV. This is essentially the change you suggested with all of i's dotted and the t's crossed so that ptrace can intercept all of the cases it has been able to intercept the past, and all of the cases that made it to exit without giving ptrace a chance still don't give ptrace a chance.
This change[1] has been tested by both Kyle and Kees.
Eric W. Biederman (2): signal: Don't always set SA_IMMUTABLE for forced signals signal: Replace force_fatal_sig with force_exit_sig when in doubt
arch/m68k/kernel/traps.c | 2 +- arch/powerpc/kernel/signal_32.c | 2 +- arch/powerpc/kernel/signal_64.c | 4 ++-- arch/s390/kernel/traps.c | 2 +- arch/sparc/kernel/signal_32.c | 4 ++-- arch/sparc/kernel/windows.c | 2 +- arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/vm86_32.c | 2 +- include/linux/sched/signal.h | 1 + kernel/entry/syscall_user_dispatch.c | 4 ++-- kernel/signal.c | 36 ++++++++++++++++++++++++++++------- 11 files changed, 42 insertions(+), 19 deletions(-)
[1]: https://lkml.kernel.org/r/87h7c9qg7p.fsf_-_@email.froward.int.ebiederm.org
Eric
The pull request you sent on Fri, 19 Nov 2021 09:41:49 -0600:
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git SA_IMMUTABLE-fixes-for-v5.16-rc2
has been merged into torvalds/linux.git: https://git.kernel.org/torvalds/c/7af959b5d5c8497b423e802e2b0ad847cb29b3d3
Thank you!
On Wed, Nov 17, 2021 at 1:05 PM Eric W. Biederman ebiederm@xmission.com wrote:
Kyle Huey me@kylehuey.com writes:
On Wed, Nov 17, 2021 at 11:05 AM Kyle Huey me@kylehuey.com wrote:
On Wed, Nov 17, 2021 at 10:51 AM Kees Cook keescook@chromium.org wrote:
On Wed, Nov 17, 2021 at 10:47:13AM -0800, Kyle Huey wrote:
rr, a userspace record and replay debugger[0], is completely broken on 5.16rc1. I bisected this to 00b06da29cf9dc633cdba87acd3f57f4df3fd5c7.
That patch makes two changes, it blocks sigaction from changing signal handlers once the kernel has decided to force the program to take a signal and it also stops notifying ptracers of the signal in the same circumstances. The latter behavior is just wrong. There's no reason that ptrace should not be able to observe and even change (non-SIGKILL) forced signals. It should be reverted.
This behavior change is also observable in gdb. If you take a program that sets SIGSYS to SIG_IGN and then raises a SIGSYS via SECCOMP_RET_TRAP and run it under gdb on a good kernel gdb will stop when the SIGSYS is raised, let you inspect program state, etc. After the SA_IMMUTABLE change gdb won't stop until the program has already died of SIGSYS.
Ah, hm, this was trying to fix the case where a program trips SECCOMP_RET_KILL (which is a "fatal SIGSYS"), and had been unobservable before. I guess the fix was too broad...
Perhaps I don't understand precisely what you mean by this, but gdb's behavior for a program that is SECCOMP_RET_KILLed was not changed by this patch (the SIGSYS is not observed until after program exit before or after this change).
Ah, maybe that behavior changed in 5.15 (my "before" here is a 5.14 kernel). I would argue that the debugger seeing the SIGSYS for SECCOMP_RET_KILL is desirable though ...
This is definitely worth discussing, and probably in need of fixing (aka something in rr seems to have broken).
I mean this in the nicest possible way: fixing this is not optional.
We definitely need protection against the race with sigaction.
Sure, no argument here, and that doesn't cause any problems for us.
The fundamental question becomes does it make sense and is it safe to allow a debugger to stop at, and possibly change these signals.
And the answer is yes, because at least some of these signals are generated by actions of the debugger (e.g. setting a breakpoint).
Stopping at something SA_IMMUTABLE as long as the signal is allowed to continue and kill the process when PTRACE_CONT happens seems harmless.
Allowing the debugger to change the signal, or change it's handling I don't know.
This is required to support breakpoints.
All of this is channeled through the following function.
static int force_sig_info_to_task(struct kernel_siginfo *info, struct task_struct *t, bool sigdfl) { unsigned long int flags; int ret, blocked, ignored; struct k_sigaction *action; int sig = info->si_signo;
spin_lock_irqsave(&t->sighand->siglock, flags); action = &t->sighand->action[sig-1]; ignored = action->sa.sa_handler == SIG_IGN; blocked = sigismember(&t->blocked, sig); if (blocked || ignored || sigdfl) { action->sa.sa_handler = SIG_DFL; action->sa.sa_flags |= SA_IMMUTABLE; if (blocked) { sigdelset(&t->blocked, sig); recalc_sigpending_and_wake(t); } } /* * Don't clear SIGNAL_UNKILLABLE for traced tasks, users won't expect * debugging to leave init killable. */ if (action->sa.sa_handler == SIG_DFL && !t->ptrace) t->signal->flags &= ~SIGNAL_UNKILLABLE; ret = send_signal(sig, info, t, PIDTYPE_PID); spin_unlock_irqrestore(&t->sighand->siglock, flags); return ret;
}
Right now we have 3 conditions that trigger SA_IMMUTABLE.
The sigdfl parameter is passed asking that userspace not be able to change the handling of the signal.
A synchronous exception is taken and the signal is blocked.
A synchronous exception is taken and the signal is ignored.
Delivering signals to a ptracee in the latter two cases is simply not optional. As it stands with your change, a program that blocks SIGTRAP or sets its SIGTRAP handler to SIG_IGN becomes undebuggable. If a debugger injects a breakpoint or uses PTRACE_SINGLESTEP on a tracee the delivery of that signal can't be controlled by the tracee's signal state.
Today because of how things are implemented the code most change the userspace state to allow the signal to kill the process. I really want to get rid of that, because that has other side effects. As part of getting rid of changing the state it is my plan to get rid of SA_IMMUTABLE as well. If I don't have to allow the debugger to stop and observe what is happening with the signal that change is much easier to implement.
The classic trigger of sigdfl is a recursive SIGSEGV.
However we have other cases like SECCOMP_RET_KILL where the kernel has never allowed userspace to intercept the killing of the process. Things that have messages like: "seccomp tried to change syscall nr or ip"
My brain is drawing a blank on how to analyze those.
Kees I am back to asking the question I had before I figured out SA_IMMUTABLE. Are there security concerns with debuggers intercepting SECCOMP_RET_KILL.
I think I can modify dequeue_synchronous_signal so that we can perform the necessary logic in get_signal rather than hack up the signal handling state in force_sig_info_to_task.
Except for the cases like SECCOMP_RET_KILL where the kernel has never allowed userspace to intercept the handling. I don't see any fundamental reason why ptrace could not intercept the signal. The handling is overriden to force the process to die, because the way userspace is currently configured to handle the signal does not work so it is necessary to kill the process.
I think there are cases where the userspace state is known to be sufficiently wrong that the kernel can not safely allow anything more than inspecting the state.
I can revisit the code to see if the kernel will get confused if something more is allowed. Still I really like the current semantics of SA_IMMUTABLE because these are cases where something wrong. If someone miscalculates how things are wrong it could result in the kernel getting confused and doing the wrong thing. Allowing the debugger to intercept the signal requires we risk miscalculating what is wrong.
Kyle how exactly is rr broken? Certainly a historical usage does not work. How does this affect actual real world debugging sessions?
rr is broken across the board because of specific things related to its handling of exit_group (namely we first block all signals in the tracee, so that we don't catch a signal during our handling of it, then we hijack the tracee to do some cleanup before exit_group is really allowed to execute, and we use e.g. PTRACE_SINGLESTEP that expects to punch through the signal blocking). But even if I fixed that, I expect there would be other issues. The expectation that these signals will be delivered is deeply embedded.
You noticed this and bisected the change quickly so I fully expect this does affect real world debugging sessions. I just want to know exactly how so that exactly what is wrong can be fixed.
I noticed this because we have a test suite we run against new kernel releases precisely to catch regressions like this.
You don't need rr to reproduce the underlying issue though. Compile the following
``` #include <signal.h> #include <stdio.h>
int main() { signal(SIGTRAP, SIG_IGN); printf("Hello World\n"); return 0; } ```
And try to break on the printf under gdb. After you fix that (and the equivalent where SIGTRAP is blocked) rr should be fine.
- Kyle
As far as I can tell SA_IMMUTABLE has only been backported to v5.15.x where in cleaning things up I made SECCOMP_RET_KILL susceptible to races with sigaction, and ptrace. Those races need to be closed or we need to decide that we don't actually care if the debugger does things.
Eric
linux-kselftest-mirror@lists.linaro.org