On Wed, Nov 17, 2021 at 5:11 PM Linus Torvalds torvalds@linux-foundation.org wrote:
On Wed, Nov 17, 2021 at 4:37 PM Kyle Huey me@kylehuey.com wrote:
This fixes most of the issues with rr, but it still changes the ptrace behavior for the double-SIGSEGV case
Hmm. I think that's because of how "force_sigsgv()" works.
I absolutely detest that function.
So we have signal_setup_done() doing that
if (failed) force_sigsegv(ksig->sig);
and then force_sigsegv() has that completely insane
if (sig == SIGSEGV) force_fatal_sig(SIGSEGV); else force_sig(SIGSEGV);
behavior.
And I think I know the _reason_ for that complete insanity: when SIGSEGV takes a SIGSEGV, and there is a handler, we need to stop trying to send more SIGSEGV's.
Right, in our test we setup a SIGSEGV handler on an alt stack that doesn't actually exist, and then overflow the regular stack, and that would loop forever trying to setup SIGSEGV handlers if it weren't for force_sigsegv and the sigdfl=true stuff.
But it does mean that with my change, that second SIGSEGV now ends up being that SA_IMMUTABLE kind, so yeah, it broke the debugger test - where catching the second SIGSEGV is actually somewhat sensible (ok, not really, but at least understandable)
End result: I think we want not a boolean, but a three-way choice for that force_sig_info_to_task() thing:
with the following clarifications, yes
- unconditionally fatal (for things that just want to force an exit
and used to do do_exit())
no matter what the ptracer wants
- ignore valid and unblocked handler (for that SIGSEGV recursion
case, aka force "sigdfl")
but following the usual ptrace rules
- catching signal ok
So my one-liner isn't sufficient. It wants some kind of nasty enum.
At least the enum can be entirely internal to kernel/signal.c, I think. No need to expose this all to anything else.
Linus
Yeah that's one way to solve the problem. I think you're right that fundamentally the problem here is that what SECCOMP_RET_KILL wants is not really a signal. To the extent that it wants a signal, what it really wants is SIGKILL, and the problem here is the code trying to act like SIGKILL but call it SIGSYS. I assume the ship for fixing that sailed years ago though.
- Kyle