On Mon, Sep 7, 2020 at 7:25 AM Christian Brauner christian.brauner@ubuntu.com wrote:
On Mon, Sep 07, 2020 at 07:15:52AM -0700, Andy Lutomirski wrote:
On Sep 7, 2020, at 3:15 AM, Christian Brauner christian.brauner@ubuntu.com wrote:
On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
Syscall User Dispatch (SUD) must take precedence over seccomp, since the use case is emulation (it can be invoked with a different ABI) such that seccomp filtering by syscall number doesn't make sense in the first place. In addition, either the syscall is dispatched back to userspace, in which case there is no resource for seccomp to protect, or the
Tbh, I'm torn here. I'm not a super clever attacker but it feels to me that this is still at least a clever way to circumvent a seccomp sandbox. If I were confined by a seccomp profile that would cause me to be SIGKILLed when I try to do open(), I could prctl() myself to do user dispatch to prevent that from happening, no?
Not really, I think. The idea is that you didn’t actually do open(). You did a SYSCALL instruction which meant something else, and the syscall dispatch correctly prevented the kernel from misinterpreting it as open().
Right, for the case where you're e.g. emulating Windows syscalls that's true. I was thinking of when you're running natively on Linux: couldn't I first load a seccomp profile "kill me if someone does an open()", then exec() the target binary, with that binary set up to do prctl(USER_DISPATCH) first thing? I guess it's ok because, as far as I had time to read it, this is an all-or-nothing mechanism, i.e. _all_ system calls are re-routed, in contrast to e.g. seccomp where I could do this per-syscall. So for user-dispatch it wouldn't make sense to use it on Linux per se. Still makes me a little uneasy. :)
There's an escape hatch, so processes using this can still make syscalls.
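For readers following along, here is a minimal sketch of what that escape hatch can look like from userspace. It assumes the prctl() interface as it was eventually merged in Linux 5.11 (option and selector values below), which may differ from the patch revision discussed in this thread. The helper names sud_enable() and sud_disable() and the variable sud_selector are hypothetical. The selector is left at "allow", so every syscall still goes straight to the kernel; an emulator would flip the byte to "block" around foreign code.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Values as merged in Linux 5.11 (assumption: your headers may predate them). */
#ifndef PR_SET_SYSCALL_USER_DISPATCH
#define PR_SET_SYSCALL_USER_DISPATCH 59
#endif
#ifndef PR_SYS_DISPATCH_OFF
#define PR_SYS_DISPATCH_OFF 0
#endif
#ifndef PR_SYS_DISPATCH_ON
#define PR_SYS_DISPATCH_ON 1
#endif
#ifndef SYSCALL_DISPATCH_FILTER_ALLOW
#define SYSCALL_DISPATCH_FILTER_ALLOW 0
#endif
#ifndef SYSCALL_DISPATCH_FILTER_BLOCK
#define SYSCALL_DISPATCH_FILTER_BLOCK 1
#endif

/* Selector byte the kernel reads on each syscall entry (hypothetical name). */
volatile char sud_selector = SYSCALL_DISPATCH_FILTER_ALLOW;

/* Enable dispatch for the calling thread with an empty always-allowed
 * region; returns 0 on success or a negative errno value. */
int sud_enable(void)
{
    if (prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_ON,
              0UL, 0UL, (unsigned long)&sud_selector) == -1)
        return -errno;
    return 0;
}

/* Turn dispatch off again; the remaining arguments must be zero. */
int sud_disable(void)
{
    if (prctl(PR_SET_SYSCALL_USER_DISPATCH, PR_SYS_DISPATCH_OFF,
              0UL, 0UL, 0UL) == -1)
        return -errno;
    return 0;
}
```

On a kernel without the feature, sud_enable() is expected to fail with a negative errno value (typically -EINVAL) rather than change any behaviour.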
Maybe think about it another way: a process using user dispatch should definitely *not* trigger seccomp user notifiers, errno returns, or ptrace events, since they'll all do the wrong thing. IMO RET_KILL is the same.
Barring some very severe defect, there's no way a program can use user dispatch to escape seccomp -- a program could use user dispatch to allow itself to do:
    mov $__NR_open, %rax
    syscall
without dying despite the presence of a filter that would kill the process if it tried to do open(), but this doesn't bypass the filter at all. The process could just as easily have done:
    mov $__NR_open, %rax
    jmp magic_stub(%rip)
without tripping the filter, since no system call actually happens here.
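The ordering argument can be illustrated with a small toy model. This is not kernel code: the types, the on_syscall_entry() function and the single "killed syscall number" standing in for a seccomp filter are all hypothetical. It only sketches the decision order described in this thread, where user dispatch is consulted first and seccomp only sees syscalls that actually proceed into the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SEL_ALLOW = 0, SEL_BLOCK = 1 };

enum outcome {
    OUT_SIGSYS_TO_USER, /* bounced back to userspace, no syscall happens */
    OUT_SECCOMP_KILL,   /* real syscall attempted, filter kills the task */
    OUT_RUN_SYSCALL     /* real syscall attempted and permitted */
};

struct sud_state {
    bool enabled;
    uint8_t selector;      /* SEL_ALLOW or SEL_BLOCK */
    uintptr_t allow_start; /* escape-hatch region: always allowed */
    size_t allow_len;
};

/* Toy model: user dispatch is checked before the (one-rule) seccomp filter. */
enum outcome on_syscall_entry(const struct sud_state *sud, uintptr_t ip,
                              long nr, long seccomp_killed_nr)
{
    if (sud->enabled) {
        bool in_region = ip >= sud->allow_start &&
                         ip - sud->allow_start < sud->allow_len;
        if (!in_region && sud->selector == SEL_BLOCK)
            return OUT_SIGSYS_TO_USER;
    }
    /* Only syscalls that really enter the kernel reach the filter. */
    if (nr == seccomp_killed_nr)
        return OUT_SECCOMP_KILL;
    return OUT_RUN_SYSCALL;
}
```

In this model, a blocked SYSCALL instruction with the number for open() never becomes an open() at all, while the same number issued through the escape-hatch region is still killed by the filter.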
--Andy