On Fri, Sep 04, 2020 at 04:31:44PM -0400, Gabriel Krisman Bertazi wrote:
> Syscall User Dispatch (SUD) must take precedence over seccomp, since the
> use case is emulation (it can be invoked with a different ABI) such that
> seccomp filtering by syscall number doesn't make sense in the first
> place. In addition, either the syscall is dispatched back to userspace,
> in which case there is no resource for seccomp to protect, or the

Tbh, I'm torn here. I'm not a super clever attacker, but it feels to me
that this is still at least a clever way to circumvent a seccomp sandbox.
If I were confined by a seccomp profile that would cause me to be
SIGKILLed when I try to open(), I could prctl() myself into user dispatch
to prevent that from happening, no?

> syscall will be executed, and seccomp will execute next.
>
> Regarding ptrace, I experimented with before and after, and while the
> same ABI argument applies, I felt it was easier to debug if I let ptrace
> happen for syscalls that are dispatched back to userspace. In addition,
> doing it after ptrace makes the code in syscall_exit_work slightly
> simpler, since it doesn't require special handling for this feature.
>
> Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
> ---
>  kernel/entry/common.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 44fd089d59da..fdb0c543539d 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -6,6 +6,8 @@
>  #include <linux/audit.h>
>  #include <linux/syscall_intercept.h>
>
> +#include "common.h"
> +
>  #define CREATE_TRACE_POINTS
>  #include <trace/events/syscalls.h>
>
> @@ -47,6 +49,12 @@ static inline long do_syscall_intercept(struct pt_regs *regs)
>  	int sysint_work = READ_ONCE(current->syscall_intercept);
>  	int ret;
>
> +	if (sysint_work & SYSINT_USER_DISPATCH) {
> +		ret = do_syscall_user_dispatch(regs);
> +		if (ret == -1L)
> +			return ret;
> +	}
> +
>  	if (sysint_work & SYSINT_SECCOMP) {
>  		ret = __secure_computing(NULL);
>  		if (ret == -1L)
> --
> 2.28.0
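
To make the ordering in the hunk above concrete, here is a small user-space
model of it. This is only a sketch, not kernel code: struct model_task, the
model_* functions and the flag values are hypothetical stand-ins, and the
two stubs merely imitate the "-1 means stop here" convention that the
patch uses for do_syscall_user_dispatch() and __secure_computing(). It
shows that when user dispatch claims a syscall, the seccomp step is never
reached, and that seccomp still runs when user dispatch lets it through.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real ones live in the kernel headers. */
#define SYSINT_USER_DISPATCH 0x1
#define SYSINT_SECCOMP       0x2

/* Hypothetical model of the per-task state this check depends on. */
struct model_task {
	int  syscall_intercept; /* enabled interception mechanisms */
	bool dispatch_to_user;  /* would user dispatch bounce this syscall? */
	bool seccomp_denies;    /* would the seccomp filter reject it? */
	bool seccomp_ran;       /* set once the seccomp stub is consulted */
};

/* Stub: returns -1 when the syscall is sent back to userspace. */
static long model_user_dispatch(struct model_task *t)
{
	return t->dispatch_to_user ? -1L : 0L;
}

/* Stub: records that it ran, returns -1 when the filter rejects. */
static long model_secure_computing(struct model_task *t)
{
	t->seccomp_ran = true;
	return t->seccomp_denies ? -1L : 0L;
}

/* Same control flow as the patched hunk: user dispatch first, then seccomp. */
static long model_syscall_intercept(struct model_task *t)
{
	long ret;

	if (t->syscall_intercept & SYSINT_USER_DISPATCH) {
		ret = model_user_dispatch(t);
		if (ret == -1L)
			return ret;
	}

	if (t->syscall_intercept & SYSINT_SECCOMP) {
		ret = model_secure_computing(t);
		if (ret == -1L)
			return ret;
	}

	return 0;
}

/* Self-check covering the two cases discussed in the thread. */
static void model_check_ordering(void)
{
	struct model_task bounced = {
		SYSINT_USER_DISPATCH | SYSINT_SECCOMP, true, true, false
	};
	struct model_task passed = {
		SYSINT_USER_DISPATCH | SYSINT_SECCOMP, false, true, false
	};

	/* Dispatched back to userspace: seccomp is never consulted. */
	assert(model_syscall_intercept(&bounced) == -1L);
	assert(!bounced.seccomp_ran);

	/* Not dispatched: seccomp runs next and can still reject. */
	assert(model_syscall_intercept(&passed) == -1L);
	assert(passed.seccomp_ran);
}
```

Calling model_check_ordering() from a test harness exercises both paths.
The model says nothing about whether a confined task can enable user
dispatch in the first place; it only illustrates the order of the checks.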