This patch series enables secure computing (system call filtering) on arm64, and contains related enhancements and bug fixes.
NOTE: This version contains a workaround for a possible BUG_ON() failure in audit_syscall_exit(). It does not contain the extra optimization that I submitted for arm (excluding syscall enter/exit tracing for invalid system calls), due to an issue that I reported in: http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/292170.ht...
The code was tested on the ARMv8 fast model with 64-bit/32-bit userspace using:
* libseccomp v2.1.1 with modifications for arm64, especially its "live" tests: No.20, 21 and 24.
* modified version of Kees' seccomp test for 'changing/skipping a syscall' and seccomp() system call
* in-house tests for 'changing/skipping a system call' by tracing with ptrace(SETREGSET, NT_SYSTEM_CALL) (that is, not via seccomp filter), with and without audit tracing.
Changes v7 -> v8:
* changed an interface of changing a syscall number from ptrace(SET_SYSCALL) to ptrace(SETREGSET, NT_ARM_SYSTEM_CALL) [1/6]
* removed IS_SKIP_SYSCALL macro [2/6]
* clarified comments in syscall_trace_enter() [2/6]
* changed unsigned int to compat_uint_t in compat_siginfo._sigsys [5/6]
* moved to a new calling interface of secure_computing(void) [6/6]
Changes v6 -> v7:
* simplified the condition of checking for user-issued syscall(-1) at syscall_trace_enter() [2/6]
* defines __NR_seccomp_sigreturn only if arch-specific def doesn't exist. As Kees suggests, this is necessary for x86 and others. [3/6]
* removed "#ifdef __ARCH_SIGSYS" which is always true on arm64. [5/6]
* changed to call syscall_trace_exit() even if secure_computing fails. [6/6]
  In v6, syscall_trace_enter() returns RET_SYSCALL_SKIP_TRACE (== -2) and skips syscall_trace_exit() to minimize the overhead, but this case can be easily confused with user-issued (and invalid) syscall(-2). Anyway, this is now a consistent behavior with arm and other archs.
Changes v5 -> v6:
* rebased to v3.17-rc
* changed the interface of changing/skipping a system call from re-writing x8 register [v5 1/3] to using dedicated PTRACE_SET_SYSCALL command [1/6, 2/6]
  Patch [1/6] contains a checkpatch error around a switch statement, but it won't be fixed as in compat_arch_ptrace().
* added a new system call, seccomp(), for compat task [4/6]
* added SIGSYS siginfo for compat task [5/6]
* changed to always execute audit exit tracing to avoid OOPs [2/6, 6/6]
Changes v4 -> v5:
* rebased to v3.16-rc
* add patch [1/3] to allow ptrace to change a system call (please note that this patch should be applied even without seccomp.)

Changes v3 -> v4:
* removed the following patch and moved it to "arm64: prerequisites for audit and ftrace" patchset since it is required for audit and ftrace in case of !COMPAT, too.
  "arm64: is_compat_task is defined both in asm/compat.h and linux/compat.h"

Changes v2 -> v3:
* removed unnecessary 'type cast' operations [2/3]
* check for a return value (-1) of secure_computing() explicitly [2/3]
* aligned with the patch, "arm64: split syscall_trace() into separate functions for enter/exit" [2/3]
* changed default of CONFIG_SECCOMP to n [2/3]

Changes v1 -> v2:
* added generic seccomp.h for arm64 to utilize it [1,2/3]
* changed syscall_trace() to return more meaningful value (-EPERM) on seccomp failure case [2/3]
* aligned with the change in "arm64: make a single hook to syscall_trace() for all syscall features" v2 [2/3]
* removed is_compat_task() definition from compat.h [3/3]
AKASHI Takahiro (6):
  arm64: ptrace: add NT_ARM_SYSTEM_CALL regset
  arm64: ptrace: allow tracer to skip a system call
  asm-generic: add generic seccomp.h for secure computing mode 1
  arm64: add seccomp syscall for compat task
  arm64: add SIGSYS siginfo for compat task
  arm64: add seccomp support
 arch/arm64/Kconfig                | 14 +++++++++
 arch/arm64/include/asm/compat.h   |  7 +++++
 arch/arm64/include/asm/seccomp.h  | 25 ++++++++++++++++
 arch/arm64/include/asm/unistd.h   |  3 ++
 arch/arm64/include/asm/unistd32.h |  3 +-
 arch/arm64/kernel/entry.S         |  3 ++
 arch/arm64/kernel/ptrace.c        | 58 +++++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/signal32.c      |  6 ++++
 include/asm-generic/seccomp.h     | 30 +++++++++++++++++++
 include/uapi/linux/elf.h          |  1 +
 10 files changed, 149 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/include/asm/seccomp.h
 create mode 100644 include/asm-generic/seccomp.h
This regset is intended to be used to get and set a system call number while tracing. There was some discussion about possible approaches to do so:

(1) modify the x8 register with ptrace(PTRACE_SETREGSET) indirectly, and update regs->syscallno later on in syscall_trace_enter(), or
(2) define a dedicated regset for this purpose as on s390, or
(3) support ptrace(PTRACE_SET_SYSCALL) as on arch/arm

(1) doesn't work well, because user_pt_regs doesn't expose 'syscallno' to the tracer and because secure_computing() expects a changed syscall number, especially -1, to be visible before it returns in syscall_trace_enter(). We will take (2) since it looks much cleaner.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/ptrace.c | 35 +++++++++++++++++++++++++++++++++++
 include/uapi/linux/elf.h   |  1 +
 2 files changed, 36 insertions(+)
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 8a4ae8e..8b98781 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -551,6 +551,32 @@ static int tls_set(struct task_struct *target, const struct user_regset *regset,
 	return ret;
 }

+static int system_call_get(struct task_struct *target,
+			   const struct user_regset *regset,
+			   unsigned int pos, unsigned int count,
+			   void *kbuf, void __user *ubuf)
+{
+	struct pt_regs *regs = task_pt_regs(target);
+
+	return user_regset_copyout(&pos, &count, &kbuf, &ubuf,
+				   &regs->syscallno, 0, -1);
+}
+
+static int system_call_set(struct task_struct *target,
+			   const struct user_regset *regset,
+			   unsigned int pos, unsigned int count,
+			   const void *kbuf, const void __user *ubuf)
+{
+	int syscallno, ret;
+
+	ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &syscallno, 0, -1);
+	if (ret)
+		return ret;
+
+	task_pt_regs(target)->syscallno = syscallno;
+	return ret;
+}
+
 enum aarch64_regset {
 	REGSET_GPR,
 	REGSET_FPR,
@@ -559,6 +585,7 @@ enum aarch64_regset {
 	REGSET_HW_BREAK,
 	REGSET_HW_WATCH,
 #endif
+	REGSET_SYSTEM_CALL,
 };

 static const struct user_regset aarch64_regsets[] = {
@@ -608,6 +635,14 @@ static const struct user_regset aarch64_regsets[] = {
 		.set = hw_break_set,
 	},
 #endif
+	[REGSET_SYSTEM_CALL] = {
+		.core_note_type = NT_ARM_SYSTEM_CALL,
+		.n = 1,
+		.size = sizeof(int),
+		.align = sizeof(int),
+		.get = system_call_get,
+		.set = system_call_set,
+	},
 };

 static const struct user_regset_view user_aarch64_view = {
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index ea9bf25..71e1d0e 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -397,6 +397,7 @@ typedef struct elf64_shdr {
 #define NT_ARM_TLS	0x401		/* ARM TLS register */
 #define NT_ARM_HW_BREAK	0x402		/* ARM hardware breakpoint registers */
 #define NT_ARM_HW_WATCH	0x403		/* ARM hardware watchpoint registers */
+#define NT_ARM_SYSTEM_CALL	0x404	/* ARM system call number */
 #define NT_METAG_CBUF	0x500		/* Metag catch buffer registers */
 #define NT_METAG_RPIPE	0x501		/* Metag read pipeline state */
 #define NT_METAG_TLS	0x502		/* Metag TLS pointer */
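For illustration only, here is a minimal sketch of how a tracer might use the new regset from userspace. The helper names (syscall_iov, get_syscall, set_syscall) are hypothetical and not part of this series; the sketch assumes the regset carries a single int, as declared in the patch above, and falls back to the 0x404 value from this patch when the installed headers do not yet define NT_ARM_SYSTEM_CALL.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_SYSTEM_CALL
#define NT_ARM_SYSTEM_CALL 0x404	/* value added to elf.h by this patch */
#endif

/* Hypothetical helper: describe one int as the regset payload. */
struct iovec syscall_iov(int *scno)
{
	struct iovec iov = { .iov_base = scno, .iov_len = sizeof(*scno) };
	return iov;
}

/* Hypothetical helper: read the stopped tracee's syscall number. */
long get_syscall(pid_t pid, int *scno)
{
	struct iovec iov = syscall_iov(scno);

	return ptrace(PTRACE_GETREGSET, pid,
		      (void *)(long)NT_ARM_SYSTEM_CALL, &iov);
}

/* Hypothetical helper: replace the syscall number; -1 asks for a skip. */
long set_syscall(pid_t pid, int scno)
{
	struct iovec iov = syscall_iov(&scno);

	return ptrace(PTRACE_SETREGSET, pid,
		      (void *)(long)NT_ARM_SYSTEM_CALL, &iov);
}
```

A tracer would call set_syscall(pid, -1) while the tracee is stopped at syscall entry; the ptrace() calls only succeed on an arm64 kernel that carries this regset.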
If the tracer specifies -1 as a syscall number, the traced system call should be skipped, with the return value taken from x0. This patch implements these semantics, but there is one restriction:

syscall(-1) always returns -ENOSYS, whatever value is stored in x0 (the return value) at syscall entry.

Normally, with ptrace off, syscall(-1) returns -ENOSYS. With ptrace on, however, if a tracer paid no attention to a user-issued syscall(-1) and just let it go, it would return the value in x0, as in other system call cases. This means the system call might appear to succeed and return a bogus value, which should definitely be avoided.

Please also note:
* Syscall entry tracing and syscall exit tracing (ftrace tracepoint and audit) are always executed, if enabled, even when skipping a system call (that is, -1). This avoids a potential bug where audit_syscall_entry() is called without audit_syscall_exit() having been called for the previous system call, which would cause an Oops in audit_syscall_entry().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/entry.S  |  3 +++
 arch/arm64/kernel/ptrace.c | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 726b910..01118b1 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -670,6 +670,8 @@ ENDPROC(el0_svc)
 __sys_trace:
 	mov	x0, sp
 	bl	syscall_trace_enter
+	cmp	w0, #-1				// skip the syscall?
+	b.eq	__sys_trace_return_skipped
 	adr	lr, __sys_trace_return		// return address
 	uxtw	scno, w0			// syscall number (possibly new)
 	mov	x1, sp				// pointer to regs
@@ -684,6 +686,7 @@ __sys_trace:

 __sys_trace_return:
 	str	x0, [sp]			// save returned x0
+__sys_trace_return_skipped:
 	mov	x0, sp
 	bl	syscall_trace_exit
 	b	ret_to_user
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 8b98781..34b1e85 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1149,6 +1149,8 @@ static void tracehook_report_syscall(struct pt_regs *regs,

 asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
+	int orig_syscallno = regs->syscallno;
+
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);

@@ -1158,6 +1160,22 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 	audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
 			    regs->regs[2], regs->regs[3]);

+	if (((int)regs->syscallno == -1) && (orig_syscallno == -1)) {
+		/*
+		 * user-issued syscall(-1):
+		 * RESTRICTION: We always return ENOSYS whatever value is
+		 *   stored in x0 (a return value) at this point.
+		 * Normally, with ptrace off, syscall(-1) returns -ENOSYS.
+		 * With ptrace on, however, if a tracer didn't pay any
+		 * attention to user-issued syscall(-1) and just let it go
+		 * without a hack here, it would return a value in x0 as in
+		 * other system call cases. This means that this system call
+		 * might succeed and see any bogus return value.
+		 * This should be definitely avoided.
+		 */
+		regs->regs[0] = -ENOSYS;
+	}
+
 	return regs->syscallno;
 }
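The rule added to syscall_trace_enter() can be modelled in plain userspace C to make the three cases easier to compare. This is only a sketch: struct model_regs and model_after_tracer() are hypothetical stand-ins for pt_regs and for the tail of syscall_trace_enter(), and the tracer's effect is represented simply by the values already stored in the structure.

```c
#include <errno.h>

/* Simplified stand-in for the two pt_regs fields the patch touches. */
struct model_regs {
	long x0;	/* first argument on entry, return value on exit */
	int syscallno;	/* -1 means "skip this system call" */
};

/*
 * Model of the check that runs after the tracer has been notified.
 * orig_syscallno is the number the task issued before any tracer
 * could rewrite it. The return value is what entry.S compares to -1.
 */
int model_after_tracer(struct model_regs *regs, int orig_syscallno)
{
	/* User-issued syscall(-1): force -ENOSYS whatever x0 holds. */
	if (regs->syscallno == -1 && orig_syscallno == -1)
		regs->x0 = -ENOSYS;

	return regs->syscallno;
}
```

With this model, a tracer that rewrites syscall 42 to -1 keeps whatever it stored in x0, while a user-issued syscall(-1) always ends up with -ENOSYS, which is the restriction described in the commit message.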
On Tue, Nov 18, 2014 at 01:10:34AM +0000, AKASHI Takahiro wrote:
If tracer specifies -1 as a syscall number, this traced system call should be skipped with a return value specified in x0. This patch implements this semantics, but there is one restriction here:
syscall(-1) always return ENOSYS whatever value is stored in x0 (a return value) at syscall entry.
Normally, with ptrace off, syscall(-1) returns -ENOSYS. With ptrace on, however, if a tracer didn't pay any attention to user-issued syscall(-1) and just let it go, it would return a value in x0 as in other system call cases. This means that this system call might succeed and yet see any bogus return value. This should be definitely avoided.
Please also note:
- syscall entry tracing and syscall exit tracing (ftrace tracepoint and audit) are always executed, if enabled, even when skipping a system call (that is, -1). In this way, we can avoid a potential bug where audit_syscall_entry() might be called without audit_syscall_exit() at the previous system call being called, that would cause OOPs in audit_syscall_entry().
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/kernel/entry.S  |  3 +++
 arch/arm64/kernel/ptrace.c | 18 ++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 726b910..01118b1 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -670,6 +670,8 @@ ENDPROC(el0_svc)
 __sys_trace:
 	mov	x0, sp
 	bl	syscall_trace_enter
+	cmp	w0, #-1				// skip the syscall?
+	b.eq	__sys_trace_return_skipped
 	adr	lr, __sys_trace_return		// return address
 	uxtw	scno, w0			// syscall number (possibly new)
 	mov	x1, sp				// pointer to regs
@@ -684,6 +686,7 @@ __sys_trace:

 __sys_trace_return:
 	str	x0, [sp]			// save returned x0
+__sys_trace_return_skipped:
 	mov	x0, sp
 	bl	syscall_trace_exit
 	b	ret_to_user
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 8b98781..34b1e85 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -1149,6 +1149,8 @@ static void tracehook_report_syscall(struct pt_regs *regs,

 asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
+	int orig_syscallno = regs->syscallno;
+
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);

@@ -1158,6 +1160,22 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 	audit_syscall_entry(regs->syscallno, regs->orig_x0, regs->regs[1],
 			    regs->regs[2], regs->regs[3]);

+	if (((int)regs->syscallno == -1) && (orig_syscallno == -1)) {
+		/*
+		 * user-issued syscall(-1):
+		 * RESTRICTION: We always return ENOSYS whatever value is
+		 *   stored in x0 (a return value) at this point.
+		 * Normally, with ptrace off, syscall(-1) returns -ENOSYS.
+		 * With ptrace on, however, if a tracer didn't pay any
+		 * attention to user-issued syscall(-1) and just let it go
+		 * without a hack here, it would return a value in x0 as in
+		 * other system call cases. This means that this system call
+		 * might succeed and see any bogus return value.
+		 * This should be definitely avoided.
+		 */
+		regs->regs[0] = -ENOSYS;
+	}
I'm still really uncomfortable with this, and it doesn't seem to match what arch/arm/ does either. Doesn't it also prevent a tracer from skipping syscall(-1)?
Will
On 11/18/2014 11:04 PM, Will Deacon wrote:
On Tue, Nov 18, 2014 at 01:10:34AM +0000, AKASHI Takahiro wrote:
+	if (((int)regs->syscallno == -1) && (orig_syscallno == -1)) {
+		/*
+		 * user-issued syscall(-1):
+		 * RESTRICTION: We always return ENOSYS whatever value is
+		 *   stored in x0 (a return value) at this point.
+		 * Normally, with ptrace off, syscall(-1) returns -ENOSYS.
+		 * With ptrace on, however, if a tracer didn't pay any
+		 * attention to user-issued syscall(-1) and just let it go
+		 * without a hack here, it would return a value in x0 as in
+		 * other system call cases. This means that this system call
+		 * might succeed and see any bogus return value.
+		 * This should be definitely avoided.
+		 */
+		regs->regs[0] = -ENOSYS;
+	}
I'm still really uncomfortable with this, and it doesn't seem to match what arch/arm/ does either.
Yeah, I know but as I mentioned before, syscall(-1) will be signaled on arm, and so we don't have to care about a return value :)
Doesn't it also prevent a tracer from skipping syscall(-1)?
Syscall(-1) will return -ENOSYS whether or not a syscallno is explicitly replaced with -1 by a tracer, and, in this sense, it is *skipped*.
-Takahiro AKASHI
Will
On Wed, Nov 19, 2014 at 08:46:19AM +0000, AKASHI Takahiro wrote:
On 11/18/2014 11:04 PM, Will Deacon wrote:
On Tue, Nov 18, 2014 at 01:10:34AM +0000, AKASHI Takahiro wrote:
+	if (((int)regs->syscallno == -1) && (orig_syscallno == -1)) {
+		/*
+		 * user-issued syscall(-1):
+		 * RESTRICTION: We always return ENOSYS whatever value is
+		 *   stored in x0 (a return value) at this point.
+		 * Normally, with ptrace off, syscall(-1) returns -ENOSYS.
+		 * With ptrace on, however, if a tracer didn't pay any
+		 * attention to user-issued syscall(-1) and just let it go
+		 * without a hack here, it would return a value in x0 as in
+		 * other system call cases. This means that this system call
+		 * might succeed and see any bogus return value.
+		 * This should be definitely avoided.
+		 */
+		regs->regs[0] = -ENOSYS;
+	}
I'm still really uncomfortable with this, and it doesn't seem to match what arch/arm/ does either.
Yeah, I know but as I mentioned before, syscall(-1) will be signaled on arm, and so we don't have to care about a return value :)
What does x86 do?
Doesn't it also prevent a tracer from skipping syscall(-1)?
Syscall(-1) will return -ENOSYS whether or not a syscallno is explicitly replaced with -1 by a tracer, and, in this sense, it is *skipped*.
Ok, but now userspace sees -ENOSYS for a skipped system call in that case, whereas it would usually see whatever the tracer put in x0, right?
Will
On 11/20/2014 04:06 AM, Will Deacon wrote:
On Wed, Nov 19, 2014 at 08:46:19AM +0000, AKASHI Takahiro wrote:
On 11/18/2014 11:04 PM, Will Deacon wrote:
On Tue, Nov 18, 2014 at 01:10:34AM +0000, AKASHI Takahiro wrote:
+	if (((int)regs->syscallno == -1) && (orig_syscallno == -1)) {
+		/*
+		 * user-issued syscall(-1):
+		 * RESTRICTION: We always return ENOSYS whatever value is
+		 *   stored in x0 (a return value) at this point.
+		 * Normally, with ptrace off, syscall(-1) returns -ENOSYS.
+		 * With ptrace on, however, if a tracer didn't pay any
+		 * attention to user-issued syscall(-1) and just let it go
+		 * without a hack here, it would return a value in x0 as in
+		 * other system call cases. This means that this system call
+		 * might succeed and see any bogus return value.
+		 * This should be definitely avoided.
+		 */
+		regs->regs[0] = -ENOSYS;
+	}
I'm still really uncomfortable with this, and it doesn't seem to match what arch/arm/ does either.
Yeah, I know but as I mentioned before, syscall(-1) will be signaled on arm, and so we don't have to care about a return value :)
What does x86 do?
On x86, syscall(-1) returns -ENOSYS if not traced, and we can change a return value if traced.
Doesn't it also prevent a tracer from skipping syscall(-1)?
Syscall(-1) will return -ENOSYS whether or not a syscallno is explicitly replaced with -1 by a tracer, and, in this sense, it is *skipped*.
Ok, but now userspace sees -ENOSYS for a skipped system call in that case, whereas it would usually see whatever the tracer put in x0, right?
Yes. If you don't really like this behavior, how about this patch instead of my [2/6] patch?
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 726b910..1ef57d0 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -668,8 +668,15 @@ ENDPROC(el0_svc)
  * switches, and waiting for our parent to respond.
  */
 __sys_trace:
+	cmp	w8, #-1				// default errno for invalid
+	b.ne	1f				// system call
+	mov	x0, #-ENOSYS
+	str	x0, [sp, #S_X0]
+1:
 	mov	x0, sp
 	bl	syscall_trace_enter
+	cmp	w0, #-1				// skip the syscall?
+	b.eq	__sys_trace_return_skipped
 	adr	lr, __sys_trace_return		// return address
 	uxtw	scno, w0			// syscall number (possibly new)
 	mov	x1, sp				// pointer to regs
@@ -684,6 +691,7 @@ __sys_trace:

 __sys_trace_return:
 	str	x0, [sp]			// save returned x0
+__sys_trace_return_skipped:
 	mov	x0, sp
 	bl	syscall_trace_exit
 	b	ret_to_user
With this change, I believe, syscall(-1) returns -ENOSYS by default whether traced or not, and still you can change a return value when tracing. (But a drawback here is that a tracer will see -ENOSYS in x0 even at syscall entry for syscall(-1).)
-Takahiro AKASHI
Will
On 11/20/2014 02:13 PM, AKASHI Takahiro wrote:
On 11/20/2014 04:06 AM, Will Deacon wrote:
On Wed, Nov 19, 2014 at 08:46:19AM +0000, AKASHI Takahiro wrote:
On 11/18/2014 11:04 PM, Will Deacon wrote:
On Tue, Nov 18, 2014 at 01:10:34AM +0000, AKASHI Takahiro wrote:
+	if (((int)regs->syscallno == -1) && (orig_syscallno == -1)) {
+		/*
+		 * user-issued syscall(-1):
+		 * RESTRICTION: We always return ENOSYS whatever value is
+		 *   stored in x0 (a return value) at this point.
+		 * Normally, with ptrace off, syscall(-1) returns -ENOSYS.
+		 * With ptrace on, however, if a tracer didn't pay any
+		 * attention to user-issued syscall(-1) and just let it go
+		 * without a hack here, it would return a value in x0 as in
+		 * other system call cases. This means that this system call
+		 * might succeed and see any bogus return value.
+		 * This should be definitely avoided.
+		 */
+		regs->regs[0] = -ENOSYS;
+	}
I'm still really uncomfortable with this, and it doesn't seem to match what arch/arm/ does either.
Yeah, I know but as I mentioned before, syscall(-1) will be signaled on arm, and so we don't have to care about a return value :)
What does x86 do?
On x86, syscall(-1) returns -ENOSYS if not traced, and we can change a return value if traced.
Doesn't it also prevent a tracer from skipping syscall(-1)?
Syscall(-1) will return -ENOSYS whether or not a syscallno is explicitly replaced with -1 by a tracer, and, in this sense, it is *skipped*.
Ok, but now userspace sees -ENOSYS for a skipped system call in that case, whereas it would usually see whatever the tracer put in x0, right?
Yes. If you don't really like this behavior, how about this patch instead of my [2/6] patch?
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 726b910..1ef57d0 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -668,8 +668,15 @@ ENDPROC(el0_svc) * switches, and waiting for our parent to respond. */ __sys_trace:
+	cmp	w8, #-1				// default errno for invalid
I needed to correct the code here: w8 should be w26, thinking of compat syscalls.
+	b.ne	1f				// system call
+	mov	x0, #-ENOSYS
+	str	x0, [sp, #S_X0]
+1:
and this part might better be generalized like the following:
__sys_trace:
	cmp	w26, w25		// cannot use x26 and x25 here
	b.hs	1f			// scno > sc_nr || scno < 0
	b	2f
1:
	mov	x0, #-ENOSYS
	str	x0, [sp, #S_X0]
2:
If you are comfortable with this, I will submit a new patch soon.
-Takahiro AKASHI
mov x0, sp bl syscall_trace_enter
cmp w0, #-1 // skip the syscall?
b.eq __sys_trace_return_skipped adr lr, __sys_trace_return // return address uxtw scno, w0 // syscall number (possibly new) mov x1, sp // pointer to regs
@@ -684,6 +691,7 @@ __sys_trace:
__sys_trace_return: str x0, [sp] // save returned x0 +__sys_trace_return_skipped: mov x0, sp bl syscall_trace_exit b ret_to_user
With this change, I believe, syscall(-1) returns -ENOSYS by default whether traced or not, and still you can change a return value when tracing. (But a drawback here is that a tracer will see -ENOSYS in x0 even at syscall entry for syscall(-1).)
-Takahiro AKASHI
Will
On Thu, Nov 20, 2014 at 05:52:34AM +0000, AKASHI Takahiro wrote:
On 11/20/2014 02:13 PM, AKASHI Takahiro wrote:
On 11/20/2014 04:06 AM, Will Deacon wrote:
Ok, but now userspace sees -ENOSYS for a skipped system call in that case, whereas it would usually see whatever the tracer put in x0, right?
Yes. If you don't really like this behavior, how about this patch instead of my [2/6] patch?
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 726b910..1ef57d0 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -668,8 +668,15 @@ ENDPROC(el0_svc) * switches, and waiting for our parent to respond. */ __sys_trace:
+	cmp	w8, #-1				// default errno for invalid
I needed to correct the code here: w8 should be w26, thinking of compat syscalls.
+	b.ne	1f				// system call
+	mov	x0, #-ENOSYS
+	str	x0, [sp, #S_X0]
+1:
and this part might better be generalized like the following:
__sys_trace:
	cmp	w26, w25		// cannot use x26 and x25 here
	b.hs	1f			// scno > sc_nr || scno < 0
	b	2f
1:
	mov	x0, #-ENOSYS
	str	x0, [sp, #S_X0]
2:
If you are comfortable with this, I will submit a new patch soon.
Yes, please send a new series including this change.
Will
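The generalized check quoted above relies on a single unsigned comparison to catch both negative and too-large syscall numbers. A rough C model of that idea follows; scno_is_invalid() and default_x0() are hypothetical names, and the model assumes w26 holds the syscall number and w25 the table size, as the quoted comment suggests. Note that b.hs branches on "higher or same", so a number equal to the table size is also treated as invalid.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when scno lies outside [0, sc_nr). Casting to unsigned makes
 * every negative number compare as a very large value, so one test
 * covers both ends of the range (the cmp + b.hs pair in the asm).
 */
bool scno_is_invalid(int32_t scno, int32_t sc_nr)
{
	return (uint32_t)scno >= (uint32_t)sc_nr;
}

/*
 * Value the tracer would find in x0 at the entry stop: -ENOSYS is
 * preloaded for invalid numbers, otherwise x0 is left untouched.
 */
long default_x0(int32_t scno, int32_t sc_nr, long user_x0)
{
	return scno_is_invalid(scno, sc_nr) ? -ENOSYS : user_x0;
}
```

Because the default is stored before the tracer runs, the tracer can still overwrite x0 afterwards, which is the behavioral difference from the v8 [2/6] patch discussed in this thread.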
On Thu, Nov 20, 2014 at 05:13:04AM +0000, AKASHI Takahiro wrote:
On 11/20/2014 04:06 AM, Will Deacon wrote:
On Wed, Nov 19, 2014 at 08:46:19AM +0000, AKASHI Takahiro wrote:
Syscall(-1) will return -ENOSYS whether or not a syscallno is explicitly replaced with -1 by a tracer, and, in this sense, it is *skipped*.
Ok, but now userspace sees -ENOSYS for a skipped system call in that case, whereas it would usually see whatever the tracer put in x0, right?
If you don't really like this behavior, how about this patch instead of my [2/6] patch?
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 726b910..1ef57d0 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -668,8 +668,15 @@ ENDPROC(el0_svc) * switches, and waiting for our parent to respond. */ __sys_trace:
cmp w8, #-1 // default errno for invalid
b.ne 1f // system call
mov x0, #-ENOSYS
str x0, [sp, #S_X0]
+1: mov x0, sp bl syscall_trace_enter
cmp w0, #-1 // skip the syscall?
b.eq __sys_trace_return_skipped adr lr, __sys_trace_return // return address uxtw scno, w0 // syscall number (possibly new) mov x1, sp // pointer to regs
@@ -684,6 +691,7 @@ __sys_trace:
__sys_trace_return: str x0, [sp] // save returned x0 +__sys_trace_return_skipped: mov x0, sp bl syscall_trace_exit b ret_to_user
With this change, I believe, syscall(-1) returns -ENOSYS by default whether traced or not, and still you can change a return value when tracing. (But a drawback here is that a tracer will see -ENOSYS in x0 even at syscall entry for syscall(-1).)
But it's exactly these drawbacks that I'm objecting to. syscall(-1) shouldn't be treated any differently to syscall(42) with respect to restarting, exactly like x86.
Will
On 11/21/2014 04:17 AM, Will Deacon wrote:
On Thu, Nov 20, 2014 at 05:13:04AM +0000, AKASHI Takahiro wrote:
On 11/20/2014 04:06 AM, Will Deacon wrote:
On Wed, Nov 19, 2014 at 08:46:19AM +0000, AKASHI Takahiro wrote:
Syscall(-1) will return -ENOSYS whether or not a syscallno is explicitly replaced with -1 by a tracer, and, in this sense, it is *skipped*.
Ok, but now userspace sees -ENOSYS for a skipped system call in that case, whereas it would usually see whatever the tracer put in x0, right?
If you don't really like this behavior, how about this patch instead of my [2/6] patch?
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 726b910..1ef57d0 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -668,8 +668,15 @@ ENDPROC(el0_svc) * switches, and waiting for our parent to respond. */ __sys_trace:
cmp w8, #-1 // default errno for invalid
b.ne 1f // system call
mov x0, #-ENOSYS
str x0, [sp, #S_X0]
+1: mov x0, sp bl syscall_trace_enter
cmp w0, #-1 // skip the syscall?
b.eq __sys_trace_return_skipped adr lr, __sys_trace_return // return address uxtw scno, w0 // syscall number (possibly new) mov x1, sp // pointer to regs
@@ -684,6 +691,7 @@ __sys_trace:
__sys_trace_return: str x0, [sp] // save returned x0 +__sys_trace_return_skipped: mov x0, sp bl syscall_trace_exit b ret_to_user
With this change, I believe, syscall(-1) returns -ENOSYS by default whether traced or not, and still you can change a return value when tracing. (But a drawback here is that a tracer will see -ENOSYS in x0 even at syscall entry for syscall(-1).)
But it's exactly these drawbacks that I'm objecting to. syscall(-1) shouldn't be treated any differently to syscall(42) with respect to restarting, exactly like x86.
Can you elaborate a bit more as to "restarting"? We can't make any assumption about the number of arguments taken by *invalid* syscall(-1), and so changing a value in x0 (or any other register) doesn't make any difference.
-Takahiro AKASHI
Will
On Tue, Nov 25, 2014 at 07:42:10AM +0000, AKASHI Takahiro wrote:
On 11/21/2014 04:17 AM, Will Deacon wrote:
On Thu, Nov 20, 2014 at 05:13:04AM +0000, AKASHI Takahiro wrote:
On 11/20/2014 04:06 AM, Will Deacon wrote:
On Wed, Nov 19, 2014 at 08:46:19AM +0000, AKASHI Takahiro wrote:
Syscall(-1) will return -ENOSYS whether or not a syscallno is explicitly replaced with -1 by a tracer, and, in this sense, it is *skipped*.
Ok, but now userspace sees -ENOSYS for a skipped system call in that case, whereas it would usually see whatever the tracer put in x0, right?
If you don't really like this behavior, how about this patch instead of my [2/6] patch?
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S index 726b910..1ef57d0 100644 --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -668,8 +668,15 @@ ENDPROC(el0_svc) * switches, and waiting for our parent to respond. */ __sys_trace:
cmp w8, #-1 // default errno for invalid
b.ne 1f // system call
mov x0, #-ENOSYS
str x0, [sp, #S_X0]
+1: mov x0, sp bl syscall_trace_enter
cmp w0, #-1 // skip the syscall?
b.eq __sys_trace_return_skipped adr lr, __sys_trace_return // return address uxtw scno, w0 // syscall number (possibly new) mov x1, sp // pointer to regs
@@ -684,6 +691,7 @@ __sys_trace:
__sys_trace_return: str x0, [sp] // save returned x0 +__sys_trace_return_skipped: mov x0, sp bl syscall_trace_exit b ret_to_user
With this change, I believe, syscall(-1) returns -ENOSYS by default whether traced or not, and still you can change a return value when tracing. (But a drawback here is that a tracer will see -ENOSYS in x0 even at syscall entry for syscall(-1).)
But it's exactly these drawbacks that I'm objecting to. syscall(-1) shouldn't be treated any differently to syscall(42) with respect to restarting, exactly like x86.
Can you elaborate a bit more as to "restarting"?
Sorry, I meant skipping. There was another thread about syscall restarting at the same time I wrote that, so my mind was elsewhere!
We can't make any assumption about the number of arguments taken by *invalid* syscall(-1), and so changing a value in x0 (or any other register) doesn't make any difference.
Ok, that's a fair point.
Will
On Thu, Nov 20, 2014 at 02:13:04PM +0900, AKASHI Takahiro wrote:
On 11/20/2014 04:06 AM, Will Deacon wrote:
What does x86 do?
On x86, syscall(-1) returns -ENOSYS if not traced, and we can change a return value if traced.
... which is used for UML (user mode Linux). UML works by spawning processes under the host kernel, which run with syscall tracing enabled and with the UML kernel as the tracer. The UML kernel tracer receives the syscall trace event when the child tries to execute a syscall, decodes the syscall, executes the syscall in the UML kernel, and then cancels the syscall in the host kernel, setting the return code appropriately.
Those values (__NR_seccomp_*) are used solely in secure_computing() to identify mode 1 system calls. If compat system calls have different syscall numbers, asm/seccomp.h may override them.
Acked-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 include/asm-generic/seccomp.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
 create mode 100644 include/asm-generic/seccomp.h
diff --git a/include/asm-generic/seccomp.h b/include/asm-generic/seccomp.h
new file mode 100644
index 0000000..9fa1f65
--- /dev/null
+++ b/include/asm-generic/seccomp.h
@@ -0,0 +1,30 @@
+/*
+ * include/asm-generic/seccomp.h
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_GENERIC_SECCOMP_H
+#define _ASM_GENERIC_SECCOMP_H
+
+#include <linux/unistd.h>
+
+#if defined(CONFIG_COMPAT) && !defined(__NR_seccomp_read_32)
+#define __NR_seccomp_read_32		__NR_read
+#define __NR_seccomp_write_32		__NR_write
+#define __NR_seccomp_exit_32		__NR_exit
+#define __NR_seccomp_sigreturn_32	__NR_rt_sigreturn
+#endif /* CONFIG_COMPAT && ! already defined */
+
+#define __NR_seccomp_read		__NR_read
+#define __NR_seccomp_write		__NR_write
+#define __NR_seccomp_exit		__NR_exit
+#ifndef __NR_seccomp_sigreturn
+#define __NR_seccomp_sigreturn		__NR_rt_sigreturn
+#endif
+
+#endif /* _ASM_GENERIC_SECCOMP_H */
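As a rough userspace illustration of how such a set of numbers identifies the mode 1 system calls, the sketch below checks a syscall number against the four allowed calls. It is not the kernel's secure_computing() code: mode1_syscalls and mode1_allows() are hypothetical names, and the SYS_* constants from <sys/syscall.h> stand in for the __NR_seccomp_* definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/syscall.h>

/* The four system calls permitted in seccomp mode 1 (strict mode). */
static const int mode1_syscalls[] = {
	SYS_read,		/* stands in for __NR_seccomp_read */
	SYS_write,		/* stands in for __NR_seccomp_write */
	SYS_exit,		/* stands in for __NR_seccomp_exit */
	SYS_rt_sigreturn,	/* stands in for __NR_seccomp_sigreturn */
};

/* Hypothetical helper: true if scno is one of the mode 1 syscalls. */
bool mode1_allows(int scno)
{
	size_t n = sizeof(mode1_syscalls) / sizeof(mode1_syscalls[0]);

	for (size_t i = 0; i < n; i++)
		if (mode1_syscalls[i] == scno)
			return true;
	return false;
}
```

A compat task would need a second table built from the _32 variants, which is why the header lets an architecture override those names.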
This patch allows a compat task to issue the seccomp() system call.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/include/asm/unistd32.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 9dfdac4..8893ceb 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -787,7 +787,8 @@ __SYSCALL(__NR_sched_setattr, sys_sched_setattr)
 __SYSCALL(__NR_sched_getattr, sys_sched_getattr)
 #define __NR_renameat2 382
 __SYSCALL(__NR_renameat2, sys_renameat2)
-			/* 383 for seccomp */
+#define __NR_seccomp 383
+__SYSCALL(__NR_seccomp, sys_seccomp)
 #define __NR_getrandom 384
 __SYSCALL(__NR_getrandom, sys_getrandom)
 #define __NR_memfd_create 385
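One way a userspace program (32-bit or 64-bit) might check whether seccomp() is wired up is to issue it with a NULL filter and look at errno. This is only a sketch under the assumption that a kernel implementing seccomp() rejects the NULL pointer with EFAULT while a kernel without it returns ENOSYS; seccomp_syscall_available() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/seccomp.h>

/* Returns 1 if seccomp() appears to be implemented, 0 otherwise. */
int seccomp_syscall_available(void)
{
#ifdef SYS_seccomp
	long ret;

	errno = 0;
	/* No filter is installed: the NULL argument makes the call fail. */
	ret = syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, NULL);
	return ret == -1 && errno == EFAULT;
#else
	return 0;	/* headers too old to know the syscall number */
#endif
}
```

The probe never changes the caller's seccomp state, so it is safe to run before deciding between seccomp() and the older prctl() interface.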
SIGSYS is primarily used in secure computing to notify the tracer of syscall events. This patch allows a signal handler in a compat task to get correct information, when SA_SIGINFO is specified, upon delivery of this signal.
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/include/asm/compat.h | 7 +++++++
 arch/arm64/kernel/signal32.c    | 6 ++++++
 2 files changed, 13 insertions(+)
diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index 56de5aa..3fb053f 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -205,6 +205,13 @@ typedef struct compat_siginfo {
 			compat_long_t _band;	/* POLL_IN, POLL_OUT, POLL_MSG */
 			int _fd;
 		} _sigpoll;
+
+		/* SIGSYS */
+		struct {
+			compat_uptr_t _call_addr; /* calling user insn */
+			int _syscall;	/* triggering system call number */
+			compat_uint_t _arch;	/* AUDIT_ARCH_* of syscall */
+		} _sigsys;
 	} _sifields;
 } compat_siginfo_t;
diff --git a/arch/arm64/kernel/signal32.c b/arch/arm64/kernel/signal32.c
index 1b9ad02..5a1ba6e 100644
--- a/arch/arm64/kernel/signal32.c
+++ b/arch/arm64/kernel/signal32.c
@@ -186,6 +186,12 @@ int copy_siginfo_to_user32(compat_siginfo_t __user *to, const siginfo_t *from)
 		err |= __put_user(from->si_uid, &to->si_uid);
 		err |= __put_user((compat_uptr_t)(unsigned long)from->si_ptr, &to->si_ptr);
 		break;
+	case __SI_SYS:
+		err |= __put_user((compat_uptr_t)(unsigned long)
+				from->si_call_addr, &to->si_call_addr);
+		err |= __put_user(from->si_syscall, &to->si_syscall);
+		err |= __put_user(from->si_arch, &to->si_arch);
+		break;
 	default: /* this is just in case for now ... */
 		err |= __put_user(from->si_pid, &to->si_pid);
 		err |= __put_user(from->si_uid, &to->si_uid);
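For illustration only (not part of the patch): the three fields copied in the __SI_SYS case are what a userspace SA_SIGINFO handler reads when a seccomp filter raises SIGSYS. A minimal sketch of such a handler follows. The function and variable names are hypothetical, and the fallback definition of SYS_SECCOMP assumes the value 1 used by the kernel's siginfo header.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1	/* si_code for a seccomp-generated SIGSYS */
#endif

/* Hypothetical bookkeeping, written by the handler and read afterwards. */
static volatile sig_atomic_t sigsys_seen;
static volatile sig_atomic_t sigsys_from_seccomp;
static volatile sig_atomic_t sigsys_nr;
static void *volatile sigsys_addr;
static volatile unsigned int sigsys_arch;

static void sigsys_handler(int sig, siginfo_t *info, void *ucontext)
{
	(void)sig;
	(void)ucontext;

	sigsys_seen = 1;
	/* The _sigsys fields are only meaningful for seccomp-raised SIGSYS. */
	if (info->si_code == SYS_SECCOMP) {
		sigsys_from_seccomp = 1;
		sigsys_addr = info->si_call_addr;	/* calling user insn */
		sigsys_nr = info->si_syscall;		/* triggering syscall */
		sigsys_arch = info->si_arch;		/* AUDIT_ARCH_* value */
	}
}

/* Install the handler with SA_SIGINFO; returns 0 on success, -1 on error. */
static int install_sigsys_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigsys_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGSYS, &sa, NULL);
}
```

A SIGSYS sent by raise() or kill() carries a different si_code, so the handler records the delivery but leaves the seccomp-specific fields untouched in that case.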
secure_computing() is called first in syscall_trace_enter() so that a system call is aborted quickly, without any subsequent syscall tracing, if the seccomp rules deny it.
On compat tasks, the syscall numbers for the system calls allowed in seccomp mode 1 differ from those on normal tasks, and so the __NR_seccomp_xxx_32 definitions need to be redefined.
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm64/Kconfig               | 14 ++++++++++++++
 arch/arm64/include/asm/seccomp.h | 25 +++++++++++++++++++++++++
 arch/arm64/include/asm/unistd.h  |  3 +++
 arch/arm64/kernel/ptrace.c       |  5 +++++
 4 files changed, 47 insertions(+)
 create mode 100644 arch/arm64/include/asm/seccomp.h
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 9532f8d..f495d3c 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -37,6 +37,7 @@ config ARM64
 	select HAVE_ARCH_AUDITSYSCALL
 	select HAVE_ARCH_JUMP_LABEL
 	select HAVE_ARCH_KGDB
+	select HAVE_ARCH_SECCOMP_FILTER
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_BPF_JIT
 	select HAVE_C_RECORDMCOUNT
@@ -345,6 +346,19 @@ config ARCH_HAS_CACHE_LINE_SIZE
 
 source "mm/Kconfig"
 
+config SECCOMP
+	bool "Enable seccomp to safely compute untrusted bytecode"
+	---help---
+	  This kernel feature is useful for number crunching applications
+	  that may need to compute untrusted bytecode during their
+	  execution. By using pipes or other transports made available to
+	  the process as file descriptors supporting the read/write
+	  syscalls, it's possible to isolate those applications in
+	  their own address space using seccomp. Once seccomp is
+	  enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
+	  and the task is only allowed to execute a few safe syscalls
+	  defined by each seccomp mode.
+
 config XEN_DOM0
 	def_bool y
 	depends on XEN
diff --git a/arch/arm64/include/asm/seccomp.h b/arch/arm64/include/asm/seccomp.h
new file mode 100644
index 0000000..c76fac9
--- /dev/null
+++ b/arch/arm64/include/asm/seccomp.h
@@ -0,0 +1,25 @@
+/*
+ * arch/arm64/include/asm/seccomp.h
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef _ASM_SECCOMP_H
+#define _ASM_SECCOMP_H
+
+#include <asm/unistd.h>
+
+#ifdef CONFIG_COMPAT
+#define __NR_seccomp_read_32		__NR_compat_read
+#define __NR_seccomp_write_32		__NR_compat_write
+#define __NR_seccomp_exit_32		__NR_compat_exit
+#define __NR_seccomp_sigreturn_32	__NR_compat_rt_sigreturn
+#endif /* CONFIG_COMPAT */
+
+#include <asm-generic/seccomp.h>
+
+#endif /* _ASM_SECCOMP_H */
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 6d2bf41..49c9aef 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -31,6 +31,9 @@
  * Compat syscall numbers used by the AArch64 kernel.
  */
 #define __NR_compat_restart_syscall	0
+#define __NR_compat_exit		1
+#define __NR_compat_read		3
+#define __NR_compat_write		4
 #define __NR_compat_sigreturn		119
 #define __NR_compat_rt_sigreturn	173
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 34b1e85..f2554eb 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -27,6 +27,7 @@
 #include <linux/smp.h>
 #include <linux/ptrace.h>
 #include <linux/user.h>
+#include <linux/seccomp.h>
 #include <linux/security.h>
 #include <linux/init.h>
 #include <linux/signal.h>
@@ -1151,6 +1152,10 @@ asmlinkage int syscall_trace_enter(struct pt_regs *regs)
 {
 	int orig_syscallno = regs->syscallno;
 
+	/* Do the secure computing check first; failures should be fast. */
+	if (secure_computing() == -1)
+		return -1;
+
 	if (test_thread_flag(TIF_SYSCALL_TRACE))
 		tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER);
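For illustration only (not part of the patch): the reason the _32 definitions matter can be sketched in plain userspace C. In seccomp mode 1 only read, write, exit and rt_sigreturn are permitted, and the check has to be made against the table that matches the task's ABI. The compat numbers below are the __NR_compat_* values added by this patch; the native arm64 numbers (63, 64, 93, 139) are assumed from the generic syscall table, and mode1_allows() is a hypothetical stand-in for the kernel's strict-mode check, not its actual code.

```c
#include <stddef.h>

/* Native arm64 numbers: read, write, exit, rt_sigreturn (assumed values). */
static const int mode1_native[] = { 63, 64, 93, 139 };

/* Compat numbers: __NR_compat_{read,write,exit,rt_sigreturn} from the patch. */
static const int mode1_compat[] = { 3, 4, 1, 173 };

/*
 * Hypothetical helper: return 1 if syscall number nr is allowed in
 * seccomp mode 1 for the given ABI, 0 otherwise.
 */
static int mode1_allows(int nr, int is_compat)
{
	const int *table = is_compat ? mode1_compat : mode1_native;
	size_t n = is_compat ? sizeof(mode1_compat) / sizeof(mode1_compat[0])
			     : sizeof(mode1_native) / sizeof(mode1_native[0]);
	size_t i;

	for (i = 0; i < n; i++)
		if (table[i] == nr)
			return 1;
	return 0;
}
```

With these tables, number 63 is accepted for a native task but rejected for a compat task, while number 3 is accepted only for a compat task, which is why a single shared table would not work.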