- Linux-kselftest-mirror - lists.linaro.org

[PATCH 4.19 053/114] ftrace/x86_64: Emulate call function while updating in breakpoint handler

by Greg Kroah-Hartman

From: Peter Zijlstra <peterz(a)infradead.org> commit 9e298e8604088a600d8100a111a532a9d342af09 upstream. Nicolai Stange discovered[1] that if live kernel patching is enabled, and the function tracer started tracing the same function that was patched, the conversion of the fentry call site during the translation of going from calling the live kernel patch trampoline to the iterator trampoline, would have as slight window where it didn't call anything. As live kernel patching depends on ftrace to always call its code (to prevent the function being traced from being called, as it will redirect it). This small window would allow the old buggy function to be called, and this can cause undesirable results. Nicolai submitted new patches[2] but these were controversial. As this is similar to the static call emulation issues that came up a while ago[3]. But after some debate[4][5] adding a gap in the stack when entering the breakpoint handler allows for pushing the return address onto the stack to easily emulate a call. [1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de [2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de [3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.154171145… [4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=… [5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_… [ Live kernel patching is not implemented on x86_32, thus the emulate calls are only for x86_64. ] Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Nicolai Stange <nstange(a)suse.de> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: the arch/x86 maintainers <x86(a)kernel.org> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Jiri Kosina <jikos(a)kernel.org> Cc: Miroslav Benes <mbenes(a)suse.cz> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Joe Lawrence <joe.lawrence(a)redhat.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Cc: Mimi Zohar <zohar(a)linux.ibm.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Nick Desaulniers <ndesaulniers(a)google.com> Cc: Nayna Jain <nayna(a)linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com> Cc: Joerg Roedel <jroedel(a)suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org> Cc: stable(a)vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> [ Changed to only implement emulated calls for x86_64 ] Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/kernel/ftrace.c | 32 +++++++++++++++++++++++++++----- 1 file changed, 27 insertions(+), 5 deletions(-) --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -29,6 +29,7 @@ #include <asm/kprobes.h> #include <asm/ftrace.h> #include <asm/nops.h> +#include <asm/text-patching.h> #ifdef CONFIG_DYNAMIC_FTRACE @@ -228,6 +229,7 @@ int ftrace_modify_call(struct dyn_ftrace } static unsigned long ftrace_update_func; +static unsigned long ftrace_update_func_call; static int update_ftrace_func(unsigned long ip, void *new) { @@ -256,6 +258,8 @@ int ftrace_update_ftrace_func(ftrace_fun unsigned char *new; int ret; + ftrace_update_func_call = (unsigned long)func; + new = ftrace_call_replace(ip, (unsigned long)func); ret = update_ftrace_func(ip, new); @@ -291,13 +295,28 @@ int ftrace_int3_handler(struct pt_regs * if (WARN_ON_ONCE(!regs)) return 0; - ip = regs->ip - 1; - if (!ftrace_location(ip) && !is_ftrace_caller(ip)) - return 0; + ip = regs->ip - INT3_INSN_SIZE; - regs->ip += MCOUNT_INSN_SIZE - 1; +#ifdef CONFIG_X86_64 + if (ftrace_location(ip)) { + int3_emulate_call(regs, (unsigned long)ftrace_regs_caller); + return 1; + } else if (is_ftrace_caller(ip)) { + if (!ftrace_update_func_call) { + int3_emulate_jmp(regs, ip + CALL_INSN_SIZE); + return 1; + } + int3_emulate_call(regs, ftrace_update_func_call); + return 1; + } +#else + if (ftrace_location(ip) || is_ftrace_caller(ip)) { + int3_emulate_jmp(regs, ip + CALL_INSN_SIZE); + return 1; + } +#endif - return 1; + return 0; } static int ftrace_write(unsigned long ip, const char *val, int size) @@ -868,6 +887,8 @@ void arch_ftrace_update_trampoline(struc func = ftrace_ops_get_func(ops); + ftrace_update_func_call = (unsigned long)func; + /* Do a safe modify in case the trampoline is executing */ new = ftrace_call_replace(ip, (unsigned long)func); ret = update_ftrace_func(ip, new); @@ -964,6 +985,7 @@ static int ftrace_mod_jmp(unsigned long { unsigned char *new; + ftrace_update_func_call = 0UL; new = ftrace_jmp_replace(ip, (unsigned long)func); return update_ftrace_func(ip, new);

6 years, 1 month

1
0
0 0

[PATCH 4.19 052/114] x86_64: Allow breakpoints to emulate call instructions

by Greg Kroah-Hartman

From: Peter Zijlstra <peterz(a)infradead.org> commit 4b33dadf37666c0860b88f9e52a16d07bf6d0b03 upstream. In order to allow breakpoints to emulate call instructions, they need to push the return address onto the stack. The x86_64 int3 handler adds a small gap to allow the stack to grow some. Use this gap to add the return address to be able to emulate a call instruction at the breakpoint location. These helper functions are added: int3_emulate_jmp(): changes the location of the regs->ip to return there. (The next two are only for x86_64) int3_emulate_push(): to push the address onto the gap in the stack int3_emulate_call(): push the return address and change regs->ip Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Nicolai Stange <nstange(a)suse.de> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: the arch/x86 maintainers <x86(a)kernel.org> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Jiri Kosina <jikos(a)kernel.org> Cc: Miroslav Benes <mbenes(a)suse.cz> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Joe Lawrence <joe.lawrence(a)redhat.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Cc: Mimi Zohar <zohar(a)linux.ibm.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Nick Desaulniers <ndesaulniers(a)google.com> Cc: Nayna Jain <nayna(a)linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com> Cc: Joerg Roedel <jroedel(a)suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org> Cc: stable(a)vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> [ Modified to only work for x86_64 and added comment to int3_emulate_push() ] Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/include/asm/text-patching.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -39,4 +39,32 @@ extern int poke_int3_handler(struct pt_r extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler); extern int after_bootmem; +static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip) +{ + regs->ip = ip; +} + +#define INT3_INSN_SIZE 1 +#define CALL_INSN_SIZE 5 + +#ifdef CONFIG_X86_64 +static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val) +{ + /* + * The int3 handler in entry_64.S adds a gap between the + * stack where the break point happened, and the saving of + * pt_regs. We can extend the original stack because of + * this gap. See the idtentry macro's create_gap option. + */ + regs->sp -= sizeof(unsigned long); + *(unsigned long *)regs->sp = val; +} + +static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func) +{ + int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE); + int3_emulate_jmp(regs, func); +} +#endif + #endif /* _ASM_X86_TEXT_PATCHING_H */

6 years, 1 month

1
0
0 0

[PATCH 4.14 33/77] x86_64: Allow breakpoints to emulate call instructions

by Greg Kroah-Hartman

From: Peter Zijlstra <peterz(a)infradead.org> commit 4b33dadf37666c0860b88f9e52a16d07bf6d0b03 upstream. In order to allow breakpoints to emulate call instructions, they need to push the return address onto the stack. The x86_64 int3 handler adds a small gap to allow the stack to grow some. Use this gap to add the return address to be able to emulate a call instruction at the breakpoint location. These helper functions are added: int3_emulate_jmp(): changes the location of the regs->ip to return there. (The next two are only for x86_64) int3_emulate_push(): to push the address onto the gap in the stack int3_emulate_call(): push the return address and change regs->ip Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Nicolai Stange <nstange(a)suse.de> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: the arch/x86 maintainers <x86(a)kernel.org> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Jiri Kosina <jikos(a)kernel.org> Cc: Miroslav Benes <mbenes(a)suse.cz> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Joe Lawrence <joe.lawrence(a)redhat.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Cc: Mimi Zohar <zohar(a)linux.ibm.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Nick Desaulniers <ndesaulniers(a)google.com> Cc: Nayna Jain <nayna(a)linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com> Cc: Joerg Roedel <jroedel(a)suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org> Cc: stable(a)vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> [ Modified to only work for x86_64 and added comment to int3_emulate_push() ] Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- arch/x86/include/asm/text-patching.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -38,4 +38,32 @@ extern void *text_poke(void *addr, const extern int poke_int3_handler(struct pt_regs *regs); extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler); +static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip) +{ + regs->ip = ip; +} + +#define INT3_INSN_SIZE 1 +#define CALL_INSN_SIZE 5 + +#ifdef CONFIG_X86_64 +static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val) +{ + /* + * The int3 handler in entry_64.S adds a gap between the + * stack where the break point happened, and the saving of + * pt_regs. We can extend the original stack because of + * this gap. See the idtentry macro's create_gap option. + */ + regs->sp -= sizeof(unsigned long); + *(unsigned long *)regs->sp = val; +} + +static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func) +{ + int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE); + int3_emulate_jmp(regs, func); +} +#endif + #endif /* _ASM_X86_TEXT_PATCHING_H */

6 years, 1 month

1
0
0 0

[PATCH bpf v1 0/3] Test the 32bit narrow reads

by Krzesimir Nowak

These patches try to test the fix made in commit e2f7fc0ac695 ("bpf: fix undefined behavior in narrow load handling"). The problem existed in the generated BPF bytecode that was doing a 32bit narrow read of a 64bit field, so to test it the code would need to be executed. Currently the only such field exists in BPF_PROG_TYPE_PERF_EVENT, which is not supported by bpf_prog_test_run(). First commit adds the test, but since I can't run it, I'm not sure if it is even valid (because endianness, for example). Second commit adds a message to test_verifier when it couldn't run the program because of lack permissions or program type being not supported. Third commit fixes a possible problem with errno. With these patches, I get the following output on my test: /kernel/tools/testing/selftests/bpf$ sudo ./test_verifier 920 920 #920/p 32bit loads of a 64bit field (both least and most significant words) Did not run the program (not supported) OK Summary: 1 PASSED, 0 SKIPPED, 0 FAILED So it looks like I need to pick a different approach. Krzesimir Nowak (3): selftests/bpf: Test correctness of narrow 32bit read on 64bit field selftests/bpf: Print a message when tester could not run a program selftests/bpf: Avoid a clobbering of errno tools/testing/selftests/bpf/test_verifier.c | 19 +++++++++++++++---- .../testing/selftests/bpf/verifier/var_off.c | 15 +++++++++++++++ 2 files changed, 30 insertions(+), 4 deletions(-) -- 2.20.1

6 years, 1 month

3
11
0 0

[RFC PATCH 0/4] KVM selftests for s390x

by Thomas Huth

This patch series enables the KVM selftests for s390x. As a first test, the sync_regs from x86 has been adapted to s390x. Please note that the ucall() interface is not used yet - since s390x neither has PIO nor MMIO, this needs some more work first before it becomes usable (we likely should use a DIAG hypercall here, which is what the sync_reg test is currently using, too...). Thomas Huth (4): KVM: selftests: Guard struct kvm_vcpu_events with __KVM_HAVE_VCPU_EVENTS KVM: selftests: Align memory region addresses to 1M on s390x KVM: selftests: Add processor code for s390x KVM: selftests: Add the sync_regs test for s390x MAINTAINERS | 2 + tools/testing/selftests/kvm/Makefile | 3 + .../testing/selftests/kvm/include/kvm_util.h | 2 + .../selftests/kvm/include/s390x/processor.h | 22 ++ tools/testing/selftests/kvm/lib/kvm_util.c | 24 +- .../selftests/kvm/lib/s390x/processor.c | 277 ++++++++++++++++++ .../selftests/kvm/s390x/sync_regs_test.c | 151 ++++++++++ 7 files changed, 476 insertions(+), 5 deletions(-) create mode 100644 tools/testing/selftests/kvm/include/s390x/processor.h create mode 100644 tools/testing/selftests/kvm/lib/s390x/processor.c create mode 100644 tools/testing/selftests/kvm/s390x/sync_regs_test.c -- 2.21.0

6 years, 1 month

5
18
0 0

FAILED: patch "[PATCH] ftrace/x86_64: Emulate call function while updating in" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 9e298e8604088a600d8100a111a532a9d342af09 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra <peterz(a)infradead.org> Date: Wed, 1 May 2019 15:11:17 +0200 Subject: [PATCH] ftrace/x86_64: Emulate call function while updating in breakpoint handler Nicolai Stange discovered[1] that if live kernel patching is enabled, and the function tracer started tracing the same function that was patched, the conversion of the fentry call site during the translation of going from calling the live kernel patch trampoline to the iterator trampoline, would have as slight window where it didn't call anything. As live kernel patching depends on ftrace to always call its code (to prevent the function being traced from being called, as it will redirect it). This small window would allow the old buggy function to be called, and this can cause undesirable results. Nicolai submitted new patches[2] but these were controversial. As this is similar to the static call emulation issues that came up a while ago[3]. But after some debate[4][5] adding a gap in the stack when entering the breakpoint handler allows for pushing the return address onto the stack to easily emulate a call. [1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de [2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de [3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.154171145… [4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=… [5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_… [ Live kernel patching is not implemented on x86_32, thus the emulate calls are only for x86_64. ] Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Nicolai Stange <nstange(a)suse.de> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: the arch/x86 maintainers <x86(a)kernel.org> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Jiri Kosina <jikos(a)kernel.org> Cc: Miroslav Benes <mbenes(a)suse.cz> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Joe Lawrence <joe.lawrence(a)redhat.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Cc: Mimi Zohar <zohar(a)linux.ibm.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Nick Desaulniers <ndesaulniers(a)google.com> Cc: Nayna Jain <nayna(a)linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com> Cc: Joerg Roedel <jroedel(a)suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org> Cc: stable(a)vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> [ Changed to only implement emulated calls for x86_64 ] Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index ef49517f6bb2..bd553b3af22e 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -29,6 +29,7 @@ #include <asm/kprobes.h> #include <asm/ftrace.h> #include <asm/nops.h> +#include <asm/text-patching.h> #ifdef CONFIG_DYNAMIC_FTRACE @@ -231,6 +232,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, } static unsigned long ftrace_update_func; +static unsigned long ftrace_update_func_call; static int update_ftrace_func(unsigned long ip, void *new) { @@ -259,6 +261,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func) unsigned char *new; int ret; + ftrace_update_func_call = (unsigned long)func; + new = ftrace_call_replace(ip, (unsigned long)func); ret = update_ftrace_func(ip, new); @@ -294,13 +298,28 @@ int ftrace_int3_handler(struct pt_regs *regs) if (WARN_ON_ONCE(!regs)) return 0; - ip = regs->ip - 1; - if (!ftrace_location(ip) && !is_ftrace_caller(ip)) - return 0; + ip = regs->ip - INT3_INSN_SIZE; - regs->ip += MCOUNT_INSN_SIZE - 1; +#ifdef CONFIG_X86_64 + if (ftrace_location(ip)) { + int3_emulate_call(regs, (unsigned long)ftrace_regs_caller); + return 1; + } else if (is_ftrace_caller(ip)) { + if (!ftrace_update_func_call) { + int3_emulate_jmp(regs, ip + CALL_INSN_SIZE); + return 1; + } + int3_emulate_call(regs, ftrace_update_func_call); + return 1; + } +#else + if (ftrace_location(ip) || is_ftrace_caller(ip)) { + int3_emulate_jmp(regs, ip + CALL_INSN_SIZE); + return 1; + } +#endif - return 1; + return 0; } NOKPROBE_SYMBOL(ftrace_int3_handler); @@ -859,6 +878,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops) func = ftrace_ops_get_func(ops); + ftrace_update_func_call = (unsigned long)func; + /* Do a safe modify in case the trampoline is executing */ new = ftrace_call_replace(ip, (unsigned long)func); ret = update_ftrace_func(ip, new); @@ -960,6 +981,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func) { unsigned char *new; + ftrace_update_func_call = 0UL; new = ftrace_jmp_replace(ip, (unsigned long)func); return update_ftrace_func(ip, new);

6 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] ftrace/x86_64: Emulate call function while updating in" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 9e298e8604088a600d8100a111a532a9d342af09 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra <peterz(a)infradead.org> Date: Wed, 1 May 2019 15:11:17 +0200 Subject: [PATCH] ftrace/x86_64: Emulate call function while updating in breakpoint handler Nicolai Stange discovered[1] that if live kernel patching is enabled, and the function tracer started tracing the same function that was patched, the conversion of the fentry call site during the translation of going from calling the live kernel patch trampoline to the iterator trampoline, would have as slight window where it didn't call anything. As live kernel patching depends on ftrace to always call its code (to prevent the function being traced from being called, as it will redirect it). This small window would allow the old buggy function to be called, and this can cause undesirable results. Nicolai submitted new patches[2] but these were controversial. As this is similar to the static call emulation issues that came up a while ago[3]. But after some debate[4][5] adding a gap in the stack when entering the breakpoint handler allows for pushing the return address onto the stack to easily emulate a call. [1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de [2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de [3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.154171145… [4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=… [5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_… [ Live kernel patching is not implemented on x86_32, thus the emulate calls are only for x86_64. ] Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Nicolai Stange <nstange(a)suse.de> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: the arch/x86 maintainers <x86(a)kernel.org> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Jiri Kosina <jikos(a)kernel.org> Cc: Miroslav Benes <mbenes(a)suse.cz> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Joe Lawrence <joe.lawrence(a)redhat.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Cc: Mimi Zohar <zohar(a)linux.ibm.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Nick Desaulniers <ndesaulniers(a)google.com> Cc: Nayna Jain <nayna(a)linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com> Cc: Joerg Roedel <jroedel(a)suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org> Cc: stable(a)vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> [ Changed to only implement emulated calls for x86_64 ] Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index ef49517f6bb2..bd553b3af22e 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -29,6 +29,7 @@ #include <asm/kprobes.h> #include <asm/ftrace.h> #include <asm/nops.h> +#include <asm/text-patching.h> #ifdef CONFIG_DYNAMIC_FTRACE @@ -231,6 +232,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr, } static unsigned long ftrace_update_func; +static unsigned long ftrace_update_func_call; static int update_ftrace_func(unsigned long ip, void *new) { @@ -259,6 +261,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func) unsigned char *new; int ret; + ftrace_update_func_call = (unsigned long)func; + new = ftrace_call_replace(ip, (unsigned long)func); ret = update_ftrace_func(ip, new); @@ -294,13 +298,28 @@ int ftrace_int3_handler(struct pt_regs *regs) if (WARN_ON_ONCE(!regs)) return 0; - ip = regs->ip - 1; - if (!ftrace_location(ip) && !is_ftrace_caller(ip)) - return 0; + ip = regs->ip - INT3_INSN_SIZE; - regs->ip += MCOUNT_INSN_SIZE - 1; +#ifdef CONFIG_X86_64 + if (ftrace_location(ip)) { + int3_emulate_call(regs, (unsigned long)ftrace_regs_caller); + return 1; + } else if (is_ftrace_caller(ip)) { + if (!ftrace_update_func_call) { + int3_emulate_jmp(regs, ip + CALL_INSN_SIZE); + return 1; + } + int3_emulate_call(regs, ftrace_update_func_call); + return 1; + } +#else + if (ftrace_location(ip) || is_ftrace_caller(ip)) { + int3_emulate_jmp(regs, ip + CALL_INSN_SIZE); + return 1; + } +#endif - return 1; + return 0; } NOKPROBE_SYMBOL(ftrace_int3_handler); @@ -859,6 +878,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops) func = ftrace_ops_get_func(ops); + ftrace_update_func_call = (unsigned long)func; + /* Do a safe modify in case the trampoline is executing */ new = ftrace_call_replace(ip, (unsigned long)func); ret = update_ftrace_func(ip, new); @@ -960,6 +981,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func) { unsigned char *new; + ftrace_update_func_call = 0UL; new = ftrace_jmp_replace(ip, (unsigned long)func); return update_ftrace_func(ip, new);

6 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] x86_64: Allow breakpoints to emulate call instructions" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 4b33dadf37666c0860b88f9e52a16d07bf6d0b03 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra <peterz(a)infradead.org> Date: Wed, 1 May 2019 15:11:17 +0200 Subject: [PATCH] x86_64: Allow breakpoints to emulate call instructions In order to allow breakpoints to emulate call instructions, they need to push the return address onto the stack. The x86_64 int3 handler adds a small gap to allow the stack to grow some. Use this gap to add the return address to be able to emulate a call instruction at the breakpoint location. These helper functions are added: int3_emulate_jmp(): changes the location of the regs->ip to return there. (The next two are only for x86_64) int3_emulate_push(): to push the address onto the gap in the stack int3_emulate_call(): push the return address and change regs->ip Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Nicolai Stange <nstange(a)suse.de> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: the arch/x86 maintainers <x86(a)kernel.org> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Jiri Kosina <jikos(a)kernel.org> Cc: Miroslav Benes <mbenes(a)suse.cz> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Joe Lawrence <joe.lawrence(a)redhat.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Cc: Mimi Zohar <zohar(a)linux.ibm.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Nick Desaulniers <ndesaulniers(a)google.com> Cc: Nayna Jain <nayna(a)linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com> Cc: Joerg Roedel <jroedel(a)suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org> Cc: stable(a)vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> [ Modified to only work for x86_64 and added comment to int3_emulate_push() ] Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index e85ff65c43c3..05861cc08787 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -39,4 +39,32 @@ extern int poke_int3_handler(struct pt_regs *regs); extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler); extern int after_bootmem; +static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip) +{ + regs->ip = ip; +} + +#define INT3_INSN_SIZE 1 +#define CALL_INSN_SIZE 5 + +#ifdef CONFIG_X86_64 +static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val) +{ + /* + * The int3 handler in entry_64.S adds a gap between the + * stack where the break point happened, and the saving of + * pt_regs. We can extend the original stack because of + * this gap. See the idtentry macro's create_gap option. + */ + regs->sp -= sizeof(unsigned long); + *(unsigned long *)regs->sp = val; +} + +static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func) +{ + int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE); + int3_emulate_jmp(regs, func); +} +#endif + #endif /* _ASM_X86_TEXT_PATCHING_H */

6 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] x86_64: Allow breakpoints to emulate call instructions" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 4b33dadf37666c0860b88f9e52a16d07bf6d0b03 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra <peterz(a)infradead.org> Date: Wed, 1 May 2019 15:11:17 +0200 Subject: [PATCH] x86_64: Allow breakpoints to emulate call instructions In order to allow breakpoints to emulate call instructions, they need to push the return address onto the stack. The x86_64 int3 handler adds a small gap to allow the stack to grow some. Use this gap to add the return address to be able to emulate a call instruction at the breakpoint location. These helper functions are added: int3_emulate_jmp(): changes the location of the regs->ip to return there. (The next two are only for x86_64) int3_emulate_push(): to push the address onto the gap in the stack int3_emulate_call(): push the return address and change regs->ip Cc: Andy Lutomirski <luto(a)kernel.org> Cc: Nicolai Stange <nstange(a)suse.de> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Borislav Petkov <bp(a)alien8.de> Cc: "H. Peter Anvin" <hpa(a)zytor.com> Cc: the arch/x86 maintainers <x86(a)kernel.org> Cc: Josh Poimboeuf <jpoimboe(a)redhat.com> Cc: Jiri Kosina <jikos(a)kernel.org> Cc: Miroslav Benes <mbenes(a)suse.cz> Cc: Petr Mladek <pmladek(a)suse.com> Cc: Joe Lawrence <joe.lawrence(a)redhat.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com> Cc: Tim Chen <tim.c.chen(a)linux.intel.com> Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de> Cc: Mimi Zohar <zohar(a)linux.ibm.com> Cc: Juergen Gross <jgross(a)suse.com> Cc: Nick Desaulniers <ndesaulniers(a)google.com> Cc: Nayna Jain <nayna(a)linux.ibm.com> Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com> Cc: Joerg Roedel <jroedel(a)suse.de> Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org> Cc: stable(a)vger.kernel.org Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching") Tested-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Nicolai Stange <nstange(a)suse.de> Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> [ Modified to only work for x86_64 and added comment to int3_emulate_push() ] Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h index e85ff65c43c3..05861cc08787 100644 --- a/arch/x86/include/asm/text-patching.h +++ b/arch/x86/include/asm/text-patching.h @@ -39,4 +39,32 @@ extern int poke_int3_handler(struct pt_regs *regs); extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler); extern int after_bootmem; +static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip) +{ + regs->ip = ip; +} + +#define INT3_INSN_SIZE 1 +#define CALL_INSN_SIZE 5 + +#ifdef CONFIG_X86_64 +static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val) +{ + /* + * The int3 handler in entry_64.S adds a gap between the + * stack where the break point happened, and the saving of + * pt_regs. We can extend the original stack because of + * this gap. See the idtentry macro's create_gap option. + */ + regs->sp -= sizeof(unsigned long); + *(unsigned long *)regs->sp = val; +} + +static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func) +{ + int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE); + int3_emulate_jmp(regs, func); +} +#endif + #endif /* _ASM_X86_TEXT_PATCHING_H */

6 years, 1 month

1
0
0 0

[PATCH v2] KVM: selftests: Compile code with warnings enabled

by Thomas Huth

So far the KVM selftests are compiled without any compiler warnings enabled. That's quite bad, since we miss a lot of possible bugs this way. Let's enable at least "-Wall" and some other useful warning flags now, and fix at least the trivial problems in the code (like unused variables). Signed-off-by: Thomas Huth <thuth(a)redhat.com> --- v2: - Rebased to kvm/queue - Fix warnings in state_test.c and evmcs_test.c, too tools/testing/selftests/kvm/Makefile | 4 +++- tools/testing/selftests/kvm/dirty_log_test.c | 6 +++++- tools/testing/selftests/kvm/lib/kvm_util.c | 3 --- tools/testing/selftests/kvm/lib/x86_64/processor.c | 4 +--- tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c | 1 + tools/testing/selftests/kvm/x86_64/evmcs_test.c | 7 +------ tools/testing/selftests/kvm/x86_64/platform_info_test.c | 1 - tools/testing/selftests/kvm/x86_64/smm_test.c | 3 +-- tools/testing/selftests/kvm/x86_64/state_test.c | 7 +------ .../selftests/kvm/x86_64/vmx_close_while_nested_test.c | 5 +---- tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c | 5 ++--- 11 files changed, 16 insertions(+), 30 deletions(-) diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index 79c524395ebe..d113eaf2d570 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -34,7 +34,9 @@ LIBKVM += $(LIBKVM_$(UNAME_M)) INSTALL_HDR_PATH = $(top_srcdir)/usr LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/ LINUX_TOOL_INCLUDE = $(top_srcdir)/tools/include -CFLAGS += -O2 -g -std=gnu99 -fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(UNAME_M) -I.. +CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \ + -fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \ + -I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(UNAME_M) -I.. no-pie-option := $(call try-run, echo 'int main() { return 0; }' | \ $(CC) -Werror $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -no-pie -x c - -o "$$TMP", -no-pie) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index a29d1119ccb3..ff20950aee05 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -131,6 +131,7 @@ static void *vcpu_worker(void *data) while (!READ_ONCE(host_quit)) { /* Let the guest dirty the random pages */ ret = _vcpu_run(vm, VCPU_ID); + TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); if (get_ucall(vm, VCPU_ID, &uc) == UCALL_SYNC) { pages_count += TEST_PAGES_PER_LOOP; generate_random_array(guest_array, TEST_PAGES_PER_LOOP); @@ -423,8 +424,11 @@ int main(int argc, char *argv[]) unsigned long interval = TEST_HOST_LOOP_INTERVAL; bool mode_selected = false; uint64_t phys_offset = 0; - unsigned int mode, host_ipa_limit; + unsigned int mode; int opt, i; +#ifdef __aarch64__ + unsigned int host_ipa_limit; +#endif #ifdef USE_CLEAR_DIRTY_LOG if (!kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2)) { diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index e9113857f44e..cf62de377310 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -135,7 +135,6 @@ struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm, unsigned long type) { struct kvm_vm *vm; - int kvm_fd; vm = calloc(1, sizeof(*vm)); TEST_ASSERT(vm != NULL, "Insufficient Memory"); @@ -556,7 +555,6 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, uint32_t flags) { int ret; - unsigned long pmem_size = 0; struct userspace_mem_region *region; size_t huge_page_size = KVM_UTIL_PGS_PER_HUGEPG * vm->page_size; @@ -1334,7 +1332,6 @@ void vcpu_sregs_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_sregs *sregs) int _vcpu_sregs_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_sregs *sregs) { struct vcpu *vcpu = vcpu_find(vm, vcpuid); - int ret; TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid); diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c index dc7fae9fa424..21f3040d90cb 100644 --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c @@ -229,8 +229,6 @@ void sregs_dump(FILE *stream, struct kvm_sregs *sregs, void virt_pgd_alloc(struct kvm_vm *vm, uint32_t pgd_memslot) { - int rc; - TEST_ASSERT(vm->mode == VM_MODE_P52V48_4K, "Attempt to use " "unknown or unsupported guest mode, mode: 0x%x", vm->mode); @@ -549,7 +547,6 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva) struct pageDirectoryPointerEntry *pdpe; struct pageDirectoryEntry *pde; struct pageTableEntry *pte; - void *hva; TEST_ASSERT(vm->mode == VM_MODE_P52V48_4K, "Attempt to use " "unknown or unsupported guest mode, mode: 0x%x", vm->mode); @@ -582,6 +579,7 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva) unmapped_gva: TEST_ASSERT(false, "No mapping for vm virtual address, " "gva: 0x%lx", gva); + exit(EXIT_FAILURE); } static void kvm_setup_gdt(struct kvm_vm *vm, struct kvm_dtable *dt, int gdt_memslot, diff --git a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c index 7c2c4d4055a8..63cc9c3f5ab6 100644 --- a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c +++ b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c @@ -87,6 +87,7 @@ int main(int argc, char *argv[]) while (1) { rc = _vcpu_run(vm, VCPU_ID); + TEST_ASSERT(rc == 0, "vcpu_run failed: %d\n", rc); TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, "Unexpected exit reason: %u (%s),\n", run->exit_reason, diff --git a/tools/testing/selftests/kvm/x86_64/evmcs_test.c b/tools/testing/selftests/kvm/x86_64/evmcs_test.c index 36669684eca5..b38260e29775 100644 --- a/tools/testing/selftests/kvm/x86_64/evmcs_test.c +++ b/tools/testing/selftests/kvm/x86_64/evmcs_test.c @@ -19,8 +19,6 @@ #define VCPU_ID 5 -static bool have_nested_state; - void l2_guest_code(void) { GUEST_SYNC(6); @@ -73,7 +71,6 @@ void guest_code(struct vmx_pages *vmx_pages) int main(int argc, char *argv[]) { - struct vmx_pages *vmx_pages = NULL; vm_vaddr_t vmx_pages_gva = 0; struct kvm_regs regs1, regs2; @@ -88,8 +85,6 @@ int main(int argc, char *argv[]) .args[0] = (unsigned long)&evmcs_ver }; - struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1); - /* Create VM */ vm = vm_create_default(VCPU_ID, 0, guest_code); @@ -113,7 +108,7 @@ int main(int argc, char *argv[]) vcpu_regs_get(vm, VCPU_ID, &regs1); - vmx_pages = vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_alloc_vmx(vm, &vmx_pages_gva); vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); for (stage = 1;; stage++) { diff --git a/tools/testing/selftests/kvm/x86_64/platform_info_test.c b/tools/testing/selftests/kvm/x86_64/platform_info_test.c index eb3e7a838cb4..40050e44ec0a 100644 --- a/tools/testing/selftests/kvm/x86_64/platform_info_test.c +++ b/tools/testing/selftests/kvm/x86_64/platform_info_test.c @@ -81,7 +81,6 @@ static void test_msr_platform_info_disabled(struct kvm_vm *vm) int main(int argc, char *argv[]) { struct kvm_vm *vm; - struct kvm_run *state; int rv; uint64_t msr_platform_info; diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c index fb8086964d83..4daf520bada1 100644 --- a/tools/testing/selftests/kvm/x86_64/smm_test.c +++ b/tools/testing/selftests/kvm/x86_64/smm_test.c @@ -87,7 +87,6 @@ void guest_code(struct vmx_pages *vmx_pages) int main(int argc, char *argv[]) { - struct vmx_pages *vmx_pages = NULL; vm_vaddr_t vmx_pages_gva = 0; struct kvm_regs regs; @@ -115,7 +114,7 @@ int main(int argc, char *argv[]) vcpu_set_msr(vm, VCPU_ID, MSR_IA32_SMBASE, SMRAM_GPA); if (kvm_check_cap(KVM_CAP_NESTED_STATE)) { - vmx_pages = vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_alloc_vmx(vm, &vmx_pages_gva); vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); } else { printf("will skip SMM test with VMX enabled\n"); diff --git a/tools/testing/selftests/kvm/x86_64/state_test.c b/tools/testing/selftests/kvm/x86_64/state_test.c index e0a3c0204b7c..2a4121f4de01 100644 --- a/tools/testing/selftests/kvm/x86_64/state_test.c +++ b/tools/testing/selftests/kvm/x86_64/state_test.c @@ -22,8 +22,6 @@ #define VCPU_ID 5 -static bool have_nested_state; - void l2_guest_code(void) { GUEST_SYNC(6); @@ -122,7 +120,6 @@ void guest_code(struct vmx_pages *vmx_pages) int main(int argc, char *argv[]) { - struct vmx_pages *vmx_pages = NULL; vm_vaddr_t vmx_pages_gva = 0; struct kvm_regs regs1, regs2; @@ -132,8 +129,6 @@ int main(int argc, char *argv[]) struct ucall uc; int stage; - struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1); - /* Create VM */ vm = vm_create_default(VCPU_ID, 0, guest_code); vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); @@ -142,7 +137,7 @@ int main(int argc, char *argv[]) vcpu_regs_get(vm, VCPU_ID, &regs1); if (kvm_check_cap(KVM_CAP_NESTED_STATE)) { - vmx_pages = vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_alloc_vmx(vm, &vmx_pages_gva); vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); } else { printf("will skip nested state checks\n"); diff --git a/tools/testing/selftests/kvm/x86_64/vmx_close_while_nested_test.c b/tools/testing/selftests/kvm/x86_64/vmx_close_while_nested_test.c index 6edec6fd790b..97182b47b10c 100644 --- a/tools/testing/selftests/kvm/x86_64/vmx_close_while_nested_test.c +++ b/tools/testing/selftests/kvm/x86_64/vmx_close_while_nested_test.c @@ -39,8 +39,6 @@ static void l1_guest_code(struct vmx_pages *vmx_pages) { #define L2_GUEST_STACK_SIZE 64 unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE]; - uint32_t control; - uintptr_t save_cr3; GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages)); GUEST_ASSERT(load_vmcs(vmx_pages)); @@ -55,7 +53,6 @@ static void l1_guest_code(struct vmx_pages *vmx_pages) int main(int argc, char *argv[]) { - struct vmx_pages *vmx_pages; vm_vaddr_t vmx_pages_gva; struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1); @@ -68,7 +65,7 @@ int main(int argc, char *argv[]) vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); /* Allocate VMX pages and shared descriptors (vmx_pages). */ - vmx_pages = vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_alloc_vmx(vm, &vmx_pages_gva); vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); for (;;) { diff --git a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c index 18fa64db0d7a..6d37a3173956 100644 --- a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c +++ b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c @@ -121,7 +121,7 @@ static void l1_guest_code(struct vmx_pages *vmx_pages) GUEST_DONE(); } -void report(int64_t val) +static void report(int64_t val) { printf("IA32_TSC_ADJUST is %ld (%lld * TSC_ADJUST_VALUE + %lld).\n", val, val / TSC_ADJUST_VALUE, val % TSC_ADJUST_VALUE); @@ -129,7 +129,6 @@ void report(int64_t val) int main(int argc, char *argv[]) { - struct vmx_pages *vmx_pages; vm_vaddr_t vmx_pages_gva; struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1); @@ -142,7 +141,7 @@ int main(int argc, char *argv[]) vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); /* Allocate VMX pages and shared descriptors (vmx_pages). */ - vmx_pages = vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_alloc_vmx(vm, &vmx_pages_gva); vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); for (;;) { -- 2.21.0

6 years, 1 month

5
7
0 0

[PATCH v2 00/17] kunit: introduce KUnit, the Linux kernel unit testing framework

by Brendan Higgins

## TLDR I rebased the last patchset on 5.1-rc7 in hopes that we can get this in 5.2. Shuah, I think you, Greg KH, and myself talked off thread, and we agreed we would merge through your tree when the time came? Am I remembering correctly? ## Background This patch set proposes KUnit, a lightweight unit testing and mocking framework for the Linux kernel. Unlike Autotest and kselftest, KUnit is a true unit testing framework; it does not require installing the kernel on a test machine or in a VM and does not require tests to be written in userspace running on a host kernel. Additionally, KUnit is fast: From invocation to completion KUnit can run several dozen tests in under a second. Currently, the entire KUnit test suite for KUnit runs in under a second from the initial invocation (build time excluded). KUnit is heavily inspired by JUnit, Python's unittest.mock, and Googletest/Googlemock for C++. KUnit provides facilities for defining unit test cases, grouping related test cases into test suites, providing common infrastructure for running tests, mocking, spying, and much more. ## What's so special about unit testing? A unit test is supposed to test a single unit of code in isolation, hence the name. There should be no dependencies outside the control of the test; this means no external dependencies, which makes tests orders of magnitudes faster. Likewise, since there are no external dependencies, there are no hoops to jump through to run the tests. Additionally, this makes unit tests deterministic: a failing unit test always indicates a problem. Finally, because unit tests necessarily have finer granularity, they are able to test all code paths easily solving the classic problem of difficulty in exercising error handling code. ## Is KUnit trying to replace other testing frameworks for the kernel? No. Most existing tests for the Linux kernel are end-to-end tests, which have their place. A well tested system has lots of unit tests, a reasonable number of integration tests, and some end-to-end tests. KUnit is just trying to address the unit test space which is currently not being addressed. ## More information on KUnit There is a bunch of documentation near the end of this patch set that describes how to use KUnit and best practices for writing unit tests. For convenience I am hosting the compiled docs here: https://google.github.io/kunit-docs/third_party/kernel/docs/ Additionally for convenience, I have applied these patches to a branch: https://kunit.googlesource.com/linux/+/kunit/rfc/v5.1-rc7/v1 The repo may be cloned with: git clone https://kunit.googlesource.com/linux This patchset is on the kunit/rfc/v5.1-rc7/v1 branch. ## Changes Since Last Version None. I just rebased the last patchset on v5.1-rc7. -- 2.21.0.593.g511ec345e18-goog

6 years, 1 month

12
130
0 0

[PATCH AUTOSEL 4.19 009/244] selftests/bpf: set RLIMIT_MEMLOCK properly for test_libbpf_open.c

by Sasha Levin

From: Yonghong Song <yhs(a)fb.com> [ Upstream commit 6cea33701eb024bc6c920ab83940ee22afd29139 ] Test test_libbpf.sh failed on my development server with failure -bash-4.4$ sudo ./test_libbpf.sh [0] libbpf: Error in bpf_object__probe_name():Operation not permitted(1). Couldn't load basic 'r0 = 0' BPF program. test_libbpf: failed at file test_l4lb.o selftests: test_libbpf [FAILED] -bash-4.4$ The reason is because my machine has 64KB locked memory by default which is not enough for this program to get locked memory. Similar to other bpf selftests, let us increase RLIMIT_MEMLOCK to infinity, which fixed the issue. Signed-off-by: Yonghong Song <yhs(a)fb.com> Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/bpf/test_libbpf_open.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/test_libbpf_open.c b/tools/testing/selftests/bpf/test_libbpf_open.c index 8fcd1c076add0..cbd55f5f8d598 100644 --- a/tools/testing/selftests/bpf/test_libbpf_open.c +++ b/tools/testing/selftests/bpf/test_libbpf_open.c @@ -11,6 +11,8 @@ static const char *__doc__ = #include <bpf/libbpf.h> #include <getopt.h> +#include "bpf_rlimit.h" + static const struct option long_options[] = { {"help", no_argument, NULL, 'h' }, {"debug", no_argument, NULL, 'D' }, -- 2.20.1

6 years, 1 month

1
0
0 0

[PATCH AUTOSEL 5.0 011/317] selftests/bpf: set RLIMIT_MEMLOCK properly for test_libbpf_open.c

by Sasha Levin

From: Yonghong Song <yhs(a)fb.com> [ Upstream commit 6cea33701eb024bc6c920ab83940ee22afd29139 ] Test test_libbpf.sh failed on my development server with failure -bash-4.4$ sudo ./test_libbpf.sh [0] libbpf: Error in bpf_object__probe_name():Operation not permitted(1). Couldn't load basic 'r0 = 0' BPF program. test_libbpf: failed at file test_l4lb.o selftests: test_libbpf [FAILED] -bash-4.4$ The reason is because my machine has 64KB locked memory by default which is not enough for this program to get locked memory. Similar to other bpf selftests, let us increase RLIMIT_MEMLOCK to infinity, which fixed the issue. Signed-off-by: Yonghong Song <yhs(a)fb.com> Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/bpf/test_libbpf_open.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/test_libbpf_open.c b/tools/testing/selftests/bpf/test_libbpf_open.c index 8fcd1c076add0..cbd55f5f8d598 100644 --- a/tools/testing/selftests/bpf/test_libbpf_open.c +++ b/tools/testing/selftests/bpf/test_libbpf_open.c @@ -11,6 +11,8 @@ static const char *__doc__ = #include <bpf/libbpf.h> #include <getopt.h> +#include "bpf_rlimit.h" + static const struct option long_options[] = { {"help", no_argument, NULL, 'h' }, {"debug", no_argument, NULL, 'D' }, -- 2.20.1

6 years, 1 month

1
0
0 0

[PATCH AUTOSEL 5.1 015/375] selftests/bpf: set RLIMIT_MEMLOCK properly for test_libbpf_open.c

by Sasha Levin

From: Yonghong Song <yhs(a)fb.com> [ Upstream commit 6cea33701eb024bc6c920ab83940ee22afd29139 ] Test test_libbpf.sh failed on my development server with failure -bash-4.4$ sudo ./test_libbpf.sh [0] libbpf: Error in bpf_object__probe_name():Operation not permitted(1). Couldn't load basic 'r0 = 0' BPF program. test_libbpf: failed at file test_l4lb.o selftests: test_libbpf [FAILED] -bash-4.4$ The reason is because my machine has 64KB locked memory by default which is not enough for this program to get locked memory. Similar to other bpf selftests, let us increase RLIMIT_MEMLOCK to infinity, which fixed the issue. Signed-off-by: Yonghong Song <yhs(a)fb.com> Signed-off-by: Alexei Starovoitov <ast(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/bpf/test_libbpf_open.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/test_libbpf_open.c b/tools/testing/selftests/bpf/test_libbpf_open.c index 65cbd30704b5a..9e9db202d218a 100644 --- a/tools/testing/selftests/bpf/test_libbpf_open.c +++ b/tools/testing/selftests/bpf/test_libbpf_open.c @@ -11,6 +11,8 @@ static const char *__doc__ = #include <bpf/libbpf.h> #include <getopt.h> +#include "bpf_rlimit.h" + static const struct option long_options[] = { {"help", no_argument, NULL, 'h' }, {"debug", no_argument, NULL, 'D' }, -- 2.20.1

6 years, 1 month

1
0
0 0

[PATCH v3 0/3] Fix vDSO clock_getres()

by Vincenzo Frascino

clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. A possible fix is to change the vdso implementation of clock_getres, keeping a copy of hrtimer_resolution in vdso data and using that directly [1]. This patchset implements the proposed fix for arm64, powerpc, s390, nds32 and adds a test to verify that the syscall and the vdso library implementation of clock_getres return the same values. Even if these patches are unified by the same topic, there is no dependency between them, hence they can be merged singularly by each arch maintainer. Note: arm64 and nds32 respective fixes have been merged in 5.2-rc1, hence they have been removed from this series. [1] https://marc.info/?l=linux-arm-kernel&m=155110381930196&w=2 Changes: -------- v3: - Rebased on 5.2-rc1. - Addressed review comments. v2: - Rebased on 5.1-rc5. - Addressed review comments. Cc: Christophe Leroy <christophe.leroy(a)c-s.fr> Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org> Cc: Paul Mackerras <paulus(a)samba.org> Cc: Michael Ellerman <mpe(a)ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Arnd Bergmann <arnd(a)arndb.de> Signed-off-by: Vincenzo Frascino <vincenzo.frascino(a)arm.com> Vincenzo Frascino (3): powerpc: Fix vDSO clock_getres() s390: Fix vDSO clock_getres() kselftest: Extend vDSO selftest to clock_getres arch/powerpc/include/asm/vdso_datapage.h | 2 + arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kernel/time.c | 1 + arch/powerpc/kernel/vdso32/gettimeofday.S | 7 +- arch/powerpc/kernel/vdso64/gettimeofday.S | 7 +- arch/s390/include/asm/vdso.h | 1 + arch/s390/kernel/asm-offsets.c | 2 +- arch/s390/kernel/time.c | 1 + arch/s390/kernel/vdso32/clock_getres.S | 12 +- arch/s390/kernel/vdso64/clock_getres.S | 10 +- tools/testing/selftests/vDSO/Makefile | 2 + .../selftests/vDSO/vdso_clock_getres.c | 137 ++++++++++++++++++ 12 files changed, 168 insertions(+), 16 deletions(-) create mode 100644 tools/testing/selftests/vDSO/vdso_clock_getres.c -- 2.21.0

6 years, 1 month

2
6
0 0

[PATCH v2] selftests: netfilter: missing error check when setting up veth interface

by Jeffrin Jose T

A test for the basic NAT functionality uses ip command which needs veth device.There is a condition where the kernel support for veth is not compiled into the kernel and the test script breaks.This patch contains code for reasonable error display and correct code exit. Signed-off-by: Jeffrin Jose T <jeffrin(a)rajagiritech.edu.in> --- tools/testing/selftests/netfilter/nft_nat.sh | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/netfilter/nft_nat.sh b/tools/testing/selftests/netfilter/nft_nat.sh index 8ec76681605c..f25f72a75cf3 100755 --- a/tools/testing/selftests/netfilter/nft_nat.sh +++ b/tools/testing/selftests/netfilter/nft_nat.sh @@ -23,7 +23,11 @@ ip netns add ns0 ip netns add ns1 ip netns add ns2 -ip link add veth0 netns ns0 type veth peer name eth0 netns ns1 +ip link add veth0 netns ns0 type veth peer name eth0 netns ns1 > /dev/null 2>&1 +if [ $? -ne 0 ];then + echo "SKIP: No virtual ethernet pair device support in kernel" + exit $ksft_skip +fi ip link add veth1 netns ns0 type veth peer name eth0 netns ns2 ip -net ns0 link set lo up -- 2.20.1

6 years, 1 month

3
2
0 0

[PATCH v2 0/5] Fix vDSO clock_getres()

by Vincenzo Frascino

clock_getres in the vDSO library has to preserve the same behaviour of posix_get_hrtimer_res(). In particular, posix_get_hrtimer_res() does: sec = 0; ns = hrtimer_resolution; and hrtimer_resolution depends on the enablement of the high resolution timers that can happen either at compile or at run time. A possible fix is to change the vdso implementation of clock_getres, keeping a copy of hrtimer_resolution in vdso data and using that directly [1]. This patchset implements the proposed fix for arm64, powerpc, s390, nds32 and adds a test to verify that the syscall and the vdso library implementation of clock_getres return the same values. Even if these patches are unified by the same topic, there is no dependency between them, hence they can be merged singularly by each arch maintainer. [1] https://marc.info/?l=linux-arm-kernel&m=155110381930196&w=2 Changes: -------- v2: - Rebased on 5.1-rc5. - Addressed review comments. Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Will Deacon <will.deacon(a)arm.com> Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org> Cc: Paul Mackerras <paulus(a)samba.org> Cc: Michael Ellerman <mpe(a)ellerman.id.au> Cc: Martin Schwidefsky <schwidefsky(a)de.ibm.com> Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com> Cc: Greentime Hu <green.hu(a)gmail.com> Cc: Vincent Chen <deanbo422(a)gmail.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Arnd Bergmann <arnd(a)arndb.de> Signed-off-by: Vincenzo Frascino <vincenzo.frascino(a)arm.com> Vincenzo Frascino (5): arm64: Fix vDSO clock_getres() powerpc: Fix vDSO clock_getres() s390: Fix vDSO clock_getres() nds32: Fix vDSO clock_getres() kselftest: Extend vDSO selftest to clock_getres arch/arm64/include/asm/vdso_datapage.h | 1 + arch/arm64/kernel/asm-offsets.c | 2 +- arch/arm64/kernel/vdso.c | 2 + arch/arm64/kernel/vdso/gettimeofday.S | 22 ++-- arch/nds32/include/asm/vdso_datapage.h | 1 + arch/nds32/kernel/vdso.c | 1 + arch/nds32/kernel/vdso/gettimeofday.c | 4 +- arch/powerpc/include/asm/vdso_datapage.h | 2 + arch/powerpc/kernel/asm-offsets.c | 2 +- arch/powerpc/kernel/time.c | 1 + arch/powerpc/kernel/vdso32/gettimeofday.S | 7 +- arch/powerpc/kernel/vdso64/gettimeofday.S | 7 +- arch/s390/include/asm/vdso.h | 1 + arch/s390/kernel/asm-offsets.c | 2 +- arch/s390/kernel/time.c | 1 + arch/s390/kernel/vdso32/clock_getres.S | 12 +- arch/s390/kernel/vdso64/clock_getres.S | 10 +- tools/testing/selftests/vDSO/Makefile | 2 + .../selftests/vDSO/vdso_clock_getres.c | 108 ++++++++++++++++++ 19 files changed, 159 insertions(+), 29 deletions(-) create mode 100644 tools/testing/selftests/vDSO/vdso_clock_getres.c -- 2.21.0

6 years, 1 month

5
13
0 0

[PATCH 1/2] open: add close_range()

by Christian Brauner

This adds the close_range() syscall. It allows to efficiently close a range of file descriptors up to all file descriptors of a calling task. The syscall came up in a recent discussion around the new mount API and making new file descriptor types cloexec by default. During this discussion, Al suggested the close_range() syscall (cf. [1]). Note, a syscall in this manner has been requested by various people over time. First, it helps to close all file descriptors of an exec()ing task. This can be done safely via (quoting Al's example from [1] verbatim): /* that exec is sensitive */ unshare(CLONE_FILES); /* we don't want anything past stderr here */ close_range(3, ~0U); execve(....); The code snippet above is one way of working around the problem that file descriptors are not cloexec by default. This is aggravated by the fact that we can't just switch them over without massively regressing userspace. For a whole class of programs having an in-kernel method of closing all file descriptors is very helpful (e.g. demons, service managers, programming language standard libraries, container managers etc.). (Please note, unshare(CLONE_FILES) should only be needed if the calling task is multi-threaded and shares the file descriptor table with another thread in which case two threads could race with one thread allocating file descriptors and the other one closing them via close_range(). For the general case close_range() before the execve() is sufficient.) Second, it allows userspace to avoid implementing closing all file descriptors by parsing through /proc/<pid>/fd/* and calling close() on each file descriptor. From looking at various large(ish) userspace code bases this or similar patterns are very common in: - service managers (cf. [4]) - libcs (cf. [6]) - container runtimes (cf. [5]) - programming language runtimes/standard libraries - Python (cf. [2]) - Rust (cf. [7], [8]) As Dmitry pointed out there's even a long-standing glibc bug about missing kernel support for this task (cf. [3]). In addition, the syscall will also work for tasks that do not have procfs mounted and on kernels that do not have procfs support compiled in. In such situations the only way to make sure that all file descriptors are closed is to call close() on each file descriptor up to UINT_MAX or RLIMIT_NOFILE, OPEN_MAX trickery (cf. comment [8] on Rust). The performance is striking. For good measure, comparing the following simple close_all_fds() userspace implementation that is essentially just glibc's version in [6]: static int close_all_fds(void) { DIR *dir; struct dirent *direntp; dir = opendir("/proc/self/fd"); if (!dir) return -1; while ((direntp = readdir(dir))) { int fd; if (strcmp(direntp->d_name, ".") == 0) continue; if (strcmp(direntp->d_name, "..") == 0) continue; fd = atoi(direntp->d_name); if (fd == 0 || fd == 1 || fd == 2) continue; close(fd); } closedir(dir); /* cannot fail */ return 0; } to close_range() yields: 1. closing 4 open files: - close_all_fds(): ~280 us - close_range(): ~24 us 2. closing 1000 open files: - close_all_fds(): ~5000 us - close_range(): ~800 us close_range() is designed to allow for some flexibility. Specifically, it does not simply always close all open file descriptors of a task. Instead, callers can specify an upper bound. This is e.g. useful for scenarios where specific file descriptors are created with well-known numbers that are supposed to be excluded from getting closed. For extra paranoia close_range() comes with a flags argument. This can e.g. be used to implement extension. Once can imagine userspace wanting to stop at the first error instead of ignoring errors under certain circumstances. There might be other valid ideas in the future. In any case, a flag argument doesn't hurt and keeps us on the safe side. >From an implementation side this is kept rather dumb. It saw some input from David and Jann but all nonsense is obviously my own! - Errors to close file descriptors are currently ignored. (Could be changed by setting a flag in the future if needed.) - __close_range() is a rather simplistic wrapper around __close_fd(). My reasoning behind this is based on the nature of how __close_fd() needs to release an fd. But maybe I misunderstood specifics: We take the files_lock and rcu-dereference the fdtable of the calling task, we find the entry in the fdtable, get the file and need to release files_lock before calling filp_close(). In the meantime the fdtable might have been altered so we can't just retake the spinlock and keep the old rcu-reference of the fdtable around. Instead we need to grab a fresh reference to the fdtable. If my reasoning is correct then there's really no point in fancyfying __close_range(): We just need to rcu-dereference the fdtable of the calling task once to cap the max_fd value correctly and then go on calling __close_fd() in a loop. /* References */ [1]: https://lore.kernel.org/lkml/20190516165021.GD17978@ZenIV.linux.org.uk/ [2]: https://github.com/python/cpython/blob/9e4f2f3a6b8ee995c365e86d976937c141d8… [3]: https://sourceware.org/bugzilla/show_bug.cgi?id=10353#c7 [4]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa… [5]: https://github.com/lxc/lxc/blob/ddf4b77e11a4d08f09b7b9cd13e593f8c047edc5/sr… [6]: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/gr… Note that this is an internal implementation that is not exported. Currently, libc seems to not provide an exported version of this because of missing kernel support to do this. [7]: https://github.com/rust-lang/rust/issues/12148 [8]: https://github.com/rust-lang/rust/blob/5f47c0613ed4eb46fca3633c1297364c09e5… Rust's solution is slightly different but is equally unperformant. Rust calls getdtablesize() which is a glibc library function that simply returns the current RLIMIT_NOFILE or OPEN_MAX values. Rust then goes on to call close() on each fd. That's obviously overkill for most tasks. Rarely, tasks - especially non-demons - hit RLIMIT_NOFILE or OPEN_MAX. Let's be nice and assume an unprivileged user with RLIMIT_NOFILE set to 1024. Even in this case, there's a very high chance that in the common case Rust is calling the close() syscall 1021 times pointlessly if the task just has 0, 1, and 2 open. Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk> Signed-off-by: Christian Brauner <christian(a)brauner.io> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: Jann Horn <jannh(a)google.com> Cc: David Howells <dhowells(a)redhat.com> Cc: Dmitry V. Levin <ldv(a)altlinux.org> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: Florian Weimer <fweimer(a)redhat.com> Cc: linux-api(a)vger.kernel.org --- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd32.h | 2 ++ arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + fs/file.c | 30 +++++++++++++++++++++ fs/open.c | 20 ++++++++++++++ include/linux/fdtable.h | 2 ++ include/linux/syscalls.h | 2 ++ include/uapi/asm-generic/unistd.h | 4 ++- 22 files changed, 75 insertions(+), 1 deletion(-) diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 9e7704e44f6d..b55d93af8096 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -473,3 +473,4 @@ 541 common fsconfig sys_fsconfig 542 common fsmount sys_fsmount 543 common fspick sys_fspick +545 common close_range sys_close_range diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index aaf479a9e92d..0125c97c75dd 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -447,3 +447,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index c39e90600bb3..9a3270d29b42 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig) __SYSCALL(__NR_fsmount, sys_fsmount) #define __NR_fspick 433 __SYSCALL(__NR_fspick, sys_fspick) +#define __NR_close_range 435 +__SYSCALL(__NR_close_range, sys_close_range) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index e01df3f2f80d..1a90b464e96f 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -354,3 +354,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index 7e3d0734b2f3..2dee2050f9ef 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -433,3 +433,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 26339e417695..923ef69e5a76 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -439,3 +439,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 0e2dd68ade57..967ed9de51cd 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -372,3 +372,4 @@ 431 n32 fsconfig sys_fsconfig 432 n32 fsmount sys_fsmount 433 n32 fspick sys_fspick +435 n32 close_range sys_close_range diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index 5eebfa0d155c..71de731102b1 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -348,3 +348,4 @@ 431 n64 fsconfig sys_fsconfig 432 n64 fsmount sys_fsmount 433 n64 fspick sys_fspick +435 n64 close_range sys_close_range diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 3cc1374e02d0..5a325ab29f88 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -421,3 +421,4 @@ 431 o32 fsconfig sys_fsconfig 432 o32 fsmount sys_fsmount 433 o32 fspick sys_fspick +435 o32 close_range sys_close_range diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index c9e377d59232..dcc0a0879139 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -430,3 +430,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 103655d84b4b..ba2c1f078cbd 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -515,3 +515,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index e822b2964a83..d7c9043d2902 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -436,3 +436,4 @@ 431 common fsconfig sys_fsconfig sys_fsconfig 432 common fsmount sys_fsmount sys_fsmount 433 common fspick sys_fspick sys_fspick +435 common close_range sys_close_range sys_close_range diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index 016a727d4357..9b5e6bf0ce32 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -436,3 +436,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index e047480b1605..8c674a1e0072 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -479,3 +479,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index ad968b7bac72..7f7a89a96707 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -438,3 +438,4 @@ 431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig 432 i386 fsmount sys_fsmount __ia32_sys_fsmount 433 i386 fspick sys_fspick __ia32_sys_fspick +435 i386 close_range sys_close_range __ia32_sys_close_range diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index b4e6f9e6204a..0f7d47ae921c 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -355,6 +355,7 @@ 431 common fsconfig __x64_sys_fsconfig 432 common fsmount __x64_sys_fsmount 433 common fspick __x64_sys_fspick +435 common close_range __x64_sys_close_range # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 5fa0ee1c8e00..b489532265d0 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -404,3 +404,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +435 common close_range sys_close_range diff --git a/fs/file.c b/fs/file.c index 3da91a112bab..3680977a685a 100644 --- a/fs/file.c +++ b/fs/file.c @@ -641,6 +641,36 @@ int __close_fd(struct files_struct *files, unsigned fd) } EXPORT_SYMBOL(__close_fd); /* for ksys_close() */ +/** + * __close_range() - Close all file descriptors in a given range. + * + * @fd: starting file descriptor to close + * @max_fd: last file descriptor to close + * + * This closes a range of file descriptors. All file descriptors + * from @fd up to and including @max_fd are closed. + */ +int __close_range(struct files_struct *files, unsigned fd, unsigned max_fd) +{ + unsigned int cur_max; + + if (fd > max_fd) + return -EINVAL; + + rcu_read_lock(); + cur_max = files_fdtable(files)->max_fds; + rcu_read_unlock(); + + /* cap to last valid index into fdtable */ + if (max_fd >= cur_max) + max_fd = cur_max - 1; + + while (fd <= max_fd) + __close_fd(files, fd++); + + return 0; +} + /* * variant of __close_fd that gets a ref on the file for later fput */ diff --git a/fs/open.c b/fs/open.c index 9c7d724a6f67..c7baaee7aa47 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1174,6 +1174,26 @@ SYSCALL_DEFINE1(close, unsigned int, fd) return retval; } +/** + * close_range() - Close all file descriptors in a given range. + * + * @fd: starting file descriptor to close + * @max_fd: last file descriptor to close + * @flags: reserved for future extensions + * + * This closes a range of file descriptors. All file descriptors + * from @fd up to and including @max_fd are closed. + * Currently, errors to close a given file descriptor are ignored. + */ +SYSCALL_DEFINE3(close_range, unsigned int, fd, unsigned int, max_fd, + unsigned int, flags) +{ + if (flags) + return -EINVAL; + + return __close_range(current->files, fd, max_fd); +} + /* * This routine simulates a hangup on the tty, to arrange that users * are given clean terminals at login time. diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index f07c55ea0c22..fcd07181a365 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -121,6 +121,8 @@ extern void __fd_install(struct files_struct *files, unsigned int fd, struct file *file); extern int __close_fd(struct files_struct *files, unsigned int fd); +extern int __close_range(struct files_struct *files, unsigned int fd, + unsigned int max_fd); extern int __close_fd_get_file(unsigned int fd, struct file **res); extern struct kmem_cache *files_cachep; diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index e2870fe1be5b..c0189e223255 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -441,6 +441,8 @@ asmlinkage long sys_fchown(unsigned int fd, uid_t user, gid_t group); asmlinkage long sys_openat(int dfd, const char __user *filename, int flags, umode_t mode); asmlinkage long sys_close(unsigned int fd); +asmlinkage long sys_close_range(unsigned int fd, unsigned int max_fd, + unsigned int flags); asmlinkage long sys_vhangup(void); /* fs/pipe.c */ diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index a87904daf103..3f36c8745d24 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig) __SYSCALL(__NR_fsmount, sys_fsmount) #define __NR_fspick 433 __SYSCALL(__NR_fspick, sys_fspick) +#define __NR_close_range 435 +__SYSCALL(__NR_close_range, sys_close_range) #undef __NR_syscalls -#define __NR_syscalls 434 +#define __NR_syscalls 436 /* * 32 bit systems traditionally used different -- 2.21.0

6 years, 1 month

6
14
0 0

[PATCH v5.2-rc2 0/2] selftests: Remove forced unbuffering for test running

by Kees Cook

This should fix test regressions seen when running under unbuffered output. Additionally, fixes are added for the missing flushed output in the timers test which become obvious without unbuffered output. -Kees Kees Cook (2): selftests: Remove forced unbuffering for test running selftests/timers: Add missing fflush(stdout) calls tools/testing/selftests/kselftest/runner.sh | 12 +----------- tools/testing/selftests/timers/adjtick.c | 1 + tools/testing/selftests/timers/leapcrash.c | 1 + tools/testing/selftests/timers/mqueue-lat.c | 1 + tools/testing/selftests/timers/nanosleep.c | 1 + tools/testing/selftests/timers/nsleep-lat.c | 1 + tools/testing/selftests/timers/raw_skew.c | 1 + tools/testing/selftests/timers/set-tai.c | 1 + tools/testing/selftests/timers/set-tz.c | 2 ++ tools/testing/selftests/timers/threadtest.c | 1 + tools/testing/selftests/timers/valid-adjtimex.c | 2 ++ 11 files changed, 13 insertions(+), 11 deletions(-) -- 2.17.1

6 years, 1 month

3
4
0 0

[PATCH v2 1/2] pid: add pidfd_open()

by Christian Brauner

This adds the pidfd_open() syscall. It allows a caller to retrieve pollable pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a process that is created via traditional fork()/clone() calls that is only referenced by a PID: int pidfd = pidfd_open(1234, 0); ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0); With the introduction of pidfds through CLONE_PIDFD it is possible to created pidfds at process creation time. However, a lot of processes get created with traditional PID-based calls such as fork() or clone() (without CLONE_PIDFD). For these processes a caller can currently not create a pollable pidfd. This is a problem for Android's low memory killer (LMK) and service managers such as systemd. Both are examples of tools that want to make use of pidfds to get reliable notification of process exit for non-parents (pidfd polling) and race-free signal sending (pidfd_send_signal()). They intend to switch to this API for process supervision/management as soon as possible. Having no way to get pollable pidfds from PID-only processes is one of the biggest blockers for them in adopting this api. With pidfd_open() making it possible to retrieve pidfds for PID-based processes we enable them to adopt this api. In line with Arnd's recent changes to consolidate syscall numbers across architectures, I have added the pidfd_open() syscall to all architectures at the same time. Signed-off-by: Christian Brauner <christian(a)brauner.io> Reviewed-by: Oleg Nesterov <oleg(a)redhat.com> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Kees Cook <keescook(a)chromium.org> Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Jann Horn <jannh(a)google.com> Cc: David Howells <dhowells(a)redhat.com> Cc: Andy Lutomirsky <luto(a)kernel.org> Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Aleksa Sarai <cyphar(a)cyphar.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: linux-api(a)vger.kernel.org --- v1: - kbuild test robot <lkp(a)intel.com>: - add missing entry for pidfd_open to arch/arm/tools/syscall.tbl - Oleg Nesterov <oleg(a)redhat.com>: - use simpler thread-group leader check v2: - Oleg Nesterov <oleg(a)redhat.com>: - avoid using additional variable - remove unneeded comment - Arnd Bergmann <arnd(a)arndb.de>: - switch from 428 to 434 since the new mount api has taken it - bump syscall numbers in arch/arm64/include/asm/unistd.h - Joel Fernandes (Google) <joel(a)joelfernandes.org>: - switch from ESRCH to EINVAL when the passed-in pid does not refer to a thread-group leader - Christian Brauner <christian(a)brauner.io>: - rebase on v5.2-rc1 - adapt syscall number to account for new mount api syscalls --- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + include/linux/pid.h | 1 + include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 4 +- kernel/fork.c | 2 +- kernel/pid.c | 43 +++++++++++++++++++++ 21 files changed, 66 insertions(+), 3 deletions(-) diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 9e7704e44f6d..1db9bbcfb84e 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -473,3 +473,4 @@ 541 common fsconfig sys_fsconfig 542 common fsmount sys_fsmount 543 common fspick sys_fspick +544 common pidfd_open sys_pidfd_open diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index aaf479a9e92d..81e6e1817c45 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -447,3 +447,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 70e6882853c0..e8f7d95a1481 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -44,7 +44,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800) -#define __NR_compat_syscalls 434 +#define __NR_compat_syscalls 435 #endif #define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index c39e90600bb3..7a3158ccd68e 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig) __SYSCALL(__NR_fsmount, sys_fsmount) #define __NR_fspick 433 __SYSCALL(__NR_fspick, sys_fspick) +#define __NR_pidfd_open 434 +__SYSCALL(__NR_pidfd_open, sys_pidfd_open) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index e01df3f2f80d..ecc44926737b 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -354,3 +354,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index 7e3d0734b2f3..9a3eb2558568 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -433,3 +433,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 26339e417695..ad706f83c755 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -439,3 +439,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 0e2dd68ade57..97035e19ad03 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -372,3 +372,4 @@ 431 n32 fsconfig sys_fsconfig 432 n32 fsmount sys_fsmount 433 n32 fspick sys_fspick +434 n32 pidfd_open sys_pidfd_open diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index c9e377d59232..5022b9e179c2 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -430,3 +430,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 103655d84b4b..f2c3bda2d39f 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -515,3 +515,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index e822b2964a83..6ebacfeaf853 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -436,3 +436,4 @@ 431 common fsconfig sys_fsconfig sys_fsconfig 432 common fsmount sys_fsmount sys_fsmount 433 common fspick sys_fspick sys_fspick +434 common pidfd_open sys_pidfd_open sys_pidfd_open diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index 016a727d4357..834c9c7d79fa 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -436,3 +436,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index e047480b1605..c58e71f21129 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -479,3 +479,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index ad968b7bac72..43e4429a5272 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -438,3 +438,4 @@ 431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig 432 i386 fsmount sys_fsmount __ia32_sys_fsmount 433 i386 fspick sys_fspick __ia32_sys_fspick +434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index b4e6f9e6204a..1bee0a77fdd3 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -355,6 +355,7 @@ 431 common fsconfig __x64_sys_fsconfig 432 common fsmount __x64_sys_fsmount 433 common fspick __x64_sys_fspick +434 common pidfd_open __x64_sys_pidfd_open # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 5fa0ee1c8e00..782b81945ccc 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -404,3 +404,4 @@ 431 common fsconfig sys_fsconfig 432 common fsmount sys_fsmount 433 common fspick sys_fspick +434 common pidfd_open sys_pidfd_open diff --git a/include/linux/pid.h b/include/linux/pid.h index 3c8ef5a199ca..c938a92eab99 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -67,6 +67,7 @@ struct pid extern struct pid init_struct_pid; extern const struct file_operations pidfd_fops; +extern int pidfd_create(struct pid *pid); static inline struct pid *get_pid(struct pid *pid) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index e2870fe1be5b..989055e0b501 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock, struct old_timex32 __user *tx); asmlinkage long sys_syncfs(int fd); asmlinkage long sys_setns(int fd, int nstype); +asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags); asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg, unsigned int vlen, unsigned flags); asmlinkage long sys_process_vm_readv(pid_t pid, diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index a87904daf103..e5684a4512c0 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig) __SYSCALL(__NR_fsmount, sys_fsmount) #define __NR_fspick 433 __SYSCALL(__NR_fspick, sys_fspick) +#define __NR_pidfd_open 434 +__SYSCALL(__NR_pidfd_open, sys_pidfd_open) #undef __NR_syscalls -#define __NR_syscalls 434 +#define __NR_syscalls 435 /* * 32 bit systems traditionally used different diff --git a/kernel/fork.c b/kernel/fork.c index b4cba953040a..c3df226f47a1 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1724,7 +1724,7 @@ const struct file_operations pidfd_fops = { * Return: On success, a cloexec pidfd is returned. * On error, a negative errno number will be returned. */ -static int pidfd_create(struct pid *pid) +int pidfd_create(struct pid *pid) { int fd; diff --git a/kernel/pid.c b/kernel/pid.c index 89548d35eefb..8fc9d94f6ac1 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -37,6 +37,7 @@ #include <linux/syscalls.h> #include <linux/proc_ns.h> #include <linux/proc_fs.h> +#include <linux/sched/signal.h> #include <linux/sched/task.h> #include <linux/idr.h> @@ -450,6 +451,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns) return idr_get_next(&ns->idr, &nr); } +/** + * pidfd_open() - Open new pid file descriptor. + * + * @pid: pid for which to retrieve a pidfd + * @flags: flags to pass + * + * This creates a new pid file descriptor with the O_CLOEXEC flag set for + * the process identified by @pid. Currently, the process identified by + * @pid must be a thread-group leader. This restriction currently exists + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot + * be used with CLONE_THREAD) and pidfd polling (only supports thread group + * leaders). + * + * Return: On success, a cloexec pidfd is returned. + * On error, a negative errno number will be returned. + */ +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) +{ + int fd, ret; + struct pid *p; + + if (flags) + return -EINVAL; + + if (pid <= 0) + return -EINVAL; + + p = find_get_pid(pid); + if (!p) + return -ESRCH; + + ret = 0; + rcu_read_lock(); + if (!pid_task(p, PIDTYPE_TGID)) + ret = -EINVAL; + rcu_read_unlock(); + + fd = ret ?: pidfd_create(p); + put_pid(p); + return fd; +} + void __init pid_idr_init(void) { /* Verify no one has done anything silly: */ -- 2.21.0

6 years, 1 month

3
5
0 0

Inquiry 20/May/2019

by Daniel Murray

Hi,friend, This is Daniel Murray and i am from Sinara Group Co.Ltd in Russia. We are glad to know about your company from the web and we are interested in your products. Could you kindly send us your Latest catalog and price list for our trial order. Best Regards, Daniel Murray Purchasing Manager

6 years, 1 month

1
0
0 0

[PATCH] KVM: selftests: Remove duplicated TEST_ASSERT in hyperv_cpuid.c

by Thomas Huth

The check for entry->index == 0 is done twice. One time should be sufficient. Suggested-by: Vitaly Kuznetsov <vkuznets(a)redhat.com> Signed-off-by: Thomas Huth <thuth(a)redhat.com> --- Vitaly already noticed this in his review to the "Fix a condition in test_hv_cpuid()" patch a couple of days ago, but so far I haven't seen any patch yet on the list that fixes this ... if I missed it instead, please simply ignore this patch. tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c index 9a21e912097c..8bdf1e7da6cc 100644 --- a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c +++ b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c @@ -52,9 +52,6 @@ static void test_hv_cpuid(struct kvm_cpuid2 *hv_cpuid_entries, TEST_ASSERT(entry->index == 0, ".index field should be zero"); - TEST_ASSERT(entry->index == 0, - ".index field should be zero"); - TEST_ASSERT(entry->flags == 0, ".flags field should be zero"); -- 2.21.0

6 years, 1 month

2
1
0 0

[PATCH] KVM: selftests: Fix a condition in test_hv_cpuid()

by Dan Carpenter

The code is trying to check that all the padding is zeroed out and it does this: entry->padding[0] == entry->padding[1] == entry->padding[2] == 0 Assume everything is zeroed correctly, then the first comparison is true, the next comparison is false and false is equal to zero so the overall condition is true. This bug doesn't affect run time very badly, but the code should instead just check that all three paddings are zero individually. Also the error message was copy and pasted from an earlier error and it wasn't correct. Fixes: 7edcb7343327 ("KVM: selftests: Add hyperv_cpuid test") Signed-off-by: Dan Carpenter <dan.carpenter(a)oracle.com> --- tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c index 9a21e912097c..63b9fc3fdfbe 100644 --- a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c +++ b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c @@ -58,9 +58,8 @@ static void test_hv_cpuid(struct kvm_cpuid2 *hv_cpuid_entries, TEST_ASSERT(entry->flags == 0, ".flags field should be zero"); - TEST_ASSERT(entry->padding[0] == entry->padding[1] - == entry->padding[2] == 0, - ".index field should be zero"); + TEST_ASSERT(!entry->padding[0] && !entry->padding[1] && + !entry->padding[2], "padding should be zero"); /* * If needed for debug: -- 2.20.1

6 years, 1 month

4
3
0 0

Re: [PATCH 1/3] kselftest/cgroup: fix unexcepted testing failure on test_memcontrol

by Alex Shi

> Hi Alex! > > Sorry for the late reply, somehow I missed your e-mails. > > The patchset looks good to me, except a typo in subjects/commit logs: > you probably meant "unexpected failures". > > Please, fix and feel free to use: > Reviewed-by: Roman Gushchin <guro(a)fb.com> > for the series. > Thanks Roman! BTW, do you know other test cases for cgroup2 function or performance? Thanks Alex

6 years, 1 month

1
0
0 0

[PATCH v2 3/3] kselftest/cgroup: fix incorrect test_core skip

by Alex Shi

The test_core will skip the test_cgcore_no_internal_process_constraint_on_threads test case if the 'cpu' controller missing in root's subtree_control. In fact we need to set the 'cpu' in subtree_control, to make the testing meaningful. ./test_core ... ok 4 # skip test_cgcore_no_internal_process_constraint_on_threads ... Signed-off-by: Alex Shi <alex.shi(a)linux.alibaba.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Tejun Heo <tj(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: Claudio Zumbo <claudioz(a)fb.com> Cc: Claudio <claudiozumbo(a)gmail.com> Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Reviewed-by: Roman Gushchin <guro(a)fb.com> --- tools/testing/selftests/cgroup/test_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c index b1a570d7c557..2ffc3e07d98d 100644 --- a/tools/testing/selftests/cgroup/test_core.c +++ b/tools/testing/selftests/cgroup/test_core.c @@ -198,7 +198,7 @@ static int test_cgcore_no_internal_process_constraint_on_threads(const char *roo char *parent = NULL, *child = NULL; if (cg_read_strstr(root, "cgroup.controllers", "cpu") || - cg_read_strstr(root, "cgroup.subtree_control", "cpu")) { + cg_write(root, "cgroup.subtree_control", "+cpu")) { ret = KSFT_SKIP; goto cleanup; } -- 2.19.1.856.g8858448bb

6 years, 1 month

1
0
0 0

[PATCH v2 2/3] kselftest/cgroup: fix unexpected testing failure on test_core

by Alex Shi

The cgroup testing relys on the root cgroup's subtree_control setting, If the 'memory' controller isn't set, some test cases will be failed as following: $sudo ./test_core not ok 1 test_cgcore_internal_process_constraint ok 2 test_cgcore_top_down_constraint_enable not ok 3 test_cgcore_top_down_constraint_disable ... To correct this unexpected failure, this patch write the 'memory' to subtree_control of root to get a right result. Signed-off-by: Alex Shi <alex.shi(a)linux.alibaba.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Tejun Heo <tj(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: Claudio Zumbo <claudioz(a)fb.com> Cc: Claudio <claudiozumbo(a)gmail.com> Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Reviewed-by: Roman Gushchin <guro(a)fb.com> --- tools/testing/selftests/cgroup/test_core.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c index be59f9c34ea2..b1a570d7c557 100644 --- a/tools/testing/selftests/cgroup/test_core.c +++ b/tools/testing/selftests/cgroup/test_core.c @@ -376,6 +376,11 @@ int main(int argc, char *argv[]) if (cg_find_unified_root(root, sizeof(root))) ksft_exit_skip("cgroup v2 isn't mounted\n"); + + if (cg_read_strstr(root, "cgroup.subtree_control", "memory")) + if (cg_write(root, "cgroup.subtree_control", "+memory")) + ksft_exit_skip("Failed to set root memory controller\n"); + for (i = 0; i < ARRAY_SIZE(tests); i++) { switch (tests[i].fn(root)) { case KSFT_PASS: -- 2.19.1.856.g8858448bb

6 years, 1 month

1
0
0 0

Re: [PATCH 1/3] kselftest/cgroup: fix unexcepted testing failure on test_memcontrol

by Alex Shi

> Hi Alex! > > Sorry for the late reply, somehow I missed your e-mails. > > The patchset looks good to me, except a typo in subjects/commit logs: > you probably meant "unexpected failures". > > Please, fix and feel free to use: > Reviewed-by: Roman Gushchin <guro(a)fb.com> > for the series. Hi Roman, Thanks for the typo correct and review. I will resend them with your suggestion. BTW, How do you test cgroup2 except kselftest? Any other function/performance testing case? Thanks a lot! Alex

6 years, 1 month

1
0
0 0

[PATCH v1 1/2] pid: add pidfd_open()

by Christian Brauner

This adds the pidfd_open() syscall. It allows a caller to retrieve pollable pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a process that is created via traditional fork()/clone() calls that is only referenced by a PID: int pidfd = pidfd_open(1234, 0); ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0); With the introduction of pidfds through CLONE_PIDFD it is possible to created pidfds at process creation time. However, a lot of processes get created with traditional PID-based calls such as fork() or clone() (without CLONE_PIDFD). For these processes a caller can currently not create a pollable pidfd. This is a huge problem for Android's low memory killer (LMK) and service managers such as systemd. Both are examples of tools that want to make use of pidfds to get reliable notification of process exit for non-parents (pidfd polling) and race-free signal sending (pidfd_send_signal()). They intend to switch to this API for process supervision/management as soon as possible. Having no way to get pollable pidfds from PID-only processes is one of the biggest blockers for them in adopting this api. With pidfd_open() making it possible to retrieve pidfd for PID-based processes we enable them to adopt this api. In line with Arnd's recent changes to consolidate syscall numbers across architectures, I have added the pidfd_open() syscall to all architectures at the same time. Signed-off-by: Christian Brauner <christian(a)brauner.io> Acked-by: Geert Uytterhoeven <geert(a)linux-m68k.org> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Kees Cook <keescook(a)chromium.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Jann Horn <jannh(a)google.com> Cc: David Howells <dhowells(a)redhat.com> Cc: Andy Lutomirsky <luto(a)kernel.org> Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: Aleksa Sarai <cyphar(a)cyphar.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: linux-api(a)vger.kernel.org --- v1: - kbuild test robot <lkp(a)intel.com>: - add missing entry for pidfd_open to arch/arm/tools/syscall.tbl - Oleg Nesterov <oleg(a)redhat.com>: - use simpler thread-group leader check --- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + include/linux/pid.h | 1 + include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 4 +- kernel/fork.c | 2 +- kernel/pid.c | 50 +++++++++++++++++++++ 20 files changed, 72 insertions(+), 2 deletions(-) diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 165f268beafc..ddc3c93ad7a7 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -467,3 +467,4 @@ 535 common io_uring_setup sys_io_uring_setup 536 common io_uring_enter sys_io_uring_enter 537 common io_uring_register sys_io_uring_register +538 common pidfd_open sys_pidfd_open diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index 0393917eaa57..fc41fb34a636 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -441,3 +441,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 23f1a44acada..350e2049b4a9 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -874,6 +874,8 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup) __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter) #define __NR_io_uring_register 427 __SYSCALL(__NR_io_uring_register, sys_io_uring_register) +#define __NR_pidfd_open 428 +__SYSCALL(__NR_pidfd_open, sys_pidfd_open) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 56e3d0b685e1..7115f6dd347a 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -348,3 +348,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index df4ec3ec71d1..44bf12b16ffe 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -427,3 +427,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 4964947732af..0d32e5152dc0 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -433,3 +433,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 9392dfe33f97..726e107b3c9f 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -366,3 +366,4 @@ 425 n32 io_uring_setup sys_io_uring_setup 426 n32 io_uring_enter sys_io_uring_enter 427 n32 io_uring_register sys_io_uring_register +428 n32 pidfd_open sys_pidfd_open diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index fe8ca623add8..83b46b568d51 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -424,3 +424,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 00f5a63c8d9a..5294d04d7fa5 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -509,3 +509,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 061418f787c3..dcdb838adf49 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -430,3 +430,4 @@ 425 common io_uring_setup sys_io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open sys_pidfd_open diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index 480b057556ee..8e66edfbc521 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -430,3 +430,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index a1dd24307b00..d6f3bc686939 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -473,3 +473,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 4cd5f982b1e5..1af6b469160a 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -438,3 +438,4 @@ 425 i386 io_uring_setup sys_io_uring_setup __ia32_sys_io_uring_setup 426 i386 io_uring_enter sys_io_uring_enter __ia32_sys_io_uring_enter 427 i386 io_uring_register sys_io_uring_register __ia32_sys_io_uring_register +428 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 64ca0d06259a..c18e6ebe3387 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -355,6 +355,7 @@ 425 common io_uring_setup __x64_sys_io_uring_setup 426 common io_uring_enter __x64_sys_io_uring_enter 427 common io_uring_register __x64_sys_io_uring_register +428 common pidfd_open __x64_sys_pidfd_open # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 30084eaf8422..21ee795f3003 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -398,3 +398,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/include/linux/pid.h b/include/linux/pid.h index 3c8ef5a199ca..c938a92eab99 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -67,6 +67,7 @@ struct pid extern struct pid init_struct_pid; extern const struct file_operations pidfd_fops; +extern int pidfd_create(struct pid *pid); static inline struct pid *get_pid(struct pid *pid) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index e2870fe1be5b..989055e0b501 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock, struct old_timex32 __user *tx); asmlinkage long sys_syncfs(int fd); asmlinkage long sys_setns(int fd, int nstype); +asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags); asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg, unsigned int vlen, unsigned flags); asmlinkage long sys_process_vm_readv(pid_t pid, diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index dee7292e1df6..94a257a93d20 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup) __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter) #define __NR_io_uring_register 427 __SYSCALL(__NR_io_uring_register, sys_io_uring_register) +#define __NR_pidfd_open 428 +__SYSCALL(__NR_pidfd_open, sys_pidfd_open) #undef __NR_syscalls -#define __NR_syscalls 428 +#define __NR_syscalls 429 /* * 32 bit systems traditionally used different diff --git a/kernel/fork.c b/kernel/fork.c index 737db1828437..980cc1d2b8d4 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1714,7 +1714,7 @@ const struct file_operations pidfd_fops = { * Return: On success, a cloexec pidfd is returned. * On error, a negative errno number will be returned. */ -static int pidfd_create(struct pid *pid) +int pidfd_create(struct pid *pid) { int fd; diff --git a/kernel/pid.c b/kernel/pid.c index 20881598bdfa..4afca3d6dcb8 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -38,6 +38,7 @@ #include <linux/syscalls.h> #include <linux/proc_ns.h> #include <linux/proc_fs.h> +#include <linux/sched/signal.h> #include <linux/sched/task.h> #include <linux/idr.h> @@ -451,6 +452,55 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns) return idr_get_next(&ns->idr, &nr); } +/** + * pidfd_open() - Open new pid file descriptor. + * + * @pid: pid for which to retrieve a pidfd + * @flags: flags to pass + * + * This creates a new pid file descriptor with the O_CLOEXEC flag set for + * the process identified by @pid. Currently, the process identified by + * @pid must be a thread-group leader. This restriction currently exists + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot + * be used with CLONE_THREAD) and pidfd polling (only supports thread group + * leaders). + * + * Return: On success, a cloexec pidfd is returned. + * On error, a negative errno number will be returned. + */ +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) +{ + int fd, ret; + struct pid *p; + struct task_struct *tsk; + + if (flags) + return -EINVAL; + + if (pid <= 0) + return -EINVAL; + + p = find_get_pid(pid); + if (!p) + return -ESRCH; + + ret = 0; + rcu_read_lock(); + /* + * If this returns non-NULL the pid was used as a thread-group + * leader. Note, we race with exec here: If it changes the + * thread-group leader we might return the old leader. + */ + tsk = pid_task(p, PIDTYPE_TGID); + if (!tsk) + ret = -ESRCH; + rcu_read_unlock(); + + fd = ret ?: pidfd_create(p); + put_pid(p); + return fd; +} + void __init pid_idr_init(void) { /* Verify no one has done anything silly: */ -- 2.21.0

6 years, 1 month

5
12
0 0

[PATCH bpf-next 0/2] Move bpf_printk to bpf_helpers.h

by Michal Rostecki

This series of patches move the commonly used bpf_printk macro to bpf_helpers.h which is already included in all BPF programs which defined that macro on their own. Michal Rostecki (2): selftests: bpf: Move bpf_printk to bpf_helpers.h samples: bpf: Do not define bpf_printk macro samples/bpf/hbm_kern.h | 12 +++--------- samples/bpf/hbm_out_kern.c | 2 ++ samples/bpf/tcp_basertt_kern.c | 7 ------- samples/bpf/tcp_bufs_kern.c | 7 ------- samples/bpf/tcp_clamp_kern.c | 7 ------- samples/bpf/tcp_cong_kern.c | 7 ------- samples/bpf/tcp_iw_kern.c | 7 ------- samples/bpf/tcp_rwnd_kern.c | 7 ------- samples/bpf/tcp_synrto_kern.c | 7 ------- samples/bpf/tcp_tos_reflect_kern.c | 7 ------- samples/bpf/xdp_sample_pkts_kern.c | 7 ------- tools/testing/selftests/bpf/bpf_helpers.h | 8 ++++++++ .../testing/selftests/bpf/progs/sockmap_parse_prog.c | 7 ------- .../selftests/bpf/progs/sockmap_tcp_msg_prog.c | 7 ------- .../selftests/bpf/progs/sockmap_verdict_prog.c | 7 ------- .../testing/selftests/bpf/progs/test_lwt_seg6local.c | 7 ------- .../testing/selftests/bpf/progs/test_xdp_noinline.c | 7 ------- tools/testing/selftests/bpf/test_sockmap_kern.h | 7 ------- 18 files changed, 13 insertions(+), 114 deletions(-) -- 2.21.0

6 years, 1 month

2
4
0 0

[PATCH v3] selftests/x86: Support Atom for syscall_arg_fault test

by Tong Bo

Atom-based CPUs trigger stack fault when invoke 32-bit SYSENTER instruction with invalid register values. So we also need SIGBUS handling in this case. Following is assembly when the fault exception happens. (gdb) disassemble $eip Dump of assembler code for function __kernel_vsyscall: 0xf7fd8fe0 <+0>: push %ecx 0xf7fd8fe1 <+1>: push %edx 0xf7fd8fe2 <+2>: push %ebp 0xf7fd8fe3 <+3>: mov %esp,%ebp 0xf7fd8fe5 <+5>: sysenter 0xf7fd8fe7 <+7>: int $0x80 => 0xf7fd8fe9 <+9>: pop %ebp 0xf7fd8fea <+10>: pop %edx 0xf7fd8feb <+11>: pop %ecx 0xf7fd8fec <+12>: ret End of assembler dump. According to Intel SDM, this could also be a Stack Segment Fault(#SS, 12), except a normal Page Fault(#PF, 14). Especially, in section 6.9 of Vol.3A, both stack and page faults are within the 10th(lowest priority) class, and as it said, "exceptions within each class are implementation-dependent and may vary from processor to processor". It's expected for processors like Intel Atom to trigger stack fault(SIGBUS), while we get page fault(SIGSEGV) from common Core processors. Signed-off-by: Tong Bo <bo.tong(a)intel.com> Acked-by: Andy Lutomirski <luto(a)kernel.org> --- tools/testing/selftests/x86/syscall_arg_fault.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/x86/syscall_arg_fault.c b/tools/testing/selftests/x86/syscall_arg_fault.c index 7db4fc9..d2548401 100644 --- a/tools/testing/selftests/x86/syscall_arg_fault.c +++ b/tools/testing/selftests/x86/syscall_arg_fault.c @@ -43,7 +43,7 @@ static sigjmp_buf jmpbuf; static volatile sig_atomic_t n_errs; -static void sigsegv(int sig, siginfo_t *info, void *ctx_void) +static void sigsegv_or_sigbus(int sig, siginfo_t *info, void *ctx_void) { ucontext_t *ctx = (ucontext_t*)ctx_void; @@ -73,7 +73,13 @@ int main() if (sigaltstack(&stack, NULL) != 0) err(1, "sigaltstack"); - sethandler(SIGSEGV, sigsegv, SA_ONSTACK); + sethandler(SIGSEGV, sigsegv_or_sigbus, SA_ONSTACK); + /* + * The actual exception can vary. On Atom CPUs, we get #SS + * instead of #PF when the vDSO fails to access the stack when + * ESP is too close to 2^32, and #SS causes SIGBUS. + */ + sethandler(SIGBUS, sigsegv_or_sigbus, SA_ONSTACK); sethandler(SIGILL, sigill, SA_ONSTACK); /* -- 2.7.4

6 years, 1 month

3
3
0 0

[PATCH] KVM: selftests: Compile code with warnings enabled

by Thomas Huth

So far the KVM selftests are compiled without any compiler warnings enabled. That's quite bad, since we miss a lot of possible bugs this way. Let's enable at least "-Wall" and some other useful warning flags now. Signed-off-by: Thomas Huth <thuth(a)redhat.com> --- This patch fixes most of the warnings in the x86 code already - but for some warnings, I was not quite sure (e.g. about the need for the kvm_get_supported_cpuid_entry(1) in some tests), so I did not touch that code yet. I also did not check aarch64 yet. I'd be glad if someone who knows these parts of the code could have a look at the warnings there. tools/testing/selftests/kvm/Makefile | 4 +++- tools/testing/selftests/kvm/dirty_log_test.c | 6 +++++- tools/testing/selftests/kvm/lib/kvm_util.c | 3 --- tools/testing/selftests/kvm/lib/x86_64/processor.c | 4 +--- tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c | 1 + tools/testing/selftests/kvm/x86_64/platform_info_test.c | 1 - tools/testing/selftests/kvm/x86_64/smm_test.c | 3 +-- .../selftests/kvm/x86_64/vmx_close_while_nested_test.c | 5 +---- tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c | 5 ++--- 9 files changed, 14 insertions(+), 18 deletions(-) diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index f8588cca2bef..93f344bb96af 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -32,7 +32,9 @@ LIBKVM += $(LIBKVM_$(UNAME_M)) INSTALL_HDR_PATH = $(top_srcdir)/usr LINUX_HDR_PATH = $(INSTALL_HDR_PATH)/include/ LINUX_TOOL_INCLUDE = $(top_srcdir)/tools/include -CFLAGS += -O2 -g -std=gnu99 -fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(UNAME_M) -I.. +CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \ + -fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \ + -I$(LINUX_HDR_PATH) -Iinclude -I$(<D) -Iinclude/$(UNAME_M) -I.. no-pie-option := $(call try-run, echo 'int main() { return 0; }' | \ $(CC) -Werror $(KBUILD_CPPFLAGS) $(CC_OPTION_CFLAGS) -no-pie -x c - -o "$$TMP", -no-pie) diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 93f99c6b7d79..0b9f197a783e 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -131,6 +131,7 @@ static void *vcpu_worker(void *data) while (!READ_ONCE(host_quit)) { /* Let the guest dirty the random pages */ ret = _vcpu_run(vm, VCPU_ID); + TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret); if (get_ucall(vm, VCPU_ID, &uc) == UCALL_SYNC) { pages_count += TEST_PAGES_PER_LOOP; generate_random_array(guest_array, TEST_PAGES_PER_LOOP); @@ -426,8 +427,11 @@ int main(int argc, char *argv[]) unsigned long interval = TEST_HOST_LOOP_INTERVAL; bool mode_selected = false; uint64_t phys_offset = 0; - unsigned int mode, host_ipa_limit; + unsigned int mode; int opt, i; +#ifdef __aarch64__ + unsigned int host_ipa_limit; +#endif #ifdef USE_CLEAR_DIRTY_LOG if (!kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT)) { diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index 4ca96b228e46..e2528a3894cf 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -135,7 +135,6 @@ struct kvm_vm *_vm_create(enum vm_guest_mode mode, uint64_t phy_pages, int perm, unsigned long type) { struct kvm_vm *vm; - int kvm_fd; vm = calloc(1, sizeof(*vm)); TEST_ASSERT(vm != NULL, "Insufficient Memory"); @@ -556,7 +555,6 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, uint32_t flags) { int ret; - unsigned long pmem_size = 0; struct userspace_mem_region *region; size_t huge_page_size = KVM_UTIL_PGS_PER_HUGEPG * vm->page_size; @@ -1302,7 +1300,6 @@ void vcpu_sregs_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_sregs *sregs) int _vcpu_sregs_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_sregs *sregs) { struct vcpu *vcpu = vcpu_find(vm, vcpuid); - int ret; TEST_ASSERT(vcpu != NULL, "vcpu not found, vcpuid: %u", vcpuid); diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c index dc7fae9fa424..21f3040d90cb 100644 --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c @@ -229,8 +229,6 @@ void sregs_dump(FILE *stream, struct kvm_sregs *sregs, void virt_pgd_alloc(struct kvm_vm *vm, uint32_t pgd_memslot) { - int rc; - TEST_ASSERT(vm->mode == VM_MODE_P52V48_4K, "Attempt to use " "unknown or unsupported guest mode, mode: 0x%x", vm->mode); @@ -549,7 +547,6 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva) struct pageDirectoryPointerEntry *pdpe; struct pageDirectoryEntry *pde; struct pageTableEntry *pte; - void *hva; TEST_ASSERT(vm->mode == VM_MODE_P52V48_4K, "Attempt to use " "unknown or unsupported guest mode, mode: 0x%x", vm->mode); @@ -582,6 +579,7 @@ vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva) unmapped_gva: TEST_ASSERT(false, "No mapping for vm virtual address, " "gva: 0x%lx", gva); + exit(EXIT_FAILURE); } static void kvm_setup_gdt(struct kvm_vm *vm, struct kvm_dtable *dt, int gdt_memslot, diff --git a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c index 7c2c4d4055a8..63cc9c3f5ab6 100644 --- a/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c +++ b/tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c @@ -87,6 +87,7 @@ int main(int argc, char *argv[]) while (1) { rc = _vcpu_run(vm, VCPU_ID); + TEST_ASSERT(rc == 0, "vcpu_run failed: %d\n", rc); TEST_ASSERT(run->exit_reason == KVM_EXIT_IO, "Unexpected exit reason: %u (%s),\n", run->exit_reason, diff --git a/tools/testing/selftests/kvm/x86_64/platform_info_test.c b/tools/testing/selftests/kvm/x86_64/platform_info_test.c index eb3e7a838cb4..40050e44ec0a 100644 --- a/tools/testing/selftests/kvm/x86_64/platform_info_test.c +++ b/tools/testing/selftests/kvm/x86_64/platform_info_test.c @@ -81,7 +81,6 @@ static void test_msr_platform_info_disabled(struct kvm_vm *vm) int main(int argc, char *argv[]) { struct kvm_vm *vm; - struct kvm_run *state; int rv; uint64_t msr_platform_info; diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c index fb8086964d83..4daf520bada1 100644 --- a/tools/testing/selftests/kvm/x86_64/smm_test.c +++ b/tools/testing/selftests/kvm/x86_64/smm_test.c @@ -87,7 +87,6 @@ void guest_code(struct vmx_pages *vmx_pages) int main(int argc, char *argv[]) { - struct vmx_pages *vmx_pages = NULL; vm_vaddr_t vmx_pages_gva = 0; struct kvm_regs regs; @@ -115,7 +114,7 @@ int main(int argc, char *argv[]) vcpu_set_msr(vm, VCPU_ID, MSR_IA32_SMBASE, SMRAM_GPA); if (kvm_check_cap(KVM_CAP_NESTED_STATE)) { - vmx_pages = vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_alloc_vmx(vm, &vmx_pages_gva); vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); } else { printf("will skip SMM test with VMX enabled\n"); diff --git a/tools/testing/selftests/kvm/x86_64/vmx_close_while_nested_test.c b/tools/testing/selftests/kvm/x86_64/vmx_close_while_nested_test.c index 6edec6fd790b..97182b47b10c 100644 --- a/tools/testing/selftests/kvm/x86_64/vmx_close_while_nested_test.c +++ b/tools/testing/selftests/kvm/x86_64/vmx_close_while_nested_test.c @@ -39,8 +39,6 @@ static void l1_guest_code(struct vmx_pages *vmx_pages) { #define L2_GUEST_STACK_SIZE 64 unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE]; - uint32_t control; - uintptr_t save_cr3; GUEST_ASSERT(prepare_for_vmx_operation(vmx_pages)); GUEST_ASSERT(load_vmcs(vmx_pages)); @@ -55,7 +53,6 @@ static void l1_guest_code(struct vmx_pages *vmx_pages) int main(int argc, char *argv[]) { - struct vmx_pages *vmx_pages; vm_vaddr_t vmx_pages_gva; struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1); @@ -68,7 +65,7 @@ int main(int argc, char *argv[]) vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); /* Allocate VMX pages and shared descriptors (vmx_pages). */ - vmx_pages = vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_alloc_vmx(vm, &vmx_pages_gva); vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); for (;;) { diff --git a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c index 18fa64db0d7a..6d37a3173956 100644 --- a/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c +++ b/tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c @@ -121,7 +121,7 @@ static void l1_guest_code(struct vmx_pages *vmx_pages) GUEST_DONE(); } -void report(int64_t val) +static void report(int64_t val) { printf("IA32_TSC_ADJUST is %ld (%lld * TSC_ADJUST_VALUE + %lld).\n", val, val / TSC_ADJUST_VALUE, val % TSC_ADJUST_VALUE); @@ -129,7 +129,6 @@ void report(int64_t val) int main(int argc, char *argv[]) { - struct vmx_pages *vmx_pages; vm_vaddr_t vmx_pages_gva; struct kvm_cpuid_entry2 *entry = kvm_get_supported_cpuid_entry(1); @@ -142,7 +141,7 @@ int main(int argc, char *argv[]) vcpu_set_cpuid(vm, VCPU_ID, kvm_get_supported_cpuid()); /* Allocate VMX pages and shared descriptors (vmx_pages). */ - vmx_pages = vcpu_alloc_vmx(vm, &vmx_pages_gva); + vcpu_alloc_vmx(vm, &vmx_pages_gva); vcpu_args_set(vm, VCPU_ID, 1, vmx_pages_gva); for (;;) { -- 2.21.0

6 years, 1 month

2
2
0 0

[GIT PULL] Kselftest second update for Linux 5.2-rc1

by Shuah Khan

Hi Linus, Please pull the following Kselftest update for Linux 5.2-rc1 This kselftest second update for Linux 5.2-rc1 consists of Kselftest framework fixes from Shuah Khan - kselftest framework bpf build/test workflow regression fix - Fix to kselftest install to use default install path - Fix to kselftest KBUILD_OUTPUT builds to not clutter main KBUILD_OUTPUT directory with selftest objects - .gitignore fixes from Kelsey Skunberg - rseq selftests updates from Mathieu Desnoyers and Martin Schwidefsky: They change the per-architecture pre-abort signatures to ensure those are valid trap instructions. The way exit points are presented to debuggers is enhanced, ensuring all exit points are present, so debuggers don't have to disassemble rseq critical section to properly skip over them. Discussions with the glibc community is reaching a consensus of exposing a __rseq_handled symbol from glibc to coexist with rseq early adopters. Update the rseq selftest code to expose and use this symbol. Support for compiling asm goto with clang is added with the "-no-integrated-as" compiler switch, similarly to the top level kernel Makefile. - kselftest Makefile test run output refactoring and making test output TAP13 compliant from Kees Cook: This re-factors the selftest Makefiles to extract the test running logic to be reused between "run_tests" and "emit_tests", while also fixing up the test output to be TAP version 13 compliant: - added "plan" line - fixed result line syntax - moved all test output to be "# "-prefixed as TAP "diagnostic" lines The prefixing code includes a fallback mode for limited execution environments. Additionally, the plan lines are fixed for all callers of kselftest.h. diff is attached. Also, there is a merge conflict with the following commit which went through the s390 maintainer tree and is now in your master which will require a minor fix-up. commit e24e4712efad737ca09ff299276737331cd021d9 Author: Martin Schwidefsky <schwidefsky(a)de.ibm.com> rseq/selftests: s390: use trap4 for RSEQ_SIG thanks, -- Shuah --------------------------------------------------------------- The following changes since commit d917fb876f6eaeeea8a2b620d2a266ce26372f4d: selftests: build and run gpio when output directory is the src dir (2019-04-22 17:02:26 -0600) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-5.2-rc1-2 for you to fetch changes up to 61c2018c0743fe0c9ca68e308b5727b8a7c3d544: selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects (2019-05-14 17:37:41 -0600) ---------------------------------------------------------------- linux-kselftest-5.2-rc1-2 This kselftest second update for Linux 5.2-rc1 consists of Kselftest framework fixes from Shuah Khan - kselftest framework bpf build/test workflow regression fix - Fix to kselftest install to use default install path - Fix to kselftest KBUILD_OUTPUT builds to not clutter main KBUILD_OUTPUT directory with selftest objects - .gitignore fixes from Kelsey Skunberg - rseq selftests updates from Mathieu Desnoyers and Martin Schwidefsky: They change the per-architecture pre-abort signatures to ensure those are valid trap instructions. The way exit points are presented to debuggers is enhanced, ensuring all exit points are present, so debuggers don't have to disassemble rseq critical section to properly skip over them. Discussions with the glibc community is reaching a consensus of exposing a __rseq_handled symbol from glibc to coexist with rseq early adopters. Update the rseq selftest code to expose and use this symbol. Support for compiling asm goto with clang is added with the "-no-integrated-as" compiler switch, similarly to the top level kernel Makefile. - kselftest Makefile test run output refactoring and making test output TAP13 compliant from Kees Cook: This re-factors the selftest Makefiles to extract the test running logic to be reused between "run_tests" and "emit_tests", while also fixing up the test output to be TAP version 13 compliant: - added "plan" line - fixed result line syntax - moved all test output to be "# "-prefixed as TAP "diagnostic" lines The prefixing code includes a fallback mode for limited execution environments. Additionally, the plan lines are fixed for all callers of kselftest.h. ---------------------------------------------------------------- Kees Cook (8): selftests: Extract single-test shell logic from lib.mk selftests: Use runner.sh for emit targets selftests: Extract logic for multiple test runs selftests: Add plan line and fix result line syntax selftests: Distinguish between missing and non-executable selftests: Move test output to diagnostic lines selftests: Remove KSFT_TAP_LEVEL selftests: Add test plan API to kselftest.h and adjust callers Kelsey Skunberg (2): selftests: pidfd: Create .gitignore to include pidfd_test selftests: drivers: Create .gitignore to include /dma-buf/udmabuf Martin Schwidefsky (1): rseq/selftests: s390: use trap4 for RSEQ_SIG Mathieu Desnoyers (11): rseq/selftests: x86: Work-around bogus gcc-8 optimisation rseq/selftests: Add __rseq_exit_point_array section for debuggers rseq/selftests: Introduce __rseq_cs_ptr_array, rename __rseq_table to __rseq_cs rseq/selftests: Use __rseq_handled symbol to coexist with glibc rseq/selftests: s390: use jg instruction for jumps outside of the asm rseq/selftests: x86: use ud1 instruction as RSEQ_SIG opcode rseq/selftests: arm: use udf instruction for RSEQ_SIG rseq/selftests: aarch64 code signature: handle big-endian environment rseq/selftests: powerpc code signature: generate valid instructions rseq/selftests: mips: use break instruction for RSEQ_SIG rseq/selftests: add -no-integrated-as for clang Shuah Khan (3): selftests: fix install target to use default install path selftests: fix bpf build/test workflow regression when KBUILD_OUTPUT is set selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects tools/testing/selftests/.gitignore | 1 - tools/testing/selftests/Makefile | 39 +-- .../selftests/breakpoints/breakpoint_test.c | 15 +- .../selftests/breakpoints/breakpoint_test_arm64.c | 3 +- .../breakpoints/step_after_suspend_test.c | 8 + tools/testing/selftests/capabilities/test_execve.c | 6 +- tools/testing/selftests/drivers/.gitignore | 1 + .../selftests/futex/functional/futex_requeue_pi.c | 1 + .../functional/futex_requeue_pi_mismatched_ops.c | 1 + .../functional/futex_requeue_pi_signal_restart.c | 1 + .../functional/futex_wait_private_mapped_file.c | 1 + .../futex/functional/futex_wait_timeout.c | 1 + .../functional/futex_wait_uninitialized_heap.c | 1 + .../futex/functional/futex_wait_wouldblock.c | 1 + tools/testing/selftests/kselftest.h | 17 +- tools/testing/selftests/kselftest/prefix.pl | 23 ++ tools/testing/selftests/kselftest/runner.sh | 86 +++++++ tools/testing/selftests/lib.mk | 76 ++---- .../testing/selftests/membarrier/membarrier_test.c | 1 + tools/testing/selftests/pidfd/.gitignore | 1 + tools/testing/selftests/pidfd/pidfd_test.c | 1 + tools/testing/selftests/rseq/Makefile | 8 +- tools/testing/selftests/rseq/rseq-arm.h | 132 +++++++++-- tools/testing/selftests/rseq/rseq-arm64.h | 74 +++++- tools/testing/selftests/rseq/rseq-mips.h | 115 ++++++++- tools/testing/selftests/rseq/rseq-ppc.h | 90 ++++++- tools/testing/selftests/rseq/rseq-s390.h | 78 +++++- tools/testing/selftests/rseq/rseq-x86.h | 264 ++++++++++++++------- tools/testing/selftests/rseq/rseq.c | 55 ++++- tools/testing/selftests/rseq/rseq.h | 1 + tools/testing/selftests/sigaltstack/sas.c | 1 + tools/testing/selftests/sync/sync_test.c | 1 + 32 files changed, 883 insertions(+), 221 deletions(-) create mode 100644 tools/testing/selftests/drivers/.gitignore create mode 100755 tools/testing/selftests/kselftest/prefix.pl create mode 100644 tools/testing/selftests/kselftest/runner.sh create mode 100644 tools/testing/selftests/pidfd/.gitignore ---------------------------------------------------------------

6 years, 1 month

2
1
0 0

[PATCH] selftests: fix bpf build/test workflow regression when KBUILD_OUTPUT is set

by Shuah Khan

commit 8ce72dc32578 ("selftests: fix headers_install circular dependency") broke bpf build/test workflow. When KBUILD_OUTPUT is set, bpf objects end up in KBUILD_OUTPUT build directory instead of in ../selftests/bpf. The following bpf workflow breaks when it can't find the test_verifier: cd tools/testing/selftests/bpf; make; ./test_verifier; Fix it to set OUTPUT only when it is undefined in lib.mk. It didn't need to be set in the first place. Fixes: commit 8ce72dc32578 ("selftests: fix headers_install circular dependency") Reported-by: Alexei Starovoitov <alexei.starovoitov(a)gmail.com> Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org> --- tools/testing/selftests/lib.mk | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk index 098dd0065fb1..077337195783 100644 --- a/tools/testing/selftests/lib.mk +++ b/tools/testing/selftests/lib.mk @@ -3,15 +3,9 @@ CC := $(CROSS_COMPILE)gcc ifeq (0,$(MAKELEVEL)) - ifneq ($(O),) - OUTPUT := $(O) - else - ifneq ($(KBUILD_OUTPUT),) - OUTPUT := $(KBUILD_OUTPUT) - else - OUTPUT := $(shell pwd) - DEFAULT_INSTALL_HDR_PATH := 1 - endif + ifeq ($(OUTPUT),) + OUTPUT := $(shell pwd) + DEFAULT_INSTALL_HDR_PATH := 1 endif endif selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST)))) -- 2.17.1

6 years, 1 month

2
3
0 0

[PATCH bpf-next v2 0/2] Move bpf_printk to bpf_helpers.h

by Michal Rostecki

This series of patches move the commonly used bpf_printk macro to bpf_helpers.h which is already included in all BPF programs which defined that macro on their own. v1->v2: - If HBM_DEBUG is not defined in hbm sample, undefine bpf_printk and set an empty macro for it. Michal Rostecki (2): selftests: bpf: Move bpf_printk to bpf_helpers.h samples: bpf: Do not define bpf_printk macro samples/bpf/hbm_kern.h | 11 ++--------- samples/bpf/tcp_basertt_kern.c | 7 ------- samples/bpf/tcp_bufs_kern.c | 7 ------- samples/bpf/tcp_clamp_kern.c | 7 ------- samples/bpf/tcp_cong_kern.c | 7 ------- samples/bpf/tcp_iw_kern.c | 7 ------- samples/bpf/tcp_rwnd_kern.c | 7 ------- samples/bpf/tcp_synrto_kern.c | 7 ------- samples/bpf/tcp_tos_reflect_kern.c | 7 ------- samples/bpf/xdp_sample_pkts_kern.c | 7 ------- tools/testing/selftests/bpf/bpf_helpers.h | 8 ++++++++ .../testing/selftests/bpf/progs/sockmap_parse_prog.c | 7 ------- .../selftests/bpf/progs/sockmap_tcp_msg_prog.c | 7 ------- .../selftests/bpf/progs/sockmap_verdict_prog.c | 7 ------- .../testing/selftests/bpf/progs/test_lwt_seg6local.c | 7 ------- tools/testing/selftests/bpf/progs/test_xdp_noinline.c | 7 ------- tools/testing/selftests/bpf/test_sockmap_kern.h | 7 ------- 17 files changed, 10 insertions(+), 114 deletions(-) -- 2.21.0

6 years, 1 month

1
2
0 0

[PATCH 1/2] pid: add pidfd_open()

by Christian Brauner

This adds the pidfd_open() syscall. It allows a caller to retrieve pollable pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a process that is created via traditional fork()/clone() calls that is only referenced by a PID: int pidfd = pidfd_open(1234, 0); ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0); With the introduction of pidfds through CLONE_PIDFD it is possible to created pidfds at process creation time. However, a lot of processes get created with traditional PID-based calls such as fork() or clone() (without CLONE_PIDFD). For these processes a caller can currently not create a pollable pidfd. This is a huge problem for Android's low memory killer (LMK) and service managers such as systemd. Both are examples of tools that want to make use of pidfds to get reliable notification of process exit for non-parents (pidfd polling) and race-free signal sending (pidfd_send_signal()). They intend to switch to this API for process supervision/management as soon as possible. Having no way to get pollable pidfds from PID-only processes is one of the biggest blockers for them in adopting this api. With pidfd_open() making it possible to retrieve pidfd for PID-based processes we enable them to adopt this api. In line with Arnd's recent changes to consolidate syscall numbers across architectures, I have added the pidfd_open() syscall to all architectures at the same time. Signed-off-by: Christian Brauner <christian(a)brauner.io> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Kees Cook <keescook(a)chromium.org> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: Jann Horn <jannh(a)google.com> Cc: David Howells <dhowells(a)redhat.com> Cc: Andy Lutomirsky <luto(a)kernel.org> Cc: Andrew Morton <akpm(a)linux-foundation.org> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: Aleksa Sarai <cyphar(a)cyphar.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: linux-api(a)vger.kernel.org --- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + include/linux/pid.h | 1 + include/linux/syscalls.h | 1 + include/uapi/asm-generic/unistd.h | 4 +- kernel/fork.c | 2 +- kernel/pid.c | 48 +++++++++++++++++++++ 19 files changed, 69 insertions(+), 2 deletions(-) diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 165f268beafc..ddc3c93ad7a7 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -467,3 +467,4 @@ 535 common io_uring_setup sys_io_uring_setup 536 common io_uring_enter sys_io_uring_enter 537 common io_uring_register sys_io_uring_register +538 common pidfd_open sys_pidfd_open diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 23f1a44acada..350e2049b4a9 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -874,6 +874,8 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup) __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter) #define __NR_io_uring_register 427 __SYSCALL(__NR_io_uring_register, sys_io_uring_register) +#define __NR_pidfd_open 428 +__SYSCALL(__NR_pidfd_open, sys_pidfd_open) /* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 56e3d0b685e1..7115f6dd347a 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -348,3 +348,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index df4ec3ec71d1..44bf12b16ffe 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -427,3 +427,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index 4964947732af..0d32e5152dc0 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -433,3 +433,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index 9392dfe33f97..726e107b3c9f 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -366,3 +366,4 @@ 425 n32 io_uring_setup sys_io_uring_setup 426 n32 io_uring_enter sys_io_uring_enter 427 n32 io_uring_register sys_io_uring_register +428 n32 pidfd_open sys_pidfd_open diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index fe8ca623add8..83b46b568d51 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -424,3 +424,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index 00f5a63c8d9a..5294d04d7fa5 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -509,3 +509,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index 061418f787c3..dcdb838adf49 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -430,3 +430,4 @@ 425 common io_uring_setup sys_io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open sys_pidfd_open diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index 480b057556ee..8e66edfbc521 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -430,3 +430,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index a1dd24307b00..d6f3bc686939 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -473,3 +473,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index 4cd5f982b1e5..1af6b469160a 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -438,3 +438,4 @@ 425 i386 io_uring_setup sys_io_uring_setup __ia32_sys_io_uring_setup 426 i386 io_uring_enter sys_io_uring_enter __ia32_sys_io_uring_enter 427 i386 io_uring_register sys_io_uring_register __ia32_sys_io_uring_register +428 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 64ca0d06259a..c18e6ebe3387 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -355,6 +355,7 @@ 425 common io_uring_setup __x64_sys_io_uring_setup 426 common io_uring_enter __x64_sys_io_uring_enter 427 common io_uring_register __x64_sys_io_uring_register +428 common pidfd_open __x64_sys_pidfd_open # # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 30084eaf8422..21ee795f3003 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -398,3 +398,4 @@ 425 common io_uring_setup sys_io_uring_setup 426 common io_uring_enter sys_io_uring_enter 427 common io_uring_register sys_io_uring_register +428 common pidfd_open sys_pidfd_open diff --git a/include/linux/pid.h b/include/linux/pid.h index 3c8ef5a199ca..c938a92eab99 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -67,6 +67,7 @@ struct pid extern struct pid init_struct_pid; extern const struct file_operations pidfd_fops; +extern int pidfd_create(struct pid *pid); static inline struct pid *get_pid(struct pid *pid) { diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index e2870fe1be5b..989055e0b501 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock, struct old_timex32 __user *tx); asmlinkage long sys_syncfs(int fd); asmlinkage long sys_setns(int fd, int nstype); +asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags); asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg, unsigned int vlen, unsigned flags); asmlinkage long sys_process_vm_readv(pid_t pid, diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index dee7292e1df6..94a257a93d20 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup) __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter) #define __NR_io_uring_register 427 __SYSCALL(__NR_io_uring_register, sys_io_uring_register) +#define __NR_pidfd_open 428 +__SYSCALL(__NR_pidfd_open, sys_pidfd_open) #undef __NR_syscalls -#define __NR_syscalls 428 +#define __NR_syscalls 429 /* * 32 bit systems traditionally used different diff --git a/kernel/fork.c b/kernel/fork.c index 737db1828437..980cc1d2b8d4 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1714,7 +1714,7 @@ const struct file_operations pidfd_fops = { * Return: On success, a cloexec pidfd is returned. * On error, a negative errno number will be returned. */ -static int pidfd_create(struct pid *pid) +int pidfd_create(struct pid *pid) { int fd; diff --git a/kernel/pid.c b/kernel/pid.c index 20881598bdfa..237d18d6ecb8 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -38,6 +38,7 @@ #include <linux/syscalls.h> #include <linux/proc_ns.h> #include <linux/proc_fs.h> +#include <linux/sched/signal.h> #include <linux/sched/task.h> #include <linux/idr.h> @@ -451,6 +452,53 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns) return idr_get_next(&ns->idr, &nr); } +/** + * pidfd_open() - Open new pid file descriptor. + * + * @pid: pid for which to retrieve a pidfd + * @flags: flags to pass + * + * This creates a new pid file descriptor with the O_CLOEXEC flag set for + * the process identified by @pid. Currently, the process identified by + * @pid must be a thread-group leader. This restriction currently exists + * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot + * be used with CLONE_THREAD) and pidfd polling (only supports thread group + * leaders). + * + * Return: On success, a cloexec pidfd is returned. + * On error, a negative errno number will be returned. + */ +SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags) +{ + int fd, ret; + struct pid *p; + struct task_struct *tsk; + + if (flags) + return -EINVAL; + + if (pid <= 0) + return -EINVAL; + + p = find_get_pid(pid); + if (!p) + return -ESRCH; + + rcu_read_lock(); + tsk = pid_task(p, PIDTYPE_PID); + if (!tsk) + ret = -ESRCH; + else if (unlikely(!thread_group_leader(tsk))) + ret = -EINVAL; + else + ret = 0; + rcu_read_unlock(); + + fd = ret ?: pidfd_create(p); + put_pid(p); + return fd; +} + void __init pid_idr_init(void) { /* Verify no one has done anything silly: */ -- 2.21.0

6 years, 1 month

7
17
0 0

[PATCH AUTOSEL 5.0 29/34] KVM: selftests: make hyperv_cpuid test pass on AMD

by Sasha Levin

From: Vitaly Kuznetsov <vkuznets(a)redhat.com> [ Upstream commit eba3afde1cea7dbd7881683232f2a85e2ed86bfe ] Enlightened VMCS is only supported on Intel CPUs but the test shouldn't fail completely. Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c index 264425f75806b..9a21e912097c4 100644 --- a/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c +++ b/tools/testing/selftests/kvm/x86_64/hyperv_cpuid.c @@ -141,7 +141,13 @@ int main(int argc, char *argv[]) free(hv_cpuid_entries); - vcpu_ioctl(vm, VCPU_ID, KVM_ENABLE_CAP, &enable_evmcs_cap); + rv = _vcpu_ioctl(vm, VCPU_ID, KVM_ENABLE_CAP, &enable_evmcs_cap); + + if (rv) { + fprintf(stderr, + "Enlightened VMCS is unsupported, skip related test\n"); + goto vm_free; + } hv_cpuid_entries = kvm_get_supported_hv_cpuid(vm); if (!hv_cpuid_entries) @@ -151,6 +157,7 @@ int main(int argc, char *argv[]) free(hv_cpuid_entries); +vm_free: kvm_vm_free(vm); return 0; -- 2.20.1

6 years, 1 month

1
0
0 0

[PATCH AUTOSEL 5.0 28/34] KVM: fix KVM_CLEAR_DIRTY_LOG for memory slots of unaligned size

by Sasha Levin

From: Paolo Bonzini <pbonzini(a)redhat.com> [ Upstream commit 76d58e0f07ec203bbdfcaabd9a9fc10a5a3ed5ea ] If a memory slot's size is not a multiple of 64 pages (256K), then the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages either requires the requested page range to go beyond memslot->npages, or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect requires log->num_pages to be both in range and aligned. To allow this case, allow log->num_pages not to be a multiple of 64 if it ends exactly on the last page of the slot. Reported-by: Peter Xu <peterx(a)redhat.com> Fixes: 98938aa8edd6 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02) Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- Documentation/virtual/kvm/api.txt | 5 +++-- tools/testing/selftests/kvm/dirty_log_test.c | 9 ++++++--- virt/kvm/kvm_main.c | 7 ++++--- 3 files changed, 13 insertions(+), 8 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index ba8927c0d45c4..a1b8e6d92298c 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3790,8 +3790,9 @@ The ioctl clears the dirty status of pages in a memory slot, according to the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap field. Bit 0 of the bitmap corresponds to page "first_page" in the memory slot, and num_pages is the size in bits of the input bitmap. -Both first_page and num_pages must be a multiple of 64. For each bit -that is set in the input bitmap, the corresponding page is marked "clean" +first_page must be a multiple of 64; num_pages must also be a multiple of +64 unless first_page + num_pages is the size of the memory slot. For each +bit that is set in the input bitmap, the corresponding page is marked "clean" in KVM's dirty bitmap, and dirty tracking is re-enabled for that page (for example via write-protection, or by clearing the dirty bit in a page table entry). diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 4715cfba20dce..93f99c6b7d79e 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -288,8 +288,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, #endif max_gfn = (1ul << (guest_pa_bits - guest_page_shift)) - 1; guest_page_size = (1ul << guest_page_shift); - /* 1G of guest page sized pages */ - guest_num_pages = (1ul << (30 - guest_page_shift)); + /* + * A little more than 1G of guest page sized pages. Cover the + * case where the size is not aligned to 64 pages. + */ + guest_num_pages = (1ul << (30 - guest_page_shift)) + 3; host_page_size = getpagesize(); host_num_pages = (guest_num_pages * guest_page_size) / host_page_size + !!((guest_num_pages * guest_page_size) % host_page_size); @@ -359,7 +362,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations, kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap); #ifdef USE_CLEAR_DIRTY_LOG kvm_vm_clear_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap, 0, - DIV_ROUND_UP(host_num_pages, 64) * 64); + host_num_pages); #endif vm_dirty_log_verify(bmap); iteration++; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index b4f2d892a1d36..c9b102e65a3a7 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1241,7 +1241,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm, if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS) return -EINVAL; - if ((log->first_page & 63) || (log->num_pages & 63)) + if (log->first_page & 63) return -EINVAL; slots = __kvm_memslots(kvm, as_id); @@ -1254,8 +1254,9 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm, n = kvm_dirty_bitmap_bytes(memslot); if (log->first_page > memslot->npages || - log->num_pages > memslot->npages - log->first_page) - return -EINVAL; + log->num_pages > memslot->npages - log->first_page || + (log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63))) + return -EINVAL; *flush = false; dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot); -- 2.20.1

6 years, 1 month

1
0
0 0

[PATCH] selftests: pidfd: Create .gitignore to include pidfd_test

by Kelsey Skunberg

Create /selftests/pidfd/.gitignore which holds the following file name created after compiling: - pidfd_test Signed-off-by: Kelsey Skunberg <skunberg.kelsey(a)gmail.com> --- tools/testing/selftests/pidfd/.gitignore | 1 + 1 file changed, 1 insertion(+) create mode 100644 tools/testing/selftests/pidfd/.gitignore diff --git a/tools/testing/selftests/pidfd/.gitignore b/tools/testing/selftests/pidfd/.gitignore new file mode 100644 index 000000000000..822a1e63d045 --- /dev/null +++ b/tools/testing/selftests/pidfd/.gitignore @@ -0,0 +1 @@ +pidfd_test -- 2.20.1

6 years, 1 month

2
2
0 0

[PATCH v4 0/2] kheaders fixes for -rc

by Joel Fernandes (Google)

Linus, Greg, Masahiro, Here are some simple fixes for the kheaders feature. Please consider these patches for an rc release. They are based on Linus's master branch. The only difference between the last series [1] and this one is I squashed 1/3 and 3/3 and rebased. Thanks! [1] https://patchwork.kernel.org/cover/10939557/ Joel Fernandes (Google) (2): kheaders: Move from proc to sysfs kheaders: Do not regenerate archive if config is not changed init/Kconfig | 17 +++++---- kernel/Makefile | 4 +-- kernel/{gen_ikh_data.sh => gen_kheaders.sh} | 17 ++++++--- kernel/kheaders.c | 40 +++++++++------------ 4 files changed, 38 insertions(+), 40 deletions(-) rename kernel/{gen_ikh_data.sh => gen_kheaders.sh} (82%) -- 2.21.0.1020.gf2820cf01a-goog

6 years, 1 month

1
2
0 0

[PATCH v3 00/18] kunit: introduce KUnit, the Linux kernel unit testing framework

by Brendan Higgins

## TLDR I mostly wanted to incorporate feedback I got over the last week and a half. Biggest things to look out for: - KUnit core now outputs results in TAP14. - Heavily reworked tools/testing/kunit/kunit.py - Changed how parsing works. - Added testing. - Greg, Logan, you might want to re-review this. - Added documentation on how to use KUnit on non-UML kernels. You can see the docs rendered here[1]. There is still some discussion going on on the [PATCH v2 00/17] thread, but I wanted to get some of these updates out before they got too stale (and too difficult for me to keep track of). I hope no one minds. ## Background This patch set proposes KUnit, a lightweight unit testing and mocking framework for the Linux kernel. Unlike Autotest and kselftest, KUnit is a true unit testing framework; it does not require installing the kernel on a test machine or in a VM (however, KUnit still allows you to run tests on test machines or in VMs if you want) and does not require tests to be written in userspace running on a host kernel. Additionally, KUnit is fast: From invocation to completion KUnit can run several dozen tests in under a second. Currently, the entire KUnit test suite for KUnit runs in under a second from the initial invocation (build time excluded). KUnit is heavily inspired by JUnit, Python's unittest.mock, and Googletest/Googlemock for C++. KUnit provides facilities for defining unit test cases, grouping related test cases into test suites, providing common infrastructure for running tests, mocking, spying, and much more. ## What's so special about unit testing? A unit test is supposed to test a single unit of code in isolation, hence the name. There should be no dependencies outside the control of the test; this means no external dependencies, which makes tests orders of magnitudes faster. Likewise, since there are no external dependencies, there are no hoops to jump through to run the tests. Additionally, this makes unit tests deterministic: a failing unit test always indicates a problem. Finally, because unit tests necessarily have finer granularity, they are able to test all code paths easily solving the classic problem of difficulty in exercising error handling code. ## Is KUnit trying to replace other testing frameworks for the kernel? No. Most existing tests for the Linux kernel are end-to-end tests, which have their place. A well tested system has lots of unit tests, a reasonable number of integration tests, and some end-to-end tests. KUnit is just trying to address the unit test space which is currently not being addressed. ## More information on KUnit There is a bunch of documentation near the end of this patch set that describes how to use KUnit and best practices for writing unit tests. For convenience I am hosting the compiled docs here[2]. Additionally for convenience, I have applied these patches to a branch[3]. The repo may be cloned with: git clone https://kunit.googlesource.com/linux This patchset is on the kunit/rfc/v5.1/v3 branch. ## Changes Since Last Version - Converted KUnit core to print test results in TAP14 format as suggested by Greg and Frank. - Heavily reworked tools/testing/kunit/kunit.py - Changed how parsing works. - Added testing. - Added documentation on how to use KUnit on non-UML kernels. You can see the docs rendered here[1]. - Added a new set of EXPECTs and ASSERTs for pointer comparison. - Removed more function indirection as suggested by Logan. - Added a new patch that adds `kunit_try_catch_throw` to objtool's noreturn list. - Fixed a number of minorish issues pointed out by Shuah, Masahiro, and kbuild bot. [1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni… [2] https://google.github.io/kunit-docs/third_party/kernel/docs/ [3] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.1/v3 -- 2.21.0.1020.gf2820cf01a-goog

6 years, 1 month

3
27
0 0

[PATCH 1/3] kselftest/cgroup: fix unexcepted testing failure on test_memcontrol

by Alex Shi

The cgroup testing relys on the root cgroup's subtree_control setting, If the 'memory' controller isn't set, all test cases will be failed as following: $ sudo ./test_memcontrol not ok 1 test_memcg_subtree_control not ok 2 test_memcg_current ok 3 # skip test_memcg_min not ok 4 test_memcg_low not ok 5 test_memcg_high not ok 6 test_memcg_max not ok 7 test_memcg_oom_events ok 8 # skip test_memcg_swap_max not ok 9 test_memcg_sock not ok 10 test_memcg_oom_group_leaf_events not ok 11 test_memcg_oom_group_parent_events not ok 12 test_memcg_oom_group_score_events To correct this unexcepted failure, this patch write the 'memory' to subtree_control of root to get a right result. Signed-off-by: Alex Shi <alex.shi(a)linux.alibaba.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Roman Gushchin <guro(a)fb.com> Cc: Tejun Heo <tj(a)kernel.org> Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com> Cc: Jay Kamat <jgkamat(a)fb.com> Cc: linux-kselftest(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org --- tools/testing/selftests/cgroup/test_memcontrol.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index 6f339882a6ca..73612d604a2a 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -1205,6 +1205,10 @@ int main(int argc, char **argv) if (cg_read_strstr(root, "cgroup.controllers", "memory")) ksft_exit_skip("memory controller isn't available\n"); + if (cg_read_strstr(root, "cgroup.subtree_control", "memory")) + if (cg_write(root, "cgroup.subtree_control", "+memory")) + ksft_exit_skip("Failed to set root memory controller\n"); + for (i = 0; i < ARRAY_SIZE(tests); i++) { switch (tests[i].fn(root)) { case KSFT_PASS: -- 2.19.1.856.g8858448bb

6 years, 1 month

1
2
0 0

[PATCH] selftests: drivers: Create .gitignore to include /dma-buf/udmabuf

by Kelsey Skunberg

Create /selftests/drivers/.gitignore which holds the following file name created after compiling: - /dma-buf/udmabuf Signed-off-by: Kelsey Skunberg <skunberg.kelsey(a)gmail.com> --- tools/testing/selftests/drivers/.gitignore | 1 + 1 file changed, 1 insertion(+) create mode 100644 tools/testing/selftests/drivers/.gitignore diff --git a/tools/testing/selftests/drivers/.gitignore b/tools/testing/selftests/drivers/.gitignore new file mode 100644 index 000000000000..f6aebcc27b76 --- /dev/null +++ b/tools/testing/selftests/drivers/.gitignore @@ -0,0 +1 @@ +/dma-buf/udmabuf -- 2.20.1

6 years, 1 month

2
1
0 0

[PATCH] selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects

by Shuah Khan

Running "make kselftest" or building selftests when KBUILD_OUTPUT is set, will create selftest objects in the KBUILD_OUTPUT directory. This could be undesirable especially when user didn't intend to relocate selftest objects. Use KBUILD_OUTPUT/kselftest to create selftest objects instead of cluttering the main directory. Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org> --- tools/testing/selftests/Makefile | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index c71a63b923d4..9781ca79794a 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -71,6 +71,9 @@ override LDFLAGS = override MAKEFLAGS = endif +# Append kselftest to KBUILD_OUTPUT to avoid cluttering +# KBUILD_OUTPUT with selftest objects and headers installed +# by selftests Makefile or lib.mk. ifneq ($(KBUILD_SRC),) override LDFLAGS = endif @@ -79,7 +82,7 @@ ifneq ($(O),) BUILD := $(O) else ifneq ($(KBUILD_OUTPUT),) - BUILD := $(KBUILD_OUTPUT) + BUILD := $(KBUILD_OUTPUT)/kselftest else BUILD := $(shell pwd) DEFAULT_INSTALL_HDR_PATH := 1 -- 2.17.1

6 years, 1 month

1
0
0 0

next-20190510 kselftest results

by ci_notify＠linaro.org

Summary ------------------------------------------------------------------------ kernel: 5.1.0 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git git branch: master git commit: a802303934b3bd4df6e2fc8bf2e4ebced1c37556 git describe: next-20190510 Test details: https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190510 Regressions (compared to build next-20190509) ------------------------------------------------------------------------ No regressions Fixes (compared to build next-20190509) ------------------------------------------------------------------------ No fixes In total: ------------------------------------------------------------------------ Ran 12 total tests in the following environments and test suites. pass 11 fail 1 xfail 0 skip 0 Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64 Test Suites ----------- * boot-lkft-kselftests-master-519 * boot-lkft-kselftests-vsyscall-mode-native-master-519 * boot-lkft-kselftests-vsyscall-mode-none-master-519 Failures ------------------------------------------------------------------------ qemu_x86_64: qemu_arm64: qemu_i386: x15: qemu_arm: x86: i386: hi6220-hikey: juno-r2: dragonboard-410c: * boot-lkft-kselftests-master-519/dragonboard-410c Skips ------------------------------------------------------------------------ No skips -- Linaro LKFT https://lkft.linaro.org

6 years, 1 month

3
4
0 0

[PATCH v2 1/2] Add polling support to pidfd

by Joel Fernandes (Google)

Android low memory killer (LMK) needs to know when a process dies once it is sent the kill signal. It does so by checking for the existence of /proc/pid which is both racy and slow. For example, if a PID is reused between when LMK sends a kill signal and checks for existence of the PID, since the wrong PID is now possibly checked for existence. This patch adds polling support to pidfd. Using the polling support, LMK will be able to get notified when a process exists in race-free and fast way, and allows the LMK to do other things (such as by polling on other fds) while awaiting the process being killed to die. For notification to polling processes, we follow the same existing mechanism in the kernel used when the parent of the task group is to be notified of a child's death (do_notify_parent). This is precisely when the tasks waiting on a poll of pidfd are also awakened in this patch. We have decided to include the waitqueue in struct pid for the following reasons: 1. The wait queue has to survive for the lifetime of the poll. Including it in task_struct would not be option in this case because the task can be reaped and destroyed before the poll returns. 2. By including the struct pid for the waitqueue means that during de_thread(), the new thread group leader automatically gets the new waitqueue/pid even though its task_struct is different. Appropriate test cases are added in the second patch to provide coverage of all the cases the patch is handling. Andy had a similar patch [1] in the past which was a good reference however this patch tries to handle different situations properly related to thread group existence, and how/where it notifies. And also solves other bugs (waitqueue lifetime). Daniel had a similar patch [2] recently which this patch supercedes. [1] https://lore.kernel.org/patchwork/patch/345098/ [2] https://lore.kernel.org/lkml/20181029175322.189042-1-dancol@google.com/ Cc: Andy Lutomirski <luto(a)amacapital.net> Cc: Steven Rostedt <rostedt(a)goodmis.org> Cc: Daniel Colascione <dancol(a)google.com> Cc: Christian Brauner <christian(a)brauner.io> Cc: Jann Horn <jannh(a)google.com> Cc: Tim Murray <timmurray(a)google.com> Cc: Jonathan Kowalski <bl0pbl33p(a)gmail.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Al Viro <viro(a)zeniv.linux.org.uk> Cc: Kees Cook <keescook(a)chromium.org> Cc: David Howells <dhowells(a)redhat.com> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: kernel-team(a)android.com (Oleg improved the code by showing how to avoid tasklist_lock) Suggested-by: Oleg Nesterov <oleg(a)redhat.com> Co-developed-by: Daniel Colascione <dancol(a)google.com> Signed-off-by: Daniel Colascione <dancol(a)google.com> Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org> --- v1 -> v2: * Restructure poll code to avoid tasklist_lock (Oleg) * use task_pid instead of get_pid_task in notify_pidfd (Oleg) * Added comments to code, commit message nits (Christian) * Test case nits/improvements (Christian) RFC -> v1: * Based on CLONE_PIDFD patches: https://lwn.net/Articles/786244/ * Updated selftests. * Renamed poll wake function to do_notify_pidfd. * Removed depending on EXIT flags * Removed POLLERR flag since semantics are controversial and we don't have usecases for it right now (later we can add if there's a need for it). include/linux/pid.h | 3 +++ kernel/fork.c | 29 +++++++++++++++++++++++++++++ kernel/pid.c | 2 ++ kernel/signal.c | 11 +++++++++++ 4 files changed, 45 insertions(+) diff --git a/include/linux/pid.h b/include/linux/pid.h index 3c8ef5a199ca..1484db6ca8d1 100644 --- a/include/linux/pid.h +++ b/include/linux/pid.h @@ -3,6 +3,7 @@ #define _LINUX_PID_H #include <linux/rculist.h> +#include <linux/wait.h> enum pid_type { @@ -60,6 +61,8 @@ struct pid unsigned int level; /* lists of tasks that use this pid */ struct hlist_head tasks[PIDTYPE_MAX]; + /* wait queue for pidfd notifications */ + wait_queue_head_t wait_pidfd; struct rcu_head rcu; struct upid numbers[1]; }; diff --git a/kernel/fork.c b/kernel/fork.c index 5525837ed80e..721f8c9d2921 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1685,8 +1685,37 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f) } #endif +/* + * Poll support for process exit notification. + */ +static unsigned int pidfd_poll(struct file *file, struct poll_table_struct *pts) +{ + struct task_struct *task; + struct pid *pid = file->private_data; + int poll_flags = 0; + + poll_wait(file, &pid->wait_pidfd, pts); + + rcu_read_lock(); + task = pid_task(pid, PIDTYPE_PID); + WARN_ON_ONCE(task && !thread_group_leader(task)); + + /* + * Inform pollers only when the whole thread group exits, if thread + * group leader exits before all other threads in the group, then + * poll(2) should block, similar to the wait(2) family. + */ + if (!task || (task->exit_state && thread_group_empty(task))) + poll_flags = POLLIN | POLLRDNORM; + rcu_read_unlock(); + + return poll_flags; +} + + const struct file_operations pidfd_fops = { .release = pidfd_release, + .poll = pidfd_poll, #ifdef CONFIG_PROC_FS .show_fdinfo = pidfd_show_fdinfo, #endif diff --git a/kernel/pid.c b/kernel/pid.c index 20881598bdfa..5c90c239242f 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -214,6 +214,8 @@ struct pid *alloc_pid(struct pid_namespace *ns) for (type = 0; type < PIDTYPE_MAX; ++type) INIT_HLIST_HEAD(&pid->tasks[type]); + init_waitqueue_head(&pid->wait_pidfd); + upid = pid->numbers + ns->level; spin_lock_irq(&pidmap_lock); if (!(ns->pid_allocated & PIDNS_ADDING)) diff --git a/kernel/signal.c b/kernel/signal.c index 1581140f2d99..a17fff073c3d 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1800,6 +1800,14 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) return ret; } +static void do_notify_pidfd(struct task_struct *task) +{ + struct pid *pid; + + pid = task_pid(task); + wake_up_all(&pid->wait_pidfd); +} + /* * Let a parent know about the death of a child. * For a stopped/continued status change, use do_notify_parent_cldstop instead. @@ -1823,6 +1831,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig) BUG_ON(!tsk->ptrace && (tsk->group_leader != tsk || !thread_group_empty(tsk))); + /* Wake up all pidfd waiters */ + do_notify_pidfd(tsk); + if (sig != SIGCHLD) { /* * This is only possible if parent == real_parent. -- 2.21.0.593.g511ec345e18-goog

6 years, 1 month

4
10
0 0

[PATCH v2] selftests: kvm: Add files generated when compiled to .gitignore

by Kelsey Skunberg

The following files are generated in /selftests/kvm/ after compiling and should be added to /selftests/kvm/.gitignore: - /x86_64/hyperv_cpuid - /x86_64/smm_test - /clear_dirty_log_test Signed-off-by: Kelsey Skunberg <skunberg.kelsey(a)gmail.com> --- Changes since v1: - Included /clear_dirty_log_test to .gitignore - Updated commit log to reflect file added tools/testing/selftests/kvm/.gitignore | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore index 2689d1ea6d7a..391a19231618 100644 --- a/tools/testing/selftests/kvm/.gitignore +++ b/tools/testing/selftests/kvm/.gitignore @@ -6,4 +6,7 @@ /x86_64/vmx_close_while_nested_test /x86_64/vmx_tsc_adjust_test /x86_64/state_test +/x86_64/hyperv_cpuid +/x86_64/smm_test +/clear_dirty_log_test /dirty_log_test -- 2.20.1

6 years, 1 month

2
1
0 0

selftests: bpf: test_sock: WARNING: workqueue.c:3030 __flush_work+0x3fc/0x470

by Naresh Kamboju

Do you see kernel WARNING while running bpf: test_sock test case ? This kernel warning is popping up continuously for ~725 times on all devices ( arm64, arm, x86_64 and i386). selftests: bpf: test_sock Test case: bind4 load with invalid access: src_ip6 .. [PASS] Test case: bind4 load with invalid access: mark .. [PASS] Test case: bind6 load with invalid access: src_ip4 .. [PASS] Test case: sock_create load with invalid access: src_port .. [PASS] Test case: sock_create load w/o expected_attach_type (compat mode) .. [PASS] Test case: sock_create load w/ expected_attach_type .. [PASS] Test case: attach type mismatch bind4 vs bind6 .. [PASS] Test case: attach type mismatch bind6 vs bind4 .. [PASS] Test case: attach type mismatch default vs bind4 .. [PASS] Test case: attach type mismatch bind6 vs sock_create .. [PASS] Test case: bind4 reject all .. [PASS] Test case: bind6 reject all .. [PASS] Test case: bind6 deny specific IP & port .. [PASS] Test case: bind4 allow specific IP & port .. [PASS] Test case: bind4 allow all .. [PASS] Test case: bind6 allow all .. [PASS] Summ[ 200.359313] WARNING: CPU: 1 PID: 3400 at /usr/src/kernel/kernel/workqueue.c:3030 __flush_work+0x3fc/0x470 [ 200.369390] Modules linked in: tun algif_hash af_alg x86_pkg_temp_thermal fuse [ 200.376621] CPU: 1 PID: 3400 Comm: kworker/1:156 Not tainted 5.1.0 #1 [ 200.383061] Hardware name: Supermicro SYS-5019S-ML/X11SSH-F, BIOS 2.2 05/23/2018 [ 200.390455] Workqueue: events sk_psock_destroy_deferred [ 200.395679] RIP: 0010:__flush_work+0x3fc/0x470 [ 200.400123] Code: 03 00 31 c0 e9 a4 fe ff ff 41 8b 0c 24 49 8b 54 24 08 83 e1 08 49 0f ba 2c 24 03 80 c9 f0 e9 d0 fd ff ff 0f 0b e9 83 fe ff ff <0f> 0b 31 c0 e9 7a fe ff ff e8 e6 1f 06 00 84 c0 0f 85 fd fe ff ff [ 200.418859] RSP: 0018:ffffa5d68487bca0 EFLAGS: 00010246 [ 200.424079] RAX: 0000000000000000 RBX: ffff9abc19269468 RCX: 0000000000000006 [ 200.431200] RDX: 0000000000001b35 RSI: 0000000000000001 RDI: ffff9abc19269468 [ 200.438324] RBP: ffffa5d68487bd68 R08: 0000000000000000 R09: 0000000000000000 [ 200.445449] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9abc19269468 [ 200.452572] R13: 0000000000000001 R14: ffffa5d68487bd98 R15: ffffffffabe91000 [ 200.459696] FS: 0000000000000000(0000) GS:ffff9abc6f880000(0000) knlGS:0000000000000000 [ 200.467776] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 200.473521] CR2: 00007f6c3ebf3cb0 CR3: 00000001d920a006 CR4: 00000000003606e0 [ 200.480644] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 200.487768] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 200.494891] Call Trace: [ 200.497338] ? get_work_pool+0x90/0x90 [ 200.501090] ? mark_held_locks+0x4d/0x80 [ 200.505015] ? __cancel_work_timer+0x11a/0x1d0 [ 200.509452] ? cancel_delayed_work_sync+0x13/0x20 [ 200.514151] ? lockdep_hardirqs_on+0xf6/0x190 [ 200.518509] ? __cancel_work_timer+0x11a/0x1d0 [ 200.522946] ? get_work_pool+0x90/0x90 [ 200.526691] __cancel_work_timer+0x134/0x1d0 [ 200.530965] cancel_delayed_work_sync+0x13/0x20 [ 200.535497] strp_done+0x1c/0x50 [ 200.538729] sk_psock_destroy_deferred+0x34/0x1c0 [ 200.543434] process_one_work+0x281/0x610 [ 200.547439] worker_thread+0x3c/0x3f0 [ 200.551105] kthread+0x12c/0x150 [ 200.554337] ? process_one_work+0x610/0x610 [ 200.558515] ? kthread_park+0x90/0x90 [ 200.562180] ret_from_fork+0x3a/0x50 [ 200.565762] irq event stamp: 2382 [ 200.569079] hardirqs last enabled at (2381): [<ffffffffabe9569a>] __cancel_work_timer+0x11a/0x1d0 [ 200.578030] hardirqs last disabled at (2382): [<ffffffffabe01b7b>] trace_hardirqs_off_thunk+0x1a/0x1c [ 200.587234] softirqs last enabled at (1806): [<fffffffface00334>] __do_softirq+0x334/0x445 [ 200.595573] softirqs last disabled at (1761): [<ffffffffabe76e70>] irq_exit+0xc0/0xd0 [ 200.603398] ---[ end trace 447ba7732b3f8640 ]--- Test full log link, https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.1-9822-g47782… Best regards Naresh Kamboju

6 years, 1 month

1
0
0 0

next-20190513 kselftest results

by ci_notify＠linaro.org

Summary ------------------------------------------------------------------------ kernel: 5.1.0 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git git branch: master git commit: 04c4b6775d34f12f196e056debed9e8718585342 git describe: next-20190513 Test details: https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190513 Regressions (compared to build next-20190510) ------------------------------------------------------------------------ No regressions Fixes (compared to build next-20190510) ------------------------------------------------------------------------ No fixes In total: ------------------------------------------------------------------------ Ran 12 total tests in the following environments and test suites. pass 11 fail 1 xfail 0 skip 0 Environments -------------- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64 Test Suites ----------- * boot-lkft-kselftests-master-520 * boot-lkft-kselftests-vsyscall-mode-native-master-520 * boot-lkft-kselftests-vsyscall-mode-none-master-520 Failures ------------------------------------------------------------------------ qemu_i386: juno-r2: qemu_arm64: qemu_x86_64: hi6220-hikey: dragonboard-410c: * boot-lkft-kselftests-master-520/dragonboard-410c i386: x86: qemu_arm: x15: Skips ------------------------------------------------------------------------ No skips -- Linaro LKFT https://lkft.linaro.org

6 years, 1 month

1
0
0 0

Re: [PATCH 2/4] x86/kprobes: Fix frame pointer annotations

by Josh Poimboeuf

On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote: > On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote: > > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote: > > > > Do the x86_64 variants also want some ORC annotation? > > > > Maybe so. Though it looks like regs->ip isn't saved. The saved > > registers might need to be tweaked. I'll need to look into it. > > What all these sites do (and maybe we should look at unifying them > somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka > pt_regs). > > So regs->ip will be the return address (which is fixed up to be the CALL > address in the handler). But from what I can tell, trampoline_handler() hard-codes regs->ip to point to kretprobe_trampoline(), and the original return address is placed in regs->sp. Masami, is there a reason why regs->ip doesn't have the original return address and regs->sp doesn't have the original SP? I think that would help the unwinder understand things. -- Josh

6 years, 1 month

5
22
0 0

[PATCH v2] selftests: bpf: Add files generated after build to .gitignore

by Kelsey Skunberg

The following files are generated after building /selftests/bpf/ and should be added to .gitignore: - libbpf.pc - libbpf.so.* Signed-off-by: Kelsey Skunberg <skunberg.kelsey(a)gmail.com> --- Change since v1: - Add libbpf.so.* in replace of libbpf.so.0 and libbpf.so.0.0.3 - Update commit log to reflect file change tools/testing/selftests/bpf/.gitignore | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore index 41e8a689aa77..a877803e4ba8 100644 --- a/tools/testing/selftests/bpf/.gitignore +++ b/tools/testing/selftests/bpf/.gitignore @@ -32,3 +32,5 @@ test_tcpnotify_user test_libbpf test_tcp_check_syncookie_user alu32 +libbpf.pc +libbpf.so.* -- 2.20.1

6 years, 1 month

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror