Hi Linus,
Please pull the following Kselftest update for Linux 5.2-rc1
This kselftest second update for Linux 5.2-rc1 consists of
Kselftest framework fixes from Shuah Khan
- kselftest framework bpf build/test workflow regression fix
- Fix to kselftest install to use default install path
- Fix to kselftest KBUILD_OUTPUT builds to not clutter main
KBUILD_OUTPUT directory with selftest objects
- .gitignore fixes from Kelsey Skunberg
- rseq selftests updates from Mathieu Desnoyers and Martin Schwidefsky:
They change the per-architecture pre-abort signatures to ensure those
are valid trap instructions.
The way exit points are presented to debuggers is enhanced, ensuring
all exit points are present, so debuggers don't have to disassemble
rseq critical section to properly skip over them.
Discussions with the glibc community is reaching a consensus of
exposing a __rseq_handled symbol from glibc to coexist with rseq early
adopters.
Update the rseq selftest code to expose and use this symbol.
Support for compiling asm goto with clang is added with the
"-no-integrated-as" compiler switch, similarly to the top level kernel
Makefile.
- kselftest Makefile test run output refactoring and making test
output TAP13 compliant from Kees Cook:
This re-factors the selftest Makefiles to extract the test running
logic to be reused between "run_tests" and "emit_tests", while also
fixing up the test output to be TAP version 13 compliant:
- added "plan" line
- fixed result line syntax
- moved all test output to be "# "-prefixed as TAP "diagnostic"
lines
The prefixing code includes a fallback mode for limited execution
environments.
Additionally, the plan lines are fixed for all callers of kselftest.h.
diff is attached.
Also, there is a merge conflict with the following commit which went
through the s390 maintainer tree and is now in your master which will
require a minor fix-up.
commit e24e4712efad737ca09ff299276737331cd021d9
Author: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
rseq/selftests: s390: use trap4 for RSEQ_SIG
thanks,
-- Shuah
---------------------------------------------------------------
The following changes since commit d917fb876f6eaeeea8a2b620d2a266ce26372f4d:
selftests: build and run gpio when output directory is the src dir
(2019-04-22 17:02:26 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-5.2-rc1-2
for you to fetch changes up to 61c2018c0743fe0c9ca68e308b5727b8a7c3d544:
selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects
(2019-05-14 17:37:41 -0600)
----------------------------------------------------------------
linux-kselftest-5.2-rc1-2
This kselftest second update for Linux 5.2-rc1 consists of
Kselftest framework fixes from Shuah Khan
- kselftest framework bpf build/test workflow regression fix
- Fix to kselftest install to use default install path
- Fix to kselftest KBUILD_OUTPUT builds to not clutter main
KBUILD_OUTPUT directory with selftest objects
- .gitignore fixes from Kelsey Skunberg
- rseq selftests updates from Mathieu Desnoyers and Martin Schwidefsky:
They change the per-architecture pre-abort signatures to ensure those
are valid trap instructions.
The way exit points are presented to debuggers is enhanced, ensuring
all exit points are present, so debuggers don't have to disassemble
rseq critical section to properly skip over them.
Discussions with the glibc community is reaching a consensus of exposing
a __rseq_handled symbol from glibc to coexist with rseq early adopters.
Update the rseq selftest code to expose and use this symbol.
Support for compiling asm goto with clang is added with the
"-no-integrated-as" compiler switch, similarly to the top level kernel
Makefile.
- kselftest Makefile test run output refactoring and making test
output TAP13 compliant from Kees Cook:
This re-factors the selftest Makefiles to extract the test running logic
to be reused between "run_tests" and "emit_tests", while also fixing
up the test output to be TAP version 13 compliant:
- added "plan" line
- fixed result line syntax
- moved all test output to be "# "-prefixed as TAP "diagnostic"
lines
The prefixing code includes a fallback mode for limited execution
environments.
Additionally, the plan lines are fixed for all callers of kselftest.h.
----------------------------------------------------------------
Kees Cook (8):
selftests: Extract single-test shell logic from lib.mk
selftests: Use runner.sh for emit targets
selftests: Extract logic for multiple test runs
selftests: Add plan line and fix result line syntax
selftests: Distinguish between missing and non-executable
selftests: Move test output to diagnostic lines
selftests: Remove KSFT_TAP_LEVEL
selftests: Add test plan API to kselftest.h and adjust callers
Kelsey Skunberg (2):
selftests: pidfd: Create .gitignore to include pidfd_test
selftests: drivers: Create .gitignore to include /dma-buf/udmabuf
Martin Schwidefsky (1):
rseq/selftests: s390: use trap4 for RSEQ_SIG
Mathieu Desnoyers (11):
rseq/selftests: x86: Work-around bogus gcc-8 optimisation
rseq/selftests: Add __rseq_exit_point_array section for debuggers
rseq/selftests: Introduce __rseq_cs_ptr_array, rename
__rseq_table to __rseq_cs
rseq/selftests: Use __rseq_handled symbol to coexist with glibc
rseq/selftests: s390: use jg instruction for jumps outside of the asm
rseq/selftests: x86: use ud1 instruction as RSEQ_SIG opcode
rseq/selftests: arm: use udf instruction for RSEQ_SIG
rseq/selftests: aarch64 code signature: handle big-endian environment
rseq/selftests: powerpc code signature: generate valid instructions
rseq/selftests: mips: use break instruction for RSEQ_SIG
rseq/selftests: add -no-integrated-as for clang
Shuah Khan (3):
selftests: fix install target to use default install path
selftests: fix bpf build/test workflow regression when
KBUILD_OUTPUT is set
selftests: avoid KBUILD_OUTPUT dir cluttering with selftest objects
tools/testing/selftests/.gitignore | 1 -
tools/testing/selftests/Makefile | 39 +--
.../selftests/breakpoints/breakpoint_test.c | 15 +-
.../selftests/breakpoints/breakpoint_test_arm64.c | 3 +-
.../breakpoints/step_after_suspend_test.c | 8 +
tools/testing/selftests/capabilities/test_execve.c | 6 +-
tools/testing/selftests/drivers/.gitignore | 1 +
.../selftests/futex/functional/futex_requeue_pi.c | 1 +
.../functional/futex_requeue_pi_mismatched_ops.c | 1 +
.../functional/futex_requeue_pi_signal_restart.c | 1 +
.../functional/futex_wait_private_mapped_file.c | 1 +
.../futex/functional/futex_wait_timeout.c | 1 +
.../functional/futex_wait_uninitialized_heap.c | 1 +
.../futex/functional/futex_wait_wouldblock.c | 1 +
tools/testing/selftests/kselftest.h | 17 +-
tools/testing/selftests/kselftest/prefix.pl | 23 ++
tools/testing/selftests/kselftest/runner.sh | 86 +++++++
tools/testing/selftests/lib.mk | 76 ++----
.../testing/selftests/membarrier/membarrier_test.c | 1 +
tools/testing/selftests/pidfd/.gitignore | 1 +
tools/testing/selftests/pidfd/pidfd_test.c | 1 +
tools/testing/selftests/rseq/Makefile | 8 +-
tools/testing/selftests/rseq/rseq-arm.h | 132 +++++++++--
tools/testing/selftests/rseq/rseq-arm64.h | 74 +++++-
tools/testing/selftests/rseq/rseq-mips.h | 115 ++++++++-
tools/testing/selftests/rseq/rseq-ppc.h | 90 ++++++-
tools/testing/selftests/rseq/rseq-s390.h | 78 +++++-
tools/testing/selftests/rseq/rseq-x86.h | 264
++++++++++++++-------
tools/testing/selftests/rseq/rseq.c | 55 ++++-
tools/testing/selftests/rseq/rseq.h | 1 +
tools/testing/selftests/sigaltstack/sas.c | 1 +
tools/testing/selftests/sync/sync_test.c | 1 +
32 files changed, 883 insertions(+), 221 deletions(-)
create mode 100644 tools/testing/selftests/drivers/.gitignore
create mode 100755 tools/testing/selftests/kselftest/prefix.pl
create mode 100644 tools/testing/selftests/kselftest/runner.sh
create mode 100644 tools/testing/selftests/pidfd/.gitignore
---------------------------------------------------------------
This series of patches move the commonly used bpf_printk macro to
bpf_helpers.h which is already included in all BPF programs which
defined that macro on their own.
v1->v2:
- If HBM_DEBUG is not defined in hbm sample, undefine bpf_printk and set
an empty macro for it.
Michal Rostecki (2):
selftests: bpf: Move bpf_printk to bpf_helpers.h
samples: bpf: Do not define bpf_printk macro
samples/bpf/hbm_kern.h | 11 ++---------
samples/bpf/tcp_basertt_kern.c | 7 -------
samples/bpf/tcp_bufs_kern.c | 7 -------
samples/bpf/tcp_clamp_kern.c | 7 -------
samples/bpf/tcp_cong_kern.c | 7 -------
samples/bpf/tcp_iw_kern.c | 7 -------
samples/bpf/tcp_rwnd_kern.c | 7 -------
samples/bpf/tcp_synrto_kern.c | 7 -------
samples/bpf/tcp_tos_reflect_kern.c | 7 -------
samples/bpf/xdp_sample_pkts_kern.c | 7 -------
tools/testing/selftests/bpf/bpf_helpers.h | 8 ++++++++
.../testing/selftests/bpf/progs/sockmap_parse_prog.c | 7 -------
.../selftests/bpf/progs/sockmap_tcp_msg_prog.c | 7 -------
.../selftests/bpf/progs/sockmap_verdict_prog.c | 7 -------
.../testing/selftests/bpf/progs/test_lwt_seg6local.c | 7 -------
tools/testing/selftests/bpf/progs/test_xdp_noinline.c | 7 -------
tools/testing/selftests/bpf/test_sockmap_kern.h | 7 -------
17 files changed, 10 insertions(+), 114 deletions(-)
--
2.21.0
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:
int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
With the introduction of pidfds through CLONE_PIDFD it is possible to
created pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a huge problem
for Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this api. With pidfd_open() making it possible to retrieve
pidfd for PID-based processes we enable them to adopt this api.
In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.
Signed-off-by: Christian Brauner <christian(a)brauner.io>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jann Horn <jannh(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Andy Lutomirsky <luto(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Aleksa Sarai <cyphar(a)cyphar.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: linux-api(a)vger.kernel.org
---
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/pid.h | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/fork.c | 2 +-
kernel/pid.c | 48 +++++++++++++++++++++
19 files changed, 69 insertions(+), 2 deletions(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 165f268beafc..ddc3c93ad7a7 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -467,3 +467,4 @@
535 common io_uring_setup sys_io_uring_setup
536 common io_uring_enter sys_io_uring_enter
537 common io_uring_register sys_io_uring_register
+538 common pidfd_open sys_pidfd_open
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index 23f1a44acada..350e2049b4a9 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -874,6 +874,8 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
__SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
#define __NR_io_uring_register 427
__SYSCALL(__NR_io_uring_register, sys_io_uring_register)
+#define __NR_pidfd_open 428
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index 56e3d0b685e1..7115f6dd347a 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -348,3 +348,4 @@
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index df4ec3ec71d1..44bf12b16ffe 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -427,3 +427,4 @@
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 4964947732af..0d32e5152dc0 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -433,3 +433,4 @@
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 9392dfe33f97..726e107b3c9f 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -366,3 +366,4 @@
425 n32 io_uring_setup sys_io_uring_setup
426 n32 io_uring_enter sys_io_uring_enter
427 n32 io_uring_register sys_io_uring_register
+428 n32 pidfd_open sys_pidfd_open
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index fe8ca623add8..83b46b568d51 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -424,3 +424,4 @@
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 00f5a63c8d9a..5294d04d7fa5 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -509,3 +509,4 @@
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index 061418f787c3..dcdb838adf49 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -430,3 +430,4 @@
425 common io_uring_setup sys_io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open sys_pidfd_open
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 480b057556ee..8e66edfbc521 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -430,3 +430,4 @@
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index a1dd24307b00..d6f3bc686939 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -473,3 +473,4 @@
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 4cd5f982b1e5..1af6b469160a 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@
425 i386 io_uring_setup sys_io_uring_setup __ia32_sys_io_uring_setup
426 i386 io_uring_enter sys_io_uring_enter __ia32_sys_io_uring_enter
427 i386 io_uring_register sys_io_uring_register __ia32_sys_io_uring_register
+428 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 64ca0d06259a..c18e6ebe3387 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@
425 common io_uring_setup __x64_sys_io_uring_setup
426 common io_uring_enter __x64_sys_io_uring_enter
427 common io_uring_register __x64_sys_io_uring_register
+428 common pidfd_open __x64_sys_pidfd_open
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 30084eaf8422..21ee795f3003 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -398,3 +398,4 @@
425 common io_uring_setup sys_io_uring_setup
426 common io_uring_enter sys_io_uring_enter
427 common io_uring_register sys_io_uring_register
+428 common pidfd_open sys_pidfd_open
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 3c8ef5a199ca..c938a92eab99 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -67,6 +67,7 @@ struct pid
extern struct pid init_struct_pid;
extern const struct file_operations pidfd_fops;
+extern int pidfd_create(struct pid *pid);
static inline struct pid *get_pid(struct pid *pid)
{
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e2870fe1be5b..989055e0b501 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock,
struct old_timex32 __user *tx);
asmlinkage long sys_syncfs(int fd);
asmlinkage long sys_setns(int fd, int nstype);
+asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags);
asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg,
unsigned int vlen, unsigned flags);
asmlinkage long sys_process_vm_readv(pid_t pid,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index dee7292e1df6..94a257a93d20 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -832,9 +832,11 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
__SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
#define __NR_io_uring_register 427
__SYSCALL(__NR_io_uring_register, sys_io_uring_register)
+#define __NR_pidfd_open 428
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
#undef __NR_syscalls
-#define __NR_syscalls 428
+#define __NR_syscalls 429
/*
* 32 bit systems traditionally used different
diff --git a/kernel/fork.c b/kernel/fork.c
index 737db1828437..980cc1d2b8d4 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1714,7 +1714,7 @@ const struct file_operations pidfd_fops = {
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
*/
-static int pidfd_create(struct pid *pid)
+int pidfd_create(struct pid *pid)
{
int fd;
diff --git a/kernel/pid.c b/kernel/pid.c
index 20881598bdfa..237d18d6ecb8 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -38,6 +38,7 @@
#include <linux/syscalls.h>
#include <linux/proc_ns.h>
#include <linux/proc_fs.h>
+#include <linux/sched/signal.h>
#include <linux/sched/task.h>
#include <linux/idr.h>
@@ -451,6 +452,53 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
return idr_get_next(&ns->idr, &nr);
}
+/**
+ * pidfd_open() - Open new pid file descriptor.
+ *
+ * @pid: pid for which to retrieve a pidfd
+ * @flags: flags to pass
+ *
+ * This creates a new pid file descriptor with the O_CLOEXEC flag set for
+ * the process identified by @pid. Currently, the process identified by
+ * @pid must be a thread-group leader. This restriction currently exists
+ * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
+ * be used with CLONE_THREAD) and pidfd polling (only supports thread group
+ * leaders).
+ *
+ * Return: On success, a cloexec pidfd is returned.
+ * On error, a negative errno number will be returned.
+ */
+SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
+{
+ int fd, ret;
+ struct pid *p;
+ struct task_struct *tsk;
+
+ if (flags)
+ return -EINVAL;
+
+ if (pid <= 0)
+ return -EINVAL;
+
+ p = find_get_pid(pid);
+ if (!p)
+ return -ESRCH;
+
+ rcu_read_lock();
+ tsk = pid_task(p, PIDTYPE_PID);
+ if (!tsk)
+ ret = -ESRCH;
+ else if (unlikely(!thread_group_leader(tsk)))
+ ret = -EINVAL;
+ else
+ ret = 0;
+ rcu_read_unlock();
+
+ fd = ret ?: pidfd_create(p);
+ put_pid(p);
+ return fd;
+}
+
void __init pid_idr_init(void)
{
/* Verify no one has done anything silly: */
--
2.21.0
From: Paolo Bonzini <pbonzini(a)redhat.com>
[ Upstream commit 76d58e0f07ec203bbdfcaabd9a9fc10a5a3ed5ea ]
If a memory slot's size is not a multiple of 64 pages (256K), then
the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages
either requires the requested page range to go beyond memslot->npages,
or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect
requires log->num_pages to be both in range and aligned.
To allow this case, allow log->num_pages not to be a multiple of 64 if
it ends exactly on the last page of the slot.
Reported-by: Peter Xu <peterx(a)redhat.com>
Fixes: 98938aa8edd6 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02)
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
Documentation/virtual/kvm/api.txt | 5 +++--
tools/testing/selftests/kvm/dirty_log_test.c | 9 ++++++---
virt/kvm/kvm_main.c | 7 ++++---
3 files changed, 13 insertions(+), 8 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ba8927c0d45c4..a1b8e6d92298c 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3790,8 +3790,9 @@ The ioctl clears the dirty status of pages in a memory slot, according to
the bitmap that is passed in struct kvm_clear_dirty_log's dirty_bitmap
field. Bit 0 of the bitmap corresponds to page "first_page" in the
memory slot, and num_pages is the size in bits of the input bitmap.
-Both first_page and num_pages must be a multiple of 64. For each bit
-that is set in the input bitmap, the corresponding page is marked "clean"
+first_page must be a multiple of 64; num_pages must also be a multiple of
+64 unless first_page + num_pages is the size of the memory slot. For each
+bit that is set in the input bitmap, the corresponding page is marked "clean"
in KVM's dirty bitmap, and dirty tracking is re-enabled for that page
(for example via write-protection, or by clearing the dirty bit in
a page table entry).
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 4715cfba20dce..93f99c6b7d79e 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -288,8 +288,11 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
#endif
max_gfn = (1ul << (guest_pa_bits - guest_page_shift)) - 1;
guest_page_size = (1ul << guest_page_shift);
- /* 1G of guest page sized pages */
- guest_num_pages = (1ul << (30 - guest_page_shift));
+ /*
+ * A little more than 1G of guest page sized pages. Cover the
+ * case where the size is not aligned to 64 pages.
+ */
+ guest_num_pages = (1ul << (30 - guest_page_shift)) + 3;
host_page_size = getpagesize();
host_num_pages = (guest_num_pages * guest_page_size) / host_page_size +
!!((guest_num_pages * guest_page_size) % host_page_size);
@@ -359,7 +362,7 @@ static void run_test(enum vm_guest_mode mode, unsigned long iterations,
kvm_vm_get_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap);
#ifdef USE_CLEAR_DIRTY_LOG
kvm_vm_clear_dirty_log(vm, TEST_MEM_SLOT_INDEX, bmap, 0,
- DIV_ROUND_UP(host_num_pages, 64) * 64);
+ host_num_pages);
#endif
vm_dirty_log_verify(bmap);
iteration++;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b4f2d892a1d36..c9b102e65a3a7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1241,7 +1241,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
return -EINVAL;
- if ((log->first_page & 63) || (log->num_pages & 63))
+ if (log->first_page & 63)
return -EINVAL;
slots = __kvm_memslots(kvm, as_id);
@@ -1254,8 +1254,9 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
n = kvm_dirty_bitmap_bytes(memslot);
if (log->first_page > memslot->npages ||
- log->num_pages > memslot->npages - log->first_page)
- return -EINVAL;
+ log->num_pages > memslot->npages - log->first_page ||
+ (log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63)))
+ return -EINVAL;
*flush = false;
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
--
2.20.1
Linus, Greg, Masahiro, Here are some simple fixes for the kheaders feature.
Please consider these patches for an rc release. They are based on Linus's
master branch. The only difference between the last series [1] and this one is
I squashed 1/3 and 3/3 and rebased. Thanks!
[1] https://patchwork.kernel.org/cover/10939557/
Joel Fernandes (Google) (2):
kheaders: Move from proc to sysfs
kheaders: Do not regenerate archive if config is not changed
init/Kconfig | 17 +++++----
kernel/Makefile | 4 +--
kernel/{gen_ikh_data.sh => gen_kheaders.sh} | 17 ++++++---
kernel/kheaders.c | 40 +++++++++------------
4 files changed, 38 insertions(+), 40 deletions(-)
rename kernel/{gen_ikh_data.sh => gen_kheaders.sh} (82%)
--
2.21.0.1020.gf2820cf01a-goog
## TLDR
I mostly wanted to incorporate feedback I got over the last week and a
half.
Biggest things to look out for:
- KUnit core now outputs results in TAP14.
- Heavily reworked tools/testing/kunit/kunit.py
- Changed how parsing works.
- Added testing.
- Greg, Logan, you might want to re-review this.
- Added documentation on how to use KUnit on non-UML kernels. You can
see the docs rendered here[1].
There is still some discussion going on on the [PATCH v2 00/17] thread,
but I wanted to get some of these updates out before they got too stale
(and too difficult for me to keep track of). I hope no one minds.
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in under a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[2].
Additionally for convenience, I have applied these patches to a
branch[3].
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/v5.1/v3 branch.
## Changes Since Last Version
- Converted KUnit core to print test results in TAP14 format as
suggested by Greg and Frank.
- Heavily reworked tools/testing/kunit/kunit.py
- Changed how parsing works.
- Added testing.
- Added documentation on how to use KUnit on non-UML kernels. You can
see the docs rendered here[1].
- Added a new set of EXPECTs and ASSERTs for pointer comparison.
- Removed more function indirection as suggested by Logan.
- Added a new patch that adds `kunit_try_catch_throw` to objtool's
noreturn list.
- Fixed a number of minorish issues pointed out by Shuah, Masahiro, and
kbuild bot.
[1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[2] https://google.github.io/kunit-docs/third_party/kernel/docs/
[3] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.1/v3
--
2.21.0.1020.gf2820cf01a-goog
The cgroup testing relys on the root cgroup's subtree_control setting,
If the 'memory' controller isn't set, all test cases will be failed
as following:
$ sudo ./test_memcontrol
not ok 1 test_memcg_subtree_control
not ok 2 test_memcg_current
ok 3 # skip test_memcg_min
not ok 4 test_memcg_low
not ok 5 test_memcg_high
not ok 6 test_memcg_max
not ok 7 test_memcg_oom_events
ok 8 # skip test_memcg_swap_max
not ok 9 test_memcg_sock
not ok 10 test_memcg_oom_group_leaf_events
not ok 11 test_memcg_oom_group_parent_events
not ok 12 test_memcg_oom_group_score_events
To correct this unexcepted failure, this patch write the 'memory' to
subtree_control of root to get a right result.
Signed-off-by: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Jay Kamat <jgkamat(a)fb.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
---
tools/testing/selftests/cgroup/test_memcontrol.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 6f339882a6ca..73612d604a2a 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -1205,6 +1205,10 @@ int main(int argc, char **argv)
if (cg_read_strstr(root, "cgroup.controllers", "memory"))
ksft_exit_skip("memory controller isn't available\n");
+ if (cg_read_strstr(root, "cgroup.subtree_control", "memory"))
+ if (cg_write(root, "cgroup.subtree_control", "+memory"))
+ ksft_exit_skip("Failed to set root memory controller\n");
+
for (i = 0; i < ARRAY_SIZE(tests); i++) {
switch (tests[i].fn(root)) {
case KSFT_PASS:
--
2.19.1.856.g8858448bb
Android low memory killer (LMK) needs to know when a process dies once
it is sent the kill signal. It does so by checking for the existence of
/proc/pid which is both racy and slow. For example, if a PID is reused
between when LMK sends a kill signal and checks for existence of the
PID, since the wrong PID is now possibly checked for existence.
This patch adds polling support to pidfd. Using the polling support, LMK
will be able to get notified when a process exists in race-free and fast
way, and allows the LMK to do other things (such as by polling on other
fds) while awaiting the process being killed to die.
For notification to polling processes, we follow the same existing
mechanism in the kernel used when the parent of the task group is to be
notified of a child's death (do_notify_parent). This is precisely when
the tasks waiting on a poll of pidfd are also awakened in this patch.
We have decided to include the waitqueue in struct pid for the following
reasons:
1. The wait queue has to survive for the lifetime of the poll. Including
it in task_struct would not be option in this case because the task can
be reaped and destroyed before the poll returns.
2. By including the struct pid for the waitqueue means that during
de_thread(), the new thread group leader automatically gets the new
waitqueue/pid even though its task_struct is different.
Appropriate test cases are added in the second patch to provide coverage
of all the cases the patch is handling.
Andy had a similar patch [1] in the past which was a good reference
however this patch tries to handle different situations properly related
to thread group existence, and how/where it notifies. And also solves
other bugs (waitqueue lifetime). Daniel had a similar patch [2]
recently which this patch supercedes.
[1] https://lore.kernel.org/patchwork/patch/345098/
[2] https://lore.kernel.org/lkml/20181029175322.189042-1-dancol@google.com/
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Daniel Colascione <dancol(a)google.com>
Cc: Christian Brauner <christian(a)brauner.io>
Cc: Jann Horn <jannh(a)google.com>
Cc: Tim Murray <timmurray(a)google.com>
Cc: Jonathan Kowalski <bl0pbl33p(a)gmail.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: kernel-team(a)android.com
(Oleg improved the code by showing how to avoid tasklist_lock)
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Co-developed-by: Daniel Colascione <dancol(a)google.com>
Signed-off-by: Daniel Colascione <dancol(a)google.com>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1 -> v2:
* Restructure poll code to avoid tasklist_lock (Oleg)
* use task_pid instead of get_pid_task in notify_pidfd (Oleg)
* Added comments to code, commit message nits (Christian)
* Test case nits/improvements (Christian)
RFC -> v1:
* Based on CLONE_PIDFD patches: https://lwn.net/Articles/786244/
* Updated selftests.
* Renamed poll wake function to do_notify_pidfd.
* Removed depending on EXIT flags
* Removed POLLERR flag since semantics are controversial and
we don't have usecases for it right now (later we can add if there's
a need for it).
include/linux/pid.h | 3 +++
kernel/fork.c | 29 +++++++++++++++++++++++++++++
kernel/pid.c | 2 ++
kernel/signal.c | 11 +++++++++++
4 files changed, 45 insertions(+)
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 3c8ef5a199ca..1484db6ca8d1 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -3,6 +3,7 @@
#define _LINUX_PID_H
#include <linux/rculist.h>
+#include <linux/wait.h>
enum pid_type
{
@@ -60,6 +61,8 @@ struct pid
unsigned int level;
/* lists of tasks that use this pid */
struct hlist_head tasks[PIDTYPE_MAX];
+ /* wait queue for pidfd notifications */
+ wait_queue_head_t wait_pidfd;
struct rcu_head rcu;
struct upid numbers[1];
};
diff --git a/kernel/fork.c b/kernel/fork.c
index 5525837ed80e..721f8c9d2921 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1685,8 +1685,37 @@ static void pidfd_show_fdinfo(struct seq_file *m, struct file *f)
}
#endif
+/*
+ * Poll support for process exit notification.
+ */
+static unsigned int pidfd_poll(struct file *file, struct poll_table_struct *pts)
+{
+ struct task_struct *task;
+ struct pid *pid = file->private_data;
+ int poll_flags = 0;
+
+ poll_wait(file, &pid->wait_pidfd, pts);
+
+ rcu_read_lock();
+ task = pid_task(pid, PIDTYPE_PID);
+ WARN_ON_ONCE(task && !thread_group_leader(task));
+
+ /*
+ * Inform pollers only when the whole thread group exits, if thread
+ * group leader exits before all other threads in the group, then
+ * poll(2) should block, similar to the wait(2) family.
+ */
+ if (!task || (task->exit_state && thread_group_empty(task)))
+ poll_flags = POLLIN | POLLRDNORM;
+ rcu_read_unlock();
+
+ return poll_flags;
+}
+
+
const struct file_operations pidfd_fops = {
.release = pidfd_release,
+ .poll = pidfd_poll,
#ifdef CONFIG_PROC_FS
.show_fdinfo = pidfd_show_fdinfo,
#endif
diff --git a/kernel/pid.c b/kernel/pid.c
index 20881598bdfa..5c90c239242f 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -214,6 +214,8 @@ struct pid *alloc_pid(struct pid_namespace *ns)
for (type = 0; type < PIDTYPE_MAX; ++type)
INIT_HLIST_HEAD(&pid->tasks[type]);
+ init_waitqueue_head(&pid->wait_pidfd);
+
upid = pid->numbers + ns->level;
spin_lock_irq(&pidmap_lock);
if (!(ns->pid_allocated & PIDNS_ADDING))
diff --git a/kernel/signal.c b/kernel/signal.c
index 1581140f2d99..a17fff073c3d 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1800,6 +1800,14 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
return ret;
}
+static void do_notify_pidfd(struct task_struct *task)
+{
+ struct pid *pid;
+
+ pid = task_pid(task);
+ wake_up_all(&pid->wait_pidfd);
+}
+
/*
* Let a parent know about the death of a child.
* For a stopped/continued status change, use do_notify_parent_cldstop instead.
@@ -1823,6 +1831,9 @@ bool do_notify_parent(struct task_struct *tsk, int sig)
BUG_ON(!tsk->ptrace &&
(tsk->group_leader != tsk || !thread_group_empty(tsk)));
+ /* Wake up all pidfd waiters */
+ do_notify_pidfd(tsk);
+
if (sig != SIGCHLD) {
/*
* This is only possible if parent == real_parent.
--
2.21.0.593.g511ec345e18-goog
On Wed, May 08, 2019 at 05:39:07PM +0200, Peter Zijlstra wrote:
> On Wed, May 08, 2019 at 07:42:48AM -0500, Josh Poimboeuf wrote:
> > On Wed, May 08, 2019 at 02:04:16PM +0200, Peter Zijlstra wrote:
>
> > > Do the x86_64 variants also want some ORC annotation?
> >
> > Maybe so. Though it looks like regs->ip isn't saved. The saved
> > registers might need to be tweaked. I'll need to look into it.
>
> What all these sites do (and maybe we should look at unifying them
> somehow) is turn a CALL frame (aka RET-IP) into an exception frame (aka
> pt_regs).
>
> So regs->ip will be the return address (which is fixed up to be the CALL
> address in the handler).
But from what I can tell, trampoline_handler() hard-codes regs->ip to
point to kretprobe_trampoline(), and the original return address is
placed in regs->sp.
Masami, is there a reason why regs->ip doesn't have the original return
address and regs->sp doesn't have the original SP? I think that would
help the unwinder understand things.
--
Josh
The following files are generated after building /selftests/bpf/ and
should be added to .gitignore:
- libbpf.pc
- libbpf.so.*
Signed-off-by: Kelsey Skunberg <skunberg.kelsey(a)gmail.com>
---
Change since v1:
- Add libbpf.so.* in replace of libbpf.so.0 and
libbpf.so.0.0.3
- Update commit log to reflect file change
tools/testing/selftests/bpf/.gitignore | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 41e8a689aa77..a877803e4ba8 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -32,3 +32,5 @@ test_tcpnotify_user
test_libbpf
test_tcp_check_syncookie_user
alu32
+libbpf.pc
+libbpf.so.*
--
2.20.1
Hello kbuild, kselftest,
I've been working on a patchset which adds an additional build script to
the toolchain when compiling livepatches. There are a few kernel
section features in which this script does not yet support, but can
detect and abort when it encounters. To test this detection, I've
written a small set of kernel modules that require such sections.
A few questions:
Is build-testing out of scope for kernel selftests? For expediency,
it was really easy to spin out new lib/livepatch kernel modules.
Does kbuild support the notion of expected failure? In this case, the
build script returns a non-zero error and the build stops.
Am I trying to fit a square peg in a round hole? I could easily keep
these build tests in a private branch, but could they exist in a
different format somewhere else in the tree?
Suggestions welcome,
-- Joe
Linus, Greg, Masahiro,
Here are some simple fixes for the kheaders feature. Please consider these
patches for an rc release. They are based on Linus's master branch. Thanks!
Joel Fernandes (Google) (3):
kheaders: Move from proc to sysfs
kheaders: Do not regenerate archive if config is not changed
kheaders: Make it depend on sysfs
init/Kconfig | 17 +++++----
kernel/Makefile | 4 +--
kernel/{gen_ikh_data.sh => gen_kheaders.sh} | 17 ++++++---
kernel/kheaders.c | 40 +++++++++------------
4 files changed, 38 insertions(+), 40 deletions(-)
rename kernel/{gen_ikh_data.sh => gen_kheaders.sh} (82%)
--
2.21.0.1020.gf2820cf01a-goog
[Andrew, could you take this patchset into your tree, please?
Besides the patch for hexagon, all patches in this series have
Acked-by or Reviewed-by tags already.
I have been waiting and pinging the hexagon maintainer since November
without any visible effect. The last Acked-by from the hexagon maintainer
in linux.git was in October and the last Signed-off-by was in July. Since
that time not a single change affecting hexagon was able to attract
attention of the hexagon maintainer, so I don't think it's worth waiting
any longer.]
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its tracer
has no reliable means to find out that the syscall was, in fact,
a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the tracer.
Common practice is to keep track of the sequence of ptrace-stops in order
not to mix the two syscall-stops up. But it is not as simple as it looks;
for example, strace had a (just recently fixed) long-standing bug where
attaching strace to a tracee that is performing the execve system call
led to the tracer identifying the following syscall-exit-stop as
syscall-enter-stop, which messed up all the state tracking.
* Since the introduction of commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3
("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA
and process_vm_readv become unavailable when the process dumpable flag
is cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
PTRACE_GET_SYSCALL_INFO returns the following structure:
struct ptrace_syscall_info {
__u8 op; /* PTRACE_SYSCALL_INFO_* */
__u32 arch __attribute__((__aligned__(sizeof(__u32))));
__u64 instruction_pointer;
__u64 stack_pointer;
union {
struct {
__u64 nr;
__u64 args[6];
} entry;
struct {
__s64 rval;
__u8 is_error;
} exit;
struct {
__u64 nr;
__u64 args[6];
__u32 ret_data;
} seccomp;
};
};
The structure was chosen according to [2], except for the following
changes:
* seccomp substructure was added as a superset of entry substructure;
* the type of nr field was changed from int to __u64 because syscall
numbers are, as a practical matter, 64 bits;
* stack_pointer field was added along with instruction_pointer field
since it is readily available and can save the tracer from extra
PTRACE_GETREGS/PTRACE_GETREGSET calls;
* arch is always initialized to aid with tracing system calls
* such as execve();
* instruction_pointer and stack_pointer are always initialized
so they could be easily obtained for non-syscall stops;
* a boolean is_error field was added along with rval field, this way
the tracer can more reliably distinguish a return value
from an error value.
strace has been ported to PTRACE_GET_SYSCALL_INFO.
Starting with release 4.26, strace uses PTRACE_GET_SYSCALL_INFO API
as the preferred mechanism of obtaining syscall information.
[1] https://lore.kernel.org/lkml/CA+55aFzcSVmdDj9Lh_gdbz1OzHyEm6ZrGPBDAJnywm2LF…
[2] https://lore.kernel.org/lkml/CAObL_7GM0n80N7J_DFw_eQyfLyzq+sf4y2AvsCCV88Tb3…
---
Notes:
v11:
* Added more Acked-by.
* Rebased back to mainline as the prerequisite syscall_get_arch patchset
has already been merged via audit tree.
v10:
* Added more Acked-by.
v9:
* Rebased to linux-next again due to syscall_get_arguments() signature change.
v8:
* Moved syscall_get_arch() specific patches to a separate patchset
which is now merged into audit/next tree.
* Rebased to linux-next.
* Moved ptrace_get_syscall_info code under #ifdef CONFIG_HAVE_ARCH_TRACEHOOK,
narrowing down the set of architectures supported by this implementation
back to those 19 that enable CONFIG_HAVE_ARCH_TRACEHOOK because
I failed to get all syscall_get_*(), instruction_pointer(),
and user_stack_pointer() functions implemented on some niche
architectures. This leaves the following architectures out:
alpha, h8300, m68k, microblaze, and unicore32.
v7:
* Rebased to v5.0-rc1.
* 5 arch-specific preparatory patches out of 25 have been merged
into v5.0-rc1 via arch trees.
v6:
* Added syscall_get_arguments and syscall_set_arguments wrappers
to asm-generic/syscall.h, requested by Geert.
* Changed PTRACE_GET_SYSCALL_INFO return code: do not take trailing paddings
into account, use the end of the last field of the structure being written.
* Changed struct ptrace_syscall_info:
* remove .frame_pointer field, is is not needed and not portable;
* make .arch field explicitly aligned, remove no longer needed
padding before .arch field;
* remove trailing pads, they are no longer needed.
v5:
* Merged separate series and patches into the single series.
* Changed PTRACE_EVENTMSG_SYSCALL_{ENTRY,EXIT} values as requested by Oleg.
* Changed struct ptrace_syscall_info: generalized instruction_pointer,
stack_pointer, and frame_pointer fields by moving them from
ptrace_syscall_info.{entry,seccomp} substructures to ptrace_syscall_info
and initializing them for all stops.
* Added PTRACE_SYSCALL_INFO_NONE, set it when not in a syscall stop,
so e.g. "strace -i" could use PTRACE_SYSCALL_INFO_SECCOMP to obtain
instruction_pointer when the tracee is in a signal stop.
* Patched all remaining architectures to provide all necessary
syscall_get_* functions.
* Made available for all architectures: do not conditionalize on
CONFIG_HAVE_ARCH_TRACEHOOK since all syscall_get_* functions
are implemented on all architectures.
* Added a test for PTRACE_GET_SYSCALL_INFO to selftests/ptrace.
v4:
* Revisited PTRACE_EVENT_SECCOMP support:
do not introduce task_struct.ptrace_event, use child->last_siginfo->si_code instead.
* Implemented PTRACE_SYSCALL_INFO_SECCOMP and ptrace_syscall_info.seccomp
support along with PTRACE_SYSCALL_INFO_{ENTRY,EXIT} and
ptrace_syscall_info.{entry,exit}.
v3:
* Changed struct ptrace_syscall_info.
* Added PTRACE_EVENT_SECCOMP support by adding ptrace_event to task_struct.
* Added proper defines for ptrace_syscall_info.op values.
* Renamed PT_SYSCALL_IS_ENTERING and PT_SYSCALL_IS_EXITING to
PTRACE_EVENTMSG_SYSCALL_ENTRY and PTRACE_EVENTMSG_SYSCALL_EXIT
and moved them to uapi.
v2:
* Stopped using task->ptrace.
* Replaced entry_info.is_compat with entry_info.arch, used syscall_get_arch().
* Used addr argument of sys_ptrace to get expected size of the struct;
return full size of the struct.
Dmitry V. Levin (6):
nds32: fix asm/syscall.h # acked
hexagon: define syscall_get_error() and syscall_get_return_value() # waiting for ack since November
mips: define syscall_get_error() # acked
parisc: define syscall_get_error() # acked
powerpc: define syscall_get_error() # acked
selftests/ptrace: add a test case for PTRACE_GET_SYSCALL_INFO # acked
Elvira Khabirova (1):
ptrace: add PTRACE_GET_SYSCALL_INFO request # reviewed
arch/hexagon/include/asm/syscall.h | 14 +
arch/mips/include/asm/syscall.h | 6 +
arch/nds32/include/asm/syscall.h | 27 +-
arch/parisc/include/asm/syscall.h | 7 +
arch/powerpc/include/asm/syscall.h | 10 +
include/linux/tracehook.h | 9 +-
include/uapi/linux/ptrace.h | 35 +++
kernel/ptrace.c | 101 ++++++-
tools/testing/selftests/ptrace/.gitignore | 1 +
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/get_syscall_info.c | 271 ++++++++++++++++++
11 files changed, 468 insertions(+), 15 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/get_syscall_info.c
--
ldv
The kheaders archive consisting of the kernel headers used for compiling
bpf programs is in /proc. However there is concern that moving it here
will make it permanent. Let us move it to /sys/kernel as discussed [1].
[1] https://lore.kernel.org/patchwork/patch/1067310/#1265969
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
This patch applies on top of the previous patch that was applied to the
driver tree:
https://lore.kernel.org/patchwork/patch/1067310/
v2->v3: Fixed sysfs file mode nit (Greg).
v1->v2: Fixed some kconfig nits.
init/Kconfig | 16 ++++-----
kernel/Makefile | 4 +--
kernel/{gen_ikh_data.sh => gen_kheaders.sh} | 2 +-
kernel/kheaders.c | 40 +++++++++------------
4 files changed, 26 insertions(+), 36 deletions(-)
rename kernel/{gen_ikh_data.sh => gen_kheaders.sh} (98%)
diff --git a/init/Kconfig b/init/Kconfig
index 26a364a95b57..c3661991b089 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -579,15 +579,13 @@ config IKCONFIG_PROC
This option enables access to the kernel configuration file
through /proc/config.gz.
-config IKHEADERS_PROC
- tristate "Enable kernel header artifacts through /proc/kheaders.tar.xz"
- depends on PROC_FS
- help
- This option enables access to the kernel header and other artifacts that
- are generated during the build process. These can be used to build eBPF
- tracing programs, or similar programs. If you build the headers as a
- module, a module called kheaders.ko is built which can be loaded on-demand
- to get access to the headers.
+config IKHEADERS
+ tristate "Enable kernel headers through /sys/kernel/kheaders.tar.xz"
+ help
+ This option enables access to the in-kernel headers that are generated during
+ the build process. These can be used to build eBPF tracing programs,
+ or similar programs. If you build the headers as a module, a module called
+ kheaders.ko is built which can be loaded on-demand to get access to headers.
config LOG_BUF_SHIFT
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
diff --git a/kernel/Makefile b/kernel/Makefile
index 12399614c350..b32a558fae2f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -70,7 +70,7 @@ obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_USER_NS) += user_namespace.o
obj-$(CONFIG_PID_NS) += pid_namespace.o
obj-$(CONFIG_IKCONFIG) += configs.o
-obj-$(CONFIG_IKHEADERS_PROC) += kheaders.o
+obj-$(CONFIG_IKHEADERS) += kheaders.o
obj-$(CONFIG_SMP) += stop_machine.o
obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
@@ -126,7 +126,7 @@ $(obj)/config_data.gz: $(KCONFIG_CONFIG) FORCE
$(obj)/kheaders.o: $(obj)/kheaders_data.tar.xz
quiet_cmd_genikh = CHK $(obj)/kheaders_data.tar.xz
-cmd_genikh = $(srctree)/kernel/gen_ikh_data.sh $@
+cmd_genikh = $(srctree)/kernel/gen_kheaders.sh $@
$(obj)/kheaders_data.tar.xz: FORCE
$(call cmd,genikh)
diff --git a/kernel/gen_ikh_data.sh b/kernel/gen_kheaders.sh
similarity index 98%
rename from kernel/gen_ikh_data.sh
rename to kernel/gen_kheaders.sh
index 591a94f7b387..581b83534587 100755
--- a/kernel/gen_ikh_data.sh
+++ b/kernel/gen_kheaders.sh
@@ -2,7 +2,7 @@
# SPDX-License-Identifier: GPL-2.0
# This script generates an archive consisting of kernel headers
-# for CONFIG_IKHEADERS_PROC.
+# for CONFIG_IKHEADERS.
set -e
spath="$(dirname "$(readlink -f "$0")")"
kroot="$spath/.."
diff --git a/kernel/kheaders.c b/kernel/kheaders.c
index 70ae6052920d..8f69772af77b 100644
--- a/kernel/kheaders.c
+++ b/kernel/kheaders.c
@@ -8,9 +8,8 @@
#include <linux/kernel.h>
#include <linux/module.h>
-#include <linux/proc_fs.h>
+#include <linux/kobject.h>
#include <linux/init.h>
-#include <linux/uaccess.h>
/*
* Define kernel_headers_data and kernel_headers_data_end, within which the
@@ -31,39 +30,32 @@ extern char kernel_headers_data;
extern char kernel_headers_data_end;
static ssize_t
-ikheaders_read_current(struct file *file, char __user *buf,
- size_t len, loff_t *offset)
+ikheaders_read(struct file *file, struct kobject *kobj,
+ struct bin_attribute *bin_attr,
+ char *buf, loff_t off, size_t len)
{
- return simple_read_from_buffer(buf, len, offset,
- &kernel_headers_data,
- &kernel_headers_data_end -
- &kernel_headers_data);
+ memcpy(buf, &kernel_headers_data + off, len);
+ return len;
}
-static const struct file_operations ikheaders_file_ops = {
- .read = ikheaders_read_current,
- .llseek = default_llseek,
+static struct bin_attribute kheaders_attr __ro_after_init = {
+ .attr = {
+ .name = "kheaders.tar.xz",
+ .mode = 0444,
+ },
+ .read = &ikheaders_read,
};
static int __init ikheaders_init(void)
{
- struct proc_dir_entry *entry;
-
- /* create the current headers file */
- entry = proc_create("kheaders.tar.xz", S_IRUGO, NULL,
- &ikheaders_file_ops);
- if (!entry)
- return -ENOMEM;
-
- proc_set_size(entry,
- &kernel_headers_data_end -
- &kernel_headers_data);
- return 0;
+ kheaders_attr.size = (&kernel_headers_data_end -
+ &kernel_headers_data);
+ return sysfs_create_bin_file(kernel_kobj, &kheaders_attr);
}
static void __exit ikheaders_cleanup(void)
{
- remove_proc_entry("kheaders.tar.xz", NULL);
+ sysfs_remove_bin_file(kernel_kobj, &kheaders_attr);
}
module_init(ikheaders_init);
--
2.21.0.1020.gf2820cf01a-goog
This adds a basic framework for running all the "safe" lkdtm tests.
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
MAINTAINERS | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/lkdtm/Makefile | 11 ++++
tools/testing/selftests/lkdtm/config | 1 +
tools/testing/selftests/lkdtm/run.sh | 86 +++++++++++++++++++++++++
tools/testing/selftests/lkdtm/tests.txt | 64 ++++++++++++++++++
6 files changed, 164 insertions(+)
create mode 100644 tools/testing/selftests/lkdtm/Makefile
create mode 100644 tools/testing/selftests/lkdtm/config
create mode 100755 tools/testing/selftests/lkdtm/run.sh
create mode 100644 tools/testing/selftests/lkdtm/tests.txt
diff --git a/MAINTAINERS b/MAINTAINERS
index 2c2fce72e694..055f1a09b1da 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8977,6 +8977,7 @@ LINUX KERNEL DUMP TEST MODULE (LKDTM)
M: Kees Cook <keescook(a)chromium.org>
S: Maintained
F: drivers/misc/lkdtm/*
+F: tools/testing/selftests/lkdtm/*
LINUX KERNEL MEMORY CONSISTENCY MODEL (LKMM)
M: Alan Stern <stern(a)rowland.harvard.edu>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 971fc8428117..c2397baeb777 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -23,6 +23,7 @@ TARGETS += kcmp
TARGETS += kvm
TARGETS += lib
TARGETS += livepatch
+TARGETS += lkdtm
TARGETS += membarrier
TARGETS += memfd
TARGETS += memory-hotplug
diff --git a/tools/testing/selftests/lkdtm/Makefile b/tools/testing/selftests/lkdtm/Makefile
new file mode 100644
index 000000000000..f322d281fd8e
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+# Makefile for LKDTM regression tests
+
+include ../lib.mk
+
+# NOTE: $(OUTPUT) won't get default value if used before lib.mk
+TEST_GEN_PROGS = $(patsubst %,$(OUTPUT)/%.sh,$(shell awk '{print $$1}' tests.txt | sed -e 's/\#//'))
+all: $(TEST_GEN_PROGS)
+
+$(OUTPUT)/%: run.sh tests.txt
+ install -m 0744 run.sh $@
diff --git a/tools/testing/selftests/lkdtm/config b/tools/testing/selftests/lkdtm/config
new file mode 100644
index 000000000000..d874990e442b
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/config
@@ -0,0 +1 @@
+CONFIG_LKDTM=y
diff --git a/tools/testing/selftests/lkdtm/run.sh b/tools/testing/selftests/lkdtm/run.sh
new file mode 100755
index 000000000000..117349bf78ca
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/run.sh
@@ -0,0 +1,86 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# This reads tests.txt for the list of LKDTM tests to invoke. Any marked
+# with a leading "#" are skipped. The line after the test name is either
+# the text to look for in dmesg for a "success", or the rationale for why
+# a test is marked to be skipped.
+#
+set -e
+TRIGGER=/sys/kernel/debug/provoke-crash/DIRECT
+
+# Verify we have LKDTM available in the kernel.
+if [ ! -r $TRIGGER ] ; then
+ /sbin/modprobe -q lkdtm || true
+ if [ ! -r $TRIGGER ] ; then
+ echo "Cannot find $TRIGGER (missing CONFIG_LKDTM?)"
+ else
+ echo "Cannot write $TRIGGER (need to run as root?)"
+ fi
+ # Skip this test
+ exit 4
+fi
+
+# Figure out which test to run from our script name.
+test=$(basename $0 .sh)
+# Look up details about the test from master list of LKDTM tests.
+line=$(egrep '^#?'"$test"'\b' tests.txt)
+if [ -z "$line" ]; then
+ echo "Skipped: missing test '$test' in tests.txt"
+ exit 4
+fi
+# Check that the test is known to LKDTM.
+if ! egrep -q '^'"$test"'$' "$TRIGGER" ; then
+ echo "Skipped: test '$test' missing in $TRIGGER!"
+ exit 4
+fi
+
+# Extract notes/expected output from test list.
+test=$(echo "$line" | cut -d" " -f1)
+if echo "$line" | grep -q ' ' ; then
+ expect=$(echo "$line" | cut -d" " -f2-)
+else
+ expect=""
+fi
+
+# If the test is commented out, report a skip
+if echo "$test" | grep -q '^#' ; then
+ test=$(echo "$test" | cut -c2-)
+ if [ -z "$expect" ]; then
+ expect="crashes entire system"
+ fi
+ echo "Skipping $test: $expect"
+ exit 4
+fi
+
+# If no expected output given, assume an Oops with back trace is success.
+if [ -z "$expect" ]; then
+ expect="call trace:"
+fi
+
+# Clear out dmesg for output reporting
+dmesg -c >/dev/null
+
+# Prepare log for report checking
+LOG=$(mktemp --tmpdir -t lkdtm-XXXXXX)
+cleanup() {
+ rm -f "$LOG"
+}
+trap cleanup EXIT
+
+# Most shells yell about signals and we're expecting the "cat" process
+# to usually be killed by the kernel. So we have to run it in a sub-shell
+# and silence errors.
+($SHELL -c 'cat <(echo '"$test"') >'"$TRIGGER" 2>/dev/null) || true
+
+# Record and dump the results
+dmesg -c >"$LOG"
+cat "$LOG"
+# Check for expected output
+if grep -qi "$expect" "$LOG" ; then
+ echo "$test: saw '$expect': ok"
+ exit 0
+else
+ echo "$test: missing '$expect': [FAIL]"
+ exit 1
+fi
diff --git a/tools/testing/selftests/lkdtm/tests.txt b/tools/testing/selftests/lkdtm/tests.txt
new file mode 100644
index 000000000000..59df8a23db45
--- /dev/null
+++ b/tools/testing/selftests/lkdtm/tests.txt
@@ -0,0 +1,64 @@
+#PANIC
+BUG kernel BUG at
+WARNING WARNING:
+EXCEPTION BUG: unable to handle kernel
+#LOOP Hangs the system
+#EXHAUST_STACK Corrupts memory on failure
+#CORRUPT_STACK Crashes entire system on success
+#CORRUPT_STACK_STRONG Crashes entire system on success
+CORRUPT_LIST_ADD list_add corruption
+CORRUPT_LIST_DEL list_del corruption
+CORRUPT_USER_DS Invalid address limit on user-mode return
+STACK_GUARD_PAGE_LEADING
+STACK_GUARD_PAGE_TRAILING
+UNALIGNED_LOAD_STORE_WRITE
+#OVERWRITE_ALLOCATION Corrupts memory on failure
+#WRITE_AFTER_FREE Corrupts memory on failure
+READ_AFTER_FREE
+#WRITE_BUDDY_AFTER_FREE Corrupts memory on failure
+READ_BUDDY_AFTER_FREE
+#SOFTLOCKUP Hangs the system
+#HARDLOCKUP Hangs the system
+#SPINLOCKUP Hangs the system
+#HUNG_TASK Hangs the system
+EXEC_DATA
+EXEC_STACK
+EXEC_KMALLOC
+EXEC_VMALLOC
+EXEC_RODATA
+EXEC_USERSPACE
+EXEC_NULL
+ACCESS_USERSPACE
+ACCESS_NULL
+WRITE_RO
+WRITE_RO_AFTER_INIT
+WRITE_KERN
+REFCOUNT_INC_OVERFLOW
+REFCOUNT_ADD_OVERFLOW
+REFCOUNT_INC_NOT_ZERO_OVERFLOW
+REFCOUNT_ADD_NOT_ZERO_OVERFLOW
+REFCOUNT_DEC_ZERO
+REFCOUNT_DEC_NEGATIVE Negative detected: saturated
+REFCOUNT_DEC_AND_TEST_NEGATIVE Negative detected: saturated
+REFCOUNT_SUB_AND_TEST_NEGATIVE Negative detected: saturated
+REFCOUNT_INC_ZERO
+REFCOUNT_ADD_ZERO
+REFCOUNT_INC_SATURATED Saturation detected: still saturated
+REFCOUNT_DEC_SATURATED Saturation detected: still saturated
+REFCOUNT_ADD_SATURATED Saturation detected: still saturated
+REFCOUNT_INC_NOT_ZERO_SATURATED
+REFCOUNT_ADD_NOT_ZERO_SATURATED
+REFCOUNT_DEC_AND_TEST_SATURATED Saturation detected: still saturated
+REFCOUNT_SUB_AND_TEST_SATURATED Saturation detected: still saturated
+#REFCOUNT_TIMING timing only
+#ATOMIC_TIMING timing only
+USERCOPY_HEAP_SIZE_TO
+USERCOPY_HEAP_SIZE_FROM
+USERCOPY_HEAP_WHITELIST_TO
+USERCOPY_HEAP_WHITELIST_FROM
+USERCOPY_STACK_FRAME_TO
+USERCOPY_STACK_FRAME_FROM
+USERCOPY_STACK_BEYOND
+USERCOPY_KERNEL
+USERCOPY_KERNEL_DS
+STACKLEAK_ERASING OK: the rest of the thread stack is properly erased
--
2.17.1
--
Kees Cook
[Andrew, could you take this patchset into your tree, please?]
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its tracer
has no reliable means to find out that the syscall was, in fact,
a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the tracer.
Common practice is to keep track of the sequence of ptrace-stops in order
not to mix the two syscall-stops up. But it is not as simple as it looks;
for example, strace had a (just recently fixed) long-standing bug where
attaching strace to a tracee that is performing the execve system call
led to the tracer identifying the following syscall-exit-stop as
syscall-enter-stop, which messed up all the state tracking.
* Since the introduction of commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3
("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA
and process_vm_readv become unavailable when the process dumpable flag
is cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
PTRACE_GET_SYSCALL_INFO returns the following structure:
struct ptrace_syscall_info {
__u8 op; /* PTRACE_SYSCALL_INFO_* */
__u32 arch __attribute__((__aligned__(sizeof(__u32))));
__u64 instruction_pointer;
__u64 stack_pointer;
union {
struct {
__u64 nr;
__u64 args[6];
} entry;
struct {
__s64 rval;
__u8 is_error;
} exit;
struct {
__u64 nr;
__u64 args[6];
__u32 ret_data;
} seccomp;
};
};
The structure was chosen according to [2], except for the following
changes:
* seccomp substructure was added as a superset of entry substructure;
* the type of nr field was changed from int to __u64 because syscall
numbers are, as a practical matter, 64 bits;
* stack_pointer field was added along with instruction_pointer field
since it is readily available and can save the tracer from extra
PTRACE_GETREGS/PTRACE_GETREGSET calls;
* arch is always initialized to aid with tracing system calls
* such as execve();
* instruction_pointer and stack_pointer are always initialized
so they could be easily obtained for non-syscall stops;
* a boolean is_error field was added along with rval field, this way
the tracer can more reliably distinguish a return value
from an error value.
strace has been ported to PTRACE_GET_SYSCALL_INFO.
Starting with release 4.26, strace uses PTRACE_GET_SYSCALL_INFO API
as the preferred mechanism of obtaining syscall information.
[1] https://lore.kernel.org/lkml/CA+55aFzcSVmdDj9Lh_gdbz1OzHyEm6ZrGPBDAJnywm2LF…
[2] https://lore.kernel.org/lkml/CAObL_7GM0n80N7J_DFw_eQyfLyzq+sf4y2AvsCCV88Tb3…
---
Notes:
v10: added more Acked-by.
v9:
* Rebased to linux-next again due to syscall_get_arguments() signature change.
v8:
* Moved syscall_get_arch() specific patches to a separate patchset
which is now merged into audit/next tree.
* Rebased to linux-next.
* Moved ptrace_get_syscall_info code under #ifdef CONFIG_HAVE_ARCH_TRACEHOOK,
narrowing down the set of architectures supported by this implementation
back to those 19 that enable CONFIG_HAVE_ARCH_TRACEHOOK because
I failed to get all syscall_get_*(), instruction_pointer(),
and user_stack_pointer() functions implemented on some niche
architectures. This leaves the following architectures out:
alpha, h8300, m68k, microblaze, and unicore32.
v7:
* Rebased to v5.0-rc1.
* 5 arch-specific preparatory patches out of 25 have been merged
into v5.0-rc1 via arch trees.
v6:
* Add syscall_get_arguments and syscall_set_arguments wrappers
to asm-generic/syscall.h, requested by Geert.
* Change PTRACE_GET_SYSCALL_INFO return code: do not take trailing paddings
into account, use the end of the last field of the structure being written.
* Change struct ptrace_syscall_info:
* remove .frame_pointer field, is is not needed and not portable;
* make .arch field explicitly aligned, remove no longer needed
padding before .arch field;
* remove trailing pads, they are no longer needed.
v5:
* Merge separate series and patches into the single series.
* Change PTRACE_EVENTMSG_SYSCALL_{ENTRY,EXIT} values as requested by Oleg.
* Change struct ptrace_syscall_info: generalize instruction_pointer,
stack_pointer, and frame_pointer fields by moving them from
ptrace_syscall_info.{entry,seccomp} substructures to ptrace_syscall_info
and initializing them for all stops.
* Add PTRACE_SYSCALL_INFO_NONE, set it when not in a syscall stop,
so e.g. "strace -i" could use PTRACE_SYSCALL_INFO_SECCOMP to obtain
instruction_pointer when the tracee is in a signal stop.
* Patch all remaining architectures to provide all necessary
syscall_get_* functions.
* Make available for all architectures: do not conditionalize on
CONFIG_HAVE_ARCH_TRACEHOOK since all syscall_get_* functions
are implemented on all architectures.
* Add a test for PTRACE_GET_SYSCALL_INFO to selftests/ptrace.
v4:
* Do not introduce task_struct.ptrace_event,
use child->last_siginfo->si_code instead.
* Implement PTRACE_SYSCALL_INFO_SECCOMP and ptrace_syscall_info.seccomp
support along with PTRACE_SYSCALL_INFO_{ENTRY,EXIT} and
ptrace_syscall_info.{entry,exit}.
v3:
* Change struct ptrace_syscall_info.
* Support PTRACE_EVENT_SECCOMP by adding ptrace_event to task_struct.
* Add proper defines for ptrace_syscall_info.op values.
* Rename PT_SYSCALL_IS_ENTERING and PT_SYSCALL_IS_EXITING to
PTRACE_EVENTMSG_SYSCALL_ENTRY and PTRACE_EVENTMSG_SYSCALL_EXIT
* and move them to uapi.
v2:
* Do not use task->ptrace.
* Replace entry_info.is_compat with entry_info.arch, use syscall_get_arch().
* Use addr argument of sys_ptrace to get expected size of the struct;
return full size of the struct.
Dmitry V. Levin (6):
nds32: fix asm/syscall.h # acked
hexagon: define syscall_get_error() and syscall_get_return_value() # waiting for ack since November
mips: define syscall_get_error() # acked
parisc: define syscall_get_error() # acked
powerpc: define syscall_get_error() # waiting for ack since early December
selftests/ptrace: add a test case for PTRACE_GET_SYSCALL_INFO # acked
Elvira Khabirova (1):
ptrace: add PTRACE_GET_SYSCALL_INFO request # reviewed
arch/hexagon/include/asm/syscall.h | 14 +
arch/mips/include/asm/syscall.h | 6 +
arch/nds32/include/asm/syscall.h | 27 +-
arch/parisc/include/asm/syscall.h | 7 +
arch/powerpc/include/asm/syscall.h | 10 +
include/linux/tracehook.h | 9 +-
include/uapi/linux/ptrace.h | 35 +++
kernel/ptrace.c | 103 ++++++-
tools/testing/selftests/ptrace/.gitignore | 1 +
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/get_syscall_info.c | 271 ++++++++++++++++++
11 files changed, 470 insertions(+), 15 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/get_syscall_info.c
--
ldv
Summary
------------------------------------------------------------------------
kernel: 5.1.0
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
git branch: master
git commit: unknown
git describe: next-20190509
Test details: https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190509
Regressions (compared to build next-20190507)
------------------------------------------------------------------------
No regressions
Fixes (compared to build next-20190507)
------------------------------------------------------------------------
No fixes
In total:
------------------------------------------------------------------------
Ran 0 total tests in the following environments and test suites.
pass 0
fail 0
xfail 0
skip 0
Environments
--------------
- x15 - arm
Test Suites
-----------
Failures
------------------------------------------------------------------------
x15:
Skips
------------------------------------------------------------------------
No skips
--
Linaro LKFT
https://lkft.linaro.org
So the first 3 patches are cleanups and fixes and should probably just go in.
The last patch however is the one Linus hates, it converts i386 to always have
a complete pt_regs. Both Josh and me did a bunch of cleanups and the patch is
now a net reduction in lines.
I still think it is the right thing to do; it removes a whole lot of magic from
the C code (and avoids future surprises), but if this cannot convince Linus,
I'll not pursue this further.
From: Peter Zijlstra <peterz(a)infradead.org>
Nicolai Stange discovered[1] that if live kernel patching is enabled, and the
function tracer started tracing the same function that was patched, the
conversion of the fentry call site during the translation of going from
calling the live kernel patch trampoline to the iterator trampoline, would
have as slight window where it didn't call anything.
As live kernel patching depends on ftrace to always call its code (to
prevent the function being traced from being called, as it will redirect
it). This small window would allow the old buggy function to be called, and
this can cause undesirable results.
Nicolai submitted new patches[2] but these were controversial. As this is
similar to the static call emulation issues that came up a while ago[3].
But after some debate[4][5] adding a gap in the stack when entering the
breakpoint handler allows for pushing the return address onto the stack to
easily emulate a call.
[1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de
[2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de
[3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.154171145…
[4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=…
[5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_…
[
Live kernel patching is not implemented on x86_32, thus the emulate
calls are only for x86_64.
]
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Nicolai Stange <nstange(a)suse.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: the arch/x86 maintainers <x86(a)kernel.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Miroslav Benes <mbenes(a)suse.cz>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Joe Lawrence <joe.lawrence(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Nayna Jain <nayna(a)linux.ibm.com>
Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching")
Tested-by: Nicolai Stange <nstange(a)suse.de>
Reviewed-by: Nicolai Stange <nstange(a)suse.de>
Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
[ Changed to only implement emulated calls for x86_64 ]
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
arch/x86/kernel/ftrace.c | 32 +++++++++++++++++++++++++++-----
1 file changed, 27 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index ef49517f6bb2..bd553b3af22e 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,6 +29,7 @@
#include <asm/kprobes.h>
#include <asm/ftrace.h>
#include <asm/nops.h>
+#include <asm/text-patching.h>
#ifdef CONFIG_DYNAMIC_FTRACE
@@ -231,6 +232,7 @@ int ftrace_modify_call(struct dyn_ftrace *rec, unsigned long old_addr,
}
static unsigned long ftrace_update_func;
+static unsigned long ftrace_update_func_call;
static int update_ftrace_func(unsigned long ip, void *new)
{
@@ -259,6 +261,8 @@ int ftrace_update_ftrace_func(ftrace_func_t func)
unsigned char *new;
int ret;
+ ftrace_update_func_call = (unsigned long)func;
+
new = ftrace_call_replace(ip, (unsigned long)func);
ret = update_ftrace_func(ip, new);
@@ -294,13 +298,28 @@ int ftrace_int3_handler(struct pt_regs *regs)
if (WARN_ON_ONCE(!regs))
return 0;
- ip = regs->ip - 1;
- if (!ftrace_location(ip) && !is_ftrace_caller(ip))
- return 0;
+ ip = regs->ip - INT3_INSN_SIZE;
- regs->ip += MCOUNT_INSN_SIZE - 1;
+#ifdef CONFIG_X86_64
+ if (ftrace_location(ip)) {
+ int3_emulate_call(regs, (unsigned long)ftrace_regs_caller);
+ return 1;
+ } else if (is_ftrace_caller(ip)) {
+ if (!ftrace_update_func_call) {
+ int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);
+ return 1;
+ }
+ int3_emulate_call(regs, ftrace_update_func_call);
+ return 1;
+ }
+#else
+ if (ftrace_location(ip) || is_ftrace_caller(ip)) {
+ int3_emulate_jmp(regs, ip + CALL_INSN_SIZE);
+ return 1;
+ }
+#endif
- return 1;
+ return 0;
}
NOKPROBE_SYMBOL(ftrace_int3_handler);
@@ -859,6 +878,8 @@ void arch_ftrace_update_trampoline(struct ftrace_ops *ops)
func = ftrace_ops_get_func(ops);
+ ftrace_update_func_call = (unsigned long)func;
+
/* Do a safe modify in case the trampoline is executing */
new = ftrace_call_replace(ip, (unsigned long)func);
ret = update_ftrace_func(ip, new);
@@ -960,6 +981,7 @@ static int ftrace_mod_jmp(unsigned long ip, void *func)
{
unsigned char *new;
+ ftrace_update_func_call = 0UL;
new = ftrace_jmp_replace(ip, (unsigned long)func);
return update_ftrace_func(ip, new);
--
2.20.1
From: Peter Zijlstra <peterz(a)infradead.org>
In order to allow breakpoints to emulate call instructions, they need to push
the return address onto the stack. The x86_64 int3 handler adds a small gap
to allow the stack to grow some. Use this gap to add the return address to
be able to emulate a call instruction at the breakpoint location.
These helper functions are added:
int3_emulate_jmp(): changes the location of the regs->ip to return there.
(The next two are only for x86_64)
int3_emulate_push(): to push the address onto the gap in the stack
int3_emulate_call(): push the return address and change regs->ip
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Nicolai Stange <nstange(a)suse.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: the arch/x86 maintainers <x86(a)kernel.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Miroslav Benes <mbenes(a)suse.cz>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Joe Lawrence <joe.lawrence(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Nayna Jain <nayna(a)linux.ibm.com>
Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching")
Tested-by: Nicolai Stange <nstange(a)suse.de>
Reviewed-by: Nicolai Stange <nstange(a)suse.de>
Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
[ Modified to only work for x86_64 and added comment to int3_emulate_push() ]
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
arch/x86/include/asm/text-patching.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index e85ff65c43c3..05861cc08787 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -39,4 +39,32 @@ extern int poke_int3_handler(struct pt_regs *regs);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern int after_bootmem;
+static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
+{
+ regs->ip = ip;
+}
+
+#define INT3_INSN_SIZE 1
+#define CALL_INSN_SIZE 5
+
+#ifdef CONFIG_X86_64
+static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
+{
+ /*
+ * The int3 handler in entry_64.S adds a gap between the
+ * stack where the break point happened, and the saving of
+ * pt_regs. We can extend the original stack because of
+ * this gap. See the idtentry macro's create_gap option.
+ */
+ regs->sp -= sizeof(unsigned long);
+ *(unsigned long *)regs->sp = val;
+}
+
+static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
+{
+ int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
+ int3_emulate_jmp(regs, func);
+}
+#endif
+
#endif /* _ASM_X86_TEXT_PATCHING_H */
--
2.20.1
From: Peter Zijlstra <peterz(a)infradead.org>
In order to allow breakpoints to emulate call functions, they need to push
the return address onto the stack. But because the breakpoint exception
frame is added to the stack when the breakpoint is hit, there's no room to
add the address onto the stack and return to the address of the emulated
called funtion.
To handle this, copy the exception frame on entry of the breakpoint handler
and have leave a gap that can be used to add a return address to the stack
frame and return from the breakpoint to the emulated called function,
allowing for that called function to return back to the location after the
breakpoint was placed.
The helper functions were also added:
int3_emulate_push(): to push the address onto the gap in the stack
int3_emulate_jmp(): changes the location of the regs->ip to return there.
int3_emulate_call(): push the return address and change regs->ip
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Nicolai Stange <nstange(a)suse.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: the arch/x86 maintainers <x86(a)kernel.org>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Jiri Kosina <jikos(a)kernel.org>
Cc: Miroslav Benes <mbenes(a)suse.cz>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Joe Lawrence <joe.lawrence(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: Mimi Zohar <zohar(a)linux.ibm.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Nayna Jain <nayna(a)linux.ibm.com>
Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest(a)vger.kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: b700e7f03df5 ("livepatch: kernel: add support for live patching")
Signed-off-by: *** Need Peter Zijlstra's SoB here! ***
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
arch/x86/entry/entry_32.S | 11 +++++++++++
arch/x86/entry/entry_64.S | 14 ++++++++++++--
arch/x86/include/asm/text-patching.h | 20 ++++++++++++++++++++
3 files changed, 43 insertions(+), 2 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d309f30cf7af..50bbf4035baf 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -1478,6 +1478,17 @@ ENTRY(int3)
ASM_CLAC
pushl $-1 # mark this as an int
+#ifdef CONFIG_VM86
+ testl $X86_EFLAGS_VM, PT_EFLAGS(%esp)
+ jnz .Lfrom_usermode_no_gap
+#endif
+ testl $SEGMENT_RPL_MASK, PT_CS(%esp)
+ jnz .Lfrom_usermode_no_gap
+ .rept 6
+ pushl 5*4(%esp)
+ .endr
+.Lfrom_usermode_no_gap:
+
SAVE_ALL switch_stacks=1
ENCODE_FRAME_POINTER
TRACE_IRQS_OFF
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 1f0efdb7b629..834ec1397dab 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -879,7 +879,7 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt
* @paranoid == 2 is special: the stub will never switch stacks. This is for
* #DF: if the thread stack is somehow unusable, we'll still get a useful OOPS.
*/
-.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
+.macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 create_gap=0
ENTRY(\sym)
UNWIND_HINT_IRET_REGS offset=\has_error_code*8
@@ -899,6 +899,16 @@ ENTRY(\sym)
jnz .Lfrom_usermode_switch_stack_\@
.endif
+ .if \create_gap == 1
+ testb $3, CS-ORIG_RAX(%rsp)
+ jnz .Lfrom_usermode_no_gap_\@
+ .rept 6
+ pushq 5*8(%rsp)
+ .endr
+ UNWIND_HINT_IRET_REGS offset=8
+.Lfrom_usermode_no_gap_\@:
+ .endif
+
.if \paranoid
call paranoid_entry
.else
@@ -1130,7 +1140,7 @@ apicinterrupt3 HYPERV_STIMER0_VECTOR \
#endif /* CONFIG_HYPERV */
idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
-idtentry int3 do_int3 has_error_code=0
+idtentry int3 do_int3 has_error_code=0 create_gap=1
idtentry stack_segment do_stack_segment has_error_code=1
#ifdef CONFIG_XEN_PV
diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index e85ff65c43c3..ba275b6292db 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -39,4 +39,24 @@ extern int poke_int3_handler(struct pt_regs *regs);
extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
extern int after_bootmem;
+static inline void int3_emulate_push(struct pt_regs *regs, unsigned long val)
+{
+ regs->sp -= sizeof(unsigned long);
+ *(unsigned long *)regs->sp = val;
+}
+
+static inline void int3_emulate_jmp(struct pt_regs *regs, unsigned long ip)
+{
+ regs->ip = ip;
+}
+
+#define INT3_INSN_SIZE 1
+#define CALL_INSN_SIZE 5
+
+static inline void int3_emulate_call(struct pt_regs *regs, unsigned long func)
+{
+ int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
+ int3_emulate_jmp(regs, func);
+}
+
#endif /* _ASM_X86_TEXT_PATCHING_H */
--
2.20.1
Install target fails when INSTALL_PATH is undefined. Fix install target
to use "output_dir/install as the default install location. "output_dir"
is either the root of selftests directory under kernel source tree or
output directory specified by O= or KBUILD_OUTPUT.
e.g:
make -C tools/testing/selftests install
<installs under tools/testing/selftests/install>
make O=/tmp/kselftest -C tools/testing/selftests install
<installs under /tmp/kselftest/install>
export KBUILD_OUTPUT=/tmp/kselftest
make -C tools/testing/selftests install
<installs under /tmp/kselftest/install>
In addition, add "all" target as dependency to "install" to build and
install using a single command.
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/Makefile | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 9f05448e5e4b..c71a63b923d4 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -163,11 +163,17 @@ clean_hotplug:
run_pstore_crash:
make -C pstore run_crash
-INSTALL_PATH ?= install
+# Use $BUILD as the default install root. $BUILD points to the
+# right output location for the following cases:
+# 1. output_dir=kernel_src
+# 2. a separate output directory is specified using O= KBUILD_OUTPUT
+# 3. a separate output directory is specified using KBUILD_OUTPUT
+#
+INSTALL_PATH ?= $(BUILD)/install
INSTALL_PATH := $(abspath $(INSTALL_PATH))
ALL_SCRIPT := $(INSTALL_PATH)/run_kselftest.sh
-install:
+install: all
ifdef INSTALL_PATH
@# Ask all targets to install their files
mkdir -p $(INSTALL_PATH)/kselftest
--
2.17.1
[
This is the non-RFC version.
It went through and passed all my tests. If there's no objections
I'm going to include this in my pull request. I still have patches
in my INBOX that may still be included, so I need to run those through
my tests as well, so a pull request wont be immediate.
]
Nicolai Stange discovered that Live Kernel Patching can have unforseen
consequences if tracing is enabled when there are functions that are
patched. The reason being, is that Live Kernel patching is built on top
of ftrace, which will have the patched functions call the live kernel
trampoline directly, and that trampoline will modify the regs->ip address
to return to the patched function.
But in the transition between changing the call to the customized
trampoline, the tracing code is needed to have its handler called
an well, so the function fentry location must be changed from calling
the live kernel patching trampoline, to the ftrace_reg_caller trampoline
which will iterate through all the registered ftrace handlers for
that function.
During this transition, a break point is added to do the live code
modifications. But if that break point is hit, it just skips calling
any handler, and makes the call site act as a nop. For tracing, the
worse that can happen is that you miss a function being traced, but
for live kernel patching the affects are more severe, as the old buggy
function is now called.
To solve this, an int3_emulate_call() is created for x86_64 to allow
ftrace on x86_64 to emulate the call to ftrace_regs_caller() which will
make sure all the registered handlers to that function are still called.
And this keeps live kernel patching happy!
To mimimize the changes, and to avoid controversial patches, this
only changes x86_64. Due to the way x86_32 implements the regs->sp
the complexity of emulating calls on that platform is too much for
stable patches, and live kernel patching does not support x86_32 anyway.
Josh Poimboeuf (1):
x86_64: Add gap to int3 to allow for call emulation
Peter Zijlstra (2):
x86_64: Allow breakpoints to emulate call instructions
ftrace/x86_64: Emulate call function while updating in breakpoint handler
----
arch/x86/entry/entry_64.S | 18 ++++++++++++++++--
arch/x86/include/asm/text-patching.h | 28 ++++++++++++++++++++++++++++
arch/x86/kernel/ftrace.c | 32 +++++++++++++++++++++++++++-----
3 files changed, 71 insertions(+), 7 deletions(-)
Nicolai Stange discovered that Live Kernel Patching can have unforseen
consequences if tracing is enabled when there are functions that are
patched. The reason being, is that Live Kernel patching is built on top
of ftrace, which will have the patched functions call the live kernel
trampoline directly, and that trampoline will modify the regs->ip address
to return to the patched function.
But in the transition between changing the call to the customized
trampoline, the tracing code is needed to have its handler called
an well, so the function fentry location must be changed from calling
the live kernel patching trampoline, to the ftrace_reg_caller trampoline
which will iterate through all the registered ftrace handlers for
that function.
During this transition, a break point is added to do the live code
modifications. But if that break point is hit, it just skips calling
any handler, and makes the call site act as a nop. For tracing, the
worse that can happen is that you miss a function being traced, but
for live kernel patching the affects are more severe, as the old buggy
function is now called.
To solve this, an int3_emulate_call() is created for x86_64 to allow
ftrace on x86_64 to emulate the call to ftrace_regs_caller() which will
make sure all the registered handlers to that function are still called.
And this keeps live kernel patching happy!
To mimimize the changes, and to avoid controversial patches, this
only changes x86_64. Due to the way x86_32 implements the regs->sp
the complexity of emulating calls on that platform is too much for
stable patches, and live kernel patching does not support x86_32 anyway.
Josh Poimboeuf (1):
x86_64: Add gap to int3 to allow for call emulation
Peter Zijlstra (2):
x86_64: Allow breakpoints to emulate call functions
ftrace/x86_64: Emulate call function while updating in breakpoint handler
----
arch/x86/entry/entry_64.S | 18 ++++++++++++++++--
arch/x86/include/asm/text-patching.h | 22 ++++++++++++++++++++++
arch/x86/kernel/ftrace.c | 32 +++++++++++++++++++++++++++-----
3 files changed, 65 insertions(+), 7 deletions(-)
The kheaders archive consisting of the kernel headers used for compiling
bpf programs is in /proc. However there is concern that moving it here
will make it permanent. Let us move it to /sys/kernel as discussed [1].
[1] https://lore.kernel.org/patchwork/patch/1067310/#1265969
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
This patch applies on top of the previous patch that was applied to the
driver tree:
https://lore.kernel.org/patchwork/patch/1067310/
v1->v2: Fixed some kconfig nits (Masami).
init/Kconfig | 16 ++++-----
kernel/Makefile | 4 +--
kernel/{gen_ikh_data.sh => gen_kheaders.sh} | 2 +-
kernel/kheaders.c | 40 +++++++++------------
4 files changed, 26 insertions(+), 36 deletions(-)
rename kernel/{gen_ikh_data.sh => gen_kheaders.sh} (98%)
diff --git a/init/Kconfig b/init/Kconfig
index 26a364a95b57..c3661991b089 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -579,15 +579,13 @@ config IKCONFIG_PROC
This option enables access to the kernel configuration file
through /proc/config.gz.
-config IKHEADERS_PROC
- tristate "Enable kernel header artifacts through /proc/kheaders.tar.xz"
- depends on PROC_FS
- help
- This option enables access to the kernel header and other artifacts that
- are generated during the build process. These can be used to build eBPF
- tracing programs, or similar programs. If you build the headers as a
- module, a module called kheaders.ko is built which can be loaded on-demand
- to get access to the headers.
+config IKHEADERS
+ tristate "Enable kernel headers through /sys/kernel/kheaders.tar.xz"
+ help
+ This option enables access to the in-kernel headers that are generated during
+ the build process. These can be used to build eBPF tracing programs,
+ or similar programs. If you build the headers as a module, a module called
+ kheaders.ko is built which can be loaded on-demand to get access to headers.
config LOG_BUF_SHIFT
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
diff --git a/kernel/Makefile b/kernel/Makefile
index 12399614c350..b32a558fae2f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -70,7 +70,7 @@ obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_USER_NS) += user_namespace.o
obj-$(CONFIG_PID_NS) += pid_namespace.o
obj-$(CONFIG_IKCONFIG) += configs.o
-obj-$(CONFIG_IKHEADERS_PROC) += kheaders.o
+obj-$(CONFIG_IKHEADERS) += kheaders.o
obj-$(CONFIG_SMP) += stop_machine.o
obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
@@ -126,7 +126,7 @@ $(obj)/config_data.gz: $(KCONFIG_CONFIG) FORCE
$(obj)/kheaders.o: $(obj)/kheaders_data.tar.xz
quiet_cmd_genikh = CHK $(obj)/kheaders_data.tar.xz
-cmd_genikh = $(srctree)/kernel/gen_ikh_data.sh $@
+cmd_genikh = $(srctree)/kernel/gen_kheaders.sh $@
$(obj)/kheaders_data.tar.xz: FORCE
$(call cmd,genikh)
diff --git a/kernel/gen_ikh_data.sh b/kernel/gen_kheaders.sh
similarity index 98%
rename from kernel/gen_ikh_data.sh
rename to kernel/gen_kheaders.sh
index 591a94f7b387..581b83534587 100755
--- a/kernel/gen_ikh_data.sh
+++ b/kernel/gen_kheaders.sh
@@ -2,7 +2,7 @@
# SPDX-License-Identifier: GPL-2.0
# This script generates an archive consisting of kernel headers
-# for CONFIG_IKHEADERS_PROC.
+# for CONFIG_IKHEADERS.
set -e
spath="$(dirname "$(readlink -f "$0")")"
kroot="$spath/.."
diff --git a/kernel/kheaders.c b/kernel/kheaders.c
index 70ae6052920d..6a16f8f6898d 100644
--- a/kernel/kheaders.c
+++ b/kernel/kheaders.c
@@ -8,9 +8,8 @@
#include <linux/kernel.h>
#include <linux/module.h>
-#include <linux/proc_fs.h>
+#include <linux/kobject.h>
#include <linux/init.h>
-#include <linux/uaccess.h>
/*
* Define kernel_headers_data and kernel_headers_data_end, within which the
@@ -31,39 +30,32 @@ extern char kernel_headers_data;
extern char kernel_headers_data_end;
static ssize_t
-ikheaders_read_current(struct file *file, char __user *buf,
- size_t len, loff_t *offset)
+ikheaders_read(struct file *file, struct kobject *kobj,
+ struct bin_attribute *bin_attr,
+ char *buf, loff_t off, size_t len)
{
- return simple_read_from_buffer(buf, len, offset,
- &kernel_headers_data,
- &kernel_headers_data_end -
- &kernel_headers_data);
+ memcpy(buf, &kernel_headers_data + off, len);
+ return len;
}
-static const struct file_operations ikheaders_file_ops = {
- .read = ikheaders_read_current,
- .llseek = default_llseek,
+static struct bin_attribute kheaders_attr __ro_after_init = {
+ .attr = {
+ .name = "kheaders.tar.xz",
+ .mode = S_IRUGO,
+ },
+ .read = &ikheaders_read,
};
static int __init ikheaders_init(void)
{
- struct proc_dir_entry *entry;
-
- /* create the current headers file */
- entry = proc_create("kheaders.tar.xz", S_IRUGO, NULL,
- &ikheaders_file_ops);
- if (!entry)
- return -ENOMEM;
-
- proc_set_size(entry,
- &kernel_headers_data_end -
- &kernel_headers_data);
- return 0;
+ kheaders_attr.size = (&kernel_headers_data_end -
+ &kernel_headers_data);
+ return sysfs_create_bin_file(kernel_kobj, &kheaders_attr);
}
static void __exit ikheaders_cleanup(void)
{
- remove_proc_entry("kheaders.tar.xz", NULL);
+ sysfs_remove_bin_file(kernel_kobj, &kheaders_attr);
}
module_init(ikheaders_init);
--
2.21.0.1020.gf2820cf01a-goog
From: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
[ Upstream commit 30c04d796b693e22405c38e9b78e9a364e4c77e6 ]
The run_netsocktests will be marked as passed regardless the actual test
result from the ./socket:
selftests: net: run_netsocktests
========================================
--------------------
running socket test
--------------------
[FAIL]
ok 1..6 selftests: net: run_netsocktests [PASS]
This is because the test script itself has been successfully executed.
Fix this by exit 1 when the test failed.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/run_netsocktests | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/run_netsocktests b/tools/testing/selftests/net/run_netsocktests
index c09a682df56a..19486dab2379 100644
--- a/tools/testing/selftests/net/run_netsocktests
+++ b/tools/testing/selftests/net/run_netsocktests
@@ -6,7 +6,7 @@ echo "--------------------"
./socket
if [ $? -ne 0 ]; then
echo "[FAIL]"
+ exit 1
else
echo "[PASS]"
fi
-
--
2.20.1
From: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
[ Upstream commit 30c04d796b693e22405c38e9b78e9a364e4c77e6 ]
The run_netsocktests will be marked as passed regardless the actual test
result from the ./socket:
selftests: net: run_netsocktests
========================================
--------------------
running socket test
--------------------
[FAIL]
ok 1..6 selftests: net: run_netsocktests [PASS]
This is because the test script itself has been successfully executed.
Fix this by exit 1 when the test failed.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/run_netsocktests | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/run_netsocktests b/tools/testing/selftests/net/run_netsocktests
index 16058bbea7a8..c195b4478662 100755
--- a/tools/testing/selftests/net/run_netsocktests
+++ b/tools/testing/selftests/net/run_netsocktests
@@ -6,7 +6,7 @@ echo "--------------------"
./socket
if [ $? -ne 0 ]; then
echo "[FAIL]"
+ exit 1
else
echo "[PASS]"
fi
-
--
2.20.1
From: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
[ Upstream commit 30c04d796b693e22405c38e9b78e9a364e4c77e6 ]
The run_netsocktests will be marked as passed regardless the actual test
result from the ./socket:
selftests: net: run_netsocktests
========================================
--------------------
running socket test
--------------------
[FAIL]
ok 1..6 selftests: net: run_netsocktests [PASS]
This is because the test script itself has been successfully executed.
Fix this by exit 1 when the test failed.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/run_netsocktests | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/run_netsocktests b/tools/testing/selftests/net/run_netsocktests
index 16058bbea7a8..c195b4478662 100755
--- a/tools/testing/selftests/net/run_netsocktests
+++ b/tools/testing/selftests/net/run_netsocktests
@@ -6,7 +6,7 @@ echo "--------------------"
./socket
if [ $? -ne 0 ]; then
echo "[FAIL]"
+ exit 1
else
echo "[PASS]"
fi
-
--
2.20.1
From: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
[ Upstream commit 30c04d796b693e22405c38e9b78e9a364e4c77e6 ]
The run_netsocktests will be marked as passed regardless the actual test
result from the ./socket:
selftests: net: run_netsocktests
========================================
--------------------
running socket test
--------------------
[FAIL]
ok 1..6 selftests: net: run_netsocktests [PASS]
This is because the test script itself has been successfully executed.
Fix this by exit 1 when the test failed.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/run_netsocktests | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/run_netsocktests b/tools/testing/selftests/net/run_netsocktests
index b093f39c298c..14e41faf2c57 100755
--- a/tools/testing/selftests/net/run_netsocktests
+++ b/tools/testing/selftests/net/run_netsocktests
@@ -7,7 +7,7 @@ echo "--------------------"
./socket
if [ $? -ne 0 ]; then
echo "[FAIL]"
+ exit 1
else
echo "[PASS]"
fi
-
--
2.20.1
From: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
[ Upstream commit 30c04d796b693e22405c38e9b78e9a364e4c77e6 ]
The run_netsocktests will be marked as passed regardless the actual test
result from the ./socket:
selftests: net: run_netsocktests
========================================
--------------------
running socket test
--------------------
[FAIL]
ok 1..6 selftests: net: run_netsocktests [PASS]
This is because the test script itself has been successfully executed.
Fix this by exit 1 when the test failed.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/run_netsocktests | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/run_netsocktests b/tools/testing/selftests/net/run_netsocktests
index b093f39c298c..14e41faf2c57 100755
--- a/tools/testing/selftests/net/run_netsocktests
+++ b/tools/testing/selftests/net/run_netsocktests
@@ -7,7 +7,7 @@ echo "--------------------"
./socket
if [ $? -ne 0 ]; then
echo "[FAIL]"
+ exit 1
else
echo "[PASS]"
fi
-
--
2.20.1
From: David Ahern <dsahern(a)gmail.com>
[ Upstream commit a5f622984a623df9a84cf43f6b098d8dd76fbe05 ]
A couple of tests are verifying a route has been removed. The helper
expects the prefix as the first part of the expected output. When
checking that a route has been deleted the prefix is empty leading
to an invalid ip command:
$ ip ro ls match
Command line is not complete. Try option "help"
Fix by moving the comparison of expected output and output to a new
function that is used by both check_route and check_route6. Use the
new helper for the 2 checks on route removal.
Also, remove the reset of 'set -x' in route_setup which overrides the
user managed setting.
Fixes: d69faad76584c ("selftests: fib_tests: Add prefix route tests with metric")
Signed-off-by: David Ahern <dsahern(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/net/fib_tests.sh | 94 ++++++++++--------------
1 file changed, 40 insertions(+), 54 deletions(-)
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index a4ccde0e473b..2f190aa8fc5f 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -602,6 +602,39 @@ run_cmd()
return $rc
}
+check_expected()
+{
+ local out="$1"
+ local expected="$2"
+ local rc=0
+
+ [ "${out}" = "${expected}" ] && return 0
+
+ if [ -z "${out}" ]; then
+ if [ "$VERBOSE" = "1" ]; then
+ printf "\nNo route entry found\n"
+ printf "Expected:\n"
+ printf " ${expected}\n"
+ fi
+ return 1
+ fi
+
+ # tricky way to convert output to 1-line without ip's
+ # messy '\'; this drops all extra white space
+ out=$(echo ${out})
+ if [ "${out}" != "${expected}" ]; then
+ rc=1
+ if [ "${VERBOSE}" = "1" ]; then
+ printf " Unexpected route entry. Have:\n"
+ printf " ${out}\n"
+ printf " Expected:\n"
+ printf " ${expected}\n\n"
+ fi
+ fi
+
+ return $rc
+}
+
# add route for a prefix, flushing any existing routes first
# expected to be the first step of a test
add_route6()
@@ -646,31 +679,7 @@ check_route6()
local rc=0
out=$($IP -6 ro ls match ${pfx} | sed -e 's/ pref medium//')
- [ "${out}" = "${expected}" ] && return 0
-
- if [ -z "${out}" ]; then
- if [ "$VERBOSE" = "1" ]; then
- printf "\nNo route entry found\n"
- printf "Expected:\n"
- printf " ${expected}\n"
- fi
- return 1
- fi
-
- # tricky way to convert output to 1-line without ip's
- # messy '\'; this drops all extra white space
- out=$(echo ${out})
- if [ "${out}" != "${expected}" ]; then
- rc=1
- if [ "${VERBOSE}" = "1" ]; then
- printf " Unexpected route entry. Have:\n"
- printf " ${out}\n"
- printf " Expected:\n"
- printf " ${expected}\n\n"
- fi
- fi
-
- return $rc
+ check_expected "${out}" "${expected}"
}
route_cleanup()
@@ -714,7 +723,7 @@ route_setup()
$IP addr add 172.16.103.2/24 dev veth4
$IP addr add 172.16.104.1/24 dev dummy1
- set +ex
+ set +e
}
# assumption is that basic add of a single path route works
@@ -949,7 +958,8 @@ ipv6_addr_metric_test()
run_cmd "$IP li set dev dummy2 down"
rc=$?
if [ $rc -eq 0 ]; then
- check_route6 ""
+ out=$($IP -6 ro ls match 2001:db8:104::/64)
+ check_expected "${out}" ""
rc=$?
fi
log_test $rc 0 "Prefix route removed on link down"
@@ -1009,34 +1019,9 @@ check_route()
local pfx="172.16.104.0/24"
local expected="$1"
local out
- local rc=0
out=$($IP ro ls match ${pfx})
- [ "${out}" = "${expected}" ] && return 0
-
- if [ -z "${out}" ]; then
- if [ "$VERBOSE" = "1" ]; then
- printf "\nNo route entry found\n"
- printf "Expected:\n"
- printf " ${expected}\n"
- fi
- return 1
- fi
-
- # tricky way to convert output to 1-line without ip's
- # messy '\'; this drops all extra white space
- out=$(echo ${out})
- if [ "${out}" != "${expected}" ]; then
- rc=1
- if [ "${VERBOSE}" = "1" ]; then
- printf " Unexpected route entry. Have:\n"
- printf " ${out}\n"
- printf " Expected:\n"
- printf " ${expected}\n\n"
- fi
- fi
-
- return $rc
+ check_expected "${out}" "${expected}"
}
# assumption is that basic add of a single path route works
@@ -1301,7 +1286,8 @@ ipv4_addr_metric_test()
run_cmd "$IP li set dev dummy2 down"
rc=$?
if [ $rc -eq 0 ]; then
- check_route ""
+ out=$($IP ro ls match 172.16.104.0/24)
+ check_expected "${out}" ""
rc=$?
fi
log_test $rc 0 "Prefix route removed on link down"
--
2.20.1