Patch changelog:
v9:
* Replace resolveat(2) with openat2(2). [Linus]
* Output a warning to dmesg if may_open_magiclink() is violated.
* Add an openat2(O_CREAT) testcase.
v8:
* Default to O_CLOEXEC to match other new fd-creation syscalls
(users can always disable O_CLOEXEC afterwards). [Christian]
* Implement magic-link restrictions based on their mode. This is
done through a series of masks and is designed to avoid breaking
users -- most users don't have chained O_PATH fd re-opens.
* Add O_EMPTYPATH which allows for fd re-opening without needing
procfs. This would help some users of fd re-opening, and with the
changes to magic-link permissions we now have the right semantics
for such a flag.
* Add selftests for resolveat(2), O_EMPTYPATH, and the magic-link
mode semantics.
v7:
* Remove execveat(2) support for these flags since it might
result in some pretty hairy security issues with setuid binaries.
There are other avenues we can go down to solve the issues with
CVE-2019-5736. [Jann]
* Reserve an additional bit in resolveat(2) for the eXecute access
mode if we end up implementing it.
v6:
* Drop O_* flags API to the new LOOKUP_ path scoping bits and
instead introduce resolveat(2) as an alternative method of
obtaining an O_PATH. The justification for this is included in
patch 6 (though switching back to O_* flags is trivial).
v5:
* In response to CVE-2019-5736 (one of the vectors showed that
open(2)+fexec(3) cannot be used to scope binfmt_script's implicit
open_exec()), AT_* flags have been re-added and are now piped
through to binfmt_script (and other binfmt_* that use open_exec)
but are only supported for execveat(2) for now.
v4:
* Remove AT_* flag reservations, as they require more discussion.
* Switch to path_is_under() over __d_path() for breakout checking.
* Make O_XDEV no longer block openat("/tmp", "/", O_XDEV) -- dirfd
is now ignored for absolute paths to match other flags.
* Improve the dirfd_path_init() refactor and move it to a separate
commit.
* Remove reference to Linux-capsicum.
* Switch "proclink" name to magic-link.
v3: [resend]
v2:
* Made ".." resolution with AT_THIS_ROOT and AT_BENEATH safe(r) with
some semi-aggressive __d_path checking (see patch 3).
* Disallowed "proclinks" with AT_THIS_ROOT and AT_BENEATH, in the
hopes they can be re-enabled once safe.
* Removed the selftests as they will be reimplemented as xfstests.
* Removed stat(2) support, since you can already get it through
O_PATH and fstatat(2).
The need for some sort of control over VFS's path resolution (to avoid
malicious paths resulting in inadvertent breakouts) has been a very
long-standing desire of many userspace applications. This patchset is a
revival of Al Viro's old AT_NO_JUMPS[1,2] patchset (which was a variant
of David Drysdale's O_BENEATH patchset[3] which was a spin-off of the
Capsicum project[4]) with a few additions and changes made based on the
previous discussion within [5] as well as others I felt were useful.
In line with the conclusions of the original discussion of AT_NO_JUMPS,
the flag has been split up into separate flags. However, instead of
being an openat(2) flag it is provided through a new syscall openat2(2)
which provides an alternative way to get an O_PATH file descriptor (the
reasoning for doing this is included in patch 6). The following new
LOOKUP_ flags are added:
* LOOKUP_XDEV blocks all mountpoint crossings (upwards, downwards, or
through absolute links). Absolute pathnames alone in openat(2) do
not trigger this.
* LOOKUP_NO_MAGICLINKS blocks resolution through /proc/$pid/fd-style
links. This is done by blocking the usage of nd_jump_link() during
resolution in a filesystem. The term "magic-links" is used to match
with the only reference to these links in Documentation/, but I'm
happy to change the name.
It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.
* LOOKUP_BENEATH disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed. Conceptually this flag is to
ensure you "stay below" a certain point in the filesystem tree --
but this requires some additional to protect against various races
that would allow escape using "..".
Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done as
in LOOKUP_IN_ROOT, but that requires more discussion.
In addition, two new flags are added that expand on the above ideas:
* LOOKUP_NO_SYMLINKS does what it says on the tin. No symlink
resolution is allowed at all, including magic-links. Just as with
LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an
fd for the symlink as long as no parent path had a symlink
component.
* LOOKUP_IN_ROOT is an extension of LOOKUP_BENEATH that, rather than
blocking attempts to move past the root, forces all such movements
to be scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that chroot(2)
is not.
If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to cross
magic-links with LOOKUP_IN_ROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[6] when opening
paths in a potentially malicious container. There is a long list of
CVEs that could have bene mitigated by having O_THISROOT (such as
CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and
CVE-2019-5736, just to name a few).
And further, several semantics of file descriptor "re-opening" are now
changed to prevent attacks like CVE-2019-5736 by restricting how
magic-links can be resolved (based on their mode). This required some
other changes to the semantics of the modes of O_PATH file descriptor's
associated /proc/self/fd magic-links. openat2(2) has the ability to
further restrict re-opening of its own O_PATH fds, so that users can
make even better use of this feature.
Finally, O_EMPTYPATH was added so that users can do /proc/self/fd-style
re-opening without depending on procfs. The new restricted semantics for
magic-links are applied here too.
In order to make all of the above more usable, I'm working on
libpathrs[7] which is a C-friendly library for safe path resolution. It
features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Christian Brauner <christian(a)brauner.io>
Cc: David Drysdale <drysdale(a)google.com>
Cc: Tycho Andersen <tycho(a)tycho.ws>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: <containers(a)lists.linux-foundation.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Cc: <linux-api(a)vger.kernel.org>
[1]: https://lwn.net/Articles/721443/
[2]: https://lore.kernel.org/patchwork/patch/784221/
[3]: https://lwn.net/Articles/619151/
[4]: https://lwn.net/Articles/603929/
[5]: https://lwn.net/Articles/723057/
[6]: https://github.com/cyphar/filepath-securejoin
[7]: https://github.com/openSUSE/libpathrs
Aleksa Sarai (10):
namei: obey trailing magic-link DAC permissions
procfs: switch magic-link modes to be more sane
open: O_EMPTYPATH: procfs-less file descriptor re-opening
namei: split out nd->dfd handling to dirfd_path_init
namei: O_BENEATH-style path resolution flags
namei: LOOKUP_IN_ROOT: chroot-like path resolution
namei: aggressively check for nd->root escape on ".." resolution
open: openat2(2) syscall
kselftest: save-and-restore errno to allow for %m formatting
selftests: add openat2(2) selftests
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/fcntl.c | 2 +-
fs/internal.h | 1 +
fs/namei.c | 333 ++++++++++++---
fs/open.c | 140 +++++--
fs/proc/base.c | 20 +-
fs/proc/fd.c | 23 +-
fs/proc/namespaces.c | 2 +-
include/linux/fcntl.h | 17 +-
include/linux/fs.h | 8 +-
include/linux/namei.h | 8 +
include/linux/syscalls.h | 14 +-
include/uapi/asm-generic/fcntl.h | 5 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/fcntl.h | 38 ++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/kselftest.h | 15 +
tools/testing/selftests/memfd/memfd_test.c | 7 +-
tools/testing/selftests/openat2/.gitignore | 1 +
tools/testing/selftests/openat2/Makefile | 12 +
tools/testing/selftests/openat2/helpers.c | 162 +++++++
tools/testing/selftests/openat2/helpers.h | 114 +++++
.../testing/selftests/openat2/linkmode_test.c | 325 ++++++++++++++
.../selftests/openat2/rename_attack_test.c | 124 ++++++
.../testing/selftests/openat2/resolve_test.c | 395 ++++++++++++++++++
42 files changed, 1667 insertions(+), 125 deletions(-)
create mode 100644 tools/testing/selftests/openat2/.gitignore
create mode 100644 tools/testing/selftests/openat2/Makefile
create mode 100644 tools/testing/selftests/openat2/helpers.c
create mode 100644 tools/testing/selftests/openat2/helpers.h
create mode 100644 tools/testing/selftests/openat2/linkmode_test.c
create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
create mode 100644 tools/testing/selftests/openat2/resolve_test.c
--
2.22.0
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
[ Upstream commit ee8a84c60bcc1f1615bd9cb3edfe501e26cdc85b ]
Using ".arm .inst" for the arm signature introduces build issues for
programs compiled in Thumb mode because the assembler stays in the
arm mode for the rest of the inline assembly. Revert to using a ".word"
to express the signature as data instead.
The choice of signature is a valid trap instruction on arm32 little
endian, where both code and data are little endian.
ARMv6+ big endian (BE8) generates mixed endianness code vs data:
little-endian code and big-endian data. The data value of the signature
needs to have its byte order reversed to generate the trap instruction.
Prior to ARMv6, -mbig-endian generates big-endian code and data
(which match), so the endianness of the data representation of the
signature should not be reversed. However, the choice between BE32
and BE8 is done by the linker, so we cannot know whether code and
data endianness will be mixed before the linker is invoked. So rather
than try to play tricks with the linker, the rseq signature is simply
data (not a trap instruction) prior to ARMv6 on big endian. This is
why the signature is expressed as data (.word) rather than as
instruction (.inst) in assembler.
Because a ".word" is used to emit the signature, it will be interpreted
as a literal pool by a disassembler, not as an actual instruction.
Considering that the signature is not meant to be executed except in
scenarios where the program execution is completely bogus, this should
not be an issue.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
CC: Peter Zijlstra <peterz(a)infradead.org>
CC: Thomas Gleixner <tglx(a)linutronix.de>
CC: Joel Fernandes <joelaf(a)google.com>
CC: Catalin Marinas <catalin.marinas(a)arm.com>
CC: Dave Watson <davejwatson(a)fb.com>
CC: Will Deacon <will.deacon(a)arm.com>
CC: Shuah Khan <shuah(a)kernel.org>
CC: Andi Kleen <andi(a)firstfloor.org>
CC: linux-kselftest(a)vger.kernel.org
CC: "H . Peter Anvin" <hpa(a)zytor.com>
CC: Chris Lameter <cl(a)linux.com>
CC: Russell King <linux(a)arm.linux.org.uk>
CC: Michael Kerrisk <mtk.manpages(a)gmail.com>
CC: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
CC: Paul Turner <pjt(a)google.com>
CC: Boqun Feng <boqun.feng(a)gmail.com>
CC: Josh Triplett <josh(a)joshtriplett.org>
CC: Steven Rostedt <rostedt(a)goodmis.org>
CC: Ben Maurer <bmaurer(a)fb.com>
CC: linux-api(a)vger.kernel.org
CC: Andy Lutomirski <luto(a)amacapital.net>
CC: Andrew Morton <akpm(a)linux-foundation.org>
CC: Linus Torvalds <torvalds(a)linux-foundation.org>
CC: Carlos O'Donell <carlos(a)redhat.com>
CC: Florian Weimer <fweimer(a)redhat.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/rseq/rseq-arm.h | 61 +++++++++++++------------
1 file changed, 33 insertions(+), 28 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
index 84f28f147fb6..5943c816c07c 100644
--- a/tools/testing/selftests/rseq/rseq-arm.h
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -6,6 +6,8 @@
*/
/*
+ * - ARM little endian
+ *
* RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
* value 0x5de3. This traps if user-space reaches this instruction by mistake,
* and the uncommon operand ensures the kernel does not move the instruction
@@ -22,36 +24,40 @@
* def3 udf #243 ; 0xf3
* e7f5 b.n <7f5>
*
- * pre-ARMv6 big endian code:
- * e7f5 b.n <7f5>
- * def3 udf #243 ; 0xf3
+ * - ARMv6+ big endian (BE8):
*
* ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
- * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
- * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
- * (which match), so there is no need to reverse the endianness of the data
- * representation of the signature. However, the choice between BE32 and BE8
- * is done by the linker, so we cannot know whether code and data endianness
- * will be mixed before the linker is invoked.
+ * code and big-endian data. The data value of the signature needs to have its
+ * byte order reversed to generate the trap instruction:
+ *
+ * Data: 0xf3def5e7
+ *
+ * Translates to this A32 instruction pattern:
+ *
+ * e7f5def3 udf #24035 ; 0x5de3
+ *
+ * Translates to this T16 instruction pattern:
+ *
+ * def3 udf #243 ; 0xf3
+ * e7f5 b.n <7f5>
+ *
+ * - Prior to ARMv6 big endian (BE32):
+ *
+ * Prior to ARMv6, -mbig-endian generates big-endian code and data
+ * (which match), so the endianness of the data representation of the
+ * signature should not be reversed. However, the choice between BE32
+ * and BE8 is done by the linker, so we cannot know whether code and
+ * data endianness will be mixed before the linker is invoked. So rather
+ * than try to play tricks with the linker, the rseq signature is simply
+ * data (not a trap instruction) prior to ARMv6 on big endian. This is
+ * why the signature is expressed as data (.word) rather than as
+ * instruction (.inst) in assembler.
*/
-#define RSEQ_SIG_CODE 0xe7f5def3
-
-#ifndef __ASSEMBLER__
-
-#define RSEQ_SIG_DATA \
- ({ \
- int sig; \
- asm volatile ("b 2f\n\t" \
- "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
- "2:\n\t" \
- "ldr %[sig], 1b\n\t" \
- : [sig] "=r" (sig)); \
- sig; \
- })
-
-#define RSEQ_SIG RSEQ_SIG_DATA
-
+#ifdef __ARMEB__
+#define RSEQ_SIG 0xf3def5e7 /* udf #24035 ; 0x5de3 (ARMv6+) */
+#else
+#define RSEQ_SIG 0xe7f5def3 /* udf #24035 ; 0x5de3 */
#endif
#define rseq_smp_mb() __asm__ __volatile__ ("dmb" ::: "memory", "cc")
@@ -125,8 +131,7 @@ do { \
__rseq_str(table_label) ":\n\t" \
".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
- ".arm\n\t" \
- ".inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
+ ".word " __rseq_str(RSEQ_SIG) "\n\t" \
__rseq_str(label) ":\n\t" \
teardown \
"b %l[" __rseq_str(abort_label) "]\n\t"
--
2.20.1
The code in vmx.c does not use "program_invocation_name", so there
is no need to "#define _GNU_SOURCE" here.
Signed-off-by: Thomas Huth <thuth(a)redhat.com>
---
tools/testing/selftests/kvm/lib/x86_64/vmx.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/vmx.c b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
index fe56d159d65f..204f847bd065 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/vmx.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/vmx.c
@@ -5,8 +5,6 @@
* Copyright (C) 2018, Google LLC.
*/
-#define _GNU_SOURCE /* for program_invocation_name */
-
#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"
--
2.21.0
The kvm_create_max_vcpus is generic enough so that it works out of the
box on POWER, too. We just have to provide some stubs for linking the
code from kvm_util.c.
Note that you also might have to do "ulimit -n 2500" before running the
test, to avoid that it runs out of file handles for the vCPUs.
Signed-off-by: Thomas Huth <thuth(a)redhat.com>
---
RFC since the stubs are a little bit ugly (does someone here like
to implement them?), and since it's a little bit annoying that
you have to raise the ulimit for this test in case the kernel provides
more vCPUs than the default ulimit...
tools/testing/selftests/kvm/Makefile | 6 +++
.../selftests/kvm/lib/powerpc/processor.c | 37 +++++++++++++++++++
2 files changed, 43 insertions(+)
create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index ba7849751989..c92dc78ff74b 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -11,6 +11,8 @@ LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/ucall.c lib/sparsebi
LIBKVM_x86_64 = lib/x86_64/processor.c lib/x86_64/vmx.c
LIBKVM_aarch64 = lib/aarch64/processor.c
LIBKVM_s390x = lib/s390x/processor.c
+LIBKVM_ppc64 = lib/powerpc/processor.c
+LIBKVM_ppc64le = $(LIBKVM_ppc64)
TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test
TEST_GEN_PROGS_x86_64 += x86_64/evmcs_test
@@ -35,6 +37,10 @@ TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus
TEST_GEN_PROGS_s390x += s390x/sync_regs_test
TEST_GEN_PROGS_s390x += kvm_create_max_vcpus
+TEST_GEN_PROGS_ppc64 += kvm_create_max_vcpus
+
+TEST_GEN_PROGS_ppc64le = $(TEST_GEN_PROGS_ppc64)
+
TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
LIBKVM += $(LIBKVM_$(UNAME_M))
diff --git a/tools/testing/selftests/kvm/lib/powerpc/processor.c b/tools/testing/selftests/kvm/lib/powerpc/processor.c
new file mode 100644
index 000000000000..c0b7f06e206e
--- /dev/null
+++ b/tools/testing/selftests/kvm/lib/powerpc/processor.c
@@ -0,0 +1,37 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * KVM selftest s390x library code - CPU-related functions
+ */
+
+#define _GNU_SOURCE
+
+#include "kvm_util.h"
+#include "../kvm_util_internal.h"
+
+void virt_pgd_alloc(struct kvm_vm *vm, uint32_t memslot)
+{
+ abort(); /* TODO: implement this */
+}
+
+void virt_pg_map(struct kvm_vm *vm, uint64_t gva, uint64_t gpa,
+ uint32_t memslot)
+{
+ abort(); /* TODO: implement this */
+}
+
+vm_paddr_t addr_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
+{
+ abort(); /* TODO: implement this */
+
+ return -1;
+}
+
+void virt_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
+{
+ abort(); /* TODO: implement this */
+}
+
+void vcpu_dump(FILE *stream, struct kvm_vm *vm, uint32_t vcpuid, uint8_t indent)
+{
+ abort(); /* TODO: implement this */
+}
--
2.21.0
Add a skip() message function that stops the test, logs an explanation,
and sets the "skip" return code (4).
Before loading a livepatch self-test kernel module, first verify that
we've built and installed it by running a 'modprobe --dry-run'. This
should catch a few environment issues, including !CONFIG_LIVEPATCH and
!CONFIG_TEST_LIVEPATCH. In these cases, exit gracefully with the new
skip() function.
Reported-by: Jiri Benc <jbenc(a)redhat.com>
Suggested-by: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Joe Lawrence <joe.lawrence(a)redhat.com>
---
v2: move assert_mod() call into load_mod() and load_lp_nowait(), before
they check whether the module is a livepatch or not (a test-failing
assertion).
.../testing/selftests/livepatch/functions.sh | 20 +++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 30195449c63c..de5a504ffdbc 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -13,6 +13,14 @@ function log() {
echo "$1" > /dev/kmsg
}
+# skip(msg) - testing can't proceed
+# msg - explanation
+function skip() {
+ log "SKIP: $1"
+ echo "SKIP: $1" >&2
+ exit 4
+}
+
# die(msg) - game over, man
# msg - dying words
function die() {
@@ -43,6 +51,12 @@ function loop_until() {
done
}
+function assert_mod() {
+ local mod="$1"
+
+ modprobe --dry-run "$mod" &>/dev/null
+}
+
function is_livepatch_mod() {
local mod="$1"
@@ -75,6 +89,9 @@ function __load_mod() {
function load_mod() {
local mod="$1"; shift
+ assert_mod "$mod" ||
+ skip "Failed modprobe --dry-run of module: $mod"
+
is_livepatch_mod "$mod" &&
die "use load_lp() to load the livepatch module $mod"
@@ -88,6 +105,9 @@ function load_mod() {
function load_lp_nowait() {
local mod="$1"; shift
+ assert_mod "$mod" ||
+ skip "Failed modprobe --dry-run of module: $mod"
+
is_livepatch_mod "$mod" ||
die "module $mod is not a livepatch"
--
2.21.0
Hi,
These patches make it possible to attach BPF programs directly to tracepoints
using ftrace (/sys/kernel/debug/tracing) without needing the process doing the
attach to be alive. This has the following benefits:
1. Simplified Security: In Android, we have finer-grained security controls to
specific ftrace trace events using SELinux labels. We control precisely who is
allowed to enable an ftrace event already. By adding a node to ftrace for
attaching BPF programs, we can use the same mechanism to further control who is
allowed to attach to a trace event.
2. Process lifetime: In Android we are adding usecases where a tracing program
needs to be attached all the time to a tracepoint, for the full life time of
the system. Such as to gather statistics where there no need for a detach for
the full system lifetime. With perf or bpf(2)'s BPF_RAW_TRACEPOINT_OPEN, this
means keeping a process alive all the time. However, in Android our BPF loader
currently (for hardeneded security) involves just starting a process at boot
time, doing the BPF program loading, and then pinning them to /sys/fs/bpf. We
don't keep this process alive all the time. It is more suitable to do a
one-shot attach of the program using ftrace and not need to have a process
alive all the time anymore for this. Such process also needs elevated
privileges since tracepoint program loading currently requires CAP_SYS_ADMIN
anyway so by design Android's bpfloader runs once at init and exits.
This series add a new bpf file to /sys/kernel/debug/tracing/events/X/Y/bpf
The following commands can be written into it:
attach:<fd> Attaches BPF prog fd to tracepoint
detach:<fd> Detaches BPF prog fd to tracepoint
Reading the bpf file will show all the attached programs to the tracepoint.
Joel Fernandes (Google) (4):
Move bpf_raw_tracepoint functionality into bpf_trace.c
trace/bpf: Add support for attach/detach of ftrace events to BPF
lib/bpf: Add support for ftrace event attach and detach
selftests/bpf: Add test for ftrace-based BPF attach/detach
include/linux/bpf_trace.h | 16 ++
include/linux/trace_events.h | 1 +
kernel/bpf/syscall.c | 69 +-----
kernel/trace/bpf_trace.c | 225 ++++++++++++++++++
kernel/trace/trace.h | 1 +
kernel/trace/trace_events.c | 8 +
tools/lib/bpf/bpf.c | 53 +++++
tools/lib/bpf/bpf.h | 4 +
tools/lib/bpf/libbpf.map | 2 +
.../raw_tp_writable_test_ftrace_run.c | 89 +++++++
10 files changed, 410 insertions(+), 58 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/raw_tp_writable_test_ftrace_run.c
--
2.22.0.410.gd8fdbe21b5-goog
## TL;DR
This patchset addresses comments from Stephen Boyd. No changes affect
the API, and all changes are specific to patches 02, 03, and 04;
however, there were some significant changes to how string_stream and
kunit_stream work under the hood.
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want[1]) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in about a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
### What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
### Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
### More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[2].
Additionally for convenience, I have applied these patches to a
branch[3]. The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/v5.2/v11 branch.
## Changes Since Last Version
- Went back to using spinlock in `struct string_stream`. Needed for so
that it is compatible with different GFP flags to address comment from
Stephen.
- Added string_stream_append function to string_stream API. - suggested
by Stephen.
- Made all string fragments and other allocations internal to
string_stream and kunit_stream managed by the KUnit resource
management API.
[1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[2] https://google.github.io/kunit-docs/third_party/kernel/docs/
[3] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.2/v11
--
2.22.0.510.g264f2c817a-goog
## TL;DR
This patchset addresses comments from Stephen Boyd. Most changes are
pretty minor, but this does fix a couple of bugs pointed out by Stephen.
I imagine that Stephen will probably have some more comments, but I
wanted to get this out for him to look at as soon as possible.
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want[1]) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in about a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
### What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
### Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
### More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[2].
Additionally for convenience, I have applied these patches to a
branch[3]. The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/v5.2/v10 branch.
## Changes Since Last Version
- Went back to using spinlock in `struct kunit`. Needed for resource
management API. Thanks to Stephen for this change.
- Fixed bug where an init failure may not be recorded as a failure in
patch 01/18.
- Added append method to string_stream as suggested by Stephen.
- Mostly pretty minor changes after that, which mostly pertain to
string_stream and kunit_stream.
[1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[2] https://google.github.io/kunit-docs/third_party/kernel/docs/
[3] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.2/v10
--
2.22.0.510.g264f2c817a-goog