The kernel may be configured, or an IMA policy may be specified on the
boot command line, to require that the kexec kernel image signature be
verified. At runtime a custom IMA policy may be loaded, replacing the
policy specified on the boot command line. In addition, the arch-specific
policy rules are defined dynamically, based on the secure boot mode, and
may require the kernel image signature to be verified.
The kernel image may have a PE signature, an IMA signature, or both. In
addition, there are two kexec syscalls - kexec_load and kexec_file_load
- but only the kexec_file_load syscall can verify signatures.
These kexec selftests verify that only properly signed kernel images are
loaded as required, based on the kernel config, the secure boot mode,
and the IMA runtime policy.
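As a rough illustration of the secure boot part of that decision, the
sketch below reads the value byte of an EFI variable file and treats
secure boot as enabled only when SecureBoot is 1 and SetupMode is 0. The
function names and the directory argument are hypothetical and are not
taken from this series; the sketch assumes the efivarfs layout of four
attribute bytes followed by the variable data.

```sh
# Sketch only: helper names and the directory argument are hypothetical.

# Print the last byte of file $1 as a decimal number. An efivarfs file
# holds 4 attribute bytes followed by the data, so for a one-byte
# variable the last byte is its value.
efivar_value()
{
	od -An -v -tu1 "$1" |
		awk '{ for (i = 1; i <= NF; i++) last = $i } END { print last }'
}

# Return 0 if the efivars directory $1 reports SecureBoot=1 and
# SetupMode=0, non-zero otherwise (including when a variable is missing).
secureboot_enabled()
{
	sb_file=$(find "$1" -name 'SecureBoot-*' 2>/dev/null | head -n 1)
	sm_file=$(find "$1" -name 'SetupMode-*' 2>/dev/null | head -n 1)
	[ -n "$sb_file" ] && [ -n "$sm_file" ] || return 1
	[ "$(efivar_value "$sb_file")" = "1" ] &&
		[ "$(efivar_value "$sm_file")" = "0" ]
}
```

On a real system the directory would typically be
/sys/firmware/efi/efivars, for example:
secureboot_enabled /sys/firmware/efi/efivars && echo "secure boot enabled"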
Loading a kernel image requires root privileges. To run just the KEXEC
selftests: sudo make TARGETS=kexec kselftest
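Because loading a kernel image needs root, a test can bail out early with
the kselftest skip code when it is not run as root. A minimal sketch,
assuming the conventional kselftest skip exit code of 4; the function
name and its optional uid argument are illustrative, and the
"require_root_privileges" helper defined by this series may look
different.

```sh
# Sketch only: not the helper added by this series.
ksft_skip=4	# kselftest convention for a skipped test

# Return 0 when the given uid (default: the caller's uid) is root.
# Otherwise print a SKIP message and return the skip code.
check_root_uid()
{
	uid=${1:-$(id -u)}
	if [ "$uid" -ne 0 ]; then
		echo "SKIP: requires root privileges"
		return $ksft_skip
	fi
	return 0
}
```

A test script would call it near the top: check_root_uid || exit $?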
Changelog v5:
- Make tests independent of IMA being enabled, folding the changes
into the kexec_file_load test.
- Add support for CONFIG_KEXEC_VERIFY_SIG being enabled, but not
CONFIG_KEXEC_BZIMAGE_VERIFY_SIG.
Changelog v4:
- Moved the kexec tests to selftests/kexec, as requested by Dave Young.
- Removed the kernel module selftest from this patch set.
- Rewritten cover letter, removing reference to kernel modules.
Changelog v3:
- Updated tests based on Petr's review, including defining a common
test to check for root privileges.
- Modified config, removing the CONFIG_KEXEC_VERIFY_SIG requirement.
- Updated the SPDX license to GPL-2.0 based on Shuah's review.
- Updated the secureboot mode test to check the SetupMode as well, based
on David Young's review.
Mimi Zohar (8):
selftests/kexec: move the IMA kexec_load selftest to selftests/kexec
selftests/kexec: cleanup the kexec selftest
selftests/kexec: define a set of common functions
selftests/kexec: define common logging functions
kselftest/kexec: define "require_root_privileges"
selftests/kexec: kexec_file_load syscall test
selftests/kexec: check kexec_load and kexec_file_load are enabled
selftests/kexec: make kexec_load test independent of IMA being enabled
Petr Vorel (1):
selftests/kexec: Add missing '=y' to config options
tools/testing/selftests/Makefile | 2 +-
tools/testing/selftests/ima/Makefile | 11 --
tools/testing/selftests/ima/config | 4 -
tools/testing/selftests/ima/test_kexec_load.sh | 54 ------
tools/testing/selftests/kexec/Makefile | 12 ++
tools/testing/selftests/kexec/config | 3 +
tools/testing/selftests/kexec/kexec_common_lib.sh | 175 +++++++++++++++++
.../selftests/kexec/test_kexec_file_load.sh | 208 +++++++++++++++++++++
tools/testing/selftests/kexec/test_kexec_load.sh | 47 +++++
9 files changed, 446 insertions(+), 70 deletions(-)
delete mode 100644 tools/testing/selftests/ima/Makefile
delete mode 100644 tools/testing/selftests/ima/config
delete mode 100755 tools/testing/selftests/ima/test_kexec_load.sh
create mode 100644 tools/testing/selftests/kexec/Makefile
create mode 100644 tools/testing/selftests/kexec/config
create mode 100755 tools/testing/selftests/kexec/kexec_common_lib.sh
create mode 100755 tools/testing/selftests/kexec/test_kexec_file_load.sh
create mode 100755 tools/testing/selftests/kexec/test_kexec_load.sh
--
2.7.5
If cgroup destruction races with the exit() of one or more processes
belonging to the cgroup, cg_killall() may fail. That is not a good
reason to make cg_destroy() fail and leave the cgroup in place,
potentially causing subsequent test runs to fail.
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: kernel-team(a)fb.com
Cc: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/cgroup/cgroup_util.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/testing/selftests/cgroup/cgroup_util.c b/tools/testing/selftests/cgroup/cgroup_util.c
index 14c9fe284806..eba06f94433b 100644
--- a/tools/testing/selftests/cgroup/cgroup_util.c
+++ b/tools/testing/selftests/cgroup/cgroup_util.c
@@ -227,9 +227,7 @@ int cg_destroy(const char *cgroup)
retry:
ret = rmdir(cgroup);
if (ret && errno == EBUSY) {
- ret = cg_killall(cgroup);
- if (ret)
- return ret;
+ cg_killall(cgroup);
usleep(100);
goto retry;
}
--
2.20.1
The test for basic NAT functionality uses the ip command, which
needs a veth device. If support for veth is not compiled into the
kernel, the test script breaks. This patch makes the script print
a reasonable error message and exit with the kselftest skip code
in that case.
Signed-off-by: Jeffrin Jose T <jeffrin(a)rajagiritech.edu.in>
---
tools/testing/selftests/netfilter/nft_nat.sh | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/netfilter/nft_nat.sh b/tools/testing/selftests/netfilter/nft_nat.sh
index 8ec76681605c..f25f72a75cf3 100755
--- a/tools/testing/selftests/netfilter/nft_nat.sh
+++ b/tools/testing/selftests/netfilter/nft_nat.sh
@@ -23,7 +23,11 @@ ip netns add ns0
ip netns add ns1
ip netns add ns2
-ip link add veth0 netns ns0 type veth peer name eth0 netns ns1
+ip link add veth0 netns ns0 type veth peer name eth0 netns ns1 > /dev/null 2>&1
+if [ $? -ne 0 ];then
+ echo "SKIP: No virtual ethernet pair device support in kernel"
+ exit $ksft_skip
+fi
ip link add veth1 netns ns0 type veth peer name eth0 netns ns2
ip -net ns0 link set lo up
--
2.20.1
Hi,
strscpy_pad() patch set now with added test shenanigans.
This version adds 5 initial patches to the set and splits the single
patch from v2 into two separate patches (6 and 7).
While doing the testing for strscpy_pad() it was noticed that there is
duplication in how test modules are being fed to kselftest and also in
the test modules themselves.
This set makes an attempt at adding a framework to kselftest for writing
kernel test modules. It also adds a script for use in creating script
test runners for kselftest. My macro-fu is not great; all criticism
and suggestions are very much appreciated. The design is based on the
test modules lib/test_printf.c, lib/test_bitmap.c, lib/test_xarray.c.
Shuah, I'm by no means a kselftest expert; if this approach does not fit
in with your general direction please say so.
Kees, I put the strscpy_pad() addition patch separate so if this goes in
through Shuah's tree (and if it goes in at all) it's a single patch to
grab if we want to start playing around with strscpy_pad().
Patch 1 fixes module unload for lib/test_printf in preparation for the
rest of the series.
Patch 2 Adds a shell script that can be used to create shell script test
runners.
Patch 3 Converts current shell script runners in
tools/testing/selftests/lib/ to use the script introduced in
patch 2.
Patch 4 Adds the test framework by way of a header file (inc. documentation)
Patch 5 Converts a couple of current test modules to make some use of
the newly added test framework.
Patch 6 Adds strscpy_pad()
Patch 7 Adds test module for strscpy_pad() using the new framework and script.
If you are a testing geek and would like to play with this: if you are
already running a kernel built recently from your tree, you may want to
apply just the first 5 patches. Then you don't need to build/boot a new
kernel; just configure and build the lib/ test modules (test_printf
etc.) and then:
sudo make TARGETS=lib kselftest
Late in the development of this I found that a bunch of boilerplate had
to be added to the script to handle running tests with:
make O=/path/to/kout kselftest
The reason is that during the build we are in the output directory but
the script is in the source directory. I get the feeling that a better
understanding of how the kernel build process works would provide a
better solution to this. The current solution is disappointing since
removing duplication and boilerplate was the point of the whole
exercise. I'd love a better way to solve this.
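For illustration only, one common way to make a shell script independent
of the current working directory is to resolve the directory of the
script itself and build paths from there. This is a generic sketch, not
what the series does and not a fix for the kbuild side of the problem;
the function name is hypothetical.

```sh
# Sketch only: function name is illustrative.
# Print the absolute, symlink-resolved directory that contains the
# script path given as $1, regardless of the caller's working directory.
script_dir()
{
	(cd "$(dirname "$1")" && pwd -P)
}
```

A runner could then locate files next to itself with
"$(script_dir "$0")/some_helper.sh", whether it is started from the
source tree or from an O= output directory.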
One final interesting note: there are 36 test modules in lib/, but only
3 of them are run by kselftest from tools/testing/selftests/lib.
Thanks for looking at this,
Tobin.
Tobin C. Harding (7):
lib/test_printf: Add empty module_exit function
kselftest: Add test runner creation script
kselftest/lib: Use new shell runner to define tests
kselftest: Add test module framework header
lib: Use new kselftest header
lib/string: Add strscpy_pad() function
lib: Add test module for strscpy_pad
Documentation/dev-tools/kselftest.rst | 108 ++++++++++++-
include/linux/string.h | 4 +
lib/Kconfig.debug | 3 +
lib/Makefile | 1 +
lib/string.c | 47 +++++-
lib/test_bitmap.c | 20 +--
lib/test_printf.c | 17 +--
lib/test_strscpy.c | 150 +++++++++++++++++++
tools/testing/selftests/kselftest_module.h | 48 ++++++
tools/testing/selftests/kselftest_module.sh | 75 ++++++++++
tools/testing/selftests/lib/Makefile | 2 +-
tools/testing/selftests/lib/bitmap.sh | 25 ++--
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/prime_numbers.sh | 23 ++-
tools/testing/selftests/lib/printf.sh | 25 ++--
tools/testing/selftests/lib/strscpy.sh | 17 +++
16 files changed, 490 insertions(+), 76 deletions(-)
create mode 100644 lib/test_strscpy.c
create mode 100644 tools/testing/selftests/kselftest_module.h
create mode 100755 tools/testing/selftests/kselftest_module.sh
create mode 100755 tools/testing/selftests/lib/strscpy.sh
--
2.20.1
Introduce in-kernel headers and other artifacts which are made available
as an archive through proc (/proc/kheaders.txz file). This archive makes
it possible to build kernel modules, run eBPF programs, and other
tracing programs that need to extend the kernel for tracing purposes
without any dependency on the file system having headers and build
artifacts.
On Android and embedded systems, it is common to switch kernels but not
have kernel headers available on the file system. Raw kernel headers
also cannot be copied into the filesystem like they can be on other
distros, due to licensing and other issues. There's no linux-headers
package on Android. Further once a different kernel is booted, any
headers stored on the file system will no longer be useful. By storing
the headers as a compressed archive within the kernel, we can avoid these
issues that have been a hindrance for a long time.
The feature is also buildable as a module in case the user does not want
it to be part of the kernel image. This makes it possible to load
and unload the headers on demand. A tracing program, or a kernel module
builder can load the module, do its operations, and then unload the
module to save kernel memory. The total memory needed is 3.8MB.
The code to read the headers is based on /proc/config.gz code and uses
the same technique to embed the headers.
To build a module, the below steps have been tested on an x86 machine:
modprobe kheaders
rm -rf $HOME/headers
mkdir -p $HOME/headers
tar -xvf /proc/kheaders.txz -C $HOME/headers >/dev/null
cd my-kernel-module
make -C $HOME/headers M=$(pwd) modules
rmmod kheaders
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
Changes since v1:
- removed IKH_EXTRA variable, not needed (Masahiro Yamada)
- small fix ups to selftest
- added target to main Makefile etc
- added MODULE_LICENSE to test module
- made selftest more quiet
Changes since RFC:
Both changes bring size down to 3.8MB:
- use xz for compression
- strip comments except SPDX lines
- Call out the module name in Kconfig
- Also added selftests in second patch to ensure headers are always
working.
Documentation/dontdiff | 1 +
init/Kconfig | 11 ++++++
kernel/.gitignore | 2 ++
kernel/Makefile | 27 ++++++++++++++
kernel/kheaders.c | 74 +++++++++++++++++++++++++++++++++++++++
scripts/gen_ikh_data.sh | 19 ++++++++++
scripts/strip-comments.pl | 8 +++++
7 files changed, 142 insertions(+)
create mode 100644 kernel/kheaders.c
create mode 100755 scripts/gen_ikh_data.sh
create mode 100755 scripts/strip-comments.pl
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 2228fcc8e29f..05a2319ee2a2 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -151,6 +151,7 @@ int8.c
kallsyms
kconfig
keywords.c
+kheaders_data.h*
ksym.c*
ksym.h*
kxgettext
diff --git a/init/Kconfig b/init/Kconfig
index c9386a365eea..9fbf4f73d98c 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -563,6 +563,17 @@ config IKCONFIG_PROC
This option enables access to the kernel configuration file
through /proc/config.gz.
+config IKHEADERS_PROC
+ tristate "Enable kernel header artifacts through /proc/kheaders.txz"
+ select BUILD_BIN2C
+ depends on PROC_FS
+ help
+ This option enables access to the kernel header and other artifacts that
+ are generated during the build process. These can be used to build kernel
+ modules, and other in-kernel programs such as those generated by eBPF
+ and systemtap tools. If you build the headers as a module, a module
+ called kheaders.ko is built which can be loaded to get access to them.
+
config LOG_BUF_SHIFT
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
range 12 25
diff --git a/kernel/.gitignore b/kernel/.gitignore
index b3097bde4e9c..6acf71acbdcb 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -3,5 +3,7 @@
#
config_data.h
config_data.gz
+kheaders_data.h
+kheaders_data.txz
timeconst.h
hz.bc
diff --git a/kernel/Makefile b/kernel/Makefile
index 6aa7543bcdb2..1d13a7a6c537 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -70,6 +70,7 @@ obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_USER_NS) += user_namespace.o
obj-$(CONFIG_PID_NS) += pid_namespace.o
obj-$(CONFIG_IKCONFIG) += configs.o
+obj-$(CONFIG_IKHEADERS_PROC) += kheaders.o
obj-$(CONFIG_SMP) += stop_machine.o
obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
@@ -130,3 +131,29 @@ filechk_ikconfiggz = \
targets += config_data.h
$(obj)/config_data.h: $(obj)/config_data.gz FORCE
$(call filechk,ikconfiggz)
+
+# Build a list of in-kernel headers for building kernel modules
+ikh_file_list := include/
+ikh_file_list += arch/$(ARCH)/Makefile
+ikh_file_list += arch/$(ARCH)/include/
+ikh_file_list += scripts/
+ikh_file_list += Makefile
+ikh_file_list += Module.symvers
+ifeq ($(CONFIG_STACK_VALIDATION), y)
+ikh_file_list += $(objtree)/tools/objtool/objtool
+endif
+
+$(obj)/kheaders.o: $(obj)/kheaders_data.h
+
+targets += kheaders_data.txz
+
+quiet_cmd_genikh = GEN $(obj)/kheaders_data.txz
+cmd_genikh = $(srctree)/scripts/gen_ikh_data.sh $@ $^ >/dev/null 2>&1
+$(obj)/kheaders_data.txz: $(ikh_file_list) FORCE
+ $(call cmd,genikh)
+
+filechk_ikheadersxz = (echo "static const char kernel_headers_data[] __used = KH_MAGIC_START"; cat $< | scripts/bin2c; echo "KH_MAGIC_END;")
+
+targets += kheaders_data.h
+$(obj)/kheaders_data.h: $(obj)/kheaders_data.txz FORCE
+ $(call filechk,ikheadersxz)
diff --git a/kernel/kheaders.c b/kernel/kheaders.c
new file mode 100644
index 000000000000..c39930f51202
--- /dev/null
+++ b/kernel/kheaders.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kernel/kheaders.c
+ * Provide headers and artifacts needed to build kernel modules.
+ * (Borrowed code from kernel/configs.c)
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/init.h>
+#include <linux/uaccess.h>
+
+/*
+ * Define kernel_headers_data and kernel_headers_data_size, which contains the
+ * compressed kernel headers. The file is first compressed with xz and then
+ * bounded by two eight byte magic numbers to allow extraction from a binary
+ * kernel image:
+ *
+ * IKHD_ST
+ * <image>
+ * IKHD_ED
+ */
+#define KH_MAGIC_START "IKHD_ST"
+#define KH_MAGIC_END "IKHD_ED"
+#include "kheaders_data.h"
+
+
+#define KH_MAGIC_SIZE (sizeof(KH_MAGIC_START) - 1)
+#define kernel_headers_data_size \
+ (sizeof(kernel_headers_data) - 1 - KH_MAGIC_SIZE * 2)
+
+static ssize_t
+ikheaders_read_current(struct file *file, char __user *buf,
+ size_t len, loff_t *offset)
+{
+ return simple_read_from_buffer(buf, len, offset,
+ kernel_headers_data + KH_MAGIC_SIZE,
+ kernel_headers_data_size);
+}
+
+static const struct file_operations ikheaders_file_ops = {
+ .owner = THIS_MODULE,
+ .read = ikheaders_read_current,
+ .llseek = default_llseek,
+};
+
+static int __init ikheaders_init(void)
+{
+ struct proc_dir_entry *entry;
+
+ /* create the current headers file */
+ entry = proc_create("kheaders.txz", S_IFREG | S_IRUGO, NULL,
+ &ikheaders_file_ops);
+ if (!entry)
+ return -ENOMEM;
+
+ proc_set_size(entry, kernel_headers_data_size);
+
+ return 0;
+}
+
+static void __exit ikheaders_cleanup(void)
+{
+ remove_proc_entry("kheaders.txz", NULL);
+}
+
+module_init(ikheaders_init);
+module_exit(ikheaders_cleanup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Joel Fernandes");
+MODULE_DESCRIPTION("Echo the kernel header artifacts used to build the kernel");
diff --git a/scripts/gen_ikh_data.sh b/scripts/gen_ikh_data.sh
new file mode 100755
index 000000000000..609196b5cea2
--- /dev/null
+++ b/scripts/gen_ikh_data.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+spath="$(dirname "$(readlink -f "$0")")"
+
+rm -rf $1.tmp
+mkdir $1.tmp
+
+for f in "${@:2}";
+ do find "$f" ! -name "*.c" ! -name "*.o" ! -name "*.cmd" ! -name ".*";
+done | cpio -pd $1.tmp
+
+for f in $(find $1.tmp); do
+ $spath/strip-comments.pl $f
+done
+
+tar -Jcf $1 -C $1.tmp/ . > /dev/null
+
+rm -rf $1.tmp
diff --git a/scripts/strip-comments.pl b/scripts/strip-comments.pl
new file mode 100755
index 000000000000..f8ada87c5802
--- /dev/null
+++ b/scripts/strip-comments.pl
@@ -0,0 +1,8 @@
+#!/usr/bin/perl -pi
+# SPDX-License-Identifier: GPL-2.0
+
+# This script removes /**/ comments from a file, unless such comments
+# contain "SPDX". It is used when building compressed in-kernel headers.
+
+BEGIN {undef $/;}
+s/\/\*((?!SPDX).)*?\*\///smg;
--
2.20.1.791.gb4d0f1c61a-goog
Add a new test for Media Device Allocator API.
The Media Device Allocator API allows multiple drivers to share a media
device. This API solves a very common use-case for media devices where one
physical device (a USB stick) provides both audio and video. When such a
media device exposes a standard USB Audio class and a proprietary Video
class, two or more independent drivers will share a single physical USB
bridge. In such cases, it is necessary to coordinate access to the shared
resource.
Using this API, drivers can allocate a media device with the shared struct
device as the key. Once the media device is allocated by a driver, other
drivers can get a reference to it. The media device is released when all
the references are released.
This test does a series of unbind/bind tests to make sure the media device
is released correctly when it is no longer in use and when the last
driver releases the reference.
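The script in this patch prints "ls -l /dev/media*" after each step and
leaves the comparison to the reader. As a hedged sketch of how such a
check could be automated, the helper below reports whether any media node
exists; the helper name and its directory argument are illustrative and
are not part of the patch.

```sh
# Sketch only: helper name and directory argument are illustrative.
# Return 0 if at least one media node (media0, media1, ...) exists in
# directory $1 (default: /dev), 1 otherwise.
media_node_present()
{
	for node in "${1:-/dev}"/media*; do
		# An unmatched glob stays literal and fails the -e test.
		[ -e "$node" ] && return 0
	done
	return 1
}
```

After both drivers are unbound, a test could then report a failure with:
media_node_present && echo "FAIL: media device still present"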
Signed-off-by: Shuah Khan <shuah(a)kernel.org>
---
.../media_tests/media_dev_allocator.sh | 85 +++++++++++++++++++
1 file changed, 85 insertions(+)
create mode 100755 tools/testing/selftests/media_tests/media_dev_allocator.sh
diff --git a/tools/testing/selftests/media_tests/media_dev_allocator.sh b/tools/testing/selftests/media_tests/media_dev_allocator.sh
new file mode 100755
index 000000000000..ffe00c59a483
--- /dev/null
+++ b/tools/testing/selftests/media_tests/media_dev_allocator.sh
@@ -0,0 +1,85 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Media Device Allocator API test script
+# Copyright (c) 2019 Shuah Khan <shuah(a)kernel.org>
+
+echo "Media Device Allocator testing: unbind and bind"
+echo "media driver $1 audio driver $2"
+
+MDRIVER=/sys/bus/usb/drivers/$1
+cd $MDRIVER
+MDEV=$(ls -d *\-*)
+
+ADRIVER=/sys/bus/usb/drivers/$2
+cd $ADRIVER
+ADEV=$(ls -d *\-*.1)
+
+echo "=================================="
+echo "Test unbind both devices - start"
+echo "Running unbind of $MDEV from $MDRIVER"
+echo $MDEV > $MDRIVER/unbind;
+
+echo "Media device should still be present!"
+ls -l /dev/media*
+
+echo "sound driver is at: $ADRIVER"
+echo "Device is: $ADEV"
+
+echo "Running unbind of $ADEV from $ADRIVER"
+echo $ADEV > $ADRIVER/unbind;
+
+echo "Media device should have been deleted!"
+ls -l /dev/media*
+echo "Test unbind both devices - end"
+
+echo "=================================="
+
+echo "Test bind both devices - start"
+echo "Running bind of $MDEV from $MDRIVER"
+echo $MDEV > $MDRIVER/bind;
+
+echo "Media device should be present!"
+ls -l /dev/media*
+
+echo "Running bind of $ADEV from $ADRIVER"
+echo $ADEV > $ADRIVER/bind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+echo "Test bind both devices - end"
+
+echo "=================================="
+
+echo "Test unbind $MDEV - bind $MDEV - unbind $ADEV - bind $ADEV start"
+
+echo "Running unbind of $MDEV from $MDRIVER"
+echo $MDEV > $MDRIVER/unbind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+sleep 1
+
+echo "Running bind of $MDEV from $MDRIVER"
+echo $MDEV > $MDRIVER/bind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+echo "Running unbind of $ADEV from $ADRIVER"
+echo $ADEV > $ADRIVER/unbind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+sleep 1
+
+echo "Running bind of $ADEV from $ADRIVER"
+echo $ADEV > $ADRIVER/bind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+echo "Test unbind $MDEV - bind $MDEV - unbind $ADEV - bind $ADEV end"
+echo "=================================="
--
2.17.1
Hello,
I think the script nft_nat.sh assumes devices eth0 and eth1, which may
not always be the case. My suggestion: why not pass the needed network
devices as arguments to the script? I am showing the related command line
sessions below, and a file with the related errors is attached.
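A minimal sketch of that suggestion, assuming two positional arguments
with eth0 and eth1 as fallbacks. nft_nat.sh does not currently take such
arguments, and the function and variable names here are hypothetical.

```sh
# Sketch only: nft_nat.sh takes no such arguments; names are hypothetical.
# Set dev0 and dev1 from the first two arguments, falling back to the
# names the report says the script assumes.
parse_devices()
{
	dev0=${1:-eth0}
	dev1=${2:-eth1}
}
```

The script would call parse_devices "$@" once at the top and use "$dev0"
and "$dev1" wherever device names are needed.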
---------------------------x-------------x----------------------------
$ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast
state DOWN mode DEFAULT group default qlen 1000
link/ether 70:5a:0f:b9:d8:5c brd ff:ff:ff:ff:ff:ff
3: wlp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state
UP mode DORMANT group default qlen 1000
link/ether 68:14:01:07:36:1f brd ff:ff:ff:ff:ff:ff
$
------------------------x-----------x---------------------------------------
$sudo ./nft_nat.sh 2> error-related.txt
ERROR: ping failed
SKIP: Could not add add ip6 dnat hook
ERROR: canot ping ns1 from ns2
ERROR: cannot ping ns1 from ns2 with active ip masquerading
ERROR: cannot ping ns1 from ns2 via ipv6
ERROR: cannot ping ns1 from ns2
ERROR: cannot ping ns1 from ns2 with active ip redirect
ERROR: cannnot ping ns1 from ns2 via ipv6
ERROR: cannot ping ns1 from ns2 with active ip6 redirect
-------------------------x---------------------------x------------------------------------
The attached file shows the contents of error-related.txt.
/Jeffrin
--
software engineer
rajagiri school of engineering and technology
Extend bpf_skb_adjust_room growth to mark inner MAC header so that
L2 encapsulation can be used for tc tunnels.
Patch #1 extends the existing test_tc_tunnel to support UDP
encapsulation; later we want to be able to test MPLS over UDP and
MPLS over GRE encapsulation.
Patch #2 adds the BPF_F_ADJ_ROOM_ENCAP_L2(len) macro, which
allows specification of inner mac length. Other approaches were
explored prior to taking this approach. Specifically, I tried
automatically computing the inner mac length on the basis of the
specified flags (so inner maclen for GRE/IPv4 encap is the len_diff
specified to bpf_skb_adjust_room minus GRE + IPv4 header length
for example). The problem with this is that we don't know for sure
what form of GRE/UDP header we have; is it a full GRE header,
or is it a FOU UDP header or generic UDP encap header? My fear
here was we'd end up with an explosion of flags. The other approach
tried was to support inner L2 header marking as a separate room
adjustment, i.e. adjust for L3/L4 encap, then call
bpf_skb_adjust_room for L2 encap. This can be made to work but,
because it imposes an order on operations, it felt a bit clunky.
Patch #3 syncs tools/ bpf.h.
Patch #4 extends the tests again to support MPLSoverGRE and
MPLSoverUDP encap, along with existing test coverage.
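To make the rejected "compute it from the flags" idea concrete, the
arithmetic for the GRE/IPv4 case can be sketched as below. The header
sizes are assumptions (IPv4 without options, GRE without optional
fields), which is exactly the uncertainty described for Patch #2; the
function name is illustrative.

```sh
# Sketch only: sizes assume IPv4 without options and a base GRE header
# without optional fields; the function name is illustrative.
IPV4_HLEN=20
GRE_HLEN=4

# Print the inner L2 length implied by a room adjustment of $1 bytes
# (len_diff) for GRE-over-IPv4 encapsulation.
inner_maclen_gre4()
{
	echo $(($1 - IPV4_HLEN - GRE_HLEN))
}
```

With these assumptions a len_diff of 28 leaves 4 bytes (one MPLS label)
and a len_diff of 38 leaves 14 bytes (an Ethernet header). If the GRE
header carried optional fields the result would be wrong, which is why
passing the length explicitly is more robust.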
Alan Maguire (4):
selftests_bpf: extend test_tc_tunnel for UDP encap
bpf: add layer 2 encap support to bpf_skb_adjust_room
bpf: sync bpf.h to tools/ for BPF_F_ADJ_ROOM_ENCAP_L2
selftests_bpf: extend test_tc_tunnel.sh test for L2 encap
include/uapi/linux/bpf.h | 5 +
net/core/filter.c | 19 +-
tools/include/uapi/linux/bpf.h | 5 +
tools/testing/selftests/bpf/progs/test_tc_tunnel.c | 281 ++++++++++++++++-----
tools/testing/selftests/bpf/test_tc_tunnel.sh | 105 +++++---
5 files changed, 318 insertions(+), 97 deletions(-)
--
1.8.3.1
Add a new test for Media Device Allocator API.
The Media Device Allocator API allows multiple drivers to share a media
device. This API solves a very common use-case for media devices where one
physical device (a USB stick) provides both audio and video. When such a
media device exposes a standard USB Audio class and a proprietary Video
class, two or more independent drivers will share a single physical USB
bridge. In such cases, it is necessary to coordinate access to the shared
resource.
Using this API, drivers can allocate a media device with the shared struct
device as the key. Once the media device is allocated by a driver, other
drivers can get a reference to it. The media device is released when all
the references are released.
This test does a series of unbind/bind tests to make sure the media device
is released correctly when it is no longer in use and when the last
driver releases the reference.
Signed-off-by: Shuah Khan <shuah(a)kernel.org>
---
.../media_tests/media_dev_allocator.sh | 81 +++++++++++++++++++
1 file changed, 81 insertions(+)
create mode 100755 tools/testing/selftests/media_tests/media_dev_allocator.sh
diff --git a/tools/testing/selftests/media_tests/media_dev_allocator.sh b/tools/testing/selftests/media_tests/media_dev_allocator.sh
new file mode 100755
index 000000000000..d58e39c1b66c
--- /dev/null
+++ b/tools/testing/selftests/media_tests/media_dev_allocator.sh
@@ -0,0 +1,81 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Media Device Allocator API test script
+# Copyright (c) 2018 Shuah Khan <shuah(a)kernel.org>
+
+echo "Media Device Allocator testing: unbind and bind"
+echo "media driver $1 audio driver $2"
+
+MDRIVER=/sys/bus/usb/drivers/$1
+cd $MDRIVER
+MDEV=$(ls -d *\-*)
+
+ADRIVER=/sys/bus/usb/drivers/$2
+cd $ADRIVER
+ADEV=$(ls -d *\-*.1)
+
+echo "=================================="
+echo "Test unbind both devices - start"
+echo "Running unbind of $MDEV from $MDRIVER"
+echo $MDEV > $MDRIVER/unbind;
+
+echo "Media device should still be present!"
+ls -l /dev/media*
+
+echo "sound driver is at: $ADRIVER"
+echo "Device is: $ADEV"
+
+echo "Running unbind of $ADEV from $ADRIVER"
+echo $ADEV > $ADRIVER/unbind;
+
+echo "Media device should have been deleted!"
+ls -l /dev/media*
+echo "Test unbind both devices - end"
+
+echo "=================================="
+
+echo "Test bind both devices - start"
+echo "Running bind of $MDEV from $MDRIVER"
+echo $MDEV > $MDRIVER/bind;
+
+echo "Media device should be present!"
+ls -l /dev/media*
+
+echo "Running bind of $ADEV from $ADRIVER"
+echo $ADEV > $ADRIVER/bind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+echo "Test bind both devices - end"
+
+echo "=================================="
+
+echo "Test unbind $MDEV - bind $MDEV - unbind $ADEV - bind $ADEV start"
+
+echo "Running unbind of $MDEV from $MDRIVER"
+echo $MDEV > $MDRIVER/unbind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+echo "Running bind of $MDEV from $MDRIVER"
+echo $MDEV > $MDRIVER/bind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+echo "Running unbind of $ADEV from $ADRIVER"
+echo $ADEV > $ADRIVER/unbind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+echo "Running bind of $ADEV from $ADRIVER"
+echo $ADEV > $ADRIVER/bind;
+
+echo "Media device should be there!"
+ls -l /dev/media*
+
+echo "Test unbind $MDEV - bind $MDEV - unbind $ADEV - bind $ADEV end"
+echo "=================================="
--
2.17.1
In rcu_rrupt_from_idle, we want to check if it is called from within an
interrupt, but want to do such checking only for debug builds. lockdep
already tracks when we enter an interrupt. Let us expose it as an
assertion macro so it can be used to assert this.
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Cc: kernel-team(a)android.com
Cc: rcu(a)vger.kernel.org
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
include/linux/lockdep.h | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index c5335df2372f..d24f564823d3 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -601,11 +601,18 @@ do { \
"IRQs not disabled as expected\n"); \
} while (0)
+#define lockdep_assert_in_irq() do { \
+ WARN_ONCE(debug_locks && !current->lockdep_recursion && \
+ !current->hardirq_context, \
+ "Not in hardirq as expected\n"); \
+ } while (0)
+
#else
# define might_lock(lock) do { } while (0)
# define might_lock_read(lock) do { } while (0)
# define lockdep_assert_irqs_enabled() do { } while (0)
# define lockdep_assert_irqs_disabled() do { } while (0)
+# define lockdep_assert_in_irq() do { } while (0)
#endif
#ifdef CONFIG_LOCKDEP
--
2.21.0.392.gf8f6787159e-goog
From: Tycho Andersen <tycho(a)tycho.ws>
[ Upstream commit 3aa415dd2128e478ea3225b59308766de0e94d6b ]
The get_metadata() test requires real root, so let's skip it if we're not
real root.
Note that I used XFAIL here because that's what the test does later if
CONFIG_CHECKPOINT_RESTORE happens to not be enabled. After looking at the
code, there doesn't seem to be a nice way to skip tests defined as TEST(),
since there's no return code (I tried exit(KSFT_SKIP), but that didn't work
either...). So let's do it this way to be consistent, and easier to fix
when someone comes along and fixes it.
Signed-off-by: Tycho Andersen <tycho(a)tycho.ws>
Acked-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 83057fa9d391..14cad657bc6a 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -2920,6 +2920,12 @@ TEST(get_metadata)
struct seccomp_metadata md;
long ret;
+ /* Only real root can get metadata. */
+ if (geteuid()) {
+ XFAIL(return, "get_metadata requires real root");
+ return;
+ }
+
ASSERT_EQ(0, pipe(pipefd));
pid = fork();
--
2.19.1
From: Tycho Andersen <tycho(a)tycho.ws>
[ Upstream commit 3aa415dd2128e478ea3225b59308766de0e94d6b ]
The get_metadata() test requires real root, so let's skip it if we're not
real root.
Note that I used XFAIL here because that's what the test does later if
CONFIG_CHECKPOINT_RESTORE happens to not be enabled. After looking at the
code, there doesn't seem to be a nice way to skip tests defined as TEST(),
since there's no return code (I tried exit(KSFT_SKIP), but that didn't work
either...). So let's do it this way to be consistent, and easier to fix
when someone comes along and fixes it.
Signed-off-by: Tycho Andersen <tycho(a)tycho.ws>
Acked-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 7e632b465ab4..6d7a81306f8a 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -2971,6 +2971,12 @@ TEST(get_metadata)
struct seccomp_metadata md;
long ret;
+ /* Only real root can get metadata. */
+ if (geteuid()) {
+ XFAIL(return, "get_metadata requires real root");
+ return;
+ }
+
ASSERT_EQ(0, pipe(pipefd));
pid = fork();
--
2.19.1
The rcutorture jitter.sh script selects a random CPU but does not check
whether it is online. This frequently leads to taskset errors. On
my machine, hyper-threading is disabled, so half the cores are offline,
which causes taskset errors much of the time. Let us fix this by
selecting only from the online CPUs on the system.
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
tools/testing/selftests/rcutorture/bin/jitter.sh | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rcutorture/bin/jitter.sh b/tools/testing/selftests/rcutorture/bin/jitter.sh
index 3633828375e3..53bf9d99b5cd 100755
--- a/tools/testing/selftests/rcutorture/bin/jitter.sh
+++ b/tools/testing/selftests/rcutorture/bin/jitter.sh
@@ -47,10 +47,19 @@ do
exit 0;
fi
- # Set affinity to randomly selected CPU
+ # Set affinity to randomly selected online CPU
cpus=`ls /sys/devices/system/cpu/*/online |
sed -e 's,/[^/]*$,,' -e 's/^[^0-9]*//' |
grep -v '^0*$'`
+
+ for c in $cpus; do
+ if [ "$(cat /sys/devices/system/cpu/cpu$c/online)" == "1" ];
+ then
+ cpus_tmp="$cpus_tmp $c"
+ fi
+ done
+ cpus=$cpus_tmp
+
cpumask=`awk -v cpus="$cpus" -v me=$me -v n=$n 'BEGIN {
srand(n + me + systime());
ncpus = split(cpus, ca);
--
2.21.0.392.gf8f6787159e-goog
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: From invocation to completion KUnit
can run several dozen tests in under a second. Currently, the entire
KUnit test suite for KUnit runs in under a second from the initial
invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitude faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily, solving the classic problem
of difficulty in exercising error handling code.
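As a rough illustration of the point about error handling, here is a
sketch in plain userspace C. It does not use the KUnit API; the unit
under test (read_with_retry), the fake dependency (fake_read) and the
error value -5 are all made up for the example. The fake replaces the
external dependency, so the failure path can be triggered on demand:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical dependency: returns 0 on success, negative on error. */
typedef int (*read_fn)(void *ctx, int *out);

/* Unit under test: retry a read up to max_tries times. */
static int read_with_retry(read_fn do_read, void *ctx, int max_tries, int *out)
{
	int ret = -1;
	int i;

	for (i = 0; i < max_tries; i++) {
		ret = do_read(ctx, out);
		if (ret == 0)
			return 0;
	}
	return ret;
}

/* Fake dependency: fails a configurable number of times, then succeeds. */
struct fake_reader {
	int failures_left;
	int calls;
};

static int fake_read(void *ctx, int *out)
{
	struct fake_reader *fake = ctx;

	fake->calls++;
	if (fake->failures_left > 0) {
		fake->failures_left--;
		return -5;	/* stands in for an I/O error */
	}
	*out = 42;
	return 0;
}

/* Two failures followed by success, within the retry budget. */
static void test_retry_recovers(void)
{
	struct fake_reader fake = { .failures_left = 2, .calls = 0 };
	int value = 0;

	assert(read_with_retry(fake_read, &fake, 3, &value) == 0);
	assert(value == 42);
	assert(fake.calls == 3);
}

/* The error path, which is hard to reach in an end-to-end test. */
static void test_retry_gives_up(void)
{
	struct fake_reader fake = { .failures_left = 5, .calls = 0 };
	int value = 0;

	assert(read_with_retry(fake_read, &fake, 3, &value) == -5);
	assert(value == 0);
	assert(fake.calls == 3);
}
```

Both test functions run without any hardware, filesystem or kernel
state, which is what keeps them fast and deterministic.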
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
Additionally for convenience, I have applied these patches to a branch:
https://kunit.googlesource.com/linux/+/kunit/rfc/5.0-rc5/v4
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/5.0-rc5/v4 branch.
## Changes Since Last Version
- Got KUnit working on (hypothetically) all architectures (tested on
x86), as per Rob's (and others') request
- Punting all KUnit features/patches depending on UML for now.
- Broke out UML specific support into arch/um/* as per "[RFC v3 01/19]
kunit: test: add KUnit test runner core", as requested by Luis.
- Added support to kunit_tool to allow it to build kernels in external
directories, as suggested by Kieran.
- Added a UML defconfig, and a config fragment for KUnit as suggested
by Kieran and Luis.
- Cleaned up, and reformatted a bunch of stuff.
--
2.21.0.rc0.258.g878e2cd30e-goog
This is version 3 of the MSI interrupts for ntb_transport patchset.
I've addressed the feedback so far and rebased on the latest kernel
and would like this to be considered for merging this cycle.
The only outstanding issue I know of is that it still will not work
with IDT hardware, but ntb_transport doesn't work with IDT hardware
and there is still no sensible common infrastructure to support
ntb_peer_mw_set_trans(). Thus, I decline to consider that complication
in this patchset. However, I'll be happy to review work that adds this
feature in the future.
Also, as the port number and resource index stuff is a bit complicated,
I made a quick out-of-tree test fixture to ensure it's correct[1]. As
an exercise I also wrote some test code[2] using the upcoming KUnit
feature.
Logan
[1] https://repl.it/repls/ExcitingPresentFile
[2] https://github.com/sbates130272/linux-p2pmem/commits/ntb_kunit
--
Changes in v3:
* Rebased onto v5.1-rc1 (Dropped the first two patches as they have
been merged, and cleaned up some minor conflicts in the PCI tree)
* Added a new patch (#3) to calculate logical port numbers that
are port numbers from 0 to (number of ports - 1). This is
then used in ntb_peer_resource_idx() to fix the issues brought
up by Serge.
* Fixed missing __iomem and iowrite calls (as noticed by Serge)
* Added patch 10 which describes ntb_msi_test in the documentation
file (as requested by Serge)
* A couple other minor nits and documentation fixes
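To make the logical port numbering idea concrete, here is a small
standalone sketch. It is not the code from patch #3: the helper name
logical_port_number() and its array-based interface are assumptions
made for illustration only. It maps a physical port number onto the
range 0 to (number of ports - 1) by counting the peers that have a
smaller physical port number:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the code from the series: the logical number
 * of the local port is the count of peer ports with a smaller physical
 * port number, so the result always lies in 0..peer_count.
 */
static int logical_port_number(int local_port, const int *peer_ports,
			       size_t peer_count)
{
	int logical = 0;
	size_t i;

	for (i = 0; i < peer_count; i++)
		if (peer_ports[i] < local_port)
			logical++;

	return logical;
}

static void example_three_port_switch(void)
{
	/* Physical ports 0, 4 and 6; the local side is port 4. */
	const int peers[] = { 0, 6 };

	assert(logical_port_number(4, peers, 2) == 1);
}
```

With sparse physical numbers such as 0, 4 and 6, the three sides get
the logical numbers 0, 1 and 2, which can then be used as dense
indices.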
--
Changes in v2:
* Cleaned up the changes in intel_irq_remapping.c to make them
less confusing and add a comment. (Per discussion with Jacob and
Joerg)
* Fixed a nit from Bjorn and collected his Ack
* Added a Kconfig dependency on CONFIG_PCI_MSI for CONFIG_NTB_MSI
as the Kbuild robot hit a random config that didn't build
without it.
* Worked in a callback for when the MSI descriptor changes so that
the clients can resend the new address and data values to the peer.
On my test system this was never necessary, but there may be
other platforms where this can occur. I tested this by hacking
in a path to rewrite the MSI descriptor when I change the cpu
affinity of an IRQ. There's a bit of uncertainty over the latency
of the change, but without hardware on which this can actually occur
we can't test this. This was the result of a discussion with Dave.
--
This patch series adds optional support for using MSI interrupts instead
of NTB doorbells in ntb_transport. This is desirable seeing doorbells on
current hardware are quite slow and therefore switching to MSI interrupts
provides a significant performance gain. On switchtec hardware, a simple
apples-to-apples comparison shows ntb_netdev/iperf numbers going from
3.88Gb/s to 14.1Gb/s when switching to MSI interrupts.
To do this, a couple changes are required outside of the NTB tree:
1) The IOMMU must know to accept MSI requests from aliased bus numbers
seeing NTB hardware typically sends proxied request IDs through
additional requester IDs. The first patch in this series adds support
for the Intel IOMMU. A quirk to add these aliases for switchtec hardware
was already accepted. See commit ad281ecf1c7d ("PCI: Add DMA alias quirk
for Microsemi Switchtec NTB") for a description of NTB proxy IDs and why
this is necessary.
2) NTB transport (and other clients) may often need more MSI interrupts
than the NTB hardware actually advertises support for. However, seeing
these interrupts will not be triggered by the hardware but through an
NTB memory window, the hardware does not actually need support or need
to know about them. Therefore we add the concept of Virtual MSI
interrupts which are allocated just like any other MSI interrupt but
are not programmed into the hardware's MSI table. This is done in
Patch 2 and then made use of in Patch 3.
The remaining patches in this series add a library for dealing with MSI
interrupts, a test client and finally support in ntb_transport.
The series is based off of v5.1-rc1 plus the patches in ntb-next.
A git repo is available here:
https://github.com/sbates130272/linux-p2pmem/ ntb_transport_msi_v3
Thanks,
Logan
--
Logan Gunthorpe (10):
PCI/MSI: Support allocating virtual MSI interrupts
PCI/switchtec: Add module parameter to request more interrupts
NTB: Introduce helper functions to calculate logical port number
NTB: Introduce functions to calculate multi-port resource index
NTB: Rename ntb.c to support multiple source files in the module
NTB: Introduce MSI library
NTB: Introduce NTB MSI Test Client
NTB: Add ntb_msi_test support to ntb_test
NTB: Add MSI interrupt support to ntb_transport
NTB: Describe the ntb_msi_test client in the documentation.
Documentation/ntb.txt | 27 ++
drivers/ntb/Kconfig | 11 +
drivers/ntb/Makefile | 3 +
drivers/ntb/{ntb.c => core.c} | 0
drivers/ntb/msi.c | 415 +++++++++++++++++++++++
drivers/ntb/ntb_transport.c | 169 ++++++++-
drivers/ntb/test/Kconfig | 9 +
drivers/ntb/test/Makefile | 1 +
drivers/ntb/test/ntb_msi_test.c | 433 ++++++++++++++++++++++++
drivers/pci/msi.c | 54 ++-
drivers/pci/switch/switchtec.c | 12 +-
include/linux/msi.h | 8 +
include/linux/ntb.h | 196 ++++++++++-
include/linux/pci.h | 9 +
tools/testing/selftests/ntb/ntb_test.sh | 54 ++-
15 files changed, 1386 insertions(+), 15 deletions(-)
rename drivers/ntb/{ntb.c => core.c} (100%)
create mode 100644 drivers/ntb/msi.c
create mode 100644 drivers/ntb/test/ntb_msi_test.c
--
2.20.1
After the first run, the test case 'test_create_read' will always
fail because the file already exists and has the 'S_IMMUTABLE'
attribute set, so opening it with 'O_RDWR' always returns -EPERM.
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5(a)huawei.com>
---
tools/testing/selftests/efivarfs/efivarfs.sh | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/efivarfs/efivarfs.sh b/tools/testing/selftests/efivarfs/efivarfs.sh
index a47029a..d386610 100755
--- a/tools/testing/selftests/efivarfs/efivarfs.sh
+++ b/tools/testing/selftests/efivarfs/efivarfs.sh
@@ -77,6 +77,10 @@ test_create_empty()
test_create_read()
{
local file=$efivarfs_mount/$FUNCNAME-$test_guid
+ if [ -f $file ]; then
+ chattr -i $file
+ rm -rf $file
+ fi
./create-read $file
}
--
2.7.4
test_tc_tunnel.sh sets up a pair of namespaces connected by a
veth pair to verify encap/decap using bpf_skb_adjust_room. In
testing this, it uses tunnel links as the peer of the bpf-based
encap/decap. However because the same IP header is used for inner
and outer IP, when packets arrive at the tunnel interface they will
be dropped by reverse path filtering as those packets are expected
on the veth interface (where the destination IP of the decapped
packet is configured).
To avoid this, ensure reverse path filtering is disabled for the
namespace using tunneling.
Fixes: 98cdabcd0798 ("selftests/bpf: bpf tunnel encap test")
Signed-off-by: Alan Maguire <alan.maguire(a)oracle.com>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index dcf3206..c805adb 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -160,6 +160,14 @@ server_listen
# client can connect again
ip netns exec "${ns2}" ip link add dev testtun0 type "${tuntype}" \
remote "${addr1}" local "${addr2}"
+# Because packets are decapped by the tunnel they arrive on testtun0 from
+# the IP stack perspective. Ensure reverse path filtering is disabled
+# otherwise we drop the TCP SYN as arriving on testtun0 instead of the
+# expected veth2 (veth2 is where 192.168.1.2 is configured).
+ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.all.rp_filter=0
+# rp needs to be disabled for both all and testtun0 as the rp value is
+# selected as the max of the "all" and device-specific values.
+ip netns exec "${ns2}" sysctl -qw net.ipv4.conf.testtun0.rp_filter=0
ip netns exec "${ns2}" ip link set dev testtun0 up
echo "test bpf encap with tunnel device decap"
client_connect
--
1.8.3.1
PTRACE_GET_SYSCALL_INFO is a generic ptrace API that lets ptracer obtain
details of the syscall the tracee is blocked in.
There are two reasons for a special syscall-related ptrace request.
Firstly, with the current ptrace API there are cases when ptracer cannot
retrieve necessary information about syscalls. Some examples include:
* The notorious int-0x80-from-64-bit-task issue. See [1] for details.
In short, if a 64-bit task performs a syscall through int 0x80, its tracer
has no reliable means to find out that the syscall was, in fact,
a compat syscall, and misidentifies it.
* Syscall-enter-stop and syscall-exit-stop look the same for the tracer.
Common practice is to keep track of the sequence of ptrace-stops in order
not to mix the two syscall-stops up. But it is not as simple as it looks;
for example, strace had a (just recently fixed) long-standing bug where
attaching strace to a tracee that is performing the execve system call
led to the tracer identifying the following syscall-exit-stop as
syscall-enter-stop, which messed up all the state tracking.
* Since the introduction of commit 84d77d3f06e7e8dea057d10e8ec77ad71f721be3
("ptrace: Don't allow accessing an undumpable mm"), both PTRACE_PEEKDATA
and process_vm_readv become unavailable when the process dumpable flag
is cleared. On such architectures as ia64 this results in all syscall
arguments being unavailable for the tracer.
Secondly, ptracers also have to support a lot of arch-specific code for
obtaining information about the tracee. For some architectures, this
requires a ptrace(PTRACE_PEEKUSER, ...) invocation for every syscall
argument and return value.
PTRACE_GET_SYSCALL_INFO returns the following structure:
struct ptrace_syscall_info {
__u8 op; /* PTRACE_SYSCALL_INFO_* */
__u32 arch __attribute__((__aligned__(sizeof(__u32))));
__u64 instruction_pointer;
__u64 stack_pointer;
union {
struct {
__u64 nr;
__u64 args[6];
} entry;
struct {
__s64 rval;
__u8 is_error;
} exit;
struct {
__u64 nr;
__u64 args[6];
__u32 ret_data;
} seccomp;
};
};
The structure was chosen according to [2], except for the following
changes:
* seccomp substructure was added as a superset of entry substructure;
* the type of nr field was changed from int to __u64 because syscall
numbers are, as a practical matter, 64 bits;
* stack_pointer field was added along with instruction_pointer field
since it is readily available and can save the tracer from extra
PTRACE_GETREGS/PTRACE_GETREGSET calls;
* arch is always initialized to aid with tracing system calls
such as execve();
* instruction_pointer and stack_pointer are always initialized
so they could be easily obtained for non-syscall stops;
* a boolean is_error field was added along with rval field, this way
the tracer can more reliably distinguish a return value
from an error value.
strace has been ported to PTRACE_GET_SYSCALL_INFO.
Starting with release 4.26, strace uses PTRACE_GET_SYSCALL_INFO API
as the preferred mechanism of obtaining syscall information.
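As a sketch of how a tracer might consume the structure, the following
standalone C mirrors the layout quoted above and turns one record into
a short description. It does not call ptrace(); the struct name
syscall_info, the INFO_* constants and their numeric values, and the
describe_stop() helper are stand-ins assumed for the example rather
than the uapi definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the PTRACE_SYSCALL_INFO_* values; the numbers are assumed. */
enum info_op { INFO_NONE = 0, INFO_ENTRY = 1, INFO_EXIT = 2, INFO_SECCOMP = 3 };

/* Userspace mirror of the layout quoted above, using <stdint.h> types. */
struct syscall_info {
	uint8_t op;
	uint32_t arch __attribute__((__aligned__(sizeof(uint32_t))));
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct { uint64_t nr; uint64_t args[6]; uint32_t ret_data; } seccomp;
	};
};

/* Format one stop; op tells entry and exit apart without state tracking. */
static int describe_stop(const struct syscall_info *info, char *buf, size_t len)
{
	switch (info->op) {
	case INFO_ENTRY:
		return snprintf(buf, len, "enter nr=%llu",
				(unsigned long long)info->entry.nr);
	case INFO_EXIT:
		/* is_error says whether rval is an error or a plain value. */
		return snprintf(buf, len, "exit %s=%lld",
				info->exit.is_error ? "error" : "rval",
				(long long)info->exit.rval);
	case INFO_SECCOMP:
		return snprintf(buf, len, "seccomp nr=%llu ret_data=%u",
				(unsigned long long)info->seccomp.nr,
				(unsigned int)info->seccomp.ret_data);
	default:
		return snprintf(buf, len, "not a syscall stop");
	}
}

static void example_failed_syscall(void)
{
	struct syscall_info info;
	char buf[64];

	memset(&info, 0, sizeof(info));
	info.op = INFO_EXIT;
	info.exit.rval = -2;	/* for example -ENOENT */
	info.exit.is_error = 1;
	describe_stop(&info, buf, sizeof(buf));
	assert(strcmp(buf, "exit error=-2") == 0);
}
```

The tracer needs neither a history of earlier stops nor any
arch-specific register decoding to tell the cases apart, which is the
motivation given above.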
[1] https://lore.kernel.org/lkml/CA+55aFzcSVmdDj9Lh_gdbz1OzHyEm6ZrGPBDAJnywm2LF…
[2] https://lore.kernel.org/lkml/CAObL_7GM0n80N7J_DFw_eQyfLyzq+sf4y2AvsCCV88Tb3…
---
Notes:
v8:
* Moved syscall_get_arch() specific patches to a separate patchset
which is now merged into audit/next tree.
* Rebased to linux-next.
* Moved ptrace_get_syscall_info code under #ifdef CONFIG_HAVE_ARCH_TRACEHOOK,
narrowing down the set of architectures supported by this implementation
back to those 19 that enable CONFIG_HAVE_ARCH_TRACEHOOK because
I failed to get all syscall_get_*(), instruction_pointer(),
and user_stack_pointer() functions implemented on some niche
architectures. This leaves the following architectures out:
alpha, h8300, m68k, microblaze, and unicore32.
v7:
* Rebased to v5.0-rc1.
* 5 arch-specific preparatory patches out of 25 have been merged
into v5.0-rc1 via arch trees.
v6:
* Add syscall_get_arguments and syscall_set_arguments wrappers
to asm-generic/syscall.h, requested by Geert.
* Change PTRACE_GET_SYSCALL_INFO return code: do not take trailing paddings
into account, use the end of the last field of the structure being written.
* Change struct ptrace_syscall_info:
* remove .frame_pointer field, it is not needed and not portable;
* make .arch field explicitly aligned, remove no longer needed
padding before .arch field;
* remove trailing pads, they are no longer needed.
v5:
* Merge separate series and patches into the single series.
* Change PTRACE_EVENTMSG_SYSCALL_{ENTRY,EXIT} values as requested by Oleg.
* Change struct ptrace_syscall_info: generalize instruction_pointer,
stack_pointer, and frame_pointer fields by moving them from
ptrace_syscall_info.{entry,seccomp} substructures to ptrace_syscall_info
and initializing them for all stops.
* Add PTRACE_SYSCALL_INFO_NONE, set it when not in a syscall stop,
so e.g. "strace -i" could use PTRACE_SYSCALL_INFO_SECCOMP to obtain
instruction_pointer when the tracee is in a signal stop.
* Patch all remaining architectures to provide all necessary
syscall_get_* functions.
* Make available for all architectures: do not conditionalize on
CONFIG_HAVE_ARCH_TRACEHOOK since all syscall_get_* functions
are implemented on all architectures.
* Add a test for PTRACE_GET_SYSCALL_INFO to selftests/ptrace.
v4:
* Do not introduce task_struct.ptrace_event,
use child->last_siginfo->si_code instead.
* Implement PTRACE_SYSCALL_INFO_SECCOMP and ptrace_syscall_info.seccomp
support along with PTRACE_SYSCALL_INFO_{ENTRY,EXIT} and
ptrace_syscall_info.{entry,exit}.
v3:
* Change struct ptrace_syscall_info.
* Support PTRACE_EVENT_SECCOMP by adding ptrace_event to task_struct.
* Add proper defines for ptrace_syscall_info.op values.
* Rename PT_SYSCALL_IS_ENTERING and PT_SYSCALL_IS_EXITING to
PTRACE_EVENTMSG_SYSCALL_ENTRY and PTRACE_EVENTMSG_SYSCALL_EXIT
and move them to uapi.
v2:
* Do not use task->ptrace.
* Replace entry_info.is_compat with entry_info.arch, use syscall_get_arch().
* Use addr argument of sys_ptrace to get expected size of the struct;
return full size of the struct.
Dmitry V. Levin (6):
nds32: fix asm/syscall.h
hexagon: define syscall_get_error() and syscall_get_return_value()
mips: define syscall_get_error()
parisc: define syscall_get_error()
powerpc: define syscall_get_error()
selftests/ptrace: add a test case for PTRACE_GET_SYSCALL_INFO
Elvira Khabirova (1):
ptrace: add PTRACE_GET_SYSCALL_INFO request
arch/hexagon/include/asm/syscall.h | 14 +
arch/mips/include/asm/syscall.h | 6 +
arch/nds32/include/asm/syscall.h | 29 +-
arch/parisc/include/asm/syscall.h | 7 +
arch/powerpc/include/asm/syscall.h | 10 +
include/linux/tracehook.h | 9 +-
include/uapi/linux/ptrace.h | 35 +++
kernel/ptrace.c | 103 ++++++-
tools/testing/selftests/ptrace/.gitignore | 1 +
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/get_syscall_info.c | 271 ++++++++++++++++++
11 files changed, 471 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/get_syscall_info.c
--
ldv
Not all compilers have __builtin_bswap16() and __builtin_bswap32(),
thus not all compilers are able to compile the following code:
(__builtin_constant_p(x) ? \
___constant_swab16(x) : __builtin_bswap16(x))
That's the reason why bpf_ntohl() doesn't work on GCC < 4.8, for
instance:
error: implicit declaration of function '__builtin_bswap16'
We can use __builtin_bswap16() only if compiler has this built-in,
that is, only if __HAVE_BUILTIN_BSWAP16__ is defined. Standard UAPI
__swab16()/__swab32() take care of that, and, additionally, handle
__builtin_constant_p() cases as well:
#ifdef __HAVE_BUILTIN_BSWAP16__
#define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
#else
#define __swab16(x) \
(__builtin_constant_p((__u16)(x)) ? \
___constant_swab16(x) : \
__fswab16(x))
#endif
So we can tweak selftests/bpf/bpf_endian.h and use UAPI
__swab16()/__swab32().
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
---
v2: fixed build error, reshuffled patches (Stanislav Fomichev)
tools/testing/selftests/bpf/bpf_endian.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/bpf/bpf_endian.h b/tools/testing/selftests/bpf/bpf_endian.h
index b25595ea4a78..1ed268b2002b 100644
--- a/tools/testing/selftests/bpf/bpf_endian.h
+++ b/tools/testing/selftests/bpf/bpf_endian.h
@@ -20,12 +20,12 @@
* use different targets.
*/
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-# define __bpf_ntohs(x) __builtin_bswap16(x)
-# define __bpf_htons(x) __builtin_bswap16(x)
+# define __bpf_ntohs(x) __swab16(x)
+# define __bpf_htons(x) __swab16(x)
# define __bpf_constant_ntohs(x) ___constant_swab16(x)
# define __bpf_constant_htons(x) ___constant_swab16(x)
-# define __bpf_ntohl(x) __builtin_bswap32(x)
-# define __bpf_htonl(x) __builtin_bswap32(x)
+# define __bpf_ntohl(x) __swab32(x)
+# define __bpf_htonl(x) __swab32(x)
# define __bpf_constant_ntohl(x) ___constant_swab32(x)
# define __bpf_constant_htonl(x) ___constant_swab32(x)
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
--
2.21.0
After some experiments I found that urandom_read does not need to be
linked statically. When the 'read' syscall is moved to a separate
non-inlined function, bpf_get_stackid() is able to find the
executable in the stack trace and extract its build_id from it.
Signed-off-by: Ivan Vecera <ivecera(a)redhat.com>
---
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/bpf/urandom_read.c | 15 +++++++++++----
2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 2aed37ea61a4..c33900a8fec0 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -69,7 +69,7 @@ TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
all: $(TEST_CUSTOM_PROGS)
$(OUTPUT)/urandom_read: $(OUTPUT)/%: %.c
- $(CC) -o $@ -static $< -Wl,--build-id
+ $(CC) -o $@ $< -Wl,--build-id
BPFOBJ := $(OUTPUT)/libbpf.a
diff --git a/tools/testing/selftests/bpf/urandom_read.c b/tools/testing/selftests/bpf/urandom_read.c
index 9de8b7cb4e6d..db781052758d 100644
--- a/tools/testing/selftests/bpf/urandom_read.c
+++ b/tools/testing/selftests/bpf/urandom_read.c
@@ -7,11 +7,19 @@
#define BUF_SIZE 256
+static __attribute__((noinline))
+void urandom_read(int fd, int count)
+{
+ char buf[BUF_SIZE];
+ int i;
+
+ for (i = 0; i < count; ++i)
+ read(fd, buf, BUF_SIZE);
+}
+
int main(int argc, char *argv[])
{
int fd = open("/dev/urandom", O_RDONLY);
- int i;
- char buf[BUF_SIZE];
int count = 4;
if (fd < 0)
@@ -20,8 +28,7 @@ int main(int argc, char *argv[])
if (argc == 2)
count = atoi(argv[1]);
- for (i = 0; i < count; ++i)
- read(fd, buf, BUF_SIZE);
+ urandom_read(fd, count);
close(fd);
return 0;
--
2.19.2
=== Overview
arm64 has a feature called Top Byte Ignore, which allows pointer tags to
be embedded into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked; the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when performing user memory accesses.
Memory syscalls (mmap, mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally. Thus for memory syscalls we untag
pointers completely when they enter the kernel.
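The difference between untagging for a check and untagging completely
can be sketched in standalone C. This is an illustration under stated
assumptions, not the kernel implementation: set_tag(), untag() and
user_range_ok() are hypothetical helpers, and the 48-bit address limit
is just an example value. The tag is dropped by copying bit 55 into
the top byte, so user addresses get 0x00 there and kernel addresses
keep 0xff:

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT	56
#define TAG_MASK	(0xffULL << TAG_SHIFT)

/* Put an 8-bit tag into the top byte, as a tool like HWASan might. */
static uint64_t set_tag(uint64_t addr, uint8_t tag)
{
	return (addr & ~TAG_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/*
 * Drop the tag by copying bit 55 into the top byte: user addresses
 * (bit 55 clear) end up with 0x00 there, kernel addresses with 0xff.
 */
static uint64_t untag(uint64_t addr)
{
	if (addr & (1ULL << 55))
		return addr | TAG_MASK;
	return addr & ~TAG_MASK;
}

/*
 * Hypothetical range check in the spirit of access_ok(): only the
 * comparison sees the untagged value; the caller keeps its tagged pointer.
 */
static int user_range_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
	uint64_t start = untag(addr);

	return size <= limit && start <= limit - size;
}

static void example_tagged_pointer(void)
{
	const uint64_t limit = 1ULL << 48;	/* assumed user address limit */
	uint64_t plain = 0x00007fff12345678ULL;
	uint64_t tagged = set_tag(plain, 0xab);

	assert(tagged == 0xab007fff12345678ULL);
	assert(untag(tagged) == plain);
	assert(user_range_ok(tagged, 16, limit));
	assert(tagged > limit);	/* a naive comparison would reject it */
}
```

A memory syscall would instead replace its address argument with
untag(addr) on entry and work with the untagged range from then on.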
=== Other approaches
One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a custom
wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscall prologues
is to instead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there. Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched. Thus this approach
would still require changing all find_vma() callers.
=== Testing
The following testing approaches have been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call
find_vma() (and other similar functions) or directly compare against
vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare
user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the required patches have been added
to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].
This patchset has been merged into the Pixel 2 kernel tree and is now
being used to enable testing of Pixel 2 phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060…
[3] https://lkml.org/lkml/2019/3/18/819
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architectur…
Changes in v13:
- Simplified untagging in tcp_zerocopy_receive().
- Looked at find_vma() callers in drivers/, which helped identify a
few other places where untagging is needed.
- Added patch "mm, arm64: untag user pointers in get_vaddr_frames".
- Added patch "drm/amdgpu, arm64: untag user pointers in
amdgpu_ttm_tt_get_user_pages".
- Added patch "drm/radeon, arm64: untag user pointers in
radeon_ttm_tt_pin_userptr".
- Added patch "IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr".
- Added patch "media/v4l2-core, arm64: untag user pointers in
videobuf_dma_contig_user_get".
- Added patch "tee/optee, arm64: untag user pointers in check_mem_type".
- Added patch "vfio/type1, arm64: untag user pointers".
Changes in v12:
- Changed untagging in tcp_zerocopy_receive() to also untag zc->address.
- Fixed untagging in prctl_set_mm* to only untag pointers for vma lookups
and validity checks, but leave them as is for actual user space accesses.
- Updated the link to the v2 of the "arm64 relaxed ABI" patchset [3].
- Dropped the documentation patch, as the "arm64 relaxed ABI" patchset [3]
handles that.
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset"
patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to
correctly perform subtraction with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and
SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from
include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through
tagged pointers.
- Updated the documentation to mention that passing tagged pointers to
memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for
arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag
user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into
the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't
support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion in the replies to v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Andrey Konovalov (20):
uaccess: add untagged_addr definition for other arches
arm64: untag user pointers in access_ok and __uaccess_mask_ptr
lib, arm64: untag user pointers in strn*_user
mm, arm64: untag user pointers passed to memory syscalls
mm, arm64: untag user pointers in mm/gup.c
mm, arm64: untag user pointers in get_vaddr_frames
fs, arm64: untag user pointers in copy_mount_options
fs, arm64: untag user pointers in fs/userfaultfd.c
net, arm64: untag user pointers in tcp_zerocopy_receive
kernel, arm64: untag user pointers in prctl_set_mm*
tracing, arm64: untag user pointers in seq_print_user_ip
uprobes, arm64: untag user pointers in find_active_uprobe
bpf, arm64: untag user pointers in stack_map_get_build_id_offset
drm/amdgpu, arm64: untag user pointers in amdgpu_ttm_tt_get_user_pages
drm/radeon, arm64: untag user pointers in radeon_ttm_tt_pin_userptr
IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr
media/v4l2-core, arm64: untag user pointers in
videobuf_dma_contig_user_get
tee/optee, arm64: untag user pointers in check_mem_type
vfio/type1, arm64: untag user pointers in vaddr_get_pfn
selftests, arm64: add a selftest for passing tagged pointers to kernel
arch/arm64/include/asm/uaccess.h | 10 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 ++-
drivers/gpu/drm/radeon/radeon_ttm.c | 5 ++-
drivers/infiniband/hw/mlx4/mr.c | 7 +--
drivers/media/v4l2-core/videobuf-dma-contig.c | 9 ++--
drivers/tee/optee/call.c | 1 +
drivers/vfio/vfio_iommu_type1.c | 2 +
fs/namespace.c | 2 +-
fs/userfaultfd.c | 5 +++
include/linux/mm.h | 4 ++
ipc/shm.c | 2 +
kernel/bpf/stackmap.c | 6 ++-
kernel/events/uprobes.c | 2 +
kernel/sys.c | 44 +++++++++++++------
kernel/trace/trace_output.c | 5 ++-
lib/strncpy_from_user.c | 3 +-
lib/strnlen_user.c | 3 +-
mm/frame_vector.c | 2 +
mm/gup.c | 4 ++
mm/madvise.c | 2 +
mm/mempolicy.c | 5 +++
mm/migrate.c | 1 +
mm/mincore.c | 2 +
mm/mlock.c | 5 +++
mm/mmap.c | 7 +++
mm/mprotect.c | 1 +
mm/mremap.c | 2 +
mm/msync.c | 2 +
net/ipv4/tcp.c | 2 +
tools/testing/selftests/arm64/.gitignore | 1 +
tools/testing/selftests/arm64/Makefile | 11 +++++
.../testing/selftests/arm64/run_tags_test.sh | 12 +++++
tools/testing/selftests/arm64/tags_test.c | 21 +++++++++
33 files changed, 159 insertions(+), 36 deletions(-)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_test.c
--
2.21.0.225.g810b269d1ac-goog
Hi Mimi,
Thank you for the pointer about IMA testing.
I should probably cc the list as well, since we are talking about the
patch itself. For the IMA test itself I can still ask for help in a
private email thread.
On 03/18/19 at 02:09pm, Mimi Zohar wrote:
> On Mon, 2019-03-18 at 22:06 +0800, Dave Young wrote:
> > Hi Mimi,
> >
> > On 03/14/19 at 02:41pm, Mimi Zohar wrote:
> > > The kernel may be configured or an IMA policy specified on the boot
> > > command line requiring the kexec kernel image signature to be verified.
> > > At runtime a custom IMA policy may be loaded, replacing the policy
> > > specified on the boot command line. In addition, the arch specific
> > > policy rules are dynamically defined based on the secure boot mode that
> > > may require the kernel image signature to be verified.
> > >
> > > The kernel image may have a PE signature, an IMA signature, or both. In
> > > addition, there are two kexec syscalls - kexec_load and kexec_file_load
> > > - but only the kexec_file_load syscall can verify signatures.
> > >
> > > These kexec selftests verify that only properly signed kernel images are
> > > loaded as required, based on the kernel config, the secure boot mode,
> > > and the IMA runtime policy.
> > >
> > > Loading a kernel image or kernel module requires root privileges. To
> > > run just the KEXEC selftests: sudo make TARGETS=kexec kselftest
> > >
> > > Changelog v4:
> > > - Moved the kexec tests to selftests/kexec, as requested by Dave Young.
> > > - Removed the kernel module selftest from this patch set.
> > > - Rewritten cover letter, removing reference to kernel modules.
> > >
> > > Changelog v3:
> > > - Updated tests based on Petr's review, including the defining a common
> > > test to check for root privileges.
> > > - Modified config, removing the CONFIG_KEXEC_VERIFY_SIG requirement.
> > > - Updated the SPDX license to GPL-2.0 based on Shuah's review.
> > > - Updated the secureboot mode test to check the SetupMode as well, based
> > > on David Young's review.
> > >
> > >
> > I was trying to review the patches although I'm slow due to something
> > else.
> >
> > But I still did not setup a IMA testable system, need check your old
> > email about how to setup it.
>
> (The ima-evm-utils package contains a README with directions.)
>
> >
> > A quick testing gives me below results
> >
> > /* test #1, my default kconfig
> > # NO CONFIG_INTEGRITY compiled in
> > */
> >
> > make[1]: Nothing to be done for 'all'.
> > make[1]: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make[1]: Entering directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > TAP version 13
> > selftests: kexec: test_kexec_load.sh
> > ========================================
> > selftests: kexec: test_kexec_load.sh: Warning: file
> > test_kexec_load.sh is not executable, correct this.
> > not ok 1..1 selftests: kexec: test_kexec_load.sh [FAIL]
>
> That's really weird. Both before and after applying these patches
> test_kexec_load.sh is executable (stable linux-5.0.y). Could
> something else be preventing it from executing?
>
> > selftests: kexec: test_kexec_file_load.sh
> > ========================================
> > [INFO] kexec_file_load is enabled
> > [INFO] secure boot mode not enabled
> > [INFO] kexec kernel image PE signed
> > [INFO] kexec kernel image not IMA signed
> > kexec_file_load succeeded (possibly missing IMA sig) [FAIL]
> > not ok 1..2 selftests: kexec: test_kexec_file_load.sh [FAIL]
> > make[1]: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests'
>
> This message is because neither CONFIG_KEXEC_BZIMAGE_VERIFY_SIG or an
> IMA signature is required. It couldn't read the IMA runtime policy
> rules to determine if an IMA signature is required. So, it's trying
> to provide a hint as to what happened.
>
> I'll update the test to see if CONFIG_IMA_APPRAISE is enabled, before
> emitting this message.
>
> >
> > /* test #2, enabled IMA kconfigs, simply test without other ima
> > setup eg. use a policy etc. need to follow up some guide to test the
> > ima functionality (TODO..)
> > */
> >
> >
> > [root@dhcp-128-65 linux-x86]# make -C tools/testing/selftests TARGETS=kexec run_tests
> > make: Entering directory '/home/dyoung/git/github/linux/tools/testing/selftests'
> > make[1]: Entering directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make[1]: Nothing to be done for 'all'.
> > make[1]: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make[1]: Entering directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > TAP version 13
> > selftests: kexec: test_kexec_load.sh
> > ========================================
> > selftests: kexec: test_kexec_load.sh: Warning: file test_kexec_load.sh is not executable, correct this.
> > not ok 1..1 selftests: kexec: test_kexec_load.sh [FAIL]
> > selftests: kexec: test_kexec_file_load.sh
> > ========================================
> > [INFO] kexec_file_load is enabled
> > [INFO] reading IMA policy permitted
> > [INFO] secure boot mode not enabled
> > No signature verification required
> > not ok 1..2 selftests: kexec: test_kexec_file_load.sh [SKIP]
> > make[1]: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests/kexec'
> > make: Leaving directory '/home/dyoung/git/github/linux/tools/testing/selftests'
>
> The purpose of these tests was to coordinate kernel image signature
> verification.
>
> If you require a PE signature, load an IMA policy requiring an IMA
> signature, or even enable CONFIG_IMA_ARCH_POLICY, the test would
> require some form of signature verification.
I did a test with an IMA key embedded in the kernel, first with Secure
Boot disabled and then with Secure Boot enabled, but failed to sign the
kernel with both pesign and evmctl. I will keep looking into how to get
that working and ask in private email if needed :)
About the patch itself: as we discussed in another email, I would expect
it to work with other test cases, e.g. without IMA/secure boot. But if
that is not easy, maybe you can change the test script filenames to
something like test_kexec_load_sigcheck.sh and
test_kexec_file_load_sigcheck.sh; then we can add other non-sigcheck
related cases to other test scripts later. Ideally, though, it would be
better if we can handle them in the current files.
Another issue I noticed is that even when booting with ima_appraise=off,
kexec load still checks the conditions. I will see whether I got
something wrong in my test steps.
Thanks
Dave
=== Overview
arm64 has a feature called Top Byte Ignore, which allows embedding pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untagging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked; the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when performing user memory accesses.
Memory syscalls (mmap, mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally. Thus for memory syscalls we untag
pointers completely when they enter the kernel.
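As an illustration of the untagging operation described above, here is a
minimal userspace sketch. It assumes the arm64 layout in which the tag
occupies bits 63:56 and bit 55 distinguishes user from kernel addresses.
The helper names (sign_extend_from, untag_addr, tag_of, untag_demo) are
made up for this example and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 64-bit value from the given bit position (0..63). */
static inline int64_t sign_extend_from(uint64_t value, unsigned int bit)
{
	unsigned int shift = 63 - bit;

	/* Relies on arithmetic right shift of signed values (GCC, Clang). */
	return (int64_t)(value << shift) >> shift;
}

/* Strip the top-byte tag by copying bit 55 into bits 63:56. */
static inline uint64_t untag_addr(uint64_t addr)
{
	return (uint64_t)sign_extend_from(addr, 55);
}

/* Extract the tag stored in the top byte. */
static inline uint8_t tag_of(uint64_t addr)
{
	return (uint8_t)(addr >> 56);
}

/* Example values: a tagged user address and a tagged kernel address. */
static void untag_demo(void)
{
	uint64_t user = 0x2a00007fdeadbeefULL;   /* tag 0x2a, bit 55 clear */
	uint64_t kern = 0x12ff800000001000ULL;   /* tag 0x12, bit 55 set */

	assert(tag_of(user) == 0x2a);
	assert(untag_addr(user) == 0x0000007fdeadbeefULL);
	assert(untag_addr(kern) == 0xffff800000001000ULL);
}
```

Sign extension from bit 55, rather than a plain mask, keeps kernel
addresses intact while clearing the tag from user addresses. A range
check can then run on the untagged value while the original tagged
pointer is still the one used for the access.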
=== Other approaches
One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a custom
wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscalls prologues
is to instead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there. Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched. Thus this approach
would still require changing all find_vma() callers.
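The problem with comparing or subtracting against a tagged pointer can be
sketched in userspace as follows. struct range is a hypothetical stand-in
for the start and end fields of a vma, and strip_tag, in_range,
offset_in_range and range_demo are illustrative names, not kernel
functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end fields of a vma. */
struct range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* Clear bits 63:56 of a user address. */
static uint64_t strip_tag(uint64_t addr)
{
	return addr & ~(0xffULL << 56);
}

static int in_range(const struct range *r, uint64_t addr)
{
	return addr >= r->start && addr < r->end;
}

static uint64_t offset_in_range(const struct range *r, uint64_t addr)
{
	return addr - r->start;
}

static void range_demo(void)
{
	struct range r = { 0x7f0000001000ULL, 0x7f0000003000ULL };
	uint64_t tagged = (0x2aULL << 56) | 0x7f0000001800ULL;

	/* With the tag in place, both the comparison and the offset are wrong. */
	assert(!in_range(&r, tagged));
	assert(offset_in_range(&r, tagged) != 0x800);

	/* After stripping the tag, both behave as expected. */
	assert(in_range(&r, strip_tag(tagged)));
	assert(offset_in_range(&r, strip_tag(tagged)) == 0x800);
}
```

Even if the lookup itself accepted a tagged address, each caller doing
this kind of arithmetic afterwards would still need its own untagging.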
=== Testing
The following testing approaches have been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call
find_vma() (and other similar functions) or directly compare against
vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare
user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the required patches have been added
to the patchset.
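The dynamic check from item 4 can be approximated in userspace as shown
below. The cover letter does not define has_tag(), so the predicate here
is an assumption (a user address counts as tagged when its top byte is
nonzero), and checked_lookup is a made-up stand-in for an instrumented
find_vma() that returns an error where the kernel would hit BUG_ON.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed predicate: a nonzero top byte means the address carries a tag. */
static int has_tag(uint64_t addr)
{
	return (addr >> 56) != 0;
}

/* Stand-in for an instrumented lookup; -1 marks where BUG_ON would fire. */
static int checked_lookup(uint64_t addr)
{
	if (has_tag(addr))
		return -1;
	return 0;
}
```

Feeding tagged addresses through such a check flags every path that
reaches the lookup without untagging first.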
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].
This patchset has been merged into the Pixel 2 kernel tree and is now
being used to enable testing of Pixel 2 phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060…
[3] https://lkml.org/lkml/2019/3/18/819
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architectur…
Changes in v12:
- Changed untagging in tcp_zerocopy_receive() to also untag zc->address.
- Fixed untagging in prctl_set_mm* to only untag pointers for vma lookups
and validity checks, but leave them as is for actual user space accesses.
- Updated the link to the v2 of the "arm64 relaxed ABI" patchset [3].
- Dropped the documentation patch, as the "arm64 relaxed ABI" patchset [3]
handles that.
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset"
patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to
correctly perform subtraction with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and
SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from
include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through
tagged pointers.
- Updated the documentation to mention that passing tagged pointers to
memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for
arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag
user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into
the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't
support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion in the replies to v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Andrey Konovalov (13):
uaccess: add untagged_addr definition for other arches
arm64: untag user pointers in access_ok and __uaccess_mask_ptr
lib, arm64: untag user pointers in strn*_user
mm, arm64: untag user pointers passed to memory syscalls
mm, arm64: untag user pointers in mm/gup.c
fs, arm64: untag user pointers in copy_mount_options
fs, arm64: untag user pointers in fs/userfaultfd.c
net, arm64: untag user pointers in tcp_zerocopy_receive
kernel, arm64: untag user pointers in prctl_set_mm*
tracing, arm64: untag user pointers in seq_print_user_ip
uprobes, arm64: untag user pointers in find_active_uprobe
bpf, arm64: untag user pointers in stack_map_get_build_id_offset
selftests, arm64: add a selftest for passing tagged pointers to kernel
arch/arm64/include/asm/uaccess.h | 10 +++--
fs/namespace.c | 2 +-
fs/userfaultfd.c | 5 +++
include/linux/mm.h | 4 ++
ipc/shm.c | 2 +
kernel/bpf/stackmap.c | 6 ++-
kernel/events/uprobes.c | 2 +
kernel/sys.c | 44 +++++++++++++------
kernel/trace/trace_output.c | 5 ++-
lib/strncpy_from_user.c | 3 +-
lib/strnlen_user.c | 3 +-
mm/gup.c | 4 ++
mm/madvise.c | 2 +
mm/mempolicy.c | 5 +++
mm/migrate.c | 1 +
mm/mincore.c | 2 +
mm/mlock.c | 5 +++
mm/mmap.c | 7 +++
mm/mprotect.c | 1 +
mm/mremap.c | 2 +
mm/msync.c | 2 +
net/ipv4/tcp.c | 9 +++-
tools/testing/selftests/arm64/.gitignore | 1 +
tools/testing/selftests/arm64/Makefile | 11 +++++
.../testing/selftests/arm64/run_tags_test.sh | 12 +++++
tools/testing/selftests/arm64/tags_test.c | 21 +++++++++
26 files changed, 144 insertions(+), 27 deletions(-)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_test.c
--
2.21.0.225.g810b269d1ac-goog