From: Thomas Gleixner <tglx(a)linutronix.de>
CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
removed. Cleanup the leftovers before doing so.
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: rcu(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
---
tools/testing/selftests/rcutorture/configs/rcu/SRCU-t | 1 -
tools/testing/selftests/rcutorture/configs/rcu/SRCU-u | 1 -
tools/testing/selftests/rcutorture/configs/rcu/TINY01 | 1 -
tools/testing/selftests/rcutorture/doc/TINY_RCU.txt | 5 ++---
tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt | 1 -
tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h | 1 -
6 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
index 6c78022..553cf65 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
+++ b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-t
@@ -7,4 +7,3 @@ CONFIG_RCU_TRACE=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
CONFIG_DEBUG_ATOMIC_SLEEP=y
-#CHECK#CONFIG_PREEMPT_COUNT=y
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
index c15ada8..99563da 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
+++ b/tools/testing/selftests/rcutorture/configs/rcu/SRCU-u
@@ -7,4 +7,3 @@ CONFIG_RCU_TRACE=n
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
-CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TINY01 b/tools/testing/selftests/rcutorture/configs/rcu/TINY01
index 6db705e..9b22b8e 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TINY01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TINY01
@@ -10,4 +10,3 @@ CONFIG_RCU_TRACE=n
#CHECK#CONFIG_RCU_STALL_COMMON=n
CONFIG_DEBUG_LOCK_ALLOC=n
CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
-CONFIG_PREEMPT_COUNT=n
diff --git a/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt b/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
index a75b169..d30cedf 100644
--- a/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
+++ b/tools/testing/selftests/rcutorture/doc/TINY_RCU.txt
@@ -3,11 +3,10 @@ This document gives a brief rationale for the TINY_RCU test cases.
Kconfig Parameters:
-CONFIG_DEBUG_LOCK_ALLOC -- Do all three and none of the three.
-CONFIG_PREEMPT_COUNT
+CONFIG_DEBUG_LOCK_ALLOC -- Do both and none of the two.
CONFIG_RCU_TRACE
-The theory here is that randconfig testing will hit the other six possible
+The theory here is that randconfig testing will hit the other two possible
combinations of these parameters.
diff --git a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
index 1b96d68..cfdd48f 100644
--- a/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
+++ b/tools/testing/selftests/rcutorture/doc/TREE_RCU-kconfig.txt
@@ -43,7 +43,6 @@ CONFIG_64BIT
Used only to check CONFIG_RCU_FANOUT value, inspection suffices.
-CONFIG_PREEMPT_COUNT
CONFIG_PREEMPT_RCU
Redundant with CONFIG_PREEMPT, ignore.
diff --git a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
index 283d710..d0d485d 100644
--- a/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
+++ b/tools/testing/selftests/rcutorture/formal/srcu-cbmc/src/config.h
@@ -8,7 +8,6 @@
#undef CONFIG_HOTPLUG_CPU
#undef CONFIG_MODULES
#undef CONFIG_NO_HZ_FULL_SYSIDLE
-#undef CONFIG_PREEMPT_COUNT
#undef CONFIG_PREEMPT_RCU
#undef CONFIG_PROVE_RCU
#undef CONFIG_RCU_NOCB_CPU
--
2.9.5
Hi!
I really like Hangbin Liu's intent[1] but I think we need to be a little
more clean about the implementation. This extracts run_kselftest.sh from
the Makefile so it can actually be changed without embeds, etc. Instead,
generate the test list into a text file. Everything gets much simpler.
:)
And in patch 2, I add back Hangin Liu's new options (with some extra
added) with knowledge of "collections" (i.e. Makefile TARGETS) and
subtests. This should work really well with LAVA too, which needs to
manipulate the lists of tests being run.
Thoughts?
-Kees
[1] https://lore.kernel.org/lkml/20200914022227.437143-1-liuhangbin@gmail.com/
Kees Cook (2):
selftests: Extract run_kselftest.sh and generate stand-alone test list
selftests/run_kselftest.sh: Make each test individually selectable
tools/testing/selftests/Makefile | 26 ++-----
tools/testing/selftests/lib.mk | 5 +-
tools/testing/selftests/run_kselftest.sh | 89 ++++++++++++++++++++++++
3 files changed, 98 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/run_kselftest.sh
--
2.25.1
Right now .kunitconfig and the build dir are automatically created if
the build dir does not exists; however, if the build dir is present and
.kunitconfig is not, kunit_tool will crash.
Fix this by checking for both the build dir as well as the .kunitconfig.
NOTE: This depends on commit 5578d008d9e0 ("kunit: tool: fix running
kunit_tool from outside kernel tree")
Link: https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/c…
Signed-off-by: Brendan Higgins <brendanhiggins(a)google.com>
---
tools/testing/kunit/kunit.py | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index e2caf4e24ecb2..8ab17e21a3578 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -243,6 +243,8 @@ def main(argv, linux=None):
if cli_args.subcommand == 'run':
if not os.path.exists(cli_args.build_dir):
os.mkdir(cli_args.build_dir)
+
+ if not os.path.exists(kunit_kernel.kunitconfig_path):
create_default_kunitconfig()
if not linux:
@@ -258,10 +260,12 @@ def main(argv, linux=None):
if result.status != KunitStatus.SUCCESS:
sys.exit(1)
elif cli_args.subcommand == 'config':
- if cli_args.build_dir:
- if not os.path.exists(cli_args.build_dir):
- os.mkdir(cli_args.build_dir)
- create_default_kunitconfig()
+ if cli_args.build_dir and (
+ not os.path.exists(cli_args.build_dir)):
+ os.mkdir(cli_args.build_dir)
+
+ if not os.path.exists(kunit_kernel.kunitconfig_path):
+ create_default_kunitconfig()
if not linux:
linux = kunit_kernel.LinuxSourceTree()
base-commit: d96fe1a5485fa978a6e3690adc4dbe4d20b5baa4
--
2.28.0.681.g6f77f65b4e-goog
Changes since v2:
- added struct xfrm_translator as API to register xfrm_compat.ko with
xfrm_state.ko. This allows compilation of translator as a loadable
module
- fixed indention and collected reviewed-by (Johannes Berg)
- moved boilerplate from commit messages into cover-letter (Steffen
Klassert)
- found on KASAN build and fixed non-initialised stack variable usage
in the translator
The resulting v2/v3 diff can be found here:
https://gist.github.com/0x7f454c46/8f68311dfa1f240959fdbe7c77ed2259
Patches as a .git branch:
https://github.com/0x7f454c46/linux/tree/xfrm-compat-v3
Changes since v1:
- reworked patches set to use translator
- separated the compat layer into xfrm_compat.c,
compiled under XFRM_USER_COMPAT config
- 32-bit messages now being sent in frag_list (like wext-core does)
- instead of __packed add compat_u64 members in compat structures
- selftest reworked to kselftest lib API
- added netlink dump testing to the selftest
XFRM is disabled for compatible users because of the UABI difference.
The difference is in structures paddings and in the result the size
of netlink messages differ.
Possibility for compatible application to manage xfrm tunnels was
disabled by: the commmit 19d7df69fdb2 ("xfrm: Refuse to insert 32 bit
userspace socket policies on 64 bit systems") and the commit 74005991b78a
("xfrm: Do not parse 32bits compiled xfrm netlink msg on 64bits host").
This is my second attempt to resolve the xfrm/compat problem by adding
the 64=>32 and 32=>64 bit translators those non-visibly to a user
provide translation between compatible user and kernel.
Previous attempt was to interrupt the message ABI according to a syscall
by xfrm_user, which resulted in over-complicated code [1].
Florian Westphal provided the idea of translator and some draft patches
in the discussion. In these patches, his idea is reused and some of his
initial code is also present.
There were a couple of attempts to solve xfrm compat problem:
https://lkml.org/lkml/2017/1/20/733https://patchwork.ozlabs.org/patch/44600/http://netdev.vger.kernel.narkive.com/2Gesykj6/patch-net-next-xfrm-correctl…
All the discussions end in the conclusion that xfrm should have a full
compatible layer to correctly work with 32-bit applications on 64-bit
kernels:
https://lkml.org/lkml/2017/1/23/413https://patchwork.ozlabs.org/patch/433279/
In some recent lkml discussion, Linus said that it's worth to fix this
problem and not giving people an excuse to stay on 32-bit kernel:
https://lkml.org/lkml/2018/2/13/752
There is also an selftest for ipsec tunnels.
It doesn't depend on any library and compat version can be easy
build with: make CFLAGS=-m32 net/ipsec
[1]: https://lkml.kernel.org/r/20180726023144.31066-1-dima@arista.com
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Florian Westphal <fw(a)strlen.de>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Stephen Suryaputra <ssuryaextr(a)gmail.com>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: netdev(a)vger.kernel.org
Dmitry Safonov (7):
xfrm: Provide API to register translator module
xfrm/compat: Add 64=>32-bit messages translator
xfrm/compat: Attach xfrm dumps to 64=>32 bit translator
netlink/compat: Append NLMSG_DONE/extack to frag_list
xfrm/compat: Add 32=>64-bit messages translator
xfrm/compat: Translate 32-bit user_policy from sockptr
selftest/net/xfrm: Add test for ipsec tunnel
MAINTAINERS | 1 +
include/net/xfrm.h | 33 +
net/netlink/af_netlink.c | 47 +-
net/xfrm/Kconfig | 11 +
net/xfrm/Makefile | 1 +
net/xfrm/xfrm_compat.c | 625 +++++++
net/xfrm/xfrm_state.c | 77 +-
net/xfrm/xfrm_user.c | 110 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ipsec.c | 2195 ++++++++++++++++++++++++
11 files changed, 3066 insertions(+), 36 deletions(-)
create mode 100644 net/xfrm/xfrm_compat.c
create mode 100644 tools/testing/selftests/net/ipsec.c
base-commit: ba4f184e126b751d1bffad5897f263108befc780
--
2.28.0
Hi,
The v6 of this patch series include only the type change requested by
Andy on the vdso patch, but since v5 included some bigger changes, I'm
documenting them in this cover letter as well.
Please note this applies on top of Linus tree, and it succeeds seccomp
and syscall user dispatch selftests.
v5 cover letter
--------------
This is v5 of Syscall User Dispatch. It has some big changes in
comparison to v4.
First of all, it allows the vdso trampoline code for architectures that
support it. This is exposed through an arch hook. It also addresses
the concern about what happens when a bad selector is provided, instead
of SIGSEGV, we fail with SIGSYS, which is more debug-able.
Another major change is that it is now based on top of Gleixner's common
syscall entry work, and is supposed to only be used by that code.
Therefore, the entry symbol is not exported outside of kernel/entry/ code.
The biggest change in this version is the attempt to avoid using one of
the final TIF flags on x86 32 bit, without increasing the size of that
variable to 64 bit. My expectation is that, with this work, plus the
removal of TIF_IA32, TIF_X32 and TIF_FORCE_TF, we might be able to avoid
changing this field to 64 bits at all. Instead, this follows the
suggestion by Andy to have a generic TIF flag for SECCOMP and this
mechanism, and use another field to decide which one is enabled. The
code for this is not complex, so it seems like a viable approach.
Finally, this version adds some documentation to the feature.
Kees, I dropped your reviewed-by on patch 5, given the amount of
changes.
Thanks,
Previous submissions are archived at:
RFC/v1: https://lkml.org/lkml/2020/7/8/96
v2: https://lkml.org/lkml/2020/7/9/17
v3: https://lkml.org/lkml/2020/7/12/4
v4: https://www.spinics.net/lists/linux-kselftest/msg16377.html
v5: https://lkml.org/lkml/2020/8/10/1320
Gabriel Krisman Bertazi (9):
kernel: Support TIF_SYSCALL_INTERCEPT flag
kernel: entry: Support TIF_SYSCAL_INTERCEPT on common entry code
x86: vdso: Expose sigreturn address on vdso to the kernel
signal: Expose SYS_USER_DISPATCH si_code type
kernel: Implement selective syscall userspace redirection
kernel: entry: Support Syscall User Dispatch for common syscall entry
x86: Enable Syscall User Dispatch
selftests: Add kselftest for syscall user dispatch
doc: Document Syscall User Dispatch
.../admin-guide/syscall-user-dispatch.rst | 87 ++++++
arch/Kconfig | 21 ++
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vdso2c.c | 2 +
arch/x86/entry/vdso/vdso32/sigreturn.S | 2 +
arch/x86/entry/vdso/vma.c | 15 +
arch/x86/include/asm/elf.h | 1 +
arch/x86/include/asm/thread_info.h | 4 +-
arch/x86/include/asm/vdso.h | 2 +
arch/x86/kernel/signal_compat.c | 2 +-
fs/exec.c | 8 +
include/linux/entry-common.h | 6 +-
include/linux/sched.h | 8 +-
include/linux/seccomp.h | 20 +-
include/linux/syscall_intercept.h | 71 +++++
include/linux/syscall_user_dispatch.h | 29 ++
include/uapi/asm-generic/siginfo.h | 3 +-
include/uapi/linux/prctl.h | 5 +
kernel/entry/Makefile | 1 +
kernel/entry/common.c | 32 +-
kernel/entry/common.h | 15 +
kernel/entry/syscall_user_dispatch.c | 101 ++++++
kernel/fork.c | 10 +-
kernel/seccomp.c | 7 +-
kernel/sys.c | 5 +
tools/testing/selftests/Makefile | 1 +
.../syscall_user_dispatch/.gitignore | 2 +
.../selftests/syscall_user_dispatch/Makefile | 9 +
.../selftests/syscall_user_dispatch/config | 1 +
.../syscall_user_dispatch.c | 292 ++++++++++++++++++
30 files changed, 744 insertions(+), 19 deletions(-)
create mode 100644 Documentation/admin-guide/syscall-user-dispatch.rst
create mode 100644 include/linux/syscall_intercept.h
create mode 100644 include/linux/syscall_user_dispatch.h
create mode 100644 kernel/entry/common.h
create mode 100644 kernel/entry/syscall_user_dispatch.c
create mode 100644 tools/testing/selftests/syscall_user_dispatch/.gitignore
create mode 100644 tools/testing/selftests/syscall_user_dispatch/Makefile
create mode 100644 tools/testing/selftests/syscall_user_dispatch/config
create mode 100644 tools/testing/selftests/syscall_user_dispatch/syscall_user_dispatch.c
--
2.28.0
This patchset is based on Google-internal RSEQ
work done by Paul Turner and Andrew Hunter.
When working with per-CPU RSEQ-based memory allocations,
it is sometimes important to make sure that a global
memory location is no longer accessed from RSEQ critical
sections. For example, there can be two per-CPU lists,
one is "active" and accessed per-CPU, while another one
is inactive and worked on asynchronously "off CPU" (e.g.
garbage collection is performed). Then at some point
the two lists are swapped, and a fast RCU-like mechanism
is required to make sure that the previously active
list is no longer accessed.
This patch introduces such a mechanism: in short,
membarrier() syscall issues an IPI to a CPU, restarting
a potentially active RSEQ critical section on the CPU.
v1->v2:
- removed the ability to IPI all CPUs in a single sycall;
- use task->mm rather than task->group_leader to identify
tasks belonging to the same process.
v2->v3:
- re-added the ability to IPI all CPUs in a single syscall;
- integrated with membarrier_private_expedited() to
make sure only CPUs running tasks with the same mm as
the current task are interrupted;
- also added MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ;
- flags in membarrier_private_expedited are never actually
bit flags but always distinct values (i.e. never two flags
are combined), so I modified bit testing to full equation
comparison for simplicity (otherwise the code needs to
work when several bits are set, for example).
v3->v4:
- added the third parameter to membarrier syscall: @cpu_id:
if @flags == MEMBARRIER_CMD_FLAG_CPU, then @cpu_id indicates
the cpu on which RSEQ CS should be restarted.
v4->v5:
- added @cpu_id parameter to sys_membarrier in syscalls.h.
v5->v6:
- made membarrier_private_expedited more efficient in a
single-cpu case;
- a couple of minor refactorings.
v6->v7:
- made @flags an unsigned int in sys_membarrier;
- a couple of minor refactorings.
v7->v8:
- replaced BUG_ON with WARN_ON_ONCE in membarrier.c.
The second patch in the patchset adds a selftest
of this feature.
Signed-off-by: Peter Oskolkov <posk(a)google.com>
---
include/linux/sched/mm.h | 3 +
include/linux/syscalls.h | 2 +-
include/uapi/linux/membarrier.h | 26 ++++++
kernel/sched/membarrier.c | 136 +++++++++++++++++++++++++-------
4 files changed, 136 insertions(+), 31 deletions(-)
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index f889e332912f..15bfb06f2884 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -348,10 +348,13 @@ enum {
MEMBARRIER_STATE_GLOBAL_EXPEDITED = (1U << 3),
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY = (1U << 4),
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE = (1U << 5),
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY = (1U << 6),
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ = (1U << 7),
};
enum {
MEMBARRIER_FLAG_SYNC_CORE = (1U << 0),
+ MEMBARRIER_FLAG_RSEQ = (1U << 1),
};
#ifdef CONFIG_ARCH_HAS_MEMBARRIER_CALLBACKS
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index 75ac7f8ae93c..466c993e52bf 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -974,7 +974,7 @@ asmlinkage long sys_execveat(int dfd, const char __user *filename,
const char __user *const __user *argv,
const char __user *const __user *envp, int flags);
asmlinkage long sys_userfaultfd(int flags);
-asmlinkage long sys_membarrier(int cmd, int flags);
+asmlinkage long sys_membarrier(int cmd, int flags, int cpu_id);
asmlinkage long sys_mlock2(unsigned long start, size_t len, int flags);
asmlinkage long sys_copy_file_range(int fd_in, loff_t __user *off_in,
int fd_out, loff_t __user *off_out,
diff --git a/include/uapi/linux/membarrier.h b/include/uapi/linux/membarrier.h
index 5891d7614c8c..737605897f36 100644
--- a/include/uapi/linux/membarrier.h
+++ b/include/uapi/linux/membarrier.h
@@ -114,6 +114,26 @@
* If this command is not implemented by an
* architecture, -EINVAL is returned.
* Returns 0 on success.
+ * @MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+ * Ensure the caller thread, upon return from
+ * system call, that all its running thread
+ * siblings have any currently running rseq
+ * critical sections restarted if @flags
+ * parameter is 0; if @flags parameter is
+ * MEMBARRIER_CMD_FLAG_CPU,
+ * then this operation is performed only
+ * on CPU indicated by @cpu_id. If this command is
+ * not implemented by an architecture, -EINVAL
+ * is returned. A process needs to register its
+ * intent to use the private expedited rseq
+ * command prior to using it, otherwise
+ * this command returns -EPERM.
+ * @MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
+ * Register the process intent to use
+ * MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ.
+ * If this command is not implemented by an
+ * architecture, -EINVAL is returned.
+ * Returns 0 on success.
* @MEMBARRIER_CMD_SHARED:
* Alias to MEMBARRIER_CMD_GLOBAL. Provided for
* header backward compatibility.
@@ -131,9 +151,15 @@ enum membarrier_cmd {
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED = (1 << 4),
MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 5),
MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE = (1 << 6),
+ MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ = (1 << 7),
+ MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ = (1 << 8),
/* Alias for header backward compatibility. */
MEMBARRIER_CMD_SHARED = MEMBARRIER_CMD_GLOBAL,
};
+enum membarrier_cmd_flag {
+ MEMBARRIER_CMD_FLAG_CPU = (1 << 0),
+};
+
#endif /* _UAPI_LINUX_MEMBARRIER_H */
diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index 168479a7d61b..e23e74d52db5 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -18,6 +18,14 @@
#define MEMBARRIER_PRIVATE_EXPEDITED_SYNC_CORE_BITMASK 0
#endif
+#ifdef CONFIG_RSEQ
+#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK \
+ (MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ \
+ | MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ_BITMASK)
+#else
+#define MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ_BITMASK 0
+#endif
+
#define MEMBARRIER_CMD_BITMASK \
(MEMBARRIER_CMD_GLOBAL | MEMBARRIER_CMD_GLOBAL_EXPEDITED \
| MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED \
@@ -30,6 +38,11 @@ static void ipi_mb(void *info)
smp_mb(); /* IPIs should be serializing but paranoid. */
}
+static void ipi_rseq(void *info)
+{
+ rseq_preempt(current);
+}
+
static void ipi_sync_rq_state(void *info)
{
struct mm_struct *mm = (struct mm_struct *) info;
@@ -129,19 +142,27 @@ static int membarrier_global_expedited(void)
return 0;
}
-static int membarrier_private_expedited(int flags)
+static int membarrier_private_expedited(int flags, int cpu_id)
{
- int cpu;
cpumask_var_t tmpmask;
struct mm_struct *mm = current->mm;
+ smp_call_func_t ipi_func = ipi_mb;
- if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+ if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
if (!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
return -EPERM;
+ } else if (flags == MEMBARRIER_FLAG_RSEQ) {
+ if (!IS_ENABLED(CONFIG_RSEQ))
+ return -EINVAL;
+ if (!(atomic_read(&mm->membarrier_state) &
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY))
+ return -EPERM;
+ ipi_func = ipi_rseq;
} else {
+ WARN_ON_ONCE(flags);
if (!(atomic_read(&mm->membarrier_state) &
MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY))
return -EPERM;
@@ -156,35 +177,59 @@ static int membarrier_private_expedited(int flags)
*/
smp_mb(); /* system call entry is not a mb. */
- if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
+ if (cpu_id < 0 && !zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
return -ENOMEM;
cpus_read_lock();
- rcu_read_lock();
- for_each_online_cpu(cpu) {
+
+ if (cpu_id >= 0) {
struct task_struct *p;
- /*
- * Skipping the current CPU is OK even through we can be
- * migrated at any point. The current CPU, at the point
- * where we read raw_smp_processor_id(), is ensured to
- * be in program order with respect to the caller
- * thread. Therefore, we can skip this CPU from the
- * iteration.
- */
- if (cpu == raw_smp_processor_id())
- continue;
- p = rcu_dereference(cpu_rq(cpu)->curr);
- if (p && p->mm == mm)
- __cpumask_set_cpu(cpu, tmpmask);
+ if (cpu_id >= nr_cpu_ids || !cpu_online(cpu_id))
+ goto out;
+ if (cpu_id == raw_smp_processor_id())
+ goto out;
+ rcu_read_lock();
+ p = rcu_dereference(cpu_rq(cpu_id)->curr);
+ if (!p || p->mm != mm) {
+ rcu_read_unlock();
+ goto out;
+ }
+ rcu_read_unlock();
+ } else {
+ int cpu;
+
+ rcu_read_lock();
+ for_each_online_cpu(cpu) {
+ struct task_struct *p;
+
+ /*
+ * Skipping the current CPU is OK even through we can be
+ * migrated at any point. The current CPU, at the point
+ * where we read raw_smp_processor_id(), is ensured to
+ * be in program order with respect to the caller
+ * thread. Therefore, we can skip this CPU from the
+ * iteration.
+ */
+ if (cpu == raw_smp_processor_id())
+ continue;
+ p = rcu_dereference(cpu_rq(cpu)->curr);
+ if (p && p->mm == mm)
+ __cpumask_set_cpu(cpu, tmpmask);
+ }
+ rcu_read_unlock();
}
- rcu_read_unlock();
preempt_disable();
- smp_call_function_many(tmpmask, ipi_mb, NULL, 1);
+ if (cpu_id >= 0)
+ smp_call_function_single(cpu_id, ipi_func, NULL, 1);
+ else
+ smp_call_function_many(tmpmask, ipi_func, NULL, 1);
preempt_enable();
- free_cpumask_var(tmpmask);
+out:
+ if (cpu_id < 0)
+ free_cpumask_var(tmpmask);
cpus_read_unlock();
/*
@@ -283,11 +328,18 @@ static int membarrier_register_private_expedited(int flags)
set_state = MEMBARRIER_STATE_PRIVATE_EXPEDITED,
ret;
- if (flags & MEMBARRIER_FLAG_SYNC_CORE) {
+ if (flags == MEMBARRIER_FLAG_SYNC_CORE) {
if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
ready_state =
MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY;
+ } else if (flags == MEMBARRIER_FLAG_RSEQ) {
+ if (!IS_ENABLED(CONFIG_RSEQ))
+ return -EINVAL;
+ ready_state =
+ MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ_READY;
+ } else {
+ WARN_ON_ONCE(flags);
}
/*
@@ -299,6 +351,8 @@ static int membarrier_register_private_expedited(int flags)
return 0;
if (flags & MEMBARRIER_FLAG_SYNC_CORE)
set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE;
+ if (flags & MEMBARRIER_FLAG_RSEQ)
+ set_state |= MEMBARRIER_STATE_PRIVATE_EXPEDITED_RSEQ;
atomic_or(set_state, &mm->membarrier_state);
ret = sync_runqueues_membarrier_state(mm);
if (ret)
@@ -310,8 +364,15 @@ static int membarrier_register_private_expedited(int flags)
/**
* sys_membarrier - issue memory barriers on a set of threads
- * @cmd: Takes command values defined in enum membarrier_cmd.
- * @flags: Currently needs to be 0. For future extensions.
+ * @cmd: Takes command values defined in enum membarrier_cmd.
+ * @flags: Currently needs to be 0 for all commands other than
+ * MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ: in the latter
+ * case it can be MEMBARRIER_CMD_FLAG_CPU, indicating that @cpu_id
+ * contains the CPU on which to interrupt (= restart)
+ * the RSEQ critical section.
+ * @cpu_id: if @flags == MEMBARRIER_CMD_FLAG_CPU, indicates the cpu on which
+ * RSEQ CS should be interrupted (@cmd must be
+ * MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ).
*
* If this system call is not implemented, -ENOSYS is returned. If the
* command specified does not exist, not available on the running
@@ -337,10 +398,21 @@ static int membarrier_register_private_expedited(int flags)
* smp_mb() X O O
* sys_membarrier() O O O
*/
-SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
+SYSCALL_DEFINE3(membarrier, int, cmd, unsigned int, flags, int, cpu_id)
{
- if (unlikely(flags))
- return -EINVAL;
+ switch (cmd) {
+ case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+ if (unlikely(flags && flags != MEMBARRIER_CMD_FLAG_CPU))
+ return -EINVAL;
+ break;
+ default:
+ if (unlikely(flags))
+ return -EINVAL;
+ }
+
+ if (!(flags & MEMBARRIER_CMD_FLAG_CPU))
+ cpu_id = -1;
+
switch (cmd) {
case MEMBARRIER_CMD_QUERY:
{
@@ -362,13 +434,17 @@ SYSCALL_DEFINE2(membarrier, int, cmd, int, flags)
case MEMBARRIER_CMD_REGISTER_GLOBAL_EXPEDITED:
return membarrier_register_global_expedited();
case MEMBARRIER_CMD_PRIVATE_EXPEDITED:
- return membarrier_private_expedited(0);
+ return membarrier_private_expedited(0, cpu_id);
case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED:
return membarrier_register_private_expedited(0);
case MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE:
- return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+ return membarrier_private_expedited(MEMBARRIER_FLAG_SYNC_CORE, cpu_id);
case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE:
return membarrier_register_private_expedited(MEMBARRIER_FLAG_SYNC_CORE);
+ case MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ:
+ return membarrier_private_expedited(MEMBARRIER_FLAG_RSEQ, cpu_id);
+ case MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_RSEQ:
+ return membarrier_register_private_expedited(MEMBARRIER_FLAG_RSEQ);
default:
return -EINVAL;
}
--
2.28.0.709.gb0816b6eb0-goog
[ Please CC me I am not subscribed to all MLs ]
[ CC Sami ]
Hi Bill,
I have tested your patch on top of Sami's latest clang-cfi Git branch.
Feel free to add...
Tested-by: Sedat Dilek <sedat.dilek(a)gmail.com> # LLVM toolchain
version 11.0.0-rc3 on x86-64
Thanks for the patch.
Regards,
- Sedat -
These patch series adds below kselftests to test the user-space support for the
ARMv8.5 Memory Tagging Extension present in arm64 tree [1].
1) This test-case verifies that the memory allocated by kernel mmap interface
can support tagged memory access. It first checks the presence of tags at
address[56:59] and then proceeds with read and write. The pass criteria for
this test is that tag fault exception should not happen.
2) This test-case crosses the valid memory to the invalid memory. In this
memory area valid tags are not inserted so read and write should not pass. The
pass criteria for this test is that tag fault exception should happen for all
the illegal addresses. This test also verfies that PSTATE.TCO works properly.
3) This test-case verifies that the memory inherited by child process from
parent process should have same tags copied. The pass criteria for this test is
that tag fault exception should not happen.
4) This test checks different mmap flags with PROT_MTE memory protection.
5) This testcase checks that KSM should not merge pages containing different
MTE tag values. However, if the tags are same then the pages may merge. This
testcase uses the generic ksm sysfs interfaces to verify the MTE behaviour, so
this testcase is not fullproof and may be impacted due to other load in the system.
6) Fifth test verifies that syscalls read/write etc works by considering that
user pointer has valid/invalid allocation tags.
To simplify the testing, a copy of the patchset on top of a recent linux
tree can be found at [2].
Thanks,
Amit Daniel
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/mte
[2]: http://linux-arm.org/git?p=linux-ak.git;a=shortlog;h=refs/heads/kselftest-m…
Amit Daniel Kachhap (6):
kselftest/arm64: Add utilities and a test to validate mte memory
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Check mte tagged user address in kernel
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/mte/.gitignore | 6 +
tools/testing/selftests/arm64/mte/Makefile | 29 ++
.../selftests/arm64/mte/check_buffer_fill.c | 476 ++++++++++++++++++
.../selftests/arm64/mte/check_child_memory.c | 195 +++++++
.../selftests/arm64/mte/check_ksm_options.c | 131 +++++
.../selftests/arm64/mte/check_mmap_options.c | 262 ++++++++++
.../arm64/mte/check_tags_inclusion.c | 183 +++++++
.../selftests/arm64/mte/check_user_mem.c | 118 +++++
.../selftests/arm64/mte/mte_common_util.c | 374 ++++++++++++++
.../selftests/arm64/mte/mte_common_util.h | 135 +++++
tools/testing/selftests/arm64/mte/mte_def.h | 26 +
.../testing/selftests/arm64/mte/mte_helper.S | 116 +++++
13 files changed, 2052 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/mte/.gitignore
create mode 100644 tools/testing/selftests/arm64/mte/Makefile
create mode 100644 tools/testing/selftests/arm64/mte/check_buffer_fill.c
create mode 100644 tools/testing/selftests/arm64/mte/check_child_memory.c
create mode 100644 tools/testing/selftests/arm64/mte/check_ksm_options.c
create mode 100644 tools/testing/selftests/arm64/mte/check_mmap_options.c
create mode 100644 tools/testing/selftests/arm64/mte/check_tags_inclusion.c
create mode 100644 tools/testing/selftests/arm64/mte/check_user_mem.c
create mode 100644 tools/testing/selftests/arm64/mte/mte_common_util.c
create mode 100644 tools/testing/selftests/arm64/mte/mte_common_util.h
create mode 100644 tools/testing/selftests/arm64/mte/mte_def.h
create mode 100644 tools/testing/selftests/arm64/mte/mte_helper.S
--
2.17.1
This patch series is a result of discussion at the refcount_t BOF
the Linux Plumbers Conference. In this discussion, we identifed
a need for looking closely and investigating atomic_t usages in
the kernel when it is used strictly as a counter wothout it
controlling object lifetimes and state changes.
There are a number of atomic_t usages in the kernel where atomic_t api
is used strictly for counting and not for managing object lifetime. In
some cases, atomic_t might not even be needed.
The purpose of these counters is twofold: 1. clearly differentiate
atomic_t counters from atomic_t usages that guard object lifetimes,
hence prone to overflow and underflow errors. It allows tools that scan
for underflow and overflow on atomic_t usages to detect overflow and
underflows to scan just the cases that are prone to errors. 2. provides
non-atomic counters for cases where atomic isn't necessary.
Simple atomic and non-atomic counters api provides interfaces for simple
atomic and non-atomic counters that just count, and don't guard resource
lifetimes. Counters will wrap around to 0 when it overflows and should
not be used to guard resource lifetimes, device usage and open counts
that control state changes, and pm states.
Using counter_atomic to guard lifetimes could lead to use-after free
when it overflows and undefined behavior when used to manage state
changes and device usage/open states.
This patch series introduces Simple atomic and non-atomic counters.
Counter atomic ops leverage atomic_t and provide a sub-set of atomic_t
ops.
In addition this patch series converts a few drivers to use the new api.
The following criteria is used for select variables for conversion:
1. Variable doesn't guard object lifetimes, manage state changes e.g:
device usage counts, device open counts, and pm states.
2. Variable is used for stats and counters.
3. The conversion doesn't change the overflow behavior.
Please review and let me know if non-stat conversions e.g: probe_count,
deferred_trigger_count make sense.
Shuah Khan (11):
counters: Introduce counter and counter_atomic counters
selftests:lib:test_counters: add new test for counters
drivers/base: convert deferred_trigger_count and probe_count to
counter_atomic
drivers/base/devcoredump: convert devcd_count to counter_atomic
drivers/acpi: convert seqno counter_atomic
drivers/acpi/apei: convert seqno counter_atomic
drivers/android/binder: convert stats, transaction_log to
counter_atomic
drivers/base/test/test_async_driver_probe: convert to use
counter_atomic
drivers/char/ipmi: convert stats to use counter_atomic
drivers/misc/vmw_vmci: convert num guest devices counter to
counter_atomic
drivers/edac: convert pci counters to counter_atomic
Documentation/core-api/counters.rst | 158 +++++++++
MAINTAINERS | 8 +
drivers/acpi/acpi_extlog.c | 5 +-
drivers/acpi/apei/ghes.c | 5 +-
drivers/android/binder.c | 41 +--
drivers/android/binder_internal.h | 3 +-
drivers/base/dd.c | 19 +-
drivers/base/devcoredump.c | 5 +-
drivers/base/test/test_async_driver_probe.c | 23 +-
drivers/char/ipmi/ipmi_msghandler.c | 9 +-
drivers/char/ipmi/ipmi_si_intf.c | 9 +-
drivers/edac/edac_pci.h | 5 +-
drivers/edac/edac_pci_sysfs.c | 28 +-
drivers/misc/vmw_vmci/vmci_guest.c | 9 +-
include/linux/counters.h | 343 +++++++++++++++++++
lib/Kconfig | 10 +
lib/Makefile | 1 +
lib/test_counters.c | 283 +++++++++++++++
tools/testing/selftests/lib/Makefile | 1 +
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/test_counters.sh | 5 +
21 files changed, 897 insertions(+), 74 deletions(-)
create mode 100644 Documentation/core-api/counters.rst
create mode 100644 include/linux/counters.h
create mode 100644 lib/test_counters.c
create mode 100755 tools/testing/selftests/lib/test_counters.sh
--
2.25.1
This series attempts to provide a simple way for BPF programs (and in
future other consumers) to utilize BPF Type Format (BTF) information
to display kernel data structures in-kernel. The use case this
functionality is applied to here is to support a snprintf()-like
helper to copy a BTF representation of kernel data to a string,
and a BPF seq file helper to display BTF data for an iterator.
There is already support in kernel/bpf/btf.c for "show" functionality;
the changes here generalize that support from seq-file specific
verifier display to the more generic case and add another specific
use case; rather than seq_printf()ing the show data, it is copied
to a supplied string using a snprintf()-like function. Other future
consumers of the show functionality could include a bpf_printk_btf()
function which printk()ed the data instead. Oops messaging in
particular would be an interesting application for such functionality.
The above potential use case hints at a potential reply to
a reasonable objection that such typed display should be
solved by tracing programs, where the in-kernel tracing records
data and the userspace program prints it out. While this
is certainly the recommended approach for most cases, I
believe having an in-kernel mechanism would be valuable
also. Critically in BPF programs it greatly simplifies
debugging and tracing of such data to invoking a simple
helper.
One challenge raised in an earlier iteration of this work -
where the BTF printing was implemented as a printk() format
specifier - was that the amount of data printed per
printk() was large, and other format specifiers were far
simpler. Here we sidestep that concern by printing
components of the BTF representation as we go for the
seq file case, and in the string case the snprintf()-like
operation is intended to be a basis for perf event or
ringbuf output. The reasons for avoiding bpf_trace_printk
are that
1. bpf_trace_printk() strings are restricted in size and
cannot display anything beyond trivial data structures; and
2. bpf_trace_printk() is for debugging purposes only.
As Alexei suggested, a bpf_trace_puts() helper could solve
this in the future but it still would be limited by the
1000 byte limit for traced strings.
Default output for an sk_buff looks like this (zeroed fields
are omitted):
(struct sk_buff){
.transport_header = (__u16)65535,
.mac_header = (__u16)65535,
.end = (sk_buff_data_t)192,
.head = (unsigned char *)0x000000007524fd8b,
.data = (unsigned char *)0x000000007524fd8b,
.truesize = (unsigned int)768,
.users = (refcount_t){
.refs = (atomic_t){
.counter = (int)1,
},
},
}
Flags can modify aspects of output format; see patch 3
for more details.
Changes since v4:
- Changed approach from a BPF trace event-centric design to one
utilizing a snprintf()-like helper and an iter helper (Alexei,
patches 3,5)
- Added tests to verify BTF output (patch 4)
- Added support to tests for verifying BTF type_id-based display
as well as type name via __builtin_btf_type_id (Andrii, patch 4).
- Augmented task iter tests to cover the BTF-based seq helper.
Because a task_struct's BTF-based representation would overflow
the PAGE_SIZE limit on iterator data, the "struct fs_struct"
(task->fs) is displayed for each task instead (Alexei, patch 6).
Changes since v3:
- Moved to RFC since the approach is different (and bpf-next is
closed)
- Rather than using a printk() format specifier as the means
of invoking BTF-enabled display, a dedicated BPF helper is
used. This solves the issue of printk() having to output
large amounts of data using a complex mechanism such as
BTF traversal, but still provides a way for the display of
such data to be achieved via BPF programs. Future work could
include a bpf_printk_btf() function to invoke display via
printk() where the elements of a data structure are printk()ed
one at a time. Thanks to Petr Mladek, Andy Shevchenko and
Rasmus Villemoes who took time to look at the earlier printk()
format-specifier-focused version of this and provided feedback
clarifying the problems with that approach.
- Added trace id to the bpf_trace_printk events as a means of
separating output from standard bpf_trace_printk() events,
ensuring it can be easily parsed by the reader.
- Added bpf_trace_btf() helper tests which do simple verification
of the various display options.
Changes since v2:
- Alexei and Yonghong suggested it would be good to use
probe_kernel_read() on to-be-shown data to ensure safety
during operation. Safe copy via probe_kernel_read() to a
buffer object in "struct btf_show" is used to support
this. A few different approaches were explored
including dynamic allocation and per-cpu buffers. The
downside of dynamic allocation is that it would be done
during BPF program execution for bpf_trace_printk()s using
%pT format specifiers. The problem with per-cpu buffers
is we'd have to manage preemption and since the display
of an object occurs over an extended period and in printk
context where we'd rather not change preemption status,
it seemed tricky to manage buffer safety while considering
preemption. The approach of utilizing stack buffer space
via the "struct btf_show" seemed like the simplest approach.
The stack size of the associated functions which have a
"struct btf_show" on their stack to support show operation
(btf_type_snprintf_show() and btf_type_seq_show()) stays
under 500 bytes. The compromise here is the safe buffer we
use is small - 256 bytes - and as a result multiple
probe_kernel_read()s are needed for larger objects. Most
objects of interest are smaller than this (e.g.
"struct sk_buff" is 224 bytes), and while task_struct is a
notable exception at ~8K, performance is not the priority for
BTF-based display. (Alexei and Yonghong, patch 2).
- safe buffer use is the default behaviour (and is mandatory
for BPF) but unsafe display - meaning no safe copy is done
and we operate on the object itself - is supported via a
'u' option.
- pointers are prefixed with 0x for clarity (Alexei, patch 2)
- added additional comments and explanations around BTF show
code, especially around determining whether objects such
zeroed. Also tried to comment safe object scheme used. (Yonghong,
patch 2)
- added late_initcall() to initialize vmlinux BTF so that it would
not have to be initialized during printk operation (Alexei,
patch 5)
- removed CONFIG_BTF_PRINTF config option as it is not needed;
CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
determining behaviour of type-based printk can be done via
retrieval of BTF data; if it's not there BTF was unavailable
or broken (Alexei, patches 4,6)
- fix bpf_trace_printk test to use vmlinux.h and globals via
skeleton infrastructure, removing need for perf events
(Andrii, patch 8)
Changes since v1:
- changed format to be more drgn-like, rendering indented type info
along with type names by default (Alexei)
- zeroed values are omitted (Arnaldo) by default unless the '0'
modifier is specified (Alexei)
- added an option to print pointer values without obfuscation.
The reason to do this is the sysctls controlling pointer display
are likely to be irrelevant in many if not most tracing contexts.
Some questions on this in the outstanding questions section below...
- reworked printk format specifer so that we no longer rely on format
%pT<type> but instead use a struct * which contains type information
(Rasmus). This simplifies the printk parsing, makes use more dynamic
and also allows specification by BTF id as well as name.
- removed incorrect patch which tried to fix dereferencing of resolved
BTF info for vmlinux; instead we skip modifiers for the relevant
case (array element type determination) (Alexei).
- fixed issues with negative snprintf format length (Rasmus)
- added test cases for various data structure formats; base types,
typedefs, structs, etc.
- tests now iterate through all typedef, enum, struct and unions
defined for vmlinux BTF and render a version of the target dummy
value which is either all zeros or all 0xff values; the idea is this
exercises the "skip if zero" and "print everything" cases.
- added support in BPF for using the %pT format specifier in
bpf_trace_printk()
- added BPF tests which ensure %pT format specifier use works (Alexei).
Alan Maguire (6):
bpf: provide function to get vmlinux BTF information
bpf: move to generic BTF show support, apply it to seq files/strings
bpf: add bpf_btf_snprintf helper
selftests/bpf: add bpf_btf_snprintf helper tests
bpf: add bpf_seq_btf_write helper
selftests/bpf: add test for bpf_seq_btf_write helper
include/linux/bpf.h | 3 +
include/linux/btf.h | 40 +
include/uapi/linux/bpf.h | 78 ++
kernel/bpf/btf.c | 978 ++++++++++++++++++---
kernel/bpf/helpers.c | 4 +
kernel/bpf/verifier.c | 18 +-
kernel/trace/bpf_trace.c | 133 +++
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 78 ++
tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 66 ++
.../selftests/bpf/prog_tests/btf_snprintf.c | 55 ++
.../selftests/bpf/progs/bpf_iter_task_btf.c | 49 ++
.../selftests/bpf/progs/netif_receive_skb.c | 260 ++++++
13 files changed, 1656 insertions(+), 108 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/btf_snprintf.c
create mode 100644 tools/testing/selftests/bpf/progs/bpf_iter_task_btf.c
create mode 100644 tools/testing/selftests/bpf/progs/netif_receive_skb.c
--
1.8.3.1
v1: https://lore.kernel.org/lkml/20200912110820.597135-1-keescook@chromium.org
v2:
- Took Acked patches into -next
- refactored powerpc syscall setting implementation
- refactored clone3 args implementation
Hi,
This finishes the refactoring of the seccomp selftest logic used in
for ptrace syscall number/return handling for powerpc. Additionally
fixes clone3 (which seccomp depends on for testing) to run under MIPS
where an old struct clone_args has become visible.
(FWIW, I expect to take these via the seccomp tree.)
Thanks,
Kees Cook (4):
selftests/seccomp: Record syscall during ptrace entry
selftests/seccomp: Allow syscall nr and ret value to be set separately
selftests/seccomp: powerpc: Set syscall return during ptrace syscall
exit
selftests/clone3: Avoid OS-defined clone_args
tools/testing/selftests/clone3/clone3.c | 45 +++----
.../clone3/clone3_cap_checkpoint_restore.c | 4 +-
.../selftests/clone3/clone3_clear_sighand.c | 2 +-
.../selftests/clone3/clone3_selftests.h | 24 ++--
.../testing/selftests/clone3/clone3_set_tid.c | 4 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 120 ++++++++++++++----
6 files changed, 131 insertions(+), 68 deletions(-)
--
2.25.1
Hi,
This refactors the seccomp selftest macros used in change_syscall(),
in an effort to remove special cases for mips, arm, arm64, and xtensa,
which paves the way for powerpc fixes.
I'm not entirely done testing, but all-arch build tests and x86_64
selftests pass. I'll be doing arm, arm64, and i386 selftests shortly,
but I currently don't have an easy way to check xtensa, mips, nor
powerpc. Any help there would be appreciated!
(FWIW, I expect to take these via the seccomp tree.)
Thanks,
-Kees
Kees Cook (15):
selftests/seccomp: Refactor arch register macros to avoid xtensa
special case
selftests/seccomp: Provide generic syscall setting macro
selftests/seccomp: mips: Define SYSCALL_NUM_SET macro
selftests/seccomp: arm: Define SYSCALL_NUM_SET macro
selftests/seccomp: arm64: Define SYSCALL_NUM_SET macro
selftests/seccomp: mips: Remove O32-specific macro
selftests/seccomp: Remove syscall setting #ifdefs
selftests/seccomp: Convert HAVE_GETREG into ARCH_GETREG/ARCH_SETREG
selftests/seccomp: Convert REGSET calls into ARCH_GETREG/ARCH_SETREG
selftests/seccomp: Avoid redundant register flushes
selftests/seccomp: Remove SYSCALL_NUM_RET_SHARE_REG in favor of
SYSCALL_RET_SET
selftests/seccomp: powerpc: Fix seccomp return value testing
selftests/seccomp: powerpc: Set syscall return during ptrace syscall
exit
selftests/clone3: Avoid OS-defined clone_args
selftests/seccomp: Use __NR_mknodat instead of __NR_mknod
.../selftests/clone3/clone3_selftests.h | 16 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 313 ++++++++++--------
2 files changed, 184 insertions(+), 145 deletions(-)
--
2.25.1
This series imports a series of tests for FPSIMD and SVE originally
written by Dave Martin to the tree. Since these extensions have some
overlap in terms of register usage and must sometimes be tested together
they're dropped into a single directory. I've adapted some of the tests
to run within the kselftest framework but there are also some stress
tests here that are intended to be run as soak tests so aren't suitable
for running by default and are mostly just integrated with the build
system. There doesn't seem to be a more suitable home for those stress
tests and they are very useful for work on these areas of the code so it
seems useful to have them somewhere in tree.
v2: Rebased onto v5.9-rc1
Mark Brown (6):
selftests: arm64: Test case for enumeration of SVE vector lengths
selftests: arm64: Add test for the SVE ptrace interface
selftests: arm64: Add stress tests for FPSMID and SVE context
switching
selftests: arm64: Add utility to set SVE vector lengths
selftests: arm64: Add wrapper scripts for stress tests
selftests: arm64: Add build and documentation for FP tests
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/fp/.gitignore | 5 +
tools/testing/selftests/arm64/fp/Makefile | 17 +
tools/testing/selftests/arm64/fp/README | 100 +++
.../testing/selftests/arm64/fp/asm-offsets.h | 11 +
tools/testing/selftests/arm64/fp/assembler.h | 57 ++
.../testing/selftests/arm64/fp/fpsimd-stress | 60 ++
.../testing/selftests/arm64/fp/fpsimd-test.S | 482 +++++++++++++
.../selftests/arm64/fp/sve-probe-vls.c | 58 ++
.../selftests/arm64/fp/sve-ptrace-asm.S | 33 +
tools/testing/selftests/arm64/fp/sve-ptrace.c | 336 +++++++++
tools/testing/selftests/arm64/fp/sve-stress | 59 ++
tools/testing/selftests/arm64/fp/sve-test.S | 672 ++++++++++++++++++
tools/testing/selftests/arm64/fp/vlset.c | 155 ++++
14 files changed, 2046 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/fp/.gitignore
create mode 100644 tools/testing/selftests/arm64/fp/Makefile
create mode 100644 tools/testing/selftests/arm64/fp/README
create mode 100644 tools/testing/selftests/arm64/fp/asm-offsets.h
create mode 100644 tools/testing/selftests/arm64/fp/assembler.h
create mode 100755 tools/testing/selftests/arm64/fp/fpsimd-stress
create mode 100644 tools/testing/selftests/arm64/fp/fpsimd-test.S
create mode 100644 tools/testing/selftests/arm64/fp/sve-probe-vls.c
create mode 100644 tools/testing/selftests/arm64/fp/sve-ptrace-asm.S
create mode 100644 tools/testing/selftests/arm64/fp/sve-ptrace.c
create mode 100755 tools/testing/selftests/arm64/fp/sve-stress
create mode 100644 tools/testing/selftests/arm64/fp/sve-test.S
create mode 100644 tools/testing/selftests/arm64/fp/vlset.c
--
2.20.1
Pointer Authentication (PAuth) is a security feature introduced in ARMv8.3.
It introduces instructions to sign addresses and later check for potential
corruption using a second modifier value and one of a set of keys. The
signature, in the form of the Pointer Authentication Code (PAC), is stored
in some of the top unused bits of the virtual address (e.g. [54: 49] if
TBID0 is enabled and TnSZ is set to use a 48 bit VA space). A set of
controls are present to enable/disable groups of instructions (which use
certain keys) for compatibility with libraries that do not utilize the
feature. PAuth is used to verify the integrity of return addresses on the
stack with less memory than the stack canary.
This patchset adds kselftests to verify the kernel's configuration of the
feature and its runtime behaviour. There are 7 tests which verify that:
* an authentication failure leads to a SIGSEGV
* the data/instruction instruction groups are enabled
* the generic instructions are enabled
* all 5 keys are different for a single thread
* exec() changes all keys to new different ones
* context switching preserves the 4 data/instruction keys
* context switching preserves the generic keys
The tests have been verified to work on qemu without a working PAUTH
Implementation and on ARM's FVP with a full or partial PAuth
implementation.
Changes in v3:
* remove double blank lines
* Patch 1: "kselftests: add a basic arm64 Pointer Authentication test"
* shorten pac_corruptor.S to cut out unnecessary code
* add second signal handler to cover ARMv8.6 compatibily
* Patch 3: "kselftests/arm64: add PAuth test for whether exec() changes keys"
* change name of "exec_unique_keys" to "exec_changed_keys"
* change reporting of error to be how many keys were left unchanged
* Path 4: "kselftests/arm64: add PAuth tests for single threaded consistency and key uniqueness"
* change unique to different
* rename "single_thread_unique_keys" to "single_thread_different_keys"
* change reporting of error to be how many keys were left unchanged
Changes in v2:
* remove extra lines at end of files
* Patch 1: "kselftests: add a basic arm64 Pointer Authentication test"
* add checks for a compatible compiler in Makefile
* Patch 4: "kselftests: add PAuth tests for single threaded consistency and
key uniqueness"
* rephrase comment for clarity in pac.c
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino(a)arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap(a)arm.com>
Signed-off-by: Boyan Karatotev <boyan.karatotev(a)arm.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
Boyan Karatotev (4):
kselftests/arm64: add a basic Pointer Authentication test
kselftests/arm64: add nop checks for PAuth tests
kselftests/arm64: add PAuth test for whether exec() changes keys
kselftests/arm64: add PAuth tests for single threaded consistency and
differently initialized keys
tools/testing/selftests/arm64/Makefile | 2 +-
.../testing/selftests/arm64/pauth/.gitignore | 2 +
tools/testing/selftests/arm64/pauth/Makefile | 39 ++
.../selftests/arm64/pauth/exec_target.c | 34 ++
tools/testing/selftests/arm64/pauth/helper.c | 39 ++
tools/testing/selftests/arm64/pauth/helper.h | 28 ++
tools/testing/selftests/arm64/pauth/pac.c | 368 ++++++++++++++++++
.../selftests/arm64/pauth/pac_corruptor.S | 19 +
8 files changed, 530 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/pauth/.gitignore
create mode 100644 tools/testing/selftests/arm64/pauth/Makefile
create mode 100644 tools/testing/selftests/arm64/pauth/exec_target.c
create mode 100644 tools/testing/selftests/arm64/pauth/helper.c
create mode 100644 tools/testing/selftests/arm64/pauth/helper.h
create mode 100644 tools/testing/selftests/arm64/pauth/pac.c
create mode 100644 tools/testing/selftests/arm64/pauth/pac_corruptor.S
--
2.28.0
The test harness forks() a child to run each test. Both the parent and
the child print to stdout using libc functions. That can lead to
duplicated (or more) output if the libc buffers are not flushed before
forking.
It's generally not seen when running programs directly, because stdout
will usually be line buffered when it's pointing to a terminal.
This was noticed when running the seccomp_bpf test, eg:
$ ./seccomp_bpf | tee test.log
$ grep -c "TAP version 13" test.log
2
But we only expect the TAP header to appear once.
It can be exacerbated using stdbuf to increase the buffer size:
$ stdbuf -o 1MB ./seccomp_bpf > test.log
$ grep -c "TAP version 13" test.log
13
The fix is simple, we just flush stdout & stderr before fork. Usually
stderr is unbuffered, but that can be changed, so flush it as well
just to be safe.
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
tools/testing/selftests/kselftest_harness.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index 4f78e4805633..f19804df244c 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -971,6 +971,11 @@ void __run_test(struct __fixture_metadata *f,
ksft_print_msg(" RUN %s%s%s.%s ...\n",
f->name, variant->name[0] ? "." : "", variant->name, t->name);
+
+ /* Make sure output buffers are flushed before fork */
+ fflush(stdout);
+ fflush(stderr);
+
t->pid = fork();
if (t->pid < 0) {
ksft_print_msg("ERROR SPAWNING TEST CHILD\n");
--
2.25.1
Previously it was not possible to make a distinction between plain TCP
sockets and MPTCP subflow sockets on the BPF_PROG_TYPE_SOCK_OPS hook.
This patch series now enables a fine control of subflow sockets. In its
current state, it allows to put different sockopt on each subflow from a
same MPTCP connection (socket mark, TCP congestion algorithm, ...) using
BPF programs.
It should also be the basis of exposing MPTCP-specific fields through BPF.
v2 -> v3:
- minor modifications in new MPTCP selftests (Song). More details in patch notes.
- rebase on latest bpf-next
- the new is_mptcp field in bpf_tcp_sock is left as an __u32 to keep cohesion
with the is_fullsock field from bpf_sock_ops. Also it seems easier with a __u32
on the verifier side.
v1 -> v2:
- add basic mandatory selftests for the new helper and is_mptcp field (Alexei)
- rebase on latest bpf-next
Nicolas Rybowski (5):
bpf: expose is_mptcp flag to bpf_tcp_sock
mptcp: attach subflow socket to parent cgroup
bpf: add 'bpf_mptcp_sock' structure and helper
bpf: selftests: add MPTCP test base
bpf: selftests: add bpf_mptcp_sock() verifier tests
include/linux/bpf.h | 33 +++++
include/uapi/linux/bpf.h | 15 +++
kernel/bpf/verifier.c | 30 +++++
net/core/filter.c | 13 +-
net/mptcp/Makefile | 2 +
net/mptcp/bpf.c | 72 +++++++++++
net/mptcp/subflow.c | 27 ++++
scripts/bpf_helpers_doc.py | 2 +
tools/include/uapi/linux/bpf.h | 15 +++
tools/testing/selftests/bpf/config | 1 +
tools/testing/selftests/bpf/network_helpers.c | 37 +++++-
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../testing/selftests/bpf/prog_tests/mptcp.c | 118 ++++++++++++++++++
tools/testing/selftests/bpf/progs/mptcp.c | 48 +++++++
tools/testing/selftests/bpf/verifier/sock.c | 63 ++++++++++
15 files changed, 473 insertions(+), 6 deletions(-)
create mode 100644 net/mptcp/bpf.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/mptcp.c
create mode 100644 tools/testing/selftests/bpf/progs/mptcp.c
--
2.28.0
As pointed out by Michael Ellerman, the ptrace ABI on powerpc does not
allow or require the return code to be set on syscall entry when
skipping the syscall. It will always return ENOSYS and the return code
must be set on syscall exit.
This code does that, behaving more similarly to strace. It still sets
the return code on entry, which is overridden on powerpc, and it will
always repeat the same on exit. Also, on powerpc, the errno is not
inverted, and depends on ccr.so being set.
This has been tested on powerpc and amd64.
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Kees Cook <keescook(a)google.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 24 +++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 252140a52553..b90a9190ba88 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1738,6 +1738,14 @@ void change_syscall(struct __test_metadata *_metadata,
TH_LOG("Can't modify syscall return on this architecture");
#else
regs.SYSCALL_RET = result;
+# if defined(__powerpc__)
+ if (result < 0) {
+ regs.SYSCALL_RET = -result;
+ regs.ccr |= 0x10000000;
+ } else {
+ regs.ccr &= ~0x10000000;
+ }
+# endif
#endif
#ifdef HAVE_GETREGS
@@ -1796,6 +1804,7 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
int ret, nr;
unsigned long msg;
static bool entry;
+ int *syscall_nr = args;
/*
* The traditional way to tell PTRACE_SYSCALL entry/exit
@@ -1809,10 +1818,15 @@ void tracer_ptrace(struct __test_metadata *_metadata, pid_t tracee,
EXPECT_EQ(entry ? PTRACE_EVENTMSG_SYSCALL_ENTRY
: PTRACE_EVENTMSG_SYSCALL_EXIT, msg);
- if (!entry)
+ if (!entry && !syscall_nr)
return;
- nr = get_syscall(_metadata, tracee);
+ if (entry)
+ nr = get_syscall(_metadata, tracee);
+ else
+ nr = *syscall_nr;
+ if (syscall_nr)
+ *syscall_nr = nr;
if (nr == __NR_getpid)
change_syscall(_metadata, tracee, __NR_getppid, 0);
@@ -1889,9 +1903,10 @@ TEST_F(TRACE_syscall, ptrace_syscall_redirected)
TEST_F(TRACE_syscall, ptrace_syscall_errno)
{
+ int syscall_nr = -1;
/* Swap SECCOMP_RET_TRACE tracer for PTRACE_SYSCALL tracer. */
teardown_trace_fixture(_metadata, self->tracer);
- self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, NULL,
+ self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, &syscall_nr,
true);
/* Tracer should skip the open syscall, resulting in ESRCH. */
@@ -1900,9 +1915,10 @@ TEST_F(TRACE_syscall, ptrace_syscall_errno)
TEST_F(TRACE_syscall, ptrace_syscall_faked)
{
+ int syscall_nr = -1;
/* Swap SECCOMP_RET_TRACE tracer for PTRACE_SYSCALL tracer. */
teardown_trace_fixture(_metadata, self->tracer);
- self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, NULL,
+ self->tracer = setup_trace_fixture(_metadata, tracer_ptrace, &syscall_nr,
true);
/* Tracer should skip the gettid syscall, resulting fake pid. */
--
2.25.1
In case of errors, this message was printed:
(...)
# read: Resource temporarily unavailable
# client exit code 0, server 3
# \nnetns ns1-0-BJlt5D socket stat for 10003:
(...)
Obviously, the idea was to add a new line before the socket stat and not
print "\nnetns".
Fixes: b08fbf241064 ("selftests: add test-cases for MPTCP MP_JOIN")
Fixes: 048d19d444be ("mptcp: add basic kselftest for mptcp")
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Notes:
This commit improves the output in selftests in case of errors, mostly
seen when modifying MPTCP code. The selftests behaviour is not changed.
That's why this patch is proposed only for net-next.
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 4 ++--
tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index e4df9ba64824..2cfd87d94db8 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -443,9 +443,9 @@ do_transfer()
duration=$(printf "(duration %05sms)" $duration)
if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
echo "$duration [ FAIL ] client exit code $retc, server $rets" 1>&2
- echo "\nnetns ${listener_ns} socket stat for $port:" 1>&2
+ echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2
ip netns exec ${listener_ns} ss -nita 1>&2 -o "sport = :$port"
- echo "\nnetns ${connector_ns} socket stat for $port:" 1>&2
+ echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2
ip netns exec ${connector_ns} ss -nita 1>&2 -o "dport = :$port"
cat "$capout"
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index f39c1129ce5f..c2943e4dfcfe 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -176,9 +176,9 @@ do_transfer()
if [ ${rets} -ne 0 ] || [ ${retc} -ne 0 ]; then
echo " client exit code $retc, server $rets" 1>&2
- echo "\nnetns ${listener_ns} socket stat for $port:" 1>&2
+ echo -e "\nnetns ${listener_ns} socket stat for ${port}:" 1>&2
ip netns exec ${listener_ns} ss -nita 1>&2 -o "sport = :$port"
- echo "\nnetns ${connector_ns} socket stat for $port:" 1>&2
+ echo -e "\nnetns ${connector_ns} socket stat for ${port}:" 1>&2
ip netns exec ${connector_ns} ss -nita 1>&2 -o "dport = :$port"
cat "$capout"
--
2.27.0
On 9/15/20 11:52 AM, Justin Cook wrote:
> Hello,
>
> Linaro had previously been sending out a report based on our testing of
> the linux kernel using kselftest. We paused sending that report to fix a
> few issues. We are now continuing the process, starting with this report.
>
> If you have any questions, comments, feedback, or concerns please email
> lkft(a)linaro.org <mailto:lkft@linaro.org>.
>
> Thanks,
>
> Justin
>
Hi Justin,
Thanks for the report. It would be nice to see the reports. However, it
is hard for me to determine which tests failed and why.
> On Tue, 15 Sep 2020 at 12:44, LKFT <lkft(a)linaro.org
> <mailto:lkft@linaro.org>> wrote:
>
> ## Kernel
> * kernel: 5.9.0-rc5
> * git repo:
> ['https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git',
> 'https://gitlab.com/Linaro/lkft/mirrors/next/linux-next']
> * git branch: master
> * git commit: 6b02addb1d1748d21dd1261e46029b264be4e5a0
> * git describe: next-20200915
> * Test details:
> https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20200915
>
> ## Regressions (compared to build next-20200914)
>
> juno-r2:
> kselftest:
> * memfd_memfd_test
>
> x86:
> kselftest-vsyscall-mode-native:
> * kvm_vmx_preemption_timer_test
I looked for the above two failures to start with since these
are regressions and couldn't find them.
Are the regressions tied to new commits in linux-next from the
mm and kvm trees?
thanks,
-- Shuah
Pointer Authentication (PAuth) is a security feature introduced in ARMv8.3.
It introduces instructions to sign addresses and later check for potential
corruption using a second modifier value and one of a set of keys. The
signature, in the form of the Pointer Authentication Code (PAC), is stored
in some of the top unused bits of the virtual address (e.g. [54: 49] if
TBID0 is enabled and TnSZ is set to use a 48 bit VA space). A set of
controls are present to enable/disable groups of instructions (which use
certain keys) for compatibility with libraries that do not utilize the
feature. PAuth is used to verify the integrity of return addresses on the
stack with less memory than the stack canary.
This patchset adds kselftests to verify the kernel's configuration of the
feature and its runtime behaviour. There are 7 tests which verify that:
* an authentication failure leads to a SIGSEGV
* the data/instruction instruction groups are enabled
* the generic instructions are enabled
* all 5 keys are unique for a single thread
* exec() changes all keys to new unique ones
* context switching preserves the 4 data/instruction keys
* context switching preserves the generic keys
The tests have been verified to work on qemu without a working PAUTH
Implementation and on ARM's FVP with a full or partial PAuth
implementation.
Changes in v2:
* remove extra lines at end of files
* Patch 1: "kselftests: add a basic arm64 Pointer Authentication test"
* add checks for a compatible compiler in Makefile
* Patch 4: "kselftests: add PAuth tests for single threaded consistency and
key uniqueness"
* rephrase comment for clarity in pac.c
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Reviewed-by: Vincenzo Frascino <Vincenzo.Frascino(a)arm.com>
Reviewed-by: Amit Daniel Kachhap <amit.kachhap(a)arm.com>
Signed-off-by: Boyan Karatotev <boyan.karatotev(a)arm.com>
Boyan Karatotev (4):
kselftests/arm64: add a basic Pointer Authentication test
kselftests/arm64: add nop checks for PAuth tests
kselftests/arm64: add PAuth test for whether exec() changes keys
kselftests/arm64: add PAuth tests for single threaded consistency and
key uniqueness
tools/testing/selftests/arm64/Makefile | 2 +-
.../testing/selftests/arm64/pauth/.gitignore | 2 +
tools/testing/selftests/arm64/pauth/Makefile | 39 ++
.../selftests/arm64/pauth/exec_target.c | 35 ++
tools/testing/selftests/arm64/pauth/helper.c | 40 ++
tools/testing/selftests/arm64/pauth/helper.h | 29 ++
tools/testing/selftests/arm64/pauth/pac.c | 348 ++++++++++++++++++
.../selftests/arm64/pauth/pac_corruptor.S | 35 ++
8 files changed, 529 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/pauth/.gitignore
create mode 100644 tools/testing/selftests/arm64/pauth/Makefile
create mode 100644 tools/testing/selftests/arm64/pauth/exec_target.c
create mode 100644 tools/testing/selftests/arm64/pauth/helper.c
create mode 100644 tools/testing/selftests/arm64/pauth/helper.h
create mode 100644 tools/testing/selftests/arm64/pauth/pac.c
create mode 100644 tools/testing/selftests/arm64/pauth/pac_corruptor.S
--
2.17.1
On Mon, Sep 14, 2020 at 11:55:24PM +0200, Thomas Gleixner wrote:
> But just look at any check which uses preemptible(), especially those
> which check !preemptible():
hmm.
+++ b/include/linux/preempt.h
@@ -180,7 +180,9 @@ do { \
#define preempt_enable_no_resched() sched_preempt_enable_no_resched()
+#ifndef MODULE
#define preemptible() (preempt_count() == 0 && !irqs_disabled())
+#endif
#ifdef CONFIG_PREEMPTION
#define preempt_enable() \
$ git grep -w preemptible drivers
(slightly trimmed by hand to remove, eg, comments)
drivers/firmware/arm_sdei.c: WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c: WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c: WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c: WARN_ON_ONCE(preemptible());
drivers/firmware/arm_sdei.c: WARN_ON(preemptible());
drivers/firmware/efi/efi-pstore.c: preemptible(), record->size, record->psi->buf);
drivers/irqchip/irq-gic-v4.c: WARN_ON(preemptible());
drivers/irqchip/irq-gic-v4.c: WARN_ON(preemptible());
drivers/scsi/hisi_sas/hisi_sas_main.c: if (!preemptible())
drivers/xen/time.c: BUG_ON(preemptible());
That only looks like two drivers that need more than WARNectomies.
Although maybe rcu_read_load_sched_held() or rcu_read_lock_any_held()
might get called from a module ...
Pointer Authentication (PAuth) is a security feature introduced in ARMv8.3.
It introduces instructions to sign addresses and later check for potential
corruption using a second modifier value and one of a set of keys. The
signature, in the form of the Pointer Authentication Code (PAC), is stored
in some of the top unused bits of the virtual address (e.g. [54: 49] if
TBID0 is enabled and TnSZ is set to use a 48 bit VA space). A set of
controls are present to enable/disable groups of instructions (which use
certain keys) for compatibility with libraries that do not utilize the
feature. PAuth is used to verify the integrity of return addresses on the
stack with less memory than the stack canary.
This patchset adds kselftests to verify the kernel's configuration of the
feature and its runtime behaviour. There are 7 tests which verify that:
* an authentication failure leads to a SIGSEGV
* the data/instruction instruction groups are enabled
* the generic instructions are enabled
* all 5 keys are unique for a single thread
* exec() changes all keys to new unique ones
* context switching preserves the 4 data/instruction keys
* context switching preserves the generic keys
The tests have been verified to work on qemu without a working PAUTH
Implementation and on ARM's FVP with a full or partial PAuth
implementation.
Note: This patchset is only verified for ARMv8.3 and there will be some
changes required for ARMv8.6. More details can be found here [1]. Once
ARMv8.6 PAuth is merged the first test in this series will required to be
updated.
[1] https://lore.kernel.org/linux-arm-kernel/1597734671-23407-1-git-send-email-…
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Signed-off-by: Boyan Karatotev <boyan.karatotev(a)arm.com>
Boyan Karatotev (4):
kselftests/arm64: add a basic Pointer Authentication test
kselftests/arm64: add nop checks for PAuth tests
kselftests/arm64: add PAuth test for whether exec() changes keys
kselftests/arm64: add PAuth tests for single threaded consistency and
key uniqueness
tools/testing/selftests/arm64/Makefile | 2 +-
.../testing/selftests/arm64/pauth/.gitignore | 2 +
tools/testing/selftests/arm64/pauth/Makefile | 29 ++
.../selftests/arm64/pauth/exec_target.c | 35 ++
tools/testing/selftests/arm64/pauth/helper.c | 41 +++
tools/testing/selftests/arm64/pauth/helper.h | 30 ++
tools/testing/selftests/arm64/pauth/pac.c | 347 ++++++++++++++++++
.../selftests/arm64/pauth/pac_corruptor.S | 36 ++
8 files changed, 521 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/pauth/.gitignore
create mode 100644 tools/testing/selftests/arm64/pauth/Makefile
create mode 100644 tools/testing/selftests/arm64/pauth/exec_target.c
create mode 100644 tools/testing/selftests/arm64/pauth/helper.c
create mode 100644 tools/testing/selftests/arm64/pauth/helper.h
create mode 100644 tools/testing/selftests/arm64/pauth/pac.c
create mode 100644 tools/testing/selftests/arm64/pauth/pac_corruptor.S
--
2.17.1
On 14/09/20 21:42, Thomas Gleixner wrote:
> CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
> removed. Cleanup the leftovers before doing so.
>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: Ingo Molnar <mingo(a)redhat.com>
> Cc: Peter Zijlstra <peterz(a)infradead.org>
> Cc: Juri Lelli <juri.lelli(a)redhat.com>
> Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
> Cc: Steven Rostedt <rostedt(a)goodmis.org>
> Cc: Ben Segall <bsegall(a)google.com>
> Cc: Mel Gorman <mgorman(a)suse.de>
> Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Small nit below;
Reviewed-by: Valentin Schneider <valentin.schneider(a)arm.com>
> ---
> kernel/sched/core.c | 6 +-----
> lib/Kconfig.debug | 1 -
> 2 files changed, 1 insertion(+), 6 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3706,8 +3706,7 @@ asmlinkage __visible void schedule_tail(
> * finish_task_switch() for details.
> *
> * finish_task_switch() will drop rq->lock() and lower preempt_count
> - * and the preempt_enable() will end up enabling preemption (on
> - * PREEMPT_COUNT kernels).
I suppose this wanted to be s/PREEMPT_COUNT/PREEMPT/ in the first place,
which ought to be still relevant.
> + * and the preempt_enable() will end up enabling preemption.
> */
>
> rq = finish_task_switch(prev);
On 14/09/20 21:42, Thomas Gleixner wrote:
> CONFIG_PREEMPT_COUNT is now unconditionally enabled and will be
> removed. Cleanup the leftovers before doing so.
>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Cc: Ingo Molnar <mingo(a)kernel.org>
> Cc: Peter Zijlstra <peterz(a)infradead.org>
> Cc: Juri Lelli <juri.lelli(a)redhat.com>
> Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
> Cc: Steven Rostedt <rostedt(a)goodmis.org>
> Cc: Ben Segall <bsegall(a)google.com>
> Cc: Mel Gorman <mgorman(a)suse.de>
> Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Reviewed-by: Valentin Schneider <valentin.schneider(a)arm.com>
On Mon, Sep 14, 2020 at 01:59:15PM -0700, Linus Torvalds wrote:
> On Mon, Sep 14, 2020 at 1:45 PM Thomas Gleixner <tglx(a)linutronix.de> wrote:
> >
> > Recently merged code does:
> >
> > gfp = preemptible() ? GFP_KERNEL : GFP_ATOMIC;
> >
> > Looks obviously correct, except for the fact that preemptible() is
> > unconditionally false for CONFIF_PREEMPT_COUNT=n, i.e. all allocations in
> > that code use GFP_ATOMIC on such kernels.
>
> I don't think this is a good reason to entirely get rid of the no-preempt thing.
>
> The above is just garbage. It's bogus. You can't do it.
>
> Blaming the no-preempt code for this bug is extremely unfair, imho.
>
> And the no-preempt code does help make for much better code generation
> for simple spinlocks.
>
> Where is that horribly buggy recent code? It's not in that exact
> format, certainly, since 'grep' doesn't find it.
It would be convenient for that "gfp =" code to work, as this would
allow better cache locality while invoking RCU callbacks, and would
further provide better robustness to callback floods. The full story
is quite long, but here are alternatives have not yet been proven to be
abject failures:
1. Use workqueues to do the allocations in a clean context.
While waiting for the allocations, the callbacks are queued
in the old cache-busting manner. This functions correctly,
but in the meantime (which on busy systems can be some time)
the cache locality and robustness are lost.
2. Provide the ability to allocate memory in raw atomic context.
This is extremely effective, especially when used in combination
with #1 above, but as you might suspect, the MM guys don't like
it much.
In contrast, with Thomas's patch series, call_rcu() and kvfree_rcu()
could just look at preemptible() to see whether or not it was safe to
allocate memory, even in !PREEMPT kernels -- and in the common case,
it almost always would be safe. It is quite possible that this approach
would work in isolation, or failing that, that adding #1 above would do
the trick.
I understand that this is all very hand-wavy, and I do apologize for that.
If you really want the full sad story with performance numbers and the
works, let me know!
Thanx, Paul
Hi,
This fixes a couple of minor aggravating factors that I ran across while
trying to do some changes in selftests/vm. These are simple things, but
like most things with GNU Make, it's rarely obvious what's wrong until
you understand *the entire Makefile and all of its includes*.
So while there is, of course, joy in learning those details, I thought I'd
fix these little things, so as to allow others to skip out on the Joy if
they so choose. :)
First of all, if you have an item (let's choose userfaultfd for an
example) that fails to build, you might do this:
$ make -j32
# ...you observe a failed item in the threaded output
# OK, let's get a closer look
$ make
# ...but now the build quietly "succeeds".
That's what Patch 0001 fixes.
Second, if you instead attempt this approach for your closer look (a casual
mistake, as it's not supported):
$ make userfaultfd
# ...userfaultfd fails to link, due to incomplete LDLIBS
That's what Patch 0002 fixes.
John Hubbard (2):
selftests/vm: fix false build success on the second and later attempts
selftests/vm: fix incorrect gcc invocation in some cases
tools/testing/selftests/vm/Makefile | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
--
2.28.0
On Tue, 15 Sep 2020 at 01:43, Linus Torvalds
<torvalds(a)linux-foundation.org> wrote:
>
> On Mon, Sep 14, 2020 at 3:24 PM Linus Torvalds
> <torvalds(a)linux-foundation.org> wrote:
> >
> > Ard and Herbert added to participants: see
> > chacha20poly1305_crypt_sg_inplace(), which does
> >
> > flags = SG_MITER_TO_SG;
> > if (!preemptible())
> > flags |= SG_MITER_ATOMIC;
> >
> > introduced in commit d95312a3ccc0 ("crypto: lib/chacha20poly1305 -
> > reimplement crypt_from_sg() routine").
>
> As far as I can tell, the only reason for this all is to try to use
> "kmap()" rather than "kmap_atomic()".
>
> And kmap() actually has the much more complex "might_sleep()" tests,
> and apparently the "preemptible()" check wasn't even the proper full
> debug check, it was just a complete hack to catch the one that
> triggered.
>
This was not driven by a failing check.
The documentation of kmap_atomic() states the following:
* The use of kmap_atomic/kunmap_atomic is discouraged - kmap/kunmap
* gives a more generic (and caching) interface. But kmap_atomic can
* be used in IRQ contexts, so in some (very limited) cases we need
* it.
so if this is no longer accurate, perhaps we should fix it?
But another reason I tried to avoid kmap_atomic() is that it disables
preemption unconditionally, even on 64-bit architectures where HIGHMEM
is irrelevant. So using kmap_atomic() here means that the bulk of
WireGuard packet encryption runs with preemption disabled, essentially
for legacy reasons.
> From a quick look, that code should probably just get rid of
> SG_MITER_ATOMIC entirely, and alwayse use kmap_atomic().
>
> kmap_atomic() is actually the faster and proper interface to use
> anyway (never mind that any of this matters on any sane hardware). The
> old kmap() and kunmap() interfaces should generally be avoided like
> the plague - yes, they allow sleeping in the middle and that is
> sometimes required, but if you don't need that, you should never ever
> use them.
>
> We used to have a very nasty kmap_atomic() that required people to be
> very careful and know exactly which atomic entry to use, and that was
> admitedly quite nasty.
>
> So it _looks_ like this code started using kmap() - probably back when
> kmap_atomic() was so cumbersome to use - and was then converted
> (conditionally) to kmap_atomic() rather than just changed whole-sale.
> Is there actually something that wants to use those sg_miter functions
> and sleep?
>
> Because if there is, that choice should come from the outside, not
> from inside lib/scatterlist.c trying to make some bad guess based on
> the wrong thing entirely.
>
> Linus