On smaller systems, running a test with 200 threads can take a long
time on machines with smaller number of CPUs.
Detect the number of online cpus at test runtime, and multiply that
by 6 to have 6 rseq threads per cpu preempting each other.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Watson <davejwatson(a)fb.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: "H . Peter Anvin" <hpa(a)zytor.com>
Cc: Chris Lameter <cl(a)linux.com>
Cc: Russell King <linux(a)arm.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Maurer <bmaurer(a)fb.com>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
---
tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
index 3acd6d75ff9f..e426304fd4a0 100755
--- a/tools/testing/selftests/rseq/run_param_test.sh
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -1,6 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+ or MIT
+NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
+
EXTRA_ARGS=${@}
OLDIFS="$IFS"
@@ -28,15 +30,16 @@ IFS="$OLDIFS"
REPS=1000
SLOW_REPS=100
+NR_THREADS=$((6*${NR_CPUS}))
function do_tests()
{
local i=0
while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
echo "Running test ${TEST_NAME[$i]}"
- ./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+ ./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
echo "Running compare-twice test ${TEST_NAME[$i]}"
- ./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+ ./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
let "i++"
done
}
--
2.11.0
Historically, kretprobe has always produced unusable stack traces
(kretprobe_trampoline is the only entry in most cases, because of the
funky stack pointer overwriting). This has caused quite a few annoyances
when using tracing to debug problems[1] -- since return values are only
available with kretprobes but stack traces were only usable for kprobes,
users had to probe both and then manually associate them.
This patch series stores the stack trace within kretprobe_instance on
the kprobe entry used to set up the kretprobe. This allows for
DTrace-style stack aggregation between function entry and exit with
tools like BPFtrace -- which would not really be doable if the stack
unwinder understood kretprobe_trampoline.
We also revert commit 76094a2cf46e ("ftrace: distinguish kretprobe'd
functions in trace logs") and any follow-up changes because that code is
no longer necessary now that stack traces are sane. *However* this patch
might be a bit contentious since the original usecase (that ftrace
returns shouldn't show kretprobe_trampoline) is arguably still an
issue. Feel free to drop it if you think it is wrong.
Patch changelog:
v2:
* documentation: mention kretprobe stack-stashing
* ftrace: add self-test for fixed kretprobe stacktraces
* ftrace: remove [unknown/kretprobe'd] handling
* kprobe: remove needless EXPORT statements
* kprobe: minor corrections to current_kretprobe_instance (switch
away from hlist_for_each_entry_safe)
* kprobe: make maximum stack size 127, which is the ftrace default
(I forgot to Cc the BPF folks in v1, I've added them now.)
Aleksa Sarai (2):
kretprobe: produce sane stack traces
trace: remove kretprobed checks
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h | 15 +++
kernel/events/callchain.c | 8 +-
kernel/kprobes.c | 101 +++++++++++++++++-
kernel/trace/trace.c | 11 +-
kernel/trace/trace_output.c | 34 +-----
.../test.d/kprobe/kretprobe_stacktrace.tc | 25 +++++
7 files changed, 165 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
--
2.19.1
Discussions around time virtualization are there for a long time.
The first attempt to implement time namespace was in 2006 by Jeff Dike.
>From that time, the topic appears on and off in various discussions.
There are two main use cases for time namespaces:
1. change date and time inside a container;
2. adjust clocks for a container restored from a checkpoint.
“It seems like this might be one of the last major obstacles keeping
migration from being used in production systems, given that not all
containers and connections can be migrated as long as a time dependency
is capable of messing it up.” (by github.com/dav-ell)
The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME. Last two clocks are monotonous, but the
start points for them are not defined and are different for each running
system. When a container is migrated from one node to another, all
clocks have to be restored into consistent states; in other words, they
have to continue running from the same points where they have been
dumped.
The main idea behind this patch set is adding per-namespace offsets for
system clocks. When a process in a non-root time namespace requests
time of a clock, a namespace offset is added to the current value of
this clock on a host and the sum is returned.
All offsets are placed on a separate page, this allows up to map it as
part of vvar into user processes and use offsets from vdso calls.
Now offsets are implemented for CLOCK_MONOTONIC and CLOCK_BOOTTIME
clocks.
Questions to discuss:
* Clone flags exhaustion. Currently there is only one unused clone flag
bit left, and it may be worth to use it to extend arguments of the clone
system call.
* Realtime clock implementation details:
Is having a simple offset enough?
What to do when date and time is changed on the host?
Is there a need to adjust vfs modification and creation times?
Implementation for adjtime() syscall.
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: Adrian Reber <adrian(a)lisas.de>
Cc: Andrei Vagin <avagin(a)openvz.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Cyrill Gorcunov <gorcunov(a)openvz.org>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jeff Dike <jdike(a)addtoit.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: containers(a)lists.linux-foundation.org
Cc: criu(a)openvz.org
Cc: linux-api(a)vger.kernel.org
Cc: x86(a)kernel.org
Andrei Vagin (12):
ns: Introduce Time Namespace
timens: Add timens_offsets
timens: Introduce CLOCK_MONOTONIC offsets
timens: Introduce CLOCK_BOOTTIME offset
timerfd/timens: Take into account ns clock offsets
kernel: Take into account timens clock offsets in clock_nanosleep
x86/vdso/timens: Add offsets page in vvar
x86/vdso: Use set_normalized_timespec() to avoid 32 bit overflow
posix-timers/timens: Take into account clock offsets
selftest/timens: Add test for timerfd
selftest/timens: Add test for clock_nanosleep
timens/selftest: Add timer offsets test
Dmitry Safonov (8):
timens: Shift /proc/uptime
x86/vdso: Restrict splitting vvar vma
x86/vdso: Purge timens page on setns()/unshare()/clone()
x86/vdso: Look for vvar vma to purge timens page
timens: Add align for timens_offsets
timens: Optimize zero-offsets
selftest: Add Time Namespace test for supported clocks
timens/selftest: Add procfs selftest
arch/Kconfig | 5 +
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vclock_gettime.c | 52 +++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 +-
arch/x86/entry/vdso/vdso2c.c | 3 +
arch/x86/entry/vdso/vma.c | 67 +++++++
arch/x86/include/asm/vdso.h | 2 +
fs/proc/namespaces.c | 3 +
fs/proc/uptime.c | 3 +
fs/timerfd.c | 16 +-
include/linux/nsproxy.h | 1 +
include/linux/proc_ns.h | 1 +
include/linux/time_namespace.h | 72 +++++++
include/linux/timens_offsets.h | 25 +++
include/linux/user_namespace.h | 1 +
include/uapi/linux/sched.h | 1 +
init/Kconfig | 8 +
kernel/Makefile | 1 +
kernel/fork.c | 3 +-
kernel/nsproxy.c | 19 +-
kernel/time/hrtimer.c | 8 +
kernel/time/posix-timers.c | 89 ++++++++-
kernel/time/posix-timers.h | 2 +
kernel/time_namespace.c | 230 +++++++++++++++++++++++
tools/testing/selftests/timens/.gitignore | 5 +
tools/testing/selftests/timens/Makefile | 6 +
tools/testing/selftests/timens/clock_nanosleep.c | 98 ++++++++++
tools/testing/selftests/timens/config | 1 +
tools/testing/selftests/timens/log.h | 21 +++
tools/testing/selftests/timens/procfs.c | 145 ++++++++++++++
tools/testing/selftests/timens/timens.c | 196 +++++++++++++++++++
tools/testing/selftests/timens/timer.c | 95 ++++++++++
tools/testing/selftests/timens/timerfd.c | 96 ++++++++++
33 files changed, 1272 insertions(+), 13 deletions(-)
create mode 100644 include/linux/time_namespace.h
create mode 100644 include/linux/timens_offsets.h
create mode 100644 kernel/time_namespace.c
create mode 100644 tools/testing/selftests/timens/.gitignore
create mode 100644 tools/testing/selftests/timens/Makefile
create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c
create mode 100644 tools/testing/selftests/timens/config
create mode 100644 tools/testing/selftests/timens/log.h
create mode 100644 tools/testing/selftests/timens/procfs.c
create mode 100644 tools/testing/selftests/timens/timens.c
create mode 100644 tools/testing/selftests/timens/timer.c
create mode 100644 tools/testing/selftests/timens/timerfd.c
--
2.13.6
This series fixes issues I encountered building and running the
selftests on a Ubuntu Cosmic ppc64le system.
Joel Stanley (6):
selftests: powerpc/ptrace: Make tests build
selftests: powerpc/ptrace: Remove clean rule
selftests: powerpc/ptrace: Fix linking against pthread
selftests: powerpc/signal: Make tests build
selftests: powerpc/signal: Fix signal_tm CFLAGS
selftests: powerpc/pmu: Link ebb tests with -no-pie
tools/testing/selftests/powerpc/pmu/ebb/Makefile | 3 +++
tools/testing/selftests/powerpc/ptrace/Makefile | 11 ++++-------
tools/testing/selftests/powerpc/signal/Makefile | 9 +++------
3 files changed, 10 insertions(+), 13 deletions(-)
--
2.19.1
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <linux/memfd.h>
#include <linux/fcntl.h>
#include <asm/unistd.h>
#include <unistd.h>
#define F_SEAL_FUTURE_WRITE 0x0010
#define REGION_SIZE (5 * 1024 * 1024)
int memfd_create_region(const char *name, size_t size)
{
int ret;
int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
if (fd < 0) return fd;
ret = ftruncate(fd, size);
if (ret < 0) { close(fd); return ret; }
return fd;
}
int main() {
int ret, fd;
void *addr, *addr2, *addr3, *addr1;
ret = memfd_create_region("test_region", REGION_SIZE);
printf("ret=%d\n", ret);
fd = ret;
// Create map
addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
printf("map 0 failed\n");
else
printf("map 0 passed\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed even though no future-write seal "
"(ret=%d errno =%d)\n", ret, errno);
else
printf("write passed\n");
addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr1 == MAP_FAILED)
perror("map 1 prot-write failed even though no seal\n");
else
printf("map 1 prot-write passed as expected\n");
ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
F_SEAL_GROW |
F_SEAL_SHRINK);
if (ret == -1)
printf("fcntl failed, errno: %d\n", errno);
else
printf("future-write seal now active\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed as expected due to future-write seal\n");
else
printf("write passed (unexpected)\n");
addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr2 == MAP_FAILED)
perror("map 2 prot-write failed as expected due to seal\n");
else
printf("map 2 passed\n");
addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (addr3 == MAP_FAILED)
perror("map 3 failed\n");
else
printf("map 3 prot-read passed as expected\n");
}
The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected
Cc: jreck(a)google.com
Cc: john.stultz(a)linaro.org
Cc: tkjos(a)google.com
Cc: gregkh(a)linuxfoundation.org
Cc: hch(a)infradead.org
Reviewed-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.
v2->v3: Updated commit message to have more support code (John Stultz)
Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
(Christoph Hellwig)
Allow for this seal only if grow/shrink seals are also
either previous set, or are requested along with this seal.
(Christoph Hellwig)
Added locking to synchronize access to file->f_mode.
(Christoph Hellwig)
include/uapi/linux/fcntl.h | 1 +
mm/memfd.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
/* (1U << 31) is reserved for signed error codes */
/*
diff --git a/mm/memfd.c b/mm/memfd.c
index 2bb5e257080e..5ba9804e9515 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_SHRINK | \
F_SEAL_GROW | \
- F_SEAL_WRITE)
+ F_SEAL_WRITE | \
+ F_SEAL_FUTURE_WRITE)
static int memfd_add_seals(struct file *file, unsigned int seals)
{
@@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
}
}
+ if ((seals & F_SEAL_FUTURE_WRITE) &&
+ !(*file_seals & F_SEAL_FUTURE_WRITE)) {
+ /*
+ * The FUTURE_WRITE seal also prevents growing and shrinking
+ * so we need them to be already set, or requested now.
+ */
+ int test_seals = (seals | *file_seals) &
+ (F_SEAL_GROW | F_SEAL_SHRINK);
+
+ if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
+ error = -EINVAL;
+ goto unlock;
+ }
+
+ spin_lock(&file->f_lock);
+ file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
+ spin_unlock(&file->f_lock);
+ }
+
*file_seals |= seals;
error = 0;
--
2.19.1.331.ge82ca0e54c-goog