Historically, kretprobe has always produced unusable stack traces
(kretprobe_trampoline is the only entry in most cases, because of the
funky stack pointer overwriting). This has caused quite a few annoyances
when using tracing to debug problems[1] -- since return values are only
available with kretprobes but stack traces were only usable for kprobes,
users had to probe both and then manually associate them.
This patch series stores the stack trace within kretprobe_instance on
the kprobe entry used to set up the kretprobe. This allows for
DTrace-style stack aggregation between function entry and exit with
tools like BPFtrace -- which would not really be doable if the stack
unwinder understood kretprobe_trampoline.
We also revert commit 76094a2cf46e ("ftrace: distinguish kretprobe'd
functions in trace logs") and any follow-up changes because that code is
no longer necessary now that stack traces are sane. *However* this patch
might be a bit contentious since the original usecase (that ftrace
returns shouldn't show kretprobe_trampoline) is arguably still an
issue. Feel free to drop it if you think it is wrong.
Patch changelog:
v3:
* kprobe: fix build on !CONFIG_KPROBES
v2:
* documentation: mention kretprobe stack-stashing
* ftrace: add self-test for fixed kretprobe stacktraces
* ftrace: remove [unknown/kretprobe'd] handling
* kprobe: remove needless EXPORT statements
* kprobe: minor corrections to current_kretprobe_instance (switch
away from hlist_for_each_entry_safe)
* kprobe: make maximum stack size 127, which is the ftrace default
Aleksa Sarai (2):
kretprobe: produce sane stack traces
trace: remove kretprobed checks
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h | 27 +++++
kernel/events/callchain.c | 8 +-
kernel/kprobes.c | 101 +++++++++++++++++-
kernel/trace/trace.c | 11 +-
kernel/trace/trace_output.c | 34 +-----
.../test.d/kprobe/kretprobe_stacktrace.tc | 25 +++++
7 files changed, 177 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
--
2.19.1
Hi,friend,
This is Daniel Murray and i am from Sinara Group Co.Ltd Group Co.,LTD in Russia.
We are glad to know about your company from the web and we are interested in your products.
Could you kindly send us your Latest catalog and price list for our trial order.
Best Regards,
Daniel Murray
Purchasing Manager
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <linux/memfd.h>
#include <linux/fcntl.h>
#include <asm/unistd.h>
#include <unistd.h>
#define F_SEAL_FUTURE_WRITE 0x0010
#define REGION_SIZE (5 * 1024 * 1024)
int memfd_create_region(const char *name, size_t size)
{
int ret;
int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
if (fd < 0) return fd;
ret = ftruncate(fd, size);
if (ret < 0) { close(fd); return ret; }
return fd;
}
int main() {
int ret, fd;
void *addr, *addr2, *addr3, *addr1;
ret = memfd_create_region("test_region", REGION_SIZE);
printf("ret=%d\n", ret);
fd = ret;
// Create map
addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
printf("map 0 failed\n");
else
printf("map 0 passed\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed even though no future-write seal "
"(ret=%d errno =%d)\n", ret, errno);
else
printf("write passed\n");
addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr1 == MAP_FAILED)
perror("map 1 prot-write failed even though no seal\n");
else
printf("map 1 prot-write passed as expected\n");
ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
F_SEAL_GROW |
F_SEAL_SHRINK);
if (ret == -1)
printf("fcntl failed, errno: %d\n", errno);
else
printf("future-write seal now active\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed as expected due to future-write seal\n");
else
printf("write passed (unexpected)\n");
addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr2 == MAP_FAILED)
perror("map 2 prot-write failed as expected due to seal\n");
else
printf("map 2 passed\n");
addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (addr3 == MAP_FAILED)
perror("map 3 failed\n");
else
printf("map 3 prot-read passed as expected\n");
}
The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected
Cc: jreck(a)google.com
Cc: john.stultz(a)linaro.org
Cc: tkjos(a)google.com
Cc: gregkh(a)linuxfoundation.org
Cc: hch(a)infradead.org
Reviewed-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.
v2->v3: Updated commit message to have more support code (John Stultz)
Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
(Christoph Hellwig)
Allow for this seal only if grow/shrink seals are also
either previous set, or are requested along with this seal.
(Christoph Hellwig)
Added locking to synchronize access to file->f_mode.
(Christoph Hellwig)
include/uapi/linux/fcntl.h | 1 +
mm/memfd.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
/* (1U << 31) is reserved for signed error codes */
/*
diff --git a/mm/memfd.c b/mm/memfd.c
index 2bb5e257080e..5ba9804e9515 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_SHRINK | \
F_SEAL_GROW | \
- F_SEAL_WRITE)
+ F_SEAL_WRITE | \
+ F_SEAL_FUTURE_WRITE)
static int memfd_add_seals(struct file *file, unsigned int seals)
{
@@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
}
}
+ if ((seals & F_SEAL_FUTURE_WRITE) &&
+ !(*file_seals & F_SEAL_FUTURE_WRITE)) {
+ /*
+ * The FUTURE_WRITE seal also prevents growing and shrinking
+ * so we need them to be already set, or requested now.
+ */
+ int test_seals = (seals | *file_seals) &
+ (F_SEAL_GROW | F_SEAL_SHRINK);
+
+ if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
+ error = -EINVAL;
+ goto unlock;
+ }
+
+ spin_lock(&file->f_lock);
+ file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
+ spin_unlock(&file->f_lock);
+ }
+
*file_seals |= seals;
error = 0;
--
2.19.1.930.g4563a0d9d0-goog
Hi,friend,
This is Daniel Murray and i am from Sinara Group Co.Ltd Group Co.,LTD in Russia.
We are glad to know about your company from the web and we are interested in your products.
Could you kindly send us your Latest catalog and price list for our trial order.
Best Regards,
Daniel Murray
Purchasing Manager
MAP_FIXED is important for this test but, unfortunately, lowest virtual
address for user space mapping on arm is (PAGE_SIZE * 2) and NULL hint
does not seem to guarantee that when MAP_FIXED is given. This patch sets
the virtual address that will hold the mapping for the test, fixing the
issue.
Link: https://bugs.linaro.org/show_bug.cgi?id=3782
Signed-off-by: Rafael David Tinoco <rafael.tinoco(a)linaro.org>
---
tools/testing/selftests/proc/proc-self-map-files-002.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/proc/proc-self-map-files-002.c b/tools/testing/selftests/proc/proc-self-map-files-002.c
index 6f1f4a6e1ecb..0a47eaca732a 100644
--- a/tools/testing/selftests/proc/proc-self-map-files-002.c
+++ b/tools/testing/selftests/proc/proc-self-map-files-002.c
@@ -55,7 +55,9 @@ int main(void)
if (fd == -1)
return 1;
- p = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 0);
+ p = mmap((void *) (2 * PAGE_SIZE), PAGE_SIZE, PROT_NONE,
+ MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 0);
+
if (p == MAP_FAILED) {
if (errno == EPERM)
return 2;
--
2.19.1
On Sat, Nov 10, 2018 at 04:26:46AM -0800, Daniel Colascione wrote:
> On Friday, November 9, 2018, Joel Fernandes <joel(a)joelfernandes.org> wrote:
>
> > On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> > > On Fri, Nov 9, 2018 at 10:06 PM Jann Horn <jannh(a)google.com> wrote:
> > > > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> > > > <joel(a)joelfernandes.org> wrote:
> > > > > Android uses ashmem for sharing memory regions. We are looking
> > forward
> > > > > to migrating all usecases of ashmem to memfd so that we can possibly
> > > > > remove the ashmem driver in the future from staging while also
> > > > > benefiting from using memfd and contributing to it. Note staging
> > drivers
> > > > > are also not ABI and generally can be removed at anytime.
> > > > >
> > > > > One of the main usecases Android has is the ability to create a
> > region
> > > > > and mmap it as writeable, then add protection against making any
> > > > > "future" writes while keeping the existing already mmap'ed
> > > > > writeable-region active. This allows us to implement a usecase where
> > > > > receivers of the shared memory buffer can get a read-only view, while
> > > > > the sender continues to write to the buffer.
> > > > > See CursorWindow documentation in Android for more details:
> > > > > https://developer.android.com/reference/android/database/
> > CursorWindow
> > > > >
> > > > > This usecase cannot be implemented with the existing F_SEAL_WRITE
> > seal.
> > > > > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE
> > seal
> > > > > which prevents any future mmap and write syscalls from succeeding
> > while
> > > > > keeping the existing mmap active.
> > > >
> > > > Please CC linux-api@ on patches like this. If you had done that, I
> > > > might have criticized your v1 patch instead of your v3 patch...
> > > >
> > > > > The following program shows the seal
> > > > > working in action:
> > > > [...]
> > > > > Cc: jreck(a)google.com
> > > > > Cc: john.stultz(a)linaro.org
> > > > > Cc: tkjos(a)google.com
> > > > > Cc: gregkh(a)linuxfoundation.org
> > > > > Cc: hch(a)infradead.org
> > > > > Reviewed-by: John Stultz <john.stultz(a)linaro.org>
> > > > > Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
> > > > > ---
> > > > [...]
> > > > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > > > index 2bb5e257080e..5ba9804e9515 100644
> > > > > --- a/mm/memfd.c
> > > > > +++ b/mm/memfd.c
> > > > [...]
> > > > > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file,
> > unsigned int seals)
> > > > > }
> > > > > }
> > > > >
> > > > > + if ((seals & F_SEAL_FUTURE_WRITE) &&
> > > > > + !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > > > > + /*
> > > > > + * The FUTURE_WRITE seal also prevents growing and
> > shrinking
> > > > > + * so we need them to be already set, or requested
> > now.
> > > > > + */
> > > > > + int test_seals = (seals | *file_seals) &
> > > > > + (F_SEAL_GROW | F_SEAL_SHRINK);
> > > > > +
> > > > > + if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > > > > + error = -EINVAL;
> > > > > + goto unlock;
> > > > > + }
> > > > > +
> > > > > + spin_lock(&file->f_lock);
> > > > > + file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > > > + spin_unlock(&file->f_lock);
> > > > > + }
> > > >
> > > > So you're fiddling around with the file, but not the inode? How are
> > > > you preventing code like the following from re-opening the file as
> > > > writable?
> > > >
> > > > $ cat memfd.c
> > > > #define _GNU_SOURCE
> > > > #include <unistd.h>
> > > > #include <sys/syscall.h>
> > > > #include <printf.h>
> > > > #include <fcntl.h>
> > > > #include <err.h>
> > > > #include <stdio.h>
> > > >
> > > > int main(void) {
> > > > int fd = syscall(__NR_memfd_create, "testfd", 0);
> > > > if (fd == -1) err(1, "memfd");
> > > > char path[100];
> > > > sprintf(path, "/proc/self/fd/%d", fd);
> > > > int fd2 = open(path, O_RDWR);
> > > > if (fd2 == -1) err(1, "reopen");
> > > > printf("reopen successful: %d\n", fd2);
> > > > }
> > > > $ gcc -o memfd memfd.c
> > > > $ ./memfd
> > > > reopen successful: 4
> > > > $
> > > >
> > > > That aside: I wonder whether a better API would be something that
> > > > allows you to create a new readonly file descriptor, instead of
> > > > fiddling with the writability of an existing fd.
> > >
> > > My favorite approach would be to forbid open() on memfds, hope that
> > > nobody notices the tiny API break, and then add an ioctl for "reopen
> > > this memfd with reduced permissions" - but that's just my personal
> > > opinion.
> >
> > I did something along these lines and it fixes the issue, but I forbid open
> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its
> > not
> > an ABI break because this is a brand new seal. That seems the least
> > intrusive
> > solution and it works. Do you mind testing it and I'll add your and
> > Tested-by
> > to the new fix? The patch is based on top of this series.
> >
>
> Please don't forbid reopens entirely. You're taking a feature that works
> generally (reopens) and breaking it in one specific case (memfd write
> sealed files). The open modes are available in .open in the struct file:
> you can deny *only* opens for write instead of denying reopens generally.
Yes, as we discussed over chat already, I will implement it that way.
Also lets continue to discuss Andy's concerns he raised on the other thread.
thanks,
- Joel
From: Stanislav Fomichev <sdf(a)google.com>
v4 changes:
* addressed another round of comments/style issues from Jakub Kicinski &
Quentin Monnet (thanks!)
* implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
used them in bpf_program__pin
* added new pin_name to bpf_program so bpf_program__pin
works with sections that contain '/'
* moved *loadall* command implementation into a separate patch
* added patch that implements *pinmaps* to pin maps when doing
load/loadall
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin, parts of which are now
being used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* forth patch adds pin_name to the bpf_program struct
which is now used as a pin name in bpf_program__pin et al
* fifth patch adds *loadall* command that pins all programs, not just
the first one
* sixth patch adds *pinmaps* argument to load/loadall to let users pin
all maps of the obj file
* seventh patch adds actual flow_dissector support to the bpftool and
an example
Stanislav Fomichev (7):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
libbpf: bpf_program__pin: add special case for instances.nr == 1
libbpf: add internal pin_name
bpftool: add loadall command
bpftool: add pinmaps argument to the load/loadall
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 42 +-
tools/bpf/bpftool/bash-completion/bpftool | 21 +-
tools/bpf/bpftool/common.c | 31 +-
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 185 ++++++---
tools/lib/bpf/libbpf.c | 364 ++++++++++++++++--
tools/lib/bpf/libbpf.h | 18 +
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
9 files changed, 546 insertions(+), 120 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin which is now being
used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* forth patch adds actual support to the bpftool
See forth patch for the description/details.
Stanislav Fomichev (4):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
libbpf: bpf_program__pin: add special case for instances.nr == 1
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 36 ++-
tools/bpf/bpftool/bash-completion/bpftool | 6 +-
tools/bpf/bpftool/common.c | 30 +-
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 112 ++++++--
tools/lib/bpf/libbpf.c | 258 ++++++++++++++++--
tools/lib/bpf/libbpf.h | 11 +
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
9 files changed, 368 insertions(+), 90 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
All,
Updating Kselftest wiki and providing links to overview and how-to documents
has been on my list of things to do for a while.
It is now updated with the current status and links to documents. I am planning
to write a detailed how-to blog/article.
https://kselftest.wiki.kernel.org
thanks,
-- Shuah
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin which is now being
used to attach all flow dissector progs/maps
* third patch adds actual support to the bpftool
See third patch for the description/details.
Stanislav Fomichev (3):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 26 +++--
tools/bpf/bpftool/bash-completion/bpftool | 2 +-
tools/bpf/bpftool/common.c | 30 +++---
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 94 ++++++++++++++-----
tools/lib/bpf/libbpf.c | 58 ++++++++++--
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
8 files changed, 151 insertions(+), 64 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds actual support to the bpftool
See second patch for the description/details.
Stanislav Fomichev (2):
selftests/bpf: rename flow dissector section to flow_dissector
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 16 ++-
tools/bpf/bpftool/common.c | 32 +++--
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 135 +++++++++++++++---
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
6 files changed, 143 insertions(+), 45 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
From: Randy Dunlap <rdunlap(a)infradead.org>
This is a small cleanup to kselftest.rst:
- Fix some language typos in the usage instructions.
- Change one non-ASCII space to an ASCII space.
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-doc(a)vger.kernel.org
---
Documentation/dev-tools/kselftest.rst | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- linux-next-20181101.orig/Documentation/dev-tools/kselftest.rst
+++ linux-next-20181101/Documentation/dev-tools/kselftest.rst
@@ -9,7 +9,7 @@ and booting a kernel.
On some systems, hot-plug tests could hang forever waiting for cpu and
memory to be ready to be offlined. A special hot-plug target is created
-to run full range of hot-plug tests. In default mode, hot-plug tests run
+to run the full range of hot-plug tests. In default mode, hot-plug tests run
in safe mode with a limited scope. In limited mode, cpu-hotplug test is
run on a single cpu as opposed to all hotplug capable cpus, and memory
hotplug test is run on 2% of hotplug capable memory instead of 10%.
@@ -89,9 +89,9 @@ Note that some tests will require root p
Install selftests
=================
-You can use kselftest_install.sh tool installs selftests in default
-location which is tools/testing/selftests/kselftest or a user specified
-location.
+You can use the kselftest_install.sh tool to install selftests in the
+default location, which is tools/testing/selftests/kselftest, or in a
+user specified location.
To install selftests in default location::
@@ -109,7 +109,7 @@ Running installed selftests
Kselftest install as well as the Kselftest tarball provide a script
named "run_kselftest.sh" to run the tests.
-You can simply do the following to run the installed Kselftests. Please
+You can simply do the following to run the installed Kselftests. Please
note some tests will require root privileges::
$ cd kselftest
@@ -139,7 +139,7 @@ Contributing new tests (details)
default.
TEST_CUSTOM_PROGS should be used by tests that require custom build
- rule and prevent common build rule use.
+ rules and prevent common build rule use.
TEST_PROGS are for test shell scripts. Please ensure shell script has
its exec bit set. Otherwise, lib.mk run_tests will generate a warning.
On smaller systems, running a test with 200 threads can take a long
time on machines with smaller number of CPUs.
Detect the number of online cpus at test runtime, and multiply that
by 6 to have 6 rseq threads per cpu preempting each other.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Watson <davejwatson(a)fb.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Andi Kleen <andi(a)firstfloor.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: "H . Peter Anvin" <hpa(a)zytor.com>
Cc: Chris Lameter <cl(a)linux.com>
Cc: Russell King <linux(a)arm.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Boqun Feng <boqun.feng(a)gmail.com>
Cc: Josh Triplett <josh(a)joshtriplett.org>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ben Maurer <bmaurer(a)fb.com>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
---
tools/testing/selftests/rseq/run_param_test.sh | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/rseq/run_param_test.sh b/tools/testing/selftests/rseq/run_param_test.sh
index 3acd6d75ff9f..e426304fd4a0 100755
--- a/tools/testing/selftests/rseq/run_param_test.sh
+++ b/tools/testing/selftests/rseq/run_param_test.sh
@@ -1,6 +1,8 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+ or MIT
+NR_CPUS=`grep '^processor' /proc/cpuinfo | wc -l`
+
EXTRA_ARGS=${@}
OLDIFS="$IFS"
@@ -28,15 +30,16 @@ IFS="$OLDIFS"
REPS=1000
SLOW_REPS=100
+NR_THREADS=$((6*${NR_CPUS}))
function do_tests()
{
local i=0
while [ "$i" -lt "${#TEST_LIST[@]}" ]; do
echo "Running test ${TEST_NAME[$i]}"
- ./param_test ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+ ./param_test ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
echo "Running compare-twice test ${TEST_NAME[$i]}"
- ./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} ${@} ${EXTRA_ARGS} || exit 1
+ ./param_test_compare_twice ${TEST_LIST[$i]} -r ${REPS} -t ${NR_THREADS} ${@} ${EXTRA_ARGS} || exit 1
let "i++"
done
}
--
2.11.0
Historically, kretprobe has always produced unusable stack traces
(kretprobe_trampoline is the only entry in most cases, because of the
funky stack pointer overwriting). This has caused quite a few annoyances
when using tracing to debug problems[1] -- since return values are only
available with kretprobes but stack traces were only usable for kprobes,
users had to probe both and then manually associate them.
This patch series stores the stack trace within kretprobe_instance on
the kprobe entry used to set up the kretprobe. This allows for
DTrace-style stack aggregation between function entry and exit with
tools like BPFtrace -- which would not really be doable if the stack
unwinder understood kretprobe_trampoline.
We also revert commit 76094a2cf46e ("ftrace: distinguish kretprobe'd
functions in trace logs") and any follow-up changes because that code is
no longer necessary now that stack traces are sane. *However* this patch
might be a bit contentious since the original usecase (that ftrace
returns shouldn't show kretprobe_trampoline) is arguably still an
issue. Feel free to drop it if you think it is wrong.
Patch changelog:
v2:
* documentation: mention kretprobe stack-stashing
* ftrace: add self-test for fixed kretprobe stacktraces
* ftrace: remove [unknown/kretprobe'd] handling
* kprobe: remove needless EXPORT statements
* kprobe: minor corrections to current_kretprobe_instance (switch
away from hlist_for_each_entry_safe)
* kprobe: make maximum stack size 127, which is the ftrace default
(I forgot to Cc the BPF folks in v1, I've added them now.)
Aleksa Sarai (2):
kretprobe: produce sane stack traces
trace: remove kretprobed checks
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h | 15 +++
kernel/events/callchain.c | 8 +-
kernel/kprobes.c | 101 +++++++++++++++++-
kernel/trace/trace.c | 11 +-
kernel/trace/trace_output.c | 34 +-----
.../test.d/kprobe/kretprobe_stacktrace.tc | 25 +++++
7 files changed, 165 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
--
2.19.1
Discussions around time virtualization are there for a long time.
The first attempt to implement time namespace was in 2006 by Jeff Dike.
>From that time, the topic appears on and off in various discussions.
There are two main use cases for time namespaces:
1. change date and time inside a container;
2. adjust clocks for a container restored from a checkpoint.
“It seems like this might be one of the last major obstacles keeping
migration from being used in production systems, given that not all
containers and connections can be migrated as long as a time dependency
is capable of messing it up.” (by github.com/dav-ell)
The kernel provides access to several clocks: CLOCK_REALTIME,
CLOCK_MONOTONIC, CLOCK_BOOTTIME. Last two clocks are monotonous, but the
start points for them are not defined and are different for each running
system. When a container is migrated from one node to another, all
clocks have to be restored into consistent states; in other words, they
have to continue running from the same points where they have been
dumped.
The main idea behind this patch set is adding per-namespace offsets for
system clocks. When a process in a non-root time namespace requests
time of a clock, a namespace offset is added to the current value of
this clock on a host and the sum is returned.
All offsets are placed on a separate page, this allows up to map it as
part of vvar into user processes and use offsets from vdso calls.
Now offsets are implemented for CLOCK_MONOTONIC and CLOCK_BOOTTIME
clocks.
Questions to discuss:
* Clone flags exhaustion. Currently there is only one unused clone flag
bit left, and it may be worth to use it to extend arguments of the clone
system call.
* Realtime clock implementation details:
Is having a simple offset enough?
What to do when date and time is changed on the host?
Is there a need to adjust vfs modification and creation times?
Implementation for adjtime() syscall.
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: Adrian Reber <adrian(a)lisas.de>
Cc: Andrei Vagin <avagin(a)openvz.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Christian Brauner <christian.brauner(a)ubuntu.com>
Cc: Cyrill Gorcunov <gorcunov(a)openvz.org>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jeff Dike <jdike(a)addtoit.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: containers(a)lists.linux-foundation.org
Cc: criu(a)openvz.org
Cc: linux-api(a)vger.kernel.org
Cc: x86(a)kernel.org
Andrei Vagin (12):
ns: Introduce Time Namespace
timens: Add timens_offsets
timens: Introduce CLOCK_MONOTONIC offsets
timens: Introduce CLOCK_BOOTTIME offset
timerfd/timens: Take into account ns clock offsets
kernel: Take into account timens clock offsets in clock_nanosleep
x86/vdso/timens: Add offsets page in vvar
x86/vdso: Use set_normalized_timespec() to avoid 32 bit overflow
posix-timers/timens: Take into account clock offsets
selftest/timens: Add test for timerfd
selftest/timens: Add test for clock_nanosleep
timens/selftest: Add timer offsets test
Dmitry Safonov (8):
timens: Shift /proc/uptime
x86/vdso: Restrict splitting vvar vma
x86/vdso: Purge timens page on setns()/unshare()/clone()
x86/vdso: Look for vvar vma to purge timens page
timens: Add align for timens_offsets
timens: Optimize zero-offsets
selftest: Add Time Namespace test for supported clocks
timens/selftest: Add procfs selftest
arch/Kconfig | 5 +
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vclock_gettime.c | 52 +++++
arch/x86/entry/vdso/vdso-layout.lds.S | 9 +-
arch/x86/entry/vdso/vdso2c.c | 3 +
arch/x86/entry/vdso/vma.c | 67 +++++++
arch/x86/include/asm/vdso.h | 2 +
fs/proc/namespaces.c | 3 +
fs/proc/uptime.c | 3 +
fs/timerfd.c | 16 +-
include/linux/nsproxy.h | 1 +
include/linux/proc_ns.h | 1 +
include/linux/time_namespace.h | 72 +++++++
include/linux/timens_offsets.h | 25 +++
include/linux/user_namespace.h | 1 +
include/uapi/linux/sched.h | 1 +
init/Kconfig | 8 +
kernel/Makefile | 1 +
kernel/fork.c | 3 +-
kernel/nsproxy.c | 19 +-
kernel/time/hrtimer.c | 8 +
kernel/time/posix-timers.c | 89 ++++++++-
kernel/time/posix-timers.h | 2 +
kernel/time_namespace.c | 230 +++++++++++++++++++++++
tools/testing/selftests/timens/.gitignore | 5 +
tools/testing/selftests/timens/Makefile | 6 +
tools/testing/selftests/timens/clock_nanosleep.c | 98 ++++++++++
tools/testing/selftests/timens/config | 1 +
tools/testing/selftests/timens/log.h | 21 +++
tools/testing/selftests/timens/procfs.c | 145 ++++++++++++++
tools/testing/selftests/timens/timens.c | 196 +++++++++++++++++++
tools/testing/selftests/timens/timer.c | 95 ++++++++++
tools/testing/selftests/timens/timerfd.c | 96 ++++++++++
33 files changed, 1272 insertions(+), 13 deletions(-)
create mode 100644 include/linux/time_namespace.h
create mode 100644 include/linux/timens_offsets.h
create mode 100644 kernel/time_namespace.c
create mode 100644 tools/testing/selftests/timens/.gitignore
create mode 100644 tools/testing/selftests/timens/Makefile
create mode 100644 tools/testing/selftests/timens/clock_nanosleep.c
create mode 100644 tools/testing/selftests/timens/config
create mode 100644 tools/testing/selftests/timens/log.h
create mode 100644 tools/testing/selftests/timens/procfs.c
create mode 100644 tools/testing/selftests/timens/timens.c
create mode 100644 tools/testing/selftests/timens/timer.c
create mode 100644 tools/testing/selftests/timens/timerfd.c
--
2.13.6
This series fixes issues I encountered building and running the
selftests on a Ubuntu Cosmic ppc64le system.
Joel Stanley (6):
selftests: powerpc/ptrace: Make tests build
selftests: powerpc/ptrace: Remove clean rule
selftests: powerpc/ptrace: Fix linking against pthread
selftests: powerpc/signal: Make tests build
selftests: powerpc/signal: Fix signal_tm CFLAGS
selftests: powerpc/pmu: Link ebb tests with -no-pie
tools/testing/selftests/powerpc/pmu/ebb/Makefile | 3 +++
tools/testing/selftests/powerpc/ptrace/Makefile | 11 ++++-------
tools/testing/selftests/powerpc/signal/Makefile | 9 +++------
3 files changed, 10 insertions(+), 13 deletions(-)
--
2.19.1
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <linux/memfd.h>
#include <linux/fcntl.h>
#include <asm/unistd.h>
#include <unistd.h>
#define F_SEAL_FUTURE_WRITE 0x0010
#define REGION_SIZE (5 * 1024 * 1024)
int memfd_create_region(const char *name, size_t size)
{
int ret;
int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
if (fd < 0) return fd;
ret = ftruncate(fd, size);
if (ret < 0) { close(fd); return ret; }
return fd;
}
int main() {
int ret, fd;
void *addr, *addr2, *addr3, *addr1;
ret = memfd_create_region("test_region", REGION_SIZE);
printf("ret=%d\n", ret);
fd = ret;
// Create map
addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
printf("map 0 failed\n");
else
printf("map 0 passed\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed even though no future-write seal "
"(ret=%d errno =%d)\n", ret, errno);
else
printf("write passed\n");
addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr1 == MAP_FAILED)
perror("map 1 prot-write failed even though no seal\n");
else
printf("map 1 prot-write passed as expected\n");
ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
F_SEAL_GROW |
F_SEAL_SHRINK);
if (ret == -1)
printf("fcntl failed, errno: %d\n", errno);
else
printf("future-write seal now active\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed as expected due to future-write seal\n");
else
printf("write passed (unexpected)\n");
addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr2 == MAP_FAILED)
perror("map 2 prot-write failed as expected due to seal\n");
else
printf("map 2 passed\n");
addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (addr3 == MAP_FAILED)
perror("map 3 failed\n");
else
printf("map 3 prot-read passed as expected\n");
}
The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected
Cc: jreck(a)google.com
Cc: john.stultz(a)linaro.org
Cc: tkjos(a)google.com
Cc: gregkh(a)linuxfoundation.org
Cc: hch(a)infradead.org
Reviewed-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.
v2->v3: Updated commit message to have more support code (John Stultz)
Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
(Christoph Hellwig)
Allow for this seal only if grow/shrink seals are also
either previous set, or are requested along with this seal.
(Christoph Hellwig)
Added locking to synchronize access to file->f_mode.
(Christoph Hellwig)
include/uapi/linux/fcntl.h | 1 +
mm/memfd.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
/* (1U << 31) is reserved for signed error codes */
/*
diff --git a/mm/memfd.c b/mm/memfd.c
index 2bb5e257080e..5ba9804e9515 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_SHRINK | \
F_SEAL_GROW | \
- F_SEAL_WRITE)
+ F_SEAL_WRITE | \
+ F_SEAL_FUTURE_WRITE)
static int memfd_add_seals(struct file *file, unsigned int seals)
{
@@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
}
}
+ if ((seals & F_SEAL_FUTURE_WRITE) &&
+ !(*file_seals & F_SEAL_FUTURE_WRITE)) {
+ /*
+ * The FUTURE_WRITE seal also prevents growing and shrinking
+ * so we need them to be already set, or requested now.
+ */
+ int test_seals = (seals | *file_seals) &
+ (F_SEAL_GROW | F_SEAL_SHRINK);
+
+ if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
+ error = -EINVAL;
+ goto unlock;
+ }
+
+ spin_lock(&file->f_lock);
+ file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
+ spin_unlock(&file->f_lock);
+ }
+
*file_seals |= seals;
error = 0;
--
2.19.1.331.ge82ca0e54c-goog
arm64 has a feature called Top Byte Ignore, which allows to embed pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
When passing tagged pointers to syscalls, there's a special case of such a
pointer being passed to one of the memory syscalls (mmap, mprotect, etc.).
These syscalls don't do memory accesses but rather deal with memory
ranges, hence an untagged pointer is better suited.
This patchset extends tagged pointer support to non-memory syscalls. This
is done by reusing the untagged_addr macro to untag user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok).
The following testing approaches has been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the requried patches have been added
to the patchset.
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [3].
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060…
[3] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architectur…
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion to the replies of the v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Andrey Konovalov (8):
arm64: add type casts to untagged_addr macro
uaccess: add untagged_addr definition for other arches
arm64: untag user addresses in access_ok and __uaccess_mask_ptr
mm, arm64: untag user addresses in mm/gup.c
lib, arm64: untag addrs passed to strncpy_from_user and strnlen_user
fs, arm64: untag user address in copy_mount_options
arm64: update Documentation/arm64/tagged-pointers.txt
selftests, arm64: add a selftest for passing tagged pointers to kernel
Documentation/arm64/tagged-pointers.txt | 24 +++++++++++--------
arch/arm64/include/asm/uaccess.h | 14 +++++++----
fs/namespace.c | 2 +-
include/linux/uaccess.h | 4 ++++
lib/strncpy_from_user.c | 2 ++
lib/strnlen_user.c | 2 ++
mm/gup.c | 4 ++++
tools/testing/selftests/arm64/.gitignore | 1 +
tools/testing/selftests/arm64/Makefile | 11 +++++++++
.../testing/selftests/arm64/run_tags_test.sh | 12 ++++++++++
tools/testing/selftests/arm64/tags_test.c | 19 +++++++++++++++
11 files changed, 79 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_test.c
--
2.19.0.605.g01d371f741-goog
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: From invocation to completion KUnit
can run several dozen tests in under a second. Currently, the entire
KUnit test suite for KUnit runs in under a second from the initial
invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
--
2.19.1.331.ge82ca0e54c-goog
If test is being directly executed (with stdout opened on the
terminal) and the terminal capabilities indicate enough
colors, then use the existing scheme of green, red, and blue
to show when tests pass, fail or end in a different way.
When running the tests redirecting the stdout, for instance,
to a file, then colors are not shown, thus producing a more
readable output.
Signed-off-by: Daniel Díaz <daniel.diaz(a)linaro.org>
---
tools/testing/selftests/ftrace/ftracetest | 29 +++++++++++++++++------
1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index 4946b2edfcff..d987bbec675f 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -152,6 +152,21 @@ else
date > $LOG_FILE
fi
+# Define text colors
+# Check available colors on the terminal, if any
+ncolors=`tput colors 2>/dev/null`
+color_reset=
+color_red=
+color_green=
+color_blue=
+# If stdout exists and number of colors is eight or more, use them
+if [ -t 1 -a "$ncolors" -a "$ncolors" -ge 8 ]; then
+ color_reset="\e[0m"
+ color_red="\e[31m"
+ color_green="\e[32m"
+ color_blue="\e[34m"
+fi
+
prlog() { # messages
[ -z "$LOG_FILE" ] && echo -e "$@" || echo -e "$@" | tee -a $LOG_FILE
}
@@ -195,37 +210,37 @@ test_on_instance() { # testfile
eval_result() { # sigval
case $1 in
$PASS)
- prlog " [\e[32mPASS\e[30m]"
+ prlog " [${color_green}PASS${color_reset}]"
PASSED_CASES="$PASSED_CASES $CASENO"
return 0
;;
$FAIL)
- prlog " [\e[31mFAIL\e[30m]"
+ prlog " [${color_red}FAIL${color_reset}]"
FAILED_CASES="$FAILED_CASES $CASENO"
return 1 # this is a bug.
;;
$UNRESOLVED)
- prlog " [\e[34mUNRESOLVED\e[30m]"
+ prlog " [${color_blue}UNRESOLVED${color_reset}]"
UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
return 1 # this is a kind of bug.. something happened.
;;
$UNTESTED)
- prlog " [\e[34mUNTESTED\e[30m]"
+ prlog " [${color_blue}UNTESTED${color_reset}]"
UNTESTED_CASES="$UNTESTED_CASES $CASENO"
return 0
;;
$UNSUPPORTED)
- prlog " [\e[34mUNSUPPORTED\e[30m]"
+ prlog " [${color_blue}UNSUPPORTED${color_reset}]"
UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
return $UNSUPPORTED_RESULT # depends on use case
;;
$XFAIL)
- prlog " [\e[31mXFAIL\e[30m]"
+ prlog " [${color_red}XFAIL${color_reset}]"
XFAILED_CASES="$XFAILED_CASES $CASENO"
return 0
;;
*)
- prlog " [\e[34mUNDEFINED\e[30m]"
+ prlog " [${color_blue}UNDEFINED${color_reset}]"
UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
return 1 # this must be a test bug
;;
--
2.17.1
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then drop its protection for "future" writes
while keeping the existing already mmap'ed writeable-region active.
This allows us to implement a usecase where receivers of the shared
memory buffer can get a read-only view, while the sender continues to
write to the buffer. See CursorWindow in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FS_WRITE seal which
prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:
int main() {
int ret, fd;
void *addr, *addr2, *addr3, *addr1;
ret = memfd_create_region("test_region", REGION_SIZE);
printf("ret=%d\n", ret);
fd = ret;
// Create map
addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
printf("map 0 failed\n");
else
printf("map 0 passed\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed even though no fs-write seal "
"(ret=%d errno =%d)\n", ret, errno);
else
printf("write passed\n");
addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr1 == MAP_FAILED)
perror("map 1 prot-write failed even though no seal\n");
else
printf("map 1 prot-write passed as expected\n");
ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FS_WRITE);
if (ret == -1)
printf("fcntl failed, errno: %d\n", errno);
else
printf("fs-write seal now active\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed as expected due to fs-write seal\n");
else
printf("write passed (unexpected)\n");
addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr2 == MAP_FAILED)
perror("map 2 prot-write failed as expected due to seal\n");
else
printf("map 2 passed\n");
addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (addr3 == MAP_FAILED)
perror("map 3 failed\n");
else
printf("map 3 prot-read passed as expected\n");
}
The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
fs-write seal now active
write failed as expected due to fs-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected
Note: This seal will also prevent growing and shrinking of the memfd.
This is not something we do in Android so it does not affect us, however
I have mentioned this behavior of the seal in the manpage.
Cc: jreck(a)google.com
Cc: john.stultz(a)linaro.org
Cc: tkjos(a)google.com
Cc: gregkh(a)linuxfoundation.org
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.
include/uapi/linux/fcntl.h | 1 +
mm/memfd.c | 6 +++++-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index c98312fa78a5..fe44a2035edf 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#define F_SEAL_FS_WRITE 0x0010 /* prevent all write-related syscalls */
/* (1U << 31) is reserved for signed error codes */
/*
diff --git a/mm/memfd.c b/mm/memfd.c
index 27069518e3c5..9b8855b80de9 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_SHRINK | \
F_SEAL_GROW | \
- F_SEAL_WRITE)
+ F_SEAL_WRITE | \
+ F_SEAL_FS_WRITE)
static int memfd_add_seals(struct file *file, unsigned int seals)
{
@@ -219,6 +220,9 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
}
}
+ if ((seals & F_SEAL_FS_WRITE) && !(*file_seals & F_SEAL_FS_WRITE))
+ file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
+
*file_seals |= seals;
error = 0;
--
2.19.0.605.g01d371f741-goog
Makefile contains -D_GNU_SOURCE. remove define "_GNU_SOURCE"
in c files.
Signed-off-by: Peng Hao <peng.hao2(a)zte.com.cn>
---
tools/testing/selftests/proc/fd-001-lookup.c | 2 +-
tools/testing/selftests/proc/fd-003-kthread.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/proc/fd-001-lookup.c b/tools/testing/selftests/proc/fd-001-lookup.c
index a2010df..60d7948 100644
--- a/tools/testing/selftests/proc/fd-001-lookup.c
+++ b/tools/testing/selftests/proc/fd-001-lookup.c
@@ -14,7 +14,7 @@
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
// Test /proc/*/fd lookup.
-#define _GNU_SOURCE
+
#undef NDEBUG
#include <assert.h>
#include <dirent.h>
diff --git a/tools/testing/selftests/proc/fd-003-kthread.c b/tools/testing/selftests/proc/fd-003-kthread.c
index 1d659d5..dc591f9 100644
--- a/tools/testing/selftests/proc/fd-003-kthread.c
+++ b/tools/testing/selftests/proc/fd-003-kthread.c
@@ -14,7 +14,7 @@
* OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*/
// Test that /proc/$KERNEL_THREAD/fd/ is empty.
-#define _GNU_SOURCE
+
#undef NDEBUG
#include <sys/syscall.h>
#include <assert.h>
--
1.8.3.1
Fixes the following warnings:
dirty_log_test.c: In function ‘help’:
dirty_log_test.c:216:9: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘int’ [-Wformat=]
printf(" -i: specify iteration counts (default: %"PRIu64")\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from include/test_util.h:18:0,
from dirty_log_test.c:16:
/usr/include/inttypes.h:105:34: note: format string is defined here
# define PRIu64 __PRI64_PREFIX "u"
dirty_log_test.c:218:9: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘int’ [-Wformat=]
printf(" -I: specify interval in ms (default: %"PRIu64" ms)\n",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from include/test_util.h:18:0,
from dirty_log_test.c:16:
/usr/include/inttypes.h:105:34: note: format string is defined here
# define PRIu64 __PRI64_PREFIX "u"
Signed-off-by: Andrea Parri <andrea.parri(a)amarulasolutions.com>
---
tools/testing/selftests/kvm/dirty_log_test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 0c2cdc105f968..a9c4b5e21d7e7 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -31,9 +31,9 @@
/* How many pages to dirty for each guest loop */
#define TEST_PAGES_PER_LOOP 1024
/* How many host loops to run (one KVM_GET_DIRTY_LOG for each loop) */
-#define TEST_HOST_LOOP_N 32
+#define TEST_HOST_LOOP_N 32UL
/* Interval for each host loop (ms) */
-#define TEST_HOST_LOOP_INTERVAL 10
+#define TEST_HOST_LOOP_INTERVAL 10UL
/*
* Guest variables. We use these variables to share data between host
--
2.17.1
Hello Jay Kamat,
This is a semi-automatic email about new static checker warnings.
The patch 48c2bb0b9cf8: "Fix cg_read_strcmp()" from Sep 7, 2018,
leads to the following Smatch complaint:
./tools/testing/selftests/cgroup/cgroup_util.c:111 cg_read_strcmp()
error: we previously assumed 'expected' could be null (see line 97)
./tools/testing/selftests/cgroup/cgroup_util.c
96 /* Handle the case of comparing against empty string */
97 if (!expected)
^^^^^^^^
Originally, we assumed that expected was non-NULL but we added a check
here. I feel like maybe the intention was to check was supposed to be:
if (expected[0] == '\0')
but that's just a random guess.
98 size = 32;
99 else
100 size = strlen(expected) + 1;
101
102 buf = malloc(size);
103 if (!buf)
104 return -1;
105
106 if (cg_read(cgroup, control, buf, size)) {
107 free(buf);
108 return -1;
109 }
110
111 ret = strcmp(expected, buf);
^^^^^^^^
Unchecked dereference.
112 free(buf);
113 return ret;
regards,
dan carpenter
When test_lwt_seg6local.sh was added commit c99a84eac026
("selftests/bpf: test for seg6local End.BPF action") config fragment
wasn't added, and without CONFIG_LWTUNNEL enabled we see this:
Error: CONFIG_LWTUNNEL is not enabled in this kernel.
selftests: test_lwt_seg6local [FAILED]
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
tools/testing/selftests/bpf/config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 3655508f95fd..dd49df5e2df4 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -19,3 +19,4 @@ CONFIG_CRYPTO_SHA256=m
CONFIG_VXLAN=y
CONFIG_GENEVE=y
CONFIG_NET_CLS_FLOWER=m
+CONFIG_LWTUNNEL=y
--
2.19.1