Historically, kretprobe has always produced unusable stack traces
(kretprobe_trampoline is the only entry in most cases, because of the
funky stack pointer overwriting). This has caused quite a few annoyances
when using tracing to debug problems[1] -- since return values are only
available with kretprobes but stack traces were only usable for kprobes,
users had to probe both and then manually associate them.
This patch series stores the stack trace within kretprobe_instance on
the kprobe entry used to set up the kretprobe. This allows for
DTrace-style stack aggregation between function entry and exit with
tools like BPFtrace -- which would not really be doable if the stack
unwinder understood kretprobe_trampoline.
We also revert commit 76094a2cf46e ("ftrace: distinguish kretprobe'd
functions in trace logs") and any follow-up changes because that code is
no longer necessary now that stack traces are sane. *However* this patch
might be a bit contentious since the original usecase (that ftrace
returns shouldn't show kretprobe_trampoline) is arguably still an
issue. Feel free to drop it if you think it is wrong.
Patch changelog:
v3:
* kprobe: fix build on !CONFIG_KPROBES
v2:
* documentation: mention kretprobe stack-stashing
* ftrace: add self-test for fixed kretprobe stacktraces
* ftrace: remove [unknown/kretprobe'd] handling
* kprobe: remove needless EXPORT statements
* kprobe: minor corrections to current_kretprobe_instance (switch
away from hlist_for_each_entry_safe)
* kprobe: make maximum stack size 127, which is the ftrace default
Aleksa Sarai (2):
kretprobe: produce sane stack traces
trace: remove kretprobed checks
Documentation/kprobes.txt | 6 +-
include/linux/kprobes.h | 27 +++++
kernel/events/callchain.c | 8 +-
kernel/kprobes.c | 101 +++++++++++++++++-
kernel/trace/trace.c | 11 +-
kernel/trace/trace_output.c | 34 +-----
.../test.d/kprobe/kretprobe_stacktrace.tc | 25 +++++
7 files changed, 177 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kretprobe_stacktrace.tc
--
2.19.1
Hi,friend,
This is Daniel Murray and i am from Sinara Group Co.Ltd Group Co.,LTD in Russia.
We are glad to know about your company from the web and we are interested in your products.
Could you kindly send us your Latest catalog and price list for our trial order.
Best Regards,
Daniel Murray
Purchasing Manager
Android uses ashmem for sharing memory regions. We are looking forward
to migrating all usecases of ashmem to memfd so that we can possibly
remove the ashmem driver in the future from staging while also
benefiting from using memfd and contributing to it. Note staging drivers
are also not ABI and generally can be removed at anytime.
One of the main usecases Android has is the ability to create a region
and mmap it as writeable, then add protection against making any
"future" writes while keeping the existing already mmap'ed
writeable-region active. This allows us to implement a usecase where
receivers of the shared memory buffer can get a read-only view, while
the sender continues to write to the buffer.
See CursorWindow documentation in Android for more details:
https://developer.android.com/reference/android/database/CursorWindow
This usecase cannot be implemented with the existing F_SEAL_WRITE seal.
To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE seal
which prevents any future mmap and write syscalls from succeeding while
keeping the existing mmap active. The following program shows the seal
working in action:
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>
#include <linux/memfd.h>
#include <linux/fcntl.h>
#include <asm/unistd.h>
#include <unistd.h>
#define F_SEAL_FUTURE_WRITE 0x0010
#define REGION_SIZE (5 * 1024 * 1024)
int memfd_create_region(const char *name, size_t size)
{
int ret;
int fd = syscall(__NR_memfd_create, name, MFD_ALLOW_SEALING);
if (fd < 0) return fd;
ret = ftruncate(fd, size);
if (ret < 0) { close(fd); return ret; }
return fd;
}
int main() {
int ret, fd;
void *addr, *addr2, *addr3, *addr1;
ret = memfd_create_region("test_region", REGION_SIZE);
printf("ret=%d\n", ret);
fd = ret;
// Create map
addr = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED)
printf("map 0 failed\n");
else
printf("map 0 passed\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed even though no future-write seal "
"(ret=%d errno =%d)\n", ret, errno);
else
printf("write passed\n");
addr1 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr1 == MAP_FAILED)
perror("map 1 prot-write failed even though no seal\n");
else
printf("map 1 prot-write passed as expected\n");
ret = fcntl(fd, F_ADD_SEALS, F_SEAL_FUTURE_WRITE |
F_SEAL_GROW |
F_SEAL_SHRINK);
if (ret == -1)
printf("fcntl failed, errno: %d\n", errno);
else
printf("future-write seal now active\n");
if ((ret = write(fd, "test", 4)) != 4)
printf("write failed as expected due to future-write seal\n");
else
printf("write passed (unexpected)\n");
addr2 = mmap(0, REGION_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr2 == MAP_FAILED)
perror("map 2 prot-write failed as expected due to seal\n");
else
printf("map 2 passed\n");
addr3 = mmap(0, REGION_SIZE, PROT_READ, MAP_SHARED, fd, 0);
if (addr3 == MAP_FAILED)
perror("map 3 failed\n");
else
printf("map 3 prot-read passed as expected\n");
}
The output of running this program is as follows:
ret=3
map 0 passed
write passed
map 1 prot-write passed as expected
future-write seal now active
write failed as expected due to future-write seal
map 2 prot-write failed as expected due to seal
: Permission denied
map 3 prot-read passed as expected
Cc: jreck(a)google.com
Cc: john.stultz(a)linaro.org
Cc: tkjos(a)google.com
Cc: gregkh(a)linuxfoundation.org
Cc: hch(a)infradead.org
Reviewed-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
v1->v2: No change, just added selftests to the series. manpages are
ready and I'll submit them once the patches are accepted.
v2->v3: Updated commit message to have more support code (John Stultz)
Renamed seal from F_SEAL_FS_WRITE to F_SEAL_FUTURE_WRITE
(Christoph Hellwig)
Allow for this seal only if grow/shrink seals are also
either previous set, or are requested along with this seal.
(Christoph Hellwig)
Added locking to synchronize access to file->f_mode.
(Christoph Hellwig)
include/uapi/linux/fcntl.h | 1 +
mm/memfd.c | 22 +++++++++++++++++++++-
2 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index 6448cdd9a350..a2f8658f1c55 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -41,6 +41,7 @@
#define F_SEAL_SHRINK 0x0002 /* prevent file from shrinking */
#define F_SEAL_GROW 0x0004 /* prevent file from growing */
#define F_SEAL_WRITE 0x0008 /* prevent writes */
+#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
/* (1U << 31) is reserved for signed error codes */
/*
diff --git a/mm/memfd.c b/mm/memfd.c
index 2bb5e257080e..5ba9804e9515 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -150,7 +150,8 @@ static unsigned int *memfd_file_seals_ptr(struct file *file)
#define F_ALL_SEALS (F_SEAL_SEAL | \
F_SEAL_SHRINK | \
F_SEAL_GROW | \
- F_SEAL_WRITE)
+ F_SEAL_WRITE | \
+ F_SEAL_FUTURE_WRITE)
static int memfd_add_seals(struct file *file, unsigned int seals)
{
@@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file, unsigned int seals)
}
}
+ if ((seals & F_SEAL_FUTURE_WRITE) &&
+ !(*file_seals & F_SEAL_FUTURE_WRITE)) {
+ /*
+ * The FUTURE_WRITE seal also prevents growing and shrinking
+ * so we need them to be already set, or requested now.
+ */
+ int test_seals = (seals | *file_seals) &
+ (F_SEAL_GROW | F_SEAL_SHRINK);
+
+ if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
+ error = -EINVAL;
+ goto unlock;
+ }
+
+ spin_lock(&file->f_lock);
+ file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
+ spin_unlock(&file->f_lock);
+ }
+
*file_seals |= seals;
error = 0;
--
2.19.1.930.g4563a0d9d0-goog
Hi,friend,
This is Daniel Murray and i am from Sinara Group Co.Ltd Group Co.,LTD in Russia.
We are glad to know about your company from the web and we are interested in your products.
Could you kindly send us your Latest catalog and price list for our trial order.
Best Regards,
Daniel Murray
Purchasing Manager
MAP_FIXED is important for this test but, unfortunately, lowest virtual
address for user space mapping on arm is (PAGE_SIZE * 2) and NULL hint
does not seem to guarantee that when MAP_FIXED is given. This patch sets
the virtual address that will hold the mapping for the test, fixing the
issue.
Link: https://bugs.linaro.org/show_bug.cgi?id=3782
Signed-off-by: Rafael David Tinoco <rafael.tinoco(a)linaro.org>
---
tools/testing/selftests/proc/proc-self-map-files-002.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/proc/proc-self-map-files-002.c b/tools/testing/selftests/proc/proc-self-map-files-002.c
index 6f1f4a6e1ecb..0a47eaca732a 100644
--- a/tools/testing/selftests/proc/proc-self-map-files-002.c
+++ b/tools/testing/selftests/proc/proc-self-map-files-002.c
@@ -55,7 +55,9 @@ int main(void)
if (fd == -1)
return 1;
- p = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 0);
+ p = mmap((void *) (2 * PAGE_SIZE), PAGE_SIZE, PROT_NONE,
+ MAP_PRIVATE|MAP_FILE|MAP_FIXED, fd, 0);
+
if (p == MAP_FAILED) {
if (errno == EPERM)
return 2;
--
2.19.1
On Sat, Nov 10, 2018 at 04:26:46AM -0800, Daniel Colascione wrote:
> On Friday, November 9, 2018, Joel Fernandes <joel(a)joelfernandes.org> wrote:
>
> > On Fri, Nov 09, 2018 at 10:19:03PM +0100, Jann Horn wrote:
> > > On Fri, Nov 9, 2018 at 10:06 PM Jann Horn <jannh(a)google.com> wrote:
> > > > On Fri, Nov 9, 2018 at 9:46 PM Joel Fernandes (Google)
> > > > <joel(a)joelfernandes.org> wrote:
> > > > > Android uses ashmem for sharing memory regions. We are looking
> > forward
> > > > > to migrating all usecases of ashmem to memfd so that we can possibly
> > > > > remove the ashmem driver in the future from staging while also
> > > > > benefiting from using memfd and contributing to it. Note staging
> > drivers
> > > > > are also not ABI and generally can be removed at anytime.
> > > > >
> > > > > One of the main usecases Android has is the ability to create a
> > region
> > > > > and mmap it as writeable, then add protection against making any
> > > > > "future" writes while keeping the existing already mmap'ed
> > > > > writeable-region active. This allows us to implement a usecase where
> > > > > receivers of the shared memory buffer can get a read-only view, while
> > > > > the sender continues to write to the buffer.
> > > > > See CursorWindow documentation in Android for more details:
> > > > > https://developer.android.com/reference/android/database/
> > CursorWindow
> > > > >
> > > > > This usecase cannot be implemented with the existing F_SEAL_WRITE
> > seal.
> > > > > To support the usecase, this patch adds a new F_SEAL_FUTURE_WRITE
> > seal
> > > > > which prevents any future mmap and write syscalls from succeeding
> > while
> > > > > keeping the existing mmap active.
> > > >
> > > > Please CC linux-api@ on patches like this. If you had done that, I
> > > > might have criticized your v1 patch instead of your v3 patch...
> > > >
> > > > > The following program shows the seal
> > > > > working in action:
> > > > [...]
> > > > > Cc: jreck(a)google.com
> > > > > Cc: john.stultz(a)linaro.org
> > > > > Cc: tkjos(a)google.com
> > > > > Cc: gregkh(a)linuxfoundation.org
> > > > > Cc: hch(a)infradead.org
> > > > > Reviewed-by: John Stultz <john.stultz(a)linaro.org>
> > > > > Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
> > > > > ---
> > > > [...]
> > > > > diff --git a/mm/memfd.c b/mm/memfd.c
> > > > > index 2bb5e257080e..5ba9804e9515 100644
> > > > > --- a/mm/memfd.c
> > > > > +++ b/mm/memfd.c
> > > > [...]
> > > > > @@ -219,6 +220,25 @@ static int memfd_add_seals(struct file *file,
> > unsigned int seals)
> > > > > }
> > > > > }
> > > > >
> > > > > + if ((seals & F_SEAL_FUTURE_WRITE) &&
> > > > > + !(*file_seals & F_SEAL_FUTURE_WRITE)) {
> > > > > + /*
> > > > > + * The FUTURE_WRITE seal also prevents growing and
> > shrinking
> > > > > + * so we need them to be already set, or requested
> > now.
> > > > > + */
> > > > > + int test_seals = (seals | *file_seals) &
> > > > > + (F_SEAL_GROW | F_SEAL_SHRINK);
> > > > > +
> > > > > + if (test_seals != (F_SEAL_GROW | F_SEAL_SHRINK)) {
> > > > > + error = -EINVAL;
> > > > > + goto unlock;
> > > > > + }
> > > > > +
> > > > > + spin_lock(&file->f_lock);
> > > > > + file->f_mode &= ~(FMODE_WRITE | FMODE_PWRITE);
> > > > > + spin_unlock(&file->f_lock);
> > > > > + }
> > > >
> > > > So you're fiddling around with the file, but not the inode? How are
> > > > you preventing code like the following from re-opening the file as
> > > > writable?
> > > >
> > > > $ cat memfd.c
> > > > #define _GNU_SOURCE
> > > > #include <unistd.h>
> > > > #include <sys/syscall.h>
> > > > #include <printf.h>
> > > > #include <fcntl.h>
> > > > #include <err.h>
> > > > #include <stdio.h>
> > > >
> > > > int main(void) {
> > > > int fd = syscall(__NR_memfd_create, "testfd", 0);
> > > > if (fd == -1) err(1, "memfd");
> > > > char path[100];
> > > > sprintf(path, "/proc/self/fd/%d", fd);
> > > > int fd2 = open(path, O_RDWR);
> > > > if (fd2 == -1) err(1, "reopen");
> > > > printf("reopen successful: %d\n", fd2);
> > > > }
> > > > $ gcc -o memfd memfd.c
> > > > $ ./memfd
> > > > reopen successful: 4
> > > > $
> > > >
> > > > That aside: I wonder whether a better API would be something that
> > > > allows you to create a new readonly file descriptor, instead of
> > > > fiddling with the writability of an existing fd.
> > >
> > > My favorite approach would be to forbid open() on memfds, hope that
> > > nobody notices the tiny API break, and then add an ioctl for "reopen
> > > this memfd with reduced permissions" - but that's just my personal
> > > opinion.
> >
> > I did something along these lines and it fixes the issue, but I forbid open
> > of memfd only when the F_SEAL_FUTURE_WRITE seal is in place. So then its
> > not
> > an ABI break because this is a brand new seal. That seems the least
> > intrusive
> > solution and it works. Do you mind testing it and I'll add your and
> > Tested-by
> > to the new fix? The patch is based on top of this series.
> >
>
> Please don't forbid reopens entirely. You're taking a feature that works
> generally (reopens) and breaking it in one specific case (memfd write
> sealed files). The open modes are available in .open in the struct file:
> you can deny *only* opens for write instead of denying reopens generally.
Yes, as we discussed over chat already, I will implement it that way.
Also lets continue to discuss Andy's concerns he raised on the other thread.
thanks,
- Joel
From: Stanislav Fomichev <sdf(a)google.com>
v4 changes:
* addressed another round of comments/style issues from Jakub Kicinski &
Quentin Monnet (thanks!)
* implemented bpf_object__pin_maps and bpf_object__pin_programs helpers and
used them in bpf_program__pin
* added new pin_name to bpf_program so bpf_program__pin
works with sections that contain '/'
* moved *loadall* command implementation into a separate patch
* added patch that implements *pinmaps* to pin maps when doing
load/loadall
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin, parts of which are now
being used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* forth patch adds pin_name to the bpf_program struct
which is now used as a pin name in bpf_program__pin et al
* fifth patch adds *loadall* command that pins all programs, not just
the first one
* sixth patch adds *pinmaps* argument to load/loadall to let users pin
all maps of the obj file
* seventh patch adds actual flow_dissector support to the bpftool and
an example
Stanislav Fomichev (7):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
libbpf: bpf_program__pin: add special case for instances.nr == 1
libbpf: add internal pin_name
bpftool: add loadall command
bpftool: add pinmaps argument to the load/loadall
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 42 +-
tools/bpf/bpftool/bash-completion/bpftool | 21 +-
tools/bpf/bpftool/common.c | 31 +-
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 185 ++++++---
tools/lib/bpf/libbpf.c | 364 ++++++++++++++++--
tools/lib/bpf/libbpf.h | 18 +
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
9 files changed, 546 insertions(+), 120 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
v3 changes:
* (maybe) better cleanup for partial failure in bpf_object__pin
* added special case in bpf_program__pin for programs with single
instances
v2 changes:
* addressed comments/style issues from Jakub Kicinski & Quentin Monnet
* removed logic that populates jump table
* added cleanup for partial failure in bpf_object__pin
This patch series adds support for loading and attaching flow dissector
programs from the bpftool:
* first patch fixes flow dissector section name in the selftests (so
libbpf auto-detection works)
* second patch adds proper cleanup to bpf_object__pin which is now being
used to attach all flow dissector progs/maps
* third patch adds special case in bpf_program__pin for programs with
single instances (we don't create <prog>/0 pin anymore, just <prog>)
* forth patch adds actual support to the bpftool
See forth patch for the description/details.
Stanislav Fomichev (4):
selftests/bpf: rename flow dissector section to flow_dissector
libbpf: cleanup after partial failure in bpf_object__pin
libbpf: bpf_program__pin: add special case for instances.nr == 1
bpftool: support loading flow dissector
.../bpftool/Documentation/bpftool-prog.rst | 36 ++-
tools/bpf/bpftool/bash-completion/bpftool | 6 +-
tools/bpf/bpftool/common.c | 30 +-
tools/bpf/bpftool/main.h | 1 +
tools/bpf/bpftool/prog.c | 112 ++++++--
tools/lib/bpf/libbpf.c | 258 ++++++++++++++++--
tools/lib/bpf/libbpf.h | 11 +
tools/testing/selftests/bpf/bpf_flow.c | 2 +-
.../selftests/bpf/test_flow_dissector.sh | 2 +-
9 files changed, 368 insertions(+), 90 deletions(-)
--
2.19.1.930.g4563a0d9d0-goog
All,
Updating Kselftest wiki and providing links to overview and how-to documents
has been on my list of things to do for a while.
It is now updated with the current status and links to documents. I am planning
to write a detailed how-to blog/article.
https://kselftest.wiki.kernel.org
thanks,
-- Shuah