On Tue, Sep 05, 2023 at 03:42:58PM +0200, Thomas Hellström wrote:
> Hi, Maxime
>
> On 9/5/23 15:16, Maxime Ripard wrote:
> > On Tue, Sep 05, 2023 at 02:32:38PM +0200, Thomas Hellström wrote:
> > > Hi,
> > >
> > > On 9/5/23 14:05, Maxime Ripard wrote:
> > > > Hi,
> > > >
> > > > On Tue, Sep 05, 2023 at 10:58:31AM +0200, Thomas Hellström wrote:
> > > > > Check that object freeing from within drm_exec_fini() works as expected
> > > > > and doesn't generate any warnings.
> > > > >
> > > > > Cc: Christian König <christian.koenig(a)amd.com>
> > > > > Cc: dri-devel(a)lists.freedesktop.org
> > > > > Signed-off-by: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
> > > > > ---
> > > > > drivers/gpu/drm/tests/drm_exec_test.c | 47 +++++++++++++++++++++++++++
> > > > > 1 file changed, 47 insertions(+)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/tests/drm_exec_test.c b/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > index 563949d777dd..294c25f49cc7 100644
> > > > > --- a/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > +++ b/drivers/gpu/drm/tests/drm_exec_test.c
> > > > > @@ -170,6 +170,52 @@ static void test_prepare_array(struct kunit *test)
> > > > > drm_gem_private_object_fini(&gobj2);
> > > > > }
> > > > > +static const struct drm_gem_object_funcs put_funcs = {
> > > > > + .free = (void *)kfree,
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * Check that freeing objects from within drm_exec_fini()
> > > > > + * behaves as expected.
> > > > > + */
> > > > > +static void test_early_put(struct kunit *test)
> > > > > +{
> > > > > + struct drm_exec_priv *priv = test->priv;
> > > > > + struct drm_gem_object *gobj1;
> > > > > + struct drm_gem_object *gobj2;
> > > > > + struct drm_gem_object *array[2];
> > > > > + struct drm_exec exec;
> > > > > + int ret;
> > > > > +
> > > > > + gobj1 = kzalloc(sizeof(*gobj1), GFP_KERNEL);
> > > > > + KUNIT_EXPECT_NOT_ERR_OR_NULL(test, gobj1);
> > > > > + if (!gobj1)
> > > > > + return;
> > > > > +
> > > > > + gobj2 = kzalloc(sizeof(*gobj2), GFP_KERNEL);
> > > > > + KUNIT_EXPECT_NOT_ERR_OR_NULL(test, gobj2);
> > > > > + if (!gobj2) {
> > > > > + kfree(gobj1);
> > > > > + return;
> > > > > + }
> > > > > +
> > > > > + gobj1->funcs = &put_funcs;
> > > > > + gobj2->funcs = &put_funcs;
> > > > > + array[0] = gobj1;
> > > > > + array[1] = gobj2;
> > > > > + drm_gem_private_object_init(priv->drm, gobj1, PAGE_SIZE);
> > > > > + drm_gem_private_object_init(priv->drm, gobj2, PAGE_SIZE);
> > > > > +
> > > > > + drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT);
> > > > > + drm_exec_until_all_locked(&exec)
> > > > > + ret = drm_exec_prepare_array(&exec, array, ARRAY_SIZE(array),
> > > > > + 1);
> > > > > + KUNIT_EXPECT_EQ(test, ret, 0);
> > > > > + drm_gem_object_put(gobj1);
> > > > > + drm_gem_object_put(gobj2);
> > > > > + drm_exec_fini(&exec);
> > > > It doesn't look like you actually check that "freeing objects from
> > > > within drm_exec_fini() behaves as expected." What is the expectation
> > > > here, and how is it checked?
> > > Hm. Good question, I've been manually checking dmesg for lockdep splats. Is
> > > there a way to automate that?
> > I'm not familiar with the drm_exec API, but judging by the code I'd
> > assume you want to check that gobj1 and gobj2 are actually freed using
> > kfree?
>
> Actually not. What's important here is that the call to drm_exec_fini(),
> which puts the last references to gobj1 and gobj2 doesn't trigger any
> lockdep splats, like the one in the commit message of patch 3/3. So to make
> more sense, the test could perhaps be conditioned on
> CONFIG_DEBUG_LOCK_ALLOC. Still it would require manual checking of dmesg()
> after being run.
I'm not aware of something to check on lockdep's status when running a
kunit test, but I'm not sure anyone is expected to look at the dmesg
trace when running kunit to find out whether the test succeeded or not.
It looks like there was an attempt at some point to fail the test if
there was a lockdep error:
https://lore.kernel.org/all/20200814205527.1833459-1-urielguajardojr@gmail.…
It doesn't look like it's been merged though. David, Brendan, do you
know why it wasn't merged or if there is a good option for us there?
At the very least, I think a comment after the call to drm_exec_fini to
make it clear that the error would be in the kernel logs, and a better
one on the test definition to explicitly say what you want to make sure
of, and how one can check it's been done would be great.
Maxime
From: Benjamin Berg <benjamin.berg(a)intel.com>
The existing KUNIT_ARRAY_PARAM macro requires a separate function to
get the description. However, in a lot of cases the description can
just be copied directly from the array. Add a second macro that
avoids having to write a static function just for a single strscpy.
Signed-off-by: Benjamin Berg <benjamin.berg(a)intel.com>
---
Documentation/dev-tools/kunit/usage.rst | 7 ++++---
include/kunit/test.h | 19 +++++++++++++++++++
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/Documentation/dev-tools/kunit/usage.rst b/Documentation/dev-tools/kunit/usage.rst
index c27e1646ecd9..fe8c28d66dfe 100644
--- a/Documentation/dev-tools/kunit/usage.rst
+++ b/Documentation/dev-tools/kunit/usage.rst
@@ -571,8 +571,9 @@ By reusing the same ``cases`` array from above, we can write the test as a
{
strcpy(desc, t->str);
}
- // Creates `sha1_gen_params()` to iterate over `cases`.
- KUNIT_ARRAY_PARAM(sha1, cases, case_to_desc);
+ // Creates `sha1_gen_params()` to iterate over `cases` while using
+ // the struct member `str` for the case description.
+ KUNIT_ARRAY_PARAM_DESC(sha1, cases, str);
// Looks no different from a normal test.
static void sha1_test(struct kunit *test)
@@ -588,7 +589,7 @@ By reusing the same ``cases`` array from above, we can write the test as a
}
// Instead of KUNIT_CASE, we use KUNIT_CASE_PARAM and pass in the
- // function declared by KUNIT_ARRAY_PARAM.
+ // function declared by KUNIT_ARRAY_PARAM or KUNIT_ARRAY_PARAM_DESC.
static struct kunit_case sha1_test_cases[] = {
KUNIT_CASE_PARAM(sha1_test, sha1_gen_params),
{}
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 68ff01aee244..f60d11e41855 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -1516,6 +1516,25 @@ do { \
return NULL; \
}
+/**
+ * KUNIT_ARRAY_PARAM_DESC() - Define test parameter generator from an array.
+ * @name: prefix for the test parameter generator function.
+ * @array: array of test parameters.
+ * @desc_member: structure member from array element to use as description
+ *
+ * Define function @name_gen_params which uses @array to generate parameters.
+ */
+#define KUNIT_ARRAY_PARAM_DESC(name, array, desc_member) \
+ static const void *name##_gen_params(const void *prev, char *desc) \
+ { \
+ typeof((array)[0]) *__next = prev ? ((typeof(__next)) prev) + 1 : (array); \
+ if (__next - (array) < ARRAY_SIZE((array))) { \
+ strscpy(desc, __next->desc_member, KUNIT_PARAM_DESC_SIZE); \
+ return __next; \
+ } \
+ return NULL; \
+ }
+
// TODO(dlatypov(a)google.com): consider eventually migrating users to explicitly
// include resource.h themselves if they need it.
#include <kunit/resource.h>
--
2.41.0
The most critical issue with vm.memfd_noexec=2 (the fact that passing
MFD_EXEC would bypass it entirely[1]) has been fixed in Andrew's
tree[2], but there are still some outstanding issues that need to be
addressed:
* vm.memfd_noexec=2 shouldn't reject old-style memfd_create(2) syscalls
because it will make it far to difficult to ever migrate. Instead it
should imply MFD_EXEC.
* The dmesg warnings are pr_warn_once(), which on most systems means
that they will be used up by systemd or some other boot process and
userspace developers will never see it.
- For the !(flags & (MFD_EXEC | MFD_NOEXEC_SEAL)) case, outputting a
rate-limited message to the kernel log is necessary to tell
userspace that they should add the new flags.
Arguably the most ideal way to deal with the spam concern[3,4]
while still prompting userspace to switch to the new flags would be
to only log the warning once per task or something similar.
However, adding something to task_struct for tracking this would be
needless bloat for a single pr_warn_ratelimited().
So just switch to pr_info_ratelimited() to avoid spamming the log
with something that isn't a real warning. There's lots of
info-level stuff in dmesg, it seems really unlikely that this
should be an actual problem. Most programs are already switching to
the new flags anyway.
- For the vm.memfd_noexec=2 case, we need to log a warning for every
failure because otherwise userspace will have no idea why their
previously working program started returning -EACCES (previously
-EINVAL) from memfd_create(2). pr_warn_once() is simply wrong here.
* The racheting mechanism for vm.memfd_noexec makes it incredibly
unappealing for most users to enable the sysctl because enabling it
on &init_pid_ns means you need a system reboot to unset it. Given the
actual security threat being protected against, CAP_SYS_ADMIN users
being restricted in this way makes little sense.
The argument for this ratcheting by the original author was that it
allows you to have a hierarchical setting that cannot be unset by
child pidnses, but this is not accurate -- changing the parent
pidns's vm.memfd_noexec setting to be more restrictive didn't affect
children.
Instead, switch the vm.memfd_noexec sysctl to be properly
hierarchical and allow CAP_SYS_ADMIN users (in the pidns's owning
userns) to lower the setting as long as it is not lower than the
parent's effective setting. This change also makes it so that
changing a parent pidns's vm.memfd_noexec will affect all
descendants, providing a properly hierarchical setting. The
performance impact of this is incredibly minimal since the maximum
depth of pidns is 32 and it is only checked during memfd_create(2)
and unshare(CLONE_NEWPID).
* The memfd selftests would not exit with a non-zero error code when
certain tests that ran in a forked process (specifically the ones
related to MFD_EXEC and MFD_NOEXEC_SEAL) failed.
[1]: https://lore.kernel.org/all/ZJwcsU0vI-nzgOB_@codewreck.org/
[2]: https://lore.kernel.org/all/20230705063315.3680666-1-jeffxu@google.com/
[3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundatio…
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
Changes in v2:
- Make vm.memfd_noexec restrictions properly hierarchical.
- Allow vm.memfd_noexec setting to be lowered by CAP_SYS_ADMIN as long
as it is not lower than the parent's effective setting.
- Fix the logging behaviour related to the new flags and
vm.memfd_noexec=2.
- Add more thorough tests for vm.memfd_noexec in selftests.
- v1: <https://lore.kernel.org/r/20230713143406.14342-1-cyphar@cyphar.com>
---
Aleksa Sarai (5):
selftests: memfd: error out test process when child test fails
memfd: do not -EACCES old memfd_create() users with vm.memfd_noexec=2
memfd: improve userspace warnings for missing exec-related flags
memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy
selftests: improve vm.memfd_noexec sysctl tests
include/linux/pid_namespace.h | 39 ++--
kernel/pid.c | 3 +
kernel/pid_namespace.c | 6 +-
kernel/pid_sysctl.h | 28 ++-
mm/memfd.c | 33 ++-
tools/testing/selftests/memfd/memfd_test.c | 332 +++++++++++++++++++++++------
6 files changed, 322 insertions(+), 119 deletions(-)
---
base-commit: 3ff995246e801ea4de0a30860a1d8da4aeb538e7
change-id: 20230803-memfd-vm-noexec-uapi-fixes-ace725c67b0f
Best regards,
--
Aleksa Sarai <cyphar(a)cyphar.com>
From: Rong Tao <rongtao(a)cestc.cn>
We need to optimize the kallsyms cache, including optimizations for the
number of symbols limit, and, some test cases add new kernel symbols
(such as testmods) and we need to refresh kallsyms (reload or refresh).
Rong Tao (2):
selftests/bpf: trace_helpers.c: optimize kallsyms cache
selftests/bpf: trace_helpers.c: Add a global ksyms initialization
mutex
samples/bpf/Makefile | 4 +
.../selftests/bpf/prog_tests/fill_link_info.c | 2 +-
.../prog_tests/kprobe_multi_testmod_test.c | 20 ++-
tools/testing/selftests/bpf/trace_helpers.c | 136 +++++++++++++-----
tools/testing/selftests/bpf/trace_helpers.h | 9 +-
5 files changed, 126 insertions(+), 45 deletions(-)
--
2.41.0
Hi all,
Recently "memfd: improve userspace warnings for missing exec-related
flags" was merged. On my system, this is a regression, not an
improvement, because the entire 256k kernel log buffer (default on x86)
is filled with these warnings and "__do_sys_memfd_create: 122 callbacks
suppressed". I haven't investigated too closely, but the most likely
cause is Wayland libraries.
This is too serious of a consequence for using an old API, especially
considering how recently the flags were added. The vast majority of
software has not had time to add the flags: glibc does not define the
macros until 2.38 which was released less than one month ago, man-pages
does not document the flags, and according to Debian Code Search, only
systemd, stress-ng, and strace actually pass either of these flags.
Furthermore, since old kernels reject unknown flags, it's not just a
matter of defining and passing the flag; every program needs to
add logic to handle EINVAL and try again.
Some other way needs to be found to encourage userspace to add the
flags; otherwise, this message will be patched out because the kernel
log becomes unusable after running unupdated programs, which will still
exist even after upstreams are fixed. In particular, AppImages,
flatpaks, snaps, and similar app bundles contain vendored Wayland
libraries which can be difficult or impossible to update.
Thanks,
Alex.
This change introduces a new fcntl to check if an fd points to a memfd's
original open fd (the one created by memfd_create).
We encountered an issue with migrating memfds in CRIU (checkpoint
restore in userspace - it migrates running processes between
machines). Imagine a scenario:
1. Create a memfd. By default it's open with O_RDWR and yet one can
exec() to it (unlike with regular files, where one would get ETXTBSY).
2. Reopen that memfd with O_RDWR via /proc/self/fd/<fd>.
Now those 2 fds are indistinguishable from userspace. You can't exec()
to either of them (since the reopen incremented inode->i_writecount)
and their /proc/self/fdinfo/ are exactly the same. Unfortunately they
are not the same. If you close the second one, the first one becomes
exec()able again. If you close the first one, the other doesn't become
exec()able. Therefore during migration it does matter which is recreated
first and which is reopened but there is no way for CRIU to tell which
was first.
Michal Clapinski (2):
fcntl: add fcntl(F_CHECK_ORIGINAL_MEMFD)
selftests: test fcntl(F_CHECK_ORIGINAL_MEMFD)
fs/fcntl.c | 3 ++
include/uapi/linux/fcntl.h | 9 ++++++
tools/testing/selftests/memfd/memfd_test.c | 32 ++++++++++++++++++++++
3 files changed, 44 insertions(+)
--
2.42.0.283.g2d96d420d3-goog