Hi Kees and All,
There are several tests in the kselftest subsystem which load modules to test
the internals of the kernel. Most of these test modules are just loaded by
the kselftest; their status isn't read and reported to the user logs. Hence
executing those tests provides no benefit.
I've found patches from Kees where he has been converting such kselftests
to kunit tests [1]. The probable motivation is to move tests out of the
kselftest subsystem, which only triggers them without correctly reporting
the results. On the other hand, kunit is there to test the kernel's
internal functions, which can't be done from userspace.
Kselftest: Test user facing APIs from userspace
Kunit: Test kernel's internal functions from kernelspace
This brings me to the conclusion that kselftests which load modules to
test kernelspace should be converted to kunit tests. I've noted several
such kselftests.
This is just my understanding. Please let me know whether the above is
correct, and share any further reasons to support converting kselftest test
modules into kunit tests.
[1] https://lore.kernel.org/all/20221018082824.never.845-kees@kernel.org/
--
BR,
Muhammad Usama Anjum
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively); I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and makes it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
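As a rough illustration of the mechanism described above, here is a toy
software model in C. It is only a sketch: real shadow stacks are maintained
by the hardware, and the names toy_shstk, toy_call() and toy_ret() are made
up for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHSTK_DEPTH 64	/* arbitrary capacity chosen for this sketch */

/* Toy model of a shadow stack: it holds return addresses and nothing else. */
struct toy_shstk {
	uintptr_t ret[SHSTK_DEPTH];
	size_t top;	/* number of recorded return addresses */
};

/* On a call: record the return address. Returns false if the stack is full. */
bool toy_call(struct toy_shstk *ss, uintptr_t ret_addr)
{
	if (ss->top == SHSTK_DEPTH)
		return false;
	ss->ret[ss->top++] = ret_addr;
	return true;
}

/*
 * On a return: pop the recorded address and compare it with the address
 * taken from the normal stack. A mismatch (or an empty shadow stack) means
 * the return is not to a recorded address, so it is rejected.
 */
bool toy_ret(struct toy_shstk *ss, uintptr_t ret_addr)
{
	if (ss->top == 0)
		return false;
	return ss->ret[--ss->top] == ret_addr;
}
```

A return address overwritten on the normal stack no longer matches the
recorded copy, which is what gives the protection against ROP mentioned
above.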
Our API for shadow stacks does not currently offer userspace any
flexibility for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal; in the vast
majority of cases the shadow stack will be overallocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). The user must provide a shadow stack
address and size; this must point to memory mapped for use as a shadow
stack by map_shadow_stack() with a shadow stack token at the top of the
stack.
Please note that the x86 portions of this code are build tested only; I
don't appear to have a system that can run CET available to me. I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks; this is
still under discussion.
Please further note that the token consumption done by clone3() is not
currently implemented in an atomic fashion; Rick indicated that he would
look into fixing this if people are OK with the implementation.
A new architecture feature Kconfig option for shadow stacks is added
here; this was suggested as part of the review comments for the arm64
GCS series, and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (9):
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Explicitly handle child exits due to signals
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 41 ++++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 104 +++++++---
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 13 ++
include/uapi/linux/sched.h | 13 +-
kernel/fork.c | 76 ++++++--
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 224 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 40 +++-
tools/testing/selftests/ksft_shstk.h | 63 ++++++
15 files changed, 511 insertions(+), 88 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A regression happened where the ownership test passes on the first
iteration but fails when run a second time. This was caught and fixed,
but a later change brought it back. The regression was missed because the
automated tests only run the tests once per boot.
Change the ownership test to iterate through the tests twice, as this will
catch the regression with a single run.
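To illustrate why a second iteration matters, here is a toy model in C of a
check that leaves state behind. It is only a sketch and is unrelated to the
actual tracefs code; leftover, run_check(), reset_state() and
first_failure() are made-up names.

```c
#include <stdbool.h>

/* Hypothetical state that survives between runs, like leftovers on a mount. */
static bool leftover;

/* Put the toy back into a clean state, as a fresh boot would. */
void reset_state(void)
{
	leftover = false;
}

/* A check that only passes from a clean state and forgets to clean up. */
bool run_check(void)
{
	bool ok = !leftover;

	leftover = true;	/* the bug: state leaks into the next run */
	return ok;
}

/* Run the check repeatedly; return the first failing iteration, or 0. */
int first_failure(int iterations)
{
	for (int i = 1; i <= iterations; i++)
		if (!run_check())
			return i;
	return 0;
}
```

With one iteration from a clean state the leak goes unnoticed, while two
iterations report a failure on the second pass.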
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
.../ftrace/test.d/00basic/test_ownership.tc | 34 +++++++++++--------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
index c45094d1e1d2..71e43a92352a 100644
--- a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
+++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
@@ -83,32 +83,38 @@ run_tests() {
done
}
-mount -o remount,"$new_options" .
+# Run the tests twice as leftovers can cause issues
+for loop in 1 2 ; do
-run_tests
+ echo "Running iteration $loop"
-mount -o remount,"$mount_options" .
+ mount -o remount,"$new_options" .
-for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
- test "$d" $original_group
-done
+ run_tests
+
+ mount -o remount,"$mount_options" .
+
+ for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $original_group
+ done
# check instances as well
-chgrp $other_group instances
+ chgrp $other_group instances
-instance="$(mktemp -u test-XXXXXX)"
+ instance="$(mktemp -u test-XXXXXX)"
-mkdir instances/$instance
+ mkdir instances/$instance
-cd instances/$instance
+ cd instances/$instance
-run_tests
+ run_tests
-cd ../..
+ cd ../..
-rmdir instances/$instance
+ rmdir instances/$instance
-chgrp $original_group instances
+ chgrp $original_group instances
+done
exit 0
--
2.43.0
This revision only updates the tests from the previous revision[1], and
integrates an Acked-by[2] and a Reviewed-By[3] into the first commit
message.
Documentation/admin-guide/cgroup-v2.rst | 22 ++-
include/linux/cgroup-defs.h | 5 +
include/linux/cgroup.h | 3 +
include/linux/memcontrol.h | 5 +
include/linux/page_counter.h | 11 +-
kernel/cgroup/cgroup-internal.h | 2 +
kernel/cgroup/cgroup.c | 7 +
mm/memcontrol.c | 116 +++++++++++++--
mm/page_counter.c | 30 +++-
tools/testing/selftests/cgroup/cgroup_util.c | 22 +++
tools/testing/selftests/cgroup/cgroup_util.h | 2 +
tools/testing/selftests/cgroup/test_memcontrol.c | 264 ++++++++++++++++++++++++++++++++-
12 files changed, 454 insertions(+), 35 deletions(-)
[1]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/
[2]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/#…
[3]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/#…
Thank you all for the support and reviews so far!
David Finkel
Senior Principal Software Engineer
Vimeo Inc.
Hello,
this series brings a new set of tests converted to the test_progs framework.
Since the tests are quite small, I chose to group three test conversions in
the same series, but feel free to let me know if I should keep one series
per test. The series focuses on cgroup testing and converts the following
tests:
- get_cgroup_id_user
- cgroup_storage
- test_skb_cgroup_id_user
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (4):
selftests/bpf: convert get_current_cgroup_id_user to test_progs
selftests/bpf: convert test_cgroup_storage to test_progs
selftests/bpf: add proper section name to bpf prog and rename it
selftests/bpf: convert test_skb_cgroup_id_user to test_progs
tools/testing/selftests/bpf/.gitignore | 3 -
tools/testing/selftests/bpf/Makefile | 8 +-
tools/testing/selftests/bpf/get_cgroup_id_user.c | 151 -----------------
.../selftests/bpf/prog_tests/cgroup_ancestor.c | 159 ++++++++++++++++++
.../bpf/prog_tests/cgroup_get_current_cgroup_id.c | 58 +++++++
.../selftests/bpf/prog_tests/cgroup_storage.c | 65 ++++++++
...test_skb_cgroup_id_kern.c => cgroup_ancestor.c} | 2 +-
tools/testing/selftests/bpf/progs/cgroup_storage.c | 24 +++
tools/testing/selftests/bpf/test_cgroup_storage.c | 174 --------------------
tools/testing/selftests/bpf/test_skb_cgroup_id.sh | 63 -------
.../selftests/bpf/test_skb_cgroup_id_user.c | 183 ---------------------
11 files changed, 309 insertions(+), 581 deletions(-)
---
base-commit: 0e2eaf4b33f65e904b69bae6b956f3f610dbba9a
change-id: 20240725-convert_cgroup_tests-d07c66053225
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
kunit_driver_create() accepts a name for the driver, but does not copy
it, so if that name is either on the stack, or otherwise freed, we end
up with a use-after-free when the driver is cleaned up.
Instead, strdup() the name, and manage it as another KUnit allocation.
As there was no existing kunit_kstrdup(), we add one. Further, add a
kunit_ variant of kstrdup_const() and kfree_const(), so we don't need to
allocate and manage the string in the majority of cases where it's a
constant.
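The "copy only when not constant" idea can be sketched in plain userspace C.
This is an illustration under assumptions, not the KUnit code: userspace has
no is_kernel_rodata(), so a single array (const_names) stands in for
.rodata, and dup_unless_const() / free_unless_const() are made-up names.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for .rodata: the only storage treated as constant. */
const char const_names[] = "kunit-driver";

/* Userspace analogue of an rodata check: does p point into const_names? */
int in_const_names(const void *p)
{
	uintptr_t addr = (uintptr_t)p;
	uintptr_t start = (uintptr_t)const_names;

	return addr >= start && addr < start + sizeof(const_names);
}

/* Return s itself if it is constant, otherwise a heap copy (NULL on failure). */
const char *dup_unless_const(const char *s)
{
	size_t len;
	char *copy;

	if (!s || in_const_names(s))
		return s;

	len = strlen(s) + 1;
	copy = malloc(len);
	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Free only what dup_unless_const() actually allocated. */
void free_unless_const(const char *s)
{
	if (!in_const_names(s))
		free((void *)s);
}
```

A name built on the stack gets its own copy that outlives the caller's
buffer, while a constant name is returned unchanged and never freed.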
This fixes a KASAN splat with overflow.overflow_allocation_test, when
built as a module.
Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
Reported-by: Nico Pache <npache(a)redhat.com>
Closes: https://groups.google.com/g/kunit-dev/c/81V9b9QYON0
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Kees Cook <kees(a)kernel.org>
---
There are some more serious changes since the RFC I sent, so please take a
closer look.
Thanks,
-- David
Changes since RFC:
https://groups.google.com/g/kunit-dev/c/81V9b9QYON0/m/PFKNKDKAAAAJ
- Add and use the kunit_kstrdup_const() and kunit_kfree_const()
functions.
- Fix a typo in the doc comments.
---
include/kunit/test.h | 58 ++++++++++++++++++++++++++++++++++++++++++++
lib/kunit/device.c | 7 ++++--
2 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index e2a1f0928e8b..da9e84de14c0 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -28,6 +28,7 @@
#include <linux/types.h>
#include <asm/rwonce.h>
+#include <asm/sections.h>
/* Static key: true if any KUnit tests are currently running */
DECLARE_STATIC_KEY_FALSE(kunit_running);
@@ -480,6 +481,63 @@ static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp
return kunit_kmalloc_array(test, n, size, gfp | __GFP_ZERO);
}
+
+/**
+ * kunit_kfree_const() - conditionally free test managed memory
+ * @x: pointer to the memory
+ *
+ * Calls kunit_kfree() only if @x is not in .rodata section.
+ * See kunit_kstrdup_const() for more information.
+ */
+static inline void kunit_kfree_const(struct kunit *test, const void *x)
+{
+ if (!is_kernel_rodata((unsigned long)x))
+ kunit_kfree(test, x);
+}
+
+/**
+ * kunit_kstrdup() - Duplicates a string into a test managed allocation.
+ *
+ * @test: The test context object.
+ * @str: The NULL-terminated string to duplicate.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kstrdup() and kunit_kmalloc_array() for more information.
+ */
+static inline char *kunit_kstrdup(struct kunit *test, const char *str, gfp_t gfp)
+{
+ size_t len;
+ char *buf;
+
+ if (!str)
+ return NULL;
+
+ len = strlen(str) + 1;
+ buf = kunit_kmalloc(test, len, gfp);
+ if (buf)
+ memcpy(buf, str, len);
+ return buf;
+}
+
+/**
+ * kunit_kstrdup_const() - Conditionally duplicates a string into a test managed allocation.
+ *
+ * @test: The test context object.
+ * @str: The NULL-terminated string to duplicate.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * Calls kunit_kstrdup() only if @str is not in the rodata section. Must be freed with
+ * kunit_kfree_const() -- not kunit_kfree().
+ * See kstrdup_const() and kunit_kmalloc_array() for more information.
+ */
+static inline const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp)
+{
+ if (is_kernel_rodata((unsigned long)str))
+ return str;
+
+ return kunit_kstrdup(test, str, gfp);
+}
+
/**
* kunit_vm_mmap() - Allocate KUnit-tracked vm_mmap() area
* @test: The test context object.
diff --git a/lib/kunit/device.c b/lib/kunit/device.c
index 25c81ed465fb..520c1fccee8a 100644
--- a/lib/kunit/device.c
+++ b/lib/kunit/device.c
@@ -89,7 +89,7 @@ struct device_driver *kunit_driver_create(struct kunit *test, const char *name)
if (!driver)
return ERR_PTR(err);
- driver->name = name;
+ driver->name = kunit_kstrdup_const(test, name, GFP_KERNEL);
driver->bus = &kunit_bus_type;
driver->owner = THIS_MODULE;
@@ -192,8 +192,11 @@ void kunit_device_unregister(struct kunit *test, struct device *dev)
const struct device_driver *driver = to_kunit_device(dev)->driver;
kunit_release_action(test, device_unregister_wrapper, dev);
- if (driver)
+ if (driver) {
+ const char *driver_name = driver->name;
kunit_release_action(test, driver_unregister_wrapper, (void *)driver);
+ kunit_kfree_const(test, driver_name);
+ }
}
EXPORT_SYMBOL_GPL(kunit_device_unregister);
--
2.46.0.rc1.232.g9752f9e123-goog
In arm64 pKVM and QuIC's Gunyah protected VM model, we want to support
grabbing shmem user pages instead of using KVM's guest_memfd. These
hypervisors provide a different isolation model than the CoCo
implementations from x86. KVM's guest_memfd is focused on providing
memory that is more isolated than AVF requires. Some specific examples
include the ability to pre-load data onto guest-private pages, dynamically
sharing/isolating guest pages without copy, and (future) migrating
guest-private pages. Given those differences, and after a discussion in
[1] and at PUCK, we want to try to stick with existing shmem and extend
GUP to support the isolation needs for arm64 pKVM and Gunyah. To that
end, we introduce the concept of "exclusive GUP pinning", which enforces
that only one pin of any kind is allowed when the FOLL_EXCLUSIVE
flag is set. This behavior doesn't affect FOLL_GET or any other folio
refcount operations that don't go through the FOLL_PIN path.
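The rule above can be modelled with a toy C structure. This is only a sketch
of the stated semantics, not the mm/gup.c implementation: struct toy_page
and the toy_*() helpers are made-up names, and the re-pinning of a normally
pinned page as exclusive is deliberately left out.

```c
#include <stdbool.h>

/* Toy page state; the field names are hypothetical, not the kernel's. */
struct toy_page {
	int refs;	/* plain references (FOLL_GET-style) */
	int pins;	/* pins (FOLL_PIN-style) */
	bool exclusive;	/* the single existing pin is exclusive */
};

/* Plain reference: never affected by exclusivity. */
void toy_get(struct toy_page *p)
{
	p->refs++;
}

/*
 * Pin the page. Any request fails while an exclusive pin is held, and an
 * exclusive request fails if the page already has a pin of any kind.
 */
bool toy_pin(struct toy_page *p, bool exclusive)
{
	if (p->exclusive)
		return false;
	if (exclusive && p->pins > 0)
		return false;
	p->pins++;
	p->exclusive = exclusive;
	return true;
}

/* Drop one pin; dropping the last pin clears exclusivity. */
void toy_unpin(struct toy_page *p)
{
	if (p->pins > 0 && --p->pins == 0)
		p->exclusive = false;
}
```

In this model toy_get() keeps working while an exclusive pin is held, which
mirrors the statement that FOLL_GET is unaffected.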
[1]: https://lore.kernel.org/all/20240319143119.GA2736@willie-the-truck/
Tree with patches at:
https://git.codelinaro.org/clo/linux-kernel/gunyah-linux/-/tree/sent/exclus…
anup(a)brainfault.org, paul.walmsley(a)sifive.com,
palmer(a)dabbelt.com, aou(a)eecs.berkeley.edu, seanjc(a)google.com,
viro(a)zeniv.linux.org.uk, brauner(a)kernel.org,
willy(a)infradead.org, akpm(a)linux-foundation.org,
xiaoyao.li(a)intel.com, yilun.xu(a)intel.com,
chao.p.peng(a)linux.intel.com, jarkko(a)kernel.org,
amoorthy(a)google.com, dmatlack(a)google.com,
yu.c.zhang(a)linux.intel.com, isaku.yamahata(a)intel.com,
mic(a)digikod.net, vbabka(a)suse.cz, vannapurve(a)google.com,
ackerleytng(a)google.com, mail(a)maciej.szmigiero.name,
david(a)redhat.com, michael.roth(a)amd.com, wei.w.wang(a)intel.com,
liam.merwick(a)oracle.com, isaku.yamahata(a)gmail.com,
kirill.shutemov(a)linux.intel.com, suzuki.poulose(a)arm.com,
steven.price(a)arm.com, quic_eberman(a)quicinc.com,
quic_mnalajal(a)quicinc.com, quic_tsoni(a)quicinc.com,
quic_svaddagi(a)quicinc.com, quic_cvanscha(a)quicinc.com,
quic_pderrin(a)quicinc.com, quic_pheragu(a)quicinc.com,
catalin.marinas(a)arm.com, james.morse(a)arm.com,
yuzenghui(a)huawei.com, oliver.upton(a)linux.dev, maz(a)kernel.org,
will(a)kernel.org, qperret(a)google.com, keirf(a)google.com,
tabba(a)google.com
Signed-off-by: Elliot Berman <quic_eberman(a)quicinc.com>
---
Elliot Berman (2):
mm/gup-test: Verify exclusive pinned
mm/gup_test: Verify GUP grabs same pages twice
Fuad Tabba (3):
mm/gup: Move GUP_PIN_COUNTING_BIAS to page_ref.h
mm/gup: Add an option for obtaining an exclusive pin
mm/gup: Add support for re-pinning a normal pinned page as exclusive
include/linux/mm.h | 57 ++++----
include/linux/mm_types.h | 2 +
include/linux/page_ref.h | 74 ++++++++++
mm/Kconfig | 5 +
mm/gup.c | 265 ++++++++++++++++++++++++++++++----
mm/gup_test.c | 108 ++++++++++++++
mm/gup_test.h | 1 +
tools/testing/selftests/mm/gup_test.c | 5 +-
8 files changed, 457 insertions(+), 60 deletions(-)
---
base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f
change-id: 20240509-exclusive-gup-66259138bbff
Best regards,
--
Elliot Berman <quic_eberman(a)quicinc.com>