From: zhouyuhang <zhouyuhang(a)kylinos.cn>
Test case idmap_mount_tree_invalid failed to run on the newer kernel
with the following output:
# RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
# mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) ! = 0 (0)
# idmap_mount_tree_invalid: Test terminated by assertion
This is because tmpfs is mounted at "/mnt/A", and tmpfs already
contains the flag FS_ALLOW_IDMAP after the commit 7a80e5b8c6fa ("shmem:
support idmapped mounts for tmpfs"). So calling sys_mount_setattr here
returns 0 instead of -EINVAL as expected.
Ramfs does not support idmap mounts, so we can use it here to test invalid mounts,
which allows the test case to pass with the following output:
# Starting 1 tests from 1 test cases.
# RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
# OK mount_setattr_idmapped.idmap_mount_tree_invalid
ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid
# PASSED: 1 / 1 tests passed.
Signed-off-by: zhouyuhang <zhouyuhang(a)kylinos.cn>
---
.../testing/selftests/mount_setattr/mount_setattr_test.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index c6a8c732b802..68801e1a9ec2 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -1414,6 +1414,13 @@ TEST_F(mount_setattr_idmapped, idmap_mount_tree_invalid)
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/b", 0, 0, 0), 0);
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/BB/b", 0, 0, 0), 0);
+ ASSERT_EQ(mount("testing", "/mnt/A", "ramfs", MS_NOATIME | MS_NODEV,
+ "size=100000,mode=700"), 0);
+
+ ASSERT_EQ(mkdir("/mnt/A/AA", 0777), 0);
+
+ ASSERT_EQ(mount("/tmp", "/mnt/A/AA", NULL, MS_BIND | MS_REC, NULL), 0);
+
open_tree_fd = sys_open_tree(-EBADF, "/mnt/A",
AT_RECURSIVE |
AT_EMPTY_PATH |
@@ -1433,6 +1440,8 @@ TEST_F(mount_setattr_idmapped, idmap_mount_tree_invalid)
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/BB/b", 0, 0, 0), 0);
ASSERT_EQ(expected_uid_gid(open_tree_fd, "B/b", 0, 0, 0), 0);
ASSERT_EQ(expected_uid_gid(open_tree_fd, "B/BB/b", 0, 0, 0), 0);
+
+ (void)umount2("/mnt/A", MNT_DETACH);
}
TEST_F(mount_setattr, mount_attr_nosymfollow)
--
2.27.0
Good day,
My name is Evan from Ukraine.I have tried to reach you but find it difficult due to internet scarcity here in this town.
I am contacting you because I want to come over to your country,I have some huge funds I plan to invest in your country because I want to relocate my Late Father business due to ongoing war in my country Ukraine and visit your country to set up new investment business you may advice profitable in your area.
I also have some millions of dollars belonging to my late father that i want to invest and 995kg of 24+carat 99.9% purity gold I want to ship to your country and establish gold jewelry manufacturing company.
I will offer you good monetary rewards for your help,due to how russian drone bombards this town frequently,internet network is disrupted because of attacks on network installations,please for fast discussion, you can add me on whatsapp and let's talk faster as i want to leave this war zone as soon as possible,this is my whatsapp number below.
+380-672575220
Email:evanbohdan@gmail.com
I wait for your earliest response.
Regards,
Evan
Whatsapp/Tel:+380-672575220
Email:evanbohdan@gmail.com
From: Jason Xing <kernelxing(a)tencent.com>
As we can see from the title, when I compiled the selftests/bpf, I
saw the error:
implicit declaration of function ‘gettid’ ; did you mean ‘getgid’? [-Werror=implicit-function-declaration]
skel->bss->tid = gettid();
^~~~~~
getgid
Adding a define to fix it (referring to
tools/perf/tests/shell/coresight/thread_loop/thread_loop.c file.
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
---
tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index f0a3a9c18e9e..a105759f3dcf 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -34,6 +34,8 @@
#include "bpf_iter_ksym.skel.h"
#include "bpf_iter_sockmap.skel.h"
+#define gettid() syscall(SYS_gettid)
+
static void test_btf_id_or_null(void)
{
struct bpf_iter_test_kern3 *skel;
--
2.37.3
Fixup small broken window panes in DAMON selftests and kunit tests.
First four patches clean up DAMON debugfs interface selftests output, by
fixing segmentation fault of a test program (patch 1), removing
unnecessary debugging messages (patch 2), and hiding error messages from
expected failures (patches 3 and 4).
Following two patches fix copy-paste mistakes in DAMON Kconfig help
message that copied from debugfs kunit test (patch 5) and a comment on
the debugfs kunit test code (patch 6).
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Andrew Paniakin (1):
selftests/damon/huge_count_read_write: provide sufficiently large
buffer for DEPRECATED file read
SeongJae Park (5):
selftests/damon/huge_count_read_write: remove unnecessary debugging
message
selftests/damon/_debugfs_common: hide expected error message from
test_write_result()
selftests/damon/debugfs_duplicate_context_creation: hide errors from
expected file write failures
mm/damon/Kconfig: update DBGFS_KUNIT prompt copy for SYSFS_KUNIT
mm/damon/tests/dbgfs-kunit: fix the header double inclusion guarding
ifdef comment
mm/damon/Kconfig | 2 +-
mm/damon/tests/dbgfs-kunit.h | 2 +-
tools/testing/selftests/damon/_debugfs_common.sh | 7 ++++++-
.../selftests/damon/debugfs_duplicate_context_creation.sh | 2 +-
tools/testing/selftests/damon/huge_count_read_write.c | 4 +---
5 files changed, 10 insertions(+), 7 deletions(-)
base-commit: 13583c750117b4e10cdaf5578dcc7723b305ce4e
--
2.39.5
Two small fixes related to the MPTCP packets scheduler:
- Patch 1: add missing rcu_read_(un)lock(). A fix for >= 6.6.
- Patch 2: remove unneeded lock when listing packets schedulers. A fix
for >= 6.10.
And some modifications in the MPTCP selftests:
- Patch 3: a small addition to the MPTCP selftests to cover more code.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (3):
mptcp: init: protect sched with rcu_read_lock
mptcp: remove unneeded lock when listing scheds
selftests: mptcp: list sysctl data
net/mptcp/protocol.c | 2 ++
net/mptcp/sched.c | 2 --
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 9 +++++++++
3 files changed, 11 insertions(+), 2 deletions(-)
---
base-commit: 3b05b9c36ddd01338e1352588f2ec1ea23f97d43
change-id: 20241021-net-mptcp-sched-lock-10dfc75d1e00
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
The current means by which these are implemented is via a PROT_NONE mmap()
mapping, which provides the required semantics however incur an overhead of
a VMA for each such region.
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
This series takes a different approach - an idea suggested by Vlasimil
Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
provenance becomes a little tricky to ascertain after this - please forgive
any omissions!) - rather than locating the guard pages at the VMA layer,
instead placing them in page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5 times
speed up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle android
system and unoptimised code.
We expect with optimisation and a loaded system with a larger number of
guard pages this could significantly increase, but in any case these
numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, instead we have a VMA spanning the entire range of
memory a user is permitted to access and including ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the mappings being torn down when the
mappings are ultimately unmapped and everything works simply as if the
memory were not faulted in, from the point of view of the containing VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
The mechanism is implemented via madvise() behaviour - MADV_GUARD_INSTALL
which installs page table-level guard page markers - and
MADV_GUARD_REMOVE - which clears them.
Guard markers can be installed across multiple VMAs and any existing
mappings will be cleared, that is zapped, before installing the guard page
markers in the page tables.
There is no concept of 'nested' guard markers, multiple attempts to install
guard markers in a range will, after the first attempt, have no effect.
Importantly, removing guard markers over a range that contains both guard
markers and ordinary backed memory has no effect on anything but the guard
markers (including leaving huge pages un-split), so a user can safely
remove guard markers over a range of memory leaving the rest intact.
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by MADV_DONTNEED and MADV_FREE do not
remove guard markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the markers if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests are provided which ensure behaviour is as
expected and additionally self-documents expected behaviour of guard
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
v3
* Cleaned up mm/pagewalk.c logic a bit to make things clearer, as suggested
by Vlastiml.
* Explicitly avoid splitting THP on PTE installation, as suggested by
Vlastimil. Note this has no impact on the guard pages logic, which has
page table entry handlers at PUD, PMD and PTE level.
* Added WARN_ON_ONCE() to mm/hugetlb.c path where we don't expect a guard
marker, as suggested by Vlastimil.
* Reverted change to is_poisoned_swp_entry() to exclude guard pages which
has the effect of MADV_FREE _not_ clearing guard pages. After discussion
with Vlastimil, it became apparent that the ability to 'cancel' the
freeing operation by writing to the mapping after having issued an
MADV_FREE would mean that we would risk unexpected behaviour should the
guard pages be removed, so we now do not remove markers here at all.
* Added comment to PTE_MARKER_GUARD to highlight that memory tagged with
the marker behaves as if it were a region mapped PROT_NONE, as
highlighted by David.
* Rename poison -> install, unpoison -> remove (i.e. MADV_GUARD_INSTALL /
MADV_GUARD_REMOVE over MADV_GUARD_POISON / MADV_GUARD_REMOVE) at the
request of David and John who both find the poison analogy
confusing/overloaded.
* After a lot of discussion, replace the looping behaviour should page
faults race with guard page installation with a modest reattempt followed
by returning -ERESTARTNOINTR to have the operation abort and re-enter,
relieving lock contention and avoiding the possibility of allowing a
malicious sandboxed process to impact the mmap lock or stall the overall
process more than necessary, as suggested by Jann and Vlastimil having
raised the issue.
* Adjusted the page table walker so a populated huge PUD or PMD is
correctly treated as being populated, necessitating a zap. In v2 we
incorrectly skipped over these, which would cause the logic to wrongly
proceed as if nothing were populated and the install succeeded.
Instead, explicitly check to see if a huge page - if so, do not split but
rather abort the operation and let zap take care of things.
* Updated the guard remove logic to not unnecessarily split huge pages
either.
* Added a debug check to assert that the number of installed PTEs matches
expectation, accounting for any existing guard pages.
* Adapted vector_madvise() used by the process_madvise() system call to
handle -ERESTARTNOINTR correctly.
v2
* The macros in kselftest_harness.h seem to be broken - __EXPECT() is
terminated by '} while (0); OPTIONAL_HANDLER(_assert)' meaning it is not
safe in single line if / else or for /which blocks, however working
around this results in checkpatch producing invalid warnings, as reported
by Shuah.
* Fixing these macros is out of scope for this series, so compromise and
instead rewrite test blocks so as to use multiple lines by separating out
a decl in most cases. This has the side effect of, for the most part,
making things more readable.
* Heavily document the use of the volatile keyword - we can't avoid
checkpatch complaining about this, so we explain it, as reported by
Shuah.
* Updated commit message to highlight that we skip tests we lack
permissions for, as reported by Shuah.
* Replaced a perror() with ksft_exit_fail_perror(), as reported by Shuah.
* Added user friendly messages to cases where tests are skipped due to lack
of permissions, as reported by Shuah.
* Update the tool header to include the new MADV_GUARD_POISON/UNPOISON
defines and directly include asm-generic/mman.h to get the
platform-neutral versions to ensure we import them.
* Finally fixed Vlastimil's email address in Suggested-by tags from suze to
suse, as reported by Vlastimil.
* Added linux-api to cc list, as reported by Vlastimil.
https://lore.kernel.org/all/cover.1729440856.git.lorenzo.stoakes@oracle.com/
v1
* Un-RFC'd as appears no major objections to approach but rather debate on
implementation.
* Fixed issue with arches which need mmu_context.h and
tlbfush.h. header imports in pagewalker logic to be able to use
update_mmu_cache() as reported by the kernel test bot.
* Added comments in page walker logic to clarify who can use
ops->install_pte and why as well as adding a check_ops_valid() helper
function, as suggested by Christoph.
* Pass false in full parameter in pte_clear_not_present_full() as suggested
by Jann.
* Stopped erroneously requiring a write lock for the poison operation as
suggested by Jann and Suren.
* Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
consistent with how this is used elsewhere in the kernel as suggested by
Jann.
* Avoid returning -EAGAIN if we are raced on page faults, just keep looping
and duck out if a fatal signal is pending or a conditional reschedule is
needed, as suggested by Jann.
* Avoid needlessly splitting huge PUDs and PMDs by specifying
ACTION_CONTINUE, as suggested by Jann.
https://lore.kernel.org/all/cover.1729196871.git.lorenzo.stoakes@oracle.com/
RFC
https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (5):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
tools: testing: update tools UAPI header for mman-common.h
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 24 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 4 +
mm/internal.h | 12 +
mm/madvise.c | 225 ++++
mm/memory.c | 18 +-
mm/mprotect.c | 6 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 227 +++-
tools/include/uapi/asm-generic/mman-common.h | 3 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1239 ++++++++++++++++++
19 files changed, 1720 insertions(+), 76 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.47.0
The following kselftest arm64 and FVP failed with Linux next-20241025 on
- Qemu-arm64
- FVP
running Linux next-20241025 kernel.
First seen on next-20241025
Good: next-20241024
BAD: next-20241025
kselftest-arm64, FVP
* arm64_check_buffer_fill
* arm64_check_mmap_options
* arm64_check_child_memory
Anyone have seen these failures ?
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Test log:
----------
# selftests: arm64: check_buffer_fill
# 1..20
# ok 1 Check buffer correctness by byte with sync err mode and mmap memory
# ok 2 Check buffer correctness by byte with async err mode and mmap memory
# ok 3 Check buffer correctness by byte with sync err mode and
mmap/mprotect memory
# ok 4 Check buffer correctness by byte with async err mode and
mmap/mprotect memory
# ok 5 Check buffer write underflow by byte with sync mode and mmap memory
# ok 6 Check buffer write underflow by byte with async mode and mmap memory
# ok 7 Check buffer write underflow by byte with tag check fault
ignore and mmap memory
# ok 8 Check buffer write underflow by byte with sync mode and mmap memory
# ok 9 Check buffer write underflow by byte with async mode and mmap memory
# ok 10 Check buffer write underflow by byte with tag check fault
ignore and mmap memory
# ok 11 Check buffer write overflow by byte with sync mode and mmap memory
# ok 12 Check buffer write overflow by byte with async mode and mmap memory
# ok 13 Check buffer write overflow by byte with tag fault ignore mode
and mmap memory
# ok 14 Check buffer write correctness by block with sync mode and mmap memory
# ok 15 Check buffer write correctness by block with async mode and mmap memory
# ok 16 Check buffer write correctness by block with tag fault ignore
and mmap memory
# # FAIL: mmap allocation
# # FAIL: memory allocation
# not ok 17 Check initial tags with private mapping, sync error mode
and mmap memory
# ok 18 Check initial tags with private mapping, sync error mode and
mmap/mprotect memory
# # FAIL: mmap allocation
# # FAIL: memory allocation
# not ok 19 Check initial tags with shared mapping, sync error mode
and mmap memory
# ok 20 Check initial tags with shared mapping, sync error mode and
mmap/mprotect memory
# # Totals: pass:18 fail:2 xfail:0 xpass:0 skip:0 error:0
not ok 21 selftests: arm64: check_buffer_fill # exit=1
# timeout set to 45
# selftests: arm64: check_child_memory
# 1..12
# ok 1 Check child anonymous memory with private mapping, precise mode
and mmap memory
# ok 2 Check child anonymous memory with shared mapping, precise mode
and mmap memory
# ok 3 Check child anonymous memory with private mapping, imprecise
mode and mmap memory
# ok 4 Check child anonymous memory with shared mapping, imprecise
mode and mmap memory
# ok 5 Check child anonymous memory with private mapping, precise mode
and mmap/mprotect memory
# ok 6 Check child anonymous memory with shared mapping, precise mode
and mmap/mprotect memory
# # FAIL: mmap allocation
# # FAIL: memory allocation
# not ok 7 Check child file memory with private mapping, precise mode
and mmap memory
# # FAIL: mmap allocation
# # FAIL: memory allocation
# not ok 8 Check child file memory with shared mapping, precise mode
and mmap memory
# ok 9 Check child file memory with private mapping, imprecise mode
and mmap memory
# ok 10 Check child file memory with shared mapping, imprecise mode
and mmap memory
# ok 11 Check child file memory with private mapping, precise mode and
mmap/mprotect memory
# ok 12 Check child file memory with shared mapping, precise mode and
mmap/mprotect memory
# # Totals: pass:10 fail:2 xfail:0 xpass:0 skip:0 error:0
not ok 22 selftests: arm64: check_child_memory # exit=1
boot Log links,
--------
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241028/te…
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241028/te…
Test results history:
----------
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241028/te…
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241028/te…
metadata:
----
git describe: next-20241028
git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
git sha: dec9255a128e19c5fcc3bdb18175d78094cc624d
kernel config:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2o3tMqzOtHXYQjlvfR5t…
build url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2o3tMqzOtHXYQjlvfR5t…
toolchain: gcc-13
Steps to reproduce:
---------
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2o3tON5kNi…
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2o3tON5kNi…
--
Linaro LKFT
https://lkft.linaro.org
As the part-2 of the VIOMMU infrastructure, this series introduces a VIRQ
object after repurposing the existing FAULT object, which provides a nice
notification pathway to the user space already. So, the first thing to do
is reworking the FAULT object.
Mimicing the HWPT structures, add a common EVENT structure to support its
derivatives: EVENT_IOPF (the prior FAULT object) and EVENT_VIRQ (new one).
IOMMUFD_CMD_VIRQ_ALLOC is introduced to allocate EVENT_VIRQ for a VIOMMU.
One VIOMMU can have multiple VIRQs in different types but can not support
multiple VIRQs with the same types.
Drivers might need the VIOMMU's vdev_id list or the exact vdev_id link of
the passthrough device's to forward IRQs/events via the VIOMMU framework.
Thus, extend the set/unset_vdev_id ioctls down to the driver using VIOMMU
ops. This allows drivers to take the control of a vdev_id's lifecycle.
The forwarding part is fairly simple but might need to replace a physical
device ID with a virtual device ID. So, there comes with some helpers for
drivers to use.
As usual, this series comes with the selftest coverage for this new VIRQ,
and with a real world use case in the ARM SMMUv3 driver.
This must be based on the VIOMMU Part-1 series. It's on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_virq-v1
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_virq-v1
Thanks!
Nicolin
Nicolin Chen (10):
iommufd: Rename IOMMUFD_OBJ_FAULT to IOMMUFD_OBJ_EVENT_IOPF
iommufd: Rename fault.c to event.c
iommufd: Add IOMMUFD_OBJ_EVENT_VIRQ and IOMMUFD_CMD_VIRQ_ALLOC
iommufd/viommu: Allow drivers to control vdev_id lifecycle
iommufd/viommu: Add iommufd_vdev_id_to_dev helper
iommufd/viommu: Add iommufd_viommu_report_irq helper
iommufd/selftest: Implement mock_viommu_set/unset_vdev_id
iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VIRQ for VIRQ coverage
iommufd/selftest: Add EVENT_VIRQ test coverage
iommu/arm-smmu-v3: Report virtual IRQ for device in user space
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 109 +++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 2 +
drivers/iommu/iommufd/Makefile | 2 +-
drivers/iommu/iommufd/device.c | 2 +
drivers/iommu/iommufd/event.c | 613 ++++++++++++++++++
drivers/iommu/iommufd/fault.c | 443 -------------
drivers/iommu/iommufd/hw_pagetable.c | 12 +-
drivers/iommu/iommufd/iommufd_private.h | 147 ++++-
drivers/iommu/iommufd/iommufd_test.h | 10 +
drivers/iommu/iommufd/main.c | 13 +-
drivers/iommu/iommufd/selftest.c | 66 ++
drivers/iommu/iommufd/viommu.c | 25 +-
drivers/iommu/iommufd/viommu_api.c | 54 ++
include/linux/iommufd.h | 28 +
include/uapi/linux/iommufd.h | 46 ++
tools/testing/selftests/iommu/iommufd.c | 11 +
tools/testing/selftests/iommu/iommufd_utils.h | 64 ++
17 files changed, 1130 insertions(+), 517 deletions(-)
create mode 100644 drivers/iommu/iommufd/event.c
delete mode 100644 drivers/iommu/iommufd/fault.c
--
2.43.0
This series introduces a new vIOMMU infrastructure and related ioctls.
IOMMUFD has been using the HWPT infrastructure for all cases, including a
nested IO page table support. Yet, there're limitations for an HWPT-based
structure to support some advanced HW-accelerated features, such as CMDQV
on NVIDIA Grace, and HW-accelerated vIOMMU on AMD. Even for a multi-IOMMU
environment, it is not straightforward for nested HWPTs to share the same
parent HWPT (stage-2 IO pagetable), with the HWPT infrastructure alone: a
parent HWPT typically hold one stage-2 IO pagetable and tag it with only
one ID in the cache entries. When sharing one large stage-2 IO pagetable
across physical IOMMU instances, that one ID may not always be available
across all the IOMMU instances. In other word, it's ideal for SW to have
a different container for the stage-2 IO pagetable so it can hold another
ID that's available.
For this "different container", add vIOMMU, an additional layer to hold
extra virtualization information:
_______________________________________________________________________
| iommufd (with vIOMMU) |
| |
| [5] |
| _____________ |
| | | |
| |----------------| vIOMMU | |
| | | | |
| | | | |
| | [1] | | [4] [2] |
| | ______ | | _____________ ________ |
| | | | | [3] | | | | | |
| | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
| | |______| |_____________| |_____________| |________| |
| | | | | | |
|______|________|______________|__________________|_______________|_____|
| | | | |
______v_____ | ______v_____ ______v_____ ___v__
| struct | | PFN | (paging) | | (nested) | |struct|
|iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device|
|____________| storage|____________| |____________| |______|
The vIOMMU object should be seen as a slice of a physical IOMMU instance
that is passed to or shared with a VM. That can be some HW/SW resources:
- Security namespace for guest owned ID, e.g. guest-controlled cache tags
- Access to a sharable nesting parent pagetable across physical IOMMUs
- Virtualization of various platforms IDs, e.g. RIDs and others
- Delivery of paravirtualized invalidation
- Direct assigned invalidation queues
- Direct assigned interrupts
- Non-affiliated event reporting
On a multi-IOMMU system, the vIOMMU object must be instanced to the number
of the physical IOMMUs that are passed to (via devices) a guest VM, while
being able to hold the shareable parent HWPT. Each vIOMMU then just needs
to allocate its own individual ID to tag its own cache:
----------------------------
---------------- | | paging_hwpt0 |
| hwpt_nested0 |--->| viommu0 ------------------
---------------- | | IDx |
----------------------------
----------------------------
---------------- | | paging_hwpt0 |
| hwpt_nested1 |--->| viommu1 ------------------
---------------- | | IDy |
----------------------------
As an initial part-1, add IOMMUFD_CMD_VIOMMU_ALLOC ioctl for an allocation
only. And implement it in arm-smmu-v3 driver as a real world use case.
More vIOMMU-based structs and ioctls will be introduced in the follow-up
series to support vDEVICE, vIRQ (vEVENT) and vQUEUE objects. Although we
repurposed the vIOMMU object from an earlier RFC, just for a referece:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v4
(paring QEMU branch for testing will be provided with the part2 series)
Changelog
v4
* Added "Reviewed-by" from Jason
* Dropped IOMMU_VIOMMU_TYPE_DEFAULT support
* Dropped iommufd_object_alloc_elm renamings
* Renamed iommufd's viommu_api.c to driver.c
* Reworked iommufd_viommu_alloc helper
* Added a separate iommufd_hwpt_nested_alloc_for_viommu function for
hwpt_nested allocations on a vIOMMU, and added comparison between
viommu->iommu_dev->ops and dev_iommu_ops(idev->dev)
* Replaced s2_parent with vsmmu in arm_smmu_nested_domain
* Replaced domain_alloc_user in iommu_ops with domain_alloc_nested in
viommu_ops
* Replaced wait_queue_head_t with a completion, to delay the unplug of
mock_iommu_dev
* Corrected documentation graph that was missing struct iommu_device
* Added an iommufd_verify_unfinalized_object helper to verify driver-
allocated vIOMMU/vDEVICE objects
* Added missing test cases for TEST_LENGTH and fail_nth
v3
https://lore.kernel.org/all/cover.1728491453.git.nicolinc@nvidia.com/
* Rebased on top of Jason's nesting v3 series
https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@nvid…
* Split the series into smaller parts
* Added Jason's Reviewed-by
* Added back viommu->iommu_dev
* Added support for driver-allocated vIOMMU v.s. core-allocated
* Dropped arm_smmu_cache_invalidate_user
* Added an iommufd_test_wait_for_users() in selftest
* Reworked test code to make viommu an individual FIXTURE
* Added missing TEST_LENGTH case for the new ioctl command
v2
https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
* Limited vdev_id to one per idev
* Added a rw_sem to protect the vdev_id list
* Reworked driver-level APIs with proper lockings
* Added a new viommu_api file for IOMMUFD_DRIVER config
* Dropped useless iommu_dev point from the viommu structure
* Added missing index numnbers to new types in the uAPI header
* Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
* Reworked mock_viommu_cache_invalidate() using the new iommu helper
* Reordered details of set/unset_vdev_id handlers for proper lockings
v1
https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Nicolin Chen (11):
iommufd: Move struct iommufd_object to public iommufd header
iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
iommufd: Add iommufd_verify_unfinalized_object
iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
iommufd: Add domain_alloc_nested op to iommufd_viommu_ops
iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
iommufd/selftest: Add refcount to mock_iommu_device
iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
Documentation: userspace-api: iommufd: Update vIOMMU
iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
drivers/iommu/iommufd/Makefile | 5 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++---
drivers/iommu/iommufd/iommufd_private.h | 36 ++------
drivers/iommu/iommufd/iommufd_test.h | 2 +
include/linux/iommu.h | 14 +++
include/linux/iommufd.h | 89 +++++++++++++++++++
include/uapi/linux/iommufd.h | 56 ++++++++++--
tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 79 ++++++++++------
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +-
drivers/iommu/iommufd/driver.c | 38 ++++++++
drivers/iommu/iommufd/hw_pagetable.c | 69 +++++++++++++-
drivers/iommu/iommufd/main.c | 58 ++++++------
drivers/iommu/iommufd/selftest.c | 73 +++++++++++++--
drivers/iommu/iommufd/viommu.c | 85 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd.c | 78 ++++++++++++++++
.../selftests/iommu/iommufd_fail_nth.c | 11 +++
Documentation/userspace-api/iommufd.rst | 69 +++++++++++++-
18 files changed, 701 insertions(+), 124 deletions(-)
create mode 100644 drivers/iommu/iommufd/driver.c
create mode 100644 drivers/iommu/iommufd/viommu.c
--
2.43.0
From: zhouyuhang <zhouyuhang(a)kylinos.cn>
Test case idmap_mount_tree_invalid failed to run on the newer kernel
with the following output:
# RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
# mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) ! = 0 (0)
# idmap_mount_tree_invalid: Test terminated by assertion
This is because tmpfs is mounted at "/mnt/A", and tmpfs already
contains the flag FS_ALLOW_IDMAP after the commit 7a80e5b8c6fa ("shmem:
support idmapped mounts for tmpfs"). So calling sys_mount_setattr here
returns 0 instead of -EINVAL as expected.
Ramfs is mounted at "/mnt/B" and does not support idmap mounts.
So we can use "/mnt/B" instead of "/mnt/A" to make the test run
successfully with the following output:
# Starting 1 tests from 1 test cases.
# RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
# OK mount_setattr_idmapped.idmap_mount_tree_invalid
ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid
# PASSED: 1 / 1 tests passed.
Signed-off-by: zhouyuhang <zhouyuhang(a)kylinos.cn>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index c6a8c732b802..54552c19bc24 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -1414,7 +1414,7 @@ TEST_F(mount_setattr_idmapped, idmap_mount_tree_invalid)
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/b", 0, 0, 0), 0);
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/BB/b", 0, 0, 0), 0);
- open_tree_fd = sys_open_tree(-EBADF, "/mnt/A",
+ open_tree_fd = sys_open_tree(-EBADF, "/mnt/B",
AT_RECURSIVE |
AT_EMPTY_PATH |
AT_NO_AUTOMOUNT |
--
2.27.0
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process thread.
This series introduces sentinels for this purposes which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in userland, and a userland process is a thread group (more
specifically the thread group leader from the kernel perspective). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.
We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.
v5:
* Fixup self test dependencies on pidfd/pidfd.h.
v4:
* Avoid returning an fd in the __pidfd_get_pid() function as pointed out by
Christian, instead simply always pin the pid and maintain fd scope in the
helper alone.
* Add wrapper header file in tools/include/linux to allow for import of
UAPI pidfd.h header without encountering the collision between system
fcntl.h and linux/fcntl.h as discussed with Shuah and John.
* Fixup tests to import the UAPI pidfd.h header working around conflicts
between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports,
as reported by Shuah.
* Use an int for pidfd_is_self_sentinel() to avoid any dependency on
stdbool.h in userland.
https://lore.kernel.org/linux-mm/cover.1729198898.git.lorenzo.stoakes@oracl…
v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoakes@oracl…
v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl…
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…
RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…
Lorenzo Stoakes (5):
pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
tools: testing: separate out wait_for_pid() into helper header
selftests: pidfd: add pidfd.h UAPI wrapper
selftests: pidfd: add tests for PIDFD_SELF_*
include/linux/pid.h | 34 ++++-
include/uapi/linux/pidfd.h | 15 ++
kernel/exit.c | 3 +-
kernel/nsproxy.c | 1 +
kernel/pid.c | 65 +++++---
kernel/signal.c | 29 +---
tools/include/linux/pidfd.h | 14 ++
tools/testing/selftests/cgroup/test_kill.c | 2 +-
.../pid_namespace/regression_enomem.c | 2 +-
tools/testing/selftests/pidfd/Makefile | 3 +-
tools/testing/selftests/pidfd/pidfd.h | 28 +---
.../selftests/pidfd/pidfd_getfd_test.c | 141 ++++++++++++++++++
tools/testing/selftests/pidfd/pidfd_helpers.h | 39 +++++
.../selftests/pidfd/pidfd_setns_test.c | 11 ++
tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++--
15 files changed, 375 insertions(+), 88 deletions(-)
create mode 100644 tools/include/linux/pidfd.h
create mode 100644 tools/testing/selftests/pidfd/pidfd_helpers.h
--
2.47.0
Use a less populated IP range to run the tests, as suggested by Petr in
Link: https://lore.kernel.org/netdev/87ikvukv3s.fsf@nvidia.com/.
Suggested-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
tools/testing/selftests/drivers/net/netcons_basic.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh
index 06021b2059b7..4ad1e216c6b0 100755
--- a/tools/testing/selftests/drivers/net/netcons_basic.sh
+++ b/tools/testing/selftests/drivers/net/netcons_basic.sh
@@ -20,9 +20,9 @@ SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
# Simple script to test dynamic targets in netconsole
SRCIF="" # to be populated later
-SRCIP=192.168.1.1
+SRCIP=192.168.2.1
DSTIF="" # to be populated later
-DSTIP=192.168.1.2
+DSTIP=192.168.2.2
PORT="6666"
MSG="netconsole selftest"
--
2.43.5
Following the previous vIOMMU series, this adds another vDEVICE structure,
representing the association from an iommufd_device to an iommufd_viommu.
This gives the whole architecture a new "v" layer:
_______________________________________________________________________
| iommufd (with vIOMMU/vDEVICE) |
| _____________ _____________ |
| | | | | |
| |----------------| vIOMMU |<---| vDEVICE |<------| |
| | | | |_____________| | |
| | ______ | | _____________ ___|____ |
| | | | | | | | | | |
| | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
| | |______| |_____________| |_____________| |________| |
|______|________|______________|__________________|_______________|_____|
| | | | |
______v_____ | ______v_____ ______v_____ ___v__
| struct | | PFN | (paging) | | (nested) | |struct|
|iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device|
|____________| storage|____________| |____________| |______|
This vDEVICE object is used to collect and store all vIOMMU-related device
information/attributes in a VM. As an initial series for vDEVICE, add only
the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM:
e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vID of Intel VT-d to
a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID
of the device against the physical IOMMU instance. This is essential for a
vIOMMU-based invalidation, where the request contains a device's vID for a
device cache flush, e.g. ATC invalidation.
Therefore, with this vDEVICE object, support a vIOMMU-based invalidation,
by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush cache
with a given driver data.
As for the implementation of the series, add driver support in ARM SMMUv3
for a real world use case.
This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v4
For testing, try this "with-rmr" branch:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v4-with-rmr
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p2-v4
Changelog
v4
* Added missing brackets in switch-case
* Fixed the unreleased idev refcount issue
* Reworked the iommufd_vdevice_alloc allocator
* Dropped support for IOMMU_VIOMMU_TYPE_DEFAULT
* Added missing TEST_LENGTH and fail_nth coverages
* Added a verification to the driver-allocated vDEVICE object
* Added an iommufd_vdevice_abort for a missing mutex protection
* Added a u64 structure arm_vsmmu_invalidation_cmd for user command
conversion
v3
https://lore.kernel.org/all/cover.1728491532.git.nicolinc@nvidia.com/
* Added Jason's Reviewed-by
* Split this invalidation part out of the part-1 series
* Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl
* Reduced viommu_api functions by allowing drivers to access viommu
and vdevice structure directly
* Dropped vdevs_rwsem by using xa_lock instead
* Dropped arm_smmu_cache_invalidate_user
v2
https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
* Limited vdev_id to one per idev
* Added a rw_sem to protect the vdev_id list
* Reworked driver-level APIs with proper lockings
* Added a new viommu_api file for IOMMUFD_DRIVER config
* Dropped useless iommu_dev point from the viommu structure
* Added missing index numnbers to new types in the uAPI header
* Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
* Reworked mock_viommu_cache_invalidate() using the new iommu helper
* Reordered details of set/unset_vdev_id handlers for proper lockings
v1
https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Jason Gunthorpe (2):
iommu: Add iommu_copy_struct_from_full_user_array helper
iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
Nicolin Chen (12):
iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct
iommufd/viommu: Add IOMMU_VDEVICE_ALLOC ioctl
iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
iommu/viommu: Add cache_invalidate to iommufd_viommu_ops
iommufd/hw_pagetable: Enforce cache invalidation op on vIOMMU-based
hwpt_nested
iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
iommufd/viommu: Add vdev_to_dev helper
iommufd/selftest: Add mock_viommu_cache_invalidate
iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
Documentation: userspace-api: iommufd: Update vDEVICE
iommu/arm-smmu-v3: Add arm_vsmmu_cache_invalidate
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +-
drivers/iommu/iommufd/iommufd_private.h | 12 ++
drivers/iommu/iommufd/iommufd_test.h | 30 +++
include/linux/iommu.h | 49 ++++-
include/linux/iommufd.h | 50 +++++
include/uapi/linux/iommufd.h | 61 +++++-
tools/testing/selftests/iommu/iommufd_utils.h | 83 +++++++
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 162 +++++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 32 ++-
drivers/iommu/iommufd/device.c | 11 +
drivers/iommu/iommufd/driver.c | 7 +
drivers/iommu/iommufd/hw_pagetable.c | 36 +++-
drivers/iommu/iommufd/main.c | 7 +
drivers/iommu/iommufd/selftest.c | 115 +++++++++-
drivers/iommu/iommufd/viommu.c | 108 ++++++++++
tools/testing/selftests/iommu/iommufd.c | 204 +++++++++++++++++-
.../selftests/iommu/iommufd_fail_nth.c | 4 +
Documentation/userspace-api/iommufd.rst | 41 +++-
18 files changed, 983 insertions(+), 38 deletions(-)
--
2.43.0
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process thread.
This series introduces sentinels for this purposes which can be passed as
the pidfd in this instance rather than having to establish a dummy fd for
this purpose.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in userland, and a userland process is a thread group (more
specifically the thread group leader from the kernel perspective). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP alised by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.
We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.
v4:
* Avoid returning an fd in the __pidfd_get_pid() function as pointed out by
Christian, instead simply always pin the pid and maintain fd scope in the
helper alone.
* Add wrapper header file in tools/include/linux to allow for import of
UAPI pidfd.h header without encountering the collision between system
fcntl.h and linux/fcntl.h as discussed with Shuah and John.
* Fixup tests to import the UAPI pidfd.h header working around conflicts
between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports,
as reported by Shuah.
* Use an int for pidfd_is_self_sentinel() to avoid any dependency on
stdbool.h in userland.
v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoakes@oracl…
v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl…
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…
RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…
Lorenzo Stoakes (4):
pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
selftests: pidfd: add pidfd.h UAPI wrapper
selftests: pidfd: add tests for PIDFD_SELF_*
include/linux/pid.h | 34 ++++-
include/uapi/linux/pidfd.h | 15 ++
kernel/exit.c | 3 +-
kernel/nsproxy.c | 1 +
kernel/pid.c | 65 +++++---
kernel/signal.c | 29 +---
tools/include/linux/pidfd.h | 14 ++
tools/testing/selftests/pidfd/Makefile | 3 +-
tools/testing/selftests/pidfd/pidfd.h | 2 +
.../selftests/pidfd/pidfd_getfd_test.c | 141 ++++++++++++++++++
.../selftests/pidfd/pidfd_setns_test.c | 11 ++
tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++--
12 files changed, 333 insertions(+), 61 deletions(-)
create mode 100644 tools/include/linux/pidfd.h
--
2.46.2
RISC-V defines three extensions for pointer masking[1]:
- Smmpm: configured in M-mode, affects M-mode
- Smnpm: configured in M-mode, affects the next lower mode (S or U-mode)
- Ssnpm: configured in S-mode, affects the next lower mode (VS, VU, or U-mode)
This series adds support for configuring Smnpm or Ssnpm (depending on
which privilege mode the kernel is running in) to allow pointer masking
in userspace (VU or U-mode), extending the PR_SET_TAGGED_ADDR_CTRL API
from arm64. Unlike arm64 TBI, userspace pointer masking is not enabled
by default on RISC-V. Additionally, the tag width (referred to as PMLEN)
is variable, so userspace needs to ask the kernel for a specific tag
width, which is interpreted as a lower bound on the number of tag bits.
This series also adds support for a tagged address ABI similar to arm64
and x86. Since accesses from the kernel to user memory use the kernel's
pointer masking configuration, not the user's, the kernel must untag
user pointers in software before dereferencing them. And since the tag
width is variable, as with LAM on x86, it must be kept the same across
all threads in a process so untagged_addr_remote() can work.
[1]: https://github.com/riscv/riscv-j-extension/raw/d70011dde6c2/zjpm-spec.pdf
---
This series depends on the per-thread envcfg series in riscv/for-next.
This series can be tested in QEMU by applying a patch set[2].
KASAN_SW_TAGS using pointer masking is an independent patch series[3].
[2]: https://lore.kernel.org/qemu-devel/20240511101053.1875596-1-me@deliversmonk…
[3]: https://lore.kernel.org/linux-riscv/20240814085618.968833-1-samuel.holland@…
Changes in v5:
- Update pointer masking spec version to 1.0 and state to ratified
- Document how PR_[SG]ET_TAGGED_ADDR_CTRL are used on RISC-V
- Document that the RISC-V tagged address ABI is the same as AArch64
- Rename "pm" selftests directory to "abi" to be more generic
- Fix -Wparentheses warnings
- Fix order of operations when writing via the tagged pointer
- Update pointer masking spec version to 1.0 in hwprobe documentation
Changes in v4:
- Switch IS_ENABLED back to #ifdef to fix riscv32 build
- Combine __untagged_addr() and __untagged_addr_remote()
Changes in v3:
- Note in the commit message that the ISA extension spec is frozen
- Rebase on riscv/for-next (ISA extension list conflicts)
- Remove RISCV_ISA_EXT_SxPM, which was not used anywhere
- Use shifts instead of large numbers in ENVCFG_PMM* macro definitions
- Rename CONFIG_RISCV_ISA_POINTER_MASKING to CONFIG_RISCV_ISA_SUPM,
since it only controls the userspace part of pointer masking
- Use IS_ENABLED instead of #ifdef when possible
- Use an enum for the supported PMLEN values
- Simplify the logic in set_tagged_addr_ctrl()
- Use IS_ENABLED instead of #ifdef when possible
- Implement mm_untag_mask()
- Remove pmlen from struct thread_info (now only in mm_context_t)
Changes in v2:
- Drop patch 4 ("riscv: Define is_compat_thread()"), as an equivalent
patch was already applied
- Move patch 5 ("riscv: Split per-CPU and per-thread envcfg bits") to a
different series[3]
- Update pointer masking specification version reference
- Provide macros for the extension affecting the kernel and userspace
- Use the correct name for the hstatus.HUPMM field
- Rebase on riscv/linux.git for-next
- Add and use the envcfg_update_bits() helper function
- Inline flush_tagged_addr_state()
- Implement untagged_addr_remote()
- Restrict PMLEN changes once a process is multithreaded
- Rename "tags" directory to "pm" to avoid .gitignore rules
- Add .gitignore file to ignore the compiled selftest binary
- Write to a pipe to force dereferencing the user pointer
- Handle SIGSEGV in the child process to reduce dmesg noise
- Export Supm via hwprobe
- Export Smnpm and Ssnpm to KVM guests
Samuel Holland (10):
dt-bindings: riscv: Add pointer masking ISA extensions
riscv: Add ISA extension parsing for pointer masking
riscv: Add CSR definitions for pointer masking
riscv: Add support for userspace pointer masking
riscv: Add support for the tagged address ABI
riscv: Allow ptrace control of the tagged address ABI
riscv: selftests: Add a pointer masking test
riscv: hwprobe: Export the Supm ISA extension
RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests
KVM: riscv: selftests: Add Smnpm and Ssnpm to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 3 +
Documentation/arch/riscv/uabi.rst | 16 +
.../devicetree/bindings/riscv/extensions.yaml | 18 +
arch/riscv/Kconfig | 11 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/hwcap.h | 5 +
arch/riscv/include/asm/mmu.h | 7 +
arch/riscv/include/asm/mmu_context.h | 13 +
arch/riscv/include/asm/processor.h | 8 +
arch/riscv/include/asm/switch_to.h | 11 +
arch/riscv/include/asm/uaccess.h | 43 ++-
arch/riscv/include/uapi/asm/hwprobe.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 2 +
arch/riscv/kernel/cpufeature.c | 3 +
arch/riscv/kernel/process.c | 154 ++++++++
arch/riscv/kernel/ptrace.c | 42 +++
arch/riscv/kernel/sys_hwprobe.c | 3 +
arch/riscv/kvm/vcpu_onereg.c | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 5 +-
.../selftests/kvm/riscv/get-reg-list.c | 8 +
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/abi/.gitignore | 1 +
tools/testing/selftests/riscv/abi/Makefile | 10 +
.../selftests/riscv/abi/pointer_masking.c | 332 ++++++++++++++++++
25 files changed, 712 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/riscv/abi/.gitignore
create mode 100644 tools/testing/selftests/riscv/abi/Makefile
create mode 100644 tools/testing/selftests/riscv/abi/pointer_masking.c
--
2.45.1
This series was motivated by the regression fixed by 166bf8af9122
("pinctrl: mediatek: common-v2: Fix broken bias-disable for
PULL_PU_PD_RSEL_TYPE"). A bug was introduced in the pinctrl_paris driver
which prevented certain pins from having their bias configured.
Running this test on the mt8195-tomato platform with the test plan
included below[1] shows the test passing with the fix applied, but failing
without the fix:
With fix:
$ ./gpio-setget-config.py
TAP version 13
# Using test plan file: ./google,tomato.yaml
1..3
ok 1 pinctrl_paris.34.pull-up
ok 2 pinctrl_paris.34.pull-down
ok 3 pinctrl_paris.34.disabled
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Without fix:
$ ./gpio-setget-config.py
TAP version 13
# Using test plan file: ./google,tomato.yaml
1..3
# Bias doesn't match: Expected pull-up, read pull-down.
not ok 1 pinctrl_paris.34.pull-up
ok 2 pinctrl_paris.34.pull-down
# Bias doesn't match: Expected disabled, read pull-down.
not ok 3 pinctrl_paris.34.disabled
# Totals: pass:1 fail:2 xfail:0 xpass:0 skip:0 error:0
In order to achieve this, the first patch exposes bias configuration
through the GPIO API in the pinctrl_paris driver, patch 2 extends the
gpio-mockup-cdev utility for use by patch 3, and patch 3 introduces a
new GPIO kselftest that takes a test plan in YAML, which can be tailored
per-platform to specify the configurations to test, and sets and gets
back each pin configuration to verify that they match and thus that the
driver is behaving as expected.
Since the GPIO uAPI only allows setting the pin configuration, getting
it back is done through pinconf-pins in the pinctrl debugfs folder.
The test currently only verifies bias but it would be easy to extend to
verify other pin configurations.
The test plan YAML file can be customized for each use-case and is
platform-dependant. For that reason, only an example is included in
patch 3 and the user is supposed to provide their test plan. That said,
the aim is to collect test plans for ease of use at [2].
[1] This is the test plan used for mt8195-tomato:
- label: "pinctrl_paris"
tests:
# Pin 34 has type MTK_PULL_PU_PD_RSEL_TYPE and is unused.
# Setting bias to MTK_PULL_PU_PD_RSEL_TYPE pins was fixed by
# 166bf8af9122 ("pinctrl: mediatek: common-v2: Fix broken bias-disable for PULL_PU_PD_RSEL_TYPE")
- pin: 34
bias: "pull-up"
- pin: 34
bias: "pull-down"
- pin: 34
bias: "disabled"
[2] https://github.com/kernelci/platform-test-parameters
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Nícolas F. R. A. Prado (3):
pinctrl: mediatek: paris: Expose more configurations to GPIO set_config
selftest: gpio: Add wait flag to gpio-mockup-cdev
selftest: gpio: Add a new set-get config test
drivers/pinctrl/mediatek/pinctrl-paris.c | 20 +--
tools/testing/selftests/gpio/Makefile | 2 +-
tools/testing/selftests/gpio/gpio-mockup-cdev.c | 14 +-
.../gpio-set-get-config-example-test-plan.yaml | 15 ++
.../testing/selftests/gpio/gpio-set-get-config.py | 183 +++++++++++++++++++++
5 files changed, 220 insertions(+), 14 deletions(-)
---
base-commit: 6a7917c89f219f09b1d88d09f376000914a52763
change-id: 20240906-kselftest-gpio-set-get-config-6e5bb670c1a5
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
For logging to be useful, something has to set RET and retmsg by calling
ret_set_ksft_status(). There is a suite of functions to that end in
forwarding/lib: check_err, check_fail et.al. Move them to net/lib.sh so
that every net test can use them.
Existing lib.sh users might be using these same names for their functions.
However lib.sh is always sourced near the top of the file (checked), and
whatever new definitions will simply override the ones provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 73 -------------------
tools/testing/selftests/net/lib.sh | 73 +++++++++++++++++++
2 files changed, 73 insertions(+), 73 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index d28dbf27c1f0..8625e3c99f55 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -445,79 +445,6 @@ done
##############################################################################
# Helpers
-# Whether FAILs should be interpreted as XFAILs. Internal.
-FAIL_TO_XFAIL=
-
-check_err()
-{
- local err=$1
- local msg=$2
-
- if ((err)); then
- if [[ $FAIL_TO_XFAIL = yes ]]; then
- ret_set_ksft_status $ksft_xfail "$msg"
- else
- ret_set_ksft_status $ksft_fail "$msg"
- fi
- fi
-}
-
-check_fail()
-{
- local err=$1
- local msg=$2
-
- check_err $((!err)) "$msg"
-}
-
-check_err_fail()
-{
- local should_fail=$1; shift
- local err=$1; shift
- local what=$1; shift
-
- if ((should_fail)); then
- check_fail $err "$what succeeded, but should have failed"
- else
- check_err $err "$what failed"
- fi
-}
-
-xfail()
-{
- FAIL_TO_XFAIL=yes "$@"
-}
-
-xfail_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW = yes ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
-omit_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW != yes ]]; then
- "$@"
- fi
-}
-
-xfail_on_veth()
-{
- local dev=$1; shift
- local kind
-
- kind=$(ip -j -d link show dev $dev |
- jq -r '.[].linkinfo.info_kind')
- if [[ $kind = veth ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 4f52b8e48a3a..6bcf5d13879d 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -361,3 +361,76 @@ tests_run()
$current_test
done
}
+
+# Whether FAILs should be interpreted as XFAILs. Internal.
+FAIL_TO_XFAIL=
+
+check_err()
+{
+ local err=$1
+ local msg=$2
+
+ if ((err)); then
+ if [[ $FAIL_TO_XFAIL = yes ]]; then
+ ret_set_ksft_status $ksft_xfail "$msg"
+ else
+ ret_set_ksft_status $ksft_fail "$msg"
+ fi
+ fi
+}
+
+check_fail()
+{
+ local err=$1
+ local msg=$2
+
+ check_err $((!err)) "$msg"
+}
+
+check_err_fail()
+{
+ local should_fail=$1; shift
+ local err=$1; shift
+ local what=$1; shift
+
+ if ((should_fail)); then
+ check_fail $err "$what succeeded, but should have failed"
+ else
+ check_err $err "$what failed"
+ fi
+}
+
+xfail()
+{
+ FAIL_TO_XFAIL=yes "$@"
+}
+
+xfail_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW = yes ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
+
+omit_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW != yes ]]; then
+ "$@"
+ fi
+}
+
+xfail_on_veth()
+{
+ local dev=$1; shift
+ local kind
+
+ kind=$(ip -j -d link show dev $dev |
+ jq -r '.[].linkinfo.info_kind')
+ if [[ $kind = veth ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
--
2.45.0
It would be good to use the same mechanism for scheduling and dispatching
general net tests as the many forwarding tests already use. To that end,
move the logging helpers to net/lib.sh so that every net test can use them.
Existing lib.sh users might be using the name themselves. However lib.sh is
always sourced near the top of the file (checked), and whatever new
definition will simply override the one provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 10 ----------
tools/testing/selftests/net/lib.sh | 10 ++++++++++
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 41dd14c42c48..d28dbf27c1f0 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1285,16 +1285,6 @@ matchall_sink_create()
action drop
}
-tests_run()
-{
- local current_test
-
- for current_test in ${TESTS:-$ALL_TESTS}; do
- in_defer_scope \
- $current_test
- done
-}
-
cleanup()
{
pre_cleanup
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 691318b1ec55..4f52b8e48a3a 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -351,3 +351,13 @@ log_info()
echo "INFO: $msg"
}
+
+tests_run()
+{
+ local current_test
+
+ for current_test in ${TESTS:-$ALL_TESTS}; do
+ in_defer_scope \
+ $current_test
+ done
+}
--
2.45.0
Many net selftests invent their own logging helpers. These really should be
in a library sourced by these tests. Currently forwarding/lib.sh has a
suite of perfectly fine logging helpers, but sourcing a forwarding/ library
from a higher-level directory smells of layering violation. In this patch,
move the logging helpers to net/lib.sh so that every net test can use them.
Together with the logging helpers, it's also necessary to move
pause_on_fail(), and EXIT_STATUS and RET.
Existing lib.sh users might be using these same names for their functions
or variables. However lib.sh is always sourced near the top of the
file (checked), and whatever new definitions will simply override the ones
provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 113 -----------------
tools/testing/selftests/net/lib.sh | 115 ++++++++++++++++++
2 files changed, 115 insertions(+), 113 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 89c25f72b10c..41dd14c42c48 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -48,7 +48,6 @@ declare -A NETIFS=(
: "${WAIT_TIME:=5}"
# Whether to pause on, respectively, after a failure and before cleanup.
-: "${PAUSE_ON_FAIL:=no}"
: "${PAUSE_ON_CLEANUP:=no}"
# Whether to create virtual interfaces, and what netdevice type they should be.
@@ -446,22 +445,6 @@ done
##############################################################################
# Helpers
-# Exit status to return at the end. Set in case one of the tests fails.
-EXIT_STATUS=0
-# Per-test return value. Clear at the beginning of each test.
-RET=0
-
-ret_set_ksft_status()
-{
- local ksft_status=$1; shift
- local msg=$1; shift
-
- RET=$(ksft_status_merge $RET $ksft_status)
- if (( $? )); then
- retmsg=$msg
- fi
-}
-
# Whether FAILs should be interpreted as XFAILs. Internal.
FAIL_TO_XFAIL=
@@ -535,102 +518,6 @@ xfail_on_veth()
fi
}
-log_test_result()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
- local result=$1; shift
- local retmsg=$1; shift
-
- printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
- if [[ $retmsg ]]; then
- printf "\t%s\n" "$retmsg"
- fi
-}
-
-pause_on_fail()
-{
- if [[ $PAUSE_ON_FAIL == yes ]]; then
- echo "Hit enter to continue, 'q' to quit"
- read a
- [[ $a == q ]] && exit 1
- fi
-}
-
-handle_test_result_pass()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" " OK "
-}
-
-handle_test_result_fail()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" FAIL "$retmsg"
- pause_on_fail
-}
-
-handle_test_result_xfail()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" XFAIL "$retmsg"
- pause_on_fail
-}
-
-handle_test_result_skip()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" SKIP "$retmsg"
-}
-
-log_test()
-{
- local test_name=$1
- local opt_str=$2
-
- if [[ $# -eq 2 ]]; then
- opt_str="($opt_str)"
- fi
-
- if ((RET == ksft_pass)); then
- handle_test_result_pass "$test_name" "$opt_str"
- elif ((RET == ksft_xfail)); then
- handle_test_result_xfail "$test_name" "$opt_str"
- elif ((RET == ksft_skip)); then
- handle_test_result_skip "$test_name" "$opt_str"
- else
- handle_test_result_fail "$test_name" "$opt_str"
- fi
-
- EXIT_STATUS=$(ksft_exit_status_merge $EXIT_STATUS $RET)
- return $RET
-}
-
-log_test_skip()
-{
- RET=$ksft_skip retmsg= log_test "$@"
-}
-
-log_test_xfail()
-{
- RET=$ksft_xfail retmsg= log_test "$@"
-}
-
-log_info()
-{
- local msg=$1
-
- echo "INFO: $msg"
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index c8991cc6bf28..691318b1ec55 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -9,6 +9,9 @@ source "$net_dir/lib/sh/defer.sh"
: "${WAIT_TIMEOUT:=20}"
+# Whether to pause on after a failure.
+: "${PAUSE_ON_FAIL:=no}"
+
BUSYWAIT_TIMEOUT=$((WAIT_TIMEOUT * 1000)) # ms
# Kselftest framework constants.
@@ -20,6 +23,11 @@ ksft_skip=4
# namespace list created by setup_ns
NS_LIST=()
+# Exit status to return at the end. Set in case one of the tests fails.
+EXIT_STATUS=0
+# Per-test return value. Clear at the beginning of each test.
+RET=0
+
##############################################################################
# Helpers
@@ -236,3 +244,110 @@ tc_rule_handle_stats_get()
| jq ".[] | select(.options.handle == $handle) | \
.options.actions[0].stats$selector"
}
+
+ret_set_ksft_status()
+{
+ local ksft_status=$1; shift
+ local msg=$1; shift
+
+ RET=$(ksft_status_merge $RET $ksft_status)
+ if (( $? )); then
+ retmsg=$msg
+ fi
+}
+
+log_test_result()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+ local result=$1; shift
+ local retmsg=$1; shift
+
+ printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
+ if [[ $retmsg ]]; then
+ printf "\t%s\n" "$retmsg"
+ fi
+}
+
+pause_on_fail()
+{
+ if [[ $PAUSE_ON_FAIL == yes ]]; then
+ echo "Hit enter to continue, 'q' to quit"
+ read a
+ [[ $a == q ]] && exit 1
+ fi
+}
+
+handle_test_result_pass()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" " OK "
+}
+
+handle_test_result_fail()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" FAIL "$retmsg"
+ pause_on_fail
+}
+
+handle_test_result_xfail()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" XFAIL "$retmsg"
+ pause_on_fail
+}
+
+handle_test_result_skip()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" SKIP "$retmsg"
+}
+
+log_test()
+{
+ local test_name=$1
+ local opt_str=$2
+
+ if [[ $# -eq 2 ]]; then
+ opt_str="($opt_str)"
+ fi
+
+ if ((RET == ksft_pass)); then
+ handle_test_result_pass "$test_name" "$opt_str"
+ elif ((RET == ksft_xfail)); then
+ handle_test_result_xfail "$test_name" "$opt_str"
+ elif ((RET == ksft_skip)); then
+ handle_test_result_skip "$test_name" "$opt_str"
+ else
+ handle_test_result_fail "$test_name" "$opt_str"
+ fi
+
+ EXIT_STATUS=$(ksft_exit_status_merge $EXIT_STATUS $RET)
+ return $RET
+}
+
+log_test_skip()
+{
+ RET=$ksft_skip retmsg= log_test "$@"
+}
+
+log_test_xfail()
+{
+ RET=$ksft_xfail retmsg= log_test "$@"
+}
+
+log_info()
+{
+ local msg=$1
+
+ echo "INFO: $msg"
+}
--
2.45.0
Some applications rely on placing data in free bits addresses allocated
by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
address returned by mmap to be less than the 48-bit address space,
unless the hint address uses more than 47 bits (the 48th bit is reserved
for the kernel address space).
The riscv architecture needs a way to similarly restrict the virtual
address space. On the riscv port of OpenJDK an error is thrown if
attempted to run on the 57-bit address space, called sv57 [1]. golang
has a comment that sv57 support is not complete, but there are some
workarounds to get it to mostly work [2].
These applications work on x86 because x86 does an implicit 47-bit
restriction of mmap() address that contain a hint address that is less
than 48 bits.
Instead of implicitly restricting the address space on riscv (or any
current/future architecture), a flag would allow users to opt-in to this
behavior rather than opt-out as is done on other architectures. This is
desirable because it is a small class of applications that do pointer
masking.
This flag will also allow seemless compatibility between all
architectures, so applications like Go and OpenJDK that use bits in a
virtual address can request the exact number of bits they need in a
generic way. The flag can be checked inside of vm_unmapped_area() so
that this flag does not have to be handled individually by each
architecture.
Link:
https://github.com/openjdk/jdk/blob/f080b4bb8a75284db1b6037f8c00ef3b1ef1add…
[1]
Link:
https://github.com/golang/go/blob/9e8ea567c838574a0f14538c0bbbd83c3215aa55/…
[2]
To: Arnd Bergmann <arnd(a)arndb.de>
To: Richard Henderson <richard.henderson(a)linaro.org>
To: Ivan Kokshaysky <ink(a)jurassic.park.msu.ru>
To: Matt Turner <mattst88(a)gmail.com>
To: Vineet Gupta <vgupta(a)kernel.org>
To: Russell King <linux(a)armlinux.org.uk>
To: Guo Ren <guoren(a)kernel.org>
To: Huacai Chen <chenhuacai(a)kernel.org>
To: WANG Xuerui <kernel(a)xen0n.name>
To: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
To: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
To: Helge Deller <deller(a)gmx.de>
To: Michael Ellerman <mpe(a)ellerman.id.au>
To: Nicholas Piggin <npiggin(a)gmail.com>
To: Christophe Leroy <christophe.leroy(a)csgroup.eu>
To: Naveen N Rao <naveen(a)kernel.org>
To: Alexander Gordeev <agordeev(a)linux.ibm.com>
To: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
To: Heiko Carstens <hca(a)linux.ibm.com>
To: Vasily Gorbik <gor(a)linux.ibm.com>
To: Christian Borntraeger <borntraeger(a)linux.ibm.com>
To: Sven Schnelle <svens(a)linux.ibm.com>
To: Yoshinori Sato <ysato(a)users.sourceforge.jp>
To: Rich Felker <dalias(a)libc.org>
To: John Paul Adrian Glaubitz <glaubitz(a)physik.fu-berlin.de>
To: David S. Miller <davem(a)davemloft.net>
To: Andreas Larsson <andreas(a)gaisler.com>
To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andy Lutomirski <luto(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Muchun Song <muchun.song(a)linux.dev>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: linux-arch(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-alpha(a)vger.kernel.org
Cc: linux-snps-arc(a)lists.infradead.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-csky(a)vger.kernel.org
Cc: loongarch(a)lists.linux.dev
Cc: linux-mips(a)vger.kernel.org
Cc: linux-parisc(a)vger.kernel.org
Cc: linuxppc-dev(a)lists.ozlabs.org
Cc: linux-s390(a)vger.kernel.org
Cc: linux-sh(a)vger.kernel.org
Cc: sparclinux(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
Changes in v2:
- Added much greater detail to cover letter
- Removed all code that touched architecture specific code and was able
to factor this out into all generic functions, except for flags that
needed to be added to vm_unmapped_area_info
- Made this an RFC since I have only tested it on riscv and x86
- Link to v1: https://lore.kernel.org/r/20240827-patches-below_hint_mmap-v1-0-46ff2eb9022…
---
Charlie Jenkins (4):
mm: Add MAP_BELOW_HINT
mm: Add hint and mmap_flags to struct vm_unmapped_area_info
mm: Support MAP_BELOW_HINT in vm_unmapped_area()
selftests/mm: Create MAP_BELOW_HINT test
arch/alpha/kernel/osf_sys.c | 2 ++
arch/arc/mm/mmap.c | 3 +++
arch/arm/mm/mmap.c | 7 ++++++
arch/csky/abiv1/mmap.c | 3 +++
arch/loongarch/mm/mmap.c | 3 +++
arch/mips/mm/mmap.c | 3 +++
arch/parisc/kernel/sys_parisc.c | 3 +++
arch/powerpc/mm/book3s64/slice.c | 7 ++++++
arch/s390/mm/hugetlbpage.c | 4 ++++
arch/s390/mm/mmap.c | 6 ++++++
arch/sh/mm/mmap.c | 6 ++++++
arch/sparc/kernel/sys_sparc_32.c | 3 +++
arch/sparc/kernel/sys_sparc_64.c | 6 ++++++
arch/sparc/mm/hugetlbpage.c | 4 ++++
arch/x86/kernel/sys_x86_64.c | 6 ++++++
arch/x86/mm/hugetlbpage.c | 4 ++++
fs/hugetlbfs/inode.c | 4 ++++
include/linux/mm.h | 2 ++
include/uapi/asm-generic/mman-common.h | 1 +
mm/mmap.c | 9 ++++++++
tools/include/uapi/asm-generic/mman-common.h | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/map_below_hint.c | 32 ++++++++++++++++++++++++++++
23 files changed, 120 insertions(+)
---
base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
--
- Charlie
Many net selftests invent their own logging helpers. These really should be
in a library sourced by these tests. Currently forwarding/lib.sh has a
suite of perfectly fine logging helpers, but sourcing a forwarding/ library
from a higher-level directory smells of layering violation. In this patch,
move the logging helpers to net/lib.sh so that every net test can use them.
Together with the logging helpers, it's also necessary to move
pause_on_fail(), and EXIT_STATUS and RET.
Existing lib.sh users might be using these same names for their functions
or variables. However lib.sh is always sourced near the top of the
file (checked), and whatever new definitions will simply override the ones
provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
---
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 113 -----------------
tools/testing/selftests/net/lib.sh | 115 ++++++++++++++++++
2 files changed, 115 insertions(+), 113 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 89c25f72b10c..41dd14c42c48 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -48,7 +48,6 @@ declare -A NETIFS=(
: "${WAIT_TIME:=5}"
# Whether to pause on, respectively, after a failure and before cleanup.
-: "${PAUSE_ON_FAIL:=no}"
: "${PAUSE_ON_CLEANUP:=no}"
# Whether to create virtual interfaces, and what netdevice type they should be.
@@ -446,22 +445,6 @@ done
##############################################################################
# Helpers
-# Exit status to return at the end. Set in case one of the tests fails.
-EXIT_STATUS=0
-# Per-test return value. Clear at the beginning of each test.
-RET=0
-
-ret_set_ksft_status()
-{
- local ksft_status=$1; shift
- local msg=$1; shift
-
- RET=$(ksft_status_merge $RET $ksft_status)
- if (( $? )); then
- retmsg=$msg
- fi
-}
-
# Whether FAILs should be interpreted as XFAILs. Internal.
FAIL_TO_XFAIL=
@@ -535,102 +518,6 @@ xfail_on_veth()
fi
}
-log_test_result()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
- local result=$1; shift
- local retmsg=$1; shift
-
- printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
- if [[ $retmsg ]]; then
- printf "\t%s\n" "$retmsg"
- fi
-}
-
-pause_on_fail()
-{
- if [[ $PAUSE_ON_FAIL == yes ]]; then
- echo "Hit enter to continue, 'q' to quit"
- read a
- [[ $a == q ]] && exit 1
- fi
-}
-
-handle_test_result_pass()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" " OK "
-}
-
-handle_test_result_fail()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" FAIL "$retmsg"
- pause_on_fail
-}
-
-handle_test_result_xfail()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" XFAIL "$retmsg"
- pause_on_fail
-}
-
-handle_test_result_skip()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" SKIP "$retmsg"
-}
-
-log_test()
-{
- local test_name=$1
- local opt_str=$2
-
- if [[ $# -eq 2 ]]; then
- opt_str="($opt_str)"
- fi
-
- if ((RET == ksft_pass)); then
- handle_test_result_pass "$test_name" "$opt_str"
- elif ((RET == ksft_xfail)); then
- handle_test_result_xfail "$test_name" "$opt_str"
- elif ((RET == ksft_skip)); then
- handle_test_result_skip "$test_name" "$opt_str"
- else
- handle_test_result_fail "$test_name" "$opt_str"
- fi
-
- EXIT_STATUS=$(ksft_exit_status_merge $EXIT_STATUS $RET)
- return $RET
-}
-
-log_test_skip()
-{
- RET=$ksft_skip retmsg= log_test "$@"
-}
-
-log_test_xfail()
-{
- RET=$ksft_xfail retmsg= log_test "$@"
-}
-
-log_info()
-{
- local msg=$1
-
- echo "INFO: $msg"
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index c8991cc6bf28..691318b1ec55 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -9,6 +9,9 @@ source "$net_dir/lib/sh/defer.sh"
: "${WAIT_TIMEOUT:=20}"
+# Whether to pause on after a failure.
+: "${PAUSE_ON_FAIL:=no}"
+
BUSYWAIT_TIMEOUT=$((WAIT_TIMEOUT * 1000)) # ms
# Kselftest framework constants.
@@ -20,6 +23,11 @@ ksft_skip=4
# namespace list created by setup_ns
NS_LIST=()
+# Exit status to return at the end. Set in case one of the tests fails.
+EXIT_STATUS=0
+# Per-test return value. Clear at the beginning of each test.
+RET=0
+
##############################################################################
# Helpers
@@ -236,3 +244,110 @@ tc_rule_handle_stats_get()
| jq ".[] | select(.options.handle == $handle) | \
.options.actions[0].stats$selector"
}
+
+ret_set_ksft_status()
+{
+ local ksft_status=$1; shift
+ local msg=$1; shift
+
+ RET=$(ksft_status_merge $RET $ksft_status)
+ if (( $? )); then
+ retmsg=$msg
+ fi
+}
+
+log_test_result()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+ local result=$1; shift
+ local retmsg=$1; shift
+
+ printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
+ if [[ $retmsg ]]; then
+ printf "\t%s\n" "$retmsg"
+ fi
+}
+
+pause_on_fail()
+{
+ if [[ $PAUSE_ON_FAIL == yes ]]; then
+ echo "Hit enter to continue, 'q' to quit"
+ read a
+ [[ $a == q ]] && exit 1
+ fi
+}
+
+handle_test_result_pass()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" " OK "
+}
+
+handle_test_result_fail()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" FAIL "$retmsg"
+ pause_on_fail
+}
+
+handle_test_result_xfail()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" XFAIL "$retmsg"
+ pause_on_fail
+}
+
+handle_test_result_skip()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" SKIP "$retmsg"
+}
+
+log_test()
+{
+ local test_name=$1
+ local opt_str=$2
+
+ if [[ $# -eq 2 ]]; then
+ opt_str="($opt_str)"
+ fi
+
+ if ((RET == ksft_pass)); then
+ handle_test_result_pass "$test_name" "$opt_str"
+ elif ((RET == ksft_xfail)); then
+ handle_test_result_xfail "$test_name" "$opt_str"
+ elif ((RET == ksft_skip)); then
+ handle_test_result_skip "$test_name" "$opt_str"
+ else
+ handle_test_result_fail "$test_name" "$opt_str"
+ fi
+
+ EXIT_STATUS=$(ksft_exit_status_merge $EXIT_STATUS $RET)
+ return $RET
+}
+
+log_test_skip()
+{
+ RET=$ksft_skip retmsg= log_test "$@"
+}
+
+log_test_xfail()
+{
+ RET=$ksft_xfail retmsg= log_test "$@"
+}
+
+log_info()
+{
+ local msg=$1
+
+ echo "INFO: $msg"
+}
--
2.45.0
The series of patches are for doing basic tests
of NIC driver. Test comprises checks for auto-negotiation,
speed, duplex state and throughput between local NIC and
partner. Tools such as ethtool, iperf3 are used.
Signed-off-by: Mohan Prasad J <mohan.prasad(a)microchip.com>
---
Changes in v3:
- LinkConfig class is included in the hw library. This contains
generic APIs for doing link layer operations.
- Auto-negotiation checks involve changing the auto-neg state
both in local and partner NIC.
- Link layer test and performance test are separated to
different selftest files.
- Resetting of NIC driver done after test completion.
Changes in v2:
- Changed the hardcoded implementation of speed, duplex states,
throughput to generic values, in order to support all type
of NIC drivers.
- Test executes based on the supported link modes between local
NIC driver and partner.
- Instead of lan743x directory, selftest file is now relocated
to /selftests/drivers/net/hw.
---
Mohan Prasad J (3):
selftests: nic_link_layer: Add link layer selftest for NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex
states
selftests: nic_performance: Add selftest for performance of NIC driver
.../testing/selftests/drivers/net/hw/Makefile | 2 +
.../drivers/net/hw/lib/py/__init__.py | 1 +
.../drivers/net/hw/lib/py/linkconfig.py | 220 ++++++++++++++++++
.../drivers/net/hw/nic_link_layer.py | 105 +++++++++
.../drivers/net/hw/nic_performance.py | 121 ++++++++++
5 files changed, 449 insertions(+)
create mode 100644 tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py
create mode 100644 tools/testing/selftests/drivers/net/hw/nic_link_layer.py
create mode 100644 tools/testing/selftests/drivers/net/hw/nic_performance.py
--
2.43.0
This test verifies the correct behavior of the fork() system call,
which creates a child process by duplicating the parent process.
The test checks the following:
- The child PID returned by fork() is present in /proc.
- The child PID is different from the parent PID.
- The memory allocated to a variable in the child process is independent
of the parent process.
Test logs :
- Run without root
TAP version 13
1..1
ok 1 # SKIP This test needs root to run!
- Run with root
TAP version 13
1..1
# Inside the parent process.
# Child PID got from fork() return : 56038
# Parent PID from getpid(): 56037
# Inside the child process.
1..2
ok 1 Child Pid from /proc and fork() matching
ok 2 Child Pid != Parent pid
1..3
ok 3 After modification in child No effect on the value of 'var' in parent
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Shivam Chaudhary <cvam0000(a)gmail.com>
---
Here is my proposal for a new directory, /syscalls, to add syscall selftests,
as there is currently no dedicated space for these tests. I encountered this
issue while writing the test case for the delete_module syscall and was unsure
where to place it. As a heads-up, the delete_module test is currently under
review, and I would like to add it to this directory.
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/syscalls/.gitignore | 1 +
.../syscalls/fork_syscall/.gitignore | 1 +
.../selftests/syscalls/fork_syscall/Makefile | 5 +
.../syscalls/fork_syscall/fork_syscall.c | 151 ++++++++++++++++++
5 files changed, 159 insertions(+)
create mode 100644 tools/testing/selftests/syscalls/.gitignore
create mode 100644 tools/testing/selftests/syscalls/fork_syscall/.gitignore
create mode 100644 tools/testing/selftests/syscalls/fork_syscall/Makefile
create mode 100644 tools/testing/selftests/syscalls/fork_syscall/fork_syscall.c
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 363d031a16f7..9265c17c5de3 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -97,6 +97,7 @@ TARGETS += sparc64
TARGETS += splice
TARGETS += static_keys
TARGETS += sync
+TARGETS += syscalls/fork_syscall
TARGETS += syscall_user_dispatch
TARGETS += sysctl
TARGETS += tc-testing
diff --git a/tools/testing/selftests/syscalls/.gitignore b/tools/testing/selftests/syscalls/.gitignore
new file mode 100644
index 000000000000..c7ae138d3f0c
--- /dev/null
+++ b/tools/testing/selftests/syscalls/.gitignore
@@ -0,0 +1 @@
+// SPDX-License-Identifier: GPL-2.0
\ No newline at end of file
diff --git a/tools/testing/selftests/syscalls/fork_syscall/.gitignore b/tools/testing/selftests/syscalls/fork_syscall/.gitignore
new file mode 100644
index 000000000000..788cc1ff70bd
--- /dev/null
+++ b/tools/testing/selftests/syscalls/fork_syscall/.gitignore
@@ -0,0 +1 @@
+# SPDX-License-Identifier: GPL-2.0-only
\ No newline at end of file
diff --git a/tools/testing/selftests/syscalls/fork_syscall/Makefile b/tools/testing/selftests/syscalls/fork_syscall/Makefile
new file mode 100644
index 000000000000..56033a3d5a87
--- /dev/null
+++ b/tools/testing/selftests/syscalls/fork_syscall/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_GEN_PROGS := fork_syscall
+CFLAGS += -Wall
+
+include ../lib.mk
\ No newline at end of file
diff --git a/tools/testing/selftests/syscalls/fork_syscall/fork_syscall.c b/tools/testing/selftests/syscalls/fork_syscall/fork_syscall.c
new file mode 100644
index 000000000000..eab22831f7e1
--- /dev/null
+++ b/tools/testing/selftests/syscalls/fork_syscall/fork_syscall.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* kselftest for fork() system call
+ *
+ * Summery : fork() system call is used to create a new process
+ * by duplicating an existing one. The new process, known as the
+ * child process, is a copy of the parent process.
+ *
+ * Child process is dublicate process but has different PID and
+ * memory allocation.
+ *
+ * About the test : With this test we are testing the following:
+ * - Child PID which fork() returns to Parent is present in /proc
+ * - Child PID is not same as Parent PID.
+ * - Memory allocation to a variable in child and parent process
+ * is different.
+*/
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <dirent.h>
+#include <ctype.h>
+
+#include "../../kselftest.h"
+
+// Function to check if a string is numeric (PID check)
+int is_numeric(const char *str) {
+ while (*str) {
+ if (!isdigit(*str)) return 0;
+ str++;
+ }
+ return 1;
+}
+
+// Function to find the child PID in /proc
+pid_t find_child_pid(pid_t parent_pid) {
+ DIR *proc_dir = opendir("/proc");
+ struct dirent *entry;
+
+ if (proc_dir == NULL) {
+ perror("Failed to open /proc directory");
+ ksft_exit_fail();
+ return 1;
+ }
+
+ // Iterate through the /proc directory to find PIDs
+ while ((entry = readdir(proc_dir)) != NULL) {
+ // Check if the entry is a PID
+ if (is_numeric(entry->d_name)) {
+ pid_t pid = atoi(entry->d_name);
+
+ // Construct the path to /proc/<pid>/
+ //stat to check the parent PID
+
+ char path[40], buffer[100];
+ snprintf(path, 40, "/proc/%d/stat", pid);
+
+ FILE *stat_file = fopen(path, "r");
+ if (stat_file != NULL) {
+ fgets(buffer, 100, stat_file);
+ fclose(stat_file);
+
+ // The fourth field in /proc/<pid>/stat is the parent PID
+ pid_t ppid;
+ sscanf(buffer, "%*d %*s %*c %d", &ppid);
+
+ if (ppid == parent_pid) {
+ closedir(proc_dir);
+ // Return the child PID if the parent PID matches
+ return pid;
+ }
+ }
+ }
+ }
+
+ closedir(proc_dir);
+
+ // Return -1 if no child PID was found
+ return -1;
+}
+
+int main(void) {
+
+ // Setting up kselftest framework
+ ksft_print_header();
+ ksft_set_plan(1);
+
+ // Check if test is run a root
+ if (geteuid()) {
+ ksft_test_result_skip("This test needs root to run!\n");
+ return 1;
+ }
+
+ // forking
+ pid_t pid = fork();
+
+ // Declare a variable in both parent and child processes
+ int var = 17;
+
+ if (pid == -1) {
+ ksft_test_result_error("%s.\n", strerror(errno));
+ ksft_finished();
+ return 1;
+
+ } else if (pid == 0) {
+ // This is the child process
+ ksft_print_msg("Inside the child process.\n");
+ var = 1998;
+
+ } else {
+ // This is the parent process
+ pid_t ppid=getpid();
+ ksft_print_msg("Inside the parent process.\n");
+ ksft_print_msg("Child PID got from fork() return : %d\n", pid);
+ ksft_print_msg("Parent PID from getpid(): %d\n",ppid);
+
+ // Find the child PID in /proc
+ pid_t child_pid = find_child_pid(getpid());
+ if (child_pid != -1) {
+ ksft_set_plan(2);
+ if(child_pid == pid && pid != ppid && var != 1998) {
+ ksft_test_result_pass("Child Pid from /proc and fork() matching\n");
+ ksft_test_result_pass("Child Pid != Parent pid\n");
+ ksft_set_plan(3);
+ ksft_test_result_pass(
+ "After modification in child No effect on the value of 'var' in parent\n");
+ ksft_exit_pass();
+ return 0;
+ }
+ else {
+ ksft_exit_fail();
+ return 1;
+ }
+ }
+ else {
+ ksft_test_result_fail("Child Pid from /proc and fork() does not match");
+ ksft_exit_fail();
+ return 1;
+ }
+
+ // Wait for the child process to finish
+ wait(NULL);
+ }
+
+ return 0;
+}
+
--
2.34.1
Hi,
===== START =====
TEST: enq_last_no_enq_fails
DESCRIPTION: Verify we fail to load a scheduler if we specify the SCX_OPS_ENQ_LAST flag without defining ops.enqueue()
OUTPUT:
ERR: enq_last_no_enq_fails.c:35
Incorrectly succeeded in to attaching scheduler
not ok 2 enq_last_no_enq_fails #
===== END =====
Above selftest fails even when BPF scheduler is not loaded into the kernel.
Below is snippet from the dmesg verifing bpf program was not loaded:
sched_ext: enq_last_no_enq_fails: SCX_OPS_ENQ_LAST requires ops.enqueue() to be implemented
scx_ops_enable.isra.0+0xde8/0xe30
bpf_struct_ops_link_create+0x1ac/0x240
link_create+0x178/0x400
__sys_bpf+0x7ac/0xd50
sys_bpf+0x2c/0x70
system_call_exception+0x148/0x310
system_call_vectored_common+0x15c/0x2ec
sched_ext: "enq_select_cpu_fails" does not implement cgroup cpu.weight
sched_ext: BPF scheduler "enq_select_cpu_fails" enabled
sched_ext: BPF scheduler "enq_select_cpu_fails" disabled (runtime error)
static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
{
...
ret = validate_ops(ops);
if (ret)
goto err_disable;
...
err_disable:
mutex_unlock(&scx_ops_enable_mutex);
/*
* Returning an error code here would not pass all the error information
* to userspace. Record errno using scx_ops_error() for cases
* scx_ops_error() wasn't already invoked and exit indicating success so
* that the error is notified through ops.exit() with all the details.
*
* Flush scx_ops_disable_work to ensure that error is reported before
* init completion.
*/
scx_ops_error("scx_ops_enable() failed (%d)", ret);
kthread_flush_work(&scx_ops_disable_work);
return 0;
}
validate_ops() correctly reports the error, but err_disable path ultimately
returns with a value of zero
from: enq_last_no_enq_fails.c
static enum scx_test_status run(void *ctx)
{
struct enq_last_no_enq_fails *skel = ctx;
struct bpf_link *link;
link = bpf_map__attach_struct_ops(skel->maps.enq_last_no_enq_fails_ops);
if (link) {
SCX_ERR("Incorrectly succeeded in to attaching scheduler");
return SCX_TEST_FAIL;
}
bpf_link__destroy(link);
return SCX_TEST_PASS;
}
From: Jeff Xu <jeffxu(a)google.com>
Two fixes for madvise(MADV_DONTNEED) when sealed.
For PROT_NONE mappings, the previous blocking of
madvise(MADV_DONTNEED) is unnecessary. As PROT_NONE already prohibits
memory access, madvise(MADV_DONTNEED) should be allowed to proceed in
order to free the page.
For file-backed, private, read-only memory mappings, we previously did
not block the madvise(MADV_DONTNEED). This was based on
the assumption that the memory's content, being file-backed, could be
retrieved from the file if accessed again. However, this assumption
failed to consider scenarios where a mapping is initially created as
read-write, modified, and subsequently changed to read-only. The newly
introduced VM_WASWRITE flag addresses this oversight.
Jeff Xu (2):
mseal: Two fixes for madvise(MADV_DONTNEED) when sealed
selftest/mseal: Add tests for madvise
include/linux/mm.h | 2 +
mm/mprotect.c | 3 +
mm/mseal.c | 42 +++++++--
tools/testing/selftests/mm/mseal_test.c | 118 +++++++++++++++++++++++-
4 files changed, 157 insertions(+), 8 deletions(-)
--
2.47.0.rc1.288.g06298d1525-goog
Hi
Note for V12:
There was a small conflict between the Intel PT changes in
"KVM: x86: Fix Intel PT Host/Guest mode when host tracing" and the
changes in this patch set, so I have put the patch sets together,
along with outstanding fix "perf/x86/intel/pt: Fix buffer full but
size is 0 case"
Cover letter for KVM changes (patches 2 to 4):
There is a long-standing problem whereby running Intel PT on host and guest
in Host/Guest mode, causes VM-Entry failure.
The motivation for this patch set is to provide a fix for stable kernels
prior to the advent of the "Mediated Passthrough vPMU" patch set:
https://lore.kernel.org/kvm/20240801045907.4010984-1-mizhang@google.com/
which would render a large part of the fix unnecessary but likely not be
suitable for backport to stable due to its size and complexity.
Ideally, this patch set would be applied before "Mediated Passthrough vPMU"
Note that the fix does not conflict with "Mediated Passthrough vPMU", it
is just that "Mediated Passthrough vPMU" will make the code to stop and
restart Intel PT unnecessary.
Note for V11:
Moving aux_paused into a union within struct hw_perf_event caused
a regression because aux_paused was being written unconditionally
even though it is valid only for AUX (e.g. Intel PT) PMUs.
That is fixed in V11.
Hardware traces, such as instruction traces, can produce a vast amount of
trace data, so being able to reduce tracing to more specific circumstances
can be useful.
The ability to pause or resume tracing when another event happens, can do
that.
These patches add such a facilty and show how it would work for Intel
Processor Trace.
Maintainers of other AUX area tracing implementations are requested to
consider if this is something they might employ and then whether or not
the ABI would work for them. Note, thank you to James Clark (ARM) for
evaluating the API for Coresight. Suzuki K Poulose (ARM) also responded
positively to the RFC.
Changes to perf tools are now (since V4) fleshed out.
Please note, Intel® Architecture Instruction Set Extensions and Future
Features Programming Reference March 2024 319433-052, currently:
https://cdrdv2.intel.com/v1/dl/getContent/671368
introduces hardware pause / resume for Intel PT in a feature named
Intel PT Trigger Tracing.
For that more fields in perf_event_attr will be necessary. The main
differences are:
- it can be applied not just to overflows, but optionally to
every event
- a packet is emitted into the trace, optionally with IP
information
- no PMI
- works with PMC and DR (breakpoint) events only
Here are the proposed additions to perf_event_attr, please comment:
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 0c557f0a17b3..05dcc43f11bb 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -369,6 +369,22 @@ enum perf_event_read_format {
PERF_FORMAT_MAX = 1U << 5, /* non-ABI */
};
+enum {
+ PERF_AUX_ACTION_START_PAUSED = 1U << 0,
+ PERF_AUX_ACTION_PAUSE = 1U << 1,
+ PERF_AUX_ACTION_RESUME = 1U << 2,
+ PERF_AUX_ACTION_EMIT = 1U << 3,
+ PERF_AUX_ACTION_NR = 0x1f << 4,
+ PERF_AUX_ACTION_NO_IP = 1U << 9,
+ PERF_AUX_ACTION_PAUSE_ON_EVT = 1U << 10,
+ PERF_AUX_ACTION_RESUME_ON_EVT = 1U << 11,
+ PERF_AUX_ACTION_EMIT_ON_EVT = 1U << 12,
+ PERF_AUX_ACTION_NR_ON_EVT = 0x1f << 13,
+ PERF_AUX_ACTION_NO_IP_ON_EVT = 1U << 18,
+ PERF_AUX_ACTION_MASK = ~PERF_AUX_ACTION_START_PAUSED,
+ PERF_AUX_PAUSE_RESUME_MASK = PERF_AUX_ACTION_PAUSE | PERF_AUX_ACTION_RESUME,
+};
+
#define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */
#define PERF_ATTR_SIZE_VER1 72 /* add: config2 */
#define PERF_ATTR_SIZE_VER2 80 /* add: branch_sample_type */
@@ -515,10 +531,19 @@ struct perf_event_attr {
union {
__u32 aux_action;
struct {
- __u32 aux_start_paused : 1, /* start AUX area tracing paused */
- aux_pause : 1, /* on overflow, pause AUX area tracing */
- aux_resume : 1, /* on overflow, resume AUX area tracing */
- __reserved_3 : 29;
+ __u32 aux_start_paused : 1, /* start AUX area tracing paused */
+ aux_pause : 1, /* on overflow, pause AUX area tracing */
+ aux_resume : 1, /* on overflow, resume AUX area tracing */
+ aux_emit : 1, /* generate AUX records instead of events */
+ aux_nr : 5, /* AUX area tracing reference number */
+ aux_no_ip : 1, /* suppress IP in AUX records */
+ /* Following apply to event occurrence not overflows */
+ aux_pause_on_evt : 1, /* on event, pause AUX area tracing */
+ aux_resume_on_evt : 1, /* on event, resume AUX area tracing */
+ aux_emit_on_evt : 1, /* generate AUX records instead of events */
+ aux_nr_on_evt : 5, /* AUX area tracing reference number */
+ aux_no_ip_on_evt : 1, /* suppress IP in AUX records */
+ __reserved_3 : 13;
};
};
Changes in V13:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Do aux_resume at the end of __perf_event_overflow() so as to trace
less of perf itself
perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume
Add error message also in EOPNOTSUPP case (Leo)
Changes in V12:
Add previously sent patch "perf/x86/intel/pt: Fix buffer full
but size is 0 case"
Add previously sent patch set "KVM: x86: Fix Intel PT Host/Guest
mode when host tracing"
Rebase on current tip plus patch set "KVM: x86: Fix Intel PT Host/Guest
mode when host tracing"
Changes in V11:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Make assignment to event->hw.aux_paused conditional on
(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE).
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
Remove definition of has_aux_action() because it has
already been added as an inline function.
perf/x86/intel/pt: Fix sampling synchronization
perf tools: Enable evsel__is_aux_event() to work for ARM/ARM64
perf tools: Enable evsel__is_aux_event() to work for S390_CPUMSF
Dropped because they have already been applied
Changes in V10:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Move aux_paused into a union within struct hw_perf_event.
Additional comment wrt PERF_EF_PAUSE/PERF_EF_RESUME.
Factor out has_aux_action() as an inline function.
Use scoped_guard for irqsave.
Move calls of perf_event_aux_pause() from __perf_event_output()
to __perf_event_overflow().
Changes in V9:
perf/x86/intel/pt: Fix sampling synchronization
New patch
perf/core: Add aux_pause, aux_resume, aux_start_paused
Move aux_paused to struct hw_perf_event
perf/x86/intel/pt: Add support for pause / resume
Add more comments and barriers for resume_allowed and
pause_allowed
Always use WRITE_ONCE with resume_allowed
Changes in V8:
perf tools: Parse aux-action
Fix clang warning:
util/auxtrace.c:821:7: error: missing field 'aux_action' initializer [-Werror,-Wmissing-field-initializers]
821 | {NULL},
| ^
Changes in V7:
Add Andi's Reviewed-by for patches 2-12
Re-base
Changes in V6:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Removed READ/WRITE_ONCE from __perf_event_aux_pause()
Expanded comment about guarding against NMI
Changes in V5:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Added James' Ack
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
New patch
perf tools
Added Ian's Ack
Changes in V4:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Rename aux_output_cfg -> aux_action
Reorder aux_action bits from:
aux_pause, aux_resume, aux_start_paused
to:
aux_start_paused, aux_pause, aux_resume
Fix aux_action bits __u64 -> __u32
coresight: Have a stab at support for pause / resume
Dropped
perf tools
All new patches
Changes in RFC V3:
coresight: Have a stab at support for pause / resume
'mode' -> 'flags' so it at least compiles
Changes in RFC V2:
Use ->stop() / ->start() instead of ->pause_resume()
Move aux_start_paused bit into aux_output_cfg
Tighten up when Intel PT pause / resume is allowed
Add an example of how it might work for CoreSight
Adrian Hunter (14):
perf/x86/intel/pt: Fix buffer full but size is 0 case
KVM: x86: Fix Intel PT IA32_RTIT_CTL MSR validation
KVM: x86: Fix Intel PT Host/Guest mode when host tracing also
KVM: selftests: Add guest Intel PT test
perf/core: Add aux_pause, aux_resume, aux_start_paused
perf/x86/intel/pt: Add support for pause / resume
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
perf tools: Add aux_start_paused, aux_pause and aux_resume
perf tools: Add aux-action config term
perf tools: Parse aux-action
perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume
perf intel-pt: Improve man page format
perf intel-pt: Add documentation for pause / resume
perf intel-pt: Add a test for pause / resume
arch/x86/events/intel/core.c | 4 +-
arch/x86/events/intel/pt.c | 209 +++++++-
arch/x86/events/intel/pt.h | 16 +
arch/x86/include/asm/intel_pt.h | 4 +
arch/x86/kvm/vmx/vmx.c | 26 +-
arch/x86/kvm/vmx/vmx.h | 1 -
include/linux/perf_event.h | 28 +
include/uapi/linux/perf_event.h | 11 +-
kernel/events/core.c | 75 ++-
kernel/events/internal.h | 1 +
tools/include/uapi/linux/perf_event.h | 11 +-
tools/perf/Documentation/perf-intel-pt.txt | 596 +++++++++++++--------
tools/perf/Documentation/perf-record.txt | 4 +
tools/perf/builtin-record.c | 4 +-
tools/perf/tests/shell/test_intel_pt.sh | 28 +
tools/perf/util/auxtrace.c | 67 ++-
tools/perf/util/auxtrace.h | 6 +-
tools/perf/util/evsel.c | 15 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/evsel_config.h | 1 +
tools/perf/util/parse-events.c | 10 +
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 3 +
tools/perf/util/pmu.c | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
tools/testing/selftests/kvm/x86_64/intel_pt.c | 381 +++++++++++++
28 files changed, 1243 insertions(+), 264 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/intel_pt.c
Regards
Adrian
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, for threads, when it does a pthread_exit(&exit_status) call, the
kernel is not aware of the exit status with which pthread_exit is called.
It is sent by child thread to the parent process, if it is waiting in
pthread_join(). Hence, for a thread exiting abnormally, kernel cannot
send notifications to any listening processes.
The exception to this is if the thread is sent a signal which it has not
handled, and dies along with it's process as a result; for eg. SIGSEGV or
SIGKILL. In this case, kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have parent wait in pthread_join, one of the
main reasons for this being that we do not want to track normal
pthread_exit(), which could be a very large number. We only want to be
notified of any abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to proc connector
API, which allows a thread to send it's exit status to kernel either when
it needs to call pthread_exit() with non-zero value to indicate some
error or from signal handler before pthread_exit().
We also need to filter packets with non-zero exit notifications futher
based on instances, which can be identified by task names. Hence, added a
comm field to the packet's struct proc_event, in which task->comm is
stored.
v4->v5 changes:
- Handled comment by Stanislav Fomichev to fix a print format error.
- Made thread.c completely automated by starting proc_filter program
from within threads.c.
- Changed name CONFIG_CN_HASH_KUNIT_TEST to CN_HASH_KUNIT_TEST in
Kconfig.debug and changed display text.
v3->v4 changes:
- Reduce size of exit.log by removing unnecessary text.
v2->v3 changes:
- Handled comment by Liam Howlett to set hdev to NULL and add comment on
it.
- Handled comment by Liam Howlett to combine functions for deleting+get
and deleting into one in cn_hash.c
- Handled comment by Liam Howlett to remove extern in the functions
defined in cn_hash_test.h
- Some nits by Liam Howlett fixed.
- Handled comment by Liam Howlett to make threads test automated.
proc_filter.c creates exit.log, which is read by thread.c and checks
the values reported.
- Added "comm" field to struct proc_event, to copy the task's name to
the packet to allow further filtering by packets.
v1->v2 changes:
- Handled comment by Peter Zijlstra to remove locking for PF_EXIT_NOTIFY
task->flags.
- Added error handling in thread.c
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 221 +++++++++++++++++
drivers/connector/cn_proc.c | 62 ++++-
drivers/connector/connector.c | 75 +++++-
include/linux/connector.h | 35 +++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 5 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 +++++++++++++
lib/cn_hash_test.h | 10 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 34 ++-
tools/testing/selftests/connector/thread.c | 232 ++++++++++++++++++
.../selftests/connector/thread_filter.c | 96 ++++++++
15 files changed, 967 insertions(+), 15 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit a41b3828ec056a631ad22413d4560017fed5c3bd ]
This test was added because of a bug in verifier.c:sync_linked_regs(),
upon range propagation it destroyed subreg_def marks for registers.
The test is written in a way to return an upper half of a register
that is affected by range propagation and must have it's subreg_def
preserved. This gives a return value of 0 and leads to undefined
return value if subreg_def mark is not preserved.
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/bpf/20240924210844.1758441-2-eddyz87@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/progs/verifier_scalar_ids.c | 67 +++++++++++++++++++
1 file changed, 67 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
index 13b29a7faa71a..d24d3a36ec144 100644
--- a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
+++ b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
@@ -656,4 +656,71 @@ __naked void two_old_ids_one_cur_id(void)
: __clobber_all);
}
+SEC("socket")
+/* Note the flag, see verifier.c:opt_subreg_zext_lo32_rnd_hi32() */
+__flag(BPF_F_TEST_RND_HI32)
+__success
+/* This test was added because of a bug in verifier.c:sync_linked_regs(),
+ * upon range propagation it destroyed subreg_def marks for registers.
+ * The subreg_def mark is used to decide whether zero extension instructions
+ * are needed when register is read. When BPF_F_TEST_RND_HI32 is set it
+ * also causes generation of statements to randomize upper halves of
+ * read registers.
+ *
+ * The test is written in a way to return an upper half of a register
+ * that is affected by range propagation and must have it's subreg_def
+ * preserved. This gives a return value of 0 and leads to undefined
+ * return value if subreg_def mark is not preserved.
+ */
+__retval(0)
+/* Check that verifier believes r1/r0 are zero at exit */
+__log_level(2)
+__msg("4: (77) r1 >>= 32 ; R1_w=0")
+__msg("5: (bf) r0 = r1 ; R0_w=0 R1_w=0")
+__msg("6: (95) exit")
+__msg("from 3 to 4")
+__msg("4: (77) r1 >>= 32 ; R1_w=0")
+__msg("5: (bf) r0 = r1 ; R0_w=0 R1_w=0")
+__msg("6: (95) exit")
+/* Verify that statements to randomize upper half of r1 had not been
+ * generated.
+ */
+__xlated("call unknown")
+__xlated("r0 &= 2147483647")
+__xlated("w1 = w0")
+/* This is how disasm.c prints BPF_ZEXT_REG at the moment, x86 and arm
+ * are the only CI archs that do not need zero extension for subregs.
+ */
+#if !defined(__TARGET_ARCH_x86) && !defined(__TARGET_ARCH_arm64)
+__xlated("w1 = w1")
+#endif
+__xlated("if w0 < 0xa goto pc+0")
+__xlated("r1 >>= 32")
+__xlated("r0 = r1")
+__xlated("exit")
+__naked void linked_regs_and_subreg_def(void)
+{
+ asm volatile (
+ "call %[bpf_ktime_get_ns];"
+ /* make sure r0 is in 32-bit range, otherwise w1 = w0 won't
+ * assign same IDs to registers.
+ */
+ "r0 &= 0x7fffffff;"
+ /* link w1 and w0 via ID */
+ "w1 = w0;"
+ /* 'if' statement propagates range info from w0 to w1,
+ * but should not affect w1->subreg_def property.
+ */
+ "if w0 < 10 goto +0;"
+ /* r1 is read here, on archs that require subreg zero
+ * extension this would cause zext patch generation.
+ */
+ "r1 >>= 32;"
+ "r0 = r1;"
+ "exit;"
+ :
+ : __imm(bpf_ktime_get_ns)
+ : __clobber_all);
+}
+
char _license[] SEC("license") = "GPL";
--
2.43.0
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit a41b3828ec056a631ad22413d4560017fed5c3bd ]
This test was added because of a bug in verifier.c:sync_linked_regs(),
upon range propagation it destroyed subreg_def marks for registers.
The test is written in a way to return an upper half of a register
that is affected by range propagation and must have it's subreg_def
preserved. This gives a return value of 0 and leads to undefined
return value if subreg_def mark is not preserved.
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/bpf/20240924210844.1758441-2-eddyz87@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/progs/verifier_scalar_ids.c | 67 +++++++++++++++++++
1 file changed, 67 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
index 13b29a7faa71a..d24d3a36ec144 100644
--- a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
+++ b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
@@ -656,4 +656,71 @@ __naked void two_old_ids_one_cur_id(void)
: __clobber_all);
}
+SEC("socket")
+/* Note the flag, see verifier.c:opt_subreg_zext_lo32_rnd_hi32() */
+__flag(BPF_F_TEST_RND_HI32)
+__success
+/* This test was added because of a bug in verifier.c:sync_linked_regs(),
+ * upon range propagation it destroyed subreg_def marks for registers.
+ * The subreg_def mark is used to decide whether zero extension instructions
+ * are needed when register is read. When BPF_F_TEST_RND_HI32 is set it
+ * also causes generation of statements to randomize upper halves of
+ * read registers.
+ *
+ * The test is written in a way to return an upper half of a register
+ * that is affected by range propagation and must have it's subreg_def
+ * preserved. This gives a return value of 0 and leads to undefined
+ * return value if subreg_def mark is not preserved.
+ */
+__retval(0)
+/* Check that verifier believes r1/r0 are zero at exit */
+__log_level(2)
+__msg("4: (77) r1 >>= 32 ; R1_w=0")
+__msg("5: (bf) r0 = r1 ; R0_w=0 R1_w=0")
+__msg("6: (95) exit")
+__msg("from 3 to 4")
+__msg("4: (77) r1 >>= 32 ; R1_w=0")
+__msg("5: (bf) r0 = r1 ; R0_w=0 R1_w=0")
+__msg("6: (95) exit")
+/* Verify that statements to randomize upper half of r1 had not been
+ * generated.
+ */
+__xlated("call unknown")
+__xlated("r0 &= 2147483647")
+__xlated("w1 = w0")
+/* This is how disasm.c prints BPF_ZEXT_REG at the moment, x86 and arm
+ * are the only CI archs that do not need zero extension for subregs.
+ */
+#if !defined(__TARGET_ARCH_x86) && !defined(__TARGET_ARCH_arm64)
+__xlated("w1 = w1")
+#endif
+__xlated("if w0 < 0xa goto pc+0")
+__xlated("r1 >>= 32")
+__xlated("r0 = r1")
+__xlated("exit")
+__naked void linked_regs_and_subreg_def(void)
+{
+ asm volatile (
+ "call %[bpf_ktime_get_ns];"
+ /* make sure r0 is in 32-bit range, otherwise w1 = w0 won't
+ * assign same IDs to registers.
+ */
+ "r0 &= 0x7fffffff;"
+ /* link w1 and w0 via ID */
+ "w1 = w0;"
+ /* 'if' statement propagates range info from w0 to w1,
+ * but should not affect w1->subreg_def property.
+ */
+ "if w0 < 10 goto +0;"
+ /* r1 is read here, on archs that require subreg zero
+ * extension this would cause zext patch generation.
+ */
+ "r1 >>= 32;"
+ "r0 = r1;"
+ "exit;"
+ :
+ : __imm(bpf_ktime_get_ns)
+ : __clobber_all);
+}
+
char _license[] SEC("license") = "GPL";
--
2.43.0
Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
The current means by which these are implemented is via a PROT_NONE mmap()
mapping, which provides the required semantics however incur an overhead of
a VMA for each such region.
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
This series takes a different approach - an idea suggested by Vlasimil
Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
provenance becomes a little tricky to ascertain after this - please forgive
any omissions!) - rather than locating the guard pages at the VMA layer,
instead placing them in page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5 times
speed up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle android
system and unoptimised code.
We expect with optimisation and a loaded system with a larger number of
guard pages this could significantly increase, but in any case these
numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, instead we have a VMA spanning the entire range of
memory a user is permitted to access and including ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the mappings being torn down when the
mappings are ultimately unmapped and everything works simply as if the
memory were not faulted in, from the point of view of the containing VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
Any poisoned region of memory is also able to 'unpoisoned', that is, to
have its poison markers removed.
The mechanism is implemented via madvise() behaviour - MADV_GUARD_POISON
which simply poisons ranges - and MADV_GUARD_UNPOISON - which clears this
poisoning.
Poisoning can be performed across multiple VMAs and any existing mappings
will be cleared, that is zapped, before installing the poisoned page table
mappings.
There is no concept of 'nested' poisoning, multiple attempts to poison a
range will, after the first poisoning, have no effect.
Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
memory, so a user can safely unpoison a range of memory and clear only
poison page table mappings leaving the rest intact.
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the poisoning if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests are provided which ensure behaviour is as
expected and additionally self-documents expected behaviour of poisoned
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
v2
* The macros in kselftest_harness.h seem to be broken - __EXPECT() is
terminated by '} while (0); OPTIONAL_HANDLER(_assert)' meaning it is not
safe in single line if / else or for /which blocks, however working
around this results in checkpatch producing invalid warnings, as reported
by Shuah.
* Fixing these macros is out of scope for this series, so compromise and
instead rewrite test blocks so as to use multiple lines by separating out
a decl in most cases. This has the side effect of, for the most part,
making things more readable.
* Heavily document the use of the volatile keyword - we can't avoid
checkpatch complaining about this, so we explain it, as reported by
Shuah.
* Updated commit message to highlight that we skip tests we lack
permissions for, as reported by Shuah.
* Replaced a perror() with ksft_exit_fail_perror(), as reported by Shuah.
* Added user friendly messages to cases where tests are skipped due to lack
of permissions, as reported by Shuah.
* Update the tool header to include the new MADV_GUARD_POISON/UNPOISON
defines and directly include asm-generic/mman.h to get the
platform-neutral versions to ensure we import them.
* Finally fixed Vlastimil's email address in Suggested-by tags from suze to
suse, as reported by Vlastimil.
* Added linux-api to cc list, as reported by Vlastimil.
v1
* Un-RFC'd as appears no major objections to approach but rather debate on
implementation.
* Fixed issue with arches which need mmu_context.h and
tlbfush.h. header imports in pagewalker logic to be able to use
update_mmu_cache() as reported by the kernel test bot.
* Added comments in page walker logic to clarify who can use
ops->install_pte and why as well as adding a check_ops_valid() helper
function, as suggested by Christoph.
* Pass false in full parameter in pte_clear_not_present_full() as suggested
by Jann.
* Stopped erroneously requiring a write lock for the poison operation as
suggested by Jann and Suren.
* Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
consistent with how this is used elsewhere in the kernel as suggested by
Jann.
* Avoid returning -EAGAIN if we are raced on page faults, just keep looping
and duck out if a fatal signal is pending or a conditional reschedule is
needed, as suggested by Jann.
* Avoid needlessly splitting huge PUDs and PMDs by specifying
ACTION_CONTINUE, as suggested by Jann.
https://lore.kernel.org/all/cover.1729196871.git.lorenzo.stoakes@oracle.com/
RFC
https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (5):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
tools: testing: update tools UAPI header for mman-common.h
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 26 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 3 +
mm/internal.h | 6 +
mm/madvise.c | 168 +++
mm/memory.c | 18 +-
mm/mprotect.c | 3 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 200 ++-
tools/include/uapi/asm-generic/mman-common.h | 3 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1228 ++++++++++++++++++
19 files changed, 1627 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.47.0