From: zhouyuhang <zhouyuhang(a)kylinos.cn>
Test case idmap_mount_tree_invalid fails on newer kernels with the
following output:
# RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
# mount_setattr_test.c:1428:idmap_mount_tree_invalid:Expected sys_mount_setattr(open_tree_fd, "", AT_EMPTY_PATH, &attr, sizeof(attr)) (0) != 0 (0)
# idmap_mount_tree_invalid: Test terminated by assertion
This is because tmpfs is mounted at "/mnt/A", and tmpfs has had the
FS_ALLOW_IDMAP flag set since commit 7a80e5b8c6fa ("shmem: support
idmapped mounts for tmpfs"). Calling sys_mount_setattr() here therefore
returns 0 instead of the expected -EINVAL.
Ramfs is mounted at "/mnt/B" and does not support idmapped mounts, so
use "/mnt/B" instead of "/mnt/A". The test then passes with the
following output:
# Starting 1 tests from 1 test cases.
# RUN mount_setattr_idmapped.idmap_mount_tree_invalid ...
# OK mount_setattr_idmapped.idmap_mount_tree_invalid
ok 1 mount_setattr_idmapped.idmap_mount_tree_invalid
# PASSED: 1 / 1 tests passed.
Signed-off-by: zhouyuhang <zhouyuhang(a)kylinos.cn>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index c6a8c732b802..54552c19bc24 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -1414,7 +1414,7 @@ TEST_F(mount_setattr_idmapped, idmap_mount_tree_invalid)
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/b", 0, 0, 0), 0);
ASSERT_EQ(expected_uid_gid(-EBADF, "/tmp/B/BB/b", 0, 0, 0), 0);
- open_tree_fd = sys_open_tree(-EBADF, "/mnt/A",
+ open_tree_fd = sys_open_tree(-EBADF, "/mnt/B",
AT_RECURSIVE |
AT_EMPTY_PATH |
AT_NO_AUTOMOUNT |
--
2.27.0
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process or thread.
This series introduces sentinels for this purpose, which can be passed as
the pidfd in this instance rather than having to establish a dummy fd.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in the kernel, and a userland process is a thread group from the
kernel perspective (more specifically, the thread group leader). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP aliased by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.
We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.
v5:
* Fixup self test dependencies on pidfd/pidfd.h.
v4:
* Avoid returning an fd in the __pidfd_get_pid() function as pointed out by
Christian, instead simply always pin the pid and maintain fd scope in the
helper alone.
* Add wrapper header file in tools/include/linux to allow for import of
UAPI pidfd.h header without encountering the collision between system
fcntl.h and linux/fcntl.h as discussed with Shuah and John.
* Fixup tests to import the UAPI pidfd.h header working around conflicts
between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports,
as reported by Shuah.
* Use an int for pidfd_is_self_sentinel() to avoid any dependency on
stdbool.h in userland.
https://lore.kernel.org/linux-mm/cover.1729198898.git.lorenzo.stoakes@oracl…
v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoakes@oracl…
v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl…
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…
RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…
Lorenzo Stoakes (5):
pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
tools: testing: separate out wait_for_pid() into helper header
selftests: pidfd: add pidfd.h UAPI wrapper
selftests: pidfd: add tests for PIDFD_SELF_*
include/linux/pid.h | 34 ++++-
include/uapi/linux/pidfd.h | 15 ++
kernel/exit.c | 3 +-
kernel/nsproxy.c | 1 +
kernel/pid.c | 65 +++++---
kernel/signal.c | 29 +---
tools/include/linux/pidfd.h | 14 ++
tools/testing/selftests/cgroup/test_kill.c | 2 +-
.../pid_namespace/regression_enomem.c | 2 +-
tools/testing/selftests/pidfd/Makefile | 3 +-
tools/testing/selftests/pidfd/pidfd.h | 28 +---
.../selftests/pidfd/pidfd_getfd_test.c | 141 ++++++++++++++++++
tools/testing/selftests/pidfd/pidfd_helpers.h | 39 +++++
.../selftests/pidfd/pidfd_setns_test.c | 11 ++
tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++--
15 files changed, 375 insertions(+), 88 deletions(-)
create mode 100644 tools/include/linux/pidfd.h
create mode 100644 tools/testing/selftests/pidfd/pidfd_helpers.h
--
2.47.0
Use a less populated IP range to run the tests, as suggested by Petr in
Link: https://lore.kernel.org/netdev/87ikvukv3s.fsf@nvidia.com/.
Suggested-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
tools/testing/selftests/drivers/net/netcons_basic.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/netcons_basic.sh b/tools/testing/selftests/drivers/net/netcons_basic.sh
index 06021b2059b7..4ad1e216c6b0 100755
--- a/tools/testing/selftests/drivers/net/netcons_basic.sh
+++ b/tools/testing/selftests/drivers/net/netcons_basic.sh
@@ -20,9 +20,9 @@ SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
# Simple script to test dynamic targets in netconsole
SRCIF="" # to be populated later
-SRCIP=192.168.1.1
+SRCIP=192.168.2.1
DSTIF="" # to be populated later
-DSTIP=192.168.1.2
+DSTIP=192.168.2.2
PORT="6666"
MSG="netconsole selftest"
--
2.43.5
Following the previous vIOMMU series, this adds another vDEVICE structure,
representing the association from an iommufd_device to an iommufd_viommu.
This gives the whole architecture a new "v" layer:
_______________________________________________________________________
| iommufd (with vIOMMU/vDEVICE) |
| _____________ _____________ |
| | | | | |
| |----------------| vIOMMU |<---| vDEVICE |<------| |
| | | | |_____________| | |
| | ______ | | _____________ ___|____ |
| | | | | | | | | | |
| | | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
| | |______| |_____________| |_____________| |________| |
|______|________|______________|__________________|_______________|_____|
| | | | |
______v_____ | ______v_____ ______v_____ ___v__
| struct | | PFN | (paging) | | (nested) | |struct|
|iommu_device| |------>|iommu_domain|<----|iommu_domain|<----|device|
|____________| storage|____________| |____________| |______|
This vDEVICE object is used to collect and store all vIOMMU-related device
information/attributes in a VM. As an initial series for vDEVICE, add only
the virt_id to the vDEVICE, which is a vIOMMU specific device ID in a VM:
e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vID of Intel VT-d to
a Context Table. This virt_id helps IOMMU drivers to link the vID to a pID
of the device against the physical IOMMU instance. This is essential for a
vIOMMU-based invalidation, where the request contains a device's vID for a
device cache flush, e.g. ATC invalidation.
Therefore, with this vDEVICE object, support a vIOMMU-based invalidation
by reusing IOMMUFD_CMD_HWPT_INVALIDATE for a vIOMMU object to flush caches
with the given driver data.
As for the implementation of the series, add driver support in ARM SMMUv3
for a real world use case.
This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v4
For testing, try this "with-rmr" branch:
https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p2-v4-with-rmr
Pairing QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_viommu_p2-v4
Changelog
v4
* Added missing brackets in switch-case
* Fixed the unreleased idev refcount issue
* Reworked the iommufd_vdevice_alloc allocator
* Dropped support for IOMMU_VIOMMU_TYPE_DEFAULT
* Added missing TEST_LENGTH and fail_nth coverages
* Added a verification to the driver-allocated vDEVICE object
* Added an iommufd_vdevice_abort for a missing mutex protection
* Added a u64 structure arm_vsmmu_invalidation_cmd for user command
conversion
v3
https://lore.kernel.org/all/cover.1728491532.git.nicolinc@nvidia.com/
* Added Jason's Reviewed-by
* Split this invalidation part out of the part-1 series
* Repurposed VDEV_ID ioctl to a wider vDEVICE structure and ioctl
* Reduced viommu_api functions by allowing drivers to access viommu
and vdevice structure directly
* Dropped vdevs_rwsem by using xa_lock instead
* Dropped arm_smmu_cache_invalidate_user
v2
https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
* Limited vdev_id to one per idev
* Added a rw_sem to protect the vdev_id list
* Reworked driver-level APIs with proper lockings
* Added a new viommu_api file for IOMMUFD_DRIVER config
* Dropped useless iommu_dev point from the viommu structure
* Added missing index numbers to new types in the uAPI header
* Dropped IOMMU_VIOMMU_INVALIDATE uAPI; Instead, reuse the HWPT one
* Reworked mock_viommu_cache_invalidate() using the new iommu helper
* Reordered details of set/unset_vdev_id handlers for proper lockings
v1
https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Jason Gunthorpe (2):
iommu: Add iommu_copy_struct_from_full_user_array helper
iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
Nicolin Chen (12):
iommufd/viommu: Introduce IOMMUFD_OBJ_VDEVICE and its related struct
iommufd/viommu: Add IOMMU_VDEVICE_ALLOC ioctl
iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
iommu/viommu: Add cache_invalidate to iommufd_viommu_ops
iommufd/hw_pagetable: Enforce cache invalidation op on vIOMMU-based
hwpt_nested
iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
iommufd/viommu: Add vdev_to_dev helper
iommufd/selftest: Add mock_viommu_cache_invalidate
iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
Documentation: userspace-api: iommufd: Update vDEVICE
iommu/arm-smmu-v3: Add arm_vsmmu_cache_invalidate
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 9 +-
drivers/iommu/iommufd/iommufd_private.h | 12 ++
drivers/iommu/iommufd/iommufd_test.h | 30 +++
include/linux/iommu.h | 49 ++++-
include/linux/iommufd.h | 50 +++++
include/uapi/linux/iommufd.h | 61 +++++-
tools/testing/selftests/iommu/iommufd_utils.h | 83 +++++++
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 162 +++++++++++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 32 ++-
drivers/iommu/iommufd/device.c | 11 +
drivers/iommu/iommufd/driver.c | 7 +
drivers/iommu/iommufd/hw_pagetable.c | 36 +++-
drivers/iommu/iommufd/main.c | 7 +
drivers/iommu/iommufd/selftest.c | 115 +++++++++-
drivers/iommu/iommufd/viommu.c | 108 ++++++++++
tools/testing/selftests/iommu/iommufd.c | 204 +++++++++++++++++-
.../selftests/iommu/iommufd_fail_nth.c | 4 +
Documentation/userspace-api/iommufd.rst | 41 +++-
18 files changed, 983 insertions(+), 38 deletions(-)
--
2.43.0
If you wish to utilise a pidfd interface to refer to the current process or
thread it is rather cumbersome, requiring something like:
int pidfd = pidfd_open(getpid(), 0 or PIDFD_THREAD);
...
close(pidfd);
Or the equivalent call opening /proc/self. It is more convenient to use a
sentinel value to indicate to an interface that accepts a pidfd that we
simply wish to refer to the current process or thread.
This series introduces sentinels for this purpose, which can be passed as
the pidfd in this instance rather than having to establish a dummy fd.
It is useful to refer to both the current thread from the userland's
perspective for which we use PIDFD_SELF, and the current process from the
userland's perspective, for which we use PIDFD_SELF_PROCESS.
There is unfortunately some confusion between the kernel and userland as to
what constitutes a process - a thread from the userland perspective is a
process in the kernel, and a userland process is a thread group from the
kernel perspective (more specifically, the thread group leader). We
therefore alias things thusly:
* PIDFD_SELF_THREAD aliased by PIDFD_SELF - use PIDTYPE_PID.
* PIDFD_SELF_THREAD_GROUP aliased by PIDFD_SELF_PROCESS - use PIDTYPE_TGID.
In all of the kernel code we refer to PIDFD_SELF_THREAD and
PIDFD_SELF_THREAD_GROUP. However we expect users to use PIDFD_SELF and
PIDFD_SELF_PROCESS.
This matters for cases where, for instance, a user unshare()'s FDs or does
thread-specific signal handling and where the user would be hugely confused
if the FDs referenced or signal processed referred to the thread group
leader rather than the individual thread.
We ensure that pidfd_send_signal() and pidfd_getfd() work correctly, and
assert as much in selftests. All other interfaces except setns() will work
implicitly with this new interface, however it doesn't make sense to test
waitid(P_PIDFD, ...) as waiting on ourselves is a blocking operation.
In the case of setns() we explicitly disallow use of PIDFD_SELF* as it
doesn't make sense to obtain the namespaces of our own process, and it
would require work to implement this functionality there that would be of
no use.
We also do not provide the ability to utilise PIDFD_SELF* in ordinary fd
operations such as open() or poll(), as this would require extensive work
and be of no real use.
v4:
* Avoid returning an fd in the __pidfd_get_pid() function as pointed out by
Christian, instead simply always pin the pid and maintain fd scope in the
helper alone.
* Add wrapper header file in tools/include/linux to allow for import of
UAPI pidfd.h header without encountering the collision between system
fcntl.h and linux/fcntl.h as discussed with Shuah and John.
* Fixup tests to import the UAPI pidfd.h header working around conflicts
between system fcntl.h and linux/fcntl.h which the UAPI pidfd.h imports,
as reported by Shuah.
* Use an int for pidfd_is_self_sentinel() to avoid any dependency on
stdbool.h in userland.
v3:
* Do not fput() an invalid fd as reported by kernel test bot.
* Fix unintended churn from moving variable declaration.
https://lore.kernel.org/linux-mm/cover.1729073310.git.lorenzo.stoakes@oracl…
v2:
* Fix tests as reported by Shuah.
* Correct RFC version lore link.
https://lore.kernel.org/linux-mm/cover.1728643714.git.lorenzo.stoakes@oracl…
Non-RFC v1:
* Removed RFC tag - there seems to be general consensus that this change is
a good idea, but perhaps some debate to be had on implementation. It
seems sensible then to move forward with the RFC flag removed.
* Introduced PIDFD_SELF_THREAD, PIDFD_SELF_THREAD_GROUP and their aliases
PIDFD_SELF and PIDFD_SELF_PROCESS respectively.
* Updated testing accordingly.
https://lore.kernel.org/linux-mm/cover.1728578231.git.lorenzo.stoakes@oracl…
RFC version:
https://lore.kernel.org/linux-mm/cover.1727644404.git.lorenzo.stoakes@oracl…
Lorenzo Stoakes (4):
pidfd: extend pidfd_get_pid() and de-duplicate pid lookup
pidfd: add PIDFD_SELF_* sentinels to refer to own thread/process
selftests: pidfd: add pidfd.h UAPI wrapper
selftests: pidfd: add tests for PIDFD_SELF_*
include/linux/pid.h | 34 ++++-
include/uapi/linux/pidfd.h | 15 ++
kernel/exit.c | 3 +-
kernel/nsproxy.c | 1 +
kernel/pid.c | 65 +++++---
kernel/signal.c | 29 +---
tools/include/linux/pidfd.h | 14 ++
tools/testing/selftests/pidfd/Makefile | 3 +-
tools/testing/selftests/pidfd/pidfd.h | 2 +
.../selftests/pidfd/pidfd_getfd_test.c | 141 ++++++++++++++++++
.../selftests/pidfd/pidfd_setns_test.c | 11 ++
tools/testing/selftests/pidfd/pidfd_test.c | 76 ++++++++--
12 files changed, 333 insertions(+), 61 deletions(-)
create mode 100644 tools/include/linux/pidfd.h
--
2.46.2
RISC-V defines three extensions for pointer masking[1]:
- Smmpm: configured in M-mode, affects M-mode
- Smnpm: configured in M-mode, affects the next lower mode (S or U-mode)
- Ssnpm: configured in S-mode, affects the next lower mode (VS, VU, or U-mode)
This series adds support for configuring Smnpm or Ssnpm (depending on
which privilege mode the kernel is running in) to allow pointer masking
in userspace (VU or U-mode), extending the PR_SET_TAGGED_ADDR_CTRL API
from arm64. Unlike arm64 TBI, userspace pointer masking is not enabled
by default on RISC-V. Additionally, the tag width (referred to as PMLEN)
is variable, so userspace needs to ask the kernel for a specific tag
width, which is interpreted as a lower bound on the number of tag bits.
This series also adds support for a tagged address ABI similar to arm64
and x86. Since accesses from the kernel to user memory use the kernel's
pointer masking configuration, not the user's, the kernel must untag
user pointers in software before dereferencing them. And since the tag
width is variable, as with LAM on x86, it must be kept the same across
all threads in a process so untagged_addr_remote() can work.
[1]: https://github.com/riscv/riscv-j-extension/raw/d70011dde6c2/zjpm-spec.pdf
---
This series depends on the per-thread envcfg series in riscv/for-next.
This series can be tested in QEMU by applying a patch set[2].
KASAN_SW_TAGS using pointer masking is an independent patch series[3].
[2]: https://lore.kernel.org/qemu-devel/20240511101053.1875596-1-me@deliversmonk…
[3]: https://lore.kernel.org/linux-riscv/20240814085618.968833-1-samuel.holland@…
Changes in v5:
- Update pointer masking spec version to 1.0 and state to ratified
- Document how PR_[SG]ET_TAGGED_ADDR_CTRL are used on RISC-V
- Document that the RISC-V tagged address ABI is the same as AArch64
- Rename "pm" selftests directory to "abi" to be more generic
- Fix -Wparentheses warnings
- Fix order of operations when writing via the tagged pointer
- Update pointer masking spec version to 1.0 in hwprobe documentation
Changes in v4:
- Switch IS_ENABLED back to #ifdef to fix riscv32 build
- Combine __untagged_addr() and __untagged_addr_remote()
Changes in v3:
- Note in the commit message that the ISA extension spec is frozen
- Rebase on riscv/for-next (ISA extension list conflicts)
- Remove RISCV_ISA_EXT_SxPM, which was not used anywhere
- Use shifts instead of large numbers in ENVCFG_PMM* macro definitions
- Rename CONFIG_RISCV_ISA_POINTER_MASKING to CONFIG_RISCV_ISA_SUPM,
since it only controls the userspace part of pointer masking
- Use IS_ENABLED instead of #ifdef when possible
- Use an enum for the supported PMLEN values
- Simplify the logic in set_tagged_addr_ctrl()
- Use IS_ENABLED instead of #ifdef when possible
- Implement mm_untag_mask()
- Remove pmlen from struct thread_info (now only in mm_context_t)
Changes in v2:
- Drop patch 4 ("riscv: Define is_compat_thread()"), as an equivalent
patch was already applied
- Move patch 5 ("riscv: Split per-CPU and per-thread envcfg bits") to a
different series[3]
- Update pointer masking specification version reference
- Provide macros for the extension affecting the kernel and userspace
- Use the correct name for the hstatus.HUPMM field
- Rebase on riscv/linux.git for-next
- Add and use the envcfg_update_bits() helper function
- Inline flush_tagged_addr_state()
- Implement untagged_addr_remote()
- Restrict PMLEN changes once a process is multithreaded
- Rename "tags" directory to "pm" to avoid .gitignore rules
- Add .gitignore file to ignore the compiled selftest binary
- Write to a pipe to force dereferencing the user pointer
- Handle SIGSEGV in the child process to reduce dmesg noise
- Export Supm via hwprobe
- Export Smnpm and Ssnpm to KVM guests
Samuel Holland (10):
dt-bindings: riscv: Add pointer masking ISA extensions
riscv: Add ISA extension parsing for pointer masking
riscv: Add CSR definitions for pointer masking
riscv: Add support for userspace pointer masking
riscv: Add support for the tagged address ABI
riscv: Allow ptrace control of the tagged address ABI
riscv: selftests: Add a pointer masking test
riscv: hwprobe: Export the Supm ISA extension
RISC-V: KVM: Allow Smnpm and Ssnpm extensions for guests
KVM: riscv: selftests: Add Smnpm and Ssnpm to get-reg-list test
Documentation/arch/riscv/hwprobe.rst | 3 +
Documentation/arch/riscv/uabi.rst | 16 +
.../devicetree/bindings/riscv/extensions.yaml | 18 +
arch/riscv/Kconfig | 11 +
arch/riscv/include/asm/csr.h | 16 +
arch/riscv/include/asm/hwcap.h | 5 +
arch/riscv/include/asm/mmu.h | 7 +
arch/riscv/include/asm/mmu_context.h | 13 +
arch/riscv/include/asm/processor.h | 8 +
arch/riscv/include/asm/switch_to.h | 11 +
arch/riscv/include/asm/uaccess.h | 43 ++-
arch/riscv/include/uapi/asm/hwprobe.h | 1 +
arch/riscv/include/uapi/asm/kvm.h | 2 +
arch/riscv/kernel/cpufeature.c | 3 +
arch/riscv/kernel/process.c | 154 ++++++++
arch/riscv/kernel/ptrace.c | 42 +++
arch/riscv/kernel/sys_hwprobe.c | 3 +
arch/riscv/kvm/vcpu_onereg.c | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 5 +-
.../selftests/kvm/riscv/get-reg-list.c | 8 +
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/abi/.gitignore | 1 +
tools/testing/selftests/riscv/abi/Makefile | 10 +
.../selftests/riscv/abi/pointer_masking.c | 332 ++++++++++++++++++
25 files changed, 712 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/riscv/abi/.gitignore
create mode 100644 tools/testing/selftests/riscv/abi/Makefile
create mode 100644 tools/testing/selftests/riscv/abi/pointer_masking.c
--
2.45.1
This series was motivated by the regression fixed by 166bf8af9122
("pinctrl: mediatek: common-v2: Fix broken bias-disable for
PULL_PU_PD_RSEL_TYPE"). A bug was introduced in the pinctrl_paris driver
which prevented certain pins from having their bias configured.
Running this test on the mt8195-tomato platform with the test plan
included below[1] shows the test passing with the fix applied, but failing
without the fix:
With fix:
$ ./gpio-setget-config.py
TAP version 13
# Using test plan file: ./google,tomato.yaml
1..3
ok 1 pinctrl_paris.34.pull-up
ok 2 pinctrl_paris.34.pull-down
ok 3 pinctrl_paris.34.disabled
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Without fix:
$ ./gpio-setget-config.py
TAP version 13
# Using test plan file: ./google,tomato.yaml
1..3
# Bias doesn't match: Expected pull-up, read pull-down.
not ok 1 pinctrl_paris.34.pull-up
ok 2 pinctrl_paris.34.pull-down
# Bias doesn't match: Expected disabled, read pull-down.
not ok 3 pinctrl_paris.34.disabled
# Totals: pass:1 fail:2 xfail:0 xpass:0 skip:0 error:0
In order to achieve this, the first patch exposes bias configuration
through the GPIO API in the pinctrl_paris driver, patch 2 extends the
gpio-mockup-cdev utility for use by patch 3, and patch 3 introduces a
new GPIO kselftest that takes a test plan in YAML, which can be tailored
per-platform to specify the configurations to test, and sets and gets
back each pin configuration to verify that they match and thus that the
driver is behaving as expected.
Since the GPIO uAPI only allows setting the pin configuration, getting
it back is done through pinconf-pins in the pinctrl debugfs folder.
The test currently only verifies bias but it would be easy to extend to
verify other pin configurations.
The test plan YAML file can be customized for each use-case and is
platform-dependent. For that reason, only an example is included in
patch 3 and the user is supposed to provide their test plan. That said,
the aim is to collect test plans for ease of use at [2].
[1] This is the test plan used for mt8195-tomato:
- label: "pinctrl_paris"
tests:
# Pin 34 has type MTK_PULL_PU_PD_RSEL_TYPE and is unused.
# Setting bias to MTK_PULL_PU_PD_RSEL_TYPE pins was fixed by
# 166bf8af9122 ("pinctrl: mediatek: common-v2: Fix broken bias-disable for PULL_PU_PD_RSEL_TYPE")
- pin: 34
bias: "pull-up"
- pin: 34
bias: "pull-down"
- pin: 34
bias: "disabled"
[2] https://github.com/kernelci/platform-test-parameters
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Nícolas F. R. A. Prado (3):
pinctrl: mediatek: paris: Expose more configurations to GPIO set_config
selftest: gpio: Add wait flag to gpio-mockup-cdev
selftest: gpio: Add a new set-get config test
drivers/pinctrl/mediatek/pinctrl-paris.c | 20 +--
tools/testing/selftests/gpio/Makefile | 2 +-
tools/testing/selftests/gpio/gpio-mockup-cdev.c | 14 +-
.../gpio-set-get-config-example-test-plan.yaml | 15 ++
.../testing/selftests/gpio/gpio-set-get-config.py | 183 +++++++++++++++++++++
5 files changed, 220 insertions(+), 14 deletions(-)
---
base-commit: 6a7917c89f219f09b1d88d09f376000914a52763
change-id: 20240906-kselftest-gpio-set-get-config-6e5bb670c1a5
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
For logging to be useful, something has to set RET and retmsg by calling
ret_set_ksft_status(). There is a suite of functions to that end in
forwarding/lib: check_err, check_fail et al. Move them to net/lib.sh so
that every net test can use them.
Existing lib.sh users might be using these same names for their functions.
However, lib.sh is always sourced near the top of the file (checked), so
any such definitions will simply override the ones provided by lib.sh.
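As a rough illustration of how a test might drive these helpers, the
sketch below copies check_err(), check_fail() and xfail() from the diff and
pairs them with a simplified stand-in for ret_set_ksft_status(). The
stand-in, the ksft_fail/ksft_xfail values and the *_after_* variable names
are assumptions made for this example only, not the library's actual
implementation.

```bash
#!/bin/bash
# Assumed status values and a simplified stand-in, for this sketch only.
ksft_fail=1
ksft_xfail=2
RET=0
retmsg=""

# Stand-in: record the first non-zero status and its message.
ret_set_ksft_status()
{
	local status=$1
	local msg=$2

	if ((RET == 0)); then
		RET=$status
		retmsg=$msg
	fi
}

# Helpers as they appear in the diff.
FAIL_TO_XFAIL=

check_err()
{
	local err=$1
	local msg=$2

	if ((err)); then
		if [[ $FAIL_TO_XFAIL = yes ]]; then
			ret_set_ksft_status $ksft_xfail "$msg"
		else
			ret_set_ksft_status $ksft_fail "$msg"
		fi
	fi
}

check_fail()
{
	local err=$1
	local msg=$2

	check_err $((!err)) "$msg"
}

xfail()
{
	FAIL_TO_XFAIL=yes "$@"
}

# A command that succeeds leaves RET untouched.
true
check_err $? "true failed"
ret_after_ok=$RET

# A command that is expected to fail, and does, is also fine.
false
check_fail $? "false unexpectedly succeeded"
ret_after_expected_fail=$RET

# A genuine failure sets RET and retmsg.
false
check_err $? "false failed"
ret_after_fail=$RET
msg_after_fail=$retmsg

# Under xfail, the same kind of failure is recorded as an expected failure.
RET=0
retmsg=""
xfail check_err 1 "known-flaky step failed"
ret_after_xfail=$RET
```

The xfail wrapper works because bash makes an assignment placed before a
function call visible for the duration of that call only, so
FAIL_TO_XFAIL does not leak into later checks.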
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 73 -------------------
tools/testing/selftests/net/lib.sh | 73 +++++++++++++++++++
2 files changed, 73 insertions(+), 73 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index d28dbf27c1f0..8625e3c99f55 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -445,79 +445,6 @@ done
##############################################################################
# Helpers
-# Whether FAILs should be interpreted as XFAILs. Internal.
-FAIL_TO_XFAIL=
-
-check_err()
-{
- local err=$1
- local msg=$2
-
- if ((err)); then
- if [[ $FAIL_TO_XFAIL = yes ]]; then
- ret_set_ksft_status $ksft_xfail "$msg"
- else
- ret_set_ksft_status $ksft_fail "$msg"
- fi
- fi
-}
-
-check_fail()
-{
- local err=$1
- local msg=$2
-
- check_err $((!err)) "$msg"
-}
-
-check_err_fail()
-{
- local should_fail=$1; shift
- local err=$1; shift
- local what=$1; shift
-
- if ((should_fail)); then
- check_fail $err "$what succeeded, but should have failed"
- else
- check_err $err "$what failed"
- fi
-}
-
-xfail()
-{
- FAIL_TO_XFAIL=yes "$@"
-}
-
-xfail_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW = yes ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
-omit_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW != yes ]]; then
- "$@"
- fi
-}
-
-xfail_on_veth()
-{
- local dev=$1; shift
- local kind
-
- kind=$(ip -j -d link show dev $dev |
- jq -r '.[].linkinfo.info_kind')
- if [[ $kind = veth ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 4f52b8e48a3a..6bcf5d13879d 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -361,3 +361,76 @@ tests_run()
$current_test
done
}
+
+# Whether FAILs should be interpreted as XFAILs. Internal.
+FAIL_TO_XFAIL=
+
+check_err()
+{
+ local err=$1
+ local msg=$2
+
+ if ((err)); then
+ if [[ $FAIL_TO_XFAIL = yes ]]; then
+ ret_set_ksft_status $ksft_xfail "$msg"
+ else
+ ret_set_ksft_status $ksft_fail "$msg"
+ fi
+ fi
+}
+
+check_fail()
+{
+ local err=$1
+ local msg=$2
+
+ check_err $((!err)) "$msg"
+}
+
+check_err_fail()
+{
+ local should_fail=$1; shift
+ local err=$1; shift
+ local what=$1; shift
+
+ if ((should_fail)); then
+ check_fail $err "$what succeeded, but should have failed"
+ else
+ check_err $err "$what failed"
+ fi
+}
+
+xfail()
+{
+ FAIL_TO_XFAIL=yes "$@"
+}
+
+xfail_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW = yes ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
+
+omit_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW != yes ]]; then
+ "$@"
+ fi
+}
+
+xfail_on_veth()
+{
+ local dev=$1; shift
+ local kind
+
+ kind=$(ip -j -d link show dev $dev |
+ jq -r '.[].linkinfo.info_kind')
+ if [[ $kind = veth ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
--
2.45.0
It would be good to use the same mechanism for scheduling and dispatching
general net tests as the many forwarding tests already use. To that end,
move tests_run() to net/lib.sh so that every net test can use it.
Existing lib.sh users might be using the name themselves. However, lib.sh is
always sourced near the top of the file (checked), so any such definition
will simply override the one provided by lib.sh.
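A minimal sketch of the dispatch mechanism follows. tests_run() is copied
from the diff; in_defer_scope() is replaced by a pass-through stand-in,
and test_a, test_b and the ran* variables are hypothetical names used only
for this illustration.

```bash
#!/bin/bash
# Pass-through stand-in; the real helper in lib.sh wraps the call in a
# defer scope.
in_defer_scope()
{
	"$@"
}

# As in the diff.
tests_run()
{
	local current_test

	for current_test in ${TESTS:-$ALL_TESTS}; do
		in_defer_scope \
			$current_test
	done
}

# Hypothetical tests that just record that they ran.
ran=""
test_a() { ran="$ran a"; }
test_b() { ran="$ran b"; }

ALL_TESTS="test_a test_b"

# With TESTS unset, every test listed in ALL_TESTS runs, in order.
unset TESTS
tests_run
ran_all=$ran

# With TESTS set, only the selected tests run.
ran=""
TESTS="test_b"
tests_run
ran_selected=$ran
```

The ${TESTS:-$ALL_TESTS} expansion is what lets a user narrow a run from
the environment without editing the test script.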
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 10 ----------
tools/testing/selftests/net/lib.sh | 10 ++++++++++
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 41dd14c42c48..d28dbf27c1f0 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1285,16 +1285,6 @@ matchall_sink_create()
action drop
}
-tests_run()
-{
- local current_test
-
- for current_test in ${TESTS:-$ALL_TESTS}; do
- in_defer_scope \
- $current_test
- done
-}
-
cleanup()
{
pre_cleanup
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 691318b1ec55..4f52b8e48a3a 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -351,3 +351,13 @@ log_info()
echo "INFO: $msg"
}
+
+tests_run()
+{
+ local current_test
+
+ for current_test in ${TESTS:-$ALL_TESTS}; do
+ in_defer_scope \
+ $current_test
+ done
+}
--
2.45.0
Many net selftests invent their own logging helpers. These really should be
in a library sourced by these tests. Currently forwarding/lib.sh has a
suite of perfectly fine logging helpers, but sourcing a forwarding/ library
from a higher-level directory smells of layering violation. In this patch,
move the logging helpers to net/lib.sh so that every net test can use them.
Together with the logging helpers, it's also necessary to move
pause_on_fail(), and EXIT_STATUS and RET.
Existing lib.sh users might be using these same names for their functions
or variables. However lib.sh is always sourced near the top of the
file (checked), and whatever new definitions will simply override the ones
provided by lib.sh.
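The override behaviour relied on above is ordinary shell semantics: the
last definition of a function wins. A minimal sketch, assuming a test
script that already carries its own log_info(); the first definition
stands in for "source lib.sh" and the "mytest:" prefix is made up for the
example:

```shell
#!/bin/bash
# Stand-in for sourcing lib.sh: log_info() as it appears in this patch.
log_info()
{
	local msg=$1

	echo "INFO: $msg"
}

# The test script's own helper, defined after the library was sourced.
# Hypothetical body; it replaces the definition above.
log_info()
{
	local msg=$1

	echo "mytest: $msg"
}

log_info "starting"	# prints "mytest: starting"
```

The same holds for variables such as RET and EXIT_STATUS: an assignment
made by the test script after sourcing the library replaces the library's
initial value.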
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
Acked-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
Notes:
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 113 -----------------
tools/testing/selftests/net/lib.sh | 115 ++++++++++++++++++
2 files changed, 115 insertions(+), 113 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 89c25f72b10c..41dd14c42c48 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -48,7 +48,6 @@ declare -A NETIFS=(
: "${WAIT_TIME:=5}"
# Whether to pause on, respectively, after a failure and before cleanup.
-: "${PAUSE_ON_FAIL:=no}"
: "${PAUSE_ON_CLEANUP:=no}"
# Whether to create virtual interfaces, and what netdevice type they should be.
@@ -446,22 +445,6 @@ done
##############################################################################
# Helpers
-# Exit status to return at the end. Set in case one of the tests fails.
-EXIT_STATUS=0
-# Per-test return value. Clear at the beginning of each test.
-RET=0
-
-ret_set_ksft_status()
-{
- local ksft_status=$1; shift
- local msg=$1; shift
-
- RET=$(ksft_status_merge $RET $ksft_status)
- if (( $? )); then
- retmsg=$msg
- fi
-}
-
# Whether FAILs should be interpreted as XFAILs. Internal.
FAIL_TO_XFAIL=
@@ -535,102 +518,6 @@ xfail_on_veth()
fi
}
-log_test_result()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
- local result=$1; shift
- local retmsg=$1; shift
-
- printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
- if [[ $retmsg ]]; then
- printf "\t%s\n" "$retmsg"
- fi
-}
-
-pause_on_fail()
-{
- if [[ $PAUSE_ON_FAIL == yes ]]; then
- echo "Hit enter to continue, 'q' to quit"
- read a
- [[ $a == q ]] && exit 1
- fi
-}
-
-handle_test_result_pass()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" " OK "
-}
-
-handle_test_result_fail()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" FAIL "$retmsg"
- pause_on_fail
-}
-
-handle_test_result_xfail()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" XFAIL "$retmsg"
- pause_on_fail
-}
-
-handle_test_result_skip()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" SKIP "$retmsg"
-}
-
-log_test()
-{
- local test_name=$1
- local opt_str=$2
-
- if [[ $# -eq 2 ]]; then
- opt_str="($opt_str)"
- fi
-
- if ((RET == ksft_pass)); then
- handle_test_result_pass "$test_name" "$opt_str"
- elif ((RET == ksft_xfail)); then
- handle_test_result_xfail "$test_name" "$opt_str"
- elif ((RET == ksft_skip)); then
- handle_test_result_skip "$test_name" "$opt_str"
- else
- handle_test_result_fail "$test_name" "$opt_str"
- fi
-
- EXIT_STATUS=$(ksft_exit_status_merge $EXIT_STATUS $RET)
- return $RET
-}
-
-log_test_skip()
-{
- RET=$ksft_skip retmsg= log_test "$@"
-}
-
-log_test_xfail()
-{
- RET=$ksft_xfail retmsg= log_test "$@"
-}
-
-log_info()
-{
- local msg=$1
-
- echo "INFO: $msg"
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index c8991cc6bf28..691318b1ec55 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -9,6 +9,9 @@ source "$net_dir/lib/sh/defer.sh"
: "${WAIT_TIMEOUT:=20}"
+# Whether to pause on after a failure.
+: "${PAUSE_ON_FAIL:=no}"
+
BUSYWAIT_TIMEOUT=$((WAIT_TIMEOUT * 1000)) # ms
# Kselftest framework constants.
@@ -20,6 +23,11 @@ ksft_skip=4
# namespace list created by setup_ns
NS_LIST=()
+# Exit status to return at the end. Set in case one of the tests fails.
+EXIT_STATUS=0
+# Per-test return value. Clear at the beginning of each test.
+RET=0
+
##############################################################################
# Helpers
@@ -236,3 +244,110 @@ tc_rule_handle_stats_get()
| jq ".[] | select(.options.handle == $handle) | \
.options.actions[0].stats$selector"
}
+
+ret_set_ksft_status()
+{
+ local ksft_status=$1; shift
+ local msg=$1; shift
+
+ RET=$(ksft_status_merge $RET $ksft_status)
+ if (( $? )); then
+ retmsg=$msg
+ fi
+}
+
+log_test_result()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+ local result=$1; shift
+ local retmsg=$1; shift
+
+ printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
+ if [[ $retmsg ]]; then
+ printf "\t%s\n" "$retmsg"
+ fi
+}
+
+pause_on_fail()
+{
+ if [[ $PAUSE_ON_FAIL == yes ]]; then
+ echo "Hit enter to continue, 'q' to quit"
+ read a
+ [[ $a == q ]] && exit 1
+ fi
+}
+
+handle_test_result_pass()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" " OK "
+}
+
+handle_test_result_fail()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" FAIL "$retmsg"
+ pause_on_fail
+}
+
+handle_test_result_xfail()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" XFAIL "$retmsg"
+ pause_on_fail
+}
+
+handle_test_result_skip()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" SKIP "$retmsg"
+}
+
+log_test()
+{
+ local test_name=$1
+ local opt_str=$2
+
+ if [[ $# -eq 2 ]]; then
+ opt_str="($opt_str)"
+ fi
+
+ if ((RET == ksft_pass)); then
+ handle_test_result_pass "$test_name" "$opt_str"
+ elif ((RET == ksft_xfail)); then
+ handle_test_result_xfail "$test_name" "$opt_str"
+ elif ((RET == ksft_skip)); then
+ handle_test_result_skip "$test_name" "$opt_str"
+ else
+ handle_test_result_fail "$test_name" "$opt_str"
+ fi
+
+ EXIT_STATUS=$(ksft_exit_status_merge $EXIT_STATUS $RET)
+ return $RET
+}
+
+log_test_skip()
+{
+ RET=$ksft_skip retmsg= log_test "$@"
+}
+
+log_test_xfail()
+{
+ RET=$ksft_xfail retmsg= log_test "$@"
+}
+
+log_info()
+{
+ local msg=$1
+
+ echo "INFO: $msg"
+}
--
2.45.0
Some applications rely on placing data in the free bits of addresses
allocated by mmap. Various architectures (e.g. x86, arm64, powerpc) restrict
the address returned by mmap to be less than the 48-bit address space,
unless the hint address uses more than 47 bits (the 48th bit is reserved
for the kernel address space).
The riscv architecture needs a way to similarly restrict the virtual
address space. The riscv port of OpenJDK throws an error if it is run
on the 57-bit address space, called sv57 [1]. golang
has a comment that sv57 support is not complete, but there are some
workarounds to get it to mostly work [2].
These applications work on x86 because x86 implicitly restricts the
addresses returned by mmap() to 47 bits when the hint address is less
than 48 bits.
Instead of implicitly restricting the address space on riscv (or any
current/future architecture), a flag would allow users to opt-in to this
behavior rather than opt-out as is done on other architectures. This is
desirable because it is a small class of applications that do pointer
masking.
This flag will also allow seamless compatibility between all
architectures, so applications like Go and OpenJDK that use bits in a
virtual address can request the exact number of bits they need in a
generic way. The flag can be checked inside of vm_unmapped_area() so
that this flag does not have to be handled individually by each
architecture.
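To make the pointer-masking problem concrete, here is a small arithmetic
sketch. It is not taken from OpenJDK or Go; the layout (a tag stored from
bit 48 upwards) and the helper names are assumptions chosen for the
example. It shows that a tag round-trips cleanly when the address fits
below bit 48, and that an address using higher bits, as sv57 can return,
is corrupted.

```shell
#!/bin/bash
# Assumption: the application stores its tag from bit 48 upwards.
TAG_SHIFT=48
ADDR_MASK=$(( (1 << TAG_SHIFT) - 1 ))

# Combine an address and a tag into one 64-bit value.
tag_ptr()
{
	local addr=$1 tag=$2

	printf '0x%x\n' $(( (tag << TAG_SHIFT) | addr ))
}

# Strip the tag again, keeping only the low 48 bits.
untag_ptr()
{
	local tagged=$1

	printf '0x%x\n' $(( tagged & ADDR_MASK ))
}

# An address that fits in 47 bits survives the round trip.
low=0x7f1234567000
untag_ptr "$(tag_ptr $low 0x2a)"	# prints 0x7f1234567000

# An address with bit 55 set overlaps the tag bits and is lost.
high=$(( (1 << 55) | 0x1000 ))
untag_ptr "$(tag_ptr $high 0x2a)"	# prints 0x1000
```

An application in this position needs mmap() to stay below a bound it
chooses, which is the kind of request the proposed flag is meant to
express.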
Link: https://github.com/openjdk/jdk/blob/f080b4bb8a75284db1b6037f8c00ef3b1ef1add… [1]
Link: https://github.com/golang/go/blob/9e8ea567c838574a0f14538c0bbbd83c3215aa55/… [2]
To: Arnd Bergmann <arnd(a)arndb.de>
To: Richard Henderson <richard.henderson(a)linaro.org>
To: Ivan Kokshaysky <ink(a)jurassic.park.msu.ru>
To: Matt Turner <mattst88(a)gmail.com>
To: Vineet Gupta <vgupta(a)kernel.org>
To: Russell King <linux(a)armlinux.org.uk>
To: Guo Ren <guoren(a)kernel.org>
To: Huacai Chen <chenhuacai(a)kernel.org>
To: WANG Xuerui <kernel(a)xen0n.name>
To: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
To: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
To: Helge Deller <deller(a)gmx.de>
To: Michael Ellerman <mpe(a)ellerman.id.au>
To: Nicholas Piggin <npiggin(a)gmail.com>
To: Christophe Leroy <christophe.leroy(a)csgroup.eu>
To: Naveen N Rao <naveen(a)kernel.org>
To: Alexander Gordeev <agordeev(a)linux.ibm.com>
To: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
To: Heiko Carstens <hca(a)linux.ibm.com>
To: Vasily Gorbik <gor(a)linux.ibm.com>
To: Christian Borntraeger <borntraeger(a)linux.ibm.com>
To: Sven Schnelle <svens(a)linux.ibm.com>
To: Yoshinori Sato <ysato(a)users.sourceforge.jp>
To: Rich Felker <dalias(a)libc.org>
To: John Paul Adrian Glaubitz <glaubitz(a)physik.fu-berlin.de>
To: David S. Miller <davem(a)davemloft.net>
To: Andreas Larsson <andreas(a)gaisler.com>
To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andy Lutomirski <luto(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Muchun Song <muchun.song(a)linux.dev>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Shuah Khan <shuah(a)kernel.org>
Cc: linux-arch(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-alpha(a)vger.kernel.org
Cc: linux-snps-arc(a)lists.infradead.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-csky(a)vger.kernel.org
Cc: loongarch(a)lists.linux.dev
Cc: linux-mips(a)vger.kernel.org
Cc: linux-parisc(a)vger.kernel.org
Cc: linuxppc-dev(a)lists.ozlabs.org
Cc: linux-s390(a)vger.kernel.org
Cc: linux-sh(a)vger.kernel.org
Cc: sparclinux(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
Changes in v2:
- Added much greater detail to cover letter
- Removed all code that touched architecture specific code and was able
to factor this out into all generic functions, except for flags that
needed to be added to vm_unmapped_area_info
- Made this an RFC since I have only tested it on riscv and x86
- Link to v1: https://lore.kernel.org/r/20240827-patches-below_hint_mmap-v1-0-46ff2eb9022…
---
Charlie Jenkins (4):
mm: Add MAP_BELOW_HINT
mm: Add hint and mmap_flags to struct vm_unmapped_area_info
mm: Support MAP_BELOW_HINT in vm_unmapped_area()
selftests/mm: Create MAP_BELOW_HINT test
arch/alpha/kernel/osf_sys.c | 2 ++
arch/arc/mm/mmap.c | 3 +++
arch/arm/mm/mmap.c | 7 ++++++
arch/csky/abiv1/mmap.c | 3 +++
arch/loongarch/mm/mmap.c | 3 +++
arch/mips/mm/mmap.c | 3 +++
arch/parisc/kernel/sys_parisc.c | 3 +++
arch/powerpc/mm/book3s64/slice.c | 7 ++++++
arch/s390/mm/hugetlbpage.c | 4 ++++
arch/s390/mm/mmap.c | 6 ++++++
arch/sh/mm/mmap.c | 6 ++++++
arch/sparc/kernel/sys_sparc_32.c | 3 +++
arch/sparc/kernel/sys_sparc_64.c | 6 ++++++
arch/sparc/mm/hugetlbpage.c | 4 ++++
arch/x86/kernel/sys_x86_64.c | 6 ++++++
arch/x86/mm/hugetlbpage.c | 4 ++++
fs/hugetlbfs/inode.c | 4 ++++
include/linux/mm.h | 2 ++
include/uapi/asm-generic/mman-common.h | 1 +
mm/mmap.c | 9 ++++++++
tools/include/uapi/asm-generic/mman-common.h | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/map_below_hint.c | 32 ++++++++++++++++++++++++++++
23 files changed, 120 insertions(+)
---
base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
--
- Charlie
Many net selftests invent their own logging helpers. These really should be
in a library sourced by these tests. Currently forwarding/lib.sh has a
suite of perfectly fine logging helpers, but sourcing a forwarding/ library
from a higher-level directory smells of layering violation. In this patch,
move the logging helpers to net/lib.sh so that every net test can use them.
Together with the logging helpers, it's also necessary to move
pause_on_fail(), and EXIT_STATUS and RET.
Existing lib.sh users might be using these same names for their functions
or variables. However lib.sh is always sourced near the top of the
file (checked), and whatever new definitions will simply override the ones
provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
---
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 113 -----------------
tools/testing/selftests/net/lib.sh | 115 ++++++++++++++++++
2 files changed, 115 insertions(+), 113 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 89c25f72b10c..41dd14c42c48 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -48,7 +48,6 @@ declare -A NETIFS=(
: "${WAIT_TIME:=5}"
# Whether to pause on, respectively, after a failure and before cleanup.
-: "${PAUSE_ON_FAIL:=no}"
: "${PAUSE_ON_CLEANUP:=no}"
# Whether to create virtual interfaces, and what netdevice type they should be.
@@ -446,22 +445,6 @@ done
##############################################################################
# Helpers
-# Exit status to return at the end. Set in case one of the tests fails.
-EXIT_STATUS=0
-# Per-test return value. Clear at the beginning of each test.
-RET=0
-
-ret_set_ksft_status()
-{
- local ksft_status=$1; shift
- local msg=$1; shift
-
- RET=$(ksft_status_merge $RET $ksft_status)
- if (( $? )); then
- retmsg=$msg
- fi
-}
-
# Whether FAILs should be interpreted as XFAILs. Internal.
FAIL_TO_XFAIL=
@@ -535,102 +518,6 @@ xfail_on_veth()
fi
}
-log_test_result()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
- local result=$1; shift
- local retmsg=$1; shift
-
- printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
- if [[ $retmsg ]]; then
- printf "\t%s\n" "$retmsg"
- fi
-}
-
-pause_on_fail()
-{
- if [[ $PAUSE_ON_FAIL == yes ]]; then
- echo "Hit enter to continue, 'q' to quit"
- read a
- [[ $a == q ]] && exit 1
- fi
-}
-
-handle_test_result_pass()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" " OK "
-}
-
-handle_test_result_fail()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" FAIL "$retmsg"
- pause_on_fail
-}
-
-handle_test_result_xfail()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" XFAIL "$retmsg"
- pause_on_fail
-}
-
-handle_test_result_skip()
-{
- local test_name=$1; shift
- local opt_str=$1; shift
-
- log_test_result "$test_name" "$opt_str" SKIP "$retmsg"
-}
-
-log_test()
-{
- local test_name=$1
- local opt_str=$2
-
- if [[ $# -eq 2 ]]; then
- opt_str="($opt_str)"
- fi
-
- if ((RET == ksft_pass)); then
- handle_test_result_pass "$test_name" "$opt_str"
- elif ((RET == ksft_xfail)); then
- handle_test_result_xfail "$test_name" "$opt_str"
- elif ((RET == ksft_skip)); then
- handle_test_result_skip "$test_name" "$opt_str"
- else
- handle_test_result_fail "$test_name" "$opt_str"
- fi
-
- EXIT_STATUS=$(ksft_exit_status_merge $EXIT_STATUS $RET)
- return $RET
-}
-
-log_test_skip()
-{
- RET=$ksft_skip retmsg= log_test "$@"
-}
-
-log_test_xfail()
-{
- RET=$ksft_xfail retmsg= log_test "$@"
-}
-
-log_info()
-{
- local msg=$1
-
- echo "INFO: $msg"
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index c8991cc6bf28..691318b1ec55 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -9,6 +9,9 @@ source "$net_dir/lib/sh/defer.sh"
: "${WAIT_TIMEOUT:=20}"
+# Whether to pause on after a failure.
+: "${PAUSE_ON_FAIL:=no}"
+
BUSYWAIT_TIMEOUT=$((WAIT_TIMEOUT * 1000)) # ms
# Kselftest framework constants.
@@ -20,6 +23,11 @@ ksft_skip=4
# namespace list created by setup_ns
NS_LIST=()
+# Exit status to return at the end. Set in case one of the tests fails.
+EXIT_STATUS=0
+# Per-test return value. Clear at the beginning of each test.
+RET=0
+
##############################################################################
# Helpers
@@ -236,3 +244,110 @@ tc_rule_handle_stats_get()
| jq ".[] | select(.options.handle == $handle) | \
.options.actions[0].stats$selector"
}
+
+ret_set_ksft_status()
+{
+ local ksft_status=$1; shift
+ local msg=$1; shift
+
+ RET=$(ksft_status_merge $RET $ksft_status)
+ if (( $? )); then
+ retmsg=$msg
+ fi
+}
+
+log_test_result()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+ local result=$1; shift
+ local retmsg=$1; shift
+
+ printf "TEST: %-60s [%s]\n" "$test_name $opt_str" "$result"
+ if [[ $retmsg ]]; then
+ printf "\t%s\n" "$retmsg"
+ fi
+}
+
+pause_on_fail()
+{
+ if [[ $PAUSE_ON_FAIL == yes ]]; then
+ echo "Hit enter to continue, 'q' to quit"
+ read a
+ [[ $a == q ]] && exit 1
+ fi
+}
+
+handle_test_result_pass()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" " OK "
+}
+
+handle_test_result_fail()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" FAIL "$retmsg"
+ pause_on_fail
+}
+
+handle_test_result_xfail()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" XFAIL "$retmsg"
+ pause_on_fail
+}
+
+handle_test_result_skip()
+{
+ local test_name=$1; shift
+ local opt_str=$1; shift
+
+ log_test_result "$test_name" "$opt_str" SKIP "$retmsg"
+}
+
+log_test()
+{
+ local test_name=$1
+ local opt_str=$2
+
+ if [[ $# -eq 2 ]]; then
+ opt_str="($opt_str)"
+ fi
+
+ if ((RET == ksft_pass)); then
+ handle_test_result_pass "$test_name" "$opt_str"
+ elif ((RET == ksft_xfail)); then
+ handle_test_result_xfail "$test_name" "$opt_str"
+ elif ((RET == ksft_skip)); then
+ handle_test_result_skip "$test_name" "$opt_str"
+ else
+ handle_test_result_fail "$test_name" "$opt_str"
+ fi
+
+ EXIT_STATUS=$(ksft_exit_status_merge $EXIT_STATUS $RET)
+ return $RET
+}
+
+log_test_skip()
+{
+ RET=$ksft_skip retmsg= log_test "$@"
+}
+
+log_test_xfail()
+{
+ RET=$ksft_xfail retmsg= log_test "$@"
+}
+
+log_info()
+{
+ local msg=$1
+
+ echo "INFO: $msg"
+}
--
2.45.0
This patch series adds basic tests for NIC drivers. The tests
check auto-negotiation, speed, duplex state and throughput
between the local NIC and its link partner, using tools such
as ethtool and iperf3.
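As a rough illustration of the kind of check involved (the series itself
implements its tests in Python, so this is only a shell sketch), the link
settings can be read out of ethtool-style "Key: value" output. The
link_field() helper and the sample text are hypothetical; the sample is
merely shaped like the settings that "ethtool <dev>" reports.

```shell
#!/bin/bash
# Print the value of one "Key: value" field read from stdin.
# Hypothetical helper, not part of the series.
link_field()
{
	local key=$1

	sed -n "s/^[[:space:]]*$key:[[:space:]]*//p"
}

# Made-up sample; on real hardware this would come from "ethtool <dev>".
sample=$(printf '\tSpeed: 1000Mb/s\n\tDuplex: Full\n\tAuto-negotiation: on\n')

echo "$sample" | link_field Speed		# prints 1000Mb/s
echo "$sample" | link_field Duplex		# prints Full
echo "$sample" | link_field Auto-negotiation	# prints on
```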
Signed-off-by: Mohan Prasad J <mohan.prasad(a)microchip.com>
---
Changes in v3:
- LinkConfig class is included in the hw library. This contains
generic APIs for doing link layer operations.
- Auto-negotiation checks involve changing the auto-neg state
both in local and partner NIC.
- Link layer test and performance test are separated to
different selftest files.
- Resetting of NIC driver done after test completion.
Changes in v2:
- Changed the hardcoded implementation of speed, duplex states,
throughput to generic values, in order to support all types
of NIC drivers.
- Test executes based on the supported link modes between local
NIC driver and partner.
- Instead of lan743x directory, selftest file is now relocated
to /selftests/drivers/net/hw.
---
Mohan Prasad J (3):
selftests: nic_link_layer: Add link layer selftest for NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex
states
selftests: nic_performance: Add selftest for performance of NIC driver
.../testing/selftests/drivers/net/hw/Makefile | 2 +
.../drivers/net/hw/lib/py/__init__.py | 1 +
.../drivers/net/hw/lib/py/linkconfig.py | 220 ++++++++++++++++++
.../drivers/net/hw/nic_link_layer.py | 105 +++++++++
.../drivers/net/hw/nic_performance.py | 121 ++++++++++
5 files changed, 449 insertions(+)
create mode 100644 tools/testing/selftests/drivers/net/hw/lib/py/linkconfig.py
create mode 100644 tools/testing/selftests/drivers/net/hw/nic_link_layer.py
create mode 100644 tools/testing/selftests/drivers/net/hw/nic_performance.py
--
2.43.0
This test verifies the correct behavior of the fork() system call,
which creates a child process by duplicating the parent process.
The test checks the following:
- The child PID returned by fork() is present in /proc.
- The child PID is different from the parent PID.
- The memory allocated to a variable in the child process is independent
of the parent process.
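The three checks can be tried out quickly from a shell before reading the
C code. The following is only an analogue, not the test itself: it relies
on bash forking a child for a backgrounded subshell, assumes a Linux /proc,
and fork_checks() is a name made up for this sketch.

```shell
#!/bin/bash
# Shell analogue of the three checks listed above.
fork_checks()
{
	local var=17
	local child

	# The child changes its own copy of var, then lingers briefly so
	# that its /proc entry can still be observed by the parent.
	( var=1998; sleep 1 ) &
	child=$!

	[[ -d /proc/$child ]] && echo "ok 1"		# child PID is in /proc
	[[ $child -ne $BASHPID ]] && echo "ok 2"	# child PID != parent PID

	wait "$child"
	[[ $var -eq 17 ]] && echo "ok 3"		# parent's var is untouched
}

fork_checks
```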
Test logs:
- Run without root
TAP version 13
1..1
ok 1 # SKIP This test needs root to run!
- Run with root
TAP version 13
1..1
# Inside the parent process.
# Child PID got from fork() return : 56038
# Parent PID from getpid(): 56037
# Inside the child process.
1..2
ok 1 Child Pid from /proc and fork() matching
ok 2 Child Pid != Parent pid
1..3
ok 3 After modification in child No effect on the value of 'var' in parent
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Shivam Chaudhary <cvam0000(a)gmail.com>
---
Here is my proposal for a new directory, /syscalls, to add syscall selftests,
as there is currently no dedicated space for these tests. I encountered this
issue while writing the test case for the delete_module syscall and was unsure
where to place it. As a heads-up, the delete_module test is currently under
review, and I would like to add it to this directory.
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/syscalls/.gitignore | 1 +
.../syscalls/fork_syscall/.gitignore | 1 +
.../selftests/syscalls/fork_syscall/Makefile | 5 +
.../syscalls/fork_syscall/fork_syscall.c | 151 ++++++++++++++++++
5 files changed, 159 insertions(+)
create mode 100644 tools/testing/selftests/syscalls/.gitignore
create mode 100644 tools/testing/selftests/syscalls/fork_syscall/.gitignore
create mode 100644 tools/testing/selftests/syscalls/fork_syscall/Makefile
create mode 100644 tools/testing/selftests/syscalls/fork_syscall/fork_syscall.c
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 363d031a16f7..9265c17c5de3 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -97,6 +97,7 @@ TARGETS += sparc64
TARGETS += splice
TARGETS += static_keys
TARGETS += sync
+TARGETS += syscalls/fork_syscall
TARGETS += syscall_user_dispatch
TARGETS += sysctl
TARGETS += tc-testing
diff --git a/tools/testing/selftests/syscalls/.gitignore b/tools/testing/selftests/syscalls/.gitignore
new file mode 100644
index 000000000000..c7ae138d3f0c
--- /dev/null
+++ b/tools/testing/selftests/syscalls/.gitignore
@@ -0,0 +1 @@
+// SPDX-License-Identifier: GPL-2.0
\ No newline at end of file
diff --git a/tools/testing/selftests/syscalls/fork_syscall/.gitignore b/tools/testing/selftests/syscalls/fork_syscall/.gitignore
new file mode 100644
index 000000000000..788cc1ff70bd
--- /dev/null
+++ b/tools/testing/selftests/syscalls/fork_syscall/.gitignore
@@ -0,0 +1 @@
+# SPDX-License-Identifier: GPL-2.0-only
\ No newline at end of file
diff --git a/tools/testing/selftests/syscalls/fork_syscall/Makefile b/tools/testing/selftests/syscalls/fork_syscall/Makefile
new file mode 100644
index 000000000000..56033a3d5a87
--- /dev/null
+++ b/tools/testing/selftests/syscalls/fork_syscall/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+TEST_GEN_PROGS := fork_syscall
+CFLAGS += -Wall
+
+include ../lib.mk
\ No newline at end of file
diff --git a/tools/testing/selftests/syscalls/fork_syscall/fork_syscall.c b/tools/testing/selftests/syscalls/fork_syscall/fork_syscall.c
new file mode 100644
index 000000000000..eab22831f7e1
--- /dev/null
+++ b/tools/testing/selftests/syscalls/fork_syscall/fork_syscall.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/* kselftest for fork() system call
+ *
+ * Summary : fork() system call is used to create a new process
+ * by duplicating an existing one. The new process, known as the
+ * child process, is a copy of the parent process.
+ *
+ * Child process is a duplicate process but has a different PID and
+ * memory allocation.
+ *
+ * About the test : With this test we are testing the following:
+ * - Child PID which fork() returns to Parent is present in /proc
+ * - Child PID is not same as Parent PID.
+ * - Memory allocation to a variable in child and parent process
+ * is different.
+*/
+
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <dirent.h>
+#include <ctype.h>
+
+#include "../../kselftest.h"
+
+// Function to check if a string is numeric (PID check)
+int is_numeric(const char *str) {
+ while (*str) {
+ if (!isdigit(*str)) return 0;
+ str++;
+ }
+ return 1;
+}
+
+// Function to find the child PID in /proc
+pid_t find_child_pid(pid_t parent_pid) {
+ DIR *proc_dir = opendir("/proc");
+ struct dirent *entry;
+
+ if (proc_dir == NULL) {
+ perror("Failed to open /proc directory");
+ ksft_exit_fail();
+ return 1;
+ }
+
+ // Iterate through the /proc directory to find PIDs
+ while ((entry = readdir(proc_dir)) != NULL) {
+ // Check if the entry is a PID
+ if (is_numeric(entry->d_name)) {
+ pid_t pid = atoi(entry->d_name);
+
+ // Construct the path to /proc/<pid>/
+ //stat to check the parent PID
+
+ char path[40], buffer[100];
+ snprintf(path, 40, "/proc/%d/stat", pid);
+
+ FILE *stat_file = fopen(path, "r");
+ if (stat_file != NULL) {
+ fgets(buffer, 100, stat_file);
+ fclose(stat_file);
+
+ // The fourth field in /proc/<pid>/stat is the parent PID
+ pid_t ppid;
+ sscanf(buffer, "%*d %*s %*c %d", &ppid);
+
+ if (ppid == parent_pid) {
+ closedir(proc_dir);
+ // Return the child PID if the parent PID matches
+ return pid;
+ }
+ }
+ }
+ }
+
+ closedir(proc_dir);
+
+ // Return -1 if no child PID was found
+ return -1;
+}
+
+int main(void) {
+
+ // Setting up kselftest framework
+ ksft_print_header();
+ ksft_set_plan(1);
+
+ // Check if test is run as root
+ if (geteuid()) {
+ ksft_test_result_skip("This test needs root to run!\n");
+ return 1;
+ }
+
+ // forking
+ pid_t pid = fork();
+
+ // Declare a variable in both parent and child processes
+ int var = 17;
+
+ if (pid == -1) {
+ ksft_test_result_error("%s.\n", strerror(errno));
+ ksft_finished();
+ return 1;
+
+ } else if (pid == 0) {
+ // This is the child process
+ ksft_print_msg("Inside the child process.\n");
+ var = 1998;
+
+ } else {
+ // This is the parent process
+ pid_t ppid=getpid();
+ ksft_print_msg("Inside the parent process.\n");
+ ksft_print_msg("Child PID got from fork() return : %d\n", pid);
+ ksft_print_msg("Parent PID from getpid(): %d\n",ppid);
+
+ // Find the child PID in /proc
+ pid_t child_pid = find_child_pid(getpid());
+ if (child_pid != -1) {
+ ksft_set_plan(2);
+ if(child_pid == pid && pid != ppid && var != 1998) {
+ ksft_test_result_pass("Child Pid from /proc and fork() matching\n");
+ ksft_test_result_pass("Child Pid != Parent pid\n");
+ ksft_set_plan(3);
+ ksft_test_result_pass(
+ "After modification in child No effect on the value of 'var' in parent\n");
+ ksft_exit_pass();
+ return 0;
+ }
+ else {
+ ksft_exit_fail();
+ return 1;
+ }
+ }
+ else {
+ ksft_test_result_fail("Child Pid from /proc and fork() does not match");
+ ksft_exit_fail();
+ return 1;
+ }
+
+ // Wait for the child process to finish
+ wait(NULL);
+ }
+
+ return 0;
+}
+
--
2.34.1
Hi,
===== START =====
TEST: enq_last_no_enq_fails
DESCRIPTION: Verify we fail to load a scheduler if we specify the SCX_OPS_ENQ_LAST flag without defining ops.enqueue()
OUTPUT:
ERR: enq_last_no_enq_fails.c:35
Incorrectly succeeded in to attaching scheduler
not ok 2 enq_last_no_enq_fails #
===== END =====
The above selftest fails even though the BPF scheduler is not loaded into the kernel.
Below is a snippet from dmesg verifying that the BPF program was not loaded:
sched_ext: enq_last_no_enq_fails: SCX_OPS_ENQ_LAST requires ops.enqueue() to be implemented
scx_ops_enable.isra.0+0xde8/0xe30
bpf_struct_ops_link_create+0x1ac/0x240
link_create+0x178/0x400
__sys_bpf+0x7ac/0xd50
sys_bpf+0x2c/0x70
system_call_exception+0x148/0x310
system_call_vectored_common+0x15c/0x2ec
sched_ext: "enq_select_cpu_fails" does not implement cgroup cpu.weight
sched_ext: BPF scheduler "enq_select_cpu_fails" enabled
sched_ext: BPF scheduler "enq_select_cpu_fails" disabled (runtime error)
static int scx_ops_enable(struct sched_ext_ops *ops, struct bpf_link *link)
{
...
ret = validate_ops(ops);
if (ret)
goto err_disable;
...
err_disable:
mutex_unlock(&scx_ops_enable_mutex);
/*
* Returning an error code here would not pass all the error information
* to userspace. Record errno using scx_ops_error() for cases
* scx_ops_error() wasn't already invoked and exit indicating success so
* that the error is notified through ops.exit() with all the details.
*
* Flush scx_ops_disable_work to ensure that error is reported before
* init completion.
*/
scx_ops_error("scx_ops_enable() failed (%d)", ret);
kthread_flush_work(&scx_ops_disable_work);
return 0;
}
validate_ops() correctly reports the error, but the err_disable path ultimately
returns zero, so the attach attempt in the test below is seen as successful.
From enq_last_no_enq_fails.c:
static enum scx_test_status run(void *ctx)
{
struct enq_last_no_enq_fails *skel = ctx;
struct bpf_link *link;
link = bpf_map__attach_struct_ops(skel->maps.enq_last_no_enq_fails_ops);
if (link) {
SCX_ERR("Incorrectly succeeded in to attaching scheduler");
return SCX_TEST_FAIL;
}
bpf_link__destroy(link);
return SCX_TEST_PASS;
}
From: Jeff Xu <jeffxu(a)google.com>
Two fixes for madvise(MADV_DONTNEED) when sealed.
For PROT_NONE mappings, the previous blocking of
madvise(MADV_DONTNEED) is unnecessary. As PROT_NONE already prohibits
memory access, madvise(MADV_DONTNEED) should be allowed to proceed in
order to free the page.
For file-backed, private, read-only memory mappings, we previously did
not block the madvise(MADV_DONTNEED). This was based on
the assumption that the memory's content, being file-backed, could be
retrieved from the file if accessed again. However, this assumption
failed to consider scenarios where a mapping is initially created as
read-write, modified, and subsequently changed to read-only. The newly
introduced VM_WASWRITE flag addresses this oversight.
Jeff Xu (2):
mseal: Two fixes for madvise(MADV_DONTNEED) when sealed
selftest/mseal: Add tests for madvise
include/linux/mm.h | 2 +
mm/mprotect.c | 3 +
mm/mseal.c | 42 +++++++--
tools/testing/selftests/mm/mseal_test.c | 118 +++++++++++++++++++++++-
4 files changed, 157 insertions(+), 8 deletions(-)
--
2.47.0.rc1.288.g06298d1525-goog
Hi
Note for V12:
There was a small conflict between the Intel PT changes in
"KVM: x86: Fix Intel PT Host/Guest mode when host tracing" and the
changes in this patch set, so I have put the patch sets together,
along with outstanding fix "perf/x86/intel/pt: Fix buffer full but
size is 0 case"
Cover letter for KVM changes (patches 2 to 4):
There is a long-standing problem whereby running Intel PT on host and guest
in Host/Guest mode causes VM-Entry failure.
The motivation for this patch set is to provide a fix for stable kernels
prior to the advent of the "Mediated Passthrough vPMU" patch set:
https://lore.kernel.org/kvm/20240801045907.4010984-1-mizhang@google.com/
which would render a large part of the fix unnecessary but likely not be
suitable for backport to stable due to its size and complexity.
Ideally, this patch set would be applied before "Mediated Passthrough vPMU".
Note that the fix does not conflict with "Mediated Passthrough vPMU"; it
is just that "Mediated Passthrough vPMU" will make the code to stop and
restart Intel PT unnecessary.
Note for V11:
Moving aux_paused into a union within struct hw_perf_event caused
a regression because aux_paused was being written unconditionally
even though it is valid only for AUX (e.g. Intel PT) PMUs.
That is fixed in V11.
Hardware traces, such as instruction traces, can produce a vast amount of
trace data, so being able to reduce tracing to more specific circumstances
can be useful.
The ability to pause or resume tracing when another event happens can do
that.
These patches add such a facility and show how it would work for Intel
Processor Trace.
Maintainers of other AUX area tracing implementations are requested to
consider if this is something they might employ and then whether or not
the ABI would work for them. Note, thank you to James Clark (ARM) for
evaluating the API for Coresight. Suzuki K Poulose (ARM) also responded
positively to the RFC.
Changes to perf tools are now (since V4) fleshed out.
Please note, Intel® Architecture Instruction Set Extensions and Future
Features Programming Reference March 2024 319433-052, currently:
https://cdrdv2.intel.com/v1/dl/getContent/671368
introduces hardware pause / resume for Intel PT in a feature named
Intel PT Trigger Tracing.
For that more fields in perf_event_attr will be necessary. The main
differences are:
- it can be applied not just to overflows, but optionally to
every event
- a packet is emitted into the trace, optionally with IP
information
- no PMI
- works with PMC and DR (breakpoint) events only
Here are the proposed additions to perf_event_attr, please comment:
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 0c557f0a17b3..05dcc43f11bb 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -369,6 +369,22 @@ enum perf_event_read_format {
PERF_FORMAT_MAX = 1U << 5, /* non-ABI */
};
+enum {
+ PERF_AUX_ACTION_START_PAUSED = 1U << 0,
+ PERF_AUX_ACTION_PAUSE = 1U << 1,
+ PERF_AUX_ACTION_RESUME = 1U << 2,
+ PERF_AUX_ACTION_EMIT = 1U << 3,
+ PERF_AUX_ACTION_NR = 0x1f << 4,
+ PERF_AUX_ACTION_NO_IP = 1U << 9,
+ PERF_AUX_ACTION_PAUSE_ON_EVT = 1U << 10,
+ PERF_AUX_ACTION_RESUME_ON_EVT = 1U << 11,
+ PERF_AUX_ACTION_EMIT_ON_EVT = 1U << 12,
+ PERF_AUX_ACTION_NR_ON_EVT = 0x1f << 13,
+ PERF_AUX_ACTION_NO_IP_ON_EVT = 1U << 18,
+ PERF_AUX_ACTION_MASK = ~PERF_AUX_ACTION_START_PAUSED,
+ PERF_AUX_PAUSE_RESUME_MASK = PERF_AUX_ACTION_PAUSE | PERF_AUX_ACTION_RESUME,
+};
+
#define PERF_ATTR_SIZE_VER0 64 /* sizeof first published struct */
#define PERF_ATTR_SIZE_VER1 72 /* add: config2 */
#define PERF_ATTR_SIZE_VER2 80 /* add: branch_sample_type */
@@ -515,10 +531,19 @@ struct perf_event_attr {
union {
__u32 aux_action;
struct {
- __u32 aux_start_paused : 1, /* start AUX area tracing paused */
- aux_pause : 1, /* on overflow, pause AUX area tracing */
- aux_resume : 1, /* on overflow, resume AUX area tracing */
- __reserved_3 : 29;
+ __u32 aux_start_paused : 1, /* start AUX area tracing paused */
+ aux_pause : 1, /* on overflow, pause AUX area tracing */
+ aux_resume : 1, /* on overflow, resume AUX area tracing */
+ aux_emit : 1, /* generate AUX records instead of events */
+ aux_nr : 5, /* AUX area tracing reference number */
+ aux_no_ip : 1, /* suppress IP in AUX records */
+ /* Following apply to event occurrence not overflows */
+ aux_pause_on_evt : 1, /* on event, pause AUX area tracing */
+ aux_resume_on_evt : 1, /* on event, resume AUX area tracing */
+ aux_emit_on_evt : 1, /* generate AUX records instead of events */
+ aux_nr_on_evt : 5, /* AUX area tracing reference number */
+ aux_no_ip_on_evt : 1, /* suppress IP in AUX records */
+ __reserved_3 : 13;
};
};
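To make the proposed layout concrete, here is a small sketch of packing and
unpacking the 5-bit reference number within aux_action. The enum values are
local copies of the proposal above, not definitions from any released header,
and the helper names and shift macro are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the values proposed above; not from a released header. */
enum {
	PERF_AUX_ACTION_START_PAUSED	= 1U << 0,
	PERF_AUX_ACTION_PAUSE		= 1U << 1,
	PERF_AUX_ACTION_RESUME		= 1U << 2,
	PERF_AUX_ACTION_EMIT		= 1U << 3,
	PERF_AUX_ACTION_NR		= 0x1f << 4,
};

/* Hypothetical name: bit position of the aux_nr field (bits 4..8). */
#define AUX_NR_SHIFT	4

/* Store a reference number in bits 4..8, leaving the other bits alone. */
static uint32_t aux_action_set_nr(uint32_t action, unsigned int nr)
{
	assert(nr <= 0x1f);	/* the field is 5 bits wide */
	return (action & ~(uint32_t)PERF_AUX_ACTION_NR) |
	       ((uint32_t)nr << AUX_NR_SHIFT);
}

/* Extract the reference number from bits 4..8. */
static unsigned int aux_action_get_nr(uint32_t action)
{
	return (action & (uint32_t)PERF_AUX_ACTION_NR) >> AUX_NR_SHIFT;
}
```

For example, aux_action_set_nr(PERF_AUX_ACTION_PAUSE, 5) yields 0x52: bit 1
for pause plus 5 shifted into the reference-number field.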
Changes in V13:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Do aux_resume at the end of __perf_event_overflow() so as to trace
less of perf itself
perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume
Add error message also in EOPNOTSUPP case (Leo)
Changes in V12:
Add previously sent patch "perf/x86/intel/pt: Fix buffer full
but size is 0 case"
Add previously sent patch set "KVM: x86: Fix Intel PT Host/Guest
mode when host tracing"
Rebase on current tip plus patch set "KVM: x86: Fix Intel PT Host/Guest
mode when host tracing"
Changes in V11:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Make assignment to event->hw.aux_paused conditional on
(pmu->capabilities & PERF_PMU_CAP_AUX_PAUSE).
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
Remove definition of has_aux_action() because it has
already been added as an inline function.
perf/x86/intel/pt: Fix sampling synchronization
perf tools: Enable evsel__is_aux_event() to work for ARM/ARM64
perf tools: Enable evsel__is_aux_event() to work for S390_CPUMSF
Dropped because they have already been applied
Changes in V10:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Move aux_paused into a union within struct hw_perf_event.
Additional comment wrt PERF_EF_PAUSE/PERF_EF_RESUME.
Factor out has_aux_action() as an inline function.
Use scoped_guard for irqsave.
Move calls of perf_event_aux_pause() from __perf_event_output()
to __perf_event_overflow().
Changes in V9:
perf/x86/intel/pt: Fix sampling synchronization
New patch
perf/core: Add aux_pause, aux_resume, aux_start_paused
Move aux_paused to struct hw_perf_event
perf/x86/intel/pt: Add support for pause / resume
Add more comments and barriers for resume_allowed and
pause_allowed
Always use WRITE_ONCE with resume_allowed
Changes in V8:
perf tools: Parse aux-action
Fix clang warning:
util/auxtrace.c:821:7: error: missing field 'aux_action' initializer [-Werror,-Wmissing-field-initializers]
821 | {NULL},
| ^
Changes in V7:
Add Andi's Reviewed-by for patches 2-12
Re-base
Changes in V6:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Removed READ/WRITE_ONCE from __perf_event_aux_pause()
Expanded comment about guarding against NMI
Changes in V5:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Added James' Ack
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
New patch
perf tools
Added Ian's Ack
Changes in V4:
perf/core: Add aux_pause, aux_resume, aux_start_paused
Rename aux_output_cfg -> aux_action
Reorder aux_action bits from:
aux_pause, aux_resume, aux_start_paused
to:
aux_start_paused, aux_pause, aux_resume
Fix aux_action bits __u64 -> __u32
coresight: Have a stab at support for pause / resume
Dropped
perf tools
All new patches
Changes in RFC V3:
coresight: Have a stab at support for pause / resume
'mode' -> 'flags' so it at least compiles
Changes in RFC V2:
Use ->stop() / ->start() instead of ->pause_resume()
Move aux_start_paused bit into aux_output_cfg
Tighten up when Intel PT pause / resume is allowed
Add an example of how it might work for CoreSight
Adrian Hunter (14):
perf/x86/intel/pt: Fix buffer full but size is 0 case
KVM: x86: Fix Intel PT IA32_RTIT_CTL MSR validation
KVM: x86: Fix Intel PT Host/Guest mode when host tracing also
KVM: selftests: Add guest Intel PT test
perf/core: Add aux_pause, aux_resume, aux_start_paused
perf/x86/intel/pt: Add support for pause / resume
perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling
perf tools: Add aux_start_paused, aux_pause and aux_resume
perf tools: Add aux-action config term
perf tools: Parse aux-action
perf tools: Add missing_features for aux_start_paused, aux_pause, aux_resume
perf intel-pt: Improve man page format
perf intel-pt: Add documentation for pause / resume
perf intel-pt: Add a test for pause / resume
arch/x86/events/intel/core.c | 4 +-
arch/x86/events/intel/pt.c | 209 +++++++-
arch/x86/events/intel/pt.h | 16 +
arch/x86/include/asm/intel_pt.h | 4 +
arch/x86/kvm/vmx/vmx.c | 26 +-
arch/x86/kvm/vmx/vmx.h | 1 -
include/linux/perf_event.h | 28 +
include/uapi/linux/perf_event.h | 11 +-
kernel/events/core.c | 75 ++-
kernel/events/internal.h | 1 +
tools/include/uapi/linux/perf_event.h | 11 +-
tools/perf/Documentation/perf-intel-pt.txt | 596 +++++++++++++--------
tools/perf/Documentation/perf-record.txt | 4 +
tools/perf/builtin-record.c | 4 +-
tools/perf/tests/shell/test_intel_pt.sh | 28 +
tools/perf/util/auxtrace.c | 67 ++-
tools/perf/util/auxtrace.h | 6 +-
tools/perf/util/evsel.c | 15 +
tools/perf/util/evsel.h | 1 +
tools/perf/util/evsel_config.h | 1 +
tools/perf/util/parse-events.c | 10 +
tools/perf/util/parse-events.h | 1 +
tools/perf/util/parse-events.l | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 3 +
tools/perf/util/pmu.c | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/x86_64/processor.h | 1 +
tools/testing/selftests/kvm/x86_64/intel_pt.c | 381 +++++++++++++
28 files changed, 1243 insertions(+), 264 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/intel_pt.c
Regards
Adrian
Recently we committed a fix to allow processes to receive notifications for
non-zero exits via the process connector module. Commit is a4c9a56e6a2c.
However, when a thread calls pthread_exit(&exit_status), the kernel is
not aware of the exit status with which pthread_exit() is called. The
status is passed from the exiting thread to the parent process, if the
parent is waiting in pthread_join(). Hence, for a thread exiting
abnormally, the kernel cannot send notifications to any listening
processes.
The exception to this is if the thread is sent a signal which it has not
handled, and dies along with its process as a result, e.g. SIGSEGV or
SIGKILL. In this case, the kernel is aware of the non-zero exit and sends a
notification for it.
For our use case, we cannot have parent wait in pthread_join, one of the
main reasons for this being that we do not want to track normal
pthread_exit(), which could be a very large number. We only want to be
notified of any abnormal exits. Hence, threads are created with
pthread_attr_t set to PTHREAD_CREATE_DETACHED.
To fix this problem, we add a new type PROC_CN_MCAST_NOTIFY to the proc
connector API, which allows a thread to send its exit status to the kernel,
either when it needs to call pthread_exit() with a non-zero value to
indicate some error, or from a signal handler before pthread_exit().
We also need to filter packets with non-zero exit notifications further
based on instances, which can be identified by task names. Hence, we added
a comm field to the packet's struct proc_event, in which task->comm is
stored.
v4->v5 changes:
- Handled comment by Stanislav Fomichev to fix a print format error.
- Made thread.c completely automated by starting proc_filter program
from within thread.c.
- Changed name CONFIG_CN_HASH_KUNIT_TEST to CN_HASH_KUNIT_TEST in
Kconfig.debug and changed display text.
v3->v4 changes:
- Reduce size of exit.log by removing unnecessary text.
v2->v3 changes:
- Handled comment by Liam Howlett to set hdev to NULL and add comment on
it.
- Handled comment by Liam Howlett to combine functions for deleting+get
and deleting into one in cn_hash.c
- Handled comment by Liam Howlett to remove extern in the functions
defined in cn_hash_test.h
- Some nits by Liam Howlett fixed.
- Handled comment by Liam Howlett to make threads test automated.
proc_filter.c creates exit.log, which is read by thread.c and checks
the values reported.
- Added "comm" field to struct proc_event, to copy the task's name to
the packet to allow further filtering by packets.
v1->v2 changes:
- Handled comment by Peter Zijlstra to remove locking for PF_EXIT_NOTIFY
task->flags.
- Added error handling in thread.c
v->v1 changes:
- Handled comment by Simon Horman to remove unused err in cn_proc.c
- Handled comment by Simon Horman to make adata and key_display static
in cn_hash_test.c
Anjali Kulkarni (3):
connector/cn_proc: Add hash table for threads
connector/cn_proc: Kunit tests for threads hash table
connector/cn_proc: Selftest for threads
drivers/connector/Makefile | 2 +-
drivers/connector/cn_hash.c | 221 +++++++++++++++++
drivers/connector/cn_proc.c | 62 ++++-
drivers/connector/connector.c | 75 +++++-
include/linux/connector.h | 35 +++
include/linux/sched.h | 2 +-
include/uapi/linux/cn_proc.h | 5 +-
lib/Kconfig.debug | 17 ++
lib/Makefile | 1 +
lib/cn_hash_test.c | 167 +++++++++++++
lib/cn_hash_test.h | 10 +
tools/testing/selftests/connector/Makefile | 23 +-
.../testing/selftests/connector/proc_filter.c | 34 ++-
tools/testing/selftests/connector/thread.c | 232 ++++++++++++++++++
.../selftests/connector/thread_filter.c | 96 ++++++++
15 files changed, 967 insertions(+), 15 deletions(-)
create mode 100644 drivers/connector/cn_hash.c
create mode 100644 lib/cn_hash_test.c
create mode 100644 lib/cn_hash_test.h
create mode 100644 tools/testing/selftests/connector/thread.c
create mode 100644 tools/testing/selftests/connector/thread_filter.c
--
2.46.0
From: Eduard Zingerman <eddyz87(a)gmail.com>
[ Upstream commit a41b3828ec056a631ad22413d4560017fed5c3bd ]
This test was added because of a bug in verifier.c:sync_linked_regs():
upon range propagation it destroyed subreg_def marks for registers.
The test is written in a way to return an upper half of a register
that is affected by range propagation and must have its subreg_def
preserved. This gives a return value of 0 and leads to undefined
return value if subreg_def mark is not preserved.
Signed-off-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Daniel Borkmann <daniel(a)iogearbox.net>
Link: https://lore.kernel.org/bpf/20240924210844.1758441-2-eddyz87@gmail.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/bpf/progs/verifier_scalar_ids.c | 67 +++++++++++++++++++
1 file changed, 67 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
index 13b29a7faa71a..d24d3a36ec144 100644
--- a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
+++ b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
@@ -656,4 +656,71 @@ __naked void two_old_ids_one_cur_id(void)
: __clobber_all);
}
+SEC("socket")
+/* Note the flag, see verifier.c:opt_subreg_zext_lo32_rnd_hi32() */
+__flag(BPF_F_TEST_RND_HI32)
+__success
+/* This test was added because of a bug in verifier.c:sync_linked_regs(),
+ * upon range propagation it destroyed subreg_def marks for registers.
+ * The subreg_def mark is used to decide whether zero extension instructions
+ * are needed when register is read. When BPF_F_TEST_RND_HI32 is set it
+ * also causes generation of statements to randomize upper halves of
+ * read registers.
+ *
+ * The test is written in a way to return an upper half of a register
+ * that is affected by range propagation and must have it's subreg_def
+ * preserved. This gives a return value of 0 and leads to undefined
+ * return value if subreg_def mark is not preserved.
+ */
+__retval(0)
+/* Check that verifier believes r1/r0 are zero at exit */
+__log_level(2)
+__msg("4: (77) r1 >>= 32 ; R1_w=0")
+__msg("5: (bf) r0 = r1 ; R0_w=0 R1_w=0")
+__msg("6: (95) exit")
+__msg("from 3 to 4")
+__msg("4: (77) r1 >>= 32 ; R1_w=0")
+__msg("5: (bf) r0 = r1 ; R0_w=0 R1_w=0")
+__msg("6: (95) exit")
+/* Verify that statements to randomize upper half of r1 had not been
+ * generated.
+ */
+__xlated("call unknown")
+__xlated("r0 &= 2147483647")
+__xlated("w1 = w0")
+/* This is how disasm.c prints BPF_ZEXT_REG at the moment, x86 and arm
+ * are the only CI archs that do not need zero extension for subregs.
+ */
+#if !defined(__TARGET_ARCH_x86) && !defined(__TARGET_ARCH_arm64)
+__xlated("w1 = w1")
+#endif
+__xlated("if w0 < 0xa goto pc+0")
+__xlated("r1 >>= 32")
+__xlated("r0 = r1")
+__xlated("exit")
+__naked void linked_regs_and_subreg_def(void)
+{
+ asm volatile (
+ "call %[bpf_ktime_get_ns];"
+ /* make sure r0 is in 32-bit range, otherwise w1 = w0 won't
+ * assign same IDs to registers.
+ */
+ "r0 &= 0x7fffffff;"
+ /* link w1 and w0 via ID */
+ "w1 = w0;"
+ /* 'if' statement propagates range info from w0 to w1,
+ * but should not affect w1->subreg_def property.
+ */
+ "if w0 < 10 goto +0;"
+ /* r1 is read here, on archs that require subreg zero
+ * extension this would cause zext patch generation.
+ */
+ "r1 >>= 32;"
+ "r0 = r1;"
+ "exit;"
+ :
+ : __imm(bpf_ktime_get_ns)
+ : __clobber_all);
+}
+
char _license[] SEC("license") = "GPL";
--
2.43.0
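The data flow of the test program above can be modelled in plain C to see why
the expected return value is 0. This is only an illustrative model, not BPF;
the hi32 parameter is an assumption standing in for whatever ends up in the
upper half of r1 when zero extension is not handled correctly:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Plain C model of the test program's data flow; this is not BPF.
 * `hi32` stands for the contents of the upper half of r1 after
 * "w1 = w0": 0 when zero extension is handled correctly, arbitrary
 * bits when it is not.
 */
static uint64_t model_linked_regs(uint64_t ktime, uint32_t hi32)
{
	uint64_t r0 = ktime, r1;

	r0 &= 0x7fffffff;				/* r0 &= 0x7fffffff */
	assert(r0 <= UINT32_MAX);			/* r0 now fits in w0 */
	r1 = ((uint64_t)hi32 << 32) | (uint32_t)r0;	/* w1 = w0 */
	/* "if w0 < 10 goto +0" only narrows ranges; data is unchanged */
	r1 >>= 32;					/* r1 >>= 32 */
	r0 = r1;					/* r0 = r1 */
	return r0;
}
```

With hi32 == 0 the function returns 0 for any input, matching __retval(0).
With any other hi32 it returns those bits, which corresponds to the
"undefined return value" case described in the commit message.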
Userland library functions such as allocators and threading implementations
often require regions of memory to act as 'guard pages' - mappings which,
when accessed, result in a fatal signal being sent to the accessing
process.
The current means by which these are implemented is via a PROT_NONE mmap()
mapping, which provides the required semantics but incurs the overhead of
a VMA for each such region.
With a great many processes and threads, this can rapidly add up and incur
a significant memory penalty. It also has the added problem of preventing
merges that might otherwise be permitted.
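For reference, the existing PROT_NONE approach described above looks roughly
like the following in userspace. This is a minimal sketch with a hypothetical
helper name, placing a single guard page after the usable range:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `usable_pages` of read-write anonymous memory
 * followed by one PROT_NONE guard page. Returns the base address, or NULL
 * on failure.
 */
static void *map_with_guard(size_t usable_pages)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = (usable_pages + 1) * page;
	char *base;

	assert(usable_pages > 0);
	base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;

	/*
	 * The tail gets different protection from the rest of the mapping,
	 * so it can no longer share a VMA with the read-write part.
	 */
	if (mprotect(base + usable_pages * page, page, PROT_NONE)) {
		munmap(base, len);
		return NULL;
	}
	return base;
}
```

Every call leaves behind one extra VMA for the guard page, which is the
per-region cost this series aims to avoid.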
This series takes a different approach - an idea suggested by Vlastimil
Babka (and before him David Hildenbrand and Jann Horn - perhaps more - the
provenance becomes a little tricky to ascertain after this - please forgive
any omissions!) - rather than locating the guard pages at the VMA layer,
instead placing them in page tables mapping the required ranges.
Early testing of the prototype version of this code suggests a 5 times
speed up in memory mapping invocations (in conjunction with use of
process_madvise()) and a 13% reduction in VMAs on an entirely idle android
system and unoptimised code.
We expect with optimisation and a loaded system with a larger number of
guard pages this could significantly increase, but in any case these
numbers are encouraging.
This way, rather than having separate VMAs specifying which parts of a
range are guard pages, instead we have a VMA spanning the entire range of
memory a user is permitted to access and including ranges which are to be
'guarded'.
After mapping this, a user can specify which parts of the range should
result in a fatal signal when accessed.
By restricting the ability to specify guard pages to memory mapped by
existing VMAs, we can rely on the mappings being torn down when the
mappings are ultimately unmapped and everything works simply as if the
memory were not faulted in, from the point of view of the containing VMAs.
This mechanism in effect poisons memory ranges similar to hardware memory
poisoning, only it is an entirely software-controlled form of poisoning.
Any poisoned region of memory can also be 'unpoisoned', that is, have
its poison markers removed.
The mechanism is implemented via madvise() behaviour - MADV_GUARD_POISON
which simply poisons ranges - and MADV_GUARD_UNPOISON - which clears this
poisoning.
Poisoning can be performed across multiple VMAs and any existing mappings
will be cleared, that is zapped, before installing the poisoned page table
mappings.
There is no concept of 'nested' poisoning, multiple attempts to poison a
range will, after the first poisoning, have no effect.
Importantly, unpoisoning of poisoned ranges has no effect on non-poisoned
memory, so a user can safely unpoison a range of memory and clear only
poison page table mappings leaving the rest intact.
The actual mechanism by which the page table entries are specified makes
use of existing logic - PTE markers, which are used for the userfaultfd
UFFDIO_POISON mechanism.
Unfortunately PTE_MARKER_POISONED is not suited for the guard page
mechanism as it results in VM_FAULT_HWPOISON semantics in the fault
handler, so we add our own specific PTE_MARKER_GUARD and adapt existing
logic to handle it.
We also extend the generic page walk mechanism to allow for installation of
PTEs (carefully restricted to memory management logic only to prevent
unwanted abuse).
We ensure that zapping performed by, for instance, MADV_DONTNEED, does not
remove guard poison markers, nor does forking (except when VM_WIPEONFORK is
specified for a VMA which implies a total removal of memory
characteristics).
It's important to note that the guard page implementation is emphatically
NOT a security feature, so a user can remove the poisoning if they wish. We
simply implement it in such a way as to provide the least surprising
behaviour.
An extensive set of self-tests is provided which ensures behaviour is as
expected and additionally self-documents the expected behaviour of poisoned
ranges.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Suggested-by: Jann Horn <jannh(a)google.com>
Suggested-by: David Hildenbrand <david(a)redhat.com>
v2
* The macros in kselftest_harness.h seem to be broken - __EXPECT() is
terminated by '} while (0); OPTIONAL_HANDLER(_assert)' meaning it is not
safe in single line if / else or for / while blocks, however working
around this results in checkpatch producing invalid warnings, as reported
by Shuah.
* Fixing these macros is out of scope for this series, so compromise and
instead rewrite test blocks so as to use multiple lines by separating out
a decl in most cases. This has the side effect of, for the most part,
making things more readable.
* Heavily document the use of the volatile keyword - we can't avoid
checkpatch complaining about this, so we explain it, as reported by
Shuah.
* Updated commit message to highlight that we skip tests we lack
permissions for, as reported by Shuah.
* Replaced a perror() with ksft_exit_fail_perror(), as reported by Shuah.
* Added user friendly messages to cases where tests are skipped due to lack
of permissions, as reported by Shuah.
* Update the tool header to include the new MADV_GUARD_POISON/UNPOISON
defines and directly include asm-generic/mman.h to get the
platform-neutral versions to ensure we import them.
* Finally fixed Vlastimil's email address in Suggested-by tags from suze to
suse, as reported by Vlastimil.
* Added linux-api to cc list, as reported by Vlastimil.
v1
* Un-RFC'd as there appear to be no major objections to the approach, but
rather debate on implementation.
* Fixed issue with arches which need the mmu_context.h and tlbflush.h
header imports in pagewalker logic to be able to use
update_mmu_cache(), as reported by the kernel test bot.
* Added comments in page walker logic to clarify who can use
ops->install_pte and why as well as adding a check_ops_valid() helper
function, as suggested by Christoph.
* Pass false in full parameter in pte_clear_not_present_full() as suggested
by Jann.
* Stopped erroneously requiring a write lock for the poison operation as
suggested by Jann and Suren.
* Moved anon_vma_prepare() to the start of madvise_guard_poison() to be
consistent with how this is used elsewhere in the kernel as suggested by
Jann.
* Avoid returning -EAGAIN if we are raced on page faults, just keep looping
and duck out if a fatal signal is pending or a conditional reschedule is
needed, as suggested by Jann.
* Avoid needlessly splitting huge PUDs and PMDs by specifying
ACTION_CONTINUE, as suggested by Jann.
https://lore.kernel.org/all/cover.1729196871.git.lorenzo.stoakes@oracle.com/
RFC
https://lore.kernel.org/all/cover.1727440966.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (5):
mm: pagewalk: add the ability to install PTEs
mm: add PTE_MARKER_GUARD PTE marker
mm: madvise: implement lightweight guard page mechanism
tools: testing: update tools UAPI header for mman-common.h
selftests/mm: add self tests for guard page feature
arch/alpha/include/uapi/asm/mman.h | 3 +
arch/mips/include/uapi/asm/mman.h | 3 +
arch/parisc/include/uapi/asm/mman.h | 3 +
arch/xtensa/include/uapi/asm/mman.h | 3 +
include/linux/mm_inline.h | 2 +-
include/linux/pagewalk.h | 18 +-
include/linux/swapops.h | 26 +-
include/uapi/asm-generic/mman-common.h | 3 +
mm/hugetlb.c | 3 +
mm/internal.h | 6 +
mm/madvise.c | 168 +++
mm/memory.c | 18 +-
mm/mprotect.c | 3 +-
mm/mseal.c | 1 +
mm/pagewalk.c | 200 ++-
tools/include/uapi/asm-generic/mman-common.h | 3 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/guard-pages.c | 1228 ++++++++++++++++++
19 files changed, 1627 insertions(+), 66 deletions(-)
create mode 100644 tools/testing/selftests/mm/guard-pages.c
--
2.47.0
Currently if we encounter an error between fork() and exec() of a child
process we log the error to stderr. This means that the errors don't get
annotated with the child information which makes diagnostics harder and
means that if we miss the exit signal from the child we can deadlock
waiting for output from the child. Improve robustness and output quality
by logging to stdout instead.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/fp-stress.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-stress.c b/tools/testing/selftests/arm64/fp/fp-stress.c
index faac24bdefeb9436e2daf20b7250d0ae25ca23a7..80f22789504d661efc52a90d4b0893fbebec42f8 100644
--- a/tools/testing/selftests/arm64/fp/fp-stress.c
+++ b/tools/testing/selftests/arm64/fp/fp-stress.c
@@ -79,7 +79,7 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = dup2(pipefd[1], 1);
if (ret == -1) {
- fprintf(stderr, "dup2() %d\n", errno);
+ printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
@@ -89,7 +89,7 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = dup2(startup_pipe[0], 3);
if (ret == -1) {
- fprintf(stderr, "dup2() %d\n", errno);
+ printf("dup2() %d\n", errno);
exit(EXIT_FAILURE);
}
@@ -107,16 +107,15 @@ static void child_start(struct child_data *child, const char *program)
*/
ret = read(3, &i, sizeof(i));
if (ret < 0)
- fprintf(stderr, "read(startp pipe) failed: %s (%d)\n",
- strerror(errno), errno);
+ printf("read(startp pipe) failed: %s (%d)\n",
+ strerror(errno), errno);
if (ret > 0)
- fprintf(stderr, "%d bytes of data on startup pipe\n",
- ret);
+ printf("%d bytes of data on startup pipe\n", ret);
close(3);
ret = execl(program, program, NULL);
- fprintf(stderr, "execl(%s) failed: %d (%s)\n",
- program, errno, strerror(errno));
+ printf("execl(%s) failed: %d (%s)\n",
+ program, errno, strerror(errno));
exit(EXIT_FAILURE);
} else {
---
base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354
change-id: 20241017-arm64-fp-stress-exec-fail-d074ec82cf43
Best regards,
--
Mark Brown <broonie(a)kernel.org>
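The pattern the patch relies on can be sketched in isolation as follows. This
is a simplified, hypothetical helper and not the fp-stress code: the child's
stdout is a pipe, so an exec failure reported with printf() reaches the
parent through the same channel as the child's normal output:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child whose stdout is a pipe, try to exec
 * `program`, and copy whatever the child printed into `buf`.
 * Returns the number of bytes captured, or -1 on setup failure.
 */
static ssize_t run_and_capture(const char *program, char *buf, size_t len)
{
	int pipefd[2];
	ssize_t n, total = 0;
	pid_t pid;

	assert(buf && len > 0);
	if (pipe(pipefd) < 0)
		return -1;

	fflush(stdout);	/* do not duplicate buffered output in the child */
	pid = fork();
	if (pid < 0) {
		close(pipefd[0]);
		close(pipefd[1]);
		return -1;
	}

	if (pid == 0) {
		close(pipefd[0]);
		if (dup2(pipefd[1], STDOUT_FILENO) == -1)
			_exit(EXIT_FAILURE);
		close(pipefd[1]);
		execl(program, program, (char *)NULL);
		/* Only reached if exec failed; this goes down the pipe. */
		printf("execl(%s) failed: %d (%s)\n",
		       program, errno, strerror(errno));
		exit(EXIT_FAILURE);	/* exit() flushes stdout */
	}

	close(pipefd[1]);
	while ((size_t)total < len - 1 &&
	       (n = read(pipefd[0], buf + total, len - 1 - (size_t)total)) > 0)
		total += n;
	buf[total] = '\0';
	close(pipefd[0]);
	waitpid(pid, NULL, 0);
	return total;
}
```

Calling it with a path that does not exist leaves the "execl(...) failed"
message in buf, where the parent can attribute it to the right child.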
This patch series migrates test cases out of test_sock.c to
prog_tests-style tests. It moves all BPF_CGROUP_INET4_POST_BIND and
BPF_CGROUP_INET6_POST_BIND test cases into a new prog_test,
sock_post_bind.c, while reimplementing all LOAD_REJECT test cases as
verifier tests in progs/verifier_sock.c. Finally, it moves remaining
BPF_CGROUP_INET_SOCK_CREATE test coverage into prog_tests/sock_create.c
before retiring test_sock.c completely.
Changes
=======
v1->v2:
- Remove superfluous verbose bool from the top of sock_post_bind.c.
- Use ASSERT_OK_FD instead of ASSERT_GE to test cgroup_fd validity.
- Run sock_post_bind tests in their own namespace, "sock_post_bind".
Jordan Rife (4):
selftests/bpf: Migrate *_POST_BIND test cases to prog_tests
selftests/bpf: Migrate LOAD_REJECT test cases to prog_tests
selftests/bpf: Migrate BPF_CGROUP_INET_SOCK_CREATE test cases to
prog_tests
selftests/bpf: Retire test_sock.c
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 3 +-
.../selftests/bpf/prog_tests/sock_create.c | 35 ++-
.../sock_post_bind.c} | 256 +++++-------------
.../selftests/bpf/progs/verifier_sock.c | 60 ++++
5 files changed, 150 insertions(+), 205 deletions(-)
rename tools/testing/selftests/bpf/{test_sock.c => prog_tests/sock_post_bind.c} (64%)
--
2.47.0.105.g07ac214952-goog
Hi Zheng,
Cc-ed kunit folks, as we usually do for DAMON kunit test changes.
On Tue, 22 Oct 2024 16:39:27 +0800 Zheng Yejian <zhengyejian(a)huaweicloud.com> wrote:
> As discussed in [1], damon_va_evenly_split_region() is called to
> split a region evenly by size into 'nr_pieces' small regions.
> When nr_pieces == 1, no actual split is required. Check that case
> for better code readability and add a simple kunit testcase.
>
> [1] https://lore.kernel.org/all/20241021163316.12443-1-sj@kernel.org/
>
> Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Thanks,
SJ
[...]
Hi Zheng,
We Cc kunit folks for any DAMON kunit test changes, so I Cc-ed them.
On Tue, 22 Oct 2024 16:39:26 +0800 Zheng Yejian <zhengyejian(a)huaweicloud.com> wrote:
> According to the logic of damon_va_evenly_split_region(), the
> following split case currently does not meet the expectation:
>
> Suppose DAMON_MIN_REGION=0x1000,
> Case: Split [0x0, 0x3000) into 2 pieces, then the result would
> actually be 3 regions:
> [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000)
> but NOT the expected 2 regions:
> [0x0, 0x1000), [0x1000, 0x3000) !!!
>
> The root cause is that when calculating size of each split piece in
> damon_va_evenly_split_region():
>
> `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);`
>
> both the division and the ALIGN_DOWN may lose precision, so
> repeatedly splitting off pieces of size 'sz_piece' between the original
> 'start' and 'end' can produce more pieces than expected!!!
>
> To fix it, count each piece as it is split off and make sure there are
> no more than 'nr_pieces'. In addition, add the above case to
> damon_test_split_evenly().
>
> After this patch, damon-operations test passed:
Just for clarification: the damon-operations test doesn't fail without this
patch. This patch introduces two changes: a new kunit test and a bug fix.
Without the bug fix, the new kunit test fails.
I usually prefer separating test changes from fixes (introduce a fix first, and
then the test for it, to avoid unnecessary test failures). But, given the
small size and the simplicity of the kunit change for this patch, I think
introducing it together with the fix is ok.
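For readers following along, the arithmetic in the quoted case can be
reproduced outside the kernel. The following is only a rough bash model,
assuming region boundaries are multiples of DAMON_MIN_REGION. The names
split, MIN_REGION and the capped argument are made up for illustration
and are not kernel code.

```shell
#!/bin/bash
# Stand-alone model of the arithmetic only; it calls no kernel code.
# MIN_REGION mirrors the DAMON_MIN_REGION=0x1000 assumed in the example.
MIN_REGION=$((0x1000))

# split START END NR_PIECES CAPPED
# Print one "[start, end)" line per resulting region. With CAPPED=yes the
# loop also stops once NR_PIECES regions exist (the counting idea from the
# patch description); otherwise it carves while a whole piece still fits.
split()
{
	local start=$1 end=$2 nr_pieces=$3 capped=$4
	local sz_piece=$(( (end - start) / nr_pieces / MIN_REGION * MIN_REGION ))
	local count=1 cur=$start next

	# A zero-sized piece cannot be split off.
	[ "$sz_piece" -gt 0 ] || return 1

	next=$((cur + sz_piece))
	while [ $((next + sz_piece)) -le "$end" ]; do
		if [ "$capped" = yes ] && [ "$count" -ge "$nr_pieces" ]; then
			break
		fi
		printf '[0x%x, 0x%x)\n' "$cur" "$next"
		cur=$next
		next=$((cur + sz_piece))
		count=$((count + 1))
	done

	# The last region absorbs whatever the rounding left over.
	printf '[0x%x, 0x%x)\n' "$cur" "$end"
}

echo "uncapped:"; split 0 $((0x3000)) 2 no
echo "capped:";   split 0 $((0x3000)) 2 yes
```

With these inputs sz_piece rounds down from 0x1800 to 0x1000, so the
uncapped loop fits three pieces into [0x0, 0x3000), while the capped one
stops at two and lets the last region extend to 0x3000.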
>
> # ./tools/testing/kunit/kunit.py run damon-operations
> [...]
> ============== damon-operations (6 subtests) ===============
> [PASSED] damon_test_three_regions_in_vmas
> [PASSED] damon_test_apply_three_regions1
> [PASSED] damon_test_apply_three_regions2
> [PASSED] damon_test_apply_three_regions3
> [PASSED] damon_test_apply_three_regions4
> [PASSED] damon_test_split_evenly
> ================ [PASSED] damon-operations =================
>
> Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces")
> Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Thanks,
SJ
[...]
Thanks for all the reviews.
V5:
Replace /sys/kernel/livepatch also in other/already existing tests.
Improve commit message of 3rd patch.
V4:
Use variable for /sys/kernel/debug.
Be consistent with "" around variables.
Fix path in commit message to /sys/kernel/debug/kprobes/enabled.
V3:
Save and restore kprobe state also when test fails, by integrating it
into setup_config() and cleanup().
Rename SYSFS variables in a more logical way.
Sort test modules in alphabetical order.
Rename module description.
V2:
Save and restore kprobe state.
Michael Vetter (3):
selftests: livepatch: rename KLP_SYSFS_DIR to SYSFS_KLP_DIR
selftests: livepatch: save and restore kprobe state
selftests: livepatch: test livepatching a kprobed function
tools/testing/selftests/livepatch/Makefile | 3 +-
.../testing/selftests/livepatch/functions.sh | 29 +++++----
.../selftests/livepatch/test-callbacks.sh | 24 +++----
.../selftests/livepatch/test-ftrace.sh | 2 +-
.../selftests/livepatch/test-kprobe.sh | 62 +++++++++++++++++++
.../selftests/livepatch/test-livepatch.sh | 12 ++--
.../testing/selftests/livepatch/test-state.sh | 8 +--
.../selftests/livepatch/test-syscall.sh | 2 +-
.../testing/selftests/livepatch/test-sysfs.sh | 8 +--
.../selftests/livepatch/test_modules/Makefile | 3 +-
.../livepatch/test_modules/test_klp_kprobe.c | 38 ++++++++++++
11 files changed, 150 insertions(+), 41 deletions(-)
create mode 100755 tools/testing/selftests/livepatch/test-kprobe.sh
create mode 100644 tools/testing/selftests/livepatch/test_modules/test_klp_kprobe.c
--
2.47.0
For logging to be useful, something has to set RET and retmsg by calling
ret_set_ksft_status(). There is a suite of functions to that end in
forwarding/lib: check_err, check_fail et al. Move them to net/lib.sh so
that every net test can use them.
Existing lib.sh users might be using these same names for their functions.
However, lib.sh is always sourced near the top of the file (checked), and
any new definitions will simply override the ones provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
---
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 73 -------------------
tools/testing/selftests/net/lib.sh | 73 +++++++++++++++++++
2 files changed, 73 insertions(+), 73 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index d28dbf27c1f0..8625e3c99f55 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -445,79 +445,6 @@ done
##############################################################################
# Helpers
-# Whether FAILs should be interpreted as XFAILs. Internal.
-FAIL_TO_XFAIL=
-
-check_err()
-{
- local err=$1
- local msg=$2
-
- if ((err)); then
- if [[ $FAIL_TO_XFAIL = yes ]]; then
- ret_set_ksft_status $ksft_xfail "$msg"
- else
- ret_set_ksft_status $ksft_fail "$msg"
- fi
- fi
-}
-
-check_fail()
-{
- local err=$1
- local msg=$2
-
- check_err $((!err)) "$msg"
-}
-
-check_err_fail()
-{
- local should_fail=$1; shift
- local err=$1; shift
- local what=$1; shift
-
- if ((should_fail)); then
- check_fail $err "$what succeeded, but should have failed"
- else
- check_err $err "$what failed"
- fi
-}
-
-xfail()
-{
- FAIL_TO_XFAIL=yes "$@"
-}
-
-xfail_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW = yes ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
-omit_on_slow()
-{
- if [[ $KSFT_MACHINE_SLOW != yes ]]; then
- "$@"
- fi
-}
-
-xfail_on_veth()
-{
- local dev=$1; shift
- local kind
-
- kind=$(ip -j -d link show dev $dev |
- jq -r '.[].linkinfo.info_kind')
- if [[ $kind = veth ]]; then
- FAIL_TO_XFAIL=yes "$@"
- else
- "$@"
- fi
-}
-
not()
{
"$@"
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 4f52b8e48a3a..6bcf5d13879d 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -361,3 +361,76 @@ tests_run()
$current_test
done
}
+
+# Whether FAILs should be interpreted as XFAILs. Internal.
+FAIL_TO_XFAIL=
+
+check_err()
+{
+ local err=$1
+ local msg=$2
+
+ if ((err)); then
+ if [[ $FAIL_TO_XFAIL = yes ]]; then
+ ret_set_ksft_status $ksft_xfail "$msg"
+ else
+ ret_set_ksft_status $ksft_fail "$msg"
+ fi
+ fi
+}
+
+check_fail()
+{
+ local err=$1
+ local msg=$2
+
+ check_err $((!err)) "$msg"
+}
+
+check_err_fail()
+{
+ local should_fail=$1; shift
+ local err=$1; shift
+ local what=$1; shift
+
+ if ((should_fail)); then
+ check_fail $err "$what succeeded, but should have failed"
+ else
+ check_err $err "$what failed"
+ fi
+}
+
+xfail()
+{
+ FAIL_TO_XFAIL=yes "$@"
+}
+
+xfail_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW = yes ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
+
+omit_on_slow()
+{
+ if [[ $KSFT_MACHINE_SLOW != yes ]]; then
+ "$@"
+ fi
+}
+
+xfail_on_veth()
+{
+ local dev=$1; shift
+ local kind
+
+ kind=$(ip -j -d link show dev $dev |
+ jq -r '.[].linkinfo.info_kind')
+ if [[ $kind = veth ]]; then
+ FAIL_TO_XFAIL=yes "$@"
+ else
+ "$@"
+ fi
+}
--
2.45.0
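As a rough illustration of how a net test could drive the helpers moved by
the patch above, here is a self-contained sketch. check_err, check_fail and
xfail are copied from the diff. The ksft_* values and the simplified
ret_set_ksft_status() are stand-ins assumed for the example, not the
definitions from net/lib.sh.

```shell
#!/bin/bash
# Stand-ins for things net/lib.sh already defines. The numeric values and
# this simplified ret_set_ksft_status() are assumptions for illustration.
ksft_fail=1
ksft_xfail=2
RET=0
retmsg=

ret_set_ksft_status()
{
	local status=$1; shift
	local msg=$1; shift

	# Simplification: remember only the first non-zero status.
	if ((RET == 0)); then
		RET=$status
		retmsg=$msg
	fi
}

# The next three helpers are copied from the patch.
FAIL_TO_XFAIL=

check_err()
{
	local err=$1
	local msg=$2

	if ((err)); then
		if [[ $FAIL_TO_XFAIL = yes ]]; then
			ret_set_ksft_status $ksft_xfail "$msg"
		else
			ret_set_ksft_status $ksft_fail "$msg"
		fi
	fi
}

check_fail()
{
	local err=$1
	local msg=$2

	check_err $((!err)) "$msg"
}

xfail()
{
	FAIL_TO_XFAIL=yes "$@"
}

# A command that succeeds, checked with check_err: RET stays 0.
rc=0; true || rc=$?
check_err $rc "true failed"

# A command that is expected to fail, checked with check_fail: RET stays 0.
rc=0; false || rc=$?
check_fail $rc "false succeeded, but should have failed"
ret_after_expected=$RET

# A failure wrapped in xfail is recorded as ksft_xfail, not ksft_fail.
rc=0; false || rc=$?
xfail check_err $rc "known failure on this setup"
echo "RET=$RET retmsg=$retmsg"
```

The point of the sketch is that nothing is reported until one of the
check_* helpers sees an unexpected result, at which point RET and retmsg
carry the status and message for the logging code to print.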
It would be good to use the same mechanism for scheduling and dispatching
general net tests as the many forwarding tests already use. To that end,
move tests_run() to net/lib.sh so that every net test can use it.
Existing lib.sh users might be using the name tests_run themselves. However,
lib.sh is always sourced near the top of the file (checked), and any new
definition will simply override the one provided by lib.sh.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Amit Cohen <amcohen(a)nvidia.com>
---
CC: Shuah Khan <shuah(a)kernel.org>
CC: Benjamin Poirier <bpoirier(a)nvidia.com>
CC: Hangbin Liu <liuhangbin(a)gmail.com>
CC: linux-kselftest(a)vger.kernel.org
CC: Jiri Pirko <jiri(a)resnulli.us>
---
tools/testing/selftests/net/forwarding/lib.sh | 10 ----------
tools/testing/selftests/net/lib.sh | 10 ++++++++++
2 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 41dd14c42c48..d28dbf27c1f0 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -1285,16 +1285,6 @@ matchall_sink_create()
action drop
}
-tests_run()
-{
- local current_test
-
- for current_test in ${TESTS:-$ALL_TESTS}; do
- in_defer_scope \
- $current_test
- done
-}
-
cleanup()
{
pre_cleanup
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh
index 691318b1ec55..4f52b8e48a3a 100644
--- a/tools/testing/selftests/net/lib.sh
+++ b/tools/testing/selftests/net/lib.sh
@@ -351,3 +351,13 @@ log_info()
echo "INFO: $msg"
}
+
+tests_run()
+{
+ local current_test
+
+ for current_test in ${TESTS:-$ALL_TESTS}; do
+ in_defer_scope \
+ $current_test
+ done
+}
--
2.45.0
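The dispatching done by tests_run() in the patch above can be tried in
isolation. In this sketch tests_run() is copied from the diff, while the
in_defer_scope stub and the two test functions are hypothetical stand-ins
so that the snippet runs on its own.

```shell
#!/bin/bash
# Stand-in: the real in_defer_scope comes from the selftest library; this
# one only runs the command it is given.
in_defer_scope()
{
	"$@"
}

# Copied from the patch.
tests_run()
{
	local current_test

	for current_test in ${TESTS:-$ALL_TESTS}; do
		in_defer_scope \
			$current_test
	done
}

# Two hypothetical test cases that just record that they ran.
ran=
first_test()  { ran="$ran first_test"; }
second_test() { ran="$ran second_test"; }

ALL_TESTS="first_test second_test"
unset TESTS

# Without TESTS, everything listed in ALL_TESTS runs in order.
tests_run
ran_default=$ran

# Setting TESTS for one invocation selects a subset.
ran=
TESTS=second_test tests_run
ran_selected=$ran

echo "default:$ran_default"
echo "selected:$ran_selected"
```

A test script would normally only set ALL_TESTS and call tests_run; TESTS
is the knob for running selected cases by hand.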
This series is a follow-up to Joey's Permission Overlay Extension (POE)
series [1] that recently landed on mainline. The goal is to improve the
way we handle the register that governs which pkeys/POIndex are
accessible (POR_EL0) during signal delivery. As things stand, we may
unexpectedly fail to write the signal frame on the stack because POR_EL0
is not reset before the uaccess operations. See patch 3 for more details
and the main changes this series brings.
A similar series landed recently for x86/MPK [2]; the present series
aims at aligning arm64 with x86. Worth noting: once the signal frame is
written, POR_EL0 is still set to POR_EL0_INIT, granting access to pkey 0
only. This means that a program that sets up an alternate signal stack
with a non-zero pkey will need some assembly trampoline to set POR_EL0
before invoking the real signal handler, as discussed here [3].
The x86 series also added kselftests to ensure that no spurious SIGSEGV
occurs during signal delivery regardless of which pkey is accessible at
the point where the signal is delivered. This series adapts those
kselftests to allow running them on arm64 (patches 4-5).
Finally, patch 2 is a clean-up following feedback on Joey's series [4].
I have tested this series on arm64 and x86_64 (booting and running the
protection_keys and pkey_sighandler_tests mm kselftests).
- Kevin
[1] https://lore.kernel.org/linux-arm-kernel/20240822151113.1479789-1-joey.goul…
[2] https://lore.kernel.org/lkml/20240802061318.2140081-1-aruna.ramakrishna@ora…
[3] https://lore.kernel.org/lkml/CABi2SkWxNkP2O7ipkP67WKz0-LV33e5brReevTTtba6oK…
[4] https://lore.kernel.org/linux-arm-kernel/20241015114116.GA19334@willie-the-…
Cc: akpm(a)linux-foundation.org
Cc: anshuman.khandual(a)arm.com
Cc: aruna.ramakrishna(a)oracle.com
Cc: broonie(a)kernel.org
Cc: catalin.marinas(a)arm.com
Cc: dave.hansen(a)linux.intel.com
Cc: dave.martin(a)arm.com
Cc: jeffxu(a)chromium.org
Cc: joey.gouly(a)arm.com
Cc: shuah(a)kernel.org
Cc: will(a)kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: x86(a)kernel.org
Kevin Brodsky (5):
arm64: signal: Remove unused macro
arm64: signal: Remove unnecessary check when saving POE state
arm64: signal: Improve POR_EL0 handling to avoid uaccess failures
selftests/mm: Use generic pkey register manipulation
selftests/mm: Enable pkey_sighandler_tests on arm64
arch/arm64/kernel/signal.c | 92 +++++++++++++---
tools/testing/selftests/mm/Makefile | 8 +-
tools/testing/selftests/mm/pkey-arm64.h | 1 +
tools/testing/selftests/mm/pkey-x86.h | 2 +
.../selftests/mm/pkey_sighandler_tests.c | 101 +++++++++++++-----
5 files changed, 159 insertions(+), 45 deletions(-)
--
2.43.0
Commit 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX
instructions") unconditionally added -march=x86-64-v2 to the CFLAGS used
to build the KVM selftests, which does not work on non-x86 architectures:
cc1: error: unknown value ‘x86-64-v2’ for ‘-march’
Fix this by making the addition of this x86-specific command line flag
conditional on building for x86.
Fixes: 9a400068a158 ("KVM: selftests: x86: Avoid using SSE/AVX instructions")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/kvm/Makefile | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e6b7e01d57080b304b21120f0d47bda260ba6c43..156fbfae940feac649f933dc6e048a2e2926542a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -244,11 +244,13 @@ CFLAGS += -Wall -Wstrict-prototypes -Wuninitialized -O2 -g -std=gnu99 \
-fno-stack-protector -fno-PIE -I$(LINUX_TOOL_INCLUDE) \
-I$(LINUX_TOOL_ARCH_INCLUDE) -I$(LINUX_HDR_PATH) -Iinclude \
-I$(<D) -Iinclude/$(ARCH_DIR) -I ../rseq -I.. $(EXTRA_CFLAGS) \
- -march=x86-64-v2 \
$(KHDR_INCLUDES)
ifeq ($(ARCH),s390)
CFLAGS += -march=z10
endif
+ifeq ($(ARCH),x86)
+ CFLAGS += -march=x86-64-v2
+endif
ifeq ($(ARCH),arm64)
tools_dir := $(top_srcdir)/tools
arm64_tools_dir := $(tools_dir)/arch/arm64/tools/
---
base-commit: d129377639907fce7e0a27990e590e4661d3ee02
change-id: 20241021-kvm-build-break-495abedc51e0
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Recently, a defer helper was added to Python selftests. The idea is to keep
cleanup commands close to their dirtying counterparts, thereby making it
more transparent what is cleaning up what, making it harder to miss a
cleanup, and making the whole cleanup business exception safe. All these
benefits are applicable to bash as well; exception safety can be
interpreted in terms of safety vs. a SIGINT.
This patchset therefore introduces a framework of several helpers that
serve to schedule cleanups in bash selftests.
- Patch #1 has more details about the primitives being introduced.
- Patch #2 adds a fallback cleanup() function to lib.sh, because ideally
selftests wouldn't need to introduce a dedicated cleanup function at all.
- Patch #3 adds a parameter to stop_traffic(), which makes it possible to
start other background processes after the traffic is started without
confusing the cleanup.
- Patches #4 to #10 convert a number of selftests.
The goal was to convert all tests that use start_traffic / stop_traffic
to the defer framework. Leftover traffic generators are a particularly
painful sort of a missed cleanup. Normal unfinished cleanups can usually
be cleaned up simply by rerunning the test and interrupting it early to
let the cleanups run again / in full. This does not work with
stop_traffic, because it is only issued at the end of the test case that
starts the traffic. At the same time, leftover traffic generators
influence follow-up test runs, and are hard to notice.
The tests were, however, converted wholesale, not just their traffic bits.
Thus they form a proof of concept of the defer framework.
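To make the "cleanup next to its dirtying counterpart" idea concrete, here
is a deliberately tiny bash sketch. It is not the code from defer.sh:
sketch_defer, sketch_run_deferred, DEFERRED and note are hypothetical names,
and the scopes and priority track mentioned in the changelog are left out.

```shell
#!/bin/bash
# Hypothetical miniature of the idea; not the code in defer.sh.
DEFERRED=()

# sketch_defer CMD [ARGS...]: remember a cleanup command for later.
sketch_defer()
{
	# Keep the command as one shell-quoted string so arguments with
	# spaces survive until it is replayed.
	DEFERRED+=("$(printf '%q ' "$@")")
}

# Run the remembered cleanups, most recent first, then forget them.
sketch_run_deferred()
{
	local i

	for ((i = ${#DEFERRED[@]} - 1; i >= 0; i--)); do
		eval "${DEFERRED[i]}"
	done
	DEFERRED=()
}

# Demo: each setup step is followed directly by its cleanup.
log=
note()
{
	log="$log $1"
}

note "setup A"; sketch_defer note "cleanup A"
note "setup B"; sketch_defer note "cleanup B"
sketch_run_deferred

echo "$log"
```

Cleanups replay in reverse order of registration, so B is undone before A.
A real script would presumably also arrange for the replay to happen from
an EXIT trap, which is one way to get the SIGINT safety described above.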
v2:
- Patch #1:
- In __defer__schedule(), use ndefers in place of
${__DEFER__NJOBS[$ndefers_key]}
- Patch #4:
- Defer stop_traffic including the sleep. The sleep is actually
necessary and v1 was wrong in that it had the sleep prior to the
stop_traffic invocation.
v1 (from the RFC):
- Patch #1:
- Added the priority defer track
- Dropped defer_scoped_fn, added in_defer_scope
- Extracted to a separate independent module
- Patch #2:
- Moved this bit to a separate patch
- Patch #3:
- New patch
- Patch #4 (RED):
- Squashed the individual RED-related patches into one
- Converted the SW datapath RED selftest as well
- Patch #5 (TBF):
- Fully converted the selftest, not just stop_traffic
- Patches #6, #7, #8, #9, #10:
- New patch
Petr Machata (10):
selftests: net: lib: Introduce deferred commands
selftests: forwarding: Add a fallback cleanup()
selftests: forwarding: lib: Allow passing PID to stop_traffic()
selftests: RED: Use defer for test cleanup
selftests: TBF: Use defer for test cleanup
selftests: ETS: Use defer for test cleanup
selftests: mlxsw: qos_mc_aware: Use defer for test cleanup
selftests: mlxsw: qos_ets_strict: Use defer for test cleanup
selftests: mlxsw: qos_max_descriptors: Use defer for test cleanup
selftests: mlxsw: devlink_trap_police: Use defer for test cleanup
.../drivers/net/mlxsw/devlink_trap_policer.sh | 85 ++++----
.../drivers/net/mlxsw/qos_ets_strict.sh | 167 ++++++++--------
.../drivers/net/mlxsw/qos_max_descriptors.sh | 118 ++++-------
.../drivers/net/mlxsw/qos_mc_aware.sh | 146 +++++++-------
.../selftests/drivers/net/mlxsw/sch_ets.sh | 26 ++-
.../drivers/net/mlxsw/sch_red_core.sh | 185 +++++++++---------
.../drivers/net/mlxsw/sch_red_ets.sh | 24 +--
.../drivers/net/mlxsw/sch_red_root.sh | 18 +-
tools/testing/selftests/net/forwarding/lib.sh | 13 +-
.../selftests/net/forwarding/sch_ets.sh | 7 +-
.../selftests/net/forwarding/sch_ets_core.sh | 81 +++-----
.../selftests/net/forwarding/sch_ets_tests.sh | 14 +-
.../selftests/net/forwarding/sch_red.sh | 103 ++++------
.../selftests/net/forwarding/sch_tbf_core.sh | 91 +++------
.../net/forwarding/sch_tbf_etsprio.sh | 7 +-
.../selftests/net/forwarding/sch_tbf_root.sh | 3 +-
tools/testing/selftests/net/lib.sh | 3 +
tools/testing/selftests/net/lib/Makefile | 2 +-
tools/testing/selftests/net/lib/sh/defer.sh | 115 +++++++++++
19 files changed, 595 insertions(+), 613 deletions(-)
create mode 100644 tools/testing/selftests/net/lib/sh/defer.sh
--
2.45.0