This series adds KVM selftests for Secure AVIC.
The Secure AVIC KVM support patch series is at:
https://lore.kernel.org/kvm/20250228085115.105648-1-Neeraj.Upadhyay@amd.com…
Git tree is available at:
https://github.com/AMDESE/linux-kvm/tree/savic-host-latest
This series depends on SNP Smoke tests patch series by Pratik:
https://lore.kernel.org/lkml/20250123220100.339867-1-prsampat@amd.com/
- Patch 1-6 are taken from Peter Gonda's patch series for GHCB support
for SEV-ES guests. GHCB support for SNP guests is added to these
patches.
https://lore.kernel.org/lkml/Ziln_Spd6KtgVqkr@google.com/T/#m6c0fc7e2b2e35f…
Patches 7-8 are fixes on top of Peter's series.
- Patch 9 fixes IDT vector for #VC exception (29) which has a valid
error code associated with the exception.
- Patch 10 adds #VC exception handling for rdmsr/wrmsr accesses of
SEV-ES guests.
- Patch 11 skips vm_is_gpa_protected() check for APIC MMIO base address
in __virt_pg_map() for VMs with protected memory. This is required
for xapic tests enablement for SEV VMs.
- Patch 12 and 13 are PoC patches to support MMIO #VC handling for SEV-ES
guests. They add x86 instruction decoding support.
- Patch 14 adds #VC handling for MMIO accesses by SEV-ES guests.
- Patch 15 adds movabs instruction decoding for cases where compiler
generates movabs for MMIO reads/writes.
- Patch 16 adds SEV guests testing support in xapic_state_test.
- Patch 17 adds x2apic mode support in xapic_ipi_test.
- Patch 18 adds SEV VMs support in xapic_ipi_test.
- Patch 19 adds a library for Secure AVIC backing page initialization
and enabling Secure AVIC for a SNP guest.
- Patch 20 adds support for SVM_EXIT_AVIC_UNACCELERATED_ACCESS #VC
exception handling for APIC msr reads/writes by Secure AVIC enabled
VM.
- Patch 21 adds support for SVM_EXIT_AVIC_INCOMPLETE_IPI #VC error
code handling for Secure AVIC enabled VM.
- Patch 22 adds args param to kvm_arch_vm_post_create() to pass
vmsa features to KVM_SEV_INIT2 ioctl for SEV VMs.
- Patch 23 adds an api for passing guest APIC page GPA to Hypervisor.
- Patch 24 adds Secure AVIC VM support to xapic_ipi_test test.
- Patch 25 adds a test for verifying APIC regs MMIO/msr accesses
for a Secure AVIC VM before it enables x2apic mode, in x2apic mode
and after enabling Secure AVIC in the Secure AVIC control msr.
- Patch 26 adds a msr access test to verify accelerated/unaccelerated
msr acceses for Secure AVIC enabled VM.
- Patch 27 tests idle hlt for Secure AVIC enabled VM.
- Patch 28 adds IOAPIC tests for Secure AVIC enabled VM.
- Patch 29 adds cross-vCPU IPI testing with various destination
shorthands for Secure AVIC enabled VM.
- Patch 30 adds Hypervisor NMI injection and cross-vCPU ICR based NMI
for Secure AVIC enabled VM.
- Patch 31 adds MSI injection test for Secure AVIC enabled VM.
Neeraj Upadhyay (25):
KVM: selftests: Fix ghcb_entry returned in ghcb_alloc()
KVM: selftests: Make GHCB entry page size aligned
KVM: selftests: Add support for #VC in x86 exception handlers
KVM: selftests: Add MSR VC handling support for SEV-ES VMs
KVM: selftests: Skip vm_is_gpa_protected() call for APIC MMIO base
KVM: selftests: Add instruction decoding support
KVM: selftests: Add instruction decoding support
KVM: selftests: Add MMIO VC exception handling for SEV-ES guests
KVM: selftests: Add instruction decoding for movabs instructions
KVM: selftests: Add SEV guests support in xapic_state_test
KVM: selftests: Add x2apic mode testing in xapic_ipi_test
KVM: selftests: Add SEV VM support in xapic_ipi_test
KVM: selftests: Add Secure AVIC lib
KVM: selftests: Add unaccelerated APIC msrs #VC handling
KVM: selftests: Add IPI handling support for Secure AVIC
KVM: selftests: Add args param to kvm_arch_vm_post_create()
KVM: selftests: Add SAVIC GPA notification GHCB call
KVM: selftests: Add Secure AVIC mode to xapic_ipi_test
KVM: selftests: Add Secure AVIC APIC regs test
KVM: selftests: Add test to verify APIC MSR accesses for SAVIC guest
KVM: selftests: Extend savic test with idle halt testing
KVM: selftests: Add IOAPIC tests for Secure AVIC
KVM: selftests: Add cross-vCPU IPI testing for SAVIC guests
KVM: selftests: Add NMI test for SAVIC guests
KVM: selftests: Add MSI injection test for SAVIC
Peter Gonda (6):
Add GHCB with setters and getters
Add arch specific additional guest pages
Add vm_vaddr_alloc_pages_shared()
Add GHCB allocations and helpers
Add is_sev_enabled() helpers
Add ability for SEV-ES guests to use ucalls via GHCB
tools/arch/x86/include/asm/msr-index.h | 4 +-
tools/testing/selftests/kvm/.gitignore | 3 +-
tools/testing/selftests/kvm/Makefile.kvm | 16 +-
.../testing/selftests/kvm/include/kvm_util.h | 14 +-
.../testing/selftests/kvm/include/x86/apic.h | 57 +
.../selftests/kvm/include/x86/ex_regs.h | 21 +
.../selftests/kvm/include/x86/insn-eval.h | 48 +
.../selftests/kvm/include/x86/processor.h | 18 +-
.../testing/selftests/kvm/include/x86/savic.h | 25 +
tools/testing/selftests/kvm/include/x86/sev.h | 15 +
tools/testing/selftests/kvm/include/x86/svm.h | 109 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 109 +-
.../testing/selftests/kvm/lib/x86/handlers.S | 4 +-
.../testing/selftests/kvm/lib/x86/insn-eval.c | 1726 +++++++++++++++++
.../testing/selftests/kvm/lib/x86/processor.c | 24 +-
tools/testing/selftests/kvm/lib/x86/savic.c | 490 +++++
tools/testing/selftests/kvm/lib/x86/sev.c | 598 +++++-
tools/testing/selftests/kvm/lib/x86/ucall.c | 18 +
tools/testing/selftests/kvm/s390/cmma_test.c | 2 +-
tools/testing/selftests/kvm/x86/savic_test.c | 1549 +++++++++++++++
.../selftests/kvm/x86/sev_smoke_test.c | 40 +-
.../selftests/kvm/x86/xapic_ipi_test.c | 183 +-
.../selftests/kvm/x86/xapic_state_test.c | 117 +-
23 files changed, 5084 insertions(+), 106 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86/ex_regs.h
create mode 100644 tools/testing/selftests/kvm/include/x86/insn-eval.h
create mode 100644 tools/testing/selftests/kvm/include/x86/savic.h
create mode 100644 tools/testing/selftests/kvm/lib/x86/insn-eval.c
create mode 100644 tools/testing/selftests/kvm/lib/x86/savic.c
create mode 100644 tools/testing/selftests/kvm/x86/savic_test.c
base-commit: f7bafceba76e9ab475b413578c1757ee18c3e44b
--
2.34.1
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
The test_tunnel.sh script has already been partly migrated to
test_progs in prog_tests/test_tunnel.c so I add my work to it.
PATCH 1 & 2 create some helpers to avoid code duplication and ease the
migration in the following patches.
PATCH 3 to 9 migrate the tests of gre, ip6gre, erspan, ip6erspan,
geneve, ip6geneve and ip6tnl tunnels.
PATCH 10 removes test_tunnel.sh
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Bastien Curutchet (eBPF Foundation) (10):
selftests/bpf: test_tunnel: Add generic_attach* helpers
selftests/bpf: test_tunnel: Add ping helpers
selftests/bpf: test_tunnel: Move gre tunnel test to test_progs
selftests/bpf: test_tunnel: Move ip6gre tunnel test to test_progs
selftests/bpf: test_tunnel: Move erspan tunnel tests to test_progs
selftests/bpf: test_tunnel: Move ip6erspan tunnel test to test_progs
selftests/bpf: test_tunnel: Move geneve tunnel test to test_progs
selftests/bpf: test_tunnel: Move ip6geneve tunnel test to test_progs
selftests/bpf: test_tunnel: Move ip6tnl tunnel tests to test_progs
selftests/bpf: test_tunnel: Remove test_tunnel.sh
tools/testing/selftests/bpf/Makefile | 1 -
.../testing/selftests/bpf/prog_tests/test_tunnel.c | 627 +++++++++++++++++---
tools/testing/selftests/bpf/test_tunnel.sh | 645 ---------------------
3 files changed, 532 insertions(+), 741 deletions(-)
---
base-commit: 16566afa71143757b49fc4b2a331639f487d105a
change-id: 20250131-tunnels-59b641ea3f10
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
1. Issue
Syzkaller reported this issue [1].
2. Reproduce
We can reproduce this issue by using the test_sockmap_with_close_on_write()
test I provided in selftest, also you need to apply the following patch to
ensure 100% reproducibility (sleep after checking sock):
'''
static void sk_psock_verdict_data_ready(struct sock *sk)
{
.......
if (unlikely(!sock))
return;
+ if (!strcmp("test_progs", current->comm)) {
+ printk("sleep 2s to wait socket freed\n");
+ mdelay(2000);
+ printk("sleep end\n");
+ }
ops = READ_ONCE(sock->ops);
if (!ops || !ops->read_skb)
return;
}
'''
Then running './test_progs -v sockmap_basic', and if the kernel has KASAN
enabled [2], you will see the following warning:
'''
BUG: KASAN: slab-use-after-free in sk_psock_verdict_data_ready+0x29b/0x2d0
Read of size 8 at addr ffff88813a777020 by task test_progs/47055
Tainted: [O]=OOT_MODULE
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
print_address_description.constprop.0+0x30/0x420
? sk_psock_verdict_data_ready+0x29b/0x2d0
print_report+0xb7/0x270
? sk_psock_verdict_data_ready+0x29b/0x2d0
? kasan_addr_to_slab+0xd/0xa0
? sk_psock_verdict_data_ready+0x29b/0x2d0
kasan_report+0xca/0x100
? sk_psock_verdict_data_ready+0x29b/0x2d0
sk_psock_verdict_data_ready+0x29b/0x2d0
unix_stream_sendmsg+0x4a6/0xa40
? __pfx_unix_stream_sendmsg+0x10/0x10
? fdget+0x2c1/0x3a0
__sys_sendto+0x39c/0x410
'''
3. Reason
'''
CPU0 CPU1
unix_stream_sendmsg(sk):
other = unix_peer(sk)
other->sk_data_ready(other):
socket *sock = sk->sk_socket
if (unlikely(!sock))
return;
close(other):
...
other->close()
free(socket)
READ_ONCE(sock->ops)
^
use 'sock' after free
'''
For TCP, UDP, or other protocols, we have already performed
rcu_read_lock() when the network stack receives packets in ip_input.c:
'''
ip_local_deliver_finish():
rcu_read_lock()
ip_protocol_deliver_rcu()
xxx_rcv
rcu_read_unlock()
'''
However, for Unix sockets, sk_data_ready is called directly from the
process context without rcu_read_lock() protection.
4. Solution
Based on the fact that the 'struct socket' is released using call_rcu(),
We add rcu_read_{un}lock() at the entrance and exit of our sk_data_ready.
It will not increase performance overhead, at least for TCP and UDP, they
are already in a relatively large critical section.
Of course, we can also add a custom callback for Unix sockets and call
rcu_read_lock() before calling _verdict_data_ready like this:
'''
if (sk_is_unix(sk))
sk->sk_data_ready = sk_psock_verdict_data_ready_rcu;
else
sk->sk_data_ready = sk_psock_verdict_data_ready;
sk_psock_verdict_data_ready_rcu():
rcu_read_lock()
sk_psock_verdict_data_ready()
rcu_read_unlock()
'''
However, this will cause too many branches, and it's not suitable to
distinguish network protocols in skmsg.c.
[1] https://syzkaller.appspot.com/bug?extid=dd90a702f518e0eac072
[2] https://syzkaller.appspot.com/text?tag=KernelConfig&x=1362a5aee630ff34
Jiayuan Chen (3):
bpf, sockmap: avoid using sk_socket after free
selftests/bpf: Add socketpair to create_pair to support unix socket
selftests/bpf: Add edge case tests for sockmap
net/core/skmsg.c | 18 ++++--
.../selftests/bpf/prog_tests/socket_helpers.h | 13 ++++-
.../selftests/bpf/prog_tests/sockmap_basic.c | 57 +++++++++++++++++++
3 files changed, 82 insertions(+), 6 deletions(-)
--
2.47.1
The GRO selftests can flake and have some confusing behavior. These
changes make the output and return value of GRO behave as expected, then
deflake the tests.
v2:
- Split into multiple commits.
- Reduced napi_defer_hard_irqs to 1.
- Reduced gro_flush_timeout to 100us.
- Fixed comment that wasn't updated.
v1: https://lore.kernel.org/netdev/20250218164555.1955400-1-krakauer@google.com/
Kevin Krakauer (3):
selftests/net: have `gro.sh -t` return a correct exit code
selftests/net: only print passing message in GRO tests when tests pass
selftests/net: deflake GRO tests
tools/testing/selftests/net/gro.c | 8 +++++---
tools/testing/selftests/net/gro.sh | 7 ++++---
tools/testing/selftests/net/setup_veth.sh | 3 ++-
3 files changed, 11 insertions(+), 7 deletions(-)
--
2.48.1.658.g4767266eb4-goog
There is a spelling mistake in a ksft_test_result_skip message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/kvm/s390/cpumodel_subfuncs_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/s390/cpumodel_subfuncs_test.c b/tools/testing/selftests/kvm/s390/cpumodel_subfuncs_test.c
index 27255880dabd..aded795d42be 100644
--- a/tools/testing/selftests/kvm/s390/cpumodel_subfuncs_test.c
+++ b/tools/testing/selftests/kvm/s390/cpumodel_subfuncs_test.c
@@ -291,7 +291,7 @@ int main(int argc, char *argv[])
ksft_test_result_pass("%s\n", testlist[idx].subfunc_name);
free(array);
} else {
- ksft_test_result_skip("%s feature is not avaialable\n",
+ ksft_test_result_skip("%s feature is not available\n",
testlist[idx].subfunc_name);
}
}
--
2.47.2
[ Background ]
On ARM GIC systems and others, the target address of the MSI is translated
by the IOMMU. For GIC, the MSI address page is called "ITS" page. When the
IOMMU is disabled, the MSI address is programmed to the physical location
of the GIC ITS page (e.g. 0x20200000). When the IOMMU is enabled, the ITS
page is behind the IOMMU, so the MSI address is programmed to an allocated
IO virtual address (a.k.a IOVA), e.g. 0xFFFF0000, which must be mapped to
the physical ITS page: IOVA (0xFFFF0000) ===> PA (0x20200000).
When a 2-stage translation is enabled, IOVA will be still used to program
the MSI address, though the mappings will be in two stages:
IOVA (0xFFFF0000) ===> IPA (e.g. 0x80900000) ===> PA (0x20200000)
(IPA stands for Intermediate Physical Address).
If the device that generates MSI is attached to an IOMMU_DOMAIN_DMA, the
IOVA is dynamically allocated from the top of the IOVA space. If attached
to an IOMMU_DOMAIN_UNMANAGED (e.g. a VFIO passthrough device), the IOVA is
fixed to an MSI window reported by the IOMMU driver via IOMMU_RESV_SW_MSI,
which is hardwired to MSI_IOVA_BASE (IOVA==0x8000000) for ARM IOMMUs.
So far, this IOMMU_RESV_SW_MSI works well as kernel is entirely in charge
of the IOMMU translation (1-stage translation), since the IOVA for the ITS
page is fixed and known by kernel. However, with virtual machine enabling
a nested IOMMU translation (2-stage), a guest kernel directly controls the
stage-1 translation with an IOMMU_DOMAIN_DMA, mapping a vITS page (at an
IPA 0x80900000) onto its own IOVA space (e.g. 0xEEEE0000). Then, the host
kernel can't know that guest-level IOVA to program the MSI address.
There have been two approaches to solve this problem:
1. Create an identity mapping in the stage-1. VMM could insert a few RMRs
(Reserved Memory Regions) in guest's IORT. Then the guest kernel would
fetch these RMR entries from the IORT and create an IOMMU_RESV_DIRECT
region per iommu group for a direct mapping. Eventually, the mappings
would look like: IOVA (0x8000000) === IPA (0x8000000) ===> 0x20200000
This requires an IOMMUFD ioctl for kernel and VMM to agree on the IPA.
2. Forward the guest-level MSI IOVA captured by VMM to the host-level GIC
driver, to program the correct MSI IOVA. Forward the VMM-defined vITS
page location (IPA) to the kernel for the stage-2 mapping. Eventually:
IOVA (0xFFFF0000) ===> IPA (0x80900000) ===> PA (0x20200000)
This requires a VFIO ioctl (for IOVA) and an IOMMUFD ioctl (for IPA).
Worth mentioning that when Eric Auger was working on the same topic with
the VFIO iommu uAPI, he had a solution for approach (2) first, and then
switched to approach (1), suggested by Jean-Philippe for the reduction of
complexity.
Approach (1) basically feels like the existing VFIO passthrough that has
a 1-stage mapping for the unmanaged domain, yet only by shifting the MSI
mapping from stage 1 (no-viommu case) to stage 2 (has-viommu case). So,
it could reuse the existing IOMMU_RESV_SW_MSI piece, by sharing the same
idea of "VMM leaving everything to the kernel".
Approach (2) is an ideal solution, yet it requires additional effort for
kernel to be aware of the stage-1 gIOVAs and the stage-2 IPAs for vITS
page(s), which demands VMM to closely cooperate.
* It also brings some complicated use cases to the table where the host
or/and guest system(s) has/have multiple ITS pages.
[ Execution ]
Though these two approaches feel very different on the surface, they can
share some underlying common infrastructure. Currently, only one pair of
sw_msi functions (prepare/compose) are provided by dma-iommu for irqchip
drivers to directly use. There could be different versions of functions
from different domain owners: for existing VFIO passthrough cases and in-
kernel DMA domain cases, reuse the existing dma-iommu's version of sw_msi
functions; for nested translation use cases, there can be another version
of sw_msi functions to handle mapping and msi_msg(s) differently.
As a part-1 series, this refactors the core infrastructure:
- Get rid of the duplication in the "compose" function
- Introduce a function pointer for the previously "prepare" function
- Allow different domain owners to set their own "sw_msi" implementations
- Implement an iommufd_sw_msi function to additionally support non-nested
use cases and also prepare for a nested translation use case using the
approach (1)
[ Future Plan ]
Part-2 will add support of approach (1), i.e. RMR solution:
- Add a pair of IOMMUFD options for a SW_MSI window for kernel and VMM to
agree on (for approach 1)
Part-3 and beyond will continue the effort of supporting approach (2) i.e.
a complete vITS-to-pITS mapping:
- Map the phsical ITS page (potentially via IOMMUFD_CMD_IOAS_MAP_MSI)
- Convey the IOVAs per-irq (potentially via VFIO_IRQ_SET_ACTION_PREPARE)
---
This is a joint effort that includes Jason's rework in irq/iommu/iommufd
base level and my additional patches on top of that for new uAPIs.
This series is on github:
https://github.com/nicolinc/iommufd/commits/iommufd_msi_p1-v2
For testing with nested SMMU (approach 1):
https://github.com/nicolinc/iommufd/commits/wip/iommufd_msi_p2-v2
Pairing QEMU branch for testing (approach 1):
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_msi_p2-v2-rmr
Changelog
v2
* Split the iommufd ioctl for approach (1) out of this part-1
* Rebase on Jason's for-next tree (6.14-rc2) for two iommufd patches
* Update commit logs in two irqchip patches to make narrative clearer
* Keep iommu_dma_compose_msi_msg() in PATCH-1 as a small cleaner step
* Improve with some coding style changes: kdoc and 100-char wrapping
v1
https://lore.kernel.org/kvm/cover.1739005085.git.nicolinc@nvidia.com/
* Rebase on v6.14-rc1 and iommufd_attach_handle-v1 series
https://lore.kernel.org/all/cover.1738645017.git.nicolinc@nvidia.com/
* Correct typos
* Replace set_bit with __set_bit
* Use a common helper to get iommufd_handle
* Add kdoc for iommu_msi_iova/iommu_msi_page_shift
* Rename msi_msg_set_msi_addr() to msi_msg_set_addr()
* Update selftest for a better coverage for the new options
* Change IOMMU_OPTION_SW_MSI_START/SIZE to be per-idev and properly
check against device's reserved region list
RFCv2
https://lore.kernel.org/kvm/cover.1736550979.git.nicolinc@nvidia.com/
* Rebase on v6.13-rc6
* Drop all the irq/pci patches and rework the compose function instead
* Add a new sw_msi op to iommu_domain for a per type implementation and
let iommufd core has its own implementation to support both approaches
* Add RMR-solution (approach 1) support since it is straightforward and
have been used in some out-of-tree projects widely
RFCv1
https://lore.kernel.org/kvm/cover.1731130093.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Jason Gunthorpe (5):
genirq/msi: Store the IOMMU IOVA directly in msi_desc instead of
iommu_cookie
genirq/msi: Refactor iommu_dma_compose_msi_msg()
iommu: Make iommu_dma_prepare_msi() into a generic operation
irqchip: Have CONFIG_IRQ_MSI_IOMMU be selected by irqchips that need
it
iommufd: Implement sw_msi support natively
Nicolin Chen (2):
iommu: Turn fault_data to iommufd private pointer
iommu: Turn iova_cookie to dma-iommu private pointer
drivers/iommu/Kconfig | 1 -
drivers/irqchip/Kconfig | 4 +
kernel/irq/Kconfig | 1 +
drivers/iommu/iommufd/iommufd_private.h | 23 +++-
include/linux/iommu.h | 58 +++++----
include/linux/msi.h | 55 +++++---
drivers/iommu/dma-iommu.c | 63 +++-------
drivers/iommu/iommu.c | 29 +++++
drivers/iommu/iommufd/device.c | 160 ++++++++++++++++++++----
drivers/iommu/iommufd/fault.c | 2 +-
drivers/iommu/iommufd/hw_pagetable.c | 5 +-
drivers/iommu/iommufd/main.c | 9 ++
drivers/irqchip/irq-gic-v2m.c | 5 +-
drivers/irqchip/irq-gic-v3-its.c | 13 +-
drivers/irqchip/irq-gic-v3-mbi.c | 12 +-
drivers/irqchip/irq-ls-scfg-msi.c | 5 +-
16 files changed, 309 insertions(+), 136 deletions(-)
base-commit: dc10ba25d43f433ad5d9e8e6be4f4d2bb3cd9ddb
prerequisite-patch-id: 0000000000000000000000000000000000000000
--
2.43.0
test_select_reuseport_kern.c is currently including <stdlib.h>, but it
does not use any definition from there.
Remove stdlib.h inclusion from test_select_reuseport_kern.c
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
I stumbled upon this specific header include while trying to build selftests on
the current bpf-next_base branch, which ended with this error:
[...]
CLNG-BPF [test_progs-cpuv4] test_select_reuseport_kern.bpf.o
In file included from progs/test_select_reuseport_kern.c:4:
/usr/include/bits/floatn.h:83:52: error: unsupported machine mode
'__TC__'
83 | typedef _Complex float __cfloat128 __attribute__ ((__mode__
(__TC__)));
| ^
/usr/include/bits/floatn.h:97:9: error: __float128 is not supported on
this target
97 | typedef __float128 _Float128;
The exact error (unknown TC mode) is likely rather due to some issues in
my local build, in which I am actually cross-compiling selftests (for
ARM64 from a x86_64 host, but not through vmtests.sh), and I still have
to sort out some other issues. But I guess it is not really right anyway
to include stdlib.h from an ebpf program, especially if it is not used,
so I am still proposing this small change.
---
tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
index 5eb25c6ad75b1a9c61f22e978d817d3dc88b3a2f..a5be3267dbb01372c84bb468e3a48eae69ac5329 100644
--- a/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_select_reuseport_kern.c
@@ -1,7 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2018 Facebook */
-#include <stdlib.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/ipv6.h>
---
base-commit: 072c40912477ebac2ef98cd0b1532ba9bebda20a
change-id: 20250227-remove_wrong_header-02d288d64204
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Currently, kselftests does not have a generalised mechanism to skip
compilation and run tests when required kernel configuration options
are disabled.
This patch series adresses this issue by checking whether all required
configs from selftest/<test>/config are enabled in the current kernel
Siddharth Menon (2):
selftests: Introduce script to validate required dependencies
selftests/lib.mk: Introduce check to validate required dependencies
.../testing/selftests/check_kselftest_deps.pl | 170 ++++++++++++++++++
tools/testing/selftests/lib.mk | 15 +-
2 files changed, 183 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/check_kselftest_deps.pl
--
2.48.1
This improves the expressiveness of unprivileged BPF by inserting
speculation barriers instead of rejcting the programs.
The approach was presented at LPC'24:
https://lpc.events/event/18/contributions/1954/ ("Mitigating
Spectre-PHT using Speculation Barriers in Linux eBPF")
and RAID'24:
https://arxiv.org/pdf/2405.00078 ("VeriFence: Lightweight and Precise
Spectre Defenses for Untrusted Linux Kernel Extensions")
Goal of this RFC is to get feedback on the approach and the structuring
into commits.
TODOs to be fixed for final version:
* actually emit arm64 barrier
* fix unexpected_load_success from test_progs for "bpf: Fall back to nospec for sanitization-failures"
* use bpf-next as base commit
Luis Gerhorst (9):
bpf/arm64: Unset bypass_spec_v4() instead of ignoring BPF_NOSPEC
bpf: Refactor do_check() if/else into do_check_insn()
bpf: Return EFAULT on misconfigurations
bpf: Return EFAULT on internal errors
bpf: Fall back to nospec if v1 verification fails
bpf: Allow nospec-protected var-offset stack access
bpf: Refactor push_stack to return error code
bpf: Fall back to nospec for sanitization-failures
bpf: Cut speculative path verification short
arch/arm64/net/bpf_jit_comp.c | 10 +-
include/linux/bpf.h | 14 +-
include/linux/bpf_verifier.h | 3 +-
kernel/bpf/core.c | 17 +-
kernel/bpf/verifier.c | 832 ++++++++++--------
.../selftests/bpf/progs/verifier_and.c | 3 +-
.../selftests/bpf/progs/verifier_bounds.c | 30 +-
.../selftests/bpf/progs/verifier_movsx.c | 6 +-
.../selftests/bpf/progs/verifier_unpriv.c | 3 +-
.../bpf/progs/verifier_value_ptr_arith.c | 11 +-
10 files changed, 520 insertions(+), 409 deletions(-)
base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
--
2.48.1
Some drivers, like tg3, do not set combined-count:
$ ethtool -l enp4s0f1
Channel parameters for enp4s0f1:
Pre-set maximums:
RX: 4
TX: 4
Other: n/a
Combined: n/a
Current hardware settings:
RX: 4
TX: 1
Other: n/a
Combined: n/a
In the case where combined-count is not set, the ethtool netlink code
in the kernel elides the value and the code in the test:
netnl.channels_get(...)
With a tg3 device, the returned dictionary looks like:
{'header': {'dev-index': 3, 'dev-name': 'enp4s0f1'},
'rx-max': 4,
'rx-count': 4,
'tx-max': 4,
'tx-count': 1}
Note that the key 'combined-count' is missing. As a result of this
missing key the test raises an exception:
# Exception| if channels['combined-count'] == 0:
# Exception| ~~~~~~~~^^^^^^^^^^^^^^^^^^
# Exception| KeyError: 'combined-count'
Change the test to check if 'combined-count' is a key in the dictionary
first and if not assume that this means the driver has separate RX and
TX queues.
With this change, the test now passes successfully on tg3 and mlx5
(which does have a 'combined-count').
Fixes: 1cf270424218 ("net: selftest: add test for netdev netlink queue-get API")
Signed-off-by: Joe Damato <jdamato(a)fastly.com>
---
v2:
- Simplify logic and reduce indentation as suggested by David Wei.
Retested on both tg3 and mlx5 and test passes as expected.
v1: https://lore.kernel.org/lkml/20250225181455.224309-1-jdamato@fastly.com/
tools/testing/selftests/drivers/net/queues.py | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/queues.py b/tools/testing/selftests/drivers/net/queues.py
index 38303da957ee..8a518905a9f9 100755
--- a/tools/testing/selftests/drivers/net/queues.py
+++ b/tools/testing/selftests/drivers/net/queues.py
@@ -45,10 +45,9 @@ def addremove_queues(cfg, nl) -> None:
netnl = EthtoolFamily()
channels = netnl.channels_get({'header': {'dev-index': cfg.ifindex}})
- if channels['combined-count'] == 0:
- rx_type = 'rx'
- else:
- rx_type = 'combined'
+ rx_type = 'rx'
+ if channels.get('combined-count', 0) > 0:
+ rx_type = 'combined'
expected = curr_queues - 1
cmd(f"ethtool -L {cfg.dev['ifname']} {rx_type} {expected}", timeout=10)
base-commit: 8d52da23b6c68a0f6bad83959ebb61a2cf623c4e
--
2.43.0
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..7ddde3b657dd6
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.48.1
While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds
access in get_imix_entries' ([2], [3]) and doing some tests and code review
I detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
$ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0
$ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 123 max_pkt_size: 0
Result: OK: min_pkt_size=123
So fix the out-of-bounds access (and some minor findings) and add a simple
proc_net_pktgen selftest...
Patch set splited into part I (now already applied to net-next)
- net: pktgen: replace ENOTSUPP with EOPNOTSUPP
- net: pktgen: enable 'param=value' parsing
- net: pktgen: fix hex32_arg parsing for short reads
- net: pktgen: fix 'rate 0' error handling (return -EINVAL)
- net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
- net: pktgen: fix ctrl interface command parsing
- net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
And part II (this one):
- net: pktgen: use defines for the various dec/hex number parsing digits lengths
- net: pktgen: fix mix of int/long
- net: pktgen: remove extra tmp variable (re-use len instead)
- net: pktgen: remove some superfluous variable initializing
- net: pktgen: fix mpls maximum labels list parsing
- net: pktgen: fix access outside of user given buffer in pktgen_if_write()
- net: pktgen: fix mpls reset parsing
- net: pktgen: remove all superfluous index assignements
- selftest: net: add proc_net_pktgen
Regards,
Peter
Changes v6 -> v7:
- rebased on actual net-next/main
- selftest: net: add proc_net_pktgen
- fixed conflict in tools/testing/selftests/net/config
Changes v5 -> v6:
- add rev-by Simon Horman
- drop patch 'net: pktgen: use defines for the various dec/hex number
parsing digits lengths'
- adjust to dropped patch ''net: pktgen: use defines for the various
dec/hex number parsing digits lengths'
- net: pktgen: fix mix of int/long
- fix line break (suggested by Simon Horman)
Changes v4 -> v5:
- split up patchset into part i/ii (suggested by Simon Horman)
- add rev-by Simon Horman
- net: pktgen: align some variable declarations to the most common pattern
-> net: pktgen: fix mix of int/long
- instead of align to most common pattern (int) adjust all usages to
size_t for i and max and ssize_t for len and adjust function signatures
of hex32_arg(), count_trail_chars(), num_arg() and strn_len() accordingly
- respect reverse xmas tree order for local variable declarations (where
possible without too much code churn)
- update subject line and patch description
- dropped net: pktgen: hex32_arg/num_arg error out in case no characters are
available
- keep empty hex/num arg is implicit assumed as zero value
- dropped net: pktgen: num_arg error out in case no valid character is parsed
- keep empty hex/num arg is implicit assumed as zero value
- Change patch description ('Fixes:' -> 'Addresses the following:',
suggested by Simon Horman)
- net: pktgen: remove all superfluous index assignements
- new patch (suggested by Simon Horman)
- selftest: net: add proc_net_pktgen
- addapt to dropped patch 'net: pktgen: hex32_arg/num_arg error out in case
no characters are available', empty hex/num arg is now implicit assumed as
zero value (instead of failure)
Changes v3 -> v4:
- add rev-by Simon Horman
- new patch 'net: pktgen: use defines for the various dec/hex number parsing
digits lengths' (suggested by Simon Horman)
- replace C99 comment (suggested by Paolo Abeni)
- drop available characters check in strn_len() (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: align some variable declarations to the
most common pattern' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove extra tmp variable (re-use len
instead)' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove some superfluous variable
initializing' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: fix mpls maximum labels list parsing'
(suggested by Paolo Abeni)
- factored out 'net: pktgen: hex32_arg/num_arg error out in case no
characters are available' (suggested by Paolo Abeni)
- factored out 'net: pktgen: num_arg error out in case no valid character
is parsed' (suggested by Paolo Abeni)
Changes v2 -> v3:
- new patch: 'net: pktgen: fix ctrl interface command parsing'
- new patch: 'net: pktgen: fix mpls reset parsing'
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix typo in change description ('v1 -> v1' and tyop)
- rename some vars to better match usage
add_loopback_0 -> thr_cmd_add_loopback_0
rm_loopback_0 -> thr_cmd_rm_loopback_0
wrong_ctrl_cmd -> wrong_thr_cmd
legacy_ctrl_cmd -> legacy_thr_cmd
ctrl_fd -> thr_fd
- add ctrl interface tests
Changes v1 -> v2:
- new patch: 'net: pktgen: fix hex32_arg parsing for short reads'
- new patch: 'net: pktgen: fix 'rate 0' error handling (return -EINVAL)'
- new patch: 'net: pktgen: fix 'ratep 0' error handling (return -EINVAL)'
- net/core/pktgen.c: additional fix get_imix_entries() and get_labels()
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix tyop not vs. nod (suggested by Jakub Kicinski)
- fix misaligned line (suggested by Jakub Kicinski)
- enable fomerly commented out CONFIG_XFRM dependent test (command spi),
as CONFIG_XFRM is enabled via tools/testing/selftests/net/config
CONFIG_XFRM_INTERFACE/CONFIG_XFRM_USER (suggestex by Jakub Kicinski)
- add CONFIG_NET_PKTGEN=m to tools/testing/selftests/net/config
(suggested by Jakub Kicinski)
- add modprobe pktgen to FIXTURE_SETUP() (suggested by Jakub Kicinski)
- fix some checkpatch warnings (Missing a blank line after declarations)
- shrink line length by re-naming some variables (command -> cmd,
device -> dev)
- add 'rate 0' testcase
- add 'ratep 0' testcase
[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@re…
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Peter Seiderer (8):
net: pktgen: fix mix of int/long
net: pktgen: remove extra tmp variable (re-use len instead)
net: pktgen: remove some superfluous variable initializing
net: pktgen: fix mpls maximum labels list parsing
net: pktgen: fix access outside of user given buffer in
pktgen_if_write()
net: pktgen: fix mpls reset parsing
net: pktgen: remove all superfluous index assignements
selftest: net: add proc_net_pktgen
net/core/pktgen.c | 288 ++++----
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/proc_net_pktgen.c | 646 ++++++++++++++++++
4 files changed, 805 insertions(+), 131 deletions(-)
create mode 100644 tools/testing/selftests/net/proc_net_pktgen.c
--
2.48.1
There is a spelling mistake in a sig_print message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/x86/xstate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/xstate.c b/tools/testing/selftests/x86/xstate.c
index 875777911d82..23c1d6c964ea 100644
--- a/tools/testing/selftests/x86/xstate.c
+++ b/tools/testing/selftests/x86/xstate.c
@@ -391,7 +391,7 @@ static void validate_sigfpstate(int sig, siginfo_t *si, void *ctx_void)
if (get_xstatebv(xbuf) & xstate.mask)
sig_print("[OK]\t'xfeatures' in XSAVE header is valid\n");
else
- sig_print("[FAIL]\t'xfeatures' in XSAVE hader is not valid\n");
+ sig_print("[FAIL]\t'xfeatures' in XSAVE header is not valid\n");
if (validate_xstate_same(stashed_xbuf, xbuf))
sig_print("[OK]\txstate delivery was successful\n");
--
2.47.2
Fix grammar such as "number amount of times is
recommended" etc -> "the recommended number of
times".
Signed-off-by: Bharadwaj Raju <bharadwaj.raju777(a)gmail.com>
---
tools/testing/selftests/sysctl/sysctl.sh | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/sysctl/sysctl.sh b/tools/testing/selftests/sysctl/sysctl.sh
index f6e129a82ffd..db1616857d89 100755
--- a/tools/testing/selftests/sysctl/sysctl.sh
+++ b/tools/testing/selftests/sysctl/sysctl.sh
@@ -857,7 +857,7 @@ list_tests()
echo
echo "TEST_ID x NUM_TEST"
echo "TEST_ID: Test ID"
- echo "NUM_TESTS: Number of recommended times to run the test"
+ echo "NUM_TESTS: Recommended number of times to run the test"
echo
echo "0001 x $(get_test_count 0001) - tests proc_dointvec_minmax()"
echo "0002 x $(get_test_count 0002) - tests proc_dostring()"
@@ -884,7 +884,7 @@ usage()
echo "Valid tests: 0001-$MAX_TEST"
echo ""
echo " all Runs all tests (default)"
- echo " -t Run test ID the number amount of times is recommended"
+ echo " -t Run test ID the recommended number of times"
echo " -w Watch test ID run until it runs into an error"
echo " -c Run test ID once"
echo " -s Run test ID x test-count number of times"
@@ -898,7 +898,7 @@ usage()
echo Example uses:
echo
echo "$TEST_NAME.sh -- executes all tests"
- echo "$TEST_NAME.sh -t 0002 -- Executes test ID 0002 number of times is recommended"
+ echo "$TEST_NAME.sh -t 0002 -- Executes test ID 0002 the recommended number of times"
echo "$TEST_NAME.sh -w 0002 -- Watch test ID 0002 run until an error occurs"
echo "$TEST_NAME.sh -s 0002 -- Run test ID 0002 once"
echo "$TEST_NAME.sh -c 0002 3 -- Run test ID 0002 three times"
--
2.48.1
From: Willem de Bruijn <willemb(a)google.com>
Expand IPV6_TCLASS to also cover IP_TOS.
Expand IPV6_HOPLIMIT to also cover IP_TTL.
A series of two patches for basic readability (patch 1 is a noop),
and so that git does not interpret code changes + file rename as
a whole file del + add.
Willem de Bruijn (2):
selftests/net: prepare cmsg_ipv6.sh for ipv4
selftests/net: expand cmsg_ipv6.sh with ipv4
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/cmsg_ip.sh | 184 ++++++++++++++++++++++
tools/testing/selftests/net/cmsg_ipv6.sh | 154 ------------------
tools/testing/selftests/net/cmsg_sender.c | 90 +++++++----
4 files changed, 240 insertions(+), 190 deletions(-)
create mode 100755 tools/testing/selftests/net/cmsg_ip.sh
delete mode 100755 tools/testing/selftests/net/cmsg_ipv6.sh
--
2.48.1.658.g4767266eb4-goog
Commit 14be4e6f3522 ("selftests: vDSO: fix ELF hash table entry size for s390x")
changed the type of the ELF hash table entries to 64bit on s390x.
However the *GNU* hash tables entries are always 32bit.
The "bucket" pointer is shared between both hash algorithms.
On s390x the GNU algorithm assigns and dereferences this pointer to a
64bit value as a pointer to a 32bit value, leading to compiler warnings and
runtime crashes.
Introduce a new dedicated "gnu_bucket" pointer which is used by the GNU hash.
Fixes: e0746bde6f82 ("selftests/vDSO: support DT_GNU_HASH")
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v2:
- Fix wording around the width of pointers vs the pointed-to values
- Link to v1: https://lore.kernel.org/r/20250213-selftests-vdso-s390-gnu-hash-v1-1-ace3bc…
---
tools/testing/selftests/vDSO/parse_vdso.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/vDSO/parse_vdso.c b/tools/testing/selftests/vDSO/parse_vdso.c
index 2fe5e983cb22f1ed066d0310a54f6aef2ed77ed8..f89d052c730eb43eea28d69ca27b56e897503e16 100644
--- a/tools/testing/selftests/vDSO/parse_vdso.c
+++ b/tools/testing/selftests/vDSO/parse_vdso.c
@@ -53,7 +53,7 @@ static struct vdso_info
/* Symbol table */
ELF(Sym) *symtab;
const char *symstrings;
- ELF(Word) *gnu_hash;
+ ELF(Word) *gnu_hash, *gnu_bucket;
ELF_HASH_ENTRY *bucket, *chain;
ELF_HASH_ENTRY nbucket, nchain;
@@ -185,8 +185,8 @@ void vdso_init_from_sysinfo_ehdr(uintptr_t base)
/* The bucket array is located after the header (4 uint32) and the bloom
* filter (size_t array of gnu_hash[2] elements).
*/
- vdso_info.bucket = vdso_info.gnu_hash + 4 +
- sizeof(size_t) / 4 * vdso_info.gnu_hash[2];
+ vdso_info.gnu_bucket = vdso_info.gnu_hash + 4 +
+ sizeof(size_t) / 4 * vdso_info.gnu_hash[2];
} else {
vdso_info.nbucket = hash[0];
vdso_info.nchain = hash[1];
@@ -268,11 +268,11 @@ void *vdso_sym(const char *version, const char *name)
if (vdso_info.gnu_hash) {
uint32_t h1 = gnu_hash(name), h2, *hashval;
- i = vdso_info.bucket[h1 % vdso_info.nbucket];
+ i = vdso_info.gnu_bucket[h1 % vdso_info.nbucket];
if (i == 0)
return 0;
h1 |= 1;
- hashval = vdso_info.bucket + vdso_info.nbucket +
+ hashval = vdso_info.gnu_bucket + vdso_info.nbucket +
(i - vdso_info.gnu_hash[1]);
for (;; i++) {
ELF(Sym) *sym = &vdso_info.symtab[i];
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250213-selftests-vdso-s390-gnu-hash-7206671abc85
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
The implementation is limited and only supports numeric arguments.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Return __LINE__ from different testcases to directly point to the
failed testcase
- Add some comments
- Expand commit message
- Link to v1: https://lore.kernel.org/r/20240731-nolibc-scanf-v1-0-f71bcc4abb9e@weissschu…
---
Thomas Weißschuh (2):
tools/nolibc: add support for [v]sscanf()
Revert "selftests: kselftest: Fix build failure with NOLIBC"
tools/include/nolibc/stdio.h | 98 ++++++++++++++++++++++++++++
tools/testing/selftests/kselftest.h | 5 --
tools/testing/selftests/nolibc/nolibc-test.c | 68 +++++++++++++++++++
3 files changed, 166 insertions(+), 5 deletions(-)
---
base-commit: 665fa8dea90d9fbc0e7137c7e1315d6f7e15757e
change-id: 20240414-nolibc-scanf-f1db6930d0c6
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi all,
This patchset adds a new buddy allocator like (or non-uniform) large folio
split from a order-n folio to order-m with m < n. It reduces
1. the total number of after-split folios from 2^(n-m) to n-m+1;
2. the amount of memory needed for multi-index xarray split from 2^(n/6-m/6) to
n/6-m/6, assuming XA_CHUNK_SHIFT=6;
3. keep more large folios after a split from all order-m folios to
order-(n-1) to order-m folios.
For example, to split an order-9 to order-0, folio split generates 10
(or 11 for anonymous memory) folios instead of 512, allocates 1 xa_node
instead of 8, and leaves 1 order-8, 1 order-7, ..., 1 order-1 and 2 order-0
folios (or 4 order-0 for anonymous memory) instead of 512 order-0 folios.
It is on top of mm-everything-2025-02-15-05-49 with V7 reverted. It is ready to
be merged.
Instead of duplicating existing split_huge_page*() code, __folio_split()
is introduced as the shared backend code for both
split_huge_page_to_list_to_order() and folio_split(). __folio_split()
can support both uniform split and buddy allocator like (or non-uniform) split.
All existing split_huge_page*() users can be gradually converted to use
folio_split() if possible. In this patchset, I converted
truncate_inode_partial_folio() to use folio_split().
xfstests quick group passed for both tmpfs and xfs.
Changelog
===
From V7[9]:
1. Fixed a wrong function name in lib/test_xarray.c.
2. Made __split_folio_to_order() never fail, since the old order check
is already done in __folio_split(). (per David Hildenbrand)
3. Fixed an issue reported by syzbot[10] by not dropping the original
folio during truncate.
4. Fixed a WARNING when READ_ONLY_THP_FOR_FS is enabled. (Thank David
Hildenbrand for reporting the issue)
5. Used two separate struct page* parameters, split_at and lock_at, to
specify at which subpage the non-uniform split happens and which subpage
to keep locked after the split, respectively. It improves code
readability.
From V6[8]:
1. Added an xarray function xas_try_split() to support iterative folio split,
removing the need of using xas_split_alloc() and xas_split(). The
function guarantees that at most one xa_node is allocated for each
call.
2. Added concrete numbers of after-split folios and xa_node savings to
cover letter, commit log. (per Andrew)
From V5[7]:
1. Split shmem to any lower order patches are in mm tree, so dropped
from this series.
2. Rename split_folio_at() to try_folio_split() to clarify that
non-uniform split will not be used if it is not supported.
From V4[6]:
1. Enabled shmem support in both uniform and buddy allocator like split
and added selftests for it.
2. Added functions to check if uniform split and buddy allocator like
split are supported for the given folio and order.
3. Made truncate fall back to uniform split if buddy allocator split is
not supported (CONFIG_READ_ONLY_THP_FOR_FS and FS without large folio).
4. Added the missing folio_clear_has_hwpoisoned() to
__split_unmapped_folio().
From V3[5]:
1. Used xas_split_alloc(GFP_NOWAIT) instead of xas_nomem(), since extra
operations inside xas_split_alloc() are needed for correctness.
2. Enabled folio_split() for shmem and no issue was found with xfstests
quick test group.
3. Split both ends of a truncate range in truncate_inode_partial_folio()
to avoid wasting memory in shmem truncate (per David Hildenbrand).
4. Removed page_in_folio_offset() since page_folio() does the same
thing.
5. Finished truncate related tests from xfstests quick test group on XFS and
tmpfs without issues.
6. Disabled buddy allocator like split on CONFIG_READ_ONLY_THP_FOR_FS
and FS without large folio. This check was missed in the prior
versions.
From V2[3]:
1. Incorporated all the feedback from Kirill[4].
2. Used GFP_NOWAIT for xas_nomem().
3. Tested the code path when xas_nomem() fails.
4. Added selftests for folio_split().
5. Fixed no THP config build error.
From V1[2]:
1. Split the original patch 1 into multiple ones for easy review (per
Kirill).
2. Added xas_destroy() to avoid memory leak.
3. Fixed nr_dropped not used error (per kernel test robot).
4. Added proper error handling when xas_nomem() fails to allocate memory
for xas_split() during buddy allocator like split.
From RFC[1]:
1. Merged backend code of split_huge_page_to_list_to_order() and
folio_split(). The same code is used for both uniform split and buddy
allocator like split.
2. Use xas_nomem() instead of xas_split_alloc() for folio_split().
3. folio_split() now leaves the first after-split folio unlocked,
instead of the one containing the given page, since
the caller of truncate_inode_partial_folio() locks and unlocks the
first folio.
4. Extended split_huge_page debugfs to use folio_split().
5. Added truncate_inode_partial_folio() as first user of folio_split().
Design
===
folio_split() splits a large folio in the same way as buddy allocator
splits a large free page for allocation. The purpose is to minimize the
number of folios after the split. For example, if user wants to free the
3rd subpage in a order-9 folio, folio_split() will split the order-9 folio
as:
O-0, O-0, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-8 if it is anon
O-1, O-0, O-0, O-2, O-3, O-4, O-5, O-6, O-7, O-9 if it is pagecache
Since anon folio does not support order-1 yet.
The split process is similar to existing approach:
1. Unmap all page mappings (split PMD mappings if exist);
2. Split meta data like memcg, page owner, page alloc tag;
3. Copy meta data in struct folio to sub pages, but instead of spliting
the whole folio into multiple smaller ones with the same order in a
shot, this approach splits the folio iteratively. Taking the example
above, this approach first splits the original order-9 into two order-8,
then splits left part of order-8 to two order-7 and so on;
4. Post-process split folios, like write mapping->i_pages for pagecache,
adjust folio refcounts, add split folios to corresponding list;
5. Remap split folios
6. Unlock split folios.
__split_unmapped_folio() and __split_folio_to_order() replace
__split_huge_page() and __split_huge_page_tail() respectively.
__split_unmapped_folio() uses different approaches to perform
uniform split and buddy allocator like split:
1. uniform split: one single call to __split_folio_to_order() is used to
uniformly split the given folio. All resulting folios are put back to
the list after split. The folio containing the given page is left to
caller to unlock and others are unlocked.
2. buddy allocator like (or non-uniform) split: (old_order - new_order) calls
to __split_folio_to_order() are used to split the given folio at order N to
order N-1. After each call, the target folio is changed to the one
containing the page, which is given as a folio_split() parameter.
After each call, folios not containing the page are put back to the list.
The folio containing the page is put back to the list when its order
is new_order. All folios are unlocked except the first folio, which
is left to caller to unlock.
Patch Overview
===
1. Patch 1 added a new xarray function xas_try_split() to perform
iterative xarray split.
2. Patch 2 added __split_unmapped_folio() and __split_folio_to_order() to
prepare for moving to new backend split code.
3. Patch 3 moved common code in split_huge_page_to_list_to_order() to
__folio_split().
4. Patch 4 added new folio_split() and made
split_huge_page_to_list_to_order() share the new
__split_unmapped_folio() with folio_split().
5. Patch 5 removed no longer used __split_huge_page() and
__split_huge_page_tail().
6. Patch 6 added a new in_folio_offset to split_huge_page debugfs for
folio_split() test.
7. Patch 7 used try_folio_split() for truncate operation.
8. Patch 8 added folio_split() tests.
Any comments and/or suggestions are welcome. Thanks.
[1] https://lore.kernel.org/linux-mm/20241008223748.555845-1-ziy@nvidia.com/
[2] https://lore.kernel.org/linux-mm/20241028180932.1319265-1-ziy@nvidia.com/
[3] https://lore.kernel.org/linux-mm/20241101150357.1752726-1-ziy@nvidia.com/
[4] https://lore.kernel.org/linux-mm/e6ppwz5t4p4kvir6eqzoto4y5fmdjdxdyvxvtw43nc…
[5] https://lore.kernel.org/linux-mm/20241205001839.2582020-1-ziy@nvidia.com/
[6] https://lore.kernel.org/linux-mm/20250106165513.104899-1-ziy@nvidia.com/
[7] https://lore.kernel.org/linux-mm/20250116211042.741543-1-ziy@nvidia.com/
[8] https://lore.kernel.org/linux-mm/20250205031417.1771278-1-ziy@nvidia.com/
[9] https://lore.kernel.org/linux-mm/20250211155034.268962-1-ziy@nvidia.com/
[10] https://lore.kernel.org/all/67af65cb.050a0220.21dd3.004a.GAE@google.com/
Zi Yan (8):
xarray: add xas_try_split() to split a multi-index entry
mm/huge_memory: add two new (not yet used) functions for folio_split()
mm/huge_memory: move folio split common code to __folio_split()
mm/huge_memory: add buddy allocator like (non-uniform) folio_split()
mm/huge_memory: remove the old, unused __split_huge_page()
mm/huge_memory: add folio_split() to debugfs testing interface
mm/truncate: use buddy allocator like folio split for truncate
operation
selftests/mm: add tests for folio_split(), buddy allocator like split
Documentation/core-api/xarray.rst | 14 +-
include/linux/huge_mm.h | 36 +
include/linux/xarray.h | 7 +
lib/test_xarray.c | 47 ++
lib/xarray.c | 138 +++-
mm/huge_memory.c | 756 ++++++++++++------
mm/truncate.c | 31 +-
tools/testing/radix-tree/Makefile | 1 +
.../selftests/mm/split_huge_page_test.c | 34 +-
9 files changed, 783 insertions(+), 281 deletions(-)
--
2.47.2
The uprobe events test fails on s390, but also on x86 (Fedora 41). The
problem appears to be that there is an assumption that adding a uprobe to
the beginning of the executable mapping of /bin/sh is sufficient to trigger
a uprobe event when /bin/sh is executed.
This assumption is not necessarily true. Therefore use "readelf -h" to find
the entry point address of /bin/sh and use this address when adding the
uprobe event.
This adds a dependency to readelf which is not always installed. Therefore
add a check and exit with exit_unresolved if it is not installed.
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
---
.../ftrace/test.d/dynevent/add_remove_uprobe.tc | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_uprobe.tc b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_uprobe.tc
index 86c76679c56e..f2048c244526 100644
--- a/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_uprobe.tc
+++ b/tools/testing/selftests/ftrace/test.d/dynevent/add_remove_uprobe.tc
@@ -3,14 +3,18 @@
# description: Generic dynamic event - add/remove/test uprobe events
# requires: uprobe_events
+if ! which readelf > /dev/null 2>&1 ; then
+ echo "No readelf found. skipped."
+ exit_unresolved
+fi
+
echo 0 > events/enable
echo > dynamic_events
REALBIN=`readlink -f /bin/sh`
+ENTRYPOINT=`readelf -h ${REALBIN} | grep Entry | sed -e 's/[^0]*//'`
-echo 'cat /proc/$$/maps' | /bin/sh | \
- grep "r-xp .*${REALBIN}$" | \
- awk '{printf "p:myevent %s:0x%s\n", $6,$3 }' >> uprobe_events
+echo "p:myevent ${REALBIN}:${ENTRYPOINT}" >> uprobe_events
grep -q myevent uprobe_events
test -d events/uprobes/myevent
--
2.45.2
Hi,
thank you for your reviw. As promised, here is V3 of this patch series.
I noticed that the updated selftests were flaky sometimes due to the kernel
networking stack sending IPv6 multicast listener reports on the created
test interfaces.
This can be seen here:
https://github.com/kernel-patches/bpf/actions/runs/13449071153/job/37580497…
Setting the NOARP flag on the interfaces should fix this race condition.
Successful pipeline:
https://github.com/kernel-patches/bpf/actions/runs/13500667544
Signed-off-by: Marcus Wichelmann <marcus.wichelmann(a)hetzner-cloud.de>
Acked-by: Jason Wang <jasowang(a)redhat.com>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
---
v3:
- change the condition to handle xdp_buffs without metadata support, as
suggested by Willem de Bruijn <willemb(a)google.com>
- add clarifying comment why that condition is needed
- set NOARP flag in selftests to ensure that the kernel does not send
packets on the test interfaces that may interfere with the tests
v2: https://lore.kernel.org/bpf/20250217172308.3291739-1-marcus.wichelmann@hetz…
- submit against bpf-next subtree
- split commits and improved commit messages
- remove redundant metasize check and add clarifying comment instead
- use max() instead of ternary operator
- add selftest for metadata support in the tun driver
v1: https://lore.kernel.org/all/20250130171614.1657224-1-marcus.wichelmann@hetz…
Marcus Wichelmann (6):
net: tun: enable XDP metadata support
net: tun: enable transfer of XDP metadata to skb
selftests/bpf: move open_tuntap to network helpers
selftests/bpf: refactor xdp_context_functional test and bpf program
selftests/bpf: add test for XDP metadata support in tun driver
selftests/bpf: fix file descriptor assertion in open_tuntap helper
drivers/net/tun.c | 28 ++-
tools/testing/selftests/bpf/network_helpers.c | 28 +++
tools/testing/selftests/bpf/network_helpers.h | 3 +
.../selftests/bpf/prog_tests/lwt_helpers.h | 29 ----
.../bpf/prog_tests/xdp_context_test_run.c | 163 ++++++++++++++++--
.../selftests/bpf/progs/test_xdp_meta.c | 56 +++---
6 files changed, 230 insertions(+), 77 deletions(-)
--
2.43.0
This patch series introduces changes to add default build support for
the sched tests in selftests.
The only test under sched is cs_prctl_test which validates cookies when
core scheduling is in effect. This test fails on systems where core
scheduling is disabled. The patch series also modifies this behaviour to
gracefully skip the test on such systems.
A system with core scheduling disabled would skip the test like:
~# ./run_kselftest.sh
TAP version 13
1..1
timeout set to 45
selftests: sched: cs_prctl_test
prctl failed: Invalid argument
Core sched not supported, hence skipping tests
ok 1 selftests: sched: cs_prctl_test # SKIP
Signed-off-by: Sinadin Shan <sinadin.shan(a)oracle.com>
---
v3:
* Use prctl to check core sched support instead of config
* v2 link: https://lore.kernel.org/all/20250221115750.631990-1-sinadin.shan@oracle.com/
v2:
* Add patch to skip cs_prctl_test on core scheduling disabled systems
* v1 link: https://lore.kernel.org/all/20250219064658.449069-1-sinadin.shan@oracle.com
---
Sinadin Shan (2):
selftests: sched: add sched as a default selftest target
selftests: sched: skip cs_prctl_test for systems with core scheduling
disabled
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/sched/cs_prctl_test.c | 34 ++++++++++++++++++-
2 files changed, 34 insertions(+), 1 deletion(-)
--
2.43.5
v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===
v5 has no major changes; it clears up the relatively minor issues
pointed out to in v4, and rebases the series on top of net-next to
resolve the conflict with a patch that raced to the tree. It also
collects the review tags from v4.
Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).
v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===
v4 mainly addresses the critical driver support issue surfaced in v3 by
Paolo and Stan. Drivers aiming to support netmem_tx should make sure not
to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs
may come from dma-bufs.
Additionally other feedback from v3 is addressed.
Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for
netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the
tx binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).
v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===
Address minor comments from RFCv2 and fix a few build warnings and
ynl-regen issues. No major changes.
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======
RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.
Major changes:
--------------
- much improved UAPI as suggested by Stan. We now interpret the iov_base
of the passed in iov from userspace as the offset into the dmabuf to
send from. This removes the need to set iov.iov_base = NULL which may
be confusing to users, and enables us to send multiple iovs in the
same sendmsg() call. ncdevmem and the docs show a sample use of that.
- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
this is good improvment as it was confusing to keep track of
2 iterators for the same sendmsg, and mistracking both iterators
caused a couple of bugs reported in the last iteration that are now
resolved with this streamlining.
- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested,
and sending multiple iovs in the same sendmsg() is tested.
- Fixed issue where dmabuf unmapping was happening in invalid context
(Stan).
====================================================================
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Full outline on usage of the TX path is detailed in the documentation
included with this series.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
1. Add netmem refcounting needed for the TX path.
2. Devmem TX netlink API.
3. Devmem TX net stack implementation.
4. Make dma-buf unbinding scheduled work to handle TX cases where it gets
freed from contexts where we can't sleep.
5. Add devmem TX documentation.
6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver
feature flag, and docs to enable drivers to declare netmem_tx support.
7. Guard netmem_tx against being enabled against drivers that don't
support it.
8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
devmem.py.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Mina Almasry (8):
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
net: devmem: make dmabuf unbinding scheduled work
net: add devmem TCP TX documentation
net: enable driver support for netmem TX
gve: add netmem TX support to GVE DQO-RDA mode
net: check for driver support in netmem TX
selftests: ncdevmem: Implement devmem TCP TX
Stanislav Fomichev (1):
net: devmem: TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 150 ++++++++-
.../networking/net_cachelines/net_device.rst | 1 +
Documentation/networking/netdev-features.rst | 5 +
Documentation/networking/netmem.rst | 23 +-
drivers/net/ethernet/google/gve/gve_main.c | 4 +
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +-
include/linux/netdevice.h | 2 +
include/linux/skbuff.h | 17 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 23 ++
include/net/sock.h | 1 +
include/uapi/linux/netdev.h | 1 +
net/core/datagram.c | 48 ++-
net/core/dev.c | 3 +
net/core/devmem.c | 113 ++++++-
net/core/devmem.h | 69 +++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 73 ++++-
net/core/skbuff.c | 48 ++-
net/core/sock.c | 6 +
net/ipv4/ip_output.c | 3 +-
net/ipv4/tcp.c | 46 ++-
net/ipv6/ip6_output.c | 3 +-
net/vmw_vsock/virtio_transport_common.c | 5 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/devmem.py | 26 +-
.../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++-
29 files changed, 938 insertions(+), 71 deletions(-)
base-commit: b66e19dcf684b21b6d3a1844807bd1df97ad197a
--
2.48.1.601.g30ceb7b040-goog
The first patch fixes the incorrect locks using in bond driver.
The second patch fixes the xfrm offload feature during setup active-backup
mode. The third patch add a ipsec offload testing.
v2: move the mutex lock to a work queue (Cosmin Ratiu)
Hangbin Liu (3):
bonding: move mutex lock to a work queue for XFRM GC tasks
bonding: fix xfrm offload feature setup on active-backup mode
selftests: bonding: add ipsec offload test
drivers/net/bonding/bond_main.c | 43 +++--
drivers/net/bonding/bond_netlink.c | 16 +-
include/net/bonding.h | 7 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_ipsec_offload.sh | 155 ++++++++++++++++++
.../selftests/drivers/net/bonding/config | 4 +
6 files changed, 209 insertions(+), 19 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
--
2.46.0
The page allocator does a lot of stuff that is not visible to the user
in any deterministic way. But this stuff is still important and it would
be nice to test that behaviour.
KUnit is a tool for unit-testing kernel-internal APIs. This is an
attempt to adopt it the page allocator.
I have been hacking on this as a way to try and test the code I'm
writing for my ASI page_alloc integration proposal [0]. It's been
extremely useful to be able to "just call it and see what it does". So I
wanna gather some feedback on whether this basic idea is of interest
before I invest too much more time in it.
You can run these tests like this:
tools/testing/kunit/kunit.py run \
--arch=x86_64 --kernel_args="movablecore=2G" \
--qemu_args="-m 4G" --kunitconfig mm/.kunitconfig
Unit-testing code that has mutable global variables can be a pain.
Unit-testing code with mutable global variables _that can change
concurrently with the tests_ is basically impossible. So, we need some
way to isolate an "instance" of the allocator that doesn't refer to any
such concurrently-mutated state.
Luckily, the allocator only has one really important global variable:
node_data. So, the approach here is to carve out a subset of that
variable which is as isolated as possible from the rest of rthe system,
which can be used for deterministic testing. This is achieved by crating
a fake "isolated" node at boot, and plugging in memory at test init
time.
This is an RFC and not a PATCH because:
1. I have not taken much care to ensure the isolation is complete.
There are probably sources of flakiness and nondeterminism in here.
2. I suspect the the basic idea might be over-complicated: do we really
need memory hotplug here? Do we even need the instance of the
allocator we're testing to actual memory behind the pages it's
allocating, or could we just hallucinate a new region of vmemmap
without any of that awkwardness?
One significant downside of relying on memory hotplug is that the
test won't run if we can't hotplug anything out. That means you have
to fiddle with the platform to even run the tests - see the
--kernel_args and --qemu_args I had to add to my kunit.py command
above.
So yeah, other suggestions welcome.
2b. I'm not very confident I'm using the hotplug API properly.
3. There's no point in merging this without actually having at least a
few tests that are actually interesting!
Maybe a "build it and they will come" approach can be justified to
some extent, but there's a nonzero cost to the infrastructure so we
should probably have some confidence that they will indeed come.
[0] https://lore.kernel.org/linux-mm/20250129144320.2675822-1-jackmanb@google.c…
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
Brendan Jackman (4):
kunit: Allocate assertion data with GFP_ATOMIC
mm/page_alloc_test: Add empty KUnit boilerplate
mm/page_alloc_test: Add logic to isolate a node for testing
mm/page_alloc_test: Add smoke-test for page allocation
drivers/base/memory.c | 5 +-
include/linux/memory.h | 4 +
include/linux/nodemask.h | 13 +++
kernel/kthread.c | 3 +
lib/kunit/assert.c | 2 +-
lib/kunit/resource.c | 2 +-
lib/kunit/test.c | 2 +-
mm/.kunitconfig | 10 ++
mm/Kconfig | 8 ++
mm/Makefile | 2 +
mm/internal.h | 11 ++
mm/memory_hotplug.c | 26 +++--
mm/numa_memblks.c | 22 ++++
mm/page_alloc.c | 37 +++++-
mm/page_alloc_test.c | 296 +++++++++++++++++++++++++++++++++++++++++++++++
15 files changed, 429 insertions(+), 14 deletions(-)
---
base-commit: d082ecbc71e9e0bf49883ee4afd435a77a5101b6
change-id: 20250219-page-alloc-kunit-df76ef8d8eb9
Best regards,
--
Brendan Jackman <jackmanb(a)google.com>
Hi all,
This series proposes a rework of xstate-related tests to improve
maintainability and expand test coverage.
== Motivation: Addressing Missing and New XSTATE Tests ==
With the introduction of AMX, a new test suite [1] was created to verify
dynamic state handling by the kernel as observed from userspace. However,
previous tests for non-dynamic states like AVX lacked ABI validation,
leaving gaps in coverage. While these states currently function without
major issues (following the alternate sigstack fix [2]), xstate testing
in the x86 selftest suite has been largely overlooked.
Now, with Intel introducing another extended state, Advanced Performance
Extensions (APX) [3], a correspondent test case is need. The APX enabling
series will follow shortly and will leverage this refactored selftest
framework.
== Selftest Code Rework ==
To ensure ABI validation and core functionality across various xstates,
refactoring the test code is necessary. Without this, existing code from
amx.c would need to be duplicated, compromising the structural quality of
xstate tests.
This series introduces a shared test framework for extended state
validation, applicable to both existing and new xstates. The test cases
cover:
* Context switching
* ABI compatibility for signal handling
* ABI compatibility for ptrace interactions
== Patch Organization ==
The patchset is structured as follows:
* PATCH1: Preparatory cleanup — removing redundant signal handler
registration code.
* PATCH2/3: Introduce low-level XSAVE helpers and xstate component
enumeration.
* PATCH4/5: Refactor existing test code.
* PATCH6: Introduce a new signal test case.
* PATCH7/8: Consolidate test invocations and clarify the list of
supported features.
* PATCH9: Add test coverage for AVX.
== Coverage and Future Work Considerations ==
Currently, these tests are aligned with 64-bit mode only. Support for
32-bit cases will be considered when necessary, but only after this phase
of rework is finalized.
FWIW, the AMX TILECFG state is trivial, requiring almost constant values.
Additionally, various PKRU tests are already established in
tools/selftests/mm.
This series is based on the tip/master branch. You can also find it in
the following repository:
git://github.com/intel/apx.git selftest-xstate_v1
Thanks,
Chang
[1] https://lore.kernel.org/all/20211026122523.AFB99C1F@davehans-spike.ostc.int…
[2] https://lore.kernel.org/lkml/20210518200320.17239-1-chang.seok.bae@intel.co…
[3] https://www.intel.com/content/www/us/en/developer/articles/technical/advanc…
Chang S. Bae (9):
selftests/x86: Consolidate redundant signal helper functions
selftests/x86/xstate: Refactor XSAVE helpers for general use
selftests/x86/xstate: Enumerate and name xstate components
selftests/x86/xstate: Refactor context switching test
selftests/x86/xstate: Refactor ptrace ABI test
selftests/x86/xstate: Introduce signal ABI test
selftests/x86/xstate: Consolidate test invocations into a single entry
selftests/x86/xstate: Clarify supported xstates
selftests/x86/avx: Add AVX test
tools/testing/selftests/x86/Makefile | 6 +-
tools/testing/selftests/x86/amx.c | 442 +---------------
tools/testing/selftests/x86/avx.c | 12 +
.../selftests/x86/corrupt_xstate_header.c | 14 +-
tools/testing/selftests/x86/entry_from_vm86.c | 24 +-
tools/testing/selftests/x86/fsgsbase.c | 24 +-
tools/testing/selftests/x86/helpers.h | 28 +
tools/testing/selftests/x86/ioperm.c | 25 +-
tools/testing/selftests/x86/iopl.c | 25 +-
tools/testing/selftests/x86/ldt_gdt.c | 18 +-
tools/testing/selftests/x86/mov_ss_trap.c | 14 +-
tools/testing/selftests/x86/ptrace_syscall.c | 24 +-
tools/testing/selftests/x86/sigaltstack.c | 26 +-
tools/testing/selftests/x86/sigreturn.c | 24 +-
.../selftests/x86/single_step_syscall.c | 22 -
.../testing/selftests/x86/syscall_arg_fault.c | 12 -
tools/testing/selftests/x86/syscall_nt.c | 12 -
tools/testing/selftests/x86/sysret_rip.c | 24 +-
tools/testing/selftests/x86/test_vsyscall.c | 13 -
tools/testing/selftests/x86/unwind_vdso.c | 12 -
tools/testing/selftests/x86/xstate.c | 477 ++++++++++++++++++
tools/testing/selftests/x86/xstate.h | 195 +++++++
22 files changed, 753 insertions(+), 720 deletions(-)
create mode 100644 tools/testing/selftests/x86/avx.c
create mode 100644 tools/testing/selftests/x86/xstate.c
create mode 100644 tools/testing/selftests/x86/xstate.h
base-commit: 5bff053d066ba892464995ae4a7246f7a78fce2d
--
2.45.2
v1/v2:
There is only the first patch: RISC-V: Enable cbo.clean/flush in usermode,
which mainly removes the enabling of cbo.inval in user mode.
v3:
Add the functionality of Expose Zicbom and selftests for Zicbom.
v4:
Modify the order of macros, The test_no_cbo_inval function is added
separately.
v5:
1. Modify the order of RISCV_HWPROBE_KEY_ZICBOM_BLOCK_SIZE in hwprobe.rst
2. "TEST_NO_ZICBOINVAL" -> "TEST_NO_CBO_INVAL"
Yunhui Cui (3):
RISC-V: Enable cbo.clean/flush in usermode
RISC-V: hwprobe: Expose Zicbom extension and its block size
RISC-V: selftests: Add TEST_ZICBOM into CBO tests
Documentation/arch/riscv/hwprobe.rst | 6 ++
arch/riscv/include/asm/hwprobe.h | 2 +-
arch/riscv/include/uapi/asm/hwprobe.h | 2 +
arch/riscv/kernel/cpufeature.c | 8 +++
arch/riscv/kernel/sys_hwprobe.c | 6 ++
tools/testing/selftests/riscv/hwprobe/cbo.c | 66 +++++++++++++++++----
6 files changed, 78 insertions(+), 12 deletions(-)
--
2.39.2
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..7ddde3b657dd6
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.48.1
Some distributions may not enable MPTCP by default. All other MPTCP tests
source mptcp_lib.sh to ensure MPTCP is enabled before testing. However,
the ip_local_port_range test is the only one that does not include this
step.
Let's also ensure MPTCP is enabled in netns for ip_local_port_range so
that it passes on all distributions.
Suggested-by: Davide Caratti <dcaratti(a)redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
tools/testing/selftests/net/ip_local_port_range.sh | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/ip_local_port_range.sh b/tools/testing/selftests/net/ip_local_port_range.sh
index 6c6ad346eaa0..4ff746db1256 100755
--- a/tools/testing/selftests/net/ip_local_port_range.sh
+++ b/tools/testing/selftests/net/ip_local_port_range.sh
@@ -2,4 +2,6 @@
# SPDX-License-Identifier: GPL-2.0
./in_netns.sh \
- sh -c 'sysctl -q -w net.ipv4.ip_local_port_range="40000 49999" && ./ip_local_port_range'
+ sh -c 'sysctl -q -w net.mptcp.enabled=1 && \
+ sysctl -q -w net.ipv4.ip_local_port_range="40000 49999" && \
+ ./ip_local_port_range'
--
2.46.0
Some drivers, like tg3, do not set combined-count:
$ ethtool -l enp4s0f1
Channel parameters for enp4s0f1:
Pre-set maximums:
RX: 4
TX: 4
Other: n/a
Combined: n/a
Current hardware settings:
RX: 4
TX: 1
Other: n/a
Combined: n/a
In the case where combined-count is not set, the ethtool netlink code
in the kernel elides the value and the code in the test:
netnl.channels_get(...)
With a tg3 device, the returned dictionary looks like:
{'header': {'dev-index': 3, 'dev-name': 'enp4s0f1'},
'rx-max': 4,
'rx-count': 4,
'tx-max': 4,
'tx-count': 1}
Note that the key 'combined-count' is missing. As a result of this
missing key the test raises an exception:
# Exception| if channels['combined-count'] == 0:
# Exception| ~~~~~~~~^^^^^^^^^^^^^^^^^^
# Exception| KeyError: 'combined-count'
Change the test to check if 'combined-count' is a key in the dictionary
first and if not assume that this means the driver has separate RX and
TX queues.
With this change, the test now passes successfully on tg3 and mlx5
(which does have a 'combined-count').
Fixes: 1cf270424218 ("net: selftest: add test for netdev netlink queue-get API")
Signed-off-by: Joe Damato <jdamato(a)fastly.com>
---
tools/testing/selftests/drivers/net/queues.py | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/queues.py b/tools/testing/selftests/drivers/net/queues.py
index 38303da957ee..baa8845d9f64 100755
--- a/tools/testing/selftests/drivers/net/queues.py
+++ b/tools/testing/selftests/drivers/net/queues.py
@@ -45,10 +45,13 @@ def addremove_queues(cfg, nl) -> None:
netnl = EthtoolFamily()
channels = netnl.channels_get({'header': {'dev-index': cfg.ifindex}})
- if channels['combined-count'] == 0:
- rx_type = 'rx'
+ if 'combined-count' in channels:
+ if channels['combined-count'] == 0:
+ rx_type = 'rx'
+ else:
+ rx_type = 'combined'
else:
- rx_type = 'combined'
+ rx_type = 'rx'
expected = curr_queues - 1
cmd(f"ethtool -L {cfg.dev['ifname']} {rx_type} {expected}", timeout=10)
base-commit: bc50682128bde778a1ddc457a02d92a637c20c6f
--
2.43.0
Fix three DAMON selftest bugs that causes two and one false positive
failures and success.
SeongJae Park (3):
selftests/damon/damos_quota: make real expectation of quota exceeds
selftests/damon/damon_nr_regions: set ops update for merge results
check to 100ms
selftests/damon/damon_nr_regions: sort collected regiosn before
checking with min/max boundaries
tools/testing/selftests/damon/damon_nr_regions.py | 2 ++
tools/testing/selftests/damon/damos_quota.py | 9 ++++++---
2 files changed, 8 insertions(+), 3 deletions(-)
base-commit: 0ab548cd0961a01f9ef65aa999ca84febcdb04ab
--
2.39.5
GRO tests are timing dependent and can easily flake. This is partially
mitigated in gro.sh by giving each subtest 3 chances to pass. However,
this still flakes on some machines.
Set the device's napi_defer_hard_irqs to 50 so that GRO is less likely
to immediately flush. This already happened in setup_loopback.sh, but
wasn't added to setup_veth.sh. This accounts for most of the reduction
in flakiness.
We also increase the number of chances for success from 3 to 6.
`gro.sh -t <test>` now returns a passing/failing exit code as expected.
gro.c:main no longer erroneously claims a test passes when running as a
server.
Tested: Ran `gro.sh -t large` 100 times with and without this change.
Passed 100/100 with and 64/100 without. Ran inside strace to increase
flakiness.
Signed-off-by: Kevin Krakauer <krakauer(a)google.com>
---
tools/testing/selftests/net/gro.c | 8 +++++---
tools/testing/selftests/net/gro.sh | 5 +++--
tools/testing/selftests/net/setup_veth.sh | 1 +
3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/gro.c b/tools/testing/selftests/net/gro.c
index b2184847e388..d5824eadea10 100644
--- a/tools/testing/selftests/net/gro.c
+++ b/tools/testing/selftests/net/gro.c
@@ -1318,11 +1318,13 @@ int main(int argc, char **argv)
read_MAC(src_mac, smac);
read_MAC(dst_mac, dmac);
- if (tx_socket)
+ if (tx_socket) {
gro_sender();
- else
+ } else {
+ /* Only the receiver exit status determines test success. */
gro_receiver();
+ fprintf(stderr, "Gro::%s test passed.\n", testname);
+ }
- fprintf(stderr, "Gro::%s test passed.\n", testname);
return 0;
}
diff --git a/tools/testing/selftests/net/gro.sh b/tools/testing/selftests/net/gro.sh
index 02c21ff4ca81..703173f8c8a9 100755
--- a/tools/testing/selftests/net/gro.sh
+++ b/tools/testing/selftests/net/gro.sh
@@ -21,7 +21,7 @@ run_test() {
# Each test is run 3 times to deflake, because given the receive timing,
# not all packets that should coalesce will be considered in the same flow
# on every try.
- for tries in {1..3}; do
+ for tries in {1..6}; do
# Actual test starts here
ip netns exec $server_ns ./gro "${ARGS[@]}" "--rx" "--iface" "server" \
1>>log.txt &
@@ -100,5 +100,6 @@ trap cleanup EXIT
if [[ "${test}" == "all" ]]; then
run_all_tests
else
- run_test "${proto}" "${test}"
+ exit_code=$(run_test "${proto}" "${test}")
+ exit $exit_code
fi;
diff --git a/tools/testing/selftests/net/setup_veth.sh b/tools/testing/selftests/net/setup_veth.sh
index 1f78a87f6f37..9882ad730c24 100644
--- a/tools/testing/selftests/net/setup_veth.sh
+++ b/tools/testing/selftests/net/setup_veth.sh
@@ -12,6 +12,7 @@ setup_veth_ns() {
[[ -e /var/run/netns/"${ns_name}" ]] || ip netns add "${ns_name}"
echo 1000000 > "/sys/class/net/${ns_dev}/gro_flush_timeout"
+ echo 50 > "/sys/class/net/${ns_dev}/napi_defer_hard_irqs"
ip link set dev "${ns_dev}" netns "${ns_name}" mtu 65535
ip -netns "${ns_name}" link set dev "${ns_dev}" up
--
2.48.1
This patch series extends the sev_init2 and the sev_smoke test to
exercise the SEV-SNP VM launch workflow.
Primarily, it introduces the architectural defines, its support in the
SEV library and extends the tests to interact with the SEV-SNP ioctl()
wrappers.
Patch 1 - Do not advertize SNP on initialization failure
Patch 2 - SNP test for KVM_SEV_INIT2
Patch 3 - Add vmgexit helper
Patch 4 - Add SMT control interface helper
Patch 5 - Replace assert() with TEST_ASSERT_EQ()
Patch 6 - Introduce SEV+ VM type check
Patch 7 - SNP iotcl() plumbing for the SEV library
Patch 8 - Force set GUEST_MEMFD for SNP
Patch 9 - Cleanups of smoke test - Decouple policy from type
Patch 10 - SNP smoke test
The series is based on
git.kernel.org/pub/scm/virt/kvm/kvm.git next
v6..v7:
Based on comments from Sean -
* Replaced FW check with sev->snp_initialized
* Dropped the patch which removes SEV+ KVM advertizement if INIT fails.
This should be now be resolved by the combination of the patches [1,2]
from Ashish.
* Change vmgexit to an inline function
* Export SMT control parsing interface to kvm_util
Note: hyperv_cpuid KST only compile testeworkbench.editor.empty.hintd
* Replace assert() with TEST_ASSERT_EQ() within SEV library
* Define KVM_SEV_PAGE_TYPE_INVALID for SEV call of encrypt_region()
* Parameterize encrypt_region() to include privatize_region()
* Deduplication of sev test calls between SEV,SEV-ES and SNP
* Removed FW version tests for SNP
* Included testing of SNP_POLICY_DBG
* Dropped most tags from patches that have been changed or indirectly
affected
[1] https://lore.kernel.org/all/d6d08c6b-9602-4f3d-92c2-8db6d50a1b92@amd.com
[2] https://lore.kernel.org/all/f78ddb64087df27e7bcb1ae0ab53f55aa0804fab.173922…
v5..v6:
https://lore.kernel.org/kvm/ab433246-e97c-495b-ab67-b0cb1721fb99@amd.com/
* Rename is_sev_platform_init to sev_fw_initialized (Nikunj)
* Rename KVM CPU feature X86_FEATURE_SNP to X86_FEATURE_SEV_SNP (Nikunj)
* Collected Tags from Nikunj, Pankaj, Srikanth.
v4..v5:
https://lore.kernel.org/kvm/8e7d8172-879e-4a28-8438-343b1c386ec9@amd.com/
* Introduced a check to disable advertising support for SEV, SEV-ES
and SNP when platform initialization fails (Nikunj)
* Remove the redundant SNP check within is_sev_vm() (Nikunj)
* Cleanup of the encrypt_region flow for better readability (Nikunj)
* Refactor paths to use the canonical $(ARCH) to rebase for kvm/next
v3..v4:
https://lore.kernel.org/kvm/20241114234104.128532-1-pratikrajesh.sampat@amd…
* Remove SNP FW API version check in the test and ensure the KVM
capability advertizes the presence of the feature. Retain the minimum
version definitions to exercise these API versions in the smoke test
* Retained only the SNP smoke test and SNP_INIT2 test
* The SNP architectural defined merged with SNP_INIT2 test patch
* SNP shutdown merged with SNP smoke test patch
* Add SEV VM type check to abstract comparisons and reduce clutter
* Define a SNP default policy which sets bits based on the presence of
SMT
* Decouple privatization and encryption for it to be SNP agnostic
* Assert for only positive tests using vm_ioctl()
* Dropped tested-by tags
In summary - based on comments from Sean, I have primarily reduced the
scope of this patch series to focus on breaking down the SNP smoke test
patch (v3 - patch2) to first introduce SEV-SNP support and use this
interface to extend the sev_init2 and the sev_smoke test.
The rest of the v3 patchset that introduces ioctl, pre fault, fallocate
and negative tests, will be re-worked and re-introduced subsequently in
future patch series post addressing the issues discussed.
v2..v3:
https://lore.kernel.org/kvm/20240905124107.6954-1-pratikrajesh.sampat@amd.c…
* Remove the assignments for the prefault and fallocate test type
enums.
* Fix error message for sev launch measure and finish.
* Collect tested-by tags [Peter, Srikanth]](<This patch series extends the sev_init2 and the sev_smoke test to
exercise the SEV-SNP VM launch workflow.
Primarily, it introduces the architectural defines, its support in the SEV
library and extends the tests to interact with the SEV-SNP ioctl()
wrappers.
Patch 1 - Do not advertize SNP on initialization failure
Patch 2 - SNP test for KVM_SEV_INIT2
Patch 3 - Add vmgexit helper
Patch 4 - Helper for SMT control interface
Patch 5 - Replace assert() with TEST_ASSERT_EQ()
Patch 6 - Introduce SEV+ VM type check
Patch 7 - SNP iotcl() plumbing for the SEV library
Patch 8 - Force set GUEST_MEMFD for SNP
Patch 9 - Cleanups of smoke test - Decouple policy from type
Patch 10 - SNP smoke test
The series is based on
git.kernel.org/pub/scm/virt/kvm/kvm.git next
v6..v7
Based on comments from Sean -
* Replaced FW check with sev-%3Esnp_initialized
* Dropped the patch which removes SEV+ KVM advertizement if INIT fails
This should be resolved by the combination of [1][2] from Ashish:
* Change vmgexit to an inline function
* Export SMT control parsing interface to kvm_util
* Replace assert() with TEST_ASSERT_EQ() within SEV library
* Define KVM_SEV_PAGE_TYPE_INVALID for SEV to use it with
encrypt_region()
* Parameterize encrypt_region() to include privatize_region()
functionality
* Deduplication of sev test calls between SEV,SEV-ES and SNP
* Removed FW version tests for SNP
* Included testing of SNP_POLICY_DBG
* Dropped most tags from patches that have directly / indirectly
changed.
[1] https://lore.kernel.org/all/d6d08c6b-9602-4f3d-92c2-8db6d50a1b92@amd.com
[2] https://lore.kernel.org/all/f78ddb64087df27e7bcb1ae0ab53f55aa0804fab.173922…
v5..v6
https://lore.kernel.org/kvm/ab433246-e97c-495b-ab67-b0cb1721fb99@amd.com/
* Rename is_sev_platform_init to sev_fw_initialized (Nikunj)
* Rename KVM CPU feature X86_FEATURE_SNP to X86_FEATURE_SEV_SNP (Nikunj)
* Collected Tags from Nikunj, Pankaj, Srikanth.
v4..v5:
https://lore.kernel.org/kvm/8e7d8172-879e-4a28-8438-343b1c386ec9@amd.com/
* Introduced a check to disable advertising support for SEV, SEV-ES
and SNP when platform initialization fails (Nikunj)
* Remove the redundant SNP check within is_sev_vm() (Nikunj)
* Cleanup of the encrypt_region flow for better readability (Nikunj)
* Refactor paths to use the canonical $(ARCH) to rebase for kvm/next
v3..v4:
https://lore.kernel.org/kvm/20241114234104.128532-1-pratikrajesh.sampat@amd…
* Remove SNP FW API version check in the test and ensure the KVM
capability advertizes the presence of the feature. Retain the minimum
version definitions to exercise these API versions in the smoke test
* Retained only the SNP smoke test and SNP_INIT2 test
* The SNP architectural defined merged with SNP_INIT2 test patch
* SNP shutdown merged with SNP smoke test patch
* Add SEV VM type check to abstract comparisons and reduce clutter
* Define a SNP default policy which sets bits based on the presence of
SMT
* Decouple privatization and encryption for it to be SNP agnostic
* Assert for only positive tests using vm_ioctl()
* Dropped tested-by tags
In summary - based on comments from Sean, I have primarily reduced the
scope of this patch series to focus on breaking down the SNP smoke test
patch (v3 - patch2) to first introduce SEV-SNP support and use this
interface to extend the sev_init2 and the sev_smoke test.
The rest of the v3 patchset that introduces ioctl, pre fault, fallocate
and negative tests, will be re-worked and re-introduced subsequently in
future patch series post addressing the issues discussed.
v2..v3:
https://lore.kernel.org/kvm/20240905124107.6954-1-pratikrajesh.sampat@amd.c…
* Remove the assignments for the prefault and fallocate test type
enums.
* Fix error message for sev launch measure and finish.
* Collect tested-by tags [Peter, Srikanth]
Pratik R. Sampat (10):
KVM: SEV: Disable SEV-SNP support on initialization failure
KVM: selftests: SEV-SNP test for KVM_SEV_INIT2
KVM: selftests: Add vmgexit helper
KVM: selftests: Add SMT control state helper
KVM: selftests: Replace assert() with TEST_ASSERT_EQ()
KVM: selftests: Introduce SEV VM type check
KVM: selftests: Add library support for interacting with SNP
KVM: selftests: Force GUEST_MEMFD flag for SNP VM type
KVM: selftests: Abstractions for SEV to decouple policy from type
KVM: selftests: Add a basic SEV-SNP smoke test
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/sev.c | 4 +-
drivers/crypto/ccp/sev-dev.c | 8 ++
include/linux/psp-sev.h | 3 +
tools/arch/x86/include/uapi/asm/kvm.h | 1 +
.../testing/selftests/kvm/include/kvm_util.h | 35 +++++++
.../selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/include/x86/sev.h | 42 ++++++++-
tools/testing/selftests/kvm/lib/kvm_util.c | 7 +-
.../testing/selftests/kvm/lib/x86/processor.c | 4 +-
tools/testing/selftests/kvm/lib/x86/sev.c | 93 +++++++++++++++++--
.../testing/selftests/kvm/x86/hyperv_cpuid.c | 19 ----
.../selftests/kvm/x86/sev_init2_tests.c | 13 +++
.../selftests/kvm/x86/sev_smoke_test.c | 75 +++++++++------
14 files changed, 246 insertions(+), 60 deletions(-)
--
2.43.0
As the vIOMMU infrastructure series part-3, this introduces a new vEVENTQ
object. The existing FAULT object provides a nice notification pathway to
the user space with a queue already, so let vEVENTQ reuse that.
Mimicing the HWPT structure, add a common EVENTQ structure to support its
derivatives: IOMMUFD_OBJ_FAULT (existing) and IOMMUFD_OBJ_VEVENTQ (new).
An IOMMUFD_CMD_VEVENTQ_ALLOC is introduced to allocate vEVENTQ object for
vIOMMUs. One vIOMMU can have multiple vEVENTQs in different types but can
not support multiple vEVENTQs in the same type.
The forwarding part is fairly simple but might need to replace a physical
device ID with a virtual device ID in a driver-level event data structure.
So, this also adds some helpers for drivers to use.
As usual, this series comes with the selftest coverage for this new ioctl
and with a real world use case in the ARM SMMUv3 driver.
This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_veventq-v7
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_veventq-v7
Changelog
v7
* Rebase on Jason's for-next tree for latest fault.c
* Add Reviewed-by
* Update commit logs
* Add __reserved field sanity
* Skip kfree() on the static header
* Replace "bool on_list" with list_is_last()
* Use u32 for flags in iommufd_vevent_header
* Drop casting in iommufd_viommu_get_vdev_id()
* Update the bounding logic to veventq->sequence
* Add missing cpu_to_le64() around STRTAB_STE_1_MEV
* Reuse veventq->common.lock to fence sequence and num_events
* Rename overflow to lost_events and log it in upon kmalloc failure
* Correct the error handling part in iommufd_veventq_deliver_fetch()
* Add an arm_smmu_clear_vmaster() to simplify identity/blocked domain
attach ops
* Add additional four event records to forward to user space VM, and
update the uAPI doc
* Reuse the existing smmu->streams_mutex lock to fence master->vmaster
pointer, instead of adding a new rwsem
v6
https://lore.kernel.org/all/cover.1737754129.git.nicolinc@nvidia.com/
* Drop supports_veventq viommu op
* Split bug/cosmetics fixes out of the series
* Drop the blocking mutex around copy_to_user()
* Add veventq_depth in uAPI to limit vEVENTQ size
* Revise the documentation for a clear description
* Fix sparse warnings in arm_vmaster_report_event()
* Rework iommufd_viommu_get_vdev_id() to return -ENOENT v.s. 0
* Allow Abort/Bypass STEs to allocate vEVENTQ and set STE.MEV for DoS
mitigations
v5
https://lore.kernel.org/all/cover.1736237481.git.nicolinc@nvidia.com/
* Add Reviewed-by from Baolu
* Reorder the OBJ list as well
* Fix alphabetical order after renaming in v4
* Add supports_veventq viommu op for vEVENTQ type validation
v4
https://lore.kernel.org/all/cover.1735933254.git.nicolinc@nvidia.com/
* Rename "vIRQ" to "vEVENTQ"
* Use flexible array in struct iommufd_vevent
* Add the new ioctl command to union ucmd_buffer
* Fix the alphabetical order in union ucmd_buffer too
* Rename _TYPE_NONE to _TYPE_DEFAULT aligning with vIOMMU naming
v3
https://lore.kernel.org/all/cover.1734477608.git.nicolinc@nvidia.com/
* Rebase on Will's for-joerg/arm-smmu/updates for arm_smmu_event series
* Add "Reviewed-by" lines from Kevin
* Fix typos in comments, kdocs, and jump tags
* Add a patch to sort struct iommufd_ioctl_op
* Update iommufd's userpsace-api documentation
* Update uAPI kdoc to quote SMMUv3 offical spec
* Drop the unused workqueue in struct iommufd_virq
* Drop might_sleep() in iommufd_viommu_report_irq() helper
* Add missing "break" in iommufd_viommu_get_vdev_id() helper
* Shrink the scope of the vmaster's read lock in SMMUv3 driver
* Pass in two arguments to iommufd_eventq_virq_handler() helper
* Move "!ops || !ops->read" validation into iommufd_eventq_init()
* Move "fault->ictx = ictx" closer to iommufd_ctx_get(fault->ictx)
* Update commit message for arm_smmu_attach_prepare/commit_vmaster()
* Keep "iommufd_fault" as-is and rename "iommufd_eventq_virq" to just
"iommufd_virq"
v2
https://lore.kernel.org/all/cover.1733263737.git.nicolinc@nvidia.com/
* Rebase on v6.13-rc1
* Add IOPF and vIRQ in iommufd.rst (userspace-api)
* Add a proper locking in iommufd_event_virq_destroy
* Add iommufd_event_virq_abort with a lockdep_assert_held
* Rename "EVENT_*" to "EVENTQ_*" to describe the objects better
* Reorganize flows in iommufd_eventq_virq_alloc for abort() to work
* Adde struct arm_smmu_vmaster to store vSID upon attaching to a nested
domain, calling a newly added iommufd_viommu_get_vdev_id helper
* Adde an arm_vmaster_report_event helper in arm-smmu-v3-iommufd file
to simplify the routine in arm_smmu_handle_evt() of the main driver
v1
https://lore.kernel.org/all/cover.1724777091.git.nicolinc@nvidia.com/
Thanks!
Nicolin
Nicolin Chen (14):
iommufd/fault: Move two fault functions out of the header
iommufd/fault: Add an iommufd_fault_init() helper
iommufd: Abstract an iommufd_eventq from iommufd_fault
iommufd: Rename fault.c to eventq.c
iommufd: Add IOMMUFD_OBJ_VEVENTQ and IOMMUFD_CMD_VEVENTQ_ALLOC
iommufd/viommu: Add iommufd_viommu_get_vdev_id helper
iommufd/viommu: Add iommufd_viommu_report_event helper
iommufd/selftest: Require vdev_id when attaching to a nested domain
iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_VEVENT for vEVENTQ
coverage
iommufd/selftest: Add IOMMU_VEVENTQ_ALLOC test coverage
Documentation: userspace-api: iommufd: Update FAULT and VEVENTQ
iommu/arm-smmu-v3: Introduce struct arm_smmu_vmaster
iommu/arm-smmu-v3: Report events that belong to devices attached to
vIOMMU
iommu/arm-smmu-v3: Set MEV bit in nested STE for DoS mitigations
drivers/iommu/iommufd/Makefile | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 35 +
drivers/iommu/iommufd/iommufd_private.h | 135 +++-
drivers/iommu/iommufd/iommufd_test.h | 10 +
include/linux/iommufd.h | 23 +
include/uapi/linux/iommufd.h | 105 +++
tools/testing/selftests/iommu/iommufd_utils.h | 115 ++++
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 72 +++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 79 ++-
drivers/iommu/iommufd/driver.c | 72 +++
drivers/iommu/iommufd/eventq.c | 597 ++++++++++++++++++
drivers/iommu/iommufd/fault.c | 342 ----------
drivers/iommu/iommufd/hw_pagetable.c | 6 +-
drivers/iommu/iommufd/main.c | 7 +
drivers/iommu/iommufd/selftest.c | 54 ++
drivers/iommu/iommufd/viommu.c | 2 +
tools/testing/selftests/iommu/iommufd.c | 36 ++
.../selftests/iommu/iommufd_fail_nth.c | 7 +
Documentation/userspace-api/iommufd.rst | 17 +
19 files changed, 1308 insertions(+), 408 deletions(-)
create mode 100644 drivers/iommu/iommufd/eventq.c
delete mode 100644 drivers/iommu/iommufd/fault.c
base-commit: 598749522d4254afb33b8a6c1bea614a95896868
--
2.43.0
While taking a look at '[PATCH net] pktgen: Avoid out-of-range in
get_imix_entries' ([1]) and '[PATCH net v2] pktgen: Avoid out-of-bounds
access in get_imix_entries' ([2], [3]) and doing some tests and code review
I detected that the /proc/net/pktgen/... parsing logic does not honour the
user given buffer bounds (resulting in out-of-bounds access).
This can be observed e.g. by the following simple test (sometimes the
old/'longer' previous value is re-read from the buffer):
$ echo add_device lo@0 > /proc/net/pktgen/kpktgend_0
$ echo "min_pkt_size 12345" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo -n "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 12345 max_pkt_size: 0
Result: OK: min_pkt_size=12345
$ echo "min_pkt_size 123" > /proc/net/pktgen/lo\@0 && grep min_pkt_size /proc/net/pktgen/lo\@0
Params: count 1000 min_pkt_size: 123 max_pkt_size: 0
Result: OK: min_pkt_size=123
So fix the out-of-bounds access (and some minor findings) and add a simple
proc_net_pktgen selftest...
Patch set splited into part I (now already applied to net-next)
- net: pktgen: replace ENOTSUPP with EOPNOTSUPP
- net: pktgen: enable 'param=value' parsing
- net: pktgen: fix hex32_arg parsing for short reads
- net: pktgen: fix 'rate 0' error handling (return -EINVAL)
- net: pktgen: fix 'ratep 0' error handling (return -EINVAL)
- net: pktgen: fix ctrl interface command parsing
- net: pktgen: fix access outside of user given buffer in pktgen_thread_write()
And part II (this one):
- net: pktgen: use defines for the various dec/hex number parsing digits lengths
- net: pktgen: fix mix of int/long
- net: pktgen: remove extra tmp variable (re-use len instead)
- net: pktgen: remove some superfluous variable initializing
- net: pktgen: fix mpls maximum labels list parsing
- net: pktgen: fix access outside of user given buffer in pktgen_if_write()
- net: pktgen: fix mpls reset parsing
- net: pktgen: remove all superfluous index assignements
- selftest: net: add proc_net_pktgen
Regards,
Peter
Changes v5 -> v6:
- add rev-by Simon Horman
- drop patch 'net: pktgen: use defines for the various dec/hex number
parsing digits lengths'
- adjust to dropped patch ''net: pktgen: use defines for the various
dec/hex number parsing digits lengths'
- net: pktgen: fix mix of int/long
- fix line break (suggested by Simon Horman)
Changes v4 -> v5:
- split up patchset into part i/ii (suggested by Simon Horman)
- add rev-by Simon Horman
- net: pktgen: align some variable declarations to the most common pattern
-> net: pktgen: fix mix of int/long
- instead of align to most common pattern (int) adjust all usages to
size_t for i and max and ssize_t for len and adjust function signatures
of hex32_arg(), count_trail_chars(), num_arg() and strn_len() accordingly
- respect reverse xmas tree order for local variable declarations (where
possible without too much code churn)
- update subject line and patch description
- dropped net: pktgen: hex32_arg/num_arg error out in case no characters are
available
- keep empty hex/num arg is implicit assumed as zero value
- dropped net: pktgen: num_arg error out in case no valid character is parsed
- keep empty hex/num arg is implicit assumed as zero value
- Change patch description ('Fixes:' -> 'Addresses the following:',
suggested by Simon Horman)
- net: pktgen: remove all superfluous index assignements
- new patch (suggested by Simon Horman)
- selftest: net: add proc_net_pktgen
- addapt to dropped patch 'net: pktgen: hex32_arg/num_arg error out in case
no characters are available', empty hex/num arg is now implicit assumed as
zero value (instead of failure)
Changes v3 -> v4:
- add rev-by Simon Horman
- new patch 'net: pktgen: use defines for the various dec/hex number parsing
digits lengths' (suggested by Simon Horman)
- replace C99 comment (suggested by Paolo Abeni)
- drop available characters check in strn_len() (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: align some variable declarations to the
most common pattern' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove extra tmp variable (re-use len
instead)' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: remove some superfluous variable
initializing' (suggested by Paolo Abeni)
- factored out patch 'net: pktgen: fix mpls maximum labels list parsing'
(suggested by Paolo Abeni)
- factored out 'net: pktgen: hex32_arg/num_arg error out in case no
characters are available' (suggested by Paolo Abeni)
- factored out 'net: pktgen: num_arg error out in case no valid character
is parsed' (suggested by Paolo Abeni)
Changes v2 -> v3:
- new patch: 'net: pktgen: fix ctrl interface command parsing'
- new patch: 'net: pktgen: fix mpls reset parsing'
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix typo in change description ('v1 -> v1' and tyop)
- rename some vars to better match usage
add_loopback_0 -> thr_cmd_add_loopback_0
rm_loopback_0 -> thr_cmd_rm_loopback_0
wrong_ctrl_cmd -> wrong_thr_cmd
legacy_ctrl_cmd -> legacy_thr_cmd
ctrl_fd -> thr_fd
- add ctrl interface tests
Changes v1 -> v2:
- new patch: 'net: pktgen: fix hex32_arg parsing for short reads'
- new patch: 'net: pktgen: fix 'rate 0' error handling (return -EINVAL)'
- new patch: 'net: pktgen: fix 'ratep 0' error handling (return -EINVAL)'
- net/core/pktgen.c: additional fix get_imix_entries() and get_labels()
- tools/testing/selftests/net/proc_net_pktgen.c:
- fix tyop not vs. nod (suggested by Jakub Kicinski)
- fix misaligned line (suggested by Jakub Kicinski)
- enable fomerly commented out CONFIG_XFRM dependent test (command spi),
as CONFIG_XFRM is enabled via tools/testing/selftests/net/config
CONFIG_XFRM_INTERFACE/CONFIG_XFRM_USER (suggestex by Jakub Kicinski)
- add CONFIG_NET_PKTGEN=m to tools/testing/selftests/net/config
(suggested by Jakub Kicinski)
- add modprobe pktgen to FIXTURE_SETUP() (suggested by Jakub Kicinski)
- fix some checkpatch warnings (Missing a blank line after declarations)
- shrink line length by re-naming some variables (command -> cmd,
device -> dev)
- add 'rate 0' testcase
- add 'ratep 0' testcase
[1] https://lore.kernel.org/netdev/20241006221221.3744995-1-artem.chernyshev@re…
[2] https://lore.kernel.org/netdev/20250109083039.14004-1-pchelkin@ispras.ru/
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Peter Seiderer (8):
net: pktgen: fix mix of int/long
net: pktgen: remove extra tmp variable (re-use len instead)
net: pktgen: remove some superfluous variable initializing
net: pktgen: fix mpls maximum labels list parsing
net: pktgen: fix access outside of user given buffer in
pktgen_if_write()
net: pktgen: fix mpls reset parsing
net: pktgen: remove all superfluous index assignements
selftest: net: add proc_net_pktgen
net/core/pktgen.c | 288 ++++----
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/proc_net_pktgen.c | 646 ++++++++++++++++++
4 files changed, 805 insertions(+), 131 deletions(-)
create mode 100644 tools/testing/selftests/net/proc_net_pktgen.c
--
2.48.1
The cited commit fixed a software GSO bug with VXLAN + IPSec in tunnel
mode. Unfortunately, it is slightly broader than necessary, as it also
severely affects performance for Geneve + IPSec transport mode over a
device capable of both HW GSO and IPSec crypto offload. In this case,
xfrm_output unnecessarily triggers software GSO instead of letting the
HW do it. In simple iperf3 tests over Geneve + IPSec transport mode over
a back-2-back pair of NICs with MTU 1500, the performance was observed
to be up to 6x worse when doing software GSO compared to leaving it to
the hardware.
This commit makes xfrm_output only trigger software GSO in crypto
offload cases for already encapsulated packets in tunnel mode, as not
doing so would then cause the inner tunnel skb->inner_networking_header
to be overwritten and break software GSO for that packet later if the
device turns out to not be capable of HW GSO.
Taking a closer look at the conditions for the original bug, to better
understand the reasons for this change:
- vxlan_build_skb -> iptunnel_handle_offloads sets inner_protocol and
inner network header.
- then, udp_tunnel_xmit_skb -> ip_tunnel_xmit adds outer transport and
network headers.
- later in the xmit path, xfrm_output -> xfrm_outer_mode_output ->
xfrm4_prepare_output -> xfrm4_tunnel_encap_add overwrites the inner
network header with the one set in ip_tunnel_xmit before adding the
second outer header.
- __dev_queue_xmit -> validate_xmit_skb checks whether GSO segmentation
needs to happen based on dev features. In the original bug, the hw
couldn't segment the packets, so skb_gso_segment was invoked.
- deep in the .gso_segment callback machinery, __skb_udp_tunnel_segment
tries to use the wrong inner network header, expecting the one set in
iptunnel_handle_offloads but getting the one set by xfrm instead.
- a bit later, ipv6_gso_segment accesses the wrong memory based on that
wrong inner network header.
With the new change, the original bug (or similar ones) cannot happen
again, as xfrm will now trigger software GSO before applying a tunnel.
This concern doesn't exist in packet offload mode, when the HW adds
encapsulation headers. For the non-offloaded packets (crypto in SW),
software GSO is still done unconditionally in the else branch.
Reviewed-by: Dragos Tatulea <dtatulea(a)nvidia.com>
Reviewed-by: Yael Chemla <ychemla(a)nvidia.com>
Fixes: a204aef9fd77 ("xfrm: call xfrm_output_gso when inner_protocol is set in xfrm_output")
Signed-off-by: Cosmin Ratiu <cratiu(a)nvidia.com>
---
net/xfrm/xfrm_output.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index f7abd42c077d..42f1ca513879 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -758,7 +758,7 @@ int xfrm_output(struct sock *sk, struct sk_buff *skb)
skb->encapsulation = 1;
if (skb_is_gso(skb)) {
- if (skb->inner_protocol)
+ if (skb->inner_protocol && x->props.mode == XFRM_MODE_TUNNEL)
return xfrm_output_gso(net, sk, skb);
skb_shinfo(skb)->gso_type |= SKB_GSO_ESP;
--
2.45.0
This patch series introduces changes to add default build support for
the sched tests in selftests.
The only test under sched is cs_prctl_test which validates cookies when
core scheduling is in effect. This test fails on systems where core
scheduling is disabled. The patch series also modifies this behaviour to
gracefully skip the test on such systems.
For example, such a test skip would look like:
TAP version 13
1..1
timeout set to 45
selftests: sched: cs_prctl_test
Checking for CONFIG_SCHED_CORE support
Core scheduling not enabled in kernel, hence skipping tests
ok 1 selftests: sched: cs_prctl_test # SKIP
and a successful run:
TAP version 13
1..1
timeout set to 45
selftests: sched: cs_prctl_test
Checking for CONFIG_SCHED_CORE support
CONFIG_SCHED_CORE=y
.
.
.
SUCCESS !!!
ok 1 selftests: sched: cs_prctl_test
Signed-off-by: Sinadin Shan <sinadin.shan(a)oracle.com>
---
v2:
* Add patch to skip cs_prctl_test on core scheduling disabled systems
* v1 link: https://lore.kernel.org/all/20250219064658.449069-1-sinadin.shan@oracle.com
---
Sinadin Shan (2):
selftests: sched: add sched as a default selftest target
selftests: sched: skip cs_prctl_test for systems with core scheduling
disabled
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/sched/cs_prctl_test.c | 29 ++++++++++++++++++-
2 files changed, 29 insertions(+), 1 deletion(-)
--
2.43.5
> > > On Mon, 2025-02-10 at 16:06 -0800, Andrii Nakryiko wrote:
> > > > Tracking associated maps for a program is not necessary. As long as
> > > > the last BPF program using the BPF map is unloaded, the kernel will
> > > > automatically free not-anymore-referenced BPF map. Note that
> > > > bpf_object itself will keep FDs for BPF maps, so you'd need to make
> > > > sure to do bpf_object__close() to release those references.
> > > >
> > > > But if you are going to ask to re-create BPF maps next time BPF
> > > > program is loaded... Well, I'll say you are asking for a bit too >
> > > > much,
> > > > tbh. If you want to be *that* sophisticated, it shouldn't be too
> > > > hard
> > > > for you to get all this information from BPF program's
> > > > instructions.
> > > >
> >
> > We really are that sophisticated (see below for more details). We could
> > scan program instructions, but we'd then tie our logic to BPF
> > implementation details and duplicate logic already present in libbpf
> > (https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.13.2/source… <https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v6.13.2/source…>*L6087__;Iw!!BmdzS3_lV9HdKG8!384hiRcpmYOjMU3kT-1m4GxWp5XxwHj9ICrDtIUKCmCbxLVAxemc6UN64cE3Mktu_z4f2JheoKm6RBYQdc-MiPbyexk6-Rn48A$
> > ). Obviously this *can* be done but it's not at all ideal from an
> > application perspective.
> >
>
>
> I agree it's not ideal, but it's also not some complicated and
> bound-to-be-changed logic. What you point out in libbpf source code is
> a bit different thing, reality is much simpler. Only so-called ldimm64
> instruction (BPF_LD | BPF_IMM | BPF_DW opcode) can be referencing map
> FD, so analysing this is borderline trivial. And this is part of BPF
> ISA, so not going to change.
Our approach is to associate an array of maps as a property with each
BPF program, this property is initialised at the relocation stage.
So, we do not need to parse BPF program instructions. Instead, we rely on
recorded relocations. I think this is a more robust and clean solution with
advantage of all code in the same place and being at the higher level of
abstraction with a relocation table.
The mainline libbpf keeps array of maps for a bpf_object, we extended
this by adding an array of maps associated with each bpf_program.
For example, a code excerpt, from our development branch, which associates
a map with bpf_program at relocation phase:
insn[0].src_reg = BPF_PSEUDO_MAP_FD;
insn[0].imm = map->fd;
err = bpf_program__add_map(prog, map);
> > > > > >
> > > > bpf_object is the unit of coherence in libbpf, so I don't see us
> > > > refcounting maps between bpf_objects. Kernel is doing refcounting
> > > > based on FDs, so see if you can use that.
> > > >
> >
> > I can understand that. That said, I think if there's no logic across
> > objects, and bpf_object access is not thread-safe, it puts us into a
> > tough situation:
> > - Complex refcounting, code scanning, etc to keep consistency when
> > manipulating maps used by multiple programs.
> > - Parallel loading not being well-balanced, if we split programs across
> > objects.
> >
> > We could alternatively write our own custom loader, but then we’d have
> > to duplicate much of the useful logic that libbpf already implements:
> > skeleton generation, map/program association, embedding programs into
> > ELFs, loading logic and kernel probing, etc. We’d like some way to
> > handle dynamic/parallel loading without having to replicate all the
> > advantages libbpf grants us.
> >
>
>
> Yeah, I can understand that as well, but bpf_object's single-threaded
> design and the fact that bpf_object__load is kind of the final step
> where programs are loaded (or not) is pretty backed in. I don't see
> bpf_object becoming multi-threaded.
We understood this, but the current bpf_object design allowed us to use
it in a multithreaded environment with minor modification for bpf_program
load.
We understand that the design choice of libbpf being single threaded
is unlikely to be reconsidered.
> > > > > >
> > > > bpf_object is the unit of coherence in libbpf, so I don't see us
> > > > refcounting maps between bpf_objects. Kernel is doing refcounting
> > > > based on FDs, so see if you can use that.
> > > >
> >
> > I can understand that. That said, I think if there's no logic across
> > objects, and bpf_object access is not thread-safe, it puts us into a
> > tough situation:
> > - Complex refcounting, code scanning, etc to keep consistency when
> > manipulating maps used by multiple programs.
> > - Parallel loading not being well-balanced, if we split programs across
> > objects.
> >
> > We could alternatively write our own custom loader, but then we’d have
> > to duplicate much of the useful logic that libbpf already implements:
> > skeleton generation, map/program association, embedding programs into
> > ELFs, loading logic and kernel probing, etc. We’d like some way to
> > handle dynamic/parallel loading without having to replicate all the
> > advantages libbpf grants us.
> >
>
>
> Yeah, I can understand that as well, but bpf_object's single-threaded
> design and the fact that bpf_object__load is kind of the final step
> where programs are loaded (or not) is pretty backed in. I don't see
> bpf_object becoming multi-threaded. The dynamic program
> loading/unloading/loading again is something that I can't yet justify,
> tbh.
>
>
> So the best I can propose you is to use libbpf's skeleton and
> bpf_object concept for, effectively, ELF handling, relocations, all
> the preparations up to loading BPF programs. And after that you can
> take over loading and handling program lifetime outside of bpf_object.
>
>
> Dynamic map creation after bpf_object__load() I think is completely
> outside of the scope and you'll have to solve this problem for
> yourself. I would point out, though, that internally libbpf already
> switched to sort-of pre-creating stable FDs for maps before they are
> actually created in the kernel. So it's conceivable that we can have
> more granularity in bpf_object preparation. I.e., first step would be
> to parse ELF and handle relocations, prepare everything. After that we
> can have a step to create maps, and then another one to create
> programs. Usually people would do all that, but you can stop right
> before maps creation or before program creation, whatever fits your
> use case better.
>
>
> The key is that program instructions will be final and won't need
> adjustments regardless of maps actually being created or not. FDs, as
> I mentioned, are stable regardless.
We used this in our design, so we did not need to scan BPF program
instructions to fix map's fds referenced by instructions from a dynamically
loaded bpf_program with dynamically created maps.
> >
> > The use case here is that our security monitoring agent leverages eBPF
> > as its foundational technology to gather telemetry from the kernel. As
> > part of that, we hook many different kernel subsystems (process,
> > memory, filesystem, network, etc), tying them together and tracking
> > with maps. So we legitimately have a very large number of programs all
> > doing different work. For products of this scale, it increases security
> > and performance to load this set of programs and their maps in an
> > optimized, parallel fashion and subsequently change the loaded set of
> > programs and maps dynamically without disturbing the rest of the
> > application.
>
>
> Yes, makes sense. You'll need to decide for yourself if it's actually
> more meaningful to split those 200 programs into independent
> bpf_objects by features, and be rigorous about sharing state (maps)
> through bpf_map__reuse_fd(), which would allow to parallelize loading
> within confines of existing libbpf APIs. Or you can be a bit more
> low-level with program loading outside of bpf_object API, as I
> described above.
Yes, this can be one of the ways to share bpf maps across multiple bpf_objects
and use existing libbpf for parallel bps programs loading, if we want to keep
a full libbpf compatibility, but at a cost of complicating design, as we need to
convert a single bpf_object model to multiple bpf_objects with a new layer
that manages these bpf_objects.
In our case, as a bpf_program can map to multiple features, which can be
modified independently, and to achieve an even load balancing across multiple
threads, it would be probably one bpf_program for a bpf_object.
For testing the functionality of the vDSO, it is necessary to build
userspace programs for multiple different architectures.
It is additional work to acquire matching userspace cross-compilers with
full C libraries and then building root images out of those.
The kernel tree already contains nolibc, a small, header-only C library.
By using it, it is possible to build userspace programs without any
additional dependencies.
For example the kernel.org crosstools or multi-target clang can be used
to build test programs for a multitude of architectures.
While nolibc is very limited, it is enough for many selftests.
With some minor adjustments it is possible to make parse_vdso.c
compatible with nolibc.
As an example, vdso_standalone_test_x86 is now built from the same C
code as the regular vdso_test_gettimeofday, while still being completely
standalone.
This should probably go through the kselftest tree.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Thomas Weißschuh (16):
MAINTAINERS: Add vDSO selftests
elf, uapi: Add definition for STN_UNDEF
elf, uapi: Add definition for DT_GNU_HASH
elf, uapi: Add definitions for VER_FLG_BASE and VER_FLG_WEAK
elf, uapi: Add type ElfXX_Versym
elf, uapi: Add types ElfXX_Verdef and ElfXX_Veraux
tools/include: Add uapi/linux/elf.h
selftests: Add headers target
selftests: vDSO: vdso_standalone_test_x86: Use vdso_init_form_sysinfo_ehdr
selftests: vDSO: parse_vdso: Drop vdso_init_from_auxv()
selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers
selftests: vDSO: parse_vdso: Test __SIZEOF_LONG__ instead of ULONG_MAX
selftests: vDSO: parse_vdso: Make compatible with nolibc
selftests: vDSO: vdso_test_gettimeofday: Clean up includes
selftests: vDSO: vdso_test_gettimeofday: Make compatible with nolibc
selftests: vDSO: vdso_standalone_test_x86: Switch to nolibc
MAINTAINERS | 1 +
include/uapi/linux/elf.h | 38 ++
tools/include/uapi/linux/elf.h | 524 +++++++++++++++++++++
tools/testing/selftests/lib.mk | 5 +-
tools/testing/selftests/vDSO/Makefile | 11 +-
tools/testing/selftests/vDSO/parse_vdso.c | 21 +-
tools/testing/selftests/vDSO/parse_vdso.h | 1 -
.../selftests/vDSO/vdso_standalone_test_x86.c | 143 +-----
.../selftests/vDSO/vdso_test_gettimeofday.c | 4 +-
9 files changed, 584 insertions(+), 164 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20241017-parse_vdso-nolibc-e069baa7ff48
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
A task in the kernel (task_mm_cid_work) runs somewhat periodically to
compact the mm_cid for each process. Add a test to validate that it runs
correctly and timely.
The test spawns 1 thread pinned to each CPU, then each thread, including
the main one, runs in short bursts for some time. During this period, the
mm_cids should be spanning all numbers between 0 and nproc.
At the end of this phase, a thread with high enough mm_cid (>= nproc/2)
is selected to be the new leader, all other threads terminate.
After some time, the only running thread should see 0 as mm_cid, if that
doesn't happen, the compaction mechanism didn't work and the test fails.
The test never fails if only 1 core is available, in which case, we
cannot test anything as the only available mm_cid is 0.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Signed-off-by: Gabriele Monaco <gmonaco(a)redhat.com>
---
tools/testing/selftests/rseq/.gitignore | 1 +
tools/testing/selftests/rseq/Makefile | 2 +-
.../selftests/rseq/mm_cid_compaction_test.c | 200 ++++++++++++++++++
3 files changed, 202 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/rseq/mm_cid_compaction_test.c
diff --git a/tools/testing/selftests/rseq/.gitignore b/tools/testing/selftests/rseq/.gitignore
index 16496de5f6ce4..2c89f97e4f737 100644
--- a/tools/testing/selftests/rseq/.gitignore
+++ b/tools/testing/selftests/rseq/.gitignore
@@ -3,6 +3,7 @@ basic_percpu_ops_test
basic_percpu_ops_mm_cid_test
basic_test
basic_rseq_op_test
+mm_cid_compaction_test
param_test
param_test_benchmark
param_test_compare_twice
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb586..ce1b38f46a355 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -16,7 +16,7 @@ OVERRIDE_TARGETS = 1
TEST_GEN_PROGS = basic_test basic_percpu_ops_test basic_percpu_ops_mm_cid_test param_test \
param_test_benchmark param_test_compare_twice param_test_mm_cid \
- param_test_mm_cid_benchmark param_test_mm_cid_compare_twice
+ param_test_mm_cid_benchmark param_test_mm_cid_compare_twice mm_cid_compaction_test
TEST_GEN_PROGS_EXTENDED = librseq.so
diff --git a/tools/testing/selftests/rseq/mm_cid_compaction_test.c b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
new file mode 100644
index 0000000000000..7ddde3b657dd6
--- /dev/null
+++ b/tools/testing/selftests/rseq/mm_cid_compaction_test.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: LGPL-2.1
+#define _GNU_SOURCE
+#include <assert.h>
+#include <pthread.h>
+#include <sched.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stddef.h>
+
+#include "../kselftest.h"
+#include "rseq.h"
+
+#define VERBOSE 0
+#define printf_verbose(fmt, ...) \
+ do { \
+ if (VERBOSE) \
+ printf(fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/* 0.5 s */
+#define RUNNER_PERIOD 500000
+/* Number of runs before we terminate or get the token */
+#define THREAD_RUNS 5
+
+/*
+ * Number of times we check that the mm_cid were compacted.
+ * Checks are repeated every RUNNER_PERIOD.
+ */
+#define MM_CID_COMPACT_TIMEOUT 10
+
+struct thread_args {
+ int cpu;
+ int num_cpus;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ pthread_t *tinfo;
+ struct thread_args *args_head;
+};
+
+static void __noreturn *thread_runner(void *arg)
+{
+ struct thread_args *args = arg;
+ int i, ret, curr_mm_cid;
+ cpu_set_t cpumask;
+
+ CPU_ZERO(&cpumask);
+ CPU_SET(args->cpu, &cpumask);
+ ret = pthread_setaffinity_np(pthread_self(), sizeof(cpumask), &cpumask);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to set affinity");
+ abort();
+ }
+ pthread_barrier_wait(args->barrier);
+
+ for (i = 0; i < THREAD_RUNS; i++)
+ usleep(RUNNER_PERIOD);
+ curr_mm_cid = rseq_current_mm_cid();
+ /*
+ * We select one thread with high enough mm_cid to be the new leader.
+ * All other threads (including the main thread) will terminate.
+ * After some time, the mm_cid of the only remaining thread should
+ * converge to 0, if not, the test fails.
+ */
+ if (curr_mm_cid >= args->num_cpus / 2 &&
+ !pthread_mutex_trylock(args->token)) {
+ printf_verbose(
+ "cpu%d has mm_cid=%d and will be the new leader.\n",
+ sched_getcpu(), curr_mm_cid);
+ for (i = 0; i < args->num_cpus; i++) {
+ if (args->tinfo[i] == pthread_self())
+ continue;
+ ret = pthread_join(args->tinfo[i], NULL);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to join thread");
+ abort();
+ }
+ }
+ pthread_barrier_destroy(args->barrier);
+ free(args->tinfo);
+ free(args->token);
+ free(args->barrier);
+ free(args->args_head);
+
+ for (i = 0; i < MM_CID_COMPACT_TIMEOUT; i++) {
+ curr_mm_cid = rseq_current_mm_cid();
+ printf_verbose("run %d: mm_cid=%d on cpu%d.\n", i,
+ curr_mm_cid, sched_getcpu());
+ if (curr_mm_cid == 0)
+ exit(EXIT_SUCCESS);
+ usleep(RUNNER_PERIOD);
+ }
+ exit(EXIT_FAILURE);
+ }
+ printf_verbose("cpu%d has mm_cid=%d and is going to terminate.\n",
+ sched_getcpu(), curr_mm_cid);
+ pthread_exit(NULL);
+}
+
+int test_mm_cid_compaction(void)
+{
+ cpu_set_t affinity;
+ int i, j, ret = 0, num_threads;
+ pthread_t *tinfo;
+ pthread_mutex_t *token;
+ pthread_barrier_t *barrier;
+ struct thread_args *args;
+
+ sched_getaffinity(0, sizeof(affinity), &affinity);
+ num_threads = CPU_COUNT(&affinity);
+ tinfo = calloc(num_threads, sizeof(*tinfo));
+ if (!tinfo) {
+ perror("Error: failed to allocate tinfo");
+ return -1;
+ }
+ args = calloc(num_threads, sizeof(*args));
+ if (!args) {
+ perror("Error: failed to allocate args");
+ ret = -1;
+ goto out_free_tinfo;
+ }
+ token = malloc(sizeof(*token));
+ if (!token) {
+ perror("Error: failed to allocate token");
+ ret = -1;
+ goto out_free_args;
+ }
+ barrier = malloc(sizeof(*barrier));
+ if (!barrier) {
+ perror("Error: failed to allocate barrier");
+ ret = -1;
+ goto out_free_token;
+ }
+ if (num_threads == 1) {
+ fprintf(stderr, "Cannot test on a single cpu. "
+ "Skipping mm_cid_compaction test.\n");
+ /* only skipping the test, this is not a failure */
+ goto out_free_barrier;
+ }
+ pthread_mutex_init(token, NULL);
+ ret = pthread_barrier_init(barrier, NULL, num_threads);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to initialise barrier");
+ goto out_free_barrier;
+ }
+ for (i = 0, j = 0; i < CPU_SETSIZE && j < num_threads; i++) {
+ if (!CPU_ISSET(i, &affinity))
+ continue;
+ args[j].num_cpus = num_threads;
+ args[j].tinfo = tinfo;
+ args[j].token = token;
+ args[j].barrier = barrier;
+ args[j].cpu = i;
+ args[j].args_head = args;
+ if (!j) {
+ /* The first thread is the main one */
+ tinfo[0] = pthread_self();
+ ++j;
+ continue;
+ }
+ ret = pthread_create(&tinfo[j], NULL, thread_runner, &args[j]);
+ if (ret) {
+ errno = ret;
+ perror("Error: failed to create thread");
+ abort();
+ }
+ ++j;
+ }
+ printf_verbose("Started %d threads.\n", num_threads);
+
+ /* Also main thread will terminate if it is not selected as leader */
+ thread_runner(&args[0]);
+
+ /* only reached in case of errors */
+out_free_barrier:
+ free(barrier);
+out_free_token:
+ free(token);
+out_free_args:
+ free(args);
+out_free_tinfo:
+ free(tinfo);
+
+ return ret;
+}
+
+int main(int argc, char **argv)
+{
+ if (!rseq_mm_cid_available()) {
+ fprintf(stderr, "Error: rseq_mm_cid unavailable\n");
+ return -1;
+ }
+ if (test_mm_cid_compaction())
+ return -1;
+ return 0;
+}
--
2.48.1