The check of is_backed_by_folio() is done on each page.
Directly move pointer to next page instead of increase one and check if
it is page size aligned.
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index 10ae65ea032f..7f7016ba4054 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -423,9 +423,8 @@ static void split_pte_mapped_thp(void)
/* smap does not show THPs after mremap, use kpageflags instead */
thp_size = 0;
- for (i = 0; i < pagesize * 4; i++)
- if (i % pagesize == 0 &&
- is_backed_by_folio(&pte_mapped[i], pmd_order, pagemap_fd, kpageflags_fd))
+ for (i = 0; i < pagesize * 4; i += pagesize)
+ if (is_backed_by_folio(&pte_mapped[i], pmd_order, pagemap_fd, kpageflags_fd))
thp_size++;
if (thp_size != 4)
--
2.34.1
I've removed the RFC tag from this version of the series, but the items
that I'm looking for feedback on remains the same:
- The userspace ABI, in particular:
- The vector length used for the SVE registers, access to the SVE
registers and access to ZA and (if available) ZT0 depending on
the current state of PSTATE.{SM,ZA}.
- The use of a single finalisation for both SVE and SME.
- The addition of control for enabling fine grained traps in a similar
manner to FGU but without the UNDEF, I'm not clear if this is desired
at all and at present this requires symmetric read and write traps like
FGU. That seemed like it might be desired from an implementation
point of view but we already have one case where we enable an
asymmetric trap (for ARM64_WORKAROUND_AMPERE_AC03_CPU_38) and it
seems generally useful to enable asymmetrically.
This series implements support for SME use in non-protected KVM guests.
Much of this is very similar to SVE, the main additional challenge that
SME presents is that it introduces a new vector length similar to the
SVE vector length and two new controls which change the registers seen
by guests:
- PSTATE.ZA enables the ZA matrix register and, if SME2 is supported,
the ZT0 LUT register.
- PSTATE.SM enables streaming mode, a new floating point mode which
uses the SVE register set with the separately configured SME vector
length. In streaming mode implementation of the FFR register is
optional.
It is also permitted to build systems which support SME without SVE, in
this case when not in streaming mode no SVE registers or instructions
are available. Further, there is no requirement that there be any
overlap in the set of vector lengths supported by SVE and SME in a
system, this is expected to be a common situation in practical systems.
Since there is a new vector length to configure we introduce a new
feature parallel to the existing SVE one with a new pseudo register for
the streaming mode vector length. Due to the overlap with SVE caused by
streaming mode rather than finalising SME as a separate feature we use
the existing SVE finalisation to also finalise SME, a new define
KVM_ARM_VCPU_VEC is provided to help make user code clearer. Finalising
SVE and SME separately would introduce complication with register access
since finalising SVE makes the SVE registers writeable by userspace and
doing multiple finalisations results in an error being reported.
Dealing with a state where the SVE registers are writeable due to one of
SVE or SME being finalised but may have their VL changed by the other
being finalised seems like needless complexity with minimal practical
utility, it seems clearer to just express directly that only one
finalisation can be done in the ABI.
Access to the floating point registers follows the architecture:
- When both SVE and SME are present:
- If PSTATE.SM == 0 the vector length used for the Z and P registers
is the SVE vector length.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
is the SME vector length.
- If only SME is present:
- If PSTATE.SM == 0 the Z and P registers are inaccessible and the
floating point state accessed via the encodings for the V registers.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
- The SME specific ZA and ZT0 registers are only accessible if SVCR.ZA is 1.
The VMM must understand this, in particular when loading state SVCR
should be configured before other state. It should be noted that while
the architecture refers to PSTATE.SM and PSTATE.ZA these PSTATE bits are
not preserved in SPSR_ELx, they are only accessible via SVCR.
There are a large number of subfeatures for SME, most of which only
offer additional instructions but some of which (SME2 and FA64) add
architectural state. These are configured via the ID registers as per
usual.
Protected KVM supported, with the implementation maintaining the
existing restriction that the hypervisor will refuse to run if streaming
mode or ZA is enabled. This both simplfies the code and avoids the need
to allocate storage for host ZA and ZT0 state, there seems to be little
practical use case for supporting this and the memory usage would be
non-trivial.
The new KVM_ARM_VCPU_VEC feature and ZA and ZT0 registers have not been
added to the get-reg-list selftest, the idea of supporting additional
features there without restructuring the program to generate all
possible feature combinations has been rejected. I will post a separate
series which does that restructuring.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v7:
- Rebase onto v6.17-rc1.
- Handle SMIDR_EL1 as a VM wide ID register and use this in feat_sme_smps().
- Expose affinity fields in SMIDR_EL1.
- Remove SMPRI_EL1 from vcpu_sysreg, the value is always 0 currently.
- Prevent userspace writes to SMPRIMAP_EL2.
- Link to v6: https://lore.kernel.org/r/20250625-kvm-arm64-sme-v6-0-114cff4ffe04@kernel.o…
Changes in v6:
- Rebase onto v6.16-rc3.
- Link to v5: https://lore.kernel.org/r/20250417-kvm-arm64-sme-v5-0-f469a2d5f574@kernel.o…
Changes in v5:
- Rebase onto v6.15-rc2.
- Add pKVM guest support.
- Always restore SVCR.
- Link to v4: https://lore.kernel.org/r/20250214-kvm-arm64-sme-v4-0-d64a681adcc2@kernel.o…
Changes in v4:
- Rebase onto v6.14-rc2 and Mark Rutland's fixes.
- Expose SME to nested guests.
- Additional cleanups and test fixes following on from the rebase.
- Flush register state on VMM PSTATE.{SM,ZA}.
- Link to v3: https://lore.kernel.org/r/20241220-kvm-arm64-sme-v3-0-05b018c1ffeb@kernel.o…
Changes in v3:
- Rebase onto v6.12-rc2.
- Link to v2: https://lore.kernel.org/r/20231222-kvm-arm64-sme-v2-0-da226cb180bb@kernel.o…
Changes in v2:
- Rebase onto v6.7-rc3.
- Configure subfeatures based on host system only.
- Complete nVHE support.
- There was some snafu with sending v1 out, it didn't make it to the
lists but in case it hit people's inboxes I'm sending as v2.
---
Mark Brown (29):
arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06
arm64/fpsimd: Update FA64 and ZT0 enables when loading SME state
arm64/fpsimd: Decide to save ZT0 and streaming mode FFR at bind time
arm64/fpsimd: Check enable bit for FA64 when saving EFI state
arm64/fpsimd: Determine maximum virtualisable SME vector length
KVM: arm64: Introduce non-UNDEF FGT control
KVM: arm64: Pay attention to FFR parameter in SVE save and load
KVM: arm64: Pull ctxt_has_ helpers to start of sysreg-sr.h
KVM: arm64: Move SVE state access macros after feature test macros
KVM: arm64: Rename SVE finalization constants to be more general
KVM: arm64: Document the KVM ABI for SME
KVM: arm64: Define internal features for SME
KVM: arm64: Rename sve_state_reg_region
KVM: arm64: Store vector lengths in an array
KVM: arm64: Implement SME vector length configuration
KVM: arm64: Support SME control registers
KVM: arm64: Support TPIDR2_EL0
KVM: arm64: Support SME identification registers for guests
KVM: arm64: Support SME priority registers
KVM: arm64: Provide assembly for SME register access
KVM: arm64: Support userspace access to streaming mode Z and P registers
KVM: arm64: Flush register state on writes to SVCR.SM and SVCR.ZA
KVM: arm64: Expose SME specific state to userspace
KVM: arm64: Context switch SME state for guests
KVM: arm64: Handle SME exceptions
KVM: arm64: Expose SME to nested guests
KVM: arm64: Provide interface for configuring and enabling SME for guests
KVM: arm64: selftests: Add SME system registers to get-reg-list
KVM: arm64: selftests: Add SME to set_id_regs test
Documentation/virt/kvm/api.rst | 117 +++++++----
arch/arm64/include/asm/fpsimd.h | 26 +++
arch/arm64/include/asm/kvm_emulate.h | 6 +
arch/arm64/include/asm/kvm_host.h | 169 ++++++++++++---
arch/arm64/include/asm/kvm_hyp.h | 5 +-
arch/arm64/include/asm/kvm_pkvm.h | 2 +-
arch/arm64/include/asm/vncr_mapping.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 33 +++
arch/arm64/kernel/cpufeature.c | 2 -
arch/arm64/kernel/fpsimd.c | 89 ++++----
arch/arm64/kvm/arm.c | 10 +
arch/arm64/kvm/config.c | 8 +-
arch/arm64/kvm/fpsimd.c | 28 ++-
arch/arm64/kvm/guest.c | 252 ++++++++++++++++++++---
arch/arm64/kvm/handle_exit.c | 14 ++
arch/arm64/kvm/hyp/fpsimd.S | 28 ++-
arch/arm64/kvm/hyp/include/hyp/switch.h | 175 ++++++++++++++--
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 110 ++++++----
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 86 ++++++--
arch/arm64/kvm/hyp/nvhe/pkvm.c | 85 ++++++--
arch/arm64/kvm/hyp/nvhe/switch.c | 4 +-
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 6 +
arch/arm64/kvm/hyp/vhe/switch.c | 17 +-
arch/arm64/kvm/hyp/vhe/sysreg-sr.c | 7 +
arch/arm64/kvm/nested.c | 3 +-
arch/arm64/kvm/reset.c | 156 ++++++++++----
arch/arm64/kvm/sys_regs.c | 141 ++++++++++++-
arch/arm64/tools/sysreg | 8 +-
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/arm64/get-reg-list.c | 15 +-
tools/testing/selftests/kvm/arm64/set_id_regs.c | 27 ++-
31 files changed, 1328 insertions(+), 304 deletions(-)
---
base-commit: 062b3e4a1f880f104a8d4b90b767788786aa7b78
change-id: 20230301-kvm-arm64-sme-06a1246d3636
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This is based on mm-unstable.
I will only CC non-MM folks on the cover letter and the respective patch
to not flood too many inboxes (the lists receive all patches).
--
As discussed recently with Linus, nth_page() is just nasty and we would
like to remove it.
To recap, the reason we currently need nth_page() within a folio is because
on some kernel configs (SPARSEMEM without SPARSEMEM_VMEMMAP), the
memmap is allocated per memory section.
While buddy allocations cannot cross memory section boundaries, hugetlb
and dax folios can.
So crossing a memory section means that "page++" could do the wrong thing.
Instead, nth_page() on these problematic configs always goes from
page->pfn, to the go from (++pfn)->page, which is rather nasty.
Likely, many people have no idea when nth_page() is required and when
it might be dropped.
We refer to such problematic PFN ranges and "non-contiguous pages".
If we only deal with "contiguous pages", there is not need for nth_page().
Besides that "obvious" folio case, we might end up using nth_page()
within CMA allocations (again, could span memory sections), and in
one corner case (kfence) when processing memblock allocations (again,
could span memory sections).
So let's handle all that, add sanity checks, and remove nth_page().
Patch #1 -> #5 : stop making SPARSEMEM_VMEMMAP user-selectable + cleanups
Patch #6 -> #13 : disallow folios to have non-contiguous pages
Patch #14 -> #20 : remove nth_page() usage within folios
Patch #21 : disallow CMA allocations of non-contiguous pages
Patch #22 -> #32 : sanity+check + remove nth_page() usage within SG entry
Patch #33 : sanity-check + remove nth_page() usage in
unpin_user_page_range_dirty_lock()
Patch #34 : remove nth_page() in kfence
Patch #35 : adjust stale comment regarding nth_page
Patch #36 : mm: remove nth_page()
A lot of this is inspired from the discussion at [1] between Linus, Jason
and me, so cudos to them.
[1] https://lore.kernel.org/all/CAHk-=wiCYfNp4AJLBORU-c7ZyRBUp66W2-Et6cdQ4REx-G…
RFC -> v1:
* "wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu kernel
config"
-> Mention that it was never really relevant for the test
* "mm/mm_init: make memmap_init_compound() look more like
prep_compound_page()"
-> Mention the setup of page links
* "mm: limit folio/compound page sizes in problematic kernel configs"
-> Improve comment for PUD handling, mentioning hugetlb and dax
* "mm: simplify folio_page() and folio_page_idx()"
-> Call variable "n"
* "mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()"
-> Keep __init_single_page() and refer to the usage of
memblock_reserved_mark_noinit()
* "fs: hugetlbfs: cleanup folio in adjust_range_hwpoison()"
* "fs: hugetlbfs: remove nth_page() usage within folio in
adjust_range_hwpoison()"
-> Separate nth_page() removal from cleanups
-> Further improve cleanups
* "io_uring/zcrx: remove nth_page() usage within folio"
-> Keep the io_copy_cache for now and limit to nth_page() removal
* "mm/gup: drop nth_page() usage within folio when recording subpages"
-> Cleanup record_subpages as bit
* "mm/cma: refuse handing out non-contiguous page ranges"
-> Replace another instance of "pfn_to_page(pfn)" where we already have
the page
* "scatterlist: disallow non-contigous page ranges in a single SG entry"
-> We have to EXPORT the symbol. I thought about moving it to mm_inline.h,
but I really don't want to include that in include/linux/scatterlist.h
* "ata: libata-eh: drop nth_page() usage within SG entry"
* "mspro_block: drop nth_page() usage within SG entry"
* "memstick: drop nth_page() usage within SG entry"
* "mmc: drop nth_page() usage within SG entry"
-> Keep PAGE_SHIFT
* "scsi: scsi_lib: drop nth_page() usage within SG entry"
* "scsi: sg: drop nth_page() usage within SG entry"
-> Split patches, Keep PAGE_SHIFT
* "crypto: remove nth_page() usage within SG entry"
-> Keep PAGE_SHIFT
* "kfence: drop nth_page() usage"
-> Keep modifying i and use "start_pfn" only instead
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Marek Szyprowski <m.szyprowski(a)samsung.com>
Cc: Robin Murphy <robin.murphy(a)arm.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Brendan Jackman <jackmanb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Christoph Lameter <cl(a)gentwo.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: x86(a)kernel.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-mips(a)vger.kernel.org
Cc: linux-s390(a)vger.kernel.org
Cc: linux-crypto(a)vger.kernel.org
Cc: linux-ide(a)vger.kernel.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-mmc(a)vger.kernel.org
Cc: linux-arm-kernel(a)axis.com
Cc: linux-scsi(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
Cc: virtualization(a)lists.linux.dev
Cc: linux-mm(a)kvack.org
Cc: io-uring(a)vger.kernel.org
Cc: iommu(a)lists.linux.dev
Cc: kasan-dev(a)googlegroups.com
Cc: wireguard(a)lists.zx2c4.com
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-riscv(a)lists.infradead.org
David Hildenbrand (36):
mm: stop making SPARSEMEM_VMEMMAP user-selectable
arm64: Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
s390/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
x86/Kconfig: drop superfluous "select SPARSEMEM_VMEMMAP"
wireguard: selftests: remove CONFIG_SPARSEMEM_VMEMMAP=y from qemu
kernel config
mm/page_alloc: reject unreasonable folio/compound page sizes in
alloc_contig_range_noprof()
mm/memremap: reject unreasonable folio/compound page sizes in
memremap_pages()
mm/hugetlb: check for unreasonable folio sizes when registering hstate
mm/mm_init: make memmap_init_compound() look more like
prep_compound_page()
mm: sanity-check maximum folio size in folio_set_order()
mm: limit folio/compound page sizes in problematic kernel configs
mm: simplify folio_page() and folio_page_idx()
mm/hugetlb: cleanup hugetlb_folio_init_tail_vmemmap()
mm/mm/percpu-km: drop nth_page() usage within single allocation
fs: hugetlbfs: remove nth_page() usage within folio in
adjust_range_hwpoison()
fs: hugetlbfs: cleanup folio in adjust_range_hwpoison()
mm/pagewalk: drop nth_page() usage within folio in folio_walk_start()
mm/gup: drop nth_page() usage within folio when recording subpages
io_uring/zcrx: remove nth_page() usage within folio
mips: mm: convert __flush_dcache_pages() to
__flush_dcache_folio_pages()
mm/cma: refuse handing out non-contiguous page ranges
dma-remap: drop nth_page() in dma_common_contiguous_remap()
scatterlist: disallow non-contigous page ranges in a single SG entry
ata: libata-eh: drop nth_page() usage within SG entry
drm/i915/gem: drop nth_page() usage within SG entry
mspro_block: drop nth_page() usage within SG entry
memstick: drop nth_page() usage within SG entry
mmc: drop nth_page() usage within SG entry
scsi: scsi_lib: drop nth_page() usage within SG entry
scsi: sg: drop nth_page() usage within SG entry
vfio/pci: drop nth_page() usage within SG entry
crypto: remove nth_page() usage within SG entry
mm/gup: drop nth_page() usage in unpin_user_page_range_dirty_lock()
kfence: drop nth_page() usage
block: update comment of "struct bio_vec" regarding nth_page()
mm: remove nth_page()
arch/arm64/Kconfig | 1 -
arch/mips/include/asm/cacheflush.h | 11 +++--
arch/mips/mm/cache.c | 8 ++--
arch/s390/Kconfig | 1 -
arch/x86/Kconfig | 1 -
crypto/ahash.c | 4 +-
crypto/scompress.c | 8 ++--
drivers/ata/libata-sff.c | 6 +--
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 2 +-
drivers/memstick/core/mspro_block.c | 3 +-
drivers/memstick/host/jmb38x_ms.c | 3 +-
drivers/memstick/host/tifm_ms.c | 3 +-
drivers/mmc/host/tifm_sd.c | 4 +-
drivers/mmc/host/usdhi6rol0.c | 4 +-
drivers/scsi/scsi_lib.c | 3 +-
drivers/scsi/sg.c | 3 +-
drivers/vfio/pci/pds/lm.c | 3 +-
drivers/vfio/pci/virtio/migrate.c | 3 +-
fs/hugetlbfs/inode.c | 33 +++++--------
include/crypto/scatterwalk.h | 4 +-
include/linux/bvec.h | 7 +--
include/linux/mm.h | 48 +++++++++++++++----
include/linux/page-flags.h | 5 +-
include/linux/scatterlist.h | 3 +-
io_uring/zcrx.c | 4 +-
kernel/dma/remap.c | 2 +-
mm/Kconfig | 3 +-
mm/cma.c | 39 +++++++++------
mm/gup.c | 14 ++++--
mm/hugetlb.c | 22 +++++----
mm/internal.h | 1 +
mm/kfence/core.c | 12 +++--
mm/memremap.c | 3 ++
mm/mm_init.c | 15 +++---
mm/page_alloc.c | 5 +-
mm/pagewalk.c | 2 +-
mm/percpu-km.c | 2 +-
mm/util.c | 34 +++++++++++++
tools/testing/scatterlist/linux/mm.h | 1 -
.../selftests/wireguard/qemu/kernel.config | 1 -
40 files changed, 202 insertions(+), 129 deletions(-)
base-commit: efa7612003b44c220551fd02466bfbad5180fc83
--
2.50.1
From: Yicong Yang <yangyicong(a)hisilicon.com>
Armv8.7 introduces single-copy atomic 64-byte loads and stores
instructions and its variants named under FEAT_{LS64, LS64_V}.
Add support for Armv8.7 FEAT_{LS64, LS64_V}:
- Add identifying and enabling in the cpufeature list
- Expose the support of these features to userspace through HWCAP3
and cpuinfo
- Add related hwcap test
- Handle the trap of unsupported memory (normal/uncacheable) access in a VM
A real scenario for this feature is that the userspace driver can make use of
this to implement direct WQE (workqueue entry) - a mechanism to fill WQE
directly into the hardware.
Picked Marc's 2 patches form [1] for handling the LS64 trap in a VM on emulated
MMIO and the introduce of KVM_EXIT_ARM_LDST64B.
[1] https://lore.kernel.org/linux-arm-kernel/20240815125959.2097734-1-maz@kerne…
Tested with updated hwcap test:
[root@localhost tmp]# dmesg | grep "All CPU(s) started"
[ 14.789859] CPU: All CPU(s) started at EL2
[root@localhost tmp]# ./hwcap
# LS64 present
ok 217 cpuinfo_match_LS64
ok 218 sigill_LS64
ok 219 # SKIP sigbus_LS64_V
# LS64_V present
ok 220 cpuinfo_match_LS64_V
ok 221 sigill_LS64_V
ok 222 # SKIP sigbus_LS64_V
# 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0
root@localhost:/mnt# dmesg | grep "All CPU(s) started"
[ 0.281152] CPU: All CPU(s) started at EL1
root@localhost:/mnt# ./hwcap
# LS64 present
ok 217 cpuinfo_match_LS64
ok 218 sigill_LS64
ok 219 # SKIP sigbus_LS64
# LS64_V present
ok 220 cpuinfo_match_LS64_V
ok 221 sigill_LS64_V
ok 222 # SKIP sigbus_LS64_V
# 115 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:107 fail:0 xfail:0 xpass:0 skip:115 error:0
Change since v4:
- Rebase on v6.17-rc2 and fix the conflicts
Link: https://lore.kernel.org/linux-arm-kernel/20250715081356.12442-1-yangyicong@…
Change since v3:
- Inject DABT fault for LS64 fault on unsupported memory but with valid memslot
Link: https://lore.kernel.org/linux-arm-kernel/20250626080906.64230-1-yangyicong@…
Change since v2:
- Handle the LS64 fault to userspace and allow userspace to inject LS64 fault
- Reorder the patches to make KVM handling prior to feature support
Link: https://lore.kernel.org/linux-arm-kernel/20250331094320.35226-1-yangyicong@…
Change since v1:
- Drop the support for LS64_ACCDATA
- handle the DABT of unsupported memory type after checking the memory attributes
Link: https://lore.kernel.org/linux-arm-kernel/20241202135504.14252-1-yangyicong@…
Marc Zyngier (2):
KVM: arm64: Add exit to userspace on {LD,ST}64B* outside of memslots
KVM: arm64: Add documentation for KVM_EXIT_ARM_LDST64B
Yicong Yang (5):
KVM: arm64: Handle DABT caused by LS64* instructions on unsupported
memory
arm64: Provide basic EL2 setup for FEAT_{LS64, LS64_V} usage at EL0/1
arm64: Add support for FEAT_{LS64, LS64_V}
KVM: arm64: Enable FEAT_{LS64, LS64_V} in the supported guest
kselftest/arm64: Add HWCAP test for FEAT_{LS64, LS64_V}
Documentation/arch/arm64/booting.rst | 12 +++
Documentation/arch/arm64/elf_hwcaps.rst | 6 ++
Documentation/virt/kvm/api.rst | 43 +++++++++--
arch/arm64/include/asm/el2_setup.h | 12 ++-
arch/arm64/include/asm/esr.h | 8 ++
arch/arm64/include/asm/hwcap.h | 2 +
arch/arm64/include/asm/kvm_emulate.h | 7 ++
arch/arm64/include/uapi/asm/hwcap.h | 2 +
arch/arm64/kernel/cpufeature.c | 51 +++++++++++++
arch/arm64/kernel/cpuinfo.c | 2 +
arch/arm64/kvm/inject_fault.c | 22 ++++++
arch/arm64/kvm/mmio.c | 27 ++++++-
arch/arm64/kvm/mmu.c | 14 +++-
arch/arm64/tools/cpucaps | 2 +
include/uapi/linux/kvm.h | 3 +-
tools/testing/selftests/arm64/abi/hwcap.c | 90 +++++++++++++++++++++++
16 files changed, 292 insertions(+), 11 deletions(-)
--
2.24.0
Summary
----------
Hi, everyone,
This patch set introduces an extensible cpuidle governor framework
using BPF struct_ops, enabling dynamic implementation of idle-state selection policies
via BPF programs.
Motivation
----------
As is well-known, CPUs support multiple idle states (e.g., C0, C1, C2, ...),
where deeper states reduce power consumption, but results in longer wakeup latency,
potentially affecting performance.
Existing generic cpuidle governors operate effectively in common scenarios
but exhibit suboptimal behavior in specific Android phone's use cases.
Our testing reveals that during low-utilization scenarios
(e.g., screen-off background tasks like music playback with CPU utilization <10%),
the C0 state occupies ~50% of idle time, causing significant energy inefficiency.
Reducing C0 to ≤20% could yield ≥5% power savings on mobile phones.
To address this, we expect:
1.Dynamic governor switching to power-saved policies for low cpu utilization scenarios (e.g., screen-off mode)
2.Dynamic switching to alternate governors for high-performance scenarios (e.g., gaming)
OverView
----------
The BPF cpuidle ext governor registers at postcore_initcall()
but remains disabled by default due to its low priority "rating" with value "1".
Activation requires adjust higer "rating" than other governors within BPF.
Core Components:
1.**struct cpuidle_gov_ext_ops** – BPF-overridable operations:
- ops.enable()/ops.disable(): enable or disable callback
- ops.select(): cpu Idle-state selection logic
- ops.set_stop_tick(): Scheduler tick management after state selection
- ops.reflect(): feedback info about previous idle state.
- ops.init()/ops.deinit(): Initialization or cleanup.
2.**Critical kfuncs for kernel state access**:
- bpf_cpuidle_ext_gov_update_rating():
Activate ext governor by raising rating must be called from "ops.init()"
- bpf_cpuidle_ext_gov_latency_req(): get idle-state latency constraints
- bpf_tick_nohz_get_sleep_length(): get CPU sleep duration in tickless mode
Future work
----------
1. Scenario detection: Identifying low-utilization states (e.g., screen-off + background music)
2. Policy optimization: Optimizing state-selection algorithms for specific scenarios
Lin Yikai (2):
Subject: [PATCH v1 1/2] cpuidle: Implement BPF extensible cpuidle class
Subject: [PATCH v1 2/2] selftests/bpf: Add selftests
drivers/cpuidle/Kconfig | 12 +
drivers/cpuidle/governors/Makefile | 1 +
drivers/cpuidle/governors/ext.c | 537 ++++++++++++++++++
.../bpf/prog_tests/test_cpuidle_gov_ext.c | 28 +
.../selftests/bpf/progs/cpuidle_gov_ext.c | 208 +++++++
5 files changed, 786 insertions(+)
create mode 100644 drivers/cpuidle/governors/ext.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cpuidle_gov_ext.c
create mode 100644 tools/testing/selftests/bpf/progs/cpuidle_gov_ext.c
--
2.43.0
In fp-trace when allocating a buffer to write SVE register data we open
code the addition of the header size to the VL depeendent register data
size, which lead to an underallocation bug when we cut'n'pasted the code
for FPSIMD format writes. Use the SVE_PT_SIZE() macro that the kernel
UAPI provides for this.
Fixes: b84d2b27954f ("kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/fp-ptrace.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/arm64/fp/fp-ptrace.c b/tools/testing/selftests/arm64/fp/fp-ptrace.c
index 124bc883365e..cdd7a45c045d 100644
--- a/tools/testing/selftests/arm64/fp/fp-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/fp-ptrace.c
@@ -1187,7 +1187,7 @@ static void sve_write_sve(pid_t child, struct test_config *config)
if (!vl)
return;
- iov.iov_len = SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq, SVE_PT_REGS_SVE);
+ iov.iov_len = SVE_PT_SIZE(vq, SVE_PT_REGS_SVE);
iov.iov_base = malloc(iov.iov_len);
if (!iov.iov_base) {
ksft_print_msg("Failed allocating %lu byte SVE write buffer\n",
@@ -1234,8 +1234,7 @@ static void sve_write_fpsimd(pid_t child, struct test_config *config)
if (!vl)
return;
- iov.iov_len = SVE_PT_SVE_OFFSET + SVE_PT_SVE_SIZE(vq,
- SVE_PT_REGS_FPSIMD);
+ iov.iov_len = SVE_PT_SIZE(vq, SVE_PT_REGS_FPSIMD);
iov.iov_base = malloc(iov.iov_len);
if (!iov.iov_base) {
ksft_print_msg("Failed allocating %lu byte SVE write buffer\n",
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250808-arm64-fp-trace-macro-02ede083da51
Best regards,
--
Mark Brown <broonie(a)kernel.org>
[All the precursor patches are merged now and AMD/RISCV/VTD conversions
are written]
Currently each of the iommu page table formats duplicates all of the logic
to maintain the page table and perform map/unmap/etc operations. There are
several different versions of the algorithms between all the different
formats. The io-pgtable system provides an interface to help isolate the
page table code from the iommu driver, but doesn't provide tools to
implement the common algorithms.
This makes it very hard to improve the state of the pagetable code under
the iommu domains as any proposed improvement needs to alter a large
number of different driver code paths. Combined with a lack of software
based testing this makes improvement in this area very hard.
iommufd wants several new page table operations:
- More efficient map/unmap operations, using iommufd's batching logic
- unmap that returns the physical addresses into a batch as it progresses
- cut that allows splitting areas so large pages can have holes
poked in them dynamically (ie guestmemfd hitless shared/private
transitions)
- More agressive freeing of table memory to avoid waste
- Fragmenting large pages so that dirty tracking can be more granular
- Reassembling large pages so that VMs can run at full IO performance
in migration/dirty tracking error flows
- KHO integration for kernel live upgrade
Together these are algorithmically complex enough to be a very significant
task to go and implement in all the page table formats we support. Just
the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86
PAE / AMDv1 / VT-D SS / RISCV)
Instead of doing the duplicated work, this series takes the first step to
consolidate the algorithms into one places. In spirit it is similar to the
work Christoph did a few years back to pull the redundant get_user_pages()
implementations out of the arch code into core MM. This unlocked a great
deal of improvement in that space in the following years. I would like to
see the same benefit in iommu as well.
My first RFC showed a bigger picture with all most all formats and more
algorithms. This series reorganizes that to be narrowly focused on just
enough to convert the AMD driver to use the new mechanism.
kunit tests are provided that allow good testing of the algorithms and all
formats on x86, nothing is arch specific.
AMD is one of the simpler options as the HW is quite uniform with few
different options/bugs while still requiring the complicated contiguous
pages support. The HW also has a very simple range based invalidation
approach that is easy to implement.
The AMD v1 and AMD v2 page table formats are implemented bit for bit
identical to the current code, tested using a compare kunit test that
checks against the io-pgtable version (on github, see below).
Updating the AMD driver to replace the io-pgtable layer with the new stuff
is fairly straightforward now. The layering is fixed up in the new version
so that all the invalidation goes through function pointers.
Several small fixing patches have come out of this as I've been fixing the
problems that the test suite uncovers in the current code, and
implementing the fixed version in iommupt.
On performance, there is a quite wide variety of implementation designs
across all the drivers. Looking at some key performance across
the main formats:
iommu_map():
pgsz ,avg new,old ns, min new,old ns , min % (+ve is better)
2^12, 53,66 , 51,63 , 19.19 (AMDV1)
256*2^12, 386,1909 , 367,1795 , 79.79
256*2^21, 362,1633 , 355,1556 , 77.77
2^12, 56,62 , 52,59 , 11.11 (AMDv2)
256*2^12, 405,1355 , 357,1292 , 72.72
256*2^21, 393,1160 , 358,1114 , 67.67
2^12, 55,65 , 53,62 , 14.14 (VTD second stage)
256*2^12, 391,518 , 332,512 , 35.35
256*2^21, 383,635 , 336,624 , 46.46
2^12, 57,65 , 55,63 , 12.12 (ARM 64 bit)
256*2^12, 380,389 , 361,369 , 2.02
256*2^21, 358,419 , 345,400 , 13.13
iommu_unmap():
pgsz ,avg new,old ns, min new,old ns , min % (+ve is better)
2^12, 69,88 , 65,85 , 23.23 (AMDv1)
256*2^12, 353,6498 , 331,6029 , 94.94
256*2^21, 373,6014 , 360,5706 , 93.93
2^12, 71,72 , 66,69 , 4.04 (AMDv2)
256*2^12, 228,891 , 206,871 , 76.76
256*2^21, 254,721 , 245,711 , 65.65
2^12, 69,87 , 65,82 , 20.20 (VTD second stage)
256*2^12, 210,321 , 200,315 , 36.36
256*2^21, 255,349 , 238,342 , 30.30
2^12, 72,77 , 68,74 , 8.08 (ARM 64 bit)
256*2^12, 521,357 , 447,346 , -29.29
256*2^21, 489,358 , 433,345 , -25.25
* Above numbers include additional patches to remove the iommu_pgsize()
overheads. gcc 13.3.0, i7-12700
This version provides fairly consistent performance across formats. ARM
unmap performance is quite different because this version supports
contiguous pages and uses a very different algorithm for unmapping. Though
why it is so worse compared to AMDv1 I haven't figured out yet.
The per-format commits include a more detailed chart.
There is a second branch:
https://github.com/jgunthorpe/linux/commits/iommu_pt_all
Containing supporting work and future steps:
- ARM short descriptor (32 bit), ARM long descriptor (64 bit) formats
- RISCV format and RISCV conversion
https://github.com/jgunthorpe/linux/commits/iommu_pt_riscv
- Support for a DMA incoherent HW page table walker
- VT-D second stage format and VT-D conversion
https://github.com/jgunthorpe/linux/commits/iommu_pt_vtd
- DART v1 & v2 format
- Draft of a iommufd 'cut' operation to break down huge pages
- A compare test that checks the iommupt formats against the iopgtable
interface, including updating AMD to have a working iopgtable and patches
to make VT-D have an iopgtable for testing.
- A performance test to micro-benchmark map and unmap against iogptable
My strategy is to go one by one for the drivers:
- AMD driver conversion
- RISCV page table and driver
- Intel VT-D driver and VTDSS page table
- Flushing improvements for RISCV
- ARM SMMUv3
And concurrently work on the algorithm side:
- debugfs content dump, like VT-D has
- Cut support
- Increase/Decrease page size support
- map/unmap batching
- KHO
As we make more algorithm improvements the value to convert the drivers
increases.
This is on github: https://github.com/jgunthorpe/linux/commits/iommu_pt
v3:
- Rebase on v6.16-rc3
- Integrate the HATS/HATDis changes
- Remove 'default n' from kconfig
- Remove unused 'PT_FIXED_TOP_LEVEL'
- Improve comments and coumentation
- Fix some compile warnings from kbuild robots
v2: https://patch.msgid.link/r/0-v3-a93aab628dbc+521-iommu_pt_jgg@nvidia.com
- Rebase on v6.16-rc2
- s/PT_ENTRY_WORD_SIZE/PT_ITEM_WORD_SIZE/s to follow the language better
- Comment and documentation updates
- Add PT_TOP_PHYS_MASK to help manage alignment restrictions on the top
pointer
- Add missed force_aperture = true
- Make pt_iommu_deinit() take care of the not-yet-inited error case
internally as AMD/RISCV/VTD all shared this logic
- Change gather_range() into gather_range_pages() so it also deals with
the page list. This makes the following cache flushing series simpler
- Fix missed update of unmap->unmapped in some error cases
- Change clear_contig() to order the gather more logically
- Remove goto from the error handling in __map_range_leaf()
- s/log2_/oalog2_/ in places where the argument is an oaddr_t
- Pass the pts to pt_table_install64/32()
- Do not use SIGN_EXTEND for the AMDv2 page table because of Vasant's
information on how PASID 0 works.
v1: https://patch.msgid.link/r/0-v2-5c26bde5c22d+58b-iommu_pt_jgg@nvidia.com
- AMD driver only, many code changes
RFC: https://lore.kernel.org/all/0-v1-01fa10580981+1d-iommu_pt_jgg@nvidia.com/
Cc: Michael Roth <michael.roth(a)amd.com>
Cc: Alexey Kardashevskiy <aik(a)amd.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: James Gowans <jgowans(a)amazon.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Alejandro Jimenez (1):
iommu/amd: Use the generic iommu page table
Jason Gunthorpe (14):
genpt: Generic Page Table base API
genpt: Add Documentation/ files
iommupt: Add the basic structure of the iommu implementation
iommupt: Add the AMD IOMMU v1 page table format
iommupt: Add iova_to_phys op
iommupt: Add unmap_pages op
iommupt: Add map_pages op
iommupt: Add read_and_clear_dirty op
iommupt: Add a kunit test for Generic Page Table
iommupt: Add a mock pagetable format for iommufd selftest to use
iommufd: Change the selftest to use iommupt instead of xarray
iommupt: Add the x86 64 bit page table format
iommu/amd: Remove AMD io_pgtable support
iommupt: Add a kunit test for the IOMMU implementation
.clang-format | 1 +
Documentation/driver-api/generic_pt.rst | 140 ++
Documentation/driver-api/index.rst | 1 +
drivers/iommu/Kconfig | 2 +
drivers/iommu/Makefile | 1 +
drivers/iommu/amd/Kconfig | 5 +-
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu.h | 1 -
drivers/iommu/amd/amd_iommu_types.h | 109 +-
drivers/iommu/amd/io_pgtable.c | 560 --------
drivers/iommu/amd/io_pgtable_v2.c | 370 ------
drivers/iommu/amd/iommu.c | 538 ++++----
drivers/iommu/generic_pt/.kunitconfig | 13 +
drivers/iommu/generic_pt/Kconfig | 67 +
drivers/iommu/generic_pt/fmt/Makefile | 26 +
drivers/iommu/generic_pt/fmt/amdv1.h | 409 ++++++
drivers/iommu/generic_pt/fmt/defs_amdv1.h | 21 +
drivers/iommu/generic_pt/fmt/defs_x86_64.h | 21 +
drivers/iommu/generic_pt/fmt/iommu_amdv1.c | 15 +
drivers/iommu/generic_pt/fmt/iommu_mock.c | 10 +
drivers/iommu/generic_pt/fmt/iommu_template.h | 48 +
drivers/iommu/generic_pt/fmt/iommu_x86_64.c | 11 +
drivers/iommu/generic_pt/fmt/x86_64.h | 248 ++++
drivers/iommu/generic_pt/iommu_pt.h | 1146 +++++++++++++++++
drivers/iommu/generic_pt/kunit_generic_pt.h | 717 +++++++++++
drivers/iommu/generic_pt/kunit_iommu.h | 183 +++
drivers/iommu/generic_pt/kunit_iommu_pt.h | 451 +++++++
drivers/iommu/generic_pt/pt_common.h | 354 +++++
drivers/iommu/generic_pt/pt_defs.h | 323 +++++
drivers/iommu/generic_pt/pt_fmt_defaults.h | 193 +++
drivers/iommu/generic_pt/pt_iter.h | 636 +++++++++
drivers/iommu/generic_pt/pt_log2.h | 130 ++
drivers/iommu/io-pgtable.c | 4 -
drivers/iommu/iommufd/Kconfig | 1 +
drivers/iommu/iommufd/iommufd_test.h | 11 +-
drivers/iommu/iommufd/selftest.c | 438 +++----
include/linux/generic_pt/common.h | 166 +++
include/linux/generic_pt/iommu.h | 270 ++++
include/linux/io-pgtable.h | 2 -
tools/testing/selftests/iommu/iommufd.c | 60 +-
tools/testing/selftests/iommu/iommufd_utils.h | 12 +
41 files changed, 6124 insertions(+), 1592 deletions(-)
create mode 100644 Documentation/driver-api/generic_pt.rst
delete mode 100644 drivers/iommu/amd/io_pgtable.c
delete mode 100644 drivers/iommu/amd/io_pgtable_v2.c
create mode 100644 drivers/iommu/generic_pt/.kunitconfig
create mode 100644 drivers/iommu/generic_pt/Kconfig
create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86_64.h
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_mock.c
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86_64.c
create mode 100644 drivers/iommu/generic_pt/fmt/x86_64.h
create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h
create mode 100644 drivers/iommu/generic_pt/pt_common.h
create mode 100644 drivers/iommu/generic_pt/pt_defs.h
create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
create mode 100644 drivers/iommu/generic_pt/pt_iter.h
create mode 100644 drivers/iommu/generic_pt/pt_log2.h
create mode 100644 include/linux/generic_pt/common.h
create mode 100644 include/linux/generic_pt/iommu.h
base-commit: 8da0d63bd5726ff656bfa1eacb45d6f5cce65616
--
2.43.0
The pthread_attr_setaffinity_np function is a GNU extension that may not
be available in non-glibc C libraries. Some KVM selftests use this
function for CPU affinity control.
Add a function declaration and weak stub implementation for non-glibc
builds. This allows tests to build, with the affinity setting being a
no-op and errno set for the caller when the actual function is not available.
Signed-off-by: Aqib Faruqui <aqibaf(a)amazon.com>
---
tools/testing/selftests/kvm/include/kvm_util.h | 4 ++++
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +++++++++++
2 files changed, 15 insertions(+)
diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
index 7fae7f5e7..8177178b5 100644
--- a/tools/testing/selftests/kvm/include/kvm_util.h
+++ b/tools/testing/selftests/kvm/include/kvm_util.h
@@ -31,6 +31,10 @@
#include "kvm_util_types.h"
#include "sparsebit.h"
+#ifndef __GLIBC__
+int pthread_attr_setaffinity_np(pthread_attr_t *attr, size_t cpusetsize, const cpu_set_t *cpuset);
+#endif /* __GLIBC__ */
+
#define KVM_DEV_PATH "/dev/kvm"
#define KVM_MAX_VCPUS 512
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index c3f5142b0..5ce80303d 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -20,6 +20,17 @@
#define KVM_UTIL_MIN_PFN 2
+#ifndef __GLIBC__
+int __attribute__((weak))
+pthread_attr_setaffinity_np(pthread_attr_t *__attr,
+ size_t __cpusetsize,
+ const cpu_set_t *__cpuset)
+{
+ errno = ENOSYS;
+ return -1;
+}
+#endif
+
uint32_t guest_random_seed;
struct guest_random_state guest_rng;
static uint32_t last_guest_seed;
--
2.47.3
From: Rong Tao <rongtao(a)cestc.cn>
strnstr should not treat the ending '\0' of s2 as a matching character
if the parameter 'len' equal to s2 string length, for example:
1. bpf_strnstr("openat", "open", 4) = -ENOENT
2. bpf_strnstr("openat", "open", 5) = 0
This patch makes (1) return 0, fix just the `len == strlen(s2)` case.
And fix a more general case when s2 is a suffix of the first len
characters of s1.
Fixes: e91370550f1f ("bpf: Add kfuncs for read-only string operations")
Signed-off-by: Rong Tao <rongtao(a)cestc.cn>
---
kernel/bpf/helpers.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 401b4932cc49..91ad124844ae 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3672,10 +3672,17 @@ __bpf_kfunc int bpf_strnstr(const char *s1__ign, const char *s2__ign, size_t len
guard(pagefault)();
for (i = 0; i < XATTR_SIZE_MAX; i++) {
- for (j = 0; i + j < len && j < XATTR_SIZE_MAX; j++) {
+ for (j = 0; i + j <= len && j < XATTR_SIZE_MAX; j++) {
__get_kernel_nofault(&c2, s2__ign + j, char, err_out);
if (c2 == '\0')
return i;
+ /**
+ * We allow reading an extra byte from s2 (note the
+ * `i + j <= len` above) to cover the case when s2 is
+ * a suffix of the first len chars of s1.
+ */
+ if (i + j == len)
+ break;
__get_kernel_nofault(&c1, s1__ign + j, char, err_out);
if (c1 == '\0')
return -ENOENT;
--
2.51.0
Some C libraries may not define the ulong typedef that is commonly
available as a BSD/GNU extension. Add a fallback typedef to ensure ulong
is available across all selftest environments.
Signed-off-by: Aqib Faruqui <aqibaf(a)amazon.com>
---
tools/testing/selftests/kselftest.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index f362c6766..a1088a2af 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -58,6 +58,11 @@
#include <stdio.h>
#include <sys/utsname.h>
#include <sys/syscall.h>
+#include <sys/types.h>
+#endif
+
+#ifndef ulong
+typedef unsigned long ulong;
#endif
#ifndef ARRAY_SIZE
--
2.47.3
The original stdbuf use only checked if /usr/bin/stdbuf exists in the
host's system but failed to verify compatibility between stdbuf and the
target test binary.
The issue occurs when:
- Host system has glibc-based stdbuf from coreutils
- Selftest binaries are compiled with a non-glibc toolchain (cross
compilation)
The fix adds a runtime compatibility test against the target test binary
before enabling stdbuf, enabling cross-compiled selftests to run
successfully.
Signed-off-by: Aqib Faruqui <aqibaf(a)amazon.com>
---
tools/testing/selftests/kselftest/runner.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 2c3c58e65..8d4e33bd5 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -107,7 +107,7 @@ run_one()
echo "# Warning: file $TEST is missing!"
echo "not ok $test_num $TEST_HDR_MSG"
else
- if [ -x /usr/bin/stdbuf ]; then
+ if [ -x /usr/bin/stdbuf ] && [ -x "$TEST" ] && /usr/bin/stdbuf --output=L ldd "$TEST" >/dev/null 2>&1; then
stdbuf="/usr/bin/stdbuf --output=L "
fi
eval kselftest_cmd_args="\$${kselftest_cmd_args_ref:-}"
--
2.47.3
The rseq selftests rely on features provided by glibc that may not be
available in non-glibc C libraries:
1. The __GNU_PREREQ macro and glibc's thread pointer implementation are
not available in non-glibc libraries
2. The __NR_rseq syscall number may not be defined in non-glibc headers
Add a fallback thread pointer implementation for non-glibc systems using
the pre-existing inline assembly to access thread-local storage directly
via %fs/%gs registers. Also provide a fallback definition for __NR_rseq
when not already defined by the C library headers: 527 for alpha and 293
for other architectures.
Signed-off-by: Aqib Faruqui <aqibaf(a)amazon.com>
---
.../selftests/rseq/rseq-x86-thread-pointer.h | 14 ++++++++++++++
tools/testing/selftests/rseq/rseq.c | 8 ++++++++
2 files changed, 22 insertions(+)
diff --git a/tools/testing/selftests/rseq/rseq-x86-thread-pointer.h b/tools/testing/selftests/rseq/rseq-x86-thread-pointer.h
index d3133587d..a7c402926 100644
--- a/tools/testing/selftests/rseq/rseq-x86-thread-pointer.h
+++ b/tools/testing/selftests/rseq/rseq-x86-thread-pointer.h
@@ -14,6 +14,7 @@
extern "C" {
#endif
+#ifdef __GLIBC__
#if __GNUC_PREREQ (11, 1)
static inline void *rseq_thread_pointer(void)
{
@@ -32,6 +33,19 @@ static inline void *rseq_thread_pointer(void)
return __result;
}
#endif /* !GCC 11 */
+#else
+static inline void *rseq_thread_pointer(void)
+{
+ void *__result;
+
+# ifdef __x86_64__
+ __asm__ ("mov %%fs:0, %0" : "=r" (__result));
+# else
+ __asm__ ("mov %%gs:0, %0" : "=r" (__result));
+# endif
+ return __result;
+}
+#endif /* !__GLIBC__ */
#ifdef __cplusplus
}
diff --git a/tools/testing/selftests/rseq/rseq.c b/tools/testing/selftests/rseq/rseq.c
index 663a9cef1..1a6f73c98 100644
--- a/tools/testing/selftests/rseq/rseq.c
+++ b/tools/testing/selftests/rseq/rseq.c
@@ -36,6 +36,14 @@
#include "../kselftest.h"
#include "rseq.h"
+#ifndef __NR_rseq
+#ifdef __alpha__
+#define __NR_rseq 527
+#else
+#define __NR_rseq 293
+#endif
+#endif
+
/*
* Define weak versions to play nice with binaries that are statically linked
* against a libc that doesn't support registering its own rseq.
--
2.47.3
The backtrace() function is a GNU extension available in glibc but may
not be present in non-glibc libraries. KVM selftests use backtrace() for
error reporting and debugging.
Add conditional inclusion of execinfo.h only for glibc builds and
provide a weak stub implementation of backtrace() that returns 0 (stack
trace empty) for non-glibc systems.
Signed-off-by: Aqib Faruqui <aqibaf(a)amazon.com>
---
tools/testing/selftests/kvm/lib/assert.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/assert.c b/tools/testing/selftests/kvm/lib/assert.c
index b49690658..c9778dc6c 100644
--- a/tools/testing/selftests/kvm/lib/assert.c
+++ b/tools/testing/selftests/kvm/lib/assert.c
@@ -6,11 +6,19 @@
*/
#include "test_util.h"
-#include <execinfo.h>
#include <sys/syscall.h>
+#ifdef __GLIBC__
+#include <execinfo.h> /* backtrace */
+#endif
+
#include "kselftest.h"
+int __attribute__((weak)) backtrace(void **buffer, int size)
+{
+ return 0;
+}
+
/* Dumps the current stack trace to stderr. */
static void __attribute__((noinline)) test_dump_stack(void);
static void test_dump_stack(void)
--
2.47.3
Fix kvm_is_forced_enabled() to use get_kvm_param_bool() instead of
get_kvm_param_integer() when reading the "force_emulation_prefix" kernel
module parameter.
The force_emulation_prefix parameter is a boolean that accepts Y/N
values, but the function was incorrectly trying to parse it as an
integer using strtol().
Signed-off-by: Aqib Faruqui <aqibaf(a)amazon.com>
---
tools/testing/selftests/kvm/include/x86/processor.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/include/x86/processor.h b/tools/testing/selftests/kvm/include/x86/processor.h
index 3f93d1b4f..8edf48b5a 100644
--- a/tools/testing/selftests/kvm/include/x86/processor.h
+++ b/tools/testing/selftests/kvm/include/x86/processor.h
@@ -1323,7 +1323,7 @@ static inline bool kvm_is_pmu_enabled(void)
static inline bool kvm_is_forced_emulation_enabled(void)
{
- return !!get_kvm_param_integer("force_emulation_prefix");
+ return get_kvm_param_bool("force_emulation_prefix");
}
static inline bool kvm_is_unrestricted_guest_enabled(void)
--
2.47.3
From: Rong Tao <rongtao(a)cestc.cn>
strnstr should not treat the ending '\0' of s2 as a matching character
if the parameter 'len' equal to s2 string length, for example:
1. bpf_strnstr("openat", "open", 4) = -ENOENT
2. bpf_strnstr("openat", "open", 5) = 0
This patch makes (1) return 0, indicating a successful match.
Fixes: e91370550f1f ("bpf: Add kfuncs for read-only string operations")
Signed-off-by: Rong Tao <rongtao(a)cestc.cn>
---
kernel/bpf/helpers.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 401b4932cc49..bf04881f96ec 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3672,10 +3672,18 @@ __bpf_kfunc int bpf_strnstr(const char *s1__ign, const char *s2__ign, size_t len
guard(pagefault)();
for (i = 0; i < XATTR_SIZE_MAX; i++) {
- for (j = 0; i + j < len && j < XATTR_SIZE_MAX; j++) {
+ for (j = 0; i + j <= len && j < XATTR_SIZE_MAX; j++) {
__get_kernel_nofault(&c2, s2__ign + j, char, err_out);
if (c2 == '\0')
return i;
+ /**
+ * corner case i+j==len to ensure that we matched
+ * entire s2. for example, param len=3:
+ * s1: A B C D E F -> i==1
+ * s2: B C D -> j==2
+ */
+ if (i + j == len)
+ break;
__get_kernel_nofault(&c1, s1__ign + j, char, err_out);
if (c1 == '\0')
return -ENOENT;
--
2.51.0
Commit 0d6ccfe6b319 ("selftests: drv-net: rss_ctx: check for all-zero keys")
added a skip exception if NIC has fewer than 3 queues enabled,
but it's just constructing the object, it's not actually rising
this exception.
Before:
# Exception| net.lib.py.utils.CmdExitFailure: Command failed: ethtool -X enp1s0 equal 3 hkey d1:cc:77:47:9d:ea:15:f2:b9:6c:ef:68:62:c0:45:d5:b0:99:7d:cf:29:53:40:06:3d:8e:b9:bc:d4:70:89:b8:8d:59:04:ea:a9:c2:21:b3:55:b8:ab:6b:d9:48:b4:bd:4c:ff:a5:f0:a8:c2
not ok 1 rss_ctx.test_rss_key_indir
After:
ok 1 rss_ctx.test_rss_key_indir # SKIP Device has fewer than 3 queues (or doesn't support queue stats)
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
I spotted that NIPA instances with 4 CPUs are failing this test case.
They have only 4/2=2 queues. I bumped their CPU count to 6, but test
is clearly wrong.
CC: shuah(a)kernel.org
CC: ecree.xilinx(a)gmail.com
CC: gal(a)nvidia.com
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/hw/rss_ctx.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/hw/rss_ctx.py b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
index 7bb552f8b182..9838b8457e5a 100755
--- a/tools/testing/selftests/drivers/net/hw/rss_ctx.py
+++ b/tools/testing/selftests/drivers/net/hw/rss_ctx.py
@@ -118,7 +118,7 @@ from lib.py import ethtool, ip, defer, GenerateTraffic, CmdExitFailure
qcnt = len(_get_rx_cnts(cfg))
if qcnt < 3:
- KsftSkipEx("Device has fewer than 3 queues (or doesn't support queue stats)")
+ raise KsftSkipEx("Device has fewer than 3 queues (or doesn't support queue stats)")
data = get_rss(cfg)
want_keys = ['rss-hash-key', 'rss-hash-function', 'rss-indirection-table']
--
2.51.0
Hi all,
This is a new version of Marie's patch series, with a couple of extra
fixes squashed in, notably:
- drm/xe/tests: Fix some additional gen_params signatures
https://lore.kernel.org/linux-kselftest/20250821135447.1618942-1-davidgow@g…
- kunit: Only output a test plan if we're using kunit_array_gen_params
https://lore.kernel.org/linux-kselftest/20250821135447.1618942-2-davidgow@g…
These should fix the issues found in linux-next here:
https://lore.kernel.org/linux-next/20250818120846.347d64b1@canb.auug.org.au/
These changes only affect patches 3 and 4 of the series, the others are
unchanged from v3.
Thanks, everyone, and sorry for the inconvenience!
Cheers,
-- David
---
Hello!
KUnit offers a parameterized testing framework, where tests can be
run multiple times with different inputs. However, the current
implementation uses the same `struct kunit` for each parameter run.
After each run, the test context gets cleaned up, which creates
the following limitations:
a. There is no way to store resources that are accessible across
the individual parameter runs.
b. It's not possible to pass additional context, besides the previous
parameter (and potentially anything else that is stored in the current
test context), to the parameter generator function.
c. Test users are restricted to using pre-defined static arrays
of parameter objects or generate_params() to define their
parameters. There is no flexibility to make a custom dynamic
array without using generate_params(), which can be complex if
generating the next parameter depends on more than just the single
previous parameter.
This patch series resolves these limitations by:
1. [P 1] Giving each parameterized run its own `struct kunit`. It will
remove the need to manage state, such as resetting the `test->priv`
field or the `test->status_comment` after every parameter run.
2. [P 1] Introducing parameterized test context available to all
parameter runs through the parent pointer of type `struct kunit`.
This context won't be used to execute any test logic, but will
instead be used for storing shared resources. Each parameter run
context will have a reference to that parent instance and thus,
have access to those resources.
3. [P 2] Introducing param_init() and param_exit() functions that can
initialize and exit the parameterized test context. They will run once
before and after the parameterized test. param_init() can be used to add
resources to share between parameter runs, pass parameter arrays, and
any other setup logic. While param_exit() can be used to clean up
resources that were not managed by the parameterized test, and
any other teardown logic.
4. [P 3] Passing the parameterized test context as an additional argument
to generate_params(). This provides generate_params() with more context,
making parameter generation much more flexible. The generate_params()
implementations in the KCSAN and drm/xe tests have been adapted to match
the new function pointer signature.
5. [P 4] Introducing a `params_array` field in `struct kunit`. This will
allow the parameterized test context to have direct storage of the
parameter array, enabling features like using dynamic parameter arrays
or using context beyond just the previous parameter. This will also
enable outputting the KTAP test plan for a parameterized test when the
parameter count is available.
Patches 5 and 6 add examples tests to lib/kunit/kunit-example-test.c to
showcase the new features and patch 7 updates the KUnit documentation
to reflect all the framework changes.
Thank you!
-Marie
---
Changes in v4:
Link to v3 of this patch series:
https://lore.kernel.org/linux-kselftest/20250815103604.3857930-1-marievic@g…
- Fixup the signatures of some more gen_params functions in the drm/xe
driver.
- Only print a KTAP test plan if a parameterised test is using the
built-in kunit_array_gen_params generating function, fixing the issues
with generator functions which skip array elements.
Changes in v3:
Link to v2 of this patch series:
https://lore.kernel.org/all/20250811221739.2694336-1-marievic@google.com/
- Added logic for skipping the parameter runs and updating the test statistics
when parameterized test initialization fails.
- Minor changes to the documentation.
- Commit message formatting.
Changes in v2:
Link to v1 of this patch series:
https://lore.kernel.org/all/20250729193647.3410634-1-marievic@google.com/
- Establish parameterized testing terminology:
- "parameterized test" will refer to the group of all runs of a single test
function with different parameters.
- "parameter run" will refer to the execution of the test case function with
a single parameter.
- "parameterized test context" is the `struct kunit` that holds the context
for the entire parameterized test.
- "parameter run context" is the `struct kunit` that holds the context of the
individual parameter run.
- A test is defined to be a parameterized tests if it was registered with a
generator function.
- Make comment edits to reflect the established terminology.
- Require users to manually pass kunit_array_gen_params() to
KUNIT_CASE_PARAM_WITH_INIT() as the generator function, unless they want to
provide their own generator function, if the parameter array was registered
in param_init(). This is to be consistent with the definition of a
parameterized test, i.e. generate_params() is never NULL if it's
a parameterized test.
- Change name of kunit_get_next_param_and_desc() to
kunit_array_gen_params().
- Other minor function name changes such as removing the "__" prefix in front
of internal functions.
- Change signature of get_description() in `struct params_array` to accept
the parameterized test context, as well.
- Output the KTAP test plan for a parameterized test when the parameter count
is available.
- Cover letter was made more concise.
- Edits to the example tests.
- Fix bug of parameterized test init/exit logic being done outside of the
parameterized test check.
- Fix bugs identified by the kernel test robot.
---
Marie Zhussupova (7):
kunit: Add parent kunit for parameterized test context
kunit: Introduce param_init/exit for parameterized test context
management
kunit: Pass parameterized test context to generate_params()
kunit: Enable direct registration of parameter arrays to a KUnit test
kunit: Add example parameterized test with shared resource management
using the Resource API
kunit: Add example parameterized test with direct dynamic parameter
array setup
Documentation: kunit: Document new parameterized test features
Documentation/dev-tools/kunit/usage.rst | 342 +++++++++++++++++++++++-
drivers/gpu/drm/xe/tests/xe_pci.c | 14 +-
drivers/gpu/drm/xe/tests/xe_pci_test.h | 9 +-
include/kunit/test.h | 95 ++++++-
kernel/kcsan/kcsan_test.c | 2 +-
lib/kunit/kunit-example-test.c | 217 +++++++++++++++
lib/kunit/test.c | 94 +++++--
rust/kernel/kunit.rs | 4 +
8 files changed, 740 insertions(+), 37 deletions(-)
--
2.51.0.261.g7ce5a0a67e-goog
From: Feng Yang <yangfeng(a)kylinos.cn>
The error message printed here only uses the previous err value,
which results in it being printed as 0.
When bpf_map__attach_struct_ops encounters an error,
it uses libbpf_err_ptr(err) to set errno = -err and returns NULL.
Therefore, strerror(errno) can be used to fix this issue.
Fix before:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=0
Fix after:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: Bad file descriptor
Signed-off-by: Feng Yang <yangfeng(a)kylinos.cn>
---
tools/testing/selftests/bpf/test_loader.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_loader.c b/tools/testing/selftests/bpf/test_loader.c
index f361c8aa1daf..686a7d7f87b1 100644
--- a/tools/testing/selftests/bpf/test_loader.c
+++ b/tools/testing/selftests/bpf/test_loader.c
@@ -1008,8 +1008,8 @@ void run_subtest(struct test_loader *tester,
}
link = bpf_map__attach_struct_ops(map);
if (!link) {
- PRINT_FAIL("bpf_map__attach_struct_ops failed for map %s: err=%d\n",
- bpf_map__name(map), err);
+ PRINT_FAIL("bpf_map__attach_struct_ops failed for map %s: %s\n",
+ bpf_map__name(map), strerror(errno));
goto tobj_cleanup;
}
links[links_cnt++] = link;
--
2.27.0
The file_stressor test creates directories in the root filesystem and
performs mount namespace operations that can fail on NFS root filesystems
due to network filesystem restrictions and permission limitations.
Add NFS root filesystem detection using statfs() to check for
NFS_SUPER_MAGIC and skip the test gracefully when running on NFS root,
providing a clear message about why the test was skipped.
This prevents spurious test failures in CI environments that use NFS
root while preserving the test's ability to catch SLAB_TYPESAFE_BY_RCU
related bugs on local filesystems where it can run properly.
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
tools/testing/selftests/filesystems/file_stressor.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/tools/testing/selftests/filesystems/file_stressor.c b/tools/testing/selftests/filesystems/file_stressor.c
index 01dd89f8e52f..b9dfe0b6b125 100644
--- a/tools/testing/selftests/filesystems/file_stressor.c
+++ b/tools/testing/selftests/filesystems/file_stressor.c
@@ -10,12 +10,14 @@
#include <string.h>
#include <sys/stat.h>
#include <sys/mount.h>
+#include <sys/vfs.h>
#include <unistd.h>
#include "../kselftest_harness.h"
#include <linux/types.h>
#include <linux/mount.h>
+#include <linux/magic.h>
#include <sys/syscall.h>
static inline int sys_fsopen(const char *fsname, unsigned int flags)
@@ -58,8 +60,13 @@ FIXTURE(file_stressor) {
FIXTURE_SETUP(file_stressor)
{
+ struct statfs sfs;
int fd_context;
+ /* Skip test if root filesystem is NFS */
+ if (statfs("/", &sfs) == 0 && sfs.f_type == NFS_SUPER_MAGIC)
+ SKIP(return, "Test requires local root filesystem, NFS root detected");
+
ASSERT_EQ(unshare(CLONE_NEWNS), 0);
ASSERT_EQ(mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL), 0);
ASSERT_EQ(mkdir("/slab_typesafe_by_rcu", 0755), 0);
--
2.50.1
From: Zhou Yuhang <zhouyuhang(a)kylinos.cn>
Flock fl and fl2 are not initialized after definition.
Due to struct padding, this may cause memcmp() to return
a non-zero value. The output is as follows:
# [INFO] opened fds 3 4
# [SUCCESS] set OFD read lock on first fd
# [SUCCESS] read and write locks conflicted
# [SUCCESS] F_UNLCK test returns: locked, type 0 pid -1 len 3
# [FAIL] F_UNLCK test returns: locked, type 0 pid -1 len 3
Initialize them to zero to solve this problem.
Signed-off-by: Zhou Yuhang <zhouyuhang(a)kylinos.cn>
---
tools/testing/selftests/filelock/ofdlocks.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/filelock/ofdlocks.c b/tools/testing/selftests/filelock/ofdlocks.c
index a55b79810ab2..84e25505bebb 100644
--- a/tools/testing/selftests/filelock/ofdlocks.c
+++ b/tools/testing/selftests/filelock/ofdlocks.c
@@ -36,6 +36,8 @@ int main(void)
{
int rc;
struct flock fl, fl2;
+ memset(&fl, 0, sizeof(fl));
+ memset(&fl2, 0, sizeof(fl2));
int fd = open("/tmp/aa", O_RDWR | O_CREAT | O_EXCL, 0600);
int fd2 = open("/tmp/aa", O_RDONLY);
--
2.33.0
pthread_create provided by the bionic libc uses getpid internally.
Therefore using getpid as the filter target may cause the test to fail.
This hasn't been a problem because bionic caches the pid and doesn't
call the actual syscall. However we are planning to stop the pid
caching and it will cause the test failure.
This patch changes to use getppid instead in the test.
Signed-off-by: Ryuichiro Chiba <chibar(a)google.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index fc4910d35342..5505d134d1a6 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -798,7 +798,7 @@ void *kill_thread(void *data)
bool die = (bool)data;
if (die) {
- syscall(__NR_getpid);
+ syscall(__NR_getppid);
return (void *)SIBLING_EXIT_FAILURE;
}
@@ -817,11 +817,11 @@ void kill_thread_or_group(struct __test_metadata *_metadata,
{
pthread_t thread;
void *status;
- /* Kill only when calling __NR_getpid. */
+ /* Kill only when calling __NR_getppid. */
struct sock_filter filter_thread[] = {
BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
offsetof(struct seccomp_data, nr)),
- BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getpid, 0, 1),
+ BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getppid, 0, 1),
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_KILL_THREAD),
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
};
@@ -833,7 +833,7 @@ void kill_thread_or_group(struct __test_metadata *_metadata,
struct sock_filter filter_process[] = {
BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
offsetof(struct seccomp_data, nr)),
- BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getpid, 0, 1),
+ BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getppid, 0, 1),
BPF_STMT(BPF_RET|BPF_K, kill),
BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
};
--
2.51.0.268.g9569e192d0-goog
This patchset introduces a new per-port bonding option: `ad_actor_port_prio`.
It allows users to configure the actor's port priority, which can then be used
by the bonding driver for aggregator selection based on port priority.
This provides finer control over LACP aggregator choice, especially in setups
with multiple eligible aggregators over 2 switches.
v4:
a) fix actor_port_prio minimal value (Jay Vosburgh)
b) fix ad_agg_selection_test comment order (Paolo Abeni)
c) restruct selftest, reduce duplication (Paolo Abeni)
v3:
a) add comments when init slave port_priority (Jonas Gorski)
b) rename ad_lacp_port_prio to lacp_port_prio (Jay Vosburgh)
v2:
a) set default bond option value for port priority (Nikolay Aleksandrov)
b) fix __agg_ports_priority coding style (Nikolay Aleksandrov)
c) fix shellcheck warns
Hangbin Liu (3):
bonding: add support for per-port LACP actor priority
bonding: support aggregator selection based on port priority
selftests: bonding: add test for LACP actor port priority
Documentation/networking/bonding.rst | 18 ++-
drivers/net/bonding/bond_3ad.c | 31 +++++
drivers/net/bonding/bond_netlink.c | 16 +++
drivers/net/bonding/bond_options.c | 37 ++++++
include/net/bond_3ad.h | 2 +
include/net/bond_options.h | 1 +
include/uapi/linux/if_link.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_lacp_prio.sh | 107 ++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 24 ----
tools/testing/selftests/net/lib.sh | 24 ++++
11 files changed, 238 insertions(+), 26 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_lacp_prio.sh
--
2.50.1
When using GCC on x86-64 to compile an usdt prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
array, e.g. "1@-96(%rbp,%rax,8)".
The current USDT implementation in libbpf cannot parse these two formats,
causing `bpf_program__attach_usdt()` to fail with -ENOENT
(unrecognized register).
This patch series adds support for SIB addressing mode in USDT probes.
The main changes include:
- add correct handling logic for SIB-addressed arguments in
`parse_usdt_arg`.
- add an usdt_o2 test case to cover SIB addressing mode.
Testing shows that the SIB probe correctly generates 8@(%rcx,%rax,8)
argument spec and passes all validation checks.
The modification history of this patch series:
Change since v1:
- refactor the code to make it more readable
- modify the commit message to explain why and how
Change since v2:
- fix the `scale` uninitialized error
Change since v3:
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
and pass all test cases.
Change since v4:
- split the patch into two parts, one for the fix and the other for the
test
Change since v5:
- Only enable optimization for x86 architecture to generate SIB addressing
usdt argument spec.
Change since v6:
- Add an usdt_o2 test case to cover SIB addressing mode.
- Reinstate the usdt.c test case.
Change since v7:
- Refactor modifications to __bpf_usdt_arg_spec to avoid increasing its size,
achieving better compatibility
- Fix some minor code style issues
- Refactor the usdt_o2 test case, removing semaphore and adding GCC attribute
to force -O2 optimization
Change since v8:
- Refactor the usdt_o2 test case, using assembly to force SIB addressing mode.
Change since v9:
- Only enable the usdt_o2 test case on x86_64 and i386 architectures since the
SIB addressing mode is only supported on x86_64 and i386.
Change since v10:
- Replace `__attribute__((optimize("O2")))` with `#pragma GCC optimize("O1")`
to fix the issue where the optimized compilation condition works improperly.
- Renamed test case usdt_o2 and relevant files name to usdt_o1 in that O1
level optimization is enough to generate SIB addressing usdt argument spec.
Change since v11:
- Replace `STAP_PROBE1` with `STAP_PROBE_ASM`
- Use bit fields instead of bit shifting operations
- Merge the usdt_o1 test case into the usdt test case
Change since v12:
- This patch is same with the v12 but with a new version number.
Change since v13(resolve some review comments):
- https://lore.kernel.org/bpf/CAEf4BzZWd2zUC=U6uGJFF3EMZ7zWGLweQAG3CJWTeHy-5y…
- https://lore.kernel.org/bpf/CAEf4Bzbs3hV_Q47+d93tTX13WkrpkpOb4=U04mZCjHyZg4…
Change since v14:
- fix a typo in __bpf_usdt_arg_spec
Change since v15(resolve some review comments):
- https://lore.kernel.org/bpf/CAEf4BzaxuYijEfQMDFZ+CQdjxLuDZiesUXNA-SiopS+5+V…
- https://lore.kernel.org/bpf/CAEf4BzaHi5kpuJ6OVvDU62LT5g0qHbWYMfb_FBQ3iuvvUF…
- https://lore.kernel.org/bpf/d438bf3a-a9c9-4d34-b814-63f2e9bb3a85@linux.dev/
Jiawei Zhao (2):
libbpf: fix USDT SIB argument handling causing unrecognized register
error
selftests/bpf: Enrich subtest_basic_usdt case in selftests to cover
SIB handling logic
tools/lib/bpf/usdt.bpf.h | 44 +++++++++-
tools/lib/bpf/usdt.c | 69 +++++++++++++--
tools/testing/selftests/bpf/prog_tests/usdt.c | 84 ++++++++++++++++++-
tools/testing/selftests/bpf/progs/test_usdt.c | 31 +++++++
4 files changed, 219 insertions(+), 9 deletions(-)
--
2.43.0
From: Shubham Sharma <slopixelz(a)gmail.com>
Fixed the spelling typo and checked other BPF selftests sources for similar typos.
Follow-up to patch series 990629
v2:Instead of sending multiple tiny patches for minor comment fixes, combined them into a single pass across the affected files.
Signed-off-by: Shubham Sharma <slopixelz(a)gmail.com>
---
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/bpf/bench.c | 2 +-
tools/testing/selftests/bpf/prog_tests/btf_dump.c | 2 +-
tools/testing/selftests/bpf/prog_tests/fd_array.c | 2 +-
.../testing/selftests/bpf/prog_tests/kprobe_multi_test.c | 2 +-
tools/testing/selftests/bpf/prog_tests/module_attach.c | 2 +-
tools/testing/selftests/bpf/prog_tests/reg_bounds.c | 4 ++--
.../selftests/bpf/prog_tests/stacktrace_build_id.c | 2 +-
.../selftests/bpf/prog_tests/stacktrace_build_id_nmi.c | 2 +-
tools/testing/selftests/bpf/prog_tests/stacktrace_map.c | 2 +-
.../selftests/bpf/prog_tests/stacktrace_map_raw_tp.c | 2 +-
.../selftests/bpf/prog_tests/stacktrace_map_skip.c | 2 +-
tools/testing/selftests/bpf/progs/bpf_cc_cubic.c | 2 +-
tools/testing/selftests/bpf/progs/bpf_dctcp.c | 2 +-
.../selftests/bpf/progs/freplace_connect_v4_prog.c | 2 +-
tools/testing/selftests/bpf/progs/iters_state_safety.c | 2 +-
tools/testing/selftests/bpf/progs/rbtree_search.c | 2 +-
.../testing/selftests/bpf/progs/struct_ops_kptr_return.c | 2 +-
tools/testing/selftests/bpf/progs/struct_ops_refcounted.c | 2 +-
tools/testing/selftests/bpf/progs/test_cls_redirect.c | 2 +-
.../selftests/bpf/progs/test_cls_redirect_dynptr.c | 2 +-
tools/testing/selftests/bpf/progs/uretprobe_stack.c | 4 ++--
tools/testing/selftests/bpf/progs/verifier_scalar_ids.c | 2 +-
tools/testing/selftests/bpf/progs/verifier_var_off.c | 6 +++---
tools/testing/selftests/bpf/test_sockmap.c | 2 +-
tools/testing/selftests/bpf/verifier/calls.c | 8 ++++----
tools/testing/selftests/bpf/xdping.c | 2 +-
tools/testing/selftests/bpf/xsk.h | 4 ++--
28 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 4863106034df..de0418f7a661 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -398,7 +398,7 @@ $(HOST_BPFOBJ): $(wildcard $(BPFDIR)/*.[ch] $(BPFDIR)/Makefile) \
DESTDIR=$(HOST_SCRATCH_DIR)/ prefix= all install_headers
endif
-# vmlinux.h is first dumped to a temprorary file and then compared to
+# vmlinux.h is first dumped to a temporary file and then compared to
# the previous version. This helps to avoid unnecessary re-builds of
# $(TRUNNER_BPF_OBJS)
$(INCLUDE_DIR)/vmlinux.h: $(VMLINUX_BTF) $(BPFTOOL) | $(INCLUDE_DIR)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index ddd73d06a1eb..3ecc226ea7b2 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -499,7 +499,7 @@ extern const struct bench bench_rename_rawtp;
extern const struct bench bench_rename_fentry;
extern const struct bench bench_rename_fexit;
-/* pure counting benchmarks to establish theoretical lmits */
+/* pure counting benchmarks to establish theoretical limits */
extern const struct bench bench_trig_usermode_count;
extern const struct bench bench_trig_syscall_count;
extern const struct bench bench_trig_kernel_count;
diff --git a/tools/testing/selftests/bpf/prog_tests/btf_dump.c b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
index 82903585c870..10cba526d3e6 100644
--- a/tools/testing/selftests/bpf/prog_tests/btf_dump.c
+++ b/tools/testing/selftests/bpf/prog_tests/btf_dump.c
@@ -63,7 +63,7 @@ static int test_btf_dump_case(int n, struct btf_dump_test_case *t)
/* tests with t->known_ptr_sz have no "long" or "unsigned long" type,
* so it's impossible to determine correct pointer size; but if they
- * do, it should be 8 regardless of host architecture, becaues BPF
+ * do, it should be 8 regardless of host architecture, because BPF
* target is always 64-bit
*/
if (!t->known_ptr_sz) {
diff --git a/tools/testing/selftests/bpf/prog_tests/fd_array.c b/tools/testing/selftests/bpf/prog_tests/fd_array.c
index 241b2c8c6e0f..c534b4d5f9da 100644
--- a/tools/testing/selftests/bpf/prog_tests/fd_array.c
+++ b/tools/testing/selftests/bpf/prog_tests/fd_array.c
@@ -293,7 +293,7 @@ static int get_btf_id_by_fd(int btf_fd, __u32 *id)
* 1) Create a new btf, it's referenced only by a file descriptor, so refcnt=1
* 2) Load a BPF prog with fd_array[0] = btf_fd; now btf's refcnt=2
* 3) Close the btf_fd, now refcnt=1
- * Wait and check that BTF stil exists.
+ * Wait and check that BTF still exists.
*/
static void check_fd_array_cnt__referenced_btfs(void)
{
diff --git a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
index e19ef509ebf8..f377bea0b82d 100644
--- a/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/kprobe_multi_test.c
@@ -463,7 +463,7 @@ static bool skip_entry(char *name)
return false;
}
-/* Do comparision by ignoring '.llvm.<hash>' suffixes. */
+/* Do comparison by ignoring '.llvm.<hash>' suffixes. */
static int compare_name(const char *name1, const char *name2)
{
const char *res1, *res2;
diff --git a/tools/testing/selftests/bpf/prog_tests/module_attach.c b/tools/testing/selftests/bpf/prog_tests/module_attach.c
index 6d391d95f96e..70fa7ae93173 100644
--- a/tools/testing/selftests/bpf/prog_tests/module_attach.c
+++ b/tools/testing/selftests/bpf/prog_tests/module_attach.c
@@ -90,7 +90,7 @@ void test_module_attach(void)
test_module_attach__detach(skel);
- /* attach fentry/fexit and make sure it get's module reference */
+ /* attach fentry/fexit and make sure it gets module reference */
link = bpf_program__attach(skel->progs.handle_fentry);
if (!ASSERT_OK_PTR(link, "attach_fentry"))
goto cleanup;
diff --git a/tools/testing/selftests/bpf/prog_tests/reg_bounds.c b/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
index e261b0e872db..d93a0c7b1786 100644
--- a/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
+++ b/tools/testing/selftests/bpf/prog_tests/reg_bounds.c
@@ -623,7 +623,7 @@ static void range_cond(enum num_t t, struct range x, struct range y,
*newx = range(t, x.a, x.b);
*newy = range(t, y.a + 1, y.b);
} else if (x.a == x.b && x.b == y.b) {
- /* X is a constant matching rigth side of Y */
+ /* X is a constant matching right side of Y */
*newx = range(t, x.a, x.b);
*newy = range(t, y.a, y.b - 1);
} else if (y.a == y.b && x.a == y.a) {
@@ -631,7 +631,7 @@ static void range_cond(enum num_t t, struct range x, struct range y,
*newx = range(t, x.a + 1, x.b);
*newy = range(t, y.a, y.b);
} else if (y.a == y.b && x.b == y.b) {
- /* Y is a constant matching rigth side of X */
+ /* Y is a constant matching right side of X */
*newx = range(t, x.a, x.b - 1);
*newy = range(t, y.a, y.b);
} else {
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
index b7ba5cd47d96..271b5cc9fc01 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id.c
@@ -39,7 +39,7 @@ void test_stacktrace_build_id(void)
bpf_map_update_elem(control_map_fd, &key, &val, 0);
/* for every element in stackid_hmap, we can find a corresponding one
- * in stackmap, and vise versa.
+ * in stackmap, and vice versa.
*/
err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
index 0832fd787457..b277dddd5af7 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_build_id_nmi.c
@@ -66,7 +66,7 @@ void test_stacktrace_build_id_nmi(void)
bpf_map_update_elem(control_map_fd, &key, &val, 0);
/* for every element in stackid_hmap, we can find a corresponding one
- * in stackmap, and vise versa.
+ * in stackmap, and vice versa.
*/
err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
index df59e4ae2951..84a7e405e912 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map.c
@@ -50,7 +50,7 @@ void test_stacktrace_map(void)
bpf_map_update_elem(control_map_fd, &key, &val, 0);
/* for every element in stackid_hmap, we can find a corresponding one
- * in stackmap, and vise versa.
+ * in stackmap, and vice versa.
*/
err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
index c6ef06f55cdb..e0cb4697b4b3 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_raw_tp.c
@@ -46,7 +46,7 @@ void test_stacktrace_map_raw_tp(void)
bpf_map_update_elem(control_map_fd, &key, &val, 0);
/* for every element in stackid_hmap, we can find a corresponding one
- * in stackmap, and vise versa.
+ * in stackmap, and vice versa.
*/
err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
if (CHECK(err, "compare_map_keys stackid_hmap vs. stackmap",
diff --git a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c
index 1932b1e0685c..dc2ccf6a14d1 100644
--- a/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c
+++ b/tools/testing/selftests/bpf/prog_tests/stacktrace_map_skip.c
@@ -40,7 +40,7 @@ void test_stacktrace_map_skip(void)
skel->bss->control = 1;
/* for every element in stackid_hmap, we can find a corresponding one
- * in stackmap, and vise versa.
+ * in stackmap, and vice versa.
*/
err = compare_map_keys(stackid_hmap_fd, stackmap_fd);
if (!ASSERT_OK(err, "compare_map_keys stackid_hmap vs. stackmap"))
diff --git a/tools/testing/selftests/bpf/progs/bpf_cc_cubic.c b/tools/testing/selftests/bpf/progs/bpf_cc_cubic.c
index 1654a530aa3d..4e51785e7606 100644
--- a/tools/testing/selftests/bpf/progs/bpf_cc_cubic.c
+++ b/tools/testing/selftests/bpf/progs/bpf_cc_cubic.c
@@ -101,7 +101,7 @@ static void tcp_cwnd_reduction(struct sock *sk, int newly_acked_sacked,
tp->snd_cwnd = pkts_in_flight + sndcnt;
}
-/* Decide wheather to run the increase function of congestion control. */
+/* Decide whether to run the increase function of congestion control. */
static bool tcp_may_raise_cwnd(const struct sock *sk, const int flag)
{
if (tcp_sk(sk)->reordering > TCP_REORDERING)
diff --git a/tools/testing/selftests/bpf/progs/bpf_dctcp.c b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
index 7cd73e75f52a..32c511bcd60b 100644
--- a/tools/testing/selftests/bpf/progs/bpf_dctcp.c
+++ b/tools/testing/selftests/bpf/progs/bpf_dctcp.c
@@ -1,7 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright (c) 2019 Facebook */
-/* WARNING: This implemenation is not necessarily the same
+/* WARNING: This implementation is not necessarily the same
* as the tcp_dctcp.c. The purpose is mainly for testing
* the kernel BPF logic.
*/
diff --git a/tools/testing/selftests/bpf/progs/freplace_connect_v4_prog.c b/tools/testing/selftests/bpf/progs/freplace_connect_v4_prog.c
index 544e5ac90461..d09bbd8ae8a8 100644
--- a/tools/testing/selftests/bpf/progs/freplace_connect_v4_prog.c
+++ b/tools/testing/selftests/bpf/progs/freplace_connect_v4_prog.c
@@ -12,7 +12,7 @@
SEC("freplace/connect_v4_prog")
int new_connect_v4_prog(struct bpf_sock_addr *ctx)
{
- // return value thats in invalid range
+ // return value that's in invalid range
return 255;
}
diff --git a/tools/testing/selftests/bpf/progs/iters_state_safety.c b/tools/testing/selftests/bpf/progs/iters_state_safety.c
index f41257eadbb2..b381ac0c736c 100644
--- a/tools/testing/selftests/bpf/progs/iters_state_safety.c
+++ b/tools/testing/selftests/bpf/progs/iters_state_safety.c
@@ -345,7 +345,7 @@ int __naked read_from_iter_slot_fail(void)
"r3 = 1000;"
"call %[bpf_iter_num_new];"
- /* attemp to leak bpf_iter_num state */
+ /* attempt to leak bpf_iter_num state */
"r7 = *(u64 *)(r6 + 0);"
"r8 = *(u64 *)(r6 + 8);"
diff --git a/tools/testing/selftests/bpf/progs/rbtree_search.c b/tools/testing/selftests/bpf/progs/rbtree_search.c
index 098ef970fac1..b05565d1db0d 100644
--- a/tools/testing/selftests/bpf/progs/rbtree_search.c
+++ b/tools/testing/selftests/bpf/progs/rbtree_search.c
@@ -183,7 +183,7 @@ long test_##op##_spinlock_##dolock(void *ctx) \
}
/*
- * Use a spearate MSG macro instead of passing to TEST_XXX(..., MSG)
+ * Use a separate MSG macro instead of passing to TEST_XXX(..., MSG)
* to ensure the message itself is not in the bpf prog lineinfo
* which the verifier includes in its log.
* Otherwise, the test_loader will incorrectly match the prog lineinfo
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
index 36386b3c23a1..2b98b7710816 100644
--- a/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
+++ b/tools/testing/selftests/bpf/progs/struct_ops_kptr_return.c
@@ -9,7 +9,7 @@ void bpf_task_release(struct task_struct *p) __ksym;
/* This test struct_ops BPF programs returning referenced kptr. The verifier should
* allow a referenced kptr or a NULL pointer to be returned. A referenced kptr to task
- * here is acquried automatically as the task argument is tagged with "__ref".
+ * here is acquired automatically as the task argument is tagged with "__ref".
*/
SEC("struct_ops/test_return_ref_kptr")
struct task_struct *BPF_PROG(kptr_return, int dummy,
diff --git a/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c b/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c
index 76dcb6089d7f..9c0a65466356 100644
--- a/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c
+++ b/tools/testing/selftests/bpf/progs/struct_ops_refcounted.c
@@ -9,7 +9,7 @@ __attribute__((nomerge)) extern void bpf_task_release(struct task_struct *p) __k
/* This is a test BPF program that uses struct_ops to access a referenced
* kptr argument. This is a test for the verifier to ensure that it
- * 1) recongnizes the task as a referenced object (i.e., ref_obj_id > 0), and
+ * 1) recognizes the task as a referenced object (i.e., ref_obj_id > 0), and
* 2) the same reference can be acquired from multiple paths as long as it
* has not been released.
*/
diff --git a/tools/testing/selftests/bpf/progs/test_cls_redirect.c b/tools/testing/selftests/bpf/progs/test_cls_redirect.c
index f344c6835e84..823169fb6e4c 100644
--- a/tools/testing/selftests/bpf/progs/test_cls_redirect.c
+++ b/tools/testing/selftests/bpf/progs/test_cls_redirect.c
@@ -129,7 +129,7 @@ typedef uint8_t *net_ptr __attribute__((align_value(8)));
typedef struct buf {
struct __sk_buff *skb;
net_ptr head;
- /* NB: tail musn't have alignment other than 1, otherwise
+ /* NB: tail mustn't have alignment other than 1, otherwise
* LLVM will go and eliminate code, e.g. when checking packet lengths.
*/
uint8_t *const tail;
diff --git a/tools/testing/selftests/bpf/progs/test_cls_redirect_dynptr.c b/tools/testing/selftests/bpf/progs/test_cls_redirect_dynptr.c
index d0f7670351e5..dfd4a2710391 100644
--- a/tools/testing/selftests/bpf/progs/test_cls_redirect_dynptr.c
+++ b/tools/testing/selftests/bpf/progs/test_cls_redirect_dynptr.c
@@ -494,7 +494,7 @@ static ret_t get_next_hop(struct bpf_dynptr *dynptr, __u64 *offset, encap_header
*offset += sizeof(*next_hop);
- /* Skip the remainig next hops (may be zero). */
+ /* Skip the remaining next hops (may be zero). */
return skip_next_hops(offset, encap->unigue.hop_count - encap->unigue.next_hop - 1);
}
diff --git a/tools/testing/selftests/bpf/progs/uretprobe_stack.c b/tools/testing/selftests/bpf/progs/uretprobe_stack.c
index 9fdcf396b8f4..a2951e2f1711 100644
--- a/tools/testing/selftests/bpf/progs/uretprobe_stack.c
+++ b/tools/testing/selftests/bpf/progs/uretprobe_stack.c
@@ -26,8 +26,8 @@ int usdt_len;
SEC("uprobe//proc/self/exe:target_1")
int BPF_UPROBE(uprobe_1)
{
- /* target_1 is recursive wit depth of 2, so we capture two separate
- * stack traces, depending on which occurence it is
+ /* target_1 is recursive with depth of 2, so we capture two separate
+ * stack traces, depending on which occurrence it is
*/
static bool recur = false;
diff --git a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
index 7c5e5e6d10eb..dba3ca728f6e 100644
--- a/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
+++ b/tools/testing/selftests/bpf/progs/verifier_scalar_ids.c
@@ -349,7 +349,7 @@ __naked void precision_two_ids(void)
SEC("socket")
__success __log_level(2)
__flag(BPF_F_TEST_STATE_FREQ)
-/* check thar r0 and r6 have different IDs after 'if',
+/* check that r0 and r6 have different IDs after 'if',
* collect_linked_regs() can't tie more than 6 registers for a single insn.
*/
__msg("8: (25) if r0 > 0x7 goto pc+0 ; R0=scalar(id=1")
diff --git a/tools/testing/selftests/bpf/progs/verifier_var_off.c b/tools/testing/selftests/bpf/progs/verifier_var_off.c
index 1d36d01b746e..f345466bca68 100644
--- a/tools/testing/selftests/bpf/progs/verifier_var_off.c
+++ b/tools/testing/selftests/bpf/progs/verifier_var_off.c
@@ -114,8 +114,8 @@ __naked void stack_write_priv_vs_unpriv(void)
}
/* Similar to the previous test, but this time also perform a read from the
- * address written to with a variable offset. The read is allowed, showing that,
- * after a variable-offset write, a priviledged program can read the slots that
+ * address written to with a variable offet. The read is allowed, showing that,
+ * after a variable-offset write, a privileged program can read the slots that
* were in the range of that write (even if the verifier doesn't actually know if
* the slot being read was really written to or not.
*
@@ -157,7 +157,7 @@ __naked void stack_write_followed_by_read(void)
SEC("socket")
__description("variable-offset stack write clobbers spilled regs")
__failure
-/* In the priviledged case, dereferencing a spilled-and-then-filled
+/* In the privileged case, dereferencing a spilled-and-then-filled
* register is rejected because the previous variable offset stack
* write might have overwritten the spilled pointer (i.e. we lose track
* of the spilled register when we analyze the write).
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index fd2da2234cc9..76568db7a664 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -1372,7 +1372,7 @@ static int run_options(struct sockmap_options *options, int cg_fd, int test)
} else
fprintf(stderr, "unknown test\n");
out:
- /* Detatch and zero all the maps */
+ /* Detach and zero all the maps */
bpf_prog_detach2(bpf_program__fd(progs[3]), cg_fd, BPF_CGROUP_SOCK_OPS);
for (i = 0; i < ARRAY_SIZE(links); i++) {
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index f3492efc8834..c8d640802cce 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -1375,7 +1375,7 @@
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
/* write into map value */
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
- /* fetch secound map_value_ptr from the stack */
+ /* fetch second map_value_ptr from the stack */
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -16),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
/* write into map value */
@@ -1439,7 +1439,7 @@
/* second time with fp-16 */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 1, 2),
- /* fetch secound map_value_ptr from the stack */
+ /* fetch second map_value_ptr from the stack */
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, 0),
/* write into map value */
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
@@ -1493,7 +1493,7 @@
/* second time with fp-16 */
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4),
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2),
- /* fetch secound map_value_ptr from the stack */
+ /* fetch second map_value_ptr from the stack */
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_7, 0),
/* write into map value */
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 0),
@@ -2380,7 +2380,7 @@
*/
BPF_JMP_REG(BPF_JGT, BPF_REG_6, BPF_REG_7, 1),
BPF_MOV64_REG(BPF_REG_9, BPF_REG_8),
- /* r9 = *r9 ; verifier get's to this point via two paths:
+ /* r9 = *r9 ; verifier gets to this point via two paths:
* ; (I) one including r9 = r8, verified first;
* ; (II) one excluding r9 = r8, verified next.
* ; After load of *r9 to r9 the frame[0].fp[-24].id == r9.id.
diff --git a/tools/testing/selftests/bpf/xdping.c b/tools/testing/selftests/bpf/xdping.c
index 1503a1d2faa0..9ed8c796645d 100644
--- a/tools/testing/selftests/bpf/xdping.c
+++ b/tools/testing/selftests/bpf/xdping.c
@@ -155,7 +155,7 @@ int main(int argc, char **argv)
}
if (!server) {
- /* Only supports IPv4; see hints initiailization above. */
+ /* Only supports IPv4; see hints initialization above. */
if (getaddrinfo(argv[optind], NULL, &hints, &a) || !a) {
fprintf(stderr, "Could not resolve %s\n", argv[optind]);
return 1;
diff --git a/tools/testing/selftests/bpf/xsk.h b/tools/testing/selftests/bpf/xsk.h
index 93c2cc413cfc..48729da142c2 100644
--- a/tools/testing/selftests/bpf/xsk.h
+++ b/tools/testing/selftests/bpf/xsk.h
@@ -93,8 +93,8 @@ static inline __u32 xsk_prod_nb_free(struct xsk_ring_prod *r, __u32 nb)
/* Refresh the local tail pointer.
* cached_cons is r->size bigger than the real consumer pointer so
* that this addition can be avoided in the more frequently
- * executed code that computs free_entries in the beginning of
- * this function. Without this optimization it whould have been
+ * executed code that computes free_entries in the beginning of
+ * this function. Without this optimization it would have been
* free_entries = r->cached_prod - r->cached_cons + r->size.
*/
r->cached_cons = __atomic_load_n(r->consumer, __ATOMIC_ACQUIRE);
--
2.48.1
This series introduces VFIO selftests, located in
tools/testing/selftests/vfio/.
VFIO selftests aim to enable kernel developers to write and run tests
that take the form of userspace programs that interact with VFIO and
IOMMUFD uAPIs. VFIO selftests can be used to write functional tests for
new features, regression tests for bugs, and performance tests for
optimizations.
These tests are designed to interact with real PCI devices, i.e. they do
not rely on mocking out or faking any behavior in the kernel. This
allows the tests to exercise not only VFIO but also IOMMUFD, the IOMMU
driver, interrupt remapping, IRQ handling, etc.
For more background on the motivation and design of this series, please
see the RFC:
https://lore.kernel.org/kvm/20250523233018.1702151-1-dmatlack@google.com/
This series can also be found on GitHub:
https://github.com/dmatlack/linux/tree/vfio/selftests/v2
Changelog
-----------------------------------------------------------------------
v1: https://lore.kernel.org/kvm/20250620232031.2705638-1-dmatlack@google.com/
- Collect various Acks
- Switch myself from Reviewer to Maintainer of VFIO selftests
- Re-order the new MAINTAINERS entry to be alphabetical
- Drop the KVM selftests patches from the series
- Reorder the tools header commits to be closer to the commits that
use them (Vinicius)
- Use host virtual addresses instead of magic numbers for IOVAs in
vfio_pci_driver_test and vfio_dma_mapping_test
RFC: https://lore.kernel.org/kvm/20250523233018.1702151-1-dmatlack@google.com/
- Add symlink to linux/pci_ids.h instead of copying (Jason)
- Add symlinks to drivers/dma/*/*.h instead of copying (Jason)
- Automatically replicate vfio_dma_mapping_test across backing
sources using fixture variants (Jason)
- Automatically replicate vfio_dma_mapping_test and
vfio_pci_driver_test across all iommu_modes using fixture
variants (Jason)
- Invert access() check in vfio_dma_mapping_test (me)
- Use driver_override instead of add/remove_id (Alex)
- Allow tests to get BDF from env var (Alex)
- Use KSFT_FAIL instead of 1 to exit with failure (Alex)
- Unconditionally create $(LIBVFIO_O_DIRS) to avoid target
conflict with ../cgroup/lib/libcgroup.mk when building
KVM selftests (me)
- Allow VFIO selftests to run automatically by switching from
TEST_GEN_PROGS_EXTENDED to TEST_GEN_PROGS. Automatically run
selftests will use $VFIO_SELFTESTS_BDF environment variable
to know which device to use (Alex)
- Replace hardcoded SZ_4K with getpagesize() in vfio_dma_mapping_test
to support platforms with other page sizes (me)
- Make all global variables static where possible (me)
- Pass argc and argv to test_harness_main() so that users can
pass flags to the kselftest harness (me)
Instructions
-----------------------------------------------------------------------
Running VFIO selftests requires at a PCI device bound to vfio-pci for
the tests to use. The address of this device is passed to the test as
a segment:bus:device.function string, which must match the path to
the device in /sys/bus/pci/devices/ (e.g. 0000:00:04.0).
Once you have chosen a device, there is a helper script provided to
unbind the device from its current driver, bind it to vfio-pci, export
the environment variable $VFIO_SELFTESTS_BDF, and launch a shell:
$ tools/testing/selftests/vfio/run.sh -d 0000:00:04.0 -s
The -d option tells the script which device to use and the -s option
tells the script to launch a shell.
Additionally, the VFIO selftest vfio_dma_mapping_test has test cases
that rely on HugeTLB pages being available, otherwise they are skipped.
To enable those tests make sure at least 1 2MB and 1 1GB HugeTLB pages
are available.
$ echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ echo 1 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
To run all VFIO selftests using make:
$ make -C tools/testing/selftests/vfio run_tests
To run individual tests:
$ tools/testing/selftests/vfio/vfio_dma_mapping_test
$ tools/testing/selftests/vfio/vfio_dma_mapping_test -v iommufd_anonymous_hugetlb_2mb
$ tools/testing/selftests/vfio/vfio_dma_mapping_test -r vfio_dma_mapping_test.iommufd_anonymous_hugetlb_2mb.dma_map_unmap
The environment variable $VFIO_SELFTESTS_BDF can be overridden for a
specific test by passing in the BDF on the command line as the last
positional argument.
$ tools/testing/selftests/vfio/vfio_dma_mapping_test 0000:00:04.0
$ tools/testing/selftests/vfio/vfio_dma_mapping_test -v iommufd_anonymous_hugetlb_2mb 0000:00:04.0
$ tools/testing/selftests/vfio/vfio_dma_mapping_test -r vfio_dma_mapping_test.iommufd_anonymous_hugetlb_2mb.dma_map_unmap 0000:00:04.0
When you are done, free the HugeTLB pages and exit the shell started by
run.sh. Exiting the shell will cause the device to be unbound from
vfio-pci and bound back to its original driver.
$ echo 0 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
$ echo 0 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
$ exit
It's also possible to use run.sh to run just a single test hermetically,
rather than dropping into a shell:
$ tools/testing/selftests/vfio/run.sh -d 0000:00:04.0 -- tools/testing/selftests/vfio/vfio_dma_mapping_test -v iommufd_anonymous
Tests
-----------------------------------------------------------------------
There are 4 tests in this series, mostly to demonstrate as a
proof-of-concept:
- tools/testing/selftests/vfio/vfio_pci_device_test.c
- tools/testing/selftests/vfio/vfio_pci_driver_test.c
- tools/testing/selftests/vfio/vfio_iommufd_setup_test.c
- tools/testing/selftests/vfio/vfio_dma_mapping_test.c
Future Areas of Development
-----------------------------------------------------------------------
Library:
- Driver support for devices that can be used on AMD, ARM, and other
platforms (e.g. mlx5).
- Driver support for a device available in QEMU VMs (e.g.
pcie-ats-testdev [1])
- Support for tests that use multiple devices.
- Support for IOMMU groups with multiple devices.
- Support for multiple devices sharing the same container/iommufd.
- Sharing TEST_ASSERT() macros and other common code between KVM
and VFIO selftests.
Tests:
- DMA mapping performance tests for BARs/HugeTLB/etc.
- Porting tests from
https://github.com/awilliam/tests/commits/for-clg/ to selftests.
- Live Update selftests.
- Resend Sean's KVM selftest for posted interrupts using the VFIO
selftests library [2][3]
Cc: Alex Williamson <alex.williamson(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Kevin Tian <kevin.tian(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Sean Christopherson <seanjc(a)google.com>
Cc: Vipin Sharma <vipinsh(a)google.com>
Cc: Josh Hilke <jrhilke(a)google.com>
Cc: Aaron Lewis <aaronlewis(a)google.com>
Cc: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Cc: Saeed Mahameed <saeedm(a)nvidia.com>
Cc: Adithya Jayachandran <ajayachandra(a)nvidia.com>
Cc: Joel Granados <joel.granados(a)kernel.org>
[1] https://github.com/Joelgranados/qemu/blob/pcie-testdev/hw/misc/pcie-ats-tes…
[2] https://lore.kernel.org/kvm/20250404193923.1413163-68-seanjc@google.com/
[3] https://lore.kernel.org/kvm/20250620232031.2705638-32-dmatlack@google.com/
David Matlack (25):
selftests: Create tools/testing/selftests/vfio
vfio: selftests: Add a helper library for VFIO selftests
vfio: selftests: Introduce vfio_pci_device_test
vfio: selftests: Keep track of DMA regions mapped into the device
vfio: selftests: Enable asserting MSI eventfds not firing
vfio: selftests: Add a helper for matching vendor+device IDs
vfio: selftests: Add driver framework
vfio: sefltests: Add vfio_pci_driver_test
tools headers: Add stub definition for __iomem
tools headers: Import asm-generic MMIO helpers
tools headers: Import x86 MMIO helper overrides
tools headers: Add symlink to linux/pci_ids.h
dmaengine: ioat: Move system_has_dca_enabled() to dma.h
vfio: selftests: Add driver for Intel CBDMA
tools headers: Import iosubmit_cmds512()
dmaengine: idxd: Allow registers.h to be included from tools/
vfio: selftests: Add driver for Intel DSA
vfio: selftests: Move helper to get cdev path to libvfio
vfio: selftests: Encapsulate IOMMU mode
vfio: selftests: Replicate tests across all iommu_modes
vfio: selftests: Add vfio_type1v2_mode
vfio: selftests: Add iommufd_compat_type1{,v2} modes
vfio: selftests: Add iommufd mode
vfio: selftests: Make iommufd the default iommu_mode
vfio: selftests: Add a script to help with running VFIO selftests
Josh Hilke (5):
vfio: selftests: Test basic VFIO and IOMMUFD integration
vfio: selftests: Move vfio dma mapping test to their own file
vfio: selftests: Add test to reset vfio device.
vfio: selftests: Add DMA mapping tests for 2M and 1G HugeTLB
vfio: selftests: Validate 2M/1G HugeTLB are mapped as 2M/1G in IOMMU
MAINTAINERS | 7 +
drivers/dma/idxd/registers.h | 4 +
drivers/dma/ioat/dma.h | 2 +
drivers/dma/ioat/hw.h | 3 -
tools/arch/x86/include/asm/io.h | 101 +++
tools/arch/x86/include/asm/special_insns.h | 27 +
tools/include/asm-generic/io.h | 482 ++++++++++++++
tools/include/asm/io.h | 11 +
tools/include/linux/compiler.h | 4 +
tools/include/linux/io.h | 4 +-
tools/include/linux/pci_ids.h | 1 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/vfio/.gitignore | 7 +
tools/testing/selftests/vfio/Makefile | 21 +
.../selftests/vfio/lib/drivers/dsa/dsa.c | 416 ++++++++++++
.../vfio/lib/drivers/dsa/registers.h | 1 +
.../selftests/vfio/lib/drivers/ioat/hw.h | 1 +
.../selftests/vfio/lib/drivers/ioat/ioat.c | 235 +++++++
.../vfio/lib/drivers/ioat/registers.h | 1 +
.../selftests/vfio/lib/include/vfio_util.h | 295 +++++++++
tools/testing/selftests/vfio/lib/libvfio.mk | 24 +
.../selftests/vfio/lib/vfio_pci_device.c | 594 ++++++++++++++++++
.../selftests/vfio/lib/vfio_pci_driver.c | 126 ++++
tools/testing/selftests/vfio/run.sh | 109 ++++
.../selftests/vfio/vfio_dma_mapping_test.c | 199 ++++++
.../selftests/vfio/vfio_iommufd_setup_test.c | 127 ++++
.../selftests/vfio/vfio_pci_device_test.c | 176 ++++++
.../selftests/vfio/vfio_pci_driver_test.c | 244 +++++++
28 files changed, 3219 insertions(+), 4 deletions(-)
create mode 100644 tools/arch/x86/include/asm/io.h
create mode 100644 tools/arch/x86/include/asm/special_insns.h
create mode 100644 tools/include/asm-generic/io.h
create mode 100644 tools/include/asm/io.h
create mode 120000 tools/include/linux/pci_ids.h
create mode 100644 tools/testing/selftests/vfio/.gitignore
create mode 100644 tools/testing/selftests/vfio/Makefile
create mode 100644 tools/testing/selftests/vfio/lib/drivers/dsa/dsa.c
create mode 120000 tools/testing/selftests/vfio/lib/drivers/dsa/registers.h
create mode 120000 tools/testing/selftests/vfio/lib/drivers/ioat/hw.h
create mode 100644 tools/testing/selftests/vfio/lib/drivers/ioat/ioat.c
create mode 120000 tools/testing/selftests/vfio/lib/drivers/ioat/registers.h
create mode 100644 tools/testing/selftests/vfio/lib/include/vfio_util.h
create mode 100644 tools/testing/selftests/vfio/lib/libvfio.mk
create mode 100644 tools/testing/selftests/vfio/lib/vfio_pci_device.c
create mode 100644 tools/testing/selftests/vfio/lib/vfio_pci_driver.c
create mode 100755 tools/testing/selftests/vfio/run.sh
create mode 100644 tools/testing/selftests/vfio/vfio_dma_mapping_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_iommufd_setup_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_pci_device_test.c
create mode 100644 tools/testing/selftests/vfio/vfio_pci_driver_test.c
base-commit: c17b750b3ad9f45f2b6f7e6f7f4679844244f0b9
--
2.51.0.rc2.233.g662b1ed5c5-goog