Until CONFIG_DMABUF_SYSFS_STATS was added [1] it was only possible to
perform per-buffer accounting with debugfs which is not suitable for
production environments. Eventually we discovered the overhead with
per-buffer sysfs file creation/removal was significantly impacting
allocation and free times, and exacerbated kernfs lock contention. [2]
dma_buf_stats_setup() is responsible for 39% of single-page buffer
creation duration, or 74% of single-page dma_buf_export() duration when
stressing dmabuf allocations and frees.
I prototyped a change from per-buffer to per-exporter statistics with a
RCU protected list of exporter allocations that accommodates most (but
not all) of our use-cases and avoids almost all of the sysfs overhead.
While that adds less overhead than per-buffer sysfs, and less even than
the maintenance of the dmabuf debugfs_list, it's still *additional*
overhead on top of the debugfs_list and doesn't give us per-buffer info.
This series uses the existing dmabuf debugfs_list to implement a BPF
dmabuf iterator, which adds no overhead to buffer allocation/free and
provides per-buffer info. The list has been moved outside of
CONFIG_DEBUG_FS scope so that it is always populated. The BPF program
loaded by userspace that extracts per-buffer information gets to define
its own interface which avoids the lack of ABI stability with debugfs.
This will allow us to replace our use of CONFIG_DMABUF_SYSFS_STATS, and
the plan is to remove it from the kernel after the next longterm stable
release.
[1] https://lore.kernel.org/linux-media/20201210044400.1080308-1-hridya@google.…
[2] https://lore.kernel.org/all/20220516171315.2400578-1-tjmercier@google.com
v1: https://lore.kernel.org/all/20250414225227.3642618-1-tjmercier@google.com
v1 -> v2:
Make the DMA buffer list independent of CONFIG_DEBUG_FS per Christian König
Add CONFIG_DMA_SHARED_BUFFER check to kernel/bpf/Makefile per kernel test robot
Use BTF_ID_LIST_SINGLE instead of BTF_ID_LIST_GLOBAL_SINGLE per Song Liu
Fixup comment style, mixing code/declarations, and use ASSERT_OK_FD in selftest per Song Liu
Add BPF_ITER_RESCHED feature to bpf_dmabuf_reg_info per Alexei Starovoitov
Add open-coded iterator and selftest per Alexei Starovoitov
Add a second test buffer from the system dmabuf heap to selftests
Use the BPF program we'll use in production for selftest per Alexei Starovoitov
https://r.android.com/c/platform/system/bpfprogs/+/3616123/2/dmabufIter.chttps://r.android.com/c/platform/system/memory/libmeminfo/+/3614259/1/libdm…
v2: https://lore.kernel.org/all/20250504224149.1033867-1-tjmercier@google.com
v2 -> v3:
Rebase onto bpf-next/master
Move get_next_dmabuf() into drivers/dma-buf/dma-buf.c, along with the
new get_first_dmabuf(). This avoids having to expose the dmabuf list
and mutex to the rest of the kernel, and keeps the dmabuf mutex
operations near each other in the same file. (Christian König)
Add Christian's RB to dma-buf: Rename debugfs symbols
Drop RFC: dma-buf: Remove DMA-BUF statistics
v3: https://lore.kernel.org/all/20250507001036.2278781-1-tjmercier@google.com
v3 -> v4:
Fix selftest BPF program comment style (not kdoc) per Alexei Starovoitov
Fix dma-buf.c kdoc comment style per Alexei Starovoitov
Rename get_first_dmabuf / get_next_dmabuf to dma_buf_iter_begin /
dma_buf_iter_next per Christian König
Add Christian's RB to bpf: Add dmabuf iterator
T.J. Mercier (5):
dma-buf: Rename debugfs symbols
bpf: Add dmabuf iterator
bpf: Add open coded dmabuf iterator
selftests/bpf: Add test for dmabuf_iter
selftests/bpf: Add test for open coded dmabuf_iter
drivers/dma-buf/dma-buf.c | 98 +++++--
include/linux/dma-buf.h | 4 +-
kernel/bpf/Makefile | 3 +
kernel/bpf/dmabuf_iter.c | 149 ++++++++++
kernel/bpf/helpers.c | 5 +
.../testing/selftests/bpf/bpf_experimental.h | 5 +
tools/testing/selftests/bpf/config | 3 +
.../selftests/bpf/prog_tests/dmabuf_iter.c | 258 ++++++++++++++++++
.../testing/selftests/bpf/progs/dmabuf_iter.c | 91 ++++++
9 files changed, 594 insertions(+), 22 deletions(-)
create mode 100644 kernel/bpf/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/dmabuf_iter.c
create mode 100644 tools/testing/selftests/bpf/progs/dmabuf_iter.c
base-commit: 43745d11bfd9683abdf08ad7a5cc403d6a9ffd15
--
2.49.0.1015.ga840276032-goog
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find DUALPI2 iproute2 patch v8.
v8 (09-May-25)
- Update pkt_sched.h with the one in nex-next
- Fix one typo of comment in pkt_sched.h (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update manual content in man/man8/tc-dualpi2.8 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update tc/q_dualpi2.c to fix missing blank lines and add missing case (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
v7 (05-May-25)
- Align pkt_sched.h with the v14 version of net-next due to spec modificiaotn in tc.yaml
- Reorganize dualpi2_print_opt() to match the order in tc.yaml
- Remove credit-queue in PRINT_JSON
v6 (26-Apr-25)
- Update JSON file output due to spec modifiocation in tc.yaml of net-next
v5 (25-Mar-25)
- Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>)
- Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>)
v4 (16-Mar-25)
- Add min_qlen_step to dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step amrking.
v3 (21-Feb-25)
- Add memlimit to dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update manual to align latest implementation and clarify the queue naming and default unit
- Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2
v2 (23-Oct-24)
- Rename get_float in dualpi2 to get_float_min_max in utils.c
- Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>)
For more details of DualPI2, plesae refer IETF RFC9332
(https://datatracker.ietf.org/doc/html/rfc9332).
Best Regards,
Chia-Yu
Chia-Yu Chang (1):
tc: add dualpi2 scheduler module
bash-completion/tc | 11 +-
include/uapi/linux/pkt_sched.h | 68 +++++
include/utils.h | 2 +
ip/iplink_can.c | 14 -
lib/utils.c | 30 ++
man/man8/tc-dualpi2.8 | 249 +++++++++++++++
tc/Makefile | 1 +
tc/q_dualpi2.c | 534 +++++++++++++++++++++++++++++++++
8 files changed, 894 insertions(+), 15 deletions(-)
create mode 100644 man/man8/tc-dualpi2.8
create mode 100644 tc/q_dualpi2.c
--
2.34.1
There are KUnit tests for the bitfield packing library but these depend
on CONFIG_PACKING which is not enabled by anything in either the
existing KUnit all_tests.config or the base UML config it runs on as
standard. Enable that in all_tests.config to improve coverage.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/kunit/configs/all_tests.config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/kunit/configs/all_tests.config b/tools/testing/kunit/configs/all_tests.config
index cdd9782f9646..b0223b7aebde 100644
--- a/tools/testing/kunit/configs/all_tests.config
+++ b/tools/testing/kunit/configs/all_tests.config
@@ -51,3 +51,5 @@ CONFIG_SOUND=y
CONFIG_SND=y
CONFIG_SND_SOC=y
CONFIG_SND_SOC_TOPOLOGY_BUILD=y
+
+CONFIG_PACKING=y
---
base-commit: 8ffd015db85fea3e15a77027fda6c02ced4d2444
change-id: 20250422-kunit-enable-missing-tests-9189ee930cae
Best regards,
--
Mark Brown <broonie(a)kernel.org>
TRACE_syscall.ptrace.negative_ENOSYS and TRACE_syscall.seccomp.negative_ENOSYS
on arm32 are being reported as failures instead of skipping.
The teardown_trace_fixture function sets the test to KSFT_FAIL in case of a
non 0 return value from the tracer process.
Due to _metadata now being shared between the forked processes the tracer is
returning the KSFT_SKIP value set by the tracee which is non 0.
Remove the setting of the _metadata.exit_code in teardown_trace_fixture.
Fixes: 24cf65a62266 ("selftests/harness: Share _metadata between forked processes")
Signed-off-by: Terry Tritton <terry.tritton(a)linaro.org>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index b2f76a52215a..c43a6f8f8cd5 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -1629,14 +1629,8 @@ void teardown_trace_fixture(struct __test_metadata *_metadata,
{
if (tracer) {
int status;
- /*
- * Extract the exit code from the other process and
- * adopt it for ourselves in case its asserts failed.
- */
ASSERT_EQ(0, kill(tracer, SIGUSR1));
ASSERT_EQ(tracer, waitpid(tracer, &status, 0));
- if (WEXITSTATUS(status))
- _metadata->exit_code = KSFT_FAIL;
}
}
--
2.39.5
From: Ashish Kalra <ashish.kalra(a)amd.com>
During platform init, SNP initialization may fail for several reasons,
such as firmware command failures and incompatible versions. However,
the KVM capability may continue to advertise support for it.
The platform may have SNP enabled but if SNP_INIT fails then SNP is
not supported by KVM.
During KVM module initialization query the SNP platform status to obtain
the SNP initialization state and use it as an additional condition to
determine support for SEV-SNP.
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Co-developed-by: Pratik R. Sampat <prsampat(a)amd.com>
Signed-off-by: Pratik R. Sampat <prsampat(a)amd.com>
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
---
arch/x86/kvm/svm/sev.c | 43 +++++++++++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index ada53f04158c..a6abdb26f877 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2934,6 +2934,32 @@ void __init sev_set_cpu_caps(void)
}
}
+static bool sev_is_snp_initialized(void)
+{
+ struct sev_user_data_snp_status *status;
+ struct sev_data_snp_addr buf;
+ bool initialized = false;
+ void *data;
+ int error;
+
+ data = snp_alloc_firmware_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!data)
+ return initialized;
+
+ buf.address = __psp_pa(data);
+ if (sev_do_cmd(SEV_CMD_SNP_PLATFORM_STATUS, &buf, &error))
+ goto out;
+
+ status = (struct sev_user_data_snp_status *)data;
+ if (status->state)
+ initialized = true;
+
+out:
+ snp_free_firmware_page(data);
+
+ return initialized;
+}
+
void __init sev_hardware_setup(void)
{
unsigned int eax, ebx, ecx, edx, sev_asid_count, sev_es_asid_count;
@@ -3038,6 +3064,14 @@ void __init sev_hardware_setup(void)
sev_snp_supported = sev_snp_enabled && cc_platform_has(CC_ATTR_HOST_SEV_SNP);
out:
+ if (sev_enabled) {
+ init_args.probe = true;
+ if (sev_platform_init(&init_args))
+ sev_supported = sev_es_supported = sev_snp_supported = false;
+ else
+ sev_snp_supported &= sev_is_snp_initialized();
+ }
+
if (boot_cpu_has(X86_FEATURE_SEV))
pr_info("SEV %s (ASIDs %u - %u)\n",
sev_supported ? min_sev_asid <= max_sev_asid ? "enabled" :
@@ -3064,15 +3098,6 @@ void __init sev_hardware_setup(void)
sev_supported_vmsa_features = 0;
if (sev_es_debug_swap_enabled)
sev_supported_vmsa_features |= SVM_SEV_FEAT_DEBUG_SWAP;
-
- if (!sev_enabled)
- return;
-
- /*
- * Do both SNP and SEV initialization at KVM module load.
- */
- init_args.probe = true;
- sev_platform_init(&init_args);
}
void sev_hardware_unsetup(void)
--
2.34.1
Poll program is a helper to ftracetest, thus make it a
generic file and remove it from being run as a test.
Currently when executing tests using
$ make run_tests
CC poll
TAP version 13
1..2
# timeout set to 0
# selftests: ftrace: poll
# Error: Polling file is not specified
not ok 1 selftests: ftrace: poll # exit=255
Fix this by using TEST_GEN_FILES to build the 'poll' binary as a helper
rather than as a test.
Fixes: 80c3e28528ff ("selftests/tracing: Add hist poll() support test")
Signed-off-by: Ayush Jain <Ayush.jain3(a)amd.com>
---
tools/testing/selftests/ftrace/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index 49d96bb16355..7c12263f8260 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -6,6 +6,6 @@ TEST_PROGS := ftracetest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
-TEST_GEN_PROGS = poll
+TEST_GEN_FILES := poll
include ../lib.mk
--
2.34.1
The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (a VM mostly)
in a way of HW acceleration. This extended the HWPT-based design for more
advanced virtualization feature.
A vQUEUE introduced by this series as a part of the vIOMMU infrastructure
represents a HW accelerated queue/buffer for VM to use exclusively, e.g.
- NVIDIA's Virtual Command Queue
- AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
each of which is an IOMMU HW feature to directly access the virtual queue
in the guest address space, to avoid VM Exits to improve the performance.
As an initial use case, it adds support for guest-owned HW virtual queues
that VMM can allocate per request from a guest OS writing the VM register.
Introduce IOMMUFD_OBJ_VQUEUE and its allocator IOMMUFD_CMD_VQUEUE_ALLOC,
allowing VMM to forward the IOMMU-specific queue info, such as queue base
address, size, and etc.
Meanwhile, a guest-owned virtual queue needs the kernel (a virtual queue
driver) to control the queue by reading/writing its consumer and producer
indexes, which means the virtual queue HW allows the guest kernel to get
a direct R/W access to those registers. Introduce an mmap infrastructure
to the iommufd core so as to support pass through a piece of MMIO region
from the host physical address space to the guest physical address space.
The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during
the IOMMUFD_CMD_VQUEUE_ALLOC and returned to the user space as an output
driver-data carried via the IOMMUFD_CMD_VQUEUE_ALLOC. So, this requires a
driver-specific user data support in the vIOMMU allocation flow.
As a real-world use case, this series implements a vQUEUE support to the
tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it
is also the Tegra CMDQV series Part-2 (user-space support), reworked from
Previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to
the standard SMMUv3 operating in the nested translation mode trapping CMDQ
for TLBI and ATC_INV commands, this gives a huge performance improvement:
70% to 90% reductions of invalidation time were measured by various DMA
unmap tests running in a guest OS.
// Unmap latencies from "dma_map_benchmark -g @granule -t @threads",
// by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq"
@granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0
4KB 1 35.7 us 5.3 us
16KB 1 41.8 us 6.8 us
64KB 1 68.9 us 9.9 us
128KB 1 109.0 us 12.6 us
256KB 1 187.1 us 18.0 us
4KB 2 96.9 us 6.8 us
16KB 2 97.8 us 7.5 us
64KB 2 151.5 us 10.7 us
128KB 2 257.8 us 12.7 us
256KB 2 443.0 us 17.9 us
This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_vqueue-v3
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vqueue-v3
Changelog
v3
* Add Reviewed-by from Baolu, Pranjal, and Alok
* Revise kdocs, uAPI docs, and commit logs
* Rename "vCMDQ" back to "vQUEUE" for AMD cases
* [tegra] Add tegra241_vcmdq_hw_flush_timeout()
* [tegra] Rename vsmmu_alloc to alloc_vintf_user
* [tegra] Use writel for SID replacement registers
* [tegra] Move mmap removal call to vsmmu_destroy op
* [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user()
* [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED()
* [iommufd] Add an object-type "owner" to immap structure
* [iommufd] Drop the ictx input in the new for-driver APIs
* [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle
* [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers
* [iommufd] Rename iommufd_ctx_alloc/free_mmap to
_iommufd_alloc/destroy_mmap
v2
https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/
* Add Reviewed-by from Jason
* [smmu] Fix vsmmu initial value
* [smmu] Support impl for hw_info
* [tegra] Rename "slot" to "vsid"
* [tegra] Update kdocs and commit logs
* [tegra] Map/unmap LVCMDQ dynamically
* [tegra] Refcount the previous LVCMDQ
* [tegra] Return -EEXIST if LVCMDQ exists
* [tegra] Simplify VINTF cleanup routine
* [tegra] Use vmid and s2_domain in vsmmu
* [tegra] Rename "mmap_pgoff" to "immap_id"
* [tegra] Add more addr and length validation
* [iommufd] Add more narrative to mmap's kdoc
* [iommufd] Add iommufd_struct_depend/undepend()
* [iommufd] Rename vcmdq_free op to vcmdq_destroy
* [iommufd] Fix bug in iommu_copy_struct_to_user()
* [iommufd] Drop is_io from iommufd_ctx_alloc_mmap()
* [iommufd] Test the queue memory for its contiguity
* [iommufd] Return -ENXIO if address or length fails
* [iommufd] Do not change @min_last in mock_viommu_alloc()
* [iommufd] Generalize TEGRA241_VCMDQ data in core structure
* [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC
* [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping
v1
https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/
Thanks
Nicolin
Nicolin Chen (23):
iommufd/viommu: Add driver-allocated vDEVICE support
iommu: Pass in a driver-level user data structure to viommu_alloc op
iommufd/viommu: Allow driver-specific user data for a vIOMMU object
iommu: Add iommu_copy_struct_to_user helper
iommufd/driver: Let iommufd_viommu_alloc helper save ictx to
viommu->ictx
iommufd/driver: Add iommufd_struct_destroy to revert
iommufd_viommu_alloc
iommufd/selftest: Support user_data in mock_viommu_alloc
iommufd/selftest: Add covearge for viommu data
iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers
iommufd/viommu: Introduce IOMMUFD_OBJ_VQUEUE and its related struct
iommufd/viommu: Add IOMMUFD_CMD_VQUEUE_ALLOC ioctl
iommufd/driver: Add iommufd_vqueue_depend/undepend() helpers
iommufd/selftest: Add coverage for IOMMUFD_CMD_VQUEUE_ALLOC
iommufd: Add mmap interface
iommufd/selftest: Add coverage for the new mmap interface
Documentation: userspace-api: iommufd: Update vQUEUE
iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op
iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/tegra241-cmdqv: Simplify deinit flow in
tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 25 +-
drivers/iommu/iommufd/io_pagetable.h | 8 +
drivers/iommu/iommufd/iommufd_private.h | 28 +-
drivers/iommu/iommufd/iommufd_test.h | 20 +
include/linux/iommu.h | 43 +-
include/linux/iommufd.h | 184 ++++++-
include/uapi/linux/iommufd.h | 117 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 52 +-
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 42 +-
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 481 +++++++++++++++++-
drivers/iommu/iommufd/device.c | 117 +----
drivers/iommu/iommufd/driver.c | 88 ++++
drivers/iommu/iommufd/io_pagetable.c | 95 ++++
drivers/iommu/iommufd/main.c | 84 ++-
drivers/iommu/iommufd/selftest.c | 126 ++++-
drivers/iommu/iommufd/viommu.c | 116 ++++-
tools/testing/selftests/iommu/iommufd.c | 96 +++-
.../selftests/iommu/iommufd_fail_nth.c | 11 +-
Documentation/userspace-api/iommufd.rst | 15 +
19 files changed, 1555 insertions(+), 193 deletions(-)
--
2.43.0