On Tue, May 27, 2025 at 5:11 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
On Tue, May 27, 2025 at 04:01:52PM -0700, David Matlack wrote:
A reusable mini-driver framework that can trigger DMA is a huge leap forward.
How broad do you think the reusability should go?
I structured the library (which includes the driver framework and drivers) so that it is reusable across other selftests (i.e. not just in tools/testing/selftests/vfio). The last 3 patches in this series show it being used in KVM selftests for example. IOMMU-focused tests in tools/testing/selftests/iommu could also use it.
I think having it as a usable library within selftests is a good place to start. It shows it has a clean API boundary at least.
But it's not reusable outside of selftests, or outside of the kernel source tree. My intuition is the former wouldn't be too hard to support, but the latter would be challenging.
And then we can see if there is interest to move it outside.
Sounds good to me.
I was also thinking of using NVMe for this (cheap, broadly available), but I'm a little worried someone might corrupt their boot disk if they accidentally pass in the wrong BDF :)
Yeah, you can't do memcpy on NVMe without being destructive.
You'd want an alternative stimulus API that was more like "DMA write something random to X", then you could DMA READ from the media and use that as a non-destructive test.
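A rough sketch of how a test could use such a write-only stimulus without knowing what data the device will produce. This is not code from the series: `dma_write_fn`, `fake_media_read`, `dropped_write` and `dma_write_check` are hypothetical names, and the "device" is simulated on the CPU. It also assumes the device writes the same data on both runs (as a read of an unmodified block would), which a truly random write would not satisfy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical op: the device DMA-writes size bytes of its own data to dst. */
typedef int (*dma_write_fn)(void *dst, size_t size);

/* Stand-in for a read of an unmodified block: same data on every call. */
static int fake_media_read(void *dst, size_t size)
{
	uint8_t *p = dst;
	size_t i;

	for (i = 0; i < size; i++)
		p[i] = (uint8_t)(i * 7 + 3);
	return 0;
}

/* Stand-in for a DMA that reports success but never reaches memory. */
static int dropped_write(void *dst, size_t size)
{
	(void)dst;
	(void)size;
	return 0;
}

/*
 * Poison two buffers with patterns that differ in every byte, run the
 * same write into each, and require the results to match. Any byte the
 * device did not write keeps its poison and makes the buffers differ.
 * Returns 0 if the writes landed, -1 otherwise.
 */
static int dma_write_check(dma_write_fn do_write, uint8_t *a, uint8_t *b,
			   size_t size)
{
	assert(size > 0);

	memset(a, 0x00, size);
	memset(b, 0xff, size);

	if (do_write(a, size) || do_write(b, size))
		return -1;

	return memcmp(a, b, size) ? -1 : 0;
}
```

With two 64-byte scratch buffers, `dma_write_check(fake_media_read, a, b, 64)` returns 0 and `dma_write_check(dropped_write, a, b, 64)` returns -1. The check shows that the DMA reached memory, but says nothing about whether the bytes themselves are the right ones.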
Yeah, we would need a different driver API. Intel DSA supports a Memory Fill operation, which would be similar. But the nice thing about memcpy is you can validate that the contents of memory are "correct" after the DMA. With NVMe we wouldn't be able to have any guarantees about what exactly would get written.
If mlx5 HW is truly cheap and broadly available then maybe we just align on that for baremetal tests and not worry about NVMe. That way we can keep the memcpy API and be able to validate the contents of DMAs.
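For comparison, a minimal sketch of the content validation that a memcpy-style API allows. The names here (`dma_memcpy_fn`, `fake_dma_memcpy`, `short_dma_memcpy`, `dma_memcpy_check`) are hypothetical and are not the library's actual API; the "device" is simulated with a CPU copy so the sketch runs without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver op: copy size bytes from src to dst via DMA. */
typedef int (*dma_memcpy_fn)(void *dst, const void *src, size_t size);

/* Stand-in device that copies correctly. */
static int fake_dma_memcpy(void *dst, const void *src, size_t size)
{
	memcpy(dst, src, size);
	return 0;
}

/* Stand-in for a faulty device: silently drops the last byte. */
static int short_dma_memcpy(void *dst, const void *src, size_t size)
{
	memcpy(dst, src, size - 1);
	return 0;
}

/*
 * Fill src with a known pattern, poison dst, run the copy, and compare.
 * Returns 0 if dst matches src byte for byte, -1 otherwise.
 */
static int dma_memcpy_check(dma_memcpy_fn do_memcpy, uint8_t *src,
			    uint8_t *dst, size_t size, uint8_t seed)
{
	size_t i;

	assert(size > 0);

	for (i = 0; i < size; i++)
		src[i] = (uint8_t)(seed + i);
	memset(dst, (uint8_t)~seed, size);

	if (do_memcpy(dst, src, size))
		return -1;

	return memcmp(dst, src, size) ? -1 : 0;
}
```

With 64-byte buffers and seed 0x5a, the check returns 0 for `fake_dma_memcpy` and -1 for `short_dma_memcpy`. Because the test chooses the source pattern, it can verify every byte of the destination, which is the guarantee a write-only stimulus cannot give.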
For running these tests in VMs, Joel's pci-ats-testdev [1] looks like a good option.
[1] https://github.com/Joelgranados/qemu/blob/pcie-testdev/hw/misc/pcie-ats-test...
Do you think mlx5 HW could support the current driver API?
I think it can do memcpy. It would require copying a lot of code but it is "straightforward" to set up a loopback QP and then issue RDMA WRITE operations to memcpy data. It would act almost the same as IDXD.
There are examples doing this in the kernel, and we have examples in rdma-core of how to boot the device under VFIO.
Good to know. I'd need some help from someone from Nvidia to write the driver though, or it might take a while.