Not all IOMMUs support the same virtual address width as the processor; for instance, older Intel consumer platforms only support 39 bits of IOMMU address space. On such platforms, both using the virtual address as the IOVA and mapping at the top of the address space fail.
VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator and record the maximum supported IOVA address.
Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. The latter change doesn't test quite the same absolute end-of-address-space behavior, but still seems to have some value. Testing for overflow is skipped when a reduced address space is supported, as the desired errno is not generated.
This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2].
Given David's plans to split IOMMU concerns from devices as described in [3], the home this series gives `struct iova_allocator` and the IOVA range helpers, vfio_pci_device.c, is likely to be short-lived. I assume that the rework can move this functionality to a more appropriate location next to other IOMMU-focused code once such a place exists.
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/#t
[2] https://lore.kernel.org/all/20251107222058.2009244-1-dmatlack@google.com/
[3] https://lore.kernel.org/all/aRIoKJk0uwLD-yGr@google.com/
To: Alex Williamson <alex@shazbot.org>
To: David Matlack <dmatlack@google.com>
To: Shuah Khan <shuah@kernel.org>
To: Jason Gunthorpe <jgg@ziepe.ca>
Cc: kvm@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Alex Mastro <amastro@fb.com>
Changes in v2:
- Fix various nits
- calloc() where appropriate
- Update overflow test to run regardless of iova range constraints
- Change iova_allocator_init() to return an allocated struct
- Unfold iova_allocator_alloc()
- Fix iova allocator initial state bug
- Update vfio_pci_driver_test to use iova allocator
- Link to v1: https://lore.kernel.org/r/20251110-iova-ranges-v1-0-4d441cf5bf6d@fb.com
---
Alex Mastro (4):
      vfio: selftests: add iova range query helpers
      vfio: selftests: fix map limit tests to use last available iova
      vfio: selftests: add iova allocator
      vfio: selftests: replace iova=vaddr with allocated iovas
 .../testing/selftests/vfio/lib/include/vfio_util.h |  19 +-
 tools/testing/selftests/vfio/lib/vfio_pci_device.c | 241 ++++++++++++++++++++-
 .../testing/selftests/vfio/vfio_dma_mapping_test.c |  20 +-
 .../testing/selftests/vfio/vfio_pci_driver_test.c  |  12 +-
 4 files changed, 283 insertions(+), 9 deletions(-)
---
base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316
change-id: 20251110-iova-ranges-1c09549fbf63
Best regards,
--
Alex Mastro <amastro@fb.com>
VFIO selftests need to map IOVAs from legally accessible ranges, which can vary across hardware. Tests in vfio_dma_mapping_test.c currently make overly strong assumptions about which IOVAs can be mapped.
Add vfio_pci_iova_ranges(), which queries IOVA ranges from the IOMMUFD or VFIO container associated with the device. The queried ranges are normalized to IOMMUFD's iommu_iova_range representation so that handling of IOVA ranges up the stack can be implementation-agnostic. iommu_iova_range and vfio_iova_range are equivalent, so bias toward using the new interface's struct.
Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES. Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE.
The underlying vfio_iommu_type1_info buffer-related functionality has been kept generic so the same helpers can be used to query other capability chain information, if needed.
Signed-off-by: Alex Mastro <amastro@fb.com>
---
 .../testing/selftests/vfio/lib/include/vfio_util.h |   8 +-
 tools/testing/selftests/vfio/lib/vfio_pci_device.c | 167 +++++++++++++++++++++
 2 files changed, 174 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vfio/lib/include/vfio_util.h b/tools/testing/selftests/vfio/lib/include/vfio_util.h
index 240409bf5f8a..ef8f06ef0c13 100644
--- a/tools/testing/selftests/vfio/lib/include/vfio_util.h
+++ b/tools/testing/selftests/vfio/lib/include/vfio_util.h
@@ -4,9 +4,12 @@
 #include <fcntl.h>
 #include <string.h>
-#include <linux/vfio.h>
+
+#include <uapi/linux/types.h>
+#include <linux/iommufd.h>
 #include <linux/list.h>
 #include <linux/pci_regs.h>
+#include <linux/vfio.h>
#include "../../../kselftest.h"
@@ -206,6 +209,9 @@ struct vfio_pci_device *vfio_pci_device_init(const char *bdf, const char *iommu_
 void vfio_pci_device_cleanup(struct vfio_pci_device *device);
 void vfio_pci_device_reset(struct vfio_pci_device *device);

+struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device,
+					      u32 *nranges);
+
 int __vfio_pci_dma_map(struct vfio_pci_device *device,
 		       struct vfio_dma_region *region);
 int __vfio_pci_dma_unmap(struct vfio_pci_device *device,

diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index a381fd253aa7..7a523e3f2dce 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -29,6 +29,173 @@
 	VFIO_ASSERT_EQ(__ret, 0, "ioctl(%s, %s, %s) returned %d\n", #_fd, #_op, #_arg, __ret); \
 } while (0)

+static struct vfio_info_cap_header *next_cap_hdr(void *buf, size_t bufsz,
+						 size_t *cap_offset)
+{
+	struct vfio_info_cap_header *hdr;
+
+	if (!*cap_offset)
+		return NULL;
+
+	VFIO_ASSERT_LT(*cap_offset, bufsz);
+	VFIO_ASSERT_GE(bufsz - *cap_offset, sizeof(*hdr));
+
+	hdr = (struct vfio_info_cap_header *)((u8 *)buf + *cap_offset);
+
+	if (hdr->next)
+		VFIO_ASSERT_GT(hdr->next, *cap_offset);
+
+	*cap_offset = hdr->next;
+
+	return hdr;
+}
+
+static struct vfio_info_cap_header *vfio_iommu_info_cap_hdr(struct vfio_iommu_type1_info *info,
+							    u16 cap_id)
+{
+	struct vfio_info_cap_header *hdr;
+	size_t cap_offset = info->cap_offset;
+
+	if (!(info->flags & VFIO_IOMMU_INFO_CAPS))
+		return NULL;
+
+	if (cap_offset)
+		VFIO_ASSERT_GE(cap_offset, sizeof(struct vfio_iommu_type1_info));
+
+	while ((hdr = next_cap_hdr(info, info->argsz, &cap_offset))) {
+		if (hdr->id == cap_id)
+			return hdr;
+	}
+
+	return NULL;
+}
+
+/* Return buffer including capability chain, if present. Free with free() */
+static struct vfio_iommu_type1_info *vfio_iommu_get_info(struct vfio_pci_device *device)
+{
+	struct vfio_iommu_type1_info *info;
+
+	info = malloc(sizeof(*info));
+	VFIO_ASSERT_NOT_NULL(info);
+
+	*info = (struct vfio_iommu_type1_info) {
+		.argsz = sizeof(*info),
+	};
+
+	ioctl_assert(device->container_fd, VFIO_IOMMU_GET_INFO, info);
+
+	info = realloc(info, info->argsz);
+	VFIO_ASSERT_NOT_NULL(info);
+
+	ioctl_assert(device->container_fd, VFIO_IOMMU_GET_INFO, info);
+
+	return info;
+}
+
+/*
+ * Return iova ranges for the device's container. Normalize vfio_iommu_type1 to
+ * report iommufd's iommu_iova_range. Free with free().
+ */
+static struct iommu_iova_range *vfio_iommu_iova_ranges(struct vfio_pci_device *device,
+						       u32 *nranges)
+{
+	struct vfio_iommu_type1_info_cap_iova_range *cap_range;
+	struct vfio_iommu_type1_info *info;
+	struct vfio_info_cap_header *hdr;
+	struct iommu_iova_range *ranges = NULL;
+
+	info = vfio_iommu_get_info(device);
+	hdr = vfio_iommu_info_cap_hdr(info, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
+	VFIO_ASSERT_NOT_NULL(hdr);
+
+	cap_range = container_of(hdr, struct vfio_iommu_type1_info_cap_iova_range, header);
+	VFIO_ASSERT_GT(cap_range->nr_iovas, 0);
+
+	ranges = calloc(cap_range->nr_iovas, sizeof(*ranges));
+	VFIO_ASSERT_NOT_NULL(ranges);
+
+	for (u32 i = 0; i < cap_range->nr_iovas; i++) {
+		ranges[i] = (struct iommu_iova_range){
+			.start = cap_range->iova_ranges[i].start,
+			.last = cap_range->iova_ranges[i].end,
+		};
+	}
+
+	*nranges = cap_range->nr_iovas;
+
+	free(info);
+	return ranges;
+}
+
+/* Return iova ranges of the device's IOAS. Free with free() */
+static struct iommu_iova_range *iommufd_iova_ranges(struct vfio_pci_device *device,
+						    u32 *nranges)
+{
+	struct iommu_iova_range *ranges;
+	int ret;
+
+	struct iommu_ioas_iova_ranges query = {
+		.size = sizeof(query),
+		.ioas_id = device->ioas_id,
+	};
+
+	ret = ioctl(device->iommufd, IOMMU_IOAS_IOVA_RANGES, &query);
+	VFIO_ASSERT_EQ(ret, -1);
+	VFIO_ASSERT_EQ(errno, EMSGSIZE);
+	VFIO_ASSERT_GT(query.num_iovas, 0);
+
+	ranges = calloc(query.num_iovas, sizeof(*ranges));
+	VFIO_ASSERT_NOT_NULL(ranges);
+
+	query.allowed_iovas = (uintptr_t)ranges;
+
+	ioctl_assert(device->iommufd, IOMMU_IOAS_IOVA_RANGES, &query);
+	*nranges = query.num_iovas;
+
+	return ranges;
+}
+
+static int iova_range_comp(const void *a, const void *b)
+{
+	const struct iommu_iova_range *ra = a, *rb = b;
+
+	if (ra->start < rb->start)
+		return -1;
+
+	if (ra->start > rb->start)
+		return 1;
+
+	return 0;
+}
+
+/* Return sorted IOVA ranges of the device. Free with free(). */
+struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device,
+					      u32 *nranges)
+{
+	struct iommu_iova_range *ranges;
+
+	if (device->iommufd)
+		ranges = iommufd_iova_ranges(device, nranges);
+	else
+		ranges = vfio_iommu_iova_ranges(device, nranges);
+
+	if (!ranges)
+		return NULL;
+
+	VFIO_ASSERT_GT(*nranges, 0);
+
+	/* Sort and check that ranges are sane and non-overlapping */
+	qsort(ranges, *nranges, sizeof(*ranges), iova_range_comp);
+	VFIO_ASSERT_LT(ranges[0].start, ranges[0].last);
+
+	for (u32 i = 1; i < *nranges; i++) {
+		VFIO_ASSERT_LT(ranges[i].start, ranges[i].last);
+		VFIO_ASSERT_LT(ranges[i - 1].last, ranges[i].start);
+	}
+
+	return ranges;
+}
+
 iova_t __to_iova(struct vfio_pci_device *device, void *vaddr)
 {
 	struct vfio_dma_region *region;
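[Editorial note: a minimal usage sketch of the new query helper, not part of the patch. It assumes a test that already holds a struct vfio_pci_device *device and the usual selftest includes.]

	struct iommu_iova_range *ranges;
	u32 nranges;

	ranges = vfio_pci_iova_ranges(device, &nranges);
	VFIO_ASSERT_NOT_NULL(ranges);

	/* Ranges come back sorted and non-overlapping; each is inclusive [start, last]. */
	for (u32 i = 0; i < nranges; i++)
		printf("iova range %u: [0x%llx, 0x%llx]\n", i,
		       (unsigned long long)ranges[i].start,
		       (unsigned long long)ranges[i].last);

	free(ranges);

Because the vfio_iommu_type1_info helpers are kept generic, the same pattern could in principle be pointed at other capabilities in the chain (e.g. VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL), though those helpers are currently static to vfio_pci_device.c.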
On Tue, 11 Nov 2025 06:52:02 -0800 Alex Mastro <amastro@fb.com> wrote:
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index a381fd253aa7..7a523e3f2dce 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -29,6 +29,173 @@
 	VFIO_ASSERT_EQ(__ret, 0, "ioctl(%s, %s, %s) returned %d\n", #_fd, #_op, #_arg, __ret); \
 } while (0)

+static struct vfio_info_cap_header *next_cap_hdr(void *buf, size_t bufsz,
+						 size_t *cap_offset)
+{
+	struct vfio_info_cap_header *hdr;
+
+	if (!*cap_offset)
+		return NULL;
+
+	VFIO_ASSERT_LT(*cap_offset, bufsz);
+	VFIO_ASSERT_GE(bufsz - *cap_offset, sizeof(*hdr));
+
+	hdr = (struct vfio_info_cap_header *)((u8 *)buf + *cap_offset);
+
+	if (hdr->next)
+		VFIO_ASSERT_GT(hdr->next, *cap_offset);

This might be implementation, but I don't think it's a requirement. The vfio capability chains are based on PCI capabilities, which have no ordering requirement. Thanks,

Alex

+	*cap_offset = hdr->next;
+
+	return hdr;
+}
On Tue, Nov 11, 2025 at 10:09:48AM -0700, Alex Williamson wrote:
On Tue, 11 Nov 2025 06:52:02 -0800 Alex Mastro <amastro@fb.com> wrote:
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index a381fd253aa7..7a523e3f2dce 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -29,6 +29,173 @@
 	VFIO_ASSERT_EQ(__ret, 0, "ioctl(%s, %s, %s) returned %d\n", #_fd, #_op, #_arg, __ret); \
 } while (0)

+static struct vfio_info_cap_header *next_cap_hdr(void *buf, size_t bufsz,
+						 size_t *cap_offset)
+{
+	struct vfio_info_cap_header *hdr;
+
+	if (!*cap_offset)
+		return NULL;
+
+	VFIO_ASSERT_LT(*cap_offset, bufsz);
+	VFIO_ASSERT_GE(bufsz - *cap_offset, sizeof(*hdr));
+
+	hdr = (struct vfio_info_cap_header *)((u8 *)buf + *cap_offset);
+
+	if (hdr->next)
+		VFIO_ASSERT_GT(hdr->next, *cap_offset);

This might be implementation, but I don't think it's a requirement. The vfio capability chains are based on PCI capabilities, which have no ordering requirement. Thanks,
My main interest was to enforce that the chain doesn't contain a cycle, and checking for a monotonically increasing cap offset was the simplest way I could think of to guarantee that.

If there isn't such a check and the kernel vends a malformed, cycle-containing chain, chain traversal would infinite-loop.

Given that this test code lives in the kernel tree, coupled to the implementation, do you think such assumptions still reach too far? If yes, I can either remove this check, or try to make cycle detection more relaxed about offsets potentially going backwards.
Alex
+	*cap_offset = hdr->next;
+
+	return hdr;
+}
On Tue, 11 Nov 2025 09:35:31 -0800 Alex Mastro <amastro@fb.com> wrote:
On Tue, Nov 11, 2025 at 10:09:48AM -0700, Alex Williamson wrote:
On Tue, 11 Nov 2025 06:52:02 -0800 Alex Mastro <amastro@fb.com> wrote:
[...]
+	if (hdr->next)
+		VFIO_ASSERT_GT(hdr->next, *cap_offset);

This might be implementation, but I don't think it's a requirement. The vfio capability chains are based on PCI capabilities, which have no ordering requirement. Thanks,
My main interest was to enforce that the chain doesn't contain a cycle, and checking for a monotonically increasing cap offset was the simplest way I could think of to guarantee that.

If there isn't such a check and the kernel vends a malformed, cycle-containing chain, chain traversal would infinite-loop.

Given that this test code lives in the kernel tree, coupled to the implementation, do you think such assumptions still reach too far? If yes, I can either remove this check, or try to make cycle detection more relaxed about offsets potentially going backwards.
I've seen cycle detection in PCI config space implemented as just a depth/ttl counter. Max cycles is roughly (buffer-size/header-size). I think that would be sufficient if we want to include that sanity testing. Thanks,
Alex
On Tue, Nov 11, 2025 at 10:52:02AM -0700, Alex Williamson wrote:
On Tue, 11 Nov 2025 09:35:31 -0800 Alex Mastro <amastro@fb.com> wrote:
On Tue, Nov 11, 2025 at 10:09:48AM -0700, Alex Williamson wrote:
On Tue, 11 Nov 2025 06:52:02 -0800 Alex Mastro <amastro@fb.com> wrote:
[...]
+	if (hdr->next)
+		VFIO_ASSERT_GT(hdr->next, *cap_offset);

This might be implementation, but I don't think it's a requirement. The vfio capability chains are based on PCI capabilities, which have no ordering requirement. Thanks,
My main interest was to enforce that the chain doesn't contain a cycle, and checking for a monotonically increasing cap offset was the simplest way I could think of to guarantee that.

If there isn't such a check and the kernel vends a malformed, cycle-containing chain, chain traversal would infinite-loop.

Given that this test code lives in the kernel tree, coupled to the implementation, do you think such assumptions still reach too far? If yes, I can either remove this check, or try to make cycle detection more relaxed about offsets potentially going backwards.
I've seen cycle detection in PCI config space implemented as just a depth/ttl counter. Max cycles is roughly (buffer-size/header-size). I think that would be sufficient if we want to include that sanity testing. Thanks,
Thanks, that's a good suggestion -- will take this in v3.
Alex
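[Editorial note: a minimal sketch of the depth/TTL approach discussed above, under the assumption that it would replace the monotonic-offset assert; the variable names and exact bound are illustrative, not the actual v3 change.]

	/*
	 * A valid chain inside an argsz-byte buffer can hold at most
	 * argsz / sizeof(*hdr) headers, so cap the walk at that many
	 * steps; exceeding the cap implies a cycle.
	 */
	size_t ttl = info->argsz / sizeof(struct vfio_info_cap_header);

	while ((hdr = next_cap_hdr(info, info->argsz, &cap_offset))) {
		VFIO_ASSERT_GT(ttl--, 0);	/* cycle detection */
		if (hdr->id == cap_id)
			return hdr;
	}

This tolerates backward-pointing next offsets (legal, per the PCI-style chain) while still bounding traversal.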
Use the newly available vfio_pci_iova_ranges() to determine the last legal IOVA, and use this as the basis for vfio_dma_map_limit_test tests.
Fixes: de8d1f2fd5a5 ("vfio: selftests: add end of address space DMA map/unmap tests")
Signed-off-by: Alex Mastro <amastro@fb.com>
---
 tools/testing/selftests/vfio/vfio_dma_mapping_test.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
index 4f1ea79a200c..e1374aab96bd 100644
--- a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
+++ b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
@@ -3,6 +3,8 @@
 #include <sys/mman.h>
 #include <unistd.h>

+#include <uapi/linux/types.h>
+#include <linux/iommufd.h>
 #include <linux/limits.h>
 #include <linux/mman.h>
 #include <linux/sizes.h>
@@ -219,7 +221,10 @@ FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES();
 FIXTURE_SETUP(vfio_dma_map_limit_test)
 {
 	struct vfio_dma_region *region = &self->region;
+	struct iommu_iova_range *ranges;
 	u64 region_size = getpagesize();
+	iova_t last_iova;
+	u32 nranges;

 	/*
 	 * Over-allocate mmap by double the size to provide enough backing vaddr
@@ -232,8 +237,13 @@ FIXTURE_SETUP(vfio_dma_map_limit_test)
 			     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
 	ASSERT_NE(region->vaddr, MAP_FAILED);

-	/* One page prior to the end of address space */
-	region->iova = ~(iova_t)0 & ~(region_size - 1);
+	ranges = vfio_pci_iova_ranges(self->device, &nranges);
+	VFIO_ASSERT_NOT_NULL(ranges);
+	last_iova = ranges[nranges - 1].last;
+	free(ranges);
+
+	/* One page prior to the last iova */
+	region->iova = last_iova & ~(region_size - 1);
 	region->size = region_size;
 }

@@ -276,6 +286,7 @@ TEST_F(vfio_dma_map_limit_test, overflow)
 	struct vfio_dma_region *region = &self->region;
 	int rc;

+	region->iova = ~(iova_t)0 & ~(region->size - 1);
 	region->size = self->mmap_size;
rc = __vfio_pci_dma_map(self->device, region);
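[Editorial note: a worked example of why the overflow case now fails unconditionally, assuming 4 KiB pages so that region->size starts at 0x1000 and mmap_size is 0x2000 per the over-allocation above. region->iova = ~(iova_t)0 & ~0xfff = 0xfffffffffffff000, and iova + size = 0xfffffffffffff000 + 0x2000 wraps past the 64-bit boundary, so the test can expect the mapping to fail regardless of the IOMMU's supported IOVA width.]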
Add struct iova_allocator, which gives tests a convenient way to generate legally accessible IOVAs to map. The allocator traverses the sorted available IOVA ranges linearly, requires power-of-two allocation sizes, and does not support freeing IOVA allocations. The assumption is that tests are not bounded by IOVA space and will not need to recycle IOVAs.
This is based on Alex Williamson's patch series for adding an IOVA allocator [1].
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/
Signed-off-by: Alex Mastro <amastro@fb.com>
---
The unfolded code uses David's range_offset suggestion because it makes assignment of the updated state simpler when crossing a range boundary.
No more ALIGN(). The insight is that the initial check for sufficient space (pre-alignment) requires computing

	last = iova + size - 1

which is already half of what ALIGN() does; only the masking remains.
---
 .../testing/selftests/vfio/lib/include/vfio_util.h | 11 ++++
 tools/testing/selftests/vfio/lib/vfio_pci_device.c | 74 +++++++++++++++++++++-
 2 files changed, 84 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vfio/lib/include/vfio_util.h b/tools/testing/selftests/vfio/lib/include/vfio_util.h
index ef8f06ef0c13..69ec0c856481 100644
--- a/tools/testing/selftests/vfio/lib/include/vfio_util.h
+++ b/tools/testing/selftests/vfio/lib/include/vfio_util.h
@@ -188,6 +188,13 @@ struct vfio_pci_device {
 	struct vfio_pci_driver driver;
 };

+struct iova_allocator {
+	struct iommu_iova_range *ranges;
+	u32 nranges;
+	u32 range_idx;
+	u64 range_offset;
+};
+
 /*
  * Return the BDF string of the device that the test should use.
  *
@@ -212,6 +219,10 @@ void vfio_pci_device_reset(struct vfio_pci_device *device);
 struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device,
 					      u32 *nranges);

+struct iova_allocator *iova_allocator_init(struct vfio_pci_device *device);
+void iova_allocator_cleanup(struct iova_allocator *allocator);
+iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size);
+
 int __vfio_pci_dma_map(struct vfio_pci_device *device,
 		       struct vfio_dma_region *region);
 int __vfio_pci_dma_unmap(struct vfio_pci_device *device,

diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index 7a523e3f2dce..ff1edb11747a 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -12,11 +12,12 @@
 #include <sys/mman.h>

 #include <uapi/linux/types.h>
+#include <linux/iommufd.h>
 #include <linux/limits.h>
 #include <linux/mman.h>
+#include <linux/overflow.h>
 #include <linux/types.h>
 #include <linux/vfio.h>
-#include <linux/iommufd.h>

 #include "../../../kselftest.h"
 #include <vfio_util.h>
@@ -196,6 +197,77 @@ struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device,
 	return ranges;
 }

+struct iova_allocator *iova_allocator_init(struct vfio_pci_device *device)
+{
+	struct iova_allocator *allocator;
+	struct iommu_iova_range *ranges;
+	u32 nranges;
+
+	ranges = vfio_pci_iova_ranges(device, &nranges);
+	VFIO_ASSERT_NOT_NULL(ranges);
+
+	allocator = malloc(sizeof(*allocator));
+	VFIO_ASSERT_NOT_NULL(allocator);
+
+	*allocator = (struct iova_allocator){
+		.ranges = ranges,
+		.nranges = nranges,
+		.range_idx = 0,
+		.range_offset = 0,
+	};
+
+	return allocator;
+}
+
+void iova_allocator_cleanup(struct iova_allocator *allocator)
+{
+	free(allocator->ranges);
+	free(allocator);
+}
+
+iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size)
+{
+	VFIO_ASSERT_GT(size, 0, "Invalid size arg, zero\n");
+	VFIO_ASSERT_EQ(size & (size - 1), 0, "Invalid size arg, non-power-of-2\n");
+
+	for (;;) {
+		struct iommu_iova_range *range;
+		iova_t iova, last;
+
+		VFIO_ASSERT_LT(allocator->range_idx, allocator->nranges,
+			       "IOVA allocator out of space\n");
+
+		range = &allocator->ranges[allocator->range_idx];
+		iova = range->start + allocator->range_offset;
+
+		/* Check for sufficient space at the current offset */
+		if (check_add_overflow(iova, size - 1, &last) ||
+		    last > range->last)
+			goto next_range;
+
+		/* Align iova to size */
+		iova = last & ~(size - 1);
+
+		/* Check for sufficient space at the aligned iova */
+		if (check_add_overflow(iova, size - 1, &last) ||
+		    last > range->last)
+			goto next_range;
+
+		if (last == range->last) {
+			allocator->range_idx++;
+			allocator->range_offset = 0;
+		} else {
+			allocator->range_offset = last - range->start + 1;
+		}
+
+		return iova;
+
+next_range:
+		allocator->range_idx++;
+		allocator->range_offset = 0;
+	}
+}
+
 iova_t __to_iova(struct vfio_pci_device *device, void *vaddr)
 {
 	struct vfio_dma_region *region;
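[Editorial note: a usage sketch of the allocator, assuming a device initialized elsewhere and SZ_* constants from linux/sizes.h.]

	struct iova_allocator *allocator = iova_allocator_init(device);

	/* Sizes must be powers of two; individual allocations are never freed. */
	iova_t iova_4k = iova_allocator_alloc(allocator, SZ_4K);
	iova_t iova_2m = iova_allocator_alloc(allocator, SZ_2M);

	iova_allocator_cleanup(allocator);

To see the ALIGN()-free arithmetic in action with size = 0x1000 and a range cursor at iova = 0x2345: last = iova + size - 1 = 0x3344, and last & ~(size - 1) = 0x3000, which equals ALIGN(0x2345, 0x1000).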
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr as part of DMA mapping operations. The assumption that these IOVAs are legal has held up on all the hardware we've tested so far, but is not guaranteed. Make the tests more robust by using iova_allocator to vend IOVAs, which queries legally accessible IOVAs from the underlying IOMMUFD or VFIO container.
Signed-off-by: Alex Mastro <amastro@fb.com>
---
 tools/testing/selftests/vfio/vfio_dma_mapping_test.c |  5 ++++-
 tools/testing/selftests/vfio/vfio_pci_driver_test.c  | 12 ++++++++----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
index e1374aab96bd..102603d4407d 100644
--- a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
+++ b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
@@ -95,6 +95,7 @@ static int iommu_mapping_get(const char *bdf, u64 iova,

 FIXTURE(vfio_dma_mapping_test) {
 	struct vfio_pci_device *device;
+	struct iova_allocator *iova_allocator;
 };

 FIXTURE_VARIANT(vfio_dma_mapping_test) {
@@ -119,10 +120,12 @@ FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(anonymous_hugetlb_1gb, SZ_1G, MAP_HUGETLB |
 FIXTURE_SETUP(vfio_dma_mapping_test)
 {
 	self->device = vfio_pci_device_init(device_bdf, variant->iommu_mode);
+	self->iova_allocator = iova_allocator_init(self->device);
 }

 FIXTURE_TEARDOWN(vfio_dma_mapping_test)
 {
+	iova_allocator_cleanup(self->iova_allocator);
 	vfio_pci_device_cleanup(self->device);
 }

@@ -144,7 +147,7 @@ TEST_F(vfio_dma_mapping_test, dma_map_unmap)
 	else
 		ASSERT_NE(region.vaddr, MAP_FAILED);

-	region.iova = (u64)region.vaddr;
+	region.iova = iova_allocator_alloc(self->iova_allocator, size);
 	region.size = size;

 	vfio_pci_dma_map(self->device, &region);
diff --git a/tools/testing/selftests/vfio/vfio_pci_driver_test.c b/tools/testing/selftests/vfio/vfio_pci_driver_test.c
index 2dbd70b7db62..f69eec8b928d 100644
--- a/tools/testing/selftests/vfio/vfio_pci_driver_test.c
+++ b/tools/testing/selftests/vfio/vfio_pci_driver_test.c
@@ -19,6 +19,7 @@ static const char *device_bdf;
 	} while (0)

 static void region_setup(struct vfio_pci_device *device,
+			 struct iova_allocator *iova_allocator,
 			 struct vfio_dma_region *region, u64 size)
 {
 	const int flags = MAP_SHARED | MAP_ANONYMOUS;
@@ -29,7 +30,7 @@ static void region_setup(struct vfio_pci_device *device,
 	VFIO_ASSERT_NE(vaddr, MAP_FAILED);

 	region->vaddr = vaddr;
-	region->iova = (u64)vaddr;
+	region->iova = iova_allocator_alloc(iova_allocator, size);
 	region->size = size;

 	vfio_pci_dma_map(device, region);
@@ -44,6 +45,7 @@ static void region_teardown(struct vfio_pci_device *device,

 FIXTURE(vfio_pci_driver_test) {
 	struct vfio_pci_device *device;
+	struct iova_allocator *iova_allocator;
 	struct vfio_dma_region memcpy_region;
 	void *vaddr;
 	int msi_fd;
@@ -72,14 +74,15 @@ FIXTURE_SETUP(vfio_pci_driver_test)
 	struct vfio_pci_driver *driver;

 	self->device = vfio_pci_device_init(device_bdf, variant->iommu_mode);
+	self->iova_allocator = iova_allocator_init(self->device);

 	driver = &self->device->driver;

-	region_setup(self->device, &self->memcpy_region, SZ_1G);
-	region_setup(self->device, &driver->region, SZ_2M);
+	region_setup(self->device, self->iova_allocator, &self->memcpy_region, SZ_1G);
+	region_setup(self->device, self->iova_allocator, &driver->region, SZ_2M);

 	/* Any IOVA that doesn't overlap memcpy_region and driver->region. */
-	self->unmapped_iova = 8UL * SZ_1G;
+	self->unmapped_iova = iova_allocator_alloc(self->iova_allocator, SZ_1G);

 	vfio_pci_driver_init(self->device);
 	self->msi_fd = self->device->msi_eventfds[driver->msi];
@@ -108,6 +111,7 @@ FIXTURE_TEARDOWN(vfio_pci_driver_test)
 	region_teardown(self->device, &self->memcpy_region);
 	region_teardown(self->device, &driver->region);

+	iova_allocator_cleanup(self->iova_allocator);
 	vfio_pci_device_cleanup(self->device);
 }
On Tue, 11 Nov 2025 06:52:05 -0800 Alex Mastro <amastro@fb.com> wrote:
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr as part of DMA mapping operations. The assumption that these IOVAs are legal has held up on all the hardware we've tested so far, but is not guaranteed. Make the tests more robust by using iova_allocator to vend IOVAs, which queries legally accessible IOVAs from the underlying IOMMUFD or VFIO container.
I've reported hardware that it doesn't work on, and QEMU emulates such hardware. The commit message suggests this is more of a theoretical problem. Thanks,
Alex
On Tue, Nov 11, 2025 at 10:09:37AM -0700, Alex Williamson wrote:
On Tue, 11 Nov 2025 06:52:05 -0800 Alex Mastro <amastro@fb.com> wrote:
vfio_dma_mapping_test and vfio_pci_driver_test currently use iova=vaddr as part of DMA mapping operations. The assumption that these IOVAs are legal has held up on all the hardware we've tested so far, but is not guaranteed. Make the tests more robust by using iova_allocator to vend IOVAs, which queries legally accessible IOVAs from the underlying IOMMUFD or VFIO container.
I've reported hardware that it doesn't work on, and QEMU emulates such hardware. The commit message suggests this is more of a theoretical problem. Thanks,
Agreed, it's misleading. I'll update the commit message to cover the cases you described earlier.
On Tue, 11 Nov 2025 06:52:01 -0800 Alex Mastro <amastro@fb.com> wrote:
[...]
Minor comments, but otherwise LGTM and passes testing on a Kaby Lake system with limited IOMMU address width. Thanks,
Alex
On 2025-11-11 06:52 AM, Alex Mastro wrote:
[...]
LGTM. And I confirmed this fixes vfio_dma_mapping_test on HW that does not support IOVA 0xffffffffffffffff. Thanks!
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
On Tue, Nov 11, 2025 at 05:41:04PM +0000, David Matlack wrote:
On 2025-11-11 06:52 AM, Alex Mastro wrote:
[...]
LGTM. And I confirmed this fixes vfio_dma_mapping_test on HW that does not support IOVA 0xffffffffffffffff. Thanks!
Reviewed-by: David Matlack <dmatlack@google.com>
Tested-by: David Matlack <dmatlack@google.com>
Thanks David!