Not all IOMMUs support the same virtual address width as the processor; for instance, older Intel consumer platforms only support 39 bits of IOMMU address space. On such platforms, both reusing the virtual address as the IOVA and mapping at the top of the address space fail.
VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator and record the maximum supported IOVA address.
Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. The latter change doesn't test quite the same absolute end-of-address-space behavior, but still seems to have some value. Testing for overflow is skipped when the IOMMU only supports a reduced address space, as the desired errno is not generated.
This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2].
Given David's plans to split IOMMU concerns from devices as described in [3], this series' home for `struct iova_allocator` is likely to be short-lived, since it resides in vfio_pci_device.c. I assume the rework can move this functionality to a more appropriate location next to other IOMMU-focused code once such a place exists.
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/#t [2] https://lore.kernel.org/all/20251107222058.2009244-1-dmatlack@google.com/ [3] https://lore.kernel.org/all/aRIoKJk0uwLD-yGr@google.com/
Signed-off-by: Alex Mastro <amastro@fb.com>
---
Alex Mastro (4):
      vfio: selftests: add iova range query helpers
      vfio: selftests: fix map limit tests to use last available iova
      vfio: selftests: add iova allocator
      vfio: selftests: update vfio_dma_mapping_test to allocate iovas
 .../testing/selftests/vfio/lib/include/vfio_util.h |  22 +-
 tools/testing/selftests/vfio/lib/vfio_pci_device.c | 226 ++++++++++++++++++++-
 .../testing/selftests/vfio/vfio_dma_mapping_test.c |  25 ++-
 3 files changed, 268 insertions(+), 5 deletions(-)
---
base-commit: 0ed3a30fd996cb0cac872432cf25185fda7e5316
change-id: 20251110-iova-ranges-1c09549fbf63
Best regards,
VFIO selftests need to map IOVAs from legally accessible ranges, which can vary between hardware platforms. Tests in vfio_dma_mapping_test.c currently make excessively strong assumptions about which IOVAs can be mapped.
Add vfio_iommu_iova_ranges(), which queries IOVA ranges from the IOMMUFD or VFIO container associated with the device. The queried ranges are normalized to IOMMUFD's iommu_iova_range representation so that handling of IOVA ranges up the stack can be implementation-agnostic. iommu_iova_range and vfio_iova_range are equivalent, so bias toward using the newer interface's struct.
Query IOMMUFD's ranges with IOMMU_IOAS_IOVA_RANGES. Query VFIO container's ranges with VFIO_IOMMU_GET_INFO and VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE.
The underlying vfio_iommu_type1_info buffer-related functionality has been kept generic so the same helpers can be used to query other capability chain information, if needed.
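As an illustration of that reuse, a hypothetical in-library caller could walk the same chain for a different capability (VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL here); this helper is only a sketch and is not part of this patch:

        /* Hypothetical example: query the remaining DMA mapping count. */
        static u32 vfio_iommu_dma_avail(struct vfio_pci_device *device)
        {
                struct vfio_iommu_type1_info_dma_avail *cap;
                struct vfio_iommu_type1_info *buf;
                struct vfio_info_cap_header *hdr;
                u32 avail = 0;

                buf = vfio_iommu_info_buf(device);
                hdr = vfio_iommu_info_cap_hdr(buf, VFIO_IOMMU_TYPE1_INFO_DMA_AVAIL);
                if (hdr) {
                        cap = container_of(hdr, struct vfio_iommu_type1_info_dma_avail, header);
                        avail = cap->avail;
                }

                free(buf);
                return avail;
        }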
Signed-off-by: Alex Mastro <amastro@fb.com>
---
 .../testing/selftests/vfio/lib/include/vfio_util.h |   8 +-
 tools/testing/selftests/vfio/lib/vfio_pci_device.c | 161 +++++++++++++++++++++
 2 files changed, 168 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vfio/lib/include/vfio_util.h b/tools/testing/selftests/vfio/lib/include/vfio_util.h index 240409bf5f8a..fb5efec52316 100644 --- a/tools/testing/selftests/vfio/lib/include/vfio_util.h +++ b/tools/testing/selftests/vfio/lib/include/vfio_util.h @@ -4,9 +4,12 @@
#include <fcntl.h> #include <string.h> -#include <linux/vfio.h> + +#include <uapi/linux/types.h> +#include <linux/iommufd.h> #include <linux/list.h> #include <linux/pci_regs.h> +#include <linux/vfio.h>
#include "../../../kselftest.h"
@@ -206,6 +209,9 @@ struct vfio_pci_device *vfio_pci_device_init(const char *bdf, const char *iommu_ void vfio_pci_device_cleanup(struct vfio_pci_device *device); void vfio_pci_device_reset(struct vfio_pci_device *device);
+struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device, + size_t *nranges); + int __vfio_pci_dma_map(struct vfio_pci_device *device, struct vfio_dma_region *region); int __vfio_pci_dma_unmap(struct vfio_pci_device *device, diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c index a381fd253aa7..6bedbe65f0a1 100644 --- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c +++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c @@ -29,6 +29,167 @@ VFIO_ASSERT_EQ(__ret, 0, "ioctl(%s, %s, %s) returned %d\n", #_fd, #_op, #_arg, __ret); \ } while (0)
+static struct vfio_info_cap_header *next_cap_hdr(void *buf, size_t bufsz, + size_t *cap_offset) +{ + struct vfio_info_cap_header *hdr; + + if (!*cap_offset) + return NULL; + + /* Cap offset must be in bounds */ + VFIO_ASSERT_LT(*cap_offset, bufsz); + /* There must be enough remaining space to contain the header */ + VFIO_ASSERT_GE(bufsz - *cap_offset, sizeof(*hdr)); + + hdr = (struct vfio_info_cap_header *)((u8 *)buf + *cap_offset); + + /* If there is a next, offset must increase by at least the header size */ + if (hdr->next) { + VFIO_ASSERT_GT(hdr->next, *cap_offset); + VFIO_ASSERT_GE(hdr->next - *cap_offset, sizeof(*hdr)); + } + + *cap_offset = hdr->next; + + return hdr; +} + +static struct vfio_info_cap_header *vfio_iommu_info_cap_hdr(struct vfio_iommu_type1_info *buf, + u16 cap_id) +{ + struct vfio_info_cap_header *hdr; + size_t cap_offset = buf->cap_offset; + + if (!(buf->flags & VFIO_IOMMU_INFO_CAPS)) + return NULL; + + if (cap_offset) + VFIO_ASSERT_GE(cap_offset, sizeof(struct vfio_iommu_type1_info)); + + while ((hdr = next_cap_hdr(buf, buf->argsz, &cap_offset))) { + if (hdr->id == cap_id) + return hdr; + } + + return NULL; +} + +/* Return buffer including capability chain, if present. Free with free() */ +static struct vfio_iommu_type1_info *vfio_iommu_info_buf(struct vfio_pci_device *device) +{ + struct vfio_iommu_type1_info *buf; + + buf = malloc(sizeof(*buf)); + VFIO_ASSERT_NOT_NULL(buf); + + *buf = (struct vfio_iommu_type1_info) { + .argsz = sizeof(*buf), + }; + + ioctl_assert(device->container_fd, VFIO_IOMMU_GET_INFO, buf); + + buf = realloc(buf, buf->argsz); + VFIO_ASSERT_NOT_NULL(buf); + + ioctl_assert(device->container_fd, VFIO_IOMMU_GET_INFO, buf); + + return buf; +} + +/* + * Return iova ranges for the device's container. Normalize vfio_iommu_type1 to + * report iommufd's iommu_iova_range. Free with free(). + */ +static struct iommu_iova_range *vfio_iommu_iova_ranges(struct vfio_pci_device *device, + size_t *nranges) +{ + struct vfio_iommu_type1_info_cap_iova_range *cap_range; + struct vfio_iommu_type1_info *buf; + struct vfio_info_cap_header *hdr; + struct iommu_iova_range *ranges = NULL; + + buf = vfio_iommu_info_buf(device); + VFIO_ASSERT_NOT_NULL(buf); + + hdr = vfio_iommu_info_cap_hdr(buf, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE); + if (!hdr) + goto free_buf; + + cap_range = container_of(hdr, struct vfio_iommu_type1_info_cap_iova_range, header); + if (!cap_range->nr_iovas) + goto free_buf; + + ranges = malloc(cap_range->nr_iovas * sizeof(*ranges)); + VFIO_ASSERT_NOT_NULL(ranges); + + for (u32 i = 0; i < cap_range->nr_iovas; i++) { + ranges[i] = (struct iommu_iova_range){ + .start = cap_range->iova_ranges[i].start, + .last = cap_range->iova_ranges[i].end, + }; + } + + *nranges = cap_range->nr_iovas; + +free_buf: + free(buf); + return ranges; +} + +/* Return iova ranges of the device's IOAS. 
Free with free() */ +struct iommu_iova_range *iommufd_iova_ranges(struct vfio_pci_device *device, + size_t *nranges) +{ + struct iommu_iova_range *ranges; + int ret; + + struct iommu_ioas_iova_ranges query = { + .size = sizeof(query), + .ioas_id = device->ioas_id, + }; + + ret = ioctl(device->iommufd, IOMMU_IOAS_IOVA_RANGES, &query); + VFIO_ASSERT_EQ(ret, -1); + VFIO_ASSERT_EQ(errno, EMSGSIZE); + VFIO_ASSERT_GT(query.num_iovas, 0); + + ranges = malloc(query.num_iovas * sizeof(*ranges)); + VFIO_ASSERT_NOT_NULL(ranges); + + query.allowed_iovas = (uintptr_t)ranges; + + ioctl_assert(device->iommufd, IOMMU_IOAS_IOVA_RANGES, &query); + *nranges = query.num_iovas; + + return ranges; +} + +struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device, + size_t *nranges) +{ + struct iommu_iova_range *ranges; + + if (device->iommufd) + ranges = iommufd_iova_ranges(device, nranges); + else + ranges = vfio_iommu_iova_ranges(device, nranges); + + if (!ranges) + return NULL; + + /* ranges should be valid, ascending, and non-overlapping */ + VFIO_ASSERT_GT(*nranges, 0); + VFIO_ASSERT_LT(ranges[0].start, ranges[0].last); + + for (size_t i = 1; i < *nranges; i++) { + VFIO_ASSERT_LT(ranges[i].start, ranges[i].last); + VFIO_ASSERT_LT(ranges[i - 1].last, ranges[i].start); + } + + return ranges; +} + iova_t __to_iova(struct vfio_pci_device *device, void *vaddr) { struct vfio_dma_region *region;
On Mon, 10 Nov 2025 13:10:41 -0800
Alex Mastro <amastro@fb.com> wrote:

[...]

> +	ranges = malloc(cap_range->nr_iovas * sizeof(*ranges));

Natural calloc() use case.
[...]

> +	ranges = malloc(query.num_iovas * sizeof(*ranges));

Same.
[...]

> +	/* ranges should be valid, ascending, and non-overlapping */
I don't recall that ranges are required to be in any particular order. Thanks,
Alex
On Mon, Nov 10, 2025 at 02:31:53PM -0700, Alex Williamson wrote:
> On Mon, 10 Nov 2025 13:10:41 -0800 Alex Mastro <amastro@fb.com> wrote:
>
> [...]
>
> > +	ranges = malloc(cap_range->nr_iovas * sizeof(*ranges));
>
> Natural calloc() use case.

Ack.
> [...]
>
> > +	ranges = malloc(query.num_iovas * sizeof(*ranges));
>
> Same.

Ack.
> [...]
>
> > +	/* ranges should be valid, ascending, and non-overlapping */
>
> I don't recall that ranges are required to be in any particular order.
Yes, this is assuming more than the UAPI guarantees. I'll update this to sort what the kernel vends so that we can preserve the sanity checks.
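Something like the following inside vfio_pci_iova_ranges() would do it (a minimal sketch, assuming <stdlib.h>'s qsort(); the comparator name is illustrative and not part of the posted patch):

        static int iova_range_cmp(const void *a, const void *b)
        {
                const struct iommu_iova_range *ra = a, *rb = b;

                if (ra->start < rb->start)
                        return -1;
                return ra->start > rb->start ? 1 : 0;
        }

                /* Sort what the kernel vends so the ascending/non-overlap checks hold. */
                qsort(ranges, *nranges, sizeof(*ranges), iova_range_cmp);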
On 2025-11-10 01:10 PM, Alex Mastro wrote:

[...]

> +	struct vfio_iommu_type1_info *buf;

nit: Maybe name this variable `info` here and in vfio_iommu_info_buf()
and vfio_iommu_info_cap_hdr()? It is not an opaque buffer.
> +	buf = vfio_iommu_info_buf(device);

nit: How about naming this vfio_iommu_get_info() since it actually
fetches the info from VFIO? (It doesn't just allocate a buffer.)
> +	VFIO_ASSERT_NOT_NULL(buf);

This assert is unnecessary.
> +	hdr = vfio_iommu_info_cap_hdr(buf, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
> +	if (!hdr)
> +		goto free_buf;

Is this to account for running on old versions of VFIO? Or are there
some scenarios when VFIO can't report the list of IOVA ranges?
[...]

> +struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device,
> +					      size_t *nranges)

nit: Both iommufd and VFIO represent the number of IOVA ranges as a u32.
Perhaps we should do the same in VFIO selftests?
On Mon, Nov 10, 2025 at 10:03:54PM +0000, David Matlack wrote:
> On 2025-11-10 01:10 PM, Alex Mastro wrote:
>
> > +	hdr = vfio_iommu_info_cap_hdr(buf, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
> > +	if (!hdr)
> > +		goto free_buf;
>
> Is this to account for running on old versions of VFIO? Or are there
> some scenarios when VFIO can't report the list of IOVA ranges?
I wanted to avoid being overly assertive in this low-level helper function, mostly out of ignorance about where/in which system states this capability may not be reported.
> [...]
>
> nit: Both iommufd and VFIO represent the number of IOVA ranges as a u32.
> Perhaps we should do the same in VFIO selftests?
Thanks David. All suggestions SGTM -- will roll into v2.
On 2025-11-10 02:32 PM, Alex Mastro wrote:
> On Mon, Nov 10, 2025 at 10:03:54PM +0000, David Matlack wrote:
> > On 2025-11-10 01:10 PM, Alex Mastro wrote:
> > > +	hdr = vfio_iommu_info_cap_hdr(buf, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
> > > +	if (!hdr)
> > > +		goto free_buf;
> >
> > Is this to account for running on old versions of VFIO? Or are there
> > some scenarios when VFIO can't report the list of IOVA ranges?
>
> I wanted to avoid being overly assertive in this low-level helper
> function, mostly out of ignorance about where/in which system states
> this capability may not be reported.
Makes sense, but IIUC a failure here will eventually turn into an assertion failure in all callers that exist today. So there's currently no reason to plumb it up the stack.
For situations like this, I think we should err on asserting at the lower level helpers, and only propagating errors up as needed. That keeps all the happy-path callers simple, and those should be the majority of callers (if not all callers).
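In code terms, a minimal sketch of that direction (illustrative only, and using the `info` rename suggested earlier) would be to assert inside vfio_iommu_iova_ranges() rather than returning NULL:

        hdr = vfio_iommu_info_cap_hdr(info, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
        VFIO_ASSERT_NOT_NULL(hdr);

        cap_range = container_of(hdr, struct vfio_iommu_type1_info_cap_iova_range, header);
        VFIO_ASSERT_GT(cap_range->nr_iovas, 0);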
On Mon, Nov 10, 2025 at 11:02:32PM +0000, David Matlack wrote:
> On 2025-11-10 02:32 PM, Alex Mastro wrote:
>
> [...]
>
> Makes sense, but IIUC a failure here will eventually turn into an
> assertion failure in all callers that exist today. So there's currently
> no reason to plumb it up the stack.
Yes, the first part is true.
> For situations like this, I think we should err on asserting at the
> lower level helpers, and only propagating errors up as needed. That
> keeps all the happy-path callers simple, and those should be the
> majority of callers (if not all callers).
SGTM -- I will do this.
Use the newly available vfio_pci_iova_ranges() to determine the last legal IOVA, and use it as the basis for the vfio_dma_map_limit_test tests.
Fixes: de8d1f2fd5a5 ("vfio: selftests: add end of address space DMA map/unmap tests")
Signed-off-by: Alex Mastro <amastro@fb.com>
---
 tools/testing/selftests/vfio/vfio_dma_mapping_test.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c index 4f1ea79a200c..37c2a342df8d 100644 --- a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c +++ b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c @@ -3,6 +3,8 @@ #include <sys/mman.h> #include <unistd.h>
+#include <uapi/linux/types.h> +#include <linux/iommufd.h> #include <linux/limits.h> #include <linux/mman.h> #include <linux/sizes.h> @@ -219,7 +221,10 @@ FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(); FIXTURE_SETUP(vfio_dma_map_limit_test) { struct vfio_dma_region *region = &self->region; + struct iommu_iova_range *ranges; u64 region_size = getpagesize(); + iova_t last_iova; + size_t nranges;
/* * Over-allocate mmap by double the size to provide enough backing vaddr @@ -232,8 +237,13 @@ FIXTURE_SETUP(vfio_dma_map_limit_test) MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); ASSERT_NE(region->vaddr, MAP_FAILED);
- /* One page prior to the end of address space */ - region->iova = ~(iova_t)0 & ~(region_size - 1); + ranges = vfio_pci_iova_ranges(self->device, &nranges); + VFIO_ASSERT_NOT_NULL(ranges); + last_iova = ranges[nranges - 1].last; + free(ranges); + + /* One page prior to the last iova */ + region->iova = last_iova & ~(region_size - 1); region->size = region_size; }
@@ -276,6 +286,9 @@ TEST_F(vfio_dma_map_limit_test, overflow) struct vfio_dma_region *region = &self->region; int rc;
+ if (region->iova != (~(iova_t)0 & ~(region->size - 1))) + SKIP(return, "IOMMU address space insufficient for overflow test"); + region->size = self->mmap_size;
rc = __vfio_pci_dma_map(self->device, region);
On Mon, 10 Nov 2025 13:10:42 -0800
Alex Mastro <amastro@fb.com> wrote:

[...]

> +	ranges = vfio_pci_iova_ranges(self->device, &nranges);
> +	VFIO_ASSERT_NOT_NULL(ranges);
> +	last_iova = ranges[nranges - 1].last;

Building on the imposed requirement of ordered ranges.  Thanks,
Alex
On Mon, Nov 10, 2025 at 02:31:52PM -0700, Alex Williamson wrote:
> On Mon, 10 Nov 2025 13:10:42 -0800 Alex Mastro <amastro@fb.com> wrote:
>
> [...]
>
> > +	ranges = vfio_pci_iova_ranges(self->device, &nranges);
> > +	VFIO_ASSERT_NOT_NULL(ranges);
> > +	last_iova = ranges[nranges - 1].last;
>
> Building on the imposed requirement of ordered ranges.  Thanks,
Agree. Will keep this code as-is given my plan to explicitly normalize the ranges to be ordered in the helper.
On Mon, Nov 10, 2025 at 1:11 PM Alex Mastro <amastro@fb.com> wrote:
>
> +	if (region->iova != (~(iova_t)0 & ~(region->size - 1)))
> +		SKIP(return, "IOMMU address space insufficient for overflow test");
If, instead, this was:
region->iova = ~(iova_t)0 & ~(region->size - 1);
then I think this test could be run on all platforms. The kernel checks for overflow before it checks for valid iova ranges.
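A sketch of what the test could look like with that change (illustrative only; the posted test's specific errno expectation is elided here and would stay as-is):

        TEST_F(vfio_dma_map_limit_test, overflow)
        {
                struct vfio_dma_region *region = &self->region;
                int rc;

                /* Pin the IOVA to the top of the 64-bit space so the map must wrap. */
                region->iova = ~(iova_t)0 & ~(region->size - 1);
                region->size = self->mmap_size;

                rc = __vfio_pci_dma_map(self->device, region);
                /* The existing test asserts a specific overflow errno here. */
                ASSERT_NE(rc, 0);
        }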
Add struct iova_allocator, which gives tests a convenient way to generate legally-accessible IOVAs to map.
This is based on Alex Williamson's patch series for adding an IOVA allocator [1].
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/
Signed-off-by: Alex Mastro <amastro@fb.com>
---
 .../testing/selftests/vfio/lib/include/vfio_util.h | 14 +++++
 tools/testing/selftests/vfio/lib/vfio_pci_device.c | 65 +++++++++++++++++++++-
 2 files changed, 78 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vfio/lib/include/vfio_util.h b/tools/testing/selftests/vfio/lib/include/vfio_util.h index fb5efec52316..bb1e7d39dfb9 100644 --- a/tools/testing/selftests/vfio/lib/include/vfio_util.h +++ b/tools/testing/selftests/vfio/lib/include/vfio_util.h @@ -13,6 +13,8 @@
#include "../../../kselftest.h"
+#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1))) + #define VFIO_LOG_AND_EXIT(...) do { \ fprintf(stderr, " " __VA_ARGS__); \ fprintf(stderr, "\n"); \ @@ -188,6 +190,13 @@ struct vfio_pci_device { struct vfio_pci_driver driver; };
+struct iova_allocator { + struct iommu_iova_range *ranges; + size_t nranges; + size_t range_idx; + iova_t iova_next; +}; + /* * Return the BDF string of the device that the test should use. * @@ -212,6 +221,11 @@ void vfio_pci_device_reset(struct vfio_pci_device *device); struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device, size_t *nranges);
+int iova_allocator_init(struct vfio_pci_device *device, + struct iova_allocator *allocator); +void iova_allocator_deinit(struct iova_allocator *allocator); +iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size); + int __vfio_pci_dma_map(struct vfio_pci_device *device, struct vfio_dma_region *region); int __vfio_pci_dma_unmap(struct vfio_pci_device *device, diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c index 6bedbe65f0a1..a634feb1d378 100644 --- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c +++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c @@ -12,11 +12,12 @@ #include <sys/mman.h>
#include <uapi/linux/types.h> +#include <linux/iommufd.h> #include <linux/limits.h> #include <linux/mman.h> +#include <linux/overflow.h> #include <linux/types.h> #include <linux/vfio.h> -#include <linux/iommufd.h>
#include "../../../kselftest.h" #include <vfio_util.h> @@ -190,6 +191,68 @@ struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device, return ranges; }
+int iova_allocator_init(struct vfio_pci_device *device, + struct iova_allocator *allocator) +{ + struct iommu_iova_range *ranges; + size_t nranges; + + memset(allocator, 0, sizeof(*allocator)); + + ranges = vfio_pci_iova_ranges(device, &nranges); + if (!ranges) + return -ENOENT; + + *allocator = (struct iova_allocator){ + .ranges = ranges, + .nranges = nranges, + .range_idx = 0, + .iova_next = 0, + }; + + return 0; +} + +void iova_allocator_deinit(struct iova_allocator *allocator) +{ + free(allocator->ranges); +} + +iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size) +{ + int idx = allocator->range_idx; + struct iommu_iova_range *range = &allocator->ranges[idx]; + + VFIO_ASSERT_LT(idx, allocator->nranges, "IOVA allocator out of space\n"); + VFIO_ASSERT_GT(size, 0, "Invalid size arg, zero\n"); + VFIO_ASSERT_EQ(size & (size - 1), 0, "Invalid size arg, non-power-of-2\n"); + + for (;;) { + iova_t iova, last; + + iova = ALIGN(allocator->iova_next, size); + + if (iova < allocator->iova_next || iova > range->last || + check_add_overflow(iova, size - 1, &last) || + last > range->last) { + allocator->range_idx = ++idx; + VFIO_ASSERT_LT(idx, allocator->nranges, + "Out of ranges for allocation\n"); + allocator->iova_next = (++range)->start; + continue; + } + + if (check_add_overflow(last, (iova_t)1, &allocator->iova_next) || + allocator->iova_next > range->last) { + allocator->range_idx = ++idx; + if (idx < allocator->nranges) + allocator->iova_next = (++range)->start; + } + + return iova; + } +} + iova_t __to_iova(struct vfio_pci_device *device, void *vaddr) { struct vfio_dma_region *region;
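For reference, a minimal usage sketch of this interface from a test (illustrative only; `vaddr` is assumed to be an existing mmap()ed buffer, and the library's asserting vfio_pci_dma_map()/vfio_pci_dma_unmap() wrappers are used):

        struct iova_allocator allocator;
        struct vfio_dma_region region = {};

        VFIO_ASSERT_EQ(iova_allocator_init(device, &allocator), 0);

        region.vaddr = vaddr;
        region.size = SZ_2M;
        region.iova = iova_allocator_alloc(&allocator, region.size);

        vfio_pci_dma_map(device, &region);
        /* ... exercise the mapping ... */
        vfio_pci_dma_unmap(device, &region);

        iova_allocator_deinit(&allocator);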
On Mon, 10 Nov 2025 13:10:43 -0800
Alex Mastro <amastro@fb.com> wrote:

[...]

> +	*allocator = (struct iova_allocator){
> +		.ranges = ranges,
> +		.nranges = nranges,
> +		.range_idx = 0,
> +		.iova_next = 0,

iova_next needs to be initialized from ranges[0].start.  Thanks,
Alex
On Mon, Nov 10, 2025 at 02:31:54PM -0700, Alex Williamson wrote:
> On Mon, 10 Nov 2025 13:10:43 -0800 Alex Mastro <amastro@fb.com> wrote:
>
> [...]
>
> > +	*allocator = (struct iova_allocator){
> > +		.ranges = ranges,
> > +		.nranges = nranges,
> > +		.range_idx = 0,
> > +		.iova_next = 0,
>
> iova_next needs to be initialized from ranges[0].start.  Thanks,
True. Thanks for catching this.
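For v2 the initializer would presumably become something like (sketch):

        *allocator = (struct iova_allocator){
                .ranges = ranges,
                .nranges = nranges,
                .range_idx = 0,
                .iova_next = ranges[0].start,
        };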
On 2025-11-10 01:10 PM, Alex Mastro wrote:

[...]

> +#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))

Please name this ALIGN_UP() so that it is clear it aligns x up and not down.
[...]

> +void iova_allocator_deinit(struct iova_allocator *allocator)
> +{
> +	free(allocator->ranges);
> +}
I think it would be good to be consistent about how the library hands out and initializes objects. e.g. For devices we have:
  device = vfio_pci_device_init(...);
  vfio_pci_device_cleanup(device);
So for allocator it would be:
  allocator = iova_allocator_init();
  iova_allocator_cleanup(allocator);
It's a small thing, but this way users of the library can always work with pointers allocated by the library, there is a consistent meaning of *_init() functions, and one doesn't have to distinguish between *_deinit() and *_cleanup().
Forcing dynamic memory allocation is less efficient, but I think simplicity and consistency matter more when it comes to tests.
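A minimal sketch of that shape, reusing the helpers from this series (names per the suggestion above; not the posted code, and iova_next initialized per the earlier review comment):

        struct iova_allocator *iova_allocator_init(struct vfio_pci_device *device)
        {
                struct iova_allocator *allocator;
                struct iommu_iova_range *ranges;
                size_t nranges;

                ranges = vfio_pci_iova_ranges(device, &nranges);
                VFIO_ASSERT_NOT_NULL(ranges);

                allocator = calloc(1, sizeof(*allocator));
                VFIO_ASSERT_NOT_NULL(allocator);

                allocator->ranges = ranges;
                allocator->nranges = nranges;
                allocator->range_idx = 0;
                allocator->iova_next = ranges[0].start;

                return allocator;
        }

        void iova_allocator_cleanup(struct iova_allocator *allocator)
        {
                free(allocator->ranges);
                free(allocator);
        }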
> +iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size)
> +{
> [...]
> +	VFIO_ASSERT_EQ(size & (size - 1), 0, "Invalid size arg, non-power-of-2\n");

ALIGN() is what requires size to be a power of 2, so the assert should
probably go inside that macro.
> [...]
I found this loop a bit hard to read. The if statements have 3-4 statements, and idx and range are managed deep in the loop. What about something like this? It also avoids the need to check for overflow (unless I missed something :).
diff --git a/tools/testing/selftests/vfio/lib/include/vfio_util.h b/tools/testing/selftests/vfio/lib/include/vfio_util.h
index bb1e7d39dfb9..63fce0ffe287 100644
--- a/tools/testing/selftests/vfio/lib/include/vfio_util.h
+++ b/tools/testing/selftests/vfio/lib/include/vfio_util.h
@@ -193,8 +193,10 @@ struct vfio_pci_device {
 struct iova_allocator {
 	struct iommu_iova_range *ranges;
 	size_t nranges;
+
+	/* The next range, and offset within it, from which to allocate. */
 	size_t range_idx;
-	iova_t iova_next;
+	iova_t range_offset;
 };
 
 /*
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index a634feb1d378..5b85005c4544 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -207,7 +207,7 @@ int iova_allocator_init(struct vfio_pci_device *device,
 		.ranges = ranges,
 		.nranges = nranges,
 		.range_idx = 0,
-		.iova_next = 0,
+		.range_offset = 0,
 	};
 
 	return 0;
@@ -220,37 +220,41 @@ void iova_allocator_deinit(struct iova_allocator *allocator)
 
 iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size)
 {
-	int idx = allocator->range_idx;
-	struct iommu_iova_range *range = &allocator->ranges[idx];
+	int idx;
 
-	VFIO_ASSERT_LT(idx, allocator->nranges, "IOVA allocator out of space\n");
 	VFIO_ASSERT_GT(size, 0, "Invalid size arg, zero\n");
 	VFIO_ASSERT_EQ(size & (size - 1), 0, "Invalid size arg, non-power-of-2\n");
 
-	for (;;) {
+	for (idx = allocator->range_idx; idx < allocator->nranges; idx++) {
+		struct iommu_iova_range *range = &allocator->ranges[idx];
 		iova_t iova, last;
 
-		iova = ALIGN(allocator->iova_next, size);
+		if (idx == allocator->range_idx)
+			iova = ALIGN(range->start + allocator->range_offset, size);
+		else
+			iova = ALIGN(range->start, size);
 
-		if (iova < allocator->iova_next || iova > range->last ||
-		    check_add_overflow(iova, size - 1, &last) ||
-		    last > range->last) {
-			allocator->range_idx = ++idx;
-			VFIO_ASSERT_LT(idx, allocator->nranges,
-				       "Out of ranges for allocation\n");
-			allocator->iova_next = (++range)->start;
+		if (range->last - iova + 1 < size)
 			continue;
-		}
 
-		if (check_add_overflow(last, (iova_t)1, &allocator->iova_next) ||
-		    allocator->iova_next > range->last) {
-			allocator->range_idx = ++idx;
-			if (idx < allocator->nranges)
-				allocator->iova_next = (++range)->start;
+		/*
+		 * Found a range to hold the allocation. Update the allocator
+		 * for the next allocation.
+		 */
+		last = iova + (size - 1);
+
+		if (last < range->last) {
+			allocator->range_idx = idx;
+			allocator->range_offset = last - range->start + 1;
+		} else {
+			allocator->range_idx = idx + 1;
+			allocator->range_offset = 0;
 		}
 		return iova;
 	}
+
+	VFIO_FAIL("Failed to find iova range of size 0x%lx\n", size);
 }
iova_t __to_iova(struct vfio_pci_device *device, void *vaddr)
+}
 iova_t __to_iova(struct vfio_pci_device *device, void *vaddr)
 {
 	struct vfio_dma_region *region;
-- 2.47.3
On Mon, Nov 10, 2025 at 10:54:04PM +0000, David Matlack wrote:
On 2025-11-10 01:10 PM, Alex Mastro wrote:
Add struct iova_allocator, which gives tests a convenient way to generate legally-accessible IOVAs to map.
This is based on Alex Williamson's patch series for adding an IOVA allocator [1].
[1] https://lore.kernel.org/all/20251108212954.26477-1-alex@shazbot.org/
Signed-off-by: Alex Mastro amastro@fb.com
 .../testing/selftests/vfio/lib/include/vfio_util.h | 14 +++++
 tools/testing/selftests/vfio/lib/vfio_pci_device.c | 65 +++++++++++++++++++++-
 2 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vfio/lib/include/vfio_util.h b/tools/testing/selftests/vfio/lib/include/vfio_util.h
index fb5efec52316..bb1e7d39dfb9 100644
--- a/tools/testing/selftests/vfio/lib/include/vfio_util.h
+++ b/tools/testing/selftests/vfio/lib/include/vfio_util.h
@@ -13,6 +13,8 @@
 
 #include "../../../kselftest.h"
 
+#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
Please name this ALIGN_UP() so that it is clear it aligns x up and not down.
Ack.
 #define VFIO_LOG_AND_EXIT(...) do {		\
 	fprintf(stderr, " " __VA_ARGS__);	\
 	fprintf(stderr, "\n");			\
@@ -188,6 +190,13 @@ struct vfio_pci_device {
 	struct vfio_pci_driver driver;
 };
 
+struct iova_allocator {
+	struct iommu_iova_range *ranges;
+	size_t nranges;
+	size_t range_idx;
+	iova_t iova_next;
+};
 /*
  * Return the BDF string of the device that the test should use.
@@ -212,6 +221,11 @@ void vfio_pci_device_reset(struct vfio_pci_device *device);
 struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device,
 					      size_t *nranges);
 
+int iova_allocator_init(struct vfio_pci_device *device,
+			struct iova_allocator *allocator);
+void iova_allocator_deinit(struct iova_allocator *allocator);
+iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size);
+
 int __vfio_pci_dma_map(struct vfio_pci_device *device,
 		       struct vfio_dma_region *region);
 int __vfio_pci_dma_unmap(struct vfio_pci_device *device,
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index 6bedbe65f0a1..a634feb1d378 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -12,11 +12,12 @@
 #include <sys/mman.h>
 
 #include <uapi/linux/types.h>
+#include <linux/iommufd.h>
 #include <linux/limits.h>
 #include <linux/mman.h>
+#include <linux/overflow.h>
 #include <linux/types.h>
 #include <linux/vfio.h>
-#include <linux/iommufd.h>
 
 #include "../../../kselftest.h"
 
 #include <vfio_util.h>
@@ -190,6 +191,68 @@ struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device,
 	return ranges;
 }
 
+int iova_allocator_init(struct vfio_pci_device *device,
+			struct iova_allocator *allocator)
+{
+	struct iommu_iova_range *ranges;
+	size_t nranges;
+
+	memset(allocator, 0, sizeof(*allocator));
+
+	ranges = vfio_pci_iova_ranges(device, &nranges);
+	if (!ranges)
+		return -ENOENT;
+
+	*allocator = (struct iova_allocator){
+		.ranges = ranges,
+		.nranges = nranges,
+		.range_idx = 0,
+		.iova_next = 0,
+	};
+
+	return 0;
+}
+
+void iova_allocator_deinit(struct iova_allocator *allocator)
+{
+	free(allocator->ranges);
+}
I think it would be good to be consistent about how the library hands out and initializes objects. e.g. For devices we have:
  device = vfio_pci_device_init(...);
  vfio_pci_device_cleanup(device);
So for allocator it would be:
  allocator = iova_allocator_init();
  iova_allocator_cleanup(allocator);
It's a small thing, but this way users of the library can always work with pointers allocated by the library, there is a consistent meaning of *_init() functions, and one doesn't have to distinguish between *_deinit() and *_cleanup().
Forcing dynamic memory allocation is less efficient, but I think simplicity and consistency matter more when it comes to tests.
SGTM -- will change it to match what you suggested, for consistency.
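A minimal sketch of what the pointer-returning interface could look like after that change (the exact signatures, error handling, and the malloc-based allocation are assumptions for illustration, not the final code; it reuses the struct fields and helpers quoted above and assumes <stdlib.h>):

struct iova_allocator *iova_allocator_init(struct vfio_pci_device *device)
{
	struct iova_allocator *allocator;
	struct iommu_iova_range *ranges;
	size_t nranges;

	/* Query the IOVA ranges usable with this device's IOMMU. */
	ranges = vfio_pci_iova_ranges(device, &nranges);
	if (!ranges)
		VFIO_FAIL("No IOVA ranges available\n");

	allocator = malloc(sizeof(*allocator));
	if (!allocator)
		VFIO_FAIL("Failed to allocate iova_allocator\n");

	*allocator = (struct iova_allocator){
		.ranges = ranges,
		.nranges = nranges,
		.range_idx = 0,
		.iova_next = 0,
	};

	return allocator;
}

void iova_allocator_cleanup(struct iova_allocator *allocator)
{
	free(allocator->ranges);
	free(allocator);
}

Callers would then pair the two the same way the device helpers are paired: allocator = iova_allocator_init(device); ... iova_allocator_cleanup(allocator);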
+iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size)
+{
+	int idx = allocator->range_idx;
+	struct iommu_iova_range *range = &allocator->ranges[idx];
+
+	VFIO_ASSERT_LT(idx, allocator->nranges, "IOVA allocator out of space\n");
+	VFIO_ASSERT_GT(size, 0, "Invalid size arg, zero\n");
+	VFIO_ASSERT_EQ(size & (size - 1), 0, "Invalid size arg, non-power-of-2\n");
ALIGN() is what requires size to be a power of 2, so the assert should probably go inside that macro.
SGTM
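Putting the two suggestions together (renaming to ALIGN_UP() and moving the power-of-2 check into the macro), one possible shape is the sketch below; the GNU statement-expression form is an assumption, not the final code:

/* Align x up to a, asserting that a is a power of two. */
#define ALIGN_UP(x, a) ({						\
	typeof(a) __a = (a);						\
	VFIO_ASSERT_EQ(__a & (__a - 1), 0,				\
		       "Invalid size arg, non-power-of-2\n");		\
	((x) + (__a - 1)) & ~(__a - 1);					\
})

With the check inside the macro, iova_allocator_alloc() could then drop its explicit non-power-of-2 assert.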
+	for (;;) {
+		iova_t iova, last;
+
+		iova = ALIGN(allocator->iova_next, size);
+
+		if (iova < allocator->iova_next || iova > range->last ||
+		    check_add_overflow(iova, size - 1, &last) ||
+		    last > range->last) {
+			allocator->range_idx = ++idx;
+			VFIO_ASSERT_LT(idx, allocator->nranges,
+				       "Out of ranges for allocation\n");
+			allocator->iova_next = (++range)->start;
+			continue;
+		}
+
+		if (check_add_overflow(last, (iova_t)1, &allocator->iova_next) ||
+		    allocator->iova_next > range->last) {
+			allocator->range_idx = ++idx;
+			if (idx < allocator->nranges)
+				allocator->iova_next = (++range)->start;
+		}
+
+		return iova;
+	}
I found this loop a bit hard to read. The if statements have 3-4 conditions each, and idx and range are managed deep in the loop. What about something like this? It also avoids the need to check for overflow (unless I missed something :).
I'll take a closer look at your suggestions. Agree that it's terse. I shamelessly lifted this verbatim from AlexW's patch, and it seemed ok from my first pass squinting.
diff --git a/tools/testing/selftests/vfio/lib/include/vfio_util.h b/tools/testing/selftests/vfio/lib/include/vfio_util.h
index bb1e7d39dfb9..63fce0ffe287 100644
--- a/tools/testing/selftests/vfio/lib/include/vfio_util.h
+++ b/tools/testing/selftests/vfio/lib/include/vfio_util.h
@@ -193,8 +193,10 @@ struct vfio_pci_device {
 struct iova_allocator {
 	struct iommu_iova_range *ranges;
 	size_t nranges;
+
+	/* The next range, and offset within it, from which to allocate. */
 	size_t range_idx;
-	iova_t iova_next;
+	iova_t range_offset;
 };
 
 /*
diff --git a/tools/testing/selftests/vfio/lib/vfio_pci_device.c b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
index a634feb1d378..5b85005c4544 100644
--- a/tools/testing/selftests/vfio/lib/vfio_pci_device.c
+++ b/tools/testing/selftests/vfio/lib/vfio_pci_device.c
@@ -207,7 +207,7 @@ int iova_allocator_init(struct vfio_pci_device *device,
 		.ranges = ranges,
 		.nranges = nranges,
 		.range_idx = 0,
-		.iova_next = 0,
+		.range_offset = 0,
 	};
return 0;
@@ -220,37 +220,41 @@ void iova_allocator_deinit(struct iova_allocator *allocator)
 iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size)
 {
-	int idx = allocator->range_idx;
-	struct iommu_iova_range *range = &allocator->ranges[idx];
+	int idx;
 
-	VFIO_ASSERT_LT(idx, allocator->nranges, "IOVA allocator out of space\n");
 	VFIO_ASSERT_GT(size, 0, "Invalid size arg, zero\n");
 	VFIO_ASSERT_EQ(size & (size - 1), 0, "Invalid size arg, non-power-of-2\n");
 
-	for (;;) {
+	for (idx = allocator->range_idx; idx < allocator->nranges; idx++) {
+		struct iommu_iova_range *range = &allocator->ranges[idx];
 		iova_t iova, last;
 
-		iova = ALIGN(allocator->iova_next, size);
+		if (idx == allocator->range_idx)
+			iova = ALIGN(range->start + allocator->range_offset, size);
+		else
+			iova = ALIGN(range->start, size);
 
-		if (iova < allocator->iova_next || iova > range->last ||
-		    check_add_overflow(iova, size - 1, &last) ||
-		    last > range->last) {
-			allocator->range_idx = ++idx;
-			VFIO_ASSERT_LT(idx, allocator->nranges,
-				       "Out of ranges for allocation\n");
-			allocator->iova_next = (++range)->start;
+		if (range->last - iova + 1 < size)
 			continue;
-		}
 
-		if (check_add_overflow(last, (iova_t)1, &allocator->iova_next) ||
-		    allocator->iova_next > range->last) {
-			allocator->range_idx = ++idx;
-			if (idx < allocator->nranges)
-				allocator->iova_next = (++range)->start;
+		/*
+		 * Found a range to hold the allocation. Update the allocator
+		 * for the next allocation.
+		 */
+		last = iova + (size - 1);
+
+		if (last < range->last) {
+			allocator->range_idx = idx;
+			allocator->range_offset = last - range->start + 1;
+		} else {
+			allocator->range_idx = idx + 1;
+			allocator->range_offset = 0;
 		}
 
 		return iova;
 	}
+
+	VFIO_FAIL("Failed to find iova range of size 0x%lx\n", size);
}
iova_t __to_iova(struct vfio_pci_device *device, void *vaddr)
+}
 iova_t __to_iova(struct vfio_pci_device *device, void *vaddr)
 {
 	struct vfio_dma_region *region;
-- 2.47.3
vfio_dma_mapping_test currently uses iova=vaddr as part of DMA mapping validation. The assumption that these IOVAs are legal has held up on all the hardware we've tested so far, but is not guaranteed. Make the test more robust by using iova_allocator to vend IOVAs, which queries legally accessible IOVAs from the underlying IOMMUFD or VFIO container.
Signed-off-by: Alex Mastro amastro@fb.com
---
 tools/testing/selftests/vfio/vfio_dma_mapping_test.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
index 37c2a342df8d..c1a015385b0f 100644
--- a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
+++ b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
@@ -95,6 +95,7 @@ static int iommu_mapping_get(const char *bdf, u64 iova,
 
 FIXTURE(vfio_dma_mapping_test) {
 	struct vfio_pci_device *device;
+	struct iova_allocator iova_allocator;
 };
 
 FIXTURE_VARIANT(vfio_dma_mapping_test) {
@@ -118,11 +119,16 @@ FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(anonymous_hugetlb_1gb, SZ_1G, MAP_HUGETLB |
 
 FIXTURE_SETUP(vfio_dma_mapping_test)
 {
+	int ret;
+
 	self->device = vfio_pci_device_init(device_bdf, variant->iommu_mode);
+
+	ret = iova_allocator_init(self->device, &self->iova_allocator);
+	VFIO_ASSERT_EQ(ret, 0);
 }
 
 FIXTURE_TEARDOWN(vfio_dma_mapping_test)
 {
+	iova_allocator_deinit(&self->iova_allocator);
 	vfio_pci_device_cleanup(self->device);
 }
 
@@ -144,7 +150,7 @@ TEST_F(vfio_dma_mapping_test, dma_map_unmap)
 	else
 		ASSERT_NE(region.vaddr, MAP_FAILED);
 
-	region.iova = (u64)region.vaddr;
+	region.iova = iova_allocator_alloc(&self->iova_allocator, size);
 	region.size = size;
 
 	vfio_pci_dma_map(self->device, &region);
On Mon, 10 Nov 2025 13:10:44 -0800 Alex Mastro amastro@fb.com wrote:
vfio_dma_mapping_test currently uses iova=vaddr as part of DMA mapping validation. The assumption that these IOVAs are legal has held up on all the hardware we've tested so far, but is not guaranteed. Make the test more robust by using iova_allocator to vend IOVAs, which queries legally accessible IOVAs from the underlying IOMMUFD or VFIO container.
Signed-off-by: Alex Mastro amastro@fb.com
 tools/testing/selftests/vfio/vfio_dma_mapping_test.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
index 37c2a342df8d..c1a015385b0f 100644
--- a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
+++ b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
@@ -95,6 +95,7 @@ static int iommu_mapping_get(const char *bdf, u64 iova,
 
 FIXTURE(vfio_dma_mapping_test) {
 	struct vfio_pci_device *device;
+	struct iova_allocator iova_allocator;
 };
 
 FIXTURE_VARIANT(vfio_dma_mapping_test) {
@@ -118,11 +119,16 @@ FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(anonymous_hugetlb_1gb, SZ_1G, MAP_HUGETLB |
 
 FIXTURE_SETUP(vfio_dma_mapping_test)
 {
+	int ret;
+
 	self->device = vfio_pci_device_init(device_bdf, variant->iommu_mode);
+
+	ret = iova_allocator_init(self->device, &self->iova_allocator);
+	VFIO_ASSERT_EQ(ret, 0);
 }
 
 FIXTURE_TEARDOWN(vfio_dma_mapping_test)
 {
+	iova_allocator_deinit(&self->iova_allocator);
 	vfio_pci_device_cleanup(self->device);
 }
 
@@ -144,7 +150,7 @@ TEST_F(vfio_dma_mapping_test, dma_map_unmap)
 	else
 		ASSERT_NE(region.vaddr, MAP_FAILED);
 
-	region.iova = (u64)region.vaddr;
+	region.iova = iova_allocator_alloc(&self->iova_allocator, size);
 	region.size = size;
 
 	vfio_pci_dma_map(self->device, &region);
There's another in the driver test. Thanks,
Alex
On Mon, Nov 10, 2025 at 02:31:56PM -0700, Alex Williamson wrote:
On Mon, 10 Nov 2025 13:10:44 -0800 Alex Mastro amastro@fb.com wrote:
vfio_dma_mapping_test currently uses iova=vaddr as part of DMA mapping validation. The assumption that these IOVAs are legal has held up on all the hardware we've tested so far, but is not guaranteed. Make the test more robust by using iova_allocator to vend IOVAs, which queries legally accessible IOVAs from the underlying IOMMUFD or VFIO container.
Signed-off-by: Alex Mastro amastro@fb.com
 tools/testing/selftests/vfio/vfio_dma_mapping_test.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
index 37c2a342df8d..c1a015385b0f 100644
--- a/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
+++ b/tools/testing/selftests/vfio/vfio_dma_mapping_test.c
@@ -95,6 +95,7 @@ static int iommu_mapping_get(const char *bdf, u64 iova,
 
 FIXTURE(vfio_dma_mapping_test) {
 	struct vfio_pci_device *device;
+	struct iova_allocator iova_allocator;
 };
 
 FIXTURE_VARIANT(vfio_dma_mapping_test) {
@@ -118,11 +119,16 @@ FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(anonymous_hugetlb_1gb, SZ_1G, MAP_HUGETLB |
 
 FIXTURE_SETUP(vfio_dma_mapping_test)
 {
+	int ret;
+
 	self->device = vfio_pci_device_init(device_bdf, variant->iommu_mode);
+
+	ret = iova_allocator_init(self->device, &self->iova_allocator);
+	VFIO_ASSERT_EQ(ret, 0);
 }
 
 FIXTURE_TEARDOWN(vfio_dma_mapping_test)
 {
+	iova_allocator_deinit(&self->iova_allocator);
 	vfio_pci_device_cleanup(self->device);
 }
 
@@ -144,7 +150,7 @@ TEST_F(vfio_dma_mapping_test, dma_map_unmap)
 	else
 		ASSERT_NE(region.vaddr, MAP_FAILED);
 
-	region.iova = (u64)region.vaddr;
+	region.iova = iova_allocator_alloc(&self->iova_allocator, size);
 	region.size = size;
 
 	vfio_pci_dma_map(self->device, &region);
There's another in the driver test. Thanks,
Oops -- thank you. Will add.
Alex
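For reference, the follow-up in the driver test would presumably be the same one-line substitution shown above for vfio_dma_mapping_test.c; the surrounding context in vfio_pci_driver_test.c is not quoted in this thread, so the hunk below (including the field and variable names) is illustrative only:

-	region.iova = (u64)region.vaddr;
+	region.iova = iova_allocator_alloc(&self->iova_allocator, size);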
On Mon, Nov 10, 2025 at 1:11 PM Alex Mastro amastro@fb.com wrote:
Not all IOMMUs support the same virtual address width as the processor, for instance older Intel consumer platforms only support 39-bits of IOMMU address space. On such platforms, using the virtual address as the IOVA and mappings at the top of the address space both fail.
VFIO and IOMMUFD have facilities for retrieving valid IOVA ranges, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE and IOMMU_IOAS_IOVA_RANGES, respectively. These provide compatible arrays of ranges from which we can construct a simple allocator and record the maximum supported IOVA address.
Use this new allocator in place of reusing the virtual address, and incorporate the maximum supported IOVA into the limit testing. This latter change doesn't test quite the same absolute end-of-address space behavior but still seems to have some value. Testing for overflow is skipped when a reduced address space is supported as the desired errno is not generated.
This series is based on Alex Williamson's "Incorporate IOVA range info" [1] along with feedback from the discussion in David Matlack's "Skip vfio_dma_map_limit_test if mapping returns -EINVAL" [2].
Given David's plans to split IOMMU concerns from devices as described in [3], this series' home for `struct iova_allocator` is likely to be short lived, since it resides in vfio_pci_device.c. I assume that the rework can move this functionality to a more appropriate location next to other IOMMU-focused code, once such a place exists.
Yup, I'll rebase my iommu rework on top of this once it goes in, and move the iova allocator to a new home.
And thanks for getting this out so quickly. We've had an unstaffed internal task to get rid of iova=vaddr open for a few months now, so I'm very happy to see it get fixed.