iommufd gives userspace the capabilty to manipulating iommu subsytem. e.g. DMA map/unmap etc. In the near future, it will also support iommu nested translation. Different platform vendors have different implementation for the nested translation. So before set up nested translation, userspace needs to know the hardware iommu capabilities. For example, Intel platform supports guest I/O page table to be the first stage translation structure.
This series reports the iommu capability for a given iommufd_device which has been bound to iommufd. It is a preparation work for nested translation support[1]. In this series, Intel VT-d capability reporting is added. Other vendors may add their own reporting based on this series.
[1] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting_vtd_v1
Regards, Yi Liu
Lu Baolu (2): iommu: Add new iommu op to get iommu hardware information iommu/vt-d: Implement hw_info for iommu capability query
Nicolin Chen (2): iommufd/selftest: Set iommu_device for mock_device iommufd/selftest: Add coverage for IOMMU_DEVICE_GET_INFO ioctl
Yi Liu (2): iommufd: Add IOMMU_DEVICE_GET_INFO iommufd/device: Add mock_device support in iommufd_device_get_info()
drivers/iommu/intel/iommu.c | 19 ++++ drivers/iommu/intel/iommu.h | 1 + drivers/iommu/iommufd/device.c | 91 +++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 2 + drivers/iommu/iommufd/iommufd_test.h | 15 +++ drivers/iommu/iommufd/main.c | 3 + drivers/iommu/iommufd/selftest.c | 26 ++++++ include/linux/iommu.h | 8 ++ include/uapi/linux/iommufd.h | 63 +++++++++++++ tools/testing/selftests/iommu/iommufd.c | 18 +++- tools/testing/selftests/iommu/iommufd_utils.h | 26 ++++++ 11 files changed, 271 insertions(+), 1 deletion(-)
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new iommu op get the IOMMU hardware capabilities for iommufd. This information will be used by any vIOMMU driver which is owned by userspace.
This op chooses to make the special parameters opaque to the core. This suits the current usage model where accessing any of the IOMMU device special parameters does require a userspace driver that matches the kernel driver. If a need for common parameters, implemented similarly by several drivers, arises then there is room in the design to grow a generic parameter set as well. No warpper API is added as it is supposed to be used by iommufd only.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com --- include/linux/iommu.h | 8 ++++++++ include/uapi/linux/iommufd.h | 6 ++++++ 2 files changed, 14 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index cb586d054c57..97b398d19fd2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -15,6 +15,7 @@ #include <linux/of.h> #include <linux/ioasid.h> #include <uapi/linux/iommu.h> +#include <uapi/linux/iommufd.h>
#define IOMMU_READ (1 << 0) #define IOMMU_WRITE (1 << 1) @@ -223,6 +224,11 @@ struct iommu_iotlb_gather { /** * struct iommu_ops - iommu ops and capabilities * @capable: check capability + * @hw_info: IOMMU hardware capabilities. The type of the returned data is + * defined in include/uapi/linux/iommufd.h. The data buffer is + * allocated in the IOMMU driver and the caller should free it + * after use. Return the data buffer if success, or ERR_PTR on + * failure. * @domain_alloc: allocate iommu domain * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling @@ -252,6 +258,7 @@ struct iommu_iotlb_gather { */ struct iommu_ops { bool (*capable)(struct device *dev, enum iommu_cap); + void *(*hw_info)(struct device *dev, u32 *length);
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); @@ -280,6 +287,7 @@ struct iommu_ops { void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid);
const struct iommu_domain_ops *default_domain_ops; + enum iommu_device_data_type driver_type; unsigned long pgsize_bitmap; struct module *owner; }; diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 98ebba80cfa1..2309edb55028 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -344,4 +344,10 @@ struct iommu_vfio_ioas { __u16 __reserved; }; #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS) + +/** + * enum iommu_device_data_type - IOMMU hardware Data types + */ +enum iommu_device_data_type { +}; #endif
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM @@ -223,6 +224,11 @@ struct iommu_iotlb_gather { /**
- struct iommu_ops - iommu ops and capabilities
- @capable: check capability
- @hw_info: IOMMU hardware capabilities. The type of the returned data
is
is it 'info' or 'capability'?
defined in include/uapi/linux/iommufd.h. The data buffer is
allocated in the IOMMU driver and the caller should free it
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
@@ -252,6 +258,7 @@ struct iommu_iotlb_gather { */ struct iommu_ops { bool (*capable)(struct device *dev, enum iommu_cap);
void *(*hw_info)(struct device *dev, u32 *length);
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned
iommu_domain_type); @@ -280,6 +287,7 @@ struct iommu_ops { void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid);
const struct iommu_domain_ops *default_domain_ops;
- enum iommu_device_data_type driver_type;
the enum is called 'device_data_type' while the field is called 'driver_type'.
btw this new field is not documented above.
On 2023/2/10 15:28, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM @@ -223,6 +224,11 @@ struct iommu_iotlb_gather { /**
- struct iommu_ops - iommu ops and capabilities
- @capable: check capability
- @hw_info: IOMMU hardware capabilities. The type of the returned data
is
is it 'info' or 'capability'?
defined in include/uapi/linux/iommufd.h. The data buffer is
allocated in the IOMMU driver and the caller should free it
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
@@ -252,6 +258,7 @@ struct iommu_iotlb_gather { */ struct iommu_ops { bool (*capable)(struct device *dev, enum iommu_cap);
void *(*hw_info)(struct device *dev, u32 *length);
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned
iommu_domain_type); @@ -280,6 +287,7 @@ struct iommu_ops { void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid);
const struct iommu_domain_ops *default_domain_ops;
- enum iommu_device_data_type driver_type;
the enum is called 'device_data_type' while the field is called 'driver_type'.
The enum is named from uAPI's p-o-v and driver_type is from IOMMU's.
btw this new field is not documented above.
Yes. Will do that.
Best regards, baolu
On 2023/2/10 15:28, Tian, Kevin wrote:
From: Liu, Yi Lyi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM @@ -223,6 +224,11 @@ struct iommu_iotlb_gather { /**
- struct iommu_ops - iommu ops and capabilities
- @capable: check capability
- @hw_info: IOMMU hardware capabilities. The type of the returned data
is
is it 'info' or 'capability'?
hw_info. IOMMU core does not care about specific content, so it is not necessary to define it as capability or anything else.
Perhaps we need to change the comments a bit, say, "IOMMU hardware information"?
Best regards, baolu
From: Baolu Lu baolu.lu@linux.intel.com Sent: Saturday, February 11, 2023 11:43 AM
On 2023/2/10 15:28, Tian, Kevin wrote:
From: Liu, Yi Lyi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM @@ -223,6 +224,11 @@ struct iommu_iotlb_gather { /**
- struct iommu_ops - iommu ops and capabilities
- @capable: check capability
- @hw_info: IOMMU hardware capabilities. The type of the returned
data
is
is it 'info' or 'capability'?
hw_info. IOMMU core does not care about specific content, so it is not necessary to define it as capability or anything else.
Perhaps we need to change the comments a bit, say, "IOMMU hardware information"?
yes, that is probably more consistent.
On 2/9/2023 12:16 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new iommu op get
get -> to get
the IOMMU hardware capabilities for iommufd. This information will be used by any vIOMMU driver which is owned by userspace.
This op chooses to make the special parameters opaque to the core. This suits the current usage model where accessing any of the IOMMU device special parameters does require a userspace driver that matches the kernel driver. If a need for common parameters, implemented similarly by several drivers, arises then there is room in the design to grow a generic parameter set as well. No warpper
warpper -> wrapper
API is added as it is supposed to be used by iommufd only.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
include/linux/iommu.h | 8 ++++++++ include/uapi/linux/iommufd.h | 6 ++++++ 2 files changed, 14 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index cb586d054c57..97b398d19fd2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -15,6 +15,7 @@ #include <linux/of.h> #include <linux/ioasid.h> #include <uapi/linux/iommu.h> +#include <uapi/linux/iommufd.h> #define IOMMU_READ (1 << 0) #define IOMMU_WRITE (1 << 1) @@ -223,6 +224,11 @@ struct iommu_iotlb_gather { /**
- struct iommu_ops - iommu ops and capabilities
- @capable: check capability
- @hw_info: IOMMU hardware capabilities. The type of the returned data is
defined in include/uapi/linux/iommufd.h. The data buffer is
allocated in the IOMMU driver and the caller should free it
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
@@ -252,6 +258,7 @@ struct iommu_iotlb_gather { */ struct iommu_ops { bool (*capable)(struct device *dev, enum iommu_cap);
- void *(*hw_info)(struct device *dev, u32 *length);
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); @@ -280,6 +287,7 @@ struct iommu_ops { void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid); const struct iommu_domain_ops *default_domain_ops;
- enum iommu_device_data_type driver_type;
How to understand the name "iommu_device_data_type"? Is it just refer to the driver types or it has a more generic meaning?
unsigned long pgsize_bitmap; struct module *owner; }; diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 98ebba80cfa1..2309edb55028 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -344,4 +344,10 @@ struct iommu_vfio_ioas { __u16 __reserved; }; #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
+/**
- enum iommu_device_data_type - IOMMU hardware Data types
- */
+enum iommu_device_data_type { +}; #endif
On 2023/2/13 10:36, Binbin Wu wrote:
On 2/9/2023 12:16 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new iommu op get
get -> to get
the IOMMU hardware capabilities for iommufd. This information will be used by any vIOMMU driver which is owned by userspace.
This op chooses to make the special parameters opaque to the core. This suits the current usage model where accessing any of the IOMMU device special parameters does require a userspace driver that matches the kernel driver. If a need for common parameters, implemented similarly by several drivers, arises then there is room in the design to grow a generic parameter set as well. No warpper
warpper -> wrapper
Thanks, will update.
API is added as it is supposed to be used by iommufd only.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com
include/linux/iommu.h       | 8 ++++++++  include/uapi/linux/iommufd.h | 6 ++++++  2 files changed, 14 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index cb586d054c57..97b398d19fd2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -15,6 +15,7 @@  #include <linux/of.h>  #include <linux/ioasid.h>  #include <uapi/linux/iommu.h> +#include <uapi/linux/iommufd.h>  #define IOMMU_READ   (1 << 0)  #define IOMMU_WRITE   (1 << 1) @@ -223,6 +224,11 @@ struct iommu_iotlb_gather {  /**   * struct iommu_ops - iommu ops and capabilities   * @capable: check capability
- @hw_info: IOMMU hardware capabilities. The type of the returned
data is
- *Â Â Â Â Â Â Â Â Â Â defined in include/uapi/linux/iommufd.h. The data buffer is
- *Â Â Â Â Â Â Â Â Â Â allocated in the IOMMU driver and the caller should free it
- *Â Â Â Â Â Â Â Â Â Â after use. Return the data buffer if success, or ERR_PTR on
- *Â Â Â Â Â Â Â Â Â failure.
* @domain_alloc: allocate iommu domain   * @probe_device: Add device to iommu driver handling   * @release_device: Remove device from iommu driver handling @@ -252,6 +258,7 @@ struct iommu_iotlb_gather {   */  struct iommu_ops {      bool (*capable)(struct device *dev, enum iommu_cap); +   void *(*hw_info)(struct device *dev, u32 *length);      /* Domain allocation and freeing by the iommu driver */      struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); @@ -280,6 +287,7 @@ struct iommu_ops {      void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid);      const struct iommu_domain_ops *default_domain_ops; +   enum iommu_device_data_type driver_type;
How to understand the name "iommu_device_data_type"?
This is to tell userspace which type of IOMMU the returned information comes from, Intel VT-d, ARM or others...
Is it just refer to the driver types or it has a more generic meaning?
IOMMU driver sets this field to distinguish different IOMMU hardware.
Best regards, baolu
From: Lu Baolu baolu.lu@linux.intel.com
To support nested translation in the userspace, it should check the underlying hardware information for the capabilities.
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Sun yi.y.sun@linux.intel.com --- drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++ drivers/iommu/intel/iommu.h | 1 + include/uapi/linux/iommufd.h | 21 +++++++++++++++++++++ 3 files changed, 41 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 59df7e42fd53..929f600cc350 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4760,8 +4760,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) intel_pasid_tear_down_entry(iommu, dev, pasid, false); }
+static void *intel_iommu_hw_info(struct device *dev, u32 *length) +{ + struct device_domain_info *info = dev_iommu_priv_get(dev); + struct intel_iommu *iommu = info->iommu; + struct iommu_device_info_vtd *vtd; + + vtd = kzalloc(sizeof(*vtd), GFP_KERNEL); + if (!vtd) + return ERR_PTR(-ENOMEM); + + vtd->cap_reg = iommu->cap; + vtd->ecap_reg = iommu->ecap; + *length = sizeof(*vtd); + + return vtd; +} + const struct iommu_ops intel_iommu_ops = { .capable = intel_iommu_capable, + .hw_info = intel_iommu_hw_info, .domain_alloc = intel_iommu_domain_alloc, .probe_device = intel_iommu_probe_device, .probe_finalize = intel_iommu_probe_finalize, @@ -4774,6 +4792,7 @@ const struct iommu_ops intel_iommu_ops = { .def_domain_type = device_def_domain_type, .remove_dev_pasid = intel_iommu_remove_dev_pasid, .pgsize_bitmap = SZ_4K, + .driver_type = IOMMU_DEVICE_DATA_INTEL_VTD, #ifdef CONFIG_INTEL_IOMMU_SVM .page_response = intel_svm_page_response, #endif diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 06e61e474856..2e70265d4ceb 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -22,6 +22,7 @@ #include <linux/ioasid.h> #include <linux/bitfield.h> #include <linux/xarray.h> +#include <uapi/linux/iommufd.h>
#include <asm/cacheflush.h> #include <asm/iommu.h> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 2309edb55028..fda75c8450ee 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -347,7 +347,28 @@ struct iommu_vfio_ioas {
/** * enum iommu_device_data_type - IOMMU hardware Data types + * @IOMMU_DEVICE_DATA_INTEL_VTD: Intel VT-d iommu data type */ enum iommu_device_data_type { + IOMMU_DEVICE_DATA_INTEL_VTD = 1, +}; + +/** + * struct iommu_device_info_vtd - Intel VT-d device info + * + * @flags: Must be set to 0 + * @__reserved: Must be 0 + * @cap_reg: Value of Intel VT-d capability register defined in chapter + * 11.4.2 of Intel VT-d spec. + * @ecap_reg: Value of Intel VT-d capability register defined in chapter + * 11.4.3 of Intel VT-d spec. + * + * Intel hardware iommu capability. + */ +struct iommu_device_info_vtd { + __u32 flags; + __u32 __reserved; + __aligned_u64 cap_reg; + __aligned_u64 ecap_reg; }; #endif
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM
From: Lu Baolu baolu.lu@linux.intel.com
To support nested translation in the userspace, it should check the underlying hardware information for the capabilities.
remove this sentence. that belongs to next patch.
+/**
- struct iommu_device_info_vtd - Intel VT-d device info
- @flags: Must be set to 0
- @__reserved: Must be 0
- @cap_reg: Value of Intel VT-d capability register defined in chapter
11.4.2 of Intel VT-d spec.
- @ecap_reg: Value of Intel VT-d capability register defined in chapter
11.4.3 of Intel VT-d spec.
let's be specific with section name e.g. "11.4.2 Capability Register" so if the chapter index is changed later people can still know where to look at.
- Intel hardware iommu capability.
duplicated with the first line "Intel VT-d device info"
On 2023/2/10 15:32, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM
From: Lu Baolu baolu.lu@linux.intel.com
To support nested translation in the userspace, it should check the underlying hardware information for the capabilities.
remove this sentence. that belongs to next patch.
+/**
- struct iommu_device_info_vtd - Intel VT-d device info
- @flags: Must be set to 0
- @__reserved: Must be 0
- @cap_reg: Value of Intel VT-d capability register defined in chapter
11.4.2 of Intel VT-d spec.
- @ecap_reg: Value of Intel VT-d capability register defined in chapter
11.4.3 of Intel VT-d spec.
let's be specific with section name e.g. "11.4.2 Capability Register" so if the chapter index is changed later people can still know where to look at.
- Intel hardware iommu capability.
duplicated with the first line "Intel VT-d device info"
Above all agreed.
Best regards, baolu
On Wed, 8 Feb 2023 20:16:38 -0800 Yi Liu yi.l.liu@intel.com wrote:
From: Lu Baolu baolu.lu@linux.intel.com
To support nested translation in the userspace, it should check the underlying hardware information for the capabilities.
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Sun yi.y.sun@linux.intel.com
drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++ drivers/iommu/intel/iommu.h | 1 + include/uapi/linux/iommufd.h | 21 +++++++++++++++++++++ 3 files changed, 41 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 59df7e42fd53..929f600cc350 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4760,8 +4760,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) intel_pasid_tear_down_entry(iommu, dev, pasid, false); } +static void *intel_iommu_hw_info(struct device *dev, u32 *length) +{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct intel_iommu *iommu = info->iommu;
- struct iommu_device_info_vtd *vtd;
- vtd = kzalloc(sizeof(*vtd), GFP_KERNEL);
- if (!vtd)
return ERR_PTR(-ENOMEM);
- vtd->cap_reg = iommu->cap;
- vtd->ecap_reg = iommu->ecap;
Just a friendly reminder that these registers are already exposed to userspace under /sys/class/iommu/ and each device has an iommu link back to their iommu device there. This series doesn't really stand on its own without some discussion of why that interface is not sufficient for this use case. Thanks,
Alex
On Fri, Feb 10, 2023 at 03:44:10PM -0700, Alex Williamson wrote:
On Wed, 8 Feb 2023 20:16:38 -0800 Yi Liu yi.l.liu@intel.com wrote:
From: Lu Baolu baolu.lu@linux.intel.com
To support nested translation in the userspace, it should check the underlying hardware information for the capabilities.
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Sun yi.y.sun@linux.intel.com
drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++ drivers/iommu/intel/iommu.h | 1 + include/uapi/linux/iommufd.h | 21 +++++++++++++++++++++ 3 files changed, 41 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 59df7e42fd53..929f600cc350 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4760,8 +4760,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) intel_pasid_tear_down_entry(iommu, dev, pasid, false); } +static void *intel_iommu_hw_info(struct device *dev, u32 *length) +{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct intel_iommu *iommu = info->iommu;
- struct iommu_device_info_vtd *vtd;
- vtd = kzalloc(sizeof(*vtd), GFP_KERNEL);
- if (!vtd)
return ERR_PTR(-ENOMEM);
- vtd->cap_reg = iommu->cap;
- vtd->ecap_reg = iommu->ecap;
Just a friendly reminder that these registers are already exposed to userspace under /sys/class/iommu/ and each device has an iommu link back to their iommu device there.
I think in cases of mdevs w/ PASID (eg SIOV) it is not general to get from vfio sysfs to the sysfs path for the iommu.
This series doesn't really stand on its own without some discussion of why that interface is not sufficient for this use case.
IMHO I don't really like the idea of mixing iommufd with sysfs, it should stand on its own.
In particular there is no generic way to go from a iommufd dev_id to any sysfs path, userspace would need prior unique knowledge about the subsystem that is being bound to iommufd first.
So, I think some of those explanations would be good in the commit message?
I would also add explanation about what userspace is supposed to do with these bits when it operates the nesting feature.
Jason
On 2/9/2023 12:16 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
To support nested translation in the userspace, it should check the underlying hardware information for the capabilities.
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Sun yi.y.sun@linux.intel.com
drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++ drivers/iommu/intel/iommu.h | 1 + include/uapi/linux/iommufd.h | 21 +++++++++++++++++++++ 3 files changed, 41 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 59df7e42fd53..929f600cc350 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4760,8 +4760,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) intel_pasid_tear_down_entry(iommu, dev, pasid, false); } +static void *intel_iommu_hw_info(struct device *dev, u32 *length) +{
- struct device_domain_info *info = dev_iommu_priv_get(dev);
- struct intel_iommu *iommu = info->iommu;
- struct iommu_device_info_vtd *vtd;
- vtd = kzalloc(sizeof(*vtd), GFP_KERNEL);
- if (!vtd)
return ERR_PTR(-ENOMEM);
- vtd->cap_reg = iommu->cap;
- vtd->ecap_reg = iommu->ecap;
- *length = sizeof(*vtd);
- return vtd;
+}
- const struct iommu_ops intel_iommu_ops = { .capable = intel_iommu_capable,
- .hw_info = intel_iommu_hw_info, .domain_alloc = intel_iommu_domain_alloc, .probe_device = intel_iommu_probe_device, .probe_finalize = intel_iommu_probe_finalize,
@@ -4774,6 +4792,7 @@ const struct iommu_ops intel_iommu_ops = { .def_domain_type = device_def_domain_type, .remove_dev_pasid = intel_iommu_remove_dev_pasid, .pgsize_bitmap = SZ_4K,
- .driver_type = IOMMU_DEVICE_DATA_INTEL_VTD, #ifdef CONFIG_INTEL_IOMMU_SVM .page_response = intel_svm_page_response, #endif
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 06e61e474856..2e70265d4ceb 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -22,6 +22,7 @@ #include <linux/ioasid.h> #include <linux/bitfield.h> #include <linux/xarray.h> +#include <uapi/linux/iommufd.h> #include <asm/cacheflush.h> #include <asm/iommu.h> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 2309edb55028..fda75c8450ee 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -347,7 +347,28 @@ struct iommu_vfio_ioas { /**
- enum iommu_device_data_type - IOMMU hardware Data types
*/ enum iommu_device_data_type {
- @IOMMU_DEVICE_DATA_INTEL_VTD: Intel VT-d iommu data type
- IOMMU_DEVICE_DATA_INTEL_VTD = 1,
+};
+/**
- struct iommu_device_info_vtd - Intel VT-d device info
- @flags: Must be set to 0
Could you add more description about the usage of flags here?
- @__reserved: Must be 0
- @cap_reg: Value of Intel VT-d capability register defined in chapter
11.4.2 of Intel VT-d spec.
- @ecap_reg: Value of Intel VT-d capability register defined in chapter
11.4.3 of Intel VT-d spec.
- Intel hardware iommu capability.
- */
+struct iommu_device_info_vtd {
- __u32 flags;
- __u32 __reserved;
- __aligned_u64 cap_reg;
- __aligned_u64 ecap_reg; }; #endif
On 2023/2/13 11:09, Binbin Wu wrote:
On 2/9/2023 12:16 PM, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
To support nested translation in the userspace, it should check the underlying hardware information for the capabilities.
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Sun yi.y.sun@linux.intel.com
drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++  drivers/iommu/intel/iommu.h | 1 +  include/uapi/linux/iommufd.h | 21 +++++++++++++++++++++  3 files changed, 41 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 59df7e42fd53..929f600cc350 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4760,8 +4760,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid) Â Â Â Â Â intel_pasid_tear_down_entry(iommu, dev, pasid, false); Â } +static void *intel_iommu_hw_info(struct device *dev, u32 *length) +{ +Â Â Â struct device_domain_info *info = dev_iommu_priv_get(dev); +Â Â Â struct intel_iommu *iommu = info->iommu; +Â Â Â struct iommu_device_info_vtd *vtd;
+Â Â Â vtd = kzalloc(sizeof(*vtd), GFP_KERNEL); +Â Â Â if (!vtd) +Â Â Â Â Â Â Â return ERR_PTR(-ENOMEM);
+Â Â Â vtd->cap_reg = iommu->cap; +Â Â Â vtd->ecap_reg = iommu->ecap; +Â Â Â *length = sizeof(*vtd);
+Â Â Â return vtd; +}
const struct iommu_ops intel_iommu_ops = {      .capable       = intel_iommu_capable, +   .hw_info       = intel_iommu_hw_info,      .domain_alloc       = intel_iommu_domain_alloc,      .probe_device       = intel_iommu_probe_device,      .probe_finalize       = intel_iommu_probe_finalize, @@ -4774,6 +4792,7 @@ const struct iommu_ops intel_iommu_ops = {      .def_domain_type   = device_def_domain_type,      .remove_dev_pasid   = intel_iommu_remove_dev_pasid,      .pgsize_bitmap       = SZ_4K, +   .driver_type       = IOMMU_DEVICE_DATA_INTEL_VTD,  #ifdef CONFIG_INTEL_IOMMU_SVM      .page_response       = intel_svm_page_response,  #endif diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h index 06e61e474856..2e70265d4ceb 100644 --- a/drivers/iommu/intel/iommu.h +++ b/drivers/iommu/intel/iommu.h @@ -22,6 +22,7 @@  #include <linux/ioasid.h>  #include <linux/bitfield.h>  #include <linux/xarray.h> +#include <uapi/linux/iommufd.h>  #include <asm/cacheflush.h>  #include <asm/iommu.h> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 2309edb55028..fda75c8450ee 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -347,7 +347,28 @@ struct iommu_vfio_ioas {  /**   * enum iommu_device_data_type - IOMMU hardware Data types
- @IOMMU_DEVICE_DATA_INTEL_VTD: Intel VT-d iommu data type
*/ Â enum iommu_device_data_type { +Â Â Â IOMMU_DEVICE_DATA_INTEL_VTD = 1, +};
+/**
- struct iommu_device_info_vtd - Intel VT-d device info
- @flags: Must be set to 0
Could you add more description about the usage of flags here?
This is a Reserved 0 fields for other coming features.
Best regards, baolu
On Wed, Feb 08, 2023 at 08:16:38PM -0800, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
To support nested translation in the userspace, it should check the underlying hardware information for the capabilities.
Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Sun yi.y.sun@linux.intel.com
drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++ drivers/iommu/intel/iommu.h | 1 + include/uapi/linux/iommufd.h | 21 +++++++++++++++++++++ 3 files changed, 41 insertions(+)
This should come after the next patch in the series
Jason
Under nested IOMMU translation, userspace owns the stage-1 translation structure (e.g. the stage-1 page table of Intel VT-d or the context table of ARM SMMUv3, and etc.). Such stage-1 translation structures are vendor specific, and needs to be compatiable with the underlying IOMMU hardware. Hence, userspace should know the IOMMU hardware capability before creating and configuring the stage-1 translation structure to kernel.
This adds ioctl: IOMMU_DEVICE_GET_INFO to query the IOMMU hardware capability for a given device. The returned data is vendor specific, userspace can tell it by the @out_device_type field.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 72 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 1 + drivers/iommu/iommufd/main.c | 3 ++ include/uapi/linux/iommufd.h | 36 +++++++++++++ 4 files changed, 112 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 8a9834fc129a..3b64aef24807 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -134,6 +134,78 @@ void iommufd_device_unbind(struct iommufd_device *idev) } EXPORT_SYMBOL_NS_GPL(iommufd_device_unbind, IOMMUFD);
+static int iommufd_zero_fill_user(u64 ptr, int bytes) +{ + int index = 0; + + for (; index < bytes; index++) { + if (put_user(0, (uint8_t __user *)(ptr + index))) + return -EFAULT; + } + return 0; +} + +int iommufd_device_get_info(struct iommufd_ucmd *ucmd) +{ + struct iommu_device_info *cmd = ucmd->cmd; + struct iommufd_object *dev_obj; + struct device *dev; + const struct iommu_ops *ops; + void *data; + unsigned int length, data_len; + int rc; + + if (cmd->flags || cmd->__reserved || !cmd->data_len) + return -EOPNOTSUPP; + + dev_obj = iommufd_get_object(ucmd->ictx, cmd->dev_id, + IOMMUFD_OBJ_DEVICE); + if (IS_ERR(dev_obj)) + return PTR_ERR(dev_obj); + + dev = container_of(dev_obj, struct iommufd_device, obj)->dev; + + ops = dev_iommu_ops(dev); + if (!ops || !ops->hw_info) { + rc = -EOPNOTSUPP; + goto out_put; + } + + data = ops->hw_info(dev, &data_len); + if (IS_ERR(data)) { + rc = PTR_ERR(data); + goto out_put; + } + + length = min(cmd->data_len, data_len); + if (copy_to_user(u64_to_user_ptr(cmd->data_ptr), data, length)) { + rc = -EFAULT; + goto out_free_data; + } + + /* + * Zero the trailing bytes for userspace if the buffer is bigger + * than the data size kernel actually has. + */ + if (length < cmd->data_len) { + rc = iommufd_zero_fill_user(cmd->data_ptr + length, + cmd->data_len - length); + if (rc) + goto out_free_data; + } + + cmd->out_device_type = ops->driver_type; + cmd->data_len = data_len; + + rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); + +out_free_data: + kfree(data); +out_put: + iommufd_put_object(dev_obj); + return rc; +} + static int iommufd_device_setup_msi(struct iommufd_device *idev, struct iommufd_hw_pagetable *hwpt, phys_addr_t sw_msi_start) diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 200c783800ad..4a0a1a7fdae1 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -258,6 +258,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
void iommufd_device_destroy(struct iommufd_object *obj); +int iommufd_device_get_info(struct iommufd_ucmd *ucmd);
struct iommufd_access { struct iommufd_object obj; diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 3fbe636c3d8a..59aa30ad1090 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -250,6 +250,7 @@ static int iommufd_option(struct iommufd_ucmd *ucmd)
union ucmd_buffer { struct iommu_destroy destroy; + struct iommu_device_info info; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy; @@ -281,6 +282,8 @@ struct iommufd_ioctl_op { } static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), + IOCTL_OP(IOMMU_DEVICE_GET_INFO, iommufd_device_get_info, struct iommu_device_info, + __reserved), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index fda75c8450ee..6cfe102f26f3 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -45,6 +45,7 @@ enum { IOMMUFD_CMD_IOAS_UNMAP, IOMMUFD_CMD_OPTION, IOMMUFD_CMD_VFIO_IOAS, + IOMMUFD_CMD_DEVICE_GET_INFO, };
/** @@ -371,4 +372,39 @@ struct iommu_device_info_vtd { __aligned_u64 cap_reg; __aligned_u64 ecap_reg; }; + +/** + * struct iommu_device_info - ioctl(IOMMU_DEVICE_GET_INFO) + * @size: sizeof(struct iommu_device_info) + * @flags: Must be 0 + * @dev_id: The device being attached to the IOMMU + * @data_len: Input the type specific data buffer length in bytes + * @data_ptr: Pointer to the type specific structure (e.g. + * struct iommu_device_info_vtd) + * @out_device_type: Output the underlying iommu hardware type, it is + * one of enum iommu_device_data_type. + * @__reserved: Must be 0 + * + * Query the hardware iommu capability for given device which has been + * bound to iommufd. @data_len is set to be the size of the buffer to + * type specific data and the data will be filled. Trailing bytes are + * zeroed if the user buffer is larger than the data kernel has. + * + * The type specific data would be used to sync capability between the + * vIOMMU and the hardware IOMMU, also for the availabillity checking of + * iommu hardware features like dirty page tracking in I/O page table. + * + * The @out_device_type will be filled if the ioctl succeeds. It would + * be used to decode the data filled in the buffer pointed by @data_ptr. + */ +struct iommu_device_info { + __u32 size; + __u32 flags; + __u32 dev_id; + __u32 data_len; + __aligned_u64 data_ptr; + __u32 out_device_type; + __u32 __reserved; +}; +#define IOMMU_DEVICE_GET_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_INFO) #endif
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM
+int iommufd_device_get_info(struct iommufd_ucmd *ucmd) +{
- struct iommu_device_info *cmd = ucmd->cmd;
- struct iommufd_object *dev_obj;
- struct device *dev;
- const struct iommu_ops *ops;
- void *data;
- unsigned int length, data_len;
- int rc;
- if (cmd->flags || cmd->__reserved || !cmd->data_len)
return -EOPNOTSUPP;
do we want !cmd->data_len being a way to check how many bytes are required in a following call to get the vendor info?
- /*
* Zero the trailing bytes for userspace if the buffer is bigger
* than the data size kernel actually has.
*/
"Zero the trailing bytes if the user buffer ..."
+/**
- struct iommu_device_info - ioctl(IOMMU_DEVICE_GET_INFO)
- @size: sizeof(struct iommu_device_info)
- @flags: Must be 0
- @dev_id: The device being attached to the IOMMU
iommufd
- @data_len: Input the type specific data buffer length in bytes
also is an output field
- Query the hardware iommu capability for given device which has been
- bound to iommufd. @data_len is set to be the size of the buffer to
- type specific data and the data will be filled. Trailing bytes are
"@data_len is the size of the buffer which captures iommu type specific data"
- zeroed if the user buffer is larger than the data kernel has.
- The type specific data would be used to sync capability between the
- vIOMMU and the hardware IOMMU, also for the availabillity checking of
- iommu hardware features like dirty page tracking in I/O page table.
It's fine to report format information related to stage-1 page table which userspace manages.
but IMHO this should not be an interface to report which capability is supported by iommufd. Having hardware supporting dirty bit doesn't mean the underlying iommu driver provides necessary support to iommufd dirty tracking.
We either don't state this way if following what this series does which simply dumps all raw iommu hw info to userspace, or explicitly require iommu driver to only report information as required when a feature is supported by iommufd.
- The @out_device_type will be filled if the ioctl succeeds. It would
s/will be/is/
- be used to decode the data filled in the buffer pointed by @data_ptr.
s/would be/is/
- */
+struct iommu_device_info {
- __u32 size;
- __u32 flags;
- __u32 dev_id;
- __u32 data_len;
- __aligned_u64 data_ptr;
moving forward iommu hw cap is not the only information reported via this interface, e.g. it can be also used to report pasid mode.
from this angle it's better to rename above two fields to be iommu specific, e.g.:
__u32 iommu_data_len; __aligned_u64 iommu_data_ptr;
- __u32 out_device_type;
- __u32 __reserved;
+}; +#define IOMMU_DEVICE_GET_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_INFO)
#endif
2.34.1
On 10/02/2023 07:55, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM
- zeroed if the user buffer is larger than the data kernel has.
- The type specific data would be used to sync capability between the
- vIOMMU and the hardware IOMMU, also for the availabillity checking of
- iommu hardware features like dirty page tracking in I/O page table.
It's fine to report format information related to stage-1 page table which userspace manages.
but IMHO this should not be an interface to report which capability is supported by iommufd. Having hardware supporting dirty bit doesn't mean the underlying iommu driver provides necessary support to iommufd dirty tracking.
+1
In fact this shouldn't really be a way to check any capability as we are dumping straight away the IOMMU hardware registers. By dumping raw IOMMU hardware data, forces the application to understand IOMMU hardware specific formats. Maybe that's OK for hw nesting as there's a lot of info you need to know for the vIOMMU pgtables, pasid and etc so both are tightly coupled. But it is a bit disconnected from what really the software (IOMMUFD) and driver can use, without getting onto assumptions.
[Calling it IOMMU_DEVICE_GET_HW_INFO would be bit more obvious if you're not asking IOMMUFD support]
For capability checking, I think this really should be returning capabilities that both IOMMU driver supports ... and that IOMMUFD understands and marshalled on a format of its own defined by IOMMUFD. Maybe using IOMMU_CAP or some other thing within the kernel (now that's per-device), or even simpler. That's at least had written initially for the dirty tracking series.
On Fri, Feb 10, 2023 at 11:10:34AM +0000, Joao Martins wrote:
On 10/02/2023 07:55, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM
- zeroed if the user buffer is larger than the data kernel has.
- The type specific data would be used to sync capability between the
- vIOMMU and the hardware IOMMU, also for the availabillity checking of
- iommu hardware features like dirty page tracking in I/O page table.
It's fine to report format information related to stage-1 page table which userspace manages.
but IMHO this should not be an interface to report which capability is supported by iommufd. Having hardware supporting dirty bit doesn't mean the underlying iommu driver provides necessary support to iommufd dirty tracking.
+1
In fact this shouldn't really be a way to check any capability as we are dumping straight away the IOMMU hardware registers. By dumping raw IOMMU hardware data, forces the application to understand IOMMU hardware specific formats. Maybe that's OK for hw nesting as there's a lot of info you need to know for the vIOMMU pgtables, pasid and etc so both are tightly coupled. But it is a bit disconnected from what really the software (IOMMUFD) and driver can use, without getting onto assumptions.
[Calling it IOMMU_DEVICE_GET_HW_INFO would be bit more obvious if you're not asking IOMMUFD support]
For capability checking, I think this really should be returning capabilities that both IOMMU driver supports ... and that IOMMUFD understands and marshalled on a format of its own defined by IOMMUFD. Maybe using IOMMU_CAP or some other thing within the kernel (now that's per-device), or even simpler. That's at least had written initially for the dirty tracking series.
Yes, the design of IOMMU_DEVICE_GET_INFO is already split. The HW specific structure should only contain things related to operating the HW specific paths (ie other HW specific structures in other APIs)
The enclosing struct should have general information about operating the device through the abstract API - like dirty tracking.
But it may be that HW will individually report dirty tracking capabilties related to special page table types - ie can the S2 page table do dirty tracking
Jason
On Fri, Feb 10, 2023 at 07:55:42AM +0000, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, February 9, 2023 12:17 PM
+int iommufd_device_get_info(struct iommufd_ucmd *ucmd) +{
- struct iommu_device_info *cmd = ucmd->cmd;
- struct iommufd_object *dev_obj;
- struct device *dev;
- const struct iommu_ops *ops;
- void *data;
- unsigned int length, data_len;
- int rc;
- if (cmd->flags || cmd->__reserved || !cmd->data_len)
return -EOPNOTSUPP;
do we want !cmd->data_len being a way to check how many bytes are required in a following call to get the vendor info?
There is no reason, if the userspace doesn't have a struct large enough then it also doesn't know what the extra bytes should even be used for. No reason to read the.
+struct iommu_device_info {
- __u32 size;
- __u32 flags;
- __u32 dev_id;
- __u32 data_len;
- __aligned_u64 data_ptr;
moving forward iommu hw cap is not the only information reported via this interface, e.g. it can be also used to report pasid mode.
from this angle it's better to rename above two fields to be iommu specific, e.g.:
__u32 iommu_data_len; __aligned_u64 iommu_data_ptr;
maybe call it hw info
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Saturday, February 11, 2023 5:00 AM
+struct iommu_device_info {
- __u32 size;
- __u32 flags;
- __u32 dev_id;
- __u32 data_len;
- __aligned_u64 data_ptr;
moving forward iommu hw cap is not the only information reported via this interface, e.g. it can be also used to report pasid mode.
from this angle it's better to rename above two fields to be iommu specific, e.g.:
__u32 iommu_data_len; __aligned_u64 iommu_data_ptr;
maybe call it hw info
that is fine given we already have 'struct iommu_device_info'.
probably this cmd should be named as IOMMU_DEVIEC_GET_IOMMU_INFO to be more accurate.
This prepares for adding selftest for IOMMU_DEVICE_GET_INFO. Selftest uses mock device, while physical device uses iommufd_device, so add a helper iommufd_obj_dev() to get struct device from the iommufd_object for a given dev_id.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 22 ++++++++++++++++++++-- drivers/iommu/iommufd/iommufd_private.h | 1 + drivers/iommu/iommufd/selftest.c | 7 +++++++ 3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 3b64aef24807..470838e6902d 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -145,6 +145,20 @@ static int iommufd_zero_fill_user(u64 ptr, int bytes) return 0; }
+static struct device * +iommufd_obj_dev(struct iommufd_object *obj) +{ + struct device *dev = NULL; + + if (obj->type == IOMMUFD_OBJ_DEVICE) + dev = container_of(obj, struct iommufd_device, obj)->dev; +#ifdef CONFIG_IOMMUFD_TEST + else if (obj->type == IOMMUFD_OBJ_SELFTEST) + dev = iommufd_selftest_obj_to_dev(obj); +#endif + return dev; +} + int iommufd_device_get_info(struct iommufd_ucmd *ucmd) { struct iommu_device_info *cmd = ucmd->cmd; @@ -159,11 +173,15 @@ int iommufd_device_get_info(struct iommufd_ucmd *ucmd) return -EOPNOTSUPP;
dev_obj = iommufd_get_object(ucmd->ictx, cmd->dev_id, - IOMMUFD_OBJ_DEVICE); + IOMMUFD_OBJ_ANY); if (IS_ERR(dev_obj)) return PTR_ERR(dev_obj);
- dev = container_of(dev_obj, struct iommufd_device, obj)->dev; + dev = iommufd_obj_dev(dev_obj); + if (!dev) { + rc = -EINVAL; + goto out_put; + }
ops = dev_iommu_ops(dev); if (!ops || !ops->hw_info) { diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 4a0a1a7fdae1..7748117e36f9 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -284,6 +284,7 @@ iommufd_device_selftest_attach(struct iommufd_ctx *ictx, struct device *mock_dev); void iommufd_device_selftest_detach(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt); +struct device *iommufd_selftest_obj_to_dev(struct iommufd_object *obj); int iommufd_test(struct iommufd_ucmd *ucmd); void iommufd_selftest_destroy(struct iommufd_object *obj); extern size_t iommufd_test_memory_limit; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index b94870f93138..2ecde22a60f4 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -802,6 +802,13 @@ void iommufd_selftest_destroy(struct iommufd_object *obj) } }
+struct device *iommufd_selftest_obj_to_dev(struct iommufd_object *obj) +{ + struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj); + + return &sobj->idev.mock_dev; +} + int iommufd_test(struct iommufd_ucmd *ucmd) { struct iommu_test_cmd *cmd = ucmd->cmd;
From: Nicolin Chen nicolinc@nvidia.com
This is required to avoid NULL pointer when iommufd selftest tries to allocate domain for mock_device.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/selftest.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 2ecde22a60f4..5afed3fc30fe 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -273,6 +273,8 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd, struct iommu_test_cmd *cmd) { + static struct iommu_device iommu_dev = { .ops = &mock_ops }; + static struct dev_iommu iommu = { .iommu_dev = &iommu_dev }; static struct bus_type mock_bus = { .iommu_ops = &mock_ops }; struct iommufd_hw_pagetable *hwpt; struct selftest_obj *sobj; @@ -291,6 +293,7 @@ static int iommufd_test_mock_domain(struct iommufd_ucmd *ucmd, sobj->idev.ictx = ucmd->ictx; sobj->type = TYPE_IDEV; sobj->idev.mock_dev.bus = &mock_bus; + sobj->idev.mock_dev.iommu = &iommu;
hwpt = iommufd_device_selftest_attach(ucmd->ictx, ioas, &sobj->idev.mock_dev);
From: Nicolin Chen nicolinc@nvidia.com
This allows to get info for the mock_dev, helps to run selftest to check IOMMU_DEVICE_GET_INFO in selftest.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/iommufd_test.h | 15 +++++++++++ drivers/iommu/iommufd/selftest.c | 16 ++++++++++++ tools/testing/selftests/iommu/iommufd.c | 18 ++++++++++++- tools/testing/selftests/iommu/iommufd_utils.h | 26 +++++++++++++++++++ 5 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 470838e6902d..0e5d2bde7b3c 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -8,6 +8,7 @@
#include "io_pagetable.h" #include "iommufd_private.h" +#include "iommufd_test.h"
MODULE_IMPORT_NS(IOMMUFD_INTERNAL);
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index f2c61a9500e7..1605ff2b1a90 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -94,4 +94,19 @@ struct iommu_test_cmd { }; #define IOMMU_TEST_CMD _IO(IOMMUFD_TYPE, IOMMUFD_CMD_BASE + 32)
+/* Mock structs for IOMMU_DEVICE_GET_INFO ioctl */ +#define IOMMU_DEVICE_DATA_SELFTEST 0xfeedbeef +#define IOMMU_DEVICE_INFO_SELFTEST_REGVAL 0xdeadbeef + +/** + * struct iommu_device_info_selftest + * + * @flags: Must be set to 0 + * @test_reg: Pass IOMMU_DEVICE_INFO_SELFTEST_REGVAL to user selftest program + */ +struct iommu_device_info_selftest { + __u32 flags; + __u32 test_reg; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 5afed3fc30fe..5013c8757f4b 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -104,6 +104,20 @@ struct selftest_obj { }; };
+static void *mock_domain_hw_info(struct device *dev, u32 *length) +{ + struct iommu_device_info_selftest *info; + + info = kzalloc(sizeof(*info), GFP_KERNEL); + if (!info) + return ERR_PTR(-ENOMEM); + + info->test_reg = IOMMU_DEVICE_INFO_SELFTEST_REGVAL; + *length = sizeof(*info); + + return info; +} + static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) { struct mock_iommu_domain *mock; @@ -239,6 +253,8 @@ static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain, static const struct iommu_ops mock_ops = { .owner = THIS_MODULE, .pgsize_bitmap = MOCK_IO_PAGE_SIZE, + .driver_type = IOMMU_DEVICE_DATA_SELFTEST, + .hw_info = mock_domain_hw_info, .domain_alloc = mock_domain_alloc, .default_domain_ops = &(struct iommu_domain_ops){ diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 1e293950ac88..8e1369451464 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -124,6 +124,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP); TEST_LENGTH(iommu_option, IOMMU_OPTION); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS); + TEST_LENGTH(iommu_device_info, IOMMU_DEVICE_GET_INFO); #undef TEST_LENGTH }
@@ -185,6 +186,7 @@ TEST_F(iommufd, global_options) FIXTURE(iommufd_ioas) { int fd; + uint32_t dev_id; uint32_t ioas_id; uint32_t domain_id; uint64_t base_iova; @@ -212,7 +214,8 @@ FIXTURE_SETUP(iommufd_ioas) }
for (i = 0; i != variant->mock_domains; i++) { - test_cmd_mock_domain(self->ioas_id, NULL, &self->domain_id); + test_cmd_mock_domain(self->ioas_id, &self->dev_id, + &self->domain_id); self->base_iova = MOCK_APERTURE_START; } } @@ -281,6 +284,19 @@ TEST_F(iommufd_ioas, ioas_area_auto_destroy) } }
+TEST_F(iommufd_ioas, device_get_info) +{ + struct iommu_device_info_selftest info; + + if (self->dev_id) { + test_cmd_device_get_info(self->dev_id, sizeof(info), &info); + assert(info.test_reg == IOMMU_DEVICE_INFO_SELFTEST_REGVAL); + } else { + test_err_device_get_info(ENOENT, self->dev_id, + sizeof(info), &info); + } +} + TEST_F(iommufd_ioas, area) { int i; diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 67805afc620f..1807d29c05f8 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -294,3 +294,29 @@ static void teardown_iommufd(int fd, struct __test_metadata *_metadata) })
#endif + +static int _test_cmd_device_get_info(int fd, __u32 device_id, + __u32 data_len, void *data) +{ + struct iommu_device_info cmd = { + .size = sizeof(cmd), + .dev_id = device_id, + .data_len = data_len, + .data_ptr = (uint64_t)data, + }; + int ret; + + ret = ioctl(fd, IOMMU_DEVICE_GET_INFO, &cmd); + if (ret) + return ret; + return 0; +} + +#define test_cmd_device_get_info(device_id, data_len, data) \ + ASSERT_EQ(0, _test_cmd_device_get_info(self->fd, device_id, \ + data_len, data)) + +#define test_err_device_get_info(_errno, device_id, data_len, data) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_device_get_info(self->fd, device_id, \ + data_len, data))
On Wed, Feb 08, 2023 at 08:16:36PM -0800, Yi Liu wrote:
iommufd gives userspace the capabilty to manipulating iommu subsytem. e.g. DMA map/unmap etc. In the near future, it will also support iommu nested translation. Different platform vendors have different implementation for the nested translation. So before set up nested translation, userspace needs to know the hardware iommu capabilities. For example, Intel platform supports guest I/O page table to be the first stage translation structure.
This series reports the iommu capability for a given iommufd_device which has been bound to iommufd. It is a preparation work for nested translation support[1]. In this series, Intel VT-d capability reporting is added. Other vendors may add their own reporting based on this series.
[1] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting_vtd_v1
Let's have the comments addressed and this rebased on top of https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
Which should address eg the selftest issue
I want to start chipping away at bits of the nesting patch pile and this part looks close
Thanks, Jason
linux-kselftest-mirror@lists.linaro.org