IOMMU hardwares that support nested translation would have two stages address translation (normally mentioned as stage-1 and stage-2). The page table formats of the stage-1 and stage-2 can be different. e.g., VT-d has different page table formats for stage-1 and stage-2.
Nested parent domain is the iommu domain used to represent the stage-2 translation. In IOMMUFD, both stage-1 and stage-2 translation are tracked as HWPT (a.k.a. iommu domain). Stage-2 HWPT is parent of stage-1 HWPT as stage-1 cannot work alone in nested translation. In the cases of stage-1 and stage-2 page table format are different, the parent HWPT should use exactly the stage-2 page table format. However, the existing kernel hides the format selection in iommu drivers, so the domain allocated via IOMMU_HWPT_ALLOC can use either stage-1 page table format or stage-2 page table format, there is no guarantees for it.
To enforce the page table format of the nested parent domain, this series introduces a new iommu op (domain_alloc_user) which can accept user flags to allocate domain as userspace requires. It also converts IOMMUFD to use the new domain_alloc_user op for domain allocation if supported, then extends the IOMMU_HWPT_ALLOC ioctl to pass down a NEST_PARENT flag to allocate a HWPT which can be used as parent. This series implements the new op in Intel iommu driver to have a complete picture. It is a preparation for adding nesting support in IOMMUFD/IOMMU.
Complete code can be found: https://github.com/yiliu1765/iommufd/tree/iommufd_alloc_user_v1
Regards, Yi Liu
Yi Liu (6): iommu: Add new iommu op to create domains owned by userspace iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation iommufd/hw_pagetable: Accepts user flags for domain allocation iommufd/hw_pagetable: Support allocating nested parent domain iommufd/selftest: Add domain_alloc_user() support in iommu mock iommu/vt-d: Add domain_alloc_user op
drivers/iommu/intel/iommu.c | 20 ++++++++++++ drivers/iommu/iommufd/device.c | 2 +- drivers/iommu/iommufd/hw_pagetable.c | 31 ++++++++++++++----- drivers/iommu/iommufd/iommufd_private.h | 3 +- drivers/iommu/iommufd/selftest.c | 16 ++++++++++ include/linux/iommu.h | 8 +++++ include/uapi/linux/iommufd.h | 12 ++++++- tools/testing/selftests/iommu/iommufd.c | 24 +++++++++++--- .../selftests/iommu/iommufd_fail_nth.c | 2 +- tools/testing/selftests/iommu/iommufd_utils.h | 11 +++++-- 10 files changed, 111 insertions(+), 18 deletions(-)
Introduce a new iommu_domain op to create domains owned by userspace, e.g. through IOMMUFD. These domains have a few different properties compares to kernel owned domains:
- They may be UNMANAGED domains, but created with special parameters. For instance aperture size changes/number of levels, different IOPTE formats, or other things necessary to make a vIOMMU work
- We have to track all the memory allocations with GFP_KERNEL_ACCOUNT to make the cgroup sandbox stronger
- Device-specialty domains, such as NESTED domains can be created by IOMMUFD.
The new op clearly says the domain is being created by IOMMUFD, that the domain is intended for userspace use, and it provides a way to pass user flags or a driver specific uAPI structure to customize the created domain to exactly what the vIOMMU userspace driver requires.
iommu drivers that cannot support VFIO/IOMMUFD should not support this op. This includes any driver that cannot provide a fully functional UNMANAGED domain.
This new op for now is only supposed to be used by IOMMUFD, hence no wrapper for it. IOMMUFD would call the callback directly. As for domain free, IOMMUFD would use iommu_domain_free().
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- include/linux/iommu.h | 8 ++++++++ include/uapi/linux/iommufd.h | 12 +++++++++++- 2 files changed, 19 insertions(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index c50a769d569a..660dc1931dc9 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -235,6 +235,13 @@ struct iommu_iotlb_gather { * use. The information type is one of enum iommu_hw_info_type defined * in include/uapi/linux/iommufd.h. * @domain_alloc: allocate iommu domain + * @domain_alloc_user: Allocate an iommu domain corresponding to the input + * parameters like flags defined as enum iommufd_ioas_map_flags + * in include/uapi/linux/iommufd.h. Different from the + * domain_alloc op, it requires iommu driver to fully + * initialize a new domain including the generic iommu_domain + * struct. Upon success, a domain is returned. Upon failure, + * ERR_PTR must be returned. * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling * @probe_finalize: Do final setup work after the device is added to an IOMMU @@ -267,6 +274,7 @@ struct iommu_ops {
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); + struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags);
struct iommu_device *(*probe_device)(struct device *dev); void (*release_device)(struct device *dev); diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index b4ba0c0cbab6..4a7c5c8fdbb4 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -347,10 +347,20 @@ struct iommu_vfio_ioas { }; #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
+/** + * enum iommufd_hwpt_alloc_flags - Flags for HWPT allocation + * @IOMMU_HWPT_ALLOC_NEST_PARENT: If set, allocate a domain which can serve + * as the parent domain in the nesting + * configuration. + */ +enum iommufd_hwpt_alloc_flags { + IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0, +}; + /** * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC) * @size: sizeof(struct iommu_hwpt_alloc) - * @flags: Must be 0 + * @flags: Combination of enum iommufd_hwpt_alloc_flags * @dev_id: The device to allocate this HWPT for * @pt_id: The IOAS to connect this HWPT to * @out_hwpt_id: The ID of the new HWPT
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, September 19, 2023 5:25 PM
@@ -235,6 +235,13 @@ struct iommu_iotlb_gather {
use. The information type is one of enum iommu_hw_info_type
defined
in include/uapi/linux/iommufd.h.
- @domain_alloc: allocate iommu domain
Given now we have two @alloc ops it'd be clearer to also update the comment here so the explanation for @domain_alloc_user() is easier to be understood, e.g.:
@domain_alloc: allocate and return an iommu domain if success. Otherwise NULL is returned. The domain is not fully initialized until the caller iommu_domain_alloc() returns.
- @domain_alloc_user: Allocate an iommu domain corresponding to the
input
parameters like flags defined as enum
iommufd_ioas_map_flags
in include/uapi/linux/iommufd.h. Different from the
"to the input parameters as defined in include/uapi/linux/iommufd.h".
domain_alloc op, it requires iommu driver to fully
initialize a new domain including the generic iommu_domain
"Unlike @domain_alloc, it is called only by iommufd and must fully initialize the new domain before return".
*domain* here already refers to the generic iommu_domain struct.
On 2023/9/26 13:28, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, September 19, 2023 5:25 PM
@@ -235,6 +235,13 @@ struct iommu_iotlb_gather {
use. The information type is one of enum iommu_hw_info_type
defined
in include/uapi/linux/iommufd.h.
- @domain_alloc: allocate iommu domain
Given now we have two @alloc ops it'd be clearer to also update the comment here so the explanation for @domain_alloc_user() is easier to be understood, e.g.:
@domain_alloc: allocate and return an iommu domain if success. Otherwise NULL is returned. The domain is not fully initialized until the caller iommu_domain_alloc() returns.
- @domain_alloc_user: Allocate an iommu domain corresponding to the
input
parameters like flags defined as enum
iommufd_ioas_map_flags
in include/uapi/linux/iommufd.h. Different from the
"to the input parameters as defined in include/uapi/linux/iommufd.h".
domain_alloc op, it requires iommu driver to fully
initialize a new domain including the generic iommu_domain
"Unlike @domain_alloc, it is called only by iommufd and must fully initialize the new domain before return".
*domain* here already refers to the generic iommu_domain struct.
above comment well received.
This makes IOMMUFD to use iommu_domain_alloc_user() for iommu_domain creation as IOMMUFD needs to support iommu_domain allocation with parameters from userspace in nested support. If the iommu driver doesn't provide domain_alloc_user callback then IOMMUFD falls back to use iommu_domain_alloc().
Suggested-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Kevin Tian kevin.tian@intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index cf2c1504e20d..48874f896521 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -5,6 +5,7 @@ #include <linux/iommu.h> #include <uapi/linux/iommufd.h>
+#include "../iommu-priv.h" #include "iommufd_private.h"
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) @@ -74,6 +75,7 @@ struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, struct iommufd_device *idev, bool immediate_attach) { + const struct iommu_ops *ops = dev_iommu_ops(idev->dev); struct iommufd_hw_pagetable *hwpt; int rc;
@@ -88,10 +90,19 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, refcount_inc(&ioas->obj.users); hwpt->ioas = ioas;
- hwpt->domain = iommu_domain_alloc(idev->dev->bus); - if (!hwpt->domain) { - rc = -ENOMEM; - goto out_abort; + if (ops->domain_alloc_user) { + hwpt->domain = ops->domain_alloc_user(idev->dev, 0); + if (IS_ERR(hwpt->domain)) { + rc = PTR_ERR(hwpt->domain); + hwpt->domain = NULL; + goto out_abort; + } + } else { + hwpt->domain = iommu_domain_alloc(idev->dev->bus); + if (!hwpt->domain) { + rc = -ENOMEM; + goto out_abort; + } }
/*
This extends iommufd_hw_pagetable_alloc() to accepts user flags.
Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 2 +- drivers/iommu/iommufd/hw_pagetable.c | 9 ++++++--- drivers/iommu/iommufd/iommufd_private.h | 3 ++- 3 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index ce78c3671539..e88fa73a45e6 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -540,7 +540,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, }
hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, - immediate_attach); + 0, immediate_attach); if (IS_ERR(hwpt)) { destroy_hwpt = ERR_CAST(hwpt); goto out_unlock; diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 48874f896521..5be7a31cbd9c 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -61,6 +61,7 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) * @ictx: iommufd context * @ioas: IOAS to associate the domain with * @idev: Device to get an iommu_domain for + * @flags: Flags from userspace * @immediate_attach: True if idev should be attached to the hwpt * * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT @@ -73,7 +74,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) */ struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, - struct iommufd_device *idev, bool immediate_attach) + struct iommufd_device *idev, u32 flags, + bool immediate_attach) { const struct iommu_ops *ops = dev_iommu_ops(idev->dev); struct iommufd_hw_pagetable *hwpt; @@ -91,7 +93,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, hwpt->ioas = ioas;
if (ops->domain_alloc_user) { - hwpt->domain = ops->domain_alloc_user(idev->dev, 0); + hwpt->domain = ops->domain_alloc_user(idev->dev, flags); if (IS_ERR(hwpt->domain)) { rc = PTR_ERR(hwpt->domain); hwpt->domain = NULL; @@ -166,7 +168,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) }
mutex_lock(&ioas->mutex); - hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false); + hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, + idev, cmd->flags, false); if (IS_ERR(hwpt)) { rc = PTR_ERR(hwpt); goto out_unlock; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 2c58670011fe..3064997a0181 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -242,7 +242,8 @@ struct iommufd_hw_pagetable {
struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, - struct iommufd_device *idev, bool immediate_attach); + struct iommufd_device *idev, u32 flags, + bool immediate_attach); int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt); int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev);
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, September 19, 2023 5:25 PM
This extends iommufd_hw_pagetable_alloc() to accepts user flags.
Signed-off-by: Yi Liu yi.l.liu@intel.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
This extends IOMMU_HWPT_ALLOC to allocate domains used as parent (stage-2) in nested translation.
Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 5be7a31cbd9c..26a8a818ffa3 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -83,6 +83,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
lockdep_assert_held(&ioas->mutex);
+ if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops->domain_alloc_user) + return ERR_PTR(-EOPNOTSUPP); + hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE); if (IS_ERR(hwpt)) return hwpt; @@ -154,7 +157,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) struct iommufd_ioas *ioas; int rc;
- if (cmd->flags || cmd->__reserved) + if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved) return -EOPNOTSUPP;
idev = iommufd_get_device(ucmd, cmd->dev_id);
On 9/19/23 5:25 PM, Yi Liu wrote:
This extends IOMMU_HWPT_ALLOC to allocate domains used as parent (stage-2) in nested translation.
Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 5be7a31cbd9c..26a8a818ffa3 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -83,6 +83,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, lockdep_assert_held(&ioas->mutex);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops->domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
- hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE); if (IS_ERR(hwpt)) return hwpt;
@@ -154,7 +157,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) struct iommufd_ioas *ioas; int rc;
- if (cmd->flags || cmd->__reserved)
- if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved) return -EOPNOTSUPP;
Need a parenthesis here, otherwise the compiler will interpret it as a different condition.
if ((cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT) || cmd->__reserved) return -EOPNOTSUPP;
Best regards, baolu
On 2023/9/20 13:05, Baolu Lu wrote:
On 9/19/23 5:25 PM, Yi Liu wrote:
This extends IOMMU_HWPT_ALLOC to allocate domains used as parent (stage-2) in nested translation.
Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 5be7a31cbd9c..26a8a818ffa3 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -83,6 +83,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, lockdep_assert_held(&ioas->mutex); + if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops->domain_alloc_user) + return ERR_PTR(-EOPNOTSUPP);
hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE); if (IS_ERR(hwpt)) return hwpt; @@ -154,7 +157,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) struct iommufd_ioas *ioas; int rc; - if (cmd->flags || cmd->__reserved) + if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved) return -EOPNOTSUPP;
Need a parenthesis here, otherwise the compiler will interpret it as a different condition.
if ((cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT) || cmd->__reserved) return -EOPNOTSUPP;
ok.
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, September 19, 2023 5:25 PM
@@ -83,6 +83,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
lockdep_assert_held(&ioas->mutex);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops-
domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
if (flags && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP);
as long as flags is non-zero we'll need the new alloc_user ops.
On 2023/9/26 13:32, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, September 19, 2023 5:25 PM
@@ -83,6 +83,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
lockdep_assert_held(&ioas->mutex);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops-
domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
if (flags && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP);
as long as flags is non-zero we'll need the new alloc_user ops.
yes.
This adds mock_domain_alloc_user function and also new test case for the new flag IOMMU_HWPT_ALLOC_NEST_PARENT.
Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/selftest.c | 16 +++++++++++++ tools/testing/selftests/iommu/iommufd.c | 24 +++++++++++++++---- .../selftests/iommu/iommufd_fail_nth.c | 2 +- tools/testing/selftests/iommu/iommufd_utils.h | 11 ++++++--- 4 files changed, 45 insertions(+), 8 deletions(-)
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 56506d5753f1..b54cbfb1862b 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -146,6 +146,8 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type) return info; }
+static const struct iommu_ops mock_ops; + static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) { struct mock_iommu_domain *mock; @@ -162,10 +164,23 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) mock->domain.geometry.aperture_start = MOCK_APERTURE_START; mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST; mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE; + mock->domain.ops = mock_ops.default_domain_ops; + mock->domain.type = iommu_domain_type; xa_init(&mock->pfns); return &mock->domain; }
+static struct iommu_domain * +mock_domain_alloc_user(struct device *dev, u32 flags) +{ + struct iommu_domain *domain; + + domain = mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED); + if (!domain) + domain = ERR_PTR(-ENOMEM); + return domain; +} + static void mock_domain_free(struct iommu_domain *domain) { struct mock_iommu_domain *mock = @@ -307,6 +322,7 @@ static const struct iommu_ops mock_ops = { .pgsize_bitmap = MOCK_IO_PAGE_SIZE, .hw_info = mock_domain_hw_info, .domain_alloc = mock_domain_alloc, + .domain_alloc_user = mock_domain_alloc_user, .capable = mock_domain_capable, .set_platform_dma_ops = mock_domain_set_plaform_dma_ops, .device_group = generic_device_group, diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 9f705c1ea30f..9c129e63d7c7 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -114,6 +114,7 @@ TEST_F(iommufd, cmd_length)
TEST_LENGTH(iommu_destroy, IOMMU_DESTROY); TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO); + TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC); TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC); TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES); TEST_LENGTH(iommu_ioas_allow_iovas, IOMMU_IOAS_ALLOW_IOVAS); @@ -1404,13 +1405,28 @@ TEST_F(iommufd_mock_domain, alloc_hwpt) int i;
for (i = 0; i != variant->mock_domains; i++) { + uint32_t hwpt_id[2]; uint32_t stddev_id; - uint32_t hwpt_id;
- test_cmd_hwpt_alloc(self->idev_ids[i], self->ioas_id, &hwpt_id); - test_cmd_mock_domain(hwpt_id, &stddev_id, NULL, NULL); + test_err_hwpt_alloc(EOPNOTSUPP, + self->idev_ids[i], self->ioas_id, + ~IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id[0]); + test_cmd_hwpt_alloc(self->idev_ids[i], self->ioas_id, + 0, &hwpt_id[0]); + test_cmd_hwpt_alloc(self->idev_ids[i], self->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id[1]); + + /* Do a hw_pagetable rotation test */ + test_cmd_mock_domain_replace(self->stdev_ids[i], hwpt_id[0]); + EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, hwpt_id[0])); + test_cmd_mock_domain_replace(self->stdev_ids[i], hwpt_id[1]); + EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, hwpt_id[1])); + test_cmd_mock_domain_replace(self->stdev_ids[i], self->ioas_id); + test_ioctl_destroy(hwpt_id[1]); + + test_cmd_mock_domain(hwpt_id[0], &stddev_id, NULL, NULL); test_ioctl_destroy(stddev_id); - test_ioctl_destroy(hwpt_id); + test_ioctl_destroy(hwpt_id[0]); } }
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index a220ca2a689d..3d7838506bfe 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -615,7 +615,7 @@ TEST_FAIL_NTH(basic_fail_nth, device) if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info))) return -1;
- if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, &hwpt_id)) + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id)) return -1;
if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL)) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index e0753d03ecaa..be4970a84977 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -103,10 +103,11 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id, pt_id, NULL))
static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, - __u32 *hwpt_id) + __u32 flags, __u32 *hwpt_id) { struct iommu_hwpt_alloc cmd = { .size = sizeof(cmd), + .flags = flags, .dev_id = device_id, .pt_id = pt_id, }; @@ -120,8 +121,12 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, return 0; }
-#define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id)) +#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, \ + pt_id, flags, hwpt_id)) +#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \ + EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \ + pt_id, flags, hwpt_id))
static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id)
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, September 19, 2023 5:25 PM
+static struct iommu_domain * +mock_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- domain = mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
- if (!domain)
domain = ERR_PTR(-ENOMEM);
- return domain;
+}
this also needs to validate whether the flags bits are supported.
This adds the domain_alloc_user op implementation. It supports allocating domains to be used as parent under nested translation.
Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/intel/iommu.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) return NULL; }
+static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{ + struct iommu_domain *domain; + struct intel_iommu *iommu; + + iommu = device_to_iommu(dev, NULL, NULL); + if (!iommu) + return ERR_PTR(-ENODEV); + + if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap)) + return ERR_PTR(-EOPNOTSUPP); + + domain = iommu_domain_alloc(dev->bus); + if (!domain) + domain = ERR_PTR(-ENOMEM); + return domain; +} + static void intel_iommu_domain_free(struct iommu_domain *domain) { if (domain != &si_domain->domain && domain != &blocking_domain) @@ -4807,6 +4826,7 @@ const struct iommu_ops intel_iommu_ops = { .capable = intel_iommu_capable, .hw_info = intel_iommu_hw_info, .domain_alloc = intel_iommu_domain_alloc, + .domain_alloc_user = intel_iommu_domain_alloc_user, .probe_device = intel_iommu_probe_device, .probe_finalize = intel_iommu_probe_finalize, .release_device = intel_iommu_release_device,
On 9/19/23 5:25 PM, Yi Liu wrote:
This adds the domain_alloc_user op implementation. It supports allocating domains to be used as parent under nested translation.
Documentation/process/submitting-patches.rst:
Describe your changes in imperative mood, e.g. "make xyzzy do frotz" instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy to do frotz", as if you are giving orders to the codebase to change its behaviour.
So how about,
Add the domain_alloc_user callback to support allocating domains used as parent under nested translation.
?
Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/iommu.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) return NULL; } +static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- struct intel_iommu *iommu;
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
return ERR_PTR(-ENODEV);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
return ERR_PTR(-EOPNOTSUPP);
- domain = iommu_domain_alloc(dev->bus);
No need to bounce between core and driver. Just,
intel_iommu_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
and fully initialize it before return.
- if (!domain)
domain = ERR_PTR(-ENOMEM);
- return domain;
+}
- static void intel_iommu_domain_free(struct iommu_domain *domain) { if (domain != &si_domain->domain && domain != &blocking_domain)
@@ -4807,6 +4826,7 @@ const struct iommu_ops intel_iommu_ops = { .capable = intel_iommu_capable, .hw_info = intel_iommu_hw_info, .domain_alloc = intel_iommu_domain_alloc,
- .domain_alloc_user = intel_iommu_domain_alloc_user, .probe_device = intel_iommu_probe_device, .probe_finalize = intel_iommu_probe_finalize, .release_device = intel_iommu_release_device,
Best regards, baolu
On Wed, Sep 20, 2023 at 01:28:41PM +0800, Baolu Lu wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) return NULL; } +static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- struct intel_iommu *iommu;
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
return ERR_PTR(-ENODEV);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
return ERR_PTR(-EOPNOTSUPP);
There is a check missing for supported flags
if (flags & (~IOMMU_HWPT_ALLOC_NEST_PARENT)) return ERR_PTR(-EOPNOTSUPP);
- domain = iommu_domain_alloc(dev->bus);
No need to bounce between core and driver. Just,
intel_iommu_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
and fully initialize it before return.
If you are going to do that then intel_iommu_domain_alloc() should fully initialize the domain, not here.
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Wednesday, September 20, 2023 9:05 PM
On Wed, Sep 20, 2023 at 01:28:41PM +0800, Baolu Lu wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain
*intel_iommu_domain_alloc(unsigned type)
return NULL; } +static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- struct intel_iommu *iommu;
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
return ERR_PTR(-ENODEV);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu-
ecap))
return ERR_PTR(-EOPNOTSUPP);
There is a check missing for supported flags
if (flags & (~IOMMU_HWPT_ALLOC_NEST_PARENT)) return ERR_PTR(-EOPNOTSUPP);
Well, the iommufd has such check. But I also noticed your another reply to Weijiang. So your preference is to do the flags validation in iommu driver instead of iommufd. Isn't it?
- domain = iommu_domain_alloc(dev->bus);
No need to bounce between core and driver. Just,
intel_iommu_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
and fully initialize it before return.
If you are going to do that then intel_iommu_domain_alloc() should fully initialize the domain, not here.
I've also considered what Baolu described, but it requires to do some extra initialization which is duplicated with iommu_domain_alloc(). So I chose this simple way.
Regards, Yi Liu
On Wed, Sep 20, 2023 at 01:10:04PM +0000, Liu, Yi L wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Wednesday, September 20, 2023 9:05 PM
On Wed, Sep 20, 2023 at 01:28:41PM +0800, Baolu Lu wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain
*intel_iommu_domain_alloc(unsigned type)
return NULL; } +static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- struct intel_iommu *iommu;
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
return ERR_PTR(-ENODEV);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu-
ecap))
return ERR_PTR(-EOPNOTSUPP);
There is a check missing for supported flags
if (flags & (~IOMMU_HWPT_ALLOC_NEST_PARENT)) return ERR_PTR(-EOPNOTSUPP);
Well, the iommufd has such check. But I also noticed your another reply to Weijiang. So your preference is to do the flags validation in iommu driver instead of iommufd. Isn't it?
The core code should check that only kernel known bits are set
The driver code should check that only driver supported bits are set.
Today there is only one bit so the checks are the same code.
Tomorrow when we add a new bit the checks will not be the same
Jason
On 2023/9/20 21:18, Jason Gunthorpe wrote:
On Wed, Sep 20, 2023 at 01:10:04PM +0000, Liu, Yi L wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Wednesday, September 20, 2023 9:05 PM
On Wed, Sep 20, 2023 at 01:28:41PM +0800, Baolu Lu wrote:
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain
*intel_iommu_domain_alloc(unsigned type)
return NULL;
} +static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- struct intel_iommu *iommu;
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
return ERR_PTR(-ENODEV);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu-
ecap))
return ERR_PTR(-EOPNOTSUPP);
There is a check missing for supported flags
if (flags & (~IOMMU_HWPT_ALLOC_NEST_PARENT)) return ERR_PTR(-EOPNOTSUPP);
Well, the iommufd has such check. But I also noticed your another reply to Weijiang. So your preference is to do the flags validation in iommu driver instead of iommufd. Isn't it?
The core code should check that only kernel known bits are set
The driver code should check that only driver supported bits are set.
Today there is only one bit so the checks are the same code.
Tomorrow when we add a new bit the checks will not be the same
fair enough. I'll have the check in both core and iommu driver.
if (flags & (~IOMMU_HWPT_ALLOC_NEST_PARENT)) return ERR_PTR(-EOPNOTSUPP);
On 9/20/23 9:10 PM, Liu, Yi L wrote:
- domain = iommu_domain_alloc(dev->bus);
No need to bounce between core and driver. Just,
intel_iommu_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
and fully initialize it before return.
If you are going to do that then intel_iommu_domain_alloc() should fully initialize the domain, not here.
I've also considered what Baolu described, but it requires to do some extra initialization which is duplicated with iommu_domain_alloc(). So I chose this simple way.
Okay, got you.
Once Jason's paging domain and Robin's bus->iommu_ops retirement series have landed, the VT-d driver will need some refactoring. Therefore, I'm fine with you using a simpler approach here. I'll refactor everything later.
Best regards, baolu
On 2023/9/21 09:31, Baolu Lu wrote:
On 9/20/23 9:10 PM, Liu, Yi L wrote:
+ domain = iommu_domain_alloc(dev->bus);
No need to bounce between core and driver. Just,
intel_iommu_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
and fully initialize it before return.
If you are going to do that then intel_iommu_domain_alloc() should fully initialize the domain, not here.
I've also considered what Baolu described, but it requires to do some extra initialization which is duplicated with iommu_domain_alloc(). So I chose this simple way.
Okay, got you.
Once Jason's paging domain and Robin's bus->iommu_ops retirement series have landed, the VT-d driver will need some refactoring. Therefore, I'm fine with you using a simpler approach here. I'll refactor everything later.
yes.
From: Baolu Lu baolu.lu@linux.intel.com Sent: Thursday, September 21, 2023 9:31 AM
On 9/20/23 9:10 PM, Liu, Yi L wrote:
- domain = iommu_domain_alloc(dev->bus);
No need to bounce between core and driver. Just,
intel_iommu_domain_alloc(IOMMU_DOMAIN_UNMANAGED);
and fully initialize it before return.
If you are going to do that then intel_iommu_domain_alloc() should fully initialize the domain, not here.
I've also considered what Baolu described, but it requires to do some extra initialization which is duplicated with iommu_domain_alloc(). So I chose this simple way.
Okay, got you.
Please add a comment for this temporary option.
From: Baolu Lu baolu.lu@linux.intel.com Sent: Wednesday, September 20, 2023 1:29 PM
On 9/19/23 5:25 PM, Yi Liu wrote:
This adds the domain_alloc_user op implementation. It supports allocating domains to be used as parent under nested translation.
Documentation/process/submitting-patches.rst:
Describe your changes in imperative mood, e.g. "make xyzzy do frotz" instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy to do frotz", as if you are giving orders to the codebase to change its behaviour.
So how about,
Add the domain_alloc_user callback to support allocating domains used as parent under nested translation.
Sure.
On 9/19/2023 5:25 PM, Yi Liu wrote:
This adds the domain_alloc_user op implementation. It supports allocating domains to be used as parent under nested translation.
Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/iommu.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) return NULL; } +static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- struct intel_iommu *iommu;
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
return ERR_PTR(-ENODEV);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
return ERR_PTR(-EOPNOTSUPP);
The outer caller has checked (flags & IOMMU_HWPT_ALLOC_NEST_PARENT) before it comes here. If this callback is dedicated for nested domain allocation, then you may omit the condition here.
On Wed, Sep 20, 2023 at 01:41:07PM +0800, Yang, Weijiang wrote:
On 9/19/2023 5:25 PM, Yi Liu wrote:
This adds the domain_alloc_user op implementation. It supports allocating domains to be used as parent under nested translation.
Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/iommu.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type) return NULL; } +static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- struct intel_iommu *iommu;
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
return ERR_PTR(-ENODEV);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu->ecap))
return ERR_PTR(-EOPNOTSUPP);
The outer caller has checked (flags & IOMMU_HWPT_ALLOC_NEST_PARENT) before it comes here. If this callback is dedicated for nested domain allocation, then you may omit the condition here.
No, please don't.
The point of the flags is to be passed to the driver. The driver should validate them, not the core code.
We will add more flags, I don't want to change every driver to do this.
Jason
From: Yang, Weijiang weijiang.yang@intel.com Sent: Wednesday, September 20, 2023 1:41 PM On 9/19/2023 5:25 PM, Yi Liu wrote:
This adds the domain_alloc_user op implementation. It supports allocating domains to be used as parent under nested translation.
Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/intel/iommu.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 5db283c17e0d..491bcde1ff96 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4074,6 +4074,25 @@ static struct iommu_domain
*intel_iommu_domain_alloc(unsigned type)
return NULL; }
+static struct iommu_domain * +intel_iommu_domain_alloc_user(struct device *dev, u32 flags) +{
- struct iommu_domain *domain;
- struct intel_iommu *iommu;
- iommu = device_to_iommu(dev, NULL, NULL);
- if (!iommu)
return ERR_PTR(-ENODEV);
- if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ecap_nest(iommu-
ecap))
return ERR_PTR(-EOPNOTSUPP);
The outer caller has checked (flags & IOMMU_HWPT_ALLOC_NEST_PARENT) before it comes here. If this callback is dedicated for nested domain allocation, then you may omit the condition here.
This check is different. It aims to fail the call if iommu hw does not support nested. I just realized that it may need to check if scalable mode is enabled. This should be more accurate.
Regards, Yi Liu
linux-kselftest-mirror@lists.linaro.org