Nested translation is a hardware feature that is supported by many modern IOMMU hardwares. It has two stages (stage-1, stage-2) address translation to get access to the physical address. stage-1 translation table is owned by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes to stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example, the stage-1 translation table is I/O page table. As the below diagram shows, guest I/O page table pointer in GPA (guest physical address) is passed to host and be used to perform the stage-1 address translation. Along with it, modifications to present mappings in the guest I/O page table should be followed with an IOTLB invalidation.
.-------------. .---------------------------. | vIOMMU | | Guest I/O page table | | | '---------------------------' .----------------/ | PASID Entry |--- PASID cache flush --+ '-------------' | | | V | | I/O page table pointer in GPA '-------------' Guest ------| Shadow |--------------------------|-------- v v v Host .-------------. .------------------------. | pIOMMU | | FS for GIOVA->GPA | | | '------------------------' .----------------/ | | PASID Entry | V (Nested xlate) '----------------.----------------------------------. | | | SS for GPA->HPA, unmanaged domain| | | '----------------------------------' '-------------' Where: - FS = First stage page tables - SS = Second stage page tables <Intel VT-d Nested translation>
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt) and each has an iommu_domain allocated from iommu driver. So in this series hw_pagetable and iommu_domain means the same thing if no special note. IOMMUFD has already supported allocating hw_pagetable that is linked with an IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetable with driver specific parameters and interface to sync stage-1 IOTLB as user owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1]. It first introduces new iommu op for allocating domains with user data and the op for syncing stage-1 IOTLB, and then extend the IOMMUFD internal infrastructure to accept user_data and parent hwpt, then relay the data to iommu core to allocate iommu_domain. After it, extend the ioctl IOMMU_HWPT_ALLOC to accept user data and stage-2 hwpt ID to allocate hwpt. Along with it, ioctl IOMMU_HWPT_INVALIDATE is added to invalidate stage-1 IOTLB. This is needed for user-managed hwpts. ioctl IOMMU_DEVICE_GET_HW_INFO is extended to report the supported hwpt types bitmap to user. Selftest is added as well to cover the new ioctls.
Complete code can be found in [2], QEMU could can be found in [3].
At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks them for the help. ^_^. Look forward to your feedbacks.
base-commit: 3dfe670c94c7fc4af42e5c08cdd8a110b594e18e
[1] https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.c... [2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting [3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting
Thanks, Yi Liu
Lu Baolu (2): iommu: Add new iommu op to create domains owned by userspace iommu: Add nested domain support
Nicolin Chen (5): iommufd/hw_pagetable: Do not populate user-managed hw_pagetables iommufd/selftest: Add domain_alloc_user() support in iommu mock iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (5): iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation iommufd: Pass parent hwpt and user_data to iommufd_hw_pagetable_alloc() iommufd: IOMMU_HWPT_ALLOC allocation with user data iommufd: Add IOMMU_HWPT_INVALIDATE iommufd/device: Report supported hwpt_types
drivers/iommu/iommufd/device.c | 9 +- drivers/iommu/iommufd/hw_pagetable.c | 242 +++++++++++++++++- drivers/iommu/iommufd/iommufd_private.h | 16 +- drivers/iommu/iommufd/iommufd_test.h | 30 +++ drivers/iommu/iommufd/main.c | 7 +- drivers/iommu/iommufd/selftest.c | 104 +++++++- include/linux/iommu.h | 11 + include/uapi/linux/iommufd.h | 65 +++++ tools/testing/selftests/iommu/iommufd.c | 126 ++++++++- tools/testing/selftests/iommu/iommufd_utils.h | 71 +++++ 10 files changed, 654 insertions(+), 27 deletions(-)
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new iommu_domain op to create domains owned by userspace, e.g. through iommufd. These domains have a few different properties compares to kernel owned domains:
- They may be UNMANAGED domains, but created with special parameters. For instance aperture size changes/number of levels, different IOPTE formats, or other things necessary to make a vIOMMU work
- We have to track all the memory allocations with GFP_KERNEL_ACCOUNT to make the cgroup sandbox stronger
- Device-specialty domains, such as NESTED domains can be created by iommufd.
The new op clearly says the domain is being created by IOMMUFD, that the domain is intended for userspace use, and it provides a way to pass a driver specific uAPI structure to customize the created domain to exactly what the vIOMMU userspace driver requires.
iommu drivers that cannot support VFIO/IOMMUFD should not support this op. This includes any driver that cannot provide a fully functional UNMANAGED domain.
This op chooses to make the special parameters opaque to the core. This suits the current usage model where accessing any of the IOMMU device special parameters does require a userspace driver that matches the kernel driver. If a need for common parameters, implemented similarly by several drivers, arises then there is room in the design to grow a generic parameter set as well.
This new op for now is only supposed to be used by iommufd, hence no wrapper for it. iommufd would call the callback directly. As for domain free, iommufd would use iommu_domain_free().
Suggested-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- include/linux/iommu.h | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 3ef84ee359d2..a269bc62a31c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -229,6 +229,7 @@ struct iommu_iotlb_gather { * after use. Return the data buffer if success, or ERR_PTR on * failure. * @domain_alloc: allocate iommu domain + * @domain_alloc_user: allocate user iommu domain * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling * @probe_finalize: Do final setup work after the device is added to an IOMMU @@ -266,6 +267,9 @@ struct iommu_ops {
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); + struct iommu_domain *(*domain_alloc_user)(struct device *dev, + struct iommu_domain *parent, + const void *user_data);
struct iommu_device *(*probe_device)(struct device *dev); void (*release_device)(struct device *dev);
On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 3ef84ee359d2..a269bc62a31c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @domain_alloc_user: allocate user iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
- @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -266,6 +267,9 @@ struct iommu_ops { /* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
- struct iommu_domain *(*domain_alloc_user)(struct device *dev,
struct iommu_domain *parent,
const void *user_data);
Since the kernel does the copy from user and manages the zero fill compat maybe this user_data have a union like Robin suggested.
But yes, this is the idea.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, March 10, 2023 8:56 AM
On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 3ef84ee359d2..a269bc62a31c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @domain_alloc_user: allocate user iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
- @probe_finalize: Do final setup work after the device is added to an
IOMMU
@@ -266,6 +267,9 @@ struct iommu_ops {
/* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned
iommu_domain_type);
- struct iommu_domain *(*domain_alloc_user)(struct device *dev,
struct iommu_domain
*parent,
const void *user_data);
Since the kernel does the copy from user and manages the zero fill compat maybe this user_data have a union like Robin suggested.
But yes, this is the idea.
Ok. so it's a union like the below, and in this patch may be only an empty union can be added as the struct iommu_hwpt_intel_vtd and struct iommu_hwpt_arm_smmuv3 would be added by the vendor nesting patch series.
union iommu_hwpt_alloc_user_data { struct iommu_hwpt_intel_vtd vtd; struct iommu_hwpt_arm_smmuv3 smmuv3; };
Regards, Yi Liu
Hi Jason,
On Thu, Mar 09, 2023 at 08:56:06PM -0400, Jason Gunthorpe wrote:
On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 3ef84ee359d2..a269bc62a31c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @domain_alloc_user: allocate user iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
- @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -266,6 +267,9 @@ struct iommu_ops { /* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
- struct iommu_domain *(*domain_alloc_user)(struct device *dev,
struct iommu_domain *parent,
const void *user_data);
Since the kernel does the copy from user and manages the zero fill compat maybe this user_data have a union like Robin suggested.
But yes, this is the idea.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
We pass in a read-only data to this ->domain_alloc_user() while it also returns NULL on failure, matching ->domain_alloc(). So, there seems to be no error feedback pathway from the driver to user space.
Robin remarked in the SMMU series that an STE configuration can fail. So, a proper error feedback is required for this callback too.
To return a driver/HW specific error, I think we could define a "u8 out_error" in the user_data structure. So, we probably need a non-const pass-in here. What do you think?
Thanks Nic
On Wed, Apr 12, 2023 at 05:44:04PM -0700, Nicolin Chen wrote:
Hi Jason,
On Thu, Mar 09, 2023 at 08:56:06PM -0400, Jason Gunthorpe wrote:
On Thu, Mar 09, 2023 at 12:08:59AM -0800, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 3ef84ee359d2..a269bc62a31c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -229,6 +229,7 @@ struct iommu_iotlb_gather {
after use. Return the data buffer if success, or ERR_PTR on
failure.
- @domain_alloc: allocate iommu domain
- @domain_alloc_user: allocate user iommu domain
- @probe_device: Add device to iommu driver handling
- @release_device: Remove device from iommu driver handling
- @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -266,6 +267,9 @@ struct iommu_ops { /* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
- struct iommu_domain *(*domain_alloc_user)(struct device *dev,
struct iommu_domain *parent,
const void *user_data);
Since the kernel does the copy from user and manages the zero fill compat maybe this user_data have a union like Robin suggested.
But yes, this is the idea.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
We pass in a read-only data to this ->domain_alloc_user() while it also returns NULL on failure, matching ->domain_alloc(). So, there seems to be no error feedback pathway from the driver to user space.
Robin remarked in the SMMU series that an STE configuration can fail. So, a proper error feedback is required for this callback too.
To return a driver/HW specific error, I think we could define a "u8 out_error" in the user_data structure. So, we probably need a non-const pass-in here. What do you think?
What is wrong with err_ptr?
Jason
On Thu, Apr 13, 2023 at 08:37:14AM -0300, Jason Gunthorpe wrote:
@@ -266,6 +267,9 @@ struct iommu_ops { /* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
- struct iommu_domain *(*domain_alloc_user)(struct device *dev,
struct iommu_domain *parent,
const void *user_data);
Since the kernel does the copy from user and manages the zero fill compat maybe this user_data have a union like Robin suggested.
But yes, this is the idea.
Reviewed-by: Jason Gunthorpe jgg@nvidia.com
We pass in a read-only data to this ->domain_alloc_user() while it also returns NULL on failure, matching ->domain_alloc(). So, there seems to be no error feedback pathway from the driver to user space.
Robin remarked in the SMMU series that an STE configuration can fail. So, a proper error feedback is required for this callback too.
To return a driver/HW specific error, I think we could define a "u8 out_error" in the user_data structure. So, we probably need a non-const pass-in here. What do you think?
What is wrong with err_ptr?
I see. That could keep the "const" then. Will try that.
Thanks! Nic
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new domain type for a user space I/O address, which is nested on top of another user space address represented by a UNMANAGED domain. The mappings of a nested domain are managed by user space software, therefore it's unnecessary to have map/unmap callbacks. But the updates of the PTEs in the nested domain page table must be propagated to the caches on both IOMMU (IOTLB) and devices (DevTLB).
The nested domain is allocated by the domain_alloc_user op, and attached to the device through the existing iommu_attach_device/group() interfaces.
A new domain op, named cache_invalidate_user is added for the userspace to flush the hardware caches for a nested domain through iommufd. No wrapper for it, as it's only supposed to be used by iommufd.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- include/linux/iommu.h | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a269bc62a31c..080278c8154d 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -67,6 +67,9 @@ struct iommu_domain_geometry {
#define __IOMMU_DOMAIN_SVA (1U << 4) /* Shared process address space */
+#define __IOMMU_DOMAIN_NESTED (1U << 5) /* User-managed IOVA nested on + a stage-2 translation */ + /* * This are the possible domain-types * @@ -92,6 +95,7 @@ struct iommu_domain_geometry { __IOMMU_DOMAIN_DMA_API | \ __IOMMU_DOMAIN_DMA_FQ) #define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SVA) +#define IOMMU_DOMAIN_NESTED (__IOMMU_DOMAIN_NESTED)
struct iommu_domain { unsigned type; @@ -325,6 +329,7 @@ struct iommu_ops { * @iotlb_sync_map: Sync mappings created recently using @map to the hardware * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush * queue + * @cache_invalidate_user: Flush hardware TLBs caching user space IO mappings * @iova_to_phys: translate iova to physical address * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE, * including no-snoop TLPs on PCIe or other platform @@ -354,6 +359,8 @@ struct iommu_domain_ops { size_t size); void (*iotlb_sync)(struct iommu_domain *domain, struct iommu_iotlb_gather *iotlb_gather); + void (*cache_invalidate_user)(struct iommu_domain *domain, + void *user_data);
phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova);
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, March 9, 2023 4:09 PM
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new domain type for a user space I/O address, which is nested
'a ... address'? let's call it 'user I/O page table'.
On 2023/3/17 18:25, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, March 9, 2023 4:09 PM
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new domain type for a user space I/O address, which is nested
'a ... address'? let's call it 'user I/O page table'.
Okay, sure.
Best regards, baolu
This converts IOMMUFD to use iommu_domain_alloc_user() for iommu_domain creation.
Suggested-by: Jason Gunthorpe jgg@nvidia.com Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 13bdab4c801b..84b4a11e62f8 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -5,6 +5,7 @@ #include <linux/iommu.h> #include <uapi/linux/iommufd.h>
+#include "../iommu-priv.h" #include "iommufd_private.h"
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) @@ -55,6 +56,7 @@ struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, struct iommufd_device *idev, bool immediate_attach) { + const struct iommu_ops *ops = dev_iommu_ops(idev->dev); struct iommufd_hw_pagetable *hwpt; int rc;
@@ -69,7 +71,10 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, refcount_inc(&ioas->obj.users); hwpt->ioas = ioas;
- hwpt->domain = iommu_domain_alloc(idev->dev->bus); + if (ops->domain_alloc_user) + hwpt->domain = ops->domain_alloc_user(idev->dev, NULL, NULL); + else + hwpt->domain = iommu_domain_alloc(idev->dev->bus); if (!hwpt->domain) { rc = -ENOMEM; goto out_abort;
On 3/9/23 4:09 PM, Yi Liu wrote:
This converts IOMMUFD to use iommu_domain_alloc_user() for iommu_domain creation.
Nit: I don't think this is a "convert" because iommu_domain_alloc() is still there.
Suggested-by: Jason Gunthorpejgg@nvidia.com Co-developed-by: Nicolin Chennicolinc@nvidia.com Signed-off-by: Nicolin Chennicolinc@nvidia.com Signed-off-by: Yi Liuyi.l.liu@intel.com
Anyway, I agree with the idea,
Reviewed-by: Lu Baolu baolu.lu@linux.intel.com
Best regards, baolu
Nested translation has stage-1 and stage-2 page tables. A stage-1 page table is managed by user space, and it needs to work with a stage-2 page table, which is a parent hwpt for the stage-1 hwpt.
iommu core already supports accepting parent iommu_domain and user_data to allocate an iommu_domain. This makes iommufd_hw_pagetable_alloc() to accept the parent hwpt and user_data, and relays them to iommu core, to prepare for supporting hw_pagetable allocation with user_data.
Also, add a parent pointer in struct iommufd_hw_pagetable for taking and releasing its refcount.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 2 +- drivers/iommu/iommufd/hw_pagetable.c | 28 ++++++++++++++++++++++--- drivers/iommu/iommufd/iommufd_private.h | 5 ++++- 3 files changed, 30 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 5c352807d946..19cd6df46c6a 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -558,7 +558,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, }
hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, - immediate_attach); + NULL, NULL, immediate_attach); if (IS_ERR(hwpt)) { destroy_hwpt = ERR_CAST(hwpt); goto out_unlock; diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 84b4a11e62f8..16e92a1c150b 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -24,6 +24,8 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) if (hwpt->domain) iommu_domain_free(hwpt->domain);
+ if (hwpt->parent) + refcount_dec(&hwpt->parent->obj.users); refcount_dec(&hwpt->ioas->obj.users); }
@@ -46,6 +48,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) * @ictx: iommufd context * @ioas: IOAS to associate the domain with * @idev: Device to get an iommu_domain for + * @parent: Optional parent HWPT to associate with the domain with + * @user_data: Optional user_data pointer * @immediate_attach: True if idev should be attached to the hwpt * * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT @@ -54,14 +58,20 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) */ struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, - struct iommufd_device *idev, bool immediate_attach) + struct iommufd_device *idev, + struct iommufd_hw_pagetable *parent, + void *user_data, bool immediate_attach) { const struct iommu_ops *ops = dev_iommu_ops(idev->dev); + struct iommu_domain *parent_domain = NULL; struct iommufd_hw_pagetable *hwpt; int rc;
lockdep_assert_held(&ioas->mutex);
+ if (parent && !ops->domain_alloc_user) + return ERR_PTR(-EOPNOTSUPP); + hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE); if (IS_ERR(hwpt)) return hwpt; @@ -70,9 +80,15 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, /* Pairs with iommufd_hw_pagetable_destroy() */ refcount_inc(&ioas->obj.users); hwpt->ioas = ioas; + if (parent) { + hwpt->parent = parent; + parent_domain = parent->domain; + refcount_inc(&parent->obj.users); + }
if (ops->domain_alloc_user) - hwpt->domain = ops->domain_alloc_user(idev->dev, NULL, NULL); + hwpt->domain = ops->domain_alloc_user(idev->dev, + parent_domain, user_data); else hwpt->domain = iommu_domain_alloc(idev->dev->bus); if (!hwpt->domain) { @@ -80,6 +96,11 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, goto out_abort; }
+ /* It must be either NESTED or UNMANAGED, depending on parent_domain */ + if ((parent_domain && hwpt->domain->type != IOMMU_DOMAIN_NESTED) || + (!parent_domain && hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED)) + goto out_abort; + /* * Set the coherency mode before we do iopt_table_add_domain() as some * iommus have a per-PTE bit that controls it and need to decide before @@ -150,7 +171,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) }
mutex_lock(&ioas->mutex); - hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false); + hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, + NULL, NULL, false); mutex_unlock(&ioas->mutex); if (IS_ERR(hwpt)) { rc = PTR_ERR(hwpt); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 05b5ad66f716..182c074eecdc 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -243,6 +243,7 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd); */ struct iommufd_hw_pagetable { struct iommufd_object obj; + struct iommufd_hw_pagetable *parent; struct iommufd_ioas *ioas; struct iommu_domain *domain; bool auto_domain : 1; @@ -254,7 +255,9 @@ struct iommufd_hw_pagetable {
struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, - struct iommufd_device *idev, bool immediate_attach); + struct iommufd_device *idev, + struct iommufd_hw_pagetable *parent, + void *user_data, bool immediate_attach); int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt); int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev);
On 3/9/23 4:09 PM, Yi Liu wrote:
Nested translation has stage-1 and stage-2 page tables. A stage-1 page table is managed by user space, and it needs to work with a stage-2 page table, which is a parent hwpt for the stage-1 hwpt.
iommu core already supports accepting parent iommu_domain and user_data to allocate an iommu_domain. This makes iommufd_hw_pagetable_alloc() to accept the parent hwpt and user_data, and relays them to iommu core, to prepare for supporting hw_pagetable allocation with user_data.
Also, add a parent pointer in struct iommufd_hw_pagetable for taking and releasing its refcount.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/device.c | 2 +- drivers/iommu/iommufd/hw_pagetable.c | 28 ++++++++++++++++++++++--- drivers/iommu/iommufd/iommufd_private.h | 5 ++++- 3 files changed, 30 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 5c352807d946..19cd6df46c6a 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -558,7 +558,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, } hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
immediate_attach);
if (IS_ERR(hwpt)) { destroy_hwpt = ERR_CAST(hwpt); goto out_unlock;NULL, NULL, immediate_attach);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 84b4a11e62f8..16e92a1c150b 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -24,6 +24,8 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) if (hwpt->domain) iommu_domain_free(hwpt->domain);
- if (hwpt->parent)
refcount_dec(&hwpt->ioas->obj.users); }refcount_dec(&hwpt->parent->obj.users);
@@ -46,6 +48,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
- @ictx: iommufd context
- @ioas: IOAS to associate the domain with
- @idev: Device to get an iommu_domain for
- @parent: Optional parent HWPT to associate with the domain with
- @user_data: Optional user_data pointer
- @immediate_attach: True if idev should be attached to the hwpt
- Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
@@ -54,14 +58,20 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) */ struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
struct iommufd_device *idev, bool immediate_attach)
struct iommufd_device *idev,
struct iommufd_hw_pagetable *parent,
{ const struct iommu_ops *ops = dev_iommu_ops(idev->dev);void *user_data, bool immediate_attach)
- struct iommu_domain *parent_domain = NULL; struct iommufd_hw_pagetable *hwpt; int rc;
lockdep_assert_held(&ioas->mutex);
- if (parent && !ops->domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
My understanding here is that we are checking whether domain_alloc_user is required. It seems that as long as the caller inputs a valid user_data or parent, domain_alloc_user is required. If so,
if ((user_data || parent) && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP);
Best regards, baolu
On Fri, Mar 10, 2023 at 10:10:56AM +0800, Baolu Lu wrote:
@@ -54,14 +58,20 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) */ struct iommufd_hw_pagetable * iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
struct iommufd_device *idev, bool immediate_attach)
struct iommufd_device *idev,
struct iommufd_hw_pagetable *parent,
{ const struct iommu_ops *ops = dev_iommu_ops(idev->dev);void *user_data, bool immediate_attach)
- struct iommu_domain *parent_domain = NULL; struct iommufd_hw_pagetable *hwpt; int rc; lockdep_assert_held(&ioas->mutex);
- if (parent && !ops->domain_alloc_user)
return ERR_PTR(-EOPNOTSUPP);
My understanding here is that we are checking whether domain_alloc_user is required. It seems that as long as the caller inputs a valid user_data or parent, domain_alloc_user is required. If so,
if ((user_data || parent) && !ops->domain_alloc_user) return ERR_PTR(-EOPNOTSUPP);
Yes
Jason
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, March 9, 2023 4:09 PM
- /* It must be either NESTED or UNMANAGED, depending on
parent_domain */
- if ((parent_domain && hwpt->domain->type !=
IOMMU_DOMAIN_NESTED) ||
(!parent_domain && hwpt->domain->type !=
IOMMU_DOMAIN_UNMANAGED))
goto out_abort;
WARN_ON()
On Fri, Mar 17, 2023 at 10:23:54AM +0000, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, March 9, 2023 4:09 PM
- /* It must be either NESTED or UNMANAGED, depending on
parent_domain */
- if ((parent_domain && hwpt->domain->type !=
IOMMU_DOMAIN_NESTED) ||
(!parent_domain && hwpt->domain->type !=
IOMMU_DOMAIN_UNMANAGED))
goto out_abort;
WARN_ON()
Wouldn't that be userspace triggerable? It gets to pick the hwpt used.
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Monday, March 20, 2023 8:47 PM
On Fri, Mar 17, 2023 at 10:23:54AM +0000, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, March 9, 2023 4:09 PM
- /* It must be either NESTED or UNMANAGED, depending on
parent_domain */
- if ((parent_domain && hwpt->domain->type !=
IOMMU_DOMAIN_NESTED) ||
(!parent_domain && hwpt->domain->type !=
IOMMU_DOMAIN_UNMANAGED))
goto out_abort;
WARN_ON()
Wouldn't that be userspace triggerable? It gets to pick the hwpt used.
The domain type is set by iommu driver instead of userspace. A sane driver should pass above check, if it claims to support nested.
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated, since it is managed by a guest OS. Move the iopt_table_add_domain and list_add_tail calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) return 0; }
+static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt) +{ + int rc; + + if (hwpt->parent) + hwpt = hwpt->parent; + + if (!list_empty(&hwpt->hwpt_item)) + return 0; + + rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); + if (rc) + return rc; + list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list); + return 0; +} + /** * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device * @ictx: iommufd context @@ -131,10 +148,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, goto out_unlock; }
- rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain); + rc = iommufd_hw_pagetable_link_ioas(hwpt); if (rc) goto out_detach; - list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
mutex_unlock(&idev->igroup->lock); return hwpt;
On 3/9/23 4:09 PM, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated, since it is managed by a guest OS. Move the iopt_table_add_domain and list_add_tail calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) return 0; } +static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt) +{
- int rc;
- if (hwpt->parent)
hwpt = hwpt->parent;
- if (!list_empty(&hwpt->hwpt_item))
return 0;
What is above check for? Is it "the hwpt has already been inserted into the hwpt list of its ioas in another place"?
If so, is it possible that hwpt will be deleted from the list even when this user hwpt is still linked to the ioas?
- rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
- if (rc)
return rc;
- list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
- return 0;
+}
- /**
- iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
- @ictx: iommufd context
@@ -131,10 +148,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, goto out_unlock; }
- rc = iopt_table_add_domain(&hwpt->ioas->iopt, hwpt->domain);
- rc = iommufd_hw_pagetable_link_ioas(hwpt); if (rc) goto out_detach;
- list_add_tail(&hwpt->hwpt_item, &hwpt->ioas->hwpt_list);
mutex_unlock(&idev->igroup->lock); return hwpt;
Best regards, baolu
On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
External email: Use caution opening links or attachments
On 3/9/23 4:09 PM, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated, since it is managed by a guest OS. Move the iopt_table_add_domain and list_add_tail calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) return 0; }
+static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt) +{
int rc;
if (hwpt->parent)
hwpt = hwpt->parent;
if (!list_empty(&hwpt->hwpt_item))
return 0;
What is above check for? Is it "the hwpt has already been inserted into the hwpt list of its ioas in another place"?
If so, is it possible that hwpt will be deleted from the list even when this user hwpt is still linked to the ioas?
It means that the hwpt is already linked to the ioas. And the hwpt_item can be only empty after a destroy().
With that being said, after I think it through, perhaps Yi's previous change removing it might be better. So, it could be:
------------------------------------------------------------------------------- + /* + * Only a parent hwpt needs to be linked to the IOAS. And a hwpt->parent + * must be linked to the IOAS already, when it's being allocated. + */ if (hwpt->parent) - hwpt = hwpt->parent; - - if (!list_empty(&hwpt->hwpt_item)) return 0;
-------------------------------------------------------------------------------
I was concerned about the case where a device gets attached to the nested hwpt without staging at the parent hwpt first. But, the link between the parent hwpt and the IOAS happened inside the allocation function now, not attach() any more.
Thanks Nic
On 2023/3/10 14:50, Nicolin Chen wrote:
On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
External email: Use caution opening links or attachments
On 3/9/23 4:09 PM, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated, since it is managed by a guest OS. Move the iopt_table_add_domain and list_add_tail calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) return 0; }
+static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt) +{
int rc;
if (hwpt->parent)
hwpt = hwpt->parent;
if (!list_empty(&hwpt->hwpt_item))
return 0;
What is above check for? Is it "the hwpt has already been inserted into the hwpt list of its ioas in another place"?
If so, is it possible that hwpt will be deleted from the list even when this user hwpt is still linked to the ioas?
It means that the hwpt is already linked to the ioas. And the hwpt_item can be only empty after a destroy().
With that being said, after I think it through, perhaps Yi's previous change removing it might be better. So, it could be:
- /*
* Only a parent hwpt needs to be linked to the IOAS. And a hwpt->parent
* must be linked to the IOAS already, when it's being allocated.
if (hwpt->parent)*/
hwpt = hwpt->parent;
- if (!list_empty(&hwpt->hwpt_item)) return 0;
I was concerned about the case where a device gets attached to the nested hwpt without staging at the parent hwpt first. But, the link between the parent hwpt and the IOAS happened inside the allocation function now, not attach() any more.
Yes, it's clearer.
Best regards, baolu
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, March 10, 2023 2:51 PM
On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
External email: Use caution opening links or attachments
On 3/9/23 4:09 PM, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated, since it
is
managed by a guest OS. Move the iopt_table_add_domain and
list_add_tail
calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c
b/drivers/iommu/iommufd/hw_pagetable.c
index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct
iommufd_hw_pagetable *hwpt)
return 0;
}
+static int iommufd_hw_pagetable_link_ioas(struct
iommufd_hw_pagetable *hwpt)
+{
int rc;
if (hwpt->parent)
hwpt = hwpt->parent;
if (!list_empty(&hwpt->hwpt_item))
return 0;
What is above check for? Is it "the hwpt has already been inserted into the hwpt list of its ioas in another place"?
If so, is it possible that hwpt will be deleted from the list even when this user hwpt is still linked to the ioas?
It means that the hwpt is already linked to the ioas. And the hwpt_item can be only empty after a destroy().
With that being said, after I think it through, perhaps Yi's previous change removing it might be better. So, it could be:
- /*
* Only a parent hwpt needs to be linked to the IOAS. And a hwpt-
parent
* must be linked to the IOAS already, when it's being allocated.
if (hwpt->parent)*/
hwpt = hwpt->parent;
- if (!list_empty(&hwpt->hwpt_item)) return 0;
I was concerned about the case where a device gets attached to the nested hwpt without staging at the parent hwpt first.
I think I was convinced with the reason that this helper may be called by allocation for both standalone s2 hwpt and the nested hwpt. So my change was not enough. Yours covers both cases.
But, the link between the parent hwpt and the IOAS happened inside the allocation function now, not attach() any more.
Not quite get. This helper is also called in the allocation path. Is it? Anyhow, with Jason's comment, this helper may be reworked. We can sync later on the next version.
Regards, Yi Liu
On Thu, Mar 23, 2023 at 08:06:26AM +0000, Liu, Yi L wrote:
External email: Use caution opening links or attachments
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, March 10, 2023 2:51 PM
On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
External email: Use caution opening links or attachments
On 3/9/23 4:09 PM, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated, since it
is
managed by a guest OS. Move the iopt_table_add_domain and
list_add_tail
calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c
b/drivers/iommu/iommufd/hw_pagetable.c
index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct
iommufd_hw_pagetable *hwpt)
return 0;
}
+static int iommufd_hw_pagetable_link_ioas(struct
iommufd_hw_pagetable *hwpt)
+{
int rc;
if (hwpt->parent)
hwpt = hwpt->parent;
if (!list_empty(&hwpt->hwpt_item))
return 0;
What is above check for? Is it "the hwpt has already been inserted into the hwpt list of its ioas in another place"?
If so, is it possible that hwpt will be deleted from the list even when this user hwpt is still linked to the ioas?
It means that the hwpt is already linked to the ioas. And the hwpt_item can be only empty after a destroy().
With that being said, after I think it through, perhaps Yi's previous change removing it might be better. So, it could be:
/*
* Only a parent hwpt needs to be linked to the IOAS. And a hwpt-
parent
* must be linked to the IOAS already, when it's being allocated.
*/ if (hwpt->parent)
hwpt = hwpt->parent;
if (!list_empty(&hwpt->hwpt_item)) return 0;
I was concerned about the case where a device gets attached to the nested hwpt without staging at the parent hwpt first.
I think I was convinced with the reason that this helper may be called by allocation for both standalone s2 hwpt and the nested hwpt. So my change was not enough. Yours covers both cases.
But, the link between the parent hwpt and the IOAS happened inside the allocation function now, not attach() any more.
Not quite get. This helper is also called in the allocation path. Is it? Anyhow, with Jason's comment, this helper may be reworked. We can sync later on the next version.
We previously had this link_ioas() in attach() routine so we needed to make sure hwpt->parent got populated, because the device could be attached to an S1 HWPT directly. But now this is in the alloc() routine, so by the time an S1 HWPT is being allocated, an S2 HWPT must be allocated first and populated already.
Nic
From: Nicolin Chen nicolinc@nvidia.com Sent: Thursday, March 23, 2023 4:12 PM
On Thu, Mar 23, 2023 at 08:06:26AM +0000, Liu, Yi L wrote:
External email: Use caution opening links or attachments
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, March 10, 2023 2:51 PM
On Fri, Mar 10, 2023 at 10:25:10AM +0800, Baolu Lu wrote:
External email: Use caution opening links or attachments
On 3/9/23 4:09 PM, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated,
since it
is
managed by a guest OS. Move the iopt_table_add_domain and
list_add_tail
calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 20
++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c
b/drivers/iommu/iommufd/hw_pagetable.c
index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int
iommufd_hw_pagetable_enforce_cc(struct
iommufd_hw_pagetable *hwpt)
return 0;
}
+static int iommufd_hw_pagetable_link_ioas(struct
iommufd_hw_pagetable *hwpt)
+{
int rc;
if (hwpt->parent)
hwpt = hwpt->parent;
if (!list_empty(&hwpt->hwpt_item))
return 0;
What is above check for? Is it "the hwpt has already been inserted into the hwpt list of its ioas in another place"?
If so, is it possible that hwpt will be deleted from the list even when this user hwpt is still linked to the ioas?
It means that the hwpt is already linked to the ioas. And the hwpt_item can be only empty after a destroy().
With that being said, after I think it through, perhaps Yi's previous change removing it might be better. So, it could be:
/*
* Only a parent hwpt needs to be linked to the IOAS. And a hwpt-
parent
* must be linked to the IOAS already, when it's being allocated.
*/ if (hwpt->parent)
hwpt = hwpt->parent;
if (!list_empty(&hwpt->hwpt_item)) return 0;
I was concerned about the case where a device gets attached to the nested hwpt without staging at the parent hwpt first.
I think I was convinced with the reason that this helper may be called by allocation for both standalone s2 hwpt and the nested hwpt. So my change was not enough. Yours covers both cases.
But, the link between the parent hwpt and the IOAS happened inside the allocation function now, not attach() any more.
Not quite get. This helper is also called in the allocation path. Is it? Anyhow, with Jason's comment, this helper may be reworked. We can sync later on the next version.
We previously had this link_ioas() in attach() routine so we needed to make sure hwpt->parent got populated, because the device could be attached to an S1 HWPT directly. But now this is in the alloc() routine, so by the time an S1 HWPT is being allocated, an S2 HWPT must be allocated first and populated already.
Aha, yes. 😊
On Thu, Mar 09, 2023 at 12:09:03AM -0800, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated, since it is managed by a guest OS. Move the iopt_table_add_domain and list_add_tail calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) return 0; } +static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt) +{
- int rc;
- if (hwpt->parent)
This should be:
hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED
Ie if we asked the driver to alloc a domain and it allocated an UNMANAGED domain then it means IOMMUFD manages the mappings and it should be populated from the IOAS.
Arguably drivers should EOPNOTSUPP if presented with a parent in this situation, but still this code should be clear about the purpose.
hwpt = hwpt->parent;
And we definately shouldn't touch the parent. That is already setup and owned by someone else. Just return and don't do anything.
Jason
On Fri, Mar 10, 2023 at 11:29:14AM -0400, Jason Gunthorpe wrote:
On Thu, Mar 09, 2023 at 12:09:03AM -0800, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
A user-managed hw_pagetable does not need to get populated, since it is managed by a guest OS. Move the iopt_table_add_domain and list_add_tail calls into a helper, where the hwpt pointer will be redirected to its hwpt->parent if it's available.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 16e92a1c150b..6e45ec0a66fa 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -43,6 +43,23 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) return 0; } +static int iommufd_hw_pagetable_link_ioas(struct iommufd_hw_pagetable *hwpt) +{
- int rc;
- if (hwpt->parent)
This should be:
hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED
Ie if we asked the driver to alloc a domain and it allocated an UNMANAGED domain then it means IOMMUFD manages the mappings and it should be populated from the IOAS.
OK. That looks better to me.
Arguably drivers should EOPNOTSUPP if presented with a parent in this situation, but still this code should be clear about the purpose.
hwpt = hwpt->parent;
And we definately shouldn't touch the parent. That is already setup and owned by someone else. Just return and don't do anything.
Yes.
Nic
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce. But it can only allocate hw_pagetables linked with IOAS. There are needs to support hw_pagetable allocation with parameters specified by user. For example, in nested translation, user needs to allocate hw_pagetable for the stage-1 translation (e.g. a single I/O page table or a set of I/O page tables) with user data. It also needs provide a stage-2 hw_pagetable which is linked to the GPA IOAS.
This extends IOMMU_HWPT_ALLOC to accept user specified parameter and hwpt ID in @pt_id field. Such as the user-managed stage-1 hwpt, which requires a parent hwpt to point to stage-2 translation.
enum iommu_hwpt_type is defined to differentiate the user parameters use by different usages. For the allocations that don't require user parameter, IOMMU_HWPT_TYPE_DEFAULT is defined for backward compatibility. Other types would be added by future iommu vendor driver extensions.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 94 +++++++++++++++++++++++++--- drivers/iommu/iommufd/main.c | 2 +- include/uapi/linux/iommufd.h | 30 +++++++++ 3 files changed, 115 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 6e45ec0a66fa..64e7cf7142e1 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -165,34 +165,106 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, return ERR_PTR(rc); }
+/* + * size of page table type specific data, indexed by + * enum iommu_hwpt_type. + */ +static const size_t iommufd_hwpt_alloc_data_size[] = { + [IOMMU_HWPT_TYPE_DEFAULT] = 0, +}; + int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd; - struct iommufd_hw_pagetable *hwpt; + struct iommufd_hw_pagetable *hwpt, *parent = NULL; + struct iommufd_object *pt_obj; struct iommufd_device *idev; struct iommufd_ioas *ioas; + const struct iommu_ops *ops; + void *data = NULL; + u32 klen; int rc;
- if (cmd->flags) + if (cmd->__reserved || cmd->flags) return -EOPNOTSUPP;
idev = iommufd_get_device(ucmd, cmd->dev_id); if (IS_ERR(idev)) return PTR_ERR(idev);
- ioas = iommufd_get_ioas(ucmd, cmd->pt_id); - if (IS_ERR(ioas)) { - rc = PTR_ERR(ioas); + ops = dev_iommu_ops(idev->dev); + if (!ops) { + rc = -EOPNOTSUPP; goto out_put_idev; }
+ /* Only support IOMMU_HWPT_TYPE_DEFAULT for now */ + if (cmd->data_type != IOMMU_HWPT_TYPE_DEFAULT) { + rc = -EINVAL; + goto out_put_idev; + } + + pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY); + if (IS_ERR(pt_obj)) { + rc = -EINVAL; + goto out_put_idev; + } + + switch (pt_obj->type) { + case IOMMUFD_OBJ_IOAS: + ioas = container_of(pt_obj, struct iommufd_ioas, obj); + break; + case IOMMUFD_OBJ_HW_PAGETABLE: + /* pt_id points HWPT only when data_type is !IOMMU_HWPT_TYPE_DEFAULT */ + if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT) { + rc = -EINVAL; + goto out_put_pt; + } + + parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj); + /* + * Cannot allocate user-managed hwpt linking to auto_created + * hwpt. If the parent hwpt is already a user-managed hwpt, + * don't allocate another user-managed hwpt linking to it. + */ + if (parent->auto_domain || parent->parent) { + rc = -EINVAL; + goto out_put_pt; + } + ioas = parent->ioas; + break; + default: + rc = -EINVAL; + goto out_put_pt; + } + + klen = iommufd_hwpt_alloc_data_size[cmd->data_type]; + if (klen) { + if (!cmd->data_len) { + rc = -EINVAL; + goto out_put_pt; + } + + data = kzalloc(klen, GFP_KERNEL); + if (!data) { + rc = -ENOMEM; + goto out_put_pt; + } + + rc = copy_struct_from_user(data, klen, + u64_to_user_ptr(cmd->data_uptr), + cmd->data_len); + if (rc) + goto out_free_data; + } + mutex_lock(&ioas->mutex); hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, - NULL, NULL, false); + parent, data, false); mutex_unlock(&ioas->mutex); if (IS_ERR(hwpt)) { rc = PTR_ERR(hwpt); - goto out_put_ioas; + goto out_free_data; }
cmd->out_hwpt_id = hwpt->obj.id; @@ -200,12 +272,14 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (rc) goto out_hwpt; iommufd_object_finalize(ucmd->ictx, &hwpt->obj); - goto out_put_ioas; + goto out_free_data;
out_hwpt: iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj); -out_put_ioas: - iommufd_put_object(&ioas->obj); +out_free_data: + kfree(data); +out_put_pt: + iommufd_put_object(pt_obj); out_put_idev: iommufd_put_object(&idev->obj); return rc; diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index f079c0bda46b..7ab1e2c638a1 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -295,7 +295,7 @@ struct iommufd_ioctl_op { static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, - __reserved), + data_uptr), IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info, struct iommu_hw_info, __reserved), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 4ac525897b82..48781ff40a37 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -347,6 +347,14 @@ struct iommu_vfio_ioas { }; #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS)
+/** + * enum iommu_hwpt_type - IOMMU HWPT Type + * @IOMMU_HWPT_TYPE_DEFAULT: default + */ +enum iommu_hwpt_type { + IOMMU_HWPT_TYPE_DEFAULT, +}; + /** * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC) * @size: sizeof(struct iommu_hwpt_alloc) @@ -355,12 +363,31 @@ struct iommu_vfio_ioas { * @pt_id: The IOAS to connect this HWPT to * @out_hwpt_id: The ID of the new HWPT * @__reserved: Must be 0 + * @data_type: One of enum iommu_hwpt_type + * @data_len: Length of the type specific data + * @data_uptr: User pointer to the type specific data * * Explicitly allocate a hardware page table object. This is the same object * type that is returned by iommufd_device_attach() and represents the * underlying iommu driver's iommu_domain kernel object. * * A normal HWPT will be created with the mappings from the given IOAS. + * The @data_type for its allocation can be set to IOMMU_HWPT_TYPE_DEFAULT, or + * another type (being listed below) to specialize a kernel-managed HWPT. + * + * A user-managed HWPT will be created from a given parent HWPT via @pt_id, in + * which the parent HWPT must be allocated previously via the same ioctl from a + * given IOAS. The @data_type must not be set to IOMMU_HWPT_TYPE_DEFAULT but a + * pre-defined type corresponding to the underlying IOMMU hardware. + * + * If the @data_type is set to IOMMU_HWPT_TYPE_DEFAULT, both the @data_len and + * the @data_uptr will be ignored. Otherwise, both of them must be given. + * + * +==============================+=====================================+===========+ + * | @data_type | Data structure in @data_uptr | @pt_id | + * +------------------------------+-------------------------------------+-----------+ + * | IOMMU_HWPT_TYPE_DEFAULT | N/A | IOAS | + * +------------------------------+-------------------------------------+-----------+ */ struct iommu_hwpt_alloc { __u32 size; @@ -369,6 +396,9 @@ struct iommu_hwpt_alloc { __u32 pt_id; __u32 out_hwpt_id; __u32 __reserved; + __u32 data_type; + __u32 data_len; + __aligned_u64 data_uptr; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
On 3/9/23 4:09 PM, Yi Liu wrote:
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for usersapce. But it can only allocate hw_pagetables linked with IOAS. There are needs to support hw_pagetable allocation with parameters specified by user. For example, in nested translation, user needs to allocate hw_pagetable for the stage-1 translation (e.g. a single I/O page table or a set of I/O page tables) with user data. It also needs provide a stage-2 hw_pagetable which is linked to the GPA IOAS.
This extends IOMMU_HWPT_ALLOC to accept user specified parameter and hwpt ID in @pt_id field. Such as the user-managed stage-1 hwpt, which requires a parent hwpt to point to stage-2 translation.
enum iommu_hwpt_type is defined to differentiate the user parameters use by different usages. For the allocations that don't require user parameter, IOMMU_HWPT_TYPE_DEFAULT is defined for backward compatibility. Other types would be added by future iommu vendor driver extensions.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 94 +++++++++++++++++++++++++--- drivers/iommu/iommufd/main.c | 2 +- include/uapi/linux/iommufd.h | 30 +++++++++ 3 files changed, 115 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 6e45ec0a66fa..64e7cf7142e1 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -165,34 +165,106 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, return ERR_PTR(rc); } +/*
- size of page table type specific data, indexed by
- enum iommu_hwpt_type.
- */
+static const size_t iommufd_hwpt_alloc_data_size[] = {
- [IOMMU_HWPT_TYPE_DEFAULT] = 0,
+};
- int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
- struct iommufd_hw_pagetable *hwpt, *parent = NULL;
- struct iommufd_object *pt_obj; struct iommufd_device *idev; struct iommufd_ioas *ioas;
- const struct iommu_ops *ops;
- void *data = NULL;
- u32 klen; int rc;
Reverse Christmas tree format. Ditto to other places in this file.
- if (cmd->flags)
- if (cmd->__reserved || cmd->flags) return -EOPNOTSUPP;
idev = iommufd_get_device(ucmd, cmd->dev_id); if (IS_ERR(idev)) return PTR_ERR(idev);
- ioas = iommufd_get_ioas(ucmd, cmd->pt_id);
- if (IS_ERR(ioas)) {
rc = PTR_ERR(ioas);
- ops = dev_iommu_ops(idev->dev);
- if (!ops) {
goto out_put_idev; }rc = -EOPNOTSUPP;
No need to check. dev_iommu_ops() will never returns a NULL.
- /* Only support IOMMU_HWPT_TYPE_DEFAULT for now */
- if (cmd->data_type != IOMMU_HWPT_TYPE_DEFAULT) {
rc = -EINVAL;
goto out_put_idev;
- }
- pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY);
- if (IS_ERR(pt_obj)) {
rc = -EINVAL;
goto out_put_idev;
- }
- switch (pt_obj->type) {
- case IOMMUFD_OBJ_IOAS:
ioas = container_of(pt_obj, struct iommufd_ioas, obj);
break;
- case IOMMUFD_OBJ_HW_PAGETABLE:
/* pt_id points HWPT only when data_type is !IOMMU_HWPT_TYPE_DEFAULT */
if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT) {
rc = -EINVAL;
goto out_put_pt;
}
parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj);
/*
* Cannot allocate user-managed hwpt linking to auto_created
* hwpt. If the parent hwpt is already a user-managed hwpt,
* don't allocate another user-managed hwpt linking to it.
*/
if (parent->auto_domain || parent->parent) {
rc = -EINVAL;
goto out_put_pt;
}
ioas = parent->ioas;
break;
- default:
rc = -EINVAL;
goto out_put_pt;
- }
- klen = iommufd_hwpt_alloc_data_size[cmd->data_type];
- if (klen) {
if (!cmd->data_len) {
rc = -EINVAL;
goto out_put_pt;
}
Is the user_data still valid if (cmd->data_len < klen)?
data = kzalloc(klen, GFP_KERNEL);
if (!data) {
rc = -ENOMEM;
goto out_put_pt;
}
rc = copy_struct_from_user(data, klen,
u64_to_user_ptr(cmd->data_uptr),
cmd->data_len);
if (rc)
goto out_free_data;
- }
- mutex_lock(&ioas->mutex); hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
NULL, NULL, false);
mutex_unlock(&ioas->mutex); if (IS_ERR(hwpt)) { rc = PTR_ERR(hwpt);parent, data, false);
goto out_put_ioas;
}goto out_free_data;
cmd->out_hwpt_id = hwpt->obj.id; @@ -200,12 +272,14 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (rc) goto out_hwpt; iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
- goto out_put_ioas;
- goto out_free_data;
out_hwpt: iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj); -out_put_ioas:
- iommufd_put_object(&ioas->obj);
+out_free_data:
- kfree(data);
+out_put_pt:
- iommufd_put_object(pt_obj); out_put_idev: iommufd_put_object(&idev->obj); return rc;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index f079c0bda46b..7ab1e2c638a1 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -295,7 +295,7 @@ struct iommufd_ioctl_op { static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct iommu_destroy, id), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc,
__reserved),
IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info, struct iommu_hw_info, __reserved), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl,data_uptr),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 4ac525897b82..48781ff40a37 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -347,6 +347,14 @@ struct iommu_vfio_ioas { }; #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VFIO_IOAS) +/**
- enum iommu_hwpt_type - IOMMU HWPT Type
- @IOMMU_HWPT_TYPE_DEFAULT: default
- */
+enum iommu_hwpt_type {
- IOMMU_HWPT_TYPE_DEFAULT,
+};
- /**
- struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
- @size: sizeof(struct iommu_hwpt_alloc)
@@ -355,12 +363,31 @@ struct iommu_vfio_ioas {
- @pt_id: The IOAS to connect this HWPT to
- @out_hwpt_id: The ID of the new HWPT
- @__reserved: Must be 0
- @data_type: One of enum iommu_hwpt_type
- @data_len: Length of the type specific data
- @data_uptr: User pointer to the type specific data
- Explicitly allocate a hardware page table object. This is the same object
- type that is returned by iommufd_device_attach() and represents the
- underlying iommu driver's iommu_domain kernel object.
- A normal HWPT will be created with the mappings from the given IOAS.
- The @data_type for its allocation can be set to IOMMU_HWPT_TYPE_DEFAULT, or
- another type (being listed below) to specialize a kernel-managed HWPT.
- A user-managed HWPT will be created from a given parent HWPT via @pt_id, in
- which the parent HWPT must be allocated previously via the same ioctl from a
- given IOAS. The @data_type must not be set to IOMMU_HWPT_TYPE_DEFAULT but a
- pre-defined type corresponding to the underlying IOMMU hardware.
- If the @data_type is set to IOMMU_HWPT_TYPE_DEFAULT, both the @data_len and
- the @data_uptr will be ignored. Otherwise, both of them must be given.
- +==============================+=====================================+===========+
- | @data_type | Data structure in @data_uptr | @pt_id |
- +------------------------------+-------------------------------------+-----------+
- | IOMMU_HWPT_TYPE_DEFAULT | N/A | IOAS |
*/ struct iommu_hwpt_alloc { __u32 size;
- +------------------------------+-------------------------------------+-----------+
@@ -369,6 +396,9 @@ struct iommu_hwpt_alloc { __u32 pt_id; __u32 out_hwpt_id; __u32 __reserved;
- __u32 data_type;
- __u32 data_len;
- __aligned_u64 data_uptr; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
Best regards, baolu
From: Baolu Lu baolu.lu@linux.intel.com Sent: Friday, March 10, 2023 11:02 AM
On 3/9/23 4:09 PM, Yi Liu wrote:
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for
usersapce.
But it can only allocate hw_pagetables linked with IOAS. There are needs to support hw_pagetable allocation with parameters specified by user. For example, in nested translation, user needs to allocate hw_pagetable for the stage-1 translation (e.g. a single I/O page table or a set of I/O page tables) with user data. It also needs provide a stage-2 hw_pagetable
which
is linked to the GPA IOAS.
This extends IOMMU_HWPT_ALLOC to accept user specified parameter
and hwpt
ID in @pt_id field. Such as the user-managed stage-1 hwpt, which requires a parent hwpt to point to stage-2 translation.
enum iommu_hwpt_type is defined to differentiate the user parameters
use
by different usages. For the allocations that don't require user parameter, IOMMU_HWPT_TYPE_DEFAULT is defined for backward compatibility.
Other types
would be added by future iommu vendor driver extensions.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 94
+++++++++++++++++++++++++---
drivers/iommu/iommufd/main.c | 2 +- include/uapi/linux/iommufd.h | 30 +++++++++ 3 files changed, 115 insertions(+), 11 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c
b/drivers/iommu/iommufd/hw_pagetable.c
index 6e45ec0a66fa..64e7cf7142e1 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -165,34 +165,106 @@ iommufd_hw_pagetable_alloc(struct
iommufd_ctx *ictx, struct iommufd_ioas *ioas,
return ERR_PTR(rc); }
+/*
- size of page table type specific data, indexed by
- enum iommu_hwpt_type.
- */
+static const size_t iommufd_hwpt_alloc_data_size[] = {
- [IOMMU_HWPT_TYPE_DEFAULT] = 0,
+};
- int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
- struct iommufd_hw_pagetable *hwpt, *parent = NULL;
- struct iommufd_object *pt_obj; struct iommufd_device *idev; struct iommufd_ioas *ioas;
- const struct iommu_ops *ops;
- void *data = NULL;
- u32 klen; int rc;
Reverse Christmas tree format. Ditto to other places in this file.
Got it.
- if (cmd->flags)
if (cmd->__reserved || cmd->flags) return -EOPNOTSUPP;
idev = iommufd_get_device(ucmd, cmd->dev_id); if (IS_ERR(idev)) return PTR_ERR(idev);
- ioas = iommufd_get_ioas(ucmd, cmd->pt_id);
- if (IS_ERR(ioas)) {
rc = PTR_ERR(ioas);
- ops = dev_iommu_ops(idev->dev);
- if (!ops) {
goto out_put_idev; }rc = -EOPNOTSUPP;
No need to check. dev_iommu_ops() will never returns a NULL.
Got it.
- /* Only support IOMMU_HWPT_TYPE_DEFAULT for now */
- if (cmd->data_type != IOMMU_HWPT_TYPE_DEFAULT) {
rc = -EINVAL;
goto out_put_idev;
- }
- pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id,
IOMMUFD_OBJ_ANY);
- if (IS_ERR(pt_obj)) {
rc = -EINVAL;
goto out_put_idev;
- }
- switch (pt_obj->type) {
- case IOMMUFD_OBJ_IOAS:
ioas = container_of(pt_obj, struct iommufd_ioas, obj);
break;
- case IOMMUFD_OBJ_HW_PAGETABLE:
/* pt_id points HWPT only when data_type
is !IOMMU_HWPT_TYPE_DEFAULT */
if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT) {
rc = -EINVAL;
goto out_put_pt;
}
parent = container_of(pt_obj, struct
iommufd_hw_pagetable, obj);
/*
* Cannot allocate user-managed hwpt linking to
auto_created
* hwpt. If the parent hwpt is already a user-managed hwpt,
* don't allocate another user-managed hwpt linking to it.
*/
if (parent->auto_domain || parent->parent) {
rc = -EINVAL;
goto out_put_pt;
}
ioas = parent->ioas;
break;
- default:
rc = -EINVAL;
goto out_put_pt;
- }
- klen = iommufd_hwpt_alloc_data_size[cmd->data_type];
- if (klen) {
if (!cmd->data_len) {
rc = -EINVAL;
goto out_put_pt;
}
Is the user_data still valid if (cmd->data_len < klen)?
Yes, one day if kernel has new fields added in the end of the uapi structure but qemu is using an old uapi header. In such case, the extra byte should be zeroed.
data = kzalloc(klen, GFP_KERNEL);
if (!data) {
rc = -ENOMEM;
goto out_put_pt;
}
rc = copy_struct_from_user(data, klen,
u64_to_user_ptr(cmd->data_uptr),
cmd->data_len);
if (rc)
goto out_free_data;
- }
- mutex_lock(&ioas->mutex); hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev,
NULL, NULL, false);
mutex_unlock(&ioas->mutex); if (IS_ERR(hwpt)) { rc = PTR_ERR(hwpt);parent, data, false);
goto out_put_ioas;
goto out_free_data;
}
cmd->out_hwpt_id = hwpt->obj.id;
@@ -200,12 +272,14 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd
*ucmd)
if (rc) goto out_hwpt; iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
- goto out_put_ioas;
goto out_free_data;
out_hwpt: iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
-out_put_ioas:
- iommufd_put_object(&ioas->obj);
+out_free_data:
- kfree(data);
+out_put_pt:
- iommufd_put_object(pt_obj); out_put_idev: iommufd_put_object(&idev->obj); return rc;
diff --git a/drivers/iommu/iommufd/main.c
b/drivers/iommu/iommufd/main.c
index f079c0bda46b..7ab1e2c638a1 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -295,7 +295,7 @@ struct iommufd_ioctl_op { static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_DESTROY, iommufd_destroy, struct
iommu_destroy, id),
IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct
iommu_hwpt_alloc,
__reserved),
IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO,data_uptr),
iommufd_device_get_hw_info,
struct iommu_hw_info, __reserved),
IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 4ac525897b82..48781ff40a37 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -347,6 +347,14 @@ struct iommu_vfio_ioas { }; #define IOMMU_VFIO_IOAS _IO(IOMMUFD_TYPE,
IOMMUFD_CMD_VFIO_IOAS)
+/**
- enum iommu_hwpt_type - IOMMU HWPT Type
- @IOMMU_HWPT_TYPE_DEFAULT: default
- */
+enum iommu_hwpt_type {
- IOMMU_HWPT_TYPE_DEFAULT,
+};
- /**
- struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
- @size: sizeof(struct iommu_hwpt_alloc)
@@ -355,12 +363,31 @@ struct iommu_vfio_ioas {
- @pt_id: The IOAS to connect this HWPT to
- @out_hwpt_id: The ID of the new HWPT
- @__reserved: Must be 0
- @data_type: One of enum iommu_hwpt_type
- @data_len: Length of the type specific data
- @data_uptr: User pointer to the type specific data
- Explicitly allocate a hardware page table object. This is the same
object
- type that is returned by iommufd_device_attach() and represents the
- underlying iommu driver's iommu_domain kernel object.
- A normal HWPT will be created with the mappings from the given IOAS.
- The @data_type for its allocation can be set to
IOMMU_HWPT_TYPE_DEFAULT, or
- another type (being listed below) to specialize a kernel-managed
HWPT.
- A user-managed HWPT will be created from a given parent HWPT via
@pt_id, in
- which the parent HWPT must be allocated previously via the same ioctl
from a
- given IOAS. The @data_type must not be set to
IOMMU_HWPT_TYPE_DEFAULT but a
- pre-defined type corresponding to the underlying IOMMU hardware.
- If the @data_type is set to IOMMU_HWPT_TYPE_DEFAULT, both the
@data_len and
- the @data_uptr will be ignored. Otherwise, both of them must be
given.
+==============================+============================= ========+===========+
- | @data_type | Data structure in @data_uptr | @pt_id
|
- +------------------------------+-------------------------------------+-----------+
- | IOMMU_HWPT_TYPE_DEFAULT | N/A | IOAS
|
*/ struct iommu_hwpt_alloc { __u32 size;
- +------------------------------+-------------------------------------+-----------+
@@ -369,6 +396,9 @@ struct iommu_hwpt_alloc { __u32 pt_id; __u32 out_hwpt_id; __u32 __reserved;
- __u32 data_type;
- __u32 data_len;
- __aligned_u64 data_uptr; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE,
IOMMUFD_CMD_HWPT_ALLOC)
Best regards, baolu
In nested translation, the stage-1 page table is user-managed and used by IOMMU hardware, so destroying mappings in the stage-1 page table should be followed with an IOTLB invalidation.
This adds IOMMU_HWPT_INVALIDATE for IOTLB invalidation.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 56 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 9 ++++ drivers/iommu/iommufd/main.c | 3 ++ include/uapi/linux/iommufd.h | 27 ++++++++++++ 4 files changed, 95 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 64e7cf7142e1..67facca98de1 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -284,3 +284,59 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) iommufd_put_object(&idev->obj); return rc; } + +/* + * size of page table type specific invalidate_info, indexed by + * enum iommu_hwpt_type. + */ +static const size_t iommufd_hwpt_invalidate_info_size[] = {}; + +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{ + struct iommu_hwpt_invalidate *cmd = ucmd->cmd; + struct iommufd_hw_pagetable *hwpt; + u64 user_ptr; + u32 user_data_len, klen; + int rc = 0; + + /* + * For a user-managed HWPT, type should not be IOMMU_HWPT_TYPE_DEFAULT. + * data_len should not exceed the size of iommufd_invalidate_buffer. + */ + if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd->data_len || + cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size)) + return -EOPNOTSUPP; + + hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id); + if (IS_ERR(hwpt)) + return PTR_ERR(hwpt); + + /* Do not allow any kernel-managed hw_pagetable */ + if (!hwpt->parent) { + rc = -EINVAL; + goto out_put_hwpt; + } + + klen = iommufd_hwpt_invalidate_info_size[cmd->data_type]; + if (!klen) { + rc = -EINVAL; + goto out_put_hwpt; + } + + /* + * Copy the needed fields before reusing the ucmd buffer, this + * avoids memory allocation in this path. + */ + user_ptr = cmd->data_uptr; + user_data_len = cmd->data_len; + + rc = copy_struct_from_user(cmd, klen, + u64_to_user_ptr(user_ptr), user_data_len); + if (rc) + goto out_put_hwpt; + + hwpt->domain->ops->cache_invalidate_user(hwpt->domain, cmd); +out_put_hwpt: + iommufd_put_object(&hwpt->obj); + return rc; +} diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 182c074eecdc..d879264d1acf 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -265,6 +265,7 @@ struct iommufd_hw_pagetable * iommufd_hw_pagetable_detach(struct iommufd_device *idev); void iommufd_hw_pagetable_destroy(struct iommufd_object *obj); int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd); +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) @@ -276,6 +277,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, refcount_dec(&hwpt->obj.users); }
+static inline struct iommufd_hw_pagetable * +iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_HW_PAGETABLE), + struct iommufd_hw_pagetable, obj); +} + struct iommufd_group { struct kref ref; struct mutex lock; diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 7ab1e2c638a1..2cf45f65b637 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -263,6 +263,7 @@ union ucmd_buffer { struct iommu_destroy destroy; struct iommu_hwpt_alloc hwpt; struct iommu_hw_info info; + struct iommu_hwpt_invalidate cache; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy; @@ -298,6 +299,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { data_uptr), IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info, struct iommu_hw_info, __reserved), + IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate, + struct iommu_hwpt_invalidate, data_uptr), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 48781ff40a37..d0962c41f8d6 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -47,6 +47,7 @@ enum { IOMMUFD_CMD_VFIO_IOAS, IOMMUFD_CMD_HWPT_ALLOC, IOMMUFD_CMD_DEVICE_GET_HW_INFO, + IOMMUFD_CMD_HWPT_INVALIDATE, };
/** @@ -447,4 +448,30 @@ struct iommu_hw_info { __u32 __reserved; }; #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO) + +/** + * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE) + * @size: sizeof(struct iommu_hwpt_invalidate) + * @hwpt_id: HWPT ID of target hardware page table for the invalidation + * @data_type: One of enum iommu_hwpt_type + * @data_len: Length of the type specific data + * @data_uptr: User pointer to the type specific data + * + * Invalidate the iommu cache for user-managed page table. Modifications + * on user-managed page table should be followed with this operation to + * sync the IOTLB. This is only needed by user-managed hw_pagetables, so + * the @data_type should never be IOMMU_HWPT_TYPE_DEFAULT. + * + * +==============================+========================================+ + * | @data_type | Data structure in @data_uptr | + * +------------------------------+----------------------------------------+ + */ +struct iommu_hwpt_invalidate { + __u32 size; + __u32 hwpt_id; + __u32 data_type; + __u32 data_len; + __aligned_u64 data_uptr; +}; +#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE) #endif
On 3/9/23 4:09 PM, Yi Liu wrote:
In nested translation, the stage-1 page table is user-managed and used by IOMMU hardware, so destroying mappings in the stage-1 page table should be followed with an IOTLB invalidation.
s/destroying mappings/update of any present page table entry/
This adds IOMMU_HWPT_INVALIDATE for IOTLB invalidation.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 56 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 9 ++++ drivers/iommu/iommufd/main.c | 3 ++ include/uapi/linux/iommufd.h | 27 ++++++++++++ 4 files changed, 95 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 64e7cf7142e1..67facca98de1 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -284,3 +284,59 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) iommufd_put_object(&idev->obj); return rc; }
+/*
- size of page table type specific invalidate_info, indexed by
- enum iommu_hwpt_type.
- */
+static const size_t iommufd_hwpt_invalidate_info_size[] = {};
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{
- struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
- u64 user_ptr;
- u32 user_data_len, klen;
- int rc = 0;
- /*
* For a user-managed HWPT, type should not be IOMMU_HWPT_TYPE_DEFAULT.
* data_len should not exceed the size of iommufd_invalidate_buffer.
*/
- if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd->data_len ||
cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
"data_len should not exceed the size of iommufd_invalidate_buffer."
How is this checked?
return -EOPNOTSUPP;
- hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
- if (IS_ERR(hwpt))
return PTR_ERR(hwpt);
- /* Do not allow any kernel-managed hw_pagetable */
- if (!hwpt->parent) {
rc = -EINVAL;
goto out_put_hwpt;
- }
- klen = iommufd_hwpt_invalidate_info_size[cmd->data_type];
- if (!klen) {
rc = -EINVAL;
goto out_put_hwpt;
- }
- /*
* Copy the needed fields before reusing the ucmd buffer, this
* avoids memory allocation in this path.
*/
- user_ptr = cmd->data_uptr;
- user_data_len = cmd->data_len;
Is it a valid case if "user_data_len < klen"?
- rc = copy_struct_from_user(cmd, klen,
u64_to_user_ptr(user_ptr), user_data_len);
- if (rc)
goto out_put_hwpt;
- hwpt->domain->ops->cache_invalidate_user(hwpt->domain, cmd);
+out_put_hwpt:
- iommufd_put_object(&hwpt->obj);
- return rc;
+} diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 182c074eecdc..d879264d1acf 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -265,6 +265,7 @@ struct iommufd_hw_pagetable * iommufd_hw_pagetable_detach(struct iommufd_device *idev); void iommufd_hw_pagetable_destroy(struct iommufd_object *obj); int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd); +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd); static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) @@ -276,6 +277,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, refcount_dec(&hwpt->obj.users); } +static inline struct iommufd_hw_pagetable * +iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id) +{
- return container_of(iommufd_get_object(ucmd->ictx, id,
IOMMUFD_OBJ_HW_PAGETABLE),
struct iommufd_hw_pagetable, obj);
+}
- struct iommufd_group { struct kref ref; struct mutex lock;
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 7ab1e2c638a1..2cf45f65b637 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -263,6 +263,7 @@ union ucmd_buffer { struct iommu_destroy destroy; struct iommu_hwpt_alloc hwpt; struct iommu_hw_info info;
- struct iommu_hwpt_invalidate cache; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy;
@@ -298,6 +299,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { data_uptr), IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info, struct iommu_hw_info, __reserved),
- IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate,
IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,struct iommu_hwpt_invalidate, data_uptr),
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 48781ff40a37..d0962c41f8d6 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -47,6 +47,7 @@ enum { IOMMUFD_CMD_VFIO_IOAS, IOMMUFD_CMD_HWPT_ALLOC, IOMMUFD_CMD_DEVICE_GET_HW_INFO,
- IOMMUFD_CMD_HWPT_INVALIDATE, };
/** @@ -447,4 +448,30 @@ struct iommu_hw_info { __u32 __reserved; }; #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
+/**
- struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
- @size: sizeof(struct iommu_hwpt_invalidate)
- @hwpt_id: HWPT ID of target hardware page table for the invalidation
- @data_type: One of enum iommu_hwpt_type
- @data_len: Length of the type specific data
- @data_uptr: User pointer to the type specific data
- Invalidate the iommu cache for user-managed page table. Modifications
- on user-managed page table should be followed with this operation to
- sync the IOTLB. This is only needed by user-managed hw_pagetables, so
- the @data_type should never be IOMMU_HWPT_TYPE_DEFAULT.
- +==============================+========================================+
- | @data_type | Data structure in @data_uptr |
- +------------------------------+----------------------------------------+
- */
+struct iommu_hwpt_invalidate {
- __u32 size;
- __u32 hwpt_id;
- __u32 data_type;
- __u32 data_len;
- __aligned_u64 data_uptr;
+}; +#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE) #endif
Best regards, baolu
From: Baolu Lu baolu.lu@linux.intel.com Sent: Friday, March 10, 2023 11:16 AM
On 3/9/23 4:09 PM, Yi Liu wrote:
In nested translation, the stage-1 page table is user-managed and used by IOMMU hardware, so destroying mappings in the stage-1 page table
should
be followed with an IOTLB invalidation.
s/destroying mappings/update of any present page table entry/
Right. Not only destroying.
This adds IOMMU_HWPT_INVALIDATE for IOTLB invalidation.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 56
+++++++++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 9 ++++ drivers/iommu/iommufd/main.c | 3 ++ include/uapi/linux/iommufd.h | 27 ++++++++++++ 4 files changed, 95 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c
b/drivers/iommu/iommufd/hw_pagetable.c
index 64e7cf7142e1..67facca98de1 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -284,3 +284,59 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd
*ucmd)
iommufd_put_object(&idev->obj); return rc; }
+/*
- size of page table type specific invalidate_info, indexed by
- enum iommu_hwpt_type.
- */
+static const size_t iommufd_hwpt_invalidate_info_size[] = {};
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{
- struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
- u64 user_ptr;
- u32 user_data_len, klen;
- int rc = 0;
- /*
* For a user-managed HWPT, type should not be
IOMMU_HWPT_TYPE_DEFAULT.
* data_len should not exceed the size of
iommufd_invalidate_buffer.
*/
- if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd-
data_len ||
cmd->data_type >=
ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
"data_len should not exceed the size of iommufd_invalidate_buffer."
How is this checked?
Hmmm, this is a stale comment I suppose.
return -EOPNOTSUPP;
- hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id);
- if (IS_ERR(hwpt))
return PTR_ERR(hwpt);
- /* Do not allow any kernel-managed hw_pagetable */
- if (!hwpt->parent) {
rc = -EINVAL;
goto out_put_hwpt;
- }
- klen = iommufd_hwpt_invalidate_info_size[cmd->data_type];
- if (!klen) {
rc = -EINVAL;
goto out_put_hwpt;
- }
- /*
* Copy the needed fields before reusing the ucmd buffer, this
* avoids memory allocation in this path.
*/
- user_ptr = cmd->data_uptr;
- user_data_len = cmd->data_len;
Is it a valid case if "user_data_len < klen"?
Yes. e.g. an old qemu running on a new kernel which has new field added in the end of the data structure.
Regards, Yi Liu
On Thu, Mar 09, 2023 at 12:09:05AM -0800, Yi Liu wrote:
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{
- struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
- u64 user_ptr;
- u32 user_data_len, klen;
- int rc = 0;
- /*
* For a user-managed HWPT, type should not be IOMMU_HWPT_TYPE_DEFAULT.
* data_len should not exceed the size of iommufd_invalidate_buffer.
*/
- if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd->data_len ||
cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
return -EOPNOTSUPP;
This needs to do the standard check for zeros in unknown trailing data bit. Check that alloc does it too
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Saturday, March 11, 2023 1:50 AM
On Thu, Mar 09, 2023 at 12:09:05AM -0800, Yi Liu wrote:
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{
- struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
- u64 user_ptr;
- u32 user_data_len, klen;
- int rc = 0;
- /*
* For a user-managed HWPT, type should not be
IOMMU_HWPT_TYPE_DEFAULT.
* data_len should not exceed the size of
iommufd_invalidate_buffer.
*/
- if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd-
data_len ||
cmd->data_type >=
ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
return -EOPNOTSUPP;
This needs to do the standard check for zeros in unknown trailing data bit. Check that alloc does it too
Yes. would add it in both path.
Regards, Yi Liu
From: Jason Gunthorpe jgg@nvidia.com Sent: Saturday, March 11, 2023 1:50 AM
On Thu, Mar 09, 2023 at 12:09:05AM -0800, Yi Liu wrote:
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{
- struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
- u64 user_ptr;
- u32 user_data_len, klen;
- int rc = 0;
- /*
* For a user-managed HWPT, type should not be
IOMMU_HWPT_TYPE_DEFAULT.
* data_len should not exceed the size of
iommufd_invalidate_buffer.
*/
- if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd-
data_len ||
cmd->data_type >=
ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
return -EOPNOTSUPP;
This needs to do the standard check for zeros in unknown trailing data bit. Check that alloc does it too
Maybe it has been covered by the copy_struct_from_user(). Is it?
+ /* + * Copy the needed fields before reusing the ucmd buffer, this + * avoids memory allocation in this path. + */ + user_ptr = cmd->data_uptr; + user_data_len = cmd->data_len; + + rc = copy_struct_from_user(cmd, klen, + u64_to_user_ptr(user_ptr), user_data_len);
Regards, Yi Liu
On Tue, Mar 14, 2023 at 04:18:21AM +0000, Liu, Yi L wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Saturday, March 11, 2023 1:50 AM
On Thu, Mar 09, 2023 at 12:09:05AM -0800, Yi Liu wrote:
+int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{
- struct iommu_hwpt_invalidate *cmd = ucmd->cmd;
- struct iommufd_hw_pagetable *hwpt;
- u64 user_ptr;
- u32 user_data_len, klen;
- int rc = 0;
- /*
* For a user-managed HWPT, type should not be
IOMMU_HWPT_TYPE_DEFAULT.
* data_len should not exceed the size of
iommufd_invalidate_buffer.
*/
- if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd-
data_len ||
cmd->data_type >=
ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))
return -EOPNOTSUPP;
This needs to do the standard check for zeros in unknown trailing data bit. Check that alloc does it too
Maybe it has been covered by the copy_struct_from_user(). Is it?
Yes
Jason
This provides a way for userspace to probe the supported hwpt data types by kernel. Currently, kernel only supports IOMMU_HWPT_TYPE_DEFAULT, new types would be added per vendor drivers' extension.
Userspace that wants to allocate hw_pagetable with user data should check this. While for the allocation without user data, no need for it. It is supported by default.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/hw_pagetable.c | 18 +++++++++++++++--- drivers/iommu/iommufd/iommufd_private.h | 2 ++ drivers/iommu/iommufd/main.c | 2 +- include/uapi/linux/iommufd.h | 8 ++++++++ 5 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 19cd6df46c6a..0328071dcac1 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd)
cmd->out_data_type = ops->driver_type; cmd->data_len = length; + cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type];
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 67facca98de1..160712256c64 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -173,6 +173,14 @@ static const size_t iommufd_hwpt_alloc_data_size[] = { [IOMMU_HWPT_TYPE_DEFAULT] = 0, };
+/* + * bitmaps of supported hwpt types of by underlying iommu, indexed + * by ops->driver_type which is one of enum iommu_hw_info_type. + */ +const u64 iommufd_hwpt_type_bitmaps[] = { + [IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT), +}; + int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd; @@ -182,7 +190,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) struct iommufd_ioas *ioas; const struct iommu_ops *ops; void *data = NULL; - u32 klen; + u32 driver_type, klen; int rc;
if (cmd->__reserved || cmd->flags) @@ -198,8 +206,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) goto out_put_idev; }
- /* Only support IOMMU_HWPT_TYPE_DEFAULT for now */ - if (cmd->data_type != IOMMU_HWPT_TYPE_DEFAULT) { + driver_type = ops->driver_type; + + /* data_type should be a supported type by the driver */ + if (WARN_ON(driver_type >= ARRAY_SIZE(iommufd_hwpt_type_bitmaps)) || + !((1 << cmd->data_type) & + iommufd_hwpt_type_bitmaps[driver_type])) { rc = -EINVAL; goto out_put_idev; } diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index d879264d1acf..164ccfc2e6e0 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -321,6 +321,8 @@ iommufd_get_device(struct iommufd_ucmd *ucmd, u32 id) void iommufd_device_destroy(struct iommufd_object *obj); int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd);
+extern const u64 iommufd_hwpt_type_bitmaps[]; + struct iommufd_access { struct iommufd_object obj; struct iommufd_ctx *ictx; diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index 2cf45f65b637..7ec3ceac01b3 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -298,7 +298,7 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, data_uptr), IOCTL_OP(IOMMU_DEVICE_GET_HW_INFO, iommufd_device_get_hw_info, - struct iommu_hw_info, __reserved), + struct iommu_hw_info, out_hwpt_type_bitmap), IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate, struct iommu_hwpt_invalidate, data_uptr), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index d0962c41f8d6..e2eff9c56ab3 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -421,6 +421,8 @@ enum iommu_hw_info_type { * @out_data_type: Output the iommu hardware info type, it is one of * enum iommu_hw_info_type. * @__reserved: Must be 0 + * @out_hwpt_type_bitmap: Output the supported page table type. Each + * bit is defined in enum iommu_hwpt_type. * * Query the hardware iommu information for given device which has been * bound to iommufd. @data_len is the size of the buffer which captures @@ -435,6 +437,11 @@ enum iommu_hw_info_type { * The @out_data_type will be filled if the ioctl succeeds. It would * be used to decode the data filled in the buffer pointed by @data_ptr. * + * @out_hwpt_type_bitmap reports the supported hwpt types. This differs + * per the @out_data_type. Userspace should check it before allocating a + * user-managed hw_pagetable with user data, unless it allocates a default + * hw_pagetable that does not need user data. + * * This is only available for the physical devices bound to iommufd as * only physical devices can have hardware IOMMU. */ @@ -446,6 +453,7 @@ struct iommu_hw_info { __aligned_u64 data_ptr; __u32 out_data_type; __u32 __reserved; + __aligned_u64 out_hwpt_type_bitmap; }; #define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)
On 3/9/23 4:09 PM, Yi Liu wrote:
This provides a way for userspace to probe the supported hwpt data types by kernel. Currently, kernel only supports IOMMU_HWPT_TYPE_DEFAULT, new types would be added per vendor drivers' extension.
Userspace that wants to allocate hw_pagetable with user data should check this. While for the allocation without user data, no need for it. It is supported by default.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/hw_pagetable.c | 18 +++++++++++++++--- drivers/iommu/iommufd/iommufd_private.h | 2 ++ drivers/iommu/iommufd/main.c | 2 +- include/uapi/linux/iommufd.h | 8 ++++++++ 5 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 19cd6df46c6a..0328071dcac1 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd) cmd->out_data_type = ops->driver_type; cmd->data_len = length;
- cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type];
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd)); diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 67facca98de1..160712256c64 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -173,6 +173,14 @@ static const size_t iommufd_hwpt_alloc_data_size[] = { [IOMMU_HWPT_TYPE_DEFAULT] = 0, }; +/*
- bitmaps of supported hwpt types of by underlying iommu, indexed
- by ops->driver_type which is one of enum iommu_hw_info_type.
- */
+const u64 iommufd_hwpt_type_bitmaps[] = {
- [IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
+};
I am a bit confused here. Why do you need this array? What I read is that you want to convert ops->driver_type to a bit position in cmd->out_hwpt_type_bitmap.
Am I getting it right?
If so, why not just cmd->out_hwpt_type_bitmap = BIT_ULL(ops->driver_type);
?
Best regards, baolu
On Fri, Mar 10, 2023 at 11:30:04AM +0800, Baolu Lu wrote:
External email: Use caution opening links or attachments
On 3/9/23 4:09 PM, Yi Liu wrote:
This provides a way for userspace to probe the supported hwpt data types by kernel. Currently, kernel only supports IOMMU_HWPT_TYPE_DEFAULT, new types would be added per vendor drivers' extension.
Userspace that wants to allocate hw_pagetable with user data should check this. While for the allocation without user data, no need for it. It is supported by default.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/hw_pagetable.c | 18 +++++++++++++++--- drivers/iommu/iommufd/iommufd_private.h | 2 ++ drivers/iommu/iommufd/main.c | 2 +- include/uapi/linux/iommufd.h | 8 ++++++++ 5 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 19cd6df46c6a..0328071dcac1 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd)
cmd->out_data_type = ops->driver_type; cmd->data_len = length;
cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type]; rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 67facca98de1..160712256c64 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -173,6 +173,14 @@ static const size_t iommufd_hwpt_alloc_data_size[] = { [IOMMU_HWPT_TYPE_DEFAULT] = 0, };
+/*
- bitmaps of supported hwpt types of by underlying iommu, indexed
- by ops->driver_type which is one of enum iommu_hw_info_type.
- */
+const u64 iommufd_hwpt_type_bitmaps[] = {
[IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
+};
I am a bit confused here. Why do you need this array? What I read is that you want to convert ops->driver_type to a bit position in cmd->out_hwpt_type_bitmap.
Am I getting it right?
If so, why not just cmd->out_hwpt_type_bitmap = BIT_ULL(ops->driver_type);
?
A driver_type would be IOMMUFD_HW_INFO_TYPExx. What's inside the BIT_ULL is IOMMUFD_HWPT_TYPE_*. It seems to get a bit confusing after several rounds of renaming though. And they do seem to be a bit of duplications at the actual values, at least for now.
I recall that we only had one enum for the types, yet later did a separation into driver_types and hwpt_types after a discussion regarding future PASID support?
Thanks Nic
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, March 10, 2023 3:10 PM
On Fri, Mar 10, 2023 at 11:30:04AM +0800, Baolu Lu wrote:
External email: Use caution opening links or attachments
On 3/9/23 4:09 PM, Yi Liu wrote:
This provides a way for userspace to probe the supported hwpt data types by kernel. Currently, kernel only supports
IOMMU_HWPT_TYPE_DEFAULT,
new types would be added per vendor drivers' extension.
Userspace that wants to allocate hw_pagetable with user data should
check
this. While for the allocation without user data, no need for it. It is supported by default.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/hw_pagetable.c | 18 +++++++++++++++--- drivers/iommu/iommufd/iommufd_private.h | 2 ++ drivers/iommu/iommufd/main.c | 2 +- include/uapi/linux/iommufd.h | 8 ++++++++ 5 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c
b/drivers/iommu/iommufd/device.c
index 19cd6df46c6a..0328071dcac1 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct
iommufd_ucmd *ucmd)
cmd->out_data_type = ops->driver_type; cmd->data_len = length;
cmd->out_hwpt_type_bitmap =
iommufd_hwpt_type_bitmaps[ops->driver_type];
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
diff --git a/drivers/iommu/iommufd/hw_pagetable.c
b/drivers/iommu/iommufd/hw_pagetable.c
index 67facca98de1..160712256c64 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -173,6 +173,14 @@ static const size_t
iommufd_hwpt_alloc_data_size[] = {
[IOMMU_HWPT_TYPE_DEFAULT] = 0,
};
+/*
- bitmaps of supported hwpt types of by underlying iommu, indexed
- by ops->driver_type which is one of enum iommu_hw_info_type.
- */
+const u64 iommufd_hwpt_type_bitmaps[] = {
[IOMMU_HW_INFO_TYPE_DEFAULT] =
BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
+};
I am a bit confused here. Why do you need this array? What I read is that you want to convert ops->driver_type to a bit position in cmd->out_hwpt_type_bitmap.
Am I getting it right?
If so, why not just cmd->out_hwpt_type_bitmap = BIT_ULL(ops->driver_type);
?
The reason is for future extensions. If future usages need different types of user data to allocate hwpt, it can define a new type and corresponding data structure. Such new usages may be using new vendor-specific page tables or vendor-agnostic usages like Re-use of the KVM page table in the IOMMU mentioned by IOMMUFD basic series.
https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
A driver_type would be IOMMUFD_HW_INFO_TYPExx. What's inside the BIT_ULL is IOMMUFD_HWPT_TYPE_*. It seems to get a bit confusing after several rounds of renaming though. And they do seem to be a bit of duplications at the actual values, at least for now.
For now, vendor drivers only have one stage-1 page table format. But in the future, it may change per new page table format introduction and new usage.
Regards, Yi Liu
On Fri, Mar 10, 2023 at 07:39:00AM +0000, Liu, Yi L wrote:
External email: Use caution opening links or attachments
From: Nicolin Chen nicolinc@nvidia.com Sent: Friday, March 10, 2023 3:10 PM
On Fri, Mar 10, 2023 at 11:30:04AM +0800, Baolu Lu wrote:
External email: Use caution opening links or attachments
On 3/9/23 4:09 PM, Yi Liu wrote:
This provides a way for userspace to probe the supported hwpt data types by kernel. Currently, kernel only supports
IOMMU_HWPT_TYPE_DEFAULT,
new types would be added per vendor drivers' extension.
Userspace that wants to allocate hw_pagetable with user data should
check
this. While for the allocation without user data, no need for it. It is supported by default.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/device.c | 1 + drivers/iommu/iommufd/hw_pagetable.c | 18 +++++++++++++++--- drivers/iommu/iommufd/iommufd_private.h | 2 ++ drivers/iommu/iommufd/main.c | 2 +- include/uapi/linux/iommufd.h | 8 ++++++++ 5 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c
b/drivers/iommu/iommufd/device.c
index 19cd6df46c6a..0328071dcac1 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -322,6 +322,7 @@ int iommufd_device_get_hw_info(struct
iommufd_ucmd *ucmd)
cmd->out_data_type = ops->driver_type; cmd->data_len = length;
cmd->out_hwpt_type_bitmap =
iommufd_hwpt_type_bitmaps[ops->driver_type];
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
diff --git a/drivers/iommu/iommufd/hw_pagetable.c
b/drivers/iommu/iommufd/hw_pagetable.c
index 67facca98de1..160712256c64 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -173,6 +173,14 @@ static const size_t
iommufd_hwpt_alloc_data_size[] = {
[IOMMU_HWPT_TYPE_DEFAULT] = 0,
};
+/*
- bitmaps of supported hwpt types of by underlying iommu, indexed
- by ops->driver_type which is one of enum iommu_hw_info_type.
- */
+const u64 iommufd_hwpt_type_bitmaps[] = {
[IOMMU_HW_INFO_TYPE_DEFAULT] =
BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
+};
I am a bit confused here. Why do you need this array? What I read is that you want to convert ops->driver_type to a bit position in cmd->out_hwpt_type_bitmap.
Am I getting it right?
If so, why not just cmd->out_hwpt_type_bitmap = BIT_ULL(ops->driver_type);
?
The reason is for future extensions. If future usages need different types of user data to allocate hwpt, it can define a new type and corresponding data structure. Such new usages may be using new vendor-specific page tables or vendor-agnostic usages like Re-use of the KVM page table in the IOMMU mentioned by IOMMUFD basic series.
https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_jgg@nvidia.com/
A driver_type would be IOMMUFD_HW_INFO_TYPExx. What's inside the BIT_ULL is IOMMUFD_HWPT_TYPE_*. It seems to get a bit confusing after several rounds of renaming though. And they do seem to be a bit of duplications at the actual values, at least for now.
For now, vendor drivers only have one stage-1 page table format. But in the future, it may change per new page table format introduction and new usage.
Yea, that's what I thought too. Yet, I am wondering a bit if it'd be better to have an ops->hwpt_type in the drivers, v.s. maintaining a potentially big chunk of the array here.
Thanks Nic
On Thu, Mar 09, 2023 at 11:45:05PM -0800, Nicolin Chen wrote:
Yea, that's what I thought too. Yet, I am wondering a bit if it'd be better to have an ops->hwpt_type in the drivers, v.s. maintaining a potentially big chunk of the array here.
It probably is, some ops->iommufd_data.XX is not a bad idea
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Saturday, March 11, 2023 1:53 AM
On Thu, Mar 09, 2023 at 11:45:05PM -0800, Nicolin Chen wrote:
Yea, that's what I thought too. Yet, I am wondering a bit if it'd be better to have an ops->hwpt_type in the drivers, v.s. maintaining a potentially big chunk of the array here.
It probably is, some ops->iommufd_data.XX is not a bad idea
Will try.
Regards, Yi Liu
From: Nicolin Chen nicolinc@nvidia.com
Add mock_domain_alloc_user function and iommu_hwpt_selftest data structure to support user space selftest program to allocate domains with user data.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 8 +++- drivers/iommu/iommufd/hw_pagetable.c | 31 +++++++++++-- drivers/iommu/iommufd/iommufd_test.h | 16 +++++++ drivers/iommu/iommufd/selftest.c | 67 ++++++++++++++++++++++++---- 4 files changed, 109 insertions(+), 13 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 0328071dcac1..f95b558f5e95 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -322,7 +322,13 @@ int iommufd_device_get_hw_info(struct iommufd_ucmd *ucmd)
cmd->out_data_type = ops->driver_type; cmd->data_len = length; - cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type]; + + if (ops->driver_type != IOMMU_HW_INFO_TYPE_SELFTEST) + cmd->out_hwpt_type_bitmap = iommufd_hwpt_type_bitmaps[ops->driver_type]; +#ifdef CONFIG_IOMMUFD_TEST + else + cmd->out_hwpt_type_bitmap = U64_MAX; // Pretend to support all types +#endif
rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 160712256c64..0871da848447 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -7,6 +7,7 @@
#include "../iommu-priv.h" #include "iommufd_private.h" +#include "iommufd_test.h"
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) { @@ -181,6 +182,18 @@ const u64 iommufd_hwpt_type_bitmaps[] = { [IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT), };
+/* Return true if type is supported, otherwise false */ +static inline bool +iommufd_hwpt_type_check(enum iommu_hw_info_type driver_type, + enum iommu_hwpt_type type) +{ + if (WARN_ON(driver_type >= ARRAY_SIZE(iommufd_hwpt_type_bitmaps))) + return false; + + return ((1 << type) & + iommufd_hwpt_type_bitmaps[driver_type]); +} + int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { struct iommu_hwpt_alloc *cmd = ucmd->cmd; @@ -191,6 +204,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) const struct iommu_ops *ops; void *data = NULL; u32 driver_type, klen; + bool support; int rc;
if (cmd->__reserved || cmd->flags) @@ -209,9 +223,13 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) driver_type = ops->driver_type;
/* data_type should be a supported type by the driver */ - if (WARN_ON(driver_type >= ARRAY_SIZE(iommufd_hwpt_type_bitmaps)) || - !((1 << cmd->data_type) & - iommufd_hwpt_type_bitmaps[driver_type])) { + if (driver_type != IOMMU_HW_INFO_TYPE_SELFTEST) + support = iommufd_hwpt_type_check(driver_type, cmd->data_type); +#ifdef CONFIG_IOMMUFD_TEST + else + support = true; /* selftest pretend to support all types */ +#endif + if (!support) { rc = -EINVAL; goto out_put_idev; } @@ -250,7 +268,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) goto out_put_pt; }
- klen = iommufd_hwpt_alloc_data_size[cmd->data_type]; + if (cmd->data_type != IOMMU_HWPT_TYPE_SELFTTEST) + klen = iommufd_hwpt_alloc_data_size[cmd->data_type]; +#ifdef CONFIG_IOMMUFD_TEST + else + klen = sizeof(struct iommu_hwpt_selftest); +#endif if (klen) { if (!cmd->data_len) { rc = -EINVAL; diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 578691602d94..8bbb4f4fa1d7 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -115,4 +115,20 @@ struct iommu_hw_info_selftest { __u32 test_reg; };
+/* Should not be equal to any defined value in enum iommu_hwpt_type */ +#define IOMMU_HWPT_TYPE_SELFTTEST 0xbadbeef + +/** + * struct iommu_hwpt_selftest + * + * @flags: page table entry attributes + * @test_config: default iotlb setup (value IOMMU_TEST_IOTLB_DEFAULT) + */ +struct iommu_hwpt_selftest { +#define IOMMU_TEST_FLAG_NESTED (1ULL << 0) + __u64 flags; +#define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef + __u64 test_config; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index b50ec3528ec1..b9c7f3bf64b5 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -84,7 +84,9 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd,
struct mock_iommu_domain { struct iommu_domain domain; + struct mock_iommu_domain *parent; struct xarray pfns; + u32 iotlb; };
enum selftest_obj_type { @@ -142,26 +144,66 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length) return info; }
-static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) +static const struct iommu_ops mock_ops; +static struct iommu_domain_ops domain_nested_ops; + +static struct iommu_domain * +__mock_domain_alloc(unsigned int iommu_domain_type, + struct mock_iommu_domain *mock_parent, + const struct iommu_hwpt_selftest *user_cfg) { struct mock_iommu_domain *mock;
if (iommu_domain_type == IOMMU_DOMAIN_BLOCKED) return &mock_blocking_domain;
- if (WARN_ON(iommu_domain_type != IOMMU_DOMAIN_UNMANAGED)) - return NULL; - mock = kzalloc(sizeof(*mock), GFP_KERNEL); if (!mock) return NULL; - mock->domain.geometry.aperture_start = MOCK_APERTURE_START; - mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST; + mock->parent = mock_parent; + mock->domain.type = iommu_domain_type; mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE; - xa_init(&mock->pfns); + if (mock_parent) { + mock->iotlb = user_cfg->test_config; + mock->domain.ops = &domain_nested_ops; + } else { + mock->domain.geometry.aperture_start = MOCK_APERTURE_START; + mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST; + mock->domain.ops = mock_ops.default_domain_ops; + xa_init(&mock->pfns); + } return &mock->domain; }
+static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) +{ + if (WARN_ON(iommu_domain_type != IOMMU_DOMAIN_BLOCKED && + iommu_domain_type != IOMMU_DOMAIN_UNMANAGED)) + return NULL; + return __mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED, NULL, NULL); +} + +static struct iommu_domain *mock_domain_alloc_user(struct device *dev, + struct iommu_domain *parent, + const void *user_data) +{ + const struct iommu_hwpt_selftest *user_cfg = user_data; + struct mock_iommu_domain *mock_parent = NULL; + unsigned int iommu_domain_type = IOMMU_DOMAIN_UNMANAGED; + + if (parent) { + if (parent->ops != mock_ops.default_domain_ops) + return NULL; + if (!user_cfg || !(user_cfg->flags & IOMMU_TEST_FLAG_NESTED)) + return NULL; + iommu_domain_type = IOMMU_DOMAIN_NESTED; + mock_parent = container_of(parent, + struct mock_iommu_domain, domain); + } + + return __mock_domain_alloc(iommu_domain_type, mock_parent, user_cfg); +} + static void mock_domain_free(struct iommu_domain *domain) { struct mock_iommu_domain *mock = @@ -296,6 +338,7 @@ static const struct iommu_ops mock_ops = { .driver_type = IOMMU_HW_INFO_TYPE_SELFTEST, .hw_info = mock_domain_hw_info, .domain_alloc = mock_domain_alloc, + .domain_alloc_user = mock_domain_alloc_user, .capable = mock_domain_capable, .set_platform_dma_ops = mock_domain_set_plaform_dma_ops, .default_domain_ops = @@ -308,6 +351,11 @@ static const struct iommu_ops mock_ops = { }, };
+static struct iommu_domain_ops domain_nested_ops = { + .free = mock_domain_free, + .attach_dev = mock_domain_nop_attach, +}; + struct iommu_device mock_iommu_device = { .ops = &mock_ops, }; @@ -324,7 +372,10 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, if (IS_ERR(obj)) return ERR_CAST(obj); hwpt = container_of(obj, struct iommufd_hw_pagetable, obj); - if (hwpt->domain->ops != mock_ops.default_domain_ops) { + if ((hwpt->domain->type == IOMMU_DOMAIN_UNMANAGED && + hwpt->domain->ops != mock_ops.default_domain_ops) || + (hwpt->domain->type == IOMMU_DOMAIN_NESTED && + hwpt->domain->ops != &domain_nested_ops)) { iommufd_put_object(&hwpt->obj); return ERR_PTR(-EINVAL); }
From: Nicolin Chen nicolinc@nvidia.com
The IOMMU_HWPT_ALLOC ioctl now supports passing user_data to allocate a customized domain. Add its coverage for both a regular domain case and a nested domain case.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- tools/testing/selftests/iommu/iommufd.c | 114 +++++++++++++++++- tools/testing/selftests/iommu/iommufd_utils.h | 36 ++++++ 2 files changed, 149 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index d2ce2ddbdc40..783afd6ccd2e 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -125,6 +125,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_option, IOMMU_OPTION); TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS); TEST_LENGTH(iommu_hw_info, IOMMU_DEVICE_GET_HW_INFO); + TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC); #undef TEST_LENGTH }
@@ -197,6 +198,7 @@ FIXTURE_VARIANT(iommufd_ioas) { unsigned int mock_domains; unsigned int memory_limit; + bool new_hwpt; };
FIXTURE_SETUP(iommufd_ioas) @@ -236,6 +238,12 @@ FIXTURE_VARIANT_ADD(iommufd_ioas, mock_domain) .mock_domains = 1, };
+FIXTURE_VARIANT_ADD(iommufd_ioas, mock_domain_hwpt) +{ + .mock_domains = 1, + .new_hwpt = true, +}; + FIXTURE_VARIANT_ADD(iommufd_ioas, two_mock_domain) { .mock_domains = 2, @@ -263,6 +271,93 @@ TEST_F(iommufd_ioas, ioas_destroy) } }
+TEST_F(iommufd_ioas, hwpt_alloc) +{ + uint32_t new_hwpt_id = 0; + + if (self->stdev_id && self->device_id) { + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, &new_hwpt_id); + test_cmd_mock_domain_replace(self->stdev_id, new_hwpt_id); + /* hw_pagetable cannot be freed if a device is attached to it */ + EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, new_hwpt_id)); + + /* Detach from the new hw_pagetable and try again */ + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); + test_ioctl_destroy(new_hwpt_id); + } else { + test_err_cmd_hwpt_alloc(ENOENT, self->device_id, + self->ioas_id, &new_hwpt_id); + test_err_mock_domain_replace(ENOENT, + self->stdev_id, new_hwpt_id); + } +} + +TEST_F(iommufd_ioas, nested_hwpt_alloc) +{ + uint32_t nested_hwpt_id[2] = {}; + uint32_t parent_hwpt_id = 0; + uint32_t test_hwpt_id = 0; + + if (self->device_id) { + /* Negative tests */ + test_err_cmd_hwpt_alloc(ENOENT, self->ioas_id, self->device_id, + &test_hwpt_id); + test_err_cmd_hwpt_alloc(EINVAL, self->device_id, + self->device_id, &test_hwpt_id); + + /* Allocate two nested hwpts sharing one common parent hwpt */ + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + &parent_hwpt_id); + + test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, + &nested_hwpt_id[0]); + test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, + &nested_hwpt_id[1]); + + /* Negative test: a nested hwpt on top of a nested hwpt */ + test_err_cmd_hwpt_alloc_nested(EINVAL, self->device_id, + nested_hwpt_id[0], + &test_hwpt_id); + /* Negative test: parent hwpt now cannot be freed */ + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, parent_hwpt_id)); + + /* Attach device to nested_hwpt_id[0] that then will be busy */ + test_cmd_mock_domain_replace(self->stdev_id, + nested_hwpt_id[0]); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, nested_hwpt_id[0])); + + /* Switch from nested_hwpt_id[0] to nested_hwpt_id[1] */ + test_cmd_mock_domain_replace(self->stdev_id, + nested_hwpt_id[1]); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, nested_hwpt_id[1])); + test_ioctl_destroy(nested_hwpt_id[0]); + + /* Detach from nested_hwpt_id[1] and destroy it */ + test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id); + test_ioctl_destroy(nested_hwpt_id[1]); + + /* Detach from the parent hw_pagetable and destroy it */ + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); + test_ioctl_destroy(parent_hwpt_id); + } else { + test_err_cmd_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, + &parent_hwpt_id); + test_err_cmd_hwpt_alloc_nested(ENOENT, self->device_id, + parent_hwpt_id, + &nested_hwpt_id[0]); + test_err_cmd_hwpt_alloc_nested(ENOENT, self->device_id, + parent_hwpt_id, + &nested_hwpt_id[1]); + test_err_mock_domain_replace(ENOENT, self->stdev_id, + nested_hwpt_id[0]); + test_err_mock_domain_replace(ENOENT, self->stdev_id, + nested_hwpt_id[1]); + } +} + TEST_F(iommufd_ioas, hwpt_attach) { /* Create a device attached directly to a hwpt */ @@ -632,6 +727,8 @@ TEST_F(iommufd_ioas, access_pin) MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES);
for (npages = 1; npages < BUFFER_SIZE / PAGE_SIZE; npages++) { + uint32_t new_hwpt_id = 0; + uint32_t mock_device_id; uint32_t mock_stdev_id; uint32_t mock_hwpt_id;
@@ -665,12 +762,27 @@ TEST_F(iommufd_ioas, access_pin) _IOMMU_TEST_CMD(IOMMU_TEST_OP_ACCESS_PAGES), &access_cmd)); test_cmd_mock_domain(self->ioas_id, &mock_stdev_id, - &mock_hwpt_id, NULL); + &mock_hwpt_id, &mock_device_id); check_map_cmd.id = mock_hwpt_id; + if (variant->new_hwpt) { + test_cmd_hwpt_alloc(mock_device_id, self->ioas_id, + &new_hwpt_id); + test_cmd_mock_domain_replace(mock_stdev_id, + new_hwpt_id); + check_map_cmd.id = new_hwpt_id; + } else { + check_map_cmd.id = mock_hwpt_id; + } ASSERT_EQ(0, ioctl(self->fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_MAP), &check_map_cmd));
+ if (variant->new_hwpt) { + /* Detach from the new hwpt for its destroy() */ + test_cmd_mock_domain_replace(mock_stdev_id, + mock_hwpt_id); + test_ioctl_destroy(new_hwpt_id); + } test_ioctl_destroy(mock_stdev_id); test_cmd_destroy_access_pages( access_cmd.id, diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index b57e1e60f69d..e0e484e1c775 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -116,6 +116,42 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id,
#define test_cmd_hwpt_alloc(device_id, pt_id, hwpt_id) \ ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, hwpt_id)) +#define test_err_cmd_hwpt_alloc(_errno, device_id, pt_id, hwpt_id) \ + EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \ + pt_id, hwpt_id)) + +static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id, + __u32 *hwpt_id) +{ + struct iommu_hwpt_selftest data = { + .flags = IOMMU_TEST_FLAG_NESTED, + .test_config = IOMMU_TEST_IOTLB_DEFAULT, + }; + struct iommu_hwpt_alloc cmd = { + .size = sizeof(cmd), + .dev_id = device_id, + .pt_id = parent_id, + .data_type = IOMMU_HWPT_TYPE_SELFTTEST, + .data_len = sizeof(data), + .data_uptr = (uint64_t)&data, + }; + int ret; + + ret = ioctl(fd, IOMMU_HWPT_ALLOC, &cmd); + if (ret) + return ret; + if (hwpt_id) + *hwpt_id = cmd.out_hwpt_id; + return 0; +} + +#define test_cmd_hwpt_alloc_nested(device_id, parent_id, hwpt_id) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc_nested(self->fd, device_id, \ + parent_id, hwpt_id)) +#define test_err_cmd_hwpt_alloc_nested(_errno, device_id, parent_id, hwpt_id) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_hwpt_alloc_nested(self->fd, device_id, \ + parent_id, hwpt_id))
static int _test_cmd_access_set_ioas(int fd, __u32 access_id, unsigned int ioas_id)
From: Nicolin Chen nicolinc@nvidia.com
This allows to test whether IOTLB has been invalidated or not.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/iommufd_test.h | 4 ++++ drivers/iommu/iommufd/selftest.c | 22 +++++++++++++++++++ tools/testing/selftests/iommu/iommufd.c | 4 ++++ tools/testing/selftests/iommu/iommufd_utils.h | 14 ++++++++++++ 4 files changed, 44 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 8bbb4f4fa1d7..4b60a6df0428 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -19,6 +19,7 @@ enum { IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT, IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE, IOMMU_TEST_OP_ACCESS_SET_IOAS, + IOMMU_TEST_OP_MD_CHECK_IOTLB, };
enum { @@ -95,6 +96,9 @@ struct iommu_test_cmd { struct { __u32 ioas_id; } access_set_ioas; + struct { + __u32 iotlb; + } check_iotlb; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index b9c7f3bf64b5..a0d57f00f759 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -667,6 +667,25 @@ static int iommufd_test_md_check_refs(struct iommufd_ucmd *ucmd, return 0; }
+static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd, + u32 mockpt_id, u32 iotlb) +{ + struct iommufd_hw_pagetable *hwpt; + struct mock_iommu_domain *mock; + int rc = 0; + + hwpt = get_md_pagetable(ucmd, mockpt_id, &mock); + if (IS_ERR(hwpt)) + return PTR_ERR(hwpt); + + mock = container_of(hwpt->domain, struct mock_iommu_domain, domain); + + if (iotlb != mock->iotlb) + rc = -EINVAL; + iommufd_put_object(&hwpt->obj); + return rc; +} + struct selftest_access { struct iommufd_access *access; struct file *file; @@ -1077,6 +1096,9 @@ int iommufd_test(struct iommufd_ucmd *ucmd) return iommufd_test_md_check_refs( ucmd, u64_to_user_ptr(cmd->check_refs.uptr), cmd->check_refs.length, cmd->check_refs.refs); + case IOMMU_TEST_OP_MD_CHECK_IOTLB: + return iommufd_test_md_check_iotlb(ucmd, cmd->id, + cmd->check_iotlb.iotlb); case IOMMU_TEST_OP_CREATE_ACCESS: return iommufd_test_create_access(ucmd, cmd->create_access.flags); diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 783afd6ccd2e..10f1592dc9e7 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -313,6 +313,10 @@ TEST_F(iommufd_ioas, nested_hwpt_alloc) &nested_hwpt_id[0]); test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, &nested_hwpt_id[1]); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[1], + IOMMU_TEST_IOTLB_DEFAULT);
/* Negative test: a nested hwpt on top of a nested hwpt */ test_err_cmd_hwpt_alloc_nested(EINVAL, self->device_id, diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index e0e484e1c775..211798190e5b 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -120,6 +120,20 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \ pt_id, hwpt_id))
+#define test_cmd_hwpt_check_iotlb(hwpt_id, expected) \ + ({ \ + struct iommu_test_cmd test_cmd = { \ + .size = sizeof(test_cmd), \ + .op = IOMMU_TEST_OP_MD_CHECK_IOTLB, \ + .id = hwpt_id, \ + .check_iotlb = { .iotlb = expected }, \ + }; \ + ASSERT_EQ(0, \ + ioctl(self->fd, \ + _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_IOTLB), \ + &test_cmd)); \ + }) + static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id, __u32 *hwpt_id) {
From: Nicolin Chen nicolinc@nvidia.com
Add a mock_domain_cache_invalidate_user() and a corresponding struct iommu_hwpt_invalidate_selftest, to allow to test IOMMU_HWPT_INVALIDATE from user space, by using the new IOMMU_TEST_OP_MD_CHECK_IOTLB.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 10 +++++++-- drivers/iommu/iommufd/iommufd_test.h | 10 +++++++++ drivers/iommu/iommufd/selftest.c | 15 +++++++++++++ tools/testing/selftests/iommu/iommufd.c | 8 +++++++ tools/testing/selftests/iommu/iommufd_utils.h | 21 +++++++++++++++++++ 5 files changed, 62 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 0871da848447..15dd1743d888 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -339,7 +339,8 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) * data_len should not exceed the size of iommufd_invalidate_buffer. */ if (cmd->data_type == IOMMU_HWPT_TYPE_DEFAULT || !cmd->data_len || - cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size)) + (cmd->data_type != IOMMU_HWPT_TYPE_SELFTTEST && + cmd->data_type >= ARRAY_SIZE(iommufd_hwpt_invalidate_info_size))) return -EOPNOTSUPP;
hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id); @@ -352,7 +353,12 @@ int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) goto out_put_hwpt; }
- klen = iommufd_hwpt_invalidate_info_size[cmd->data_type]; + if (cmd->data_type != IOMMU_HWPT_TYPE_SELFTTEST) + klen = iommufd_hwpt_invalidate_info_size[cmd->data_type]; +#ifdef CONFIG_IOMMUFD_TEST + else + klen = sizeof(struct iommu_hwpt_invalidate_selftest); +#endif if (!klen) { rc = -EINVAL; goto out_put_hwpt; diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 4b60a6df0428..090e9d6abd87 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -135,4 +135,14 @@ struct iommu_hwpt_selftest { __u64 test_config; };
+/** + * struct iommu_hwpt_invalidate_selftest + * + * @flags: invalidate flags + */ +struct iommu_hwpt_invalidate_selftest { +#define IOMMU_TEST_INVALIDATE_ALL (1ULL << 0) + __u64 flags; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index a0d57f00f759..fe1caba9e2c5 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -351,9 +351,24 @@ static const struct iommu_ops mock_ops = { }, };
+static void mock_domain_cache_invalidate_user(struct iommu_domain *domain, + void *user_data) +{ + struct iommu_hwpt_invalidate_selftest *inv_info = user_data; + struct mock_iommu_domain *mock = + container_of(domain, struct mock_iommu_domain, domain); + + if (domain->type != IOMMU_DOMAIN_NESTED || !mock->parent) + return; + + if (inv_info->flags & IOMMU_TEST_INVALIDATE_ALL) + mock->iotlb = 0; +} + static struct iommu_domain_ops domain_nested_ops = { .free = mock_domain_free, .attach_dev = mock_domain_nop_attach, + .cache_invalidate_user = mock_domain_cache_invalidate_user, };
struct iommu_device mock_iommu_device = { diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 10f1592dc9e7..89a44479835d 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -126,6 +126,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS); TEST_LENGTH(iommu_hw_info, IOMMU_DEVICE_GET_HW_INFO); TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC); + TEST_LENGTH(iommu_hwpt_invalidate, IOMMU_HWPT_INVALIDATE); #undef TEST_LENGTH }
@@ -326,6 +327,13 @@ TEST_F(iommufd_ioas, nested_hwpt_alloc) EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, parent_hwpt_id));
+ /* hwpt_invalidate only supports a user-managed hwpt (nested) */ + test_err_cmd_hwpt_invalidate(EINVAL, parent_hwpt_id); + test_cmd_hwpt_invalidate(nested_hwpt_id[0]); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0); + test_cmd_hwpt_invalidate(nested_hwpt_id[1]); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[1], 0); + /* Attach device to nested_hwpt_id[0] that then will be busy */ test_cmd_mock_domain_replace(self->stdev_id, nested_hwpt_id[0]); diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 211798190e5b..b1592b4f2049 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -167,6 +167,27 @@ static int _test_cmd_hwpt_alloc_nested(int fd, __u32 device_id, __u32 parent_id, _test_cmd_hwpt_alloc_nested(self->fd, device_id, \ parent_id, hwpt_id))
+static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id) +{ + struct iommu_hwpt_invalidate_selftest data = { + .flags = IOMMU_TEST_INVALIDATE_ALL, + }; + struct iommu_hwpt_invalidate cmd = { + .size = sizeof(cmd), + .hwpt_id = hwpt_id, + .data_type = IOMMU_HWPT_TYPE_SELFTTEST, + .data_len = sizeof(data), + .data_uptr = (uint64_t)&data, + }; + + return ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd); +} + +#define test_cmd_hwpt_invalidate(hwpt_id) \ + ASSERT_EQ(0, _test_cmd_hwpt_invalidate(self->fd, hwpt_id)) +#define test_err_cmd_hwpt_invalidate(_errno, hwpt_id) \ + EXPECT_ERRNO(_errno, _test_cmd_hwpt_invalidate(self->fd, hwpt_id)) + static int _test_cmd_access_set_ioas(int fd, __u32 access_id, unsigned int ioas_id) {
On 2023/3/9 16:08, Yi Liu wrote:
Nested translation is a hardware feature that is supported by many modern IOMMU hardwares. It has two stages (stage-1, stage-2) address translation to get access to the physical address. stage-1 translation table is owned by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes to stage-1 translation table should be followed by an IOTLB invalidation.
The last sentence is not always true. If any entry of the stage-1 translation table is changed from non-present to present, the IOTLB has no chance to cache it yet. Hence there's no need for invalidation.
Best regards, baolu
linux-kselftest-mirror@lists.linaro.org