Nested translation is a hardware feature supported by many modern IOMMUs. It uses two stages of address translation (stage-1 and stage-2) to reach the physical address. The stage-1 translation table is owned by userspace (e.g. by a guest OS), while stage-2 is owned by the kernel. Changes to the stage-1 translation table must be followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O page table. As the diagram below shows, the guest I/O page table pointer in GPA (guest physical address) is passed to the host and used to perform the stage-1 address translation. Along with it, modifications to present mappings in the guest I/O page table must be followed by an IOTLB invalidation.
    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |      I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  | FS for GIOVA->GPA      |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------.----------------------------------.
    |             |  | SS for GPA->HPA, unmanaged domain|
    |             |  '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables

            <Intel VT-d Nested translation>
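In pseudo-code terms the nested walk simply composes the two stages; the helper names below are illustrative only, not kernel APIs:

	/* stage1_translate(): guest-owned I/O page table, IOVA -> GPA
	 * stage2_translate(): kernel-owned page table,    GPA -> HPA */
	hpa = stage2_translate(stage1_translate(iova));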
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt), and each has an iommu_domain allocated from the iommu driver. So in this series hw_pagetable and iommu_domain mean the same thing unless noted otherwise. IOMMUFD already supports allocating a hw_pagetable that is linked with an IOAS. However, nesting requires IOMMUFD to allow allocating a hw_pagetable with driver-specific parameters, plus an interface to sync the stage-1 IOTLB, since the user owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1]. It first extends domain_alloc_user to allocate domains with user data and adds a new op to invalidate the stage-1 IOTLB for user-managed domains, then extends the IOMMUFD internal infrastructure to accept user_data and a parent hwpt, relaying the user_data/parent to the iommu core to allocate a user-managed iommu_domain. After that, it extends the ioctl IOMMU_HWPT_ALLOC to accept user data and a stage-2 hwpt ID. Along with it, the ioctl IOMMU_HWPT_INVALIDATE is added to invalidate the stage-1 IOTLB, which is needed for user-managed hwpts. Selftests are added as well to cover the new ioctls.
Complete code can be found in [2]; QEMU code can be found in [3].

At last, this is teamwork together with Nicolin Chen and Lu Baolu. Thanks to them for the help. ^_^ Looking forward to your feedback.
[1] https://lore.kernel.org/linux-iommu/20230818101033.4100-1-yi.l.liu@intel.com... - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v4:
 - Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
   and kernel-managed HWPTs
 - Rework invalidate uAPI to be a multi-request array-based design
 - Add a struct iommu_user_data_array and a helper for driver to sanitize
   and copy the entry data from user space invalidation array
 - Add a patch fixing TEST_LENGTH() in selftest program
 - Drop IOMMU_RESV_IOVA_RANGES patches
 - Update kdoc and inline comments
 - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested
   translation; this does not change the rule that resv regions should only
   be added to the kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be
   added in a later series as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.c...
 - Add new uAPI things in alphabetical order
 - Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
   sanity, replacing the previous op->domain_alloc_user_data_len solution
 - Return ERR_PTR from domain_alloc_user instead of NULL
 - Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation
   (Kevin)
 - Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
   userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest
   I/O page table). (Kevin)
 - Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
 - Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.co...
 - Add union iommu_domain_user_data to include all user data structures to
   avoid passing void * in kernel APIs
 - Add iommu op to return user data length for user domain allocation
 - Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
 - Store the invalidation data length in
   iommu_domain_ops::cache_invalidate_user_data_len
 - Convert cache_invalidate_user op to be int instead of void
 - Remove @data_type in struct iommu_hwpt_invalidate
 - Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08
   of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.c...
Thanks, Yi Liu
Lu Baolu (1):
  iommu: Add nested domain support

Nicolin Chen (12):
  iommufd: Unite all kernel-managed members into a struct
  iommufd: Separate kernel-managed HWPT alloc/destroy/abort functions
  iommufd: Add shared alloc_fn function pointer and mutex pointer
  iommufd: Add user-managed hw_pagetable support
  iommufd: Always setup MSI and enforce cc on kernel-managed domains
  iommufd/device: Add helpers to enforce/remove device reserved regions
  iommufd/selftest: Rework TEST_LENGTH to test min_size explicitly
  iommufd/selftest: Add nested domain allocation for mock domain
  iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with nested HWPTs
  iommufd/selftest: Add mock_domain_cache_invalidate_user support
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (4):
  iommu: Add hwpt_type with user_data for domain_alloc_user op
  iommufd: Pass in hwpt_type/user_data to iommufd_hw_pagetable_alloc()
  iommufd: Support IOMMU_HWPT_ALLOC allocation with user data
  iommufd: Add IOMMU_HWPT_INVALIDATE
 drivers/iommu/intel/iommu.c                   |   5 +-
 drivers/iommu/iommufd/device.c                |  51 +++-
 drivers/iommu/iommufd/hw_pagetable.c          | 257 ++++++++++++++++--
 drivers/iommu/iommufd/iommufd_private.h       |  59 +++-
 drivers/iommu/iommufd/iommufd_test.h          |  40 +++
 drivers/iommu/iommufd/main.c                  |   3 +
 drivers/iommu/iommufd/selftest.c              | 184 ++++++++++++-
 include/linux/iommu.h                         | 110 +++++++-
 include/uapi/linux/iommufd.h                  |  60 +++-
 tools/testing/selftests/iommu/iommufd.c       | 209 +++++++++++++-
 .../selftests/iommu/iommufd_fail_nth.c        |   3 +-
 tools/testing/selftests/iommu/iommufd_utils.h |  91 ++++++-
 12 files changed, 998 insertions(+), 74 deletions(-)
The domain_alloc_user op already accepts user flags for domain allocation; add hwpt_type and user_data support as well. This op chooses to make the special parameters opaque to the core. This suits the current usage model, where accessing any of the IOMMU device special parameters does require a userspace driver that matches the kernel driver. If a need for common parameters, implemented similarly by several drivers, arises, then there is room in the design to grow a generic parameter set as well.

Define enum iommu_hwpt_type to differentiate the user parameters used by different iommu vendor drivers in the future. For backward compatibility and for allocations without any specified hwpt_type or user parameters, add an IOMMU_HWPT_TYPE_DEFAULT to the list.

Also, add a struct iommu_user_data as a package of data_ptr and data_len from an iommufd core uAPI structure, then provide a helper function for a driver to run a data length sanity check and copy a defined hwpt_type specific driver user data.
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Co-developed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/iommu.c          |  5 ++-
 drivers/iommu/iommufd/hw_pagetable.c |  4 ++-
 drivers/iommu/iommufd/selftest.c     |  5 ++-
 include/linux/iommu.h                | 51 ++++++++++++++++++++++++++--
 include/uapi/linux/iommufd.h         |  8 +++++
 5 files changed, 67 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 491bcde1ff96..4ffc939a71f0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4075,7 +4075,10 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
 }
 
 static struct iommu_domain *
-intel_iommu_domain_alloc_user(struct device *dev, u32 flags)
+intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
+			      enum iommu_hwpt_type hwpt_type,
+			      struct iommu_domain *parent,
+			      const struct iommu_user_data *user_data)
 {
 	struct iommu_domain *domain;
 	struct intel_iommu *iommu;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 26a8a818ffa3..1d7378a6cbb3 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -96,7 +96,9 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	hwpt->ioas = ioas;
 
 	if (ops->domain_alloc_user) {
-		hwpt->domain = ops->domain_alloc_user(idev->dev, flags);
+		hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
+						      IOMMU_HWPT_TYPE_DEFAULT,
+						      NULL, NULL);
 		if (IS_ERR(hwpt->domain)) {
 			rc = PTR_ERR(hwpt->domain);
 			hwpt->domain = NULL;
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index b54cbfb1862b..2205a552e570 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -171,7 +171,10 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type)
 }
 
 static struct iommu_domain *
-mock_domain_alloc_user(struct device *dev, u32 flags)
+mock_domain_alloc_user(struct device *dev, u32 flags,
+		       enum iommu_hwpt_type hwpt_type,
+		       struct iommu_domain *parent,
+		       const struct iommu_user_data *user_data)
 {
 	struct iommu_domain *domain;
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>
 
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
 	bool			queued;
 };
 
+/**
+ * struct iommu_user_data - iommu driver specific user space data info
+ * @uptr: Pointer to the user buffer for copy_from_user()
+ * @len: The length of the user buffer in bytes
+ *
+ * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
+ * Both @uptr and @len should be just copied from an iommufd core uAPI structure
+ */
+struct iommu_user_data {
+	void __user *uptr;
+	size_t len;
+};
+
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
+{
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (src_data->len < min_len || data_len < src_data->len)
+		return -EINVAL;
+	return copy_struct_from_user(dst_data, data_len,
+				     src_data->uptr, src_data->len);
+}
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -237,11 +273,17 @@ struct iommu_iotlb_gather {
  * @domain_alloc: allocate iommu domain
  * @domain_alloc_user: Allocate an iommu domain corresponding to the input
  *                     parameters like flags defined as enum iommufd_ioas_map_flags
+ *                     and the @hwpt_type that is defined as enum iommu_hwpt_type
  *                     in include/uapi/linux/iommufd.h. Different from the
  *                     domain_alloc op, it requires iommu driver to fully
  *                     initialize a new domain including the generic iommu_domain
- *                     struct. Upon success, a domain is returned. Upon failure,
- *                     ERR_PTR must be returned.
+ *                     struct.
+ *                     Upon success, if the @user_data is valid and the @parent
+ *                     points to a kernel-managed domain, the new domain must be
+ *                     IOMMU_DOMAIN_NESTED type; otherwise, the @parent must be
+ *                     NULL while the @user_data can be optionally provided, the
+ *                     new domain must be IOMMU_DOMAIN_UNMANAGED type.
+ *                     Upon failure, ERR_PTR must be returned.
  * @probe_device: Add device to iommu driver handling
  * @release_device: Remove device from iommu driver handling
  * @probe_finalize: Do final setup work after the device is added to an IOMMU
@@ -274,7 +316,10 @@ struct iommu_ops {
 
 	/* Domain allocation and freeing by the iommu driver */
 	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
-	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags);
+	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
+						  enum iommu_hwpt_type hwpt_type,
+						  struct iommu_domain *parent,
+						  const struct iommu_user_data *user_data);
 
 	struct iommu_device *(*probe_device)(struct device *dev);
 	void (*release_device)(struct device *dev);
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..3c8660fe9bb1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 };
 
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
+ */
+enum iommu_hwpt_type {
+	IOMMU_HWPT_TYPE_DEFAULT,
+};
+
 /**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
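To illustrate how an iommu driver might consume the new op arguments, here is a minimal sketch; the driver name, struct iommu_hwpt_mydriver, and its last_v1_member field are hypothetical, while iommu_copy_user_data() is the helper added above:

	static struct iommu_domain *
	mydriver_domain_alloc_user(struct device *dev, u32 flags,
				   enum iommu_hwpt_type hwpt_type,
				   struct iommu_domain *parent,
				   const struct iommu_user_data *user_data)
	{
		struct iommu_hwpt_mydriver data = {}; /* hypothetical uAPI struct */
		int rc;

		if (user_data) {
			/* min_len stays frozen at the first released layout */
			rc = iommu_copy_user_data(&data, user_data, sizeof(data),
						  offsetofend(struct iommu_hwpt_mydriver,
							      last_v1_member));
			if (rc)
				return ERR_PTR(rc);
		}
		/* ... fully initialize and return a new iommu_domain ... */
		return ERR_PTR(-EOPNOTSUPP); /* placeholder for the sketch */
	}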
On 2023/9/21 15:51, Yi Liu wrote:
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
+{
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (src_data->len < min_len || data_len < src_data->len)
+		return -EINVAL;
+	return copy_struct_from_user(dst_data, data_len,
+				     src_data->uptr, src_data->len);
+}
I am not sure that I understand the purpose of "min_len" correctly. It seems like it would always be equal to data_len?
Or, it means the minimal data length that the iommu driver requires?
Best regards, baolu
On Thu, Sep 21, 2023 at 08:10:58PM +0800, Baolu Lu wrote:
On 2023/9/21 15:51, Yi Liu wrote:
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
+{
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (src_data->len < min_len || data_len < src_data->len)
+		return -EINVAL;
+	return copy_struct_from_user(dst_data, data_len,
+				     src_data->uptr, src_data->len);
+}
I am not sure that I understand the purpose of "min_len" correctly. It seems like it would always be equal to data_len?
Or, it means the minimal data length that the iommu driver requires?
Hmm, I thought I had made it quite clear with the kdoc that it's the "initial" length (min_len) vs. the "current" length (data_len), i.e. min_len was set when the structure was introduced and would never change, while data_len can grow if the structure introduces a new member. E.g. after this series, struct iommu_hwpt_alloc has a min_len fixed to offsetofend(..., __reserved) but its data_len is actually increased to offsetofend(..., data_uptr).
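To put that example in code, a minimal sketch using the extended struct iommu_hwpt_alloc layout proposed in this series:

	/* min_len is frozen at the v1 layout, while data_len tracks the
	 * current layout after hwpt_type/data_len/data_uptr were appended */
	size_t min_len  = offsetofend(struct iommu_hwpt_alloc, __reserved);
	size_t data_len = offsetofend(struct iommu_hwpt_alloc, data_uptr);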
Perhaps we could put all min_len defines in uAPI header, like:

    include/uapi/linux/gfs2_ondisk.h:442:
    #define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash))

In this way, drivers won't need to deal with that nor have risks of breaking ABI by changing a min_len.
Also, if we go a bit further to ease the drivers, we could do:
========================================================================================
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h
index 65a363f5e81e..13234e67409c 100644
--- a/drivers/iommu/iommufd/iommufd_test.h
+++ b/drivers/iommu/iommufd/iommufd_test.h
@@ -147,6 +147,9 @@ struct iommu_hwpt_selftest {
 	__u32 iotlb;
 };
 
+#define iommu_hwpt_selftest_min_len \
+	(offsetofend(struct iommu_hwpt_selftest, iotlb))
+
 /**
  * struct iommu_hwpt_invalidate_selftest
  *
diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c
index 117776d236dc..2cc3a8a3355b 100644
--- a/drivers/iommu/iommufd/selftest.c
+++ b/drivers/iommu/iommufd/selftest.c
@@ -263,8 +263,8 @@ mock_domain_alloc_user(struct device *dev, u32 flags,
 }
 
 	if (user_data) {
-		int rc = iommu_copy_user_data(&data, user_data,
-					      data_len, min_len);
+		int rc = iommu_copy_user_data2(iommu_hwpt_selftest, &data,
+					       user_data);
 		if (rc)
 			return ERR_PTR(rc);
 		user_cfg = &data;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fb2febe7b8d8..db82803b026f 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -282,6 +282,10 @@ static inline int iommu_copy_user_data(void *dst_data,
 				     src_data->uptr, src_data->len);
 }
 
+#define iommu_copy_user_data2(dst_struct, dst, src) \
+	iommu_copy_user_data(dst, src, sizeof(struct dst_struct), \
+			     dst_struct##_min_len)
+
 /**
  * iommu_copy_user_data_from_array - Copy iommu driver specific user space data
  * from an iommu_user_data_array input
========================================================================================
Surely, the end product of the test code above can do:
    iommu_copy_user_data  => __iommu_copy_user_data
    iommu_copy_user_data2 => iommu_copy_user_data
Thanks Nicolin
On Thu, Sep 21, 2023 at 01:58:19PM -0700, Nicolin Chen wrote:
Perhaps we could put all min_len defines in uAPI header, like:

    include/uapi/linux/gfs2_ondisk.h:442:
    #define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash))

In this way, drivers won't need to deal with that nor have risks of breaking ABI by changing a min_len.
I don't think we need constants, just be sure that every call to iommu_copy_user_data() has an offsetof() as the last parameter.
Indeed perhaps you should put it in a macro and force this to happen eg:
#define iommu_copy_user_data(user_data, from, min_size_member) \
	__iommu_copy_user_data(user_data, from, \
			       offsetofend(typeof(*from), min_size_member))
iommu_copy_user_data(user_data, &data, iotlb);
Jason
On Mon, Sep 25, 2023 at 10:05:06AM -0300, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 01:58:19PM -0700, Nicolin Chen wrote:
Perhaps we could put all min_len defines in uAPI header, like:

    include/uapi/linux/gfs2_ondisk.h:442:
    #define LH_V1_SIZE (offsetofend(struct gfs2_log_header, lh_hash))

In this way, drivers won't need to deal with that nor have risks of breaking ABI by changing a min_len.
I don't think we need constants, just be sure that every call to iommu_copy_user_data() has an offsetof() as the last parameter.
Indeed perhaps you should put it in a macro and force this to happen eg:
#define iommu_copy_user_data(user_data, from, min_size_member) \
	__iommu_copy_user_data(user_data, from, \
			       offsetofend(typeof(*from), min_size_member))
iommu_copy_user_data(user_data, &data, iotlb);
OK. And always use sizeof(typeof(*from)) in the macro for the current data_len, I assume?
Thanks Nic
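Combining the two ideas would give something along these lines (a sketch only; names and argument order were still under discussion here, and the helper is later renamed iommu_copy_struct_from_user in this thread):

	/* Sketch: data_len from sizeof(typeof(*from)), min_len from
	 * offsetofend() on the named member, per the exchange above */
	#define iommu_copy_user_data(user_data, from, min_size_member) \
		__iommu_copy_user_data(user_data, from, sizeof(typeof(*from)), \
				       offsetofend(typeof(*from), min_size_member))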
On 2023/9/21 20:10, Baolu Lu wrote:
On 2023/9/21 15:51, Yi Liu wrote:
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
+{
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (src_data->len < min_len || data_len < src_data->len)
+		return -EINVAL;
+	return copy_struct_from_user(dst_data, data_len,
+				     src_data->uptr, src_data->len);
+}
I am not sure that I understand the purpose of "min_len" correctly. It seems like it would always be equal to data_len?
no, it will not be equal to data_len once there is extension in the uAPI structure.
Or, it means the minimal data length that the iommu driver requires?
it is the minimal data length the uAPI requires. min_len is finalized once the first version of the uAPI is merged upstream.
On 2023/9/25 14:22, Yi Liu wrote:
On 2023/9/21 20:10, Baolu Lu wrote:
On 2023/9/21 15:51, Yi Liu wrote:
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
+{
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (src_data->len < min_len || data_len < src_data->len)
+		return -EINVAL;
+	return copy_struct_from_user(dst_data, data_len,
+				     src_data->uptr, src_data->len);
+}
I am not sure that I understand the purpose of "min_len" correctly. It seems like it would always be equal to data_len?
no, it will not be equal to data_len once there is extension in the uAPI structure.
Or, it means the minimal data length that the iommu driver requires?
it is the minimal data length the uAPI requires. min_len is finalized once the first version of the uAPI is merged upstream.
So, it looks like a constant. Perhaps we should document it in uapi/linux/iommufd.h and avoid using it as a parameter of a helper function?
Best regards, baolu
On Mon, Sep 25, 2023 at 04:01:55PM +0800, Baolu Lu wrote:
On 2023/9/25 14:22, Yi Liu wrote:
On 2023/9/21 20:10, Baolu Lu wrote:
On 2023/9/21 15:51, Yi Liu wrote:
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
+{
+	if (WARN_ON(!dst_data || !src_data))
+		return -EINVAL;
+	if (src_data->len < min_len || data_len < src_data->len)
+		return -EINVAL;
+	return copy_struct_from_user(dst_data, data_len,
+				     src_data->uptr, src_data->len);
+}
I am not sure that I understand the purpose of "min_len" correctly. It seems like it would always be equal to data_len?
no, it will not be equal to data_len once there is extension in the uAPI structure.
Or, it means the minimal data length that the iommu driver requires?
it is the minimal data length the uAPI requires. min_len is finalized once the first version of the uAPI is merged upstream.
So, it looks like a constant. Perhaps we should document it in uapi/linux/iommufd.h and avoid using it as a parameter of a helper function?
It is per-driver, per-struct, so this is the right way to do it
Jason
On 2023/9/21 15:51, Yi Liu wrote:
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..3c8660fe9bb1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 };
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
How about s/default/vendor agnostic/ ?
+ */
+enum iommu_hwpt_type {
+	IOMMU_HWPT_TYPE_DEFAULT,
+};
+
 /**
  * struct iommu_hwpt_alloc - ioctl(IOMMU_HWPT_ALLOC)
  * @size: sizeof(struct iommu_hwpt_alloc)
Best regards, baolu
On Thu, Sep 21, 2023 at 08:12:03PM +0800, Baolu Lu wrote:
On 2023/9/21 15:51, Yi Liu wrote:
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..3c8660fe9bb1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 };
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
How about s/default/vendor agnostic/ ?
Please don't use the word vendor :)
IOMMU_HWPT_TYPE_GENERIC perhaps if we don't like default
Jason
On 2023-09-21 17:44, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 08:12:03PM +0800, Baolu Lu wrote:
On 2023/9/21 15:51, Yi Liu wrote:
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..3c8660fe9bb1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 };
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
How about s/default/vendor agnostic/ ?
Please don't use the word vendor :)
IOMMU_HWPT_TYPE_GENERIC perhaps if we don't like default
Ah yes, a default domain type, not to be confused with any default domain type, including the default default domain type. Just in case anyone had forgotten how gleefully fun this is :D
I particularly like the bit where we end up with this construct later:
	switch (hwpt_type) {
	case IOMMU_HWPT_TYPE_DEFAULT:
		/* allocate a domain */
	default:
		/* allocate a different domain */
	}
But of course neither case allocates a *default* domain, because it's quite obviously the wrong place to be doing that.
I could go on enjoying myself, but basically yeah, "default" can't be a type in itself (at best it would be a meta-type which could be requested, such that it resolves to some real type to actually allocate), so a good name should reflect what the type functionally *means* to the user. IIUC the important distinction is that it's an abstract kernel-owned pagetable for the user to indirectly control via the API, rather than one it owns and writes directly (and thus has to be in a specific agreed format).
Thanks, Robin.
On 2023/9/22 17:47, Robin Murphy wrote:
On 2023-09-21 17:44, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 08:12:03PM +0800, Baolu Lu wrote:
On 2023/9/21 15:51, Yi Liu wrote:
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index 4a7c5c8fdbb4..3c8660fe9bb1 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -357,6 +357,14 @@ enum iommufd_hwpt_alloc_flags {
 	IOMMU_HWPT_ALLOC_NEST_PARENT = 1 << 0,
 };
+/**
+ * enum iommu_hwpt_type - IOMMU HWPT Type
+ * @IOMMU_HWPT_TYPE_DEFAULT: default
How about s/default/vendor agnostic/ ?
Please don't use the word vendor :)
IOMMU_HWPT_TYPE_GENERIC perhaps if we don't like default
Ah yes, a default domain type, not to be confused with any default domain type, including the default default domain type. Just in case anyone had forgotten how gleefully fun this is :D
I particularly like the bit where we end up with this construct later:
	switch (hwpt_type) {
	case IOMMU_HWPT_TYPE_DEFAULT:
		/* allocate a domain */
	default:
		/* allocate a different domain */
	}
But of course neither case allocates a *default* domain, because it's quite obviously the wrong place to be doing that.
I could go on enjoying myself, but basically yeah, "default" can't be a type in itself (at best it would be a meta-type which could be requested, such that it resolves to some real type to actually allocate), so a good name should reflect what the type functionally *means* to the user. IIUC the important distinction is that it's an abstract kernel-owned pagetable for the user to indirectly control via the API, rather than one it owns and writes directly (and thus has to be in a specific agreed format).
yes. It is just what the existing domain_alloc_user op does. Here we add a hwpt_type as the type can be given by user, so we need to define a specific type for it.
Perhaps we can also name it as IOMMU_HWPT_TYPE_UNMANAGED to be aligned with the domain type naming. IOMMU_HWPT_TYPE_GENERIC is also a good choice. Please feel free let me know your preference.
On Mon, Sep 25, 2023 at 02:17:37PM +0800, Yi Liu wrote:
yes. It is just what the existing domain_alloc_user op does. Here we add a hwpt_type as the type can be given by user, so we need to define a specific type for it.
Perhaps we can also name it as IOMMU_HWPT_TYPE_UNMANAGED to be aligned with
unmanaged is also a weird nonsense name these days. We are slowly replacing it with paging.
the domain type naming. IOMMU_HWPT_TYPE_GENERIC is also a good choice. Please feel free let me know your preference.
This seems OK to me so far
Jason
From: Robin Murphy robin.murphy@arm.com Sent: Friday, September 22, 2023 5:48 PM
I could go on enjoying myself, but basically yeah, "default" can't be a type in itself (at best it would be a meta-type which could be requested, such that it resolves to some real type to actually allocate), so a good name should reflect what the type functionally *means* to the user. IIUC the important distinction is that it's an abstract kernel-owned pagetable for the user to indirectly control via the API, rather than one it owns and writes directly (and thus has to be in a specific agreed format).
IOMMU_HWPT_TYPE_KERNEL then?
IOMMU_HWPT_TYPE_GENERIC also doesn't sound a straight word.
On Tue, Sep 26, 2023 at 06:37:55AM +0000, Tian, Kevin wrote:
From: Robin Murphy robin.murphy@arm.com Sent: Friday, September 22, 2023 5:48 PM
I could go on enjoying myself, but basically yeah, "default" can't be a type in itself (at best it would be a meta-type which could be requested, such that it resolves to some real type to actually allocate), so a good name should reflect what the type functionally *means* to the user. IIUC the important distinction is that it's an abstract kernel-owned pagetable for the user to indirectly control via the API, rather than one it owns and writes directly (and thus has to be in a specific agreed format).
IOMMU_HWPT_TYPE_KERNEL then?
IOMMU_HWPT_TYPE_GENERIC also doesn't sound a straight word.
At the end of the day this enum is the type tag for:
struct iommu_hwpt_alloc {
	__u32 size;
	__u32 pt_id;
	__u32 out_hwpt_id;
	__u32 __reserved;
+	__u32 hwpt_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
};
That pointer.
IOMMU_HWPT_ALLOC_DATA_NONE = 0
IOMMU_HWPT_ALLOC_DATA_INTEL_VTD
IOMMU_HWPT_ALLOC_DATA_ARM_SMMUV3
etc?
DATA_NONE requires data_len == 0
Jason
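A minimal sketch of the check implied by that last rule, using the field names proposed above (not final kernel code):

	/* DATA_NONE must not carry a data payload */
	if (cmd->hwpt_type == IOMMU_HWPT_ALLOC_DATA_NONE && cmd->data_len)
		return -EINVAL;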
From: Jason Gunthorpe jgg@nvidia.com Sent: Wednesday, September 27, 2023 12:29 AM
On Tue, Sep 26, 2023 at 06:37:55AM +0000, Tian, Kevin wrote:
From: Robin Murphy robin.murphy@arm.com Sent: Friday, September 22, 2023 5:48 PM
I could go on enjoying myself, but basically yeah, "default" can't be a type in itself (at best it would be a meta-type which could be requested, such that it resolves to some real type to actually allocate), so a good name should reflect what the type functionally *means* to the user. IIUC the important distinction is that it's an abstract kernel-owned pagetable for the user to indirectly control via the API, rather than one it owns and writes directly (and thus has to be in a specific agreed format).
IOMMU_HWPT_TYPE_KERNEL then?
IOMMU_HWPT_TYPE_GENERIC also doesn't sound a straight word.
At the end of the day this enum is the type tag for:
struct iommu_hwpt_alloc {
	__u32 size;
	__u32 pt_id;
	__u32 out_hwpt_id;
	__u32 __reserved;
+	__u32 hwpt_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
};
That pointer.
IOMMU_HWPT_ALLOC_DATA_NONE = 0
IOMMU_HWPT_ALLOC_DATA_INTEL_VTD
IOMMU_HWPT_ALLOC_DATA_ARM_SMMUV3
etc?
DATA_NONE requires data_len == 0
Looks good. Probably hwpt_type can also be renamed to data_type to better match this interpretation.
On 2023/9/27 09:08, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Wednesday, September 27, 2023 12:29 AM
On Tue, Sep 26, 2023 at 06:37:55AM +0000, Tian, Kevin wrote:
From: Robin Murphy robin.murphy@arm.com Sent: Friday, September 22, 2023 5:48 PM
I could go on enjoying myself, but basically yeah, "default" can't be a type in itself (at best it would be a meta-type which could be requested, such that it resolves to some real type to actually allocate), so a good name should reflect what the type functionally *means* to the user. IIUC the important distinction is that it's an abstract kernel-owned pagetable for the user to indirectly control via the API, rather than one it owns and writes directly (and thus has to be in a specific agreed format).
IOMMU_HWPT_TYPE_KERNEL then?
IOMMU_HWPT_TYPE_GENERIC also doesn't sound a straight word.
At the end of the day this enum is the type tag for:
struct iommu_hwpt_alloc {
	__u32 size;
	__u32 pt_id;
	__u32 out_hwpt_id;
	__u32 __reserved;
+	__u32 hwpt_type;
+	__u32 data_len;
+	__aligned_u64 data_uptr;
};
That pointer.
IOMMU_HWPT_ALLOC_DATA_NONE = 0
IOMMU_HWPT_ALLOC_DATA_INTEL_VTD
IOMMU_HWPT_ALLOC_DATA_ARM_SMMUV3
etc?
DATA_NONE requires data_len == 0
Looks good. Probably hwpt_type can also be renamed to data_type to better match this interpretation.
ack.
From: Yi Liu yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
iommu_copy_struct_from_user()?
btw given the confusion raised on how this would be used is it clearer to move it to the patch together with the 1st user?
On 2023/9/26 14:56, Tian, Kevin wrote:
From: Yi Liu yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
iommu_copy_struct_from_user()?
btw given the confusion raised on how this would be used is it clearer to move it to the patch together with the 1st user?
sure. How about your opinion? @Nic.
On 2023/10/12 17:12, Yi Liu wrote:
On 2023/9/26 14:56, Tian, Kevin wrote:
From: Yi Liu yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
iommu_copy_struct_from_user()?
btw given the confusion raised on how this would be used is it clearer to move it to the patch together with the 1st user?
sure. How about your opinion? @Nic.
on second thought, the first user of this helper is the patch extending the mock iommu driver. Is it suitable to introduce a common API together with selftest code?
https://lore.kernel.org/linux-iommu/20230921075138.124099-14-yi.l.liu@intel....
On Fri, Oct 13, 2023 at 07:42:50PM +0800, Yi Liu wrote:
On 2023/10/12 17:12, Yi Liu wrote:
On 2023/9/26 14:56, Tian, Kevin wrote:
From: Yi Liu yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
+/**
+ * iommu_copy_user_data - Copy iommu driver specific user space data
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_data: Pointer to a struct iommu_user_data for user space data info
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int iommu_copy_user_data(void *dst_data,
+				       const struct iommu_user_data *src_data,
+				       size_t data_len, size_t min_len)
iommu_copy_struct_from_user()?
btw given the confusion raised on how this would be used is it clearer to move it to the patch together with the 1st user?
sure. How about your opinion? @Nic.
on second thought, the first user of this helper is the patch extending the mock iommu driver. Is it suitable to introduce a common API together with selftest code?
https://lore.kernel.org/linux-iommu/20230921075138.124099-14-yi.l.liu@intel....
I feel no...
But I could separate iommu_copy_struct_from_user and its array derivative into an additional patch placed before the selftest changes, so at least it would be closer to the first callers.
Thanks Nic
On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>
Oh we should definitely avoid doing that!
Maybe this is a good moment to start a new header file exclusively for iommu drivers and core subsystem to include?
include/linux/iommu-driver.h
?
Put iommu_copy_user_data() and struct iommu_user_data in there
Avoid this include in this file.
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
 	bool			queued;
 };
 
+/**
+ * struct iommu_user_data - iommu driver specific user space data info
+ * @uptr: Pointer to the user buffer for copy_from_user()
+ * @len: The length of the user buffer in bytes
+ *
+ * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
+ * Both @uptr and @len should be just copied from an iommufd core uAPI structure
+ */
+struct iommu_user_data {
+	void __user *uptr;
+	size_t len;
+};
Put the "hwpt_type" in here and just call it type
Jason
On 2023/10/11 00:58, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>
Oh we should definitely avoid doing that! Maybe this is a good moment to start a new header file exclusively for iommu drivers and core subsystem to include?
include/linux/iommu-driver.h
?
Put iommu_copy_user_data() and struct iommu_user_data in there
Avoid this include in this file.
sure. btw, it seems all the users of this API and structure are in the drivers/iommu directory. Can we just put them in drivers/iommu/iommu-priv.h?
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
 	bool			queued;
 };
 
+/**
+ * struct iommu_user_data - iommu driver specific user space data info
+ * @uptr: Pointer to the user buffer for copy_from_user()
+ * @len: The length of the user buffer in bytes
+ *
+ * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
+ * Both @uptr and @len should be just copied from an iommufd core uAPI structure
+ */
+struct iommu_user_data {
+	void __user *uptr;
+	size_t len;
+};
Put the "hwpt_type" in here and just call it type
I assume this is a change related to the above comment to avoid including uapi/linux/iommufd.h. right?
Just one concern. There are other paths (like cache_invalidate of this series and Nic's set_dev_data) that use this struct as well. I'm a bit worried whether it is good to put type here, as type is only meaningful for the domain_alloc_user path.
On Thu, Oct 12, 2023 at 05:11:09PM +0800, Yi Liu wrote:
On 2023/10/11 00:58, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>
Oh we should definitely avoid doing that! Maybe this is a good moment to start a new header file exclusively for iommu drivers and core subsystem to include?
include/linux/iommu-driver.h
?
Put iommu_copy_user_data() and struct iommu_user_data in there
Avoid this include in this file.
sure. btw, it seems all the users of this API and structure are in the drivers/iommu directory. Can we just put them in drivers/iommu/iommu-priv.h?
iommu-priv.h should be private to the core iommu code, and we sort of extended it to iommufd as well.
iommu-driver.h would be "private" to the core and all the drivers only.
As include ../.. is often frown on at large scale it is probably better to be in include/linux
Just one concern. There are other paths (like cache_invalidate of this series and Nic's set_dev_data) that use this struct as well. I'm a bit worried whether it is good to put type here, as type is only meaningful for the domain_alloc_user path.
There is always a type though? I haven't got that far in the series yet to see..
Jason
On 2023/10/12 21:39, Jason Gunthorpe wrote:
On Thu, Oct 12, 2023 at 05:11:09PM +0800, Yi Liu wrote:
On 2023/10/11 00:58, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>
Oh we should definitely avoid doing that! Maybe this is a good moment to start a new header file exclusively for iommu drivers and core subsystem to include?
include/linux/iommu-driver.h
?
Put iommu_copy_user_data() and struct iommu_user_data in there
Avoid this include in this file.
sure. btw, it seems all the users of this API and structure are in the drivers/iommu directory. Can we just put them in drivers/iommu/iommu-priv.h?
iommu-priv.h should be private to the core iommu code, and we sort of extended it to iommufd as well.
iommu-driver.h would be "private" to the core and all the drivers only.
As include ../.. is often frown on at large scale it is probably better to be in include/linux
got it.
Just one concern. There are other paths (like cache_invalidate of this series and Nic's set_dev_data) that use this struct as well. I'm a bit worried whether it is good to put type here, as type is only meaningful for the domain_alloc_user path.
There is always a type though? I haven't got that far in the series yet to see..
not really. Below the users of the struct iommu_user_data in my current iommufd_nesting branch. Only the domain_alloc_user op has type as there can be multiple vendor specific alloc data types. Basically, I'm ok to make the change you suggested, just not sure if it is good to add type as it is only needed by one path.
	/* Domain allocation and freeing by the iommu driver */
	struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type);
	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
						  enum iommu_hwpt_type hwpt_type,
						  struct iommu_domain *parent,
						  const struct iommu_user_data *user_data);

	int (*set_dev_user_data)(struct device *dev,
				 const struct iommu_user_data *user_data);
	void (*unset_dev_user_data)(struct device *dev);

	int (*cache_invalidate_user)(struct iommu_domain *domain,
				     struct iommu_user_data_array *array,
				     u32 *error_code);
The above code snippet is from the file below:
https://github.com/yiliu1765/iommufd/blob/iommufd_nesting/include/linux/iomm...
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
not really. Below the users of the struct iommu_user_data in my current iommufd_nesting branch. Only the domain_alloc_user op has type as there can be multiple vendor specific alloc data types. Basically, I'm ok to make the change you suggested, just not sure if it is good to add type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
Jason
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
not really. Below the users of the struct iommu_user_data in my current iommufd_nesting branch. Only the domain_alloc_user op has type as there can be multiple vendor specific alloc data types. Basically, I'm ok to make the change you suggested, just not sure if it is good to add type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
/**
 * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
 * @IOMMU_HWPT_DATA_NONE: no data
 * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
 * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
 */
enum iommu_hwpt_data_type {
	IOMMU_HWPT_DATA_NONE,
	IOMMU_HWPT_DATA_VTD_S1,
	IOMMU_HWPT_DATA_ARM_SMMUV3,
};
Though inevitably we'd have to define a separate data group for things like set_dev_data that is related to idev v.s. hwpt:
// IOMMU_DEV_DATA_TYPE sounds like an IOMMU device, other than a
// passthrough device, so renaming to "_IDEV_" here. And perhaps
// "set_dev_data" could be "set_idev_data" too? Any better name?
/**
 * enum iommu_idev_data_type - Data Type for a Device behind an IOMMU
 * @IOMMU_IDEV_DATA_NONE: no data
 * @IOMMU_IDEV_DATA_ARM_SMMUV3: ARM SMMUv3 specific device data
 */
enum iommu_idev_data_type {
	IOMMU_IDEV_DATA_NONE,
	IOMMU_IDEV_DATA_ARM_SMMUV3,
};
/**
 * struct iommu_idev_data_arm_smmuv3 - ARM SMMUv3 specific device data
 * @sid: The Stream ID that is assigned in the user space
 *
 * The SMMUv3 specific user space data for a device that is behind an SMMU HW.
 * The guest-level user data should be linked to the host-level kernel data,
 * which will be used by user space cache invalidation commands.
 */
struct iommu_idev_data_arm_smmuv3 {
	__u32 sid;
};
/**
 * struct iommu_set_idev_data - ioctl(IOMMU_SET_IDEV_DATA)
 * @size: sizeof(struct iommu_set_idev_data)
 * @dev_id: The device to set an iommu specific device data
 * @data_uptr: User pointer of the device user data
 * @data_len: Length of the device user data
 *
 * The device data must be unset using ioctl(IOMMU_UNSET_IDEV_DATA), before
 * another ioctl(IOMMU_SET_IDEV_DATA) call or before the device itself gets
 * unbind'd from the iommufd context.
 */
struct iommu_set_idev_data {
	__u32 size;
	__u32 dev_id;
	__aligned_u64 data_uptr;
	__u32 data_len;
};
Thanks Nic
On 2023/10/14 01:56, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
not really. Below the users of the struct iommu_user_data in my current iommufd_nesting branch. Only the domain_alloc_user op has type as there can be multiple vendor specific alloc data types. Basically, I'm ok to make the change you suggested, just not sure if it is good to add type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
invalidation path does not need a type field today as the data type is vendor specific; the vendor driver should know the data type when it is called.
/**
 * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
 * @IOMMU_HWPT_DATA_NONE: no data
 * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
 * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
 */
enum iommu_hwpt_data_type {
	IOMMU_HWPT_DATA_NONE,
	IOMMU_HWPT_DATA_VTD_S1,
	IOMMU_HWPT_DATA_ARM_SMMUV3,
};
Though inevitably we'd have to define a separate data group for things like set_dev_data that is related to idev v.s. hwpt:
yes, the type field is in a separate data group per path.
// IOMMU_DEV_DATA_TYPE sounds like an IOMMU device, other than a
// passthrough device, so renaming to "_IDEV_" here. And perhaps
// "set_dev_data" could be "set_idev_data" too? Any better name?
/**
 * enum iommu_idev_data_type - Data Type for a Device behind an IOMMU
 * @IOMMU_IDEV_DATA_NONE: no data
 * @IOMMU_IDEV_DATA_ARM_SMMUV3: ARM SMMUv3 specific device data
 */
enum iommu_idev_data_type {
	IOMMU_IDEV_DATA_NONE,
	IOMMU_IDEV_DATA_ARM_SMMUV3,
};
/**
 * struct iommu_idev_data_arm_smmuv3 - ARM SMMUv3 specific device data
 * @sid: The Stream ID that is assigned in the user space
 *
 * The SMMUv3 specific user space data for a device that is behind an SMMU HW.
 * The guest-level user data should be linked to the host-level kernel data,
 * which will be used by user space cache invalidation commands.
 */
struct iommu_idev_data_arm_smmuv3 {
	__u32 sid;
};
/**
 * struct iommu_set_idev_data - ioctl(IOMMU_SET_IDEV_DATA)
 * @size: sizeof(struct iommu_set_idev_data)
 * @dev_id: The device to set an iommu specific device data
 * @data_uptr: User pointer of the device user data
 * @data_len: Length of the device user data
 *
 * The device data must be unset using ioctl(IOMMU_UNSET_IDEV_DATA), before
 * another ioctl(IOMMU_SET_IDEV_DATA) call or before the device itself gets
 * unbind'd from the iommufd context.
 */
struct iommu_set_idev_data {
	__u32 size;
	__u32 dev_id;
	__aligned_u64 data_uptr;
	__u32 data_len;
};
Thanks Nic
On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
On 2023/10/14 01:56, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
not really. Below the users of the struct iommu_user_data in my current iommufd_nesting branch. Only the domain_alloc_user op has type as there can be multiple vendor specific alloc data types. Basically, I'm ok to make the change you suggested, just not sure if it is good to add type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
invalidation path does not need a type field today as the data type is vendor specific; the vendor driver should know the data type when it is called.
I'm not keen on that, what if a driver needs another type in the future? You'd want to make the invalidation data format part of the domain allocation?
Jason
On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
On 2023/10/14 01:56, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
not really. Below the users of the struct iommu_user_data in my current iommufd_nesting branch. Only the domain_alloc_user op has type as there can be multiple vendor specific alloc data types. Basically, I'm ok to make the change you suggested, just not sure if it is good to add type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
invalidation path does not need a type field today as the data type is vendor specific; the vendor driver should know the data type when it is called.
I'm not keen on that, what if a driver needs another type in the future? You'd want to make the invalidation data format part of the domain allocation?
The invalidation data has hwpt_id so it's tied to a hwpt and its hwpt->domain. Would it be reasonable to have a different type of invalidation data for the same type of hwpt?
With this being asked, I added it for our next version. At this moment, it only does a sanity job:
// API
__iommu_copy_struct_from_user(void *dst_data,
			      const struct iommu_user_data *src_data,
			      unsigned int data_type,
			      size_t data_len, size_t min_len)
{
	if (src_data->type != data_type)
		return -EINVAL;
// Caller
	rc = iommu_copy_struct_from_user(&user_cfg, user_data,
					 IOMMU_HWPT_DATA_SELFTEST, iotlb);
	if (rc)
		return ERR_PTR(rc);
Thanks Nic
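For reference, filling in the elided body with the checks from the earlier helper gives roughly the following sketch; it assumes struct iommu_user_data has gained the "type" member suggested earlier in the thread:

	static inline int
	__iommu_copy_struct_from_user(void *dst_data,
				      const struct iommu_user_data *src_data,
				      unsigned int data_type,
				      size_t data_len, size_t min_len)
	{
		if (WARN_ON(!dst_data || !src_data))
			return -EINVAL;
		/* new: reject a mismatched type tag before the length checks */
		if (src_data->type != data_type)
			return -EINVAL;
		if (src_data->len < min_len || data_len < src_data->len)
			return -EINVAL;
		return copy_struct_from_user(dst_data, data_len,
					     src_data->uptr, src_data->len);
	}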
On 2023/10/17 02:17, Nicolin Chen wrote:
On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
On 2023/10/14 01:56, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
not really. Below the users of the struct iommu_user_data in my current iommufd_nesting branch. Only the domain_alloc_user op has type as there can be multiple vendor specific alloc data types. Basically, I'm ok to make the change you suggested, just not sure if it is good to add type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
the invalidation path does not need a type field today as the data type is vendor specific; the vendor driver should know the data type when it is called.
I'm not keen on that, what if a driver needs another type in the future? You'd want to make the invalidation data format part of the domain allocation?
The invalidation data has hwpt_id so it's tied to a hwpt and its hwpt->domain. Would it be reasonable to have a different type of invalidation data for the same type of hwpt?
this seems to be what Jason is asking: can one type of hwpt have two kinds of invalidation data types? Is that really possible?
With this being asked, I added it for our next version. At this moment, it only does a sanity check:
// API
__iommu_copy_struct_from_user(void *dst_data,
			      const struct iommu_user_data *src_data,
			      unsigned int data_type, size_t data_len,
			      size_t min_len)
{
	if (src_data->type != data_type)
		return -EINVAL;
// Caller
	rc = iommu_copy_struct_from_user(&user_cfg, user_data,
					 IOMMU_HWPT_DATA_SELFTEST, iotlb);
	if (rc)
		return ERR_PTR(rc);
Thanks Nic
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, October 17, 2023 4:52 PM
On 2023/10/17 02:17, Nicolin Chen wrote:
On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
On 2023/10/14 01:56, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> not really. Below are the users of the struct iommu_user_data in my
> current iommufd_nesting branch. Only the domain_alloc_user op has a
> type, as there can be multiple vendor-specific alloc data types.
> Basically, I'm ok to make the change you suggested, just not sure if
> it is good to add a type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
the invalidation path does not need a type field today as the data type is vendor specific; the vendor driver should know the data type when it is called.
I'm not keen on that, what if a driver needs another type in the future? You'd want to make the invalidation data format part of the domain allocation?
The invalidation data has hwpt_id so it's tied to a hwpt and its hwpt->domain. Would it be reasonable to have a different type of invalidation data for the same type of hwpt?
this seems to be what Jason is asking: can one type of hwpt have two kinds of invalidation data types? Is that really possible?
e.g. vhost-iommu may want its own vendor-agnostic format, per the previous discussion...
On 2023/10/17 17:28, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Tuesday, October 17, 2023 4:52 PM
On 2023/10/17 02:17, Nicolin Chen wrote:
On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
On 2023/10/14 01:56, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
> On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
>
>> not really. Below are the users of the struct iommu_user_data in my
>> current iommufd_nesting branch. Only the domain_alloc_user op has a
>> type, as there can be multiple vendor-specific alloc data types.
>> Basically, I'm ok to make the change you suggested, just not sure if
>> it is good to add a type as it is only needed by one path.
>
> I don't think we should ever have an opaque data blob without a type
> tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
the invalidation path does not need a type field today as the data type is vendor specific; the vendor driver should know the data type when it is called.
I'm not keen on that, what if a driver needs another type in the future? You'd want to make the invalidation data format part of the domain allocation?
The invalidation data has hwpt_id so it's tied to a hwpt and its hwpt->domain. Would it be reasonable to have a different type of invalidation data for the same type of hwpt?
this seems to be what Jason is asking: can one type of hwpt have two kinds of invalidation data types? Is that really possible?
e.g. vhost-iommu may want its own vendor-agnostic format, per the previous discussion...
ok. So in the future, if there is a vendor-agnostic cache invalidation data structure, then each existing hwpt type would have two cache invalidation types. Is that it?
So it is required to make the invalidation data format part of the domain allocation. Perhaps we can add it later?
Regards, Yi Liu
On Mon, Oct 16, 2023 at 11:17:56AM -0700, Nicolin Chen wrote:
On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
On 2023/10/14 01:56, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
not really. Below are the users of the struct iommu_user_data in my current iommufd_nesting branch. Only the domain_alloc_user op has a type, as there can be multiple vendor-specific alloc data types. Basically, I'm ok to make the change you suggested, just not sure if it is good to add a type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
the invalidation path does not need a type field today as the data type is vendor specific; the vendor driver should know the data type when it is called.
I'm not keen on that, what if a driver needs another type in the future? You'd want to make the invalidation data format part of the domain allocation?
The invalidation data has hwpt_id so it's tied to a hwpt and its hwpt->domain. Would it be reasonable to have a different type of invalidation data for the same type of hwpt?
Yeah, maybe? Go down the road 10 years and we might have SMMUv3 invalidation format v1 and v2 or something?
Like we don't know what the HW side will do, they might extend the command queue to have bigger commands and negotiate with the driver if the bigger/smaller format is used. We've done that in our HW a couple of times now.
Jason
On Wed, Oct 18, 2023 at 01:37:20PM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 11:17:56AM -0700, Nicolin Chen wrote:
On Mon, Oct 16, 2023 at 08:54:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 11:28:15AM +0800, Yi Liu wrote:
On 2023/10/14 01:56, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 11:04:56AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 12:33:13PM +0800, Yi Liu wrote:
> not really. Below are the users of the struct iommu_user_data in my
> current iommufd_nesting branch. Only the domain_alloc_user op has a
> type, as there can be multiple vendor-specific alloc data types.
> Basically, I'm ok to make the change you suggested, just not sure if
> it is good to add a type as it is only needed by one path.
I don't think we should ever have an opaque data blob without a type tag..
I can add those "missing" data types, and then a driver will be responsible for sanitizing the type along with the data_len.
I notice that the enum iommu_hwpt_data_type in the posted patch is confined to the alloc_user uAPI. Perhaps we should share it with invalidate too:
the invalidation path does not need a type field today as the data type is vendor specific; the vendor driver should know the data type when it is called.
I'm not keen on that, what if a driver needs another type in the future? You'd want to make the invalidation data format part of the domain allocation?
The invalidation data has hwpt_id so it's tied to a hwpt and its hwpt->domain. Would it be reasonable to have a different type of invalidation data for the same type of hwpt?
Yeah, maybe? Go down the road 10 years and we might have SMMUv3 invalidation format v1 and v2 or something?
Like we don't know what the HW side will do, they might extend the command queue to have bigger commands and negotiate with the driver if the bigger/smaller format is used. We've done that in our HW a couple of times now.
I see. We'll have the type. Thanks!
On Tue, Oct 10, 2023 at 01:58:44PM -0300, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>
Oh we should definitely avoid doing that! Maybe this is a good moment to start a new header file exclusively for iommu drivers and the core subsystem to include?
include/linux/iommu-driver.h
?
Put iommu_copy_user_data() and struct iommu_user_data in there
Avoid this include in this file.
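[Editor's note: a rough skeleton of what such a header could look like, following the suggestion above. The file name comes from the message; the contents are only a sketch (the iommu_copy_user_data() body approximates the posted series), not the final header.]

/* include/linux/iommu-driver.h */
#ifndef __LINUX_IOMMU_DRIVER_H
#define __LINUX_IOMMU_DRIVER_H

#include <linux/uaccess.h>
#include <uapi/linux/iommufd.h>

struct iommu_user_data {
	void __user *uptr;
	size_t len;
};

static inline int iommu_copy_user_data(void *dst_data,
				       const struct iommu_user_data *src_data,
				       size_t data_len, size_t min_len)
{
	if (src_data->len < min_len || WARN_ON(src_data->len > data_len))
		return -EINVAL;
	return copy_struct_from_user(dst_data, data_len,
				     src_data->uptr, src_data->len);
}

#endif /* __LINUX_IOMMU_DRIVER_H */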
By looking closer, it seems that we included the uapi header for:

+	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
+						   enum iommu_hwpt_data_type data_type,
+						   struct iommu_domain *parent,
+						   const struct iommu_user_data *user_data);

So we could drop the include, and instead add this next to the structs:

+enum iommu_hwpt_data_type;
Then it's not really necessary to have a new header? We could mark a "driver exclusive functions" section in iommu.h, I think.
 #define IOMMU_READ	(1 << 0)
 #define IOMMU_WRITE	(1 << 1)
@@ -227,6 +228,41 @@ struct iommu_iotlb_gather {
 	bool			queued;
 };

+/**
+ * struct iommu_user_data - iommu driver specific user space data info
+ * @uptr: Pointer to the user buffer for copy_from_user()
+ * @len: The length of the user buffer in bytes
+ *
+ * A user space data is an uAPI that is defined in include/uapi/linux/iommufd.h
+ * Both @uptr and @len should be just copied from an iommufd core uAPI structure
+ */
+struct iommu_user_data {
+	void __user *uptr;
+	size_t len;
+};
Put the "hwpt_type" in here and just call it type
Ah, then domain_alloc_user can omit the enum iommu_hwpt_data_type.
Thanks! Nic
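[Editor's note: a minimal sketch of what the suggestion above could look like; the field order and the comment are illustrative, not the posted code.]

/* iommu_user_data carrying its own type tag, so every consumer
 * (alloc and invalidate alike) can validate it against data_len.
 */
struct iommu_user_data {
	unsigned int type;	/* e.g. a value of enum iommu_hwpt_data_type */
	void __user *uptr;
	size_t len;
};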
On Thu, Oct 12, 2023 at 05:34:58PM -0700, Nicolin Chen wrote:
On Tue, Oct 10, 2023 at 01:58:44PM -0300, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:22AM -0700, Yi Liu wrote:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 660dc1931dc9..12e12e5563e6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -14,6 +14,7 @@
 #include <linux/err.h>
 #include <linux/of.h>
 #include <uapi/linux/iommu.h>
+#include <uapi/linux/iommufd.h>
Oh we should definitely avoid doing that! Maybe this is a good moment to start a new header file exclusively for iommu drivers and the core subsystem to include?
include/linux/iommu-driver.h
?
Put iommu_copy_user_data() and struct iommu_user_data in there
Avoid this include in this file.
By looking closer, it seems that we included the uapi header for:

+	struct iommu_domain *(*domain_alloc_user)(struct device *dev, u32 flags,
+						   enum iommu_hwpt_data_type data_type,
+						   struct iommu_domain *parent,
+						   const struct iommu_user_data *user_data);

So we could drop the include, and instead add this next to the structs:

+enum iommu_hwpt_data_type;
Then it's not that necessary to have a new header? We could mark a section of "driver exclusively functions" in iommu.h, I think.
Yeah, OK, though I still have a desire to split this header..
(though can you really forward-declare an enum and then pass it by value?)
Jason
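[Editor's note: a sketch of the concern raised above, based on my reading of the C rules rather than anything in the thread. ISO C has no enum forward declarations at all; GCC accepts one as an extension, yielding an incomplete type, and a function *declaration* (unlike a definition or a call) may use an incomplete type for a by-value parameter.]

/* Illustrative only: relies on the GNU C incomplete-enum extension. */
enum iommu_hwpt_data_type;		/* incomplete type here */

struct iommu_ops_sketch {
	/* OK as a pure declaration; the enum must be complete
	 * wherever this op is actually implemented or invoked.
	 */
	struct iommu_domain *(*domain_alloc_user)(
			struct device *dev, u32 flags,
			enum iommu_hwpt_data_type data_type,
			struct iommu_domain *parent,
			const struct iommu_user_data *user_data);
};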
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new domain type for a user I/O page table, which is nested on top of another user space address space represented by an UNMANAGED domain. The mappings of a nested domain are managed by user space software, so it is not necessary to have map/unmap callbacks. But the updates of the PTEs in the nested page table will be propagated to the hardware caches on both the IOMMU (IOTLB) and devices (DevTLB/ATC).
A nested domain is allocated by the domain_alloc_user op, and attached to a device through the existing iommu_attach_device/group() interfaces.
Add a new domain op cache_invalidate_user for the userspace to flush the hardware caches for a nested domain through iommufd. No wrapper for it, as it's only supposed to be used by iommufd.
Pass in invalidation requests to the cache_invalidate_user op, in the form of a user data array that contains a number of invalidation entries. Add an iommu_user_data_array struct and an iommu_copy_user_data_from_array helper for iommu drivers to walk through the invalidation request array and fetch the data entries inside.
Signed-off-by: Lu Baolu baolu.lu@linux.intel.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
Co-developed-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 include/linux/iommu.h | 59 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 12e12e5563e6..439e295c91a3 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -66,6 +66,9 @@ struct iommu_domain_geometry {

 #define __IOMMU_DOMAIN_SVA	(1U << 4)  /* Shared process address space */

+#define __IOMMU_DOMAIN_NESTED	(1U << 5)  /* User-managed address space nested
+					      on a stage-2 translation */
+
 #define IOMMU_DOMAIN_ALLOC_FLAGS ~__IOMMU_DOMAIN_DMA_FQ
 /*
  * This are the possible domain-types
@@ -92,6 +95,7 @@ struct iommu_domain_geometry {
 					 __IOMMU_DOMAIN_DMA_API |	\
 					 __IOMMU_DOMAIN_DMA_FQ)
 #define IOMMU_DOMAIN_SVA	(__IOMMU_DOMAIN_SVA)
+#define IOMMU_DOMAIN_NESTED	(__IOMMU_DOMAIN_NESTED)

 struct iommu_domain {
 	unsigned type;
@@ -241,6 +245,21 @@ struct iommu_user_data {
 	size_t len;
 };

+/**
+ * struct iommu_user_data_array - iommu driver specific user space data array
+ * @uptr: Pointer to the user buffer array for copy_from_user()
+ * @entry_len: The fixed-width length of an entry in the array, in bytes
+ * @entry_num: The number of total entries in the array
+ *
+ * An array having an @entry_num number of @entry_len sized entries, each entry
+ * is user space data, i.e. an uAPI that is defined in include/uapi/linux/iommufd.h
+ */
+struct iommu_user_data_array {
+	void __user *uptr;
+	size_t entry_len;
+	int entry_num;
+};
+
 /**
  * iommu_copy_user_data - Copy iommu driver specific user space data
  * @dst_data: Pointer to an iommu driver specific user data that is defined in
@@ -263,6 +282,34 @@ static inline int iommu_copy_user_data(void *dst_data,
 				    src_data->uptr, src_data->len);
 }

+/**
+ * iommu_copy_user_data_from_array - Copy iommu driver specific user space data
+ *                                   from an iommu_user_data_array input
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_array: Pointer to a struct iommu_user_data_array for a user space data array
+ * @index: Index to offset the location in the array to copy user data from
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int
+iommu_copy_user_data_from_array(void *dst_data,
+				const struct iommu_user_data_array *src_array,
+				int index, size_t data_len, size_t min_len)
+{
+	struct iommu_user_data src_data;
+
+	if (WARN_ON(!src_array || index >= src_array->entry_num))
+		return -EINVAL;
+	if (!src_array->entry_num)
+		return -EINVAL;
+	src_data.uptr = src_array->uptr + src_array->entry_len * index;
+	src_data.len = src_array->entry_len;
+	return iommu_copy_user_data(dst_data, &src_data, data_len, min_len);
+}
+
 /**
  * struct iommu_ops - iommu ops and capabilities
  * @capable: check capability
@@ -374,6 +421,15 @@ struct iommu_ops {
  * @iotlb_sync_map: Sync mappings created recently using @map to the hardware
  * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush
  *              queue
+ * @cache_invalidate_user: Flush hardware cache for user space IO page table.
+ *                         The @domain must be IOMMU_DOMAIN_NESTED. The @array
+ *                         passes in the cache invalidation requests, in form
+ *                         of a driver data structure. The driver must update
+ *                         array->entry_num to report the number of handled
+ *                         invalidation requests. The 32-bit @error_code can
+ *                         forward a driver specific error code to user space.
+ *                         Both the driver data structure and the error code
+ *                         must be defined in include/uapi/linux/iommufd.h
  * @iova_to_phys: translate iova to physical address
  * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
  *                           including no-snoop TLPs on PCIe or other platform
@@ -403,6 +459,9 @@ struct iommu_domain_ops {
 			     size_t size);
 	void (*iotlb_sync)(struct iommu_domain *domain,
 			   struct iommu_iotlb_gather *iotlb_gather);
+	int (*cache_invalidate_user)(struct iommu_domain *domain,
+				     struct iommu_user_data_array *array,
+				     u32 *error_code);
 	phys_addr_t (*iova_to_phys)(struct iommu_domain *domain,
 				    dma_addr_t iova);
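[Editor's note: a rough sketch, not from the series, of how a driver's cache_invalidate_user op might consume the array with the helper above. "struct iommu_hwpt_example_inv" and its "flags" member are made-up names for illustration.]

static int example_cache_invalidate_user(struct iommu_domain *domain,
					 struct iommu_user_data_array *array,
					 u32 *error_code)
{
	struct iommu_hwpt_example_inv inv;
	int i, rc = 0;

	for (i = 0; i < array->entry_num; i++) {
		/* min_len pins down the first published layout of the entry */
		rc = iommu_copy_user_data_from_array(
				&inv, array, i, sizeof(inv),
				offsetofend(struct iommu_hwpt_example_inv, flags));
		if (rc)
			break;
		/* ... issue the HW invalidation described by "inv";
		 * on a HW failure, set *error_code and stop ...
		 */
	}
	/* Per the kdoc contract: report how many entries were handled */
	array->entry_num = i;
	return rc;
}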
On Thu, Sep 21, 2023 at 12:51:23AM -0700, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new domain type for a user I/O page table, which is nested on top of another user space address space represented by an UNMANAGED domain. The
Let's start using the word PAGING whenever you want to type UNMANAGED. I'm trying to get rid of UNMANAGED.
@@ -241,6 +245,21 @@ struct iommu_user_data {
 	size_t len;
 };

+/**
+ * struct iommu_user_data_array - iommu driver specific user space data array
+ * @uptr: Pointer to the user buffer array for copy_from_user()
+ * @entry_len: The fixed-width length of an entry in the array, in bytes
+ * @entry_num: The number of total entries in the array
+ *
+ * An array having an @entry_num number of @entry_len sized entries, each entry
+ * is user space data, i.e. an uAPI that is defined in include/uapi/linux/iommufd.h
+ */
+struct iommu_user_data_array {
+	void __user *uptr;
+	size_t entry_len;
+	int entry_num;
+};
Ditto about iommu-driver.h for most of this stuff
 /**
  * iommu_copy_user_data - Copy iommu driver specific user space data
  * @dst_data: Pointer to an iommu driver specific user data that is defined in
@@ -263,6 +282,34 @@ static inline int iommu_copy_user_data(void *dst_data,
 				    src_data->uptr, src_data->len);
 }

+/**
+ * iommu_copy_user_data_from_array - Copy iommu driver specific user space data
+ *                                   from an iommu_user_data_array input
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_array: Pointer to a struct iommu_user_data_array for a user space data array
+ * @index: Index to offset the location in the array to copy user data from
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int
+iommu_copy_user_data_from_array(void *dst_data,
+				const struct iommu_user_data_array *src_array,
+				int index, size_t data_len, size_t min_len)
Index should be 'unsigned int'
Jason
On 2023/10/11 01:21, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:23AM -0700, Yi Liu wrote:
From: Lu Baolu baolu.lu@linux.intel.com
Introduce a new domain type for a user I/O page table, which is nested on top of another user space address represented by a UNMANAGED domain. The
Let's start using the word PAGING whenever you want to type UNMANAGED. I'm trying to get rid of UNMANAGED.
sure.
@@ -241,6 +245,21 @@ struct iommu_user_data {
 	size_t len;
 };

+/**
+ * struct iommu_user_data_array - iommu driver specific user space data array
+ * @uptr: Pointer to the user buffer array for copy_from_user()
+ * @entry_len: The fixed-width length of an entry in the array, in bytes
+ * @entry_num: The number of total entries in the array
+ *
+ * An array having an @entry_num number of @entry_len sized entries, each entry
+ * is user space data, i.e. an uAPI that is defined in include/uapi/linux/iommufd.h
+ */
+struct iommu_user_data_array {
+	void __user *uptr;
+	size_t entry_len;
+	int entry_num;
+};
Ditto about iommu-driver.h for most of this stuff
ack.
 /**
  * iommu_copy_user_data - Copy iommu driver specific user space data
  * @dst_data: Pointer to an iommu driver specific user data that is defined in
@@ -263,6 +282,34 @@ static inline int iommu_copy_user_data(void *dst_data,
 				    src_data->uptr, src_data->len);
 }

+/**
+ * iommu_copy_user_data_from_array - Copy iommu driver specific user space data
+ *                                   from an iommu_user_data_array input
+ * @dst_data: Pointer to an iommu driver specific user data that is defined in
+ *            include/uapi/linux/iommufd.h
+ * @src_array: Pointer to a struct iommu_user_data_array for a user space data array
+ * @index: Index to offset the location in the array to copy user data from
+ * @data_len: Length of current user data structure, i.e. sizeof(struct _dst)
+ * @min_len: Initial length of user data structure for backward compatibility.
+ *           This should be offsetofend using the last member in the user data
+ *           struct that was initially added to include/uapi/linux/iommufd.h
+ */
+static inline int
+iommu_copy_user_data_from_array(void *dst_data,
+				const struct iommu_user_data_array *src_array,
+				int index, size_t data_len, size_t min_len)
Index should be 'unsigned int'
yes.
On Thu, Oct 12, 2023 at 05:13:59PM +0800, Yi Liu wrote:
@@ -241,6 +245,21 @@ struct iommu_user_data {
 	size_t len;
 };

+/**
+ * struct iommu_user_data_array - iommu driver specific user space data array
+ * @uptr: Pointer to the user buffer array for copy_from_user()
+ * @entry_len: The fixed-width length of an entry in the array, in bytes
+ * @entry_num: The number of total entries in the array
+ *
+ * An array having an @entry_num number of @entry_len sized entries, each entry
+ * is user space data, i.e. an uAPI that is defined in include/uapi/linux/iommufd.h
+ */
+struct iommu_user_data_array {
+	void __user *uptr;
+	size_t entry_len;
+	int entry_num;
+};
Ditto about iommu-driver.h for most of this stuff
ack.
FYI, I aligned with Jason in another conversation that we can defer this iommu-driver.h thing, since there's no longer a need to include the uapi header in iommu.h
Thanks Nic
From: Nicolin Chen nicolinc@nvidia.com
The struct iommufd_hw_pagetable has been representing a kernel-managed HWPT, yet soon will be reused to represent a user-managed HWPT. These two types of HWPTs have the same IOMMUFD object type and an iommu_domain object, but quite different attributes/members.
Add a union in struct iommufd_hw_pagetable and group all the existing kernel-managed members. One of the following patches will add another struct for user-managed members.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3064997a0181..947a797536e3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
  */
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
-	struct iommufd_ioas *ioas;
 	struct iommu_domain *domain;
-	bool auto_domain : 1;
-	bool enforce_cache_coherency : 1;
-	bool msi_cookie : 1;
-	/* Head at iommufd_ioas::hwpt_list */
-	struct list_head hwpt_item;
+
+	union {
+		struct { /* kernel-managed */
+			struct iommufd_ioas *ioas;
+			bool auto_domain : 1;
+			bool enforce_cache_coherency : 1;
+			bool msi_cookie : 1;
+			/* Head at iommufd_ioas::hwpt_list */
+			struct list_head hwpt_item;
+		};
+	};
 };
struct iommufd_hw_pagetable *
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
From: Nicolin Chen nicolinc@nvidia.com
The struct iommufd_hw_pagetable has been representing a kernel-managed HWPT, yet soon will be reused to represent a user-managed HWPT. These two types of HWPTs have the same IOMMUFD object type and an iommu_domain object, but quite different attributes/members.
Add a union in struct iommufd_hw_pagetable and group all the existing kernel-managed members. One of the following patches will add another struct for user-managed members.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
Reviewed-by: Kevin Tian kevin.tian@intel.com
On Thu, Sep 21, 2023 at 12:51:24AM -0700, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
The struct iommufd_hw_pagetable has been representing a kernel-managed HWPT, yet soon will be reused to represent a user-managed HWPT. These two types of HWPTs have the same IOMMUFD object type and an iommu_domain object, but quite different attributes/members.
Add a union in struct iommufd_hw_pagetable and group all the existing kernel-managed members. One of the following patches will add another struct for user-managed members.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3064997a0181..947a797536e3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
  */
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
-	struct iommufd_ioas *ioas;
 	struct iommu_domain *domain;
-	bool auto_domain : 1;
-	bool enforce_cache_coherency : 1;
-	bool msi_cookie : 1;
-	/* Head at iommufd_ioas::hwpt_list */
-	struct list_head hwpt_item;
+
+	union {
+		struct { /* kernel-managed */
+			struct iommufd_ioas *ioas;
+			bool auto_domain : 1;
Will iommufd_hw_pagetable_put() also be called on a non-kernel-managed domain? If yes, hwpt->user_managed needs to be checked in iommufd_hw_pagetable_put() (e.g. as below). Otherwise, this union will leave hwpt->ioas and hwpt->auto_domain still accessible though invalid.
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
					    struct iommufd_hw_pagetable *hwpt)
{
-	lockdep_assert_not_held(&hwpt->ioas->mutex);
-	if (hwpt->auto_domain)
+	if (!hwpt->user_managed)
+		lockdep_assert_not_held(&hwpt->ioas->mutex);
+
+	if (!hwpt->user_managed && hwpt->auto_domain)
 		iommufd_object_deref_user(ictx, &hwpt->obj);
 	else
 		refcount_dec(&hwpt->obj.users);
}
+			bool enforce_cache_coherency : 1;
+			bool msi_cookie : 1;
+			/* Head at iommufd_ioas::hwpt_list */
+			struct list_head hwpt_item;
+		};
+	};
 };

 struct iommufd_hw_pagetable *
--
2.34.1
On 2023/10/7 18:08, Yan Zhao wrote:
On Thu, Sep 21, 2023 at 12:51:24AM -0700, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
The struct iommufd_hw_pagetable has been representing a kernel-managed HWPT, yet soon will be reused to represent a user-managed HWPT. These two types of HWPTs have the same IOMMUFD object type and an iommu_domain object, but quite different attributes/members.
Add a union in struct iommufd_hw_pagetable and group all the existing kernel-managed members. One of the following patches will add another struct for user-managed members.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3064997a0181..947a797536e3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
  */
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
-	struct iommufd_ioas *ioas;
 	struct iommu_domain *domain;
-	bool auto_domain : 1;
-	bool enforce_cache_coherency : 1;
-	bool msi_cookie : 1;
-	/* Head at iommufd_ioas::hwpt_list */
-	struct list_head hwpt_item;
+
+	union {
+		struct { /* kernel-managed */
+			struct iommufd_ioas *ioas;
+			bool auto_domain : 1;
Will iommufd_hw_pagetable_put() also be called on a non-kernel-managed domain?
yes.
If yes, hwpt->user_managed needs to be checked in iommufd_hw_pagetable_put() (e.g. as below). Otherwise, this union will leave hwpt->ioas and hwpt->auto_domain still accessible though invalid.
I don't quite get this sentence.
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
					    struct iommufd_hw_pagetable *hwpt)
{
-	lockdep_assert_not_held(&hwpt->ioas->mutex);
-	if (hwpt->auto_domain)
+	if (!hwpt->user_managed)
+		lockdep_assert_not_held(&hwpt->ioas->mutex);
this is true. This assert is not needed when the hwpt is not a kernel-managed domain.
+	if (!hwpt->user_managed && hwpt->auto_domain)
actually, checking auto_domain is more precise. There are hwpts which are neither user-managed nor auto.
 		iommufd_object_deref_user(ictx, &hwpt->obj);
 	else
 		refcount_dec(&hwpt->obj.users);
 }
+			bool enforce_cache_coherency : 1;
+			bool msi_cookie : 1;
+			/* Head at iommufd_ioas::hwpt_list */
+			struct list_head hwpt_item;
+		};
+	};
 };

 struct iommufd_hw_pagetable *
--
2.34.1
On Mon, Oct 09, 2023 at 12:13:52PM +0800, Yi Liu wrote:
On 2023/10/7 18:08, Yan Zhao wrote:
On Thu, Sep 21, 2023 at 12:51:24AM -0700, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
The struct iommufd_hw_pagetable has been representing a kernel-managed HWPT, yet soon will be reused to represent a user-managed HWPT. These two types of HWPTs have the same IOMMUFD object type and an iommu_domain object, but quite different attributes/members.
Add a union in struct iommufd_hw_pagetable and group all the existing kernel-managed members. One of the following patches will add another struct for user-managed members.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3064997a0181..947a797536e3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
  */
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
-	struct iommufd_ioas *ioas;
 	struct iommu_domain *domain;
-	bool auto_domain : 1;
-	bool enforce_cache_coherency : 1;
-	bool msi_cookie : 1;
-	/* Head at iommufd_ioas::hwpt_list */
-	struct list_head hwpt_item;
+
+	union {
+		struct { /* kernel-managed */
+			struct iommufd_ioas *ioas;
+			bool auto_domain : 1;
Will iommufd_hw_pagetable_put() also be called on a non-kernel-managed domain?
yes.
If yes, hwpt->user_managed needs to be checked in iommufd_hw_pagetable_put() (e.g. as below). Otherwise, this union will leave hwpt->ioas and hwpt->auto_domain still accessible though invalid.
I don't quite get this sentence.
I mean that with this union, hwpt->auto_domain or hwpt->ioas should be accessed if and only if the hwpt type is kernel-managed. So any unconditional access, as in iommufd_hw_pagetable_put() pasted below, is buggy.
Therefore, do you think it's better to add a field like "bool kernel_managed : 1", and access the union fields under /* kernel-managed */ only when hwpt->kernel_managed is true?
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
					    struct iommufd_hw_pagetable *hwpt)
{
-	lockdep_assert_not_held(&hwpt->ioas->mutex);
-	if (hwpt->auto_domain)
+	if (!hwpt->user_managed)
+		lockdep_assert_not_held(&hwpt->ioas->mutex);
this is true. This assert is not needed when the hwpt is not a kernel-managed domain.
+	if (!hwpt->user_managed && hwpt->auto_domain)
actually, checking auto_domain is more precise. There are hwpts which are neither user-managed nor auto.
auto_domain is among the union fields marked kernel-managed only. Accessing it without a type check is invalid.
struct iommufd_hw_pagetable {
	struct iommufd_object obj;
	struct iommu_domain *domain;

	void (*abort)(struct iommufd_object *obj);
	void (*destroy)(struct iommufd_object *obj);

	bool user_managed : 1;
	union {
		struct { /* user-managed */
			struct iommufd_hw_pagetable *parent;
		};
		struct { /* kernel-managed */
			struct iommufd_ioas *ioas;
			struct mutex mutex;
			bool auto_domain : 1;
			bool enforce_cache_coherency : 1;
			bool msi_cookie : 1;
			bool nest_parent : 1;
			/* Head at iommufd_ioas::hwpt_list */
			struct list_head hwpt_item;
		};
	};
};
 		iommufd_object_deref_user(ictx, &hwpt->obj);
 	else
 		refcount_dec(&hwpt->obj.users);
 }

+			bool enforce_cache_coherency : 1;
+			bool msi_cookie : 1;
+			/* Head at iommufd_ioas::hwpt_list */
+			struct list_head hwpt_item;
+		};
+	};
 };

 struct iommufd_hw_pagetable *
--
2.34.1
-- Regards, Yi Liu
On 2023/10/9 13:13, Yan Zhao wrote:
On Mon, Oct 09, 2023 at 12:13:52PM +0800, Yi Liu wrote:
On 2023/10/7 18:08, Yan Zhao wrote:
On Thu, Sep 21, 2023 at 12:51:24AM -0700, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
The struct iommufd_hw_pagetable has been representing a kernel-managed HWPT, yet soon will be reused to represent a user-managed HWPT. These two types of HWPTs have the same IOMMUFD object type and an iommu_domain object, but quite different attributes/members.
Add a union in struct iommufd_hw_pagetable and group all the existing kernel-managed members. One of the following patches will add another struct for user-managed members.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/iommufd_private.h | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 3064997a0181..947a797536e3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -231,13 +231,18 @@ int iommufd_vfio_ioas(struct iommufd_ucmd *ucmd);
  */
 struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
-	struct iommufd_ioas *ioas;
 	struct iommu_domain *domain;
-	bool auto_domain : 1;
-	bool enforce_cache_coherency : 1;
-	bool msi_cookie : 1;
-	/* Head at iommufd_ioas::hwpt_list */
-	struct list_head hwpt_item;
+
+	union {
+		struct { /* kernel-managed */
+			struct iommufd_ioas *ioas;
+			bool auto_domain : 1;
Will iommufd_hw_pagetable_put() also be called on a non-kernel-managed domain?
yes.
If yes, hwpt->user_managed needs to be checked in iommufd_hw_pagetable_put() (e.g. as below). Otherwise, this union will leave hwpt->ioas and hwpt->auto_domain still accessible though invalid.
I don't quite get this sentence.
I mean that with this union, hwpt->auto_domain or hwpt->ioas should be accessed if and only if the hwpt type is kernel-managed.
ok, I get this part. just not sure about the missing words in your prior comment.
So, any unconditional access, as in iommufd_hw_pagetable_put() pasted below, is buggy.
Therefore, do you think it's better to add a field like "bool kernel_managed : 1", and access the union fields under /* kernel-managed */ only when hwpt->kernel_managed is true?
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
					    struct iommufd_hw_pagetable *hwpt)
{
-	lockdep_assert_not_held(&hwpt->ioas->mutex);
-	if (hwpt->auto_domain)
+	if (!hwpt->user_managed)
+		lockdep_assert_not_held(&hwpt->ioas->mutex);
this is true. This assert is not needed when the hwpt is not a kernel-managed domain.
+	if (!hwpt->user_managed && hwpt->auto_domain)
actually, checking auto_domain is more precise. There are hwpts which are neither user-managed nor auto.
auto_domain is among the union fields marked kernel-managed only. Accessing it without a type check is invalid.
I see. yes, should check user_managed as well. :)
struct iommufd_hw_pagetable {
	struct iommufd_object obj;
	struct iommu_domain *domain;

	void (*abort)(struct iommufd_object *obj);
	void (*destroy)(struct iommufd_object *obj);

	bool user_managed : 1;
	union {
		struct { /* user-managed */
			struct iommufd_hw_pagetable *parent;
		};
		struct { /* kernel-managed */
			struct iommufd_ioas *ioas;
			struct mutex mutex;
			bool auto_domain : 1;
			bool enforce_cache_coherency : 1;
			bool msi_cookie : 1;
			bool nest_parent : 1;
			/* Head at iommufd_ioas::hwpt_list */
			struct list_head hwpt_item;
		};
	};
};
 		iommufd_object_deref_user(ictx, &hwpt->obj);
 	else
 		refcount_dec(&hwpt->obj.users);
 }

+			bool enforce_cache_coherency : 1;
+			bool msi_cookie : 1;
+			/* Head at iommufd_ioas::hwpt_list */
+			struct list_head hwpt_item;
+		};
+	};
 };

 struct iommufd_hw_pagetable *
--
2.34.1
-- Regards, Yi Liu
The iommu core already supports accepting user_data and hwpt_type when allocating a user iommu_domain, which allows the caller iommufd_hw_pagetable_alloc() to accept the hwpt_type and user_data too. Thus, pass them in.
Reviewed-by: Kevin Tian kevin.tian@intel.com
Co-developed-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/iommu/iommufd/device.c          |  3 ++-
 drivers/iommu/iommufd/hw_pagetable.c    | 17 ++++++++++++++---
 drivers/iommu/iommufd/iommufd_private.h |  3 +++
 3 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index e88fa73a45e6..e04900f101f1 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -540,7 +540,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
 	}

 	hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev,
-					  0, immediate_attach);
+					  0, IOMMU_HWPT_TYPE_DEFAULT,
+					  NULL, immediate_attach);
 	if (IS_ERR(hwpt)) {
 		destroy_hwpt = ERR_CAST(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 1d7378a6cbb3..554a9c3d740f 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -62,6 +62,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
  * @flags: Flags from userspace
+ * @hwpt_type: Requested type of hw_pagetable
+ * @user_data: Optional user_data pointer
  * @immediate_attach: True if idev should be attached to the hwpt
  *
  * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
@@ -75,6 +77,8 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, u32 flags,
+			   enum iommu_hwpt_type hwpt_type,
+			   struct iommu_user_data *user_data,
 			   bool immediate_attach)
 {
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
@@ -86,6 +90,11 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	if ((flags & IOMMU_HWPT_ALLOC_NEST_PARENT) && !ops->domain_alloc_user)
 		return ERR_PTR(-EOPNOTSUPP);

+	if (user_data && hwpt_type == IOMMU_HWPT_TYPE_DEFAULT)
+		return ERR_PTR(-EINVAL);
+	if (user_data && !ops->domain_alloc_user)
+		return ERR_PTR(-EOPNOTSUPP);
+
 	hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE);
 	if (IS_ERR(hwpt))
 		return hwpt;
@@ -97,8 +106,8 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,

 	if (ops->domain_alloc_user) {
 		hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
-						      IOMMU_HWPT_TYPE_DEFAULT,
-						      NULL, NULL);
+						      hwpt_type, NULL,
+						      user_data);
 		if (IS_ERR(hwpt->domain)) {
 			rc = PTR_ERR(hwpt->domain);
 			hwpt->domain = NULL;
@@ -174,7 +183,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)

 	mutex_lock(&ioas->mutex);
 	hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas,
-					  idev, cmd->flags, false);
+					  idev, cmd->flags,
+					  IOMMU_HWPT_TYPE_DEFAULT,
+					  NULL, false);
 	if (IS_ERR(hwpt)) {
 		rc = PTR_ERR(hwpt);
 		goto out_unlock;
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 947a797536e3..1d3b1a74e854 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -13,6 +13,7 @@ struct iommu_domain;
 struct iommu_group;
 struct iommu_option;
 struct iommufd_device;
+struct iommu_user_data;

 struct iommufd_ctx {
 	struct file *file;
@@ -248,6 +249,8 @@ struct iommufd_hw_pagetable {
 struct iommufd_hw_pagetable *
 iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			   struct iommufd_device *idev, u32 flags,
+			   enum iommu_hwpt_type hwpt_type,
+			   struct iommu_user_data *user_data,
 			   bool immediate_attach);
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt);
 int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
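[Editor's note: a minimal userspace sketch of the existing IOMMU_HWPT_ALLOC call that this patch services. At this point in the series the ioctl still carries no user data; hwpt_type/user_data are only plumbed internally. The struct layout here assumes the merged uAPI and is for illustration.]

#include <sys/ioctl.h>
#include <linux/iommufd.h>

static int alloc_hwpt(int iommufd, __u32 dev_id, __u32 ioas_id, __u32 *hwpt_id)
{
	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.dev_id = dev_id,
		.pt_id = ioas_id,	/* an IOAS for a kernel-managed HWPT */
	};
	int rc = ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd);

	if (rc)
		return rc;
	*hwpt_id = cmd.out_hwpt_id;
	return 0;
}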
From: Nicolin Chen nicolinc@nvidia.com
As one of the previous commits mentioned, a user-managed HWPT will have some different attributes/members. It'd be clearer to have separate allocators. Since the existing iommufd_hw_pagetable_alloc() serves kernel-managed HWPTs well, apply some minimal updates to mark it as a kernel-managed HWPT allocator.
Also, add a pair of function pointers (abort and destroy) in the struct, to separate different cleanup routines. Then rename the existing cleanup functions to iommufd_kernel_managed_hwpt_destroy/abort() linked to the HWPT in the allocator.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/iommu/iommufd/hw_pagetable.c    | 34 ++++++++++++++++++++-----
 drivers/iommu/iommufd/iommufd_private.h |  3 +++
 2 files changed, 31 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 554a9c3d740f..1cc7178121d1 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,7 +8,7 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"

-void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
+static void iommufd_kernel_managed_hwpt_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
@@ -27,7 +27,12 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	refcount_dec(&hwpt->ioas->obj.users);
 }

-void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
+{
+	container_of(obj, struct iommufd_hw_pagetable, obj)->destroy(obj);
+}
+
+static void iommufd_kernel_managed_hwpt_abort(struct iommufd_object *obj)
 {
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
@@ -42,6 +47,11 @@ void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
 	iommufd_hw_pagetable_destroy(obj);
 }

+void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+{
+	container_of(obj, struct iommufd_hw_pagetable, obj)->abort(obj);
+}
+
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 {
 	if (hwpt->enforce_cache_coherency)
@@ -57,7 +67,7 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 }

 /**
- * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
+ * iommufd_hw_pagetable_alloc() - Get a kernel-managed iommu_domain for a device
  * @ictx: iommufd context
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
@@ -66,9 +76,9 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  * @user_data: Optional user_data pointer
  * @immediate_attach: True if idev should be attached to the hwpt
  *
- * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
- * will be linked to the given ioas and upon return the underlying iommu_domain
- * is fully popoulated.
+ * Allocate a new iommu_domain (must be IOMMU_DOMAIN_UNMANAGED) and return it as
+ * a kernel-managed hw_pagetable. The HWPT will be linked to the given ioas and
+ * upon return the underlying iommu_domain is fully popoulated.
  *
  * The caller must hold the ioas->mutex until after
  * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on
@@ -103,6 +113,8 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
+	hwpt->abort = iommufd_kernel_managed_hwpt_abort;
+	hwpt->destroy = iommufd_kernel_managed_hwpt_destroy;

 	if (ops->domain_alloc_user) {
 		hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
@@ -121,6 +133,16 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		}
 	}

+	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+	/* Driver is buggy by mixing user-managed op in kernel-managed ops */
+	if (WARN_ON_ONCE(hwpt->domain->ops->cache_invalidate_user)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+
 	/*
 	 * Set the coherency mode before we do iopt_table_add_domain() as some
 	 * iommus have a per-PTE bit that controls it and need to decide before
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 1d3b1a74e854..3e89c3d530f3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -234,6 +234,9 @@ struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;

+	void (*abort)(struct iommufd_object *obj);
+	void (*destroy)(struct iommufd_object *obj);
+
 	union {
 		struct { /* kernel-managed */
 			struct iommufd_ioas *ioas;
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
-void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) +static void iommufd_kernel_managed_hwpt_destroy(struct iommufd_object *obj)
'_managed_' could be removed. 'kernel_hwpt' and 'user_hwpt' are clear enough.
ditto for all other related functions.
On Thu, Sep 21, 2023 at 12:51:26AM -0700, Yi Liu wrote:
From: Nicolin Chen nicolinc@nvidia.com
As one of the previous commits mentioned, a user-managed HWPT will have some different attributes/members. It'd be clearer to have separate allocators. Since the existing iommufd_hw_pagetable_alloc() serves kernel-managed HWPTs well, apply some minimal updates to mark it as a kernel-managed HWPT allocator.
Also, add a pair of function pointers (abort and destroy) in the struct, to separate different cleanup routines. Then rename the existing cleanup functions to iommufd_kernel_managed_hwpt_destroy/abort() linked to the HWPT in the allocator.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 34 ++++++++++++++++++++----- drivers/iommu/iommufd/iommufd_private.h | 3 +++ 2 files changed, 31 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 554a9c3d740f..1cc7178121d1 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -8,7 +8,7 @@
 #include "../iommu-priv.h"
 #include "iommufd_private.h"

-void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
+static void iommufd_kernel_managed_hwpt_destroy(struct iommufd_object *obj)
 {
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
@@ -27,7 +27,12 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
 	refcount_dec(&hwpt->ioas->obj.users);
 }

-void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
+{
+	container_of(obj, struct iommufd_hw_pagetable, obj)->destroy(obj);
+}
+
+static void iommufd_kernel_managed_hwpt_abort(struct iommufd_object *obj)
 {
 	struct iommufd_hw_pagetable *hwpt =
 		container_of(obj, struct iommufd_hw_pagetable, obj);
@@ -42,6 +47,11 @@ void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
 	iommufd_hw_pagetable_destroy(obj);
 }

+void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+{
+	container_of(obj, struct iommufd_hw_pagetable, obj)->abort(obj);
+}
+
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 {
 	if (hwpt->enforce_cache_coherency)
@@ -57,7 +67,7 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 }

 /**
- * iommufd_hw_pagetable_alloc() - Get an iommu_domain for a device
+ * iommufd_hw_pagetable_alloc() - Get a kernel-managed iommu_domain for a device
  * @ictx: iommufd context
  * @ioas: IOAS to associate the domain with
  * @idev: Device to get an iommu_domain for
@@ -66,9 +76,9 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
  * @user_data: Optional user_data pointer
  * @immediate_attach: True if idev should be attached to the hwpt
  *
- * Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
- * will be linked to the given ioas and upon return the underlying iommu_domain
- * is fully popoulated.
+ * Allocate a new iommu_domain (must be IOMMU_DOMAIN_UNMANAGED) and return it as
+ * a kernel-managed hw_pagetable. The HWPT will be linked to the given ioas and
+ * upon return the underlying iommu_domain is fully popoulated.
  *
  * The caller must hold the ioas->mutex until after
  * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on
@@ -103,6 +113,8 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
+	hwpt->abort = iommufd_kernel_managed_hwpt_abort;
+	hwpt->destroy = iommufd_kernel_managed_hwpt_destroy;

 	if (ops->domain_alloc_user) {
 		hwpt->domain = ops->domain_alloc_user(idev->dev, flags,
@@ -121,6 +133,16 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 		}
 	}

+	if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_UNMANAGED)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+	/* Driver is buggy by mixing user-managed op in kernel-managed ops */
+	if (WARN_ON_ONCE(hwpt->domain->ops->cache_invalidate_user)) {
+		rc = -EINVAL;
+		goto out_abort;
+	}
+
 	/*
 	 * Set the coherency mode before we do iopt_table_add_domain() as some
 	 * iommus have a per-PTE bit that controls it and need to decide before
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 1d3b1a74e854..3e89c3d530f3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -234,6 +234,9 @@ struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;

+	void (*abort)(struct iommufd_object *obj);
+	void (*destroy)(struct iommufd_object *obj);
+
 	union {
 		struct { /* kernel-managed */
 			struct iommufd_ioas *ioas;
I think if you are doing this you are trying too hard to share the struct.. Definitely want to avoid function pointers in general, and function pointers in a writable struct in particular.
An if of some sort in the two functions would be clearer
Jason
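[Editor's note: a sketch of the "if" alternative suggested above. It assumes a type flag like the user_managed bit discussed earlier in the thread; the user-managed helper name is hypothetical.]

void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
{
	struct iommufd_hw_pagetable *hwpt =
		container_of(obj, struct iommufd_hw_pagetable, obj);

	/* No writable function pointers: branch on the HWPT type instead */
	if (hwpt->user_managed)
		iommufd_user_managed_hwpt_destroy(obj);	/* hypothetical */
	else
		iommufd_kernel_managed_hwpt_destroy(obj);
}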
On Tue, Oct 10, 2023 at 03:49:32PM -0300, Jason Gunthorpe wrote:
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 1d3b1a74e854..3e89c3d530f3 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -234,6 +234,9 @@ struct iommufd_hw_pagetable {
 	struct iommufd_object obj;
 	struct iommu_domain *domain;

+	void (*abort)(struct iommufd_object *obj);
+	void (*destroy)(struct iommufd_object *obj);
+
 	union {
 		struct { /* kernel-managed */
 			struct iommufd_ioas *ioas;
I think if you are doing this you are trying too hard to share the struct.. Definitely want to avoid function pointers in general, and function pointers in a writable struct in particular.
I looked at this for a while and I do still have the feeling that probably two structs and even two type IDs is probably a cleaner design.
Like this:
// Or maybe use obj.type ?
enum iommufd_hw_pagetable_type {
	IOMMUFD_HWPT_PAGING,
	IOMMUFD_HWPT_NESTED,
};

struct iommufd_hw_pagetable_common {
	struct iommufd_object obj;
	struct iommu_domain *domain;
	enum iommufd_hw_pagetable_type type;
};

struct iommufd_hw_pagetable {
	struct iommufd_hw_pagetable_common common;
	struct iommufd_ioas *ioas;
	bool auto_domain : 1;
	bool enforce_cache_coherency : 1;
	bool msi_cookie : 1;
	/* Head at iommufd_ioas::hwpt_list */
	struct list_head hwpt_item;
};

struct iommufd_hw_pagetable_nested {
	struct iommufd_hw_pagetable_common common;
	// ??
};
I poked at it in an editor for a bit and it was looking OK, but it requires breaking up a bunch of functions, and then I ran out of time.
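[Editor's note: hypothetical accessors, assuming the two-struct layout sketched above; these names do not appear in the thread.]

static inline struct iommufd_hw_pagetable *
to_hwpt_paging(struct iommufd_hw_pagetable_common *common)
{
	WARN_ON(common->type != IOMMUFD_HWPT_PAGING);
	return container_of(common, struct iommufd_hw_pagetable, common);
}

static inline struct iommufd_hw_pagetable_nested *
to_hwpt_nested(struct iommufd_hw_pagetable_common *common)
{
	WARN_ON(common->type != IOMMUFD_HWPT_NESTED);
	return container_of(common, struct iommufd_hw_pagetable_nested, common);
}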
Also, we probably should feed enforce_cache_coherency through the alloc_hwpt uapi and not try to autodetect it..
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 13, 2023 3:10 AM
Also, we probably should feed enforce_cache_coherency through the alloc_hwpt uapi and not try to autodetect it..
In the past we had a long discussion about this with the conclusion that user opt is the ideal model but it's fine to stay with autodetect and existing vfio/kvm contract for coherency detection until we see a real demand for user opt.
Is there anything new which changes your mind to have user opt now?
On Fri, Oct 13, 2023 at 07:13:34AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 13, 2023 3:10 AM
Also, we probably should feed enforce_cache_coherency through the alloc_hwpt uapi and not try to autodetect it..
In the past we had a long discussion about this with the conclusion that user opt is the ideal model but it's fine to stay with autodetect and existing vfio/kvm contract for coherency detection until we see a real demand for user opt.
Is there anything new which changes your mind to have user opt now?
I guess, I was just looking at the complexity it brings to keep that working.
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 13, 2023 10:06 PM
On Fri, Oct 13, 2023 at 07:13:34AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 13, 2023 3:10 AM
Also, we probably should feed enforce_cache_coherency through the alloc_hwpt uapi and not try to autodetect it..
In the past we had a long discussion about this with the conclusion that user opt is the ideal model but it's fine to stay with autodetect and existing vfio/kvm contract for coherency detection until we see a real demand for user opt.
Is there anything new which changes your mind to have user opt now?
I guess, I was just looking at the complexity it brings to keep that working.
vfio_file_enforced_coherent() currently just looks at device_iommu_capable() and can be called before attach, assuming an autodetect model.
moving away from autodetect would need a new contract which I hesitate to bother with at this point.
On Mon, Oct 16, 2023 at 08:26:25AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 13, 2023 10:06 PM
On Fri, Oct 13, 2023 at 07:13:34AM +0000, Tian, Kevin wrote:
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 13, 2023 3:10 AM
Also, we probably should feed enforce_cache_coherency through the alloc_hwpt uapi and not try to autodetect it..
In the past we had a long discussion about this with the conclusion that user opt is the ideal model but it's fine to stay with autodetect and existing vfio/kvm contract for coherency detection until we see a real demand for user opt.
Is there anything new which changes your mind to have user opt now?
I guess, I was just looking at the complexity it brings to keep that working.
vfio_file_enforced_coherent() currently just looks at device_iommu_capable() and can be called before attach, assuming an autodetect model.
moving away from autodetect would need a new contract which I hesitate to bother with at this point.
I think that is fine, we are not enabling new behaviors where qemu can directly control the wbinvd support. If we want to do that it can be done more directly.
If the vmm sees a coherent device and does not turn on enforced coherency then it will get a broken wbinvd and be sad. Since this is all in the userspace vt-d iommu driver, it can just do the right thing anyhow.
Jason
From: Nicolin Chen nicolinc@nvidia.com
This allows iommufd_hwpt_alloc() to have a common routine but jump to a different allocator and hold a different mutex, corresponding to the type of HWPT allocation (either kernel-managed or user-managed). This shared function pointer takes "pt_obj" as an input that would be converted into an IOAS pointer or a parent HWPT pointer.
Then, update the kernel-managed allocator to follow this pt_obj change.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com
Signed-off-by: Yi Liu yi.l.liu@intel.com
---
 drivers/iommu/iommufd/device.c          |  2 +-
 drivers/iommu/iommufd/hw_pagetable.c    | 46 ++++++++++++++++++-------
 drivers/iommu/iommufd/iommufd_private.h |  3 +-
 3 files changed, 37 insertions(+), 14 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index e04900f101f1..eb120f70a3e3 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -539,7 +539,7 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev, goto out_unlock; }
- hwpt = iommufd_hw_pagetable_alloc(idev->ictx, ioas, idev, + hwpt = iommufd_hw_pagetable_alloc(idev->ictx, &ioas->obj, idev, 0, IOMMU_HWPT_TYPE_DEFAULT, NULL, immediate_attach); if (IS_ERR(hwpt)) { diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 1cc7178121d1..b2af68776877 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -69,7 +69,7 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) /** * iommufd_hw_pagetable_alloc() - Get a kernel-managed iommu_domain for a device * @ictx: iommufd context - * @ioas: IOAS to associate the domain with + * @pt_obj: An object to an IOAS to associate the domain with * @idev: Device to get an iommu_domain for * @flags: Flags from userspace * @hwpt_type: Requested type of hw_pagetable @@ -85,12 +85,15 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) * the returned hwpt. */ struct iommufd_hw_pagetable * -iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, +iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, + struct iommufd_object *pt_obj, struct iommufd_device *idev, u32 flags, enum iommu_hwpt_type hwpt_type, struct iommu_user_data *user_data, bool immediate_attach) { + struct iommufd_ioas *ioas = + container_of(pt_obj, struct iommufd_ioas, obj); const struct iommu_ops *ops = dev_iommu_ops(idev->dev); struct iommufd_hw_pagetable *hwpt; int rc; @@ -184,10 +187,19 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) { + struct iommufd_hw_pagetable *(*alloc_fn)( + struct iommufd_ctx *ictx, + struct iommufd_object *pt_obj, + struct iommufd_device *idev, + u32 flags, enum iommu_hwpt_type type, + struct iommu_user_data *user_data, + bool flag); struct iommu_hwpt_alloc *cmd = ucmd->cmd; struct iommufd_hw_pagetable *hwpt; + struct iommufd_object *pt_obj; struct iommufd_device *idev; struct iommufd_ioas *ioas; + struct mutex *mutex; int rc;
if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved) @@ -197,17 +209,26 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) if (IS_ERR(idev)) return PTR_ERR(idev);
- ioas = iommufd_get_ioas(ucmd->ictx, cmd->pt_id); - if (IS_ERR(ioas)) { - rc = PTR_ERR(ioas); + pt_obj = iommufd_get_object(ucmd->ictx, cmd->pt_id, IOMMUFD_OBJ_ANY); + if (IS_ERR(pt_obj)) { + rc = -EINVAL; goto out_put_idev; }
- mutex_lock(&ioas->mutex); - hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, - idev, cmd->flags, - IOMMU_HWPT_TYPE_DEFAULT, - NULL, false); + switch (pt_obj->type) { + case IOMMUFD_OBJ_IOAS: + ioas = container_of(pt_obj, struct iommufd_ioas, obj); + mutex = &ioas->mutex; + alloc_fn = iommufd_hw_pagetable_alloc; + break; + default: + rc = -EINVAL; + goto out_put_pt; + } + + mutex_lock(mutex); + hwpt = alloc_fn(ucmd->ictx, pt_obj, idev, cmd->flags, + IOMMU_HWPT_TYPE_DEFAULT, NULL, false); if (IS_ERR(hwpt)) { rc = PTR_ERR(hwpt); goto out_unlock; @@ -223,8 +244,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) out_hwpt: iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj); out_unlock: - mutex_unlock(&ioas->mutex); - iommufd_put_object(&ioas->obj); + mutex_unlock(mutex); +out_put_pt: + iommufd_put_object(pt_obj); out_put_idev: iommufd_put_object(&idev->obj); return rc; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 3e89c3d530f3..e4d06ae6b0c5 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -250,7 +250,8 @@ struct iommufd_hw_pagetable { };
struct iommufd_hw_pagetable * -iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas, +iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, + struct iommufd_object *pt_obj, struct iommufd_device *idev, u32 flags, enum iommu_hwpt_type hwpt_type, struct iommu_user_data *user_data,
From: Nicolin Chen nicolinc@nvidia.com
Add a parent hw_pagetable pointer for user-managed hw_pagetables. Similar to the ioas->mutex, add another mutex in the kernel-managed hw_pagetable to serialize user-managed hw_pagetable allocations against it. Then, also add a user_managed flag in the struct to ease identifying a HWPT.
Also, add a new allocator iommufd_user_managed_hwpt_alloc() and a pair of cleanup functions, iommufd_user_managed_hwpt_destroy/abort().
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 112 +++++++++++++++++++++++- drivers/iommu/iommufd/iommufd_private.h | 6 ++ 2 files changed, 117 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index b2af68776877..dc3e11a23acf 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -8,6 +8,17 @@ #include "../iommu-priv.h" #include "iommufd_private.h"
+static void iommufd_user_managed_hwpt_destroy(struct iommufd_object *obj) +{ + struct iommufd_hw_pagetable *hwpt = + container_of(obj, struct iommufd_hw_pagetable, obj); + + if (hwpt->domain) + iommu_domain_free(hwpt->domain); + + refcount_dec(&hwpt->parent->obj.users); +} + static void iommufd_kernel_managed_hwpt_destroy(struct iommufd_object *obj) { struct iommufd_hw_pagetable *hwpt = @@ -32,6 +43,17 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj) container_of(obj, struct iommufd_hw_pagetable, obj)->destroy(obj); }
+static void iommufd_user_managed_hwpt_abort(struct iommufd_object *obj) +{ + struct iommufd_hw_pagetable *hwpt = + container_of(obj, struct iommufd_hw_pagetable, obj); + + /* The parent->mutex must be held until finalize is called. */ + lockdep_assert_held(&hwpt->parent->mutex); + + iommufd_hw_pagetable_destroy(obj); +} + static void iommufd_kernel_managed_hwpt_abort(struct iommufd_object *obj) { struct iommufd_hw_pagetable *hwpt = @@ -52,6 +74,82 @@ void iommufd_hw_pagetable_abort(struct iommufd_object *obj) container_of(obj, struct iommufd_hw_pagetable, obj)->abort(obj); }
+/** + * iommufd_user_managed_hwpt_alloc() - Get a user-managed hw_pagetable + * @ictx: iommufd context + * @pt_obj: Parent object to an HWPT to associate the domain with + * @idev: Device to get an iommu_domain for + * @flags: Flags from userspace + * @hwpt_type: Requested type of hw_pagetable + * @user_data: user_data pointer + * @dummy: never used + * + * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and return it as + * a user-managed hw_pagetable. + */ +static struct iommufd_hw_pagetable * +iommufd_user_managed_hwpt_alloc(struct iommufd_ctx *ictx, + struct iommufd_object *pt_obj, + struct iommufd_device *idev, + u32 flags, + enum iommu_hwpt_type hwpt_type, + struct iommu_user_data *user_data, + bool dummy) +{ + struct iommufd_hw_pagetable *parent = + container_of(pt_obj, struct iommufd_hw_pagetable, obj); + const struct iommu_ops *ops = dev_iommu_ops(idev->dev); + struct iommufd_hw_pagetable *hwpt; + int rc; + + if (!user_data) + return ERR_PTR(-EINVAL); + if (parent->auto_domain) + return ERR_PTR(-EINVAL); + if (!parent->nest_parent) + return ERR_PTR(-EINVAL); + if (hwpt_type == IOMMU_HWPT_TYPE_DEFAULT) + return ERR_PTR(-EINVAL); + + if (!ops->domain_alloc_user) + return ERR_PTR(-EOPNOTSUPP); + + lockdep_assert_held(&parent->mutex); + + hwpt = iommufd_object_alloc(ictx, hwpt, IOMMUFD_OBJ_HW_PAGETABLE); + if (IS_ERR(hwpt)) + return hwpt; + + refcount_inc(&parent->obj.users); + hwpt->parent = parent; + hwpt->user_managed = true; + hwpt->abort = iommufd_user_managed_hwpt_abort; + hwpt->destroy = iommufd_user_managed_hwpt_destroy; + + hwpt->domain = ops->domain_alloc_user(idev->dev, flags, hwpt_type, + parent->domain, user_data); + if (IS_ERR(hwpt->domain)) { + rc = PTR_ERR(hwpt->domain); + hwpt->domain = NULL; + goto out_abort; + } + + if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) { + rc = -EINVAL; + goto out_abort; + } + /* Driver is buggy by missing cache_invalidate_user in domain_ops */ + if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) { + rc = -EINVAL; + goto out_abort; + } + return hwpt; + +out_abort: + iommufd_object_abort_and_destroy(ictx, &hwpt->obj); + return ERR_PTR(rc); +} + int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt) { if (hwpt->enforce_cache_coherency) @@ -112,10 +210,12 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, if (IS_ERR(hwpt)) return hwpt;
+	mutex_init(&hwpt->mutex);
 	INIT_LIST_HEAD(&hwpt->hwpt_item);
 	/* Pairs with iommufd_hw_pagetable_destroy() */
 	refcount_inc(&ioas->obj.users);
 	hwpt->ioas = ioas;
+	hwpt->nest_parent = flags & IOMMU_HWPT_ALLOC_NEST_PARENT;
 	hwpt->abort = iommufd_kernel_managed_hwpt_abort;
 	hwpt->destroy = iommufd_kernel_managed_hwpt_destroy;
@@ -194,8 +294,8 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) u32 flags, enum iommu_hwpt_type type, struct iommu_user_data *user_data, bool flag); + struct iommufd_hw_pagetable *hwpt, *parent; struct iommu_hwpt_alloc *cmd = ucmd->cmd; - struct iommufd_hw_pagetable *hwpt; struct iommufd_object *pt_obj; struct iommufd_device *idev; struct iommufd_ioas *ioas; @@ -221,6 +321,16 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) mutex = &ioas->mutex; alloc_fn = iommufd_hw_pagetable_alloc; break; + case IOMMUFD_OBJ_HW_PAGETABLE: + parent = container_of(pt_obj, struct iommufd_hw_pagetable, obj); + /* No user-managed HWPT on top of an user-managed one */ + if (parent->user_managed) { + rc = -EINVAL; + goto out_put_pt; + } + mutex = &parent->mutex; + alloc_fn = iommufd_user_managed_hwpt_alloc; + break; default: rc = -EINVAL; goto out_put_pt; diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index e4d06ae6b0c5..34940596c2c2 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -237,12 +237,18 @@ struct iommufd_hw_pagetable { void (*abort)(struct iommufd_object *obj); void (*destroy)(struct iommufd_object *obj);
+	bool user_managed : 1;
 	union {
+		struct { /* user-managed */
+			struct iommufd_hw_pagetable *parent;
+		};
 		struct { /* kernel-managed */
 			struct iommufd_ioas *ioas;
+			struct mutex mutex;
 			bool auto_domain : 1;
 			bool enforce_cache_coherency : 1;
 			bool msi_cookie : 1;
+			bool nest_parent : 1;
 			/* Head at iommufd_ioas::hwpt_list */
 			struct list_head hwpt_item;
 		};
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
+static void iommufd_user_managed_hwpt_abort(struct iommufd_object *obj)
+{
+	struct iommufd_hw_pagetable *hwpt =
+		container_of(obj, struct iommufd_hw_pagetable, obj);
+
+	/* The parent->mutex must be held until finalize is called. */
+	lockdep_assert_held(&hwpt->parent->mutex);
+
+	iommufd_hw_pagetable_destroy(obj);
+}
Can you elaborate what exactly is protected by parent->mutex?
My gut-feeling is that the child just grabs a refcnt on the parent object. It doesn't change any other data of the parent.
+/**
+ * iommufd_user_managed_hwpt_alloc() - Get a user-managed hw_pagetable
+ * @ictx: iommufd context
+ * @pt_obj: Parent object to an HWPT to associate the domain with
+ * @idev: Device to get an iommu_domain for
+ * @flags: Flags from userspace
+ * @hwpt_type: Requested type of hw_pagetable
+ * @user_data: user_data pointer
+ * @dummy: never used
+ *
+ * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and return it as
+ * a user-managed hw_pagetable.
+ */
Add some text to highlight the requirements for being a parent, e.g. not an auto domain, must be capable of being a parent, etc.
+	case IOMMUFD_OBJ_HW_PAGETABLE:
+		parent = container_of(pt_obj, struct iommufd_hw_pagetable,
+				      obj);
+		/* No user-managed HWPT on top of an user-managed one */
+		if (parent->user_managed) {
+			rc = -EINVAL;
+			goto out_put_pt;
+		}
move to alloc_fn().
On Tue, Sep 26, 2023 at 01:14:47AM -0700, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
+static void iommufd_user_managed_hwpt_abort(struct iommufd_object *obj)
+{
+	struct iommufd_hw_pagetable *hwpt =
+		container_of(obj, struct iommufd_hw_pagetable, obj);
+
+	/* The parent->mutex must be held until finalize is called. */
+	lockdep_assert_held(&hwpt->parent->mutex);
+
+	iommufd_hw_pagetable_destroy(obj);
+}
Can you elaborate what exactly is protected by parent->mutex?
My gut-feeling that the child just grabs a refcnt on the parent object. It doesn't change any other data of the parent.
Ah, you are right. It's added here just for symmetry so we wouldn't end up with something like:

	if (!hwpt->user_managed)
		mutex_lock(&hwpt->mutex);
	alloc_fn();
	if (!hwpt->user_managed)
		mutex_unlock(&hwpt->mutex);
Perhaps I should move the pair of mutex calls to the kernel-managed hwpt allocator.
+/**
+ * iommufd_user_managed_hwpt_alloc() - Get a user-managed hw_pagetable
+ * @ictx: iommufd context
+ * @pt_obj: Parent object to an HWPT to associate the domain with
+ * @idev: Device to get an iommu_domain for
+ * @flags: Flags from userspace
+ * @hwpt_type: Requested type of hw_pagetable
+ * @user_data: user_data pointer
+ * @dummy: never used
+ *
+ * Allocate a new iommu_domain (must be IOMMU_DOMAIN_NESTED) and return it as
+ * a user-managed hw_pagetable.
+ */
Add some text to highlight the requirement being a parent, e.g. not an auto domain, must be capable of being a parent, etc.
OK.
	case IOMMUFD_OBJ_HW_PAGETABLE:
		parent = container_of(pt_obj, struct iommufd_hw_pagetable,
				      obj);
		/* No user-managed HWPT on top of an user-managed one */
		if (parent->user_managed) {
			rc = -EINVAL;
			goto out_put_pt;
		}
move to alloc_fn().
Though it's a bit covert, this is actually to avoid a data buffer allocation in the common pathway before calling alloc_fn(), which is added in the following patch. And the reason why it's in the common function is that we previously supported a kernel-managed hwpt allocation with data too.

But now, I think we can just move this sanity check and the data allocation together into the user-managed hwpt allocator.
Thanks Nicolin
From: Nicolin Chen nicolinc@nvidia.com
Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt things. So, they should only be set up on kernel-managed domains. If the attaching domain is a user-managed domain, redirect the hwpt to hwpt->parent to do it correctly.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Co-developed-by: Yi Liu yi.l.liu@intel.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 4 ++++ drivers/iommu/iommufd/hw_pagetable.c | 4 ++++ 2 files changed, 8 insertions(+)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index eb120f70a3e3..104dd061a2a3 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -305,12 +305,16 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup,
 	 * domain after request_irq(). If it is not done interrupts will not
 	 * work on this domain.
 	 *
+	 * Note: always set up a msi_cookie on a kernel-managed hw_pagetable.
+	 *
 	 * FIXME: This is conceptually broken for iommufd since we want to allow
 	 * userspace to change the domains, eg switch from an identity IOAS to a
 	 * DMA IOAS. There is currently no way to create a MSI window that
 	 * matches what the IRQ layer actually expects in a newly created
 	 * domain.
 	 */
+	if (hwpt->user_managed)
+		hwpt = hwpt->parent;
 	if (sw_msi_start != PHYS_ADDR_MAX && !hwpt->msi_cookie) {
 		rc = iommu_get_msi_cookie(hwpt->domain, sw_msi_start);
 		if (rc)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index dc3e11a23acf..90fd65859e28 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -152,6 +152,10 @@ iommufd_user_managed_hwpt_alloc(struct iommufd_ctx *ictx,
 int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
 {
+	/* Always enforce cache coherency on a kernel-managed hw_pagetable */
+	if (hwpt->user_managed)
+		hwpt = hwpt->parent;
+
 	if (hwpt->enforce_cache_coherency)
 		return 0;
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
From: Nicolin Chen nicolinc@nvidia.com
Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt things. So, they should only be set up on kernel-managed domains. If the attaching domain is a user-managed domain, redirect the hwpt to hwpt->parent to do it correctly.
No redirection. The parent should already have the configuration done when it's created. It shouldn't be triggered in the nesting path.
On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
From: Nicolin Chen nicolinc@nvidia.com
Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt things. So, they should only be set up on kernel-managed domains. If the attaching domain is a user-managed domain, redirect the hwpt to hwpt->parent to do it correctly.
No redirection. The parent should already have the configuration done when it's created. It shouldn't be triggered in the nesting path.
iommufd_hw_pagetable_enforce_cc() is not only called in alloc(), but also in hwpt_attach/replace() if cc was not enforced by the alloc(), because the idev that initiates the hwpt_alloc() might not have idev->enforce_cache_coherency. Only when another idev that has idev->enforce_cache_coherency attaches to the shared hwpt would the cc configuration be done.
In a nesting case encountering the same situation above, the S2 hwpt is allocated without the iommufd_hw_pagetable_enforce_cc(). But the 2nd idev that has idev->enforce_cache_coherency might attach directly to the S1 domain/hwpt without touching the S2 domain (for the same VM, S2 domain can be shared). In this case, without a redirection, the iommufd_hw_pagetable_enforce_cc() would be missed.
Any thought?
Nic
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 14, 2023 8:45 AM
On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
From: Nicolin Chen nicolinc@nvidia.com
Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt things. So, they should only be set up on kernel-managed domains. If the attaching domain is a user-managed domain, redirect the hwpt to hwpt->parent to do it correctly.
No redirection. The parent should already have the configuration done when it's created. It shouldn't be triggered in the nesting path.
iommufd_hw_pagetable_enforce_cc() is not only called in alloc(), but also in hwpt_attach/replace() if cc is not enforced by the alloc() because the idev that initiates the hwpt_alloc() might not have idev->enforce_cache_coherency. Only when another idev that has idev->enforce_cache_coherency attaches to the shared hwpt, the cc configuration would be done.
Is this a bug already? If the 1st device doesn't have enforce_cc in its iommu, setting the snp bit in the hwpt would lead to a reserved bit violation.
Another problem is that intel_iommu_enforce_cache_coherency() doesn't update existing entries. It only sets a domain flag to affect future mappings, so the 2nd idev is also broken.
The simplest option is to follow vfio type1 i.e. don't mix devices with different enforce_cc in one domain.
On Mon, Oct 16, 2023 at 08:48:03AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 14, 2023 8:45 AM
On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
From: Nicolin Chen nicolinc@nvidia.com
Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt things. So, they should only be set up on kernel-managed domains. If the attaching domain is a user-managed domain, redirect the hwpt to hwpt->parent to do it correctly.
No redirection. The parent should already have the configuration done when it's created. It shouldn't be triggered in the nesting path.
iommufd_hw_pagetable_enforce_cc() is not only called in alloc(), but also in hwpt_attach/replace() if cc is not enforced by the alloc() because the idev that initiates the hwpt_alloc() might not have idev->enforce_cache_coherency. Only when another idev that has idev->enforce_cache_coherency attaches to the shared hwpt, the cc configuration would be done.
is this a bug already? If the 1st device doesn't have enforce_cc in its iommu, setting the snp bit in the hwpt would lead to reserved bit violation.
I suspect there are technically some gaps in the intel driver, yes..
another problem is that intel_iommu_enforce_cache_coherency() doesn't update existing entries. It only sets a domain flag to affect future mappings. so it means the 2nd idev is also broken.
This is such a gap, intel driver should not permit that.
The simplest option is to follow vfio type1 i.e. don't mix devices with different enforce_cc in one domain.
This is why I wanted to get rid of this bad mechanism going forward.
Manually created hwpt should have a manual specification of cc and then we don't have so many problems.
It means userspace needs to compute if they want to use CC or not, but userspace already needs to figure this out since without autodomains it must create two hwpts manually anyhow.
Jason
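For illustration, such a user opt could look like the sketch below; IOMMU_HWPT_ALLOC_ENFORCE_CC is a hypothetical flag, not something this series defines, and dev_id/pt_id/iommufd are assumed to be set up elsewhere:

	/* Hypothetical flag value, for illustration only */
	#define IOMMU_HWPT_ALLOC_ENFORCE_CC	(1 << 1)

	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		/* userspace decides CC instead of the kernel autodetecting it */
		.flags = IOMMU_HWPT_ALLOC_ENFORCE_CC,
		.dev_id = dev_id,
		.pt_id = ioas_id,
	};

	rc = ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd);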
From: Jason Gunthorpe jgg@nvidia.com Sent: Monday, October 16, 2023 7:58 PM
On Mon, Oct 16, 2023 at 08:48:03AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Saturday, October 14, 2023 8:45 AM
On Tue, Sep 26, 2023 at 01:16:35AM -0700, Tian, Kevin wrote:
From: Liu, Yi L yi.l.liu@intel.com Sent: Thursday, September 21, 2023 3:51 PM
From: Nicolin Chen nicolinc@nvidia.com
Now enforce_cache_coherency and msi_cookie are kernel-managed hwpt things. So, they should only be set up on kernel-managed domains. If the attaching domain is a user-managed domain, redirect the hwpt to hwpt->parent to do it correctly.
No redirection. The parent should already have the configuration done when it's created. It shouldn't be triggered in the nesting path.
iommufd_hw_pagetable_enforce_cc() is not only called in alloc(), but also in hwpt_attach/replace() if cc is not enforced by the alloc() because the idev that initiates the hwpt_alloc() might not have idev->enforce_cache_coherency. Only when another idev that has idev->enforce_cache_coherency attaches to the shared hwpt, the cc configuration would be done.
is this a bug already? If the 1st device doesn't have enforce_cc in its iommu, setting the snp bit in the hwpt would lead to reserved bit violation.
I suspect there are technically some gaps in the intel driver, yes..
Double checked. The intel driver is doing the right thing now:

intel_iommu_enforce_cache_coherency()
  domain_support_force_snooping()

static bool domain_support_force_snooping(struct dmar_domain *domain)
{
	struct device_domain_info *info;
	bool support = true;

	assert_spin_locked(&domain->lock);
	list_for_each_entry(info, &domain->devices, link) {
		if (!ecap_sc_support(info->iommu->ecap)) {
			support = false;
			break;
		}
	}

	return support;
}
another problem is that intel_iommu_enforce_cache_coherency() doesn't update existing entries. It only sets a domain flag to affect future mappings. so it means the 2nd idev is also broken.
This is such a gap, intel driver should not permit that.
yes. @Baolu, can you add a fix?
The simplest option is to follow vfio type1 i.e. don't mix devices with different enforce_cc in one domain.
This is why I wanted to get rid of this bad mechanism going forward.
Manually created hwpt should have a manual specification of cc and then we don't have so many problems.
It means userspace needs to compute if they want to use CC or not, but userspace already needs to figure this out since without autodomains it must create two hwpts manually anyhow.
Now there is no interface reporting enforce_cc to userspace.
Actually the user doesn't need to know enforce_cc. As long as there is an attach incompatibility error, the user has to create a 2nd hwpt for the to-be-attached device, and then enforce_cc will be handled automatically in hwpt_alloc.
I prefer removing enforce_cc in the attach fn completely, then no parent trick in this patch. Just keep it in hwpt_alloc and leave it to the iommu driver to figure out the attaching compatibility (see the sketch after this list):

- attaching an idev with matching enforce_cc as the hwpt is always allowed.
- attaching an idev w/o enforce_cc to a hwpt with enforce_cc is rejected.
- attaching an idev w/ enforce_cc to a hwpt w/o enforce_cc succeeds, with the hwpt continuing to be w/o enforce_cc. No promotion.

The above has already been guaranteed by the intel iommu driver.
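To make these rules concrete, below is a minimal sketch (not from the series; the helper name is invented for illustration) of the compatibility check they imply:

	/*
	 * Hypothetical helper encoding the three rules above: the only
	 * rejected combination is a non-CC idev attaching to a hwpt with
	 * enforce_cc. A CC-capable idev may still use a non-CC hwpt, with
	 * no promotion of the hwpt.
	 */
	static bool cc_attach_compatible(bool idev_enforce_cc, bool hwpt_enforce_cc)
	{
		return !(hwpt_enforce_cc && !idev_enforce_cc);
	}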
On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
This is why I wanted to get rid of this bad mechanism going forward.
Manually created hwpt should have a manual specification of cc and then we don't have so many problems.
It means userspace needs to compute if they want to use CC or not, but userspace already needs to figure this out since without autodomains it must create two hwpts manually anyhow.
Now there is no interface reporting enforce_cc to userspace.
Yes..
Actually the user doesn't need to know enforce_cc. As long as there is an attach incompatibility error the user has to create a 2nd hwpt for the to-be-attached device then enforce_cc will be handled automatically in hwpt_alloc.
Uh, OK we can do that.. It is a bit hacky because we don't know the order do
I prefer to removing enforce_cc in attach fn completely then no parent trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to figure out the attaching compatibility:
You are basically saying to set the cc mode during creation because we know the idev at that moment and can tell if it should be on/off?
It seems reasonable, but I can't remember why it is in the attach path at the moment.
Jason
On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
I prefer to removing enforce_cc in attach fn completely then no parent trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to figure out the attaching compatibility:
You are basically saying to set the cc mode during creation because we know the idev at that moment and can tell if it should be on/off?
It seems reasonable, but I can't remember why it is in the attach path at the moment.
This was the commit adding it in the alloc path: https://lore.kernel.org/linux-iommu/8e897628-61fa-b3fb-b609-44eeda11b45e@arm...
The older code was doing a hwpt "upgrade" from !cc to cc:

-	/*
-	 * Try to upgrade the domain we have, it is an iommu driver bug to
-	 * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
-	 * enforce_cache_coherency when there are no devices attached to the
-	 * domain.
-	 */
-	if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
-		if (hwpt->domain->ops->enforce_cache_coherency)
-			hwpt->enforce_cache_coherency =
-				hwpt->domain->ops->enforce_cache_coherency(
-					hwpt->domain);
If we remove the enforce_cc call in the attach path and let the driver decide whether to enforce or reject in attach_dev calls, there seems to be no point in tracking an enforce_cache_coherency flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU extension check in the vfio-compat pathway?
Thanks Nic
On Tue, Oct 17, 2023 at 12:58:39PM -0700, Nicolin Chen wrote:
On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
I prefer to removing enforce_cc in attach fn completely then no parent trick in this patch. Just keep it in hwpt_alloc and leave to iommu driver to figure out the attaching compatibility:
You are basically saying to set the cc mode during creation because we know the idev at that moment and can tell if it should be on/off?
It seems reasonable, but I can't remember why it is in the attach path at the moment.
This was the commit adding it in the alloc path: https://lore.kernel.org/linux-iommu/8e897628-61fa-b3fb-b609-44eeda11b45e@arm...
The older code was doing a hwpt "upgrade" from !cc to cc:
	/*
	 * Try to upgrade the domain we have, it is an iommu driver bug to
	 * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
	 * enforce_cache_coherency when there are no devices attached to the
	 * domain.
	 */
	if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
		if (hwpt->domain->ops->enforce_cache_coherency)
			hwpt->enforce_cache_coherency =
				hwpt->domain->ops->enforce_cache_coherency(
					hwpt->domain);
If we remove the enforce_cc call in the attach path and let the driver decide whether to enforce or reject in attach_dev calls, there seems to be no point in tracking an enforce_cache_coherency flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU extension check in the vfio-compat pathway?
I think the purpose of this code is to try to minimize the number of domains.
Otherwise we have a problem where the order devices are attached to the domain decides how many domains you get. ie the first device attached does not want CC (but is compatible with it) so we create a non-CC domain
Then later we attach a device that does want CC and now we are forced to create a second iommu domain when upgrading the first domain would have been fine.
Hotplug is another scenario (though Intel driver does not support it, and it looks broken)
Really, I hate this CC mechanism. It is only for Intel, can we just punt it to userspace and have an intel 'want cc flag' for the entire nesting path and forget about it?
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Thursday, October 19, 2023 12:51 AM
On Tue, Oct 17, 2023 at 12:58:39PM -0700, Nicolin Chen wrote:
On Tue, Oct 17, 2023 at 12:53:01PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 17, 2023 at 08:52:49AM +0000, Tian, Kevin wrote:
I prefer removing enforce_cc in the attach fn completely, then no parent trick in this patch. Just keep it in hwpt_alloc and leave it to the iommu driver to figure out the attaching compatibility:
You are basically saying to set the cc mode during creation because we know the idev at that moment and can tell if it should be on/off?
It seems reasonable, but I can't remember why it is in the attach path at the moment.
This was the commit adding it in the alloc path: https://lore.kernel.org/linux-iommu/8e897628-61fa-b3fb-b609-44eeda11b45e@arm.com/
The older code was doing a hwpt "upgrade" from !cc to cc:
	/*
	 * Try to upgrade the domain we have, it is an iommu driver bug to
	 * report IOMMU_CAP_ENFORCE_CACHE_COHERENCY but fail
	 * enforce_cache_coherency when there are no devices attached to the
	 * domain.
	 */
	if (idev->enforce_cache_coherency && !hwpt->enforce_cache_coherency) {
		if (hwpt->domain->ops->enforce_cache_coherency)
			hwpt->enforce_cache_coherency =
				hwpt->domain->ops->enforce_cache_coherency(
					hwpt->domain);
If we remove the enforce_cc call in the attach path and let the driver decide whether to enforce or reject in attach_dev calls, there seems to be no point in tracking an enforce_cache_coherency flag in the IOMMUFD pathway but only for the VFIO_DMA_CC_IOMMU extension check in the vfio-compat pathway?
I think the purpose of this code is to try to minimize the number of domains.
Otherwise we have a problem where the order devices are attached to the domain decides how many domains you get. ie the first device attached does not want CC (but is compatible with it) so we create a non-CC domain
in autodetect model this won't happen. If the first device is capable of enforce_cc then the domain will be created with enforce_cc.
there is no "does not want CC" input in autodetect.
Then later we attach a device that does want CC and now we are forced to create a second iommu domain when upgrading the first domain would have been fine.
then in this case the 2nd device will reuse the domain.
Hotplug is another scenario (though Intel driver does not support it, and it looks broken)
Can you elaborate on how hotplug is broken? If a device is hotplugged and fails to attach to an existing domain, then a new one will be created for it.
there is indeed a broken case in KVM which cannot handle dynamic change of coherency due to hotplug. But that one is orthogonal to what we discussed here about how to decide cc in iommufd.
Really, I hate this CC mechanism. It is only for Intel, can we just
Intel and AMD.
punt it to userspace and have an intel 'want cc flag' for the entire nesting path and forget about it?
I dislike it too. But I still don't get your point about why adding such a flag can really simplify things. As explained before, the only difference between autodetect and having a user flag is the decision of cc when creating a hwpt. Whether we should upgrade in the attach path is an orthogonal open, which imho is unnecessary, and Nicolin's patches to remove that check and then also remove this patch make a lot of sense to me.
Then the necessity of introducing such a flag (plus adding a new interface to report cc to user space) is not obvious.
On Thu, Oct 19, 2023 at 01:56:01AM +0000, Tian, Kevin wrote:
Otherwise we have a problem where the order devices are attached to the domain decides how many domains you get. ie the first device attached does not want CC (but is compatible with it) so we create a non-CC domain
in autodetect model this won't happen. If the first device is capable of enforce_cc then the domain will be created with enforce_cc.
there is no "does not want CC" input in autodetect.
Then later we attach a device that does want CC and now we are forced to create a second iommu domain when upgrading the first domain would have been fine.
then in this case the 2nd device will reuse the domain.
Then you have the reverse problem where the domain will not be CC when it should be.
Hotplug is another scenario (though Intel driver does not support it, and it looks broken)
Can you elaborate how hotplug is broken? If device is hotplugged and failed to attach to an existing domain, then a new one will be created for it.
A non-cc domain will ask to be upgraded and the driver will let it happen even though it doesn't/can't fix the existing IOPTEs
there is indeed a broken case in KVM which cannot handle dynamic change of coherency due to hotplug. But that one is orthogonal to what we discussed here about how to decide cc in iommufd.
That too
Really, I hate this CC mechanism. It is only for Intel, can we just
Intel and AMD.
Nope, AMD just hardwires their IOMMU to always do CC enforcing. All this mess is only for Intel and their weird IOMMU that can only do the enforcement for a GPU.
punt it to userspace and have an intel 'want cc flag' for the entire nesting path and forget about it?
I dislike it too. But still not get your point why adding such a flag can really simplify things. As explained before the only difference between autodetect and having a user flag just affects the decision of cc when creating a hwpt. whether we should upgrade in the attach path is an orthogonal open which imho is unnecessary and Nicoline's patches to remove that check then also remove this patch makes lot of sense to me.
I don't think we can remove it, it is supposed to provide consistency of result regardless of ordering.
Jason
From: Jason Gunthorpe jgg@nvidia.com Sent: Friday, October 20, 2023 7:54 AM
On Thu, Oct 19, 2023 at 01:56:01AM +0000, Tian, Kevin wrote:
Otherwise we have a problem where the order devices are attached to the domain decides how many domains you get. ie the first device attached does not want CC (but is compatible with it) so we create a non-CC domain
in autodetect model this won't happen. If the first device is capable of enforce_cc then the domain will be created with enforce_cc.
there is no "does not want CC" input in autodetect.
Then later we attach a device that does want CC and now we are forced to create a second iommu domain when upgrading the first domain would have been fine.
then in this case the 2nd device will reuse the domain.
Then you have the reverse problem where the domain will not be CC when it should be.
If the domain has been non-CC it's perfectly fine for the 2nd device with CC to reuse it. As long as there is one domain with non-CC then KVM has to have special treatment on wbinvd. In this case there is actually a benefit to use just one non-CC domain for all devices (either non-CC or CC).
What we want to prevent is attaching a non-CC device to a CC domain or upgrade a non-CC domain to CC since in both case the non-CC device will be broken due to incompatible page table format.
Hotplug is another scenario (though Intel driver does not support it, and it looks broken)
Can you elaborate how hotplug is broken? If device is hotplugged and failed to attach to an existing domain, then a new one will be created for it.
A non-cc domain will ask to be upgraded and the driver will let it happen even though it doesn't/can't fix the existing IOPTEs
iommufd should not ask for an upgrade at all. The CC attribute of a domain should be fixed at creation time.
Baolu will fix the intel-iommu driver accordingly.
there is indeed a broken case in KVM which cannot handle dynamic change of coherency due to hotplug. But that one is orthogonal to what we discussed here about how to decide cc in iommufd.
That too
Really, I hate this CC mechanism. It is only for Intel, can we just
Intel and AMD.
Nope, AMD just hardwires their IOMMU to always do CC enforcing. All this mess is only for Intel and their weird IOMMU that can only do the enforcement for a GPU.
punt it to userspace and have an intel 'want cc flag' for the entire nesting path and forget about it?
I dislike it too. But still not get your point why adding such a flag can really simplify things. As explained before the only difference between autodetect and having a user flag just affects the decision of cc when creating a hwpt. whether we should upgrade in the attach path is an orthogonal open which imho is unnecessary and Nicoline's patches to remove that check then also remove this patch makes lot of sense to me.
I don't think we can remove it, it is supposed to provide consistency of result regardless of ordering.
Who cares about such consistency? Sure, the result is different due to order:
1) creating hwpt for dev1 (non-CC) then later attaching hwpt to dev2 (CC) will succeed;
2) creating hwpt for dev2 (CC) then later attaching hwpt to dev1 (non-CC) will fail then the user should create a new hwpt for dev1;
But the user shouldn't assume such explicit consistency since it's not defined in our uAPI. All we defined is that the attaching may fail due to incompatibility for whatever reason, and then the user can always try creating a new hwpt for the to-be-attached device. In this regard I don't see that providing consistency of result is necessary. 😊
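The contract described above maps to a simple userspace pattern. A hedged sketch, assuming the vfio cdev VFIO_DEVICE_ATTACH_IOMMUFD_PT ioctl and the iommufd uAPI structs; field layouts are illustrative:

	#include <sys/ioctl.h>
	#include <linux/iommufd.h>
	#include <linux/vfio.h>

	/*
	 * Attach a device to a shared HWPT; if the attach is rejected as
	 * incompatible (e.g. an enforce_cc mismatch), allocate a fresh HWPT
	 * from the IOAS and retry. Returns the HWPT id in use, or -1.
	 */
	static int attach_with_fallback(int devfd, int iommufd, __u32 dev_id,
					__u32 shared_hwpt_id, __u32 ioas_id)
	{
		struct vfio_device_attach_iommufd_pt attach = {
			.argsz = sizeof(attach),
			.pt_id = shared_hwpt_id,
		};
		struct iommu_hwpt_alloc alloc = {
			.size = sizeof(alloc),
			.dev_id = dev_id,
			.pt_id = ioas_id,
		};

		if (!ioctl(devfd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach))
			return shared_hwpt_id;	/* shared HWPT was compatible */

		if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc))
			return -1;

		attach.pt_id = alloc.out_hwpt_id;
		if (ioctl(devfd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach))
			return -1;
		return alloc.out_hwpt_id;
	}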
On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
What we want to prevent is attaching a non-CC device to a CC domain or upgrade a non-CC domain to CC since in both case the non-CC device will be broken due to incompatible page table format.
[..]
Who cares about such consistency? sure the result is different due to order:
creating hwpt for dev1 (non-CC) then later attaching hwpt to dev2 (CC) will succeed;
creating hwpt for dev2 (CC) then later attaching hwpt to dev1 (non-CC) will fail then the user should create a new hwpt for dev1;
AH... So really what the Intel driver wants is not upgrade to CC but *downgrade* from CC.
non-CC is the type that is universally applicable, so if we come across a non-CC capable device the proper/optimal thing is to degrade the HWPT and re-use it, not allocate a new HWPT.
So the whole thing is upside down.
As changing the IOPTEs in flight seems hard, and I don't want to see the Intel driver get slowed down to accommodate this, I think you are right to say this should be a creation-time property only.
I still think userspace should be able to select it so it can minimize the number of HWPTs required.
But the user shouldn't assume such explicit consistency since it's not defined in our uAPI. All we defined is that the attaching may fail due to incompatibility for whatever reason then the user can always try creating a new hwpt for the to-be-attached device. From this regard I don't see providing consistency of result is necessary. 😊
Anyhow, OK, let's add a comment summarizing your points and remove the cc upgrade at attach time (sorry Nicolin/Yi!)
It is easy to add a HWPT flag for this later if someone wants to optimize it.
Jason
On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
What we want to prevent is attaching a non-CC device to a CC domain or upgrade a non-CC domain to CC since in both case the non-CC device will be broken due to incompatible page table format.
[..]
Who cares about such consistency? sure the result is different due to order:
creating hwpt for dev1 (non-CC) then later attaching hwpt to dev2 (CC) will succeed;
creating hwpt for dev2 (CC) then later attaching hwpt to dev1 (non-CC) will fail then the user should create a new hwpt for dev1;
AH... So really what the Intel driver wants is not upgrade to CC but *downgrade* from CC.
non-CC is the type that is universally applicable, so if we come across a non-CC capable device the proper/optimal thing is to degrade the HWPT and re-use it, not allocate a new HWPT.
So the whole thing is upside down.
As changing the IOPTEs in flight seems hard, and I don't want to see the Intel driver get slowed down to accomodate this, I think you are right to say this should be a creation time property only.
I still think userspace should be able to select it so it can minimize the number of HWPTs required.
But the user shouldn't assume such explicit consistency since it's not defined in our uAPI. All we defined is that the attaching may fail due to incompatibility for whatever reason then the user can always try creating a new hwpt for the to-be-attached device. From this regard I don't see providing consistency of result is necessary. 😊
Anyhow, OK, lets add a comment summarizing your points and remove the cc upgrade at attach time (sorry Nicolin/Yi!)
Ack. I will send a small removal series. I assume it should CC stable tree also? And where should we add this comment? Kdoc of the alloc uAPI?
Thanks! Nicolin
It is easy to add a HWPT flag for this later if someone wants to optimize it.
Jason
On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
What we want to prevent is attaching a non-CC device to a CC domain or upgrade a non-CC domain to CC since in both case the non-CC device will be broken due to incompatible page table format.
[..]
Who cares about such consistency? sure the result is different due to order:
creating hwpt for dev1 (non-CC) then later attaching hwpt to dev2 (CC) will succeed;
creating hwpt for dev2 (CC) then later attaching hwpt to dev1 (non-CC) will fail then the user should create a new hwpt for dev1;
AH... So really what the Intel driver wants is not upgrade to CC but *downgrade* from CC.
non-CC is the type that is universally applicable, so if we come across a non-CC capable device the proper/optimal thing is to degrade the HWPT and re-use it, not allocate a new HWPT.
So the whole thing is upside down.
As changing the IOPTEs in flight seems hard, and I don't want to see the Intel driver get slowed down to accomodate this, I think you are right to say this should be a creation time property only.
I still think userspace should be able to select it so it can minimize the number of HWPTs required.
But the user shouldn't assume such explicit consistency since it's not defined in our uAPI. All we defined is that the attaching may fail due to incompatibility for whatever reason then the user can always try creating a new hwpt for the to-be-attached device. From this regard I don't see providing consistency of result is necessary. 😊
Anyhow, OK, lets add a comment summarizing your points and remove the cc upgrade at attach time (sorry Nicolin/Yi!)
Ack. I will send a small removal series. I assume it should CC stable tree also?
No, it seems more like tidying than fixing a functional issue, do I misunderstand?
And where should we add this comment? Kdoc of the alloc uAPI?
Maybe right in front of the only enforce_cc op callback?
Jason
On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
What we want to prevent is attaching a non-CC device to a CC domain or upgrade a non-CC domain to CC since in both case the non-CC device will be broken due to incompatible page table format.
[..]
Who cares about such consistency? sure the result is different due to order:
creating hwpt for dev1 (non-CC) then later attaching hwpt to dev2 (CC) will succeed;
creating hwpt for dev2 (CC) then later attaching hwpt to dev1 (non-CC) will fail then the user should create a new hwpt for dev1;
AH... So really what the Intel driver wants is not upgrade to CC but *downgrade* from CC.
non-CC is the type that is universally applicable, so if we come across a non-CC capable device the proper/optimal thing is to degrade the HWPT and re-use it, not allocate a new HWPT.
So the whole thing is upside down.
As changing the IOPTEs in flight seems hard, and I don't want to see the Intel driver get slowed down to accomodate this, I think you are right to say this should be a creation time property only.
I still think userspace should be able to select it so it can minimize the number of HWPTs required.
But the user shouldn't assume such explicit consistency since it's not defined in our uAPI. All we defined is that the attaching may fail due to incompatibility for whatever reason then the user can always try creating a new hwpt for the to-be-attached device. From this regard I don't see providing consistency of result is necessary. 😊
Anyhow, OK, lets add a comment summarizing your points and remove the cc upgrade at attach time (sorry Nicolin/Yi!)
Ack. I will send a small removal series. I assume it should CC stable tree also?
No, it seems more like tidying that fixing a functional issue, do I misunderstand?
Hmm. Maybe the misunderstanding is mine -- Kevin was asking if it was already a bug and you answered yes: https://lore.kernel.org/linux-iommu/20231016115736.GP3952@nvidia.com/
If this shouldn't be a bug fix, I could just merge them into a single tidying patch and add the comments you suggested below.
And where should we add this comment? Kdoc of the alloc uAPI?
Maybe right in front of the only enforce_cc op callback?
OK. How about this? Might be a bit verbose though:

/*
 * enforce_cache_coherenc must be determined during the HWPT allocation.
 * Note that a HWPT (non-CC) created for a device (non-CC) can be later
 * reused by another device (either non-CC or CC). However, A HWPT (CC)
 * created for a device (CC) cannot be reused by another device (non-CC)
 * but only devices (CC). Instead user space in this case would need to
 * allocate a separate HWPT (non-CC).
 */
Thanks Nic
From: Nicolin Chen nicolinc@nvidia.com Sent: Monday, October 23, 2023 8:18 AM
On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
But the user shouldn't assume such explicit consistency since it's not defined in our uAPI. All we defined is that the attaching may fail due to incompatibility for whatever reason then the user can always try creating a new hwpt for the to-be-attached device. From this regard I don't see providing consistency of result is necessary. 😊
Anyhow, OK, let's add a comment summarizing your points and remove the cc upgrade at attach time (sorry Nicolin/Yi!)
Ack. I will send a small removal series. I assume it should CC stable tree also?
No, it seems more like tidying that fixing a functional issue, do I misunderstand?
Hmm. Maybe the misunderstanding is mine -- Kevin was asking if it was already a bug and you answered yes: https://lore.kernel.org/linux-iommu/20231016115736.GP3952@nvidia.com/
Currently the intel-iommu driver already rejects 1) enforcing CC on a domain which is already attached to a non-CC device, and 2) attaching a non-CC device to a domain which has enforce_cc.

So there is no explorable bug to fix in the stable tree.

Just logically the intel-iommu driver doesn't support enforcing CC on a domain which is capable of enforce-cc and already has valid mappings, so it should add a proper check to prevent that from being attempted by future usages.
On Mon, Oct 23, 2023 at 02:53:20AM +0000, Tian, Kevin wrote:
From: Nicolin Chen nicolinc@nvidia.com Sent: Monday, October 23, 2023 8:18 AM
On Sat, Oct 21, 2023 at 01:38:04PM -0300, Jason Gunthorpe wrote:
On Fri, Oct 20, 2023 at 11:59:13AM -0700, Nicolin Chen wrote:
On Fri, Oct 20, 2023 at 10:55:01AM -0300, Jason Gunthorpe wrote:
On Fri, Oct 20, 2023 at 02:43:58AM +0000, Tian, Kevin wrote:
But the user shouldn't assume such explicit consistency since it's not defined in our uAPI. All we defined is that the attaching may fail due to incompatibility for whatever reason then the user can always try creating a new hwpt for the to-be-attached device. From this regard I don't see providing consistency of result is necessary. 😊
Anyhow, OK, let's add a comment summarizing your points and remove the cc upgrade at attach time (sorry Nicolin/Yi!)
Ack. I will send a small removal series. I assume it should CC stable tree also?
No, it seems more like tidying that fixing a functional issue, do I misunderstand?
Hmm. Maybe the misunderstanding is mine -- Kevin was asking if it was already a bug and you answered yes: https://lore.kernel.org/linux-iommu/20231016115736.GP3952@nvidia.com/
currently intel-iommu driver already rejects 1) enforcing CC on a domain which is already attached to non-CC device and 2) attaching a non-CC device to a domain which has enforce_cc.
so there is no explorable bug to fix in stable tree.
I see. Thanks!
On Sun, Oct 22, 2023 at 05:18:03PM -0700, Nicolin Chen wrote:
And where should we add this comment? Kdoc of the alloc uAPI?
Maybe right in front of the only enforce_cc op callback?
OK. How about this? Might be a bit verbose though:

/*
 * enforce_cache_coherenc must be determined during the HWPT allocation.
 * Note that a HWPT (non-CC) created for a device (non-CC) can be later
 * reused by another device (either non-CC or CC). However, A HWPT (CC)
 * created for a device (CC) cannot be reused by another device (non-CC)
 * but only devices (CC). Instead user space in this case would need to
 * allocate a separate HWPT (non-CC).
 */
Ok, fix the spello in enforce_cache_coherenc
Jason
On Mon, Oct 23, 2023 at 10:59:35AM -0300, Jason Gunthorpe wrote:
On Sun, Oct 22, 2023 at 05:18:03PM -0700, Nicolin Chen wrote:
And where should we add this comment? Kdoc of the alloc uAPI?
Maybe right in front of the only enforce_cc op callback?
OK. How about this? Might be a bit verbose though:

/*
 * enforce_cache_coherenc must be determined during the HWPT allocation.
 * Note that a HWPT (non-CC) created for a device (non-CC) can be later
 * reused by another device (either non-CC or CC). However, A HWPT (CC)
 * created for a device (CC) cannot be reused by another device (non-CC)
 * but only devices (CC). Instead user space in this case would need to
 * allocate a separate HWPT (non-CC).
 */
Ok, fix the spello in enforce_cache_coherenc
Oops.
I also found the existing piece sorta says a similar thing:

	 * Set the coherency mode before we do iopt_table_add_domain() as some
	 * iommus have a per-PTE bit that controls it and need to decide before
	 * doing any maps.
So, did this and sending v3:

- * enforce_cache_coherenc must be determined during the HWPT allocation.
+ * The cache coherency mode must be configured here and unchanged later.
On 10/17/23 4:52 PM, Tian, Kevin wrote:
another problem is that intel_iommu_enforce_cache_coherency() doesn't update existing entries. It only sets a domain flag to affect future mappings. so it means the 2nd idev is also broken.
This is such a gap, intel driver should not permit that.
yes. @Baolu, can you add a fix?
Sure thing!
Best regards, baolu
From: Tian, Kevin Sent: Tuesday, October 17, 2023 4:53 PM
This is why I wanted to get rid of this bad mechanism going forward.
Manually created hwpt should have a manual specification of cc and then we don't have so many problems.
Actually I don't see much difference between manual specification of cc vs. indirect specification of cc via specified idev in hwpt_alloc. What really matters is how we want to deal with the compatibility in the following attach path.
From: Nicolin Chen nicolinc@nvidia.com
The iopt_table_enforce_dev_resv_regions() and iopt_remove_reserved_iova() functions require callers to pass in an ioas->iopt pointer. This simply works with a kernel-managed hw_pagetable, by passing in its hwpt->ioas->iopt pointer. However, now there could be a user-managed hw_pagetable that doesn't have an ioas pointer. And typically most device reserved regions should be enforced on a kernel-managed domain only, although the IOMMU_RESV_SW_MSI used by SMMU will introduce some complication.

Add a pair of iommufd_device_enforce_rr/iommufd_device_remove_rr helpers that call the iopt_table_enforce_dev_resv_regions/iopt_remove_reserved_iova functions after some additional checks. This would also ease any further extension to support the IOMMU_RESV_SW_MSI complication mentioned above.

For the replace() routine, add another helper to compare ioas pointers, with support for user-managed hw_pagetables.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/device.c | 42 ++++++++++++++++++------- drivers/iommu/iommufd/iommufd_private.h | 18 +++++++++++ 2 files changed, 48 insertions(+), 12 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c index 104dd061a2a3..10e6ec590ede 100644 --- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -329,6 +329,28 @@ static int iommufd_group_setup_msi(struct iommufd_group *igroup, return 0; }
+static void iommufd_device_remove_rr(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt) +{ + if (WARN_ON(!hwpt)) + return; + if (hwpt->user_managed) + return; + iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); +} + +static int iommufd_device_enforce_rr(struct iommufd_device *idev, + struct iommufd_hw_pagetable *hwpt, + phys_addr_t *sw_msi_start) +{ + if (WARN_ON(!hwpt)) + return -EINVAL; + if (hwpt->user_managed) + return 0; + return iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev, + sw_msi_start); +} + int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, struct iommufd_device *idev) { @@ -348,8 +370,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, goto err_unlock; }
- rc = iopt_table_enforce_dev_resv_regions(&hwpt->ioas->iopt, idev->dev, - &idev->igroup->sw_msi_start); + rc = iommufd_device_enforce_rr(idev, hwpt, &idev->igroup->sw_msi_start); if (rc) goto err_unlock;
@@ -375,7 +396,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt, mutex_unlock(&idev->igroup->lock); return 0; err_unresv: - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); + iommufd_device_remove_rr(idev, hwpt); err_unlock: mutex_unlock(&idev->igroup->lock); return rc; @@ -392,7 +413,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev) iommu_detach_group(hwpt->domain, idev->igroup->group); idev->igroup->hwpt = NULL; } - iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev); + iommufd_device_remove_rr(idev, hwpt); mutex_unlock(&idev->igroup->lock);
/* Caller must destroy hwpt */ @@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev, }
old_hwpt = igroup->hwpt; - if (hwpt->ioas != old_hwpt->ioas) { + if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) { list_for_each_entry(cur, &igroup->device_list, group_item) { - rc = iopt_table_enforce_dev_resv_regions( - &hwpt->ioas->iopt, cur->dev, NULL); + rc = iommufd_device_enforce_rr(cur, hwpt, NULL); if (rc) goto err_unresv; } @@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev, if (rc) goto err_unresv;
- if (hwpt->ioas != old_hwpt->ioas) { + if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) { list_for_each_entry(cur, &igroup->device_list, group_item) - iopt_remove_reserved_iova(&old_hwpt->ioas->iopt, - cur->dev); + iommufd_device_remove_rr(cur, hwpt); } - igroup->hwpt = hwpt;
/* @@ -483,7 +501,7 @@ iommufd_device_do_replace(struct iommufd_device *idev, return old_hwpt; err_unresv: list_for_each_entry(cur, &igroup->device_list, group_item) - iopt_remove_reserved_iova(&hwpt->ioas->iopt, cur->dev); + iommufd_device_remove_rr(cur, hwpt); err_unlock: mutex_unlock(&idev->igroup->lock); return ERR_PTR(rc); diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index 34940596c2c2..b14f23d3f42e 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -281,6 +281,24 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, refcount_dec(&hwpt->obj.users); }
+static inline bool +iommufd_hw_pagetable_compare_ioas(struct iommufd_hw_pagetable *old_hwpt, + struct iommufd_hw_pagetable *new_hwpt) +{ + struct iommufd_ioas *old_ioas, *new_ioas; + + WARN_ON(!old_hwpt || !new_hwpt); + if (old_hwpt->user_managed) + old_ioas = old_hwpt->parent->ioas; + else + old_ioas = old_hwpt->ioas; + if (new_hwpt->user_managed) + new_ioas = new_hwpt->parent->ioas; + else + new_ioas = new_hwpt->ioas; + return old_ioas != new_ioas; +} + struct iommufd_group { struct kref ref; struct mutex lock;
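For reference, the resolution rule above can be modeled in plain C as below. This is an illustrative sketch, not kernel code: the standalone struct and helper names are hypothetical stand-ins for the iommufd types. The point is that a user-managed HWPT contributes its parent's IOAS, so the comparison sees through the nesting:

#include <stdbool.h>

struct ioas;

struct hwpt {
	bool user_managed;
	struct ioas *ioas;	/* used when !user_managed */
	struct hwpt *parent;	/* used when user_managed */
};

/* Resolve the IOAS backing a HWPT, walking to the parent for nested ones */
static struct ioas *hwpt_ioas(const struct hwpt *hwpt)
{
	return hwpt->user_managed ? hwpt->parent->ioas : hwpt->ioas;
}

/* Model of iommufd_hw_pagetable_compare_ioas(): true when the IOAS differs */
static bool compare_ioas(const struct hwpt *old_hwpt, const struct hwpt *new_hwpt)
{
	return hwpt_ioas(old_hwpt) != hwpt_ioas(new_hwpt);
}

Replacing a stage-2 HWPT with a nested child allocated from it thus compares as "same IOAS", so the reserved regions stay enforced on the stage-2 HWPT without being re-run.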
@@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 	}

 	old_hwpt = igroup->hwpt;
-	if (hwpt->ioas != old_hwpt->ioas) {
+	if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
 		list_for_each_entry(cur, &igroup->device_list, group_item) {
-			rc = iopt_table_enforce_dev_resv_regions(
-				&hwpt->ioas->iopt, cur->dev, NULL);
+			rc = iommufd_device_enforce_rr(cur, hwpt, NULL);
 			if (rc)
 				goto err_unresv;
 		}
@@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 	if (rc)
 		goto err_unresv;

-	if (hwpt->ioas != old_hwpt->ioas) {
+	if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
 		list_for_each_entry(cur, &igroup->device_list, group_item)
-			iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
-						  cur->dev);
+			iommufd_device_remove_rr(cur, hwpt);

Should be "iommufd_device_remove_rr(cur, old_hwpt);"

 	}
-	igroup->hwpt = hwpt;

 	/*
On Sat, Oct 07, 2023 at 03:20:41PM +0800, Yan Zhao wrote:
@@ -444,10 +465,9 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 	}

 	old_hwpt = igroup->hwpt;
-	if (hwpt->ioas != old_hwpt->ioas) {
+	if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
 		list_for_each_entry(cur, &igroup->device_list, group_item) {
-			rc = iopt_table_enforce_dev_resv_regions(
-				&hwpt->ioas->iopt, cur->dev, NULL);
+			rc = iommufd_device_enforce_rr(cur, hwpt, NULL);
 			if (rc)
 				goto err_unresv;
 		}
@@ -461,12 +481,10 @@ iommufd_device_do_replace(struct iommufd_device *idev,
 	if (rc)
 		goto err_unresv;

-	if (hwpt->ioas != old_hwpt->ioas) {
+	if (iommufd_hw_pagetable_compare_ioas(old_hwpt, hwpt)) {
 		list_for_each_entry(cur, &igroup->device_list, group_item)
-			iopt_remove_reserved_iova(&old_hwpt->ioas->iopt,
-						  cur->dev);
+			iommufd_device_remove_rr(cur, hwpt);

Should be "iommufd_device_remove_rr(cur, old_hwpt);"
Ah, right. Should fix this.
Thanks! Nicolin
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace. But it can only allocate a hw_pagetable associated with a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables for two-stage translation use cases, which require user data input from userspace.

Extend the IOMMU_HWPT_ALLOC ioctl to accept a non-default hwpt_type with type-specific user data. Also, update @pt_id to accept a hwpt_id besides an ioas_id. Then, pass them to the downstream alloc_fn().
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++- include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++-- 2 files changed, 39 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index 90fd65859e28..ab25de149ae6 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -300,6 +300,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) bool flag); struct iommufd_hw_pagetable *hwpt, *parent; struct iommu_hwpt_alloc *cmd = ucmd->cmd; + struct iommu_user_data *data = NULL; struct iommufd_object *pt_obj; struct iommufd_device *idev; struct iommufd_ioas *ioas; @@ -308,6 +309,11 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
if (cmd->flags & ~IOMMU_HWPT_ALLOC_NEST_PARENT || cmd->__reserved) return -EOPNOTSUPP; + if (!cmd->data_len && cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) + return -EINVAL; + if (cmd->flags & IOMMU_HWPT_ALLOC_NEST_PARENT && + cmd->hwpt_type != IOMMU_HWPT_TYPE_DEFAULT) + return -EINVAL;
idev = iommufd_get_device(ucmd, cmd->dev_id); if (IS_ERR(idev)) @@ -340,9 +346,19 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) goto out_put_pt; }
+ if (cmd->data_len) { + data = kzalloc(sizeof(*data), GFP_KERNEL); + if (!data) { + rc = -ENOMEM; + goto out_put_pt; + } + data->uptr = u64_to_user_ptr(cmd->data_uptr); + data->len = cmd->data_len; + } + mutex_lock(mutex); hwpt = alloc_fn(ucmd->ictx, pt_obj, idev, cmd->flags, - IOMMU_HWPT_TYPE_DEFAULT, NULL, false); + cmd->hwpt_type, data, false); if (IS_ERR(hwpt)) { rc = PTR_ERR(hwpt); goto out_unlock; @@ -359,6 +375,7 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj); out_unlock: mutex_unlock(mutex); + kfree(data); out_put_pt: iommufd_put_object(pt_obj); out_put_idev: diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index 3c8660fe9bb1..c46b1f772f20 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -370,15 +370,31 @@ enum iommu_hwpt_type { * @size: sizeof(struct iommu_hwpt_alloc) * @flags: Combination of enum iommufd_hwpt_alloc_flags * @dev_id: The device to allocate this HWPT for - * @pt_id: The IOAS to connect this HWPT to + * @pt_id: The IOAS or HWPT to connect this HWPT to * @out_hwpt_id: The ID of the new HWPT * @__reserved: Must be 0 + * @hwpt_type: One of enum iommu_hwpt_type + * @data_len: Length of the type specific data + * @data_uptr: User pointer to the type specific data * * Explicitly allocate a hardware page table object. This is the same object * type that is returned by iommufd_device_attach() and represents the * underlying iommu driver's iommu_domain kernel object. * - * A HWPT will be created with the IOVA mappings from the given IOAS. + * A kernel-managed HWPT will be created with the mappings from the given + * IOAS via the @pt_id. The @hwpt_type for this allocation can be set to + * either IOMMU_HWPT_TYPE_DEFAULT or a pre-defined type corresponding to + * an I/O page table type supported by the underlying IOMMU hardware. + * + * A user-managed HWPT will be created from a given parent HWPT via the + * @pt_id, in which the parent HWPT must be allocated previously via the + * same ioctl from a given IOAS (@pt_id). In this case, the @hwpt_type + * must be set to a pre-defined type corresponding to an I/O page table + * type supported by the underlying IOMMU hardware. + * + * If the @hwpt_type is set to IOMMU_HWPT_TYPE_DEFAULT, @data_len and + * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr + * must be given. */ struct iommu_hwpt_alloc { __u32 size; @@ -387,6 +403,9 @@ struct iommu_hwpt_alloc { __u32 pt_id; __u32 out_hwpt_id; __u32 __reserved; + __u32 hwpt_type; + __u32 data_len; + __aligned_u64 data_uptr; }; #define IOMMU_HWPT_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_ALLOC)
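For illustration, a minimal userspace flow through the extended ioctl could look like the sketch below. It is not part of the patch: it borrows IOMMU_HWPT_TYPE_SELFTEST and struct iommu_hwpt_selftest from the selftest patches later in this series, and assumes fd, dev_id and ioas_id were set up beforehand:

	struct iommu_hwpt_selftest data = {
		.iotlb = IOMMU_TEST_IOTLB_DEFAULT,
	};
	struct iommu_hwpt_alloc cmd = {
		.size = sizeof(cmd),
		.flags = IOMMU_HWPT_ALLOC_NEST_PARENT,
		.dev_id = dev_id,
		.pt_id = ioas_id,	/* stage-2 parent is backed by an IOAS */
		.hwpt_type = IOMMU_HWPT_TYPE_DEFAULT,
	};
	uint32_t s2_hwpt_id;

	/* Allocate the kernel-managed (stage-2) parent HWPT */
	if (ioctl(fd, IOMMU_HWPT_ALLOC, &cmd))
		return -1;
	s2_hwpt_id = cmd.out_hwpt_id;

	/* Allocate a user-managed (stage-1) HWPT nested on the parent */
	cmd.flags = 0;
	cmd.pt_id = s2_hwpt_id;	/* @pt_id now carries a hwpt_id */
	cmd.hwpt_type = IOMMU_HWPT_TYPE_SELFTEST;
	cmd.data_len = sizeof(data);
	cmd.data_uptr = (uintptr_t)&data;
	if (ioctl(fd, IOMMU_HWPT_ALLOC, &cmd))
		return -1;
	/* cmd.out_hwpt_id is the nested HWPT usable for attach/replace */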
On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace. But it can only allocate a hw_pagetable associated with a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables for two-stage translation use cases, which require user data input from userspace.

Extend the IOMMU_HWPT_ALLOC ioctl to accept a non-default hwpt_type with type-specific user data. Also, update @pt_id to accept a hwpt_id besides an ioas_id. Then, pass them to the downstream alloc_fn().
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++- include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++-- 2 files changed, 39 insertions(+), 3 deletions(-)
Can we also come up with a small vt-d patch that implements an op for this? Or is it too big?
It would be nice if we could wrap IOMMU_HWPT_ALLOC into one self-contained series and another series for invalidate.
Jason
On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace. But it can only allocate a hw_pagetable associated with a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables for two-stage translation use cases, which require user data input from userspace.

Extend the IOMMU_HWPT_ALLOC ioctl to accept a non-default hwpt_type with type-specific user data. Also, update @pt_id to accept a hwpt_id besides an ioas_id. Then, pass them to the downstream alloc_fn().
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++- include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++-- 2 files changed, 39 insertions(+), 3 deletions(-)
Can we also come up with a small vt-d patch that implements an op for this? Or is it too big?
It would be nice if we could wrap IOMMU_HWPT_ALLOC into one self-contained series and another series for invalidate.
We now only use IOMMU_HWPT_ALLOC for nested domain allocations, which won't be supported until the cache_invalidate_user op is present?
/* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
+ /* Driver is buggy by missing cache_invalidate_user in domain_ops */ + if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) { + rc = -EINVAL; + goto out_abort; + }
Thanks Nic
On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace. But it can only allocate a hw_pagetable associated with a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables for two-stage translation use cases, which require user data input from userspace.

Extend the IOMMU_HWPT_ALLOC ioctl to accept a non-default hwpt_type with type-specific user data. Also, update @pt_id to accept a hwpt_id besides an ioas_id. Then, pass them to the downstream alloc_fn().
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++- include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++-- 2 files changed, 39 insertions(+), 3 deletions(-)
Can we also come up with a small vt-d patch that implements an op for this? Or is it too big?
It would be nice if we could wrap IOMMU_HWPT_ALLOC into one self-contained series and another series for invalidate.
We now only use IOMMU_HWPT_ALLOC for nested domain allocations, which won't be supported until the cache_invalidate_user op is present?
/* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
/* Driver is buggy by missing cache_invalidate_user in domain_ops */
if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
rc = -EINVAL;
goto out_abort;
}
Hm. That hunk could migrate to the invalidate series.
I'm just leery of doing the invalidation too, considering how complicated it is
Jason
On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace. But it can only allocate a hw_pagetable associated with a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables for two-stage translation use cases, which require user data input from userspace.

Extend the IOMMU_HWPT_ALLOC ioctl to accept a non-default hwpt_type with type-specific user data. Also, update @pt_id to accept a hwpt_id besides an ioas_id. Then, pass them to the downstream alloc_fn().
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++- include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++-- 2 files changed, 39 insertions(+), 3 deletions(-)
Can we also come up with a small vt-d patch that implements an op for this? Or is it too big?
It would be nice if we could wrap IOMMU_HWPT_ALLOC into one self-contained series and another series for invalidate.
We now only use IOMMU_HWPT_ALLOC for nested domain allocations, which won't be supported until the cache_invalidate_user op is present?
/* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
/* Driver is buggy by missing cache_invalidate_user in domain_ops */
if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
rc = -EINVAL;
goto out_abort;
}
Hm. That hunk could migrate to the invalidate series.
I'm just leery of doing the invalidation too, considering how complicated it is
OK. Let's see how Yi/Kevin/Baolu reply about the feasibility with the VT-d driver. Then Yi and I can accordingly separate the allocation part into a smaller series.
Thanks Nic
On 2023/10/14 08:51, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace. But it can only allocate a hw_pagetable associated with a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables for two-stage translation use cases, which require user data input from userspace.

Extend the IOMMU_HWPT_ALLOC ioctl to accept a non-default hwpt_type with type-specific user data. Also, update @pt_id to accept a hwpt_id besides an ioas_id. Then, pass them to the downstream alloc_fn().
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++- include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++-- 2 files changed, 39 insertions(+), 3 deletions(-)
Can we also come up with a small vt-d patch that implements an op for this? Or is it too big?
It would be nice if we could wrap IOMMU_HWPT_ALLOC into one self-contained series and another series for invalidate.
We now only use IOMMU_HWPT_ALLOC for nested domain allocations, which won't be supported until the cache_invalidate_user op is present?
/* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
/* Driver is buggy by missing cache_invalidate_user in domain_ops */
if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
rc = -EINVAL;
goto out_abort;
}
Hm. That hunk could migrate to the invalidate series.
I'm just leery of doing the invalidation too, considering how complicated it is
OK. Let's see how Yi/Kevin/Baolu reply about the feasibility with the VT-d driver. Then Yi and I can accordingly separate the allocation part into a smaller series.
The current nesting series actually extends the HWPT_ALLOC ioctl to accept user data for allocating a domain with vendor specific data. Nested translation happens to be the user of it. But nesting requires invalidation. If we split further, then this new series would just be "extending HWPT_ALLOC to accept vendor specific data from userspace". But it will lack a user if nesting is separated out. Is this acceptable? @Jason
BTW, do we still have any unsolved issues on the invalidation path?
On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
On 2023/10/14 08:51, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 09:07:09PM -0300, Jason Gunthorpe wrote:
On Fri, Oct 13, 2023 at 01:58:59PM -0700, Nicolin Chen wrote:
On Fri, Oct 13, 2023 at 12:19:23PM -0300, Jason Gunthorpe wrote:
On Thu, Sep 21, 2023 at 12:51:31AM -0700, Yi Liu wrote:
IOMMU_HWPT_ALLOC already supports iommu_domain allocation for userspace. But it can only allocate a hw_pagetable associated with a given IOAS, i.e. only a kernel-managed hw_pagetable of IOMMU_HWPT_TYPE_DEFAULT type.

IOMMU drivers can now support user-managed hw_pagetables for two-stage translation use cases, which require user data input from userspace.

Extend the IOMMU_HWPT_ALLOC ioctl to accept a non-default hwpt_type with type-specific user data. Also, update @pt_id to accept a hwpt_id besides an ioas_id. Then, pass them to the downstream alloc_fn().
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com
drivers/iommu/iommufd/hw_pagetable.c | 19 ++++++++++++++++++- include/uapi/linux/iommufd.h | 23 +++++++++++++++++++++-- 2 files changed, 39 insertions(+), 3 deletions(-)
Can we also come up with a small vt-d patch that implements an op for this? Or is it too big?
It would be nice if we could wrap IOMMU_HWPT_ALLOC into one self-contained series and another series for invalidate.
We now only use IOMMU_HWPT_ALLOC for nested domain allocations, which won't be supported until the cache_invalidate_user op is present?
/* e.g. the following piece is in iommufd_user_managed_hwpt_alloc */
/* Driver is buggy by missing cache_invalidate_user in domain_ops */
if (WARN_ON_ONCE(!hwpt->domain->ops->cache_invalidate_user)) {
rc = -EINVAL;
goto out_abort;
}
Hm. That hunk could migrate to the invalidate series.
I'm just leery of doing the invalidation too, considering how complicated it is
OK. Let's see how Yi/Kevin/Baolu reply about the feasibility with the VT-d driver. Then Yi and I can accordingly separate the allocation part into a smaller series.
The current nesting series actually extends the HWPT_ALLOC ioctl to accept user data for allocating a domain with vendor specific data. Nested translation happens to be the user of it. But nesting requires invalidation. If we split further, then this new series would just be "extending HWPT_ALLOC to accept vendor specific data from userspace". But it will lack a user if nesting is separated out. Is this acceptable? @Jason
I'd still like to include the nesting allocation and attach parts though, even if they are not usable without invalidation ..
BTW, do we still have any unsolved issues on the invalidation path?
I'm not sure; there were so many different versions of it that we need to go back over it and check the driver implementations again
Jason
On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
The current nesting series actually extends the HWPT_ALLOC ioctl to accept user data for allocating a domain with vendor specific data. Nested translation happens to be the user of it. But nesting requires invalidation. If we split further, then this new series would just be "extending HWPT_ALLOC to accept vendor specific data from userspace". But it will lack a user if nesting is separated out. Is this acceptable? @Jason
I'd still like to include the nesting allocation and attach parts though, even if they are not usable without invalidation ..
This is the latest series that I reworked (in bottom-up order):
  iommu: Add a pair of helpers to copy struct iommu_user_data{_array}
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommufd: Add a nested HW pagetable object
  iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
  iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
  iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
  iommufd/device: Add helpers to enforce/remove device reserved regions
  iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
  iommu: Pass in parent domain with user_data to domain_alloc_user op
Perhaps we can have a preparatory series to merge first:
  iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
  iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
  iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
  iommufd/device: Add helpers to enforce/remove device reserved regions
Then next cycle would be basically 4 patches + selftests:
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommufd: Add a nested HW pagetable object
  iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
  iommu: Pass in parent domain with user_data to domain_alloc_user op
The preparatory series doesn't involve functional changes, yet it has a good amount of pieces to simplify the "nested HW pagetable" support, which is basically nested_alloc/abort/destroy.
BTW, do we still have any unsolved issues on the invalidation path?
I'm not sure; there were so many different versions of it that we need to go back over it and check the driver implementations again
Only this v4 has the latest array-based invalidation design. And it should be straightforward for drivers to define entry/request structures. It might be a bit rushed to review/finalize it at the rc6 stage though.
Thanks Nic
On 2023/10/17 02:44, Nicolin Chen wrote:
On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
The current nesting series actually extends the HWPT_ALLOC ioctl to accept user data for allocating a domain with vendor specific data. Nested translation happens to be the user of it. But nesting requires invalidation. If we split further, then this new series would just be "extending HWPT_ALLOC to accept vendor specific data from userspace". But it will lack a user if nesting is separated out. Is this acceptable? @Jason
I'd still like to include the nesting allocation and attach parts though, even if they are not usable without invalidation ..
This is the latest series that I reworked (in bottom-up order):
  iommu: Add a pair of helpers to copy struct iommu_user_data{_array}
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommufd: Add a nested HW pagetable object
  iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
  iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
  iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
  iommufd/device: Add helpers to enforce/remove device reserved regions
  iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
  iommu: Pass in parent domain with user_data to domain_alloc_user op
following Jason's comment, it looks like we can just split the cache invalidation path out. Then the above looks good after removing "iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op". Is that right? @Jason
Perhaps we can have a preparatory series to merge first:
  iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
  iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
  iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
  iommufd/device: Add helpers to enforce/remove device reserved regions
Then next cycle would be basically 4 patches + selftests:
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommufd: Add a nested HW pagetable object
  iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
  iommu: Pass in parent domain with user_data to domain_alloc_user op
The preparatory series doesn't involve functional changes, yet it has a good amount of pieces to simplify the "nested HW pagetable" support, which is basically nested_alloc/abort/destroy.
BTW, do we still have any unsolved issues on the invalidation path?
I'm not sure; there were so many different versions of it that we need to go back over it and check the driver implementations again
Only this v4 has the latest array-based invalidation design. And it should be straightforward for drivers to define entry/request structures. It might be a bit rushed to review/finalize it at the rc6 stage though.
yes, before v4, the cache invalidation path was simple and vendor drivers had their own handling.
On Tue, Oct 17, 2023 at 04:55:12PM +0800, Yi Liu wrote:
On 2023/10/17 02:44, Nicolin Chen wrote:
On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
The current nesting series actually extends the HWPT_ALLOC ioctl to accept user data for allocating a domain with vendor specific data. Nested translation happens to be the user of it. But nesting requires invalidation. If we split further, then this new series would just be "extending HWPT_ALLOC to accept vendor specific data from userspace". But it will lack a user if nesting is separated out. Is this acceptable? @Jason
I'd still like to include the nesting allocation and attach parts though, even if they are not usable without invalidation ..
This is the latest series that I reworked (in bottom-up order):
  iommu: Add a pair of helpers to copy struct iommu_user_data{_array}
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommufd: Add a nested HW pagetable object
  iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
  iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
  iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
  iommufd/device: Add helpers to enforce/remove device reserved regions
  iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
  iommu: Pass in parent domain with user_data to domain_alloc_user op
following Jason's comment, it looks like we can just split the cache invalidation path out. Then the above looks good after removing "iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op". Is that right? @Jason
If it makes sense, sure. It would be nice to be finished with the alloc path
Only this v4 has the latest array-based invalidation design. And it should be straightforward for drivers to define entry/request structures. It might be a bit rushed to review/finalize it at the rc6 stage though.
yes, before v4, the cache invalidation path was simple and vendor drivers had their own handling.
Have driver implementations of v4 been done that we can look at?
Jason
On Tue, Oct 17, 2023 at 12:50:11PM -0300, Jason Gunthorpe wrote:
On Tue, Oct 17, 2023 at 04:55:12PM +0800, Yi Liu wrote:
On 2023/10/17 02:44, Nicolin Chen wrote:
On Mon, Oct 16, 2023 at 08:59:07AM -0300, Jason Gunthorpe wrote:
On Mon, Oct 16, 2023 at 03:03:04PM +0800, Yi Liu wrote:
The current nesting series actually extends the HWPT_ALLOC ioctl to accept user data for allocating a domain with vendor specific data. Nested translation happens to be the user of it. But nesting requires invalidation. If we split further, then this new series would just be "extending HWPT_ALLOC to accept vendor specific data from userspace". But it will lack a user if nesting is separated out. Is this acceptable? @Jason
I'd still like to include the nesting allocation and attach parts though, even if they are not usable without invalidation ..
This is the latest series that I reworked (in bottom-up order):
  iommu: Add a pair of helpers to copy struct iommu_user_data{_array}
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommufd: Add a nested HW pagetable object
  iommufd: Share iommufd_hwpt_alloc with IOMMUFD_OBJ_HWPT_NESTED
  iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable
  iommufd: Rename IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING
  iommufd/device: Add helpers to enforce/remove device reserved regions
  iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op
  iommu: Pass in parent domain with user_data to domain_alloc_user op
following Jason's comment, it looks like we can just split the cache invalidation path out. Then the above looks good after removing "iommufd: Add IOMMU_HWPT_INVALIDATE" and also the cache_invalidate_user callback in "iommu: Add IOMMU_DOMAIN_NESTED and cache_invalidate_user op". Is that right? @Jason
If it makes sense, sure. It would be nice to be finished with the alloc path
I can do the split today. Shall we have a domain_alloc_user op in the VT-d driver? Can we accept a core-only series? I understood it's better to have one though...
Only this v4 has the latest array-based invalidation design. And it should be straightforward for drivers to define entry/request structures. It might be a bit rushed to review/finalize it at the rc6 stage though.
yes, before v4, the cache invalidation path was simple and vendor drivers had their own handling.
Have driver implementations of v4 been done that we can look at?
I think so: https://lore.kernel.org/linux-iommu/20230921075431.125239-10-yi.l.liu@intel....
Thanks Nicolin
In nested translation, the stage-1 page table is user-managed but cached by the IOMMU hardware, so an update to present page table entries in the stage-1 page table should be followed by a cache invalidation.
Add an IOMMU_HWPT_INVALIDATE ioctl to support such a cache invalidation. It takes hwpt_id to specify the iommu_domain, and a multi-entry array to support multiple invalidation requests in one ioctl.
Co-developed-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/hw_pagetable.c | 33 +++++++++++++++++++++++++ drivers/iommu/iommufd/iommufd_private.h | 9 +++++++ drivers/iommu/iommufd/main.c | 3 +++ include/uapi/linux/iommufd.h | 29 ++++++++++++++++++++++ 4 files changed, 74 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c index ab25de149ae6..72c46de1396b 100644 --- a/drivers/iommu/iommufd/hw_pagetable.c +++ b/drivers/iommu/iommufd/hw_pagetable.c @@ -382,3 +382,36 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd) iommufd_put_object(&idev->obj); return rc; } + +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd) +{ + struct iommu_hwpt_invalidate *cmd = ucmd->cmd; + struct iommu_user_data_array data_array = { + .uptr = u64_to_user_ptr(cmd->reqs_uptr), + .entry_len = cmd->req_len, + .entry_num = cmd->req_num, + }; + struct iommufd_hw_pagetable *hwpt; + int rc = 0; + + if (!cmd->req_len || !cmd->req_num) + return -EOPNOTSUPP; + + hwpt = iommufd_get_hwpt(ucmd, cmd->hwpt_id); + if (IS_ERR(hwpt)) + return PTR_ERR(hwpt); + + if (!hwpt->user_managed) { + rc = -EINVAL; + goto out_put_hwpt; + } + + rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, &data_array, + &cmd->out_driver_error_code); + cmd->req_num = data_array.entry_num; + if (iommufd_ucmd_respond(ucmd, sizeof(*cmd))) + return -EFAULT; +out_put_hwpt: + iommufd_put_object(&hwpt->obj); + return rc; +} diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h index b14f23d3f42e..bdbc8dac2fd8 100644 --- a/drivers/iommu/iommufd/iommufd_private.h +++ b/drivers/iommu/iommufd/iommufd_private.h @@ -270,6 +270,7 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev); void iommufd_hw_pagetable_destroy(struct iommufd_object *obj); void iommufd_hw_pagetable_abort(struct iommufd_object *obj); int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd); +int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, struct iommufd_hw_pagetable *hwpt) @@ -281,6 +282,14 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx, refcount_dec(&hwpt->obj.users); }
+static inline struct iommufd_hw_pagetable * +iommufd_get_hwpt(struct iommufd_ucmd *ucmd, u32 id) +{ + return container_of(iommufd_get_object(ucmd->ictx, id, + IOMMUFD_OBJ_HW_PAGETABLE), + struct iommufd_hw_pagetable, obj); +} + static inline bool iommufd_hw_pagetable_compare_ioas(struct iommufd_hw_pagetable *old_hwpt, struct iommufd_hw_pagetable *new_hwpt) diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c index e71523cbd0de..d9d82a413105 100644 --- a/drivers/iommu/iommufd/main.c +++ b/drivers/iommu/iommufd/main.c @@ -307,6 +307,7 @@ union ucmd_buffer { struct iommu_destroy destroy; struct iommu_hw_info info; struct iommu_hwpt_alloc hwpt; + struct iommu_hwpt_invalidate cache; struct iommu_ioas_alloc alloc; struct iommu_ioas_allow_iovas allow_iovas; struct iommu_ioas_copy ioas_copy; @@ -342,6 +343,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = { __reserved), IOCTL_OP(IOMMU_HWPT_ALLOC, iommufd_hwpt_alloc, struct iommu_hwpt_alloc, __reserved), + IOCTL_OP(IOMMU_HWPT_INVALIDATE, iommufd_hwpt_invalidate, + struct iommu_hwpt_invalidate, out_driver_error_code), IOCTL_OP(IOMMU_IOAS_ALLOC, iommufd_ioas_alloc_ioctl, struct iommu_ioas_alloc, out_ioas_id), IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas, diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h index c46b1f772f20..2083a0309a9b 100644 --- a/include/uapi/linux/iommufd.h +++ b/include/uapi/linux/iommufd.h @@ -47,6 +47,7 @@ enum { IOMMUFD_CMD_VFIO_IOAS, IOMMUFD_CMD_HWPT_ALLOC, IOMMUFD_CMD_GET_HW_INFO, + IOMMUFD_CMD_HWPT_INVALIDATE, };
/** @@ -478,4 +479,32 @@ struct iommu_hw_info { __u32 __reserved; }; #define IOMMU_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_GET_HW_INFO) + +/** + * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE) + * @size: sizeof(struct iommu_hwpt_invalidate) + * @hwpt_id: HWPT ID of target hardware page table for cache invalidation + * @reqs_uptr: User pointer to an array having @req_num of cache invalidation + * requests. The request entries in the array are of fixed width + * @req_len, and contain a user data structure for invalidation + * request specific to the given hardware page table. + * @req_len: Length (in bytes) of a request entry in the request array + * @req_num: Input the number of cache invalidation requests in the array. + * Output the number of requests successfully handled by kernel. + * @out_driver_error_code: Report a driver specific error code upon failure + * + * Invalidate the iommu cache for user-managed page table. Modifications on a + * user-managed page table should be followed by this operation to sync cache. + * Each ioctl can support one or more cache invalidation requests in the array + * that has a total size of @req_len * @req_num. + */ +struct iommu_hwpt_invalidate { + __u32 size; + __u32 hwpt_id; + __aligned_u64 reqs_uptr; + __u32 req_len; + __u32 req_num; + __u32 out_driver_error_code; +}; +#define IOMMU_HWPT_INVALIDATE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HWPT_INVALIDATE) #endif
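For illustration, a userspace caller might drive the new ioctl as in the sketch below. It is not part of the patch: it reuses struct iommu_hwpt_invalidate_selftest from the selftest patches later in this series and assumes fd and a nested hwpt_id are valid:

	struct iommu_hwpt_invalidate_selftest reqs[2] = {
		{ .iotlb_id = 0 },
		{ .iotlb_id = 1 },
	};
	struct iommu_hwpt_invalidate cmd = {
		.size = sizeof(cmd),
		.hwpt_id = hwpt_id,
		.reqs_uptr = (uintptr_t)reqs,
		.req_len = sizeof(reqs[0]),
		.req_num = sizeof(reqs) / sizeof(reqs[0]),	/* in: requests */
	};

	if (ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd)) {
		/*
		 * On failure, cmd.req_num reports how many requests were
		 * handled and cmd.out_driver_error_code carries the driver
		 * specific error code of the failing request.
		 */
		return -1;
	}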
From: Nicolin Chen nicolinc@nvidia.com
TEST_LENGTH passing ".size = sizeof(struct _struct) - 1" expects -EINVAL from the "if (ucmd.user_size < op->min_size)" check in iommufd_fops_ioctl(). This has been working as long as min_size is exactly the size of the structure.

However, if the size of the structure becomes larger than min_size, i.e. the size passed in above is larger than min_size, that min_size sanity check no longer triggers.

Since the first test in TEST_LENGTH() is meant to exercise that min_size sanity routine, rework it to calculate min_size, rather than using the full size of the structure.
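As a concrete illustration of the above (a sketch, using iommu_hwpt_alloc from this series as the example structure):

	/* min_size covers only the original, mandatory fields */
	size_t min_size = offsetofend(struct iommu_hwpt_alloc, __reserved);

	/*
	 * Once hwpt_type/data_len/data_uptr are appended after __reserved,
	 * sizeof(struct iommu_hwpt_alloc) - 1 is larger than min_size, so
	 * ".size = sizeof(...) - 1" no longer falls below op->min_size and
	 * the expected -EINVAL never fires.  Passing ".size = min_size - 1"
	 * keeps the sanity check exercised regardless of structure growth.
	 */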
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- tools/testing/selftests/iommu/iommufd.c | 29 ++++++++++++++----------- 1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 9c129e63d7c7..7a29d68bd1d2 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -86,12 +86,13 @@ TEST_F(iommufd, cmd_fail)
TEST_F(iommufd, cmd_length) { -#define TEST_LENGTH(_struct, _ioctl) \ +#define TEST_LENGTH(_struct, _ioctl, _last) \ { \ + size_t min_size = offsetofend(struct _struct, _last); \ struct { \ struct _struct cmd; \ uint8_t extra; \ - } cmd = { .cmd = { .size = sizeof(struct _struct) - 1 }, \ + } cmd = { .cmd = { .size = min_size - 1 }, \ .extra = UINT8_MAX }; \ int old_errno; \ int rc; \ @@ -112,17 +113,19 @@ TEST_F(iommufd, cmd_length) } \ }
- TEST_LENGTH(iommu_destroy, IOMMU_DESTROY); - TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO); - TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC); - TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC); - TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES); - TEST_LENGTH(iommu_ioas_allow_iovas, IOMMU_IOAS_ALLOW_IOVAS); - TEST_LENGTH(iommu_ioas_map, IOMMU_IOAS_MAP); - TEST_LENGTH(iommu_ioas_copy, IOMMU_IOAS_COPY); - TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP); - TEST_LENGTH(iommu_option, IOMMU_OPTION); - TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS); + TEST_LENGTH(iommu_destroy, IOMMU_DESTROY, id); + TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO, __reserved); + TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC, __reserved); + TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC, out_ioas_id); + TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES, + out_iova_alignment); + TEST_LENGTH(iommu_ioas_allow_iovas, IOMMU_IOAS_ALLOW_IOVAS, + allowed_iovas); + TEST_LENGTH(iommu_ioas_map, IOMMU_IOAS_MAP, iova); + TEST_LENGTH(iommu_ioas_copy, IOMMU_IOAS_COPY, src_iova); + TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP, length); + TEST_LENGTH(iommu_option, IOMMU_OPTION, val64); + TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved); #undef TEST_LENGTH }
From: Nicolin Chen nicolinc@nvidia.com
Add nested domain support in the ->domain_alloc_user op with proper sanity checks. Then, add a domain_nested_ops for all nested domains.

Also, add an iotlb as a testing property of a nested domain. A following patch will verify its value to check the success of a nested domain allocation and of a cache invalidation request.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/iommufd_test.h | 18 +++++ drivers/iommu/iommufd/selftest.c | 114 ++++++++++++++++++++++++--- 2 files changed, 123 insertions(+), 9 deletions(-)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 3f3644375bf1..7f997234a1a8 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -40,6 +40,11 @@ enum { MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES = 1 << 0, };
+enum { + MOCK_NESTED_DOMAIN_IOTLB_ID_MAX = 3, + MOCK_NESTED_DOMAIN_IOTLB_NUM = 4, +}; + struct iommu_test_cmd { __u32 size; __u32 op; @@ -109,4 +114,17 @@ struct iommu_test_hw_info { __u32 test_reg; };
+/* Should not be equal to any defined value in enum iommu_hwpt_type */ +#define IOMMU_HWPT_TYPE_SELFTEST 0xdead + +/** + * struct iommu_hwpt_selftest + * + * @iotlb: default mock iotlb value, IOMMU_TEST_IOTLB_DEFAULT + */ +struct iommu_hwpt_selftest { +#define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef + __u32 iotlb; +}; + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index 2205a552e570..bd967317927f 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -88,6 +88,17 @@ void iommufd_test_syz_conv_iova_id(struct iommufd_ucmd *ucmd, struct mock_iommu_domain { struct iommu_domain domain; struct xarray pfns; + bool nested : 1; + /* mock domain test data */ + union { + struct { /* nested */ + struct mock_iommu_domain *parent; + u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM]; + }; + struct { /* parent */ + enum iommu_hwpt_type hwpt_type; + }; + }; };
enum selftest_obj_type { @@ -147,8 +158,12 @@ static void *mock_domain_hw_info(struct device *dev, u32 *length, u32 *type) }
static const struct iommu_ops mock_ops; +static struct iommu_domain_ops domain_nested_ops;
-static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) +static struct iommu_domain * +__mock_domain_alloc_kernel(unsigned int iommu_domain_type, + struct mock_iommu_domain *dummy, + const struct iommu_hwpt_selftest *user_cfg) { struct mock_iommu_domain *mock;
@@ -156,11 +171,11 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) return &mock_blocking_domain;
if (iommu_domain_type != IOMMU_DOMAIN_UNMANAGED) - return NULL; + return ERR_PTR(-EINVAL);
mock = kzalloc(sizeof(*mock), GFP_KERNEL); if (!mock) - return NULL; + return ERR_PTR(-ENOMEM); mock->domain.geometry.aperture_start = MOCK_APERTURE_START; mock->domain.geometry.aperture_end = MOCK_APERTURE_LAST; mock->domain.pgsize_bitmap = MOCK_IO_PAGE_SIZE; @@ -170,18 +185,91 @@ static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) return &mock->domain; }
+static struct iommu_domain * +__mock_domain_alloc_nested(unsigned int iommu_domain_type, + struct mock_iommu_domain *mock_parent, + const struct iommu_hwpt_selftest *user_cfg) +{ + struct mock_iommu_domain *mock; + int i; + + if (iommu_domain_type != IOMMU_DOMAIN_NESTED) + return ERR_PTR(-EINVAL); + + if (!user_cfg) + return ERR_PTR(-EINVAL); + + mock = kzalloc(sizeof(*mock), GFP_KERNEL); + if (!mock) + return ERR_PTR(-ENOMEM); + mock->nested = true; + mock->parent = mock_parent; + mock->domain.type = iommu_domain_type; + mock->domain.ops = &domain_nested_ops; + for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++) + mock->iotlb[i] = user_cfg->iotlb; + return &mock->domain; +} + +static struct iommu_domain *mock_domain_alloc(unsigned int iommu_domain_type) +{ + struct iommu_domain *domain; + + if (iommu_domain_type != IOMMU_DOMAIN_BLOCKED && + iommu_domain_type != IOMMU_DOMAIN_UNMANAGED) + return NULL; + domain = __mock_domain_alloc_kernel(iommu_domain_type, NULL, NULL); + if (IS_ERR(domain)) + domain = NULL; + return domain; +} + static struct iommu_domain * mock_domain_alloc_user(struct device *dev, u32 flags, enum iommu_hwpt_type hwpt_type, struct iommu_domain *parent, const struct iommu_user_data *user_data) { - struct iommu_domain *domain; + struct iommu_domain *(*alloc_fn)(unsigned int iommu_domain_type, + struct mock_iommu_domain *mock_parent, + const struct iommu_hwpt_selftest *user_cfg); + unsigned int iommu_domain_type = IOMMU_DOMAIN_UNMANAGED; + struct iommu_hwpt_selftest data, *user_cfg = NULL; + struct mock_iommu_domain *mock_parent = NULL; + size_t min_len, data_len; + + switch (hwpt_type) { + case IOMMU_HWPT_TYPE_DEFAULT: + if (user_data || parent) + return ERR_PTR(-EINVAL); + min_len = data_len = 0; + alloc_fn = __mock_domain_alloc_kernel; + break; + default: + /* IOMMU_HWPT_TYPE_SELFTEST cannot be a case for a big value */ + if (hwpt_type != IOMMU_HWPT_TYPE_SELFTEST) + return ERR_PTR(-EINVAL); + if (!user_data || !parent || + parent->ops != mock_ops.default_domain_ops) + return ERR_PTR(-EINVAL); + iommu_domain_type = IOMMU_DOMAIN_NESTED; + mock_parent = container_of(parent, + struct mock_iommu_domain, domain); + min_len = offsetofend(struct iommu_hwpt_selftest, iotlb); + data_len = sizeof(struct iommu_hwpt_selftest); + alloc_fn = __mock_domain_alloc_nested; + break; + }
- domain = mock_domain_alloc(IOMMU_DOMAIN_UNMANAGED); - if (!domain) - domain = ERR_PTR(-ENOMEM); - return domain; + if (user_data) { + int rc = iommu_copy_user_data(&data, user_data, + data_len, min_len); + if (rc) + return ERR_PTR(rc); + user_cfg = &data; + } + + return alloc_fn(iommu_domain_type, mock_parent, user_cfg); }
static void mock_domain_free(struct iommu_domain *domain) @@ -340,6 +428,11 @@ static const struct iommu_ops mock_ops = { }, };
+static struct iommu_domain_ops domain_nested_ops = { + .free = mock_domain_free, + .attach_dev = mock_domain_nop_attach, +}; + static inline struct iommufd_hw_pagetable * get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, struct mock_iommu_domain **mock) @@ -352,7 +445,10 @@ get_md_pagetable(struct iommufd_ucmd *ucmd, u32 mockpt_id, if (IS_ERR(obj)) return ERR_CAST(obj); hwpt = container_of(obj, struct iommufd_hw_pagetable, obj); - if (hwpt->domain->ops != mock_ops.default_domain_ops) { + if ((hwpt->domain->type == IOMMU_DOMAIN_UNMANAGED && + hwpt->domain->ops != mock_ops.default_domain_ops) || + (hwpt->domain->type == IOMMU_DOMAIN_NESTED && + hwpt->domain->ops != &domain_nested_ops)) { iommufd_put_object(&hwpt->obj); return ERR_PTR(-EINVAL); }
From: Nicolin Chen nicolinc@nvidia.com
The IOMMU_HWPT_ALLOC ioctl now supports passing user_data to allocate a user-managed domain for nested HWPTs. Add selftest coverage for that. Also, update _test_cmd_hwpt_alloc() and add test_cmd/err_hwpt_alloc_nested().
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- tools/testing/selftests/iommu/iommufd.c | 115 ++++++++++++++++++ .../selftests/iommu/iommufd_fail_nth.c | 3 +- tools/testing/selftests/iommu/iommufd_utils.h | 31 +++-- 3 files changed, 141 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index 7a29d68bd1d2..db5e59e4abab 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -264,6 +264,121 @@ TEST_F(iommufd_ioas, ioas_destroy) } }
+TEST_F(iommufd_ioas, alloc_hwpt_nested) +{ + const uint32_t min_data_len = + offsetofend(struct iommu_hwpt_selftest, iotlb); + struct iommu_hwpt_selftest data = { + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, + }; + uint32_t nested_hwpt_id[2] = {}; + uint32_t parent_hwpt_id = 0; + uint32_t parent_hwpt_id_not_work = 0; + uint32_t test_hwpt_id = 0; + + if (self->device_id) { + /* Negative tests */ + test_err_hwpt_alloc(ENOENT, self->ioas_id, self->device_id, + 0, &test_hwpt_id); + test_err_hwpt_alloc(EINVAL, self->device_id, + self->device_id, 0, &test_hwpt_id); + + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + &parent_hwpt_id); + + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, + 0, &parent_hwpt_id_not_work); + + /* Negative nested tests */ + test_err_hwpt_alloc_nested(EINVAL, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_TYPE_DEFAULT, + &data, sizeof(data)); + test_err_hwpt_alloc_nested(EINVAL, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_TYPE_SELFTEST, + &data, min_data_len - 1); + test_err_hwpt_alloc_nested(EFAULT, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_TYPE_SELFTEST, + NULL, sizeof(data)); + test_err_hwpt_alloc_nested(EINVAL, + self->device_id, parent_hwpt_id, + IOMMU_HWPT_ALLOC_NEST_PARENT, + &nested_hwpt_id[0], + IOMMU_HWPT_TYPE_SELFTEST, + &data, sizeof(data)); + test_err_hwpt_alloc_nested(EINVAL, self->device_id, + parent_hwpt_id_not_work, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_TYPE_SELFTEST, + &data, sizeof(data)); + + /* Allocate two nested hwpts sharing one common parent hwpt */ + test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_TYPE_SELFTEST, + &data, sizeof(data)); + test_cmd_hwpt_alloc_nested(self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[1], + IOMMU_HWPT_TYPE_SELFTEST, + &data, sizeof(data)); + + /* Negative test: a nested hwpt on top of a nested hwpt */ + test_err_hwpt_alloc_nested(EINVAL, + self->device_id, nested_hwpt_id[0], + 0, &test_hwpt_id, + IOMMU_HWPT_TYPE_SELFTEST, + &data, sizeof(data)); + /* Negative test: parent hwpt now cannot be freed */ + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, parent_hwpt_id)); + + /* Attach device to nested_hwpt_id[0] that then will be busy */ + test_cmd_mock_domain_replace(self->stdev_id, + nested_hwpt_id[0]); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, nested_hwpt_id[0])); + + /* Switch from nested_hwpt_id[0] to nested_hwpt_id[1] */ + test_cmd_mock_domain_replace(self->stdev_id, + nested_hwpt_id[1]); + EXPECT_ERRNO(EBUSY, + _test_ioctl_destroy(self->fd, nested_hwpt_id[1])); + test_ioctl_destroy(nested_hwpt_id[0]); + + /* Detach from nested_hwpt_id[1] and destroy it */ + test_cmd_mock_domain_replace(self->stdev_id, parent_hwpt_id); + test_ioctl_destroy(nested_hwpt_id[1]); + + /* Detach from the parent hw_pagetable and destroy it */ + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); + test_ioctl_destroy(parent_hwpt_id); + test_ioctl_destroy(parent_hwpt_id_not_work); + } else { + test_err_hwpt_alloc(ENOENT, self->device_id, self->ioas_id, + 0, &parent_hwpt_id); + test_err_hwpt_alloc_nested(ENOENT, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[0], + IOMMU_HWPT_TYPE_SELFTEST, + &data, sizeof(data)); + test_err_hwpt_alloc_nested(ENOENT, + self->device_id, parent_hwpt_id, + 0, &nested_hwpt_id[1], + IOMMU_HWPT_TYPE_SELFTEST, + &data, sizeof(data)); + test_err_mock_domain_replace(ENOENT, + self->stdev_id, nested_hwpt_id[0]); + 
test_err_mock_domain_replace(ENOENT, + self->stdev_id, nested_hwpt_id[1]); + } +} + TEST_F(iommufd_ioas, hwpt_attach) { /* Create a device attached directly to a hwpt */ diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c index 3d7838506bfe..d3f47f262c04 100644 --- a/tools/testing/selftests/iommu/iommufd_fail_nth.c +++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c @@ -615,7 +615,8 @@ TEST_FAIL_NTH(basic_fail_nth, device) if (_test_cmd_get_hw_info(self->fd, idev_id, &info, sizeof(info))) return -1;
- if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id)) + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, &hwpt_id, + IOMMU_HWPT_TYPE_DEFAULT, 0, 0)) return -1;
if (_test_cmd_mock_domain_replace(self->fd, stdev_id, ioas_id2, NULL)) diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index be4970a84977..21d7c7e53bd4 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -103,13 +103,17 @@ static int _test_cmd_mock_domain_replace(int fd, __u32 stdev_id, __u32 pt_id, pt_id, NULL))
static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, - __u32 flags, __u32 *hwpt_id) + __u32 flags, __u32 *hwpt_id, __u32 hwpt_type, + void *data, size_t data_len) { struct iommu_hwpt_alloc cmd = { .size = sizeof(cmd), .flags = flags, .dev_id = device_id, .pt_id = pt_id, + .hwpt_type = hwpt_type, + .data_len = data_len, + .data_uptr = (uint64_t)data, }; int ret;
@@ -121,12 +125,25 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, return 0; }
-#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \ - ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, \ - pt_id, flags, hwpt_id)) -#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \ - EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, \ - pt_id, flags, hwpt_id)) +#define test_cmd_hwpt_alloc(device_id, pt_id, flags, hwpt_id) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + hwpt_id, IOMMU_HWPT_TYPE_DEFAULT, \ + NULL, 0)) +#define test_err_hwpt_alloc(_errno, device_id, pt_id, flags, hwpt_id) \ + EXPECT_ERRNO(_errno, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, \ + flags, hwpt_id, \ + IOMMU_HWPT_TYPE_DEFAULT, \ + NULL, 0)) + +#define test_cmd_hwpt_alloc_nested(device_id, pt_id, flags, hwpt_id, \ + hwpt_type, data, data_len) \ + ASSERT_EQ(0, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + hwpt_id, hwpt_type, data, data_len)) +#define test_err_hwpt_alloc_nested(_errno, device_id, pt_id, flags, hwpt_id, \ + hwpt_type, data, data_len) \ + EXPECT_ERRNO(_errno, \ + _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ + hwpt_id, hwpt_type, data, data_len))
static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id)
From: Nicolin Chen nicolinc@nvidia.com
Add mock_domain_cache_invalidate_user() and its data structure, so the userspace selftest program can cover the user cache invalidation pathway.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/iommufd_test.h | 17 +++++++++++ drivers/iommu/iommufd/selftest.c | 44 ++++++++++++++++++++++++++++ 2 files changed, 61 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 7f997234a1a8..748349dd5bf4 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -127,4 +127,21 @@ struct iommu_hwpt_selftest { __u32 iotlb; };
+/** + * struct iommu_hwpt_invalidate_selftest + * + * @flags: invalidate flags + * @iotlb_id: invalidate iotlb entry index + * + * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @iotlb_id will be ignored + */ +struct iommu_hwpt_invalidate_selftest { +#define IOMMU_TEST_INVALIDATE_ALL (1ULL << 0) + __u32 flags; + __u32 iotlb_id; +}; + +#define IOMMU_TEST_INVALIDATE_ERR_FETCH 0xdeadbeee +#define IOMMU_TEST_INVALIDATE_ERR_REQ 0xdeadbeef + #endif diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index bd967317927f..ebb2abc40c4a 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -428,9 +428,53 @@ static const struct iommu_ops mock_ops = { }, };
+static int +mock_domain_cache_invalidate_user(struct iommu_domain *domain, + struct iommu_user_data_array *array, + u32 *error_code) +{ + const size_t min_len = + offsetofend(struct iommu_hwpt_invalidate_selftest, iotlb_id); + struct mock_iommu_domain *mock = + container_of(domain, struct mock_iommu_domain, domain); + struct iommu_hwpt_invalidate_selftest inv; + int rc = 0; + int i, j; + + if (domain->type != IOMMU_DOMAIN_NESTED || !mock->nested) + return -EINVAL; + + for (i = 0; i < array->entry_num; i++) { + rc = iommu_copy_user_data_from_array(&inv, array, + i, sizeof(inv), min_len); + if (rc) { + *error_code = IOMMU_TEST_INVALIDATE_ERR_FETCH; + goto err; + } + /* Invalidate all mock iotlb entries and ignore iotlb_id */ + if (inv.flags & IOMMU_TEST_INVALIDATE_ALL) { + for (j = 0; j < MOCK_NESTED_DOMAIN_IOTLB_NUM; j++) + mock->iotlb[j] = 0; + continue; + } + /* Treat out-of-bounds iotlb_id as a request error */ + if (inv.iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX) { + *error_code = IOMMU_TEST_INVALIDATE_ERR_REQ; + rc = -EINVAL; + goto err; + } + mock->iotlb[inv.iotlb_id] = 0; + } + +err: + array->entry_num = i; + return rc; +} + static struct iommu_domain_ops domain_nested_ops = { .free = mock_domain_free, .attach_dev = mock_domain_nop_attach, + .cache_invalidate_user = mock_domain_cache_invalidate_user, };
static inline struct iommufd_hw_pagetable *
From: Nicolin Chen nicolinc@nvidia.com
This allows testing whether the IOTLB has been invalidated or not.
Signed-off-by: Nicolin Chen nicolinc@nvidia.com Signed-off-by: Yi Liu yi.l.liu@intel.com --- drivers/iommu/iommufd/iommufd_test.h | 5 ++++ drivers/iommu/iommufd/selftest.c | 25 +++++++++++++++++++ tools/testing/selftests/iommu/iommufd.c | 4 +++ tools/testing/selftests/iommu/iommufd_utils.h | 24 ++++++++++++++++++ 4 files changed, 58 insertions(+)
diff --git a/drivers/iommu/iommufd/iommufd_test.h b/drivers/iommu/iommufd/iommufd_test.h index 748349dd5bf4..f6c8e271e71d 100644 --- a/drivers/iommu/iommufd/iommufd_test.h +++ b/drivers/iommu/iommufd/iommufd_test.h @@ -19,6 +19,7 @@ enum { IOMMU_TEST_OP_SET_TEMP_MEMORY_LIMIT, IOMMU_TEST_OP_MOCK_DOMAIN_REPLACE, IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, + IOMMU_TEST_OP_MD_CHECK_IOTLB, };
enum { @@ -100,6 +101,10 @@ struct iommu_test_cmd { struct { __u32 ioas_id; } access_replace_ioas; + struct { + __u32 id; + __u32 iotlb; + } check_iotlb; }; __u32 last; }; diff --git a/drivers/iommu/iommufd/selftest.c b/drivers/iommu/iommufd/selftest.c index ebb2abc40c4a..92f3e3f5eeb5 100644 --- a/drivers/iommu/iommufd/selftest.c +++ b/drivers/iommu/iommufd/selftest.c @@ -759,6 +759,27 @@ static int iommufd_test_md_check_refs(struct iommufd_ucmd *ucmd, return 0; }
+static int iommufd_test_md_check_iotlb(struct iommufd_ucmd *ucmd, + u32 mockpt_id, unsigned int iotlb_id, + u32 iotlb) +{ + struct iommufd_hw_pagetable *hwpt; + struct mock_iommu_domain *mock; + int rc = 0; + + hwpt = get_md_pagetable(ucmd, mockpt_id, &mock); + if (IS_ERR(hwpt)) + return PTR_ERR(hwpt); + + mock = container_of(hwpt->domain, struct mock_iommu_domain, domain); + + if (!mock->nested || iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX || + mock->iotlb[iotlb_id] != iotlb) + rc = -EINVAL; + iommufd_put_object(&hwpt->obj); + return rc; +} + struct selftest_access { struct iommufd_access *access; struct file *file; @@ -1172,6 +1193,10 @@ int iommufd_test(struct iommufd_ucmd *ucmd) return iommufd_test_md_check_refs( ucmd, u64_to_user_ptr(cmd->check_refs.uptr), cmd->check_refs.length, cmd->check_refs.refs); + case IOMMU_TEST_OP_MD_CHECK_IOTLB: + return iommufd_test_md_check_iotlb(ucmd, cmd->id, + cmd->check_iotlb.id, + cmd->check_iotlb.iotlb); case IOMMU_TEST_OP_CREATE_ACCESS: return iommufd_test_create_access(ucmd, cmd->id, cmd->create_access.flags); diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index db5e59e4abab..c59299248a98 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -327,6 +327,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) 0, &nested_hwpt_id[1], IOMMU_HWPT_TYPE_SELFTEST, &data, sizeof(data)); + test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0], + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1], + IOMMU_TEST_IOTLB_DEFAULT);
/* Negative test: a nested hwpt on top of a nested hwpt */ test_err_hwpt_alloc_nested(EINVAL, diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 21d7c7e53bd4..2bc41e9a69e8 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -145,6 +145,30 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, _test_cmd_hwpt_alloc(self->fd, device_id, pt_id, flags, \ hwpt_id, hwpt_type, data, data_len))
+#define test_cmd_hwpt_check_iotlb(hwpt_id, iotlb_id, expected) \ + ({ \ + struct iommu_test_cmd test_cmd = { \ + .size = sizeof(test_cmd), \ + .op = IOMMU_TEST_OP_MD_CHECK_IOTLB, \ + .id = hwpt_id, \ + .check_iotlb = { \ + .id = iotlb_id, \ + .iotlb = expected, \ + }, \ + }; \ + ASSERT_EQ(0, \ + ioctl(self->fd, \ + _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_IOTLB), \ + &test_cmd)); \ + }) + +#define test_cmd_hwpt_check_iotlb_all(hwpt_id, expected) \ + ({ \ + int i; \ + for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++) \ + test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \ + }) + static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id) {
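For reference, the check macros above boil down to a single _IOMMU_TEST_CMD ioctl; a minimal open-coded equivalent, assuming the selftest headers are available, might look like this (a sketch, not part of the patch):

/*
 * Hypothetical standalone use of IOMMU_TEST_OP_MD_CHECK_IOTLB: ask the
 * mock driver whether IOTLB slot iotlb_id of hwpt_id still holds
 * `expected`. Returns 0 on match; fails with EINVAL otherwise.
 */
static int check_iotlb_slot(int fd, __u32 hwpt_id, __u32 iotlb_id,
			    __u32 expected)
{
	struct iommu_test_cmd cmd = {
		.size = sizeof(cmd),
		.op = IOMMU_TEST_OP_MD_CHECK_IOTLB,
		.id = hwpt_id,
		.check_iotlb = {
			.id = iotlb_id,
			.iotlb = expected,
		},
	};

	return ioctl(fd, _IOMMU_TEST_CMD(IOMMU_TEST_OP_MD_CHECK_IOTLB),
		     &cmd);
}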
From: Nicolin Chen <nicolinc@nvidia.com>
Add test cases for the IOMMU_HWPT_INVALIDATE ioctl and verify the invalidation results with the new IOMMU_TEST_OP_MD_CHECK_IOTLB.
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 tools/testing/selftests/iommu/iommufd.c       | 61 +++++++++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h | 36 +++++++++++
 2 files changed, 97 insertions(+)
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c index c59299248a98..617e11153761 100644 --- a/tools/testing/selftests/iommu/iommufd.c +++ b/tools/testing/selftests/iommu/iommufd.c @@ -116,6 +116,7 @@ TEST_F(iommufd, cmd_length) TEST_LENGTH(iommu_destroy, IOMMU_DESTROY, id); TEST_LENGTH(iommu_hw_info, IOMMU_GET_HW_INFO, __reserved); TEST_LENGTH(iommu_hwpt_alloc, IOMMU_HWPT_ALLOC, __reserved); + TEST_LENGTH(iommu_hwpt_invalidate, IOMMU_HWPT_INVALIDATE, out_driver_error_code); TEST_LENGTH(iommu_ioas_alloc, IOMMU_IOAS_ALLOC, out_ioas_id); TEST_LENGTH(iommu_ioas_iova_ranges, IOMMU_IOAS_IOVA_RANGES, out_iova_alignment); @@ -271,7 +272,9 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) struct iommu_hwpt_selftest data = { .iotlb = IOMMU_TEST_IOTLB_DEFAULT, }; + struct iommu_hwpt_invalidate_selftest inv_reqs[2] = {0}; uint32_t nested_hwpt_id[2] = {}; + uint32_t num_inv, driver_error; uint32_t parent_hwpt_id = 0; uint32_t parent_hwpt_id_not_work = 0; uint32_t test_hwpt_id = 0; @@ -342,6 +345,64 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested) EXPECT_ERRNO(EBUSY, _test_ioctl_destroy(self->fd, parent_hwpt_id));
+ /* hwpt_invalidate only supports a user-managed hwpt (nested) */ + num_inv = 1; + test_err_hwpt_invalidate(EINVAL, parent_hwpt_id, inv_reqs, + sizeof(*inv_reqs), &num_inv, NULL); + /* Negative test: structure size sanity */ + num_inv = 1; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + sizeof(*inv_reqs) + 1, &num_inv, + &driver_error); + assert(driver_error == IOMMU_TEST_INVALIDATE_ERR_FETCH); + + num_inv = 1; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + 1, &num_inv, &driver_error); + assert(driver_error == IOMMU_TEST_INVALIDATE_ERR_FETCH); + + /* Invalidate the 1st iotlb entry but fail the 2nd request */ + num_inv = 2; + inv_reqs[0].iotlb_id = 0; + inv_reqs[1].iotlb_id = MOCK_NESTED_DOMAIN_IOTLB_ID_MAX + 1; + test_err_hwpt_invalidate(EINVAL, nested_hwpt_id[0], inv_reqs, + sizeof(*inv_reqs), &num_inv, + &driver_error); + assert(num_inv == 1); + assert(driver_error == IOMMU_TEST_INVALIDATE_ERR_REQ); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0, 0); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 1, + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 2, + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 3, + IOMMU_TEST_IOTLB_DEFAULT); + + /* Invalidate the 2nd iotlb entry and verify */ + num_inv = 1; + inv_reqs[0].iotlb_id = 1; + test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs, + sizeof(*inv_reqs), &num_inv); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 0, 0); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 1, 0); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 2, + IOMMU_TEST_IOTLB_DEFAULT); + test_cmd_hwpt_check_iotlb(nested_hwpt_id[0], 3, + IOMMU_TEST_IOTLB_DEFAULT); + /* Invalidate the 3rd and 4th iotlb entries and verify */ + num_inv = 2; + inv_reqs[0].iotlb_id = 2; + inv_reqs[1].iotlb_id = 3; + test_cmd_hwpt_invalidate(nested_hwpt_id[0], inv_reqs, + sizeof(*inv_reqs), &num_inv); + test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[0], 0); + /* Invalidate all iotlb entries for nested_hwpt_id[1] and verify */ + num_inv = 1; + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_ALL; + test_cmd_hwpt_invalidate(nested_hwpt_id[1], inv_reqs, + sizeof(*inv_reqs), &num_inv); + test_cmd_hwpt_check_iotlb_all(nested_hwpt_id[1], 0); + /* Attach device to nested_hwpt_id[0] that then will be busy */ test_cmd_mock_domain_replace(self->stdev_id, nested_hwpt_id[0]); diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h index 2bc41e9a69e8..55906c5da1a9 100644 --- a/tools/testing/selftests/iommu/iommufd_utils.h +++ b/tools/testing/selftests/iommu/iommufd_utils.h @@ -169,6 +169,42 @@ static int _test_cmd_hwpt_alloc(int fd, __u32 device_id, __u32 pt_id, test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \ })
+static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, + uint32_t lreq, uint32_t *nreqs, + uint32_t *driver_error) +{ + struct iommu_hwpt_invalidate cmd = { + .size = sizeof(cmd), + .hwpt_id = hwpt_id, + .reqs_uptr = (uint64_t)reqs, + .req_len = lreq, + .req_num = *nreqs, + }; + int rc = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd); + *nreqs = cmd.req_num; + if (driver_error) + *driver_error = cmd.out_driver_error_code; + return rc; +} + +#define test_cmd_hwpt_invalidate(hwpt_id, reqs, lreq, nreqs) \ + ({ \ + uint32_t error, num = *nreqs; \ + ASSERT_EQ(0, \ + _test_cmd_hwpt_invalidate(self->fd, hwpt_id, reqs, \ + lreq, nreqs, &error)); \ + assert(num == *nreqs); \ + assert(error == 0); \ + }) +#define test_err_hwpt_invalidate(_errno, hwpt_id, reqs, lreq, nreqs, \ + driver_error) \ + ({ \ + EXPECT_ERRNO(_errno, \ + _test_cmd_hwpt_invalidate(self->fd, hwpt_id, \ + reqs, lreq, nreqs, \ + driver_error)); \ + }) + static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, unsigned int ioas_id) {
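Condensed from the test above, the expected usage pattern pairs one invalidation with a readback check; a sketch, assuming the fixture has already allocated nested_hwpt_id:

/* Sketch: invalidate IOTLB slot 0 of a nested hwpt, then verify. */
struct iommu_hwpt_invalidate_selftest req = { .iotlb_id = 0 };
uint32_t num_inv = 1;

test_cmd_hwpt_invalidate(nested_hwpt_id, &req, sizeof(req), &num_inv);

/* Slot 0 is now cleared ... */
test_cmd_hwpt_check_iotlb(nested_hwpt_id, 0, 0);
/* ... while slot 1 still holds the value set at allocation time */
test_cmd_hwpt_check_iotlb(nested_hwpt_id, 1, IOMMU_TEST_IOTLB_DEFAULT);

Note the num_inv spelling: the test_cmd_hwpt_invalidate macro declares its own local `num`, so a caller-side variable named `num` would be shadowed inside the macro body.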
On Thu, Sep 21, 2023 at 12:51:21AM -0700, Yi Liu wrote:
> v4:
> [... v4 change log snipped ...]
This doesn't apply to iommufd next so you will have to resend it
Jason
On 2023/10/11 00:53, Jason Gunthorpe wrote:
> On Thu, Sep 21, 2023 at 12:51:21AM -0700, Yi Liu wrote:
>> v4:
>> [... v4 change log snipped ...]
>
> This doesn't apply to iommufd next so you will have to resend it
Sure, I'm working with Nic on a new version.