Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was introduced in iommu_dma_init_domain() to fall back if not supported, but this check runs too late: by that point, devices have been attached to the IOMMU, and the IOMMU driver might not expect FQ domains at ops->attach_dev() time.
Ensure that we immediately clamp FQ domains to plain DMA if not supported by the driver at device attach time, not later.
This regressed apple-dart in v6.5.
Cc: regressions@lists.linux.dev
Cc: stable@vger.kernel.org
Fixes: a4fdd9762272 ("iommu: Use flush queue capability")
Signed-off-by: Hector Martin <marcan@marcan.st>
---
 drivers/iommu/iommu.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3bfc56df4f78..12464eaa8d91 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2039,6 +2039,15 @@ static int __iommu_attach_device(struct iommu_domain *domain,
 	if (unlikely(domain->ops->attach_dev == NULL))
 		return -ENODEV;
 
+	/*
+	 * Ensure we do not try to attach devices to FQ domains if the
+	 * IOMMU does not support them. We can safely fall back to
+	 * non-FQ.
+	 */
+	if (domain->type == IOMMU_DOMAIN_DMA_FQ &&
+	    !device_iommu_capable(dev, IOMMU_CAP_DEFERRED_FLUSH))
+		domain->type = IOMMU_DOMAIN_DMA;
+
 	ret = domain->ops->attach_dev(domain, dev);
 	if (ret)
 		return ret;

---
base-commit: ce9ecca0238b140b88f43859b211c9fdfd8e5b70
change-id: 20230922-iommu-type-regression-25b4f43df770
Best regards,
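For readers following along outside the kernel tree, the ordering problem the patch addresses can be modelled in userspace. This is only a sketch under stated assumptions: every `model_*` and `MODEL_*` name is hypothetical, the enum values are not the kernel's, `deferred_flush` merely stands in for `IOMMU_CAP_DEFERRED_FLUSH`, and the `-EINVAL` return is the model's own choice. It shows a core attach path clamping an FQ domain to plain DMA before calling a driver callback that only enumerates the types it knows.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's domain types; not the real values. */
enum model_domain_type {
	MODEL_DOMAIN_UNMANAGED,
	MODEL_DOMAIN_DMA,
	MODEL_DOMAIN_DMA_FQ,
};

struct model_domain {
	enum model_domain_type type;
};

struct model_device {
	bool deferred_flush;	/* models IOMMU_CAP_DEFERRED_FLUSH */
};

/* A strict driver callback: rejects any type it does not enumerate. */
int model_strict_attach(const struct model_domain *dom)
{
	switch (dom->type) {
	case MODEL_DOMAIN_DMA:
	case MODEL_DOMAIN_UNMANAGED:
		return 0;
	default:
		return -EINVAL;
	}
}

/* Core attach path: clamp FQ to plain DMA *before* the driver sees it. */
int model_attach_device(struct model_domain *dom,
			const struct model_device *dev)
{
	assert(dom != NULL && dev != NULL);

	if (dom->type == MODEL_DOMAIN_DMA_FQ && !dev->deferred_flush)
		dom->type = MODEL_DOMAIN_DMA;

	return model_strict_attach(dom);
}
```

In this model, attaching an FQ domain to a device without the capability succeeds because the type is rewritten first; handing the same FQ domain directly to the strict callback fails, which mirrors the failure mode described in the commit message.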
On 22/09/2023 2:40 pm, Hector Martin wrote:
> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was introduced in iommu_dma_init_domain() to fall back if not supported, but this check runs too late: by that point, devices have been attached to the IOMMU, and the IOMMU driver might not expect FQ domains at ops->attach_dev() time.
> Ensure that we immediately clamp FQ domains to plain DMA if not supported by the driver at device attach time, not later.
> This regressed apple-dart in v6.5.
Apologies, I missed that apple-dart was doing something unusual here. However, could we just fix that directly instead?
diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
index 2082081402d3..0b8927508427 100644
--- a/drivers/iommu/apple-dart.c
+++ b/drivers/iommu/apple-dart.c
@@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
 		return ret;
 
 	switch (domain->type) {
-	case IOMMU_DOMAIN_DMA:
-	case IOMMU_DOMAIN_UNMANAGED:
+	default:
 		ret = apple_dart_domain_add_streams(dart_domain, cfg);
 		if (ret)
 			return ret;
That's pretty much where we're headed with the domain_alloc_paging redesign anyway - at the driver level, operations on a paging domain should not need to know about the higher-level usage intent of that domain. Ideally, blocking and identity domains should have their own distinct ops now as well, but that might be a bit too big a change for an immediate fix here.
Thanks, Robin.
On 22/09/2023 23.21, Robin Murphy wrote:
> On 22/09/2023 2:40 pm, Hector Martin wrote:
>> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was introduced in iommu_dma_init_domain() to fall back if not supported, but this check runs too late: by that point, devices have been attached to the IOMMU, and the IOMMU driver might not expect FQ domains at ops->attach_dev() time.
>> Ensure that we immediately clamp FQ domains to plain DMA if not supported by the driver at device attach time, not later.
>> This regressed apple-dart in v6.5.
> Apologies, I missed that apple-dart was doing something unusual here. However, could we just fix that directly instead?
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index 2082081402d3..0b8927508427 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
>  		return ret;
>  
>  	switch (domain->type) {
> -	case IOMMU_DOMAIN_DMA:
> -	case IOMMU_DOMAIN_UNMANAGED:
> +	default:
>  		ret = apple_dart_domain_add_streams(dart_domain, cfg);
>  		if (ret)
>  			return ret;
> That's pretty much where we're headed with the domain_alloc_paging redesign anyway - at the driver level, operations on a paging domain should not need to know about the higher-level usage intent of that domain. Ideally, blocking and identity domains should have their own distinct ops now as well, but that might be a bit too big a change for an immediate fix here.
Sure, but it sounded like if there's a capability for this the core should probably use it and not expose the type at all to drivers that can't support it :)
If you think defaulting to that branch in DART is correctly future-proof I can make that change. It's not the only driver checking the domain type in attach_dev(), but it might be the only one enumerating all the options instead of checking for specific cases only (e.g. intel checks for IOMMU_DOMAIN_IDENTITY).
- Hector
On Fri, Sep 22, 2023 at 03:21:17PM +0100, Robin Murphy wrote:
> On 22/09/2023 2:40 pm, Hector Martin wrote:
>> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was introduced in iommu_dma_init_domain() to fall back if not supported, but this check runs too late: by that point, devices have been attached to the IOMMU, and the IOMMU driver might not expect FQ domains at ops->attach_dev() time.
>> Ensure that we immediately clamp FQ domains to plain DMA if not supported by the driver at device attach time, not later.
>> This regressed apple-dart in v6.5.
> Apologies, I missed that apple-dart was doing something unusual here. However, could we just fix that directly instead?
> diff --git a/drivers/iommu/apple-dart.c b/drivers/iommu/apple-dart.c
> index 2082081402d3..0b8927508427 100644
> --- a/drivers/iommu/apple-dart.c
> +++ b/drivers/iommu/apple-dart.c
> @@ -671,8 +671,7 @@ static int apple_dart_attach_dev(struct iommu_domain *domain,
>  		return ret;
>  
>  	switch (domain->type) {
> -	case IOMMU_DOMAIN_DMA:
> -	case IOMMU_DOMAIN_UNMANAGED:
> +	default:
>  		ret = apple_dart_domain_add_streams(dart_domain, cfg);
>  		if (ret)
>  			return ret;
Yes, I much prefer this to the original patch please. Drivers should not be testing DMA_FQ at all.
I already wrote a series to convert DART to domain_alloc_paging() that fixes this inadvertently.
Robin's suggestion is good for a temporary -rc fix.
Removing the switch is slightly more robust:
	if (domain->type & __IOMMU_DOMAIN_PAGING) {
		[..]
		return 0;
	}
	if (domain->type == IOMMU_DOMAIN_BLOCKED) {
		..
	}
	return -EOPNOTSUPP;
But not so worthwhile since I deleted all this anyhow...
I'll send out the dart series, it can't go to -rc, so a patch is still needed.
Thanks, Jason
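The bit-test variant sketched in this mail can be tried out in a small userspace model. The flag layout below is an assumption modelled on the kernel's include/linux/iommu.h, where the paging-capable domain types share a common paging bit; all `MODEL_*` names and the `model_attach()` helper are hypothetical, and the two positive return codes only label which branch was taken. The model follows just the three branches in the sketch above, so any other type (identity included) lands on `-EOPNOTSUPP` here, which need not match what a real driver does.

```c
#include <errno.h>

/* Assumed flag layout, modelled on include/linux/iommu.h. */
#define MODEL_PAGING	(1U << 0)
#define MODEL_DMA_API	(1U << 1)
#define MODEL_PT	(1U << 2)
#define MODEL_DMA_FQ	(1U << 3)

#define MODEL_DOMAIN_BLOCKED	(0U)
#define MODEL_DOMAIN_IDENTITY	(MODEL_PT)
#define MODEL_DOMAIN_UNMANAGED	(MODEL_PAGING)
#define MODEL_DOMAIN_DMA	(MODEL_PAGING | MODEL_DMA_API)
#define MODEL_DOMAIN_DMA_FQ	(MODEL_PAGING | MODEL_DMA_API | MODEL_DMA_FQ)

/* Labels for the branch taken; purely illustrative. */
enum model_action {
	MODEL_ADD_STREAMS = 1,
	MODEL_BLOCK = 2,
};

/* Dispatch on the paging bit rather than enumerating every domain type. */
int model_attach(unsigned int type)
{
	if (type & MODEL_PAGING)
		return MODEL_ADD_STREAMS;

	if (type == MODEL_DOMAIN_BLOCKED)
		return MODEL_BLOCK;

	return -EOPNOTSUPP;
}
```

Because DMA, DMA_FQ and UNMANAGED all carry the paging bit in this layout, a newly added paging-style type would take the first branch without the driver needing to know about it.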
[TLDR: I'm adding this report to the list of tracked Linux kernel regressions; the text you find below is based on a few template paragraphs you might have encountered already in similar form. See link in footer if these mails annoy you.]
On 22.09.23 15:40, Hector Martin wrote:
> Commit a4fdd9762272 ("iommu: Use flush queue capability") hid the IOMMU_DOMAIN_DMA_FQ domain type from domain allocation. A check was introduced in iommu_dma_init_domain() to fall back if not supported, but this check runs too late: by that point, devices have been attached to the IOMMU, and the IOMMU driver might not expect FQ domains at ops->attach_dev() time.
> Ensure that we immediately clamp FQ domains to plain DMA if not supported by the driver at device attach time, not later.
> This regressed apple-dart in v6.5.
> [...]
Thanks for the report. To be sure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced a4fdd9762272
#regzbot title iommu: apple-dart regressed
#regzbot monitor: https://lore.kernel.org/all/20230922-iommu-type-regression-v2-1-689b2ba9b673...
#regzbot fix: iommu/apple-dart: Handle DMA_FQ domains in attach_dev()
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it are already discussed somewhere else? It was fixed already? You want to clarify when the regression started to happen? Or point out I got the title or something else totally wrong? Then just reply and tell me -- ideally while also telling regzbot about it, as explained by the page listed in the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing to the report (the parent of this mail). See page linked in footer for details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
linux-stable-mirror@lists.linaro.org