On 2024-11-19 7:10 pm, Pratyush Brahma wrote:
On 11/7/2024 8:31 PM, Robin Murphy wrote:
On 29/10/2024 4:15 pm, Will Deacon wrote:
On Fri, 04 Oct 2024 14:34:28 +0530, Pratyush Brahma wrote:
Null pointer dereference occurs due to a race between smmu driver probe and client driver probe, when of_dma_configure() for client is called after the iommu_device_register() for smmu driver probe has executed but before the driver_bound() for smmu driver has been called.
Following is how the race occurs:
[...]
Applied to will (for-joerg/arm-smmu/updates), thanks!
[1/1] iommu/arm-smmu: Defer probe of clients after smmu device bound https://git.kernel.org/will/c/229e6ee43d2a
I've finally got to the point of proving to myself that this isn't the right fix, since once we do get __iommu_probe_device() working properly in the correct order, iommu_device_register() then runs into the same condition itself. Diff below should make this issue go away - I'll write up proper patches once I've tested it a little more.
Thanks, Robin.
----->8----- diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/ iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 737c5b882355..b7dcb1494aa4 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -3171,8 +3171,8 @@ static struct platform_driver arm_smmu_driver; static struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode) { - struct device *dev = driver_find_device_by_fwnode(&arm_smmu_driver.driver, - fwnode); + struct device *dev = bus_find_device_by_fwnode(&platform_bus_type, fwnode); + put_device(dev); return dev ? dev_get_drvdata(dev) : NULL; } diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/ arm/arm-smmu/arm-smmu.c index 8321962b3714..aba315aa6848 100644 --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c @@ -1411,8 +1411,8 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap) static struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode) { - struct device *dev = driver_find_device_by_fwnode(&arm_smmu_driver.driver, - fwnode); + struct device *dev = bus_find_device_by_fwnode(&platform_bus_type, fwnode);
I think it would still follow this path:
bus_find_device_by_fwnode() -> bus_find_device() -> next_device()
next_device() would always return null until the driver is bound to the device which
No, this is traversing the bus list, *not* the driver list, that's the whole point. The SMMU device must exist on the platform bus before the driver can bind, since the bus is responsible for matching the driver in the first place.
happens much later in really_probe() after the iommu_device_register() would be called even as per this patch. That way the race would still occur, wouldn't it? Can you please help me understand what I may be missing here? Are you saying that these additional patches are required along with the fix I've posted?
I'm saying my change makes there be no race, i.e. the "if (!smmu)" case can never be true, and so no longer needs working around.
Thanks, Robin.
put_device(dev); return dev ? dev_get_drvdata(dev) : NULL; } @@ -2232,21 +2232,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev) i, irq); }
- err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL, - "smmu.%pa", &smmu->ioaddr); - if (err) { - dev_err(dev, "Failed to register iommu in sysfs\n"); - return err; - }
- err = iommu_device_register(&smmu->iommu, &arm_smmu_ops, - using_legacy_binding ? NULL : dev); - if (err) { - dev_err(dev, "Failed to register iommu\n"); - iommu_device_sysfs_remove(&smmu->iommu); - return err; - }
platform_set_drvdata(pdev, smmu);
/* Check for RMRs and install bypass SMRs if any */ @@ -2255,6 +2240,18 @@ static int arm_smmu_device_probe(struct platform_device *pdev) arm_smmu_device_reset(smmu); arm_smmu_test_smr_masks(smmu);
+ err = iommu_device_sysfs_add(&smmu->iommu, smmu->dev, NULL, + "smmu.%pa", &smmu->ioaddr); + if (err) + return dev_err_probe(dev, err, "Failed to register iommu in sysfs\n");
+ err = iommu_device_register(&smmu->iommu, &arm_smmu_ops, + using_legacy_binding ? NULL : dev); + if (err) { + iommu_device_sysfs_remove(&smmu->iommu); + return dev_err_probe(dev, err, "Failed to register iommu\n"); + }
/* * We want to avoid touching dev->power.lock in fastpaths unless * it's really going to do something useful - pm_runtime_enabled()