On 22.10.24 18:43:15, Dan Williams wrote:
Changes since v1 [1]:
- Fix some misspellings missed by checkpatch in changelogs (Jonathan)
- Add comments explaining the order of objects in drivers/cxl/Makefile
(Jonathan)
- Rename attach_device => cxl_rescan_attach (Jonathan)
- Fixup Zijun's email (Zijun)
Original cover:
Gregory's modest proposal to fix CXL cxl_mem_probe() failures due to
delayed arrival of the CXL "root" infrastructure [1] prompted questions
of how the existing mechanism for retrying cxl_mem_probe() could be
failing.
I found a similar issue with the region creation.
A region is created with the first endpoint found and immediately
added as device which triggers cxl_region_probe(). Now, in
interleaving setups the region state comes into commit state only
after the last endpoint was probed. So the probe must be repeated
until all endpoints were enumerated. I ended up with this change:
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
index a07b62254596..c78704e435e5 100644
--- a/drivers/cxl/core/region.c
+++ b/drivers/cxl/core/region.c
@@ -3775,8 +3775,8 @@ static int cxl_region_probe(struct device *dev)
    }
if (p->state < CXL_CONFIG_COMMIT) {
-		dev_dbg(&cxlr->dev, "config state: %d\n", p->state);
-		rc = -ENXIO;
+		rc = dev_err_probe(&cxlr->dev, -EPROBE_DEFER,
+				"region config state: %d\n", p->state);
    	goto out;
    }
-- 
2.39.5
I don't see an init order issue here as the mem module is always up
before the regions are probed.
-Robert
> 
> The critical missing piece in the debug was that Gregory's setup had
> almost all CXL modules built-in to the kernel.
> 
> On the way to that discovery several other bugs and init-order corner
> cases were discovered.
> 
> The main fix is to make sure the drivers/cxl/Makefile object order
> supports root CXL ports being fully initialized upon cxl_acpi_probe()
> exit. The modular case has some similar potential holes that are fixed
> with MODULE_SOFTDEP() and other fix ups. Finally, an attempt to update
> cxl_test to reproduce the original report resulted in the discovery of a
> separate long standing use after free bug in cxl_region_detach().
> 
> [2]: 
http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net
> 
> ---
> 
> Dan Williams (6):
>       cxl/port: Fix CXL port initialization order when the subsystem is built-in
>       cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices()
>       cxl/acpi: Ensure ports ready at cxl_acpi_probe() return
>       cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
>       cxl/port: Prevent out-of-order decoder allocation
>       cxl/test: Improve init-order fidelity relative to real-world systems
> 
> 
>  drivers/base/core.c          |   35 +++++++
>  drivers/cxl/Kconfig          |    1 
>  drivers/cxl/Makefile         |   20 +++-
>  drivers/cxl/acpi.c           |    7 +
>  drivers/cxl/core/hdm.c       |   50 +++++++++--
>  drivers/cxl/core/port.c      |   13 ++-
>  drivers/cxl/core/region.c    |   91 ++++++++++---------
>  drivers/cxl/cxl.h            |    3 -
>  include/linux/device.h       |    3 +
>  tools/testing/cxl/test/cxl.c |  200 +++++++++++++++++++++++-------------------
>  tools/testing/cxl/test/mem.c |    1 
>  11 files changed, 269 insertions(+), 155 deletions(-)
> 
> base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b