From: Yi Sun <yi.sun@intel.com>
[ Upstream commit 17502e7d7b7113346296f6758324798d536c31fd ]
Running IDXD workloads in a container with the /dev directory mounted can trigger a call trace or even a kernel panic when the parent process of the container is terminated.
This issue occurs because, under certain configurations, Docker does not properly propagate the mount replica back to the original mount point.
In this case, when the user driver detaches, the WQ is destroyed but it still calls destroy_workqueue() attempting to complete all pending work. It's necessary to check wq->wq and skip the drain if it no longer exists.
Signed-off-by: Yi Sun <yi.sun@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Link: https://lore.kernel.org/r/20250509000304.1402863-1-yi.sun@intel.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
**YES**
This commit should be backported to stable kernel trees. Here's my detailed analysis:
## Critical Bug Fix Analysis
**1. Real User-Affecting Issue:** The commit addresses a kernel panic/call trace that occurs in production environments when running IDXD workloads in containers. The issue is triggered when Docker doesn't properly propagate mount replicas, leading to improper workqueue destruction sequences.
**2. Race Condition Fix:** The core problem is a race condition where:
- The user driver detaches and `wq->wq` is set to NULL in `destroy_workqueue(wq->wq); wq->wq = NULL;`
- But `idxd_cdev_evl_drain_pasid()` still attempts to call `drain_workqueue(wq->wq)`
- This causes a NULL pointer dereference inside `drain_workqueue()`, which is handed the now-NULL `wq->wq`
**3. Minimal, Focused Change:** The fix adds just two lines:
```c
-	drain_workqueue(wq->wq);
+	if (wq->wq)
+		drain_workqueue(wq->wq);
```
This is exactly the type of small, defensive change appropriate for stable backports.
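The sequence described in point 2 and the guard from point 3 can be modeled in user space. The following is only a sketch under stated assumptions: `mock_workqueue`, `mock_wq`, `mock_drain_workqueue()`, `mock_teardown()` and `mock_evl_drain()` are hypothetical stand-ins invented for illustration, not the real idxd or workqueue definitions, and the model is single-threaded, so it says nothing about locking.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's workqueue object. */
struct mock_workqueue {
	int pending;	/* number of queued work items */
};

/* Hypothetical stand-in for struct idxd_wq; only the wq pointer is modeled. */
struct mock_wq {
	struct mock_workqueue *wq;	/* NULL once the queue has been torn down */
};

/* Stand-in for drain_workqueue(): it dereferences its argument,
 * so passing NULL would crash, as in the reported call trace. */
void mock_drain_workqueue(struct mock_workqueue *q)
{
	q->pending = 0;
}

/* Stand-in for the detach path: the queue is destroyed and the
 * pointer is cleared (the destroy step itself is not modeled). */
void mock_teardown(struct mock_wq *wq)
{
	wq->wq = NULL;
}

/* Guarded drain with the same shape as the patched hunk.
 * Returns 1 if a drain ran, 0 if it was skipped. */
int mock_evl_drain(struct mock_wq *wq)
{
	if (wq->wq) {
		mock_drain_workqueue(wq->wq);
		return 1;
	}
	return 0;
}
```

In this model, calling `mock_evl_drain()` before `mock_teardown()` drains the queue, and calling it afterwards returns 0 instead of dereferencing NULL. Without the `if`, the second call would crash.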
**4. Evidence from Kernel Analysis:** My research of the mainline kernel confirms this NULL check pattern doesn't exist in current code, indicating this is a legitimate new bug fix. The current code at `drivers/dma/idxd/cdev.c:352` calls `drain_workqueue(wq->wq)` without any NULL protection.
**5. Critical Subsystem:** The IDXD driver, which supports the Intel Data Streaming Accelerator (DSA), is used in high-performance computing and data center environments. Kernel panics in this subsystem can affect critical workloads.
**6. Comparison to Similar Backported Commits:** This closely matches Similar Commit #1 and #4 (both marked YES for backporting):
- Fixes driver probe/removal issues
- Prevents kernel crashes during error paths
- Small, contained change with minimal regression risk
- Addresses resource cleanup problems
**7. Production Impact:** The commit message specifically mentions this affects containerized workloads, which are common in production environments. The failure mode (kernel panic) is severe.
**8. No Architectural Changes:** The fix doesn't modify any APIs or data structures. The only control-flow change is a defensive NULL check before using a potentially destroyed workqueue.
This commit perfectly fits the stable tree criteria: it fixes an important bug that causes kernel panics, has minimal risk of regression, is small and contained, and addresses a real user-reported issue in production environments.
 drivers/dma/idxd/cdev.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 7e3a67f9f0a65..aa39fcd389a94 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -354,7 +354,9 @@ static void idxd_cdev_evl_drain_pasid(struct idxd_wq *wq, u32 pasid)
 			set_bit(h, evl->bmap);
 		h = (h + 1) % size;
 	}
-	drain_workqueue(wq->wq);
+	if (wq->wq)
+		drain_workqueue(wq->wq);
+
 	mutex_unlock(&evl->lock);
 }