On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_overlay_remove() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com
drivers/of/overlay.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 2ae7e9d24a64..99659ae9fb28 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -853,6 +853,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs) { int i;
- /*
* Wait for any ongoing device link removals before removing some of
* nodes. Drop the global lock while waiting
*/
- mutex_unlock(&of_mutex);
- device_link_wait_removal();
- mutex_lock(&of_mutex);
I'm still not convinced we need to drop the lock. What happens if someone else grabs the lock while we are in device_link_wait_removal()? Can we guarantee that we can't screw things badly?
The question is, do you have a system/use case where you can really see the deadlock happening? Until I see one, I'm very skeptical about this. And if we have one, I'm not really sure this is also the right solution for it.
- Nuno Sá