In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_overlay_remove() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com --- drivers/of/overlay.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 2ae7e9d24a64..99659ae9fb28 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -853,6 +853,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs) { int i;
+ /* + * Wait for any ongoing device link removals before removing some of + * nodes. Drop the global lock while waiting + */ + mutex_unlock(&of_mutex); + device_link_wait_removal(); + mutex_lock(&of_mutex); + if (ovcs->cset.entries.next) of_changeset_destroy(&ovcs->cset);
@@ -862,7 +870,6 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs) ovcs->id = 0; }
- for (i = 0; i < ovcs->count; i++) { of_node_put(ovcs->fragments[i].target); of_node_put(ovcs->fragments[i].overlay);
On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_overlay_remove() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com
drivers/of/overlay.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 2ae7e9d24a64..99659ae9fb28 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c @@ -853,6 +853,14 @@ static void free_overlay_changeset(struct overlay_changeset *ovcs) { int i;
- /*
* Wait for any ongoing device link removals before removing some of
* nodes. Drop the global lock while waiting
*/
- mutex_unlock(&of_mutex);
- device_link_wait_removal();
- mutex_lock(&of_mutex);
I'm still not convinced we need to drop the lock. What happens if someone else grabs the lock while we are in device_link_wait_removal()? Can we guarantee that we can't screw things badly?
The question is, do you have a system/use case where you can really see the deadlock happening? Until I see one, I'm very skeptical about this. And if we have one, I'm not really sure this is also the right solution for it.
- Nuno Sá
On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_overlay_remove() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com
drivers/of/overlay.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 2ae7e9d24a64..99659ae9fb28 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c
In the cover, you mention device.h inclusion but I'm not seeing it? This is clearly up to the DT maintainers to decide but, IMHO, I would very much prefer to see fwnode.h included in here rather than directly device.h (so yeah, renaming the function to fwnode_*). But yeah, I might be biased by own series :)
- Nuno Sá
On Thu, 29 Feb 2024 10:50:21 +0100 Nuno Sá noname.nuno@gmail.com wrote:
On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_overlay_remove() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com
drivers/of/overlay.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 2ae7e9d24a64..99659ae9fb28 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c
In the cover, you mention device.h inclusion but I'm not seeing it? This is clearly up to the DT maintainers to decide but, IMHO, I would very much prefer to see fwnode.h included in here rather than directly device.h (so yeah, renaming the function to fwnode_*). But yeah, I might be biased by own series :)
Damned. I missed device.h in this patch. Without this one, the patch do not compile :(
A fixup commit I missed to squash before sending.
A v3 is planned to add this device.h.
Nuno, do you prefer I wait few days before sending this v3 waiting for more replies or I send it right now and you re-do your comment on the v3 ?
I would really prefer to send it now :)
Sorry about my mistake. Best regards, Hervé
On Thu, 2024-02-29 at 11:14 +0100, Herve Codina wrote:
On Thu, 29 Feb 2024 10:50:21 +0100 Nuno Sá noname.nuno@gmail.com wrote:
On Thu, 2024-02-29 at 09:39 +0100, Herve Codina wrote:
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove()
During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ...
Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job.
In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_overlay_remove() with the devlink removals.
Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com
drivers/of/overlay.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/of/overlay.c b/drivers/of/overlay.c index 2ae7e9d24a64..99659ae9fb28 100644 --- a/drivers/of/overlay.c +++ b/drivers/of/overlay.c
In the cover, you mention device.h inclusion but I'm not seeing it? This is clearly up to the DT maintainers to decide but, IMHO, I would very much prefer to see fwnode.h included in here rather than directly device.h (so yeah, renaming the function to fwnode_*). But yeah, I might be biased by own series :)
Damned. I missed device.h in this patch. Without this one, the patch do not compile :(
A fixup commit I missed to squash before sending.
A v3 is planned to add this device.h.
Nuno, do you prefer I wait few days before sending this v3 waiting for more replies or I send it right now and you re-do your comment on the v3 ?
I would really prefer to send it now :)
Typically maintainers don't like much of re-spinning versions too fast. That said, up to you :). I can copy paste my comments in v3.
- Nuno Sá
linux-stable-mirror@lists.linaro.org