On Sun, Dec 14, 2025 at 5:10 PM Vladimir Oltean olteanv@gmail.com wrote:
Hi Jonas, Ma Ke,
On Sun, Dec 14, 2025 at 05:02:33PM +0100, Jonas Gorski wrote:
Hi,
On 12/14/25 14:12, Ma Ke wrote:
When of_find_net_device_by_node() successfully acquires a reference to
Your subject is missing the () of dsa_port_parse_of()
a network device but the subsequent call to dsa_port_parse_cpu() fails, dsa_port_parse_of() returns without releasing the reference count on the network device.
of_find_net_device_by_node() increments the reference count of the returned structure, which should be balanced with a corresponding put_device() when the reference is no longer needed.
Found by code review.
I agree with the reference not being properly released on failure, but I don't think this fix is complete.
I was trying to figure out where the put_device() would happen in the success case (or on removal), and I failed to find it.
Also if the (indirect) top caller of dsa_port_parse_of(), dsa_switch_probe(), fails at a later place the reference won't be released either.
The only explicit put_device() that happens is in dsa_dev_to_net_device(), which seems to convert a device reference to a netdev reference via dev_hold().
But the only caller of that, dsa_port_parse() immediately calls dev_put() on it, essentially dropping all references, and then continuing using it.
dsa_switch_shutdown() talks about dropping references taken via netdev_upper_dev_link(), but AFAICT this happens only after dsa_port_parse{,_of}() setup the conduit, so it looks like there could be a window without any reference held onto the conduit.
So AFAICT the current state is:
dsa_port_parse_of() keeps the device reference. dsa_port_parse() drops the device reference, and shortly has a dev_hold(), but it does not extend beyond the function.
Therefore if my analysis is correct (which it may very well not be), the correct fix(es) here could be:
dsa_port_parse{,_of}() should keep a reference via e.g. dev_hold() on success to the conduit.
Or maybe they should unconditionally drop if *after* calling dsa_port_parse_cpu(), and dsa_port_parse_cpu() should take one when assigning dsa_port::conduit.
Regardless, the end result should be that there is a reference on the conduit stored in dsa_port::conduit.
dsa_switch_release_ports() should drop the references, as this seems to be called in all error paths of dsa_port_parse{,of} as well by dsa_switch_remove().
And maybe dsa_switch_shutdown() then also needs to drop the reference? Though it may need to then retake the reference on resume, and I don't know where that exactly should happen. Maybe it should also lookup the conduit(s) again to be correct.
But here I'm more doing educated guesses then actually knowing what's correct.
The alternative/quick "fix" would be to just drop the reference unconditionally, which would align the behaviour to that of dsa_port_parse(). Not sure if it should mirror the dev_hold() / dev_put() spiel as well.
Not that I think this would be the correct behaviour though.
Sorry for the lengthy review/train of thought.
Best regards, Jonas
Thank you for your thoughts on this topic. Indeed there is a problem, for which I managed to find a few hours today to investigate. I was going to just submit a patch directly and refer Ma Ke to it directly, but since you started looking into the situation as well, I just thought I'd reply "please standby". It's currently undergoing testing.
A patch already, that's even better! I'll gladly stand by :)
Best regards, Jonas