On Mon, Dec 01, 2025 at 03:35:04AM -0800, Breno Leitao wrote:
Given you are completely lockless here, so, there is a chance you hit a TOCTOU, also.
I think you want to have dynamic_netconsole_mutex held during the operation of process_resume_target().
- mutex_lock(&dynamic_netconsole_mutex);
- remove from the list
- resume
- re-add to the list
- mutex_unlock(&dynamic_netconsole_mutex);
This is a pretty good point. Will address this on the next version.
- if (bound_by_mac(nt))
/* ensure netpoll_setup will retrieve device by mac */memset(&nt->np.dev_name, 0, IFNAMSIZ);This is a clean-up step that was missing whent the target is getting down, and htis is just a work around that doesn't belong in here.
Please move it to netconsole_process_cleanups_core(), in a separate patch.
Sounds good. Will include this as a separated patch on the next version of this series.
Something as:
list_for_each_entry_safe(nt, tmp, &target_cleanup_list, list) do_netpoll_cleanup(&nt->np); if (bound_by_mac(nt)) memset(&nt->np.dev_name, 0, IFNAMSIZ);
Ideally this should belong to do_netpoll_cleanup(), but let's keep it in netconsole_process_cleanups_core() for three reasons:
- Bounding by mac is a netconsole concept
- do_netpoll_cleanup() is only used by netconsole, and I plan to move it back to netconsole. Some PoC in [1]
- bound_by_mac() should be in netconsole and we do not want to export it.
The reasoning makes sense to me. I considered performing this cleanup on netpoll, but given your patch opted for this 'hack' before setup - I think doing it on netconsole_process_cleanups_core makes more sense.
I need to check this more, but I'm wondering if we would be able to completely remove dev_name and dev_mac from netpoll and make it part of the netconsole_target. Perhaps as a future refactoring after your patch series.
It needs to be initialized earlier before the kzalloc, otherwise we might hit a similar problem to the one fixed by e5235eb6cfe0 ("net: netpoll: initialize work queue before error checks")
The code path would be:
- alloc_param_target()
- alloc_and_init()
- kzalloc() fails and return NULL.
- resume_wq() is still not initialized
fail:
- free_param_target()
initialized
- cancel_work_sync(&nt->resume_wq); and resume_wq is not
Ack. Will fix this.
Thanks for the patch, --breno
Thanks again for the review. Will submit a new version addressing all the comments once net-next re-opens.