Hello Andre,
On Fri, Nov 28, 2025 at 10:08:03PM +0000, Andre Carvalho wrote:
@@ -242,6 +249,75 @@ static void populate_configfs_item(struct netconsole_target *nt,
 }
 #endif	/* CONFIG_NETCONSOLE_DYNAMIC */
 
+/* Check if the target was bound by mac address. */
+static bool bound_by_mac(struct netconsole_target *nt)
+{
+	return is_valid_ether_addr(nt->np.dev_mac);
+}
+
+/* Attempts to resume logging to a deactivated target. */
+static void resume_target(struct netconsole_target *nt)
+{
+	int ret;
+
+	/* check if target is still deactivated as it may have been disabled
+	 * while resume was being scheduled.
+	 */

This only happens if this is a dynamic target and someone is toggling the device (or even removing it, which would cause a crash, I _think_).

Given that you are completely lockless here, there is also a chance you hit a TOCTOU race.

I think you want to have dynamic_netconsole_mutex held during the operation of process_resume_target().

	* mutex_lock(&dynamic_netconsole_mutex);
	* remove from the list
	* resume
	* re-add to the list
	* mutex_unlock(&dynamic_netconsole_mutex);

The netconsole design has two locks:
	* target_list_lock, which protects against devices getting disabled by netdev notifications
	* dynamic_netconsole_mutex, which protects against anyone disabling and removing the target from configfs
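
To make that ordering concrete, here is a minimal userspace model of it. This is not kernel code and not necessarily how the patch should be written. Two pthread mutexes stand in for dynamic_netconsole_mutex and target_list_lock, and the struct, enum and helper names are hypothetical. The outer mutex is held across the whole check-and-resume, so the state check cannot race with configfs. The list lock is only held long enough to unlink and re-add the target.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model; not the netconsole implementation. */
enum target_state { STATE_DISABLED, STATE_ENABLED, STATE_DEACTIVATED };

struct target {
	enum target_state state;
	bool on_list;	/* stands in for membership in target_list */
};

/* Stand-ins for dynamic_netconsole_mutex and target_list_lock. */
static pthread_mutex_t config_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for netpoll_setup(): 0 on success. */
static int fake_setup(bool should_fail)
{
	return should_fail ? -1 : 0;
}

static void resume_target_model(struct target *t, bool setup_fails)
{
	/* Checked while config_mutex is held, so the state cannot change
	 * between this check and the update below (no TOCTOU window). */
	if (t->state != STATE_DEACTIVATED)
		return;
	t->state = fake_setup(setup_fails) ? STATE_DISABLED : STATE_ENABLED;
}

static void process_resume_model(struct target *t, bool setup_fails)
{
	pthread_mutex_lock(&config_mutex);

	/* Unlink while holding the short-lived list lock. */
	pthread_mutex_lock(&list_lock);
	t->on_list = false;
	pthread_mutex_unlock(&list_lock);

	/* The slow part runs with only config_mutex held. */
	resume_target_model(t, setup_fails);
	assert(t->state != STATE_DEACTIVATED);

	/* Re-add to the list whatever the outcome was. */
	pthread_mutex_lock(&list_lock);
	t->on_list = true;
	pthread_mutex_unlock(&list_lock);

	pthread_mutex_unlock(&config_mutex);
}
```

In the kernel, the list lock is a spinlock taken with spin_lock_irqsave(). That is why the patch unlinks the target before calling the IRQ-unsafe resume_target(), instead of calling it with the list lock held.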

+	if (nt->state != STATE_DEACTIVATED)
+		return;
+
+	if (bound_by_mac(nt))
+		/* ensure netpoll_setup will retrieve device by mac */
+		memset(&nt->np.dev_name, 0, IFNAMSIZ);

This is a cleanup step that was missing when the target goes down, and this is just a workaround that doesn't belong here.

Please move it to netconsole_process_cleanups_core(), in a separate patch.

Something like:

	list_for_each_entry_safe(nt, tmp, &target_cleanup_list, list) {
		do_netpoll_cleanup(&nt->np);
		if (bound_by_mac(nt))
			memset(&nt->np.dev_name, 0, IFNAMSIZ);
	}

Ideally this would belong in do_netpoll_cleanup(), but let's keep it in netconsole_process_cleanups_core() for three reasons:

1) Binding by MAC is a netconsole concept.
2) do_netpoll_cleanup() is only used by netconsole, and I plan to move it back to netconsole. There is a PoC in [1].
3) bound_by_mac() should live in netconsole, and we do not want to export it.

[1]: https://lore.kernel.org/all/20250902-netpoll_untangle_v3-v1-3-51a03d6411be@d...
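
The cleanup step can be modeled in userspace like this. The model assumes simplified stand-ins for the netpoll fields: np_model, valid_ether_addr(), bound_by_mac_model() and cleanup_model() are hypothetical names. valid_ether_addr() follows the same rule as the kernel's is_valid_ether_addr(), which requires an address that is neither multicast nor all zeroes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN 6
#define IFNAMSIZ 16

/* Hypothetical userspace model of the relevant netpoll fields. */
struct np_model {
	char dev_name[IFNAMSIZ];
	unsigned char dev_mac[ETH_ALEN];
};

/* Same rule as is_valid_ether_addr(): multicast bit clear, not all zero. */
static bool valid_ether_addr(const unsigned char *addr)
{
	static const unsigned char zero[ETH_ALEN];

	return !(addr[0] & 0x01) && memcmp(addr, zero, ETH_ALEN) != 0;
}

static bool bound_by_mac_model(const struct np_model *np)
{
	return valid_ether_addr(np->dev_mac);
}

/* Cleanup step: forget the interface name of MAC-bound targets, so the
 * next setup has to look the device up by MAC again. */
static void cleanup_model(struct np_model *np)
{
	/* In the kernel, do_netpoll_cleanup(&nt->np) would run here first. */
	if (bound_by_mac_model(np))
		memset(np->dev_name, 0, IFNAMSIZ);
}
```

Targets bound by name (an all-zero MAC) keep their dev_name, so only MAC-bound targets are affected by the extra step.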

+
+	ret = netpoll_setup(&nt->np);
+	if (ret) {
+		/* netpoll fails setup once, do not try again. */
+		nt->state = STATE_DISABLED;
+		return;
+	}
+
+	nt->state = STATE_ENABLED;
+	pr_info("network logging resumed on interface %s\n", nt->np.dev_name);
+}
+
+/* Checks if a deactivated target matches a device. */
+static bool deactivated_target_match(struct netconsole_target *nt,
+				     struct net_device *ndev)
+{
+	if (nt->state != STATE_DEACTIVATED)
+		return false;
+
+	if (bound_by_mac(nt))
+		return !memcmp(nt->np.dev_mac, ndev->dev_addr, ETH_ALEN);
+	return !strncmp(nt->np.dev_name, ndev->name, IFNAMSIZ);
+}
+
+/* Process work scheduled for target resume. */
+static void process_resume_target(struct work_struct *work)
+{
+	struct netconsole_target *nt =
+		container_of(work, struct netconsole_target, resume_wq);
+	unsigned long flags;
+

	mutex_lock(&dynamic_netconsole_mutex);	/* As discussed above */

+	/* resume_target is IRQ unsafe, remove target from
+	 * target_list in order to resume it with IRQ enabled.
+	 */
+	spin_lock_irqsave(&target_list_lock, flags);
+	list_del_init(&nt->list);
+	spin_unlock_irqrestore(&target_list_lock, flags);
+
+	resume_target(nt);
+
+	/* At this point the target is either enabled or disabled and
+	 * was cleaned up before getting deactivated. Either way, add it
+	 * back to target list.
+	 */
+	spin_lock_irqsave(&target_list_lock, flags);
+	list_add(&nt->list, &target_list);
+	spin_unlock_irqrestore(&target_list_lock, flags);

	mutex_unlock(&dynamic_netconsole_mutex);

+}
+
 /* Allocate and initialize with defaults.
  * Note that these targets get their config_item fields zeroed-out.
  */
@@ -264,6 +340,7 @@ static struct netconsole_target *alloc_and_init(void)
 	nt->np.remote_port = 6666;
 	eth_broadcast_addr(nt->np.remote_mac);
 	nt->state = STATE_DISABLED;
+	INIT_WORK(&nt->resume_wq, process_resume_target);

It needs to be initialized earlier, before the kzalloc(); otherwise we might hit a problem similar to the one fixed by e5235eb6cfe0 ("net: netpoll: initialize work queue before error checks").

The code path would be:

	* alloc_param_target()
		* alloc_and_init()
			* kzalloc() fails and returns NULL
				* resume_wq is still not initialized
		fail:
			* free_param_target()
				* cancel_work_sync(&nt->resume_wq), while resume_wq is not initialized
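
The general rule behind e5235eb6cfe0 is to initialize the work item before any error path that can reach cancel_work_sync(). Below is a minimal userspace model of that rule. All names are hypothetical, and a bool flag stands in for INIT_WORK()/cancel_work_sync(). In this model the work item lives inside the allocated object, so the earliest point at which it can be initialized is right after the allocation succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: 'resume_wq' stands in for the kernel work item. */
struct work_model {
	bool initialized;
	bool pending;
};

struct target_model {
	struct work_model resume_wq;
	int port;
};

static void init_work_model(struct work_model *w)
{
	w->initialized = true;
	w->pending = false;
}

/* Like cancel_work_sync(): only valid on an initialized work item. */
static void cancel_work_model(struct work_model *w)
{
	assert(w->initialized);
	w->pending = false;
}

static void free_target_model(struct target_model *t)
{
	if (!t)
		return;
	cancel_work_model(&t->resume_wq);
	free(t);
}

/* Initialize the work item before any later step that can fail and
 * jump to the cleanup path. */
static struct target_model *alloc_target_model(bool later_step_fails)
{
	struct target_model *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	init_work_model(&t->resume_wq);

	t->port = 6666;
	if (later_step_fails)
		goto fail;
	return t;

fail:
	free_target_model(t);	/* safe: resume_wq is already initialized */
	return NULL;
}
```

With the initialization placed first, every path that reaches free_target_model() sees an initialized work item, including the early failure path.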

Thanks for the patch,
--breno

--
pw-bot: cr