Fix a memory leak in netpoll and introduce netconsole selftests that expose the issue when running with kmemleak detection enabled.
This patchset includes a selftest for netpoll with multiple concurrent users (netconsole + bonding), which simulates the scenario from test[1] that originally demonstrated the issue allegedly fixed by commit efa95b01da18 ("netpoll: fix use after free") - a commit that is now being reverted.
Sending this to the "net" branch because it is a fix, and the selftest might help with validating backports.
Link: https://lore.kernel.org/lkml/96b940137a50e5c387687bb4f57de8b0435a653f.140485... [1]
Signed-off-by: Breno Leitao leitao@debian.org
---
Changes in v4:
- Added an additional selftest to test multiple netpoll users in parallel
- Link to v3: https://lore.kernel.org/r/20250905-netconsole_torture-v3-0-875c7febd316@debi...

Changes in v3:
- This patchset is a merge of the fix and the selftest together as recommended by Jakub.

Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages have been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@debi...
---
Breno Leitao (4):
      net: netpoll: fix incorrect refcount handling causing incorrect cleanup
      selftest: netcons: refactor target creation
      selftest: netcons: create a torture test
      selftest: netcons: add test for netconsole over bonded interfaces

 net/core/netpoll.c                                 |   7 +-
 tools/testing/selftests/drivers/net/Makefile       |   2 +
 .../selftests/drivers/net/lib/sh/lib_netcons.sh    | 197 ++++++++++++++++++---
 .../selftests/drivers/net/netcons_over_bonding.sh  |  76 ++++++++
 .../selftests/drivers/net/netcons_torture.sh       | 127 +++++++++++++
 5 files changed, 384 insertions(+), 25 deletions(-)
---
base-commit: 5e87fdc37f8dc619549d49ba5c951b369ce7c136
change-id: 20250902-netconsole_torture-8fc23f0aca99

Best regards,
--
Breno Leitao leitao@debian.org
commit efa95b01da18 ("netpoll: fix use after free") incorrectly ignored the refcount and prematurely set dev->npinfo to NULL during netpoll cleanup, leading to improper behavior and memory leaks.
Scenario causing lack of proper cleanup:
1) A netpoll is associated with a NIC (e.g., eth0), netdev->npinfo is
   allocated, and refcnt = 1
   - Keep in mind that npinfo is shared among all netpoll instances. In
     this case, there is just one.

2) Another netpoll is also associated with the same NIC and
   npinfo->refcnt += 1.
   - Now dev->npinfo->refcnt = 2;
   - There is just one npinfo associated to the netdev.

3) When the first netpoll goes to clean up:
   - The first cleanup succeeds and clears np->dev->npinfo, ignoring
     refcnt.
     - It basically calls `RCU_INIT_POINTER(np->dev->npinfo, NULL);`
   - Set dev->npinfo = NULL, without proper cleanup
   - ->ndo_netpoll_cleanup() is not called either

4) Now the second target tries to clean up
   - The second cleanup fails because np->dev->npinfo is already NULL.
     * In this case, ops->ndo_netpoll_cleanup() was never called, and
       the skb pool is not cleaned as well (for the second netpoll
       instance)
   - This leaks npinfo and skbpool skbs, which is clearly reported by
     kmemleak.
Revert commit efa95b01da18 ("netpoll: fix use after free") and add clarifying comments emphasizing that npinfo cleanup should only happen once the refcount reaches zero, ensuring stable and correct netpoll behavior.
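The two-user scenario can be reproduced in a small userspace model. This is only a sketch: every name in it (npinfo_model, netdev_model, netpoll_model, model_setup, model_cleanup, run_scenario, the pool size of 32) is hypothetical and stands in for the kernel structures, with no RCU, no locking, and no real allocation. It mirrors only the control flow of __netpoll_cleanup() as described here: the early return when npinfo is NULL, the refcount decrement, and the pool flush at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct npinfo_model {
	int refcnt;      /* number of netpoll users sharing this npinfo */
	bool allocated;  /* true until the last user releases it */
};

struct netdev_model {
	struct npinfo_model *npinfo;  /* models dev->npinfo */
	struct npinfo_model backing;  /* storage standing in for the allocation */
};

struct netpoll_model {
	struct netdev_model *dev;
	int pooled_skbs;  /* stands in for the per-netpoll skb pool */
};

struct leak_report {
	int npinfo_leaked;
	int skbs_leaked;
};

static void model_setup(struct netpoll_model *np, struct netdev_model *dev)
{
	assert(np && dev);
	if (!dev->npinfo) {
		/* First user on this device: "allocate" the shared npinfo. */
		dev->backing.refcnt = 0;
		dev->backing.allocated = true;
		dev->npinfo = &dev->backing;
	}
	dev->npinfo->refcnt++;
	np->dev = dev;
	np->pooled_skbs = 32;  /* arbitrary pool size for the model */
}

static void model_cleanup(struct netpoll_model *np, bool with_fix)
{
	struct npinfo_model *npinfo = np->dev->npinfo;

	if (!npinfo)
		return;  /* early exit: this netpoll's pool is never flushed */

	if (--npinfo->refcnt == 0) {
		/* Last user: unpublish and release the shared npinfo. */
		np->dev->npinfo = NULL;
		npinfo->allocated = false;
	} else if (!with_fix) {
		/* Pre-revert behavior: pointer cleared while users remain. */
		np->dev->npinfo = NULL;
	}

	np->pooled_skbs = 0;  /* stands in for skb_pool_flush(np) */
}

static struct leak_report run_scenario(bool with_fix)
{
	struct netdev_model eth0 = { 0 };
	struct netpoll_model first = { 0 }, second = { 0 };
	struct leak_report report;

	model_setup(&first, &eth0);
	model_setup(&second, &eth0);
	model_cleanup(&first, with_fix);
	model_cleanup(&second, with_fix);

	report.npinfo_leaked = eth0.backing.allocated ? 1 : 0;
	report.skbs_leaked = first.pooled_skbs + second.pooled_skbs;
	return report;
}
```

In this model, run_scenario(false) reports one leaked npinfo plus the second netpoll's 32 pooled skbs, while run_scenario(true) reports no leaks, because the second cleanup still finds dev->npinfo and drops the last reference.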
Cc: stable@vger.kernel.org # 3.17.x
Cc: Jay Vosburgh jv@jvosburgh.net
Fixes: efa95b01da18 ("netpoll: fix use after free")
Signed-off-by: Breno Leitao leitao@debian.org
Reviewed-by: Simon Horman horms@kernel.org
---
 net/core/netpoll.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 5f65b62346d4e..19676cd379640 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -815,6 +815,10 @@ static void __netpoll_cleanup(struct netpoll *np)
 	if (!npinfo)
 		return;
 
+	/* At this point, there is a single npinfo instance per netdevice, and
+	 * its refcnt tracks how many netpoll structures are linked to it. We
+	 * only perform npinfo cleanup when the refcnt decrements to zero.
+	 */
 	if (refcount_dec_and_test(&npinfo->refcnt)) {
 		const struct net_device_ops *ops;
 
@@ -824,8 +828,7 @@ static void __netpoll_cleanup(struct netpoll *np)
 
 		RCU_INIT_POINTER(np->dev->npinfo, NULL);
 		call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
-	} else
-		RCU_INIT_POINTER(np->dev->npinfo, NULL);
+	}
 
 	skb_pool_flush(np);
 }