Running the test added with a recent fix on a driver with persistent NAPI config leads to a deadlock. The deadlock is fixed by patch 3; patch 2 addresses what I think is a more fundamental problem with the way we implemented the config.
I hope the fix makes sense, my own thinking is definitely colored by my preference (IOW how the per-queue config RFC was implemented).
v2: add missing kdoc
v1: https://lore.kernel.org/20250808014952.724762-1-kuba@kernel.org

Jakub Kicinski (3):
  selftests: drv-net: don't assume device has only 2 queues
  net: update NAPI threaded config even for disabled NAPIs
  net: prevent deadlocks when enabling NAPIs with mixed kthread config

 include/linux/netdevice.h                            |  5 ++++-
 net/core/dev.h                                       |  8 ++++++++
 net/core/dev.c                                       | 12 +++++++++---
 tools/testing/selftests/drivers/net/napi_threaded.py | 10 ++++++----
 4 files changed, 27 insertions(+), 8 deletions(-)
The test implicitly assumes the device has only 2 queues. A real device will likely have more. The exact problem is that because NAPIs get added to the list from the head, the netlink dump reports them in reverse order. So the naive napis[0] will actually likely give us the _last_ NAPI, not the first one. Re-enable all the NAPIs instead of hard-coding 2 in the test. This way the NAPIs we operated on will always reappear, no matter where they were in the registration order.
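To illustrate the ordering problem, here is a minimal Python sketch. It is a toy model and not the selftest or kernel code: register_napis() and napi_exists() are hypothetical helpers, and using the queue index as the NAPI ID is an assumption made only for the illustration.

```python
from collections import deque


def register_napis(queue_count):
    """Model a per-device NAPI list where every new entry is added at the head."""
    napi_list = deque()
    for queue_idx in range(queue_count):
        # Hypothetical IDs: here the ID is simply the queue index.
        napi_list.appendleft({"id": queue_idx})
    # A dump walks the list from the head, so it sees the newest entry first.
    return list(napi_list)


def napi_exists(napi_id, combined):
    """In this model queue N has a NAPI only while more than N queues are enabled."""
    return napi_id < combined


napis = register_napis(8)
napi0_id = napis[0]["id"]  # actually the last NAPI registered
napi1_id = napis[1]["id"]

# Shrinking to 1 queue and growing back to a hard-coded 2 loses both NAPIs...
lost = not napi_exists(napi0_id, 2) and not napi_exists(napi1_id, 2)
# ...while growing back to the original count brings them back.
restored = napi_exists(napi0_id, 8) and napi_exists(napi1_id, 8)
print(napi0_id, napi1_id, lost, restored)
```

In this model the first two dumped entries belong to the highest queues, which is why restoring the original queue count is the robust choice.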
Fixes: e6d76268813d ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 tools/testing/selftests/drivers/net/napi_threaded.py | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/napi_threaded.py b/tools/testing/selftests/drivers/net/napi_threaded.py
index b2698db39817..9699a100a87d 100755
--- a/tools/testing/selftests/drivers/net/napi_threaded.py
+++ b/tools/testing/selftests/drivers/net/napi_threaded.py
@@ -35,6 +35,8 @@ from lib.py import cmd, defer, ethtool
     threaded = cmd(f"cat /sys/class/net/{cfg.ifname}/threaded").stdout
     defer(_set_threaded_state, cfg, threaded)
 
+    return combined
+
 
 def enable_dev_threaded_disable_napi_threaded(cfg, nl) -> None:
     """
@@ -49,7 +51,7 @@ from lib.py import cmd, defer, ethtool
     napi0_id = napis[0]['id']
     napi1_id = napis[1]['id']
 
-    _setup_deferred_cleanup(cfg)
+    qcnt = _setup_deferred_cleanup(cfg)
 
     # set threaded
     _set_threaded_state(cfg, 1)
@@ -62,7 +64,7 @@ from lib.py import cmd, defer, ethtool
     nl.napi_set({'id': napi1_id, 'threaded': 'disabled'})
 
     cmd(f"ethtool -L {cfg.ifname} combined 1")
-    cmd(f"ethtool -L {cfg.ifname} combined 2")
+    cmd(f"ethtool -L {cfg.ifname} combined {qcnt}")
     _assert_napi_threaded_enabled(nl, napi0_id)
     _assert_napi_threaded_disabled(nl, napi1_id)
 
@@ -80,7 +82,7 @@ from lib.py import cmd, defer, ethtool
     napi0_id = napis[0]['id']
     napi1_id = napis[1]['id']
 
-    _setup_deferred_cleanup(cfg)
+    qcnt = _setup_deferred_cleanup(cfg)
 
     # set threaded
     _set_threaded_state(cfg, 1)
@@ -90,7 +92,7 @@ from lib.py import cmd, defer, ethtool
     _assert_napi_threaded_enabled(nl, napi1_id)
 
     cmd(f"ethtool -L {cfg.ifname} combined 1")
-    cmd(f"ethtool -L {cfg.ifname} combined 2")
+    cmd(f"ethtool -L {cfg.ifname} combined {qcnt}")
 
     # check napi threaded is set for both napis
     _assert_napi_threaded_enabled(nl, napi0_id)
On Fri, Aug 08, 2025 at 05:12:03PM -0700, Jakub Kicinski wrote:
> The test implicitly assumes the device has only 2 queues. A real device will likely have more. The exact problem is that because NAPIs get added to the list from the head, the netlink dump reports them in reverse order. So the naive napis[0] will actually likely give us the _last_ NAPI, not the first one. Re-enable all the NAPIs instead of hard-coding 2 in the test. This way the NAPIs we operated on will always reappear, no matter where they were in the registration order.
>
> Fixes: e6d76268813d ("net: Update threaded state in napi config in netif_set_threaded")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  tools/testing/selftests/drivers/net/napi_threaded.py | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)

Reviewed-by: Joe Damato <joe@dama.to>
We have to make sure that all future NAPIs will have the right threaded state when the state is configured on the device level. We chose not to have an "unset" state for threaded, and not to wipe the NAPI config clean when channels are explicitly disabled. This means the persistent config structs "exist" even when their NAPIs are not instantiated.
Differently put - the NAPI persistent state lives in the net_device (ncfg == struct napi_config):
            ,--- [napi 0] - [napi 1]
 [dev]      |                  |
            `--- [ncfg 0] - [ncfg 1]

so say we have a device with 2 queues but only 1 enabled:

            ,--- [napi 0]
 [dev]      |
            `--- [ncfg 0] - [ncfg 1]

now we set the device to threaded=1:

                   ,---------- [napi 0 (thr:1)]
 [dev(thr:1)]      |
                   `---------- [ncfg 0 (thr:1)] - [ncfg 1 (thr:?)]
Since [ncfg 1] was not attached to a NAPI during configuration we skipped it. If we create a NAPI for it later it will have the old setting (presumably disabled). One could argue whether this is right or not "in principle", but it's definitely not how things worked before per-NAPI config.
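The skipped-config case can be reproduced with a small Python model. This is only a sketch under stated assumptions: the Dev class and the helper names are hypothetical, and the model keeps nothing but the pieces discussed above (a device-level flag, one persistent config per possible queue, and the count of instantiated NAPIs).

```python
class Dev:
    """Toy device: one persistent config slot per possible queue."""

    def __init__(self, max_queues, active):
        self.threaded = 0
        self.active = active                 # NAPIs currently instantiated
        self.ncfg_threaded = [0] * max_queues  # persistent per-NAPI config


def set_threaded_listed_only(dev, value):
    """Behaviour described above: only configs attached to a live NAPI change."""
    dev.threaded = value
    for idx in range(dev.active):
        dev.ncfg_threaded[idx] = value


def set_threaded_all_configs(dev, value):
    """Behaviour after the fix: every allocated config is overridden."""
    dev.threaded = value
    for idx in range(len(dev.ncfg_threaded)):
        dev.ncfg_threaded[idx] = value


def enable_queues(dev, count):
    """New NAPIs pick up whatever their persistent config says."""
    dev.active = count
    return dev.ncfg_threaded[:count]


before = Dev(max_queues=2, active=1)
set_threaded_listed_only(before, 1)
stale = enable_queues(before, 2)

after = Dev(max_queues=2, active=1)
set_threaded_all_configs(after, 1)
consistent = enable_queues(after, 2)
print(stale, consistent)
```

With the first helper the second NAPI comes up with the old value even though the device says threaded=1; with the second helper both agree with the device.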
Fixes: 2677010e7793 ("Add support to set NAPI threaded for individual NAPI")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2: add missing kdoc
---
 include/linux/netdevice.h | 5 ++++-
 net/core/dev.c            | 7 ++++++-
 2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5e5de4b0a433..f3a3b761abfb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2071,6 +2071,8 @@ enum netdev_reg_state {
  *	@max_pacing_offload_horizon: max EDT offload horizon in nsec.
  *	@napi_config: An array of napi_config structures containing per-NAPI
  *		      settings.
+ *	@num_napi_configs:	number of allocated NAPI config structs,
+ *		always >= max(num_rx_queues, num_tx_queues).
  *	@gro_flush_timeout:	timeout for GRO layer in NAPI
  *	@napi_defer_hard_irqs:	If not zero, provides a counter that would
  *				allow to avoid NIC hard IRQ, on busy queues.
@@ -2482,8 +2484,9 @@ struct net_device {
 
 	u64			max_pacing_offload_horizon;
 	struct napi_config	*napi_config;
-	unsigned long		gro_flush_timeout;
+	u32			num_napi_configs;
 	u32			napi_defer_hard_irqs;
+	unsigned long		gro_flush_timeout;
 
 	/**
 	 * @up: copy of @state's IFF_UP, but safe to read with just @lock.
diff --git a/net/core/dev.c b/net/core/dev.c
index 68dc47d7e700..f180746382a1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6999,7 +6999,7 @@ int netif_set_threaded(struct net_device *dev,
 		       enum netdev_napi_threaded threaded)
 {
 	struct napi_struct *napi;
-	int err = 0;
+	int i, err = 0;
 
 	netdev_assert_locked_or_invisible(dev);
 
@@ -7021,6 +7021,10 @@ int netif_set_threaded(struct net_device *dev,
 	list_for_each_entry(napi, &dev->napi_list, dev_list)
 		WARN_ON_ONCE(napi_set_threaded(napi, threaded));
 
+	/* Override the config for all NAPIs even if currently not listed */
+	for (i = 0; i < dev->num_napi_configs; i++)
+		dev->napi_config[i].threaded = threaded;
+
 	return err;
 }
 
@@ -11873,6 +11877,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		goto free_all;
 	dev->cfg_pending = dev->cfg;
 
+	dev->num_napi_configs = maxqs;
 	napi_config_sz = array_size(maxqs, sizeof(*dev->napi_config));
 	dev->napi_config = kvzalloc(napi_config_sz, GFP_KERNEL_ACCOUNT);
 	if (!dev->napi_config)
On Fri, Aug 08, 2025 at 05:12:04PM -0700, Jakub Kicinski wrote:
> We chose not to have an "unset" state for threaded, and not to wipe the NAPI config clean when channels are explicitly disabled.
Yea... I wonder if we could change that now or if it's too late? I think this is the thing you mentioned that I couldn't recall in my response to the cover letter.
> This means the persistent config structs "exist" even when their NAPIs are not instantiated.
>
> Differently put - the NAPI persistent state lives in the net_device (ncfg == struct napi_config):
>
>             ,--- [napi 0] - [napi 1]
>  [dev]      |                  |
>             `--- [ncfg 0] - [ncfg 1]
>
> so say we have a device with 2 queues but only 1 enabled:
>
>             ,--- [napi 0]
>  [dev]      |
>             `--- [ncfg 0] - [ncfg 1]
>
> now we set the device to threaded=1:
>
>                    ,---------- [napi 0 (thr:1)]
>  [dev(thr:1)]      |
>                    `---------- [ncfg 0 (thr:1)] - [ncfg 1 (thr:?)]
>
> Since [ncfg 1] was not attached to a NAPI during configuration we skipped it. If we create a NAPI for it later it will have the old setting (presumably disabled). One could argue whether this is right or not "in principle", but it's definitely not how things worked before per-NAPI config.
Thanks for the detailed commit message. I agree that it should probably work the same now.
> Fixes: 2677010e7793 ("Add support to set NAPI threaded for individual NAPI")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> v2: add missing kdoc
> ---
>  include/linux/netdevice.h | 5 ++++-
>  net/core/dev.c            | 7 ++++++-
>  2 files changed, 10 insertions(+), 2 deletions(-)

Reviewed-by: Joe Damato <joe@dama.to>
The following order of calls currently deadlocks if:
 - device has threaded=1; and
 - NAPI has persistent config with threaded=0.

  netif_napi_add_weight_config()
    dev->threaded == 1
      napi_kthread_create()

  napi_enable()
    napi_restore_config()
      napi_set_threaded(0)
        napi_stop_kthread()
          while (NAPIF_STATE_SCHED)
            msleep(20)
We deadlock because a disabled NAPI has STATE_SCHED set, so the wait loop in napi_stop_kthread() never exits. Creating a thread in netif_napi_add() just to destroy it in napi_disable() is fairly ugly in the first place. Let's read both the device config and the NAPI config in netif_napi_add().
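The decision being changed can be sketched in Python as follows. This is a toy model and not the kernel code: the function names are hypothetical, None stands in for a NAPI without a persistent config, and the "stop path" helper only encodes the condition from the call sequence above.

```python
def create_kthread_dev_only(dev_threaded, ncfg_threaded):
    """Old decision at NAPI add time: look at the device setting only."""
    return bool(dev_threaded)


def create_kthread_with_config(dev_threaded, ncfg_threaded):
    """New decision: prefer the persistent per-NAPI config when one exists."""
    if ncfg_threaded is not None:
        return bool(ncfg_threaded)
    return bool(dev_threaded)


def enable_hits_stop_path(kthread_created, ncfg_threaded):
    """Restoring threaded=0 onto a NAPI that already has a kthread must stop it,
    which is the step that waits on STATE_SCHED in the sequence above."""
    return kthread_created and ncfg_threaded == 0


# Device threaded=1, persistent config threaded=0: the problematic mix.
old_created = create_kthread_dev_only(1, 0)
new_created = create_kthread_with_config(1, 0)
old_deadlocks = enable_hits_stop_path(old_created, 0)
new_deadlocks = enable_hits_stop_path(new_created, 0)

# A NAPI without persistent config still follows the device setting.
no_config_created = create_kthread_with_config(1, None)
print(old_deadlocks, new_deadlocks, no_config_created)
```

In the model, consulting the per-NAPI config up front means no thread is created that the enable path would then have to tear down.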
Fixes: e6d76268813d ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/dev.h | 8 ++++++++
 net/core/dev.c | 5 +++--
 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/core/dev.h b/net/core/dev.h
index ab69edc0c3e3..d6b08d435479 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -323,6 +323,14 @@ static inline enum netdev_napi_threaded napi_get_threaded(struct napi_struct *n)
 	return NETDEV_NAPI_THREADED_DISABLED;
 }
 
+static inline enum netdev_napi_threaded
+napi_get_threaded_config(struct net_device *dev, struct napi_struct *n)
+{
+	if (n->config)
+		return n->config->threaded;
+	return dev->threaded;
+}
+
 int napi_set_threaded(struct napi_struct *n,
 		      enum netdev_napi_threaded threaded);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index f180746382a1..5a3c0f40a93f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7357,8 +7357,9 @@ void netif_napi_add_weight_locked(struct net_device *dev,
 	 * Clear dev->threaded if kthread creation failed so that
 	 * threaded mode will not be enabled in napi_enable().
 	 */
-	if (dev->threaded && napi_kthread_create(napi))
-		dev->threaded = NETDEV_NAPI_THREADED_DISABLED;
+	if (napi_get_threaded_config(dev, napi))
+		if (napi_kthread_create(napi))
+			dev->threaded = NETDEV_NAPI_THREADED_DISABLED;
 	netif_napi_set_irq_locked(napi, -1);
 }
 EXPORT_SYMBOL(netif_napi_add_weight_locked);
On Fri, Aug 08, 2025 at 05:12:05PM -0700, Jakub Kicinski wrote:
> The following order of calls currently deadlocks if:
>  - device has threaded=1; and
>  - NAPI has persistent config with threaded=0.
>
>   netif_napi_add_weight_config()
>     dev->threaded == 1
>       napi_kthread_create()
>
>   napi_enable()
>     napi_restore_config()
>       napi_set_threaded(0)
>         napi_stop_kthread()
>           while (NAPIF_STATE_SCHED)
>             msleep(20)
>
> We deadlock because a disabled NAPI has STATE_SCHED set, so the wait loop in napi_stop_kthread() never exits. Creating a thread in netif_napi_add() just to destroy it in napi_disable() is fairly ugly in the first place. Let's read both the device config and the NAPI config in netif_napi_add().
>
> Fixes: e6d76268813d ("net: Update threaded state in napi config in netif_set_threaded")
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
>  net/core/dev.h | 8 ++++++++
>  net/core/dev.c | 5 +++--
>  2 files changed, 11 insertions(+), 2 deletions(-)

Reviewed-by: Joe Damato <joe@dama.to>
On Fri, Aug 08, 2025 at 05:12:02PM -0700, Jakub Kicinski wrote:
> Running the test added with a recent fix on a driver with persistent NAPI config leads to a deadlock. The deadlock is fixed by patch 3; patch 2 addresses what I think is a more fundamental problem with the way we implemented the config.
>
> I hope the fix makes sense, my own thinking is definitely colored by my preference (IOW how the per-queue config RFC was implemented).
Maybe it's too late now, but I am open to revisiting how the whole per-queue NAPI config works after a conversation we had a couple months ago (IIRC ?).
I think you had proposed something that made sense to me at the time (although I can't recall what that was or what thread that was in).
On Sat, 9 Aug 2025 18:59:27 -0700 Joe Damato wrote:
> On Fri, Aug 08, 2025 at 05:12:02PM -0700, Jakub Kicinski wrote:
> > Running the test added with a recent fix on a driver with persistent NAPI config leads to a deadlock. The deadlock is fixed by patch 3; patch 2 addresses what I think is a more fundamental problem with the way we implemented the config.
> >
> > I hope the fix makes sense, my own thinking is definitely colored by my preference (IOW how the per-queue config RFC was implemented).
>
> Maybe it's too late now, but I am open to revisiting how the whole per-queue NAPI config works after a conversation we had a couple months ago (IIRC ?).
>
> I think you had proposed something that made sense to me at the time (although I can't recall what that was or what thread that was in).
FWIW the discussion was whether setting things at the device level should override all the per-NAPI settings, or whether we should treat the device level as lower priority and only apply it if the user didn't set a per-NAPI override.
I guess it doesn't make a huge difference, other than that resetting the unused NAPIs to "unset" would remove the need for patch 2.
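The two options can be compared with a small Python sketch. It is purely illustrative: the UNSET marker, the Device class and the helper names are hypothetical, and the kernel as discussed in this thread has no "unset" state for threaded.

```python
UNSET = None  # hypothetical "no per-NAPI override" marker


class Device:
    def __init__(self, num_configs):
        self.threaded = 0
        self.napi_threaded = [UNSET] * num_configs


def set_dev_override(dev, value):
    """Option 1: a device-level write overrides every per-NAPI setting."""
    dev.threaded = value
    dev.napi_threaded = [value] * len(dev.napi_threaded)


def set_dev_fallback(dev, value):
    """Option 2: the device level is lower priority; per-NAPI values are kept."""
    dev.threaded = value


def effective(dev, idx):
    """Per-NAPI value wins when set, otherwise fall back to the device."""
    per_napi = dev.napi_threaded[idx]
    return dev.threaded if per_napi is UNSET else per_napi


override_dev = Device(3)
override_dev.napi_threaded[1] = 0      # user disabled NAPI 1 earlier
set_dev_override(override_dev, 1)
override_result = [effective(override_dev, i) for i in range(3)]

fallback_dev = Device(3)
fallback_dev.napi_threaded[1] = 0      # same per-NAPI override
set_dev_fallback(fallback_dev, 1)
fallback_result = [effective(fallback_dev, i) for i in range(3)]
print(override_result, fallback_result)
```

Under the fallback option, configs left at UNSET follow the device without anyone looping over them, which is the sense in which an "unset" state would make patch 2 unnecessary.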
Hello:
This series was applied to netdev/net.git (main) by Paolo Abeni <pabeni@redhat.com>:
On Fri, 8 Aug 2025 17:12:02 -0700 you wrote:
> Running the test added with a recent fix on a driver with persistent NAPI config leads to a deadlock. The deadlock is fixed by patch 3; patch 2 addresses what I think is a more fundamental problem with the way we implemented the config.
>
> I hope the fix makes sense, my own thinking is definitely colored by my preference (IOW how the per-queue config RFC was implemented).
>
> [...]
Here is the summary with links:
  - [net,v2,1/3] selftests: drv-net: don't assume device has only 2 queues
    https://git.kernel.org/netdev/net/c/bda053d64457
  - [net,v2,2/3] net: update NAPI threaded config even for disabled NAPIs
    https://git.kernel.org/netdev/net/c/ccba9f6baa90
  - [net,v2,3/3] net: prevent deadlocks when enabling NAPIs with mixed kthread config
    https://git.kernel.org/netdev/net/c/b3fc08ab9a56
You are awesome, thank you!