Running the test added with a recent fix on a driver with persistent NAPI config leads to a deadlock. Patch 3 fixes the deadlock; patch 2 addresses what I think is a more fundamental problem with the way we implemented the config.
I hope the fix makes sense, my own thinking is definitely colored by my preference (IOW how the per-queue config RFC was implemented).
Jakub Kicinski (3):
  selftests: drv-net: don't assume device has only 2 queues
  net: update NAPI threaded config even for disabled NAPIs
  net: prevent deadlocks when enabling NAPIs with mixed kthread config

 include/linux/netdevice.h                            |  3 ++-
 net/core/dev.h                                       |  8 ++++++++
 net/core/dev.c                                       | 12 +++++++++---
 tools/testing/selftests/drivers/net/napi_threaded.py | 10 ++++++----
 4 files changed, 25 insertions(+), 8 deletions(-)
The test is implicitly assuming the device only has 2 queues. A real device will likely have more. The exact problem is that because NAPIs get added to the list from the head, the netlink dump reports them in reverse order. So the naive napis[0] will actually likely give us the _last_ NAPI, not the first one. Re-enable all the NAPIs instead of hard-coding 2 in the test. This way the NAPIs we operated on will always reappear, doesn't matter where they were in the registration order.
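To make the ordering problem concrete, here is a toy model (plain Python, not kernel or selftest code). It assumes one NAPI per queue index and that the dump simply walks a head-inserted list; the helper names `dump_napis` and `picked_napis_survive` are made up for this illustration.

```python
# Toy model, not kernel code: each queue index owns one NAPI, and new NAPIs
# are linked at the head of the device list, so a dump walks them newest-first.
def dump_napis(enabled_queues):
    napi_list = []
    for queue in range(enabled_queues):
        napi_list.insert(0, queue)  # head insertion reverses the order
    return napi_list


def picked_napis_survive(device_queues, restore_to):
    """Pick napis[0] and napis[1] from the dump, shrink to one queue,
    then grow back to `restore_to` queues and check both are listed again."""
    picked = dump_napis(device_queues)[:2]
    after = dump_napis(restore_to)  # state after "combined 1", "combined N"
    return all(napi in after for napi in picked)


print(dump_napis(4))               # [3, 2, 1, 0]
print(picked_napis_survive(4, 2))  # False: NAPIs 3 and 2 never come back
print(picked_napis_survive(4, 4))  # True: restoring the full count works
```

On a 2-queue device the hard-coded "combined 2" happens to restore everything, which is why the assumption went unnoticed.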
Fixes: e6d76268813d ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 tools/testing/selftests/drivers/net/napi_threaded.py | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/napi_threaded.py b/tools/testing/selftests/drivers/net/napi_threaded.py
index b2698db39817..9699a100a87d 100755
--- a/tools/testing/selftests/drivers/net/napi_threaded.py
+++ b/tools/testing/selftests/drivers/net/napi_threaded.py
@@ -35,6 +35,8 @@ from lib.py import cmd, defer, ethtool
     threaded = cmd(f"cat /sys/class/net/{cfg.ifname}/threaded").stdout
     defer(_set_threaded_state, cfg, threaded)
 
+    return combined
+
 
 def enable_dev_threaded_disable_napi_threaded(cfg, nl) -> None:
     """
@@ -49,7 +51,7 @@ from lib.py import cmd, defer, ethtool
     napi0_id = napis[0]['id']
     napi1_id = napis[1]['id']
 
-    _setup_deferred_cleanup(cfg)
+    qcnt = _setup_deferred_cleanup(cfg)
 
     # set threaded
     _set_threaded_state(cfg, 1)
@@ -62,7 +64,7 @@ from lib.py import cmd, defer, ethtool
     nl.napi_set({'id': napi1_id, 'threaded': 'disabled'})
 
     cmd(f"ethtool -L {cfg.ifname} combined 1")
-    cmd(f"ethtool -L {cfg.ifname} combined 2")
+    cmd(f"ethtool -L {cfg.ifname} combined {qcnt}")
     _assert_napi_threaded_enabled(nl, napi0_id)
     _assert_napi_threaded_disabled(nl, napi1_id)
 
@@ -80,7 +82,7 @@ from lib.py import cmd, defer, ethtool
     napi0_id = napis[0]['id']
     napi1_id = napis[1]['id']
 
-    _setup_deferred_cleanup(cfg)
+    qcnt = _setup_deferred_cleanup(cfg)
 
     # set threaded
     _set_threaded_state(cfg, 1)
@@ -90,7 +92,7 @@ from lib.py import cmd, defer, ethtool
     _assert_napi_threaded_enabled(nl, napi1_id)
 
     cmd(f"ethtool -L {cfg.ifname} combined 1")
-    cmd(f"ethtool -L {cfg.ifname} combined 2")
+    cmd(f"ethtool -L {cfg.ifname} combined {qcnt}")
 
     # check napi threaded is set for both napis
     _assert_napi_threaded_enabled(nl, napi0_id)
We have to make sure that all future NAPIs will have the right threaded state when the state is configured on the device level. We chose not to have an "unset" state for threaded, and not to wipe the NAPI config clean when channels are explicitly disabled. This means the persistent config structs "exist" even when their NAPIs are not instantiated.
Differently put - the NAPI persistent state lives in the net_device (ncfg == struct napi_config):
    ,--- [napi 0] - [napi 1]
 [dev]      |          |
    `--- [ncfg 0] - [ncfg 1]
so say we have a device with 2 queues but only 1 enabled:
    ,--- [napi 0]
 [dev]      |
    `--- [ncfg 0] - [ncfg 1]
now we set the device to threaded=1:
      ,---------- [napi 0 (thr:1)]
 [dev(thr:1)]          |
      `---------- [ncfg 0 (thr:1)] - [ncfg 1 (thr:?)]
Since [ncfg 1] was not attached to a NAPI during configuration we skipped it. If we create a NAPI for it later it will have the old setting (presumably disabled). One could argue whether this is right or not "in principle", but it's definitely not how things worked before per-NAPI config.
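The skipped-config case can be modelled in a few lines of Python. This is only a sketch under simplifying assumptions: `ToyDev` and the helper functions are invented for illustration, threaded state is reduced to 0/1, and "all configs" stands in for iterating every persistent config slot of the device.

```python
class ToyDev:
    """Toy device: persistent config slots exist for every possible queue."""

    def __init__(self, max_queues, enabled_queues):
        self.threaded = 0
        self.ncfg = [0] * max_queues             # persistent per-NAPI config
        self.napis = list(range(enabled_queues))  # currently instantiated NAPIs


def set_threaded_listed_only(dev, value):
    # Only configs attached to a listed NAPI get updated.
    dev.threaded = value
    for idx in dev.napis:
        dev.ncfg[idx] = value


def set_threaded_all_configs(dev, value):
    # Additionally override every config slot, listed or not.
    set_threaded_listed_only(dev, value)
    for idx in range(len(dev.ncfg)):
        dev.ncfg[idx] = value


def enable_queue(dev, idx):
    # A new NAPI picks up whatever its persistent config says.
    dev.napis.append(idx)
    return dev.ncfg[idx]


stale = ToyDev(max_queues=2, enabled_queues=1)
set_threaded_listed_only(stale, 1)
print(enable_queue(stale, 1))  # 0: the new NAPI ignores dev threaded=1

fixed = ToyDev(max_queues=2, enabled_queues=1)
set_threaded_all_configs(fixed, 1)
print(enable_queue(fixed, 1))  # 1: the new NAPI follows the device setting
```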
Fixes: 2677010e7793 ("Add support to set NAPI threaded for individual NAPI")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/netdevice.h | 3 ++-
 net/core/dev.c            | 7 ++++++-
 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5e5de4b0a433..bfda1d7b9ee0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2482,8 +2482,9 @@ struct net_device {
 
 	u64			max_pacing_offload_horizon;
 	struct napi_config	*napi_config;
-	unsigned long		gro_flush_timeout;
+	u32			num_napi_configs;
 	u32			napi_defer_hard_irqs;
+	unsigned long		gro_flush_timeout;
 
 	/**
 	 * @up: copy of @state's IFF_UP, but safe to read with just @lock.
diff --git a/net/core/dev.c b/net/core/dev.c
index 68dc47d7e700..f180746382a1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6999,7 +6999,7 @@ int netif_set_threaded(struct net_device *dev,
 		       enum netdev_napi_threaded threaded)
 {
 	struct napi_struct *napi;
-	int err = 0;
+	int i, err = 0;
 
 	netdev_assert_locked_or_invisible(dev);
 
@@ -7021,6 +7021,10 @@ int netif_set_threaded(struct net_device *dev,
 	list_for_each_entry(napi, &dev->napi_list, dev_list)
 		WARN_ON_ONCE(napi_set_threaded(napi, threaded));
 
+	/* Override the config for all NAPIs even if currently not listed */
+	for (i = 0; i < dev->num_napi_configs; i++)
+		dev->napi_config[i].threaded = threaded;
+
 	return err;
 }
 
@@ -11873,6 +11877,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		goto free_all;
 	dev->cfg_pending = dev->cfg;
 
+	dev->num_napi_configs = maxqs;
 	napi_config_sz = array_size(maxqs, sizeof(*dev->napi_config));
 	dev->napi_config = kvzalloc(napi_config_sz, GFP_KERNEL_ACCOUNT);
 	if (!dev->napi_config)
The following order of calls currently deadlocks if:
 - device has threaded=1; and
 - NAPI has persistent config with threaded=0.
  netif_napi_add_weight_config()
    dev->threaded == 1
      napi_kthread_create()

  napi_enable()
    napi_restore_config()
      napi_set_threaded(0)
        napi_stop_kthread()
          while (NAPIF_STATE_SCHED)
            msleep(20)
We deadlock because a disabled NAPI has STATE_SCHED set. Creating a thread in netif_napi_add() just to destroy it in napi_enable() is fairly ugly in the first place. Let's read both the device config and the NAPI config in netif_napi_add().
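As a rough illustration of the decision being changed, here is a Python toy model rather than the kernel logic itself. `threaded_config` mirrors the precedence of the napi_get_threaded_config() helper in the patch (per-NAPI config wins when present, else the device setting); `add_then_enable` and its return values are hypothetical names used only to show when a kthread would have to be stopped from napi_enable().

```python
def threaded_config(dev_threaded, ncfg_threaded):
    """Per-NAPI persistent config wins; fall back to the device setting.
    ncfg_threaded is None when the NAPI has no persistent config."""
    if ncfg_threaded is not None:
        return ncfg_threaded
    return dev_threaded


def add_then_enable(dev_threaded, ncfg_threaded, use_config):
    """Return (kthread_created, must_stop_in_enable) for one NAPI.

    use_config=False models the old check (device setting only),
    use_config=True models consulting the NAPI config at add time."""
    if use_config:
        wanted = threaded_config(dev_threaded, ncfg_threaded)
    else:
        wanted = dev_threaded
    kthread_created = bool(wanted)
    # Restoring a threaded=0 config at enable time means stopping the
    # kthread, which waits on a SCHED bit that a disabled NAPI keeps set.
    must_stop_in_enable = kthread_created and ncfg_threaded == 0
    return kthread_created, must_stop_in_enable


print(add_then_enable(1, 0, use_config=False))    # (True, True): deadlock path
print(add_then_enable(1, 0, use_config=True))     # (False, False)
print(add_then_enable(1, None, use_config=True))  # (True, False)
```

With the config consulted up front, the thread is never created for a NAPI whose persistent config says threaded=0, so there is nothing to stop later.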
Fixes: e6d76268813d ("net: Update threaded state in napi config in netif_set_threaded")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 net/core/dev.h | 8 ++++++++
 net/core/dev.c | 5 +++--
 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/core/dev.h b/net/core/dev.h
index ab69edc0c3e3..d6b08d435479 100644
--- a/net/core/dev.h
+++ b/net/core/dev.h
@@ -323,6 +323,14 @@ static inline enum netdev_napi_threaded napi_get_threaded(struct napi_struct *n)
 	return NETDEV_NAPI_THREADED_DISABLED;
 }
 
+static inline enum netdev_napi_threaded
+napi_get_threaded_config(struct net_device *dev, struct napi_struct *n)
+{
+	if (n->config)
+		return n->config->threaded;
+	return dev->threaded;
+}
+
 int napi_set_threaded(struct napi_struct *n,
 		      enum netdev_napi_threaded threaded);
 
diff --git a/net/core/dev.c b/net/core/dev.c
index f180746382a1..5a3c0f40a93f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7357,8 +7357,9 @@ void netif_napi_add_weight_locked(struct net_device *dev,
 	 * Clear dev->threaded if kthread creation failed so that
 	 * threaded mode will not be enabled in napi_enable().
 	 */
-	if (dev->threaded && napi_kthread_create(napi))
-		dev->threaded = NETDEV_NAPI_THREADED_DISABLED;
+	if (napi_get_threaded_config(dev, napi))
+		if (napi_kthread_create(napi))
+			dev->threaded = NETDEV_NAPI_THREADED_DISABLED;
 	netif_napi_set_irq_locked(napi, -1);
 }
 EXPORT_SYMBOL(netif_napi_add_weight_locked);