The first patch fixes xfrm offload feature setup in active-backup mode. The second patch adds an ipsec offload test.
Hangbin Liu (2):
  bonding: fix xfrm offload feature setup on active-backup mode
  selftests: bonding: add ipsec offload test

 drivers/net/bonding/bond_main.c                    |   2 +-
 drivers/net/bonding/bond_netlink.c                 |  17 +-
 include/net/bonding.h                              |   1 +
 .../selftests/drivers/net/bonding/Makefile         |   3 +-
 .../drivers/net/bonding/bond_ipsec_offload.sh      | 155 ++++++++++++++++++
 .../selftests/drivers/net/bonding/config           |   4 +
 6 files changed, 173 insertions(+), 9 deletions(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
The active-backup bonding mode supports XFRM ESP offload. However, when a bond is added using a command like `ip link add bond0 type bond mode 1 miimon 100`, the `ethtool -k` command shows that XFRM ESP offload is disabled. This occurs because bond_newlink() changes the bond link first and registers the bond device later. The XFRM feature update in bond_option_mode_set() is therefore not called, as the bond device is not yet registered, and the offload feature is never set.
To resolve this issue, reorder the code in bond_newlink() so that the bond device is registered before the bond link parameters are changed. This allows the XFRM ESP offload feature to be enabled correctly.
Fixes: 007ab5345545 ("bonding: fix feature flag setting at init time")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 drivers/net/bonding/bond_main.c    |  2 +-
 drivers/net/bonding/bond_netlink.c | 17 ++++++++++-------
 include/net/bonding.h              |  1 +
 3 files changed, 12 insertions(+), 8 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 49dd4fe195e5..7daeab67e7b5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4389,7 +4389,7 @@ void bond_work_init_all(struct bonding *bond)
 	INIT_DELAYED_WORK(&bond->slave_arr_work, bond_slave_arr_handler);
 }
 
-static void bond_work_cancel_all(struct bonding *bond)
+void bond_work_cancel_all(struct bonding *bond)
 {
 	cancel_delayed_work_sync(&bond->mii_work);
 	cancel_delayed_work_sync(&bond->arp_work);
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c
index 2a6a424806aa..7fe8c62366eb 100644
--- a/drivers/net/bonding/bond_netlink.c
+++ b/drivers/net/bonding/bond_netlink.c
@@ -568,18 +568,21 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
 			struct nlattr *tb[], struct nlattr *data[],
 			struct netlink_ext_ack *extack)
 {
+	struct bonding *bond = netdev_priv(bond_dev);
 	int err;
 
-	err = bond_changelink(bond_dev, tb, data, extack);
-	if (err < 0)
+	err = register_netdevice(bond_dev);
+	if (err)
 		return err;
 
-	err = register_netdevice(bond_dev);
-	if (!err) {
-		struct bonding *bond = netdev_priv(bond_dev);
+	netif_carrier_off(bond_dev);
+	bond_work_init_all(bond);
 
-		netif_carrier_off(bond_dev);
-		bond_work_init_all(bond);
+	err = bond_changelink(bond_dev, tb, data, extack);
+	if (err) {
+		bond_work_cancel_all(bond);
+		netif_carrier_on(bond_dev);
+		unregister_netdevice(bond_dev);
 	}
 
 	return err;
diff --git a/include/net/bonding.h b/include/net/bonding.h
index 8bb5f016969f..e5e005cd2e17 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -707,6 +707,7 @@ struct bond_vlan_tag *bond_verify_device_path(struct net_device *start_dev,
 int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave);
 void bond_slave_arr_work_rearm(struct bonding *bond, unsigned long delay);
 void bond_work_init_all(struct bonding *bond);
+void bond_work_cancel_all(struct bonding *bond);
 
 #ifdef CONFIG_PROC_FS
 void bond_create_proc_entry(struct bonding *bond);
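The symptom described in the commit message can be checked from user space by looking at the `esp-hw-offload` line of `ethtool -k`. The following is only a rough sketch and not part of the patch: the helper name `esp_offload_enabled` is hypothetical, and it assumes the feature is reported as `esp-hw-offload: on` or `esp-hw-offload: off`, optionally followed by `[fixed]`.

```bash
# Hypothetical helper (not part of the patch): succeed if the
# "esp-hw-offload" feature is reported as "on" in `ethtool -k`
# output supplied on stdin.
esp_offload_enabled()
{
	# Accept optional leading whitespace and an optional suffix
	# such as "[fixed]" after the on/off value.
	grep -Eq '^[[:space:]]*esp-hw-offload:[[:space:]]+on([[:space:]]|$)'
}

# Example usage on a real system (assumes bond0 exists and ethtool
# is installed):
#   ip link add bond0 type bond mode 1 miimon 100
#   ethtool -k bond0 | esp_offload_enabled && echo "ESP offload enabled"
```

Reading from stdin keeps the check independent of any particular device, so the same function could be pointed at saved `ethtool -k` output.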
On 12/11/24 09:11, Hangbin Liu wrote:
> --- a/drivers/net/bonding/bond_netlink.c
> +++ b/drivers/net/bonding/bond_netlink.c
> @@ -568,18 +568,21 @@ static int bond_newlink(struct net *src_net, struct net_device *bond_dev,
>  			struct nlattr *tb[], struct nlattr *data[],
>  			struct netlink_ext_ack *extack)
>  {
> +	struct bonding *bond = netdev_priv(bond_dev);
>  	int err;
>
> -	err = bond_changelink(bond_dev, tb, data, extack);
> -	if (err < 0)
> +	err = register_netdevice(bond_dev);
> +	if (err)
>  		return err;
>
> -	err = register_netdevice(bond_dev);
> -	if (!err) {
> -		struct bonding *bond = netdev_priv(bond_dev);
> +	netif_carrier_off(bond_dev);
> +	bond_work_init_all(bond);
>
> -		netif_carrier_off(bond_dev);
> -		bond_work_init_all(bond);
> +	err = bond_changelink(bond_dev, tb, data, extack);
> +	if (err) {
> +		bond_work_cancel_all(bond);
> +		netif_carrier_on(bond_dev);

The patch looks good, but I'm curious why the carrier on here?

> +		unregister_netdevice(bond_dev);
>  	}
>
>  	return err;
On Thu, Dec 12, 2024 at 11:19:33AM +0200, Nikolay Aleksandrov wrote:
> > +	err = bond_changelink(bond_dev, tb, data, extack);
> > +	if (err) {
> > +		bond_work_cancel_all(bond);
> > +		netif_carrier_on(bond_dev);
>
> The patch looks good, but I'm curious why the carrier on here?

The current code calls netif_carrier_off(bond_dev) after register_netdevice() succeeds, so I turn the carrier back on if registration fails.

Thanks
Hangbin
On 12/12/24 11:39, Hangbin Liu wrote:
> On Thu, Dec 12, 2024 at 11:19:33AM +0200, Nikolay Aleksandrov wrote:
> > > +	err = bond_changelink(bond_dev, tb, data, extack);
> > > +	if (err) {
> > > +		bond_work_cancel_all(bond);
> > > +		netif_carrier_on(bond_dev);
> >
> > The patch looks good, but I'm curious why the carrier on here?
>
> The current code calls netif_carrier_off(bond_dev) after register_netdevice() succeeds, so I turn the carrier back on if registration fails.
>
> Thanks
> Hangbin

I don't like adding code just for symmetry alone, I think you should drop it unless there is an actual reason to turn carrier on.
On Thu, Dec 12, 2024 at 11:43:15AM +0200, Nikolay Aleksandrov wrote:
> > > > +		bond_work_cancel_all(bond);
> > > > +		netif_carrier_on(bond_dev);
> > >
> > > The patch looks good, but I'm curious why the carrier on here?
> >
> > The current code calls netif_carrier_off(bond_dev) after register_netdevice() succeeds, so I turn the carrier back on if registration fails.
>
> I don't like adding code just for symmetry alone, I think you should drop it unless there is an actual reason to turn carrier on.

OK, I will drop it.

Thanks
Hangbin
This introduces a test for IPSec offload over bonding, utilizing netdevsim for the testing process, as veth interfaces do not support IPSec offload. The test will ensure that the IPSec offload functionality remains operational even after a failover event occurs in the bonding configuration.
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
 .../selftests/drivers/net/bonding/Makefile         |   3 +-
 .../drivers/net/bonding/bond_ipsec_offload.sh      | 155 ++++++++++++++++++
 .../selftests/drivers/net/bonding/config           |   4 +
 3 files changed, 161 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile
index 03a089165d3f..c938475fdefa 100644
--- a/tools/testing/selftests/drivers/net/bonding/Makefile
+++ b/tools/testing/selftests/drivers/net/bonding/Makefile
@@ -10,7 +10,8 @@ TEST_PROGS := \
 	mode-2-recovery-updelay.sh \
 	bond_options.sh \
 	bond-eth-type-change.sh \
-	bond_macvlan.sh
+	bond_macvlan.sh \
+	bond_ipsec_offload.sh
 
 TEST_FILES := \
 	lag_lib.sh \
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh b/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
new file mode 100755
index 000000000000..868f22ad11aa
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/bonding/bond_ipsec_offload.sh
@@ -0,0 +1,155 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# IPsec over bonding offload test:
+#
+#  +----------------+
+#  |     bond0      |
+#  |       |        |
+#  |  eth0    eth1  |
+#  +---+-------+----+
+#
+# We use netdevsim instead of physical interfaces
+#-------------------------------------------------------------------
+# Example commands
+#   ip x s add proto esp src 192.0.2.1 dst 192.0.2.2 \
+#            spi 0x07 mode transport reqid 0x07 replay-window 32 \
+#            aead 'rfc4106(gcm(aes))' 1234567890123456dcba 128 \
+#            sel src 192.0.2.1/24 dst 192.0.2.2/24
+#            offload dev bond0 dir out
+#   ip x p add dir out src 192.0.2.1/24 dst 192.0.2.2/24 \
+#            tmpl proto esp src 192.0.2.1 dst 192.0.2.2 \
+#            spi 0x07 mode transport reqid 0x07
+#
+#-------------------------------------------------------------------
+
+lib_dir=$(dirname "$0")
+source "$lib_dir"/../../../net/lib.sh
+algo="aead rfc4106(gcm(aes)) 0x3132333435363738393031323334353664636261 128"
+srcip=192.0.2.1
+dstip=192.0.2.2
+ipsec0=/sys/kernel/debug/netdevsim/netdevsim0/ports/0/ipsec
+ipsec1=/sys/kernel/debug/netdevsim/netdevsim0/ports/1/ipsec
+ret=0
+
+cleanup()
+{
+	modprobe -r netdevsim
+	cleanup_ns $ns
+}
+
+active_slave_changed()
+{
+	local old_active_slave=$1
+	local new_active_slave=$(ip -n ${ns} -d -j link show bond0 | \
+				 jq -r ".[].linkinfo.info_data.active_slave")
+	[ "$new_active_slave" != "$old_active_slave" -a "$new_active_slave" != "null" ]
+}
+
+test_offload()
+{
+	# use ping to exercise the Tx path
+	ip netns exec $ns ping -I bond0 -c 3 -W 1 -i 0 $dstip >/dev/null
+
+	active_slave=$(ip -n ${ns} -d -j link show bond0 | \
+		       jq -r ".[].linkinfo.info_data.active_slave")
+
+	if [ $active_slave = $nic0 ]; then
+		sysfs=$ipsec0
+	elif [ $active_slave = $nic1 ]; then
+		sysfs=$ipsec1
+	else
+		echo "FAIL: bond_ipsec_offload invalid active_slave $active_slave"
+		ret=1
+	fi
+
+	# The tx/rx order in sysfs may changed after failover
+	if grep -q "SA count=2 tx=3" $sysfs && grep -q "tx ipaddr=$dstip" $sysfs; then
+		echo "PASS: bond_ipsec_offload has correct tx count with link ${active_slave}"
+	else
+		echo "FAIL: bond_ipsec_offload incorrect tx count with link ${active_slave}"
+		ret=1
+	fi
+}
+
+if ! mount | grep -q debugfs; then
+	mount -t debugfs none /sys/kernel/debug/ &> /dev/null
+fi
+
+# setup netdevsim since dummy/veth dev doesn't have offload support
+if [ ! -w /sys/bus/netdevsim/new_device ] ; then
+	modprobe -q netdevsim
+	if [ $? -ne 0 ]; then
+		echo "SKIP: can't load netdevsim for ipsec offload"
+		return $ksft_skip
+	fi
+fi
+
+trap cleanup EXIT
+
+setup_ns ns
+ip -n $ns link add bond0 type bond mode active-backup miimon 100
+ip -n $ns addr add $srcip/24 dev bond0
+ip -n $ns link set bond0 up
+
+ifaces=$(ip netns exec $ns bash -c '
+	sysfsnet=/sys/bus/netdevsim/devices/netdevsim0/net/
+	echo "0 2" > /sys/bus/netdevsim/new_device
+	while [ ! -d $sysfsnet ] ; do :; done
+	udevadm settle
+	ls $sysfsnet
+')
+nic0=$(echo $ifaces | cut -f1 -d ' ')
+nic1=$(echo $ifaces | cut -f2 -d ' ')
+ip -n $ns link set $nic0 master bond0
+ip -n $ns link set $nic1 master bond0
+
+# create offloaded SAs, both in and out
+ip -n $ns x p add dir out src $srcip/24 dst $dstip/24 \
+    tmpl proto esp src $srcip dst $dstip spi 9 \
+    mode transport reqid 42
+
+ip -n $ns x p add dir in src $dstip/24 dst $srcip/24 \
+    tmpl proto esp src $dstip dst $srcip spi 9 \
+    mode transport reqid 42
+
+ip -n $ns x s add proto esp src $srcip dst $dstip spi 9 \
+    mode transport reqid 42 $algo sel src $srcip/24 dst $dstip/24 \
+    offload dev bond0 dir out
+
+ip -n $ns x s add proto esp src $dstip dst $srcip spi 9 \
+    mode transport reqid 42 $algo sel src $dstip/24 dst $srcip/24 \
+    offload dev bond0 dir in
+
+# does offload show up in ip output
+lines=`ip -n $ns x s list | grep -c "crypto offload parameters: dev bond0 dir"`
+if [ $lines -ne 2 ] ; then
+	echo "FAIL: bond_ipsec_offload SA offload missing from list output"
+	ret=1
+fi
+
+# we didn't create a peer, make sure we can Tx by adding a permanent neighbour
+# this need to be added after enslave
+ip -n $ns neigh add $dstip dev bond0 lladdr 00:11:22:33:44:55
+
+# start Offload testing
+test_offload
+
+# do failover
+ip -n $ns link set $active_slave down
+slowwait 5 active_slave_changed $active_slave
+test_offload
+
+# make sure offload get removed from driver
+ip -n $ns x s flush
+ip -n $ns x p flush
+line0=$(grep -c "SA count=0" $ipsec0)
+line1=$(grep -c "SA count=0" $ipsec1)
+if [ $line0 -ne 1 -o $line1 -ne 1 ] ; then
+	echo "FAIL: bond_ipsec_offload SA not removed from driver"
+	ret=1
+else
+	echo "PASS: bond_ipsec_offload SA removed from driver"
+fi
+
+exit $ret
diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
index 899d7fb6ea8e..91c581abe79c 100644
--- a/tools/testing/selftests/drivers/net/bonding/config
+++ b/tools/testing/selftests/drivers/net/bonding/config
@@ -8,3 +8,7 @@ CONFIG_NET_CLS_FLOWER=y
 CONFIG_NET_SCH_INGRESS=y
 CONFIG_NLMON=y
 CONFIG_VETH=y
+CONFIG_INET_ESP=y
+CONFIG_INET_ESP_OFFLOAD=y
+CONFIG_XFRM_USER=m
+CONFIG_NETDEVSIM=m
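The test above decides pass or fail by grepping the netdevsim ipsec debugfs file for a fixed string such as "SA count=2 tx=3". If the counters ever need to be compared numerically instead, something like the following might work. This is a sketch only: `ipsec_tx_count` is a hypothetical name, and the "SA count=N tx=M" line format is inferred from the grep patterns in the test rather than taken from the netdevsim source.

```bash
# Hypothetical helper (not part of the patch): print the tx counter
# from a netdevsim ipsec debugfs dump supplied on stdin. The
# "SA count=N tx=M" line format is an assumption based on the grep
# patterns used in bond_ipsec_offload.sh.
ipsec_tx_count()
{
	sed -n 's/.*SA count=[0-9]* tx=\([0-9]*\).*/\1/p' | head -n 1
}

# Example usage inside the test (assumes $sysfs points at the active
# port's ipsec debugfs file):
#   tx=$(ipsec_tx_count < "$sysfs")
#   [ "$tx" -eq 3 ] || echo "FAIL: unexpected tx count $tx"
```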
On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
> The first patch fixes xfrm offload feature setup in active-backup mode. The second patch adds an ipsec offload test.

Looks like the test is too good, is there a fix pending somewhere for the BUG below? We can't merge the test before that:

https://netdev-3.bots.linux.dev/vmksft-bonding-dbg/results/900082/11-bond-ip...

[ 859.672652][ C3] bond_xfrm_update_stats: eth0 doesn't support xdo_dev_state_update_stats
[ 860.467189][ T8677] bond0: (slave eth0): link status definitely down, disabling slave
[ 860.467664][ T8677] bond0: (slave eth1): making interface the new active one
[ 860.831042][ T9677] bond_xfrm_update_stats: eth1 doesn't support xdo_dev_state_update_stats
[ 862.195271][ T9683] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
[ 862.195880][ T9683] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9683, name: ip
[ 862.196189][ T9683] preempt_count: 201, expected: 0
[ 862.196396][ T9683] RCU nest depth: 0, expected: 0
[ 862.196591][ T9683] 2 locks held by ip/9683:
[ 862.196818][ T9683] #0: ffff88800a829558 (&net->xfrm.xfrm_cfg_mutex){+.+.}-{4:4}, at: xfrm_netlink_rcv+0x65/0x90 [xfrm_user]
[ 862.197264][ T9683] #1: ffff88800f460548 (&x->lock){+.-.}-{3:3}, at: xfrm_state_flush+0x1b3/0x3a0
[ 862.197629][ T9683] CPU: 3 UID: 0 PID: 9683 Comm: ip Not tainted 6.13.0-rc1-virtme #1
[ 862.197967][ T9683] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 862.198204][ T9683] Call Trace:
[ 862.198352][ T9683] <TASK>
[ 862.198458][ T9683] dump_stack_lvl+0xb0/0xd0
[ 862.198659][ T9683] __might_resched+0x2f8/0x530
[ 862.198852][ T9683] ? kfree+0x2d/0x330
[ 862.199005][ T9683] __mutex_lock+0xd9/0xbc0
[ 862.199202][ T9683] ? ref_tracker_free+0x35e/0x910
[ 862.199401][ T9683] ? bond_ipsec_del_sa+0x2c1/0x790
[ 862.199937][ T9683] ? find_held_lock+0x2c/0x110
[ 862.200133][ T9683] ? __pfx___mutex_lock+0x10/0x10
[ 862.200329][ T9683] ? bond_ipsec_del_sa+0x280/0x790
[ 862.200519][ T9683] ? xfrm_dev_state_delete+0x97/0x170
[ 862.200711][ T9683] ? __xfrm_state_delete+0x681/0x8e0
[ 862.200907][ T9683] ? xfrm_user_rcv_msg+0x4f8/0x920 [xfrm_user]
[ 862.201151][ T9683] ? netlink_rcv_skb+0x130/0x360
[ 862.201347][ T9683] ? xfrm_netlink_rcv+0x74/0x90 [xfrm_user]
[ 862.201587][ T9683] ? netlink_unicast+0x44b/0x710
[ 862.201780][ T9683] ? netlink_sendmsg+0x723/0xbe0
[ 862.201973][ T9683] ? ____sys_sendmsg+0x7ac/0xa10
[ 862.202164][ T9683] ? ___sys_sendmsg+0xee/0x170
[ 862.202355][ T9683] ? __sys_sendmsg+0x109/0x1a0
[ 862.202546][ T9683] ? do_syscall_64+0xc1/0x1d0
[ 862.202738][ T9683] ? entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 862.202986][ T9683] ? __pfx_nsim_ipsec_del_sa+0x10/0x10 [netdevsim]
[ 862.203251][ T9683] ? bond_ipsec_del_sa+0x2c1/0x790
[ 862.203457][ T9683] bond_ipsec_del_sa+0x2c1/0x790
[ 862.203648][ T9683] ? __pfx_lock_acquire.part.0+0x10/0x10
[ 862.203845][ T9683] ? __pfx_bond_ipsec_del_sa+0x10/0x10
[ 862.204034][ T9683] ? do_raw_spin_lock+0x131/0x270
[ 862.204225][ T9683] ? __pfx_do_raw_spin_lock+0x10/0x10
[ 862.204468][ T9683] xfrm_dev_state_delete+0x97/0x170
[ 862.204665][ T9683] __xfrm_state_delete+0x681/0x8e0
[ 862.204858][ T9683] xfrm_state_flush+0x1bb/0x3a0
[ 862.205057][ T9683] xfrm_flush_sa+0xf0/0x270 [xfrm_user]
[ 862.205290][ T9683] ? __pfx_xfrm_flush_sa+0x10/0x10 [xfrm_user]
[ 862.205537][ T9683] ? __nla_validate_parse+0x48/0x3d0
[ 862.205744][ T9683] xfrm_user_rcv_msg+0x4f8/0x920 [xfrm_user]
[ 862.205985][ T9683] ? __pfx___lock_release+0x10/0x10
[ 862.206174][ T9683] ? __pfx_xfrm_user_rcv_msg+0x10/0x10 [xfrm_user]
[ 862.206412][ T9683] ? __pfx_validate_chain+0x10/0x10
[ 862.206614][ T9683] ? hlock_class+0x4e/0x130
[ 862.206807][ T9683] ? mark_lock+0x38/0x3e0
[ 862.206986][ T9683] ? __mutex_trylock_common+0xfa/0x260
[ 862.207181][ T9683] ? __pfx___mutex_trylock_common+0x10/0x10
[ 862.207425][ T9683] netlink_rcv_skb+0x130/0x360
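For anyone reproducing this locally, a splat like the one above can be picked out of a saved kernel log with a small filter. This is only an illustrative sketch: the function name `atomic_sleep_sites` is made up here, and it relies on nothing more than the "BUG: sleeping function called from invalid context at <file>:<line>" line shown in the log.

```bash
# Hypothetical helper (not part of the series): read a kernel log on
# stdin and print the source location of every "sleeping function
# called from invalid context" report it contains.
atomic_sleep_sites()
{
	sed -n 's/.*BUG: sleeping function called from invalid context at \([^ ]*\).*/\1/p'
}

# Example usage (assumes dmesg is readable by the caller):
#   dmesg | atomic_sleep_sites
```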
On Thu, Dec 12, 2024 at 06:27:34AM -0800, Jakub Kicinski wrote:
> On Wed, 11 Dec 2024 07:11:25 +0000 Hangbin Liu wrote:
> > The first patch fixes xfrm offload feature setup in active-backup mode. The second patch adds an ipsec offload test.
>
> Looks like the test is too good, is there a fix pending somewhere for the BUG below? We can't merge the test before that:

This should be a regression from 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex"), since xfrm_state_delete() takes spin_lock_bh(&x->lock) around the xfrm state deletion.

But I'm not sure if it's proper to release the spin lock in bond code. This seems too specific.

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7daeab67e7b5..69563bc958ca 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
 out:
 	netdev_put(real_dev, &tracker);
+	spin_unlock_bh(&xs->lock);
 	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
 		if (ipsec->xs == xs) {
@@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
 		}
 	}
 	mutex_unlock(&bond->ipsec_lock);
+	spin_lock_bh(&xs->lock);
 }

What do you think?

Thanks
Hangbin
On Fri, 13 Dec 2024 07:18:08 +0000 Hangbin Liu wrote:
> This should be a regression from 2aeeef906d5a ("bonding: change ipsec_lock from spin lock to mutex"), since xfrm_state_delete() takes spin_lock_bh(&x->lock) around the xfrm state deletion.
>
> But I'm not sure if it's proper to release the spin lock in bond code. This seems too specific.
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 7daeab67e7b5..69563bc958ca 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -592,6 +592,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>  	real_dev->xfrmdev_ops->xdo_dev_state_delete(xs);
>  out:
>  	netdev_put(real_dev, &tracker);
> +	spin_unlock_bh(&xs->lock);
>  	mutex_lock(&bond->ipsec_lock);
>  	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
>  		if (ipsec->xs == xs) {
> @@ -601,6 +602,7 @@ static void bond_ipsec_del_sa(struct xfrm_state *xs)
>  		}
>  	}
>  	mutex_unlock(&bond->ipsec_lock);
> +	spin_lock_bh(&xs->lock);
>  }
>
> What do you think?

Re-locking doesn't look great, glancing at the code I don't see any obvious better workarounds. Easiest fix would be to not let the drivers sleep in the callbacks and then we can go back to a spin lock. Maybe nvidia people have better ideas, I'm not familiar with this offload.
On Fri, Dec 13, 2024 at 07:31:27PM -0800, Jakub Kicinski wrote:
> Re-locking doesn't look great, glancing at the code I don't see any obvious better workarounds. Easiest fix would be to not let the drivers sleep in the callbacks and then we can go back to a spin lock. Maybe nvidia people have better ideas, I'm not familiar with this offload.

I don't know how to disable bonding sleeping since we use mutex_lock now. Hi Jianbo, do you have any idea?

Thanks
Hangbin
On 1/2/2025 10:44 AM, Hangbin Liu wrote:
> On Fri, Dec 13, 2024 at 07:31:27PM -0800, Jakub Kicinski wrote:
> > Re-locking doesn't look great, glancing at the code I don't see any obvious better workarounds. Easiest fix would be to not let the drivers sleep in the callbacks and then we can go back to a spin lock. Maybe nvidia people have better ideas, I'm not familiar with this offload.
>
> I don't know how to disable bonding sleeping since we use mutex_lock now. Hi Jianbo, do you have any idea?

I think we should allow drivers to sleep in the callbacks. So, maybe it's better to move the driver's xdo_dev_state_delete out of the state's spin lock.

Thanks!
Jianbo