This series fixes similar problems in the bonding and team drivers.
Because of missing dev_{uc,mc}_unsync() calls, addresses added to underlying devices may be leftover after the aggregated device is deleted. Add the missing calls and a few related tests.
v2: * fix selftest installation, see patch 3
Benjamin Poirier (3): net: bonding: Unsync device addresses on ndo_stop net: team: Unsync device addresses on ndo_stop net: Add tests for bonding and team address list management
MAINTAINERS | 1 + drivers/net/bonding/bond_main.c | 31 ++++--- drivers/net/team/team.c | 8 ++ tools/testing/selftests/Makefile | 1 + .../selftests/drivers/net/bonding/Makefile | 5 +- .../selftests/drivers/net/bonding/config | 1 + .../drivers/net/bonding/dev_addr_lists.sh | 89 +++++++++++++++++++ .../selftests/drivers/net/bonding/lag_lib.sh | 63 +++++++++++++ .../selftests/drivers/net/team/Makefile | 6 ++ .../testing/selftests/drivers/net/team/config | 3 + .../drivers/net/team/dev_addr_lists.sh | 51 +++++++++++ 11 files changed, 248 insertions(+), 11 deletions(-) create mode 100755 tools/testing/selftests/drivers/net/bonding/dev_addr_lists.sh create mode 100644 tools/testing/selftests/drivers/net/bonding/lag_lib.sh create mode 100644 tools/testing/selftests/drivers/net/team/Makefile create mode 100644 tools/testing/selftests/drivers/net/team/config create mode 100755 tools/testing/selftests/drivers/net/team/dev_addr_lists.sh
Netdev drivers are expected to call dev_{uc,mc}_sync() in their ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method. This is mentioned in the kerneldoc for those dev_* functions.
The bonding driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have already been emptied in unregister_netdevice_many() before ndo_uninit is called. This mistake can result in addresses being leftover on former bond slaves after a bond has been deleted; see test_LAG_cleanup() in the last patch in this series.
Add unsync calls, via bond_hw_addr_flush(), at their expected location, bond_close(). Add dev_mc_add() call to bond_open() to match the above change. The existing call __bond_release_one->bond_hw_addr_flush is left in place because there are other call chains that lead to __bond_release_one(), not just ndo_uninit.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Benjamin Poirier bpoirier@nvidia.com --- drivers/net/bonding/bond_main.c | 31 +++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 2f4da2c13c0a..5784fbe03552 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -254,6 +254,8 @@ static const struct flow_dissector_key flow_keys_bonding_keys[] = {
static struct flow_dissector flow_keys_bonding __read_mostly;
+static const u8 lacpdu_multicast[] = MULTICAST_LACPDU_ADDR; + /*-------------------------- Forward declarations ---------------------------*/
static int bond_init(struct net_device *bond_dev); @@ -865,12 +867,8 @@ static void bond_hw_addr_flush(struct net_device *bond_dev, dev_uc_unsync(slave_dev, bond_dev); dev_mc_unsync(slave_dev, bond_dev);
- if (BOND_MODE(bond) == BOND_MODE_8023AD) { - /* del lacpdu mc addr from mc list */ - u8 lacpdu_multicast[ETH_ALEN] = MULTICAST_LACPDU_ADDR; - + if (BOND_MODE(bond) == BOND_MODE_8023AD) dev_mc_del(slave_dev, lacpdu_multicast); - } }
/*--------------------------- Active slave change ---------------------------*/ @@ -2171,12 +2169,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, dev_uc_sync_multiple(slave_dev, bond_dev); netif_addr_unlock_bh(bond_dev);
- if (BOND_MODE(bond) == BOND_MODE_8023AD) { - /* add lacpdu mc addr to mc list */ - u8 lacpdu_multicast[ETH_ALEN] = MULTICAST_LACPDU_ADDR; - + if (BOND_MODE(bond) == BOND_MODE_8023AD) dev_mc_add(slave_dev, lacpdu_multicast); - } }
bond->slave_cnt++; @@ -4211,6 +4205,9 @@ static int bond_open(struct net_device *bond_dev) /* register to receive LACPDUs */ bond->recv_probe = bond_3ad_lacpdu_recv; bond_3ad_initiate_agg_selection(bond, 1); + + bond_for_each_slave(bond, slave, iter) + dev_mc_add(slave->dev, lacpdu_multicast); }
if (bond_mode_can_use_xmit_hash(bond)) @@ -4222,6 +4219,7 @@ static int bond_open(struct net_device *bond_dev) static int bond_close(struct net_device *bond_dev) { struct bonding *bond = netdev_priv(bond_dev); + struct slave *slave;
bond_work_cancel_all(bond); bond->send_peer_notif = 0; @@ -4229,6 +4227,19 @@ static int bond_close(struct net_device *bond_dev) bond_alb_deinitialize(bond); bond->recv_probe = NULL;
+ if (bond_uses_primary(bond)) { + rcu_read_lock(); + slave = rcu_dereference(bond->curr_active_slave); + if (slave) + bond_hw_addr_flush(bond_dev, slave->dev); + rcu_read_unlock(); + } else { + struct list_head *iter; + + bond_for_each_slave(bond, slave, iter) + bond_hw_addr_flush(bond_dev, slave->dev); + } + return 0; }
Benjamin Poirier bpoirier@nvidia.com wrote:
Repeating a couple of questions that I suspect were missed the first time around:
Netdev drivers are expected to call dev_{uc,mc}_sync() in their ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method. This is mentioned in the kerneldoc for those dev_* functions.
The bonding driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have already been emptied in unregister_netdevice_many() before ndo_uninit is called. This mistake can result in addresses being leftover on former bond slaves after a bond has been deleted; see test_LAG_cleanup() in the last patch in this series.
Add unsync calls, via bond_hw_addr_flush(), at their expected location, bond_close(). Add dev_mc_add() call to bond_open() to match the above change. The existing call __bond_release_one->bond_hw_addr_flush is left in place because there are other call chains that lead to __bond_release_one(), not just ndo_uninit.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Benjamin Poirier bpoirier@nvidia.com
drivers/net/bonding/bond_main.c | 31 +++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 2f4da2c13c0a..5784fbe03552 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -254,6 +254,8 @@ static const struct flow_dissector_key flow_keys_bonding_keys[] = {
static struct flow_dissector flow_keys_bonding __read_mostly;
+static const u8 lacpdu_multicast[] = MULTICAST_LACPDU_ADDR;
/*-------------------------- Forward declarations ---------------------------*/
static int bond_init(struct net_device *bond_dev); @@ -865,12 +867,8 @@ static void bond_hw_addr_flush(struct net_device *bond_dev, dev_uc_unsync(slave_dev, bond_dev); dev_mc_unsync(slave_dev, bond_dev);
- if (BOND_MODE(bond) == BOND_MODE_8023AD) {
/* del lacpdu mc addr from mc list */
u8 lacpdu_multicast[ETH_ALEN] = MULTICAST_LACPDU_ADDR;
- if (BOND_MODE(bond) == BOND_MODE_8023AD) dev_mc_del(slave_dev, lacpdu_multicast);
- }
}
/*--------------------------- Active slave change ---------------------------*/ @@ -2171,12 +2169,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, dev_uc_sync_multiple(slave_dev, bond_dev); netif_addr_unlock_bh(bond_dev);
if (BOND_MODE(bond) == BOND_MODE_8023AD) {
/* add lacpdu mc addr to mc list */
u8 lacpdu_multicast[ETH_ALEN] = MULTICAST_LACPDU_ADDR;
if (BOND_MODE(bond) == BOND_MODE_8023AD) dev_mc_add(slave_dev, lacpdu_multicast);
}}
Just to make sure I'm clear (not missing something in the churn), the above changes regarding lacpdu_multicast have no functional impact, correct? They appear to move lacpdu_multicast to global scope for use in the change just below.
bond->slave_cnt++; @@ -4211,6 +4205,9 @@ static int bond_open(struct net_device *bond_dev) /* register to receive LACPDUs */ bond->recv_probe = bond_3ad_lacpdu_recv; bond_3ad_initiate_agg_selection(bond, 1);
bond_for_each_slave(bond, slave, iter)
}dev_mc_add(slave->dev, lacpdu_multicast);
After this change, am I understanding correctly that both bond_enslave() and bond_open() will call dev_mc_add() for lacpdu_multicast? Since dev_mc_add() -> __dev_mc_add() calls __hw_addr_add_ex() with sync=false and exclusive=false, could that allow us to end up with two references for lacpdu_multicast?
-J
if (bond_mode_can_use_xmit_hash(bond)) @@ -4222,6 +4219,7 @@ static int bond_open(struct net_device *bond_dev) static int bond_close(struct net_device *bond_dev) { struct bonding *bond = netdev_priv(bond_dev);
struct slave *slave;
bond_work_cancel_all(bond); bond->send_peer_notif = 0;
@@ -4229,6 +4227,19 @@ static int bond_close(struct net_device *bond_dev) bond_alb_deinitialize(bond); bond->recv_probe = NULL;
- if (bond_uses_primary(bond)) {
rcu_read_lock();
slave = rcu_dereference(bond->curr_active_slave);
if (slave)
bond_hw_addr_flush(bond_dev, slave->dev);
rcu_read_unlock();
- } else {
struct list_head *iter;
bond_for_each_slave(bond, slave, iter)
bond_hw_addr_flush(bond_dev, slave->dev);
- }
- return 0;
}
-- 2.37.2
--- -Jay Vosburgh, jay.vosburgh@canonical.com
On 2022-09-02 11:28 -0700, Jay Vosburgh wrote:
Benjamin Poirier bpoirier@nvidia.com wrote:
Repeating a couple of questions that I suspect were missed the first time around:
Thanks for repeating, I did miss the other questions, sorry.
[...]
@@ -2171,12 +2169,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev, dev_uc_sync_multiple(slave_dev, bond_dev); netif_addr_unlock_bh(bond_dev);
if (BOND_MODE(bond) == BOND_MODE_8023AD) {
/* add lacpdu mc addr to mc list */
u8 lacpdu_multicast[ETH_ALEN] = MULTICAST_LACPDU_ADDR;
if (BOND_MODE(bond) == BOND_MODE_8023AD) dev_mc_add(slave_dev, lacpdu_multicast);
}}
Just to make sure I'm clear (not missing something in the churn), the above changes regarding lacpdu_multicast have no functional impact, correct? They appear to move lacpdu_multicast to global scope for use in the change just below.
Yes, that's right - no functional impact. I'll split that to a separate patch to make it clearer.
bond->slave_cnt++; @@ -4211,6 +4205,9 @@ static int bond_open(struct net_device *bond_dev) /* register to receive LACPDUs */ bond->recv_probe = bond_3ad_lacpdu_recv; bond_3ad_initiate_agg_selection(bond, 1);
bond_for_each_slave(bond, slave, iter)
}dev_mc_add(slave->dev, lacpdu_multicast);
After this change, am I understanding correctly that both bond_enslave() and bond_open() will call dev_mc_add() for lacpdu_multicast? Since dev_mc_add() -> __dev_mc_add() calls __hw_addr_add_ex() with sync=false and exclusive=false, could that allow us to end up with two references for lacpdu_multicast?
You are correct once again. When enslaving to an up bond (case in the selftest), it is ok, but when enslaving to a down bond and then setting it up, there is a double add.
Thanks for the review. I'll send a v3.
Netdev drivers are expected to call dev_{uc,mc}_sync() in their ndo_set_rx_mode method and dev_{uc,mc}_unsync() in their ndo_stop method. This is mentioned in the kerneldoc for those dev_* functions.
The team driver calls dev_{uc,mc}_unsync() during ndo_uninit instead of ndo_stop. This is ineffective because address lists (dev->{uc,mc}) have already been emptied in unregister_netdevice_many() before ndo_uninit is called. This mistake can result in addresses being leftover on former team ports after a team device has been deleted; see test_LAG_cleanup() in the last patch in this series.
Add unsync calls at their expected location, team_close(). The existing unsync calls in team_port_del() are left in place because there are other call chains that lead to team_port_del(), not just ndo_uninit.
Fixes: 3d249d4ca7d0 ("net: introduce ethernet teaming device") Signed-off-by: Benjamin Poirier bpoirier@nvidia.com --- drivers/net/team/team.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index aac133a1e27a..07e7187d46bc 100644 --- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -1700,6 +1700,14 @@ static int team_open(struct net_device *dev)
static int team_close(struct net_device *dev) { + struct team *team = netdev_priv(dev); + struct team_port *port; + + list_for_each_entry(port, &team->port_list, list) { + dev_uc_unsync(port->dev, dev); + dev_mc_unsync(port->dev, dev); + } + return 0; }
Test that the bonding and team drivers clean up an underlying device's address lists (dev->uc, dev->mc) when the aggregated device is deleted.
Test addition and removal of the LACPDU multicast address on underlying devices by the bonding driver.
v2: * add lag_lib.sh to TEST_FILES
Signed-off-by: Benjamin Poirier bpoirier@nvidia.com --- MAINTAINERS | 1 + tools/testing/selftests/Makefile | 1 + .../selftests/drivers/net/bonding/Makefile | 5 +- .../selftests/drivers/net/bonding/config | 1 + .../drivers/net/bonding/dev_addr_lists.sh | 89 +++++++++++++++++++ .../selftests/drivers/net/bonding/lag_lib.sh | 63 +++++++++++++ .../selftests/drivers/net/team/Makefile | 6 ++ .../testing/selftests/drivers/net/team/config | 3 + .../drivers/net/team/dev_addr_lists.sh | 51 +++++++++++ 9 files changed, 219 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/drivers/net/bonding/dev_addr_lists.sh create mode 100644 tools/testing/selftests/drivers/net/bonding/lag_lib.sh create mode 100644 tools/testing/selftests/drivers/net/team/Makefile create mode 100644 tools/testing/selftests/drivers/net/team/config create mode 100755 tools/testing/selftests/drivers/net/team/dev_addr_lists.sh
diff --git a/MAINTAINERS b/MAINTAINERS index 589517372408..4194f44e7bb9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -19937,6 +19937,7 @@ S: Supported F: drivers/net/team/ F: include/linux/if_team.h F: include/uapi/linux/if_team.h +F: tools/testing/selftests/net/team/
TECHNOLOGIC SYSTEMS TS-5500 PLATFORM SUPPORT M: "Savoir-faire Linux Inc." kernel@savoirfairelinux.com diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index c2064a35688b..1fc89b8ef433 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -13,6 +13,7 @@ TARGETS += damon TARGETS += drivers/dma-buf TARGETS += drivers/s390x/uvdevice TARGETS += drivers/net/bonding +TARGETS += drivers/net/team TARGETS += efivarfs TARGETS += exec TARGETS += filesystems diff --git a/tools/testing/selftests/drivers/net/bonding/Makefile b/tools/testing/selftests/drivers/net/bonding/Makefile index ab6c54b12098..0f9659407969 100644 --- a/tools/testing/selftests/drivers/net/bonding/Makefile +++ b/tools/testing/selftests/drivers/net/bonding/Makefile @@ -1,6 +1,9 @@ # SPDX-License-Identifier: GPL-2.0 # Makefile for net selftests
-TEST_PROGS := bond-break-lacpdu-tx.sh +TEST_PROGS := bond-break-lacpdu-tx.sh \ + dev_addr_lists.sh + +TEST_FILES := lag_lib.sh
include ../../../lib.mk diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config index dc1c22de3c92..70638fa50b2c 100644 --- a/tools/testing/selftests/drivers/net/bonding/config +++ b/tools/testing/selftests/drivers/net/bonding/config @@ -1 +1,2 @@ CONFIG_BONDING=y +CONFIG_MACVLAN=y diff --git a/tools/testing/selftests/drivers/net/bonding/dev_addr_lists.sh b/tools/testing/selftests/drivers/net/bonding/dev_addr_lists.sh new file mode 100755 index 000000000000..47ad6f22c15b --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/dev_addr_lists.sh @@ -0,0 +1,89 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Test bond device handling of addr lists (dev->uc, mc) +# + +ALL_TESTS=" + bond_cleanup_mode1 + bond_cleanup_mode4 + bond_listen_lacpdu_multicast +" + +REQUIRE_MZ=no +NUM_NETIFS=0 +lib_dir=$(dirname "$0") +source "$lib_dir"/../../../net/forwarding/lib.sh + +source "$lib_dir"/lag_lib.sh + + +destroy() +{ + local ifnames=(dummy1 dummy2 bond1 mv0) + local ifname + + for ifname in "${ifnames[@]}"; do + ip link del "$ifname" &>/dev/null + done +} + +cleanup() +{ + pre_cleanup + + destroy +} + + +# bond driver control paths vary between modes that have a primary slave +# (bond_uses_primary()) and others. Test both kinds of modes. + +bond_cleanup_mode1() +{ + RET=0 + + test_LAG_cleanup "bonding" "active-backup" +} + +bond_cleanup_mode4() { + RET=0 + + test_LAG_cleanup "bonding" "802.3ad" +} + +bond_listen_lacpdu_multicast() +{ + RET=0 + + local lacpdu_mc="01:80:c2:00:00:02" + + ip link add dummy1 type dummy + ip link add bond1 up type bond mode 802.3ad + ip link set dev dummy1 master bond1 + ip link set dev dummy1 up + + grep_bridge_fdb "$lacpdu_mc" bridge fdb show brport dummy1 >/dev/null + check_err $? "LACPDU multicast address not present on slave (1)" + + ip link set dev bond1 down + + not grep_bridge_fdb "$lacpdu_mc" bridge fdb show brport dummy1 >/dev/null + check_err $? "LACPDU multicast address still present on slave" + + ip link set dev bond1 up + + grep_bridge_fdb "$lacpdu_mc" bridge fdb show brport dummy1 >/dev/null + check_err $? "LACPDU multicast address not present on slave (2)" + + cleanup + + log_test "Bond adds LACPDU multicast address to slave" +} + + +trap cleanup EXIT + +tests_run + +exit "$EXIT_STATUS" diff --git a/tools/testing/selftests/drivers/net/bonding/lag_lib.sh b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh new file mode 100644 index 000000000000..51458f1da035 --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/lag_lib.sh @@ -0,0 +1,63 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# Test that a link aggregation device (bonding, team) removes the hardware +# addresses that it adds on its underlying devices. +test_LAG_cleanup() +{ + local driver="$1" + local mode="$2" + local ucaddr="02:00:00:12:34:56" + local addr6="fe80::78:9abc/64" + local mcaddr="33:33:ff:78:9a:bc" + local name + + ip link add dummy1 type dummy + ip link add dummy2 type dummy + if [ "$driver" = "bonding" ]; then + name="bond1" + ip link add "$name" up type bond mode "$mode" + ip link set dev dummy1 master "$name" + ip link set dev dummy2 master "$name" + elif [ "$driver" = "team" ]; then + name="team0" + teamd -d -c ' + { + "device": "'"$name"'", + "runner": { + "name": "'"$mode"'" + }, + "ports": { + "dummy1": + {}, + "dummy2": + {} + } + } + ' + ip link set dev "$name" up + else + check_err 1 + log_test test_LAG_cleanup ": unknown driver "$driver"" + return + fi + + ip link set dev dummy1 up + ip link set dev dummy2 up + # Used to test dev->uc handling + ip link add mv0 link "$name" up address "$ucaddr" type macvlan + # Used to test dev->mc handling + ip address add "$addr6" dev "$name" + ip link set dev "$name" down + ip link del "$name" + + not grep_bridge_fdb "$ucaddr" bridge fdb show >/dev/null + check_err $? "macvlan unicast address still present on a slave" + + not grep_bridge_fdb "$mcaddr" bridge fdb show >/dev/null + check_err $? "IPv6 solicited-node multicast mac address still present on a slave" + + cleanup + + log_test "$driver cleanup mode $mode" +} diff --git a/tools/testing/selftests/drivers/net/team/Makefile b/tools/testing/selftests/drivers/net/team/Makefile new file mode 100644 index 000000000000..642d8df1c137 --- /dev/null +++ b/tools/testing/selftests/drivers/net/team/Makefile @@ -0,0 +1,6 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for net selftests + +TEST_PROGS := dev_addr_lists.sh + +include ../../../lib.mk diff --git a/tools/testing/selftests/drivers/net/team/config b/tools/testing/selftests/drivers/net/team/config new file mode 100644 index 000000000000..265b6882cc21 --- /dev/null +++ b/tools/testing/selftests/drivers/net/team/config @@ -0,0 +1,3 @@ +CONFIG_NET_TEAM=y +CONFIG_NET_TEAM_MODE_LOADBALANCE=y +CONFIG_MACVLAN=y diff --git a/tools/testing/selftests/drivers/net/team/dev_addr_lists.sh b/tools/testing/selftests/drivers/net/team/dev_addr_lists.sh new file mode 100755 index 000000000000..debda7262956 --- /dev/null +++ b/tools/testing/selftests/drivers/net/team/dev_addr_lists.sh @@ -0,0 +1,51 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Test team device handling of addr lists (dev->uc, mc) +# + +ALL_TESTS=" + team_cleanup +" + +REQUIRE_MZ=no +NUM_NETIFS=0 +lib_dir=$(dirname "$0") +source "$lib_dir"/../../../net/forwarding/lib.sh + +source "$lib_dir"/../bonding/lag_lib.sh + + +destroy() +{ + local ifnames=(dummy0 dummy1 team0 mv0) + local ifname + + for ifname in "${ifnames[@]}"; do + ip link del "$ifname" &>/dev/null + done +} + +cleanup() +{ + pre_cleanup + + destroy +} + + +team_cleanup() +{ + RET=0 + + test_LAG_cleanup "team" "lacp" +} + + +require_command teamd + +trap cleanup EXIT + +tests_run + +exit "$EXIT_STATUS"
linux-kselftest-mirror@lists.linaro.org