This patchset fixes an issue where bonding fails to establish a stable LACP negotiation when operating in passive mode (lacp_active=off).
In passive mode, the current implementation only replies when the partner's state changes, which results in LACP timeout and unstable aggregator formation.
With this change, the bond responds to each received LACPDU in passive mode by setting ntt = true, ensuring timely replies and stable LACP negotiation.
Hangbin Liu (2): bonding: update ntt to true in passive mode selftests: bonding: add test for passive LACP mode
drivers/net/bonding/bond_3ad.c | 6 ++ .../drivers/net/bonding/bond_passive_lacp.sh | 21 +++++ .../drivers/net/bonding/bond_topo_lacp.sh | 77 +++++++++++++++++++ 3 files changed, 104 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh create mode 100644 tools/testing/selftests/drivers/net/bonding/bond_topo_lacp.sh
When lacp_active is set to off, the bond operates in passive mode, meaning it will only "speak when spoken to." However, the current kernel implementation only sends an LACPDU in response when the partner's state changes.
In this situation, once LACP negotiation succeeds, the actor stops sending LACPDUs until the partner times out and sends an "expired" LACPDU. This leads to endless LACP state flapping.
To avoid this, we need update ntt to true once received an LACPDU from the partner, ensuring an immediate reply. With this fix, the link becomes stable in most cases, except for one specific scenario:
Actor: lacp_active=off, lacp_rate=slow Partner: lacp_active=on, lacp_rate=fast
In this case, the partner expects frequent LACPDUs (every 1 second), but the actor only responds after receiving an LACPDU, which, in this setup, the partner sends every 30 seconds due to the actor's lacp_rate=slow. By the time the actor replies, the partner has already timed out and sent an "expired" LACPDU.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hangbin Liu liuhangbin@gmail.com --- drivers/net/bonding/bond_3ad.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index c6807e473ab7..e001d1c8a49b 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -666,6 +666,8 @@ static void __update_default_selected(struct port *port) */ static void __update_ntt(struct lacpdu *lacpdu, struct port *port) { + struct bonding *bond; + /* validate lacpdu and port */ if (lacpdu && port) { /* check if any parameter is different then @@ -683,6 +685,10 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port) ) { port->ntt = true; } + + bond = __get_bond_by_port(port); + if (bond && !bond->params.lacp_active) + port->ntt = true; } }
Hangbin Liu liuhangbin@gmail.com wrote:
When lacp_active is set to off, the bond operates in passive mode, meaning it will only "speak when spoken to." However, the current kernel implementation only sends an LACPDU in response when the partner's state changes.
In this situation, once LACP negotiation succeeds, the actor stops sending LACPDUs until the partner times out and sends an "expired" LACPDU. This leads to endless LACP state flapping.
From the above, I suspect our implementation isn't compliant to the standard. Per IEEE 802.1AX-2014 6.4.1 LACP design elements:
c) Active or passive participation in LACP is controlled by LACP_Activity, an administrative control associated with each Aggregation Port, that can take the value Active LACP or Passive LACP. Passive LACP indicates the Aggregation Port’s preference for not transmitting LACPDUs unless its Partner’s control value is Active LACP (i.e., a preference not to speak unless spoken to). Active LACP indicates the Aggregation Port’s preference to participate in the protocol regardless of the Partner’s control value (i.e., a preference to speak regardless).
d) Periodic transmission of LACPDUs occurs if the LACP_Activity control of either the Actor or the Partner is Active LACP. These periodic transmissions will occur at either a slow or fast transmission rate depending upon the expressed LACP_Timeout preference (Long Timeout or Short Timeout) of the Partner System.
Which, in summary, means that if either end (actor or partner) has LACP_Activity set, both ends must send periodic LACPDUs at the rate specified by their respective partner's LACP_Timeout rate.
To avoid this, we need update ntt to true once received an LACPDU from the partner, ensuring an immediate reply. With this fix, the link becomes stable in most cases, except for one specific scenario:
Actor: lacp_active=off, lacp_rate=slow Partner: lacp_active=on, lacp_rate=fast
In this case, the partner expects frequent LACPDUs (every 1 second), but the actor only responds after receiving an LACPDU, which, in this setup, the partner sends every 30 seconds due to the actor's lacp_rate=slow. By the time the actor replies, the partner has already timed out and sent an "expired" LACPDU.
Presuming that I'm correct that we're not implementing 6.4.1 d), above, correctly, then I don't think this is a proper fix, as it kind of band-aids over the problem a bit.
Looking at the code, I suspect the problem revolves around the "lacp_active" check in ad_periodic_machine():
static void ad_periodic_machine(struct port *port, struct bond_params *bond_params) { periodic_states_t last_state;
/* keep current state machine state to compare later if it was changed */ last_state = port->sm_periodic_state;
/* check if port was reinitialized */ if (((port->sm_vars & AD_PORT_BEGIN) || !(port->sm_vars & AD_PORT_LACP_ENABLED) || !port->is_enabled) || (!(port->actor_oper_port_state & LACP_STATE_LACP_ACTIVITY) && !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)) || !bond_params->lacp_active) { port->sm_periodic_state = AD_NO_PERIODIC; }
In the above, because all the tests are chained with ||, the lacp_active test overrides the two correct-looking LACP_STATE_LACP_ACTIVITY tests.
It looks like ad_initialize_port() always sets LACP_STATE_LACP_ACTIVITY in the port->actor_oper_port_state, and nothing ever clears it.
Thinking out loud, perhaps this could be fixed by
a) remove the test of bond_params->lacp_active here, and,
b) The lacp_active option setting controls whether LACP_ACTIVITY is set in port->actor_oper_port_state.
Thoughts?
-J
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Hangbin Liu liuhangbin@gmail.com
drivers/net/bonding/bond_3ad.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index c6807e473ab7..e001d1c8a49b 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -666,6 +666,8 @@ static void __update_default_selected(struct port *port) */ static void __update_ntt(struct lacpdu *lacpdu, struct port *port) {
- struct bonding *bond;
- /* validate lacpdu and port */ if (lacpdu && port) { /* check if any parameter is different then
@@ -683,6 +685,10 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port) ) { port->ntt = true; }
bond = __get_bond_by_port(port);
if (bond && !bond->params.lacp_active)
}port->ntt = true;
}
-- 2.46.0
--- -Jay Vosburgh, jv@jvosburgh.net
On Tue, Jul 15, 2025 at 09:19:49PM -0700, Jay Vosburgh wrote:
Hangbin Liu liuhangbin@gmail.com wrote:
When lacp_active is set to off, the bond operates in passive mode, meaning it will only "speak when spoken to." However, the current kernel implementation only sends an LACPDU in response when the partner's state changes.
In this situation, once LACP negotiation succeeds, the actor stops sending LACPDUs until the partner times out and sends an "expired" LACPDU. This leads to endless LACP state flapping.
From the above, I suspect our implementation isn't compliant to the standard. Per IEEE 802.1AX-2014 6.4.1 LACP design elements:
c) Active or passive participation in LACP is controlled by LACP_Activity, an administrative control associated with each Aggregation Port, that can take the value Active LACP or Passive LACP. Passive LACP indicates the Aggregation Port’s preference for not transmitting LACPDUs unless its Partner’s control value is Active LACP (i.e., a preference not to speak unless spoken to). Active LACP indicates the Aggregation Port’s preference to
OK, so this means the passive side should start sending LACPDUs when receive passive actor's LACPDUs, with the slow/fast rate based on partner's rate?
Hmm, then when we should stop sending LACPDUs? After port->sm_mux_state == AD_MUX_DETACHED ?
participate in the protocol regardless of the Partner’s control value (i.e., a preference to speak regardless).
d) Periodic transmission of LACPDUs occurs if the LACP_Activity control of either the Actor or the Partner is Active LACP. These periodic transmissions will occur at either a slow or fast transmission rate depending upon the expressed LACP_Timeout preference (Long Timeout or Short Timeout) of the Partner System.
Which, in summary, means that if either end (actor or partner) has LACP_Activity set, both ends must send periodic LACPDUs at the rate specified by their respective partner's LACP_Timeout rate.
To avoid this, we need update ntt to true once received an LACPDU from the partner, ensuring an immediate reply. With this fix, the link becomes stable in most cases, except for one specific scenario:
Actor: lacp_active=off, lacp_rate=slow Partner: lacp_active=on, lacp_rate=fast
In this case, the partner expects frequent LACPDUs (every 1 second), but the actor only responds after receiving an LACPDU, which, in this setup, the partner sends every 30 seconds due to the actor's lacp_rate=slow. By the time the actor replies, the partner has already timed out and sent an "expired" LACPDU.
Presuming that I'm correct that we're not implementing 6.4.1 d), above, correctly, then I don't think this is a proper fix, as it kind of band-aids over the problem a bit.
Looking at the code, I suspect the problem revolves around the "lacp_active" check in ad_periodic_machine():
static void ad_periodic_machine(struct port *port, struct bond_params *bond_params) { periodic_states_t last_state;
/* keep current state machine state to compare later if it was changed */ last_state = port->sm_periodic_state;
/* check if port was reinitialized */ if (((port->sm_vars & AD_PORT_BEGIN) || !(port->sm_vars & AD_PORT_LACP_ENABLED) || !port->is_enabled) || (!(port->actor_oper_port_state & LACP_STATE_LACP_ACTIVITY) && !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)) || !bond_params->lacp_active) { port->sm_periodic_state = AD_NO_PERIODIC; }
In the above, because all the tests are chained with ||, the lacp_active test overrides the two correct-looking LACP_STATE_LACP_ACTIVITY tests.
It looks like ad_initialize_port() always sets LACP_STATE_LACP_ACTIVITY in the port->actor_oper_port_state, and nothing ever clears it.
Thinking out loud, perhaps this could be fixed by
a) remove the test of bond_params->lacp_active here, and,
b) The lacp_active option setting controls whether LACP_ACTIVITY is set in port->actor_oper_port_state.
Thoughts?
As the upper question. When should we stop sending the LACPDUs?
Thanks Hangbin
Hangbin Liu liuhangbin@gmail.com wrote:
On Tue, Jul 15, 2025 at 09:19:49PM -0700, Jay Vosburgh wrote:
Hangbin Liu liuhangbin@gmail.com wrote:
When lacp_active is set to off, the bond operates in passive mode, meaning it will only "speak when spoken to." However, the current kernel implementation only sends an LACPDU in response when the partner's state changes.
In this situation, once LACP negotiation succeeds, the actor stops sending LACPDUs until the partner times out and sends an "expired" LACPDU. This leads to endless LACP state flapping.
From the above, I suspect our implementation isn't compliant to the standard. Per IEEE 802.1AX-2014 6.4.1 LACP design elements:
c) Active or passive participation in LACP is controlled by LACP_Activity, an administrative control associated with each Aggregation Port, that can take the value Active LACP or Passive LACP. Passive LACP indicates the Aggregation Port’s preference for not transmitting LACPDUs unless its Partner’s control value is Active LACP (i.e., a preference not to speak unless spoken to). Active LACP indicates the Aggregation Port’s preference to
OK, so this means the passive side should start sending LACPDUs when receive passive actor's LACPDUs, with the slow/fast rate based on partner's rate?
Did you mean "receive active actor's LACPDUs"?
Regardless, the standard requires both sides to initiate periodic LACPDU transmission if either or both enable LACP_Activity in their LACPDUs.
So, if a received LACPDU from the partner has LACP_Activity set, then, yes, we would enable periodic LACPDU transmission, regardless of our local setting of "lacp_active" / LACP_Activity.
Hmm, then when we should stop sending LACPDUs? After port->sm_mux_state == AD_MUX_DETACHED ?
We stop sending when the criteria for NO_PERIODIC in the periodic state machine is met (IEEE 802.1AX-2014 6.4.13, Figure 6-19).
Practically speaking, this happens when a BEGIN event occurs, due to a port being reinitialized. The ad_mux_machine() will set the mux state to AD_MUX_DETACHED when BEGIN occurs, so I don't think we need to test for DETACHED explicitly.
The NO_PERIODIC check is the first "if" block in ad_periodic_machine() that I referenced below. The code currently tests all of the criteria from Figure 6-19, but adds a test of "!lacp_active", which is why I suspect that removing that bit and managing the lacp_active option via the LACP_Activity in the actor port state would do the right thing.
-J
participate in the protocol regardless of the Partner’s control value (i.e., a preference to speak regardless).
d) Periodic transmission of LACPDUs occurs if the LACP_Activity control of either the Actor or the Partner is Active LACP. These periodic transmissions will occur at either a slow or fast transmission rate depending upon the expressed LACP_Timeout preference (Long Timeout or Short Timeout) of the Partner System.
Which, in summary, means that if either end (actor or partner) has LACP_Activity set, both ends must send periodic LACPDUs at the rate specified by their respective partner's LACP_Timeout rate.
To avoid this, we need update ntt to true once received an LACPDU from the partner, ensuring an immediate reply. With this fix, the link becomes stable in most cases, except for one specific scenario:
Actor: lacp_active=off, lacp_rate=slow Partner: lacp_active=on, lacp_rate=fast
In this case, the partner expects frequent LACPDUs (every 1 second), but the actor only responds after receiving an LACPDU, which, in this setup, the partner sends every 30 seconds due to the actor's lacp_rate=slow. By the time the actor replies, the partner has already timed out and sent an "expired" LACPDU.
Presuming that I'm correct that we're not implementing 6.4.1 d), above, correctly, then I don't think this is a proper fix, as it kind of band-aids over the problem a bit.
Looking at the code, I suspect the problem revolves around the "lacp_active" check in ad_periodic_machine():
static void ad_periodic_machine(struct port *port, struct bond_params *bond_params) { periodic_states_t last_state;
/* keep current state machine state to compare later if it was changed */ last_state = port->sm_periodic_state;
/* check if port was reinitialized */ if (((port->sm_vars & AD_PORT_BEGIN) || !(port->sm_vars & AD_PORT_LACP_ENABLED) || !port->is_enabled) || (!(port->actor_oper_port_state & LACP_STATE_LACP_ACTIVITY) && !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)) || !bond_params->lacp_active) { port->sm_periodic_state = AD_NO_PERIODIC; }
In the above, because all the tests are chained with ||, the lacp_active test overrides the two correct-looking LACP_STATE_LACP_ACTIVITY tests.
It looks like ad_initialize_port() always sets LACP_STATE_LACP_ACTIVITY in the port->actor_oper_port_state, and nothing ever clears it.
Thinking out loud, perhaps this could be fixed by
a) remove the test of bond_params->lacp_active here, and,
b) The lacp_active option setting controls whether LACP_ACTIVITY is set in port->actor_oper_port_state.
Thoughts?
As the upper question. When should we stop sending the LACPDUs?
Thanks Hangbin
--- -Jay Vosburgh, jv@jvosburgh.net
On Tue, Jul 15, 2025 at 09:19:49PM -0700, Jay Vosburgh wrote:
Presuming that I'm correct that we're not implementing 6.4.1 d), above, correctly, then I don't think this is a proper fix, as it kind of band-aids over the problem a bit.
Looking at the code, I suspect the problem revolves around the "lacp_active" check in ad_periodic_machine():
static void ad_periodic_machine(struct port *port, struct bond_params *bond_params) { periodic_states_t last_state;
/* keep current state machine state to compare later if it was changed */ last_state = port->sm_periodic_state;
/* check if port was reinitialized */ if (((port->sm_vars & AD_PORT_BEGIN) || !(port->sm_vars & AD_PORT_LACP_ENABLED) || !port->is_enabled) || (!(port->actor_oper_port_state & LACP_STATE_LACP_ACTIVITY) && !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)) || !bond_params->lacp_active) { port->sm_periodic_state = AD_NO_PERIODIC; }
In the above, because all the tests are chained with ||, the lacp_active test overrides the two correct-looking LACP_STATE_LACP_ACTIVITY tests.
It looks like ad_initialize_port() always sets LACP_STATE_LACP_ACTIVITY in the port->actor_oper_port_state, and nothing ever clears it.
Thinking out loud, perhaps this could be fixed by
a) remove the test of bond_params->lacp_active here, and,
b) The lacp_active option setting controls whether LACP_ACTIVITY is set in port->actor_oper_port_state.
Thoughts?
Hi Jay,
I did some investigation and testing. In addition to your previous change, we also need to initialize the partner's state to 0 in ad_initialize_port_tmpl(). Otherwise, the check: ``` !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY) ``` in ad_periodic_machine() will fail even when the actor is in passive mode.
Also, the line: ``` port->partner_oper.port_state |= LACP_STATE_LACP_ACTIVITY; ``` in ad_rx_machine() should be removed, since we can't assume the partner is in active mode. [1]
With these two changes, we can ensure: 1. In passive mode, the actor will not send LACPDU before receiving any LACPDU from the partner. 2. Once it receives the partner’s LACPDU, it will start sending periodic LACPDUs as expected.
Do you agree with making these changes? If so, I can post a patch for your review.
[1] IEEE 8021AX-2020, Figure 6-14—LACP Receive state diagram, the AD_RX_EXPIRED statue should be ``` Partner_Oper_Port_State.Synchronization = FALSE; Partner_Oper_Port_State.Short_Timeout = TRUE; Actor_Oper_Port_State.Expired = TRUE; LACP_currentWhile = Short_Timeout_Time; ```
Thanks, Hangbin
Hangbin Liu liuhangbin@gmail.com wrote:
On Tue, Jul 15, 2025 at 09:19:49PM -0700, Jay Vosburgh wrote:
Presuming that I'm correct that we're not implementing 6.4.1 d), above, correctly, then I don't think this is a proper fix, as it kind of band-aids over the problem a bit.
Looking at the code, I suspect the problem revolves around the "lacp_active" check in ad_periodic_machine():
static void ad_periodic_machine(struct port *port, struct bond_params *bond_params) { periodic_states_t last_state;
/* keep current state machine state to compare later if it was changed */ last_state = port->sm_periodic_state;
/* check if port was reinitialized */ if (((port->sm_vars & AD_PORT_BEGIN) || !(port->sm_vars & AD_PORT_LACP_ENABLED) || !port->is_enabled) || (!(port->actor_oper_port_state & LACP_STATE_LACP_ACTIVITY) && !(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)) || !bond_params->lacp_active) { port->sm_periodic_state = AD_NO_PERIODIC; }
In the above, because all the tests are chained with ||, the lacp_active test overrides the two correct-looking LACP_STATE_LACP_ACTIVITY tests.
It looks like ad_initialize_port() always sets LACP_STATE_LACP_ACTIVITY in the port->actor_oper_port_state, and nothing ever clears it.
Thinking out loud, perhaps this could be fixed by
a) remove the test of bond_params->lacp_active here, and,
b) The lacp_active option setting controls whether LACP_ACTIVITY is set in port->actor_oper_port_state.
Thoughts?
Hi Jay,
I did some investigation and testing. In addition to your previous change, we also need to initialize the partner's state to 0 in ad_initialize_port_tmpl(). Otherwise, the check:
!(port->partner_oper.port_state & LACP_STATE_LACP_ACTIVITY)
in ad_periodic_machine() will fail even when the actor is in passive mode.
Agreed; the .port_state in the port_params tmpl should just be zero; the magic number 1 there now, which is LACP_STATE_LACP_ACTIVITY, is just wrong. For the actor side, the lacp_active option will set it appropriately, and the partner's will be updated by any LACPDUs that arrive.
Also, the line:
port->partner_oper.port_state |= LACP_STATE_LACP_ACTIVITY;
in ad_rx_machine() should be removed, since we can't assume the partner is in active mode. [1]
Also agreed.
With these two changes, we can ensure:
- In passive mode, the actor will not send LACPDU before receiving any LACPDU from the partner.
- Once it receives the partner’s LACPDU, it will start sending periodic LACPDUs as expected.
Do you agree with making these changes? If so, I can post a patch for your review.
Yes, please post a patch.
[1] IEEE 8021AX-2020, Figure 6-14—LACP Receive state diagram, the AD_RX_EXPIRED statue should be
Partner_Oper_Port_State.Synchronization = FALSE; Partner_Oper_Port_State.Short_Timeout = TRUE; Actor_Oper_Port_State.Expired = TRUE; LACP_currentWhile = Short_Timeout_Time;
FWIW, I usually reference the older standards 2008 or 2014, as the 2020 edition changes a lot of things and bonding isn't necessarily conformant to those changes (e.g., many of the state machines are different in large or small ways). Technically, the bonding implementation was written to the pre-802.1AX standard when it was still part of 802.3 (hence the name 802.3ad), clause 43.
This particular bit (the EXPIRED state actions) is the same, but, for example, the transition test from EXPIRED to DEFAULTED is different in the 2014 vs 2020 editions, and we need to be careful not to implement the state machines piecemeal from different editions of the standard.
-J
--- -Jay Vosburgh, jv@jvosburgh.net
On Thu, Jul 24, 2025 at 11:57:53AM +0200, Jay Vosburgh wrote:
FWIW, I usually reference the older standards 2008 or 2014, as the 2020 edition changes a lot of things and bonding isn't necessarily conformant to those changes (e.g., many of the state machines are different in large or small ways). Technically, the bonding implementation was written to the pre-802.1AX standard when it was still part of 802.3 (hence the name 802.3ad), clause 43.
This particular bit (the EXPIRED state actions) is the same, but, for example, the transition test from EXPIRED to DEFAULTED is different in the 2014 vs 2020 editions, and we need to be careful not to implement the state machines piecemeal from different editions of the standard.
Thanks for this info. I will download 2014 version and recheck my changes.
Cheers Hangbin
Add a selftest to verify bonding behavior when lacp_active is set to off.
Signed-off-by: Hangbin Liu liuhangbin@gmail.com --- .../drivers/net/bonding/bond_passive_lacp.sh | 21 +++++ .../drivers/net/bonding/bond_topo_lacp.sh | 77 +++++++++++++++++++ 2 files changed, 98 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh create mode 100644 tools/testing/selftests/drivers/net/bonding/bond_topo_lacp.sh
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh new file mode 100755 index 000000000000..4cf8a5999aaa --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh @@ -0,0 +1,21 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Testing if bond works with lacp_active = off + +lib_dir=$(dirname "$0") +source ${lib_dir}/bond_topo_lacp.sh + +trap cleanup EXIT +setup_topo_lacp + +lacp_bond_reset "${c_ns}" "lacp_active off" +# make sure the switch state is not expired [A,T,G,S,Ex] +if slowwait 15 ip netns exec ${s_ns} grep -q 'port state: 143' /proc/net/bonding/bond0; then + RET=1 +else + RET=0 +fi +log_test "bond 802.3ad" "lacp_active off" + +exit $EXIT_STATUS diff --git a/tools/testing/selftests/drivers/net/bonding/bond_topo_lacp.sh b/tools/testing/selftests/drivers/net/bonding/bond_topo_lacp.sh new file mode 100644 index 000000000000..33dca6ebdc4b --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_topo_lacp.sh @@ -0,0 +1,77 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Topology for Bond mode 802.3ad testing +# +# +-------------------------+ +# | bond0 | Switch +# | + | 192.0.2.254/24 +# | eth0 | eth1 | 2001:db8::254/24 +# | +---+---+ | +# | | | | +# +-------------------------+ +# | | +# +-------------------------+ +# | | | | +# | +---+---+ | Client +# | eth0 | eth1 | 192.0.2.1/24 +# | + | 2001:db8::1/24 +# | bond0 | +# +-------------------------+ + +REQUIRE_MZ=no +NUM_NETIFS=0 +lib_dir=$(dirname "$0") +source "$lib_dir"/../../../net/forwarding/lib.sh + +s_ip4="192.0.2.254" +c_ip4="192.0.2.1" +s_ip6="2001:db8::254" +c_ip6="2001:db8::1" + +switch_create() +{ + ip netns exec ${s_ns} sysctl -q net.ipv6.conf.default.keep_addr_on_down=1 + ip -n ${s_ns} link add eth0 type veth peer name eth0 netns ${c_ns} + ip -n ${s_ns} link add eth1 type veth peer name eth1 netns ${c_ns} + ip -n ${s_ns} link add bond0 type bond mode 802.3ad miimon 100 lacp_active on lacp_rate fast + ip -n ${s_ns} link set eth0 master bond0 + ip -n ${s_ns} link set eth1 master bond0 + ip -n ${s_ns} addr add ${s_ip4}/24 dev bond0 + ip -n ${s_ns} addr add ${s_ip6}/24 dev bond0 + ip -n ${s_ns} link set bond0 up +} + +client_create() +{ + ip netns exec ${c_ns} sysctl -q net.ipv6.conf.default.keep_addr_on_down=1 + ip -n ${c_ns} link add bond0 type bond mode 802.3ad miimon 100 lacp_active on lacp_rate fast + ip -n ${c_ns} link set eth0 master bond0 + ip -n ${c_ns} link set eth1 master bond0 + ip -n ${c_ns} addr add ${c_ip4}/24 dev bond0 + ip -n ${c_ns} addr add ${c_ip6}/24 dev bond0 + ip -n ${c_ns} link set bond0 up +} + +setup_topo_lacp() +{ + setup_ns s_ns c_ns + defer cleanup_all_ns + + switch_create + client_create +} + +# Reset bond with and options +lacp_bond_reset() +{ + local netns="$1" + local param="$2" + + ip -n ${netns} link set bond0 down + ip -n ${netns} link set bond0 type bond $param + ip -n ${netns} link set bond0 up + + # Wait for IPv6 address ready as it needs DAD + slowwait 10 ip netns exec ${c_ns} ping6 ${s_ip6} -c 1 -W 0.1 &> /dev/null +}
On 7/9/25 11:03 AM, Hangbin Liu wrote:
Add a selftest to verify bonding behavior when lacp_active is set to off.
Signed-off-by: Hangbin Liu liuhangbin@gmail.com
.../drivers/net/bonding/bond_passive_lacp.sh | 21 +++++ .../drivers/net/bonding/bond_topo_lacp.sh | 77 +++++++++++++++++++ 2 files changed, 98 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh create mode 100644 tools/testing/selftests/drivers/net/bonding/bond_topo_lacp.sh
New test should be listed in the relevant makefile
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh new file mode 100755 index 000000000000..4cf8a5999aaa --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh @@ -0,0 +1,21 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Testing if bond works with lacp_active = off
+lib_dir=$(dirname "$0") +source ${lib_dir}/bond_topo_lacp.sh
shellcheck is not super happy about 'source' usage:
In bond_passive_lacp.sh line 7: source ${lib_dir}/bond_topo_lacp.sh ^-- SC3046 (warning): In POSIX sh, 'source' in place of '.' is undefined. ^-- SC3051 (warning): In POSIX sh, 'source' in place of '.' is undefined.
either switch to '. ' or use bash instead of 'sh'.
+lacp_bond_reset "${c_ns}" "lacp_active off" +# make sure the switch state is not expired [A,T,G,S,Ex] +if slowwait 15 ip netns exec ${s_ns} grep -q 'port state: 143' /proc/net/bonding/bond0; then
Shellcheck wants double quote everywhere. Since in many cases (all the blamed ones in this patch) we know the variable is really a single word, I think you could simply disable the warning with:
#shellcheck disable=SC2086
(same in the other test file)
- RET=1
+else
- RET=0
+fi
/P
On Tue, Jul 15, 2025 at 11:37:54AM +0200, Paolo Abeni wrote:
On 7/9/25 11:03 AM, Hangbin Liu wrote:
Add a selftest to verify bonding behavior when lacp_active is set to off.
Signed-off-by: Hangbin Liu liuhangbin@gmail.com
.../drivers/net/bonding/bond_passive_lacp.sh | 21 +++++ .../drivers/net/bonding/bond_topo_lacp.sh | 77 +++++++++++++++++++ 2 files changed, 98 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh create mode 100644 tools/testing/selftests/drivers/net/bonding/bond_topo_lacp.sh
New test should be listed in the relevant makefile
Ah, yes, I forgot this.
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh new file mode 100755 index 000000000000..4cf8a5999aaa --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh @@ -0,0 +1,21 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Testing if bond works with lacp_active = off
+lib_dir=$(dirname "$0") +source ${lib_dir}/bond_topo_lacp.sh
shellcheck is not super happy about 'source' usage:
In bond_passive_lacp.sh line 7: source ${lib_dir}/bond_topo_lacp.sh ^-- SC3046 (warning): In POSIX sh, 'source' in place of '.' is undefined. ^-- SC3051 (warning): In POSIX sh, 'source' in place of '.' is undefined.
either switch to '. ' or use bash instead of 'sh'.
OK, I will fix this and other warns.
Thanks Hangbin
+lacp_bond_reset "${c_ns}" "lacp_active off" +# make sure the switch state is not expired [A,T,G,S,Ex] +if slowwait 15 ip netns exec ${s_ns} grep -q 'port state: 143' /proc/net/bonding/bond0; then
Shellcheck wants double quote everywhere. Since in many cases (all the blamed ones in this patch) we know the variable is really a single word, I think you could simply disable the warning with:
#shellcheck disable=SC2086
(same in the other test file)
- RET=1
+else
- RET=0
+fi
/P
On Tue, Jul 15, 2025 at 11:37:54AM +0200, Paolo Abeni wrote:
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh new file mode 100755 index 000000000000..4cf8a5999aaa --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh @@ -0,0 +1,21 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Testing if bond works with lacp_active = off
+lib_dir=$(dirname "$0") +source ${lib_dir}/bond_topo_lacp.sh
shellcheck is not super happy about 'source' usage:
In bond_passive_lacp.sh line 7: source ${lib_dir}/bond_topo_lacp.sh ^-- SC3046 (warning): In POSIX sh, 'source' in place of '.' is undefined. ^-- SC3051 (warning): In POSIX sh, 'source' in place of '.' is undefined.
either switch to '. ' or use bash instead of 'sh'.
Hi Paolo,
I updated the case and remove the source file bond_topo_lacp.sh. Instead I source the forwarding lib directly like:
lib_dir=$(dirname "$0") source "$lib_dir"/../../../net/forwarding/lib.sh
But this cause shell check unable to find the lib.sh as $lib_dir is get dynamically. This usage is common in selftest. How should we resolves this problem?
Thanks Hangbin
On Thu, Jul 24, 2025 at 04:06:03AM +0000, Hangbin Liu wrote:
On Tue, Jul 15, 2025 at 11:37:54AM +0200, Paolo Abeni wrote:
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh new file mode 100755 index 000000000000..4cf8a5999aaa --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh @@ -0,0 +1,21 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Testing if bond works with lacp_active = off
+lib_dir=$(dirname "$0") +source ${lib_dir}/bond_topo_lacp.sh
shellcheck is not super happy about 'source' usage:
In bond_passive_lacp.sh line 7: source ${lib_dir}/bond_topo_lacp.sh ^-- SC3046 (warning): In POSIX sh, 'source' in place of '.' is undefined. ^-- SC3051 (warning): In POSIX sh, 'source' in place of '.' is undefined.
either switch to '. ' or use bash instead of 'sh'.
Hi Paolo,
I updated the case and remove the source file bond_topo_lacp.sh. Instead I source the forwarding lib directly like:
lib_dir=$(dirname "$0") source "$lib_dir"/../../../net/forwarding/lib.sh
But this cause shell check unable to find the lib.sh as $lib_dir is get dynamically. This usage is common in selftest. How should we resolves this problem?
OK, I just disabled this warning.
# shellcheck disable=SC1091
Hangbin
Hangbin Liu liuhangbin@gmail.com writes:
On Thu, Jul 24, 2025 at 04:06:03AM +0000, Hangbin Liu wrote:
On Tue, Jul 15, 2025 at 11:37:54AM +0200, Paolo Abeni wrote:
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh new file mode 100755 index 000000000000..4cf8a5999aaa --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh @@ -0,0 +1,21 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Testing if bond works with lacp_active = off
+lib_dir=$(dirname "$0") +source ${lib_dir}/bond_topo_lacp.sh
shellcheck is not super happy about 'source' usage:
In bond_passive_lacp.sh line 7: source ${lib_dir}/bond_topo_lacp.sh ^-- SC3046 (warning): In POSIX sh, 'source' in place of '.' is undefined. ^-- SC3051 (warning): In POSIX sh, 'source' in place of '.' is undefined.
either switch to '. ' or use bash instead of 'sh'.
Hi Paolo,
I updated the case and remove the source file bond_topo_lacp.sh. Instead I source the forwarding lib directly like:
lib_dir=$(dirname "$0") source "$lib_dir"/../../../net/forwarding/lib.sh
But this cause shell check unable to find the lib.sh as $lib_dir is get dynamically. This usage is common in selftest. How should we resolves this problem?
OK, I just disabled this warning.
# shellcheck disable=SC1091
I believe the point was only about using "." instead of "source". The following should have fixed it:
. ${lib_dir}/bond_topo_lacp.sh
... or just use bash as the interpreter, I suspect lib.sh is not actually POSIX clean.
On Fri, Jul 25, 2025 at 10:27:48AM +0200, Petr Machata wrote:
Hangbin Liu liuhangbin@gmail.com writes:
On Thu, Jul 24, 2025 at 04:06:03AM +0000, Hangbin Liu wrote:
On Tue, Jul 15, 2025 at 11:37:54AM +0200, Paolo Abeni wrote:
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh new file mode 100755 index 000000000000..4cf8a5999aaa --- /dev/null +++ b/tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh @@ -0,0 +1,21 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 +# +# Testing if bond works with lacp_active = off
+lib_dir=$(dirname "$0") +source ${lib_dir}/bond_topo_lacp.sh
shellcheck is not super happy about 'source' usage:
In bond_passive_lacp.sh line 7: source ${lib_dir}/bond_topo_lacp.sh ^-- SC3046 (warning): In POSIX sh, 'source' in place of '.' is undefined. ^-- SC3051 (warning): In POSIX sh, 'source' in place of '.' is undefined.
either switch to '. ' or use bash instead of 'sh'.
Hi Paolo,
I updated the case and remove the source file bond_topo_lacp.sh. Instead I source the forwarding lib directly like:
lib_dir=$(dirname "$0") source "$lib_dir"/../../../net/forwarding/lib.sh
But this cause shell check unable to find the lib.sh as $lib_dir is get dynamically. This usage is common in selftest. How should we resolves this problem?
OK, I just disabled this warning.
# shellcheck disable=SC1091
I believe the point was only about using "." instead of "source". The following should have fixed it:
. ${lib_dir}/bond_topo_lacp.sh
... or just use bash as the interpreter, I suspect lib.sh is not actually POSIX clean.
Thanks Petr, I know Paolo means to use "." to fix this. The issue is that I changed the script to source forwarding lib. And shell check could only analyse static path. Unless use -x to supply the real source path. But I guess the CI can't do this. So I disabled the SC1091 checking as a workaround.
Regards Hangbin
linux-kselftest-mirror@lists.linaro.org