On Fri, Apr 11, 2025 at 10:49:57AM +0300, Cosmin Ratiu wrote:
When the active link is changed for a bond device, the existing xfrm states need to be migrated over to the new link. This is done with:
- bond_ipsec_del_sa_all() goes through the offloaded states list and removes all of them from hw.
- bond_ipsec_add_sa_all() re-offloads all states to the new device.
But because the offload status of xfrm states isn't marked in any way, there can be bugs.
When all bond links are down, bond_ipsec_del_sa_all() unoffloads everything from the previous active link. If the same link then comes back up, nothing gets reoffloaded by bond_ipsec_add_sa_all(). This results in a stack trace like this a bit later when user space removes the offloaded rules, because mlx5e_xfrm_del_state() is asked to remove a rule that's no longer offloaded:
[] Call Trace: [] <TASK> [] ? __warn+0x7d/0x110 [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core] [] ? report_bug+0x16d/0x180 [] ? handle_bug+0x4f/0x90 [] ? exc_invalid_op+0x14/0x70 [] ? asm_exc_invalid_op+0x16/0x20 [] ? mlx5e_xfrm_del_state+0x73/0xa0 [mlx5_core] [] ? mlx5e_xfrm_del_state+0x90/0xa0 [mlx5_core] [] bond_ipsec_del_sa+0x1ab/0x200 [bonding] [] xfrm_dev_state_delete+0x1f/0x60 [] __xfrm_state_delete+0x196/0x200 [] xfrm_state_delete+0x21/0x40 [] xfrm_del_sa+0x69/0x110 [] xfrm_user_rcv_msg+0x11d/0x300 [] ? release_pages+0xca/0x140 [] ? copy_to_user_tmpl.part.0+0x110/0x110 [] netlink_rcv_skb+0x54/0x100 [] xfrm_netlink_rcv+0x31/0x40 [] netlink_unicast+0x1fc/0x2d0 [] netlink_sendmsg+0x1e4/0x410 [] __sock_sendmsg+0x38/0x60 [] sock_write_iter+0x94/0xf0 [] vfs_write+0x338/0x3f0 [] ksys_write+0xba/0xd0 [] do_syscall_64+0x4c/0x100 [] entry_SYSCALL_64_after_hwframe+0x4b/0x53
There's also another theoretical bug: Calling bond_ipsec_del_sa_all() multiple times can result in corruption in the driver implementation if the double-free isn't tolerated. This isn't nice.
Before the "Fixes" commit, xs->xso.real_dev was set to NULL when an xfrm state was unoffloaded from a device, but a race with netdevsim's .xdo_dev_offload_ok() accessing real_dev was considered a sufficient reason to not set real_dev to NULL anymore. This unfortunately introduced the new bugs.
Since .xdo_dev_offload_ok() was significantly refactored by [1] and there are no more users in the stack of xso.real_dev, that race is now gone and xs->xso.real_dev can now once again be used to represent which device (if any) currently holds the offloaded rule.
Go one step further and set real_dev after add/before delete calls, to catch any future driver misuses of real_dev.
[1] https://lore.kernel.org/netdev/cover.1739972570.git.leon@kernel.org/ Fixes: f8cde9805981 ("bonding: fix xfrm real_dev null pointer dereference") Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Leon Romanovsky leonro@nvidia.com
drivers/net/bonding/bond_main.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 4ba525a564c5..14f7c9712ad4 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -496,9 +496,9 @@ static int bond_ipsec_add_sa(struct net_device *bond_dev, goto out; }
- xs->xso.real_dev = real_dev; err = real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, xs, extack); if (!err) {
ipsec->xs = xs; INIT_LIST_HEAD(&ipsec->list); mutex_lock(&bond->ipsec_lock);xs->xso.real_dev = real_dev;
@@ -540,12 +540,12 @@ static void bond_ipsec_add_sa_all(struct bonding *bond) if (ipsec->xs->xso.real_dev == real_dev) continue;
if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev, ipsec->xs, NULL)) { slave_warn(bond_dev, real_dev, "%s: failed to add SA\n", __func__);ipsec->xs->xso.real_dev = real_dev;
ipsec->xs->xso.real_dev = NULL;
}continue;
}ipsec->xs->xso.real_dev = real_dev;
out: mutex_unlock(&bond->ipsec_lock); @@ -629,6 +629,7 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) __func__); continue; }
real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs); if (real_dev->xfrmdev_ops->xdo_dev_state_free)ipsec->xs->xso.real_dev = NULL;
@@ -664,6 +665,7 @@ static void bond_ipsec_free_sa(struct net_device *bond_dev, WARN_ON(xs->xso.real_dev != real_dev);
- xs->xso.real_dev = NULL; if (real_dev && real_dev->xfrmdev_ops && real_dev->xfrmdev_ops->xdo_dev_state_free) real_dev->xfrmdev_ops->xdo_dev_state_free(real_dev, xs);
-- 2.45.0
Tested-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com