On Thu, Mar 06, 2025 at 09:37:53AM +0000, Hangbin Liu wrote:
The reason the mutex was added (instead of the spinlock used before) was exactly because the add and free offload operations could sleep.
Following your reply, I also checked xdo_dev_state_add() in bond_ipsec_add_sa_all(), which may also sleep, e.g. mlx5e_xfrm_add_state().
If we drop the spin lock there, the race comes back again.
Any idea about this?
The race is between bond_ipsec_del_sa_all() and bond_ipsec_del_sa() (plus bond_ipsec_free_sa()). The issue is that as soon as bond_ipsec_del_sa_all() releases x->lock, bond_ipsec_del_sa() can immediately be called, followed by bond_ipsec_free_sa().

Maybe drop x->lock only after setting real_dev to NULL? I checked, and real_dev is not used anywhere on the free calls, I think. I have another series refactoring things around real_dev; I hope to be able to send it soon.
Here's a sketch of this idea:
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -613,8 +613,11 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
-		if (!ipsec->xs->xso.real_dev)
+		spin_lock_bh(&ipsec->xs->lock);
+		if (!ipsec->xs->xso.real_dev) {
+			spin_unlock_bh(&ipsec->xs->lock);
 			continue;
+		}
 
 		if (!real_dev->xfrmdev_ops ||
 		    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
@@ -622,12 +625,16 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 		    netif_is_bond_master(real_dev)) {
 			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_delete\n",
 				   __func__);
-		} else {
-			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
-								    ipsec->xs);
-			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
-				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+			ipsec->xs->xso.real_dev = NULL;
+			spin_unlock_bh(&ipsec->xs->lock);
+			continue;
 		}
+		real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev,
+							    ipsec->xs);
+		if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+			real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+		ipsec->xs->xso.real_dev = NULL;
+		spin_unlock_bh(&ipsec->xs->lock);
 	}
 	mutex_unlock(&bond->ipsec_lock);
Setting xs->xso.real_dev = NULL is a good idea, since bond_ipsec_del_sa()/bond_ipsec_free_sa() bail out when xs->xso.real_dev is not set.

For bond_ipsec_add_sa_all(), I will move the xso.real_dev = real_dev assignment after .xdo_dev_state_add() to avoid the following situation:
bond_ipsec_add_sa_all()                      __xfrm_state_delete()

  spin_unlock(&ipsec->xs->lock);
  ipsec->xs->xso.real_dev = real_dev;
                                             x->km.state = XFRM_STATE_DEAD
                                             - bond_ipsec_del_sa()
                                               - .xdo_dev_state_delete()
  .xdo_dev_state_add()
Hmm, do we still need the spin_lock in bond_ipsec_add_sa_all()? With xs->xso.real_dev set to NULL after bond_ipsec_del_sa_all(), it looks like the spin_lock in bond_ipsec_add_sa_all() is no longer needed. e.g.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 04b677d0c45b..3ada51c63207 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -537,15 +537,27 @@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 	}
 
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
+		spin_lock_bh(&ipsec->xs->lock);
+		/* Skip dead xfrm states, they'll be freed later. */
+		if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
+			spin_unlock_bh(&ipsec->xs->lock);
+			continue;
+		}
+
 		/* If new state is added before ipsec_lock acquired */
-		if (ipsec->xs->xso.real_dev == real_dev)
+		if (ipsec->xs->xso.real_dev == real_dev) {
+			spin_unlock_bh(&ipsec->xs->lock);
 			continue;
+		}
 
-		ipsec->xs->xso.real_dev = real_dev;
 		if (real_dev->xfrmdev_ops->xdo_dev_state_add(ipsec->xs, NULL)) {
 			slave_warn(bond_dev, real_dev,
 				   "%s: failed to add SA\n", __func__);
 			ipsec->xs->xso.real_dev = NULL;
 		}
+		/* Set real_dev after .xdo_dev_state_add in case
+		 * __xfrm_state_delete() is called in parallel
+		 */
+		ipsec->xs->xso.real_dev = real_dev;
+		spin_unlock_bh(&ipsec->xs->lock);
 	}
The spin_lock here seems useless now. What do you think?
Thanks,
Hangbin