On Wed, Mar 05, 2025 at 04:12:18PM +0000, Cosmin Ratiu wrote:
On Wed, 2025-03-05 at 14:13 +0000, Hangbin Liu wrote:
On Wed, Mar 05, 2025 at 10:38:36AM +0200, Nikolay Aleksandrov wrote:
@@ -617,8 +614,18 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
Second time - you should use list_for_each_entry_safe if you're walking and deleting elements from the list.
Sorry, I missed this comment. I will update in next version.
+		spin_lock_bh(&ipsec->xs->lock);
 		if (!ipsec->xs->xso.real_dev)
-			continue;
+			goto next;
+
+		if (ipsec->xs->km.state == XFRM_STATE_DEAD) {
+			/* already dead, no need to delete again */
+			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
Have you checked if .xdo_dev_state_free can sleep? I see at least one that can: mlx5e_xfrm_free_state().
Hmm, this brings us back to the initial problem. We tried to avoid calling operations that may sleep while holding a spin lock (that's why bond_ipsec_del_sa() was reworked), but the new code runs into this issue again.
The reason the mutex was added (instead of the spinlock used before) was exactly because the add and free offload operations could sleep.
With your reply in mind, I also checked .xdo_dev_state_add() in bond_ipsec_add_sa_all(); it may also sleep, e.g. mlx5e_xfrm_add_state(). But if we drop the spin lock around these calls, the race comes back. Any idea how to handle this?
The race is between bond_ipsec_del_sa_all() and bond_ipsec_del_sa() (plus bond_ipsec_free_sa()). The issue is that once bond_ipsec_del_sa_all() releases x->lock, bond_ipsec_del_sa() can immediately be called, followed by bond_ipsec_free_sa(). Maybe drop x->lock only after setting real_dev to NULL? I checked, and real_dev is not used anywhere in the free calls, I think. I have another series refactoring things around real_dev; I hope to be able to send it soon.
Here's a sketch of this idea:
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -613,8 +613,11 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 	mutex_lock(&bond->ipsec_lock);
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
-		if (!ipsec->xs->xso.real_dev)
+		spin_lock(&ipsec->xs->lock);
+		if (!ipsec->xs->xso.real_dev) {
+			spin_unlock(&ipsec->xs->lock);
 			continue;
+		}

 		if (!real_dev->xfrmdev_ops ||
 		    !real_dev->xfrmdev_ops->xdo_dev_state_delete ||
@@ -622,12 +625,16 @@ static void bond_ipsec_del_sa_all(struct bonding *bond)
 			slave_warn(bond_dev, real_dev,
 				   "%s: no slave xdo_dev_state_delete\n",
 				   __func__);
-		} else {
-			real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs);
-			if (real_dev->xfrmdev_ops->xdo_dev_state_free)
-				real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);
+			ipsec->xs->xso.real_dev = NULL;
+			spin_unlock(&ipsec->xs->lock);
+			continue;
 		}
+		real_dev->xfrmdev_ops->xdo_dev_state_delete(real_dev, ipsec->xs);
+		ipsec->xs->xso.real_dev = NULL;
+		/* Unlock before freeing device state, it could sleep. */
+		spin_unlock(&ipsec->xs->lock);
+		if (real_dev->xfrmdev_ops->xdo_dev_state_free)
+			real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs);

Cosmin.

Setting xs->xso.real_dev = NULL is a good idea, since bond_ipsec_del_sa()/bond_ipsec_free_sa() will bail out when xs->xso.real_dev is NULL.

For bond_ipsec_add_sa_all(), I will move the xso.real_dev = real_dev assignment to after .xdo_dev_state_add(), to avoid the following situation:

bond_ipsec_add_sa_all()                   __xfrm_state_delete()
  spin_unlock(&ipsec->xs->lock);
  ipsec->xs->xso.real_dev = real_dev;
                                            x->km.state = XFRM_STATE_DEAD
                                            - bond_ipsec_del_sa()
                                              - .xdo_dev_state_delete()
  .xdo_dev_state_add()

Thanks
Hangbin
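For reference, the reordering described for bond_ipsec_add_sa_all() could look roughly like this. This is an untested sketch, not the actual patch: the .xdo_dev_state_add() argument list is assumed to mirror the .xdo_dev_state_delete() call in the sketch above, and the surrounding context lines are approximate.

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ static void bond_ipsec_add_sa_all(struct bonding *bond)
 	list_for_each_entry(ipsec, &bond->ipsec_list, list) {
-		ipsec->xs->xso.real_dev = real_dev;
 		if (real_dev->xfrmdev_ops->xdo_dev_state_add(real_dev,
 							     ipsec->xs, NULL)) {
-			ipsec->xs->xso.real_dev = NULL;
 			slave_warn(bond_dev, real_dev,
 				   "%s: failed to add SA\n", __func__);
 			continue;
 		}
+		/* Publish real_dev only after the offload is installed, so a
+		 * concurrent __xfrm_state_delete() -> bond_ipsec_del_sa()
+		 * cannot call .xdo_dev_state_delete() before the add ran. */
+		ipsec->xs->xso.real_dev = real_dev;
 	}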