On Thu, Feb 27, 2025 at 03:31:01PM +0200, Nikolay Aleksandrov wrote:
One more thing - note I'm not an xfrm expert by far but it seems to me here you have to also call xdo_dev_state_free() with the old active slave dev otherwise that will never get called with the original real_dev after the switch to a new active slave (or more accurately it might if the GC runs between the switching but it is a race), care must be taken wrt sequence of events because the XFRM
Can we just call xs->xso.real_dev->xfrmdev_ops->xdo_dev_state_free(xs) no matter xs->xso.real_dev == real_dev or not? I'm afraid calling xdo_dev_state_free() every where may make us lot more easily.
You'd have to check all drivers that implement the callback to answer that and even then I'd stick to the canonical way of how it's done in xfrm and make the bond just passthrough. Any other games become dangerous and new code will have to be carefully reviewed every time, calling another device's free_sa when it wasn't added before doesn't sound good.
GC may be running in parallel which probably means that in bond_ipsec_free_sa() you'll have to take the mutex before calling xdo_dev_state_free() and check if the entry is still linked in the bond's ipsec list before calling the free_sa callback, if it isn't then del_sa_all got to it before the GC and there's nothing to do if it also called the dev's free_sa callback. The check for real_dev doesn't seem enough to protect against this race.
I agree that we need to take the mutex before calling xdo_dev_state_free() in bond_ipsec_free_sa(). Do you think if this is enough? I'm a bit lot here.
Thanks Hangbin
Well, the race is between the xfrm GC and del_sa_all, in bond's free_sa if you walk the list under the mutex before calling real_dev's free callback and don't find the current element that's being freed in free_sa then it was cleaned up by del_sa_all, otherwise del_sa_all is waiting to walk that list and clean the entries. I think it should be fine as long as free_sa was called once with the proper device.
OK, so the free will be called either in del_sa_all() or free_sa(). Something like this?
static void bond_ipsec_del_sa_all(struct bonding *bond) @@ -620,6 +614,16 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) if (!ipsec->xs->xso.real_dev) continue;
+ if (ipsec->xs->km.state == XFRM_STATE_DEAD) { + /* already dead no need to delete again */ + if (real_dev->xfrmdev_ops->xdo_dev_state_free) + real_dev->xfrmdev_ops->xdo_dev_state_free(ipsec->xs); + list_del(&ipsec->list); + kfree(ipsec); + continue; + } + if (!real_dev->xfrmdev_ops || !real_dev->xfrmdev_ops->xdo_dev_state_delete || netif_is_bond_master(real_dev)) {
@@ -659,11 +664,22 @@ static void bond_ipsec_free_sa(struct xfrm_state *xs) if (!xs->xso.real_dev) goto out;
- WARN_ON(xs->xso.real_dev != real_dev); + mutex_lock(&bond->ipsec_lock); + list_for_each_entry(ipsec, &bond->ipsec_list, list) { + if (ipsec->xs == xs) { + if (real_dev && xs->xso.real_dev == real_dev &&
^^ looks we don't need this xs->xso.real_dev == real_dev checking if there is no race, do we? Or just keep the WARN_ON() in case of any race.
+ real_dev->xfrmdev_ops && + real_dev->xfrmdev_ops->xdo_dev_state_free) + real_dev->xfrmdev_ops->xdo_dev_state_free(xs); + list_del(&ipsec->list); + kfree(ipsec); + break; + } + } + mutex_unlock(&bond->ipsec_lock);
- if (real_dev && real_dev->xfrmdev_ops && - real_dev->xfrmdev_ops->xdo_dev_state_free) - real_dev->xfrmdev_ops->xdo_dev_state_free(xs); out: netdev_put(real_dev, &tracker); }