On Mon, Jan 20, 2025 at 04:16:49PM +0000, Cosmin Ratiu wrote:
On Fri, 2025-01-17 at 08:54 +0100, Steffen Klassert wrote:
Hi Jianbo,
I talked with Sabrina and it looks we can't simply do this. Because both xfrm_add_sa_expire() and xfrm_timer_handler() calling __xfrm_state_delete() under spin lock. If we move the xfrm_dev_state_delete() out of __xfrm_state_delete(), all the places need to be handled correctly.
At the same time xfrm_timer_handler() calling xfrm_dev_state_update_stats before __xfrm_state_delete(). Should we also take care of it to make sure the state change and delete are called at the same time?
Hi Steffen, do you have any comments?
Can't you just fix this in bonding? xfrm_timer_handler() can't sleep anyway, even if you remove the spinlock, it is a timer function.
I am not sure this can be fixed in bonding given that the xdo_dev_state_delete op could, in the general case, sleep while talking to the hardware. I don't think it's reasonable to expect devices to offload xfrm while the kernel holds a spinlock. Bonding just exposed this assumption mismatch because of the mutex that was added to replace a spinlock which exhibited the same problem we are talking about here.
Do the dev offload operations need to be synchronous? Couldn't __xfrm_state_delete instead schedule a wq to do the dev offload? I saw there's already an xfrm_state_gc_task that's invoked to call xfrm_dev_state_free, perhaps that could be used to do the delete as well?
Yes, I have tried to move the bonding gc work in bond_ipsec_del_sa() to a wq in https://lore.kernel.org/netdev/Z33nEKg4PxwReUu_@fedora/. i.e. move the following part out of spin lock via wq.
mutex_lock(&bond->ipsec_lock); list_for_each_entry(ipsec, &bond->ipsec_list, list) { if (ipsec->xs == xs) { list_del(&ipsec->list); kfree(ipsec); break; } } mutex_unlock(&bond->ipsec_lock);
But we can see there is an (ipsec->xs == xs). So we still need to make sure the xs is not released. Can we add a xs reference in bond_ipsec_del_sa() to achieve this?
Thanks Hangbin