On 05/06, Cosmin Ratiu wrote:
On Tue, 2025-05-06 at 10:12 -0700, Stanislav Fomichev wrote:
On 05/06, Cosmin Ratiu wrote:
__netdev_update_features() expects the netdevice to be ops-locked, but it gets called recursively on the lower level netdevices to sync their features, and nothing locks those.
This commit fixes that, with the assumption that it shouldn't be possible for both higher-level and lover-level netdevices to require the instance lock, because that would lead to lock dependency warnings.
Without this, playing with higher level (e.g. vxlan) netdevices on top of netdevices with instance locking enabled can run into issues:
Mentioning vxlan is a bit confusing here; it shouldn't let you flip lro (I think). Which upper are you testing against?
It is vxlan, but LRO is just a red herring in this case, mlx5e_set_features calls every feature handler in turn, and this is just the example I picked from the sea of stack traces.
Trying to understand if we can cover this case in the selftests. netdevsim also doesn't expose F_LRO feature... (yet?)
I see you found a way with teaming, but I think in essence a sequence of commands that makes __netdev_update_features call itself recursively once on the lower dev will trigger the netdev_ops_assert_locked on the lower dev.
Right, but netdev_sync_lower_features calls lower's __netdev_update_features only for NETIF_F_UPPER_DISABLES. So it doesn't propagate all features, only LRO AFAICT.