On 9/29/24 3:11 AM, Ido Schimmel wrote:
On Sun, Sep 29, 2024 at 02:18:20AM -0400, Willem de Bruijn wrote:
From: Willem de Bruijn <willemb@google.com>
This reverts commit 504fc6f4f7f681d2a03aa5f68aad549d90eab853.
dev_queue_xmit_nit is expected to be called with BH disabled. __dev_queue_xmit has the following:
        /* Disable soft irqs for various locks below. Also
         * stops preemption for RCU.
         */
        rcu_read_lock_bh();
VRF must follow this invariant. The referenced commit removed this protection, which triggered a lockdep warning:
[...]
Fixes: 504fc6f4f7f6 ("vrf: Remove unnecessary RCU-bh critical section")
Link: https://lore.kernel.org/netdev/20240925185216.1990381-1-greearb@candelatech....
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Cc: stable@vger.kernel.org
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Thanks Willem!
The reason my script from 504fc6f4f7f6 did not trigger the problem is that it was pinging the address inside the VRF, so vrf_finish_direct() was only called from the Rx path.
If you ping the address outside of the VRF:
ping -I vrf1 -i 0.1 -c 10 -q 192.0.2.1
Then vrf_finish_direct() is called from process context and the lockdep warning is triggered. Tested that it does not trigger after applying the revert.
That case should be covered by the fcnal-test suite which does all combinations of addresses.
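
For readers following along, the difference between the two ping cases can be modeled in userspace. This is only a rough sketch, not kernel code: every model_* name, the depth counter, and run_case() are hypothetical stand-ins, and the "violation" counter merely approximates what lockdep reports. It assumes that the Rx path runs with BH already disabled and that process context does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins; none of this is the kernel's code. */
static int bh_disable_depth;	/* models the BH-disable nesting count */
static int violations;		/* models lockdep warnings */

static void model_rcu_read_lock_bh(void)
{
	bh_disable_depth++;
}

static void model_rcu_read_unlock_bh(void)
{
	assert(bh_disable_depth > 0);	/* unbalanced unlock is a model bug */
	bh_disable_depth--;
}

/* Models dev_queue_xmit_nit(): the caller must have BH disabled. */
static void model_xmit_nit(void)
{
	if (bh_disable_depth == 0)
		violations++;
}

/* Models vrf_finish_direct() with or without the RCU-bh section. */
static void model_vrf_finish_direct(bool take_bh_lock)
{
	if (take_bh_lock)
		model_rcu_read_lock_bh();
	model_xmit_nit();
	if (take_bh_lock)
		model_rcu_read_unlock_bh();
}

/*
 * Runs one call and returns the number of violations seen.
 * from_softirq = true models the Rx path (BH already disabled),
 * false models process context (ping to an address outside the VRF).
 */
int run_case(bool from_softirq, bool take_bh_lock)
{
	violations = 0;
	bh_disable_depth = from_softirq ? 1 : 0;
	model_vrf_finish_direct(take_bh_lock);
	return violations;
}
```

In this model only run_case(false, false) reports a violation, i.e. process context without the RCU-bh section, which would match the behaviour described above: the Rx-only test never hit the warning, and restoring the critical section silences it.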