So in spite of this Syzkaller bug being unrelated in the end, I've continued to think about the stacktrace a bit, and combined with some other [potentially false alarm] bug reports I'm trying to wrap my head around, I'm a bit a curious about ideal usage for the udp_tunnel API.
All the uses I've seen in the kernel (including wireguard) follow this pattern:
rcu_read_lock_bh(); sock = rcu_dereference(obj->sock); ... udp_tunnel_xmit_skb(..., sock, ...); rcu_read_unlock_bh();
udp_tunnel_xmit_skb calls iptunnel_xmit, which winds up in the usual ip_local_out path, which eventually winds up calling some other devices' ndo_xmit, or gets queued up in a qdisc. Calls to udp_tunnel_xmit_skb aren't exactly cheap. So I wonder: is holding the rcu lock for all that time really a good thing?
A different pattern that avoids holding the rcu lock would be:
rcu_read_lock_bh(); sock = rcu_dereference(obj->sock); sock_hold(sock); rcu_read_unlock_bh(); ... udp_tunnel_xmit_skb(..., sock, ...); sock_put(sock);
This seems better, but I wonder if it has some drawbacks too. For example, sock_put has some comment that warns against incrementing it in response to forwarded packets. And if this isn't necessary to do, it's marginally more costly than the first pattern.
Any opinions about this?
Jason