On Sat, Nov 30, 2024 at 12:33:10PM +0100, Christian Marangi wrote:
> It seems real hardware requires some time to stabilize and actually works after an 'ip link up'. This is not the case for veth as everything is simulated but this is a requirement for real hardware to permit receiving packets.
> Without this the very first test for unicast always fails on real hardware. With the introduced sleep of one second after mc_route_prepare, the test correctly passes as the packet can correctly be delivered.
I think the analysis is not very convincing for the following reason.
To wait after "ip link up", setup_wait() calls setup_wait_dev_with_timeout() which waits until "ip link show dev $dev up" reports 'state UP'. This comes from IFLA_OPERSTATE, set by linkwatch.
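For reference, the kind of polling I mean can be sketched roughly as below. This is a simplified illustration, not the actual lib.sh code: the name wait_oper_up and its default retry count are made up for the example, and it only assumes that "ip link show dev $dev up" prints a line containing 'state UP' once the operstate is UP.

```sh
# Hypothetical helper (not the real setup_wait_dev_with_timeout()):
# poll once per second until "ip link show" reports operstate UP.
wait_oper_up()
{
	dev=$1
	tries=${2:-10}	# assumed default, chosen only for this sketch

	while [ "$tries" -gt 0 ]; do
		# "up" limits the output to administratively up devices;
		# 'state UP' is the operstate as printed by iproute2.
		if ip link show dev "$dev" up | grep -q 'state UP'; then
			return 0
		fi
		sleep 1
		tries=$((tries - 1))
	done

	return 1
}
```

A caller would do something like "wait_oper_up swp1 10 || echo 'swp1 never came up'", where swp1 is just a placeholder interface name.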
I remember having this conversation with Danielle Ratson a few years ago: https://lore.kernel.org/netdev/20210624151515.794224-1-danieller@nvidia.com/ but the bottom line should be that, since commit facd15dfd691 ("net: core: synchronize link-watch when carrier is queried"), AFAIU, an operstate of UP really means that the net device is ready to pass traffic. Failure to do so should be a device-side problem.
Then I thought that maybe tcpdump needs some time to set up its filters on the receiving net device. But tcpdump_start() already has "sleep 1" in it. I admit that value was purely empirical, and there's no guarantee that tcpdump has finished setting up even after 1 second. If you increase it to 2, does it also solve your problem?
Or do you really have to place the sleep call after the mc_route_prepare() calls, and placing it any earlier won't help? In that case, would that isolate the sleeping requirement to the multicast routes themselves?
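On the tcpdump point, if the fixed sleep turns out to be the weak spot, one possible alternative is to wait for tcpdump's own startup message instead of a fixed delay. The sketch below is only an idea, not something tcpdump_start() does today: wait_for_line is a made-up helper, and it assumes that tcpdump's stderr is redirected to a log file and that the "listening on" line appears there only once the capture is set up.

```sh
# Hypothetical helper: poll a log file until a pattern shows up.
# Intended use (assumption, not existing selftest code):
#   tcpdump -i "$dev" ... 2> "$log" &
#   wait_for_line "$log" 'listening on' 5
wait_for_line()
{
	file=$1
	pattern=$2
	tries=${3:-5}	# assumed default, one attempt per second

	while [ "$tries" -gt 0 ]; do
		# The file may not exist yet right after starting tcpdump.
		if grep -q "$pattern" "$file" 2>/dev/null; then
			return 0
		fi
		sleep 1
		tries=$((tries - 1))
	done

	return 1
}
```

That would at least tell us whether the remaining failure is related to tcpdump startup or to something on the device side.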