Jakub Sitnicki wrote:
Hi Willem,
On Fri, Oct 11, 2024 at 02:17 PM +02, Jakub Sitnicki wrote:
If:
- the user requested USO, but
- there is not enough payload for GSO to kick in, and
- the egress device doesn't offer checksum offload, then
we want to compute the L4 checksum in software early on.
In the case when we are not taking the GSO path, but it has been requested, the software checksum fallback in skb_segment doesn't get a chance to compute the full checksum, if the egress device can't do it. As a result we end up sending UDP datagrams with only a partial checksum filled in, which the peer will discard.
Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") Reported-by: Ivan Babrou ivan@cloudflare.com Signed-off-by: Jakub Sitnicki jakub@cloudflare.com Acked-by: Willem de Bruijn willemdebruijn.kernel@gmail.com Cc: stable@vger.kernel.org
I'm finally circling back to add a regression test for the above fix.
Instead of extending the selftest/net/udpgso.sh test case, I want to propose a different approach. I would like to check if the UDP packets packets are handed over to the netdevice with the expected checksum (complete or partial, depending on the device features), instead of testing for side-effects (packet dropped due to bad checksum).
For that we could use packetdrill. We would need to extend it a bit to allow specifying a UDP checksum in the script, but otherwise it would make writing such tests rather easy. For instance, the regression test for this fix could be as simple as:
---8<--- // Check if sent datagrams with length below GSO size get checksummed correctly
--ip_version=ipv4 --local_ip=192.168.0.1
` ethtool -K tun0 tx-checksumming off >/dev/null `
0 socket(..., SOCK_DGRAM, IPPROTO_UDP) = 3 +0 bind(3, ..., ...) = 0 +0 connect(3, ..., ...) = 0
+0 write(3, ..., 1000) = 1000 +0 > udp sum 0x3643 (1000) // expect complete checksum
+0 setsockopt(3, IPPROTO_UDP, UDP_SEGMENT, [1280], 4) = 0 +0 write(3, ..., 1000) = 1000 +0 > udp sum 0x3643 (1000) // expect complete checksum --->8---
(I'd actually like to have a bit mode of syntax sugar there, so we can simply specify "sum complete" and have packetdrill figure out the expected checksum value. Then IP address pinning wouldn't be needed.)
If we ever regress, the failure will be straightforward to understand. Here's what I got when running the above test with the fix reverted:
~ # packetdrill dgram_below_gso_size.pkt dgram_below_gso_size.pkt:19: error handling packet: live packet field l4_csum: expected: 13891 (0x3643) vs actual: 34476 (0x86ac) script packet: 0.000168 udp sum 0x3643 (1000) actual packet: 0.000166 udp sum 0x86ac (1000) ~ #
My patched packetdrill PoC is at:
https://github.com/jsitnicki/packetdrill/commits/udp-segment/rfc1/
If we want to go with the packetdrill-based test, that raises the question where do keep it? In the packetdrill repo? Or with the rest of the selftests/net?
Using the packetdrill repo would make it easier to synchronize the development of packetdrill features with the tests that use them. But we would also have to hook it up to netdev CI.
WDYT?
+1. Packetdrill is great environment for such tests. Packetdrill tests are also concise and easy to read.
I recently imported an initial batch of packetdrill tests to ksft and with that the netdev CI. We have a patch series with the remaining .pkt files on github ready for when the merge window opens.
Any changes to packetdrill itself need to go to github, as that does not ship with the kernel.
I encourage .pkt files to go there too. But I suspect that that won't be enforceable, and we do want the review on netdev@ first.
One issue with testing optional features may be that packetdrill runs by default on a tun device (though it can also run across two NICs as a two machine test).
Tun supports NETIF_F_GSO_UDP_L4, so that should be no concern in this case.
One small request: to avoid confusion with CHECKSUM_COMPLETE, please use something else to mean fully computed checksums on the egress path (which, somewhat non-obviously, would be CHECKSUM_NONE). Perhaps SUM_PSEUDO and SUM_FULL?