On May 28, 2019, at 8:08 AM, Willem de Bruijn willemdebruijn.kernel@gmail.com wrote:
I will push up latest patches soon.
I did some testing and discovered that only TCP audit tests failed. They failed much less often when enabling poll. Once in about 20 runs still failed. Therefore I commented out the TCP audit tests.
As for the other tests, this is what I got with poll() disabled…
udp gso zerocopy timestamp audit udp rx: 1611 MB/s 1148129 calls/s udp tx: 1659 MB/s 28146 calls/s 28146 msg/s udp rx: 1686 MB/s 1201494 calls/s udp tx: 1685 MB/s 28579 calls/s 28579 msg/s udp rx: 1685 MB/s 1200402 calls/s udp tx: 1683 MB/s 28552 calls/s 28552 msg/s Summary over 3.000 seconds... sum udp tx: 1716 MB/s 85277 calls (28425/s) 85277 msgs (28425/s) Tx Timestamps: 85277 received 0 errors Zerocopy acks: 85277 received 0 errors
Here you see that with poll() enabled, it is a bit slower, so I don’t have it enabled in udpgso_bench.sh …
udp gso zerocopy timestamp audit udp rx: 1591 MB/s 1133945 calls/s udp tx: 1613 MB/s 27358 calls/s 27358 msg/s udp rx: 1644 MB/s 1171674 calls/s udp tx: 1643 MB/s 27869 calls/s 27869 msg/s udp rx: 1643 MB/s 1170666 calls/s udp tx: 1641 MB/s 27845 calls/s 27845 msg/s Summary over 3.000 seconds... sum udp tx: 1671 MB/s 83072 calls (27690/s) 83072 msgs (27690/s) Tx Timestamps: 83072 received 0 errors Zerocopy acks: 83072 received 0 errors
You may be interested that I reduced test lengths from 4 to 3 seconds, but I am still getting 3 reports per test. I picked up the extra report by changing 'if (tnow > treport)’ to 'if (tnow >= treport)’
The only issue specific to GSO is that xmit_more can forego this doorbell until the last segment. We want to complicate this logic with a special case based on tx_flags. A process that cares should either not use GSO, or the timestamp should be associated with the last segment as I've been arguing so far.
This is the area I was thinking of looking into. I’m not sure it will work or that it will be too messy. It may be worth a little bit of digging to see if there is anything there. That will be down the road a bu
I’ll get back to you when I have tested this more thoroughly. Early results suggest that adding the -P poll() option has fixed it without any appreciable performance hit. I’ll share raw results with you, and we can make a final decision together.
In the main loop? It still is peculiar that notifications appear to go missing unless the process blocks waiting for them. Nothing in sock_zerocopy_callback or the queueing onto the error queue should cause drops, as far as I know.
Now that I know the issue is only in TCP, I can speculate that all bytes are being reported, but done with fewer messages. It may warrant some investigation in case there is some kind of bug.
Indeed. Ideally even run all tests, but return error if any failed, like this recent patch
selftests/bpf: fail test_tunnel.sh if subtests fail https://patchwork.ozlabs.org/patch/1105221/
but that may be a lot of code churn and better left to a separate patch.
I like it. I have it coded up, and it seems to work well. I’ll make a separate commit in the patch set so we can yank it out if you feel it is too much