Paolo Abeni wrote:
Hi,
On Tue, 2024-03-26 at 16:02 +0100, Richard Gobert wrote:
This patch is meaningful by itself - removing checks against non-relevant packets and making the flush/flush_id checks in a single place.
I'm personally not sure this patch is a win. The code churn is significant. I understand this is for performance's sake, but I don't see the benefit???
Could you clarify what do you mean by code churn?
This patch removes all use of p->flush and flush_id from the CB. The entire logic for L3 flush_id is scattered in tcp_gro_receive and {inet,ipv6}_gro_receive with conditionals rewriting ->flush, ->flush_id and ->is_atomic. Moving it to one place (gro_network_flush) should be more readable. (Personally, it took me a lot of time to understand the current logic of flush + flush_id + is_atomic)
The changelog shows that perf reports slightly lower figures for inet_gro_receive(). That is expected, as this patch move code out of such functio. What about inet_gro_flush()/tcp_gro_receive() where such code is moved?
Please consider the following 2 common scenarios:
1) Multiple packets in the GRO bucket - the common case with multiple packets in the bucket (i.e. running super_netperf TCP_STREAM) - each layer executes a for loop - going over each packet in the bucket. Specifically, L3 gro_receive loops over the bucket making flush,flush_id,is_atomic checks. For most packets in the bucket, these checks are not relevant. (possibly also dirtying cache lines with non-relevant p packets). Removing code in the for loop for this case is significant.
2) UDP/TCP streams which do not coalesce in GRO. This is the common case for regular UDP connections (i.e. running netperf UDP_STREAM). In this case, GRO is just overhead. Removing any code from these layers is good (shown in the first measurement of the commit message).
In the case of a single TCP connection - the amount of checks should be the same overall not causing any noticeable difference.
Additionally the reported deltas is within noise level according to my personal experience with similar tests.
I've tested the difference between net-next and this patch repetitively, which showed stable results each time. Is there any specific test you think would be helpful to show the result?
Thanks