On Tue, 2024-03-26 at 18:25 +0100, Richard Gobert wrote:
Paolo Abeni wrote:
Hi,
On Tue, 2024-03-26 at 16:02 +0100, Richard Gobert wrote:
This patch is meaningful by itself - it removes checks against non-relevant packets and performs the flush/flush_id checks in a single place.
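To illustrate "a single place": the consolidated check ends up with roughly this shape (a simplified, self-contained sketch with made-up field names; not the actual patch code):

    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-in for the relevant NAPI_GRO_CB() fields; the names here
     * are simplified for illustration, not the real kernel structures.
     */
    struct gro_cb {
            bool is_atomic;
            int  flush;
            int  flush_id;
    };

    /* The consolidated check: called once at flush time against the
     * single held packet that matched, instead of being repeated
     * inside every L3 gro_receive() bucket loop.
     */
    static void gro_network_flush(struct gro_cb *cb, int id_delta)
    {
            cb->flush_id = id_delta;
            cb->flush   |= !cb->is_atomic;
    }

    int main(void)
    {
            struct gro_cb cb = { .is_atomic = true };

            gro_network_flush(&cb, 1);
            printf("flush=%d flush_id=%d\n", cb.flush, cb.flush_id);
            return 0;
    }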
I'm personally not sure this patch is a win. The code churn is significant. I understand this is for performance's sake, but I don't see the benefit.
Could you clarify what you mean by code churn?
The diffstat of this patch is not negligible, and it touches very sensitive areas.
The changelog shows that perf reports slightly lower figures for inet_gro_receive(). That is expected, as this patch moves code out of that function. What about inet_gro_flush()/tcp_gro_receive(), where such code is moved to?
Please consider the following 2 common scenarios:
- Multiple packets in the GRO bucket - the common case when flows do coalesce (e.g. running super_netperf TCP_STREAM). Each layer executes a for loop going over every packet in the bucket; specifically, the L3 gro_receive loops over the bucket making the flush, flush_id and is_atomic checks (see the sketch after this list).
Only for packets with the same rx hash.
For most packets in the bucket, these checks are not relevant (and possibly also dirty cache lines of non-relevant 'p' packets). Removing this code from the for loop is significant for that case.
- UDP/TCP streams which do not coalesce in GRO. This is the common case for regular UDP connections (e.g. running netperf UDP_STREAM). In this case GRO is just overhead, and removing any code from these layers is good (as shown in the first measurement of the commit message).
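To make the first scenario concrete, here is a minimal userspace model of the bucket walk (made-up names and types; not the actual kernel loop):

    #include <stdbool.h>
    #include <stdio.h>

    struct held_pkt {
            bool same_flow; /* set by lower layers from the rx hash */
            bool is_atomic;
            int  flush;
            int  flush_id;
    };

    /* Model of the pre-patch L3 gro_receive(): every held packet in
     * the bucket is read (touching its cache line), and the matching
     * one gets the flush/flush_id/is_atomic checks inline in the loop.
     * With the patch those checks move out of the loop and run once at
     * flush time.
     */
    static void l3_gro_receive_model(struct held_pkt *bucket, int n,
                                     int id_delta)
    {
            for (int i = 0; i < n; i++) {
                    struct held_pkt *p = &bucket[i];

                    if (!p->same_flow)
                            continue; /* still costs a read of 'p' */

                    p->flush_id = id_delta;
                    p->flush   |= !p->is_atomic;
            }
    }

    int main(void)
    {
            /* 8 held packets, only one matching the incoming flow. */
            struct held_pkt bucket[8] = { [3] = { .same_flow = true } };

            l3_gro_receive_model(bucket, 8, 1);
            printf("flush_id of match: %d\n", bucket[3].flush_id);
            return 0;
    }

This compiles and runs with plain gcc; the point is only the shape of the loop, not the numbers.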
If UDP GRO is not enabled, there are no UDP packets staged in the UDP GRO engine; the bucket list is empty.
Additionally, the reported deltas are within the noise level, according to my personal experience with similar tests.
I've tested the difference between net-next and this patch repeatedly, and the results were stable each time. Is there any specific test you think would be helpful to show the result?
Anything that shows a measurable gain.
Reporting the CPU utilization in the inet_gro_receive() function alone is not enough, as part of the load has been moved into gro_network_flush()/tcp_gro_receive().
Cheers,
Paolo