Richard Gobert wrote:
{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags, iph->id, ...) against all packets in a loop. These flush checks are used in all merging UDP and TCP flows.
These checks need to be done only once and only against the found p skb, since they only affect flush and not same_flow.
This patch leverages correct network header offsets from the cb for both outer and inner network headers - allowing these checks to be done only once, in tcp_gro_receive and udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are more declarative and contained in inet_gro_flush, thus removing the need for flush_id in napi_gro_cb.
This results in less parsing code for non-loop flush tests for TCP and UDP flows.
To make sure results are not within noise range - I've made netfilter drop all TCP packets, and measured CPU performance in GRO (in this case GRO is responsible for about 50% of the CPU utilization).
perf top while replaying 64 parallel IP/TCP streams merging in GRO: (gro_network_flush is compiled inline to tcp_gro_receive) net-next: 6.94% [kernel] [k] inet_gro_receive 3.02% [kernel] [k] tcp_gro_receive
patch applied: 4.27% [kernel] [k] tcp_gro_receive 4.22% [kernel] [k] inet_gro_receive
perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same results for any encapsulation, in this case inet_gro_receive is top offender in net-next) net-next: 10.09% [kernel] [k] inet_gro_receive 2.08% [kernel] [k] tcp_gro_receive
patch applied: 6.97% [kernel] [k] inet_gro_receive 3.68% [kernel] [k] tcp_gro_receive
Thanks for getting the additional numbers. The savings are not huge.
But +1 on the change also because it simplifies this non-obvious logic. It makes sense to separate flow matching and flush logic.
Btw please include Alexander Duyck in the Cc: of this series.
+static inline int inet_gro_flush(const struct iphdr *iph, const struct iphdr *iph2,
struct sk_buff *p, bool outer)
+{
- const u32 id = ntohl(*(__be32 *)&iph->id);
- const u32 id2 = ntohl(*(__be32 *)&iph2->id);
- const u16 flush_id = (id >> 16) - (id2 >> 16);
- const u16 count = NAPI_GRO_CB(p)->count;
- const u32 df = id & IP_DF;
- u32 is_atomic;
- int flush;
- /* All fields must match except length and checksum. */
- flush = (iph->ttl ^ iph2->ttl) | (iph->tos ^ iph2->tos) | (df ^ (id2 & IP_DF));
- if (outer && df)
return flush;
Does the fixed id logic apply equally to inner and outer IPv4?
- /* When we receive our second frame we can make a decision on if we
* continue this flow as an atomic flow with a fixed ID or if we use
* an incrementing ID.
*/
- NAPI_GRO_CB(p)->is_atomic |= (count == 1 && df && flush_id == 0);
- is_atomic = (df && NAPI_GRO_CB(p)->is_atomic) - 1;
- return flush | (flush_id ^ (count & is_atomic));
This is a good time to consider making this logical more obvious.
First off, the flush check can be part of the outer && df above, as flush is not modified after.
Subjective, but I find the following more readable, and not worth saving a few branches.
if (count == 1 && df && !flush_id) NAPI_GRO_CB(p)->is_atomic = true;
ip_fixedid_matches = NAPI_GRO_CB(p)->is_atomic ^ df; ipid_offset_matches = ipid_offset - count;
return ip_fixedid_matches & ipid_offset_matches;
Have to be a bit careful about types. Have not checked that in detail.
And while nitpicking: ipid_offset may be a more descriptive variable name than flush_id, and ip_fixedid than is_atomic. If changing those does not result in a lot of code churn.
+}
+static inline int ipv6_gro_flush(const struct ipv6hdr *iph, const struct ipv6hdr *iph2) +{
- /* Version:4<Traffic_Class:8><Flow_Label:20> */
- __be32 first_word = *(__be32 *)iph ^ *(__be32 *)iph2;
- /* Flush if Traffic Class fields are different. */
- return !!((first_word & htonl(0x0FF00000)) |
(__force __be32)(iph->hop_limit ^ iph2->hop_limit));
+}
+static inline int gro_network_flush(const void *th, const void *th2, struct sk_buff *p, int off) +{
- const bool encap_mark = NAPI_GRO_CB(p)->encap_mark;
Is this correct when udp_gro_complete clears this for tunnels?
- int flush = 0;
- int i;
- for (i = 0; i <= encap_mark; i++) {
const u16 diff = off - NAPI_GRO_CB(p)->network_offsets[i];
const void *nh = th - diff;
const void *nh2 = th2 - diff;
if (((struct iphdr *)nh)->version == 6)
flush |= ipv6_gro_flush(nh, nh2);
else
flush |= inet_gro_flush(nh, nh2, p, i != encap_mark);
- }
Maybe slightly better for branch prediction, and more obvious, if creating a helper function __gro_network_flush and calling
__gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[0]) if (NAPI_GRO_CB(p)->encap_mark) __gro_network_flush(th, th2, p, off - NAPI_GRO_CB(p)->network_offsets[1])
- return flush;
+}
int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb);