Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") introduces a ~10% performance drop when using virtio-net drivers.
This commit has been backported to v4.19 in commit 669c0b5782fb and this performance drop is also visible there. Here at Tessares, we can also notice this drop with the MPTCP fork [1] on top of the v4.19 kernel.
Eric Dumazet already fixed this issue a few months ago, see commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head").
Unfortunately, this patch has not been backported to < v5.4 because it caused issues [2]. Indeed, after having backported it, the kernel failed to compile because one commit was missing, see commit 503d539a6e41 ("virtio_net: Add XDP meta data support"). However, this missing commit has been added in 4.19.186 but probably because there were still some opened discussions [3] around commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head"), the latter has not been backported at all in v4.19.
A cherry-pick of this patch without any modification is proposed here. It has been validated: it fixes the original issue on v4.19 as well.
Please note that there is also a fix for the fix, see commit 38ec4944b593 ("gro: ensure frag0 meets IP header alignment").
This second fix has also not been backported because it caused issues as well [4]. Here, it was due to a conflict but also a compilation error when the conflict has been resolved. Please refer to patch 2/2 for more details.
One last note: these two patches have also been backported and validated on a v4.14 release. A second series is going to be sent. It looks like it could be interesting to backport these two patches to v4.9 and v4.4 as well but unfortunately, the backport of these two patches fails with conflicts and I don't have any setup to validate the performance drop and fix with v4.9 and v4.4 kernels.
[1] https://github.com/multipath-tcp/mptcp [2] https://lore.kernel.org/stable/161806389686151@kroah.com/ [3] https://lore.kernel.org/stable/20210412051204-mutt-send-email-mst@kernel.org... [4] https://lore.kernel.org/stable/1618749018155126@kroah.com/
Eric Dumazet (2): virtio_net: Do not pull payload in skb->head gro: ensure frag0 meets IP header alignment
drivers/net/virtio_net.c | 10 +++++++--- include/linux/skbuff.h | 9 +++++++++ include/linux/virtio_net.h | 14 +++++++++----- net/core/dev.c | 3 ++- 4 files changed, 27 insertions(+), 9 deletions(-)
From: Eric Dumazet edumazet@google.com
[ Upstream commit 0f6925b3e8da0dbbb52447ca8a8b42b371aac7db ]
Xuan Zhuo reported that commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") brought a ~10% performance drop.
The reason for the performance drop was that GRO was forced to chain sk_buff (using skb_shinfo(skb)->frag_list), which uses more memory but also cause packet consumers to go over a lot of overhead handling all the tiny skbs.
It turns out that virtio_net page_to_skb() has a wrong strategy : It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then copies 128 bytes from the page, before feeding the packet to GRO stack.
This was suboptimal before commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") because GRO was using 2 frags per MSS, meaning we were not packing MSS with 100% efficiency.
Fix is to pull only the ethernet header in page_to_skb()
Then, we change virtio_net_hdr_to_skb() to pull the missing headers, instead of assuming they were already pulled by callers.
This fixes the performance regression, but could also allow virtio_net to accept packets with more than 128bytes of headers.
Many thanks to Xuan Zhuo for his report, and his tests/help.
Fixes: 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") Reported-by: Xuan Zhuo xuanzhuo@linux.alibaba.com Link: https://www.spinics.net/lists/netdev/msg731397.html Co-Developed-by: Xuan Zhuo xuanzhuo@linux.alibaba.com Signed-off-by: Xuan Zhuo xuanzhuo@linux.alibaba.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: "Michael S. Tsirkin" mst@redhat.com Cc: Jason Wang jasowang@redhat.com Cc: virtualization@lists.linux-foundation.org Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- drivers/net/virtio_net.c | 10 +++++++--- include/linux/virtio_net.h | 14 +++++++++----- 2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 5e8b40630286..1a8fe5bacb19 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -413,9 +413,13 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, offset += hdr_padded_len; p += hdr_padded_len;
- copy = len; - if (copy > skb_tailroom(skb)) - copy = skb_tailroom(skb); + /* Copy all frame if it fits skb->head, otherwise + * we let virtio_net_hdr_to_skb() and GRO pull headers as needed. + */ + if (len <= skb_tailroom(skb)) + copy = len; + else + copy = ETH_HLEN + metasize; skb_put_data(skb, p, copy);
if (metasize) { diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index a1829139ff4a..8f48264f5dab 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -65,14 +65,18 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, skb_reset_mac_header(skb);
if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { - u16 start = __virtio16_to_cpu(little_endian, hdr->csum_start); - u16 off = __virtio16_to_cpu(little_endian, hdr->csum_offset); + u32 start = __virtio16_to_cpu(little_endian, hdr->csum_start); + u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset); + u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16)); + + if (!pskb_may_pull(skb, needed)) + return -EINVAL;
if (!skb_partial_csum_set(skb, start, off)) return -EINVAL;
p_off = skb_transport_offset(skb) + thlen; - if (p_off > skb_headlen(skb)) + if (!pskb_may_pull(skb, p_off)) return -EINVAL; } else { /* gso packets without NEEDS_CSUM do not set transport_offset. @@ -102,14 +106,14 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, }
p_off = keys.control.thoff + thlen; - if (p_off > skb_headlen(skb) || + if (!pskb_may_pull(skb, p_off) || keys.basic.ip_proto != ip_proto) return -EINVAL;
skb_set_transport_header(skb, keys.control.thoff); } else if (gso_type) { p_off = thlen; - if (p_off > skb_headlen(skb)) + if (!pskb_may_pull(skb, p_off)) return -EINVAL; } }
From: Eric Dumazet edumazet@google.com
[ Upstream commit 38ec4944b593fd90c5ef42aaaa53e66ae5769d04 ]
After commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head") Guenter Roeck reported one failure in his tests using sh architecture.
After much debugging, we have been able to spot silent unaligned accesses in inet_gro_receive()
The issue at hand is that upper networking stacks assume their header is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN bytes before the Ethernet header to make that happen.
This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path if the fragment is not properly aligned.
Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN as 0, this extra check will be a NOP for them.
Note that if frag0 is not used, GRO will call pskb_may_pull() as many times as needed to pull network and transport headers.
[ Backport note ]
A small conflict has been reported:
++<<<<<<< HEAD + if (skb_mac_header(skb) == skb_tail_pointer(skb) && + pinfo->nr_frags && + !PageHighMem(skb_frag_page(frag0))) { ++======= + if (!skb_headlen(skb) && pinfo->nr_frags && + !PageHighMem(skb_frag_page(frag0)) && + (!NET_IP_ALIGN || !(skb_frag_off(frag0) & 3))) { ++>>>>>>> 38ec4944b593 (gro: ensure frag0 meets IP header alignment)
This is expected because older kernels are missing commit 8aef998df3979 ("net: core: allow fast GRO for skbs with Ethernet header in head"). This commit modifies the beginning of the 'if' statement.
The resolution was easy: the patch we want to backport here is adding new conditions to the 'if' statement:
&& (!NET_IP_ALIGN || !(skb_frag_off(frag0) & 3))
We simply append these new conditions to it on older kernels.
Another issue had to be resolved: skb_frag_off() is used in this patch we want to backport but this function is not defined in kernels < 5.4. It has then been extracted and imported from commit 7240b60c98d63 ("linux: Add skb_frag_t page_offset accessors").
Fixes: 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head") Fixes: 78a478d0efd9 ("gro: Inline skb_gro_header and cache frag0 virtual address") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: Guenter Roeck linux@roeck-us.net Cc: Xuan Zhuo xuanzhuo@linux.alibaba.com Cc: "Michael S. Tsirkin" mst@redhat.com Cc: Jason Wang jasowang@redhat.com Acked-by: Michael S. Tsirkin mst@redhat.com Tested-by: Guenter Roeck linux@roeck-us.net Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net --- include/linux/skbuff.h | 9 +++++++++ net/core/dev.c | 3 ++- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 06176ef2a842..5f2e6451ece5 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2788,6 +2788,15 @@ static inline void skb_propagate_pfmemalloc(struct page *page, skb->pfmemalloc = true; }
+/** + * skb_frag_off() - Returns the offset of a skb fragment + * @frag: the paged fragment + */ +static inline unsigned int skb_frag_off(const skb_frag_t *frag) +{ + return frag->page_offset; +} + /** * skb_frag_page - retrieve the page referred to by a paged fragment * @frag: the paged fragment diff --git a/net/core/dev.c b/net/core/dev.c index 722ae0b57f3f..a6798117bb1a 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5400,7 +5400,8 @@ static void skb_gro_reset_offset(struct sk_buff *skb)
if (skb_mac_header(skb) == skb_tail_pointer(skb) && pinfo->nr_frags && - !PageHighMem(skb_frag_page(frag0))) { + !PageHighMem(skb_frag_page(frag0)) && + (!NET_IP_ALIGN || !(skb_frag_off(frag0) & 3))) { NAPI_GRO_CB(skb)->frag0 = skb_frag_address(frag0); NAPI_GRO_CB(skb)->frag0_len = min_t(unsigned int, skb_frag_size(frag0),
On Thu, Jul 29, 2021 at 08:44:25PM +0200, Matthieu Baerts wrote:
Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") introduces a ~10% performance drop when using virtio-net drivers.
This commit has been backported to v4.19 in commit 669c0b5782fb and this performance drop is also visible there. Here at Tessares, we can also notice this drop with the MPTCP fork [1] on top of the v4.19 kernel.
Eric Dumazet already fixed this issue a few months ago, see commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head").
Unfortunately, this patch has not been backported to < v5.4 because it caused issues [2]. Indeed, after having backported it, the kernel failed to compile because one commit was missing, see commit 503d539a6e41 ("virtio_net: Add XDP meta data support"). However, this missing commit has been added in 4.19.186 but probably because there were still some opened discussions [3] around commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head"), the latter has not been backported at all in v4.19.
A cherry-pick of this patch without any modification is proposed here. It has been validated: it fixes the original issue on v4.19 as well.
Please note that there is also a fix for the fix, see commit 38ec4944b593 ("gro: ensure frag0 meets IP header alignment").
This second fix has also not been backported because it caused issues as well [4]. Here, it was due to a conflict but also a compilation error when the conflict has been resolved. Please refer to patch 2/2 for more details.
One last note: these two patches have also been backported and validated on a v4.14 release. A second series is going to be sent. It looks like it could be interesting to backport these two patches to v4.9 and v4.4 as well but unfortunately, the backport of these two patches fails with conflicts and I don't have any setup to validate the performance drop and fix with v4.9 and v4.4 kernels.
Both sets of series now queued up, thanks!
greg k-h
linux-stable-mirror@lists.linaro.org