On Sun, 29 Sep 2024 16:10:47 +0900 Akihiko Odaki akihiko.odaki@daynix.com wrote:
On 2024/09/29 11:07, Jason Wang wrote:
On Fri, Sep 27, 2024 at 3:51 PM Akihiko Odaki akihiko.odaki@daynix.com wrote:
On 2024/09/27 13:31, Jason Wang wrote:
On Fri, Sep 27, 2024 at 10:11 AM Akihiko Odaki akihiko.odaki@daynix.com wrote:
On 2024/09/25 12:30, Jason Wang wrote:
On Tue, Sep 24, 2024 at 5:01 PM Akihiko Odaki akihiko.odaki@daynix.com wrote:
> virtio-net has two uses for hashes: one is RSS and the other is hash
> reporting. Conventionally the hash calculation was done by the VMM.
> However, computing the hash after the queue was chosen defeats the
> purpose of RSS.
>
> Another approach is to use an eBPF steering program. This approach has
> another downside: it cannot report the calculated hash due to the
> restrictive nature of eBPF.
>
> Introduce code to compute hashes in the kernel in order to overcome
> these challenges.
>
> An alternative solution is to extend the eBPF steering program so that it
> will be able to report to the userspace, but it is based on context
> rewrites, which are in feature freeze. We can adopt kfuncs, but they will
> not be UAPIs. We opt for ioctl to align with other relevant UAPIs (KVM
> and vhost_net).
I wonder if we could clone the skb and reuse some of its fields to store the hash, so that the steering eBPF program can access these fields without introducing full RSS in the kernel?
I don't get how cloning the skb can solve the issue.
We can certainly implement the Toeplitz function in the kernel, or even with tc-bpf, to store a hash value that can be used by the eBPF steering program and by virtio hash reporting. However, we don't have a means of storing a hash type, which is specific to virtio hash reporting and lacks a corresponding skb field.
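For readers following along, here is a minimal userspace sketch of the Toeplitz function mentioned above. It is not the code from the patch: the name toeplitz_hash() and its signature are assumptions made for illustration, and the caller is assumed to have already laid out the input bytes (for example addresses followed by ports) in network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative Toeplitz hash (hypothetical helper, not the kernel code).
 * For every set bit of the input, XOR in the 32-bit window of the key
 * that starts at the same bit offset. The key must be at least
 * data_len + 4 bytes long; 0 is returned if it is too short.
 */
uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                       const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window;
    size_t i;
    int bit;

    if (key_len < 4 || data_len > key_len - 4)
        return 0;

    /* Initial window: the first 32 bits of the key. */
    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (i = 0; i < data_len; i++) {
        for (bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one bit further along the key. */
            window <<= 1;
            if (key[i + 4] & (1u << bit))
                window |= 1;
        }
    }
    return hash;
}
```

Note that the function only returns a 32-bit value. The hash type (which header fields were hashed) would have to be carried separately, which is the gap described above.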
I may be missing something, but looking at sk_filter_is_valid_access(), it looks to me like we can make use of skb->cb[0..4]?
I opted not to use cb. Below is the rationale:
cb is for tail calls, so using it means reusing the field for a different purpose. The context rewrite allows adding a field without increasing the size of the underlying storage (the real sk_buff), so we should add a new field instead of reusing an existing field to avoid confusion.
We are, however, no longer allowed to add a new field. In my understanding, this is because it is a UAPI, and the eBPF maintainers found it difficult to maintain its stability.
Reusing cb for hash reporting is a workaround to avoid having a new field, but it does not solve the underlying problem (i.e., keeping eBPF as stable as UAPI is unreasonably hard). In my opinion, adding an ioctl is a reasonable option to keep the API as stable as other virtualization UAPIs while respecting the underlying intention of the context rewrite feature freeze.
Fair enough.
Btw, I remember DPDK implements tuntap RSS via eBPF as well (probably via cls or other). It might be worth checking whether we are missing anything here.
Thanks for the information. I wonder why they used cls instead of a steering program. Perhaps it is for compatibility with macvtap and ipvtap, which don't support steering programs.
Their RSS implementation looks cleaner so I will improve my RSS implementation accordingly.
DPDK needs to support flow rules. The specific case is where packets are classified by a flow, then RSS is done across a subset of the queues. Flow support in the TUN driver is more academic than useful; I fixed it for current BPF, but I doubt anyone is really using it.
A full steering program would be good, but it would require much more complexity to take a general set of flow rules and then communicate that to the steering program.
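To make the flow-then-RSS case concrete, here is a rough sketch of the lookup a steering path would need. Everything in it is hypothetical: struct flow_rule, select_queue() and the match on destination port alone are simplifications for illustration, not DPDK's or the kernel's actual data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RULE_QUEUES 8

/* Hypothetical flow rule: match one destination port, then spread
 * matching packets over a subset of the queues. */
struct flow_rule {
    uint16_t dst_port;                /* match criterion (simplified) */
    uint16_t queues[MAX_RULE_QUEUES]; /* queue subset for this flow */
    size_t nb_queues;                 /* number of valid entries */
};

/* Return the queue for a packet: the first matching rule picks the
 * subset and the RSS hash picks a queue inside it. Packets that match
 * no rule (or an empty rule) go to default_queue. */
uint16_t select_queue(const struct flow_rule *rules, size_t nb_rules,
                      uint16_t dst_port, uint32_t hash,
                      uint16_t default_queue)
{
    size_t i;

    for (i = 0; i < nb_rules; i++) {
        const struct flow_rule *r = &rules[i];

        if (r->dst_port != dst_port)
            continue;
        if (r->nb_queues == 0 || r->nb_queues > MAX_RULE_QUEUES)
            return default_queue;
        return r->queues[hash % r->nb_queues];
    }
    return default_queue;
}
```

Even in this reduced form, the rule table has to be handed to whatever runs the lookup, which is the extra complexity a general steering program would have to absorb.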