Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

Willem de Bruijn

9 Oct 2023

8:13 a.m.

On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki akihiko.odaki@daynix.com wrote:

> virtio-net has two uses for hashes: one is RSS and the other is hash reporting. Conventionally the hash calculation was done by the VMM. However, computing the hash after the queue was chosen defeats the purpose of RSS.
>
> Another approach is to use an eBPF steering program. This approach has another downside: it cannot report the calculated hash due to the restrictive nature of eBPF.
>
> Introduce code to compute hashes in the kernel in order to overcome these challenges. An alternative solution is to extend the eBPF steering program so that it can report the hash to userspace, but it makes little sense to allow implementing different hashing algorithms with eBPF, since the hash value reported by virtio-net is strictly defined by the specification.
>
> The hash value already stored in sk_buff is not used; the hash is computed independently, since the stored value may have been computed in a way that does not conform to the specification.
>
> Signed-off-by: Akihiko Odaki akihiko.odaki@daynix.com

...

```
@@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct *tun,
         }

         if (vnet_hdr_sz) {
-                struct virtio_net_hdr gso;
+                union {
+                        struct virtio_net_hdr hdr;
+                        struct virtio_net_hdr_v1_hash v1_hash_hdr;
+                } hdr;
+                int ret;

                 if (iov_iter_count(iter) < vnet_hdr_sz)
                         return -EINVAL;

-                if (virtio_net_hdr_from_skb(skb, &gso,
-                                            tun_is_little_endian(tun), true,
-                                            vlan_hlen)) {
+                if ((READ_ONCE(tun->vnet_hash.flags) & TUN_VNET_HASH_REPORT) &&
+                    vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
+                    skb->tun_vnet_hash) {
```

Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?

Such checks should be limited to control path where possible

...

```
                        vnet_hdr_content_sz = sizeof(hdr.v1_hash_hdr);
                        ret = virtio_net_hdr_v1_hash_from_skb(skb,
                                                              &hdr.v1_hash_hdr,
                                                              true,
                                                              vlan_hlen,
                                                              &vnet_hash);
                } else {
                        vnet_hdr_content_sz = sizeof(hdr.hdr);
                        ret = virtio_net_hdr_from_skb(skb, &hdr.hdr,
                                                      tun_is_little_endian(tun),
                                                      true, vlan_hlen);
                }
```
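
To illustrate the review comment above, here is a minimal user-space sketch of enforcing the size requirement once, in the control path. It assumes the set-hash ioctl can see the configured header size. The `cp_` names and the struct are hypothetical stand-ins, not code from the patch; the sizes 10 and 20 are those of `struct virtio_net_hdr` and `struct virtio_net_hdr_v1_hash`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CP_HDR_SZ          10   /* sizeof(struct virtio_net_hdr) */
#define CP_V1_HASH_HDR_SZ  20   /* sizeof(struct virtio_net_hdr_v1_hash) */
#define CP_HASH_REPORT     0x1u /* hypothetical stand-in for TUN_VNET_HASH_REPORT */

/* Hypothetical, reduced view of the per-device state. */
struct cp_tun {
    int vnet_hdr_sz;
    unsigned int hash_flags;
};

/*
 * Control path: refuse to enable hash reporting unless the configured
 * header is large enough to carry the hash fields.
 */
static int cp_set_hash(struct cp_tun *tun, unsigned int flags)
{
    assert(tun != NULL);

    if ((flags & CP_HASH_REPORT) && tun->vnet_hdr_sz < CP_V1_HASH_HDR_SZ)
        return -EINVAL;

    tun->hash_flags = flags;
    return 0;
}
```

With such a check, the datapath comparison is redundant only if no other control path can shrink the header afterwards.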

Akihiko Odaki

9 Oct

8:44 a.m.

New subject: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

On 2023/10/09 17:13, Willem de Bruijn wrote:

> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?
>
> Such checks should be limited to control path where possible

There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not read at once.

Willem de Bruijn

9:54 a.m.

On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki akihiko.odaki@daynix.com wrote:

>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?
>>
>> Such checks should be limited to control path where possible
>
> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not read at once.

It should not be possible to downgrade the hdr_sz once v1 is selected.

Akihiko Odaki

10:05 a.m.

On 2023/10/09 18:54, Willem de Bruijn wrote:

>>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?
>>>
>>> Such checks should be limited to control path where possible
>>
>> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not read at once.
>
> It should not be possible to downgrade the hdr_sz once v1 is selected.

I see nothing that prevents shrinking the header size.

tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen even for the case the header size grows though this can be fixed by reordering the two reads.
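
The reordering mentioned above can be sketched in user space with C11 atomics standing in for the kernel's `READ_ONCE()` and its release/acquire helpers. This is only an illustration under stated assumptions: the `ro_` names are hypothetical, and the writer is assumed to grow the header before it publishes the flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RO_V1_HASH_HDR_SZ  20   /* sizeof(struct virtio_net_hdr_v1_hash) */
#define RO_HASH_REPORT     0x1u /* hypothetical stand-in for TUN_VNET_HASH_REPORT */

/* Hypothetical, reduced view of the per-device state. */
struct ro_tun {
    atomic_int  vnet_hdr_sz;
    atomic_uint hash_flags;
};

/* Control path (assumed order): grow the header first, then publish the flag. */
static void ro_enable_hash_report(struct ro_tun *tun)
{
    assert(tun != NULL);
    atomic_store_explicit(&tun->vnet_hdr_sz, RO_V1_HASH_HDR_SZ,
                          memory_order_relaxed);
    atomic_store_explicit(&tun->hash_flags, RO_HASH_REPORT,
                          memory_order_release);
}

/* Datapath: read the flag first, then the size. */
static bool ro_use_hash_header(struct ro_tun *tun)
{
    unsigned int flags;
    int sz;

    assert(tun != NULL);
    flags = atomic_load_explicit(&tun->hash_flags, memory_order_acquire);
    sz = atomic_load_explicit(&tun->vnet_hdr_sz, memory_order_relaxed);

    /* The size check stays: nothing in this sketch prevents a later shrink. */
    return (flags & RO_HASH_REPORT) && sz >= RO_V1_HASH_HDR_SZ;
}
```

A reader that observes the flag also observes the grown size, which covers the growing case only. A concurrent shrink can still race, so the datapath comparison remains.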

Willem de Bruijn

10:07 a.m.

On Mon, Oct 9, 2023 at 3:05 AM Akihiko Odaki akihiko.odaki@daynix.com wrote:

>>>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?
>>>>
>>>> Such checks should be limited to control path where possible
>>>
>>> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not read at once.
>>
>> It should not be possible to downgrade the hdr_sz once v1 is selected.
>
> I see nothing that prevents shrinking the header size.
>
> tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen even for the case the header size grows though this can be fixed by reordering the two reads.

One option is to fail any control path that tries to re-negotiate header size once this hash option is enabled?

There is no practical reason to allow feature re-negotiation at any arbitrary time.
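
A minimal user-space sketch of that option follows, assuming the header size is changed through a single setter (in real tun this is the `TUNSETVNETHDRSZ` ioctl). The `rn_` names, the struct, and the choice of `-EBUSY` are assumptions made for illustration, not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define RN_HDR_SZ          10   /* sizeof(struct virtio_net_hdr) */
#define RN_V1_HASH_HDR_SZ  20   /* sizeof(struct virtio_net_hdr_v1_hash) */
#define RN_HASH_REPORT     0x1u /* hypothetical stand-in for TUN_VNET_HASH_REPORT */

/* Hypothetical, reduced view of the per-device state. */
struct rn_tun {
    int vnet_hdr_sz;
    unsigned int hash_flags;
};

/* Control path for a header-size change. */
static int rn_set_vnet_hdr_sz(struct rn_tun *tun, int sz)
{
    assert(tun != NULL);

    /* Existing rule: the header can never be smaller than the basic one. */
    if (sz < RN_HDR_SZ)
        return -EINVAL;

    /* Assumed new rule: no re-negotiation while hash reporting is enabled. */
    if (tun->hash_flags & RN_HASH_REPORT)
        return -EBUSY;

    tun->vnet_hdr_sz = sz;
    return 0;
}
```

Once hash reporting is on, the size stays fixed, so the datapath no longer needs to compare it against the hash header size.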

Akihiko Odaki

10:11 a.m.

On 2023/10/09 19:07, Willem de Bruijn wrote:

>>>>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?
>>>>>
>>>>> Such checks should be limited to control path where possible
>>>>
>>>> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not read at once.
>>>
>>> It should not be possible to downgrade the hdr_sz once v1 is selected.
>>
>> I see nothing that prevents shrinking the header size.
>>
>> tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen even for the case the header size grows though this can be fixed by reordering the two reads.
>
> One option is to fail any control path that tries to re-negotiate header size once this hash option is enabled?
>
> There is no practical reason to allow feature re-negotiation at any arbitrary time.

I think it's a bit awkward interface design since tun allows to reconfigure any of its parameters, but it's certainly possible.

Willem de Bruijn

10:32 a.m.

On Mon, Oct 9, 2023 at 3:11 AM Akihiko Odaki akihiko.odaki@daynix.com wrote:

>>>>>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?
>>>>>>
>>>>>> Such checks should be limited to control path where possible
>>>>>
>>>>> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not read at once.
>>>>
>>>> It should not be possible to downgrade the hdr_sz once v1 is selected.
>>>
>>> I see nothing that prevents shrinking the header size.
>>>
>>> tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen even for the case the header size grows though this can be fixed by reordering the two reads.
>>
>> One option is to fail any control path that tries to re-negotiate header size once this hash option is enabled?
>>
>> There is no practical reason to allow feature re-negotiation at any arbitrary time.
>
> I think it's a bit awkward interface design since tun allows to reconfigure any of its parameters, but it's certainly possible.

If this would be the only exception to that rule, and this is the only place that needs a datapath check, then it's fine to leave as is.

In general, this runtime configurability serves little purpose but to help syzbot exercise code paths no real application would attempt. But I won't ask to diverge from whatever tun already does. We just have to be more careful about the possible races it brings.

Michael S. Tsirkin

11:50 a.m.

On Mon, Oct 09, 2023 at 05:44:20PM +0900, Akihiko Odaki wrote:

>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?
>>
>> Such checks should be limited to control path where possible
>
> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not read at once.

And then it's a complete mess and you get inconsistent behaviour with packets getting sent all over the place, right? So maybe keep a pointer to this struct so it can be changed atomically then. Maybe even something with rcu I donnu.

-- MST

Akihiko Odaki

10 Oct

2:34 a.m.

On 2023/10/09 20:50, Michael S. Tsirkin wrote:

>>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of the set hash ioctl failing otherwise?
>>>
>>> Such checks should be limited to control path where possible
>>
>> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not read at once.
>
> And then it's a complete mess and you get inconsistent behaviour with packets getting sent all over the place, right? So maybe keep a pointer to this struct so it can be changed atomically then. Maybe even something with rcu I donnu.

I think it's a good idea to use RCU for the vnet_hash members, but vnet_hdr_sz is something not specific to vnet_hash so this check will be still necessary.
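
The split described here can be sketched in user space, with a C11 atomic pointer standing in for an RCU-protected pointer. The hash settings are swapped as one unit, while the header size stays a separate field and is still checked. All `snap_` names and the `types` field are hypothetical, and the sketch leaves out the deferred freeing that real RCU would provide.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define SNAP_V1_HASH_HDR_SZ  20   /* sizeof(struct virtio_net_hdr_v1_hash) */
#define SNAP_HASH_REPORT     0x1u /* hypothetical stand-in for TUN_VNET_HASH_REPORT */

/* Hash settings that must be seen together (fields are illustrative). */
struct snap_hash_cfg {
    unsigned int flags;
    unsigned int types;
};

struct snap_tun {
    _Atomic(struct snap_hash_cfg *) hash; /* swapped as a whole */
    atomic_int vnet_hdr_sz;               /* not part of the hash settings */
};

/* Control path: publish a fully built config and hand back the old one. */
static struct snap_hash_cfg *snap_swap(struct snap_tun *tun,
                                       struct snap_hash_cfg *next)
{
    assert(tun != NULL);
    return atomic_exchange_explicit(&tun->hash, next, memory_order_acq_rel);
}

/* Datapath: one pointer load gives a consistent view of the hash settings. */
static bool snap_report_hash(struct snap_tun *tun, unsigned int *types_out)
{
    const struct snap_hash_cfg *cfg;
    int sz;

    assert(tun != NULL && types_out != NULL);
    cfg = atomic_load_explicit(&tun->hash, memory_order_acquire);
    sz = atomic_load_explicit(&tun->vnet_hdr_sz, memory_order_relaxed);

    /* The header size is read separately, so this comparison is still needed. */
    if (!cfg || !(cfg->flags & SNAP_HASH_REPORT) || sz < SNAP_V1_HASH_HDR_SZ)
        return false;

    *types_out = cfg->types;
    return true;
}
```

In the kernel the old config would be released only after a grace period; here the caller of `snap_swap()` simply owns the returned pointer.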

linux-kselftest-mirror@lists.linaro.org

participants (3)

Akihiko Odaki
Michael S. Tsirkin
Willem de Bruijn