Hi, Greg, hi Sasha,
I noticed the fixes for CVE-2019-3900 are only backported to 4.14.133+, but not to 4.19, also 5.1, fixes have been included in 5.2.
So I backported to 4.19, only compiles fine, no functional tests.
Please review, and consider to include in next release.
Regards, Jack Wang 1 & 1 IONOS Cloud GmbH
Jason Wang (4): vhost: introduce vhost_exceeds_weight() vhost_net: fix possible infinite loop vhost: vsock: add weight support vhost: scsi: add weight support
drivers/vhost/net.c | 41 ++++++++++++++--------------------------- drivers/vhost/scsi.c | 15 +++++++++++---- drivers/vhost/vhost.c | 20 +++++++++++++++++++- drivers/vhost/vhost.h | 5 ++++- drivers/vhost/vsock.c | 28 +++++++++++++++++++++------- 5 files changed, 69 insertions(+), 40 deletions(-)
From: Jason Wang jasowang@redhat.com
commit e82b9b0727ff6d665fff2d326162b460dded554d upstream.
We used to have vhost_exceeds_weight() for vhost-net to:
- prevent vhost kthread from hogging the cpu - balance the time spent between TX and RX
This function could be useful for vsock and scsi as well. So move it to vhost.c. Device must specify a weight which counts the number of requests, or it can also specific a byte_weight which counts the number of bytes that has been processed.
Signed-off-by: Jason Wang jasowang@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com [jwang: backport to 4.19, fix conflict in net.c] Signed-off-by: Jack Wang jinpu.wang@cloud.ionos.com --- drivers/vhost/net.c | 22 ++++++---------------- drivers/vhost/scsi.c | 9 ++++++++- drivers/vhost/vhost.c | 20 +++++++++++++++++++- drivers/vhost/vhost.h | 5 ++++- drivers/vhost/vsock.c | 12 +++++++++++- 5 files changed, 48 insertions(+), 20 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 39155d7cc894..f6cf6825c15f 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -497,12 +497,6 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter, return iov_iter_count(iter); }
-static bool vhost_exceeds_weight(int pkts, int total_len) -{ - return total_len >= VHOST_NET_WEIGHT || - pkts >= VHOST_NET_PKT_WEIGHT; -} - static int get_tx_bufs(struct vhost_net *net, struct vhost_net_virtqueue *nvq, struct msghdr *msg, @@ -598,10 +592,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) err, len); if (++nvq->done_idx >= VHOST_NET_BATCH) vhost_net_signal_used(nvq); - if (vhost_exceeds_weight(++sent_pkts, total_len)) { - vhost_poll_queue(&vq->poll); + if (vhost_exceeds_weight(vq, ++sent_pkts, total_len)) break; - } }
vhost_net_signal_used(nvq); @@ -701,10 +693,9 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) else vhost_zerocopy_signal_used(net, vq); vhost_net_tx_packet(net); - if (unlikely(vhost_exceeds_weight(++sent_pkts, total_len))) { - vhost_poll_queue(&vq->poll); + if (unlikely(vhost_exceeds_weight(vq, ++sent_pkts, + total_len))) break; - } } }
@@ -1027,10 +1018,8 @@ static void handle_rx(struct vhost_net *net) vhost_log_write(vq, vq_log, log, vhost_len, vq->iov, in); total_len += vhost_len; - if (unlikely(vhost_exceeds_weight(++recv_pkts, total_len))) { - vhost_poll_queue(&vq->poll); + if (unlikely(vhost_exceeds_weight(vq, ++recv_pkts, total_len))) goto out; - } } if (unlikely(busyloop_intr)) vhost_poll_queue(&vq->poll); @@ -1115,7 +1104,8 @@ static int vhost_net_open(struct inode *inode, struct file *f) vhost_net_buf_init(&n->vqs[i].rxq); } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, - UIO_MAXIOV + VHOST_NET_BATCH); + UIO_MAXIOV + VHOST_NET_BATCH, + VHOST_NET_WEIGHT, VHOST_NET_PKT_WEIGHT);
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev); vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev); diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 0cfa925be4ec..087ce17b0c39 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -57,6 +57,12 @@ #define VHOST_SCSI_PREALLOC_UPAGES 2048 #define VHOST_SCSI_PREALLOC_PROT_SGLS 2048
+/* Max number of requests before requeueing the job. + * Using this limit prevents one virtqueue from starving others with + * request. + */ +#define VHOST_SCSI_WEIGHT 256 + struct vhost_scsi_inflight { /* Wait for the flush operation to finish */ struct completion comp; @@ -1398,7 +1404,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) vqs[i] = &vs->vqs[i].vq; vs->vqs[i].vq.handle_kick = vhost_scsi_handle_kick; } - vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV); + vhost_dev_init(&vs->dev, vqs, VHOST_SCSI_MAX_VQ, UIO_MAXIOV, + VHOST_SCSI_WEIGHT, 0);
vhost_scsi_init_inflight(vs, NULL);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index c163bc15976a..0752f8dc47b1 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -413,8 +413,24 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev) vhost_vq_free_iovecs(dev->vqs[i]); }
+bool vhost_exceeds_weight(struct vhost_virtqueue *vq, + int pkts, int total_len) +{ + struct vhost_dev *dev = vq->dev; + + if ((dev->byte_weight && total_len >= dev->byte_weight) || + pkts >= dev->weight) { + vhost_poll_queue(&vq->poll); + return true; + } + + return false; +} +EXPORT_SYMBOL_GPL(vhost_exceeds_weight); + void vhost_dev_init(struct vhost_dev *dev, - struct vhost_virtqueue **vqs, int nvqs, int iov_limit) + struct vhost_virtqueue **vqs, int nvqs, + int iov_limit, int weight, int byte_weight) { struct vhost_virtqueue *vq; int i; @@ -428,6 +444,8 @@ void vhost_dev_init(struct vhost_dev *dev, dev->mm = NULL; dev->worker = NULL; dev->iov_limit = iov_limit; + dev->weight = weight; + dev->byte_weight = byte_weight; init_llist_head(&dev->work_list); init_waitqueue_head(&dev->wait); INIT_LIST_HEAD(&dev->read_list); diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 9490e7ddb340..27a78a9b8cc7 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -171,10 +171,13 @@ struct vhost_dev { struct list_head pending_list; wait_queue_head_t wait; int iov_limit; + int weight; + int byte_weight; };
+bool vhost_exceeds_weight(struct vhost_virtqueue *vq, int pkts, int total_len); void vhost_dev_init(struct vhost_dev *, struct vhost_virtqueue **vqs, - int nvqs, int iov_limit); + int nvqs, int iov_limit, int weight, int byte_weight); long vhost_dev_set_owner(struct vhost_dev *dev); bool vhost_dev_has_owner(struct vhost_dev *dev); long vhost_dev_check_owner(struct vhost_dev *); diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index e440f87ae1d6..58c5c82bc0be 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -21,6 +21,14 @@ #include "vhost.h"
#define VHOST_VSOCK_DEFAULT_HOST_CID 2 +/* Max number of bytes transferred before requeueing the job. + * Using this limit prevents one virtqueue from starving others. */ +#define VHOST_VSOCK_WEIGHT 0x80000 +/* Max number of packets transferred before requeueing the job. + * Using this limit prevents one virtqueue from starving others with + * small pkts. + */ +#define VHOST_VSOCK_PKT_WEIGHT 256
enum { VHOST_VSOCK_FEATURES = VHOST_FEATURES, @@ -531,7 +539,9 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) vsock->vqs[VSOCK_VQ_TX].handle_kick = vhost_vsock_handle_tx_kick; vsock->vqs[VSOCK_VQ_RX].handle_kick = vhost_vsock_handle_rx_kick;
- vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs), UIO_MAXIOV); + vhost_dev_init(&vsock->dev, vqs, ARRAY_SIZE(vsock->vqs), + UIO_MAXIOV, VHOST_VSOCK_PKT_WEIGHT, + VHOST_VSOCK_WEIGHT);
file->private_data = vsock; spin_lock_init(&vsock->send_pkt_list_lock);
From: Jason Wang jasowang@redhat.com
commit e2412c07f8f3040593dfb88207865a3cd58680c0 upstream.
When the rx buffer is too small for a packet, we will discard the vq descriptor and retry it for the next packet:
while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk, &busyloop_intr))) { ... /* On overrun, truncate and discard */ if (unlikely(headcount > UIO_MAXIOV)) { iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); err = sock->ops->recvmsg(sock, &msg, 1, MSG_DONTWAIT | MSG_TRUNC); pr_debug("Discarded rx packet: len %zd\n", sock_len); continue; } ... }
This makes it possible to trigger a infinite while..continue loop through the co-opreation of two VMs like:
1) Malicious VM1 allocate 1 byte rx buffer and try to slow down the vhost process as much as possible e.g using indirect descriptors or other. 2) Malicious VM2 generate packets to VM1 as fast as possible
Fixing this by checking against weight at the end of RX and TX loop. This also eliminate other similar cases when:
- userspace is consuming the packets in the meanwhile - theoretical TOCTOU attack if guest moving avail index back and forth to hit the continue after vhost find guest just add new buffers
This addresses CVE-2019-3900.
Fixes: d8316f3991d20 ("vhost: fix total length when packets are too short") Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Signed-off-by: Jason Wang jasowang@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com [jwang: backport to 4.19] Signed-off-by: Jack Wang jinpu.wang@cloud.ionos.com --- drivers/vhost/net.c | 29 +++++++++++++---------------- 1 file changed, 13 insertions(+), 16 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index f6cf6825c15f..9a20ec796c12 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -551,7 +551,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) int err; int sent_pkts = 0;
- for (;;) { + do { bool busyloop_intr = false;
head = get_tx_bufs(net, nvq, &msg, &out, &in, &len, @@ -592,9 +592,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock) err, len); if (++nvq->done_idx >= VHOST_NET_BATCH) vhost_net_signal_used(nvq); - if (vhost_exceeds_weight(vq, ++sent_pkts, total_len)) - break; - } + } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
vhost_net_signal_used(nvq); } @@ -618,7 +616,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) bool zcopy_used; int sent_pkts = 0;
- for (;;) { + do { bool busyloop_intr;
/* Release DMAs done buffers first */ @@ -693,10 +691,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock) else vhost_zerocopy_signal_used(net, vq); vhost_net_tx_packet(net); - if (unlikely(vhost_exceeds_weight(vq, ++sent_pkts, - total_len))) - break; - } + } while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len))); }
/* Expects to be always run from workqueue - which acts as @@ -932,8 +927,11 @@ static void handle_rx(struct vhost_net *net) vq->log : NULL; mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
- while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk, - &busyloop_intr))) { + do { + sock_len = vhost_net_rx_peek_head_len(net, sock->sk, + &busyloop_intr); + if (!sock_len) + break; sock_len += sock_hlen; vhost_len = sock_len + vhost_hlen; headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx, @@ -1018,12 +1016,11 @@ static void handle_rx(struct vhost_net *net) vhost_log_write(vq, vq_log, log, vhost_len, vq->iov, in); total_len += vhost_len; - if (unlikely(vhost_exceeds_weight(vq, ++recv_pkts, total_len))) - goto out; - } + } while (likely(!vhost_exceeds_weight(vq, ++recv_pkts, total_len))); + if (unlikely(busyloop_intr)) vhost_poll_queue(&vq->poll); - else + else if (!sock_len) vhost_net_enable_vq(net, vq); out: vhost_net_signal_used(nvq); @@ -1105,7 +1102,7 @@ static int vhost_net_open(struct inode *inode, struct file *f) } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX, UIO_MAXIOV + VHOST_NET_BATCH, - VHOST_NET_WEIGHT, VHOST_NET_PKT_WEIGHT); + VHOST_NET_PKT_WEIGHT, VHOST_NET_WEIGHT);
vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, EPOLLOUT, dev); vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, EPOLLIN, dev);
From: Jason Wang jasowang@redhat.com
commit e79b431fb901ba1106670bcc80b9b617b25def7d upstream.
This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing vsock kthread from hogging cpu which is guest triggerable. The weight can help to avoid starving the request from on direction while another direction is being processed.
The value of weight is picked from vhost-net.
This addresses CVE-2019-3900.
Cc: Stefan Hajnoczi stefanha@redhat.com Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Jason Wang jasowang@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com [jwang: backport to 4.19] Signed-off-by: Jack Wang jinpu.wang@cloud.ionos.com --- drivers/vhost/vsock.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 58c5c82bc0be..bab495d73195 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -86,6 +86,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, struct vhost_virtqueue *vq) { struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX]; + int pkts = 0, total_len = 0; bool added = false; bool restart_tx = false;
@@ -97,7 +98,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, /* Avoid further vmexits, we're already processing the virtqueue */ vhost_disable_notify(&vsock->dev, vq);
- for (;;) { + do { struct virtio_vsock_pkt *pkt; struct iov_iter iov_iter; unsigned out, in; @@ -182,8 +183,9 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, */ virtio_transport_deliver_tap_pkt(pkt);
+ total_len += pkt->len; virtio_transport_free_pkt(pkt); - } + } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len))); if (added) vhost_signal(&vsock->dev, vq);
@@ -358,7 +360,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock, dev); struct virtio_vsock_pkt *pkt; - int head; + int head, pkts = 0, total_len = 0; unsigned int out, in; bool added = false;
@@ -368,7 +370,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) goto out;
vhost_disable_notify(&vsock->dev, vq); - for (;;) { + do { u32 len;
if (!vhost_vsock_more_replies(vsock)) { @@ -409,9 +411,11 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work) else virtio_transport_free_pkt(pkt);
- vhost_add_used(vq, head, sizeof(pkt->hdr) + len); + len += sizeof(pkt->hdr); + vhost_add_used(vq, head, len); + total_len += len; added = true; - } + } while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
no_more_replies: if (added)
From: Jason Wang jasowang@redhat.com
commit c1ea02f15ab5efb3e93fc3144d895410bf79fcf2 upstream
This patch will check the weight and exit the loop if we exceeds the weight. This is useful for preventing scsi kthread from hogging cpu which is guest triggerable.
This addresses CVE-2019-3900.
Cc: Paolo Bonzini pbonzini@redhat.com Cc: Stefan Hajnoczi stefanha@redhat.com Fixes: 057cbf49a1f0 ("tcm_vhost: Initial merge for vhost level target fabric driver") Signed-off-by: Jason Wang jasowang@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Stefan Hajnoczi stefanha@redhat.com [jwang: backport to 4.19] Signed-off-by: Jack Wang jinpu.wang@cloud.ionos.com --- drivers/vhost/scsi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 087ce17b0c39..5e298d9287f1 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -817,7 +817,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) u64 tag; u32 exp_data_len, data_direction; unsigned int out = 0, in = 0; - int head, ret, prot_bytes; + int head, ret, prot_bytes, c = 0; size_t req_size, rsp_size = sizeof(struct virtio_scsi_cmd_resp); size_t out_size, in_size; u16 lun; @@ -836,7 +836,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
vhost_disable_notify(&vs->dev, vq);
- for (;;) { + do { head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov), &out, &in, NULL, NULL); @@ -1051,7 +1051,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) */ INIT_WORK(&cmd->work, vhost_scsi_submission_work); queue_work(vhost_scsi_workqueue, &cmd->work); - } + } while (likely(!vhost_exceeds_weight(vq, ++c, 0))); out: mutex_unlock(&vq->mutex); }
On Mon, Jul 22, 2019 at 03:03:09PM +0200, Jack Wang wrote:
Hi, Greg, hi Sasha,
I noticed the fixes for CVE-2019-3900 are only backported to 4.14.133+, but not to 4.19, also 5.1, fixes have been included in 5.2.
So I backported to 4.19, only compiles fine, no functional tests.
Please review, and consider to include in next release.
Thanks Jack. It'll be great if someone can test it and confirm it fixes the issue (and nothing else breaks).
-- Thanks, Sasha
Sasha Levin sashal@kernel.org 于2019年7月22日周一 下午5:42写道:
On Mon, Jul 22, 2019 at 03:03:09PM +0200, Jack Wang wrote:
Hi, Greg, hi Sasha,
I noticed the fixes for CVE-2019-3900 are only backported to 4.14.133+, but not to 4.19, also 5.1, fixes have been included in 5.2.
So I backported to 4.19, only compiles fine, no functional tests.
Please review, and consider to include in next release.
Thanks Jack. It'll be great if someone can test it and confirm it fixes the issue (and nothing else breaks).
Agree, thanks
-- Thanks, Sasha
Hi,
just wanted to ask about the status of those? I'm testing patches on top of 4.19.60, not sure about how can I test if the problem is fixed, but at least nothing seems to be broken so far..
BR
nik
On Tue, Jul 23, 2019 at 11:59:16AM +0200, Jinpu Wang wrote:
Sasha Levin sashal@kernel.org 于2019年7月22日周一 下午5:42写道:
On Mon, Jul 22, 2019 at 03:03:09PM +0200, Jack Wang wrote:
Hi, Greg, hi Sasha,
I noticed the fixes for CVE-2019-3900 are only backported to 4.14.133+, but not to 4.19, also 5.1, fixes have been included in 5.2.
So I backported to 4.19, only compiles fine, no functional tests.
Please review, and consider to include in next release.
Thanks Jack. It'll be great if someone can test it and confirm it fixes the issue (and nothing else breaks).
Agree, thanks
-- Thanks, Sasha
A: No. Q: Should I include quotations after my reply?
http://daringfireball.net/2007/07/on_top
On Fri, Aug 02, 2019 at 09:02:01AM +0200, Nikola Ciprich wrote:
Hi,
just wanted to ask about the status of those? I'm testing patches on top of 4.19.60, not sure about how can I test if the problem is fixed, but at least nothing seems to be broken so far..
We wanted to get some verification that the issues were really fixed by these patches. If no one knows how to test them, then odds are they are not vulnerable, right? :)
thanks,
greg k-h
On Fri, Aug 02, 2019 at 09:13:42AM +0200, Greg Kroah-Hartman wrote:
A: No. Q: Should I include quotations after my reply?
http://daringfireball.net/2007/07/on_top
On Fri, Aug 02, 2019 at 09:02:01AM +0200, Nikola Ciprich wrote:
Hi,
just wanted to ask about the status of those? I'm testing patches on top of 4.19.60, not sure about how can I test if the problem is fixed, but at least nothing seems to be broken so far..
We wanted to get some verification that the issues were really fixed by these patches. If no one knows how to test them, then odds are they are not vulnerable, right? :)
Ok, snarkyness aside, let me look at these today and queue them up. They have been sitting here for a while and they would be good to get merged...
thanks,
greg k-h
On Mon, Jul 22, 2019 at 03:03:09PM +0200, Jack Wang wrote:
Hi, Greg, hi Sasha,
I noticed the fixes for CVE-2019-3900 are only backported to 4.14.133+, but not to 4.19, also 5.1, fixes have been included in 5.2.
So I backported to 4.19, only compiles fine, no functional tests.
Please review, and consider to include in next release.
All looks good, sorry for the delay, now queued up.
greg k-h
linux-stable-mirror@lists.linaro.org