September 2021 - Linux-stable-mirror

FAILED: patch "[PATCH] fuse: truncate pagecache on atomic_o_trunc" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 76224355db7570cbe6b6f75c8929a1558828dd55 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi <mszeredi(a)redhat.com> Date: Tue, 17 Aug 2021 21:05:16 +0200 Subject: [PATCH] fuse: truncate pagecache on atomic_o_trunc fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic O_TRUNC. This can deadlock with fuse_wait_on_page_writeback() in fuse_launder_page() triggered by invalidate_inode_pages2(). Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a truncate_pagecache() call. This makes sense regardless of FOPEN_KEEP_CACHE or fc->writeback cache, so do it unconditionally. Reported-by: Xie Yongji <xieyongji(a)bytedance.com> Reported-and-tested-by: syzbot+bea44a5189836d956894(a)syzkaller.appspotmail.com Fixes: e4648309b85a ("fuse: truncate pending writes on O_TRUNC") Cc: <stable(a)vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com> diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 97f860cfc195..5e5efb66c7d7 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -198,12 +198,11 @@ void fuse_finish_open(struct inode *inode, struct file *file) struct fuse_file *ff = file->private_data; struct fuse_conn *fc = get_fuse_conn(inode); - if (!(ff->open_flags & FOPEN_KEEP_CACHE)) - invalidate_inode_pages2(inode->i_mapping); if (ff->open_flags & FOPEN_STREAM) stream_open(inode, file); else if (ff->open_flags & FOPEN_NONSEEKABLE) nonseekable_open(inode, file); + if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) { struct fuse_inode *fi = get_fuse_inode(inode); @@ -211,10 +210,14 @@ void fuse_finish_open(struct inode *inode, struct file *file) fi->attr_version = atomic64_inc_return(&fc->attr_version); i_size_write(inode, 0); spin_unlock(&fi->lock); + truncate_pagecache(inode, 0); fuse_invalidate_attr(inode); if (fc->writeback_cache) file_update_time(file); + } else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) { + invalidate_inode_pages2(inode->i_mapping); } + if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache) fuse_link_write_file(file); }

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] fuse: truncate pagecache on atomic_o_trunc" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 76224355db7570cbe6b6f75c8929a1558828dd55 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi <mszeredi(a)redhat.com> Date: Tue, 17 Aug 2021 21:05:16 +0200 Subject: [PATCH] fuse: truncate pagecache on atomic_o_trunc fuse_finish_open() will be called with FUSE_NOWRITE in case of atomic O_TRUNC. This can deadlock with fuse_wait_on_page_writeback() in fuse_launder_page() triggered by invalidate_inode_pages2(). Fix by replacing invalidate_inode_pages2() in fuse_finish_open() with a truncate_pagecache() call. This makes sense regardless of FOPEN_KEEP_CACHE or fc->writeback cache, so do it unconditionally. Reported-by: Xie Yongji <xieyongji(a)bytedance.com> Reported-and-tested-by: syzbot+bea44a5189836d956894(a)syzkaller.appspotmail.com Fixes: e4648309b85a ("fuse: truncate pending writes on O_TRUNC") Cc: <stable(a)vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com> diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 97f860cfc195..5e5efb66c7d7 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -198,12 +198,11 @@ void fuse_finish_open(struct inode *inode, struct file *file) struct fuse_file *ff = file->private_data; struct fuse_conn *fc = get_fuse_conn(inode); - if (!(ff->open_flags & FOPEN_KEEP_CACHE)) - invalidate_inode_pages2(inode->i_mapping); if (ff->open_flags & FOPEN_STREAM) stream_open(inode, file); else if (ff->open_flags & FOPEN_NONSEEKABLE) nonseekable_open(inode, file); + if (fc->atomic_o_trunc && (file->f_flags & O_TRUNC)) { struct fuse_inode *fi = get_fuse_inode(inode); @@ -211,10 +210,14 @@ void fuse_finish_open(struct inode *inode, struct file *file) fi->attr_version = atomic64_inc_return(&fc->attr_version); i_size_write(inode, 0); spin_unlock(&fi->lock); + truncate_pagecache(inode, 0); fuse_invalidate_attr(inode); if (fc->writeback_cache) file_update_time(file); + } else if (!(ff->open_flags & FOPEN_KEEP_CACHE)) { + invalidate_inode_pages2(inode->i_mapping); } + if ((file->f_mode & FMODE_WRITE) && fc->writeback_cache) fuse_link_write_file(file); }

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io-wq: check max_worker limits if a worker transitions bound" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From ecc53c48c13d995e6fe5559e30ffee48d92784fd Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe(a)kernel.dk> Date: Sun, 29 Aug 2021 16:13:03 -0600 Subject: [PATCH] io-wq: check max_worker limits if a worker transitions bound state For the two places where new workers are created, we diligently check if we are allowed to create a new worker. If we're currently at the limit of how many workers of a given type we can have, then we don't create any new ones. If you have a mixed workload with various types of bound and unbounded work, then it can happen that a worker finishes one type of work and is then transitioned to the other type. For this case, we don't check if we are actually allowed to do so. This can cause io-wq to temporarily exceed the allowed number of workers for a given type. When retrieving work, check that the types match. If they don't, check if we are allowed to transition to the other type. If not, then don't handle the new work. Cc: stable(a)vger.kernel.org Reported-by: Johannes Lundberg <johalun0(a)gmail.com> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/fs/io-wq.c b/fs/io-wq.c index 4b5fc621ab39..da3ad45028f9 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -424,7 +424,28 @@ static void io_wait_on_hash(struct io_wqe *wqe, unsigned int hash) spin_unlock(&wq->hash->wait.lock); } -static struct io_wq_work *io_get_next_work(struct io_wqe *wqe) +/* + * We can always run the work if the worker is currently the same type as + * the work (eg both are bound, or both are unbound). If they are not the + * same, only allow it if incrementing the worker count would be allowed. + */ +static bool io_worker_can_run_work(struct io_worker *worker, + struct io_wq_work *work) +{ + struct io_wqe_acct *acct; + + if (!(worker->flags & IO_WORKER_F_BOUND) != + !(work->flags & IO_WQ_WORK_UNBOUND)) + return true; + + /* not the same type, check if we'd go over the limit */ + acct = io_work_get_acct(worker->wqe, work); + return acct->nr_workers < acct->max_workers; +} + +static struct io_wq_work *io_get_next_work(struct io_wqe *wqe, + struct io_worker *worker, + bool *stalled) __must_hold(wqe->lock) { struct io_wq_work_node *node, *prev; @@ -436,6 +457,9 @@ static struct io_wq_work *io_get_next_work(struct io_wqe *wqe) work = container_of(node, struct io_wq_work, list); + if (!io_worker_can_run_work(worker, work)) + break; + /* not hashed, can run anytime */ if (!io_wq_is_hashed(work)) { wq_list_del(&wqe->work_list, node, prev); @@ -462,6 +486,7 @@ static struct io_wq_work *io_get_next_work(struct io_wqe *wqe) raw_spin_unlock(&wqe->lock); io_wait_on_hash(wqe, stall_hash); raw_spin_lock(&wqe->lock); + *stalled = true; } return NULL; @@ -501,6 +526,7 @@ static void io_worker_handle_work(struct io_worker *worker) do { struct io_wq_work *work; + bool stalled; get_next: /* * If we got some work, mark us as busy. If we didn't, but @@ -509,10 +535,11 @@ static void io_worker_handle_work(struct io_worker *worker) * can't make progress, any work completion or insertion will * clear the stalled flag. */ - work = io_get_next_work(wqe); + stalled = false; + work = io_get_next_work(wqe, worker, &stalled); if (work) __io_worker_busy(wqe, worker, work); - else if (!wq_list_empty(&wqe->work_list)) + else if (stalled) wqe->flags |= IO_WQE_FLAG_STALLED; raw_spin_unlock_irq(&wqe->lock);

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io-wq: fix race between adding work and activating a free" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 94ffb0a282872c2f4b14f757fa1aef2302aeaabb Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe(a)kernel.dk> Date: Mon, 30 Aug 2021 11:55:22 -0600 Subject: [PATCH] io-wq: fix race between adding work and activating a free worker The attempt to find and activate a free worker for new work is currently combined with creating a new one if we don't find one, but that opens io-wq up to a race where the worker that is found and activated can put itself to sleep without knowing that it has been selected to perform this new work. Fix this by moving the activation into where we add the new work item, then we can retain it within the wqe->lock scope and elimiate the race with the worker itself checking inside the lock, but sleeping outside of it. Cc: stable(a)vger.kernel.org Reported-by: Andres Freund <andres(a)anarazel.de> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/fs/io-wq.c b/fs/io-wq.c index cd9bd095fb1b..94f8f2ecb8e5 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -236,9 +236,9 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe) * We need a worker. If we find a free one, we're good. If not, and we're * below the max number of workers, create one. */ -static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) +static void io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) { - bool ret; + bool do_create = false, first = false; /* * Most likely an attempt to queue unbounded work on an io_wq that @@ -247,26 +247,18 @@ static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) if (unlikely(!acct->max_workers)) pr_warn_once("io-wq is not configured for unbound workers"); - rcu_read_lock(); - ret = io_wqe_activate_free_worker(wqe); - rcu_read_unlock(); - - if (!ret) { - bool do_create = false, first = false; - - raw_spin_lock(&wqe->lock); - if (acct->nr_workers < acct->max_workers) { - if (!acct->nr_workers) - first = true; - acct->nr_workers++; - do_create = true; - } - raw_spin_unlock(&wqe->lock); - if (do_create) { - atomic_inc(&acct->nr_running); - atomic_inc(&wqe->wq->worker_refs); - create_io_worker(wqe->wq, wqe, acct->index, first); - } + raw_spin_lock(&wqe->lock); + if (acct->nr_workers < acct->max_workers) { + if (!acct->nr_workers) + first = true; + acct->nr_workers++; + do_create = true; + } + raw_spin_unlock(&wqe->lock); + if (do_create) { + atomic_inc(&acct->nr_running); + atomic_inc(&wqe->wq->worker_refs); + create_io_worker(wqe->wq, wqe, acct->index, first); } } @@ -794,7 +786,8 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) { struct io_wqe_acct *acct = io_work_get_acct(wqe, work); - bool do_wake; + unsigned work_flags = work->flags; + bool do_create; /* * If io-wq is exiting for this task, or if the request has explicitly @@ -809,12 +802,16 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) raw_spin_lock(&wqe->lock); io_wqe_insert_work(wqe, work); wqe->flags &= ~IO_WQE_FLAG_STALLED; - do_wake = (work->flags & IO_WQ_WORK_CONCURRENT) || - !atomic_read(&acct->nr_running); + + rcu_read_lock(); + do_create = !io_wqe_activate_free_worker(wqe); + rcu_read_unlock(); + raw_spin_unlock(&wqe->lock); - if (do_wake) - io_wqe_wake_worker(wqe, acct); + if (do_create && ((work_flags & IO_WQ_WORK_CONCURRENT) || + !atomic_read(&acct->nr_running))) + io_wqe_create_worker(wqe, acct); } void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work)

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io-wq: fix race between adding work and activating a free" failed to apply to 5.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 94ffb0a282872c2f4b14f757fa1aef2302aeaabb Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe(a)kernel.dk> Date: Mon, 30 Aug 2021 11:55:22 -0600 Subject: [PATCH] io-wq: fix race between adding work and activating a free worker The attempt to find and activate a free worker for new work is currently combined with creating a new one if we don't find one, but that opens io-wq up to a race where the worker that is found and activated can put itself to sleep without knowing that it has been selected to perform this new work. Fix this by moving the activation into where we add the new work item, then we can retain it within the wqe->lock scope and elimiate the race with the worker itself checking inside the lock, but sleeping outside of it. Cc: stable(a)vger.kernel.org Reported-by: Andres Freund <andres(a)anarazel.de> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/fs/io-wq.c b/fs/io-wq.c index cd9bd095fb1b..94f8f2ecb8e5 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -236,9 +236,9 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe) * We need a worker. If we find a free one, we're good. If not, and we're * below the max number of workers, create one. */ -static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) +static void io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) { - bool ret; + bool do_create = false, first = false; /* * Most likely an attempt to queue unbounded work on an io_wq that @@ -247,26 +247,18 @@ static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) if (unlikely(!acct->max_workers)) pr_warn_once("io-wq is not configured for unbound workers"); - rcu_read_lock(); - ret = io_wqe_activate_free_worker(wqe); - rcu_read_unlock(); - - if (!ret) { - bool do_create = false, first = false; - - raw_spin_lock(&wqe->lock); - if (acct->nr_workers < acct->max_workers) { - if (!acct->nr_workers) - first = true; - acct->nr_workers++; - do_create = true; - } - raw_spin_unlock(&wqe->lock); - if (do_create) { - atomic_inc(&acct->nr_running); - atomic_inc(&wqe->wq->worker_refs); - create_io_worker(wqe->wq, wqe, acct->index, first); - } + raw_spin_lock(&wqe->lock); + if (acct->nr_workers < acct->max_workers) { + if (!acct->nr_workers) + first = true; + acct->nr_workers++; + do_create = true; + } + raw_spin_unlock(&wqe->lock); + if (do_create) { + atomic_inc(&acct->nr_running); + atomic_inc(&wqe->wq->worker_refs); + create_io_worker(wqe->wq, wqe, acct->index, first); } } @@ -794,7 +786,8 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) { struct io_wqe_acct *acct = io_work_get_acct(wqe, work); - bool do_wake; + unsigned work_flags = work->flags; + bool do_create; /* * If io-wq is exiting for this task, or if the request has explicitly @@ -809,12 +802,16 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) raw_spin_lock(&wqe->lock); io_wqe_insert_work(wqe, work); wqe->flags &= ~IO_WQE_FLAG_STALLED; - do_wake = (work->flags & IO_WQ_WORK_CONCURRENT) || - !atomic_read(&acct->nr_running); + + rcu_read_lock(); + do_create = !io_wqe_activate_free_worker(wqe); + rcu_read_unlock(); + raw_spin_unlock(&wqe->lock); - if (do_wake) - io_wqe_wake_worker(wqe, acct); + if (do_create && ((work_flags & IO_WQ_WORK_CONCURRENT) || + !atomic_read(&acct->nr_running))) + io_wqe_create_worker(wqe, acct); } void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work)

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io-wq: fix race between adding work and activating a free" failed to apply to 5.13-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.13-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 94ffb0a282872c2f4b14f757fa1aef2302aeaabb Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe(a)kernel.dk> Date: Mon, 30 Aug 2021 11:55:22 -0600 Subject: [PATCH] io-wq: fix race between adding work and activating a free worker The attempt to find and activate a free worker for new work is currently combined with creating a new one if we don't find one, but that opens io-wq up to a race where the worker that is found and activated can put itself to sleep without knowing that it has been selected to perform this new work. Fix this by moving the activation into where we add the new work item, then we can retain it within the wqe->lock scope and elimiate the race with the worker itself checking inside the lock, but sleeping outside of it. Cc: stable(a)vger.kernel.org Reported-by: Andres Freund <andres(a)anarazel.de> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/fs/io-wq.c b/fs/io-wq.c index cd9bd095fb1b..94f8f2ecb8e5 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -236,9 +236,9 @@ static bool io_wqe_activate_free_worker(struct io_wqe *wqe) * We need a worker. If we find a free one, we're good. If not, and we're * below the max number of workers, create one. */ -static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) +static void io_wqe_create_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) { - bool ret; + bool do_create = false, first = false; /* * Most likely an attempt to queue unbounded work on an io_wq that @@ -247,26 +247,18 @@ static void io_wqe_wake_worker(struct io_wqe *wqe, struct io_wqe_acct *acct) if (unlikely(!acct->max_workers)) pr_warn_once("io-wq is not configured for unbound workers"); - rcu_read_lock(); - ret = io_wqe_activate_free_worker(wqe); - rcu_read_unlock(); - - if (!ret) { - bool do_create = false, first = false; - - raw_spin_lock(&wqe->lock); - if (acct->nr_workers < acct->max_workers) { - if (!acct->nr_workers) - first = true; - acct->nr_workers++; - do_create = true; - } - raw_spin_unlock(&wqe->lock); - if (do_create) { - atomic_inc(&acct->nr_running); - atomic_inc(&wqe->wq->worker_refs); - create_io_worker(wqe->wq, wqe, acct->index, first); - } + raw_spin_lock(&wqe->lock); + if (acct->nr_workers < acct->max_workers) { + if (!acct->nr_workers) + first = true; + acct->nr_workers++; + do_create = true; + } + raw_spin_unlock(&wqe->lock); + if (do_create) { + atomic_inc(&acct->nr_running); + atomic_inc(&wqe->wq->worker_refs); + create_io_worker(wqe->wq, wqe, acct->index, first); } } @@ -794,7 +786,8 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) { struct io_wqe_acct *acct = io_work_get_acct(wqe, work); - bool do_wake; + unsigned work_flags = work->flags; + bool do_create; /* * If io-wq is exiting for this task, or if the request has explicitly @@ -809,12 +802,16 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) raw_spin_lock(&wqe->lock); io_wqe_insert_work(wqe, work); wqe->flags &= ~IO_WQE_FLAG_STALLED; - do_wake = (work->flags & IO_WQ_WORK_CONCURRENT) || - !atomic_read(&acct->nr_running); + + rcu_read_lock(); + do_create = !io_wqe_activate_free_worker(wqe); + rcu_read_unlock(); + raw_spin_unlock(&wqe->lock); - if (do_wake) - io_wqe_wake_worker(wqe, acct); + if (do_create && ((work_flags & IO_WQ_WORK_CONCURRENT) || + !atomic_read(&acct->nr_running))) + io_wqe_create_worker(wqe, acct); } void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work)

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io-wq: fix wakeup race when adding new work" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 87df7fb922d18e96992aa5e824aa34b2065fef59 Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe(a)kernel.dk> Date: Mon, 30 Aug 2021 07:45:47 -0600 Subject: [PATCH] io-wq: fix wakeup race when adding new work When new work is added, io_wqe_enqueue() checks if we need to wake or create a new worker. But that check is done outside the lock that otherwise synchronizes us with a worker going to sleep, so we can end up in the following situation: CPU0 CPU1 lock insert work unlock atomic_read(nr_running) != 0 lock atomic_dec(nr_running) no wakeup needed Hold the wqe lock around the "need to wakeup" check. Then we can also get rid of the temporary work_flags variable, as we know the work will remain valid as long as we hold the lock. Cc: stable(a)vger.kernel.org Reported-by: Andres Freund <andres(a)anarazel.de> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/fs/io-wq.c b/fs/io-wq.c index 13aeb48a0964..cd9bd095fb1b 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -794,7 +794,7 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) { struct io_wqe_acct *acct = io_work_get_acct(wqe, work); - int work_flags; + bool do_wake; /* * If io-wq is exiting for this task, or if the request has explicitly @@ -806,14 +806,14 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) return; } - work_flags = work->flags; raw_spin_lock(&wqe->lock); io_wqe_insert_work(wqe, work); wqe->flags &= ~IO_WQE_FLAG_STALLED; + do_wake = (work->flags & IO_WQ_WORK_CONCURRENT) || + !atomic_read(&acct->nr_running); raw_spin_unlock(&wqe->lock); - if ((work_flags & IO_WQ_WORK_CONCURRENT) || - !atomic_read(&acct->nr_running)) + if (do_wake) io_wqe_wake_worker(wqe, acct); }

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io-wq: fix wakeup race when adding new work" failed to apply to 5.13-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.13-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 87df7fb922d18e96992aa5e824aa34b2065fef59 Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe(a)kernel.dk> Date: Mon, 30 Aug 2021 07:45:47 -0600 Subject: [PATCH] io-wq: fix wakeup race when adding new work When new work is added, io_wqe_enqueue() checks if we need to wake or create a new worker. But that check is done outside the lock that otherwise synchronizes us with a worker going to sleep, so we can end up in the following situation: CPU0 CPU1 lock insert work unlock atomic_read(nr_running) != 0 lock atomic_dec(nr_running) no wakeup needed Hold the wqe lock around the "need to wakeup" check. Then we can also get rid of the temporary work_flags variable, as we know the work will remain valid as long as we hold the lock. Cc: stable(a)vger.kernel.org Reported-by: Andres Freund <andres(a)anarazel.de> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/fs/io-wq.c b/fs/io-wq.c index 13aeb48a0964..cd9bd095fb1b 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -794,7 +794,7 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) { struct io_wqe_acct *acct = io_work_get_acct(wqe, work); - int work_flags; + bool do_wake; /* * If io-wq is exiting for this task, or if the request has explicitly @@ -806,14 +806,14 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) return; } - work_flags = work->flags; raw_spin_lock(&wqe->lock); io_wqe_insert_work(wqe, work); wqe->flags &= ~IO_WQE_FLAG_STALLED; + do_wake = (work->flags & IO_WQ_WORK_CONCURRENT) || + !atomic_read(&acct->nr_running); raw_spin_unlock(&wqe->lock); - if ((work_flags & IO_WQ_WORK_CONCURRENT) || - !atomic_read(&acct->nr_running)) + if (do_wake) io_wqe_wake_worker(wqe, acct); }

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io-wq: fix wakeup race when adding new work" failed to apply to 5.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 87df7fb922d18e96992aa5e824aa34b2065fef59 Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe(a)kernel.dk> Date: Mon, 30 Aug 2021 07:45:47 -0600 Subject: [PATCH] io-wq: fix wakeup race when adding new work When new work is added, io_wqe_enqueue() checks if we need to wake or create a new worker. But that check is done outside the lock that otherwise synchronizes us with a worker going to sleep, so we can end up in the following situation: CPU0 CPU1 lock insert work unlock atomic_read(nr_running) != 0 lock atomic_dec(nr_running) no wakeup needed Hold the wqe lock around the "need to wakeup" check. Then we can also get rid of the temporary work_flags variable, as we know the work will remain valid as long as we hold the lock. Cc: stable(a)vger.kernel.org Reported-by: Andres Freund <andres(a)anarazel.de> Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/fs/io-wq.c b/fs/io-wq.c index 13aeb48a0964..cd9bd095fb1b 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -794,7 +794,7 @@ static void io_wqe_insert_work(struct io_wqe *wqe, struct io_wq_work *work) static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) { struct io_wqe_acct *acct = io_work_get_acct(wqe, work); - int work_flags; + bool do_wake; /* * If io-wq is exiting for this task, or if the request has explicitly @@ -806,14 +806,14 @@ static void io_wqe_enqueue(struct io_wqe *wqe, struct io_wq_work *work) return; } - work_flags = work->flags; raw_spin_lock(&wqe->lock); io_wqe_insert_work(wqe, work); wqe->flags &= ~IO_WQE_FLAG_STALLED; + do_wake = (work->flags & IO_WQ_WORK_CONCURRENT) || + !atomic_read(&acct->nr_running); raw_spin_unlock(&wqe->lock); - if ((work_flags & IO_WQ_WORK_CONCURRENT) || - !atomic_read(&acct->nr_running)) + if (do_wake) io_wqe_wake_worker(wqe, acct); }

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: arm64: vgic: Resample HW pending state on deactivation" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 3134cc8beb69d0db9de651081707c4651c011621 Mon Sep 17 00:00:00 2001 From: Marc Zyngier <maz(a)kernel.org> Date: Thu, 19 Aug 2021 19:03:05 +0100 Subject: [PATCH] KVM: arm64: vgic: Resample HW pending state on deactivation When a mapped level interrupt (a timer, for example) is deactivated by the guest, the corresponding host interrupt is equally deactivated. However, the fate of the pending state still needs to be dealt with in SW. This is specially true when the interrupt was in the active+pending state in the virtual distributor at the point where the guest was entered. On exit, the pending state is potentially stale (the guest may have put the interrupt in a non-pending state). If we don't do anything, the interrupt will be spuriously injected in the guest. Although this shouldn't have any ill effect (spurious interrupts are always possible), we can improve the emulation by detecting the deactivation-while-pending case and resample the interrupt. While we're at it, move the logic into a common helper that can be shared between the two GIC implementations. Fixes: e40cc57bac79 ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts") Reported-by: Raghavendra Rao Ananta <rananta(a)google.com> Tested-by: Raghavendra Rao Ananta <rananta(a)google.com> Reviewed-by: Oliver Upton <oupton(a)google.com> Signed-off-by: Marc Zyngier <maz(a)kernel.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c index 2c580204f1dc..95a18cec14a3 100644 --- a/arch/arm64/kvm/vgic/vgic-v2.c +++ b/arch/arm64/kvm/vgic/vgic-v2.c @@ -60,6 +60,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) u32 val = cpuif->vgic_lr[lr]; u32 cpuid, intid = val & GICH_LR_VIRTUALID; struct vgic_irq *irq; + bool deactivated; /* Extract the source vCPU id from the LR */ cpuid = val & GICH_LR_PHYSID_CPUID; @@ -75,7 +76,8 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) raw_spin_lock(&irq->irq_lock); - /* Always preserve the active bit */ + /* Always preserve the active bit, note deactivation */ + deactivated = irq->active && !(val & GICH_LR_ACTIVE_BIT); irq->active = !!(val & GICH_LR_ACTIVE_BIT); if (irq->active && vgic_irq_is_sgi(intid)) @@ -96,36 +98,8 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) if (irq->config == VGIC_CONFIG_LEVEL && !(val & GICH_LR_STATE)) irq->pending_latch = false; - /* - * Level-triggered mapped IRQs are special because we only - * observe rising edges as input to the VGIC. - * - * If the guest never acked the interrupt we have to sample - * the physical line and set the line level, because the - * device state could have changed or we simply need to - * process the still pending interrupt later. - * - * If this causes us to lower the level, we have to also clear - * the physical active state, since we will otherwise never be - * told when the interrupt becomes asserted again. - * - * Another case is when the interrupt requires a helping hand - * on deactivation (no HW deactivation, for example). - */ - if (vgic_irq_is_mapped_level(irq)) { - bool resample = false; - - if (val & GICH_LR_PENDING_BIT) { - irq->line_level = vgic_get_phys_line_level(irq); - resample = !irq->line_level; - } else if (vgic_irq_needs_resampling(irq) && - !(irq->active || irq->pending_latch)) { - resample = true; - } - - if (resample) - vgic_irq_set_phys_active(irq, false); - } + /* Handle resampling for mapped interrupts if required */ + vgic_irq_handle_resampling(irq, deactivated, val & GICH_LR_PENDING_BIT); raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c index 66004f61cd83..21a6207fb2ee 100644 --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -46,6 +46,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) u32 intid, cpuid; struct vgic_irq *irq; bool is_v2_sgi = false; + bool deactivated; cpuid = val & GICH_LR_PHYSID_CPUID; cpuid >>= GICH_LR_PHYSID_CPUID_SHIFT; @@ -68,7 +69,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) raw_spin_lock(&irq->irq_lock); - /* Always preserve the active bit */ + /* Always preserve the active bit, note deactivation */ + deactivated = irq->active && !(val & ICH_LR_ACTIVE_BIT); irq->active = !!(val & ICH_LR_ACTIVE_BIT); if (irq->active && is_v2_sgi) @@ -89,36 +91,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) if (irq->config == VGIC_CONFIG_LEVEL && !(val & ICH_LR_STATE)) irq->pending_latch = false; - /* - * Level-triggered mapped IRQs are special because we only - * observe rising edges as input to the VGIC. - * - * If the guest never acked the interrupt we have to sample - * the physical line and set the line level, because the - * device state could have changed or we simply need to - * process the still pending interrupt later. - * - * If this causes us to lower the level, we have to also clear - * the physical active state, since we will otherwise never be - * told when the interrupt becomes asserted again. - * - * Another case is when the interrupt requires a helping hand - * on deactivation (no HW deactivation, for example). - */ - if (vgic_irq_is_mapped_level(irq)) { - bool resample = false; - - if (val & ICH_LR_PENDING_BIT) { - irq->line_level = vgic_get_phys_line_level(irq); - resample = !irq->line_level; - } else if (vgic_irq_needs_resampling(irq) && - !(irq->active || irq->pending_latch)) { - resample = true; - } - - if (resample) - vgic_irq_set_phys_active(irq, false); - } + /* Handle resampling for mapped interrupts if required */ + vgic_irq_handle_resampling(irq, deactivated, val & ICH_LR_PENDING_BIT); raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c index 81cec508d413..5dad4996cfb2 100644 --- a/arch/arm64/kvm/vgic/vgic.c +++ b/arch/arm64/kvm/vgic/vgic.c @@ -1021,3 +1021,41 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid) return map_is_active; } + +/* + * Level-triggered mapped IRQs are special because we only observe rising + * edges as input to the VGIC. + * + * If the guest never acked the interrupt we have to sample the physical + * line and set the line level, because the device state could have changed + * or we simply need to process the still pending interrupt later. + * + * We could also have entered the guest with the interrupt active+pending. + * On the next exit, we need to re-evaluate the pending state, as it could + * otherwise result in a spurious interrupt by injecting a now potentially + * stale pending state. + * + * If this causes us to lower the level, we have to also clear the physical + * active state, since we will otherwise never be told when the interrupt + * becomes asserted again. + * + * Another case is when the interrupt requires a helping hand on + * deactivation (no HW deactivation, for example). + */ +void vgic_irq_handle_resampling(struct vgic_irq *irq, + bool lr_deactivated, bool lr_pending) +{ + if (vgic_irq_is_mapped_level(irq)) { + bool resample = false; + + if (unlikely(vgic_irq_needs_resampling(irq))) { + resample = !(irq->active || irq->pending_latch); + } else if (lr_pending || (lr_deactivated && irq->line_level)) { + irq->line_level = vgic_get_phys_line_level(irq); + resample = !irq->line_level; + } + + if (resample) + vgic_irq_set_phys_active(irq, false); + } +} diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h index dc1f3d1657ee..14a9218641f5 100644 --- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -169,6 +169,8 @@ void vgic_irq_set_phys_active(struct vgic_irq *irq, bool active); bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, unsigned long flags); void vgic_kick_vcpus(struct kvm *kvm); +void vgic_irq_handle_resampling(struct vgic_irq *irq, + bool lr_deactivated, bool lr_pending); int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr, phys_addr_t addr, phys_addr_t alignment);

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: arm64: vgic: Resample HW pending state on deactivation" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 3134cc8beb69d0db9de651081707c4651c011621 Mon Sep 17 00:00:00 2001 From: Marc Zyngier <maz(a)kernel.org> Date: Thu, 19 Aug 2021 19:03:05 +0100 Subject: [PATCH] KVM: arm64: vgic: Resample HW pending state on deactivation When a mapped level interrupt (a timer, for example) is deactivated by the guest, the corresponding host interrupt is equally deactivated. However, the fate of the pending state still needs to be dealt with in SW. This is specially true when the interrupt was in the active+pending state in the virtual distributor at the point where the guest was entered. On exit, the pending state is potentially stale (the guest may have put the interrupt in a non-pending state). If we don't do anything, the interrupt will be spuriously injected in the guest. Although this shouldn't have any ill effect (spurious interrupts are always possible), we can improve the emulation by detecting the deactivation-while-pending case and resample the interrupt. While we're at it, move the logic into a common helper that can be shared between the two GIC implementations. Fixes: e40cc57bac79 ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts") Reported-by: Raghavendra Rao Ananta <rananta(a)google.com> Tested-by: Raghavendra Rao Ananta <rananta(a)google.com> Reviewed-by: Oliver Upton <oupton(a)google.com> Signed-off-by: Marc Zyngier <maz(a)kernel.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c index 2c580204f1dc..95a18cec14a3 100644 --- a/arch/arm64/kvm/vgic/vgic-v2.c +++ b/arch/arm64/kvm/vgic/vgic-v2.c @@ -60,6 +60,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) u32 val = cpuif->vgic_lr[lr]; u32 cpuid, intid = val & GICH_LR_VIRTUALID; struct vgic_irq *irq; + bool deactivated; /* Extract the source vCPU id from the LR */ cpuid = val & GICH_LR_PHYSID_CPUID; @@ -75,7 +76,8 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) raw_spin_lock(&irq->irq_lock); - /* Always preserve the active bit */ + /* Always preserve the active bit, note deactivation */ + deactivated = irq->active && !(val & GICH_LR_ACTIVE_BIT); irq->active = !!(val & GICH_LR_ACTIVE_BIT); if (irq->active && vgic_irq_is_sgi(intid)) @@ -96,36 +98,8 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) if (irq->config == VGIC_CONFIG_LEVEL && !(val & GICH_LR_STATE)) irq->pending_latch = false; - /* - * Level-triggered mapped IRQs are special because we only - * observe rising edges as input to the VGIC. - * - * If the guest never acked the interrupt we have to sample - * the physical line and set the line level, because the - * device state could have changed or we simply need to - * process the still pending interrupt later. - * - * If this causes us to lower the level, we have to also clear - * the physical active state, since we will otherwise never be - * told when the interrupt becomes asserted again. - * - * Another case is when the interrupt requires a helping hand - * on deactivation (no HW deactivation, for example). - */ - if (vgic_irq_is_mapped_level(irq)) { - bool resample = false; - - if (val & GICH_LR_PENDING_BIT) { - irq->line_level = vgic_get_phys_line_level(irq); - resample = !irq->line_level; - } else if (vgic_irq_needs_resampling(irq) && - !(irq->active || irq->pending_latch)) { - resample = true; - } - - if (resample) - vgic_irq_set_phys_active(irq, false); - } + /* Handle resampling for mapped interrupts if required */ + vgic_irq_handle_resampling(irq, deactivated, val & GICH_LR_PENDING_BIT); raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c index 66004f61cd83..21a6207fb2ee 100644 --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -46,6 +46,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) u32 intid, cpuid; struct vgic_irq *irq; bool is_v2_sgi = false; + bool deactivated; cpuid = val & GICH_LR_PHYSID_CPUID; cpuid >>= GICH_LR_PHYSID_CPUID_SHIFT; @@ -68,7 +69,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) raw_spin_lock(&irq->irq_lock); - /* Always preserve the active bit */ + /* Always preserve the active bit, note deactivation */ + deactivated = irq->active && !(val & ICH_LR_ACTIVE_BIT); irq->active = !!(val & ICH_LR_ACTIVE_BIT); if (irq->active && is_v2_sgi) @@ -89,36 +91,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) if (irq->config == VGIC_CONFIG_LEVEL && !(val & ICH_LR_STATE)) irq->pending_latch = false; - /* - * Level-triggered mapped IRQs are special because we only - * observe rising edges as input to the VGIC. - * - * If the guest never acked the interrupt we have to sample - * the physical line and set the line level, because the - * device state could have changed or we simply need to - * process the still pending interrupt later. - * - * If this causes us to lower the level, we have to also clear - * the physical active state, since we will otherwise never be - * told when the interrupt becomes asserted again. - * - * Another case is when the interrupt requires a helping hand - * on deactivation (no HW deactivation, for example). - */ - if (vgic_irq_is_mapped_level(irq)) { - bool resample = false; - - if (val & ICH_LR_PENDING_BIT) { - irq->line_level = vgic_get_phys_line_level(irq); - resample = !irq->line_level; - } else if (vgic_irq_needs_resampling(irq) && - !(irq->active || irq->pending_latch)) { - resample = true; - } - - if (resample) - vgic_irq_set_phys_active(irq, false); - } + /* Handle resampling for mapped interrupts if required */ + vgic_irq_handle_resampling(irq, deactivated, val & ICH_LR_PENDING_BIT); raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c index 81cec508d413..5dad4996cfb2 100644 --- a/arch/arm64/kvm/vgic/vgic.c +++ b/arch/arm64/kvm/vgic/vgic.c @@ -1021,3 +1021,41 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid) return map_is_active; } + +/* + * Level-triggered mapped IRQs are special because we only observe rising + * edges as input to the VGIC. + * + * If the guest never acked the interrupt we have to sample the physical + * line and set the line level, because the device state could have changed + * or we simply need to process the still pending interrupt later. + * + * We could also have entered the guest with the interrupt active+pending. + * On the next exit, we need to re-evaluate the pending state, as it could + * otherwise result in a spurious interrupt by injecting a now potentially + * stale pending state. + * + * If this causes us to lower the level, we have to also clear the physical + * active state, since we will otherwise never be told when the interrupt + * becomes asserted again. + * + * Another case is when the interrupt requires a helping hand on + * deactivation (no HW deactivation, for example). + */ +void vgic_irq_handle_resampling(struct vgic_irq *irq, + bool lr_deactivated, bool lr_pending) +{ + if (vgic_irq_is_mapped_level(irq)) { + bool resample = false; + + if (unlikely(vgic_irq_needs_resampling(irq))) { + resample = !(irq->active || irq->pending_latch); + } else if (lr_pending || (lr_deactivated && irq->line_level)) { + irq->line_level = vgic_get_phys_line_level(irq); + resample = !irq->line_level; + } + + if (resample) + vgic_irq_set_phys_active(irq, false); + } +} diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h index dc1f3d1657ee..14a9218641f5 100644 --- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -169,6 +169,8 @@ void vgic_irq_set_phys_active(struct vgic_irq *irq, bool active); bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, unsigned long flags); void vgic_kick_vcpus(struct kvm *kvm); +void vgic_irq_handle_resampling(struct vgic_irq *irq, + bool lr_deactivated, bool lr_pending); int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr, phys_addr_t addr, phys_addr_t alignment);

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: arm64: vgic: Resample HW pending state on deactivation" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 3134cc8beb69d0db9de651081707c4651c011621 Mon Sep 17 00:00:00 2001 From: Marc Zyngier <maz(a)kernel.org> Date: Thu, 19 Aug 2021 19:03:05 +0100 Subject: [PATCH] KVM: arm64: vgic: Resample HW pending state on deactivation When a mapped level interrupt (a timer, for example) is deactivated by the guest, the corresponding host interrupt is equally deactivated. However, the fate of the pending state still needs to be dealt with in SW. This is specially true when the interrupt was in the active+pending state in the virtual distributor at the point where the guest was entered. On exit, the pending state is potentially stale (the guest may have put the interrupt in a non-pending state). If we don't do anything, the interrupt will be spuriously injected in the guest. Although this shouldn't have any ill effect (spurious interrupts are always possible), we can improve the emulation by detecting the deactivation-while-pending case and resample the interrupt. While we're at it, move the logic into a common helper that can be shared between the two GIC implementations. Fixes: e40cc57bac79 ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts") Reported-by: Raghavendra Rao Ananta <rananta(a)google.com> Tested-by: Raghavendra Rao Ananta <rananta(a)google.com> Reviewed-by: Oliver Upton <oupton(a)google.com> Signed-off-by: Marc Zyngier <maz(a)kernel.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c index 2c580204f1dc..95a18cec14a3 100644 --- a/arch/arm64/kvm/vgic/vgic-v2.c +++ b/arch/arm64/kvm/vgic/vgic-v2.c @@ -60,6 +60,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) u32 val = cpuif->vgic_lr[lr]; u32 cpuid, intid = val & GICH_LR_VIRTUALID; struct vgic_irq *irq; + bool deactivated; /* Extract the source vCPU id from the LR */ cpuid = val & GICH_LR_PHYSID_CPUID; @@ -75,7 +76,8 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) raw_spin_lock(&irq->irq_lock); - /* Always preserve the active bit */ + /* Always preserve the active bit, note deactivation */ + deactivated = irq->active && !(val & GICH_LR_ACTIVE_BIT); irq->active = !!(val & GICH_LR_ACTIVE_BIT); if (irq->active && vgic_irq_is_sgi(intid)) @@ -96,36 +98,8 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) if (irq->config == VGIC_CONFIG_LEVEL && !(val & GICH_LR_STATE)) irq->pending_latch = false; - /* - * Level-triggered mapped IRQs are special because we only - * observe rising edges as input to the VGIC. - * - * If the guest never acked the interrupt we have to sample - * the physical line and set the line level, because the - * device state could have changed or we simply need to - * process the still pending interrupt later. - * - * If this causes us to lower the level, we have to also clear - * the physical active state, since we will otherwise never be - * told when the interrupt becomes asserted again. - * - * Another case is when the interrupt requires a helping hand - * on deactivation (no HW deactivation, for example). - */ - if (vgic_irq_is_mapped_level(irq)) { - bool resample = false; - - if (val & GICH_LR_PENDING_BIT) { - irq->line_level = vgic_get_phys_line_level(irq); - resample = !irq->line_level; - } else if (vgic_irq_needs_resampling(irq) && - !(irq->active || irq->pending_latch)) { - resample = true; - } - - if (resample) - vgic_irq_set_phys_active(irq, false); - } + /* Handle resampling for mapped interrupts if required */ + vgic_irq_handle_resampling(irq, deactivated, val & GICH_LR_PENDING_BIT); raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c index 66004f61cd83..21a6207fb2ee 100644 --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -46,6 +46,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) u32 intid, cpuid; struct vgic_irq *irq; bool is_v2_sgi = false; + bool deactivated; cpuid = val & GICH_LR_PHYSID_CPUID; cpuid >>= GICH_LR_PHYSID_CPUID_SHIFT; @@ -68,7 +69,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) raw_spin_lock(&irq->irq_lock); - /* Always preserve the active bit */ + /* Always preserve the active bit, note deactivation */ + deactivated = irq->active && !(val & ICH_LR_ACTIVE_BIT); irq->active = !!(val & ICH_LR_ACTIVE_BIT); if (irq->active && is_v2_sgi) @@ -89,36 +91,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) if (irq->config == VGIC_CONFIG_LEVEL && !(val & ICH_LR_STATE)) irq->pending_latch = false; - /* - * Level-triggered mapped IRQs are special because we only - * observe rising edges as input to the VGIC. - * - * If the guest never acked the interrupt we have to sample - * the physical line and set the line level, because the - * device state could have changed or we simply need to - * process the still pending interrupt later. - * - * If this causes us to lower the level, we have to also clear - * the physical active state, since we will otherwise never be - * told when the interrupt becomes asserted again. - * - * Another case is when the interrupt requires a helping hand - * on deactivation (no HW deactivation, for example). - */ - if (vgic_irq_is_mapped_level(irq)) { - bool resample = false; - - if (val & ICH_LR_PENDING_BIT) { - irq->line_level = vgic_get_phys_line_level(irq); - resample = !irq->line_level; - } else if (vgic_irq_needs_resampling(irq) && - !(irq->active || irq->pending_latch)) { - resample = true; - } - - if (resample) - vgic_irq_set_phys_active(irq, false); - } + /* Handle resampling for mapped interrupts if required */ + vgic_irq_handle_resampling(irq, deactivated, val & ICH_LR_PENDING_BIT); raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c index 81cec508d413..5dad4996cfb2 100644 --- a/arch/arm64/kvm/vgic/vgic.c +++ b/arch/arm64/kvm/vgic/vgic.c @@ -1021,3 +1021,41 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid) return map_is_active; } + +/* + * Level-triggered mapped IRQs are special because we only observe rising + * edges as input to the VGIC. + * + * If the guest never acked the interrupt we have to sample the physical + * line and set the line level, because the device state could have changed + * or we simply need to process the still pending interrupt later. + * + * We could also have entered the guest with the interrupt active+pending. + * On the next exit, we need to re-evaluate the pending state, as it could + * otherwise result in a spurious interrupt by injecting a now potentially + * stale pending state. + * + * If this causes us to lower the level, we have to also clear the physical + * active state, since we will otherwise never be told when the interrupt + * becomes asserted again. + * + * Another case is when the interrupt requires a helping hand on + * deactivation (no HW deactivation, for example). + */ +void vgic_irq_handle_resampling(struct vgic_irq *irq, + bool lr_deactivated, bool lr_pending) +{ + if (vgic_irq_is_mapped_level(irq)) { + bool resample = false; + + if (unlikely(vgic_irq_needs_resampling(irq))) { + resample = !(irq->active || irq->pending_latch); + } else if (lr_pending || (lr_deactivated && irq->line_level)) { + irq->line_level = vgic_get_phys_line_level(irq); + resample = !irq->line_level; + } + + if (resample) + vgic_irq_set_phys_active(irq, false); + } +} diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h index dc1f3d1657ee..14a9218641f5 100644 --- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -169,6 +169,8 @@ void vgic_irq_set_phys_active(struct vgic_irq *irq, bool active); bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, unsigned long flags); void vgic_kick_vcpus(struct kvm *kvm); +void vgic_irq_handle_resampling(struct vgic_irq *irq, + bool lr_deactivated, bool lr_pending); int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr, phys_addr_t addr, phys_addr_t alignment);

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: arm64: vgic: Resample HW pending state on deactivation" failed to apply to 5.13-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.13-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 3134cc8beb69d0db9de651081707c4651c011621 Mon Sep 17 00:00:00 2001 From: Marc Zyngier <maz(a)kernel.org> Date: Thu, 19 Aug 2021 19:03:05 +0100 Subject: [PATCH] KVM: arm64: vgic: Resample HW pending state on deactivation When a mapped level interrupt (a timer, for example) is deactivated by the guest, the corresponding host interrupt is equally deactivated. However, the fate of the pending state still needs to be dealt with in SW. This is specially true when the interrupt was in the active+pending state in the virtual distributor at the point where the guest was entered. On exit, the pending state is potentially stale (the guest may have put the interrupt in a non-pending state). If we don't do anything, the interrupt will be spuriously injected in the guest. Although this shouldn't have any ill effect (spurious interrupts are always possible), we can improve the emulation by detecting the deactivation-while-pending case and resample the interrupt. While we're at it, move the logic into a common helper that can be shared between the two GIC implementations. Fixes: e40cc57bac79 ("KVM: arm/arm64: vgic: Support level-triggered mapped interrupts") Reported-by: Raghavendra Rao Ananta <rananta(a)google.com> Tested-by: Raghavendra Rao Ananta <rananta(a)google.com> Reviewed-by: Oliver Upton <oupton(a)google.com> Signed-off-by: Marc Zyngier <maz(a)kernel.org> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20210819180305.1670525-1-maz@kernel.org diff --git a/arch/arm64/kvm/vgic/vgic-v2.c b/arch/arm64/kvm/vgic/vgic-v2.c index 2c580204f1dc..95a18cec14a3 100644 --- a/arch/arm64/kvm/vgic/vgic-v2.c +++ b/arch/arm64/kvm/vgic/vgic-v2.c @@ -60,6 +60,7 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) u32 val = cpuif->vgic_lr[lr]; u32 cpuid, intid = val & GICH_LR_VIRTUALID; struct vgic_irq *irq; + bool deactivated; /* Extract the source vCPU id from the LR */ cpuid = val & GICH_LR_PHYSID_CPUID; @@ -75,7 +76,8 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) raw_spin_lock(&irq->irq_lock); - /* Always preserve the active bit */ + /* Always preserve the active bit, note deactivation */ + deactivated = irq->active && !(val & GICH_LR_ACTIVE_BIT); irq->active = !!(val & GICH_LR_ACTIVE_BIT); if (irq->active && vgic_irq_is_sgi(intid)) @@ -96,36 +98,8 @@ void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu) if (irq->config == VGIC_CONFIG_LEVEL && !(val & GICH_LR_STATE)) irq->pending_latch = false; - /* - * Level-triggered mapped IRQs are special because we only - * observe rising edges as input to the VGIC. - * - * If the guest never acked the interrupt we have to sample - * the physical line and set the line level, because the - * device state could have changed or we simply need to - * process the still pending interrupt later. - * - * If this causes us to lower the level, we have to also clear - * the physical active state, since we will otherwise never be - * told when the interrupt becomes asserted again. - * - * Another case is when the interrupt requires a helping hand - * on deactivation (no HW deactivation, for example). - */ - if (vgic_irq_is_mapped_level(irq)) { - bool resample = false; - - if (val & GICH_LR_PENDING_BIT) { - irq->line_level = vgic_get_phys_line_level(irq); - resample = !irq->line_level; - } else if (vgic_irq_needs_resampling(irq) && - !(irq->active || irq->pending_latch)) { - resample = true; - } - - if (resample) - vgic_irq_set_phys_active(irq, false); - } + /* Handle resampling for mapped interrupts if required */ + vgic_irq_handle_resampling(irq, deactivated, val & GICH_LR_PENDING_BIT); raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c index 66004f61cd83..21a6207fb2ee 100644 --- a/arch/arm64/kvm/vgic/vgic-v3.c +++ b/arch/arm64/kvm/vgic/vgic-v3.c @@ -46,6 +46,7 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) u32 intid, cpuid; struct vgic_irq *irq; bool is_v2_sgi = false; + bool deactivated; cpuid = val & GICH_LR_PHYSID_CPUID; cpuid >>= GICH_LR_PHYSID_CPUID_SHIFT; @@ -68,7 +69,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) raw_spin_lock(&irq->irq_lock); - /* Always preserve the active bit */ + /* Always preserve the active bit, note deactivation */ + deactivated = irq->active && !(val & ICH_LR_ACTIVE_BIT); irq->active = !!(val & ICH_LR_ACTIVE_BIT); if (irq->active && is_v2_sgi) @@ -89,36 +91,8 @@ void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu) if (irq->config == VGIC_CONFIG_LEVEL && !(val & ICH_LR_STATE)) irq->pending_latch = false; - /* - * Level-triggered mapped IRQs are special because we only - * observe rising edges as input to the VGIC. - * - * If the guest never acked the interrupt we have to sample - * the physical line and set the line level, because the - * device state could have changed or we simply need to - * process the still pending interrupt later. - * - * If this causes us to lower the level, we have to also clear - * the physical active state, since we will otherwise never be - * told when the interrupt becomes asserted again. - * - * Another case is when the interrupt requires a helping hand - * on deactivation (no HW deactivation, for example). - */ - if (vgic_irq_is_mapped_level(irq)) { - bool resample = false; - - if (val & ICH_LR_PENDING_BIT) { - irq->line_level = vgic_get_phys_line_level(irq); - resample = !irq->line_level; - } else if (vgic_irq_needs_resampling(irq) && - !(irq->active || irq->pending_latch)) { - resample = true; - } - - if (resample) - vgic_irq_set_phys_active(irq, false); - } + /* Handle resampling for mapped interrupts if required */ + vgic_irq_handle_resampling(irq, deactivated, val & ICH_LR_PENDING_BIT); raw_spin_unlock(&irq->irq_lock); vgic_put_irq(vcpu->kvm, irq); diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c index 81cec508d413..5dad4996cfb2 100644 --- a/arch/arm64/kvm/vgic/vgic.c +++ b/arch/arm64/kvm/vgic/vgic.c @@ -1021,3 +1021,41 @@ bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid) return map_is_active; } + +/* + * Level-triggered mapped IRQs are special because we only observe rising + * edges as input to the VGIC. + * + * If the guest never acked the interrupt we have to sample the physical + * line and set the line level, because the device state could have changed + * or we simply need to process the still pending interrupt later. + * + * We could also have entered the guest with the interrupt active+pending. + * On the next exit, we need to re-evaluate the pending state, as it could + * otherwise result in a spurious interrupt by injecting a now potentially + * stale pending state. + * + * If this causes us to lower the level, we have to also clear the physical + * active state, since we will otherwise never be told when the interrupt + * becomes asserted again. + * + * Another case is when the interrupt requires a helping hand on + * deactivation (no HW deactivation, for example). + */ +void vgic_irq_handle_resampling(struct vgic_irq *irq, + bool lr_deactivated, bool lr_pending) +{ + if (vgic_irq_is_mapped_level(irq)) { + bool resample = false; + + if (unlikely(vgic_irq_needs_resampling(irq))) { + resample = !(irq->active || irq->pending_latch); + } else if (lr_pending || (lr_deactivated && irq->line_level)) { + irq->line_level = vgic_get_phys_line_level(irq); + resample = !irq->line_level; + } + + if (resample) + vgic_irq_set_phys_active(irq, false); + } +} diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h index dc1f3d1657ee..14a9218641f5 100644 --- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -169,6 +169,8 @@ void vgic_irq_set_phys_active(struct vgic_irq *irq, bool active); bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq, unsigned long flags); void vgic_kick_vcpus(struct kvm *kvm); +void vgic_irq_handle_resampling(struct vgic_irq *irq, + bool lr_deactivated, bool lr_pending); int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr, phys_addr_t addr, phys_addr_t alignment);

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: nVMX: Unconditionally clear nested.pi_pending on nested" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From f7782bb8d818d8f47c26b22079db10599922787a Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc(a)google.com> Date: Tue, 10 Aug 2021 07:45:26 -0700 Subject: [PATCH] KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter Clear nested.pi_pending on nested VM-Enter even if L2 will run without posted interrupts enabled. If nested.pi_pending is left set from a previous L2, vmx_complete_nested_posted_interrupt() will pick up the stale flag and exit to userspace with an "internal emulation error" due the new L2 not having a valid nested.pi_desc. Arguably, vmx_complete_nested_posted_interrupt() should first check for posted interrupts being enabled, but it's also completely reasonable that KVM wouldn't screw up a fundamental flag. Not to mention that the mere existence of nested.pi_pending is a long-standing bug as KVM shouldn't move the posted interrupt out of the IRR until it's actually processed, e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit with a "pending" posted interrupt. Fixing the mess is a future problem. Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI descriptor as an error, this was a benign bug as the null PI descriptor effectively served as a check on PI not being enabled. Even then, the new flow did not become problematic until KVM started checking the result of kvm_check_nested_events(). Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Fixes: 966eefb89657 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable") Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails") Cc: stable(a)vger.kernel.org Cc: Jim Mattson <jmattson(a)google.com> Signed-off-by: Sean Christopherson <seanjc(a)google.com> Message-Id: <20210810144526.2662272-1-seanjc(a)google.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 264a9f4c9179..bc6327950657 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2187,12 +2187,11 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0 ~PIN_BASED_VMX_PREEMPTION_TIMER); /* Posted interrupts setting is only taken from vmcs12. */ - if (nested_cpu_has_posted_intr(vmcs12)) { + vmx->nested.pi_pending = false; + if (nested_cpu_has_posted_intr(vmcs12)) vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv; - vmx->nested.pi_pending = false; - } else { + else exec_control &= ~PIN_BASED_POSTED_INTR; - } pin_controls_set(vmx, exec_control); /*

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: nVMX: Unconditionally clear nested.pi_pending on nested" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From f7782bb8d818d8f47c26b22079db10599922787a Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc(a)google.com> Date: Tue, 10 Aug 2021 07:45:26 -0700 Subject: [PATCH] KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter Clear nested.pi_pending on nested VM-Enter even if L2 will run without posted interrupts enabled. If nested.pi_pending is left set from a previous L2, vmx_complete_nested_posted_interrupt() will pick up the stale flag and exit to userspace with an "internal emulation error" due the new L2 not having a valid nested.pi_desc. Arguably, vmx_complete_nested_posted_interrupt() should first check for posted interrupts being enabled, but it's also completely reasonable that KVM wouldn't screw up a fundamental flag. Not to mention that the mere existence of nested.pi_pending is a long-standing bug as KVM shouldn't move the posted interrupt out of the IRR until it's actually processed, e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit with a "pending" posted interrupt. Fixing the mess is a future problem. Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI descriptor as an error, this was a benign bug as the null PI descriptor effectively served as a check on PI not being enabled. Even then, the new flow did not become problematic until KVM started checking the result of kvm_check_nested_events(). Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Fixes: 966eefb89657 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable") Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails") Cc: stable(a)vger.kernel.org Cc: Jim Mattson <jmattson(a)google.com> Signed-off-by: Sean Christopherson <seanjc(a)google.com> Message-Id: <20210810144526.2662272-1-seanjc(a)google.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 264a9f4c9179..bc6327950657 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2187,12 +2187,11 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0 ~PIN_BASED_VMX_PREEMPTION_TIMER); /* Posted interrupts setting is only taken from vmcs12. */ - if (nested_cpu_has_posted_intr(vmcs12)) { + vmx->nested.pi_pending = false; + if (nested_cpu_has_posted_intr(vmcs12)) vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv; - vmx->nested.pi_pending = false; - } else { + else exec_control &= ~PIN_BASED_POSTED_INTR; - } pin_controls_set(vmx, exec_control); /*

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: nVMX: Unconditionally clear nested.pi_pending on nested" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From f7782bb8d818d8f47c26b22079db10599922787a Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc(a)google.com> Date: Tue, 10 Aug 2021 07:45:26 -0700 Subject: [PATCH] KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter Clear nested.pi_pending on nested VM-Enter even if L2 will run without posted interrupts enabled. If nested.pi_pending is left set from a previous L2, vmx_complete_nested_posted_interrupt() will pick up the stale flag and exit to userspace with an "internal emulation error" due the new L2 not having a valid nested.pi_desc. Arguably, vmx_complete_nested_posted_interrupt() should first check for posted interrupts being enabled, but it's also completely reasonable that KVM wouldn't screw up a fundamental flag. Not to mention that the mere existence of nested.pi_pending is a long-standing bug as KVM shouldn't move the posted interrupt out of the IRR until it's actually processed, e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit with a "pending" posted interrupt. Fixing the mess is a future problem. Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI descriptor as an error, this was a benign bug as the null PI descriptor effectively served as a check on PI not being enabled. Even then, the new flow did not become problematic until KVM started checking the result of kvm_check_nested_events(). Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Fixes: 966eefb89657 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable") Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails") Cc: stable(a)vger.kernel.org Cc: Jim Mattson <jmattson(a)google.com> Signed-off-by: Sean Christopherson <seanjc(a)google.com> Message-Id: <20210810144526.2662272-1-seanjc(a)google.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 264a9f4c9179..bc6327950657 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2187,12 +2187,11 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0 ~PIN_BASED_VMX_PREEMPTION_TIMER); /* Posted interrupts setting is only taken from vmcs12. */ - if (nested_cpu_has_posted_intr(vmcs12)) { + vmx->nested.pi_pending = false; + if (nested_cpu_has_posted_intr(vmcs12)) vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv; - vmx->nested.pi_pending = false; - } else { + else exec_control &= ~PIN_BASED_POSTED_INTR; - } pin_controls_set(vmx, exec_control); /*

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: nVMX: Unconditionally clear nested.pi_pending on nested" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From f7782bb8d818d8f47c26b22079db10599922787a Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc(a)google.com> Date: Tue, 10 Aug 2021 07:45:26 -0700 Subject: [PATCH] KVM: nVMX: Unconditionally clear nested.pi_pending on nested VM-Enter Clear nested.pi_pending on nested VM-Enter even if L2 will run without posted interrupts enabled. If nested.pi_pending is left set from a previous L2, vmx_complete_nested_posted_interrupt() will pick up the stale flag and exit to userspace with an "internal emulation error" due the new L2 not having a valid nested.pi_desc. Arguably, vmx_complete_nested_posted_interrupt() should first check for posted interrupts being enabled, but it's also completely reasonable that KVM wouldn't screw up a fundamental flag. Not to mention that the mere existence of nested.pi_pending is a long-standing bug as KVM shouldn't move the posted interrupt out of the IRR until it's actually processed, e.g. KVM effectively drops an interrupt when it performs a nested VM-Exit with a "pending" posted interrupt. Fixing the mess is a future problem. Prior to vmx_complete_nested_posted_interrupt() interpreting a null PI descriptor as an error, this was a benign bug as the null PI descriptor effectively served as a check on PI not being enabled. Even then, the new flow did not become problematic until KVM started checking the result of kvm_check_nested_events(). Fixes: 705699a13994 ("KVM: nVMX: Enable nested posted interrupt processing") Fixes: 966eefb89657 ("KVM: nVMX: Disable vmcs02 posted interrupts if vmcs12 PID isn't mappable") Fixes: 47d3530f86c0 ("KVM: x86: Exit to userspace when kvm_check_nested_events fails") Cc: stable(a)vger.kernel.org Cc: Jim Mattson <jmattson(a)google.com> Signed-off-by: Sean Christopherson <seanjc(a)google.com> Message-Id: <20210810144526.2662272-1-seanjc(a)google.com> Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 264a9f4c9179..bc6327950657 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2187,12 +2187,11 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0 ~PIN_BASED_VMX_PREEMPTION_TIMER); /* Posted interrupts setting is only taken from vmcs12. */ - if (nested_cpu_has_posted_intr(vmcs12)) { + vmx->nested.pi_pending = false; + if (nested_cpu_has_posted_intr(vmcs12)) vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv; - vmx->nested.pi_pending = false; - } else { + else exec_control &= ~PIN_BASED_POSTED_INTR; - } pin_controls_set(vmx, exec_control); /*

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 81b4b56d4f8130bbb99cf4e2b48082e5b4cfccb9 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Thu, 26 Aug 2021 12:57:49 +0300 Subject: [PATCH] KVM: VMX: avoid running vmx_handle_exit_irqoff in case of emulation If we are emulating an invalid guest state, we don't have a correct exit reason, and thus we shouldn't do anything in this function. Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Message-Id: <20210826095750.1650467-2-mlevitsk(a)redhat.com> Cc: stable(a)vger.kernel.org Fixes: 95b5a48c4f2b ("KVM: VMX: Handle NMIs, #MCs and async #PFs in common irqs-disabled fn", 2019-06-18) Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index fada1055f325..0c2c0d5ae873 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6382,6 +6382,9 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); + if (vmx->emulation_required) + return; + if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT) handle_external_interrupt_irqoff(vcpu); else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: x86: clamp host mapping level to max_level in" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From ec607a564f70519b340f7eb4cfc0f4a6b55285ac Mon Sep 17 00:00:00 2001 From: Paolo Bonzini <pbonzini(a)redhat.com> Date: Fri, 6 Aug 2021 07:05:58 -0400 Subject: [PATCH] KVM: x86: clamp host mapping level to max_level in kvm_mmu_max_mapping_level This change started as a way to make kvm_mmu_hugepage_adjust a bit simpler, but it does fix two bugs as well. One bug is in zapping collapsible PTEs. If a large page size is disallowed but not all of them, kvm_mmu_max_mapping_level will return the host mapping level and the small PTEs will be zapped up to that level. However, if e.g. 1GB are prohibited, we can still zap 4KB mapping and preserve the 2MB ones. This can happen for example when NX huge pages are in use. The second would happen when userspace backs guest memory with a 1gb hugepage but only assign a subset of the page to the guest. 1gb pages would be disallowed by the memslot, but not 2mb. kvm_mmu_max_mapping_level() would fall through to the host_pfn_mapping_level() logic, see the 1gb hugepage, and map the whole thing into the guest. Fixes: 2f57b7051fe8 ("KVM: x86/mmu: Persist gfn_lpage_is_disallowed() to max_level") Cc: stable(a)vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 54cb15e4b550..bfd2705a7291 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -2910,6 +2910,7 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, int max_level) { struct kvm_lpage_info *linfo; + int host_level; max_level = min(max_level, max_huge_page_level); for ( ; max_level > PG_LEVEL_4K; max_level--) { @@ -2921,7 +2922,8 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm, if (max_level == PG_LEVEL_4K) return PG_LEVEL_4K; - return host_pfn_mapping_level(kvm, gfn, pfn, slot); + host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot); + return min(host_level, max_level); } int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn, @@ -2945,17 +2947,12 @@ int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn, if (!slot) return PG_LEVEL_4K; - level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, gfn, pfn, max_level); - if (level == PG_LEVEL_4K) - return level; - - *req_level = level = min(level, max_level); - /* * Enforce the iTLB multihit workaround after capturing the requested * level, which will be used to do precise, accurate accounting. */ - if (huge_page_disallowed) + *req_level = level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, gfn, pfn, max_level); + if (level == PG_LEVEL_4K || huge_page_disallowed) return PG_LEVEL_4K; /*

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: s390: index kvm->arch.idle_mask by vcpu_idx" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a3e03bc1368c1bc16e19b001fc96dc7430573cc8 Mon Sep 17 00:00:00 2001 From: Halil Pasic <pasic(a)linux.ibm.com> Date: Fri, 27 Aug 2021 14:54:29 +0200 Subject: [PATCH] KVM: s390: index kvm->arch.idle_mask by vcpu_idx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While in practice vcpu->vcpu_idx == vcpu->vcp_id is often true, it may not always be, and we must not rely on this. Reason is that KVM decides the vcpu_idx, userspace decides the vcpu_id, thus the two might not match. Currently kvm->arch.idle_mask is indexed by vcpu_id, which implies that code like for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) { vcpu = kvm_get_vcpu(kvm, vcpu_id); do_stuff(vcpu); } is not legit. Reason is that kvm_get_vcpu expects an vcpu_idx, not an vcpu_id. The trouble is, we do actually use kvm->arch.idle_mask like this. To fix this problem we have two options. Either use kvm_get_vcpu_by_id(vcpu_id), which would loop to find the right vcpu_id, or switch to indexing via vcpu_idx. The latter is preferable for obvious reasons. Let us make switch from indexing kvm->arch.idle_mask by vcpu_id to indexing it by vcpu_idx. To keep gisa_int.kicked_mask indexed by the same index as idle_mask lets make the same change for it as well. Fixes: 1ee0bc559dc3 ("KVM: s390: get rid of local_int array") Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com> Reviewed-by: Christian Bornträger <borntraeger(a)de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com> Cc: <stable(a)vger.kernel.org> # 3.15+ Link: https://lore.kernel.org/r/20210827125429.1912577-1-pasic@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index 118d5450c523..611f18ecde91 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -963,6 +963,7 @@ struct kvm_arch{ atomic64_t cmma_dirty_pages; /* subset of available cpu features enabled by user space */ DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS); + /* indexed by vcpu_idx */ DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS); struct kvm_s390_gisa_interrupt gisa_int; struct kvm_s390_pv pv; diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index d548d60caed2..16256e17a544 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -419,13 +419,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu) static void __set_cpu_idle(struct kvm_vcpu *vcpu) { kvm_s390_set_cpuflags(vcpu, CPUSTAT_WAIT); - set_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + set_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static void __unset_cpu_idle(struct kvm_vcpu *vcpu) { kvm_s390_clear_cpuflags(vcpu, CPUSTAT_WAIT); - clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static void __reset_intercept_indicators(struct kvm_vcpu *vcpu) @@ -3050,18 +3050,18 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len) static void __airqs_kick_single_vcpu(struct kvm *kvm, u8 deliverable_mask) { - int vcpu_id, online_vcpus = atomic_read(&kvm->online_vcpus); + int vcpu_idx, online_vcpus = atomic_read(&kvm->online_vcpus); struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int; struct kvm_vcpu *vcpu; - for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) { - vcpu = kvm_get_vcpu(kvm, vcpu_id); + for_each_set_bit(vcpu_idx, kvm->arch.idle_mask, online_vcpus) { + vcpu = kvm_get_vcpu(kvm, vcpu_idx); if (psw_ioint_disabled(vcpu)) continue; deliverable_mask &= (u8)(vcpu->arch.sie_block->gcr[6] >> 24); if (deliverable_mask) { /* lately kicked but not yet running */ - if (test_and_set_bit(vcpu_id, gi->kicked_mask)) + if (test_and_set_bit(vcpu_idx, gi->kicked_mask)) return; kvm_s390_vcpu_wakeup(vcpu); return; diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 5b45c83ced21..e144c8046ceb 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4026,7 +4026,7 @@ static int vcpu_pre_run(struct kvm_vcpu *vcpu) kvm_s390_patch_guest_per_regs(vcpu); } - clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.gisa_int.kicked_mask); + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.gisa_int.kicked_mask); vcpu->arch.sie_block->icptcode = 0; cpuflags = atomic_read(&vcpu->arch.sie_block->cpuflags); diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 9fad25109b0d..ecd741ee3276 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -79,7 +79,7 @@ static inline int is_vcpu_stopped(struct kvm_vcpu *vcpu) static inline int is_vcpu_idle(struct kvm_vcpu *vcpu) { - return test_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + return test_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static inline int kvm_is_ucontrol(struct kvm *kvm)

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: s390: index kvm->arch.idle_mask by vcpu_idx" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a3e03bc1368c1bc16e19b001fc96dc7430573cc8 Mon Sep 17 00:00:00 2001 From: Halil Pasic <pasic(a)linux.ibm.com> Date: Fri, 27 Aug 2021 14:54:29 +0200 Subject: [PATCH] KVM: s390: index kvm->arch.idle_mask by vcpu_idx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While in practice vcpu->vcpu_idx == vcpu->vcp_id is often true, it may not always be, and we must not rely on this. Reason is that KVM decides the vcpu_idx, userspace decides the vcpu_id, thus the two might not match. Currently kvm->arch.idle_mask is indexed by vcpu_id, which implies that code like for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) { vcpu = kvm_get_vcpu(kvm, vcpu_id); do_stuff(vcpu); } is not legit. Reason is that kvm_get_vcpu expects an vcpu_idx, not an vcpu_id. The trouble is, we do actually use kvm->arch.idle_mask like this. To fix this problem we have two options. Either use kvm_get_vcpu_by_id(vcpu_id), which would loop to find the right vcpu_id, or switch to indexing via vcpu_idx. The latter is preferable for obvious reasons. Let us make switch from indexing kvm->arch.idle_mask by vcpu_id to indexing it by vcpu_idx. To keep gisa_int.kicked_mask indexed by the same index as idle_mask lets make the same change for it as well. Fixes: 1ee0bc559dc3 ("KVM: s390: get rid of local_int array") Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com> Reviewed-by: Christian Bornträger <borntraeger(a)de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com> Cc: <stable(a)vger.kernel.org> # 3.15+ Link: https://lore.kernel.org/r/20210827125429.1912577-1-pasic@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index 118d5450c523..611f18ecde91 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -963,6 +963,7 @@ struct kvm_arch{ atomic64_t cmma_dirty_pages; /* subset of available cpu features enabled by user space */ DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS); + /* indexed by vcpu_idx */ DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS); struct kvm_s390_gisa_interrupt gisa_int; struct kvm_s390_pv pv; diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index d548d60caed2..16256e17a544 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -419,13 +419,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu) static void __set_cpu_idle(struct kvm_vcpu *vcpu) { kvm_s390_set_cpuflags(vcpu, CPUSTAT_WAIT); - set_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + set_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static void __unset_cpu_idle(struct kvm_vcpu *vcpu) { kvm_s390_clear_cpuflags(vcpu, CPUSTAT_WAIT); - clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static void __reset_intercept_indicators(struct kvm_vcpu *vcpu) @@ -3050,18 +3050,18 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len) static void __airqs_kick_single_vcpu(struct kvm *kvm, u8 deliverable_mask) { - int vcpu_id, online_vcpus = atomic_read(&kvm->online_vcpus); + int vcpu_idx, online_vcpus = atomic_read(&kvm->online_vcpus); struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int; struct kvm_vcpu *vcpu; - for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) { - vcpu = kvm_get_vcpu(kvm, vcpu_id); + for_each_set_bit(vcpu_idx, kvm->arch.idle_mask, online_vcpus) { + vcpu = kvm_get_vcpu(kvm, vcpu_idx); if (psw_ioint_disabled(vcpu)) continue; deliverable_mask &= (u8)(vcpu->arch.sie_block->gcr[6] >> 24); if (deliverable_mask) { /* lately kicked but not yet running */ - if (test_and_set_bit(vcpu_id, gi->kicked_mask)) + if (test_and_set_bit(vcpu_idx, gi->kicked_mask)) return; kvm_s390_vcpu_wakeup(vcpu); return; diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 5b45c83ced21..e144c8046ceb 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4026,7 +4026,7 @@ static int vcpu_pre_run(struct kvm_vcpu *vcpu) kvm_s390_patch_guest_per_regs(vcpu); } - clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.gisa_int.kicked_mask); + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.gisa_int.kicked_mask); vcpu->arch.sie_block->icptcode = 0; cpuflags = atomic_read(&vcpu->arch.sie_block->cpuflags); diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 9fad25109b0d..ecd741ee3276 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -79,7 +79,7 @@ static inline int is_vcpu_stopped(struct kvm_vcpu *vcpu) static inline int is_vcpu_idle(struct kvm_vcpu *vcpu) { - return test_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + return test_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static inline int kvm_is_ucontrol(struct kvm *kvm)

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: s390: index kvm->arch.idle_mask by vcpu_idx" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a3e03bc1368c1bc16e19b001fc96dc7430573cc8 Mon Sep 17 00:00:00 2001 From: Halil Pasic <pasic(a)linux.ibm.com> Date: Fri, 27 Aug 2021 14:54:29 +0200 Subject: [PATCH] KVM: s390: index kvm->arch.idle_mask by vcpu_idx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While in practice vcpu->vcpu_idx == vcpu->vcp_id is often true, it may not always be, and we must not rely on this. Reason is that KVM decides the vcpu_idx, userspace decides the vcpu_id, thus the two might not match. Currently kvm->arch.idle_mask is indexed by vcpu_id, which implies that code like for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) { vcpu = kvm_get_vcpu(kvm, vcpu_id); do_stuff(vcpu); } is not legit. Reason is that kvm_get_vcpu expects an vcpu_idx, not an vcpu_id. The trouble is, we do actually use kvm->arch.idle_mask like this. To fix this problem we have two options. Either use kvm_get_vcpu_by_id(vcpu_id), which would loop to find the right vcpu_id, or switch to indexing via vcpu_idx. The latter is preferable for obvious reasons. Let us make switch from indexing kvm->arch.idle_mask by vcpu_id to indexing it by vcpu_idx. To keep gisa_int.kicked_mask indexed by the same index as idle_mask lets make the same change for it as well. Fixes: 1ee0bc559dc3 ("KVM: s390: get rid of local_int array") Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com> Reviewed-by: Christian Bornträger <borntraeger(a)de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com> Cc: <stable(a)vger.kernel.org> # 3.15+ Link: https://lore.kernel.org/r/20210827125429.1912577-1-pasic@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index 118d5450c523..611f18ecde91 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -963,6 +963,7 @@ struct kvm_arch{ atomic64_t cmma_dirty_pages; /* subset of available cpu features enabled by user space */ DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS); + /* indexed by vcpu_idx */ DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS); struct kvm_s390_gisa_interrupt gisa_int; struct kvm_s390_pv pv; diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index d548d60caed2..16256e17a544 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -419,13 +419,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu) static void __set_cpu_idle(struct kvm_vcpu *vcpu) { kvm_s390_set_cpuflags(vcpu, CPUSTAT_WAIT); - set_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + set_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static void __unset_cpu_idle(struct kvm_vcpu *vcpu) { kvm_s390_clear_cpuflags(vcpu, CPUSTAT_WAIT); - clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static void __reset_intercept_indicators(struct kvm_vcpu *vcpu) @@ -3050,18 +3050,18 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len) static void __airqs_kick_single_vcpu(struct kvm *kvm, u8 deliverable_mask) { - int vcpu_id, online_vcpus = atomic_read(&kvm->online_vcpus); + int vcpu_idx, online_vcpus = atomic_read(&kvm->online_vcpus); struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int; struct kvm_vcpu *vcpu; - for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) { - vcpu = kvm_get_vcpu(kvm, vcpu_id); + for_each_set_bit(vcpu_idx, kvm->arch.idle_mask, online_vcpus) { + vcpu = kvm_get_vcpu(kvm, vcpu_idx); if (psw_ioint_disabled(vcpu)) continue; deliverable_mask &= (u8)(vcpu->arch.sie_block->gcr[6] >> 24); if (deliverable_mask) { /* lately kicked but not yet running */ - if (test_and_set_bit(vcpu_id, gi->kicked_mask)) + if (test_and_set_bit(vcpu_idx, gi->kicked_mask)) return; kvm_s390_vcpu_wakeup(vcpu); return; diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 5b45c83ced21..e144c8046ceb 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4026,7 +4026,7 @@ static int vcpu_pre_run(struct kvm_vcpu *vcpu) kvm_s390_patch_guest_per_regs(vcpu); } - clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.gisa_int.kicked_mask); + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.gisa_int.kicked_mask); vcpu->arch.sie_block->icptcode = 0; cpuflags = atomic_read(&vcpu->arch.sie_block->cpuflags); diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 9fad25109b0d..ecd741ee3276 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -79,7 +79,7 @@ static inline int is_vcpu_stopped(struct kvm_vcpu *vcpu) static inline int is_vcpu_idle(struct kvm_vcpu *vcpu) { - return test_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + return test_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static inline int kvm_is_ucontrol(struct kvm *kvm)

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: s390: index kvm->arch.idle_mask by vcpu_idx" failed to apply to 4.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From a3e03bc1368c1bc16e19b001fc96dc7430573cc8 Mon Sep 17 00:00:00 2001 From: Halil Pasic <pasic(a)linux.ibm.com> Date: Fri, 27 Aug 2021 14:54:29 +0200 Subject: [PATCH] KVM: s390: index kvm->arch.idle_mask by vcpu_idx MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit While in practice vcpu->vcpu_idx == vcpu->vcp_id is often true, it may not always be, and we must not rely on this. Reason is that KVM decides the vcpu_idx, userspace decides the vcpu_id, thus the two might not match. Currently kvm->arch.idle_mask is indexed by vcpu_id, which implies that code like for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) { vcpu = kvm_get_vcpu(kvm, vcpu_id); do_stuff(vcpu); } is not legit. Reason is that kvm_get_vcpu expects an vcpu_idx, not an vcpu_id. The trouble is, we do actually use kvm->arch.idle_mask like this. To fix this problem we have two options. Either use kvm_get_vcpu_by_id(vcpu_id), which would loop to find the right vcpu_id, or switch to indexing via vcpu_idx. The latter is preferable for obvious reasons. Let us make switch from indexing kvm->arch.idle_mask by vcpu_id to indexing it by vcpu_idx. To keep gisa_int.kicked_mask indexed by the same index as idle_mask lets make the same change for it as well. Fixes: 1ee0bc559dc3 ("KVM: s390: get rid of local_int array") Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com> Reviewed-by: Christian Bornträger <borntraeger(a)de.ibm.com> Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com> Cc: <stable(a)vger.kernel.org> # 3.15+ Link: https://lore.kernel.org/r/20210827125429.1912577-1-pasic@linux.ibm.com Signed-off-by: Christian Borntraeger <borntraeger(a)de.ibm.com> diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index 118d5450c523..611f18ecde91 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -963,6 +963,7 @@ struct kvm_arch{ atomic64_t cmma_dirty_pages; /* subset of available cpu features enabled by user space */ DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS); + /* indexed by vcpu_idx */ DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS); struct kvm_s390_gisa_interrupt gisa_int; struct kvm_s390_pv pv; diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index d548d60caed2..16256e17a544 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -419,13 +419,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu) static void __set_cpu_idle(struct kvm_vcpu *vcpu) { kvm_s390_set_cpuflags(vcpu, CPUSTAT_WAIT); - set_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + set_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static void __unset_cpu_idle(struct kvm_vcpu *vcpu) { kvm_s390_clear_cpuflags(vcpu, CPUSTAT_WAIT); - clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static void __reset_intercept_indicators(struct kvm_vcpu *vcpu) @@ -3050,18 +3050,18 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len) static void __airqs_kick_single_vcpu(struct kvm *kvm, u8 deliverable_mask) { - int vcpu_id, online_vcpus = atomic_read(&kvm->online_vcpus); + int vcpu_idx, online_vcpus = atomic_read(&kvm->online_vcpus); struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int; struct kvm_vcpu *vcpu; - for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) { - vcpu = kvm_get_vcpu(kvm, vcpu_id); + for_each_set_bit(vcpu_idx, kvm->arch.idle_mask, online_vcpus) { + vcpu = kvm_get_vcpu(kvm, vcpu_idx); if (psw_ioint_disabled(vcpu)) continue; deliverable_mask &= (u8)(vcpu->arch.sie_block->gcr[6] >> 24); if (deliverable_mask) { /* lately kicked but not yet running */ - if (test_and_set_bit(vcpu_id, gi->kicked_mask)) + if (test_and_set_bit(vcpu_idx, gi->kicked_mask)) return; kvm_s390_vcpu_wakeup(vcpu); return; diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 5b45c83ced21..e144c8046ceb 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -4026,7 +4026,7 @@ static int vcpu_pre_run(struct kvm_vcpu *vcpu) kvm_s390_patch_guest_per_regs(vcpu); } - clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.gisa_int.kicked_mask); + clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.gisa_int.kicked_mask); vcpu->arch.sie_block->icptcode = 0; cpuflags = atomic_read(&vcpu->arch.sie_block->cpuflags); diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 9fad25109b0d..ecd741ee3276 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -79,7 +79,7 @@ static inline int is_vcpu_stopped(struct kvm_vcpu *vcpu) static inline int is_vcpu_idle(struct kvm_vcpu *vcpu) { - return test_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask); + return test_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask); } static inline int kvm_is_ucontrol(struct kvm *kvm)

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io_uring: fail links of cancelled timeouts" failed to apply to 5.13-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.13-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 2ae2eb9dde18979b40629dd413b9adbd6c894cdf Mon Sep 17 00:00:00 2001 From: Pavel Begunkov <asml.silence(a)gmail.com> Date: Thu, 9 Sep 2021 13:56:27 +0100 Subject: [PATCH] io_uring: fail links of cancelled timeouts When we cancel a timeout we should mark it with REQ_F_FAIL, so linked requests are cancelled as well, but not queued for further execution. Cc: stable(a)vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com> Link: https://lore.kernel.org/r/fff625b44eeced3a5cae79f60e6acf3fbdf8f990.16311921… Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/fs/io_uring.c b/fs/io_uring.c index b21a423a4de8..ffd91844b2e5 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1482,6 +1482,8 @@ static void io_kill_timeout(struct io_kiocb *req, int status) struct io_timeout_data *io = req->async_data; if (hrtimer_try_to_cancel(&io->timer) != -1) { + if (status) + req_set_fail(req); atomic_set(&req->ctx->cq_timeouts, atomic_read(&req->ctx->cq_timeouts) + 1); list_del_init(&req->timeout.list);

4 years, 1 month

1
0
0 0

FAILED: patch "[PATCH] io_uring: reexpand under-reexpanded iters" failed to apply to 5.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 89c2b3b74918200e46699338d7bcc19b1ea12110 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov <asml.silence(a)gmail.com> Date: Mon, 23 Aug 2021 11:18:45 +0100 Subject: [PATCH] io_uring: reexpand under-reexpanded iters [ 74.211232] BUG: KASAN: stack-out-of-bounds in iov_iter_revert+0x809/0x900 [ 74.212778] Read of size 8 at addr ffff888025dc78b8 by task syz-executor.0/828 [ 74.214756] CPU: 0 PID: 828 Comm: syz-executor.0 Not tainted 5.14.0-rc3-next-20210730 #1 [ 74.216525] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 74.219033] Call Trace: [ 74.219683] dump_stack_lvl+0x8b/0xb3 [ 74.220706] print_address_description.constprop.0+0x1f/0x140 [ 74.224226] kasan_report.cold+0x7f/0x11b [ 74.226085] iov_iter_revert+0x809/0x900 [ 74.227960] io_write+0x57d/0xe40 [ 74.232647] io_issue_sqe+0x4da/0x6a80 [ 74.242578] __io_queue_sqe+0x1ac/0xe60 [ 74.245358] io_submit_sqes+0x3f6e/0x76a0 [ 74.248207] __do_sys_io_uring_enter+0x90c/0x1a20 [ 74.257167] do_syscall_64+0x3b/0x90 [ 74.257984] entry_SYSCALL_64_after_hwframe+0x44/0xae old_size = iov_iter_count(); ... iov_iter_revert(old_size - iov_iter_count()); If iov_iter_revert() is done base on the initial size as above, and the iter is truncated and not reexpanded in the middle, it miscalculates borders causing problems. This trace is due to no one reexpanding after generic_write_checks(). Now iters store how many bytes has been truncated, so reexpand them to the initial state right before reverting. Cc: stable(a)vger.kernel.org Reported-by: Palash Oswal <oswalpalash(a)gmail.com> Reported-by: Sudip Mukherjee <sudipm.mukherjee(a)gmail.com> Reported-and-tested-by: syzbot+9671693590ef5aad8953(a)syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com> Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk> diff --git a/fs/io_uring.c b/fs/io_uring.c index d94fb5835a20..71639d39caf9 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -3281,6 +3281,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags) if (req->flags & REQ_F_NOWAIT) goto done; /* some cases will consume bytes even on error returns */ + iov_iter_reexpand(iter, iter->count + iter->truncated); iov_iter_revert(iter, io_size - iov_iter_count(iter)); ret = 0; } else if (ret == -EIOCBQUEUED) { @@ -3420,6 +3421,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags) } else { copy_iov: /* some cases will consume bytes even on error returns */ + iov_iter_reexpand(iter, iter->count + iter->truncated); iov_iter_revert(iter, io_size - iov_iter_count(iter)); ret = io_setup_async_rw(req, iovec, inline_vecs, iter, false); return ret ?: -EAGAIN;

4 years, 1 month

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror September 2021