Hi Greg, Sasha,
The batch contains Netfilter fixes for -stable 5.4. This batch is
targeting at garbage collection (GC) / set timeout fixes that address
possible UaF and memleaks as well as assorted bugs.
I am using original commit IDs for reference:
1) 0c2a85edd143 ("netfilter: nf_tables: pass context to nft_set_destroy()")
2) f8bb7889af58 ("netfilter: nftables: rename set element data activation/deactivation functions")
3) 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
4) c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
5) 61ae320a29b0 ("netfilter: nft_set_rbtree: fix null deref on element insertion")
6) f718863aca46 ("netfilter: nft_set_rbtree: fix overlap expiration walk")
7) 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk")
8) 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
9) f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
10) c92db3030492 ("netfilter: nft_set_hash: mark set element as dead when deleting from packet path")
11) a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API")
12) 6a33d8b73dfa ("netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path")
13) 02c6c24402bf ("netfilter: nf_tables: GC transaction race with netns dismantle")
14) 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
15) 8357bc946a2a ("netfilter: nf_tables: use correct lock to protect gc_list")
16) 8e51830e29e1 ("netfilter: nf_tables: defer gc run if previous batch is still pending")
17) 2ee52ae94baa ("netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction")
18) 96b33300fba8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
19) b079155faae9 ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
20) cf5000a7787c ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
21) 6069da443bf6 ("netfilter: nf_tables: unregister flowtable hooks on netns exit")
22) f9a43007d3f7 ("netfilter: nf_tables: double hook unregistration in netns path")
23) 0ce7cf4127f1 ("netfilter: nftables: update table flags from the commit phase")
24) 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
25) c9bd26513b3a ("netfilter: nf_tables: disable toggling dormant table state more than once")
26) ("netfilter: nf_tables: bogus EBUSY when deleting flowtable after flush (for 5.4)")
does not exist in any tree, but it is required to fix a bogus EBUSY error in 5.4.
This bug was implicitly fixed by 3f0465a9ef02 ("netfilter: nf_tables: dynamically
allocate hooks per net_device in flowtables").
Please apply,
Thanks.
Florian Westphal (4):
netfilter: nft_set_rbtree: fix null deref on element insertion
netfilter: nft_set_rbtree: fix overlap expiration walk
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: defer gc run if previous batch is still pending
Pablo Neira Ayuso (22):
netfilter: nf_tables: pass context to nft_set_destroy()
netfilter: nftables: rename set element data activation/deactivation functions
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nft_set_rbtree: Switch to node list walk for overlap detection
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
netfilter: nf_tables: fix memleak when more than 255 elements expired
netfilter: nf_tables: unregister flowtable hooks on netns exit
netfilter: nf_tables: double hook unregistration in netns path
netfilter: nftables: update table flags from the commit phase
netfilter: nf_tables: fix table flag updates
netfilter: nf_tables: disable toggling dormant table state more than once
netfilter: nf_tables: bogus EBUSY when deleting flowtable after flush (for 5.4)
include/net/netfilter/nf_tables.h | 129 +++---
include/uapi/linux/netfilter/nf_tables.h | 1 +
net/netfilter/nf_tables_api.c | 512 +++++++++++++++++++----
net/netfilter/nft_chain_filter.c | 3 +
net/netfilter/nft_set_bitmap.c | 5 +-
net/netfilter/nft_set_hash.c | 110 +++--
net/netfilter/nft_set_rbtree.c | 375 +++++++++++++----
7 files changed, 867 insertions(+), 268 deletions(-)
--
2.30.2
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 85dd3af64965c1c0eb7373b340a1b1f7773586b0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023112041-creasing-democrat-d805@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
85dd3af64965 ("mmc: sdhci-pci-gli: GL9755: Mask the replay timer timeout of AER")
f3a5b56c1286 ("mmc: sdhci-pci-gli: Add Genesys Logic GL9767 support")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 85dd3af64965c1c0eb7373b340a1b1f7773586b0 Mon Sep 17 00:00:00 2001
From: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Date: Tue, 7 Nov 2023 17:57:41 +0800
Subject: [PATCH] mmc: sdhci-pci-gli: GL9755: Mask the replay timer timeout of
AER
Due to a flaw in the hardware design, the GL9755 replay timer frequently
times out when ASPM is enabled. As a result, the warning messages will
often appear in the system log when the system accesses the GL9755
PCI config. Therefore, the replay timer timeout must be masked.
Fixes: 36ed2fd32b2c ("mmc: sdhci-pci-gli: A workaround to allow GL9755 to enter ASPM L1.2")
Signed-off-by: Victor Shih <victor.shih(a)genesyslogic.com.tw>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Acked-by: Kai-Heng Feng <kai.heng.geng(a)canonical.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20231107095741.8832-3-victorshihgli@gmail.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-pci-gli.c b/drivers/mmc/host/sdhci-pci-gli.c
index d83261e857a5..044b4704d5bb 100644
--- a/drivers/mmc/host/sdhci-pci-gli.c
+++ b/drivers/mmc/host/sdhci-pci-gli.c
@@ -152,6 +152,9 @@
#define PCI_GLI_9755_PM_CTRL 0xFC
#define PCI_GLI_9755_PM_STATE GENMASK(1, 0)
+#define PCI_GLI_9755_CORRERR_MASK 0x214
+#define PCI_GLI_9755_CORRERR_MASK_REPLAY_TIMER_TIMEOUT BIT(12)
+
#define SDHCI_GLI_9767_GM_BURST_SIZE 0x510
#define SDHCI_GLI_9767_GM_BURST_SIZE_AXI_ALWAYS_SET BIT(8)
@@ -770,6 +773,11 @@ static void gl9755_hw_setting(struct sdhci_pci_slot *slot)
value &= ~PCI_GLI_9755_PM_STATE;
pci_write_config_dword(pdev, PCI_GLI_9755_PM_CTRL, value);
+ /* mask the replay timer timeout of AER */
+ pci_read_config_dword(pdev, PCI_GLI_9755_CORRERR_MASK, &value);
+ value |= PCI_GLI_9755_CORRERR_MASK_REPLAY_TIMER_TIMEOUT;
+ pci_write_config_dword(pdev, PCI_GLI_9755_CORRERR_MASK, value);
+
gl9755_wt_off(pdev);
}
Hi,
I notice a stable-specific regression on Bugzilla [1]. Quoting from it:
> The backport commit to 5.15 branch:
> 9d4f84a15f9c9727bc07f59d9dafc89e65aadb34 "arm64: dts: imx8mp: Add snps,gfladj-refclk-lpm-sel quirk to USB nodes" (from upstream commit 5c3d5ecf48ab06c709c012bf1e8f0c91e1fcd7ad)
> switched from "snps,dis-u2-freeclk-exists-quirk" to "snps,gfladj-refclk-lpm-sel-quirk".
>
> The problem is that the gfladj-refclk-lpm-sel-quirk quirk is not implemented / backported to 5.15 branch.
>
> This commit should be either reverted, or the commit introducing gfladj-refclk-lpm-sel-quirk needs to be merged to 5.15 kernel branch.
>
> As a result of this patch, on Gateworks Venice GW7400 revB board the USB 3.x devices which are connected to the USB Type C port does not enumerate and the following errors are generated:
>
> [ 14.906302] xhci-hcd xhci-hcd.0.auto: Timeout while waiting for setup device command
> [ 15.122383] usb 2-1: device not accepting address 2, error -62
> [ 25.282195] xhci-hcd xhci-hcd.0.auto: Abort failed to stop command ring: -110
> [ 25.297408] xhci-hcd xhci-hcd.0.auto: xHCI host controller not responding, assume dead
> [ 25.305345] xhci-hcd xhci-hcd.0.auto: HC died; cleaning up
> [ 25.311058] xhci-hcd xhci-hcd.0.auto: Timeout while waiting for stop endpoint command
> [ 25.334361] usb usb2-port1: couldn't allocate usb_device
>
> When the commit is reverted the USB 3.x drives works fine.
See Bugzilla for the full thread and attach dmesgs.
Anyway, I'm adding it to regzbot:
#regzbot introduced: 9d4f84a15f9c97 https://bugzilla.kernel.org/show_bug.cgi?id=217670
#regzbot title: regression in USB DWC3 driver due to missing gfladj-refclk-lpm-sel-quirk quirk
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217670
--
An old man doll... just what I always wanted! - Clara
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023110614-natural-tweak-9ee4@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 31 Oct 2023 12:24:53 -0400
Subject: [PATCH] tracing: Have trace_event_file have ref counters
The following can crash the kernel:
# cd /sys/kernel/tracing
# echo 'p:sched schedule' > kprobe_events
# exec 5>>events/kprobes/sched/enable
# > kprobe_events
# exec 5>&-
The above commands:
1. Change directory to the tracefs directory
2. Create a kprobe event (doesn't matter what one)
3. Open bash file descriptor 5 on the enable file of the kprobe event
4. Delete the kprobe event (removes the files too)
5. Close the bash file descriptor 5
The above causes a crash!
BUG: kernel NULL pointer dereference, address: 0000000000000028
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:tracing_release_file_tr+0xc/0x50
What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.
But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.
To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREE flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.
Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.…
Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Mark Rutland <mark.rutland(a)arm.com>
Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave <beaub(a)linux.microsoft.com>
Tested-by: Beau Belgrave <beaub(a)linux.microsoft.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 12207dc6722d..696f8dc4aa53 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -492,6 +492,7 @@ enum {
EVENT_FILE_FL_TRIGGER_COND_BIT,
EVENT_FILE_FL_PID_FILTER_BIT,
EVENT_FILE_FL_WAS_ENABLED_BIT,
+ EVENT_FILE_FL_FREED_BIT,
};
extern struct trace_event_file *trace_get_event_file(const char *instance,
@@ -630,6 +631,7 @@ extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...);
* TRIGGER_COND - When set, one or more triggers has an associated filter
* PID_FILTER - When set, the event is filtered based on pid
* WAS_ENABLED - Set when enabled to know to clear trace on module removal
+ * FREED - File descriptor is freed, all fields should be considered invalid
*/
enum {
EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT),
@@ -643,6 +645,7 @@ enum {
EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT),
EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT),
EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT),
+ EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT),
};
struct trace_event_file {
@@ -671,6 +674,7 @@ struct trace_event_file {
* caching and such. Which is mostly OK ;-)
*/
unsigned long flags;
+ atomic_t ref; /* ref count for opened files */
atomic_t sm_ref; /* soft-mode reference counter */
atomic_t tm_ref; /* trigger-mode reference counter */
};
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2539cfc20a97..9aebf904ff97 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4978,6 +4978,20 @@ int tracing_open_file_tr(struct inode *inode, struct file *filp)
if (ret)
return ret;
+ mutex_lock(&event_mutex);
+
+ /* Fail if the file is marked for removal */
+ if (file->flags & EVENT_FILE_FL_FREED) {
+ trace_array_put(file->tr);
+ ret = -ENODEV;
+ } else {
+ event_file_get(file);
+ }
+
+ mutex_unlock(&event_mutex);
+ if (ret)
+ return ret;
+
filp->private_data = inode->i_private;
return 0;
@@ -4988,6 +5002,7 @@ int tracing_release_file_tr(struct inode *inode, struct file *filp)
struct trace_event_file *file = inode->i_private;
trace_array_put(file->tr);
+ event_file_put(file);
return 0;
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 0e1405abf4f7..b7f4ea25a194 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1669,6 +1669,9 @@ extern void event_trigger_unregister(struct event_command *cmd_ops,
char *glob,
struct event_trigger_data *trigger_data);
+extern void event_file_get(struct trace_event_file *file);
+extern void event_file_put(struct trace_event_file *file);
+
/**
* struct event_trigger_ops - callbacks for trace event triggers
*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f9e3e24d8796..f29e815ca5b2 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -990,13 +990,35 @@ static void remove_subsystem(struct trace_subsystem_dir *dir)
}
}
+void event_file_get(struct trace_event_file *file)
+{
+ atomic_inc(&file->ref);
+}
+
+void event_file_put(struct trace_event_file *file)
+{
+ if (WARN_ON_ONCE(!atomic_read(&file->ref))) {
+ if (file->flags & EVENT_FILE_FL_FREED)
+ kmem_cache_free(file_cachep, file);
+ return;
+ }
+
+ if (atomic_dec_and_test(&file->ref)) {
+ /* Count should only go to zero when it is freed */
+ if (WARN_ON_ONCE(!(file->flags & EVENT_FILE_FL_FREED)))
+ return;
+ kmem_cache_free(file_cachep, file);
+ }
+}
+
static void remove_event_file_dir(struct trace_event_file *file)
{
eventfs_remove_dir(file->ei);
list_del(&file->list);
remove_subsystem(file->system);
free_event_filter(file->filter);
- kmem_cache_free(file_cachep, file);
+ file->flags |= EVENT_FILE_FL_FREED;
+ event_file_put(file);
}
/*
@@ -1369,7 +1391,7 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
flags = file->flags;
mutex_unlock(&event_mutex);
- if (!file)
+ if (!file || flags & EVENT_FILE_FL_FREED)
return -ENODEV;
if (flags & EVENT_FILE_FL_ENABLED &&
@@ -1403,7 +1425,7 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
ret = -ENODEV;
mutex_lock(&event_mutex);
file = event_file_data(filp);
- if (likely(file)) {
+ if (likely(file && !(file->flags & EVENT_FILE_FL_FREED))) {
ret = tracing_update_buffers(file->tr);
if (ret < 0) {
mutex_unlock(&event_mutex);
@@ -1683,7 +1705,7 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
mutex_lock(&event_mutex);
file = event_file_data(filp);
- if (file)
+ if (file && !(file->flags & EVENT_FILE_FL_FREED))
print_event_filter(file, s);
mutex_unlock(&event_mutex);
@@ -2902,6 +2924,7 @@ trace_create_new_event(struct trace_event_call *call,
atomic_set(&file->tm_ref, 0);
INIT_LIST_HEAD(&file->triggers);
list_add(&file->list, &tr->events);
+ event_file_get(file);
return file;
}
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 33264e510d16..0c611b281a5b 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -2349,6 +2349,9 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string)
struct event_filter *filter = NULL;
int err;
+ if (file->flags & EVENT_FILE_FL_FREED)
+ return -ENODEV;
+
if (!strcmp(strstrip(filter_string), "0")) {
filter_disable(file);
filter = event_filter(file);
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
commit 4595a298d5563cf76c1d852970f162051fd1a7a6 upstream.
For filesystems with block size < page size, we need to set all the
per-block uptodate bits if the page was already uptodate at the time
we create the per-block metadata. This can happen if the page is
invalidated (eg by a write to drop_caches) but ultimately not removed
from the page cache.
This is a data corruption issue as page writeback skips blocks which
are marked !uptodate.
Fixes: 9dc55f1389f9 ("iomap: add support for sub-pagesize buffered I/O without buffer heads")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reported-by: Qian Cai <cai(a)redhat.com>
Cc: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Gao Xiang <hsiangkao(a)redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Shida Zhang <zhangshida(a)kylinos.cn>
---
fs/iomap.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/iomap.c b/fs/iomap.c
index ac7b2152c3ad..04e82b6bd9bf 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -109,6 +109,7 @@ static struct iomap_page *
iomap_page_create(struct inode *inode, struct page *page)
{
struct iomap_page *iop = to_iomap_page(page);
+ unsigned int nr_blocks = PAGE_SIZE / i_blocksize(inode);
if (iop || i_blocksize(inode) == PAGE_SIZE)
return iop;
@@ -118,6 +119,8 @@ iomap_page_create(struct inode *inode, struct page *page)
atomic_set(&iop->write_count, 0);
spin_lock_init(&iop->uptodate_lock);
bitmap_zero(iop->uptodate, PAGE_SIZE / SECTOR_SIZE);
+ if (PageUptodate(page))
+ bitmap_fill(iop->uptodate, nr_blocks);
/*
* migrate_page_move_mapping() assumes that pages with private data have
--
2.27.0
This is the fix of CVE-2023-25012 for kernel v5.15.
Upstream commit: https://github.com/torvalds/linux/commit/7644b1a1c9a7ae8ab99175989bfc867605…
The affected code is in io_uring/io_uring.c instead of io_uring/fdinfo.c for v5.15. So the patch applies the same change to io_uring/io_uring.c.
Thanks!
He
Jens Axboe (1):
io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid
io_uring/io_uring.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
--
2.42.0.869.gea05f2083d-goog
From: Dongli Zhang <dongli.zhang(a)oracle.com>
[ Upstream commit 1978f30a87732d4d9072a20abeded9fe17884f1b ]
When tag_set->nr_maps is 1, the block layer limits the number of hw queues
by nr_cpu_ids. No matter how many hw queues are used by virtio-scsi, as it
has (tag_set->nr_maps == 1), it can use at most nr_cpu_ids hw queues.
In addition, specifically for pci scenario, when the 'num_queues' specified
by qemu is more than maxcpus, virtio-scsi would not be able to allocate
more than maxcpus vectors in order to have a vector for each queue. As a
result, it falls back into MSI-X with one vector for config and one shared
for queues.
Considering above reasons, this patch limits the number of hw queues used
by virtio-scsi by nr_cpu_ids.
Reviewed-by: Stefan Hajnoczi <stefanha(a)redhat.com>
Signed-off-by: Dongli Zhang <dongli.zhang(a)oracle.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Kunkun Jiang <jiangkunkun(a)huawei.com>
---
drivers/scsi/virtio_scsi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
index 50e87823baab..0b45424e7ca5 100644
--- a/drivers/scsi/virtio_scsi.c
+++ b/drivers/scsi/virtio_scsi.c
@@ -853,6 +853,7 @@ static int virtscsi_probe(struct virtio_device *vdev)
/* We need to know how many queues before we allocate. */
num_queues = virtscsi_config_get(vdev, num_queues) ? : 1;
+ num_queues = min_t(unsigned int, nr_cpu_ids, num_queues);
num_targets = virtscsi_config_get(vdev, max_target) + 1;
--
2.33.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 886b92f63573eab4ba30b06c4514b8f4af114e6a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023112409-fructose-spry-be2c@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
886b92f63573 ("drm/amdgpu: ungate power gating when system suspend")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 886b92f63573eab4ba30b06c4514b8f4af114e6a Mon Sep 17 00:00:00 2001
From: Perry Yuan <perry.yuan(a)amd.com>
Date: Thu, 27 Jul 2023 01:45:30 -0400
Subject: [PATCH] drm/amdgpu: ungate power gating when system suspend
[Why] During suspend, if GFX DPM is enabled and GFXOFF feature is
enabled the system may get hung. So, it is suggested to disable
GFXOFF feature during suspend and enable it after resume.
[How] Update the code to disable GFXOFF feature during suspend and enable
it after resume.
[ 311.396526] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 311.396530] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[ 311.396531] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
Acked-by: Yang Wang <kevinyang.wang(a)amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng(a)amd.com>
Signed-off-by: Perry Yuan <perry.yuan(a)amd.com>
Signed-off-by: Kun Liu <kun.liu2(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index dafb8cc1c684..c8a3bf01743f 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -3498,6 +3498,8 @@ static void gfx_v10_0_ring_invalidate_tlbs(struct amdgpu_ring *ring,
static void gfx_v10_0_update_spm_vmid_internal(struct amdgpu_device *adev,
unsigned int vmid);
+static int gfx_v10_0_set_powergating_state(void *handle,
+ enum amd_powergating_state state);
static void gfx10_kiq_set_resources(struct amdgpu_ring *kiq_ring, uint64_t queue_mask)
{
amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_RESOURCES, 6));
@@ -7179,6 +7181,13 @@ static int gfx_v10_0_hw_fini(void *handle)
amdgpu_irq_put(adev, &adev->gfx.priv_reg_irq, 0);
amdgpu_irq_put(adev, &adev->gfx.priv_inst_irq, 0);
+ /* WA added for Vangogh asic fixing the SMU suspend failure
+ * It needs to set power gating again during gfxoff control
+ * otherwise the gfxoff disallowing will be failed to set.
+ */
+ if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(10, 3, 1))
+ gfx_v10_0_set_powergating_state(handle, AMD_PG_STATE_UNGATE);
+
if (!adev->no_hw_access) {
if (amdgpu_async_gfx_ring) {
if (amdgpu_gfx_disable_kgq(adev, 0))