Greg,
These two patches have been (correctly) auto selected to 5.15.y
along with the two dependency patches tagged with:
Stable-dep-of: b306e90ffabd ("ovl: remove privs in ovl_copyfile()")
9636e70ee2d3 ("ovl: use ovl_copy_{real,upper}attr() wrappers")
a54843833caf ("ovl: store lower path in ovl_inode")
It wasn't wrong to apply those patches with the two dependencies
to 5.15.y, but it is not as easy to do for 5.10.y, so here is a
very simple backport of the two fixes to 5.10.y, i.e.:
replaced ovl_copyattr(X) with ovl_copyattr(ovl_inode_real(X), X).
Note that the language "This fixes some failure in fstests..."
in commit message means that those fixes are not enough for the
tests to pass. Additional backports from v6.2 are needed for the
tests to pass and I am collaborating those backports with Leah,
so they will hit 5.15.y first before posting them for 5.10.y.
Never the less, these overlayfs fixes are important security
fixes, so they should be applied to LTS kernel even before
all the cases in the fstests are fixed.
Thanks,
Amir.
Amir Goldstein (2):
ovl: remove privs in ovl_copyfile()
ovl: remove privs in ovl_fallocate()
fs/overlayfs/file.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
--
2.34.1
BugLink: https://bugs.launchpad.net/bugs/2007581
Commit 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members
before initialization") attempted to fix a race condition that lead to a
NULL pointer, but in the process caused a regression for _AEI/_EVT
declared GPIOs.
This manifests in messages showing deferred probing while trying to
allocate IRQs like so:
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x0000 to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x002C to IRQ, err -517
amd_gpio AMDI0030:00: Failed to translate GPIO pin 0x003D to IRQ, err -517
[ .. more of the same .. ]
The code for walking _AEI doesn't handle deferred probing and so this
leads to non-functional GPIO interrupts.
Fix this issue by moving the call to `acpi_gpiochip_request_interrupts`
to occur after gc->irc.initialized is set.
Fixes: 5467801f1fcb ("gpio: Restrict usage of GPIO chip irq members before initialization")
Link: https://lore.kernel.org/linux-gpio/BL1PR12MB51577A77F000A008AA694675E2EF9@B…
Link: https://bugzilla.suse.com/show_bug.cgi?id=1198697
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215850
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1979
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1976
Reported-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Shreeya Patel <shreeya.patel(a)collabora.com>
Tested-By: Samuel Čavoj <samuel(a)cavoj.net>
Tested-By: lukeluk498(a)gmail.com Link:
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Acked-by: Linus Walleij <linus.walleij(a)linaro.org>
Reviewed-and-tested-by: Takashi Iwai <tiwai(a)suse.de>
Cc: Shreeya Patel <shreeya.patel(a)collabora.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
(backported from commit 06fb4ecfeac7e00d6704fa5ed19299f2fefb3cc9)
Signed-off-by: Asmaa Mnebhi <asmaa(a)nvidia.com>
---
drivers/gpio/gpiolib.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index e4d47e00c392..049cdfc975b3 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2329,8 +2329,6 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
gpiochip_set_irq_hooks(gpiochip);
- acpi_gpiochip_request_interrupts(gpiochip);
-
/*
* Using barrier() here to prevent compiler from reordering
* gc->irq.initialized before initialization of above
@@ -2340,6 +2338,8 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
gpiochip->irq.initialized = true;
+ acpi_gpiochip_request_interrupts(gpiochip);
+
return 0;
}
--
2.30.1
A number of Cezanne systems report IRQ1 as a wakeup source when it's not actually
a wakeup. This can cause problems for certain ACPI events. The following fix
went upstream that fixed it:
commit 8e60615e8932 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN")
It was reported that this fix actually helped here with older kernels too:
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1921#note_1770257
So backport this fix to 5.15.y as well.
This backport is dependent upon being able to read the SMU version which
happened in a different commit. So backport that commit and follow up fixes
as well.
v1->v2:
* Split into multiple commits
* Catch some fixes for reading SMU version too
Hans de Goede (1):
platform/x86: amd-pmc: Fix compilation when CONFIG_DEBUGFS is disabled
Mario Limonciello (2):
platform/x86: amd-pmc: Correct usage of SMU version
platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN
Sanket Goswami (1):
platform/x86: amd-pmc: Export Idlemask values based on the APU
drivers/platform/x86/amd-pmc.c | 116 +++++++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
--
2.34.1
No upstream commit exists: the problem addressed here is that
'commit 75309018a24d ("s390: add support for TIF_NOTIFY_SIGNAL")'
was backported to 5.10. This commit is broken, but nobody noticed
upstream, since shortly after s390 converted to generic entry with
'commit 75309018a24d ("s390: add support for TIF_NOTIFY_SIGNAL")', which
implicitly fixed the problem outlined below.
Thread flag is set to TIF_NOTIFY_SIGNAL for io_uring work. The io work
user or syscall calls do_signal when either one of the TIF_SIGPENDING or
TIF_NOTIFY_SIGNAL flag is set. However, do_signal does consider only
TIF_SIGPENDING signal and ignores TIF_NOTIFY_SIGNAL condition. This
means get_signal is never invoked for TIF_NOTIFY_SIGNAL and hence the
flag is not cleared, which results in an endless do_signal loop.
Reference: 'commit 788d0824269b ("io_uring: import 5.15-stable io_uring")'
Fixes: 75309018a24d ("s390: add support for TIF_NOTIFY_SIGNAL")
Cc: stable(a)vger.kernel.org # 5.10.162
Acked-by: Heiko Carstens <hca(a)linux.ibm.com>
Acked-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
---
arch/s390/kernel/signal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/s390/kernel/signal.c b/arch/s390/kernel/signal.c
index b27b6c1f058d..9e900a8977bd 100644
--- a/arch/s390/kernel/signal.c
+++ b/arch/s390/kernel/signal.c
@@ -472,7 +472,7 @@ void do_signal(struct pt_regs *regs)
current->thread.system_call =
test_pt_regs_flag(regs, PIF_SYSCALL) ? regs->int_code : 0;
- if (test_thread_flag(TIF_SIGPENDING) && get_signal(&ksig)) {
+ if (get_signal(&ksig)) {
/* Whee! Actually deliver the signal. */
if (current->thread.system_call) {
regs->int_code = current->thread.system_call;
--
2.37.2
commit 5f58d783fd7823b2c2d5954d1126e702f94bfc4c upstream
We have this check to make sure we don't accidentally add older devices
that may have disappeared and re-appeared with an older generation from
being added to an fs_devices (such as a replace source device). This
makes sense, we don't want stale disks in our file system. However for
single disks this doesn't really make sense.
I've seen this in testing, but I was provided a reproducer from a
project that builds btrfs images on loopback devices. The loopback
device gets cached with the new generation, and then if it is re-used to
generate a new file system we'll fail to mount it because the new fs is
"older" than what we have in cache.
Fix this by freeing the cache when closing the device for a single device
filesystem. This will ensure that the mount command passed device path is
scanned successfully during the next mount.
CC: stable(a)vger.kernel.org # 5.10+
Reported-by: Daan De Meyer <daandemeyer(a)fb.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
---
This patch has already been submitted for the LTS stable 5.10 and above.
fs/btrfs/volumes.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 548de841cee5..dacaea61c2f7 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -354,6 +354,7 @@ void btrfs_free_device(struct btrfs_device *device)
static void free_fs_devices(struct btrfs_fs_devices *fs_devices)
{
struct btrfs_device *device;
+
WARN_ON(fs_devices->opened);
while (!list_empty(&fs_devices->devices)) {
device = list_entry(fs_devices->devices.next,
@@ -1401,6 +1402,17 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
if (!fs_devices->opened) {
seed_devices = fs_devices->seed;
fs_devices->seed = NULL;
+
+ /*
+ * If the struct btrfs_fs_devices is not assembled with any
+ * other device, it can be re-initialized during the next mount
+ * without the needing device-scan step. Therefore, it can be
+ * fully freed.
+ */
+ if (fs_devices->num_devices == 1) {
+ list_del(&fs_devices->fs_list);
+ free_fs_devices(fs_devices);
+ }
}
mutex_unlock(&uuid_mutex);
--
2.31.1
This patch series is to remove reader optimistic spinning in
kernel 5.10 to improve the MongoDB performance. Performance measurements
(10 times running average of overall throughput ops/sec) are using
MongoDB 5.0.11 and YCSB [1] microbenchmark with workloadA [2] on AWS EC2
m5.4xlarge/m6g.4xlarge (16-vCPU 64GiB-memory) instances with a 512GB EBS
IO1 drive disk with 5000 IOPS and separating MongoDB and YCSB load generator
on 2 instances and setting recordcount=25000000 and operationcount=10000000
to see the impacts of these changes:
Before - v5.10.165 kernel in OS Amazon Linux 2
After - v5.10.165 kernel with reader spinning disabled in OS Amazon Linux 2
| Arch | Instance Type | Before | After |
|---------+---------------+---------+---------|
| x86_64 | m5.4xlarge | 37365.4 | 42373.9 |
|---------+---------------+---------+---------|
| aarch64 | m6g.4xlarge | 33823.1 | 43113.7 |
|---------+---------------+---------+---------|
It can be seen that the MongoDB throughput can be improved around 13% in x86_64
and 27% in aarch64 after disabling reader optimistic spinning and these patches
can be applied to 5.10 with no conflict so we wonder if it's possible to backport
them to stable 5.10?
[1] https://github.com/brianfrankcooper/YCSB/releases/download/0.17.0/ycsb-0.17…
[2] https://github.com/brianfrankcooper/YCSB/blob/master/workloads/workloada
Thanks,
Shaoying
Peter Zijlstra (3):
locking/rwsem: Better collate rwsem_read_trylock()
locking/rwsem: Introduce rwsem_write_trylock()
locking/rwsem: Fold __down_{read,write}*()
Waiman Long (4):
locking/rwsem: Pass the current atomic count to
rwsem_down_read_slowpath()
locking/rwsem: Prevent potential lock starvation
locking/rwsem: Enable reader optimistic lock stealing
locking/rwsem: Remove reader optimistic spinning
kernel/locking/lock_events_list.h | 6 +-
kernel/locking/rwsem.c | 359 +++++++++---------------------
2 files changed, 106 insertions(+), 259 deletions(-)
--
2.38.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ec76d0c2da5c ("vmxnet3: move rss code block under eop descriptor")
bdeed8b0958c ("vmxnet3: Record queue number to incoming packets")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ec76d0c2da5c6dfb6a33f1545cc15997013923da Mon Sep 17 00:00:00 2001
From: Ronak Doshi <doshir(a)vmware.com>
Date: Wed, 8 Feb 2023 14:38:59 -0800
Subject: [PATCH] vmxnet3: move rss code block under eop descriptor
Commit b3973bb40041 ("vmxnet3: set correct hash type based on
rss information") added hashType information into skb. However,
rssType field is populated for eop descriptor. This can lead
to incorrectly reporting of hashType for packets which use
multiple rx descriptors. Multiple rx descriptors are used
for Jumbo frame or LRO packets, which can hit this issue.
This patch moves the RSS codeblock under eop descritor.
Cc: stable(a)vger.kernel.org
Fixes: b3973bb40041 ("vmxnet3: set correct hash type based on rss information")
Signed-off-by: Ronak Doshi <doshir(a)vmware.com>
Acked-by: Peng Li <lpeng(a)vmware.com>
Acked-by: Guolin Yang <gyang(a)vmware.com>
Link: https://lore.kernel.org/r/20230208223900.5794-1-doshir@vmware.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 56267c327f0b..682987040ea8 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1546,31 +1546,6 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
rxd->len = rbi->len;
}
-#ifdef VMXNET3_RSS
- if (rcd->rssType != VMXNET3_RCD_RSS_TYPE_NONE &&
- (adapter->netdev->features & NETIF_F_RXHASH)) {
- enum pkt_hash_types hash_type;
-
- switch (rcd->rssType) {
- case VMXNET3_RCD_RSS_TYPE_IPV4:
- case VMXNET3_RCD_RSS_TYPE_IPV6:
- hash_type = PKT_HASH_TYPE_L3;
- break;
- case VMXNET3_RCD_RSS_TYPE_TCPIPV4:
- case VMXNET3_RCD_RSS_TYPE_TCPIPV6:
- case VMXNET3_RCD_RSS_TYPE_UDPIPV4:
- case VMXNET3_RCD_RSS_TYPE_UDPIPV6:
- hash_type = PKT_HASH_TYPE_L4;
- break;
- default:
- hash_type = PKT_HASH_TYPE_L3;
- break;
- }
- skb_set_hash(ctx->skb,
- le32_to_cpu(rcd->rssHash),
- hash_type);
- }
-#endif
skb_record_rx_queue(ctx->skb, rq->qid);
skb_put(ctx->skb, rcd->len);
@@ -1653,6 +1628,31 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
u32 mtu = adapter->netdev->mtu;
skb->len += skb->data_len;
+#ifdef VMXNET3_RSS
+ if (rcd->rssType != VMXNET3_RCD_RSS_TYPE_NONE &&
+ (adapter->netdev->features & NETIF_F_RXHASH)) {
+ enum pkt_hash_types hash_type;
+
+ switch (rcd->rssType) {
+ case VMXNET3_RCD_RSS_TYPE_IPV4:
+ case VMXNET3_RCD_RSS_TYPE_IPV6:
+ hash_type = PKT_HASH_TYPE_L3;
+ break;
+ case VMXNET3_RCD_RSS_TYPE_TCPIPV4:
+ case VMXNET3_RCD_RSS_TYPE_TCPIPV6:
+ case VMXNET3_RCD_RSS_TYPE_UDPIPV4:
+ case VMXNET3_RCD_RSS_TYPE_UDPIPV6:
+ hash_type = PKT_HASH_TYPE_L4;
+ break;
+ default:
+ hash_type = PKT_HASH_TYPE_L3;
+ break;
+ }
+ skb_set_hash(skb,
+ le32_to_cpu(rcd->rssHash),
+ hash_type);
+ }
+#endif
vmxnet3_rx_csum(adapter, skb,
(union Vmxnet3_GenericDesc *)rcd);
skb->protocol = eth_type_trans(skb, adapter->netdev);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ce4d9a1ea35a ("of: reserved_mem: Have kmemleak ignore dynamically allocated reserved mem")
3ecc68349bba ("memblock: rename memblock_free to memblock_phys_free")
fa27717110ae ("memblock: drop memblock_free_early_nid() and memblock_free_early()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ce4d9a1ea35ac5429e822c4106cb2859d5c71f3e Mon Sep 17 00:00:00 2001
From: "Isaac J. Manjarres" <isaacmanjarres(a)google.com>
Date: Wed, 8 Feb 2023 15:20:00 -0800
Subject: [PATCH] of: reserved_mem: Have kmemleak ignore dynamically allocated
reserved mem
Patch series "Fix kmemleak crashes when scanning CMA regions", v2.
When trying to boot a device with an ARM64 kernel with the following
config options enabled:
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_DEBUG_KMEMLEAK=y
a crash is encountered when kmemleak starts to scan the list of gray
or allocated objects that it maintains. Upon closer inspection, it was
observed that these page-faults always occurred when kmemleak attempted
to scan a CMA region.
At the moment, kmemleak is made aware of CMA regions that are specified
through the devicetree to be dynamically allocated within a range of
addresses. However, kmemleak should not need to scan CMA regions or any
reserved memory region, as those regions can be used for DMA transfers
between drivers and peripherals, and thus wouldn't contain anything
useful for kmemleak.
Additionally, since CMA regions are unmapped from the kernel's address
space when they are freed to the buddy allocator at boot when
CONFIG_DEBUG_PAGEALLOC is enabled, kmemleak shouldn't attempt to access
those memory regions, as that will trigger a crash. Thus, kmemleak
should ignore all dynamically allocated reserved memory regions.
This patch (of 1):
Currently, kmemleak ignores dynamically allocated reserved memory regions
that don't have a kernel mapping. However, regions that do retain a
kernel mapping (e.g. CMA regions) do get scanned by kmemleak.
This is not ideal for two reasons:
1 kmemleak works by scanning memory regions for pointers to allocated
objects to determine if those objects have been leaked or not.
However, reserved memory regions can be used between drivers and
peripherals for DMA transfers, and thus, would not contain pointers to
allocated objects, making it unnecessary for kmemleak to scan these
reserved memory regions.
2 When CONFIG_DEBUG_PAGEALLOC is enabled, along with kmemleak, the
CMA reserved memory regions are unmapped from the kernel's address
space when they are freed to buddy at boot. These CMA reserved regions
are still tracked by kmemleak, however, and when kmemleak attempts to
scan them, a crash will happen, as accessing the CMA region will result
in a page-fault, since the regions are unmapped.
Thus, use kmemleak_ignore_phys() for all dynamically allocated reserved
memory regions, instead of those that do not have a kernel mapping
associated with them.
Link: https://lkml.kernel.org/r/20230208232001.2052777-1-isaacmanjarres@google.com
Link: https://lkml.kernel.org/r/20230208232001.2052777-2-isaacmanjarres@google.com
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Isaac J. Manjarres <isaacmanjarres(a)google.com>
Acked-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Frank Rowand <frowand.list(a)gmail.com>
Cc: Kirill A. Shutemov <kirill.shtuemov(a)linux.intel.com>
Cc: Nick Kossifidis <mick(a)ics.forth.gr>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: Rob Herring <robh(a)kernel.org>
Cc: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Cc: Saravana Kannan <saravanak(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 65f3b02a0e4e..f90975e00446 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -48,9 +48,10 @@ static int __init early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
err = memblock_mark_nomap(base, size);
if (err)
memblock_phys_free(base, size);
- kmemleak_ignore_phys(base);
}
+ kmemleak_ignore_phys(base);
+
return err;
}