From: Dominique Martinet <dominique.martinet(a)atmark-techno.com>
commit 89c7f5078935872cf47a713a645affb5037be694 upstream
This does not matter the least, but there is no other .[ch] file in the
repo that is executable, so clean this up.
Fixes: 29b83a64df3b ("MIPS: Octeon: Add PCIe link status check")
Signed-off-by: Dominique Martinet <dominique.martinet(a)atmark-techno.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
---
arch/mips/pci/pcie-octeon.c | 0
1 file changed, 0 insertions(+), 0 deletions(-)
mode change 100755 => 100644 arch/mips/pci/pcie-octeon.c
diff --git a/arch/mips/pci/pcie-octeon.c b/arch/mips/pci/pcie-octeon.c
old mode 100755
new mode 100644
--
2.43.4
In commit 91419ae0420f ("arm64: dts: rockchip: use BCLK to GPIO switch
on rk3399"), an additional pinctrl state was added whose default pinmux
is for 8ch i2s0. However, Puma only has 2ch i2s0. It's been overriding
the pinctrl-0 property but the second property override was missed in
the aforementioned commit.
On Puma, a hardware slider called "BIOS Disable/Normal Boot" can disable
eMMC and SPI to force booting from SD card. Another software-controlled
GPIO is then configured to override this behavior to make eMMC and SPI
available without human intervention. This is currently done in U-Boot
and it was enough until the aforementioned commit.
Indeed, because of this additional not-yet-overridden property, this
software-controlled GPIO is now muxed in a state that does not override
this hardware slider anymore, rendering SPI and eMMC flashes unusable.
Let's override the property with the 2ch pinmux to fix this.
While at it, add a GPIO hog for this software-controlled GPIO to make it
explicit and also make it reserve the pin through the pinctrl subsystem
to make sure nobody can mistakenly request it for something else: better
have a non-working feature than eMMC/SPI being corrupted "randomly"!
Signed-off-by: Quentin Schulz <quentin.schulz(a)cherry.de>
---
Quentin Schulz (2):
arm64: dts: rockchip: fix eMMC/SPI corruption when audio has been used on RK3399 Puma
arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma
arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi | 36 ++++++++++++++++++++++++---
1 file changed, 33 insertions(+), 3 deletions(-)
---
base-commit: e4fc196f5ba36eb7b9758cf2c73df49a44199895
change-id: 20240731-puma-emmc-6-c141e891220b
Best regards,
--
Quentin Schulz <quentin.schulz(a)cherry.de>
In _set_opp() we are normally bailing out when trying to set an OPP that is
the current one. This make perfect sense, but becomes a problem when
_set_required_opps() calls it recursively.
More precisely, when a required OPP is being shared by multiple PM domains,
we end up skipping to request the corresponding performance-state for all
of the PM domains, but the first one. Let's fix the problem, by calling
_set_opp_level() from _set_required_opps() instead.
Fixes: e37440e7e2c2 ("OPP: Call dev_pm_opp_set_opp() for required OPPs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
---
drivers/opp/core.c | 47 +++++++++++++++++++++++-----------------------
1 file changed, 24 insertions(+), 23 deletions(-)
diff --git a/drivers/opp/core.c b/drivers/opp/core.c
index cb4611fe1b5b..45eca65f27f9 100644
--- a/drivers/opp/core.c
+++ b/drivers/opp/core.c
@@ -1061,6 +1061,28 @@ static int _set_opp_bw(const struct opp_table *opp_table,
return 0;
}
+static int _set_opp_level(struct device *dev, struct opp_table *opp_table,
+ struct dev_pm_opp *opp)
+{
+ unsigned int level = 0;
+ int ret = 0;
+
+ if (opp) {
+ if (opp->level == OPP_LEVEL_UNSET)
+ return 0;
+
+ level = opp->level;
+ }
+
+ /* Request a new performance state through the device's PM domain. */
+ ret = dev_pm_domain_set_performance_state(dev, level);
+ if (ret)
+ dev_err(dev, "Failed to set performance state %u (%d)\n", level,
+ ret);
+
+ return ret;
+}
+
/* This is only called for PM domain for now */
static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
struct dev_pm_opp *opp, bool up)
@@ -1091,7 +1113,8 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
if (devs[index]) {
required_opp = opp ? opp->required_opps[index] : NULL;
- ret = dev_pm_opp_set_opp(devs[index], required_opp);
+ ret = _set_opp_level(devs[index], opp_table,
+ required_opp);
if (ret)
return ret;
}
@@ -1102,28 +1125,6 @@ static int _set_required_opps(struct device *dev, struct opp_table *opp_table,
return 0;
}
-static int _set_opp_level(struct device *dev, struct opp_table *opp_table,
- struct dev_pm_opp *opp)
-{
- unsigned int level = 0;
- int ret = 0;
-
- if (opp) {
- if (opp->level == OPP_LEVEL_UNSET)
- return 0;
-
- level = opp->level;
- }
-
- /* Request a new performance state through the device's PM domain. */
- ret = dev_pm_domain_set_performance_state(dev, level);
- if (ret)
- dev_err(dev, "Failed to set performance state %u (%d)\n", level,
- ret);
-
- return ret;
-}
-
static void _find_current_opp(struct device *dev, struct opp_table *opp_table)
{
struct dev_pm_opp *opp = ERR_PTR(-ENODEV);
--
2.34.1
This patchset serves to prevent an AB/BA deadlock:
thread 0:
* (lock A) acquire substream lock by
snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) wait for tasklet to finish by calling
tasklet_unlock_spin_wait() in
tasklet_disable_in_atomic() in
ohci_flush_iso_completions() of ohci.c
thread 1:
* (lock B) enter tasklet
* (lock A) attempt to acquire substream lock,
waiting for it to be released:
snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed() in
update_pcm_pointers() in
process_ctx_payloads() in
process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg7…
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock
with Fireface 800")
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.
Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.
ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.
to reproduce the issue:
direct ALSA playback to the device:
mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
- high CPU load
stress --cpu $(nproc)
- switching between applications via workspaces
tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.
Cc: stable(a)vger.kernel.org
Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
Edmund Raile (2):
Revert "ALSA: firewire-lib: obsolete workqueue for period update"
Revert "ALSA: firewire-lib: operate for period elapse event in process
context"
sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++-------------
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 25 insertions(+), 14 deletions(-)
--
2.45.2
Good afternoon,
I am trying to allocate the 0-th page with mmap function in my code.
I am always getting this error with this error-code: mmap error ffffffff
Then I was searching the internet for this topic and I have found the
same topic at stackoverflow web pages.
Here I am sending the link:
https://stackoverflow.com/questions/63790813/allocating-address-zero-on-lin…
I was setting value of |/proc/sys/vm/mmap_min_add to zero and using the
root privileges along the link.
And I am having the same problem still.
Can you help, please.
Thank you for the reply.
Michal Hrachovec
|
These couple of patches intends to fix the reset-gpio handling
for imx335 driver.
Patch 1/2 mentions reset-gpio polarity in DT binding example.
Patch 2/2 fixes the logical value of reset-gpio during
power-on/power-off sequence.
--
Changes in v3:
- Rework 1/2 commit message
- Fix gpio include in DT example in 1/2
- Remove not-so-informative XCLR comment in 2/2
Changes in v2:
- Also include reset-gpio polarity, mention in DT binding
- Add Fixes tag in 2/2
- Set the reset line to high during init time in 2/2
Link to v2:
https://lore.kernel.org/linux-media/20240729110437.199428-1-umang.jain@idea…
Link to v1:
https://lore.kernel.org/linux-media/tyo5etjwsfznuk6vzwqmcphbu4pz4lskrg3fjie…
Signed-off-by: Umang Jain <umang.jain(a)ideasonboard.com>
---
Umang Jain (2):
dt-bindings: media: imx335: Add reset-gpios to the DT example
media: imx335: Fix reset-gpio handling
Documentation/devicetree/bindings/media/i2c/sony,imx335.yaml | 4 ++++
drivers/media/i2c/imx335.c | 9 ++++-----
2 files changed, 8 insertions(+), 5 deletions(-)
---
base-commit: f3d2b941adafcdfba9ef63d9ca5bb2d9b263e2af
change-id: 20240731-imx335-gpio-818d736f9295
Best regards,
--
Umang Jain <umang.jain(a)ideasonboard.com>
commit ab091ec536cb7b271983c0c063b17f62f3591583 upstream
There is a hardware power-saving problem with the Lenovo N60z
board. When turn it on and leave it for 10 hours, there is a
20% chance that a nvme disk will not wake up until reboot.
Link: https://lore.kernel.org/all/2B5581C46AC6E335+9c7a81f1-05fb-4fd0-9fbb-108757…
Signed-off-by: hmy <huanglin(a)uniontech.com>
Signed-off-by: Wentao Guan <guanwentao(a)uniontech.com>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
---
drivers/nvme/host/pci.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 486e44d20b43..e4776cff4208 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2821,6 +2821,13 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
return NVME_QUIRK_SIMPLE_SUSPEND;
}
+ /*
+ * NVMe SSD drops off the PCIe bus after system idle
+ * for 10 hours on a Lenovo N60z board.
+ */
+ if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
+ return NVME_QUIRK_NO_APST;
+
return 0;
}
--
2.43.4
commit ab091ec536cb7b271983c0c063b17f62f3591583 upstream
There is a hardware power-saving problem with the Lenovo N60z
board. When turn it on and leave it for 10 hours, there is a
20% chance that a nvme disk will not wake up until reboot.
Link: https://lore.kernel.org/all/2B5581C46AC6E335+9c7a81f1-05fb-4fd0-9fbb-108757…
Signed-off-by: hmy <huanglin(a)uniontech.com>
Signed-off-by: Wentao Guan <guanwentao(a)uniontech.com>
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
---
drivers/nvme/host/pci.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 163497ef48fd..a243c066d923 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2481,6 +2481,13 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
return NVME_QUIRK_NO_APST;
}
+ /*
+ * NVMe SSD drops off the PCIe bus after system idle
+ * for 10 hours on a Lenovo N60z board.
+ */
+ if (dmi_match(DMI_BOARD_NAME, "LXKT-ZXEG-N6"))
+ return NVME_QUIRK_NO_APST;
+
return 0;
}
--
2.43.4
USB 3.0 on xhci1 is not used, as the controller shares the same PHY as
pcie1. The latter is enabled to support the M.2 PCIe WLAN card on this
design.
Mark USB 3.0 as disabled on this controller using the
"mediatek,u3p-dis-msk" property.
Fixes: 96564b1e2ea4 ("arm64: dts: mediatek: Introduce the MT8395 Radxa NIO 12L board")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
---
arch/arm64/boot/dts/mediatek/mt8395-radxa-nio-12l.dts | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/mediatek/mt8395-radxa-nio-12l.dts b/arch/arm64/boot/dts/mediatek/mt8395-radxa-nio-12l.dts
index 4b5f6cf16f70..096fa999aa59 100644
--- a/arch/arm64/boot/dts/mediatek/mt8395-radxa-nio-12l.dts
+++ b/arch/arm64/boot/dts/mediatek/mt8395-radxa-nio-12l.dts
@@ -898,6 +898,7 @@ &xhci1 {
usb2-lpm-disable;
vusb33-supply = <&mt6359_vusb_ldo_reg>;
vbus-supply = <&vsys>;
+ mediatek,u3p-dis-msk = <1>;
status = "okay";
};
--
2.46.0.rc1.232.g9752f9e123-goog
USB 3.0 on xhci1 is not used, as the controller shares the same PHY as
pcie1. The latter is enabled to support the M.2 PCIe WLAN card on this
design.
Mark USB 3.0 as disabled on this controller using the
"mediatek,u3p-dis-msk" property.
Reported-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com> #KernelCI
Closes: https://lore.kernel.org/all/9fce9838-ef87-4d1b-b3df-63e1ddb0ec51@notapiano/
Fixes: b6267a396e1c ("arm64: dts: mediatek: cherry: Enable T-PHYs and USB XHCI controllers")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org>
---
arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
index fe5400e17b0f..d3a52acbe48a 100644
--- a/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8195-cherry.dtsi
@@ -1404,6 +1404,7 @@ &xhci1 {
rx-fifo-depth = <3072>;
vusb33-supply = <&mt6359_vusb_ldo_reg>;
vbus-supply = <&usb_vbus>;
+ mediatek,u3p-dis-msk = <1>;
};
&xhci2 {
--
2.46.0.rc1.232.g9752f9e123-goog
The patch titled
Subject: kcov: properly check for softirq context
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kcov-properly-check-for-softirq-context.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andrey Konovalov <andreyknvl(a)gmail.com>
Subject: kcov: properly check for softirq context
Date: Mon, 29 Jul 2024 04:21:58 +0200
When collecting coverage from softirqs, KCOV uses in_serving_softirq() to
check whether the code is running in the softirq context. Unfortunately,
in_serving_softirq() is > 0 even when the code is running in the hardirq
or NMI context for hardirqs and NMIs that happened during a softirq.
As a result, if a softirq handler contains a remote coverage collection
section and a hardirq with another remote coverage collection section
happens during handling the softirq, KCOV incorrectly detects a nested
softirq coverate collection section and prints a WARNING, as reported by
syzbot.
This issue was exposed by commit a7f3813e589f ("usb: gadget: dummy_hcd:
Switch to hrtimer transfer scheduler"), which switched dummy_hcd to using
hrtimer and made the timer's callback be executed in the hardirq context.
Change the related checks in KCOV to account for this behavior of
in_serving_softirq() and make KCOV ignore remote coverage collection
sections in the hardirq and NMI contexts.
This prevents the WARNING printed by syzbot but does not fix the inability
of KCOV to collect coverage from the __usb_hcd_giveback_urb when dummy_hcd
is in use (caused by a7f3813e589f); a separate patch is required for that.
Link: https://lkml.kernel.org/r/20240729022158.92059-1-andrey.konovalov@linux.dev
Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts")
Signed-off-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Reported-by: syzbot+2388cdaeb6b10f0c13ac(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac
Acked-by: Marco Elver <elver(a)google.com>
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Aleksandr Nogikh <nogikh(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Marcello Sylvester Bauer <sylv(a)sylv.io>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kcov.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
--- a/kernel/kcov.c~kcov-properly-check-for-softirq-context
+++ a/kernel/kcov.c
@@ -161,6 +161,15 @@ static void kcov_remote_area_put(struct
kmsan_unpoison_memory(&area->list, sizeof(area->list));
}
+/*
+ * Unlike in_serving_softirq(), this function returns false when called during
+ * a hardirq or an NMI that happened in the softirq context.
+ */
+static inline bool in_softirq_really(void)
+{
+ return in_serving_softirq() && !in_hardirq() && !in_nmi();
+}
+
static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
{
unsigned int mode;
@@ -170,7 +179,7 @@ static notrace bool check_kcov_mode(enum
* so we ignore code executed in interrupts, unless we are in a remote
* coverage collection section in a softirq.
*/
- if (!in_task() && !(in_serving_softirq() && t->kcov_softirq))
+ if (!in_task() && !(in_softirq_really() && t->kcov_softirq))
return false;
mode = READ_ONCE(t->kcov_mode);
/*
@@ -849,7 +858,7 @@ void kcov_remote_start(u64 handle)
if (WARN_ON(!kcov_check_handle(handle, true, true, true)))
return;
- if (!in_task() && !in_serving_softirq())
+ if (!in_task() && !in_softirq_really())
return;
local_lock_irqsave(&kcov_percpu_data.lock, flags);
@@ -991,7 +1000,7 @@ void kcov_remote_stop(void)
int sequence;
unsigned long flags;
- if (!in_task() && !in_serving_softirq())
+ if (!in_task() && !in_softirq_really())
return;
local_lock_irqsave(&kcov_percpu_data.lock, flags);
_
Patches currently in -mm which might be from andreyknvl(a)gmail.com are
kcov-properly-check-for-softirq-context.patch
kcov-dont-instrument-lib-find_bitc.patch
We recently made GUP's common page table walking code to also walk hugetlb
VMAs without most hugetlb special-casing, preparing for the future of
having less hugetlb-specific page table walking code in the codebase.
Turns out that we missed one page table locking detail: page table locking
for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB
hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the
page tables, will perform a pte_offset_map_lock() to grab the PTE table
lock.
However, hugetlb that concurrently modifies these page tables would
actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the
locks would differ. Something similar can happen right now with hugetlb
folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------
[ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188
[ 3105.944634] Modules linked in: [...]
[ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1
[ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024
[ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 3105.991108] pc : try_grab_folio+0x11c/0x188
[ 3105.994013] lr : follow_page_pte+0xd8/0x430
[ 3105.996986] sp : ffff80008eafb8f0
[ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43
[ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48
[ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978
[ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001
[ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000
[ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000
[ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0
[ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080
[ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000
[ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000
[ 3106.047957] Call trace:
[ 3106.049522] try_grab_folio+0x11c/0x188
[ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0
[ 3106.055527] follow_page_mask+0x1a0/0x2b8
[ 3106.058118] __get_user_pages+0xf0/0x348
[ 3106.060647] faultin_page_range+0xb0/0x360
[ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively uses the same PT locks as any
core-mm page table walker would. Add ptep_lockptr() to obtain the PTE
page table lock using a pte pointer -- unfortunately we cannot convert
pte_lockptr() because virt_to_page() doesn't work with kmap'ed page
tables we can have with CONFIG_HIGHPTE.
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb
folio being mapped using two PTE page tables. While hugetlb wants to take
the PMD table lock, core-mm would grab the PTE table lock of one of both
PTE page tables. In such corner cases, we have to make sure that both
locks match, which is (fortunately!) currently guaranteed for 8xx as it
does not support SMP and consequently doesn't use split PT locks.
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Cc: <stable(a)vger.kernel.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
Still busy runtime-testing of this version -- have to set up my ARM
environment again. Dropped the RB's/ACKs because there was significant
change in the pte_lockptr() handling.
v1 -> 2:
* Extend patch description
* Drop "mm: let pte_lockptr() consume a pte_t pointer"
* Introduce ptep_lockptr() in this patch
I wish there was a nicer way to avoid messing with CONFIG_HIGHPTE ...
---
include/linux/hugetlb.h | 26 +++++++++++++++++++++++---
include/linux/mm.h | 10 ++++++++++
2 files changed, 33 insertions(+), 3 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index c9bf68c239a01..dd6d4ee5ee59c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -944,10 +944,30 @@ static inline bool htlb_allow_alloc_fallback(int reason)
static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
struct mm_struct *mm, pte_t *pte)
{
- if (huge_page_size(h) == PMD_SIZE)
+ VM_WARN_ON(huge_page_size(h) == PAGE_SIZE);
+ VM_WARN_ON(huge_page_size(h) >= P4D_SIZE);
+
+ /*
+ * hugetlb must use the exact same PT locks as core-mm page table
+ * walkers would. When modifying a PTE table, hugetlb must take the
+ * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD
+ * PT lock etc.
+ *
+ * The expectation is that any hugetlb folio smaller than a PMD is
+ * always mapped into a single PTE table and that any hugetlb folio
+ * smaller than a PUD (but at least as big as a PMD) is always mapped
+ * into a single PMD table.
+ *
+ * If that does not hold for an architecture, then that architecture
+ * must disable split PT locks such that all *_lockptr() functions
+ * will give us the same result: the per-MM PT lock.
+ */
+ if (huge_page_size(h) < PMD_SIZE && !IS_ENABLED(CONFIG_HIGHPTE))
+ /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */
+ return ptep_lockptr(mm, pte);
+ else if (huge_page_size(h) < PUD_SIZE)
return pmd_lockptr(mm, (pmd_t *) pte);
- VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
- return &mm->page_table_lock;
+ return pud_lockptr(mm, (pud_t *) pte);
}
#ifndef hugepages_supported
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b100df8cb5857..1b1f40ff00b7d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2926,6 +2926,12 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE));
+ return ptlock_ptr(virt_to_ptdesc(pte));
+}
+
static inline bool ptlock_init(struct ptdesc *ptdesc)
{
/*
@@ -2950,6 +2956,10 @@ static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
{
return &mm->page_table_lock;
}
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte)
+{
+ return &mm->page_table_lock;
+}
static inline void ptlock_cache_init(void) {}
static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; }
static inline void ptlock_free(struct ptdesc *ptdesc) {}
--
2.45.2
The quilt patch titled
Subject: mm: let pte_lockptr() consume a pte_t pointer
has been removed from the -mm tree. Its filename was
mm-let-pte_lockptr-consume-a-pte_t-pointer.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm: let pte_lockptr() consume a pte_t pointer
Date: Thu, 25 Jul 2024 20:39:54 +0200
Patch series "mm/hugetlb: fix hugetlb vs. core-mm PT locking".
Working on another generic page table walker that tries to avoid
special-casing hugetlb, I found a page table locking issue with hugetlb
folios that are not mapped using a single PMD/PUD.
For some hugetlb folio sizes, GUP will take different page table locks
when walking the page tables than hugetlb when modifying the page tables.
I did not actually try reproducing an issue, but looking at
follow_pmd_mask() where we might be rereading a PMD value multiple times
it's rather clear that concurrent modifications are rather unpleasant.
In follow_page_pte() we might be better in that regard -- ptep_get() does
a READ_ONCE() -- but who knows what else could happen concurrently in some
weird corner cases (e.g., hugetlb folio getting unmapped and freed).
This patch (of 2):
pte_lockptr() is the only *_lockptr() function that doesn't consume what
would be expected: it consumes a pmd_t pointer instead of a pte_t pointer.
Let's change that. The two callers in pgtable-generic.c are easily
adjusted. Adjust khugepaged.c:retract_page_tables() to simply do a
pte_offset_map_nolock() to obtain the lock, even though we won't actually
be traversing the page table.
This makes the code more similar to the other variants and avoids other
hacks to make the new pte_lockptr() version happy. pte_lockptr() users
reside now only in pgtable-generic.c.
Maybe, using pte_offset_map_nolock() is the right thing to do because the
PTE table could have been removed in the meantime? At least it sounds
more future proof if we ever have other means of page table reclaim.
It's not quite clear if holding the PTE table lock is really required:
what if someone else obtains the lock just after we unlock it? But we'll
leave that as is for now, maybe there are good reasons.
This is a preparation for adapting hugetlb page table locking logic to
take the same locks as core-mm page table walkers would.
Link: https://lkml.kernel.org/r/20240725183955.2268884-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240725183955.2268884-2-david@redhat.com
Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 7 ++++---
mm/khugepaged.c | 21 +++++++++++++++------
mm/pgtable-generic.c | 4 ++--
3 files changed, 21 insertions(+), 11 deletions(-)
--- a/include/linux/mm.h~mm-let-pte_lockptr-consume-a-pte_t-pointer
+++ a/include/linux/mm.h
@@ -2915,9 +2915,10 @@ static inline spinlock_t *ptlock_ptr(str
}
#endif /* ALLOC_SPLIT_PTLOCKS */
-static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
+static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pte_t *pte)
{
- return ptlock_ptr(page_ptdesc(pmd_page(*pmd)));
+ /* PTE page tables don't currently exceed a single page. */
+ return ptlock_ptr(virt_to_ptdesc(pte));
}
static inline bool ptlock_init(struct ptdesc *ptdesc)
@@ -2940,7 +2941,7 @@ static inline bool ptlock_init(struct pt
/*
* We use mm->page_table_lock to guard all pagetable pages of the mm.
*/
-static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pmd_t *pmd)
+static inline spinlock_t *pte_lockptr(struct mm_struct *mm, pte_t *pte)
{
return &mm->page_table_lock;
}
--- a/mm/khugepaged.c~mm-let-pte_lockptr-consume-a-pte_t-pointer
+++ a/mm/khugepaged.c
@@ -1697,12 +1697,13 @@ static void retract_page_tables(struct a
i_mmap_lock_read(mapping);
vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff, pgoff) {
struct mmu_notifier_range range;
+ bool retracted = false;
struct mm_struct *mm;
unsigned long addr;
pmd_t *pmd, pgt_pmd;
spinlock_t *pml;
spinlock_t *ptl;
- bool skipped_uffd = false;
+ pte_t *pte;
/*
* Check vma->anon_vma to exclude MAP_PRIVATE mappings that
@@ -1739,9 +1740,17 @@ static void retract_page_tables(struct a
mmu_notifier_invalidate_range_start(&range);
pml = pmd_lock(mm, pmd);
- ptl = pte_lockptr(mm, pmd);
+
+ /*
+ * No need to check the PTE table content, but we'll grab the
+ * PTE table lock while we zap it.
+ */
+ pte = pte_offset_map_nolock(mm, pmd, addr, &ptl);
+ if (!pte)
+ goto unlock_pmd;
if (ptl != pml)
spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+ pte_unmap(pte);
/*
* Huge page lock is still held, so normally the page table
@@ -1752,20 +1761,20 @@ static void retract_page_tables(struct a
* repeating the anon_vma check protects from one category,
* and repeating the userfaultfd_wp() check from another.
*/
- if (unlikely(vma->anon_vma || userfaultfd_wp(vma))) {
- skipped_uffd = true;
- } else {
+ if (likely(!vma->anon_vma && !userfaultfd_wp(vma))) {
pgt_pmd = pmdp_collapse_flush(vma, addr, pmd);
pmdp_get_lockless_sync();
+ retracted = true;
}
if (ptl != pml)
spin_unlock(ptl);
+unlock_pmd:
spin_unlock(pml);
mmu_notifier_invalidate_range_end(&range);
- if (!skipped_uffd) {
+ if (retracted) {
mm_dec_nr_ptes(mm);
page_table_check_pte_clear_range(mm, addr, pgt_pmd);
pte_free_defer(mm, pmd_pgtable(pgt_pmd));
--- a/mm/pgtable-generic.c~mm-let-pte_lockptr-consume-a-pte_t-pointer
+++ a/mm/pgtable-generic.c
@@ -313,7 +313,7 @@ pte_t *pte_offset_map_nolock(struct mm_s
pte = __pte_offset_map(pmd, addr, &pmdval);
if (likely(pte))
- *ptlp = pte_lockptr(mm, &pmdval);
+ *ptlp = pte_lockptr(mm, pte);
return pte;
}
@@ -371,7 +371,7 @@ again:
pte = __pte_offset_map(pmd, addr, &pmdval);
if (unlikely(!pte))
return pte;
- ptl = pte_lockptr(mm, &pmdval);
+ ptl = pte_lockptr(mm, pte);
spin_lock(ptl);
if (likely(pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
*ptlp = ptl;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-let-pte_lockptr-consume-a-pte_t-pointer-fix.patch
mm-hugetlb-fix-hugetlb-vs-core-mm-pt-locking.patch
mm-turn-use_split_pte_ptlocks-use_split_pte_ptlocks-into-kconfig-options.patch
mm-hugetlb-enforce-that-pmd-pt-sharing-has-split-pmd-pt-locks.patch
powerpc-8xx-document-and-enforce-that-split-pt-locks-are-not-used.patch
mm-simplify-arch_make_folio_accessible.patch
mm-gup-convert-to-arch_make_folio_accessible.patch
s390-uv-drop-arch_make_page_accessible.patch
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
With RME Fireface 800, this lead to a regression since
Kernels 5.14.0, causing an AB/BA deadlock competition for the
substream lock with eventual system freeze under ALSA operation:
thread 0:
* (lock A) acquire substream lock by
snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) wait for tasklet to finish by calling
tasklet_unlock_spin_wait() in
tasklet_disable_in_atomic() in
ohci_flush_iso_completions() of ohci.c
thread 1:
* (lock B) enter tasklet
* (lock A) attempt to acquire substream lock,
waiting for it to be released:
snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed() in
update_pcm_pointers() in
process_ctx_payloads() in
process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
Restore the process context work queue to prevent deadlock
AB/BA deadlock competition for ALSA substream lock of
snd_pcm_stream_lock_irq() in snd_pcm_status64()
and snd_pcm_stream_lock_irqsave() in snd_pcm_period_elapsed().
revert commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period
elapse event in process context")
Replace inline description to prevent future deadlock.
Cc: stable(a)vger.kernel.org
Fixes: 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event in process context")
Reported-by: edmund.raile <edmund.raile(a)proton.me>
Closes: https://lore.kernel.org/r/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg76hzeo4simn…
Signed-off-by: Edmund Raile <edmund.raile(a)protonmail.com>
---
sound/firewire/amdtp-stream.c | 23 +++++++++--------------
1 file changed, 9 insertions(+), 14 deletions(-)
diff --git a/sound/firewire/amdtp-stream.c b/sound/firewire/amdtp-stream.c
index 31201d506a21..7438999e0510 100644
--- a/sound/firewire/amdtp-stream.c
+++ b/sound/firewire/amdtp-stream.c
@@ -615,16 +615,8 @@ static void update_pcm_pointers(struct amdtp_stream *s,
// The program in user process should periodically check the status of intermediate
// buffer associated to PCM substream to process PCM frames in the buffer, instead
// of receiving notification of period elapsed by poll wait.
- if (!pcm->runtime->no_period_wakeup) {
- if (in_softirq()) {
- // In software IRQ context for 1394 OHCI.
- snd_pcm_period_elapsed(pcm);
- } else {
- // In process context of ALSA PCM application under acquired lock of
- // PCM substream.
- snd_pcm_period_elapsed_under_stream_lock(pcm);
- }
- }
+ if (!pcm->runtime->no_period_wakeup)
+ queue_work(system_highpri_wq, &s->period_work);
}
}
@@ -1864,11 +1856,14 @@ unsigned long amdtp_domain_stream_pcm_pointer(struct amdtp_domain *d,
{
struct amdtp_stream *irq_target = d->irq_target;
- // Process isochronous packets queued till recent isochronous cycle to handle PCM frames.
if (irq_target && amdtp_stream_running(irq_target)) {
- // In software IRQ context, the call causes dead-lock to disable the tasklet
- // synchronously.
- if (!in_softirq())
+ // use wq to prevent AB/BA deadlock competition for
+ // substream lock:
+ // fw_iso_context_flush_completions() acquires
+ // lock by ohci_flush_iso_completions(),
+ // amdtp-stream process_rx_packets() attempts to
+ // acquire same lock by snd_pcm_elapsed()
+ if (current_work() != &s->period_work)
fw_iso_context_flush_completions(irq_target->context);
}
--
2.45.2
This piece was missing in commit ae678317b95e ("netfs: Remove
deprecated use of PG_private_2 as a second writeback flag").
There is one remaining use of PG_private_2: the function
__fscache_clear_page_bits(), whose only purpose is to clear
PG_private_2. This is done via folio_end_private_2() which also
releases the folio reference which was supposed to be taken by
folio_start_private_2() (via ceph_set_page_fscache()).
__fscache_clear_page_bits() is called by __fscache_write_to_cache(),
but only if the parameter using_pgpriv2 is true; the only caller of
that function is ceph_fscache_write_to_cache() which still passes
true.
By calling folio_end_private_2() without folio_start_private_2(), the
folio refcounter breaks and causes trouble like RCU stalls and general
protection faults.
Cc: stable(a)vger.kernel.org
Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag")
Link: https://lore.kernel.org/ceph-devel/CAKPOu+_DA8XiMAA2ApMj7Pyshve_YWknw8Hdt1=…
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
---
fs/ceph/addr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 8c16bc5250ef..aacea3e8fd6d 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -512,7 +512,7 @@ static void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 len, b
struct fscache_cookie *cookie = ceph_fscache_cookie(ci);
fscache_write_to_cache(cookie, inode->i_mapping, off, len, i_size_read(inode),
- ceph_fscache_write_terminated, inode, true, caching);
+ ceph_fscache_write_terminated, inode, false, caching);
}
#else
static inline void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 len, bool caching)
--
2.43.0
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 943ad0b62e3c21f324c4884caa6cb4a871bca05c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072940-parish-shirt-3e49@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
943ad0b62e3c ("kernel: rerun task_work while freezing in get_signal()")
f5d39b020809 ("freezer,sched: Rewrite core freezer logic")
9963e444f71e ("sched: Widen TAKS_state literals")
f9fc8cad9728 ("sched: Add TASK_ANY for wait_task_inactive()")
9204a97f7ae8 ("sched: Change wait_task_inactive()s match_state")
1fbcaa923ce2 ("freezer,umh: Clean up freezer/initrd interaction")
5950e5d574c6 ("freezer: Have {,un}lock_system_sleep() save/restore flags")
0b9d46fc5ef7 ("sched: Rename task_running() to task_on_cpu()")
8386c414e27c ("PM: hibernate: defer device probing when resuming from hibernation")
57b6de08b5f6 ("ptrace: Admit ptrace_stop can generate spuriuos SIGTRAPs")
7b0fe1367ef2 ("ptrace: Document that wait_task_inactive can't fail")
1930a6e739c4 ("Merge tag 'ptrace-cleanups-for-v5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 943ad0b62e3c21f324c4884caa6cb4a871bca05c Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Wed, 10 Jul 2024 18:58:18 +0100
Subject: [PATCH] kernel: rerun task_work while freezing in get_signal()
io_uring can asynchronously add a task_work while the task is getting
freezed. TIF_NOTIFY_SIGNAL will prevent the task from sleeping in
do_freezer_trap(), and since the get_signal()'s relock loop doesn't
retry task_work, the task will spin there not being able to sleep
until the freezing is cancelled / the task is killed / etc.
Run task_works in the freezer path. Keep the patch small and simple
so it can be easily back ported, but we might need to do some cleaning
after and look if there are other places with similar problems.
Cc: stable(a)vger.kernel.org
Link: https://github.com/systemd/systemd/issues/33626
Fixes: 12db8b690010c ("entry: Add support for TIF_NOTIFY_SIGNAL")
Reported-by: Julian Orth <ju.orth(a)gmail.com>
Acked-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/89ed3a52933370deaaf61a0a620a6ac91f1e754d.17206341…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/kernel/signal.c b/kernel/signal.c
index 1f9dd41c04be..60c737e423a1 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2600,6 +2600,14 @@ static void do_freezer_trap(void)
spin_unlock_irq(¤t->sighand->siglock);
cgroup_enter_frozen();
schedule();
+
+ /*
+ * We could've been woken by task_work, run it to clear
+ * TIF_NOTIFY_SIGNAL. The caller will retry if necessary.
+ */
+ clear_notify_signal();
+ if (unlikely(task_work_pending(current)))
+ task_work_run();
}
static int ptrace_signal(int signr, kernel_siginfo_t *info, enum pid_type type)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0453aad676ff99787124b9b3af4a5f59fbe808e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072924-robin-manger-e92b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
0453aad676ff ("io_uring/io-wq: limit retrying worker initialisation")
8a565304927f ("io_uring/io-wq: Use set_bit() and test_bit() at worker->flags")
eb47943f2238 ("io-wq: Drop struct io_wqe")
dfd63baf892c ("io-wq: Move wq accounting to io_wq")
da64d6db3bd3 ("io_uring: One wqe per wq")
01e68ce08a30 ("io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers")
996d3efeb091 ("io-wq: Fix memory leak in worker creation")
024f15e033a5 ("io_uring: dedup io_run_task_work")
a6b21fbb4ce3 ("io_uring: move list helpers to a separate file")
ab1c84d855cf ("io_uring: make io_uring_types.h public")
48c13d898084 ("io_uring: explain io_wq_work::cancel_seq placement")
e418bbc97bff ("io_uring: move our reference counting into a header")
cd40cae29ef8 ("io_uring: split out open/close operations")
453b329be5ea ("io_uring: separate out file table handling code")
f4c163dd7d4b ("io_uring: split out fadvise/madvise operations")
0d5847274037 ("io_uring: split out fs related sync/fallocate functions")
531113bbd5bf ("io_uring: split out splice related operations")
11aeb71406dd ("io_uring: split out filesystem related operations")
e28683bdfc2f ("io_uring: move nop into its own file")
5e2a18d93fec ("io_uring: move xattr related opcodes to its own file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0453aad676ff99787124b9b3af4a5f59fbe808e2 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Wed, 10 Jul 2024 18:58:17 +0100
Subject: [PATCH] io_uring/io-wq: limit retrying worker initialisation
If io-wq worker creation fails, we retry it by queueing up a task_work.
tasK_work is needed because it should be done from the user process
context. The problem is that retries are not limited, and if queueing a
task_work is the reason for the failure, we might get into an infinite
loop.
It doesn't seem to happen now but it would with the following patch
executing task_work in the freezer's loop. For now, arbitrarily limit the
number of attempts to create a worker.
Cc: stable(a)vger.kernel.org
Fixes: 3146cba99aa28 ("io-wq: make worker creation resilient against signals")
Reported-by: Julian Orth <ju.orth(a)gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/8280436925db88448c7c85c6656edee1a43029ea.17206341…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 913c92249522..f1e7c670add8 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -23,6 +23,7 @@
#include "io_uring.h"
#define WORKER_IDLE_TIMEOUT (5 * HZ)
+#define WORKER_INIT_LIMIT 3
enum {
IO_WORKER_F_UP = 0, /* up and active */
@@ -58,6 +59,7 @@ struct io_worker {
unsigned long create_state;
struct callback_head create_work;
+ int init_retries;
union {
struct rcu_head rcu;
@@ -745,7 +747,7 @@ static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
return true;
}
-static inline bool io_should_retry_thread(long err)
+static inline bool io_should_retry_thread(struct io_worker *worker, long err)
{
/*
* Prevent perpetual task_work retry, if the task (or its group) is
@@ -753,6 +755,8 @@ static inline bool io_should_retry_thread(long err)
*/
if (fatal_signal_pending(current))
return false;
+ if (worker->init_retries++ >= WORKER_INIT_LIMIT)
+ return false;
switch (err) {
case -EAGAIN:
@@ -779,7 +783,7 @@ static void create_worker_cont(struct callback_head *cb)
io_init_new_worker(wq, worker, tsk);
io_worker_release(worker);
return;
- } else if (!io_should_retry_thread(PTR_ERR(tsk))) {
+ } else if (!io_should_retry_thread(worker, PTR_ERR(tsk))) {
struct io_wq_acct *acct = io_wq_get_acct(worker);
atomic_dec(&acct->nr_running);
@@ -846,7 +850,7 @@ static bool create_io_worker(struct io_wq *wq, int index)
tsk = create_io_thread(io_wq_worker, worker, NUMA_NO_NODE);
if (!IS_ERR(tsk)) {
io_init_new_worker(wq, worker, tsk);
- } else if (!io_should_retry_thread(PTR_ERR(tsk))) {
+ } else if (!io_should_retry_thread(worker, PTR_ERR(tsk))) {
kfree(worker);
goto fail;
} else {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0453aad676ff99787124b9b3af4a5f59fbe808e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072923-bodacious-claw-442b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0453aad676ff ("io_uring/io-wq: limit retrying worker initialisation")
8a565304927f ("io_uring/io-wq: Use set_bit() and test_bit() at worker->flags")
eb47943f2238 ("io-wq: Drop struct io_wqe")
dfd63baf892c ("io-wq: Move wq accounting to io_wq")
da64d6db3bd3 ("io_uring: One wqe per wq")
01e68ce08a30 ("io_uring/io-wq: stop setting PF_NO_SETAFFINITY on io-wq workers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0453aad676ff99787124b9b3af4a5f59fbe808e2 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Wed, 10 Jul 2024 18:58:17 +0100
Subject: [PATCH] io_uring/io-wq: limit retrying worker initialisation
If io-wq worker creation fails, we retry it by queueing up a task_work.
tasK_work is needed because it should be done from the user process
context. The problem is that retries are not limited, and if queueing a
task_work is the reason for the failure, we might get into an infinite
loop.
It doesn't seem to happen now but it would with the following patch
executing task_work in the freezer's loop. For now, arbitrarily limit the
number of attempts to create a worker.
Cc: stable(a)vger.kernel.org
Fixes: 3146cba99aa28 ("io-wq: make worker creation resilient against signals")
Reported-by: Julian Orth <ju.orth(a)gmail.com>
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/8280436925db88448c7c85c6656edee1a43029ea.17206341…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c
index 913c92249522..f1e7c670add8 100644
--- a/io_uring/io-wq.c
+++ b/io_uring/io-wq.c
@@ -23,6 +23,7 @@
#include "io_uring.h"
#define WORKER_IDLE_TIMEOUT (5 * HZ)
+#define WORKER_INIT_LIMIT 3
enum {
IO_WORKER_F_UP = 0, /* up and active */
@@ -58,6 +59,7 @@ struct io_worker {
unsigned long create_state;
struct callback_head create_work;
+ int init_retries;
union {
struct rcu_head rcu;
@@ -745,7 +747,7 @@ static bool io_wq_work_match_all(struct io_wq_work *work, void *data)
return true;
}
-static inline bool io_should_retry_thread(long err)
+static inline bool io_should_retry_thread(struct io_worker *worker, long err)
{
/*
* Prevent perpetual task_work retry, if the task (or its group) is
@@ -753,6 +755,8 @@ static inline bool io_should_retry_thread(long err)
*/
if (fatal_signal_pending(current))
return false;
+ if (worker->init_retries++ >= WORKER_INIT_LIMIT)
+ return false;
switch (err) {
case -EAGAIN:
@@ -779,7 +783,7 @@ static void create_worker_cont(struct callback_head *cb)
io_init_new_worker(wq, worker, tsk);
io_worker_release(worker);
return;
- } else if (!io_should_retry_thread(PTR_ERR(tsk))) {
+ } else if (!io_should_retry_thread(worker, PTR_ERR(tsk))) {
struct io_wq_acct *acct = io_wq_get_acct(worker);
atomic_dec(&acct->nr_running);
@@ -846,7 +850,7 @@ static bool create_io_worker(struct io_wq *wq, int index)
tsk = create_io_thread(io_wq_worker, worker, NUMA_NO_NODE);
if (!IS_ERR(tsk)) {
io_init_new_worker(wq, worker, tsk);
- } else if (!io_should_retry_thread(PTR_ERR(tsk))) {
+ } else if (!io_should_retry_thread(worker, PTR_ERR(tsk))) {
kfree(worker);
goto fail;
} else {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 30d77b7eef019fa4422980806e8b7cdc8674493e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072912-during-vitalize-fe0c@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
30d77b7eef01 ("mm/mglru: fix ineffective protection calculation")
3f74e6bd3b84 ("mm/mglru: fix overshooting shrinker memory")
4acef5694e01 ("mm/mglru: improve swappiness handling")
745b13e647cd ("mm/mglru: remove CONFIG_MEMCG")
4376807bf2d5 ("mm/mglru: reclaim offlined memcgs harder")
8aa420617918 ("mm/mglru: respect min_ttl_ms with memcgs")
5095a2b23987 ("mm/mglru: try to stop at high watermarks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 30d77b7eef019fa4422980806e8b7cdc8674493e Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao(a)google.com>
Date: Fri, 12 Jul 2024 17:29:56 -0600
Subject: [PATCH] mm/mglru: fix ineffective protection calculation
mem_cgroup_calculate_protection() is not stateless and should only be used
as part of a top-down tree traversal. shrink_one() traverses the per-node
memcg LRU instead of the root_mem_cgroup tree, and therefore it should not
call mem_cgroup_calculate_protection().
The existing misuse in shrink_one() can cause ineffective protection of
sub-trees that are grandchildren of root_mem_cgroup. Fix it by reusing
lru_gen_age_node(), which already traverses the root_mem_cgroup tree, to
calculate the protection.
Previously lru_gen_age_node() opportunistically skips the first pass,
i.e., when scan_control->priority is DEF_PRIORITY. On the second pass,
lruvec_is_sizable() uses appropriate scan_control->priority, set by
set_initial_priority() from lru_gen_shrink_node(), to decide whether a
memcg is too small to reclaim from.
Now lru_gen_age_node() unconditionally traverses the root_mem_cgroup tree.
So it should call set_initial_priority() upfront, to make sure
lruvec_is_sizable() uses appropriate scan_control->priority on the first
pass. Otherwise, lruvec_is_reclaimable() can return false negatives and
result in premature OOM kills when min_ttl_ms is used.
Link: https://lkml.kernel.org/r/20240712232956.1427127-1-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: T.J. Mercier <tjmercier(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6216d79edb7f..525d3ffa8451 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3915,6 +3915,32 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq,
* working set protection
******************************************************************************/
+static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ int priority;
+ unsigned long reclaimable;
+
+ if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
+ return;
+ /*
+ * Determine the initial priority based on
+ * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
+ * where reclaimed_to_scanned_ratio = inactive / total.
+ */
+ reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
+ if (can_reclaim_anon_pages(NULL, pgdat->node_id, sc))
+ reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+ /* round down reclaimable and round up sc->nr_to_reclaim */
+ priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
+
+ /*
+ * The estimation is based on LRU pages only, so cap it to prevent
+ * overshoots of shrinker objects by large margins.
+ */
+ sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
+}
+
static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
int gen, type, zone;
@@ -3948,19 +3974,17 @@ static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MIN_SEQ(lruvec);
- /* see the comment on lru_gen_folio */
- gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
- birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
-
- if (time_is_after_jiffies(birth + min_ttl))
+ if (mem_cgroup_below_min(NULL, memcg))
return false;
if (!lruvec_is_sizable(lruvec, sc))
return false;
- mem_cgroup_calculate_protection(NULL, memcg);
+ /* see the comment on lru_gen_folio */
+ gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
+ birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
- return !mem_cgroup_below_min(NULL, memcg);
+ return time_is_before_jiffies(birth + min_ttl);
}
/* to protect the working set of the last N jiffies */
@@ -3970,23 +3994,20 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct mem_cgroup *memcg;
unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
+ bool reclaimable = !min_ttl;
VM_WARN_ON_ONCE(!current_is_kswapd());
- /* check the order to exclude compaction-induced reclaim */
- if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
- return;
+ set_initial_priority(pgdat, sc);
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
- mem_cgroup_iter_break(NULL, memcg);
- return;
- }
+ mem_cgroup_calculate_protection(NULL, memcg);
- cond_resched();
+ if (!reclaimable)
+ reclaimable = lruvec_is_reclaimable(lruvec, sc, min_ttl);
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
/*
@@ -3994,7 +4015,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
* younger than min_ttl. However, another possibility is all memcgs are
* either too small or below min.
*/
- if (mutex_trylock(&oom_lock)) {
+ if (!reclaimable && mutex_trylock(&oom_lock)) {
struct oom_control oc = {
.gfp_mask = sc->gfp_mask,
};
@@ -4786,8 +4807,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
- mem_cgroup_calculate_protection(NULL, memcg);
-
+ /* lru_gen_age_node() called mem_cgroup_calculate_protection() */
if (mem_cgroup_below_min(NULL, memcg))
return MEMCG_LRU_YOUNG;
@@ -4911,32 +4931,6 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
blk_finish_plug(&plug);
}
-static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
-{
- int priority;
- unsigned long reclaimable;
-
- if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
- return;
- /*
- * Determine the initial priority based on
- * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
- * where reclaimed_to_scanned_ratio = inactive / total.
- */
- reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
- if (can_reclaim_anon_pages(NULL, pgdat->node_id, sc))
- reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
-
- /* round down reclaimable and round up sc->nr_to_reclaim */
- priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
-
- /*
- * The estimation is based on LRU pages only, so cap it to prevent
- * overshoots of shrinker objects by large margins.
- */
- sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
-}
-
static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct blk_plug plug;
commit 4811f7af6090e8f5a398fbdd766f903ef6c0d787 upstream.
Syzbot reported that a buffer state inconsistency was detected in
nilfs_btnode_create_block(), triggering a kernel bug.
It is not appropriate to treat this inconsistency as a bug; it can occur
if the argument block address (the buffer index of the newly created
block) is a virtual block number and has been reallocated due to
corruption of the bitmap used to manage its allocation state.
So, modify nilfs_btnode_create_block() and its callers to treat it as a
possible filesystem error, rather than triggering a kernel bug.
Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com
Fixes: a60be987d45d ("nilfs2: B-tree node cache")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+89cc4f2324ed37988b60(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Please apply this patch to the stable trees indicated by the subject
prefix instead of the failed patches or the one I asked you to drop.
This patch is tailored to take page/folio conversion into account and
can be applied from v4.11 to v6.7.
Also, all the builds and tests I did on each stable tree passed.
Thanks,
Ryusuke Konishi
fs/nilfs2/btnode.c | 25 ++++++++++++++++++++-----
fs/nilfs2/btree.c | 4 ++--
2 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 5710833ac1cc..8fe348bceabe 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -51,12 +51,21 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node));
if (unlikely(!bh))
- return NULL;
+ return ERR_PTR(-ENOMEM);
if (unlikely(buffer_mapped(bh) || buffer_uptodate(bh) ||
buffer_dirty(bh))) {
- brelse(bh);
- BUG();
+ /*
+ * The block buffer at the specified new address was already
+ * in use. This can happen if it is a virtual block number
+ * and has been reallocated due to corruption of the bitmap
+ * used to manage its allocation state (if not, the buffer
+ * clearing of an abandoned b-tree node is missing somewhere).
+ */
+ nilfs_error(inode->i_sb,
+ "state inconsistency probably due to duplicate use of b-tree node block address %llu (ino=%lu)",
+ (unsigned long long)blocknr, inode->i_ino);
+ goto failed;
}
memset(bh->b_data, 0, i_blocksize(inode));
bh->b_bdev = inode->i_sb->s_bdev;
@@ -67,6 +76,12 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
unlock_page(bh->b_page);
put_page(bh->b_page);
return bh;
+
+failed:
+ unlock_page(bh->b_page);
+ put_page(bh->b_page);
+ brelse(bh);
+ return ERR_PTR(-EIO);
}
int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
@@ -217,8 +232,8 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
}
nbh = nilfs_btnode_create_block(btnc, newkey);
- if (!nbh)
- return -ENOMEM;
+ if (IS_ERR(nbh))
+ return PTR_ERR(nbh);
BUG_ON(nbh == obh);
ctxt->newbh = nbh;
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index 65659fa0372e..598f05867059 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -63,8 +63,8 @@ static int nilfs_btree_get_new_block(const struct nilfs_bmap *btree,
struct buffer_head *bh;
bh = nilfs_btnode_create_block(btnc, ptr);
- if (!bh)
- return -ENOMEM;
+ if (IS_ERR(bh))
+ return PTR_ERR(bh);
set_buffer_nilfs_volatile(bh);
*bhp = bh;
--
2.43.5
Upstream commit aba3a8d01d62 ("usb: gadget: u_serial: add suspend
resume callbacks") added started_delayed flag, so that new ports
which are opened after USB suspend can start IO while resuming.
But if the port was already opened, and gadget suspend kicks in
afterwards, start_delayed will never be set. This causes resume
to bail out before calling gs_start_io(). Fix this by setting
start_delayed during suspend.
Fixes: aba3a8d01d62 ("usb: gadget: u_serial: add suspend resume callbacks")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Prashanth K <quic_prashk(a)quicinc.com>
---
drivers/usb/gadget/function/u_serial.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index eec7f7a2e40f..b394105e55d6 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -1441,6 +1441,7 @@ void gserial_suspend(struct gserial *gser)
spin_lock(&port->port_lock);
spin_unlock(&serial_port_lock);
port->suspended = true;
+ port->start_delayed = true;
spin_unlock_irqrestore(&port->port_lock, flags);
}
EXPORT_SYMBOL_GPL(gserial_suspend);
--
2.25.1
From: Edward Adam Davis <eadavis(a)qq.com>
[ Upstream commit ce6dede912f064a855acf6f04a04cbb2c25b8c8c ]
[syzbot reported]
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 5061 Comm: syz-executor404 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
RIP: 0010:dtInsertEntry+0xd0c/0x1780 fs/jfs/jfs_dtree.c:3713
...
[Analyze]
In dtInsertEntry(), when the pointer h has the same value as p, after writing
name in UniStrncpy_to_le(), p->header.flag will be cleared. This will cause the
previously true judgment "p->header.flag & BT-LEAF" to change to no after writing
the name operation, this leads to entering an incorrect branch and accessing the
uninitialized object ih when judging this condition for the second time.
[Fix]
After got the page, check freelist first, if freelist == 0 then exit dtInsert()
and return -EINVAL.
Reported-by: syzbot+bba84aef3a26fb93deb9(a)syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis(a)qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/jfs/jfs_dtree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/jfs/jfs_dtree.c b/fs/jfs/jfs_dtree.c
index ea2c8f0fe832c..0c16f7f8eaa2b 100644
--- a/fs/jfs/jfs_dtree.c
+++ b/fs/jfs/jfs_dtree.c
@@ -847,6 +847,8 @@ int dtInsert(tid_t tid, struct inode *ip,
* the full page.
*/
DT_GETSEARCH(ip, btstack->top, bn, mp, p, index);
+ if (p->header.freelist == 0)
+ return -EINVAL;
/*
* insert entry for new key
--
2.43.0
From: Edward Adam Davis <eadavis(a)qq.com>
[ Upstream commit ce6dede912f064a855acf6f04a04cbb2c25b8c8c ]
[syzbot reported]
general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 5061 Comm: syz-executor404 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
RIP: 0010:dtInsertEntry+0xd0c/0x1780 fs/jfs/jfs_dtree.c:3713
...
[Analyze]
In dtInsertEntry(), when the pointer h has the same value as p, after writing
name in UniStrncpy_to_le(), p->header.flag will be cleared. This will cause the
previously true judgment "p->header.flag & BT-LEAF" to change to no after writing
the name operation, this leads to entering an incorrect branch and accessing the
uninitialized object ih when judging this condition for the second time.
[Fix]
After got the page, check freelist first, if freelist == 0 then exit dtInsert()
and return -EINVAL.
Reported-by: syzbot+bba84aef3a26fb93deb9(a)syzkaller.appspotmail.com
Signed-off-by: Edward Adam Davis <eadavis(a)qq.com>
Signed-off-by: Dave Kleikamp <dave.kleikamp(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/jfs/jfs_dtree.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/jfs/jfs_dtree.c b/fs/jfs/jfs_dtree.c
index 077a87e530205..3bcfb37a9c1f6 100644
--- a/fs/jfs/jfs_dtree.c
+++ b/fs/jfs/jfs_dtree.c
@@ -834,6 +834,8 @@ int dtInsert(tid_t tid, struct inode *ip,
* the full page.
*/
DT_GETSEARCH(ip, btstack->top, bn, mp, p, index);
+ if (p->header.freelist == 0)
+ return -EINVAL;
/*
* insert entry for new key
--
2.43.0
From: Christoph Hellwig <hch(a)lst.de>
[ Upstream commit bf4c89fc8797f5c0964a0c3d561fbe7e8483b62f ]
Commit b222dd2fdd53 ("block: call bio_uninit in bio_endio") added a call
to bio_uninit in bio_endio to work around callers that use bio_init but
fail to call bio_uninit after they are done to release the resources.
While this is an abuse of the bio_init API we still have quite a few of
those left. But this early uninit causes a problem for integrity data,
as at least some users need the bio_integrity_payload. Right now the
only one is the NVMe passthrough which archives this by adding a special
case to skip the freeing if the BIP_INTEGRITY_USER flag is set.
Sort this out by only putting bi_blkg in bio_endio as that is the cause
of the actual leaks - the few users of the crypto context and integrity
data all properly call bio_uninit, usually through bio_put for
dynamically allocated bios.
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Link: https://lore.kernel.org/r/20240702151047.1746127-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
block/bio.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index e9e809a63c597..c7a4bc05c43e7 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1630,8 +1630,18 @@ void bio_endio(struct bio *bio)
goto again;
}
- /* release cgroup info */
- bio_uninit(bio);
+#ifdef CONFIG_BLK_CGROUP
+ /*
+ * Release cgroup info. We shouldn't have to do this here, but quite
+ * a few callers of bio_init fail to call bio_uninit, so we cover up
+ * for that here at least for now.
+ */
+ if (bio->bi_blkg) {
+ blkg_put(bio->bi_blkg);
+ bio->bi_blkg = NULL;
+ }
+#endif
+
if (bio->bi_end_io)
bio->bi_end_io(bio);
}
--
2.43.0
On the off chance that clock value ends up being too high (by means
of skl_ddi_calculate_wrpll() having benn called with big enough
value of crtc_state->port_clock * 1000), one possible consequence
may be that the result will not be able to fit into signed int.
Fix this issue by moving conversion of clock parameter from kHz to Hz
into the body of skl_ddi_calculate_wrpll(), as well as casting the
same parameter to u64 type while calculating the value for AFE clock.
This both mitigates the overflow problem and avoids possible erroneous
integer promotion mishaps.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: fe70b262e781 ("drm/i915: Move a bunch of stuff into rodata from the stack")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
---
v2: instead of double casting of 'clock' with (u64)(u32), convert
'clock' to Hz inside skl_ddi_calculate_wrpll() and cast it only
to u64 to mitigate the issue. Per Jani's <jani.nikula(a)linux.intel.com>
helpful suggestion made here:
https://lore.kernel.org/all/87ed7gzhin.fsf@intel.com/
Also, change commit description accordingly.
v1: https://lore.kernel.org/all/20240724184911.12250-1-n.zhandarovich@fintech.r…
drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
index 90998b037349..292d163036b1 100644
--- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
+++ b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
@@ -1658,7 +1658,7 @@ static void skl_wrpll_params_populate(struct skl_wrpll_params *params,
}
static int
-skl_ddi_calculate_wrpll(int clock /* in Hz */,
+skl_ddi_calculate_wrpll(int clock,
int ref_clock,
struct skl_wrpll_params *wrpll_params)
{
@@ -1683,7 +1683,7 @@ skl_ddi_calculate_wrpll(int clock /* in Hz */,
};
unsigned int dco, d, i;
unsigned int p0, p1, p2;
- u64 afe_clock = clock * 5; /* AFE Clock is 5x Pixel clock */
+ u64 afe_clock = (u64)clock * 1000 * 5; /* AFE Clock is 5x Pixel clock, in Hz */
for (d = 0; d < ARRAY_SIZE(dividers); d++) {
for (dco = 0; dco < ARRAY_SIZE(dco_central_freq); dco++) {
@@ -1808,7 +1808,7 @@ static int skl_ddi_hdmi_pll_dividers(struct intel_crtc_state *crtc_state)
struct skl_wrpll_params wrpll_params = {};
int ret;
- ret = skl_ddi_calculate_wrpll(crtc_state->port_clock * 1000,
+ ret = skl_ddi_calculate_wrpll(crtc_state->port_clock,
i915->display.dpll.ref_clks.nssc, &wrpll_params);
if (ret)
return ret;
commit 11a1f4bc47362700fcbde717292158873fb847ed upstream.
Keith reports a use-after-free when a DPC event occurs concurrently to
hot-removal of the same portion of the hierarchy:
The dpc_handler() awaits readiness of the secondary bus below the
Downstream Port where the DPC event occurred. To do so, it polls the
config space of the first child device on the secondary bus. If that
child device is concurrently removed, accesses to its struct pci_dev
cause the kernel to oops.
That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
reference on the child device. Before v6.3, the function was only
called on resume from system sleep or on runtime resume. Holding a
reference wasn't necessary back then because the pciehp IRQ thread
could never run concurrently. (On resume from system sleep, IRQs are
not enabled until after the resume_noirq phase. And runtime resume is
always awaited before a PCI device is removed.)
However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness
of secondary bus after reset"), which introduced that, failed to
appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
reference on the child device because dpc_handler() and pciehp may
indeed run concurrently. The commit was backported to v5.10+ stable
kernels, so that's the oldest one affected.
Add the missing reference acquisition.
Abridged stack trace:
BUG: unable to handle page fault for address: 00000000091400c0
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
RIP: pci_bus_read_config_dword+0x17/0x50
pci_dev_wait()
pci_bridge_wait_for_secondary_bus()
dpc_reset_link()
pcie_do_recovery()
dpc_handler()
Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset")
Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/
Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.…
Reported-by: Keith Busch <kbusch(a)kernel.org>
Tested-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski(a)kernel.org>
Reviewed-by: Keith Busch <kbusch(a)kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.10+
---
drivers/pci/pci.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 67216f4ea215..a88909f2ae65 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4916,7 +4916,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
int timeout)
{
struct pci_dev *child;
- int delay;
+ int delay, ret = 0;
if (pci_dev_is_disconnected(dev))
return 0;
@@ -4944,8 +4944,8 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
return 0;
}
- child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
- bus_list);
+ child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
+ struct pci_dev, bus_list));
up_read(&pci_bus_sem);
/*
@@ -4955,7 +4955,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
if (!pci_is_pcie(dev)) {
pci_dbg(dev, "waiting %d ms for secondary bus\n", 1000 + delay);
msleep(1000 + delay);
- return 0;
+ goto put_child;
}
/*
@@ -4976,7 +4976,7 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
* until the timeout expires.
*/
if (!pcie_downstream_port(dev))
- return 0;
+ goto put_child;
if (pcie_get_speed_cap(dev) <= PCIE_SPEED_5_0GT) {
pci_dbg(dev, "waiting %d ms for downstream link\n", delay);
@@ -4987,11 +4987,16 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type,
if (!pcie_wait_for_link_delay(dev, true, delay)) {
/* Did not train, no need to wait any further */
pci_info(dev, "Data Link Layer Link Active not set in 1000 msec\n");
- return -ENOTTY;
+ ret = -ENOTTY;
+ goto put_child;
}
}
- return pci_dev_wait(child, reset_type, timeout - delay);
+ ret = pci_dev_wait(child, reset_type, timeout - delay);
+
+put_child:
+ pci_dev_put(child);
+ return ret;
}
void pci_reset_secondary_bus(struct pci_dev *dev)
--
2.43.0
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073021-strut-specimen-8aad@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
2237ceb71f89 ("rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov(a)gmail.com>
Date: Tue, 23 Jul 2024 18:07:59 +0200
Subject: [PATCH] rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive
mappings
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
being triggered on the next watch error.
Cc: stable(a)vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
Cc: stable(a)vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c30d227753d7..ea6c592e015c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3457,6 +3457,7 @@ static void rbd_lock_del_request(struct rbd_img_request *img_req)
lockdep_assert_held(&rbd_dev->lock_rwsem);
spin_lock(&rbd_dev->lock_lists_lock);
if (!list_empty(&img_req->lock_item)) {
+ rbd_assert(!list_empty(&rbd_dev->running_list));
list_del_init(&img_req->lock_item);
need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_QUIESCING &&
list_empty(&rbd_dev->running_list));
@@ -3476,11 +3477,6 @@ static int rbd_img_exclusive_lock(struct rbd_img_request *img_req)
if (rbd_lock_add_request(img_req))
return 1;
- if (rbd_dev->opts->exclusive) {
- WARN_ON(1); /* lock got released? */
- return -EROFS;
- }
-
/*
* Note the use of mod_delayed_work() in rbd_acquire_lock()
* and cancel_delayed_work() in wake_lock_waiters().
@@ -4601,6 +4597,10 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev)
rbd_warn(rbd_dev, "failed to update lock cookie: %d",
ret);
+ if (rbd_dev->opts->exclusive)
+ rbd_warn(rbd_dev,
+ "temporarily releasing lock on exclusive mapping");
+
/*
* Lock cookie cannot be updated on older OSDs, so do
* a manual release and queue an acquire.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x c3281abea67c9c0dc6219bbc41d1feae05a16da3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073024-ruined-frightful-2dc2@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
c3281abea67c ("remoteproc: stm32_rproc: Fix mailbox interrupts queuing")
35bdafda40cc ("remoteproc: stm32_rproc: Add mutex protection for workqueue")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c3281abea67c9c0dc6219bbc41d1feae05a16da3 Mon Sep 17 00:00:00 2001
From: Gwenael Treuveur <gwenael.treuveur(a)foss.st.com>
Date: Tue, 21 May 2024 18:23:16 +0200
Subject: [PATCH] remoteproc: stm32_rproc: Fix mailbox interrupts queuing
Manage interrupt coming from coprocessor also when state is
ATTACHED.
Fixes: 35bdafda40cc ("remoteproc: stm32_rproc: Add mutex protection for workqueue")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gwenael Treuveur <gwenael.treuveur(a)foss.st.com>
Acked-by: Arnaud Pouliquen <arnaud.pouliquen(a)foss.st.com>
Link: https://lore.kernel.org/r/20240521162316.156259-1-gwenael.treuveur@foss.st.…
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
diff --git a/drivers/remoteproc/stm32_rproc.c b/drivers/remoteproc/stm32_rproc.c
index 88623df7d0c3..8c7f7950b80e 100644
--- a/drivers/remoteproc/stm32_rproc.c
+++ b/drivers/remoteproc/stm32_rproc.c
@@ -294,7 +294,7 @@ static void stm32_rproc_mb_vq_work(struct work_struct *work)
mutex_lock(&rproc->lock);
- if (rproc->state != RPROC_RUNNING)
+ if (rproc->state != RPROC_RUNNING && rproc->state != RPROC_ATTACHED)
goto unlock_mutex;
if (rproc_vq_interrupt(rproc, mb->vq_id) == IRQ_NONE)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f70fd92df7529e7283e02a6c3a2510075f13ba30
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073000-yelp-remnant-0e02@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
f70fd92df752 ("MIPS: dts: loongson: Fix ls2k1000-rtc interrupt")
dbb69b9d6234 ("MIPS: dts: loongson: Fix liointc IRQ polarity")
d89a415ff8d5 ("MIPS: Loongson64: DTS: Fix PCIe port nodes for ls7a")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f70fd92df7529e7283e02a6c3a2510075f13ba30 Mon Sep 17 00:00:00 2001
From: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
Date: Fri, 14 Jun 2024 16:40:11 +0100
Subject: [PATCH] MIPS: dts: loongson: Fix ls2k1000-rtc interrupt
The correct interrupt line for RTC is line 8 on liointc1.
Fixes: e47084e116fc ("MIPS: Loongson64: DTS: Add RTC support to Loongson-2K1000")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
index 3f5255584c30..c3a57a0befa7 100644
--- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
+++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
@@ -92,8 +92,8 @@ liointc1: interrupt-controller@1fe11440 {
rtc0: rtc@1fe07800 {
compatible = "loongson,ls2k1000-rtc";
reg = <0 0x1fe07800 0 0x78>;
- interrupt-parent = <&liointc0>;
- interrupts = <60 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-parent = <&liointc1>;
+ interrupts = <8 IRQ_TYPE_LEVEL_HIGH>;
};
uart0: serial@1fe00000 {
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x f70fd92df7529e7283e02a6c3a2510075f13ba30
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073000-directory-swimmable-5878@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
f70fd92df752 ("MIPS: dts: loongson: Fix ls2k1000-rtc interrupt")
dbb69b9d6234 ("MIPS: dts: loongson: Fix liointc IRQ polarity")
d89a415ff8d5 ("MIPS: Loongson64: DTS: Fix PCIe port nodes for ls7a")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f70fd92df7529e7283e02a6c3a2510075f13ba30 Mon Sep 17 00:00:00 2001
From: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
Date: Fri, 14 Jun 2024 16:40:11 +0100
Subject: [PATCH] MIPS: dts: loongson: Fix ls2k1000-rtc interrupt
The correct interrupt line for RTC is line 8 on liointc1.
Fixes: e47084e116fc ("MIPS: Loongson64: DTS: Add RTC support to Loongson-2K1000")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
index 3f5255584c30..c3a57a0befa7 100644
--- a/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
+++ b/arch/mips/boot/dts/loongson/loongson64-2k1000.dtsi
@@ -92,8 +92,8 @@ liointc1: interrupt-controller@1fe11440 {
rtc0: rtc@1fe07800 {
compatible = "loongson,ls2k1000-rtc";
reg = <0 0x1fe07800 0 0x78>;
- interrupt-parent = <&liointc0>;
- interrupts = <60 IRQ_TYPE_LEVEL_HIGH>;
+ interrupt-parent = <&liointc1>;
+ interrupts = <8 IRQ_TYPE_LEVEL_HIGH>;
};
uart0: serial@1fe00000 {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x dce68a49be26abf52712e0ee452a45fa01ab4624
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073049-brutishly-astride-ce55@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
dce68a49be26 ("remoteproc: imx_rproc: Fix refcount mistake in imx_rproc_addr_init")
61afafe8b938 ("remoteproc: imx_rproc: Fix refcount leak in imx_rproc_addr_init")
afe670e23af9 ("remoteproc: imx_rproc: Fix ignoring mapping vdev regions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dce68a49be26abf52712e0ee452a45fa01ab4624 Mon Sep 17 00:00:00 2001
From: Aleksandr Mishin <amishin(a)t-argos.ru>
Date: Wed, 12 Jun 2024 16:17:14 +0300
Subject: [PATCH] remoteproc: imx_rproc: Fix refcount mistake in
imx_rproc_addr_init
In imx_rproc_addr_init() strcmp() is performed over the node after the
of_node_put() is performed over it.
Fix this error by moving of_node_put() calls.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 5e4c1243071d ("remoteproc: imx_rproc: support remote cores booted before Linux Kernel")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aleksandr Mishin <amishin(a)t-argos.ru>
Link: https://lore.kernel.org/r/20240612131714.12907-1-amishin@t-argos.ru
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
index 39eacd90af14..144c8e9a642e 100644
--- a/drivers/remoteproc/imx_rproc.c
+++ b/drivers/remoteproc/imx_rproc.c
@@ -734,25 +734,29 @@ static int imx_rproc_addr_init(struct imx_rproc *priv,
continue;
}
err = of_address_to_resource(node, 0, &res);
- of_node_put(node);
if (err) {
dev_err(dev, "unable to resolve memory region\n");
+ of_node_put(node);
return err;
}
- if (b >= IMX_RPROC_MEM_MAX)
+ if (b >= IMX_RPROC_MEM_MAX) {
+ of_node_put(node);
break;
+ }
/* Not use resource version, because we might share region */
priv->mem[b].cpu_addr = devm_ioremap_wc(&pdev->dev, res.start, resource_size(&res));
if (!priv->mem[b].cpu_addr) {
dev_err(dev, "failed to remap %pr\n", &res);
+ of_node_put(node);
return -ENOMEM;
}
priv->mem[b].sys_addr = res.start;
priv->mem[b].size = resource_size(&res);
if (!strcmp(node->name, "rsc-table"))
priv->rsc_table = priv->mem[b].cpu_addr;
+ of_node_put(node);
b++;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2fa26ca8b786888673689ccc9da6094150939982
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073036-capable-slander-fe44@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
2fa26ca8b786 ("remoteproc: imx_rproc: Skip over memory region when node value is NULL")
afe670e23af9 ("remoteproc: imx_rproc: Fix ignoring mapping vdev regions")
8f2d8961640f ("remoteproc: imx_rproc: ignore mapping vdev regions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2fa26ca8b786888673689ccc9da6094150939982 Mon Sep 17 00:00:00 2001
From: Aleksandr Mishin <amishin(a)t-argos.ru>
Date: Thu, 6 Jun 2024 10:52:04 +0300
Subject: [PATCH] remoteproc: imx_rproc: Skip over memory region when node
value is NULL
In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just counts
number of phandles. But phandles may be empty. So of_parse_phandle() in
the parsing loop (0 < a < nph) may return NULL which is later dereferenced.
Adjust this issue by adding NULL-return check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc driver")
Signed-off-by: Aleksandr Mishin <amishin(a)t-argos.ru>
Reviewed-by: Peng Fan <peng.fan(a)nxp.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240606075204.12354-1-amishin@t-argos.ru
[Fixed title to fit within the prescribed 70-75 charcters]
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
index 5a3fb902acc9..39eacd90af14 100644
--- a/drivers/remoteproc/imx_rproc.c
+++ b/drivers/remoteproc/imx_rproc.c
@@ -726,6 +726,8 @@ static int imx_rproc_addr_init(struct imx_rproc *priv,
struct resource res;
node = of_parse_phandle(np, "memory-region", a);
+ if (!node)
+ continue;
/* Not map vdevbuffer, vdevring region */
if (!strncmp(node->name, "vdev", strlen("vdev"))) {
of_node_put(node);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2fa26ca8b786888673689ccc9da6094150939982
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073033-santa-unscrew-ac70@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2fa26ca8b786 ("remoteproc: imx_rproc: Skip over memory region when node value is NULL")
afe670e23af9 ("remoteproc: imx_rproc: Fix ignoring mapping vdev regions")
8f2d8961640f ("remoteproc: imx_rproc: ignore mapping vdev regions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2fa26ca8b786888673689ccc9da6094150939982 Mon Sep 17 00:00:00 2001
From: Aleksandr Mishin <amishin(a)t-argos.ru>
Date: Thu, 6 Jun 2024 10:52:04 +0300
Subject: [PATCH] remoteproc: imx_rproc: Skip over memory region when node
value is NULL
In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just counts
number of phandles. But phandles may be empty. So of_parse_phandle() in
the parsing loop (0 < a < nph) may return NULL which is later dereferenced.
Adjust this issue by adding NULL-return check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc driver")
Signed-off-by: Aleksandr Mishin <amishin(a)t-argos.ru>
Reviewed-by: Peng Fan <peng.fan(a)nxp.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240606075204.12354-1-amishin@t-argos.ru
[Fixed title to fit within the prescribed 70-75 charcters]
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
index 5a3fb902acc9..39eacd90af14 100644
--- a/drivers/remoteproc/imx_rproc.c
+++ b/drivers/remoteproc/imx_rproc.c
@@ -726,6 +726,8 @@ static int imx_rproc_addr_init(struct imx_rproc *priv,
struct resource res;
node = of_parse_phandle(np, "memory-region", a);
+ if (!node)
+ continue;
/* Not map vdevbuffer, vdevring region */
if (!strncmp(node->name, "vdev", strlen("vdev"))) {
of_node_put(node);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2fa26ca8b786888673689ccc9da6094150939982
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073032-attentive-tamale-4489@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2fa26ca8b786 ("remoteproc: imx_rproc: Skip over memory region when node value is NULL")
afe670e23af9 ("remoteproc: imx_rproc: Fix ignoring mapping vdev regions")
8f2d8961640f ("remoteproc: imx_rproc: ignore mapping vdev regions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2fa26ca8b786888673689ccc9da6094150939982 Mon Sep 17 00:00:00 2001
From: Aleksandr Mishin <amishin(a)t-argos.ru>
Date: Thu, 6 Jun 2024 10:52:04 +0300
Subject: [PATCH] remoteproc: imx_rproc: Skip over memory region when node
value is NULL
In imx_rproc_addr_init() "nph = of_count_phandle_with_args()" just counts
number of phandles. But phandles may be empty. So of_parse_phandle() in
the parsing loop (0 < a < nph) may return NULL which is later dereferenced.
Adjust this issue by adding NULL-return check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: a0ff4aa6f010 ("remoteproc: imx_rproc: add a NXP/Freescale imx_rproc driver")
Signed-off-by: Aleksandr Mishin <amishin(a)t-argos.ru>
Reviewed-by: Peng Fan <peng.fan(a)nxp.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240606075204.12354-1-amishin@t-argos.ru
[Fixed title to fit within the prescribed 70-75 charcters]
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
diff --git a/drivers/remoteproc/imx_rproc.c b/drivers/remoteproc/imx_rproc.c
index 5a3fb902acc9..39eacd90af14 100644
--- a/drivers/remoteproc/imx_rproc.c
+++ b/drivers/remoteproc/imx_rproc.c
@@ -726,6 +726,8 @@ static int imx_rproc_addr_init(struct imx_rproc *priv,
struct resource res;
node = of_parse_phandle(np, "memory-region", a);
+ if (!node)
+ continue;
/* Not map vdevbuffer, vdevring region */
if (!strncmp(node->name, "vdev", strlen("vdev"))) {
of_node_put(node);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073029-defog-revolt-64f0@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
2237ceb71f89 ("rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings")
ded080c86b3f ("rbd: don't move requests to the running list on errors")
637cd060537d ("rbd: new exclusive lock wait/wake code")
e1fddc8fdd22 ("rbd: quiescing lock should wait for image requests")
a2b1da09793d ("rbd: lock should be quiesced on reacquire")
0192ce2ee68b ("rbd: introduce image request state machine")
85b5e6d11898 ("rbd: move OSD request submission into object request state machines")
0ad5d953548f ("rbd: get rid of RBD_OBJ_WRITE_{FLAT,GUARD}")
a9b67e69949d ("rbd: replace obj_req->tried_parent with obj_req->read_state")
54ab3b24c536 ("rbd: get rid of obj_req->xferred, obj_req->result and img_req->xferred")
9b17eb2ce102 ("rbd: whole-object write and zeroout should copyup when snapshots exist")
89a59c1ca73b ("rbd: copyup with an empty snapshot context (aka deep-copyup)")
3a482501cf70 ("rbd: introduce rbd_obj_issue_copyup_ops()")
13488d53775b ("rbd: stop copying num_osd_ops in rbd_obj_issue_copyup()")
356889c49d84 ("rbd: clear ->xferred on error from rbd_obj_issue_copyup()")
0c93e1b7a26b ("rbd: round off and ignore discards that are too small")
6484cbe987e0 ("rbd: handle DISCARD and WRITE_ZEROES separately")
fd7e3f0d8f25 ("rbd: get rid of obj_req->obj_request_count")
26f887e0a3c4 ("libceph, rbd, ceph: move ceph_osdc_alloc_messages() calls")
39e58c3425b1 ("libceph: introduce alloc_watch_request()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov(a)gmail.com>
Date: Tue, 23 Jul 2024 18:07:59 +0200
Subject: [PATCH] rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive
mappings
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
being triggered on the next watch error.
Cc: stable(a)vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
Cc: stable(a)vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c30d227753d7..ea6c592e015c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3457,6 +3457,7 @@ static void rbd_lock_del_request(struct rbd_img_request *img_req)
lockdep_assert_held(&rbd_dev->lock_rwsem);
spin_lock(&rbd_dev->lock_lists_lock);
if (!list_empty(&img_req->lock_item)) {
+ rbd_assert(!list_empty(&rbd_dev->running_list));
list_del_init(&img_req->lock_item);
need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_QUIESCING &&
list_empty(&rbd_dev->running_list));
@@ -3476,11 +3477,6 @@ static int rbd_img_exclusive_lock(struct rbd_img_request *img_req)
if (rbd_lock_add_request(img_req))
return 1;
- if (rbd_dev->opts->exclusive) {
- WARN_ON(1); /* lock got released? */
- return -EROFS;
- }
-
/*
* Note the use of mod_delayed_work() in rbd_acquire_lock()
* and cancel_delayed_work() in wake_lock_waiters().
@@ -4601,6 +4597,10 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev)
rbd_warn(rbd_dev, "failed to update lock cookie: %d",
ret);
+ if (rbd_dev->opts->exclusive)
+ rbd_warn(rbd_dev,
+ "temporarily releasing lock on exclusive mapping");
+
/*
* Lock cookie cannot be updated on older OSDs, so do
* a manual release and queue an acquire.
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073028-crepe-gently-80e3@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2237ceb71f89 ("rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings")
ded080c86b3f ("rbd: don't move requests to the running list on errors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov(a)gmail.com>
Date: Tue, 23 Jul 2024 18:07:59 +0200
Subject: [PATCH] rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive
mappings
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
being triggered on the next watch error.
Cc: stable(a)vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
Cc: stable(a)vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c30d227753d7..ea6c592e015c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3457,6 +3457,7 @@ static void rbd_lock_del_request(struct rbd_img_request *img_req)
lockdep_assert_held(&rbd_dev->lock_rwsem);
spin_lock(&rbd_dev->lock_lists_lock);
if (!list_empty(&img_req->lock_item)) {
+ rbd_assert(!list_empty(&rbd_dev->running_list));
list_del_init(&img_req->lock_item);
need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_QUIESCING &&
list_empty(&rbd_dev->running_list));
@@ -3476,11 +3477,6 @@ static int rbd_img_exclusive_lock(struct rbd_img_request *img_req)
if (rbd_lock_add_request(img_req))
return 1;
- if (rbd_dev->opts->exclusive) {
- WARN_ON(1); /* lock got released? */
- return -EROFS;
- }
-
/*
* Note the use of mod_delayed_work() in rbd_acquire_lock()
* and cancel_delayed_work() in wake_lock_waiters().
@@ -4601,6 +4597,10 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev)
rbd_warn(rbd_dev, "failed to update lock cookie: %d",
ret);
+ if (rbd_dev->opts->exclusive)
+ rbd_warn(rbd_dev,
+ "temporarily releasing lock on exclusive mapping");
+
/*
* Lock cookie cannot be updated on older OSDs, so do
* a manual release and queue an acquire.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073025-raffle-sadness-6e42@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2237ceb71f89 ("rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings")
ded080c86b3f ("rbd: don't move requests to the running list on errors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov(a)gmail.com>
Date: Tue, 23 Jul 2024 18:07:59 +0200
Subject: [PATCH] rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive
mappings
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
being triggered on the next watch error.
Cc: stable(a)vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
Cc: stable(a)vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c30d227753d7..ea6c592e015c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3457,6 +3457,7 @@ static void rbd_lock_del_request(struct rbd_img_request *img_req)
lockdep_assert_held(&rbd_dev->lock_rwsem);
spin_lock(&rbd_dev->lock_lists_lock);
if (!list_empty(&img_req->lock_item)) {
+ rbd_assert(!list_empty(&rbd_dev->running_list));
list_del_init(&img_req->lock_item);
need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_QUIESCING &&
list_empty(&rbd_dev->running_list));
@@ -3476,11 +3477,6 @@ static int rbd_img_exclusive_lock(struct rbd_img_request *img_req)
if (rbd_lock_add_request(img_req))
return 1;
- if (rbd_dev->opts->exclusive) {
- WARN_ON(1); /* lock got released? */
- return -EROFS;
- }
-
/*
* Note the use of mod_delayed_work() in rbd_acquire_lock()
* and cancel_delayed_work() in wake_lock_waiters().
@@ -4601,6 +4597,10 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev)
rbd_warn(rbd_dev, "failed to update lock cookie: %d",
ret);
+ if (rbd_dev->opts->exclusive)
+ rbd_warn(rbd_dev,
+ "temporarily releasing lock on exclusive mapping");
+
/*
* Lock cookie cannot be updated on older OSDs, so do
* a manual release and queue an acquire.
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073024-usable-mullets-a00c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2237ceb71f89 ("rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings")
ded080c86b3f ("rbd: don't move requests to the running list on errors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov(a)gmail.com>
Date: Tue, 23 Jul 2024 18:07:59 +0200
Subject: [PATCH] rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive
mappings
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
being triggered on the next watch error.
Cc: stable(a)vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
Cc: stable(a)vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c30d227753d7..ea6c592e015c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3457,6 +3457,7 @@ static void rbd_lock_del_request(struct rbd_img_request *img_req)
lockdep_assert_held(&rbd_dev->lock_rwsem);
spin_lock(&rbd_dev->lock_lists_lock);
if (!list_empty(&img_req->lock_item)) {
+ rbd_assert(!list_empty(&rbd_dev->running_list));
list_del_init(&img_req->lock_item);
need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_QUIESCING &&
list_empty(&rbd_dev->running_list));
@@ -3476,11 +3477,6 @@ static int rbd_img_exclusive_lock(struct rbd_img_request *img_req)
if (rbd_lock_add_request(img_req))
return 1;
- if (rbd_dev->opts->exclusive) {
- WARN_ON(1); /* lock got released? */
- return -EROFS;
- }
-
/*
* Note the use of mod_delayed_work() in rbd_acquire_lock()
* and cancel_delayed_work() in wake_lock_waiters().
@@ -4601,6 +4597,10 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev)
rbd_warn(rbd_dev, "failed to update lock cookie: %d",
ret);
+ if (rbd_dev->opts->exclusive)
+ rbd_warn(rbd_dev,
+ "temporarily releasing lock on exclusive mapping");
+
/*
* Lock cookie cannot be updated on older OSDs, so do
* a manual release and queue an acquire.
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073023-payback-dawdler-5c2b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
2237ceb71f89 ("rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings")
ded080c86b3f ("rbd: don't move requests to the running list on errors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov(a)gmail.com>
Date: Tue, 23 Jul 2024 18:07:59 +0200
Subject: [PATCH] rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive
mappings
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
being triggered on the next watch error.
Cc: stable(a)vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
Cc: stable(a)vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c30d227753d7..ea6c592e015c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3457,6 +3457,7 @@ static void rbd_lock_del_request(struct rbd_img_request *img_req)
lockdep_assert_held(&rbd_dev->lock_rwsem);
spin_lock(&rbd_dev->lock_lists_lock);
if (!list_empty(&img_req->lock_item)) {
+ rbd_assert(!list_empty(&rbd_dev->running_list));
list_del_init(&img_req->lock_item);
need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_QUIESCING &&
list_empty(&rbd_dev->running_list));
@@ -3476,11 +3477,6 @@ static int rbd_img_exclusive_lock(struct rbd_img_request *img_req)
if (rbd_lock_add_request(img_req))
return 1;
- if (rbd_dev->opts->exclusive) {
- WARN_ON(1); /* lock got released? */
- return -EROFS;
- }
-
/*
* Note the use of mod_delayed_work() in rbd_acquire_lock()
* and cancel_delayed_work() in wake_lock_waiters().
@@ -4601,6 +4597,10 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev)
rbd_warn(rbd_dev, "failed to update lock cookie: %d",
ret);
+ if (rbd_dev->opts->exclusive)
+ rbd_warn(rbd_dev,
+ "temporarily releasing lock on exclusive mapping");
+
/*
* Lock cookie cannot be updated on older OSDs, so do
* a manual release and queue an acquire.
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073022-profound-unsigned-6302@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
2237ceb71f89 ("rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive mappings")
ded080c86b3f ("rbd: don't move requests to the running list on errors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2237ceb71f89837ac47c5dce2aaa2c2b3a337a3c Mon Sep 17 00:00:00 2001
From: Ilya Dryomov <idryomov(a)gmail.com>
Date: Tue, 23 Jul 2024 18:07:59 +0200
Subject: [PATCH] rbd: don't assume RBD_LOCK_STATE_LOCKED for exclusive
mappings
Every time a watch is reestablished after getting lost, we need to
update the cookie which involves quiescing exclusive lock. For this,
we transition from RBD_LOCK_STATE_LOCKED to RBD_LOCK_STATE_QUIESCING
roughly for the duration of rbd_reacquire_lock() call. If the mapping
is exclusive and I/O happens to arrive in this time window, it's failed
with EROFS (later translated to EIO) based on the wrong assumption in
rbd_img_exclusive_lock() -- "lock got released?" check there stopped
making sense with commit a2b1da09793d ("rbd: lock should be quiesced on
reacquire").
To make it worse, any such I/O is added to the acquiring list before
EROFS is returned and this sets up for violating rbd_lock_del_request()
precondition that the request is either on the running list or not on
any list at all -- see commit ded080c86b3f ("rbd: don't move requests
to the running list on errors"). rbd_lock_del_request() ends up
processing these requests as if they were on the running list which
screws up quiescing_wait completion counter and ultimately leads to
rbd_assert(!completion_done(&rbd_dev->quiescing_wait));
being triggered on the next watch error.
Cc: stable(a)vger.kernel.org # 06ef84c4e9c4: rbd: rename RBD_LOCK_STATE_RELEASING and releasing_wait
Cc: stable(a)vger.kernel.org
Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code")
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Reviewed-by: Dongsheng Yang <dongsheng.yang(a)easystack.cn>
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index c30d227753d7..ea6c592e015c 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -3457,6 +3457,7 @@ static void rbd_lock_del_request(struct rbd_img_request *img_req)
lockdep_assert_held(&rbd_dev->lock_rwsem);
spin_lock(&rbd_dev->lock_lists_lock);
if (!list_empty(&img_req->lock_item)) {
+ rbd_assert(!list_empty(&rbd_dev->running_list));
list_del_init(&img_req->lock_item);
need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_QUIESCING &&
list_empty(&rbd_dev->running_list));
@@ -3476,11 +3477,6 @@ static int rbd_img_exclusive_lock(struct rbd_img_request *img_req)
if (rbd_lock_add_request(img_req))
return 1;
- if (rbd_dev->opts->exclusive) {
- WARN_ON(1); /* lock got released? */
- return -EROFS;
- }
-
/*
* Note the use of mod_delayed_work() in rbd_acquire_lock()
* and cancel_delayed_work() in wake_lock_waiters().
@@ -4601,6 +4597,10 @@ static void rbd_reacquire_lock(struct rbd_device *rbd_dev)
rbd_warn(rbd_dev, "failed to update lock cookie: %d",
ret);
+ if (rbd_dev->opts->exclusive)
+ rbd_warn(rbd_dev,
+ "temporarily releasing lock on exclusive mapping");
+
/*
* Lock cookie cannot be updated on older OSDs, so do
* a manual release and queue an acquire.
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x a503f91a3645651a39baf97f1aed90d5d9f9bda9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073001-kindling-liqueur-1bf1@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
a503f91a3645 ("mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks")
478211867460 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a503f91a3645651a39baf97f1aed90d5d9f9bda9 Mon Sep 17 00:00:00 2001
From: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Date: Fri, 5 Jul 2024 13:51:35 +0200
Subject: [PATCH] mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks
The dma_request_chan() returns error pointer in case of error, while
dma_request_channel() returns NULL in case of error therefore different
error checks are needed for the two.
Fixes: 7326d3fb1ee3 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240705115139.126522-1-piotr.wojtaszczyk…
diff --git a/drivers/mtd/nand/raw/lpc32xx_mlc.c b/drivers/mtd/nand/raw/lpc32xx_mlc.c
index 92cebe871bb4..b9c3adc54c01 100644
--- a/drivers/mtd/nand/raw/lpc32xx_mlc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_mlc.c
@@ -575,7 +575,7 @@ static int lpc32xx_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c
index 3b7e3d259785..ade971e4cc3b 100644
--- a/drivers/mtd/nand/raw/lpc32xx_slc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_slc.c
@@ -722,7 +722,7 @@ static int lpc32xx_nand_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a503f91a3645651a39baf97f1aed90d5d9f9bda9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073000-cedar-nutrient-4b9a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a503f91a3645 ("mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks")
478211867460 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a503f91a3645651a39baf97f1aed90d5d9f9bda9 Mon Sep 17 00:00:00 2001
From: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Date: Fri, 5 Jul 2024 13:51:35 +0200
Subject: [PATCH] mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks
The dma_request_chan() returns error pointer in case of error, while
dma_request_channel() returns NULL in case of error therefore different
error checks are needed for the two.
Fixes: 7326d3fb1ee3 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240705115139.126522-1-piotr.wojtaszczyk…
diff --git a/drivers/mtd/nand/raw/lpc32xx_mlc.c b/drivers/mtd/nand/raw/lpc32xx_mlc.c
index 92cebe871bb4..b9c3adc54c01 100644
--- a/drivers/mtd/nand/raw/lpc32xx_mlc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_mlc.c
@@ -575,7 +575,7 @@ static int lpc32xx_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c
index 3b7e3d259785..ade971e4cc3b 100644
--- a/drivers/mtd/nand/raw/lpc32xx_slc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_slc.c
@@ -722,7 +722,7 @@ static int lpc32xx_nand_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a503f91a3645651a39baf97f1aed90d5d9f9bda9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073059-trodden-challenge-685c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a503f91a3645 ("mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks")
478211867460 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a503f91a3645651a39baf97f1aed90d5d9f9bda9 Mon Sep 17 00:00:00 2001
From: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Date: Fri, 5 Jul 2024 13:51:35 +0200
Subject: [PATCH] mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks
The dma_request_chan() returns error pointer in case of error, while
dma_request_channel() returns NULL in case of error therefore different
error checks are needed for the two.
Fixes: 7326d3fb1ee3 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240705115139.126522-1-piotr.wojtaszczyk…
diff --git a/drivers/mtd/nand/raw/lpc32xx_mlc.c b/drivers/mtd/nand/raw/lpc32xx_mlc.c
index 92cebe871bb4..b9c3adc54c01 100644
--- a/drivers/mtd/nand/raw/lpc32xx_mlc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_mlc.c
@@ -575,7 +575,7 @@ static int lpc32xx_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c
index 3b7e3d259785..ade971e4cc3b 100644
--- a/drivers/mtd/nand/raw/lpc32xx_slc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_slc.c
@@ -722,7 +722,7 @@ static int lpc32xx_nand_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a503f91a3645651a39baf97f1aed90d5d9f9bda9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073059-maker-presume-f696@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a503f91a3645 ("mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks")
478211867460 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a503f91a3645651a39baf97f1aed90d5d9f9bda9 Mon Sep 17 00:00:00 2001
From: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Date: Fri, 5 Jul 2024 13:51:35 +0200
Subject: [PATCH] mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks
The dma_request_chan() returns error pointer in case of error, while
dma_request_channel() returns NULL in case of error therefore different
error checks are needed for the two.
Fixes: 7326d3fb1ee3 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240705115139.126522-1-piotr.wojtaszczyk…
diff --git a/drivers/mtd/nand/raw/lpc32xx_mlc.c b/drivers/mtd/nand/raw/lpc32xx_mlc.c
index 92cebe871bb4..b9c3adc54c01 100644
--- a/drivers/mtd/nand/raw/lpc32xx_mlc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_mlc.c
@@ -575,7 +575,7 @@ static int lpc32xx_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c
index 3b7e3d259785..ade971e4cc3b 100644
--- a/drivers/mtd/nand/raw/lpc32xx_slc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_slc.c
@@ -722,7 +722,7 @@ static int lpc32xx_nand_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a503f91a3645651a39baf97f1aed90d5d9f9bda9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073058-suing-candied-447f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a503f91a3645 ("mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks")
478211867460 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a503f91a3645651a39baf97f1aed90d5d9f9bda9 Mon Sep 17 00:00:00 2001
From: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Date: Fri, 5 Jul 2024 13:51:35 +0200
Subject: [PATCH] mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks
The dma_request_chan() returns error pointer in case of error, while
dma_request_channel() returns NULL in case of error therefore different
error checks are needed for the two.
Fixes: 7326d3fb1ee3 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240705115139.126522-1-piotr.wojtaszczyk…
diff --git a/drivers/mtd/nand/raw/lpc32xx_mlc.c b/drivers/mtd/nand/raw/lpc32xx_mlc.c
index 92cebe871bb4..b9c3adc54c01 100644
--- a/drivers/mtd/nand/raw/lpc32xx_mlc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_mlc.c
@@ -575,7 +575,7 @@ static int lpc32xx_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c
index 3b7e3d259785..ade971e4cc3b 100644
--- a/drivers/mtd/nand/raw/lpc32xx_slc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_slc.c
@@ -722,7 +722,7 @@ static int lpc32xx_nand_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x a503f91a3645651a39baf97f1aed90d5d9f9bda9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073057-gonad-crabgrass-4689@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
a503f91a3645 ("mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks")
478211867460 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a503f91a3645651a39baf97f1aed90d5d9f9bda9 Mon Sep 17 00:00:00 2001
From: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Date: Fri, 5 Jul 2024 13:51:35 +0200
Subject: [PATCH] mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks
The dma_request_chan() returns error pointer in case of error, while
dma_request_channel() returns NULL in case of error therefore different
error checks are needed for the two.
Fixes: 7326d3fb1ee3 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240705115139.126522-1-piotr.wojtaszczyk…
diff --git a/drivers/mtd/nand/raw/lpc32xx_mlc.c b/drivers/mtd/nand/raw/lpc32xx_mlc.c
index 92cebe871bb4..b9c3adc54c01 100644
--- a/drivers/mtd/nand/raw/lpc32xx_mlc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_mlc.c
@@ -575,7 +575,7 @@ static int lpc32xx_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c
index 3b7e3d259785..ade971e4cc3b 100644
--- a/drivers/mtd/nand/raw/lpc32xx_slc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_slc.c
@@ -722,7 +722,7 @@ static int lpc32xx_nand_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x a503f91a3645651a39baf97f1aed90d5d9f9bda9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073057-pasture-amused-2fe9@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
a503f91a3645 ("mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks")
478211867460 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a503f91a3645651a39baf97f1aed90d5d9f9bda9 Mon Sep 17 00:00:00 2001
From: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Date: Fri, 5 Jul 2024 13:51:35 +0200
Subject: [PATCH] mtd: rawnand: lpx32xx: Fix dma_request_chan() error checks
The dma_request_chan() returns error pointer in case of error, while
dma_request_channel() returns NULL in case of error therefore different
error checks are needed for the two.
Fixes: 7326d3fb1ee3 ("mtd: rawnand: lpx32xx: Request DMA channels using DT entries")
Signed-off-by: Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240705115139.126522-1-piotr.wojtaszczyk…
diff --git a/drivers/mtd/nand/raw/lpc32xx_mlc.c b/drivers/mtd/nand/raw/lpc32xx_mlc.c
index 92cebe871bb4..b9c3adc54c01 100644
--- a/drivers/mtd/nand/raw/lpc32xx_mlc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_mlc.c
@@ -575,7 +575,7 @@ static int lpc32xx_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
diff --git a/drivers/mtd/nand/raw/lpc32xx_slc.c b/drivers/mtd/nand/raw/lpc32xx_slc.c
index 3b7e3d259785..ade971e4cc3b 100644
--- a/drivers/mtd/nand/raw/lpc32xx_slc.c
+++ b/drivers/mtd/nand/raw/lpc32xx_slc.c
@@ -722,7 +722,7 @@ static int lpc32xx_nand_dma_setup(struct lpc32xx_nand_host *host)
dma_cap_mask_t mask;
host->dma_chan = dma_request_chan(mtd->dev.parent, "rx-tx");
- if (!host->dma_chan) {
+ if (IS_ERR(host->dma_chan)) {
/* fallback to request using platform data */
if (!host->pdata || !host->pdata->dma_filter) {
dev_err(mtd->dev.parent, "no DMA platform data\n");
> If you're trying to backport something, I think you forgot to Cc stable
> and provide the proper upstream commit.
>
> As is this isn't something I can do anything with. The patch does not
> apply to any recent kernel and AFAICT this issue has long since been
> fixed.
When fixing this bug, I didn't pay much attention to upstream changes.
Upon reviewing the history of relevant commits, I found that they have
been merged and reverted multiple times:
```bash
git log -S 'cpumask_test_cpu(cpu, sched_domain_span(sd))' --oneline \
kernel/sched/fair.c
8aeaffef8c6e sched/fair: Take the scheduling domain into account in select_idle_smt()
3e6efe87cd5c sched/fair: Remove redundant check in select_idle_smt()
3e8c6c9aac42 sched/fair: Remove task_util from effective utilization in feec()
c722f35b513f sched/fair: Bring back select_idle_smt(), but differently
6cd56ef1df39 sched/fair: Remove select_idle_smt()
df3cb4ea1fb6 sched/fair: Fix wrong cpu selecting from isolated domain
```
The latest upstream commit 8aeaffef8c6e is not applicable to linux-4.19.y.
The current patch has been tested on linux-4.19.y and I am looking forward
to its inclusion in the stable version.
From: Ira Weiny <ira.weiny(a)intel.com>
commit ced085ef369af7a2b6da962ec2fbd01339f60693 upstream.
The "goto error" pattern is notorious for introducing subtle resource
leaks. Use the new cleanup.h helpers for PCI device reference counts and
locks.
Similar to the new put_device() and device_lock() cleanup helpers,
__free(put_device) and guard(device), define the same for PCI devices,
__free(pci_dev_put) and guard(pci_dev). These helpers eliminate the
need for "goto free;" and "goto unlock;" patterns. For example, A
'struct pci_dev *' instance declared as:
struct pci_dev *pdev __free(pci_dev_put) = NULL;
...will automatically call pci_dev_put() if @pdev is non-NULL when @pdev
goes out of scope (automatic variable scope). If a function wants to
invoke pci_dev_put() on error, but return @pdev on success, it can do:
return no_free_ptr(pdev);
...or:
return_ptr(pdev);
For potential cleanup opportunity there are 587 open-coded calls to
pci_dev_put() in the kernel with 65 instances within 10 lines of a goto
statement with the CXL driver threatening to add another one.
The guard() helper holds the associated lock for the remainder of the
current scope in which it was invoked. So, for example:
func(...)
{
if (...) {
...
guard(pci_dev); /* pci_dev_lock() invoked here */
...
} /* <- implied pci_dev_unlock() triggered here */
}
There are 15 invocations of pci_dev_unlock() in the kernel with 5
instances within 10 lines of a goto statement. Again, the CXL driver is
threatening to add another.
Introduce these helpers to preclude the addition of new more error prone
goto put; / goto unlock; sequences. For now, these helpers are used in
drivers/cxl/pci.c to allow ACPI error reports to be fed back into the
CXL driver associated with the PCI device identified in the report.
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-8-1bb8a4ca2c7a@intel.com
[djbw: rewrite changelog]
Acked-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
---
include/linux/pci.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 512cb40150df..f14130011621 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1146,6 +1146,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
struct pci_dev *pci_dev_get(struct pci_dev *dev);
void pci_dev_put(struct pci_dev *dev);
+DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
void pci_remove_bus(struct pci_bus *b);
void pci_stop_and_remove_bus_device(struct pci_dev *dev);
void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
@@ -1851,6 +1852,7 @@ void pci_cfg_access_unlock(struct pci_dev *dev);
void pci_dev_lock(struct pci_dev *dev);
int pci_dev_trylock(struct pci_dev *dev);
void pci_dev_unlock(struct pci_dev *dev);
+DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),
--
2.43.0
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c7d0b2db5bc5e8c0fdc67b3c8f463c3dfec92f77
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073024-passable-cardigan-cd15@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
c7d0b2db5bc5 ("bus: mhi: ep: Do not allocate memory for MHI objects from DMA zone")
2547beb00ddb ("bus: mhi: ep: Add support for async DMA read operation")
ee08acb58fe4 ("bus: mhi: ep: Add support for async DMA write operation")
8b786ed8fb08 ("bus: mhi: ep: Introduce async read/write callbacks")
927105244f8b ("bus: mhi: ep: Rename read_from_host() and write_to_host() APIs")
b08ded2ef2e9 ("bus: mhi: ep: Pass mhi_ep_buf_info struct to read/write APIs")
62210a26cd4f ("bus: mhi: ep: Use slab allocator where applicable")
987fdb5a43a6 ("bus: mhi: ep: Do not allocate event ring element on stack")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c7d0b2db5bc5e8c0fdc67b3c8f463c3dfec92f77 Mon Sep 17 00:00:00 2001
From: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Date: Mon, 3 Jun 2024 22:13:54 +0530
Subject: [PATCH] bus: mhi: ep: Do not allocate memory for MHI objects from DMA
zone
MHI endpoint stack accidentally started allocating memory for objects from
DMA zone since commit 62210a26cd4f ("bus: mhi: ep: Use slab allocator
where applicable"). But there is no real need to allocate memory from this
naturally limited DMA zone. This also causes the MHI endpoint stack to run
out of memory while doing high bandwidth transfers.
So let's switch over to normal memory.
Cc: <stable(a)vger.kernel.org> # 6.8
Fixes: 62210a26cd4f ("bus: mhi: ep: Use slab allocator where applicable")
Reviewed-by: Mayank Rana <quic_mrana(a)quicinc.com>
Link: https://lore.kernel.org/r/20240603164354.79035-1-manivannan.sadhasivam@lina…
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
diff --git a/drivers/bus/mhi/ep/main.c b/drivers/bus/mhi/ep/main.c
index f8f674adf1d4..4acfac73ca9a 100644
--- a/drivers/bus/mhi/ep/main.c
+++ b/drivers/bus/mhi/ep/main.c
@@ -90,7 +90,7 @@ static int mhi_ep_send_completion_event(struct mhi_ep_cntrl *mhi_cntrl, struct m
struct mhi_ring_element *event;
int ret;
- event = kmem_cache_zalloc(mhi_cntrl->ev_ring_el_cache, GFP_KERNEL | GFP_DMA);
+ event = kmem_cache_zalloc(mhi_cntrl->ev_ring_el_cache, GFP_KERNEL);
if (!event)
return -ENOMEM;
@@ -109,7 +109,7 @@ int mhi_ep_send_state_change_event(struct mhi_ep_cntrl *mhi_cntrl, enum mhi_stat
struct mhi_ring_element *event;
int ret;
- event = kmem_cache_zalloc(mhi_cntrl->ev_ring_el_cache, GFP_KERNEL | GFP_DMA);
+ event = kmem_cache_zalloc(mhi_cntrl->ev_ring_el_cache, GFP_KERNEL);
if (!event)
return -ENOMEM;
@@ -127,7 +127,7 @@ int mhi_ep_send_ee_event(struct mhi_ep_cntrl *mhi_cntrl, enum mhi_ee_type exec_e
struct mhi_ring_element *event;
int ret;
- event = kmem_cache_zalloc(mhi_cntrl->ev_ring_el_cache, GFP_KERNEL | GFP_DMA);
+ event = kmem_cache_zalloc(mhi_cntrl->ev_ring_el_cache, GFP_KERNEL);
if (!event)
return -ENOMEM;
@@ -146,7 +146,7 @@ static int mhi_ep_send_cmd_comp_event(struct mhi_ep_cntrl *mhi_cntrl, enum mhi_e
struct mhi_ring_element *event;
int ret;
- event = kmem_cache_zalloc(mhi_cntrl->ev_ring_el_cache, GFP_KERNEL | GFP_DMA);
+ event = kmem_cache_zalloc(mhi_cntrl->ev_ring_el_cache, GFP_KERNEL);
if (!event)
return -ENOMEM;
@@ -438,7 +438,7 @@ static int mhi_ep_read_channel(struct mhi_ep_cntrl *mhi_cntrl,
read_offset = mhi_chan->tre_size - mhi_chan->tre_bytes_left;
write_offset = len - buf_left;
- buf_addr = kmem_cache_zalloc(mhi_cntrl->tre_buf_cache, GFP_KERNEL | GFP_DMA);
+ buf_addr = kmem_cache_zalloc(mhi_cntrl->tre_buf_cache, GFP_KERNEL);
if (!buf_addr)
return -ENOMEM;
@@ -1481,14 +1481,14 @@ int mhi_ep_register_controller(struct mhi_ep_cntrl *mhi_cntrl,
mhi_cntrl->ev_ring_el_cache = kmem_cache_create("mhi_ep_event_ring_el",
sizeof(struct mhi_ring_element), 0,
- SLAB_CACHE_DMA, NULL);
+ 0, NULL);
if (!mhi_cntrl->ev_ring_el_cache) {
ret = -ENOMEM;
goto err_free_cmd;
}
mhi_cntrl->tre_buf_cache = kmem_cache_create("mhi_ep_tre_buf", MHI_EP_DEFAULT_MTU, 0,
- SLAB_CACHE_DMA, NULL);
+ 0, NULL);
if (!mhi_cntrl->tre_buf_cache) {
ret = -ENOMEM;
goto err_destroy_ev_ring_el_cache;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 509580fad7323b6a5da27e8365cd488f3b57210e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073042-crux-ditch-2ce7@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
509580fad732 ("drm/i915/dp: Don't switch the LTTPR mode on an active link")
657586e474bd ("drm/i915: Add a DP1.2 compatible way to read LTTPR capabilities")
77f4ec2a4111 ("drm/i915/dp: remove accidental static on what should be a local variable")
a421d8a99216 ("drm/i915/dp: rewrite DP 2.0 128b/132b link training based on errata")
92e438619d16 ("drm/i915/dp: move intel_dp_prepare_link_train() call")
6c4d46523bf3 ("drm/i915: Pimp link training debug prints")
1f662675335b ("drm/i915: Print the DP vswing adjustment request")
be1525048c58 ("drm/i915: Show LTTPR in the TPS debug print")
c6921d484d3f ("drm/i915: Prepare link training for per-lane drive settings")
e722ab8b6968 ("drm/i915: Generalize .set_signal_levels()")
5bafd85dd770 ("drm/i915: Introduce has_buf_trans_select()")
f820693bc238 ("drm/i915: Introduce has_iboost()")
f6e3be98654e ("drm/i915: Fix DP clock recovery "voltage_tries" handling")
3b4da8315add ("drm/i915/dg2: use existing mechanisms for SNPS PHY translations")
0707570248b8 ("drm/i915/dp: pass crtc_state to intel_ddi_dp_level()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 509580fad7323b6a5da27e8365cd488f3b57210e Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak(a)intel.com>
Date: Mon, 8 Jul 2024 22:00:25 +0300
Subject: [PATCH] drm/i915/dp: Don't switch the LTTPR mode on an active link
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Switching to transparent mode leads to a loss of link synchronization,
so prevent doing this on an active link. This happened at least on an
Intel N100 system / DELL UD22 dock, the LTTPR residing either on the
host or the dock. To fix the issue, keep the current mode on an active
link, adjusting the LTTPR count accordingly (resetting it to 0 in
transparent mode).
v2: Adjust code comment during link training about reiniting the LTTPRs.
(Ville)
Fixes: 7b2a4ab8b0ef ("drm/i915: Switch to LTTPR transparent mode link training")
Reported-and-tested-by: Gareth Yu <gareth.yu(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10902
Cc: <stable(a)vger.kernel.org> # v5.15+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal(a)intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708190029.271247-3-imre.…
(cherry picked from commit 211ad49cf8ccfdc798a719b4d1e000d0a8a9e588)
Signed-off-by: Tvrtko Ursulin <tursulin(a)ursulin.net>
diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.c b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
index 1bc4ef84ff3b..d044c8e36bb3 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
@@ -117,10 +117,24 @@ intel_dp_set_lttpr_transparent_mode(struct intel_dp *intel_dp, bool enable)
return drm_dp_dpcd_write(&intel_dp->aux, DP_PHY_REPEATER_MODE, &val, 1) == 1;
}
-static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
+static bool intel_dp_lttpr_transparent_mode_enabled(struct intel_dp *intel_dp)
+{
+ return intel_dp->lttpr_common_caps[DP_PHY_REPEATER_MODE -
+ DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV] ==
+ DP_PHY_REPEATER_MODE_TRANSPARENT;
+}
+
+/*
+ * Read the LTTPR common capabilities and switch the LTTPR PHYs to
+ * non-transparent mode if this is supported. Preserve the
+ * transparent/non-transparent mode on an active link.
+ *
+ * Return the number of detected LTTPRs in non-transparent mode or 0 if the
+ * LTTPRs are in transparent mode or the detection failed.
+ */
+static int intel_dp_init_lttpr_phys(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
{
int lttpr_count;
- int i;
if (!intel_dp_read_lttpr_common_caps(intel_dp, dpcd))
return 0;
@@ -134,6 +148,19 @@ static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEI
if (lttpr_count == 0)
return 0;
+ /*
+ * Don't change the mode on an active link, to prevent a loss of link
+ * synchronization. See DP Standard v2.0 3.6.7. about the LTTPR
+ * resetting its internal state when the mode is changed from
+ * non-transparent to transparent.
+ */
+ if (intel_dp->link_trained) {
+ if (lttpr_count < 0 || intel_dp_lttpr_transparent_mode_enabled(intel_dp))
+ goto out_reset_lttpr_count;
+
+ return lttpr_count;
+ }
+
/*
* See DP Standard v2.0 3.6.6.1. about the explicit disabling of
* non-transparent mode and the disable->enable non-transparent mode
@@ -154,11 +181,25 @@ static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEI
"Switching to LTTPR non-transparent LT mode failed, fall-back to transparent mode\n");
intel_dp_set_lttpr_transparent_mode(intel_dp, true);
- intel_dp_reset_lttpr_count(intel_dp);
- return 0;
+ goto out_reset_lttpr_count;
}
+ return lttpr_count;
+
+out_reset_lttpr_count:
+ intel_dp_reset_lttpr_count(intel_dp);
+
+ return 0;
+}
+
+static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
+{
+ int lttpr_count;
+ int i;
+
+ lttpr_count = intel_dp_init_lttpr_phys(intel_dp, dpcd);
+
for (i = 0; i < lttpr_count; i++)
intel_dp_read_lttpr_phy_caps(intel_dp, dpcd, DP_PHY_LTTPR(i));
@@ -1482,10 +1523,10 @@ void intel_dp_start_link_train(struct intel_atomic_state *state,
struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
struct intel_encoder *encoder = &dig_port->base;
bool passed;
-
/*
- * TODO: Reiniting LTTPRs here won't be needed once proper connector
- * HW state readout is added.
+ * Reinit the LTTPRs here to ensure that they are switched to
+ * non-transparent mode. During an earlier LTTPR detection this
+ * could've been prevented by an active link.
*/
int lttpr_count = intel_dp_init_lttpr_and_dprx_caps(intel_dp);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 509580fad7323b6a5da27e8365cd488f3b57210e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073041-suffocate-marrow-e9ec@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
509580fad732 ("drm/i915/dp: Don't switch the LTTPR mode on an active link")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 509580fad7323b6a5da27e8365cd488f3b57210e Mon Sep 17 00:00:00 2001
From: Imre Deak <imre.deak(a)intel.com>
Date: Mon, 8 Jul 2024 22:00:25 +0300
Subject: [PATCH] drm/i915/dp: Don't switch the LTTPR mode on an active link
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Switching to transparent mode leads to a loss of link synchronization,
so prevent doing this on an active link. This happened at least on an
Intel N100 system / DELL UD22 dock, the LTTPR residing either on the
host or the dock. To fix the issue, keep the current mode on an active
link, adjusting the LTTPR count accordingly (resetting it to 0 in
transparent mode).
v2: Adjust code comment during link training about reiniting the LTTPRs.
(Ville)
Fixes: 7b2a4ab8b0ef ("drm/i915: Switch to LTTPR transparent mode link training")
Reported-and-tested-by: Gareth Yu <gareth.yu(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10902
Cc: <stable(a)vger.kernel.org> # v5.15+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal(a)intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240708190029.271247-3-imre.…
(cherry picked from commit 211ad49cf8ccfdc798a719b4d1e000d0a8a9e588)
Signed-off-by: Tvrtko Ursulin <tursulin(a)ursulin.net>
diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.c b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
index 1bc4ef84ff3b..d044c8e36bb3 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
@@ -117,10 +117,24 @@ intel_dp_set_lttpr_transparent_mode(struct intel_dp *intel_dp, bool enable)
return drm_dp_dpcd_write(&intel_dp->aux, DP_PHY_REPEATER_MODE, &val, 1) == 1;
}
-static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
+static bool intel_dp_lttpr_transparent_mode_enabled(struct intel_dp *intel_dp)
+{
+ return intel_dp->lttpr_common_caps[DP_PHY_REPEATER_MODE -
+ DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV] ==
+ DP_PHY_REPEATER_MODE_TRANSPARENT;
+}
+
+/*
+ * Read the LTTPR common capabilities and switch the LTTPR PHYs to
+ * non-transparent mode if this is supported. Preserve the
+ * transparent/non-transparent mode on an active link.
+ *
+ * Return the number of detected LTTPRs in non-transparent mode or 0 if the
+ * LTTPRs are in transparent mode or the detection failed.
+ */
+static int intel_dp_init_lttpr_phys(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
{
int lttpr_count;
- int i;
if (!intel_dp_read_lttpr_common_caps(intel_dp, dpcd))
return 0;
@@ -134,6 +148,19 @@ static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEI
if (lttpr_count == 0)
return 0;
+ /*
+ * Don't change the mode on an active link, to prevent a loss of link
+ * synchronization. See DP Standard v2.0 3.6.7. about the LTTPR
+ * resetting its internal state when the mode is changed from
+ * non-transparent to transparent.
+ */
+ if (intel_dp->link_trained) {
+ if (lttpr_count < 0 || intel_dp_lttpr_transparent_mode_enabled(intel_dp))
+ goto out_reset_lttpr_count;
+
+ return lttpr_count;
+ }
+
/*
* See DP Standard v2.0 3.6.6.1. about the explicit disabling of
* non-transparent mode and the disable->enable non-transparent mode
@@ -154,11 +181,25 @@ static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEI
"Switching to LTTPR non-transparent LT mode failed, fall-back to transparent mode\n");
intel_dp_set_lttpr_transparent_mode(intel_dp, true);
- intel_dp_reset_lttpr_count(intel_dp);
- return 0;
+ goto out_reset_lttpr_count;
}
+ return lttpr_count;
+
+out_reset_lttpr_count:
+ intel_dp_reset_lttpr_count(intel_dp);
+
+ return 0;
+}
+
+static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
+{
+ int lttpr_count;
+ int i;
+
+ lttpr_count = intel_dp_init_lttpr_phys(intel_dp, dpcd);
+
for (i = 0; i < lttpr_count; i++)
intel_dp_read_lttpr_phy_caps(intel_dp, dpcd, DP_PHY_LTTPR(i));
@@ -1482,10 +1523,10 @@ void intel_dp_start_link_train(struct intel_atomic_state *state,
struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
struct intel_encoder *encoder = &dig_port->base;
bool passed;
-
/*
- * TODO: Reiniting LTTPRs here won't be needed once proper connector
- * HW state readout is added.
+ * Reinit the LTTPRs here to ensure that they are switched to
+ * non-transparent mode. During an earlier LTTPR detection this
+ * could've been prevented by an active link.
*/
int lttpr_count = intel_dp_init_lttpr_and_dprx_caps(intel_dp);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d63d81094d208abb20fc444514b2d9ec2f4b7c4e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073032-gradient-gully-658f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d63d81094d20 ("drm/dp_mst: Fix all mstb marked as not probed after suspend/resume")
da68386d9edb ("drm: Rename dp/ to display/")
6c64ae228f08 ("Backmerge tag 'v5.17-rc6' into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d63d81094d208abb20fc444514b2d9ec2f4b7c4e Mon Sep 17 00:00:00 2001
From: Wayne Lin <Wayne.Lin(a)amd.com>
Date: Wed, 26 Jun 2024 16:48:23 +0800
Subject: [PATCH] drm/dp_mst: Fix all mstb marked as not probed after
suspend/resume
[Why]
After supend/resume, with topology unchanged, observe that
link_address_sent of all mstb are marked as false even the topology probing
is done without any error.
It is caused by wrongly also include "ret == 0" case as a probing failure
case.
[How]
Remove inappropriate checking conditions.
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Harry Wentland <hwentlan(a)amd.com>
Cc: Jani Nikula <jani.nikula(a)intel.com>
Cc: Imre Deak <imre.deak(a)intel.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: stable(a)vger.kernel.org
Fixes: 37dfdc55ffeb ("drm/dp_mst: Cleanup drm_dp_send_link_address() a bit")
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240626084825.878565-2-Wayne…
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 7f8e1cfbe19d..68831f4e502a 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -2929,7 +2929,7 @@ static int drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
/* FIXME: Actually do some real error handling here */
ret = drm_dp_mst_wait_tx_reply(mstb, txmsg);
- if (ret <= 0) {
+ if (ret < 0) {
drm_err(mgr->dev, "Sending link address failed with %d\n", ret);
goto out;
}
@@ -2981,7 +2981,7 @@ static int drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
mutex_unlock(&mgr->lock);
out:
- if (ret <= 0)
+ if (ret < 0)
mstb->link_address_sent = false;
kfree(txmsg);
return ret < 0 ? ret : changed;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d63d81094d208abb20fc444514b2d9ec2f4b7c4e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073031-survivor-unaudited-6efd@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d63d81094d20 ("drm/dp_mst: Fix all mstb marked as not probed after suspend/resume")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d63d81094d208abb20fc444514b2d9ec2f4b7c4e Mon Sep 17 00:00:00 2001
From: Wayne Lin <Wayne.Lin(a)amd.com>
Date: Wed, 26 Jun 2024 16:48:23 +0800
Subject: [PATCH] drm/dp_mst: Fix all mstb marked as not probed after
suspend/resume
[Why]
After supend/resume, with topology unchanged, observe that
link_address_sent of all mstb are marked as false even the topology probing
is done without any error.
It is caused by wrongly also include "ret == 0" case as a probing failure
case.
[How]
Remove inappropriate checking conditions.
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: Harry Wentland <hwentlan(a)amd.com>
Cc: Jani Nikula <jani.nikula(a)intel.com>
Cc: Imre Deak <imre.deak(a)intel.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: stable(a)vger.kernel.org
Fixes: 37dfdc55ffeb ("drm/dp_mst: Cleanup drm_dp_send_link_address() a bit")
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240626084825.878565-2-Wayne…
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 7f8e1cfbe19d..68831f4e502a 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -2929,7 +2929,7 @@ static int drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
/* FIXME: Actually do some real error handling here */
ret = drm_dp_mst_wait_tx_reply(mstb, txmsg);
- if (ret <= 0) {
+ if (ret < 0) {
drm_err(mgr->dev, "Sending link address failed with %d\n", ret);
goto out;
}
@@ -2981,7 +2981,7 @@ static int drm_dp_send_link_address(struct drm_dp_mst_topology_mgr *mgr,
mutex_unlock(&mgr->lock);
out:
- if (ret <= 0)
+ if (ret < 0)
mstb->link_address_sent = false;
kfree(txmsg);
return ret < 0 ? ret : changed;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073011-enigmatic-overstep-1479@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5aed213c7c6c ("drm/udl: Remove DRM_CONNECTOR_POLL_HPD")
0862cfd3e22f ("drm/udl: Move connector to modesetting code")
43858eb41e0d ("drm/udl: Various improvements to the connector")
2c1eafc40e53 ("drm/udl: Use USB timeout constant when reading EDID")
c020f66013b6 ("drm/udl: Test pixel limit in mode-config's mode-valid function")
59a811faa74f ("drm/udl: Rename struct udl_drm_connector to struct udl_connector")
255490f9150d ("drm: Drop drm_edid.h from drm_crtc.h")
0f95ee9a0c57 ("Merge tag 'drm-misc-next-2022-06-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Fri, 10 May 2024 17:47:08 +0200
Subject: [PATCH] drm/udl: Remove DRM_CONNECTOR_POLL_HPD
DisplayLink devices do not generate hotplug events. Remove the poll
flag DRM_CONNECTOR_POLL_HPD, as it may not be specified together with
DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: afdfc4c6f55f ("drm/udl: Fixed problem with UDL adpater reconnection")
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
Cc: Robert Tarasov <tutankhamen(a)chromium.org>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Sean Paul <sean(a)poorly.run>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.15+
Link: https://patchwork.freedesktop.org/patch/msgid/20240510154841.11370-2-tzimme…
diff --git a/drivers/gpu/drm/udl/udl_modeset.c b/drivers/gpu/drm/udl/udl_modeset.c
index 7702359c90c2..751da3a294c4 100644
--- a/drivers/gpu/drm/udl/udl_modeset.c
+++ b/drivers/gpu/drm/udl/udl_modeset.c
@@ -527,8 +527,7 @@ struct drm_connector *udl_connector_init(struct drm_device *dev)
drm_connector_helper_add(connector, &udl_connector_helper_funcs);
- connector->polled = DRM_CONNECTOR_POLL_HPD |
- DRM_CONNECTOR_POLL_CONNECT |
+ connector->polled = DRM_CONNECTOR_POLL_CONNECT |
DRM_CONNECTOR_POLL_DISCONNECT;
return connector;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073010-pond-chamber-4302@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
5aed213c7c6c ("drm/udl: Remove DRM_CONNECTOR_POLL_HPD")
0862cfd3e22f ("drm/udl: Move connector to modesetting code")
43858eb41e0d ("drm/udl: Various improvements to the connector")
2c1eafc40e53 ("drm/udl: Use USB timeout constant when reading EDID")
c020f66013b6 ("drm/udl: Test pixel limit in mode-config's mode-valid function")
59a811faa74f ("drm/udl: Rename struct udl_drm_connector to struct udl_connector")
255490f9150d ("drm: Drop drm_edid.h from drm_crtc.h")
0f95ee9a0c57 ("Merge tag 'drm-misc-next-2022-06-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Fri, 10 May 2024 17:47:08 +0200
Subject: [PATCH] drm/udl: Remove DRM_CONNECTOR_POLL_HPD
DisplayLink devices do not generate hotplug events. Remove the poll
flag DRM_CONNECTOR_POLL_HPD, as it may not be specified together with
DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: afdfc4c6f55f ("drm/udl: Fixed problem with UDL adpater reconnection")
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
Cc: Robert Tarasov <tutankhamen(a)chromium.org>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Sean Paul <sean(a)poorly.run>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.15+
Link: https://patchwork.freedesktop.org/patch/msgid/20240510154841.11370-2-tzimme…
diff --git a/drivers/gpu/drm/udl/udl_modeset.c b/drivers/gpu/drm/udl/udl_modeset.c
index 7702359c90c2..751da3a294c4 100644
--- a/drivers/gpu/drm/udl/udl_modeset.c
+++ b/drivers/gpu/drm/udl/udl_modeset.c
@@ -527,8 +527,7 @@ struct drm_connector *udl_connector_init(struct drm_device *dev)
drm_connector_helper_add(connector, &udl_connector_helper_funcs);
- connector->polled = DRM_CONNECTOR_POLL_HPD |
- DRM_CONNECTOR_POLL_CONNECT |
+ connector->polled = DRM_CONNECTOR_POLL_CONNECT |
DRM_CONNECTOR_POLL_DISCONNECT;
return connector;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073010-deserve-askew-b1bf@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
5aed213c7c6c ("drm/udl: Remove DRM_CONNECTOR_POLL_HPD")
0862cfd3e22f ("drm/udl: Move connector to modesetting code")
43858eb41e0d ("drm/udl: Various improvements to the connector")
2c1eafc40e53 ("drm/udl: Use USB timeout constant when reading EDID")
c020f66013b6 ("drm/udl: Test pixel limit in mode-config's mode-valid function")
59a811faa74f ("drm/udl: Rename struct udl_drm_connector to struct udl_connector")
255490f9150d ("drm: Drop drm_edid.h from drm_crtc.h")
0f95ee9a0c57 ("Merge tag 'drm-misc-next-2022-06-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Fri, 10 May 2024 17:47:08 +0200
Subject: [PATCH] drm/udl: Remove DRM_CONNECTOR_POLL_HPD
DisplayLink devices do not generate hotplug events. Remove the poll
flag DRM_CONNECTOR_POLL_HPD, as it may not be specified together with
DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: afdfc4c6f55f ("drm/udl: Fixed problem with UDL adpater reconnection")
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
Cc: Robert Tarasov <tutankhamen(a)chromium.org>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Sean Paul <sean(a)poorly.run>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.15+
Link: https://patchwork.freedesktop.org/patch/msgid/20240510154841.11370-2-tzimme…
diff --git a/drivers/gpu/drm/udl/udl_modeset.c b/drivers/gpu/drm/udl/udl_modeset.c
index 7702359c90c2..751da3a294c4 100644
--- a/drivers/gpu/drm/udl/udl_modeset.c
+++ b/drivers/gpu/drm/udl/udl_modeset.c
@@ -527,8 +527,7 @@ struct drm_connector *udl_connector_init(struct drm_device *dev)
drm_connector_helper_add(connector, &udl_connector_helper_funcs);
- connector->polled = DRM_CONNECTOR_POLL_HPD |
- DRM_CONNECTOR_POLL_CONNECT |
+ connector->polled = DRM_CONNECTOR_POLL_CONNECT |
DRM_CONNECTOR_POLL_DISCONNECT;
return connector;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073009-turmoil-zombie-8941@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
5aed213c7c6c ("drm/udl: Remove DRM_CONNECTOR_POLL_HPD")
0862cfd3e22f ("drm/udl: Move connector to modesetting code")
43858eb41e0d ("drm/udl: Various improvements to the connector")
2c1eafc40e53 ("drm/udl: Use USB timeout constant when reading EDID")
c020f66013b6 ("drm/udl: Test pixel limit in mode-config's mode-valid function")
59a811faa74f ("drm/udl: Rename struct udl_drm_connector to struct udl_connector")
255490f9150d ("drm: Drop drm_edid.h from drm_crtc.h")
0f95ee9a0c57 ("Merge tag 'drm-misc-next-2022-06-08' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Fri, 10 May 2024 17:47:08 +0200
Subject: [PATCH] drm/udl: Remove DRM_CONNECTOR_POLL_HPD
DisplayLink devices do not generate hotplug events. Remove the poll
flag DRM_CONNECTOR_POLL_HPD, as it may not be specified together with
DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: afdfc4c6f55f ("drm/udl: Fixed problem with UDL adpater reconnection")
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
Cc: Robert Tarasov <tutankhamen(a)chromium.org>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Sean Paul <sean(a)poorly.run>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.15+
Link: https://patchwork.freedesktop.org/patch/msgid/20240510154841.11370-2-tzimme…
diff --git a/drivers/gpu/drm/udl/udl_modeset.c b/drivers/gpu/drm/udl/udl_modeset.c
index 7702359c90c2..751da3a294c4 100644
--- a/drivers/gpu/drm/udl/udl_modeset.c
+++ b/drivers/gpu/drm/udl/udl_modeset.c
@@ -527,8 +527,7 @@ struct drm_connector *udl_connector_init(struct drm_device *dev)
drm_connector_helper_add(connector, &udl_connector_helper_funcs);
- connector->polled = DRM_CONNECTOR_POLL_HPD |
- DRM_CONNECTOR_POLL_CONNECT |
+ connector->polled = DRM_CONNECTOR_POLL_CONNECT |
DRM_CONNECTOR_POLL_DISCONNECT;
return connector;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073008-bakeshop-unwarlike-9346@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5aed213c7c6c ("drm/udl: Remove DRM_CONNECTOR_POLL_HPD")
0862cfd3e22f ("drm/udl: Move connector to modesetting code")
43858eb41e0d ("drm/udl: Various improvements to the connector")
2c1eafc40e53 ("drm/udl: Use USB timeout constant when reading EDID")
c020f66013b6 ("drm/udl: Test pixel limit in mode-config's mode-valid function")
59a811faa74f ("drm/udl: Rename struct udl_drm_connector to struct udl_connector")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5aed213c7c6c4f5dcb1a3ef146f493f18fe703dc Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Fri, 10 May 2024 17:47:08 +0200
Subject: [PATCH] drm/udl: Remove DRM_CONNECTOR_POLL_HPD
DisplayLink devices do not generate hotplug events. Remove the poll
flag DRM_CONNECTOR_POLL_HPD, as it may not be specified together with
DRM_CONNECTOR_POLL_CONNECT or DRM_CONNECTOR_POLL_DISCONNECT.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: afdfc4c6f55f ("drm/udl: Fixed problem with UDL adpater reconnection")
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
Cc: Robert Tarasov <tutankhamen(a)chromium.org>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Sean Paul <sean(a)poorly.run>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.15+
Link: https://patchwork.freedesktop.org/patch/msgid/20240510154841.11370-2-tzimme…
diff --git a/drivers/gpu/drm/udl/udl_modeset.c b/drivers/gpu/drm/udl/udl_modeset.c
index 7702359c90c2..751da3a294c4 100644
--- a/drivers/gpu/drm/udl/udl_modeset.c
+++ b/drivers/gpu/drm/udl/udl_modeset.c
@@ -527,8 +527,7 @@ struct drm_connector *udl_connector_init(struct drm_device *dev)
drm_connector_helper_add(connector, &udl_connector_helper_funcs);
- connector->polled = DRM_CONNECTOR_POLL_HPD |
- DRM_CONNECTOR_POLL_CONNECT |
+ connector->polled = DRM_CONNECTOR_POLL_CONNECT |
DRM_CONNECTOR_POLL_DISCONNECT;
return connector;
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x fa0c1c9d283b37fdb7fc1dcccbb88fc8f48a4aa4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073042-perennial-patio-0790@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
fa0c1c9d283b ("perf/x86/intel: Add a distinct name for Granite Rapids")
d142df13f357 ("perf/x86/intel: Switch to new Intel CPU model defines")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fa0c1c9d283b37fdb7fc1dcccbb88fc8f48a4aa4 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang(a)linux.intel.com>
Date: Mon, 8 Jul 2024 12:33:35 -0700
Subject: [PATCH] perf/x86/intel: Add a distinct name for Granite Rapids
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, the Sapphire Rapids and Granite Rapids share the same PMU
name, sapphire_rapids. Because from the kernel’s perspective, GNR is
similar to SPR. The only key difference is that they support different
extra MSRs. The code path and the PMU name are shared.
However, from end users' perspective, they are quite different. Besides
the extra MSRs, GNR has a newer PEBS format, supports Retire Latency,
supports new CPUID enumeration architecture, doesn't required the
load-latency AUX event, has additional TMA Level 1 Architectural Events,
etc. The differences can be enumerated by CPUID or the PERF_CAPABILITIES
MSR. They weren't reflected in the model-specific kernel setup.
But it is worth to have a distinct PMU name for GNR.
Fixes: a6742cb90b56 ("perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL")
Suggested-by: Ahmad Yasin <ahmad.yasin(a)intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20240708193336.1192217-3-kan.liang@linux.intel.com
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b61367991a16..0c9c2706d4ec 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6943,12 +6943,18 @@ __init int intel_pmu_init(void)
case INTEL_EMERALDRAPIDS_X:
x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
x86_pmu.extra_regs = intel_glc_extra_regs;
- fallthrough;
+ pr_cont("Sapphire Rapids events, ");
+ name = "sapphire_rapids";
+ goto glc_common;
+
case INTEL_GRANITERAPIDS_X:
case INTEL_GRANITERAPIDS_D:
+ x86_pmu.extra_regs = intel_rwc_extra_regs;
+ pr_cont("Granite Rapids events, ");
+ name = "granite_rapids";
+
+ glc_common:
intel_pmu_init_glc(NULL);
- if (!x86_pmu.extra_regs)
- x86_pmu.extra_regs = intel_rwc_extra_regs;
x86_pmu.pebs_ept = 1;
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = glc_get_event_constraints;
@@ -6959,8 +6965,6 @@ __init int intel_pmu_init(void)
td_attr = glc_td_events_attrs;
tsx_attr = glc_tsx_events_attrs;
intel_pmu_pebs_data_source_skl(true);
- pr_cont("Sapphire Rapids events, ");
- name = "sapphire_rapids";
break;
case INTEL_ALDERLAKE:
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x fa0c1c9d283b37fdb7fc1dcccbb88fc8f48a4aa4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073043-shortlist-silica-557d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
fa0c1c9d283b ("perf/x86/intel: Add a distinct name for Granite Rapids")
d142df13f357 ("perf/x86/intel: Switch to new Intel CPU model defines")
97588df87b56 ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
b0560bfd4b70 ("perf/x86/intel: Clean up the hybrid CPU type handling code")
299a5fc8e783 ("perf/x86/intel: Apply the common initialization code for ADL")
d87d221f854b ("perf/x86/intel: Factor out the initialization code for ADL e-core")
0ba0c03528e9 ("perf/x86/intel: Factor out the initialization code for SPR")
d4b5694c75d4 ("perf/x86/intel: Use the common uarch name for the shared functions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fa0c1c9d283b37fdb7fc1dcccbb88fc8f48a4aa4 Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang(a)linux.intel.com>
Date: Mon, 8 Jul 2024 12:33:35 -0700
Subject: [PATCH] perf/x86/intel: Add a distinct name for Granite Rapids
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, the Sapphire Rapids and Granite Rapids share the same PMU
name, sapphire_rapids. Because from the kernel’s perspective, GNR is
similar to SPR. The only key difference is that they support different
extra MSRs. The code path and the PMU name are shared.
However, from end users' perspective, they are quite different. Besides
the extra MSRs, GNR has a newer PEBS format, supports Retire Latency,
supports new CPUID enumeration architecture, doesn't required the
load-latency AUX event, has additional TMA Level 1 Architectural Events,
etc. The differences can be enumerated by CPUID or the PERF_CAPABILITIES
MSR. They weren't reflected in the model-specific kernel setup.
But it is worth to have a distinct PMU name for GNR.
Fixes: a6742cb90b56 ("perf/x86/intel: Fix the FRONTEND encoding on GNR and MTL")
Suggested-by: Ahmad Yasin <ahmad.yasin(a)intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20240708193336.1192217-3-kan.liang@linux.intel.com
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index b61367991a16..0c9c2706d4ec 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6943,12 +6943,18 @@ __init int intel_pmu_init(void)
case INTEL_EMERALDRAPIDS_X:
x86_pmu.flags |= PMU_FL_MEM_LOADS_AUX;
x86_pmu.extra_regs = intel_glc_extra_regs;
- fallthrough;
+ pr_cont("Sapphire Rapids events, ");
+ name = "sapphire_rapids";
+ goto glc_common;
+
case INTEL_GRANITERAPIDS_X:
case INTEL_GRANITERAPIDS_D:
+ x86_pmu.extra_regs = intel_rwc_extra_regs;
+ pr_cont("Granite Rapids events, ");
+ name = "granite_rapids";
+
+ glc_common:
intel_pmu_init_glc(NULL);
- if (!x86_pmu.extra_regs)
- x86_pmu.extra_regs = intel_rwc_extra_regs;
x86_pmu.pebs_ept = 1;
x86_pmu.hw_config = hsw_hw_config;
x86_pmu.get_event_constraints = glc_get_event_constraints;
@@ -6959,8 +6965,6 @@ __init int intel_pmu_init(void)
td_attr = glc_td_events_attrs;
tsx_attr = glc_tsx_events_attrs;
intel_pmu_pebs_data_source_skl(true);
- pr_cont("Sapphire Rapids events, ");
- name = "sapphire_rapids";
break;
case INTEL_ALDERLAKE:
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x ac9aa295f7a89d38656739628796f086f0b160e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073034-senator-fringe-8188@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
ac9aa295f7a8 ("perf: imx_perf: fix counter start and config sequence")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ac9aa295f7a89d38656739628796f086f0b160e2 Mon Sep 17 00:00:00 2001
From: Xu Yang <xu.yang_2(a)nxp.com>
Date: Wed, 29 May 2024 16:03:55 +0800
Subject: [PATCH] perf: imx_perf: fix counter start and config sequence
In current driver, the counter will start firstly and then be configured.
This sequence is not correct for AXI filter events since the correct
AXI_MASK and AXI_ID are not set yet. Then the results may be inaccurate.
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Fixes: 55691f99d417 ("drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver")
cc: stable(a)vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
Link: https://lore.kernel.org/r/20240529080358.703784-5-xu.yang_2@nxp.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/drivers/perf/fsl_imx9_ddr_perf.c b/drivers/perf/fsl_imx9_ddr_perf.c
index 5433c52a9872..7b43b54920da 100644
--- a/drivers/perf/fsl_imx9_ddr_perf.c
+++ b/drivers/perf/fsl_imx9_ddr_perf.c
@@ -541,12 +541,12 @@ static int ddr_perf_event_add(struct perf_event *event, int flags)
hwc->idx = counter;
hwc->state |= PERF_HES_STOPPED;
- if (flags & PERF_EF_START)
- ddr_perf_event_start(event, flags);
-
/* read trans, write trans, read beat */
imx93_ddr_perf_monitor_config(pmu, event_id, counter, cfg1, cfg2);
+ if (flags & PERF_EF_START)
+ ddr_perf_event_start(event, flags);
+
return 0;
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x ac9aa295f7a89d38656739628796f086f0b160e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073034-defender-boastful-74f0@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
ac9aa295f7a8 ("perf: imx_perf: fix counter start and config sequence")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ac9aa295f7a89d38656739628796f086f0b160e2 Mon Sep 17 00:00:00 2001
From: Xu Yang <xu.yang_2(a)nxp.com>
Date: Wed, 29 May 2024 16:03:55 +0800
Subject: [PATCH] perf: imx_perf: fix counter start and config sequence
In current driver, the counter will start firstly and then be configured.
This sequence is not correct for AXI filter events since the correct
AXI_MASK and AXI_ID are not set yet. Then the results may be inaccurate.
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Fixes: 55691f99d417 ("drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver")
cc: stable(a)vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
Link: https://lore.kernel.org/r/20240529080358.703784-5-xu.yang_2@nxp.com
Signed-off-by: Will Deacon <will(a)kernel.org>
diff --git a/drivers/perf/fsl_imx9_ddr_perf.c b/drivers/perf/fsl_imx9_ddr_perf.c
index 5433c52a9872..7b43b54920da 100644
--- a/drivers/perf/fsl_imx9_ddr_perf.c
+++ b/drivers/perf/fsl_imx9_ddr_perf.c
@@ -541,12 +541,12 @@ static int ddr_perf_event_add(struct perf_event *event, int flags)
hwc->idx = counter;
hwc->state |= PERF_HES_STOPPED;
- if (flags & PERF_EF_START)
- ddr_perf_event_start(event, flags);
-
/* read trans, write trans, read beat */
imx93_ddr_perf_monitor_config(pmu, event_id, counter, cfg1, cfg2);
+ if (flags & PERF_EF_START)
+ ddr_perf_event_start(event, flags);
+
return 0;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x ad97196379d0b8cb24ef3d5006978a6554e6467f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073023-blame-activist-10a9@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
ad97196379d0 ("perf/x86/intel/pt: Fix a topa_entry base address calculation")
38bb8d77d0b9 ("perf/x86/intel/pt: Split ToPA metadata and page layout")
539f7c26b41d ("perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry calculation")
fffec50f541a ("perf/x86/intel/pt: Use helpers to obtain ToPA entry size")
f6d079ce867d ("perf/x86/intel/pt: Export pt_cap_get()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ad97196379d0b8cb24ef3d5006978a6554e6467f Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter(a)intel.com>
Date: Mon, 24 Jun 2024 23:10:56 +0300
Subject: [PATCH] perf/x86/intel/pt: Fix a topa_entry base address calculation
topa_entry->base is a bit-field. Bit-fields are not promoted to a 64-bit
type, even if the underlying type is 64-bit, and so, if necessary, must
be cast to a larger type when calculations are done.
Fix a topa_entry->base address calculation by adding a cast.
Without the cast, the address was limited to 36-bits i.e. 64GiB.
The address calculation is used on systems that do not support Multiple
Entry ToPA (only Broadwell), and affects physical addresses on or above
64GiB. Instead of writing to the correct address, the address comprising
the first 36 bits would be written to.
Intel PT snapshot and sampling modes are not affected.
Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver")
Reported-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240624201101.60186-3-adrian.hunter@intel.com
diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 14db6d9d318b..047a2cd5b3fe 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -878,7 +878,7 @@ static void pt_update_head(struct pt *pt)
*/
static void *pt_buffer_region(struct pt_buffer *buf)
{
- return phys_to_virt(TOPA_ENTRY(buf->cur, buf->cur_idx)->base << TOPA_SHIFT);
+ return phys_to_virt((phys_addr_t)TOPA_ENTRY(buf->cur, buf->cur_idx)->base << TOPA_SHIFT);
}
/**
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 76ba6acfcce871db13ad51c6dc8f56fec2e92853
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073058-game-plant-575c@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
76ba6acfcce8 ("mm: optimize the redundant loop of mm_update_owner_next()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76ba6acfcce871db13ad51c6dc8f56fec2e92853 Mon Sep 17 00:00:00 2001
From: Jinliang Zheng <alexjlzheng(a)tencent.com>
Date: Thu, 20 Jun 2024 20:21:24 +0800
Subject: [PATCH] mm: optimize the redundant loop of mm_update_owner_next()
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng(a)tencent.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..81fcee45d630 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
* Search through everything else, we should not get here often.
*/
for_each_process(g) {
+ if (atomic_read(&mm->mm_users) <= 1)
+ break;
if (g->flags & PF_KTHREAD)
continue;
for_each_thread(g, c) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 76ba6acfcce871db13ad51c6dc8f56fec2e92853
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073056-hypertext-proving-0c10@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
76ba6acfcce8 ("mm: optimize the redundant loop of mm_update_owner_next()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76ba6acfcce871db13ad51c6dc8f56fec2e92853 Mon Sep 17 00:00:00 2001
From: Jinliang Zheng <alexjlzheng(a)tencent.com>
Date: Thu, 20 Jun 2024 20:21:24 +0800
Subject: [PATCH] mm: optimize the redundant loop of mm_update_owner_next()
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng(a)tencent.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..81fcee45d630 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
* Search through everything else, we should not get here often.
*/
for_each_process(g) {
+ if (atomic_read(&mm->mm_users) <= 1)
+ break;
if (g->flags & PF_KTHREAD)
continue;
for_each_thread(g, c) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 76ba6acfcce871db13ad51c6dc8f56fec2e92853
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073057-ogle-vocalist-31c7@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
76ba6acfcce8 ("mm: optimize the redundant loop of mm_update_owner_next()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76ba6acfcce871db13ad51c6dc8f56fec2e92853 Mon Sep 17 00:00:00 2001
From: Jinliang Zheng <alexjlzheng(a)tencent.com>
Date: Thu, 20 Jun 2024 20:21:24 +0800
Subject: [PATCH] mm: optimize the redundant loop of mm_update_owner_next()
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng(a)tencent.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..81fcee45d630 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
* Search through everything else, we should not get here often.
*/
for_each_process(g) {
+ if (atomic_read(&mm->mm_users) <= 1)
+ break;
if (g->flags & PF_KTHREAD)
continue;
for_each_thread(g, c) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 76ba6acfcce871db13ad51c6dc8f56fec2e92853
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073055-coaster-antitrust-74f5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
76ba6acfcce8 ("mm: optimize the redundant loop of mm_update_owner_next()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76ba6acfcce871db13ad51c6dc8f56fec2e92853 Mon Sep 17 00:00:00 2001
From: Jinliang Zheng <alexjlzheng(a)tencent.com>
Date: Thu, 20 Jun 2024 20:21:24 +0800
Subject: [PATCH] mm: optimize the redundant loop of mm_update_owner_next()
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng(a)tencent.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..81fcee45d630 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
* Search through everything else, we should not get here often.
*/
for_each_process(g) {
+ if (atomic_read(&mm->mm_users) <= 1)
+ break;
if (g->flags & PF_KTHREAD)
continue;
for_each_thread(g, c) {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 76ba6acfcce871db13ad51c6dc8f56fec2e92853
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073055-planner-cabana-cb16@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
76ba6acfcce8 ("mm: optimize the redundant loop of mm_update_owner_next()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76ba6acfcce871db13ad51c6dc8f56fec2e92853 Mon Sep 17 00:00:00 2001
From: Jinliang Zheng <alexjlzheng(a)tencent.com>
Date: Thu, 20 Jun 2024 20:21:24 +0800
Subject: [PATCH] mm: optimize the redundant loop of mm_update_owner_next()
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng(a)tencent.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..81fcee45d630 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
* Search through everything else, we should not get here often.
*/
for_each_process(g) {
+ if (atomic_read(&mm->mm_users) <= 1)
+ break;
if (g->flags & PF_KTHREAD)
continue;
for_each_thread(g, c) {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 76ba6acfcce871db13ad51c6dc8f56fec2e92853
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073054-emblaze-cufflink-f604@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
76ba6acfcce8 ("mm: optimize the redundant loop of mm_update_owner_next()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76ba6acfcce871db13ad51c6dc8f56fec2e92853 Mon Sep 17 00:00:00 2001
From: Jinliang Zheng <alexjlzheng(a)tencent.com>
Date: Thu, 20 Jun 2024 20:21:24 +0800
Subject: [PATCH] mm: optimize the redundant loop of mm_update_owner_next()
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng(a)tencent.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..81fcee45d630 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
* Search through everything else, we should not get here often.
*/
for_each_process(g) {
+ if (atomic_read(&mm->mm_users) <= 1)
+ break;
if (g->flags & PF_KTHREAD)
continue;
for_each_thread(g, c) {
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 76ba6acfcce871db13ad51c6dc8f56fec2e92853
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073053-sterilize-garland-3014@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
76ba6acfcce8 ("mm: optimize the redundant loop of mm_update_owner_next()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76ba6acfcce871db13ad51c6dc8f56fec2e92853 Mon Sep 17 00:00:00 2001
From: Jinliang Zheng <alexjlzheng(a)tencent.com>
Date: Thu, 20 Jun 2024 20:21:24 +0800
Subject: [PATCH] mm: optimize the redundant loop of mm_update_owner_next()
When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or
/proc or ptrace or page migration (get_task_mm()), it is impossible to
find an appropriate task_struct in the loop whose mm_struct is the same as
the target mm_struct.
If the above race condition is combined with the stress-ng-zombie and
stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in
write_lock_irq() for tasklist_lock.
Recognize this situation in advance and exit early.
Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com
Signed-off-by: Jinliang Zheng <alexjlzheng(a)tencent.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/exit.c b/kernel/exit.c
index f95a2c1338a8..81fcee45d630 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -484,6 +484,8 @@ void mm_update_next_owner(struct mm_struct *mm)
* Search through everything else, we should not get here often.
*/
for_each_process(g) {
+ if (atomic_read(&mm->mm_users) <= 1)
+ break;
if (g->flags & PF_KTHREAD)
continue;
for_each_thread(g, c) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 63d9866ab01ffd0d0835d5564107283a4afc0a38
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073048-jurist-stem-a276@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
63d9866ab01f ("mm: shmem: rename mTHP shmem counters")
f216c845f3c7 ("mm: add per-order mTHP split counters")
66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous shmem")
e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
3d95bc21cea5 ("mm: shmem: add THP validation for PMD-mapped THP related statistics")
6f775463d002 ("mm: shmem: use folio_alloc_mpol() in shmem_alloc_folio()")
0d648dd5c899 ("mm: drop the 'anon_' prefix for swap-out mTHP counters")
42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
5ed890ce5147 ("mm: vmscan: avoid split during shrink_folio_list()")
835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for THP split statistics")
085ff35e7636 ("mm: memory: move mem_cgroup_charge() into alloc_anon_folio()")
19eaf44954df ("mm: thp: support allocation of anonymous multi-size THP")
3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma")
23e4883248f0 ("mm: add page_rmappable_folio() wrapper")
c36f6e6dff4d ("mempolicy trivia: slightly more consistent naming")
7f1ee4e20708 ("mempolicy trivia: delete those ancient pr_debug()s")
1cb5d11a370f ("mempolicy: fix migrate_pages(2) syscall return nr_failed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 63d9866ab01ffd0d0835d5564107283a4afc0a38 Mon Sep 17 00:00:00 2001
From: Ryan Roberts <ryan.roberts(a)arm.com>
Date: Wed, 10 Jul 2024 10:55:01 +0100
Subject: [PATCH] mm: shmem: rename mTHP shmem counters
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Daniel Gomez <da.gomez(a)samsung.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index fe237825b95c..058485daf186 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -412,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -496,16 +499,16 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
-file_alloc
- is incremented every time a file huge page is successfully
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
allocated.
-file_fallback
- is incremented if a file huge page is attempted to be allocated
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
but fails and instead falls back to using small pages.
-file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index acb6ac24a07e..cff002be83eb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -269,9 +269,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
- MTHP_STAT_FILE_ALLOC,
- MTHP_STAT_FILE_FALLBACK,
- MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SHMEM_ALLOC,
+ MTHP_STAT_SHMEM_FALLBACK,
+ MTHP_STAT_SHMEM_FALLBACK_CHARGE,
MTHP_STAT_SPLIT,
MTHP_STAT_SPLIT_FAILED,
MTHP_STAT_SPLIT_DEFERRED,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec64aa2be94..f9696c94e211 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
-DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
@@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
- &file_alloc_attr.attr,
- &file_fallback_attr.attr,
- &file_fallback_charge_attr.attr,
+ &shmem_alloc_attr.attr,
+ &shmem_fallback_attr.attr,
+ &shmem_fallback_charge_attr.attr,
&split_attr.attr,
&split_failed_attr.attr,
&split_deferred_attr.attr,
diff --git a/mm/shmem.c b/mm/shmem.c
index 921d59c3d669..f24dfbd387ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
#endif
order = next_order(&suitable_orders, order);
}
@@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
#endif
}
goto unlock;
@@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
#endif
goto alloced;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 63d9866ab01ffd0d0835d5564107283a4afc0a38
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073046-feminine-relapsing-3d49@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
63d9866ab01f ("mm: shmem: rename mTHP shmem counters")
f216c845f3c7 ("mm: add per-order mTHP split counters")
66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous shmem")
e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
3d95bc21cea5 ("mm: shmem: add THP validation for PMD-mapped THP related statistics")
6f775463d002 ("mm: shmem: use folio_alloc_mpol() in shmem_alloc_folio()")
0d648dd5c899 ("mm: drop the 'anon_' prefix for swap-out mTHP counters")
42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
5ed890ce5147 ("mm: vmscan: avoid split during shrink_folio_list()")
835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for THP split statistics")
085ff35e7636 ("mm: memory: move mem_cgroup_charge() into alloc_anon_folio()")
19eaf44954df ("mm: thp: support allocation of anonymous multi-size THP")
3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma")
23e4883248f0 ("mm: add page_rmappable_folio() wrapper")
c36f6e6dff4d ("mempolicy trivia: slightly more consistent naming")
7f1ee4e20708 ("mempolicy trivia: delete those ancient pr_debug()s")
1cb5d11a370f ("mempolicy: fix migrate_pages(2) syscall return nr_failed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 63d9866ab01ffd0d0835d5564107283a4afc0a38 Mon Sep 17 00:00:00 2001
From: Ryan Roberts <ryan.roberts(a)arm.com>
Date: Wed, 10 Jul 2024 10:55:01 +0100
Subject: [PATCH] mm: shmem: rename mTHP shmem counters
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Daniel Gomez <da.gomez(a)samsung.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index fe237825b95c..058485daf186 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -412,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -496,16 +499,16 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
-file_alloc
- is incremented every time a file huge page is successfully
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
allocated.
-file_fallback
- is incremented if a file huge page is attempted to be allocated
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
but fails and instead falls back to using small pages.
-file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index acb6ac24a07e..cff002be83eb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -269,9 +269,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
- MTHP_STAT_FILE_ALLOC,
- MTHP_STAT_FILE_FALLBACK,
- MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SHMEM_ALLOC,
+ MTHP_STAT_SHMEM_FALLBACK,
+ MTHP_STAT_SHMEM_FALLBACK_CHARGE,
MTHP_STAT_SPLIT,
MTHP_STAT_SPLIT_FAILED,
MTHP_STAT_SPLIT_DEFERRED,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec64aa2be94..f9696c94e211 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
-DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
@@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
- &file_alloc_attr.attr,
- &file_fallback_attr.attr,
- &file_fallback_charge_attr.attr,
+ &shmem_alloc_attr.attr,
+ &shmem_fallback_attr.attr,
+ &shmem_fallback_charge_attr.attr,
&split_attr.attr,
&split_failed_attr.attr,
&split_deferred_attr.attr,
diff --git a/mm/shmem.c b/mm/shmem.c
index 921d59c3d669..f24dfbd387ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
#endif
order = next_order(&suitable_orders, order);
}
@@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
#endif
}
goto unlock;
@@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
#endif
goto alloced;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 63d9866ab01ffd0d0835d5564107283a4afc0a38
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073045-browbeat-banshee-05db@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
63d9866ab01f ("mm: shmem: rename mTHP shmem counters")
f216c845f3c7 ("mm: add per-order mTHP split counters")
66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous shmem")
e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
3d95bc21cea5 ("mm: shmem: add THP validation for PMD-mapped THP related statistics")
6f775463d002 ("mm: shmem: use folio_alloc_mpol() in shmem_alloc_folio()")
0d648dd5c899 ("mm: drop the 'anon_' prefix for swap-out mTHP counters")
42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
5ed890ce5147 ("mm: vmscan: avoid split during shrink_folio_list()")
835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for THP split statistics")
085ff35e7636 ("mm: memory: move mem_cgroup_charge() into alloc_anon_folio()")
19eaf44954df ("mm: thp: support allocation of anonymous multi-size THP")
3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma")
23e4883248f0 ("mm: add page_rmappable_folio() wrapper")
c36f6e6dff4d ("mempolicy trivia: slightly more consistent naming")
7f1ee4e20708 ("mempolicy trivia: delete those ancient pr_debug()s")
1cb5d11a370f ("mempolicy: fix migrate_pages(2) syscall return nr_failed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 63d9866ab01ffd0d0835d5564107283a4afc0a38 Mon Sep 17 00:00:00 2001
From: Ryan Roberts <ryan.roberts(a)arm.com>
Date: Wed, 10 Jul 2024 10:55:01 +0100
Subject: [PATCH] mm: shmem: rename mTHP shmem counters
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Daniel Gomez <da.gomez(a)samsung.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index fe237825b95c..058485daf186 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -412,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -496,16 +499,16 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
-file_alloc
- is incremented every time a file huge page is successfully
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
allocated.
-file_fallback
- is incremented if a file huge page is attempted to be allocated
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
but fails and instead falls back to using small pages.
-file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index acb6ac24a07e..cff002be83eb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -269,9 +269,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
- MTHP_STAT_FILE_ALLOC,
- MTHP_STAT_FILE_FALLBACK,
- MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SHMEM_ALLOC,
+ MTHP_STAT_SHMEM_FALLBACK,
+ MTHP_STAT_SHMEM_FALLBACK_CHARGE,
MTHP_STAT_SPLIT,
MTHP_STAT_SPLIT_FAILED,
MTHP_STAT_SPLIT_DEFERRED,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec64aa2be94..f9696c94e211 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
-DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
@@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
- &file_alloc_attr.attr,
- &file_fallback_attr.attr,
- &file_fallback_charge_attr.attr,
+ &shmem_alloc_attr.attr,
+ &shmem_fallback_attr.attr,
+ &shmem_fallback_charge_attr.attr,
&split_attr.attr,
&split_failed_attr.attr,
&split_deferred_attr.attr,
diff --git a/mm/shmem.c b/mm/shmem.c
index 921d59c3d669..f24dfbd387ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
#endif
order = next_order(&suitable_orders, order);
}
@@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
#endif
}
goto unlock;
@@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
#endif
goto alloced;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 63d9866ab01ffd0d0835d5564107283a4afc0a38
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073044-compound-musky-a016@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
63d9866ab01f ("mm: shmem: rename mTHP shmem counters")
f216c845f3c7 ("mm: add per-order mTHP split counters")
66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous shmem")
e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
3d95bc21cea5 ("mm: shmem: add THP validation for PMD-mapped THP related statistics")
6f775463d002 ("mm: shmem: use folio_alloc_mpol() in shmem_alloc_folio()")
0d648dd5c899 ("mm: drop the 'anon_' prefix for swap-out mTHP counters")
42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
5ed890ce5147 ("mm: vmscan: avoid split during shrink_folio_list()")
835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for THP split statistics")
085ff35e7636 ("mm: memory: move mem_cgroup_charge() into alloc_anon_folio()")
19eaf44954df ("mm: thp: support allocation of anonymous multi-size THP")
3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma")
23e4883248f0 ("mm: add page_rmappable_folio() wrapper")
c36f6e6dff4d ("mempolicy trivia: slightly more consistent naming")
7f1ee4e20708 ("mempolicy trivia: delete those ancient pr_debug()s")
1cb5d11a370f ("mempolicy: fix migrate_pages(2) syscall return nr_failed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 63d9866ab01ffd0d0835d5564107283a4afc0a38 Mon Sep 17 00:00:00 2001
From: Ryan Roberts <ryan.roberts(a)arm.com>
Date: Wed, 10 Jul 2024 10:55:01 +0100
Subject: [PATCH] mm: shmem: rename mTHP shmem counters
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Daniel Gomez <da.gomez(a)samsung.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index fe237825b95c..058485daf186 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -412,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -496,16 +499,16 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
-file_alloc
- is incremented every time a file huge page is successfully
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
allocated.
-file_fallback
- is incremented if a file huge page is attempted to be allocated
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
but fails and instead falls back to using small pages.
-file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index acb6ac24a07e..cff002be83eb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -269,9 +269,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
- MTHP_STAT_FILE_ALLOC,
- MTHP_STAT_FILE_FALLBACK,
- MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SHMEM_ALLOC,
+ MTHP_STAT_SHMEM_FALLBACK,
+ MTHP_STAT_SHMEM_FALLBACK_CHARGE,
MTHP_STAT_SPLIT,
MTHP_STAT_SPLIT_FAILED,
MTHP_STAT_SPLIT_DEFERRED,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec64aa2be94..f9696c94e211 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
-DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
@@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
- &file_alloc_attr.attr,
- &file_fallback_attr.attr,
- &file_fallback_charge_attr.attr,
+ &shmem_alloc_attr.attr,
+ &shmem_fallback_attr.attr,
+ &shmem_fallback_charge_attr.attr,
&split_attr.attr,
&split_failed_attr.attr,
&split_deferred_attr.attr,
diff --git a/mm/shmem.c b/mm/shmem.c
index 921d59c3d669..f24dfbd387ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
#endif
order = next_order(&suitable_orders, order);
}
@@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
#endif
}
goto unlock;
@@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
#endif
goto alloced;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 63d9866ab01ffd0d0835d5564107283a4afc0a38
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073043-licking-broadband-c7f8@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
63d9866ab01f ("mm: shmem: rename mTHP shmem counters")
f216c845f3c7 ("mm: add per-order mTHP split counters")
66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous shmem")
e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
3d95bc21cea5 ("mm: shmem: add THP validation for PMD-mapped THP related statistics")
6f775463d002 ("mm: shmem: use folio_alloc_mpol() in shmem_alloc_folio()")
0d648dd5c899 ("mm: drop the 'anon_' prefix for swap-out mTHP counters")
42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
5ed890ce5147 ("mm: vmscan: avoid split during shrink_folio_list()")
835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for THP split statistics")
085ff35e7636 ("mm: memory: move mem_cgroup_charge() into alloc_anon_folio()")
19eaf44954df ("mm: thp: support allocation of anonymous multi-size THP")
3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma")
23e4883248f0 ("mm: add page_rmappable_folio() wrapper")
c36f6e6dff4d ("mempolicy trivia: slightly more consistent naming")
7f1ee4e20708 ("mempolicy trivia: delete those ancient pr_debug()s")
1cb5d11a370f ("mempolicy: fix migrate_pages(2) syscall return nr_failed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 63d9866ab01ffd0d0835d5564107283a4afc0a38 Mon Sep 17 00:00:00 2001
From: Ryan Roberts <ryan.roberts(a)arm.com>
Date: Wed, 10 Jul 2024 10:55:01 +0100
Subject: [PATCH] mm: shmem: rename mTHP shmem counters
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Daniel Gomez <da.gomez(a)samsung.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index fe237825b95c..058485daf186 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -412,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -496,16 +499,16 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
-file_alloc
- is incremented every time a file huge page is successfully
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
allocated.
-file_fallback
- is incremented if a file huge page is attempted to be allocated
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
but fails and instead falls back to using small pages.
-file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index acb6ac24a07e..cff002be83eb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -269,9 +269,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
- MTHP_STAT_FILE_ALLOC,
- MTHP_STAT_FILE_FALLBACK,
- MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SHMEM_ALLOC,
+ MTHP_STAT_SHMEM_FALLBACK,
+ MTHP_STAT_SHMEM_FALLBACK_CHARGE,
MTHP_STAT_SPLIT,
MTHP_STAT_SPLIT_FAILED,
MTHP_STAT_SPLIT_DEFERRED,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec64aa2be94..f9696c94e211 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
-DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
@@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
- &file_alloc_attr.attr,
- &file_fallback_attr.attr,
- &file_fallback_charge_attr.attr,
+ &shmem_alloc_attr.attr,
+ &shmem_fallback_attr.attr,
+ &shmem_fallback_charge_attr.attr,
&split_attr.attr,
&split_failed_attr.attr,
&split_deferred_attr.attr,
diff --git a/mm/shmem.c b/mm/shmem.c
index 921d59c3d669..f24dfbd387ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
#endif
order = next_order(&suitable_orders, order);
}
@@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
#endif
}
goto unlock;
@@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
#endif
goto alloced;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 63d9866ab01ffd0d0835d5564107283a4afc0a38
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073042-livable-headscarf-8bb2@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
63d9866ab01f ("mm: shmem: rename mTHP shmem counters")
f216c845f3c7 ("mm: add per-order mTHP split counters")
66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous shmem")
e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
3d95bc21cea5 ("mm: shmem: add THP validation for PMD-mapped THP related statistics")
6f775463d002 ("mm: shmem: use folio_alloc_mpol() in shmem_alloc_folio()")
0d648dd5c899 ("mm: drop the 'anon_' prefix for swap-out mTHP counters")
42248b9d34ea ("mm: add docs for per-order mTHP counters and transhuge_page ABI")
d0f048ac39f6 ("mm: add per-order mTHP anon_swpout and anon_swpout_fallback counters")
ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters")
5ed890ce5147 ("mm: vmscan: avoid split during shrink_folio_list()")
835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for THP split statistics")
085ff35e7636 ("mm: memory: move mem_cgroup_charge() into alloc_anon_folio()")
19eaf44954df ("mm: thp: support allocation of anonymous multi-size THP")
3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma")
23e4883248f0 ("mm: add page_rmappable_folio() wrapper")
c36f6e6dff4d ("mempolicy trivia: slightly more consistent naming")
7f1ee4e20708 ("mempolicy trivia: delete those ancient pr_debug()s")
1cb5d11a370f ("mempolicy: fix migrate_pages(2) syscall return nr_failed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 63d9866ab01ffd0d0835d5564107283a4afc0a38 Mon Sep 17 00:00:00 2001
From: Ryan Roberts <ryan.roberts(a)arm.com>
Date: Wed, 10 Jul 2024 10:55:01 +0100
Subject: [PATCH] mm: shmem: rename mTHP shmem counters
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Daniel Gomez <da.gomez(a)samsung.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index fe237825b95c..058485daf186 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -412,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -496,16 +499,16 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
-file_alloc
- is incremented every time a file huge page is successfully
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
allocated.
-file_fallback
- is incremented if a file huge page is attempted to be allocated
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
but fails and instead falls back to using small pages.
-file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index acb6ac24a07e..cff002be83eb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -269,9 +269,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
- MTHP_STAT_FILE_ALLOC,
- MTHP_STAT_FILE_FALLBACK,
- MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SHMEM_ALLOC,
+ MTHP_STAT_SHMEM_FALLBACK,
+ MTHP_STAT_SHMEM_FALLBACK_CHARGE,
MTHP_STAT_SPLIT,
MTHP_STAT_SPLIT_FAILED,
MTHP_STAT_SPLIT_DEFERRED,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec64aa2be94..f9696c94e211 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
-DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
@@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
- &file_alloc_attr.attr,
- &file_fallback_attr.attr,
- &file_fallback_charge_attr.attr,
+ &shmem_alloc_attr.attr,
+ &shmem_fallback_attr.attr,
+ &shmem_fallback_charge_attr.attr,
&split_attr.attr,
&split_failed_attr.attr,
&split_deferred_attr.attr,
diff --git a/mm/shmem.c b/mm/shmem.c
index 921d59c3d669..f24dfbd387ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
#endif
order = next_order(&suitable_orders, order);
}
@@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
#endif
}
goto unlock;
@@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
#endif
goto alloced;
}
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 63d9866ab01ffd0d0835d5564107283a4afc0a38
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073041-sadden-duller-4fb5@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
63d9866ab01f ("mm: shmem: rename mTHP shmem counters")
f216c845f3c7 ("mm: add per-order mTHP split counters")
66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous shmem")
e7a2ab7b3bb5 ("mm: shmem: add mTHP support for anonymous shmem")
3d95bc21cea5 ("mm: shmem: add THP validation for PMD-mapped THP related statistics")
6f775463d002 ("mm: shmem: use folio_alloc_mpol() in shmem_alloc_folio()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 63d9866ab01ffd0d0835d5564107283a4afc0a38 Mon Sep 17 00:00:00 2001
From: Ryan Roberts <ryan.roberts(a)arm.com>
Date: Wed, 10 Jul 2024 10:55:01 +0100
Subject: [PATCH] mm: shmem: rename mTHP shmem counters
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Daniel Gomez <da.gomez(a)samsung.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index fe237825b95c..058485daf186 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -412,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -496,16 +499,16 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
-file_alloc
- is incremented every time a file huge page is successfully
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
allocated.
-file_fallback
- is incremented if a file huge page is attempted to be allocated
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
but fails and instead falls back to using small pages.
-file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index acb6ac24a07e..cff002be83eb 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -269,9 +269,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
- MTHP_STAT_FILE_ALLOC,
- MTHP_STAT_FILE_FALLBACK,
- MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SHMEM_ALLOC,
+ MTHP_STAT_SHMEM_FALLBACK,
+ MTHP_STAT_SHMEM_FALLBACK_CHARGE,
MTHP_STAT_SPLIT,
MTHP_STAT_SPLIT_FAILED,
MTHP_STAT_SPLIT_DEFERRED,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9ec64aa2be94..f9696c94e211 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallback, MTHP_STAT_ANON_FAULT_FALLBACK);
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
-DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
@@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] = {
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
- &file_alloc_attr.attr,
- &file_fallback_attr.attr,
- &file_fallback_charge_attr.attr,
+ &shmem_alloc_attr.attr,
+ &shmem_fallback_attr.attr,
+ &shmem_fallback_charge_attr.attr,
&split_attr.attr,
&split_failed_attr.attr,
&split_deferred_attr.attr,
diff --git a/mm/shmem.c b/mm/shmem.c
index 921d59c3d669..f24dfbd387ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
#endif
order = next_order(&suitable_orders, order);
}
@@ -1804,8 +1804,8 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
#endif
}
goto unlock;
@@ -2181,7 +2181,7 @@ static int shmem_get_folio_gfp(struct inode *inode, pgoff_t index,
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
#endif
goto alloced;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 8dfcffa37094fef2c8cf8b602316766a86956d07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073036-headed-ethically-758e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
8dfcffa37094 ("Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"")
9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dfcffa37094fef2c8cf8b602316766a86956d07 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 21 Jun 2024 16:42:37 +0200
Subject: [PATCH] Revert "mm/writeback: fix possible divide-by-zero in
wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7168e25f88e5..c4aa6e84c20a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1683,7 +1683,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
*/
dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh);
dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 8dfcffa37094fef2c8cf8b602316766a86956d07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073034-regulator-impatient-bbba@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
8dfcffa37094 ("Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"")
9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dfcffa37094fef2c8cf8b602316766a86956d07 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 21 Jun 2024 16:42:37 +0200
Subject: [PATCH] Revert "mm/writeback: fix possible divide-by-zero in
wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7168e25f88e5..c4aa6e84c20a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1683,7 +1683,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
*/
dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh);
dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8dfcffa37094fef2c8cf8b602316766a86956d07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073033-riverbed-nutcase-30ab@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
8dfcffa37094 ("Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"")
9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dfcffa37094fef2c8cf8b602316766a86956d07 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 21 Jun 2024 16:42:37 +0200
Subject: [PATCH] Revert "mm/writeback: fix possible divide-by-zero in
wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7168e25f88e5..c4aa6e84c20a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1683,7 +1683,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
*/
dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh);
dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8dfcffa37094fef2c8cf8b602316766a86956d07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073032-uniquely-herring-53cb@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
8dfcffa37094 ("Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"")
9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dfcffa37094fef2c8cf8b602316766a86956d07 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 21 Jun 2024 16:42:37 +0200
Subject: [PATCH] Revert "mm/writeback: fix possible divide-by-zero in
wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7168e25f88e5..c4aa6e84c20a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1683,7 +1683,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
*/
dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh);
dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8dfcffa37094fef2c8cf8b602316766a86956d07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073031-dandelion-revered-89bd@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8dfcffa37094 ("Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"")
9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dfcffa37094fef2c8cf8b602316766a86956d07 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 21 Jun 2024 16:42:37 +0200
Subject: [PATCH] Revert "mm/writeback: fix possible divide-by-zero in
wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7168e25f88e5..c4aa6e84c20a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1683,7 +1683,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
*/
dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh);
dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 8dfcffa37094fef2c8cf8b602316766a86956d07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073030-affront-vigorous-240c@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
8dfcffa37094 ("Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"")
9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dfcffa37094fef2c8cf8b602316766a86956d07 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 21 Jun 2024 16:42:37 +0200
Subject: [PATCH] Revert "mm/writeback: fix possible divide-by-zero in
wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7168e25f88e5..c4aa6e84c20a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1683,7 +1683,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
*/
dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh);
dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8dfcffa37094fef2c8cf8b602316766a86956d07
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073029-ungraded-buckskin-4be1@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
8dfcffa37094 ("Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8dfcffa37094fef2c8cf8b602316766a86956d07 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 21 Jun 2024 16:42:37 +0200
Subject: [PATCH] Revert "mm/writeback: fix possible divide-by-zero in
wb_dirty_limits(), again"
Patch series "mm: Avoid possible overflows in dirty throttling".
Dirty throttling logic assumes dirty limits in page units fit into
32-bits. This patch series makes sure this is true (see patch 2/2 for
more details).
This patch (of 2):
This reverts commit 9319b647902cbd5cc884ac08a8a6d54ce111fc78.
The commit is broken in several ways. Firstly, the removed (u64) cast
from the multiplication will introduce a multiplication overflow on 32-bit
archs if wb_thresh * bg_thresh >= 1<<32 (which is actually common - the
default settings with 4GB of RAM will trigger this). Secondly, the
div64_u64() is unnecessarily expensive on 32-bit archs. We have
div64_ul() in case we want to be safe & cheap. Thirdly, if dirty
thresholds are larger than 1<<32 pages, then dirty balancing is going to
blow up in many other spectacular ways anyway so trying to fix one
possible overflow is just moot.
Link: https://lkml.kernel.org/r/20240621144017.30993-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20240621144246.11148-1-jack@suse.cz
Fixes: 9319b647902c ("mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-By: Zach O'Keefe <zokeefe(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 7168e25f88e5..c4aa6e84c20a 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1683,7 +1683,7 @@ static inline void wb_dirty_limits(struct dirty_throttle_control *dtc)
*/
dtc->wb_thresh = __wb_calc_thresh(dtc, dtc->thresh);
dtc->wb_bg_thresh = dtc->thresh ?
- div64_u64(dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
+ div_u64((u64)dtc->wb_thresh * dtc->bg_thresh, dtc->thresh) : 0;
/*
* In order to avoid the stacked BDI deadlock we need
Connecting two 4k displays with display port through a lenovo usb-c dock (type 40AS) to a Lenovo P14s Gen 2 (type 21A0) results in no image on the connected displays.
The CPU in the Lenovo P14s is a 'AMD Ryzen 7 PRO 5850U with Radeon Graphics' and it has no discrete GPU.
I first noticed the issue with kernel version '6.10.0-arch1-2' provided by arch linux. With the previous kernel version '6.9.10.arch1-1' both connected displays worked normally. I reported the issue in the arch forums at https://bbs.archlinux.org/viewtopic.php?id=297999 and was guided to do a bisection to find the commit that caused the problem. Through testing I identified that the issue is not present in the latest kernel directly compiled from the trovalds/linux git repository.
With git bisect I identified 4df96ba66760345471a85ef7bb29e1cd4e956057 as the first bad commit and fa57924c76d995e87ca3533ec60d1d5e55769a27 as the first commit that fixed the problem again.
The initial commit only still shows an image on one of the connected 4k screens. I have not investigated further to find out at what point both displays stopped showing an image.
Best Regards,
Kevin
#regzbot introduced: 4df96ba66760345471a85ef7bb29e1cd4e956057
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 6ce3e98184b625d2870991880bf9586ded7ea7f9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073029-outfit-monogram-93d9@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
6ce3e98184b6 ("irqdomain: Fixed unbalanced fwnode get and put")
67a4e1a3bf7c ("irqdomain: Use return value of strreplace()")
ed1054a02aa2 ("irqdomain: Mark fwnodes when their irqdomain is added/removed")
181e9d4efaf6 ("irqdomain: Make __irq_domain_add() less OF-dependent")
43b98d876f89 ("genirq/irqdomain: Remove WARN_ON() on out-of-memory condition")
94967b55ebf3 ("genirq/debugfs: Reinstate full OF path for domain name")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ce3e98184b625d2870991880bf9586ded7ea7f9 Mon Sep 17 00:00:00 2001
From: Herve Codina <herve.codina(a)bootlin.com>
Date: Fri, 14 Jun 2024 19:32:04 +0200
Subject: [PATCH] irqdomain: Fixed unbalanced fwnode get and put
fwnode_handle_get(fwnode) is called when a domain is created with fwnode
passed as a function parameter. fwnode_handle_put(domain->fwnode) is called
when the domain is destroyed but during the creation a path exists that
does not set domain->fwnode.
If this path is taken, the fwnode get will never be put.
To avoid the unbalanced get and put, set domain->fwnode unconditionally.
Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only")
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 28709c14d894..7b4d580fc8e4 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -156,7 +156,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
switch (fwid->type) {
case IRQCHIP_FWNODE_NAMED:
case IRQCHIP_FWNODE_NAMED_ID:
- domain->fwnode = fwnode;
domain->name = kstrdup(fwid->name, GFP_KERNEL);
if (!domain->name) {
kfree(domain);
@@ -165,7 +164,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
break;
default:
- domain->fwnode = fwnode;
domain->name = fwid->name;
break;
}
@@ -185,7 +183,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
}
domain->name = strreplace(name, '/', ':');
- domain->fwnode = fwnode;
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
@@ -201,8 +198,8 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
- fwnode_handle_get(fwnode);
- fwnode_dev_initialized(fwnode, true);
+ domain->fwnode = fwnode_handle_get(fwnode);
+ fwnode_dev_initialized(domain->fwnode, true);
/* Fill structure */
INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6ce3e98184b625d2870991880bf9586ded7ea7f9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073029-each-sprung-4744@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
6ce3e98184b6 ("irqdomain: Fixed unbalanced fwnode get and put")
67a4e1a3bf7c ("irqdomain: Use return value of strreplace()")
ed1054a02aa2 ("irqdomain: Mark fwnodes when their irqdomain is added/removed")
181e9d4efaf6 ("irqdomain: Make __irq_domain_add() less OF-dependent")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ce3e98184b625d2870991880bf9586ded7ea7f9 Mon Sep 17 00:00:00 2001
From: Herve Codina <herve.codina(a)bootlin.com>
Date: Fri, 14 Jun 2024 19:32:04 +0200
Subject: [PATCH] irqdomain: Fixed unbalanced fwnode get and put
fwnode_handle_get(fwnode) is called when a domain is created with fwnode
passed as a function parameter. fwnode_handle_put(domain->fwnode) is called
when the domain is destroyed but during the creation a path exists that
does not set domain->fwnode.
If this path is taken, the fwnode get will never be put.
To avoid the unbalanced get and put, set domain->fwnode unconditionally.
Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only")
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 28709c14d894..7b4d580fc8e4 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -156,7 +156,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
switch (fwid->type) {
case IRQCHIP_FWNODE_NAMED:
case IRQCHIP_FWNODE_NAMED_ID:
- domain->fwnode = fwnode;
domain->name = kstrdup(fwid->name, GFP_KERNEL);
if (!domain->name) {
kfree(domain);
@@ -165,7 +164,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
break;
default:
- domain->fwnode = fwnode;
domain->name = fwid->name;
break;
}
@@ -185,7 +183,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
}
domain->name = strreplace(name, '/', ':');
- domain->fwnode = fwnode;
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
@@ -201,8 +198,8 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
- fwnode_handle_get(fwnode);
- fwnode_dev_initialized(fwnode, true);
+ domain->fwnode = fwnode_handle_get(fwnode);
+ fwnode_dev_initialized(domain->fwnode, true);
/* Fill structure */
INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6ce3e98184b625d2870991880bf9586ded7ea7f9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073028-suffix-spoiler-8263@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
6ce3e98184b6 ("irqdomain: Fixed unbalanced fwnode get and put")
67a4e1a3bf7c ("irqdomain: Use return value of strreplace()")
ed1054a02aa2 ("irqdomain: Mark fwnodes when their irqdomain is added/removed")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ce3e98184b625d2870991880bf9586ded7ea7f9 Mon Sep 17 00:00:00 2001
From: Herve Codina <herve.codina(a)bootlin.com>
Date: Fri, 14 Jun 2024 19:32:04 +0200
Subject: [PATCH] irqdomain: Fixed unbalanced fwnode get and put
fwnode_handle_get(fwnode) is called when a domain is created with fwnode
passed as a function parameter. fwnode_handle_put(domain->fwnode) is called
when the domain is destroyed but during the creation a path exists that
does not set domain->fwnode.
If this path is taken, the fwnode get will never be put.
To avoid the unbalanced get and put, set domain->fwnode unconditionally.
Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only")
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 28709c14d894..7b4d580fc8e4 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -156,7 +156,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
switch (fwid->type) {
case IRQCHIP_FWNODE_NAMED:
case IRQCHIP_FWNODE_NAMED_ID:
- domain->fwnode = fwnode;
domain->name = kstrdup(fwid->name, GFP_KERNEL);
if (!domain->name) {
kfree(domain);
@@ -165,7 +164,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
break;
default:
- domain->fwnode = fwnode;
domain->name = fwid->name;
break;
}
@@ -185,7 +183,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
}
domain->name = strreplace(name, '/', ':');
- domain->fwnode = fwnode;
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
@@ -201,8 +198,8 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
- fwnode_handle_get(fwnode);
- fwnode_dev_initialized(fwnode, true);
+ domain->fwnode = fwnode_handle_get(fwnode);
+ fwnode_dev_initialized(domain->fwnode, true);
/* Fill structure */
INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6ce3e98184b625d2870991880bf9586ded7ea7f9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073022-onshore-evaluator-a043@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
6ce3e98184b6 ("irqdomain: Fixed unbalanced fwnode get and put")
67a4e1a3bf7c ("irqdomain: Use return value of strreplace()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ce3e98184b625d2870991880bf9586ded7ea7f9 Mon Sep 17 00:00:00 2001
From: Herve Codina <herve.codina(a)bootlin.com>
Date: Fri, 14 Jun 2024 19:32:04 +0200
Subject: [PATCH] irqdomain: Fixed unbalanced fwnode get and put
fwnode_handle_get(fwnode) is called when a domain is created with fwnode
passed as a function parameter. fwnode_handle_put(domain->fwnode) is called
when the domain is destroyed but during the creation a path exists that
does not set domain->fwnode.
If this path is taken, the fwnode get will never be put.
To avoid the unbalanced get and put, set domain->fwnode unconditionally.
Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only")
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 28709c14d894..7b4d580fc8e4 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -156,7 +156,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
switch (fwid->type) {
case IRQCHIP_FWNODE_NAMED:
case IRQCHIP_FWNODE_NAMED_ID:
- domain->fwnode = fwnode;
domain->name = kstrdup(fwid->name, GFP_KERNEL);
if (!domain->name) {
kfree(domain);
@@ -165,7 +164,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
break;
default:
- domain->fwnode = fwnode;
domain->name = fwid->name;
break;
}
@@ -185,7 +183,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
}
domain->name = strreplace(name, '/', ':');
- domain->fwnode = fwnode;
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
@@ -201,8 +198,8 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
- fwnode_handle_get(fwnode);
- fwnode_dev_initialized(fwnode, true);
+ domain->fwnode = fwnode_handle_get(fwnode);
+ fwnode_dev_initialized(domain->fwnode, true);
/* Fill structure */
INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6ce3e98184b625d2870991880bf9586ded7ea7f9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073021-grimy-finalist-464d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
6ce3e98184b6 ("irqdomain: Fixed unbalanced fwnode get and put")
67a4e1a3bf7c ("irqdomain: Use return value of strreplace()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6ce3e98184b625d2870991880bf9586ded7ea7f9 Mon Sep 17 00:00:00 2001
From: Herve Codina <herve.codina(a)bootlin.com>
Date: Fri, 14 Jun 2024 19:32:04 +0200
Subject: [PATCH] irqdomain: Fixed unbalanced fwnode get and put
fwnode_handle_get(fwnode) is called when a domain is created with fwnode
passed as a function parameter. fwnode_handle_put(domain->fwnode) is called
when the domain is destroyed but during the creation a path exists that
does not set domain->fwnode.
If this path is taken, the fwnode get will never be put.
To avoid the unbalanced get and put, set domain->fwnode unconditionally.
Fixes: d59f6617eef0 ("genirq: Allow fwnode to carry name information only")
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240614173232.1184015-4-herve.codina@bootlin.com
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 28709c14d894..7b4d580fc8e4 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -156,7 +156,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
switch (fwid->type) {
case IRQCHIP_FWNODE_NAMED:
case IRQCHIP_FWNODE_NAMED_ID:
- domain->fwnode = fwnode;
domain->name = kstrdup(fwid->name, GFP_KERNEL);
if (!domain->name) {
kfree(domain);
@@ -165,7 +164,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
break;
default:
- domain->fwnode = fwnode;
domain->name = fwid->name;
break;
}
@@ -185,7 +183,6 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
}
domain->name = strreplace(name, '/', ':');
- domain->fwnode = fwnode;
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
@@ -201,8 +198,8 @@ static struct irq_domain *__irq_domain_create(struct fwnode_handle *fwnode,
domain->flags |= IRQ_DOMAIN_NAME_ALLOCATED;
}
- fwnode_handle_get(fwnode);
- fwnode_dev_initialized(fwnode, true);
+ domain->fwnode = fwnode_handle_get(fwnode);
+ fwnode_dev_initialized(domain->fwnode, true);
/* Fill structure */
INIT_RADIX_TREE(&domain->revmap_tree, GFP_KERNEL);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x bd50a974097bb82d52a458bd3ee39fb723129a0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073007-discover-unearned-d69c@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
bd50a974097b ("devres: Fix memory leakage caused by driver API devm_free_percpu()")
d7aa44f5a1f8 ("driver core: Cast to (void *) with __force for __percpu pointer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bd50a974097bb82d52a458bd3ee39fb723129a0c Mon Sep 17 00:00:00 2001
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
Date: Tue, 2 Jul 2024 22:51:51 +0800
Subject: [PATCH] devres: Fix memory leakage caused by driver API
devm_free_percpu()
It will cause memory leakage when use driver API devm_free_percpu()
to free memory allocated by devm_alloc_percpu(), fixed by using
devres_release() instead of devres_destroy() within devm_free_percpu().
Fixes: ff86aae3b411 ("devres: add devm_alloc_percpu()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
Link: https://lore.kernel.org/r/1719931914-19035-3-git-send-email-quic_zijuhu@qui…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index ff2247eec43c..8d709dbd4e0c 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -1225,7 +1225,11 @@ EXPORT_SYMBOL_GPL(__devm_alloc_percpu);
*/
void devm_free_percpu(struct device *dev, void __percpu *pdata)
{
- WARN_ON(devres_destroy(dev, devm_percpu_release, devm_percpu_match,
+ /*
+ * Use devres_release() to prevent memory leakage as
+ * devm_free_pages() does.
+ */
+ WARN_ON(devres_release(dev, devm_percpu_release, devm_percpu_match,
(__force void *)pdata));
}
EXPORT_SYMBOL_GPL(devm_free_percpu);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x bd50a974097bb82d52a458bd3ee39fb723129a0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073006-taekwondo-rocker-e76a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
bd50a974097b ("devres: Fix memory leakage caused by driver API devm_free_percpu()")
d7aa44f5a1f8 ("driver core: Cast to (void *) with __force for __percpu pointer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bd50a974097bb82d52a458bd3ee39fb723129a0c Mon Sep 17 00:00:00 2001
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
Date: Tue, 2 Jul 2024 22:51:51 +0800
Subject: [PATCH] devres: Fix memory leakage caused by driver API
devm_free_percpu()
It will cause memory leakage when use driver API devm_free_percpu()
to free memory allocated by devm_alloc_percpu(), fixed by using
devres_release() instead of devres_destroy() within devm_free_percpu().
Fixes: ff86aae3b411 ("devres: add devm_alloc_percpu()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
Link: https://lore.kernel.org/r/1719931914-19035-3-git-send-email-quic_zijuhu@qui…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index ff2247eec43c..8d709dbd4e0c 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -1225,7 +1225,11 @@ EXPORT_SYMBOL_GPL(__devm_alloc_percpu);
*/
void devm_free_percpu(struct device *dev, void __percpu *pdata)
{
- WARN_ON(devres_destroy(dev, devm_percpu_release, devm_percpu_match,
+ /*
+ * Use devres_release() to prevent memory leakage as
+ * devm_free_pages() does.
+ */
+ WARN_ON(devres_release(dev, devm_percpu_release, devm_percpu_match,
(__force void *)pdata));
}
EXPORT_SYMBOL_GPL(devm_free_percpu);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x bd50a974097bb82d52a458bd3ee39fb723129a0c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073005-sporty-elm-72e4@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
bd50a974097b ("devres: Fix memory leakage caused by driver API devm_free_percpu()")
d7aa44f5a1f8 ("driver core: Cast to (void *) with __force for __percpu pointer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From bd50a974097bb82d52a458bd3ee39fb723129a0c Mon Sep 17 00:00:00 2001
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
Date: Tue, 2 Jul 2024 22:51:51 +0800
Subject: [PATCH] devres: Fix memory leakage caused by driver API
devm_free_percpu()
It will cause memory leakage when use driver API devm_free_percpu()
to free memory allocated by devm_alloc_percpu(), fixed by using
devres_release() instead of devres_destroy() within devm_free_percpu().
Fixes: ff86aae3b411 ("devres: add devm_alloc_percpu()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
Link: https://lore.kernel.org/r/1719931914-19035-3-git-send-email-quic_zijuhu@qui…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/base/devres.c b/drivers/base/devres.c
index ff2247eec43c..8d709dbd4e0c 100644
--- a/drivers/base/devres.c
+++ b/drivers/base/devres.c
@@ -1225,7 +1225,11 @@ EXPORT_SYMBOL_GPL(__devm_alloc_percpu);
*/
void devm_free_percpu(struct device *dev, void __percpu *pdata)
{
- WARN_ON(devres_destroy(dev, devm_percpu_release, devm_percpu_match,
+ /*
+ * Use devres_release() to prevent memory leakage as
+ * devm_free_pages() does.
+ */
+ WARN_ON(devres_release(dev, devm_percpu_release, devm_percpu_match,
(__force void *)pdata));
}
EXPORT_SYMBOL_GPL(devm_free_percpu);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 3415b10a03945b0da4a635e146750dfe5ce0f448
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024073001-unaudited-uncooked-cfe3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
3415b10a0394 ("kbuild: Fix '-S -c' in x86 stack protector scripts")
3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
c9a1ff316bc9 ("x86/stackprotector: Pre-initialize canary for secondary CPUs")
dc4e0021b00b ("x86/doublefault/32: Move #DF stack and TSS to cpu_entry_area")
e99b6f46ee5c ("x86/doublefault/32: Rename doublefault.c to doublefault_32.c")
93efbde2c331 ("x86/traps: Disentangle the 32-bit and 64-bit doublefault code")
ab851d49f6bf ("Merge branch 'x86-iopl-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3415b10a03945b0da4a635e146750dfe5ce0f448 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <nathan(a)kernel.org>
Date: Fri, 26 Jul 2024 11:05:00 -0700
Subject: [PATCH] kbuild: Fix '-S -c' in x86 stack protector scripts
After a recent change in clang to stop consuming all instances of '-S'
and '-c' [1], the stack protector scripts break due to the kernel's use
of -Werror=unused-command-line-argument to catch cases where flags are
not being properly consumed by the compiler driver:
$ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument
clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument]
This results in CONFIG_STACKPROTECTOR getting disabled because
CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set.
'-c' and '-S' both instruct the compiler to stop at different stages of
the pipeline ('-S' after compiling, '-c' after assembling), so having
them present together in the same command makes little sense. In this
case, the test wants to stop before assembling because it is looking at
the textual assembly output of the compiler for either '%fs' or '%gs',
so remove '-c' from the list of arguments to resolve the error.
All versions of GCC continue to work after this change, along with
versions of clang that do or do not contain the change mentioned above.
Cc: stable(a)vger.kernel.org
Fixes: 4f7fd4d7a791 ("[PATCH] Add the -fstack-protector option to the CFLAGS")
Fixes: 60a5317ff0f4 ("x86: implement x86_32 stack protector")
Link: https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353… [1]
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
diff --git a/scripts/gcc-x86_32-has-stack-protector.sh b/scripts/gcc-x86_32-has-stack-protector.sh
index 825c75c5b715..9459ca4f0f11 100755
--- a/scripts/gcc-x86_32-has-stack-protector.sh
+++ b/scripts/gcc-x86_32-has-stack-protector.sh
@@ -5,4 +5,4 @@
# -mstack-protector-guard-reg, added by
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81708
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -m32 -O0 -fstack-protector -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard - -o - 2> /dev/null | grep -q "%fs"
diff --git a/scripts/gcc-x86_64-has-stack-protector.sh b/scripts/gcc-x86_64-has-stack-protector.sh
index 75e4e22b986a..f680bb01aeeb 100755
--- a/scripts/gcc-x86_64-has-stack-protector.sh
+++ b/scripts/gcc-x86_64-has-stack-protector.sh
@@ -1,4 +1,4 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
-echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -c -m64 -O0 -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
+echo "int foo(void) { char X[200]; return 3; }" | $* -S -x c -m64 -O0 -mcmodel=kernel -fno-PIE -fstack-protector - -o - 2> /dev/null | grep -q "%gs"
In all the MPTCP backup related tests, the backup flag was set on one
side, and the expected behaviour is to have both sides respecting this
decision. That's also the "natural" way, and what the users seem to
expect.
On the scheduler side, only the 'backup' field was checked, which is
supposed to be set only if the other peer flagged a subflow as backup.
But in various places, this flag was also set when the local host
flagged the subflow as backup, certainly to have the expected behaviour
mentioned above.
Patch 1 modifies the packet scheduler to check if the backup flag has
been set on both directions, not to change its behaviour after having
applied the following patches. That's what the default packet scheduler
should have done since the beginning in v5.7.
Patch 2 fixes the backup flag being mirrored on the MPJ+SYN+ACK by
accident since its introduction in v5.7. Instead, the received and sent
backup flags are properly distinguished in requests.
Patch 3 stops setting the received backup flag as well when sending an
MP_PRIO, something that was done since the MP_PRIO support in v5.12.
Patch 4 adds related and missing MIB counters to be able to easily check
if MP_JOIN are sent with a backup flag. Certainly because these counters
were not there, the behaviour that is fixed by patches here was not
properly verified.
Patch 5 validates the previous patch by extending the MPTCP Join
selftest.
Patch 6 fixes the backup support in signal endpoints: if a signal
endpoint had the backup flag, it was not set in the MPJ+SYN+ACK as
expected. It was only set for ongoing connections, but not future ones
as expected, since the introduction of the backup flag in endpoints in
v5.10.
Patch 7 validates the previous patch by extending the MPTCP Join
selftest as well.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (7):
mptcp: sched: check both directions for backup
mptcp: distinguish rcv vs sent backup flag in requests
mptcp: pm: only set request_bkup flag when sending MP_PRIO
mptcp: mib: count MPJ with backup flag
selftests: mptcp: join: validate backup in MPJ
mptcp: pm: fix backup support in signal endpoints
selftests: mptcp: join: check backup support in signal endp
include/trace/events/mptcp.h | 2 +-
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/options.c | 2 +-
net/mptcp/pm.c | 12 +++++
net/mptcp/pm_netlink.c | 19 ++++++-
net/mptcp/pm_userspace.c | 18 +++++++
net/mptcp/protocol.c | 10 ++--
net/mptcp/protocol.h | 4 ++
net/mptcp/subflow.c | 10 ++++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 72 ++++++++++++++++++++-----
11 files changed, 132 insertions(+), 21 deletions(-)
---
base-commit: 301927d2d2eb8e541357ba850bc7a1a74dbbd670
change-id: 20240727-upstream-net-20240727-mptcp-backup-signal-948235f2ad08
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
From: Ira Weiny <ira.weiny(a)intel.com>
commit ced085ef369af7a2b6da962ec2fbd01339f60693 upstream.
The "goto error" pattern is notorious for introducing subtle resource
leaks. Use the new cleanup.h helpers for PCI device reference counts and
locks.
Similar to the new put_device() and device_lock() cleanup helpers,
__free(put_device) and guard(device), define the same for PCI devices,
__free(pci_dev_put) and guard(pci_dev). These helpers eliminate the
need for "goto free;" and "goto unlock;" patterns. For example, A
'struct pci_dev *' instance declared as:
struct pci_dev *pdev __free(pci_dev_put) = NULL;
...will automatically call pci_dev_put() if @pdev is non-NULL when @pdev
goes out of scope (automatic variable scope). If a function wants to
invoke pci_dev_put() on error, but return @pdev on success, it can do:
return no_free_ptr(pdev);
...or:
return_ptr(pdev);
For potential cleanup opportunity there are 587 open-coded calls to
pci_dev_put() in the kernel with 65 instances within 10 lines of a goto
statement with the CXL driver threatening to add another one.
The guard() helper holds the associated lock for the remainder of the
current scope in which it was invoked. So, for example:
func(...)
{
if (...) {
...
guard(pci_dev); /* pci_dev_lock() invoked here */
...
} /* <- implied pci_dev_unlock() triggered here */
}
There are 15 invocations of pci_dev_unlock() in the kernel with 5
instances within 10 lines of a goto statement. Again, the CXL driver is
threatening to add another.
Introduce these helpers to preclude the addition of new more error prone
goto put; / goto unlock; sequences. For now, these helpers are used in
drivers/cxl/pci.c to allow ACPI error reports to be fed back into the
CXL driver associated with the PCI device identified in the report.
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Ira Weiny <ira.weiny(a)intel.com>
Link: https://lore.kernel.org/r/20231220-cxl-cper-v5-8-1bb8a4ca2c7a@intel.com
[djbw: rewrite changelog]
Acked-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Acked-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
---
include/linux/pci.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 4da7411da9ba..df73fb26b825 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1138,6 +1138,7 @@ int pci_get_interrupt_pin(struct pci_dev *dev, struct pci_dev **bridge);
u8 pci_common_swizzle(struct pci_dev *dev, u8 *pinp);
struct pci_dev *pci_dev_get(struct pci_dev *dev);
void pci_dev_put(struct pci_dev *dev);
+DEFINE_FREE(pci_dev_put, struct pci_dev *, if (_T) pci_dev_put(_T))
void pci_remove_bus(struct pci_bus *b);
void pci_stop_and_remove_bus_device(struct pci_dev *dev);
void pci_stop_and_remove_bus_device_locked(struct pci_dev *dev);
@@ -1746,6 +1747,7 @@ void pci_cfg_access_unlock(struct pci_dev *dev);
void pci_dev_lock(struct pci_dev *dev);
int pci_dev_trylock(struct pci_dev *dev);
void pci_dev_unlock(struct pci_dev *dev);
+DEFINE_GUARD(pci_dev, struct pci_dev *, pci_dev_lock(_T), pci_dev_unlock(_T))
/*
* PCI domain support. Sometimes called PCI segment (eg by ACPI),
--
2.43.0
Hello!
My Asus Laptop (ASUS VivoBook PRO 15 OLED M3500QA-L1072) reporting this
wmi code everytime I connect power cable.
I got no key code on power disconnect.
[11238.502716] asus_wmi: Unknown key code 0xcf
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: v4l: Fix missing tabular column hint for Y14P format
Author: Jean-Michel Hautbois <jeanmichel.hautbois(a)yoseli.org>
Date: Sat Jun 8 18:41:27 2024 +0200
The original patch added two columns in the flat-table of Luma-Only
Image Formats, without updating hints to latex: above it. This results
in wrong column count in the output of Sphinx's latex builder.
Fix it.
Reported-by: Akira Yokosawa <akiyks(a)gmail.com>
Closes: https://lore.kernel.org/linux-media/bdbc27ba-5098-49fb-aabf-753c81361cc7@gm…
Fixes: adb1d4655e53 ("media: v4l: Add V4L2-PIX-FMT-Y14P format")
Cc: stable(a)vger.kernel.org # for v6.10
Signed-off-by: Jean-Michel Hautbois <jeanmichel.hautbois(a)yoseli.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Documentation/userspace-api/media/v4l/pixfmt-yuv-luma.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
---
diff --git a/Documentation/userspace-api/media/v4l/pixfmt-yuv-luma.rst b/Documentation/userspace-api/media/v4l/pixfmt-yuv-luma.rst
index f02e6cf3516a..74df19be91f6 100644
--- a/Documentation/userspace-api/media/v4l/pixfmt-yuv-luma.rst
+++ b/Documentation/userspace-api/media/v4l/pixfmt-yuv-luma.rst
@@ -21,9 +21,9 @@ are often referred to as greyscale formats.
.. raw:: latex
- \scriptsize
+ \tiny
-.. tabularcolumns:: |p{3.6cm}|p{3.0cm}|p{1.3cm}|p{2.6cm}|p{1.3cm}|p{1.3cm}|p{1.3cm}|
+.. tabularcolumns:: |p{3.6cm}|p{2.4cm}|p{1.3cm}|p{1.3cm}|p{1.3cm}|p{1.3cm}|p{1.3cm}|p{1.3cm}|p{1.3cm}|
.. flat-table:: Luma-Only Image Formats
:header-rows: 1
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: intel/ipu6: select AUXILIARY_BUS in Kconfig
Author: Bingbu Cao <bingbu.cao(a)intel.com>
Date: Wed Jul 17 15:40:50 2024 +0800
Intel IPU6 PCI driver need register its devices on auxiliary
bus, so it needs to select the AUXILIARY_BUS in Kconfig.
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202407161833.7BEFXejx-lkp@intel.com/
Fixes: c70281cc83d6 ("media: intel/ipu6: add Kconfig and Makefile")
Signed-off-by: Bingbu Cao <bingbu.cao(a)intel.com>
Cc: stable(a)vger.kernel.org # for v6.10
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/pci/intel/ipu6/Kconfig | 1 +
1 file changed, 1 insertion(+)
---
diff --git a/drivers/media/pci/intel/ipu6/Kconfig b/drivers/media/pci/intel/ipu6/Kconfig
index b7ab24b89836..40e20f0aa5ae 100644
--- a/drivers/media/pci/intel/ipu6/Kconfig
+++ b/drivers/media/pci/intel/ipu6/Kconfig
@@ -4,6 +4,7 @@ config VIDEO_INTEL_IPU6
depends on VIDEO_DEV
depends on X86 && X86_64 && HAS_DMA
depends on IPU_BRIDGE || !IPU_BRIDGE
+ select AUXILIARY_BUS
select DMA_OPS
select IOMMU_IOVA
select VIDEO_V4L2_SUBDEV_API
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: ipu-bridge: fix ipu6 Kconfig dependencies
Author: Arnd Bergmann <arnd(a)arndb.de>
Date: Fri Jul 19 11:53:50 2024 +0200
Commit 4670c8c3fb04 ("media: ipu-bridge: Fix Kconfig dependencies") changed
how IPU_BRIDGE dependencies are handled for all drivers, but the IPU6
variant was added the old way, which causes build time warnings when I2C is
turned off:
WARNING: unmet direct dependencies detected for IPU_BRIDGE
Depends on [n]: MEDIA_SUPPORT [=m] && PCI [=y] && MEDIA_PCI_SUPPORT [=y] && (ACPI [=y] || COMPILE_TEST [=y]) && I2C [=n]
Selected by [m]:
- VIDEO_INTEL_IPU6 [=m] && MEDIA_SUPPORT [=m] && PCI [=y] && MEDIA_PCI_SUPPORT [=y] && (ACPI [=y] || COMPILE_TEST [=y]) && VIDEO_DEV [=m] && X86 [=y] && X86_64 [=y] && HAS_DMA [=y]
To make it consistent with the other IPU drivers as well as avoid this
warning, change the 'select' into 'depends on'.
Fixes: c70281cc83d6 ("media: intel/ipu6: add Kconfig and Makefile")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
[Sakari Ailus: Alternatively depend on !IPU_BRIDGE.]
Cc: stable(a)vger.kernel.org # for v6.10
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/pci/intel/ipu6/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
diff --git a/drivers/media/pci/intel/ipu6/Kconfig b/drivers/media/pci/intel/ipu6/Kconfig
index 154343080c82..b7ab24b89836 100644
--- a/drivers/media/pci/intel/ipu6/Kconfig
+++ b/drivers/media/pci/intel/ipu6/Kconfig
@@ -3,13 +3,13 @@ config VIDEO_INTEL_IPU6
depends on ACPI || COMPILE_TEST
depends on VIDEO_DEV
depends on X86 && X86_64 && HAS_DMA
+ depends on IPU_BRIDGE || !IPU_BRIDGE
select DMA_OPS
select IOMMU_IOVA
select VIDEO_V4L2_SUBDEV_API
select MEDIA_CONTROLLER
select VIDEOBUF2_DMA_CONTIG
select V4L2_FWNODE
- select IPU_BRIDGE
help
This is the 6th Gen Intel Image Processing Unit, found in Intel SoCs
and used for capturing images and video from camera sensors.
From: Rameshkumar Sundaram <quic_ramess(a)quicinc.com>
[ Upstream commit 57b341e9ab13e5688491bfd54f8b5502416c8905 ]
Stations can update bandwidth/NSS change in
VHT action frame with action type Operating Mode Notification.
(IEEE Std 802.11-2020 - 9.4.1.53 Operating Mode field)
For Operating Mode Notification, an RX NSS change to a value
greater than AP's maximum NSS should not be allowed.
During fuzz testing, by forcefully sending VHT Op. mode notif.
frames from STA with random rx_nss values, it is found that AP
accepts rx_nss values greater that APs maximum NSS instead of
discarding such NSS change.
Hence allow NSS change only up to maximum NSS that is negotiated
and capped to AP's capability during association.
Signed-off-by: Rameshkumar Sundaram <quic_ramess(a)quicinc.com>
Link: https://lore.kernel.org/r/20230207114146.10567-1-quic_ramess@quicinc.com
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Hauke Mehrtens <hauke(a)hauke-m.de>
---
net/mac80211/vht.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/net/mac80211/vht.c b/net/mac80211/vht.c
index f7526be8a1c7..b3a5c3e96a72 100644
--- a/net/mac80211/vht.c
+++ b/net/mac80211/vht.c
@@ -637,7 +637,7 @@ u32 __ieee80211_vht_handle_opmode(struct ieee80211_sub_if_data *sdata,
enum ieee80211_sta_rx_bandwidth new_bw;
struct sta_opmode_info sta_opmode = {};
u32 changed = 0;
- u8 nss;
+ u8 nss, cur_nss;
/* ignore - no support for BF yet */
if (opmode & IEEE80211_OPMODE_NOTIF_RX_NSS_TYPE_BF)
@@ -648,10 +648,25 @@ u32 __ieee80211_vht_handle_opmode(struct ieee80211_sub_if_data *sdata,
nss += 1;
if (link_sta->pub->rx_nss != nss) {
- link_sta->pub->rx_nss = nss;
- sta_opmode.rx_nss = nss;
- changed |= IEEE80211_RC_NSS_CHANGED;
- sta_opmode.changed |= STA_OPMODE_N_SS_CHANGED;
+ cur_nss = link_sta->pub->rx_nss;
+ /* Reset rx_nss and call ieee80211_sta_set_rx_nss() which
+ * will set the same to max nss value calculated based on capability.
+ */
+ link_sta->pub->rx_nss = 0;
+ ieee80211_sta_set_rx_nss(link_sta);
+ /* Do not allow an nss change to rx_nss greater than max_nss
+ * negotiated and capped to APs capability during association.
+ */
+ if (nss <= link_sta->pub->rx_nss) {
+ link_sta->pub->rx_nss = nss;
+ sta_opmode.rx_nss = nss;
+ changed |= IEEE80211_RC_NSS_CHANGED;
+ sta_opmode.changed |= STA_OPMODE_N_SS_CHANGED;
+ } else {
+ link_sta->pub->rx_nss = cur_nss;
+ pr_warn_ratelimited("Ignoring NSS change in VHT Operating Mode Notification from %pM with invalid nss %d",
+ link_sta->pub->addr, nss);
+ }
}
switch (opmode & IEEE80211_OPMODE_NOTIF_CHANWIDTH_MASK) {
--
2.45.2
This patchset serves to prevent an AB/BA deadlock:
thread 0:
* (lock A) acquire substream lock by
snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) wait for tasklet to finish by calling
tasklet_unlock_spin_wait() in
tasklet_disable_in_atomic() in
ohci_flush_iso_completions() of ohci.c
thread 1:
* (lock B) enter tasklet
* (lock A) attempt to acquire substream lock,
waiting for it to be released:
snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed() in
update_pcm_pointers() in
process_ctx_payloads() in
process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg7…
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with
Fireface 800")
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.
Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.
ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.
to reproduce the issue:
direct ALSA playback to the device:
mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
- high CPU load
stress --cpu $(nproc)
- switching between applications via workspaces
tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.
Cc: stable(a)vger.kernel.org
Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
Edmund Raile (3):
Revert "ALSA: firewire-lib: obsolete workqueue for period update"
Revert "ALSA: firewire-lib: operate for period elapse event in process
context"
ALSA: firewire-lib: amdtp-stream work queue inline description
sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++-------------
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 25 insertions(+), 14 deletions(-)
--
2.45.2
This patchset serves to prevent an AB/BA deadlock:
thread 0:
* (lock A) acquire substream lock by
snd_pcm_stream_lock_irq() in
snd_pcm_status64()
* (lock B) wait for tasklet to finish by calling
tasklet_unlock_spin_wait() in
tasklet_disable_in_atomic() in
ohci_flush_iso_completions() of ohci.c
thread 1:
* (lock B) enter tasklet
* (lock A) attempt to acquire substream lock,
waiting for it to be released:
snd_pcm_stream_lock_irqsave() in
snd_pcm_period_elapsed() in
update_pcm_pointers() in
process_ctx_payloads() in
process_rx_packets() of amdtp-stream.c
? tasklet_unlock_spin_wait
</NMI>
<TASK>
ohci_flush_iso_completions firewire_ohci
amdtp_domain_stream_pcm_pointer snd_firewire_lib
snd_pcm_update_hw_ptr0 snd_pcm
snd_pcm_status64 snd_pcm
? native_queued_spin_lock_slowpath
</NMI>
<IRQ>
_raw_spin_lock_irqsave
snd_pcm_period_elapsed snd_pcm
process_rx_packets snd_firewire_lib
irq_target_callback snd_firewire_lib
handle_it_packet firewire_ohci
context_tasklet firewire_ohci
The issue has been reported as a regression of kernel 5.14:
Link: https://lore.kernel.org/regressions/kwryofzdmjvzkuw6j3clftsxmoolynljztxqwg7…
("[REGRESSION] ALSA: firewire-lib: snd_pcm_period_elapsed deadlock with
Fireface 800")
Commit 7ba5ca32fe6e ("ALSA: firewire-lib: operate for period elapse event
in process context") removed the process context workqueue from
amdtp_domain_stream_pcm_pointer() and update_pcm_pointers() to remove
its overhead.
Commit b5b519965c4c ("ALSA: firewire-lib: obsolete workqueue for period
update") belongs to the same patch series and removed
the now-unused workqueue entirely.
Though being observed on RME Fireface 800, this issue would affect all
Firewire audio interfaces using ohci amdtp + pcm streaming.
ALSA streaming, especially under intensive CPU load will reveal this issue
the soonest due to issuing more hardIRQs, with time to occurrence ranging
from 2 secons to 30 minutes after starting playback.
to reproduce the issue:
direct ALSA playback to the device:
mpv --audio-device=alsa/sysdefault:CARD=Fireface800 Spor-Ignition.flac
Time to occurrence: 2s to 30m
Likelihood increased by:
- high CPU load
stress --cpu $(nproc)
- switching between applications via workspaces
tested with i915 in Xfce
PulsaAudio / PipeWire conceal the issue as they run PCM substream
without period wakeup mode, issuing less hardIRQs.
Cc: stable(a)vger.kernel.org
Backport note:
Also applies to and fixes on (tested):
6.10.2, 6.9.12, 6.6.43, 6.1.102, 5.15.164
Edmund Raile (3):
Revert "ALSA: firewire-lib: obsolete workqueue for period update"
Revert "ALSA: firewire-lib: operate for period elapse event in process
context"
ALSA: firewire-lib: amdtp-stream work queue inline description
sound/firewire/amdtp-stream.c | 38 ++++++++++++++++++++++-------------
sound/firewire/amdtp-stream.h | 1 +
2 files changed, 25 insertions(+), 14 deletions(-)
--
2.45.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 525bd65aa759ec320af1dc06e114ed69733e9e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072914-sierra-aflutter-e231@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
525bd65aa759 ("fuse: verify {g,u}id mount options correctly")
84c215075b57 ("fuse: name fs_context consistently")
1dd539577c42 ("virtiofs: add a mount option to enable dax")
f4fd4ae354ba ("virtiofs: get rid of no_mount_options")
b330966f79fb ("fuse: reject options on reconfigure via fsconfig(2)")
e8b20a474cf2 ("fuse: ignore 'data' argument of mount(..., MS_REMOUNT)")
0189a2d367f4 ("fuse: use ->reconfigure() instead of ->remount_fs()")
7fd3abfa8dd7 ("virtiofs: do not use fuse_fill_super_common() for device installation")
c9d35ee049b4 ("Merge branch 'merge.nfs-fs_parse.1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 525bd65aa759ec320af1dc06e114ed69733e9e23 Mon Sep 17 00:00:00 2001
From: Eric Sandeen <sandeen(a)redhat.com>
Date: Tue, 2 Jul 2024 17:22:41 -0500
Subject: [PATCH] fuse: verify {g,u}id mount options correctly
As was done in
0200679fc795 ("tmpfs: verify {g,u}id mount options correctly")
we need to validate that the requested uid and/or gid is representable in
the filesystem's idmapping.
Cribbing from the above commit log,
The contract for {g,u}id mount options and {g,u}id values in general set
from userspace has always been that they are translated according to the
caller's idmapping. In so far, fuse has been doing the correct thing.
But since fuse is mountable in unprivileged contexts it is also
necessary to verify that the resulting {k,g}uid is representable in the
namespace of the superblock.
Fixes: c30da2e981a7 ("fuse: convert to use the new mount API")
Cc: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Eric Sandeen <sandeen(a)redhat.com>
Link: https://lore.kernel.org/r/8f07d45d-c806-484d-a2e3-7a2199df1cd2@redhat.com
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..32fe6fa72f46 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -755,6 +755,8 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
struct fs_parse_result result;
struct fuse_fs_context *ctx = fsc->fs_private;
int opt;
+ kuid_t kuid;
+ kgid_t kgid;
if (fsc->purpose == FS_CONTEXT_FOR_RECONFIGURE) {
/*
@@ -799,16 +801,30 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
break;
case OPT_USER_ID:
- ctx->user_id = make_kuid(fsc->user_ns, result.uint_32);
- if (!uid_valid(ctx->user_id))
+ kuid = make_kuid(fsc->user_ns, result.uint_32);
+ if (!uid_valid(kuid))
return invalfc(fsc, "Invalid user_id");
+ /*
+ * The requested uid must be representable in the
+ * filesystem's idmapping.
+ */
+ if (!kuid_has_mapping(fsc->user_ns, kuid))
+ return invalfc(fsc, "Invalid user_id");
+ ctx->user_id = kuid;
ctx->user_id_present = true;
break;
case OPT_GROUP_ID:
- ctx->group_id = make_kgid(fsc->user_ns, result.uint_32);
- if (!gid_valid(ctx->group_id))
+ kgid = make_kgid(fsc->user_ns, result.uint_32);;
+ if (!gid_valid(kgid))
return invalfc(fsc, "Invalid group_id");
+ /*
+ * The requested gid must be representable in the
+ * filesystem's idmapping.
+ */
+ if (!kgid_has_mapping(fsc->user_ns, kgid))
+ return invalfc(fsc, "Invalid group_id");
+ ctx->group_id = kgid;
ctx->group_id_present = true;
break;
All batches of the Pine64 Pinebook Pro, except the latest batch (as of 2024)
whose hardware design was revised due to the component shortage, use a 1S
lithium battery whose nominal/design capacity is 10,000 mAh, according to the
battery datasheet. [1][2] Let's correct the design full-charge value in the
Pinebook Pro board dts, to improve the accuracy of the hardware description,
and to hopefully improve the accuracy of the fuel gauge a bit on all units
that don't belong to the latest batch.
The above-mentioned latest batch uses a different 1S lithium battery with
a slightly lower capacity, more precisely 9,600 mAh. To make the fuel gauge
work reliably on the latest batch, a sample battery would need to be sent to
CellWise, to obtain its proprietary battery profile, whose data goes into
"cellwise,battery-profile" in the Pinebook Pro board dts. Without that data,
the fuel gauge reportedly works unreliably, so changing the design capacity
won't have any negative effects on the already unreliable operation of the
fuel gauge in the Pinebook Pros that belong to the latest batch.
According to the battery datasheet, its voltage can go as low as 2.75 V while
discharging, but it's better to leave the current 3.0 V value in the dts file,
because of the associated Pinebook Pro's voltage regulation issues.
[1] https://wiki.pine64.org/index.php/Pinebook_Pro#Battery
[2] https://files.pine64.org/doc/datasheet/pinebook/40110175P%203.8V%2010000mAh…
Fixes: c7c4d698cd28 ("arm64: dts: rockchip: add fuel gauge to Pinebook Pro dts")
Cc: stable(a)vger.kernel.org
Cc: Marek Kraus <gamiee(a)pine64.org>
Signed-off-by: Dragan Simic <dsimic(a)manjaro.org>
---
arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
index 294eb2de263d..c918968714e4 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3399-pinebook-pro.dts
@@ -37,7 +37,7 @@ backlight: edp-backlight {
bat: battery {
compatible = "simple-battery";
- charge-full-design-microamp-hours = <9800000>;
+ charge-full-design-microamp-hours = <10000000>;
voltage-max-design-microvolt = <4350000>;
voltage-min-design-microvolt = <3000000>;
};
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 525bd65aa759ec320af1dc06e114ed69733e9e23
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072908-everglade-starved-66da@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
525bd65aa759 ("fuse: verify {g,u}id mount options correctly")
84c215075b57 ("fuse: name fs_context consistently")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 525bd65aa759ec320af1dc06e114ed69733e9e23 Mon Sep 17 00:00:00 2001
From: Eric Sandeen <sandeen(a)redhat.com>
Date: Tue, 2 Jul 2024 17:22:41 -0500
Subject: [PATCH] fuse: verify {g,u}id mount options correctly
As was done in
0200679fc795 ("tmpfs: verify {g,u}id mount options correctly")
we need to validate that the requested uid and/or gid is representable in
the filesystem's idmapping.
Cribbing from the above commit log,
The contract for {g,u}id mount options and {g,u}id values in general set
from userspace has always been that they are translated according to the
caller's idmapping. In so far, fuse has been doing the correct thing.
But since fuse is mountable in unprivileged contexts it is also
necessary to verify that the resulting {k,g}uid is representable in the
namespace of the superblock.
Fixes: c30da2e981a7 ("fuse: convert to use the new mount API")
Cc: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Eric Sandeen <sandeen(a)redhat.com>
Link: https://lore.kernel.org/r/8f07d45d-c806-484d-a2e3-7a2199df1cd2@redhat.com
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..32fe6fa72f46 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -755,6 +755,8 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
struct fs_parse_result result;
struct fuse_fs_context *ctx = fsc->fs_private;
int opt;
+ kuid_t kuid;
+ kgid_t kgid;
if (fsc->purpose == FS_CONTEXT_FOR_RECONFIGURE) {
/*
@@ -799,16 +801,30 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
break;
case OPT_USER_ID:
- ctx->user_id = make_kuid(fsc->user_ns, result.uint_32);
- if (!uid_valid(ctx->user_id))
+ kuid = make_kuid(fsc->user_ns, result.uint_32);
+ if (!uid_valid(kuid))
return invalfc(fsc, "Invalid user_id");
+ /*
+ * The requested uid must be representable in the
+ * filesystem's idmapping.
+ */
+ if (!kuid_has_mapping(fsc->user_ns, kuid))
+ return invalfc(fsc, "Invalid user_id");
+ ctx->user_id = kuid;
ctx->user_id_present = true;
break;
case OPT_GROUP_ID:
- ctx->group_id = make_kgid(fsc->user_ns, result.uint_32);
- if (!gid_valid(ctx->group_id))
+ kgid = make_kgid(fsc->user_ns, result.uint_32);;
+ if (!gid_valid(kgid))
return invalfc(fsc, "Invalid group_id");
+ /*
+ * The requested gid must be representable in the
+ * filesystem's idmapping.
+ */
+ if (!kgid_has_mapping(fsc->user_ns, kgid))
+ return invalfc(fsc, "Invalid group_id");
+ ctx->group_id = kgid;
ctx->group_id_present = true;
break;
The ufshcd_add_delay_before_dme_cmd() always introduces a delay of
MIN_DELAY_BEFORE_DME_CMDS_US between DME commands even when it's not
required. The delay is added when the UFS host controller supplies the
quirk UFSHCD_QUIRK_DELAY_BEFORE_DME_CMDS.
Fix the logic to update hba->last_dme_cmd_tstamp to ensure subsequent
DME commands have the correct delay in the range of 0 to
MIN_DELAY_BEFORE_DME_CMDS_US.
Update the timestamp at the end of the function to ensure it captures
the latest time after any necessary delay has been applied.
Signed-off-by: Vamshi Gajjela <vamshigajjela(a)google.com>
Fixes: cad2e03d8607 ("ufs: add support to allow non standard behaviours (quirks)")
Cc: stable(a)vger.kernel.org
---
drivers/ufs/core/ufshcd.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index dc757ba47522..406bda1585f6 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4090,11 +4090,16 @@ static inline void ufshcd_add_delay_before_dme_cmd(struct ufs_hba *hba)
min_sleep_time_us =
MIN_DELAY_BEFORE_DME_CMDS_US - delta;
else
- return; /* no more delay required */
+ min_sleep_time_us = 0; /* no more delay required */
}
- /* allow sleep for extra 50us if needed */
- usleep_range(min_sleep_time_us, min_sleep_time_us + 50);
+ if (min_sleep_time_us > 0) {
+ /* allow sleep for extra 50us if needed */
+ usleep_range(min_sleep_time_us, min_sleep_time_us + 50);
+ }
+
+ /* update the last_dme_cmd_tstamp */
+ hba->last_dme_cmd_tstamp = ktime_get();
}
/**
--
2.45.2.1089.g2a221341d9-goog
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 24dce1c538a7ceac43f2f97aae8dfd4bb93ea9b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072931-sarcastic-coagulant-6821@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
24dce1c538a7 ("io_uring: fix lost getsockopt completions")
d10f19dff56e ("io_uring/uring_cmd: switch to always allocating async data")
a9165b83c193 ("io_uring/rw: always setup io_async_rw for read/write requests")
8e5b3b89ecaf ("io_uring: remove struct io_tw_state::locked")
92219afb980e ("io_uring: force tw ctx locking")
6e6b8c62120a ("io_uring/rw: avoid punting to io-wq directly")
e1eef2e56cb0 ("io_uring/cmd: fix tw <-> issue_flags conversion")
6edd953b6ec7 ("io_uring/cmd: kill one issue_flags to tw conversion")
da12d9ab5889 ("io_uring/cmd: move io_uring_try_cancel_uring_cmd()")
8d0c12a80cde ("io-uring: add napi busy poll support")
405b4dc14b10 ("io-uring: move io_wait_queue definition to header file")
da08d2edb020 ("io_uring: re-arrange struct io_ring_ctx to reduce padding")
42c0905f0cac ("io_uring: cleanup handle_tw_list() calling convention")
9fe3eaea4a35 ("io_uring: remove unconditional looping in local task_work handling")
670d9d3df880 ("io_uring: remove next io_kiocb fetch in task_work running")
170539bdf109 ("io_uring: handle traditional task_work in FIFO order")
592b4805432a ("io_uring: remove looping around handling traditional task_work")
95041b93e90a ("io_uring: add io_file_can_poll() helper")
521223d7c229 ("io_uring/cancel: don't default to setting req->work.cancel_seq")
4bcb982cce74 ("io_uring: expand main struct io_kiocb flags to 64-bits")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24dce1c538a7ceac43f2f97aae8dfd4bb93ea9b9 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Tue, 16 Jul 2024 19:05:46 +0100
Subject: [PATCH] io_uring: fix lost getsockopt completions
There is a report that iowq executed getsockopt never completes. The
reason being that io_uring_cmd_sock() can return a positive result, and
io_uring_cmd() propagates it back to core io_uring, instead of IOU_OK.
In case of io_wq_submit_work(), the request will be dropped without
completing it.
The offending code was introduced by a hack in
a9c3eda7eada9 ("io_uring: fix submission-failure handling for uring-cmd"),
however it was fine until getsockopt was introduced and started
returning positive results.
The right solution is to always return IOU_OK, since
e0b23d9953b0c ("io_uring: optimise ltimeout for inline execution"),
we should be able to do it without problems, however for the sake of
backporting and minimising side effects, let's keep returning negative
return codes and otherwise do IOU_OK.
Link: https://github.com/axboe/liburing/issues/1181
Cc: stable(a)vger.kernel.org
Fixes: 8e9fad0e70b7b ("io_uring: Add io_uring command support for sockets")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Reviewed-by: Breno Leitao <leitao(a)debian.org>
Link: https://lore.kernel.org/r/ff349cf0654018189b6077e85feed935f0f8839e.17211498…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/uring_cmd.c b/io_uring/uring_cmd.c
index 21ac5fb2d5f0..a54163a83968 100644
--- a/io_uring/uring_cmd.c
+++ b/io_uring/uring_cmd.c
@@ -265,7 +265,7 @@ int io_uring_cmd(struct io_kiocb *req, unsigned int issue_flags)
req_set_fail(req);
io_req_uring_cleanup(req, issue_flags);
io_req_set_res(req, ret, 0);
- return ret;
+ return ret < 0 ? ret : IOU_OK;
}
int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f442fa6141379a20b48ae3efabee827a3d260787
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071541-observing-landline-c9ec@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
f442fa614137 ("mm: gup: stop abusing try_grab_folio")
01d89b93e176 ("mm/gup: fix hugepd handling in hugetlb rework")
9cbe4954c6d9 ("gup: use folios for gup_devmap")
53e45c4f6d4f ("mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs()")
6785c54a1b43 ("mm: remove put_devmap_managed_page()")
25176ad09ca3 ("mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST")
23babe1934d7 ("mm/gup: consistently name GUP-fast functions")
a12083d721d7 ("mm/gup: handle hugepd for follow_page()")
4418c522f683 ("mm/gup: handle huge pmd for follow_pmd_mask()")
1b1676180246 ("mm/gup: handle huge pud for follow_pud_mask()")
caf8cab79857 ("mm/gup: cache *pudp in follow_pud_mask()")
878b0c451621 ("mm/gup: handle hugetlb for no_page_table()")
f3c94c625fe3 ("mm/gup: refactor record_subpages() to find 1st small page")
607c63195d63 ("mm/gup: drop gup_fast_folio_allowed() in hugepd processing")
f002882ca369 ("mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into gup_fast_folio_allowed()")
1965e933ddeb ("mm/treewide: replace pXd_huge() with pXd_leaf()")
7db86dc389aa ("mm/gup: merge pXd huge mapping checks")
089f92141ed0 ("mm/gup: check p4d presence before going on")
e6fd5564c07c ("mm/gup: cache p4d in follow_p4d_mask()")
65291dcfcf89 ("mm/secretmem: fix GUP-fast succeeding on secretmem folios")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f442fa6141379a20b48ae3efabee827a3d260787 Mon Sep 17 00:00:00 2001
From: Yang Shi <yang(a)os.amperecomputing.com>
Date: Fri, 28 Jun 2024 12:14:58 -0700
Subject: [PATCH] mm: gup: stop abusing try_grab_folio
A kernel warning was reported when pinning folio in CMA memory when
launching SEV virtual machine. The splat looks like:
[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[ 464.325515] Call Trace:
[ 464.325520] <TASK>
[ 464.325523] ? __get_user_pages+0x423/0x520
[ 464.325528] ? __warn+0x81/0x130
[ 464.325536] ? __get_user_pages+0x423/0x520
[ 464.325541] ? report_bug+0x171/0x1a0
[ 464.325549] ? handle_bug+0x3c/0x70
[ 464.325554] ? exc_invalid_op+0x17/0x70
[ 464.325558] ? asm_exc_invalid_op+0x1a/0x20
[ 464.325567] ? __get_user_pages+0x423/0x520
[ 464.325575] __gup_longterm_locked+0x212/0x7a0
[ 464.325583] internal_get_user_pages_fast+0xfb/0x190
[ 464.325590] pin_user_pages_fast+0x47/0x60
[ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd]
[ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
Per the analysis done by yangge, when starting the SEV virtual machine, it
will call pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory.
But the page is in CMA area, so fast GUP will fail then fallback to the
slow path due to the longterm pinnalbe check in try_grab_folio().
The slow path will try to pin the pages then migrate them out of CMA area.
But the slow path also uses try_grab_folio() to pin the page, it will
also fail due to the same check then the above warning is triggered.
In addition, the try_grab_folio() is supposed to be used in fast path and
it elevates folio refcount by using add ref unless zero. We are guaranteed
to have at least one stable reference in slow path, so the simple atomic add
could be used. The performance difference should be trivial, but the
misuse may be confusing and misleading.
Redefined try_grab_folio() to try_grab_folio_fast(), and try_grab_page()
to try_grab_folio(), and use them in the proper paths. This solves both
the abuse and the kernel warning.
The proper naming makes their usecase more clear and should prevent from
abusing in the future.
peterx said:
: The user will see the pin fails, for gpu-slow it further triggers the WARN
: right below that failure (as in the original report):
:
: folio = try_grab_folio(page, page_increm - 1,
: foll_flags);
: if (WARN_ON_ONCE(!folio)) { <------------------------ here
: /*
: * Release the 1st page ref if the
: * folio is problematic, fail hard.
: */
: gup_put_folio(page_folio(page), 1,
: foll_flags);
: ret = -EFAULT;
: goto out;
: }
[1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge11…
[shy828301(a)gmail.com: fix implicit declaration of function try_grab_folio_fast]
Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xj…
Link: https://lkml.kernel.org/r/20240628191458.2605553-1-yang@os.amperecomputing.…
Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Reported-by: yangge <yangge1116(a)126.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index 469799f805f1..f1d6bc06eb52 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -97,95 +97,6 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
return folio;
}
-/**
- * try_grab_folio() - Attempt to get or pin a folio.
- * @page: pointer to page to be grabbed
- * @refs: the value to (effectively) add to the folio's refcount
- * @flags: gup flags: these are the FOLL_* flag values.
- *
- * "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
- *
- * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
- * same time. (That's true throughout the get_user_pages*() and
- * pin_user_pages*() APIs.) Cases:
- *
- * FOLL_GET: folio's refcount will be incremented by @refs.
- *
- * FOLL_PIN on large folios: folio's refcount will be incremented by
- * @refs, and its pincount will be incremented by @refs.
- *
- * FOLL_PIN on single-page folios: folio's refcount will be incremented by
- * @refs * GUP_PIN_COUNTING_BIAS.
- *
- * Return: The folio containing @page (with refcount appropriately
- * incremented) for success, or NULL upon failure. If neither FOLL_GET
- * nor FOLL_PIN was set, that's considered failure, and furthermore,
- * a likely bug in the caller, so a warning is also emitted.
- */
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
-{
- struct folio *folio;
-
- if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
- return NULL;
-
- if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
- return NULL;
-
- if (flags & FOLL_GET)
- return try_get_folio(page, refs);
-
- /* FOLL_PIN is set */
-
- /*
- * Don't take a pin on the zero page - it's not going anywhere
- * and it is used in a *lot* of places.
- */
- if (is_zero_page(page))
- return page_folio(page);
-
- folio = try_get_folio(page, refs);
- if (!folio)
- return NULL;
-
- /*
- * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
- * right zone, so fail and let the caller fall back to the slow
- * path.
- */
- if (unlikely((flags & FOLL_LONGTERM) &&
- !folio_is_longterm_pinnable(folio))) {
- if (!put_devmap_managed_folio_refs(folio, refs))
- folio_put_refs(folio, refs);
- return NULL;
- }
-
- /*
- * When pinning a large folio, use an exact count to track it.
- *
- * However, be sure to *also* increment the normal folio
- * refcount field at least once, so that the folio really
- * is pinned. That's why the refcount from the earlier
- * try_get_folio() is left intact.
- */
- if (folio_test_large(folio))
- atomic_add(refs, &folio->_pincount);
- else
- folio_ref_add(folio,
- refs * (GUP_PIN_COUNTING_BIAS - 1));
- /*
- * Adjust the pincount before re-checking the PTE for changes.
- * This is essentially a smp_mb() and is paired with a memory
- * barrier in folio_try_share_anon_rmap_*().
- */
- smp_mb__after_atomic();
-
- node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
-
- return folio;
-}
-
static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
{
if (flags & FOLL_PIN) {
@@ -203,58 +114,59 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
}
/**
- * try_grab_page() - elevate a page's refcount by a flag-dependent amount
- * @page: pointer to page to be grabbed
- * @flags: gup flags: these are the FOLL_* flag values.
+ * try_grab_folio() - add a folio's refcount by a flag-dependent amount
+ * @folio: pointer to folio to be grabbed
+ * @refs: the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values
*
* This might not do anything at all, depending on the flags argument.
*
* "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount.
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
*
* Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
- * time. Cases: please see the try_grab_folio() documentation, with
- * "refs=1".
+ * time.
*
* Return: 0 for success, or if no action was required (if neither FOLL_PIN
* nor FOLL_GET was set, nothing is done). A negative error code for failure:
*
- * -ENOMEM FOLL_GET or FOLL_PIN was set, but the page could not
+ * -ENOMEM FOLL_GET or FOLL_PIN was set, but the folio could not
* be grabbed.
+ *
+ * It is called when we have a stable reference for the folio, typically in
+ * GUP slow path.
*/
-int __must_check try_grab_page(struct page *page, unsigned int flags)
+int __must_check try_grab_folio(struct folio *folio, int refs,
+ unsigned int flags)
{
- struct folio *folio = page_folio(page);
-
if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
return -ENOMEM;
- if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+ if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(&folio->page)))
return -EREMOTEIO;
if (flags & FOLL_GET)
- folio_ref_inc(folio);
+ folio_ref_add(folio, refs);
else if (flags & FOLL_PIN) {
/*
* Don't take a pin on the zero page - it's not going anywhere
* and it is used in a *lot* of places.
*/
- if (is_zero_page(page))
+ if (is_zero_folio(folio))
return 0;
/*
- * Similar to try_grab_folio(): be sure to *also*
- * increment the normal page refcount field at least once,
+ * Increment the normal page refcount field at least once,
* so that the page really is pinned.
*/
if (folio_test_large(folio)) {
- folio_ref_add(folio, 1);
- atomic_add(1, &folio->_pincount);
+ folio_ref_add(folio, refs);
+ atomic_add(refs, &folio->_pincount);
} else {
- folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+ folio_ref_add(folio, refs * GUP_PIN_COUNTING_BIAS);
}
- node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
}
return 0;
@@ -515,6 +427,102 @@ static int record_subpages(struct page *page, unsigned long sz,
return nr;
}
+
+/**
+ * try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
+ * @page: pointer to page to be grabbed
+ * @refs: the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values.
+ *
+ * "grab" names in this file mean, "look at flags to decide whether to use
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
+ *
+ * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
+ * same time. (That's true throughout the get_user_pages*() and
+ * pin_user_pages*() APIs.) Cases:
+ *
+ * FOLL_GET: folio's refcount will be incremented by @refs.
+ *
+ * FOLL_PIN on large folios: folio's refcount will be incremented by
+ * @refs, and its pincount will be incremented by @refs.
+ *
+ * FOLL_PIN on single-page folios: folio's refcount will be incremented by
+ * @refs * GUP_PIN_COUNTING_BIAS.
+ *
+ * Return: The folio containing @page (with refcount appropriately
+ * incremented) for success, or NULL upon failure. If neither FOLL_GET
+ * nor FOLL_PIN was set, that's considered failure, and furthermore,
+ * a likely bug in the caller, so a warning is also emitted.
+ *
+ * It uses add ref unless zero to elevate the folio refcount and must be called
+ * in fast path only.
+ */
+static struct folio *try_grab_folio_fast(struct page *page, int refs,
+ unsigned int flags)
+{
+ struct folio *folio;
+
+ /* Raise warn if it is not called in fast GUP */
+ VM_WARN_ON_ONCE(!irqs_disabled());
+
+ if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
+ return NULL;
+
+ if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+ return NULL;
+
+ if (flags & FOLL_GET)
+ return try_get_folio(page, refs);
+
+ /* FOLL_PIN is set */
+
+ /*
+ * Don't take a pin on the zero page - it's not going anywhere
+ * and it is used in a *lot* of places.
+ */
+ if (is_zero_page(page))
+ return page_folio(page);
+
+ folio = try_get_folio(page, refs);
+ if (!folio)
+ return NULL;
+
+ /*
+ * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
+ * right zone, so fail and let the caller fall back to the slow
+ * path.
+ */
+ if (unlikely((flags & FOLL_LONGTERM) &&
+ !folio_is_longterm_pinnable(folio))) {
+ if (!put_devmap_managed_folio_refs(folio, refs))
+ folio_put_refs(folio, refs);
+ return NULL;
+ }
+
+ /*
+ * When pinning a large folio, use an exact count to track it.
+ *
+ * However, be sure to *also* increment the normal folio
+ * refcount field at least once, so that the folio really
+ * is pinned. That's why the refcount from the earlier
+ * try_get_folio() is left intact.
+ */
+ if (folio_test_large(folio))
+ atomic_add(refs, &folio->_pincount);
+ else
+ folio_ref_add(folio,
+ refs * (GUP_PIN_COUNTING_BIAS - 1));
+ /*
+ * Adjust the pincount before re-checking the PTE for changes.
+ * This is essentially a smp_mb() and is paired with a memory
+ * barrier in folio_try_share_anon_rmap_*().
+ */
+ smp_mb__after_atomic();
+
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
+
+ return folio;
+}
#endif /* CONFIG_ARCH_HAS_HUGEPD || CONFIG_HAVE_GUP_FAST */
#ifdef CONFIG_ARCH_HAS_HUGEPD
@@ -535,7 +543,7 @@ static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
*/
static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz,
unsigned long addr, unsigned long end, unsigned int flags,
- struct page **pages, int *nr)
+ struct page **pages, int *nr, bool fast)
{
unsigned long pte_end;
struct page *page;
@@ -558,9 +566,15 @@ static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz
page = pte_page(pte);
refs = record_subpages(page, sz, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
- if (!folio)
- return 0;
+ if (fast) {
+ folio = try_grab_folio_fast(page, refs, flags);
+ if (!folio)
+ return 0;
+ } else {
+ folio = page_folio(page);
+ if (try_grab_folio(folio, refs, flags))
+ return 0;
+ }
if (unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
gup_put_folio(folio, refs, flags);
@@ -588,7 +602,7 @@ static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz
static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
unsigned long addr, unsigned int pdshift,
unsigned long end, unsigned int flags,
- struct page **pages, int *nr)
+ struct page **pages, int *nr, bool fast)
{
pte_t *ptep;
unsigned long sz = 1UL << hugepd_shift(hugepd);
@@ -598,7 +612,8 @@ static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
ptep = hugepte_offset(hugepd, addr, pdshift);
do {
next = hugepte_addr_end(addr, end, sz);
- ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr);
+ ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr,
+ fast);
if (ret != 1)
return ret;
} while (ptep++, addr = next, addr != end);
@@ -625,7 +640,7 @@ static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
ptep = hugepte_offset(hugepd, addr, pdshift);
ptl = huge_pte_lock(h, vma->vm_mm, ptep);
ret = gup_hugepd(vma, hugepd, addr, pdshift, addr + PAGE_SIZE,
- flags, &page, &nr);
+ flags, &page, &nr, false);
spin_unlock(ptl);
if (ret == 1) {
@@ -642,7 +657,7 @@ static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
static inline int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
unsigned long addr, unsigned int pdshift,
unsigned long end, unsigned int flags,
- struct page **pages, int *nr)
+ struct page **pages, int *nr, bool fast)
{
return 0;
}
@@ -729,7 +744,7 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
gup_must_unshare(vma, flags, page))
return ERR_PTR(-EMLINK);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
page = ERR_PTR(ret);
else
@@ -806,7 +821,7 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
!PageAnonExclusive(page), page);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
return ERR_PTR(ret);
@@ -968,8 +983,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
!PageAnonExclusive(page), page);
- /* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
- ret = try_grab_page(page, flags);
+ /* try_grab_folio() does nothing unless FOLL_GET or FOLL_PIN is set. */
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (unlikely(ret)) {
page = ERR_PTR(ret);
goto out;
@@ -1233,7 +1248,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
goto unmap;
*page = pte_page(entry);
}
- ret = try_grab_page(*page, gup_flags);
+ ret = try_grab_folio(page_folio(*page), 1, gup_flags);
if (unlikely(ret))
goto unmap;
out:
@@ -1636,20 +1651,19 @@ static long __get_user_pages(struct mm_struct *mm,
* pages.
*/
if (page_increm > 1) {
- struct folio *folio;
+ struct folio *folio = page_folio(page);
/*
* Since we already hold refcount on the
* large folio, this should never fail.
*/
- folio = try_grab_folio(page, page_increm - 1,
- foll_flags);
- if (WARN_ON_ONCE(!folio)) {
+ if (try_grab_folio(folio, page_increm - 1,
+ foll_flags)) {
/*
* Release the 1st page ref if the
* folio is problematic, fail hard.
*/
- gup_put_folio(page_folio(page), 1,
+ gup_put_folio(folio, 1,
foll_flags);
ret = -EFAULT;
goto out;
@@ -2797,7 +2811,6 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
* This code is based heavily on the PowerPC implementation by Nick Piggin.
*/
#ifdef CONFIG_HAVE_GUP_FAST
-
/*
* Used in the GUP-fast path to determine whether GUP is permitted to work on
* a specific folio.
@@ -2962,7 +2975,7 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
- folio = try_grab_folio(page, 1, flags);
+ folio = try_grab_folio_fast(page, 1, flags);
if (!folio)
goto pte_unmap;
@@ -3049,7 +3062,7 @@ static int gup_fast_devmap_leaf(unsigned long pfn, unsigned long addr,
break;
}
- folio = try_grab_folio(page, 1, flags);
+ folio = try_grab_folio_fast(page, 1, flags);
if (!folio) {
gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
break;
@@ -3138,7 +3151,7 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
page = pmd_page(orig);
refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -3182,7 +3195,7 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
page = pud_page(orig);
refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -3222,7 +3235,7 @@ static int gup_fast_pgd_leaf(pgd_t orig, pgd_t *pgdp, unsigned long addr,
page = pgd_page(orig);
refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -3276,7 +3289,8 @@ static int gup_fast_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr,
* pmd format and THP pmd format
*/
if (gup_hugepd(NULL, __hugepd(pmd_val(pmd)), addr,
- PMD_SHIFT, next, flags, pages, nr) != 1)
+ PMD_SHIFT, next, flags, pages, nr,
+ true) != 1)
return 0;
} else if (!gup_fast_pte_range(pmd, pmdp, addr, next, flags,
pages, nr))
@@ -3306,7 +3320,8 @@ static int gup_fast_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr,
return 0;
} else if (unlikely(is_hugepd(__hugepd(pud_val(pud))))) {
if (gup_hugepd(NULL, __hugepd(pud_val(pud)), addr,
- PUD_SHIFT, next, flags, pages, nr) != 1)
+ PUD_SHIFT, next, flags, pages, nr,
+ true) != 1)
return 0;
} else if (!gup_fast_pmd_range(pudp, pud, addr, next, flags,
pages, nr))
@@ -3333,7 +3348,8 @@ static int gup_fast_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr,
BUILD_BUG_ON(p4d_leaf(p4d));
if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) {
if (gup_hugepd(NULL, __hugepd(p4d_val(p4d)), addr,
- P4D_SHIFT, next, flags, pages, nr) != 1)
+ P4D_SHIFT, next, flags, pages, nr,
+ true) != 1)
return 0;
} else if (!gup_fast_pud_range(p4dp, p4d, addr, next, flags,
pages, nr))
@@ -3362,7 +3378,8 @@ static void gup_fast_pgd_range(unsigned long addr, unsigned long end,
return;
} else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) {
if (gup_hugepd(NULL, __hugepd(pgd_val(pgd)), addr,
- PGDIR_SHIFT, next, flags, pages, nr) != 1)
+ PGDIR_SHIFT, next, flags, pages, nr,
+ true) != 1)
return;
} else if (!gup_fast_p4d_range(pgdp, pgd, addr, next, flags,
pages, nr))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index db7946a0a28c..2120f7478e55 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1331,7 +1331,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
if (!*pgmap)
return ERR_PTR(-EFAULT);
page = pfn_to_page(pfn);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
page = ERR_PTR(ret);
diff --git a/mm/internal.h b/mm/internal.h
index 6902b7dd8509..cc2c5e07fad3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1182,8 +1182,8 @@ int migrate_device_coherent_page(struct page *page);
/*
* mm/gup.c
*/
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
-int __must_check try_grab_page(struct page *page, unsigned int flags);
+int __must_check try_grab_folio(struct folio *folio, int refs,
+ unsigned int flags);
/*
* mm/huge_memory.c
From: Max Kellermann <max.kellermann(a)ionos.com>
This fixes a NULL pointer dereference bug due to a data race which
looks like this:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
Workqueue: events_unbound netfs_rreq_write_to_cache_work
RIP: 0010:cachefiles_prepare_write+0x30/0xa0
Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10
RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286
RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000
RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438
RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001
R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68
R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00
FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0
Call Trace:
<TASK>
? __die+0x1f/0x70
? page_fault_oops+0x15d/0x440
? search_module_extables+0xe/0x40
? fixup_exception+0x22/0x2f0
? exc_page_fault+0x5f/0x100
? asm_exc_page_fault+0x22/0x30
? cachefiles_prepare_write+0x30/0xa0
netfs_rreq_write_to_cache_work+0x135/0x2e0
process_one_work+0x137/0x2c0
worker_thread+0x2e9/0x400
? __pfx_worker_thread+0x10/0x10
kthread+0xcc/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x30/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
This happened because fscache_cookie_state_machine() was slow and was
still running while another process invoked fscache_unuse_cookie();
this led to a fscache_cookie_lru_do_one() call, setting the
FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by
fscache_cookie_state_machine(), withdrawing the cookie via
cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
At the same time, yet another process invoked
cachefiles_prepare_write(), which found a NULL pointer in this code
line:
struct cachefiles_object *object = cachefiles_cres_object(cres);
The next line crashes, obviously:
struct cachefiles_cache *cache = object->volume->cache;
During cachefiles_prepare_write(), the "n_accesses" counter is
non-zero (via fscache_begin_operation()). The cookie must not be
withdrawn until it drops to zero.
The counter is checked by fscache_cookie_state_machine() before
switching to FSCACHE_COOKIE_STATE_RELINQUISHING and
FSCACHE_COOKIE_STATE_WITHDRAWING (in "case
FSCACHE_COOKIE_STATE_FAILED"), but not for
FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case
FSCACHE_COOKIE_STATE_ACTIVE").
This patch adds the missing check. With a non-zero access counter,
the function returns and the next fscache_end_cookie_access() call
will queue another fscache_cookie_state_machine() call to handle the
still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
cc: Jeff Layton <jlayton(a)kernel.org>
cc: netfs(a)lists.linux.dev
cc: linux-fsdevel(a)vger.kernel.org
cc: stable(a)vger.kernel.org
---
fs/netfs/fscache_cookie.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/netfs/fscache_cookie.c b/fs/netfs/fscache_cookie.c
index bce2492186d0..d4d4b3a8b106 100644
--- a/fs/netfs/fscache_cookie.c
+++ b/fs/netfs/fscache_cookie.c
@@ -741,6 +741,10 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
spin_lock(&cookie->lock);
}
if (test_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags)) {
+ if (atomic_read(&cookie->n_accesses) != 0)
+ /* still being accessed: postpone it */
+ break;
+
__fscache_set_cookie_state(cookie,
FSCACHE_COOKIE_STATE_LRU_DISCARDING);
wake = true;
From: Andrey Konovalov <andreyknvl(a)gmail.com>
When collecting coverage from softirqs, KCOV uses in_serving_softirq() to
check whether the code is running in the softirq context. Unfortunately,
in_serving_softirq() is > 0 even when the code is running in the hardirq
or NMI context for hardirqs and NMIs that happened during a softirq.
As a result, if a softirq handler contains a remote coverage collection
section and a hardirq with another remote coverage collection section
happens during handling the softirq, KCOV incorrectly detects a nested
softirq coverate collection section and prints a WARNING, as reported
by syzbot.
This issue was exposed by commit a7f3813e589f ("usb: gadget: dummy_hcd:
Switch to hrtimer transfer scheduler"), which switched dummy_hcd to using
hrtimer and made the timer's callback be executed in the hardirq context.
Change the related checks in KCOV to account for this behavior of
in_serving_softirq() and make KCOV ignore remote coverage collection
sections in the hardirq and NMI contexts.
This prevents the WARNING printed by syzbot but does not fix the inability
of KCOV to collect coverage from the __usb_hcd_giveback_urb when dummy_hcd
is in use (caused by a7f3813e589f); a separate patch is required for that.
Reported-by: syzbot+2388cdaeb6b10f0c13ac(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=2388cdaeb6b10f0c13ac
Fixes: 5ff3b30ab57d ("kcov: collect coverage from interrupts")
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrey Konovalov <andreyknvl(a)gmail.com>
---
kernel/kcov.c | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/kernel/kcov.c b/kernel/kcov.c
index f0a69d402066e..274b6b7c718de 100644
--- a/kernel/kcov.c
+++ b/kernel/kcov.c
@@ -161,6 +161,15 @@ static void kcov_remote_area_put(struct kcov_remote_area *area,
kmsan_unpoison_memory(&area->list, sizeof(area->list));
}
+/*
+ * Unlike in_serving_softirq(), this function returns false when called during
+ * a hardirq or an NMI that happened in the softirq context.
+ */
+static inline bool in_softirq_really(void)
+{
+ return in_serving_softirq() && !in_hardirq() && !in_nmi();
+}
+
static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_struct *t)
{
unsigned int mode;
@@ -170,7 +179,7 @@ static notrace bool check_kcov_mode(enum kcov_mode needed_mode, struct task_stru
* so we ignore code executed in interrupts, unless we are in a remote
* coverage collection section in a softirq.
*/
- if (!in_task() && !(in_serving_softirq() && t->kcov_softirq))
+ if (!in_task() && !(in_softirq_really() && t->kcov_softirq))
return false;
mode = READ_ONCE(t->kcov_mode);
/*
@@ -849,7 +858,7 @@ void kcov_remote_start(u64 handle)
if (WARN_ON(!kcov_check_handle(handle, true, true, true)))
return;
- if (!in_task() && !in_serving_softirq())
+ if (!in_task() && !in_softirq_really())
return;
local_lock_irqsave(&kcov_percpu_data.lock, flags);
@@ -991,7 +1000,7 @@ void kcov_remote_stop(void)
int sequence;
unsigned long flags;
- if (!in_task() && !in_serving_softirq())
+ if (!in_task() && !in_softirq_really())
return;
local_lock_irqsave(&kcov_percpu_data.lock, flags);
--
2.25.1
This patch series makes it possible to use Rust together with the shadow
call stack sanitizer. The first patch is intended to be backported to
ensure that people don't try to use SCS with Rust on older kernel
versions. The second patch makes it possible to use Rust with the shadow
call stack sanitizer.
The second patch in this series doesn't make sense without [1], though
it doesn't break the build if [1] is missing.
Link: https://lore.kernel.org/rust-for-linux/20240701183625.665574-12-ojeda@kerne… [1]
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
Changes in v3:
- Use -Zfixed-x18.
- Add logic to reject unsupported rustc versions.
- Also include a fix to be backported.
- Link to v2: https://lore.kernel.org/rust-for-linux/20240305-shadow-call-stack-v2-1-c7b4…
Changes in v2:
- Add -Cforce-unwind-tables flag.
- Link to v1: https://lore.kernel.org/rust-for-linux/20240304-shadow-call-stack-v1-1-f055…
---
Alice Ryhl (2):
rust: SHADOW_CALL_STACK is incompatible with Rust
rust: add flags for shadow call stack sanitizer
Makefile | 1 +
arch/Kconfig | 1 +
arch/arm64/Makefile | 3 +++
3 files changed, 5 insertions(+)
---
base-commit: 83b1e6e4170cf96b2a7c49070dd43749649f454e
change-id: 20240304-shadow-call-stack-9c197a4361d9
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
> Thank you for your sending the revised patches, it looks better than the
> previous one. However, I have an additional request.
Allright, patch v3 it is.
> [1] https://git-scm.com/docs/git-revert
Should have known git has something like that, how handy!
> $ git revert -s b5b519965c4c
Yes, 5b5 can be removed via revert, but what is the difference in
effect? Just time saving?
> $ git revert -s 7ba5ca32fe6e
This one I'd like to ask you about:
The original inline comment in amdtp-stream.c
amdtp_domain_stream_pcm_pointer()
```
// This function is called in software IRQ context of
// period_work or process context.
//
// When the software IRQ context was scheduled by software IRQ
// context of IT contexts, queued packets were already handled.
// Therefore, no need to flush the queue in buffer furthermore.
//
// When the process context reach here, some packets will be
// already queued in the buffer. These packets should be handled
// immediately to keep better granularity of PCM pointer.
//
// Later, the process context will sometimes schedules software
// IRQ context of the period_work. Then, no need to flush the
// queue by the same reason as described in the above
```
(let's call the above v1) was replaced with
```
// In software IRQ context, the call causes dead-lock to disable the tasklet
// synchronously.
```
on occasion of 7ba5ca32fe6e (let's call this v2).
I sought to replace it with
```
// use wq to prevent deadlock between process context spin_lock
// of snd_pcm_stream_lock_irq() in snd_pcm_status64() and
// softIRQ context spin_lock of snd_pcm_stream_lock_irqsave()
// in snd_pcm_period_elapsed()
```
to prevent this issue from occurring again (let's call this v3).
Should I include v1, v3 or a combination of v1 and v3 in my next patch?
> Just for safe, it is preferable to execute 'scripts/checkpatch.pl' in
> kernel tree to check the patchset generated by send-email subcommand[3].
Absolutely should have done so, sorry.
Thank you for your patience and guidance,
Edmund Raile.
Issue #501 [1] showed that the Netlink PM currently doesn't correctly
support removal and re-add of signal endpoints.
Patches 1 and 2 address the issue: the first one in the userspace path-
manager, introduced in v5.19 ; and the second one in the in-kernel path-
manager, introduced in v5.7.
Patch 3 introduces a related selftest. There is no 'Fixes' tag, because
it might be hard to backport it automatically, as missing helpers in
Bash will not be caught when compiling the kernel or the selftests.
The last two patches address two small issues in the MPTCP selftests,
one introduced in v6.6., and the other one in v5.17.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/501 [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Liu Jing (1):
selftests: mptcp: always close input's FD if opened
Paolo Abeni (4):
mptcp: fix user-space PM announced address accounting
mptcp: fix NL PM announced address accounting
selftests: mptcp: add explicit test case for remove/readd
selftests: mptcp: fix error path
net/mptcp/pm_netlink.c | 27 ++++++++++++++------
tools/testing/selftests/net/mptcp/mptcp_connect.c | 8 +++---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 31 ++++++++++++++++++++++-
3 files changed, 53 insertions(+), 13 deletions(-)
---
base-commit: 301927d2d2eb8e541357ba850bc7a1a74dbbd670
change-id: 20240726-upstream-net-20240726-mptcp-fix-signal-readd-f3c72bbcbceb
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 4811f7af6090e8f5a398fbdd766f903ef6c0d787
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072930-unbridle-negotiate-9977@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
4811f7af6090 ("nilfs2: handle inconsistent state in nilfs_btnode_create_block()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4811f7af6090e8f5a398fbdd766f903ef6c0d787 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Thu, 25 Jul 2024 14:20:07 +0900
Subject: [PATCH] nilfs2: handle inconsistent state in
nilfs_btnode_create_block()
Syzbot reported that a buffer state inconsistency was detected in
nilfs_btnode_create_block(), triggering a kernel bug.
It is not appropriate to treat this inconsistency as a bug; it can occur
if the argument block address (the buffer index of the newly created
block) is a virtual block number and has been reallocated due to
corruption of the bitmap used to manage its allocation state.
So, modify nilfs_btnode_create_block() and its callers to treat it as a
possible filesystem error, rather than triggering a kernel bug.
Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com
Fixes: a60be987d45d ("nilfs2: B-tree node cache")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+89cc4f2324ed37988b60(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 0131d83b912d..c034080c334b 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -51,12 +51,21 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node));
if (unlikely(!bh))
- return NULL;
+ return ERR_PTR(-ENOMEM);
if (unlikely(buffer_mapped(bh) || buffer_uptodate(bh) ||
buffer_dirty(bh))) {
- brelse(bh);
- BUG();
+ /*
+ * The block buffer at the specified new address was already
+ * in use. This can happen if it is a virtual block number
+ * and has been reallocated due to corruption of the bitmap
+ * used to manage its allocation state (if not, the buffer
+ * clearing of an abandoned b-tree node is missing somewhere).
+ */
+ nilfs_error(inode->i_sb,
+ "state inconsistency probably due to duplicate use of b-tree node block address %llu (ino=%lu)",
+ (unsigned long long)blocknr, inode->i_ino);
+ goto failed;
}
memset(bh->b_data, 0, i_blocksize(inode));
bh->b_bdev = inode->i_sb->s_bdev;
@@ -67,6 +76,12 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
folio_unlock(bh->b_folio);
folio_put(bh->b_folio);
return bh;
+
+failed:
+ folio_unlock(bh->b_folio);
+ folio_put(bh->b_folio);
+ brelse(bh);
+ return ERR_PTR(-EIO);
}
int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
@@ -217,8 +232,8 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
}
nbh = nilfs_btnode_create_block(btnc, newkey);
- if (!nbh)
- return -ENOMEM;
+ if (IS_ERR(nbh))
+ return PTR_ERR(nbh);
BUG_ON(nbh == obh);
ctxt->newbh = nbh;
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index a139970e4804..862bdf23120e 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -63,8 +63,8 @@ static int nilfs_btree_get_new_block(const struct nilfs_bmap *btree,
struct buffer_head *bh;
bh = nilfs_btnode_create_block(btnc, ptr);
- if (!bh)
- return -ENOMEM;
+ if (IS_ERR(bh))
+ return PTR_ERR(bh);
set_buffer_nilfs_volatile(bh);
*bhp = bh;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 4811f7af6090e8f5a398fbdd766f903ef6c0d787
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072929-blip-squad-13da@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
4811f7af6090 ("nilfs2: handle inconsistent state in nilfs_btnode_create_block()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4811f7af6090e8f5a398fbdd766f903ef6c0d787 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Thu, 25 Jul 2024 14:20:07 +0900
Subject: [PATCH] nilfs2: handle inconsistent state in
nilfs_btnode_create_block()
Syzbot reported that a buffer state inconsistency was detected in
nilfs_btnode_create_block(), triggering a kernel bug.
It is not appropriate to treat this inconsistency as a bug; it can occur
if the argument block address (the buffer index of the newly created
block) is a virtual block number and has been reallocated due to
corruption of the bitmap used to manage its allocation state.
So, modify nilfs_btnode_create_block() and its callers to treat it as a
possible filesystem error, rather than triggering a kernel bug.
Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com
Fixes: a60be987d45d ("nilfs2: B-tree node cache")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+89cc4f2324ed37988b60(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 0131d83b912d..c034080c334b 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -51,12 +51,21 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node));
if (unlikely(!bh))
- return NULL;
+ return ERR_PTR(-ENOMEM);
if (unlikely(buffer_mapped(bh) || buffer_uptodate(bh) ||
buffer_dirty(bh))) {
- brelse(bh);
- BUG();
+ /*
+ * The block buffer at the specified new address was already
+ * in use. This can happen if it is a virtual block number
+ * and has been reallocated due to corruption of the bitmap
+ * used to manage its allocation state (if not, the buffer
+ * clearing of an abandoned b-tree node is missing somewhere).
+ */
+ nilfs_error(inode->i_sb,
+ "state inconsistency probably due to duplicate use of b-tree node block address %llu (ino=%lu)",
+ (unsigned long long)blocknr, inode->i_ino);
+ goto failed;
}
memset(bh->b_data, 0, i_blocksize(inode));
bh->b_bdev = inode->i_sb->s_bdev;
@@ -67,6 +76,12 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
folio_unlock(bh->b_folio);
folio_put(bh->b_folio);
return bh;
+
+failed:
+ folio_unlock(bh->b_folio);
+ folio_put(bh->b_folio);
+ brelse(bh);
+ return ERR_PTR(-EIO);
}
int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
@@ -217,8 +232,8 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
}
nbh = nilfs_btnode_create_block(btnc, newkey);
- if (!nbh)
- return -ENOMEM;
+ if (IS_ERR(nbh))
+ return PTR_ERR(nbh);
BUG_ON(nbh == obh);
ctxt->newbh = nbh;
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index a139970e4804..862bdf23120e 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -63,8 +63,8 @@ static int nilfs_btree_get_new_block(const struct nilfs_bmap *btree,
struct buffer_head *bh;
bh = nilfs_btnode_create_block(btnc, ptr);
- if (!bh)
- return -ENOMEM;
+ if (IS_ERR(bh))
+ return PTR_ERR(bh);
set_buffer_nilfs_volatile(bh);
*bhp = bh;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4811f7af6090e8f5a398fbdd766f903ef6c0d787
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072928-thermal-reactor-e976@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
4811f7af6090 ("nilfs2: handle inconsistent state in nilfs_btnode_create_block()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4811f7af6090e8f5a398fbdd766f903ef6c0d787 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Thu, 25 Jul 2024 14:20:07 +0900
Subject: [PATCH] nilfs2: handle inconsistent state in
nilfs_btnode_create_block()
Syzbot reported that a buffer state inconsistency was detected in
nilfs_btnode_create_block(), triggering a kernel bug.
It is not appropriate to treat this inconsistency as a bug; it can occur
if the argument block address (the buffer index of the newly created
block) is a virtual block number and has been reallocated due to
corruption of the bitmap used to manage its allocation state.
So, modify nilfs_btnode_create_block() and its callers to treat it as a
possible filesystem error, rather than triggering a kernel bug.
Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com
Fixes: a60be987d45d ("nilfs2: B-tree node cache")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+89cc4f2324ed37988b60(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 0131d83b912d..c034080c334b 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -51,12 +51,21 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node));
if (unlikely(!bh))
- return NULL;
+ return ERR_PTR(-ENOMEM);
if (unlikely(buffer_mapped(bh) || buffer_uptodate(bh) ||
buffer_dirty(bh))) {
- brelse(bh);
- BUG();
+ /*
+ * The block buffer at the specified new address was already
+ * in use. This can happen if it is a virtual block number
+ * and has been reallocated due to corruption of the bitmap
+ * used to manage its allocation state (if not, the buffer
+ * clearing of an abandoned b-tree node is missing somewhere).
+ */
+ nilfs_error(inode->i_sb,
+ "state inconsistency probably due to duplicate use of b-tree node block address %llu (ino=%lu)",
+ (unsigned long long)blocknr, inode->i_ino);
+ goto failed;
}
memset(bh->b_data, 0, i_blocksize(inode));
bh->b_bdev = inode->i_sb->s_bdev;
@@ -67,6 +76,12 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
folio_unlock(bh->b_folio);
folio_put(bh->b_folio);
return bh;
+
+failed:
+ folio_unlock(bh->b_folio);
+ folio_put(bh->b_folio);
+ brelse(bh);
+ return ERR_PTR(-EIO);
}
int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
@@ -217,8 +232,8 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
}
nbh = nilfs_btnode_create_block(btnc, newkey);
- if (!nbh)
- return -ENOMEM;
+ if (IS_ERR(nbh))
+ return PTR_ERR(nbh);
BUG_ON(nbh == obh);
ctxt->newbh = nbh;
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index a139970e4804..862bdf23120e 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -63,8 +63,8 @@ static int nilfs_btree_get_new_block(const struct nilfs_bmap *btree,
struct buffer_head *bh;
bh = nilfs_btnode_create_block(btnc, ptr);
- if (!bh)
- return -ENOMEM;
+ if (IS_ERR(bh))
+ return PTR_ERR(bh);
set_buffer_nilfs_volatile(bh);
*bhp = bh;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4811f7af6090e8f5a398fbdd766f903ef6c0d787
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072927-oozy-arousal-a5a8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4811f7af6090 ("nilfs2: handle inconsistent state in nilfs_btnode_create_block()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4811f7af6090e8f5a398fbdd766f903ef6c0d787 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Thu, 25 Jul 2024 14:20:07 +0900
Subject: [PATCH] nilfs2: handle inconsistent state in
nilfs_btnode_create_block()
Syzbot reported that a buffer state inconsistency was detected in
nilfs_btnode_create_block(), triggering a kernel bug.
It is not appropriate to treat this inconsistency as a bug; it can occur
if the argument block address (the buffer index of the newly created
block) is a virtual block number and has been reallocated due to
corruption of the bitmap used to manage its allocation state.
So, modify nilfs_btnode_create_block() and its callers to treat it as a
possible filesystem error, rather than triggering a kernel bug.
Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com
Fixes: a60be987d45d ("nilfs2: B-tree node cache")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+89cc4f2324ed37988b60(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 0131d83b912d..c034080c334b 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -51,12 +51,21 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node));
if (unlikely(!bh))
- return NULL;
+ return ERR_PTR(-ENOMEM);
if (unlikely(buffer_mapped(bh) || buffer_uptodate(bh) ||
buffer_dirty(bh))) {
- brelse(bh);
- BUG();
+ /*
+ * The block buffer at the specified new address was already
+ * in use. This can happen if it is a virtual block number
+ * and has been reallocated due to corruption of the bitmap
+ * used to manage its allocation state (if not, the buffer
+ * clearing of an abandoned b-tree node is missing somewhere).
+ */
+ nilfs_error(inode->i_sb,
+ "state inconsistency probably due to duplicate use of b-tree node block address %llu (ino=%lu)",
+ (unsigned long long)blocknr, inode->i_ino);
+ goto failed;
}
memset(bh->b_data, 0, i_blocksize(inode));
bh->b_bdev = inode->i_sb->s_bdev;
@@ -67,6 +76,12 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
folio_unlock(bh->b_folio);
folio_put(bh->b_folio);
return bh;
+
+failed:
+ folio_unlock(bh->b_folio);
+ folio_put(bh->b_folio);
+ brelse(bh);
+ return ERR_PTR(-EIO);
}
int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
@@ -217,8 +232,8 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
}
nbh = nilfs_btnode_create_block(btnc, newkey);
- if (!nbh)
- return -ENOMEM;
+ if (IS_ERR(nbh))
+ return PTR_ERR(nbh);
BUG_ON(nbh == obh);
ctxt->newbh = nbh;
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index a139970e4804..862bdf23120e 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -63,8 +63,8 @@ static int nilfs_btree_get_new_block(const struct nilfs_bmap *btree,
struct buffer_head *bh;
bh = nilfs_btnode_create_block(btnc, ptr);
- if (!bh)
- return -ENOMEM;
+ if (IS_ERR(bh))
+ return PTR_ERR(bh);
set_buffer_nilfs_volatile(bh);
*bhp = bh;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4811f7af6090e8f5a398fbdd766f903ef6c0d787
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072926-epiphany-basis-4f11@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
4811f7af6090 ("nilfs2: handle inconsistent state in nilfs_btnode_create_block()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4811f7af6090e8f5a398fbdd766f903ef6c0d787 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Thu, 25 Jul 2024 14:20:07 +0900
Subject: [PATCH] nilfs2: handle inconsistent state in
nilfs_btnode_create_block()
Syzbot reported that a buffer state inconsistency was detected in
nilfs_btnode_create_block(), triggering a kernel bug.
It is not appropriate to treat this inconsistency as a bug; it can occur
if the argument block address (the buffer index of the newly created
block) is a virtual block number and has been reallocated due to
corruption of the bitmap used to manage its allocation state.
So, modify nilfs_btnode_create_block() and its callers to treat it as a
possible filesystem error, rather than triggering a kernel bug.
Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com
Fixes: a60be987d45d ("nilfs2: B-tree node cache")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+89cc4f2324ed37988b60(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/btnode.c b/fs/nilfs2/btnode.c
index 0131d83b912d..c034080c334b 100644
--- a/fs/nilfs2/btnode.c
+++ b/fs/nilfs2/btnode.c
@@ -51,12 +51,21 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
bh = nilfs_grab_buffer(inode, btnc, blocknr, BIT(BH_NILFS_Node));
if (unlikely(!bh))
- return NULL;
+ return ERR_PTR(-ENOMEM);
if (unlikely(buffer_mapped(bh) || buffer_uptodate(bh) ||
buffer_dirty(bh))) {
- brelse(bh);
- BUG();
+ /*
+ * The block buffer at the specified new address was already
+ * in use. This can happen if it is a virtual block number
+ * and has been reallocated due to corruption of the bitmap
+ * used to manage its allocation state (if not, the buffer
+ * clearing of an abandoned b-tree node is missing somewhere).
+ */
+ nilfs_error(inode->i_sb,
+ "state inconsistency probably due to duplicate use of b-tree node block address %llu (ino=%lu)",
+ (unsigned long long)blocknr, inode->i_ino);
+ goto failed;
}
memset(bh->b_data, 0, i_blocksize(inode));
bh->b_bdev = inode->i_sb->s_bdev;
@@ -67,6 +76,12 @@ nilfs_btnode_create_block(struct address_space *btnc, __u64 blocknr)
folio_unlock(bh->b_folio);
folio_put(bh->b_folio);
return bh;
+
+failed:
+ folio_unlock(bh->b_folio);
+ folio_put(bh->b_folio);
+ brelse(bh);
+ return ERR_PTR(-EIO);
}
int nilfs_btnode_submit_block(struct address_space *btnc, __u64 blocknr,
@@ -217,8 +232,8 @@ int nilfs_btnode_prepare_change_key(struct address_space *btnc,
}
nbh = nilfs_btnode_create_block(btnc, newkey);
- if (!nbh)
- return -ENOMEM;
+ if (IS_ERR(nbh))
+ return PTR_ERR(nbh);
BUG_ON(nbh == obh);
ctxt->newbh = nbh;
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index a139970e4804..862bdf23120e 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -63,8 +63,8 @@ static int nilfs_btree_get_new_block(const struct nilfs_bmap *btree,
struct buffer_head *bh;
bh = nilfs_btnode_create_block(btnc, ptr);
- if (!bh)
- return -ENOMEM;
+ if (IS_ERR(bh))
+ return PTR_ERR(bh);
set_buffer_nilfs_volatile(bh);
*bhp = bh;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 8ddad558997002ce67980e30c9e8dfaa696e163b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072928-engaged-steerable-531f@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
8ddad5589970 ("dmaengine: fsl-edma: change the memory access from local into remote mode in i.MX 8QM")
77584368a0f3 ("dmaengine: fsl-edma: clean up unused "fsl,imx8qm-adma" compatible string")
d8d4355861d8 ("dmaengine: fsl-edma: add i.MX8ULP edma support")
4ee632c82d2d ("dmaengine: fsl-edma: fix DMA channel leak in eDMAv4")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8ddad558997002ce67980e30c9e8dfaa696e163b Mon Sep 17 00:00:00 2001
From: Joy Zou <joy.zou(a)nxp.com>
Date: Fri, 10 May 2024 11:09:34 +0800
Subject: [PATCH] dmaengine: fsl-edma: change the memory access from local into
remote mode in i.MX 8QM
Fix the issue where MEM_TO_MEM fail on i.MX8QM due to the requirement
that both source and destination addresses need pass through the IOMMU.
Typically, peripheral FIFO addresses bypass the IOMMU, necessitating
only one of the source or destination to go through it.
Set "is_remote" to true to ensure both source and destination
addresses pass through the IOMMU.
iMX8 Spec define "Local" and "Remote" bus as below.
Local bus: bypass IOMMU to directly access other peripheral register,
such as FIFO.
Remote bus: go through IOMMU to access system memory.
The test fail log as follow:
[ 66.268506] dmatest: dma0chan0-copy0: result #1: 'test timed out' with src_off=0x100 dst_off=0x80 len=0x3ec0 (0)
[ 66.278785] dmatest: dma0chan0-copy0: summary 1 tests, 1 failures 0.32 iops 4 KB/s (0)
Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support")
Signed-off-by: Joy Zou <joy.zou(a)nxp.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Link: https://lore.kernel.org/r/20240510030959.703663-1-joy.zou@nxp.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
diff --git a/drivers/dma/fsl-edma-common.c b/drivers/dma/fsl-edma-common.c
index 9d565ab85502..b7f15ab96855 100644
--- a/drivers/dma/fsl-edma-common.c
+++ b/drivers/dma/fsl-edma-common.c
@@ -755,6 +755,8 @@ struct dma_async_tx_descriptor *fsl_edma_prep_memcpy(struct dma_chan *chan,
fsl_desc->iscyclic = false;
fsl_chan->is_sw = true;
+ if (fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_MEM_REMOTE)
+ fsl_chan->is_remote = true;
/* To match with copy_align and max_seg_size so 1 tcd is enough */
fsl_edma_fill_tcd(fsl_chan, fsl_desc->tcd[0].vtcd, dma_src, dma_dst,
@@ -848,6 +850,7 @@ void fsl_edma_free_chan_resources(struct dma_chan *chan)
fsl_chan->tcd_pool = NULL;
fsl_chan->is_sw = false;
fsl_chan->srcid = 0;
+ fsl_chan->is_remote = false;
if (fsl_edma_drvflags(fsl_chan) & FSL_EDMA_DRV_HAS_CHCLK)
clk_disable_unprepare(fsl_chan->clk);
}
diff --git a/drivers/dma/fsl-edma-common.h b/drivers/dma/fsl-edma-common.h
index 1c90b95f4ff8..ce37e1ee9c46 100644
--- a/drivers/dma/fsl-edma-common.h
+++ b/drivers/dma/fsl-edma-common.h
@@ -194,6 +194,7 @@ struct fsl_edma_desc {
#define FSL_EDMA_DRV_HAS_PD BIT(5)
#define FSL_EDMA_DRV_HAS_CHCLK BIT(6)
#define FSL_EDMA_DRV_HAS_CHMUX BIT(7)
+#define FSL_EDMA_DRV_MEM_REMOTE BIT(8)
/* control and status register is in tcd address space, edma3 reg layout */
#define FSL_EDMA_DRV_SPLIT_REG BIT(9)
#define FSL_EDMA_DRV_BUS_8BYTE BIT(10)
diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c
index ec4f5baafad5..c66185c5a199 100644
--- a/drivers/dma/fsl-edma-main.c
+++ b/drivers/dma/fsl-edma-main.c
@@ -343,7 +343,7 @@ static struct fsl_edma_drvdata imx7ulp_data = {
};
static struct fsl_edma_drvdata imx8qm_data = {
- .flags = FSL_EDMA_DRV_HAS_PD | FSL_EDMA_DRV_EDMA3,
+ .flags = FSL_EDMA_DRV_HAS_PD | FSL_EDMA_DRV_EDMA3 | FSL_EDMA_DRV_MEM_REMOTE,
.chreg_space_sz = 0x10000,
.chreg_off = 0x10000,
.setup_irq = fsl_edma3_irq_init,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8cb1f4080dd91c6e6b01dbea013a3f42341cb6a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072915-exclusive-powdery-eeb5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
8cb1f4080dd9 ("f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid")
21327a042dd9 ("f2fs: fix to avoid use SSR allocate when do defragment")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cb1f4080dd91c6e6b01dbea013a3f42341cb6a1 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk(a)kernel.org>
Date: Tue, 18 Jun 2024 02:15:38 +0000
Subject: [PATCH] f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid
mkdir /mnt/test/comp
f2fs_io setflags compression /mnt/test/comp
dd if=/dev/zero of=/mnt/test/comp/testfile bs=16k count=1
truncate --size 13 /mnt/test/comp/testfile
In the above scenario, we can get a BUG_ON.
kernel BUG at fs/f2fs/segment.c:3589!
Call Trace:
do_write_page+0x78/0x390 [f2fs]
f2fs_outplace_write_data+0x62/0xb0 [f2fs]
f2fs_do_write_data_page+0x275/0x740 [f2fs]
f2fs_write_single_data_page+0x1dc/0x8f0 [f2fs]
f2fs_write_multi_pages+0x1e5/0xae0 [f2fs]
f2fs_write_cache_pages+0xab1/0xc60 [f2fs]
f2fs_write_data_pages+0x2d8/0x330 [f2fs]
do_writepages+0xcf/0x270
__writeback_single_inode+0x44/0x350
writeback_sb_inodes+0x242/0x530
__writeback_inodes_wb+0x54/0xf0
wb_writeback+0x192/0x310
wb_workfn+0x30d/0x400
The reason is we gave CURSEG_ALL_DATA_ATGC to COMPR_ADDR where the
page was set the gcing flag by set_cluster_dirty().
Cc: stable(a)vger.kernel.org
Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration")
Reviewed-by: Chao Yu <chao(a)kernel.org>
Tested-by: Will McVicker <willmcvicker(a)google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 362cfb550408..4db1add43e36 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3505,6 +3505,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
if (fio->sbi->am.atgc_enabled &&
(fio->io_type == FS_DATA_IO) &&
(fio->sbi->gc_mode != GC_URGENT_HIGH) &&
+ __is_valid_data_blkaddr(fio->old_blkaddr) &&
!is_inode_flag_set(inode, FI_OPU_WRITE))
return CURSEG_ALL_DATA_ATGC;
else
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8cb1f4080dd91c6e6b01dbea013a3f42341cb6a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072914-caviar-emphasize-6bc8@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8cb1f4080dd9 ("f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid")
21327a042dd9 ("f2fs: fix to avoid use SSR allocate when do defragment")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cb1f4080dd91c6e6b01dbea013a3f42341cb6a1 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk(a)kernel.org>
Date: Tue, 18 Jun 2024 02:15:38 +0000
Subject: [PATCH] f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid
mkdir /mnt/test/comp
f2fs_io setflags compression /mnt/test/comp
dd if=/dev/zero of=/mnt/test/comp/testfile bs=16k count=1
truncate --size 13 /mnt/test/comp/testfile
In the above scenario, we can get a BUG_ON.
kernel BUG at fs/f2fs/segment.c:3589!
Call Trace:
do_write_page+0x78/0x390 [f2fs]
f2fs_outplace_write_data+0x62/0xb0 [f2fs]
f2fs_do_write_data_page+0x275/0x740 [f2fs]
f2fs_write_single_data_page+0x1dc/0x8f0 [f2fs]
f2fs_write_multi_pages+0x1e5/0xae0 [f2fs]
f2fs_write_cache_pages+0xab1/0xc60 [f2fs]
f2fs_write_data_pages+0x2d8/0x330 [f2fs]
do_writepages+0xcf/0x270
__writeback_single_inode+0x44/0x350
writeback_sb_inodes+0x242/0x530
__writeback_inodes_wb+0x54/0xf0
wb_writeback+0x192/0x310
wb_workfn+0x30d/0x400
The reason is we gave CURSEG_ALL_DATA_ATGC to COMPR_ADDR where the
page was set the gcing flag by set_cluster_dirty().
Cc: stable(a)vger.kernel.org
Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration")
Reviewed-by: Chao Yu <chao(a)kernel.org>
Tested-by: Will McVicker <willmcvicker(a)google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 362cfb550408..4db1add43e36 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3505,6 +3505,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
if (fio->sbi->am.atgc_enabled &&
(fio->io_type == FS_DATA_IO) &&
(fio->sbi->gc_mode != GC_URGENT_HIGH) &&
+ __is_valid_data_blkaddr(fio->old_blkaddr) &&
!is_inode_flag_set(inode, FI_OPU_WRITE))
return CURSEG_ALL_DATA_ATGC;
else
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 8cb1f4080dd91c6e6b01dbea013a3f42341cb6a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072913-acronym-duty-c413@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
8cb1f4080dd9 ("f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid")
21327a042dd9 ("f2fs: fix to avoid use SSR allocate when do defragment")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cb1f4080dd91c6e6b01dbea013a3f42341cb6a1 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk(a)kernel.org>
Date: Tue, 18 Jun 2024 02:15:38 +0000
Subject: [PATCH] f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid
mkdir /mnt/test/comp
f2fs_io setflags compression /mnt/test/comp
dd if=/dev/zero of=/mnt/test/comp/testfile bs=16k count=1
truncate --size 13 /mnt/test/comp/testfile
In the above scenario, we can get a BUG_ON.
kernel BUG at fs/f2fs/segment.c:3589!
Call Trace:
do_write_page+0x78/0x390 [f2fs]
f2fs_outplace_write_data+0x62/0xb0 [f2fs]
f2fs_do_write_data_page+0x275/0x740 [f2fs]
f2fs_write_single_data_page+0x1dc/0x8f0 [f2fs]
f2fs_write_multi_pages+0x1e5/0xae0 [f2fs]
f2fs_write_cache_pages+0xab1/0xc60 [f2fs]
f2fs_write_data_pages+0x2d8/0x330 [f2fs]
do_writepages+0xcf/0x270
__writeback_single_inode+0x44/0x350
writeback_sb_inodes+0x242/0x530
__writeback_inodes_wb+0x54/0xf0
wb_writeback+0x192/0x310
wb_workfn+0x30d/0x400
The reason is we gave CURSEG_ALL_DATA_ATGC to COMPR_ADDR where the
page was set the gcing flag by set_cluster_dirty().
Cc: stable(a)vger.kernel.org
Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration")
Reviewed-by: Chao Yu <chao(a)kernel.org>
Tested-by: Will McVicker <willmcvicker(a)google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 362cfb550408..4db1add43e36 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3505,6 +3505,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
if (fio->sbi->am.atgc_enabled &&
(fio->io_type == FS_DATA_IO) &&
(fio->sbi->gc_mode != GC_URGENT_HIGH) &&
+ __is_valid_data_blkaddr(fio->old_blkaddr) &&
!is_inode_flag_set(inode, FI_OPU_WRITE))
return CURSEG_ALL_DATA_ATGC;
else
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8cb1f4080dd91c6e6b01dbea013a3f42341cb6a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072912-upstream-tropics-2c8a@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
8cb1f4080dd9 ("f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid")
21327a042dd9 ("f2fs: fix to avoid use SSR allocate when do defragment")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8cb1f4080dd91c6e6b01dbea013a3f42341cb6a1 Mon Sep 17 00:00:00 2001
From: Jaegeuk Kim <jaegeuk(a)kernel.org>
Date: Tue, 18 Jun 2024 02:15:38 +0000
Subject: [PATCH] f2fs: assign CURSEG_ALL_DATA_ATGC if blkaddr is valid
mkdir /mnt/test/comp
f2fs_io setflags compression /mnt/test/comp
dd if=/dev/zero of=/mnt/test/comp/testfile bs=16k count=1
truncate --size 13 /mnt/test/comp/testfile
In the above scenario, we can get a BUG_ON.
kernel BUG at fs/f2fs/segment.c:3589!
Call Trace:
do_write_page+0x78/0x390 [f2fs]
f2fs_outplace_write_data+0x62/0xb0 [f2fs]
f2fs_do_write_data_page+0x275/0x740 [f2fs]
f2fs_write_single_data_page+0x1dc/0x8f0 [f2fs]
f2fs_write_multi_pages+0x1e5/0xae0 [f2fs]
f2fs_write_cache_pages+0xab1/0xc60 [f2fs]
f2fs_write_data_pages+0x2d8/0x330 [f2fs]
do_writepages+0xcf/0x270
__writeback_single_inode+0x44/0x350
writeback_sb_inodes+0x242/0x530
__writeback_inodes_wb+0x54/0xf0
wb_writeback+0x192/0x310
wb_workfn+0x30d/0x400
The reason is we gave CURSEG_ALL_DATA_ATGC to COMPR_ADDR where the
page was set the gcing flag by set_cluster_dirty().
Cc: stable(a)vger.kernel.org
Fixes: 4961acdd65c9 ("f2fs: fix to tag gcing flag on page during block migration")
Reviewed-by: Chao Yu <chao(a)kernel.org>
Tested-by: Will McVicker <willmcvicker(a)google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 362cfb550408..4db1add43e36 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3505,6 +3505,7 @@ static int __get_segment_type_6(struct f2fs_io_info *fio)
if (fio->sbi->am.atgc_enabled &&
(fio->io_type == FS_DATA_IO) &&
(fio->sbi->gc_mode != GC_URGENT_HIGH) &&
+ __is_valid_data_blkaddr(fio->old_blkaddr) &&
!is_inode_flag_set(inode, FI_OPU_WRITE))
return CURSEG_ALL_DATA_ATGC;
else
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f18d0076933689775fe7faeeb10ee93ff01be6ab
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072902-overnight-carried-32ca@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f18d00769336 ("f2fs: use meta inode for GC of COW file")
b40a2b003709 ("f2fs: use meta inode for GC of atomic file")
9f0c4a46be1f ("f2fs: fix to truncate meta inode pages forcely")
fd244524c2cf ("f2fs: compress: fix to cover normal cluster write with cp_rwsem")
55fdc1c24a1d ("f2fs: fix to wait on block writeback for post_read case")
4e4f1eb9949b ("f2fs: introduce f2fs_invalidate_internal_cache() for cleanup")
a46bebd502fe ("f2fs: synchronize atomic write aborts")
2eae077e6e46 ("f2fs: reduce stack memory cost by using bitfield in struct f2fs_io_info")
ae267fc1cfe9 ("f2fs: fix to abort atomic write only during do_exist()")
e7547daccd6a ("f2fs: refactor extent_cache to support for read and more")
749d543c0d45 ("f2fs: remove unnecessary __init_extent_tree")
3bac20a8f011 ("f2fs: move internal functions into extent_cache.c")
12607c1ba763 ("f2fs: specify extent cache for read explicitly")
ed8ac22b6b75 ("f2fs: introduce f2fs_is_readonly() for readability")
41e8f85a75fc ("f2fs: introduce F2FS_IOC_START_ATOMIC_REPLACE")
967eaad1fed5 ("f2fs: fix to set flush_merge opt and show noflush_merge")
4d8d45df2252 ("f2fs: correct i_size change for atomic writes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f18d0076933689775fe7faeeb10ee93ff01be6ab Mon Sep 17 00:00:00 2001
From: Sunmin Jeong <s_min.jeong(a)samsung.com>
Date: Wed, 10 Jul 2024 20:51:43 +0900
Subject: [PATCH] f2fs: use meta inode for GC of COW file
In case of the COW file, new updates and GC writes are already
separated to page caches of the atomic file and COW file. As some cases
that use the meta inode for GC, there are some race issues between a
foreground thread and GC thread.
To handle them, we need to take care when to invalidate and wait
writeback of GC pages in COW files as the case of using the meta inode.
Also, a pointer from the COW inode to the original inode is required to
check the state of original pages.
For the former, we can solve the problem by using the meta inode for GC
of COW files. Then let's get a page from the original inode in
move_data_block when GCing the COW file to avoid race condition.
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 1e4937af50f3..6457e5bca9c9 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2608,7 +2608,7 @@ bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
return true;
if (IS_NOQUOTA(inode))
return true;
- if (f2fs_is_atomic_file(inode))
+ if (f2fs_used_in_atomic_write(inode))
return true;
/* rewrite low ratio compress data w/ OPU mode to avoid fragmentation */
if (f2fs_compressed_file(inode) &&
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 796ae11c0fa3..4a8621e4a33a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -843,7 +843,11 @@ struct f2fs_inode_info {
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
- struct inode *cow_inode; /* copy-on-write inode for atomic write */
+ union {
+ struct inode *cow_inode; /* copy-on-write inode for atomic write */
+ struct inode *atomic_inode;
+ /* point to atomic_inode, available only for cow_inode */
+ };
/* avoid racing between foreground op and gc */
struct f2fs_rwsem i_gc_rwsem[2];
@@ -4263,9 +4267,14 @@ static inline bool f2fs_post_read_required(struct inode *inode)
f2fs_compressed_file(inode);
}
+static inline bool f2fs_used_in_atomic_write(struct inode *inode)
+{
+ return f2fs_is_atomic_file(inode) || f2fs_is_cow_file(inode);
+}
+
static inline bool f2fs_meta_inode_gc_required(struct inode *inode)
{
- return f2fs_post_read_required(inode) || f2fs_is_atomic_file(inode);
+ return f2fs_post_read_required(inode) || f2fs_used_in_atomic_write(inode);
}
/*
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index e4a7cff00796..547e7ec32b1f 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2183,6 +2183,9 @@ static int f2fs_ioc_start_atomic_write(struct file *filp, bool truncate)
set_inode_flag(fi->cow_inode, FI_COW_FILE);
clear_inode_flag(fi->cow_inode, FI_INLINE_DATA);
+
+ /* Set the COW inode's atomic_inode to the atomic inode */
+ F2FS_I(fi->cow_inode)->atomic_inode = inode;
} else {
/* Reuse the already created COW inode */
ret = f2fs_do_truncate_blocks(fi->cow_inode, 0, true);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index cb3006551ab5..724bbcb447d3 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1171,7 +1171,8 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
static int ra_data_block(struct inode *inode, pgoff_t index)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct address_space *mapping = inode->i_mapping;
+ struct address_space *mapping = f2fs_is_cow_file(inode) ?
+ F2FS_I(inode)->atomic_inode->i_mapping : inode->i_mapping;
struct dnode_of_data dn;
struct page *page;
struct f2fs_io_info fio = {
@@ -1260,6 +1261,8 @@ static int ra_data_block(struct inode *inode, pgoff_t index)
static int move_data_block(struct inode *inode, block_t bidx,
int gc_type, unsigned int segno, int off)
{
+ struct address_space *mapping = f2fs_is_cow_file(inode) ?
+ F2FS_I(inode)->atomic_inode->i_mapping : inode->i_mapping;
struct f2fs_io_info fio = {
.sbi = F2FS_I_SB(inode),
.ino = inode->i_ino,
@@ -1282,7 +1285,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
/* do not read out */
- page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
+ page = f2fs_grab_cache_page(mapping, bidx, false);
if (!page)
return -ENOMEM;
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 1fba5728be70..cca7d448e55c 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -16,7 +16,7 @@
static bool support_inline_data(struct inode *inode)
{
- if (f2fs_is_atomic_file(inode))
+ if (f2fs_used_in_atomic_write(inode))
return false;
if (!S_ISREG(inode->i_mode) && !S_ISLNK(inode->i_mode))
return false;
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7a3e2458b2d9..18dea43e694b 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,8 +804,9 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_abort_atomic_write(inode, true);
- if (fi->cow_inode) {
+ if (fi->cow_inode && f2fs_is_cow_file(fi->cow_inode)) {
clear_inode_flag(fi->cow_inode, FI_COW_FILE);
+ F2FS_I(fi->cow_inode)->atomic_inode = NULL;
iput(fi->cow_inode);
fi->cow_inode = NULL;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b40a2b00370931b0c50148681dd7364573e52e6b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072946-footpad-posh-e008@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
b40a2b003709 ("f2fs: use meta inode for GC of atomic file")
9f0c4a46be1f ("f2fs: fix to truncate meta inode pages forcely")
fd244524c2cf ("f2fs: compress: fix to cover normal cluster write with cp_rwsem")
55fdc1c24a1d ("f2fs: fix to wait on block writeback for post_read case")
4e4f1eb9949b ("f2fs: introduce f2fs_invalidate_internal_cache() for cleanup")
2eae077e6e46 ("f2fs: reduce stack memory cost by using bitfield in struct f2fs_io_info")
ed8ac22b6b75 ("f2fs: introduce f2fs_is_readonly() for readability")
967eaad1fed5 ("f2fs: fix to set flush_merge opt and show noflush_merge")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b40a2b00370931b0c50148681dd7364573e52e6b Mon Sep 17 00:00:00 2001
From: Sunmin Jeong <s_min.jeong(a)samsung.com>
Date: Wed, 10 Jul 2024 20:51:17 +0900
Subject: [PATCH] f2fs: use meta inode for GC of atomic file
The page cache of the atomic file keeps new data pages which will be
stored in the COW file. It can also keep old data pages when GCing the
atomic file. In this case, new data can be overwritten by old data if a
GC thread sets the old data page as dirty after new data page was
evicted.
Also, since all writes to the atomic file are redirected to COW inodes,
GC for the atomic file is not working well as below.
f2fs_gc(gc_type=FG_GC)
- select A as a victim segment
do_garbage_collect
- iget atomic file's inode for block B
move_data_page
f2fs_do_write_data_page
- use dn of cow inode
- set fio->old_blkaddr from cow inode
- seg_freed is 0 since block B is still valid
- goto gc_more and A is selected as victim again
To solve the problem, let's separate GC writes and updates in the atomic
file by using the meta inode for GC writes.
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 1aa7eefa659c..1e4937af50f3 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2695,7 +2695,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
}
/* wait for GCed page writeback via META_MAPPING */
- if (fio->post_read)
+ if (fio->meta_gc)
f2fs_wait_on_block_writeback(inode, fio->old_blkaddr);
/*
@@ -2790,7 +2790,7 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
.submitted = 0,
.compr_blocks = compr_blocks,
.need_lock = compr_blocks ? LOCK_DONE : LOCK_RETRY,
- .post_read = f2fs_post_read_required(inode) ? 1 : 0,
+ .meta_gc = f2fs_meta_inode_gc_required(inode) ? 1 : 0,
.io_type = io_type,
.io_wbc = wbc,
.bio = bio,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index f7ee6c5e371e..796ae11c0fa3 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1211,7 +1211,7 @@ struct f2fs_io_info {
unsigned int in_list:1; /* indicate fio is in io_list */
unsigned int is_por:1; /* indicate IO is from recovery or not */
unsigned int encrypted:1; /* indicate file is encrypted */
- unsigned int post_read:1; /* require post read */
+ unsigned int meta_gc:1; /* require meta inode GC */
enum iostat_type io_type; /* io type */
struct writeback_control *io_wbc; /* writeback control */
struct bio **bio; /* bio for ipu */
@@ -4263,6 +4263,11 @@ static inline bool f2fs_post_read_required(struct inode *inode)
f2fs_compressed_file(inode);
}
+static inline bool f2fs_meta_inode_gc_required(struct inode *inode)
+{
+ return f2fs_post_read_required(inode) || f2fs_is_atomic_file(inode);
+}
+
/*
* compress.c
*/
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index ef667fec9a12..cb3006551ab5 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1589,7 +1589,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
start_bidx = f2fs_start_bidx_of_node(nofs, inode) +
ofs_in_node;
- if (f2fs_post_read_required(inode)) {
+ if (f2fs_meta_inode_gc_required(inode)) {
int err = ra_data_block(inode, start_bidx);
f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
@@ -1640,7 +1640,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
start_bidx = f2fs_start_bidx_of_node(nofs, inode)
+ ofs_in_node;
- if (f2fs_post_read_required(inode))
+ if (f2fs_meta_inode_gc_required(inode))
err = move_data_block(inode, start_bidx,
gc_type, segno, off);
else
@@ -1648,7 +1648,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
segno, off);
if (!err && (gc_type == FG_GC ||
- f2fs_post_read_required(inode)))
+ f2fs_meta_inode_gc_required(inode)))
submitted++;
if (locked) {
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 64b11d88a21c..78c3198a6308 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3859,7 +3859,7 @@ int f2fs_inplace_write_data(struct f2fs_io_info *fio)
goto drop_bio;
}
- if (fio->post_read)
+ if (fio->meta_gc)
f2fs_truncate_meta_inode_pages(sbi, fio->new_blkaddr, 1);
stat_inc_inplace_blocks(fio->sbi);
@@ -4029,7 +4029,7 @@ void f2fs_wait_on_block_writeback(struct inode *inode, block_t blkaddr)
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct page *cpage;
- if (!f2fs_post_read_required(inode))
+ if (!f2fs_meta_inode_gc_required(inode))
return;
if (!__is_valid_data_blkaddr(blkaddr))
@@ -4048,7 +4048,7 @@ void f2fs_wait_on_block_writeback_range(struct inode *inode, block_t blkaddr,
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
block_t i;
- if (!f2fs_post_read_required(inode))
+ if (!f2fs_meta_inode_gc_required(inode))
return;
for (i = 0; i < len; i++)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 49ae997f8f0d5e268bbd271c5fd66166ce8287fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072938-tackling-banking-b3a2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
49ae997f8f0d ("nilfs2: add missing check for inode numbers on directory entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ae997f8f0d5e268bbd271c5fd66166ce8287fe Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:34 +0900
Subject: [PATCH] nilfs2: add missing check for inode numbers on directory
entries
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d79afb004be235636ee8(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack(a)suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 52e50b1b7f22..dddfa604491a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto Enamelen;
if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
goto Espan;
+ if (unlikely(p->inode &&
+ NILFS_PRIVATE_INODE(le64_to_cpu(p->inode))))
+ goto Einumber;
}
if (offs != limit)
goto Eend;
@@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto bad_entry;
Espan:
error = "directory entry across blocks";
+ goto bad_entry;
+Einumber:
+ error = "disallowed inode number";
bad_entry:
nilfs_error(sb,
"bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d",
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 7e39e277c77f..4017f7856440 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -121,6 +121,11 @@ enum {
((ino) >= NILFS_FIRST_INO(sb) || \
((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
+#define NILFS_PRIVATE_INODE(ino) ({ \
+ ino_t __ino = (ino); \
+ ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \
+ (__ino) != NILFS_SKETCH_INO); })
+
/**
* struct nilfs_transaction_info: context information for synchronization
* @ti_magic: Magic number
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072937-blazer-sulfur-cc24@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
1ab8425091db ("nilfs2: fix inode number range checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:33 +0900
Subject: [PATCH] nilfs2: fix inode number range checks
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 728e90be3570..7e39e277c77f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -116,9 +116,10 @@ enum {
#define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
#define NILFS_MDT_INODE(sb, ino) \
- ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
+ ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino)))
#define NILFS_VALID_INODE(sb, ino) \
- ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
+ ((ino) >= NILFS_FIRST_INO(sb) || \
+ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
/**
* struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index f41d7b6d432c..e44dde57ab65 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
}
nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino);
+ if (nilfs->ns_first_ino < NILFS_USER_INO) {
+ nilfs_err(nilfs->ns_sb,
+ "too small lower limit for non-reserved inode numbers: %u",
+ nilfs->ns_first_ino);
+ return -EINVAL;
+ }
nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 85da0629415d..1e829ed7b0ef 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -182,7 +182,7 @@ struct the_nilfs {
unsigned long ns_nrsvsegs;
unsigned long ns_first_data_block;
int ns_inode_size;
- int ns_first_ino;
+ unsigned int ns_first_ino;
u32 ns_crc_seed;
/* /sys/fs/<nilfs>/<device> */
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 49ae997f8f0d5e268bbd271c5fd66166ce8287fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072937-scandal-destiny-2a6b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
49ae997f8f0d ("nilfs2: add missing check for inode numbers on directory entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ae997f8f0d5e268bbd271c5fd66166ce8287fe Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:34 +0900
Subject: [PATCH] nilfs2: add missing check for inode numbers on directory
entries
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d79afb004be235636ee8(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack(a)suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 52e50b1b7f22..dddfa604491a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto Enamelen;
if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
goto Espan;
+ if (unlikely(p->inode &&
+ NILFS_PRIVATE_INODE(le64_to_cpu(p->inode))))
+ goto Einumber;
}
if (offs != limit)
goto Eend;
@@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto bad_entry;
Espan:
error = "directory entry across blocks";
+ goto bad_entry;
+Einumber:
+ error = "disallowed inode number";
bad_entry:
nilfs_error(sb,
"bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d",
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 7e39e277c77f..4017f7856440 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -121,6 +121,11 @@ enum {
((ino) >= NILFS_FIRST_INO(sb) || \
((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
+#define NILFS_PRIVATE_INODE(ino) ({ \
+ ino_t __ino = (ino); \
+ ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \
+ (__ino) != NILFS_SKETCH_INO); })
+
/**
* struct nilfs_transaction_info: context information for synchronization
* @ti_magic: Magic number
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 49ae997f8f0d5e268bbd271c5fd66166ce8287fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072936-harvest-drown-bd07@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
49ae997f8f0d ("nilfs2: add missing check for inode numbers on directory entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ae997f8f0d5e268bbd271c5fd66166ce8287fe Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:34 +0900
Subject: [PATCH] nilfs2: add missing check for inode numbers on directory
entries
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d79afb004be235636ee8(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack(a)suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 52e50b1b7f22..dddfa604491a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto Enamelen;
if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
goto Espan;
+ if (unlikely(p->inode &&
+ NILFS_PRIVATE_INODE(le64_to_cpu(p->inode))))
+ goto Einumber;
}
if (offs != limit)
goto Eend;
@@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto bad_entry;
Espan:
error = "directory entry across blocks";
+ goto bad_entry;
+Einumber:
+ error = "disallowed inode number";
bad_entry:
nilfs_error(sb,
"bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d",
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 7e39e277c77f..4017f7856440 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -121,6 +121,11 @@ enum {
((ino) >= NILFS_FIRST_INO(sb) || \
((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
+#define NILFS_PRIVATE_INODE(ino) ({ \
+ ino_t __ino = (ino); \
+ ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \
+ (__ino) != NILFS_SKETCH_INO); })
+
/**
* struct nilfs_transaction_info: context information for synchronization
* @ti_magic: Magic number
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 49ae997f8f0d5e268bbd271c5fd66166ce8287fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072935-october-footer-b2ed@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
49ae997f8f0d ("nilfs2: add missing check for inode numbers on directory entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ae997f8f0d5e268bbd271c5fd66166ce8287fe Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:34 +0900
Subject: [PATCH] nilfs2: add missing check for inode numbers on directory
entries
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d79afb004be235636ee8(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack(a)suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 52e50b1b7f22..dddfa604491a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto Enamelen;
if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
goto Espan;
+ if (unlikely(p->inode &&
+ NILFS_PRIVATE_INODE(le64_to_cpu(p->inode))))
+ goto Einumber;
}
if (offs != limit)
goto Eend;
@@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto bad_entry;
Espan:
error = "directory entry across blocks";
+ goto bad_entry;
+Einumber:
+ error = "disallowed inode number";
bad_entry:
nilfs_error(sb,
"bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d",
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 7e39e277c77f..4017f7856440 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -121,6 +121,11 @@ enum {
((ino) >= NILFS_FIRST_INO(sb) || \
((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
+#define NILFS_PRIVATE_INODE(ino) ({ \
+ ino_t __ino = (ino); \
+ ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \
+ (__ino) != NILFS_SKETCH_INO); })
+
/**
* struct nilfs_transaction_info: context information for synchronization
* @ti_magic: Magic number
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 49ae997f8f0d5e268bbd271c5fd66166ce8287fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072934-durable-scuttle-9c45@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
49ae997f8f0d ("nilfs2: add missing check for inode numbers on directory entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ae997f8f0d5e268bbd271c5fd66166ce8287fe Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:34 +0900
Subject: [PATCH] nilfs2: add missing check for inode numbers on directory
entries
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d79afb004be235636ee8(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack(a)suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 52e50b1b7f22..dddfa604491a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto Enamelen;
if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
goto Espan;
+ if (unlikely(p->inode &&
+ NILFS_PRIVATE_INODE(le64_to_cpu(p->inode))))
+ goto Einumber;
}
if (offs != limit)
goto Eend;
@@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto bad_entry;
Espan:
error = "directory entry across blocks";
+ goto bad_entry;
+Einumber:
+ error = "disallowed inode number";
bad_entry:
nilfs_error(sb,
"bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d",
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 7e39e277c77f..4017f7856440 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -121,6 +121,11 @@ enum {
((ino) >= NILFS_FIRST_INO(sb) || \
((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
+#define NILFS_PRIVATE_INODE(ino) ({ \
+ ino_t __ino = (ino); \
+ ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \
+ (__ino) != NILFS_SKETCH_INO); })
+
/**
* struct nilfs_transaction_info: context information for synchronization
* @ti_magic: Magic number
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 49ae997f8f0d5e268bbd271c5fd66166ce8287fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072933-unhappily-flashy-46e1@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
49ae997f8f0d ("nilfs2: add missing check for inode numbers on directory entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ae997f8f0d5e268bbd271c5fd66166ce8287fe Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:34 +0900
Subject: [PATCH] nilfs2: add missing check for inode numbers on directory
entries
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d79afb004be235636ee8(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack(a)suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 52e50b1b7f22..dddfa604491a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto Enamelen;
if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
goto Espan;
+ if (unlikely(p->inode &&
+ NILFS_PRIVATE_INODE(le64_to_cpu(p->inode))))
+ goto Einumber;
}
if (offs != limit)
goto Eend;
@@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto bad_entry;
Espan:
error = "directory entry across blocks";
+ goto bad_entry;
+Einumber:
+ error = "disallowed inode number";
bad_entry:
nilfs_error(sb,
"bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d",
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 7e39e277c77f..4017f7856440 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -121,6 +121,11 @@ enum {
((ino) >= NILFS_FIRST_INO(sb) || \
((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
+#define NILFS_PRIVATE_INODE(ino) ({ \
+ ino_t __ino = (ino); \
+ ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \
+ (__ino) != NILFS_SKETCH_INO); })
+
/**
* struct nilfs_transaction_info: context information for synchronization
* @ti_magic: Magic number
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 49ae997f8f0d5e268bbd271c5fd66166ce8287fe
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072932-sage-amazingly-1d3e@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
49ae997f8f0d ("nilfs2: add missing check for inode numbers on directory entries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ae997f8f0d5e268bbd271c5fd66166ce8287fe Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:34 +0900
Subject: [PATCH] nilfs2: add missing check for inode numbers on directory
entries
Syzbot reported that mounting and unmounting a specific pattern of
corrupted nilfs2 filesystem images causes a use-after-free of metadata
file inodes, which triggers a kernel bug in lru_add_fn().
As Jan Kara pointed out, this is because the link count of a metadata file
gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(),
tries to delete that inode (ifile inode in this case).
The inconsistency occurs because directories containing the inode numbers
of these metadata files that should not be visible in the namespace are
read without checking.
Fix this issue by treating the inode numbers of these internal files as
errors in the sanity check helper when reading directory folios/pages.
Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer
analysis.
Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d79afb004be235636ee8(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8
Reported-by: Jan Kara <jack(a)suse.cz>
Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 52e50b1b7f22..dddfa604491a 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -135,6 +135,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto Enamelen;
if (((offs + rec_len - 1) ^ offs) & ~(chunk_size-1))
goto Espan;
+ if (unlikely(p->inode &&
+ NILFS_PRIVATE_INODE(le64_to_cpu(p->inode))))
+ goto Einumber;
}
if (offs != limit)
goto Eend;
@@ -160,6 +163,9 @@ static bool nilfs_check_folio(struct folio *folio, char *kaddr)
goto bad_entry;
Espan:
error = "directory entry across blocks";
+ goto bad_entry;
+Einumber:
+ error = "disallowed inode number";
bad_entry:
nilfs_error(sb,
"bad entry in directory #%lu: %s - offset=%lu, inode=%lu, rec_len=%zd, name_len=%d",
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 7e39e277c77f..4017f7856440 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -121,6 +121,11 @@ enum {
((ino) >= NILFS_FIRST_INO(sb) || \
((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
+#define NILFS_PRIVATE_INODE(ino) ({ \
+ ino_t __ino = (ino); \
+ ((__ino) < NILFS_USER_INO && (__ino) != NILFS_ROOT_INO && \
+ (__ino) != NILFS_SKETCH_INO); })
+
/**
* struct nilfs_transaction_info: context information for synchronization
* @ti_magic: Magic number
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072930-clover-wool-670c@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
1ab8425091db ("nilfs2: fix inode number range checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:33 +0900
Subject: [PATCH] nilfs2: fix inode number range checks
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 728e90be3570..7e39e277c77f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -116,9 +116,10 @@ enum {
#define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
#define NILFS_MDT_INODE(sb, ino) \
- ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
+ ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino)))
#define NILFS_VALID_INODE(sb, ino) \
- ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
+ ((ino) >= NILFS_FIRST_INO(sb) || \
+ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
/**
* struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index f41d7b6d432c..e44dde57ab65 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
}
nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino);
+ if (nilfs->ns_first_ino < NILFS_USER_INO) {
+ nilfs_err(nilfs->ns_sb,
+ "too small lower limit for non-reserved inode numbers: %u",
+ nilfs->ns_first_ino);
+ return -EINVAL;
+ }
nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 85da0629415d..1e829ed7b0ef 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -182,7 +182,7 @@ struct the_nilfs {
unsigned long ns_nrsvsegs;
unsigned long ns_first_data_block;
int ns_inode_size;
- int ns_first_ino;
+ unsigned int ns_first_ino;
u32 ns_crc_seed;
/* /sys/fs/<nilfs>/<device> */
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072929-dreamy-fable-db92@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
1ab8425091db ("nilfs2: fix inode number range checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:33 +0900
Subject: [PATCH] nilfs2: fix inode number range checks
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 728e90be3570..7e39e277c77f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -116,9 +116,10 @@ enum {
#define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
#define NILFS_MDT_INODE(sb, ino) \
- ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
+ ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino)))
#define NILFS_VALID_INODE(sb, ino) \
- ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
+ ((ino) >= NILFS_FIRST_INO(sb) || \
+ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
/**
* struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index f41d7b6d432c..e44dde57ab65 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
}
nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino);
+ if (nilfs->ns_first_ino < NILFS_USER_INO) {
+ nilfs_err(nilfs->ns_sb,
+ "too small lower limit for non-reserved inode numbers: %u",
+ nilfs->ns_first_ino);
+ return -EINVAL;
+ }
nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 85da0629415d..1e829ed7b0ef 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -182,7 +182,7 @@ struct the_nilfs {
unsigned long ns_nrsvsegs;
unsigned long ns_first_data_block;
int ns_inode_size;
- int ns_first_ino;
+ unsigned int ns_first_ino;
u32 ns_crc_seed;
/* /sys/fs/<nilfs>/<device> */
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072928-childless-tremble-2776@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
1ab8425091db ("nilfs2: fix inode number range checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:33 +0900
Subject: [PATCH] nilfs2: fix inode number range checks
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 728e90be3570..7e39e277c77f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -116,9 +116,10 @@ enum {
#define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
#define NILFS_MDT_INODE(sb, ino) \
- ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
+ ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino)))
#define NILFS_VALID_INODE(sb, ino) \
- ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
+ ((ino) >= NILFS_FIRST_INO(sb) || \
+ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
/**
* struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index f41d7b6d432c..e44dde57ab65 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
}
nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino);
+ if (nilfs->ns_first_ino < NILFS_USER_INO) {
+ nilfs_err(nilfs->ns_sb,
+ "too small lower limit for non-reserved inode numbers: %u",
+ nilfs->ns_first_ino);
+ return -EINVAL;
+ }
nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 85da0629415d..1e829ed7b0ef 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -182,7 +182,7 @@ struct the_nilfs {
unsigned long ns_nrsvsegs;
unsigned long ns_first_data_block;
int ns_inode_size;
- int ns_first_ino;
+ unsigned int ns_first_ino;
u32 ns_crc_seed;
/* /sys/fs/<nilfs>/<device> */
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072926-jawless-untracked-60f3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1ab8425091db ("nilfs2: fix inode number range checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:33 +0900
Subject: [PATCH] nilfs2: fix inode number range checks
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 728e90be3570..7e39e277c77f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -116,9 +116,10 @@ enum {
#define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
#define NILFS_MDT_INODE(sb, ino) \
- ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
+ ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino)))
#define NILFS_VALID_INODE(sb, ino) \
- ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
+ ((ino) >= NILFS_FIRST_INO(sb) || \
+ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
/**
* struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index f41d7b6d432c..e44dde57ab65 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
}
nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino);
+ if (nilfs->ns_first_ino < NILFS_USER_INO) {
+ nilfs_err(nilfs->ns_sb,
+ "too small lower limit for non-reserved inode numbers: %u",
+ nilfs->ns_first_ino);
+ return -EINVAL;
+ }
nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 85da0629415d..1e829ed7b0ef 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -182,7 +182,7 @@ struct the_nilfs {
unsigned long ns_nrsvsegs;
unsigned long ns_first_data_block;
int ns_inode_size;
- int ns_first_ino;
+ unsigned int ns_first_ino;
u32 ns_crc_seed;
/* /sys/fs/<nilfs>/<device> */
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072925-squatter-waffle-122d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
1ab8425091db ("nilfs2: fix inode number range checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:33 +0900
Subject: [PATCH] nilfs2: fix inode number range checks
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 728e90be3570..7e39e277c77f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -116,9 +116,10 @@ enum {
#define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
#define NILFS_MDT_INODE(sb, ino) \
- ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
+ ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino)))
#define NILFS_VALID_INODE(sb, ino) \
- ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
+ ((ino) >= NILFS_FIRST_INO(sb) || \
+ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
/**
* struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index f41d7b6d432c..e44dde57ab65 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
}
nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino);
+ if (nilfs->ns_first_ino < NILFS_USER_INO) {
+ nilfs_err(nilfs->ns_sb,
+ "too small lower limit for non-reserved inode numbers: %u",
+ nilfs->ns_first_ino);
+ return -EINVAL;
+ }
nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 85da0629415d..1e829ed7b0ef 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -182,7 +182,7 @@ struct the_nilfs {
unsigned long ns_nrsvsegs;
unsigned long ns_first_data_block;
int ns_inode_size;
- int ns_first_ino;
+ unsigned int ns_first_ino;
u32 ns_crc_seed;
/* /sys/fs/<nilfs>/<device> */
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072924-daycare-driveway-4395@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
1ab8425091db ("nilfs2: fix inode number range checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1ab8425091dba7ce8bf59e09e183aaca6c2a12ac Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:33 +0900
Subject: [PATCH] nilfs2: fix inode number range checks
Patch series "nilfs2: fix potential issues related to reserved inodes".
This series fixes one use-after-free issue reported by syzbot, caused by
nilfs2's internal inode being exposed in the namespace on a corrupted
filesystem, and a couple of flaws that cause problems if the starting
number of non-reserved inodes written in the on-disk super block is
intentionally (or corruptly) changed from its default value.
This patch (of 3):
In the current implementation of nilfs2, "nilfs->ns_first_ino", which
gives the first non-reserved inode number, is read from the superblock,
but its lower limit is not checked.
As a result, if a number that overlaps with the inode number range of
reserved inodes such as the root directory or metadata files is set in the
super block parameter, the inode number test macros (NILFS_MDT_INODE and
NILFS_VALID_INODE) will not function properly.
In addition, these test macros use left bit-shift calculations using with
the inode number as the shift count via the BIT macro, but the result of a
shift calculation that exceeds the bit width of an integer is undefined in
the C specification, so if "ns_first_ino" is set to a large value other
than the default value NILFS_USER_INO (=11), the macros may potentially
malfunction depending on the environment.
Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and
by preventing bit shifts equal to or greater than the NILFS_USER_INO
constant in the inode number test macros.
Also, change the type of "ns_first_ino" from signed integer to unsigned
integer to avoid the need for type casting in comparisons such as the
lower bound check introduced this time.
Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/nilfs.h b/fs/nilfs2/nilfs.h
index 728e90be3570..7e39e277c77f 100644
--- a/fs/nilfs2/nilfs.h
+++ b/fs/nilfs2/nilfs.h
@@ -116,9 +116,10 @@ enum {
#define NILFS_FIRST_INO(sb) (((struct the_nilfs *)sb->s_fs_info)->ns_first_ino)
#define NILFS_MDT_INODE(sb, ino) \
- ((ino) < NILFS_FIRST_INO(sb) && (NILFS_MDT_INO_BITS & BIT(ino)))
+ ((ino) < NILFS_USER_INO && (NILFS_MDT_INO_BITS & BIT(ino)))
#define NILFS_VALID_INODE(sb, ino) \
- ((ino) >= NILFS_FIRST_INO(sb) || (NILFS_SYS_INO_BITS & BIT(ino)))
+ ((ino) >= NILFS_FIRST_INO(sb) || \
+ ((ino) < NILFS_USER_INO && (NILFS_SYS_INO_BITS & BIT(ino))))
/**
* struct nilfs_transaction_info: context information for synchronization
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index f41d7b6d432c..e44dde57ab65 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -452,6 +452,12 @@ static int nilfs_store_disk_layout(struct the_nilfs *nilfs,
}
nilfs->ns_first_ino = le32_to_cpu(sbp->s_first_ino);
+ if (nilfs->ns_first_ino < NILFS_USER_INO) {
+ nilfs_err(nilfs->ns_sb,
+ "too small lower limit for non-reserved inode numbers: %u",
+ nilfs->ns_first_ino);
+ return -EINVAL;
+ }
nilfs->ns_blocks_per_segment = le32_to_cpu(sbp->s_blocks_per_segment);
if (nilfs->ns_blocks_per_segment < NILFS_SEG_MIN_BLOCKS) {
diff --git a/fs/nilfs2/the_nilfs.h b/fs/nilfs2/the_nilfs.h
index 85da0629415d..1e829ed7b0ef 100644
--- a/fs/nilfs2/the_nilfs.h
+++ b/fs/nilfs2/the_nilfs.h
@@ -182,7 +182,7 @@ struct the_nilfs {
unsigned long ns_nrsvsegs;
unsigned long ns_first_data_block;
int ns_inode_size;
- int ns_first_ino;
+ unsigned int ns_first_ino;
u32 ns_crc_seed;
/* /sys/fs/<nilfs>/<device> */
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x f41e355f8b48d894324a3fdc9727e08b1bce78e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072922-polyester-flatware-fa34@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
f41e355f8b48 ("nilfs2: fix incorrect inode allocation from reserved inodes")
af6eae646851 ("nilfs2: convert persistent object allocator to use kmap_local")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f41e355f8b48d894324a3fdc9727e08b1bce78e2 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:35 +0900
Subject: [PATCH] nilfs2: fix incorrect inode allocation from reserved inodes
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 89caef7513db..ba50388ee4bf 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
* @target: offset number of an entry in the group (start point)
* @bsize: size in bits
* @lock: spin lock protecting @bitmap
+ * @wrap: whether to wrap around
*/
static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
unsigned long target,
unsigned int bsize,
- spinlock_t *lock)
+ spinlock_t *lock, bool wrap)
{
int pos, end = bsize;
@@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
end = target;
}
+ if (!wrap)
+ return -ENOSPC;
/* wrap around */
for (pos = 0; pos < end; pos++) {
@@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
* nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @wrap: whether to wrap around
*/
int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+ struct nilfs_palloc_req *req, bool wrap)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
entries_per_group = nilfs_palloc_entries_per_group(inode);
for (i = 0; i < ngroups; i += n) {
- if (group >= ngroups) {
+ if (group >= ngroups && wrap) {
/* wrap around */
group = 0;
maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr,
@@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap_kaddr = kmap_local_page(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
- bitmap, group_offset, entries_per_group, lock);
+ bitmap, group_offset, entries_per_group, lock,
+ wrap);
+ /*
+ * Since the search for a free slot in the second and
+ * subsequent bitmap blocks always starts from the
+ * beginning, the wrap flag only has an effect on the
+ * first search.
+ */
kunmap_local(bitmap_kaddr);
if (pos >= 0)
goto found;
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index b667e869ac07..d825a9faca6d 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -50,8 +50,8 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};
-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req, bool wrap);
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 180fc8d36213..fc1caf63a42a 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
int ret;
- ret = nilfs_palloc_prepare_alloc_entry(dat, req);
+ ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
if (ret < 0)
return ret;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 612e609158b5..1e86b9303b7c 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;
- req.pr_entry_nr = 0; /*
- * 0 says find free inode from beginning
- * of a group. dull code!!
- */
+ req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb);
req.pr_entry_bh = NULL;
- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false);
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x f41e355f8b48d894324a3fdc9727e08b1bce78e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072921-detective-rink-b08a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
f41e355f8b48 ("nilfs2: fix incorrect inode allocation from reserved inodes")
af6eae646851 ("nilfs2: convert persistent object allocator to use kmap_local")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f41e355f8b48d894324a3fdc9727e08b1bce78e2 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:35 +0900
Subject: [PATCH] nilfs2: fix incorrect inode allocation from reserved inodes
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 89caef7513db..ba50388ee4bf 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
* @target: offset number of an entry in the group (start point)
* @bsize: size in bits
* @lock: spin lock protecting @bitmap
+ * @wrap: whether to wrap around
*/
static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
unsigned long target,
unsigned int bsize,
- spinlock_t *lock)
+ spinlock_t *lock, bool wrap)
{
int pos, end = bsize;
@@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
end = target;
}
+ if (!wrap)
+ return -ENOSPC;
/* wrap around */
for (pos = 0; pos < end; pos++) {
@@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
* nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @wrap: whether to wrap around
*/
int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+ struct nilfs_palloc_req *req, bool wrap)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
entries_per_group = nilfs_palloc_entries_per_group(inode);
for (i = 0; i < ngroups; i += n) {
- if (group >= ngroups) {
+ if (group >= ngroups && wrap) {
/* wrap around */
group = 0;
maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr,
@@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap_kaddr = kmap_local_page(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
- bitmap, group_offset, entries_per_group, lock);
+ bitmap, group_offset, entries_per_group, lock,
+ wrap);
+ /*
+ * Since the search for a free slot in the second and
+ * subsequent bitmap blocks always starts from the
+ * beginning, the wrap flag only has an effect on the
+ * first search.
+ */
kunmap_local(bitmap_kaddr);
if (pos >= 0)
goto found;
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index b667e869ac07..d825a9faca6d 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -50,8 +50,8 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};
-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req, bool wrap);
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 180fc8d36213..fc1caf63a42a 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
int ret;
- ret = nilfs_palloc_prepare_alloc_entry(dat, req);
+ ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
if (ret < 0)
return ret;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 612e609158b5..1e86b9303b7c 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;
- req.pr_entry_nr = 0; /*
- * 0 says find free inode from beginning
- * of a group. dull code!!
- */
+ req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb);
req.pr_entry_bh = NULL;
- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false);
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x f41e355f8b48d894324a3fdc9727e08b1bce78e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072920-limb-census-d305@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
f41e355f8b48 ("nilfs2: fix incorrect inode allocation from reserved inodes")
af6eae646851 ("nilfs2: convert persistent object allocator to use kmap_local")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f41e355f8b48d894324a3fdc9727e08b1bce78e2 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:35 +0900
Subject: [PATCH] nilfs2: fix incorrect inode allocation from reserved inodes
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 89caef7513db..ba50388ee4bf 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
* @target: offset number of an entry in the group (start point)
* @bsize: size in bits
* @lock: spin lock protecting @bitmap
+ * @wrap: whether to wrap around
*/
static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
unsigned long target,
unsigned int bsize,
- spinlock_t *lock)
+ spinlock_t *lock, bool wrap)
{
int pos, end = bsize;
@@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
end = target;
}
+ if (!wrap)
+ return -ENOSPC;
/* wrap around */
for (pos = 0; pos < end; pos++) {
@@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
* nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @wrap: whether to wrap around
*/
int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+ struct nilfs_palloc_req *req, bool wrap)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
entries_per_group = nilfs_palloc_entries_per_group(inode);
for (i = 0; i < ngroups; i += n) {
- if (group >= ngroups) {
+ if (group >= ngroups && wrap) {
/* wrap around */
group = 0;
maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr,
@@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap_kaddr = kmap_local_page(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
- bitmap, group_offset, entries_per_group, lock);
+ bitmap, group_offset, entries_per_group, lock,
+ wrap);
+ /*
+ * Since the search for a free slot in the second and
+ * subsequent bitmap blocks always starts from the
+ * beginning, the wrap flag only has an effect on the
+ * first search.
+ */
kunmap_local(bitmap_kaddr);
if (pos >= 0)
goto found;
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index b667e869ac07..d825a9faca6d 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -50,8 +50,8 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};
-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req, bool wrap);
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 180fc8d36213..fc1caf63a42a 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
int ret;
- ret = nilfs_palloc_prepare_alloc_entry(dat, req);
+ ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
if (ret < 0)
return ret;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 612e609158b5..1e86b9303b7c 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;
- req.pr_entry_nr = 0; /*
- * 0 says find free inode from beginning
- * of a group. dull code!!
- */
+ req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb);
req.pr_entry_bh = NULL;
- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false);
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f41e355f8b48d894324a3fdc9727e08b1bce78e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072919-voyage-outline-047b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
f41e355f8b48 ("nilfs2: fix incorrect inode allocation from reserved inodes")
af6eae646851 ("nilfs2: convert persistent object allocator to use kmap_local")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f41e355f8b48d894324a3fdc9727e08b1bce78e2 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:35 +0900
Subject: [PATCH] nilfs2: fix incorrect inode allocation from reserved inodes
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 89caef7513db..ba50388ee4bf 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
* @target: offset number of an entry in the group (start point)
* @bsize: size in bits
* @lock: spin lock protecting @bitmap
+ * @wrap: whether to wrap around
*/
static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
unsigned long target,
unsigned int bsize,
- spinlock_t *lock)
+ spinlock_t *lock, bool wrap)
{
int pos, end = bsize;
@@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
end = target;
}
+ if (!wrap)
+ return -ENOSPC;
/* wrap around */
for (pos = 0; pos < end; pos++) {
@@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
* nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @wrap: whether to wrap around
*/
int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+ struct nilfs_palloc_req *req, bool wrap)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
entries_per_group = nilfs_palloc_entries_per_group(inode);
for (i = 0; i < ngroups; i += n) {
- if (group >= ngroups) {
+ if (group >= ngroups && wrap) {
/* wrap around */
group = 0;
maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr,
@@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap_kaddr = kmap_local_page(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
- bitmap, group_offset, entries_per_group, lock);
+ bitmap, group_offset, entries_per_group, lock,
+ wrap);
+ /*
+ * Since the search for a free slot in the second and
+ * subsequent bitmap blocks always starts from the
+ * beginning, the wrap flag only has an effect on the
+ * first search.
+ */
kunmap_local(bitmap_kaddr);
if (pos >= 0)
goto found;
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index b667e869ac07..d825a9faca6d 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -50,8 +50,8 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};
-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req, bool wrap);
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 180fc8d36213..fc1caf63a42a 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
int ret;
- ret = nilfs_palloc_prepare_alloc_entry(dat, req);
+ ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
if (ret < 0)
return ret;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 612e609158b5..1e86b9303b7c 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;
- req.pr_entry_nr = 0; /*
- * 0 says find free inode from beginning
- * of a group. dull code!!
- */
+ req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb);
req.pr_entry_bh = NULL;
- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false);
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f41e355f8b48d894324a3fdc9727e08b1bce78e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072917-crumpet-velcro-eae5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f41e355f8b48 ("nilfs2: fix incorrect inode allocation from reserved inodes")
af6eae646851 ("nilfs2: convert persistent object allocator to use kmap_local")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f41e355f8b48d894324a3fdc9727e08b1bce78e2 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:35 +0900
Subject: [PATCH] nilfs2: fix incorrect inode allocation from reserved inodes
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 89caef7513db..ba50388ee4bf 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
* @target: offset number of an entry in the group (start point)
* @bsize: size in bits
* @lock: spin lock protecting @bitmap
+ * @wrap: whether to wrap around
*/
static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
unsigned long target,
unsigned int bsize,
- spinlock_t *lock)
+ spinlock_t *lock, bool wrap)
{
int pos, end = bsize;
@@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
end = target;
}
+ if (!wrap)
+ return -ENOSPC;
/* wrap around */
for (pos = 0; pos < end; pos++) {
@@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
* nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @wrap: whether to wrap around
*/
int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+ struct nilfs_palloc_req *req, bool wrap)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
entries_per_group = nilfs_palloc_entries_per_group(inode);
for (i = 0; i < ngroups; i += n) {
- if (group >= ngroups) {
+ if (group >= ngroups && wrap) {
/* wrap around */
group = 0;
maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr,
@@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap_kaddr = kmap_local_page(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
- bitmap, group_offset, entries_per_group, lock);
+ bitmap, group_offset, entries_per_group, lock,
+ wrap);
+ /*
+ * Since the search for a free slot in the second and
+ * subsequent bitmap blocks always starts from the
+ * beginning, the wrap flag only has an effect on the
+ * first search.
+ */
kunmap_local(bitmap_kaddr);
if (pos >= 0)
goto found;
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index b667e869ac07..d825a9faca6d 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -50,8 +50,8 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};
-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req, bool wrap);
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 180fc8d36213..fc1caf63a42a 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
int ret;
- ret = nilfs_palloc_prepare_alloc_entry(dat, req);
+ ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
if (ret < 0)
return ret;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 612e609158b5..1e86b9303b7c 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;
- req.pr_entry_nr = 0; /*
- * 0 says find free inode from beginning
- * of a group. dull code!!
- */
+ req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb);
req.pr_entry_bh = NULL;
- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false);
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x f41e355f8b48d894324a3fdc9727e08b1bce78e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072914-egotistic-crewmate-8f1b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
f41e355f8b48 ("nilfs2: fix incorrect inode allocation from reserved inodes")
af6eae646851 ("nilfs2: convert persistent object allocator to use kmap_local")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f41e355f8b48d894324a3fdc9727e08b1bce78e2 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:35 +0900
Subject: [PATCH] nilfs2: fix incorrect inode allocation from reserved inodes
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 89caef7513db..ba50388ee4bf 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
* @target: offset number of an entry in the group (start point)
* @bsize: size in bits
* @lock: spin lock protecting @bitmap
+ * @wrap: whether to wrap around
*/
static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
unsigned long target,
unsigned int bsize,
- spinlock_t *lock)
+ spinlock_t *lock, bool wrap)
{
int pos, end = bsize;
@@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
end = target;
}
+ if (!wrap)
+ return -ENOSPC;
/* wrap around */
for (pos = 0; pos < end; pos++) {
@@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
* nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @wrap: whether to wrap around
*/
int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+ struct nilfs_palloc_req *req, bool wrap)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
entries_per_group = nilfs_palloc_entries_per_group(inode);
for (i = 0; i < ngroups; i += n) {
- if (group >= ngroups) {
+ if (group >= ngroups && wrap) {
/* wrap around */
group = 0;
maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr,
@@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap_kaddr = kmap_local_page(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
- bitmap, group_offset, entries_per_group, lock);
+ bitmap, group_offset, entries_per_group, lock,
+ wrap);
+ /*
+ * Since the search for a free slot in the second and
+ * subsequent bitmap blocks always starts from the
+ * beginning, the wrap flag only has an effect on the
+ * first search.
+ */
kunmap_local(bitmap_kaddr);
if (pos >= 0)
goto found;
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index b667e869ac07..d825a9faca6d 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -50,8 +50,8 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};
-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req, bool wrap);
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 180fc8d36213..fc1caf63a42a 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
int ret;
- ret = nilfs_palloc_prepare_alloc_entry(dat, req);
+ ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
if (ret < 0)
return ret;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 612e609158b5..1e86b9303b7c 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;
- req.pr_entry_nr = 0; /*
- * 0 says find free inode from beginning
- * of a group. dull code!!
- */
+ req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb);
req.pr_entry_bh = NULL;
- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false);
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x f41e355f8b48d894324a3fdc9727e08b1bce78e2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072913-overcrowd-affair-324f@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
f41e355f8b48 ("nilfs2: fix incorrect inode allocation from reserved inodes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f41e355f8b48d894324a3fdc9727e08b1bce78e2 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sun, 23 Jun 2024 14:11:35 +0900
Subject: [PATCH] nilfs2: fix incorrect inode allocation from reserved inodes
If the bitmap block that manages the inode allocation status is corrupted,
nilfs_ifile_create_inode() may allocate a new inode from the reserved
inode area where it should not be allocated.
Previous fix commit d325dc6eb763 ("nilfs2: fix use-after-free bug of
struct nilfs_root"), fixed the problem that reserved inodes with inode
numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to
bitmap corruption, but since the start number of non-reserved inodes is
read from the super block and may change, in which case inode allocation
may occur from the extended reserved inode area.
If that happens, access to that inode will cause an IO error, causing the
file system to degrade to an error state.
Fix this potential issue by adding a wraparound option to the common
metadata object allocation routine and by modifying
nilfs_ifile_create_inode() to disable the option so that it only allocates
inodes with inode numbers greater than or equal to the inode number read
in "nilfs->ns_first_ino", regardless of the bitmap status of reserved
inodes.
Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c
index 89caef7513db..ba50388ee4bf 100644
--- a/fs/nilfs2/alloc.c
+++ b/fs/nilfs2/alloc.c
@@ -377,11 +377,12 @@ void *nilfs_palloc_block_get_entry(const struct inode *inode, __u64 nr,
* @target: offset number of an entry in the group (start point)
* @bsize: size in bits
* @lock: spin lock protecting @bitmap
+ * @wrap: whether to wrap around
*/
static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
unsigned long target,
unsigned int bsize,
- spinlock_t *lock)
+ spinlock_t *lock, bool wrap)
{
int pos, end = bsize;
@@ -397,6 +398,8 @@ static int nilfs_palloc_find_available_slot(unsigned char *bitmap,
end = target;
}
+ if (!wrap)
+ return -ENOSPC;
/* wrap around */
for (pos = 0; pos < end; pos++) {
@@ -495,9 +498,10 @@ int nilfs_palloc_count_max_entries(struct inode *inode, u64 nused, u64 *nmaxp)
* nilfs_palloc_prepare_alloc_entry - prepare to allocate a persistent object
* @inode: inode of metadata file using this allocator
* @req: nilfs_palloc_req structure exchanged for the allocation
+ * @wrap: whether to wrap around
*/
int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
- struct nilfs_palloc_req *req)
+ struct nilfs_palloc_req *req, bool wrap)
{
struct buffer_head *desc_bh, *bitmap_bh;
struct nilfs_palloc_group_desc *desc;
@@ -516,7 +520,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
entries_per_group = nilfs_palloc_entries_per_group(inode);
for (i = 0; i < ngroups; i += n) {
- if (group >= ngroups) {
+ if (group >= ngroups && wrap) {
/* wrap around */
group = 0;
maxgroup = nilfs_palloc_group(inode, req->pr_entry_nr,
@@ -550,7 +554,14 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
bitmap_kaddr = kmap_local_page(bitmap_bh->b_page);
bitmap = bitmap_kaddr + bh_offset(bitmap_bh);
pos = nilfs_palloc_find_available_slot(
- bitmap, group_offset, entries_per_group, lock);
+ bitmap, group_offset, entries_per_group, lock,
+ wrap);
+ /*
+ * Since the search for a free slot in the second and
+ * subsequent bitmap blocks always starts from the
+ * beginning, the wrap flag only has an effect on the
+ * first search.
+ */
kunmap_local(bitmap_kaddr);
if (pos >= 0)
goto found;
diff --git a/fs/nilfs2/alloc.h b/fs/nilfs2/alloc.h
index b667e869ac07..d825a9faca6d 100644
--- a/fs/nilfs2/alloc.h
+++ b/fs/nilfs2/alloc.h
@@ -50,8 +50,8 @@ struct nilfs_palloc_req {
struct buffer_head *pr_entry_bh;
};
-int nilfs_palloc_prepare_alloc_entry(struct inode *,
- struct nilfs_palloc_req *);
+int nilfs_palloc_prepare_alloc_entry(struct inode *inode,
+ struct nilfs_palloc_req *req, bool wrap);
void nilfs_palloc_commit_alloc_entry(struct inode *,
struct nilfs_palloc_req *);
void nilfs_palloc_abort_alloc_entry(struct inode *, struct nilfs_palloc_req *);
diff --git a/fs/nilfs2/dat.c b/fs/nilfs2/dat.c
index 180fc8d36213..fc1caf63a42a 100644
--- a/fs/nilfs2/dat.c
+++ b/fs/nilfs2/dat.c
@@ -75,7 +75,7 @@ int nilfs_dat_prepare_alloc(struct inode *dat, struct nilfs_palloc_req *req)
{
int ret;
- ret = nilfs_palloc_prepare_alloc_entry(dat, req);
+ ret = nilfs_palloc_prepare_alloc_entry(dat, req, true);
if (ret < 0)
return ret;
diff --git a/fs/nilfs2/ifile.c b/fs/nilfs2/ifile.c
index 612e609158b5..1e86b9303b7c 100644
--- a/fs/nilfs2/ifile.c
+++ b/fs/nilfs2/ifile.c
@@ -56,13 +56,10 @@ int nilfs_ifile_create_inode(struct inode *ifile, ino_t *out_ino,
struct nilfs_palloc_req req;
int ret;
- req.pr_entry_nr = 0; /*
- * 0 says find free inode from beginning
- * of a group. dull code!!
- */
+ req.pr_entry_nr = NILFS_FIRST_INO(ifile->i_sb);
req.pr_entry_bh = NULL;
- ret = nilfs_palloc_prepare_alloc_entry(ifile, &req);
+ ret = nilfs_palloc_prepare_alloc_entry(ifile, &req, false);
if (!ret) {
ret = nilfs_palloc_get_entry_block(ifile, req.pr_entry_nr, 1,
&req.pr_entry_bh);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 66eca1021a42856d6af2a9802c99e160278aed91
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072902-yippee-superbowl-f065@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
66eca1021a42 ("mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()")
55f77df7d715 ("mm: page_alloc: control latency caused by zone PCP draining")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 66eca1021a42856d6af2a9802c99e160278aed91 Mon Sep 17 00:00:00 2001
From: Li Zhijian <lizhijian(a)fujitsu.com>
Date: Tue, 23 Jul 2024 14:44:28 +0800
Subject: [PATCH] mm/page_alloc: fix pcp->count race between drain_pages_zone()
vs __rmqueue_pcplist()
It's expected that no page should be left in pcp_list after calling
zone_pcp_disable() in offline_pages(). Previously, it's observed that
offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
Cause:
There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
involving the pcp->count variable. See below scenario:
CPU0 CPU1
---------------- ---------------
spin_lock(&pcp->lock);
__rmqueue_pcplist() {
zone_pcp_disable() {
/* list is empty */
if (list_empty(list)) {
/* add pages to pcp_list */
alloced = rmqueue_bulk()
mutex_lock(&pcp_batch_high_lock)
...
__drain_all_pages() {
drain_pages_zone() {
/* read pcp->count, it's 0 here */
count = READ_ONCE(pcp->count)
/* 0 means nothing to drain */
/* update pcp->count */
pcp->count += alloced << order;
...
...
spin_unlock(&pcp->lock);
In this case, after calling zone_pcp_disable() though, there are still some
pages in pcp_list. And these pages in pcp_list are neither movable nor
isolated, offline_pages() gets stuck as a result.
Solution:
Expand the scope of the pcp->lock to also protect pcp->count in
drain_pages_zone(), to ensure no pages are left in the pcp list after
zone_pcp_disable()
[1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujit…
Link: https://lkml.kernel.org/r/20240723064428.1179519-1-lizhijian@fujitsu.com
Fixes: 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Reported-by: Yao Xingtao <yaoxt.fnst(a)fujitsu.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7ac8d61148fe..28f80daf5c04 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2343,16 +2343,20 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
static void drain_pages_zone(unsigned int cpu, struct zone *zone)
{
struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
- int count = READ_ONCE(pcp->count);
-
- while (count) {
- int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
- count -= to_drain;
+ int count;
+ do {
spin_lock(&pcp->lock);
- free_pcppages_bulk(zone, to_drain, pcp, 0);
+ count = pcp->count;
+ if (count) {
+ int to_drain = min(count,
+ pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+
+ free_pcppages_bulk(zone, to_drain, pcp, 0);
+ count -= to_drain;
+ }
spin_unlock(&pcp->lock);
- }
+ } while (count);
}
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 66eca1021a42856d6af2a9802c99e160278aed91
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072954-blaspheme-safeness-fcbc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
66eca1021a42 ("mm/page_alloc: fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()")
55f77df7d715 ("mm: page_alloc: control latency caused by zone PCP draining")
574907741599 ("mm/page_alloc: leave IRQs enabled for per-cpu page allocations")
c3e58a70425a ("mm/page_alloc: always remove pages from temporary list")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 66eca1021a42856d6af2a9802c99e160278aed91 Mon Sep 17 00:00:00 2001
From: Li Zhijian <lizhijian(a)fujitsu.com>
Date: Tue, 23 Jul 2024 14:44:28 +0800
Subject: [PATCH] mm/page_alloc: fix pcp->count race between drain_pages_zone()
vs __rmqueue_pcplist()
It's expected that no page should be left in pcp_list after calling
zone_pcp_disable() in offline_pages(). Previously, it's observed that
offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
Cause:
There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
involving the pcp->count variable. See below scenario:
CPU0 CPU1
---------------- ---------------
spin_lock(&pcp->lock);
__rmqueue_pcplist() {
zone_pcp_disable() {
/* list is empty */
if (list_empty(list)) {
/* add pages to pcp_list */
alloced = rmqueue_bulk()
mutex_lock(&pcp_batch_high_lock)
...
__drain_all_pages() {
drain_pages_zone() {
/* read pcp->count, it's 0 here */
count = READ_ONCE(pcp->count)
/* 0 means nothing to drain */
/* update pcp->count */
pcp->count += alloced << order;
...
...
spin_unlock(&pcp->lock);
In this case, after calling zone_pcp_disable() though, there are still some
pages in pcp_list. And these pages in pcp_list are neither movable nor
isolated, offline_pages() gets stuck as a result.
Solution:
Expand the scope of the pcp->lock to also protect pcp->count in
drain_pages_zone(), to ensure no pages are left in the pcp list after
zone_pcp_disable()
[1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujit…
Link: https://lkml.kernel.org/r/20240723064428.1179519-1-lizhijian@fujitsu.com
Fixes: 4b23a68f9536 ("mm/page_alloc: protect PCP lists with a spinlock")
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Reported-by: Yao Xingtao <yaoxt.fnst(a)fujitsu.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7ac8d61148fe..28f80daf5c04 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2343,16 +2343,20 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
static void drain_pages_zone(unsigned int cpu, struct zone *zone)
{
struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
- int count = READ_ONCE(pcp->count);
-
- while (count) {
- int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
- count -= to_drain;
+ int count;
+ do {
spin_lock(&pcp->lock);
- free_pcppages_bulk(zone, to_drain, pcp, 0);
+ count = pcp->count;
+ if (count) {
+ int to_drain = min(count,
+ pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+
+ free_pcppages_bulk(zone, to_drain, pcp, 0);
+ count -= to_drain;
+ }
spin_unlock(&pcp->lock);
- }
+ } while (count);
}
/*
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 11a1f4bc47362700fcbde717292158873fb847ed
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072919-usual-unstopped-30af@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
11a1f4bc4736 ("PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 11a1f4bc47362700fcbde717292158873fb847ed Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Tue, 18 Jun 2024 12:54:55 +0200
Subject: [PATCH] PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Keith reports a use-after-free when a DPC event occurs concurrently to
hot-removal of the same portion of the hierarchy:
The dpc_handler() awaits readiness of the secondary bus below the
Downstream Port where the DPC event occurred. To do so, it polls the
config space of the first child device on the secondary bus. If that
child device is concurrently removed, accesses to its struct pci_dev
cause the kernel to oops.
That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
reference on the child device. Before v6.3, the function was only
called on resume from system sleep or on runtime resume. Holding a
reference wasn't necessary back then because the pciehp IRQ thread
could never run concurrently. (On resume from system sleep, IRQs are
not enabled until after the resume_noirq phase. And runtime resume is
always awaited before a PCI device is removed.)
However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness
of secondary bus after reset"), which introduced that, failed to
appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
reference on the child device because dpc_handler() and pciehp may
indeed run concurrently. The commit was backported to v5.10+ stable
kernels, so that's the oldest one affected.
Add the missing reference acquisition.
Abridged stack trace:
BUG: unable to handle page fault for address: 00000000091400c0
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
RIP: pci_bus_read_config_dword+0x17/0x50
pci_dev_wait()
pci_bridge_wait_for_secondary_bus()
dpc_reset_link()
pcie_do_recovery()
dpc_handler()
Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset")
Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/
Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.…
Reported-by: Keith Busch <kbusch(a)kernel.org>
Tested-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski(a)kernel.org>
Reviewed-by: Keith Busch <kbusch(a)kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.10+
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 59e0949fb079..c40a6c11e959 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4753,7 +4753,7 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
*/
int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
{
- struct pci_dev *child;
+ struct pci_dev *child __free(pci_dev_put) = NULL;
int delay;
if (pci_dev_is_disconnected(dev))
@@ -4782,8 +4782,8 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
return 0;
}
- child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
- bus_list);
+ child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
+ struct pci_dev, bus_list));
up_read(&pci_bus_sem);
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 11a1f4bc47362700fcbde717292158873fb847ed
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072918-headache-absently-933a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
11a1f4bc4736 ("PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 11a1f4bc47362700fcbde717292158873fb847ed Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Tue, 18 Jun 2024 12:54:55 +0200
Subject: [PATCH] PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Keith reports a use-after-free when a DPC event occurs concurrently to
hot-removal of the same portion of the hierarchy:
The dpc_handler() awaits readiness of the secondary bus below the
Downstream Port where the DPC event occurred. To do so, it polls the
config space of the first child device on the secondary bus. If that
child device is concurrently removed, accesses to its struct pci_dev
cause the kernel to oops.
That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
reference on the child device. Before v6.3, the function was only
called on resume from system sleep or on runtime resume. Holding a
reference wasn't necessary back then because the pciehp IRQ thread
could never run concurrently. (On resume from system sleep, IRQs are
not enabled until after the resume_noirq phase. And runtime resume is
always awaited before a PCI device is removed.)
However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness
of secondary bus after reset"), which introduced that, failed to
appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
reference on the child device because dpc_handler() and pciehp may
indeed run concurrently. The commit was backported to v5.10+ stable
kernels, so that's the oldest one affected.
Add the missing reference acquisition.
Abridged stack trace:
BUG: unable to handle page fault for address: 00000000091400c0
CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
RIP: pci_bus_read_config_dword+0x17/0x50
pci_dev_wait()
pci_bridge_wait_for_secondary_bus()
dpc_reset_link()
pcie_do_recovery()
dpc_handler()
Fixes: 53b54ad074de ("PCI/DPC: Await readiness of secondary bus after reset")
Closes: https://lore.kernel.org/r/20240612181625.3604512-3-kbusch@meta.com/
Link: https://lore.kernel.org/linux-pci/8e4bcd4116fd94f592f2bf2749f168099c480ddf.…
Reported-by: Keith Busch <kbusch(a)kernel.org>
Tested-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Krzysztof Wilczyński <kwilczynski(a)kernel.org>
Reviewed-by: Keith Busch <kbusch(a)kernel.org>
Reviewed-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v5.10+
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 59e0949fb079..c40a6c11e959 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4753,7 +4753,7 @@ static int pci_bus_max_d3cold_delay(const struct pci_bus *bus)
*/
int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
{
- struct pci_dev *child;
+ struct pci_dev *child __free(pci_dev_put) = NULL;
int delay;
if (pci_dev_is_disconnected(dev))
@@ -4782,8 +4782,8 @@ int pci_bridge_wait_for_secondary_bus(struct pci_dev *dev, char *reset_type)
return 0;
}
- child = list_first_entry(&dev->subordinate->devices, struct pci_dev,
- bus_list);
+ child = pci_dev_get(list_first_entry(&dev->subordinate->devices,
+ struct pci_dev, bus_list));
up_read(&pci_bus_sem);
/*