Hello hope everyone is well, just following up on this bug report, this
appears to have been patched here
https://lore.kernel.org/lkml/20251015051828.12809-1-dapeng1.mi@linux.intel.…
thanks to Dapeng Mi; however, the patch email does not appear to have
CCed the regressions or stable list.
On 2025/10/15 Peter Zijlstra wrote:
> So yeah, I suppose this works. Let me go queue this up.
In regard to the patch, hence I assume the patch is approved for
implementation in future versions.
This is a critical bug causing widespread irrecoverable system freezes
reported by many users with many differing setups, including myself.
Notably it is being triggered by a minecraft mod called spark with at
least 150 million aggregate downloads
(https://www.curseforge.com/minecraft/mc-mods/sparkhttps://github.com/lucko/spark/issues/530).
If at all possible I would love to request that this please be
implemented/backported to kernels 6.17 and 6.18.
Thank you.
We observed failures in the 'memcontrol02' test case from the Linux Test
Project (LTP) [1] when running on a 256-core server with the 6.6.y kernel.
The test fails due to stale memory.stat values being returned, which is
caused by the current stats flushing implementation's limitations with large
core counts.
This series backports the memcg subtree stats flushing improvements from
Linux 6.8 to 6.6.y to address the issue. The main goal is to restore
per-memcg stats flushing with dynamic thresholds, which improves both
accuracy and performance of memory cgroup statistics, especially on
high-core-count systems.
Background
==========
The current stats flushing in 6.6.y flushes the entire memcg hierarchy with
a global threshold. This is not efficient and can cause stale stats when read
'memory.stat'.
Dependency Patches
==================
Patches 1-2 are dependencies required for clean application of the main
series:
Patch 1: 811244a501b9 "mm: memcg: add THP swap out info for anonymous reclaim"
This patch adds THP_SWPOUT and THP_SWPOUT_FALLBACK entries to the
memcg_vm_event_stat[] array. It is needed because patch 4 (e0bf1dc859fd)
moves the vmstats struct definitions, including this array. Without this
patch, the array structure would not match between 6.6.y and 6.8, causing
context conflicts during cherry-pick.
The patch is already in mainline (merged in v6.7) but was not included in
the stable 6.6.y branch.
Patch 2: 7108cc3f765c "mm: memcg: add per-memcg zswap writeback stat"
This patch adds the ZSWPWB entry to the memcg_vm_event_stat[] array. Like
patch 1, it is required for patch 4 to apply cleanly. The array structure
must match the 6.8 state for the code movement to succeed without
conflicts.
This patch is also in mainline (merged in v6.8) but was not backported to
6.6.y.
Main Series
===========
Patches 3-7 are the core memcg stats flushing improvements:
- Patch 3: Renames flush_next_time to flush_last_time for clarity
- Patch 4: Moves vmstats struct definitions for better code organization
- Patch 5: Implements per-memcg stats flushing thresholds (key change)
- Patch 6: Moves stats flush into workingset_test_recent()
- Patch 7: Restores subtree stats flushing (main feature)
Cherry-Pick Notes for Patch 7
==============================
Patch 7 (7d7ef0a4686a) requires manual conflict resolution in mm/zswap.c:
The conflict occurs because this patch includes changes to zswap shrinker
code that was introduced in Linux 6.8. Since this new shrinker
infrastructure does not exist in 6.6.y, the conflicting code should be
removed during cherry-pick.
Resolution: Keep the 6.6.y (HEAD) version of mm/zswap.c and discard the
new shrinker code from the patch. The conflict markers will show:
<<<<<<< HEAD
// existing 6.6.y code
=======
// new 6.8 shrinker code (shrink_memcg_cb, zswap_shrinker_scan, etc.)
>>>>>>> 7d7ef0a4686a
Simply keep the HEAD version and remove everything between the "======="
and ">>>>>>>" markers. This is safe because the zswap shrinker is a
separate new feature, not a dependency for the memcg stats changes.
Additionally, if you encounter a conflict in mm/workingset.c, it may be
due to commit 417dbd7be383 ("mm: ratelimit stat flush from workingset
shrinker") which was backported to 6.6.y. The resolution is to use:
mem_cgroup_flush_stats_ratelimited(sc->memcg)
which preserves the performance optimization while using the new API.
Testing
=======
This series has been extensively tested upstream with:
- 5000 concurrent workers in 500 cgroups doing allocations and reclaim
- 250k threads reading stats every 100ms in 50k cgroups
- No performance regressions observed with per-memcg thresholds
The changes improve both stats accuracy and reduce unnecessary flushing
overhead.
References
==========
[1] Linux Test Project (LTP): https://github.com/linux-test-project/ltp
Domenico Cerasuolo (1):
mm: memcg: add per-memcg zswap writeback stat
Xin Hao (1):
mm: memcg: add THP swap out info for anonymous reclaim
Yosry Ahmed (5):
mm: memcg: change flush_next_time to flush_last_time
mm: memcg: move vmstats structs definition above flushing code
mm: memcg: make stats flushing threshold per-memcg
mm: workingset: move the stats flush into workingset_test_recent()
mm: memcg: restore subtree stats flushing
Documentation/admin-guide/cgroup-v2.rst | 9 +
include/linux/memcontrol.h | 8 +-
include/linux/vm_event_item.h | 1 +
mm/memcontrol.c | 266 +++++++++++++-----------
mm/page_io.c | 8 +-
mm/vmscan.c | 3 +-
mm/vmstat.c | 1 +
mm/workingset.c | 42 ++--
mm/zswap.c | 4 +
9 files changed, 203 insertions(+), 139 deletions(-)
--
2.50.1
Good day,
Please see attached Purchase Order.
Kindly provide us with the invoice and relevant details including
delivery date and pricing.
Thank you
Best Regards.
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0fd20f65df6aa430454a0deed8f43efa91c54835
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025110328-lyrically-confusing-b129@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0fd20f65df6aa430454a0deed8f43efa91c54835 Mon Sep 17 00:00:00 2001
From: Gerd Bayer <gbayer(a)linux.ibm.com>
Date: Thu, 16 Oct 2025 11:27:03 +0200
Subject: [PATCH] s390/pci: Avoid deadlock between PCI error recovery and mlx5
crdump
Do not block PCI config accesses through pci_cfg_access_lock() when
executing the s390 variant of PCI error recovery: Acquire just
device_lock() instead of pci_dev_lock() as powerpc's EEH and
generig PCI AER processing do.
During error recovery testing a pair of tasks was reported to be hung:
mlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working
INFO: task kmcheck:72 blocked for more than 122 seconds.
Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kmcheck state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00000000
Call Trace:
[<000000065256f030>] __schedule+0x2a0/0x590
[<000000065256f356>] schedule+0x36/0xe0
[<000000065256f572>] schedule_preempt_disabled+0x22/0x30
[<0000000652570a94>] __mutex_lock.constprop.0+0x484/0x8a8
[<000003ff800673a4>] mlx5_unload_one+0x34/0x58 [mlx5_core]
[<000003ff8006745c>] mlx5_pci_err_detected+0x94/0x140 [mlx5_core]
[<0000000652556c5a>] zpci_event_attempt_error_recovery+0xf2/0x398
[<0000000651b9184a>] __zpci_event_error+0x23a/0x2c0
INFO: task kworker/u1664:6:1514 blocked for more than 122 seconds.
Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u1664:6 state:D stack:0 pid:1514 tgid:1514 ppid:2 flags:0x00000000
Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
[<000000065256f030>] __schedule+0x2a0/0x590
[<000000065256f356>] schedule+0x36/0xe0
[<0000000652172e28>] pci_wait_cfg+0x80/0xe8
[<0000000652172f94>] pci_cfg_access_lock+0x74/0x88
[<000003ff800916b6>] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core]
[<000003ff80098824>] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core]
[<000003ff80074b62>] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core]
[<0000000652512242>] devlink_health_do_dump.part.0+0x82/0x168
[<0000000652513212>] devlink_health_report+0x19a/0x230
[<000003ff80075a12>] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core]
No kernel log of the exact same error with an upstream kernel is
available - but the very same deadlock situation can be constructed there,
too:
- task: kmcheck
mlx5_unload_one() tries to acquire devlink lock while the PCI error
recovery code has set pdev->block_cfg_access by way of
pci_cfg_access_lock()
- task: kworker
mlx5_crdump_collect() tries to set block_cfg_access through
pci_cfg_access_lock() while devlink_health_report() had acquired
the devlink lock.
A similar deadlock situation can be reproduced by requesting a
crdump with
> devlink health dump show pci/<BDF> reporter fw_fatal
while PCI error recovery is executed on the same <BDF> physical function
by mlx5_core's pci_error_handlers. On s390 this can be injected with
> zpcictl --reset-fw <BDF>
Tests with this patch failed to reproduce that second deadlock situation,
the devlink command is rejected with "kernel answers: Permission denied" -
and we get a kernel log message of:
mlx5_core 1ed0:00:00.1: mlx5_crdump_collect:50:(pid 254382): crdump: failed to lock vsc gw err -5
because the config read of VSC_SEMAPHORE is rejected by the underlying
hardware.
Two prior attempts to address this issue have been discussed and
ultimately rejected [see link], with the primary argument that s390's
implementation of PCI error recovery is imposing restrictions that
neither powerpc's EEH nor PCI AER handling need. Tests show that PCI
error recovery on s390 is running to completion even without blocking
access to PCI config space.
Link: https://lore.kernel.org/all/20251007144826.2825134-1-gbayer@linux.ibm.com/
Cc: stable(a)vger.kernel.org
Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery")
Reviewed-by: Niklas Schnelle <schnelle(a)linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer(a)linux.ibm.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index b95376041501..27db1e72c623 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -188,7 +188,7 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
* is unbound or probed and that userspace can't access its
* configuration space while we perform recovery.
*/
- pci_dev_lock(pdev);
+ device_lock(&pdev->dev);
if (pdev->error_state == pci_channel_io_perm_failure) {
ers_res = PCI_ERS_RESULT_DISCONNECT;
goto out_unlock;
@@ -257,7 +257,7 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
driver->err_handler->resume(pdev);
pci_uevent_ers(pdev, PCI_ERS_RESULT_RECOVERED);
out_unlock:
- pci_dev_unlock(pdev);
+ device_unlock(&pdev->dev);
zpci_report_status(zdev, "recovery", status_str);
return ers_res;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 0fd20f65df6aa430454a0deed8f43efa91c54835
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025110327-capably-pond-f178@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0fd20f65df6aa430454a0deed8f43efa91c54835 Mon Sep 17 00:00:00 2001
From: Gerd Bayer <gbayer(a)linux.ibm.com>
Date: Thu, 16 Oct 2025 11:27:03 +0200
Subject: [PATCH] s390/pci: Avoid deadlock between PCI error recovery and mlx5
crdump
Do not block PCI config accesses through pci_cfg_access_lock() when
executing the s390 variant of PCI error recovery: Acquire just
device_lock() instead of pci_dev_lock() as powerpc's EEH and
generig PCI AER processing do.
During error recovery testing a pair of tasks was reported to be hung:
mlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working
INFO: task kmcheck:72 blocked for more than 122 seconds.
Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kmcheck state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00000000
Call Trace:
[<000000065256f030>] __schedule+0x2a0/0x590
[<000000065256f356>] schedule+0x36/0xe0
[<000000065256f572>] schedule_preempt_disabled+0x22/0x30
[<0000000652570a94>] __mutex_lock.constprop.0+0x484/0x8a8
[<000003ff800673a4>] mlx5_unload_one+0x34/0x58 [mlx5_core]
[<000003ff8006745c>] mlx5_pci_err_detected+0x94/0x140 [mlx5_core]
[<0000000652556c5a>] zpci_event_attempt_error_recovery+0xf2/0x398
[<0000000651b9184a>] __zpci_event_error+0x23a/0x2c0
INFO: task kworker/u1664:6:1514 blocked for more than 122 seconds.
Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u1664:6 state:D stack:0 pid:1514 tgid:1514 ppid:2 flags:0x00000000
Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
[<000000065256f030>] __schedule+0x2a0/0x590
[<000000065256f356>] schedule+0x36/0xe0
[<0000000652172e28>] pci_wait_cfg+0x80/0xe8
[<0000000652172f94>] pci_cfg_access_lock+0x74/0x88
[<000003ff800916b6>] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core]
[<000003ff80098824>] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core]
[<000003ff80074b62>] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core]
[<0000000652512242>] devlink_health_do_dump.part.0+0x82/0x168
[<0000000652513212>] devlink_health_report+0x19a/0x230
[<000003ff80075a12>] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core]
No kernel log of the exact same error with an upstream kernel is
available - but the very same deadlock situation can be constructed there,
too:
- task: kmcheck
mlx5_unload_one() tries to acquire devlink lock while the PCI error
recovery code has set pdev->block_cfg_access by way of
pci_cfg_access_lock()
- task: kworker
mlx5_crdump_collect() tries to set block_cfg_access through
pci_cfg_access_lock() while devlink_health_report() had acquired
the devlink lock.
A similar deadlock situation can be reproduced by requesting a
crdump with
> devlink health dump show pci/<BDF> reporter fw_fatal
while PCI error recovery is executed on the same <BDF> physical function
by mlx5_core's pci_error_handlers. On s390 this can be injected with
> zpcictl --reset-fw <BDF>
Tests with this patch failed to reproduce that second deadlock situation,
the devlink command is rejected with "kernel answers: Permission denied" -
and we get a kernel log message of:
mlx5_core 1ed0:00:00.1: mlx5_crdump_collect:50:(pid 254382): crdump: failed to lock vsc gw err -5
because the config read of VSC_SEMAPHORE is rejected by the underlying
hardware.
Two prior attempts to address this issue have been discussed and
ultimately rejected [see link], with the primary argument that s390's
implementation of PCI error recovery is imposing restrictions that
neither powerpc's EEH nor PCI AER handling need. Tests show that PCI
error recovery on s390 is running to completion even without blocking
access to PCI config space.
Link: https://lore.kernel.org/all/20251007144826.2825134-1-gbayer@linux.ibm.com/
Cc: stable(a)vger.kernel.org
Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery")
Reviewed-by: Niklas Schnelle <schnelle(a)linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer(a)linux.ibm.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index b95376041501..27db1e72c623 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -188,7 +188,7 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
* is unbound or probed and that userspace can't access its
* configuration space while we perform recovery.
*/
- pci_dev_lock(pdev);
+ device_lock(&pdev->dev);
if (pdev->error_state == pci_channel_io_perm_failure) {
ers_res = PCI_ERS_RESULT_DISCONNECT;
goto out_unlock;
@@ -257,7 +257,7 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
driver->err_handler->resume(pdev);
pci_uevent_ers(pdev, PCI_ERS_RESULT_RECOVERED);
out_unlock:
- pci_dev_unlock(pdev);
+ device_unlock(&pdev->dev);
zpci_report_status(zdev, "recovery", status_str);
return ers_res;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 0fd20f65df6aa430454a0deed8f43efa91c54835
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025110326-germicide-pantry-dd6f@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0fd20f65df6aa430454a0deed8f43efa91c54835 Mon Sep 17 00:00:00 2001
From: Gerd Bayer <gbayer(a)linux.ibm.com>
Date: Thu, 16 Oct 2025 11:27:03 +0200
Subject: [PATCH] s390/pci: Avoid deadlock between PCI error recovery and mlx5
crdump
Do not block PCI config accesses through pci_cfg_access_lock() when
executing the s390 variant of PCI error recovery: Acquire just
device_lock() instead of pci_dev_lock() as powerpc's EEH and
generig PCI AER processing do.
During error recovery testing a pair of tasks was reported to be hung:
mlx5_core 0000:00:00.1: mlx5_health_try_recover:338:(pid 5553): health recovery flow aborted, PCI reads still not working
INFO: task kmcheck:72 blocked for more than 122 seconds.
Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kmcheck state:D stack:0 pid:72 tgid:72 ppid:2 flags:0x00000000
Call Trace:
[<000000065256f030>] __schedule+0x2a0/0x590
[<000000065256f356>] schedule+0x36/0xe0
[<000000065256f572>] schedule_preempt_disabled+0x22/0x30
[<0000000652570a94>] __mutex_lock.constprop.0+0x484/0x8a8
[<000003ff800673a4>] mlx5_unload_one+0x34/0x58 [mlx5_core]
[<000003ff8006745c>] mlx5_pci_err_detected+0x94/0x140 [mlx5_core]
[<0000000652556c5a>] zpci_event_attempt_error_recovery+0xf2/0x398
[<0000000651b9184a>] __zpci_event_error+0x23a/0x2c0
INFO: task kworker/u1664:6:1514 blocked for more than 122 seconds.
Not tainted 5.14.0-570.12.1.bringup7.el9.s390x #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/u1664:6 state:D stack:0 pid:1514 tgid:1514 ppid:2 flags:0x00000000
Workqueue: mlx5_health0000:00:00.0 mlx5_fw_fatal_reporter_err_work [mlx5_core]
Call Trace:
[<000000065256f030>] __schedule+0x2a0/0x590
[<000000065256f356>] schedule+0x36/0xe0
[<0000000652172e28>] pci_wait_cfg+0x80/0xe8
[<0000000652172f94>] pci_cfg_access_lock+0x74/0x88
[<000003ff800916b6>] mlx5_vsc_gw_lock+0x36/0x178 [mlx5_core]
[<000003ff80098824>] mlx5_crdump_collect+0x34/0x1c8 [mlx5_core]
[<000003ff80074b62>] mlx5_fw_fatal_reporter_dump+0x6a/0xe8 [mlx5_core]
[<0000000652512242>] devlink_health_do_dump.part.0+0x82/0x168
[<0000000652513212>] devlink_health_report+0x19a/0x230
[<000003ff80075a12>] mlx5_fw_fatal_reporter_err_work+0xba/0x1b0 [mlx5_core]
No kernel log of the exact same error with an upstream kernel is
available - but the very same deadlock situation can be constructed there,
too:
- task: kmcheck
mlx5_unload_one() tries to acquire devlink lock while the PCI error
recovery code has set pdev->block_cfg_access by way of
pci_cfg_access_lock()
- task: kworker
mlx5_crdump_collect() tries to set block_cfg_access through
pci_cfg_access_lock() while devlink_health_report() had acquired
the devlink lock.
A similar deadlock situation can be reproduced by requesting a
crdump with
> devlink health dump show pci/<BDF> reporter fw_fatal
while PCI error recovery is executed on the same <BDF> physical function
by mlx5_core's pci_error_handlers. On s390 this can be injected with
> zpcictl --reset-fw <BDF>
Tests with this patch failed to reproduce that second deadlock situation,
the devlink command is rejected with "kernel answers: Permission denied" -
and we get a kernel log message of:
mlx5_core 1ed0:00:00.1: mlx5_crdump_collect:50:(pid 254382): crdump: failed to lock vsc gw err -5
because the config read of VSC_SEMAPHORE is rejected by the underlying
hardware.
Two prior attempts to address this issue have been discussed and
ultimately rejected [see link], with the primary argument that s390's
implementation of PCI error recovery is imposing restrictions that
neither powerpc's EEH nor PCI AER handling need. Tests show that PCI
error recovery on s390 is running to completion even without blocking
access to PCI config space.
Link: https://lore.kernel.org/all/20251007144826.2825134-1-gbayer@linux.ibm.com/
Cc: stable(a)vger.kernel.org
Fixes: 4cdf2f4e24ff ("s390/pci: implement minimal PCI error recovery")
Reviewed-by: Niklas Schnelle <schnelle(a)linux.ibm.com>
Signed-off-by: Gerd Bayer <gbayer(a)linux.ibm.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
diff --git a/arch/s390/pci/pci_event.c b/arch/s390/pci/pci_event.c
index b95376041501..27db1e72c623 100644
--- a/arch/s390/pci/pci_event.c
+++ b/arch/s390/pci/pci_event.c
@@ -188,7 +188,7 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
* is unbound or probed and that userspace can't access its
* configuration space while we perform recovery.
*/
- pci_dev_lock(pdev);
+ device_lock(&pdev->dev);
if (pdev->error_state == pci_channel_io_perm_failure) {
ers_res = PCI_ERS_RESULT_DISCONNECT;
goto out_unlock;
@@ -257,7 +257,7 @@ static pci_ers_result_t zpci_event_attempt_error_recovery(struct pci_dev *pdev)
driver->err_handler->resume(pdev);
pci_uevent_ers(pdev, PCI_ERS_RESULT_RECOVERED);
out_unlock:
- pci_dev_unlock(pdev);
+ device_unlock(&pdev->dev);
zpci_report_status(zdev, "recovery", status_str);
return ers_res;
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 64e2f60f355e556337fcffe80b9bcff1b22c9c42
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025110339-catching-blah-8209@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 64e2f60f355e556337fcffe80b9bcff1b22c9c42 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <hca(a)linux.ibm.com>
Date: Thu, 30 Oct 2025 15:55:05 +0100
Subject: [PATCH] s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
As reported by Luiz Capitulino enabling HVO on s390 leads to reproducible
crashes. The problem is that kernel page tables are modified without
flushing corresponding TLB entries.
Even if it looks like the empty flush_tlb_all() implementation on s390 is
the problem, it is actually a different problem: on s390 it is not allowed
to replace an active/valid page table entry with another valid page table
entry without the detour over an invalid entry. A direct replacement may
lead to random crashes and/or data corruption.
In order to invalidate an entry special instructions have to be used
(e.g. ipte or idte). Alternatively there are also special instructions
available which allow to replace a valid entry with a different valid
entry (e.g. crdte or cspg).
Given that the HVO code currently does not provide the hooks to allow for
an implementation which is compliant with the s390 architecture
requirements, disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP again, which is
basically a revert of the original patch which enabled it.
Reported-by: Luiz Capitulino <luizcap(a)redhat.com>
Closes: https://lore.kernel.org/all/20251028153930.37107-1-luizcap@redhat.com/
Fixes: 00a34d5a99c0 ("s390: select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP")
Cc: stable(a)vger.kernel.org
Tested-by: Luiz Capitulino <luizcap(a)redhat.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index c4145672ca34..df22b10d9141 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -158,7 +158,6 @@ config S390
select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
select ARCH_WANT_KERNEL_PMD_MKWRITE
select ARCH_WANT_LD_ORPHAN_WARN
- select ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
select ARCH_WANTS_THP_SWAP
select BUILDTIME_TABLE_SORT
select CLONE_BACKWARDS2
This kernel version doesn't build with GCC 15:
In file included from include/uapi/linux/posix_types.h:5,
from include/uapi/linux/types.h:14,
from include/linux/types.h:6,
from arch/x86/realmode/rm/wakeup.h:11,
from arch/x86/realmode/rm/wakemain.c:2:
include/linux/stddef.h:11:9: error: cannot use keyword 'false' as enumeration constant
11 | false = 0,
| ^~~~~
include/linux/stddef.h:11:9: note: 'false' is a keyword with '-std=c23' onwards
include/linux/types.h:30:33: error: 'bool' cannot be defined via 'typedef'
30 | typedef _Bool bool;
| ^~~~
include/linux/types.h:30:33: note: 'bool' is a keyword with '-std=c23' onwards
include/linux/types.h:30:1: warning: useless type name in empty declaration
30 | typedef _Bool bool;
| ^~~~~~~
I initially fixed this by adding -std=gnu11 in arch/x86/Makefile, then I
realised this fix was already done in an upstream commit, created before
the GCC 15 release and not mentioning the error I had. This is the first
patch.
When I was investigating my error, I noticed other commits were already
backported to v5.15. They were all adding -std=gnu11 in different
Makefiles. In their commit message, they were mentioning 'gnu11' was
picked to use the same as the one from the main Makefile. But this is
not the case in this kernel version. Patch 2 fixes that.
Finally, I noticed the documentation was not correct in this kernel
version: this is because a commit was backported to v5.15 while it was
not supposed to. Patch 3 fixes that.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Alexey Dobriyan (1):
x86/boot: Compile boot code with -std=gnu11 too
Matthieu Baerts (NGI0) (2):
arch: back to -std=gnu89 in < v5.18
Revert "docs/process/howto: Replace C89 with C11"
Documentation/process/howto.rst | 2 +-
Documentation/translations/it_IT/process/howto.rst | 2 +-
Documentation/translations/ja_JP/howto.rst | 2 +-
Documentation/translations/ko_KR/howto.rst | 2 +-
Documentation/translations/zh_CN/process/howto.rst | 2 +-
Documentation/translations/zh_TW/process/howto.rst | 2 +-
arch/parisc/boot/compressed/Makefile | 2 +-
arch/s390/Makefile | 2 +-
arch/s390/purgatory/Makefile | 2 +-
arch/x86/Makefile | 2 +-
arch/x86/boot/compressed/Makefile | 2 +-
drivers/firmware/efi/libstub/Makefile | 2 +-
12 files changed, 12 insertions(+), 12 deletions(-)
---
base-commit: 06cf22cc87e00b878c310d5441981b7750f04078
change-id: 20251017-v5-15-gcc-15-5ceda8ebe577
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>