Commit cc3ed80ae69f ("KVM: nSVM: always use vmcb01 to for vmsave/vmload
of guest state") made KVM always use vmcb01 for the fields controlled by
VMSAVE/VMLOAD, but it missed updating the VMLOAD/VMSAVE emulation code
to always use vmcb01.
As a result, if VMSAVE/VMLOAD is executed by an L2 guest and is not
intercepted by L1, KVM will mistakenly use vmcb02. Always use vmcb01
instead of the current VMCB.
Fixes: cc3ed80ae69f ("KVM: nSVM: always use vmcb01 to for vmsave/vmload of guest state")
Cc: Maxim Levitsky <mlevitsk(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
arch/x86/kvm/svm/svm.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7041498a8091..4e4439a01828 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2165,12 +2165,13 @@ static int vmload_vmsave_interception(struct kvm_vcpu *vcpu, bool vmload)
ret = kvm_skip_emulated_instruction(vcpu);
+ /* KVM always performs VMLOAD/VMSAVE on VMCB01 (see __svm_vcpu_run()) */
if (vmload) {
- svm_copy_vmloadsave_state(svm->vmcb, vmcb12);
+ svm_copy_vmloadsave_state(svm->vmcb01.ptr, vmcb12);
svm->sysenter_eip_hi = 0;
svm->sysenter_esp_hi = 0;
} else {
- svm_copy_vmloadsave_state(vmcb12, svm->vmcb);
+ svm_copy_vmloadsave_state(vmcb12, svm->vmcb01.ptr);
}
kvm_vcpu_unmap(vcpu, &map);
--
2.52.0.457.g6b5491de43-goog
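As an illustration only (not part of the patch), the intended behaviour can be modelled in userspace C. The struct and function names below are hypothetical stand-ins rather than KVM's real types; the sketch assumes a vCPU with two VMCBs and shows the emulation path copying to and from vmcb01 even while the "current" VMCB is vmcb02.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the VMLOAD/VMSAVE-managed part of a VMCB. */
struct vmloadsave_state {
	unsigned long fs_base;
	unsigned long gs_base;
	unsigned long kernel_gs_base;
};

/* Toy vCPU: vmcb01 belongs to L1, vmcb02 is active while L2 runs. */
struct toy_vcpu {
	struct vmloadsave_state vmcb01;
	struct vmloadsave_state vmcb02;
	struct vmloadsave_state *current_vmcb;	/* points at vmcb02 while in L2 */
};

static void copy_state(struct vmloadsave_state *to,
		       const struct vmloadsave_state *from)
{
	*to = *from;
}

/*
 * Emulate VMLOAD (vmload == true) or VMSAVE against the guest-provided
 * vmcb12. The state always lives in vmcb01, so current_vmcb is
 * deliberately not consulted.
 */
static void emulate_vmload_vmsave(struct toy_vcpu *vcpu,
				  struct vmloadsave_state *vmcb12,
				  bool vmload)
{
	assert(vcpu != NULL && vmcb12 != NULL);

	if (vmload)
		copy_state(&vcpu->vmcb01, vmcb12);
	else
		copy_state(vmcb12, &vcpu->vmcb01);
}
```

In this model a VMLOAD issued while current_vmcb points at vmcb02 leaves vmcb02 untouched, which is the property the fix restores.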
This is the start of the stable review cycle for the 6.12.65 release.
There are 16 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 11 Jan 2026 11:19:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.65-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.65-rc1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "iommu/amd: Skip enabling command/event buffers for kdump"
Sean Nyekjaer <sean(a)geanix.com>
pwm: stm32: Always program polarity
Maximilian Immanuel Brandtner <maxbr(a)linux.ibm.com>
virtio_console: fix order of fields cols and rows
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Proportional newidle balance
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to update_newidle_cost()
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to sched_balance_newidle()
Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com>
net: Remove RTNL dance for SIOCBRADDIF and SIOCBRDELIF.
Richa Bharti <richa.bharti(a)siemens.com>
cpufreq: intel_pstate: Check IDA only before MSR_IA32_PERF_CTL writes
Natalie Vock <natalie.vock(a)gmx.de>
drm/amdgpu: Forward VMID reservation errors
Miaoqian Lin <linmq006(a)gmail.com>
net: phy: mediatek: fix nvmem cell reference leak in mt798x_phy_calibration
Jouni Malinen <jouni.malinen(a)oss.qualcomm.com>
wifi: mac80211: Discard Beacon frames to non-broadcast address
Paolo Abeni <pabeni(a)redhat.com>
mptcp: ensure context reset on disconnect()
Bijan Tabatabai <bijan311(a)gmail.com>
mm: consider non-anon swap cache folios in folio_expected_ref_count()
David Hildenbrand <david(a)redhat.com>
mm: simplify folio_expected_ref_count()
Alexander Gordeev <agordeev(a)linux.ibm.com>
mm/page_alloc: change all pageblocks migrate type on coalescing
Paolo Abeni <pabeni(a)redhat.com>
mptcp: fallback earlier on simult connection
-------------
Diffstat:
Makefile | 4 +--
drivers/char/virtio_console.c | 2 +-
drivers/cpufreq/intel_pstate.c | 9 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 6 ++--
drivers/iommu/amd/init.c | 28 +++++----------
drivers/net/phy/mediatek-ge-soc.c | 2 +-
drivers/pwm/pwm-stm32.c | 3 +-
include/linux/if_bridge.h | 6 ++--
include/linux/mm.h | 10 +++---
include/linux/sched/topology.h | 3 ++
kernel/sched/core.c | 3 ++
kernel/sched/fair.c | 65 +++++++++++++++++++++++++++-------
kernel/sched/features.h | 5 +++
kernel/sched/sched.h | 7 ++++
kernel/sched/topology.c | 6 ++++
mm/page_alloc.c | 24 ++++++-------
net/bridge/br_ioctl.c | 36 +++++++++++++++++--
net/bridge/br_private.h | 3 +-
net/core/dev_ioctl.c | 16 ---------
net/mac80211/rx.c | 5 +++
net/mptcp/options.c | 10 ++++++
net/mptcp/protocol.c | 8 +++--
net/mptcp/protocol.h | 9 +++--
net/mptcp/subflow.c | 10 +-----
net/socket.c | 19 +++++-----
25 files changed, 186 insertions(+), 113 deletions(-)
This is the start of the stable review cycle for the 6.18.5 release.
There are 5 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 11 Jan 2026 11:19:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.18.5-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.18.5-rc1
Mike Snitzer <snitzer(a)kernel.org>
nfs/localio: fix regression due to out-of-order __put_cred
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Proportional newidle balance
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to update_newidle_cost()
Peter Zijlstra <peterz(a)infradead.org>
sched/fair: Small cleanup to sched_balance_newidle()
Paolo Abeni <pabeni(a)redhat.com>
mptcp: ensure context reset on disconnect()
-------------
Diffstat:
Makefile | 4 +--
fs/nfs/localio.c | 12 ++++----
include/linux/sched/topology.h | 3 ++
kernel/sched/core.c | 3 ++
kernel/sched/fair.c | 65 ++++++++++++++++++++++++++++++++++--------
kernel/sched/features.h | 5 ++++
kernel/sched/sched.h | 7 +++++
kernel/sched/topology.c | 6 ++++
net/mptcp/protocol.c | 8 ++++--
net/mptcp/protocol.h | 3 +-
10 files changed, 92 insertions(+), 24 deletions(-)
When the CRU is configured to use ICnSVC for virtual channel mapping,
as on the RZ/{G3E, V2H/P} SoC, the ICnMC register must not be
programmed.
Return early after setting up ICnSVC to avoid overriding the ICnMC
register, which is not applicable in this mode.
This prevents unintended register programming when ICnSVC is enabled.
Fixes: 3c5ca0a48bb0 ("media: rzg2l-cru: Drop function pointer to configure CSI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tommaso Merciai <tommaso.merciai.xr(a)bp.renesas.com>
---
drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c b/drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c
index 162e2ace6931..480e9b5dbcfe 100644
--- a/drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c
+++ b/drivers/media/platform/renesas/rzg2l-cru/rzg2l-video.c
@@ -268,6 +268,8 @@ static void rzg2l_cru_csi2_setup(struct rzg2l_cru_dev *cru,
rzg2l_cru_write(cru, ICnSVCNUM, csi_vc);
rzg2l_cru_write(cru, ICnSVC, ICnSVC_SVC0(0) | ICnSVC_SVC1(1) |
ICnSVC_SVC2(2) | ICnSVC_SVC3(3));
+
+ return;
}
icnmc |= rzg2l_cru_read(cru, info->image_conv) & ~ICnMC_INF_MASK;
--
2.43.0
The arm64 kernel doesn't boot with annotated branches
(PROFILE_ANNOTATED_BRANCHES) and CONFIG_DEBUG_VIRTUAL enabled together.
Bisecting it, I found that disabling branch profiling in arch/arm64/mm
solved the problem. Narrowing down a bit further, I found that
physaddr.c is the file that needs to have branch profiling disabled to
get the machine to boot.
I suspect that it might invoke some ftrace helper very early in the boot
process and ftrace is still not enabled(!?).
Disable branch profiling for physaddr.o to allow booting an arm64
machine with CONFIG_PROFILE_ANNOTATED_BRANCHES and
CONFIG_DEBUG_VIRTUAL together.
Cc: stable(a)vger.kernel.org
Fixes: ec6d06efb0bac ("arm64: Add support for CONFIG_DEBUG_VIRTUAL")
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Another approach is to disable profiling on all arch/arm64 code, similarly to
x86, where DISABLE_BRANCH_PROFILING is called for all arch/x86 code. See
commit 2cbb20b008dba ("tracing: Disable branch profiling in noinstr
code").
---
arch/arm64/mm/Makefile | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index c26489cf96cd..8bfe2451ea26 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -14,5 +14,10 @@ obj-$(CONFIG_ARM64_MTE) += mteswap.o
obj-$(CONFIG_ARM64_GCS) += gcs.o
KASAN_SANITIZE_physaddr.o += n
+# Branch profiling isn't noinstr-safe
+ifdef CONFIG_TRACE_BRANCH_PROFILING
+CFLAGS_physaddr.o += -DDISABLE_BRANCH_PROFILING
+endif
+
obj-$(CONFIG_KASAN) += kasan_init.o
KASAN_SANITIZE_kasan_init.o := n
---
base-commit: c8ebd433459bcbf068682b09544e830acd7ed222
change-id: 20251231-annotated-75de3f33cd7b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
With PWRSTS_OFF_ON, PCIe GDSCs are turned off during gdsc_disable(). This
can happen during scenarios such as system suspend and breaks the resume
of PCIe controllers from suspend.
So use PWRSTS_RET_ON to tell the GDSC driver not to turn off the GDSCs
during gdsc_disable() and to let the hardware transition the GDSCs to
retention when the parent domain enters a low power state during system
suspend.
Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru(a)oss.qualcomm.com>
---
Krishna Chaitanya Chundru (7):
clk: qcom: gcc-sc7280: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-sa8775p: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-sm8750: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-glymur: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-qcs8300: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-x1e80100: Do not turn off PCIe GDSCs during gdsc_disable()
clk: qcom: gcc-kaanapali: Do not turn off PCIe GDSCs during gdsc_disable()
drivers/clk/qcom/gcc-glymur.c | 16 ++++++++--------
drivers/clk/qcom/gcc-kaanapali.c | 2 +-
drivers/clk/qcom/gcc-qcs8300.c | 4 ++--
drivers/clk/qcom/gcc-sa8775p.c | 4 ++--
drivers/clk/qcom/gcc-sc7280.c | 2 +-
drivers/clk/qcom/gcc-sm8750.c | 2 +-
drivers/clk/qcom/gcc-x1e80100.c | 16 ++++++++--------
7 files changed, 23 insertions(+), 23 deletions(-)
---
base-commit: 98e506ee7d10390b527aeddee7bbeaf667129646
change-id: 20260102-pci_gdsc_fix-1dcf08223922
Best regards,
--
Krishna Chaitanya Chundru <krishna.chundru(a)oss.qualcomm.com>
TCR2_ELx.E0POE is set during smp_init().
However, this bit is not reprogrammed when the CPU is suspended and
later resumes via cpu_resume(), because __cpu_setup() does not re-enable
E0POE and there is no save/restore logic for the TCR2_ELx system register.
As a result, the E0POE feature no longer works after cpu_resume().
To address this, save and restore TCR2_EL1 in the cpu_suspend()/cpu_resume()
path, rather than adding related logic to __cpu_setup(), taking into account
possible future extensions of the TCR2_ELx feature.
Cc: stable(a)vger.kernel.org
Fixes: bf83dae90fbc ("arm64: enable the Permission Overlay Extension for EL0")
Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
---
Patch History
==============
from v1 to v2:
- following @Kevin Brodsky suggestion.
- https://lore.kernel.org/all/20260105200707.2071169-1-yeoreum.yun@arm.com/
NOTE:
This patch is based on v6.19-rc4
---
arch/arm64/include/asm/suspend.h | 2 +-
arch/arm64/mm/proc.S | 8 ++++++++
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/suspend.h b/arch/arm64/include/asm/suspend.h
index e65f33edf9d6..e9ce68d50ba4 100644
--- a/arch/arm64/include/asm/suspend.h
+++ b/arch/arm64/include/asm/suspend.h
@@ -2,7 +2,7 @@
#ifndef __ASM_SUSPEND_H
#define __ASM_SUSPEND_H
-#define NR_CTX_REGS 13
+#define NR_CTX_REGS 14
#define NR_CALLEE_SAVED_REGS 12
/*
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 01e868116448..5d907ce3b6d3 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -110,6 +110,10 @@ SYM_FUNC_START(cpu_do_suspend)
* call stack.
*/
str x18, [x0, #96]
+alternative_if ARM64_HAS_TCR2
+ mrs x2, REG_TCR2_EL1
+ str x2, [x0, #104]
+alternative_else_nop_endif
ret
SYM_FUNC_END(cpu_do_suspend)
@@ -144,6 +148,10 @@ SYM_FUNC_START(cpu_do_resume)
msr tcr_el1, x8
msr vbar_el1, x9
msr mdscr_el1, x10
+alternative_if ARM64_HAS_TCR2
+ ldr x2, [x0, #104]
+ msr REG_TCR2_EL1, x2
+alternative_else_nop_endif
msr sctlr_el1, x12
set_this_cpu_offset x13
--
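The bump of NR_CTX_REGS from 13 to 14 follows from the new store offset: the context is an array of 8-byte slots, and byte offset 104 is slot index 13, which only exists once the array has 14 entries. A small userspace sketch of that arithmetic, where the struct and helper names are hypothetical and only the offsets and counts come from the diff:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CTX_REGS 14	/* value after the patch */

/* Toy model of the saved-register area: one 64-bit slot per register. */
struct toy_cpu_suspend_ctx {
	uint64_t ctx_regs[NR_CTX_REGS];
};

/* Convert a byte offset such as the "#104" in "str x2, [x0, #104]". */
static size_t ctx_slot_index(size_t byte_offset)
{
	assert(byte_offset % sizeof(uint64_t) == 0);
	return byte_offset / sizeof(uint64_t);
}

/* Does a store at byte_offset stay inside an array of nr_regs slots? */
static int ctx_slot_fits(size_t byte_offset, size_t nr_regs)
{
	return ctx_slot_index(byte_offset) < nr_regs;
}
```

With 13 slots the store at offset 104 would land one element past the array; with 14 it fits in the last slot.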
Sparse inode cluster allocation sets min/max agbno values to avoid
allocating an inode cluster that might map to an invalid inode
chunk. For example, we can't have an inode record mapped to agbno 0
or that extends past the end of a runt AG of misaligned size.
The initial calculation of max_agbno is unnecessarily conservative,
however. This has triggered a corner case allocation failure where a
small runt AG (i.e. 2063 blocks) is mostly full save for an extent
to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this
case, which happens to be the offset of the last possible valid
inode chunk in the AG. In practice, we should be able to allocate
the 4-block cluster at agbno 2052 to map to the parent inode record
at agbno 2048, but the max_agbno value precludes it.
Note that this can result in filesystem shutdown via dirty trans
cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk
all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because
the tail AG selection by the allocator sets t_highest_agno on the
transaction. If the inode allocator spins around and finds an inode
chunk with free inodes in an earlier AG, the subsequent dir name
creation path may still fail to allocate due to the AG restriction
and cancel.
To avoid this problem, update the max_agbno calculation to the agbno
prior to the last chunk aligned agbno in the AG. This is not
necessarily the last valid allocation target for a sparse chunk, but
since inode chunks (i.e. records) are chunk aligned and sparse
allocs are cluster sized/aligned, this allows the sb_spino_align
alignment restriction to take over and round down the max effective
agbno to within the last valid inode chunk in the AG.
Note that even though the allocator improvements in the
aforementioned commit seem to avoid this particular dirty trans
cancel situation, the max_agbno logic improvement still applies as
we should be able to allocate from an AG that has been appropriately
selected. The more important targets for this patch, however, are
older/stable kernels prior to this allocator rework/improvement.
Cc: <stable(a)vger.kernel.org> # v4.2
Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure")
Signed-off-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: "Darrick J. Wong" <djwong(a)kernel.org>
---
v2:
- Added misc. commit log tags.
v1: https://lore.kernel.org/linux-xfs/20260108141129.7765-1-bfoster@redhat.com/
fs/xfs/libxfs/xfs_ialloc.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
index d97295eaebe6..c19d6d713780 100644
--- a/fs/xfs/libxfs/xfs_ialloc.c
+++ b/fs/xfs/libxfs/xfs_ialloc.c
@@ -848,15 +848,16 @@ xfs_ialloc_ag_alloc(
* invalid inode records, such as records that start at agbno 0
* or extend beyond the AG.
*
- * Set min agbno to the first aligned, non-zero agbno and max to
- * the last aligned agbno that is at least one full chunk from
- * the end of the AG.
+ * Set min agbno to the first chunk aligned, non-zero agbno and
+ * max to one less than the last chunk aligned agbno from the
+ * end of the AG. We subtract 1 from max so that the cluster
+ * allocation alignment takes over and allows allocation within
+ * the last full inode chunk in the AG.
*/
args.min_agbno = args.mp->m_sb.sb_inoalignmt;
args.max_agbno = round_down(xfs_ag_block_count(args.mp,
pag_agno(pag)),
- args.mp->m_sb.sb_inoalignmt) -
- igeo->ialloc_blks;
+ args.mp->m_sb.sb_inoalignmt) - 1;
error = xfs_alloc_vextent_near_bno(&args,
xfs_agbno_to_fsb(pag,
--
2.52.0
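The numbers in the commit message can be reproduced with a short userspace calculation. This is a sketch under assumptions: an inode chunk of 8 blocks and sb_inoalignmt of 8 are assumed because they reproduce the quoted max_agbno of 2048, the 4-block cluster alignment is taken from the text, and the "alignment takes over" step is modelled as a plain round-down. The helper names are hypothetical, not XFS functions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t agblock_t;

/* Round x down to a multiple of align (align must be non-zero). */
static agblock_t round_down_to(agblock_t x, agblock_t align)
{
	assert(align != 0);
	return x - (x % align);
}

/* Old limit: last aligned agbno at least one full chunk from the AG end. */
static agblock_t old_max_agbno(agblock_t ag_blocks, agblock_t inoalignmt,
			       agblock_t ialloc_blks)
{
	return round_down_to(ag_blocks, inoalignmt) - ialloc_blks;
}

/* New limit: one less than the last chunk-aligned agbno in the AG. */
static agblock_t new_max_agbno(agblock_t ag_blocks, agblock_t inoalignmt)
{
	return round_down_to(ag_blocks, inoalignmt) - 1;
}

/* Highest cluster-aligned start that does not exceed max_agbno. */
static agblock_t highest_cluster_start(agblock_t max_agbno,
				       agblock_t spino_align)
{
	return round_down_to(max_agbno, spino_align);
}
```

For the 2063-block AG in the example, the old limit caps allocation at agbno 2048, while the new limit of 2055 rounds down to 2052, the cluster the commit message says should be allocatable inside the chunk at 2048.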
Since commit c6e126de43e7 ("of: Keep track of populated platform
devices") child devices will not be created by of_platform_populate()
if the devices had previously been deregistered individually so that the
OF_POPULATED flag is still set in the corresponding OF nodes.
Switch to using of_platform_depopulate() instead of open coding so that
the child devices are created if the driver is rebound.
Fixes: c6e126de43e7 ("of: Keep track of populated platform devices")
Cc: stable(a)vger.kernel.org # 3.16
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/mfd/omap-usb-host.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/mfd/omap-usb-host.c b/drivers/mfd/omap-usb-host.c
index a77b6fc790f2..4d29a6e2ed87 100644
--- a/drivers/mfd/omap-usb-host.c
+++ b/drivers/mfd/omap-usb-host.c
@@ -819,8 +819,10 @@ static void usbhs_omap_remove(struct platform_device *pdev)
{
pm_runtime_disable(&pdev->dev);
- /* remove children */
- device_for_each_child(&pdev->dev, NULL, usbhs_omap_remove_child);
+ if (pdev->dev.of_node)
+ of_platform_depopulate(&pdev->dev);
+ else
+ device_for_each_child(&pdev->dev, NULL, usbhs_omap_remove_child);
}
static const struct dev_pm_ops usbhsomap_dev_pm_ops = {
--
2.51.2
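The rebind problem described above can be pictured with a toy model of the OF_POPULATED flag. Everything here is hypothetical userspace code rather than the OF core: it only assumes, as the commit message states, that populate skips nodes whose flag is set, that unregistering a child individually leaves the flag set, and that depopulate clears it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_of_node {
	bool populated;		/* models the OF_POPULATED flag */
	bool has_device;	/* models a registered child device */
};

/* Models of_platform_populate(); returns the number of devices created. */
static int toy_populate(struct toy_of_node *nodes, size_t n)
{
	int created = 0;

	assert(nodes != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].populated)
			continue;	/* already populated: skipped */
		nodes[i].populated = true;
		nodes[i].has_device = true;
		created++;
	}
	return created;
}

/* Models open-coded removal: the device goes away, the flag stays set. */
static void toy_unregister_children(struct toy_of_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++)
		nodes[i].has_device = false;
}

/* Models of_platform_depopulate(): removes devices and clears the flag. */
static void toy_depopulate(struct toy_of_node *nodes, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		nodes[i].has_device = false;
		nodes[i].populated = false;
	}
}
```

In this model a populate call after toy_unregister_children() creates nothing, while a populate call after toy_depopulate() recreates every child, which mirrors the rebind behaviour the patch is after.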
From ade501a5ea27db18e827054d812ea6cc4679b65e Mon Sep 17 00:00:00 2001
From: Ionut Nechita <ionut.nechita(a)windriver.com>
Date: Tue, 23 Dec 2025 12:29:14 +0200
Subject: [PATCH] block/blk-mq: fix RT kernel regression with dedicated
quiesce_sync_lock
On RT kernels (PREEMPT_RT), commit 679b1874eba7 ("block: fix ordering
between checking QUEUE_FLAG_QUIESCED request adding") causes a severe
performance regression on systems with multiple MSI-X interrupt vectors.
The commit added spinlock_t queue_lock to blk_mq_run_hw_queue() to
synchronize QUEUE_FLAG_QUIESCED checks with blk_mq_unquiesce_queue().
While this works correctly on a standard kernel, it causes catastrophic
serialization on an RT kernel, where spinlock_t converts to a sleeping
rt_mutex.
Problem in RT kernel:
- blk_mq_run_hw_queue() is called from IRQ thread context (I/O completion)
- With 8 MSI-X vectors, all 8 IRQ threads contend on the same queue_lock
- queue_lock becomes rt_mutex (sleeping) in RT kernel
- IRQ threads serialize and enter D-state waiting for lock
- Throughput drops from 640 MB/s to 153 MB/s
The original commit message noted that memory barriers were considered
but rejected because "memory barrier is not easy to be maintained" -
barriers would need to be added at multiple call sites throughout the
block layer where work is added before calling blk_mq_run_hw_queue().
Solution:
Instead of using the general-purpose queue_lock or attempting complex
memory barrier pairing across many call sites, introduce a dedicated
raw_spinlock_t quiesce_sync_lock specifically for synchronizing the
quiesce state between:
- blk_mq_quiesce_queue_nowait()
- blk_mq_unquiesce_queue()
- blk_mq_run_hw_queue()
Why raw_spinlock is safe:
- Critical section is provably short (only flag and counter checks)
- No sleeping operations under lock
- raw_spinlock does not convert to rt_mutex in RT kernel
- Provides same ordering guarantees as original queue_lock approach
This approach:
- Maintains correctness of original synchronization
- Avoids sleeping in RT kernel's IRQ thread context
- Limits scope to only quiesce-related synchronization
- Simpler than auditing all call sites for memory barrier pairing
Additionally, change blk_freeze_queue_start to use async=true for better
performance in RT kernel by avoiding synchronous queue runs during freeze.
Test results on RT kernel (megaraid_sas with 8 MSI-X vectors):
- Before: 153 MB/s, 6-8 IRQ threads in D-state
- After: 640 MB/s, 0 IRQ threads blocked
Fixes: 679b1874eba7 ("block: fix ordering between checking QUEUE_FLAG_QUIESCED request adding")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ionut Nechita <ionut.nechita(a)windriver.com>
---
block/blk-core.c | 1 +
block/blk-mq.c | 30 +++++++++++++++++++-----------
include/linux/blkdev.h | 6 ++++++
3 files changed, 26 insertions(+), 11 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index c7b6c1f76359..33a954422415 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -434,6 +434,7 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
mutex_init(&q->limits_lock);
mutex_init(&q->rq_qos_mutex);
spin_lock_init(&q->queue_lock);
+ raw_spin_lock_init(&q->quiesce_sync_lock);
init_waitqueue_head(&q->mq_freeze_wq);
mutex_init(&q->mq_freeze_lock);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index e1bca29dc358..c7ca2f485e8e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -178,7 +178,7 @@ bool __blk_freeze_queue_start(struct request_queue *q,
percpu_ref_kill(&q->q_usage_counter);
mutex_unlock(&q->mq_freeze_lock);
if (queue_is_mq(q))
- blk_mq_run_hw_queues(q, false);
+ blk_mq_run_hw_queues(q, true);
} else {
mutex_unlock(&q->mq_freeze_lock);
}
@@ -289,10 +289,10 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
{
unsigned long flags;
- spin_lock_irqsave(&q->queue_lock, flags);
+ raw_spin_lock_irqsave(&q->quiesce_sync_lock, flags);
if (!q->quiesce_depth++)
blk_queue_flag_set(QUEUE_FLAG_QUIESCED, q);
- spin_unlock_irqrestore(&q->queue_lock, flags);
+ raw_spin_unlock_irqrestore(&q->quiesce_sync_lock, flags);
}
EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
@@ -344,14 +344,14 @@ void blk_mq_unquiesce_queue(struct request_queue *q)
unsigned long flags;
bool run_queue = false;
- spin_lock_irqsave(&q->queue_lock, flags);
+ raw_spin_lock_irqsave(&q->quiesce_sync_lock, flags);
if (WARN_ON_ONCE(q->quiesce_depth <= 0)) {
;
} else if (!--q->quiesce_depth) {
blk_queue_flag_clear(QUEUE_FLAG_QUIESCED, q);
run_queue = true;
}
- spin_unlock_irqrestore(&q->queue_lock, flags);
+ raw_spin_unlock_irqrestore(&q->quiesce_sync_lock, flags);
/* dispatch requests which are inserted during quiescing */
if (run_queue)
@@ -2323,19 +2323,27 @@ void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async)
might_sleep_if(!async && hctx->flags & BLK_MQ_F_BLOCKING);
+ /*
+ * First lockless check to avoid unnecessary overhead.
+ */
need_run = blk_mq_hw_queue_need_run(hctx);
if (!need_run) {
unsigned long flags;
/*
- * Synchronize with blk_mq_unquiesce_queue(), because we check
- * if hw queue is quiesced locklessly above, we need the use
- * ->queue_lock to make sure we see the up-to-date status to
- * not miss rerunning the hw queue.
+ * Synchronize with blk_mq_unquiesce_queue(). We check if hw
+ * queue is quiesced locklessly above, so we need to use
+ * quiesce_sync_lock to ensure we see the up-to-date status
+ * and don't miss rerunning the hw queue.
+ *
+ * Uses raw_spinlock to avoid sleeping in RT kernel's IRQ
+ * thread context during I/O completion. Critical section is
+ * short (only flag and counter checks), making raw_spinlock
+ * safe.
*/
- spin_lock_irqsave(&hctx->queue->queue_lock, flags);
+ raw_spin_lock_irqsave(&hctx->queue->quiesce_sync_lock, flags);
need_run = blk_mq_hw_queue_need_run(hctx);
- spin_unlock_irqrestore(&hctx->queue->queue_lock, flags);
+ raw_spin_unlock_irqrestore(&hctx->queue->quiesce_sync_lock, flags);
if (!need_run)
return;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cd9c97f6f948..0f651a4fae8d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -480,6 +480,12 @@ struct request_queue {
struct request *last_merge;
spinlock_t queue_lock;
+ /*
+ * Synchronizes quiesce state checks between blk_mq_run_hw_queue()
+ * and blk_mq_unquiesce_queue(). Uses raw_spinlock to avoid sleeping
+ * in RT kernel's IRQ thread context during I/O completion.
+ */
+ raw_spinlock_t quiesce_sync_lock;
int quiesce_depth;
--
2.43.0
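The check/recheck pattern this patch keeps intact can be sketched in userspace C. This is a simplified model, not block-layer code: the toy_ names are hypothetical, a C11 atomic_flag spin loop stands in for raw_spinlock_t, a boolean stands in for QUEUE_FLAG_QUIESCED, and "has_work" stands in for whatever makes the hardware queue worth running.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_queue {
	atomic_flag quiesce_sync_lock;	/* stand-in for raw_spinlock_t */
	int quiesce_depth;		/* protected by the lock */
	atomic_bool quiesced;		/* stand-in for QUEUE_FLAG_QUIESCED */
	bool has_work;
};

static void toy_queue_init(struct toy_queue *q, bool has_work)
{
	atomic_flag_clear(&q->quiesce_sync_lock);
	atomic_init(&q->quiesced, false);
	q->quiesce_depth = 0;
	q->has_work = has_work;
}

static void toy_lock(struct toy_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->quiesce_sync_lock,
						 memory_order_acquire))
		;	/* busy-wait: never sleeps */
}

static void toy_unlock(struct toy_queue *q)
{
	atomic_flag_clear_explicit(&q->quiesce_sync_lock,
				   memory_order_release);
}

static void toy_quiesce(struct toy_queue *q)
{
	toy_lock(q);
	if (!q->quiesce_depth++)
		atomic_store(&q->quiesced, true);
	toy_unlock(q);
}

/* Returns true when the last quiesce reference was dropped. */
static bool toy_unquiesce(struct toy_queue *q)
{
	bool run_queue = false;

	toy_lock(q);
	assert(q->quiesce_depth > 0);	/* the kernel warns instead */
	if (!--q->quiesce_depth) {
		atomic_store(&q->quiesced, false);
		run_queue = true;
	}
	toy_unlock(q);
	return run_queue;
}

static bool toy_need_run(struct toy_queue *q)
{
	return !atomic_load(&q->quiesced) && q->has_work;
}

/* Lockless check first; recheck under the lock before giving up. */
static bool toy_run_hw_queue(struct toy_queue *q)
{
	bool need_run = toy_need_run(q);

	if (!need_run) {
		toy_lock(q);
		need_run = toy_need_run(q);
		toy_unlock(q);
	}
	return need_run;
}
```

The critical sections contain only a counter update and a flag access, which is the property the commit message relies on when arguing that a non-sleeping lock is acceptable here.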
Now that the upstream code has been getting broader test coverage by our
users we occasionally see issues with USB2 devices plugged in during boot.
Before Linux is running, the USB2 PHY has usually been running in device
mode and it turns out that sometimes host->device or device->host
transitions don't work.
The root cause: If the role inside the USB2 PHY is re-configured when it
has already been powered on or when dwc3 has already enabled the ULPI
interface, the new configuration sometimes doesn't take effect until dwc3
is reset again. Fix this rare issue by configuring the role much earlier.
Note that the USB3 PHY does not suffer from this issue and actually
requires dwc3 to be up before the correct role can be configured there.
Reported-by: James Calligeros <jcalligeros99(a)gmail.com>
Reported-by: Janne Grunau <j(a)jannau.net>
Fixes: 0ec946d32ef7 ("usb: dwc3: Add Apple Silicon DWC3 glue layer driver")
Cc: stable(a)vger.kernel.org
Tested-by: Janne Grunau <j(a)jannau.net>
Reviewed-by: Janne Grunau <j(a)jannau.net>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Sven Peter <sven(a)kernel.org>
---
Changes in v2:
- Picked up tags, thanks!
- Fixed a typo in the commit messages (dwc2 -> dwc3)
- Link to v1: https://patch.msgid.link/20260108-dwc3-apple-usb2phy-fix-v1-1-5dd7bc642040@…
---
drivers/usb/dwc3/dwc3-apple.c | 48 +++++++++++++++++++++++++++++--------------
1 file changed, 33 insertions(+), 15 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-apple.c b/drivers/usb/dwc3/dwc3-apple.c
index cc47cad232e397ac4498b09165dfdb5bd215ded7..c2ae8eb21d514e5e493d2927bc12908c308dfe19 100644
--- a/drivers/usb/dwc3/dwc3-apple.c
+++ b/drivers/usb/dwc3/dwc3-apple.c
@@ -218,25 +218,31 @@ static int dwc3_apple_core_init(struct dwc3_apple *appledwc)
return ret;
}
-static void dwc3_apple_phy_set_mode(struct dwc3_apple *appledwc, enum phy_mode mode)
-{
- lockdep_assert_held(&appledwc->lock);
-
- /*
- * This platform requires SUSPHY to be enabled here already in order to properly configure
- * the PHY and switch dwc3's PIPE interface to USB3 PHY.
- */
- dwc3_enable_susphy(&appledwc->dwc, true);
- phy_set_mode(appledwc->dwc.usb2_generic_phy[0], mode);
- phy_set_mode(appledwc->dwc.usb3_generic_phy[0], mode);
-}
-
static int dwc3_apple_init(struct dwc3_apple *appledwc, enum dwc3_apple_state state)
{
int ret, ret_reset;
lockdep_assert_held(&appledwc->lock);
+ /*
+ * The USB2 PHY on this platform must be configured for host or device mode while it is
+ * still powered off and before dwc3 tries to access it. Otherwise, the new configuration
+ * will sometimes only take effect after the *next* time dwc3 is brought up which causes
+ * the connected device to just not work.
+ * The USB3 PHY must be configured later after dwc3 has already been initialized.
+ */
+ switch (state) {
+ case DWC3_APPLE_HOST:
+ phy_set_mode(appledwc->dwc.usb2_generic_phy[0], PHY_MODE_USB_HOST);
+ break;
+ case DWC3_APPLE_DEVICE:
+ phy_set_mode(appledwc->dwc.usb2_generic_phy[0], PHY_MODE_USB_DEVICE);
+ break;
+ default:
+ /* Unreachable unless there's a bug in this driver */
+ return -EINVAL;
+ }
+
ret = reset_control_deassert(appledwc->reset);
if (ret) {
dev_err(appledwc->dev, "Failed to deassert reset, err=%d\n", ret);
@@ -257,7 +263,13 @@ static int dwc3_apple_init(struct dwc3_apple *appledwc, enum dwc3_apple_state st
case DWC3_APPLE_HOST:
appledwc->dwc.dr_mode = USB_DR_MODE_HOST;
dwc3_apple_set_ptrcap(appledwc, DWC3_GCTL_PRTCAP_HOST);
- dwc3_apple_phy_set_mode(appledwc, PHY_MODE_USB_HOST);
+ /*
+ * This platform requires SUSPHY to be enabled here already in order to properly
+ * configure the PHY and switch dwc3's PIPE interface to USB3 PHY. The USB2 PHY
+ * has already been configured to the correct mode earlier.
+ */
+ dwc3_enable_susphy(&appledwc->dwc, true);
+ phy_set_mode(appledwc->dwc.usb3_generic_phy[0], PHY_MODE_USB_HOST);
ret = dwc3_host_init(&appledwc->dwc);
if (ret) {
dev_err(appledwc->dev, "Failed to initialize host, ret=%d\n", ret);
@@ -268,7 +280,13 @@ static int dwc3_apple_init(struct dwc3_apple *appledwc, enum dwc3_apple_state st
case DWC3_APPLE_DEVICE:
appledwc->dwc.dr_mode = USB_DR_MODE_PERIPHERAL;
dwc3_apple_set_ptrcap(appledwc, DWC3_GCTL_PRTCAP_DEVICE);
- dwc3_apple_phy_set_mode(appledwc, PHY_MODE_USB_DEVICE);
+ /*
+ * This platform requires SUSPHY to be enabled here already in order to properly
+ * configure the PHY and switch dwc3's PIPE interface to USB3 PHY. The USB2 PHY
+ * has already been configured to the correct mode earlier.
+ */
+ dwc3_enable_susphy(&appledwc->dwc, true);
+ phy_set_mode(appledwc->dwc.usb3_generic_phy[0], PHY_MODE_USB_DEVICE);
ret = dwc3_gadget_init(&appledwc->dwc);
if (ret) {
dev_err(appledwc->dev, "Failed to initialize gadget, ret=%d\n", ret);
---
base-commit: 8f0b4cce4481fb22653697cced8d0d04027cb1e8
change-id: 20260108-dwc3-apple-usb2phy-fix-cf1d26018dd0
Best regards,
--
Sven Peter <sven(a)kernel.org>
The error path of xfs_attr_leaf_hasname() can leave a NULL
xfs_buf pointer. xfs_has_attr() checks for the NULL pointer but
the other callers do not.
We tripped over the NULL pointer in xfs_attr_leaf_get() but fix
the other callers too.
Fixes: 07120f1abdff ("xfs: Add xfs_has_attr and subroutines")
No reproducer.
Cc: stable(a)vger.kernel.org # v5.19+ with another port for v5.9 - v5.18
Signed-off-by: Mark Tinguely <mark.tinguely(a)oracle.com>
---
fs/xfs/libxfs/xfs_attr.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 8c04acd30d48..25e2ecf20d14 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1266,7 +1266,8 @@ xfs_attr_leaf_removename(
error = xfs_attr_leaf_hasname(args, &bp);
if (error == -ENOATTR) {
- xfs_trans_brelse(args->trans, bp);
+ if (bp)
+ xfs_trans_brelse(args->trans, bp);
if (args->op_flags & XFS_DA_OP_RECOVERY)
return 0;
return error;
@@ -1305,7 +1306,8 @@ xfs_attr_leaf_get(xfs_da_args_t *args)
error = xfs_attr_leaf_hasname(args, &bp);
if (error == -ENOATTR) {
- xfs_trans_brelse(args->trans, bp);
+ if (bp)
+ xfs_trans_brelse(args->trans, bp);
return error;
} else if (error != -EEXIST)
return error;
--
2.50.1 (Apple Git-155)
for_each_available_child_of_node() calls of_node_put() to release
child_np on each successful iteration. After breaking out of the loop
with child_np already released, the code jumps to the put_child label
if devm_request_threaded_irq() fails and calls of_node_put() again,
which results in a double free.
Fix by returning directly to avoid the duplicate of_node_put().
Fixes: ed2b5a8e6b98 ("phy: phy-rockchip-inno-usb2: support muxed interrupts")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wentao Liang <vulab(a)iscas.ac.cn>
---
Changes in v2:
- Drop error jumping label.
---
drivers/phy/rockchip/phy-rockchip-inno-usb2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/phy/rockchip/phy-rockchip-inno-usb2.c b/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
index b0f23690ec30..fe97a26297af 100644
--- a/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
+++ b/drivers/phy/rockchip/phy-rockchip-inno-usb2.c
@@ -1491,7 +1491,7 @@ static int rockchip_usb2phy_probe(struct platform_device *pdev)
rphy);
if (ret) {
dev_err_probe(rphy->dev, ret, "failed to request usb2phy irq handle\n");
- goto put_child;
+ return ret;
}
}
--
2.34.1
Since commit c6e126de43e7 ("of: Keep track of populated platform
devices") child devices will not be created by of_platform_populate()
if the devices had previously been deregistered individually so that the
OF_POPULATED flag is still set in the corresponding OF nodes.
Switch to using of_platform_depopulate() instead of open coding so that
the child devices are created if the driver is rebound.
Fixes: c6e126de43e7 ("of: Keep track of populated platform devices")
Cc: stable(a)vger.kernel.org # 3.16
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/mfd/qcom-pm8xxx.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/mfd/qcom-pm8xxx.c b/drivers/mfd/qcom-pm8xxx.c
index 1149f7102a36..0cf374c015ce 100644
--- a/drivers/mfd/qcom-pm8xxx.c
+++ b/drivers/mfd/qcom-pm8xxx.c
@@ -577,17 +577,11 @@ static int pm8xxx_probe(struct platform_device *pdev)
return rc;
}
-static int pm8xxx_remove_child(struct device *dev, void *unused)
-{
- platform_device_unregister(to_platform_device(dev));
- return 0;
-}
-
static void pm8xxx_remove(struct platform_device *pdev)
{
struct pm_irq_chip *chip = platform_get_drvdata(pdev);
- device_for_each_child(&pdev->dev, NULL, pm8xxx_remove_child);
+ of_platform_depopulate(&pdev->dev);
irq_domain_remove(chip->irqdomain);
}
--
2.51.2