Sohil reported seeing a split lock warning when running a test that
generates userspace #DB:
x86/split lock detection: #DB: sigtrap_loop_64/4614 took a bus_lock trap at address: 0x4011ae
We investigated the issue and figured out:
1) The warning is a false positive.
2) It is not caused by the test itself.
3) It occurs even when Bus Lock Detection (BLD) is disabled.
4) It only happens on the first #DB on a CPU.
And the root cause is, at boot time, Linux zeros DR6. This leads to
different DR6 values depending on whether the CPU supports BLD:
1) On CPUs with BLD support, DR6 becomes 0xFFFF07F0 (bit 11, DR6.BLD,
is cleared).
2) On CPUs without BLD, DR6 becomes 0xFFFF0FF0.
Since only BLD-induced #DB exceptions clear DR6.BLD and other debug
exceptions leave it unchanged, even if the first #DB is unrelated to
BLD, DR6.BLD is still cleared. As a result, such a first #DB is
misinterpreted as a BLD #DB, and a false warning is triggerred.
Fix the bug by initializing DR6 by writing its architectural reset
value at boot time.
DR7 suffers from a similar issue, apply the same fix.
This patch set is based on tip/x86/urgent branch as of today.
Link to the previous patch set v2:
https://lore.kernel.org/lkml/20250617073234.1020644-1-xin@zytor.com/
Changes in v3:
*) Polish initialize_debug_regs() (PeterZ).
*) Rewrite the comment for DR6_RESERVED definition (Sohil and Sean).
*) Reword the patch 2's changelog using Sean's description.
*) Explain the definition of DR7_FIXED_1 (Sohil).
*) Collect TB, RB, AB (PeterZ, Sohil and Sean).
Xin Li (Intel) (2):
x86/traps: Initialize DR6 by writing its architectural reset value
x86/traps: Initialize DR7 by writing its architectural reset value
arch/x86/include/asm/debugreg.h | 19 ++++++++++++----
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/include/uapi/asm/debugreg.h | 21 ++++++++++++++++-
arch/x86/kernel/cpu/common.c | 24 ++++++++------------
arch/x86/kernel/kgdb.c | 2 +-
arch/x86/kernel/process_32.c | 2 +-
arch/x86/kernel/process_64.c | 2 +-
arch/x86/kernel/traps.c | 34 +++++++++++++++++-----------
arch/x86/kvm/x86.c | 4 ++--
9 files changed, 72 insertions(+), 38 deletions(-)
base-commit: 2aebf5ee43bf0ed225a09a30cf515d9f2813b759
--
2.49.0
From: anvithdosapati <anvithdosapati(a)google.com>
In ufshcd_host_reset_and_restore, scale up clocks only when clock
scaling is supported. Without this change cpu latency is voted for 0
(ufshcd_pm_qos_update) during resume unconditionally.
Signed-off-by: anvithdosapati <anvithdosapati(a)google.com>
Fixes: a3cd5ec55f6c7 ("scsi: ufs: add load based scaling of UFS gear")
Cc: stable(a)vger.kernel.org
---
v2:
- Update commit message
- Add Fixes and Cc stable
drivers/ufs/core/ufshcd.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 4410e7d93b7d..fac381ea2b3a 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -7802,7 +7802,8 @@ static int ufshcd_host_reset_and_restore(struct ufs_hba *hba)
hba->silence_err_logs = false;
/* scale up clocks to max frequency before full reinitialization */
- ufshcd_scale_clks(hba, ULONG_MAX, true);
+ if (ufshcd_is_clkscaling_supported(hba))
+ ufshcd_scale_clks(hba, ULONG_MAX, true);
err = ufshcd_hba_enable(hba);
--
2.50.0.rc1.591.g9c95f17f64-goog
Hi dear LKML and community, this is my first post here, so I'd
appreciate any guidance or redirection if it's due.
Starting from commit
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…,
HDMI handling for certain refresh rates on my intel iGPU is broken.
The error is still present in 6.16rc1.
Specifically, this is the command that disambiguates the newer broken
kernels:
xrandr --output HDMI-1 --auto --scale 1x1 --mode 1920x1080
--rate 120 --pos 0x0 --output eDP-1 --off
The important parts are 1920x1080 and 120Hz. When run on commits prior
to the bisected above, it behaves as expected, delivering 1920x1080 @
120Hz. When run on kernel builds with the above commit included (that
commit or later), the monitor goes completely blank. After about 30
seconds, it shuts down entirely (which I assume means that from the
monitor's perspective, HDMI got "disconnected").
On this link you can see my original report in the ArchLinux community,
where Christian Heusel (@gromit) kindly guided me through the bisection
process and built the bisection images for me to try:
https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/14…
This link also contains the bisection history.
Additional info:
* The monitor and the notebook are connected via an HDMI cable, the
monitor itself is a 4k@120Hz monitor.
* According to `lsmod | grep -E "(i915|Xe)"`, I'm using the i915 kernel
driver for the GPU.
* The GPU is an iGPU from intel, specifically `Intel Core Ultra 7 155H`.
* One symptom that disambiguates the working and non-working kernels,
besides whether they actually have the bug, is that the broken kernels
cause xrandr to additionally report the 144.05 refresh rate as possible
for the monitor, whereas the non-broken kernels consistently cause
xrandr to only list refresh rate 120 and below as possible. I'm only
ever testing the refresh rate of 120, but the presence of the 144.05
rate is correlated.
If any other information or anything is needed, please write.
Thank you,
Vas
----
#regzbot introduced: 1efd5384277eb71fce20922579061cd3acdb07cf
#regzbot title: intel iGPU with HDMI PLL stopped working at 1080p@120Hz
1efd5384
#regzbot link:
https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/145
As part of a wider cleanup trying to get rid of OF specific APIs, an
incorrect (and partially unrelated) cleanup was introduced.
The goal was to replace a device_for_each_chil_node() loop including an
additional condition inside by a macro doing both the loop and the
check on a single line.
The snippet:
device_for_each_child_node(dev, child)
if (fwnode_property_present(child, "gpio-controller"))
continue;
was replaced by:
for_each_gpiochip_node(dev, child)
which expands into:
device_for_each_child_node(dev, child)
for_each_if(fwnode_property_present(child, "gpio-controller"))
This change is actually doing the opposite of what was initially
expected, breaking the probe of this driver, breaking at the same time
the whole boot of Nuvoton platforms (no more console, the kernel WARN()).
Revert these two changes to roll back to the correct behavior.
Fixes: 693c9ecd8326 ("pinctrl: nuvoton: Reduce use of OF-specific APIs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/pinctrl/nuvoton/pinctrl-ma35.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/pinctrl/nuvoton/pinctrl-ma35.c b/drivers/pinctrl/nuvoton/pinctrl-ma35.c
index 06ae1fe8b8c5..b51704bafd81 100644
--- a/drivers/pinctrl/nuvoton/pinctrl-ma35.c
+++ b/drivers/pinctrl/nuvoton/pinctrl-ma35.c
@@ -1074,7 +1074,10 @@ static int ma35_pinctrl_probe_dt(struct platform_device *pdev, struct ma35_pinct
u32 idx = 0;
int ret;
- for_each_gpiochip_node(dev, child) {
+ device_for_each_child_node(dev, child) {
+ if (fwnode_property_present(child, "gpio-controller"))
+ continue;
+
npctl->nfunctions++;
npctl->ngroups += of_get_child_count(to_of_node(child));
}
@@ -1092,7 +1095,10 @@ static int ma35_pinctrl_probe_dt(struct platform_device *pdev, struct ma35_pinct
if (!npctl->groups)
return -ENOMEM;
- for_each_gpiochip_node(dev, child) {
+ device_for_each_child_node(dev, child) {
+ if (fwnode_property_present(child, "gpio-controller"))
+ continue;
+
ret = ma35_pinctrl_parse_functions(child, npctl, idx++);
if (ret) {
fwnode_handle_put(child);
--
2.48.1
From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
On some platforms, the UFS-reset pin has no interrupt logic in TLMM but
is nevertheless registered as a GPIO in the kernel. This enables the
user-space to trigger a BUG() in the pinctrl-msm driver by running, for
example: `gpiomon -c 0 113` on RB2.
The exact culprit is requesting pins whose intr_detection_width setting
is not 1 or 2 for interrupts. This hits a BUG() in
msm_gpio_irq_set_type(). Potentially crashing the kernel due to an
invalid request from user-space is not optimal, so let's go through the
pins and mark those that would fail the check as invalid for the irq chip
as we should not even register them as available irqs.
This function can be extended if we determine that there are more
corner-cases like this.
Fixes: f365be092572 ("pinctrl: Add Qualcomm TLMM driver")
Cc: stable(a)vger.kernel.org
Reviewed-by: Bjorn Andersson <andersson(a)kernel.org>
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
---
Changes in v2:
- expand the commit message, describing the underlying code issue in
detail
- added a newline for better readability
drivers/pinctrl/qcom/pinctrl-msm.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/pinctrl/qcom/pinctrl-msm.c b/drivers/pinctrl/qcom/pinctrl-msm.c
index f012ea88aa22c..1ff84e8c176d4 100644
--- a/drivers/pinctrl/qcom/pinctrl-msm.c
+++ b/drivers/pinctrl/qcom/pinctrl-msm.c
@@ -1038,6 +1038,25 @@ static bool msm_gpio_needs_dual_edge_parent_workaround(struct irq_data *d,
test_bit(d->hwirq, pctrl->skip_wake_irqs);
}
+static void msm_gpio_irq_init_valid_mask(struct gpio_chip *gc,
+ unsigned long *valid_mask,
+ unsigned int ngpios)
+{
+ struct msm_pinctrl *pctrl = gpiochip_get_data(gc);
+ const struct msm_pingroup *g;
+ int i;
+
+ bitmap_fill(valid_mask, ngpios);
+
+ for (i = 0; i < ngpios; i++) {
+ g = &pctrl->soc->groups[i];
+
+ if (g->intr_detection_width != 1 &&
+ g->intr_detection_width != 2)
+ clear_bit(i, valid_mask);
+ }
+}
+
static int msm_gpio_irq_set_type(struct irq_data *d, unsigned int type)
{
struct gpio_chip *gc = irq_data_get_irq_chip_data(d);
@@ -1441,6 +1460,7 @@ static int msm_gpio_init(struct msm_pinctrl *pctrl)
girq->default_type = IRQ_TYPE_NONE;
girq->handler = handle_bad_irq;
girq->parents[0] = pctrl->irq;
+ girq->init_valid_mask = msm_gpio_irq_init_valid_mask;
ret = gpiochip_add_data(&pctrl->chip, pctrl);
if (ret) {
--
2.48.1
Hi Michael, Stefan,
> From: Parav Pandit <parav(a)nvidia.com>
> Sent: 02 June 2025 08:15 AM
> To: mst(a)redhat.com; stefanha(a)redhat.com; axboe(a)kernel.dk;
> virtualization(a)lists.linux.dev; linux-block(a)vger.kernel.org
> Cc: stable(a)vger.kernel.org; NBU-Contact-Li Rongqing (EXTERNAL)
> <lirongqing(a)baidu.com>; Chaitanya Kulkarni <kch(a)nvidia.com>;
> xuanzhuo(a)linux.alibaba.com; pbonzini(a)redhat.com;
> jasowang(a)redhat.com; alok.a.tiwari(a)oracle.com; Parav Pandit
> <parav(a)nvidia.com>; Max Gurtovoy <mgurtovoy(a)nvidia.com>; Israel
> Rukshin <israelr(a)nvidia.com>
> Subject: [PATCH v5] virtio_blk: Fix disk deletion hang on device surprise
> removal
>
> When the PCI device is surprise removed, requests may not complete the
> device as the VQ is marked as broken. Due to this, the disk deletion hangs.
>
> Fix it by aborting the requests when the VQ is broken.
>
> With this fix now fio completes swiftly.
> An alternative of IO timeout has been considered, however when the driver
> knows about unresponsive block device, swiftly clearing them enables users
> and upper layers to react quickly.
>
> Verified with multiple device unplug iterations with pending requests in virtio
> used ring and some pending with the device.
>
> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> device")
> Cc: stable(a)vger.kernel.org
> Reported-by: Li RongQing <lirongqing(a)baidu.com>
> Closes:
> https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741
> @baidu.com/
> Reviewed-by: Max Gurtovoy <mgurtovoy(a)nvidia.com>
> Reviewed-by: Israel Rukshin <israelr(a)nvidia.com>
> Signed-off-by: Parav Pandit <parav(a)nvidia.com>
>
> ---
> v4->v5:
> - fixed comment style where comment to start with one empty line at start
> - Addressed comments from Alok
> - fixed typo in broken vq check
Did you get a chance to review this version where I fixed all the comments you proposed?
[..]