When CPU 1 enters the nohz_full state, and the kworker on CPU 0 executes
the function sched_tick_remote, holding the lock on CPU1's rq
and triggering the warning WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3).
This leads to the process of printing the warning message, where the
console_sem semaphore is held. At this point, the print task on the
CPU1's rq cannot acquire the console_sem and joins the wait queue,
entering the UNINTERRUPTIBLE state. It waits for the console_sem to be
released and then wakes up. After the task on CPU 0 releases
the console_sem, it wakes up the waiting console_sem task.
In try_to_wake_up, it attempts to acquire the lock on CPU1's rq again,
resulting in a deadlock.
The triggering scenario is as follows:
CPU0 CPU1
sched_tick_remote
WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)
report_bug con_write
printk
console_unlock
do_con_write
console_lock
down(&console_sem)
list_add_tail(&waiter.list, &sem->wait_list);
up(&console_sem)
wake_up_q(&wake_q)
try_to_wake_up
__task_rq_lock
_raw_spin_lock
This patch fixes the issue by deffering all printk console printing
during the lock holding period.
Fixes: d84b31313ef8 ("sched/isolation: Offload residual 1Hz scheduler tick")
Signed-off-by: Wang Tao <wangtao554(a)huawei.com>
---
kernel/sched/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index be00629f0ba4..8b2d5b5bfb93 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5723,8 +5723,10 @@ static void sched_tick_remote(struct work_struct *work)
* Make sure the next tick runs within a
* reasonable amount of time.
*/
+ printk_deferred_enter();
u64 delta = rq_clock_task(rq) - curr->se.exec_start;
WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);
+ printk_deferred_exit();
}
curr->sched_class->task_tick(rq, curr, 0);
--
2.34.1
Using an OOB offset past end of the available OOB data is invalid,
irregardless of whether the 'ooblen' is set in the ops or not. Move
the relevant check out from the if statement to always verify that.
The 'oobtest' module executes four tests to verify how reading/writing
OOB data past end of the devices is handled. It expects errors in case
of these tests, but this expectation fails in the last two tests on
MTD devices, which have no OOB bytes available.
This is indicated in the test output like the following:
[ 212.059416] mtd_oobtest: attempting to write past end of device
[ 212.060379] mtd_oobtest: an error is expected...
[ 212.066353] mtd_oobtest: error: wrote past end of device
[ 212.071142] mtd_oobtest: attempting to read past end of device
[ 212.076507] mtd_oobtest: an error is expected...
[ 212.082080] mtd_oobtest: error: read past end of device
...
[ 212.330508] mtd_oobtest: finished with 2 errors
For reference, here is the corresponding code from the oobtest module:
/* Attempt to write off end of device */
ops.mode = MTD_OPS_AUTO_OOB;
ops.len = 0;
ops.retlen = 0;
ops.ooblen = mtd->oobavail;
ops.oobretlen = 0;
ops.ooboffs = 1;
ops.datbuf = NULL;
ops.oobbuf = writebuf;
pr_info("attempting to write past end of device\n");
pr_info("an error is expected...\n");
err = mtd_write_oob(mtd, mtd->size - mtd->writesize, &ops);
if (err) {
pr_info("error occurred as expected\n");
} else {
pr_err("error: wrote past end of device\n");
errcnt += 1;
}
As it can be seen, the code sets 'ooboffs' to 1, and 'ooblen' to
mtd->oobavail which is zero in our case.
Since the mtd_check_oob_ops() function only verifies 'ooboffs' if 'ooblen'
is not zero, the 'ooboffs' value does not gets validated and the function
returns success whereas it should fail.
After the change, the oobtest module will bail out early with an error if
there are no OOB bytes available on the MDT device under test:
# cat /sys/class/mtd/mtd0/oobavail
0
# insmod mtd_test; insmod mtd_oobtest dev=0
[ 943.606228]
[ 943.606259] =================================================
[ 943.606784] mtd_oobtest: MTD device: 0
[ 943.612660] mtd_oobtest: MTD device size 524288, eraseblock size 131072, page size 2048, count of eraseblocks 4, pages per eraseblock 64, OOB size 128
[ 943.616091] mtd_test: scanning for bad eraseblocks
[ 943.629571] mtd_test: scanned 4 eraseblocks, 0 are bad
[ 943.634313] mtd_oobtest: test 1 of 5
[ 943.653402] mtd_oobtest: writing OOBs of whole device
[ 943.653424] mtd_oobtest: error: writeoob failed at 0x0
[ 943.657419] mtd_oobtest: error: use_len 0, use_offset 0
[ 943.662493] mtd_oobtest: error -22 occurred
[ 943.667574] =================================================
This behaviour is more accurate than the current one where most tests
are indicating successful writing of OOB data even that in fact nothing
gets written into the device, which is quite misleading.
Cc: stable(a)vger.kernel.org
Fixes: 5cdd929da53d ("mtd: Add sanity checks in mtd_write/read_oob()")
Reviewed-by: Daniel Golle <daniel(a)makrotopia.org>
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
---
Changes in v2:
- add Reviewed-by tag from Daniel
- add stable and Fixes tags
- Link to v1: https://lore.kernel.org/r/20250831-mtd-validate-ooboffs-v1-1-d3fdce7a8698@g…
---
drivers/mtd/mtdcore.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 5ba9a741f5ac3c297ae21329c2827baf5dc471f0..9a3c9f163219bcb9fde66839f228fd8d38310f2d 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -1590,12 +1590,12 @@ static int mtd_check_oob_ops(struct mtd_info *mtd, loff_t offs,
if (offs < 0 || offs + ops->len > mtd->size)
return -EINVAL;
+ if (ops->ooboffs >= mtd_oobavail(mtd, ops))
+ return -EINVAL;
+
if (ops->ooblen) {
size_t maxooblen;
- if (ops->ooboffs >= mtd_oobavail(mtd, ops))
- return -EINVAL;
-
maxooblen = ((size_t)(mtd_div_by_ws(mtd->size, mtd) -
mtd_div_by_ws(offs, mtd)) *
mtd_oobavail(mtd, ops)) - ops->ooboffs;
---
base-commit: 1b237f190eb3d36f52dffe07a40b5eb210280e00
change-id: 20250831-mtd-validate-ooboffs-e35c796540fe
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
We need to increment i_fastreg_wrs before we bail out from
rds_ib_post_reg_frmr().
We have a fixed budget of how many FRWR operations that can be
outstanding using the dedicated QP used for memory registrations and
de-registrations. This budget is enforced by the atomic_t
i_fastreg_wrs. If we bail out early in rds_ib_post_reg_frmr(), we will
"leak" the possibility of posting an FRWR operation, and if that
accumulates, no FRWR operation can be carried out.
Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Fixes: 3a2886cca703 ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown")
Cc: stable(a)vger.kernel.org
Signed-off-by: Håkon Bugge <haakon.bugge(a)oracle.com>
Reviewed-by: Allison Henderson <allison.henderson(a)oracle.com>
---
v3 -> v4:
* Removed unused "out:" label
* Added Allison's r-b
v2 -> v3:
* Amended commit message
* Removed indentation of this section
* Fixing error path from ib_post_send()
v1 -> v2: Added Cc: stable(a)vger.kernel.org
---
net/rds/ib_frmr.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
index 28c1b00221780..bd861191157b5 100644
--- a/net/rds/ib_frmr.c
+++ b/net/rds/ib_frmr.c
@@ -133,12 +133,15 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_dma_len,
&off, PAGE_SIZE);
- if (unlikely(ret != ibmr->sg_dma_len))
- return ret < 0 ? ret : -EINVAL;
+ if (unlikely(ret != ibmr->sg_dma_len)) {
+ ret = ret < 0 ? ret : -EINVAL;
+ goto out_inc;
+ }
- if (cmpxchg(&frmr->fr_state,
- FRMR_IS_FREE, FRMR_IS_INUSE) != FRMR_IS_FREE)
- return -EBUSY;
+ if (cmpxchg(&frmr->fr_state, FRMR_IS_FREE, FRMR_IS_INUSE) != FRMR_IS_FREE) {
+ ret = -EBUSY;
+ goto out_inc;
+ }
atomic_inc(&ibmr->ic->i_fastreg_inuse_count);
@@ -166,11 +169,10 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
/* Failure here can be because of -ENOMEM as well */
rds_transition_frwr_state(ibmr, FRMR_IS_INUSE, FRMR_IS_STALE);
- atomic_inc(&ibmr->ic->i_fastreg_wrs);
if (printk_ratelimit())
pr_warn("RDS/IB: %s returned error(%d)\n",
__func__, ret);
- goto out;
+ goto out_inc;
}
/* Wait for the registration to complete in order to prevent an invalid
@@ -179,8 +181,10 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
*/
wait_event(frmr->fr_reg_done, !frmr->fr_reg);
-out:
+ return ret;
+out_inc:
+ atomic_inc(&ibmr->ic->i_fastreg_wrs);
return ret;
}
--
2.43.5
--
Attn: Sir/Madam,
We are interested in your services and invite your company to register
for Vendor/Contractor Partnership with Etihad Aviation Group for our
2025/2026 projects.
If you wish to participate, please confirm your interest, and we will
send the Vendor Questionnaire and EOI.
Thank you.
Kind Regards,
Engr, Tami Hassan,
Vendor Procurement & Contracts Shared Services Center
Etihad Aviation Group PJSC
We need to increment i_fastreg_wrs before we bail out from
rds_ib_post_reg_frmr().
We have a fixed budget of how many FRWR operations that can be
outstanding using the dedicated QP used for memory registrations and
de-registrations. This budget is enforced by the atomic_t
i_fastreg_wrs. If we bail out early in rds_ib_post_reg_frmr(), we will
"leak" the possibility of posting an FRWR operation, and if that
accumulates, no FRWR operation can be carried out.
Fixes: 1659185fb4d0 ("RDS: IB: Support Fastreg MR (FRMR) memory registration mode")
Fixes: 3a2886cca703 ("net/rds: Keep track of and wait for FRWR segments in use upon shutdown")
Cc: stable(a)vger.kernel.org
Signed-off-by: Håkon Bugge <haakon.bugge(a)oracle.com>
---
v2 -> v3:
* Amended commit message
* Removed indentation of this section
* Fixing error path from ib_post_send()
v1 -> v2: Added Cc: stable(a)vger.kernel.org
---
net/rds/ib_frmr.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/net/rds/ib_frmr.c b/net/rds/ib_frmr.c
index 28c1b00221780..395a99b5a65ca 100644
--- a/net/rds/ib_frmr.c
+++ b/net/rds/ib_frmr.c
@@ -133,12 +133,15 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
ret = ib_map_mr_sg_zbva(frmr->mr, ibmr->sg, ibmr->sg_dma_len,
&off, PAGE_SIZE);
- if (unlikely(ret != ibmr->sg_dma_len))
- return ret < 0 ? ret : -EINVAL;
+ if (unlikely(ret != ibmr->sg_dma_len)) {
+ ret = ret < 0 ? ret : -EINVAL;
+ goto out_inc;
+ }
- if (cmpxchg(&frmr->fr_state,
- FRMR_IS_FREE, FRMR_IS_INUSE) != FRMR_IS_FREE)
- return -EBUSY;
+ if (cmpxchg(&frmr->fr_state, FRMR_IS_FREE, FRMR_IS_INUSE) != FRMR_IS_FREE) {
+ ret = -EBUSY;
+ goto out_inc;
+ }
atomic_inc(&ibmr->ic->i_fastreg_inuse_count);
@@ -166,11 +169,10 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
/* Failure here can be because of -ENOMEM as well */
rds_transition_frwr_state(ibmr, FRMR_IS_INUSE, FRMR_IS_STALE);
- atomic_inc(&ibmr->ic->i_fastreg_wrs);
if (printk_ratelimit())
pr_warn("RDS/IB: %s returned error(%d)\n",
__func__, ret);
- goto out;
+ goto out_inc;
}
/* Wait for the registration to complete in order to prevent an invalid
@@ -178,9 +180,11 @@ static int rds_ib_post_reg_frmr(struct rds_ib_mr *ibmr)
* being accessed while registration is still pending.
*/
wait_event(frmr->fr_reg_done, !frmr->fr_reg);
-
out:
+ return ret;
+out_inc:
+ atomic_inc(&ibmr->ic->i_fastreg_wrs);
return ret;
}
--
2.43.5
When CPU 1 enters the nohz_full state, and the kworker on CPU 0 executes
the function sched_tick_remote, holding the lock on CPU1's rq
and triggering the warning WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3).
This leads to the process of printing the warning message, where the
console_sem semaphore is held. At this point, the print task on the
CPU1's rq cannot acquire the console_sem and joins the wait queue,
entering the UNINTERRUPTIBLE state. It waits for the console_sem to be
released and then wakes up. After the task on CPU 0 releases
the console_sem, it wakes up the waiting console_sem task.
In try_to_wake_up, it attempts to acquire the lock on CPU1's rq again,
resulting in a deadlock.
The triggering scenario is as follows:
CPU 0 CPU1
sched_tick_remote
WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3)
report_bug con_write
printk
console_unlock
do_con_write
console_lock
down(&console_sem)
list_add_tail(&waiter.list, &sem->wait_list);
up(&console_sem)
wake_up_q(&wake_q)
try_to_wake_up
__task_rq_lock
_raw_spin_lock
This patch fixes the issue by deffering all printk console printing
during the lock holding period.
Fixes: d84b31313ef8 ("sched/isolation: Offload residual 1Hz scheduler tick")
Signed-off-by: Wang Tao <wangtao554(a)huawei.com>
---
kernel/sched/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 40f40f359c5d..fd2c83058ec2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4091,6 +4091,7 @@ static void sched_tick_remote(struct work_struct *work)
goto out_requeue;
rq_lock_irq(rq, &rf);
+ printk_deferred_enter();
curr = rq->curr;
if (cpu_is_offline(cpu))
goto out_unlock;
@@ -4109,6 +4110,7 @@ static void sched_tick_remote(struct work_struct *work)
calc_load_nohz_remote(rq);
out_unlock:
+ printk_deferred_exit();
rq_unlock_irq(rq, &rf);
out_requeue:
--
2.34.1
From: Dimitri John Ledkov <dimitri.ledkov(a)canonical.com>
Strictly encode patterns of supported hw_variants of firmware files
the kernel driver supports requesting. This now includes many missing
and previously undeclared module firmware files for 0x07, 0x08,
0x11-0x14, 0x17-0x1b hw_variants.
This especially affects environments that only install firmware files
declared and referenced by the kernel modules. In such environments,
only the declared firmware files are copied resulting in most Intel
Bluetooth devices not working. I.e. host-only dracut-install initrds,
or Ubuntu Core kernel snaps.
BugLink: https://bugs.launchpad.net/bugs/1970819
Cc: stable(a)vger.kernel.org # 4.15+
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov(a)canonical.com>
---
Notes:
Changes since v4:
- Add missing "intel/" prefix for 0x17+ firmware
- Add Cc stable for v4.15+ kernels
Changes since v3:
- Hopefully pacify trailing whitespace from GitLint in this optional
portion of the commit.
Changes since v2:
- encode patterns for 0x17 0x18 0x19 0x1b hw_variants
- rebase on top of latest rc tag
Changes since v1:
- encode strict patterns of supported firmware files for each of the
supported hw_variant generations.
drivers/bluetooth/btintel.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/bluetooth/btintel.c b/drivers/bluetooth/btintel.c
index a657e9a3e96a..d0e22fe09567 100644
--- a/drivers/bluetooth/btintel.c
+++ b/drivers/bluetooth/btintel.c
@@ -2656,7 +2656,25 @@ MODULE_AUTHOR("Marcel Holtmann <marcel(a)holtmann.org>");
MODULE_DESCRIPTION("Bluetooth support for Intel devices ver " VERSION);
MODULE_VERSION(VERSION);
MODULE_LICENSE("GPL");
-MODULE_FIRMWARE("intel/ibt-11-5.sfi");
-MODULE_FIRMWARE("intel/ibt-11-5.ddc");
-MODULE_FIRMWARE("intel/ibt-12-16.sfi");
-MODULE_FIRMWARE("intel/ibt-12-16.ddc");
+/* hw_variant 0x07 0x08 */
+MODULE_FIRMWARE("intel/ibt-hw-37.7.*-fw-*.*.*.*.*.bseq");
+MODULE_FIRMWARE("intel/ibt-hw-37.7.bseq");
+MODULE_FIRMWARE("intel/ibt-hw-37.8.*-fw-*.*.*.*.*.bseq");
+MODULE_FIRMWARE("intel/ibt-hw-37.8.bseq");
+/* hw_variant 0x0b 0x0c */
+MODULE_FIRMWARE("intel/ibt-11-*.sfi");
+MODULE_FIRMWARE("intel/ibt-12-*.sfi");
+MODULE_FIRMWARE("intel/ibt-11-*.ddc");
+MODULE_FIRMWARE("intel/ibt-12-*.ddc");
+/* hw_variant 0x11 0x12 0x13 0x14 */
+MODULE_FIRMWARE("intel/ibt-17-*-*.sfi");
+MODULE_FIRMWARE("intel/ibt-18-*-*.sfi");
+MODULE_FIRMWARE("intel/ibt-19-*-*.sfi");
+MODULE_FIRMWARE("intel/ibt-20-*-*.sfi");
+MODULE_FIRMWARE("intel/ibt-17-*-*.ddc");
+MODULE_FIRMWARE("intel/ibt-18-*-*.ddc");
+MODULE_FIRMWARE("intel/ibt-19-*-*.ddc");
+MODULE_FIRMWARE("intel/ibt-20-*-*.ddc");
+/* hw_variant 0x17 0x18 0x19 0x1b, read and use cnvi/cnvr */
+MODULE_FIRMWARE("intel/ibt-[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9].sfi");
+MODULE_FIRMWARE("intel/ibt-[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9].ddc");
--
2.34.1
ARCH_STRICT_ALIGN is used for hardware without UAL, now it only control
the -mstrict-align flag. However, ACPI structures are packed by default
so will cause unaligned accesses.
To avoid this, define ACPI_MISALIGNMENT_NOT_SUPPORTED in asm/acenv.h to
align ACPI structures if ARCH_STRICT_ALIGN enabled.
Cc: stable(a)vger.kernel.org
Reported-by: Binbin Zhou <zhoubinbin(a)loongson.cn>
Suggested-by: Xi Ruoyao <xry111(a)xry111.site>
Suggested-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Modify asm/acenv.h instead of Makefile.
arch/loongarch/include/asm/acenv.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/loongarch/include/asm/acenv.h b/arch/loongarch/include/asm/acenv.h
index 52f298f7293b..483c955f2ae5 100644
--- a/arch/loongarch/include/asm/acenv.h
+++ b/arch/loongarch/include/asm/acenv.h
@@ -10,9 +10,8 @@
#ifndef _ASM_LOONGARCH_ACENV_H
#define _ASM_LOONGARCH_ACENV_H
-/*
- * This header is required by ACPI core, but we have nothing to fill in
- * right now. Will be updated later when needed.
- */
+#ifdef CONFIG_ARCH_STRICT_ALIGN
+#define ACPI_MISALIGNMENT_NOT_SUPPORTED
+#endif /* CONFIG_ARCH_STRICT_ALIGN */
#endif /* _ASM_LOONGARCH_ACENV_H */
--
2.47.3
The q6i2s_set_fmt() function was defined but never linked into the
I2S DAI operations, resulting DAI format settings is being ignored
during stream setup. This change fixes the issue by properly linking
the .set_fmt handler within the DAI ops.
Fixes: 30ad723b93ade ("ASoC: qdsp6: audioreach: add q6apm lpass dai support")
Cc: stable(a)vger.kernel.org
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla(a)oss.qualcomm.com>
Signed-off-by: Mohammad Rafi Shaik <mohammad.rafi.shaik(a)oss.qualcomm.com>
---
sound/soc/qcom/qdsp6/q6apm-lpass-dais.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/soc/qcom/qdsp6/q6apm-lpass-dais.c b/sound/soc/qcom/qdsp6/q6apm-lpass-dais.c
index 20974f10406b..528756f1332b 100644
--- a/sound/soc/qcom/qdsp6/q6apm-lpass-dais.c
+++ b/sound/soc/qcom/qdsp6/q6apm-lpass-dais.c
@@ -262,6 +262,7 @@ static const struct snd_soc_dai_ops q6i2s_ops = {
.shutdown = q6apm_lpass_dai_shutdown,
.set_channel_map = q6dma_set_channel_map,
.hw_params = q6dma_hw_params,
+ .set_fmt = q6i2s_set_fmt,
};
static const struct snd_soc_dai_ops q6hdmi_ops = {
--
2.34.1
Support for parsing the topology on AMD/Hygon processors using CPUID
leaf 0xb was added in commit 3986a0a805e6 ("x86/CPU/AMD: Derive CPU
topology from CPUID function 0xB when available"). In an effort to keep
all the topology parsing bits in one place, this commit also introduced
a pseudo dependency on the TOPOEXT feature to parse the CPUID leaf 0xb.
TOPOEXT feature (CPUID 0x80000001 ECX[22]) advertises the support for
Cache Properties leaf 0x8000001d and the CPUID leaf 0x8000001e EAX for
"Extended APIC ID" however support for 0xb was introduced alongside the
x2APIC support not only on AMD [1], but also historically on x86 [2].
Similar to 0xb, the support for extended CPU topology leaf 0x80000026
too does not depend on the TOPOEXT feature.
The support for these leaves is expected to be confirmed by ensuring
"leaf <= {extended_}cpuid_level" and then parsing the level 0 of the
respective leaf to confirm EBX[15:0] (LogProcAtThisLevel) is non-zero as
stated in the definition of "CPUID_Fn0000000B_EAX_x00 [Extended Topology
Enumeration] (Core::X86::Cpuid::ExtTopEnumEax0)" in Processor
Programming Reference (PPR) for AMD Family 19h Model 01h Rev B1 Vol1 [3]
Sec. 2.1.15.1 "CPUID Instruction Functions".
This has not been a problem on baremetal platforms since support for
TOPOEXT (Fam 0x15 and later) predates the support for CPUID leaf 0xb
(Fam 0x17[Zen2] and later), however, for AMD guests on QEMU, "x2apic"
feature can be enabled independent of the "topoext" feature where QEMU
expects topology and the initial APICID to be parsed using the CPUID
leaf 0xb (especially when number of cores > 255) which is populated
independent of the "topoext" feature flag.
Unconditionally call cpu_parse_topology_ext() on AMD and Hygon
processors to first parse the topology using the XTOPOLOGY leaves
(0x80000026 / 0xb) before using the TOPOEXT leaf (0x8000001e).
While at it, break down the single large comment in parse_topology_amd()
to better highlight the purpose of each CPUID leaf.
Cc: stable(a)vger.kernel.org # Only v6.9 and above; Depends on x86 topology rewrite
Link: https://lore.kernel.org/lkml/1529686927-7665-1-git-send-email-suravee.suthi… [1]
Link: https://lore.kernel.org/lkml/20080818181435.523309000@linux-os.sc.intel.com/ [2]
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 [3]
Suggested-by: Naveen N Rao (AMD) <naveen(a)kernel.org>
Fixes: 3986a0a805e6 ("x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available")
Signed-off-by: K Prateek Nayak <kprateek.nayak(a)amd.com>
---
Changelog v4..v5:
o Made a note on only targeting versions >= v6.9 for stable backports
since the fix depends on the x86 topology rewrite. (Boris)
o Renamed "has_topoext" to "has_xtopology". (Boris)
o Broke down the large comment in parse_topology_amd() to better
highlight the purpose of each leaf and the overall parsing flow.
(Boris)
---
arch/x86/kernel/cpu/topology_amd.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kernel/cpu/topology_amd.c b/arch/x86/kernel/cpu/topology_amd.c
index 827dd0dbb6e9..c79ebbb639cb 100644
--- a/arch/x86/kernel/cpu/topology_amd.c
+++ b/arch/x86/kernel/cpu/topology_amd.c
@@ -175,27 +175,30 @@ static void topoext_fixup(struct topo_scan *tscan)
static void parse_topology_amd(struct topo_scan *tscan)
{
- bool has_topoext = false;
-
/*
- * If the extended topology leaf 0x8000_001e is available
- * try to get SMT, CORE, TILE, and DIE shifts from extended
+ * Try to get SMT, CORE, TILE, and DIE shifts from extended
* CPUID leaf 0x8000_0026 on supported processors first. If
* extended CPUID leaf 0x8000_0026 is not supported, try to
- * get SMT and CORE shift from leaf 0xb first, then try to
- * get the CORE shift from leaf 0x8000_0008.
+ * get SMT and CORE shift from leaf 0xb. If either leaf is
+ * available, cpu_parse_topology_ext() will return true.
*/
- if (cpu_feature_enabled(X86_FEATURE_TOPOEXT))
- has_topoext = cpu_parse_topology_ext(tscan);
+ bool has_xtopology = cpu_parse_topology_ext(tscan);
if (cpu_feature_enabled(X86_FEATURE_AMD_HTR_CORES))
tscan->c->topo.cpu_type = cpuid_ebx(0x80000026);
- if (!has_topoext && !parse_8000_0008(tscan))
+ /*
+ * If XTOPOLOGY leaves (0x26/0xb) are not available, try to
+ * get the CORE shift from leaf 0x8000_0008 first.
+ */
+ if (!has_xtopology && !parse_8000_0008(tscan))
return;
- /* Prefer leaf 0x8000001e if available */
- if (parse_8000_001e(tscan, has_topoext))
+ /*
+ * Prefer leaf 0x8000001e if available to get the SMT shift and
+ * the initial APIC ID if XTOPOLOGY leaves are not available.
+ */
+ if (parse_8000_001e(tscan, has_xtopology))
return;
/* Try the NODEID MSR */
--
2.34.1
When turbo mode is unavailable on a Skylake-X system, executing the
command:
"echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo"
results in an unchecked MSR access error: WRMSR to 0x199
(attempted to write 0x0000000100001300).
This issue was reproduced on an OEM (Original Equipment Manufacturer)
system and is not a common problem across all Skylake-X systems.
This error occurs because the MSR 0x199 Turbo Engage Bit (bit 32) is set
when turbo mode is disabled. The issue arises when intel_pstate fails to
detect that turbo mode is disabled. Here intel_pstate relies on
MSR_IA32_MISC_ENABLE bit 38 to determine the status of turbo mode.
However, on this system, bit 38 is not set even when turbo mode is
disabled.
According to the Intel Software Developer's Manual (SDM), the BIOS sets
this bit during platform initialization to enable or disable
opportunistic processor performance operations. Logically, this bit
should be set in such cases. However, the SDM also specifies that "OS and
applications must use CPUID leaf 06H to detect processors with
opportunistic processor performance operations enabled."
Therefore, in addition to checking MSR_IA32_MISC_ENABLE bit 38, verify
that CPUID.06H:EAX[1] is 0 to accurately determine if turbo mode is
disabled.
Fixes: 4521e1a0ce17 ("cpufreq: intel_pstate: Reflect current no_turbo state correctly")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/cpufreq/intel_pstate.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index f41ed0b9e610..ba9bf06f1c77 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -598,6 +598,9 @@ static bool turbo_is_disabled(void)
{
u64 misc_en;
+ if (!cpu_feature_enabled(X86_FEATURE_IDA))
+ return true;
+
rdmsrl(MSR_IA32_MISC_ENABLE, misc_en);
return !!(misc_en & MSR_IA32_MISC_ENABLE_TURBO_DISABLE);
--
2.48.1
Fix a memory leak issue on netpoll and create a netconsole test that exposes
the problem, when run with kmemleak enabled.
This is a merge of two patches I've sent individually and are merged on
the same patchset[1][2].
Link: https://lore.kernel.org/all/20250904-netconsole_torture-v2-0-5775ed5dc366@d… [1]
Link: https://lore.kernel.org/all/20250902165426.6d6cd172@kernel.org/ [2]
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v3:
- this patchset is a merge of the fix and the selftest together as
recommended by Jakub.
Changes in v2:
- Reuse the netconsole creation from lib_netcons.sh. Thus, refactoring
the create_dynamic_target() (Jakub)
- Move the "wait" to after all the messages has been sent.
- Link to v1: https://lore.kernel.org/r/20250902-netconsole_torture-v1-1-03c6066598e9@deb…
---
Breno Leitao (3):
netpoll: fix incorrect refcount handling causing incorrect cleanup
selftest: netcons: refactor target creation
selftest: netcons: create a torture test
net/core/netpoll.c | 7 +-
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 30 +++--
.../selftests/drivers/net/netcons_torture.sh | 127 +++++++++++++++++++++
4 files changed, 152 insertions(+), 13 deletions(-)
---
base-commit: d69eb204c255c35abd9e8cb621484e8074c75eaa
change-id: 20250902-netconsole_torture-8fc23f0aca99
Best regards,
--
Breno Leitao <leitao(a)debian.org>