The documentation for `Cursor::peek_next` incorrectly describes it as
"Access the previous node without moving the cursor" when it actually
accesses the next node. Update the description to correctly state
"Access the next node without moving the cursor" to match the function
name and implementation.
Reported-by: Miguel Ojeda <ojeda(a)kernel.org>
Closes: https://github.com/Rust-for-Linux/linux/issues/1205
Fixes: 98c14e40e07a0 ("rust: rbtree: add cursor")
Cc: stable(a)vger.kernel.org
Signed-off-by: WeiKang Guo <guoweikang.kernel(a)outlook.com>
---
rust/kernel/rbtree.rs | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/rust/kernel/rbtree.rs b/rust/kernel/rbtree.rs
index 4729eb56827a..cd187e2ca328 100644
--- a/rust/kernel/rbtree.rs
+++ b/rust/kernel/rbtree.rs
@@ -985,7 +985,7 @@ pub fn peek_prev(&self) -> Option<(&K, &V)> {
self.peek(Direction::Prev)
}
- /// Access the previous node without moving the cursor.
+ /// Access the next node without moving the cursor.
pub fn peek_next(&self) -> Option<(&K, &V)> {
self.peek(Direction::Next)
}
--
2.52.0
This is the start of the stable review cycle for the 6.12.62 release.
There are 49 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 12 Dec 2025 07:29:38 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.62-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.62-rc1
Daniele Palmas <dnlplm(a)gmail.com>
bus: mhi: host: pci_generic: Add Telit FN990B40 modem support
Daniele Palmas <dnlplm(a)gmail.com>
bus: mhi: host: pci_generic: Add Telit FN920C04 modem support
Navaneeth K <knavaneeth786(a)gmail.com>
staging: rtl8723bs: fix out-of-bounds read in OnBeacon ESR IE parsing
Navaneeth K <knavaneeth786(a)gmail.com>
staging: rtl8723bs: fix stack buffer overflow in OnAssocReq IE parsing
Navaneeth K <knavaneeth786(a)gmail.com>
staging: rtl8723bs: fix out-of-bounds read in rtw_get_ie() parser
Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
comedi: check device's attached status in compat ioctls
Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
comedi: multiq3: sanitize config options in multiq3_attach()
Ian Abbott <abbotti(a)mev.co.uk>
comedi: c6xdigio: Fix invalid PNP driver unregistration
Zenm Chen <zenmchen(a)gmail.com>
wifi: rtw88: Add USB ID 2001:3329 for D-Link AC13U rev. A1
Zenm Chen <zenmchen(a)gmail.com>
wifi: rtl8xxxu: Add USB ID 2001:3328 for D-Link AN3U rev. A1
Linus Torvalds <torvalds(a)linux-foundation.org>
samples: work around glibc redefining some of our defines wrong
Huacai Chen <chenhuacai(a)kernel.org>
LoongArch: Mask all interrupts during kexec/kdump
Naoki Ueki <naoki25519(a)gmail.com>
HID: elecom: Add support for ELECOM M-XT3URBK (018F)
Antheas Kapenekakis <lkml(a)antheas.dev>
platform/x86/amd/pmc: Add spurious_8042 to Xbox Ally
Antheas Kapenekakis <lkml(a)antheas.dev>
platform/x86/amd: pmc: Add Lenovo Legion Go 2 to pmc quirk list
Jia Ston <ston.jia(a)outlook.com>
platform/x86: huawei-wmi: add keys for HONOR models
April Grimoire <april(a)aprilg.moe>
HID: apple: Add SONiX AK870 PRO to non_apple_keyboards quirk list
Armin Wolf <W_Armin(a)gmx.de>
platform/x86: acer-wmi: Ignore backlight event
Praveen Talari <praveen.talari(a)oss.qualcomm.com>
pinctrl: qcom: msm: Fix deadlock in pinmux configuration
Keith Busch <kbusch(a)kernel.org>
nvme: fix admin request_queue lifetime
Mario Limonciello (AMD) <superm1(a)kernel.org>
HID: hid-input: Extend Elan ignore battery quirk to USB
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
bfs: Reconstruct file type when loading from disk
Lushih Hsieh <bruce(a)mail.kh.edu.tw>
ALSA: usb-audio: Add native DSD quirks for PureAudio DAC series
Harish Kasiviswanathan <Harish.Kasiviswanathan(a)amd.com>
drm/amdkfd: Fix GPU mappings for APU after prefetch
Yiqi Sun <sunyiqixm(a)gmail.com>
smb: fix invalid username check in smb3_fs_context_parse_param()
Max Chou <max.chou(a)realtek.com>
Bluetooth: btrtl: Avoid loading the config file on security chips
Ian Forbes <ian.forbes(a)broadcom.com>
drm/vmwgfx: Use kref in vmw_bo_dirty
Robin Gong <yibin.gong(a)nxp.com>
spi: imx: keep dma request disabled before dma transfer setup
Alvaro Gamez Machado <alvaro.gamez(a)hazent.com>
spi: xilinx: increase number of retries before declaring stall
Song Liu <song(a)kernel.org>
ftrace: bpf: Fix IPMODIFY + DIRECT in modify_ftrace_direct()
Johan Hovold <johan(a)kernel.org>
USB: serial: kobil_sct: fix TIOCMBIS and TIOCMBIC
Johan Hovold <johan(a)kernel.org>
USB: serial: belkin_sa: fix TIOCMBIS and TIOCMBIC
Magne Bruno <magne.bruno(a)addi-data.com>
serial: add support of CPCI cards
Johan Hovold <johan(a)kernel.org>
USB: serial: ftdi_sio: match on interface number for jtag
Fabio Porcedda <fabio.porcedda(a)gmail.com>
USB: serial: option: move Telit 0x10c7 composition in the right place
Fabio Porcedda <fabio.porcedda(a)gmail.com>
USB: serial: option: add Telit Cinterion FE910C04 new compositions
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add Foxconn T99W760
Omar Sandoval <osandov(a)fb.com>
KVM: SVM: Don't skip unrelated instruction if INT3/INTO is replaced
Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
comedi: pcl818: fix null-ptr-deref in pcl818_ai_cancel()
Alexey Nepomnyashih <sdl(a)nppct.ru>
ext4: add i_data_sem protection in ext4_destroy_inline_data_nolock()
Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
locking/spinlock/debug: Fix data-race in do_raw_write_lock
Qianchang Zhao <pioooooooooip(a)gmail.com>
ksmbd: ipc: fix use-after-free in ipc_msg_send_request
Deepanshu Kartikey <kartikey406(a)gmail.com>
ext4: refresh inline data size before write operations
Ye Bin <yebin10(a)huawei.com>
jbd2: avoid bug_on in jbd2_journal_get_create_access() when file system corrupted
Bagas Sanjaya <bagasdotme(a)gmail.com>
Documentation: process: Also mention Sasha Levin as stable tree maintainer
Sabrina Dubroca <sd(a)queasysnail.net>
xfrm: flush all states in xfrm_state_fini
Sabrina Dubroca <sd(a)queasysnail.net>
xfrm: also call xfrm_state_delete_tunnel at destroy time for states that were never added
Sabrina Dubroca <sd(a)queasysnail.net>
Revert "xfrm: destroy xfrm_state synchronously on net exit path"
Sabrina Dubroca <sd(a)queasysnail.net>
xfrm: delete x->tunnel as we delete x
-------------
Diffstat:
Documentation/process/2.Process.rst | 6 ++-
Makefile | 4 +-
arch/loongarch/kernel/machine_kexec.c | 2 +
arch/x86/include/asm/kvm_host.h | 9 ++++
arch/x86/kvm/svm/svm.c | 24 +++++----
arch/x86/kvm/x86.c | 21 ++++++++
drivers/bluetooth/btrtl.c | 24 +++++----
drivers/bus/mhi/host/pci_generic.c | 52 +++++++++++++++++++
drivers/comedi/comedi_fops.c | 42 ++++++++++++---
drivers/comedi/drivers/c6xdigio.c | 46 ++++++++++++----
drivers/comedi/drivers/multiq3.c | 9 ++++
drivers/comedi/drivers/pcl818.c | 5 +-
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +
drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 12 ++---
drivers/hid/hid-apple.c | 1 +
drivers/hid/hid-elecom.c | 6 ++-
drivers/hid/hid-ids.h | 3 +-
drivers/hid/hid-input.c | 5 +-
drivers/hid/hid-quirks.c | 3 +-
drivers/net/wireless/realtek/rtl8xxxu/core.c | 3 ++
drivers/net/wireless/realtek/rtw88/rtw8822cu.c | 2 +
drivers/nvme/host/core.c | 3 +-
drivers/pinctrl/qcom/pinctrl-msm.c | 2 +-
drivers/platform/x86/acer-wmi.c | 4 ++
drivers/platform/x86/amd/pmc/pmc-quirks.c | 25 +++++++++
drivers/platform/x86/huawei-wmi.c | 4 ++
drivers/spi/spi-imx.c | 15 ++++--
drivers/spi/spi-xilinx.c | 2 +-
drivers/staging/rtl8723bs/core/rtw_ieee80211.c | 14 ++---
drivers/staging/rtl8723bs/core/rtw_mlme_ext.c | 13 +++--
drivers/tty/serial/8250/8250_pci.c | 37 +++++++++++++
drivers/usb/serial/belkin_sa.c | 28 ++++++----
drivers/usb/serial/ftdi_sio.c | 72 +++++++++-----------------
drivers/usb/serial/kobil_sct.c | 18 +++----
drivers/usb/serial/option.c | 22 ++++++--
fs/bfs/inode.c | 19 ++++++-
fs/ext4/inline.c | 14 ++++-
fs/jbd2/transaction.c | 19 +++++--
fs/smb/client/fs_context.c | 2 +-
fs/smb/server/transport_ipc.c | 7 ++-
include/net/xfrm.h | 13 ++---
kernel/locking/spinlock_debug.c | 4 +-
kernel/trace/ftrace.c | 40 ++++++++++----
net/ipv4/ipcomp.c | 2 +
net/ipv6/ipcomp6.c | 2 +
net/ipv6/xfrm6_tunnel.c | 2 +-
net/key/af_key.c | 2 +-
net/xfrm/xfrm_ipcomp.c | 1 -
net/xfrm/xfrm_state.c | 41 ++++++---------
net/xfrm/xfrm_user.c | 2 +-
samples/vfs/test-statx.c | 6 +++
samples/watch_queue/watch_test.c | 6 +++
sound/usb/quirks.c | 6 +++
53 files changed, 521 insertions(+), 207 deletions(-)
Before this change the LED was added to leds_list before led_init_core()
gets called adding it the list before led_classdev.set_brightness_work gets
initialized.
This leaves a window where led_trigger_register() of a LED's default
trigger will call led_trigger_set() which calls led_set_brightness()
which in turn will end up queueing the *uninitialized*
led_classdev.set_brightness_work.
This race gets hit by the lenovo-thinkpad-t14s EC driver which registers
2 LEDs with a default trigger provided by snd_ctl_led.ko in quick
succession. The first led_classdev_register() causes an async modprobe of
snd_ctl_led to run and that async modprobe manages to exactly hit
the window where the second LED is on the leds_list without led_init_core()
being called for it, resulting in:
------------[ cut here ]------------
WARNING: CPU: 11 PID: 5608 at kernel/workqueue.c:4234 __flush_work+0x344/0x390
Hardware name: LENOVO 21N2S01F0B/21N2S01F0B, BIOS N42ET93W (2.23 ) 09/01/2025
...
Call trace:
__flush_work+0x344/0x390 (P)
flush_work+0x2c/0x50
led_trigger_set+0x1c8/0x340
led_trigger_register+0x17c/0x1c0
led_trigger_register_simple+0x84/0xe8
snd_ctl_led_init+0x40/0xf88 [snd_ctl_led]
do_one_initcall+0x5c/0x318
do_init_module+0x9c/0x2b8
load_module+0x7e0/0x998
Close the race window by moving the adding of the LED to leds_list to
after the led_init_core() call.
Cc: Sebastian Reichel <sre(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <johannes.goede(a)oss.qualcomm.com>
---
Note no Fixes tag as this problem has been around for a long long time,
so I could not really find a good commit for the Fixes tag.
---
drivers/leds/led-class.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
index f3faf37f9a08..6b9fa060c3a1 100644
--- a/drivers/leds/led-class.c
+++ b/drivers/leds/led-class.c
@@ -560,11 +560,6 @@ int led_classdev_register_ext(struct device *parent,
#ifdef CONFIG_LEDS_BRIGHTNESS_HW_CHANGED
led_cdev->brightness_hw_changed = -1;
#endif
- /* add to the list of leds */
- down_write(&leds_list_lock);
- list_add_tail(&led_cdev->node, &leds_list);
- up_write(&leds_list_lock);
-
if (!led_cdev->max_brightness)
led_cdev->max_brightness = LED_FULL;
@@ -574,6 +569,11 @@ int led_classdev_register_ext(struct device *parent,
led_init_core(led_cdev);
+ /* add to the list of leds */
+ down_write(&leds_list_lock);
+ list_add_tail(&led_cdev->node, &leds_list);
+ up_write(&leds_list_lock);
+
#ifdef CONFIG_LEDS_TRIGGERS
led_trigger_set_default(led_cdev);
#endif
--
2.52.0
lkkbd_interrupt() schedules lk->tq with schedule_work(), and the work
handler lkkbd_reinit() dereferences the lkkbd structure and its
serio/input_dev fields.
lkkbd_disconnect() frees the lkkbd structure without cancelling this
work, so the work can run after the structure has been freed, leading
to a potential use-after-free.
Cancel the pending work in lkkbd_disconnect() before unregistering and
freeing the device, following the same pattern as sunkbd.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Minseong Kim <ii4gsp(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Minseong Kim <ii4gsp(a)gmail.com>
---
drivers/input/keyboard/lkkbd.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/input/keyboard/lkkbd.c b/drivers/input/keyboard/lkkbd.c
index c035216dd27c..72c477aab1fc 100644
--- a/drivers/input/keyboard/lkkbd.c
+++ b/drivers/input/keyboard/lkkbd.c
@@ -684,6 +684,8 @@ static void lkkbd_disconnect(struct serio *serio)
{
struct lkkbd *lk = serio_get_drvdata(serio);
+ cancel_work_sync(&lk->tq);
+
input_get_device(lk->dev);
input_unregister_device(lk->dev);
serio_close(serio);
--
2.39.5
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 4368a3b96c427ea3299be8cedb28684ba4157529
Gitweb: https://git.kernel.org/tip/4368a3b96c427ea3299be8cedb28684ba4157529
Author: Yazen Ghannam <yazen.ghannam(a)amd.com>
AuthorDate: Tue, 11 Nov 2025 14:53:57
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitterDate: Fri, 12 Dec 2025 09:38:30 +01:00
x86/acpi/boot: Correct acpi_is_processor_usable() check again
ACPI v6.3 defined a new "Online Capable" MADT LAPIC flag. This bit is
used in conjunction with the "Enabled" MADT LAPIC flag to determine if
a CPU can be enabled/hotplugged by the OS after boot.
Before the new bit was defined, the "Enabled" bit was explicitly
described like this (ACPI v6.0 wording provided):
"If zero, this processor is unusable, and the operating system
support will not attempt to use it"
This means that CPU hotplug (based on MADT) is not possible. Many BIOS
implementations follow this guidance. They may include LAPIC entries in
MADT for unavailable CPUs, but since these entries are marked with
"Enabled=0" it is expected that the OS will completely ignore these
entries.
However, QEMU will do the same (include entries with "Enabled=0") for
the purpose of allowing CPU hotplug within the guest.
Comment from QEMU function pc_madt_cpu_entry():
/* ACPI spec says that LAPIC entry for non present
* CPU may be omitted from MADT or it must be marked
* as disabled. However omitting non present CPU from
* MADT breaks hotplug on linux. So possible CPUs
* should be put in MADT but kept disabled.
*/
Recent Linux topology changes broke the QEMU use case. A following fix
for the QEMU use case broke bare metal topology enumeration.
Rework the Linux MADT LAPIC flags check to allow the QEMU use case only
for guests and to maintain the ACPI spec behavior for bare metal.
Remove an unnecessary check added to fix a bare metal case introduced by
the QEMU "fix".
[ bp: Change logic as Michal suggested. ]
Fixes: fed8d8773b8e ("x86/acpi/boot: Correct acpi_is_processor_usable() check")
Fixes: f0551af02130 ("x86/topology: Ignore non-present APIC IDs in a present package")
Closes: https://lore.kernel.org/r/20251024204658.3da9bf3f.michal.pecio@gmail.com
Reported-by: Michal Pecio <michal.pecio(a)gmail.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam(a)amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Tested-by: Michal Pecio <michal.pecio(a)gmail.com>
Tested-by: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/20251111145357.4031846-1-yazen.ghannam@amd.com
---
arch/x86/kernel/acpi/boot.c | 12 ++++++++----
arch/x86/kernel/cpu/topology.c | 15 ---------------
2 files changed, 8 insertions(+), 19 deletions(-)
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 9fa321a..d6138b2 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -35,6 +35,7 @@
#include <asm/smp.h>
#include <asm/i8259.h>
#include <asm/setup.h>
+#include <asm/hypervisor.h>
#include "sleep.h" /* To include x86_acpi_suspend_lowlevel */
static int __initdata acpi_force = 0;
@@ -164,11 +165,14 @@ static bool __init acpi_is_processor_usable(u32 lapic_flags)
if (lapic_flags & ACPI_MADT_ENABLED)
return true;
- if (!acpi_support_online_capable ||
- (lapic_flags & ACPI_MADT_ONLINE_CAPABLE))
- return true;
+ if (acpi_support_online_capable)
+ return lapic_flags & ACPI_MADT_ONLINE_CAPABLE;
- return false;
+ /*
+ * QEMU expects legacy "Enabled=0" LAPIC entries to be counted as usable
+ * in order to support CPU hotplug in guests.
+ */
+ return !hypervisor_is_type(X86_HYPER_NATIVE);
}
static int __init
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index f55ea3c..23190a7 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -27,7 +27,6 @@
#include <xen/xen.h>
#include <asm/apic.h>
-#include <asm/hypervisor.h>
#include <asm/io_apic.h>
#include <asm/mpspec.h>
#include <asm/msr.h>
@@ -236,20 +235,6 @@ static __init void topo_register_apic(u32 apic_id, u32 acpi_id, bool present)
cpuid_to_apicid[cpu] = apic_id;
topo_set_cpuids(cpu, apic_id, acpi_id);
} else {
- u32 pkgid = topo_apicid(apic_id, TOPO_PKG_DOMAIN);
-
- /*
- * Check for present APICs in the same package when running
- * on bare metal. Allow the bogosity in a guest.
- */
- if (hypervisor_is_type(X86_HYPER_NATIVE) &&
- topo_unit_count(pkgid, TOPO_PKG_DOMAIN, phys_cpu_present_map)) {
- pr_info_once("Ignoring hot-pluggable APIC ID %x in present package.\n",
- apic_id);
- topo_info.nr_rejected_cpus++;
- return;
- }
-
topo_info.nr_disabled_cpus++;
}
Add two flags for KVM_CAP_X2APIC_API to allow userspace to control support
for Suppress EOI Broadcasts, which KVM completely mishandles. When x2APIC
support was first added, KVM incorrectly advertised and "enabled" Suppress
EOI Broadcast, without fully supporting the I/O APIC side of the equation,
i.e. without adding directed EOI to KVM's in-kernel I/O APIC.
That flaw was carried over to split IRQCHIP support, i.e. KVM advertised
support for Suppress EOI Broadcasts irrespective of whether or not the
userspace I/O APIC implementation supported directed EOIs. Even worse,
KVM didn't actually suppress EOI broadcasts, i.e. userspace VMMs without
support for directed EOI came to rely on the "spurious" broadcasts.
KVM "fixed" the in-kernel I/O APIC implementation by completely disabling
support for Suppress EOI Broadcasts in commit 0bcc3fb95b97 ("KVM: lapic:
stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use"), but
didn't do anything to remedy userspace I/O APIC implementations.
KVM's bogus handling of Suppress EOI Broadcast is problematic when the
guest relies on interrupts being masked in the I/O APIC until well after
the initial local APIC EOI. E.g. Windows with Credential Guard enabled
handles interrupts in the following order:
1. Interrupt for L2 arrives.
2. L1 APIC EOIs the interrupt.
3. L1 resumes L2 and injects the interrupt.
4. L2 EOIs after servicing.
5. L1 performs the I/O APIC EOI.
Because KVM EOIs the I/O APIC at step #2, the guest can get an interrupt
storm, e.g. if the IRQ line is still asserted and userspace reacts to the
EOI by re-injecting the IRQ, because the guest doesn't de-assert the line
until step #4, and doesn't expect the interrupt to be re-enabled until
step #5.
Unfortunately, simply "fixing" the bug isn't an option, as KVM has no way
of knowing if the userspace I/O APIC supports directed EOIs, i.e.
suppressing EOI broadcasts would result in interrupts being stuck masked
in the userspace I/O APIC due to step #5 being ignored by userspace. And
fully disabling support for Suppress EOI Broadcast is also undesirable, as
picking up the fix would require a guest reboot, *and* more importantly
would change the virtual CPU model exposed to the guest without any buy-in
from userspace.
Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to
explicitly enable or disable support for Suppress EOI Broadcasts while
using split IRQCHIP mode. This gives userspace control over the virtual
CPU model exposed to the guest, as KVM should never have enabled support
for Suppress EOI Broadcast without a userspace opt-in. Not setting
either flag will result in legacy quirky behavior for backward
compatibility.
Note, Suppress EOI Broadcasts is defined only in Intel's SDM, not in AMD's
APM. But the bit is writable on some AMD CPUs, e.g. Turin, and KVM's ABI
is to support Directed EOI (KVM's name) irrespective of guest CPU vendor.
Fixes: 7543a635aa09 ("KVM: x86: Add KVM exit for IOAPIC EOIs")
Closes: https://lore.kernel.org/kvm/7D497EF1-607D-4D37-98E7-DAF95F099342@nutanix.com
Cc: stable(a)vger.kernel.org
Suggested-by: David Woodhouse <dwmw2(a)infradead.org>
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Khushit Shah <khushit.shah(a)nutanix.com>
---
v4:
- Add KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST flags to allow userspace to
explicitly enable or disable support for Suppress EOI Broadcasts while
using split IRQCHIP mode.
After the inclusion of David Woodhouse's patch to support IOAPIC version 0x20,
we can tweak the uAPI to support kernel IRQCHIP mode as well.
Testing:
- Setting both the flags fails with EINVAL.
- Setting flags in kernel IRQCHIP mode fails with EINVAL.
- Setting flags in split IRQCHIP mode succeeds and both the flags work
as expected.
---
Documentation/virt/kvm/api.rst | 27 ++++++++++++++++++++--
arch/x86/include/asm/kvm_host.h | 7 ++++++
arch/x86/include/uapi/asm/kvm.h | 6 +++--
arch/x86/kvm/lapic.c | 40 +++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 19 +++++++++++++---
5 files changed, 92 insertions(+), 7 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 57061fa29e6a..b26528e0fec1 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7800,8 +7800,10 @@ Will return -EBUSY if a VCPU has already been created.
Valid feature flags in args[0] are::
- #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
- #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
+ #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+ #define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (1ULL << 2)
+ #define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
Enabling KVM_X2APIC_API_USE_32BIT_IDS changes the behavior of
KVM_SET_GSI_ROUTING, KVM_SIGNAL_MSI, KVM_SET_LAPIC, and KVM_GET_LAPIC,
@@ -7814,6 +7816,27 @@ as a broadcast even in x2APIC mode in order to support physical x2APIC
without interrupt remapping. This is undesirable in logical mode,
where 0xff represents CPUs 0-7 in cluster 0.
+Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST instructs KVM to enable
+Suppress EOI Broadcasts. KVM will advertise support for Suppress EOI Broadcast
+to the guest and suppress LAPIC EOI broadcasts when the guest sets the
+Suppress EOI Broadcast bit in the SPIV register.
+
+Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST disables support for
+Suppress EOI Broadcasts entirely, i.e. instructs KVM to NOT advertise support
+to the guest.
+
+Modern VMMs should either enable KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST or
+KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST. If not, legacy quirky behavior will
+be used by KVM, which is to advertise support for Suppress EOI Broadcasts but
+not actually suppressing EOI broadcasts.
+
+Currently, both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
+KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST must only be set when in split IRQCHIP
+mode. Otherwise, the ioctl will fail with an EINVAL error.
+
+Setting both KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST and
+KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST will fail with an EINVAL error.
+
7.8 KVM_CAP_S390_USER_INSTR0
----------------------------
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 48598d017d6f..4a6d94dc7a2a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1229,6 +1229,12 @@ enum kvm_irqchip_mode {
KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */
};
+enum kvm_suppress_eoi_broadcast_mode {
+ KVM_SUPPRESS_EOI_BROADCAST_QUIRKED, /* Legacy behavior */
+ KVM_SUPPRESS_EOI_BROADCAST_ENABLED, /* Enable Suppress EOI broadcast */
+ KVM_SUPPRESS_EOI_BROADCAST_DISABLED /* Disable Suppress EOI broadcast */
+};
+
struct kvm_x86_msr_filter {
u8 count;
bool default_allow:1;
@@ -1480,6 +1486,7 @@ struct kvm_arch {
bool x2apic_format;
bool x2apic_broadcast_quirk_disabled;
+ enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast_mode;
bool has_mapped_host_mmio;
bool guest_can_read_msr_platform_info;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index d420c9c066d4..d30241429fa8 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -913,8 +913,10 @@ struct kvm_sev_snp_launch_finish {
__u64 pad1[4];
};
-#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0)
-#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1)
+#define KVM_X2APIC_API_USE_32BIT_IDS (_BITULL(0))
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (_BITULL(1))
+#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (_BITULL(2))
+#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (_BITULL(3))
struct kvm_hyperv_eventfd {
__u32 conn_id;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0ae7f913d782..1ef0bd3eff1e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -105,6 +105,34 @@ bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector)
apic_test_vector(vector, apic->regs + APIC_IRR);
}
+static inline bool kvm_lapic_advertise_suppress_eoi_broadcast(struct kvm *kvm)
+{
+ /*
+ * Advertise Suppress EOI broadcast support to the guest unless the VMM
+ * explicitly disabled it.
+ *
+ * Historically, KVM advertised this capability even though it did not
+ * actually suppress EOIs.
+ */
+ return kvm->arch.suppress_eoi_broadcast_mode !=
+ KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
+}
+
+static inline bool kvm_lapic_ignore_suppress_eoi_broadcast(struct kvm *kvm)
+{
+ /*
+ * Returns true if KVM should ignore the suppress EOI broadcast bit set by
+ * the guest and broadcast EOIs anyway.
+ *
+ * Only returns false when the VMM explicitly enabled Suppress EOI
+ * broadcast. If disabled by VMM, the bit should be ignored as it is not
+ * supported. Legacy behavior was to ignore the bit and broadcast EOIs
+ * anyway.
+ */
+ return kvm->arch.suppress_eoi_broadcast_mode !=
+ KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
+}
+
__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_has_noapic_vcpu);
@@ -562,6 +590,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
* IOAPIC.
*/
if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) &&
+ kvm_lapic_advertise_suppress_eoi_broadcast(vcpu->kvm) &&
!ioapic_in_kernel(vcpu->kvm))
v |= APIC_LVR_DIRECTED_EOI;
kvm_lapic_set_reg(apic, APIC_LVR, v);
@@ -1517,6 +1546,17 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
+ /*
+ * Don't exit to userspace if the guest has enabled Directed
+ * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local
+ * APIC doesn't broadcast EOIs (the guest must EOI the target
+ * I/O APIC(s) directly). Ignore the suppression if userspace
+ * has NOT explicitly enabled Suppress EOI broadcast.
+ */
+ if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
+ !kvm_lapic_ignore_suppress_eoi_broadcast(apic->vcpu->kvm))
+ return;
+
apic->vcpu->arch.pending_ioapic_eoi = vector;
kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu);
return;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index c9c2aa6f4705..81b40fdb5f5f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -121,8 +121,11 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
-#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \
- KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
+#define KVM_X2APIC_API_VALID_FLAGS \
+ (KVM_X2APIC_API_USE_32BIT_IDS | \
+ KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \
+ KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \
+ KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
static void update_cr8_intercept(struct kvm_vcpu *vcpu);
static void process_nmi(struct kvm_vcpu *vcpu);
@@ -6777,12 +6780,22 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
r = -EINVAL;
if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS)
break;
+ if ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) &&
+ (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST))
+ break;
+ if (!irqchip_split(kvm) &&
+ ((cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) ||
+ (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)))
+ break;
if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS)
kvm->arch.x2apic_format = true;
if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK)
kvm->arch.x2apic_broadcast_quirk_disabled = true;
-
+ if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_ENABLED;
+ if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
+ kvm->arch.suppress_eoi_broadcast_mode = KVM_SUPPRESS_EOI_BROADCAST_DISABLED;
r = 0;
break;
case KVM_CAP_X86_DISABLE_EXITS:
--
2.39.3