On Tue, Dec 02, 2025, David Woodhouse wrote:
On Tue, 2025-12-02 at 07:42 -0800, Sean Christopherson wrote:
On Tue, Dec 02, 2025, David Woodhouse wrote:
On Tue, 2025-12-02 at 12:58 +0000, Khushit Shah wrote:
Thanks for the review!
On 2 Dec 2025, at 2:43 PM, David Woodhouse dwmw2@infradead.org wrote:
Firstly, excellent work debugging and diagnosing that!
On Tue, 2025-11-25 at 18:05 +0000, Khushit Shah wrote:
--- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -7800,8 +7800,10 @@ Will return -EBUSY if a VCPU has already been created. Valid feature flags in args[0] are:: - #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0) - #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1) + #define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0) + #define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1) + #define KVM_X2APIC_API_DISABLE_IGNORE_SUPPRESS_EOI_BROADCAST_QUIRK (1ULL << 2) + #define KVM_X2APIC_API_DISABLE_SUPPRESS_EOI_BROADCAST (1ULL << 3)
I kind of hate these names. This part right here is what we leave behind for future generations, to understand the weird behaviour of KVM. To have "IGNORE" "SUPPRESS" "QUIRK" all in the same flag, quite apart from the length of the token, makes my brain hurt.
...
Could we perhaps call them 'ENABLE_SUPPRESS_EOI_BROADCAST' and 'DISABLE_SUPPRESS_EOI_BROADCAST', with a note saying that modern VMMs should always explicitly enable one or the other, because for historical reasons KVM only *pretends* to support it by default but it doesn't actually work correctly?
I don't disagree on the names being painful, but ENABLE_SUPPRESS_EOI_BROADCAST vs. DISABLE_SUPPRESS_EOI_BROADCAST won't work, and is even more confusing IMO.
I dunno, KVM never actually *did* suppress the EOI broadcast anyway, did it? This fix really *does* enable it — as opposed to just pretending to?
I was thinking along the lines of ...
Setting KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST causes KVM to advertise and correctly implement the Directed EOI feature in the local APIC, suppressing broadcast EOI when the feature is enabled by the guest.
Setting KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST causes KVM not to advertise the Directed EOI feature in the local APIC.
Userspace should explicitly either enable or disable the EOI broadcast using one of the two flags above. For historical compatibility reasons, if neither flag is set then KVM will advertise the feature but will not actually suppress the EOI broadcast, leading to potential IRQ storms in some guest configurations.
Hmm, I suppose that could work for uAPI. Having both an ENABLE and a DISABLE is obviously a bit odd, but slowing down the reader might actually be a good thing in this case. And the documentation should be easy enough to write.
I was worried that having ENABLE and DISABLE controls would lead to confusing code internally, but there's no reason KVM's internal tracking needs to match uAPI.
How about this?
--- arch/x86/include/asm/kvm_host.h | 7 +++++++ arch/x86/include/uapi/asm/kvm.h | 6 ++++-- arch/x86/kvm/lapic.c | 16 +++++++++++++++- arch/x86/kvm/x86.c | 15 ++++++++++++--- 4 files changed, 38 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 5a3bfa293e8b..b4c41255f01d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1226,6 +1226,12 @@ enum kvm_irqchip_mode { KVM_IRQCHIP_SPLIT, /* created with KVM_CAP_SPLIT_IRQCHIP */ };
+enum kvm_suppress_eoi_broadcast_mode { + KVM_SUPPRESS_EOI_QUIRKED, + KVM_SUPPRESS_EOI_ENABLED, + KVM_SUPPRESS_EOI_DISABLED, +}; + struct kvm_x86_msr_filter { u8 count; bool default_allow:1; @@ -1475,6 +1481,7 @@ struct kvm_arch {
bool x2apic_format; bool x2apic_broadcast_quirk_disabled; + enum kvm_suppress_eoi_broadcast_mode suppress_eoi_broadcast;
bool has_mapped_host_mmio; bool guest_can_read_msr_platform_info; diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index 7ceff6583652..bd51596001f8 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -914,8 +914,10 @@ struct kvm_sev_snp_launch_finish { __u64 pad1[4]; };
-#define KVM_X2APIC_API_USE_32BIT_IDS (1ULL << 0) -#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (1ULL << 1) +#define KVM_X2APIC_API_USE_32BIT_IDS (_BITULL(0)) +#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK (_BITULL(1)) +#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST (_BITULL(2)) +#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST (_BITULL(3))
struct kvm_hyperv_eventfd { __u32 conn_id; diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 1597dd0b0cc6..3f00c9640785 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -562,7 +562,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu) * IOAPIC. */ if (guest_cpu_cap_has(vcpu, X86_FEATURE_X2APIC) && - !ioapic_in_kernel(vcpu->kvm)) + !ioapic_in_kernel(vcpu->kvm) && + vcpu->kvm->arch.suppress_eoi_broadcast != KVM_SUPPRESS_EOI_DISABLED) v |= APIC_LVR_DIRECTED_EOI; kvm_lapic_set_reg(apic, APIC_LVR, v); } @@ -1517,6 +1518,19 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
/* Request a KVM exit to inform the userspace IOAPIC. */ if (irqchip_split(apic->vcpu->kvm)) { + /* + * Don't exit to userspace if the guest has enabled Directed + * EOI, a.k.a. Suppress EOI Broadcasts, in which case the local + * APIC doesn't broadcast EOIs (the guest must EOI the target + * I/O APIC(s) directly). Ignore the suppression if userspace + * has not explictly enabled support (KVM's historical quirky + * behavior is to advertise support for Suppress EOI Broadcasts + * without actually suppressing EOIs). + */ + if ((kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) && + apic->vcpu->kvm->arch.suppress_eoi_broadcast != KVM_SUPPRESS_EOI_QUIRKED) + return; + apic->vcpu->arch.pending_ioapic_eoi = vector; kvm_make_request(KVM_REQ_IOAPIC_EOI_EXIT, apic->vcpu); return; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0c6d899d53dd..b36e048c7862 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -121,8 +121,10 @@ static u64 __read_mostly efer_reserved_bits = ~((u64)EFER_SCE);
#define KVM_CAP_PMU_VALID_MASK KVM_PMU_CAP_DISABLE
-#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \ - KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK) +#define KVM_X2APIC_API_VALID_FLAGS (KVM_X2APIC_API_USE_32BIT_IDS | \ + KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK | \ + KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST | \ + KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST)
static void update_cr8_intercept(struct kvm_vcpu *vcpu); static void process_nmi(struct kvm_vcpu *vcpu); @@ -6739,11 +6741,18 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, if (cap->args[0] & ~KVM_X2APIC_API_VALID_FLAGS) break;
+ if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST && + cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST) + break; + if (cap->args[0] & KVM_X2APIC_API_USE_32BIT_IDS) kvm->arch.x2apic_format = true; if (cap->args[0] & KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK) kvm->arch.x2apic_broadcast_quirk_disabled = true; - + if (cap->args[0] & KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST) + kvm->arch.suppress_eoi_broadcast = KVM_SUPPRESS_EOI_ENABLED; + if (cap->args[0] & KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST) + kvm->arch.suppress_eoi_broadcast = KVM_SUPPRESS_EOI_DISABLED; r = 0; break; case KVM_CAP_X86_DISABLE_EXITS:
base-commit: 6c3373b26189853230552bd3932b3edba5883423 --