- Linux-stable-mirror - lists.linaro.org

FAILED: patch "[PATCH] smb: client: fix netns refcount leak after net_passive" failed to apply to 6.12-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.12-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y git checkout FETCH_HEAD git cherry-pick -x 59b33fab4ca4d7dacc03367082777627e05d0323 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081230-dentist-cautious-7bb4@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 59b33fab4ca4d7dacc03367082777627e05d0323 Mon Sep 17 00:00:00 2001 From: Wang Zhaolong <wangzhaolong(a)huaweicloud.com> Date: Thu, 17 Jul 2025 21:29:26 +0800 Subject: [PATCH] smb: client: fix netns refcount leak after net_passive changes After commit 5c70eb5c593d ("net: better track kernel sockets lifetime"), kernel sockets now use net_passive reference counting. However, commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") restored the manual socket refcount manipulation without adapting to this new mechanism, causing a memory leak. The issue can be reproduced by[1]: 1. Creating a network namespace 2. Mounting and Unmounting CIFS within the namespace 3. Deleting the namespace Some memory leaks may appear after a period of time following step 3. unreferenced object 0xffff9951419f6b00 (size 256): comm "ip", pid 447, jiffies 4294692389 (age 14.730s) hex dump (first 32 bytes): 1b 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 80 77 c2 44 51 99 ff ff .........w.DQ... backtrace: __kmem_cache_alloc_node+0x30e/0x3d0 __kmalloc+0x52/0x120 net_alloc_generic+0x1d/0x30 copy_net_ns+0x86/0x200 create_new_namespaces+0x117/0x300 unshare_nsproxy_namespaces+0x60/0xa0 ksys_unshare+0x148/0x360 __x64_sys_unshare+0x12/0x20 do_syscall_64+0x59/0x110 entry_SYSCALL_64_after_hwframe+0x78/0xe2 ... unreferenced object 0xffff9951442e7500 (size 32): comm "mount.cifs", pid 475, jiffies 4294693782 (age 13.343s) hex dump (first 32 bytes): 40 c5 38 46 51 99 ff ff 18 01 96 42 51 99 ff ff @.8FQ......BQ... 01 00 00 00 6f 00 c5 07 6f 00 d8 07 00 00 00 00 ....o...o....... backtrace: __kmem_cache_alloc_node+0x30e/0x3d0 kmalloc_trace+0x2a/0x90 ref_tracker_alloc+0x8e/0x1d0 sk_alloc+0x18c/0x1c0 inet_create+0xf1/0x370 __sock_create+0xd7/0x1e0 generic_ip_connect+0x1d4/0x5a0 [cifs] cifs_get_tcp_session+0x5d0/0x8a0 [cifs] cifs_mount_get_session+0x47/0x1b0 [cifs] dfs_mount_share+0xfa/0xa10 [cifs] cifs_mount+0x68/0x2b0 [cifs] cifs_smb3_do_mount+0x10b/0x760 [cifs] smb3_get_tree+0x112/0x2e0 [cifs] vfs_get_tree+0x29/0xf0 path_mount+0x2d4/0xa00 __se_sys_mount+0x165/0x1d0 Root cause: When creating kernel sockets, sk_alloc() calls net_passive_inc() for sockets with sk_net_refcnt=0. The CIFS code manually converts kernel sockets to user sockets by setting sk_net_refcnt=1, but doesn't call the corresponding net_passive_dec(). This creates an imbalance in the net_passive counter, which prevents the network namespace from being destroyed when its last user reference is dropped. As a result, the entire namespace and all its associated resources remain allocated. Timeline of patches leading to this issue: - commit ef7134c7fc48 ("smb: client: Fix use-after-free of network namespace.") in v6.12 fixed the original netns UAF by manually managing socket refcounts - commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod") in v6.13 attempted to use kernel sockets but introduced TCP timer issues - commit 5c70eb5c593d ("net: better track kernel sockets lifetime") in v6.14-rc5 introduced the net_passive mechanism with sk_net_refcnt_upgrade() for proper socket conversion - commit 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") in v6.15-rc3 reverted to manual refcount management without adapting to the new net_passive changes Fix this by using sk_net_refcnt_upgrade() which properly handles the net_passive counter when converting kernel sockets to user sockets. Link: https://bugzilla.kernel.org/show_bug.cgi?id=220343 [1] Fixes: 95d2b9f693ff ("Revert "smb: client: fix TCP timers deadlock after rmmod"") Cc: stable(a)vger.kernel.org Reviewed-by: Kuniyuki Iwashima <kuniyu(a)google.com> Reviewed-by: Enzo Matsumiya <ematsumiya(a)suse.de> Signed-off-by: Wang Zhaolong <wangzhaolong(a)huaweicloud.com> Signed-off-by: Steve French <stfrench(a)microsoft.com> diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 205f547ca49e..5eec8957f2a9 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -3362,18 +3362,15 @@ generic_ip_connect(struct TCP_Server_Info *server) struct net *net = cifs_net_ns(server); struct sock *sk; - rc = __sock_create(net, sfamily, SOCK_STREAM, - IPPROTO_TCP, &server->ssocket, 1); + rc = sock_create_kern(net, sfamily, SOCK_STREAM, + IPPROTO_TCP, &server->ssocket); if (rc < 0) { cifs_server_dbg(VFS, "Error %d creating socket\n", rc); return rc; } sk = server->ssocket->sk; - __netns_tracker_free(net, &sk->ns_tracker, false); - sk->sk_net_refcnt = 1; - get_net_track(net, &sk->ns_tracker, GFP_KERNEL); - sock_inuse_add(net, 1); + sk_net_refcnt_upgrade(sk); /* BB other socket options to set KEEPALIVE, NODELAY? */ cifs_dbg(FYI, "Socket created\n");

1 month

2
3
0 0

[PATCH 0/4] drm/tidss: Fixes data edge sampling

by Louis Chauvet

Currently the driver only configure the data edge sampling partially. The AM62 require it to be configured in two distincts registers: one in tidss and one in the general device registers. Introduce a new dt property to link the proper syscon node from the main device registers into the tidss driver. Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") --- Cc: stable(a)vger.kernel.org Signed-off-by: Louis Chauvet <louis.chauvet(a)bootlin.com> --- Louis Chauvet (4): dt-bindings: display: ti,am65x-dss: Add clk property for data edge synchronization dt-bindings: mfd: syscon: Add ti,am625-dss-clk-ctrl arm64: dts: ti: k3-am62-main: Add tidss clk-ctrl property drm/tidss: Fix sampling edge configuration .../devicetree/bindings/display/ti/ti,am65x-dss.yaml | 6 ++++++ Documentation/devicetree/bindings/mfd/syscon.yaml | 3 ++- arch/arm64/boot/dts/ti/k3-am62-main.dtsi | 6 ++++++ drivers/gpu/drm/tidss/tidss_dispc.c | 14 ++++++++++++++ 4 files changed, 28 insertions(+), 1 deletion(-) --- base-commit: 85c23f28905cf20a86ceec3cfd7a0a5572c9eb13 change-id: 20250730-fix-edge-handling-9123f7438910 Best regards, -- Louis Chauvet <louis.chauvet(a)bootlin.com>

1 month

6
21
0 0

FAILED: patch "[PATCH] mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got" failed to apply to 6.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y git checkout FETCH_HEAD git cherry-pick -x 9bbffee67ffd16360179327b57f3b1245579ef08 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081243-dense-paragraph-ed33@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 9bbffee67ffd16360179327b57f3b1245579ef08 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan <surenb(a)google.com> Date: Mon, 28 Jul 2025 10:53:55 -0700 Subject: [PATCH] mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped By inducing delays in the right places, Jann Horn created a reproducer for a hard to hit UAF issue that became possible after VMAs were allowed to be recycled by adding SLAB_TYPESAFE_BY_RCU to their cache. Race description is borrowed from Jann's discovery report: lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under rcu_read_lock(). At that point, the VMA may be concurrently freed, and it can be recycled by another process. vma_start_read() then increments the vma->vm_refcnt (if it is in an acceptable range), and if this succeeds, vma_start_read() can return a recycled VMA. In this scenario where the VMA has been recycled, lock_vma_under_rcu() will then detect the mismatching ->vm_mm pointer and drop the VMA through vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm. This is wrong: It implicitly assumes that the caller is keeping the VMA's mm alive, but in this scenario the caller has no relation to the VMA's mm, so the rcuwait_wake_up() can cause UAF. The diagram depicting the race: T1 T2 T3 == == == lock_vma_under_rcu mas_walk <VMA gets removed from mm> mmap <the same VMA is reallocated> vma_start_read __refcount_inc_not_zero_limited_acquire munmap __vma_enter_locked refcount_add_not_zero vma_end_read vma_refcount_put __refcount_dec_and_test rcuwait_wait_event <finish operation> rcuwait_wake_up [UAF] Note that rcuwait_wait_event() in T3 does not block because refcount was already dropped by T1. At this point T3 can exit and free the mm causing UAF in T1. To avoid this we move vma->vm_mm verification into vma_start_read() and grab vma->vm_mm to stabilize it before vma_refcount_put() operation. [surenb(a)google.com: v3] Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU") Signed-off-by: Suren Baghdasaryan <surenb(a)google.com> Reported-by: Jann Horn <jannh(a)google.com> Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+… Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Jann Horn <jannh(a)google.com> Cc: Liam Howlett <liam.howlett(a)oracle.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 1f4f44951abe..11a078de9150 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwait *w); #include <linux/tracepoint-defs.h> #include <linux/types.h> #include <linux/cleanup.h> +#include <linux/sched/mm.h> #define MMAP_LOCK_INITIALIZER(name) \ .mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock), @@ -154,6 +155,10 @@ static inline void vma_refcount_put(struct vm_area_struct *vma) * reused and attached to a different mm before we lock it. * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got * detached. + * + * WARNING! The vma passed to this function cannot be used if the function + * fails to lock it because in certain cases RCU lock is dropped and then + * reacquired. Once RCU lock is dropped the vma can be concurently freed. */ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, struct vm_area_struct *vma) @@ -183,6 +188,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, } rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); + + /* + * If vma got attached to another mm from under us, that mm is not + * stable and can be freed in the narrow window after vma->vm_refcnt + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before + * releasing vma->vm_refcnt. + */ + if (unlikely(vma->vm_mm != mm)) { + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ + struct mm_struct *other_mm = vma->vm_mm; + + /* + * __mmdrop() is a heavy operation and we don't need RCU + * protection here. Release RCU lock during these operations. + * We reinstate the RCU read lock as the caller expects it to + * be held when this function returns even on error. + */ + rcu_read_unlock(); + mmgrab(other_mm); + vma_refcount_put(vma); + mmdrop(other_mm); + rcu_read_lock(); + return NULL; + } + /* * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. * False unlocked result is impossible because we modify and check diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c index 729fb7d0dd59..b006cec8e6fe 100644 --- a/mm/mmap_lock.c +++ b/mm/mmap_lock.c @@ -164,8 +164,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, */ /* Check if the vma we locked is the right one. */ - if (unlikely(vma->vm_mm != mm || - address < vma->vm_start || address >= vma->vm_end)) + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) goto inval_end_read; rcu_read_unlock(); @@ -236,11 +235,8 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm, goto fallback; } - /* - * Verify the vma we locked belongs to the same address space and it's - * not behind of the last search position. - */ - if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end)) + /* Verify the vma is not behind the last search position. */ + if (unlikely(from_addr >= vma->vm_end)) goto fallback_unlock; /*

1 month

2
1
0 0

FAILED: patch "[PATCH] KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 17ec2f965344ee3fd6620bef7ef68792f4ac3af0 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081208-scrubber-veggie-e1a5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 17ec2f965344ee3fd6620bef7ef68792f4ac3af0 Mon Sep 17 00:00:00 2001 From: Sean Christopherson <seanjc(a)google.com> Date: Tue, 10 Jun 2025 16:20:06 -0700 Subject: [PATCH] KVM: VMX: Allow guest to set DEBUGCTL.RTM_DEBUG if RTM is supported Let the guest set DEBUGCTL.RTM_DEBUG if RTM is supported according to the guest CPUID model, as debug support is supposed to be available if RTM is supported, and there are no known downsides to letting the guest debug RTM aborts. Note, there are no known bug reports related to RTM_DEBUG, the primary motivation is to reduce the probability of breaking existing guests when a future change adds a missing consistency check on vmcs12.GUEST_DEBUGCTL (KVM currently lets L2 run with whatever hardware supports; whoops). Note #2, KVM already emulates DR6.RTM, and doesn't restrict access to DR7.RTM. Fixes: 83c529151ab0 ("KVM: x86: expose Intel cpu new features (HLE, RTM) to guest") Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20250610232010.162191-5-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index b7dded3c8113..fa878b136eba 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -419,6 +419,7 @@ #define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI (1UL << 12) #define DEBUGCTLMSR_FREEZE_IN_SMM_BIT 14 #define DEBUGCTLMSR_FREEZE_IN_SMM (1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT) +#define DEBUGCTLMSR_RTM_DEBUG BIT(15) #define MSR_PEBS_FRONTEND 0x000003f7 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 4ee6cc796855..311f6fa53b67 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2186,6 +2186,10 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated (host_initiated || intel_pmu_lbr_is_enabled(vcpu))) debugctl |= DEBUGCTLMSR_LBR | DEBUGCTLMSR_FREEZE_LBRS_ON_PMI; + if (boot_cpu_has(X86_FEATURE_RTM) && + (host_initiated || guest_cpu_cap_has(vcpu, X86_FEATURE_RTM))) + debugctl |= DEBUGCTLMSR_RTM_DEBUG; + return debugctl; }

1 month

1
0
0 0

FAILED: patch "[PATCH] mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got" failed to apply to 6.16-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.16-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.16.y git checkout FETCH_HEAD git cherry-pick -x 9bbffee67ffd16360179327b57f3b1245579ef08 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081237-buffed-scuba-d3f3@gregkh' --subject-prefix 'PATCH 6.16.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 9bbffee67ffd16360179327b57f3b1245579ef08 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan <surenb(a)google.com> Date: Mon, 28 Jul 2025 10:53:55 -0700 Subject: [PATCH] mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped By inducing delays in the right places, Jann Horn created a reproducer for a hard to hit UAF issue that became possible after VMAs were allowed to be recycled by adding SLAB_TYPESAFE_BY_RCU to their cache. Race description is borrowed from Jann's discovery report: lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under rcu_read_lock(). At that point, the VMA may be concurrently freed, and it can be recycled by another process. vma_start_read() then increments the vma->vm_refcnt (if it is in an acceptable range), and if this succeeds, vma_start_read() can return a recycled VMA. In this scenario where the VMA has been recycled, lock_vma_under_rcu() will then detect the mismatching ->vm_mm pointer and drop the VMA through vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm. This is wrong: It implicitly assumes that the caller is keeping the VMA's mm alive, but in this scenario the caller has no relation to the VMA's mm, so the rcuwait_wake_up() can cause UAF. The diagram depicting the race: T1 T2 T3 == == == lock_vma_under_rcu mas_walk <VMA gets removed from mm> mmap <the same VMA is reallocated> vma_start_read __refcount_inc_not_zero_limited_acquire munmap __vma_enter_locked refcount_add_not_zero vma_end_read vma_refcount_put __refcount_dec_and_test rcuwait_wait_event <finish operation> rcuwait_wake_up [UAF] Note that rcuwait_wait_event() in T3 does not block because refcount was already dropped by T1. At this point T3 can exit and free the mm causing UAF in T1. To avoid this we move vma->vm_mm verification into vma_start_read() and grab vma->vm_mm to stabilize it before vma_refcount_put() operation. [surenb(a)google.com: v3] Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU") Signed-off-by: Suren Baghdasaryan <surenb(a)google.com> Reported-by: Jann Horn <jannh(a)google.com> Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+… Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Acked-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Jann Horn <jannh(a)google.com> Cc: Liam Howlett <liam.howlett(a)oracle.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h index 1f4f44951abe..11a078de9150 100644 --- a/include/linux/mmap_lock.h +++ b/include/linux/mmap_lock.h @@ -12,6 +12,7 @@ extern int rcuwait_wake_up(struct rcuwait *w); #include <linux/tracepoint-defs.h> #include <linux/types.h> #include <linux/cleanup.h> +#include <linux/sched/mm.h> #define MMAP_LOCK_INITIALIZER(name) \ .mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock), @@ -154,6 +155,10 @@ static inline void vma_refcount_put(struct vm_area_struct *vma) * reused and attached to a different mm before we lock it. * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got * detached. + * + * WARNING! The vma passed to this function cannot be used if the function + * fails to lock it because in certain cases RCU lock is dropped and then + * reacquired. Once RCU lock is dropped the vma can be concurently freed. */ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, struct vm_area_struct *vma) @@ -183,6 +188,31 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm, } rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_); + + /* + * If vma got attached to another mm from under us, that mm is not + * stable and can be freed in the narrow window after vma->vm_refcnt + * is dropped and before rcuwait_wake_up(mm) is called. Grab it before + * releasing vma->vm_refcnt. + */ + if (unlikely(vma->vm_mm != mm)) { + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */ + struct mm_struct *other_mm = vma->vm_mm; + + /* + * __mmdrop() is a heavy operation and we don't need RCU + * protection here. Release RCU lock during these operations. + * We reinstate the RCU read lock as the caller expects it to + * be held when this function returns even on error. + */ + rcu_read_unlock(); + mmgrab(other_mm); + vma_refcount_put(vma); + mmdrop(other_mm); + rcu_read_lock(); + return NULL; + } + /* * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result. * False unlocked result is impossible because we modify and check diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c index 729fb7d0dd59..b006cec8e6fe 100644 --- a/mm/mmap_lock.c +++ b/mm/mmap_lock.c @@ -164,8 +164,7 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, */ /* Check if the vma we locked is the right one. */ - if (unlikely(vma->vm_mm != mm || - address < vma->vm_start || address >= vma->vm_end)) + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) goto inval_end_read; rcu_read_unlock(); @@ -236,11 +235,8 @@ struct vm_area_struct *lock_next_vma(struct mm_struct *mm, goto fallback; } - /* - * Verify the vma we locked belongs to the same address space and it's - * not behind of the last search position. - */ - if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end)) + /* Verify the vma is not behind the last search position. */ + if (unlikely(from_addr >= vma->vm_end)) goto fallback_unlock; /*

1 month

3
4
0 0

[PATCH V4 mm-hotfixes 3/3] x86/mm/64: define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings()

by Harry Yoo

Define ARCH_PAGE_TABLE_SYNC_MASK and arch_sync_kernel_mappings() to ensure page tables are properly synchronized when calling p*d_populate_kernel(). It is inteneded to synchronize page tables via pgd_pouplate_kernel() when 5-level paging is in use and via p4d_pouplate_kernel() when 4-level paging is used. This fixes intermittent boot failures on systems using 4-level paging and a large amount of persistent memory: BUG: unable to handle page fault for address: ffffe70000000034 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP NOPTI RIP: 0010:__init_single_page+0x9/0x6d Call Trace: <TASK> __init_zone_device_page+0x17/0x5d memmap_init_zone_device+0x154/0x1bb pagemap_range+0x2e0/0x40f memremap_pages+0x10b/0x2f0 devm_memremap_pages+0x1e/0x60 dev_dax_probe+0xce/0x2ec [device_dax] dax_bus_probe+0x6d/0xc9 [... snip ...] </TASK> It also fixes a crash in vmemmap_set_pmd() caused by accessing vmemmap before sync_global_pgds() [1]: BUG: unable to handle page fault for address: ffffeb3ff1200000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI Tainted: [W]=WARN RIP: 0010:vmemmap_set_pmd+0xff/0x230 <TASK> vmemmap_populate_hugepages+0x176/0x180 vmemmap_populate+0x34/0x80 __populate_section_memmap+0x41/0x90 sparse_add_section+0x121/0x3e0 __add_pages+0xba/0x150 add_pages+0x1d/0x70 memremap_pages+0x3dc/0x810 devm_memremap_pages+0x1c/0x60 xe_devm_add+0x8b/0x100 [xe] xe_tile_init_noalloc+0x6a/0x70 [xe] xe_device_probe+0x48c/0x740 [xe] [... snip ...] Cc: <stable(a)vger.kernel.org> Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges") Closes: https://lore.kernel.org/linux-mm/20250311114420.240341-1-gwan-gyeong.mun@in… [1] Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com> Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com> --- arch/x86/include/asm/pgtable_64_types.h | 3 +++ arch/x86/mm/init_64.c | 5 +++++ 2 files changed, 8 insertions(+) diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 4604f924d8b8..7eb61ef6a185 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -36,6 +36,9 @@ static inline bool pgtable_l5_enabled(void) #define pgtable_l5_enabled() cpu_feature_enabled(X86_FEATURE_LA57) #endif /* USE_EARLY_PGTABLE_L5 */ +#define ARCH_PAGE_TABLE_SYNC_MASK \ + (pgtable_l5_enabled() ? PGTBL_PGD_MODIFIED : PGTBL_P4D_MODIFIED) + extern unsigned int pgdir_shift; extern unsigned int ptrs_per_p4d; diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 76e33bd7c556..a78b498c0dc3 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -223,6 +223,11 @@ static void sync_global_pgds(unsigned long start, unsigned long end) sync_global_pgds_l4(start, end); } +void arch_sync_kernel_mappings(unsigned long start, unsigned long end) +{ + sync_global_pgds(start, end); +} + /* * NOTE: This function is marked __ref because it calls __init function * (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0. -- 2.43.0

1 month

3
4
0 0

Re: Don't apply "Bluetooth: btusb: Add USB ID 3625:010b for TP-LINK Archer TX10UB Nano" to kernel 6.1 and 5.15

by Greg KH

On Wed, Aug 13, 2025 at 12:29:10AM +0800, Zenm Chen wrote: > Hi Greg, > > Sorry that my patch [1] didn't clearly tell which kernels it should apply > to. > > The Bluetooth support for Realtek RTL8851BE/RTL8851BU chips was added into > the kernel since 6.4 [2], so please don't apply this patch to kernel 6.1, > 5.15 and other older stable kernels, thank you very much! > > [1] Bluetooth: btusb: Add USB ID 3625:010b for TP-LINK Archer TX10UB Nano > https://lore.kernel.org/stable/20250521013020.1983-1-zenmchen@gmail.com/ > > [2] Bluetooth: btrtl: Add the support for RTL8851B > https://github.com/torvalds/linux/commit/7948fe1c92d92313eea5453f83deb7f014… Ok, now dropped, thanks. greg k-h

1 month

1
0
0 0

Re: [PATCH ath-next 0/2] wifi: ath12k: install pairwise key first

by Jeff Johnson

On 6/25/2025 3:15 AM, Baochen Qiang wrote: > > > On 6/25/2025 5:51 PM, Johan Hovold wrote: >> [ +CC: Gregoire ] >> >> On Fri, May 23, 2025 at 11:49:00AM +0800, Baochen Qiang wrote: >>> We got report that WCN7850 is not working with IWD [1][2]. Debug >>> shows the reason is that IWD installs group key before pairwise >>> key, which goes against WCN7850's firmware. >>> >>> Reorder key install to workaround this. >>> >>> [1] https://bugzilla.kernel.org/show_bug.cgi?id=218733 >>> [2] https://lore.kernel.org/all/AS8P190MB12051DDBD84CD88E71C40AD7873F2@AS8P190M… >>> >>> Signed-off-by: Baochen Qiang <quic_bqiang(a)quicinc.com> >>> --- >>> --- >>> Baochen Qiang (2): >>> wifi: ath12k: avoid bit operation on key flags >>> wifi: ath12k: install pairwise key first >> >> Thanks for fixing this, Baochen. >> >> I noticed the patches weren't clearly marked as fixes. Do you think we >> should ask the stable team to backport these once they are in mainline >> (e.g. after 6.17-rc1 is out)? Or do you think they are too intrusive and >> risky to backport or similar? > > Yeah, I think they should be backported. > >> >> [ Also please try to remember to CC any (public) reporters. I only found >> out today that this had been addressed in linux-next. ] > > Thanks, will keep in mind. +Stable team, Per the above discussion please backport the series: https://msgid.link/20250523-ath12k-unicast-key-first-v1-0-f53c3880e6d8@quic… This is a 0-day issue so ideally this should be backported from 6.6+ /jeff

1 month

3
5
0 0

[PATCH v2 1/3] KEYS: trusted_tpm1: Compare HMAC values in constant time

by Eric Biggers

To prevent timing attacks, HMAC value comparison needs to be constant time. Replace the memcmp() with the correct function, crypto_memneq(). [For the Fixes commit I used the commit that introduced the memcmp(). It predates the introduction of crypto_memneq(), but it was still a bug at the time even though a helper function didn't exist yet.] Fixes: d00a1c72f7f4 ("keys: add new trusted key-type") Cc: stable(a)vger.kernel.org Signed-off-by: Eric Biggers <ebiggers(a)kernel.org> --- security/keys/trusted-keys/trusted_tpm1.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/security/keys/trusted-keys/trusted_tpm1.c b/security/keys/trusted-keys/trusted_tpm1.c index 89c9798d18007..e73f2c6c817a0 100644 --- a/security/keys/trusted-keys/trusted_tpm1.c +++ b/security/keys/trusted-keys/trusted_tpm1.c @@ -5,10 +5,11 @@ * * See Documentation/security/keys/trusted-encrypted.rst */ #include <crypto/hash_info.h> +#include <crypto/utils.h> #include <linux/init.h> #include <linux/slab.h> #include <linux/parser.h> #include <linux/string.h> #include <linux/err.h> @@ -239,11 +240,11 @@ int TSS_checkhmac1(unsigned char *buffer, TPM_NONCE_SIZE, enonce, TPM_NONCE_SIZE, ononce, 1, continueflag, 0, 0); if (ret < 0) goto out; - if (memcmp(testhmac, authdata, SHA1_DIGEST_SIZE)) + if (crypto_memneq(testhmac, authdata, SHA1_DIGEST_SIZE)) ret = -EINVAL; out: kfree_sensitive(sdesc); return ret; } @@ -332,20 +333,20 @@ static int TSS_checkhmac2(unsigned char *buffer, ret = TSS_rawhmac(testhmac1, key1, keylen1, SHA1_DIGEST_SIZE, paramdigest, TPM_NONCE_SIZE, enonce1, TPM_NONCE_SIZE, ononce, 1, continueflag1, 0, 0); if (ret < 0) goto out; - if (memcmp(testhmac1, authdata1, SHA1_DIGEST_SIZE)) { + if (crypto_memneq(testhmac1, authdata1, SHA1_DIGEST_SIZE)) { ret = -EINVAL; goto out; } ret = TSS_rawhmac(testhmac2, key2, keylen2, SHA1_DIGEST_SIZE, paramdigest, TPM_NONCE_SIZE, enonce2, TPM_NONCE_SIZE, ononce, 1, continueflag2, 0, 0); if (ret < 0) goto out; - if (memcmp(testhmac2, authdata2, SHA1_DIGEST_SIZE)) + if (crypto_memneq(testhmac2, authdata2, SHA1_DIGEST_SIZE)) ret = -EINVAL; out: kfree_sensitive(sdesc); return ret; } -- 2.50.1

1 month

2
1
0 0

FAILED: patch "[PATCH] HID: core: Harden s32ton() against conversion to 0 bits" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081235-dinghy-spectrum-644a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Wed, 23 Jul 2025 10:37:04 -0400 Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits Testing by the syzbot fuzzer showed that the HID core gets a shift-out-of-bounds exception when it tries to convert a 32-bit quantity to a 0-bit quantity. Ideally this should never occur, but there are buggy devices and some might have a report field with size set to zero; we shouldn't reject the report or the device just because of that. Instead, harden the s32ton() routine so that it returns a reasonable result instead of crashing when it is called with the number of bits set to 0 -- the same as what snto32() does. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.… Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split") Cc: stable(a)vger.kernel.org Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva… Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index ef1f79951d9b..f7d4efcae603 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n) static u32 s32ton(__s32 value, unsigned int n) { - s32 a = value >> (n - 1); + s32 a; + if (!value || !n) + return 0; + + a = value >> (n - 1); if (a && a != -1) return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1; return value & ((1 << n) - 1);

1 month

1
0
0 0

FAILED: patch "[PATCH] HID: core: Harden s32ton() against conversion to 0 bits" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081233-stamp-fiddling-ffca@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Wed, 23 Jul 2025 10:37:04 -0400 Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits Testing by the syzbot fuzzer showed that the HID core gets a shift-out-of-bounds exception when it tries to convert a 32-bit quantity to a 0-bit quantity. Ideally this should never occur, but there are buggy devices and some might have a report field with size set to zero; we shouldn't reject the report or the device just because of that. Instead, harden the s32ton() routine so that it returns a reasonable result instead of crashing when it is called with the number of bits set to 0 -- the same as what snto32() does. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.… Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split") Cc: stable(a)vger.kernel.org Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva… Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index ef1f79951d9b..f7d4efcae603 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n) static u32 s32ton(__s32 value, unsigned int n) { - s32 a = value >> (n - 1); + s32 a; + if (!value || !n) + return 0; + + a = value >> (n - 1); if (a && a != -1) return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1; return value & ((1 << n) - 1);

1 month

1
0
0 0

FAILED: patch "[PATCH] HID: core: Harden s32ton() against conversion to 0 bits" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081234-unmapped-serpent-9b0d@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Wed, 23 Jul 2025 10:37:04 -0400 Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits Testing by the syzbot fuzzer showed that the HID core gets a shift-out-of-bounds exception when it tries to convert a 32-bit quantity to a 0-bit quantity. Ideally this should never occur, but there are buggy devices and some might have a report field with size set to zero; we shouldn't reject the report or the device just because of that. Instead, harden the s32ton() routine so that it returns a reasonable result instead of crashing when it is called with the number of bits set to 0 -- the same as what snto32() does. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.… Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split") Cc: stable(a)vger.kernel.org Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva… Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index ef1f79951d9b..f7d4efcae603 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n) static u32 s32ton(__s32 value, unsigned int n) { - s32 a = value >> (n - 1); + s32 a; + if (!value || !n) + return 0; + + a = value >> (n - 1); if (a && a != -1) return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1; return value & ((1 << n) - 1);

1 month

1
0
0 0

FAILED: patch "[PATCH] HID: core: Harden s32ton() against conversion to 0 bits" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081233-observing-prepay-25da@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Wed, 23 Jul 2025 10:37:04 -0400 Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits Testing by the syzbot fuzzer showed that the HID core gets a shift-out-of-bounds exception when it tries to convert a 32-bit quantity to a 0-bit quantity. Ideally this should never occur, but there are buggy devices and some might have a report field with size set to zero; we shouldn't reject the report or the device just because of that. Instead, harden the s32ton() routine so that it returns a reasonable result instead of crashing when it is called with the number of bits set to 0 -- the same as what snto32() does. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.… Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split") Cc: stable(a)vger.kernel.org Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva… Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index ef1f79951d9b..f7d4efcae603 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n) static u32 s32ton(__s32 value, unsigned int n) { - s32 a = value >> (n - 1); + s32 a; + if (!value || !n) + return 0; + + a = value >> (n - 1); if (a && a != -1) return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1; return value & ((1 << n) - 1);

1 month

1
0
0 0

FAILED: patch "[PATCH] HID: core: Harden s32ton() against conversion to 0 bits" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081232-immovable-among-3b59@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From a6b87bfc2ab5bccb7ad953693c85d9062aef3fdd Mon Sep 17 00:00:00 2001 From: Alan Stern <stern(a)rowland.harvard.edu> Date: Wed, 23 Jul 2025 10:37:04 -0400 Subject: [PATCH] HID: core: Harden s32ton() against conversion to 0 bits Testing by the syzbot fuzzer showed that the HID core gets a shift-out-of-bounds exception when it tries to convert a 32-bit quantity to a 0-bit quantity. Ideally this should never occur, but there are buggy devices and some might have a report field with size set to zero; we shouldn't reject the report or the device just because of that. Instead, harden the s32ton() routine so that it returns a reasonable result instead of crashing when it is called with the number of bits set to 0 -- the same as what snto32() does. Signed-off-by: Alan Stern <stern(a)rowland.harvard.edu> Reported-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/68753a08.050a0220.33d347.0008.GAE@google.… Tested-by: syzbot+b63d677d63bcac06cf90(a)syzkaller.appspotmail.com Fixes: dde5845a529f ("[PATCH] Generic HID layer - code split") Cc: stable(a)vger.kernel.org Link: https://patch.msgid.link/613a66cd-4309-4bce-a4f7-2905f9bce0c9@rowland.harva… Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org> diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index ef1f79951d9b..f7d4efcae603 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -66,8 +66,12 @@ static s32 snto32(__u32 value, unsigned int n) static u32 s32ton(__s32 value, unsigned int n) { - s32 a = value >> (n - 1); + s32 a; + if (!value || !n) + return 0; + + a = value >> (n - 1); if (a && a != -1) return value < 0 ? 1 << (n - 1) : (1 << (n - 1)) - 1; return value & ((1 << n) - 1);

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081238-veggie-abrasive-ce80@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:10 -0700 Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting while running the guest. When running with the "default treatment of SMIs" in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that is visible to host (non-SMM) software, and instead transitions directly from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched by hardware on SMI or RSM, i.e. SMM will run with whatever value was resident in hardware at the time of the SMI. Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting events while the CPU is executing in SMM, which can pollute profiling and potentially leak information into the guest. Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner run loop, as the bit can be toggled in IRQ context via IPI callback (SMP function call), by way of /sys/devices/cpu/freeze_on_smi. Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs, i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and at worst could lead to undesirable behavior in the future if AMD CPUs ever happened to pick up a collision with the bit. Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module owns and controls GUEST_IA32_DEBUGCTL. WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the lack of handling isn't a KVM bug (TDX already WARNs on any run_flag). Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state(). Doing so avoids the need to track host_debugctl on a per-VMCS basis, as GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't have actually entered the guest, vcpu_enter_guest() will have run with vmcs02 active and thus could result in vmcs01 being run with a stale value. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f7ee2dbf10e..5e0415a8ee3f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical) enum kvm_x86_run_flags { KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0), KVM_RUN_LOAD_GUEST_DR6 = BIT(1), + KVM_RUN_LOAD_DEBUGCTL = BIT(2), }; struct kvm_x86_ops { @@ -1707,6 +1708,12 @@ struct kvm_x86_ops { void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); void (*vcpu_put)(struct kvm_vcpu *vcpu); + /* + * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to + * match the host's value even while the guest is active. + */ + const u64 HOST_OWNED_DEBUGCTL; + void (*update_exception_bitmap)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c85cbce6d2f6..4a6d4460f947 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_op(vcpu_load), .vcpu_put = vt_op(vcpu_put), + .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM, + .update_exception_bitmap = vt_op(update_exception_bitmap), .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_op(get_msr), diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ef20184b8b11..c69df3aba8d1 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu) WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7))); } + /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */ + vmx_reload_guest_debugctl(vcpu); + /* * Note that calling vmx_set_{efer,cr0,cr4} is important as they * handle a variety of side effects to KVM's software model. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a77d325fe78b..0db1bd383302 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags) if (run_flags & KVM_RUN_LOAD_GUEST_DR6) set_debugreg(vcpu->arch.dr6, 6); + if (run_flags & KVM_RUN_LOAD_DEBUGCTL) + vmx_reload_guest_debugctl(vcpu); + /* * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c20a4185d10a..076af78af151 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) { + WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM); + + val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM; vmcs_write64(GUEST_IA32_DEBUGCTL, val); } static inline u64 vmx_guest_debugctl_read(void) { - return vmcs_read64(GUEST_IA32_DEBUGCTL); + return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM; +} + +static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu) +{ + u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL); + + if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM)) + return; + + vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM); } /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7fe3549bbd93..179c2c550c8d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); fastpath_t exit_fastpath; - u64 run_flags; + u64 run_flags, debug_ctl; bool req_immediate_exit = false; @@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) set_debugreg(0, 7); } - vcpu->arch.host_debugctl = get_debugctlmsr(); + /* + * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL + * can be modified in IRQ context, e.g. via SMP function calls. Inform + * vendor code if any host-owned bits were changed, e.g. so that the + * value loaded into hardware while running the guest can be updated. + */ + debug_ctl = get_debugctlmsr(); + if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL && + !vcpu->arch.guest_state_protected) + run_flags |= KVM_RUN_LOAD_DEBUGCTL; + vcpu->arch.host_debugctl = debug_ctl; guest_timing_enter_irqoff();

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081237-acclaim-finished-f854@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:10 -0700 Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting while running the guest. When running with the "default treatment of SMIs" in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that is visible to host (non-SMM) software, and instead transitions directly from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched by hardware on SMI or RSM, i.e. SMM will run with whatever value was resident in hardware at the time of the SMI. Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting events while the CPU is executing in SMM, which can pollute profiling and potentially leak information into the guest. Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner run loop, as the bit can be toggled in IRQ context via IPI callback (SMP function call), by way of /sys/devices/cpu/freeze_on_smi. Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs, i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and at worst could lead to undesirable behavior in the future if AMD CPUs ever happened to pick up a collision with the bit. Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module owns and controls GUEST_IA32_DEBUGCTL. WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the lack of handling isn't a KVM bug (TDX already WARNs on any run_flag). Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state(). Doing so avoids the need to track host_debugctl on a per-VMCS basis, as GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't have actually entered the guest, vcpu_enter_guest() will have run with vmcs02 active and thus could result in vmcs01 being run with a stale value. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f7ee2dbf10e..5e0415a8ee3f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical) enum kvm_x86_run_flags { KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0), KVM_RUN_LOAD_GUEST_DR6 = BIT(1), + KVM_RUN_LOAD_DEBUGCTL = BIT(2), }; struct kvm_x86_ops { @@ -1707,6 +1708,12 @@ struct kvm_x86_ops { void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); void (*vcpu_put)(struct kvm_vcpu *vcpu); + /* + * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to + * match the host's value even while the guest is active. + */ + const u64 HOST_OWNED_DEBUGCTL; + void (*update_exception_bitmap)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c85cbce6d2f6..4a6d4460f947 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_op(vcpu_load), .vcpu_put = vt_op(vcpu_put), + .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM, + .update_exception_bitmap = vt_op(update_exception_bitmap), .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_op(get_msr), diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ef20184b8b11..c69df3aba8d1 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu) WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7))); } + /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */ + vmx_reload_guest_debugctl(vcpu); + /* * Note that calling vmx_set_{efer,cr0,cr4} is important as they * handle a variety of side effects to KVM's software model. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a77d325fe78b..0db1bd383302 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags) if (run_flags & KVM_RUN_LOAD_GUEST_DR6) set_debugreg(vcpu->arch.dr6, 6); + if (run_flags & KVM_RUN_LOAD_DEBUGCTL) + vmx_reload_guest_debugctl(vcpu); + /* * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c20a4185d10a..076af78af151 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) { + WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM); + + val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM; vmcs_write64(GUEST_IA32_DEBUGCTL, val); } static inline u64 vmx_guest_debugctl_read(void) { - return vmcs_read64(GUEST_IA32_DEBUGCTL); + return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM; +} + +static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu) +{ + u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL); + + if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM)) + return; + + vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM); } /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7fe3549bbd93..179c2c550c8d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); fastpath_t exit_fastpath; - u64 run_flags; + u64 run_flags, debug_ctl; bool req_immediate_exit = false; @@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) set_debugreg(0, 7); } - vcpu->arch.host_debugctl = get_debugctlmsr(); + /* + * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL + * can be modified in IRQ context, e.g. via SMP function calls. Inform + * vendor code if any host-owned bits were changed, e.g. so that the + * value loaded into hardware while running the guest can be updated. + */ + debug_ctl = get_debugctlmsr(); + if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL && + !vcpu->arch.guest_state_protected) + run_flags |= KVM_RUN_LOAD_DEBUGCTL; + vcpu->arch.host_debugctl = debug_ctl; guest_timing_enter_irqoff();

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081236-platonic-plant-3b8f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:10 -0700 Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting while running the guest. When running with the "default treatment of SMIs" in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that is visible to host (non-SMM) software, and instead transitions directly from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched by hardware on SMI or RSM, i.e. SMM will run with whatever value was resident in hardware at the time of the SMI. Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting events while the CPU is executing in SMM, which can pollute profiling and potentially leak information into the guest. Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner run loop, as the bit can be toggled in IRQ context via IPI callback (SMP function call), by way of /sys/devices/cpu/freeze_on_smi. Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs, i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and at worst could lead to undesirable behavior in the future if AMD CPUs ever happened to pick up a collision with the bit. Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module owns and controls GUEST_IA32_DEBUGCTL. WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the lack of handling isn't a KVM bug (TDX already WARNs on any run_flag). Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state(). Doing so avoids the need to track host_debugctl on a per-VMCS basis, as GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't have actually entered the guest, vcpu_enter_guest() will have run with vmcs02 active and thus could result in vmcs01 being run with a stale value. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f7ee2dbf10e..5e0415a8ee3f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical) enum kvm_x86_run_flags { KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0), KVM_RUN_LOAD_GUEST_DR6 = BIT(1), + KVM_RUN_LOAD_DEBUGCTL = BIT(2), }; struct kvm_x86_ops { @@ -1707,6 +1708,12 @@ struct kvm_x86_ops { void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); void (*vcpu_put)(struct kvm_vcpu *vcpu); + /* + * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to + * match the host's value even while the guest is active. + */ + const u64 HOST_OWNED_DEBUGCTL; + void (*update_exception_bitmap)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c85cbce6d2f6..4a6d4460f947 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_op(vcpu_load), .vcpu_put = vt_op(vcpu_put), + .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM, + .update_exception_bitmap = vt_op(update_exception_bitmap), .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_op(get_msr), diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ef20184b8b11..c69df3aba8d1 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu) WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7))); } + /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */ + vmx_reload_guest_debugctl(vcpu); + /* * Note that calling vmx_set_{efer,cr0,cr4} is important as they * handle a variety of side effects to KVM's software model. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a77d325fe78b..0db1bd383302 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags) if (run_flags & KVM_RUN_LOAD_GUEST_DR6) set_debugreg(vcpu->arch.dr6, 6); + if (run_flags & KVM_RUN_LOAD_DEBUGCTL) + vmx_reload_guest_debugctl(vcpu); + /* * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c20a4185d10a..076af78af151 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) { + WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM); + + val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM; vmcs_write64(GUEST_IA32_DEBUGCTL, val); } static inline u64 vmx_guest_debugctl_read(void) { - return vmcs_read64(GUEST_IA32_DEBUGCTL); + return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM; +} + +static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu) +{ + u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL); + + if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM)) + return; + + vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM); } /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7fe3549bbd93..179c2c550c8d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); fastpath_t exit_fastpath; - u64 run_flags; + u64 run_flags, debug_ctl; bool req_immediate_exit = false; @@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) set_debugreg(0, 7); } - vcpu->arch.host_debugctl = get_debugctlmsr(); + /* + * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL + * can be modified in IRQ context, e.g. via SMP function calls. Inform + * vendor code if any host-owned bits were changed, e.g. so that the + * value loaded into hardware while running the guest can be updated. + */ + debug_ctl = get_debugctlmsr(); + if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL && + !vcpu->arch.guest_state_protected) + run_flags |= KVM_RUN_LOAD_DEBUGCTL; + vcpu->arch.host_debugctl = debug_ctl; guest_timing_enter_irqoff();

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081235-galleria-reapply-8a83@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:10 -0700 Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting while running the guest. When running with the "default treatment of SMIs" in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that is visible to host (non-SMM) software, and instead transitions directly from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched by hardware on SMI or RSM, i.e. SMM will run with whatever value was resident in hardware at the time of the SMI. Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting events while the CPU is executing in SMM, which can pollute profiling and potentially leak information into the guest. Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner run loop, as the bit can be toggled in IRQ context via IPI callback (SMP function call), by way of /sys/devices/cpu/freeze_on_smi. Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs, i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and at worst could lead to undesirable behavior in the future if AMD CPUs ever happened to pick up a collision with the bit. Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module owns and controls GUEST_IA32_DEBUGCTL. WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the lack of handling isn't a KVM bug (TDX already WARNs on any run_flag). Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state(). Doing so avoids the need to track host_debugctl on a per-VMCS basis, as GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't have actually entered the guest, vcpu_enter_guest() will have run with vmcs02 active and thus could result in vmcs01 being run with a stale value. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f7ee2dbf10e..5e0415a8ee3f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical) enum kvm_x86_run_flags { KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0), KVM_RUN_LOAD_GUEST_DR6 = BIT(1), + KVM_RUN_LOAD_DEBUGCTL = BIT(2), }; struct kvm_x86_ops { @@ -1707,6 +1708,12 @@ struct kvm_x86_ops { void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); void (*vcpu_put)(struct kvm_vcpu *vcpu); + /* + * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to + * match the host's value even while the guest is active. + */ + const u64 HOST_OWNED_DEBUGCTL; + void (*update_exception_bitmap)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c85cbce6d2f6..4a6d4460f947 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_op(vcpu_load), .vcpu_put = vt_op(vcpu_put), + .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM, + .update_exception_bitmap = vt_op(update_exception_bitmap), .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_op(get_msr), diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ef20184b8b11..c69df3aba8d1 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu) WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7))); } + /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */ + vmx_reload_guest_debugctl(vcpu); + /* * Note that calling vmx_set_{efer,cr0,cr4} is important as they * handle a variety of side effects to KVM's software model. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a77d325fe78b..0db1bd383302 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags) if (run_flags & KVM_RUN_LOAD_GUEST_DR6) set_debugreg(vcpu->arch.dr6, 6); + if (run_flags & KVM_RUN_LOAD_DEBUGCTL) + vmx_reload_guest_debugctl(vcpu); + /* * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c20a4185d10a..076af78af151 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) { + WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM); + + val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM; vmcs_write64(GUEST_IA32_DEBUGCTL, val); } static inline u64 vmx_guest_debugctl_read(void) { - return vmcs_read64(GUEST_IA32_DEBUGCTL); + return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM; +} + +static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu) +{ + u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL); + + if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM)) + return; + + vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM); } /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7fe3549bbd93..179c2c550c8d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); fastpath_t exit_fastpath; - u64 run_flags; + u64 run_flags, debug_ctl; bool req_immediate_exit = false; @@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) set_debugreg(0, 7); } - vcpu->arch.host_debugctl = get_debugctlmsr(); + /* + * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL + * can be modified in IRQ context, e.g. via SMP function calls. Inform + * vendor code if any host-owned bits were changed, e.g. so that the + * value loaded into hardware while running the guest can be updated. + */ + debug_ctl = get_debugctlmsr(); + if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL && + !vcpu->arch.guest_state_protected) + run_flags |= KVM_RUN_LOAD_DEBUGCTL; + vcpu->arch.host_debugctl = debug_ctl; guest_timing_enter_irqoff();

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081234-labrador-unturned-2c21@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:10 -0700 Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting while running the guest. When running with the "default treatment of SMIs" in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that is visible to host (non-SMM) software, and instead transitions directly from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched by hardware on SMI or RSM, i.e. SMM will run with whatever value was resident in hardware at the time of the SMI. Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting events while the CPU is executing in SMM, which can pollute profiling and potentially leak information into the guest. Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner run loop, as the bit can be toggled in IRQ context via IPI callback (SMP function call), by way of /sys/devices/cpu/freeze_on_smi. Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs, i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and at worst could lead to undesirable behavior in the future if AMD CPUs ever happened to pick up a collision with the bit. Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module owns and controls GUEST_IA32_DEBUGCTL. WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the lack of handling isn't a KVM bug (TDX already WARNs on any run_flag). Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state(). Doing so avoids the need to track host_debugctl on a per-VMCS basis, as GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't have actually entered the guest, vcpu_enter_guest() will have run with vmcs02 active and thus could result in vmcs01 being run with a stale value. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f7ee2dbf10e..5e0415a8ee3f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical) enum kvm_x86_run_flags { KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0), KVM_RUN_LOAD_GUEST_DR6 = BIT(1), + KVM_RUN_LOAD_DEBUGCTL = BIT(2), }; struct kvm_x86_ops { @@ -1707,6 +1708,12 @@ struct kvm_x86_ops { void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); void (*vcpu_put)(struct kvm_vcpu *vcpu); + /* + * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to + * match the host's value even while the guest is active. + */ + const u64 HOST_OWNED_DEBUGCTL; + void (*update_exception_bitmap)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c85cbce6d2f6..4a6d4460f947 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_op(vcpu_load), .vcpu_put = vt_op(vcpu_put), + .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM, + .update_exception_bitmap = vt_op(update_exception_bitmap), .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_op(get_msr), diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ef20184b8b11..c69df3aba8d1 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu) WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7))); } + /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */ + vmx_reload_guest_debugctl(vcpu); + /* * Note that calling vmx_set_{efer,cr0,cr4} is important as they * handle a variety of side effects to KVM's software model. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a77d325fe78b..0db1bd383302 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags) if (run_flags & KVM_RUN_LOAD_GUEST_DR6) set_debugreg(vcpu->arch.dr6, 6); + if (run_flags & KVM_RUN_LOAD_DEBUGCTL) + vmx_reload_guest_debugctl(vcpu); + /* * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c20a4185d10a..076af78af151 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) { + WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM); + + val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM; vmcs_write64(GUEST_IA32_DEBUGCTL, val); } static inline u64 vmx_guest_debugctl_read(void) { - return vmcs_read64(GUEST_IA32_DEBUGCTL); + return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM; +} + +static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu) +{ + u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL); + + if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM)) + return; + + vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM); } /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7fe3549bbd93..179c2c550c8d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); fastpath_t exit_fastpath; - u64 run_flags; + u64 run_flags, debug_ctl; bool req_immediate_exit = false; @@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) set_debugreg(0, 7); } - vcpu->arch.host_debugctl = get_debugctlmsr(); + /* + * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL + * can be modified in IRQ context, e.g. via SMP function calls. Inform + * vendor code if any host-owned bits were changed, e.g. so that the + * value loaded into hardware while running the guest can be updated. + */ + debug_ctl = get_debugctlmsr(); + if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL && + !vcpu->arch.guest_state_protected) + run_flags |= KVM_RUN_LOAD_DEBUGCTL; + vcpu->arch.host_debugctl = debug_ctl; guest_timing_enter_irqoff();

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while" failed to apply to 6.12-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.12-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y git checkout FETCH_HEAD git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081233-clang-reusable-6ef9@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:10 -0700 Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting while running the guest. When running with the "default treatment of SMIs" in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that is visible to host (non-SMM) software, and instead transitions directly from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched by hardware on SMI or RSM, i.e. SMM will run with whatever value was resident in hardware at the time of the SMI. Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting events while the CPU is executing in SMM, which can pollute profiling and potentially leak information into the guest. Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner run loop, as the bit can be toggled in IRQ context via IPI callback (SMP function call), by way of /sys/devices/cpu/freeze_on_smi. Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs, i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and at worst could lead to undesirable behavior in the future if AMD CPUs ever happened to pick up a collision with the bit. Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module owns and controls GUEST_IA32_DEBUGCTL. WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the lack of handling isn't a KVM bug (TDX already WARNs on any run_flag). Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state(). Doing so avoids the need to track host_debugctl on a per-VMCS basis, as GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't have actually entered the guest, vcpu_enter_guest() will have run with vmcs02 active and thus could result in vmcs01 being run with a stale value. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f7ee2dbf10e..5e0415a8ee3f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical) enum kvm_x86_run_flags { KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0), KVM_RUN_LOAD_GUEST_DR6 = BIT(1), + KVM_RUN_LOAD_DEBUGCTL = BIT(2), }; struct kvm_x86_ops { @@ -1707,6 +1708,12 @@ struct kvm_x86_ops { void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); void (*vcpu_put)(struct kvm_vcpu *vcpu); + /* + * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to + * match the host's value even while the guest is active. + */ + const u64 HOST_OWNED_DEBUGCTL; + void (*update_exception_bitmap)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c85cbce6d2f6..4a6d4460f947 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_op(vcpu_load), .vcpu_put = vt_op(vcpu_put), + .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM, + .update_exception_bitmap = vt_op(update_exception_bitmap), .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_op(get_msr), diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ef20184b8b11..c69df3aba8d1 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu) WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7))); } + /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */ + vmx_reload_guest_debugctl(vcpu); + /* * Note that calling vmx_set_{efer,cr0,cr4} is important as they * handle a variety of side effects to KVM's software model. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a77d325fe78b..0db1bd383302 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags) if (run_flags & KVM_RUN_LOAD_GUEST_DR6) set_debugreg(vcpu->arch.dr6, 6); + if (run_flags & KVM_RUN_LOAD_DEBUGCTL) + vmx_reload_guest_debugctl(vcpu); + /* * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c20a4185d10a..076af78af151 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) { + WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM); + + val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM; vmcs_write64(GUEST_IA32_DEBUGCTL, val); } static inline u64 vmx_guest_debugctl_read(void) { - return vmcs_read64(GUEST_IA32_DEBUGCTL); + return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM; +} + +static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu) +{ + u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL); + + if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM)) + return; + + vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM); } /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7fe3549bbd93..179c2c550c8d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); fastpath_t exit_fastpath; - u64 run_flags; + u64 run_flags, debug_ctl; bool req_immediate_exit = false; @@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) set_debugreg(0, 7); } - vcpu->arch.host_debugctl = get_debugctlmsr(); + /* + * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL + * can be modified in IRQ context, e.g. via SMP function calls. Inform + * vendor code if any host-owned bits were changed, e.g. so that the + * value loaded into hardware while running the guest can be updated. + */ + debug_ctl = get_debugctlmsr(); + if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL && + !vcpu->arch.guest_state_protected) + run_flags |= KVM_RUN_LOAD_DEBUGCTL; + vcpu->arch.host_debugctl = debug_ctl; guest_timing_enter_irqoff();

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while" failed to apply to 6.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.15.y git checkout FETCH_HEAD git cherry-pick -x 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081232-language-safehouse-4f99@gregkh' --subject-prefix 'PATCH 6.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6b1dd26544d045f6a79e8c73572c0c0db3ef3c1a Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:10 -0700 Subject: [PATCH] KVM: VMX: Preserve host's DEBUGCTLMSR_FREEZE_IN_SMM while running the guest Set/clear DEBUGCTLMSR_FREEZE_IN_SMM in GUEST_IA32_DEBUGCTL based on the host's pre-VM-Enter value, i.e. preserve the host's FREEZE_IN_SMM setting while running the guest. When running with the "default treatment of SMIs" in effect (the only mode KVM supports), SMIs do not generate a VM-Exit that is visible to host (non-SMM) software, and instead transitions directly from VMX non-root to SMM. And critically, DEBUGCTL isn't context switched by hardware on SMI or RSM, i.e. SMM will run with whatever value was resident in hardware at the time of the SMI. Failure to preserve FREEZE_IN_SMM results in the PMU unexpectedly counting events while the CPU is executing in SMM, which can pollute profiling and potentially leak information into the guest. Check for changes in FREEZE_IN_SMM prior to every entry into KVM's inner run loop, as the bit can be toggled in IRQ context via IPI callback (SMP function call), by way of /sys/devices/cpu/freeze_on_smi. Add a field in kvm_x86_ops to communicate which DEBUGCTL bits need to be preserved, as FREEZE_IN_SMM is only supported and defined for Intel CPUs, i.e. explicitly checking FREEZE_IN_SMM in common x86 is at best weird, and at worst could lead to undesirable behavior in the future if AMD CPUs ever happened to pick up a collision with the bit. Exempt TDX vCPUs, i.e. protected guests, from the check, as the TDX Module owns and controls GUEST_IA32_DEBUGCTL. WARN in SVM if KVM_RUN_LOAD_DEBUGCTL is set, mostly to document that the lack of handling isn't a KVM bug (TDX already WARNs on any run_flag). Lastly, explicitly reload GUEST_IA32_DEBUGCTL on a VM-Fail that is missed by KVM but detected by hardware, i.e. in nested_vmx_restore_host_state(). Doing so avoids the need to track host_debugctl on a per-VMCS basis, as GUEST_IA32_DEBUGCTL is unconditionally written by prepare_vmcs02() and load_vmcs12_host_state(). For the VM-Fail case, even though KVM won't have actually entered the guest, vcpu_enter_guest() will have run with vmcs02 active and thus could result in vmcs01 being run with a stale value. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-9-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 4f7ee2dbf10e..5e0415a8ee3f 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1677,6 +1677,7 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical) enum kvm_x86_run_flags { KVM_RUN_FORCE_IMMEDIATE_EXIT = BIT(0), KVM_RUN_LOAD_GUEST_DR6 = BIT(1), + KVM_RUN_LOAD_DEBUGCTL = BIT(2), }; struct kvm_x86_ops { @@ -1707,6 +1708,12 @@ struct kvm_x86_ops { void (*vcpu_load)(struct kvm_vcpu *vcpu, int cpu); void (*vcpu_put)(struct kvm_vcpu *vcpu); + /* + * Mask of DEBUGCTL bits that are owned by the host, i.e. that need to + * match the host's value even while the guest is active. + */ + const u64 HOST_OWNED_DEBUGCTL; + void (*update_exception_bitmap)(struct kvm_vcpu *vcpu); int (*get_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr); diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c index c85cbce6d2f6..4a6d4460f947 100644 --- a/arch/x86/kvm/vmx/main.c +++ b/arch/x86/kvm/vmx/main.c @@ -915,6 +915,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = { .vcpu_load = vt_op(vcpu_load), .vcpu_put = vt_op(vcpu_put), + .HOST_OWNED_DEBUGCTL = DEBUGCTLMSR_FREEZE_IN_SMM, + .update_exception_bitmap = vt_op(update_exception_bitmap), .get_feature_msr = vmx_get_feature_msr, .get_msr = vt_op(get_msr), diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index ef20184b8b11..c69df3aba8d1 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4861,6 +4861,9 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu) WARN_ON(kvm_set_dr(vcpu, 7, vmcs_readl(GUEST_DR7))); } + /* Reload DEBUGCTL to ensure vmcs01 has a fresh FREEZE_IN_SMM value. */ + vmx_reload_guest_debugctl(vcpu); + /* * Note that calling vmx_set_{efer,cr0,cr4} is important as they * handle a variety of side effects to KVM's software model. diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index a77d325fe78b..0db1bd383302 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -7377,6 +7377,9 @@ fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags) if (run_flags & KVM_RUN_LOAD_GUEST_DR6) set_debugreg(vcpu->arch.dr6, 6); + if (run_flags & KVM_RUN_LOAD_DEBUGCTL) + vmx_reload_guest_debugctl(vcpu); + /* * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index c20a4185d10a..076af78af151 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -419,12 +419,25 @@ bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) { + WARN_ON_ONCE(val & DEBUGCTLMSR_FREEZE_IN_SMM); + + val |= vcpu->arch.host_debugctl & DEBUGCTLMSR_FREEZE_IN_SMM; vmcs_write64(GUEST_IA32_DEBUGCTL, val); } static inline u64 vmx_guest_debugctl_read(void) { - return vmcs_read64(GUEST_IA32_DEBUGCTL); + return vmcs_read64(GUEST_IA32_DEBUGCTL) & ~DEBUGCTLMSR_FREEZE_IN_SMM; +} + +static inline void vmx_reload_guest_debugctl(struct kvm_vcpu *vcpu) +{ + u64 val = vmcs_read64(GUEST_IA32_DEBUGCTL); + + if (!((val ^ vcpu->arch.host_debugctl) & DEBUGCTLMSR_FREEZE_IN_SMM)) + return; + + vmx_guest_debugctl_write(vcpu, val & ~DEBUGCTLMSR_FREEZE_IN_SMM); } /* diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 7fe3549bbd93..179c2c550c8d 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10779,7 +10779,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) dm_request_for_irq_injection(vcpu) && kvm_cpu_accept_dm_intr(vcpu); fastpath_t exit_fastpath; - u64 run_flags; + u64 run_flags, debug_ctl; bool req_immediate_exit = false; @@ -11051,7 +11051,17 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) set_debugreg(0, 7); } - vcpu->arch.host_debugctl = get_debugctlmsr(); + /* + * Refresh the host DEBUGCTL snapshot after disabling IRQs, as DEBUGCTL + * can be modified in IRQ context, e.g. via SMP function calls. Inform + * vendor code if any host-owned bits were changed, e.g. so that the + * value loaded into hardware while running the guest can be updated. + */ + debug_ctl = get_debugctlmsr(); + if ((debug_ctl ^ vcpu->arch.host_debugctl) & kvm_x86_ops.HOST_OWNED_DEBUGCTL && + !vcpu->arch.guest_state_protected) + run_flags |= KVM_RUN_LOAD_DEBUGCTL; + vcpu->arch.host_debugctl = debug_ctl; guest_timing_enter_irqoff();

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081226-sublease-shoptalk-bc18@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:08 -0700 Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports a subset of hardware functionality, i.e. KVM can't rely on hardware to detect illegal/unsupported values. Failure to check the vmcs12 value would allow the guest to load any harware-supported value while running L2. Take care to exempt BTF and LBR from the validity check in order to match KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR are being intercepted. Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set *and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but that would incur non-trivial complexity and wouldn't change the fact that KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity is not worth carrying. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 7211c71d4241..1b8b0642fc2d 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) { kvm_set_dr(vcpu, 7, vmcs12->guest_dr7); - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl); + vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl & + vmx_get_supported_debugctl(vcpu, false)); } else { kvm_set_dr(vcpu, 7, vcpu->arch.dr7); vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl); @@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, return -EINVAL; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) && - CC(!kvm_dr7_valid(vmcs12->guest_dr7))) + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) || + CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false)))) return -EINVAL; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) && @@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) | (vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE); + /* + * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02. + * Writes to DEBUGCTL that aren't intercepted by L1 are immediately + * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into + * vmcs02 doesn't strictly track vmcs12. + */ if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS) vmcs12->guest_dr7 = vcpu->arch.dr7; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 4f827a75d980..6a8b78e954cd 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu, return (unsigned long)data; } -static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) +u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) { u64 debugctl = 0; @@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated return debugctl; } -static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, - bool host_initiated) +bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) { u64 invalid; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index b5758c33c60f..392e66c7e5fe 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); +u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated); +bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated); + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081225-unsettled-hardener-28aa@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:08 -0700 Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports a subset of hardware functionality, i.e. KVM can't rely on hardware to detect illegal/unsupported values. Failure to check the vmcs12 value would allow the guest to load any harware-supported value while running L2. Take care to exempt BTF and LBR from the validity check in order to match KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR are being intercepted. Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set *and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but that would incur non-trivial complexity and wouldn't change the fact that KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity is not worth carrying. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 7211c71d4241..1b8b0642fc2d 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) { kvm_set_dr(vcpu, 7, vmcs12->guest_dr7); - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl); + vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl & + vmx_get_supported_debugctl(vcpu, false)); } else { kvm_set_dr(vcpu, 7, vcpu->arch.dr7); vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl); @@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, return -EINVAL; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) && - CC(!kvm_dr7_valid(vmcs12->guest_dr7))) + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) || + CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false)))) return -EINVAL; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) && @@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) | (vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE); + /* + * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02. + * Writes to DEBUGCTL that aren't intercepted by L1 are immediately + * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into + * vmcs02 doesn't strictly track vmcs12. + */ if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS) vmcs12->guest_dr7 = vcpu->arch.dr7; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 4f827a75d980..6a8b78e954cd 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu, return (unsigned long)data; } -static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) +u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) { u64 debugctl = 0; @@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated return debugctl; } -static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, - bool host_initiated) +bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) { u64 invalid; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index b5758c33c60f..392e66c7e5fe 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); +u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated); +bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated); + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 095686e6fcb4150f0a55b1a25987fad3d8af58d6 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081224-earlobe-tilt-6881@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 095686e6fcb4150f0a55b1a25987fad3d8af58d6 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:08 -0700 Subject: [PATCH] KVM: nVMX: Check vmcs12->guest_ia32_debugctl on nested VM-Enter Add a consistency check for L2's guest_ia32_debugctl, as KVM only supports a subset of hardware functionality, i.e. KVM can't rely on hardware to detect illegal/unsupported values. Failure to check the vmcs12 value would allow the guest to load any harware-supported value while running L2. Take care to exempt BTF and LBR from the validity check in order to match KVM's behavior for writes via WRMSR, but without clobbering vmcs12. Even if VM_EXIT_SAVE_DEBUG_CONTROLS is set in vmcs12, L1 can reasonably expect that vmcs12->guest_ia32_debugctl will not be modified if writes to the MSR are being intercepted. Arguably, KVM _should_ update vmcs12 if VM_EXIT_SAVE_DEBUG_CONTROLS is set *and* writes to MSR_IA32_DEBUGCTLMSR are not being intercepted by L1, but that would incur non-trivial complexity and wouldn't change the fact that KVM's handling of DEBUGCTL is blatantly broken. I.e. the extra complexity is not worth carrying. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> Co-developed-by: Sean Christopherson <seanjc(a)google.com> Link: https://lore.kernel.org/r/20250610232010.162191-7-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 7211c71d4241..1b8b0642fc2d 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2663,7 +2663,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) { kvm_set_dr(vcpu, 7, vmcs12->guest_dr7); - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl); + vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl & + vmx_get_supported_debugctl(vcpu, false)); } else { kvm_set_dr(vcpu, 7, vcpu->arch.dr7); vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl); @@ -3156,7 +3157,8 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu, return -EINVAL; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS) && - CC(!kvm_dr7_valid(vmcs12->guest_dr7))) + (CC(!kvm_dr7_valid(vmcs12->guest_dr7)) || + CC(!vmx_is_valid_debugctl(vcpu, vmcs12->guest_ia32_debugctl, false)))) return -EINVAL; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT) && @@ -4608,6 +4610,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) (vmcs12->vm_entry_controls & ~VM_ENTRY_IA32E_MODE) | (vm_entry_controls_get(to_vmx(vcpu)) & VM_ENTRY_IA32E_MODE); + /* + * Note! Save DR7, but intentionally don't grab DEBUGCTL from vmcs02. + * Writes to DEBUGCTL that aren't intercepted by L1 are immediately + * propagated to vmcs12 (see vmx_set_msr()), as the value loaded into + * vmcs02 doesn't strictly track vmcs12. + */ if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_DEBUG_CONTROLS) vmcs12->guest_dr7 = vcpu->arch.dr7; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 4f827a75d980..6a8b78e954cd 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2174,7 +2174,7 @@ static u64 nested_vmx_truncate_sysenter_addr(struct kvm_vcpu *vcpu, return (unsigned long)data; } -static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) +u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated) { u64 debugctl = 0; @@ -2193,8 +2193,7 @@ static u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated return debugctl; } -static bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, - bool host_initiated) +bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated) { u64 invalid; diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index b5758c33c60f..392e66c7e5fe 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -414,6 +414,9 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr, void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); +u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated); +bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated); + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and

1 month

1
0
0 0

FAILED: patch "[PATCH] KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 7d0cce6cbe71af6e9c1831bff101a2b9c249c4a2 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025081216-unruffled-starry-0b41@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 7d0cce6cbe71af6e9c1831bff101a2b9c249c4a2 Mon Sep 17 00:00:00 2001 From: Maxim Levitsky <mlevitsk(a)redhat.com> Date: Tue, 10 Jun 2025 16:20:09 -0700 Subject: [PATCH] KVM: VMX: Wrap all accesses to IA32_DEBUGCTL with getter/setter APIs Introduce vmx_guest_debugctl_{read,write}() to handle all accesses to vmcs.GUEST_IA32_DEBUGCTL. This will allow stuffing FREEZE_IN_SMM into GUEST_IA32_DEBUGCTL based on the host setting without bleeding the state into the guest, and without needing to copy+paste the FREEZE_IN_SMM logic into every patch that accesses GUEST_IA32_DEBUGCTL. No functional change intended. Cc: stable(a)vger.kernel.org Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com> [sean: massage changelog, make inline, use in all prepare_vmcs02() cases] Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com> Link: https://lore.kernel.org/r/20250610232010.162191-8-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc(a)google.com> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 1b8b0642fc2d..ef20184b8b11 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -2663,11 +2663,11 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, if (vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) { kvm_set_dr(vcpu, 7, vmcs12->guest_dr7); - vmcs_write64(GUEST_IA32_DEBUGCTL, vmcs12->guest_ia32_debugctl & - vmx_get_supported_debugctl(vcpu, false)); + vmx_guest_debugctl_write(vcpu, vmcs12->guest_ia32_debugctl & + vmx_get_supported_debugctl(vcpu, false)); } else { kvm_set_dr(vcpu, 7, vcpu->arch.dr7); - vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.pre_vmenter_debugctl); + vmx_guest_debugctl_write(vcpu, vmx->nested.pre_vmenter_debugctl); } if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) @@ -3532,7 +3532,7 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, if (!vmx->nested.nested_run_pending || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_DEBUG_CONTROLS)) - vmx->nested.pre_vmenter_debugctl = vmcs_read64(GUEST_IA32_DEBUGCTL); + vmx->nested.pre_vmenter_debugctl = vmx_guest_debugctl_read(); if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending || !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))) @@ -4806,7 +4806,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, __vmx_set_segment(vcpu, &seg, VCPU_SREG_LDTR); kvm_set_dr(vcpu, 7, 0x400); - vmcs_write64(GUEST_IA32_DEBUGCTL, 0); + vmx_guest_debugctl_write(vcpu, 0); if (nested_vmx_load_msr(vcpu, vmcs12->vm_exit_msr_load_addr, vmcs12->vm_exit_msr_load_count)) diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c index bbf4509f32d0..0b173602821b 100644 --- a/arch/x86/kvm/vmx/pmu_intel.c +++ b/arch/x86/kvm/vmx/pmu_intel.c @@ -653,11 +653,11 @@ static void intel_pmu_reset(struct kvm_vcpu *vcpu) */ static void intel_pmu_legacy_freezing_lbrs_on_pmi(struct kvm_vcpu *vcpu) { - u64 data = vmcs_read64(GUEST_IA32_DEBUGCTL); + u64 data = vmx_guest_debugctl_read(); if (data & DEBUGCTLMSR_FREEZE_LBRS_ON_PMI) { data &= ~DEBUGCTLMSR_LBR; - vmcs_write64(GUEST_IA32_DEBUGCTL, data); + vmx_guest_debugctl_write(vcpu, data); } } @@ -730,7 +730,7 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu) if (!lbr_desc->event) { vmx_disable_lbr_msrs_passthrough(vcpu); - if (vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR) + if (vmx_guest_debugctl_read() & DEBUGCTLMSR_LBR) goto warn; if (test_bit(INTEL_PMC_IDX_FIXED_VLBR, pmu->pmc_in_use)) goto warn; @@ -752,7 +752,7 @@ void vmx_passthrough_lbr_msrs(struct kvm_vcpu *vcpu) static void intel_pmu_cleanup(struct kvm_vcpu *vcpu) { - if (!(vmcs_read64(GUEST_IA32_DEBUGCTL) & DEBUGCTLMSR_LBR)) + if (!(vmx_guest_debugctl_read() & DEBUGCTLMSR_LBR)) intel_pmu_release_guest_lbr_event(vcpu); } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6a8b78e954cd..a77d325fe78b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2149,7 +2149,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) msr_info->data = vmx->pt_desc.guest.addr_a[index / 2]; break; case MSR_IA32_DEBUGCTLMSR: - msr_info->data = vmcs_read64(GUEST_IA32_DEBUGCTL); + msr_info->data = vmx_guest_debugctl_read(); break; default: find_uret_msr: @@ -2283,7 +2283,8 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info) VM_EXIT_SAVE_DEBUG_CONTROLS) get_vmcs12(vcpu)->guest_ia32_debugctl = data; - vmcs_write64(GUEST_IA32_DEBUGCTL, data); + vmx_guest_debugctl_write(vcpu, data); + if (intel_pmu_lbr_is_enabled(vcpu) && !to_vmx(vcpu)->lbr_desc.event && (data & DEBUGCTLMSR_LBR)) intel_pmu_create_guest_lbr_event(vcpu); @@ -4798,7 +4799,8 @@ static void init_vmcs(struct vcpu_vmx *vmx) vmcs_write32(GUEST_SYSENTER_CS, 0); vmcs_writel(GUEST_SYSENTER_ESP, 0); vmcs_writel(GUEST_SYSENTER_EIP, 0); - vmcs_write64(GUEST_IA32_DEBUGCTL, 0); + + vmx_guest_debugctl_write(&vmx->vcpu, 0); if (cpu_has_vmx_tpr_shadow()) { vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, 0); diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h index 392e66c7e5fe..c20a4185d10a 100644 --- a/arch/x86/kvm/vmx/vmx.h +++ b/arch/x86/kvm/vmx/vmx.h @@ -417,6 +417,16 @@ void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu); u64 vmx_get_supported_debugctl(struct kvm_vcpu *vcpu, bool host_initiated); bool vmx_is_valid_debugctl(struct kvm_vcpu *vcpu, u64 data, bool host_initiated); +static inline void vmx_guest_debugctl_write(struct kvm_vcpu *vcpu, u64 val) +{ + vmcs_write64(GUEST_IA32_DEBUGCTL, val); +} + +static inline u64 vmx_guest_debugctl_read(void) +{ + return vmcs_read64(GUEST_IA32_DEBUGCTL); +} + /* * Note, early Intel manuals have the write-low and read-high bitmap offsets * the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and

1 month

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror