On Thu, Jul 18, 2024, H. Peter Anvin wrote:
On July 12, 2024 8:12:51 AM PDT, Sean Christopherson <seanjc@google.com> wrote:
On Wed, Jul 10, 2024, Xin3 Li wrote:
On Wed, Feb 07, 2024, Xin Li wrote:
Switch MSR_IA32_FRED_RSP0 between host and guest in vmx_prepare_switch_to_{host,guest}().
MSR_IA32_FRED_RSP0 is used during ring 3 event delivery only, thus KVM, running on ring 0, can run safely with guest FRED RSP0, i.e., there is no need to switch between host/guest FRED RSP0 during VM entry and exit.
KVM should switch to host FRED RSP0 before returning to user level, and switch to guest FRED RSP0 before entering guest mode.
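As an illustrative sketch of that approach (not the posted patch; the vcpu_vmx field names below are assumptions):

/*
 * Illustrative sketch only: context switch MSR_IA32_FRED_RSP0 in KVM's
 * software switch paths.  The msr_host_fred_rsp0/msr_guest_fred_rsp0
 * fields are hypothetical names, not necessarily those in the patch.
 */
static void fred_switch_rsp0_to_guest(struct vcpu_vmx *vmx)
{
	/* Stash the host (per-task) value, then install the guest's. */
	rdmsrl(MSR_IA32_FRED_RSP0, vmx->msr_host_fred_rsp0);
	wrmsrl(MSR_IA32_FRED_RSP0, vmx->msr_guest_fred_rsp0);
}

static void fred_switch_rsp0_to_host(struct vcpu_vmx *vmx)
{
	/* Preserve the guest value for the next entry, restore the host's. */
	rdmsrl(MSR_IA32_FRED_RSP0, vmx->msr_guest_fred_rsp0);
	wrmsrl(MSR_IA32_FRED_RSP0, vmx->msr_host_fred_rsp0);
}

These would be called from vmx_prepare_switch_to_guest() and vmx_prepare_switch_to_host() respectively.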
Heh, if only KVM had a framework that was specifically designed for context switching MSRs on return to userspace. Translation: please use the user_return_msr() APIs.
IIUC the user return MSR framework works for MSRs that are per-CPU constants, but MSR_IA32_FRED_RSP0, like MSR_KERNEL_GS_BASE, is a per-*task* constant, thus we can't use it.
Ah, in that case, the changelog is very misleading and needs to be fixed. Alternatively, is the desired RSP0 value tracked anywhere other than the MSR? E.g. if it's somewhere in task_struct, then kvm_on_user_return() would restore the current task's desired RSP0. Even if we don't get fancy, avoiding the RDMSR to get the current task's value would be nice.
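For illustration, a hedged sketch of the no-RDMSR idea, assuming the desired RSP0 is always the top of the current task's kernel stack (the helper name is hypothetical):

#include <linux/sched/task_stack.h>
#include <asm/msr.h>

/*
 * Hypothetical helper: recompute the current task's FRED RSP0 instead
 * of reading it back with RDMSR, assuming the kernel always programs
 * MSR_IA32_FRED_RSP0 to the top of the task's kernel stack.
 */
static void fred_restore_task_rsp0(void)
{
	unsigned long rsp0 = (unsigned long)task_stack_page(current) + THREAD_SIZE;

	wrmsrl(MSR_IA32_FRED_RSP0, rsp0);
}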
Hm, perhaps the right thing to do is to always invoke this function before a context switch, if the context switch happens before the return to user space?
Actually, if the _TIF_NEED_RSP0_LOAD doesn't provide a meaningful benefit (or y'all just don't want it :-) ),
We want it 😊.
My concern was adding an extra check of (ti_work & _TIF_NEED_RSP0_LOAD) to a hot function, arch_exit_to_user_mode_prepare(). HPA looked at the function and suggested testing ti_work for zero first and only then processing its individual bits:
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index fb2809b20b0a..4c78b99060b5 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -47,15 +47,17 @@ static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs)
 static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
 						  unsigned long ti_work)
 {
-	if (ti_work & _TIF_USER_RETURN_NOTIFY)
-		fire_user_return_notifiers();
+	if (ti_work) {
+		if (ti_work & _TIF_USER_RETURN_NOTIFY)
+			fire_user_return_notifiers();
 
-	if (unlikely(ti_work & _TIF_IO_BITMAP))
-		tss_update_io_bitmap();
+		if (unlikely(ti_work & _TIF_IO_BITMAP))
+			tss_update_io_bitmap();
 
-	fpregs_assert_state_consistent();
-	if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
-		switch_fpu_return();
+		fpregs_assert_state_consistent();
+		if (unlikely(ti_work & _TIF_NEED_FPU_LOAD))
+			switch_fpu_return();
+	}
 
 #ifdef CONFIG_COMPAT
 	/*
Based on that, I measured how many out of every one million ti_work values are zero: in kernel build tests it's over 99%, i.e., unlikely(ti_work) holds.

When booting a KVM guest it drops to 75%, which is expected. Once the guest is up and running a kernel build inside, it's 99% again.
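One way such a ratio could be sampled (hypothetical counters and hook placement, not necessarily how the numbers above were collected):

#include <linux/percpu.h>

/*
 * Hypothetical instrumentation: count zero vs. total ti_work values
 * seen on exit to user mode; the ratio tells how often the whole
 * ti_work block above would be skipped.
 */
static DEFINE_PER_CPU(u64, ti_work_total);
static DEFINE_PER_CPU(u64, ti_work_zero);

static __always_inline void account_ti_work(unsigned long ti_work)
{
	raw_cpu_inc(ti_work_total);
	if (!ti_work)
		raw_cpu_inc(ti_work_zero);
}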
So at least this patch looks like low-hanging fruit, and I have sent it to Intel 0day for broader performance tests.
As context switches are far less frequent than exits to user mode, I do NOT expect writing MSR_IA32_FRED_RSP0 on exit to user mode instead of on context switch to make a difference, especially on top of the above patch.
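If the exit-to-user-mode variant is pursued anyway, a hedged sketch of how a _TIF_NEED_RSP0_LOAD bit could slot into the ti_work-guarded block above (reusing the hypothetical helper sketched earlier):

/*
 * Sketch only: consume a hypothetical _TIF_NEED_RSP0_LOAD bit on exit
 * to user mode, mirroring the _TIF_NEED_FPU_LOAD handling above.
 */
static __always_inline void arch_handle_rsp0_load(unsigned long ti_work)
{
	if (unlikely(ti_work & _TIF_NEED_RSP0_LOAD))
		fred_restore_task_rsp0();
}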