Malicious guests can cause bus locks to degrade the performance of a system. Non-WB (write-back) and misaligned locked RMW (read-modify-write) instructions are referred to as "bus locks" and require system-wide synchronization among all processors to guarantee atomicity. Bus locks can impose notable performance penalties on all processors in the system.
Support for the Bus Lock Threshold feature is indicated by CPUID Fn8000_000A_EDX[29] (BusLockThreshold=1). The VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus Lock Threshold count.
VMCB intercept bit:
    VMCB Offset    Bits    Function
    14h            5       Intercept bus lock operations
Bus lock threshold count:
    VMCB Offset    Bits    Function
    120h           15:0    Bus lock counter
During VMRUN, the bus lock threshold count is fetched and stored in an internal count register. Prior to executing a bus lock within the guest, the processor checks the count in this register. If the count is greater than zero, the processor executes the bus lock and decrements the count. If the count is zero, the bus lock is not executed; instead, a Bus Lock Threshold #VMEXIT is triggered to transfer control to the Virtual Machine Monitor (VMM).
A Bus Lock Threshold #VMEXIT is reported to the VMM with VMEXIT code 0xA5 (VMEXIT_BUSLOCK). EXITINFO1 and EXITINFO2 are set to 0 on a VMEXIT_BUSLOCK. On #VMEXIT, the processor writes the current value of the Bus Lock Threshold Counter back to the VMCB.
More details about the Bus Lock Threshold feature can be found in AMD APM [1].
Patches are prepared on kvm-x86/svm (704ec48fc2fb)
Testing done:
- Added a selftest for the Bus Lock Threshold functionality.
- Tested the Bus Lock Threshold functionality on SEV and SEV-ES guests.
- Tested the Bus Lock Threshold functionality on nested guests.
Qemu changes can be found at:
Repo:   https://github.com/AMDESE/qemu.git
Branch: buslock_threshold
Qemu command line to use the bus lock threshold functionality:

  qemu-system-x86_64 -enable-kvm -cpu EPYC-Turin,+svm -M q35,bus-lock-ratelimit=10 \
  ..
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.14.5 Bus Lock Threshold. https://bugzilla.kernel.org/attachment.cgi?id=306250
Manali Shukla (2):
  x86/cpufeatures: Add CPUID feature bit for the Bus Lock Threshold
  KVM: x86: nSVM: Implement support for nested Bus Lock Threshold

Nikunj A Dadhania (2):
  KVM: SVM: Enable Bus lock threshold exit
  KVM: selftests: Add bus lock exit test
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/svm.h                    |   5 +-
 arch/x86/include/uapi/asm/svm.h               |   2 +
 arch/x86/kvm/governed_features.h              |   1 +
 arch/x86/kvm/svm/nested.c                     |  25 ++++
 arch/x86/kvm/svm/svm.c                        |  48 ++++++++
 arch/x86/kvm/svm/svm.h                        |   1 +
 arch/x86/kvm/x86.h                            |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/svm_buslock_test.c   | 114 ++++++++++++++++++
 10 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/svm_buslock_test.c
base-commit: 704ec48fc2fbd4e41ec982662ad5bf1eee33eeb2
Malicious guests can cause bus locks to degrade the performance of a system. Non-WB (write-back) and misaligned locked RMW (read-modify-write) instructions are referred to as "bus locks" and require system-wide synchronization among all processors to guarantee atomicity. Bus locks may incur significant performance penalties for all processors in the system.
The Bus Lock Threshold feature proves beneficial for hypervisors seeking to restrict guests' ability to initiate numerous bus locks, thereby preventing system slowdowns that affect all tenants.
Presence of the Bus Lock Threshold feature is indicated via CPUID function 0x8000000A_EDX[29].
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..10f397873790 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -381,6 +381,7 @@
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
+#define X86_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* "" Bus lock threshold */

 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI		(16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
base-commit: 704ec48fc2fbd4e41ec982662ad5bf1eee33eeb2
On Tue, Jul 09, 2024, Manali Shukla wrote:
Malicious guests can cause bus locks to degrade the performance of
I would say "misbehaving", I bet the overwhelming majority of bus locks in practice are due to legacy/crusty software, not malicious software.
a system. Non-WB(write-back) and misaligned locked RMW(read-modify-write) instructions are referred to as "bus locks" and require system wide synchronization among all processors to guarantee atomicity. The bus locks may incur significant performance penalties for all processors in the system.
The Bus Lock Threshold feature proves beneficial for hypervisors seeking to restrict guests' ability to initiate numerous bus locks, thereby preventing system slowdowns that affect all tenants.
None of this actually says what the feature does.
Presence of the Bus Lock threshold feature is indicated via CPUID function 0x8000000A_EDX[29]
Signed-off-by: Manali Shukla manali.shukla@amd.com
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..10f397873790 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -381,6 +381,7 @@
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
+#define X86_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* "" Bus lock threshold */
I would strongly prefer to enumerate this in /proc/cpuinfo, having to manually query CPUID to see if a CPU supports a feature I want to test is beyond annoying.
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI		(16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
base-commit: 704ec48fc2fbd4e41ec982662ad5bf1eee33eeb2
2.34.1
Hi Sean,
Thank you for reviewing my patches.
On 8/17/2024 1:07 AM, Sean Christopherson wrote:
On Tue, Jul 09, 2024, Manali Shukla wrote:
Malicious guests can cause bus locks to degrade the performance of
I would say "misbehaving", I bet the overwhelming majority of bus locks in practice are due to legacy/crusty software, not malicious software.
Ack.
a system. Non-WB(write-back) and misaligned locked RMW(read-modify-write) instructions are referred to as "bus locks" and require system wide synchronization among all processors to guarantee atomicity. The bus locks may incur significant performance penalties for all processors in the system.
The Bus Lock Threshold feature proves beneficial for hypervisors seeking to restrict guests' ability to initiate numerous bus locks, thereby preventing system slowdowns that affect all tenants.
None of this actually says what the feature does.
Sure I will rewrite the commit message.
Presence of the Bus Lock threshold feature is indicated via CPUID function 0x8000000A_EDX[29]
Signed-off-by: Manali Shukla manali.shukla@amd.com
 arch/x86/include/asm/cpufeatures.h | 1 +
 1 file changed, 1 insertion(+)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 3c7434329661..10f397873790 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -381,6 +381,7 @@
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* "" SVME addr check */
+#define X86_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* "" Bus lock threshold */
I would strongly prefer to enumerate this in /proc/cpuinfo, having to manually query CPUID to see if a CPU supports a feature I want to test is beyond annoying.
I will do the modifications accordingly.
 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI		(16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/
base-commit: 704ec48fc2fbd4e41ec982662ad5bf1eee33eeb2
2.34.1
- Manali
On Fri, Aug 16, 2024 at 12:37:52PM -0700, Sean Christopherson wrote:
I would strongly prefer to enumerate this in /proc/cpuinfo, having to manually query CPUID to see if a CPU supports a feature I want to test is beyond annoying.
Why?
We have tools/arch/x86/kcpuid/kcpuid.c for that.
On Thu, Aug 29, 2024, Borislav Petkov wrote:
On Fri, Aug 16, 2024 at 12:37:52PM -0700, Sean Christopherson wrote:
I would strongly prefer to enumerate this in /proc/cpuinfo, having to manually query CPUID to see if a CPU supports a feature I want to test is beyond annoying.
Why?
We have tools/arch/x86/kcpuid/kcpuid.c for that.
Ah, sorry, if the platform+kernel supports the feature, not just raw CPU. And because that utility is not available by default on most targets I care about, and having to build and copy over a binary is annoying (though this is a minor gripe).
That said, what I really want in most cases is to know if _KVM_ supports a feature. I'll think more on this, I have a few vague ideas for getting a pile of information out of KVM without needing to add more uABI.
On Thu, Aug 29, 2024 at 09:42:40PM -0700, Sean Christopherson wrote:
Ah, sorry, if the platform+kernel supports the feature, not just raw CPU.
Yeah, that's not always trivial, as I'm sure you know. Especially if it is a complicated feature like, SNP, for example, which needs fw and platform to be configured properly and so on.
And because that utility is not available by default on most targets I care about, and having to build and copy over a binary is annoying (though this is a minor gripe).
I'm keeping that thing as simple as possible on purpose. So if you wanna make it available on such targets, I'm all ears.
That said, what I really want in most cases is to know if _KVM_ supports a feature. I'll think more on this, I have a few vague ideas for getting a pile of information out of KVM without needing to add more uABI.
That's exactly my pet peeve - making it a uABI and then supporting it foreva.
We have tried to explain what cpuinfo should be:
Documentation/arch/x86/cpuinfo.rst
The gist of it is:
"So, the current use of /proc/cpuinfo is to show features which the kernel has *enabled* and *supports*. As in: the CPUID feature flag is there, there's an additional setup which the kernel has done while booting and the functionality is ready to use. A perfect example for that is "user_shstk" where additional code enablement is present in the kernel to support shadow stack for user programs."
So if it is something that has been enabled and is actively supported, then sure, ofc. What I don't want to have there is a partial mirror of every possible CPUID flag which is going to be a senseless and useless madness.
Dunno, I guess if we had a
"virt: ..."
line in /proc/cpuinfo which has flags of what the hypervisor has enabled as a feature, it might not be such a wrong idea... with the above caveats, ofc. I don't think you want a flurry of patches setting all possible flags just because.
Or maybe somewhere else where you can query it conveniently...
On 8/30/2024 1:51 PM, Borislav Petkov wrote:
On Thu, Aug 29, 2024 at 09:42:40PM -0700, Sean Christopherson wrote:
Ah, sorry, if the platform+kernel supports the feature, not just raw CPU.
Yeah, that's not always trivial, as I'm sure you know. Especially if it is a complicated feature like, SNP, for example, which needs fw and platform to be configured properly and so on.
And because that utility is not available by default on most targets I care about, and having to build and copy over a binary is annoying (though this is a minor gripe).
I'm keeping that thing as simple as possible on purpose. So if you wanna make it available on such targets, I'm all ears.
That said, what I really want in most cases is to know if _KVM_ supports a feature. I'll think more on this, I have a few vague ideas for getting a pile of information out of KVM without needing to add more uABI.
That's exactly my pet peeve - making it a uABI and then supporting it foreva.
We have tried to explain what cpuinfo should be:
Documentation/arch/x86/cpuinfo.rst
The gist of it is:
"So, the current use of /proc/cpuinfo is to show features which the kernel has *enabled* and *supports*. As in: the CPUID feature flag is there, there's an additional setup which the kernel has done while booting and the functionality is ready to use. A perfect example for that is "user_shstk" where additional code enablement is present in the kernel to support shadow stack for user programs."
So if it is something that has been enabled and is actively supported, then sure, ofc. What I don't want to have there is a partial mirror of every possible CPUID flag which is going to be a senseless and useless madness.
Dunno, I guess if we had a
"virt: ..."
line in /proc/cpuinfo which has flags of what the hypervisor has enabled as a feature, it might not be such a wrong idea... with the above caveats, ofc. I don't think you want a flurry of patches setting all possible flags just because.
Or maybe somewhere else where you can query it conveniently...
I came up with this patch. Does it look okay?
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 0b9611da6c53..74c52bfd8cf2 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -41,6 +41,7 @@ enum cpuid_leafs
 #define x86_cap_flag_num(flag) ((flag) >> 5), ((flag) & 31)

 extern const char * const x86_cap_flags[NCAPINTS*32];
+extern const char * const x86_virt_flags[NCAPINTS*32];
 extern const char * const x86_power_flags[32];
 #define X86_CAP_FMT "%s"
 #define x86_cap_flag(flag) x86_cap_flags[flag]
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 734940fdb6c1..20f389ee0079 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -382,7 +382,7 @@
 #define X86_FEATURE_V_SPEC_CTRL		(15*32+20) /* "v_spec_ctrl" Virtual SPEC_CTRL */
 #define X86_FEATURE_VNMI		(15*32+25) /* "vnmi" Virtual NMI */
 #define X86_FEATURE_SVME_ADDR_CHK	(15*32+28) /* SVME addr check */
-#define X86_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* "" Bus lock threshold */
+#define X86_VIRT_FEATURE_BUS_LOCK_THRESHOLD	(15*32+29) /* "buslock" Bus lock threshold */

 /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */
 #define X86_FEATURE_AVX512VBMI		(16*32+ 1) /* "avx512vbmi" AVX512 Vector Bit Manipulation instructions*/
diff --git a/arch/x86/kernel/cpu/mkcapflags.sh b/arch/x86/kernel/cpu/mkcapflags.sh
index 68f537347466..3671c7892c56 100644
--- a/arch/x86/kernel/cpu/mkcapflags.sh
+++ b/arch/x86/kernel/cpu/mkcapflags.sh
@@ -62,6 +62,9 @@ trap 'rm "$OUT"' EXIT
 	dump_array "x86_bug_flags" "NBUGINTS*32" "X86_BUG_" "NCAPINTS*32" $2
 	echo ""

+	dump_array "x86_virt_flags" "NCAPINTS*32" "X86_VIRT_FEATURE_" "" $2
+	echo ""
+
 	echo "#ifdef CONFIG_X86_VMX_FEATURE_NAMES"
 	echo "#ifndef _ASM_X86_VMXFEATURES_H"
 	echo "#include <asm/vmxfeatures.h>"
diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c
index e65fae63660e..3068b0a110e4 100644
--- a/arch/x86/kernel/cpu/proc.c
+++ b/arch/x86/kernel/cpu/proc.c
@@ -103,6 +103,11 @@ static int show_cpuinfo(struct seq_file *m, void *v)
 		if (cpu_has(c, i) && x86_cap_flags[i] != NULL)
 			seq_printf(m, " %s", x86_cap_flags[i]);

+	seq_puts(m, "\nvirt\t\t:");
+	for (i = 0; i < 32*NCAPINTS; i++)
+		if (cpu_has(c, i) && x86_virt_flags[i] != NULL)
+			seq_printf(m, " %s", x86_virt_flags[i]);
+
 #ifdef CONFIG_X86_VMX_FEATURE_NAMES
 	if (cpu_has(c, X86_FEATURE_VMX) && c->vmx_capability[0]) {
 		seq_puts(m, "\nvmx flags\t:");
Output for this patch from /proc/cpuinfo looks like below:
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp vmmcall fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx_vnni avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid movdiri movdir64b overflow_recov succor smca fsrm avx512_vp2intersect flush_l1d sev sev_es sev_snp debug_swap amd_lbr_pmc_freeze
virt		: buslock
bugs		: sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass
- Manali
From: Nikunj A Dadhania nikunj@amd.com
Malicious guests can cause bus locks to degrade the performance of a system. Non-WB (write-back) and misaligned locked RMW (read-modify-write) instructions are referred to as "bus locks" and require system-wide synchronization among all processors to guarantee atomicity. Bus locks may incur significant performance penalties for all processors in the system.
The Bus Lock Threshold feature proves beneficial for hypervisors seeking to restrict guests' ability to initiate numerous bus locks, thereby preventing system slowdowns that affect all tenants.
Support for the Bus Lock Threshold feature is indicated via CPUID function 0x8000000A_EDX[29].
VMCB intercept bit:
    VMCB Offset    Bits    Function
    14h            5       Intercept bus lock operations
                           (occurs after guest instruction finishes)
Bus lock threshold:
    VMCB Offset    Bits    Function
    120h           15:0    Bus lock counter
Use the KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to enable the feature.
When the bus lock threshold counter reaches zero, KVM exits to user space with exit reason KVM_EXIT_X86_BUS_LOCK and sets KVM_RUN_X86_BUS_LOCK in vcpu->run->flags in the bus_lock_exit handler, indicating that a bus lock has been detected in the guest.
More details about the Bus Lock Threshold feature can be found in AMD APM [1].
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.14.5 Bus Lock Threshold. https://bugzilla.kernel.org/attachment.cgi?id=306250
[Manali:
 - Added exit reason string for SVM_EXIT_BUS_LOCK.
 - Moved enablement and disablement of bus lock intercept support to
   svm_vcpu_after_set_cpuid().
 - Massaged commit message.
 - Misc cleanups.]
Signed-off-by: Nikunj A Dadhania <nikunj@amd.com>
Co-developed-by: Manali Shukla <manali.shukla@amd.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
---
 arch/x86/include/asm/svm.h      |  5 +++-
 arch/x86/include/uapi/asm/svm.h |  2 ++
 arch/x86/kvm/svm/svm.c          | 43 +++++++++++++++++++++++++++++++++
 arch/x86/kvm/x86.h              |  1 +
 4 files changed, 50 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
index 728c98175b9c..538b7d60b05c 100644
--- a/arch/x86/include/asm/svm.h
+++ b/arch/x86/include/asm/svm.h
@@ -116,6 +116,7 @@ enum {
 	INTERCEPT_INVPCID,
 	INTERCEPT_MCOMMIT,
 	INTERCEPT_TLBSYNC,
+	INTERCEPT_BUSLOCK,
 };

@@ -158,7 +159,9 @@ struct __attribute__ ((__packed__)) vmcb_control_area {
 	u64 avic_physical_id;	/* Offset 0xf8 */
 	u8 reserved_7[8];
 	u64 vmsa_pa;		/* Used for an SEV-ES guest */
-	u8 reserved_8[720];
+	u8 reserved_8[16];
+	u16 bus_lock_counter;	/* Offset 0x120 */
+	u8 reserved_9[702];
 	/*
 	 * Offset 0x3e0, 32 bytes reserved
 	 * for use by hypervisor/software.
diff --git a/arch/x86/include/uapi/asm/svm.h b/arch/x86/include/uapi/asm/svm.h
index 80e1df482337..dcce3ca367e9 100644
--- a/arch/x86/include/uapi/asm/svm.h
+++ b/arch/x86/include/uapi/asm/svm.h
@@ -95,6 +95,7 @@
 #define SVM_EXIT_CR14_WRITE_TRAP		0x09e
 #define SVM_EXIT_CR15_WRITE_TRAP		0x09f
 #define SVM_EXIT_INVPCID			0x0a2
+#define SVM_EXIT_BUS_LOCK			0x0a5
 #define SVM_EXIT_NPF				0x400
 #define SVM_EXIT_AVIC_INCOMPLETE_IPI		0x401
 #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS	0x402
@@ -223,6 +224,7 @@
 	{ SVM_EXIT_CR4_WRITE_TRAP,	"write_cr4_trap" }, \
 	{ SVM_EXIT_CR8_WRITE_TRAP,	"write_cr8_trap" }, \
 	{ SVM_EXIT_INVPCID,		"invpcid" }, \
+	{ SVM_EXIT_BUS_LOCK,		"buslock" }, \
 	{ SVM_EXIT_NPF,			"npf" }, \
 	{ SVM_EXIT_AVIC_INCOMPLETE_IPI,	"avic_incomplete_ipi" }, \
 	{ SVM_EXIT_AVIC_UNACCELERATED_ACCESS, "avic_unaccelerated_access" }, \
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7d396f5fa010..9f1d51384eac 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -191,6 +191,9 @@ module_param(pause_filter_count_shrink, ushort, 0444);
 static unsigned short pause_filter_count_max = KVM_SVM_DEFAULT_PLE_WINDOW_MAX;
 module_param(pause_filter_count_max, ushort, 0444);

+static unsigned short bus_lock_counter = KVM_SVM_DEFAULT_BUS_LOCK_COUNTER;
+module_param(bus_lock_counter, ushort, 0644);
+
 /*
  * Use nested page tables by default.  Note, NPT may get forced off by
  * svm_hardware_setup() if it's unsupported by hardware or the host kernel.
@@ -3231,6 +3234,19 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
 	return kvm_handle_invpcid(vcpu, type, gva);
 }

+static int bus_lock_exit(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
+	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
+
+	/* Reload the counter again */
+	svm->vmcb->control.bus_lock_counter = bus_lock_counter;
+
+	return 0;
+}
+
 static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -3298,6 +3314,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_INVPCID]			= invpcid_interception,
+	[SVM_EXIT_BUS_LOCK]			= bus_lock_exit,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]				= rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -4356,6 +4373,27 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
 			     !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));

+	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD) &&
+	    vcpu->kvm->arch.bus_lock_detection_enabled) {
+		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
+
+		/*
+		 * The CPU decrements the bus lock counter every time a bus lock
+		 * is detected. Once the counter reaches zero a VMEXIT_BUSLOCK
+		 * is generated. A value of zero for bus lock counter means a
+		 * VMEXIT_BUSLOCK at every bus lock detection.
+		 *
+		 * Currently, default value for bus_lock_counter is set to 10.
+		 * So, the VMEXIT_BUSLOCK is generated after every 10 bus locks
+		 * detected.
+		 */
+		svm->vmcb->control.bus_lock_counter = bus_lock_counter;
+		pr_debug("Setting buslock counter to %u\n", bus_lock_counter);
+	} else {
+		svm_clr_intercept(svm, INTERCEPT_BUSLOCK);
+		svm->vmcb->control.bus_lock_counter = 0;
+	}
+
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);

@@ -5149,6 +5187,11 @@ static __init void svm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}

+	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD)) {
+		pr_info("Bus Lock Threshold supported\n");
+		kvm_caps.has_bus_lock_exit = true;
+	}
+
 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d80a4c6b5a38..2a77232105da 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -58,6 +58,7 @@ void kvm_spurious_fault(void);
 #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX	UINT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
+#define KVM_SVM_DEFAULT_BUS_LOCK_COUNTER	10

 static inline unsigned int __grow_ple_window(unsigned int val,
 		unsigned int base, unsigned int modifier, unsigned int max)
On Tue, Jul 09, 2024, Manali Shukla wrote:
From: Nikunj A Dadhania nikunj@amd.com
Malicious guests can cause bus locks to degrade the performance of system. Non-WB(write-back) and misaligned locked RMW(read-modify-write) instructions are referred to as "bus locks" and require system wide synchronization among all processors to guarantee atomicity. Bus locks may incur significant performance penalties for all processors in the system.
Copy+pasting the background into every changelog isn't helpful. Instead, focus on what the feature actually does, and simply mention what bus locks are in passing. If someone really doesn't know, it shouldn't be hard for them to find the previous changelog.
The Bus Lock Threshold feature proves beneficial for hypervisors seeking to restrict guests' ability to initiate numerous bus locks, thereby preventing system slowdowns that affect all tenants.
Support for the buslock threshold is indicated via CPUID function 0x8000000A_EDX[29].
VMCB intercept bit:
    VMCB Offset    Bits    Function
    14h            5       Intercept bus lock operations
                           (occurs after guest instruction finishes)
Bus lock threshold:
    VMCB Offset    Bits    Function
    120h           15:0    Bus lock counter
I can make a pretty educated guess as to how this works, but this is a pretty simple feature, i.e. there's no reason not to document how it works in the changelog.
Use the KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to enable the feature.
When the bus lock threshold counter reaches to zero, KVM will exit to user space by setting KVM_RUN_BUS_LOCK in vcpu->run->flags in bus_lock_exit handler, indicating that a bus lock has been detected in the guest.
More details about the Bus Lock Threshold feature can be found in AMD APM [1].
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.14.5 Bus Lock Threshold. https://bugzilla.kernel.org/attachment.cgi?id=306250
[Manali:
- Added exit reason string for SVM_EXIT_BUS_LOCK.
- Moved enablement and disablement of bus lock intercept support. to svm_vcpu_after_set_cpuid().
- Massage commit message.
- misc cleanups.
]
No need for this since you are listed as co-author.
Signed-off-by: Nikunj A Dadhania nikunj@amd.com Co-developed-by: Manali Shukla manali.shukla@amd.com Signed-off-by: Manali Shukla manali.shukla@amd.com
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7d396f5fa010..9f1d51384eac 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -191,6 +191,9 @@ module_param(pause_filter_count_shrink, ushort, 0444);
 static unsigned short pause_filter_count_max = KVM_SVM_DEFAULT_PLE_WINDOW_MAX;
 module_param(pause_filter_count_max, ushort, 0444);
+static unsigned short bus_lock_counter = KVM_SVM_DEFAULT_BUS_LOCK_COUNTER;
+module_param(bus_lock_counter, ushort, 0644);
This should be read-only, otherwise the behavior is non-deterministic, e.g. as proposed, a change won't take effect until a vCPU happens to trigger a bus lock exit.
If we really want it to be writable, then a per-VM capability is likely a better solution.
Actually, we already have a capability, which means there's zero reason for this module param to exist. Userspace already has to opt-in to turning on bus lock detection, i.e. userspace already has the opportunity to provide a different threshold.
That said, unless someone specifically needs a threshold other than '0', I vote to keep the uAPI as-is and simply exit on every bus lock.
 /*
  * Use nested page tables by default.  Note, NPT may get forced off by
  * svm_hardware_setup() if it's unsupported by hardware or the host kernel.
@@ -3231,6 +3234,19 @@ static int invpcid_interception(struct kvm_vcpu *vcpu)
 	return kvm_handle_invpcid(vcpu, type, gva);
 }

+static int bus_lock_exit(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
+	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
+
+	/* Reload the counter again */
+	svm->vmcb->control.bus_lock_counter = bus_lock_counter;
+
+	return 0;
+}
 static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
@@ -3298,6 +3314,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_CR4_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_CR8_WRITE_TRAP]		= cr_trap,
 	[SVM_EXIT_INVPCID]			= invpcid_interception,
+	[SVM_EXIT_BUS_LOCK]			= bus_lock_exit,
 	[SVM_EXIT_NPF]				= npf_interception,
 	[SVM_EXIT_RSM]				= rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
@@ -4356,6 +4373,27 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
Why on earth is this in svm_vcpu_after_set_cpuid()? This has nothing to do with guest CPUID.
 	set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0,
 			     !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
+	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD) &&
This should be a slow path, there's zero reason to check for host support as bus_lock_detection_enabled should be allowed if and only if it's supported.
+	    vcpu->kvm->arch.bus_lock_detection_enabled) {
+		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
+
+		/*
+		 * The CPU decrements the bus lock counter every time a bus lock
+		 * is detected. Once the counter reaches zero a VMEXIT_BUSLOCK
+		 * is generated. A value of zero for bus lock counter means a
+		 * VMEXIT_BUSLOCK at every bus lock detection.
+		 *
+		 * Currently, default value for bus_lock_counter is set to 10.
Please don't document the default _here_. Because inevitably this will become stale when the default changes.
+		 * So, the VMEXIT_BUSLOCK is generated after every 10 bus locks
+		 * detected.
+		 */
+		svm->vmcb->control.bus_lock_counter = bus_lock_counter;
+		pr_debug("Setting buslock counter to %u\n", bus_lock_counter);
+	} else {
+		svm_clr_intercept(svm, INTERCEPT_BUSLOCK);
+		svm->vmcb->control.bus_lock_counter = 0;
+	}
+
 	if (sev_guest(vcpu->kvm))
 		sev_vcpu_after_set_cpuid(svm);
@@ -5149,6 +5187,11 @@ static __init void svm_set_cpu_caps(void)
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}

+	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD)) {
+		pr_info("Bus Lock Threshold supported\n");
+		kvm_caps.has_bus_lock_exit = true;
+	}
+
 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
 	    boot_cpu_has(X86_FEATURE_AMD_SSBD))
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d80a4c6b5a38..2a77232105da 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -58,6 +58,7 @@ void kvm_spurious_fault(void);
 #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX	UINT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX	USHRT_MAX
 #define KVM_SVM_DEFAULT_PLE_WINDOW	3000
+#define KVM_SVM_DEFAULT_BUS_LOCK_COUNTER	10
There's zero reason this needs to be in x86.h. I don't even see a reason to have a #define, there's literally one user.
Hi Sean, Thank you for reviewing my patches.
On 8/17/2024 1:24 AM, Sean Christopherson wrote:
On Tue, Jul 09, 2024, Manali Shukla wrote:
From: Nikunj A Dadhania nikunj@amd.com
Malicious guests can cause bus locks to degrade the performance of system. Non-WB(write-back) and misaligned locked RMW(read-modify-write) instructions are referred to as "bus locks" and require system wide synchronization among all processors to guarantee atomicity. Bus locks may incur significant performance penalties for all processors in the system.
Copy+pasting the background into every changelog isn't helpful. Instead, focus on what the feature actually does, and simply mention what bus locks are in passing. If someone really doesn't know, it shouldn't be hard for them to find the previous changelog.
Sure. I will rewrite the commit messages based on the suggestions.
The Bus Lock Threshold feature proves beneficial for hypervisors seeking to restrict guests' ability to initiate numerous bus locks, thereby preventing system slowdowns that affect all tenants.
Support for the buslock threshold is indicated via CPUID function 0x8000000A_EDX[29].
VMCB intercept bit VMCB Offset Bits Function 14h 5 Intercept bus lock operations (occurs after guest instruction finishes)
Bus lock threshold VMCB Offset Bits Function 120h 15:0 Bus lock counter
I can make a pretty educated guess as to how this works, but this is a pretty simple feature, i.e. there's no reason not to document how it works in the changelog.
Sure.
Use the KVM capability KVM_CAP_X86_BUS_LOCK_EXIT to enable the feature.
When the bus lock threshold counter reaches zero, KVM will exit to user space by setting KVM_RUN_BUS_LOCK in vcpu->run->flags in the bus_lock_exit handler, indicating that a bus lock has been detected in the guest.
More details about the Bus Lock Threshold feature can be found in AMD APM [1].
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.14.5 Bus Lock Threshold. https://bugzilla.kernel.org/attachment.cgi?id=306250
[Manali:
- Added exit reason string for SVM_EXIT_BUS_LOCK.
- Moved enablement and disablement of bus lock intercept support to svm_vcpu_after_set_cpuid().
- Massage commit message.
- misc cleanups.
]
No need for this since you are listed as co-author.
Ack.
Signed-off-by: Nikunj A Dadhania nikunj@amd.com Co-developed-by: Manali Shukla manali.shukla@amd.com Signed-off-by: Manali Shukla manali.shukla@amd.com
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 7d396f5fa010..9f1d51384eac 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -191,6 +191,9 @@ module_param(pause_filter_count_shrink, ushort, 0444); static unsigned short pause_filter_count_max = KVM_SVM_DEFAULT_PLE_WINDOW_MAX; module_param(pause_filter_count_max, ushort, 0444); +static unsigned short bus_lock_counter = KVM_SVM_DEFAULT_BUS_LOCK_COUNTER; +module_param(bus_lock_counter, ushort, 0644);
This should be read-only, otherwise the behavior is non-deterministic, e.g. as proposed, a write won't take effect until a vCPU happens to trigger a bus lock exit.
If we really want it to be writable, then a per-VM capability is likely a better solution.
Actually, we already have a capability, which means there's zero reason for this module param to exist. Userspace already has to opt-in to turning on bus lock detection, i.e. userspace already has the opportunity to provide a different threshold.
That said, unless someone specifically needs a threshold other than '0', I vote to keep the uAPI as-is and simply exit on every bus lock.
According to APM [1], "The VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus Lock Threshold count. On VMRUN, this value is loaded into an internal count register. Before the processor executes a bus lock in the guest, it checks the value of this register. If the value is greater than 0, the processor executes the bus lock successfully and decrements the count. If the value is 0, the bus lock is not executed and a #VMEXIT to the VMM is taken."
So, a bus_lock_counter value of "0" always results in VMEXIT_BUSLOCK, and thus the default value of the bus_lock_counter should be greater than or equal to "1".
I can remove the module parameter and initialize the value of bus_lock_counter as "1" ?
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.14.5 Bus Lock Threshold. https://bugzilla.kernel.org/attachment.cgi?id=306250
-Manali
On Sat, Aug 24, 2024, Manali Shukla wrote:
Actually, we already have a capability, which means there's zero reason for this module param to exist. Userspace already has to opt-in to turning on bus lock detection, i.e. userspace already has the opportunity to provide a different threshold.
That said, unless someone specifically needs a threshold other than '0', I vote to keep the uAPI as-is and simply exit on every bus lock.
According to APM [1], "The VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus Lock Threshold count. On VMRUN, this value is loaded into an internal count register. Before the processor executes a bus lock in the guest, it checks the value of this register. If the value is greater than 0, the processor executes the bus lock successfully and decrements the count. If the value is 0, the bus lock is not executed and a #VMEXIT to the VMM is taken."
So, a bus_lock_counter value of "0" always results in VMEXIT_BUSLOCK, and thus the default value of the bus_lock_counter should be greater than or equal to "1".
Ugh, so AMD's bus-lock VM-Exit is fault-like. That's annoying.
I can remove the module parameter and initialize the value of bus_lock_counter as "1" ?
No, because that will have the effect of detecting every other bus lock, whereas the intent is to detect _every_ bus lock.
I think the only sane approach is to set it to '0' when enabled, and then set it to '1' on a bus lock exit _before_ exiting to userspace. If userspace or the guest mucks with RIP or the guest code stream and doesn't immediately trigger the bus lock, then so be it. That only defers the allowed bus lock to a later time, so effectively such shenanigans would penalize the guest even more.
We'll need to document that KVM on AMD exits to userspace with RIP pointing at the offending instruction, whereas KVM on Intel exits with RIP pointing at the instruction after the guilty instruction.
On 8/26/2024 9:45 PM, Sean Christopherson wrote:
On Sat, Aug 24, 2024, Manali Shukla wrote:
Actually, we already have a capability, which means there's zero reason for this module param to exist. Userspace already has to opt-in to turning on bus lock detection, i.e. userspace already has the opportunity to provide a different threshold.
That said, unless someone specifically needs a threshold other than '0', I vote to keep the uAPI as-is and simply exit on every bus lock.
According to APM [1], "The VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus Lock Threshold count. On VMRUN, this value is loaded into an internal count register. Before the processor executes a bus lock in the guest, it checks the value of this register. If the value is greater than 0, the processor executes the bus lock successfully and decrements the count. If the value is 0, the bus lock is not executed and a #VMEXIT to the VMM is taken."
So, a bus_lock_counter value of "0" always results in VMEXIT_BUSLOCK, and thus the default value of the bus_lock_counter should be greater than or equal to "1".
Ugh, so AMD's bus-lock VM-Exit is fault-like. That's annoying.
Yeah.
I can remove the module parameter and initialize the value of bus_lock_counter as "1" ?
No, because that will have the effect of detecting every other bus lock, whereas the intent is to detect _every_ bus lock.
I think the only sane approach is to set it to '0' when enabled, and then set it to '1' on a bus lock exit _before_ exiting to userspace. If userspace or the guest mucks with RIP or the guest code stream and doesn't immediately trigger the bus lock, then so be it. That only defers the allowed bus lock to a later time, so effectively such shenanigans would penalize the guest even more.
When the bus_lock_counter is set to '1' and a bus lock is generated in the guest, the counter is decremented to '0', triggering a bus lock exit on the very next bus lock. So, a bus lock exit is triggered for every bus lock generated in the guest when the bus_lock_counter value is set to '1'.
We'll need to document that KVM on AMD exits to userspace with RIP pointing at the offending instruction, whereas KVM on Intel exits with RIP pointing at the instruction after the guilty instruction.
Sure, I will document it.
- Manali
/*
- Use nested page tables by default. Note, NPT may get forced off by
- svm_hardware_setup() if it's unsupported by hardware or the host kernel.
@@ -3231,6 +3234,19 @@ static int invpcid_interception(struct kvm_vcpu *vcpu) return kvm_handle_invpcid(vcpu, type, gva); } +static int bus_lock_exit(struct kvm_vcpu *vcpu) +{
- struct vcpu_svm *svm = to_svm(vcpu);
- vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
- vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
- /* Reload the counter again */
- svm->vmcb->control.bus_lock_counter = bus_lock_counter;
- return 0;
+}
static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = { [SVM_EXIT_READ_CR0] = cr_interception, [SVM_EXIT_READ_CR3] = cr_interception, @@ -3298,6 +3314,7 @@ static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = { [SVM_EXIT_CR4_WRITE_TRAP] = cr_trap, [SVM_EXIT_CR8_WRITE_TRAP] = cr_trap, [SVM_EXIT_INVPCID] = invpcid_interception,
- [SVM_EXIT_BUS_LOCK] = bus_lock_exit, [SVM_EXIT_NPF] = npf_interception, [SVM_EXIT_RSM] = rsm_interception, [SVM_EXIT_AVIC_INCOMPLETE_IPI] = avic_incomplete_ipi_interception,
@@ -4356,6 +4373,27 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
Why on earth is this in svm_vcpu_after_set_cpuid()? This has nothing to do with guest CPUID.
Sorry, my bad. I will move it to init_vmcb().
set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0, !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
- if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD) &&
This should be a slow path, there's zero reason to check for host support as bus_lock_detection_enabled should be allowed if and only if it's supported.
Agreed. I will remove this check.
vcpu->kvm->arch.bus_lock_detection_enabled) {
svm_set_intercept(svm, INTERCEPT_BUSLOCK);
/*
* The CPU decrements the bus lock counter every time a bus lock
* is detected. Once the counter reaches zero a VMEXIT_BUSLOCK
* is generated. A value of zero for bus lock counter means a
* VMEXIT_BUSLOCK at every bus lock detection.
*
* Currently, default value for bus_lock_counter is set to 10.
Please don't document the default _here_. Because inevitably this will become stale when the default changes.
Ack.
* So, the VMEXIT_BUSLOCK is generated after every 10 bus locks
* detected.
*/
svm->vmcb->control.bus_lock_counter = bus_lock_counter;
pr_debug("Setting buslock counter to %u\n", bus_lock_counter);
- } else {
svm_clr_intercept(svm, INTERCEPT_BUSLOCK);
svm->vmcb->control.bus_lock_counter = 0;
- }
- if (sev_guest(vcpu->kvm)) sev_vcpu_after_set_cpuid(svm);
@@ -5149,6 +5187,11 @@ static __init void svm_set_cpu_caps(void) kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); }
- if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD)) {
pr_info("Bus Lock Threshold supported\n");
kvm_caps.has_bus_lock_exit = true;
- }
- /* CPUID 0x80000008 */ if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) || boot_cpu_has(X86_FEATURE_AMD_SSBD))
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index d80a4c6b5a38..2a77232105da 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -58,6 +58,7 @@ void kvm_spurious_fault(void); #define KVM_VMX_DEFAULT_PLE_WINDOW_MAX UINT_MAX #define KVM_SVM_DEFAULT_PLE_WINDOW_MAX USHRT_MAX #define KVM_SVM_DEFAULT_PLE_WINDOW 3000 +#define KVM_SVM_DEFAULT_BUS_LOCK_COUNTER 10
There's zero reason this needs to be in x86.h. I don't even see a reason to have a #define, there's literally one user.
Yeah, I agree. I will remove it in V2.
- Manali
Expose the Bus Lock Threshold in the guest CPUID and support its functionality in nested guests.
Ensure proper restoration and saving of the bus_lock_counter at VM Entry and VM Exit respectively in nested guest scenarios.
Case 1: L0 supports buslock exit and L1 does not: use buslock counter from L0 and exits happen to L0 VMM.
Case 2: Both L0 and L1 support buslock exit: use L1 buslock counter value and exits happen to L1 VMM.
Signed-off-by: Manali Shukla manali.shukla@amd.com --- arch/x86/kvm/governed_features.h | 1 + arch/x86/kvm/svm/nested.c | 25 +++++++++++++++++++++++++ arch/x86/kvm/svm/svm.c | 5 +++++ arch/x86/kvm/svm/svm.h | 1 + 4 files changed, 32 insertions(+)
diff --git a/arch/x86/kvm/governed_features.h b/arch/x86/kvm/governed_features.h index ad463b1ed4e4..0982eb107f0b 100644 --- a/arch/x86/kvm/governed_features.h +++ b/arch/x86/kvm/governed_features.h @@ -17,6 +17,7 @@ KVM_GOVERNED_X86_FEATURE(PFTHRESHOLD) KVM_GOVERNED_X86_FEATURE(VGIF) KVM_GOVERNED_X86_FEATURE(VNMI) KVM_GOVERNED_X86_FEATURE(LAM) +KVM_GOVERNED_X86_FEATURE(BUS_LOCK_THRESHOLD)
#undef KVM_GOVERNED_X86_FEATURE #undef KVM_GOVERNED_FEATURE diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 6f704c1037e5..d09434225e4d 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -363,6 +363,7 @@ void __nested_copy_vmcb_control_to_cache(struct kvm_vcpu *vcpu, to->virt_ext = from->virt_ext; to->pause_filter_count = from->pause_filter_count; to->pause_filter_thresh = from->pause_filter_thresh; + to->bus_lock_counter = from->bus_lock_counter;
/* Copy asid here because nested_vmcb_check_controls will check it. */ to->asid = from->asid; @@ -758,6 +759,16 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, } }
+ /* + * If guest intercepts BUSLOCK, use guest's bus_lock_counter value, + * otherwise use host bus_lock_counter value. + */ + if (guest_can_use(vcpu, X86_FEATURE_BUS_LOCK_THRESHOLD) && + vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_BUSLOCK)) + vmcb02->control.bus_lock_counter = svm->nested.ctl.bus_lock_counter; + else + vmcb02->control.bus_lock_counter = vmcb01->control.bus_lock_counter; + nested_svm_transition_tlb_flush(vcpu);
/* Enter Guest-Mode */ @@ -1035,6 +1046,12 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
}
+ if (guest_can_use(vcpu, X86_FEATURE_BUS_LOCK_THRESHOLD) && + vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_BUSLOCK)) + vmcb12->control.bus_lock_counter = vmcb02->control.bus_lock_counter; + else + vmcb01->control.bus_lock_counter = vmcb02->control.bus_lock_counter; + nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
svm_switch_vmcb(svm, &svm->vmcb01); @@ -1333,6 +1350,13 @@ static int nested_svm_intercept(struct vcpu_svm *svm) vmexit = NESTED_EXIT_DONE; break; } + case SVM_EXIT_BUS_LOCK: { + if (!(vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_BUSLOCK))) + vmexit = NESTED_EXIT_HOST; + else + vmexit = NESTED_EXIT_DONE; + break; + } default: { if (vmcb12_is_intercept(&svm->nested.ctl, exit_code)) vmexit = NESTED_EXIT_DONE; @@ -1572,6 +1596,7 @@ static void nested_copy_vmcb_cache_to_control(struct vmcb_control_area *dst, dst->virt_ext = from->virt_ext; dst->pause_filter_count = from->pause_filter_count; dst->pause_filter_thresh = from->pause_filter_thresh; + dst->bus_lock_counter = from->bus_lock_counter; /* 'clean' and 'hv_enlightenments' are not changed by KVM */ }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 9f1d51384eac..bb2437a7694c 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -4373,6 +4373,8 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) set_msr_interception(vcpu, svm->msrpm, MSR_IA32_FLUSH_CMD, 0, !!guest_cpuid_has(vcpu, X86_FEATURE_FLUSH_L1D));
+ kvm_governed_feature_check_and_set(vcpu, X86_FEATURE_BUS_LOCK_THRESHOLD); + if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD) && vcpu->kvm->arch.bus_lock_detection_enabled) { svm_set_intercept(svm, INTERCEPT_BUSLOCK); @@ -5183,6 +5185,9 @@ static __init void svm_set_cpu_caps(void) if (vnmi) kvm_cpu_cap_set(X86_FEATURE_VNMI);
+ if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD)) + kvm_cpu_cap_set(X86_FEATURE_BUS_LOCK_THRESHOLD); + /* Nested VM can receive #VMEXIT instead of triggering #GP */ kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); } diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 8983eabf8f84..f49ea38187ba 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -150,6 +150,7 @@ struct vmcb_ctrl_area_cached { u64 nested_cr3; u64 virt_ext; u32 clean; + u16 bus_lock_counter; union { #if IS_ENABLED(CONFIG_HYPERV) || IS_ENABLED(CONFIG_KVM_HYPERV) struct hv_vmcb_enlightenments hv_enlightenments;
On Tue, Jul 09, 2024, Manali Shukla wrote:
Expose the Bus Lock Threshold in the guest CPUID and support its functionality in nested guests.
Why? This is a rather messy feature to support in a nested setup, and I'd much prefer to not open that can of worms unless there's a very good reason to do so.
Ensure proper restoration and saving of the bus_lock_counter at VM Entry and VM Exit respectively in nested guest scenarios.
Case 1: L0 supports buslock exit and L1 does not: use buslock counter from L0 and exits happen to L0 VMM.
Case 2: Both L0 and L1 support buslock exit: use L1 buslock counter value and exits happen to L1 VMM.
Yeah, no. L1 wants to attack the host, so it runs L2 with buslock detection enabled, but the highest possible threshold. Game over.
If we take the min between the two, then we have to track the delta and shove _that_ into the VMCB. E.g. L1 wants every 4, L0 wants every 5. After 4 locks, KVM synthesizes a nested VM-Exit. Then on nested VMRUN, KVM needs to remember it should run L2 with a threshold of 1.
If we really want to support virtualizing bus lock detection for L1, the easiest approach would be to do so if and only if it's NOT in use by L0. But IMO that's not worth doing.
On 8/17/2024 1:35 AM, Sean Christopherson wrote:
On Tue, Jul 09, 2024, Manali Shukla wrote:
Expose the Bus Lock Threshold in the guest CPUID and support its functionality in nested guests.
Why? This is a rather messy feature to support in a nested setup, and I'd much prefer to not open that can of worms unless there's a very good reason to do so.
Agreed.
Ensure proper restoration and saving of the bus_lock_counter at VM Entry and VM Exit respectively in nested guest scenarios.
Case 1: L0 supports buslock exit and L1 does not: use buslock counter from L0 and exits happen to L0 VMM.
Case 2: Both L0 and L1 support buslock exit: use L1 buslock counter value and exits happen to L1 VMM.
Yeah, no. L1 wants to attack the host, so it runs L2 with buslock detection enabled, but the highest possible threshold. Game over.
If we take the min between the two, then we have to track the delta and shove _that_ into the VMCB. E.g. L1 wants every 4, L0 wants every 5. After 4 locks, KVM synthesizes a nested VM-Exit. Then on nested VMRUN, KVM needs to remember it should run L2 with a threshold of 1.
If we really want to support virtualizing bus lock detection for L1, the easiest approach would be to do so if and only if it's NOT in use by L0. But IMO that's not worth doing.
I will drop the nested implementation in V2.
-Manali
On Tue, Jul 09, 2024, Manali Shukla wrote:
@@ -758,6 +759,16 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, } }
- /*
* If guest intercepts BUSLOCK, use guest's bus_lock_counter value,
* otherwise use host bus_lock_counter value.
*/
- if (guest_can_use(vcpu, X86_FEATURE_BUS_LOCK_THRESHOLD) &&
vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_BUSLOCK))
vmcb02->control.bus_lock_counter = svm->nested.ctl.bus_lock_counter;
- else
vmcb02->control.bus_lock_counter = vmcb01->control.bus_lock_counter;
Copying vmcb01's count to/from vmcb02 belongs in the core enabling patch. From KVM's perspective, the counter is associated with a vCPU, not a VMCB, and so the count should keep running across nested transitions.
As written, taking only the core enabling patch will mean that L2 runs with the wrong count. Amusingly, because '0' means "always exit", L2 would run in a *more* restrictive environment due to the VMCB being zero-allocated.
On 8/17/2024 1:44 AM, Sean Christopherson wrote:
On Tue, Jul 09, 2024, Manali Shukla wrote:
@@ -758,6 +759,16 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm, } }
- /*
* If guest intercepts BUSLOCK, use guest's bus_lock_counter value,
* otherwise use host bus_lock_counter value.
*/
- if (guest_can_use(vcpu, X86_FEATURE_BUS_LOCK_THRESHOLD) &&
vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_BUSLOCK))
vmcb02->control.bus_lock_counter = svm->nested.ctl.bus_lock_counter;
- else
vmcb02->control.bus_lock_counter = vmcb01->control.bus_lock_counter;
Copying vmcb01's count to/from vmcb02 belongs in the core enabling patch. From KVM's perspective, the counter is associated with a vCPU, not a VMCB, and so the count should keep running across nested transitions.
As written, taking only the core enabling patch will mean that L2 runs with the wrong count. Amusingly, because '0' means "always exit", L2 would run in a *more* restrictive environment due to the VMCB being zero-allocated.
Yeah. From my testing, with the core enabling patch + copying vmcb01's count to/from vmcb02, L2 runs with the correct value of the bus lock counter and the counter continues to run across nested transitions. The bus lock exit happens to the L0 hypervisor when a bus lock is generated from the L2 guest.
- Manali
From: Nikunj A Dadhania nikunj@amd.com
Malicious guests can cause bus locks to degrade the performance of a system. The Bus Lock Threshold feature is beneficial for hypervisors aiming to restrict the ability of the guests to perform excessive bus locks and slow down the system for all the tenants.
Add a test case to verify the Bus Lock Threshold feature for SVM.
[Manali:
- The KVM_CAP_X86_BUS_LOCK_EXIT capability is not enabled while vcpus are created, changed the VM and vCPU creation logic to resolve the mentioned issue.
- Added nested guest test case for bus lock exit.
- massage commit message.
- misc cleanups.
]
Signed-off-by: Nikunj A Dadhania nikunj@amd.com Co-developed-by: Manali Shukla manali.shukla@amd.com Signed-off-by: Manali Shukla manali.shukla@amd.com --- tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/x86_64/svm_buslock_test.c | 114 ++++++++++++++++++ 2 files changed, 115 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_buslock_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index ce8ff8e8ce3a..711ec195e386 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -94,6 +94,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test TEST_GEN_PROGS_x86_64 += x86_64/smm_test TEST_GEN_PROGS_x86_64 += x86_64/state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test +TEST_GEN_PROGS_x86_64 += x86_64/svm_buslock_test TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test TEST_GEN_PROGS_x86_64 += x86_64/svm_nested_shutdown_test diff --git a/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c b/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c new file mode 100644 index 000000000000..dcb595999046 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c @@ -0,0 +1,114 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * svm_buslock_test + * + * Copyright (C) 2024 Advanced Micro Devices, Inc. + * + * SVM testing: Buslock exit + */ + +#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" +#include "svm_util.h" + +#define NO_ITERATIONS 100 +#define __cacheline_aligned __aligned(128) + +struct buslock_test { + unsigned char pad[126]; + atomic_long_t val; +} __packed; + +struct buslock_test test __cacheline_aligned; + +static __always_inline void buslock_atomic_add(int i, atomic_long_t *v) +{ + asm volatile(LOCK_PREFIX "addl %1,%0" + : "+m" (v->counter) + : "ir" (i) : "memory"); +} + +static void buslock_add(void) +{ + /* + * Increment a cache unaligned variable atomically. + * This should generate a bus lock exit. 
+ */ + for (int i = 0; i < NO_ITERATIONS; i++) + buslock_atomic_add(2, &test.val); +} + +static void l2_guest_code(void) +{ + buslock_add(); + GUEST_DONE(); +} + +static void l1_guest_code(struct svm_test_data *svm) +{ + #define L2_GUEST_STACK_SIZE 64 + unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE]; + struct vmcb *vmcb = svm->vmcb; + + generic_svm_setup(svm, l2_guest_code, + &l2_guest_stack[L2_GUEST_STACK_SIZE]); + run_guest(vmcb, svm->vmcb_gpa); + GUEST_ASSERT(vmcb->control.exit_code == SVM_EXIT_VMMCALL); + GUEST_DONE(); +} + +static void guest_code(struct svm_test_data *svm) +{ + buslock_add(); + + if (this_cpu_has(X86_FEATURE_SVM)) + l1_guest_code(svm); +} + +int main(int argc, char *argv[]) +{ + struct kvm_vcpu *vcpu; + struct kvm_run *run; + struct kvm_vm *vm; + vm_vaddr_t svm_gva; + + TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_SVM)); + TEST_REQUIRE(kvm_has_cap(KVM_CAP_X86_BUS_LOCK_EXIT)); + + vm = vm_create(1); + vm_enable_cap(vm, KVM_CAP_X86_BUS_LOCK_EXIT, KVM_BUS_LOCK_DETECTION_EXIT); + vcpu = vm_vcpu_add(vm, 0, guest_code); + + vcpu_alloc_svm(vm, &svm_gva); + vcpu_args_set(vcpu, 1, svm_gva); + + run = vcpu->run; + + for (;;) { + struct ucall uc; + + vcpu_run(vcpu); + + if (run->exit_reason == KVM_EXIT_X86_BUS_LOCK) { + run->flags &= ~KVM_RUN_X86_BUS_LOCK; + run->exit_reason = 0; + continue; + } + + switch (get_ucall(vcpu, &uc)) { + case UCALL_ABORT: + REPORT_GUEST_ASSERT(uc); + /* NOT REACHED */ + case UCALL_SYNC: + break; + case UCALL_DONE: + goto done; + default: + TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd); + } + } +done: + kvm_vm_free(vm); + return 0; +}
On Tue, Jul 09, 2024, Manali Shukla wrote:
From: Nikunj A Dadhania nikunj@amd.com
Malicious guests can cause bus locks to degrade the performance of a system. The Bus Lock Threshold feature is beneficial for hypervisors aiming to restrict the ability of the guests to perform excessive bus locks and slow down the system for all the tenants.
Add a test case to verify the Bus Lock Threshold feature for SVM.
[Manali:
- The KVM_CAP_X86_BUS_LOCK_EXIT capability is not enabled while vcpus are created, changed the VM and vCPU creation logic to resolve the mentioned issue.
- Added nested guest test case for bus lock exit.
- massage commit message.
- misc cleanups. ]
Again, 99% of the changelog is boilerplate that does nothing to help me understand what the test actually does.
Signed-off-by: Nikunj A Dadhania nikunj@amd.com Co-developed-by: Manali Shukla manali.shukla@amd.com Signed-off-by: Manali Shukla manali.shukla@amd.com
tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/x86_64/svm_buslock_test.c | 114 ++++++++++++++++++ 2 files changed, 115 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_buslock_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index ce8ff8e8ce3a..711ec195e386 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -94,6 +94,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test TEST_GEN_PROGS_x86_64 += x86_64/smm_test TEST_GEN_PROGS_x86_64 += x86_64/state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test +TEST_GEN_PROGS_x86_64 += x86_64/svm_buslock_test TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test TEST_GEN_PROGS_x86_64 += x86_64/svm_nested_shutdown_test diff --git a/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c b/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c new file mode 100644 index 000000000000..dcb595999046 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c
I would *very* strongly prefer to have a bus lock test that is common to VMX and SVM. For L1, there's no unique behavior. And for L2, assuming we don't support nested bus lock enabling, the only vendor specific bits are launching L2.
I.e. writing this so it works on both VMX and SVM should be quite straightforward.
@@ -0,0 +1,114 @@ +// SPDX-License-Identifier: GPL-2.0-only +/*
- svm_buslock_test
- Copyright (C) 2024 Advanced Micro Devices, Inc.
- SVM testing: Buslock exit
Keep the Copyright, ditch everything else.
- */
+#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" +#include "svm_util.h"
+#define NO_ITERATIONS 100
Heh, NR_ITERATIONS.
+#define __cacheline_aligned __aligned(128)
Eh, I would just split a page, that's about as future proof as we can get in terms of cache line sizes.
+struct buslock_test {
- unsigned char pad[126];
- atomic_long_t val;
+} __packed;
+struct buslock_test test __cacheline_aligned;
+static __always_inline void buslock_atomic_add(int i, atomic_long_t *v) +{
- asm volatile(LOCK_PREFIX "addl %1,%0"
: "+m" (v->counter)
: "ir" (i) : "memory");
+}
+static void buslock_add(void) +{
- /*
* Increment a cache unaligned variable atomically.
* This should generate a bus lock exit.
So... this test doesn't actually verify that a bus lock exit occurs. The userspace side will eat an exit if one occurs, but there's literally not a single TEST_ASSERT() in here.
Hi Sean, Thank you for reviewing my changes.
On 8/17/2024 1:51 AM, Sean Christopherson wrote:
On Tue, Jul 09, 2024, Manali Shukla wrote:
From: Nikunj A Dadhania nikunj@amd.com
Malicious guests can cause bus locks to degrade the performance of a system. The Bus Lock Threshold feature is beneficial for hypervisors aiming to restrict the ability of the guests to perform excessive bus locks and slow down the system for all the tenants.
Add a test case to verify the Bus Lock Threshold feature for SVM.
[Manali:
- The KVM_CAP_X86_BUS_LOCK_EXIT capability is not enabled while vcpus are created, changed the VM and vCPU creation logic to resolve the mentioned issue.
- Added nested guest test case for bus lock exit.
- massage commit message.
- misc cleanups. ]
Again, 99% of the changelog is boilerplate that does nothing to help me understand what the test actually does.
Sure. I will rewrite the commit messages for all the patches.
Signed-off-by: Nikunj A Dadhania nikunj@amd.com Co-developed-by: Manali Shukla manali.shukla@amd.com Signed-off-by: Manali Shukla manali.shukla@amd.com
tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/x86_64/svm_buslock_test.c | 114 ++++++++++++++++++ 2 files changed, 115 insertions(+) create mode 100644 tools/testing/selftests/kvm/x86_64/svm_buslock_test.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile index ce8ff8e8ce3a..711ec195e386 100644 --- a/tools/testing/selftests/kvm/Makefile +++ b/tools/testing/selftests/kvm/Makefile @@ -94,6 +94,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/smaller_maxphyaddr_emulation_test TEST_GEN_PROGS_x86_64 += x86_64/smm_test TEST_GEN_PROGS_x86_64 += x86_64/state_test TEST_GEN_PROGS_x86_64 += x86_64/vmx_preemption_timer_test +TEST_GEN_PROGS_x86_64 += x86_64/svm_buslock_test TEST_GEN_PROGS_x86_64 += x86_64/svm_vmcall_test TEST_GEN_PROGS_x86_64 += x86_64/svm_int_ctl_test TEST_GEN_PROGS_x86_64 += x86_64/svm_nested_shutdown_test diff --git a/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c b/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c new file mode 100644 index 000000000000..dcb595999046 --- /dev/null +++ b/tools/testing/selftests/kvm/x86_64/svm_buslock_test.c
I would *very* strongly prefer to have a bus lock test that is common to VMX and SVM. For L1, there's no unique behavior. And for L2, assuming we don't support nested bus lock enabling, the only vendor specific bits are launching L2.
I.e. writing this so it works on both VMX and SVM should be quite straightforward.
Sure, I will try to write a common test for SVM and VMX.
@@ -0,0 +1,114 @@ +// SPDX-License-Identifier: GPL-2.0-only +/*
- svm_buslock_test
- Copyright (C) 2024 Advanced Micro Devices, Inc.
- SVM testing: Buslock exit
Keep the Copyright, ditch everything else.
Sure.
- */
+#include "test_util.h" +#include "kvm_util.h" +#include "processor.h" +#include "svm_util.h"
+#define NO_ITERATIONS 100
Heh, NR_ITERATIONS.
Ack.
+#define __cacheline_aligned __aligned(128)
Eh, I would just split a page, that's about as future proof as we can get in terms of cache line sizes.
Sure.
+struct buslock_test {
- unsigned char pad[126];
- atomic_long_t val;
+} __packed;
+struct buslock_test test __cacheline_aligned;
+static __always_inline void buslock_atomic_add(int i, atomic_long_t *v) +{
- asm volatile(LOCK_PREFIX "addl %1,%0"
: "+m" (v->counter)
: "ir" (i) : "memory");
+}
+static void buslock_add(void) +{
- /*
* Increment a cache unaligned variable atomically.
* This should generate a bus lock exit.
So... this test doesn't actually verify that a bus lock exit occurs. The userspace side will eat an exit if one occurs, but there's literally not a single TEST_ASSERT() in here.
Agreed. How about doing the following?
+	for (;;) {
+		struct ucall uc;
+
+		vcpu_run(vcpu);
+
+		if (run->exit_reason == KVM_EXIT_IO) {
+			switch (get_ucall(vcpu, &uc)) {
+			case UCALL_ABORT:
+				REPORT_GUEST_ASSERT(uc);
+				/* NOT REACHED */
+			case UCALL_SYNC:
+				break;
+			case UCALL_DONE:
+				goto done;
+			default:
+				TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+			}
+		}
+
+		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_X86_BUS_LOCK);
+		TEST_ASSERT_EQ(run->flags, KVM_RUN_X86_BUS_LOCK);
+		run->flags &= ~KVM_RUN_X86_BUS_LOCK;
+		run->exit_reason = 0;
+	}
- Manali
On Mon, Aug 26, 2024, Manali Shukla wrote:
+struct buslock_test {
+	unsigned char pad[126];
+	atomic_long_t val;
+} __packed;
+struct buslock_test test __cacheline_aligned;
+static __always_inline void buslock_atomic_add(int i, atomic_long_t *v)
+{
+	asm volatile(LOCK_PREFIX "addl %1,%0"
+		     : "+m" (v->counter)
+		     : "ir" (i) : "memory");
+}
+static void buslock_add(void)
+{
+	/*
+	 * Increment a cache unaligned variable atomically.
+	 * This should generate a bus lock exit.
+	 */
So... this test doesn't actually verify that a bus lock exit occurs. The userspace side will eat an exit if one occurs, but there's literally not a single TEST_ASSERT() in here.
Agreed. How about doing the following?
	for (;;) {
		struct ucall uc;

		vcpu_run(vcpu);

		if (run->exit_reason == KVM_EXIT_IO) {
			switch (get_ucall(vcpu, &uc)) {
			case UCALL_ABORT:
				REPORT_GUEST_ASSERT(uc);
				/* NOT REACHED */
			case UCALL_SYNC:
				break;
			case UCALL_DONE:
				goto done;
			default:
				TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
			}
		}

		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_X86_BUS_LOCK);
I doubt this works, the UCALL_SYNC above will fallthrough to this assert. I assume run->exit_reason needs a continue for UCALL_SYNC.
		TEST_ASSERT_EQ(run->flags, KVM_RUN_X86_BUS_LOCK);
		run->flags &= ~KVM_RUN_X86_BUS_LOCK;
No need, KVM should clear the flag if the exit isn't due to a bus lock.
		run->exit_reason = 0;
Again, no need, KVM should take care of resetting exit_reason.
	}
- Manali
Hi Sean,
Thank you for reviewing my patches.
On 8/26/2024 9:36 PM, Sean Christopherson wrote:
On Mon, Aug 26, 2024, Manali Shukla wrote:
+struct buslock_test {
+	unsigned char pad[126];
+	atomic_long_t val;
+} __packed;
+struct buslock_test test __cacheline_aligned;
+static __always_inline void buslock_atomic_add(int i, atomic_long_t *v)
+{
+	asm volatile(LOCK_PREFIX "addl %1,%0"
+		     : "+m" (v->counter)
+		     : "ir" (i) : "memory");
+}
+static void buslock_add(void)
+{
+	/*
+	 * Increment a cache unaligned variable atomically.
+	 * This should generate a bus lock exit.
+	 */
So... this test doesn't actually verify that a bus lock exit occurs. The userspace side will eat an exit if one occurs, but there's literally not a single TEST_ASSERT() in here.
Agreed. How about doing the following?
	for (;;) {
		struct ucall uc;

		vcpu_run(vcpu);

		if (run->exit_reason == KVM_EXIT_IO) {
			switch (get_ucall(vcpu, &uc)) {
			case UCALL_ABORT:
				REPORT_GUEST_ASSERT(uc);
				/* NOT REACHED */
			case UCALL_SYNC:
				break;
			case UCALL_DONE:
				goto done;
			default:
				TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
			}
		}

		TEST_ASSERT_KVM_EXIT_REASON(vcpu, KVM_EXIT_X86_BUS_LOCK);
I doubt this works, the UCALL_SYNC above will fallthrough to this assert. I assume run->exit_reason needs a continue for UCALL_SYNC.
I agree, there should be a continue for UCALL_SYNC in place of break. I will correct it in V2.
I didn't observe this issue because UCALL_SYNC is invoked when GUEST_SYNC() is called from the guest code. Since GUEST_SYNC() is not present in the guest code used in the bus lock test case, UCALL_SYNC was never triggered.
		TEST_ASSERT_EQ(run->flags, KVM_RUN_X86_BUS_LOCK);
		run->flags &= ~KVM_RUN_X86_BUS_LOCK;
No need, KVM should clear the flag if the exit isn't due to a bus lock.
Sure I will remove this.
run->exit_reason = 0;
Again, no need, KVM should take care of resetting exit_reason.
Ack.
}
- Manali
On 7/9/2024 11:21 PM, Manali Shukla wrote:
Malicious guests can cause bus locks to degrade the performance of a system. Non-WB (write-back) and misaligned locked RMW (read-modify-write) instructions are referred to as "bus locks" and require system-wide synchronization among all processors to guarantee atomicity. Bus locks can impose notable performance penalties on all processors within the system.
Support for the Bus Lock Threshold is indicated by CPUID Fn8000_000A_EDX[29] BusLockThreshold=1. The VMCB provides a Bus Lock Threshold enable bit and an unsigned 16-bit Bus Lock Threshold count.
VMCB intercept bit:
  VMCB Offset   Bits   Function
  14h           5      Intercept bus lock operations

Bus lock threshold count:
  VMCB Offset   Bits   Function
  120h          15:0   Bus lock counter
During VMRUN, the bus lock threshold count is fetched and stored in an internal count register. Prior to executing a bus lock within the guest, the processor verifies the count in the bus lock register. If the count is greater than zero, the processor executes the bus lock, reducing the count. However, if the count is zero, the bus lock operation is not performed, and instead, a Bus Lock Threshold #VMEXIT is triggered to transfer control to the Virtual Machine Monitor (VMM).
A Bus Lock Threshold #VMEXIT is reported to the VMM with VMEXIT code 0xA5, VMEXIT_BUSLOCK. EXITINFO1 and EXITINFO2 are set to 0 on a VMEXIT_BUSLOCK. On a #VMEXIT, the processor writes the current value of the Bus Lock Threshold Counter to the VMCB.
More details about the Bus Lock Threshold feature can be found in AMD APM [1].
Patches are prepared on kvm-x86/svm (704ec48fc2fb)
Testing done:
- Added a selftest for the Bus Lock Threshold functionality.
- Tested the Bus Lock Threshold functionality on SEV and SEV-ES guests.
- Tested the Bus Lock Threshold functionality on nested guests.
Qemu changes can be found on:
  Repo: https://github.com/AMDESE/qemu.git
  Branch: buslock_threshold
Qemu commandline to use the bus lock threshold functionality:
  qemu-system-x86_64 -enable-kvm -cpu EPYC-Turin,+svm -M q35,bus-lock-ratelimit=10 \
  ..
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024, Vol 2, 15.14.5 Bus Lock Threshold. https://bugzilla.kernel.org/attachment.cgi?id=306250
Manali Shukla (2):
  x86/cpufeatures: Add CPUID feature bit for the Bus Lock Threshold
  KVM: x86: nSVM: Implement support for nested Bus Lock Threshold

Nikunj A Dadhania (2):
  KVM: SVM: Enable Bus lock threshold exit
  KVM: selftests: Add bus lock exit test
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/svm.h                    |   5 +-
 arch/x86/include/uapi/asm/svm.h               |   2 +
 arch/x86/kvm/governed_features.h              |   1 +
 arch/x86/kvm/svm/nested.c                     |  25 ++++
 arch/x86/kvm/svm/svm.c                        |  48 ++++++++
 arch/x86/kvm/svm/svm.h                        |   1 +
 arch/x86/kvm/x86.h                            |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/x86_64/svm_buslock_test.c   | 114 ++++++++++++++++++
 10 files changed, 198 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/svm_buslock_test.c
base-commit: 704ec48fc2fbd4e41ec982662ad5bf1eee33eeb2
A gentle reminder.
-Manali
On 7/30/2024 10:22 AM, Manali Shukla wrote:
On 7/9/2024 11:21 PM, Manali Shukla wrote:
A gentle reminder.
-Manali
A gentle reminder.
- Manali