The FPU support for CET virtualization has already been merged into 6.17-rc1.
Building on that, this series introduces Intel CET virtualization support for
KVM.
Changes in v13
1. Add "arch" and "size" fields to the register ID used in
KVM_GET/SET_ONE_REG ioctls
2. Add a kselftest for KVM_GET/SET_ONE_REG ioctls
3. Advertise KVM_CAP_ONE_REG
4. Document how the emulation of SSP MSRs is flawed for 32-bit guests
5. Don't pass-thru MSR_IA32_INT_SSP_TAB and report it as unsupported for
32-bit guests
6. Refine changelog to clarify why CET MSRs are pass-thru'd.
7. Limit SHSTK to 64-bit kernels
8. Retain CET state if L1 doesn't set VM_EXIT_LOAD_CET_STATE
9. Rename a new functions for clarity
---
Control-flow Enforcement Technology (CET) is a kind of CPU feature used
to prevent Return/CALL/Jump-Oriented Programming (ROP/COP/JOP) attacks.
It provides two sub-features(SHSTK,IBT) to defend against ROP/COP/JOP
style control-flow subversion attacks.
Shadow Stack (SHSTK):
A shadow stack is a second stack used exclusively for control transfer
operations. The shadow stack is separate from the data/normal stack and
can be enabled individually in user and kernel mode. When shadow stack
is enabled, CALL pushes the return address on both the data and shadow
stack. RET pops the return address from both stacks and compares them.
If the return addresses from the two stacks do not match, the processor
generates a #CP.
Indirect Branch Tracking (IBT):
IBT introduces new instruction(ENDBRANCH)to mark valid target addresses
of indirect branches (CALL, JMP etc...). If an indirect branch is
executed and the next instruction is _not_ an ENDBRANCH, the processor
generates a #CP. These instruction behaves as a NOP on platforms that
doesn't support CET.
CET states management
=====================
KVM cooperates with host kernel FPU framework to manage guest CET registers.
With CET supervisor mode state support in this series, KVM can save/restore
full guest CET xsave-managed states.
CET user mode and supervisor mode xstates, i.e., MSR_IA32_{U_CET,PL3_SSP}
and MSR_IA32_PL{0,1,2}, depend on host FPU framework to swap guest and host
xstates. On VM-Exit, guest CET xstates are saved to guest fpu area and host
CET xstates are loaded from task/thread context before vCPU returns to
userspace, vice-versa on VM-Entry. See details in kvm_{load,put}_guest_fpu().
CET supervisor mode states are grouped into two categories : XSAVE-managed
and non-XSAVE-managed, the former includes MSR_IA32_PL{0,1,2}_SSP and are
controlled by CET supervisor mode bit(S_CET bit) in XSS, the later consists
of MSR_IA32_S_CET and MSR_IA32_INTR_SSP_TBL.
VMX introduces new VMCS fields, {GUEST|HOST}_{S_CET,SSP,INTR_SSP_TABL}, to
facilitate guest/host non-XSAVES-managed states. When VMX CET entry/exit load
bits are set, guest/host MSR_IA32_{S_CET,INTR_SSP_TBL,SSP} are loaded from
equivalent fields at VM-Exit/Entry. With these new fields, such supervisor
states require no addtional KVM save/reload actions.
Tests
======
This series has successfully passed the basic CET user shadow stack test
and kernel IBT test in both L1 and L2 guests. The newly added
KVM-unit-tests [2] also passed, and its v11 has been tested with the AMD
CET series by John [3].
For your convenience, you can use my WIP QEMU [1] for testing.
[1]: https://github.com/gaochaointel/qemu-dev qemu-cet
[2]: https://lore.kernel.org/kvm/20250626073459.12990-1-minipli@grsecurity.net/
[3]: https://lore.kernel.org/kvm/aH6CH+x5mCDrvtoz@AUSJOHALLEN.amd.com/
Chao Gao (4):
KVM: nVMX: Add consistency checks for CR0.WP and CR4.CET
KVM: nVMX: Add consistency checks for CET states
KVM: nVMX: Advertise new VM-Entry/Exit control bits for CET state
KVM: selftest: Add tests for KVM_{GET,SET}_ONE_REG
Sean Christopherson (2):
KVM: x86: Report XSS as to-be-saved if there are supported features
KVM: x86: Load guest FPU state when access XSAVE-managed MSRs
Yang Weijiang (15):
KVM: x86: Introduce KVM_{G,S}ET_ONE_REG uAPIs support
KVM: x86: Refresh CPUID on write to guest MSR_IA32_XSS
KVM: x86: Initialize kvm_caps.supported_xss
KVM: x86: Add fault checks for guest CR4.CET setting
KVM: x86: Report KVM supported CET MSRs as to-be-saved
KVM: VMX: Introduce CET VMCS fields and control bits
KVM: x86: Enable guest SSP read/write interface with new uAPIs
KVM: VMX: Emulate read and write to CET MSRs
KVM: x86: Save and reload SSP to/from SMRAM
KVM: VMX: Set up interception for CET MSRs
KVM: VMX: Set host constant supervisor states to VMCS fields
KVM: x86: Don't emulate instructions guarded by CET
KVM: x86: Enable CET virtualization for VMX and advertise to userspace
KVM: nVMX: Virtualize NO_HW_ERROR_CODE_CC for L1 event injection to L2
KVM: nVMX: Prepare for enabling CET support for nested guest
arch/x86/include/asm/kvm_host.h | 5 +-
arch/x86/include/asm/vmx.h | 9 +
arch/x86/include/uapi/asm/kvm.h | 24 ++
arch/x86/kvm/cpuid.c | 17 +-
arch/x86/kvm/emulate.c | 46 ++-
arch/x86/kvm/smm.c | 8 +
arch/x86/kvm/smm.h | 2 +-
arch/x86/kvm/svm/svm.c | 4 +
arch/x86/kvm/vmx/capabilities.h | 9 +
arch/x86/kvm/vmx/nested.c | 163 ++++++++++-
arch/x86/kvm/vmx/nested.h | 5 +
arch/x86/kvm/vmx/vmcs12.c | 6 +
arch/x86/kvm/vmx/vmcs12.h | 14 +-
arch/x86/kvm/vmx/vmx.c | 84 +++++-
arch/x86/kvm/vmx/vmx.h | 9 +-
arch/x86/kvm/x86.c | 267 +++++++++++++++++-
arch/x86/kvm/x86.h | 61 ++++
tools/arch/x86/include/uapi/asm/kvm.h | 24 ++
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/x86/get_set_one_reg.c | 35 +++
20 files changed, 752 insertions(+), 41 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86/get_set_one_reg.c
--
2.47.3
This patch fixes unstable LACP negotiation when bonding is configured in
passive mode (`lacp_active=off`).
Previously, the actor would stop sending LACPDUs after initial negotiation
succeeded, leading to the partner timing out and restarting the negotiation
cycle. This resulted in continuous LACP state flapping.
The fix ensures the passive actor starts sending periodic LACPDUs after
receiving the first LACPDU from the partner, in accordance with IEEE
802.1AX-2020 section 6.4.1.
v3:
a) const bond_params for ad_initialize_port (Paolo Abeni)
b) add comment about why we need to sleep in test script (Paolo Abeni)
v2:
a) Split the patch in to 2 parts. One for lacp_active setting.
One for passive mode negotiation flapping issue. (Nikolay Aleksandrov)
b) Update fixes tags and some comments (Nikolay Aleksandrov)
c) Update selftest to pass on normal kernel (Jakub Kicinski)
Hangbin Liu (3):
bonding: update LACP activity flag after setting lacp_active
bonding: send LACPDUs periodically in passive mode after receiving
partner's LACPDU
selftests: bonding: add test for passive LACP mode
drivers/net/bonding/bond_3ad.c | 67 ++++++++---
drivers/net/bonding/bond_options.c | 1 +
include/net/bond_3ad.h | 1 +
.../selftests/drivers/net/bonding/Makefile | 3 +-
.../drivers/net/bonding/bond_passive_lacp.sh | 105 ++++++++++++++++++
.../selftests/drivers/net/bonding/config | 1 +
6 files changed, 159 insertions(+), 19 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/bonding/bond_passive_lacp.sh
--
2.50.1
Signed-off-by: Abhishek Jadhav <abhijadhav.dev(a)gmail.com>
---
tools/testing/selftests/amd-pstate/gitsource.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/amd-pstate/gitsource.sh b/tools/testing/selftests/amd-pstate/gitsource.sh
index 4cde62f90468..9b7323b1d0a6 100755
--- a/tools/testing/selftests/amd-pstate/gitsource.sh
+++ b/tools/testing/selftests/amd-pstate/gitsource.sh
@@ -121,7 +121,7 @@ parse_gitsource()
en_sum=$(awk 'BEGIN {sum=0};{sum += $1};END {print sum}' $OUTFILE_GIT-energy-$1-$2.log)
printf "Gitsource-$1-#$2 power consumption(J): $en_sum\n" | tee -a $OUTFILE_GIT.result
- # Permance is the number of run gitsource per second, denoted 1/t, where 1 is the number of run gitsource in t
+ # Performance is the number of run gitsource per second, denoted 1/t, where 1 is the number of run gitsource in t
# seconds. It is well known that P=E/t, where P is power measured in watts(W), E is energy measured in joules(J),
# and t is time measured in seconds(s). This means that performance per watt becomes
# 1/t 1/t 1
@@ -179,7 +179,7 @@ gather_gitsource()
avg_en=$(awk 'BEGIN {sum=0};{sum += $1};END {print sum/'$LOOP_TIMES'}' $OUTFILE_GIT-energy-$1.log)
printf "Gitsource-$1 avg power consumption(J): $avg_en\n" | tee -a $OUTFILE_GIT.result
- # Permance is the number of run gitsource per second, denoted 1/t, where 1 is the number of run gitsource in t
+ # Performance is the number of run gitsource per second, denoted 1/t, where 1 is the number of run gitsource in t
# seconds. It is well known that P=E/t, where P is power measured in watts(W), E is energy measured in joules(J),
# and t is time measured in seconds(s). This means that performance per watt becomes
# 1/t 1/t 1
--
2.50.1
This series adds ONE_REG interface for SBI FWFT extension implemented
by KVM RISC-V. This was missed out in accepted SBI FWFT patches for
KVM RISC-V.
These patches can also be found in the riscv_kvm_fwft_one_reg_v1 branch
at: https://github.com/avpatel/linux.git
Anup Patel (6):
RISC-V: KVM: Set initial value of hedeleg in kvm_arch_vcpu_create()
RISC-V: KVM: Introduce feature specific reset for SBI FWFT
RISC-V: KVM: Introduce optional ONE_REG callbacks for SBI extensions
RISC-V: KVM: Move copy_sbi_ext_reg_indices() to SBI implementation
RISC-V: KVM: Implement ONE_REG interface for SBI FWFT state
KVM: riscv: selftests: Add SBI FWFT to get-reg-list test
arch/riscv/include/asm/kvm_vcpu_sbi.h | 23 +-
arch/riscv/include/uapi/asm/kvm.h | 14 ++
arch/riscv/kvm/vcpu.c | 3 +-
arch/riscv/kvm/vcpu_onereg.c | 60 +-----
arch/riscv/kvm/vcpu_sbi.c | 172 ++++++++++++---
arch/riscv/kvm/vcpu_sbi_fwft.c | 199 ++++++++++++++++--
arch/riscv/kvm/vcpu_sbi_sta.c | 64 ++++--
.../selftests/kvm/riscv/get-reg-list.c | 28 +++
8 files changed, 436 insertions(+), 127 deletions(-)
--
2.43.0
Fixed multiple typos in powerpc/tm reported by Codespell
Signed-off-by: Rakuram Eswaran <rakuram.e96(a)gmail.com>
---
tools/testing/selftests/powerpc/tm/tm-signal-msr-resv.c | 2 +-
tools/testing/selftests/powerpc/tm/tm-signal-stack.c | 4 ++--
tools/testing/selftests/powerpc/tm/tm-sigreturn.c | 2 +-
tools/testing/selftests/powerpc/tm/tm-tar.c | 2 +-
tools/testing/selftests/powerpc/tm/tm-tmspr.c | 2 +-
tools/testing/selftests/powerpc/tm/tm-trap.c | 4 ++--
6 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-msr-resv.c b/tools/testing/selftests/powerpc/tm/tm-signal-msr-resv.c
index 4a61e9bd12b4..8aee18819603 100644
--- a/tools/testing/selftests/powerpc/tm/tm-signal-msr-resv.c
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-msr-resv.c
@@ -42,7 +42,7 @@ void signal_usr1(int signum, siginfo_t *info, void *uc)
#else
ucp->uc_mcontext.uc_regs->gregs[PT_MSR] |= (7ULL);
#endif
- /* Should segv on return becuase of invalid context */
+ /* Should segv on return because of invalid context */
segv_expected = 1;
}
diff --git a/tools/testing/selftests/powerpc/tm/tm-signal-stack.c b/tools/testing/selftests/powerpc/tm/tm-signal-stack.c
index 68807aac8dd3..e793b5d97c48 100644
--- a/tools/testing/selftests/powerpc/tm/tm-signal-stack.c
+++ b/tools/testing/selftests/powerpc/tm/tm-signal-stack.c
@@ -2,7 +2,7 @@
/*
* Copyright 2015, Michael Neuling, IBM Corp.
*
- * Test the kernel's signal delievery code to ensure that we don't
+ * Test the kernel's signal delivery code to ensure that we don't
* trelaim twice in the kernel signal delivery code. This can happen
* if we trigger a signal when in a transaction and the stack pointer
* is bogus.
@@ -52,7 +52,7 @@ int tm_signal_stack()
/*
* The flow here is:
- * 1) register a signal handler (so signal delievery occurs)
+ * 1) register a signal handler (so signal delivery occurs)
* 2) make stack pointer (r1) = NULL
* 3) start transaction
* 4) cause segv
diff --git a/tools/testing/selftests/powerpc/tm/tm-sigreturn.c b/tools/testing/selftests/powerpc/tm/tm-sigreturn.c
index ffe4e5515f33..4dfb25409393 100644
--- a/tools/testing/selftests/powerpc/tm/tm-sigreturn.c
+++ b/tools/testing/selftests/powerpc/tm/tm-sigreturn.c
@@ -5,7 +5,7 @@
*
* Test the kernel's signal returning code to check reclaim is done if the
* sigreturn() is called while in a transaction (suspended since active is
- * already dropped trough the system call path).
+ * already dropped through the system call path).
*
* The kernel must discard the transaction when entering sigreturn, since
* restoring the potential TM SPRS from the signal frame is requiring to not be
diff --git a/tools/testing/selftests/powerpc/tm/tm-tar.c b/tools/testing/selftests/powerpc/tm/tm-tar.c
index f2a9137f3c1e..ea420caa3961 100644
--- a/tools/testing/selftests/powerpc/tm/tm-tar.c
+++ b/tools/testing/selftests/powerpc/tm/tm-tar.c
@@ -50,7 +50,7 @@ int test_tar(void)
"bne 2b;"
"tend.;"
- /* Transaction sucess! TAR should be 3 */
+ /* Transaction success! TAR should be 3 */
"mfspr 7, %[tar];"
"ori %[res], 7, 4;" // res = 3|4 = 7
"b 4f;"
diff --git a/tools/testing/selftests/powerpc/tm/tm-tmspr.c b/tools/testing/selftests/powerpc/tm/tm-tmspr.c
index dd5ddffa28b7..e2c3ae7c9035 100644
--- a/tools/testing/selftests/powerpc/tm/tm-tmspr.c
+++ b/tools/testing/selftests/powerpc/tm/tm-tmspr.c
@@ -9,7 +9,7 @@
* - TFIAR - stores address of location of transaction failure
* - TFHAR - stores address of software failure handler (if transaction
* fails)
- * - TEXASR - lots of info about the transacion(s)
+ * - TEXASR - lots of info about the transaction(s)
*
* (1) create more threads than cpus
* (2) in each thread:
diff --git a/tools/testing/selftests/powerpc/tm/tm-trap.c b/tools/testing/selftests/powerpc/tm/tm-trap.c
index 97cb74768e30..99acb7c78403 100644
--- a/tools/testing/selftests/powerpc/tm/tm-trap.c
+++ b/tools/testing/selftests/powerpc/tm/tm-trap.c
@@ -91,9 +91,9 @@ void trap_signal_handler(int signo, siginfo_t *si, void *uc)
* LE endianness does in effect nothing, instruction (2)
* is then executed again as 'trap', generating a second
* trap event (note that in that case 'trap' is caught
- * not in transacional mode). On te other hand, if after
+ * not in transactional mode). On the other hand, if after
* the return from the signal handler the endianness in-
- * advertently flipped, instruction (1) is tread as a
+ * advertently flipped, instruction (1) is thread as a
* branch instruction, i.e. b .+8, hence instruction (3)
* and (4) are executed (tbegin.; trap;) and we get sim-
* ilaly on the trap signal handler, but now in TM mode.
--
2.43.0
This series updates crc_kunit to use the same interrupt context testing
strategy that I used in the crypto KUnit tests. I.e., test CRC
computation in hardirq, softirq, and task context concurrently. This
detect issues related to use of the FPU/SIMD/vector registers.
To allow lib/crc/tests/ and lib/crypto/tests/ to share code, move the
needed helper function to include/kunit/run-in-irq-context.h.
include/kunit/ seems like the most relevant location for this sort of
thing, but let me know if there is any other preference.
The third patch replaces the calls to crypto_simd_usable() in lib/crc/
with calls to the underlying functions, now that we have a better
solution that doesn't rely on the test injecting values. (Note that
crc_kunit wasn't actually using the injection solution, anyway.)
I'd like to take this series via crc-next.
Eric Biggers (3):
kunit, lib/crypto: Move run_irq_test() to common header
lib/crc: crc_kunit: Test CRC computation in interrupt contexts
lib/crc: Use underlying functions instead of crypto_simd_usable()
include/kunit/run-in-irq-context.h | 129 ++++++++++++++++++++++++++
lib/crc/arm/crc-t10dif.h | 6 +-
lib/crc/arm/crc32.h | 6 +-
lib/crc/arm64/crc-t10dif.h | 6 +-
lib/crc/arm64/crc32.h | 11 ++-
lib/crc/powerpc/crc-t10dif.h | 5 +-
lib/crc/powerpc/crc32.h | 5 +-
lib/crc/tests/crc_kunit.c | 62 +++++++++++--
lib/crc/x86/crc-pclmul-template.h | 3 +-
lib/crc/x86/crc32.h | 2 +-
lib/crypto/tests/hash-test-template.h | 123 +-----------------------
11 files changed, 206 insertions(+), 152 deletions(-)
create mode 100644 include/kunit/run-in-irq-context.h
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
--
2.50.1
Even with slowwait used to avoid system sleep in the preferred_lft test,
failures can still occur after long runtimes.
Print the device address info when the test fails to provide better
troubleshooting data.
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
tools/testing/selftests/net/rtnetlink.sh | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/net/rtnetlink.sh b/tools/testing/selftests/net/rtnetlink.sh
index d6c00efeb664..91b0f6cae04d 100755
--- a/tools/testing/selftests/net/rtnetlink.sh
+++ b/tools/testing/selftests/net/rtnetlink.sh
@@ -313,6 +313,8 @@ kci_test_addrlft()
slowwait 5 check_addr_not_exist "$devdummy" "10.23.11."
if [ $? -eq 1 ]; then
+ # troubleshoot the reason for our failure
+ run_cmd ip addr show dev "$devdummy"
check_err 1
end_test "FAIL: preferred_lft addresses remaining"
return
--
2.50.1