The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
can not be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2, this will be implemented separately. Executables
are started without GCS and must use a prctl() to enable it, it is
expected that this will be done very early in application execution by
the dynamic linker or other startup code. For dynamic linking this will
be done by checking that everything in the executable is marked as GCS
compatible.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am concious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable, we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable, since only x86
and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V Zicfiss feature which I initially
adopted fairly directly but following review feedback has been revised
quite a bit.
We currently maintain the x86 pattern of implicitly allocating a shadow
stack for threads started with shadow stack enabled, there has been some
discussion of removing this support and requiring the use of clone3()
with explicit allocation of shadow stacks instead. I have no strong
feelings either way, implicit allocation is not really consistent with
anything else we do and creates the potential for errors around thread
exit but on the other hand it is existing ABI on x86 and minimises the
changes needed in userspace code.
glibc and bionic changes using this ABI have been implemented and
tested. Headless Android systems have been validated and Ross Burton
has used this code has been used to bring up a Yocto system with GCS
enabed as standard, a test implementation of V8 support has also been
done.
uprobes are not currently supported, missing emulation was identified
late in review.
There is an open issue with support for CRIU, on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.
It is likely that we could relax some of the barriers added here with
some more targeted placements, this is left for further study.
There is an in process series adding clone3() support for shadow stacks:
https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…
Previous versions of this series depended on that, this dependency has
been removed in order to make merging easier.
[1] https://lore.kernel.org/lkml/20240403234054.2020347-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v13:
- Rebase onto v6.12-rc1.
- Allocate VM_HIGH_ARCH_6 since protection keys used all the existing
bits.
- Implement mm_release() and free transparently allocated GCSs there.
- Use bit 32 of AT_HWCAP for GCS due to AT_HWCAP2 being filled.
- Since we now only set GCSCRE0_EL1 on change ensure that it is
initialised with GCSPR_EL0 accessible to EL0.
- Fix OOM handling on thread copy.
- Link to v12: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec947436a@kernel.org
Changes in v12:
- Clarify and simplify the signal handling code so we work with the
register state.
- When checking for write aborts to shadow stack pages ensure the fault
is a data abort.
- Depend on !UPROBES.
- Comment cleanups.
- Link to v11: https://lore.kernel.org/r/20240822-arm64-gcs-v11-0-41b81947ecb5@kernel.org
Changes in v11:
- Remove the dependency on the addition of clone3() support for shadow
stacks, rebasing onto v6.11-rc3.
- Make ID_AA64PFR1_EL1.GCS writeable in KVM.
- Hide GCS registers when GCS is not enabled for KVM guests.
- Require HCRX_EL2.GCSEn if booting at EL1.
- Require that GCSCR_EL1 and GCSCRE0_EL1 be initialised regardless of
if we boot at EL2 or EL1.
- Remove some stray use of bit 63 in signal cap tokens.
- Warn if we see a GCS with VM_SHARED.
- Remove rdundant check for VM_WRITE in fault handling.
- Cleanups and clarifications in the ABI document.
- Clean up and improve documentation of some sync placement.
- Only set the EL0 GCS mode if it's actually changed.
- Various minor fixes and tweaks.
- Link to v10: https://lore.kernel.org/r/20240801-arm64-gcs-v10-0-699e2bd2190b@kernel.org
Changes in v10:
- Fix issues with THP.
- Tighten up requirements for initialising GCSCR*.
- Only generate GCS signal frames for threads using GCS.
- Only context switch EL1 GCS registers if S1PIE is enabled.
- Move context switch of GCSCRE0_EL1 to EL0 context switch.
- Make GCS registers unconditionally visible to userspace.
- Use FHU infrastructure.
- Don't change writability of ID_AA64PFR1_EL1 for KVM.
- Remove unused arguments from alloc_gcs().
- Typo fixes.
- Link to v9: https://lore.kernel.org/r/20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org
Changes in v9:
- Rebase onto v6.10-rc3.
- Restructure and clarify memory management fault handling.
- Fix up basic-gcs for the latest clone3() changes.
- Convert to newly merged KVM ID register based feature configuration.
- Fixes for NV traps.
- Link to v8: https://lore.kernel.org/r/20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org
Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_write() on GCS, it intentionally does not
work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there, the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org
Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (40):
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
mm: Define VM_HIGH_ARCH_6
arm64/mm: Restructure arch_validate_flags() for extensibility
prctl: arch-agnostic prctl for shadow stack
mman: Add map_shadow_stack() flags
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide put_user_gcs()
arm64/gcs: Provide basic EL2 setup to allow GCS usage at EL0 and EL1
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS access and registers for guests
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Ensure that new threads have a GCS
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
kselftest/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
KVM: selftests: arm64: Add GCS registers to get-reg-list
Documentation/admin-guide/kernel-parameters.txt | 3 +
Documentation/arch/arm64/booting.rst | 32 +
Documentation/arch/arm64/elf_hwcaps.rst | 4 +
Documentation/arch/arm64/gcs.rst | 230 +++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 21 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 30 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 107 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_host.h | 12 +
arch/arm64/include/asm/mman.h | 23 +-
arch/arm64/include/asm/mmu_context.h | 9 +
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 40 ++
arch/arm64/include/asm/vncr_mapping.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 3 +-
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 23 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/pi/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 94 +++
arch/arm64/kernel/ptrace.c | 62 +-
arch/arm64/kernel/signal.c | 227 ++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 31 +
arch/arm64/kvm/sys_regs.c | 27 +-
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 40 ++
arch/arm64/mm/gcs.c | 254 +++++++
arch/arm64/mm/mmap.c | 9 +-
arch/arm64/tools/cpucaps | 1 +
arch/x86/Kconfig | 1 +
arch/x86/include/uapi/asm/mman.h | 3 -
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 18 +-
include/uapi/asm-generic/mman.h | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 +
mm/Kconfig | 6 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 357 ++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 530 +++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 728 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 62 ++
.../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 28 +
75 files changed, 4120 insertions(+), 34 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
If MPLS is not available in the kernel then skip MPLS tests.
This avoids the test failing in situations where the test is not
supported by the underlying kernel.
In the case where all tests are run, just skip over the MPLS tests
without altering the exit code of the overall test run - there
is only one exit code in this scenario.
In the case where a single test is run, exit with KSFT_SKIP (4).
In both cases log an informative message.
Signed-off-by: Simon Horman <horms(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 7989ec608454..71cddabc4ade 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -102,6 +102,20 @@ wait_for_port() {
return 1
}
+skip_mac() {
+ if [ "$1" = "mpls" ]; then
+ modprobe mpls_iptunnel || true
+ modprobe mpls_gso || true
+
+ if [ ! -e /proc/sys/net/mpls/platform_labels ]; then
+ echo -e "skip: mpls tunnel not supported by kernel\n"
+ return # true
+ fi
+ fi
+
+ false
+}
+
set -e
# no arguments: automated test, run all
@@ -125,6 +139,8 @@ if [[ "$#" -eq "0" ]]; then
$0 ipv6 ip6vxlan eth 2000
for mac in none mpls eth ; do
+ ! skip_mac "$mac" || continue
+
echo "ip gre $mac"
$0 ipv4 gre $mac 100
@@ -193,6 +209,10 @@ readonly tuntype=$2
readonly mac=$3
readonly datalen=$4
+if skip_mac "$mac"; then
+ exit 4 # KSFT_SKIP=4
+fi
+
echo "encap ${addr1} to ${addr2}, type ${tuntype}, mac ${mac} len ${datalen}"
trap cleanup EXIT
@@ -278,8 +298,6 @@ elif [[ "$tuntype" =~ (gre|vxlan) && "$mac" == "eth" ]]; then
awk '/ether/ { print $2 }')
ip netns exec "${ns2}" ip link set testtun0 address $ethaddr
elif [[ "$mac" == "mpls" ]]; then
- modprobe mpls_iptunnel ||true
- modprobe mpls_gso ||true
ip netns exec "${ns2}" sysctl -qw net.mpls.platform_labels=65536
ip netns exec "${ns2}" ip -f mpls route add 1000 dev lo
ip netns exec "${ns2}" ip link set lo up
This series introduces a new ioctl KVM_TRANSLATE2, which expands on
KVM_TRANSLATE. It is required to implement Hyper-V's
HvTranslateVirtualAddress hyper-call as part of the ongoing effort to
emulate HyperV's Virtual Secure Mode (VSM) within KVM and QEMU. The hyper-
call requires several new KVM APIs, one of which is KVM_TRANSLATE2, which
implements the core functionality of the hyper-call. The rest of the
required functionality will be implemented in subsequent series.
Other than translating guest virtual addresses, the ioctl allows the
caller to control whether the access and dirty bits are set during the
page walk. It also allows specifying an access mode instead of returning
viable access modes, which enables setting the bits up to the level that
caused a failure. Additionally, the ioctl provides more information about
why the page walk failed, and which page table is responsible. This
functionality is not available within KVM_TRANSLATE, and can't be added
without breaking backwards compatiblity, thus a new ioctl is required.
The ioctl was designed to facilitate as many other use cases as possible
apart from VSM. The error codes were intentionally chosen to be broad
enough to avoid exposing architecture specific details. Even though
HvTranslateVirtualAddress only really needs one flag to set the accessed
and dirty bits whenever possible, that was split into several flags so
that future users can chose more gradually when these bits should be set.
Furthermore, as much information as possible is provided to the caller.
The patch series includes selftests for the ioctl, as well as fuzzy
testing on random garbage guest page table entries. All previously passing
KVM selftests and KVM unit tests still pass.
Series overview:
- 1: Document the new ioctl
- 2-11: Update the page walker in preparation
- 12-14: Implement the ioctl
- 15: Implement testing
This series, alongside the series by Nicolas Saenz Julienne [1]
introducing the core building blocks for VSM and the accompanying QEMU
implementation [2], is capable of booting Windows Server 2019.
Both series are also available on GitHub [3].
[1] https://lore.kernel.org/linux-hyperv/20240609154945.55332-1-nsaenz@amazon.c…
[2] https://github.com/vianpl/qemu/tree/vsm/next
[3] https://github.com/vianpl/linux/tree/vsm/next
Best,
Nikolas
Nikolas Wipper (15):
KVM: Add API documentation for KVM_TRANSLATE2
KVM: x86/mmu: Abort page walk if permission checks fail
KVM: x86/mmu: Introduce exception flag for unmapped GPAs
KVM: x86/mmu: Store GPA in exception if applicable
KVM: x86/mmu: Introduce flags parameter to page walker
KVM: x86/mmu: Implement PWALK_SET_ACCESSED in page walker
KVM: x86/mmu: Implement PWALK_SET_DIRTY in page walker
KVM: x86/mmu: Implement PWALK_FORCE_SET_ACCESSED in page walker
KVM: x86/mmu: Introduce status parameter to page walker
KVM: x86/mmu: Implement PWALK_STATUS_READ_ONLY_PTE_GPA in page walker
KVM: x86: Introduce generic gva to gpa translation function
KVM: Introduce KVM_TRANSLATE2
KVM: Add KVM_TRANSLATE2 stub
KVM: x86: Implement KVM_TRANSLATE2
KVM: selftests: Add test for KVM_TRANSLATE2
Documentation/virt/kvm/api.rst | 131 ++++++++
arch/x86/include/asm/kvm_host.h | 18 +-
arch/x86/kvm/hyperv.c | 3 +-
arch/x86/kvm/kvm_emulate.h | 8 +
arch/x86/kvm/mmu.h | 10 +-
arch/x86/kvm/mmu/mmu.c | 7 +-
arch/x86/kvm/mmu/paging_tmpl.h | 80 +++--
arch/x86/kvm/x86.c | 123 ++++++-
include/linux/kvm_host.h | 6 +
include/uapi/linux/kvm.h | 33 ++
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/kvm_translate2.c | 310 ++++++++++++++++++
virt/kvm/kvm_main.c | 41 +++
13 files changed, 724 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_translate2.c
--
2.40.1
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
Misbehaving guests can cause bus locks to degrade the performance of
a system. Non-WB (write-back) and misaligned locked RMW
(read-modify-write) instructions are referred to as "bus locks" and
require system wide synchronization among all processors to guarantee
the atomicity. The bus locks can impose notable performance penalties
for all processors within the system.
Support for the Bus Lock Threshold is indicated by CPUID
Fn8000_000A_EDX[29] BusLockThreshold=1, the VMCB provides a Bus Lock
Threshold enable bit and an unsigned 16-bit Bus Lock Threshold count.
VMCB intercept bit
VMCB Offset Bits Function
14h 5 Intercept bus lock operations
Bus lock threshold count
VMCB Offset Bits Function
120h 15:0 Bus lock counter
During VMRUN, the bus lock threshold count is fetched and stored in an
internal count register. Prior to executing a bus lock within the
guest, the processor verifies the count in the bus lock register. If
the count is greater than zero, the processor executes the bus lock,
reducing the count. However, if the count is zero, the bus lock
operation is not performed, and instead, a Bus Lock Threshold #VMEXIT
is triggered to transfer control to the Virtual Machine Monitor (VMM).
A Bus Lock Threshold #VMEXIT is reported to the VMM with VMEXIT code
0xA5h, VMEXIT_BUSLOCK. EXITINFO1 and EXITINFO2 are set to 0 on
a VMEXIT_BUSLOCK. On a #VMEXIT, the processor writes the current
value of the Bus Lock Threshold Counter to the VMCB.
More details about the Bus Lock Threshold feature can be found in AMD
APM [1].
v2 -> v3
- Drop parch to add virt tag in /proc/cpuinfo.
- Incorporated Tom's review comments.
v1 -> v2
- Incorporated misc review comments from Sean.
- Removed bus_lock_counter module parameter.
- Set the value of bus_lock_counter to zero by default and reload the value by 1
in bus lock exit handler.
- Add documentation for the behavioral difference for KVM_EXIT_BUS_LOCK.
- Improved selftest for buslock to work on SVM and VMX.
- Rewrite the commit messages.
Patches are prepared on kvm-next/next (efbc6bd090f4).
Testing done:
- Added a selftest for the Bus Lock Threshold functionality.
- The bus lock threshold selftest has been tested on both Intel and AMD platforms.
- Tested the Bus Lock Threshold functionality on SEV, SEV-ES, SEV-SNP guests.
- Tested the Bus Lock Threshold functionality on nested guests.
v1: https://lore.kernel.org/kvm/20240709175145.9986-4-manali.shukla@amd.com/T/
v2: https://lore.kernel.org/kvm/20241001063413.687787-4-manali.shukla@amd.com/T/
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
Vol 2, 15.14.5 Bus Lock Threshold.
https://bugzilla.kernel.org/attachment.cgi?id=306250
Manali Shukla (2):
x86/cpufeatures: Add CPUID feature bit for the Bus Lock Threshold
KVM: X86: Add documentation about behavioral difference for
KVM_EXIT_BUS_LOCK
Nikunj A Dadhania (2):
KVM: SVM: Enable Bus lock threshold exit
KVM: selftests: Add bus lock exit test
Documentation/virt/kvm/api.rst | 4 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 5 +-
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/nested.c | 10 ++
arch/x86/kvm/svm/svm.c | 27 ++++
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/x86_64/kvm_buslock_test.c | 130 ++++++++++++++++++
8 files changed, 179 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_buslock_test.c
base-commit: efbc6bd090f48ccf64f7a8dd5daea775821d57ec
--
2.34.1
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cced everyone to all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they all build
upon each other. Or the DT part can be merged through the DT tree to
reduce the dependencies.
Changes from v7: https://lore.kernel.org/r/20240710201246.1802189-1-sboyd@kernel.org
* Support modular builds properly by compiling overlay with tests into
one .ko
* Fold in thinko fix from Geert to DT overlay application patch
* Export device_is_bound() to fix module build
* Add more module license and description
Changes from v6: https://lore.kernel.org/r/20240706045454.215701-1-sboyd@kernel.org
* Fix kasan error in platform test by fixing the condition to check for
correct free callback
* Add module descriptions to new modules
Changes from v5: https://lore.kernel.org/r/20240603223811.3815762-1-sboyd@kernel.org
* Pick up reviewed-by tags
* Drop test vendor prefix bindings as dtschema allows anything now
* Use of_node_put_kunit() more to plug some reference leaks
* Select DTC config to avoid compile fails because of missing dtc
* Don't skip for OF_OVERLAY in overlay tests because they depend on it
Changes from v4: https://lore.kernel.org/r/20240422232404.213174-1-sboyd@kernel.org
* Picked up reviewed-by tags
* Check for non-NULL device pointers before calling put_device()
* Fix CFI issues with kunit actions
* Introduce platform_device_prepare_wait_for_probe() helper to wait for
a platform device to probe
* Move platform code to lib/kunit and rename functions to have kunit
prefix
* Fix issue with platform wrappers messing up reference counting
because they used kunit actions
* New patch to populate overlay devices on root node for powerpc
* Make fixed-rate binding generic single clk consumer binding
Changes from v3: https://lore.kernel.org/r/20230327222159.3509818-1-sboyd@kernel.org
* No longer depend on Frank's series[1] because it was merged upstream[2]
* Use kunit_add_action_or_reset() to shorten code
* Skip tests properly when CONFIG_OF_OVERLAY isn't set
Changes from v2: https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1: https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
[2] https://lore.kernel.org/r/20240308195737.GA1174908-robh@kernel.org
Stephen Boyd (8):
of/platform: Allow overlays to create platform devices from the root
node
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 21 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/base/dd.c | 1 +
drivers/clk/.kunitconfig | 2 +
drivers/clk/Kconfig | 11 +
drivers/clk/Makefile | 11 +-
drivers/clk/clk-fixed-rate_test.c | 380 +++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit_helpers.c | 207 ++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 453 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 1 +
drivers/of/Kconfig | 10 +
drivers/of/Makefile | 3 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit_helpers.c | 77 +++
drivers/of/overlay_test.c | 115 +++++
drivers/of/platform.c | 9 +-
include/kunit/clk.h | 28 ++
include/kunit/of.h | 115 +++++
include/kunit/platform_device.h | 20 +
lib/kunit/Makefile | 4 +-
lib/kunit/platform-test.c | 224 +++++++++
lib/kunit/platform.c | 302 ++++++++++++
29 files changed, 2097 insertions(+), 6 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit_helpers.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit_helpers.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
create mode 100644 lib/kunit/platform-test.c
create mode 100644 lib/kunit/platform.c
base-commit: 1613e604df0cd359cf2a7fbd9be7a0bcfacfabd0
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
This patch allows progs to elide a null check on statically known map
lookup keys. In other words, if the verifier can statically prove that
the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail)
unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch
maps. As a result, for user scripts with large number of unrolled loops,
we are starting to hit jump complexity verification errors. These
percpu lookups cannot fail anyways, as we only use static key values.
Eliding nullness probably results in less work for verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the
currrent stack is limited to 512 bytes. In these situations, it is
desirable for the programmer to express: "this lookup should never fail,
and if it does, it means I messed up the code". By omitting the null
check, the programmer can "ask" the verifier to double check the logic.
Changes in v4:
* Only allow for CAP_BPF
* Add test for stack growing upwards
* Improve comment about stack growing upwards
Changes in v3:
* Check if stack is (erroneously) growing upwards
* Mention in commit message why existing tests needed change
Changes in v2:
* Added a check for when R2 is not a ptr to stack
* Added a check for when stack is uninitialized (no stack slot yet)
* Updated existing tests to account for null elision
* Added test case for when R2 can be both const and non-const
Daniel Xu (2):
bpf: verifier: Support eliding map lookup nullness
bpf: selftests: verifier: Add nullness elision tests
kernel/bpf/verifier.c | 73 ++++++-
tools/testing/selftests/bpf/progs/iters.c | 14 +-
.../selftests/bpf/progs/map_kptr_fail.c | 2 +-
.../bpf/progs/verifier_array_access.c | 183 ++++++++++++++++++
.../selftests/bpf/progs/verifier_map_in_map.c | 2 +-
.../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
6 files changed, 265 insertions(+), 11 deletions(-)
--
2.46.0
When cross building kselftest out-of-tree the following issue can be
seen:
[...]
make[4]: Entering directory
'/src/kernel/linux/tools/testing/selftests/net/lib'
CC csum
/usr/lib/gcc-cross/aarch64-linux-gnu/13/../../../../aarch64-linux-gnu/bin/ld:
cannot open output file /tmp/build/kselftest/net/lib/csum: No such
file or directory
collect2: error: ld returned 1 exit status
[...]
Create the output build directory before building the targets, solves
this issue with building 'net/lib/csum'.
Suggested-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
tools/testing/selftests/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b38199965f99..05c143bcff6a 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -261,6 +261,7 @@ ifdef INSTALL_PATH
@ret=1; \
for TARGET in $(TARGETS) $(INSTALL_DEP_TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
+ mkdir -p $$BUILD_TARGET; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET install \
INSTALL_PATH=$(INSTALL_PATH)/$$TARGET \
SRC_PATH=$(shell readlink -e $$(pwd)) \
--
2.45.2
Rename ip_len to payload_len since the length in this case refers only
to the payload, and not the entire IP packet like for IPv4. While we're
at it, just use the variable directly when calling
recv_verify_packet_udp/tcp.
Signed-off-by: Sean Anderson <sean.anderson(a)linux.dev>
---
tools/testing/selftests/net/lib/csum.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/net/lib/csum.c b/tools/testing/selftests/net/lib/csum.c
index e0a34e5e8dd5..27437590eeb5 100644
--- a/tools/testing/selftests/net/lib/csum.c
+++ b/tools/testing/selftests/net/lib/csum.c
@@ -675,22 +675,20 @@ static int recv_verify_packet_ipv6(void *nh, int len)
{
struct ipv6hdr *ip6h = nh;
uint16_t proto = cfg_encap ? IPPROTO_UDP : cfg_proto;
- uint16_t ip_len;
+ uint16_t payload_len;
if (len < sizeof(*ip6h) || ip6h->nexthdr != proto)
return -1;
- ip_len = ntohs(ip6h->payload_len);
- if (ip_len > len - sizeof(*ip6h))
+ payload_len = ntohs(ip6h->payload_len);
+ if (payload_len > len - sizeof(*ip6h))
return -1;
- len = ip_len;
iph_addr_p = &ip6h->saddr;
-
if (proto == IPPROTO_TCP)
- return recv_verify_packet_tcp(ip6h + 1, len);
+ return recv_verify_packet_tcp(ip6h + 1, payload_len);
else
- return recv_verify_packet_udp(ip6h + 1, len);
+ return recv_verify_packet_udp(ip6h + 1, payload_len);
}
/* return whether auxdata includes TP_STATUS_CSUM_VALID */
--
2.35.1.1320.gc452695387.dirty
From: Amit Cohen <amcohen(a)nvidia.com>
The test runs "devlink reload" explicitly. Instead, it is better to use
devlink_reload() which waits for udev events to be processed. Do not sleep
after reload, as devlink_reload() blocks until all the netdevs are renamed.
Signed-off-by: Amit Cohen <amcohen(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh b/tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh
index 893a693ad805..45a569618424 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/rtnetlink.sh
@@ -186,10 +186,7 @@ bridge_vlan_flags_test()
# If we did not handle references correctly, then this should produce a
# trace
- devlink dev reload "$DEVLINK_DEV"
-
- # Allow netdevices to be re-created following the reload
- sleep 20
+ devlink_reload
log_test "bridge vlan flags"
}
@@ -923,12 +920,9 @@ devlink_reload_test()
# devlink reload can be performed without errors
RET=0
- devlink dev reload "$DEVLINK_DEV"
- check_err $? "devlink reload failed"
+ devlink_reload
log_test "devlink reload - last test"
-
- sleep 20
}
trap cleanup EXIT
--
2.45.0