This patch series extends the sev_init2 and the sev_smoke test to
exercise the SEV-SNP VM launch workflow.
Primarily, it introduces the architectural defines, its support in the
SEV library and extends the tests to interact with the SEV-SNP ioctl()
wrappers.
Patch 1 - Do not advertise SNP on initialization failure
Patch 2 - SNP test for KVM_SEV_INIT2
Patch 3 - Add vmgexit helper
Patch 4 - Add SMT control interface helper
Patch 5 - Replace assert() with TEST_ASSERT_EQ()
Patch 6 - Introduce SEV+ VM type check
Patch 7 - SNP iotcl() plumbing for the SEV library
Patch 8 - Force set GUEST_MEMFD for SNP
Patch 9 - Cleanups of smoke test - Decouple policy from type
Patch 10 - SNP smoke test
The series is based on
git.kernel.org/pub/scm/virt/kvm/kvm.git next
v7..v8:
* Dropped exporting the SNP initialized API from ccp to KVM. Instead
call SNP_PLATFORM_STATUS within KVM to query the initialization. (Tom)
While it may be cheaper to query sev->snp_initialized from ccp, making
the SNP platform call within KVM does away with any dependencies.
v6..v7:
https://lore.kernel.org/kvm/20250221210200.244405-7-prsampat@amd.com/
Based on comments from Sean -
* Replaced FW check with sev->snp_initialized
* Dropped the patch which removes SEV+ KVM advertisement if INIT fails.
This should be now be resolved by the combination of the patches [1,2]
from Ashish.
* Change vmgexit to an inline function
* Export SMT control parsing interface to kvm_util
Note: hyperv_cpuid KST only compile tested
* Replace assert() with TEST_ASSERT_EQ() within SEV library
* Define KVM_SEV_PAGE_TYPE_INVALID for SEV call of encrypt_region()
* Parameterize encrypt_region() to include privatize_region()
* Deduplication of sev test calls between SEV,SEV-ES and SNP
* Removed FW version tests for SNP
* Included testing of SNP_POLICY_DBG
* Dropped most tags from patches that have been changed or indirectly
affected
[1] https://lore.kernel.org/all/d6d08c6b-9602-4f3d-92c2-8db6d50a1b92@amd.com
[2] https://lore.kernel.org/all/f78ddb64087df27e7bcb1ae0ab53f55aa0804fab.173922…
v5..v6:
https://lore.kernel.org/kvm/ab433246-e97c-495b-ab67-b0cb1721fb99@amd.com/
* Rename is_sev_platform_init to sev_fw_initialized (Nikunj)
* Rename KVM CPU feature X86_FEATURE_SNP to X86_FEATURE_SEV_SNP (Nikunj)
* Collected Tags from Nikunj, Pankaj, Srikanth.
v4..v5:
https://lore.kernel.org/kvm/8e7d8172-879e-4a28-8438-343b1c386ec9@amd.com/
* Introduced a check to disable advertising support for SEV, SEV-ES
and SNP when platform initialization fails (Nikunj)
* Remove the redundant SNP check within is_sev_vm() (Nikunj)
* Cleanup of the encrypt_region flow for better readability (Nikunj)
* Refactor paths to use the canonical $(ARCH) to rebase for kvm/next
v3..v4:
https://lore.kernel.org/kvm/20241114234104.128532-1-pratikrajesh.sampat@amd…
* Remove SNP FW API version check in the test and ensure the KVM
capability advertises the presence of the feature. Retain the minimum
version definitions to exercise these API versions in the smoke test
* Retained only the SNP smoke test and SNP_INIT2 test
* The SNP architectural defined merged with SNP_INIT2 test patch
* SNP shutdown merged with SNP smoke test patch
* Add SEV VM type check to abstract comparisons and reduce clutter
* Define a SNP default policy which sets bits based on the presence of
SMT
* Decouple privatization and encryption for it to be SNP agnostic
* Assert for only positive tests using vm_ioctl()
* Dropped tested-by tags
In summary - based on comments from Sean, I have primarily reduced the
scope of this patch series to focus on breaking down the SNP smoke test
patch (v3 - patch2) to first introduce SEV-SNP support and use this
interface to extend the sev_init2 and the sev_smoke test.
The rest of the v3 patchset that introduces ioctl, pre fault, fallocate
and negative tests, will be re-worked and re-introduced subsequently in
future patch series post addressing the issues discussed.
v2..v3:
https://lore.kernel.org/kvm/20240905124107.6954-1-pratikrajesh.sampat@amd.c…
* Remove the assignments for the prefault and fallocate test type
enums.
* Fix error message for sev launch measure and finish.
* Collect tested-by tags [Peter, Srikanth]
Pratik R. Sampat (10):
KVM: SEV: Disable SEV-SNP support on initialization failure
KVM: selftests: SEV-SNP test for KVM_SEV_INIT2
KVM: selftests: Add vmgexit helper
KVM: selftests: Add SMT control state helper
KVM: selftests: Replace assert() with TEST_ASSERT_EQ()
KVM: selftests: Introduce SEV VM type check
KVM: selftests: Add library support for interacting with SNP
KVM: selftests: Force GUEST_MEMFD flag for SNP VM type
KVM: selftests: Abstractions for SEV to decouple policy from type
KVM: selftests: Add a basic SEV-SNP smoke test
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/svm/sev.c | 30 +++++-
tools/arch/x86/include/uapi/asm/kvm.h | 1 +
.../testing/selftests/kvm/include/kvm_util.h | 35 +++++++
.../selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/include/x86/sev.h | 42 ++++++++-
tools/testing/selftests/kvm/lib/kvm_util.c | 7 +-
.../testing/selftests/kvm/lib/x86/processor.c | 4 +-
tools/testing/selftests/kvm/lib/x86/sev.c | 93 +++++++++++++++++--
.../testing/selftests/kvm/x86/hyperv_cpuid.c | 19 ----
.../selftests/kvm/x86/sev_init2_tests.c | 13 +++
.../selftests/kvm/x86/sev_smoke_test.c | 75 +++++++++------
12 files changed, 261 insertions(+), 60 deletions(-)
--
2.43.0
Hello,
This patchset builds upon the code at
https://lore.kernel.org/lkml/20230718234512.1690985-1-seanjc@google.com/T/.
This code is available at
https://github.com/googleprodkernel/linux-cc/tree/kvm-gmem-link-migrate-rfc….
In guest_mem v11, a split file/inode model was proposed, where memslot
bindings belong to the file and pages belong to the inode. This model
lends itself well to having different VMs use separate files pointing
to the same inode.
This RFC proposes an ioctl, KVM_LINK_GUEST_MEMFD, that takes a VM and
a gmem fd, and returns another gmem fd referencing a different file
and associated with VM. This RFC also includes an update to
KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to migrate memory context
(slot->arch.lpage_info and kvm->mem_attr_array) from source to
destination vm, intra-host.
Intended usage of the two ioctls:
1. Source VM’s fd is passed to destination VM via unix sockets
2. Destination VM uses new ioctl KVM_LINK_GUEST_MEMFD to link source
VM’s fd to a new fd.
3. Destination VM will pass new fds to KVM_SET_USER_MEMORY_REGION,
which will bind the new file, pointing to the same inode that the
source VM’s file points to, to memslots
4. Use KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM to move kvm->mem_attr_array
and slot->arch.lpage_info to the destination VM.
5. Run the destination VM as per normal
Some other approaches considered were:
+ Using the linkat() syscall, but that requires a mount/directory for
a source fd to be linked to
+ Using the dup() syscall, but that only duplicates the fd, and both
fds point to the same file
---
Ackerley Tng (11):
KVM: guest_mem: Refactor out kvm_gmem_alloc_file()
KVM: guest_mem: Add ioctl KVM_LINK_GUEST_MEMFD
KVM: selftests: Add tests for KVM_LINK_GUEST_MEMFD ioctl
KVM: selftests: Test transferring private memory to another VM
KVM: x86: Refactor sev's flag migration_in_progress to kvm struct
KVM: x86: Refactor common code out of sev.c
KVM: x86: Refactor common migration preparation code out of
sev_vm_move_enc_context_from
KVM: x86: Let moving encryption context be configurable
KVM: x86: Handle moving of memory context for intra-host migration
KVM: selftests: Generalize migration functions from
sev_migrate_tests.c
KVM: selftests: Add tests for migration of private mem
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/kvm/svm/sev.c | 85 ++-----
arch/x86/kvm/svm/svm.h | 3 +-
arch/x86/kvm/x86.c | 221 +++++++++++++++++-
arch/x86/kvm/x86.h | 6 +
include/linux/kvm_host.h | 18 ++
include/uapi/linux/kvm.h | 8 +
tools/testing/selftests/kvm/Makefile | 1 +
.../testing/selftests/kvm/guest_memfd_test.c | 42 ++++
.../selftests/kvm/include/kvm_util_base.h | 31 +++
.../kvm/x86_64/private_mem_migrate_tests.c | 93 ++++++++
.../selftests/kvm/x86_64/sev_migrate_tests.c | 48 ++--
virt/kvm/guest_mem.c | 151 ++++++++++--
virt/kvm/kvm_main.c | 10 +
virt/kvm/kvm_mm.h | 7 +
15 files changed, 596 insertions(+), 132 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_migrate_tests.c
--
2.41.0.640.ga95def55d0-goog
'realpath' is not always available, fallback to 'readlink -f' if is not
available. They seem to work equally well in this context.
Signed-off-by: Yosry Ahmed <yosry.ahmed(a)linux.dev>
---
tools/testing/selftests/run_kselftest.sh | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/run_kselftest.sh b/tools/testing/selftests/run_kselftest.sh
index 50e03eefe7ac7..0443beacf3621 100755
--- a/tools/testing/selftests/run_kselftest.sh
+++ b/tools/testing/selftests/run_kselftest.sh
@@ -3,7 +3,14 @@
#
# Run installed kselftest tests.
#
-BASE_DIR=$(realpath $(dirname $0))
+
+# Fallback to readlink if realpath is not available
+if which realpath > /dev/null; then
+ BASE_DIR=$(realpath $(dirname $0))
+else
+ BASE_DIR=$(readlink -f $(dirname $0))
+fi
+
cd $BASE_DIR
TESTS="$BASE_DIR"/kselftest-list.txt
if [ ! -r "$TESTS" ] ; then
--
2.49.0.rc1.451.g8f38331e32-goog
ksft runner sends 2 SIGTERMs in a row if a test runs out of time.
Handle this in a similar way we handle SIGINT - cleanup and stop
running further tests.
Because we get 2 signals we need a bit of logic to ignore
the subsequent one, they come immediately one after the other
(due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM
to runner child")).
This change makes sure we run cleanup (scheduled defer()s)
and also print a stack trace on SIGTERM, which doesn't happen
by default. Tests occasionally hang in NIPA and it's impossible
to tell what they are waiting from or doing.
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
v3:
- remove unnecessary isinstance()
v2: https://lore.kernel.org/20250429170804.2649622-1-kuba@kernel.org
- remove declaration at the global scope
v1: https://lore.kernel.org/20250425151757.1652517-1-kuba@kernel.org
CC: ecree.xilinx(a)gmail.com
CC: petrm(a)nvidia.com
CC: willemb(a)google.com
CC: sdf(a)fomichev.me
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/net/lib/py/ksft.py | 24 +++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
index 3cfad0fd4570..61287c203b6e 100644
--- a/tools/testing/selftests/net/lib/py/ksft.py
+++ b/tools/testing/selftests/net/lib/py/ksft.py
@@ -3,6 +3,7 @@
import builtins
import functools
import inspect
+import signal
import sys
import time
import traceback
@@ -26,6 +27,10 @@ KSFT_DISRUPTIVE = True
pass
+class KsftTerminate(KeyboardInterrupt):
+ pass
+
+
def ksft_pr(*objs, **kwargs):
print("#", *objs, **kwargs)
@@ -193,6 +198,17 @@ KSFT_DISRUPTIVE = True
return env
+def _ksft_intr(signum, frame):
+ # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
+ # if we don't ignore the second one it will stop us from handling cleanup
+ global term_cnt
+ term_cnt += 1
+ if term_cnt == 1:
+ raise KsftTerminate()
+ else:
+ ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
+
+
def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
cases = cases or []
@@ -205,6 +221,10 @@ KSFT_DISRUPTIVE = True
cases.append(value)
break
+ global term_cnt
+ term_cnt = 0
+ prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
+
totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
print("TAP version 13")
@@ -233,7 +253,7 @@ KSFT_DISRUPTIVE = True
for line in tb.strip().split('\n'):
ksft_pr("Exception|", line)
if stop:
- ksft_pr("Stopping tests due to KeyboardInterrupt.")
+ ksft_pr(f"Stopping tests due to {type(e).__name__}.")
KSFT_RESULT = False
cnt_key = 'fail'
@@ -248,6 +268,8 @@ KSFT_DISRUPTIVE = True
if stop:
break
+ signal.signal(signal.SIGTERM, prev_sigterm)
+
print(
f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"
)
--
2.49.0
v13: https://lore.kernel.org/netdev/20250425204743.617260-1-almasrymina@google.c…
===
Changelog:
- Fix unneeded error label pointed out by Christoph, and addressed
nitpick.
v12: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c…
====
No changes in v12, just restored the selftests patch I accidentally dropped in
v11
v11: https://lore.kernel.org/netdev/20250423031117.907681-1-almasrymina@google.c…
====
Addressed a couple of nits and collected Acked-by from Harshitha
(thanks!)
v10: https://lore.kernel.org/netdev/20250417231540.2780723-1-almasrymina@google.…
====
Addressed comments following conversations with Pavel, Stan, and
Harshitha. Thank you guys for the reviews again. Overall minor changes:
Changelog:
- Check for !niov->pp in io_zcrx_recv_frag, just in case we end up with
a TX niov in that path (Pavel).
- Fix locking case in !netif_device_present (Jakub/Stan).
v9: https://lore.kernel.org/netdev/20250415224756.152002-1-almasrymina@google.c…
===
Changelog:
- Use priv->bindings list instead of sock_bindings_list. This was missed
during the rebase as the bindings have been updated to use
priv->bindings recently (thanks Stan!)
v8: https://lore.kernel.org/netdev/20250308214045.1160445-1-almasrymina@google.…
===
Only address minor comments on V7
Changelog:
- Use netdev locking instead of rtnl_locking to match rx path.
- Now that iouring zcrx is in net-next, use NET_IOV_IOURING instead of
NET_IOV_UNSPECIFIED.
- Post send binding to net_devmem_dmabuf_bindings after it's been fully
initialized (Stan).
v7: https://lore.kernel.org/netdev/20250227041209.2031104-1-almasrymina@google.…
===
Changelog:
- Check the dmabuf net_iov binding belongs to the device the TX is going
out on. (Jakub)
- Provide detailed inspection of callsites of
__skb_frag_ref/skb_page_unref in patch 2's changelog (Jakub)
v6: https://lore.kernel.org/netdev/20250222191517.743530-1-almasrymina@google.c…
===
v6 has no major changes. Addressed a few issues from Paolo and David,
and collected Acks from Stan. Thank you everyone for the review!
Changes:
- retain behavior to process MSG_FASTOPEN even if the provided cmsg is
invalid (Paolo).
- Rework the freeing of tx_vec slightly (it now has its own err label).
(Paolo).
- Squash the commit that makes dmabuf unbinding scheduled work into the
same one which implements the TX path so we don't run into future
errors on bisecting (Paolo).
- Fix/add comments to explain how dmabuf binding refcounting works
(David).
v5: https://lore.kernel.org/netdev/20250220020914.895431-1-almasrymina@google.c…
===
v5 has no major changes; it clears up the relatively minor issues
pointed out to in v4, and rebases the series on top of net-next to
resolve the conflict with a patch that raced to the tree. It also
collects the review tags from v4.
Changes:
- Rebase to net-next
- Fix issues in selftest (Stan).
- Address comments in the devmem and netmem driver docs (Stan and Bagas)
- Fix zerocopy_fill_skb_from_devmem return error code (Stan).
v4: https://lore.kernel.org/netdev/20250203223916.1064540-1-almasrymina@google.…
===
v4 mainly addresses the critical driver support issue surfaced in v3 by
Paolo and Stan. Drivers aiming to support netmem_tx should make sure not
to pass the netmem dma-addrs to the dma-mapping APIs, as these dma-addrs
may come from dma-bufs.
Additionally other feedback from v3 is addressed.
Major changes:
- Add helpers to handle netmem dma-addrs. Add GVE support for
netmem_tx.
- Fix binding->tx_vec not being freed on error paths during the
tx binding.
- Add a minimal devmem_tx test to devmem.py.
- Clean up everything obsolete from the cover letter (Paolo).
v3: https://patchwork.kernel.org/project/netdevbpf/list/?series=929401&state=*
===
Address minor comments from RFCv2 and fix a few build warnings and
ynl-regen issues. No major changes.
RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======
RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.
Major changes:
--------------
- much improved UAPI as suggested by Stan. We now interpret the iov_base
of the passed in iov from userspace as the offset into the dmabuf to
send from. This removes the need to set iov.iov_base = NULL which may
be confusing to users, and enables us to send multiple iovs in the
same sendmsg() call. ncdevmem and the docs show a sample use of that.
- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
this is good improvment as it was confusing to keep track of
2 iterators for the same sendmsg, and mistracking both iterators
caused a couple of bugs reported in the last iteration that are now
resolved with this streamlining.
- Improved test coverage in ncdevmem. Now multiple sendmsg() are tested,
and sending multiple iovs in the same sendmsg() is tested.
- Fixed issue where dmabuf unmapping was happening in invalid context
(Stan).
====================================================================
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Full outline on usage of the TX path is detailed in the documentation
included with this series.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
1. Add netmem refcounting needed for the TX path.
2. Devmem TX netlink API.
3. Devmem TX net stack implementation.
4. Make dma-buf unbinding scheduled work to handle TX cases where it gets
freed from contexts where we can't sleep.
5. Add devmem TX documentation.
6. Add scaffolding enabling driver support for netmem_tx. Add helpers, driver
feature flag, and docs to enable drivers to declare netmem_tx support.
7. Guard netmem_tx against being enabled against drivers that don't
support it.
8. Add devmem_tx selftests. Add TX path to ncdevmem and add a test to
devmem.py.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Cc: Samiullah Khawaja <skhawaja(a)google.com>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Mina Almasry (8):
netmem: add niov->type attribute to distinguish different net_iov
types
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
net: add devmem TCP TX documentation
net: enable driver support for netmem TX
gve: add netmem TX support to GVE DQO-RDA mode
net: check for driver support in netmem TX
selftests: ncdevmem: Implement devmem TCP TX
Stanislav Fomichev (1):
net: devmem: TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 150 ++++++++-
.../networking/net_cachelines/net_device.rst | 1 +
Documentation/networking/netdev-features.rst | 5 +
Documentation/networking/netmem.rst | 23 +-
drivers/net/ethernet/google/gve/gve_main.c | 3 +
drivers/net/ethernet/google/gve/gve_tx_dqo.c | 8 +-
include/linux/netdevice.h | 2 +
include/linux/skbuff.h | 17 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 34 +-
include/net/sock.h | 1 +
include/uapi/linux/netdev.h | 1 +
io_uring/zcrx.c | 3 +-
net/core/datagram.c | 48 ++-
net/core/dev.c | 34 +-
net/core/devmem.c | 131 ++++++--
net/core/devmem.h | 83 ++++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 80 ++++-
net/core/skbuff.c | 48 ++-
net/core/sock.c | 6 +
net/ipv4/ip_output.c | 3 +-
net/ipv4/tcp.c | 50 ++-
net/ipv6/ip6_output.c | 3 +-
net/vmw_vsock/virtio_transport_common.c | 5 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/devmem.py | 26 +-
.../selftests/drivers/net/hw/ncdevmem.c | 300 +++++++++++++++++-
30 files changed, 1008 insertions(+), 88 deletions(-)
base-commit: 0d15a26b247d25cd012134bf8825128fedb15cc9
--
2.49.0.901.g37484f566f-goog