I think we're lacking background on this usage model and how it works. For instance, typically L2 is created by L1, and L1 is responsible for L2's device I/O emulation. I don't quite understand how L0 could emulate L2's device I/O.
Can you provide more information?
Let's differentiate between fast and slow I/O. The whole point of the paravisor in L1 is to provide device emulation for slow I/O: TPM, RTC, NVRAM, IO-APIC, serial ports.
But fast I/O is designed to bypass it and go straight to L0. Hyper-V uses paravirtual vmbus devices for fast I/O (net/block). The vmbus protocol has page-visibility awareness built in and uses the native mechanisms (GHCI on TDX, GHCB on SNP) for notifications. So once everything is set up (rings/buffers in swiotlb), I/O for the fast devices does not involve L1 at all. This is only possible when the VM manages the C-bit itself.
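To sketch the idea (hypothetical names throughout: struct fast_ring and signal_host() are made-up stand-ins, not the actual vmbus ring-buffer code):

	/*
	 * Conceptual fast-path send, L2 <-> L0. The ring lives in memory the
	 * guest has made host-visible itself, which is why C-bit control in
	 * the L2 is a prerequisite.
	 */
	struct fast_ring {
		void	*buf;	/* shared (decrypted) ring memory */
		size_t	size;
	};

	static int fast_ring_init(struct fast_ring *r, void *buf, size_t size)
	{
		int ret;

		/* Clear the C-bit so L0 can read/write the ring directly. */
		ret = set_memory_decrypted((unsigned long)buf, size >> PAGE_SHIFT);
		if (ret)
			return ret;

		r->buf = buf;
		r->size = size;
		return 0;
	}

	static void fast_ring_send(struct fast_ring *r, const void *data, size_t len)
	{
		memcpy(r->buf, data, len);	/* bounce into the shared ring */
		signal_host();			/* stand-in for the GHCI/GHCB doorbell;
						 * note: no exit to L1 on this path */
	}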
Yeah that makes sense. Thanks for the info.
I think the same thing could work for virtio if someone were to "enlighten" the vring notification calls (replacing the port I/O or MMIO instructions).
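A sketch of what I mean, modeled on the kernel's #VE MMIO path in arch/x86/coco/tdx/tdx.c (hcall_func()/EPT_WRITE are borrowed from there; the notify_gpa field is made up, and this is an illustration rather than a patch):

	/*
	 * Hypothetical "enlightened" vring notify for a TDX L2: instead of a
	 * writel() that traps and gets emulated, issue the MMIO-write
	 * TDVMCALL directly, so the doorbell goes straight to L0.
	 */
	static bool enlightened_vm_notify(struct virtqueue *vq)
	{
		struct virtio_mmio_device *vm_dev = to_virtio_mmio_device(vq->vdev);
		u64 gpa = vm_dev->notify_gpa + VIRTIO_MMIO_QUEUE_NOTIFY; /* made up */

		/* TDG.VP.VMCALL<MMIO>: 4-byte write of the queue index to gpa */
		return !_tdx_hypercall(hcall_func(EXIT_REASON_EPT_VIOLATION),
				       4, EPT_WRITE, gpa, vq->index);
	}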
What's missing is that the tdx_guest flag is not exposed to userspace in /proc/cpuinfo, and as a result dmesg does not currently display: "Memory Encryption Features active: Intel TDX".
That's what I set out to correct.
So far I see that you are trying to get the kernel to think that it runs as a TDX guest, when it really doesn't. This is not a very convincing model.
No, that's not accurate at all. The kernel is running as a TDX guest, so I want the kernel to know that.
But it isn't. It runs on a hypervisor which is a TDX guest, but that doesn't make the L2 itself a TDX guest.
That depends on your definition of "TDX guest". The TDX 1.5 TD partitioning spec talks of a TDX-enlightened L1 VMM, (optionally) a TDX-enlightened L2 VM, and an unmodified legacy L2 VM. Here we're dealing with a TDX-enlightened L2 VM.
If a guest runs inside an Intel TDX protected TD, is aware of memory encryption, and issues TDVMCALLs, then to me that makes it a TDX guest.
The thing I don't quite understand is which enlightenment(s) require L2 to issue TDVMCALLs and to know the "encryption bit".
The one reason I can think of is this:
If device I/O emulation of L2 is done by L0, then I guess it's reasonable to make L2 aware of the "encryption bit", because L0 can only write emulated data to a shared buffer. The shared buffer must initially be converted by L2 using the MAP_GPA TDVMCALL to L0 (to zap private pages in the S-EPT, etc.), and L2 needs to know the "encryption bit" to set up its own page tables properly. L1 must be aware of such private <-> shared conversions too, so that it can set up its page tables properly, which means L1 must also be notified.
Your description is correct, except that L2 uses a hypercall (hv_mark_gpa_visibility()) to notify L1, and it is L1 that issues the MAP_GPA TDVMCALL to L0.
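Condensed, the L2 side looks roughly like this (after arch/x86/hyperv/ivm.c; batching and error handling simplified):

	/*
	 * set_memory_decrypted()/set_memory_encrypted() in a vTOM guest end
	 * up here via the x86 guest callbacks. L2 only talks to L1; the
	 * paravisor issues the MAP_GPA TDVMCALL to L0 on its behalf and
	 * adjusts the page-table views accordingly.
	 */
	static bool hv_vtom_set_host_visibility(unsigned long kbuffer,
						int pagecount, bool enc)
	{
		enum hv_mem_host_visibility vis = enc ? VMBUS_PAGE_NOT_VISIBLE
						      : VMBUS_PAGE_VISIBLE_READ_WRITE;
		u64 *pfns;
		int i, ret;

		pfns = kcalloc(pagecount, sizeof(*pfns), GFP_KERNEL);
		if (!pfns)
			return false;

		for (i = 0; i < pagecount; i++)
			pfns[i] = virt_to_hvpfn((void *)kbuffer +
						i * HV_HYP_PAGE_SIZE);

		ret = hv_mark_gpa_visibility(pagecount, pfns, vis);
		kfree(pfns);
		return ret == 0;
	}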
In TDX partitioning, IIUC, L1 and L2 use different secure-EPT page tables when mapping L1 and L2 GPAs. Therefore, IIUC, the entries in both secure-EPT tables that map the to-be-converted page need to be zapped.
I am not entirely sure that using hv_mark_gpa_visibility() suffices: if the MAP_GPA comes from L1, I am not sure it is easy for L0 to zap the secure-EPT entries for L2.
But anyway, these are details we probably don't need to consider here.
C-bit awareness is necessary to set up the whole swiotlb pool as host-visible for DMA.
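For reference, this is where that happens; condensed from kernel/dma/swiotlb.c:

	/* Make the whole bounce-buffer pool host-visible, once, at boot. */
	void __init swiotlb_update_mem_attributes(void)
	{
		struct io_tlb_mem *mem = &io_tlb_default_mem;
		unsigned long bytes = PAGE_ALIGN(mem->nslabs << IO_TLB_SHIFT);

		/*
		 * Clears the C-bit in the direct-map PTEs and notifies the
		 * host (MAP_GPA on TDX, page-state change on SNP). Neither
		 * works unless the guest knows which bit to clear.
		 */
		set_memory_decrypted((unsigned long)mem->vaddr,
				     bytes >> PAGE_SHIFT);
	}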
Agreed.
The concern I have is whether there are other usage models that we need to consider. For instance, running both an unmodified L2 and an enlightened L2. Or an L2 that only needs the TDVMCALL enlightenment but not the "encryption bit".
Presumably unmodified L2 and enlightened L2 are already covered by the current code, but they require excessive trapping to L1.
I can't see a use case for TDVMCALLs but no "encryption bit".
In other words, that seems pretty much L1 hypervisor/paravisor implementation-specific. I am wondering whether we can hide the enlightenment logic entirely in hypervisor/paravisor-specific code, rather than generically marking L2 as a TDX guest and then having to disable TDCALL and the like.
That's how it currently works: all the enlightenments live in hypervisor/paravisor-specific code in arch/x86/hyperv and drivers/hv, and the VM is not marked with X86_FEATURE_TDX_GUEST.
And I believe there's a reason that the VM is not marked as a TDX guest.
But without X86_FEATURE_TDX_GUEST, userspace has no unified way to discover that an environment is protected by TDX, and the VM also gets classified as "AMD SEV" in dmesg. This is because CC_ATTR_GUEST_MEM_ENCRYPT is set but X86_FEATURE_TDX_GUEST is not.
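Roughly the logic involved (condensed from arch/x86/mm/mem_encrypt.c; the branch ordering is why a TD-partitioning L2 gets reported as AMD today):

	static void print_mem_encrypt_feature_info(void)
	{
		pr_info("Memory Encryption Features active:");

		if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
			pr_cont(" Intel TDX\n");
			return;
		}

		pr_cont(" AMD");
		/* A TDX L2 without X86_FEATURE_TDX_GUEST lands here ... */
		if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT))
			pr_cont(" SEV");	/* ... and prints "AMD SEV" */
		pr_cont("\n");
	}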
Can you provide more information about what _userspace_ does here?
What's the difference if it sees a TDX guest or a normal non-CoCo guest in /proc/cpuinfo?
It looks like the whole purpose of this series is to make userspace happy by advertising "TDX guest" via /proc/cpuinfo. But if we do that, we get bad side effects in the kernel, which is why you need the changes in your patch 2/3.
That doesn't seem very convincing. Is there any other mechanism that userspace could use, e.g., any Hyper-V hypervisor/paravisor-specific attributes that are exposed to userspace?