First, thank you for engaging; it speeds up the iteration. This confirmed my worry that the secondary goal of this proposal, a common verification implementation, is indeed unachievable in the near term. A few clarifying questions below, but I will let this go.
The primary goal, achievable on a short runway, is more for kernel developers. It is to have a common infrastructure for marshaling vendor payloads, provide a mechanism to facilitate kernel initiated requests to a key-server, and to deploy a common frontend for concepts like runtime measurement (likely as another backend to what Keys already understands for various TPM PCR implementations).
That sounds good, though the devil is in the details. The TPM situation will be exacerbated by a lower root of trust: the TPM itself doesn't have a specification for attesting its own firmware.
All the specific fields of the blob have to be decoded and subjected to an acceptance policy. That policy will almost always differ across platforms and VM owners. I wrote all of github.com/google/go-sev-guest, including the verification and validation logic; it's only going to get more complicated, and the sources that tell validators which values can be trusted will be varied.
Can you provide an example? I ask only so the kernel commit log can crisply explain why this proposed Keys format will continue to convey a raw vendor blob, with no kernel abstraction, as part of its payload for the foreseeable future.
An example is that while there is a common notion that each report will have some attestation key whose certificate needs to be verified, there is additional collateral that must be downloaded to
* verify a TDX key certificate against updates to known weaknesses of the key's details
* verify the measurement in the report against a vendor's signed golden measurement
* [usually offline and signed by the analyzing principal that the analysis was done] fully verify the measurement given a build provenance document like SLSA. The complexity of this analysis could even engage in static analysis of every commit since a certain date, or from a developer of low repute... whatever the verifier wants to do.
These are all in the realm of interpreting the blob for acceptance, so it's best to keep the blob uninterpreted.
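To make the shape of that concrete, here is a minimal userspace sketch of an acceptance-policy pipeline. Every struct and function name here is hypothetical; a real verifier would decode the vendor blob first and pull golden values and TCB minimums from the varied collateral sources described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded view of a vendor report; real reports differ
 * per technology and must be parsed from the raw blob first. */
struct report_view {
    uint8_t measurement[48];
    uint32_t tcb_version;
};

/* One policy check: returns true if the report passes. */
typedef bool (*policy_check)(const struct report_view *r, const void *ctx);

/* Example check: measurement must match a signed golden value the
 * verifier obtained out of band. */
static bool check_golden_measurement(const struct report_view *r,
                                     const void *ctx)
{
    const uint8_t *golden = ctx;
    return memcmp(r->measurement, golden, sizeof(r->measurement)) == 0;
}

/* Example check: TCB must be at or above a policy minimum. */
static bool check_min_tcb(const struct report_view *r, const void *ctx)
{
    uint32_t min = *(const uint32_t *)ctx;
    return r->tcb_version >= min;
}

/* Run every check; reject on the first failure. Which checks appear
 * here, and with what context, is exactly the part that differs per
 * platform and per VM owner. */
static bool accept(const struct report_view *r,
                   const policy_check *checks, const void **ctxs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!checks[i](r, ctxs[i]))
            return false;
    return true;
}
```

The point of the sketch is that the check list and its context data are the policy, and that policy lives with the verifier, not in any common layer.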
The formats are not standardized. The Confidential Computing Consortium should be working toward that, but it's a slow process. There's IETF RATS. There's in-toto.io attestations. There's Azure's JWT thing. There's a signed serialized protocol buffer that I've decided is what Google is going to produce while we figure out all the "right" formats to use. There will be factions and absolute gridlock for multiple years if we require solidifying an abstraction for the kernel to manage all this logic before passing a report on to user space.
Understood. When that standardization process completes, my expectation is that the result slots into the common conveyance method, with no need to rewrite code that already knows how to interface with Keys to get attestation evidence.
I can get on board with that. I don't think there will be much cause for more than a handful of attestation requests with different report data, so it shouldn't overwhelm the Keys subsystem.
You really shouldn't be putting attestation validation logic in the kernel.
It was less about putting validation logic in the kernel and more about hoping for a way to abstract some common parsing in advance of a true standard attestation format, but point taken.
I think we'll have hardware-provided blobs and host-provided cached collateral. The caching could be achieved with a hosted proxy server, but given SEV-SNP already has GET_EXT_GUEST_REQUEST to simplify delivery, I think it's fair to offer other technologies the chance to support a similarly simple solution.
Everything else will have to come from the network or the workload itself.
It belongs outside of the VM entirely with the party that will only release access keys to the VM if it can prove it's running the software it claims, on the platform it claims. I think Windows puts a remote procedure call in their guest attestation driver to the Azure attestation service, and that is an anti-pattern in my mind.
I cannot speak to the Windows implementation, but the Linux Keys subsystem is there to handle key construction that may be requested by userspace or the kernel, and may be serviced by built-in keys, device/platform-instantiated keys, or keys retrieved via an upcall to userspace.
The observation is that existing request_key() calls in the kernel likely have reason to be serviced by a confidential computing key server somewhere in the chain. So we might as well enlighten the Keys subsystem to retrieve this information and skip the round trips to userspace that run vendor-specific ioctls. Make the kernel as self-sufficient as possible, and make SEV, TDX, etc. developers talk more to each other about their needs.
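As a sketch of what that could look like at the interface level -- all naming here is hypothetical, not an existing kernel API -- an "attest" key type might encode the technology tag and the caller's report data in the key description, so in-kernel callers reach evidence through request_key() instead of vendor ioctls. The description builder below is plain userspace C just to show the encoding.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description format for an "attest" key type:
 * "<technology>;rd=<hex report data>". A kernel caller might then do
 * (in-kernel, not compiled here):
 *     key = request_key(&key_type_attest, desc, NULL);
 * and the key type's backend fills the payload with the vendor blob. */

static int build_attest_desc(char *buf, size_t buflen, const char *tech,
                             const uint8_t *rd, size_t rdlen)
{
    /* technology + ";rd=" + two hex chars per byte + NUL */
    size_t need = strlen(tech) + 4 + 2 * rdlen + 1;
    size_t off;

    if (buflen < need)
        return -1;
    off = (size_t)snprintf(buf, buflen, "%s;rd=", tech);
    for (size_t i = 0; i < rdlen; i++)
        off += (size_t)snprintf(buf + off, buflen - off, "%02x", rd[i]);
    return 0;
}
```

Whether the report data belongs in the description or in callout info is an open design choice; the sketch only shows that the existing request_key() plumbing has room for a technology-tagged request.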
That sounds reasonable. I think one wrinkle in the current design is that SGX and SEV-SNP provide derived keys as a thing separate from attestation but still based on firmware measurement, and TDX doesn't yet. It may in the future come with a TDX module update that gets derived keys through an SGX enclave -- who knows. The MSG_KEY_REQ guest request for a SEV-SNP derived key has some bits and bobs to select different VM material to mix into the key derivation, so that would need to be in the API as well. It makes request_key() a little weird to use for both. I don't even think there's a sufficient abstraction for the guest-attest device to provide, since there isn't a common REPORT_DATA + attestation level pair of inputs that drives both. If we're fine with a technology-tagged uninterpreted input blob for key derivation, and the device returns an error if the real hardware doesn't match the technology tag, then that could be an okay enough interface.
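A minimal sketch of that tagged-blob idea, assuming a made-up wire format: the common layer checks only that the tag matches the hardware the guest is actually running on, and treats the derivation parameters (e.g. SEV-SNP's MSG_KEY_REQ field selections) as opaque bytes passed straight through.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical wire format for a derived-key request: a technology tag
 * plus an uninterpreted, technology-specific derivation blob. The
 * SEV-SNP MSG_KEY_REQ selection bits would live inside the blob,
 * opaque to the common layer. */
enum cc_tech { CC_TECH_SEV_SNP = 1, CC_TECH_TDX = 2, CC_TECH_SGX = 3 };

struct derived_key_req {
    uint32_t tech;      /* enum cc_tech */
    uint32_t blob_len;  /* bytes of vendor-specific input that follow */
    uint8_t blob[];     /* uninterpreted by the common layer */
};

/* The common layer only checks that the tag matches the technology the
 * guest is running on; everything else is passed through untouched. */
static int check_derived_key_req(const struct derived_key_req *req,
                                 uint32_t running_tech)
{
    if (req->tech != running_tech)
        return -ENODEV;  /* wrong technology for this hardware */
    return 0;
}
```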
I could be convinced to leave MSG_KEY_REQ out of Linux entirely, but only for selfish reasons. The alternative is to set up a sealing key escrow service that releases sealing keys when a VM's attestation matches a pre-registered policy, which is extremely heavy-handed when you can enforce workload identity at VM launch time and have a safe derived key with this technology. I think there's a Decentriq blog post about setting that whole supply chain up ("swiss cheese to cheddar"), so they'd likely have some words about that.