Hi All,
In a TDX guest, the attestation process is used to verify the trustworthiness of the guest to other entities before provisioning secrets to it.
The TDX guest attestation process consists of two steps:
1. TDREPORT generation
2. Quote generation
The first step (TDREPORT generation) involves getting the TDX guest measurement data in the form of a TDREPORT, which is later used to validate the authenticity of the TDX guest. The second step involves sending the TDREPORT to a Quoting Enclave (QE) server to generate a remotely verifiable Quote. A TDREPORT by design can only be verified on the local platform. To support remote verification of the TDREPORT, TDX leverages the Intel SGX Quoting Enclave to verify the TDREPORT locally and convert it to a remotely verifiable Quote. Although attestation software can use communication methods like TCP/IP or vsock to send the TDREPORT to the QE, not all platforms support these communication models. So the TDX GHCI specification [1] defines a method for Quote generation via hypercalls. Please check the discussions from Google [2] and Alibaba [3], which clarify the need for hypercall-based Quote generation support. This patch set adds that support.
Support for TDREPORT generation already exists in the TDX guest driver. This patchset extends the same driver to add the Quote generation support.
Following are the details of the patch set:
Patch 1/3 -> Adds event notification IRQ support.
Patch 2/3 -> Adds Quote generation support.
Patch 3/3 -> Adds selftest support for the Quote generation feature.
[1] https://cdrdv2.intel.com/v1/dl/getContent/726790, section titled "TDG.VP.VMCALL<GetQuote>".
[2] https://lore.kernel.org/lkml/CAAYXXYxxs2zy_978GJDwKfX5Hud503gPc8=1kQ-+JwG_kA...
[3] https://lore.kernel.org/lkml/a69faebb-11e8-b386-d591-dbd08330b008@linux.alib...
Kuppuswamy Sathyanarayanan (3):
  x86/tdx: Add TDX Guest event notify interrupt support
  virt: tdx-guest: Add Quote generation support
  selftests/tdx: Test GetQuote TDX attestation feature
 Documentation/virt/coco/tdx-guest.rst        |  11 +
 arch/x86/coco/tdx/tdx.c                      | 203 +++++++++++++++
 arch/x86/include/asm/tdx.h                   |   8 +
 drivers/virt/coco/tdx-guest/tdx-guest.c      | 249 ++++++++++++++++++-
 include/uapi/linux/tdx-guest.h               |  44 ++++
 tools/testing/selftests/tdx/tdx_guest_test.c |  68 ++++-
 6 files changed, 575 insertions(+), 8 deletions(-)
Host-guest event notification via a configured interrupt vector is useful in cases where a guest makes an asynchronous request and needs a callback from the host to indicate the completion, or to let the host notify the guest about events like device removal. One usage example is the callback requirement of the asynchronous GetQuote hypercall.
In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the guest to specify which interrupt vector to use as an event-notify vector to the VMM. Details about the SetupEventNotifyInterrupt hypercall can be found in TDX Guest-Host Communication Interface (GHCI) Specification, sec 3.5 "VP.VMCALL<SetupEventNotifyInterrupt>".
As per design, VMM will post the event completion IRQ using the same CPU in which SetupEventNotifyInterrupt hypercall request is received. So allocate an IRQ vector from "x86_vector_domain", and set the CPU affinity of the IRQ vector to the current CPU.
Add tdx_register_event_irq_cb()/tdx_unregister_event_irq_cb() interfaces to allow drivers to register/unregister event notification handlers.
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Wander Lairson Costa <wander@redhat.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 arch/x86/coco/tdx/tdx.c    | 163 +++++++++++++++++++++++++++++++++++++
 arch/x86/include/asm/tdx.h |   6 ++
 2 files changed, 169 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 055300e08fb3..d03985952d45 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,12 +7,18 @@
 #include <linux/cpufeature.h>
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/numa.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
 #include <asm/pgtable.h>
+#include <asm/irqdomain.h>
 /* TDX module Call Leaf IDs */
 #define TDX_GET_INFO			1
@@ -27,6 +33,7 @@
 /* TDX hypercall Leaf IDs */
 #define TDVMCALL_MAP_GPA		0x10001
 #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
+#define TDVMCALL_SETUP_NOTIFY_INTR	0x10004
 /* MMIO direction */
 #define EPT_READ	0
@@ -51,6 +58,16 @@

 #define TDREPORT_SUBTYPE_0	0

+struct event_irq_entry {
+	tdx_event_irq_cb_t handler;
+	void *data;
+	struct list_head head;
+};
+
+static int tdx_event_irq;
+static LIST_HEAD(event_irq_cb_list);
+static DEFINE_SPINLOCK(event_irq_cb_lock);
+
 /*
  * Wrapper for standard use of __tdx_hypercall with no output aside from
  * return code.
@@ -873,3 +890,149 @@ void __init tdx_early_init(void)
 	pr_info("Guest detected\n");
 }
+
+static irqreturn_t tdx_event_irq_handler(int irq, void *dev_id)
+{
+	struct event_irq_entry *entry;
+
+	spin_lock(&event_irq_cb_lock);
+	list_for_each_entry(entry, &event_irq_cb_list, head) {
+		if (entry->handler)
+			entry->handler(entry->data);
+	}
+	spin_unlock(&event_irq_cb_lock);
+
+	return IRQ_HANDLED;
+}
+
+/* Reserve an IRQ from x86_vector_domain for TD event notification */
+static int __init tdx_event_irq_init(void)
+{
+	struct irq_alloc_info info;
+	cpumask_t saved_cpumask;
+	struct irq_cfg *cfg;
+	int cpu, irq;
+
+	if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
+		return 0;
+
+	init_irq_alloc_info(&info, NULL);
+
+	/*
+	 * Event notification vector will be delivered to the CPU
+	 * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
+	 * So set the IRQ affinity to the current CPU.
+	 */
+	cpu = get_cpu();
+	cpumask_copy(&saved_cpumask, current->cpus_ptr);
+	info.mask = cpumask_of(cpu);
+	put_cpu();
+
+	irq = irq_domain_alloc_irqs(x86_vector_domain, 1, NUMA_NO_NODE, &info);
+	if (irq <= 0) {
+		pr_err("Event notification IRQ allocation failed %d\n", irq);
+		return -EIO;
+	}
+
+	irq_set_handler(irq, handle_edge_irq);
+
+	cfg = irq_cfg(irq);
+	if (!cfg) {
+		pr_err("Event notification IRQ config not found\n");
+		goto err_free_irqs;
+	}
+
+	if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
+			"tdx_event_irq", NULL)) {
+		pr_err("Event notification IRQ request failed\n");
+		goto err_free_irqs;
+	}
+
+	set_cpus_allowed_ptr(current, cpumask_of(cpu));
+
+	/*
+	 * Register callback vector address with VMM. More details
+	 * about the ABI can be found in TDX Guest-Host-Communication
+	 * Interface (GHCI), sec titled
+	 * "TDG.VP.VMCALL<SetupEventNotifyInterrupt>".
+	 */
+	if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
+		pr_err("Event notification hypercall failed\n");
+		goto err_restore_cpus;
+	}
+
+	set_cpus_allowed_ptr(current, &saved_cpumask);
+
+	tdx_event_irq = irq;
+
+	return 0;
+
+err_restore_cpus:
+	set_cpus_allowed_ptr(current, &saved_cpumask);
+	free_irq(irq, NULL);
+err_free_irqs:
+	irq_domain_free_irqs(irq, 1);
+
+	return -EIO;
+}
+arch_initcall(tdx_event_irq_init)
+
+/**
+ * tdx_register_event_irq_cb() - Register TDX event IRQ callback handler.
+ * @handler: Address of driver specific event IRQ callback handler. Handler
+ *           will be called in IRQ context and hence cannot sleep.
+ * @data:    Context data to be passed to the callback handler.
+ *
+ * Return: 0 on success or standard error code on other failures.
+ */
+int tdx_register_event_irq_cb(tdx_event_irq_cb_t handler, void *data)
+{
+	struct event_irq_entry *entry;
+	unsigned long flags;
+
+	if (tdx_event_irq <= 0)
+		return -EIO;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return -ENOMEM;
+
+	entry->data = data;
+	entry->handler = handler;
+
+	spin_lock_irqsave(&event_irq_cb_lock, flags);
+	list_add_tail(&entry->head, &event_irq_cb_list);
+	spin_unlock_irqrestore(&event_irq_cb_lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tdx_register_event_irq_cb);
+
+/**
+ * tdx_unregister_event_irq_cb() - Unregister TDX event IRQ callback handler.
+ * @handler: Address of driver specific event IRQ callback handler.
+ * @data:    Context data to be passed to the callback handler.
+ *
+ * Return: 0 on success or -EIO if event IRQ is not allocated.
+ */
+int tdx_unregister_event_irq_cb(tdx_event_irq_cb_t handler, void *data)
+{
+	struct event_irq_entry *entry;
+	unsigned long flags;
+
+	if (tdx_event_irq <= 0)
+		return -EIO;
+
+	spin_lock_irqsave(&event_irq_cb_lock, flags);
+	list_for_each_entry(entry, &event_irq_cb_list, head) {
+		if (entry->handler == handler && entry->data == data) {
+			list_del(&entry->head);
+			kfree(entry);
+			break;
+		}
+	}
+	spin_unlock_irqrestore(&event_irq_cb_lock, flags);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tdx_unregister_event_irq_cb);
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 28d889c9aa16..8807fe1b1f3f 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -53,6 +53,8 @@ struct ve_info {

 #ifdef CONFIG_INTEL_TDX_GUEST

+typedef int (*tdx_event_irq_cb_t)(void *);
+
 void __init tdx_early_init(void);

 /* Used to communicate with the TDX module */
@@ -69,6 +71,10 @@ bool tdx_early_handle_ve(struct pt_regs *regs);

 int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport);

+int tdx_register_event_irq_cb(tdx_event_irq_cb_t handler, void *data);
+
+int tdx_unregister_event_irq_cb(tdx_event_irq_cb_t handler, void *data);
+
 #else

 static inline void tdx_early_init(void) { };
On Sat, 2023-03-25 at 23:20 -0700, Kuppuswamy Sathyanarayanan wrote:
Host-guest event notification via a configured interrupt vector is useful in cases where a guest makes an asynchronous request and needs a callback from the host to indicate the completion, or to let the host notify the guest about events like device removal. One usage example is the callback requirement of the asynchronous GetQuote hypercall.
In TDX guest, SetupEventNotifyInterrupt hypercall can be used by the guest to specify which interrupt vector to use as an event-notify vector to the VMM.
"to the VMM" -> "from the VMM"?
Details about the SetupEventNotifyInterrupt hypercall can be found in TDX Guest-Host Communication Interface (GHCI) Specification, sec 3.5 "VP.VMCALL<SetupEventNotifyInterrupt>".
It seems we shouldn't mention the exact section number.
As per design, VMM will post the event completion IRQ using the same CPU in which SetupEventNotifyInterrupt hypercall request is received.
"in which" -> "on which"
So allocate an IRQ vector from "x86_vector_domain", and set the CPU affinity of the IRQ vector to the current CPU.
IMHO "current CPU" is a little bit vague. Perhaps just "to the CPU on which SetupEventNotifyInterrupt hypercall is made".
Also, perhaps it's better to mention to use IRQF_NOBALANCING to prevent the IRQ from being migrated to another cpu.
Add tdx_register_event_irq_cb()/tdx_unregister_event_irq_cb() interfaces to allow drivers to register/unregister event notification handlers.
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 055300e08fb3..d03985952d45 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -7,12 +7,18 @@
 #include <linux/cpufeature.h>
 #include <linux/export.h>
 #include <linux/io.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
Do you need the above two headers?
Also, perhaps you should explicitly include <.../list.h> and <.../spinlock.h>.
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/numa.h>
 #include <asm/coco.h>
 #include <asm/tdx.h>
 #include <asm/vmx.h>
 #include <asm/insn.h>
 #include <asm/insn-eval.h>
 #include <asm/pgtable.h>
+#include <asm/irqdomain.h>

 /* TDX module Call Leaf IDs */
 #define TDX_GET_INFO			1
@@ -27,6 +33,7 @@
 /* TDX hypercall Leaf IDs */
 #define TDVMCALL_MAP_GPA		0x10001
 #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
+#define TDVMCALL_SETUP_NOTIFY_INTR	0x10004

 /* MMIO direction */
 #define EPT_READ	0
@@ -51,6 +58,16 @@

 #define TDREPORT_SUBTYPE_0	0

+struct event_irq_entry {
+	tdx_event_irq_cb_t handler;
+	void *data;
+	struct list_head head;
+};
+
+static int tdx_event_irq;
__ro_after_init?
+static LIST_HEAD(event_irq_cb_list);
+static DEFINE_SPINLOCK(event_irq_cb_lock);
+
 /*
  * Wrapper for standard use of __tdx_hypercall with no output aside from
  * return code.
@@ -873,3 +890,149 @@ void __init tdx_early_init(void)

 	pr_info("Guest detected\n");
 }
+static irqreturn_t tdx_event_irq_handler(int irq, void *dev_id)
+{
+	struct event_irq_entry *entry;
+
+	spin_lock(&event_irq_cb_lock);
+	list_for_each_entry(entry, &event_irq_cb_list, head) {
+		if (entry->handler)
+			entry->handler(entry->data);
+	}
+	spin_unlock(&event_irq_cb_lock);
+
+	return IRQ_HANDLED;
+}
+/* Reserve an IRQ from x86_vector_domain for TD event notification */
+static int __init tdx_event_irq_init(void)
+{
+	struct irq_alloc_info info;
+	cpumask_t saved_cpumask;
+	struct irq_cfg *cfg;
+	int cpu, irq;
+
+	if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
+		return 0;
+
+	init_irq_alloc_info(&info, NULL);
+
+	/*
+	 * Event notification vector will be delivered to the CPU
+	 * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
+	 * So set the IRQ affinity to the current CPU.
+	 */
+	cpu = get_cpu();
+	cpumask_copy(&saved_cpumask, current->cpus_ptr);
+	info.mask = cpumask_of(cpu);
+	put_cpu();
The 'saved_cpumask' related code is ugly. If you move put_cpu() to the end of this function, I think you can remove all related code:
	cpu = get_cpu();

	/*
	 * Set @info->mask to local cpu to make sure a valid vector is
	 * pre-allocated when TDX event notification IRQ is allocated
	 * from x86_vector_domain.
	 */
	init_irq_alloc_info(&info, cpumask_of(cpu));

	// rest stuff: request_irq(), hypercall ...

	put_cpu();
+	irq = irq_domain_alloc_irqs(x86_vector_domain, 1, NUMA_NO_NODE, &info);
Should you use cpu_to_node(cpu) instead of NUMA_NO_NODE?
+	if (irq <= 0) {
+		pr_err("Event notification IRQ allocation failed %d\n", irq);
+		return -EIO;
+	}
+	irq_set_handler(irq, handle_edge_irq);
+
+	cfg = irq_cfg(irq);
+	if (!cfg) {
+		pr_err("Event notification IRQ config not found\n");
+		goto err_free_irqs;
+	}
You are depending on irq_domain_alloc_irqs() to have already allocated a vector on the local cpu. If !cfg is ever true here, then the irq_domain_alloc_irqs() call that was supposed to allocate the vector is itself broken.
So, perhaps you should just WARN() if vector hasn't been allocated to catch bug.
WARN(!(irq_cfg(irq)->vector));
+	if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
It's better to add a comment to explain why using IRQF_NOBALANCING.
+			"tdx_event_irq", NULL)) {
+		pr_err("Event notification IRQ request failed\n");
+		goto err_free_irqs;
+	}
+
+	set_cpus_allowed_ptr(current, cpumask_of(cpu));
+
+	/*
+	 * Register callback vector address with VMM. More details
+	 * about the ABI can be found in TDX Guest-Host-Communication
+	 * Interface (GHCI), sec titled
+	 * "TDG.VP.VMCALL<SetupEventNotifyInterrupt>".
+	 */
+	if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
+		pr_err("Event notification hypercall failed\n");
+		goto err_restore_cpus;
+	}
+
+	set_cpus_allowed_ptr(current, &saved_cpumask);
+
+	tdx_event_irq = irq;
+
+	return 0;
+
+err_restore_cpus:
+	set_cpus_allowed_ptr(current, &saved_cpumask);
+	free_irq(irq, NULL);
+err_free_irqs:
+	irq_domain_free_irqs(irq, 1);
+
+	return -EIO;
+}
+arch_initcall(tdx_event_irq_init)
Hi Kai,
On 3/27/23 7:38 PM, Huang, Kai wrote:
The 'saved_cpumask' related code is ugly. If you move put_cpu() to the end of this function, I think you can remove all related code:
	cpu = get_cpu();

	/*
	 * Set @info->mask to local cpu to make sure a valid vector is
	 * pre-allocated when TDX event notification IRQ is allocated
	 * from x86_vector_domain.
	 */
	init_irq_alloc_info(&info, cpumask_of(cpu));

	// rest stuff: request_irq(), hypercall ...

	put_cpu();
init_irq_alloc_info() is a sleeping function. Since get_cpu() disables preemption, we cannot call a sleeping function after it. Initially, I implemented it the way you mentioned; however, I discovered the following error.
[    2.400755] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:580
[    2.404664] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: swapper/0
[    2.408671] preempt_count: 1, expected: 0
[    2.412650] RCU nest depth: 0, expected: 0
[    2.412666] no locks held by swapper/0/1.
[    2.416650] Preemption disabled at:
[    2.416650] [<ffffffff83b8089f>] tdx_arch_init+0x38/0x117
[    2.420670] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc4-00117-g672ca073d9f9-dirty #2527
[    2.424650] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[    2.424650] Call Trace:
[    2.424650]  <TASK>
[    2.424650]  dump_stack_lvl+0x6a/0x86
[    2.424650]  __might_resched.cold+0xf4/0x12f
[    2.424650]  __mutex_lock+0x50/0x810
[    2.424650]  ? lock_is_held_type+0xd8/0x130
[    2.424650]  ? __irq_alloc_descs+0xcf/0x310
[    2.424650]  ? find_held_lock+0x2b/0x80
[    2.424650]  ? __irq_alloc_descs+0xcf/0x310
[    2.424650]  __irq_alloc_descs+0xcf/0x310
[    2.424650]  irq_domain_alloc_descs.part.0+0x49/0xa0
[    2.424650]  __irq_domain_alloc_irqs+0x2a0/0x4f0
[    2.424650]  ? next_arg+0x129/0x1f0
[    2.424650]  ? tdx_guest_init+0x5b/0x5b
[    2.424650]  tdx_arch_init+0x8e/0x117
[    2.424650]  do_one_initcall+0x137/0x2ec
[    2.424650]  ? rcu_read_lock_sched_held+0x36/0x60
[    2.424650]  kernel_init_freeable+0x1e3/0x241
[    2.424650]  ? rest_init+0x1a0/0x1a0
[    2.424650]  kernel_init+0x17/0x170
[    2.424650]  ? rest_init+0x1a0/0x1a0
[    2.424650]  ret_from_fork+0x1f/0x30
[    2.424650]  </TASK>
On Mon, 2023-03-27 at 19:50 -0700, Sathyanarayanan Kuppuswamy wrote:
Hi Kai,
On 3/27/23 7:38 PM, Huang, Kai wrote:
init_irq_alloc_info() is a sleeping function. Since get_cpu() disables preemption, we cannot call a sleeping function after it. Initially, I implemented it the way you mentioned; however, I discovered the following error.
Oh sorry I forgot this. So I think we should use migrate_disable() instead:
migrate_disable();
init_irq_alloc_info(&info, cpumask_of(smp_processor_id()));
...
migrate_enable();
Or, should we just use early_initcall() so that only BSP is running? IMHO it's OK to always allocate the vector from BSP.
Anyway, either way is fine to me.
On 3/27/23 9:02 PM, Huang, Kai wrote:
On Mon, 2023-03-27 at 19:50 -0700, Sathyanarayanan Kuppuswamy wrote:
Hi Kai,
On 3/27/23 7:38 PM, Huang, Kai wrote:
Final version looks like below.
static int __init tdx_event_irq_init(void)
{
	struct irq_alloc_info info;
	struct irq_cfg *cfg;
	int irq;

	if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
		return 0;

	init_irq_alloc_info(&info, NULL);

	/*
	 * Event notification vector will be delivered to the CPU
	 * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
	 * So set the IRQ affinity to the current CPU.
	 */
	info.mask = cpumask_of(0);

	irq = irq_domain_alloc_irqs(x86_vector_domain, 1, cpu_to_node(0), &info);
	if (irq <= 0) {
		pr_err("Event notification IRQ allocation failed %d\n", irq);
		return -EIO;
	}

	irq_set_handler(irq, handle_edge_irq);

	/* Since the IRQ affinity is set, it cannot be balanced */
	if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
			"tdx_event_irq", NULL)) {
		pr_err("Event notification IRQ request failed\n");
		goto err_free_domain_irqs;
	}

	cfg = irq_cfg(irq);

	/*
	 * Since tdx_event_irq_init() is triggered via early_initcall(),
	 * it will be called before secondary CPUs bring-up. Since there is
	 * only one CPU, it complies with the requirement of executing
	 * the TDVMCALL_SETUP_NOTIFY_INTR hypercall on the same CPU where
	 * the IRQ vector is allocated.
	 *
	 * Register callback vector address with VMM. More details
	 * about the ABI can be found in TDX Guest-Host-Communication
	 * Interface (GHCI), sec titled
	 * "TDG.VP.VMCALL<SetupEventNotifyInterrupt>".
	 */
	if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
		pr_err("Event notification hypercall failed\n");
		goto err_free_irqs;
	}

	tdx_event_irq = irq;

	return 0;

err_free_irqs:
	free_irq(irq, NULL);
err_free_domain_irqs:
	irq_domain_free_irqs(irq, 1);

	return -EIO;
}
early_initcall(tdx_event_irq_init)
On Tue, 2023-04-04 at 18:02 -0700, Sathyanarayanan Kuppuswamy wrote:
On 3/27/23 9:02 PM, Huang, Kai wrote:
On Mon, 2023-03-27 at 19:50 -0700, Sathyanarayanan Kuppuswamy wrote:
Hi Kai,
On 3/27/23 7:38 PM, Huang, Kai wrote:
I found there's another series also doing a similar thing, and it seems Thomas wasn't happy about using x86_vector_domain directly:
https://lore.kernel.org/lkml/877cv99k0y.ffs@tglx/
An alternative was also posted (creating IRQ domain on top of x86_vector_domain):
https://lore.kernel.org/lkml/20230328182933.GA1403032@vm02.guest.corp.micros...
I think we should monitor that and hear from others more.
In a TDX guest, the second stage of the attestation process is to send the TDREPORT to the QE/QGS to generate the TD Quote. For platforms that do not support communication channels like vsock or TCP/IP, implement support for getting the TD Quote using a hypercall. The GetQuote hypercall can be used by the TD guest to request that the VMM facilitate Quote generation via the QE/QGS. More details about the GetQuote hypercall can be found in TDX Guest-Host Communication Interface (GHCI) for Intel TDX 1.0, section titled "TDG.VP.VMCALL<GetQuote>".
Add support for the TDX_CMD_GET_QUOTE IOCTL to allow the attestation agent to submit GetQuote requests from user space using the GetQuote hypercall.
Since GetQuote is an asynchronous hypercall, the VMM will use the callback interrupt vector configured by the SetupEventNotifyInterrupt hypercall to notify the guest about Quote generation completion or failure. So register an IRQ handler for it.
The GetQuote TDVMCALL requires the TD guest to pass a 4K-aligned shared buffer containing the TDREPORT data as input, into which the VMM copies the TD Quote result after successful Quote generation. To create the shared buffer without breaking the direct map, use the DMA alloc APIs.
Also note that shared buffer allocation is currently handled in the IOCTL handler. Although this increases the TDX_CMD_GET_QUOTE IOCTL response time, the overhead is negligible compared to the time required for Quote generation to complete. So IOCTL performance optimization is not considered at this time.
To support parallel GetQuote requests, use a linked list to track the active GetQuote requests, and upon receiving the callback IRQ, loop through the active requests and mark the processed ones complete. Users can open multiple instances of the attestation device and send GetQuote requests in parallel.
Since GetQuote support requires use of the DMA APIs, convert the TDX guest driver to a platform driver.
Reviewed-by: Tony Luck tony.luck@intel.com
Reviewed-by: Andi Kleen ak@linux.intel.com
Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com
Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan sathyanarayanan.kuppuswamy@linux.intel.com
---
 Documentation/virt/coco/tdx-guest.rst   |  11 ++
 arch/x86/coco/tdx/tdx.c                 |  40 ++++
 arch/x86/include/asm/tdx.h              |   2 +
 drivers/virt/coco/tdx-guest/tdx-guest.c | 249 +++++++++++++++++++++++-
 include/uapi/linux/tdx-guest.h          |  44 +++++
 5 files changed, 344 insertions(+), 2 deletions(-)
diff --git a/Documentation/virt/coco/tdx-guest.rst b/Documentation/virt/coco/tdx-guest.rst
index 46e316db6bb4..69954ed85c68 100644
--- a/Documentation/virt/coco/tdx-guest.rst
+++ b/Documentation/virt/coco/tdx-guest.rst
@@ -42,6 +42,17 @@ ABI. However, in the future, if the TDX Module supports more than one
 subtype, a new IOCTL CMD will be created to handle it. To keep the IOCTL
 naming consistent, a subtype index is added as part of the IOCTL CMD.
+2.2 TDX_CMD_GET_QUOTE
+----------------------
+
+:Input parameters: struct tdx_quote_req
+:Output: Return 0 on success, -EINTR for interrupted request, -EIO on TDCALL
+         failure or standard error number on common failures. Upon successful
+         execution, QUOTE data is copied to tdx_quote_req.buf.
+
+The TDX_CMD_GET_QUOTE IOCTL can be used by attestation software to generate
+QUOTE for the given TDREPORT using TDG.VP.VMCALL<GetQuote> hypercall.
+
 Reference
 ---------
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index d03985952d45..b823c4f493d4 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -34,6 +34,7 @@
 #define TDVMCALL_MAP_GPA		0x10001
 #define TDVMCALL_REPORT_FATAL_ERROR	0x10003
 #define TDVMCALL_SETUP_NOTIFY_INTR	0x10004
+#define TDVMCALL_GET_QUOTE		0x10002

 /* MMIO direction */
 #define EPT_READ	0
@@ -199,6 +200,45 @@ static void __noreturn tdx_panic(const char *msg)
 	__tdx_hypercall(&args, 0);
 }
+/**
+ * tdx_hcall_get_quote() - Wrapper to request TD Quote using GetQuote
+ *                         hypercall.
+ * @tdquote: Address of the direct mapped kernel buffer which contains
+ *           TDREPORT data. The same buffer will be used by VMM to store
+ *           the generated TD Quote output.
+ * @size: size of the tdquote buffer.
+ *
+ * Refer to section titled "TDG.VP.VMCALL<GetQuote>" in the TDX GHCI
+ * v1.0 specification for more information on GetQuote hypercall.
+ * It is used in the TDX guest driver module to get the TD Quote.
+ *
+ * Return 0 on success or error code on failure.
+ */
+int tdx_hcall_get_quote(u8 *tdquote, size_t size)
+{
+	struct tdx_hypercall_args args = {0};
+
+	/*
+	 * TDX guest driver is the only user of this function and it uses
+	 * the kernel mapped memory. So use virt_to_phys() to get the
+	 * physical address of the TDQuote buffer without any additional
+	 * checks for memory type.
+	 */
+	args.r10 = TDX_HYPERCALL_STANDARD;
+	args.r11 = TDVMCALL_GET_QUOTE;
+	args.r12 = cc_mkdec(virt_to_phys(tdquote));
+	args.r13 = size;
+
+	/*
+	 * Pass the physical address of TDREPORT to the VMM and
+	 * trigger the Quote generation. It is not a blocking
+	 * call, hence completion of this request will be notified to
+	 * the TD guest via a callback interrupt.
+	 */
+	return __tdx_hypercall(&args, 0);
+}
+EXPORT_SYMBOL_GPL(tdx_hcall_get_quote);
+
 static void tdx_parse_tdinfo(u64 *cc_mask)
 {
 	struct tdx_module_output out;
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 8807fe1b1f3f..a72bd7b96564 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -75,6 +75,8 @@ int tdx_register_event_irq_cb(tdx_event_irq_cb_t handler, void *data);
 int tdx_unregister_event_irq_cb(tdx_event_irq_cb_t handler, void *data);

+int tdx_hcall_get_quote(u8 *tdquote, size_t size);
+
 #else
 static inline void tdx_early_init(void) { };
diff --git a/drivers/virt/coco/tdx-guest/tdx-guest.c b/drivers/virt/coco/tdx-guest/tdx-guest.c
index 5e44a0fa69bd..2424a0d02bc8 100644
--- a/drivers/virt/coco/tdx-guest/tdx-guest.c
+++ b/drivers/virt/coco/tdx-guest/tdx-guest.c
@@ -12,12 +12,151 @@
 #include <linux/mod_devicetable.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/dma-mapping.h>
#include <uapi/linux/tdx-guest.h>
 #include <asm/cpu_device_id.h>
 #include <asm/tdx.h>
+/**
+ * struct quote_entry - Quote request struct
+ * @valid: Flag to check validity of the GetQuote request.
+ * @buf: Kernel buffer to share data with VMM (size is page aligned).
+ * @buf_len: Size of the buf in bytes.
+ * @dma_address: DMA address of the buffer.
+ * @dev: Reference to device used in DMA allocation.
+ * @compl: Completion object to track completion of GetQuote request.
+ * @list: List object for reference in quote_list.
+ */
+struct quote_entry {
+	bool valid;
+	void *buf;
+	size_t buf_len;
+	dma_addr_t dma_address;
+	struct device *dev;
+	struct completion compl;
+	struct list_head list;
+};
+
+/*
+ * To support parallel GetQuote requests, use the list
+ * to track active GetQuote requests.
+ */
+static LIST_HEAD(quote_list);
+
+/* Lock to protect quote_list */
+static DEFINE_MUTEX(quote_lock);
+
+/* Quote generation work */
+static void quote_callback_handler(struct work_struct *work);
+
+static DECLARE_WORK(quote_work, quote_callback_handler);
+
+static struct platform_device *tdx_dev;
+
+static struct quote_entry *alloc_quote_entry(struct device *dev, size_t buf_len)
+{
+	struct quote_entry *entry = NULL;
+	size_t new_len = PAGE_ALIGN(buf_len);
+
+	entry = kmalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return NULL;
+
+	entry->buf = dma_alloc_coherent(dev, new_len, &entry->dma_address,
+					GFP_KERNEL);
+	if (!entry->buf) {
+		kfree(entry);
+		return NULL;
+	}
+
+	entry->buf_len = new_len;
+	init_completion(&entry->compl);
+	entry->valid = true;
+	entry->dev = dev;
+
+	return entry;
+}
+
+static void free_quote_entry(struct quote_entry *entry)
+{
+	dma_free_coherent(entry->dev, entry->buf_len, entry->buf,
+			  entry->dma_address);
+	kfree(entry);
+}
+
+/* Must be called with quote_lock held */
+static void _del_quote_entry(struct quote_entry *entry)
+{
+	list_del(&entry->list);
+	free_quote_entry(entry);
+}
+
+static void del_quote_entry(struct quote_entry *entry)
+{
+	mutex_lock(&quote_lock);
+	_del_quote_entry(entry);
+	mutex_unlock(&quote_lock);
+}
+
+/* Handles early termination of GetQuote requests */
+static void terminate_quote_request(struct quote_entry *entry)
+{
+	struct tdx_quote_hdr *quote_hdr;
+
+	/*
+	 * For early termination, if the request is not yet
+	 * processed by VMM (GET_QUOTE_IN_FLIGHT), the VMM
+	 * still owns the shared buffer, so mark the request
+	 * invalid to let quote_callback_handler() handle the
+	 * memory cleanup function. If the request is already
+	 * processed, then do the cleanup and return.
+	 */
+	mutex_lock(&quote_lock);
+	quote_hdr = entry->buf;
+	if (quote_hdr->status == GET_QUOTE_IN_FLIGHT) {
+		entry->valid = false;
+		mutex_unlock(&quote_lock);
+		return;
+	}
+	_del_quote_entry(entry);
+	mutex_unlock(&quote_lock);
+}
+
+static int attestation_callback_handler(void *dev_id)
+{
+	schedule_work(&quote_work);
+	return 0;
+}
+
+static void quote_callback_handler(struct work_struct *work)
+{
+	struct tdx_quote_hdr *quote_hdr;
+	struct quote_entry *entry, *next;
+
+	/* Find processed quote request and mark it complete */
+	mutex_lock(&quote_lock);
+	list_for_each_entry_safe(entry, next, &quote_list, list) {
+		quote_hdr = entry->buf;
+		if (quote_hdr->status == GET_QUOTE_IN_FLIGHT)
+			continue;
+		/*
+		 * If user invalidated the current request, remove the
+		 * entry from the quote list and free it. If the request
+		 * is still valid, mark it complete.
+		 */
+		if (entry->valid)
+			complete(&entry->compl);
+		else
+			_del_quote_entry(entry);
+	}
+	mutex_unlock(&quote_lock);
+}
+
 static long tdx_get_report0(struct tdx_report_req __user *req)
 {
 	u8 *reportdata, *tdreport;
@@ -53,12 +192,78 @@ static long tdx_get_report0(struct tdx_report_req __user *req)
 	return ret;
 }
+static long tdx_get_quote(struct tdx_quote_req __user *ureq)
+{
+	struct device *dev = &tdx_dev->dev;
+	struct quote_entry *entry;
+	struct tdx_quote_req req;
+	long ret;
+
+	if (copy_from_user(&req, ureq, sizeof(req)))
+		return -EFAULT;
+
+	/*
+	 * TDX GHCI specification does not specify any upper limit
+	 * on Quote buffer size. So no upper limit check is added.
+	 * If a user passes a large value, it is expected that it
+	 * will fail during memory allocation in alloc_quote_entry().
+	 */
+	if (!req.len)
+		return -EINVAL;
+
+	entry = alloc_quote_entry(dev, req.len);
+	if (!entry)
+		return -ENOMEM;
+
+	if (copy_from_user(entry->buf, (void __user *)req.buf, req.len)) {
+		free_quote_entry(entry);
+		return -EFAULT;
+	}
+
+	mutex_lock(&quote_lock);
+
+	/* Submit GetQuote Request using GetQuote hypercall */
+	ret = tdx_hcall_get_quote(entry->buf, entry->buf_len);
+	if (ret) {
+		mutex_unlock(&quote_lock);
+		dev_err(dev, "GetQuote hypercall failed, status:%lx\n", ret);
+		free_quote_entry(entry);
+		return -EIO;
+	}
+
+	/* Add current quote entry to quote_list to track active requests */
+	list_add_tail(&entry->list, &quote_list);
+
+	mutex_unlock(&quote_lock);
+
+	/*
+	 * Wait till GetQuote completion or request is cancelled by the
+	 * user signal. If the request is cancelled by the user, don't
+	 * cleanup the quote buffer as it is still owned by the VMM.
+	 */
+	ret = wait_for_completion_interruptible(&entry->compl);
+	if (ret < 0) {
+		dev_err(dev, "GetQuote request terminated\n");
+		terminate_quote_request(entry);
+		return -EINTR;
+	}
+
+	if (copy_to_user((void __user *)req.buf, entry->buf, req.len))
+		ret = -EFAULT;
+
+	del_quote_entry(entry);
+
+	return ret;
+}
+
 static long tdx_guest_ioctl(struct file *file, unsigned int cmd,
 			    unsigned long arg)
 {
 	switch (cmd) {
 	case TDX_CMD_GET_REPORT0:
 		return tdx_get_report0((struct tdx_report_req __user *)arg);
+	case TDX_CMD_GET_QUOTE:
+		return tdx_get_quote((struct tdx_quote_req *)arg);
 	default:
 		return -ENOTTY;
 	}
@@ -76,6 +281,27 @@ static struct miscdevice tdx_misc_dev = {
 	.fops = &tdx_guest_fops,
 };
+static int tdx_guest_probe(struct platform_device *pdev)
+{
+	if (tdx_register_event_irq_cb(attestation_callback_handler, pdev))
+		return -EIO;
+
+	return misc_register(&tdx_misc_dev);
+}
+
+static int tdx_guest_remove(struct platform_device *pdev)
+{
+	tdx_unregister_event_irq_cb(attestation_callback_handler, pdev);
+	misc_deregister(&tdx_misc_dev);
+	return 0;
+}
+
+static struct platform_driver tdx_guest_driver = {
+	.probe = tdx_guest_probe,
+	.remove = tdx_guest_remove,
+	.driver.name = KBUILD_MODNAME,
+};
+
 static const struct x86_cpu_id tdx_guest_ids[] = {
 	X86_MATCH_FEATURE(X86_FEATURE_TDX_GUEST, NULL),
 	{}
@@ -84,16 +310,35 @@ MODULE_DEVICE_TABLE(x86cpu, tdx_guest_ids);
 static int __init tdx_guest_init(void)
 {
+	int ret;
+
 	if (!x86_match_cpu(tdx_guest_ids))
 		return -ENODEV;
-	return misc_register(&tdx_misc_dev);
+	ret = platform_driver_register(&tdx_guest_driver);
+	if (ret) {
+		pr_err("failed to register driver, err=%d\n", ret);
+		return ret;
+	}
+
+	tdx_dev = platform_device_register_simple(KBUILD_MODNAME,
+						  PLATFORM_DEVID_NONE,
+						  NULL, 0);
+	if (IS_ERR(tdx_dev)) {
+		ret = PTR_ERR(tdx_dev);
+		pr_err("failed to allocate device, err=%d\n", ret);
+		platform_driver_unregister(&tdx_guest_driver);
+		return ret;
+	}
+
+	return 0;
 }
 module_init(tdx_guest_init);
 static void __exit tdx_guest_exit(void)
 {
-	misc_deregister(&tdx_misc_dev);
+	platform_device_unregister(tdx_dev);
+	platform_driver_unregister(&tdx_guest_driver);
 }
 module_exit(tdx_guest_exit);
diff --git a/include/uapi/linux/tdx-guest.h b/include/uapi/linux/tdx-guest.h
index a6a2098c08ff..94c78d08f2aa 100644
--- a/include/uapi/linux/tdx-guest.h
+++ b/include/uapi/linux/tdx-guest.h
@@ -17,6 +17,12 @@
 /* Length of TDREPORT used in TDG.MR.REPORT TDCALL */
 #define TDX_REPORT_LEN		1024
+/* TD Quote status codes */
+#define GET_QUOTE_SUCCESS		0
+#define GET_QUOTE_IN_FLIGHT		0xffffffffffffffff
+#define GET_QUOTE_ERROR			0x8000000000000000
+#define GET_QUOTE_SERVICE_UNAVAILABLE	0x8000000000000001
+
 /**
  * struct tdx_report_req - Request struct for TDX_CMD_GET_REPORT0 IOCTL.
  *
@@ -30,6 +36,35 @@ struct tdx_report_req {
 	__u8 tdreport[TDX_REPORT_LEN];
 };
+/* struct tdx_quote_hdr: Format of Quote request buffer header.
+ * @version: Quote format version, filled by TD.
+ * @status: Status code of Quote request, filled by VMM.
+ * @in_len: Length of TDREPORT, filled by TD.
+ * @out_len: Length of Quote data, filled by VMM.
+ * @data: Quote data on output or TDREPORT on input.
+ *
+ * More details of Quote data header can be found in TDX
+ * Guest-Host Communication Interface (GHCI) for Intel TDX 1.0,
+ * section titled "TDG.VP.VMCALL<GetQuote>"
+ */
+struct tdx_quote_hdr {
+	__u64 version;
+	__u64 status;
+	__u32 in_len;
+	__u32 out_len;
+	__u64 data[];
+};
+
+/* struct tdx_quote_req: Request struct for TDX_CMD_GET_QUOTE IOCTL.
+ * @buf: Address of user buffer that includes TDREPORT. Upon successful
+ *       completion of IOCTL, output is copied back to the same buffer.
+ * @len: Length of the Quote buffer.
+ */
+struct tdx_quote_req {
+	__u64 buf;
+	__u64 len;
+};
+
 /*
  * TDX_CMD_GET_REPORT0 - Get TDREPORT0 (a.k.a. TDREPORT subtype 0) using
  *                       TDCALL[TDG.MR.REPORT]
@@ -39,4 +74,13 @@ struct tdx_report_req {
  */
 #define TDX_CMD_GET_REPORT0		_IOWR('T', 1, struct tdx_report_req)
+/*
+ * TDX_CMD_GET_QUOTE - Get TD Guest Quote from QE/QGS using GetQuote
+ *                     TDVMCALL.
+ *
+ * Returns 0 on success, -EINTR for interrupted request, and
+ * standard errno on other failures.
+ */
+#define TDX_CMD_GET_QUOTE		_IOR('T', 2, struct tdx_quote_req)
+
 #endif /* _UAPI_LINUX_TDX_GUEST_H_ */
On Sat, Mar 25, 2023 at 11:20:38PM -0700, Kuppuswamy Sathyanarayanan wrote:
> Since GetQuote support requires usage of DMA APIs, convert TDX guest driver to a platform driver.
Sorry, but that's not a valid reason to use a platform device for fake things like this:
> +static struct platform_device *tdx_dev;
Especially a single static one.
> +static int tdx_guest_probe(struct platform_device *pdev)
> +{
> +	if (tdx_register_event_irq_cb(attestation_callback_handler, pdev))
> +		return -EIO;
> +
> +	return misc_register(&tdx_misc_dev);
> +}
> +
> +static int tdx_guest_remove(struct platform_device *pdev)
> +{
> +	tdx_unregister_event_irq_cb(attestation_callback_handler, pdev);
> +	misc_deregister(&tdx_misc_dev);
> +	return 0;
> +}
> +
> +static struct platform_driver tdx_guest_driver = {
> +	.probe = tdx_guest_probe,
> +	.remove = tdx_guest_remove,
> +	.driver.name = KBUILD_MODNAME,
> +};
>  static const struct x86_cpu_id tdx_guest_ids[] = {
>  	X86_MATCH_FEATURE(X86_FEATURE_TDX_GUEST, NULL),
>  	{}
> @@ -84,16 +310,35 @@ MODULE_DEVICE_TABLE(x86cpu, tdx_guest_ids);
>
>  static int __init tdx_guest_init(void)
>  {
> +	int ret;
> +
>  	if (!x86_match_cpu(tdx_guest_ids))
>  		return -ENODEV;
>
> -	return misc_register(&tdx_misc_dev);
> +	ret = platform_driver_register(&tdx_guest_driver);
> +	if (ret) {
> +		pr_err("failed to register driver, err=%d\n", ret);
> +		return ret;
> +	}
No, please do not create a fake platform driver.
> +	tdx_dev = platform_device_register_simple(KBUILD_MODNAME,
> +						  PLATFORM_DEVID_NONE,
> +						  NULL, 0);
And please do not create a fake platform device.
As always, do not create fake platform devices for things that are NOT platform devices.
If this device needs DMA (but why?) then make it a real device and tie it to the bus it belongs to (that it is obviously doing DMA on.)
But as-is, this isn't ok, sorry.
thanks,
greg k-h
Hi Greg,
Thanks for the review comments.
On 3/25/23 11:34 PM, Greg KH wrote:
> [...]
> If this device needs DMA (but why?) then make it a real device and tie
> it to the bus it belongs to (that it is obviously doing DMA on.)

Since the Quote generation process requires the guest and VMM to share data, we need to use shared buffers. But in a TDX guest, the VMM cannot directly access guest private memory. Any memory that is required for communication with the VMM must be shared explicitly. One way is to allocate the memory on demand and share it explicitly using a set_memory_decrypted() call. Although this approach is clean, it is not efficient because it breaks the direct mapping. An alternative is to reserve a pool of shared buffers at boot time and re-use them for guest/VMM communication. Since in TDX the DMA API is configured to tap into the SWIOTLB framework (whose memory is shared by default), we try to use the dma_alloc_* APIs to allocate shared buffers from the shared pool without breaking the direct map.

If usage of the DMA APIs / platform device is not acceptable for this use case, an alternative approach is to allocate a fixed number of shared buffers during TDX guest driver probe and use them for GetQuote requests. Although this would limit the amount of memory available for GetQuote requests at a time and also reserve a chunk of memory at init() time, I think it is an acceptable tradeoff compared to the alternative choices. The AMD SEV guest driver adopts a similar approach. Please let me know if this approach is acceptable.
On Sun, Mar 26, 2023 at 12:06:26PM -0700, Sathyanarayanan Kuppuswamy wrote:
> If usage of DMA APIs / platform device is not acceptable for this use case,
It is not ok to use a platform device for this because you just do not have a platform device for it, don't make one up out of thin air please, as that really doesn't even have the correct bindings to the DMA memory that you want here.
> an alternative approach is to allocate a fixed number of shared buffers during the TDX guest driver probe and use them for GetQuote requests. Although it would limit the amount of memory we can use for GetQuote requests at a time and also reserve a chunk of memory during the init() time, I think it is an acceptable tradeoff when compared to alternative choices. The AMD SEV guest driver also adopts a similar approach. Please let me know if this approach is acceptable.
This sounds like a better approach.
thanks,
greg k-h
In TDX guest, the second stage of the attestation process is Quote generation. This process is required to convert the locally generated TDREPORT into a remotely verifiable Quote. It involves sending the TDREPORT data to a Quoting Enclave (QE), which will verify the integrity of the TDREPORT and sign it with an attestation key.
Intel's TDX attestation driver exposes the TDX_CMD_GET_QUOTE IOCTL to allow a user agent to get the TD Quote.
Add a kernel selftest module to verify the Quote generation feature.
TD Quote generation involves following steps:
* Get the TDREPORT data using the TDX_CMD_GET_REPORT0 IOCTL.
* Embed the TDREPORT data in the quote buffer and request quote generation via the TDX_CMD_GET_QUOTE IOCTL.
* Upon completion of the GetQuote request, check for a zero value (GET_QUOTE_SUCCESS) in the status field of the Quote header to make sure the generated quote is valid.
Reviewed-by: Tony Luck tony.luck@intel.com
Reviewed-by: Andi Kleen ak@linux.intel.com
Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com
Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com
Signed-off-by: Kuppuswamy Sathyanarayanan sathyanarayanan.kuppuswamy@linux.intel.com
---
 tools/testing/selftests/tdx/tdx_guest_test.c | 68 ++++++++++++++++++--
 1 file changed, 62 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/tdx/tdx_guest_test.c b/tools/testing/selftests/tdx/tdx_guest_test.c
index 81d8cb88ea1a..2eccde54185b 100644
--- a/tools/testing/selftests/tdx/tdx_guest_test.c
+++ b/tools/testing/selftests/tdx/tdx_guest_test.c
@@ -18,6 +18,7 @@
 #define TDX_GUEST_DEVNAME "/dev/tdx_guest"
 #define HEX_DUMP_SIZE 8
 #define DEBUG 0
+#define QUOTE_SIZE 8192
 /**
  * struct tdreport_type - Type header of TDREPORT_STRUCT.
@@ -128,21 +129,29 @@ static void print_array_hex(const char *title, const char *prefix_str,
 	printf("\n");
 }
+/* Helper function to get TDREPORT */
+long get_tdreport0(int devfd, struct tdx_report_req *req)
+{
+	int i;
+
+	/* Generate sample report data */
+	for (i = 0; i < TDX_REPORTDATA_LEN; i++)
+		req->reportdata[i] = i;
+
+	return ioctl(devfd, TDX_CMD_GET_REPORT0, req);
+}
+
 TEST(verify_report)
 {
 	struct tdx_report_req req;
 	struct tdreport *tdreport;
-	int devfd, i;
+	int devfd;
 	devfd = open(TDX_GUEST_DEVNAME, O_RDWR | O_SYNC);
 	ASSERT_LT(0, devfd);
-	/* Generate sample report data */
-	for (i = 0; i < TDX_REPORTDATA_LEN; i++)
-		req.reportdata[i] = i;
-
 	/* Get TDREPORT */
-	ASSERT_EQ(0, ioctl(devfd, TDX_CMD_GET_REPORT0, &req));
+	ASSERT_EQ(0, get_tdreport0(devfd, &req));
 	if (DEBUG) {
 		print_array_hex("\n\t\tTDX report data\n", "",
@@ -160,4 +169,51 @@ TEST(verify_report)
 	ASSERT_EQ(0, close(devfd));
 }
+TEST(verify_quote)
+{
+	struct tdx_quote_hdr *quote_hdr;
+	struct tdx_report_req rep_req;
+	struct tdx_quote_req req;
+	__u64 quote_buf_size;
+	__u8 *quote_buf;
+	int devfd;
+
+	/* Open attestation device */
+	devfd = open(TDX_GUEST_DEVNAME, O_RDWR | O_SYNC);
+
+	ASSERT_LT(0, devfd);
+
+	/* Add size for quote header */
+	quote_buf_size = sizeof(*quote_hdr) + QUOTE_SIZE;
+
+	/* Allocate quote buffer */
+	quote_buf = malloc(quote_buf_size);
+	ASSERT_NE(NULL, quote_buf);
+
+	quote_hdr = (struct tdx_quote_hdr *)quote_buf;
+
+	/* Initialize GetQuote header */
+	quote_hdr->version = 1;
+	quote_hdr->status = GET_QUOTE_SUCCESS;
+	quote_hdr->in_len = TDX_REPORT_LEN;
+	quote_hdr->out_len = 0;
+
+	/* Get TDREPORT data */
+	ASSERT_EQ(0, get_tdreport0(devfd, &rep_req));
+
+	/* Fill GetQuote request */
+	memcpy(quote_hdr->data, rep_req.tdreport, TDX_REPORT_LEN);
+	req.buf = (__u64)quote_buf;
+	req.len = quote_buf_size;
+
+	ASSERT_EQ(0, ioctl(devfd, TDX_CMD_GET_QUOTE, &req));
+
+	/* Check whether GetQuote request is successful */
+	EXPECT_EQ(0, quote_hdr->status);
+
+	free(quote_buf);
+
+	ASSERT_EQ(0, close(devfd));
+}
+
 TEST_HARNESS_MAIN
On 3/26/23 00:20, Kuppuswamy Sathyanarayanan wrote:
> In TDX guest, the second stage of the attestation process is Quote generation. This process is required to convert the locally generated TDREPORT into a remotely verifiable Quote. It involves sending the TDREPORT data to a Quoting Enclave (QE), which will verify the integrity of the TDREPORT and sign it with an attestation key.
>
> Intel's TDX attestation driver exposes the TDX_CMD_GET_QUOTE IOCTL to allow a user agent to get the TD Quote.
>
> Add a kernel selftest module to verify the Quote generation feature.
>
> TD Quote generation involves following steps:
>
> - Get the TDREPORT data using the TDX_CMD_GET_REPORT0 IOCTL.
> - Embed the TDREPORT data in the quote buffer and request quote generation via the TDX_CMD_GET_QUOTE IOCTL.
> - Upon completion of the GetQuote request, check for a zero value in the status field of the Quote header to make sure the generated quote is valid.
Looks good to me.
Reviewed-by: Shuah Khan skhan@linuxfoundation.org
thanks,
-- Shuah
On Sat, Mar 25, 2023 at 11:20 PM Kuppuswamy Sathyanarayanan sathyanarayanan.kuppuswamy@linux.intel.com wrote:
> Hi All,
>
> In TDX guest, the attestation process is used to verify the TDX guest trustworthiness to other entities before provisioning secrets to the guest.
>
> The TDX guest attestation process consists of two steps:
>
> 1. TDREPORT generation
> 2. Quote generation.
>
> The first step (TDREPORT generation) involves getting the TDX guest measurement data in the format of TDREPORT, which is further used to validate the authenticity of the TDX guest. The second step involves sending the TDREPORT to a Quoting Enclave (QE) server to generate a remotely verifiable Quote. TDREPORT by design can only be verified on the local platform. To support remote verification of the TDREPORT, TDX leverages the Intel SGX Quoting Enclave to verify the TDREPORT locally and convert it to a remotely verifiable Quote. Although attestation software can use communication methods like TCP/IP or vsock to send the TDREPORT to the QE, not all platforms support these communication models. So the TDX GHCI specification [1] defines a method for Quote generation via hypercalls. Please check the discussion from Google [2] and Alibaba [3] which clarifies the need for hypercall based Quote generation support.
Thanks Sathyanarayanan for submitting patches again.
I just wanted to reiterate what I said before that having a clean TDVMCALL based interface to get TDX Quote without any virtio/vsock dependency is critical for us to support many use cases.
+Chong Cai
Adding a colleague per his request since he's not subscribed to the list yet.
On Mon, Mar 27, 2023 at 10:36 AM Erdem Aktas erdemaktas@google.com wrote:
> I just wanted to reiterate what I said before that having a clean TDVMCALL based interface to get TDX Quote without any virtio/vsock dependency is critical for us to support many use cases.
On Tue, Mar 28, 2023 at 12:59 PM Dionna Amalie Glaze dionnaglaze@google.com wrote:
> +Chong Cai
>
> Adding a colleague per his request since he's not subscribed to the list yet.
+1 to Erdem's point. A simple TDVMCALL interface could make it much easier for use cases that cannot depend on virtio and vsock. Without the TDVMCALL, it will largely limit those use cases from adopting TDX. Thanks Sathyanarayanan for submitting this patch.

--
-Dionna Glaze, PhD (she/her)