On Tue, 2023-04-04 at 18:02 -0700, Sathyanarayanan Kuppuswamy wrote:
On 3/27/23 9:02 PM, Huang, Kai wrote:
On Mon, 2023-03-27 at 19:50 -0700, Sathyanarayanan Kuppuswamy wrote:
Hi Kai,
On 3/27/23 7:38 PM, Huang, Kai wrote:
+/* Reserve an IRQ from x86_vector_domain for TD event notification */
+static int __init tdx_event_irq_init(void)
+{
+	struct irq_alloc_info info;
+	cpumask_t saved_cpumask;
+	struct irq_cfg *cfg;
+	int cpu, irq;
+
+	if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
+		return 0;
+
+	init_irq_alloc_info(&info, NULL);
+
+	/*
+	 * Event notification vector will be delivered to the CPU
+	 * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
+	 * So set the IRQ affinity to the current CPU.
+	 */
+	cpu = get_cpu();
+
+	cpumask_copy(&saved_cpumask, current->cpus_ptr);
+	info.mask = cpumask_of(cpu);
+	put_cpu();
The 'saved_cpumask'-related code is ugly. If you move put_cpu() to the end of this function, I think you can remove all the related code:
	cpu = get_cpu();

	/*
	 * Set @info->mask to the local cpu to make sure a valid vector is
	 * pre-allocated when the TDX event notification IRQ is allocated
	 * from x86_vector_domain.
	 */
	init_irq_alloc_info(&info, cpumask_of(cpu));

	// rest of the stuff: request_irq(), hypercall ...

	put_cpu();
init_irq_alloc_info() is a sleeping function. Since get_cpu() disables preemption, we cannot call a sleeping function after it. Initially, I had implemented it the way you mention. However, I hit the following error.
Oh sorry I forgot this. So I think we should use migrate_disable() instead:
	migrate_disable();

	init_irq_alloc_info(&info, cpumask_of(smp_processor_id()));

	...

	migrate_enable();
Or, should we just use early_initcall() so that only the BSP is running? IMHO it's OK to always allocate the vector from the BSP.
Anyway, either way is fine to me.
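For reference, if the migrate_disable() route were taken, the allocation path might look roughly like the sketch below. This is only an illustrative, not compile-tested fragment: migrate_disable() pins the task to the current CPU without disabling preemption, so the sleeping allocation calls remain legal inside the pinned section.

	struct irq_alloc_info info;
	int irq;

	/*
	 * Pin the task to the current CPU. Unlike get_cpu(), this does
	 * not disable preemption, so sleeping calls are still allowed.
	 */
	migrate_disable();

	init_irq_alloc_info(&info, cpumask_of(smp_processor_id()));

	irq = irq_domain_alloc_irqs(x86_vector_domain, 1,
				    cpu_to_node(smp_processor_id()), &info);

	/* ... request_irq() and the TDVMCALL_SETUP_NOTIFY_INTR hypercall ... */

	migrate_enable();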
The final version looks like below.
static int __init tdx_event_irq_init(void)
{
	struct irq_alloc_info info;
	struct irq_cfg *cfg;
	int irq;

	if (!cpu_feature_enabled(X86_FEATURE_TDX_GUEST))
		return 0;

	init_irq_alloc_info(&info, NULL);

	/*
	 * Event notification vector will be delivered to the CPU
	 * in which TDVMCALL_SETUP_NOTIFY_INTR hypercall is requested.
	 * So set the IRQ affinity to the current CPU.
	 */
	info.mask = cpumask_of(0);

	irq = irq_domain_alloc_irqs(x86_vector_domain, 1, cpu_to_node(0), &info);
	if (irq <= 0) {
		pr_err("Event notification IRQ allocation failed %d\n", irq);
		return -EIO;
	}

	irq_set_handler(irq, handle_edge_irq);

	/* Since the IRQ affinity is set, it cannot be balanced */
	if (request_irq(irq, tdx_event_irq_handler, IRQF_NOBALANCING,
			"tdx_event_irq", NULL)) {
		pr_err("Event notification IRQ request failed\n");
		goto err_free_domain_irqs;
	}

	cfg = irq_cfg(irq);

	/*
	 * Since tdx_event_irq_init() is triggered via early_initcall(),
	 * it will be called before secondary CPU bringup. Since there is
	 * only one CPU, it complies with the requirement of executing
	 * the TDVMCALL_SETUP_NOTIFY_INTR hypercall on the same CPU where
	 * the IRQ vector is allocated.
	 *
	 * Register callback vector address with VMM. More details
	 * about the ABI can be found in TDX Guest-Host-Communication
	 * Interface (GHCI), sec titled
	 * "TDG.VP.VMCALL<SetupEventNotifyInterrupt>".
	 */
	if (_tdx_hypercall(TDVMCALL_SETUP_NOTIFY_INTR, cfg->vector, 0, 0, 0)) {
		pr_err("Event notification hypercall failed\n");
		goto err_free_irqs;
	}

	tdx_event_irq = irq;

	return 0;

err_free_irqs:
	free_irq(irq, NULL);
err_free_domain_irqs:
	irq_domain_free_irqs(irq, 1);

	return -EIO;
}
early_initcall(tdx_event_irq_init);
I found there's another series also doing a similar thing, and it seems Thomas wasn't happy about using x86_vector_domain directly:
https://lore.kernel.org/lkml/877cv99k0y.ffs@tglx/
An alternative was also posted (creating IRQ domain on top of x86_vector_domain):
https://lore.kernel.org/lkml/20230328182933.GA1403032@vm02.guest.corp.micros...
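As a rough illustration of the shape that alternative takes, a dedicated hierarchical domain stacked on x86_vector_domain could look something like the fragment below. All names here are illustrative and the fragment is not compile-tested; the actual series at the link above should be treated as authoritative.

	static int tdx_irq_domain_alloc(struct irq_domain *domain, unsigned int virq,
					unsigned int nr_irqs, void *arg)
	{
		/* Forward vector allocation to the parent (x86_vector_domain) */
		return irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, arg);
	}

	static void tdx_irq_domain_free(struct irq_domain *domain, unsigned int virq,
					unsigned int nr_irqs)
	{
		irq_domain_free_irqs_parent(domain, virq, nr_irqs);
	}

	static const struct irq_domain_ops tdx_irq_domain_ops = {
		.alloc	= tdx_irq_domain_alloc,
		.free	= tdx_irq_domain_free,
	};

	/* Child domain with x86_vector_domain as the hierarchy parent */
	domain = irq_domain_create_hierarchy(x86_vector_domain, 0, 1, NULL,
					     &tdx_irq_domain_ops, NULL);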
I think we should monitor that thread and hear more from others.