On Fri, Apr 25, 2025, Kan Liang wrote:
On 2025-04-25 9:43 a.m., Peter Zijlstra wrote:
On Fri, Apr 25, 2025 at 09:06:26AM -0400, Liang, Kan wrote:
On 2025-04-25 7:15 a.m., Peter Zijlstra wrote:
On Mon, Mar 24, 2025 at 05:30:50PM +0000, Mingwei Zhang wrote:
From: Kan Liang <kan.liang@linux.intel.com>
Implement the switch_guest_ctx interface for the x86 PMU: switch the PMI to the dedicated KVM_GUEST_PMI_VECTOR at perf guest enter, and switch the PMI back to NMI at perf guest exit.
Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
 arch/x86/events/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8f218ac0d445..28161d6ff26d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2677,6 +2677,16 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
 	return ret;
 }
 
+static void x86_pmu_switch_guest_ctx(bool enter, void *data)
+{
+	u32 guest_lvtpc = *(u32 *)data;
+
+	if (enter)
+		apic_write(APIC_LVTPC, guest_lvtpc);
+	else
+		apic_write(APIC_LVTPC, APIC_DM_NMI);
+}
This, why can't it use x86_pmu.guest_lvtpc here and call it a day? Why is that argument passed around through the generic code only to get back here?
The vector has to come from KVM. However, the current interfaces only support KVM reading perf variables, e.g., perf_get_x86_pmu_capability and perf_get_hw_event_config. We would need to add a new interface to allow KVM to write a perf variable, e.g., a perf_set_guest_lvtpc.
But all that should remain in x86; there is no reason whatsoever to leak this into generic code.
Finally prepping v5, and this is one of two <knock wood> comments that isn't fully addressed.
The vector isn't a problem; that's *always* PERF_GUEST_MEDIATED_PMI_VECTOR and so doesn't even require anything in x86_pmu.
But whether or not the entry should be masked comes from the guest's LVTPC entry, and I don't see a cleaner way to get that information into x86, especially since the switch between host and guest PMI needs to happen in the "perf context disabled" section.
I think/hope I dressed up the code so that it's not _so_ ugly, and so that it's fully extensible in the unlikely event a non-x86 arch were to ever support a mediated vPMU, e.g. @data could be used to pass a pointer to a struct.
void perf_load_guest_context(unsigned long data)
{
	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);

	lockdep_assert_irqs_disabled();

	guard(perf_ctx_lock)(cpuctx, cpuctx->task_ctx);

	if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded)))
		return;

	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
	ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST);
	if (cpuctx->task_ctx) {
		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
	}

	arch_perf_load_guest_context(data);

	...
}
void arch_perf_load_guest_context(unsigned long data)
{
	u32 masked = data & APIC_LVT_MASKED;

	apic_write(APIC_LVTPC,
		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
	this_cpu_write(x86_guest_ctx_loaded, true);
}
Holler if you have a better idea. I'll plan on posting v5 in the next day or so no matter what, so that it's not delayed for this one thing (it's already been delayed more than I was hoping, and there are a lot of changes relative to v4).