On Fri, Apr 25, 2025, Kan Liang wrote:
On 2025-04-25 9:43 a.m., Peter Zijlstra wrote:
On Fri, Apr 25, 2025 at 09:06:26AM -0400, Liang, Kan wrote:
On 2025-04-25 7:15 a.m., Peter Zijlstra wrote:
On Mon, Mar 24, 2025 at 05:30:50PM +0000, Mingwei Zhang wrote:
From: Kan Liang <kan.liang@linux.intel.com>
Implement the switch_guest_ctx interface for the x86 PMU: switch the PMI to the dedicated KVM_GUEST_PMI_VECTOR at perf guest enter, and switch the PMI back to NMI at perf guest exit.
Signed-off-by: Xiong Zhang <xiong.y.zhang@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Signed-off-by: Mingwei Zhang <mizhang@google.com>
 arch/x86/events/core.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 8f218ac0d445..28161d6ff26d 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2677,6 +2677,16 @@ static bool x86_pmu_filter(struct pmu *pmu, int cpu)
 	return ret;
 }
 
+static void x86_pmu_switch_guest_ctx(bool enter, void *data)
+{
+	u32 guest_lvtpc = *(u32 *)data;
+
+	if (enter)
+		apic_write(APIC_LVTPC, guest_lvtpc);
+	else
+		apic_write(APIC_LVTPC, APIC_DM_NMI);
+}
This, why can't it use x86_pmu.guest_lvtpc here and call it a day? Why is that argument passed around through the generic code only to get back here?
The vector has to come from KVM. However, the current interfaces only support KVM reading perf variables, e.g., perf_get_x86_pmu_capability and perf_get_hw_event_config. We would need to add a new interface to allow KVM to write a perf variable, e.g., a perf_set_guest_lvtpc.
But all that should remain in x86; there is no reason whatsoever to leak this into generic code.
Finally prepping v5, and this is one of two <knock wood> comments that isn't fully addressed.
The vector isn't a problem; that's *always* PERF_GUEST_MEDIATED_PMI_VECTOR and so doesn't even require anything in x86_pmu.
But whether or not the entry should be masked comes from the guest's LVTPC entry, and I don't see a cleaner way to get that information into x86, especially since the switch between host and guest PMI needs to happen in the "perf context disabled" section.
I think/hope I dressed up the code so that it's not _so_ ugly, and so that it's fully extensible in the unlikely event a non-x86 arch were to ever support a mediated vPMU, e.g. @data could be used to pass a pointer to a struct.
void perf_load_guest_context(unsigned long data)
{
	struct perf_cpu_context *cpuctx = this_cpu_ptr(&perf_cpu_context);

	lockdep_assert_irqs_disabled();

	guard(perf_ctx_lock)(cpuctx, cpuctx->task_ctx);

	if (WARN_ON_ONCE(__this_cpu_read(guest_ctx_loaded)))
		return;

	perf_ctx_disable(&cpuctx->ctx, EVENT_GUEST);
	ctx_sched_out(&cpuctx->ctx, NULL, EVENT_GUEST);
	if (cpuctx->task_ctx) {
		perf_ctx_disable(cpuctx->task_ctx, EVENT_GUEST);
		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
	}

	arch_perf_load_guest_context(data);

	...
}
void arch_perf_load_guest_context(unsigned long data)
{
	u32 masked = data & APIC_LVT_MASKED;

	apic_write(APIC_LVTPC,
		   APIC_DM_FIXED | PERF_GUEST_MEDIATED_PMI_VECTOR | masked);
	this_cpu_write(x86_guest_ctx_loaded, true);
}
Holler if you have a better idea. I'll plan on posting v5 in the next day or so no matter what, so that it's not delayed for this one thing (it's already been delayed more than I was hoping, and there are a lot of changes relative to v4).