On 04/12/2023 09:59, Marc Zyngier wrote:
> On Thu, 19 Oct 2023 17:55:02 +0100,
> James Clark <james.clark(a)arm.com> wrote:
>>
>> Add an interface for the Coresight driver to use to set the value of the
>> TRFCR register for the guest. This register controls the exclude
>> settings for trace at different exception levels, and is used to honor
>> the exclude_host and exclude_guest parameters from the Perf session.
>> This will be used to later write TRFCR_EL1 on nVHE at guest switch. For
>> VHE, the host trace is controlled by TRFCR_EL2 and thus we can write to
>> the TRFCR_EL1 immediately. Because guest writes to the register are
>> trapped, the value will persist and can't be modified.
>>
>> The settings must be copied to the vCPU before each run in the same
>> way that PMU events are, because the per-cpu struct isn't accessible in
>> protected mode.
>
> Then maybe we should look at a better way of sharing global data
> between EL1 and EL2 instead of copying stuff ad-nauseam?
>
That probably makes sense, I can have a look into that.
>>
>> Signed-off-by: James Clark <james.clark(a)arm.com>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 3 +++
>> arch/arm64/kvm/arm.c | 1 +
>> arch/arm64/kvm/debug.c | 26 ++++++++++++++++++++++++++
>> 3 files changed, 30 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 0f0bf8e641bd..e1852102550d 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -1125,6 +1125,8 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu);
>> void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr);
>> void kvm_clr_pmu_events(u32 clr);
>> bool kvm_set_pmuserenr(u64 val);
>> +void kvm_etm_set_guest_trfcr(u64 trfcr_guest);
>> +void kvm_etm_update_vcpu_events(struct kvm_vcpu *vcpu);
>> #else
>> static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {}
>> static inline void kvm_clr_pmu_events(u32 clr) {}
>> @@ -1132,6 +1134,7 @@ static inline bool kvm_set_pmuserenr(u64 val)
>> {
>> return false;
>> }
>> +static inline void kvm_etm_set_guest_trfcr(u64 trfcr_guest) {}
>> #endif
>>
>> void kvm_vcpu_load_sysregs_vhe(struct kvm_vcpu *vcpu);
>> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
>> index 0f717b6a9151..e4d846f2f665 100644
>> --- a/arch/arm64/kvm/arm.c
>> +++ b/arch/arm64/kvm/arm.c
>> @@ -1015,6 +1015,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
>> kvm_vgic_flush_hwstate(vcpu);
>>
>> kvm_pmu_update_vcpu_events(vcpu);
>> + kvm_etm_update_vcpu_events(vcpu);
>>
>> /*
>> * Ensure we set mode to IN_GUEST_MODE after we disable
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index 20cdd40b3c42..2ab41b954512 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -23,6 +23,12 @@
>>
>> static DEFINE_PER_CPU(u64, mdcr_el2);
>>
>> +/*
>> + * Per CPU value for TRFCR that should be applied to any guest vcpu that may
>> + * run on that core in the future.
>> + */
>> +static DEFINE_PER_CPU(u64, guest_trfcr);
>> +
>> /**
>> * save/restore_guest_debug_regs
>> *
>> @@ -356,3 +362,23 @@ void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu)
>> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE);
>> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRFCR);
>> }
>> +
>> +void kvm_etm_set_guest_trfcr(u64 trfcr_guest)
>> +{
>> + if (has_vhe())
>> + write_sysreg_s(trfcr_guest, SYS_TRFCR_EL12);
>> + else
>> + *this_cpu_ptr(&guest_trfcr) = trfcr_guest;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_etm_set_guest_trfcr);
>
> How does the ETM code know what guests it impacts? Don't you have some
> per-process context already?
>
It doesn't know what guests it impacts, it just does it blindly based on
host CPU and whatever guest might run on the CPU in the future. PMU
events are the same.
We do have per-process context for per-process sessions, so if that was
the VM process we might have been able to do something with that info.
But we also have per-cpu sessions that would trace anything that runs on
that CPU, so to be able to support that mode I think it has to be done
without knowing about any guest.
>> +
>> +/*
>> + * Updates the vcpu's view of the etm events for this cpu. Must be
>> + * called before every vcpu run after disabling interrupts, to ensure
>> + * that an interrupt cannot fire and update the structure.
>> + */
>> +void kvm_etm_update_vcpu_events(struct kvm_vcpu *vcpu)
>> +{
>> + if (!has_vhe() && vcpu_get_flag(vcpu, DEBUG_STATE_SAVE_TRFCR))
>> + ctxt_sys_reg(&vcpu->arch.ctxt, TRFCR_EL1) = *this_cpu_ptr(&guest_trfcr);
>> +}
>
> Why this requirement of updating it at all times? Why can't this be
> done in a more lazy way, using the flags to instruct the hypervisor
> what and when to load it?
>
> M.
>
I could probably add a flag that gets set if the guest value should be
different to the host value. I was just trying to keep it simple and in
terms of just what the registers should be.
The PMU one has something similar where it doesn't write anything if
kvm_pmu_switch_needed() is false, but that's only on the path where the
host sets the events, it still always does the copy in
kvm_pmu_update_vcpu_events() before the guest switch.
I suppose if I make the change to have the shared global data then the
copy isn't needed and this function and kvm_pmu_update_vcpu_events()
will just get deleted.
Thanks
James
On 04/12/2023 18:37, kernel test robot wrote:
> Hi James,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on atorgue-stm32/stm32-next]
> [also build test WARNING on linus/master v6.7-rc4 next-20231204]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/James-Clark/coresight-Make-c…
> base: https://git.kernel.org/pub/scm/linux/kernel/git/atorgue/stm32.git stm32-next
> patch link: https://lore.kernel.org/r/20231115162834.355598-1-james.clark%40arm.com
> patch subject: [PATCH v2] coresight: Make current W=1 warnings default
> config: arm-randconfig-r131-20231117 (https://download.01.org/0day-ci/archive/20231205/202312050158.FKpxuafP-lkp@…)
> compiler: arm-linux-gnueabi-gcc (GCC) 13.2.0
> reproduce: (https://download.01.org/0day-ci/archive/20231205/202312050158.FKpxuafP-lkp@…)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp(a)intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202312050158.FKpxuafP-lkp@intel.com/
>
> sparse warnings: (new ones prefixed by >>)
>>> drivers/hwtracing/coresight/coresight-etb10.c:840:17: sparse: sparse: Using plain integer as NULL pointer
> drivers/hwtracing/coresight/coresight-etb10.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/xarray.h, ...):
> include/linux/page-flags.h:242:46: sparse: sparse: self-comparison always evaluates to false
> --
>>> drivers/hwtracing/coresight/coresight-funnel.c:395:17: sparse: sparse: Using plain integer as NULL pointer
> drivers/hwtracing/coresight/coresight-funnel.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/slab.h, ...):
> include/linux/page-flags.h:242:46: sparse: sparse: self-comparison always evaluates to false
> --
>>> drivers/hwtracing/coresight/coresight-tpdm.c:242:17: sparse: sparse: Using plain integer as NULL pointer
> drivers/hwtracing/coresight/coresight-tpdm.c: note: in included file (through include/linux/mmzone.h, include/linux/gfp.h, include/linux/xarray.h, ...):
> include/linux/page-flags.h:242:46: sparse: sparse: self-comparison always evaluates to false
>
> vim +840 drivers/hwtracing/coresight/coresight-etb10.c
>
> 834
> 835 static const struct amba_id etb_ids[] = {
> 836 {
> 837 .id = 0x000bb907,
> 838 .mask = 0x000fffff,
> 839 },
> > 840 { 0, 0, 0 },
> 841 };
> 842
>
This should already be fixed in V3 here:
https://lore.kernel.org/linux-arm-kernel/20231123120459.287578-1-james.clar…
On 04/12/2023 09:29, Marc Zyngier wrote:
> On Thu, 19 Oct 2023 17:55:00 +0100,
> James Clark <james.clark(a)arm.com> wrote:
>>
>> pmscr_el1 and trfcr_el1 are currently special cased in the
>> host_debug_state struct, but they're just registers after all so give
>> them entries in the sysreg array and refer to them through the host
>> context.
>>
>> Reviewed-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> Signed-off-by: James Clark <james.clark(a)arm.com>
>> ---
>> arch/arm64/include/asm/kvm_host.h | 6 ++--
>> arch/arm64/include/asm/kvm_hyp.h | 4 +--
>> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 44 +++++++++++++++---------------
>> arch/arm64/kvm/hyp/nvhe/switch.c | 4 +--
>> 4 files changed, 28 insertions(+), 30 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>> index 4a966c0d7373..7c82927ddaf2 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -437,6 +437,8 @@ enum vcpu_sysreg {
>> CNTHP_CVAL_EL2,
>> CNTHV_CTL_EL2,
>> CNTHV_CVAL_EL2,
>> + PMSCR_EL1, /* Statistical profiling extension */
>> + TRFCR_EL1, /* Self-hosted trace filters */
>
> Why this move? Are you also adding guest support for SPE?
>
> Until you do, I don't see the need to pollute the guest's sysreg
> namespace.
>
> M.
>
Ah ok yes, I think I misunderstood your previous review comment. You're
right I don't touch SPE and it's only ever for the host so I can leave
it where it was in struct host_debug_state.
James
On 12/1/23 18:05, Sudeep Holla wrote:
> On Fri, Dec 01, 2023 at 11:50:47AM +0530, Anshuman Khandual wrote:
>> Add support for the dynamic replicator device in the platform driver, which
>> can then be used on ACPI based platforms. This change would now allow
>> runtime power management for repliacator devices on ACPI based systems.
>>
>> The driver would try to enable the APB clock if available. Also, rename the
>> code to reflect the fact that it now handles both static and dynamic
>> replicators.
>>
>> Cc: Lorenzo Pieralisi <lpieralisi(a)kernel.org>
>> Cc: Sudeep Holla <sudeep.holla(a)arm.com>
>
> Except the minor nit below which may apply also for few other patches
> in the series
>
> Acked-by: Sudeep Holla <sudeep.holla(a)arm.com> # For ACPI related changes
> Tested-by: Sudeep Holla <sudeep.holla(a)arm.com> # Boot and driver probe only
>
> [...]
>
>> diff --git a/drivers/hwtracing/coresight/coresight-replicator.c b/drivers/hwtracing/coresight/coresight-replicator.c
>> index b6be73034996..64de0bee02ec 100644
>> --- a/drivers/hwtracing/coresight/coresight-replicator.c
>> +++ b/drivers/hwtracing/coresight/coresight-replicator.c
>> @@ -38,6 +38,7 @@ DEFINE_CORESIGHT_DEVLIST(replicator_devs, "replicator");
>> struct replicator_drvdata {
>> void __iomem *base;
>> struct clk *atclk;
>> + struct clk *pclk;
>
> [minor nit] Perhaps can be documented as well ?
Sure, will add the following comment above the structure.
@pclk: optional clock for the core parts of the replicator.
Hi All,
On a Linux system with isolated cpus and nohz_full[1] we see the following issue:
While profiling an application running on isolated CPU with Coresight via Perf
tool, we noticed that only one sample of trace data is collected. To collect
more samples, we tried using the -F max option.
$ perf record -e cs_etm/@tmc_etr0/ -F max -C 5 taskset -c 5 test-app
This results in a lot of trace data samples being generated. As a side-effect
several timer work interrupts are handled on that core which are affecting
latency of the application running on the isolated cpu.
Setting -F to a lower value isn't changing the behavior and still results
in the same amount of trace samples as -F max option.
Have you noticed this behavior with -F option ?
To avoid too many timer interrupts being handled on the profiled core, are
the following solutions viable ?
1. Send IPIs to enable/disable the ETMs and handle the timer interrupts on a
different Core.
2. CTI to enable/disable the ETMs on every timer expiry.
Can you please suggest any other solutions ?
With Regards,
Tanmay
[1] Kernel command line: isolcpus=4-23 nohz_full=4-23 rcu_nocbs=4-23
Hi Greg
Please find fixes for hwtracing/coresight subsystem, targetting Linux v6.7.
Kindly pull
Suzuki
The following changes since commit b85ea95d086471afb4ad062012a4d73cd328fa86:
Linux 6.7-rc1 (2023-11-12 16:19:07 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git tags/coresight-fixes-for-v6.7-rc1
for you to fetch changes up to 862c135bde8bc185e8aae2110374175e6a1b6ed5:
coresight: ultrasoc-smb: Fix uninitialized before use buf_hw_base (2023-11-16 10:00:14 +0000)
----------------------------------------------------------------
coresight: Fixes for v6.7-rc1
Here are a few fixes for the hwtracing subsystem targetting v6.7.
Includes:
- Ultrasoc-SMB driver fixes
- HiSilicon PTT driver fixes
- Corsight driver fixes
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
----------------------------------------------------------------
James Clark (1):
coresight: Fix crash when Perf and sysfs modes are used concurrently
Junhao He (4):
hwtracing: hisi_ptt: Add dummy callback pmu::read()
coresight: ultrasoc-smb: Fix sleep while close preempt in enable_smb
coresight: ultrasoc-smb: Config SMB buffer before register sink
coresight: ultrasoc-smb: Fix uninitialized before use buf_hw_base
Uwe Kleine-König (1):
coresight: etm4x: Remove bogous __exit annotation for some functions
Vegard Nossum (1):
Documentation: coresight: fix `make refcheckdocs` warning
Yicong Yang (2):
hwtracing: hisi_ptt: Handle the interrupt in hardirq context
hwtracing: hisi_ptt: Don't try to attach a task
Documentation/trace/coresight/coresight.rst | 2 +-
drivers/hwtracing/coresight/coresight-etm-perf.c | 4 +-
drivers/hwtracing/coresight/coresight-etm4x-core.c | 6 +-
drivers/hwtracing/coresight/ultrasoc-smb.c | 64 +++++++++-------------
drivers/hwtracing/coresight/ultrasoc-smb.h | 6 +-
drivers/hwtracing/ptt/hisi_ptt.c | 14 ++++-
6 files changed, 46 insertions(+), 50 deletions(-)
On 29/11/2023 12:23, Peter Zijlstra wrote:
> On Wed, Nov 29, 2023 at 01:15:43PM +0200, Adrian Hunter wrote:
>> On 29/11/23 12:58, Peter Zijlstra wrote:
>>> On Wed, Nov 29, 2023 at 09:53:39AM +0000, James Clark wrote:
>>>> On 23/11/2023 12:18, Adrian Hunter wrote:
>>>
>>>>> +static void pt_event_pause_resume(struct perf_event *event)
>>>>> +{
>>>>> + if (event->aux_paused)
>>>>> + pt_config_stop(event);
>>>>> + else if (!event->hw.state)
>>>>> + pt_config_start(event);
>>>>> +}
>>>>
>>>> It seems like having a single pause/resume callback rather than separate
>>>> pause and resume ones pushes some of the event state management into the
>>>> individual drivers and would be prone to code duplication and divergent
>>>> behavior.
>>>>
>>>> Would it be possible to move the conditions from here into the core code
>>>> and call separate functions instead?
>>>>
>>>>> +
>>>>> static void pt_event_start(struct perf_event *event, int mode)
>>>>> {
>>>>> struct hw_perf_event *hwc = &event->hw;
>>>>> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
>>>>> pt_pmu.pmu.del = pt_event_del;
>>>>> pt_pmu.pmu.start = pt_event_start;
>>>>> pt_pmu.pmu.stop = pt_event_stop;
>>>>> + pt_pmu.pmu.pause_resume = pt_event_pause_resume;
>>>>
>>>> The general idea seems ok to me. Is there a reason to not use the
>>>> existing start() stop() callbacks, rather than adding a new one?
>>>>
>>>> I assume it's intended to be something like an optimisation where you
>>>> can turn it on and off without having to do the full setup, teardown and
>>>> emit an AUX record because you know the process being traced never gets
>>>> switched out?
>>>
>>> So the actual scheduling uses ->add() / ->del(), the ->start() /
>>> ->stop() methods are something that can be used after ->add() and before
>>> ->del() to 'temporarily' pause things.
>>>
>>> Pretty much exactly what is required here I think. We currently use this
>>> for PMI throttling and adaptive frequency stuff, but there is no reason
>>> it could not also be used for this.
>>>
>>> As is, we don't track the paused state across ->del() / ->add(), but
>>> perhaps that can be fixed. We can easily add more PERF_EF_ / PERF_HES_
>>> bits to manage things.
>>>
>>>
>>
>> I am not sure stop / start play nice with NMI's from other events e.g.
>>
>> PMC NMI wants to pause or resume AUX but what if AUX event is currently
>> being processed in ->stop() or ->start()? Or maybe that can't happen?
>
> I think that can happen, and pt_event_stop() can actually handle some of
> that, while your pause_resume() thing, which uses pt_config_stop() does
> not.
>
> But yes, I think that if you add pt_event_{stop,start}() calls from
> *other* events their PMI, then you get to deal with more 'fun'.
>
> Something like:
>
> perf_addr_filters_adjust()
> __perf_addr_filters_adjust()
> perf_event_stop()
> __perf_event_stop()
> event->pmu->stop()
> <NMI>
> ...
> perf_event_overflow()
> pt_event->pmu->stop()
> </NMI>
> event->pmu->start() // whoopsie!
>
> Should now be possible.
>
> I think what you want to do is rename pt->handle_nmi into pt->stop_count
> and make it a counter, then ->stop() increments it, and ->start()
> decrements it and everybody ensures the thing doesn't get restart while
> !0 etc..
>
> I suspect you need to guard the generic part of this feature with a new
> PERF_PMU_CAP_ flag and then have the coresight/etc. people opt-in once
> they've audited things.
>
> James, does that work for you?
>
Yep I think that would work.
If I understand it with the stop_count counter, to be able to support
the new CAP, every device would basically have to implement a similar
counter?
Would it be possible to do that reference counting on the outside of
start() and stop() in the core code? So that a device only ever sees one
call to start and one call to stop and doesn't need to do any extra
tracking?
On 23/11/2023 12:18, Adrian Hunter wrote:
> Prevent tracing to start if aux_paused.
>
> Implement pause_resume() callback. When aux_paused, stop tracing. When
> not aux_paused, only start tracing if it isn't currently meant to be
> stopped.
>
> Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com>
> ---
> arch/x86/events/intel/pt.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 42a55794004a..aa883b64814a 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -418,6 +418,9 @@ static void pt_config_start(struct perf_event *event)
> struct pt *pt = this_cpu_ptr(&pt_ctx);
> u64 ctl = event->hw.config;
>
> + if (event->aux_paused)
> + return;
> +
> ctl |= RTIT_CTL_TRACEEN;
> if (READ_ONCE(pt->vmx_on))
> perf_aux_output_flag(&pt->handle, PERF_AUX_FLAG_PARTIAL);
> @@ -1563,6 +1566,14 @@ EXPORT_SYMBOL_GPL(intel_pt_handle_vmx);
> * PMU callbacks
> */
>
> +static void pt_event_pause_resume(struct perf_event *event)
> +{
> + if (event->aux_paused)
> + pt_config_stop(event);
> + else if (!event->hw.state)
> + pt_config_start(event);
> +}
It seems like having a single pause/resume callback rather than separate
pause and resume ones pushes some of the event state management into the
individual drivers and would be prone to code duplication and divergent
behavior.
Would it be possible to move the conditions from here into the core code
and call separate functions instead?
> +
> static void pt_event_start(struct perf_event *event, int mode)
> {
> struct hw_perf_event *hwc = &event->hw;
> @@ -1798,6 +1809,7 @@ static __init int pt_init(void)
> pt_pmu.pmu.del = pt_event_del;
> pt_pmu.pmu.start = pt_event_start;
> pt_pmu.pmu.stop = pt_event_stop;
> + pt_pmu.pmu.pause_resume = pt_event_pause_resume;
The general idea seems ok to me. Is there a reason to not use the
existing start() stop() callbacks, rather than adding a new one?
I assume it's intended to be something like an optimisation where you
can turn it on and off without having to do the full setup, teardown and
emit an AUX record because you know the process being traced never gets
switched out?
Could you make it so that it works out of the box, with the option of
later optimisation if you do something like this (not here but something
like this in events/core.c):
/* Use specialised pause/resume if it exists, otherwise use more
* expensive start/stop.
*/
if (pmu->pause_resume)
pmu->pause_resume(...)
else
pmu->stop(...)
> pt_pmu.pmu.snapshot_aux = pt_event_snapshot_aux;
> pt_pmu.pmu.read = pt_event_read;
> pt_pmu.pmu.setup_aux = pt_buffer_setup_aux;