On 08/08/2023 11:18, John Garry wrote:
On 07/08/2023 15:20, James Clark wrote:
Hi James,
Thanks for the effort in doing this.
N2 r0p3 doesn't require the workaround [1], so gating on (#slots - 5) no longer works because all N2s have 5 slots. Add a new expression builtin that identifies the need for the workaround correctly.
Signed-off-by: James Clark james.clark@arm.com
tools/perf/arch/arm64/util/pmu.c | 21 +++++++++++++++++++ .../arm64/arm/neoverse-n2-v2/metrics.json | 8 +++---- tools/perf/util/expr.c | 4 ++++ tools/perf/util/pmu.c | 6 ++++++ tools/perf/util/pmu.h | 1 + 5 files changed, 36 insertions(+), 4 deletions(-)
diff --git a/tools/perf/arch/arm64/util/pmu.c b/tools/perf/arch/arm64/util/pmu.c index 561de0cb6b95..30e2385a83cf 100644 --- a/tools/perf/arch/arm64/util/pmu.c +++ b/tools/perf/arch/arm64/util/pmu.c @@ -2,6 +2,7 @@ #include <internal/cpumap.h> #include "../../../util/cpumap.h" +#include "../../../util/header.h" #include "../../../util/pmu.h" #include "../../../util/pmus.h" #include <api/fs/fs.h> @@ -62,3 +63,23 @@ double perf_pmu__cpu_slots_per_cycle(void) return slots ? (double)slots : NAN; }
+double perf_pmu__no_stall_errata(void)
While I like the approach of encoding the per-CPU support in the metric expression, I find that literal "no stall errata" is vague and could apply to any "stall errata" for any SoC for any architecture.
If we were going to do it this way, then we would need a more specific name, like perf_pmu__no_stall_errata_arm64_errata123456(), but that should not be in the core perf code.
Could we instead add a function to check cpuid and have something like this in the JSON:
"MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if (cpuid_less_than(410fd493)) else (stall_slot - cpu_cycles)) / (#slots * cpu_cycles))"
I'm currently figuring out how cpuid_less_than() would be implemented (I'm no great python wrangler), but it would be along the lines of what Ian added for "has_event" in https://lore.kernel.org/linux-perf-users/20230623151016.4193660-1-irogers@go...
Thanks, John
Yeah it looks like it could be done that way. Also, the way I added it, it doesn't have access to the PMU type, it just does a generic pmu__find_core_pmu() so won't work very well for heterogeneous systems.
If we're going to do a deeper modification of the expression parser like with has_event() it might be possible to pass in the actual CPU ID that the metric is running on which would be better.
I'll have a look.
James