CoreSight August 2023

coresight@lists.linaro.org

12 participants
71 discussions

Re: [PATCH v5 4/6] perf vendor events arm64: Update scale units and descriptions of common topdown metrics

by Arnaldo Carvalho de Melo

Em Fri, Aug 11, 2023 at 03:53:49PM +0100, John Garry escreveu: > On 11/08/2023 15:39, James Clark wrote: > > Metrics will be published here [1] going forwards, but they have > > slightly different scale units. To allow autogenerated metrics to be > > added more easily, update the scale units to match. > > > > The more detailed descriptions have also been taken and added to the > > common file. > > > > [1]:https://urldefense.com/v3/__https://gitlab.arm.com/telemetry-solution/te… > > > > Acked-by: Ian Rogers<irogers(a)google.com> > > Signed-off-by: James Clark<james.clark(a)arm.com> > > Reviewed-by: John Garry <john.g.garry(a)oracle.com> Applied as well, will wait for the others to get the review comment addressed. - Arnaldo

2 years, 7 months

[PATCH v5 0/6] perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo

by James Clark

This is a completely new approach from V3 [1], although the metrics and event descriptions are autogenerated, the topdown metrics have been manually edited to use #no_stall_errata (now directly comparing on the CPUID in v5). This removes the need to duplicate the whole set of JSONs when only the topdown metrics are different between N2 and V2. The CPU ID comparison function still needs to change so that the new literal can compare on versions, but now no change is needed to mapfile or the PMU event generation code because we still only support one set of JSONs per CPU. [1]: https://lore.kernel.org/lkml/20230711100218.1651995-1-james.clark@arm.com/ ------ Changes since v4: * Replace the #no_stall_errata literal with a more generic function for comparing CPU IDs. This will hopefully keep configuration out of the code and inside the JSONs Changes since v3: * Instead of duplicating all the metrics, add a new expression literal that can be used to share the same metrics between N2 and V2 * Move tests to arch/arm64/tests * Remove changes from jevents.py and mapfile.csv Changes since v2: * version -> variant in second commit message * Add a bit more detail about version matching in the second commit message * Update the comments in pmu-events/arch/arm64/mapfile.csv to say that variant and revision fields are now used * Increase the CC list Changes since v1: * Split last change into two so it doesn't hit the mailing list size limit James Clark (6): perf: cs-etm: Don't duplicate FIELD_GET() perf arm64: Allow version comparisons of CPU IDs perf test: Add a test for the new Arm CPU ID comparison behavior perf vendor events arm64: Update scale units and descriptions of common topdown metrics perf vendor events arm64: Update stall_slot workaround for N2 r0p3 perf vendor events arm64: Update N2 and V2 metrics and events using Arm telemetry repo tools/perf/arch/arm64/include/arch-tests.h | 3 + tools/perf/arch/arm64/tests/Build | 1 + tools/perf/arch/arm64/tests/arch-tests.c | 4 + tools/perf/arch/arm64/tests/cpuid-match.c | 38 ++ tools/perf/arch/arm64/util/header.c | 64 ++- tools/perf/arch/arm64/util/pmu.c | 18 +- .../arch/arm64/arm/neoverse-n2-v2/branch.json | 8 - .../arch/arm64/arm/neoverse-n2-v2/bus.json | 18 +- .../arch/arm64/arm/neoverse-n2-v2/cache.json | 155 -------- .../arm64/arm/neoverse-n2-v2/exception.json | 45 ++- .../arm/neoverse-n2-v2/fp_operation.json | 22 ++ .../arm64/arm/neoverse-n2-v2/general.json | 10 + .../arm64/arm/neoverse-n2-v2/instruction.json | 143 ------- .../arm64/arm/neoverse-n2-v2/l1d_cache.json | 54 +++ .../arm64/arm/neoverse-n2-v2/l1i_cache.json | 14 + .../arm64/arm/neoverse-n2-v2/l2_cache.json | 50 +++ .../arm64/arm/neoverse-n2-v2/l3_cache.json | 22 ++ .../arm64/arm/neoverse-n2-v2/ll_cache.json | 10 + .../arch/arm64/arm/neoverse-n2-v2/memory.json | 39 +- .../arm64/arm/neoverse-n2-v2/metrics.json | 365 ++++++++++-------- .../arm64/arm/neoverse-n2-v2/pipeline.json | 23 -- .../arm64/arm/neoverse-n2-v2/retired.json | 30 ++ .../arch/arm64/arm/neoverse-n2-v2/spe.json | 12 +- .../arm/neoverse-n2-v2/spec_operation.json | 110 ++++++ .../arch/arm64/arm/neoverse-n2-v2/stall.json | 30 ++ .../arch/arm64/arm/neoverse-n2-v2/sve.json | 50 +++ .../arch/arm64/arm/neoverse-n2-v2/tlb.json | 66 ++++ .../arch/arm64/arm/neoverse-n2-v2/trace.json | 27 +- tools/perf/pmu-events/arch/arm64/sbsa.json | 24 +- tools/perf/pmu-events/metric.py | 17 +- tools/perf/util/cs-etm.c | 14 +- tools/perf/util/expr.c | 18 + tools/perf/util/expr.h | 1 + tools/perf/util/expr.l | 1 + tools/perf/util/expr.y | 8 +- tools/perf/util/pmu.c | 17 + tools/perf/util/pmu.h | 1 + 37 files changed, 923 insertions(+), 609 deletions(-) create mode 100644 tools/perf/arch/arm64/tests/cpuid-match.c delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/branch.json delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/fp_operation.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/general.json delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/instruction.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/l1d_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/l1i_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/l2_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/l3_cache.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/ll_cache.json delete mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/pipeline.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/retired.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/spec_operation.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/stall.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/sve.json create mode 100644 tools/perf/pmu-events/arch/arm64/arm/neoverse-n2-v2/tlb.json -- 2.34.1

2 years, 7 months

Re: [PATCH v5 2/6] perf arm64: Allow version comparisons of CPU IDs

by James Clark

On 14/08/2023 14:07, John Garry wrote: > On 11/08/2023 15:39, James Clark wrote: >> Currently variant and revision fields are masked out of the MIDR so >> it's not possible to compare different versions of the same CPU. >> In a later commit a workaround will be removed just for N2 r0p3, so >> enable comparisons on version. >> >> This has the side effect of changing the MIDR stored in the header of >> the perf.data file to no longer have masked version fields. > > Did you consider adding a raw version of _get_cpuid(), which returns the > full MIDR just for the purpose of caller strcmp_cpuid_str()? I did, but I thought that seeing as it would only be used in one place, and that changing the existing one didn't break anything, that it was better to not fragment the CPU ID interface. I thought it might also have repercussions for the other architectures as well. It would also mean that the MIDR that's stored in the header wouldn't have the version information, which if we're starting to do things with that could be bad. There are already callers of strcmp_cpuid_str() so it's probably best to keep it using the same get_cpuid() string. Unless there is a reason _not_ to do it? There isn't really anything that can't be done with it accepting/returning the full unmasked MIDR. If you want the old behavior, you just set the version fields to 0, which I've also used in a later patch and is already done in mapfile.csv > > I can't comment on how it will be called in relation to > strcmp_cpuid_str(), as I am unsure whether strcmp_cpuid_str() should be > just in arch arm64 code - see comment on later patch. > > It also >> affects the lookups in mapfile.csv, but as that currently only has >> zeroed version fields, it has no actual effect. The mapfile.csv >> documentation also states to zero the version fields, so unless this >> isn't done it will continue to have no effect. >> >> Signed-off-by: James Clark <james.clark(a)arm.com> >> --- >> tools/perf/arch/arm64/util/header.c | 64 ++++++++++++++++++++++------- >> 1 file changed, 50 insertions(+), 14 deletions(-) >> >> diff --git a/tools/perf/arch/arm64/util/header.c >> b/tools/perf/arch/arm64/util/header.c >> index 80b9f6287fe2..8f74e801e1ab 100644 >> --- a/tools/perf/arch/arm64/util/header.c >> +++ b/tools/perf/arch/arm64/util/header.c >> @@ -1,3 +1,6 @@ >> +#include <linux/kernel.h> >> +#include <linux/bits.h> >> +#include <linux/bitfield.h> >> #include <stdio.h> >> #include <stdlib.h> >> #include <perf/cpumap.h> >> @@ -10,14 +13,12 @@ >> #define MIDR "/regs/identification/midr_el1" >> #define MIDR_SIZE 19 >> -#define MIDR_REVISION_MASK 0xf >> -#define MIDR_VARIANT_SHIFT 20 >> -#define MIDR_VARIANT_MASK (0xf << MIDR_VARIANT_SHIFT) >> +#define MIDR_REVISION_MASK GENMASK(3, 0) >> +#define MIDR_VARIANT_MASK GENMASK(23, 20) >> static int _get_cpuid(char *buf, size_t sz, struct perf_cpu_map >> *cpus) >> { >> const char *sysfs = sysfs__mountpoint(); >> - u64 midr = 0; >> int cpu; >> if (!sysfs || sz < MIDR_SIZE) >> @@ -44,21 +45,11 @@ static int _get_cpuid(char *buf, size_t sz, struct >> perf_cpu_map *cpus) >> } >> fclose(file); >> - /* Ignore/clear Variant[23:20] and >> - * Revision[3:0] of MIDR >> - */ >> - midr = strtoul(buf, NULL, 16); >> - midr &= (~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK)); >> - scnprintf(buf, MIDR_SIZE, "0x%016lx", midr); >> /* got midr break loop */ >> break; >> } >> perf_cpu_map__put(cpus); >> - >> - if (!midr) >> - return EINVAL; >> - >> return 0; >> } >> @@ -99,3 +90,48 @@ char *get_cpuid_str(struct perf_pmu *pmu) >> return buf; >> } >> + >> +/* >> + * Return 0 if idstr is a higher or equal to version of the same part as >> + * mapcpuid. >> + * >> + * Therefore, if mapcpuid has 0 for revision and variant then any >> version of >> + * idstr will match as long as it's the same CPU type. >> + */ >> +int strcmp_cpuid_str(const char *mapcpuid, const char *idstr) > > should we check implementator and other fields as a sanity check? > I'm not sure what we could check them against? There are no existing sanity checks around the MIDR stuff, it just takes whatever MIDR is in mapfile.csv and from the system. Unless there is something specific to this change that would benefit from it? Otherwise it sounds like it could be a separate additional change to what's already in Perf. >> +{ >> + u64 map_id = strtoull(mapcpuid, NULL, 16); >> + char map_id_variant = FIELD_GET(MIDR_VARIANT_MASK, map_id); >> + char map_id_revision = FIELD_GET(MIDR_REVISION_MASK, map_id); >> + u64 id = strtoull(idstr, NULL, 16); >> + char id_variant = FIELD_GET(MIDR_VARIANT_MASK, id); >> + char id_revision = FIELD_GET(MIDR_REVISION_MASK, id); >> + u64 id_fields = ~(MIDR_VARIANT_MASK | MIDR_REVISION_MASK); >> + >> + /* Compare without version first */ >> + if ((map_id & id_fields) != (id & id_fields)) >> + return 1; >> + >> + /* >> + * ID matches, now compare version. >> + * >> + * Arm revisions (like r0p0) are compared here like two digit semver >> + * values eg. 1.3 < 2.0 < 2.1 < 2.2. The events json file with the >> + * highest matching version is used. >> + * >> + * r = high value = 'Variant' field in MIDR >> + * p = low value = 'Revision' field in MIDR >> + * >> + */ >> + if (id_variant > map_id_variant) >> + return 0; >> + >> + if (id_variant == map_id_variant && id_revision >= map_id_revision) >> + return 0; >> + >> + /* >> + * variant is less than mapfile variant or variants are the same but >> + * the revision doesn't match. Return no match. >> + */ >> + return 1; >> +} > > Thanks, > John >

2 years, 7 months

Re: [PATCH v5 5/6] perf vendor events arm64: Update stall_slot workaround for N2 r0p3

by James Clark

On 14/08/2023 14:02, John Garry wrote: > >> try: >> parsed = ast.parse(py, mode='eval') >> except SyntaxError as e: >> diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c >> index 7410a165f68b..0985a3cbc6f9 100644 >> --- a/tools/perf/util/expr.c >> +++ b/tools/perf/util/expr.c >> @@ -13,6 +13,8 @@ >> #include <util/expr-bison.h> >> #include <util/expr-flex.h> >> #include "util/hashmap.h" >> +#include "util/header.h" >> +#include "util/pmu.h" >> #include "smt.h" >> #include "tsc.h" >> #include <api/fs/fs.h> >> @@ -495,3 +497,19 @@ double expr__has_event(const struct >> expr_parse_ctx *ctx, bool compute_ids, const >> evlist__delete(tmp); >> return ret; >> } >> + >> +double expr__strcmp_cpuid_str(const struct expr_parse_ctx *ctx >> __maybe_unused, >> + bool compute_ids __maybe_unused, const char *test_id) >> +{ >> + double ret; >> + struct perf_pmu *pmu = pmu__find_core_pmu(); >> + char *cpuid = perf_pmu__getcpuid(pmu); >> + >> + if (!cpuid) >> + return NAN; >> + >> + ret = !strcmp_cpuid_str(test_id, cpuid); > > It seems that strcmp_cpuid_str() is only added in arm64 arch code - > should there be a weak version for other archs? I think there is one in tools/perf/util/header.c. I tested the build on x86 as well as arm so it should be working. > >> + >> + free(cpuid); >> + return ret; >> +} >> diff --git a/tools/perf/util/expr.h b/tools/perf/util/expr.h >> index 3c1e49b3e35d..c0cec29ddc29 100644 >> --- a/tools/perf/util/expr.h >> +++ b/tools/perf/util/expr.h >> @@ -55,5 +55,6 @@ double expr_id_data__value(const struct expr_id_data >> *data); >> double expr_id_data__source_count(const struct expr_id_data *data); >> double expr__get_literal(const char *literal, const struct >> expr_scanner_ctx *ctx); >> double expr__has_event(const struct expr_parse_ctx *ctx, bool >> compute_ids, const char *id); >> +double expr__strcmp_cpuid_str(const struct expr_parse_ctx *ctx, bool >> compute_ids, const char *id); >> #endif >> diff --git a/tools/perf/util/expr.l b/tools/perf/util/expr.l >> index dbb117414710..0feef0726c48 100644 >> --- a/tools/perf/util/expr.l >> +++ b/tools/perf/util/expr.l >> @@ -114,6 +114,7 @@ if { return IF; } >> else { return ELSE; } >> source_count { return SOURCE_COUNT; } >> has_event { return HAS_EVENT; } >> +strcmp_cpuid_str { return STRCMP_CPUID_STR; } >> {literal} { return literal(yyscanner, sctx); } >> {number} { return value(yyscanner); } >> {symbol} { return str(yyscanner, ID, sctx->runtime); } >> diff --git a/tools/perf/util/expr.y b/tools/perf/util/expr.y >> index 65d54a6f29ad..6c93b358cc2d 100644 >> --- a/tools/perf/util/expr.y >> +++ b/tools/perf/util/expr.y >> @@ -39,7 +39,7 @@ int expr_lex(YYSTYPE * yylval_param , void *yyscanner); >> } ids; >> } >> -%token ID NUMBER MIN MAX IF ELSE LITERAL D_RATIO SOURCE_COUNT >> HAS_EVENT EXPR_ERROR >> +%token ID NUMBER MIN MAX IF ELSE LITERAL D_RATIO SOURCE_COUNT >> HAS_EVENT STRCMP_CPUID_STR EXPR_ERROR >> %left MIN MAX IF >> %left '|' >> %left '^' >> @@ -207,6 +207,12 @@ expr: NUMBER >> $$.ids = NULL; >> free($3); >> } >> +| STRCMP_CPUID_STR '(' ID ')' >> +{ >> + $$.val = expr__strcmp_cpuid_str(ctx, compute_ids, $3); >> + $$.ids = NULL; >> + free($3); >> +} >> | expr '|' expr >> { >> if (is_const($1.val) && is_const($3.val)) { >> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c >> index b6654b9f55d2..b6948021fe29 100644 >> --- a/tools/perf/util/pmu.c >> +++ b/tools/perf/util/pmu.c >> @@ -1779,3 +1779,20 @@ void perf_pmu__delete(struct perf_pmu *pmu) >> zfree(&pmu->alias_name); >> free(pmu); >> } >> + >> +struct perf_pmu *pmu__find_core_pmu(void) > > Why was this relocated? I don't see anywhere changed which referenced > pmu__find_core_pmu() in this patch > It's in expr__strcmp_cpuid_str(), I added a new call to it. >> +{ >> + struct perf_pmu *pmu = NULL; >> + >> + while ((pmu = perf_pmus__scan_core(pmu))) { >> + /* >> + * The cpumap should cover all CPUs. Otherwise, some CPUs may >> + * not support some events or have different event IDs. >> + */ >> + if (RC_CHK_ACCESS(pmu->cpus)->nr != cpu__max_cpu().cpu) >> + return NULL; >> + >> + return pmu; >> + } >> + return NULL; >> +} >> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h >> index 203b92860e3c..580b8d65bd65 100644 >> --- a/tools/perf/util/pmu.h >> +++ b/tools/perf/util/pmu.h >> @@ -288,5 +288,6 @@ int perf_pmu__pathname_fd(int dirfd, const char >> *pmu_name, const char *filename, >> struct perf_pmu *perf_pmu__lookup(struct list_head *pmus, int dirfd, >> const char *lookup_name); >> struct perf_pmu *perf_pmu__create_placeholder_core_pmu(struct >> list_head *core_pmus); >> void perf_pmu__delete(struct perf_pmu *pmu); >> +struct perf_pmu *pmu__find_core_pmu(void); >> #endif /* __PMU_H */ >

2 years, 7 months

[PATCH v2 0/3] Add support for a streaming interface for TMC ETR

by Mao Jinlong

This patch series is to add support for a streaming interface for TMC ETR to allow for continuous log collection to secondary storage. An interrupt based mechanism is used to stream out the data from the device. QDSS_CS_QDSSCSR_ETRIRQCTRL register is used to set the IRQ byte counter value. The value of this registers defines the number of bytes that when moved by the ETR AXI interface. It will casues an interrupt which can be used by an userspace program to know how much data is present in memory requiring copy to some other location. A zero setting disables the interrupt.A one setting means 8 bytes, two 16 bytes, etc. In other words, the value in this register is the interrupt threshold times 8 bytes. ETR must be enabled when use this interrupt function. Sample: echo 4096 > /sys/bus/coresight/devices/csr0/etr_byte_cntr_val echo 1 > /sys/bus/coresight/devices/tmc_etr0/enable_sink echo 1 > /sys/bus/coresight/devices/stm0/enabl_source cat /dev/byte-cntr > /data/qdss.bin & The log collection will stop after disabling the ETR. codelinaro link: https://git.codelinaro.org/clo/linux-kernel/coresight/-/commits/coresight-b… Change in V2: Make CSR device as a helper device of ETR device. When enable ETR, it will enable CSR byte-cntr function. Mao Jinlong (3): Coresight: Add driver to support for CSR dt-bindings: arm: Adds CoreSight CSR hardware definitions coresight-csr: Add support for streaming interface for ETR .../testing/sysfs-bus-coresight-devices-csr | 6 + .../bindings/arm/qcom,coresight-csr.yaml | 130 ++++++ MAINTAINERS | 2 +- drivers/hwtracing/coresight/Kconfig | 12 + drivers/hwtracing/coresight/Makefile | 2 + drivers/hwtracing/coresight/coresight-core.c | 31 +- .../coresight/coresight-csr-bytecntr.c | 275 ++++++++++++ .../hwtracing/coresight/coresight-csr-core.c | 393 ++++++++++++++++++ drivers/hwtracing/coresight/coresight-csr.h | 112 +++++ drivers/hwtracing/coresight/coresight-priv.h | 8 + .../hwtracing/coresight/coresight-tmc-etr.c | 3 +- drivers/hwtracing/coresight/coresight-tmc.h | 2 + include/dt-bindings/arm/coresight-csr-dt.h | 12 + include/linux/coresight.h | 3 +- 14 files changed, 984 insertions(+), 7 deletions(-) create mode 100644 Documentation/ABI/testing/sysfs-bus-coresight-devices-csr create mode 100644 Documentation/devicetree/bindings/arm/qcom,coresight-csr.yaml create mode 100644 drivers/hwtracing/coresight/coresight-csr-bytecntr.c create mode 100644 drivers/hwtracing/coresight/coresight-csr-core.c create mode 100644 drivers/hwtracing/coresight/coresight-csr.h create mode 100644 include/dt-bindings/arm/coresight-csr-dt.h -- 2.17.1

2 years, 7 months

Re: [PATCH V4 1/4] arm_pmu: acpi: Refactor arm_spe_acpi_register_device()

by Anshuman Khandual

On 8/11/23 15:42, Will Deacon wrote: > On Fri, Aug 11, 2023 at 02:13:42PM +0530, Anshuman Khandual wrote: >> On 8/8/23 13:52, Anshuman Khandual wrote: >>> + /* >>> + * Sanity check all the GICC tables for the same interrupt >>> + * number. For now, only support homogeneous ACPI machines. >>> + */ >>> + for_each_possible_cpu(cpu) { >>> + struct acpi_madt_generic_interrupt *gicc; >>> + >>> + gicc = acpi_cpu_get_madt_gicc(cpu); >>> + if (gicc->header.length < len) >>> + return gsi ? -ENXIO : 0; >>> + >>> + this_gsi = parse_gsi(gicc); >>> + if (!this_gsi) >>> + return gsi ? -ENXIO : 0; >>> + >>> + this_hetid = find_acpi_cpu_topology_hetero_id(cpu); >>> + if (!gsi) { >>> + hetid = this_hetid; >>> + gsi = this_gsi; >>> + } else if (hetid != this_hetid || gsi != this_gsi) { >>> + pr_warn("ACPI: %s: must be homogeneous\n", pdev->name); >>> + return -ENXIO; >>> + } >>> + } >> >> As discussed on the previous version i.e V3 thread, will move the >> 'this_gsi' check after parse_gsi(), inside if (!gsi) conditional >> block. This will treat subsequent cpu parse_gsi()'s failure as a >> mismatch thus triggering the pr_warn() message. >> >> diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c >> index 845683ca7c64..6eae772d6298 100644 >> --- a/drivers/perf/arm_pmu_acpi.c >> +++ b/drivers/perf/arm_pmu_acpi.c >> @@ -98,11 +98,11 @@ arm_acpi_register_pmu_device(struct platform_device *pdev, u8 len, >> return gsi ? -ENXIO : 0; >> >> this_gsi = parse_gsi(gicc); >> - if (!this_gsi) >> - return gsi ? -ENXIO : 0; >> - >> this_hetid = find_acpi_cpu_topology_hetero_id(cpu); >> if (!gsi) { >> + if (!this_gsi) >> + return 0; > > Why do you need this hunk? Otherwise '0' gsi on all cpus would just clear the above homogeneity test, and end up in acpi_register_gsi() making it fail, but with the following warning before returning with -ENXIO. irq = acpi_register_gsi(NULL, gsi, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH); if (irq < 0) { pr_warn("ACPI: %s Unable to register interrupt: %d\n", pdev->name, gsi); return -ENXIO; } Is this behaviour better than returning 0 after detecting '0' gsi in the first cpu to avoid the above mentioned scenario ? Although 0 gsi followed by non-zero ones will still end up warning about a mismatch.

2 years, 8 months

[PATCH V4 0/4] coresight: trbe: Enable ACPI based devices

by Anshuman Khandual

This series enables detection of ACPI based TRBE devices via a stand alone purpose built representative platform device. But as a pre-requisite this changes coresight_platform_data structure assignment for the TRBE device. This series is based on v6.5-rc4 kernel, is also dependent on the following EDK2 changes posted earlier by Sami. https://edk2.groups.io/g/devel/message/107239 https://edk2.groups.io/g/devel/message/107241 Changes in V4: - Added in-code comment for arm_trbe_device_probe() - Reverted back using IS_ENABLED() for SPE PMU platform device - Replaced #ifdef with IS_ENABLED() for TRBE platform device - Protected arm_trbe_acpi_match with ACPI_PTR() - preventing a build failure when CONFIG_ACPI is not enabled - Added __maybe_unused for arm_acpi_register_pmu_device() and dropped config checks with IS_ENABLED() Changes in V3: https://lore.kernel.org/all/20230803055652.1322801-1-anshuman.khandual@arm.… - Changed ARMV8_TRBE_PDEV_NAME from "arm-trbe-acpi" into "arm,trbe" - Dropped local variable 'matched' - Replaced 'matched' with 'valid gsi' as being already matched once - Moved find_acpi_cpu_topology_hetero_id() outside conditional check Changes in V2: https://lore.kernel.org/all/20230801094052.750416-1-anshuman.khandual@arm.c… - Refactored arm_spe_acpi_register_device() in a separate patch - Renamed trbe_acpi_resources as trbe_resources - Renamed trbe_acpi_dev as trbe_dev Changes in V1: https://lore.kernel.org/all/20230728112733.359620-1-anshuman.khandual@arm.c… Cc: Sami Mujawar <sami.mujawar(a)arm.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Will Deacon <will(a)kernel.org> Cc: Mark Rutland <mark.rutland(a)arm.com> Cc: Suzuki K Poulose <suzuki.poulose(a)arm.com> Cc: Mike Leach <mike.leach(a)linaro.org> Cc: Leo Yan <leo.yan(a)linaro.org> Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com> Cc: James Clark <james.clark(a)arm.com> Cc: coresight(a)lists.linaro.org Cc: linux-arm-kernel(a)lists.infradead.org Cc: linux-kernel(a)vger.kernel.org Anshuman Khandual (4): arm_pmu: acpi: Refactor arm_spe_acpi_register_device() arm_pmu: acpi: Add a representative platform device for TRBE coresight: trbe: Add a representative coresight_platform_data for TRBE coresight: trbe: Enable ACPI based TRBE devices arch/arm64/include/asm/acpi.h | 3 + drivers/hwtracing/coresight/coresight-trbe.c | 26 +++- drivers/hwtracing/coresight/coresight-trbe.h | 2 + drivers/perf/arm_pmu_acpi.c | 138 +++++++++++++------ include/linux/perf/arm_pmu.h | 1 + 5 files changed, 128 insertions(+), 42 deletions(-) -- 2.25.1

2 years, 8 months

Re: [PATCH V4 1/4] arm_pmu: acpi: Refactor arm_spe_acpi_register_device()

by Suzuki K Poulose

On 08/08/2023 14:16, Will Deacon wrote: > On Tue, Aug 08, 2023 at 09:48:16AM +0100, Suzuki K Poulose wrote: >> On 08/08/2023 09:22, Anshuman Khandual wrote: >>> Sanity checking all the GICC tables for same interrupt number, and ensuring >>> a homogeneous ACPI based machine, could be used for other platform devices >>> as well. Hence this refactors arm_spe_acpi_register_device() into a common >>> helper arm_acpi_register_pmu_device(). >>> >>> Cc: Catalin Marinas <catalin.marinas(a)arm.com> >>> Cc: Will Deacon <will(a)kernel.org> >>> Cc: Mark Rutland <mark.rutland(a)arm.com> >>> Cc: linux-arm-kernel(a)lists.infradead.org >>> Cc: linux-kernel(a)vger.kernel.org >>> Co-developed-by: Will Deacon <will(a)kernel.org> >>> Signed-off-by: Will Deacon <will(a)kernel.org> >>> Signed-off-by: Anshuman Khandual <anshuman.khandual(a)arm.com> >>> --- >>> drivers/perf/arm_pmu_acpi.c | 105 ++++++++++++++++++++++-------------- >>> 1 file changed, 65 insertions(+), 40 deletions(-) >>> >>> diff --git a/drivers/perf/arm_pmu_acpi.c b/drivers/perf/arm_pmu_acpi.c >>> index 90815ad762eb..72454bef2a70 100644 >>> --- a/drivers/perf/arm_pmu_acpi.c >>> +++ b/drivers/perf/arm_pmu_acpi.c >>> @@ -69,6 +69,63 @@ static void arm_pmu_acpi_unregister_irq(int cpu) >>> acpi_unregister_gsi(gsi); >>> } >>> +static int __maybe_unused >>> +arm_acpi_register_pmu_device(struct platform_device *pdev, u8 len, >>> + u16 (*parse_gsi)(struct acpi_madt_generic_interrupt *)) >>> +{ >>> + int cpu, this_hetid, hetid, irq, ret; >>> + u16 this_gsi, gsi = 0; >>> + >>> + /* >>> + * Ensure that platform device must have IORESOURCE_IRQ >>> + * resource to hold gsi interrupt. >>> + */ >>> + if (pdev->num_resources != 1) >>> + return -ENXIO; >>> + >>> + if (pdev->resource[0].flags != IORESOURCE_IRQ) >>> + return -ENXIO; >>> + >>> + /* >>> + * Sanity check all the GICC tables for the same interrupt >>> + * number. For now, only support homogeneous ACPI machines. >>> + */ >>> + for_each_possible_cpu(cpu) { >>> + struct acpi_madt_generic_interrupt *gicc; >>> + >>> + gicc = acpi_cpu_get_madt_gicc(cpu); >>> + if (gicc->header.length < len) >>> + return gsi ? -ENXIO : 0; >>> + >>> + this_gsi = parse_gsi(gicc); >>> + if (!this_gsi) >>> + return gsi ? -ENXIO : 0; >>> + >>> + this_hetid = find_acpi_cpu_topology_hetero_id(cpu); >>> + if (!gsi) { >>> + hetid = this_hetid; >>> + gsi = this_gsi; >>> + } else if (hetid != this_hetid || gsi != this_gsi) { >>> + pr_warn("ACPI: %s: must be homogeneous\n", pdev->name); >>> + return -ENXIO; >>> + } >>> + } >>> + >>> + irq = acpi_register_gsi(NULL, gsi, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_HIGH); >>> + if (irq < 0) { >>> + pr_warn("ACPI: %s Unable to register interrupt: %d\n", pdev->name, gsi); >>> + return -ENXIO; >>> + } >>> + >>> + pdev->resource[0].start = irq; >>> + ret = platform_device_register(pdev); >>> + if (ret < 0) { >>> + pr_warn("ACPI: %s: Unable to register device\n", pdev->name); >>> + acpi_unregister_gsi(gsi); >>> + } >>> + return ret; >> >> A postivie return value here could confuse the caller. Also, with my comment >> below, we don't really need to return something from here. > > How does this return a positive value? Right now, there aren't. My point is this function returns a "return value" of another function. And the caller of this function doesn't really follow the "check" it needs. e.g.: ret = foo(); if (ret < 0) error; return ret; And the caller only checks for if (ret) error; This seems fragile. > >>> + int ret = arm_acpi_register_pmu_device(&spe_dev, ACPI_MADT_GICC_SPE, >>> + arm_spe_parse_gsi); >>> + if (ret) >>> pr_warn("ACPI: SPE: Unable to register device\n"); >> >> With this change, a system without SPE interrupt description always >> generates the above message. Is this intended ? > > If there are no irqs, why doesn't this return 0? Apologies, I missed that. > arm_acpi_register_pmu_device() should only fail if either: > > - The static resources passed in are broken > - The tables are not homogeneous > - We fail to register the interrupt > > so something is amiss. Agreed. We don't need duplicate messages about an error ? i.e., one in arm_acpi_register_pmu_device() and another one in the caller ? (Of course adding any missing error msgs). > >> Could we not drop the above message as all the other possible error >> scenarios are reported. We could simply make the above helper void, see my >> comment above. > > I disagree. If the ACPI tables are borked, we should print a message saying > so. Ok, fair point. Suzuki > > Will

2 years, 8 months

Re: [PATCH v4 5/6] perf vendor events arm64: Update stall_slot workaround for N2 r0p3

by James Clark

On 09/08/2023 14:54, John Garry wrote: > On 09/08/2023 14:06, James Clark wrote: >>> "MetricExpr": "(op_retired / op_spec) * (1 - (stall_slot if >>> (cpuid_less_than(410fd493)) else (stall_slot - cpu_cycles)) / (#slots * >>> cpu_cycles))" >>> >>> I'm currently figuring out how cpuid_less_than() would be implemented >>> (I'm no great python wrangler), but it would be along the lines of what >>> Ian added for "has_event" in >>> https://urldefense.com/v3/__https://lore.kernel.org/linux-perf-users/202306… >>> Thanks, >>> John >> Yeah it looks like it could be done that way. Also, the way I added it, >> it doesn't have access to the PMU type, it just does a generic >> pmu__find_core_pmu() so won't work very well for heterogeneous systems. > > I haven't been keeping a close eye on the hybrid PMU support, but AFAIK > metrics for hybrid arm64 system, i.e. bL, aren't supported - maybe that > has changed. The gating for bL support was in pmu__find_core_pmu() > returning NULL for a hybrid system. > Yes we're not currently publishing any metrics for hybrid systems, so in this case for N2 and V2 it's not needed. But it would be good to try to future proof it if possible. Although I don't know how the metrics currently react on hybrid systems, it's also something I have to take a look at. >> >> If we're going to do a deeper modification of the expression parser like >> with has_event() it might be possible to pass in the actual CPU ID that >> the metric is running on which would be better. >> >> I'll have a look. > > Thanks. I was playing with this yesterday, but I was making slow > progress. I was essentially following the has_event example, but the > argument type causes an issue, in that has_event expected an event name, > while we want to pass a hex string. > > If you could check this then that would be great. > > Thanks, > John >

2 years, 8 months

Re: [RFC PATCH 2/3] arm64: KVM: Support exclude_guest for Coresight trace in nVHE

by James Clark

On 08/08/2023 12:04, Marc Zyngier wrote: > On Fri, 04 Aug 2023 11:13:12 +0100, > James Clark <james.clark(a)arm.com> wrote: >> >> Currently trace will always be generated in nVHE as long as TRBE isn't >> being used. To allow filtering out guest trace, re-apply the filter >> rules before switching to the guest. >> >> The TRFCR restore function remains the same. >> >> Signed-off-by: James Clark <james.clark(a)arm.com> >> --- >> arch/arm64/kvm/debug.c | 7 ++++ >> arch/arm64/kvm/hyp/nvhe/debug-sr.c | 56 +++++++++++++++++++++++++++--- >> 2 files changed, 59 insertions(+), 4 deletions(-) >> >> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c >> index 8725291cb00a..ebb4db20a859 100644 >> --- a/arch/arm64/kvm/debug.c >> +++ b/arch/arm64/kvm/debug.c >> @@ -335,10 +335,17 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu) >> if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceBuffer_SHIFT) && >> !(read_sysreg_s(SYS_TRBIDR_EL1) & TRBIDR_EL1_P)) >> vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRBE); >> + /* >> + * Save TRFCR on nVHE if FEAT_TRF exists. This will be done in cases >> + * where DEBUG_STATE_SAVE_TRBE doesn't completely disable trace. >> + */ >> + if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT)) >> + vcpu_set_flag(vcpu, DEBUG_STATE_SAVE_TRFCR); >> } >> >> void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu) >> { >> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_SPE); >> vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRBE); >> + vcpu_clear_flag(vcpu, DEBUG_STATE_SAVE_TRFCR); >> } >> diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c >> index 4558c02eb352..0e8c85b29b92 100644 >> --- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c >> +++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c >> @@ -51,13 +51,17 @@ static void __debug_restore_spe(u64 pmscr_el1) >> write_sysreg_s(pmscr_el1, SYS_PMSCR_EL1); >> } >> >> -static void __debug_save_trace(u64 *trfcr_el1) >> +/* >> + * Save TRFCR and disable trace completely if TRBE is being used. Return true >> + * if trace was disabled. >> + */ >> +static bool __debug_save_trace(u64 *trfcr_el1) >> { >> *trfcr_el1 = 0; >> >> /* Check if the TRBE is enabled */ >> if (!(read_sysreg_s(SYS_TRBLIMITR_EL1) & TRBLIMITR_EL1_E)) >> - return; >> + return false; > > While you're refactoring this code, please move the zeroing of > *trfcr_el1 under the if statement. > >> /* >> * Prohibit trace generation while we are in guest. >> * Since access to TRFCR_EL1 is trapped, the guest can't >> @@ -68,6 +72,8 @@ static void __debug_save_trace(u64 *trfcr_el1) >> isb(); >> /* Drain the trace buffer to memory */ >> tsb_csync(); >> + >> + return true; >> } >> >> static void __debug_restore_trace(u64 trfcr_el1) >> @@ -79,14 +85,55 @@ static void __debug_restore_trace(u64 trfcr_el1) >> write_sysreg_s(trfcr_el1, SYS_TRFCR_EL1); >> } >> >> +#if IS_ENABLED(CONFIG_PERF_EVENTS) > > As previously stated, just always compile this. There shouldn't be > anything here that's so large that it becomes a candidate for > exclusion. Hell, even the whole of NV+pKVM are permanent features, > even of most people won't use *any* of that. > >> +static inline void __debug_save_trfcr(struct kvm_vcpu *vcpu) >> +{ >> + u64 trfcr; >> + struct kvm_etm_event etm_event = vcpu->arch.host_debug_state.etm_event; >> + >> + /* No change if neither are excluded */ >> + if (!etm_event.exclude_guest && !etm_event.exclude_host) { >> + /* Zeroing prevents restoring a stale value */ >> + vcpu->arch.host_debug_state.trfcr_el1 = 0; > > I find this "zero means do nothing" part very odd. I can see it is > already done, but I really dislike this sort of assumption to avoid > writing to a register. > > I'd really prefer we track another version of TRFCR_EL1, compare host > and guest, and decide to avoid writing if they are equal. At least, it > would be readable. > > And in the end, expressing *everything* in terms of the register would > really help, instead of the exclude_* stuff that has no place in the > low-level arch code. > Yep, I agree with all of the above, I can make these changes for the next version. I just want to clarify your point about disabling trace for protected guests when not in debug mode that I asked about in the review on patch 1. > Thanks, > > M. >

2 years, 8 months

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

CoreSight August 2023