This series adds CPU erratum workarounds related to self-hosted
tracing. The errata handled in this series are:
* TRBE may overwrite trace in FILL mode
- Arm Neoverse-N2 #2139208
- Cortex-A710 #2119858
* A TSB instruction may not flush the trace completely when executed
in trace prohibited region.
- Arm Neoverse-N2 #2067961
- Cortex-A710 #2054223
The series applies on top of the self-hosted/trbe fixes posted here [0].
A tree containing both series is available here [1].
[0] https://lkml.kernel.org/r/20210723124456.3828769-1-suzuki.poulose@arm.com
[1] git@git.gitlab.arm.com:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v1
Suzuki K Poulose (10):
coresight: trbe: Add infrastructure for Errata handling
coresight: trbe: Add a helper to calculate the trace generated
coresight: trbe: Add a helper to pad a given buffer area
coresight: trbe: Decouple buffer base from the hardware base
coresight: trbe: Allow driver to choose a different alignment
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
arm64: Add erratum detection for TRBE overwrite in FILL mode
coresight: trbe: Workaround TRBE errata overwrite in FILL mode
arm64: Enable workaround for TRBE overwrite in FILL mode
arm64: errata: Add workaround for TSB flush failures
Documentation/arm64/silicon-errata.rst | 8 +
arch/arm64/Kconfig | 70 ++++++
arch/arm64/include/asm/barrier.h | 17 +-
arch/arm64/include/asm/cputype.h | 4 +
arch/arm64/kernel/cpu_errata.c | 44 ++++
arch/arm64/tools/cpucaps | 2 +
drivers/hwtracing/coresight/coresight-trbe.c | 227 ++++++++++++++++---
7 files changed, 341 insertions(+), 31 deletions(-)
--
2.24.1
On 02/08/2021 07:43, Anshuman Khandual wrote:
>
>
> On 7/28/21 7:22 PM, Suzuki K Poulose wrote:
>> Add a minimal infrastructure to keep track of the errata
>> affecting the given TRBE instance. Given that we have
>> heterogeneous CPUs, we have to manage the list per-TRBE
>> instance to be able to apply the work around as needed.
>>
>> We rely on the arm64 errata framework for the actual
>> description and the discovery of a given erratum, to
>> keep the Erratum work around at a central place and
>> benefit from the code and the advertisement from the
>> kernel. We use a local mapping of the erratum to
>> avoid bloating up the individual TRBE structures.
>
> I guess there is no other way around apart from each TRBE instance
> tracking applicable errata locally per CPU, even though it sounds
> a bit redundant.
>
>> i.e., each arm64 TRBE erratum bit is assigned a new number
>> within the driver to track. Each TRBE instance updates
>> the list of affected errata at probe time on the CPU.
>> This makes sure that we can easily access the list of
>> errata on a given TRBE instance without much overhead.
>
> It also ensures that the generic errata framework is queried just
> once during individual CPU probe.
>
>>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++++++++++
>> 1 file changed, 48 insertions(+)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index b8586c170889..0368bf405e35 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -16,6 +16,8 @@
>> #define pr_fmt(fmt) DRVNAME ": " fmt
>>
>> #include <asm/barrier.h>
>> +#include <asm/cputype.h>
>> +
>> #include "coresight-self-hosted-trace.h"
>> #include "coresight-trbe.h"
>>
>> @@ -65,6 +67,35 @@ struct trbe_buf {
>> struct trbe_cpudata *cpudata;
>> };
>>
>> +/*
>> + * TRBE erratum list
>> + *
>> + * We rely on the corresponding cpucaps to be defined for a given
>> + * TRBE erratum. We map the given cpucap into a TRBE internal number
>> + * to make the tracking of the errata lean.
>> + *
>> + * This helps in :
>> + * - Not duplicating the detection logic
>> + * - Streamlined detection of erratum across the system
>> + *
>> + * Since the erratum work arounds could be applied individually
>> + * per TRBE instance, we keep track of the list of errata that
>> + * affects the given instance of the TRBE.
>> + */
>> +#define TRBE_ERRATA_MAX 0
>> +
>> +static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> +};
>
> This needs to be tightened up. There should be build time guard rails in
> arm64 errata cpucaps, so that only TRBE specific ones could be assigned
> here as trbe_errata_cpucaps[].
I don't get your point. The actual arm64 erratum caps are not linear
and as such we don't have to force it. This approach gives us a hand
picked exact list of errata that apply to the TRBE driver by mapping
it linearly here. The only reason we have TRBE_ERRATA_MAX
is so that we can track it per TRBE instance and ...
>
>> +
>> +/*
>> + * struct trbe_cpudata: TRBE instance specific data
>> + * @trbe_flag - TRBE dirty/access flag support
>> + * @trbe_align - Actual TRBE alignment required for TRBPTR_EL1.
>> + * @cpu - CPU this TRBE belongs to.
>> + * @mode - Mode of current operation. (perf/disabled)
>> + * @drvdata - TRBE specific drvdata
>> + * @errata - Bit map for the errata on this TRBE.
>> + */
>> struct trbe_cpudata {
>> bool trbe_flag;
>> u64 trbe_align;
>> @@ -72,6 +103,7 @@ struct trbe_cpudata {
>> enum cs_mode mode;
>> struct trbe_buf *buf;
>> struct trbe_drvdata *drvdata;
>> + DECLARE_BITMAP(errata, TRBE_ERRATA_MAX);
>> };
>>
>> struct trbe_drvdata {
>> @@ -84,6 +116,21 @@ struct trbe_drvdata {
>> struct platform_device *pdev;
>> };
>>
>> +static void trbe_check_errata(struct trbe_cpudata *cpudata)
>> +{
>> + int i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(trbe_errata_cpucaps); i++) {
>
> BUILD_BUG_ON() - if trbe_errata_cpucaps[i] is not inside TRBE specific
> errata cpucap range ?
... also run these detection tests.
>
>> + if (this_cpu_has_cap(trbe_errata_cpucaps[i]))
>> + set_bit(i, cpudata->errata);
>> + }
>> +}
>> +
>> +static inline bool trbe_has_erratum(int i, struct trbe_cpudata *cpudata)
>
> Switch the argument positions here ? 'int i' should be the second one.
>
ok.
>> +{
>> + return (i < TRBE_ERRATA_MAX) && test_bit(i, cpudata->errata);
>> +}
>> +
>> static int trbe_alloc_node(struct perf_event *event)
>> {
>> if (event->cpu == -1)
>> @@ -925,6 +972,7 @@ static void arm_trbe_probe_cpu(void *info)
>> goto cpu_clear;
>> }
>>
>> + trbe_check_errata(cpudata);
>
> This should be called right at the end before arm_trbe_probe_cpu() exits
> on the success path. Errata should not be evaluated if TRBE on the CPU
> won't be used for some reason, i.e. the cpumask_clear_cpu() path.
ok
>
>> cpudata->trbe_align = 1ULL << get_trbe_address_align(trbidr);
>> if (cpudata->trbe_align > SZ_2K) {
>> pr_err("Unsupported alignment on cpu %d\n", cpu);
>>
>
> This patch should be moved after [PATCH 5/10] i.e just before adding the
> first TRBE erratum.
>
I will take a look.
Thanks for the review
Suzuki
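The mapping discussed in this review can be sketched in plain C. This is a minimal userspace illustration, not the driver code: the capability numbers, the erratum index names, the cap_query_fn callback (standing in for this_cpu_has_cap()) and fake_cpu_has_cap() are all hypothetical, and a single unsigned long stands in for the kernel bitmap. It shows sparse system-wide capability numbers being mapped to dense driver-local indices, with the capability query run once per instance, and it uses the argument order suggested in the review for trbe_has_erratum().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, non-contiguous system-wide capability numbers. */
enum {
	CAP_TRBE_OVERWRITE_FILL_MODE = 37,
	CAP_TSB_FLUSH_FAILURE = 52,
};

/* Dense, driver-local erratum indices (names are illustrative). */
enum trbe_erratum {
	TRBE_ERRATUM_OVERWRITE_FILL_MODE,
	TRBE_ERRATUM_TSB_FLUSH,
	TRBE_ERRATA_MAX,
};

/* Hand-picked list: local index -> system-wide capability. */
static const int trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
	[TRBE_ERRATUM_OVERWRITE_FILL_MODE] = CAP_TRBE_OVERWRITE_FILL_MODE,
	[TRBE_ERRATUM_TSB_FLUSH] = CAP_TSB_FLUSH_FAILURE,
};

struct trbe_cpudata {
	unsigned long errata;	/* one bit per local erratum index */
};

/* Stand-in for the per-CPU capability query. */
typedef bool (*cap_query_fn)(int cap);

/* Query each listed capability once and cache the result locally. */
static void trbe_check_errata(struct trbe_cpudata *cpudata, cap_query_fn has_cap)
{
	int i;

	assert(cpudata != NULL);
	cpudata->errata = 0;
	for (i = 0; i < TRBE_ERRATA_MAX; i++) {
		if (has_cap(trbe_errata_cpucaps[i]))
			cpudata->errata |= 1UL << i;
	}
}

/* Cheap lookup against the cached per-instance bits. */
static bool trbe_has_erratum(const struct trbe_cpudata *cpudata, int i)
{
	return i >= 0 && i < TRBE_ERRATA_MAX && (cpudata->errata & (1UL << i));
}

/* Hypothetical CPU affected only by the FILL mode erratum. */
static bool fake_cpu_has_cap(int cap)
{
	return cap == CAP_TRBE_OVERWRITE_FILL_MODE;
}
```

Calling trbe_check_errata(&cpudata, fake_cpu_has_cap) once and then trbe_has_erratum(&cpudata, TRBE_ERRATUM_TSB_FLUSH) on the hot path avoids repeating the capability query; an out-of-range index simply reports false.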
In the previous patch set for fixing CoreSight snapshot mode [1], the
patch for the perf tool has been merged into the mainline kernel [2]; the
other two patches, for the CoreSight driver, were left out.
This patch series resends those two patches; the commits for
patches 01 and 02 have been updated with minor improvements.
This patch series has been tested on an Arm64 Juno board.
Changes from v2:
- Minor improvements to the commits for patches 01 and 02.
[1] https://lore.kernel.org/lkml/20210701093537.90759-1-leo.yan@linaro.org/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Leo Yan (2):
coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
coresight: Update comments for removing cs_etm_find_snapshot()
drivers/hwtracing/coresight/coresight-etb10.c | 2 +-
drivers/hwtracing/coresight/coresight-tmc-etf.c | 2 +-
drivers/hwtracing/coresight/coresight-tmc-etr.c | 12 ++++--------
3 files changed, 6 insertions(+), 10 deletions(-)
--
2.25.1
Changes since v1:
* Re-implement with a new magic number instead of piggybacking on ETMv4
* Improve comments and function name around cs_etm_decoder__get_etmv4_arch_ver()
* Add a warning for unrecognised magic numbers
* Split typo fix into new commit
* Add Leo's reviewed-by tags
* Create a new struct for ETE config (cs_ete_trace_params) instead of re-using ETMv4 config
Applies to perf/core f3c33cbd922
Also available at https://gitlab.arm.com/linux-arm/linux-jc/-/tree/james-ete-v2
James Clark (9):
perf cs-etm: Refactor initialisation of decoder params.
perf cs-etm: Initialise architecture based on TRCIDR1
perf cs-etm: Refactor out ETMv4 header saving
perf cs-etm: Save TRCDEVARCH register
perf cs-etm: Fix typo
perf cs-etm: Update OpenCSD decoder for ETE
perf cs-etm: Create ETE decoder
perf cs-etm: Print the decoder name
perf cs-etm: Show a warning for an unknown magic number
tools/build/feature/test-libopencsd.c | 4 +-
tools/perf/arch/arm/util/cs-etm.c | 97 ++++++++----
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 148 ++++++++----------
.../perf/util/cs-etm-decoder/cs-etm-decoder.h | 13 ++
tools/perf/util/cs-etm.c | 43 ++++-
tools/perf/util/cs-etm.h | 10 ++
6 files changed, 200 insertions(+), 115 deletions(-)
--
2.28.0
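As a rough illustration of the magic-number approach listed in the changes above, the sketch below selects a decoder name from a per-trace magic value and warns when the value is not recognised. It is only a sketch under stated assumptions: the TOY_MAGIC_* constants and toy_decoder_name() are hypothetical names with placeholder values, not the definitions used by perf.

```c
#include <stdint.h>
#include <stdio.h>

/* Placeholder magic values, chosen only for this illustration. */
#define TOY_MAGIC_ETMV3	0x1111111111111111ULL
#define TOY_MAGIC_ETMV4	0x2222222222222222ULL
#define TOY_MAGIC_ETE	0x3333333333333333ULL

/*
 * Map a magic number to a decoder name. ETE gets its own value
 * rather than sharing the ETMv4 one, and an unrecognised value
 * produces a warning and a NULL result.
 */
static const char *toy_decoder_name(uint64_t magic)
{
	switch (magic) {
	case TOY_MAGIC_ETMV3:
		return "ETMv3";
	case TOY_MAGIC_ETMV4:
		return "ETMv4";
	case TOY_MAGIC_ETE:
		return "ETE";
	default:
		fprintf(stderr, "warning: unknown magic number 0x%016llx\n",
			(unsigned long long)magic);
		return NULL;
	}
}
```

Giving ETE a distinct value keeps the ETMv4 path untouched and lets a reader of older trace files fail loudly on a value it does not know instead of decoding with the wrong configuration.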
Hello everyone,
I'm interested in using CoreSight on a DragonBoard 410c. My preference
is to perform trace acquisition and decoding using perf tools. I
successfully cross-compiled the Linaro Linux release 21.03 [1] and
built a boot image with CoreSight components enabled. Then I tried to
figure out trace decoding, which I prefer to perform on a host
machine. I successfully compiled OpenCSD on my Ubuntu host using the
instructions in [2], but I got errors when I tried to compile perf
with OpenCSD.
Then I moved to the other option, which was trace decoding on the board.
I successfully compiled OpenCSD on my DragonBoard 410c. Since the free
space on the eMMC is less than the size of the kernel source, I stored
the kernel source on a MicroSD card and tried to compile perf from
the MicroSD card. But I got the following error both when compiling perf
with OpenCSD (make -C tools/perf VF=1 CORESIGHT=1) and when
compiling it standalone (make -C tools/perf):
linaro@linaro-alip:/media/linaro/mymicrosd/kernel$ make -C tools/perf VF=1 CORESIGHT=1
make: Entering directory '/media/linaro/mymicrosd/kernel/tools/perf'
BUILD: Doing 'make -j4' parallel build
make[1]: ./check-headers.sh: Permission denied
make[1]: *** [Makefile.perf:232: sub-make] Error 127
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/media/linaro/mymicrosd/kernel/tools/perf'
To conclude, my problem regarding trace acquisition is that I cannot
compile perf on the DragonBoard 410c, and my problem regarding trace
decoding is that I cannot compile perf with OpenCSD either on an Ubuntu
host or on the board.
Any help is greatly appreciated.
Regards,
Farzam
[1] https://releases.linaro.org/96boards/dragonboard410c/linaro/debian/21.03/
[2] https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md
Hi Tao,
Apologies for the late reply - this patch fell through the cracks.
On Thu, Aug 19, 2021 at 05:29:37PM +0800, Tao Zhang wrote:
> The input parameter of the function pm_runtime_put should be the
> same in the function cti_enable_hw and cti_disable_hw. The correct
> parameter to use here should be dev->parent.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/coresight-cti-core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-cti-core.c b/drivers/hwtracing/coresight/coresight-cti-core.c
> index e2a3620..8988b2e 100644
> --- a/drivers/hwtracing/coresight/coresight-cti-core.c
> +++ b/drivers/hwtracing/coresight/coresight-cti-core.c
> @@ -175,7 +175,7 @@ static int cti_disable_hw(struct cti_drvdata *drvdata)
> coresight_disclaim_device_unlocked(csdev);
> CS_LOCK(drvdata->base);
> spin_unlock(&drvdata->spinlock);
> - pm_runtime_put(dev);
> + pm_runtime_put(dev->parent);
You are correct - I have added this patch to my next tree.
Thanks,
Mathieu
> return 0;
>
> /* not disabled this call */
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
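The imbalance described in this patch can be shown with a toy model. This is a hedged userspace sketch, not kernel code: struct toy_device and the toy_* helpers are hypothetical stand-ins for a device and the runtime PM get/put calls, and the counter is allowed to go negative purely to make the mismatch visible. It assumes, as the commit message states, that the enable path takes its reference on dev->parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device with a runtime PM style usage counter. */
struct toy_device {
	struct toy_device *parent;
	int usage_count;
};

static void toy_pm_get(struct toy_device *dev)
{
	assert(dev != NULL);
	dev->usage_count++;
}

static void toy_pm_put(struct toy_device *dev)
{
	assert(dev != NULL);
	dev->usage_count--;
}

/* Enable path: the reference is taken on the parent. */
static void toy_enable_hw(struct toy_device *dev)
{
	toy_pm_get(dev->parent);
}

/* Before the fix: the reference is dropped on the wrong device. */
static void toy_disable_hw_buggy(struct toy_device *dev)
{
	toy_pm_put(dev);
}

/* After the fix: the put matches the get. */
static void toy_disable_hw_fixed(struct toy_device *dev)
{
	toy_pm_put(dev->parent);
}
```

In this model an enable followed by the buggy disable leaves the parent's count at 1 and the child's at -1, so the parent would never be allowed to suspend; with the fixed disable both counts return to 0.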
When building the perf tool with the option 'CORESIGHT=1' passed explicitly,
if the feature test for the libopencsd library fails, the build does not
report the failure; instead it continues to build the tool with
the CoreSight feature disabled.
This patch changes that behaviour: when the perf tool is built with
the option 'CORESIGHT=1' and the libopencsd feature test fails, the
build process is aborted and an error message is shown.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
Changes from v1:
Fixed a typo in the error message.
tools/perf/Makefile.config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 4a0d9a6defc7..5df79538486b 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -489,6 +489,8 @@ ifdef CORESIGHT
CFLAGS += -DCS_RAW_PACKED
endif
endif
+ else
+ dummy := $(error Error: No libopencsd library found or the version is not up-to-date. Please install recent libopencsd to build with CORESIGHT=1)
endif
endif
--
2.25.1