[PATCH v2 00/17] arm64: Self-hosted trace related errata workarounds


Suzuki K Poulose

21 Sep 2021

1:41 p.m.

This series adds CPU erratum workarounds related to self-hosted tracing. The errata handled in this series are:

 * TRBE may overwrite trace in FILL mode
   - Arm Neoverse-N2 #2139208
   - Cortex-A710 #2119858

 * A TSB instruction may not flush the trace completely when executed
   in trace prohibited region.
   - Arm Neoverse-N2 #2067961
   - Cortex-A710 #2054223

 * TRBE may write to out-of-range address
   - Arm Neoverse-N2 #2253138
   - Cortex-A710 #2224489

The series applies on the self-hosted/trbe fixes posted here [0]. A tree containing both series is available here [1].

 [0] https://lkml.kernel.org/r/20210914102641.1852544-1-suzuki.poulose@arm.com
 [1] git@git.gitlab.arm.com:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v2

Changes since v1:
 https://lkml.kernel.org/r/20210728135217.591173-1-suzuki.poulose@arm.com
 - Added a fix to the TRBE driver handling of sink_specific data
 - Added more description and ASCII art for overwrite in FILL mode
   work around
 - Added another TRBE erratum to the list.
   "TRBE may write to out-of-range address"
   Patches from 12-17
 - Added comment to list the expectations around TSB erratum workaround.

Suzuki K Poulose (17):
  coresight: trbe: Fix incorrect access of the sink specific data
  coresight: trbe: Add infrastructure for Errata handling
  coresight: trbe: Add a helper to calculate the trace generated
  coresight: trbe: Add a helper to pad a given buffer area
  coresight: trbe: Decouple buffer base from the hardware base
  coresight: trbe: Allow driver to choose a different alignment
  arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
  arm64: Add erratum detection for TRBE overwrite in FILL mode
  coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  arm64: Enable workaround for TRBE overwrite in FILL mode
  arm64: errata: Add workaround for TSB flush failures
  coresight: trbe: Add a helper to fetch cpudata from perf handle
  coresight: trbe: Add a helper to determine the minimum buffer size
  coresight: trbe: Make sure we have enough space
  arm64: Add erratum detection for TRBE write to out-of-range
  coresight: trbe: Work around write to out of range
  arm64: Advertise TRBE erratum workaround for write to out-of-range address

 Documentation/arm64/silicon-errata.rst       |  12 +
 arch/arm64/Kconfig                           | 109 ++++++
 arch/arm64/include/asm/barrier.h             |  16 +-
 arch/arm64/include/asm/cputype.h             |   4 +
 arch/arm64/kernel/cpu_errata.c               |  64 ++++
 arch/arm64/tools/cpucaps                     |   3 +
 drivers/hwtracing/coresight/coresight-trbe.c | 339 +++++++++++++++++--
 7 files changed, 510 insertions(+), 37 deletions(-)

-- 2.24.1


Suzuki K Poulose

21 Sep

1:41 p.m.

New subject: [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data

The TRBE driver wrongly treats the aux private data as the TRBE driver specific buffer for a given perf handle, while it is the ETM PMU's event specific data. Fix this by correcting the instance to use appropriate helper.

Fixes: 3fbf7f011f242 ("coresight: sink: Add TRBE driver")
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index d4c57aed05e5..e3d73751d568 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -363,7 +363,7 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
 
 static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
 {
-	struct trbe_buf *buf = perf_get_aux(handle);
+	struct trbe_buf *buf = etm_perf_sink_config(handle);
 	u64 limit = __trbe_normal_offset(handle);
 	u64 head = PERF_IDX2OFF(handle->head, buf);
 

-- 2.24.1

Mathieu Poirier

30 Sep

5:57 p.m.

New subject: [PATCH v2 01/17] coresight: trbe: Fix incorrect access of the sink specific data

On Tue, Sep 21, 2021 at 02:41:05PM +0100, Suzuki K Poulose wrote:

> The TRBE driver wrongly treats the aux private data as the TRBE driver specific buffer for a given perf handle, while it is the ETM PMU's event specific data. Fix this by correcting the instance to use appropriate helper.
>
> Fixes: 3fbf7f011f242 ("coresight: sink: Add TRBE driver")
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index d4c57aed05e5..e3d73751d568 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -363,7 +363,7 @@ static unsigned long __trbe_normal_offset(struct perf_output_handle *handle)
>  static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
>  {
> -	struct trbe_buf *buf = perf_get_aux(handle);
> +	struct trbe_buf *buf = etm_perf_sink_config(handle);

I really wonder how things got to work before...

I have fixed the 13-character SHA in the "Fixes" tag and added this patch to my local tree. More comments tomorrow.

Thanks,
Mathieu

>  	u64 limit = __trbe_normal_offset(handle);
>  	u64 head = PERF_IDX2OFF(handle->head, buf);
> --
> 2.24.1

Suzuki K Poulose

21 Sep

1:41 p.m.

New subject: [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling

Add a minimal infrastructure to keep track of the errata affecting the given TRBE instance. Given that we have heterogeneous CPUs, we have to manage the list per-TRBE instance to be able to apply the work around as needed.

We rely on the arm64 errata framework for the actual description and the discovery of a given erratum, to keep the erratum workaround at a central place and benefit from the code and the advertisement from the kernel. We use a local mapping of the erratum to avoid bloating up the individual TRBE structures, i.e., each arm64 TRBE erratum bit is assigned a new number within the driver to track. Each TRBE instance updates the list of errata affecting it at probe time on the CPU. This makes sure that we can easily access the list of errata on a given TRBE instance without much overhead.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
Changes since v1:
 - Flip the order of args for trbe_has_erratum()
 - Move erratum detection further down in the sequence
---
 drivers/hwtracing/coresight/coresight-trbe.c | 49 ++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index e3d73751d568..63f7edd5fd1f 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -16,6 +16,8 @@
 #define pr_fmt(fmt) DRVNAME ": " fmt
 
 #include <asm/barrier.h>
+#include <asm/cputype.h>
+
 #include "coresight-self-hosted-trace.h"
 #include "coresight-trbe.h"
 
@@ -65,6 +67,35 @@ struct trbe_buf {
 	struct trbe_cpudata *cpudata;
 };
 
+/*
+ * TRBE erratum list
+ *
+ * We rely on the corresponding cpucaps to be defined for a given
+ * TRBE erratum. We map the given cpucap into a TRBE internal number
+ * to make the tracking of the errata lean.
+ *
+ * This helps in :
+ *   - Not duplicating the detection logic
+ *   - Streamlined detection of erratum across the system
+ *
+ * Since the erratum work arounds could be applied individually
+ * per TRBE instance, we keep track of the list of errata that
+ * affects the given instance of the TRBE.
+ */
+#define TRBE_ERRATA_MAX 0
+
+static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
+};
+
+/*
+ * struct trbe_cpudata: TRBE instance specific data
+ * @trbe_flag - TRBE dirty/access flag support
+ * @tbre_align - Actual TRBE alignment required for TRBPTR_EL1.
+ * @cpu - CPU this TRBE belongs to.
+ * @mode - Mode of current operation. (perf/disabled)
+ * @drvdata - TRBE specific drvdata
+ * @errata - Bit map for the errata on this TRBE.
+ */
 struct trbe_cpudata {
 	bool trbe_flag;
 	u64 trbe_align;
@@ -72,6 +103,7 @@ struct trbe_cpudata {
 	enum cs_mode mode;
 	struct trbe_buf *buf;
 	struct trbe_drvdata *drvdata;
+	DECLARE_BITMAP(errata, TRBE_ERRATA_MAX);
 };
 
 struct trbe_drvdata {
@@ -84,6 +116,21 @@ struct trbe_drvdata {
 	struct platform_device *pdev;
 };
 
+static void trbe_check_errata(struct trbe_cpudata *cpudata)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(trbe_errata_cpucaps); i++) {
+		if (this_cpu_has_cap(trbe_errata_cpucaps[i]))
+			set_bit(i, cpudata->errata);
+	}
+}
+
+static inline bool trbe_has_erratum(struct trbe_cpudata *cpudata, int i)
+{
+	return (i < TRBE_ERRATA_MAX) && test_bit(i, cpudata->errata);
+}
+
 static int trbe_alloc_node(struct perf_event *event)
 {
 	if (event->cpu == -1)
@@ -926,6 +973,8 @@ static void arm_trbe_probe_cpu(void *info)
 		pr_err("Unsupported alignment on cpu %d\n", cpu);
 		goto cpu_clear;
 	}
+
+	trbe_check_errata(cpudata);
 	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
 	cpudata->cpu = cpu;
 	cpudata->drvdata = drvdata;
 

-- 2.24.1
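
For readers following along outside the kernel tree, the per-instance tracking described in patch 02 can be approximated in plain userspace C. This is only a sketch under stated assumptions: the capability numbers, the enum names, `struct cpu_instance`, and the `cpu_has_cap()` stand-in are all hypothetical, and a single `unsigned long` replaces the kernel bitmap helpers. It shows the idea of mapping sparse system-wide capability numbers onto small driver-local indices and recording them once per instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical system-wide capability numbers (stand-ins for arm64 cpucaps). */
enum { CAP_TRBE_OVERWRITE_FILL = 41, CAP_TRBE_WRITE_OUT_OF_RANGE = 57 };

/* Hypothetical driver-local indices: small and dense, so one word holds them. */
enum { ERR_OVERWRITE_FILL, ERR_WRITE_OUT_OF_RANGE, ERRATA_MAX };

/* Local index -> system-wide capability number. */
static const int errata_caps[ERRATA_MAX] = {
	[ERR_OVERWRITE_FILL] = CAP_TRBE_OVERWRITE_FILL,
	[ERR_WRITE_OUT_OF_RANGE] = CAP_TRBE_WRITE_OUT_OF_RANGE,
};

struct cpu_instance {
	unsigned long errata;	/* one bit per driver-local erratum index */
};

/* Stand-in for a per-CPU capability query: the caller lists the detected caps. */
static bool cpu_has_cap(const int *caps, size_t ncaps, int cap)
{
	for (size_t i = 0; i < ncaps; i++)
		if (caps[i] == cap)
			return true;
	return false;
}

/* Run once per instance (probe time) to cache the errata that affect it. */
static void check_errata(struct cpu_instance *inst, const int *caps, size_t ncaps)
{
	inst->errata = 0;
	for (int i = 0; i < ERRATA_MAX; i++)
		if (cpu_has_cap(caps, ncaps, errata_caps[i]))
			inst->errata |= 1UL << i;
}

/* Cheap lookup on the hot path: a range check plus a bit test. */
static bool has_erratum(const struct cpu_instance *inst, int i)
{
	return i >= 0 && i < ERRATA_MAX && (inst->errata & (1UL << i));
}
```

On a heterogeneous system each instance would call `check_errata()` with its own CPU's capabilities, so two instances can end up with different bitmaps, which is the reason the list is kept per instance rather than globally.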

Mathieu Poirier

5 Oct

4:46 p.m.

New subject: [PATCH v2 02/17] coresight: trbe: Add infrastructure for Errata handling

On Tue, Sep 21, 2021 at 02:41:06PM +0100, Suzuki K Poulose wrote:

> Add a minimal infrastructure to keep track of the errata affecting the given TRBE instance. Given that we have heterogeneous CPUs, we have to manage the list per-TRBE instance to be able to apply the work around as needed.
>
> We rely on the arm64 errata framework for the actual description and the discovery of a given erratum, to keep the Erratum work around at a central place and benefit from the code and the advertisement from the kernel. We use a local mapping of the erratum to avoid bloating up the individual TRBE structures. i.e, each arm64 TRBE erratum bit is assigned a new number within the driver to track. Each trbe instance updates the list of affected erratum at probe time on the CPU. This makes sure that we can easily access the list of errata on a given TRBE instance without much overhead.
>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
> Changes since v1:
>  - Flip the order of args for trbe_has_erratum()
>  - Move erratum detection further down in the sequence
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 49 ++++++++++++++++++++
>  1 file changed, 49 insertions(+)

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>


Suzuki K Poulose

21 Sep

1:41 p.m.

New subject: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

We collect the trace from the TRBE on FILL event from IRQ context and when via update_buffer(), when the event is stopped. Let us consolidate how we calculate the trace generated into a helper.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 63f7edd5fd1f..063c4505a203 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
 	return TRBE_FAULT_ACT_SPURIOUS;
 }
 
+static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
+			 struct trbe_buf *buf,
+			 bool wrap)
+{
+	u64 write;
+	u64 start_off, end_off;
+
+	/*
+	 * If the TRBE has wrapped around the write pointer has
+	 * wrapped and should be treated as limit.
+	 */
+	if (wrap)
+		write = get_trbe_limit_pointer();
+	else
+		write = get_trbe_write_pointer();
+
+	end_off = write - buf->trbe_base;
+	start_off = PERF_IDX2OFF(handle->head, buf);
+
+	if (WARN_ON_ONCE(end_off < start_off))
+		return 0;
+	return (end_off - start_off);
+}
+
 static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
 				   struct perf_event *event, void **pages,
 				   int nr_pages, bool snapshot)
@@ -588,9 +612,9 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 	struct trbe_cpudata *cpudata = dev_get_drvdata(&csdev->dev);
 	struct trbe_buf *buf = config;
 	enum trbe_fault_action act;
-	unsigned long size, offset;
-	unsigned long write, base, status;
+	unsigned long size, status;
 	unsigned long flags;
+	bool wrap = false;
 
 	WARN_ON(buf->cpudata != cpudata);
 	WARN_ON(cpudata->cpu != smp_processor_id());
@@ -630,8 +654,6 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 	 * handle gets freed in etm_event_stop().
 	 */
 	trbe_drain_and_disable_local();
-	write = get_trbe_write_pointer();
-	base = get_trbe_base_pointer();
 
 	/* Check if there is a pending interrupt and handle it here */
 	status = read_sysreg_s(SYS_TRBSR_EL1);
@@ -655,20 +677,11 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 			goto done;
 		}
 
-		/*
-		 * Otherwise, the buffer is full and the write pointer
-		 * has reached base. Adjust this back to the Limit pointer
-		 * for correct size. Also, mark the buffer truncated.
-		 */
-		write = get_trbe_limit_pointer();
 		perf_aux_output_flag(handle, PERF_AUX_FLAG_COLLISION);
+		wrap = true;
 	}
 
-	offset = write - base;
-	if (WARN_ON_ONCE(offset < PERF_IDX2OFF(handle->head, buf)))
-		size = 0;
-	else
-		size = offset - PERF_IDX2OFF(handle->head, buf);
+	size = trbe_get_trace_size(handle, buf, wrap);
 
 done:
 	local_irq_restore(flags);
@@ -749,11 +762,10 @@ static int trbe_handle_overflow(struct perf_output_handle *handle)
 {
 	struct perf_event *event = handle->event;
 	struct trbe_buf *buf = etm_perf_sink_config(handle);
-	unsigned long offset, size;
+	unsigned long size;
 	struct etm_event_data *event_data;
 
-	offset = get_trbe_limit_pointer() - get_trbe_base_pointer();
-	size = offset - PERF_IDX2OFF(handle->head, buf);
+	size = trbe_get_trace_size(handle, buf, true);
 	if (buf->snapshot)
 		handle->head += size;
 

-- 2.24.1
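
The arithmetic that patch 03 consolidates can be tried out in userspace with a small model. This is a sketch, not the driver code: `struct ring`, `idx_to_off()` and `trace_size()` are hypothetical names, the hardware pointers are plain fields, and `idx_to_off()` assumes that PERF_IDX2OFF() reduces the perf index modulo the ring buffer size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the buffer state; addresses are plain integers. */
struct ring {
	uint64_t base;	/* address of the start of the ring buffer */
	uint64_t limit;	/* address one past the end of the enabled area */
	uint64_t write;	/* current write pointer */
	uint64_t size;	/* ring buffer size in bytes */
};

/* Assumed behaviour of PERF_IDX2OFF(): fold a growing index into the ring. */
static uint64_t idx_to_off(uint64_t idx, const struct ring *r)
{
	return idx % r->size;
}

/*
 * Bytes of trace between the perf head and the end of the written data.
 * On a wrap the write pointer has gone back to the base, so the limit
 * pointer is used as the end of the data instead.
 */
static uint64_t trace_size(const struct ring *r, uint64_t head, bool wrap)
{
	uint64_t end = wrap ? r->limit : r->write;
	uint64_t end_off = end - r->base;
	uint64_t start_off = idx_to_off(head, r);

	if (end_off < start_off)	/* inconsistent state: report nothing */
		return 0;
	return end_off - start_off;
}
```

Both callers in the patch then differ only in the `wrap` argument: the overflow handler always passes true, while the update path passes true only when it finds a pending wrap.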

Mathieu Poirier

30 Sep

5:54 p.m.

New subject: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

Hi Suzuki,

On Tue, Sep 21, 2021 at 02:41:07PM +0100, Suzuki K Poulose wrote:

> We collect the trace from the TRBE on FILL event from IRQ context and when via update_buffer(), when the event is stopped. Let us

s/"and when via"/"and via"

> consolidate how we calculate the trace generated into a helper.
>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Mike Leach <mike.leach@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/coresight-trbe.c | 48 ++++++++++++--------
>  1 file changed, 30 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
> index 63f7edd5fd1f..063c4505a203 100644
> --- a/drivers/hwtracing/coresight/coresight-trbe.c
> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
> @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>  	return TRBE_FAULT_ACT_SPURIOUS;
>  }
>
> +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
> +			 struct trbe_buf *buf,
> +			 bool wrap)

Stacking

> +{
> +	u64 write;
> +	u64 start_off, end_off;
> +
> +	/*
> +	 * If the TRBE has wrapped around the write pointer has
> +	 * wrapped and should be treated as limit.
> +	 */
> +	if (wrap)
> +		write = get_trbe_limit_pointer();
> +	else
> +		write = get_trbe_write_pointer();
> +
> +	end_off = write - buf->trbe_base;

In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is acquired using get_trbe_base_pointer() but here it is referenced directly - any reason for that? It certainly makes reviewing this simple patch quite difficult because I keep wondering if I am missing something subtle...

> +	start_off = PERF_IDX2OFF(handle->head, buf);
> +
> +	if (WARN_ON_ONCE(end_off < start_off))
> +		return 0;
> +	return (end_off - start_off);
> +}


Suzuki K Poulose

1 Oct

8:36 a.m.

New subject: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

On 30/09/2021 18:54, Mathieu Poirier wrote:

>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 63f7edd5fd1f..063c4505a203 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -527,6 +527,30 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>>  	return TRBE_FAULT_ACT_SPURIOUS;
>>  }
>>
>> +static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>> +			 struct trbe_buf *buf,
>> +			 bool wrap)
>
> Stacking

Ack

>> +{
>> +	u64 write;
>> +	u64 start_off, end_off;
>> +
>> +	/*
>> +	 * If the TRBE has wrapped around the write pointer has
>> +	 * wrapped and should be treated as limit.
>> +	 */
>> +	if (wrap)
>> +		write = get_trbe_limit_pointer();
>> +	else
>> +		write = get_trbe_write_pointer();
>> +
>> +	end_off = write - buf->trbe_base;
>
> In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is acquired using get_trbe_base_pointer() but here it is referenced directly - any reason for that? It certainly makes reviewing this simple patch quite difficult because I keep wondering if I am missing something subtle...

Very good observation. So far, we have always programmed the TRBBASER with the VA(ring_buffer[0]). Thus, reading the BASER and using buf->trbe_base are equivalent and both are fine.

But going forward, we are going to use different values for the TRBBASER to work around an erratum. Thus, when computing the "offsets" within the ring buffer, it is always correct to use this field. I could move this to the patch where the work around is introduced, and put in a comment there.

Thanks for the review

Suzuki

Mathieu Poirier

3:15 p.m.

New subject: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

On Fri, Oct 01, 2021 at 09:36:24AM +0100, Suzuki K Poulose wrote:

>>> +	end_off = write - buf->trbe_base;
>>
>> In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is acquired using get_trbe_base_pointer() but here it is referenced directly - any reason for that? It certainly makes reviewing this simple patch quite difficult because I keep wondering if I am missing something subtle...
>
> Very good observation. So far, we always programmed the TRBBASER with the VA(ring_buffer[0]). And thus reading the BASER and using the buf->trbe_base is all fine.
>
> But going forward, we are going to use different values for the TRBBASER to work around erratum. Thus to make the computation of the "offsets" within the ring buffer, it is always correct to use this field. I could move this to the patch where the work around is introduced, and put in a comment there.

That will be greatly appreciated.

> Thanks for the review
>
> Suzuki

Suzuki K Poulose

3:22 p.m.

New subject: [PATCH v2 03/17] coresight: trbe: Add a helper to calculate the trace generated

On 01/10/2021 16:15, Mathieu Poirier wrote:

>>>> +	end_off = write - buf->trbe_base;
>>>
>>> In both arm_trbe_alloc_buffer() and trbe_handle_overflow() the base address is acquired using get_trbe_base_pointer() but here it is referenced directly - any reason for that? It certainly makes reviewing this simple patch quite difficult because I keep wondering if I am missing something subtle...
>>
>> Very good observation. So far, we always programmed the TRBBASER with the VA(ring_buffer[0]). And thus reading the BASER and using the buf->trbe_base is all fine.
>>
>> But going forward, we are going to use different values for the TRBBASER to work around erratum. Thus to make the computation of the "offsets" within the ring buffer, it is always correct to use this field. I could move this to the patch where the work around is introduced, and put in a comment there.
>
> That will be greatly appreciated.

I have moved this to the patch that introduces the concept of the TRBE using a BASE address different from the beginning of the ring buffer.

Thanks
Suzuki

Suzuki K Poulose

21 Sep

1:41 p.m.

New subject: [PATCH v2 04/17] coresight: trbe: Add a helper to pad a given buffer area

Refactor the helper to pad a given AUX buffer area to allow "filling" ignore packets, without moving any handle pointers. This will be useful in working around errata, where we may have to fill the buffer after a session.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 063c4505a203..a32ef083aa36 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -227,12 +227,18 @@ static void trbe_stop_and_truncate_event(struct perf_output_handle *handle)
  * consumed from the user space. The enabled TRBE buffer area is a moving subset of
  * the allocated perf auxiliary buffer.
  */
+
+static void __trbe_pad_buf(struct trbe_buf *buf, u64 offset, int len)
+{
+	memset((void *)buf->trbe_base + offset, ETE_IGNORE_PACKET, len);
+}
+
 static void trbe_pad_buf(struct perf_output_handle *handle, int len)
 {
 	struct trbe_buf *buf = etm_perf_sink_config(handle);
 	u64 head = PERF_IDX2OFF(handle->head, buf);
 
-	memset((void *)buf->trbe_base + head, ETE_IGNORE_PACKET, len);
+	__trbe_pad_buf(buf, head, len);
 	if (!buf->snapshot)
 		perf_aux_output_skip(handle, len);
 }
 

-- 2.24.1
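
The split in patch 04 between a raw fill and a fill that also advances the handle can be modelled in userspace as follows. This is a rough sketch with hypothetical names (`struct buf`, `pad_range()`, `pad_at_head()`); the value 0x70 for the ignore packet is an assumption about ETE_IGNORE_PACKET, and advancing a `head` field stands in for the perf_aux_output_skip() call made in the non-snapshot case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IGNORE_PACKET 0x70	/* assumed value of ETE_IGNORE_PACKET */

/* Hypothetical model of the AUX buffer and the consumer-visible position. */
struct buf {
	unsigned char *base;
	size_t size;
	size_t head;
};

/* Low-level helper: fill [offset, offset + len) and touch nothing else. */
static void pad_range(struct buf *b, size_t offset, size_t len)
{
	assert(offset <= b->size && len <= b->size - offset);
	memset(b->base + offset, IGNORE_PACKET, len);
}

/* Wrapper: pad at the current head, then skip past the padding. */
static void pad_at_head(struct buf *b, size_t len)
{
	pad_range(b, b->head, len);
	b->head += len;
}
```

Keeping the low-level helper free of side effects on the handle is what lets a later workaround fill an arbitrary region after a session without disturbing the position reported to perf.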

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 05/17] coresight: trbe: Decouple buffer base from the hardware base

We always set the TRBBASER_EL1 to the base of the virtual ring buffer. We are about to change this for working around an erratum. So, in preparation for that, allow the driver to choose a different base for the TRBBASER_EL1 (which must lie within the buffer range).

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index a32ef083aa36..27616eac24ba 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -59,6 +59,8 @@ struct trbe_buf {
 	 * trbe_limit sibling pointers.
 	 */
 	unsigned long trbe_base;
+	/* The base programmed into the TRBE */
+	unsigned long trbe_hw_base;
 	unsigned long trbe_limit;
 	unsigned long trbe_write;
 	int nr_pages;
@@ -498,12 +500,13 @@ static void set_trbe_limit_pointer_enabled(unsigned long addr)
 
 static void trbe_enable_hw(struct trbe_buf *buf)
 {
-	WARN_ON(buf->trbe_write < buf->trbe_base);
+	WARN_ON(buf->trbe_hw_base < buf->trbe_base);
+	WARN_ON(buf->trbe_write < buf->trbe_hw_base);
 	WARN_ON(buf->trbe_write >= buf->trbe_limit);
 	set_trbe_disabled();
 	isb();
 	clr_trbe_status();
-	set_trbe_base_pointer(buf->trbe_base);
+	set_trbe_base_pointer(buf->trbe_hw_base);
 	set_trbe_write_pointer(buf->trbe_write);
 
 	/*
@@ -707,6 +710,8 @@ static int __arm_trbe_enable(struct trbe_buf *buf,
 		trbe_stop_and_truncate_event(handle);
 		return -ENOSPC;
 	}
+	/* Set the base of the TRBE to the buffer base */
+	buf->trbe_hw_base = buf->trbe_base;
 	*this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
 	trbe_enable_hw(buf);
 	return 0;
@@ -804,7 +809,7 @@ static bool is_perf_trbe(struct perf_output_handle *handle)
 	struct trbe_drvdata *drvdata = cpudata->drvdata;
 	int cpu = smp_processor_id();
 
-	WARN_ON(buf->trbe_base != get_trbe_base_pointer());
+	WARN_ON(buf->trbe_hw_base != get_trbe_base_pointer());
 	WARN_ON(buf->trbe_limit != get_trbe_limit_pointer());
 
 	if (cpudata->mode != CS_MODE_PERF)
 

-- 2.24.1

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 06/17] coresight: trbe: Allow driver to choose a different alignment

The TRBE hardware mandates a minimum alignment for the TRBPTR_EL1, advertised via the TRBIDR_EL1. This is used by the driver to align the buffer write head. This patch allows the driver to choose a different alignment from that of the hardware, by decoupling the alignment tracking. This will be useful for working around errata.
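A rough sketch of the decoupling described above, assuming a simplified per-CPU structure: the hardware alignment is derived from the TRBIDR_EL1 alignment field, and the software alignment starts out as a copy of it. The names `demo_cpudata`, `demo_probe_align()` and `demo_align_up()` are hypothetical; only the `1ULL << field` computation and the 2K upper bound come from the patch.

```c
#include <stdint.h>

#define DEMO_SZ_2K 2048ULL

/* Hypothetical, simplified stand-in for struct trbe_cpudata. */
struct demo_cpudata {
	uint64_t trbe_hw_align; /* alignment the hardware requires for TRBPTR_EL1 */
	uint64_t trbe_align;    /* alignment the driver actually uses */
};

/*
 * Derive both alignments from the alignment field of TRBIDR_EL1.
 * Returns 0 on success, -1 if the hardware alignment is unsupported.
 */
static inline int demo_probe_align(struct demo_cpudata *cpudata,
				   unsigned int align_field)
{
	uint64_t hw_align = 1ULL << align_field;

	if (hw_align > DEMO_SZ_2K)
		return -1;

	cpudata->trbe_hw_align = hw_align;
	/* Default: software alignment follows the hardware. */
	cpudata->trbe_align = hw_align;
	return 0;
}

/* Round addr up to a power-of-two alignment. */
static inline uint64_t demo_align_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}
```

Keeping two fields lets the sysfs `align` attribute keep reporting what the hardware advertises, while a later workaround is free to raise only `trbe_align`.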

Cc: Mathieu Poirier mathieu.poirier@linaro.org
Cc: Anshuman Khandual anshuman.khandual@arm.com
Cc: Mike Leach mike.leach@linaro.org
Cc: Leo Yan leo.yan@linaro.org
Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
---
 drivers/hwtracing/coresight/coresight-trbe.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 27616eac24ba..f569010c672b 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -92,7 +92,8 @@ static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
 /*
  * struct trbe_cpudata: TRBE instance specific data
  * @trbe_flag - TRBE dirty/access flag support
- * @tbre_align - Actual TRBE alignment required for TRBPTR_EL1.
+ * @trbe_hw_align - Actual TRBE alignment required for TRBPTR_EL1.
+ * @trbe_align - Software alignment used for the TRBPTR_EL1,
  * @cpu - CPU this TRBE belongs to.
  * @mode - Mode of current operation. (perf/disabled)
  * @drvdata - TRBE specific drvdata
@@ -100,6 +101,7 @@ static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
  */
 struct trbe_cpudata {
 	bool trbe_flag;
+	u64 trbe_hw_align;
 	u64 trbe_align;
 	int cpu;
 	enum cs_mode mode;
@@ -903,7 +905,7 @@ static ssize_t align_show(struct device *dev, struct device_attribute *attr, cha
 {
 	struct trbe_cpudata *cpudata = dev_get_drvdata(dev);

-	return sprintf(buf, "%llx\n", cpudata->trbe_align);
+	return sprintf(buf, "%llx\n", cpudata->trbe_hw_align);
 }
 static DEVICE_ATTR_RO(align);

@@ -991,13 +993,14 @@ static void arm_trbe_probe_cpu(void *info)
 		goto cpu_clear;
 	}

-	cpudata->trbe_align = 1ULL << get_trbe_address_align(trbidr);
-	if (cpudata->trbe_align > SZ_2K) {
+	cpudata->trbe_hw_align = 1ULL << get_trbe_address_align(trbidr);
+	if (cpudata->trbe_hw_align > SZ_2K) {
 		pr_err("Unsupported alignment on cpu %d\n", cpu);
 		goto cpu_clear;
 	}

 	trbe_check_errata(cpudata);
+	cpudata->trbe_align = cpudata->trbe_hw_align;
 	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
 	cpudata->cpu = cpu;
 	cpudata->drvdata = drvdata;

-- 2.24.1

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 07/17] arm64: Add Neoverse-N2, Cortex-A710 CPU part definition

Add the CPU part numbers for the new Arm designs.

Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Mark Rutland mark.rutland@arm.com
Cc: Will Deacon will@kernel.org
Acked-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
---
 arch/arm64/include/asm/cputype.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h
index 6231e1f0abe7..19b8441aa8f2 100644
--- a/arch/arm64/include/asm/cputype.h
+++ b/arch/arm64/include/asm/cputype.h
@@ -73,6 +73,8 @@
 #define ARM_CPU_PART_CORTEX_A76 0xD0B
 #define ARM_CPU_PART_NEOVERSE_N1 0xD0C
 #define ARM_CPU_PART_CORTEX_A77 0xD0D
+#define ARM_CPU_PART_CORTEX_A710 0xD47
+#define ARM_CPU_PART_NEOVERSE_N2 0xD49

 #define APM_CPU_PART_POTENZA 0x000

@@ -113,6 +115,8 @@
 #define MIDR_CORTEX_A76 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A76)
 #define MIDR_NEOVERSE_N1 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N1)
 #define MIDR_CORTEX_A77 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A77)
+#define MIDR_CORTEX_A710 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A710)
+#define MIDR_NEOVERSE_N2 MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_NEOVERSE_N2)
 #define MIDR_THUNDERX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX)
 #define MIDR_THUNDERX_81XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_81XX)
 #define MIDR_THUNDERX_83XX MIDR_CPU_MODEL(ARM_CPU_IMP_CAVIUM, CAVIUM_CPU_PART_THUNDERX_83XX)

-- 2.24.1

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 08/17] arm64: Add erratum detection for TRBE overwrite in FILL mode

Arm Neoverse-N2 and the Cortex-A710 cores are affected by a CPU erratum where the TRBE will overwrite the trace buffer in FILL mode. The TRBE doesn't stop (as expected in FILL mode) when it reaches the limit; instead it wraps to the base and continues writing up to 3 cache lines. This will overwrite any trace that was written previously.

Add the Neoverse-N2 erratum (#2139208) and Cortex-A710 erratum (#2119858) to the detection logic.

This will be used by the TRBE driver in later patches to work around the issue. The detection has been kept with the core arm64 errata framework list to make sure:
 - We don't duplicate the framework in the TRBE driver
 - The errata detection is advertised like the rest of the CPU errata.

Note that the Kconfig entries will be added after we have added the work around in the TRBE driver, which depends on the cpucap from here.
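For illustration, the "all versions" MIDR match that this detection relies on can be sketched in user space as below. The part numbers (0xD47, 0xD49) come from the previous patch and the field positions follow the architectural MIDR_EL1 layout; the `DEMO_*` macros and `demo_trbe_overwrite_fill_mode_affected()` are hypothetical names, not the kernel's helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 fields: implementer [31:24], architecture [19:16], part number [15:4]. */
#define DEMO_MIDR_MODEL_MASK ((0xffu << 24) | (0xfu << 16) | (0xfffu << 4))
#define DEMO_MIDR_MODEL(imp, part) \
	(((uint32_t)(imp) << 24) | (0xfu << 16) | ((uint32_t)(part) << 4))

#define DEMO_IMP_ARM           0x41
#define DEMO_PART_CORTEX_A710  0xD47
#define DEMO_PART_NEOVERSE_N2  0xD49

/*
 * Match on the CPU model only, ignoring the variant [23:20] and
 * revision [3:0] fields, so that every version of the part matches.
 */
static inline bool demo_trbe_overwrite_fill_mode_affected(uint32_t midr)
{
	static const uint32_t affected[] = {
		DEMO_MIDR_MODEL(DEMO_IMP_ARM, DEMO_PART_NEOVERSE_N2),
		DEMO_MIDR_MODEL(DEMO_IMP_ARM, DEMO_PART_CORTEX_A710),
	};
	uint32_t model = midr & DEMO_MIDR_MODEL_MASK;

	for (size_t i = 0; i < sizeof(affected) / sizeof(affected[0]); i++) {
		if (model == affected[i])
			return true;
	}
	return false;
}
```

Unlike this sketch, the table in the patch wraps each entry in its own `CONFIG_ARM64_ERRATUM_*` guard, so a part is only matched when its erratum option is enabled.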

Cc: Will Deacon will@kernel.org
Cc: Mark Rutland mark.rutland@arm.com
Cc: Anshuman Khandual anshuman.khandual@arm.com
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Mathieu Poirier mathieu.poirier@linaro.org
Cc: Mike Leach mike.leach@linaro.org
cc: Leo Yan leo.yan@linaro.org
Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com
Acked-by: Catalin Marinas catalin.marinas@arm.com
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
---
 arch/arm64/kernel/cpu_errata.c | 25 +++++++++++++++++++++++++
 arch/arm64/tools/cpucaps       |  1 +
 2 files changed, 26 insertions(+)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index e2c20c036442..ccd757373f36 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -340,6 +340,18 @@ static const struct midr_range erratum_1463225[] = {
 };
 #endif

+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2139208
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2119858
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+	{},
+};
+#endif /* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
+
 const struct arm64_cpu_capabilities arm64_errata[] = {
 #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
 	{
@@ -533,6 +545,19 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.capability = ARM64_WORKAROUND_NVIDIA_CARMEL_CNP,
 		ERRATA_MIDR_ALL_VERSIONS(MIDR_NVIDIA_CARMEL),
 	},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+	{
+		/*
+		 * The erratum work around is handled within the TRBE
+		 * driver and can be applied per-cpu. So, we can allow
+		 * a late CPU to come online with this erratum.
+		 */
+		.desc = "ARM erratum 2119858 or 2139208",
+		.capability = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
+		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
+		CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
+	},
 #endif
 	{
 	}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 49305c2e6dfd..1ccb92165bd8 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -53,6 +53,7 @@ WORKAROUND_1418040
 WORKAROUND_1463225
 WORKAROUND_1508412
 WORKAROUND_1542419
+WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 WORKAROUND_CAVIUM_23154
 WORKAROUND_CAVIUM_27456
 WORKAROUND_CAVIUM_30115

-- 2.24.1

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode

Arm Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from an erratum which, when triggered, might cause the TRBE to overwrite the trace data already collected in FILL mode, in the event of a WRAP. i.e., the TRBE doesn't stop writing the data; instead it wraps to the base and could write up to 3 cache lines' worth of trace. Thus, this could corrupt the trace at the "BASE" pointer.

The workaround is to program the write pointer 256 bytes from the base, such that if the erratum is triggered, it doesn't overwrite the trace data that was captured. This skipped region could be padded with ignore packets at the end of the session, so that the decoder sees a continuous buffer with some padding at the beginning. The trace data written at the base is considered lost, as the limit could have been in the middle of the perf ring buffer, and jumping to the "base" is not acceptable. We already set the flags to indicate that some amount of trace was lost during the FILL event IRQ. So this is fine.

One important change with the workaround is that we program the TRBBASER_EL1 to the current page where we are allowed to write. Otherwise, it could overwrite a region that may be consumed by perf. Towards this, we always make sure that the "handle->head", and thus the trbe_write, is PAGE_SIZE aligned, so that we can set the BASE to the PAGE base and move the TRBPTR to the 256 bytes offset.
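The pointer adjustment described above can be sketched in user space as follows. This is only an illustration under stated assumptions: a 4K page size is assumed, and `struct wa_trbe_buf` and `wa_apply_overwrite_fill_mode()` are hypothetical names standing in for the driver's buffer structure and its workaround hook.

```c
#include <errno.h>
#include <stdint.h>

#define WA_PAGE_SIZE  4096ULL /* assumption: 4K pages */
#define WA_SKIP_BYTES 256ULL  /* bytes that may be overwritten at the base */

/* Hypothetical, simplified stand-in for the driver's struct trbe_buf. */
struct wa_trbe_buf {
	uint64_t trbe_base;    /* start of the ring buffer */
	uint64_t trbe_hw_base; /* would be programmed into TRBBASER_EL1 */
	uint64_t trbe_write;   /* would be programmed into TRBPTR_EL1 */
	uint64_t trbe_limit;   /* would be programmed into TRBLIMITR_EL1 */
};

/*
 * Move the hardware base to the page-aligned write position and start
 * writing 256 bytes further in, so that a wrap caused by the erratum
 * lands in the skipped area rather than on captured trace.
 */
static inline int wa_apply_overwrite_fill_mode(struct wa_trbe_buf *buf)
{
	if (buf->trbe_write % WA_PAGE_SIZE)
		return -EINVAL;

	buf->trbe_hw_base = buf->trbe_write;
	buf->trbe_write += WA_SKIP_BYTES;
	return 0;
}
```

The 256 bytes skipped at the hardware base are what the driver later fills with ignore packets, which is why the decoder still sees one continuous buffer.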

Cc: Mike Leach mike.leach@linaro.org
Cc: Mathieu Poirier mathieu.poirier@linaro.org
Cc: Anshuman Khandual anshuman.khandual@arm.com
Cc: Leo Yan leo.yan@linaro.org
Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com
---
Change since v1:
 - Updated comment with ASCII art
 - Add _BYTES suffix for the space to skip for the work around.
---
 drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
 1 file changed, 132 insertions(+), 12 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index f569010c672b..983dd5039e52 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -16,6 +16,7 @@
 #define pr_fmt(fmt) DRVNAME ": " fmt

 #include <asm/barrier.h>
+#include <asm/cpufeature.h>
 #include <asm/cputype.h>

 #include "coresight-self-hosted-trace.h"
@@ -84,9 +85,17 @@ struct trbe_buf {
  * per TRBE instance, we keep track of the list of errata that
  * affects the given instance of the TRBE.
  */
-#define TRBE_ERRATA_MAX 0
+#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
+#define TRBE_ERRATA_MAX 1
+
+/*
+ * Safe limit for the number of bytes that may be overwritten
+ * when the erratum is triggered.
+ */
+#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES 256

 static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
+	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
 };

 /*
@@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
 	set_trbe_limit_pointer_enabled(buf->trbe_limit);
 }

-static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
+static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
+						 u64 trbsr)
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+	struct trbe_cpudata *cpudata = buf->cpudata;

 	WARN_ON(is_trbe_running(trbsr));
 	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
@@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
 	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
 		return TRBE_FAULT_ACT_FATAL;

-	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
-		if (get_trbe_write_pointer() == get_trbe_base_pointer())
-			return TRBE_FAULT_ACT_WRAP;
-	}
+	/*
+	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
+	 * it might write data after a WRAP event in the fill mode.
+	 * Thus the check TRBPTR == TRBBASER will not be honored.
+	 */
+	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
+	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
+	     get_trbe_write_pointer() == get_trbe_base_pointer()))
+		return TRBE_FAULT_ACT_WRAP;
+
 	return TRBE_FAULT_ACT_SPURIOUS;
 }

@@ -544,6 +562,8 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
 {
 	u64 write;
 	u64 start_off, end_off;
+	u64 size;
+	u64 overwrite_skip = TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;

 	/*
 	 * If the TRBE has wrapped around the write pointer has
@@ -559,7 +579,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,

 	if (WARN_ON_ONCE(end_off < start_off))
 		return 0;
-	return (end_off - start_off);
+
+	size = end_off - start_off;
+	/*
+	 * If the TRBE is affected by the following erratum, we must fill
+	 * the space we skipped with IGNORE packets. And we are always
+	 * guaranteed to have at least a PAGE_SIZE space in the buffer.
+	 */
+	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) &&
+	    !WARN_ON(size < overwrite_skip))
+		__trbe_pad_buf(buf, start_off, overwrite_skip);
+
+	return size;
 }

 static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
@@ -678,7 +709,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 	clr_trbe_irq();
 	isb();

-	act = trbe_get_fault_act(status);
+	act = trbe_get_fault_act(handle, status);
 	/*
 	 * If this was not due to a WRAP event, we have some
 	 * errors and as such buffer is empty.
@@ -702,21 +733,95 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
 	return size;
 }

+
+static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
+{
+	/*
+	 * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache
+	 * line size from the "TRBBASER_EL1" in the event of a "FILL".
+	 * Thus, we could loose some amount of the trace at the base.
+	 *
+	 * Before Fix:
+	 *
+	 *  normal-BASE     head  normal-PTR              tail normal-LIMIT
+	 *  |                   \/                       /
+	 *   -------------------------------------------------------------
+	 *  |         |          |xyzdefghij..|...  tuvw|                |
+	 *   -------------------------------------------------------------
+	 *                      /    |                   \
+	 * After Fix->  TRBBASER     TRBPTR              TRBLIMITR.LIMIT
+	 *
+	 * In the normal course of action, we would set the TRBBASER to the
+	 * beginning of the ring-buffer (normal-BASE). But with the erratum,
+	 * the TRBE could overwrite the contents at the "normal-BASE", after
+	 * hitting the "normal-LIMIT", since it doesn't stop as expected. And
+	 * this is wrong. So we must always make sure that the TRBBASER is
+	 * within the region [head, head+size].
+	 *
+	 * Also, we would set the TRBPTR to head (after adjusting for
+	 * alignment) at normal-PTR. This would mean that the last few bytes
+	 * of the trace (say, "xyz") might overwrite the first few bytes of
+	 * trace written ("abc"). More importantly they will appear in what\
+	 * userspace sees as the beginning of the trace, which is wrong. We may
+	 * not always have space to move the latest trace "xyz" to the correct
+	 * order as it must appear beyond the LIMIT. (i.e, [head..head+size].
+	 * Thus it is easier to ignore those bytes than to complicate the
+	 * driver to move it, assuming that the erratum was triggered and doing
+	 * additional checks to see if there is indeed allowed space at
+	 * TRBLIMITR.LIMIT.
+	 *
+	 * To summarize, with the work around:
+	 *
+	 *  - We always align the offset for the next session to PAGE_SIZE
+	 *    (This is to ensure we can program the TRBBASER to this offset
+	 *    within the region [head...head+size]).
+	 *
+	 *  - At TRBE enable:
+	 *     - Set the TRBBASER to the page aligned offset of the current
+	 *       proposed write offset. (which is guaranteed to be aligned
+	 *       as above)
+	 *     - Move the TRBPTR to skip first 256bytes (that might be
+	 *       overwritten with the erratum). This ensures that the trace
+	 *       generated in the session is not re-written.
+	 *
+	 *  - At trace collection:
+	 *     - Pad the 256bytes skipped above again with IGNORE packets.
+	 */
+	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)) {
+		if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
+			return -EINVAL;
+		buf->trbe_hw_base = buf->trbe_write;
+		buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
+	}
+
+	return 0;
+}
+
 static int __arm_trbe_enable(struct trbe_buf *buf,
 			     struct perf_output_handle *handle)
 {
+	int ret = 0;
+
 	perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
 	buf->trbe_limit = compute_trbe_buffer_limit(handle);
 	buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
 	if (buf->trbe_limit == buf->trbe_base) {
-		trbe_stop_and_truncate_event(handle);
-		return -ENOSPC;
+		ret = -ENOSPC;
+		goto err;
 	}
 	/* Set the base of the TRBE to the buffer base */
 	buf->trbe_hw_base = buf->trbe_base;
+
+	ret = trbe_apply_work_around_before_enable(buf);
+	if (ret)
+		goto err;
+
 	*this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
 	trbe_enable_hw(buf);
 	return 0;
+err:
+	trbe_stop_and_truncate_event(handle);
+	return ret;
 }

 static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
@@ -860,7 +965,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
 	if (!is_perf_trbe(handle))
 		return IRQ_NONE;

-	act = trbe_get_fault_act(status);
+	act = trbe_get_fault_act(handle, status);
 	switch (act) {
 	case TRBE_FAULT_ACT_WRAP:
 		truncated = !!trbe_handle_overflow(handle);
@@ -1000,7 +1105,22 @@ static void arm_trbe_probe_cpu(void *info)
 	}

 	trbe_check_errata(cpudata);
-	cpudata->trbe_align = cpudata->trbe_hw_align;
+	/*
+	 * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
+	 * we must always program the TBRPTR_EL1, 256bytes from a page
+	 * boundary, with TRBBASER_EL1 set to the page, to prevent
+	 * TRBE over-writing 256bytes at TRBBASER_EL1 on FILL event.
+	 *
+	 * Thus make sure we always align our write pointer to a PAGE_SIZE,
+	 * which also guarantees that we have at least a PAGE_SIZE space in
+	 * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
+	 * the required bytes at the base.
+	 */
+	if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE))
+		cpudata->trbe_align = PAGE_SIZE;
+	else
+		cpudata->trbe_align = cpudata->trbe_hw_align;
+
 	cpudata->trbe_flag = get_trbe_flag_update(trbidr);
 	cpudata->cpu = cpu;
 	cpudata->drvdata = drvdata;

-- 2.24.1
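The change to the WRAP decision in trbe_get_fault_act() can be condensed into the following sketch. It deliberately leaves out the fatal and abort paths and collapses the TRBSR decoding into plain booleans; `struct demo_trbe_state` and `demo_get_fault_act()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_fault_action {
	DEMO_FAULT_ACT_WRAP,
	DEMO_FAULT_ACT_SPURIOUS,
};

/* Hypothetical snapshot of the state that the driver reads from the TRBE. */
struct demo_trbe_state {
	bool wrapped;      /* wrap bit set in the status register */
	bool filled;       /* EC/BSC report a "buffer filled" event */
	bool has_erratum;  /* this CPU is affected by the overwrite erratum */
	uint64_t write_ptr;
	uint64_t base_ptr;
};

/*
 * On an unaffected TRBE, a genuine wrap leaves the write pointer at the
 * base. An affected TRBE may have written past the base after wrapping,
 * so the pointer comparison cannot be relied upon and is skipped.
 */
static inline enum demo_fault_action
demo_get_fault_act(const struct demo_trbe_state *s)
{
	if (s->wrapped && s->filled &&
	    (s->has_erratum || s->write_ptr == s->base_ptr))
		return DEMO_FAULT_ACT_WRAP;

	return DEMO_FAULT_ACT_SPURIOUS;
}
```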

Mathieu Poirier

1 Oct

5:15 p.m.

New subject: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode

On Tue, Sep 21, 2021 at 02:41:13PM +0100, Suzuki K Poulose wrote:

...

-	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
-		if (get_trbe_write_pointer() == get_trbe_base_pointer())
-			return TRBE_FAULT_ACT_WRAP;
-	}
+	/*
+	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
+	 * it might write data after a WRAP event in the fill mode.
+	 * Thus the check TRBPTR == TRBBASER will not be honored.
+	 */
+	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
+	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
+	     get_trbe_write_pointer() == get_trbe_base_pointer()))
+		return TRBE_FAULT_ACT_WRAP;

I'm very perplexed by the trbe_has_erratum() infrastructure... Since this is a TRBE, the code will always run on the CPU it is associated with, and if I'm correct here we could call this_cpu_has_cap() directly with the same outcome. I doubt that all drivers using the cpucaps subsystem carry a shadow structure to keep the same information.

I have to stop here for today. Although small in size this patchset demands a fair amount of involvement - I will continue next week but I may not go through the whole thing for this revision.

Thanks, Mathieu


Suzuki K Poulose

4 Oct

8:46 a.m.

New subject: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode

Hi Mathieu

On 01/10/2021 18:15, Mathieu Poirier wrote:

...

On Tue, Sep 21, 2021 at 02:41:13PM +0100, Suzuki K Poulose wrote:

...
ARM Neoverse-N2 (#2139208) and Cortex-A710(##2119858) suffers from an erratum, which when triggered, might cause the TRBE to overwrite the trace data already collected in FILL mode, in the event of a WRAP. i.e, the TRBE doesn't stop writing the data, instead wraps to the base and could write upto 3 cache line size worth trace. Thus, this could corrupt the trace at the "BASE" pointer.

The workaround is to program the write pointer 256bytes from the base, such that if the erratum is triggered, it doesn't overwrite the trace data that was captured. This skipped region could be padded with ignore packets at the end of the session, so that the decoder sees a continuous buffer with some padding at the beginning. The trace data written at the base is considered lost as the limit could have been in the middle of the perf ring buffer, and jumping to the "base" is not acceptable. We set the flags already to indicate that some amount of trace was lost during the FILL event IRQ. So this is fine.

One important change with the work around is, we program the TRBBASER_EL1 to current page where we are allowed to write. Otherwise, it could overwrite a region that may be consumed by the perf. Towards this, we always make sure that the "handle->head" and thus the trbe_write is PAGE_SIZE aligned, so that we can set the BASE to the PAGE base and move the TRBPTR to the 256bytes offset.

Cc: Mike Leach mike.leach@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com

Change since v1:

Updated comment with ASCII art

Add _BYTES suffix for the space to skip for the work around.

drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++-- 1 file changed, 132 insertions(+), 12 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index f569010c672b..983dd5039e52 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -16,6 +16,7 @@ #define pr_fmt(fmt) DRVNAME ": " fmt #include <asm/barrier.h> +#include <asm/cpufeature.h> #include <asm/cputype.h> #include "coresight-self-hosted-trace.h" @@ -84,9 +85,17 @@ struct trbe_buf {

per TRBE instance, we keep track of the list of errata that

affects the given instance of the TRBE.

*/ -#define TRBE_ERRATA_MAX 0 +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0 +#define TRBE_ERRATA_MAX 1

+/*

Safe limit for the number of bytes that may be overwritten

when the erratum is triggered.

*/

+#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES 256 static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {

[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, };

/* @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf) set_trbe_limit_pointer_enabled(buf->trbe_limit); } -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr) +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
				 u64 trbsr)
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+	struct trbe_cpudata *cpudata = buf->cpudata;
 
 	WARN_ON(is_trbe_running(trbsr));
 	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
@@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
 	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
 		return TRBE_FAULT_ACT_FATAL;
 
-	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
-		if (get_trbe_write_pointer() == get_trbe_base_pointer())
-			return TRBE_FAULT_ACT_WRAP;
-	}
+	/*
+	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
+	 * it might write data after a WRAP event in the fill mode.
+	 * Thus the check TRBPTR == TRBBASER will not be honored.
+	 */
+	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
+	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
+	     get_trbe_write_pointer() == get_trbe_base_pointer()))
+		return TRBE_FAULT_ACT_WRAP;
I'm very perplexed by the trbe_has_erratum() infrastructure... Since this is a TRBE the code will always run on the CPU it is associated with, and if I'm correct here we could call this_cpu_has_cap() directly with the same outcome. I doubt that all drivers using the cpucaps subsystem carry a shadow structure to keep the same information.

Very valid question. Of course, we can use the this_cpu_has_cap() helper. Unlike the cpus_have_*_cap() helpers, which give you the system wide status of the erratum, the cpucap doesn't keep a cache of which CPUs are affected by a given erratum. Thus this_cpu_has_cap() would involve running the detection on the current CPU every time we call it, i.e., scanning the MIDR of the CPU through the list of affected MIDRs for the given erratum. This is a bit of overhead.

Given that we already have CPU specific information in trbe_cpudata, I chose to cache the affected errata locally. This gives us quick access to the erratum for individual TRBE instances. Of course this list is initialised at TRBE probe and thus avoids us having to do the costly check, each time we need it. I could make this clear in the patch which introduces the framework.
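A minimal sketch of that caching idea, assuming a one-bit-per-erratum mask stored in the per-CPU data. Every `sketch_*` name is hypothetical and the MIDR matching is a stand-in for the real detection, so this only illustrates the shape of the design and is not the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_ERRATUM_OVERWRITE_FILL_MODE = 0, SKETCH_ERRATA_MAX = 1 };

struct sketch_cpudata {
	unsigned long errata; /* one bit per erratum, filled in once at probe */
};

/* Stand-in for the costly per-CPU detection: scan a list of affected MIDRs. */
static bool sketch_detect(uint32_t midr, const uint32_t *affected, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (affected[i] == midr)
			return true;
	return false;
}

/* Run once per TRBE instance at probe time. */
static void sketch_probe_errata(struct sketch_cpudata *cpudata, uint32_t midr,
				const uint32_t *affected, size_t n)
{
	cpudata->errata = 0;
	if (sketch_detect(midr, affected, n))
		cpudata->errata |= 1UL << SKETCH_ERRATUM_OVERWRITE_FILL_MODE;
}

/* Cheap lookup on the hot path: a single bit test, no MIDR scan. */
static bool sketch_has_erratum(const struct sketch_cpudata *cpudata, int i)
{
	assert(i >= 0 && i < SKETCH_ERRATA_MAX);
	return (cpudata->errata >> i) & 1UL;
}
```

The scan happens once in `sketch_probe_errata()`; later callers such as an IRQ handler only pay for the bit test.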

Thanks for the review

Suzuki

...

Thanks, Mathieu

Mathieu Poirier

4:47 p.m.

New subject: [PATCH v2 09/17] coresight: trbe: Workaround TRBE errata overwrite in FILL mode

Good morning,

On Mon, Oct 04, 2021 at 09:46:07AM +0100, Suzuki K Poulose wrote:

...

Hi Mathieu

On 01/10/2021 18:15, Mathieu Poirier wrote:

...
On Tue, Sep 21, 2021 at 02:41:13PM +0100, Suzuki K Poulose wrote:

...
ARM Neoverse-N2 (#2139208) and Cortex-A710 (#2119858) suffer from an erratum, which when triggered, might cause the TRBE to overwrite the trace data already collected in FILL mode, in the event of a WRAP. i.e., the TRBE doesn't stop writing the data, instead wraps to the base and could write up to 3 cache lines' worth of trace. Thus, this could corrupt the trace at the "BASE" pointer.

The workaround is to program the write pointer 256bytes from the base, such that if the erratum is triggered, it doesn't overwrite the trace data that was captured. This skipped region could be padded with ignore packets at the end of the session, so that the decoder sees a continuous buffer with some padding at the beginning. The trace data written at the base is considered lost as the limit could have been in the middle of the perf ring buffer, and jumping to the "base" is not acceptable. We set the flags already to indicate that some amount of trace was lost during the FILL event IRQ. So this is fine.

One important change with the work around is, we program the TRBBASER_EL1 to current page where we are allowed to write. Otherwise, it could overwrite a region that may be consumed by the perf. Towards this, we always make sure that the "handle->head" and thus the trbe_write is PAGE_SIZE aligned, so that we can set the BASE to the PAGE base and move the TRBPTR to the 256bytes offset.

Cc: Mike Leach mike.leach@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com

Change since v1:

Updated comment with ASCII art

Add _BYTES suffix for the space to skip for the work around.

drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++-- 1 file changed, 132 insertions(+), 12 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index f569010c672b..983dd5039e52 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -16,6 +16,7 @@ #define pr_fmt(fmt) DRVNAME ": " fmt #include <asm/barrier.h> +#include <asm/cpufeature.h> #include <asm/cputype.h> #include "coresight-self-hosted-trace.h" @@ -84,9 +85,17 @@ struct trbe_buf {

per TRBE instance, we keep track of the list of errata that

affects the given instance of the TRBE.

*/ -#define TRBE_ERRATA_MAX 0 +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0 +#define TRBE_ERRATA_MAX 1

+/*

Safe limit for the number of bytes that may be overwritten

when the erratum is triggered.

*/

+#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES 256 static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {

[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE, }; /*

@@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf) set_trbe_limit_pointer_enabled(buf->trbe_limit); } -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr) +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
				 u64 trbsr)
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+	struct trbe_cpudata *cpudata = buf->cpudata;
 
 	WARN_ON(is_trbe_running(trbsr));
 	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
@@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
 	if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
 		return TRBE_FAULT_ACT_FATAL;
 
-	if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
-		if (get_trbe_write_pointer() == get_trbe_base_pointer())
-			return TRBE_FAULT_ACT_WRAP;
-	}
+	/*
+	 * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
+	 * it might write data after a WRAP event in the fill mode.
+	 * Thus the check TRBPTR == TRBBASER will not be honored.
+	 */
+	if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
+	    (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
+	     get_trbe_write_pointer() == get_trbe_base_pointer()))
+		return TRBE_FAULT_ACT_WRAP;
I'm very perplexed by the trbe_has_erratum() infrastructure... Since this is a TRBE the code will always run on the CPU it is associated with, and if I'm correct here we could call this_cpu_has_cap() directly with the same outcome. I doubt that all drivers using the cpucaps subsystem carry a shadow structure to keep the same information.

Very valid question. Of course, we can use the this_cpu_has_cap() helper. Unlike the cpus_have_*_cap() helpers, which give you the system wide status of the erratum, the cpucap doesn't keep a cache of which CPUs are affected by a given erratum. Thus this_cpu_has_cap() would involve running the detection on the current CPU every time we call it, i.e., scanning the MIDR of the CPU through the list of affected MIDRs for the given erratum. This is a bit of overhead.

I've looked around in the kernel for other places where this_cpu_has_cap() is used. In most instances it is part of some initialisation code where actions are taken based on the return value of the function. In our case we need to call this regularly so yes, I agree with your design.

...

Given that we already have CPU specific information in trbe_cpudata, I chose to cache the affected errata locally. This gives us quick access to the erratum for individual TRBE instances. Of course this list is initialised at TRBE probe and thus avoids us having to do the costly check, each time we need it. I could make this clear in the patch which introduces the framework.

Yes please.

Thanks, Mathieu

...

Thanks for the review

Suzuki

...
Thanks, Mathieu

Suzuki K Poulose

21 Sep 21 Sep

1:41 p.m.

New subject: [PATCH v2 10/17] arm64: Enable workaround for TRBE overwrite in FILL mode

Now that we have the workaround implemented in the TRBE driver, add the Kconfig entries and document the errata.

Cc: Mark Rutland mark.rutland@arm.com Cc: Will Deacon will@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- Documentation/arm64/silicon-errata.rst | 4 +++ arch/arm64/Kconfig | 39 ++++++++++++++++++++++++++ 2 files changed, 43 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst index d410a47ffa57..2f99229d993c 100644 --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -92,12 +92,16 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A77 | #1508412 | ARM64_ERRATUM_1508412 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1349291 | N/A | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1542419 | ARM64_ERRATUM_1542419 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | MMU-500 | #841119,826419 | N/A | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 077f2ec4eeb2..eac4030322df 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -666,6 +666,45 @@ config ARM64_ERRATUM_1508412

 	  If unsure, say Y.
 
+config ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+	bool
+
+config ARM64_ERRATUM_2119858
+	bool "Cortex-A710: 2119858: workaround TRBE overwriting trace data in FILL mode"
+	default y
+	depends on CORESIGHT_TRBE
+	select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+	help
+	  This option adds the workaround for ARM Cortex-A710 erratum 2119858.
+
+	  Affected Cortex-A710 cores could overwrite up to 3 cache lines of trace
+	  data at the base of the buffer (pointed to by TRBBASER_EL1) in FILL mode in
+	  the event of a WRAP event.
+
+	  Work around the issue by always making sure we move the TRBPTR_EL1 by
+	  256 bytes before enabling the buffer and filling the first 256 bytes of
+	  the buffer with ETM ignore packets upon disabling.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_2139208
+	bool "Neoverse-N2: 2139208: workaround TRBE overwriting trace data in FILL mode"
+	default y
+	depends on CORESIGHT_TRBE
+	select ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+	help
+	  This option adds the workaround for ARM Neoverse-N2 erratum 2139208.
+
+	  Affected Neoverse-N2 cores could overwrite up to 3 cache lines of trace
+	  data at the base of the buffer (pointed to by TRBBASER_EL1) in FILL mode in
+	  the event of a WRAP event.
+
+	  Work around the issue by always making sure we move the TRBPTR_EL1 by
+	  256 bytes before enabling the buffer and filling the first 256 bytes of
+	  the buffer with ETM ignore packets upon disabling.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y

-- 2.24.1

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 11/17] arm64: errata: Add workaround for TSB flush failures

Arm Neoverse-N2 (#2067961) and Cortex-A710 (#2054223) suffer from an erratum where a TSB (trace synchronization barrier) fails to flush the trace data completely when executed from a trace prohibited region. In Linux we always execute it after we have moved the PE to a trace prohibited region. So, we can apply the workaround every time a TSB is executed.

The work around is to issue two TSB consecutively.
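As a rough user-space model of that rule (not the kernel macro), the sketch below counts barriers instead of executing them. The `sketch_*` names are hypothetical, and the boolean argument stands in for the system wide capability check.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int sketch_tsb_count; /* number of barriers issued so far */

/* Stand-in for the real barrier instruction ("hint #18" on arm64). */
static void sketch_raw_tsb(void)
{
	sketch_tsb_count++;
}

/* Issue one TSB normally, and an extra one first when the workaround applies. */
static void sketch_tsb_csync(bool workaround_enabled)
{
	if (workaround_enabled)
		sketch_raw_tsb();
	sketch_raw_tsb();
}
```

Unaffected systems keep paying for a single barrier; affected ones pay for exactly one more.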

NOTE: This erratum is defined as LOCAL_CPU_ERRATUM, implying that a late CPU could be blocked from booting if it is the first CPU that requires the workaround. This is because we do not allow setting a cpu_hwcaps after the SMP boot. The other alternative is to use "this_cpu_has_cap()" instead of the faster system wide check, which may be a bit of an overhead, given we may have to do this in nvhe KVM host before a guest entry.

Cc: Will Deacon will@kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Mike Leach mike.leach@linaro.org Cc: Mark Rutland mark.rutland@arm.com Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Marc Zyngier maz@kernel.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- Changes since v1: - Switch to cpus_have_final_cap() - Document the requirements on TSB. --- Documentation/arm64/silicon-errata.rst | 4 ++++ arch/arm64/Kconfig | 31 ++++++++++++++++++++++++++ arch/arm64/include/asm/barrier.h | 16 ++++++++++++- arch/arm64/kernel/cpu_errata.c | 19 ++++++++++++++++ arch/arm64/tools/cpucaps | 1 + 5 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst index 2f99229d993c..569a92411dcd 100644 --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -94,6 +94,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A710 | #2119858 | ARM64_ERRATUM_2119858 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A710 | #2054223 | ARM64_ERRATUM_2054223 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1188873,1418040| ARM64_ERRATUM_1418040 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N1 | #1349291 | N/A | @@ -102,6 +104,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Neoverse-N2 | #2139208 | ARM64_ERRATUM_2139208 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Neoverse-N2 | #2067961 | ARM64_ERRATUM_2067961 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | MMU-500 | #841119,826419 | N/A | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index eac4030322df..0764774e12bb 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -705,6 +705,37 @@ config ARM64_ERRATUM_2139208

If unsure, say Y.

+config ARM64_WORKAROUND_TSB_FLUSH_FAILURE + bool + +config ARM64_ERRATUM_2054223 + bool "Cortex-A710: 2054223: workaround TSB instruction failing to flush trace" + default y + help + Enable workaround for ARM Cortex-A710 erratum 2054223 + + Affected cores may fail to flush the trace data on a TSB instruction, when + the PE is in trace prohibited state. This will cause losing a few bytes + of the trace cached. + + Workaround is to issue two TSB consecutively on affected cores. + + If unsure, say Y. + +config ARM64_ERRATUM_2067961 + bool "Neoverse-N2: 2067961: workaround TSB instruction failing to flush trace" + default y + help + Enable workaround for ARM Neoverse-N2 erratum 2067961 + + Affected cores may fail to flush the trace data on a TSB instruction, when + the PE is in trace prohibited state. This will cause losing a few bytes + of the trace cached. + + Workaround is to issue two TSB consecutively on affected cores. + + If unsure, say Y. + config CAVIUM_ERRATUM_22375 bool "Cavium erratum 22375, 24313" default y diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h index 451e11e5fd23..1c5a00598458 100644 --- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h @@ -23,7 +23,7 @@ #define dsb(opt) asm volatile("dsb " #opt : : : "memory")

#define psb_csync() asm volatile("hint #17" : : : "memory") -#define tsb_csync() asm volatile("hint #18" : : : "memory") +#define __tsb_csync() asm volatile("hint #18" : : : "memory") #define csdb() asm volatile("hint #20" : : : "memory")

#ifdef CONFIG_ARM64_PSEUDO_NMI @@ -46,6 +46,20 @@ #define dma_rmb() dmb(oshld) #define dma_wmb() dmb(oshst)

+
+#define tsb_csync()								\
+	do {									\
+		/*								\
+		 * CPUs affected by Arm Erratum 2054223 or 2067961 need	\
+		 * another TSB to ensure the trace is flushed. The barriers	\
+		 * don't have to be strictly back to back, as long as the	\
+		 * CPU is in trace prohibited state.				\
+		 */								\
+		if (cpus_have_final_cap(ARM64_WORKAROUND_TSB_FLUSH_FAILURE))	\
+			__tsb_csync();						\
+		__tsb_csync();							\
+	} while (0)
+
 /*
  * Generate a mask for array_index__nospec() that is ~0UL when 0 <= idx < sz
  * and 0 otherwise.
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index ccd757373f36..bdbeac75ead6 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -352,6 +352,18 @@ static const struct midr_range trbe_overwrite_fill_mode_cpus[] = {
 };
 #endif	/* CONFIG_ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE */
 
+#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+static const struct midr_range tsb_flush_fail_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2067961
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2054223
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+	{},
+};
+#endif	/* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */
+
 const struct arm64_cpu_capabilities arm64_errata[] = {
 #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
 	{
@@ -558,6 +570,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
 		CAP_MIDR_RANGE_LIST(trbe_overwrite_fill_mode_cpus),
 	},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE
+	{
+		.desc = "ARM erratum 2067961 or 2054223",
+		.capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
+		ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
+	},
 #endif
 	{
 	}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1ccb92165bd8..2102e15af43d 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -54,6 +54,7 @@ WORKAROUND_1463225
 WORKAROUND_1508412
 WORKAROUND_1542419
 WORKAROUND_TRBE_OVERWRITE_FILL_MODE
+WORKAROUND_TSB_FLUSH_FAILURE
 WORKAROUND_CAVIUM_23154
 WORKAROUND_CAVIUM_27456
 WORKAROUND_CAVIUM_30115

-- 2.24.1

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

Add a helper to get the CPU specific data for TRBE instance, from a given perf handle. This also adds extra checks to make sure that the event associated with the handle is "bound" to the CPU and is active on the TRBE.

Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com --- drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 983dd5039e52..797d978f9fa7 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle)
 	return buf->nr_pages * PAGE_SIZE;
 }
 
+static inline struct trbe_cpudata *
+trbe_handle_to_cpudata(struct perf_output_handle *handle)
+{
+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+
+	BUG_ON(!buf || !buf->cpudata);
+	return buf->cpudata;
+}
+
 /*
  * TRBE Limit Calculation
  *
@@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
-	struct trbe_buf *buf = etm_perf_sink_config(handle);
-	struct trbe_cpudata *cpudata = buf->cpudata;
+	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
 
 	WARN_ON(is_trbe_running(trbsr));
 	if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))

-- 2.24.1

Mathieu Poirier

4 Oct 4 Oct

5:42 p.m.

New subject: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:

...

Add a helper to get the CPU specific data for TRBE instance, from a given perf handle. This also adds extra checks to make sure that the event associated with the handle is "bound" to the CPU and is active on the TRBE.

Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com

drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 983dd5039e52..797d978f9fa7 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle) return buf->nr_pages * PAGE_SIZE; } +static inline struct trbe_cpudata * +trbe_handle_to_cpudata(struct perf_output_handle *handle) +{

+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+
+	BUG_ON(!buf || !buf->cpudata);
+	return buf->cpudata;
+}
+
 /*
  * TRBE Limit Calculation
  *
@@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
-	struct trbe_buf *buf = etm_perf_sink_config(handle);
-	struct trbe_cpudata *cpudata = buf->cpudata;
+	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);

There are two other places where this pattern is present: is_perf_trbe() and __trbe_normal_offset().

I have to stop here for today. More comments tomorrow.

Thanks, Mathieu

...

WARN_ON(is_trbe_running(trbsr)); if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr)) -- 2.24.1

Suzuki K Poulose

5 Oct 5 Oct

10:35 p.m.

New subject: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

Hi Mathieu

On 04/10/2021 18:42, Mathieu Poirier wrote:

...

On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:

...
Add a helper to get the CPU specific data for TRBE instance, from a given perf handle. This also adds extra checks to make sure that the event associated with the handle is "bound" to the CPU and is active on the TRBE.

Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com

drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 983dd5039e52..797d978f9fa7 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle) return buf->nr_pages * PAGE_SIZE; } +static inline struct trbe_cpudata * +trbe_handle_to_cpudata(struct perf_output_handle *handle) +{

+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+
+	BUG_ON(!buf || !buf->cpudata);
+	return buf->cpudata;
+}
+
 /*
  * TRBE Limit Calculation
  *
@@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
-	struct trbe_buf *buf = etm_perf_sink_config(handle);
-	struct trbe_cpudata *cpudata = buf->cpudata;
+	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);

There are two other places where this pattern is present: is_perf_trbe() and __trbe_normal_offset().

I skipped them, as they have to get access to the "trbe_buf" anyways. So the step by step, made sense. But I could replace them too to make it transparent.

What do you think ?

Suzuki

Mathieu Poirier

6 Oct 6 Oct

5:15 p.m.

New subject: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

On Tue, Oct 05, 2021 at 11:35:13PM +0100, Suzuki K Poulose wrote:

...

Hi Mathieu

On 04/10/2021 18:42, Mathieu Poirier wrote:

...
On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:

...
Add a helper to get the CPU specific data for TRBE instance, from a given perf handle. This also adds extra checks to make sure that the event associated with the handle is "bound" to the CPU and is active on the TRBE.

Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com

drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 983dd5039e52..797d978f9fa7 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle) return buf->nr_pages * PAGE_SIZE; } +static inline struct trbe_cpudata * +trbe_handle_to_cpudata(struct perf_output_handle *handle) +{

+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+
+	BUG_ON(!buf || !buf->cpudata);
+	return buf->cpudata;
+}
+
 /*
  * TRBE Limit Calculation
  *
@@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
-	struct trbe_buf *buf = etm_perf_sink_config(handle);
-	struct trbe_cpudata *cpudata = buf->cpudata;
+	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);

There are two other places where this pattern is present: is_perf_trbe() and __trbe_normal_offset().

I skipped them, as they have to get access to the "trbe_buf" anyways. So the step by step, made sense. But I could replace them too to make it transparent.

What do you think ?

Humm... I don't think there is a right way or a wrong way here. If we move forward with this patchset we have two ways of getting to buf->cpudata. One using trbe_handle_to_cpudata() and another one as laid out in is_perf_trbe() and __trbe_normal_offset(), each with an equal number of occurrences (2 for each).

I am usually not fond of small functions like trbe_handle_to_cpudata() and to me keeping the current heuristic in trbe_get_fault_act() would have been just fine. I agree with the argument that trbe_handle_to_cpudata() provides more checks but is it really worth it if they aren't done everywhere?

In short I would get rid of trbe_handle_to_cpudata() entirely and live without the extra checks... But I'm not strongly opinionated on this either.

...

Suzuki

Suzuki K Poulose

7 Oct 7 Oct

9:18 a.m.

New subject: [PATCH v2 12/17] coresight: trbe: Add a helper to fetch cpudata from perf handle

On 06/10/2021 18:15, Mathieu Poirier wrote:

...

On Tue, Oct 05, 2021 at 11:35:13PM +0100, Suzuki K Poulose wrote:

...
Hi Mathieu

On 04/10/2021 18:42, Mathieu Poirier wrote:

...
On Tue, Sep 21, 2021 at 02:41:16PM +0100, Suzuki K Poulose wrote:

...
Add a helper to get the CPU specific data for TRBE instance, from a given perf handle. This also adds extra checks to make sure that the event associated with the handle is "bound" to the CPU and is active on the TRBE.

Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Mike Leach mike.leach@linaro.org Cc: Mathieu Poirier mathieu.poirier@linaro.org Cc: Leo Yan leo.yan@linaro.org Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com

drivers/hwtracing/coresight/coresight-trbe.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c index 983dd5039e52..797d978f9fa7 100644 --- a/drivers/hwtracing/coresight/coresight-trbe.c +++ b/drivers/hwtracing/coresight/coresight-trbe.c @@ -268,6 +268,15 @@ static unsigned long trbe_snapshot_offset(struct perf_output_handle *handle) return buf->nr_pages * PAGE_SIZE; } +static inline struct trbe_cpudata * +trbe_handle_to_cpudata(struct perf_output_handle *handle) +{

+	struct trbe_buf *buf = etm_perf_sink_config(handle);
+
+	BUG_ON(!buf || !buf->cpudata);
+	return buf->cpudata;
+}
+
 /*
  * TRBE Limit Calculation
  *
@@ -533,8 +542,7 @@ static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *hand
 {
 	int ec = get_trbe_ec(trbsr);
 	int bsc = get_trbe_bsc(trbsr);
-	struct trbe_buf *buf = etm_perf_sink_config(handle);
-	struct trbe_cpudata *cpudata = buf->cpudata;
+	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);

There are two other places where this pattern is present: is_perf_trbe() and __trbe_normal_offset().

I skipped them, as they have to get access to the "trbe_buf" anyways. So the step by step, made sense. But I could replace them too to make it transparent.

What do you think ?

Humm... I don't think there is a right way or a wrong way here. If we move forward with this patchset we have two ways of getting to buf->cpudata. One using trbe_handle_to_cpudata() and another one as laid out in is_perf_trbe() and __trbe_normal_offset(), each with an equal number of occurrences (2 for each).

I am usually not fond of small functions like trbe_handle_to_cpudata() and to me keeping the current heuristic in trbe_get_fault_act() would have been just fine.

There is another user introduced in the work around patch. But, yes, I agree, we could open code it, rather than having it inconsistent across the driver.

...

I agree with the argument that trbe_handle_to_cpudata() provides more checks but is it really worth it if they aren't done everywhere?

In short I would get rid of trbe_handle_to_cpudata() entirely and live without the extra checks... But I'm not strongly opinionated on this either.

Ok, I will remove this then. Thanks for the feedback.

Suzuki

Suzuki K Poulose

21 Sep 21 Sep

1:41 p.m.

New subject: [PATCH v2 13/17] coresight: trbe: Add a helper to determine the minimum buffer size

For the TRBE to operate, we need a minimum space available to collect meaningful trace session. This is currently a few bytes, but we may need to extend this for working around errata. So, abstract this into a helper function.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 797d978f9fa7..3373f4e2183b 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -277,6 +277,11 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
 	return buf->cpudata;
 }

+static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
+{
+	return TRBE_TRACE_MIN_BUF_SIZE;
+}
+
 /*
  * TRBE Limit Calculation
  *
@@ -447,7 +452,7 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
 	 * have space for a meaningful run, we rather pad it
 	 * and start fresh.
 	 */
-	if (limit && (limit - head < TRBE_TRACE_MIN_BUF_SIZE)) {
+	if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
 		trbe_pad_buf(handle, limit - head);
 		limit = __trbe_normal_offset(handle);
 	}

-- 2.24.1

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 14/17] coresight: trbe: Make sure we have enough space

The TRBE driver makes sure that there is enough space for a meaningful run, otherwise pads the given space and restarts the offset calculation once. But there is no guarantee that the second attempt either finds enough space or hits "no space". Make sure that we repeat the step until either:
 - We have the minimum space
   OR
 - There is NO space at all.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 3373f4e2183b..02f9e00e2091 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -451,10 +451,14 @@ static unsigned long trbe_normal_offset(struct perf_output_handle *handle)
 	 * If the head is too close to the limit and we don't
 	 * have space for a meaningful run, we rather pad it
 	 * and start fresh.
+	 *
+	 * We might have to do this more than once to make sure
+	 * we have enough required space.
 	 */
-	if (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
+	while (limit && ((limit - head) < trbe_min_trace_buf_size(handle))) {
 		trbe_pad_buf(handle, limit - head);
 		limit = __trbe_normal_offset(handle);
+		head = PERF_IDX2OFF(handle->head, buf);
 	}
 	return limit;
 }

-- 2.24.1
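
The pad-and-retry behaviour described in patch 14 can be modelled outside the kernel. What follows is only a sketch under stated assumptions: `struct ring_model`, `model_limit()`, `model_pad()` and `model_normal_offset()` are hypothetical stand-ins for the perf handle, `__trbe_normal_offset()`, `trbe_pad_buf()` and `trbe_normal_offset()`, and the model ignores the alignment and wakeup handling that the real driver performs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of a perf AUX ring buffer (not kernel code). */
struct ring_model {
	uint64_t size;	/* total buffer size in bytes */
	uint64_t head;	/* free-running write index */
	uint64_t avail;	/* free bytes starting at head (may wrap) */
};

/* Offset of the head within the buffer. */
static uint64_t head_offset(const struct ring_model *r)
{
	return r->head % r->size;
}

/*
 * Stand-in for __trbe_normal_offset(): a single run cannot wrap, so the
 * limit is the end of the free area or the end of the buffer, whichever
 * comes first. A return value of 0 means "no space".
 */
static uint64_t model_limit(const struct ring_model *r)
{
	uint64_t end;

	if (!r->avail)
		return 0;
	end = head_offset(r) + r->avail;
	return end < r->size ? end : r->size;
}

/* Stand-in for trbe_pad_buf(): give up nbytes of the free area. */
static void model_pad(struct ring_model *r, uint64_t nbytes)
{
	r->head += nbytes;
	r->avail -= nbytes;
}

/* Repeat the padding step until we have min_size bytes or no space at all. */
static uint64_t model_normal_offset(struct ring_model *r, uint64_t min_size)
{
	uint64_t limit = model_limit(r);
	uint64_t head = head_offset(r);

	while (limit && (limit - head) < min_size) {
		model_pad(r, limit - head);
		limit = model_limit(r);
		head = head_offset(r);	/* the head moved, so re-read it */
	}
	return limit;
}
```

With a 4096-byte buffer, the head at offset 4000, 150 free bytes and a minimum of 128, the first pass pads the 96 bytes up to the end of the buffer and the second pass finds only 54 bytes at the start. A single `if` would hand back that 54-byte window, whereas the loop pads it as well and reports "no space".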

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 15/17] arm64: Add erratum detection for TRBE write to out-of-range

Arm Neoverse-N2 and Cortex-A710 cores are affected by an erratum where the TRBE, under some circumstances, might write up to 64 bytes to an address after the Limit as programmed by the TRBLIMITR_EL1.LIMIT. This might:

 - Corrupt a page in the ring buffer, which may corrupt trace from a previous session, consumed by userspace.
 - Hit the guard page at the end of the vmalloc area and raise a fault.

To keep the handling simpler, we always leave the last page out of the range which the TRBE is allowed to write. This can be achieved by ensuring that we always have more than a PAGE worth of space in the range while calculating the LIMIT for TRBE. The LIMIT pointer can then be adjusted to leave the PAGE (TRBLIMITR.LIMIT -= PAGE_SIZE) out of the TRBE range while enabling it. This makes sure that the TRBE will only write to an area within its allowed limit (i.e., [head, head + size]) and we do not have to handle address faults within the driver.

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/kernel/cpu_errata.c | 20 ++++++++++++++++++++
 arch/arm64/tools/cpucaps       |  1 +
 2 files changed, 21 insertions(+)

diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index bdbeac75ead6..e2978b89d4b8 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -364,6 +364,18 @@ static const struct midr_range tsb_flush_fail_cpus[] = {
 };
 #endif /* CONFIG_ARM64_WORKAROUND_TSB_FLUSH_FAILURE */

+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+static struct midr_range trbe_write_out_of_range_cpus[] = {
+#ifdef CONFIG_ARM64_ERRATUM_2253138
+	MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
+#endif
+#ifdef CONFIG_ARM64_ERRATUM_2224489
+	MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
+#endif
+	{},
+};
+#endif /* CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE */
+
 const struct arm64_cpu_capabilities arm64_errata[] = {
 #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE
 	{
@@ -577,6 +589,14 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
 		.capability = ARM64_WORKAROUND_TSB_FLUSH_FAILURE,
 		ERRATA_MIDR_RANGE_LIST(tsb_flush_fail_cpus),
 	},
+#endif
+#ifdef CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+	{
+		.desc = "ARM erratum 2253138 or 2224489",
+		.capability = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
+		.type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
+		CAP_MIDR_RANGE_LIST(trbe_write_out_of_range_cpus),
+	},
 #endif
 	{
 	}
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 2102e15af43d..90628638e0f9 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -55,6 +55,7 @@ WORKAROUND_1508412
 WORKAROUND_1542419
 WORKAROUND_TRBE_OVERWRITE_FILL_MODE
 WORKAROUND_TSB_FLUSH_FAILURE
+WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
 WORKAROUND_CAVIUM_23154
 WORKAROUND_CAVIUM_27456
 WORKAROUND_CAVIUM_30115

-- 2.24.1
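
Patch 15 detects affected CPUs by matching the MIDR against a list that covers all versions of Neoverse-N2 and Cortex-A710. A rough user-space illustration of that kind of match is sketched below. It is not the kernel's `midr_range` machinery: the helper name and macros are made up for this example, and the part numbers (0xd49 for Neoverse-N2, 0xd47 for Cortex-A710) are quoted from memory, not taken from this thread, so they should be checked against `arch/arm64/include/asm/cputype.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1 fields used here: implementer [31:24], part number [15:4]. */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

/* Assumed values, see the note above the code. */
#define IMPLEMENTER_ARM		0x41u
#define PART_CORTEX_A710	0xd47u
#define PART_NEOVERSE_N2	0xd49u

/* Parts covered by errata 2253138 (Neoverse-N2) and 2224489 (Cortex-A710). */
static const uint32_t affected_parts[] = {
	PART_NEOVERSE_N2,
	PART_CORTEX_A710,
};

/*
 * "All versions" match: the variant [23:20] and revision [3:0] fields
 * are deliberately ignored, only implementer and part number count.
 */
static bool trbe_write_out_of_range_affected(uint32_t midr)
{
	size_t i;

	if (MIDR_IMPLEMENTER(midr) != IMPLEMENTER_ARM)
		return false;

	for (i = 0; i < sizeof(affected_parts) / sizeof(affected_parts[0]); i++) {
		if (MIDR_PARTNUM(midr) == affected_parts[i])
			return true;
	}
	return false;
}
```

In the kernel the match result only sets a per-CPU capability (the entry is a weak local CPU feature), and the TRBE driver then looks that capability up for each TRBE instance.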

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 16/17] coresight: trbe: Work around write to out of range

TRBE implementations affected by Arm erratum 2253138 or 2224489 could write to the next address after the TRBLIMITR.LIMIT, instead of wrapping to the TRBBASER. This implies that the TRBE could potentially corrupt:

 - A page used by the rest of the kernel/user (if the LIMIT = end of perf ring buffer)
 - A page within the ring buffer, but outside the driver's range [head, head + size]. This may contain some trace data that may be consumed by userspace.

We work around this erratum by:

 - Making sure that there is at least an extra PAGE of space left in the TRBE's range, beyond what we normally assign. This is in addition to other restrictions (e.g., the TRBE alignment for working around TRBE_WORKAROUND_OVERWRITE_FILL_MODE, where there is a minimum of PAGE_SIZE; thus we would have 2 * PAGE_SIZE).

 - Adjusting the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed range (i.e., TRBBASER...TRBLIMITR.LIMIT), by:

	TRBLIMITR.LIMIT -= PAGE_SIZE

Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
index 02f9e00e2091..ea907345354c 100644
--- a/drivers/hwtracing/coresight/coresight-trbe.c
+++ b/drivers/hwtracing/coresight/coresight-trbe.c
@@ -86,7 +86,8 @@ struct trbe_buf {
  * affects the given instance of the TRBE.
  */
 #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE	0
-#define TRBE_ERRATA_MAX				1
+#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE	1
+#define TRBE_ERRATA_MAX				2

 /*
  * Safe limit for the number of bytes that may be overwritten
@@ -96,6 +97,7 @@ struct trbe_buf {

 static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
 	[TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
+	[TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
 };

 /*
@@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)

 static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
 {
-	return TRBE_TRACE_MIN_BUF_SIZE;
+	u64 size = TRBE_TRACE_MIN_BUF_SIZE;
+	struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
+
+	/*
+	 * When the TRBE is affected by an erratum that could make it
+	 * write to the next "virtually addressed" page beyond the LIMIT.
+	 * We need to make sure there is always a PAGE after the LIMIT,
+	 * within the buffer. Thus we ensure there is at least an extra
+	 * page than normal. With this we could then adjust the LIMIT
+	 * pointer down by a PAGE later.
+	 */
+	if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE))
+		size += PAGE_SIZE;
+	return size;
 }

 /*
@@ -585,6 +600,17 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
 	/*
 	 * If the TRBE has wrapped around the write pointer has
 	 * wrapped and should be treated as limit.
+	 *
+	 * When the TRBE is affected by TRBE_WORKAROUND_WRITE_OUT_OF_RANGE,
+	 * it may write up to 64 bytes beyond the "LIMIT". The driver already
+	 * keeps a valid page next to the LIMIT and we could potentially
+	 * consume the trace data that may have been collected there. But we
+	 * cannot be really sure it is available, and the TRBPTR may not
+	 * indicate the same. Also, affected cores are also affected by another
+	 * erratum which forces the PAGE_SIZE alignment on the TRBPTR, and thus
+	 * could potentially pad an entire PAGE_SIZE - 64 bytes, to get those
+	 * 64 bytes. Thus we ignore the potential triggering of the erratum
+	 * on WRAP and limit the data to LIMIT.
 	 */
 	if (wrap)
 		write = get_trbe_limit_pointer();
@@ -811,6 +837,35 @@ static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
 		buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
 	}

+	/*
+	 * TRBE_WORKAROUND_WRITE_OUT_OF_RANGE could cause the TRBE to write to
+	 * the next page after the TRBLIMITR.LIMIT. For perf, the "next page"
+	 * may be:
+	 *     - The page beyond the ring buffer. This could mean, TRBE could
+	 *       corrupt another entity (kernel / user)
+	 *     - A portion of the "ring buffer" consumed by the userspace.
+	 *       i.e, a page outside [head, head + size].
+	 *
+	 * We work around this by:
+	 *     - Making sure that we have at least an extra space of PAGE left
+	 *       in the ring buffer [head, head + size], than we normally do
+	 *       without the erratum. See trbe_min_trace_buf_size().
+	 *
+	 *     - Adjust the TRBLIMITR.LIMIT to leave the extra PAGE outside
+	 *       the TRBE's range (i.e [TRBBASER, TRBLIMITR.LIMIT] ).
+	 */
+	if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_WRITE_OUT_OF_RANGE)) {
+		s64 space = buf->trbe_limit - buf->trbe_write;
+		/*
+		 * We must have more than a PAGE_SIZE worth space in the proposed
+		 * range for the TRBE.
+		 */
+		if (WARN_ON(space <= PAGE_SIZE ||
+			    !IS_ALIGNED(buf->trbe_limit, PAGE_SIZE)))
+			return -EINVAL;
+		buf->trbe_limit -= PAGE_SIZE;
+	}
+
 	return 0;
 }

-- 2.24.1
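
The two halves of the patch 16 workaround (reserve an extra page when sizing the buffer, then pull the LIMIT down by a page before enabling) can be tried out in isolation. The sketch below is a user-space approximation only: `struct sketch_buf`, the `sketch_*` helpers, the fixed 4 KiB page size and the 64-byte base minimum are assumptions made for this example, not values taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096	/* assumed page size */
#define SKETCH_MIN_BUF_SIZE	64	/* placeholder for TRBE_TRACE_MIN_BUF_SIZE */

/* Hypothetical reduction of struct trbe_buf to what the workaround touches. */
struct sketch_buf {
	uint64_t write;			/* where the TRBE starts writing */
	uint64_t limit;			/* end of the range given to the TRBE */
	bool write_out_of_range;	/* erratum present on this CPU? */
};

/* Step 1: ask for one extra page when the erratum applies. */
static uint64_t sketch_min_buf_size(bool write_out_of_range)
{
	uint64_t size = SKETCH_MIN_BUF_SIZE;

	if (write_out_of_range)
		size += SKETCH_PAGE_SIZE;
	return size;
}

/*
 * Step 2: before enabling, leave the last page outside the programmed
 * range, so that a stray write past LIMIT still lands inside the area
 * reserved for this run.
 */
static int sketch_adjust_limit(struct sketch_buf *buf)
{
	int64_t space;

	if (!buf->write_out_of_range)
		return 0;

	space = (int64_t)buf->limit - (int64_t)buf->write;
	/* Need strictly more than a page, and a page-aligned limit. */
	if (space <= SKETCH_PAGE_SIZE || (buf->limit % SKETCH_PAGE_SIZE) != 0)
		return -EINVAL;

	buf->limit -= SKETCH_PAGE_SIZE;
	return 0;
}
```

For example, a range with `write = 0x1000` and `limit = 0x4000` ends up with `limit = 0x3000`, while a range with only 0x800 bytes of space is rejected with `-EINVAL` and left unchanged. Because step 1 already guarantees more than a page of space, the rejection path in step 2 should only fire if the sizing logic has gone wrong, which is why the driver wraps it in `WARN_ON()`.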

Suzuki K Poulose

1:41 p.m.

New subject: [PATCH v2 17/17] arm64: Advertise TRBE erratum workaround for write to out-of-range address

Add Kconfig entries for the errata workarounds for TRBE writing to an out-of-range address.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 Documentation/arm64/silicon-errata.rst |  4 +++
 arch/arm64/Kconfig                     | 39 ++++++++++++++++++++++++++
 2 files changed, 43 insertions(+)

diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 569a92411dcd..5342e895fb60 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -96,6 +96,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Cortex-A710     | #2054223        | ARM64_ERRATUM_2054223       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Cortex-A710     | #2224489        | ARM64_ERRATUM_2224489       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1188873,1418040| ARM64_ERRATUM_1418040       |
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N1     | #1349291        | N/A                         |
@@ -106,6 +108,8 @@ stable kernels.
 +----------------+-----------------+-----------------+-----------------------------+
 | ARM            | Neoverse-N2     | #2067961        | ARM64_ERRATUM_2067961       |
 +----------------+-----------------+-----------------+-----------------------------+
+| ARM            | Neoverse-N2     | #2253138        | ARM64_ERRATUM_2253138       |
++----------------+-----------------+-----------------+-----------------------------+
 | ARM            | MMU-500         | #841119,826419  | N/A                         |
 +----------------+-----------------+-----------------+-----------------------------+
 +----------------+-----------------+-----------------+-----------------------------+
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 0764774e12bb..611ae02aabbd 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -736,6 +736,45 @@ config ARM64_ERRATUM_2067961

 	  If unsure, say Y.

+config ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+	bool
+
+config ARM64_ERRATUM_2253138
+	bool "Neoverse-N2: 2253138: workaround TRBE writing to address out-of-range"
+	depends on CORESIGHT_TRBE
+	default y
+	select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+	help
+	  This option adds the workaround for ARM Neoverse-N2 erratum 2253138.
+
+	  Affected Neoverse-N2 cores might write to an out-of-range address, not reserved
+	  for TRBE. Under some conditions, the TRBE might generate a write to the next
+	  virtually addressed page following the last page of the TRBE address space
+	  (i.e, the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
+
+	  We work around this in the driver by, always making sure that there is a
+	  page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
+
+	  If unsure, say Y.
+
+config ARM64_ERRATUM_2224489
+	bool "Cortex-A710: 2224489: workaround TRBE writing to address out-of-range"
+	depends on CORESIGHT_TRBE
+	default y
+	select ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE
+	help
+	  This option adds the workaround for ARM Cortex-A710 erratum 2224489.
+
+	  Affected Cortex-A710 cores might write to an out-of-range address, not reserved
+	  for TRBE. Under some conditions, the TRBE might generate a write to the next
+	  virtually addressed page following the last page of the TRBE address space
+	  (i.e, the TRBLIMITR_EL1.LIMIT), instead of wrapping around to the base.
+
+	  We work around this in the driver by, always making sure that there is a
+	  page beyond the TRBLIMITR_EL1.LIMIT, within the space allowed for the TRBE.
+
+	  If unsure, say Y.
+
 config CAVIUM_ERRATUM_22375
 	bool "Cavium erratum 22375, 24313"
 	default y

-- 2.24.1

Mathieu Poirier

5 Oct

5:04 p.m.

On Tue, Sep 21, 2021 at 02:41:04PM +0100, Suzuki K Poulose wrote:

...

This series adds CPU erratum work arounds related to the self-hosted tracing. The list of affected errata handled in this series are :

* TRBE may overwrite trace in FILL mode
  - Arm Neoverse-N2 #2139208
  - Cortex-A710 #211985

* A TSB instruction may not flush the trace completely when executed in trace prohibited region.
  - Arm Neoverse-N2 #2067961
  - Cortex-A710 #2054223

* TRBE may write to out-of-range address
  - Arm Neoverse-N2 #2253138
  - Cortex-A710 #2224489

The series applies on the self-hosted/trbe fixes posted here [0]. A tree containing both the series is available here [1]

[0] https://lkml.kernel.org/r/20210914102641.1852544-1-suzuki.poulose@arm.com
[1] git@git.gitlab.arm.com:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v2

Changes since v1:
https://lkml.kernel.org/r/20210728135217.591173-1-suzuki.poulose@arm.com
 - Added a fix to the TRBE driver handling of sink_specific data
 - Added more description and ASCII art for overwrite in FILL mode work around
 - Added another TRBE erratum to the list.
   "TRBE may write to out-of-range address" Patches from 12-17
 - Added comment to list the expectations around TSB erratum workaround.

Suzuki K Poulose (17):
  coresight: trbe: Fix incorrect access of the sink specific data
  coresight: trbe: Add infrastructure for Errata handling
  coresight: trbe: Add a helper to calculate the trace generated
  coresight: trbe: Add a helper to pad a given buffer area
  coresight: trbe: Decouple buffer base from the hardware base
  coresight: trbe: Allow driver to choose a different alignment
  arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
  arm64: Add erratum detection for TRBE overwrite in FILL mode
  coresight: trbe: Workaround TRBE errata overwrite in FILL mode
  arm64: Enable workaround for TRBE overwrite in FILL mode
  arm64: errata: Add workaround for TSB flush failures
  coresight: trbe: Add a helper to fetch cpudata from perf handle
  coresight: trbe: Add a helper to determine the minimum buffer size
  coresight: trbe: Make sure we have enough space
  arm64: Add erratum detection for TRBE write to out-of-range
  coresight: trbe: Work around write to out of range
  arm64: Advertise TRBE erratum workaround for write to out-of-range address

 Documentation/arm64/silicon-errata.rst       |  12 +
 arch/arm64/Kconfig                           | 109 ++++++
 arch/arm64/include/asm/barrier.h             |  16 +-
 arch/arm64/include/asm/cputype.h             |   4 +
 arch/arm64/kernel/cpu_errata.c               |  64 ++++
 arch/arm64/tools/cpucaps                     |   3 +
 drivers/hwtracing/coresight/coresight-trbe.c | 339 +++++++++++++++++--
 7 files changed, 510 insertions(+), 37 deletions(-)

Patches 04 to 11 and 13 to 17:

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>

I am done reviewing this set.

Thanks, Mathieu

...

-- 2.24.1
