This patchset adds support for stopping trace on all enabled coresight
sources upon execution of a programmed instruction address.
ETM stop event is triggered by an instruction address match on a comparator.
This address match event is then used to stop the tracing across all ETM
source devices with the help of external input/output signals and ECT.
The instruction address for the stop event can be programmed using sysfs
interface.
The event flow diagram is like this,
Address comparator --> EXT OUT --> ECT---> EXT IN --> Counter-->ViewInst
The original intention of this patch is to stop all ETM sources at the
time of kernel panic without software intervention so that it can be
used as one of the building block while enabling panic/kdump support in
coresight drivers [1].
But there can be other use cases like stopping trace on assertions or
error functions etc. as well.
I am sending this patch as an RFC so as to get an early feedback
on the approach taken for implementation and other inputs if any.
Few caveats:
- Testing was done only with sysfs interface using a arbitrary kernel
symbol address.
Perf support will be added later based on initial feedback.
- CTI hook for enabling trace event connection need to be rewritten
to use existing support APIs.
[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1652258.html
Linu Cherian (2):
coresight: Add support to setup Trace event signals
coresight: etm4x: Add support to generate and synchronize stop event
drivers/hwtracing/coresight/coresight-core.c | 37 ++++
.../hwtracing/coresight/coresight-cti-core.c | 21 +++
drivers/hwtracing/coresight/coresight-cti.h | 3 +
.../coresight/coresight-etm4x-core.c | 167 ++++++++++++++++++
.../coresight/coresight-etm4x-sysfs.c | 64 +++++++
drivers/hwtracing/coresight/coresight-etm4x.h | 16 ++
include/linux/coresight.h | 6 +
7 files changed, 314 insertions(+)
--
2.31.1
This series adds CPU erratum work arounds related to the self-hosted
tracing. The list of affected errata handled in this series are :
* TRBE may overwrite trace in FILL mode
- Arm Neoverse-N2 #2139208
- Cortex-A710 #211985
* A TSB instruction may not flush the trace completely when executed
in trace prohibited region.
- Arm Neoverse-N2 #2067961
- Cortex-A710 #2054223
* TRBE may write to out-of-range address
- Arm Neoverse-N2 #2253138
- Cortex-A710 #2224489
The series applies on the self-hosted/trbe fixes posted here [0].
A tree containing both the series is available here [1]
[0] https://lkml.kernel.org/r/20210914102641.1852544-1-suzuki.poulose@arm.com
[1] git@git.gitlab.arm.com:linux-arm/linux-skp.git coresight/errata/trbe-tsb-n2-a710/v2
Changes since v1:
https://lkml.kernel.org/r/20210728135217.591173-1-suzuki.poulose@arm.com
- Added a fix to the TRBE driver handling of sink_specific data
- Added more description and ASCII art for overwrite in FILL mode
work around
- Added another TRBE erratum to the list.
"TRBE may write to out-of-range address"
Patches from 12-17
- Added comment to list the expectations around TSB erratum workaround.
Suzuki K Poulose (17):
coresight: trbe: Fix incorrect access of the sink specific data
coresight: trbe: Add infrastructure for Errata handling
coresight: trbe: Add a helper to calculate the trace generated
coresight: trbe: Add a helper to pad a given buffer area
coresight: trbe: Decouple buffer base from the hardware base
coresight: trbe: Allow driver to choose a different alignment
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
arm64: Add erratum detection for TRBE overwrite in FILL mode
coresight: trbe: Workaround TRBE errata overwrite in FILL mode
arm64: Enable workaround for TRBE overwrite in FILL mode
arm64: errata: Add workaround for TSB flush failures
coresight: trbe: Add a helper to fetch cpudata from perf handle
coresight: trbe: Add a helper to determine the minimum buffer size
coresight: trbe: Make sure we have enough space
arm64: Add erratum detection for TRBE write to out-of-range
coresight: trbe: Work around write to out of range
arm64: Advertise TRBE erratum workaround for write to out-of-range address
Documentation/arm64/silicon-errata.rst | 12 +
arch/arm64/Kconfig | 109 ++++++
arch/arm64/include/asm/barrier.h | 16 +-
arch/arm64/include/asm/cputype.h | 4 +
arch/arm64/kernel/cpu_errata.c | 64 ++++
arch/arm64/tools/cpucaps | 3 +
drivers/hwtracing/coresight/coresight-trbe.c | 339 +++++++++++++++++--
7 files changed, 510 insertions(+), 37 deletions(-)
--
2.24.1
Adds a generic API to allow packet processors to count the amount of bytes per channel
processed and not synced plus any packet header or format errors.
The ETMv4 / ETE packet processor is updated to use this API.
Statistics are also provided for the CoreSight frame demux, which are global across
all channels.
Typical output from trc_pkt_lister when run with -stats option to print stats:
===============================================================
Trace Packet Lister : Trace buffer done, processed 65536 bytes.
Reading packet decoder statistics....
Decode stats ID 0x10
Total Bytes: 55273; Unsynced Bytes: 1453
Bad Header Errors: 0; Bad Sequence Errors: 0
Decode stats ID 0x11
Total Bytes: 672; Unsynced Bytes: 132
Bad Header Errors: 0; Bad Sequence Errors: 0
Decode stats ID 0x12
Total Bytes: 672; Unsynced Bytes: 648
Bad Header Errors: 0; Bad Sequence Errors: 0
Decode stats ID 0x13
Total Bytes: 698; Unsynced Bytes: 0
Bad Header Errors: 0; Bad Sequence Errors: 0
Decode stats ID 0x14
Total Bytes: 0; Unsynced Bytes: 0
Bad Header Errors: 0; Bad Sequence Errors: 0
Decode stats ID 0x15
Total Bytes: 2783; Unsynced Bytes: 471
Bad Header Errors: 0; Bad Sequence Errors: 0
Frame Demux Stats
Trace data bytes sent to registered ID decoders: 60098
Trace data bytes without registered ID decoders: 0
Trace data bytes with unknown ID: 81
Trace data bytes with reserved ID: 22
Frame demux bytes, ID bytes and sync bytes: 5335
Total bytes processed by frame demux: 65536
================================================================
API adds ocsd_decode_stats_t structure to contain the statistics. (ocsd_if_types.h)
C-API (ocsd_c_apo.h) adds functions:-
ocsd_dt_get_decode_stats() - get pointer to stats block.
ocsd_dt_reset_decode_stats() - resets the counts to zero. This function operates independently
of the main decoder reset.
This allows for tools such as perf which may reset the decoder multiple times per AUXTRACE_BUFFER
to count stats for the entire buffer rather than each capture block.
Changes since v1:
1) stats structure now contains a block for frame demux data.
2) patchset contains a number of additional build and code fixes that will be included
in the 1.2.0 release.
James Clark (1):
opencsd: Remove noisy printf
Mike Leach (10):
opencsd: Add decode statistics API to packet processor.
opencsd: ETMv4: ETE: Add packet processing stats to decoders.
tests: Update test programs to use the packet decoder statistics API
build: tests: Fix build warnings in snapshot parser.
build: tests: Fix build warnings in mem_buff_demo test
build: tests: Fix build warnings in C-API test program
opencsd: stats: Add collection of CoreSight frame demux stats
tests: Add printing of CS frame Demux stats to test program.
tests: Update test scripts to pass additional options
opencsd: Update readme and version info for v1.2.0
Yi Kong (1):
opencsd: build: Remove unused variable
README.md | 9 ++-
decoder/include/common/ocsd_dcd_tree.h | 29 ++++++++-
.../include/common/trc_frame_deformatter.h | 4 ++
decoder/include/common/trc_pkt_proc_base.h | 47 +++++++++++++-
decoder/include/opencsd/c_api/opencsd_c_api.h | 30 ++++++++-
decoder/include/opencsd/ocsd_if_types.h | 47 ++++++++++++++
decoder/include/opencsd/ocsd_if_version.h | 6 +-
decoder/source/c_api/ocsd_c_api.cpp | 18 +++++-
decoder/source/etmv4/trc_pkt_proc_etmv4i.cpp | 10 ++-
decoder/source/ocsd_dcd_tree.cpp | 64 +++++++++++++++++++
decoder/source/trc_frame_deformatter.cpp | 60 +++++++++++++++--
decoder/source/trc_frame_deformatter_impl.h | 14 +++-
decoder/source/trc_printable_elem.cpp | 2 -
decoder/tests/run_pkt_decode_single.bash | 7 +-
decoder/tests/run_pkt_decode_tests-ete.bash | 11 ++--
decoder/tests/run_pkt_decode_tests.bash | 17 +++--
.../include/snapshot_parser.h | 6 +-
.../source/snapshot_parser.cpp | 4 ++
decoder/tests/source/c_api_pkt_print_test.c | 39 ++++++++++-
decoder/tests/source/mem_buff_demo.cpp | 9 ++-
decoder/tests/source/trc_pkt_lister.cpp | 63 +++++++++++++++++-
21 files changed, 455 insertions(+), 41 deletions(-)
--
2.17.1
Hi Tao
On 26/09/2021 11:35, Tao Zhang wrote:
> clang-12 fails to build the etm4x driver with -fsanitize=array-bounds,
> where it decides to unroll certain loops in a way that result in a
> C variable getting put into an inline assembly.
>
> Search this build failure and find this is a known issue and there
> has been a mail thread discussing it.
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210429145752.…
> According to the modification suggestions of this mail thread,
> coresight infrastucture has already provided another API that
> can replace the function that caused the error.
>
> Used here "csdev_access_read32" to replace the original API
> "etm4x_relaxed_read32".
>
> This patch applies to coresight/next
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
Thanks for picking up the patch. Please could you convert all the other
variable indexed register access too ? That would save us spinning up
patches for fixing those individual cases whenever the compiler decides
to change its behavior.
Suzuki
Hi Tao,
On Sun, Sep 26, 2021 at 06:35:50PM +0800, Tao Zhang wrote:
> clang-12 fails to build the etm4x driver with -fsanitize=array-bounds,
> where it decides to unroll certain loops in a way that result in a
> C variable getting put into an inline assembly.
>
> Search this build failure and find this is a known issue and there
> has been a mail thread discussing it.
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210429145752.…
> According to the modification suggestions of this mail thread,
> coresight infrastucture has already provided another API that
> can replace the function that caused the error.
>
> Used here "csdev_access_read32" to replace the original API
> "etm4x_relaxed_read32".
>
> This patch applies to coresight/next
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
> ---
> drivers/hwtracing/coresight/coresight-etm4x-core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> index f58afbab6e6d..0bca8e2be070 100644
> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> @@ -797,7 +797,7 @@ static void etm4_disable_hw(void *info)
> /* read back the current counter values */
> for (i = 0; i < drvdata->nr_cntr; i++) {
> config->cntr_val[i] =
> - etm4x_relaxed_read32(csa, TRCCNTVRn(i));
> + csdev_access_read32(csa, TRCCNTVRn(i));
It seems like the patch you are referencing above was never applied... So the
question is, how is it that only this instance is giving you trouble when there
are many more instances of the same pattern in the file?
Thanks,
Mathieu
> }
>
> coresight_disclaim_device_unlocked(csdev);
> --
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
>
On 23/09/2021 07:13, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> ARM Neoverse-N2 (#2139208) and Cortex-A710(##2119858) suffers from
>> an erratum, which when triggered, might cause the TRBE to overwrite
>> the trace data already collected in FILL mode, in the event of a WRAP.
>> i.e, the TRBE doesn't stop writing the data, instead wraps to the base
>> and could write upto 3 cache line size worth trace. Thus, this could
>> corrupt the trace at the "BASE" pointer.
>>
>> The workaround is to program the write pointer 256bytes from the
>
> 3 cache lines = 256 bytes on all implementation which might have TRBE ?
> OR this skid bytes should be derived from the platform cache line size
> instead.
256bytes is the aligned (to the power of 2) value for the safe guard.
Not 3 cache lines. Ideally, if there is another CPU that has larger
cache line size, affected by the erratum, yes, we must do that.
But for now this is sufficient.
>
>> base, such that if the erratum is triggered, it doesn't overwrite
>> the trace data that was captured. This skipped region could be
>> padded with ignore packets at the end of the session, so that
>> the decoder sees a continuous buffer with some padding at the
>> beginning. The trace data written at the base is considered
>> lost as the limit could have been in the middle of the perf
>> ring buffer, and jumping to the "base" is not acceptable.
>> We set the flags already to indicate that some amount of trace
>> was lost during the FILL event IRQ. So this is fine.
>
> Via PERF_AUX_FLAG_TRUNCATED ? Should be specified here to be clear.
Please note that setting the flag is not a side effect of the
work around. And as such I don't think this needs to be mentioned
here. e.g, we changed this to COLLISION recently for WRAP events.
It makes sense to keep the details to the driver.
>
>>
>> One important change with the work around is, we program the
>> TRBBASER_EL1 to current page where we are allowed to write.
>> Otherwise, it could overwrite a region that may be consumed
>> by the perf. Towards this, we always make sure that the
>> "handle->head" and thus the trbe_write is PAGE_SIZE aligned,
>> so that we can set the BASE to the PAGE base and move the
>> TRBPTR to the 256bytes offset.
>>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> Change since v1:
>> - Updated comment with ASCII art
>> - Add _BYTES suffix for the space to skip for the work around.
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 144 +++++++++++++++++--
>> 1 file changed, 132 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index f569010c672b..983dd5039e52 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -16,6 +16,7 @@
>> #define pr_fmt(fmt) DRVNAME ": " fmt
>>
>> #include <asm/barrier.h>
>> +#include <asm/cpufeature.h>
>> #include <asm/cputype.h>
>>
>> #include "coresight-self-hosted-trace.h"
>> @@ -84,9 +85,17 @@ struct trbe_buf {
>> * per TRBE instance, we keep track of the list of errata that
>> * affects the given instance of the TRBE.
>> */
>> -#define TRBE_ERRATA_MAX 0
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
>> +#define TRBE_ERRATA_MAX 1
>> +
>> +/*
>> + * Safe limit for the number of bytes that may be overwritten
>> + * when the erratum is triggered.
>> + */
>> +#define TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES 256
>
> As mentioned earlier, does it depend on the platform cache line size ?
> Otherwise if the skip bytes is something platform independent, should
> be mentioned here in a comment.
I could add in a comment.
>
>>
>> static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> + [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>> };
>>
>> /*
>> @@ -519,10 +528,13 @@ static void trbe_enable_hw(struct trbe_buf *buf)
>> set_trbe_limit_pointer_enabled(buf->trbe_limit);
>> }
>>
>> -static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>> +static enum trbe_fault_action trbe_get_fault_act(struct perf_output_handle *handle,
>> + u64 trbsr)
>> {
>> int ec = get_trbe_ec(trbsr);
>> int bsc = get_trbe_bsc(trbsr);
>> + struct trbe_buf *buf = etm_perf_sink_config(handle);
>> + struct trbe_cpudata *cpudata = buf->cpudata;
>
> Passing down the perf handle to derive trbe_cpudata seems to be right.
>
>>
>> WARN_ON(is_trbe_running(trbsr));
>> if (is_trbe_trg(trbsr) || is_trbe_abort(trbsr))
>> @@ -531,10 +543,16 @@ static enum trbe_fault_action trbe_get_fault_act(u64 trbsr)
>> if ((ec == TRBE_EC_STAGE1_ABORT) || (ec == TRBE_EC_STAGE2_ABORT))
>> return TRBE_FAULT_ACT_FATAL;
>>
>> - if (is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) {
>> - if (get_trbe_write_pointer() == get_trbe_base_pointer())
>> - return TRBE_FAULT_ACT_WRAP;
>> - }
>> + /*
>> + * If the trbe is affected by TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>> + * it might write data after a WRAP event in the fill mode.
>> + * Thus the check TRBPTR == TRBBASER will not be honored.
>> + */
>
> Needs bit formatting/alignment cleanup.
>
>> + if ((is_trbe_wrap(trbsr) && (ec == TRBE_EC_OTHERS) && (bsc == TRBE_BSC_FILLED)) &&
>> + (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ||
>> + get_trbe_write_pointer() == get_trbe_base_pointer()))
>> + return TRBE_FAULT_ACT_WRAP;
>> +
>
> Right, TRBE without the errata should continue to have the write
> pointer = base pointer check. Could all TRBE errata checks like
> the following be shortened (without the workaround index) for
> better readability ? But not something very important.
>
> trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)
Do you mean something like :
trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) ->
trbe_may_overwrite_in_fill_mode(cpudata) ?
>
>
>> return TRBE_FAULT_ACT_SPURIOUS;
>> }
>>
>> @@ -544,6 +562,8 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>> {
>> u64 write;
>> u64 start_off, end_off;
>> + u64 size;
>> + u64 overwrite_skip = TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>>
>> /*
>> * If the TRBE has wrapped around the write pointer has
>> @@ -559,7 +579,18 @@ static unsigned long trbe_get_trace_size(struct perf_output_handle *handle,
>>
>> if (WARN_ON_ONCE(end_off < start_off))
>> return 0;
>> - return (end_off - start_off);
>> +
>> + size = end_off - start_off;
>> + /*
>> + * If the TRBE is affected by the following erratum, we must fill
>> + * the space we skipped with IGNORE packets. And we are always
>> + * guaranteed to have at least a PAGE_SIZE space in the buffer.
>> + */
>> + if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE) &&
>> + !WARN_ON(size < overwrite_skip))
>> + __trbe_pad_buf(buf, start_off, overwrite_skip);
>> +
>> + return size;
>> }
>>
>> static void *arm_trbe_alloc_buffer(struct coresight_device *csdev,
>> @@ -678,7 +709,7 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>> clr_trbe_irq();
>> isb();
>>
>> - act = trbe_get_fault_act(status);
>> + act = trbe_get_fault_act(handle, status);
>> /*
>> * If this was not due to a WRAP event, we have some
>> * errors and as such buffer is empty.
>> @@ -702,21 +733,95 @@ static unsigned long arm_trbe_update_buffer(struct coresight_device *csdev,
>> return size;
>> }
>>
>> +
>> +static int trbe_apply_work_around_before_enable(struct trbe_buf *buf)
>> +{
>> + /*
>> + * TRBE_WORKAROUND_OVERWRITE_FILL_MODE causes the TRBE to overwrite a few cache
>
> few cache lines = 3 cache lines ?
Yes, upto 3.
>
>> + * line size from the "TRBBASER_EL1" in the event of a "FILL".
>> + * Thus, we could loose some amount of the trace at the base.
>> + *
>> + * Before Fix:
>> + *
>> + * normal-BASE head normal-PTR tail normal-LIMIT
>> + * | \/ /
>> + * -------------------------------------------------------------
>> + * | | |xyzdefghij..|... tuvw| |
>> + * -------------------------------------------------------------
>> + * / | \
>> + * After Fix-> TRBBASER TRBPTR TRBLIMITR.LIMIT
>> + *
>> + * In the normal course of action, we would set the TRBBASER to the
>> + * beginning of the ring-buffer (normal-BASE). But with the erratum,
>> + * the TRBE could overwrite the contents at the "normal-BASE", after
>> + * hitting the "normal-LIMIT", since it doesn't stop as expected. And
>> + * this is wrong. So we must always make sure that the TRBBASER is
>> + * within the region [head, head+size].
>> + *
>> + * Also, we would set the TRBPTR to head (after adjusting for
>> + * alignment) at normal-PTR. This would mean that the last few bytes
>> + * of the trace (say, "xyz") might overwrite the first few bytes of
>> + * trace written ("abc"). More importantly they will appear in what\
>> + * userspace sees as the beginning of the trace, which is wrong. We may
>> + * not always have space to move the latest trace "xyz" to the correct
>> + * order as it must appear beyond the LIMIT. (i.e, [head..head+size].
>> + * Thus it is easier to ignore those bytes than to complicate the
>> + * driver to move it, assuming that the erratum was triggered and doing
>> + * additional checks to see if there is indeed allowed space at
>> + * TRBLIMITR.LIMIT.
>> + *
>> + * To summarize, with the work around:
>> + *
>> + * - We always align the offset for the next session to PAGE_SIZE
>> + * (This is to ensure we can program the TRBBASER to this offset
>> + * within the region [head...head+size]).
>> + *
>> + * - At TRBE enable:
>> + * - Set the TRBBASER to the page aligned offset of the current
>> + * proposed write offset. (which is guaranteed to be aligned
>> + * as above)
>> + * - Move the TRBPTR to skip first 256bytes (that might be
>> + * overwritten with the erratum). This ensures that the trace
>> + * generated in the session is not re-written.
>> + *
>> + * - At trace collection:
>> + * - Pad the 256bytes skipped above again with IGNORE packets.
>> + */
>> + if (trbe_has_erratum(buf->cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE)) {
>> + if (WARN_ON(!IS_ALIGNED(buf->trbe_write, PAGE_SIZE)))
>> + return -EINVAL;
>> + buf->trbe_hw_base = buf->trbe_write;
>> + buf->trbe_write += TRBE_WORKAROUND_OVERWRITE_FILL_MODE_SKIP_BYTES;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> static int __arm_trbe_enable(struct trbe_buf *buf,
>> struct perf_output_handle *handle)
>> {
>> + int ret = 0;
>> +
>> perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW);
>> buf->trbe_limit = compute_trbe_buffer_limit(handle);
>> buf->trbe_write = buf->trbe_base + PERF_IDX2OFF(handle->head, buf);
>> if (buf->trbe_limit == buf->trbe_base) {
>> - trbe_stop_and_truncate_event(handle);
>> - return -ENOSPC;
>> + ret = -ENOSPC;
>> + goto err;
>> }
>> /* Set the base of the TRBE to the buffer base */
>> buf->trbe_hw_base = buf->trbe_base;
>> +
>> + ret = trbe_apply_work_around_before_enable(buf);
>> + if (ret)
>> + goto err;
>> +
>> *this_cpu_ptr(buf->cpudata->drvdata->handle) = handle;
>> trbe_enable_hw(buf);
>> return 0;
>> +err:
>> + trbe_stop_and_truncate_event(handle);
>> + return ret;
>> }
>>
>> static int arm_trbe_enable(struct coresight_device *csdev, u32 mode, void *data)
>> @@ -860,7 +965,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>> if (!is_perf_trbe(handle))
>> return IRQ_NONE;
>>
>> - act = trbe_get_fault_act(status);
>> + act = trbe_get_fault_act(handle, status);
>> switch (act) {
>> case TRBE_FAULT_ACT_WRAP:
>> truncated = !!trbe_handle_overflow(handle);
>> @@ -1000,7 +1105,22 @@ static void arm_trbe_probe_cpu(void *info)
>> }
>>
>> trbe_check_errata(cpudata);
>> - cpudata->trbe_align = cpudata->trbe_hw_align;
>> + /*
>> + * If the TRBE is affected by erratum TRBE_WORKAROUND_OVERWRITE_FILL_MODE,
>> + * we must always program the TBRPTR_EL1, 256bytes from a page
>> + * boundary, with TRBBASER_EL1 set to the page, to prevent
>> + * TRBE over-writing 256bytes at TRBBASER_EL1 on FILL event.
>> + *
>> + * Thus make sure we always align our write pointer to a PAGE_SIZE,
>> + * which also guarantees that we have at least a PAGE_SIZE space in
>> + * the buffer (TRBLIMITR is PAGE aligned) and thus we can skip
>> + * the required bytes at the base.
>> + */
>> + if (trbe_has_erratum(cpudata, TRBE_WORKAROUND_OVERWRITE_FILL_MODE))
>> + cpudata->trbe_align = PAGE_SIZE;
>> + else
>> + cpudata->trbe_align = cpudata->trbe_hw_align;
>> +
>
> But like trbe_apply_work_around_before_enable(), trbe_align assignment
> should also be wrapped inside a new helper which should contain these
> comments and conditional block. Because it makes sense to have errata
> work arounds in the leaf level helper functions, rather than TRBE core
> operations.
That would imply we re-initialize the trbe_align in the new helper after
setting the value here for all other unaffected TRBEs. I would rather
leave it as it is, until we have more work arounds that touch this area.
This is one of code called per TRBE instance.
Suzuki
On 23/09/2021 04:15, Anshuman Khandual wrote:
>
>
> On 9/21/21 7:11 PM, Suzuki K Poulose wrote:
>> TRBE implementations affected by Arm erratum (2253138 or 2224489), could
>> write to the next address after the TRBLIMITR.LIMIT, instead of wrapping
>> to the TRBBASER. This implies that the TRBE could potentially corrupt :
>>
>> - A page used by the rest of the kernel/user (if the LIMIT = end of
>> perf ring buffer)
>> - A page within the ring buffer, but outside the driver's range.
>> [head, head + size]. This may contain some trace data, may be
>> consumed by the userspace.
>>
>> We workaround this erratum by :
>> - Making sure that there is at least an extra PAGE space left in the
>> TRBE's range than we normally assign. This will be additional to other
>> restrictions (e.g, the TRBE alignment for working around
>> TRBE_WORKAROUND_OVERWRITE_IN_FILL_MODE, where there is a minimum of PAGE_SIZE.
>> Thus we would have 2 * PAGE_SIZE)
>>
>> - Adjust the LIMIT to leave the last PAGE_SIZE out of the TRBE's allowed
>> range (i.e, TRBEBASER...TRBLIMITR.LIMIT), by :
>>
>> TRBLIMITR.LIMIT -= PAGE_SIZE
>>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 59 +++++++++++++++++++-
>> 1 file changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index 02f9e00e2091..ea907345354c 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -86,7 +86,8 @@ struct trbe_buf {
>> * affects the given instance of the TRBE.
>> */
>> #define TRBE_WORKAROUND_OVERWRITE_FILL_MODE 0
>> -#define TRBE_ERRATA_MAX 1
>> +#define TRBE_WORKAROUND_WRITE_OUT_OF_RANGE 1
>> +#define TRBE_ERRATA_MAX 2
>>
>> /*
>> * Safe limit for the number of bytes that may be overwritten
>> @@ -96,6 +97,7 @@ struct trbe_buf {
>>
>> static unsigned long trbe_errata_cpucaps[TRBE_ERRATA_MAX] = {
>> [TRBE_WORKAROUND_OVERWRITE_FILL_MODE] = ARM64_WORKAROUND_TRBE_OVERWRITE_FILL_MODE,
>> + [TRBE_WORKAROUND_WRITE_OUT_OF_RANGE] = ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE,
>> };
>>
>> /*
>> @@ -279,7 +281,20 @@ trbe_handle_to_cpudata(struct perf_output_handle *handle)
>>
>> static u64 trbe_min_trace_buf_size(struct perf_output_handle *handle)
>> {
>> - return TRBE_TRACE_MIN_BUF_SIZE;
>> + u64 size = TRBE_TRACE_MIN_BUF_SIZE;
>> + struct trbe_cpudata *cpudata = trbe_handle_to_cpudata(handle);
>> +
>> + /*
>> + * When the TRBE is affected by an erratum that could make it
>> + * write to the next "virtually addressed" page beyond the LIMIT.
>
> What if the next "virtually addressed" page is just blocked from future
> usage in the kernel and never really gets mapped into a physical page ?
That is the case today for vmap(), the end of the vm_area has a guard
page. But that implies when the erratum is triggered, the TRBE
encounters a fault and we need to handle that in the driver. This works
for "end" of the ring buffer. But not when the LIMIT is in the middle
of the ring buffer.
> In that case it would be guaranteed that, a next "virtually addressed"
> page would not even exist after the LIMIT pointer and hence the errata
> would not be triggered. Something like there is a virtual mapping cliff
> right after the LIMIT pointer from the MMU perspective.
>
> Although it might be bit tricky. Currently the entire ring buffer gets
> mapped at once with vmap() in arm_trbe_alloc_buffer(). Just to achieve
> the above solution, each computation of the LIMIT pointer needs to be
> followed by a temporary unmapping of next virtual page from existing
> vmap() buffer. Subsequently it could be mapped back as trbe_buf->pages
> always contains all the physical pages from the perf ring buffer.
It is much easier to leave a page aside than to do this map, unmap
dance, which might even change the VA address you get and thus it
complicates the TRBE driver in general. I believe this is much
simpler and we can reason about the code better. And all faults
are still illegal for the driver, which helps us to detect any
other issues in the TRBE.
Suzuki
On 24/09/2021 11:06, Tao Zhang wrote:
> Add ETM PID for Kryo-5XX to the list of supported ETMs.
> Otherwise, Kryo-5XX ETMs will not be initialized successfully.
> e.g.
> This change can be verified on qrb5165-rb5 board. ETM4-ETM7 nodes
> will not be visible without this change.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
I have queued this one.
Suzuki
Hi Tao
On 24/09/2021 11:06, Tao Zhang wrote:
> Add the basic coresight components found on Qualcomm SM8250 Soc. The
> basic coresight components include ETF, ETMs,STM and the related
> funnels.
>
> Signed-off-by: Tao Zhang <quic_taozha(a)quicinc.com>
Acked-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
PS: This patch must go via the Qcom DT maintainers. I would
recommend sending this to the following people, so that it
can be queued.
$ scripts/get_maintainer.pl arch/arm64/boot/dts/qcom/qrb5165-rb5.dts
Andy Gross <agross(a)kernel.org> (maintainer:ARM/QUALCOMM SUPPORT)
Bjorn Andersson <bjorn.andersson(a)linaro.org> (maintainer:ARM/QUALCOMM
SUPPORT)
Rob Herring <robh+dt(a)kernel.org> (maintainer:OPEN FIRMWARE AND FLATTENED
DEVICE TREE BINDINGS)
linux-arm-msm(a)vger.kernel.org (open list:ARM/QUALCOMM SUPPORT)
devicetree(a)vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED DEVICE
TREE BINDINGS)
linux-kernel(a)vger.kernel.org (open list)
Kind regards
Suzuki