This patchset consists of refactoring to allow the decoder to be
created in advance when the AUX records are iterated over. The
AUX record flags are used to communicate whether the data is
formatted or not which is the reason this refactoring is required.
These changes result in some simplifications, removal of early exit
conditions etc.
A change was also made to --dump-raw-trace code to allow the
formatted/unformatted status to persist and for the decoder to
not be continually deleted and recreated.
The changes apply on top of the previous patchset "[PATCH v7 0/2] perf
cs-etm: Split Coresight decode by aux records".
Changes since v1:
* Change 'decoders_per_cpu' variable name to 'decoders' and add a comment
* Add a warning that piped mode is best effort, suggested by Suzuki
James Clark (6):
perf cs-etm: Refactor initialisation of kernel start address
perf cs-etm: Split setup and timestamp search functions
perf cs-etm: Only setup queues when they are modified
perf cs-etm: Suppress printing when resetting decoder
perf cs-etm: Use existing decoder instead of resetting it
perf cs-etm: Pass unformatted flag to decoder
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 14 +-
tools/perf/util/cs-etm.c | 185 +++++++++---------
2 files changed, 97 insertions(+), 102 deletions(-)
--
2.28.0
Patch release for OpenCSD v.1.1.1 made to address C-API include file
issues for the ETE decoder, raised by work on perf tools
Regards
Mike
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Add -lstdc++ to perf when linking libopencsd as it is a dependency. It
does not hurt to add it when dynamic linking.
Filter out -static flag when building plugins as they are always built
as dynamic libraries and -static and -dynamic don't work well together
on arm and arm64.
Signed-off-by: Tamas Zsoldos <tamas.zsoldos(a)arm.com>
Signed-off-by: Branislav Rankov <branislav.rankov(a)arm.com>
---
tools/perf/Makefile.config | 2 +-
tools/perf/Makefile.perf | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 73df23dd664c..b014a9bdd0db 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -143,7 +143,7 @@ FEATURE_CHECK_LDFLAGS-libcrypto = -lcrypto
ifdef CSINCLUDES
LIBOPENCSD_CFLAGS := -I$(CSINCLUDES)
endif
-OPENCSDLIBS := -lopencsd_c_api -lopencsd
+OPENCSDLIBS := -lopencsd_c_api -lopencsd -lstdc++
ifdef CSLIBS
LIBOPENCSD_LDFLAGS := -L$(CSLIBS)
endif
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index e47f04e5b51e..cd3cf910fa8a 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -792,7 +792,7 @@ endif
$(patsubst perf-%,%.o,$(PROGRAMS)): $(wildcard */*.h)
-LIBTRACEEVENT_FLAGS += plugin_dir=$(plugindir_SQ) 'EXTRA_CFLAGS=$(EXTRA_CFLAGS)' 'LDFLAGS=$(LDFLAGS)'
+LIBTRACEEVENT_FLAGS += plugin_dir=$(plugindir_SQ) 'EXTRA_CFLAGS=$(EXTRA_CFLAGS)' 'LDFLAGS=$(filter-out -static,$(LDFLAGS))'
$(LIBTRACEEVENT): FORCE
$(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) $(OUTPUT)libtraceevent.a
--
2.17.1
This patchset consists of refactoring to allow the decoder to be
created in advance when the AUX records are iterated over. The
AUX record flags are used to communicate whether the data is
formatted or not which is the reason this refactoring is required.
These changes result in some simplifications, removal of early exit
conditions etc.
A change was also made to --dump-raw-trace code to allow the
formatted/unformatted status to persist and for the decoder to
not be continually deleted and recreated.
The changes apply on top of the previous patchset "[PATCH v7 0/2] perf
cs-etm: Split Coresight decode by aux records".
James Clark (6):
perf cs-etm: Refactor initialisation of kernel start address
perf cs-etm: Split setup and timestamp search functions
perf cs-etm: Only setup queues when they are modified
perf cs-etm: Suppress printing when resetting decoder
perf cs-etm: Use existing decoder instead of resetting it
perf cs-etm: Pass unformatted flag to decoder
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 +-
tools/perf/util/cs-etm.c | 174 ++++++++----------
2 files changed, 84 insertions(+), 100 deletions(-)
--
2.28.0
Change applies to perf/core (45237f9898fc)
Changes since v6:
* Fix for snapshot mode where buffers are wrapped. This fix was done by clamping the aux record
size to the size of the buffer (see comment).
* Added an extra debugging printout.
* Typo/formatting fixes.
* Add the change for --dump-raw-trace as a second commit. I planned to do this later, but have now
finished it so I'll submit it at the same time.
* Did some more thorough testing around the different snapshot scenarios.
Decoding snapshot files with duplicate data is improved by this patchset because of the reason
mentioned at the end of the testing section. Coincidentally, the same issue is also fixed in
"[PATCH v1 0/3] coresight: Fix for snapshot mode" but by not saving duplicates, rather than not
decoding them.
James Clark (2):
perf cs-etm: Split Coresight decode by aux records
perf cs-etm: Split --dump-raw-trace by AUX records
tools/perf/util/cs-etm.c | 188 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 185 insertions(+), 3 deletions(-)
--
2.28.0
This is a series of fixes addressing the issues in the way we handle
- Self-Hosted trace filter control register for ETM/ETE
- AUX buffer and event handling of TRBE at overflow.
The use of TRUNCATED flag on an IRQ for the TRBE driver is
something that needs to be rexamined. Please see Patch 3 for
more details.
Suzuki K Poulose (5):
coresight: etm4x: Save restore TRFCR_EL1
coresight: etm4x: Use Trace Filtering controls dynamically
coresight: trbe: Keep TRBE disabled on overflow irq
coresight: trbe: Move irq_work queue processing
coresight: trbe: Prohibit tracing while handling an IRQ
.../coresight/coresight-etm4x-core.c | 109 ++++++++++++++----
drivers/hwtracing/coresight/coresight-etm4x.h | 7 +-
drivers/hwtracing/coresight/coresight-trbe.c | 91 ++++++++++-----
3 files changed, 149 insertions(+), 58 deletions(-)
--
2.24.1
HI James,
Agreed the header is missing from the install. Looking in more detail
the header should also be part of ocsd_c_api_types.h
so that it gets pulled in by any program including the main api header.
I'll look at an update to fix both these issues.
Normally these should come through the CS mailing list, or as an issue
on the git hub project.
Thanks
Mike
On Fri, 16 Jul 2021 at 14:56, James Clark <james.clark(a)arm.com> wrote:
>
> To be able to instantiate the OCSD_BUILTIN_DCD_ETE decoder the header
> trc_pkt_types_ete.h is required, so install it alongside the headers for
> the other versions.
>
> Signed-off-by: James Clark <james.clark(a)arm.com>
> ---
> decoder/build/linux/rctdl_c_api_lib/makefile | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/decoder/build/linux/rctdl_c_api_lib/makefile b/decoder/build/linux/rctdl_c_api_lib/makefile
> index a0bd5a3..7b4055d 100644
> --- a/decoder/build/linux/rctdl_c_api_lib/makefile
> +++ b/decoder/build/linux/rctdl_c_api_lib/makefile
> @@ -113,6 +113,8 @@ install_inc:
> $(INSTALL) --mode=0644 $(INST_INC_SRC)/etmv3/trc_pkt_types_etmv3.h $(INST_INC_DST)/etmv3/
> $(INSTALL) -d --mode=0755 $(INST_INC_DST)/etmv4
> $(INSTALL) --mode=0644 $(INST_INC_SRC)/etmv4/trc_pkt_types_etmv4.h $(INST_INC_DST)/etmv4/
> + $(INSTALL) -d --mode=0755 $(INST_INC_DST)/ete
> + $(INSTALL) --mode=0644 $(INST_INC_SRC)/ete/trc_pkt_types_ete.h $(INST_INC_DST)/ete/
> $(INSTALL) -d --mode=0755 $(INST_INC_DST)/c_api
> $(INSTALL) --mode=0644 $(INST_INC_SRC)/c_api/ocsd_c_api_types.h $(INST_INC_DST)/c_api/
> $(INSTALL) --mode=0644 $(INST_INC_SRC)/c_api/opencsd_c_api.h $(INST_INC_DST)/c_api/
> --
> 2.28.0
>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
On 15/07/2021 04:09, Anshuman Khandual wrote:
> A small nit. Paragraphs in the commit message do not seem to be aligned
> properly to a maximum 75 characters width.
>
> On 7/12/21 5:08 PM, Suzuki K Poulose wrote:
>> When the TRBE generates an IRQ, we stop the TRBE, collect the trace
>> and then reprogram the TRBE with the updated buffer pointers in case
>> of a spurious IRQ. We might also leave the TRBE disabled, on an
>> overflow interrupt, without touching the ETE. This means the
>> the ETE is only disabled when the event is disabled later (via irq_work).
>> This is incorrect, as the ETE trace is still ON without actually being
>> captured and may be routed to the ATB.
>
> I had an assumption that when the TRBE is stopped, ETE would also stop
> implicitly given that the trace packets are not being accepted anymore.
> But if that assumption does not always hold true, then yes trace must
> be stopped upon a TRBE IRQ.
No, the ETE never stops, until it is stopped. The ETE doesn't care who
is the consumer of the trace. Be it TRBE or ATB or any other sink.
>
>>
>> So, we move the CPU into trace prohibited state (for all exception
>> levels) upon entering the IRQ handler. The state is restored before
>> enabling the TRBE back. Otherwise the trace remains prohibited.
>> Since, the ETM/ETE driver controls the TRFCR_EL1 per session,
>> (from commit "coresight: etm4x: Use Trace Filtering controls dynamically")
>
> commit SHA ID ?
>
The patch is in this series, not committed yet.
>> the tracing can be restored/enabled back when the event is rescheduled
>> in.
>
> Makes sense.
>
>>
>> Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 43 ++++++++++++++++++--
>> 1 file changed, 40 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index c0c264264427..e4d88e0de2a8 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -83,6 +83,31 @@ struct trbe_drvdata {
>> struct platform_device *pdev;
>> };
>>
>> +static inline void write_trfcr(u64 val)
>> +{
>> + write_sysreg_s(val, SYS_TRFCR_EL1);
>> + isb();
>> +}
>> +
>
> There is another instance of write_trfcr() in coresight-etm4x-core.c and
> some other writes into SYS_TRFCR_EL1 elsewhere. write_trfcr() should be
> factored out and moved to a common place.
Agreed, but I couldn't find a right candidate for this. Welcome
to suggestions. May be we could add something like:
asm/self-hosted.h
>
>> +/*
>> + * Prohibit the CPU tracing at all ELs, in preparation to collect
>> + * the trace buffer.
>> + *
>> + * Returns the original value of the trfcr for restoring later.
>> + */
>> +static u64 cpu_prohibit_tracing(void)
>> +{
>> + u64 trfcr = read_sysreg_s(SYS_TRFCR_EL1);
>> +
>> + write_trfcr(trfcr & ~(TRFCR_ELx_ExTRE | TRFCR_ELx_E0TRE));
>> + return trfcr;
>> +}
>
> This also should be factored out along with etm4x_prohibit_trace()
> usage and moved to a common header instead.
>
>> +
>> +static void cpu_restore_tracing(u64 trfcr)
>> +{
>> + write_trfcr(trfcr);
>> +}
>> +
>> static int trbe_alloc_node(struct perf_event *event)
>> {
>> if (event->cpu == -1)
>> @@ -681,7 +706,7 @@ static int arm_trbe_disable(struct coresight_device *csdev)
>> return 0;
>> }
>>
>> -static void trbe_handle_spurious(struct perf_output_handle *handle)
>> +static void trbe_handle_spurious(struct perf_output_handle *handle, u64 trfcr)
>> {
>> struct trbe_buf *buf = etm_perf_sink_config(handle);
>>
>> @@ -691,6 +716,7 @@ static void trbe_handle_spurious(struct perf_output_handle *handle)
>> trbe_drain_and_disable_local();
>> return;
>> }
>
> A small comment here would be great because this will be the only
> IRQ handler path, where it actually restores the tracing back.
Agreed
>
>> + cpu_restore_tracing(trfcr);
>> trbe_enable_hw(buf);
>> }
>>
>> @@ -760,7 +786,18 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>> struct perf_output_handle **handle_ptr = dev;
>> struct perf_output_handle *handle = *handle_ptr;
>> enum trbe_fault_action act;
>> - u64 status;
>> + u64 status, trfcr;
>> +
>> + /*
>> + * Prohibit the tracing, while we process this. We turn
>> + * things back right, if we get to enabling the TRBE
>> + * back again. Otherwise, the tracing still remains
>> + * prohibited, until the perf event state changes
>> + * or another event is scheduled. This ensures that
>> + * the trace is not generated when it cannot be
>> + * captured.
>> + */
>
> Right.
>
> But a small nit though. Please keep the comments here formatted and
> aligned with the existing ones.
>
ok
>> + trfcr = cpu_prohibit_tracing();
>>
>> /*
>> * Ensure the trace is visible to the CPUs and
>> @@ -791,7 +828,7 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>> trbe_handle_overflow(handle);
>> break;
>> case TRBE_FAULT_ACT_SPURIOUS:
>> - trbe_handle_spurious(handle);
>> + trbe_handle_spurious(handle, trfcr);
>> break;
>> case TRBE_FAULT_ACT_FATAL:
>> trbe_stop_and_truncate_event(handle);
>>
>
> But stopping the trace (even though from a sink IRQ handler) is a source
> device action. Should not this be done via a new coresight_ops_source
> callback instead ?
It is a valid point. But that has limitations.
Here is the list:
* Stopping the source is a heavy hammer, especially if we
are about to continue the trace soon. (e.g, spurious
interrupt and possibly soon for FILL events with reworking
the flags)
* Stopping the source, via source_ops() is doing things
under the driving mode of the session, perf vs sysfs.
We only support perf though, but if there is another
user.
* This is agnostic to the mode (as above), the TRBE driver
doesn't need to be taught, how to find the path and
stop the current session for the given mode.
* If the tracing is enabled in kernel mode, the ETE still
generates the trace until we trigger the longer route
for disabling, which is not nice.
Suzuki
On 15/07/2021 04:23, Anshuman Khandual wrote:
>
>
> On 7/12/21 5:08 PM, Suzuki K Poulose wrote:
>> The TRBE driver issues the irq_work_run() to ensure that
>> any pending disable event callback has been processed. But this
>> is done before we mark the AUX buffer as TRUNCATED, making
>> the call pointless. Move the call after the current handle
>> has been closed.
>
> So there is a possibility that a disable event gets missed before
> the buffer is marked TRUNCATED ? But even then would not another
No, it is the other way around. A disable event is triggered
in response to the TRUNCATED flag.
> disable event be triggered again subsequently ? Just trying to
> understand what is the actual problem because of the current
> placement of irq_work_run() ?
That begs the question, What is the purpose of irq_work_run() ?
See below.
>
>>
>> Fixes: 3fbf7f011f24 ("coresight: sink: Add TRBE driver")
>> Reported-by: Tamas Zsoldos <Tamas.Zsoldos(a)arm.com>
>> Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
>> Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
>> Cc: Mike Leach <mike.leach(a)linaro.org>
>> Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
>> Cc: Leo Yan <leo.yan(a)linaro.org>
>> Cc: Will Deacon <will(a)kernel.org>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
>> ---
>> drivers/hwtracing/coresight/coresight-trbe.c | 14 ++++++++------
>> 1 file changed, 8 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/hwtracing/coresight/coresight-trbe.c b/drivers/hwtracing/coresight/coresight-trbe.c
>> index ec38cf17b693..c0c264264427 100644
>> --- a/drivers/hwtracing/coresight/coresight-trbe.c
>> +++ b/drivers/hwtracing/coresight/coresight-trbe.c
>> @@ -723,6 +723,14 @@ static void trbe_handle_overflow(struct perf_output_handle *handle)
>> perf_aux_output_flag(handle, PERF_AUX_FLAG_CORESIGHT_FORMAT_RAW |
>> PERF_AUX_FLAG_TRUNCATED);
>> perf_aux_output_end(handle, size);
>> +
>> + /*
>> + * Ensure perf callbacks have completed. Since we
>> + * always TRUNCATE the buffer on IRQ, the event
>> + * is scheduled to be disabled. Make sure that happens
>> + * as soon as possible.
>> + */
>> + irq_work_run();
>> }
>>
>> static bool is_perf_trbe(struct perf_output_handle *handle)
>> @@ -777,12 +785,6 @@ static irqreturn_t arm_trbe_irq_handler(int irq, void *dev)
>> if (!is_perf_trbe(handle))
>> return IRQ_NONE;
>>
>> - /*
>> - * Ensure perf callbacks have completed, which may disable
>> - * the trace buffer in response to a TRUNCATION flag.
>> - */
This comment here explains what the irq_work_run() is supposed to do.
This clearly indicates that the event will be disabled in response
to TRUNCATION. Now, we haven't updated the buffer flags yet here.
It happens *after* this call, in handle_overflow(). So, this is placed
incorrectly. The purpose of this patch is to make sure that we actually
complete the callbacks in response to the TRUNCATED flag we set with
this IRQ.
>> - irq_work_run();
>> -
>> act = trbe_get_fault_act(status);
>> switch (act) {
>> case TRBE_FAULT_ACT_WRAP:
>>
Suzuki