This patch set explores using Coresight tracing data for postmortem
debugging. When a kernel panic happens, the Coresight panic kdump support
saves on-chip tracing data and tracer metadata into DRAM; it later relies
on kdump and the crash/perf tools to recover the tracing data for
"offline" analysis.
Compared with v4 and earlier series, this patch series has heavily
refactored the implementation after investigating how Intel PT supports
kdump. Intel PT calls a single function to stop tracing in an emergency
when a kernel panic occurs; that function reuses the perf operations to
dump trace data into the ring buffer, and the crash tool later extracts
the trace data from the perf ring buffer.
This patch series follows the Intel PT example and stops the ETM trace
in the same way for perf mode. So far the work focuses primarily on
supporting Coresight kdump with perf mode; we can add support for sysfs
mode later if there is a clearer requirement.
Compared with the previous series, this series also simplifies the
handling of tracer metadata. The old series introduced an extra data
structure and two doubly linked lists to maintain CoreSight kdump
components; one list was used to track tracer metadata and another was
used to track dump buffers, and these two lists were later used to
retrieve the metadata and trace data buffers from the vmcore file. This
series instead relies directly on CoreSight driver global variables to
retrieve the related info, e.g. for perf mode the per-CPU pointer
'ctx_handle' gives the perf ring buffer info and the per-CPU 'csdev_src'
gives the tracer device structure for metadata.
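As a rough sketch of the perf-mode stop path (hypothetical helper name and
call sequence; only the 'ctx_handle' and 'csdev_src' per-CPU variables come
from the description above, and how they are declared is an assumption),
the panicking CPU could do something like:

static void coresight_stop_trace_on_panic(void)
{
	struct coresight_device *csdev = this_cpu_read(csdev_src);
	struct perf_output_handle *handle = this_cpu_ptr(&ctx_handle);

	/* Nothing to do if perf isn't tracing this CPU. */
	if (!csdev || !handle->rb)
		return;

	/*
	 * Disable the tracer so no new data is generated; the sink
	 * contents and the perf AUX ring buffer referenced by the handle
	 * stay in DRAM for the crash extension to parse from the vmcore.
	 */
	source_ops(csdev)->disable(csdev, NULL);
}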
The crash extension program has been enhanced to parse these kernel data
structures and use them to extract the metadata and dump the trace data
[1]; it is also updated to build against the OpenCSD decoder, which
simplifies the decoding process compared with the previous approach of
relying on perf to help decode the trace data.
This patch series has been verified on the 96Boards DB410c with the steps
below; 'long_loop' is a simple program that only executes a large number of
loop iterations, so it generates a large number of branch instructions.
Enable trace on the target board:
$ perf record -e cs_etm/@825000.etf/ --per-thread ./long_loop &
$ sleep 3
$ echo c > /proc/sysrq-trigger
Use the crash tool for postmortem analysis:
$ crash vmcore vmlinux
crash> extend arm_cs_dump.so
crash> arm_cs_dump -o out
[1] https://git.linaro.org/people/leo.yan/crash.git/log/?h=arm_cs_dump_etm_perf
Changes from v4:
* Support for CoreSight ETM with perf mode;
* Added an API to stop trace on crash;
* Simplified the implementation by removing kdump-dedicated data structures
  and functions;
Changes from v3:
* Following Mathieu's suggestion, reworked the panic kdump framework and
  used a kdump array to maintain source and sink device handlers;
* According to Mathieu's suggestion, optimized the panic notifier to
  first dump the panic CPU's tracing data and then dump the other CPUs'
  tracing data;
* Refined doc to reflect these implementation changes;
* Changed ETMv4 driver to add source device handler at probe phase;
* Refactored crash extension program to reflect kernel changes.
Changes from v2:
* Added the two patches for documentation.
* Following Mathieu's suggestion, reworked the panic kdump framework and
  removed the useless flag "PRE_PANIC".
* According to comments, changed to add and delete kdump nodes in the
  sink enable/disable functions;
* According to Mathieu's suggestion, handled kdump node
  addition/deletion/updating separately for the sysfs interface and the
  perf method.
Changes from v1:
* Added support to dump ETMv4 metadata.
* Wrote the 'crash' extension csdump.so and rely on it to generate a
  'perf'-format-compatible file.
* Refactored panic dump driver to support pre & post panic dump.
Changes from RFC:
* Followed Mathieu's suggestion to use a general framework to support the
  dump functionality.
* Changed to use perf to analyse trace data.
Leo Yan (6):
doc: Add Coresight documentation directory
doc: Add documentation for Coresight panic kdump
coresight: etm4x: Save ID values in config structure
coresight: tmc: Update latest value for page index and offset
coresight: etm-perf: Add interface to stop etm trace
arm64: smp: Stop CoreSight trace for kdump
.../trace/{ => coresight}/coresight-cpu-debug.txt | 0
.../trace/coresight/coresight-panic-kdump.txt | 99 ++++++++++++++++++++++
Documentation/trace/{ => coresight}/coresight.txt | 0
MAINTAINERS | 5 +-
arch/arm64/kernel/smp.c | 5 ++
drivers/hwtracing/coresight/Kconfig | 10 +++
drivers/hwtracing/coresight/coresight-etm-perf.c | 10 +++
drivers/hwtracing/coresight/coresight-etm4x.c | 7 ++
drivers/hwtracing/coresight/coresight-etm4x.h | 8 ++
drivers/hwtracing/coresight/coresight-tmc-etf.c | 8 ++
include/linux/coresight.h | 6 ++
11 files changed, 156 insertions(+), 2 deletions(-)
rename Documentation/trace/{ => coresight}/coresight-cpu-debug.txt (100%)
create mode 100644 Documentation/trace/coresight/coresight-panic-kdump.txt
rename Documentation/trace/{ => coresight}/coresight.txt (100%)
--
2.7.4
The CoreSight architecture defines CLAIM tags for a device to negotiate
control of the component (external agent vs. self-hosted). Each device
has a pair of registers (CLAIMSET & CLAIMCLR) for managing the CLAIM
tags. However, the protocol for the CLAIM tags is IMPLEMENTATION DEFINED.
PSCI has recommendations for using the CLAIM tags to negotiate control
for external agent vs. self-hosted use, as defined in
ARM DEN 0022D, Section "6.8.1 Debug and Trace save and restore".
This series implements the protocol recommended by PSCI.
There were two options for the implementation:
1) Have the claim/disclaim operations performed from the coresight
   generic driver - Unfortunately this doesn't work for ETM devices,
   as they need cross-CPU calls to access the CLAIM registers. It also
   complicates error recovery and reference counting.
2) Have the claim/disclaim operations performed from the device
   specific drivers. The disadvantage is that the calls are sprinkled
   in each driver, but this makes the operation much simpler.
This series implements method (2). The first part of the series
prepares different drivers to handle errors from the lower layer
and clean up the state. The second part of the series updates the
existing drivers to claim/disclaim the devices as necessary.
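For reference, a minimal sketch of the recommended handshake is shown
below, assuming the conventional CoreSight management register offsets
(CLAIMSET at 0xFA0, CLAIMCLR at 0xFA4, with reads of CLAIMCLR returning
the current tags) and the tag bits suggested by PSCI (bit 0 for an
external agent, bit 1 for self-hosted software); the helper names are
illustrative only, not the exact API added by this series:

#define CLAIMSET		0xfa0
#define CLAIMCLR		0xfa4
#define CLAIM_EXTERNAL		BIT(0)
#define CLAIM_SELF_HOSTED	BIT(1)

static int claim_device(void __iomem *base)
{
	/* Advertise self-hosted use first ... */
	writel_relaxed(CLAIM_SELF_HOSTED, base + CLAIMSET);
	/* ... then back off if an external agent already holds a claim. */
	if (readl_relaxed(base + CLAIMCLR) & CLAIM_EXTERNAL) {
		writel_relaxed(CLAIM_SELF_HOSTED, base + CLAIMCLR);
		return -EBUSY;
	}
	return 0;
}

static void disclaim_device(void __iomem *base)
{
	writel_relaxed(CLAIM_SELF_HOSTED, base + CLAIMCLR);
}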
Tested with a hacked coresight driver which modifies the external
claim tag via a sysfs handle.
Applies on coresight/next in Mathieu's tree.
Changes since V1:
- Handle errors in the enabling path and disable only the components
  that were enabled in the iteration.
- Fix build break on arm32 (etm3x)
- Update commit description for "coresight: Add support for CLAIM tag protocol"
Suzuki K Poulose (14):
coresight: Handle failures in enabling a trace path
coresight: tmc-etr: Refactor for handling errors
coresight: tmc-etr: Handle errors enabling CATU
coresight: tmc-etb/etf: Prepare to handle errors enabling
coresight: etm4x: Add support for handling errors
coresight: etm3: Add support for handling errors
coresight: etb10: Handle errors enabling the device
coresight: dynamic-replicator: Handle multiple connections
coresight: Add support for CLAIM tag protocol
coresight: etmx: Claim devices before use
coresight: funnel: Claim devices before use
coresight: catu: Claim device before use
coresight: dynamic-replicator: Claim device for use
coresight: tmc: Claim device before use
drivers/hwtracing/coresight/coresight-catu.c | 6 ++
.../coresight/coresight-dynamic-replicator.c | 79 ++++++++++----
drivers/hwtracing/coresight/coresight-etb10.c | 18 +++-
drivers/hwtracing/coresight/coresight-etm3x.c | 56 +++++++---
drivers/hwtracing/coresight/coresight-etm4x.c | 51 ++++++---
drivers/hwtracing/coresight/coresight-funnel.c | 26 ++++-
drivers/hwtracing/coresight/coresight-priv.h | 7 ++
drivers/hwtracing/coresight/coresight-tmc-etf.c | 95 +++++++++++------
drivers/hwtracing/coresight/coresight-tmc-etr.c | 80 +++++++++-----
drivers/hwtracing/coresight/coresight.c | 118 +++++++++++++++++++--
include/linux/coresight.h | 20 ++++
11 files changed, 434 insertions(+), 122 deletions(-)
--
2.7.4
This patch series contains two fixes for updating the ring buffer in the
tmc-etf driver. The first patch fixes the byte-address alignment setting
for RRP; the second patch fixes an issue where trace data was discarded
because barrier packets were written over it in place, and instead keeps
the complete trace data by inserting extra barrier packets.
This patch series has been rebased on the CoreSight next branch:
https://git.linaro.org/kernel/coresight.git/log/?h=next with latest
commit 3733ca5a6578 ("coresight: tmc: Refactor loops in etb dump").
Changes from v1:
* Rebased on CoreSight next branch (Sept 11th, 2018);
* Added checking 'lost || to_read > handle->size' to set 'barrier_sz'.
Leo Yan (2):
coresight: tmc: Fix byte-address alignment for RRP
coresight: tmc: Fix writing barrier packets for ring buffer
drivers/hwtracing/coresight/coresight-tmc-etf.c | 41 +++++++++++++++++--------
1 file changed, 29 insertions(+), 12 deletions(-)
--
2.7.4
We do not enable scatter-gather mode in the TMC-ETR by default
to prevent malfunctioning of systems where the ETR may not be
properly connected to the memory subsystem to allow for simultaneous
READ/WRITE transactions when used in SG mode. Instead we whitelist
the platforms where we know that it is safe to use the mode.
All revisions of Juno have a proper ETR connection and hence we
whitelist them.
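For context, a minimal sketch of how the driver side might gate SG mode on
this property, using the generic device-property API; the 'etr_can_use_sg'
field name is made up for illustration and the actual check in the TMC
driver may differ:

	/* Use ETR scatter-gather mode only where the DT explicitly opts in. */
	if (device_property_present(dev, "arm,scatter-gather"))
		drvdata->etr_can_use_sg = true;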
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: Mike Leach <mike.leach(a)linaro.org>
Cc: Sudeep Holla <sudeep.holla(a)arm.com>
Cc: Liviu Dudau <liviu.dudau(a)arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
---
arch/arm64/boot/dts/arm/juno-base.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi
index ce56a4a..3596e5d 100644
--- a/arch/arm64/boot/dts/arm/juno-base.dtsi
+++ b/arch/arm64/boot/dts/arm/juno-base.dtsi
@@ -199,6 +199,7 @@
clocks = <&soc_smc50mhz>;
clock-names = "apb_pclk";
power-domains = <&scpi_devpd 0>;
+ arm,scatter-gather;
port {
etr_in_port: endpoint {
slave-mode;
--
2.7.4
From the comment in the code, it claims the requirement for byte-address
alignment for the RRP register: 'for 32-bit, 64-bit and 128-bit wide trace
memory, the four LSBs must be 0s. For 256-bit wide trace memory, the
five LSBs must be 0s'. The code isn't consistent with this: it sets the
five LSBs to zero for 32/64/128-bit wide trace memory and the six LSBs
to zero for 256-bit wide trace memory.
Checking the CoreSight Trace Memory Controller technical reference
manual (ARM DDI 0461B, section 3.3.4 RAM Read Pointer Register) confirms
that the comment is right and the code sets the wrong alignment.
This patch fixes the byte-address alignment for RRP by following the
definition in the technical reference manual.
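For reference, GENMASK(h, l) produces a mask with bits h..l set, so the
corrected masks clear exactly the LSBs the TRM asks for:

  GENMASK(31, 4) == 0xfffffff0   /* clears four LSBs: 32/64/128-bit memory */
  GENMASK(31, 5) == 0xffffffe0   /* clears five LSBs: 256-bit memory */

For example, a read pointer value of 0x0000127b masked with GENMASK(31, 4)
becomes 0x00001270 (16-byte aligned), whereas the old GENMASK(31, 5) mask
would have rounded it down to 0x00001260.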
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Cc: Mike Leach <mike.leach(a)linaro.org>
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
drivers/hwtracing/coresight/coresight-tmc-etf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 0549249..e310613 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -438,10 +438,10 @@ static void tmc_update_etf_buffer(struct coresight_device *csdev,
case TMC_MEM_INTF_WIDTH_32BITS:
case TMC_MEM_INTF_WIDTH_64BITS:
case TMC_MEM_INTF_WIDTH_128BITS:
- mask = GENMASK(31, 5);
+ mask = GENMASK(31, 4);
break;
case TMC_MEM_INTF_WIDTH_256BITS:
- mask = GENMASK(31, 6);
+ mask = GENMASK(31, 5);
break;
}
--
2.7.4
The ETB dump function tmc_etb_dump_hw() has nested loops. The inner
loop iterates an index over the range [0 .. drvdata->memwidth), but the
index isn't actually used in the loop body, so the inner loop is useless.
This patch removes the inner loop; the refactoring also reduces
indentation and lets us replace the 'goto' label with a 'break'.
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
drivers/hwtracing/coresight/coresight-tmc-etf.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index 9c599c9..8b34161 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -34,23 +34,20 @@ static void tmc_etb_dump_hw(struct tmc_drvdata *drvdata)
{
char *bufp;
u32 read_data, lost;
- int i;
/* Check if the buffer wrapped around. */
lost = readl_relaxed(drvdata->base + TMC_STS) & TMC_STS_FULL;
bufp = drvdata->buf;
drvdata->len = 0;
while (1) {
- for (i = 0; i < drvdata->memwidth; i++) {
- read_data = readl_relaxed(drvdata->base + TMC_RRD);
- if (read_data == 0xFFFFFFFF)
- goto done;
- memcpy(bufp, &read_data, 4);
- bufp += 4;
- drvdata->len += 4;
- }
+ read_data = readl_relaxed(drvdata->base + TMC_RRD);
+ if (read_data == 0xFFFFFFFF)
+ break;
+ memcpy(bufp, &read_data, 4);
+ bufp += 4;
+ drvdata->len += 4;
}
-done:
+
if (lost)
coresight_insert_barrier_packet(drvdata->buf);
return;
--
2.7.4
Coresight uses DT graph bindings to describe the connections of the
components. However we have some undocumented usage of the bindings
to describe some of the properties of the connections.
The coresight driver needs to know the hardware ports involved
in the connection and the direction of data flow to effectively
manage the trace sessions. So far we have relied on the "port"
address (as described by the generic graph bindings) to represent
the hardware port of the component for a connection.
The hardware uses separate numbering schemes for input and output
ports, which implies we could have two different (input and output)
ports with the same port number. This could create problems in the
graph bindings where the label of the port wouldn't match the address.
e.g, with the existing bindings we get :
port@0{ // Output port 0
reg = <0>;
...
};
port@1{
reg = <0>; // Input port 0
endpoint {
slave-mode;
...
};
};
With the new enforcement in the DT rules, mismatches in label and address
are not allowed (as seen in the case of port@1). So, we need a new mechanism
to describe the hardware port number reliably.
Also, we relied on an undocumented "slave-mode" property (see the above
example) to indicate if the port is an input port. Let us formalise and
switch to a new property to describe the direction of data flow.
There were three options considered for the hardware port number scheme:
1) Use natural ordering in the DT to infer the hardware port number.
i.e., mandate that all ports are listed in the DT in ascending
order for each class (input and output respectively).
Pros:
- We don't need new properties, and if the existing DTS files list them
  in order (which most of them do), they work out of the box.
Cons:
- We must list all the ports even if the system cannot/shouldn't use
  them.
- It is prone to human errors (if the order is not kept).
2) Use an explicit property to list both the hw port number and the
direction. Define "coresight,hwid" as a 2-member array of u32, where
the members are the port number and the direction respectively.
e.g
port@0{
reg = <0>;
endpoint {
coresight,hwid = <0 1>; // Port # 0, Output
};
};
port@1{
reg = <1>;
endpoint {
coresight,hwid = <0 0>; // Port # 0, Input
};
};
Pros:
- The bindings are formal.
Cons:
- Not so reader friendly and could potentially lead to human errors.
- Backward compatibility is lost.
3) Use explicit properties (implemented in an earlier version of this
series) for the hardware
port id and direction. We define a new property "coresight,hwid" for
each endpoint in coresight devices to specify the hardware port number
explicitly. Also use a separate property "direction" to specify the
direction of the data flow.
e.g,
port@0{
reg = <0>;
endpoint {
direction = <1>; // Output
coresight,hwid = <0>; // Port # 0
};
};
port@1{
reg = <1>;
endpoint {
direction = <0>; // Input
coresight,hwid = <0>; // Port # 0
};
};
Pros:
- The bindings are formal and reader friendly, and less prone to errors.
Cons:
- Backward compatibility is lost.
After a round of discussions [1], the following option (4) was adopted:
4) Group ports based on the directions under a dedicated node. This has been
checked with the upstream DTC tool to resolve the "address mismatch" issue.
e.g,
out-ports { // Output ports for this component
port@0 { // Outport 0
reg = <0>;
endpoint { ... };
};
port@1 { // Outport 1
reg = <1>;
endpoint { ... };
};
};
in-ports { // Input ports for this component
port@0 { // Inport 0
reg = <0>;
endpoint { ... };
};
port@1 { // Inport 1
reg = <1>;
endpoint { ... };
};
};
This series implements Option (4) listed above and falls back to the old
bindings if the new bindings are not available. This allows systems
with the old bindings to work with the new driver. The driver now issues a
warning (once) when it encounters the old bindings. The series contains the
DT update for the Juno platform. The remaining in-kernel sources could be
updated once we agree on the proposal.
It also cleans up the platform parsing code to reduce the memory usage by
reusing the platform description.
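For reference, a minimal sketch of how a driver could walk the new grouped
bindings with the standard OF helpers is shown below; the 'parse_endpoint'
helper name and its boolean argument are illustrative only, not the actual
code in this series:

	struct device_node *ports, *port;

	/* Output connections live under the "out-ports" container node. */
	ports = of_get_child_by_name(node, "out-ports");
	if (ports) {
		for_each_child_of_node(ports, port)
			parse_endpoint(port, false);	/* is_input = false */
		of_node_put(ports);
	}

	/* Input connections live under the "in-ports" container node. */
	ports = of_get_child_by_name(node, "in-ports");
	if (ports) {
		for_each_child_of_node(ports, port)
			parse_endpoint(port, true);	/* is_input = true */
		of_node_put(ports);
	}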
Applies on coresight/next
Changes since V2:
- Clean up of_coresight_parse_endpoint() to return 1 to indicate a
  connection record was updated.
- Drop documentation for old bindings
Changes since V1:
- Implement the proposal by Rob.
- Drop the DTS updates for all platforms except Juno
- Drop the incorrect fix in coresight_register. Instead document the code
to prevent people trying to un-fix it again.
- Add a patch to drop remote device references in DT graph parsing
- Split of_node refcount fixing patch, fix a typo in the comment.
- Add Reviewed-by tags from Mathieu.
- Drop patches picked up for 4.18-rc series
Changes since RFC:
- Fixed style issues
- Fix an existing memory leak coresight_register (Found in code update)
- Fix missing of_node_put() in the existing driver (Reported-by Mathieu)
- Update the existing dts in kernel tree.
Suzuki K Poulose (9):
coresight: Document error handling in coresight_register
coresight: platform: Refactor graph endpoint parsing
coresight: platform: Fix refcounting for graph nodes
coresight: platform: Fix leaking device reference
coresight: Fix remote endpoint parsing
coresight: Add helper to check if the endpoint is input
coresight: platform: Cleanup coresight connection handling
coresight: Cleanup coresight DT bindings
dts: juno: Update coresight bindings
.../devicetree/bindings/arm/coresight.txt | 95 +++++---
arch/arm64/boot/dts/arm/juno-base.dtsi | 161 ++++++------
arch/arm64/boot/dts/arm/juno-cs-r1r2.dtsi | 52 ++--
arch/arm64/boot/dts/arm/juno.dts | 13 +-
drivers/hwtracing/coresight/coresight.c | 35 +--
drivers/hwtracing/coresight/of_coresight.c | 269 ++++++++++++++-------
include/linux/coresight.h | 9 +-
7 files changed, 359 insertions(+), 275 deletions(-)
--
2.7.4
On Wed, 8 Aug 2018 at 01:59, Tomasz Nowicki <tnowicki(a)caviumnetworks.com> wrote:
>
> Hi Mathieu,
>
> It's been a while but I am back to Coresight.
>
> Let me remind my setup and the issue I am struggling with now.
>
> Kernel baseline:
> https://github.com/Linaro/perf-opencsd (perf-opencsd-v4.16)
> OpenCSD:
> https://github.com/Linaro/OpenCSD.git (master)
>
> The simplest Coresight components path I used as a start point:
> ETMv4.1 -> TDR -> FUNNEL -> ETF
>
> As I mentioned TDR is built by Cavium and it was added to aggregate 128
> inputs into one output rather than cascading funnels. TDR has its own
> driver just to keep path connected in Linux Coresight framework.
>
> Here is how I catch some trace data:
> sudo perf record -C 0 -e cs_etm/@etf0/ --per-thread test_app
The above command line tells perf to trace everything that is
happening on CPU0 for as long as "test_app" is executing. In this
case the "--per-thread" option is ignored. This is called a CPU-wide
trace scenario and is currently not supported for CS (I am currently
working on it).
If you want to make sure "test_app" executes on CPU0 and that you
trace just that you will need to use the "taskset" utility:
sudo perf record -e cs_etm/@etf0/ --per-thread taskset 0x1 test_app
An alternative to the above would be to CPU-hotplug out CPU128-255
while you are testing.
Let's start with that before going further.
Thanks,
Mathieu
>
> I need to use -C because my machine has 2 nodes, 32 cores (128 threads)
> each, and each node has a different ETF. So I have to specify which CPU is
> the source for the specified ETF sink (ETF0 can be a sink for
> CPU0-CPU127, ETF1 can be a sink for CPU128-CPU255). Otherwise Linux
> cannot find a path for the ETMs related to CPU128-CPU255 if I specify
> ETF0 as a sink.
>
> Overall, I can see some data using:
> # sudo perf report --stdio --dump
> [...]
> . ... CoreSight ETM Trace data: size 16384 bytes
> Frame deformatter: Found 4 FSYNCS
> ID:12 RESET operation on trace decode path
> Idx:108; ID:12; I_NOT_SYNC : I Stream not synchronised
> Idx:455; ID:12; I_ASYNC : Alignment Synchronisation.
> Idx:468; ID:12; I_TRACE_INFO : Trace Info.; INFO=0x0
> Idx:470; ID:12; I_TRACE_ON : Trace On.
> Idx:471; ID:12; I_CTXT : Context Packet.; Ctxt: AArch64,EL0, NS;
> Idx:473; ID:12; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
> Addr=0x0000AAABE0B09584;
> Idx:483; ID:12; I_ATOM_F1 : Atom format 1.; N
> Idx:484; ID:12; I_TIMESTAMP : Timestamp.; Updated val =
> 0x1b6a5d937cc1
> Idx:492; ID:12; I_ATOM_F3 : Atom format 3.; NNE
> Idx:493; ID:12; I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.;
> Addr=0x0000AAABE0B0D210;
> Idx:504; ID:12; I_ATOM_F3 : Atom format 3.; NEE
> Idx:505; ID:12; I_ATOM_F3 : Atom format 3.; NEN
> Idx:506; ID:12; I_ATOM_F6 : Atom format 6.; EEEN
> Idx:507; ID:12; I_ATOM_F3 : Atom format 3.; NNE
> Idx:508; ID:12; I_ATOM_F1 : Atom format 1.; N
> Idx:509; ID:12; I_ATOM_F3 : Atom format 3.; NNN
> Idx:510; ID:12; I_ATOM_F3 : Atom format 3.; EEN
> Idx:512; ID:12; I_ATOM_F1 : Atom format 1.; E
> [...]
>
> However, I still see errors while using:
> # sudo perf report --stdio
> 0x1e8 [0x60]: failed to process type: 1
> Error:
> failed to process sample
> # To display the perf.data header info, please use
> --header/--header-only options.
>
> The reason is that cs_etm__process_event() is failing on:
> if (!etm->timeless_decoding)
> return -EINVAL;
>
> and etm->timeless_decoding is set up in cs_etm__is_timeless_decoding().
> For some events the time bit is set, and so far I have failed to figure
> out what is going on. Have you met a similar issue so far? Any pointers
> or hints are very much appreciated.
>
> One more comment below.
>
> On 10.01.2018 21:10, Mathieu Poirier wrote:
> > On 10 January 2018 at 06:57, Tomasz Nowicki <tnowicki(a)caviumnetworks.com> wrote:
> >> Hello Mathieu,
> >>
> >> Thank you for your response. Please see comments below.
> >>
> >> On 08.01.2018 17:53, Mathieu Poirier wrote:
> >>>
> >>> Good day Tomasz,
> >>>
> >>>
> >>> On 5 January 2018 at 05:51, tn <Tomasz.Nowicki(a)caviumnetworks.com> wrote:
> >>>>
> >>>> Hi Mathieu,
> >>>>
> >>>> I am bringing up Coresight functionality on ThunderX2. While
> >>>> ramping up I
> >>>> come across your Connect session:
> >>>>
> >>>> which I found very helpful.
> >>>
> >>>
> >>> Perfect - a few things have changed since then, see below.
> >>>
> >>>>
> >>>> During my research I had to create new Coresight component driver for
> >>>> Linux,
> >>>> here is the story. For ThunderX2, we aggregate data trace from all 128
> >>>> ETMs
> >>>> into one funnel inport using so called TDR (Trace Data Ring) component.
> >>>> This
> >>>> should be transparent to software and does not require configuration at
> >>>> all.
> >>>> However, the Linux Coresight framework requires components to be
> >>>> connected to each other, so we cannot leave the funnel and ETMs
> >>>> disconnected in the DT. I decided to create a pure software component,
> >>>> i.e. the TDR, which is meant to connect the chain only, with no
> >>>> actions on registers.
> >>>
> >>>
> >>> Is this TDR an ARM IP or built in-house by Cavium?
> >>
> >>
> >> This is Cavium specific component which I am going to upstream once I test
> >> the whole functionality.
> >>
> >> And I suppose it
> >>>
> >>> was added there to aggregate 128 input into one output rather than
> >>> cascading funnels?
> >>
> >>
> >> Correct.
> >>
> >>>>
> >>>> Now I am able to enable ETF sink and path from ETM via TDR via FUNNEL up
> >>>> to
> >>>> ETF and gather some data. To be sure things work properly I want to
> >>>> decode
> >>>> data using Linaro OpenCSD library following instructions from here:
> >>>>
> >>>> https://community.arm.com/tools/b/blog/posts/do-a-coresight-trace-on-linux-…
> >>>
> >>>
> >>> Thanks for pointing this out, I didn't know about it.
> >>>
> >>>> but I still got an error at the 'perf report' step. The kernel perf
> >>>> tool support
> >>>> for OpenCSD is out of tree for now, so I may be missing some patches.
> >>>
> >>>
> >>> Can you get me a pastebin of the errors you're getting?
> >>
> >>
> >> Sure, see:
> >> https://pastebin.com/6YDq8KfC
> >> As you see there is not much info about error cause.
> >>
> >>>
> >>>>
> >>>> Here is my setup:
> >>>> https://github.com/Linaro/perf-opencsd/commits/upstream-v1 (+ ThunderX2
> >>>> specific patches)
> >>>
> >>>
> >>> Oh boy... I wasn't expecting people to use that but I suppose it is
> >>> the right thing to do. Keep going with that code.
> >>>
> >>>> https://github.com/Linaro/OpenCSD/commits/master
> >>>
> >>>
> >>> This, in combination with the upstream-v1 branch should work properly.
> >>> That's how I test things on my Juno and Dragon board.
> >>>
> >>>>
> >>>> # echo 1 > etf0/enable_sink
> >>>> # perf record -C 0 -e cs_etm// sleep 2
> >>>
> >>>
> >>> Ok, that won't work as the -C option is currently not supported (I am
> >>> working on it). I also suggest to make sure you have the very latest
> >>> TIP [1] on branch [2] and to carefully read the README.md. We
> >>> recently updated the instructions to fit the newest development.
> >>> Lastly we have deprecated enabling the sink from the sysFS interface -
> >>> it can still work but no guarantees are provided. It is better to
> >>> specify the sink as part of the perf record command line, as shown in
> >>> the most recent HOWTO.md.
> >>
> >>
> >> I am able to specify sink as part of the perf record command line only for
> >> Linux Perf master branch:
> >> https://github.com/Linaro/perf-opencsd/commits/master
> >>
> >> For upstream-v1 branch I am getting:
> >> $ perf record -vvv -e cs_etm/@etf0/ --per-thread uname
> >> Using CPUID 0x00000000420f5160
> >> perf: util/evsel.c:783: apply_config_terms: Assertion `!(1)' failed.
> >> Aborted (core dumped)
> >
> >
> > Ok, I've uploaded upstream-v2. With that branch everything works fine
> > on my side, no changes needed. I added a fix for a regression in the
> > perf tip tree and the code required to use the ETR from the perf
> > interface.
> >
> > One thing about the above: "@etf0". Is this really the name you gave
> > to the device in the DT? Look under /sys/bus/coresight/devices/ for
> > an etf entry. What is listed there should be the name of the ETF as
> > it is known to the system.
>
> Indeed, the name is different, but for clarity in the perf command I use a shortcut.
>
> Thanks,
> Tomasz
+CoreSight ML and Mathieu
---------- Forwarded message ----------
From: Mike Leach <mike.leach(a)linaro.org>
Date: 3 September 2018 at 17:39
Subject: Re: Failed for ETM decoding with db410c snapshot mode
To: Leo Yan <leo.yan(a)linaro.org>
Hi Leo,
Short summary - there is a problem with the trace collected - not the
decoder. See below for details
On 3 September 2018 at 08:06, <leo.yan(a)linaro.org> wrote:
> Hi Mike, Mathieu,
>
> [ + CoreSight ML ]
>
> While working on the CoreSight + perf tool, I used the crash extension
> program to extract the tracing data from the perf aux buffer; finally I
> can get about 1.6MB of trace data from the ETF sink on the DB410c board.
>
> To verify the extracted trace data, I used 'snapshot' mode under the
> OpenCSD code base; you can see the tar file for this [1]. After
> you download this file, you can place it under the OpenCSD folder:
>
> $ cp db410c_snapshot_kdump.tgz my_opencsd/decoder/tests/snapshots
> $ cd my_opencsd/decoder/tests/snapshots
> $ tar zxvf db410c_snapshot_kdump.tgz
> $ cd db410c_snapshot_kdump
>
> $ ../../bin/builddir/trc_pkt_lister
This will print raw trace packets as it finds them without attempting
any sort of interpretation.
> $ ../../bin/builddir/trc_pkt_lister -decode
This will try to decode the raw trace packets into a sequence of
instructions executed (alongside the raw packets)
This is where the packets are being flagged as incorrect.
>
> If I use the command 'trc_pkt_lister' without any extra options, it
> can print out trace packets successfully; but if I add the extra
> option '-decode' it uses 'decode all' mode and it reports the errors as:
>
> 483710 Idx:53086; ID:10; [0xf8 ]; I_ATOM_F3 : Atom format 3.; NNN
> 483711 Idx:53086; ID:10; OCSD_GEN_TRC_ELEM_ADDR_NACC( 0xffff000008abc9f0 )
> 483712 Idx:53088; ID:10; [0xdb ]; I_ATOM_F2 : Atom format 2.; EE
> 483713 Idx:53194; ID:10; [0x6b 0x8c 0x08 0xfa 0xdc 0x95 0x5c ]; I_COND_RES_F1 : Conditional Result, format 1.
This is a conditional result trace packet - however as far as I am
aware the trace unit on an A53 (i.e. DB410 core) cannot produce these.
Additionally in the entire file I see 2 I_COND packets and 1
I_NUM_DS_MKR - a data synchronisation marker packet.
Now Data sync can only ever occur if data trace is supported and
enabled. Data trace is architecturally prohibited for A class v8 cores
(and unimplemented on most A class v7 cores).
If there were tracing of conditional elements occurring, and it were
enabled, then the packets should match up - a cond instruction should
match with one cond result element.
But in the end - even without these inconsistencies - the TRACE_INFO
element at the top of the listing tells me that conditional
instruction trace is disabled.
Thus you are seeing what I believe is the effect of concatenating
trace data buffers together (you mention you have 1.6MB of data from
the ETF - which is not that large), without inserting barrier packets
in between.
The decoder cannot spot the boundaries, so it will carry on out of sync
and can misread trace packet payload data as header data, which will
throw off the decode process.
When I look at the raw byte data I am seeing this at the top of the listing:-
Frame Data; Index 0; ID_DATA[????]; ff
Frame Data; Index 0; ID_DATA[0x7f]; 7f ff 7f ff 7f ff
This does not look valid at all to me.
> 483714 DCD_ETMV4_0016 : 0x0018 (OCSD_ERR_BAD_DECODE_PKT) [Reserved or unknown packet in decoder.]; Unsupported packet type.Trace Packet Lister : Data Path fatal error
> 483715 0x0018 (OCSD_ERR_BAD_DECODE_PKT) [Reserved or unknown packet in decoder.]; Unsupported packet type.Trace Packet Lister : Trace buffer done, processed 53216 bytes.
>
> You can also check the detailed log trc_pkt_lister.ppl in the shared
> tar package; after searching the OpenCSD code, I found this error is
> because it cannot support some types of packets [2].
>
> So I want to check what's the best approach for this issue; it seems to
> me we need to fix this so that the decoding can complete?
>
The reason we have not implemented support for these packets is that
we have never seen an implementation that generates them.
regards
Mike
> Thanks in advance for any suggestions.
> Leo Yan
>
> [1] http://people.linaro.org/~leo.yan/opencsd_db410c/db410c_snapshot_kdump.tgz
> [2] https://github.com/Linaro/OpenCSD/blob/master/decoder/source/etmv4/trc_pkt_…
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK