This patch series addresses an issue with synthesizing instruction
samples: when the instruction sample period is small enough, the
current logic cannot synthesize multiple instruction samples within
one instruction range packet.
To fix this issue, patch 0001 avoids resetting the last branches for
every instruction sample; if the last branches are reset every time an
instruction sample is generated, the later samples in the same range
packet can no longer use them.
Patch 0002 is the main patch to fix the logic for synthesizing
instruction samples; it allows different instruction periods to be
handled.
Patch 0003 is an optimization for copying last branches; it only copies
last branches once if the instruction samples share the same last
branches.
Patch 0004 is a minor fix for unsigned variable comparison to zero.
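As a rough illustration of the behaviour patch 0002 is after, the sketch
below models how several samples can fall inside one range packet when
the period is small. It is a simplified Python model, not the code in
cs-etm.c; the function name sample_offsets and its parameters are
assumptions made for this example.

```python
def sample_offsets(carry, packet_instrs, period):
    """Model of period-based sample synthesis for one range packet.

    carry         -- instructions already counted toward the current
                     period by earlier packets (0 <= carry < period)
    packet_instrs -- number of instructions covered by this range packet
    period        -- instruction sample period (hypothetical parameter,
                     e.g. 2 for a period of two instructions)

    Returns (offsets, new_carry): the zero-based instruction indexes
    inside the packet at which a sample is generated, and the leftover
    instruction count carried into the next packet.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    if not 0 <= carry < period:
        raise ValueError("carry must be in [0, period)")

    offsets = []
    remaining = carry + packet_instrs
    # One-based position of the instruction that completes the first period.
    position = period - carry
    while remaining >= period:
        offsets.append(position - 1)
        position += period
        remaining -= period
    return offsets, remaining
```

Calling sample_offsets(0, 5, 2) describes a five-instruction packet with
a period of two: more than one sample lands in the same packet, and the
remainder is carried over to the next packet.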
To verify my changes for synthesizing instruction samples, I added
some logs in the code and manually reviewed the output log for
instruction samples. The commands below were tested on a DB410c board:
# perf script --itrace=i2
# perf script --itrace=i2il16
# perf inject --itrace=i2il16 -i perf.data -o perf.data.new
# perf inject --itrace=i100il16 -i perf.data -o perf.data.new
Changes from v1:
* Rebased patch set on perf/core branch with latest commit 9fec3cd5fa4a
("perf map: Check if the map still has some refcounts on exit").
Leo Yan (4):
perf cs-etm: Continuously record last branches
perf cs-etm: Correct synthesizing instruction samples
perf cs-etm: Optimize copying last branches
perf cs-etm: Fix unsigned variable comparison to zero
tools/perf/util/cs-etm.c | 137 ++++++++++++++++++++++++++++++++-------
1 file changed, 115 insertions(+), 22 deletions(-)
--
2.17.1
This patch series addresses an issue with synthesizing instruction
samples: when the instruction sample period is small enough, the
current logic cannot synthesize multiple instruction samples within
one instruction range packet.
To fix this issue, patch 0001 avoids resetting the last branches for
every instruction sample; if the last branches are reset every time an
instruction sample is generated, the later samples in the same range
packet can no longer use them.
Patch 0002 is the main patch to fix the logic for synthesizing
instruction samples; it allows different instruction periods to be
handled.
Patch 0003 is an optimization for copying last branches; it only copies
last branches once if the instruction samples share the same last
branches.
Patch 0004 is a minor fix for unsigned variable comparison to zero.
To verify my changes for synthesizing instruction samples, I added
some logs in the code and manually reviewed the output log for
instruction samples. The commands below were tested on a DB410c board:
# perf script --itrace=i2
# perf script --itrace=i2li16
# perf inject --itrace=i2il16 -i perf.data -o perf.data.new
# perf inject --itrace=i100il16 -i perf.data -o perf.data.new
Leo Yan (4):
perf cs-etm: Continuously record last branches
perf cs-etm: Correct synthesizing instruction samples
perf cs-etm: Optimize copying last branches
perf cs-etm: Fix unsigned variable comparison to zero
tools/perf/util/cs-etm.c | 137 ++++++++++++++++++++++++++++++++-------
1 file changed, 115 insertions(+), 22 deletions(-)
--
2.17.1
Review of ETMV4 sysfs code resulted in a number of minor issues being
discovered.
Patch set fixes these issues:-
1) Update for ETM v4.4 architecture.
2) Add missing single shot comparator API.
3) Misc fixes and improvements to sysfs API
4) Updated programmers documentation and reference.
Changes since v1 (from reviews by Mathieu and Leo):-
Usability patch split into 2 separate functional patches.
Docs patch split into 3 patches.
Misc style and comment typo fixes.
Mike Leach (11):
coresight: etm4x: Fixes for ETM v4.4 architecture updates.
coresight: etm4x: Fix input validation for sysfs.
coresight: etm4x: Add missing API to set EL match on address filters
coresight: etm4x: Fix issues with start-stop logic.
coresight: etm4x: Improve usability of sysfs - include/exclude addr.
coresight: etm4x: Improve usability of sysfs - CID and VMID masks.
coresight: etm4x: Add view comparator settings API to sysfs.
coresight: etm4x: Add missing single-shot control API to sysfs
coresight: etm4x: docs: Update ABI doc for sysfs features added.
coresight: docs: Create common sub-directory for coresight trace.
coresight: etm4x: docs: Adds detailed document for programming etm4x.
.../testing/sysfs-bus-coresight-devices-etm4x | 183 ++++---
.../{ => coresight}/coresight-cpu-debug.txt | 0
.../coresight/coresight-etm4x-reference.txt | 458 ++++++++++++++++++
.../trace/{ => coresight}/coresight.txt | 0
MAINTAINERS | 3 +-
.../coresight/coresight-etm4x-sysfs.c | 312 +++++++++++-
drivers/hwtracing/coresight/coresight-etm4x.c | 32 +-
drivers/hwtracing/coresight/coresight-etm4x.h | 18 +-
8 files changed, 905 insertions(+), 101 deletions(-)
rename Documentation/trace/{ => coresight}/coresight-cpu-debug.txt (100%)
create mode 100644 Documentation/trace/coresight/coresight-etm4x-reference.txt
rename Documentation/trace/{ => coresight}/coresight.txt (100%)
--
2.17.1
This patch series adds support for thread stack and callchain.
Patch 01 fixes the unsigned variable comparison to zero and patch 02
refactors the instruction size calculation; these two patches are
preparation for patch 03.
Patch 03 adds thread stack support; after applying this patch the
option '-F,+callindent' can be used by the perf script tool. Patch 04
adds a branch filter so that the perf tool displays branch samples only
for function calls and returns once the call indentation or call chain
related options are enabled.
Patch 05 synthesizes the call chain for instruction samples.
Patch 06 allows the instruction sample to be handled synchronously with
the thread stack, which fixes an error in callchain generation.
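To make the relationship between branch samples, the thread stack and
the synthesized call chain more concrete, here is a toy Python model.
It is only a sketch of the general idea and not perf's thread-stack
implementation; the class ThreadStack, its methods and the "call" /
"return" kind strings are all hypothetical names chosen for the example.

```python
class ThreadStack:
    """Toy per-thread call stack driven by branch samples."""

    def __init__(self):
        # Return addresses, oldest call first.
        self._ret_addrs = []

    def branch(self, kind, from_addr, to_addr, insn_size=4):
        """Feed one branch sample into the stack.

        insn_size is the size in bytes of the branch instruction; the
        return address is the address right after that instruction.
        """
        if kind == "call":
            # Remember where execution resumes once the callee returns.
            self._ret_addrs.append(from_addr + insn_size)
        elif kind == "return":
            # Simplification: pop only when the return target matches
            # the most recent recorded return address.
            if self._ret_addrs and self._ret_addrs[-1] == to_addr:
                self._ret_addrs.pop()
        # Any other branch kind leaves the stack untouched.

    def callchain(self, ip, max_depth):
        """Return the sample address followed by the callers' return
        addresses, innermost first, limited to max_depth entries."""
        chain = [ip] + list(reversed(self._ret_addrs))
        return chain[:max_depth]
```

In this model an instruction sample taken after two nested calls gets a
three-entry chain: the sampled address first, then the return addresses
of the two callers. The sample has to be generated against the stack
state that matches its position in the trace, which is the ordering
concern behind patch 06.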
This patch set has been tested on 96boards Hikey620 after being applied on
perf/core branch with latest commit f7bf75a78095 ("perf annotate: Don't
return -1 for error when doing BPF disassembly").
Test for option '-F,+callindent':
Before:
# perf script -F,+callindent
main 2808 1 branches: coresight_test1 ffff8634f5c8 coresight_test1+0x3c (/root/coresight_test/libcstest.so)
main 2808 1 branches: printf@plt aaaaba8d37ec main+0x28 (/root/coresight_test/main)
main 2808 1 branches: printf@plt aaaaba8d36bc printf@plt+0xc (/root/coresight_test/main)
main 2808 1 branches: _init aaaaba8d3650 _init+0x30 (/root/coresight_test/main)
main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.so)
main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
After:
# perf script -F,+callindent
main 2808 1 branches: coresight_test1@plt aaaaba8d37d8 main+0x14 (/root/coresight_test/main)
main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
main 2808 1 branches: do_lookup_x ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
main 2808 1 branches: check_match ffff86369bf0 do_lookup_x+0x238 (/lib/aarch64-linux-gnu/ld-2.28.so)
main 2808 1 branches: strcmp ffff86369888 check_match+0x70 (/lib/aarch64-linux-gnu/ld-2.28.so)
main 2808 1 branches: printf@plt aaaaba8d37ec main+0x28 (/root/coresight_test/main)
main 2808 1 branches: _dl_fixup ffff86373b4c _dl_runtime_resolve+0x40 (/lib/aarch64-linux-gnu/ld-2.28.s
main 2808 1 branches: _dl_lookup_symbol_x ffff8636e078 _dl_fixup+0xb8 (/lib/aarch64-linux-gnu/ld-2.28.so)
main 2808 1 branches: do_lookup_x ffff8636a49c _dl_lookup_symbol_x+0x104 (/lib/aarch64-linux-gnu/ld-2.28.
main 2808 1 branches: _dl_name_match_p ffff86369af0 do_lookup_x+0x138 (/lib/aarch64-linux-gnu/ld-2.28.so)
main 2808 1 branches: strcmp ffff8636f7f0 _dl_name_match_p+0x18 (/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
Test for option '--itrace=g':
Before:
# perf script --itrace=g16l64i100
main 1579 100 instructions: ffff0000102137f0 group_sched_in+0xb0 ([kernel.kallsyms])
main 1579 100 instructions: ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
main 1579 100 instructions: ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
main 1579 100 instructions: ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
main 1579 100 instructions: ffff000010214854 perf_event_update_userpage+0x4c ([kernel.kallsyms])
[...]
After:
# perf script --itrace=g16l64i100
main 1579 100 instructions:
ffff000010213b78 flexible_sched_in+0xf0 ([kernel.kallsyms])
ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
main 1579 100 instructions:
ffff0000102135ac event_sched_in.isra.57+0x74 ([kernel.kallsyms])
ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
main 1579 100 instructions:
ffff000010219344 perf_swevent_add+0x6c ([kernel.kallsyms])
ffff0000102135f4 event_sched_in.isra.57+0xbc ([kernel.kallsyms])
ffff0000102137a0 group_sched_in+0x60 ([kernel.kallsyms])
ffff000010213b84 flexible_sched_in+0xfc ([kernel.kallsyms])
ffff00001020c0b4 visit_groups_merge+0x12c ([kernel.kallsyms])
[...]
Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
(Suzuki).
* Refined commit logs.
Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one for thread stack support and
another for callchain support (Mathieu).
* Added a new patch to support branch filter.
Leo Yan (6):
perf cs-etm: Fix unsigned variable comparison to zero
perf cs-etm: Refactor instruction size handling
perf cs-etm: Support thread stack
perf cs-etm: Support branch filter
perf cs-etm: Support callchain for instruction sample
perf cs-etm: Synchronize instruction sample with the thread stack
tools/perf/util/cs-etm.c | 145 ++++++++++++++++++++++++++++++++-------
1 file changed, 120 insertions(+), 25 deletions(-)
--
2.17.1
Review of ETMV4 sysfs code resulted in a number of minor issues being
discovered. Patchset fixed these and updated docs.
Applies to coresight/next
Changes since v3
First 8 patches of v3 have been accepted onto coresight/next. The patch
series is now documentation only.
Docs .txt files changed to .rst by unrelated patch. This set reflects
this change and updates the added docs to match.
Indexing changed for new coresight docs directory.
Changes since v2 (reviews from Mathieu and Leo):-
Patch 0002 now adds stable tag. Tested on 4.9, 4.14, 4.19
Applies to coresight/next (5.4-rc1)
Documentation changed to .rst format to match recent updates that
converted other CoreSight .txt files.
Misc typo / comment changes.
Changes since v1 (from reviews by Mathieu and Leo):-
Usability patch split into 2 separate functional patches.
Docs patch split into 3 patches.
Misc style and comment typo fixes.
Mike Leach (4):
coresight: etm4x: docs: Update ABI doc for new sysfs name scheme.
coresight: etm4x: docs: Update ABI doc for new sysfs etm4 attributes
coresight: docs: Create common sub-directory for coresight trace.
coresight: etm4x: docs: Adds detailed document for programming etm4x.
.../testing/sysfs-bus-coresight-devices-etm4x | 183 ++--
.../{ => coresight}/coresight-cpu-debug.rst | 0
.../coresight/coresight-etm4x-reference.rst | 798 ++++++++++++++++++
.../trace/{ => coresight}/coresight.rst | 2 +-
Documentation/trace/coresight/index.rst | 9 +
Documentation/trace/index.rst | 3 +-
MAINTAINERS | 3 +-
7 files changed, 925 insertions(+), 73 deletions(-)
rename Documentation/trace/{ => coresight}/coresight-cpu-debug.rst (100%)
create mode 100644 Documentation/trace/coresight/coresight-etm4x-reference.rst
rename Documentation/trace/{ => coresight}/coresight.rst (99%)
create mode 100644 Documentation/trace/coresight/index.rst
--
2.17.1
Good day Jan,
Please CC the coresight mailing list when you have questions such as
this one. There are a lot of knowledgeable people on it who will also
be able to help you.
On Fri, 18 Oct 2019 at 07:42, Jan Hoogerbrugge <jan.hoogerbrugge(a)nxp.com> wrote:
>
> Hi Mathieu,
>
> I am trying to understand Coresight support in the Linux kernel. I am using a Xilinx Zynq
> Ultrascale+ system. I configured the kernel with coresight support enabled. When the
> system is running I see the /sys/bus/coresight directory but the devices directory in it
> stays empty. Also, I do not see messages about coresight reported when booting:
>
> root@xilinx-zcu102-2017_4:~# ls -R /sys/bus/coresight
> /sys/bus/coresight:
> devices drivers_autoprobe uevent
> drivers drivers_probe
>
> /sys/bus/coresight/devices:
>
> /sys/bus/coresight/drivers:
> root@xilinx-zcu102-2017_4:~# dmesg | grep -i coresight
> root@xilinx-zcu102-2017_4:~#
>
> Any idea what I am doing wrong?
My guess is that coresight devices for that processor have not been
specified in the device tree. If I'm not mistaken some people (also
on this list) from Xilinx have been experimenting with coresight on
that specific platform - hopefully they will chime in.
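One way to check this guess is to look for CoreSight compatible strings
in the live device tree. The Python sketch below is only an
illustration: the function name find_coresight_nodes is made up for the
example, and it assumes the kernel exposes the device tree under
/proc/device-tree and that CoreSight nodes use compatible strings
starting with "arm,coresight-".

```python
import os


def find_coresight_nodes(dt_root="/proc/device-tree"):
    """Return device-tree node paths whose 'compatible' property
    contains a string starting with 'arm,coresight-'."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(dt_root):
        if "compatible" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "compatible"), "rb") as prop:
                raw = prop.read()
        except OSError:
            # Unreadable property: skip this node.
            continue
        # The property is a list of NUL-terminated strings.
        names = [s.decode("ascii", "replace") for s in raw.split(b"\0") if s]
        if any(name.startswith("arm,coresight-") for name in names):
            matches.append(dirpath)
    return sorted(matches)
```

An empty result from find_coresight_nodes() on the target would be
consistent with the devices missing from the device tree, which would
also explain the empty /sys/bus/coresight/devices directory.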
>
> I want to use Coresight to obtain some TPIU/ETM traces so that I can experiment with them.
Note that a driver for the TPIU IP block is currently not available.
I never had time to write a driver and nobody has ever submitted one.
> I hope that I can dump some traces to file so that I can process them later. Do you know about
> publicly accessible archives with traces on the Internet? This might be then an alternative
> for me.
I do not know of any.
Thanks,
Mathieu
>
> Regards,
>
> Jan
>
> --
>
> Jan Hoogerbrugge
>
> Principal Security Architect
>
> Competence Center Crypto & Security
>
> NXP Semiconductors
>
> High Tech Campus 46, 5656AE Eindhoven, The Netherlands
>
> Phone: +31 6 57728704
Review of ETMV4 sysfs code resulted in a number of minor issues being
discovered.
Patch set fixes these issues:-
1) Update for ETM v4.4 architecture.
2) Add missing single shot comparator API.
3) Misc fixes and improvements to sysfs API
4) Updated programmers documentation and reference.
Changes since v2 (reviews from Mathieu and Leo):-
Patch 0002 now adds stable tag. Tested on 4.9, 4.14, 4.19
Applies to coresight/next (5.4-rc1)
Documentation changed to .rst format to match recent updates that
converted other CoreSight .txt files.
Misc typo / comment changes.
Changes since v1 (from reviews by Mathieu and Leo):-
Usability patch split into 2 separate functional patches.
Docs patch split into 3 patches.
Misc style and comment typo fixes.
Mike Leach (11):
coresight: etm4x: Fixes for ETM v4.4 architecture updates.
coresight: etm4x: Fix input validation for sysfs.
coresight: etm4x: Add missing API to set EL match on address filters
coresight: etm4x: Fix issues with start-stop logic.
coresight: etm4x: Improve usability of sysfs - include/exclude addr.
coresight: etm4x: Improve usability of sysfs - CID and VMID masks.
coresight: etm4x: Add view comparator settings API to sysfs.
coresight: etm4x: Add missing single-shot control API to sysfs
coresight: etm4x: docs: Update ABI doc for sysfs features added.
coresight: docs: Create common sub-directory for coresight trace.
coresight: etm4x: docs: Adds detailed document for programming etm4x.
.../testing/sysfs-bus-coresight-devices-etm4x | 183 ++--
.../{ => coresight}/coresight-cpu-debug.rst | 0
.../coresight/coresight-etm4x-reference.rst | 798 ++++++++++++++++++
.../trace/{ => coresight}/coresight.rst | 2 +-
Documentation/trace/{ => coresight}/stm.rst | 0
MAINTAINERS | 3 +-
.../coresight/coresight-etm4x-sysfs.c | 312 ++++++-
drivers/hwtracing/coresight/coresight-etm4x.c | 32 +-
drivers/hwtracing/coresight/coresight-etm4x.h | 17 +-
9 files changed, 1245 insertions(+), 102 deletions(-)
rename Documentation/trace/{ => coresight}/coresight-cpu-debug.rst (100%)
create mode 100644 Documentation/trace/coresight/coresight-etm4x-reference.rst
rename Documentation/trace/{ => coresight}/coresight.rst (99%)
rename Documentation/trace/{ => coresight}/stm.rst (100%)
--
2.17.1
Hi Yabin,
On Tue, 17 Sep 2019 at 15:53, Yabin Cui <yabinc(a)google.com> wrote:
>
> Hi guys,
>
> I am trying to reduce ETM data loss. There seem to be two types of data loss:
> 1. caused by instruction trace buffer overflow, which generates an Overflow packet after recovering.
> 2. caused by ETR buffer overflow, which generates a PERF_RECORD_AUX with PERF_AUX_FLAG_TRUNCATED.
>
> In practice, the second one is unlikely to happen when I set the ETR buffer size to 4M, and can be improved by setting a larger buffer size.
> The first one happens much more frequently, about 21K times when generating 5.3M of ETM data.
> So I want to know if there is any way to reduce that.
>
> I found in the ETM arch manual that the overflow seems to be controlled by TRCSTALLCTLR, a stall control register.
> But TRCIDR3.NOOVERFLOW is not supported on my experiment device, which seems to use a Qualcomm sdm845.
> TRCSTALLCTLR.ISTALL can apparently be set by writing to the mode file in /sys. But I didn't find any way to set
> TRCSTALLCTLR.LEVEL in the Linux kernel. So I wonder if it is ever used.
>
> Do you guys have any suggestions?
In order to get a timely response to your questions I advise you to CC
the coresight mailing list (which I have included). There are a lot of
knowledgeable people there who can also help, especially with
architecture specific configuration.
TRCSTALLCTLR.LEVEL is currently not accessible to users because we
simply never needed to use the feature. If using the ISTALL and LEVEL
parameters helps with your use case, send a patch that exposes the
entire TRCSTALLCTLR register (rather than just the LEVEL field) and
I'll be happy to fold it in.
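For anyone experimenting with this, the Python sketch below shows how a
full TRCSTALLCTLR value could be composed from the two fields. The bit
positions used (LEVEL in bits [3:0], ISTALL in bit 8) are assumptions
taken from my reading of the ETMv4 architecture specification and should
be checked against the manual for the target; the function name
stallctl_value is hypothetical and nothing here is an existing kernel
interface.

```python
# Assumed field layout of TRCSTALLCTLR; verify against the ETMv4 manual.
TRCSTALLCTLR_LEVEL_MASK = 0xF   # LEVEL, bits [3:0] (assumption)
TRCSTALLCTLR_ISTALL = 1 << 8    # ISTALL, bit 8 (assumption)


def stallctl_value(level, istall, current=0):
    """Return a register value with LEVEL and ISTALL updated.

    level   -- stall threshold, 0..15
    istall  -- True to enable instruction stalling
    current -- existing register value; other bits are preserved
    """
    if not 0 <= level <= TRCSTALLCTLR_LEVEL_MASK:
        raise ValueError("level must fit in four bits")
    # Clear both fields, then set the requested values.
    value = current & ~(TRCSTALLCTLR_LEVEL_MASK | TRCSTALLCTLR_ISTALL)
    value |= level
    if istall:
        value |= TRCSTALLCTLR_ISTALL
    return value
```

Handling the register as a whole, rather than one field at a time, is in
line with exposing the entire register to users.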
Thanks,
Mathieu
>
> Best,
> Yabin
>
Hi coresight team,
In the ARM ETM specification for ETM4.0 to 4.2, TRCAUTHSTATUS contains
NSNID, showing whether non-invasive debug is disabled.
In the ARM CoreSight Specification 3.0, the NSNID field is 0b10
(supported and disabled) if (NIDEN | DBGEN) == FALSE.
And in the ARM architecture manual for ARMv8-A, in K2.1:
DBGEN is for external debug enable.
NIDEN is for external profiling and trace enable.
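The relationship described above can be written down as a small Python
sketch. It only restates the encoding as I understand it: the 0b10 and
0b11 values follow the description above, while the placement of NSNID
in bits [3:2] of the register and the function names nsnid_from_signals
and decode_nsnid are assumptions made for this example.

```python
NSNID_SHIFT = 2      # NSNID assumed to occupy bits [3:2]
NSNID_MASK = 0b11

NSNID_MEANING = {
    0b00: "not implemented",
    0b01: "reserved",
    0b10: "implemented and disabled",
    0b11: "implemented and enabled",
}


def nsnid_from_signals(niden, dbgen):
    """NSNID value for an implementation that supports non-secure
    non-invasive debug: disabled only when both signals are low."""
    return 0b11 if (niden or dbgen) else 0b10


def decode_nsnid(authstatus):
    """Extract and describe the NSNID field of an AUTHSTATUS value."""
    field = (authstatus >> NSNID_SHIFT) & NSNID_MASK
    return field, NSNID_MEANING[field]
```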
So it seems that if DBGEN and NIDEN are disabled by hardware, even if
the intention is only to disable the external debug interface, ETM is
totally disabled, and there is no way to use it for self-hosted tracing.
Is that true? And if yes, is there any plan to solve it in the future?
Thanks,
Yabin
Hello,
We are using coresight to dump a compressed stream through ETR into a 16MB
buffer in RAM. (The platform is nVidia TX2) To gather data from the buffer
we are using the Linux dd command:
dd if=/dev/8050000.etr of=~/coresightdata.bin
The issue is that while the data is being copied, coresight recording
becomes disabled (which shouldn't really be the case since it's a circular
buffer), so we have blind spots in the recordings for the duration of
the copy from RAM into a file.
Is there any way to prevent this from happening? Maybe to set up some kind
of ping-pong scheme, or to specify somehow that coresight shouldn't stop
recording while accessing the ETR buffer?
Thank you.
Srdjan Stokic
Mobile: +389-78-835-505