This patchset represents the second phase of CoreSight configuration
management.
1) API updated to allow dynamic load and unload of configurations and
features. Dependency management between loaded sets is added.
2) New configuration and feature sets can be added using a loadable module.
An example in /samples/coresight is provided to demonstrate this.
3) Resource management API is added. This allows the system to ensure that
loaded configurations and features are only loaded onto devices that can
support them.
Further - it ensures that configurations with multiple features cannot over
allocate resources.
4) configfs can be used to activate a configuration which will then be used
when controlling tracing using sysfs.
5) Resource management is added to ETMv4 configurations. This allows current
and future features and configurations to be defined in terms of resources
used as well as registers to be programmed.
Defining features in this way allows the resource management to operate
correctly.
The perf event parsing is also adjusted to allow the ETM resources requested
on the command line (e.g. address filters, etc) to be correctly handled
using resoruce management alongside the complex configurations such as
autofdo.
Applies to coresight/next - which is 5.13-rc1 + initial Coresight configuration
patchset.
To follow in future revisions / sets:-
a) load of additional config and features by configfs
b) ECT and CTI and other Coresight components support for configuration and
features.
Mike Leach (8):
coresight: syscfg: Update API to allow dynamic load and unload
coresight: syscfg: Update load API for config loadable modules
coresight: syscfg: Example CoreSight configuration loadable module
coresight: configfs: Allow configfs to activate configuration.
coresight: syscfg: Add API to check and validate device resources.
coresight: etm4x: syscfg: Add resource management to etm4x.
coresight: etm4x: Update perf event resource handling.
coresight: etm4x: Update configuration example.
MAINTAINERS | 1 +
.../hwtracing/coresight/coresight-cfg-afdo.c | 38 +-
.../coresight/coresight-cfg-preload.c | 9 +-
.../hwtracing/coresight/coresight-config.c | 71 ++-
.../hwtracing/coresight/coresight-config.h | 45 +-
.../hwtracing/coresight/coresight-etm4x-cfg.c | 533 ++++++++++++++++++
.../hwtracing/coresight/coresight-etm4x-cfg.h | 196 ++++++-
.../coresight/coresight-etm4x-core.c | 250 +++-----
.../coresight/coresight-syscfg-configfs.c | 87 +++
.../coresight/coresight-syscfg-configfs.h | 4 +
.../hwtracing/coresight/coresight-syscfg.c | 390 +++++++++++--
.../hwtracing/coresight/coresight-syscfg.h | 38 +-
include/linux/coresight.h | 2 +
samples/Kconfig | 9 +
samples/Makefile | 1 +
samples/coresight/Makefile | 4 +
samples/coresight/coresight-cfg-sample.c | 73 +++
17 files changed, 1511 insertions(+), 240 deletions(-)
create mode 100644 samples/coresight/Makefile
create mode 100644 samples/coresight/coresight-cfg-sample.c
--
2.17.1
This patch series is to refine the memory barriers for AUX ring buffer.
Patches 01 ~ 04 to address the barriers usage in the kernel. The first
patch is to make clear comment for how to use the barriers between the
data store and aux_head store, this asks the driver to make sure the
data is visible. Patches 02 ~ 04 is to refine the drivers for barriers
after the data store.
Patches 05 ~ 07 is to drop the legacy __sync functions, and polish for
duplicate code and cleanup the build after SYNC_COMPARE_AND_SWAP is not
used.
Patch 08 is to use WRITE_ONCE() for updating aux_tail.
Since the 64-bit value's atomicity is not promised on 32-bit perf, the
last two patches tries to fixup for perf tool when it runs in compat
mode. Patch 09 introduces a new global variable to indicate the kernel
runs in 64-bit mode which can be used to confirm if in compat mode;
patch 10 introduces variant functions for accessing AUX head/tail, it
can resolve the aotmicity issue for reading head pointer, and for the
tail write overflow issue it returns error to notify the tool to exit.
Have testes the patches on Arm64 Juno platform.
Changes from v2:
- Removed auxtrace_mmap__read_snapshot_head(), which has the duplicated
code with auxtrace_mmap__read_head();
- Cleanuped the build for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT (Adrian);
- Added global variable "kernel_is_64_bit" (Adrian);
- Added compat variants compat_auxtrace_mmap__{read_head|write_tail}
(Adrian).
Leo Yan (10):
perf/ring_buffer: Add comment for barriers on AUX ring buffer
coresight: tmc-etr: Add barrier after updating AUX ring buffer
coresight: tmc-etf: Add comment for store ordering
perf/x86: Add barrier after updating bts
perf auxtrace: Drop legacy __sync functions
perf auxtrace: Remove auxtrace_mmap__read_snapshot_head()
perf: Cleanup for HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
perf auxtrace: Use WRITE_ONCE() for updating aux_tail
perf env: Set kernel bit mode
perf auxtrace: Add compat_auxtrace_mmap__{read_head|write_tail}
arch/x86/events/intel/bts.c | 3 +
.../hwtracing/coresight/coresight-tmc-etf.c | 6 +
.../hwtracing/coresight/coresight-tmc-etr.c | 8 ++
kernel/events/ring_buffer.c | 9 ++
tools/perf/Makefile.config | 4 -
tools/perf/util/auxtrace.c | 19 ++-
tools/perf/util/auxtrace.h | 109 ++++++++++++++----
tools/perf/util/env.c | 17 ++-
tools/perf/util/env.h | 1 +
9 files changed, 136 insertions(+), 40 deletions(-)
--
2.25.1
This patch series is to correct the pointer usages for the snapshot
mode.
Patch 01 is to polish code, it removes the redundant header maintained
in tmc-etr driver and directly uses pointer perf_output_handle::head.
Patch 02 removes the callback cs_etm_find_snapshot() which wrongly
calculates the buffer headers; we can simply use the perf's common
function __auxtrace_mmap__read() for headers calculation. Patch 03 is
to update comments in CoreSight drivers to reflect the changes
introduced by patch 02.
This patch can be cleanly applied on the mainline kernel with:
commit dbe69e433722 ("Merge tag 'net-next-5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next")
And it has been tested on Arm64 Juno board.
Changes from v1:
- Dropped the patch "coresight: etm-perf: Correct buffer syncing for
snapshot", after a long discussion, the patch doesn't really resolve
any issues for snapshot mode. And another reason for unlike this
patch is now the CoreSight and Intel-PT have the consistent behaviour
(Suzuki/James/Mathieu);
- Added the patch 03 to updates drivers' comments (James);
- Added Suzuki's review tag for patch 01;
- Added James' review and testing tags for patch 02.
Leo Yan (3):
coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
perf cs-etm: Remove callback cs_etm_find_snapshot()
coresight: Update comments for removing cs_etm_find_snapshot()
drivers/hwtracing/coresight/coresight-etb10.c | 2 +-
.../hwtracing/coresight/coresight-tmc-etf.c | 2 +-
.../hwtracing/coresight/coresight-tmc-etr.c | 12 +-
tools/perf/arch/arm/util/cs-etm.c | 133 ------------------
4 files changed, 6 insertions(+), 143 deletions(-)
--
2.25.1
Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
This can cause the "data has no samples!" error if the first
PERF_RECORD_AUX record comes before the first (or any relevant)
PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
to access the binary data.
This change pushes the start of non-timeless decoding to the very end of
parsing the file. The PERF_RECORD_EXIT event can't be used because it
might not exist in system-wide or snapshot modes.
I have not been able to find the exact cause for the events to be
intermittently in the wrong order in the basic scenario:
perf record -e cs_etm/@tmc_etr0/u top
But it can be made to happen every time with the --delay option. This is
because "enable_on_exec" is disabled, which causes tracing to start
before the process to be launched is exec'd. For example:
perf record -e cs_etm/@tmc_etr0/u --delay=1 top
perf report -D | grep 'AUX\|MAP'
0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
Another scenario in which decoding from the first aux record fails is a
workload that forks. Although the aux record comes after 'bash', it
comes before 'top', which is what we are interested in. For example:
perf record -e cs_etm/@tmc_etr0/u -- bash -c top
perf report -D | grep 'AUX\|MAP'
4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
...
0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
A third scenario is when the majority of time is spent in a shared
library that is not loaded at startup. For example a dynamically loaded
plugin.
Testing
=======
Testing was done by checking if any samples that are present in the
old output are missing from the new output. Timestamps must be
stripped out with awk because now they are set to the last AUX sample,
rather than the first:
./perf script $4 | awk '!($4="")' > new.script
./perf-default script $4 | awk '!($4="")' > default.script
comm -13 <(sort -u new.script) <(sort -u default.script)
Testing showed that the new output is a superset of the old. When lines
appear in the comm output, it is not because they are missing but
because [unknown] is now resolved to sensible locations. For example
last putp branch here now resolves to libtinfo, so it's not missing
from the output, but is actually improved:
Old:
top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
New:
top 305 [001] 1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
top 305 [001] 1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
top 305 [001] 1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
In the following two modes, decoding now works and the "data has no
samples!" error is not displayed any more:
perf record -e cs_etm/@tmc_etr0/u -- bash -c top
perf record -e cs_etm/@tmc_etr0/u --delay=1 top
In snapshot mode, there is also an improvement to decoding. Previously
samples for the 'kill' process that was used to send SIGUSR2 were
completely missing, because the process hadn't started yet. But now
there are additional samples present:
perf record -e cs_etm/@tmc_etr0/u --snapshot -a
perf script
stress 19380 [003] 161627.938153: 1000000 instructions:uH: aaaabb612fb4 [unknown] (/usr/bin/stress)
kill 19644 [000] 161627.938153: 1000000 instructions:uH: ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
stress 19380 [003] 161627.938153: 1000000 instructions:uH: ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
Also tested was the round trip of 'perf inject' followed by 'perf
report' which has the same differences and improvements.
Signed-off-by: James Clark <james.clark(a)arm.com>
---
tools/perf/util/cs-etm.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 57aea2c7fc77..ceed0b038796 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
return err;
}
+ /*
+ * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
+ * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
+ * ETM_OPT_CTXTID is not enabled.
+ */
if (etm->timeless_decoding &&
event->header.type == PERF_RECORD_EXIT)
return cs_etm__process_timeless_queues(etm,
@@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
* onwards.
*/
etm->latest_kernel_timestamp = sample_kernel_timestamp;
- return cs_etm__process_queues(etm);
}
return 0;
--
2.28.0