Good morning,
Is tracing a multi-threaded program a supported use case for perf cs-etm?
If yes, are there any flags that should be specified with perf?
Thanks,
Andrea
This patch series adds support for thread stack and callchain; this patch
set depends on the instruction sample fix patch set [1].
This patch set has grown more complex, so before dividing it into smaller
groups I'd like this version to include all relevant patches; I hope this
gives the whole context for the related code changes.
Briefly, this patch set can be divided into three parts, each of which can
be reviewed separately:
Patches 01, 02 fix samples for one corner case: accessing the branch's
target address triggers an exception. Essentially, an extra branch sample
is added to reflect the intermediate branch between the previous branch
and the exception entry.
Patches 03, 04, 05, 06 are coming from patch v4, which are used to
support thread stack and callchain.
Patches 07, 08, 09 fix up exception entry and exit. This mainly covers
two cases: one is to fix up the thread stack and callchain when accessing
the branch target address triggers an exception; the other is to fix up
the thread stack for instruction emulation (and other single-step cases).
This patch set has been tested on Juno-r2, applied on the perf/core
branch at commit 85fc95d75970 ("perf maps: Add missing unlock
to maps__insert() error case") and on top of the instruction sample
fix patch set [1].
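To make the thread stack and callchain idea concrete, here is a minimal user-space sketch in C. It is not the perf implementation: `toy_stack`, `toy_call`, `toy_return`, `toy_top_ret_addr`, and `toy_callchain` are hypothetical names, and the fixed depth is an assumption made for illustration. A call branch pushes a return address, a return branch pops it, and a callchain for an instruction sample is read from the top of the stack down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_DEPTH 16	/* assumed limit, for illustration only */

/* Hypothetical per-thread stack of return addresses. */
struct toy_stack {
	uint64_t ret_addr[TOY_STACK_DEPTH];
	size_t cnt;
};

/* A call branch pushes the address that execution will return to. */
static int toy_call(struct toy_stack *ts, uint64_t ret_addr)
{
	if (ts->cnt == TOY_STACK_DEPTH)
		return -1;	/* stack full: the entry is dropped */
	ts->ret_addr[ts->cnt++] = ret_addr;
	return 0;
}

/* A return branch pops the most recent entry, if there is one. */
static void toy_return(struct toy_stack *ts)
{
	if (ts->cnt)
		ts->cnt--;
}

/* Top return address, or 0 when the stack is empty. */
static uint64_t toy_top_ret_addr(const struct toy_stack *ts)
{
	return ts->cnt ? ts->ret_addr[ts->cnt - 1] : 0;
}

/*
 * Build a callchain for a sample taken at 'ip': the sampled address
 * first, then return addresses from innermost to outermost frame.
 * Returns the number of entries written to 'chain'.
 */
static size_t toy_callchain(const struct toy_stack *ts, uint64_t ip,
			    uint64_t *chain, size_t max)
{
	size_t n = 0, i;

	assert(chain != NULL);
	if (max == 0)
		return 0;
	chain[n++] = ip;
	for (i = ts->cnt; i > 0 && n < max; i--)
		chain[n++] = ts->ret_addr[i - 1];
	return n;
}
```

The exception fix-ups described above concern when entries are pushed and popped around exception entry and exit; this sketch deliberately leaves that out.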
Test for option '-F,+callindent':
# perf script -F,+callindent
main 3258 1 branches: main ffffad684d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: _dl_fixup ffffad811b4c _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: _dl_lookup_symbol_x ffffad80c078 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: do_lookup_x ffffad80849c _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: check_match ffffad807bf0 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: strcmp ffffad807888 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
[...]
Test for option '--itrace=g':
# perf script --itrace=g16l64i100
main 3258 100 instructions:
ffffad816a80 memcpy+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad809468 _dl_new_object+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffffad80952c _dl_new_object+0x16c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffffad8018dc dl_main+0x814 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffff8000100878d0 el0_sync_handler+0x168 ([kernel.kallsyms])
ffff800010082d00 el0_sync+0x140 ([kernel.kallsyms])
ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
Changes from v4:
* Addressed Mike's suggestion to improve the performance of
cs_etm__instr_addr() with a quick calculation for non-T32 code;
* Removed the patch 'perf cs-etm: Synchronize instruction sample with
the thread stack' (Mike);
* Fixed branch sample and thread stack handling for the case where an
exception is taken on accessing the branch target address; the
related patches are 01, 02, 07;
* Fixed thread stack handling for instruction emulation and single
step with patches 08, 09.
Changes from v3:
* Split out a separate patch set for the instruction sample fixes.
* Rebased on latest perf/core branch.
Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
(Suzuki).
* Refined commit logs.
Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one for thread stack support and
another for callchain support (Mathieu).
* Added a new patch to support branch filter.
[1] https://lkml.org/lkml/2020/2/18/1406
Leo Yan (9):
perf cs-etm: Defer to assign exception sample flag
perf cs-etm: Reflect branch prior to exception
perf cs-etm: Refactor instruction size handling
perf cs-etm: Support thread stack
perf cs-etm: Support branch filter
perf cs-etm: Support callchain for instruction sample
perf cs-etm: Fixup exception entry for thread stack
perf thread: Add helper to get top return address
perf cs-etm: Fixup exception exit for thread stack
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 +
tools/perf/util/cs-etm.c | 290 ++++++++++++++++--
tools/perf/util/thread-stack.c | 10 +
tools/perf/util/thread-stack.h | 1 +
4 files changed, 268 insertions(+), 34 deletions(-)
--
2.17.1
Hi, this is an incomplete patch for an issue with EL2 kernels, and I'm looking
for feedback on how to complete it.
The background is that to support tracing multiple address spaces we get ETM to
embed the context id in the trace, and we build with CONFIG_PID_IN_CONTEXTIDR
to get the scheduler to put the thread id in CONTEXTIDR_EL1. This is a known
technique, it's what context id tracing is designed for.
The problem is when the kernel is running not at EL1 (OS level) but at EL2
(hypervisor level), which is now becoming common. With HCR_EL2.E2H set,
the kernel's writes to CONTEXTIDR_EL1 actually change a different physical
register, CONTEXTIDR_EL2. However, ETM still traces CONTEXTIDR_EL1.
So the context ids in the trace are zero, and trace cannot be reconstructed.
ETM 4.1 has an option VMIDOPT to cause CONTEXTIDR_EL2 to be output in trace,
in the VMID field replacing the value of VTTBR.VMID. So we can use that, but the
trace follower, collecting events from OpenCSD, needs to be aware it needs to
check the VMID field not the CID field. OpenCSD doesn't need to change but
perf does. TRCCONFIGR is already in the metadata, so perf consumers can check
it to see what's going on.
The patch below does the kernel and userspace side but is not complete.
The problem is that userspace perf creates the metadata copy of TRCCONFIGR
based on its request (and fills in the other id registers by reading sysfs),
but the detection of EL2/E2H happens in the kernel which adjusts TRCCONFIGR,
and it's this config which is needed for decode. I see three ways round this:
- have userspace test to see if the kernel is EL2 (somehow) and adjust the
metadata to mirror what the kernel is doing
- have the kernel pass the adjusted TRCCONFIGR back so perf can put it in the
metadata
- have the perf decoder get the thread id from whichever of VMID and
CONTEXTID is available in a PE_CONTEXT element
Obviously, the last is simplest, but it's a bodge, and means that OpenCSD
will see VMIDs when its TRCCONFIGR says it won't. It's kind of cleanest to get
the real TRCCONFIGR somehow, but how do we do that?
Al
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index a128b5063f46..96488a0cfdcf 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -353,8 +353,32 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata,
}
if (attr->config & BIT(ETM_OPT_CTXTID))
- /* bit[6], Context ID tracing bit */
- config->cfg |= BIT(ETM4_CFG_BIT_CTXTID);
+ {
+ /*
+ * Enable context-id tracing. The assumption is that this
+ * will work with CONFIG_PID_IN_CONTEXTIDR to trace process
+ * id changes and support decode of multiple processes.
+ * But ETM's context id trace traces physical CONTEXTIDR_EL1,
+ * while the logical CONTEXTIDR_EL1 that is written to on
+ * process switch is either physical CONTEXTIDR_EL1 or
+ * CONTEXTIDR_EL2 depending on HCR_EL2.E2H. On principle
+ * we should continue to use logical CONTEXTIDR_EL1.
+ * In order to trace physical CONTEXTIDR_EL2, we need to
+ * enable VMID tracing and use the VMIDOPT flag to trace
+ * CONTEXTIDR_EL2 rather than VTTBR.VMID in the VMID field.
+ * Trace decoders will need to inspect TRCCONFIGR and use
+ * either the CID or the VMID field from the trace packet.
+ */
+ if (!(is_kernel_in_hyp_mode() &&
+ (read_sysreg(hcr_el2) & BIT(34)) != 0)) {
+ /* bit[6], Context ID tracing bit */
+ config->cfg |= BIT(ETM4_CFG_BIT_CTXTID);
+ } else {
+ /* bits[7,15], trace CONTEXTIDR_EL2 in VMID field */
+ config->cfg |= (BIT(ETM4_CFG_BIT_VMID) |
+ BIT(ETM4_CFG_BIT_VMIDOPT));
+ }
+ }
/* return stack - enable if selected and supported */
if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
index b0e35eec6499..c2f47b25daab 100644
--- a/include/linux/coresight-pmu.h
+++ b/include/linux/coresight-pmu.h
@@ -19,8 +19,10 @@
/* ETMv4 CONFIGR programming bits for the ETM OPTs */
#define ETM4_CFG_BIT_CYCACC 4
#define ETM4_CFG_BIT_CTXTID 6
+#define ETM4_CFG_BIT_VMID 7
#define ETM4_CFG_BIT_TS 11
#define ETM4_CFG_BIT_RETSTK 12
+#define ETM4_CFG_BIT_VMIDOPT 15
static inline int coresight_get_trace_id(int cpu)
{
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
index b0e35eec6499..c2f47b25daab 100644
--- a/tools/include/linux/coresight-pmu.h
+++ b/tools/include/linux/coresight-pmu.h
@@ -19,8 +19,10 @@
/* ETMv4 CONFIGR programming bits for the ETM OPTs */
#define ETM4_CFG_BIT_CYCACC 4
#define ETM4_CFG_BIT_CTXTID 6
+#define ETM4_CFG_BIT_VMID 7
#define ETM4_CFG_BIT_TS 11
#define ETM4_CFG_BIT_RETSTK 12
+#define ETM4_CFG_BIT_VMIDOPT 15
static inline int coresight_get_trace_id(int cpu)
{
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index cd92a99eb89d..a54cad778841 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -35,6 +35,7 @@ struct cs_etm_decoder {
dcd_tree_handle_t dcd_tree;
cs_etm_mem_cb_type mem_access;
ocsd_datapath_resp_t prev_return;
+ u32 thread_id_in_vmid:1;
};
static u32
@@ -496,17 +497,24 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue,
static ocsd_datapath_resp_t
cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
+ struct cs_etm_decoder *decoder,
struct cs_etm_packet_queue *packet_queue,
const ocsd_generic_trace_elem *elem,
const uint8_t trace_chan_id)
{
pid_t tid;
- /* Ignore PE_CONTEXT packets that don't have a valid contextID */
- if (!elem->context.ctxt_id_valid)
- return OCSD_RESP_CONT;
+ if (!decoder->thread_id_in_vmid) {
+ /* Ignore PE_CONTEXT packets that don't have a valid contextID */
+ if (!elem->context.ctxt_id_valid)
+ return OCSD_RESP_CONT;
+ tid = elem->context.context_id;
+ } else {
+ if (!elem->context.vmid_valid)
+ return OCSD_RESP_CONT;
+ tid = elem->context.vmid;
+ }
- tid = elem->context.context_id;
if (cs_etm__etmq_set_tid(etmq, tid, trace_chan_id))
return OCSD_RESP_FATAL_SYS_ERR;
@@ -561,7 +569,7 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
- resp = cs_etm_decoder__set_tid(etmq, packet_queue,
+ resp = cs_etm_decoder__set_tid(etmq, decoder, packet_queue,
elem, trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_ADDR_NACC:
@@ -595,11 +603,15 @@ static int cs_etm_decoder__create_etm_packet_decoder(
OCSD_BUILTIN_DCD_ETMV3 :
OCSD_BUILTIN_DCD_PTM;
trace_config = &config_etmv3;
+ decoder->thread_id_in_vmid = 0;
break;
case CS_ETM_PROTO_ETMV4i:
cs_etm_decoder__gen_etmv4_config(t_params, &trace_config_etmv4);
decoder_name = OCSD_BUILTIN_DCD_ETMV4I;
trace_config = &trace_config_etmv4;
+ /* If VMID and VMIDOPT are set, thread id is in VMID not CID */
+ decoder->thread_id_in_vmid =
+ ((trace_config_etmv4.reg.configr & 0x8080) == 0x8080);
break;
default:
return -1;
CoreSight ETMv4.4 introduced system instructions for accessing
the ETM. This also implies that they may not be on the amba bus.
Right now all the CoreSight components are accessed via memory
map. Also, we have some common routines in the coresight generic
driver code (e.g., CS_LOCK, claim/disclaim), which assume MMIO
access. In order to keep the generic algorithms in a single
place and to allow dynamic switching for ETMs, this series introduces
an abstraction layer for accessing a coresight device. It is
designed such that MMIO accesses are fast-tracked (i.e., without
an indirect function call).
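The shape of such an abstraction can be sketched in plain user-space C as follows. This is only an illustration under assumptions: `toy_access`, `toy_read32()`, `toy_write32()`, and the fake system-register accessors are hypothetical names, not the structures or helpers the series defines. The point it shows is that the MMIO case is tested first and handled inline, so only non-MMIO devices pay for an indirect call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access descriptor: a memory-mapped window or callbacks. */
struct toy_access {
	bool io_mem;			/* true: use 'base' directly */
	volatile uint32_t *base;	/* register window when io_mem is set */
	uint32_t (*read)(uint32_t offset);
	void (*write)(uint32_t val, uint32_t offset);
};

/* MMIO is checked first so the common case avoids an indirect call. */
static inline uint32_t toy_read32(const struct toy_access *a, uint32_t offset)
{
	if (a->io_mem)
		return a->base[offset / sizeof(uint32_t)];
	return a->read(offset);
}

static inline void toy_write32(const struct toy_access *a, uint32_t val,
			       uint32_t offset)
{
	if (a->io_mem)
		a->base[offset / sizeof(uint32_t)] = val;
	else
		a->write(val, offset);
}

/* Stand-ins for system-instruction accessors: one fake register. */
static uint32_t toy_sysreg;

static uint32_t toy_sysreg_read(uint32_t offset)
{
	(void)offset;
	return toy_sysreg;
}

static void toy_sysreg_write(uint32_t val, uint32_t offset)
{
	(void)offset;
	toy_sysreg = val;
}
```

Generic routines would then take a `struct toy_access *` rather than a raw base address, which is what lets the same algorithm serve both kinds of device.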
This will also help us to get rid of the driver+attribute specific
sysfs show/store routines and replace them with a single routine
to access a given register offset (which can be embedded in the
dev_ext_attribute). This is not currently implemented in the series,
but can be achieved.
Further we switch the generic routines to work with the abstraction.
With this in place, we refactor the etm4x code a bit to allow for
supporting the system instructions with very little new code. The
changes also switch to using the system instructions by default
even when we may have an MMIO.
The series has been mildly tested on a model. I would really
appreciate any testing on real hardware.
Applies on coresight/next tree. The tree is also available here :
git://linux-arm.org/linux-skp.git etm-4.4/rfc
Suzuki K Poulose (14):
coresight: etm4x: Skip save/restore before device registration
coresight: Introduce device access abstraction
coresight: tpiu: Use coresight device access abstraction
coresight: etm4x: Free up argument of etm4_init_arch_data
coresight: Convert coresight_timeout to use access abstraction
coresight: Convert claim and lock operations to use access wrappers
coresight: etm4x: Always read the registers on the host CPU
coresight: etm4x: Convert all register accesses
coresight: etm4x: Add sysreg access helpers
coresight: etm4x: Define DEVARCH register fields
coresight: etm4x: Detect system register access support
coresight: etm4x: Refactor probing routine
coresight: etm4x: Add support for sysreg only devices
dts: bindings: coresight: ETMv4.4 system register access only units
.../devicetree/bindings/arm/coresight.txt | 6 +-
drivers/hwtracing/coresight/coresight-catu.c | 17 +-
.../hwtracing/coresight/coresight-cpu-debug.c | 26 +-
.../hwtracing/coresight/coresight-cti-sysfs.c | 4 +-
drivers/hwtracing/coresight/coresight-cti.c | 31 +-
drivers/hwtracing/coresight/coresight-etb10.c | 26 +-
.../coresight/coresight-etm3x-sysfs.c | 8 +-
drivers/hwtracing/coresight/coresight-etm3x.c | 45 +-
.../coresight/coresight-etm4x-sysfs.c | 32 +-
drivers/hwtracing/coresight/coresight-etm4x.c | 580 +++++++++++-------
drivers/hwtracing/coresight/coresight-etm4x.h | 403 +++++++++++-
.../hwtracing/coresight/coresight-funnel.c | 19 +-
drivers/hwtracing/coresight/coresight-priv.h | 9 +-
.../coresight/coresight-replicator.c | 28 +-
drivers/hwtracing/coresight/coresight-stm.c | 49 +-
.../hwtracing/coresight/coresight-tmc-etf.c | 36 +-
.../hwtracing/coresight/coresight-tmc-etr.c | 19 +-
drivers/hwtracing/coresight/coresight-tmc.c | 10 +-
drivers/hwtracing/coresight/coresight-tpiu.c | 32 +-
drivers/hwtracing/coresight/coresight.c | 130 +++-
include/linux/coresight.h | 189 +++++-
21 files changed, 1273 insertions(+), 426 deletions(-)
--
2.24.1
etm4_count keeps track of number of ETMv4 registered and on some systems,
a race is observed on etm4_count variable which can lead to multiple calls
to cpuhp_setup_state_nocalls_cpuslocked(). This function internally calls
cpuhp_store_callbacks() which prevents multiple registrations of callbacks
for a given state and due to this race, it returns -EBUSY leading to ETM
probe failures like below.
coresight-etm4x: probe of 7040000.etm failed with error -16
This race can easily be triggered with async probe by setting probe type
as PROBE_PREFER_ASYNCHRONOUS and with ETM power management property
"arm,coresight-loses-context-with-cpu".
Prevent this race by moving the cpuhp callbacks to etm driver init, since
the cpuhp callbacks don't have to depend on etm4_count and can be set up
once during driver init. Similarly, move the cpu_pm notifier registration
to driver init and completely remove the etm4_count usage. With this change
we can also use the non-cpuslocked version of the cpuhp callbacks.
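A toy user-space model of the idea, for illustration only: `toy_setup_state()`, `toy_driver_init()`, and `toy_probe()` are hypothetical names, and the single boolean merely stands in for a hotplug state slot that accepts one registration. It shows why a second registration fails with -EBUSY and why doing the registration once at init removes the need for a shared counter in probe.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of a hotplug state slot that accepts a single registration. */
static bool toy_state_registered;

static int toy_setup_state(void)
{
	if (toy_state_registered)
		return -EBUSY;	/* a second registration is rejected */
	toy_state_registered = true;
	return 0;
}

/* New scheme: registration happens exactly once, before any probe runs. */
static int toy_driver_init(void)
{
	return toy_setup_state();
}

/*
 * Probe no longer touches the shared registration state, so concurrent
 * probes have nothing to race on here.
 */
static int toy_probe(int cpu)
{
	(void)cpu;
	return 0;
}
```

In the old scheme two probes that both observed a zero count could each reach the equivalent of `toy_setup_state()`, and the loser would fail its probe.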
Fixes: 9b6a3f3633a5 ("coresight: etmv4: Fix CPU power management setup in probe() function")
Fixes: 58eb457be028 ("hwtracing/coresight-etm4x: Convert to hotplug state machine")
Suggested-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan(a)codeaurora.org>
---
Changes in v3:
* Minor cleanups from v2 and change to device_initcall (Stephen Boyd)
* Move to non cpuslocked cpuhp callbacks and rename to etm_pm_setup() (Mike Leach)
Changes in v2:
* Rearrange cpuhp callbacks and move them to driver init (Suzuki K Poulose)
---
drivers/hwtracing/coresight/coresight-etm4x.c | 65 +++++++++----------
1 file changed, 31 insertions(+), 34 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index 6d7d2169bfb2..fddfd93b9a7b 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -48,8 +48,6 @@ module_param(pm_save_enable, int, 0444);
MODULE_PARM_DESC(pm_save_enable,
"Save/restore state on power down: 1 = never, 2 = self-hosted");
-/* The number of ETMv4 currently registered */
-static int etm4_count;
static struct etmv4_drvdata *etmdrvdata[NR_CPUS];
static void etm4_set_default_config(struct etmv4_config *config);
static int etm4_set_event_filters(struct etmv4_drvdata *drvdata,
@@ -1398,28 +1396,25 @@ static struct notifier_block etm4_cpu_pm_nb = {
.notifier_call = etm4_cpu_pm_notify,
};
-/* Setup PM. Called with cpus locked. Deals with error conditions and counts */
-static int etm4_pm_setup_cpuslocked(void)
+/* Setup PM. Deals with error conditions and counts */
+static int __init etm4_pm_setup(void)
{
int ret;
- if (etm4_count++)
- return 0;
-
ret = cpu_pm_register_notifier(&etm4_cpu_pm_nb);
if (ret)
- goto reduce_count;
+ return ret;
- ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ARM_CORESIGHT_STARTING,
- "arm/coresight4:starting",
- etm4_starting_cpu, etm4_dying_cpu);
+ ret = cpuhp_setup_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING,
+ "arm/coresight4:starting",
+ etm4_starting_cpu, etm4_dying_cpu);
if (ret)
goto unregister_notifier;
- ret = cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_ONLINE_DYN,
- "arm/coresight4:online",
- etm4_online_cpu, NULL);
+ ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+ "arm/coresight4:online",
+ etm4_online_cpu, NULL);
/* HP dyn state ID returned in ret on success */
if (ret > 0) {
@@ -1428,21 +1423,15 @@ static int etm4_pm_setup_cpuslocked(void)
}
/* failed dyn state - remove others */
- cpuhp_remove_state_nocalls_cpuslocked(CPUHP_AP_ARM_CORESIGHT_STARTING);
+ cpuhp_remove_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING);
unregister_notifier:
cpu_pm_unregister_notifier(&etm4_cpu_pm_nb);
-
-reduce_count:
- --etm4_count;
return ret;
}
-static void etm4_pm_clear(void)
+static void __init etm4_pm_clear(void)
{
- if (--etm4_count != 0)
- return;
-
cpu_pm_unregister_notifier(&etm4_cpu_pm_nb);
cpuhp_remove_state_nocalls(CPUHP_AP_ARM_CORESIGHT_STARTING);
if (hp_online) {
@@ -1498,22 +1487,12 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
if (!desc.name)
return -ENOMEM;
- cpus_read_lock();
etmdrvdata[drvdata->cpu] = drvdata;
if (smp_call_function_single(drvdata->cpu,
etm4_init_arch_data, drvdata, 1))
dev_err(dev, "ETM arch init failed\n");
- ret = etm4_pm_setup_cpuslocked();
- cpus_read_unlock();
-
- /* etm4_pm_setup_cpuslocked() does its own cleanup - exit on error */
- if (ret) {
- etmdrvdata[drvdata->cpu] = NULL;
- return ret;
- }
-
if (etm4_arch_supported(drvdata->arch) == false) {
ret = -EINVAL;
goto err_arch_supported;
@@ -1560,7 +1539,6 @@ static int etm4_probe(struct amba_device *adev, const struct amba_id *id)
err_arch_supported:
etmdrvdata[drvdata->cpu] = NULL;
- etm4_pm_clear();
return ret;
}
@@ -1598,4 +1576,23 @@ static struct amba_driver etm4x_driver = {
.probe = etm4_probe,
.id_table = etm4_ids,
};
-builtin_amba_driver(etm4x_driver);
+
+static int __init etm4x_init(void)
+{
+ int ret;
+
+ ret = etm4_pm_setup();
+
+ /* etm4_pm_setup() does its own cleanup - exit on error */
+ if (ret)
+ return ret;
+
+ ret = amba_driver_register(&etm4x_driver);
+ if (ret) {
+ pr_err("Error registering etm4x driver\n");
+ etm4_pm_clear();
+ }
+
+ return ret;
+}
+device_initcall(etm4x_init);
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Allow coresight to be built as modules. This lets developers test their
code without rebooting.
This series is based on below two series.
- "coresight: allow to build components as modules"
https://lkml.org/lkml/2018/6/5/989
- "coresight: make drivers modular"
https://lkml.org/lkml/2020/1/17/468
Change from v5:
Add below CTI clean up change from Mike into series
-https://lists.linaro.org/pipermail/coresight/2020-July/004349.html
Increase module reference count when enabling CTI device (Mike)
Change from v4:
Fix error handling in coresight_grab_device() (Greg)
Add coresight: cti: Fix remove sysfs link error from Mike
-https://lists.linaro.org/pipermail/coresight/2020-July/004275.html
Move cti_remove_conn_xrefs() into cti_remove() (Mike)
Align patch subject to coresight: <component>: <description> (Mike)
Change from v3:
Rebase to coresight-next (Mike and Mathieu)
Reorder try_get_module() (Suzuki)
Clean up etmdrvdata[] in device remove path (Mike)
Move cti_remove_conn_xrefs to cti_remove (Mike)
Change from v2:
Rebase to 5.8-rc5. Export coresight_add_sysfs_link and
coresight_remove_sysfs_link
Fix one cut and paste error on MODULE_DESCRIPTION of CTI
Change from v1:
Use try_module_get() to avoid module to be unloaded when device is used
in active trace session. (Mathieu P)
Changes from the above two series:
This series adds support for dynamically removing a module while a device
in that module is enabled and used by some trace path. It disables all
trace paths that use that device and releases them.
Kim Phillips (7):
coresight: use IS_ENABLED for CONFIGs that may be modules
coresight: etm3x: allow etm3x to be built as a module
coresight: etm4x: allow etm4x to be built as a module
coresight: etb: allow etb to be built as a module
coresight: tpiu: allow tpiu to be built as a module
coresight: tmc: allow tmc to be built as a module
coresight: allow funnel and replicator drivers to be built as modules
Mian Yousaf Kaukab (4):
coresight: export global symbols
coresight: funnel: remove multiple init calls from funnel driver
coresight: replicator: remove multiple init calls
coresight: tmc-etr: add function to register catu ops
Mike Leach (1):
coresight: cti: Fix remove sysfs link error
Tingwei Zhang (13):
coresight: cpu_debug: add module name in Kconfig
coresight: cpu_debug: define MODULE_DEVICE_TABLE
coresight: add coresight prefix to barrier_pkt
coresight: add try_get_module() in coresight_grab_device()
coresight: stm: allow to build coresight-stm as a module
coresight: etm: perf: Fix warning caused by etm_setup_aux failure
coresight: cti: add function to register cti associate ops
coresight: cti: Fix bug clearing sysfs links on callback
coresight: cti: don't disable ect device if it's not enabled
coresight: cti: increase reference count when enabling cti
coresight: cti: allow cti to be built as a module
coresight: catu: allow catu drivers to be built as modules
coresight: allow the coresight core driver to be built as a module
drivers/hwtracing/coresight/Kconfig | 54 +++++--
drivers/hwtracing/coresight/Makefile | 22 +--
drivers/hwtracing/coresight/coresight-catu.c | 37 ++++-
drivers/hwtracing/coresight/coresight-catu.h | 2 -
.../{coresight.c => coresight-core.c} | 134 +++++++++++++++---
.../hwtracing/coresight/coresight-cpu-debug.c | 2 +
.../{coresight-cti.c => coresight-cti-core.c} | 62 ++++++--
drivers/hwtracing/coresight/coresight-etb10.c | 22 ++-
.../hwtracing/coresight/coresight-etm-perf.c | 13 +-
.../hwtracing/coresight/coresight-etm-perf.h | 5 +-
...resight-etm3x.c => coresight-etm3x-core.c} | 27 +++-
...resight-etm4x.c => coresight-etm4x-core.c} | 26 +++-
.../hwtracing/coresight/coresight-funnel.c | 62 +++++++-
.../hwtracing/coresight/coresight-platform.c | 1 +
drivers/hwtracing/coresight/coresight-priv.h | 24 ++--
.../coresight/coresight-replicator.c | 63 +++++++-
drivers/hwtracing/coresight/coresight-stm.c | 20 ++-
drivers/hwtracing/coresight/coresight-sysfs.c | 2 +
.../{coresight-tmc.c => coresight-tmc-core.c} | 19 ++-
.../hwtracing/coresight/coresight-tmc-etf.c | 2 +-
.../hwtracing/coresight/coresight-tmc-etr.c | 21 ++-
drivers/hwtracing/coresight/coresight-tmc.h | 3 +
drivers/hwtracing/coresight/coresight-tpiu.c | 19 ++-
include/linux/coresight.h | 3 +-
24 files changed, 559 insertions(+), 86 deletions(-)
rename drivers/hwtracing/coresight/{coresight.c => coresight-core.c} (93%)
rename drivers/hwtracing/coresight/{coresight-cti.c => coresight-cti-core.c} (94%)
rename drivers/hwtracing/coresight/{coresight-etm3x.c => coresight-etm3x-core.c} (97%)
rename drivers/hwtracing/coresight/{coresight-etm4x.c => coresight-etm4x-core.c} (98%)
rename drivers/hwtracing/coresight/{coresight-tmc.c => coresight-tmc-core.c} (96%)
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Hi Mike,
We are trying to get OpenCSD working on our silicon sample. It is very
much required for our debugging, as we cannot attach DS5 early enough.
We were able to generate the trace.bin and the trace decode
file (attached).
I have a few questions.
[image: image.png]
The end portion of the trace looks something like this.
I know PTM only takes the branch address. Is there any way we can make it
more readable by filling in the missing instructions between branches, or
is this the final output?
Also can you check the attached file and see if anything else is missing in
our setup?
Ajith K Issac
Broadcom DCSG
The Coresight driver assumes the sink is common across all the ETMs,
and tries to build a path between an ETM and the first enabled
sink found using a bus-based search. This breaks implementations
that have multiple per-core sinks in the enabled state.
To fix this,
- the coresight_find_sink API is updated with an additional flag
so that it is able to return an enabled sink
- the coresight_get_enabled_sink API is updated to do a
connection-based search when a source reference is given.
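For readers unfamiliar with the difference, a connection-based search can be sketched in stand-alone C as below. This is a simplified illustration, not the driver code: `toy_dev` and `toy_find_enabled_sink()` are hypothetical, the graph is assumed to be acyclic, and the sketch returns the first activated sink it reaches rather than ranking candidates by subtype and depth as the driver does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CONNS 4

/* Hypothetical device node: a source, link, or sink with output connections. */
struct toy_dev {
	const char *name;
	bool is_sink;
	bool activated;
	struct toy_dev *conns[TOY_MAX_CONNS];
	size_t nr_conns;
};

/*
 * Depth-first walk along the connections of 'dev'. Returns the first
 * activated sink reachable from it, or NULL if there is none.
 */
static struct toy_dev *toy_find_enabled_sink(struct toy_dev *dev)
{
	size_t i;

	if (!dev)
		return NULL;
	if (dev->is_sink)
		return dev->activated ? dev : NULL;

	for (i = 0; i < dev->nr_conns; i++) {
		struct toy_dev *sink = toy_find_enabled_sink(dev->conns[i]);

		if (sink)
			return sink;
	}
	return NULL;
}
```

With two ETMs that each feed their own enabled sink, starting the walk from the source returns that source's sink, whereas a bus-wide search would return whichever enabled sink happens to be found first.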
Signed-off-by: Linu Cherian <lcherian(a)marvell.com>
---
.../hwtracing/coresight/coresight-etm-perf.c | 2 +-
drivers/hwtracing/coresight/coresight-priv.h | 5 +-
drivers/hwtracing/coresight/coresight.c | 51 +++++++++++++++++--
3 files changed, 51 insertions(+), 7 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 1a3169e69bb1..25041d2654e3 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -223,7 +223,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
id = (u32)event->attr.config2;
sink = coresight_get_sink_by_id(id);
} else {
- sink = coresight_get_enabled_sink(true);
+ sink = coresight_get_enabled_sink(NULL, true);
}
mask = &event_data->mask;
diff --git a/drivers/hwtracing/coresight/coresight-priv.h b/drivers/hwtracing/coresight/coresight-priv.h
index f2dc625ea585..010ed26db340 100644
--- a/drivers/hwtracing/coresight/coresight-priv.h
+++ b/drivers/hwtracing/coresight/coresight-priv.h
@@ -148,10 +148,13 @@ static inline void coresight_write_reg_pair(void __iomem *addr, u64 val,
void coresight_disable_path(struct list_head *path);
int coresight_enable_path(struct list_head *path, u32 mode, void *sink_data);
struct coresight_device *coresight_get_sink(struct list_head *path);
-struct coresight_device *coresight_get_enabled_sink(bool reset);
+struct coresight_device *
+coresight_get_enabled_sink(struct coresight_device *source, bool reset);
struct coresight_device *coresight_get_sink_by_id(u32 id);
struct coresight_device *
coresight_find_default_sink(struct coresight_device *csdev);
+struct coresight_device *
+coresight_find_enabled_sink(struct coresight_device *csdev);
struct list_head *coresight_build_path(struct coresight_device *csdev,
struct coresight_device *sink);
void coresight_release_path(struct list_head *path);
diff --git a/drivers/hwtracing/coresight/coresight.c b/drivers/hwtracing/coresight/coresight.c
index e9c90f2de34a..ae69169c58d3 100644
--- a/drivers/hwtracing/coresight/coresight.c
+++ b/drivers/hwtracing/coresight/coresight.c
@@ -566,6 +566,10 @@ static int coresight_enabled_sink(struct device *dev, const void *data)
/**
* coresight_get_enabled_sink - returns the first enabled sink found on the bus
+ * When a source reference is given, enabled sink is found using connection based
+ * search.
+ *
+ * @source: Coresight source device reference
* @deactivate: Whether the 'enable_sink' flag should be reset
*
* When operated from perf the deactivate parameter should be set to 'true'.
@@ -576,10 +580,21 @@ static int coresight_enabled_sink(struct device *dev, const void *data)
* parameter should be set to 'false', hence mandating users to explicitly
* clear the flag.
*/
-struct coresight_device *coresight_get_enabled_sink(bool deactivate)
+struct coresight_device *
+coresight_get_enabled_sink(struct coresight_device *source, bool deactivate)
{
struct device *dev = NULL;
+ struct coresight_device *sink;
+
+ if (!source)
+ goto bus_search;
+ sink = coresight_find_enabled_sink(source);
+ if (sink && deactivate)
+ sink->activated = false;
+
+ return sink;
+bus_search:
dev = bus_find_device(&coresight_bustype, NULL, &deactivate,
coresight_enabled_sink);
@@ -828,6 +843,7 @@ coresight_select_best_sink(struct coresight_device *sink, int *depth,
*
* @csdev: source / current device to check.
* @depth: [in] search depth of calling dev, [out] depth of found sink.
+ * @enabled: flag to search only enabled sinks
*
* This will walk the connection path from a source (ETM) till a suitable
* sink is encountered and return that sink to the original caller.
@@ -839,7 +855,7 @@ coresight_select_best_sink(struct coresight_device *sink, int *depth,
* return best sink found, or NULL if not found at this node or child nodes.
*/
static struct coresight_device *
-coresight_find_sink(struct coresight_device *csdev, int *depth)
+coresight_find_sink(struct coresight_device *csdev, int *depth, bool enabled)
{
int i, curr_depth = *depth + 1, found_depth = 0;
struct coresight_device *found_sink = NULL;
@@ -862,7 +878,8 @@ coresight_find_sink(struct coresight_device *csdev, int *depth)
child_dev = csdev->pdata->conns[i].child_dev;
if (child_dev)
- sink = coresight_find_sink(child_dev, &child_depth);
+ sink = coresight_find_sink(child_dev, &child_depth,
+ enabled);
if (sink)
found_sink = coresight_select_best_sink(found_sink,
@@ -872,6 +889,10 @@ coresight_find_sink(struct coresight_device *csdev, int *depth)
}
return_def_sink:
+ /* Check if we need to return an enabled sink */
+ if (enabled && found_sink)
+ if (!found_sink->activated)
+ found_sink = NULL;
/* return found sink and depth */
if (found_sink)
*depth = found_depth;
@@ -901,10 +922,30 @@ coresight_find_default_sink(struct coresight_device *csdev)
/* look for a default sink if we have not found for this device */
if (!csdev->def_sink)
- csdev->def_sink = coresight_find_sink(csdev, &depth);
+ csdev->def_sink = coresight_find_sink(csdev, &depth, false);
return csdev->def_sink;
}
+/**
+ * coresight_find_enabled_sink: Find the suitable enabled sink
+ *
+ * @csdev: starting source to find a connected sink.
+ *
+ * Walks connections graph looking for a suitable sink to enable for the
+ * supplied source. Uses CoreSight device subtypes and distance from source
+ * to select the best sink.
+ *
+ * Used in cases where the CoreSight user (sysfs) has selected a sink.
+ */
+struct coresight_device *
+coresight_find_enabled_sink(struct coresight_device *csdev)
+{
+ int depth = 0;
+
+ /* look for the enabled sink */
+ return coresight_find_sink(csdev, &depth, true);
+}
+
static int coresight_remove_sink_ref(struct device *dev, void *data)
{
struct coresight_device *sink = data;
@@ -992,7 +1033,7 @@ int coresight_enable(struct coresight_device *csdev)
* Search for a valid sink for this session but don't reset the
* "enable_sink" flag in sysFS. Users get to do that explicitly.
*/
- sink = coresight_get_enabled_sink(false);
+ sink = coresight_get_enabled_sink(csdev, false);
if (!sink) {
ret = -EINVAL;
goto out;
--
2.25.1
In the proposed Coresight module set v5 [1] from Tingwei, unloading the
ETM module before the CTI module will crash on unload of the CTI module
due to the cleanup callback from Coresight to the CTI module not working
correctly in clearing sysfs callbacks.
Patch fixes this issue. Applies on [1].
Tingwei - could you consider adding this to your set for v6?
[1] https://lists.linaro.org/pipermail/coresight/2020-July/004349.html
Mike Leach (1):
coresight: cti: Fix bug clearing sysfs links on callback
drivers/hwtracing/coresight/coresight-core.c | 4 ++--
drivers/hwtracing/coresight/coresight-cti-core.c | 3 +--
2 files changed, 3 insertions(+), 4 deletions(-)
--
2.17.1