Good morning,
Is tracing a multi-threaded program a supported use case for perf cs-etm?
If yes, are there any flags that should be specified with perf?
Thanks,
Andrea
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
This patch series adds support for thread stack and callchain; it
depends on the instruction sample fix patch set [1].
The series has grown more complex, so before splitting it into smaller
groups, I'd like this version to include all relevant patches, in the
hope that it gives the whole context for the related code changes.
Briefly, this patch set can be divided into three parts, each of which
can be reviewed separately:
Patches 01, 02 fix samples for one corner case: accessing the branch's
target address triggers an exception. Essentially, an extra branch
sample is added to reflect the intermediate branch between the previous
branch and the exception entry.
Patches 03, 04, 05, 06 come from patch set v4 and add support for
thread stack and callchain.
Patches 07, 08, 09 fix up exception entry and exit. This mainly
addresses two cases: one is to fix up the thread stack and callchain
when accessing the branch target address triggers an exception; the
other is to fix up the thread stack for instruction emulation (and
other single step cases).
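As a rough illustration of the thread stack idea behind these patches,
here is a minimal standalone sketch. It is not the perf implementation
in tools/perf/util/thread-stack.c; the names toy_stack,
toy_stack_branch() and toy_stack_callchain() are hypothetical, and the
sketch simply pops on every return instead of matching return addresses.
A call branch pushes its return address, a return branch pops one entry,
and the callchain for an instruction sample is the sampled address
followed by the saved return addresses, innermost first.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_DEPTH 64

/* Hypothetical branch classification, for illustration only. */
enum toy_branch_type { TOY_BRANCH_CALL, TOY_BRANCH_RETURN, TOY_BRANCH_OTHER };

struct toy_stack {
	uint64_t ret_addr[TOY_STACK_DEPTH];
	size_t cnt;
};

/*
 * Update the stack for one branch sample. from_addr is the address of
 * the branch instruction and insn_len its size in bytes.
 */
static void toy_stack_branch(struct toy_stack *ts, enum toy_branch_type type,
			     uint64_t from_addr, unsigned int insn_len)
{
	switch (type) {
	case TOY_BRANCH_CALL:
		/* Remember where the callee will return to. */
		if (ts->cnt < TOY_STACK_DEPTH)
			ts->ret_addr[ts->cnt++] = from_addr + insn_len;
		break;
	case TOY_BRANCH_RETURN:
		/* Simplification: pop without checking the target address. */
		if (ts->cnt > 0)
			ts->cnt--;
		break;
	default:
		break;
	}
}

/*
 * Write ip followed by the saved return addresses (innermost first)
 * into chain[]. Returns the number of entries written, at most max.
 */
static size_t toy_stack_callchain(const struct toy_stack *ts, uint64_t ip,
				  uint64_t *chain, size_t max)
{
	size_t n = 0, i;

	if (max == 0)
		return 0;
	chain[n++] = ip;
	for (i = ts->cnt; i > 0 && n < max; i--)
		chain[n++] = ts->ret_addr[i - 1];
	return n;
}
```

The exception entry and exit fixups matter for exactly this kind of
structure: a push or pop that is missed or duplicated around an
exception leaves every later callchain off by one frame.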
This patch set has been tested on Juno-r2 after being applied on the
perf/core branch at the latest commit 85fc95d75970 ("perf maps: Add
missing unlock to maps__insert() error case"), on top of the
instruction sample fix patch set [1].
Test for option '-F,+callindent':
# perf script -F,+callindent
main 3258 1 branches: main ffffad684d20 __libc_start_main+0xe0 (/usr/lib/aarch64-linux-gnu/libc-2.28.so)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: _dl_fixup ffffad811b4c _dl_runtime_resolve+0x40 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: _dl_lookup_symbol_x ffffad80c078 _dl_fixup+0xb8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: do_lookup_x ffffad80849c _dl_lookup_symbol_x+0x104 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: check_match ffffad807bf0 do_lookup_x+0x238 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: strcmp ffffad807888 check_match+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
main 3258 1 branches: lib_loop_test@plt aaaae2c4d78c main+0x18 (/root/coresight_test/main)
[...]
Test for option '--itrace=g':
# perf script --itrace=g16l64i100
main 3258 100 instructions:
ffffad816a80 memcpy+0x70 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad809468 _dl_new_object+0xa8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffffad80952c _dl_new_object+0x16c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad801840 dl_main+0x778 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffffad8018dc dl_main+0x814 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
main 3258 100 instructions:
ffff8000100878d0 el0_sync_handler+0x168 ([kernel.kallsyms])
ffff800010082d00 el0_sync+0x140 ([kernel.kallsyms])
ffffad801910 dl_main+0x848 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad81384c _dl_sysdep_start+0x36c (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800884 _dl_start_final+0xac (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800b00 _dl_start+0x200 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
ffffad800048 _start+0x8 (/usr/lib/aarch64-linux-gnu/ld-2.28.so)
[...]
Changes from v4:
* Addressed Mike's suggestion to improve the performance of function
cs_etm__instr_addr() with a quick calculation for non-T32;
* Removed the patch 'perf cs-etm: Synchronize instruction sample with
the thread stack' (Mike);
* Fixed the issue where an exception is taken when accessing the branch
target address, for both the branch sample and the thread stack
handling; the related patches are 01, 02, 07;
* Fixed the thread stack handling for instruction emulation and single
step with patches 08, 09.
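For readers unfamiliar with the non-T32 shortcut mentioned in the first
item, the idea can be sketched as follows. This is a standalone
approximation, not the code in tools/perf/util/cs-etm.c: the names
toy_t32_instr_size() and toy_instr_addr() are hypothetical, and the
instruction bytes are passed in as a plain buffer rather than read
through the decoder's memory access callback. A32 and A64 instructions
are always 4 bytes, so the address is a single multiplication; T32
instructions are 2 or 4 bytes, so the code has to walk them one by one.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * A T32 instruction is 32 bits wide when bits [15:11] of its first
 * halfword are 0b11101, 0b11110 or 0b11111; otherwise it is 16 bits.
 */
static unsigned int toy_t32_instr_size(uint16_t first_halfword)
{
	return ((first_halfword & 0xF800) >= 0xE800) ? 4 : 2;
}

/*
 * Return the address of the instruction that is 'offset' instructions
 * after start_addr. 'code' points at the little-endian instruction
 * bytes located at start_addr and is only read for T32.
 */
static uint64_t toy_instr_addr(int is_t32, uint64_t start_addr,
			       const uint8_t *code, uint64_t offset)
{
	uint64_t addr = start_addr;

	/* Fixed 4-byte instructions (A32/A64): no walk needed. */
	if (!is_t32)
		return start_addr + offset * 4;

	while (offset--) {
		size_t pos = (size_t)(addr - start_addr);
		uint16_t hw = (uint16_t)(code[pos] | (code[pos + 1] << 8));

		addr += toy_t32_instr_size(hw);
	}
	return addr;
}
```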
Changes from v3:
* Split out separate patch set for instruction samples fixing.
* Rebased on latest perf/core branch.
Changes from v2:
* Added patch 01 to fix the unsigned variable comparison to zero
(Suzuki).
* Refined commit logs.
Changes from v1:
* Added comments for task thread handling (Mathieu).
* Split patch 02 into two patches, one to support thread stack and
another to support callchain (Mathieu).
* Added a new patch to support branch filter.
[1] https://lkml.org/lkml/2020/2/18/1406
Leo Yan (9):
perf cs-etm: Defer to assign exception sample flag
perf cs-etm: Reflect branch prior to exception
perf cs-etm: Refactor instruction size handling
perf cs-etm: Support thread stack
perf cs-etm: Support branch filter
perf cs-etm: Support callchain for instruction sample
perf cs-etm: Fixup exception entry for thread stack
perf thread: Add helper to get top return address
perf cs-etm: Fixup exception exit for thread stack
.../perf/util/cs-etm-decoder/cs-etm-decoder.c | 1 +
tools/perf/util/cs-etm.c | 290 ++++++++++++++++--
tools/perf/util/thread-stack.c | 10 +
tools/perf/util/thread-stack.h | 1 +
4 files changed, 268 insertions(+), 34 deletions(-)
--
2.17.1
Hi, this is an incomplete patch for an issue with EL2 kernels, and I'm looking
for feedback on how to complete it.
The background is that to support tracing multiple address spaces we get ETM to
embed the context id in the trace, and we build with CONFIG_PID_IN_CONTEXTIDR
to get the scheduler to put the thread id in CONTEXTIDR_EL1. This is a known
technique, it's what context id tracing is designed for.
The problem is when the kernel is running not at EL1 (OS level) but at EL2
(hypervisor level), which is now becoming common. With HCR_EL2.E2H set,
the kernel's writes to CONTEXTIDR_EL1 actually change a different physical
register, CONTEXTIDR_EL2. However, ETM still traces CONTEXTIDR_EL1.
So the context ids in the trace are zero, and trace cannot be reconstructed.
ETM 4.1 has an option VMIDOPT to cause CONTEXTIDR_EL2 to be output in trace,
in the VMID field replacing the value of VTTBR.VMID. So we can use that, but the
trace follower, collecting events from OpenCSD, needs to be aware it needs to
check the VMID field not the CID field. OpenCSD doesn't need to change but
perf does. TRCCONFIGR is already in the metadata, so perf consumers can check
it to see what's going on.
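To make the TRCCONFIGR check concrete, here is a small standalone
sketch of what a trace consumer could do with the value from the
metadata. The bit positions (6 for context ID tracing, 7 for VMID
tracing, 15 for VMIDOPT) are the ones used in the patch below; the
names toy_tid_source and TOY_TRCCONFIGR_* are hypothetical and exist
only for this example.

```c
#include <stdint.h>

/* TRCCONFIGR bits, as used in the patch below. */
#define TOY_TRCCONFIGR_CID	(UINT32_C(1) << 6)
#define TOY_TRCCONFIGR_VMID	(UINT32_C(1) << 7)
#define TOY_TRCCONFIGR_VMIDOPT	(UINT32_C(1) << 15)

/* Which PE_CONTEXT field, if any, carries the thread id. */
enum toy_tid_source { TOY_TID_NONE, TOY_TID_IN_CID, TOY_TID_IN_VMID };

static enum toy_tid_source toy_tid_source(uint32_t trcconfigr)
{
	const uint32_t vmid_bits = TOY_TRCCONFIGR_VMID | TOY_TRCCONFIGR_VMIDOPT;

	/* VMID tracing with VMIDOPT: CONTEXTIDR_EL2 is in the VMID field. */
	if ((trcconfigr & vmid_bits) == vmid_bits)
		return TOY_TID_IN_VMID;
	/* Plain context ID tracing: CONTEXTIDR_EL1 is in the CID field. */
	if (trcconfigr & TOY_TRCCONFIGR_CID)
		return TOY_TID_IN_CID;
	/* VMID without VMIDOPT carries VTTBR.VMID, which is not a thread id. */
	return TOY_TID_NONE;
}
```

This only works if the TRCCONFIGR in the metadata is the value the
kernel actually programmed.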
The patch below does the kernel and userspace side but is not complete.
The problem is that userspace perf creates the metadata copy of TRCCONFIGR
based on its request (and fills in the other id registers by reading sysfs),
but the detection of EL2/E2H happens in the kernel which adjusts TRCCONFIGR,
and it's this config which is needed for decode. I see three ways round this:
- have userspace test to see if the kernel is EL2 (somehow) and adjust the
metadata to mirror what the kernel is doing
- have the kernel pass the adjusted TRCCONFIGR back so perf can put it in the
metadata
- have the perf decoder get the thread id from whichever of VMID and
CONTEXTID is available in a PE_CONTEXT element
Obviously, the last is simplest, but it's a bodge, and means that OpenCSD
will see VMIDs when its TRCCONFIGR says it won't. It's kind of cleanest to get
the real TRCCONFIGR somehow, but how do we do that?
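For comparison, the last option (take the thread id from whichever
field is available) could look roughly like the standalone sketch
below. The struct toy_pe_context is a hypothetical stand-in that only
mirrors the four context fields the patch below reads from the OpenCSD
element; preferring the context ID when both fields are valid is an
assumption of this sketch, not something decided above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the context part of a PE_CONTEXT element. */
struct toy_pe_context {
	uint32_t context_id;
	uint32_t vmid;
	unsigned int ctxt_id_valid:1;
	unsigned int vmid_valid:1;
};

/*
 * Store the thread id in *tid and return true if either field is
 * valid; return false (leaving *tid untouched) otherwise.
 */
static bool toy_get_tid(const struct toy_pe_context *ctx, int32_t *tid)
{
	if (ctx->ctxt_id_valid) {
		*tid = (int32_t)ctx->context_id;
		return true;
	}
	if (ctx->vmid_valid) {
		*tid = (int32_t)ctx->vmid;
		return true;
	}
	return false;
}
```

Note that this never looks at TRCCONFIGR, which is what makes it both
the simplest option and a bodge.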
Al
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index a128b5063f46..96488a0cfdcf 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -353,8 +353,32 @@ static int etm4_parse_event_config(struct etmv4_drvdata *drvdata,
}
if (attr->config & BIT(ETM_OPT_CTXTID))
- /* bit[6], Context ID tracing bit */
- config->cfg |= BIT(ETM4_CFG_BIT_CTXTID);
+ {
+ /*
+ * Enable context-id tracing. The assumption is that this
+ * will work with CONFIG_PID_IN_CONTEXTIDR to trace process
+ * id changes and support decode of multiple processes.
+ * But ETM's context id trace traces physical CONTEXTIDR_EL1,
+ * while the logical CONTEXTIDR_EL1 that is written to on
+ * process switch is either physical CONTEXTIDR_EL1 or
+ * CONTEXTIDR_EL2 depending on HCR_EL2.E2H. On principle
+ * we should continue to use logical CONTEXTIDR_EL1.
+ * In order to trace physical CONTEXTIDR_EL2, we need to
+ * enable VMID tracing and use the VMIDOPT flag to trace
+ * CONTEXTIDR_EL2 rather than VTTBR.VMID in the VMID field.
+ * Trace decoders will need to inspect TRCCONFIGR and use
+ * either the CID or the VMID field from the trace packet.
+ */
+ if (!(is_kernel_in_hyp_mode() &&
+ (read_sysreg(hcr_el2) & BIT(34)) != 0)) {
+ /* bit[6], Context ID tracing bit */
+ config->cfg |= BIT(ETM4_CFG_BIT_CTXTID);
+ } else {
+ /* bits[7,15], trace CONTEXTIDR_EL2 in VMID field */
+ config->cfg |= (BIT(ETM4_CFG_BIT_VMID) |
+ BIT(ETM4_CFG_BIT_VMIDOPT));
+ }
+ }
/* return stack - enable if selected and supported */
if ((attr->config & BIT(ETM_OPT_RETSTK)) && drvdata->retstack)
diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h
index b0e35eec6499..c2f47b25daab 100644
--- a/include/linux/coresight-pmu.h
+++ b/include/linux/coresight-pmu.h
@@ -19,8 +19,10 @@
/* ETMv4 CONFIGR programming bits for the ETM OPTs */
#define ETM4_CFG_BIT_CYCACC 4
#define ETM4_CFG_BIT_CTXTID 6
+#define ETM4_CFG_BIT_VMID 7
#define ETM4_CFG_BIT_TS 11
#define ETM4_CFG_BIT_RETSTK 12
+#define ETM4_CFG_BIT_VMIDOPT 15
static inline int coresight_get_trace_id(int cpu)
{
diff --git a/tools/include/linux/coresight-pmu.h b/tools/include/linux/coresight-pmu.h
index b0e35eec6499..c2f47b25daab 100644
--- a/tools/include/linux/coresight-pmu.h
+++ b/tools/include/linux/coresight-pmu.h
@@ -19,8 +19,10 @@
/* ETMv4 CONFIGR programming bits for the ETM OPTs */
#define ETM4_CFG_BIT_CYCACC 4
#define ETM4_CFG_BIT_CTXTID 6
+#define ETM4_CFG_BIT_VMID 7
#define ETM4_CFG_BIT_TS 11
#define ETM4_CFG_BIT_RETSTK 12
+#define ETM4_CFG_BIT_VMIDOPT 15
static inline int coresight_get_trace_id(int cpu)
{
diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index cd92a99eb89d..a54cad778841 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -35,6 +35,7 @@ struct cs_etm_decoder {
dcd_tree_handle_t dcd_tree;
cs_etm_mem_cb_type mem_access;
ocsd_datapath_resp_t prev_return;
+ u32 thread_id_in_vmid:1;
};
static u32
@@ -496,17 +497,24 @@ cs_etm_decoder__buffer_exception_ret(struct cs_etm_packet_queue *queue,
static ocsd_datapath_resp_t
cs_etm_decoder__set_tid(struct cs_etm_queue *etmq,
+ struct cs_etm_decoder *decoder,
struct cs_etm_packet_queue *packet_queue,
const ocsd_generic_trace_elem *elem,
const uint8_t trace_chan_id)
{
pid_t tid;
- /* Ignore PE_CONTEXT packets that don't have a valid contextID */
- if (!elem->context.ctxt_id_valid)
- return OCSD_RESP_CONT;
+ if (!decoder->thread_id_in_vmid) {
+ /* Ignore PE_CONTEXT packets that don't have a valid contextID */
+ if (!elem->context.ctxt_id_valid)
+ return OCSD_RESP_CONT;
+ tid = elem->context.context_id;
+ } else {
+ if (!elem->context.vmid_valid)
+ return OCSD_RESP_CONT;
+ tid = elem->context.vmid;
+ }
- tid = elem->context.context_id;
if (cs_etm__etmq_set_tid(etmq, tid, trace_chan_id))
return OCSD_RESP_FATAL_SYS_ERR;
@@ -561,7 +569,7 @@ static ocsd_datapath_resp_t cs_etm_decoder__gen_trace_elem_printer(
trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_PE_CONTEXT:
- resp = cs_etm_decoder__set_tid(etmq, packet_queue,
+ resp = cs_etm_decoder__set_tid(etmq, decoder, packet_queue,
elem, trace_chan_id);
break;
case OCSD_GEN_TRC_ELEM_ADDR_NACC:
@@ -595,11 +603,15 @@ static int cs_etm_decoder__create_etm_packet_decoder(
OCSD_BUILTIN_DCD_ETMV3 :
OCSD_BUILTIN_DCD_PTM;
trace_config = &config_etmv3;
+ decoder->thread_id_in_vmid = 0;
break;
case CS_ETM_PROTO_ETMV4i:
cs_etm_decoder__gen_etmv4_config(t_params, &trace_config_etmv4);
decoder_name = OCSD_BUILTIN_DCD_ETMV4I;
trace_config = &trace_config_etmv4;
+ /* If VMID and VMIDOPT are set, thread id is in VMID not CID */
+ decoder->thread_id_in_vmid =
+ ((trace_config_etmv4.reg.configr & 0x8080) == 0x8080);
break;
default:
return -1;
Hi Poonam,
Please CC the coresight mailing list (as I did) when asking questions
- there are a lot of well-informed people on there that can also help
you.
On Thu, 23 Jan 2020 at 22:33, Poonam Aggrwal <poonam.aggrwal(a)nxp.com> wrote:
>
> Hello Mathieu
>
>
>
> Greetings!
>
>
>
> I have started to take a look at the Linux coresight framework, and get this enabled on a NXP ARMv8 device.
>
>
>
> Can you share some documentation on the configs required to be enabled and the device tree nodes?
For V8 we have two reference implementations - ARM Juno and the
dragonboard 410c. I highly recommend purchasing the latter (because
it is very cheap) in order to get an understanding of what a working
coresight system looks like. It is much easier to start from a working
example than nothing at all. Other than that the coresight bindings
[1] are full of good examples. I would also have a look at the DT for
Juno [2] and the dragonboard [3]. The HOWTO.md [4] on github is a
really good starting point when you get to test things out.
[1]. https://elixir.bootlin.com/linux/latest/source/Documentation/devicetree/bin…
[2]. https://elixir.bootlin.com/linux/latest/source/arch/arm64/boot/dts/arm/juno…
[3]. https://elixir.bootlin.com/linux/latest/source/arch/arm64/boot/dts/qcom/msm…
[4]. https://github.com/Linaro/OpenCSD/blob/master/HOWTO.md
>
> To start I am looking to enable the ARMv8 ETM tracing.
Before going further I advise you to look at the source and sink
configuration on your platform. Up to now we've been working with
configurations where sources share a single sink (N:1 topology).
Newer SoCs will have one source per sink (1:1 topology). At this time
only the former is supported by the framework. Supporting 1:1
topologies would require a fair amount of refactoring, something we
haven't had the opportunity to do for lack of a HW platform to work
with.
Regards,
Mathieu
>
> Is there a reference which I can check in Linux for device tree and config.
>
>
>
> Many Thanks
>
> Poonam