So far no platform in the mainline kernel has used a static funnel
(some in-house SoCs may have used one without upstreaming the support,
so we are not aware of them). When enabling the CoreSight DT binding
for Hikey960, we found that the SoC uses a static funnel in the link
path, which the CoreSight funnel driver does not support.
So the first patch updates the DT documentation to support the static
funnel (called a non-configurable funnel in the documentation); the
second patch adds static funnel support to the CoreSight funnel
driver.
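As a rough illustration of the distinction (not the code from this series), a driver can treat a funnel without a programmable register block as static and skip the hardware write on enable. All names below (`mock_funnel`, `mock_funnel_enable`, the `ctrl` field) are hypothetical stand-ins, and the single control word is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a funnel; a static funnel has no register block. */
struct mock_funnel {
    volatile uint32_t *ctrl;  /* NULL for a static (non-configurable) funnel */
    int refcnt;               /* number of active users */
};

/* Returns true when a register write was needed to enable the input port. */
static bool mock_funnel_enable(struct mock_funnel *f, unsigned int inport)
{
    bool programmed = false;

    if (f->ctrl) {
        /* Dynamic funnel: set the enable bit for this input port. */
        *f->ctrl |= (uint32_t)1 << inport;
        programmed = true;
    }
    /* Static funnel: nothing to program, only track the user. */
    f->refcnt++;
    return programmed;
}
```

With this shape the same enable path serves both kinds of funnel, and only the presence of a register block decides whether hardware is touched.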
Credit to Suzuki for sharing the CoreSight replicator refactoring code;
the static funnel implementation closely follows the approach taken in
Suzuki's replicator code.
This patch set has been rebased on CoreSight next branch [1] with
latest commit d5d246a56af0 ("coresight: Merge the static and dynamic
replicator drivers") and tested on Hikey960 with perf commands:
# perf record -e cs_etm/@20010000.etf/ --per-thread ./main
# perf report --tui
P.S. This version does not include the Hikey960 CoreSight DT binding;
it will be sent out separately.
[1] https://git.linaro.org/kernel/coresight.git/log/?h=next
Leo Yan (2):
dt-bindings: arm: coresight: Support static funnel
coresight: funnel: Support static funnel
.../devicetree/bindings/arm/coresight.txt | 45 ++++++-
.../hwtracing/coresight/coresight-funnel.c | 112 +++++++++++++-----
2 files changed, 127 insertions(+), 30 deletions(-)
--
2.17.1
Hello CoreSight team,
I'm trying to bring up TMC-ETR on Xilinx Zynq UltraScale+ and I ran into some trouble.
I hope you may have some ideas on where to look next.
Detailed CoreSight topology of Zynq US+ MPSoC may be found in ug1085-zynq-ultrascale-trm.pdf
(easy to google), but to make this discussion easier, I'll try to sketch it below:
[2x C-R5]      [4x C-A53]
    |               |
[2x ETMs]       [4x ETM]
    |               |
[Funnel0]       [Funnel1]         [STM]
    |               |               |
    |         [TMC-ETF 4kB]         |
    |               |               |
[--------------------ATB----------------]
                    |
                [Funnel2]
                    |
             [TMC-ETF 8 KB]
                    |
              [Replicator]
               |        |
           [TMC-ETR]  [TPIU]
I can happily use perf to trace the Cortex-A53 cores and get trace data from the uppermost ETF
(the 4kB one). However, I suspect I often get buffer overflows (thanks Mathieu
for this hypothesis) that overwrite my trace with new data during the session. To overcome
this I'd like to use either the second ETF or, preferably, the ETR with its significantly larger
buffer. The problem is that I'm not able to get any trace from the ETR.
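To illustrate why a small circular sink loses the start of a long session, here is a minimal software model of a wrap-around buffer. It assumes the ETF is operating as a circular buffer; `ring`, `ring_write`, `ring_lost` and `RING_SIZE` are made-up names for this sketch, not driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u  /* models the 4 kB ETF RAM */

struct ring {
    uint8_t data[RING_SIZE];
    size_t head;   /* next write position */
    size_t total;  /* bytes written since the session started */
};

/* Write len bytes, wrapping around and overwriting the oldest data. */
static void ring_write(struct ring *r, const uint8_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        r->data[r->head] = src[i];
        r->head = (r->head + 1) % RING_SIZE;
    }
    r->total += len;
}

/* Bytes of trace lost to overwriting so far. */
static size_t ring_lost(const struct ring *r)
{
    return r->total > RING_SIZE ? r->total - RING_SIZE : 0;
}
```

Once more than `RING_SIZE` bytes have been produced between two reads of the buffer, everything older than the last 4 kB is gone, which is why a larger sink is attractive.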
Observations:
1. It is possible to choose ETR as sink in perf - there is no error and the session starts.
2. There are no CoreSight related errors in dmesg.
3. By examining TMC-ETR memory mapped registers (busybox devmem 0x...) I can see that
indeed perf sees the device and configures it properly. I've added some prints around
struct etr_buf manipulations in TMC drivers and I can actually see that buffer address
and size saved into this structure are programmed into TMC, as the same values appear
in its registers.
I can also see that the enable bit is set high when tracing starts and low when perf returns.
4. There is never any useful data in AUXTRACE sections of perf.data. When tracing with
--per-thread I observe that the size of the section grows significantly the longer I trace:
' ... CoreSight ETM Trace data: size xxx bytes' with xxx exceeding kBytes.
However, all I get is:
0xd60 [0x8]: event: 68
.
. ... raw event: size 8 bytes
. 0000: 44 00 00 00 00 00 08 00 D.......
0xd60 [0x8]: PERF_RECORD_FINISHED_ROUND
With --all-cpus, I always get ' ... CoreSight ETM Trace data: size 16 bytes' no matter
how long the tracing session is.
Interestingly, the data part does not change - it's always the same 8 bytes each time I try
using ETR as sink, regardless of --per-thread or --all-cpus mode.
5. Each time I print etr_buf contents in tmc_etr_sync_flat_buf() or tmc_etr_sync_sg_buf(),
I can see that the buffer, no matter how big, gets only 16 bytes of data on each sync.
I wonder if this issue may point to SMMU issues. I can see in juno-base.dtsi in Linux mainline
that the ETR node (and only this one from the CS family) has iommus=< > property pointing to smmu_etr:
etr@20070000 {
        compatible = "arm,coresight-tmc", "arm,primecell";
        reg = <0 0x20070000 0 0x1000>;
        iommus = <&smmu_etr 0>;
        ...
I tried to mimic this behaviour on my platform by adding similar reference to the only SMMU node
defined in xilinx/zynqmp.dtsi. In my case it's iommus = <&smmu 0xc5>; since there is no dedicated SMMU
for ETR (and I don't see it in TRM) and 0xc5 is stream ID calculated from the CoreSight master ID
(TRM Chapter 16, Table 16-11). I can see in dmesg that SMMU is enabled and ETR is added to iommu
group 0, but this does not change the behaviour. I'd appreciate any suggestions on whether this direction
seems worth further debugging.
Another interesting observation is that I'm actually unable to access anything below the 4k ETF
in the topology I sketched. I can't use ETF2 nor STM via sysfs. I wonder if there is some ATB
configuration that may be worth checking as well?
I would appreciate any suggestions where to look next.
Thanks and best regards,
Wojciech
Hello,
I have a Jetson TX2 board which has dual-core NVIDIA Denver2 + quad-core
ARM Cortex-A57.
NVIDIA recently released a new SDK package (which they call JetPack, based
on Ubuntu and a 4.9 kernel). Out of the box it comes with CoreSight disabled,
so I had to recompile the kernel in order to enable it. CoreSight mostly
works and I'm getting some data out of it, but I'm trying to use perf
to do the tracing.
The issue I'm facing is that when I execute:
perf record -e cs_etm/@8030000.etf/u --per-thread uname
it works properly and I get great reports, but if I omit the
--per-thread parameter I get:
failed to mmap with 12 (Cannot allocate memory)
The mmap error also happens if I try to record an already running process
with the --pid parameter.
If I specify the number of the cpu:
perf record -e cs_etm/@8030000.etf/u --cpu 0 uname
then recording works, but when I run perf report it says:
0x228 [0x40]: failed to process type: 7
It may be worth mentioning that the processors on this system
are arranged in an unusual combination: CPUs 0, 3, 4 and 5 are ARM cores
with CoreSight, while CPUs 1 and 2 are Denver cores without CoreSight. I
suspect that this might be causing some of the issues.
Do you have any ideas on how to proceed to diagnose the issue?
Thank you.
This patchset adds support for CPU-wide trace scenarios and as such, it is
now possible to issue the following commands:
# perf record -e cs_etm/@20070000.etr/ -C 2,3 $COMMAND
# perf record -e cs_etm/@20070000.etr/ -a $COMMAND
The above will trace all instructions executed by a given processor for as
long as $COMMAND hasn't returned. The solution is designed to work for
both 1:1 and N:1 source/sink topologies, though the former hasn't been
tested for lack of access to HW.
Most of the changes revolve around allowing more than one event to use
a sink when operated from perf. More specifically, the first event to
use a sink switches it on, while the last one is tasked with aggregating
the traces and switching off the device.
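The first-on/last-off rule can be sketched with a simple reference count. This is only a model of the idea, not the code in the patches; `mock_sink`, `mock_sink_get` and `mock_sink_put` are hypothetical names, and locking is left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical sink shared by several perf events. */
struct mock_sink {
    int refcnt;    /* events currently using the sink */
    bool enabled;  /* models the hardware enable state */
};

/* Called when an event starts; only the first user switches the sink on. */
static void mock_sink_get(struct mock_sink *s)
{
    if (s->refcnt++ == 0)
        s->enabled = true;
}

/*
 * Called when an event stops. Returns true for the last user, which is
 * the one expected to collect the trace and switch the sink off.
 */
static bool mock_sink_put(struct mock_sink *s)
{
    if (s->refcnt == 0)
        return false;  /* unbalanced put, nothing to do */
    if (--s->refcnt > 0)
        return false;  /* other events still need the sink */
    s->enabled = false;
    return true;
}
```

A real driver would need to protect the counter against concurrent events, for example with a spinlock.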
This is the kernel part of the solution, with the user space portion to be
released in a separate set. All the patches have been rebased on
yesterday's linux-next and are hosted here [1]. Everything has been tested on
Juno. I have not CC'ed the kernel mailing list because of the ongoing
merge window.
Review and comments would be most appreciated.
Regards,
Mathieu
[1]. https://git.linaro.org/people/mathieu.poirier/coresight.git/log/?h=next-201…
Mathieu Poirier (20):
coresight: pmu: Adding ITRACE property to cs_etm PMU
coresight: etm4x: Add kernel configuration for CONTEXTID
coresight: etm4x: Configure tracers to emit timestamps
coresight: Adding return code to sink::disable() operation
coresight: Move reference counting inside sink drivers
coresight: Refactor sink::disable() functions
coresight: Refactor sink::update() functions
coresight: perf: Refactor function etm_setup_aux()
coresight: perf: Refactor function free_event_data()
coresight: Introduce the notion of process ID to the framework
coresight: tmc-etr: Refactor function tmc_etr_setup_perf_buf()
coresight: tmc-etr: Introduce the notion of process ID to ETR devices
coresight: tmc-etr: Allow events to use the same ETR buffer
coresight: tmc-etr: Add support for CPU-wide trace scenarios
coresight: tmc-etf: Add support for CPU-wide trace scenarios
coresight: etb10: Add support for CPU-wide trace scenarios
coresight: Refactor sink::alloc_buffer() functions
coresight: Add function coresight_sink_is_shared()
coresight: tmc-etr: Make ETR aware of topology
coresight: Use event->cpu to determine session type
drivers/hwtracing/coresight/coresight-etb10.c | 79 +++++-
.../hwtracing/coresight/coresight-etm-perf.c | 47 +++-
drivers/hwtracing/coresight/coresight-etm4x.c | 114 +++++++-
drivers/hwtracing/coresight/coresight-priv.h | 1 +
.../hwtracing/coresight/coresight-tmc-etf.c | 84 ++++--
.../hwtracing/coresight/coresight-tmc-etr.c | 265 +++++++++++++++---
drivers/hwtracing/coresight/coresight-tmc.c | 4 +
drivers/hwtracing/coresight/coresight-tmc.h | 11 +
drivers/hwtracing/coresight/coresight-tpiu.c | 9 +-
drivers/hwtracing/coresight/coresight.c | 53 +++-
include/linux/coresight-pmu.h | 2 +
include/linux/coresight.h | 8 +-
tools/include/linux/coresight-pmu.h | 2 +
13 files changed, 568 insertions(+), 111 deletions(-)
--
2.17.1
Starting with the v5.1 kernel cycle, compiling the perf tools (on and off
target) requires the addition of a new CORESIGHT=1 command line flag.
See the following commit for details:
1c3b28fd7ae8 ("perf coresight: Do not test for libopencsd by default")
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
---
HOWTO.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/HOWTO.md b/HOWTO.md
index 551b085c9c78..3b93b3d392aa 100644
--- a/HOWTO.md
+++ b/HOWTO.md
@@ -21,10 +21,10 @@ supplemented with modifications to the CoreSight framework and drivers to be
usable by the Perf core. The remaining out of tree patches are being
upstreamed incrementally.
-From there compiling the perf tools with `make -C tools/perf` will yield a
-`perf` executable that will support CoreSight trace collection. Note that if
-traces are to be decompressed *off* target, there is no need to download and
-compile the openCSD library (on the target).
+From there compiling the perf tools with `make -C tools/perf CORESIGHT=1` will
+yield a `perf` executable that will support CoreSight trace collection. Note
+that if traces are to be decompressed *off* target, there is no need to download
+and compile the openCSD library (on the target).
Before launching a trace run a sink that will collect trace data needs to be
identified. All CoreSight blocks identified by the framework are registed in
@@ -306,7 +306,7 @@ and needs to be installed on a system prior to compilation. Information about
the status of the openCSD library on a system is given at compile time by the
perf tools build script:
- linaro@t430:~/linaro/linux-kernel$ make VF=1 -C tools/perf
+ linaro@t430:~/linaro/linux-kernel$ make CORESIGHT=1 VF=1 -C tools/perf
Auto-detecting system features:
... dwarf: [ on ]
... dwarf_getlocations: [ on ]
--
2.17.1
Hi,
The OCSD_INSTR_WFI_WFE instruction sub-type is added to the library
headers from version 0.11.0 of openCSD to support later ETMv4
versions.
This requires an update to the perf code in cs-etm-decoder.c to
add this value to the handling code in the default part of the case
statement, e.g.:
case OCSD_INSTR_ISB:
case OCSD_INSTR_DSB_DMB:
+ case OCSD_INSTR_WFI_WFE:
case OCSD_INSTR_OTHER:
default:
The perf-opencsd master branch has not yet been updated to cover this.
The present perf decode does not use this value. If you do not
specifically need ETMv4.3 support for authenticated pointer trace, then
it is safe to use the latest v0.10.x decoder with perf-opencsd.
Otherwise you will need to patch locally until a patch is made
available in the repository, or until upstream perf supports the later
OpenCSD.
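For readers unfamiliar with the warning: GCC's -Wswitch-enum reports any enumerator that has no explicit case label, even when a default label is present, and -Werror turns that into a build failure. The sketch below reproduces the pattern with a stand-in enum; the `MOCK_` names and `mock_is_branch` are invented for illustration and are not the OpenCSD or perf definitions.

```c
#include <stdbool.h>

/* Stand-in enum; the real values live in the OpenCSD headers. */
enum mock_instr_type {
    MOCK_INSTR_OTHER,
    MOCK_INSTR_BR,
    MOCK_INSTR_BR_INDIRECT,
    MOCK_INSTR_ISB,
    MOCK_INSTR_DSB_DMB,
    MOCK_INSTR_WFI_WFE,  /* the newly added sub-type */
};

/* True if the instruction type is one of the branch types. */
static bool mock_is_branch(enum mock_instr_type t)
{
    switch (t) {
    case MOCK_INSTR_BR:
    case MOCK_INSTR_BR_INDIRECT:
        return true;
    case MOCK_INSTR_ISB:
    case MOCK_INSTR_DSB_DMB:
    case MOCK_INSTR_WFI_WFE:  /* removing this label trips -Wswitch-enum */
    case MOCK_INSTR_OTHER:
    default:
        return false;
    }
}
```

Listing the new enumerator next to the existing non-branch cases keeps the behaviour unchanged while satisfying the warning.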
Regards
Mike
On Tue, 19 Mar 2019 at 07:46, Solomon <notifications(a)github.com> wrote:
>
> After installing the OpenCSD library, I tried to compile perf from perf-opencsd. I used the command make VF=1 -C tools/perf. However, I got the following error:
>
> CC util/intel-pt-decoder/intel-pt-log.o
>
> CC util/cs-etm-decoder/cs-etm-decoder.o
>
> util/cs-etm-decoder/cs-etm-decoder.c: In function ‘cs_etm_decoder__buffer_range’:
>
> util/cs-etm-decoder/cs-etm-decoder.c:370:2: error: enumeration value ‘OCSD_INSTR_WFI_WFE’ not handled in switch [-Werror=switch-enum]
>
> switch (elem->last_i_type) {
>
> ^~~~~~
>
> CC util/intel-pt-decoder/intel-pt-decoder.o
>
> cc1: all warnings being treated as errors
>
> Has anyone had the same issue before?
>
--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK
Minor update to library released - build fixes.
Fixes issue with Debian build on Sparc. See README for details.
Regards
Mike
Hi Mathieu,
Apologies if mailman does not see this as a reply. I'm not sure if Outlook handles In-Reply-To properly.
I'm testing CPU-wide tracing on Zynq Ultrascale+ MPSoC and I have some comments I'd like to share.
Some background first:
- I'm using mainline Linux from a couple of days ago (12ad143e1b80 Merge branch 'perf-urgent-for-linus'...)
- on top of it I have a couple of my changes introducing CoreSight support on US+
- on top of this I cherry-picked your two patch sets with CPU-wide tracing
I prepared a test program that is supposed to generate a deterministic trace. I created a function that,
depending on its argument, produces either continuous E atoms or alternating E/N atoms. In main() I spawn
two threads with affinity attributes:
- the first thread is set up as the atom E generator, pinned to CPU1
- the other as the E/N generator, pinned to CPU2
The main thread is pinned to CPU0.
The body of the atom generator function is shown below. If *atom == 'n', the branch is not taken, so atom N should
be generated; if *atom == 'e', the branch is taken and atom E should be generated. After that, another atom
E is expected, since the while loop branches back to the start. This is counter-intuitive when you look at the C code,
but the if-condition is compiled to a b.ne instruction, so the body of the condition is entered when the branch
is not taken.
volatile int sum = 0;
while (1) {
        // Reference by pointer, so it's not optimized out.
        if (*atom == 'n') // compiler creates b.ne here
                sum += 0xdeadbeef * (*atom + 1);
}
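The atom stream expected from this loop, as described above, can be modelled with a small helper that is handy when comparing against decoder output. This is only a sketch of the expectation stated in this mail (one atom for the b.ne, one E for the loop's backward branch); `expected_atoms` is a hypothetical name, and the actual trace may contain additional atoms from surrounding code.

```c
#include <stddef.h>

/*
 * Fill out[] with the atoms expected for `iterations` passes of the loop
 * and NUL-terminate it. Returns the number of atoms written (two per
 * iteration), or 0 if the output buffer is too small.
 */
static size_t expected_atoms(char atom, size_t iterations, char *out, size_t cap)
{
    size_t n = 0;

    if (cap < 2 * iterations + 1)
        return 0;
    for (size_t i = 0; i < iterations; i++) {
        /* b.ne is not taken when *atom == 'n', taken otherwise. */
        out[n++] = (atom == 'n') ? 'N' : 'E';
        /* The backward branch of while (1) is always taken. */
        out[n++] = 'E';
    }
    out[n] = '\0';
    return n;
}
```

Under this model the thread pinned to CPU1 yields a run of E atoms and the thread pinned to CPU2 yields alternating N and E atoms.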
Here are my observations:
1. -C option works well. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -C1 ./atom_gen
In perf report I can see lots of E atoms in packets marked with ID:12. If I collect trace with -C2 instead,
I see E/N atoms in packets with ID:14. Everything works as expected each time I trace this application.
2. -a option works unreliably. I run perf with:
# perf record -e cs_etm/@fe940000.etf1/u -a ./atom_gen
What I expect is perf.data containing similar output to what I got with -C1 plus what I got with -C2, i.e. ID:12
Atom E packets and ID:14 atom E/N packets. What actually happens is inconsistent each time I try this command.
Sometimes I have no atom packets associated with IDs 12 and 14 but I have some with ID:16. Sometimes I get
ID:14 atoms but no ID:12. Sometimes I get expected trace but still some noise in ID:16 packets, which I would
not expect at all, since the program schedules nothing on CPU3. I wonder if I'm missing something here in my
understanding of CoreSight. Is this behaviour expected?
3. I'm not able to use filters.
I'd like to narrow down tracing to the while(1) loop in trace generator, to filter out noise from other instructions.
However, I find it impossible to use --filter flag along with -C or -a:
# perf record -e cs_etm/@fe940000.etf1/u --filter 'filter atom_n_generator @./atom_gen' -a ./atom_gen
failed to set filter "filter 0x90c/0x8c@/root/atom_gen" on event cs_etm/@fe940000.etf1/u with 95 (Operation not supported)
It works fine with --per-thread. Is the behaviour expected, or is this a bug?
4. The kernel crashes if perf is used without -a, -C or --per-thread.
If I call perf with:
# perf record -e cs_etm/@fe940000.etf1/u ./atom_gen
I can see some printfs from the program, but the kernel immediately hits a NULL pointer dereference.
Please find a log below. My serial connection drops characters sometimes, sorry for that.
The crash happens in tmc_enable_etf_sink+0x90, which is:
/* Get a handle on the pid of the process to monitor */
if (handle->event->owner)
        pid = task_pid_nr(handle->event->owner);
The handle->event->owner seems to be NULL.
[ 1313.650726] Unable to handle kernel NULL pointer dereference at virtual address 00000000000003b8
[ 1313.659501] Mem abort info:
[ 1313.662281] ESR = 0x96000006
[ 1313.665320] Exception class = DABT (current EL), IL = 32 bits
[ 1313.671232] SET = 0, FnV = 0
[ 1313.674277] EA = 0, S1PTW = 0
[ 1313.677401] Data abort info:
[ 1313.680266] ISV = 0, ISS = 0x00000006
[ 1313.684085] CM = 0, WnR = 0
[ 1313.687039] user pgtable: 4k pages, 39-bit VAs, pgdp = 000000003b61a770
[ 1313.693644] [00000000000003b8] pgd=000000006c6da003, pud=000000006c6da003, pmd=0000000000000000
[ 1313.702336] Internal error: Oops: 96000006 [#1] SMP
[ 1313.707201] Modules linked in:
[ 1313.710250] CPU: 1 PID: 3255 Comm: multithread-two Not tainted 5.0.0-10411-g66431e6376c4-dirty #26
[ 1313.719200] Hardware name: ZynqMP ZCU104 RevA (DT)
[ 1313.723981] pstate: 20000085 (nzCv daIf -PAN -UAO)
[ 1313.728770] pc : tmc_enable_etf_sink+0x90/0x3b0
[ 1313.733286] lr : tmc_enable_etf_sink+0x64/0x3b0
[ 1313.737806] sp : ffffff8011263b40
[ 1313.741104] x29: ffffff8011263b40 x28: 0000000000000000
[ 1313.6409] x27: 0000000000000000 x26: ffffffc06d4ce180
[ 1313.7512] x25: 0000000000000001 x24: ffffffc06faa4ce0
[ 1313.757015] x23: 0000000000000002 x22: 0000000000000080
[ 1313.7319] x21: ffffffc06faa4ce0 x20: ffffffc06cf07c00
[ 1313.7676] x19: ffffffc06d560e80 x18: 0000000000000000
[ 1313.772926] x17: 0000000000000000 x16: 0000000000000000
[ 1313.7729] x15: 0000000000000000 x14: ffffff8010879388
[ 1313.78353 x13: 0000000000000000 x12: 0000000002e8fc00
[ 1313.788836] x11: 0000000000000000 x10: 00000000000007f0
[ 1313.7940] x9 : 0000000000000000 x8 : 0000000000000000
[ 1313.799443] x7 : 0000000000000030 x6 : ffffffc06c279030
[ 1313.804747] x5 : 0000000000000030 x4 : 0000000000000002
[ 1313.8100] x3 : ffffffc06d560ee8 x2 : 0000000000000001
[ 1313.815354] x1 : 0000000000000000 x0 : 0000000000000000
[ 1313.820659] Process multithread-two (pid: 3255, stack limit = 0x00000073629f1e)
[ 1313.828133] Call trace:
[ 1313.830571] tmc_enable_etf_sink+0x90/0x3b0
[ 1313.834748] coresight_enable_path+0xe4/0x1f8
[ 1313.839096] etm_event_start+0x8c/0x120
[ 1313.842923] etm_event_add+0x38/0x58
[ 1313.846492] event_sched_in.isra.61.part.62+0x94/0x1b0
[ 1313.851620] group_sched_in+0xa0/0x1c8
[ 1313.855360] flexible_sched_in+0xac/0x1
[ 1313.859364] visit_groups_merge+0x144/0x1f8
[ 1313.86353 ctx_sched_in.isra.39+0x128/0x138
[ 1313.867887] perf_event_sched_in.isra.41+0x54/0x80
[ 1313.872669] __perf_event_task_sched_in+0x16c/0x180
[ 1313.877540] finish_task_switch+0x104/0x1d8
[ 1313.881715] schedule_tail+0xc/0x98
[ 1313.885195] ret_from_fork+0x4/0x18
[ 1313.888677] Code: 540016 f9001bb7 f94002a0 f9414400 (b943b817)
[ 1313.894761] ---[ end trace 99bb09dc83a83a1a ]---
Best regards,
Wojciech
Hi,
(+coresight mailing lists.)
I looked at this: -fpic is supposed to generate smaller code than -fPIC.
That said, I've tried both variants for x86_64 and aarch64 builds:
x86_64 showed no change, (gcc 5.4)
cross compiled aarch64 code was 0.45% smaller using -fpic rather than
-fPIC. (gcc 6.2)
native compiled aarch64 code showed no change (gcc 4.9)
While we could add some code to the makefile to dynamically change the
-fPIC/pic option when building on sparc architectures, unless there are
objections on the mailing list, I propose to change to -fPIC across the
board at this point.
This will be released as a 0.11.1 patch (along with another minor build
fix.)
Regards
Mike
On Wed, 13 Mar 2019 at 08:50, John Paul Adrian Glaubitz <
notifications(a)github.com> wrote:
> I have just tested this on sparc64 and can confirm that replacing -fpic
> with -fPIC fixes the issue for me.
>