Hi Kim
On 24 March 2017 at 00:31, Kim Phillips kim.phillips@arm.com wrote:
On Thu, 16 Mar 2017 22:36:08 +0000 Mike Leach mike.leach@linaro.org wrote:
This thread is now getting a bit unwieldy with all the inlined extensive logging.
the idea was that one could cut out all irrelevant pieces, leaving relevant ones intact for an in-line, context-sensitive response. Thus the 'no top posting/always trim your replies' recommendations for replying on development lists like these.
Also we are considerably off topic - no longer about perf inject, but more about simply getting the CS infrastructure working in your environment.
oh, you're referring to the title, which remains the ultimate goal, and I consider the getting-there experience still relevant (since, e.g., the AutoFDO documentation area doesn't explicitly point out what sink was enabled if any, and so on). Btw, an alternate means of splitting threads is to take a message and reply twice - once with one new title 'Subject: new theme A (was: <old title>)' and addressing that theme there, then another with a similar change in the title for theme B. I thought this thread was doing well since we were both carrying the unable to mmap and hard lockup problems along, and I had no proof they weren't related.
At this point it appears that Google Mail is not correctly showing all the previous mail, so I cannot see all the answers to your questions.
Google's webmail interface wouldn't be my first choice for an open source developer's mailer. IIRC, linaro had IMAP/SMTP instructions for hooking up your account to real mailers like thunderbird, mutt, etc. I use sylpheed, e.g., and have no problem continuing threads like these.
Nor mine, but as I am using my linaro.org mail, ARM IT will only allow web access from inside the ARM network to servers other than official ARM ones. Therefore the IMAP server for Linaro is blocked inside ARM.
I have attached my boot log from the serial terminal for my R2 run. I was using the 'screen' serial terminal monitor with logging enabled. The log is tricky to read in places, as the backspaces end up as special characters, and the repeated incrementing count message, which stayed stationary in the serial window, appears over and over in the log.
Indeed, I spent more than enough time trying to parse them and still failed. Normally, just cutting-n-pasting from a terminal window, where the backspace etc. characters have already been interpreted, works out ok.
It may be worth starting a new mail thread?
I see that was done for the kernel splat for running perf without enabling sinks. I'm going to just say I'd expect perf to run the same way every time, the same way 'ls' runs, no matter what underlying h/w is in use. AFAICT, Coresight is currently the only device that makes perf have this 'unnatural' behaviour. E.g., the fact that sinks have to be turned on outside of perf, only to have perf in turn, turn *off* those same sinks, I consider asymmetric behaviour and poor design in terms of a perf user's expected usage.
That was the initial design, but it has now been changed to allow the sink to be specified on the perf command line. We are in the process of updating the kernel docs to state this, matching the OpenCSD HOWTO docs.
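For reference, a sketch of what the newer sink-selection usage looks like on a kernel carrying those patches (the sink name here reuses the Juno ETF from elsewhere in this thread; check /sys/bus/coresight/devices for the names on your own system):

```shell
# Name the sink directly in the event string with the @ term,
# instead of echoing 1 into its enable_sink file beforehand:
perf record -e cs_etm/@20010000.etf/u --per-thread taskset -c 2 uname
```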
Also, can you confirm the OOM when enabling the sinks prior to perf invocation? If all else fails, can we work on fixing that problem?
I can double check this when I get into the office tomorrow, but I expect it will still be there, as I could reproduce it on R1 + gcc 4.9. This is worth investigating at some point, but surely secondary to you figuring out why you are hitting a hard crash.
OK, I've trimmed out all the nitty gritty details where we were trying to compare setups, mainly because I've reached a point where I can finally run an ETM trace without locking up the machine (without having to buy an SSD for an older operating system's rootfs :).
Recipe for success - at least for this juno R2 user - is:
- Use upstream Linus' ToT as source for kernel (not OpenCSD's
tree/branches). This commit (093b995) works for me, built with ubuntu's standard gcc 6.2 cross compiler:
root@juno:~# dmesg | grep gcc
[    0.000000] Linux version 4.11.0-rc3-00035-g093b995 (kim@dupont) (gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) ) #25 SMP PREEMPT Wed Mar 22 01:41:55 CDT 2017
- Rebase OpenCSD's perf-opencsd-4.11-rc1 branch (containing S.Pop's
patches) onto acme's upstream 'perf/urgent' branch found here:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
I used commit e7ede72a6d, which contains a fix for a use-after-free bug, and built perf natively:
root@juno:~# strings -a ./perf | grep "GCC:"
GCC: (Debian 6.3.0-6) 6.3.0 20170205
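The rebase step above can be sketched roughly as follows (the remote name 'acme' and the local branch layout are assumptions; adjust to your own checkout):

```shell
# Fetch acme's tree and rebase the OpenCSD perf branch onto perf/urgent:
git remote add acme git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
git fetch acme
git checkout perf-opencsd-4.11-rc1
git rebase e7ede72a6d   # the perf/urgent commit with the use-after-free fix
# Then build perf natively on the target:
make -C tools/perf
```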
Now I'm able to repeatedly issue these two commands:
echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
taskset -c 2 /home/kim/git/OpenCSD/tools/perf/perf record -o perf-etm.data -e cs_etm//u --per-thread taskset -c 2 uname
somewhat successfully, but most importantly without the hard lockup.
I say somewhat because a lot of the runs lose pretty much all the data?:
root@juno:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
root@juno:~# taskset -c 2 /home/kim/git/OpenCSD/tools/perf/perf record -o perf-etm.data -e cs_etm//u --per-thread taskset -c 2 ./sort-O3
[ 1142.454452] coresight-funnel 20040000.funnel: FUNNEL inport 0 enabled
[ 1142.460852] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
Bubble sorting array of 30000 elements
[ 1144.169454] coresight-funnel 220c0000.funnel: FUNNEL inport 1 disabled
<lots of funnel messages snipped>
[ 1153.166523] coresight-tmc 20010000.etf: TMC-ETB/ETF disabled
[ 1153.172365] coresight-funnel 20040000.funnel: FUNNEL inport 0 enabled
[ 1153.178760] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
10718 ms
[ 1153.195407] coresight-funnel 220c0000.funnel: FUNNEL inport 1 disabled
[ 1153.201875] coresight-funnel 20040000.funnel: FUNNEL inport 0 disabled
[ 1153.208353] coresight-tmc 20010000.etf: TMC-ETB/ETF disabled
[ perf record: Woken up 59 times to write data ]
Warning: AUX data lost 57 times out of 58!
[ perf record: Captured and wrote 3.579 MB perf-etm.data ]
root@juno:~#
Anyway, what's a good means of measuring the overhead incurred by perf -e cs-etm//? By using perf stat on perf itself?:
root@juno:~# /home/kim/git/OpenCSD/tools/perf/perf stat taskset -c 2 /home/kim/git/OpenCSD/tools/perf/perf record -o perf-etm.data -e cs_etm//u --per-thread taskset -c 2 ./sort-O3
[ 1592.081642] coresight-funnel 20040000.funnel: FUNNEL inport 0 enabled
[ 1592.088042] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
<lots of funnel messages snipped>
[ 1605.184854] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
13035 ms
[ 1605.254383] coresight-funnel 220c0000.funnel: FUNNEL inport 1 disabled
[ 1605.260850] coresight-funnel 20040000.funnel: FUNNEL inport 0 disabled
[ 1605.267328] coresight-tmc 20010000.etf: TMC-ETB/ETF disabled
[ perf record: Woken up 83 times to write data ]
Warning: AUX data lost 82 times out of 85!
[ perf record: Captured and wrote 5.164 MB perf-etm.data ]
Performance counter stats for 'taskset -c 2 /home/kim/git/OpenCSD/tools/perf/perf record -o perf-etm.data -e cs_etm//u --per-thread taskset -c 2 ./sort-O3':
       6568.653900      task-clock (msec)         #    0.496 CPUs utilized
               181      context-switches          #    0.028 K/sec
                 1      cpu-migrations            #    0.000 K/sec
             1,950      page-faults               #    0.297 K/sec
     <not counted>      cycles                         (0.00%)
     <not counted>      instructions                   (0.00%)
     <not counted>      branches                       (0.00%)
     <not counted>      branch-misses                  (0.00%)

      13.235000365 seconds time elapsed
root@juno:~# /home/kim/git/OpenCSD/tools/perf/perf stat taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5364 ms
Performance counter stats for 'taskset -c 2 ./sort-O3':
       5365.715560      task-clock (msec)         #    1.000 CPUs utilized
                 3      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               126      page-faults               #    0.023 K/sec
     <not counted>      cycles                         (0.00%)
     <not counted>      instructions                   (0.00%)
     <not counted>      branches                       (0.00%)
     <not counted>      branch-misses                  (0.00%)

       5.366555299 seconds time elapsed
Is this the expected overhead, given the (85 - 82) / 85 = 3.5% success rate of capturing and storing the trace data? I'd expect a lot more than 5MB of trace for the incurred 2.5x execution time (8 seconds in this case). I can't help but imagine those funnel printk's have a negative influence in the area...
The "loss" in this case is a flag indicating that the 64k ETF has wrapped. So there is some data loss, but you do get a full 64k buffer each time. You should get less loss if you use the ETR.
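To put rough numbers on that: if each of the 85 AUX records in the run above is taken as one drained 64k ETF buffer (an assumption for this back-of-the-envelope check), the expected perf.data size lines up with the ~5.2 MB actually captured, and only a small fraction of buffers were written without a wrap:

```shell
# 82 of 85 AUX records flagged as wrapped; each record still drains 64k.
awk 'BEGIN {
    records = 85; wrapped = 82; etf_kb = 64
    printf "%.2f MB expected in perf.data\n", records * etf_kb / 1024
    printf "%.1f%% of buffers written without a wrap\n", \
        100 * (records - wrapped) / records
}'
```

That first figure (5.31 MB) is close to the 5.164 MB perf reported, which supports the full-64k-per-record reading.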
Mike
Kim