Hi Kim
On 24 March 2017 at 00:31, Kim Phillips kim.phillips@arm.com wrote:
On Thu, 16 Mar 2017 22:36:08 +0000 Mike Leach mike.leach@linaro.org wrote:
This thread is now getting a bit unwieldy with all the inlined extensive logging.
the idea was that one could cut out all irrelevant pieces, leaving relevant ones intact for an in-line, context-sensitive response. Thus the 'no top posting/always trim your replies' recommendations for replying on development lists like these.
Also we are considerably off topic - no longer about perf inject, but more about simply getting the CS infrastructure working in your environment.
oh, you're referring to the title, which remains the ultimate goal, and I consider the getting-there experience still relevant (since, e.g., the AutoFDO documentation area doesn't explicitly point out what sink was enabled if any, and so on). Btw, an alternate means of splitting threads is to take a message and reply twice - once with one new title 'Subject: new theme A (was: <old title>)' and addressing that theme there, then another with a similar change in the title for theme B. I thought this thread was doing well since we were both carrying the unable to mmap and hard lockup problems along, and I had no proof they weren't related.
At this point it appears that Google Mail is not correctly showing all the previous mail, so I cannot see all the answers to your questions.
Google's webmail interface wouldn't be my first choice for an open source developer's mailer. IIRC, linaro had IMAP/SMTP instructions for hooking up your account to real mailers like thunderbird, mutt, etc. I use sylpheed, e.g., and have no problem continuing threads like these.
Nor mine, but as I am using my linaro.org mail, ARM IT will only allow web access from inside the ARM network to servers other than official ARM ones. Therefore the IMAP server for Linaro is blocked inside ARM.
I have attached my boot log from the serial terminal for my R2 run. I was using the 'screen' serial terminal monitor with logging enabled. The log is tricky to read in places, as the backspaces end up as special characters, and the repeated incrementing count message, which stayed stationary in the serial window, appears over and over in the log.
Indeed, I spent more than enough time trying to parse them and still failed. Normally, just cutting-n-pasting from a terminal window, where the backspace etc. characters have already been interpreted, works out ok.
It may be worth starting a new mail thread?
I see that was done for the kernel splat for running perf without enabling sinks. I'm going to just say I'd expect perf to run the same way every time, the same way 'ls' runs, no matter what underlying h/w is in use. AFAICT, Coresight is currently the only device that makes perf have this 'unnatural' behaviour. E.g., the fact that sinks have to be turned on outside of perf, only to have perf in turn, turn *off* those same sinks, I consider asymmetric behaviour and poor design in terms of a perf user's expected usage.
That was the initial design, but it has now been changed to allow the sink to be specified on the perf command line. We are in the process of updating the kernel docs to state this, matching the OpenCSD HOWTO docs.
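For reference, a sketch of what the newer sink-selection usage looks like on a kernel carrying those patches (the sink name here reuses the Juno ETF from elsewhere in this thread; check /sys/bus/coresight/devices for the names on your own system):

```shell
# Name the sink directly in the event string with the @ term,
# instead of echoing 1 into its enable_sink file beforehand:
perf record -e cs_etm/@20010000.etf/u --per-thread taskset -c 2 uname
```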
Also, can you confirm the OOM when enabling the sinks prior to perf invocation? If all else fails, can we work on fixing that problem?
I can double check this when I get into the office tomorrow, but I expect it will still be there, as I could reproduce it on R1 + gcc 4.9. This is worth investigating at some point, but surely secondary to you figuring out why you are hitting a hard crash.
OK, I've trimmed out all the nitty gritty details where we were trying to compare setups, mainly because I've reached a point where I can finally run an ETM trace without locking up the machine (without having to buy an SSD for an older operating system's rootfs :).
Recipe for success - at least for this juno R2 user - is:
- Use upstream Linus' ToT as source for kernel (not OpenCSD's
tree/branches). This commit (093b995) works for me, built with ubuntu's standard gcc 6.2 cross compiler:
root@juno:~# dmesg | grep gcc
[    0.000000] Linux version 4.11.0-rc3-00035-g093b995 (kim@dupont) (gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) ) #25 SMP PREEMPT Wed Mar 22 01:41:55 CDT 2017
- Rebase OpenCSD's perf-opencsd-4.11-rc1 branch (containing S.Pop's
patches) onto acme's upstream 'perf/urgent' branch found here:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
I used commit e7ede72a6d, which contains a fix for a use-after-free bug, and built perf natively:
root@juno:~# strings -a ./perf | grep "GCC:"
GCC: (Debian 6.3.0-6) 6.3.0 20170205
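The rebase step above can be sketched roughly as follows (the remote name 'acme' and the local branch layout are assumptions; adjust to your own checkout):

```shell
# Fetch acme's tree and rebase the OpenCSD perf branch onto perf/urgent:
git remote add acme git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git
git fetch acme
git checkout perf-opencsd-4.11-rc1
git rebase e7ede72a6d   # the perf/urgent commit with the use-after-free fix
# Then build perf natively on the target:
make -C tools/perf
```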
Now I'm able to repeatedly issue these two commands:
echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
taskset -c 2 /home/kim/git/OpenCSD/tools/perf/perf record -o perf-etm.data -e cs_etm//u --per-thread taskset -c 2 uname
somewhat successfully, but most importantly without the hard lockup.
I say somewhat because a lot of the runs lose pretty much all the data?:
root@juno:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
root@juno:~# taskset -c 2 /home/kim/git/OpenCSD/tools/perf/perf record -o perf-etm.data -e cs_etm//u --per-thread taskset -c 2 ./sort-O3
[ 1142.454452] coresight-funnel 20040000.funnel: FUNNEL inport 0 enabled
[ 1142.460852] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
Bubble sorting array of 30000 elements
[ 1144.169454] coresight-funnel 220c0000.funnel: FUNNEL inport 1 disabled
<lots of funnel messages snipped>
[ 1153.166523] coresight-tmc 20010000.etf: TMC-ETB/ETF disabled
[ 1153.172365] coresight-funnel 20040000.funnel: FUNNEL inport 0 enabled
[ 1153.178760] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
10718 ms
[ 1153.195407] coresight-funnel 220c0000.funnel: FUNNEL inport 1 disabled
[ 1153.201875] coresight-funnel 20040000.funnel: FUNNEL inport 0 disabled
[ 1153.208353] coresight-tmc 20010000.etf: TMC-ETB/ETF disabled
[ perf record: Woken up 59 times to write data ]
Warning: AUX data lost 57 times out of 58!
[ perf record: Captured and wrote 3.579 MB perf-etm.data ]
root@juno:~#
Anyway, what's a good means of measuring the overhead incurred by perf -e cs-etm//? By using perf stat on perf itself?:
root@juno:~# /home/kim/git/OpenCSD/tools/perf/perf stat taskset -c 2 /home/kim/git/OpenCSD/tools/perf/perf record -o perf-etm.data -e cs_etm//u --per-thread taskset -c 2 ./sort-O3
[ 1592.081642] coresight-funnel 20040000.funnel: FUNNEL inport 0 enabled
[ 1592.088042] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
<lots of funnel messages snipped>
[ 1605.184854] coresight-funnel 220c0000.funnel: FUNNEL inport 1 enabled
13035 ms
[ 1605.254383] coresight-funnel 220c0000.funnel: FUNNEL inport 1 disabled
[ 1605.260850] coresight-funnel 20040000.funnel: FUNNEL inport 0 disabled
[ 1605.267328] coresight-tmc 20010000.etf: TMC-ETB/ETF disabled
[ perf record: Woken up 83 times to write data ]
Warning: AUX data lost 82 times out of 85!
[ perf record: Captured and wrote 5.164 MB perf-etm.data ]
Performance counter stats for 'taskset -c 2 /home/kim/git/OpenCSD/tools/perf/perf record -o perf-etm.data -e cs_etm//u --per-thread taskset -c 2 ./sort-O3':
       6568.653900      task-clock (msec)         #    0.496 CPUs utilized
               181      context-switches          #    0.028 K/sec
                 1      cpu-migrations            #    0.000 K/sec
             1,950      page-faults               #    0.297 K/sec
     <not counted>      cycles                         (0.00%)
     <not counted>      instructions                   (0.00%)
     <not counted>      branches                       (0.00%)
     <not counted>      branch-misses                  (0.00%)

      13.235000365 seconds time elapsed
root@juno:~# /home/kim/git/OpenCSD/tools/perf/perf stat taskset -c 2 ./sort-O3
Bubble sorting array of 30000 elements
5364 ms
Performance counter stats for 'taskset -c 2 ./sort-O3':
       5365.715560      task-clock (msec)         #    1.000 CPUs utilized
                 3      context-switches          #    0.001 K/sec
                 0      cpu-migrations            #    0.000 K/sec
               126      page-faults               #    0.023 K/sec
     <not counted>      cycles                         (0.00%)
     <not counted>      instructions                   (0.00%)
     <not counted>      branches                       (0.00%)
     <not counted>      branch-misses                  (0.00%)

       5.366555299 seconds time elapsed
Is this the expected overhead, given the (85 - 82) / 85 = 3.5% success rate of capturing and storing the trace data? I'd expect a lot more than 5MB of trace for the incurred 2.5x execution time (8 seconds in this case). I can't help but imagine those funnel printk's have a negative influence in the area...
The "loss" in this case is a flag indicating that the 64k ETF has wrapped. So there is some data loss, but you do get a full 64k buffer each time. You should get less loss if you use the ETR.
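To put rough numbers on that: if each of the 85 AUX records in the run above is taken as one drained 64k ETF buffer (an assumption for this back-of-the-envelope check), the expected perf.data size lines up with the ~5.2 MB actually captured, and only a small fraction of buffers were written without a wrap:

```shell
# 82 of 85 AUX records flagged as wrapped; each record still drains 64k.
awk 'BEGIN {
    records = 85; wrapped = 82; etf_kb = 64
    printf "%.2f MB expected in perf.data\n", records * etf_kb / 1024
    printf "%.1f%% of buffers written without a wrap\n", \
        100 * (records - wrapped) / records
}'
```

That first figure (5.31 MB) is close to the 5.164 MB perf reported, which supports the full-64k-per-record reading.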
Mike
Kim