Dmitry,
The reason for the slowdown is that the default settings of perf sched record are pretty much tuned for x86, and a huge amount of data is being generated.
perf sched record is just a wrapper for perf record, so try using this script for recording:
#!/bin/sh
perf record \
	-a \
	-R \
	-f \
	-m 8192 \
	-c 1 \
	-e sched:sched_switch \
	-e sched:sched_process_exit \
	-e sched:sched_process_fork \
	-e sched:sched_wakeup \
	-e sched:sched_migrate_task
You can verify that it works by looking at the number of times perf got woken up; typically it is something like this:
root@omap4430-panda:~# time ./perf-sched-record.sh
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.031 MB perf.data (~1357 samples) ]

real	0m8.226s
user	0m0.016s
sys	0m0.641s
While running vanilla perf you get this:
root@omap4430-panda:~# time ./perf sched record
^C[ perf record: Woken up 4 times to write data ]
[ perf record: Captured and wrote 11.678 MB perf.data (~510240 samples) ]
Processed 120671 events and lost 1 chunks!

Check IO/CPU overload!

real	0m11.039s
user	0m0.141s
sys	0m1.266s
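If you want to compare runs without reading the summary by eye, something like the following might do. It is only a sketch: perf_wakeups is a made-up helper name, not part of perf, and it assumes the summary line keeps the "Woken up N times" wording shown in the two runs quoted here.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): read perf record's summary text on
# stdin and print the number from the "Woken up N times" line.
# Assumes the wording of that line matches the output quoted in this mail.
perf_wakeups() {
    sed -n 's/.*Woken up \([0-9][0-9]*\) times.*/\1/p'
}
```

As far as I know perf prints that summary on stderr, so one way to use it would be to save the summary first (for example ./perf-sched-record.sh 2> summary.txt, where summary.txt is just an example name) and then run perf_wakeups < summary.txt once the recording has been stopped.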
That works for spr-replay just fine; might work for you as well.
Regards
-- Pantelis
On Apr 4, 2012, at 1:13 PM, Dmitry Antipov wrote:
On 04/02/2012 02:18 PM, Pantelis Antoniou wrote:
Ah, about the load: it's because perf sched record adds too many events to the recording (and configures small buffers for perf). Using a smaller set of events works much better.
I tried different subsets of 'sched:*' events, but it didn't help much - shell interactivity drops to almost zero for everything beyond 'perf sched record sleep 10'.
One thing I did was to record to /tmp - you have enough memory for this to work.
Even with this, I can see periodic noise about lost samples. It looks like the perf subsystem is quite CPU-intensive even when the workload itself is just something like 'sleep 10'.
Dmitry