On 03/08/2012 05:20 PM, Pantelis Antoniou wrote:
The current issue is that scheduler development is not easily shared between developers. Each developer has their own 'itch', be it Android use cases, server workloads, VM, etc. There is a high risk of optimizing for one's own use case while causing severe degradation in most other use cases.
One way to fix this problem would be to develop a method with which one could run a given use-case workload on a host, record the activity in an interchangeable, portable trace format file, and then play it back on another host via a playback application that generates a load approximately similar to the one observed during recording.
Have you investigated whether the 'perf' tool, with its 'sched record' and 'sched replay' features, might be useful for such a purpose?
I tried to record and replay various commonly used benchmarks, including CPU-, I/O- and network-intensive workloads, and have to say that the recording and (especially) replaying overhead is quite high, at least for the default Panda board configuration (where main I/O is slow because the root file system is on an SD card). Simple things like 'perf sched record sleep 10' work in most cases (but may still lose up to 10-20% of the samples). But when I tried to add some I/O, for example with 'find /', the total load became too high and the system (almost) hung, with a lot of messages like:
INFO: task kjournald:512 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: rcu_preempt detected stalls on CPUs/tasks: 8055ec64 0 512 2 0x00000000
INFO: Stall ended before state dump start
INFO: task kjournald:512 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task flush-179:0:511 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
INFO: task kjournald:512 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
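For anyone who wants to repeat the experiment, here is a rough sketch of the record-then-replay sequence described above, wrapped in Python. The helper names (record_cmd, replay_cmd, record_and_replay) are made up for illustration; only the 'perf sched record <workload>' and 'perf sched replay' invocations come from the text above. The sketch assumes that perf writes to and reads from its default perf.data file in the current directory.

```python
import shutil
import subprocess


def record_cmd(workload):
    """Build the argv for 'perf sched record' wrapping a workload command."""
    return ["perf", "sched", "record"] + list(workload)


def replay_cmd():
    """Build the argv for 'perf sched replay' (reads the default perf.data)."""
    return ["perf", "sched", "replay"]


def record_and_replay(workload):
    """Record the workload, then replay the trace.

    Returns None when perf is not installed, otherwise the exit status
    of the replay step. Recording usually needs sufficient privileges.
    """
    if shutil.which("perf") is None:
        return None
    subprocess.run(record_cmd(workload), check=True)
    return subprocess.run(replay_cmd(), check=True).returncode
```

Calling record_and_replay(["sleep", "10"]) corresponds to the simple case mentioned above; passing ["find", "/"] corresponds to the I/O-heavy case that triggered the hung-task messages.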
Now I'm checking whether it's possible to do some partial recording (by skipping some kinds of unrelated samples) and so reduce the load on the kernel tracing subsystem, leaving more CPU time for the user-space tasks.
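One possible shape for such a partial recording, as a sketch only: call 'perf record' directly with an explicit, reduced list of scheduler tracepoints instead of letting 'perf sched record' pick the full set. The helper name partial_record_cmd and the choice of events are assumptions for illustration. The flags used (-a for system-wide, -R for raw samples, -c 1 to record every event, -e to select an event) are standard 'perf record' options, but whether 'perf sched replay' still produces a meaningful replay from a reduced event set has not been verified here.

```python
def partial_record_cmd(events, workload):
    """Build a 'perf record' argv limited to the given tracepoints.

    events:   tracepoint names such as "sched:sched_switch"
    workload: the command to run while recording, as a list of strings
    """
    cmd = ["perf", "record", "-a", "-R", "-c", "1"]
    for event in events:
        cmd += ["-e", event]
    # "--" separates perf's own options from the workload command.
    return cmd + ["--"] + list(workload)


# Hypothetical reduced event set: context switches and wakeups only.
REDUCED_EVENTS = ["sched:sched_switch", "sched:sched_wakeup"]
```

For example, partial_record_cmd(REDUCED_EVENTS, ["sleep", "10"]) yields a command that traces only context switches and wakeups for ten seconds, which should produce noticeably fewer samples than the full scheduler event set.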
Do you have any thoughts about this?
Thanks,
Dmitry