Hi there,
There's been considerable activity around the scheduler lately, and around how to adapt it to the peculiarities of the new class of hardware now coming out, such as the big.LITTLE class of devices from a number of manufacturers.
The platforms that Linux runs on are very diverse and run differing workloads. For example, most consumer devices will very likely run something like Android, with common use cases such as audio and/or video playback. Methods of achieving lower power consumption with a power-aware scheduler are under investigation.
Similarly, for server applications or VM hosting, the scheduler's behavior shouldn't have adverse performance implications; any power savings on top of that would be a welcome improvement.
The current issue is that scheduler development is not easily shared between developers. Each developer has their own 'itch', be it Android use cases, server workloads, VMs, etc. The risk is high that by optimizing for one's own use case, one causes severe degradation in most other use cases.
One way to fix this problem would be to develop a method for running a given use-case workload on one host, recording its activity in an interchangeable, portable trace file format, and then playing it back on another host via a playback application that generates a load approximating the one observed during recording.
The way the two hosts respond to the same load generated by the playback application can then be compared, so that the two scheduler implementations can be evaluated on various metrics (such as performance, power consumption, etc.).
The fidelity of this approximation is of great importance, but it is debatable whether a fully identical load can be generated at all, since the hosts might differ in ways that make this impossible. I believe it should at least be possible to simulate a purely CPU-bound load, and the blocking behavior of tasks, in a way that results in scheduler decisions that can be compared and shared among developers.
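To make the "purely CPU load plus blocking" idea a bit more concrete, here is a minimal sketch of how a playback task might replay recorded run/block phases. It assumes the trace has already been reduced to per-task phases; struct phase and the replay_phases(), burn_cpu() and block_for() helpers are hypothetical names used only for illustration. CPU time is measured with CLOCK_THREAD_CPUTIME_ID rather than wall time, so a task that gets preempted during playback still consumes the same amount of CPU it consumed when recorded.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical recorded phase: run for run_us microseconds of CPU time,
 * then block (sleep) for block_us microseconds. */
struct phase {
    uint64_t run_us;
    uint64_t block_us;
};

/* Read the given clock in microseconds. */
static uint64_t clock_us(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Spin until this thread has consumed 'us' microseconds of CPU time. */
static void burn_cpu(uint64_t us)
{
    uint64_t end = clock_us(CLOCK_THREAD_CPUTIME_ID) + us;
    volatile uint64_t sink = 0;
    while (clock_us(CLOCK_THREAD_CPUTIME_ID) < end)
        sink++;
}

/* Block for 'us' microseconds, resuming the sleep if a signal interrupts it. */
static void block_for(uint64_t us)
{
    struct timespec ts = {
        .tv_sec = (time_t)(us / 1000000u),
        .tv_nsec = (long)(us % 1000000u) * 1000L,
    };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Replay one task's recorded phases in order. */
void replay_phases(const struct phase *p, size_t n)
{
    assert(p != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        burn_cpu(p[i].run_us);
        block_for(p[i].block_us);
    }
}
```

Using CPU time rather than wall time is deliberate: on a host with a different scheduler, preemption changes how long each phase takes in wall time, which is what one wants to measure, while the amount of work stays fixed. It does not normalize for CPUs of different speed, which would matter on big.LITTLE parts.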
I believe the recording part can be handled by the kernel's tracing infrastructure, either by using the existing tracepoints or, if need be, by adding more; possibly even by creating a new tracer solely for this purpose. Since some applications adapt their behavior when system resources are insufficient (think of media players that drop frames, for example), I think it would be beneficial to record such events in the same trace file.
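As a rough illustration of the recording side using only what exists today, the sketch below turns on a handful of existing scheduler tracepoints through the ftrace control files, and writes an application-level annotation through trace_marker so that, e.g., a frame drop lands in the same trace as the scheduler events. The tracing directory is passed in because its location varies (typically /sys/kernel/debug/tracing with debugfs mounted, or /sys/kernel/tracing on kernels with tracefs). The helper names are hypothetical, and whether these events are sufficient for faithful playback is an open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write 'val' to the control file 'rel' under 'tracing_dir'.
 * Returns 0 on success, -1 on failure. */
static int write_tracing_file(const char *tracing_dir, const char *rel,
                              const char *val)
{
    char path[512];
    int n = snprintf(path, sizeof(path), "%s/%s", tracing_dir, rel);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs(val, f) >= 0;
    ok = (fclose(f) == 0) && ok;   /* fclose flushes; a failed flush is an error too */
    return ok ? 0 : -1;
}

/* Existing sched tracepoints covering forks, exits, wakeups and switches. */
static const char *const sched_events[] = {
    "events/sched/sched_process_fork/enable",
    "events/sched/sched_process_exit/enable",
    "events/sched/sched_wakeup/enable",
    "events/sched/sched_wakeup_new/enable",
    "events/sched/sched_switch/enable",
};

/* Enable all of the above; returns -1 as soon as one cannot be enabled. */
int enable_sched_events(const char *tracing_dir)
{
    assert(tracing_dir != NULL);
    for (size_t i = 0; i < sizeof(sched_events) / sizeof(sched_events[0]); i++)
        if (write_tracing_file(tracing_dir, sched_events[i], "1") != 0)
            return -1;
    return 0;
}

/* Record an application-level event (e.g. "frame_drop") in the same trace. */
int mark_app_event(const char *tracing_dir, const char *msg)
{
    assert(tracing_dir != NULL && msg != NULL);
    return write_tracing_file(tracing_dir, "trace_marker", msg);
}
```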
The trace file should have a portable format so that it can be freely shared between developers. An ASCII format like the one we currently use should be OK, as long as producing it doesn't perturb the workload being recorded too much.
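To show what a portable ASCII format could look like, here is a sketch with an invented one-event-per-line record, "<timestamp_us> <pid> <event> <arg>", and a parser for it. The format, the event names and parse_trace_line() are all assumptions for illustration; a real format would have to be derived from whatever the tracepoints actually emit.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical portable record, one event per line:
 *   <timestamp_us> <pid> <event> <arg>
 * e.g. "1500 42 run 2000": at t=1500us, pid 42 ran for 2000us of CPU. */
enum ev_type { EV_FORK, EV_RUN, EV_BLOCK, EV_EXIT, EV_BAD };

struct trace_rec {
    uint64_t ts_us;
    int pid;
    enum ev_type type;
    uint64_t arg;   /* fork: child pid; run/block: duration in us; exit: status */
};

/* Map an event name to its type; the array order matches enum ev_type. */
static enum ev_type ev_from_name(const char *s)
{
    static const char *const names[] = { "fork", "run", "block", "exit" };
    for (int i = 0; i < EV_BAD; i++)
        if (strcmp(s, names[i]) == 0)
            return (enum ev_type)i;
    return EV_BAD;
}

/* Parse one line; returns 0 on success, -1 on a malformed line. */
int parse_trace_line(const char *line, struct trace_rec *rec)
{
    char name[16];
    assert(line != NULL && rec != NULL);
    if (sscanf(line, "%" SCNu64 " %d %15s %" SCNu64,
               &rec->ts_us, &rec->pid, name, &rec->arg) != 4)
        return -1;
    rec->type = ev_from_name(name);
    return rec->type == EV_BAD ? -1 : 0;
}
```

A replay tool would read such a file line by line with fgets() and group the parsed records by pid into per-task phase lists. A line-oriented text format is easy to diff, grep and share, at the cost of size compared to a binary one.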
The playback application can be implemented in two ways.
One way, which is the LinSched way, would be to compile the full scheduler implementation as part of said application and use application-specific methods to evaluate performance. While this will work, it won't allow a meaningful comparison of the two hosts.
The other way, which serves both scheduler and platform evaluation, is for the playback application to generate the load on the running host by simulating the workload session recorded on the source host. That means emulating process activity such as forks, thread spawning, blocking on resources, etc. It is not yet clear to me whether that's possible without some kind of kernel-level helper module, but it would be desirable not to require one.
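The process-level part of user-space playback might look something like the sketch below: it forks one process per recorded task at the task's recorded start offset, has each child consume its recorded CPU time and then block, and reaps them all. It needs no helper module, but it is deliberately simplified: one run/block phase per task, processes rather than threads, and no emulation of blocking on shared resources such as locks or I/O. struct task_plan, MAX_TASKS and replay_tasks() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_TASKS 64

/* Hypothetical, simplified replay plan for one recorded task. */
struct task_plan {
    uint64_t start_us;   /* fork offset from the start of playback */
    uint64_t run_us;     /* CPU time to consume once started */
    uint64_t block_us;   /* time to block afterwards before exiting */
};

static uint64_t clock_us(clockid_t id)
{
    struct timespec ts;
    clock_gettime(id, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

static void sleep_us(uint64_t us)
{
    struct timespec ts = {
        .tv_sec = (time_t)(us / 1000000u),
        .tv_nsec = (long)(us % 1000000u) * 1000L,
    };
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        ;
}

/* Spin until the calling process has consumed 'us' microseconds of CPU. */
static void burn_cpu_us(uint64_t us)
{
    uint64_t end = clock_us(CLOCK_PROCESS_CPUTIME_ID) + us;
    volatile uint64_t sink = 0;
    while (clock_us(CLOCK_PROCESS_CPUTIME_ID) < end)
        sink++;
}

/* Fork one process per task at its recorded offset, then wait for all.
 * Assumes plans[] is sorted by start_us. Returns 0 if every child exited
 * cleanly, -1 otherwise. */
int replay_tasks(const struct task_plan *plans, size_t n)
{
    pid_t pids[MAX_TASKS];
    size_t started = 0;
    int rc = 0;

    assert(plans != NULL || n == 0);
    if (n > MAX_TASKS)
        return -1;

    uint64_t t0 = clock_us(CLOCK_MONOTONIC);
    for (size_t i = 0; i < n; i++) {
        uint64_t elapsed = clock_us(CLOCK_MONOTONIC) - t0;
        if (plans[i].start_us > elapsed)
            sleep_us(plans[i].start_us - elapsed);   /* wait for the recorded fork time */

        pid_t pid = fork();
        if (pid < 0) {
            rc = -1;
            break;
        }
        if (pid == 0) {                  /* child: emulate the recorded task */
            burn_cpu_us(plans[i].run_us);
            sleep_us(plans[i].block_us);
            _exit(0);
        }
        pids[started++] = pid;
    }

    for (size_t i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) < 0 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    }
    return rc;
}
```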
Since one would have the full trace of scheduling activity (past, present and future), it would be possible to generate a perfect schedule (as defined by best performance, or best power consumption) and use it as a yardstick against which to evaluate the actual scheduler. Comparing the results would give an estimate of the best-case improvement that could be achieved if the ideal scheduler existed.
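Computing a truly perfect schedule can be expensive (minimizing makespan for independent tasks on identical CPUs without preemption is already NP-hard), but a cheap bound can give a first yardstick. The sketch below computes the classic lower bound on makespan for independent CPU-bound tasks on identical CPUs: no schedule can finish before the total work divided by the number of CPUs, nor before the longest single task finishes. With preemption allowed, McNaughton's wrap-around rule attains this bound. It ignores blocking, dependencies, heterogeneous big.LITTLE cores and power, so it only approximates the yardstick described above; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lower bound (in us) on the makespan of n independent CPU-bound tasks on
 * ncpus identical CPUs: max(total work / ncpus, longest task). Integer
 * division rounds down, so the result never exceeds the exact bound. */
uint64_t makespan_lower_bound(const uint64_t *work_us, size_t n, unsigned ncpus)
{
    uint64_t total = 0, longest = 0;

    assert(ncpus > 0 && (work_us != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        total += work_us[i];
        if (work_us[i] > longest)
            longest = work_us[i];
    }
    uint64_t per_cpu = total / ncpus;
    return per_cpu > longest ? per_cpu : longest;
}

/* Upper estimate of the fraction of the observed makespan an ideal scheduler
 * could save; 0 if the observation already meets the bound. */
double max_possible_gain(uint64_t observed_us, uint64_t bound_us)
{
    if (observed_us == 0 || bound_us >= observed_us)
        return 0.0;
    return (double)(observed_us - bound_us) / (double)observed_us;
}
```

For example, tasks of 4000, 3000, 3000 and 2000 us on 2 CPUs give a bound of 6000 us; if playback observed a makespan of 8000 us, an ideal scheduler could save at most 25%.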
I know this is a bit long, but I hope it can serve as a basis for thinking about how to go about developing this.
Regards
-- Pantelis