Hi Alex,

Thank you very much for running the benchmark below on blocked_load+runnable_load :) Just a few queries.
How did you do the wake-up balancing? Did you iterate over the L3 package looking for an idle cpu, or did you just query the L2 package for an idle cpu?
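For reference, the kind of scan I have in mind looks roughly like the sketch below. This is only an illustration of scanning one package level for an idle cpu; find_idle_cpu_in() is a made-up helper, not your actual code or an existing kernel function:

	/*
	 * Rough sketch only: scan the cpus spanned by one sched_domain
	 * level (e.g. the L2 or L3 package) for an idle cpu.
	 */
	static int find_idle_cpu_in(struct sched_domain *sd, int target)
	{
		int cpu;

		for_each_cpu(cpu, sched_domain_span(sd)) {
			if (idle_cpu(cpu))
				return cpu;
		}
		return target;	/* no idle cpu found; keep the target */
	}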
I think that when you are using blocked_load+runnable_load it would be better to just query the L2 package, as Vincent had pointed out. The fundamental idea behind using blocked_load+runnable_load is to keep a steady state across cpus, unless we could reap the advantage of moving the blocked load to a sibling core when it wakes up.
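In other words, my reading of the change under test is something like the sketch below, where weighted_cpuload() reports the decayed load of blocked tasks on top of the currently runnable load (a sketch, not your actual patch):

	/*
	 * Sketch of the variant being benchmarked: count the decayed
	 * blocked load in addition to the runnable load when the load
	 * balancer asks for a cpu's weighted load.
	 */
	static unsigned long weighted_cpuload(const int cpu)
	{
		struct rq *rq = cpu_rq(cpu);

		return rq->cfs.runnable_load_avg + rq->cfs.blocked_load_avg;
	}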
And the drop in performance is relative to what?
1. Your v3 patchset with runnable_load_avg in weighted_cpuload().
2. Your v3 patchset with runnable_load_avg+blocked_load_avg in weighted_cpuload().
Are the above two what you are comparing? And in both of the above versions, have you included your "[PATCH] sched: use instant load weight in burst regular load balance"?
On 01/20/2013 09:22 PM, Alex Shi wrote:
>> The blocked load of a cluster will be high if the blocked tasks have
>> run recently. The contribution of a blocked task is divided by 2
>> every 32ms, so a high blocked load will be made up of recently
>> running tasks, and long-sleeping tasks will not influence the load
>> balancing. The load balance period is between 1 tick (10ms for idle
>> load balance on ARM) and up to 256ms (for busy load balance), so a
>> high blocked load should imply some tasks that have run recently;
>> otherwise your blocked load will be small and will not have a large
>> influence on your load balance.
>
> Just tried using cfs's runnable_load_avg + blocked_load_avg in
> weighted_cpuload() with my v3 patchset; aim9 shared workfile testing
> shows the performance dropped 70% on the NHM EP machine. :(
>
> Oops, the performance is still worse than just counting
> runnable_load_avg, but the drop is not as big as I said: it dropped
> 30%, not 70%.
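To put rough numbers on the decay Vincent describes: since a blocked task's contribution halves every 32ms, a task that has slept for the full 256ms busy-balance period contributes only 1/2^8 = 1/256 of its original load. An illustrative snippet of my own (it ignores the finer per-ms decay the kernel applies within each 32ms window):

	/*
	 * Illustration only, not kernel code: approximate how much of a
	 * blocked task's load contribution survives after `ms`
	 * milliseconds, given that the contribution halves every 32ms.
	 */
	static unsigned long decayed_contrib(unsigned long load,
					     unsigned int ms)
	{
		return load >> (ms / 32);	/* halve per full 32ms */
	}

	/* e.g. decayed_contrib(load, 256) == load / 256 */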
Thank you
Regards
Preeti U Murthy