On 12/12/2013 09:30 PM, Vincent Guittot wrote:
1. The current scheduler load balance works bottom-up: each CPU needs to initiate balancing by itself. Take an integrated computer system with smt/core/cpu/numa, i.e. 4 levels of scheduler domains. If there are just 2 tasks in the whole system, both running on cpu0, the current load balance needs to pull a task to another smt sibling in the smt domain, then pull a task to another core, then to another cpu, and finally to another numa node. In total, 4 task migrations are needed before the system is balanced.
I don't fully agree with your example above. Nothing prevents the scheduler from migrating a task directly to another cpu without going through the smt and core steps. Nevertheless, at a given level (the cpu level, for example) only one cpu in a group can pull tasks on behalf of the entire group, and the tasks are then spread within that group of cpus during load balance at the core and smt levels.
Yes, the task migration may start from a middle or top level. But given the bottom-up balance sequence, and the ever longer balance intervals of higher domains, this kind of step-by-step migration is possible.