On Mon, Feb 17, 2014 at 01:55:06AM +0000, Alex Shi wrote:
> The cpu_load decays over time according to the past cpu load of the rq. The sched_avg also decays tasks' load over time. So we now have 2 kinds of decay for cpu_load, which is redundant and adds to the system load through the decay calculation. This patch set tries to remove the cpu_load decay.
> There are 5 load_idx values used for cpu_load in sched_domain. busy_idx and idle_idx are usually non-zero, but newidle_idx, wake_idx and forkexec_idx are all zero on every arch. The first patch takes a shortcut to remove the cpu_load decay; it is just a one-line change.
> V2: 1, This version does some tuning on the load bias of the target load, to match the current code logic as closely as possible. 2, Goes further and removes cpu_load from the rq. 3, Reverts the patch 'Limit sd->*_idx range on sysctl' since it is no longer needed.
> Any testing/comments are appreciated.
Removing cpu_load completely certainly makes things simpler; my worry is just how much is lost by doing it. I agree that cpu_load needs a cleanup, but I can't convince myself that removing it completely, and no longer having any longer-term view of cpu load, is without negative side-effects.
{source, target}_load() are now instantaneous views of the cpu load, which means that they may change very frequently. That could potentially lead to more task migrations at all levels in the domain hierarchy as we no longer have the more conservative cpu_load[] indexes that were used at NUMA level.
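To make the concern concrete, here is a rough userspace sketch of how I understand the current per-index averaging and the min/max bias in {source, target}_load() to behave. It is an illustration under simplifying assumptions, not the kernel code: struct rq_sketch and its weighted_load field are hypothetical stand-ins for the rq and weighted_cpuload(), and the LB_BIAS feature check and the nohz missed-tick decay are left out.

```c
#include <assert.h>

#define CPU_LOAD_IDX_MAX 5

/* Hypothetical stand-in for the runqueue; only the fields needed here. */
struct rq_sketch {
	unsigned long cpu_load[CPU_LOAD_IDX_MAX]; /* [0] follows the last tick's load */
	unsigned long weighted_load;              /* instantaneous load (assumed) */
};

/*
 * One tick: fold this tick's load into each index. Index i keeps
 * (2^i - 1)/2^i of its old value, so higher indexes react more slowly.
 */
static void update_cpu_load(struct rq_sketch *rq, unsigned long this_load)
{
	unsigned long scale = 2;
	int i;

	rq->weighted_load = this_load;
	rq->cpu_load[0] = this_load;

	for (i = 1; i < CPU_LOAD_IDX_MAX; i++, scale += scale) {
		unsigned long old_load = rq->cpu_load[i];
		unsigned long new_load = this_load;

		/* Round up on a rising load so the average can reach it. */
		if (new_load > old_load)
			new_load += scale - 1;

		rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
	}
}

/* Conservative (low) estimate of the load on a migration source. */
static unsigned long source_load(const struct rq_sketch *rq, int type)
{
	unsigned long total = rq->weighted_load;

	assert(type >= 0 && type <= CPU_LOAD_IDX_MAX);
	if (type == 0)
		return total;
	return rq->cpu_load[type - 1] < total ? rq->cpu_load[type - 1] : total;
}

/* Conservative (high) estimate of the load on a migration target. */
static unsigned long target_load(const struct rq_sketch *rq, int type)
{
	unsigned long total = rq->weighted_load;

	assert(type >= 0 && type <= CPU_LOAD_IDX_MAX);
	if (type == 0)
		return total;
	return rq->cpu_load[type - 1] > total ? rq->cpu_load[type - 1] : total;
}
```

With type == 0 both helpers return the instantaneous load, which is what every caller gets once cpu_load[] is gone. With a non-zero type, a one-tick spike or dip is damped by the history in cpu_load[], so source_load() keeps under-estimating and target_load() keeps over-estimating for a while, and that bias is what makes migrations less eager at the higher domain levels.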
Maybe some of the NUMA experts have an opinion about this?
In the discussions around V1 I think blocked load came up again as a potential replacement for the current cpu_load array. There are some issues that need to be solved around blocked_load first though.
Morten