On 28 May 2014 14:10, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
On Fri, May 23, 2014 at 04:53:02PM +0100, Vincent Guittot wrote:
Monitor the activity level of each group of each sched_domain level. The activity is the amount of cpu_power that is currently used on a CPU or group of CPUs. We use the runnable_avg_sum and _period to evaluate this activity level. In the special use case where the CPU is fully loaded by more than 1 task, the activity level is set above the cpu_power in order to reflect the overload of the CPU.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
 kernel/sched/fair.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b7c51be..c01d8b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4044,6 +4044,11 @@ static unsigned long power_of(int cpu)
 	return cpu_rq(cpu)->cpu_power;
 }
 
+static unsigned long power_orig_of(int cpu)
+{
+	return cpu_rq(cpu)->cpu_power_orig;
+}
+
 static unsigned long cpu_avg_load_per_task(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
@@ -4438,6 +4443,18 @@ done:
 	return target;
 }
 
+static int get_cpu_activity(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+	u32 sum = rq->avg.runnable_avg_sum;
+	u32 period = rq->avg.runnable_avg_period;
+
+	if (sum >= period)
+		return power_orig_of(cpu) + rq->nr_running - 1;
+
+	return (sum * power_orig_of(cpu)) / period;
+}
+
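For readers who want to try the arithmetic of get_cpu_activity() outside the kernel, here is a minimal userspace sketch. It is not part of the patch: struct rq_model and cpu_activity_model() are hypothetical names, and the struct only mirrors the four values the quoted function reads. The return type is unsigned long and the multiplication is done in 64 bits, which are choices of this sketch rather than of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CPU values the quoted function reads. */
struct rq_model {
	unsigned long cpu_power_orig;	/* original capacity of the CPU, e.g. 1024 */
	unsigned int nr_running;	/* tasks currently on the runqueue */
	uint32_t runnable_avg_sum;	/* decayed time the CPU was runnable */
	uint32_t runnable_avg_period;	/* decayed total time observed */
};

/* Same arithmetic as get_cpu_activity() in the patch, applied to the model. */
static unsigned long cpu_activity_model(const struct rq_model *rq)
{
	uint32_t sum = rq->runnable_avg_sum;
	uint32_t period = rq->runnable_avg_period;

	/* Fully busy: report capacity plus one unit per extra runnable task. */
	if (sum >= period)
		return rq->cpu_power_orig + rq->nr_running - 1;

	/* sum < period here, so period cannot be zero. */
	assert(period > 0);

	/* Otherwise scale the capacity by the busy fraction. */
	return (unsigned long)(((uint64_t)sum * rq->cpu_power_orig) / period);
}
```

With cpu_power_orig = 1024, a CPU that has been busy half of the tracked time reports 512, and a saturated CPU with three runnable tasks reports 1026, i.e. slightly above its capacity.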
The rq runnable_avg_{sum, period} give a very long term view of the cpu utilization (I will use the term utilization instead of activity as I think that is what we are talking about here). IMHO, it is too slow to be used as a basis for load balancing decisions. I think that was also agreed upon in the last discussion related to this topic [1].
The basic problem is the worst case: with sum starting from 0 and period already at LOAD_AVG_MAX = 47742, it takes LOAD_AVG_MAX_N = 345 periods (ms) for sum to reach 47742. In other words, the cpu might have been fully utilized for 345 ms before it is considered fully utilized. Periodic load-balancing happens much more frequently than that.
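The ramp-up described above can be approximated with a small floating-point model. This is only a sketch under stated assumptions: it uses the load-tracking decay where a contribution halves after 32 periods of about 1 ms, it normalizes against the floating-point limit rather than the kernel's integer LOAD_AVG_MAX, and DECAY_Y, busy_fraction_after() and periods_to_reach() are names made up for the illustration.

```c
#include <assert.h>

/* Per-period decay factor, chosen so that DECAY_Y^32 is roughly 0.5. */
#define DECAY_Y 0.97857206

/*
 * Fraction of the saturated value that sum reaches after n fully busy
 * 1 ms periods, starting from sum = 0. Each period decays the old sum
 * by DECAY_Y and adds one full period's contribution (1024).
 */
static double busy_fraction_after(unsigned int n)
{
	const double saturated = 1024.0 / (1.0 - DECAY_Y);
	double sum = 0.0;
	unsigned int i;

	for (i = 0; i < n; i++)
		sum = sum * DECAY_Y + 1024.0;

	return sum / saturated;
}

/* Smallest number of fully busy periods needed to reach the target fraction. */
static unsigned int periods_to_reach(double target)
{
	unsigned int n = 0;

	/* A target of 1.0 is only approached asymptotically in this model. */
	assert(target >= 0.0 && target < 1.0);

	while (busy_fraction_after(n) < target)
		n++;

	return n;
}
```

In this model a cpu that becomes fully busy is seen as about 50% utilized after 32 ms and needs a little over 100 ms to be seen as 90% utilized, which gives a feel for how far the ratio lags behind reality well before the 345 ms worst case.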
I agree that it's not really responsive, but several statistics of the scheduler use the same kind of metrics and have the same kind of responsiveness. I agree that it's not enough, and that's why I'm not using only this metric, but it gives information that the unweighted load_avg_contrib (that you are speaking about below) can't give. So I would be less categorical than you and would say that we probably need additional metrics.
Also, if load-balancing actually moves tasks around it may take quite a while before runnable_avg_sum actually reflects this change. The next periodic load-balance is likely to happen before runnable_avg_sum has reflected the result of the previous periodic load-balance.
runnable_avg_sum uses a 1ms unit step, so I tend to disagree with your point above.
To avoid these problems, we need to base utilization on a metric which is updated instantaneously when we add/remove tasks to a cpu (or at least fast enough that we don't see the above problems). In the previous discussion [1] it was suggested to use a sum of unweighted task runnable_avg_{sum,period} ratios instead. That is, an unweighted equivalent to weighted_cpuload(). That isn't a perfect solution either.
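As a rough illustration of the unweighted sum mentioned above, the sketch below adds up per-task runnable ratios on a fixed scale. It is not kernel code: struct task_model, UTIL_SCALE, task_util() and cpu_unweighted_util() are hypothetical names, and the real scheduler tracks these values per sched_entity rather than in a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UTIL_SCALE 1024U	/* assumed fixed-point scale for a ratio of 1.0 */

/* Hypothetical per-task view of the two load-tracking values. */
struct task_model {
	uint32_t runnable_avg_sum;
	uint32_t runnable_avg_period;
};

/* Runnable ratio of one task, scaled to UTIL_SCALE and ignoring its nice weight. */
static unsigned long task_util(const struct task_model *t)
{
	if (t->runnable_avg_period == 0)
		return 0;	/* no history yet */

	return (unsigned long)(((uint64_t)t->runnable_avg_sum * UTIL_SCALE) /
			       t->runnable_avg_period);
}

/* Sum of the unweighted ratios of the nr tasks currently on one cpu. */
static unsigned long cpu_unweighted_util(const struct task_model *tasks, size_t nr)
{
	unsigned long util = 0;
	size_t i;

	assert(tasks != NULL || nr == 0);

	for (i = 0; i < nr; i++)
		util += task_util(&tasks[i]);

	return util;
}
```

The point of such a sum is that it changes as soon as a task is added to or removed from the cpu, because the task's whole contribution moves with it. The per-task ratios themselves still follow the same slow decay.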
Regarding the unweighted load_avg_contrib, you will have a similar issue because of the slow variation of each sched_entity load that is added to or removed from the unweighted load_avg_contrib.
The update of the runnable_avg_{sum,period} of a sched_entity is quite similar to that of the cpu utilization. This value is linked to the CPU on which the entity previously ran, because of the time sharing with other tasks, so the unweighted load of a freshly migrated task will reflect its load on the previous CPU (including the time sharing with other tasks on that CPU).
I'm not saying that such a metric is useless, but it's not perfect either.
Vincent
It is fine as long as the cpus are not fully utilized, but when they are we need to use weighted_cpuload() to preserve smp_nice. What to do around the tipping point needs more thought, but I think that is currently the best proposal for a solution for task and cpu utilization.
rq runnable_avg_sum is useful for decisions where we need a longer term view of the cpu utilization, but I don't see how we can use it as a cpu utilization metric for load-balancing decisions at wakeup or periodically.
Morten