On Mon, Apr 04, 2016 at 09:48:23AM +0100, Morten Rasmussen wrote:
On Sat, Apr 02, 2016 at 03:11:54PM +0800, Leo Yan wrote:
On Fri, Apr 01, 2016 at 03:28:49PM -0700, Steve Muckle wrote:
I think I follow - Leo please correct me if I mangle your intentions. It's an issue that Morten and Dietmar had mentioned to me as well.
Yes. We have been working on this issue for a while without getting to a nice solution yet.
Good to know. This patch is mainly for discussion purposes.
[...]
Leo, I noticed you did not modify detach_entity_load_average(). I think this would be needed to avoid the task's stats being double-counted for a while after switched_from_fair() or task_move_group_fair().
I'm afraid that the solution to the problem is more complicated than that :-(
You are adding/removing a contribution from the root cfs_rq.avg which isn't part of the signal in the first place. The root cfs_rq.avg only contains the sum of the load/util of the sched_entities on that cfs_rq. If you remove the contribution of the tasks from there you may end up double-accounting for the task migration: once immediately due to your patch, and then again slowly over time as the group sched_entity starts reflecting that the task has migrated. Furthermore, for group scheduling to make sense it has to be the task_h_load() you add/remove, otherwise the group weighting is completely lost. Or am I completely misreading your patch?
First, I want to confirm one thing: although CFS maintains the task group hierarchy, it updates each task group's cfs_rq.avg and the root cfs_rq.avg independently, rather than accounting for changes across the hierarchy.
So currently CFS decreases the group's cfs_rq.avg when a task migrates, but it does not walk up the task group hierarchy to the root cfs_rq.avg. I don't understand the second accounting you mentioned: "then again slowly over time as the group sched_entity starts reflecting that the task has migrated."
Another question: does cfs_rq.avg _ONLY_ signal historic behavior and not present behavior? That is, even after the task has been migrated, do we still need to decay its contribution slowly? Or is this different between load and util?
I don't think the slow response time for _load_ is necessarily a big problem. Otherwise we would have had people complaining already about group scheduling being broken. It is, however, a problem for all the initiatives that build on utilization.
Or maybe we need to separate utilization and load; these two signals have different semantics and purposes.
Thanks, Leo Yan