Hi Dietmar,
On 21 September 2016 at 12:14, Dietmar Eggemann dietmar.eggemann@arm.com wrote:
Hi Vincent,
On 12/09/16 08:47, Vincent Guittot wrote:
Fix the insertion of cfs_rq in rq->leaf_cfs_rq_list to ensure that a child will always be called before its parent.
The hierarchical order in the shares update list was introduced by commit 67e86250f8ea ("sched: Introduce hierarchal order on shares update list").
With the current implementation, a child can still be put after its parent.
Let's take the example of

        root
         \
          b
          /\
         c  d*
            |
            e*
with root -> b -> c already enqueued but not d -> e, so the leaf_cfs_rq_list looks like: head -> c -> b -> root -> tail
The branch d -> e will be added the first time that they are enqueued, starting with e then d.
When e is added, its parent is not yet on the list, so e is put at the tail: head -> c -> b -> root -> e -> tail
Then, d is added at the head because its parent is already on the list: head -> d -> c -> b -> root -> e -> tail
e is not placed at the right position and will be called last, whereas it should be called before its parent d.
Because it follows the bottom-up enqueue sequence, we are sure that we will finish by adding either a cfs_rq without a parent or a cfs_rq whose parent is already on the list. We can use this event to detect when we have finished adding a new branch. For the others, whose parents are not already added, we have to ensure that they will be added after their children that have just been inserted in the steps before, and before any potential parents that are already in the list. The easiest way is to put the cfs_rq just after the last inserted one and to keep track of it until the branch is fully added.
[...]
I tested it with a multi-level task group hierarchy (tg_css_id=2,4,6) and it does the right thing for this testcase (task t runs alternately on Cpu0 and Cpu1 in the tg with tg_css_id=6).
Thanks for testing
I wonder if this patch is related to the comment "TODO: fix up out-of-order children on enqueue." in update_shares_cpu() of commit 82958366cfea ("sched: Replace update_shares weight distribution with per-entity computation")?
I didn't have that comment in mind when I wrote this patch; my only goal was to ensure that the propagation of load and utilization in children is done before their parents are tested. That being said, I agree that the comment probably points to the same ordering issue as the one this patch solves.
I guess in the meantime we lost the functionality to remove a cfs_rq from the leaf_cfs_rq list once there are no se's enqueued on it anymore. If e.g. t migrates away from Cpu1, all the cfs_rq's of the task hierarchy (for tg_css_id=2,4,6) owned by Cpu1 stay in the leaf_cfs_rq list.
Shouldn't we reintegrate it? Following patch goes on top of this patch:
I see one problem: once a cfs_rq has been removed, it will not be periodically updated anymore until the next enqueue. A sleeping task is only attached, not enqueued, when it moves into a cfs_rq, so its load is added to the cfs_rq's load, which has to be periodically updated/decayed. So adding a cfs_rq to the list only during an enqueue is not enough in this case.
Then, only the deepest child level of the task group hierarchy will ever be removed, because cfs_rq->nr_running will be > 0 as soon as a child task group is created and enqueued into a task group. Perhaps you should use cfs.h_nr_running instead, but I don't know all the implications.
Regards, Vincent
-- >8 --
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
Date: Tue, 20 Sep 2016 17:30:09 +0100
Subject: [PATCH] Re-integrate list_del_leaf_cfs_rq() into update_blocked_averages()
There is no functionality anymore to delete a cfs_rq from the leaf_cfs_rq list in case there are no se's enqueued on it.
The functionality had been initially introduced by commit 82958366cfea ("sched: Replace update_shares weight distribution with per-entity computation") but has been taken out by commit 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking").
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
 kernel/sched/fair.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dd530359ef84..951c83337e2b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6584,8 +6584,12 @@ static void update_blocked_averages(int cpu)
 			update_tg_load_avg(cfs_rq, 0);
 
 		/* Propagate pending load changes to the parent */
-		if (cfs_rq->tg->se[cpu])
+		if (cfs_rq->tg->se[cpu]) {
 			update_load_avg(cfs_rq->tg->se[cpu], 0, 0);
+
+			if (!cfs_rq->nr_running)
+				list_del_leaf_cfs_rq(cfs_rq);
+		}
 	}
 	raw_spin_unlock_irqrestore(&rq->lock, flags);
 }
--
1.9.1