Hi Morten,
On Thu, Oct 27, 2016 at 04:49:54PM +0100, Morten Rasmussen wrote:
On Thu, Oct 27, 2016 at 01:28:51AM +0800, Leo Yan wrote:
@@ -6878,17 +6893,22 @@ static const unsigned int sched_nr_migrate_break = 32; */ static int detach_tasks(struct lb_env *env) {
struct list_head *tasks = &env->src_rq->cfs_tasks; struct task_struct *p; unsigned long load; int detached = 0;
struct sched_entity *se;
struct rb_node *node; lockdep_assert_held(&env->src_rq->lock); if (env->imbalance <= 0) return 0;
while (!list_empty(tasks)) {
node = rb_first(&env->src_rq->seq_node);
while (node) {
se = rb_entry(node, struct sched_entity, seq_node);
/* * We don't want to steal all, otherwise we may be treated likewise, * which could at worst lead to a livelock crash.
@@ -6896,7 +6916,7 @@ static int detach_tasks(struct lb_env *env) if (env->idle != CPU_NOT_IDLE && env->src_rq->nr_running <= 1) break;
p = list_first_entry(tasks, struct task_struct, se.group_node);
p = task_of(se); env->loop++; /* We've more or less seen every task there is, call it quits */
@@ -6944,9 +6964,10 @@ static int detach_tasks(struct lb_env *env) if (env->imbalance <= 0) break;
node = rb_first(&env->src_rq->seq_node); continue;
next:
list_move_tail(&p->se.group_node, tasks);
node = rb_next(&se->seq_node);
I think you subtly break existing detach_tasks() behaviour here. list_move_tail() is cleverly used to rotate the list as we work through it such that the starting point is not the same each time we try to balance.
The rationale is that we may bail out before we have iterated through the entire list (env->loop++ < env->loop_max) if we think we have spent too much time trying to pull tasks. If didn't pull any tasks we may start from the exact same position in the list and reach the same result for subsequent load-balancing attempts. To prevent that we continue iterating over the same env->loop_max tasks each time, we need to change the starting point. list_move_tail() does that by moving tasks that we couldn't migrate to the tail of the list.
For that reason you can't use your rb-tree as a drop-in replacement for the task list.
Thanks for reveiwing.
IMHO, for this reason I think it's easily to add a cache node to point to last one node which iterate in previous time before bail out. So next time we can start to iterate from the cache node but not the root node. Do you think this is feasible?
Thanks, Leo Yan