Hi Joel,
On 10/12/2017 02:47 PM, Joel Fernandes wrote:
On Thu, Oct 12, 2017 at 10:03 AM, Rohit Jain rohit.k.jain@oracle.com wrote:
Hi Joel, Atish,
Moving the offline discussion to LKML just so everyone's on the same page: I actually like this version now, and it is outperforming my previous code, so I am on board with it. It makes the code simpler too.
I think you should have explained what the version does differently. Nobody can read your mind.
I apologize for being terse (will do better next time)
This is based on your (offline) suggestion (and rightly so), that find_idlest_group today bases its decision on capacity_spare_wake which in turn only looks at the original capacity of the CPU. This diff (version) changes that to look at the current capacity after being scaled down (due to IRQ/RT/etc.).
Also, this diff changes find_idlest_group_cpu so that it no longer searches for CPUs based on the 'full_capacity()' function; instead it finds the idlest CPU with the maximum available capacity. This way we can avoid all the 'backup' logic present in the previous version (v5).
I think the code will look much simpler with the new search, as you can see below. The cost is acceptable because we are doing a full CPU search over the sched_group_span anyway.
[..]
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 56f343b..a1f622c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5724,7 +5724,7 @@ static int cpu_util_wake(int cpu, struct task_struct *p);

 static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
 {
-	return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
+	return capacity_of(cpu) - cpu_util_wake(cpu, p);
 }

 /*
@@ -5870,6 +5870,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 	unsigned long load, min_load = ULONG_MAX;
 	unsigned int min_exit_latency = UINT_MAX;
 	u64 latest_idle_timestamp = 0;
+	unsigned int idle_cpu_cap = 0;
 	int least_loaded_cpu = this_cpu;
 	int shallowest_idle_cpu = -1;
 	int i;
@@ -5881,6 +5882,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_span(group), &p->cpus_allowed) {
 		if (idle_cpu(i)) {
+			int idle_candidate = -1;
 			struct rq *rq = cpu_rq(i);
 			struct cpuidle_state *idle = idle_get_state(rq);
 			if (idle && idle->exit_latency < min_exit_latency) {
@@ -5891,7 +5893,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				 */
 				min_exit_latency = idle->exit_latency;
 				latest_idle_timestamp = rq->idle_stamp;
-				shallowest_idle_cpu = i;
+				idle_candidate = i;
 			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
 				   rq->idle_stamp > latest_idle_timestamp) {
 				/*
@@ -5900,8 +5902,14 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this
 				 * a warmer cache.
 				 */
 				latest_idle_timestamp = rq->idle_stamp;
-				shallowest_idle_cpu = i;
+				idle_candidate = i;
 			}
+
+			if (idle_candidate != -1 &&
+			    (capacity_of(idle_candidate) > idle_cpu_cap)) {
+				shallowest_idle_cpu = idle_candidate;
+				idle_cpu_cap = capacity_of(idle_candidate);
+			}
This is broken: in the case where idle_candidate != -1 but the idle_cpu_cap comparison makes the condition false, you have still updated min_exit_latency, which is wrong.
Yes, you're right. I will fix this.
Also, this means that if you have 2 CPUs and one is in a shallower idle state than the other but has less capacity, then it would select the CPU in the deeper idle state, right? So 'shallowest_idle_cpu' loses its meaning.
OK, I will change the name.
Thanks, Rohit
[..]