In find_energy_efficient_cpu(), if we set a task's min util clamp (UCLAMP_MIN) to 1024, all CPUs will be skipped because of the !fits_capacity() condition, and the return value will always be prev_cpu (the initial value of best_energy_cpu).
For this case, is it better to find the max spare capacity CPU in the whole system? For example:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da3e5b54715b..7e431195753e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6536,6 +6536,8 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
 	struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
 	unsigned long cpu_cap, util, base_energy = 0;
+	unsigned long sys_max_spare_cap = 0;
+	int sys_max_spare_cap_cpu = prev_cpu;
 	int cpu, best_energy_cpu = prev_cpu;
 	struct sched_domain *sd;
 	struct perf_domain *pd;
@@ -6576,6 +6578,14 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			cpu_cap = capacity_of(cpu);
 			spare_cap = cpu_cap - util;

+			/*
+			 * Find the CPU with the maximum spare capacity in
+			 * the performance domain
+			 */
+			if (spare_cap > sys_max_spare_cap) {
+				sys_max_spare_cap = spare_cap;
+				sys_max_spare_cap_cpu = cpu;
+			}
 			/*
 			 * Skip CPUs that cannot satisfy the capacity request.
 			 * IOW, placing the task there would make the CPU
@@ -6622,7 +6632,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	 * least 6% of the energy used by prev_cpu.
 	 */
 	if (prev_delta == ULONG_MAX)
-		return best_energy_cpu;
+		return sys_max_spare_cap_cpu;

 	if ((prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
 		return best_energy_cpu;
Best Regards, Yun
Hi Yun,
On 29/07/2020 05:17, 向澐 wrote:
In find_energy_efficient_cpu(), if we set a task's min util clamp (UCLAMP_MIN) to 1024, all CPUs will be skipped because of the !fits_capacity() condition, and the return value will always be prev_cpu (the initial value of best_energy_cpu).
For this case, is it better to find the max spare capacity CPU in the whole system? For example:
you're highlighting an existing issue in the current code in Linux mainline as well as in Android Common Kernel.
If the task's UCLAMP_MIN value is larger than 0.8 * 1024 (~819), there won't be any CPU in the system which fits the capacity request. But this shouldn't be a showstopper right now, since you can always configure your task's UCLAMP_MIN in the range [0 .. 819]. By default a task's UCLAMP_MIN value is 0.
IMHO, since no CPU in the system can handle such a UCLAMP_MIN value, we should think about disabling the possibility to set those UCLAMP_MIN values in the first place.
But there is more to it than just this static 819 boundary. Since we use cpu_cap = capacity_of(cpu) (and not capacity_orig_of(cpu)), cpu_cap might be < 1024 even on a big CPU.
So even a task with a UCLAMP_MIN value smaller than 819 might suffer from the fact that find_energy_efficient_cpu() returns prev_cpu when there is capacity pressure on all the big CPUs (due to RT/DL/IRQ/thermal).
There are several ideas on the table to solve (or at least mitigate) this issue:
(1) Reduce the task's UCLAMP_MIN space to [0 .. 819] (mentioned above), or remap [0 .. 1023] to [0 .. 819] internally.
(2) System-wide spare capacity CPU (your proposal).
The issue w/ this is that you would return the system-wide max_spare_capacity CPU although CPUs sharing the Perf Domain with prev_cpu could serve as a best_energy_cpu. This can happen in case only prev_cpu experiences (temporary) capacity pressure.
(3) Route a task which cannot fit on prev_cpu due to its UCLAMP_MIN value through select_idle_sibling() -> select_idle_capacity(). I.e. add an appropriate check, next to the rd->overutilized and sd checks at the beginning of find_energy_efficient_cpu(), which jumps to the fail label.
Similar issue as in (2) here.
(4) Deal with the fact that we return prev_cpu in this case (current behavior). Doesn't seem too bad in combination with (1).
I guess each solution can only work in combination with (1).
On 08/14/20 19:44, Dietmar Eggemann wrote:
Hi Yun,
On 29/07/2020 05:17, 向澐 wrote:
In find_energy_efficient_cpu(), if we set a task's min util clamp (UCLAMP_MIN) to 1024, all CPUs will be skipped because of the !fits_capacity() condition, and the return value will always be prev_cpu (the initial value of best_energy_cpu).
For this case, is it better to find the max spare capacity CPU in the whole system? For example:
you're highlighting an existing issue in the current code in Linux mainline as well as in the Android Common Kernel.
If the task's UCLAMP_MIN value is larger than 0.8 * 1024 (~819), there won't be any CPU in the system which fits the capacity request. But this shouldn't be a showstopper right now, since you can always configure your task's UCLAMP_MIN in the range [0 .. 819]. By default a task's UCLAMP_MIN value is 0.
IMHO, since no CPU in the system can handle such a UCLAMP_MIN value, we should think about disabling the possibility to set those UCLAMP_MIN values in the first place.
Keep in mind that uclamp is used in schedutil for frequency selection. By restricting the range you restrict the hint for schedutil too.
I think this implementation detail of how find_energy_efficient_cpu() works should be handled by this function without leaking its implementation details to user space or other users.
But there is more to it than just this static 819 boundary. Since we use cpu_cap = capacity_of(cpu) (and not capacity_orig_of(cpu)), cpu_cap might be < 1024 even on a big CPU.
So even a task with a UCLAMP_MIN value smaller than 819 might suffer from the fact that find_energy_efficient_cpu() returns prev_cpu when there is capacity pressure on all the big CPUs (due to RT/DL/IRQ/thermal).
There are several ideas on the table to solve (or at least mitigate) this issue:
(1) Reduce tasks UCLAMP_MIN space to [0 .. 819] (mentioned above) or remap [0 ... 1023] to [0 .. 819] internally.
Reducing uclamp_min in userspace is a hack, which is okay as a temporary workaround. This mapping issue is an internal implementation detail that must be fixed internally IMO.
I haven't given it much thought (nor testing!), but wouldn't something along the lines of the below be enough?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2ba8f230feb9..3e9bb8a51315 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6555,7 +6555,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 {
 	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
 	struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
-	unsigned long cpu_cap, util, base_energy = 0;
+	unsigned long cpu_cap, uclamp_util, util, base_energy = 0;
 	int cpu, best_energy_cpu = prev_cpu;
 	struct sched_domain *sd;
 	struct perf_domain *pd;
@@ -6603,7 +6603,10 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			 * much capacity we can get out of the CPU; this is
 			 * aligned with schedutil_cpu_util().
 			 */
-			util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
+			uclamp_util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
+			if (uclamp_util > util)
+				util = min(uclamp_util, 818);
+
 			if (!fits_capacity(util, cpu_cap))
 				continue;
(2) System-wide spare capacity CPU (your proposal).
The issue w/ this is that you would return the system-wide max_spare_capacity CPU although CPUs sharing the Perf Domain with prev_cpu could serve as a best_energy_cpu. This can happen in case only prev_cpu experiences (temporary) capacity pressure.
(3) Route a task which cannot fit on prev_cpu due to its UCLAMP_MIN value through select_idle_sibling() -> select_idle_capacity(). I.e. add an appropriate check, next to the rd->overutilized and sd checks at the beginning of find_energy_efficient_cpu(), which jumps to the fail label.
Similar issue as in (2) here.
boosting != overutilized IMO.
If the task asked for more boosting than can be satisfied, then best effort is a better fallback. 1024 for me translates into 'the maximum available performance point'. If the system is under pressure, we still want to try to return the maximum performance point under pressure, even if it is not the absolute maximum one can get without pressure.
If the actual utilization of the task is high, then over utilization logic will trigger anyway.
(4) Deal with the fact that we return prev_cpu in this case (current behavior). Doesn't seem too bad in combination with (1).
Fixing (1) internally fixes the immediate bug, agreed. I think (2) is a good option, but that is an optimization rather than a bug fix, and needs more info to see how much it's worth.
Yun, can you share more info about your use case?
Thanks
-- Qais Yousef
On Mon, Aug 17, 2020 at 02:39:05PM +0100, Qais Yousef wrote:
On 08/14/20 19:44, Dietmar Eggemann wrote:
Hi Yun,
On 29/07/2020 05:17, 向澐 wrote:
In find_energy_efficient_cpu(), if we set a task's min util clamp (UCLAMP_MIN) to 1024, all CPUs will be skipped because of the !fits_capacity() condition, and the return value will always be prev_cpu (the initial value of best_energy_cpu).
For this case, is it better to find the max spare capacity CPU in the whole system? For example:
you're highlighting an existing issue in the current code in Linux mainline as well as in the Android Common Kernel.
If the task's UCLAMP_MIN value is larger than 0.8 * 1024 (~819), there won't be any CPU in the system which fits the capacity request. But this shouldn't be a showstopper right now, since you can always configure your task's UCLAMP_MIN in the range [0 .. 819]. By default a task's UCLAMP_MIN value is 0.
IMHO, since no CPU in the system can handle such a UCLAMP_MIN value, we should think about disabling the possibility to set those UCLAMP_MIN values in the first place.
Keep in mind that uclamp is used in schedutil for frequency selection. By restricting the range you restrict the hint for schedutil too.
I think this implementation detail of how find_energy_efficient_cpu() works should be handled by this function without leaking its implementation details to user space or other users.
But there is more to it than just this static 819 boundary. Since we use cpu_cap = capacity_of(cpu) (and not capacity_orig_of(cpu)), cpu_cap might be < 1024 even on a big CPU.
So even a task with a UCLAMP_MIN value smaller than 819 might suffer from the fact that find_energy_efficient_cpu() returns prev_cpu when there is capacity pressure on all the big CPUs (due to RT/DL/IRQ/thermal).
There are several ideas on the table to solve (or at least mitigate) this issue:
(1) Reduce the task's UCLAMP_MIN space to [0 .. 819] (mentioned above), or remap [0 .. 1023] to [0 .. 819] internally.
Reducing uclamp_min in userspace is a hack, which is okay as a temporary workaround. This mapping issue is an internal implementation detail that must be fixed internally IMO.
IMO, uclamp is simply clamping the task util, so this remap hack seems a little weird to me. And as Dietmar mentioned, if there is other capacity pressure (RT/DL/IRQ/thermal), we will still hit the same problem.
(2) System-wide spare capacity CPU (your proposal).
The issue w/ this is that you would return the system-wide max_spare_capacity CPU although CPUs sharing the Perf Domain with prev_cpu could serve as a best_energy_cpu. This can happen in case only prev_cpu experiences (temporary) capacity pressure.
(3) Route a task which cannot fit on prev_cpu due to its UCLAMP_MIN value through select_idle_sibling() -> select_idle_capacity(). I.e. add an appropriate check, next to the rd->overutilized and sd checks at the beginning of find_energy_efficient_cpu(), which jumps to the fail label.
Similar issue as in (2) here.
boosting != overutilized IMO.
If the task asked for more boosting than can be satisfied, then best effort is a better fallback. 1024 for me translates into 'the maximum available performance point'. If the system is under pressure, we still want to try to return the maximum performance point under pressure, even if it is not the absolute maximum one can get without pressure.
I agree. I think the task util/uclamp is a performance requirement. The scheduler should find a CPU that satisfies the task's performance requirement (util) and, if possible, choose the energy-efficient one. So if there is no CPU that can satisfy the requirement, finding the max-spare-capacity CPU may be a better choice.
If the actual utilization of the task is high, then over utilization logic will trigger anyway.
(4) Deal with the fact that we return prev_cpu in this case (current behavior). Doesn't seem too bad in combination with (1).
Fixing (1) internally fixes the immediate bug, agreed. I think (2) is a good option, but that is an optimization rather than a bug fix, and needs more info to see how much it's worth.
Yun, can you share more info about your use case?
Our use case is a gaming scenario on Android. We want a task (related to the game, but whose util is low) to run on a big core at the highest frequency. We set the task's uclamp min to 1024, but it stays on a little core.
Thanks
-- Qais Yousef
Thanks, Yun
On 08/18/20 15:14, Yun Hsiang wrote:
(1) Reduce the task's UCLAMP_MIN space to [0 .. 819] (mentioned above), or remap [0 .. 1023] to [0 .. 819] internally.
Reducing uclamp_min in userspace is a hack, which is okay as a temporary workaround. This mapping issue is an internal implementation detail that must be fixed internally IMO.
IMO, uclamp is simply clamping the task util, so this remap hack seems a little weird to me. And as Dietmar mentioned, if there is other capacity pressure (RT/DL/IRQ/thermal), we will still hit the same problem.
It's a stepping stone to fix the immediate problem. I agree it's not an optimized solution.
(2) System-wide spare capacity CPU (your proposal).
The issue w/ this is that you would return the system-wide max_spare_capacity CPU although CPUs sharing the Perf Domain with prev_cpu could serve as a best_energy_cpu. This can happen in case only prev_cpu experiences (temporary) capacity pressure.
(3) Route a task which cannot fit on prev_cpu due to its UCLAMP_MIN value through select_idle_sibling() -> select_idle_capacity(). I.e. add an appropriate check, next to the rd->overutilized and sd checks at the beginning of find_energy_efficient_cpu(), which jumps to the fail label.
Similar issue as in (2) here.
boosting != overutilized IMO.
If the task asked for more boosting than can be satisfied, then best effort is a better fallback. 1024 for me translates into 'the maximum available performance point'. If the system is under pressure, we still want to try to return the maximum performance point under pressure, even if it is not the absolute maximum one can get without pressure.
I agree. I think the task util/uclamp is a performance requirement. The scheduler should find a CPU that satisfies the task's performance requirement (util) and, if possible, choose the energy-efficient one. So if there is no CPU that can satisfy the requirement, finding the max-spare-capacity CPU may be a better choice.
This sounds ideal to me too. But how do you handle the other 20% range, e.g. util_min = 850? In this case another, medium CPU could be a better fit and more energy efficient.
If you can improve your patch to handle this 20% scenario and send it as an RFC to LKML, it'll help push things forward in this direction :-)
If the actual utilization of the task is high, then over utilization logic will trigger anyway.
(4) Deal with the fact that we return prev_cpu in this case (current behavior). Doesn't seem too bad in combination with (1).
Fixing (1) internally fixes the immediate bug, agreed. I think (2) is a good option, but that is an optimization rather than a bug fix, and needs more info to see how much it's worth.
Yun, can you share more info about your use case?
Our use case is a gaming scenario on Android. We want a task (related to the game, but whose util is low) to run on a big core at the highest frequency. We set the task's uclamp min to 1024, but it stays on a little core.
Thanks for sharing!
Cheers
-- Qais Yousef
On Thu, Aug 20, 2020 at 11:38:13AM +0100, Qais Yousef wrote:
On 08/18/20 15:14, Yun Hsiang wrote:
(1) Reduce the task's UCLAMP_MIN space to [0 .. 819] (mentioned above), or remap [0 .. 1023] to [0 .. 819] internally.
Reducing uclamp_min in userspace is a hack, which is okay as a temporary workaround. This mapping issue is an internal implementation detail that must be fixed internally IMO.
IMO, uclamp is simply clamping the task util, so this remap hack seems a little weird to me. And as Dietmar mentioned, if there is other capacity pressure (RT/DL/IRQ/thermal), we will still hit the same problem.
It's a stepping stone to fix the immediate problem. I agree it's not an optimized solution.
(2) System-wide spare capacity CPU (your proposal).
The issue w/ this is that you would return the system-wide max_spare_capacity CPU although CPUs sharing the Perf Domain with prev_cpu could serve as a best_energy_cpu. This can happen in case only prev_cpu experiences (temporary) capacity pressure.
(3) Route a task which cannot fit on prev_cpu due to its UCLAMP_MIN value through select_idle_sibling() -> select_idle_capacity(). I.e. add an appropriate check, next to the rd->overutilized and sd checks at the beginning of find_energy_efficient_cpu(), which jumps to the fail label.
Similar issue as in (2) here.
boosting != overutilized IMO.
If the task asked for more boosting than can be satisfied, then best effort is a better fallback. 1024 for me translates into 'the maximum available performance point'. If the system is under pressure, we still want to try to return the maximum performance point under pressure, even if it is not the absolute maximum one can get without pressure.
I agree. I think the task util/uclamp is a performance requirement. The scheduler should find a CPU that satisfies the task's performance requirement (util) and, if possible, choose the energy-efficient one. So if there is no CPU that can satisfy the requirement, finding the max-spare-capacity CPU may be a better choice.
This sounds ideal to me too. But how do you handle the other 20% range, e.g. util_min = 850? In this case another, medium CPU could be a better fit and more energy efficient.
If you can improve your patch to handle this 20% scenario and send it as an RFC to LKML, it'll help push things forward in this direction :-)
Thanks for the feedback. I think the patch can be modified like this: if no CPU can pass fits_capacity(), select sys_max_spare_cap_cpu; if any CPU can fit the request, find the CPU with the best energy.

	if (prev_delta == ULONG_MAX) {
		/* Pick system max spare cpu if all cpu cannot fit requirement. */
		if (best_energy_cpu == prev_cpu)
			return sys_max_spare_cap_cpu;
		else
			return best_energy_cpu;
	}
If you think this modification is fine, I'll send it as RFC to LKML.
On 08/25/20 09:59, Yun Hsiang wrote:
This does sound ideal to me too. But how do you handle the other 20% range, e.g: util_min = 850? In this case another medium CPU could be a better fit and more energy efficient.
If you can improve your patch to handle this 20% scenario and send it as RFC to LKML, it'll help pushing things more forward in this direction :-)
Thanks for the feedback. I think the patch can be modified like this: if no CPU can pass fits_capacity(), select sys_max_spare_cap_cpu; if any CPU can fit the request, find the CPU with the best energy.

	if (prev_delta == ULONG_MAX) {
		/* Pick system max spare cpu if all cpu cannot fit requirement. */
		if (best_energy_cpu == prev_cpu)
			return sys_max_spare_cap_cpu;
		else
			return best_energy_cpu;
	}
If you think this modification is fine, I'll send it as RFC to LKML.
Sorry for the delayed response. It's LPC this week, so I'm busy with that, and next week I am on holiday.
I suggest creating a proper patch that has:
1. A good explanation of the problem.
2. A good explanation of your proposed solution.
3. Any limitations you are aware of.
4. Any open questions you have that need addressing.
And tag it as RFC so people would know it's WIP.
AFAICS now, I don't think what you have is enough, because you don't explicitly check whether the task was boosted:

	if (best_energy_cpu == prev_cpu /*!! && boosted !!*/) {
		...
	}
But I might have looked at it in haste...
Still not tested, but how does the patch below behave for you? Maybe it needs to be combined with your patch to look for the max-spare-capacity CPU, but I'd be interested to see how it behaves on its own too.
Anyway. Your patch doesn't have to be perfect to be posted to LKML if it's tagged with RFC. We can have wider audience for discussion there :-)
Thanks!
-- Qais Yousef
-->8--
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a68a0536add..7af5c3eca517 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -113,6 +113,13 @@ int __weak arch_asym_cpu_priority(int cpu)
  */
 #define fits_capacity(cap, max)	((cap) * 1280 < (max) * 1024)

+/*
+ * When a task is boosted, we want to ignore the 20% margin rule as the task
+ * has inflated itself to run on a bigger core. That threshold will be
+ * honoured once the tasks real util reaches the threshold.
+ */
+#define fits_capacity_boosted(cap, max)	((cap) <= (max))
+
 #endif

 #ifdef CONFIG_CFS_BANDWIDTH
@@ -6555,7 +6562,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 {
 	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
 	struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
-	unsigned long cpu_cap, util, base_energy = 0;
+	unsigned long cpu_cap, uclamp_util, util, base_energy = 0;
 	int cpu, best_energy_cpu = prev_cpu;
 	struct sched_domain *sd;
 	struct perf_domain *pd;
@@ -6603,10 +6610,19 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 			 * much capacity we can get out of the CPU; this is
 			 * aligned with schedutil_cpu_util().
 			 */
-			util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
-			if (!fits_capacity(util, cpu_cap))
-				continue;
+			uclamp_util = uclamp_rq_util_with(cpu_rq(cpu), util, p);
+
+			if (uclamp_util <= util &&
+			    !fits_capacity(uclamp_util, cpu_cap)) {
+				continue;
+			} else if (uclamp_util > util &&
+				   !fits_capacity_boosted(uclamp_util, cpu_cap)) {
+				continue;
+			}
+
 			/* Always use prev_cpu as a candidate. */