Hi Alex,
Glad to have you looking at my code as well :)
On 28/03/14 09:14, Alex Shi wrote:
> On 03/24/2014 09:47 PM, Chris Redpath wrote:
>> When a normal forced up-migration takes place, we stop the task to be migrated while the target CPU becomes available. This delay can range from 80us to 1500us on TC2 if the target CPU is in a deep idle state.
>> Instead, interrupt the target CPU and ask it to pull a task. This lets the current eligible task continue executing on the original CPU while the target CPU wakes. Use a pinned timer to prevent the pulling CPU going back into power-down with pending up-migrations.
>> If we also trigger a nohz kick, it does not matter that we triggered for an idle pull: the idle_pull flag will be set when we execute the softirq, so we will still do the idle pull.
>> If the target CPU is busy, we will not pull any tasks.
> Chris, I do not fully understand the MP feature, so correct me if I am wrong. :)
> The trade-off is one more reschedule interrupt and keeping the big CPU alive, which causes more energy cost.
It's not really an extra cost. We would have performed the reschedule anyway in order to do the migration; the difference is that previously we waited in the CPU stopper on the source CPU while the target CPU woke from sleep, whereas now we continue executing while that happens.
Since we are always waking an idle big CPU when we make this decision, we are typically paying an idle wakeup cost each time. When running the mobile workloads we are mostly interested in, that idle wakeup is frequently a wakeup from cluster shutdown mode which can be over 1ms.
The aim of this change was to try to prevent dropped frames during HMP up-migrations caused by execution stalling while waiting for the target CPU to become available.
The CPU keepalive is there to prevent entering deep idle states in the couple of hundred microseconds that the CPU stopper takes to run on the source CPU. It could be more logically expressed as a (very) temporary idle latency requirement, except that we cannot express such constraints for a single CPU in the kernel today.
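To make that concrete, the keepalive amounts to something like this (a simplified sketch only; the names and the 2ms delay are illustrative, not the exact patch code):

#include <linux/hrtimer.h>
#include <linux/percpu.h>

static DEFINE_PER_CPU(struct hrtimer, hmp_keepalive_timer);

static enum hrtimer_restart hmp_keepalive_fn(struct hrtimer *t)
{
	/*
	 * Nothing to do: the timer exists only so that cpuidle sees a
	 * near-future event on this CPU and avoids the deep states
	 * while the source CPU's stopper migrates the task over.
	 */
	return HRTIMER_NORESTART;
}

static void hmp_cpu_keepalive(int cpu)
{
	struct hrtimer *timer = &per_cpu(hmp_keepalive_timer, cpu);

	/*
	 * hrtimer_init() and the hmp_keepalive_fn hookup happen once
	 * per CPU at init time (omitted here). REL_PINNED keeps the
	 * expiry on this CPU so nohz cannot migrate the event away
	 * and defeat the purpose.
	 */
	hrtimer_start(timer, ns_to_ktime(2 * NSEC_PER_MSEC),
		      HRTIMER_MODE_REL_PINNED);
}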
> So do you have data showing the trade-off is worthwhile? For example, the resched interrupt cost and the CPU-alive cost vs. the cost of going idle and being woken up, or benchmark data showing a performance/power benefit.
I have traces which show the resulting improvement, but it is so small that it is lost in the noise in all the benchmarks we have. Most of the benchmarks do not actually involve much migration between clusters: typically the 'benchmark' app tasks start heavy processing and continue until complete, and with the HMP thresholds we use, our lighter workloads generally migrate once or twice per operation.
We have plenty of data showing that the change has no detrimental impact on any of the metrics for our benchmarked scenarios (power is largely unchanged, as is performance); however, I need to complete the microbenchmark I have been working on to get some numbers showing what is visible in the traces.
> As to the one extra resched interrupt, could we check the target's pending timer before adding a new one? If it already has a timer due soon, we can save the keep_alive interrupt.
We could do this, but since we are waking an idle CPU I do not expect us ever to have such a pending timer. I have wondered whether it is worth cancelling the pending timer event when we start executing the new task, but I haven't done any measurements in that area.
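If we did want that check, it would only be a couple of lines on top of the sketch above (again illustrative; hrtimer_is_queued() only covers our own keepalive timer, since cheaply inspecting the whole of a remote CPU's timer wheel is not really exposed):

static void hmp_cpu_keepalive_maybe(int cpu)
{
	struct hrtimer *timer = &per_cpu(hmp_keepalive_timer, cpu);

	/* An earlier keepalive is still pending: nothing to arm. */
	if (hrtimer_is_queued(timer))
		return;

	hrtimer_start(timer, ns_to_ktime(2 * NSEC_PER_MSEC),
		      HRTIMER_MODE_REL_PINNED);
}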
> BTW, did we check the little CPU domain to see whether it is under-utilized? If so, relieving the big CPU load would help power efficiency. I see the new idle pull is only for big CPUs, not for little CPUs.
We have a number of mechanisms in place to try to avoid ever having a situation where the load on a big CPU could be relieved by a little CPU. In the HMP patches, our little CPUs are allowed to run any task, but the big CPUs are only allowed to run tasks whose tracked (unweighted) load is above our up-threshold. We do this by removing the scheduler's cluster balancing and replacing it at a handful of key points with a load-based decision (both task load and domain loads are used).
When big tasks become runnable, we will use a big CPU if we have an idle one but we will happily use a little CPU if all the big CPUs are in use. While a big task is running, we will move it to a big CPU if there is an idle one, and if a big CPU becomes idle it will check that there are no suitable tasks on a little CPU that could be pulled before it goes idle.
This way, we mostly avoid a situation where the big cluster is overloaded and the little cluster has spare capacity.
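In sketch form, the placement rule boils down to something like this (illustrative only; the threshold value, the helpers and the exact field names are assumptions standing in for the real functions):

static unsigned int hmp_up_threshold = 700;	/* scale: 0..1024 */

static inline bool hmp_task_wants_big(struct task_struct *p)
{
	/* Unweighted tracked load from the per-entity load tracking. */
	return p->se.avg.load_avg_ratio >= hmp_up_threshold;
}

static int hmp_select_cpu(struct task_struct *p, int prev_cpu)
{
	if (hmp_task_wants_big(p)) {
		int cpu = hmp_idle_big_cpu();	/* assumed helper */

		if (cpu >= 0)
			return cpu;
	}
	/* Big CPUs all busy, or task below threshold: stay little. */
	return hmp_select_little_cpu(p, prev_cpu);	/* assumed helper */
}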
If we do get into a situation where we have multiple tasks resident on a big CPU (it can happen if we pull something and the original task wakes up again), we check to see if the overall progress can be better served by offloading one of the heavier tasks to a little CPU.
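The offload check itself is conceptually simple, something like the following (a sketch; hmp_idle_little_cpu() is an assumed helper):

static bool hmp_should_offload(struct rq *rq)
{
	/*
	 * More than one runnable CFS task on a big CPU means somebody
	 * is waiting behind the current task; overall progress may be
	 * better if one of them runs on an idle little CPU instead,
	 * even at lower capacity.
	 */
	return rq->cfs.h_nr_running > 1 && hmp_idle_little_cpu() >= 0;
}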
It is true that in really heavy load situations where you have > num_cpus busy tasks, we could do better by allowing tasks to spread according to the relative compute capacities of the cores, but this is not a situation we have seen in any mobile workload yet. Hopefully energy-aware scheduling will handle this perfectly :)
We do have a difference in the newly-idle behaviour depending on whether the CPU is big or little. Since we have disabled the CPU-level balance, CPUs do not idle pull across the clusters, only inside them. This covers spreading light load and heavy load independently, and that is the end of the story for little CPUs.
Big CPUs can idle pull from other big CPUs (as normal) and, if that doesn't find a task, they have an additional step which finds the heaviest-load task on any little CPU and pulls it if the task load is above the up-threshold.
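Roughly, that extra step looks like this (the helper names and the cpumask are illustrative; the real code also has to take the appropriate rq locks and hand the task over to the idle-pull machinery):

static struct task_struct *hmp_heaviest_little_task(void)
{
	struct task_struct *heaviest = NULL;
	unsigned long best = hmp_up_threshold;
	int cpu;

	for_each_cpu(cpu, &hmp_little_cpu_mask) {	/* assumed mask */
		struct task_struct *p = cpu_curr(cpu);

		/* Only steal tasks heavy enough to deserve a big CPU. */
		if (p->se.avg.load_avg_ratio >= best) {
			best = p->se.avg.load_avg_ratio;
			heaviest = p;
		}
	}
	return heaviest;	/* NULL means nothing worth pulling */
}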
--Chris