Re: [RFC PATCH 3/3] idle: store the idle state index in the struct rq

3 Feb 2014


On Mon, 3 Feb 2014, Morten Rasmussen wrote:
...
On Fri, Jan 31, 2014 at 06:19:26PM +0000, Nicolas Pitre wrote:
...
A cluster should map naturally to a scheduling domain.  If we need to 
wake up a CPU, it is quite obvious that we should prefer an idle CPU 
from a scheduling domain whose load is not zero.  If the load is not 
zero then any idle CPU in that domain, even if it indicated it was 
ready for a cluster power down, will not incur the cluster power-up 
latency, as some other CPUs must still be running.  We already know 
that, of course, even if the recorded latency might not say so.
In other words, the hardware latency information is dynamic, of course.  
But we might not _need_ to have it reflected at the scheduler domain all 
the time, as in this case it can be inferred from the scheduling domain 
load.
I agree that the existing sched domain hierarchy should be used to
represent the power topology. But it is not clear to me how much we can
say about the C-state of a cpu without checking the load of the entire
cluster every time.
We would need to know which C-states (indexes) are per cpu and which are
per cluster, and ignore the cluster states when the cluster load is
non-zero.
In any case, i.e. whether the cluster load is zero or not, we want to 
wake up the CPU with the shallowest C-state.  That should correspond to 
the actual cluster C-state already, without having to track it 
explicitly.
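
A minimal sketch of the "pick the shallowest idle state" rule described 
above, in plain userspace C rather than kernel code.  The struct and 
function names are hypothetical, and the sketch assumes that a larger 
idle state index means a deeper state, which the thread implies but does 
not state outright:

```c
#include <stddef.h>

/*
 * Hypothetical per-CPU snapshot.  In the patch under discussion the
 * idle state index would be stored in struct rq; this is a stand-in.
 */
struct cpu_snapshot {
	int is_idle;		/* nonzero when the CPU is idle */
	int idle_state_idx;	/* assumed: larger index == deeper state */
};

/*
 * Return the position of the idle CPU with the smallest idle state
 * index, or -1 when no CPU is idle.  Ties go to the first CPU found.
 */
static int pick_shallowest_idle_cpu(const struct cpu_snapshot *cpus,
				    size_t ncpus)
{
	int best = -1;

	for (size_t i = 0; i < ncpus; i++) {
		if (!cpus[i].is_idle)
			continue;
		if (best < 0 ||
		    cpus[i].idle_state_idx < cpus[best].idle_state_idx)
			best = (int)i;
	}
	return best;
}
```

Busy CPUs are skipped entirely, so the index is only ever compared 
between CPUs that are actually idle.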
...
Current sched domain load is not maintained in the scheduler; it is only
produced when needed. But I guess you could derive the necessary
information from the idle cpu masks.
Even better.
...
...
Within a scheduling domain it is OK to pick the best idle CPU by 
looking at the index, as it is best to leave CPUs that are ready for a 
cluster power down in that state and prefer one that is not.  And a 
scheduling domain with a load of zero should be left alone if idle CPUs 
are found in another domain whose load is not zero, irrespective of 
absolute latency information. So all the existing heuristics already in 
place to optimize cache utilization and so on will make things just work 
for idle as well.
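
The two-level preference described above could be sketched roughly as 
follows.  This is illustrative userspace C, not scheduler code: every 
name is hypothetical, "load is not zero" is approximated as "at least 
one CPU in the domain is not idle", and a larger index is assumed to 
mean a deeper idle state.

```c
#include <stddef.h>

#define MAX_DOMAIN_CPUS 8	/* arbitrary bound for the sketch */

/* Hypothetical snapshot of one scheduling domain (e.g. one cluster). */
struct domain_snapshot {
	size_t ncpus;
	int is_idle[MAX_DOMAIN_CPUS];
	int idle_state_idx[MAX_DOMAIN_CPUS];
};

struct cpu_choice {
	int domain;	/* -1 when nothing was found */
	int cpu;
};

/* Assumption: a domain has load if any of its CPUs is not idle. */
static int domain_has_load(const struct domain_snapshot *d)
{
	for (size_t i = 0; i < d->ncpus; i++)
		if (!d->is_idle[i])
			return 1;
	return 0;
}

/* Shallowest idle CPU within one domain, or -1 if none is idle. */
static int shallowest_idle_cpu(const struct domain_snapshot *d)
{
	int best = -1;

	for (size_t i = 0; i < d->ncpus; i++) {
		if (!d->is_idle[i])
			continue;
		if (best < 0 ||
		    d->idle_state_idx[i] < d->idle_state_idx[best])
			best = (int)i;
	}
	return best;
}

/*
 * Prefer an idle CPU from a domain that already has load; fall back
 * to a fully idle domain only when no such CPU exists.  Idle state
 * indexes are never compared across domains.
 */
static struct cpu_choice pick_wakeup_cpu(const struct domain_snapshot *doms,
					 size_t ndoms)
{
	struct cpu_choice busy_pick = { -1, -1 };
	struct cpu_choice idle_pick = { -1, -1 };

	for (size_t d = 0; d < ndoms; d++) {
		int cpu = shallowest_idle_cpu(&doms[d]);
		struct cpu_choice *slot;

		if (cpu < 0)
			continue;
		slot = domain_has_load(&doms[d]) ? &busy_pick : &idle_pick;
		if (slot->cpu < 0) {
			slot->domain = (int)d;
			slot->cpu = cpu;
		}
	}
	return busy_pick.cpu >= 0 ? busy_pick : idle_pick;
}
```

Note that a deeply idle CPU in a busy domain wins over a shallowly idle 
CPU in a fully idle domain, which is the "irrespective of absolute 
latency information" part of the argument.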
IIUC, you propose to only use the index when picking an idle cpu inside
an already busy sched domain and leave idle sched domains alone if
possible. It may work for homogeneous SMP systems, but I don't think it
will work for heterogeneous systems like big.LITTLE.
Hence the caveat "everything else being equal" that I mentioned previously.
...
If the little cluster has zero load and the big has stuff running, it
doesn't mean that it is a good idea to wake up another big cpu. It may
be more power efficient to wake up the little cluster. Comparing idle
state index of a big and little cpu won't help us in making that choice
as the clusters may have different idle states and the costs associated
with each state are different.
Agreed.  But let's evolve this in manageable steps.
...
I'm therefore not convinced that idle state index is the right thing to
give the scheduler. Using a cost metric would be better in my
opinion.
That won't be difficult to move from the idle state index to some other 
cost metric once we've proven the simple index on homogeneous systems 
has benefits.
Nicolas
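
For illustration only, the difference between comparing idle state 
indexes and comparing a cost metric on a heterogeneous system might look 
like the sketch below.  The thread does not define the metric, so the 
wakeup_cost field, its units, and all names here are assumptions made up 
for the example:

```c
#include <stddef.h>

/* Hypothetical view of an idle CPU on a heterogeneous system. */
struct idle_cpu {
	int idle_state_idx;		/* index into that CPU's own state table */
	unsigned int wakeup_cost;	/* assumed metric, arbitrary units */
};

/* Select by raw index; only meaningful if all CPUs share one state table. */
static int pick_by_index(const struct idle_cpu *cpus, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++)
		if (best < 0 ||
		    cpus[i].idle_state_idx < cpus[best].idle_state_idx)
			best = (int)i;
	return best;
}

/* Select by cost; comparable even when state tables differ per cluster. */
static int pick_by_cost(const struct idle_cpu *cpus, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++)
		if (best < 0 || cpus[i].wakeup_cost < cpus[best].wakeup_cost)
			best = (int)i;
	return best;
}
```

With a big CPU at index 1 whose cost is higher than that of a little CPU 
at index 2, pick_by_index() returns the big CPU and pick_by_cost() 
returns the little one.  Both helpers have the same shape, which is in 
line with the point that moving from the index to a cost metric later 
should not be difficult.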
