On 05/08/2013 11:40 PM, Nicolas Pitre wrote:
On Wed, 8 May 2013, Leo Yan wrote:
- When the outbound core wakes up the inbound core, the outbound core's thread will
sleep until the inbound core uses MCPM's early poke to send an IPI;
a) It looks like this method is mainly due to the TC2 board's long latency to power the cluster and cores on/off, right? How about using a polling method instead? On our own SoC the wake-up interval takes _only_ about 10 ~ 20us;
There is no need to poll anything. If your SOC is fast enough in all cases, then the outbound may simply go ahead and let the inbound resume with the saved context whenever it is ready.
Yes, I went through the code and this should be fine; let's keep the current simple code.
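Just to write down my understanding, the simple flow being kept is roughly the following; this is only a sketch of the MCPM entry points involved (mcpm_set_entry_vector/mcpm_cpu_power_up/cpu_resume), not the actual bL_switcher code:

#include <asm/mcpm.h>
#include <asm/suspend.h>	/* cpu_resume */

/*
 * Illustration only: the inbound CPU is told where to enter, is powered
 * up, and then picks up the saved context whenever it is ready, so the
 * outbound CPU does not need to poll or wait for it.
 */
static int switch_to_inbound(unsigned int ib_cpu, unsigned int ib_cluster)
{
	int ret;

	/* Tell MCPM where the inbound CPU should enter once it powers up. */
	mcpm_set_entry_vector(ib_cpu, ib_cluster, cpu_resume);

	/* Ask the platform backend to power the inbound CPU up. */
	ret = mcpm_cpu_power_up(ib_cpu, ib_cluster);
	if (ret)
		return ret;

	/*
	 * The outbound CPU then saves its state (cpu_suspend) and powers
	 * itself down (mcpm_cpu_power_down()); the inbound resumes via
	 * cpu_resume with that saved context when it is ready.
	 */
	return 0;
}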
There is one corner case here: the outbound core sets the power controller's register for power-down and then flushes its L1 cache; if it is the last man of the cluster it needs to flush the L2 cache as well, so this operation may take a long time (about 2ms for a 512KB L2 cache).
In the meantime, the inbound core is running concurrently and may trigger another switch and call *mcpm_cpu_power_up()* to set some power controller registers for the outbound core; so when the outbound core finally executes "WFI", it cannot really be powered off by the power controller. The polling here means the inbound core waits until the outbound core is really powered off.
Even if the outbound core has not been powered off, this does not introduce any issue, because if the outbound core is woken up from the "WFI" state, it will run the s/w reset sequence.
There is ONLY one thing to confirm here: the state machine of the SoC's power controller is not disturbed at all by the corner case above. :-)
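To make the polling concrete, in our MCPM backend I have something like the following in mind; the register names and layout below are made up for illustration and do not match any real power controller:

#include <linux/io.h>
#include <asm/processor.h>	/* cpu_relax() */

/* Hypothetical power controller registers, for illustration only. */
#define PWRC_CPU_STATUS(cpu)	(0x10 + (cpu) * 4)
#define PWRC_STATE_MASK		0x3
#define PWRC_STATE_OFF		0x0

static void __iomem *pwrc_base;

/*
 * Called from the platform's mcpm power_up backend before reprogramming
 * the power controller for the outbound CPU: wait until that CPU has
 * really reached the powered-off state (it may still be flushing L1/L2).
 */
static void pwrc_wait_cpu_off(unsigned int cpu)
{
	while ((readl_relaxed(pwrc_base + PWRC_CPU_STATUS(cpu)) &
		PWRC_STATE_MASK) != PWRC_STATE_OFF)
		cpu_relax();
}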
- Now the switch is an async operation, meaning that after the function
bL_switch_request returns, we cannot say the switch has completed; we have some concerns about this.
For example, when switching from the A15 core to the A7 core, we may want to decrease the voltage to save power; if the switch is async, it may introduce the following issue: after returning from bL_switch_request, s/w decreases the voltage, but at the same time the real switch is still ongoing on another pair of cores.
I browsed the git log and learned that at the beginning the switch was synchronized using the kernel's workqueue, and later it was changed to use a dedicated kernel thread with FIFO scheduling; do you think it is better to go ahead and add a synchronous method for switching?
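(For reference, my understanding is that the dedicated per-CPU thread is set up roughly like the sketch below; this is from memory, not the exact bL_switcher code.)

#include <linux/err.h>
#include <linux/kthread.h>
#include <linux/sched.h>

/* Sketch: one dedicated switcher thread per logical CPU, FIFO priority. */
static struct task_struct *start_switcher_thread(int (*fn)(void *),
						 void *arg, unsigned int cpu)
{
	struct sched_param param = { .sched_priority = 1 };
	struct task_struct *task;

	task = kthread_create(fn, arg, "kswitcher_%d", cpu);
	if (IS_ERR(task))
		return task;

	/* FIFO scheduling so a pending switch is not delayed by CFS tasks. */
	sched_setscheduler_nocheck(task, SCHED_FIFO, &param);
	kthread_bind(task, cpu);	/* this thread serves its logical CPU only */
	wake_up_process(task);
	return task;
}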
No. This is absolutely the wrong way to look at things.
The switcher is _just_ a specialized CPU hotplug agent with a special side effect. What it does is to tell the MCPM layer to shut CPU x down, power up CPU y, etc. It happens that cpuidle may be doing the same thing in parallel, and so does the classical CPU hotplug.
So you must add your voltage policy into the MCPM backend for your platform instead, _irrespective_ of the switcher presence.
First, the switcher is aware of the state on a per logical CPU basis. It knows when its logical CPU0 switched from the A15 to the A7. That logical CPU0 instance doesn't know and doesn't have to know what is happening with logical CPU1. The switcher does not perform cluster-wide switching, so it does not know when all the A7 or all the A15 cores are down. That's the job of the MCPM layer.
Another example: suppose that logical CPU0 is running on the A7 and logical CPU1 is running on the A15, but cpuidle for the latter decides to shut itself down. The cpuidle driver will ask MCPM to shut down logical CPU1, which happens to be the last A15, and therefore no A15 will be alive at that moment, even if the switcher knows that logical CPU1 is still tied to the A15. You certainly want to lower the voltage in that case too.
MCPM is a basic framework for cpuidle/IKS/hotplug, and all low power modes should go through MCPM's general APIs; so it makes sense to add the related code into the MCPM backend.
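If I understand correctly, in the platform backend that would look something like the sketch below. The ops layout follows struct mcpm_platform_ops; the voltage helper my_soc_set_cluster_voltage() and the use-count bookkeeping are simplified placeholders, and a real backend must of course follow the MCPM locking/cache rules on the power-down path:

#include <linux/spinlock.h>
#include <linux/types.h>
#include <asm/cputype.h>	/* read_cpuid_mpidr(), MPIDR_AFFINITY_LEVEL() */
#include <asm/mcpm.h>

/* Hypothetical SoC helper, not a real API. */
extern void my_soc_set_cluster_voltage(unsigned int cluster, int uV);

static DEFINE_SPINLOCK(my_pm_lock);	/* real code needs the MCPM-safe locking */
static int my_cluster_use_count[MAX_NR_CLUSTERS];

static int my_pm_power_up(unsigned int cpu, unsigned int cluster)
{
	spin_lock(&my_pm_lock);
	if (my_cluster_use_count[cluster]++ == 0)
		my_soc_set_cluster_voltage(cluster, 1100000);	/* run voltage */
	/* ... program the power controller to release this CPU ... */
	spin_unlock(&my_pm_lock);
	return 0;
}

static void my_pm_power_down(void)
{
	unsigned int mpidr = read_cpuid_mpidr();
	unsigned int cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
	bool last_man;

	spin_lock(&my_pm_lock);
	last_man = (--my_cluster_use_count[cluster] == 0);
	spin_unlock(&my_pm_lock);

	if (last_man) {
		/* flush L1 + L2, then the whole cluster can lose power */
		my_soc_set_cluster_voltage(cluster, 900000);	/* retention voltage */
	} else {
		/* flush L1 only */
	}
	/* ... program the power controller for this CPU, then WFI ... */
}

/* Registered at init time with mcpm_platform_register(&my_pm_ops). */
static const struct mcpm_platform_ops my_pm_ops = {
	.power_up	= my_pm_power_up,
	.power_down	= my_pm_power_down,
};

This way, no matter whether the last A15 goes down because of a switch, cpuidle or hotplug, the voltage is lowered in the same place.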
Let's look at another scenario: at the beginning, logical core 0 is running on the A7 core. If the profiling governor (such as a cpufreq governor) thinks performance is not high enough, it will call *bL_switch_request()* to switch to the A15 core, and *bL_switch_request()* will return immediately; but from that point the governor thinks it is already running on the A15, so it will do its profiling based on the A15's frequency while actually it is still running on the A7. So the switcher's async operation may cause some misunderstanding for the governor;
What do you think about this?
- After the switcher is enabled, it disables hotplug.
Actually the current code could support hotplug with IKS; with IKS, the logical core maps to the corresponding physical core id and GIC interface id, so the system can track which physical core has been hot-unplugged and the kernel can later hotplug that physical core back in. So could you give more hints on why IKS needs to disable hotplug?
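(My mental model of the mapping is something like the simplified illustration below; these are not the actual switcher data structures, just how I picture the pairing on a 2+2 system such as TC2.)

/*
 * Each logical CPU is paired with one A15 and one A7 physical CPU; a
 * switch only changes which of the two is currently running, while the
 * logical CPU number seen by Linux never changes.
 */
struct lcpu_pairing {
	unsigned int a15_phys;		/* physical CPU id on the A15 cluster */
	unsigned int a7_phys;		/* physical CPU id on the A7 cluster */
	unsigned int current_phys;	/* which of the two is running now */
};

static struct lcpu_pairing pairing[] = {
	{ .a15_phys = 0, .a7_phys = 2, .current_phys = 0 },	/* logical CPU 0 */
	{ .a15_phys = 1, .a7_phys = 3, .current_phys = 1 },	/* logical CPU 1 */
};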
The problem is to decide what the semantic of a hotplug request would be.
Let's use an example. The system boots with CPUs 0,1,2,3. When the switcher initializes, it itself hot-unplugs CPUs 2,3 and only CPUs 0,1 remain. Of course physical CPUs 2 and 3 are used when a switch happens, but even if physical CPU 0 is switched to physical CPU 2, the logical CPU number as far as Linux is concerned remains CPU 0. So even if CPU 2 is running, Linux thinks this is still CPU 0.
Now, when the switcher is active, we must forbid any hotplugging of logical CPUs 2 and 3, or the semantic of the switcher would be broken. So that means keeping track of CPUs that can and cannot be hotplugged, and that lack of uniformity is likely to cause confusion in user space already.
But if you really want to hot-unplug CPU 0, this might correspond to either physical CPU0 or physical CPU2. What physical CPU should be brought back in when a hotplug request comes in?
From the functionality point of view: whichever physical core was hot-unplugged is the one that should be brought back.
And if the switcher is disabled after hot-unplugging CPU0 when it was still active, should both physical CPUs 0 and 2 be left disabled, or should logical CPU2 be brought back online nevertheless?
This is a hard decision for dynamically enabling/disabling IKS. Maybe when disabling IKS we need to go back to the state before IKS was enabled: hotplug all cores back in and then disable IKS.
There are many issues to cover, and the code needed to deal with them becomes increasingly complex. And this is not only about the switcher, as some other parts of the kernel such as the pmu might expect to shut down physical CPU0 when its hotplug callback is invoked for logical CPU0, etc. etc.
So please tell me: why do you want CPU hotplug in combination with the switcher in the first place? Using hotplug for power management is already a bad idea to start with given the cost and overhead associated with it. The switcher does perform CPU hotplugging behind the scenes but it avoids all the extra costs of a hotplug operation on a logical CPU in the core kernel.
Sometimes the customer has strict power requirements for the phone. We found we can get some benefit from hotplug/hot-unplug when the system has a low load; the basic reason is that we can reduce the number of times the cores enter/exit low power modes.
If the system only seldom has tasks to run, but there is more than one task on a core's runqueue, the kernel will send an IPI to another core to reschedule and run the thread; if the thread has a very low workload, most of the time is spent on the low power mode enter/exit flow rather than on real work, so hot-unplug is the better choice.
The per-core timer has the same issue: if the core is powered off its local timer can no longer be used, so the kernel has to use the broadcast timer, and the core will be woken up just to handle the timer event.
So if we hot-unplug the cores, we can avoid many of these IPIs and finally get some power benefit when the system has a low workload.
Let's use TC2 as an example to describe the hotplug implementation: logical cpu 0 has the virtual frequency points 175MHz/200MHz/250MHz/300MHz/350MHz/400MHz/450MHz/500MHz for the A7 core, and 600MHz/700MHz/800MHz/900MHz/1000MHz/1100MHz/1200MHz for the A15 core;
When the system can meet the performance requirement, cpufreq will call IKS to first switch to the A7 core; if the core runs at a virtual frequency <= 200MHz, the system can hot-unplug the core. If the kernel needs to improve performance, it executes the reverse flow: hotplug the A7 core back in -> switch to the A15 core.
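Roughly, the policy I have in mind looks like the sketch below; the virtual frequency table is the TC2 example above, while running_on_a7(), my_request_hotunplug() and my_request_hotplug() are hypothetical placeholders rather than existing kernel APIs:

#include <linux/cpumask.h>
#include <linux/types.h>

/* Hypothetical helpers, for illustration only. */
extern bool running_on_a7(unsigned int cpu);
extern void my_request_hotunplug(unsigned int cpu);
extern void my_request_hotplug(unsigned int cpu);

/* Virtual frequency points (kHz) for one logical CPU on TC2, as listed above. */
static const unsigned int a7_virt_freqs_khz[] = {
	175000, 200000, 250000, 300000, 350000, 400000, 450000, 500000,
};
static const unsigned int a15_freqs_khz[] = {
	600000, 700000, 800000, 900000, 1000000, 1100000, 1200000,
};

#define HOTPLUG_THRESHOLD_KHZ	200000

static void update_core_state(unsigned int cpu, unsigned int target_khz)
{
	if (running_on_a7(cpu) && target_khz <= HOTPLUG_THRESHOLD_KHZ)
		my_request_hotunplug(cpu);	/* load is low enough: take the core out */
	else if (!cpu_online(cpu))
		my_request_hotplug(cpu);	/* bring it back before raising the frequency */
}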
But if you _really_ insist on performing CPU hotplug while using the switcher, you still can disable the switcher via sysfs, hot-unplug a CPU still via sysfs, and re-enable the switcher. When the switcher reinitializes, it will go through its pairing with the available CPUs, and if there is no available pairing for a logical CPU because one of the physical CPUs has been hot-unplugged then that logical CPU won't be available with the switcher.
Nicolas