Dave,
On Thu, Aug 23, 2012 at 8:47 PM, Dave Martin <dave.martin@linaro.org> wrote:
On Thu, Aug 23, 2012 at 04:51:45PM +0800, Lei Wen wrote:
Hi Dave,
[snip here]
We don't use the virtualisation extensions for this in our current code. It just becomes normal kernel code, analogous to subsystems like cpufreq and CPU hotplug.
Virtualisation is only really needed if we want to trick the OS into thinking that it is not really being migrated between different physical CPUs. This approach has the advantage that it can work with any OS, with no need for modifying the OS. But because we can modify Linux so that it understands and controls the switching, virtualisation is not needed.
This also makes it easier to use the virtualisation extensions for running true hypervisors like KVM, because we don't have to work out a way to let KVM and the switcher co-exist in hypervisor space.
An in-kernel implementation is a very elegant way to handle the coexistence of switching and KVM. :)
Since the in-kernel implementation uses paired-CPU switching, I think it is closer to the big.LITTLE MP solution, which also has both A7 and A15 alive. The only difference here is that paired-CPU switching allows only one CPU in each pair to be alive at a time. I don't know whether I understand it right: the system may end up running with both A7 and A15 active. Correct me if I am wrong. :)
Ignoring some implementation details, your understanding is correct:
With big.LITTLE MP and the switcher, the kernel has access to all the physical CPUs in the system.
In a sense, the switcher implements a particular policy for how the CPUs are used. big.LITTLE MP just gives all the CPUs to the kernel, but the switcher combines the physical CPUs into big+LITTLE pairs so that only one is running at any given time, and presents those logical paired CPUs to the rest of the kernel.
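To make the pairing idea concrete, here is a rough standalone sketch (not the actual switcher code; the topology, structures and names are invented for illustration):

/*
 * Hypothetical sketch: each logical CPU exposed to the kernel is backed
 * by one big and one LITTLE physical CPU, and only one of the pair is
 * active at any given time. Physical CPUs 0-3 are assumed big, 4-7 LITTLE.
 */
#include <stdio.h>
#include <stdbool.h>

#define NR_LOGICAL_CPUS 4

struct cpu_pair {
	int big_cpu;		/* physical CPU id in the A15 cluster */
	int little_cpu;		/* physical CPU id in the A7 cluster  */
	bool on_big;		/* which side of the pair is running  */
};

static struct cpu_pair pairs[NR_LOGICAL_CPUS] = {
	{ 0, 4, false }, { 1, 5, false }, { 2, 6, false }, { 3, 7, false },
};

/* The rest of the kernel only ever deals with the logical CPU number. */
static int logical_to_physical(int logical_cpu)
{
	struct cpu_pair *p = &pairs[logical_cpu];

	return p->on_big ? p->big_cpu : p->little_cpu;
}

int main(void)
{
	pairs[2].on_big = true;	/* "switch" logical CPU 2 to its big core */

	for (int cpu = 0; cpu < NR_LOGICAL_CPUS; cpu++)
		printf("logical CPU %d -> physical CPU %d\n",
		       cpu, logical_to_physical(cpu));
	return 0;
}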
One question here: do we still need to bind CPUs with identical CPU IDs into a pair? I think only that way would the processor still appear unchanged to the OS. I also don't know whether a changed cluster ID would affect system processes or not, since it is not virtualised by a hypervisor as it is in ARM's reference code.
So does this in-kernel implementation take into consideration the load-balancing issue that the MP solution also faces, given the difference in computing capability? Or did Linaro just do paired-CPU switching for all A7/A15 pairs, which would mimic the cluster switching in ARM's reference code?
With big.LITTLE MP, you have many CPUs with differing properties:
bbbbLLLL
whereas with the switcher, the kernel sees fewer CPUs, but they are identical:
ssss
The switcher logical CPUs can run either on the big or little cluster, but for scheduling purposes most of the kernel does not really need to understand this. big/LITTLE becomes an extra performance point parameter for each logical CPU, similar to frequency/voltage scaling. Because the switcher logical CPUs have identical properties, the kernel can treat them as identical for scheduling purposes. This means that the scheduler should work sensibly without any modifications.
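As a rough illustration of treating big/LITTLE as just one more performance point per logical CPU, here is a hypothetical cpufreq-style table (all names, frequencies and the selection logic are made up, not the Linaro implementation):

/*
 * Hypothetical sketch: the cluster choice is folded into an ordered
 * table of operating points, so picking a point works exactly like a
 * frequency/voltage scaling decision and the scheduler never sees
 * big vs LITTLE at all.
 */
#include <stdio.h>
#include <stdbool.h>

struct perf_point {
	bool on_big;		/* run this logical CPU on the big core? */
	unsigned int freq_khz;	/* operating frequency at this point     */
};

/* Ordered from slowest/most efficient to fastest. */
static const struct perf_point perf_table[] = {
	{ false,  600000 },	/* LITTLE @ 600 MHz */
	{ false, 1000000 },	/* LITTLE @ 1.0 GHz */
	{ true,  1200000 },	/* big    @ 1.2 GHz */
	{ true,  1600000 },	/* big    @ 1.6 GHz */
};

#define NR_POINTS (sizeof(perf_table) / sizeof(perf_table[0]))

/* Pick the lowest point that satisfies the requested performance;
 * crossing the big/LITTLE boundary is just a side effect. */
static const struct perf_point *pick_point(unsigned int target_khz)
{
	for (unsigned int i = 0; i < NR_POINTS; i++)
		if (perf_table[i].freq_khz >= target_khz)
			return &perf_table[i];
	return &perf_table[NR_POINTS - 1];
}

int main(void)
{
	const struct perf_point *p = pick_point(1100000);

	printf("target 1.1 GHz -> %s @ %u kHz\n",
	       p->on_big ? "big" : "LITTLE", p->freq_khz);
	return 0;
}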
Good abstraction! However, I cannot see why the kernel would still believe those logical CPUs have the same computing capability if the real CPUs running are bLbL. What I understand from SMP is that the kernel believes the CPUs have the same DMIPS. Does the logical CPU fake its DMIPS capability and report the same value to the kernel?
Just as with frequency/voltage scaling, we can decide when to switch each CPU to big or little depending on how busy that CPU is. Note that the Linaro switcher implementation switches each CPU independently. It does not switch them all at the same time like the ARM reference switcher implementation does.
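A minimal sketch of what such an independent, per-CPU decision could look like (the thresholds and structure are invented for illustration, not the Linaro policy):

/*
 * Hypothetical sketch: each logical CPU is moved to big or LITTLE
 * independently, based only on its own load, with simple hysteresis.
 */
#include <stdio.h>
#include <stdbool.h>

#define UP_THRESHOLD	80	/* % load above which we go to big    */
#define DOWN_THRESHOLD	30	/* % load below which we go to LITTLE */

struct logical_cpu {
	int id;
	bool on_big;
};

/* Called periodically for each logical CPU, independently of the others. */
static void update_cluster(struct logical_cpu *cpu, unsigned int load_pct)
{
	if (!cpu->on_big && load_pct > UP_THRESHOLD) {
		cpu->on_big = true;
		printf("CPU%d: load %u%%, switching to big\n", cpu->id, load_pct);
	} else if (cpu->on_big && load_pct < DOWN_THRESHOLD) {
		cpu->on_big = false;
		printf("CPU%d: load %u%%, switching to LITTLE\n", cpu->id, load_pct);
	}
}

int main(void)
{
	struct logical_cpu cpus[2] = { { 0, false }, { 1, true } };

	update_cluster(&cpus[0], 95);	/* busy CPU moves up to big      */
	update_cluster(&cpus[1], 10);	/* idle CPU drops back to LITTLE */
	return 0;
}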
Does the cpufreq driver need to consider whether switching clusters actually brings a power benefit? For example, the power of bLLL may be higher than that of LLLL, since bLLL has both clusters powered on while LLLL has only one cluster working.
Anyway, switching independently provides a more flexible user policy.
The cost of this simplicity is that you can't run Linux on all the physical CPUs simultaneously. This means lower peak throughput and lower parallelism than is possible with big.LITTLE MP. But a similar level of powersaving should be achievable with both, because they both allow any physical CPU or cluster to be idled or turned off when the system is sufficiently idle.
For most use cases, the reduced throughput probably won't be an issue: the big cores have higher performance than the little cores anyway, so when the platform is running at full throttle, you get most of the theoretically possible system throughput even with the little cores turned off. Having more CPUs active also adds interprocessor/intercluster communication overheads, so you may still get a better power-performance tradeoff if the little cluster is simply turned off when you want to run the device as fast as possible.
I am curious to know which implementation Linaro will finally choose. :) Many thanks for all the support.
The main difference is that the switcher approach does not rely on experimental scheduler modifications, and the impact of the switcher on system behaviour and performance is better understood than for MP right now. Therefore it should be possible to have it working well sooner (and upstreamed sooner) than will be possible with b.L MP.
b.L MP is a more flexible and powerful approach, but this is expected to mature over a longer timescale. Linaro is interested in both.
Yep, the scheduler modification is a tough task. :)
Cheers ---Dave
Thanks, Lei