Hi Linaro,
I got confused recently while reading the switcher reference code: http://git.linaro.org/git/arm/big.LITTLE/switcher.git
What I cannot understand is the bootwrapper: that code runs on all CPUs, and then only the CPU with cpu_id=0 in the BOOT_CLUSTER survives to go on to the next stage of booting the kernel, while the others are kept in WFI, waiting until FLAGS_SET gets updated.
So my question is: how can the bootwrapper hide the CPU in the other cluster from the kernel's view? I mean, how does the bootwrapper set that counterpart CPU aside, so that when a switch happens it can take over as the inbound CPU?
In my understanding of the kernel's SMP boot, it brings up all the cores later, so I cannot see where the kernel could power down the other cluster.
Could anyone help me out with this?
Thanks, Lei
On Wed, Aug 15, 2012 at 10:24:39PM +0800, Lei Wen wrote:
Hi Linaro,
I got confused recently while reading the switcher reference code: http://git.linaro.org/git/arm/big.LITTLE/switcher.git
What I cannot understand is the bootwrapper: that code runs on all CPUs, and then only the CPU with cpu_id=0 in the BOOT_CLUSTER survives to go on to the next stage of booting the kernel, while the others are kept in WFI, waiting until FLAGS_SET gets updated.
In this reference implementation, the switching is done by stand-alone code which acts as a hypervisor.
The hypervisor code gets booted on each CPU via this call in c_start():
enter_nonsecure_world((unsigned)bl_image);
... before they go to sleep in secondary_main().
I believe that the code assumes that the secondary cluster is initially held in reset (this is the default behaviour of the ARM fast model).
After booting, an OS can migrate logical CPUs from one cluster to another by using an HVC call to send a request to the hypervisor code.
The HVC dispatcher is in big-little/virtualisor/virt_handle.c. The HVC_SWITCHER_CLUSTER_SWITCH call is routed to big-little/switcher/async_switchover.c:signal_switchover(), which is responsible for the whole cluster switch, including any power-up or power-down of CPUs and clusters.
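To make the entry point concrete, here is a minimal sketch of what the guest-side request might look like. The call-number value and the register convention are placeholders, not the reference code's actual ABI -- that is defined by the dispatcher in virt_handle.c.

    /* Hedged sketch: issuing a cluster-switch request from the guest OS.
     * The call number passed in r0 is a placeholder; the real ABI is
     * whatever virt_handle.c expects. Must be built for ARMv7 with the
     * virtualisation extensions available. */
    static inline void request_cluster_switch(unsigned int call_nr)
    {
            register unsigned int r0 asm("r0") = call_nr;

            asm volatile(
                    ".arch_extension virt\n"
                    "hvc #0\n"              /* trap to the switcher hypervisor */
                    : "+r" (r0)
                    :
                    : "memory");
    }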
So my question is: how can the bootwrapper hide the CPU in the other cluster from the kernel's view? I mean, how does the bootwrapper set that counterpart CPU aside, so that when a switch happens it can take over as the inbound CPU?
In my understanding of the kernel's SMP boot, it brings up all the cores later, so I cannot see where the kernel could power down the other cluster.
I'm not sure that I understand your question fully.
In this kind of implementation, the hypervisor would be responsible for most aspects of power handling. The virtualisation support in the CPU allows CPUs to be hidden from Linux, and also allows CPU IDs and interrupt routing to be "faked" so that Linux does not see any change when it is migrated onto a different physical CPU by the hypervisor.
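As a concrete illustration of the ID-faking part (a sketch of the general mechanism, not the reference switcher's actual code): hypervisor-mode (PL2) code can program VMPIDR so that the guest's reads of MPIDR keep returning the same logical ID after a migration.

    /* Hedged sketch: PL2 code setting the MPIDR value the guest will observe.
     * After migrating the guest to a different physical CPU, the hypervisor
     * reloads VMPIDR with the old logical ID so that Linux sees no change. */
    static inline void set_guest_mpidr(unsigned int logical_mpidr)
    {
            /* VMPIDR: guest-visible MPIDR; writable only at PL2 */
            asm volatile("mcr p15, 4, %0, c0, c0, 5" : : "r" (logical_mpidr));
    }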
Linaro is currently working on a different implementation which gives Linux much more control over the whole process, and avoids the overheads associated with using virtualisation for this purpose, but the code is not ready to be published yet.
Does this answer your questions?
---Dave
Hi Dave,
On Thu, Aug 16, 2012 at 2:30 AM, Dave Martin dave.martin@linaro.org wrote:
On Wed, Aug 15, 2012 at 10:24:39PM +0800, Lei Wen wrote:
Hi Linaro,
I got confused recently while reading the switcher reference code: http://git.linaro.org/git/arm/big.LITTLE/switcher.git
What I cannot understand is the bootwrapper: that code runs on all CPUs, and then only the CPU with cpu_id=0 in the BOOT_CLUSTER survives to go on to the next stage of booting the kernel, while the others are kept in WFI, waiting until FLAGS_SET gets updated.
In this reference implementation, the switching is done by stand-alone code which acts as a hypervisor.
The hypervisor code gets booted on each CPU via this call in c_start():
enter_nonsecure_world((unsigned)bl_image);
... before they go to sleep in secondary_main().
I believe that the code assumes that the secondary cluster is initially held in reset (this is the default behaviour of the ARM fast model).
This solves my confusion. :) Since the secondary cluster is in reset, Linux would never see two CPUs in different clusters at the same time.
After booting, an OS can migrate logical CPUs from one cluster to another by using an HVC call to send a request to the hypervisor code.
The HVC dispatcher is in big-little/virtualisor/virt_handle.c. The HVC_SWITCHER_CLUSTER_SWITCH call is routed to big-little/switcher/async_switchover.c:signal_switchover(), which is responsible for the whole cluster switch, including any power-up or power-down of CPUs and clusters.
So my question is: how can the bootwrapper hide the CPU in the other cluster from the kernel's view? I mean, how does the bootwrapper set that counterpart CPU aside, so that when a switch happens it can take over as the inbound CPU?
In my understanding of the kernel's SMP boot, it brings up all the cores later, so I cannot see where the kernel could power down the other cluster.
I'm not sure that I understand your question fully.
In this kind of implementation, the hypervisor would be responsible for most aspects of power handling. The virtualisation support in the CPU allows CPUs to be hidden from Linux, and also allows CPU IDs and interrupt routing to be "faked" so that Linux does not see any change when it is migrated onto a different physical CPU by the hypervisor.
Linaro is currently working on a different implementation which gives Linux much more control over the whole process, and avoids the overheads associated with using virtualisation for this purpose, but the code is not ready to be published yet.
Is it feasible to disclose more details? What I can see is moving the hypervisor code into the kernel and installing it at boot time, like KVM currently does. But there is also heavy usage of secure monitor code in the switcher; does Linaro also want to optimize that out?
Does this answer your questions?
Yep, your answer really helped me. :)
---Dave
Thanks, Lei
On Thu, 16 Aug 2012, Lei Wen wrote:
Hi Dave,
On Thu, Aug 16, 2012 at 2:30 AM, Dave Martin dave.martin@linaro.org wrote:
In this kind of implementation, the hypervisor would be responsible for most aspects of power handling. The virtualisation support in the CPU allows CPUs to be hidden from Linux, and also allows CPU IDs and interrupt routing to be "faked" so that Linux does not see any change when it is migrated onto a different physical CPU by the hypervisor.
Linaro is currently working on a different implementation which gives Linux much more control over the whole process, and avoids the overheads associated with using virtualisation for this purpose, but the code is not ready to be published yet.
Is it feasible to disclose more details? What I can see is moving the hypervisor code into the kernel and installing it at boot time, like KVM currently does. But there is also heavy usage of secure monitor code in the switcher; does Linaro also want to optimize that out?
That depends on the firmware that comes with any given hardware implementation. If there is actually a Secure OS in that firmware then it is likely to assert control over all power up/down operations, in which case the SMCs can't be "optimized out".
But we can and did optimize out all the hypervisor code. You can have a look at this article for more details on the general design:
http://lwn.net/Articles/481055/
The code is available to Linaro members only for now and will be made public and submitted upstream in due course.
Nicolas
Hi Nico,
On Thu, Aug 16, 2012 at 12:10 PM, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Thu, 16 Aug 2012, Lei Wen wrote:
Hi Dave,
On Thu, Aug 16, 2012 at 2:30 AM, Dave Martin dave.martin@linaro.org wrote:
In this kind of implementation, the hypervisor would be responsible for most aspects of power handling. The virtualisation support in the CPU allows CPUs to be hidden from Linux, and also allows CPU IDs and interrupt routing to be "faked" so that Linux does not see any change when it is migrated onto a different physical CPU by the hypervisor.
Linaro is currently working on a different implementation which gives Linux much more control over the whole process, and avoids the overheads associated with using virtualisation for this purpose, but the code is not ready to be published yet.
Is it feasible to disclose more details? What I can see is moving the hypervisor code into the kernel and installing it at boot time, like KVM currently does. But there is also heavy usage of secure monitor code in the switcher; does Linaro also want to optimize that out?
That depends on the firmware that comes with any given hardware implementation. If there is actually a Secure OS in that firmware then it is likely to assert control over all power up/down operations, in which case the SMCs can't be "optimized out".
I see.
But we can and did optimize out all the hypervisor code. You can have a look at this article for more details on the general design:
Do you mean that Linaro's current code would not use the virtualisation extensions for the cluster switch? Or does it just put the hypervisor code inside the kernel?
http://lwn.net/Articles/481055/
Very good article indeed. :)
The code is available to Linaro members only for now and will be made public and submitted upstream in due course.
Nicolas
Thanks, Lei
On Thu, Aug 16, 2012 at 01:08:52PM +0800, Lei Wen wrote:
Hi Nico,
On Thu, Aug 16, 2012 at 12:10 PM, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Thu, 16 Aug 2012, Lei Wen wrote:
Hi Dave,
On Thu, Aug 16, 2012 at 2:30 AM, Dave Martin dave.martin@linaro.org wrote:
In this kind of implementation, the hypervisor would be responsible for most aspects of power handling. The virtualisation support in the CPU allows CPUs to be hidden from Linux, and also allows CPU IDs and interrupt routing to be "faked" so that Linux does not see any change when it is migrated onto a different physical CPU by the hypervisor.
Linaro is currently working on a different implementation which gives Linux much more control over the whole process, and avoids the overheads associated with using virtualisation for this purpose, but the code is not ready to be published yet.
Is it feasible to disclose more details? What I can see is moving the hypervisor code into the kernel and installing it at boot time, like KVM currently does. But there is also heavy usage of secure monitor code in the switcher; does Linaro also want to optimize that out?
That depends on the firmware that comes with any given hardware implementation. If there is actually a Secure OS in that firmware then it is likely to assert control over all power up/down operations, in which case the SMCs can't be "optimized out".
I see.
But we can and did optimize out all the hypervisor code. You can have a look at this article for more details on the general design:
Do you mean that Linaro's current code would not use the virtualisation extensions for the cluster switch? Or does it just put the hypervisor code inside the kernel?
We don't use the virtualisation extensions for this in our current code. It just becomes normal kernel code, analogous to subsystems like cpufreq and CPU hotplug.
Virtualisation is only really needed if we want to trick the OS into thinking that it is not really being migrated between different physical CPUs. This approach has the advantage that it can work with any OS, with no need for modifying the OS. But because we can modify Linux so that it understands and controls the switching, virtualisation is not needed.
This also makes it easier to use the virtualisation extensions for running true hypervisors like KVM, because we don't have to work out a way to let KVM and the switcher co-exist in hypervisor space.
Cheers ---Dave
http://lwn.net/Articles/481055/
Very good article indeed. :)
The code is available to Linaro members only for now and will be made public and submitted upstream in due course.
Nicolas
Thanks, Lei
Hi Dave,
[snip here]
We don't use the virtualisation extensions for this in our current code. It just becomes normal kernel code, analogous to subsystems like cpufreq and CPU hotplug.
Virtualisation is only really needed if we want to trick the OS into thinking that it is not really being migrated between different physical CPUs. This approach has the advantage that it can work with any OS, with no need for modifying the OS. But because we can modify Linux so that it understands and controls the switching, virtualisation is not needed.
This also makes it easier to use the virtualisation extensions for running true hypervisors like KVM, because we don't have to work out a way to let KVM and the switcher co-exist in hypervisor space.
An in-kernel implementation is a very elegant way to handle the coexistence of switching and KVM. :)
Since the in-kernel implementation uses a paired-CPU switching approach, I think it is closer to the big.LITTLE MP solution, which also has both the A7 and the A15 alive. The only difference here is that paired-CPU switching allows only one CPU in each pair to be alive. I don't know whether I understand it right: the system may run with both A7 and A15 present at the same time. Correct me if I am wrong. :)
So does this in-kernel implementation take into consideration the load-balancing issue also faced by the MP solution, given the difference in computing capability? Or does Linaro just do paired-CPU switching for all the A7/A15 pairs, which would mimic the cluster switching in ARM's reference code?
I am curious to know which implementation Linaro will finally choose. :) Many thanks for all the support.
Thanks, Lei
On Thu, Aug 23, 2012 at 04:51:45PM +0800, Lei Wen wrote:
Hi Dave,
[snip here]
We don't use the virtualisation extensions for this in our current code. It just becomes normal kernel code, analogous to subsystems like cpufreq and CPU hotplug.
Virtualisation is only really needed if we want to trick the OS into thinking that it is not really being migrated between different physical CPUs. This approach has the advantage that it can work with any OS, with no need for modifying the OS. But because we can modify Linux so that it understands and controls the switching, virtualisation is not needed.
This also makes it easier to use the virtualisation extensions for running true hypervisors like KVM, because we don't have to work out a way to let KVM and the switcher co-exist in hypervisor space.
An in-kernel implementation is a very elegant way to handle the coexistence of switching and KVM. :)
Since the in-kernel implementation uses a paired-CPU switching approach, I think it is closer to the big.LITTLE MP solution, which also has both the A7 and the A15 alive. The only difference here is that paired-CPU switching allows only one CPU in each pair to be alive. I don't know whether I understand it right: the system may run with both A7 and A15 present at the same time. Correct me if I am wrong. :)
Ignoring some implementation details, your understanding is correct:
With big.LITTLE MP and the switcher, the kernel has access to all the physical CPUs in the system.
In a sense, the switcher implements a particular policy for how the CPUs are used. big.LITTLE MP just gives all the CPUs to the kernel, but the switcher combines the physical CPUs into big+LITTLE pairs so that only one is running at any given time, and presents those logical paired CPUs to the rest of the kernel.
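A minimal sketch of how such a pairing could be represented (illustrative only; the names and layout are invented, not taken from the Linaro code):

    /* Hedged sketch: one way to represent the logical-to-physical pairing the
     * switcher maintains. Each logical CPU has a fixed big and LITTLE partner,
     * and only the one recorded in current_cluster is running at any time. */
    enum cluster { CLUSTER_BIG, CLUSTER_LITTLE };

    struct logical_cpu {
            unsigned int big_cpu;           /* physical CPU in the big cluster */
            unsigned int little_cpu;        /* physical CPU in the LITTLE cluster */
            enum cluster current_cluster;   /* which half of the pair is live now */
    };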
So does this in-kernel implementation take into consideration the load-balancing issue also faced by the MP solution, given the difference in computing capability? Or does Linaro just do paired-CPU switching for all the A7/A15 pairs, which would mimic the cluster switching in ARM's reference code?
With big.LITTLE MP, you have many CPUs with differing properties:
bbbbLLLL
whereas with the switcher, the kernel sees fewer CPUs, but they are identical:
ssss
The switcher logical CPUs can run either on the big or little cluster, but for scheduling purposes most of the kernel does not really need to understand this. big/LITTLE becomes an extra performance point parameter for each logical CPU, similar to frequency/voltage scaling. Because the switcher logical CPUs have identical properties, the kernel can treat them as identical for scheduling purposes. This means that the scheduler should work sensibly without any modifications.
Just as with frequency/voltage scaling, we can decide when to switch each CPU to big or little depending on how busy that CPU is. Note that the Linaro switcher implementation switches each CPU independently. It does not switch them all at the same time like the ARM reference switcher implementation does.
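As an illustration of the "decide per CPU" idea, here is a toy policy sketch building on the struct logical_cpu sketch above; the thresholds and the switch_to() helper are invented and not part of the Linaro switcher.

    /* Hedged sketch: a toy governor-style policy that switches each logical
     * CPU independently based on its own load. switch_to() is a hypothetical
     * helper standing in for whatever the real switcher backend provides. */
    static void evaluate_switch(struct logical_cpu *cpu, unsigned int load_pct)
    {
            if (load_pct > 80 && cpu->current_cluster == CLUSTER_LITTLE)
                    switch_to(cpu, CLUSTER_BIG);
            else if (load_pct < 20 && cpu->current_cluster == CLUSTER_BIG)
                    switch_to(cpu, CLUSTER_LITTLE);
    }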
The cost of this simplicity is that you can't run Linux on all the physical CPUs simultaneously. This means lower peak throughput and lower parallelism than is possible with big.LITTLE MP. But a similar level of powersaving should be achievable with both, because they both allow any physical CPU or cluster to be idled or turned off when the system is sufficiently idle.
For most use cases, the reduced throughput probably won't be an issue: the big cores have higher performance than the little cores anyway, so when the platform is running at full throttle, you get most of the theoretically possible system throughput even with the little cores turned off. Having more CPUs active also adds interprocessor/intercluster communication overheads, so you may still get a better power-performance tradeoff if the little cluster is simply turned off when you want to run the device as fast as possible.
I am curious to know which implementation Linaro will finally choose. :) Many thanks for all the support.
The main difference is that the switcher approach does not rely on experimental scheduler modifications, and the impact of the switcher on system behaviour and performance is better understood than for MP right now. Therefore it should be possible to have it working well sooner (and upstreamed sooner) than will be possible with b.L MP.
b.L MP is a more flexible and powerful approach, but this is expected to mature over a longer timescale. Linaro is interested in both.
Cheers ---Dave
Dave,
On Thu, Aug 23, 2012 at 8:47 PM, Dave Martin dave.martin@linaro.org wrote:
On Thu, Aug 23, 2012 at 04:51:45PM +0800, Lei Wen wrote:
Hi Dave,
[snip here]
We don't use the virtualisation extensions for this in our current code. It just becomes normal kernel code, analogous to subsystems like cpufreq and CPU hotplug.
Virtualisation is only really needed if we want to trick the OS into thinking that it is not really being migrated between different physical CPUs. This approach has the advantage that it can work with any OS, with no need for modifying the OS. But because we can modify Linux so that it understands and controls the switching, virtualisation is not needed.
This also makes it easier to use the virtualisation extensions for running true hypervisors like KVM, because we don't have to work out a way to let KVM and the switcher co-exist in hypervisor space.
An in-kernel implementation is a very elegant way to handle the coexistence of switching and KVM. :)
Since the in-kernel implementation uses a paired-CPU switching approach, I think it is closer to the big.LITTLE MP solution, which also has both the A7 and the A15 alive. The only difference here is that paired-CPU switching allows only one CPU in each pair to be alive. I don't know whether I understand it right: the system may run with both A7 and A15 present at the same time. Correct me if I am wrong. :)
Ignoring some implementation details, your understanding is correct:
With big.LITTLE MP and the switcher, the kernel has access to all the physical CPUs in the system.
In a sense, the switcher implements a particular policy for how the CPUs are used. big.LITTLE MP just gives all the CPUs to the kernel, but the switcher combines the physical CPUs into big+LITTLE pairs so that only one is running at any given time, and presents those logical paired CPUs to the rest of the kernel.
One question here: do we still need to bind CPUs with identical CPU IDs into a pair? I think only that way would the processor believe nothing needs to change. But I don't know whether the changed cluster ID would affect system processes or not, since it is not virtualised by a hypervisor as in ARM's reference code.
So does this in-kernel implementation take into consideration the load-balancing issue also faced by the MP solution, given the difference in computing capability? Or does Linaro just do paired-CPU switching for all the A7/A15 pairs, which would mimic the cluster switching in ARM's reference code?
With big.LITTLE MP, you have many CPUs with differing properties:
bbbbLLLL
whereas with the switcher, the kernel sees fewer CPUs, but they are identical:
ssss
The switcher logical CPUs can run either on the big or little cluster, but for scheduling purposes most of the kernel does not really need to understand this. big/LITTLE becomes an extra performance point parameter for each logical CPU, similar to frequency/voltage scaling. Because the switcher logical CPUs have identical properties, the kernel can treat them as identical for scheduling purposes. This means that the scheduler should work sensibly without any modifications.
Good abstraction! However, I cannot see why the kernel would still believe those logical CPUs have the same computing capability if the real CPUs running are bLbL. What I learned from SMP is that the kernel believes all CPUs have the same DMIPS. Does the logical CPU fake its DMIPS capability and report the same value to the kernel?
Just as with frequency/voltage scaling, we can decide when to switch each CPU to big or little depending on how busy that CPU is. Note that the Linaro switcher implementation switches each CPU independently. It does not switch them all at the same time like the ARM reference switcher implementation does.
Does the cpufreq driver need to consider whether switching clusters actually brings a power benefit? For example, the power consumption of bLLL may be higher than that of LLLL, since bLLL has both clusters powered on while LLLL only has one cluster running.
Anyway, switching independently provides more flexible user policy.
The cost of this simplicity is that you can't run Linux on all the physical CPUs simultaneously. This means lower peak throughput and lower parallelism than is possible with big.LITTLE MP. But a similar level of powersaving should be achievable with both, because they both allow any physical CPU or cluster to be idled or turned off when the system is sufficiently idle.
For most use cases, the reduced throughput probably won't be an issue: the big cores have higher performance than the little cores anyway, so when the platform is running at full throttle, you get most of the theoretically possible system throughput even with the little cores turned off. Having more CPUs active also adds interprocessor/intercluster communication overheads, so you may still get a better power-performance tradeoff if the little cluster is simply turned off when you want to run the device as fast as possible.
I am curious to know which implementation Linaro will finally choose. :) Many thanks for all the support.
The main difference is that the switcher approach does not rely on experimental scheduler modifications, and the impact of the switcher on system behaviour and performance is better understood than for MP right now. Therefore it should be possible to have it working well sooner (and upstreamed sooner) than will be possible with b.L MP.
b.L MP is a more flexible and powerful approach, but this is expected to mature over a longer timescale. Linaro is interested in both.
Yep, the scheduler modification is a tough task. :)
Cheers ---Dave
Thanks, Lei
On Thu, Aug 23, 2012 at 10:57:18PM +0800, Lei Wen wrote:
Dave,
On Thu, Aug 23, 2012 at 8:47 PM, Dave Martin dave.martin@linaro.org wrote:
On Thu, Aug 23, 2012 at 04:51:45PM +0800, Lei Wen wrote:
Hi Dave,
[snip here]
We don't use the virtualisation extensions for this in our current code. It just becomes normal kernel code, analogous to subsystems like cpufreq and CPU hotplug.
Virtualisation is only really needed if we want to trick the OS into thinking that it is not really being migrated between different physical CPUs. This approach has the advantage that it can work with any OS, with no need for modifying the OS. But because we can modify Linux so that it understands and controls the switching, virtualisation is not needed.
This also makes it easier to use the virtualisation extensions for running true hypervisors like KVM, because we don't have to work out a way to let KVM and the switcher co-exist in hypervisor space.
An in-kernel implementation is a very elegant way to handle the coexistence of switching and KVM. :)
Since the in-kernel implementation uses a paired-CPU switching approach, I think it is closer to the big.LITTLE MP solution, which also has both the A7 and the A15 alive. The only difference here is that paired-CPU switching allows only one CPU in each pair to be alive. I don't know whether I understand it right: the system may run with both A7 and A15 present at the same time. Correct me if I am wrong. :)
Ignoring some implementation details, your understanding is correct:
With big.LITTLE MP and the switcher, the kernel has access to all the physical CPUs in the system.
In a sense, the switcher implements a particular policy for how the CPUs are used. big.LITTLE MP just gives all the CPUs to the kernel, but the switcher combines the physical CPUs into big+LITTLE pairs so that only one is running at any given time, and presents those logical paired CPUs to the rest of the kernel.
One question here: do we still need to bind CPUs with identical CPU IDs into a pair? I think only that way would the processor believe nothing needs to change. But I don't know whether the changed cluster ID would affect system processes or not, since it is not virtualised by a hypervisor as in ARM's reference code.
In most places Linux only looks at a few bits of the MPIDR (the multiprocessor CPU id register). So most of the time Linux will ignore the cluster part of the ID anyway.
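For reference, a minimal sketch of reading the MPIDR on ARMv7 and splitting out the affinity fields (Aff0 is the CPU number within a cluster, Aff1 the cluster number); this is a generic illustration, not a quote of the kernel's code.

    /* Hedged sketch: reading MPIDR and extracting the two fields that matter
     * here. Aff0 = CPU within the cluster, Aff1 = cluster ID. */
    static inline unsigned int read_mpidr(void)
    {
            unsigned int mpidr;
            asm("mrc p15, 0, %0, c0, c0, 5" : "=r" (mpidr));
            return mpidr;
    }

    static inline unsigned int cpu_within_cluster(void)
    {
            return read_mpidr() & 0xff;             /* Aff0 */
    }

    static inline unsigned int cluster_id(void)
    {
            return (read_mpidr() >> 8) & 0xff;      /* Aff1 */
    }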
This will need to change when proper b.L MP support gets integrated. However, there is still no need to virtualise: if the switcher is being used, Linux is aware of it -- so we can modify the way Linux deduces the currently running CPU ID when the switcher is active.
Since we don't have to solve this problem just yet, I can't predict exactly what the implementation would look like.
So does this in-kernel implementation take into consideration the load-balancing issue also faced by the MP solution, given the difference in computing capability? Or does Linaro just do paired-CPU switching for all the A7/A15 pairs, which would mimic the cluster switching in ARM's reference code?
With big.LITTLE MP, you have many CPUs with differing properties:
bbbbLLLL
whereas with the switcher, the kernel sees fewer CPUs, but they are identical:
ssss
The switcher logical CPUs can run either on the big or little cluster, but for scheduling purposes most of the kernel does not really need to understand this. big/LITTLE becomes an extra performance point parameter for each logical CPU, similar to frequency/voltage scaling. Because the switcher logical CPUs have identical properties, the kernel can treat them as identical for scheduling purposes. This means that the scheduler should work sensibly without any modifications.
Good abstraction! However, I cannot see why the kernel would still believe those logical CPUs have the same computing capability if the real CPUs running are bLbL. What I learned from SMP is that the kernel believes all CPUs have the same DMIPS. Does the logical CPU fake its DMIPS capability and report the same value to the kernel?
No. The situation is exactly the same as for cpufreq. I'm not an expert on the way cpufreq works, but I believe the kernel only normally needs to know the _peak_ performance available on each CPU. Reducing the frequency or switching to little is a power-saving optimisation which occurs when a CPU is under-utilised. The peak potential capacity of the logical CPUs remains the same all the time -- it is the capacity of the big CPUs running at the highest frequency available.
This has some limitations -- the kernel should do a reasonable job, but proper MP scheduling can theoretically be better.
Just as with frequency/voltage scaling, we can decide when to switch each CPU to big or little depending on how busy that CPU is. Note that the Linaro switcher implementation switches each CPU independently. It does not switch them all at the same time like the ARM reference switcher implementation does.
Does the cpufreq driver need to consider whether switching clusters actually brings a power benefit? For example, the power consumption of bLLL may be higher than that of LLLL, since bLLL has both clusters powered on while LLLL only has one cluster running.
Anyway, switching independently provides more flexible user policy.
This would be something for the platform-specific cpufreq driver to decide.
The core switcher code simply provides the maximum flexibility, and the code which builds on top of that then implements whatever policy is appropriate for the platform.
For now, we just have a very simple cpufreq driver for test purposes which doesn't consider such issues.
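To illustrate the kind of decision a platform driver could make (purely illustrative -- this is not the Linaro test driver, and all names and numbers are invented): a driver might fold both clusters into one table of operating points per logical CPU, so the cluster choice falls out of the operating point selected.

    /* Hedged sketch: a combined operating-point table for one logical CPU.
     * Lower operating points run on the LITTLE cluster, higher ones on big.
     * The frequencies are placeholder values. */
    enum bl_cluster { BL_CLUSTER_LITTLE, BL_CLUSTER_BIG };

    struct bl_opp {
            unsigned int freq_khz;          /* frequency advertised to cpufreq */
            enum bl_cluster cluster;        /* physical cluster providing it */
    };

    static const struct bl_opp bl_opp_table[] = {
            {  350000, BL_CLUSTER_LITTLE },
            {  700000, BL_CLUSTER_LITTLE },
            { 1200000, BL_CLUSTER_BIG },
            { 1800000, BL_CLUSTER_BIG },    /* peak capacity seen by the kernel */
    };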
The cost of this simplicity is that you can't run Linux on all the physical CPUs simultaneously. This means lower peak throughput and lower parallelism than is possible with big.LITTLE MP. But a similar level of powersaving should be achievable with both, because they both allow any physical CPU or cluster to be idled or turned off when the system is sufficiently idle.
For most use cases, the reduced throughput probably won't be an issue: the big cores have higher performance than the little cores anyway, so when the platform is running at full throttle, you get most of the theoretically possible system throughput even with the little cores turned off. Having more CPUs active also adds interprocessor/intercluster communication overheads, so you may still get a better power-performance tradeoff if the little cluster is simply turned off when you want to run the device as fast as possible.
I am curious to know which implementation Linaro will finally choose. :) Many thanks for all the support.
The main difference is that the switcher approach does not rely on experimental scheduler modifications, and the impact of the switcher on system behaviour and performance is better understood than for MP right now. Therefore it should be possible to have it working well sooner (and upstreamed sooner) than will be possible with b.L MP.
b.L MP is a more flexible and powerful approach, but this is expected to mature over a longer timescale. Linaro is interested in both.
Yep, the scheduler modification is a tough task. :)
People are working on it -- it will just take some time to get it right.
Cheers ---Dave
On Thu, Aug 23, 2012 at 8:27 PM, Lei Wen adrian.wenl@gmail.com wrote:
Dave,
On Thu, Aug 23, 2012 at 8:47 PM, Dave Martin dave.martin@linaro.org wrote:
<snip>
The switcher logical CPUs can run either on the big or little cluster, but for scheduling purposes most of the kernel does not really need to understand this. big/LITTLE becomes an extra performance point parameter for each logical CPU, similar to frequency/voltage scaling. Because the switcher logical CPUs have identical properties, the kernel can treat them as identical for scheduling purposes. This means that the scheduler should work sensibly without any modifications.
Good abstraction! However, I cannot see why the kernel would still believe those logical CPUs have the same computing capability if the real CPUs running are bLbL. What I learned from SMP is that the kernel believes all CPUs have the same DMIPS. Does the logical CPU fake its DMIPS capability and report the same value to the kernel?
At the scheduler level, a CPU's processing capacity (its MIPS) is represented by a normalised abstraction called cpu_power. And we now have the ability to change that through the device tree (or at runtime) if needed.
See SHA-ID 130d9aabf997bd8449ff4e877fe3c42df066805e in 3.6-rc or https://lkml.org/lkml/2012/6/12/202
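A minimal sketch of the kind of hook involved, assuming the 3.x-era arch_scale_freq_power() override and the SCHED_POWER_SCALE normalisation; the per-CPU values are placeholders, and the real code (per the commit above) derives them from device tree data.

    /* Hedged sketch: arch code reporting per-CPU capacity to the scheduler.
     * SCHED_POWER_SCALE (1024) is one standard CPU's worth of capacity; the
     * bl_cpu_power[] values here are invented placeholders. */
    #include <linux/sched.h>

    static unsigned long bl_cpu_power[NR_CPUS];     /* e.g. 1024 for big, less for LITTLE */

    unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu)
    {
            return bl_cpu_power[cpu] ? bl_cpu_power[cpu] : SCHED_POWER_SCALE;
    }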