hi Nico,
For power management on upcoming ARMv8 big.LITTLE systems, and on current ARM64 multi-core systems, we could use MCPM as well, so we would like to confirm: is there a plan to support the MCPM framework on ARM64 too?
On Fri, Aug 30, 2013 at 2:36 PM, Leo Yan leoy@marvell.com wrote:
[...]
Hi Leo,
An MCPM port to ARM64 is still being discussed. ARM expects PSCI to provide the necessary callbacks in firmware for ARM64. I've cc'ed Charles who is coordinating this from ARM.
Regards, Amit
On 08/30/2013 05:32 PM, Amit Kucheria wrote:
[...]
Hi Leo,
An MCPM port to ARM64 is still being discussed. ARM expects PSCI to provide the necessary callbacks in firmware for ARM64. I've cc'ed Charles who is coordinating this from ARM.
hi Amit,
Thanks for the quick response.
I want to confirm: if we use PSCI, then the core's power state machine will be maintained in PSCI firmware rather than in the kernel, right? If so, will ARM suggest PSCI as the default power management option for ARM64?
Though PSCI firmware would keep the kernel code clean, MCPM has one obvious benefit: the code is maintained in the kernel rather than in a dedicated firmware, and it is quite similar to our previous power management implementation.
Thx, Leo Yan
On Fri, Aug 30, 2013 at 3:37 PM, Leo Yan leoy@marvell.com wrote:
[...]
I want to confirm: if we use PSCI, then the core's power state machine will be maintained in PSCI firmware rather than in the kernel, right? If so, will ARM suggest PSCI as the default power management option for ARM64?
That is correct.
Though PSCI firmware would keep the kernel code clean, MCPM has one obvious benefit: the code is maintained in the kernel rather than in a dedicated firmware, and it is quite similar to our previous power management implementation.
Yes, that is what the discussion is about. :) There is also the question of schedules.
One approach might be to have a light-weight PSCI backend for MCPM in the kernel that allows us to directly call PSCI instead of using MCPM heuristics. I'll let someone from ARM state their position here.
Regards, Amit
On Fri, Aug 30, 2013 at 11:20:42AM +0100, Amit Kucheria wrote:
[...]
Yes, that is what the discussion is about. :) There is also the question of schedules.
One approach might be to have a light-weight PSCI backend for MCPM in the kernel that allows us to directly call PSCI instead of using MCPM heuristics. I'll let someone from ARM state their position here.
My position for arm64 kernel support is to use PSCI and implement the cluster power synchronisation in the firmware. IOW, no MCPM in the arm64 kernel :(. To help with this, ARM is going to provide a generic firmware implementation that SoC vendors can extend for their needs.
I am open for discussing a common API that could be shared between MCPM-based code and the PSCI one. But I'm definitely not opting for a light-weight PSCI back-end to a heavy-weight MCPM implementation.
Also note that IKS won't be supported on arm64.
On Fri, 30 Aug 2013, Catalin Marinas wrote:
[...]
I find the above statements a bit rigid.
While I think the reasoning behind PSCI is sound, I suspect some people will elect not to add it to their system, either because the hardware doesn't support all the necessary privilege levels, or simply because they prefer the easiest solution in terms of maintenance and upgradability, which means putting the kernel in charge. And that may imply MCPM. I know that ARM would like to see PSCI adopted everywhere, but I doubt it'll be easy.
In other words, if someone does the work to port MCPM to ARM64 and properly abstract the common parts with ARM32 then I don't see why you should refuse merging it.
That being said, it is normally those people who need it who should put resources forward to do that work. Linaro members needed it on ARM32 and this is why this work came out of Linaro initially.
Nicolas
Hi Nicolas, one of the ideas behind PSCI and the Generic Firmware, apart from the obvious consolidation of methods, is to ensure that partners have a framework for future expansion as the functionality of ARM IP increases.
With a common strategy we are much better placed to pipeline new functionality and other features into the software ecosystem.
We absolutely would never discourage innovation, and perhaps alternatives, but these will always lag what ARM is supporting, purely because they will need any new IP to be public, whereas we can work on the Generic FW ahead of the curve.
So we're not being harsh, but simply acknowledging how the product pipeline works.
Hope this helps
Roger
Sent from yet another low power ARM device
On 31 Aug 2013, at 03:21, "Nicolas Pitre" nicolas.pitre@linaro.org wrote:
[...]
linaro-kernel mailing list linaro-kernel@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-kernel
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
hi,
I'm interested in the light-weight PSCI backend for MCPM that Amit mentioned. When will ARM release the related code to the community so that we can get to know it better?
Just like Nico said, PSCI is a common framework, and a common framework will take a relatively long time to be adopted, not only by ARM but also by the SoC companies. So if we can get more detailed information about PSCI's schedule, we can decide whether to follow PSCI in our own product, or to use MCPM for near-term products and switch to PSCI once it is ready.
Thx, Leo Yan
On 08/31/2013 10:30 PM, Roger Teague wrote:
[...]
Hi there,
PSCI as a specification is available to partners via the drop zone. If you contact your partner manager, they will be able to help you.
In terms of the Generic Firmware, this is in development now and binaries are available; a complete implementation will be released later this year.
As Catalin stated, there are no plans for a light-weight back-end for MCPM.
Thanks
Sent from yet another low power ARM device
On 2 Sep 2013, at 06:17, "Leo Yan" leoy@marvell.com wrote:
[...]
On Mon, Sep 02, 2013 at 06:49:00AM +0100, Roger Teague wrote:
PSCI as a specification is available to partners via drop zone. If you contact your partner manager they will be able to help you.
For the record, PSCI is actually public now (I think since July). A click-through is probably required (like other documents on infocenter):
http://infocenter.arm.com/help/topic/com.arm.doc.den0022b/index.html
On 09/02/2013 04:24 PM, Catalin Marinas wrote:
[...]
Thanks for the pointer.
Hi Nico,
On Sat, Aug 31, 2013 at 03:19:46AM +0100, Nicolas Pitre wrote:
[...]
I find the above statements a bit rigid.
While I think the reasoning behind PSCI is sound, I suspect some people will elect not to add it to their system, either because the hardware doesn't support all the necessary privilege levels, or simply because they prefer the easiest solution in terms of maintenance and upgradability, which means putting the kernel in charge. And that may imply MCPM. I know that ARM would like to see PSCI adopted everywhere, but I doubt it'll be easy.
I agree PSCI is not always possible because of technical reasons (missing certain exception levels), so let me refine the above statement: if EL3 mode is available on a CPU implementation, PSCI must be supported (which I think is the case for Marvell).
It indeed sounds rigid but what's the point of trying to standardise something if you don't enforce it? People usually try to find the easiest path for them which may not necessarily be the cleanest longer term. As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
In other words, if someone does the work to port MCPM to ARM64 and properly abstract the common parts with ARM32 then I don't see why you should refuse merging it.
I don't have a problem with MCPM itself. It is nicely written and can probably be abstracted into a library. My concern is how MCPM is going to be used by the SoC code. The example I have so far is dcscb.c, and I really want such code moved out of the kernel into firmware (or, even better, into hardware control where available). It is highly dependent on the SoC and CPU implementation, setting/clearing bits like ACTLR.SMP, which is not possible in non-secure mode. I'm pretty sure that such code will end up doing non-standard SMC calls into the secure firmware, and we lose any hope of standardisation.
The PSCI API can evolve in time (it has version information), but so far there is no such thing as a "lightweight" PSCI that could be used as an MCPM back-end. For example, MCPM provides callbacks into the platform code when a CPU goes down, to disable coherency, flush caches etc., and this code must call back into MCPM to complete the CPU tear-down. If you want such a thing, you need a different PSCI specification.
Hi
I think there are a few points being missed in this discussion:
1. If you want to implement a product that supports the secure and non-secure worlds, and optionally virtualisation, you are always going to need something like PSCI. You can work around it and implement something else (e.g. MCPM), but you'd essentially be fixing the same problem in new ways, IOW replicating work.
2. When you go to ARMv8 you have to write secure-world firmware anyway. You can't just reuse what existed in ARMv7. EL3 needs to run in AArch64, and in addition it is a whole execution environment, not just a processor mode.
Additional comments below
Cheers
Charles
-----Original Message----- From: Catalin Marinas [mailto:catalin.marinas@arm.com] Sent: 02 September 2013 09:17 To: Nicolas Pitre Cc: Amit Kucheria; Leo Yan; Charles Garcia-Tobin; linaro- kernel@lists.linaro.org; Zhou Zhu; Chao Xie; Yu Tang; Neil Zhang; Mingliang Hu; Mark Hambleton; Christian Daudt Subject: Re: [Question] MCPM Supporting For ARM64
[...]
I agree PSCI is not always possible because of technical reasons (missing certain exception levels), so let me refine the above statement: if EL3 mode is available on a CPU implementation, PSCI must be supported (which I think is the case for Marvell).
Also worth pointing out that PSCI works just as happily at the EL1<->EL2 boundary. It's an easily virtualisable interface.
It indeed sounds rigid but what's the point of trying to standardise something if you don't enforce it? People usually try to find the easiest path for them which may not necessarily be the cleanest longer term. As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
I think to be honest most of the commentary is derived from the fact that all of this is a "change". Nobody likes change, and there is some pain involved. But the point is that ARMv8 mandates change anyway. Doing it this way will lead to more standard methodologies and less change in the future. Compare this to what we have today in ARMv7 where everybody does their own thing.
[...]
The PSCI API can evolve in time (it has version information) but so far there is no such thing as "lightweight" PSCI that could be used as an MCPM back-end. For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
If you could work around that, MCPM on a PSCI implementation would effectively be a nop layer that does nothing but forward calls on to PSCI. I don't see great value in that.
On Mon, 2 Sep 2013, Catalin Marinas wrote:
(sorry if you get a duplicate email, the SMTP server I was using keeps reporting delays, so I'm resending with different settings)
This is the first time I see your reply.
[...]
I agree PSCI is not always possible because of technical reasons (missing certain exception levels), so let me refine the above statement: if EL3 mode is available on a CPU implementation, PSCI must be supported (which I think is the case for Marvell).
I may agree to that.
It indeed sounds rigid but what's the point of trying to standardise something if you don't enforce it? People usually try to find the easiest path for them which may not necessarily be the cleanest longer term.
I'll say this only once, and then I'll shut up on this very issue.
<<< Beginning of my editorial on firmware >>>
The firmware _interface_ standard is not a problem, so this is not about PSCI in particular. I'm glad I was amongst the early reviewers of the specification, and therefore I would be the first to blame if the interface were wrong. In fact there was something I objected to in the very first draft, and I'm glad it was removed from subsequent revisions. This is just to say that I was involved with PSCI to some degree.
The problem is in the _implementation_ of the firmware. Or more precisely its _maintenance_.
Because, as experience has shown, firmware is always *bad* and *buggy* at some point. No matter what you say, someone somewhere will screw it up. And we'll be stuck with it.
You only need to look at the abomination the BIOS on X86 can be. Or buggy ACPI tables requiring all sorts of quirks in the kernel. Or the catastrophe the APM BIOS was before it was later replaced by ACPI.
You might say: this is X86 and on ARM we can do better. Yeah... well... Look at our bootloader story for example. How many people are simply unable to upgrade their bootloader in order to be DT compliant? How many OEMs are too scared to let end users upgrade their bootloaders?
Upgrading system firmware is not much different from upgrading a bootloader. Something goes wrong during the upgrade and you have a nice paperweight. So OEMs simply won't be forthcoming with firmware upgrades as this is a huge risk with significant support costs when this goes wrong. And if the firmware is responsible for the secure world as well then you have QA and certification costs to consider, etc.
And even if the upgrade is easy, the OEM might not necessarily be that interested in fixing buggy firmware, for many reasons. A good example is the very Versatile Express for which we (Linaro) reported issues with the MMC slot many months ago and no fix has come back yet:
https://bugs.launchpad.net/linaro-big-little-system/+bug/1166246
Achin was experimenting with a modified firmware before priorities and people at ARM have moved around and then nothing. OEMs will have priority shifts as well -- this is the hard reality of this business. This means unfixed firmware for older devices that users can't fix unlike with kernel solutions.
You might say: We'll then help people make their firmware perfect from the start. Well... I have enough software engineering experience to know this is simply impossible. Software by definition is always going to have bugs. Especially in the market ARM is after where time to market is paramount. So pressure to have the software done quickly is real. And even if you provide an example implementation for customers to use and extend, then they'll extend it all right and introduce new bugs, possibly even breaking the previously working generic code.
Heck, I've privately reviewed a few MCPM backends so far, and they all had serious flaws. The kernel has the advantage that any extension is reviewed by people from the outside. Example firmware being extended by ARM partners will not be. And I don't see why those people wouldn't make the same kind of mistakes in their fork of the generic firmware implementation. This stuff is simply too complex to always get right on the first attempt, even when testing suggests it is.
So no, I don't have this faith in firmware like you do. And I've discussed this with you in the past as well, and my mind on this issue has not changed. In particular, when you give a presentation on ARMv8 and your answer to a question from the audience is: "firmware will take care of this" I may only think you'll regret this answer one day.
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
Firmware calls such as provided by the PC BIOS, APM, UEFI runtime services or PSCI are black boxes you have to trust while the code being called doesn't know what is really happening on the system.
Oh and by the way people are now realizing all the issues with DT where the maintainers want only stable final bindings but developers can't tell when a binding is final unless it has seen wide testing. So DT is not the greatest thing since sliced bread anymore.
But let's get back to firmware. Delegating power management to firmware is a completely different type of contract. In that case you're moving critical functionality with potentially significant algorithmic complexity (as evidenced by the MCPM development effort) out of the kernel's control. The more functionality and complexity you delegate to the firmware, the greater the risk you'll end up with a crippled system the kernel will have to cope with eventually. Because this will happen at some point, there is no doubt about that.
Now what is the piece of software people are not afraid of upgrading? Where is it easier to upgrade functionality beyond the initial shipping state because time to market pressure drove products on the street before the software was ready? Yes, it is the kernel. And that's exactly what IKS was about: being able to drive a big.LITTLE system early on before the scheduler was properly enhanced for big.LITTLE. And only a simple kernel upgrade is needed to move from IKS to full MP.
Think for a minute about when the kernel used to call the APM BIOS for power management in the old days, and then the kernel was enhanced to be preemptive. APM was not re-entrant and therefore conceptually incompatible with this kernel context at the time. Obviously things were even more difficult when SMP showed up. And just ask the X86 maintainers what they think about the firmware induced NMIs *today*.
So, is e.g. PSCI compatible with Linux-RT now? Is there another kernel execution model innovation coming up for which the firmware will be an obstacle on ARM?
I do understand that the way the ARM architecture is designed, especially in its latest revisions with TZ and all, does require some sort of firmware. And since there is no other way but to have it, then of course a standardized firmware interface is a must. I'm not disputing this at all.
I'm just extremely worried about the simple presence of firmware and the need to call into it for complex operations such as intelligent power management when the kernel would be a much better location to perform such operations. The kernel is also the only place where those things may be improved over time, even for free from third parties, whereas the firmware is going to be locked down with no possible evolution beyond its shipping state.
For example, after working on MCPM for a while, I do have ideas for additional performance and power usage optimization. But if that functionality is getting burned into firmware controlled by secure world then there is just no hope for _me_ to optimize things any further.
Not that I can do anything about it anyway. But I had to vent my discomfort about anything firmware-like. Yet I was the one who suggested to Will Deacon and Marc Zyngier they should use PSCI to control KVM instances instead of their ad hoc interface. However, if someone has a legitimate case for _not_ using firmware calls and for using machine-specific extensions in the kernel then I'll support them.
<<< End of my editorial on firmware >>>
I don't have a problem with MCPM itself. It is nicely written and can probably be abstracted into a library. My concern is how MCPM is going to be used by the SoC code. The example I have so far is dcscb.c and I really want such code moved out of the kernel into firmware (or even better, under hardware control if available). It is highly dependent on the SoC and CPU implementation, setting/clearing bits like ACTLR.SMP, which is not possible in non-secure mode. I'm pretty sure that such code will end up doing non-standard SMC calls to the secure firmware and we lose any hope of standardisation.
Standardization and innovation are often opposing each other. And yes that sucks.
The PSCI API can evolve in time (it has version information) but so far there is no such thing as "lightweight" PSCI that could be used as an MCPM back-end.
That exists still, written by Achin Gupta. His 6th version is sitting in my mbox and is dated 12 Mar 2013.
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
Nicolas
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote:
On Mon, 2 Sep 2013, Catalin Marinas wrote:
(sorry if you get a duplicate email, the SMTP server I was using keeps reporting delays, so I'm resending with different settings)
This is the first time I see your reply.
I should have said "sorry for the future duplicate email that you may receive" (we try to work around IT here but sometimes they break our back doors ;)).
On Sat, Aug 31, 2013 at 03:19:46AM +0100, Nicolas Pitre wrote:
On Fri, 30 Aug 2013, Catalin Marinas wrote:
My position for the arm64 kernel support is to use the PSCI and implement the cluster power synchronisation in the firmware. IOW, no MCPM in the arm64 kernel :(. To help with this, ARM is going to provide a generic firmware implementation that SoC vendors can expand for their needs.
I am open for discussing a common API that could be shared between MCPM-based code and the PSCI one. But I'm definitely not opting for a light-weight PSCI back-end to a heavy-weight MCPM implementation.
Also note that IKS won't be supported on arm64.
I find the above statements a bit rigid.
While I think the reasoning behind PSCI is sound, I suspect some people will elect not to add it to their system. Either because the hardware doesn't support all the necessary privilege levels, or simply because they prefer the easiest solution in terms of maintenance and upgradability, which means putting the kernel in charge. And that may imply MCPM. I know that ARM would like to see PSCI adopted everywhere but I doubt it'll be easy.
I agree PSCI is not always possible because of technical reasons (missing certain exception levels), so let me refine the above statement: if EL3 mode is available on a CPU implementation, PSCI must be supported (which I think is the case for Marvell).
I may agree to that.
It indeed sounds rigid but what's the point of trying to standardise something if you don't enforce it? People usually try to find the easiest path for them which may not necessarily be the cleanest longer term.
I'll say this only once, and then I'll shut up on this very issue.
I actually enjoy this discussion ;)
<<< Beginning of my editorial on firmware >>>
...
So no, I don't have this faith in firmware like you do. And I've discussed this with you in the past as well, and my mind on this issue has not changed. In particular, when you give a presentation on ARMv8 and your answer to a question from the audience is: "firmware will take care of this" I may only think you'll regret this answer one day.
Well, I don't trust firmware either and I've seen it causing hard to debug issues in the past. But actually a better statement would be that I don't trust any software (not even Linux) unless I can see the source. The big advantage Linux has is that it requires (well, unless you are NVidia ;)) opening the code and more people look at it.
My hope for the generic firmware is that it will turn into a proper open-source project (with a friendly license). This would make it different from old BIOS implementations.
Of course, bugs can happen and the firmware is harder to update but really not impossible.
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
The comparison is not meant as DT vs PSCI but as an example that even though DT has benefits and fewer risks, people didn't rush into adopting it unless it was mandated for new platforms. For arm64 we try not to get SoC code under arch/arm64/ (mach-virt like approach) but I still get people asking in private about copying code into arch/arm64/mach-* directories for the same easy path reasons.
But let's get back to firmware. Delegating power management to firmware is a completely different type of contract. In that case you're moving critical functionality with potentially significant algorithmic complexity (as evidenced by the MCPM development effort) out of the kernel's control. The more functionality and complexity you delegate to the firmware, the greater the risk you'll end up with a crippled system the kernel will have to cope with eventually. Because this will happen at some point, there is no doubt about that.
I agree with your arguments that Linux is more flexible and easily upgradable. However, the point I want to emphasize is that unless Linux is playing all the roles of firmware/secure/non-secure code, you must have firmware and calls to it from the non-secure OS. On ARMv8, EL3 is the first mode the CPU runs in out of reset, and it needs to get back to this mode for power-related (e.g. coherency) settings. Whether we (Linux people) like it or not, that's the reality.
How much you leave on the secure or the non-secure side goes through a great deal of discussions/reviews both with the general purpose OS and secure OS people, hence the creation of PSCI (really, it's not just about standardising the SoC support on arm64 Linux).
MCPM is well suited if you don't have a trusted execution environment but once you do, the secure OS could have the same needs as the non-secure one in terms of CPU availability. Would it trust the non-secure OS to handle the cluster shutdown, coherency? I doubt it.
Think for a minute about when the kernel used to call the APM BIOS for power management in the old days, and then the kernel was enhanced to be preemptive. APM was not re-entrant and therefore conceptually incompatible with this kernel context at the time. Obviously things were even more difficult when SMP showed up. And just ask the X86 maintainers what they think about the firmware induced NMIs *today*.
So, is e.g. PSCI compatible with Linux-RT now?
I think it is as compatible as MCPM. What you need for RT is a bounded time for going down into and coming back from a lower power state. On ARM and other architectures this involves firmware (at least after reset), so it is not specific to PSCI. Those time bounds could be provided by the SoC vendor for their firmware (whether you use MCPM or PSCI). If you don't like them, you can simply use WFI; PSCI does not mandate a call for such an idle state (I'm not sure whether APM required this in the past), only for deeper sleep states.
Is there another kernel execution model innovation coming up for which the firmware will be an obstacle on ARM?
Firmwares evolve in time but as I said, even if we would like to, we can't eliminate them.
I'm just extremely worried about the simple presence of firmware and the need to call into it for complex operations such as intelligent power management when the kernel would be a much better location to perform such operations. The kernel is also the only place where those things may be improved over time, even for free from third parties, whereas the firmware is going to be locked down with no possible evolution beyond its shipping state.
For example, after working on MCPM for a while, I do have ideas for additional performance and power usage optimization. But if that functionality is getting burned into firmware controlled by secure world then there is just no hope for _me_ to optimize things any further.
I don't dispute the above, but I don't have a better solution that would accommodate the secure/non-secure worlds. With proper education, SoC vendors can learn to allow upgradable (parts of) firmware. But if you have a better proposal and can get all the parties (including the secure OS people) to agree, I'm open to it.
Not that I can do anything about it anyway. But I had to vent my discomfort about anything firmware-like. Yet I was the one who suggested to Will Deacon and Marc Zyngier they should use PSCI to control KVM instances instead of their ad hoc interface. However, if someone has a legitimate case for _not_ using firmware calls and for using machine-specific extensions in the kernel then I'll support them.
Legitimate cases, yes (like no EL3). But if people would like MCPM just for temporary kernel bring-up, I don't agree with that. They start with a simple MCPM back-end and later realise that they need Linux to run in non-secure mode (e.g. to get EL2, or just because they get a secure OS), so they modify the MCPM back-end for (non-standard) secure calls. Some time later they decide they need a secure OS (e.g. for mobile payments) and yet again they modify their back-end, possibly ignoring MCPM because the secure OS asks them to handle such a state machine at EL3. In Linux we end up carrying all those intermediate implementations.
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
    dcscb_power_down()
        flush_cache_all()               - both secure and non-secure
        set_auxcr()                     - secure-only
        cci_disable_port_by_cpu()       - secure-only?
        (__mcpm_outbound_leave_critical())
        __mcpm_cpu_down()
        wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
Hi
In addition to the points raised by Catalin, I think it's worth re-emphasising the background and reasoning behind PSCI to balance this argument. Products that end up in people's hands sport trusted FW and a trusted OS. The kind of power, cache and coherency management that PSCI calls for happens today, on real devices, all the time; everybody just does it in their own individual way. It's messy and counterproductive. PSCI gives that a more formal structure. Ultimately this should actually ease things rather than make them more complex. Having a generic interface has obvious advantages in terms of integration and testing/validation.
In addition, PSCI, by virtue of being generic, is easily virtualisable, and in an OS-independent way: you can have any combination of guest/host/hyp OSs. An OS-specific power management API would prevent this. Clearly, having a generic interface also allows you to reduce platform-specific code in the kernel, helping to get you that single kernel image.
I think a lot of people have been bitten in the past by fixed, unmovable FW, particularly on devboards, which do not reflect real shipping devices. But it's in no semiconductor's or OEM's interest to release FW that is crap on a device that ships to end users. Going back to my first point, which Catalin also makes below: FW like this already exists today in shipping devices, with each manufacturer doing their own individual API, and people have to fix it where necessary, today.
Cheers
Charles
-----Original Message----- From: Catalin Marinas [mailto:catalin.marinas@arm.com] Sent: 03 September 2013 17:28 To: Nicolas Pitre Cc: Amit Kucheria; Leo Yan; Charles Garcia-Tobin; linaro- kernel@lists.linaro.org; Zhou Zhu; Chao Xie; Yu Tang; Neil Zhang; Mingliang Hu; Mark Hambleton; Christian Daudt Subject: Re: [Question] MCPM Supporting For ARM64
...
For the record, I want to state that I agree with everything being said below. I hope it is clear by now that my discomfort about firmware lies elsewhere.
On Tue, 3 Sep 2013, Charles Garcia-Tobin wrote:
Hi
In addition to the points raised by Catalin, and I think it's worth reemphasising the background and reasoning behind PSCI to balance this argument. Products that end up in people's hands sport trusted FW and trusted OS. The kind of power management, cache and coherency management that PSCI calls for happens today, on real devices all the time. However everybody does it in their own individual way. It's messy and counterproductive. PSCI gives that a more formal structure. Ultimately this should actually ease things rather than make them more complex. Having a generic interface has obvious advantages in terms of integration, and testing/validation.
In addition PSCI, by virtue of being generic, is easily virtualisable. That is an OS independent virtualisation, you can have any combination of guest/host/hyp OSs. Having an OS specific power management API would prevent this. Clearly having a generic interface also allows you to reduce platform specific code in the kernel, helping to get you that single kernel image.
I think a lot of people have been bitten in the past by fixed, unmovable FW, particularly on devboards, which do not reflect real shipping devices. But it's in no semiconductor's/OEM's interest to release FW that is crap on a device that ships to end users. Going back to my first point, which Catalin also makes below, FW like this already exists in shipping devices today, with each manufacturer doing their own individual API, and people have to fix it where necessary, today.
Cheers
Charles
-----Original Message-----
From: Catalin Marinas [mailto:catalin.marinas@arm.com]
Sent: 03 September 2013 17:28
To: Nicolas Pitre
Cc: Amit Kucheria; Leo Yan; Charles Garcia-Tobin; linaro-kernel@lists.linaro.org; Zhou Zhu; Chao Xie; Yu Tang; Neil Zhang; Mingliang Hu; Mark Hambleton; Christian Daudt
Subject: Re: [Question] MCPM Supporting For ARM64
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote:
On Mon, 2 Sep 2013, Catalin Marinas wrote:
(sorry if you get a duplicate email, the SMTP server I was using keeps reporting delays, so I'm resending with different settings)
This is the first time I see your reply.
I should have said "sorry for the future duplicate email that you may receive" (we try to work around IT here but sometimes they break our back doors ;)).
On Sat, Aug 31, 2013 at 03:19:46AM +0100, Nicolas Pitre wrote:
On Fri, 30 Aug 2013, Catalin Marinas wrote:
My position for the arm64 kernel support is to use the PSCI and implement the cluster power synchronisation in the firmware. IOW, no MCPM in the arm64 kernel :(. To help with this, ARM is going to provide a generic firmware implementation that SoC vendors can expand for their needs.
I am open for discussing a common API that could be shared between MCPM-based code and the PSCI one. But I'm definitely not opting for a light-weight PSCI back-end to a heavy-weight MCPM implementation.
Also note that IKS won't be supported on arm64.
I find the above statements a bit rigid.
While I think the reasoning behind PSCI is sound, I suspect some people will elect not to add it to their system, either because the hardware doesn't support all the necessary privilege levels, or simply because they prefer the easiest solution in terms of maintenance and upgradability, which means putting the kernel in charge. And that may imply MCPM. I know that ARM would like to see PSCI adopted everywhere but I doubt it'll be easy.
I agree PSCI is not always possible because of technical reasons (missing certain exception levels), so let me refine the above statement: if EL3 mode is available on a CPU implementation, PSCI must be supported (which I think is the case for Marvell).
I may agree to that.
It indeed sounds rigid but what's the point of trying to standardise something if you don't enforce it? People usually try to find the easiest path for them which may not necessarily be the cleanest longer term.
I'll say this only once, and then I'll shut up on this very issue.
I actually enjoy this discussion ;)
<<< Beginning of my editorial on firmware >>>
...
So no, I don't have this faith in firmware like you do. And I've discussed this with you in the past as well, and my mind on this issue has not changed. In particular, when you give a presentation on ARMv8 and your answer to a question from the audience is: "firmware will take care of this" I may only think you'll regret this answer one day.
Well, I don't trust firmware either and I've seen it causing hard to debug issues in the past. But actually a better statement would be that I don't trust any software (not even Linux) unless I can see the source. The big advantage Linux has is that it requires (well, unless you are NVidia ;)) opening the code and more people look at it.
My hope for the generic firmware is that it will turn into a proper open-source project (with a friendly license). This would make it different from old BIOS implementations.
Of course, bugs can happen and the firmware is harder to update but really not impossible.
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
The comparison is not meant for DT vs PSCI but an example that even though DT has benefits and less risks, people didn't rush into adopting it unless it was mandated for new platforms. For arm64 we try not to get SoC code under arch/arm64/ (mach-virt like approach) but I still get people asking in private about copying code into arch/arm64/mach-* directories for the same easy path reasons.
But let's get back to firmware. Delegating power management to firmware is a completely different type of contract. In that case you're moving critical functionality with potentially significant algorithmic complexity (as evidenced by the MCPM development effort) out of the kernel's control. The more functionality and complexity you delegate to the firmware, the greater the risk you'll end up with a crippled system the kernel will have to cope with eventually. Because this will happen at some point, there is no doubt about that.
I agree with your arguments that Linux is more flexible and easily upgradable. However, the point I want to emphasize is that unless Linux is playing all the roles of firmware/secure/non-secure code, you must have firmware and calls to it from the non-secure OS. On ARMv8, EL3 is the first mode the CPU runs in out of reset, and it needs to get back to this mode for power-related (e.g. coherency) settings. Whether we (Linux people) like it or not, that's the reality.
How much you leave on the secure or the non-secure side goes through a great deal of discussions/reviews both with the general purpose OS and secure OS people, hence the creation of PSCI (really, it's not just about standardising the SoC support on arm64 Linux).
MCPM is well suited if you don't have a trusted execution environment but once you do, the secure OS could have the same needs as the non-secure one in terms of CPU availability. Would it trust the non-secure OS to handle the cluster shutdown, coherency? I doubt it.
Think for a minute about when the kernel used to call the APM BIOS for power management in the old days, and then the kernel was enhanced to be preemptive. APM was not re-entrant and therefore conceptually incompatible with this kernel context at the time. Obviously things were even more difficult when SMP showed up. And just ask the x86 maintainers what they think about firmware-induced NMIs *today*.
So, is e.g. PSCI compatible with Linux-RT now?
I think it is as compatible as MCPM. What you need for RT is bounded time for going down to and coming back from a lower power state. On ARM and other architectures this involves firmware (at least after reset), so it's not specific to PSCI. Those time bounds could be provided by the SoC vendor for their firmware (whether you use MCPM or PSCI). If you don't like them, you can simply use WFI; PSCI does not mandate a call for such an idle state (not sure whether APM required this in the past), only for deeper sleep states.
Is there another kernel execution model innovation coming up for which the firmware will be an obstacle on ARM?
Firmwares evolve in time but as I said, even if we would like to, we can't eliminate them.
I'm just extremely worried about the simple presence of firmware and the need to call into it for complex operations such as intelligent power management, when the kernel would be a much better place to perform such operations. The kernel is also the only place where those things may be improved over time, even for free from third parties, whereas the firmware is going to be locked down with no possible evolution beyond its shipping state.
For example, after working on MCPM for a while, I do have ideas for additional performance and power usage optimization. But if that functionality is getting burned into firmware controlled by secure world then there is just no hope for _me_ to optimize things any further.
I don't dispute the above but I don't have a better solution that would accommodate secure/non-secure worlds. With proper education, SoC vendors can learn to allow upgradable (parts of) firmware. But if you have a better proposal and can get all the parties (including secure OS people) to agree, I'm open to it.
Not that I can do anything about it anyway. But I had to vent my discomfort about anything firmware-like. Yet I was the one who suggested to Will Deacon and Marc Zyngier they should use PSCI to control KVM instances instead of their ad hoc interface. However, if someone has a legitimate case for _not_ using firmware calls and use machine specific extensions in the kernel then I'll support them.
Legitimate cases yes (like no EL3). But if people would like MCPM just for temporary kernel bring-up, I don't agree. They start with a simple MCPM back-end and later realise that they need Linux to run in non-secure mode (well, like getting EL2, or just because they get a secure OS), so they modify the MCPM back-end for (non-standard) secure calls. Some time later they decide they need a secure OS (e.g. mobile payments) and yet again they modify their back-end, possibly ignoring MCPM because the secure OS asks them to handle such a state machine at EL3. In Linux we end up carrying all those intermediate implementations.
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc., and this code must call back into MCPM to complete the CPU tear-down. If you want such a thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
  dcscb_power_down()
    flush_cache_all()           - both secure and non-secure
    set_auxcr()                 - secure-only
    cci_disable_port_by_cpu()   - secure-only?
    (__mcpm_outbound_leave_critical())
    __mcpm_cpu_down()
    wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
-- Catalin
On Tue, Sep 03, 2013 at 05:28:29PM +0100, Catalin Marinas wrote:
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
You do miss a couple of things here:
mcpm_cpu_power_down()
  dcscb_power_down()
    __mcpm_cpu_going_down(cpu, cluster);
    flush_cache_all()           - both secure and non-secure
    set_auxcr()                 - secure-only
    last_man && __mcpm_outbound_enter_critical(cpu, cluster);
    cci_disable_port_by_cpu()   - secure-only?
    (__mcpm_outbound_leave_critical())
    __mcpm_cpu_down()
    wfi()
In other words, the whole state machine is driven from the backend. The __mcpm_*() helpers are library functionality to help you manage it, but if your backend has its own methods of doing the coordination (as with the proposed PSCI-based backend) then there's no compulsion to use them at all. There is no transfer of control away from the backend at any point.
This is all about providing people who need to write a native Linux backend with tools for doing so.
Of course, the only purpose of MCPM in that scenario is to allow both kinds of backend to present a common interface to the kernel. This is only needed if both kinds of backend exist together and the arch's smp_ops (or equivalent) provides an insufficient abstraction (I'm not familiar yet with how things look on arm64).
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
So in the PSCI case, most of these backend methods would simply translate to calls into PSCI (with some argument and semantics kludging to align the two interfaces).
From a personal point of view I would like to see wider use of MCPM, but ... it really comes down to how many native backends we get. If the way the TZ architecture works blocks native backends from existing on most platforms, we might see none or almost none, though.
Cheers ---Dave
On 13-09-03 11:11 AM, Dave Martin wrote:
From a personal point of view I would like to see wider use of MCPM, but ... it really comes down to how many native backends we get. If the way the TZ architecture works blocks native backends from existing on most platforms, we might see none or almost none, though.
I completely agree. I think that the mcpm-on-psci scenario is (a) useful, (b) simple, and (c) works within an already-established kernel framework, so I see no reason to try to replace mcpm with psci for armv8, short or long term. It will be much better to have this flexibility built in, even if in most cases it is not utilized. There will be cases where it will be utilized. And while it is nice to wish that platform vendors all provide upgradeable EL3 implementations, the plain truth is that we are far away from that at this point, having barely (and not fully) arrived there at kernel+userspace at present for the mobile space. To count on that as the basis for dismissing the need for mcpm as a flexible injection point is just asking for headaches. If it becomes true in the future that mcpm becomes an unnecessary shim to psci for all platforms, then it will be easy enough to drop it out of the picture at that time. I suspect it will not.
Thanks, csd
On 3 Sep 2013, at 19:34, Christian Daudt csd@broadcom.com wrote:
On 13-09-03 11:11 AM, Dave Martin wrote:
From a personal point of view I would like to see wider use of MCPM, but ... it really comes down to how many native backends we get. If the way the TZ architecture works blocks native backends from existing on most platforms, we might see none or almost none, though.
I completely agree. I think that the mcpm-on-psci scenario is (a) useful, (b) simple, and (c) works within an already-established kernel framework, so I see no reason to try to replace mcpm with psci for armv8, short or long term.
Sorry but I don't think you've followed the whole discussion. On ARMv8, you *need* calls to EL3 for CPU power management. With PSCI we try to standardise such calls. It's not about MCPM vs PSCI but about non-standard MCPM back-ends vs PSCI.
I have yet to see the value of MCPM on top of a PSCI back-end, other than allowing people to implement MCPM back-ends with non-standard EL3 calls (but happy to be proven wrong).
For cases where EL3 is not present (and PSCI not easily possible), I can accept MCPM but I would rather use it as a library driven from SoC-specific power code (e.g. under drivers/cpuidle/) rather than MCPM being the front-end to any CPU power management.
Catalin
On Wed, 4 Sep 2013, Catalin Marinas wrote:
I have yet to see the value of MCPM on top of a PSCI back-end, other than allowing people to implement MCPM back-ends with non-standard EL3 calls (but happy to be proven wrong).
That's certainly a valid case. PSCI is very abstract and some people don't want to put all the power management complexity into firmware.
For cases where EL3 is not present (and PSCI not easily possible), I can accept MCPM but I would rather use it as a library driven from SoC-specific power code (e.g. under drivers/cpuidle/) rather than MCPM being the front-end to any CPU power management.
I disagree. If experience has taught us anything, it is that you cannot have power management in cpuidle, and then power management for CPU hotplug, and then something else like IKS also wanting to have a say in power management, or kexec, etc. This turned out to be a completely unsolvable mess on TC2, which is the very reason why MCPM was created.
At some point you need a central place where concurrent and sometimes conflicting requests are arbitrated.
Nicolas
On Tue, 3 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote:
I'll say this only once, and then I'll shut up on this very issue.
I actually enjoy this discussion ;)
OK then. That's fine by me.
So no, I don't have this faith in firmware like you do. And I've discussed this with you in the past as well, and my mind on this issue has not changed. In particular, when you give a presentation on ARMv8 and your answer to a question from the audience is: "firmware will take care of this" I may only think you'll regret this answer one day.
Well, I don't trust firmware either and I've seen it causing hard to debug issues in the past. But actually a better statement would be that I don't trust any software (not even Linux) unless I can see the source. The big advantage Linux has is that it requires (well, unless you are NVidia ;)) opening the code and more people look at it.
I can't agree more.
My hope for the generic firmware is that it will turn into a proper open-source project (with a friendly license). This would make it different from old BIOS implementations.
I'm afraid this won't happen, unfortunately. "Friendly license" really depends on who you talk to. Many people in the industry don't consider the GPL very "friendly" because of its source code requirement. They much prefer the BSD license, which from their position is much friendlier.
And if vendors are not constrained to provide their source code, they simply won't. Look at all the high-profile BSD-derived deployments to see that they've all gone closed source.
Of course, bugs can happen and the firmware is harder to update but really not impossible.
Sure. Same can be said about bootloaders today. Yet they simply are not updated in most cases because it is less risk to work around them.
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
The comparison is not meant for DT vs PSCI but an example that even though DT has benefits and less risks, people didn't rush into adopting it unless it was mandated for new platforms. For arm64 we try not to get SoC code under arch/arm64/ (mach-virt like approach) but I still get people asking in private about copying code into arch/arm64/mach-* directories for the same easy path reasons.
That's fine, obviously.
But reality will mess that up somewhat eventually. You'll have to accept machine specific quirks for firmware bugs because the firmware update is not forthcoming if at all. Same story everywhere.
But let's get back to firmware. Delegating power management to firmware is a completely different type of contract. In that case you're moving critical functionality with potentially significant algorithmic complexity (as evidenced by the MCPM development effort) out of the kernel's control. The more functionality and complexity you delegate to the firmware, the greater the risk you'll end up with a crippled system the kernel will have to cope with eventually. Because this will happen at some point, there is no doubt about that.
I agree with your arguments that Linux is more flexible and easily upgradable. However, the point I want to emphasize is that unless Linux is playing all the roles of firmware/secure/non-secure code, you must have firmware and calls to it from the non-secure OS. On ARMv8, EL3 is the first mode the CPU runs in out of reset, and it needs to get back to this mode for power-related (e.g. coherency) settings. Whether we (Linux people) like it or not, that's the reality.
I know, I know... From my own point of view this is rather sad.
How much is left on the secure or the non-secure side went through a great deal of discussion/review both with general-purpose OS and secure OS people, hence the creation of PSCI (really, it's not just about standardising the SoC support on arm64 Linux).
Again I have absolutely nothing against PSCI as in the interface specification.
[...]
Is there another kernel execution model innovation coming up for which the firmware will be an obstacle on ARM?
Firmware evolves over time, but as I said, even if we would like to, we can't eliminate it.
I'm just extremely worried about the simple presence of firmware and the need to call into it for complex operations such as intelligent power management when the kernel would be a much better location to perform such operations. The kernel is also the only place where those things may be improved over time, even for free from third parties, whereas the firmware is going to be locked down with no possible evolution beyond its shipping state.
For example, after working on MCPM for a while, I do have ideas for additional performance and power usage optimization. But if that functionality gets burned into firmware controlled by the secure world, then there is just no hope for _me_ to optimize things any further.
I don't dispute the above but I don't have a better solution that would accommodate secure/non-secure worlds.
So if a conclusion can be drawn out of this, it is that we're in agreement here.
It is on the firmware implementation front that we probably have diverging expectations.
With proper education, SoC vendors can learn to allow upgradable (parts of) firmware.
Is this education in ARM's plans? Is someone working on recommendations about proper design for fail-safe firmware upgrades via separate firmware components?
If there is anything I might have contributed to the DT-on-ARM story, it is the insistence on the need to be able to update the DTB separately from the bootloader, and thus by end users without special equipment. Given OEMs' reticence to let users perform bootloader upgrades, it was important that the DTB provided alongside the bootloader didn't get the same treatment.
A broken bootloader is bad but once the kernel has booted the bootloader is out of the way. A broken DTB is a real PITA since that is what the kernel has to work with. Same goes for firmware.
So if firmware is unavoidable in ARM's future, at least there must be some contingency plans to mitigate the unavoidable buggy firmware issues to come.
But if you have a better proposal and can get all the parts (including secure OS people) to agree, I'm open to it.
I think that Linux has gained its dominant position for one fundamental reason: source availability. No company could ever match the work force that gathered around that source code. If secure OS people would agree to this principle then things could end up more secure and more efficient. But that's not something I have influence over.
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
  dcscb_power_down()
    flush_cache_all()                  - both secure and non-secure
    set_auxcr()                        - secure-only
    cci_disable_port_by_cpu()          - secure-only?
    (__mcpm_outbound_leave_critical())
    __mcpm_cpu_down()
    wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
The MCPM backend doesn't _need_ to call __mcpm_cpu_down() and friends. Those are helpers for when there is no firmware and proper synchronization needs to be done between different cores.
If you have PSCI then the MCPM call graph is roughly:
mcpm_cpu_power_down()
  psci_power_down()
    psci_ops.cpu_off(power_state)
That's it. Nothing has to call back into the kernel.
Of course this is missing many details. See below for the full patch. Please note that this might not apply on top of current mainline and it shouldn't have to be TC2 specific. But that should give you a good idea of what a PSCI backend for MCPM entails.
----- >8
Date: Tue, 12 Mar 2013 15:41:57 +0000
From: Achin Gupta achin.gupta@arm.com
To: dave.martin@arm.com, nicolas.pitre@linaro.org, charles.garcia-tobin@arm.com
X-Mailer: git-send-email 1.7.9.5
Message-ID: 1363102919-27822-15-git-send-email-achin.gupta@arm.com
Subject: [RFC PATCH v6 13/15] ARM: vexpress: add shim layer for psci backend on TC2
This patch introduces a shim layer for the TC2 platform which converts 'bL_platform_power_ops' routines to their psci counterparts. The psci counterparts are implemented by the secure firmware. The shim layer is used only when Linux is running in non-secure world and the secure firmware implements psci.
It also introduces the use of a reference count to allow a power up call to go ahead of a power down call.
Change-Id: I6a582bfac9aa7dc2f3e6ef0aaa71a0036457311f
Signed-off-by: Achin Gupta achin.gupta@arm.com
---
 arch/arm/mach-vexpress/Makefile      |   4 +
 arch/arm/mach-vexpress/tc2_pm_psci.c | 179 ++++++++++++++++++++++++++++++++++
 2 files changed, 183 insertions(+)
 create mode 100644 arch/arm/mach-vexpress/tc2_pm_psci.c
diff --git a/arch/arm/mach-vexpress/Makefile b/arch/arm/mach-vexpress/Makefile
index ff2ba26..3822184 100644
--- a/arch/arm/mach-vexpress/Makefile
+++ b/arch/arm/mach-vexpress/Makefile
@@ -10,6 +10,10 @@ obj-$(CONFIG_ARCH_VEXPRESS_DCSCB)	+= dcscb.o	dcscb_setup.o
 CFLAGS_REMOVE_dcscb.o		= -pg
 obj-$(CONFIG_ARCH_VEXPRESS_TC2)	+= tc2_pm.o tc2_pm_setup.o
 CFLAGS_REMOVE_tc2_pm.o		= -pg
+ifeq ($(CONFIG_ARCH_VEXPRESS_TC2),y)
+obj-$(CONFIG_ARM_PSCI)		+= tc2_pm_psci.o
+CFLAGS_REMOVE_tc2_pm_psci.o	= -pg
+endif
 obj-$(CONFIG_SMP)		+= platsmp.o
 obj-$(CONFIG_HOTPLUG_CPU)	+= hotplug.o
 obj-$(CONFIG_VEXPRESS_TC2_CPUIDLE)	+= cpuidle-tc2.o
diff --git a/arch/arm/mach-vexpress/tc2_pm_psci.c b/arch/arm/mach-vexpress/tc2_pm_psci.c
new file mode 100644
index 0000000..c9715b8
--- /dev/null
+++ b/arch/arm/mach-vexpress/tc2_pm_psci.c
@@ -0,0 +1,179 @@
+/*
+ * arch/arm/mach-vexpress/tc2_pm_psci.c - TC2 PSCI support
+ *
+ * Created by:	Achin Gupta, December 2012
+ * Copyright:	(C) 2012  ARM Limited
+ *
+ * Some portions of this file were originally written by Nicolas Pitre
+ * Copyright:	(C) 2012  Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/errno.h>
+
+#include <asm/bL_entry.h>
+#include <asm/proc-fns.h>
+#include <asm/cacheflush.h>
+#include <asm/psci.h>
+#include <asm/atomic.h>
+#include <asm/cputype.h>
+
+#include <mach/motherboard.h>
+#include <mach/tc2.h>
+
+#include <linux/vexpress.h>
+
+/*
+ * Platform specific state id understood by the firmware and used to
+ * program the power controller
+ */
+#define PSCI_POWER_STATE_ID		0
+
+static atomic_t tc2_pm_use_count[TC2_MAX_CPUS][TC2_MAX_CLUSTERS];
+
+static int tc2_pm_psci_power_up(unsigned int cpu, unsigned int cluster)
+{
+	unsigned int mpidr = (cluster << 8) | cpu;
+	int ret = 0;
+
+	BUG_ON(!psci_ops.cpu_on);
+
+	switch (atomic_inc_return(&tc2_pm_use_count[cpu][cluster])) {
+	case 1:
+		/*
+		 * This is a request to power up a cpu that linux thinks has
+		 * been powered down. Retries are needed if the firmware has
+		 * seen the power down request as yet.
+		 */
+		do
+			ret = psci_ops.cpu_on(mpidr,
+					      virt_to_phys(bL_entry_point));
+		while (ret == -EAGAIN);
+
+		return ret;
+	case 2:
+		/* This power up request has overtaken a power down request */
+		return ret;
+	default:
+		/* Any other value is a bug */
+		BUG();
+	}
+}
+
+static void tc2_pm_psci_power_down(void)
+{
+	struct psci_power_state power_state;
+	unsigned int mpidr, cpu, cluster;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	BUG_ON(!psci_ops.cpu_off);
+
+	switch (atomic_dec_return(&tc2_pm_use_count[cpu][cluster])) {
+	case 1:
+		/*
+		 * Overtaken by a power up. Flush caches, exit coherency,
+		 * return & fake a reset
+		 */
+		asm volatile (
+			"mrc	p15, 0, ip, c1, c0, 0 \n\t"
+			"bic	ip, ip, #(1 << 2) @ clear C bit \n\t"
+			"mcr	p15, 0, ip, c1, c0, 0 \n\t"
+			"dsb \n\t"
+			"isb"
+			: : : "ip" );
+
+		flush_cache_louis();
+
+		asm volatile (
+			"clrex \n\t"
+			"mrc	p15, 0, ip, c1, c0, 1 \n\t"
+			"bic	ip, ip, #(1 << 6) @ clear SMP bit \n\t"
+			"mcr	p15, 0, ip, c1, c0, 1 \n\t"
+			"isb \n\t"
+			"dsb"
+			: : : "ip" );
+
+		return;
+	case 0:
+		/* A normal request to possibly power down the cluster */
+		power_state.id = PSCI_POWER_STATE_ID;
+		power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
+		power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
+
+		psci_ops.cpu_off(power_state);
+
+		/* On success this function never returns */
+	default:
+		/* Any other value is a bug */
+		BUG();
+	}
+}
+
+static void tc2_pm_psci_suspend(u64 unused)
+{
+	struct psci_power_state power_state;
+
+	BUG_ON(!psci_ops.cpu_suspend);
+
+	/* On TC2 always attempt to power down the cluster */
+	power_state.id = PSCI_POWER_STATE_ID;
+	power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
+	power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
+
+	psci_ops.cpu_suspend(power_state, virt_to_phys(bL_entry_point));
+
+	/* On success this function never returns */
+	BUG();
+}
+
+static const struct bL_platform_power_ops tc2_pm_power_ops = {
+	.power_up	= tc2_pm_psci_power_up,
+	.power_down	= tc2_pm_psci_power_down,
+	.suspend	= tc2_pm_psci_suspend,
+};
+
+static void __init tc2_pm_usage_count_init(void)
+{
+	unsigned int mpidr, cpu, cluster;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
+	BUG_ON(cluster >= TC2_MAX_CLUSTERS ||
+	       cpu >= vexpress_spc_get_nb_cpus(cluster));
+
+	atomic_set(&tc2_pm_use_count[cpu][cluster], 1);
+}
+
+static int __init tc2_pm_psci_init(void)
+{
+	int ret;
+
+	ret = psci_probe();
+	if (ret) {
+		pr_debug("psci not found. Aborting psci init\n");
+		return -ENODEV;
+	}
+
+	tc2_pm_usage_count_init();
+
+	ret = bL_platform_power_register(&tc2_pm_power_ops);
+	if (!ret)
+		ret = bL_cluster_sync_init(NULL);
+	if (!ret)
+		pr_info("TC2 power management initialized\n");
+	return ret;
+}
+
+early_initcall(tc2_pm_psci_init);
On 3 Sep 2013, at 19:53, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 3 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
The comparison is not meant as DT vs PSCI but as an example that even though DT has benefits and fewer risks, people didn't rush into adopting it unless it was mandated for new platforms. For arm64 we try not to get SoC code under arch/arm64/ (mach-virt like approach) but I still get people asking in private about copying code into arch/arm64/mach-* directories for the same easy-path reasons.
That's fine, obviously.
But reality will mess that up somewhat eventually. You'll have to accept machine specific quirks for firmware bugs because the firmware update is not forthcoming if at all. Same story everywhere.
It's one thing to accept machine specific quirks for firmware bugs (but I'll push back as much as possible) and entirely different to accept temporary code because the firmware features are not ready yet.
But let's get back to firmware. Delegating power management to firmware is a completely different type of contract. In that case you're moving a critical functionality with potentially significant algorithmic complexity (as evidenced by the MCPM development effort) out of the kernel's control. The more functionality and complexity you delegate to the firmware, the greater the risk you'll end up with a crippled system the kernel will have to cope with eventually. Because this will happen at some point there is no doubt about that.
I agree with your arguments that Linux is more flexible and easily upgradable. However, the point I want to emphasize is that unless Linux is playing all the roles of firmware/secure/non-secure code, you must have firmware and calls to it from the non-secure OS. On ARMv8, EL3 is the first mode the CPU runs in out of reset, and it needs to get back to this mode for power-related (e.g. coherency) settings. Whether we (Linux people) like it or not, that's the reality.
I know, I know... From my own point of view this is rather sad.
Whether it's sad for Linux or not, be aware that ARM Ltd is not a Linux-only shop. There are other OSes, secure or non-secure, there are vendors that ask for these features (and this includes vendors that run Linux/Android).
With proper education, SoC vendors can learn to allow upgradable (parts of) firmware.
Is this education in ARM's plans? Is someone working on recommendations about proper design for fail-safe firmware upgrades via separate firmware components?
The generic firmware is probably a better place to provide such functionality rather than a recommendations document. But I haven't followed its development closely enough to comment (I know I raised this exact issue in the past, primarily for handling undocumented CPU errata bits accessible only in secure mode).
But if you have a better proposal and can get all the parts (including secure OS people) to agree, I'm open to it.
I think that Linux has gained its dominant position for one fundamental reason: source availability. No company could ever match the work force that gathered around that source code. If secure OS people would agree to this principle then things could end up more secure and more efficient. But that's not something I have influence over.
It's not about secure OS, for many reasons this will probably remain closed source. But the firmware and UEFI are a different story and most of it can be open. I can see vendors keeping parts of the firmware closed but I would hope those are minimal (it already happens, it's not something introduced by PSCI requirements).
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
  dcscb_power_down()
    flush_cache_all()                  - both secure and non-secure
    set_auxcr()                        - secure-only
    cci_disable_port_by_cpu()          - secure-only?
    (__mcpm_outbound_leave_critical())
    __mcpm_cpu_down()
    wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
The MCPM backend doesn't _need_ to call __mcpm_cpu_down() and friends. Those are helpers for when there is no firmware and proper synchronization needs to be done between different cores.
The reason people currently ask for MCPM is exactly this synchronisation which they don't want to do in the firmware. As I said in a previous post, I'm not against MCPM as such but against the back-ends which will eventually get non-standard secure calls.
One thing I don't like about MCPM (and I raised it during review) is the cluster/cpu separation with a hard-coded number of clusters. I would have really liked a linear view of the CPUs and let the back-end (or the MCPM library itself) handle the topology. I don't think it's hard to change anyway.
If you have PSCI then the MCPM call graph is roughly:
mcpm_cpu_power_down()
  psci_power_down()
    psci_ops.cpu_off(power_state)
That's it. Nothing has to call back into the kernel.
So for arm64 we expose PSCI functionality via smp_operations (cpu up/down, suspend is work in progress). Populating smp_operations is driven from DT and it has been decoupled from the SoC code. What would the MCPM indirection bring here?
In the presence of PSCI firmware, do you agree that a potential MCPM back-end should be generic (not tied to an SoC)? In such case, what would the MCPM front-end bring which cannot be currently handled by smp_operations (or an extension to it)?
I don't (yet?) see the point of a PSCI back-end to arm64 MCPM since all that people are asking MCPM for is exactly to avoid the PSCI implementation.
Couple of comments on the patch below. Not aimed as a proper review:
--- /dev/null
+++ b/arch/arm/mach-vexpress/tc2_pm_psci.c
[…]
+static void tc2_pm_psci_power_down(void)
+{
struct psci_power_state power_state;
unsigned int mpidr, cpu, cluster;
mpidr = read_cpuid_mpidr();
cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
BUG_ON(!psci_ops.cpu_off);
switch (atomic_dec_return(&tc2_pm_use_count[cpu][cluster])) {
case 1:
/*
* Overtaken by a power up. Flush caches, exit coherency,
* return & fake a reset
*/
asm volatile (
"mrc p15, 0, ip, c1, c0, 0 \n\t"
"bic ip, ip, #(1 << 2) @ clear C bit \n\t"
"mcr p15, 0, ip, c1, c0, 0 \n\t"
"dsb \n\t"
"isb"
: : : "ip" );
flush_cache_louis();
asm volatile (
"clrex \n\t"
"mrc p15, 0, ip, c1, c0, 1 \n\t"
"bic ip, ip, #(1 << 6) @ clear SMP bit \n\t"
"mcr p15, 0, ip, c1, c0, 1 \n\t"
"isb \n\t"
"dsb"
: : : "ip" );
return;
The above part needs to be done on the secure side, ACTLR.SMP bit cannot be cleared on the non-secure side. Is this return with coherency disabled required by MCPM?
case 0:
/* A normal request to possibly power down the cluster */
power_state.id = PSCI_POWER_STATE_ID;
power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
psci_ops.cpu_off(power_state);
/* On success this function never returns */
default:
/* Any other value is a bug */
BUG();
}
+}
+static void tc2_pm_psci_suspend(u64 unused)
+{
struct psci_power_state power_state;
BUG_ON(!psci_ops.cpu_suspend);
/* On TC2 always attempt to power down the cluster */
power_state.id = PSCI_POWER_STATE_ID;
power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
psci_ops.cpu_suspend(power_state, virt_to_phys(bL_entry_point));
/* On success this function never returns */
BUG();
+}
CPU_SUSPEND is allowed to return if there is a pending interrupt.
Catalin
On Wed, 4 Sep 2013, Catalin Marinas wrote:
On 3 Sep 2013, at 19:53, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 3 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
The comparison is not meant as DT vs PSCI but as an example that even though DT has benefits and fewer risks, people didn't rush into adopting it unless it was mandated for new platforms. For arm64 we try not to get SoC code under arch/arm64/ (mach-virt like approach) but I still get people asking in private about copying code into arch/arm64/mach-* directories for the same easy-path reasons.
That's fine, obviously.
But reality will mess that up somewhat eventually. You'll have to accept machine specific quirks for firmware bugs because the firmware update is not forthcoming if at all. Same story everywhere.
It's one thing to accept machine specific quirks for firmware bugs (but I'll push back as much as possible) and entirely different to accept temporary code because the firmware features are not ready yet.
You know that if you push back too hard then people will simply stop submitting their code for upstream inclusion and keep it in private trees. There is a fine balance to reach.
I agree with your arguments that Linux is more flexible and easily upgradable. However, the point I want to emphasize is that unless Linux is playing all the roles of firmware/secure/non-secure code, you must have firmware and calls to it from the non-secure OS. On ARMv8, EL3 is the first mode the CPU runs in out of reset, and it needs to get back to this mode for power-related (e.g. coherency) settings. Whether we (Linux people) like it or not, that's the reality.
I know, I know... From my own point of view this is rather sad.
Whether it's sad for Linux or not, be aware that ARM Ltd is not a Linux-only shop. There are other OSes, secure or non-secure, there are vendors that ask for these features (and this includes vendors that run Linux/Android).
Obviously. And those vendors would be all the happier if Linux was BSD licensed and they didn't have any GPL obligations to follow. But I'm sure you know that already.
Code cleanliness and maintainability is a concern for developers and engineers only. Product vendors don't usually care as much as their primary concern is the customer end result.
With proper education, SoC vendors can learn to allow upgradable (parts of) firmware.
Is this education in ARM's plans? Is someone working on recommendations about proper design for fail-safe firmware upgrades via separate firmware components?
The generic firmware is probably a better place to provide such functionality rather than a recommendations document. But I haven't followed its development closely enough to comment (I know I raised this exact issue in the past, primarily for handling undocumented CPU errata bits accessible only in secure mode).
Could this aspect officially be put on the firmware agenda at ARM? I think this is much more important than people may realize.
But if you have a better proposal and can get all the parts (including secure OS people) to agree, I'm open to it.
I think that Linux has gained its dominant position for one fundamental reason: source availability. No company could ever match the work force that gathered around that source code. If secure OS people would agree to this principle then things could end up more secure and more efficient. But that's not something I have influence over.
It's not about secure OS, for many reasons this will probably remain closed source. But the firmware and UEFI are a different story and most of it can be open. I can see vendors keeping parts of the firmware closed but I would hope those are minimal (it already happens, it's not something introduced by PSCI requirements).
Again the issue is not PSCI per se.
If the source is available, then anyone may fix bugs independently from the OEM who might have different priorities. The firmware may evolve with advancements and innovations in the kernel.
But given the firmware is going to run in the secure world I don't think any OEM will allow for easy upgrade, as this would dilute the very principle of a secure world. So even if the source is available, this is not going to help much.
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
  dcscb_power_down()
    flush_cache_all()                  - both secure and non-secure
    set_auxcr()                        - secure-only
    cci_disable_port_by_cpu()          - secure-only?
    (__mcpm_outbound_leave_critical())
    __mcpm_cpu_down()
    wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
The MCPM backend doesn't _need_ to call __mcpm_cpu_down() and friends. Those are helpers for when there is no firmware and proper synchronization needs to be done between different cores.
The reason people currently ask for MCPM is exactly this synchronisation which they don't want to do in the firmware.
And this is a hell of a good reason. I'm scared to death by the prospect of seeing this kind of algorithmic complexity shoved into closed firmware.
As I said in a previous post, I'm not against MCPM as such but against the back-ends which will eventually get non-standard secure calls.
Maybe the secure call standardization should focus on providing less abstract and more low-level interfaces then. Right now, the PSCI definition implies that everything is performed behind the interface.
But again this would be against the spirit of a secure layer with veto power on everything happening in non secure world.
One thing I don't like about MCPM (and I raised it during review) is the cluster/cpu separation with a hard-coded number of clusters. I would have really liked a linear view of the CPUs and let the back-end (or the MCPM library itself) handle the topology. I don't think it's hard to change anyway.
Indeed, that's actually a simple implementation detail. And MCPM being entirely a Linux internal API, we have the freedom to change it at will.
If you have PSCI then the MCPM call graph is roughly:
mcpm_cpu_power_down()
  psci_power_down()
    psci_ops.cpu_off(power_state)
That's it. Nothing has to call back into the kernel.
So for arm64 we expose PSCI functionality via smp_operations (cpu up/down, suspend is work in progress). Populating smp_operations is driven from DT and it has been decoupled from the SoC code. What would the MCPM indirection bring here?
This looks as if you just re-invented the MCPM high-level interface. From this quick description, the SMP ops would do the same as MCPM under a different name.
In the presence of PSCI firmware, do you agree that a potential MCPM back-end should be generic (not tied to an SoC)?
Absolutely.
In such case, what would the MCPM front-end bring which cannot be currently handled by smp_operations (or an extension to it)?
Apart from having a common interface between PSCI and non-PSCI systems, the MCPM frontend is likely to handle CPU/cluster synchronization and last-man determination in the frontend. This is currently handled by backends, but we now see a common pattern which should be factored out. Not that this is highly useful to a PSCI system, though.
The MCPM API also implies additional expectations that are not (and should not be) provided by PSCI, such as the ability to have unordered cpu_up/cpu_down calls, because this greatly helps solve race conditions at the caller level. The reason PSCI should not carry these expectations is that this is a Linux specific implementation detail, and it is best to have a shim layer isolating it from the lower level interface.
I don't (yet?) see the point of a PSCI back-end to arm64 MCPM since all that people are asking MCPM for is exactly to avoid the PSCI implementation.
And again there are very legitimate reasons for that. Having the cluster synchronization complexity locked behind a PSCI interface implies it has to be done in hard-to-fix firmware. It is not the fault of PSCI, but rather the notion of having algorithmic complexity in firmware which is the problem, when firmware complexity should be kept to a minimum while the easily replaceable OS is responsible for it.
Couple of comments on the patch below. Not aimed as a proper review:
--- /dev/null
+++ b/arch/arm/mach-vexpress/tc2_pm_psci.c
[…]
+static void tc2_pm_psci_power_down(void)
+{
struct psci_power_state power_state;
unsigned int mpidr, cpu, cluster;
mpidr = read_cpuid_mpidr();
cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
BUG_ON(!psci_ops.cpu_off);
switch (atomic_dec_return(&tc2_pm_use_count[cpu][cluster])) {
case 1:
/*
* Overtaken by a power up. Flush caches, exit coherency,
* return & fake a reset
*/
asm volatile (
"mrc p15, 0, ip, c1, c0, 0 \n\t"
"bic ip, ip, #(1 << 2) @ clear C bit \n\t"
"mcr p15, 0, ip, c1, c0, 0 \n\t"
"dsb \n\t"
"isb"
: : : "ip" );
flush_cache_louis();
asm volatile (
"clrex \n\t"
"mrc p15, 0, ip, c1, c0, 1 \n\t"
"bic ip, ip, #(1 << 6) @ clear SMP bit \n\t"
"mcr p15, 0, ip, c1, c0, 1 \n\t"
"isb \n\t"
"dsb"
: : : "ip" );
return;
The above part needs to be done on the secure side, ACTLR.SMP bit cannot be cleared on the non-secure side. Is this return with coherency disabled required by MCPM?
The requirement here is that the system should be put in the same state as if the kernel was re-entered from the firmware after a cpu_up operation i.e. MMU and cache off. So in this very case I don't think the code has to care about ACTLR.SMP as this would be handled by the firmware and the kernel would already be entered with this bit set.
case 0:
/* A normal request to possibly power down the cluster */
power_state.id = PSCI_POWER_STATE_ID;
power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
psci_ops.cpu_off(power_state);
/* On success this function never returns */
default:
/* Any other value is a bug */
BUG();
}
+}
+static void tc2_pm_psci_suspend(u64 unused)
+{
struct psci_power_state power_state;
BUG_ON(!psci_ops.cpu_suspend);
/* On TC2 always attempt to power down the cluster */
power_state.id = PSCI_POWER_STATE_ID;
power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
psci_ops.cpu_suspend(power_state, virt_to_phys(bL_entry_point));
/* On success this function never returns */
BUG();
+}
CPU_SUSPEND is allowed to return if there is a pending interrupt.
Yes, but not through this path. If an interrupt is pending, the PSCI implementation should immediately return through virt_to_phys(bL_entry_point) (this is mcpm_entry_point in mainline).
Nicolas
I'll let the firmware guys discuss what is and is not upgradable at some point in the future (when the generic firmware will be available). This is not related just to PSCI but to other things like secure-only errata bits.
In the meantime I'll try to focus on the Linux interaction with the secure firmware (and PSCI). Also cc'ing Mark Rutland since he's working on CPU hotplug for arm64.
On Wed, Sep 04, 2013 at 02:48:48PM +0100, Nicolas Pitre wrote:
On Wed, 4 Sep 2013, Catalin Marinas wrote:
On 3 Sep 2013, at 19:53, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 3 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote:
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
  dcscb_power_down()
    flush_cache_all()             - both secure and non-secure
    set_auxcr()                   - secure-only
    cci_disable_port_by_cpu()     - secure-only?
    (__mcpm_outbound_leave_critical())
    __mcpm_cpu_down()
    wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
The MCPM backend doesn't _need_ to call __mcpm_cpu_down() and friends. Those are helpers for when there is no firmware and proper synchronization needs to be done between different cores.
The reason people currently ask for MCPM is exactly this synchronisation which they don't want to do in the firmware.
And this is a hell of a good reason. I'm scared to death by the prospect of seeing this kind of algorithmic complexity shoved into closed firmware.
OK, so if I understand you correctly, you don't want to see PSCI firmware in use at all, or at least not in its current form (which *you* also reviewed), because you think it is too complex to be bug-free or impractical to upgrade.
I respect your opinion but do you have a more concrete proposal? The options so far:
1. (current status) Don't use PSCI firmware, let Linux handle all the CPU power management (possibly under the MCPM framework). If not all power-related actions can be done at the non-secure level, just implement non-standard SMC calls as needed. If these are changed (because in time vendors may have other security needs), add them to the driver and hope there is a way to detect them, or just don't upstream the #ifdef'ed code.
2. New standard firmware interface, simpler and less error-prone. Handle most power management in Linux (with an MCPM-like state machine) and have guaranteed race-free calls to the firmware. In the process, also convince the secure OS guys that Linux is part of their trusted environment (I personally trust Linux more than the trusted OS ;) but this doesn't hold any water for certification purposes). Basically if you can disable coherency from the non-secure OS (e.g. CCI or just powering down a CPU without the secure OS having the chance to flush its caches), the only workaround for the secure OS would be to run UP (which is probably the case now) or flush caches at every return to the non-secure world.
3. Very similar to 2, with PSCI firmware interface but without the requirements to do cluster state coordination in firmware (with some semantic changes to the affinity arguments). Linux handles the state coordination (MCPM state machine) but PSCI firmware does the necessary flushing, coherency disabling based on the specified affinity level (it doesn't question that because it does not track the "last man"). Slightly better security model than 2 as it can flush the secure OS caches but I'm not entirely sure PSCI can avoid a state machine and whether this has other security implications.
4. MCPM state machine on top of full PSCI. Here I don't see the point of tracking cluster/last-man state in Linux if PSCI does it already. If PSCI got it wrong (broken coherency, deadlocks), MCPM cannot really solve it. Also, are there additional races by having two separate state machines for the same thing (I can't think of any now)?
5. Full PSCI with a light wrapper (smp_operations) allowing SoC to hook additional code (only if needed for parts which aren't handled by firmware). This is similar to the mcpm_platform_ops registration but *without* the MCPM state machine and without the separate cpu/cluster arguments (just linearise this space for flexibility). Other key point is that the re-entry address is driven via PSCI rather than mcpm_entry_vectors. Platform ops registration is optional, just available for flexibility (which could also mean that it is not the platform ops driving the final PSCI call, different from the MCPM framework). This approach does not enforce a secure model, it's up to the SoC vendor to allow/prevent non-secure access to power controller, CCI etc. But it still mandates common kernel entry/exit path via PSCI.
6. Full PSCI with generic CPU hotplug and cpuidle drivers. I won't list the pros/cons, that's what this thread is mainly about.
Any other options?
My goal is for 6 but 5 could be a more practical/flexible approach.
As I said in a previous post, I'm not against MCPM as such but against the back-ends which will eventually get non-standard secure calls.
Maybe the secure call standardization should focus on providing less abstract and more low-level interfaces then. Right now, the PSCI definition implies that everything is performed behind the interface.
Sorry but you are late for this call. PSCI has been discussed for nearly two years and you have been involved.
But again this would be against the spirit of a secure layer with veto power on everything happening in non secure world.
The veto power is one of the approaches to reduce security risks. It may be doable in other ways but it either complicates certification or reduces the secure OS functionality (like UP only).
If you have PSCI then the MCPM call graph is roughly:
mcpm_cpu_power_down()
  psci_power_down()
    psci_ops.cpu_off(power_state)
That's it. Nothing has to call back into the kernel.
So for arm64 we expose PSCI functionality via smp_operations (cpu up/down, suspend is work in progress). Populating smp_operations is driven from DT and it has been decoupled from the SoC code. What would the MCPM indirection bring here?
This looks like if you just re-invented the MCPM high-level interface. From this quick description, the SMP ops would do the same as MCPM under a different name.
The MCPM high level interface is nothing other than smp_ops. It's not a new API but the same smp_ops. I don't think there is much to re-invent here. The mcpm_platform_ops callbacks and registration is indeed a unification but see my point 5 above about what would be different in the context of PSCI.
MCPM state machine is the biggest innovation of this framework. While it happens to be implemented in the same MCPM code, I see it as more of a locking library that can exist outside the MCPM front- or back-end interface (but in the actual back-end implementation if PSCI isn't present). You may see a common pattern and move such a state machine into the front-end, but then you get to point 4 above if you do it on top of PSCI.
On Thu, 5 Sep 2013, Catalin Marinas wrote:
I'll let the firmware guys discuss what is and is not upgradable at some point in the future (when the generic firmware will be available). This is not related just to PSCI but to other things like secure-only errata bits.
In the meantime I'll try to focus on the Linux interaction with the secure firmware (and PSCI). Also cc'ing Mark Rutland since he's working on CPU hotplug for arm64.
On Wed, Sep 04, 2013 at 02:48:48PM +0100, Nicolas Pitre wrote:
On Wed, 4 Sep 2013, Catalin Marinas wrote:
On 3 Sep 2013, at 19:53, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 3 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote:
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
  dcscb_power_down()
    flush_cache_all()             - both secure and non-secure
    set_auxcr()                   - secure-only
    cci_disable_port_by_cpu()     - secure-only?
    (__mcpm_outbound_leave_critical())
    __mcpm_cpu_down()
    wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
The MCPM backend doesn't _need_ to call __mcpm_cpu_down() and friends. Those are helpers for when there is no firmware and proper synchronization needs to be done between different cores.
The reason people currently ask for MCPM is exactly this synchronisation which they don't want to do in the firmware.
And this is a hell of a good reason. I'm scared to death by the prospect of seeing this kind of algorithmic complexity shoved into closed firmware.
OK, so if I understand you correctly, you don't want to see PSCI firmware in use at all, or at least not in its current form (which *you* also reviewed), because you think it is too complex to be bug-free or impractical to upgrade.
Well... PSCI as an *interface* definition is sound. I reviewed this interface definition which is great for virtualized systems as well as simple hardware.
What I didn't fully grasp at the time is the implied _complexity_ when doing power management on a cluster based system. Partly because I had not started the work on MCPM at the time and I seriously under-estimated the effort needed to 1) get it done, and 2) get it right.
Now PSCI is very abstract which is both a good and a bad thing. It is good because it makes for a very nice and clean interface that hides all the platform particularities. And I do understand why you as the maintainer of a new architecture could be fond of it as it keeps the kernel clean and free of machine specific quirks.
However it implies that a lot of the complexity currently handled by MCPM in the kernel has to move behind the PSCI interface i.e. in the secure firmware. And *that* is bad.
For some secure firmware to actually be secure, it has to be simple and obvious. Security always gets compromised to some degree with added complexity.
I respect your opinion but do you have a more concrete proposal? The options so far:
1. (current status) Don't use PSCI firmware, let Linux handle all the CPU power management (possibly under the MCPM framework). If not all power-related actions can be done at the non-secure level, just implement non-standard SMC calls as needed. If these are changed (because in time vendors may have other security needs), add them to the driver and hope there is a way to detect them, or just don't upstream the #ifdef'ed code.
2. New standard firmware interface, simpler and less error-prone. Handle most power management in Linux (with an MCPM-like state machine) and have guaranteed race-free calls to the firmware. In the process, also convince the secure OS guys that Linux is part of their trusted environment (I personally trust Linux more than the trusted OS ;) but this doesn't hold any water for certification purposes). Basically if you can disable coherency from the non-secure OS (e.g. CCI or just powering down a CPU without the secure OS having the chance to flush its caches), the only workaround for the secure OS would be to run UP (which is probably the case now) or flush caches at every return to the non-secure world.
3. Very similar to 2, with PSCI firmware interface but without the requirements to do cluster state coordination in firmware (with some semantic changes to the affinity arguments). Linux handles the state coordination (MCPM state machine) but PSCI firmware does the necessary flushing, coherency disabling based on the specified affinity level (it doesn't question that because it does not track the "last man"). Slightly better security model than 2 as it can flush the secure OS caches but I'm not entirely sure PSCI can avoid a state machine and whether this has other security implications.
4. MCPM state machine on top of full PSCI. Here I don't see the point of tracking cluster/last-man state in Linux if PSCI does it already. If PSCI got it wrong (broken coherency, deadlocks), MCPM cannot really solve it. Also, are there additional races by having two separate state machines for the same thing (I can't think of any now)?
5. Full PSCI with a light wrapper (smp_operations) allowing SoC to hook additional code (only if needed for parts which aren't handled by firmware). This is similar to the mcpm_platform_ops registration but *without* the MCPM state machine and without the separate cpu/cluster arguments (just linearise this space for flexibility). Other key point is that the re-entry address is driven via PSCI rather than mcpm_entry_vectors. Platform ops registration is optional, just available for flexibility (which could also mean that it is not the platform ops driving the final PSCI call, different from the MCPM framework). This approach does not enforce a secure model, it's up to the SoC vendor to allow/prevent non-secure access to power controller, CCI etc. But it still mandates common kernel entry/exit path via PSCI.
6. Full PSCI with generic CPU hotplug and cpuidle drivers. I won't list the pros/cons, that's what this thread is mainly about.
Any other options?
My goal is for 6 but 5 could be a more practical/flexible approach.
Both those options (5 and 6) imply the state machine is in the firmware. And that's where complexity lies. So that wouldn't be my choice at all.
Option 4 is rather useless. In fact the MCPM backend for PSCI I showed you doesn't exercise the MCPM state machine. And the firmware still implements a state machine.
Differences between 2,3,4 are a bit fuzzy to me. I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
Of course option 1 is the most flexible in terms of implementation efficiency, but it has drawbacks as well.
As I said in a previous post, I'm not against MCPM as such but against the back-ends which will eventually get non-standard secure calls.
Maybe the secure call standardization should focus on providing less abstract and more low-level interfaces then. Right now, the PSCI definition implies that everything is performed behind the interface.
Sorry but you are late for this call. PSCI has been discussed for nearly two years and you have been involved.
And two years later I've gained enough experience to state that this might not be such a good idea for all cases after all. Sometimes you may not realize all the ramifications of a concept or design until you do actual work on it.
And this is not my call to make either. System vendors will choose their own poison for themselves. Between risk inducing complexity in secure firmware and non-standard low-level machine specific L3 calls I don't think there is much to rejoice about.
But my point is that there is legitimacy in both models and right now I don't see a single "true" solution. Hence my claim that you should not insist on PSCI or nothing.
Nicolas
[snip]
I respect your opinion but do you have a more concrete proposal? The options so far:
1. (current status) Don't use PSCI firmware, let Linux handle all the CPU power management (possibly under the MCPM framework). If not all power-related actions can be done at the non-secure level, just implement non-standard SMC calls as needed. If these are changed (because in time vendors may have other security needs), add them to the driver and hope there is a way to detect them, or just don't upstream the #ifdef'ed code.
2. New standard firmware interface, simpler and less error-prone. Handle most power management in Linux (with an MCPM-like state machine) and have guaranteed race-free calls to the firmware. In the process, also convince the secure OS guys that Linux is part of their trusted environment (I personally trust Linux more than the trusted OS ;) but this doesn't hold any water for certification purposes). Basically if you can disable coherency from the non-secure OS (e.g. CCI or just powering down a CPU without the secure OS having the chance to flush its caches), the only workaround for the secure OS would be to run UP (which is probably the case now) or flush caches at every return to the non-secure world.
3. Very similar to 2, with PSCI firmware interface but without the requirements to do cluster state coordination in firmware (with some semantic changes to the affinity arguments). Linux handles the state coordination (MCPM state machine) but PSCI firmware does the necessary flushing, coherency disabling based on the specified affinity level (it doesn't question that because it does not track the "last man"). Slightly better security model than 2 as it can flush the secure OS caches but I'm not entirely sure PSCI can avoid a state machine and whether this has other security implications.
4. MCPM state machine on top of full PSCI. Here I don't see the point of tracking cluster/last-man state in Linux if PSCI does it already. If PSCI got it wrong (broken coherency, deadlocks), MCPM cannot really solve it. Also, are there additional races by having two separate state machines for the same thing (I can't think of any now)?
5. Full PSCI with a light wrapper (smp_operations) allowing SoC to hook additional code (only if needed for parts which aren't handled by firmware). This is similar to the mcpm_platform_ops registration but *without* the MCPM state machine and without the separate cpu/cluster arguments (just linearise this space for flexibility). Other key point is that the re-entry address is driven via PSCI rather than mcpm_entry_vectors. Platform ops registration is optional, just available for flexibility (which could also mean that it is not the platform ops driving the final PSCI call, different from the MCPM framework). This approach does not enforce a secure model, it's up to the SoC vendor to allow/prevent non-secure access to power controller, CCI etc. But it still mandates common kernel entry/exit path via PSCI.
6. Full PSCI with generic CPU hotplug and cpuidle drivers. I won't list the pros/cons, that's what this thread is mainly about.
Any other options?
My goal is for 6 but 5 could be a more practical/flexible approach.
Both those options (5 and 6) imply the state machine is in the firmware. And that's where complexity lies. So that wouldn't be my choice at all.
Option 4 is rather useless. In fact the MCPM backend for PSCI I showed you doesn't exercise the MCPM state machine. And the firmware still implements a state machine.
Differences between 2,3,4 are a bit fuzzy to me. I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
Of course option 1 is the most flexible in terms of implementation efficiency, but it has drawbacks as well.
As I said in a previous post, I'm not against MCPM as such but against the back-ends which will eventually get non-standard secure calls.
Maybe the secure call standardization should focus on providing less abstract and more low-level interfaces then. Right now, the PSCI definition implies that everything is performed behind the interface.
Sorry but you are late for this call. PSCI has been discussed for nearly two years and you have been involved.
And two years later I've gained enough experience to state that this might not be such a good idea for all cases after all. Sometimes you may not realize all the ramifications of a concept or design until you do actual work on it.
And this is not my call to make either. System vendors will choose their own poison for themselves. Between risk inducing complexity in secure firmware and non-standard low-level machine specific L3 calls I don't think there is much to rejoice about.
But my point is that there is legitimacy in both models and right now I don't see a single "true" solution. Hence my claim that you should not insist on PSCI or nothing.
My main problem is that I only see MCPM being legitimate in a system where linux runs in the secure world. Here all the coherency/cache management, and by consequence last man tracking etc, can be done happily in Linux. However if you are going to productise a device, and consequently run linux non-secure, then you need PSCI. As pointed out before in this thread, you can't disable coherency from linux or ignore secure cache maintenance.

If you go with MCPM you end up in a situation where folk will do bring-up using that, ignore sorting out their FW, and then end up asking for obscure, very platform-specific back doors into the secure world like we have today. This will obviate the whole point of PSCI, and the whole effort. If you go with PSCI you'd encourage a model where folk do it properly from the start.

I can understand the need for flexibility, so having a system which ultimately always calls PSCI, but which allows a SoC callback in the path (which is essentially option 5 above), makes sense to me. This model encourages adoption of PSCI, but gives the flexibility for early adoption and provides a development path.

Finally, I just don't think we should use the "FW has been crap in the past" card as a way to discourage FW for ever more. That's not the right answer; there are valid uses for it and a whole industry around it. I absolutely agree with using the card to encourage better development of FW in the future. Standardised APIs are part of that story.
Charles
Nicolas
On Fri, 6 Sep 2013, Charles Garcia-Tobin wrote:
[snip]
And this is not my call to make either. System vendors will choose their own poison for themselves. Between risk inducing complexity in secure firmware and non-standard low-level machine specific L3 calls I don't think there is much to rejoice about.
But my point is that there is legitimacy in both models and right now I don't see a single "true" solution. Hence my claim that you should not insist on PSCI or nothing.
My main problem is that I only see MCPM being legitimate in a system where linux runs in secure world. Here all the coherency/cache management, and by consequence last man tracking etc, can be done happily in Linux. However if you are going to productise a device, and consequently run linux non-secure then you need PSCI.
Let's assume that you need a secure firmware. Whether it is PSCI or not is inconsequential for the current discussion. Obviously, if you are going to have a secure firmware then you may as well use a standard interface definition like PSCI of course. But that is not my point.
As pointed out before in this thread, you can't disable coherency from linux or ignore secure cache maintenance. If you go with MCPM you end up in a situation where folk will do bring-up using that, ignore sorting out their FW, and then end up asking for obscure, very platform-specific back doors into the secure world like we have today.
But "sorting out FW" is a damn hard thing to do.
If MCPM is not in the kernel, something equivalent has to be implemented in the firmware. There is no way around that.
I'll let other judge my own abilities, but I can say that the other two ARM guys who worked on MCPM are very smart. Yet it took us about 6 months to get MCPM right on TC2. And we did introduce a few screw-ups that weren't caught by formal testing along the way. Fortunately for us we had only the kernel to update in order to fix things.
Wishful thinking doesn't work in software development. Shit happens all the time despite formal testing, only because the tests might themselves be buggy. Therefore proper engineering practices dictate that you *must* keep your hard-to-replace firmware as simple and obvious as possible otherwise you're typically stuck with its bugs for a long long time. The issue with the TC2 M3 firmware I highlighted earlier seems to be an example of that.
So what do many engineers do? They try to lower the risk by transferring complexity to where it is less costly to fix after the fact, i.e. into the kernel. And quite often that means very platform-specific back doors into the secure world. Those back doors are significantly easier to implement and validate, with a better probability of getting it right on the first try.
This will obviate the whole point of PSCI, and the whole effort. If you go with PSCI you'd encourage a model where folk do it properly from the start.
A good software model should not pull complexity out of the kernel to move it into secure firmware. Will the IP stack be moved to the secure OS next... because we need "secure" networking? That's simply engineering nonsense.
And maybe the whole secure world architecture is to blame for that. Having to delegate to the secure firmware tasks that have nothing to do with secure services per se, like cache maintenance for the non-secure world, only because the hardware cannot otherwise ensure the integrity of the secure world on its own, is insane. The secure OS should provide services only. It certainly shouldn't be in the complex and subtle power management business at all. There is certainly something to reconsider on the hardware side here.
As a compromise the secure OS could veto some operations while it is active and relinquish that power when told to lie down. But I suspect that also requires hardware support to do properly.
I can understand the need for flexibility, so having a system which ultimately always calls PSCI, but which allows a SoC callback in the path (which is essentially the option 5 above) makes sense to me. This model encourages adoption of PSCI, but gives the flexibility for early adoption and provides a development path.
Again I don't mind PSCI in some cases. It is especially nice for virtual environments where the hardware _has_ to be abstracted anyway.
But the idea of trying to abstract as much hardware controls behind firmware as possible is wrong. Software has bugs, and hardware has bugs too. The kernel must have the ability to adapt to that during the life cycle of a product, and implement more efficient handling of that hardware with time.
Finally I just don't think we should use the "FW has been crap in the past" card as a way to discourage FW for ever more. That's not the right answer; there are valid uses for it and a whole industry around it.
Obviously. If you are in the business of writing firmware code, I'm sure you are full of hatred for me at the moment and wish I'd shut up. I'll continue to assert that firmware should always be kept as simple, obvious and unobtrusive as possible nevertheless. While there are valid uses for firmware, there are too many misguided uses as well.
The second best option is to make the demonstration that firmware is actually easy to audit, and easy to fix and upgrade when real life usage in the field shows it is flawed. And there must be recommendations and mechanisms to ensure that actually happens in a timely manner. Unfortunately I'm not seeing any effort in that direction so far, nor even anyone wishing to assume the extra costs implied by this.
I absolutely agree with using the card to encourage better development of FW in the future. Standardised APIs are part of that story.
If we're stuck with firmware then yes: we do need Standardized APIs indeed. Bad firmware with a good interface is certainly better than bad firmware with a bad interface.
But more complex firmware is not a good solution from an engineering point of view. This is why, despite the great efforts deployed by ARM and yourself in particular on PSCI, some people will opt to keep their firmware to a minimum and keep as much control into the cheaply updateable kernel instead.
Nicolas
On Thu, Sep 05, 2013 at 09:50:45PM -0400, Nicolas Pitre wrote:
On Thu, 5 Sep 2013, Catalin Marinas wrote:
I'll let the firmware guys discuss what is and is not upgradable at some point in the future (when the generic firmware will be available). This is not related just to PSCI but to other things like secure-only errata bits.
In the meantime I'll try to focus on the Linux interaction with the secure firmware (and PSCI). Also cc'ing Mark Rutland since he's working on CPU hotplug for arm64.
On Wed, Sep 04, 2013 at 02:48:48PM +0100, Nicolas Pitre wrote:
On Wed, 4 Sep 2013, Catalin Marinas wrote:
On 3 Sep 2013, at 19:53, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 3 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote:
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
  dcscb_power_down()
    flush_cache_all()             - both secure and non-secure
    set_auxcr()                   - secure-only
    cci_disable_port_by_cpu()     - secure-only?
    (__mcpm_outbound_leave_critical())
    __mcpm_cpu_down()
    wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
The MCPM backend doesn't _need_ to call __mcpm_cpu_down() and friends. Those are helpers for when there is no firmware and proper synchronization needs to be done between different cores.
The reason people currently ask for MCPM is exactly this synchronisation which they don't want to do in the firmware.
And this is a hell of a good reason. I'm scared to death by the prospect of seeing this kind of algorithmic complexity shoved into closed firmware.
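The synchronisation being argued over here is essentially a race-free "last man" election: when several CPUs in a cluster go down concurrently, exactly one of them must end up doing the cluster-level teardown. A toy Python model of that idea (illustrative only; the real MCPM code in arch/arm/common/mcpm_entry.c has to do this while caches and coherency are in flux, so it cannot simply take an ordinary lock like this):

```python
import threading

class Cluster:
    """Toy model of the per-cluster 'last man' election MCPM performs."""
    def __init__(self, ncpus):
        self.lock = threading.Lock()
        self.cpus_up = ncpus
        self.teardowns = 0  # how many CPUs performed full cluster teardown

    def cpu_power_down(self):
        with self.lock:
            self.cpus_up -= 1
            last_man = (self.cpus_up == 0)
        if last_man:
            # last man: flush L2, disable CCI port, power off cluster (simulated)
            self.teardowns += 1
        # otherwise: flush L1 only, power off just this CPU (simulated)

def demo(ncpus=4):
    cluster = Cluster(ncpus)
    threads = [threading.Thread(target=cluster.cpu_power_down)
               for _ in range(ncpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cluster.teardowns
```

However many CPUs race down, exactly one must win the election; this is the invariant that someone, kernel or firmware, has to maintain.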
OK, so if I understand you correctly, you don't want to see PSCI firmware in use at all or at least not in its current form (which *you* also reviewed) because you think it is too complex to be bug-free or impractical to upgrade.
Well... PSCI as an *interface* definition is sound. I reviewed this interface definition which is great for virtualized systems as well as simple hardware.
What I didn't fully grasp at the time is the implied _complexity_ when doing power management on a cluster based system. Partly because I had not started the work on MCPM at the time and I seriously under-estimated the effort needed to 1) get it done, and 2) get it right.
Now PSCI is very abstract which is both a good and a bad thing. It is good because it makes for a very nice and clean interface that hides all the platform particularities. And I do understand why you as the maintainer of a new architecture could be fond of it as it keeps the kernel clean and free of machine specific quirks.
However it implies that a lot of the complexity currently handled by MCPM in the kernel has to move behind the PSCI interface i.e. in the secure firmware. And *that* is bad.
For some secure firmware to actually be secure, it has to be simple and obvious. Security always gets compromised to some degree with added complexity.
I respect your opinion but do you have a more concrete proposal? The options so far:
1. (current status) Don't use PSCI firmware, let Linux handle all the CPU power management (possibly under the MCPM framework). If not all power-related actions can be done at the non-secure level, just implement non-standard SMC calls as needed. If these are changed (because in time vendor may have other security needs), add them to the driver and hope they have a way to detect or just not upstream the #ifdef'ed code.
2. New standard firmware interface, simpler and less error-prone. Handle most power management in Linux (with an MCPM-like state machine) and have guaranteed race-free calls to the firmware. In the process, also convince the secure OS guys that Linux is part of their trusted environment (I personally trust Linux more than the trusted OS ;) but this doesn't hold any water for certification purposes). Basically if you can disable coherency from the non-secure OS (e.g. CCI or just powering down a CPU without the secure OS having the chance to flush its caches), the only workaround for the secure OS would be to run UP (which is probably the case now) or flush caches at every return to the non-secure world.
3. Very similar to 2, with PSCI firmware interface but without the requirements to do cluster state coordination in firmware (with some semantic changes to the affinity arguments). Linux handles the state coordination (MCPM state machine) but PSCI firmware does the necessary flushing, coherency disabling based on the specified affinity level (it doesn't question that because it does not track the "last man"). Slightly better security model than 2 as it can flush the secure OS caches but I'm not entirely sure PSCI can avoid a state machine and whether this has other security implications.
For 1-3, either the vendor commits to having no resident Secure World software, or it has to check the firmware call sequence, in which case it's hard to see how significant complexity is eliminated.
If there are vendors whose products will really have no resident security software on CPU(s) shared with Linux, these models can work well.
4. MCPM state machine on top of full PSCI. Here I don't see the point of tracking cluster/last-man state in Linux if PSCI does it already. If PSCI got it wrong (broken coherency, deadlocks), MCPM cannot really solve it. Also, are there additional races by having two separate state machines for the same thing (I can't think of any now)?
5. Full PSCI with a light wrapper (smp_operations) allowing SoC to hook additional code (only if needed for parts which aren't handled by firmware). This is similar to the mcpm_platform_ops registration but *without* the MCPM state machine and without the separate cpu/cluster arguments (just linearise this space for flexibility). Other key point is that the re-entry address is driven via PSCI rather than mcpm_entry_vectors. Platform ops registration is optional, just available for flexibility (which could also mean that it is not the platform ops driving the final PSCI call, different from the MCPM framework). This approach does not enforce a secure model, it's up to the SoC vendor to allow/prevent non-secure access to power controller, CCI etc. But it still mandates common kernel entry/exit path via PSCI.
If there is a dumb firmware which just passes through PM/cache/coherency control operations without safe sequencing, then MCPM-like coordination will be needed on the Linux side -- there's no avoiding that.
Linux must do that coordination unless the firmware does it. _Someone_ has to do it, or otherwise clusters must never be turned off unless the whole system is powered down (that's option 7 -- power down at whole-node granularity only, maybe some server folks with many nodes are actually fine with that).
6. Full PSCI with generic CPU hotplug and cpuidle drivers. I won't list the pros/cons, that's what this thread is mainly about.
Any other options?
My goal is for 6 but 5 could be a more practical/flexible approach.
Both those options (5 and 6) imply the state machine is in the firmware. And that's where complexity lies. So that wouldn't be my choice at all.
Option 4 is rather useless. In fact the MCPM backend for PSCI I showed you doesn't exercise the MCPM state machine. And the firmware still implements a state machine.
Differences between 2,3,4 are a bit fuzzy to me. I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
Basically yes. Most people seem to agree that it will be "too difficult" for Linux and the Secure OS to cooperate productively if they both try to drive power policy.
But what exactly is the firmware supposed to do when receiving an out-of-order "disable CCI" call?
Either you:
a) Just Do It and hope for the best. (This breaks design assumptions all over the architecture, hardware and, I suspect, all existing Secure OSes.)
b) Refuse (requires internal MCPM in the firmware to detect the out-of-order call)
c) Virtualise it and handle the power-down asynchronously (requires internal MCPM again)
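Option (b) can be sketched to show why even refusing requires state. In this toy Python model (a hypothetical interface, not any real PSCI call), the firmware rejects a "disable CCI" request unless every other CPU in the cluster has already gone down; to make that decision at all, it must track per-CPU state, i.e. carry an internal MCPM of its own:

```python
class Firmware:
    """Toy model of option (b): refuse out-of-order 'disable CCI' calls.
    Hypothetical interface, for illustration only."""
    def __init__(self, cluster_cpus):
        # Even just to refuse, the firmware must know which CPUs are up.
        self.cpu_up = {cpu: True for cpu in cluster_cpus}

    def cpu_off(self, cpu):
        # CPU has gone through its own orderly power-down path
        self.cpu_up[cpu] = False

    def disable_cci(self, calling_cpu):
        others_up = [c for c, up in self.cpu_up.items()
                     if up and c != calling_cpu]
        if others_up:
            # out-of-order call: refuse, per option (b)
            raise PermissionError("CPUs still up: %r" % others_up)
        return "CCI port disabled"
```

The state tracking (`cpu_up`) is exactly the internal-MCPM requirement the parenthetical points at.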
The other option is to make the "pack and hide" request explicit -- i.e., there is a Big Cluster Lock which the Normal World has to request and obtain explicitly before low-level power management calls like "disable CCI" are permitted.
The trouble is, I think that it's not possible to come up with a fully correct implementation of the Big Cluster Lock without an MCPM equivalent again, or something similarly complex. Releasing the BCL is particularly problematic ... did the Normal World really set things up right? How can you be sure? Packaging all the setup/teardown as single calls makes this easier to track, but that starts to look a lot like PSCI again.
Needless to say, this would be a significant change from PSCI, still has much or all of the complexity, and would probably get implemented wrong a few times before people get it right, especially when each vendor has their own implementation.
Cheers ---Dave
Of course option 1 is the most flexible in terms of implementation efficiency, but it has drawbacks as well.
As I said in a previous post, I'm not against MCPM as such but against the back-ends which will eventually get non-standard secure calls.
Maybe the secure call standardization should focus on providing less abstract and more low-level interfaces then. Right now, the PSCI definition implies that everything is performed behind the interface.
Sorry but you are late for this call. PSCI has been discussed for nearly two years and you have been involved.
And two years later I've gained enough experience to state that this might not be such a good idea for all cases after all. Sometimes you may not realize all the ramifications of a concept or design until you do actual work on it.
And this is not my call to make either. System vendors will choose their own poison for themselves. Between risk inducing complexity in secure firmware and non-standard low-level machine specific L3 calls I don't think there is much to rejoice about.
But my point is that there is legitimacy in both models and right now I don't see a single "true" solution. Hence my claim that you should not insist on PSCI or nothing.
Nicolas
linaro-kernel mailing list linaro-kernel@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-kernel
On Fri, Sep 06, 2013 at 02:50:45AM +0100, Nicolas Pitre wrote:
On Thu, 5 Sep 2013, Catalin Marinas wrote:
I respect your opinion but do you have a more concrete proposal? The options so far:
1. (current status) Don't use PSCI firmware, let Linux handle all the CPU power management (possibly under the MCPM framework). If not all power-related actions can be done at the non-secure level, just implement non-standard SMC calls as needed. If these are changed (because in time vendor may have other security needs), add them to the driver and hope they have a way to detect or just not upstream the #ifdef'ed code.
2. New standard firmware interface, simpler and less error-prone. Handle most power management in Linux (with an MCPM-like state machine) and have guaranteed race-free calls to the firmware. In the process, also convince the secure OS guys that Linux is part of their trusted environment (I personally trust Linux more than the trusted OS ;) but this doesn't hold any water for certification purposes). Basically if you can disable coherency from the non-secure OS (e.g. CCI or just powering down a CPU without the secure OS having the chance to flush its caches), the only workaround for the secure OS would be to run UP (which is probably the case now) or flush caches at every return to the non-secure world.
3. Very similar to 2, with PSCI firmware interface but without the requirements to do cluster state coordination in firmware (with some semantic changes to the affinity arguments). Linux handles the state coordination (MCPM state machine) but PSCI firmware does the necessary flushing, coherency disabling based on the specified affinity level (it doesn't question that because it does not track the "last man"). Slightly better security model than 2 as it can flush the secure OS caches but I'm not entirely sure PSCI can avoid a state machine and whether this has other security implications.
4. MCPM state machine on top of full PSCI. Here I don't see the point of tracking cluster/last-man state in Linux if PSCI does it already. If PSCI got it wrong (broken coherency, deadlocks), MCPM cannot really solve it. Also, are there additional races by having two separate state machines for the same thing (I can't think of any now)?
5. Full PSCI with a light wrapper (smp_operations) allowing SoC to hook additional code (only if needed for parts which aren't handled by firmware). This is similar to the mcpm_platform_ops registration but *without* the MCPM state machine and without the separate cpu/cluster arguments (just linearise this space for flexibility). Other key point is that the re-entry address is driven via PSCI rather than mcpm_entry_vectors. Platform ops registration is optional, just available for flexibility (which could also mean that it is not the platform ops driving the final PSCI call, different from the MCPM framework). This approach does not enforce a secure model, it's up to the SoC vendor to allow/prevent non-secure access to power controller, CCI etc. But it still mandates common kernel entry/exit path via PSCI.
6. Full PSCI with generic CPU hotplug and cpuidle drivers. I won't list the pros/cons, that's what this thread is mainly about.
Any other options?
My goal is for 6 but 5 could be a more practical/flexible approach.
Both those options (5 and 6) imply the state machine is in the firmware. And that's where complexity lies. So that wouldn't be my choice at all.
Indeed.
Option 4 is rather useless. In fact the MCPM backend for PSCI I showed you doesn't exercise the MCPM state machine. And the firmware still implements a state machine.
Correct. That's mainly to show that a PSCI back-end to MCPM doesn't help much. It adds some unification but we already have a common smp_ops API. The MCPM re-entry point is handled by the PSCI API.
Differences between 2,3,4 are a bit fuzzy to me.
4 we just discussed above. 2 and 3 are basically the same, with 3 an instantiation of a standard firmware API but it still has the same issues as 1 in terms of security.
I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
The problem is when it is *not* told to do so. If the non-secure OS is allowed to disable coherency (at the CCI level or simply by shutting down a CPU in a cluster) *without* the secure OS being informed, the trusted model is broken (or has to include the non-secure OS). In a more paranoid world, this part must be moved into the secure firmware and there is no way to do it without a similar "last man" state machine. It is probably hard to create an attack but random data corruption in the secure OS is not something that can be ignored.
Of course option 1 is the most flexible in terms of implementation efficiency, but it has drawbacks as well.
Too much flexibility also has drawbacks and we have the ARM SoC past experience - code duplication, difficult single zImage, people asking for machine quirks in __v7_setup (though we managed to prevent them so far). A unified approach (like standard firmware interface) should be the default and we can later relax it if there are good reasons. This unification is more important for the server distro space as the mobile space tends not to contribute that much back into the kernel.
And this is not my call to make either. System vendors will choose their own poison for themselves. Between risk inducing complexity in secure firmware and non-standard low-level machine specific L3 calls I don't think there is much to rejoice about.
As I said above, this complexity in the firmware is required to *increase* security. Of course, any complexity has its own risks but unless you include the non-secure OS in your trusted environment there is no way around (well, not with TrustZone at least and not efficiently, but you can always have a separate independent processor doing security-related stuff).
On Fri, 6 Sep 2013, Catalin Marinas wrote:
On Fri, Sep 06, 2013 at 02:50:45AM +0100, Nicolas Pitre wrote:
I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
The problem is when it is *not* told to do so.
Well, just halt the whole system in that case. Or raise a fault if you want to be nice.
This is like memory protection in user space: if you don't ask the kernel for extra memory before touching it, you get killed. But the kernel doesn't interfere with user space for the managing of that memory beyond giving it blank pages.
If the non-secure OS is allowed to disable coherency (at the CCI level or simply by shutting down a CPU in a cluster) *without* the secure OS being informed, the trusted model is broken (or has to include the non-secure OS). In a more paranoid world, this part must be moved into the secure firmware and there is no way to do it without a similar "last man" state machine. It is probably hard to create an attack but random data corruption in the secure OS is not something that can be ignored.
As I said, the secure OS model should imply veto power only, not executive power. That ought to be sufficient. The non-secure world can learn to inform the secure OS of its intent.
Of course option 1 is the most flexible in terms of implementation efficiency, but it has drawbacks as well.
Too much flexibility also has drawbacks and we have the ARM SoC past experience - code duplication, difficult single zImage, people asking for machine quirks in __v7_setup (though we managed to prevent them so far).
No no... code duplication and difficult single zImage are not arguments I'll buy. That solely has to do with project structure and code design. Proof is that we're getting there now despite varied machine architectures, and if we had the manpower at the time we could have done it from the start. The difficulty is in changing established habits, just like this secure OS model.
A unified approach (like standard firmware interface) should be the default and we can later relax it if there are good reasons. This unification is more important for the server distro space as the mobile space tends not to contribute that much back into the kernel.
Again this is a fallacious argument. The fastest growing architecture in the Linux kernel is ARM32 at the moment, and that is mostly about the mobile space.
It is true that the server space is not concerned as deeply about power management as the battery powered mobile space. The time-to-market pressure and life cycle are quite different as well. So that tends to favor a standard firmware interface. OTOH servers aren't as much into cost cutting to the point of reducing the number of MCUs to zero and putting the equivalent functionality into TrustZone. That helps keep firmware simple.
And this is not my call to make either. System vendors will choose their own poison for themselves. Between risk inducing complexity in secure firmware and non-standard low-level machine specific L3 calls I don't think there is much to rejoice about.
As I said above, this complexity in the firmware is required to *increase* security.
IMHO this statement is nonsense and a clear indication that something somewhere was not designed properly.
Nicolas
On 6 Sep 2013, at 20:52, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Fri, 6 Sep 2013, Catalin Marinas wrote:
On Fri, Sep 06, 2013 at 02:50:45AM +0100, Nicolas Pitre wrote:
I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
The problem is when it is *not* told to do so.
Well, just halt the whole system in that case. Or raise a fault if you want to be nice.
I don't think you got my point. How do you force the halting of the whole system when the non-secure OS controls what to halt? It controls which CPUs to halt, when to disable cluster coherency. This is normally for good reasons like power management but a malicious non-secure OS may also use these to cause data corruption on the secure side (with various consequences, it allows options for attack).
If you want the secure side to detect what the non-secure OS tries to halt, you don't allow non-secure access to certain peripherals like CCI, power controller, DCSCB etc. At this point the only option is do it in firmware. You still want the power decision (policy) to happen in the non-secure OS but with the actual hardware access in firmware. For the same malicious use-case reasons, the firmware cannot afford to rely on the non-secure OS to prevent "last man" races.
This is like memory protection in user space: if you don't ask the kernel for extra memory before touching it, you get killed. But the kernel doesn't interfere with user space for the managing of that memory beyond giving it blank pages.
I don't think I understand the analogy. If I got it right, it's more like allowing user space to get its memory without trapping into the kernel.
Shifting the privilege levels down, a better analogy would be user application (non-root since root has a special 'trusted' status in Linux) able to control coherency and CPU/cluster shutdown *without* having to do system calls. Would you feel comfortable with this?
The only workaround is not to trust things controlled from outside that privilege level, in this case coherency. This pretty much means UP only.
If the non-secure OS is allowed to disable coherency (at the CCI level or simply by shutting down a CPU in a cluster) *without* the secure OS being informed, the trusted model is broken (or has to include the non-secure OS). In a more paranoid world, this part must be moved into the secure firmware and there is no way to do it without a similar "last man" state machine. It is probably hard to create an attack but random data corruption in the secure OS is not something that can be ignored.
As I said, the secure OS model should imply veto power only, not executive power. That ought to be sufficient. The non-secure world can learn to inform the secure OS of its intent.
The veto power can only be imposed by blocking access to devices controlling coherency/shutdown. That's the only way you can safely 'teach' the non-secure OS to inform the secure OS about it.
Of course option 1 is the most flexible in terms of implementation efficiency, but it has drawbacks as well.
Too much flexibility also has drawbacks and we have the ARM SoC past experience - code duplication, difficult single zImage, people asking for machine quirks in __v7_setup (though we managed to prevent them so far).
No no... code duplication and difficult single zImage are not arguments I'll buy. That solely has to do with project structure and code design.
And interfaces, including those to firmware, are part of the code design.
Proof is that we're getting there now despite varied machine architectures, and if we had the manpower at the time we could have done it from the start. The difficulty is in changing established habits, just like this secure OS model.
It depends on how you start. If you wait to see how many patterns appear and later try to unify them you need a lot more manpower. The aim here is to try to provide rules and only relax the rules when there are valid reasons (sorry but not trusting the secure firmware doesn't look to me like a valid one).
A unified approach (like standard firmware interface) should be the default and we can later relax it if there are good reasons. This unification is more important for the server distro space as the mobile space tend not to contribute that much back into the kernel.
Again this is a fallacious argument. The fastest growing architecture in the Linux kernel is ARM32 at the moment, and that is mostly about the mobile space.
That was an observation, I don't see lots of code going into mainline that results from mobile device development. We see some SoC vendors contributing (though not all) but many just keep it out of tree. Those doing out of tree development don't always care to put the effort into a clean, standard implementation. Which I find a bit sad.
And this is not my call to make either. System vendors will choose their own poison for themselves. Between risk inducing complexity in secure firmware and non-standard low-level machine specific L3 calls I don't think there is much to rejoice about.
As I said above, this complexity in the firmware is required to *increase* security.
IMHO this statement is nonsense and a clear indication that something somewhere was not designed properly.
It's only nonsense if you don't fully understand the security implications I tried to explain above.
Catalin
On Sat, 7 Sep 2013, Catalin Marinas wrote:
On 6 Sep 2013, at 20:52, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Fri, 6 Sep 2013, Catalin Marinas wrote:
On Fri, Sep 06, 2013 at 02:50:45AM +0100, Nicolas Pitre wrote:
I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
The problem is when it is *not* told to do so.
Well, just halt the whole system in that case. Or raise a fault if you want to be nice.
I don't think you got my point. How do you force the halting of the whole system when the non-secure OS controls what to halt? It controls which CPUs to halt, when to disable cluster coherency. This is normally for good reasons like power management but a malicious non-secure OS may also use these to cause data corruption on the secure side (with various consequences, it allows options for attack).
What I meant is:
- Secure OS traps on any attempt from the non-secure OS to disable coherency or halt CPUs while it is active.
- Non-secure OS wants to do some power management so it tells secure OS to pack its things and remove its hands from the hardware controls.
- While non-secure OS has control over the hardware knobs, secure OS refuses to operate.
- Non-secure OS tells secure OS to come back. Secure OS reinstates its watch guard on the hardware control knobs.
- If non-secure OS attempts to touch the hardware knobs without telling secure OS to get away first, secure OS takes offence and either hangs the system or signals a fault.
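The protocol above gives the secure OS veto power only. A toy Python model of the handshake (method names are hypothetical, chosen to mirror the wording above): a hardware knob may be touched only between "pack up" and "come back"; out of order, it faults.

```python
class SecureOS:
    """Toy model of Nicolas's veto-only handshake. Illustrative only."""
    def __init__(self):
        self.parked = False

    def pack_up(self):
        # Non-secure OS: "pack your things, remove your hands from the knobs."
        # Flush secure caches, quiesce secure services (simulated).
        self.parked = True

    def come_back(self):
        # Non-secure OS: "resume service." Watch guard reinstated.
        self.parked = False

    def touch_hw_knob(self):
        # Non-secure OS touches e.g. CCI control or the power controller.
        if not self.parked:
            raise RuntimeError("fault: secure OS still active")
        return "ok"
```

The secure OS never drives policy here; it only refuses to stand aside until asked properly, which is the veto-not-executive model argued for below.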
If you want the secure side to detect what the non-secure OS tries to halt, you don't allow non-secure access to certain peripherals like CCI, power controller, DCSCB etc.
As I say above. But...
At this point the only option is do it in firmware.
What I'm saying is that there could be another option i.e. telling secure OS to lie down and give non-secure OS control over the hardware knobs until non-secure OS asks secure OS to come back into service.
You still want the power decision (policy) to happen in the non-secure OS but with the actual hardware access in firmware.
That's where things get murky. The policy comes as a result of last man determination, etc. In other words, the policy is not only about "I want to save power now". It is also "what kind of power saving I can afford now". And that's basically what MCPM does. With an abstract interface such as PSCI, that policy decision is moved into firmware.
For the same malicious use-case reasons, the firmware cannot afford to rely on the non-secure OS to prevent "last man" races.
Again, by the time non-secure OS attempts to determine the last man, it should have told the secure OS to take cover.
This is like memory protection in user space: if you don't ask the kernel for extra memory before touching it, you get killed. But the kernel doesn't interfere with user space for the managing of that memory beyond giving it blank pages.
I don't think I understand the analogy. If I got it right, it's more like allowing user space to get its memory without trapping into the kernel.
I hope my explanation above clears up this analogy now.
Shifting the privilege levels down, a better analogy would be user application (non-root since root has a special 'trusted' status in Linux) able to control coherency and CPU/cluster shutdown *without* having to do system calls. Would you feel comfortable with this?
Let me propose a counter-example: PSCI with the power management in secure firmware is like having the GNOME Power Manager compiled into the kernel. It would work of course, but maybe the GNOME developers would prefer dealing with it in user space instead without having to update the kernel when there is a bug in the Power Manager.
The only workaround is not to trust things controlled from outside that privilege level, in this case coherency. This pretty much means UP only.
And why is that a problem? Certainly the secure OS shouldn't need to be that CPU hungry, just like the Linux kernel is not supposed to take away too many CPU cycles from user space.
As I said, the secure OS model should imply veto power only, not executive power. That ought to be sufficient. The non-secure world can learn to inform the secure OS of its intent.
The veto power can only be imposed by blocking access to devices controlling coherency/shutdown. That's the only way you can safely 'teach' the non-secure OS to inform the secure OS about it.
That's what I say above.
It depends on how you start. If you wait to see how many patterns appear and later try to unify them you need a lot more manpower. The aim here is to try to provide rules and only relax the rules when there are valid reasons (sorry but not trusting the secure firmware doesn't look to me like a valid one).
It's not about mistrust. It is about proper software complexity maintenance.
Nicolas
On 7 Sep 2013, at 21:31, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Sat, 7 Sep 2013, Catalin Marinas wrote:
On 6 Sep 2013, at 20:52, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Fri, 6 Sep 2013, Catalin Marinas wrote:
On Fri, Sep 06, 2013 at 02:50:45AM +0100, Nicolas Pitre wrote:
I understand the issue with having a secure OS that needs to protect itself from the nasty Linux world. However, if I understand the model right, the secure OS is there to provide special services to the non-secure OS and not the reverse. Therefore the secure OS should simply pack and hide its things when told to do so, right?
The problem is when it is *not* told to do so.
Well, just halt the whole system in that case. Or raise a fault if you want to be nice.
I don't think you got my point. How do you force the halting of the whole system when the non-secure OS controls what to halt? It controls which CPUs to halt, when to disable cluster coherency. This is normally for good reasons like power management but a malicious non-secure OS may also use these to cause data corruption on the secure side (with various consequences, it allows options for attack).
What I meant is:
- Secure OS traps on any attempt from the non-secure OS to disable coherency or halt CPUs while it is active.
Good. So we agree that the non-secure OS cannot freely disable coherency or halt CPUs without the secure OS being informed first.
The architecture does not allow trapping at EL3 as that's normally for hypervisor-type implementations. But it can indeed (subject to SoC implementation) block access to certain peripherals, in which case the non-secure OS (at EL1) most likely gets an external synchronous abort.
- Non-secure OS wants to do some power management so it tells secure OS to pack its things and remove its hands from the hardware controls.
OK, so let's assume the non-secure OS does an SMC #PREPARE_* so that the firmware enables non-secure access to such hardware after packing its things.
- While non-secure OS has control over the hardware knobs, secure OS refuses to operate.
- Non-secure OS tells secure OS to come back. Secure OS reinstates its watch guard on the hardware control knobs.
- If non-secure OS attempts to touch the hardware knobs without telling secure OS to get away first, secure OS takes offence and either hangs the system or signals a fault.
What you are missing here is that secure OS "packing its things" is a lot more complex than simply "refusing to operate". Let's consider some scenarios:
First scenario, the non-secure OS tells the secure one to pack all its things, no matter whether it's CPU suspend or power-down:
1. Non-secure OS issues SMC #SECURE_PACK_ALL.
2. Secure OS needs to issue IPIs to all the CPUs that may be running secure code.
3. Non-secure OS performs CPU or cluster power down.
This is either inefficient (secure OS waking up all the CPUs that may be in suspend) or just not possible if some CPUs were in power-down mode rather than suspend. In addition, if only a single CPU is in suspend, you still want the secure OS to work on the other CPUs. We can just dismiss the "pack all things" scenario.
Second scenario, per-CPU SMC #PREPARE_CPU_DOWN (or SUSPEND):
1. Non-secure OS issues SMC #PREPARE_CPU_DOWN.
2. Secure OS disables the MMU (coherency) and flushes its caches on that CPU. It then enables non-secure access to the power controller.
3. Non-secure OS performs CPU power down.
This scenario only works if the secure firmware can control CPU and cluster down independently. Let's assume this is doable, so we move to the next scenario.
Third scenario, per-cluster SMC #PREPARE_CLUSTER_DOWN (or SUSPEND) as a result of a 'last man' detection (in the non-secure OS):
1. Non-secure OS issues SMC #PREPARE_CLUSTER_DOWN.
2. Secure OS disables the MMU on that CPU, flushes L1 and L2 caches, enables non-secure access to power controller.
3. Non-secure OS performs cluster power down.
At point 2 above, the secure OS has 3 options:
2.a) trusts the non-secure OS to have shut down the other CPUs.
2.b) issues IPI to the other CPUs in the cluster to pack things (flush caches, disable MMU).
2.c) refuses to enable non-secure access to power controller.
2.a breaks the security model. 2.b has the same issues as the first scenario (which CPUs to send the IPI to?). 2.c is the safest but it *requires* a 'last man' state machine in the *secure* firmware (same with 2.b, it would need to track which CPUs in the cluster are still up).
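Option 2.c implies exactly the kind of book-keeping MCPM does, but at the secure level. A minimal user-space sketch of that 'last man' tracking (all names are hypothetical; this is a model, not actual firmware code):

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS_PER_CLUSTER 4

/* Hypothetical per-cluster book-keeping the secure firmware would need
 * in order to take option 2.c safely: it refuses a cluster power-down
 * request unless the caller is provably the last CPU still up. */
struct cluster_state {
    int cpus_up;                       /* CPUs not yet powered down */
    bool cpu_up[MAX_CPUS_PER_CLUSTER]; /* per-CPU up/down record    */
};

void fw_cpu_up(struct cluster_state *c, int cpu)
{
    if (!c->cpu_up[cpu]) {
        c->cpu_up[cpu] = true;
        c->cpus_up++;
    }
}

void fw_cpu_down(struct cluster_state *c, int cpu)
{
    if (c->cpu_up[cpu]) {
        c->cpu_up[cpu] = false;
        c->cpus_up--;
    }
}

/* Option 2.c: only grant cluster power-down when the requesting CPU is
 * the last one still up; otherwise refuse. */
bool fw_cluster_down_allowed(const struct cluster_state *c, int cpu)
{
    return c->cpus_up == 1 && c->cpu_up[cpu];
}
```

The point is not the code itself but where the count lives: if the non-secure OS kept `cpus_up`, a malicious kernel could simply lie about it, which is why 2.c requires the state machine on the secure side.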
You can do the above exercise again but instead of enabling non-secure access to the power controller, the firmware would perform the actual power controller action. You'll see that the cluster scenario still requires the firmware to track the 'last man' state.
You still want the power decision (policy) to happen in the non-secure OS but with the actual hardware access in firmware.
That's where things get murky. The policy comes as a result of last man determination, etc. In other words, the policy is not only about "I want to save power now". It is also "what kind of power saving I can afford now". And that's basically what MCPM does. With an abstract interface such as PSCI, that policy decision is moved into firmware.
Wrong. PSCI ops get an affinity parameter whether it's CPU or cluster power down/suspend. Of course, you can always ask for cluster if only interested in power saving and PSCI can choose what is safe. There isn't anything in PSCI that would take the CPU vs cluster decision away from the non-secure OS.
And the MCPM framework is not the place for such CPU vs cluster policy either. This needs to be decided higher up in the cpuidle subsystem, in abstract terms like target residency and time taken to recover from various low power states. You may go for cluster down directly if 'last man' but may as well go for CPU down first even if 'last man'. This is a decision to be taken by the cpuidle governor and *not* by MCPM. PSCI already allows this via the affinity parameter.
Of course, we could have said a PSCI CPU_POWER_DOWN with cluster affinity should return an error if not 'last man'. But this would have required a duplicate 'last man' state machine in the non-secure OS.
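For reference, the affinity parameter discussed above is carried in the power_state argument of PSCI's CPU_SUSPEND call. The helpers below sketch the v0.2 field layout; double-check the exact bit positions against the PSCI specification before relying on them:

```c
#include <assert.h>
#include <stdint.h>

/* PSCI v0.2 CPU_SUSPEND power_state layout (per the PSCI spec;
 * verify field positions against the document you target):
 *   bits [15:0]  StateID (implementation defined)
 *   bit  [16]    StateType: 0 = standby, 1 = powerdown
 *   bits [25:24] AffinityLevel: 0 = CPU, 1 = cluster, ... */
static inline uint32_t psci_power_state(uint32_t state_id,
                                        uint32_t powerdown,
                                        uint32_t aff_level)
{
    return (state_id & 0xffffu) |
           ((powerdown & 0x1u) << 16) |
           ((aff_level & 0x3u) << 24);
}

static inline uint32_t psci_aff_level(uint32_t power_state)
{
    return (power_state >> 24) & 0x3u;
}
```

So asking for CPU-level vs cluster-level power-down is just a different affinity level in the same call; the choice stays with the caller, i.e. the non-secure OS.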
For the same malicious use-case reasons, the firmware cannot afford to rely on the non-secure OS to prevent "last man" races.
Again, by the time non-secure OS attempts to determine the last man, it should have told the secure OS to take cover.
It cannot take cover entirely just because a CPU is going into idle. See above.
Shifting the privilege levels down, a better analogy would be user application (non-root since root has a special 'trusted' status in Linux) able to control coherency and CPU/cluster shutdown *without* having to do system calls. Would you feel comfortable with this?
Let me propose a counter-example: putting PSCI and the power management in secure firmware is like having the GNOME Power Manager compiled into the kernel. It would work of course, but maybe the GNOME developers would prefer dealing with it in user space instead, without having to update the kernel when there is a bug in the Power Manager.
I understand your uneasiness with more complex firmware but I now wonder whether you completely missed the point of PSCI. I'll restate - it does *not* take away the power management policy from the non-secure, high-level OS. It does what it is *asked* to do and in a safe, secure manner. This safety *requires* a 'last man' state machine in the firmware. You can have another state machine in the non-secure OS if you want/need to, but as I said above, such CPU vs cluster decisions should be made based on cost and residency, and that's done by the cpuidle governor. MCPM or PSCI can only choose the safest state for the security level they run at.
The only workaround is not to trust things controlled from outside that privilege level, in this case coherency. This pretty much means UP only.
And why is that a problem? Certainly the secure OS shouldn't need to be that CPU hungry, just like the Linux kernel is not supposed to take away too much CPU cycles from user space.
It's not about CPU intensive tasks. It is about the secure OS being available on all CPUs in an MP system. This secure OS can just be a library with a big lock to serialise MP access. But as long as it needs to run code on more than one CPU it needs to rely on safe cache coherency.
Imagine a secure OS which gets some data from secure storage and provides it to the non-secure OS:
1. Such data is copied from a PIO secure device as a result of an FIQ. If the FIQ happens on CPU0, the secure firmware would dirty the caches on that CPU.
2. A non-secure OS asks for that data on CPU1 via an SMC.
3. The secure OS performs a memcpy from the buffer previously allocated for the FIQ copy to the non-secure buffer.
The above is a normal secure service provided to the non-secure OS and memcpy in step 3 requires cache coherency, otherwise CPU1 can access old data and leak information.
You can think of other scenarios where a (malicious) full cluster is shut down but the non-secure OS 'missed' one of the CPUs (third scenario above). Same loss of data, information leak.
Catalin
On Sun, 8 Sep 2013, Catalin Marinas wrote:
On 7 Sep 2013, at 21:31, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Sat, 7 Sep 2013, Catalin Marinas wrote:
You still want the power decision (policy) to happen in the non-secure OS but with the actual hardware access in firmware.
That's where things get murky. The policy comes as a result of last man determination, etc. In other words, the policy is not only about "I want to save power now". It is also "what kind of power saving I can afford now". And that's basically what MCPM does. With an abstract interface such as PSCI, that policy decision is moved into firmware.
Wrong. PSCI ops get an affinity parameter whether it's CPU or cluster power down/suspend. Of course, you can always ask for cluster if only interested in power saving and PSCI can choose what is safe. There isn't anything in PSCI that would take the CPU vs cluster decision away from the non-secure OS.
And the MCPM framework is not the place for such CPU vs cluster policy either. This needs to be decided higher up in the cpuidle subsystem, in abstract terms like target residency and time taken to recover from various low power states. You may go for cluster down directly if 'last man' but may as well go for CPU down first even if 'last man'. This is a decision to be taken by the cpuidle governor and *not* by MCPM. PSCI already allows this via the affinity parameter.
I think this shows a misunderstanding of the role of MCPM on your part.
Indeed the cpuidle layer is responsible for deciding what level of power saving should be applied. But that is done on a per CPU basis. It *has* to be done on a per CPU basis because it is too difficult to track what's going on on the other CPUs in every subsystem interested in some form of power management.
What MCPM does is to receive this power saving request from cpuidle on individual CPUs including their target residency, etc. It also receives similar requests for CPU hotplug and so on. And then MCPM _arbitrates_ the combination of those requests according to:
- the strictest restrictions in terms of wake-up latency of _all_ CPUs in the same power domain, and
- the state of the other CPUs which might be in the process of coming back from an interrupt or any other event, and
- the particularities of the hardware platform where this is happening.
Not only that, but the determination of the best power saving mode to engage must be done in a race free manner that satisfies all the criteria on all CPUs. And the "race free" here must not be underestimated because the hardware might be in all varying states of coherency here, hence the MCPM ad hoc state machine outside of the regular kernel exclusion mechanisms which took us so long to get right.
So the concept of "policy" has to be split in two parts: what is _desired_ by the upper layer such as cpuidle as determined by the governor and its view of the system load and utilisation patterns vs implied costs, and the second part which is the _possible_ power saving mode according to the sum of all the constraints presented to MCPM by various requestors. And because the action of shutting down a CPU or a cluster may take some time (think of cache flushing) then those constraints may also change _during_ the operation and proper measures should be taken to re-evaluate the power management decision dynamically. And that can be achieved only by having simultaneous visibility into both the higher level requirements and the lower level changing hardware states.
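The arbitration step described above can be illustrated with a deliberately simplified model (names and units are hypothetical, and the race-free state machine that makes real MCPM hard is omitted):

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical per-CPU request as an arbitration layer might receive it
 * from cpuidle: the deepest state the CPU asked for, plus how much
 * wake-up latency it can tolerate. */
struct cpu_request {
    int requested_state;    /* 0 = run, 1 = CPU down, 2 = cluster down */
    int latency_budget_us;  /* max tolerated wake-up latency */
};

/* The core of the arbitration: the cluster state is the shallowest of
 * all per-CPU requests, further demoted if any CPU's latency budget is
 * below the cluster's exit latency. Races with CPUs waking up
 * mid-decision are deliberately ignored in this model. */
int arbitrate_cluster_state(const struct cpu_request req[NR_CPUS],
                            int cluster_exit_latency_us)
{
    int state = 2; /* start optimistic: cluster down */
    for (int i = 0; i < NR_CPUS; i++) {
        if (req[i].requested_state < state)
            state = req[i].requested_state;
        if (req[i].latency_budget_us < cluster_exit_latency_us && state > 1)
            state = 1;
    }
    return state;
}
```

The hard part in the real code is not this min() computation but making it race free while other CPUs may be waking up or changing their requests concurrently; that is where the MCPM state machine earns its keep.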
The coupled C-states in cpuidle are a good example of where this separation was not done properly. They were used initially to handle the CPU vs cluster power down on TC2 and that turned out to be impossible to work with outside of the cpuidle context, such as IKS or CPU hotplug. Some people are even thinking of getting rid of the coupled C-state layer entirely in favor of MCPM, even on pre-b.L systems, since it represents a better separation of responsibilities and a cleaner design overall.
Of course MCPM is not "done" yet. There are many things that still can be improved to be more efficient. But those improvements need research and experiments. And those might be either generic or completely different from one hardware platform to another. Yet, what MCPM provides is a proper separation of power management responsibilities so the higher level and the lower level can be developed and improved separately.
And above all, it needs a way for *easy* updates of the corresponding code over time when improvements are developed.
I understand your uneasiness with more complex firmware but I now wonder whether you completely missed the point of PSCI. I'll restate - it does *not* take away the power management policy from the non-secure, high-level OS. It does what it is *asked* to do and in a safe, secure manner.
I also wonder if on your end you missed the point of MCPM. I hope that you understand now that power management policy is far more elaborate and intricate than what the cpuidle layer should be concerned about.
So I reiterate my assertion that something is wrong in the overall secure OS architecture if it has to be that intimate with power management to the point of locking it up into firmware in order to remain secure.
Nicolas
On Sun, Sep 08, 2013 at 05:16:16PM +0100, Nicolas Pitre wrote:
On Sun, 8 Sep 2013, Catalin Marinas wrote:
On 7 Sep 2013, at 21:31, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Sat, 7 Sep 2013, Catalin Marinas wrote:
You still want the power decision (policy) to happen in the non-secure OS but with the actual hardware access in firmware.
That's where things get murky. The policy comes as a result of last man determination, etc. In other words, the policy is not only about "I want to save power now". It is also "what kind of power saving I can afford now". And that's basically what MCPM does. With an abstract interface such as PSCI, that policy decision is moved into firmware.
Wrong. PSCI ops get an affinity parameter whether it's CPU or cluster power down/suspend. Of course, you can always ask for cluster if only interested in power saving and PSCI can choose what is safe. There isn't anything in PSCI that would take the CPU vs cluster decision away from the non-secure OS.
And the MCPM framework is not the place for such CPU vs cluster policy either. This needs to be decided higher up in the cpuidle subsystem, in abstract terms like target residency and time taken to recover from various low power states. You may go for cluster down directly if 'last man' but may as well go for CPU down first even if 'last man'. This is a decision to be taken by the cpuidle governor and *not* by MCPM. PSCI already allows this via the affinity parameter.
I think this shows a misunderstanding of the role of MCPM on your part.
My understanding is mostly based on what's currently in mainline and the to-be-merged TC2 code. What I may not be aware of is future plans for MCPM, like the future use of the residency parameter (which I don't think should be handled in MCPM, see more below).
Indeed the cpuidle layer is responsible for deciding what level of power saving should be applied. But that is done on a per CPU basis. It *has* to be done on a per CPU basis because it is too difficult to track what's going on on the other CPUs in every subsystem interested in some form of power management.
I agree.
What MCPM does is to receive this power saving request from cpuidle on individual CPUs including their target residency, etc. It also receives similar requests for CPU hotplug and so on. And then MCPM _arbitrates_ the combination of those requests according to
I also agree that (in the absence of anything else) MCPM needs to arbitrate the combination of such requests.
- the strictest restrictions in terms of wake-up latency of _all_
CPUs in the same power domain, and
Wouldn't the strictest restrictions just translate to min(C-state(CPUs-in-cluster)), min(C-state(clusters)) etc.? IOW simple if/then/or/and rules because deeper C states have higher target residency and wake-up latency?
- the state of the other CPUs which might be in the process of coming
back from an interrupt or any other event, and
- the particularities of the hardware platform where this is happening.
It's fine for MCPM to handle this in the absence of any other synchronisation (which could be firmware).
So the concept of "policy" has to be split in two parts: what is _desired_ by the upper layer such as cpuidle as determined by the governor and its view of the system load and utilisation patterns vs implied costs, and the second part which is the _possible_ power saving mode according to the sum of all the constraints presented to MCPM by various requestors.
And that's where I think MCPM (or PSCI) should only be concerned with C-state concepts (and correct arbitration). Pushing actions based on the expected residency down to the MCPM back-end is a bad design decision IMHO.
Taking the TC2 code example (it may be extended, I don't know the plans here), it seems that the cpuidle driver is only concerned with the C1 state (CPU rather than cluster suspend). IIUC, cpuidle is not aware of deeper sleep states. The MCPM back-end would get expected residency information and make another decision for deeper sleep states. Where does it get the residency information from? Isn't this the estimation done by the cpuidle governor? At this point you pretty much move part of cpuidle governor functionality (and its concepts like target residency) down to the MCPM back-end level. Such a split will have bad consequences longer term: code duplication between back-ends, and cpuidle decision points that are harder to understand and maintain.
And because the action of shutting down a CPU or a cluster may take some time (think of cache flushing) then those constraints may also change _during_ the operation and proper measures should be taken to re-evaluate the power management decision dynamically. And that can be achieved only by having simultaneous visibility into both the higher level requirements and the lower level changing hardware states.
I understand the races and how MCPM avoids them. But why not keep the concepts clear: (1) residency and best C-state recommendation in cpuidle (policy), (2) actual C-state hardware setting in MCPM (mechanism).
Point (1) is a cpuidle driver defining C states (for a single CPU, it doesn't need to be concerned with cluster state, just abstract states):
C1: CPU suspend mode
C2: cluster suspend mode
C3: system suspend mode
etc.
Each of these states has a corresponding target_residency and exit_latency. The cpuidle governor makes the best recommendation for each CPU individually. If, for example, it expects a long sleep for a CPU, it can ask for (or recommend) a C2/C3 state directly.
Point (2) above is about MCPM (or PSCI) having an overall view of the cluster/system that allows it to select the best safe recommended C-state. Simplified pseudo-code:
if (all CPUs in cluster (have a recommended C2 state || are in power-down) && no CPU in cluster is coming up)
Enable cluster suspend
You can continue the logic for other C states, add more logic about CPUs coming up to avoid races. But this still means the strictest of all states (which normally means Cx stricter than Cy for x < y) in a race-free manner.
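The pseudo-code above could look like the following in plain C (a sketch only; field and function names are made up, and the race-avoidance logic for CPUs changing state mid-check is left out):

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CLUSTER_CPUS 4

/* Per-CPU view of the cluster, as the arbitration layer would see it.
 * All names here are hypothetical. */
struct cpu_view {
    int recommended_cstate; /* governor recommendation: 1, 2, 3, ... */
    bool powered_down;      /* already fully down */
    bool coming_up;         /* in the middle of waking */
};

/* Cluster suspend is allowed only if every CPU either recommends C2 or
 * deeper, or is already powered down, and no CPU is coming up. */
bool cluster_suspend_allowed(const struct cpu_view v[NR_CLUSTER_CPUS])
{
    for (int i = 0; i < NR_CLUSTER_CPUS; i++) {
        if (v[i].coming_up)
            return false;
        if (!v[i].powered_down && v[i].recommended_cstate < 2)
            return false;
    }
    return true;
}
```

This is exactly the "strictest of all states" rule: any single CPU recommending a shallower state, or in the process of waking, vetoes the cluster state.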
What I don't get is why you want to make decisions based on expected residency in the MCPM (framework or back-end). Isn't the C-state and the strictness ordering enough?
So I reiterate my assertion that something is wrong in the overall secure OS architecture if it has to be that intimate with power management to the point of locking it up into firmware in order to remain secure.
One of the ARM security architecture features is secure vs non-secure cache separation. Once the non-secure OS actions can affect the secure caches, the security model is broken. In such case the only way the secure OS can be secure is by not relying on its caches. That's a pretty simple model.
On Mon, 9 Sep 2013, Catalin Marinas wrote:
On Sun, Sep 08, 2013 at 05:16:16PM +0100, Nicolas Pitre wrote:
What MCPM does is to receive this power saving request from cpuidle on individual CPUs including their target residency, etc. It also receives similar requests for CPU hotplug and so on. And then MCPM _arbitrates_ the combination of those requests according to
I also agree that (in the absence of anything else) MCPM needs to arbitrate the combination of such requests.
- the strictest restrictions in terms of wake-up latency of _all_
CPUs in the same power domain, and
Wouldn't the strictest restrictions just translate to min(C-state(CPUs-in-cluster)), min(C-state(clusters)) etc.? IOW simple if/then/or/and rules because deeper C states have higher target residency and wake-up latency?
Sure, just like the CPU,cluster tuple vs MPIDR, this is just a different way to communicate the information. The current code in mainline is rather rough and serves as a basis for further discussion. IOW this is the Linux kernel with a policy of having no internal stable APIs, so it may easily evolve to fit the overall design over time. Those are simple implementation details compared to the question of whether or not this should live in firmware or in the kernel.
For instance, what is not universal are the C-state definitions. Hardware is getting more complex and varied, and it is becoming hard to simply abstract different power modes into a linear numerical scale. So the C-state concept will also need to evolve.
- the state of the other CPUs which might be in the process of coming
back from an interrupt or any other event, and
- the particularities of the hardware platform where this is happening.
It's fine for MCPM to handle this in the absence of any other synchronisation (which could be firmware).
It could be firmware... if you are convinced you'll be able to update the firmware easily if the synchronization model changes (which, as I highlighted already, is far from obvious, as exemplified by needed bug fixes in the TC2 firmware over 6 months after reporting them), or if you can be convinced that you're better than everyone else and your firmware implementation is totally free of bugs and that the implemented synchronization model will never need to change.
Look, I _know_ that firmware could be made to do the job and PSCI is an appropriate abstraction in that case. That is not the point. So let me refocus the discussion back to my initial concern that started this conversation.
No one so far has been able to demonstrate that any firmware implementation would be bug free, and/or flexible enough to cater for future innovations in the kernel, and/or easily updateable after product deployment just like the kernel is.
Complexity always induces bugs, and bugs in the firmware are orders of magnitude more costly than bugs in the kernel. That cost is either in the bug fixing with re-certification of the firmware and the sensitive update process, or in the need to work around those bugs that will be there forever and the inability to improve things over time with only a simple non-secure software update. But I'm sure you know that already.
Therefore having (equivalent of) MCPM in the firmware is _not_ cost effective. So it is to be expected that some people will make the engineering decision not to do the full power synchronization in the firmware and opt for simpler and more flexible alternatives to PSCI from the firmware perspective.
I'm sorry to bring this over again, but the discussion seems to keep being diverted away from this fundamental point with no concrete answers. I'm attributing this to a flaw in the overall secure world architecture that you have to put up with, hence the apparent diversion.
Nicolas
On Mon, Sep 9, 2013 at 3:13 PM, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Mon, 9 Sep 2013, Catalin Marinas wrote:
Therefore having (equivalent of) MCPM in the firmware is _not_ cost effective. So it is to be expected that some people will make the engineering decision not to do the full power synchronization in the firmware and opt for simpler and more flexible alternatives to PSCI from the firmware perspective.
I'm sorry to bring this over again, but the discussion seems to keep being diverted away from this fundamental point with no concrete answers. I'm attributing this to a flaw in the overall secure world architecture that you have to put up with, hence the apparent diversion.
As strongly as I am able, I second this point. I hazard that for a lot of products, the secure world increases the complexity of the software stack without providing any benefit. This is particularly true for a lot of embedded products where all of the hardware support revolves around the Linux kernel. Putting part of the functionality in firmware adds complexity to debugging and it makes upgrades more complicated.
As Nico said, this is *not* an argument against PSCI. When secure world is there it is absolutely the right thing to do.
However, there needs to be acknowledgement that some users will choose to put as little as possible in Secure World because they don't need or want it.
g.
On Mon, Sep 09, 2013 at 02:02:47PM +0100, Catalin Marinas wrote:
On Sun, Sep 08, 2013 at 05:16:16PM +0100, Nicolas Pitre wrote:
[...]
So the concept of "policy" has to be split in two parts: what is _desired_ by the upper layer such as cpuidle as determined by the governor and its view of the system load and utilisation patterns vs implied costs, and the second part which is the _possible_ power saving mode according to the sum of all the constraints presented to MCPM by various requestors.
And that's where I think MCPM (or PSCI) should only be concerned with C-state concepts (and correct arbitration). Pushing actions based on the expected residency down to the MCPM back-end is a bad design decision IMHO.
Taking the TC2 code example (it may be extended, I don't know the plans here), it seems that the cpuidle driver is only concerned with the C1 state (CPU rather than cluster suspend). IIUC, cpuidle is not aware of deeper sleep states. The MCPM back-end would get expected residency information and make another decision for deeper sleep states. Where does it get the residency information from? Isn't this the estimation done by the cpuidle governor? At this point you pretty much move part of cpuidle governor functionality (and its concepts like target residency) down to the MCPM back-end level. Such a split will have bad consequences longer term: code duplication between back-ends, and cpuidle decision points that are harder to understand and maintain.
IMHO the subject of this thread should not be related to power management policy decisions and where they should live. The goal of MCPM and PSCI was not about defining policy for power management but providing mechanism, and I agree with Catalin on this: we have to keep them separate. Then if the MCPM or PSCI implementation wants to demote a C-state request because the code is about to flush the L2 cache with a wake-up interrupt pending, that's perfectly fine by me; that is what Intel HW does BTW. But those are just optimizations; policy is implemented in the kernel, regardless of MCPM or PSCI. Let's always keep in mind that those policy decisions might and will be wrong sometimes, eg:
1) last man in a cluster (in MCPM or PSCI kingdom) polls pending IRQs
2) No IRQs pending - last man starts flushing L2
3) a packet shows its head at the NI and triggers an IRQ
-> policy decision goes for a toss (ie L2 is flushed for nothing)
Since the kernel has no crystal ball, policy decisions sometimes might be wrong and that's FINE and this will happen even if MCPM and PSCI trim those policy decisions (demoting a C-state is fine, and it is done in Intel world in HW all the time).
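The race above boils down to re-checking for wake-ups after the expensive flush and demoting if one arrived. A toy model (hypothetical names; the real code touches hardware and must be race free against concurrent IRQ delivery, which a simple flag cannot capture):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the last-man race: check pending IRQs, flush L2, then
 * re-check before committing, because an IRQ may have arrived during
 * the (long) flush. If it did, the C-state is demoted: the flush was
 * wasted but correctness holds. */
static bool irq_pending;
static int  l2_flushes;

static void flush_l2(void) { l2_flushes++; }

/* Returns true if cluster power-down was committed, false if demoted. */
bool last_man_cluster_down(void)
{
    if (irq_pending)
        return false;   /* demote before doing any work */
    flush_l2();         /* an IRQ may arrive while this runs */
    if (irq_pending)
        return false;   /* demote: L2 was flushed for nothing */
    return true;        /* commit cluster power-down */
}
```

The demotion wastes the flush but preserves correctness, which is exactly the point: the kernel has no crystal ball, and a wrong (but safely demoted) policy decision is fine.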
And yes, the menu governor has been written for Intel platforms where the cluster is a non-existent concept in C-state terms; reading C-states (in eg TC2) is misleading on ARM since we are forced to fill C-states with target residency values that cater for cluster states even if that state *depends* on the state of other CPUs. The menu governor makes decisions on a per-CPU basis and this is not optimal for ARM, it is as simple as that.
All this long-winded explanation to say that the debate MCPM vs PSCI has nothing to do with power management policies, that are better kept in the generic kernel layers and improved for ARM as a whole.
IMHO the debate must be and is around the coordination interface, which is by far the most important feature of MCPM, and I think you summarized the concepts very well, with respective pros and cons; if I am allowed to give my opinion, please do not split the coordination across layers, that would be a total disaster - CPUs must be coordinated at the level where synchronisation is required (if eg disabling/enabling the CCI has to be done in secure world, the coordination scheme must live in secure code).
Lorenzo
On Mon, 9 Sep 2013, Lorenzo Pieralisi wrote:
On Mon, Sep 09, 2013 at 02:02:47PM +0100, Catalin Marinas wrote:
Taking the TC2 code example (it may be extended, I don't know the plans here), it seems that the cpuidle driver is only concerned with the C1 state (CPU rather than cluster suspend). IIUC, cpuidle is not aware of deeper sleep states. The MCPM back-end would get expected residency information and make another decision for deeper sleep states. Where does it get the residency information from? Isn't this the estimation done by the cpuidle governor? At this point you pretty much move part of cpuidle governor functionality (and its concepts like target residency) down to the MCPM back-end level. Such a split will have bad consequences longer term: code duplication between back-ends, and cpuidle decision points that are harder to understand and maintain.
IMHO the subject of this thread should not be related to power management policy decisions and where they should live. The goal of MCPM and PSCI was not about defining policy for power management but providing mechanism and I agree with Catalin on this, we have to keep them separate.
I do agree as well. That's now where my argument fundamentally is. Please let's not divert the discussion again.
Nicolas
On Mon, 9 Sep 2013, Nicolas Pitre wrote:
On Mon, 9 Sep 2013, Lorenzo Pieralisi wrote:
On Mon, Sep 09, 2013 at 02:02:47PM +0100, Catalin Marinas wrote:
Taking the TC2 code example (it may be extended, I don't know the plans here), it seems that the cpuidle driver is only concerned with the C1 state (CPU rather than cluster suspend). IIUC, cpuidle is not aware of deeper sleep states. The MCPM back-end would get expected residency information and make another decision for deeper sleep states. Where does it get the residency information from? Isn't this the estimation done by the cpuidle governor? At this point you pretty much move part of cpuidle governor functionality (and its concepts like target residency) down to the MCPM back-end level. Such a split will have bad consequences longer term: code duplication between back-ends, and cpuidle decision points that are harder to understand and maintain.
IMHO the subject of this thread should not be related to power management policy decisions and where they should live. The goal of MCPM and PSCI was not about defining policy for power management but providing mechanism and I agree with Catalin on this, we have to keep them separate.
I do agree as well. That's now where my argument fundamentally is.
I meant "not" not "now" here, just to be clear.
Please let's not divert the discussion again.
Nicolas
On Mon, Sep 09, 2013 at 07:02:50PM +0100, Nicolas Pitre wrote:
On Mon, 9 Sep 2013, Lorenzo Pieralisi wrote:
On Mon, Sep 09, 2013 at 02:02:47PM +0100, Catalin Marinas wrote:
Taking the TC2 code example (it may be extended, I don't know the plans here), it seems that the cpuidle driver is only concerned with the C1 state (CPU rather than cluster suspend). IIUC, cpuidle is not aware of deeper sleep states. The MCPM back-end would get expected residency information and make another decision for deeper sleep states. Where does it get the residency information from? Isn't this the estimation done by the cpuidle governor? At this point you pretty much move part of cpuidle governor functionality (and its concepts like target residency) down to the MCPM back-end level. Such a split will have bad consequences longer term: code duplication between back-ends, and cpuidle decision points that are harder to understand and maintain.
IMHO the subject of this thread should not be related to power management policy decisions and where they should live. The goal of MCPM and PSCI was not about defining policy for power management but providing mechanism and I agree with Catalin on this, we have to keep them separate.
I do agree as well. That's not where my argument fundamentally is. Please let's not divert the discussion again.
To avoid diverting the discussion, we first need to agree on what this discussion is about. As Lorenzo said, the goal of MCPM or PSCI was not about defining policy but coordinating the C states in a multi-cluster context (with or without security implications). Policy in MCPM is a new thing you brought up, and if that's the future plan I want to stay far away from it. Selling MCPM as the next idle governor does not work for me, sorry.
Going back to the original topic, when we talk about MCPM vs PSCI in the arm64 context, we need to be clear on _which_ parts of MCPM to consider. PSCI is clearly defined, MCPM is not (as you said, it's work in progress but you can probably set high-level goals). So let's start with defining MCPM and please correct my understanding (based on what's currently in mainline):
1. Front-end to CPU hotplug and cpuidle.
2. Common back-end interface to low-level SoC power management. I would add mcpm_entry_vectors setting here.
3. 'last man' state machine for CPU/cluster power coordination.
I hope it is clear by now that if you care about _proper_ security in the power management context, point 3 above must be handled in firmware. That's independent of how complex the firmware needs to be for such task. PSCI as an API and the Generic Firmware implementation is addressing this (and at the same time providing a common framework for secure OS vendors to rely on). Generic Firmware will be further developed to address other concerns but that's not the point. Myself and others in this thread stated the security requirements for such complexity in firmware several times already. You can complain about the ARM secure architecture but, again, that's not something to be addressed in this thread.
So, if PSCI is needed in firmware for security reasons, it already takes care of point 3 above and it pretty much eliminates the need for 2 (if 2 is still needed, usually something is wrong in the SoC security model since non-secure OS should not be allowed power management actions that affect the secure layer coherency). We are left with point 1 which currently is smp_ops, so not a new interface for MCPM. PSCI support on arm64 plugs into smp_ops directly, so I don't see a need for MCPM front-end plus PSCI as back-end, it's just a no-op level of indirection.
I do _agree_ that not all SoCs have (TrustZone-like) security needs. There are two classes here: (a) EL3 is not present, and (b) EL3 is present but vendors don't plan to support a secure layer.
For (a) things are pretty clear, they need a 'last man' state machine for power management in the (only, no secure/non-secure differentiation) OS and this can be addressed by existing implementations like MCPM (point 3). As I said in a previous email, I would rather use MCPM as a library driven from SoC power management code, possibly overriding cpu_operations (currently smp_ops or a new class of cpu_pm_ops) with mcpm_* ones. Implementation details TBD (as needed) but an important aspect here is that MCPM or PSCI are _interchangeable_ and there won't be an MCPM framework with PSCI as back-end (see above on why I don't see value here).
Class (b) - EL3 present but no (apparent) need for a secure layer - is what I think this thread should be about. Here you have SoC vendors that simply don't need a security layer and others that don't bother (because it's the final mobile device manufacturer that has to put up with the secure OS integration). But they all need to deal with some level of firmware at EL3 because (1) that's the mode the CPU starts at and (2) that's where it needs to go to back for certain actions like ACTLR.SMP bit. These SMC calls I would really like standardised and PSCI is one way of doing it (MCPM doesn't really care about the firmware interface, that's a back-end issue). The TC2 MCPM implementation is not a good example for this case because it runs in secure mode and it's not an ARMv8/AArch64 SoC either.
My question (ignoring the security holes): if you _need_ EL3 to handle coherency enabling/disabling (ACTLR.SMP, CCI etc. not available at non-secure EL1), can you avoid a 'last man' state machine in the firmware and just rely on a non-secure EL1 (MCPM) state machine?
And please don't bring back the argument that Linux is more flexible, easily upgradable than the firmware, I don't dispute these. That's not a better code contest but a real technical problem regarding the interaction between non-secure OS and secure firmware. As far as I'm aware, the majority of (in development) ARMv8 implementations have EL3.
Hopefully I got the discussion back on track.
On Tue, 10 Sep 2013, Catalin Marinas wrote:
On Mon, Sep 09, 2013 at 07:02:50PM +0100, Nicolas Pitre wrote:
On Mon, 9 Sep 2013, Lorenzo Pieralisi wrote:
On Mon, Sep 09, 2013 at 02:02:47PM +0100, Catalin Marinas wrote:
Taking the TC2 code example (it may be extended, I don't know the plans here), it seems that the cpuidle driver is only concerned with the C1 state (CPU rather than cluster suspend). IIUC, cpuidle is not aware of deeper sleep states. The MCPM back-end would get expected residency information and make another decision for deeper sleep states. Where does it get the residency information from? Isn't this the estimation done by the cpuidle governor? At this point you pretty much move part of the cpuidle governor functionality (and its concepts like target residency) down to the MCPM back-end level. Such a split will have bad consequences in the longer term: code duplication between back-ends, and cpuidle decision points that are harder to understand and maintain.
IMHO the subject of this thread should not be related to power management policy decisions and where they should live. The goal of MCPM and PSCI was not about defining policy for power management but providing mechanism and I agree with Catalin on this, we have to keep them separate.
I do agree as well. That's not where my argument fundamentally is. Please let's not divert the discussion again.
To avoid diverting the discussion, we first need to agree on what this discussion is about. As Lorenzo said, the goal of MCPM or PSCI was not about defining policy but coordinating the C-states in a multi-cluster context (with or without security implications). Policy in MCPM is a new thing you brought up, and if that's the future plan I want to stay far away from it. Selling MCPM as the next idle governor does not work for me, sorry.
That's not what I wanted to convey at all. If that's the impression I gave you then I sincerely apologize. Lorenzo probably more appropriately designated it as a mechanism that also has the ability to demote policy requests which is, as far as I know, not contradicting the longer description I made of it. Let's not twist my words any further please.
Going back to the original topic, when we talk about MCPM vs PSCI in the arm64 context, we need to be clear on _which_ parts of MCPM to consider. PSCI is clearly defined, MCPM is not (as you said, it's work in progress but you can probably set high-level goals). So let's start with defining MCPM and please correct my understanding (based on what's currently in mainline):
- Front-end to CPU hotplug and cpuidle.
- Common back-end interface to low-level SoC power management. I would add mcpm_entry_vectors setting here.
- 'last man' state machine for CPU/cluster power coordination.
I hope it is clear by now that if you care about _proper_ security in the power management context, point 3 above must be handled in firmware. That's independent of how complex the firmware needs to be for such task. PSCI as an API and the Generic Firmware implementation is addressing this (and at the same time providing a common framework for secure OS vendors to rely on). Generic Firmware will be further developed to address other concerns but that's not the point.
Indeed it is the point! And as I keep repeating this over and over in various ways because there is no one who is addressing my very concern so far:
- The MCPM state machine is _not_ as obvious to implement as it may seem. Either that, or the sum of us who took a couple months to get it right are total incompetent idiots. The biggest idiot amongst those people was certainly myself who clearly missed the mark by _far_ in my estimation of the effort required to implement this.
- Implementing non-trivial functionality in the secure firmware does increase risk. Risk of a buggy implementation that may not be detected by testing, since tests tend not to simulate real-life usage patterns accurately. Or risk of compromising the secure part of the system in some unforeseen way. Please don't tell me you're above those concerns.
- There is always a cost associated to such risks which is unfortunately (or conveniently, due to hardware restrictions or political reasons) seemingly dismissed at the moment.
- Yet the best way to mitigate the risk is to have a flexible update mechanism providing incentive to address issues quickly.
Now... will someone clearly tell me if those concerns have been addressed yet? Is there something besides the PSCI document and the generic firmware implementation in the plans for covering those aspects? Because I've asked this very question, using the TC2 firmware bug that has not been fixed after 6 months of being reported as an example of why I'm even more concerned by even more complex firmware intermangling security operations with power management. It seems that answers to those very concerns are carefully avoided in _all_ the replies I've received so far.
And that has a direct influence on whether people will opt for the complex firmware model or manage their own cheap escape hatches in case something doesn't go according to plan. And as you said yourself, PSCI is clearly defined, but that's (relatively speaking -- no pun intended on Charles' excellent work) the easy part. The implementation behind it (just like the MCPM core and back-ends today) might not always be as easy, or even complete.
And please let's drop the MCPM vs smp_ops interface argument. This is only bikeshedding over implementation details. I personally don't care what the actual frontend is called.
Myself and others in this thread stated the security requirements for such complexity in firmware several times already. You can complain about the ARM secure architecture but, again, that's not something to be addressed in this thread.
All right. Thank you for stating it so bluntly. There probably is no point discussing this any further in that case.
Nicolas
On Tue, Sep 10, 2013 at 08:07:17PM +0100, Nicolas Pitre wrote:
On Tue, 10 Sep 2013, Catalin Marinas wrote:
I hope it is clear by now that if you care about _proper_ security in the power management context, point 3 above must be handled in firmware. That's independent of how complex the firmware needs to be for such task. PSCI as an API and the Generic Firmware implementation is addressing this (and at the same time providing a common framework for secure OS vendors to rely on). Generic Firmware will be further developed to address other concerns but that's not the point.
Indeed it is the point! And as I keep repeating this over and over in various ways because there is no one who is addressing my very concern so far:
It's not hard to understand: the generic firmware is *not* public yet. You will be able to review it and discuss your concerns at the right time.
I *do* appreciate the complexities of MCPM but that doesn't make it write once only (it's not even software only, part of the state coordination may be handled in hardware). But the experience gained from developing it is definitely not lost.
We clearly have a different understanding of the ARM security model and I've already gone to great lengths explaining it. I won't go over those arguments again, you seem to have ignored them so far.
The decision for adopting the ARM generic firmware or other firmware lies entirely with the ARMv8 SoC vendors and they should know better what their security needs are (or will be). It's not the role of the Linux community to mandate certain firmware. However, we *do* have the right to mandate certain standards like booting protocols, DT, ACPI etc. for code aimed to be included in mainline.
Linux interaction with the firmware is another area which badly needs standardisation, whether it is secure firmware or not, simply because that's the first code a CPU executes when out of reset. Such standardisation is even more important in the presence of secure firmware (and given that AArch64 is new, companies will have to write new firmware and there is little legacy to carry).
The first interaction with firmware (EL3 or not, boot loader) is the booting protocol (primary and secondary CPUs). This is defined by Documentation/arm64/booting.txt and will also cover PSCI. It can be extended to other protocols in the future as long as they follow simple rules:
- Existing protocols are not feasible/sufficient (with good arguments, EFI_STUB for example)
- There are no SoC or CPU implementation specific quirks required before the FDT is parsed and the (secondary) CPUs enabled the MMU. IOW:
  - caches and TLBs clean & invalidated
  - full CPU/cluster coherency enabled
  - errata workaround bits set
Power management is not covered by the above document, though there is a relation between secondary CPU booting and hotplug. To be clear, as the arm64 kernel _gatekeeper_ I set the basic rules for ARMv8 SoC power management code aimed for mainline (I'll capture them in a soc.txt document):
- If EL3 is present, standard EL3 firmware interface required
- New EL3 interface can be accepted if the existing interfaces are not feasible (with good arguments, it is properly documented and widely accepted)
- CPUs coming out of power down or idle need to have all the SoC or implementation specific quirks enabled:
  - caches and TLBs clean & invalidated
  - coherency enabled
  - errata workaround bits set
ARM provides PSCI as such a standard API in the presence of EL3 but I'm _open_ to other _well-thought-out_ firmware API proposals that can gain wider acceptance. Hint: Linaro is a good forum for wider SoC vendor and Linux community discussions; I would expect concrete proposals rather than complaints. (BTW, my impression from the last Connect was that LEG is adopting PSCI for the ACPI work)
Note that the above rules don't have anything to do with MCPM. That's a SoC power driver implementation detail (and I already suggested turning it into a library if needed to avoid duplication). But the above firmware API rules still apply and if PSCI is present you have the advantage of generic support in Linux.
(as a side note, generalising your TC2 MMC experience to _any_ firmware is unprofessional IMHO. You keep repeating it and to me it starts sounding like FUD)
On Thu, 12 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 10, 2013 at 08:07:17PM +0100, Nicolas Pitre wrote:
On Tue, 10 Sep 2013, Catalin Marinas wrote:
I hope it is clear by now that if you care about _proper_ security in the power management context, point 3 above must be handled in firmware. That's independent of how complex the firmware needs to be for such task. PSCI as an API and the Generic Firmware implementation is addressing this (and at the same time providing a common framework for secure OS vendors to rely on). Generic Firmware will be further developed to address other concerns but that's not the point.
Indeed it is the point! And as I keep repeating this over and over in various ways because there is no one who is addressing my very concern so far:
It's not hard to understand: the generic firmware is *not* public yet. You will be able to review it and discuss your concerns at the right time.
And how will reviewing it alleviate my concerns? At least if you or ARM have no answers, then please say so rather than continuously ignoring the issues.
I'm asking if there is a _plan_ to produce _recommendations_ for best practices about firmware update deployment. Unless you don't recognize the need for them? Answers to that don't have to wait until a piece of code is published, no?
I *do* appreciate the complexities of MCPM but that doesn't make it write once only (it's not even software only, part of the state coordination may be handled in hardware). But the experience gained from developing it is definitely not lost.
It is not "write once" for sure. The gained experience shows that this code is going to be an _evolving_ target that cannot be cast into static firmware just like a done job.
We clearly have a different understanding of the ARM security model and I've already gone to great lengths explaining it. I won't go over those arguments again, you seem to have ignored them so far.
I didn't ignore them. They simply failed to address my point. Repeating them won't make them any more relevant.
Let me summarize the situation one more time:
- The market is asking for security sensitive code to be executed in perfect isolation from the standard OS. Hence TrustZone / Secure World.
- The _design_ of the current Secure World architecture implies that the code running there has no choice but to concern itself with non security related operations as well simply to _preserve_ its secure attributes. In other words, the fact that secure code has to implement e.g. power management mechanisms and cache management operations is a consequence of the architecture design and not a secure service that the market asked for.
- For secure code to be truly secure, the code itself has to be unalterable and inaccessible, especially if it carries encryption keys, etc. What we've seen so far is that the secure code is getting burned directly into the SoC for those reasons.
- And Secure World is there to stay.
Do we agree so far?
What I'm claiming is that adding more and more complexity to non-alterable firmware code is bad bad bad. You may repeat over and over that the ARM security model requires that complexity in the secure firmware. I never denied that fact either. But I will continue to assert that this is still the unfortunate _consequence_ of a bad architecture model and not something that should be promoted as a design feature.
The decision for adopting the ARM generic firmware or other firmware lies entirely with the ARMv8 SoC vendors and they should know better what their security needs are (or will be). It's not the role of the Linux community to mandate certain firmware. However, we *do* have the right to mandate certain standards like booting protocols, DT, ACPI etc. for code aimed to be included in mainline.
Standards are _*NOT*_ the problem. Please drop this argument.
Linux interaction with the firmware is another area which badly needs standardisation, whether it is secure firmware or not, simply because that's the first code a CPU executes when out of reset. Such standardisation is even more important in the presence of secure firmware (and given that AArch64 is new, companies will have to write new firmware and there is little legacy to carry).
What I'm saying is that _complexity_ in the firmware is _*THE*_ problem. Whether it is old or new, AArch64 or AArch32, PSCI or whatnot.
The _increasing_ interactions between any firmware/bootloader and Linux _is_ a serious problem. This is a problem because it _will_ have bugs. It _will_ have version and implementation skews. It will require _more_ coordinations between different software parts beyond any standard interfaces. And that is _costly_ and a total nightmare to manage. And even more so when those bugs are in the (possibly unalterable) firmware.
You can bury your head in the sand as you wish and conveniently downplay those facts. I personally do care greatly and I do have sympathy for vendors who might wish to pursue a different path than this single solution with everything in firmware approach.
The first interaction with firmware (EL3 or not, boot loader) is the booting protocol (primary and secondary CPUs). This is defined by Documentation/arm64/booting.txt and will also cover PSCI. It can be extended to other protocols in the future as long as they follow simple rules:
- Existing protocols are not feasible/sufficient (with good arguments, EFI_STUB for example)
"Good" is pretty subjective.
"I don't want complex firmware in my SoC" could be a "good" reason from certain points of view.
- There are no SoC or CPU implementation specific quirks required before the FDT is parsed and the (secondary) CPUs enabled the MMU. IOW:
- caches and TLBs clean&invalidated
- full CPU/cluster coherency enabled
- errata workaround bits set
Items 1 and 2 are normally easy. Item 3 is often known _after_ product deployment. What do you do if this is the responsibility of the secure firmware? How do you manage re-certification of the secure code? How do you provide secure code updates to products in the field? What if the EL3 firmware is not alterable? Are there recommendations in ARM's plans to address this?
Power management is not covered by the above document, though there is a relation between secondary CPU booting and hotplug. To be clear, as the arm64 kernel _gatekeeper_ I set the basic rules for ARMv8 SoC power management code aimed for mainline (I'll capture them in a soc.txt document):
- If EL3 is present, standard EL3 firmware interface required
Fair enough for booting.
- New EL3 interface can be accepted if the existing interfaces are not feasible (with good arguments, it is properly documented and widely accepted)
How can an interface be widely accepted if it is not accepted first?
- CPUs coming out of power down or idle need to have all the SoC or implementation specific quirks enabled:
What if they're not all known up front on the day the SoC goes into production? Because in practice, far more often than we would like, those quirks turn out to be errata workarounds once products are deployed.
- caches and TLBs clean & invalidated
- coherency enabled
This looks like a simple enumeration, doesn't it?
What if the above implies the equivalent of MCPM, whose complexities you said you do appreciate? What if its hardware-specific implementation (the back-end in MCPM parlance) is non-trivial to implement optimally and requires updating?
- errata workaround bits set
Yada.
ARM provides PSCI as such standard API in the presence of EL3 but I'm _open_ to other _well-thought_ firmware API proposals that can gain wider acceptance.
Thank you.
One such API might simply be a small EL3 stub whose only purpose in life is to proxy system control accesses that EL1 cannot do otherwise, especially if there is no need for a secure OS on the system. This is likely to free some vendors from the risks of not getting their firmware right for all cases.
Now let's be clear on what my position is: I do agree on the value of a standard booting interface for the kernel or even bootloaders, etc. This really helps in having a common distro procedure for different hardware, etc.
But once the kernel is booted, it does require hardware-specific drivers to work properly in all cases. No one is ever going to accept abstracting ethernet hardware into firmware (virtual machines notwithstanding). Specialized disk arrays will also require custom drivers -- I really doubt AHCI will cut it for them all. Furthermore, improvements in kernel subsystems often imply modifications to those drivers (e.g. when NAPI was introduced), etc.
Therefore... there is no reason why _conceptually_ the same principle could not be applied to power management. If specific _drivers_ are needed to support this or that platform then this shouldn't be a problem, just like it is not a problem for ethernet interfaces. Once the kernel is booted via the standard firmware interface, then the kernel should be provided with the right modules to drive the rest of the system in the best possible way. The only reason why this wouldn't work on ARM is because of the security model.
Of course it is a good idea to have DT or ACPI. Those standards are very useful for the factoring of integration differences on otherwise common hardware blocks. They are _informative_ and they allow the kernel to bypass them when they turn out to be insufficient. Firmware calls do not have that flexibility.
Hint: Linaro is a good forum for wider SoC vendor and Linux community discussions, I would expect concrete proposals rather than complaints.
They might come sooner than you'd expect.
(BTW, my impression from the last Connect was that LEG is adopting PSCI for the ACPI work)
PSCI has its place, there's no doubt about that. It can't be a one-size-fits-all solution though.
Note that the above rules don't have anything to do with MCPM. That's a SoC power driver implementation detail (and I already suggested turning it into a library if needed to avoid duplication).
I do agree. However this is again implementation details. I'm more concerned about the larger picture.
But the above firmware API rules still apply and if PSCI is present you have the advantage of generic support in Linux.
Easier said than done. Linux code is cheaper to write and maintain than firmware code, _even_ if it has to stay out of mainline because you'd be opposing it.
(as a side note, generalising your TC2 MMC experience to _any_ firmware is unprofessional IMHO. You keep repeating it and to me it starts sounding like FUD)
I'm a pragmatist. I don't believe in magic or wishful thinking.
Ignoring reported firmware bugs for months may have its share of unprofessionalism too. But instead of going down the route of name calling, I prefer to believe that you have good reasons at ARM to explain this situation, such as resource shortage and/or higher priorities. And from experience, having worked at several different companies, I can tell you that resource shortages and priority shifts do happen everywhere.
Hence my assertion that complex firmware is not cost-effective. If a simpler (aka cheaper) solution exists, you must expect vendors to embrace it, whether or not you like it.
Nicolas
On 09/13/2013 05:54 AM, Nicolas Pitre wrote:
On Thu, 12 Sep 2013, Catalin Marinas wrote:
Linux interaction with the firmware is another area which badly needs standardisation, whether it is secure firmware or not, simply because that's the first code a CPU executes when out of reset. Such standardisation is even more important in the presence of secure firmware (and given that AArch64 is new, companies will have to write new firmware and there is little legacy to carry).
What I'm saying is that _complexity_ in the firmware is _*THE*_ problem. Whether it is old or new, AArch64 or AArch32, PSCI or whatnot.
The _increasing_ interactions between any firmware/bootloader and Linux _is_ a serious problem. This is a problem because it _will_ have bugs. It _will_ have version and implementation skews. It will require _more_ coordinations between different software parts beyond any standard interfaces. And that is _costly_ and a total nightmare to manage. And even more so when those bugs are in the (possibly unalterable) firmware.
You can bury your head in the sand as you wish and conveniently downplay those facts. I personally do care greatly and I do have sympathy for vendors who might wish to pursue a different path than this single solution with everything in firmware approach.
On this discussion, I'm neutral. :-) But the question above is a really good point, so let me give some more input here.
When we previously developed TrustZone software (TZSW), we could simply treat the two worlds (Non-Secure and Secure) as two separate contexts. If these two contexts each maintain their own state machine and need to interact with each other's, the code becomes complex, which is a really bad thing.
In the end we refined the code to decouple the state machines; the basic idea is to let one world maintain the context and have the other world _ONLY_ provide the primitive functionality.
IMHO, PSCI makes an assumption here: we know ARM has used a microcontroller (or a tiny ARM core) as a separate power controller for power management, with firmware running on that controller, so both PSCI and the power controller's own code can be flexibly upgraded. But I think many SoC companies will design their own power management unit (PMU) logic resident in the SoC; that is not flexible, and from previous product experience the logic for low-power modes is prone to silicon bugs.
Taking the above concern into account, it would be more flexible if PSCI only provided the primitive functionality (cache operations, CCI port operations, GIC/timer-related operations) while the state machine is maintained in Linux.
Thx, Leo Yan
On Thu, Sep 12, 2013 at 10:54:25PM +0100, Nicolas Pitre wrote:
On Thu, 12 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 10, 2013 at 08:07:17PM +0100, Nicolas Pitre wrote:
On Tue, 10 Sep 2013, Catalin Marinas wrote:
I hope it is clear by now that if you care about _proper_ security in the power management context, point 3 above must be handled in firmware. That's independent of how complex the firmware needs to be for such task. PSCI as an API and the Generic Firmware implementation is addressing this (and at the same time providing a common framework for secure OS vendors to rely on). Generic Firmware will be further developed to address other concerns but that's not the point.
Indeed it is the point! And as I keep repeating this over and over in various ways because there is no one who is addressing my very concern so far:
It's not hard to understand: the generic firmware is *not* public yet. You will be able to review it and discuss your concerns at the right time.
And how will reviewing it alleviate my concerns? At least if you or ARM have no answers, then please say so rather than continuously ignoring the issues.
You can have answers at the upcoming Connect, if you want a timeline.
We clearly have a different understanding of the ARM security model and I've already gone to great lengths explaining it. I won't go over those arguments again, you seem to have ignored them so far.
I didn't ignore them. They simply failed to address my point. Repeating them won't make them any more relevant.
Let me summarize the situation one more time:
- The market is asking for security sensitive code to be executed in
perfect isolation from the standard OS. Hence TrustZone / Secure World.
- The _design_ of the current Secure World architecture implies that the
code running there has no choice but to concern itself with non security related operations as well simply to _preserve_ its secure attributes. In other words, the fact that secure code has to implement e.g. power management mechanisms and cache management operations is a consequence of the architecture design and not a secure service that the market asked for.
- For secure code to be truly secure, the code itself has to be
unalterable and inaccessible, especially if it carries encryption keys, etc. What we've seen so far is that the secure code is getting burned directly into the SoC for those reasons.
- And Secure World is there to stay.
Do we agree so far?
Mostly yes. Regarding what the market asked for, another way to do world separation is to use a separate processor. But the market found it cheaper/easier to use the same CPU with some restrictions (power management mechanism on one side, policy on the other). Anyway, nothing in the architecture prevents loading parts of the secure firmware (e.g. signed images), only that people aren't used to doing it yet. A few years ago you weren't even able to update your mobile phone software.
- There are no SoC or CPU implementation specific quirks required before
the FDT is parsed and the (secondary) CPUs enabled the MMU. IOW:
- caches and TLBs clean&invalidated
- full CPU/cluster coherency enabled
- errata workaround bits set
Items 1 and 2 are normally easy. Item 3 is often known _after_ product deployment. What do you do if this is the responsibility of the secure firmware? How do you manage re-certification of the secure code? How do you provide secure code updates to products in the field? What if the EL3 firmware is not alterable? Are there recommendations in ARM's plans to address this?
We need to be careful here not to confuse secure OS (usually secure EL1) and the secure firmware (usually EL3). Certification (usually EAL4) limits itself to secure OS code design, test, review. I don't think we've ever had certified firmware. So modifying firmware does not mean re-certifying the secure OS. It also doesn't mean that the full system is certified.
Some (not all) errata have the lucky workaround of simply setting an implementation-defined bit. We've had lots of problems on ARMv7 with such bits which (for security reasons) are not exposed to the non-secure world. You can either get the SoC vendors to unblock those in firmware or get them to provide firmware updates (not necessarily secure firmware). Every time such issue appears on the list, we say that should have been done in firmware (or a boot loader in a SoC-specific way). You can't do a SoC specific SMC before you know the SoC and that's only after parsing the DT (unless we go back to machine numbers). To make things worse, many workarounds have to be enabled before enabling the MMU.
As for what ARM recommends, a single OS image is not really a concern of the hardware/architecture people; such recommendations really should come from the OS communities. So do we allow random #ifdef's throughout the kernel, or tell SoC vendors to handle them in firmware or the boot loader?
Please don't try to make this an ARM responsibility only, the Linux community should come up with guidelines for the SoC and firmware people.
- A new EL3 interface can be accepted if the existing interfaces are not
feasible (with good arguments, and provided it is properly documented and widely accepted)
How can an interface be widely accepted if it is not accepted first?
Public reviews involving non-secure OS, secure OS (if applicable), firmware and SoC people. That's the process we went through with PSCI (including many public sessions at Linaro Connect).
- CPUs coming out of power-down or idle need to have all the SoC or
implementation-specific quirks enabled:
What if they're not all known up front the day the SoC goes into production? In practice, more often than we would like, those quirks end up being errata workarounds discovered once products are deployed.
We can still take quirks as errata workarounds, but quirks should not be a default non-errata mode and they should not break the single Image.
- caches and TLBs clean & invalidated
- coherency enabled
This looks like a simple enumeration, doesn't it?
What should it look like? The reason for the enumeration is that there are many implementation-specific ways of flushing the caches and enabling coherency. Hotplug and CPU boot are not necessarily different code paths in firmware and Linux (especially if we take kexec into account), so we can extend the cold-boot reasoning to hotplug or idle.
What if the above implies the equivalent of MCPM, whose complexity you said you do appreciate? What if its hardware-specific implementation (the back-end in MCPM parlance) is non-trivial to implement optimally and requires updating?
You could defer things like CCI enabling (not specific to a single CPU) until SoC-specific code can be run (IOW you know what SoC it is) but only if (1) it doesn't break the security assumptions (if any) and (2) you have a (standard) way to call back into firmware.
ARM provides PSCI as such standard API in the presence of EL3 but I'm _open_ to other _well-thought_ firmware API proposals that can gain wider acceptance.
Thank you.
One such API might simply be a small L3 stub whose only purpose in life is to proxy system control accesses that L1 cannot do otherwise, especially if there is no need for a secure OS on the system. This is likely to free some vendors from the risk of not getting their firmware right in all cases.
Minor correction on terminology to avoid confusion - ELx for exception levels, Lx for cache levels.
Such API should also state what the security aims are, what restrictions, if any, are imposed on the secure OS (e.g. running UP and tied to a single CPU).
Hint: Linaro is a good forum for wider SoC vendor and Linux community discussions; I would expect concrete proposals rather than complaints.
They might come sooner than you'd expect.
Looking forward to it. Depending on how soon, Connect or the ARM kernel mini-summit are good opportunities for wider discussions. Actually for the ARM mini-summit, a topic on single Image and firmware requirements would be really good to create high-level guidelines.
BTW, I hope to get arm64 hotplug and suspend merged for 3.13. The patches target PSCI and you have seen/acked some of them. We'll do some slight reworking to generalise the CPU enable-method DT parsing (spin-table and PSCI), also used for hotplug and suspend. Please feel free to comment when they get on the list.
On Wed, Sep 04, 2013 at 10:57:57AM +0100, Catalin Marinas wrote:
On 3 Sep 2013, at 19:53, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 3 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
The comparison is not meant for DT vs PSCI but an example that even though DT has benefits and less risks, people didn't rush into adopting it unless it was mandated for new platforms. For arm64 we try not to get SoC code under arch/arm64/ (mach-virt like approach) but I still get people asking in private about copying code into arch/arm64/mach-* directories for the same easy path reasons.
That's fine, obviously.
But reality will mess that up somewhat eventually. You'll have to accept machine-specific quirks for firmware bugs because the firmware update is not forthcoming, if it ever comes at all. Same story everywhere.
It's one thing to accept machine specific quirks for firmware bugs (but I'll push back as much as possible) and entirely different to accept temporary code because the firmware features are not ready yet.
But let's get back to firmware. Delegating power management to firmware is a completely different type of contract. In that case you're moving a critical functionality with potentially significant algorithmic complexity (as evidenced by the MCPM development effort) out of the kernel's control. The more functionality and complexity you delegate to the firmware, the greater the risk you'll end up with a crippled system the kernel has to cope with eventually. And this will happen at some point, there is no doubt about that.
I agree with your arguments that Linux is more flexible and easily upgradable. However, the point I want to emphasise is that unless Linux plays all the roles of firmware/secure/non-secure code, you must have firmware and calls to it from the non-secure OS. On ARMv8, EL3 is the first mode the CPU runs in out of reset, and it needs to get back into this mode for power-related (e.g. coherency) settings. Whether we (Linux people) like it or not, that's the reality.
I know, I know... From my own point of view this is rather sad.
Whether it's sad for Linux or not, be aware that ARM Ltd is not a Linux-only shop. There are other OSes, secure or non-secure, there are vendors that ask for these features (and this includes vendors that run Linux/Android).
With proper education, SoC vendors can learn to allow upgradable (parts of) firmware.
Is this education in ARM's plans? Is someone working on recommendations about proper design for fail-safe firmware upgrades via separate firmware components?
The generic firmware is probably a better place to provide such functionality rather than a recommendations document. But I haven't followed its development closely enough to comment (I know I raised this exact issue in the past, primarily for handling undocumented CPU errata bits accessible only in secure mode).
But if you have a better proposal and can get all the parts (including secure OS people) to agree, I'm open to it.
I think that Linux has gained its dominant position for one fundamental reason: source availability. No company could ever match the work force that gathered around that source code. If secure OS people would agree to this principle then things could end up more secure and more efficient. But that's not something I have influence over.
It's not about secure OS, for many reasons this will probably remain closed source. But the firmware and UEFI are a different story and most of it can be open. I can see vendors keeping parts of the firmware closed but I would hope those are minimal (it already happens, it's not something introduced by PSCI requirements).
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down()
    dcscb_power_down()
        flush_cache_all()              - both secure and non-secure
        set_auxcr()                    - secure-only
        cci_disable_port_by_cpu()      - secure-only?
        (__mcpm_outbound_leave_critical())
        __mcpm_cpu_down()
        wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
The MCPM backend doesn't _need_ to call __mcpm_cpu_down() and friends. Those are helpers for when there is no firmware and proper synchronization needs to be done between different cores.
The reason people currently ask for MCPM is exactly this synchronisation which they don't want to do in the firmware. As I said in a previous post, I'm not against MCPM as such but against the back-ends which will eventually get non-standard secure calls.
One thing I don't like about MCPM (and I raised it during review) is the cluster/cpu separation with a hard-coded number of clusters. I would have really liked a linear view of the CPUs, letting the back-end (or the MCPM library itself) handle the topology. I don't think it's hard to change anyway.
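To make the "linear view" idea concrete, here is a minimal sketch under an assumed fixed topology (the constant and function names are made up for illustration, not from the MCPM code): the front-end would deal only in linear IDs, and the back-end or library would map them onto the real (cluster, cpu) topology.

```c
#include <assert.h>

/* Hypothetical fixed topology: 4 CPUs per cluster. */
#define MAX_CPUS_PER_CLUSTER 4

/* Map a (cluster, cpu) pair to a single linear CPU index. */
static inline unsigned int linear_cpu_id(unsigned int cluster,
					 unsigned int cpu)
{
	return cluster * MAX_CPUS_PER_CLUSTER + cpu;
}

/* Recover the topology from a linear index (the back-end's job). */
static inline void linear_to_topology(unsigned int linear,
				      unsigned int *cluster,
				      unsigned int *cpu)
{
	*cluster = linear / MAX_CPUS_PER_CLUSTER;
	*cpu = linear % MAX_CPUS_PER_CLUSTER;
}
```

A real implementation would of course derive the topology from MPIDR or DT rather than a compile-time constant; the sketch only shows where the cluster/cpu split could be hidden.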
If you have PSCI then the MCPM call graph is roughly:
mcpm_cpu_power_down()
    psci_power_down()
        psci_ops.cpu_off(power_state)
That's it. Nothing has to call back into the kernel.
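As a rough illustration of that call graph, here is a minimal sketch with simplified stand-ins for the kernel's psci_power_state and psci_ops types (the field values are placeholders, not the real PSCI encodings): the back-end just fills in a power state and calls into firmware through a function pointer, with no callback into the kernel afterwards.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct psci_power_state. */
struct psci_power_state {
	unsigned short id;
	unsigned char type;		/* 1 == power down (placeholder) */
	unsigned char affinity_level;	/* 1 == cluster level (placeholder) */
};

/* Simplified stand-in for struct psci_operations. */
struct psci_operations {
	int (*cpu_off)(struct psci_power_state state);
};

static struct psci_operations psci_ops;

/* Sketch of a PSCI back-end for mcpm_cpu_power_down(). */
static int psci_power_down(void)
{
	struct psci_power_state state = {
		.id = 0,
		.type = 1,
		.affinity_level = 1,
	};

	if (!psci_ops.cpu_off)
		return -1;	/* no PSCI firmware available */
	/* On real hardware this call does not return on success. */
	return psci_ops.cpu_off(state);
}
```

The contrast with the dcscb flow quoted earlier is that all synchronisation (the MCPM state machine's job) is assumed to live behind the firmware call.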
So for arm64 we expose PSCI functionality via smp_operations (cpu up/down, suspend is work in progress). Populating smp_operations is driven from DT and it has been decoupled from the SoC code. What would the MCPM indirection bring here?
In the presence of PSCI firmware, do you agree that a potential MCPM back-end should be generic (not tied to an SoC)? In such case, what would the MCPM front-end bring which cannot be currently handled by smp_operations (or an extension to it)?
I don't (yet?) see the point of PSCI back-end to arm64 MCPM since all people are asking MCPM for is exactly to avoid the PSCI implementation.
A couple of comments on the patch below; not aimed as a proper review:
--- /dev/null
+++ b/arch/arm/mach-vexpress/tc2_pm_psci.c
[…]
+static void tc2_pm_psci_power_down(void)
+{
+	struct psci_power_state power_state;
+	unsigned int mpidr, cpu, cluster;
+
+	mpidr = read_cpuid_mpidr();
+	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	BUG_ON(!psci_ops.cpu_off);
+
+	switch (atomic_dec_return(&tc2_pm_use_count[cpu][cluster])) {
+	case 1:
+		/*
+		 * Overtaken by a power up. Flush caches, exit coherency,
+		 * return & fake a reset
+		 */
+		asm volatile (
+		"mrc	p15, 0, ip, c1, c0, 0 \n\t"
+		"bic	ip, ip, #(1 << 2)	@ clear C bit \n\t"
+		"mcr	p15, 0, ip, c1, c0, 0 \n\t"
+		"dsb \n\t"
+		"isb"
+		: : : "ip" );
+		flush_cache_louis();
+		asm volatile (
+		"clrex \n\t"
+		"mrc	p15, 0, ip, c1, c0, 1 \n\t"
+		"bic	ip, ip, #(1 << 6)	@ clear SMP bit \n\t"
+		"mcr	p15, 0, ip, c1, c0, 1 \n\t"
+		"isb \n\t"
+		"dsb"
+		: : : "ip" );
+		return;
The above part needs to be done on the secure side, ACTLR.SMP bit cannot be cleared on the non-secure side. Is this return with coherency disabled required by MCPM?
We go through soft reboot after this, so caches have to be clean and disabled as required by the boot protocol. setup_mm_for_reboot() has already been called, but that is insufficient: we still need to clean the caches. The reason why the backend still has to do that is that it seems impossible to factor the cache handling out of the backend in a reusable way.
This means that there is a redundant cache flush: whatever happens, the firmware must do it again, and if it fails to do so the problem may be impossible to work around anyway, because dangling dirty lines in the cache across powerdown could lead to bus deadlocks etc., even if the Secure World doesn't care about the actual data they contain. The assumption is that cleaning a cache that is already clean is inexpensive.
However, the boot protocol doesn't (and can't) require the CPU to be non-coherent. Clearing the SMP bit here is certainly not useful, probably not possible, and possibly not safe (since it will UNDEF on some CPUs).
+	case 0:
+		/* A normal request to possibly power down the cluster */
+		power_state.id = PSCI_POWER_STATE_ID;
+		power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
+		power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
+		psci_ops.cpu_off(power_state);
+		/* On success this function never returns */
+	default:
+		/* Any other value is a bug */
+		BUG();
+	}
+}
+static void tc2_pm_psci_suspend(u64 unused)
+{
+	struct psci_power_state power_state;
+
+	BUG_ON(!psci_ops.cpu_suspend);
+
+	/* On TC2 always attempt to power down the cluster */
+	power_state.id = PSCI_POWER_STATE_ID;
+	power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
+	power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
+	psci_ops.cpu_suspend(power_state, virt_to_phys(bL_entry_point));
+	/* On success this function never returns */
+	BUG();
+}
CPU_SUSPEND is allowed to return if there is a pending interrupt.
True. The BUG should be conditional on the return value being != SUCCESS. SUCCESS just means the call was valid but was preempted by a wakeup.
Possibly some aspects of the CPU_SUSPEND behaviour weren't tied down at the time this code was written.
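The fix discussed above can be sketched as follows; this is a minimal illustration with hypothetical stand-ins for psci_ops.cpu_suspend, the PSCI return code and BUG() (here recorded in a flag rather than crashing, so the behaviour is observable). Only a non-success return indicates a real error:

```c
#include <assert.h>

#define PSCI_RET_SUCCESS 0	/* placeholder for the real PSCI code */

/* Hypothetical stand-in for psci_ops.cpu_suspend. */
static int (*cpu_suspend_fn)(unsigned long entry_point);

/* Record the failure instead of crashing, for illustration. */
static int bug_hit;
#define BUG() do { bug_hit = 1; } while (0)

static void psci_suspend_fixed(unsigned long entry_point)
{
	int ret = cpu_suspend_fn(entry_point);

	/*
	 * SUCCESS means the call was valid but a pending wakeup aborted
	 * the suspend before power-down; that is not a bug. Any other
	 * return value is.
	 */
	if (ret != PSCI_RET_SUCCESS)
		BUG();
}
```

This contrasts with the quoted patch, which BUG()s unconditionally on any return from cpu_suspend.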
Cheers ---Dave