On Mon, 2 Sep 2013, Catalin Marinas wrote:
(sorry if you get a duplicate email, the SMTP server I was using keeps reporting delays, so I'm resending with different settings)
This is the first time I've seen your reply.
Hi Nico,
On Sat, Aug 31, 2013 at 03:19:46AM +0100, Nicolas Pitre wrote:
On Fri, 30 Aug 2013, Catalin Marinas wrote:
My position for the arm64 kernel support is to use the PSCI and implement the cluster power synchronisation in the firmware. IOW, no MCPM in the arm64 kernel :(. To help with this, ARM is going to provide a generic firmware implementation that SoC vendors can expand for their needs.
I am open for discussing a common API that could be shared between MCPM-based code and the PSCI one. But I'm definitely not opting for a light-weight PSCI back-end to a heavy-weight MCPM implementation.
Also note that IKS won't be supported on arm64.
I find the above statements a bit rigid.
While I think the reasoning behind PSCI is sound, I suspect some people will elect not to add it to their system, either because the hardware doesn't support all the necessary privilege levels, or simply because they prefer the easiest solution in terms of maintenance and upgradability, which means putting the kernel in charge. And that may imply MCPM. I know that ARM would like to see PSCI adopted everywhere but I doubt it'll be easy.
I agree PSCI is not always possible because of technical reasons (missing certain exception levels), so let me refine the above statement: if EL3 mode is available on a CPU implementation, PSCI must be supported (which I think is the case for Marvell).
I may agree to that.
It indeed sounds rigid but what's the point of trying to standardise something if you don't enforce it? People usually try to find the easiest path for them which may not necessarily be the cleanest longer term.
I'll say this only once, and then I'll shut up on this very issue.
<<< Beginning of my editorial on firmware >>>
The firmware _interface_ standard is not a problem. So this is not about PSCI in particular. I'm glad I was amongst the early reviewers of the specification and therefore I would be the first to blame if the interface looked wrong to me. In fact there was something in the very first draft that did look wrong to me, and I'm glad it was removed from subsequent revisions. This is just to say that I was involved with PSCI to some degree.
The problem is in the _implementation_ of the firmware. Or more precisely its _maintenance_.
Because, as experience has shown, firmware is always *bad* and *buggy* at some point. No matter what you say, someone somewhere will screw it up. And we'll be stuck with it.
You only need to look at the abomination the BIOS on X86 can be. Or buggy ACPI tables requiring all sorts of quirks in the kernel. Or the catastrophe the APM BIOS was before it was later replaced by ACPI.
You might say: this is X86 and on ARM we can do better. Yeah... well... Look at our bootloader story for example. How many people are simply unable to upgrade their bootloader in order to be DT compliant? Or how many OEMs are too scared to let end users upgrade their bootloaders?
Upgrading system firmware is not much different from upgrading a bootloader. Something goes wrong during the upgrade and you have a nice paperweight. So OEMs simply won't be forthcoming with firmware upgrades as this is a huge risk with significant support costs when this goes wrong. And if the firmware is responsible for the secure world as well then you have QA and certification costs to consider, etc.
And even if the upgrade is easy, the OEM might not necessarily be that interested in fixing buggy firmware, for many reasons. A good example of that is the very Versatile Express for which we (Linaro) reported issues with the MMC slot many months ago and no fix has come back yet:
https://bugs.launchpad.net/linaro-big-little-system/+bug/1166246
Achin was experimenting with a modified firmware before priorities shifted and people at ARM moved around, and then nothing. OEMs will have priority shifts as well -- this is the hard reality of this business. This means unfixed firmware for older devices that users can't fix, unlike with kernel solutions.
You might say: We'll then help people make their firmware perfect from the start. Well... I have enough software engineering experience to know this is simply impossible. Software by definition is always going to have bugs. Especially in the market ARM is after where time to market is paramount. So pressure to have the software done quickly is real. And even if you provide an example implementation for customers to use and extend, then they'll extend it all right and introduce new bugs, possibly even breaking the previously working generic code.
Heck, I've privately reviewed a few MCPM backends so far, and they all had serious flaws. The kernel has the advantage that any extension is reviewed by people from the outside. Example firmware being extended by ARM partners will not be. And I don't see why those people wouldn't make the same kind of mistakes in their forks of the generic firmware implementation. This stuff is simply too complex to always be right on the first attempt, even when testing suggests it is.
So no, I don't have this faith in firmware like you do. And I've discussed this with you in the past as well, and my mind on this issue has not changed. In particular, when you give a presentation on ARMv8 and your answer to a question from the audience is: "firmware will take care of this" I may only think you'll regret this answer one day.
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
Firmware calls such as those provided by the PC BIOS, APM, UEFI runtime services or PSCI are black boxes you have to trust, while the code being called doesn't know what is really happening on the system.
Oh and by the way people are now realizing all the issues with DT where the maintainers want only stable final bindings but developers can't tell when a binding is final unless it has seen wide testing. So DT is not the greatest thing since sliced bread anymore.
But let's get back to firmware. Delegating power management to firmware is a completely different type of contract. In that case you're moving a critical functionality with potentially significant algorithmic complexity (as evidenced by the MCPM development effort) out of the kernel's control. The more functionality and complexity you delegate to the firmware, the greater the risk you'll end up with a crippled system the kernel will have to cope with eventually. Because this will happen at some point, there is no doubt about that.
Now what is the piece of software people are not afraid of upgrading? Where is it easier to upgrade functionality beyond the initial shipping state because time to market pressure drove products on the street before the software was ready? Yes, it is the kernel. And that's exactly what IKS was about: being able to drive a big.LITTLE system early on before the scheduler was properly enhanced for big.LITTLE. And only a simple kernel upgrade is needed to move from IKS to full MP.
Think for a minute of when the kernel used to call the APM BIOS for power management in the old days, and then the kernel was enhanced to be preemptive. APM was not re-entrant and therefore conceptually incompatible with that new kernel context at the time. Obviously things were even more difficult when SMP showed up. And just ask the X86 maintainers what they think about firmware-induced NMIs *today*.
So, is e.g. PSCI compatible with Linux-RT now? Is there another kernel execution model innovation coming up for which the firmware will be an obstacle on ARM?
I do understand that the way the ARM architecture is designed, especially the latest revisions with TZ and all, does require some sort of firmware. And since there is no other way but to have it, then of course a standardized firmware interface is a must. I'm not disputing this at all.
I'm just extremely worried about the simple presence of firmware and the need to call into it for complex operations such as intelligent power management, when the kernel would be a much better location to perform such operations. The kernel is also the only place where those things may be improved over time, even for free from third parties, whereas the firmware is going to be locked down with no possible evolution beyond its shipping state.
For example, after working on MCPM for a while, I do have ideas for additional performance and power usage optimization. But if that functionality is getting burned into firmware controlled by secure world then there is just no hope for _me_ to optimize things any further.
Not that I can do anything about it anyway. But I had to vent my discomfort about anything firmware-like. Yet I was the one who suggested to Will Deacon and Marc Zyngier that they should use PSCI to control KVM instances instead of their ad hoc interface. However, if someone has a legitimate case for _not_ using firmware calls and using machine-specific extensions in the kernel instead, then I'll support them.
<<< End of my editorial on firmware >>>
I don't have a problem with MCPM itself. It is nicely written and can probably be abstracted into a library. My concern is how MCPM is going to be used by the SoC code. The example I have so far is dcscb.c and I really want such code moved out of the kernel into firmware (or even better, under hardware control if available). It is highly dependent on the SoC and CPU implementation, setting/clearing bits like ACTLR.SMP which is not possible in non-secure mode. I'm pretty sure that such code will end up doing non-standard SMC calls to the secure firmware and we lose any hope of standardisation.
Standardization and innovation are often opposing each other. And yes that sucks.
The PSCI API can evolve in time (it has version information) but so far there is no such thing as "lightweight" PSCI that could be used as an MCPM back-end.
Such a thing does exist, written by Achin Gupta. His 6th version is sitting in my mbox and is dated 12 Mar 2013.
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
Nicolas