On Wed, Sep 04, 2013 at 10:57:57AM +0100, Catalin Marinas wrote:
On 3 Sep 2013, at 19:53, Nicolas Pitre nicolas.pitre@linaro.org wrote:
On Tue, 3 Sep 2013, Catalin Marinas wrote:
On Tue, Sep 03, 2013 at 05:16:17AM +0100, Nicolas Pitre wrote
As a past example, how many SoC maintainers were willing to convert their code to DT if there wasn't a clear statement from arm-soc guys mandating this?
This is a wrong comparison. DT is more or less a more structured way to describe hardware. Just like ACPI. The kernel is not *forced* to follow DT information if that turns out to be wrong. And again experience has shown that this does happen.
The comparison is not meant for DT vs PSCI but an example that even though DT has benefits and less risks, people didn't rush into adopting it unless it was mandated for new platforms. For arm64 we try not to get SoC code under arch/arm64/ (mach-virt like approach) but I still get people asking in private about copying code into arch/arm64/mach-* directories for the same easy path reasons.
That's fine, obviously.
But reality will mess that up somewhat eventually. You'll have to accept machine specific quirks for firmware bugs because the firmware update is not forthcoming if at all. Same story everywhere.
It's one thing to accept machine specific quirks for firmware bugs (but I'll push back as much as possible) and entirely different to accept temporary code because the firmware features are not ready yet.
But let's get back to firmware. Delegating power management to firmware is a completely different type of contract. In that case you're moving a critical functionality with potentially significant algorithmic complexity (as evidenced by the MCPM development effort) out of the kernel's control. The more functionality and complexity you delegate to the firmware, the greater the risk you'll end up with a crippled system the kernel will have to cope with eventually. Because this will happen at some point there is no doubt about that.
I agree with your arguments that Linux is more flexible and easily upgradable. However, the point I want to emphasize is that unless Linux is playing all the roles of firmware/secure/non-secure code, you must have firmware and calls to it from the non-secure OS. On ARMv8, EL3 is the first mode CPU is running out of reset and it needs to get back to this mode for power-related (e.g. coherency) settings. Whether we (Linux people) like it or not, that's the reality.
I know, I know... From my own point of view this is rather sad.
Whether it's sad for Linux or not, be aware that ARM Ltd is not a Linux-only shop. There are other OSes, secure or non-secure, there are vendors that ask for these features (and this includes vendors that run Linux/Android).
With proper education, SoC vendors can learn to allow upgradable (parts of) firmware.
Is this education in ARM's plans? Is someone working on recommendations about proper design for fail-safe firmware upgrades via separate firmware components?
The generic firmware is probably a better place to provide such functionality rather than a recommendations document. But I haven't followed its development closely enough to comment (I know I raised this exact issue in the past, primarily for handling undocumented CPU errata bits accessible only in secure mode).
But if you have a better proposal and can get all the parts (including secure OS people) to agree, I'm open to it.
I think that Linux has gained its dominant position for one fundamental reason: source availability. No company could ever match the work force that gathered around that source code. If secure OS people would agree to this principle then things could end up more secure and more efficient. But that's not something I have influence over.
It's not about secure OS, for many reasons this will probably remain closed source. But the firmware and UEFI are a different story and most of it can be open. I can see vendors keeping parts of the firmware closed but I would hope those are minimal (it already happens, it's not something introduced by PSCI requirements).
For example, MCPM provides callbacks into the platform code when a CPU goes down to disable coherency, flush caches etc. and this code must call back into the MCPM to complete the CPU tear-down. If you want such thing, you need a different PSCI specification.
Hmmm... The above statement makes no sense to me. Sorry I must have missed something.
OK, let me clarify. I took the dcscb.c example for CPU going down (please correct me if I got it wrong):
mcpm_cpu_power_down() dcscb_power_down() flush_cache_all() - both secure and non-secure set_auxcr() - secure-only cci_disable_port_by_cpu() - secure-only? (__mcpm_outbound_leave_critical()) __mcpm_cpu_down() wfi()
So the dcscb back-end calls back into MCPM to update the state machine. If you are to run Linux in non-secure mode, set_auxcr() and CCI would need secure access. Cache flushing also needs to affect the secure cachelines and disable the caches on the secure side. So from the flush_cache_all() point, you need to get into EL3. But MCPM requires (as I see in the dcscb back-end) a return from such EL3 code to call __mcpm_cpu_down() and do WFI on the non-secure side. This is incompatible with the (current) PSCI specification.
The MCPM backend doesn't _need_ to call __mcpm_cpu_down() and friends. Those are helpers for when there is no firmware and proper synchronization needs to be done between different cores.
The reason people currently ask for MCPM is exactly this synchronisation which they don't want to do in the firmware. As I said in a previous post, I'm not against MCPM as such but against the back-ends which will eventually get non-standard secure calls.
One think I don't like about MCPM (and I raised it during review) is the cluster/cpu separation with hard-coded number of clusters. I would have really liked a linear view of the CPUs and let the back-end (or MCPM library itself) handle the topology. I don't think it's hard to change anyway.
If you have PSCI then the MCPM call graph is roughly:
mcpm_cpu_power_down() psci_power_down() psci_ops.cpu_off(power_state)
That's it. Nothing has to call back into the kernel.
So for arm64 we expose PSCI functionality via smp_operations (cpu up/down, suspend is work in progress). Populating smp_operations is driven from DT and it has been decoupled from the SoC code. What would the MCPM indirection bring here?
In the presence of PSCI firmware, do you agree that a potential MCPM back-end should be generic (not tied to an SoC)? In such case, what would the MCPM front-end bring which cannot be currently handled by smp_operations (or an extension to it)?
I don't (yet?) see the point of PSCI back-end to arm64 MCPM since all people are asking MCPM for is exactly to avoid the PSCI implementation.
Couple of comments on the patch below. Not aimed as a proper review:
--- /dev/null +++ b/arch/arm/mach-vexpress/tc2_pm_psci.c
[…]
+static void tc2_pm_psci_power_down(void) +{
struct psci_power_state power_state;
unsigned int mpidr, cpu, cluster;
mpidr = read_cpuid_mpidr();
cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
BUG_ON(!psci_ops.cpu_off);
switch (atomic_dec_return(&tc2_pm_use_count[cpu][cluster])) {
case 1:
/*
* Overtaken by a power up. Flush caches, exit coherency,
* return & fake a reset
*/
asm volatile (
"mrc p15, 0, ip, c1, c0, 0 \n\t"
"bic ip, ip, #(1 << 2) @ clear C bit \n\t"
"mcr p15, 0, ip, c1, c0, 0 \n\t"
"dsb \n\t"
"isb"
: : : "ip" );
flush_cache_louis();
asm volatile (
"clrex \n\t"
"mrc p15, 0, ip, c1, c0, 1 \n\t"
"bic ip, ip, #(1 << 6) @ clear SMP bit \n\t"
"mcr p15, 0, ip, c1, c0, 1 \n\t"
"isb \n\t"
"dsb"
: : : "ip" );
return;
The above part needs to be done on the secure side, ACTLR.SMP bit cannot be cleared on the non-secure side. Is this return with coherency disabled required by MCPM?
We go through soft reboot after this, so caches have to be clean and disabled as required by the boot protocol. setup_mm_for_reboot() has already been called, but that is insufficient: we still need to clean the caches. The reason why the backend still has to do that is that it seems impossible to factor the cache handling out of the backend in a reusable way.
This means that there is a redundant cache flush: because whatever, the firmware must do it again, and if it fails to do so it may be impossible to work around anyway, because dangling dirty lines in the cache across powerdown could lead to bus deadlocks etc. even if the Secure World doesn't care about the actual data they contain. The assumption is that cleaning a cache that is already clean will be inexpensive.
However, the boot protocol doesn't (and can't) require the CPU to be noncoherent. Clearing the SMP bit here is certainly not useful, probably not possible, and possibly not safe (since it will #undef on some CPUs).
case 0:
/* A normal request to possibly power down the cluster */
power_state.id = PSCI_POWER_STATE_ID;
power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
psci_ops.cpu_off(power_state);
/* On success this function never returns */
default:
/* Any other value is a bug */
BUG();
}
+}
+static void tc2_pm_psci_suspend(u64 unused) +{
struct psci_power_state power_state;
BUG_ON(!psci_ops.cpu_suspend);
/* On TC2 always attempt to power down the cluster */
power_state.id = PSCI_POWER_STATE_ID;
power_state.type = PSCI_POWER_STATE_TYPE_POWER_DOWN;
power_state.affinity_level = PSCI_POWER_STATE_AFFINITY_LEVEL1;
psci_ops.cpu_suspend(power_state, virt_to_phys(bL_entry_point));
/* On success this function never returns */
BUG();
+}
CPU_SUSPEND is allowed to return if there is a pending interrupt.
True. The BUG should be conditional on the the return value being != SUCCESS. SUCCESS just means the call was valid, but preempted by a wakeup.
Possibly some aspects of the CPU_SUSPEND behaviour weren't tied down at the time this code was written.
Cheers ---Dave