From: Hanjun Guo hanjun.guo@linaro.org
In order to implement both topology and ACPI support we need to define per-CPU data.
Signed-off-by: Graeme Gregory graeme.gregory@linaro.org
Signed-off-by: Hanjun Guo hanjun.guo@linaro.org
Signed-off-by: Mark Brown broonie@linaro.org
---
 arch/arm64/include/asm/cpu.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 arch/arm64/include/asm/cpu.h
diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
new file mode 100644
index 000000000000..d67ff011d361
--- /dev/null
+++ b/arch/arm64/include/asm/cpu.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (C) 2004-2005 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __ASM_ARM64_CPU_H
+#define __ASM_ARM64_CPU_H
+
+#include <linux/percpu.h>
+#include <linux/cpu.h>
+#include <linux/topology.h>
+
+struct cpuinfo_arm {
+	struct cpu	cpu;
+	u64		cpuid;
+#ifdef CONFIG_SMP
+	unsigned int	loops_per_jiffy;
+#endif
+};
+
+DECLARE_PER_CPU(struct cpuinfo_arm, cpu_data);
+
+#endif
From: Mark Hambleton mahamble@broadcom.com
Describe the virtio device so we can mount disk images in the simulator.
[Reduced the size of the region based on feedback from review -- broonie]
Signed-off-by: Mark Hambleton mahamble@broadcom.com
Signed-off-by: Mark Brown broonie@linaro.org
Acked-by: Will Deacon will.deacon@arm.com
---
 arch/arm64/boot/dts/rtsm_ve-motherboard.dtsi | 6 ++++++
 1 file changed, 6 insertions(+)
diff --git a/arch/arm64/boot/dts/rtsm_ve-motherboard.dtsi b/arch/arm64/boot/dts/rtsm_ve-motherboard.dtsi
index b45e5f39f577..2f2ecd217363 100644
--- a/arch/arm64/boot/dts/rtsm_ve-motherboard.dtsi
+++ b/arch/arm64/boot/dts/rtsm_ve-motherboard.dtsi
@@ -183,6 +183,12 @@
 				clocks = <&v2m_oscclk1>, <&v2m_clk24mhz>;
 				clock-names = "clcdclk", "apb_pclk";
 			};
+
+			virtio_block@0130000 {
+				compatible = "virtio,mmio";
+				reg = <0x130000 0x200>;
+				interrupts = <42>;
+			};
 		};

 		v2m_fixed_3v3: fixedregulator@0 {
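For readers wondering how that node becomes a usable disk: below is a minimal sketch (assumptions only, not the in-tree driver) of the OF matching involved. The real driver, drivers/virtio/virtio_mmio.c, maps the reg region, claims the interrupt and registers a virtio block device, which then typically appears as /dev/vda.

/*
 * Hedged sketch of how a node with compatible = "virtio,mmio" gets bound:
 * an OF match table on a platform driver. All names here are illustrative.
 */
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static int example_virtio_mmio_probe(struct platform_device *pdev)
{
	/* The real driver ioremaps "reg" and requests "interrupts" here. */
	dev_info(&pdev->dev, "bound to %s\n", pdev->dev.of_node->full_name);
	return 0;
}

static const struct of_device_id example_virtio_mmio_match[] = {
	{ .compatible = "virtio,mmio" },	/* matches virtio_block@0130000 */
	{ }
};

static struct platform_driver example_virtio_mmio_driver = {
	.probe = example_virtio_mmio_probe,
	.driver = {
		.name = "example-virtio-mmio",
		.of_match_table = example_virtio_mmio_match,
	},
};
module_platform_driver(example_virtio_mmio_driver);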
From: Mark Brown broonie@linaro.org
Add a dts file for ARMv8 fixed virtual platform which models a big.LITTLE configuration for the architecture. This is based on the DT shipped by ARM but has been modified for mainline, removing features not present in mainline and updating the CPU bindings for mainline.
Signed-off-by: Mark Brown broonie@linaro.org
---
 arch/arm64/boot/dts/Makefile                |   4 +-
 arch/arm64/boot/dts/fvp-base-gicv2-psci.dts | 233 ++++++++++++++++++++++++++++
 2 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/fvp-base-gicv2-psci.dts
diff --git a/arch/arm64/boot/dts/Makefile b/arch/arm64/boot/dts/Makefile
index c52bdb051f66..2d16cdbe5d47 100644
--- a/arch/arm64/boot/dts/Makefile
+++ b/arch/arm64/boot/dts/Makefile
@@ -1,4 +1,6 @@
-dtb-$(CONFIG_ARCH_VEXPRESS) += rtsm_ve-aemv8a.dtb foundation-v8.dtb
+dtb-$(CONFIG_ARCH_VEXPRESS) += rtsm_ve-aemv8a.dtb \
+	foundation-v8.dtb \
+	fvp-base-gicv2-psci.dtb
 dtb-$(CONFIG_ARCH_XGENE) += apm-mustang.dtb
targets += dtbs diff --git a/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts b/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts new file mode 100644 index 000000000000..736f546123bb --- /dev/null +++ b/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts @@ -0,0 +1,233 @@ +/* + * Copyright (c) 2013, ARM Limited. All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * Redistributions of source code must retain the above copyright notice, this + * list of conditions and the following disclaimer. + * + * Redistributions in binary form must reproduce the above copyright notice, + * this list of conditions and the following disclaimer in the documentation + * and/or other materials provided with the distribution. + * + * Neither the name of ARM nor the names of its contributors may be used + * to endorse or promote products derived from this software without specific + * prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. 
+ */ + +/dts-v1/; + +/memreserve/ 0x80000000 0x00010000; + +/ { +}; + +/ { + model = "FVP Base"; + compatible = "arm,vfp-base", "arm,vexpress"; + interrupt-parent = <&gic>; + #address-cells = <2>; + #size-cells = <2>; + + chosen { }; + + aliases { + serial0 = &v2m_serial0; + serial1 = &v2m_serial1; + serial2 = &v2m_serial2; + serial3 = &v2m_serial3; + }; + + psci { + compatible = "arm,psci"; + method = "smc"; + cpu_suspend = <0xc4000001>; + cpu_off = <0x84000002>; + cpu_on = <0xc4000003>; + }; + + cpus { + #address-cells = <2>; + #size-cells = <0>; + + big0: cpu@0 { + device_type = "cpu"; + compatible = "arm,cortex-a57", "arm,armv8"; + reg = <0x0 0x0>; + enable-method = "psci"; + clock-frequency = <1000000>; + }; + big1: cpu@1 { + device_type = "cpu"; + compatible = "arm,cortex-a57", "arm,armv8"; + reg = <0x0 0x1>; + enable-method = "psci"; + clock-frequency = <1000000>; + }; + big2: cpu@2 { + device_type = "cpu"; + compatible = "arm,cortex-a57", "arm,armv8"; + reg = <0x0 0x2>; + enable-method = "psci"; + clock-frequency = <1000000>; + }; + big3: cpu@3 { + device_type = "cpu"; + compatible = "arm,cortex-a57", "arm,armv8"; + reg = <0x0 0x3>; + enable-method = "psci"; + clock-frequency = <1000000>; + }; + little0: cpu@100 { + device_type = "cpu"; + compatible = "arm,cortex-a53", "arm,armv8"; + reg = <0x0 0x100>; + enable-method = "psci"; + clock-frequency = <1000000>; + }; + little1: cpu@101 { + device_type = "cpu"; + compatible = "arm,cortex-a53", "arm,armv8"; + reg = <0x0 0x101>; + enable-method = "psci"; + clock-frequency = <1000000>; + }; + little2: cpu@102 { + device_type = "cpu"; + compatible = "arm,cortex-a53", "arm,armv8"; + reg = <0x0 0x102>; + enable-method = "psci"; + clock-frequency = <1000000>; + }; + little3: cpu@103 { + device_type = "cpu"; + compatible = "arm,cortex-a53", "arm,armv8"; + reg = <0x0 0x103>; + enable-method = "psci"; + clock-frequency = <1000000>; + }; + }; + + memory@80000000 { + device_type = "memory"; + reg = <0x00000000 0x80000000 0 0x80000000>, + <0x00000008 0x80000000 0 0x80000000>; + }; + + gic: interrupt-controller@2f000000 { + compatible = "arm,cortex-a15-gic", "arm,cortex-a9-gic"; + #interrupt-cells = <3>; + #address-cells = <0>; + interrupt-controller; + reg = <0x0 0x2f000000 0 0x10000>, + <0x0 0x2c000000 0 0x2000>, + <0x0 0x2c010000 0 0x2000>, + <0x0 0x2c02F000 0 0x2000>; + interrupts = <1 9 0xf04>; + }; + + timer { + compatible = "arm,armv8-timer"; + interrupts = <1 13 0xff01>, + <1 14 0xff01>, + <1 11 0xff01>, + <1 10 0xff01>; + clock-frequency = <100000000>; + }; + + timer@2a810000 { + compatible = "arm,armv7-timer-mem"; + reg = <0x0 0x2a810000 0x0 0x10000>; + clock-frequency = <100000000>; + #address-cells = <2>; + #size-cells = <2>; + ranges; + frame@2a820000 { + frame-number = <0>; + interrupts = <0 25 4>; + reg = <0x0 0x2a820000 0x0 0x10000>; + }; + }; + + pmu { + compatible = "arm,armv8-pmuv3"; + interrupts = <0 60 4>, + <0 61 4>, + <0 62 4>, + <0 63 4>; + }; + + smb { + compatible = "simple-bus"; + + #address-cells = <2>; + #size-cells = <1>; + ranges = <0 0 0 0x08000000 0x04000000>, + <1 0 0 0x14000000 0x04000000>, + <2 0 0 0x18000000 0x04000000>, + <3 0 0 0x1c000000 0x04000000>, + <4 0 0 0x0c000000 0x04000000>, + <5 0 0 0x10000000 0x04000000>; + + #interrupt-cells = <1>; + interrupt-map-mask = <0 0 63>; + interrupt-map = <0 0 0 &gic 0 0 4>, + <0 0 1 &gic 0 1 4>, + <0 0 2 &gic 0 2 4>, + <0 0 3 &gic 0 3 4>, + <0 0 4 &gic 0 4 4>, + <0 0 5 &gic 0 5 4>, + <0 0 6 &gic 0 6 4>, + <0 0 7 &gic 0 7 4>, + <0 0 8 &gic 0 8 4>, + <0 0 9 &gic 0 9 4>, + <0 
0 10 &gic 0 10 4>, + <0 0 11 &gic 0 11 4>, + <0 0 12 &gic 0 12 4>, + <0 0 13 &gic 0 13 4>, + <0 0 14 &gic 0 14 4>, + <0 0 15 &gic 0 15 4>, + <0 0 16 &gic 0 16 4>, + <0 0 17 &gic 0 17 4>, + <0 0 18 &gic 0 18 4>, + <0 0 19 &gic 0 19 4>, + <0 0 20 &gic 0 20 4>, + <0 0 21 &gic 0 21 4>, + <0 0 22 &gic 0 22 4>, + <0 0 23 &gic 0 23 4>, + <0 0 24 &gic 0 24 4>, + <0 0 25 &gic 0 25 4>, + <0 0 26 &gic 0 26 4>, + <0 0 27 &gic 0 27 4>, + <0 0 28 &gic 0 28 4>, + <0 0 29 &gic 0 29 4>, + <0 0 30 &gic 0 30 4>, + <0 0 31 &gic 0 31 4>, + <0 0 32 &gic 0 32 4>, + <0 0 33 &gic 0 33 4>, + <0 0 34 &gic 0 34 4>, + <0 0 35 &gic 0 35 4>, + <0 0 36 &gic 0 36 4>, + <0 0 37 &gic 0 37 4>, + <0 0 38 &gic 0 38 4>, + <0 0 39 &gic 0 39 4>, + <0 0 40 &gic 0 40 4>, + <0 0 41 &gic 0 41 4>, + <0 0 42 &gic 0 42 4>; + + /include/ "rtsm_ve-motherboard.dtsi" + }; +};
On Wed, Dec 11, 2013 at 01:13:23PM +0000, Mark Brown wrote:
From: Mark Brown broonie@linaro.org
Add a dts file for ARMv8 fixed virtual platform which models a big.LITTLE configuration for the architecture. This is based on the DT shipped by ARM but has been modified for mainline, removing features not present in mainline and updating the CPU bindings for mainline.
Signed-off-by: Mark Brown broonie@linaro.org
 arch/arm64/boot/dts/Makefile                |   4 +-
 arch/arm64/boot/dts/fvp-base-gicv2-psci.dts | 233 ++++++++++++++++++++++++++++
 2 files changed, 236 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/boot/dts/fvp-base-gicv2-psci.dts
diff --git a/arch/arm64/boot/dts/Makefile b/arch/arm64/boot/dts/Makefile index c52bdb051f66..2d16cdbe5d47 100644 --- a/arch/arm64/boot/dts/Makefile +++ b/arch/arm64/boot/dts/Makefile @@ -1,4 +1,6 @@ -dtb-$(CONFIG_ARCH_VEXPRESS) += rtsm_ve-aemv8a.dtb foundation-v8.dtb +dtb-$(CONFIG_ARCH_VEXPRESS) += rtsm_ve-aemv8a.dtb \
foundation-v8.dtb \
fvp-base-gicv2-psci.dtb
dtb-$(CONFIG_ARCH_XGENE) += apm-mustang.dtb targets += dtbs diff --git a/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts b/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts new file mode 100644 index 000000000000..736f546123bb --- /dev/null +++ b/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts @@ -0,0 +1,233 @@ +/*
- Copyright (c) 2013, ARM Limited. All rights reserved.
- Redistribution and use in source and binary forms, with or without
- modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this
- list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice,
- this list of conditions and the following disclaimer in the documentation
- and/or other materials provided with the distribution.
- Neither the name of ARM nor the names of its contributors may be used
- to endorse or promote products derived from this software without specific
- prior written permission.
- THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
- LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- POSSIBILITY OF SUCH DAMAGE.
- */
+/dts-v1/;
+/memreserve/ 0x80000000 0x00010000;
While we are admittedly missing them elsewhere, it would be nice to have a comment stating what the memreserve is for. With a proper PSCI implementation, I don't see why this should be necessary.
+/ { +};
+/ {
- model = "FVP Base";
FVP Base is (as the name implies) a base upon which particular model instances are built. This name should be clarified (e.g. "FVP Base A57x4 A53x4").
That also applies to the filename.
- compatible = "arm,vfp-base", "arm,vexpress";
s/vfp/fvp/
Likewise, it would be nice to have a more specific compatible string here.
As the FVP is not a vexpress (though it is similar), we should probably have "arm,fvp-base" as the fallback and drop the "arm,vexpress" entry.
- interrupt-parent = <&gic>;
- #address-cells = <2>;
- #size-cells = <2>;
- chosen { };
- aliases {
serial0 = &v2m_serial0;
serial1 = &v2m_serial1;
serial2 = &v2m_serial2;
serial3 = &v2m_serial3;
- };
- psci {
compatible = "arm,psci";
method = "smc";
cpu_suspend = <0xc4000001>;
cpu_off = <0x84000002>;
cpu_on = <0xc4000003>;
- };
Are these IDs right? One of these IDs is a different width than the others.
Which firmware/bootloader does this correspond to?
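(The width difference is consistent with the SMC calling convention encoding these IDs appear to follow: cpu_suspend and cpu_on take 64-bit arguments and so carry the SMC64 bit, while cpu_off takes none and keeps an SMC32-only ID. A standalone sketch of that decomposition, offered as an assumption rather than something taken from the firmware documentation:)

/*
 * Hedged sketch: decode the PSCI function IDs above on the assumption that
 * they use the SMC Calling Convention layout, where bit 31 marks a "fast"
 * call, bit 30 selects SMC64, bits 29:24 name the owning service (4 is the
 * standard secure service range) and the low bits are the function number.
 */
#include <stdint.h>
#include <stdio.h>

#define SMC_FAST_CALL	(1u << 31)
#define SMC_64BIT	(1u << 30)
#define SMC_OWNER(id)	(((id) >> 24) & 0x3f)
#define SMC_FUNC(id)	((id) & 0xffff)

static void decode(uint32_t id)
{
	printf("0x%08x: %s %s owner=%u func=%u\n", id,
	       (id & SMC_FAST_CALL) ? "fast" : "yielding",
	       (id & SMC_64BIT) ? "SMC64" : "SMC32",
	       SMC_OWNER(id), SMC_FUNC(id));
}

int main(void)
{
	decode(0xc4000001);	/* cpu_suspend */
	decode(0x84000002);	/* cpu_off */
	decode(0xc4000003);	/* cpu_on */
	return 0;
}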
- cpus {
#address-cells = <2>;
#size-cells = <0>;
big0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a57", "arm,armv8";
reg = <0x0 0x0>;
enable-method = "psci";
clock-frequency = <1000000>;
Is clock-frequency used anywhere? Is it a useful thing to have (regardless of whether ePAPR defines it)?
[...]
- timer {
compatible = "arm,armv8-timer";
interrupts = <1 13 0xff01>,
<1 14 0xff01>,
<1 11 0xff01>,
<1 10 0xff01>;
clock-frequency = <100000000>;
A clock-frequency property in a timer is an indicator that the bootloader/firmware is broken and should be fixed. Is this actually necessary, or does the firmware/loader program CNTFRQ correctly on all CPUs?
- };
- timer@2a810000 {
compatible = "arm,armv7-timer-mem";
reg = <0x0 0x2a810000 0x0 0x10000>;
clock-frequency = <100000000>;
#address-cells = <2>;
#size-cells = <2>;
ranges;
frame@2a820000 {
frame-number = <0>;
interrupts = <0 25 4>;
reg = <0x0 0x2a820000 0x0 0x10000>;
};
- };
- pmu {
compatible = "arm,armv8-pmuv3";
interrupts = <0 60 4>,
<0 61 4>,
<0 62 4>,
<0 63 4>;
- };
I assume this is just the A57 cores? That should probably be noted for now. In future we'll need to define the relationship between interrupts and CPUs, and describe the A53 cores.
Thanks, Mark.
On Wed, Dec 11, 2013 at 01:55:36PM +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 01:13:23PM +0000, Mark Brown wrote:
+/dts-v1/;
+/memreserve/ 0x80000000 0x00010000;
While we are admittedly missing them elsewhere, it would be nice to have a comment stating what the memreserve is for. With a proper PSCI implementation, I don't see why this should be necessary.
Could you tell me what it's there for? Like you say everyone else has it with no comments and the source doesn't either...
+/ {
- model = "FVP Base";
FVP Base is (as the name implies) a base upon which particular model instances are built. This name should be clarified (e.g. "FVP Base A57x4 A53x4").
That also applies to the filename.
I can update these, though they do seem to come from what you guys are releasing - you might want to follow up on this internally (this applies to almost all of your review comments, sorry). It's probably going to be a bit confusing for users to have the filename change but ho hum :/
As the FVP is not a vexpress (though it is similar), we should probably have "arm,fvp-base" as the fallback and drop the "arm,vexpress" entry.
Is it not intended to be a software emulation of a vexpress and hence supposed to be broadly compatible with one?
- psci {
compatible = "arm,psci";
method = "smc";
cpu_suspend = <0xc4000001>;
cpu_off = <0x84000002>;
cpu_on = <0xc4000003>;
- };
Are these IDs right? One of these IDs is a different width than the others.
I have no idea, I have no documentation for any of this stuff other than the DT you guys release on github - it's just copied verbatim from there.
Which firmware/bootloader does this correspond to?
I'm testing using the Linaro 13.11 release.
big0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a57", "arm,armv8";
reg = <0x0 0x0>;
enable-method = "psci";
clock-frequency = <1000000>;
Is clock-frequency used anywhere? Is it a useful thing to have (regardless of whether ePAPR defines it)?
The topology code for ARMv7 (which I'm following for ARMv8) uses this in conjunction with the compatible to provide information to the scheduler about the relative performance of the cores.
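For context, a rough sketch of the kind of calculation being referred to, modelled on the 32-bit arch/arm/kernel/topology.c but with assumed efficiency numbers and helper names rather than the in-tree table: clock-frequency is scaled by a per-compatible efficiency factor to give the scheduler a relative capacity for each CPU.

/*
 * Hedged sketch of an ARMv7-style relative-capacity calculation. The
 * efficiency values below are placeholders for illustration, not the
 * values used by the real code.
 */
#include <linux/of.h>

struct cpu_efficiency {
	const char *compatible;
	unsigned long efficiency;	/* arbitrary relative units */
};

static const struct cpu_efficiency table_efficiency[] = {
	{ "arm,cortex-a57", 3891 },	/* assumed value */
	{ "arm,cortex-a53", 2048 },	/* assumed value */
	{ NULL, },
};

/* Relative capacity of one cpu node: efficiency * clock-frequency (in MHz). */
static unsigned long example_cpu_capacity(struct device_node *cn)
{
	const struct cpu_efficiency *e;
	u32 rate;

	if (of_property_read_u32(cn, "clock-frequency", &rate))
		return 0;	/* no hint, caller falls back to a default */

	for (e = table_efficiency; e->compatible; e++)
		if (of_device_is_compatible(cn, e->compatible))
			return e->efficiency * (rate >> 20);

	return 0;
}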
- timer {
compatible = "arm,armv8-timer";
interrupts = <1 13 0xff01>,
<1 14 0xff01>,
<1 11 0xff01>,
<1 10 0xff01>;
clock-frequency = <100000000>;
A clock-frequency property in a timer is an indicator that the bootloader/firmware is broken and should be fixed. Is this actually necessary, or does the firmware/loader program CNTFRQ correctly on all CPUs?
I can try to see what happens if I remove it - I doubt anyone has tested without it...
- pmu {
compatible = "arm,armv8-pmuv3";
interrupts = <0 60 4>,
<0 61 4>,
<0 62 4>,
<0 63 4>;
- };
I assume this is just the A57 cores? That should probably be noted for now. In future we'll need to define the relationship between interrupts and CPUs, and describe the A53 cores.
Again I don't know, this is just copied verbatim from what you guys release - I have no additional documentation.
On Wed, Dec 11, 2013 at 02:11:48PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 01:55:36PM +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 01:13:23PM +0000, Mark Brown wrote:
+/dts-v1/;
+/memreserve/ 0x80000000 0x00010000;
While we are admittedly missing them elsewhere, it would be nice to have a comment stating what the memreserve is for. With a proper PSCI implementation, I don't see why this should be necessary.
Could you tell me what it's there for? Like you say everyone else has it with no comments and the source doesn't either...
When using the bootwrapper, the bootwrapper itself takes up a portion of memory (along with the spin-table) which would otherwise be available to the kernel. This is also true with my bootwrapper PSCI implementation. The memreserve tells the kernel not to stomp over this memory (though it will map it in).
With a proper firmware, I'd expect the PSCI implementation to be outside of the memory exposed to non-secure software, and thus there shouldn't be anything to reserve.
It's possible that there is a reason for the reservation, but I'd rather we understood it than we propagated and codified a misunderstanding.
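For illustration, a minimal sketch (not the actual early boot code) of what /memreserve/ entries mean to the kernel: each entry in the FDT's memory reservation block gets handed to memblock so the kernel will map the region but never allocate from it.

/*
 * Hedged sketch of how FDT memory reservations are honoured, using the
 * standard libfdt accessors; the real logic lives in the early FDT
 * scanning code and is not reproduced verbatim here.
 */
#include <linux/init.h>
#include <linux/libfdt.h>
#include <linux/memblock.h>

static void __init example_reserve_fdt_regions(const void *fdt)
{
	int i, n = fdt_num_mem_rsv(fdt);

	for (i = 0; i < n; i++) {
		u64 base, size;

		if (fdt_get_mem_rsv(fdt, i, &base, &size) < 0)
			continue;
		if (size)
			memblock_reserve(base, size);	/* e.g. 64K at 0x80000000 above */
	}
}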
+/ {
- model = "FVP Base";
FVP Base is (as the name implies) a base upon which particular model instances are built. This name should be clarified (e.g. "FVP Base A57x4 A53x4").
That also applies to the filename.
I can update these, though they do seem to come from what you guys are releasing - you might want to follow up on this internally (this applies to almost all of your review comments, sorry). It's probably going to be a bit confusing for users to have the filename change but ho hum :/
I'll try to chase up the issues, thanks for making me aware.
I don't see the name issue as a big problem. This DT has never been part of the kernel tree, so there's no filename compatibility issue to deal with. Existing users of the DT will already have to be modified to get the DTs from a new source.
There should be nothing hanging off the compatible string for the platform yet -- we have no board files or platform blobs in the arm64 port. If the model name is being used as anything other than a handy indicator to users, then that's broken anyway.
As the FVP is not a vexpress (though it is similar), we should probably have "arm,fvp-base" as the fallback and drop the "arm,vexpress" entry.
Is it not intended to be a software emulation of a vexpress and hence supposed to be broadly compatible with one?
As I understand it, "Base" is a base platform that's somewhat like vexpress, but is not intended to be an emulation of vexpress. It may be broadly equivalent, but to limit confusion I'd rather treat them separately.
Regardless, top-level compatible strings aren't used for anything at the moment (and hopefully never will be), so nothing should be relying on "arm,vexpress" being present and we can remove it.
- psci {
compatible = "arm,psci";
method = "smc";
cpu_suspend = <0xc4000001>;
cpu_off = <0x84000002>;
cpu_on = <0xc4000003>;
- };
Are these IDs right? One of these IDs is a different width than the others.
I have no idea, I have no documentation for any of this stuff other than the DT you guys release on github - it's just copied verbatim from there.
So this goes with the trusted firmware?
Which firmware/bootloader does this correspond to?
I'm testing using the Linaro 13.11 release.
Release of what? I'm not familiar with the entire stack Linaro manage.
I guess the trusted firmware is being used, as mentioned above?
big0: cpu@0 {
device_type = "cpu";
compatible = "arm,cortex-a57", "arm,armv8";
reg = <0x0 0x0>;
enable-method = "psci";
clock-frequency = <1000000>;
Is clock-frequency used anywhere? Is it a useful thing to have (regardless of whether ePAPR defines it)?
The topology code for ARMv7 (which I'm following for ARMv8) uses this in conjunction with the compatible to provide information to the scheduler about the relative performance of the cores.
Ok. While I have concerns regarding the topology code, I'm not averse to describing the clock-frequency here.
However I worry about how configurable clocks are going to be handled here. On the 32-bit side I see some platforms with clock-frequency and others with clocks. We should figure out what the expected way to handle configurable clocks is before we add the code for handling a fixed rate clock here, or we're going to back ourselves into a corner where this support can only work on a subset of systems.
- timer {
compatible = "arm,armv8-timer";
interrupts = <1 13 0xff01>,
<1 14 0xff01>,
<1 11 0xff01>,
<1 10 0xff01>;
clock-frequency = <100000000>;
A clock-frequency property in a timer is an indicator that the bootloader/firmware is broken and should be fixed. Is this actually necessary, or does the firmware/loader program CNTFRQ correctly on all CPUs?
I can try to see what happens if I remove it - I doubt anyone has tested without it...
I would very much hope it's unnecessary. If it's needed, then bugs should be filed and the firmware fixed. The system initialisation code _must_ set CNTFRQ on all CPUs or it's fundamentally broken.
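A sketch of the fallback being discussed, as an assumed shape rather than the arch timer driver verbatim: trust CNTFRQ_EL0 when firmware has programmed it, and only fall back to the DT property, with a warning, when the register reads as zero.

/*
 * Hedged sketch of the frequency selection logic. Helper and variable
 * names are illustrative.
 */
#include <linux/of.h>
#include <linux/printk.h>

static u32 example_arch_timer_rate(struct device_node *np)
{
	u64 cntfrq;
	u32 rate;

	/* CNTFRQ_EL0 should be set by firmware on every CPU at boot. */
	asm volatile("mrs %0, cntfrq_el0" : "=r" (cntfrq));
	if (cntfrq)
		return cntfrq;

	/* Broken firmware: rely on DT, but make the breakage visible. */
	if (!of_property_read_u32(np, "clock-frequency", &rate)) {
		pr_warn("CNTFRQ not set by firmware, using DT value %u Hz\n", rate);
		return rate;
	}

	return 0;	/* no usable frequency at all */
}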
- pmu {
compatible = "arm,armv8-pmuv3";
interrupts = <0 60 4>,
<0 61 4>,
<0 62 4>,
<0 63 4>;
- };
I assume this is just the A57 cores? That should probably be noted for now. In future we'll need to define the relationship between interrupts and CPUs, and describe the A53 cores.
Again I don't know, this is just copied verbatim from what you guys release - I have no additional documentation.
Given that the DTs on the github page don't mention specific CPUs, these interrupts might not be correct for specific implementations.
I'd rather that the elements we are unsure about were dropped from the DT for the time being rather than being present but incorrect. There's less room for something to blow up that way, and it makes it clearer where the deficiencies are.
Thanks, Mark.
On Wed, Dec 11, 2013 at 03:04:33PM +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 02:11:48PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 01:55:36PM +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 01:13:23PM +0000, Mark Brown wrote:
+/memreserve/ 0x80000000 0x00010000;
While we are admittedly missing them elsewhere, it would be nice to have a comment stating what the memreserve is for. With a proper PSCI implementation, I don't see why this should be necessary.
Could you tell me what it's there for? Like you say everyone else has it with no comments and the source doesn't either...
When using the bootwrapper, the bootwrapper itself takes up a portion of memory (along with the spin-table) which would otherwise be available to the kernel. This is also true with my bootwrapper PSCI implementation. The memreserve tells the kernel not to stomp over this memory (though it will map it in).
OK, I'll try to distill that down to a comment (and add the same comment to the other DTs). However see below...
With a proper firmware, I'd expect the PSCI implementation to be outside of the memory exposed to non-secure software, and thus there shouldn't be anything to reserve.
It's possible that there is a reason for the reservation, but I'd rather we understood it than we propagated and codified a misunderstanding.
It'd certainly be nice if it were clearer.
I don't see the name issue as a big problem. This DT has never been part of the kernel tree, so there's no filename compatibility issue to deal with. Existing users of the DT will already have to be modified to get the DTs from a new source.
I'm more worried about existing users not noticing that this is a DT for the same thing.
There should be nothing hanging off the compatible string for the platform yet -- we have no board files or platform blobs in the arm64 port. If the model name is being used as anything other than a handy indicator to users, then that's broken anyway.
Yeah, that doesn't seem at all risky.
- psci {
compatible = "arm,psci";
method = "smc";
cpu_suspend = <0xc4000001>;
cpu_off = <0x84000002>;
cpu_on = <0xc4000003>;
- };
Are these IDs right? One of these IDs is a different width than the others.
I have no idea, I have no documentation for any of this stuff other than the DT you guys release on github - it's just copied verbatim from there.
So this goes with the trusted firmware?
This is on the ARM github site so I'm not 100% sure how tied that is.
Which firmware/bootloader does this correspond to?
I'm testing using the Linaro 13.11 release.
Release of what? I'm not familiar with the entire stack Linaro manage.
I guess the trusted firmware is being used, as mentioned above?
The only release Linaro have for ARMv8 is an OE one AFAICT which I got from here:
http://releases.linaro.org/13.11/openembedded/aarch64
According to the notes there it includes the trusted firmware. The above is all the information I have on it.
Ok. While I have concerns regarding the topology code, I'm not averse to describing the clock-frequency here.
However I worry about how configurable clocks are going to be handled here. On the 32-bit side I see some platforms with clock-frequency and others with clocks. We should figure out what the expected way to handle configurable clocks is before we add the code for handling a fixed rate clock here, or we're going to back ourselves into a corner where this support can only work on a subset of systems.
Well, we can always extend later. I do share your concerns about doing this using clock-frequency though, that's why I specified it as being the maximum when I copied it into the kernel bindings. That's clearly the intention of the existing ARMv7 implementation.
One trick here is that this is all happening rather early in boot so going through the whole clock tree might be entertaining, though we can probably go round and do another pass when cpufreq comes up.
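To make that concrete, a sketch of the dual lookup being discussed, with illustrative names rather than an existing helper: prefer a clocks phandle through the common clock framework, and fall back to the fixed clock-frequency value, treated as the maximum rate per the binding.

/*
 * Hedged sketch of handling both bindings mentioned here.
 */
#include <linux/clk.h>
#include <linux/err.h>
#include <linux/of.h>

static unsigned long example_cpu_clock_rate(struct device_node *cn)
{
	struct clk *clk;
	u32 rate_hz;

	clk = of_clk_get(cn, 0);		/* "clocks" phandle, if present */
	if (!IS_ERR(clk)) {
		unsigned long rate = clk_get_rate(clk);

		clk_put(clk);
		return rate;
	}

	if (!of_property_read_u32(cn, "clock-frequency", &rate_hz))
		return rate_hz;			/* fixed (maximum) rate */

	return 0;
}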
I'd rather that the elements we are unsure about were dropped from the DT for the time being rather than being present but incorrect. There's less room for something to blow up that way, and it makes it clearer where the deficiencies are.
OK. To be honest I'm leaning towards letting you guys upstream this given that all the information I have on it is the DT so from my point of view I'm unsure about essentially all of it.
On Wed, 2013-12-11 at 15:04 +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 02:11:48PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 01:55:36PM +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 01:13:23PM +0000, Mark Brown wrote:
[...]
+/ {
- model = "FVP Base";
FVP Base is (as the name implies) a base upon which particular model instances are built. This name should be clarified (e.g. "FVP Base A57x4 A53x4").
That also applies to the filename.
I can update these, though they do seem to come from what you guys are releasing - you might want to follow up on this internally (this applies to almost all of your review comments, sorry). It's probably going to be a bit confusing for users to have the filename change but ho hum :/
I'll try to chase up the issues, thanks for making me aware.
I don't see the name issue as a big problem. This DT has never been part of the kernel tree, so there's no filename compatibility issue to deal with. Existing users of the DT will already have to be modified to get the DTs from a new source.
There should be nothing hanging off the compatible string for the platform yet -- we have no board files or platform blobs in the arm64 port. If the model name is being used as anything other than a handy indicator to users, then that's broken anyway.
I believe Android uses model names to determine the filenames of init scripts to run. That's not a kernel problem, but thought I would point out one 'broken' use that I have first hand experience of, having been tripped up before by ARM's twice yearly lets-rename-everything-again exercise ;-)
On 11 December 2013 16:08, Jon Medhurst (Tixy) tixy@linaro.org wrote:
On Wed, 2013-12-11 at 15:04 +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 02:11:48PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 01:55:36PM +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 01:13:23PM +0000, Mark Brown wrote:
[...]
+/ {
model = "FVP Base";
FVP Base is (as the name implies) a base upon which particular model instances are built. This name should be clarified (e.g. "FVP Base A57x4 A53x4").
That also applies to the filename.
This same file is used to boot the AEMv8 architectural model as well as the Cortex A57-A53 model, so I think someone would need to find another filename that makes sense in both contexts.
I guess that using the same file for two models could in itself be a problem solved via includes and simpler wrappers.
But as Mark Brown says, ARM have originated this file and personally I'd rather it was changed in the ARM Trusted Firmware repo first and propagated here.
To answer another question from earlier: there is no direct correlation between the ARM Trusted Firmware and the device tree files, other than that the same repo hosts both. Trusted firmware does not build or embed the DTBs. UEFI is currently what loads the DTB and passes it to the kernel. And that isn't part of the trusted firmware repo, of course.
I can update these, though they do seem to come from what you guys are releasing - you might want to follow up on this internally (this applies to almost all of your review comments, sorry). It's probably going to be a bit confusing for users to have the filename change but ho hum :/
I'll try to chase up the issues, thanks for making me aware.
I don't see the name issue as a big problem. This DT has never been part of the kernel tree, so there's no filename compatibility issue to deal with. Existing users of the DT will already have to be modified to get the DTs from a new source.
There should be nothing hanging off the compatible string for the platform yet -- we have no board files or platform blobs in the arm64 port. If the model name is being used as anything other than a handy indicator to users, then that's broken anyway.
I believe Android uses model names to determine the filenames of init scripts to run. That's not a kernel problem, but thought I would point out one 'broken' use that I have first hand experience of, having been tripped up before by ARM's twice yearly lets-rename-everything-again exercise ;-)
-- Tixy
On Wed, Dec 11, 2013 at 04:41:32PM +0000, Ryan Harkin wrote:
On 11 December 2013 16:08, Jon Medhurst (Tixy) tixy@linaro.org wrote:
On Wed, 2013-12-11 at 15:04 +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 02:11:48PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 01:55:36PM +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 01:13:23PM +0000, Mark Brown wrote:
[...]
+/ {
model = "FVP Base";
FVP Base is (as the name implies) a base upon which particular model instances are built. This name should be clarified (e.g. "FVP Base A57x4 A53x4").
That also applies to the filename.
This same file is used to boot the AEMv8 architectural model as well as the Cortex A57-A53 model, so I think someone would need to find another filename that makes sense in both contexts.
I guess that using the same file for two models could in itself be a problem solved via includes and simpler wrappers.
We should have a base platform dtsi that describes the standard devices and memory map.
Individual variants can include the dtsi and describe the model name and compatible string, CPUs, variant-specific devices, and firmware details.
While the same DT will work regardless of cpu type currently, it's relatively easy to factor that out anyway.
But as Mark Brown says, ARM have originated this file and personally I'd rather it was changed in the ARM Trusted Firmware repo first and propagated here.
I completely agree that the DTs in the Trusted Firmware repo should be corrected. I will try to get them fixed.
To answer another question from earlier: there is no direct correlation between the ARM Trusted Firmware and the device tree files other than the same repo hosts both files. Trusted firmware does not build or embed the DTBs. UEFI is currently what loads the DTB and passes it to the kernel. And that isn't part of the trusted firmware repo, of course.
There _is_ a direct correlation between the trusted firmware and the DT; the psci node describes the Trusted Firmware PSCI implementation. If you don't use the Trusted Firmware then you need a different DT.
However, this would only be a platform variant built atop of the more common FVP Base, so most of the DT can be shared.
Thanks, Mark.
On Wed, Dec 11, 2013 at 05:09:03PM +0000, Mark Rutland wrote:
On Wed, Dec 11, 2013 at 04:41:32PM +0000, Ryan Harkin wrote:
But as Mark Brown says, ARM have originated this file and personally I'd rather it was changed in the ARM Trusted Firmware repo first and propagated here.
I completely agree that the DTs in the Trusted Firmware repo should be corrected. I will try to get them fixed.
And mainlined?
From: Mark Brown broonie@linaro.org
Add basic CPU topology support to arm64, based on the existing pre-v8 code and some work done by Mark Hambleton. This patch does not implement the ARM CPU topology bindings; instead it provides support equivalent to the existing pre-v8 capability, using the mandatory MPIDR information in the CPU nodes in the device tree and assuming that a simple SMP or multi-cluster topology is in use.
The primary goal is to separate the architecture hookup for providing topology information from the DT parsing, in order to ease review and to avoid blocking the architecture code (which will be built on by other work) on the DT code review, by providing something simple and basic. Having this support should also make the kernel cope better with incomplete DTs.
Further patches will provide support for overriding this using the topology bindings, providing richer support for a wider range of systems.
Signed-off-by: Mark Brown broonie@linaro.org
---
 arch/arm64/Kconfig                |   8 +++
 arch/arm64/include/asm/cpu.h      |   1 -
 arch/arm64/include/asm/cputype.h  |   9 +++
 arch/arm64/include/asm/smp_plat.h |   1 +
 arch/arm64/include/asm/topology.h |  42 +++++++++++
 arch/arm64/kernel/Makefile        |   1 +
 arch/arm64/kernel/setup.c         |   9 +--
 arch/arm64/kernel/smp.c           |  19 ++++-
 arch/arm64/kernel/topology.c      | 143 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 227 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/include/asm/topology.h
 create mode 100644 arch/arm64/kernel/topology.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 88c8b6c1341a..7b4dab852937 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -154,6 +154,14 @@ config SMP
If you don't know what to do here, say N.
+config ARM_CPU_TOPOLOGY + bool "Support CPU topology definition" + depends on SMP + default y + help + Support CPU topology definition, based on configuration + provided by the firmware. + config NR_CPUS int "Maximum number of CPUs (2-32)" range 2 32 diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h index d67ff011d361..8a26b690110c 100644 --- a/arch/arm64/include/asm/cpu.h +++ b/arch/arm64/include/asm/cpu.h @@ -10,7 +10,6 @@
#include <linux/percpu.h> #include <linux/cpu.h> -#include <linux/topology.h>
struct cpuinfo_arm { struct cpu cpu; diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index 5fe138e0b828..bd504739cbfd 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -29,6 +29,15 @@ #define INVALID_HWID ULONG_MAX
#define MPIDR_HWID_BITMASK 0xff00ffffff +#define MPIDR_SMP_BITMASK (0x3 << 30) +#define MPIDR_SMP_VALUE (0x2 << 30) +#define MPIDR_MT_BITMASK (0x1 << 24) +#define MPIDR_LEVEL_BITS 8 +#define MPIDR_LEVEL_MASK ((1 << MPIDR_LEVEL_BITS) - 1) + +#define MPIDR_AFFINITY_LEVEL(mpidr, level) \ + ((mpidr >> (MPIDR_LEVEL_BITS * level)) & MPIDR_LEVEL_MASK) +
#define read_cpuid(reg) ({ \ u64 __val; \ diff --git a/arch/arm64/include/asm/smp_plat.h b/arch/arm64/include/asm/smp_plat.h index ed43a0d2b1b2..4ad4ecc93bcf 100644 --- a/arch/arm64/include/asm/smp_plat.h +++ b/arch/arm64/include/asm/smp_plat.h @@ -19,6 +19,7 @@ #ifndef __ASM_SMP_PLAT_H #define __ASM_SMP_PLAT_H
+#include <linux/cpumask.h> #include <asm/types.h>
/* diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h new file mode 100644 index 000000000000..611edefaeaf1 --- /dev/null +++ b/arch/arm64/include/asm/topology.h @@ -0,0 +1,42 @@ +#ifndef _ASM_ARM_TOPOLOGY_H +#define _ASM_ARM_TOPOLOGY_H + +#ifdef CONFIG_ARM_CPU_TOPOLOGY + +#include <linux/cpumask.h> + +struct cputopo_arm { + int thread_id; + int core_id; + int socket_id; + cpumask_t thread_sibling; + cpumask_t core_sibling; +}; + +extern struct cputopo_arm cpu_topology[NR_CPUS]; + +#define topology_physical_package_id(cpu) (cpu_topology[cpu].socket_id) +#define topology_core_id(cpu) (cpu_topology[cpu].core_id) +#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling) +#define topology_thread_cpumask(cpu) (&cpu_topology[cpu].thread_sibling) + +#define mc_capable() (cpu_topology[0].socket_id != -1) +#define smt_capable() (cpu_topology[0].thread_id != -1) + +void init_cpu_topology(void); +void store_cpu_topology(unsigned int cpuid); +const struct cpumask *cpu_coregroup_mask(int cpu); +int cluster_to_logical_mask(unsigned int socket_id, cpumask_t *cluster_mask); + +#else + +static inline void init_cpu_topology(void) { } +static inline void store_cpu_topology(unsigned int cpuid) { } +static inline int cluster_to_logical_mask(unsigned int socket_id, + cpumask_t *cluster_mask) { return -EINVAL; } + +#endif + +#include <asm-generic/topology.h> + +#endif /* _ASM_ARM_TOPOLOGY_H */ diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 5ba2fd43a75b..2d145e38ad49 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -18,6 +18,7 @@ arm64-obj-$(CONFIG_SMP) += smp.o smp_spin_table.o arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)+= hw_breakpoint.o arm64-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o +arm64-obj-$(CONFIG_ARM_CPU_TOPOLOGY) += topology.o
obj-y += $(arm64-obj-y) vdso/ obj-m += $(arm64-obj-m) diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 0bc5e4cbc017..dbe4a9ba90cb 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -54,6 +54,7 @@ #include <asm/traps.h> #include <asm/memblock.h> #include <asm/psci.h> +#include <asm/cpu.h>
unsigned int processor_id; EXPORT_SYMBOL(processor_id); @@ -250,16 +251,16 @@ static int __init arm64_device_init(void) } arch_initcall(arm64_device_init);
-static DEFINE_PER_CPU(struct cpu, cpu_data); +DEFINE_PER_CPU(struct cpuinfo_arm, cpu_data);
static int __init topology_init(void) { int i;
for_each_possible_cpu(i) { - struct cpu *cpu = &per_cpu(cpu_data, i); - cpu->hotpluggable = 1; - register_cpu(cpu, i); + struct cpuinfo_arm *cpu = &per_cpu(cpu_data, i); + cpu->cpu.hotpluggable = 1; + register_cpu(&cpu->cpu, i); }
return 0; diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index a5aeefab03c3..f29c7ffad84a 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -35,7 +35,6 @@ #include <linux/clockchips.h> #include <linux/completion.h> #include <linux/of.h> - #include <asm/atomic.h> #include <asm/cacheflush.h> #include <asm/cputype.h> @@ -48,6 +47,7 @@ #include <asm/sections.h> #include <asm/tlbflush.h> #include <asm/ptrace.h> +#include <asm/cpu.h>
/* * as from 2.5, kernels no longer have an init_tasks structure @@ -113,6 +113,16 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) return ret; }
+static void __cpuinit smp_store_cpu_info(unsigned int cpuid) +{ + struct cpuinfo_arm *cpu_info = &per_cpu(cpu_data, cpuid); + + cpu_info->loops_per_jiffy = loops_per_jiffy; + cpu_info->cpuid = read_cpuid_id(); + + store_cpu_topology(cpuid); +} + /* * This is the secondary CPU boot entry. We're using this CPUs * idle thread stack, but a set of temporary page tables. @@ -150,6 +160,8 @@ asmlinkage void secondary_start_kernel(void) */ notify_cpu_starting(cpu);
+ smp_store_cpu_info(cpu); + /* * OK, now it's safe to let the boot CPU continue. Wait for * the CPU migration code to notice that the CPU is online @@ -387,6 +399,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus) int err; unsigned int cpu, ncores = num_possible_cpus();
+ init_cpu_topology(); + + smp_store_cpu_info(smp_processor_id()); + + /* * are we trying to boot more cores than exist? */ diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c new file mode 100644 index 000000000000..e0b40f48b448 --- /dev/null +++ b/arch/arm64/kernel/topology.c @@ -0,0 +1,143 @@ +/* + * arch/arm64/kernel/topology.c + * + * Copyright (C) 2011,2013 Linaro Limited. + * Written by: Vincent Guittot + * + * based on arch/sh/kernel/topology.c + * + * This file is subject to the terms and conditions of the GNU General Public + * License. See the file "COPYING" in the main directory of this archive + * for more details. + */ + +#include <linux/cpu.h> +#include <linux/cpumask.h> +#include <linux/export.h> +#include <linux/init.h> +#include <linux/percpu.h> +#include <linux/node.h> +#include <linux/nodemask.h> +#include <linux/sched.h> +#include <linux/slab.h> + +#include <asm/cputype.h> +#include <asm/smp_plat.h> +#include <asm/topology.h> + +/* + * cpu topology table + */ +struct cputopo_arm cpu_topology[NR_CPUS]; +EXPORT_SYMBOL_GPL(cpu_topology); + +const struct cpumask *cpu_coregroup_mask(int cpu) +{ + return &cpu_topology[cpu].core_sibling; +} + +static void update_siblings_masks(unsigned int cpuid) +{ + struct cputopo_arm *cpu_topo, *cpuid_topo = &cpu_topology[cpuid]; + int cpu; + + /* update core and thread sibling masks */ + for_each_possible_cpu(cpu) { + cpu_topo = &cpu_topology[cpu]; + + if (cpuid_topo->socket_id != cpu_topo->socket_id) + continue; + + cpumask_set_cpu(cpuid, &cpu_topo->core_sibling); + if (cpu != cpuid) + cpumask_set_cpu(cpu, &cpuid_topo->core_sibling); + + if (cpuid_topo->core_id != cpu_topo->core_id) + continue; + + cpumask_set_cpu(cpuid, &cpu_topo->thread_sibling); + if (cpu != cpuid) + cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling); + } + smp_wmb(); +} + +/* + * store_cpu_topology is called at boot when only one cpu is running + * and with the mutex cpu_hotplug.lock locked, when several cpus have booted, + * which prevents simultaneous write access to cpu_topology array + */ +void store_cpu_topology(unsigned int cpuid) +{ + struct cputopo_arm *cpuid_topo = &cpu_topology[cpuid]; + u64 mpidr; + + /* If the cpu topology has been already set, just return */ + if (cpuid_topo->core_id != -1) + return; + + mpidr = cpu_logical_map(cpuid); + + /* + * Create cpu topology mapping, assume the cores are largely + * independent since the DT bindings do not include the flags + * for MT. 
+ */ + cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); + cpuid_topo->socket_id = MPIDR_AFFINITY_LEVEL(mpidr, 1); + + update_siblings_masks(cpuid); + + pr_info("CPU%u: cpu %d, socket %d mapped using MPIDR %llx\n", + cpuid, cpu_topology[cpuid].core_id, + cpu_topology[cpuid].socket_id, mpidr); +} + + +/* + * cluster_to_logical_mask - return cpu logical mask of CPUs in a cluster + * @socket_id: cluster HW identifier + * @cluster_mask: the cpumask location to be initialized, modified by the + * function only if return value == 0 + * + * Return: + * + * 0 on success + * -EINVAL if cluster_mask is NULL or there is no record matching socket_id + */ +int cluster_to_logical_mask(unsigned int socket_id, cpumask_t *cluster_mask) +{ + int cpu; + + if (!cluster_mask) + return -EINVAL; + + for_each_online_cpu(cpu) + if (socket_id == topology_physical_package_id(cpu)) { + cpumask_copy(cluster_mask, topology_core_cpumask(cpu)); + return 0; + } + + return -EINVAL; +} + +/* + * init_cpu_topology is called at boot when only one cpu is running + * which prevent simultaneous write access to cpu_topology array + */ +void __init init_cpu_topology(void) +{ + unsigned int cpu; + + /* init core mask and power*/ + for_each_possible_cpu(cpu) { + struct cputopo_arm *cpu_topo = &(cpu_topology[cpu]); + + cpu_topo->thread_id = -1; + cpu_topo->core_id = -1; + cpu_topo->socket_id = -1; + cpumask_clear(&cpu_topo->core_sibling); + cpumask_clear(&cpu_topo->thread_sibling); + } + smp_wmb(); +}
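To make the MPIDR-derived mapping concrete, here is a small standalone worked example applying the MPIDR_AFFINITY_LEVEL() macro added by the patch above to the reg values used in the FVP dts (0x0-0x3 and 0x100-0x103): Aff0 becomes the core id and Aff1 the cluster ("socket") id, so cpu@101, for instance, maps to core 1 in cluster 1.

/*
 * Worked example of the affinity-level extraction used for the topology
 * fallback; userspace build for illustration only (the macro is written
 * out here with extra parentheses around its arguments).
 */
#include <stdint.h>
#include <stdio.h>

#define MPIDR_LEVEL_BITS	8
#define MPIDR_LEVEL_MASK	((1 << MPIDR_LEVEL_BITS) - 1)
#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
	(((mpidr) >> (MPIDR_LEVEL_BITS * (level))) & MPIDR_LEVEL_MASK)

int main(void)
{
	const uint64_t mpidrs[] = { 0x0, 0x1, 0x2, 0x3,
				    0x100, 0x101, 0x102, 0x103 };
	unsigned int i;

	for (i = 0; i < sizeof(mpidrs) / sizeof(mpidrs[0]); i++)
		printf("mpidr 0x%03llx -> core %llu, cluster %llu\n",
		       (unsigned long long)mpidrs[i],
		       (unsigned long long)MPIDR_AFFINITY_LEVEL(mpidrs[i], 0),
		       (unsigned long long)MPIDR_AFFINITY_LEVEL(mpidrs[i], 1));
	return 0;
}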
On Wed, Dec 11, 2013 at 01:13:24PM +0000, Mark Brown wrote:
From: Mark Brown broonie@linaro.org
Add basic CPU topology support to arm64, based on the existing pre-v8 code and some work done by Mark Hambleton. This patch does not implement the ARM CPU topology bindings; instead it provides support equivalent to the existing pre-v8 capability, using the mandatory MPIDR information in the CPU nodes in the device tree and assuming that a simple SMP or multi-cluster topology is in use.
The primary goal is to separate the architecture hookup for providing topology information from the DT parsing, in order to ease review and to avoid blocking the architecture code (which will be built on by other work) on the DT code review, by providing something simple and basic. Having this support should also make the kernel cope better with incomplete DTs.
Further patches will provide support for overriding this using the topology bindings, providing richer support for a wider range of systems.
[Adding Lorenzo]
I seem to remember Lorenzo having patches already for this, along with bindings, Documentation etc. so it would be good to know how these two series are supposed to interact.
Will
Signed-off-by: Mark Brown broonie@linaro.org
 arch/arm64/Kconfig                |   8 +++
 arch/arm64/include/asm/cpu.h      |   1 -
 arch/arm64/include/asm/cputype.h  |   9 +++
 arch/arm64/include/asm/smp_plat.h |   1 +
 arch/arm64/include/asm/topology.h |  42 +++++++++++
 arch/arm64/kernel/Makefile        |   1 +
 arch/arm64/kernel/setup.c         |   9 +--
 arch/arm64/kernel/smp.c           |  19 ++++-
 arch/arm64/kernel/topology.c      | 143 ++++++++++++++++++++++++++++++++++++++
 9 files changed, 227 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/include/asm/topology.h
 create mode 100644 arch/arm64/kernel/topology.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 88c8b6c1341a..7b4dab852937 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -154,6 +154,14 @@ config SMP If you don't know what to do here, say N. +config ARM_CPU_TOPOLOGY
- bool "Support CPU topology definition"
- depends on SMP
- default y
- help
Support CPU topology definition, based on configuration
provided by the firmware.
config NR_CPUS int "Maximum number of CPUs (2-32)" range 2 32 diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h index d67ff011d361..8a26b690110c 100644 --- a/arch/arm64/include/asm/cpu.h +++ b/arch/arm64/include/asm/cpu.h @@ -10,7 +10,6 @@ #include <linux/percpu.h> #include <linux/cpu.h> -#include <linux/topology.h> struct cpuinfo_arm { struct cpu cpu; diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index 5fe138e0b828..bd504739cbfd 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -29,6 +29,15 @@ #define INVALID_HWID ULONG_MAX #define MPIDR_HWID_BITMASK 0xff00ffffff +#define MPIDR_SMP_BITMASK (0x3 << 30) +#define MPIDR_SMP_VALUE (0x2 << 30) +#define MPIDR_MT_BITMASK (0x1 << 24) +#define MPIDR_LEVEL_BITS 8 +#define MPIDR_LEVEL_MASK ((1 << MPIDR_LEVEL_BITS) - 1)
+#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
- ((mpidr >> (MPIDR_LEVEL_BITS * level)) & MPIDR_LEVEL_MASK)
#define read_cpuid(reg) ({ \ u64 __val; \ diff --git a/arch/arm64/include/asm/smp_plat.h b/arch/arm64/include/asm/smp_plat.h index ed43a0d2b1b2..4ad4ecc93bcf 100644 --- a/arch/arm64/include/asm/smp_plat.h +++ b/arch/arm64/include/asm/smp_plat.h @@ -19,6 +19,7 @@ #ifndef __ASM_SMP_PLAT_H #define __ASM_SMP_PLAT_H +#include <linux/cpumask.h> #include <asm/types.h> /* diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h new file mode 100644 index 000000000000..611edefaeaf1 --- /dev/null +++ b/arch/arm64/include/asm/topology.h @@ -0,0 +1,42 @@ +#ifndef _ASM_ARM_TOPOLOGY_H +#define _ASM_ARM_TOPOLOGY_H
+#ifdef CONFIG_ARM_CPU_TOPOLOGY
+#include <linux/cpumask.h>
+struct cputopo_arm {
- int thread_id;
- int core_id;
- int socket_id;
- cpumask_t thread_sibling;
- cpumask_t core_sibling;
+};
+extern struct cputopo_arm cpu_topology[NR_CPUS];
+#define topology_physical_package_id(cpu) (cpu_topology[cpu].socket_id) +#define topology_core_id(cpu) (cpu_topology[cpu].core_id) +#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling) +#define topology_thread_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
+#define mc_capable() (cpu_topology[0].socket_id != -1) +#define smt_capable() (cpu_topology[0].thread_id != -1)
+void init_cpu_topology(void); +void store_cpu_topology(unsigned int cpuid); +const struct cpumask *cpu_coregroup_mask(int cpu); +int cluster_to_logical_mask(unsigned int socket_id, cpumask_t *cluster_mask);
+#else
+static inline void init_cpu_topology(void) { } +static inline void store_cpu_topology(unsigned int cpuid) { } +static inline int cluster_to_logical_mask(unsigned int socket_id,
- cpumask_t *cluster_mask) { return -EINVAL; }
+#endif
+#include <asm-generic/topology.h>
+#endif /* _ASM_ARM_TOPOLOGY_H */ diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 5ba2fd43a75b..2d145e38ad49 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -18,6 +18,7 @@ arm64-obj-$(CONFIG_SMP) += smp.o smp_spin_table.o arm64-obj-$(CONFIG_HW_PERF_EVENTS) += perf_event.o arm64-obj-$(CONFIG_HAVE_HW_BREAKPOINT)+= hw_breakpoint.o arm64-obj-$(CONFIG_EARLY_PRINTK) += early_printk.o +arm64-obj-$(CONFIG_ARM_CPU_TOPOLOGY) += topology.o obj-y += $(arm64-obj-y) vdso/ obj-m += $(arm64-obj-m) diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 0bc5e4cbc017..dbe4a9ba90cb 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -54,6 +54,7 @@ #include <asm/traps.h> #include <asm/memblock.h> #include <asm/psci.h> +#include <asm/cpu.h> unsigned int processor_id; EXPORT_SYMBOL(processor_id); @@ -250,16 +251,16 @@ static int __init arm64_device_init(void) } arch_initcall(arm64_device_init); -static DEFINE_PER_CPU(struct cpu, cpu_data); +DEFINE_PER_CPU(struct cpuinfo_arm, cpu_data); static int __init topology_init(void) { int i; for_each_possible_cpu(i) {
struct cpu *cpu = &per_cpu(cpu_data, i);
cpu->hotpluggable = 1;
register_cpu(cpu, i);
struct cpuinfo_arm *cpu = &per_cpu(cpu_data, i);
cpu->cpu.hotpluggable = 1;
register_cpu(&cpu->cpu, i);
}
return 0; diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index a5aeefab03c3..f29c7ffad84a 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -35,7 +35,6 @@ #include <linux/clockchips.h> #include <linux/completion.h> #include <linux/of.h>
#include <asm/atomic.h> #include <asm/cacheflush.h> #include <asm/cputype.h> @@ -48,6 +47,7 @@ #include <asm/sections.h> #include <asm/tlbflush.h> #include <asm/ptrace.h> +#include <asm/cpu.h> /*
- as from 2.5, kernels no longer have an init_tasks structure
@@ -113,6 +113,16 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) return ret; } +static void __cpuinit smp_store_cpu_info(unsigned int cpuid) +{
- struct cpuinfo_arm *cpu_info = &per_cpu(cpu_data, cpuid);
- cpu_info->loops_per_jiffy = loops_per_jiffy;
- cpu_info->cpuid = read_cpuid_id();
- store_cpu_topology(cpuid);
+}
/*
- This is the secondary CPU boot entry. We're using this CPUs
- idle thread stack, but a set of temporary page tables.
@@ -150,6 +160,8 @@ asmlinkage void secondary_start_kernel(void) */ notify_cpu_starting(cpu);
- smp_store_cpu_info(cpu);
- /*
- OK, now it's safe to let the boot CPU continue. Wait for
- the CPU migration code to notice that the CPU is online
@@ -387,6 +399,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus) int err; unsigned int cpu, ncores = num_possible_cpus();
- init_cpu_topology();
- smp_store_cpu_info(smp_processor_id());
/*
 * are we trying to boot more cores than exist?
 */
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
new file mode 100644
index 000000000000..e0b40f48b448
--- /dev/null
+++ b/arch/arm64/kernel/topology.c
@@ -0,0 +1,143 @@
+/*
+ * arch/arm64/kernel/topology.c
+ *
+ * Copyright (C) 2011,2013 Linaro Limited.
+ * Written by: Vincent Guittot
+ *
+ * based on arch/sh/kernel/topology.c
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/percpu.h>
+#include <linux/node.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include <asm/cputype.h>
+#include <asm/smp_plat.h>
+#include <asm/topology.h>
+
+/*
+ * cpu topology table
+ */
+struct cputopo_arm cpu_topology[NR_CPUS];
+EXPORT_SYMBOL_GPL(cpu_topology);
+
+const struct cpumask *cpu_coregroup_mask(int cpu)
+{
+	return &cpu_topology[cpu].core_sibling;
+}
+
+static void update_siblings_masks(unsigned int cpuid)
+{
+	struct cputopo_arm *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
+	int cpu;
+
+	/* update core and thread sibling masks */
+	for_each_possible_cpu(cpu) {
+		cpu_topo = &cpu_topology[cpu];
+
+		if (cpuid_topo->socket_id != cpu_topo->socket_id)
+			continue;
+
+		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
+		if (cpu != cpuid)
+			cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
+
+		if (cpuid_topo->core_id != cpu_topo->core_id)
+			continue;
+
+		cpumask_set_cpu(cpuid, &cpu_topo->thread_sibling);
+		if (cpu != cpuid)
+			cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);
+	}
+	smp_wmb();
+}
+
+/*
+ * store_cpu_topology is called at boot when only one cpu is running
+ * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
+ * which prevents simultaneous write access to cpu_topology array
+ */
+void store_cpu_topology(unsigned int cpuid)
+{
+	struct cputopo_arm *cpuid_topo = &cpu_topology[cpuid];
+	u64 mpidr;
+
+	/* If the cpu topology has been already set, just return */
+	if (cpuid_topo->core_id != -1)
+		return;
+
+	mpidr = cpu_logical_map(cpuid);
+
+	/*
+	 * Create cpu topology mapping, assume the cores are largely
+	 * independent since the DT bindings do not include the flags
+	 * for MT.
+	 */
+	cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cpuid_topo->socket_id = MPIDR_AFFINITY_LEVEL(mpidr, 1);
+
+	update_siblings_masks(cpuid);
+
+	pr_info("CPU%u: cpu %d, socket %d mapped using MPIDR %llx\n",
+		cpuid, cpu_topology[cpuid].core_id,
+		cpu_topology[cpuid].socket_id, mpidr);
+}
+
+/*
+ * cluster_to_logical_mask - return cpu logical mask of CPUs in a cluster
+ * @socket_id:		cluster HW identifier
+ * @cluster_mask:	the cpumask location to be initialized, modified by the
+ *			function only if return value == 0
+ *
+ * Return:
+ *
+ * 0 on success
+ * -EINVAL if cluster_mask is NULL or there is no record matching socket_id
+ */
+int cluster_to_logical_mask(unsigned int socket_id, cpumask_t *cluster_mask)
+{
+	int cpu;
+
+	if (!cluster_mask)
+		return -EINVAL;
+
+	for_each_online_cpu(cpu)
+		if (socket_id == topology_physical_package_id(cpu)) {
+			cpumask_copy(cluster_mask, topology_core_cpumask(cpu));
+			return 0;
+		}
+
+	return -EINVAL;
+}
+
+/*
+ * init_cpu_topology is called at boot when only one cpu is running
+ * which prevent simultaneous write access to cpu_topology array
+ */
+void __init init_cpu_topology(void)
+{
+	unsigned int cpu;
+
+	/* init core mask and power*/
+	for_each_possible_cpu(cpu) {
+		struct cputopo_arm *cpu_topo = &(cpu_topology[cpu]);
+
+		cpu_topo->thread_id = -1;
+		cpu_topo->core_id = -1;
+		cpu_topo->socket_id = -1;
+		cpumask_clear(&cpu_topo->core_sibling);
+		cpumask_clear(&cpu_topo->thread_sibling);
+	}
+	smp_wmb();
+}
1.8.5.1
On Wed, Dec 11, 2013 at 02:12:24PM +0000, Will Deacon wrote:
[Adding Lorenzo]
I seem to remember Lorenzo having patches already for this, along with bindings, Documentation etc. so it would be good to know how these two series are supposed to interact.
He said he hadn't had time to work on the actual implementation which is why I picked this up - as far as I'm aware we only have the (already merged) binding for this.
On Wed, Dec 11, 2013 at 02:15:03PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:12:24PM +0000, Will Deacon wrote:
I seem to remember Lorenzo having patches already for this, along with bindings, Documentation etc. so it would be good to know how these two series are supposed to interact.
He said he hadn't had time to work on the actual implementation which is why I picked this up - as far as I'm aware we only have the (already merged) binding for this.
Lorenzo (on holiday now) pushed the DT topology bindings but there are no patches yet for reading them. I would much prefer to only describe the topology via DT rather than MPIDR (because hardware people sometimes have strange ideas).
On Wed, Dec 11, 2013 at 02:24:32PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 02:15:03PM +0000, Mark Brown wrote:
He said he hadn't had time to work on the actual implementation which is why I picked this up - as far as I'm aware we only have the (already merged) binding for this.
Lorenzo (on holiday now) pushed the DT topology bindings but there are no patches yet for reading them. I would much prefer to only describe the topology via DT rather than MPIDR (because hardware people sometimes have strange ideas).
Right, that was my understanding too.
For clarity the current code only uses DT information, the MPIDR values it uses are those in the CPU nodes in the DT which are used by the SMP code and are not read from the hardware. We can drop that bit of code easily enough but it seemed useful for getting the scaffolding in place separately to the binding parsing and for general robustness, perhaps adding a patch at the end of the series to drop the code handling MPIDR would deal with the application issues?
On Wed, Dec 11, 2013 at 02:30:18PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:24:32PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 02:15:03PM +0000, Mark Brown wrote:
He said he hadn't had time to work on the actual implementation which is why I picked this up - as far as I'm aware we only have the (already merged) binding for this.
Lorenzo (on holiday now) pushed the DT topology bindings but there are no patches yet for reading them. I would much prefer to only describe the topology via DT rather than MPIDR (because hardware people sometimes have strange ideas).
Right, that was my understanding too.
For clarity the current code only uses DT information, the MPIDR values it uses are those in the CPU nodes in the DT which are used by the SMP code and are not read from the hardware.
But these must match the hardware MPIDR to be useful on SMP. So whether you read them from hardware or DT, they are still the same.
We can drop that bit of code easily enough but it seemed useful for getting the scaffolding in place separately to the binding parsing and for general robustness, perhaps adding a patch at the end of the series to drop the code handling MPIDR would deal with the application issues?
Can we have the topology infrastructure with a flat initial topology and the actual building in a separate patch?
On Wed, Dec 11, 2013 at 02:49:24PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 02:30:18PM +0000, Mark Brown wrote:
For clarity the current code only uses DT information, the MPIDR values it uses are those in the CPU nodes in the DT which are used by the SMP code and are not read from the hardware.
But these must match the hardware MPIDR to be useful on SMP. So whether you read them from hardware or DT, they are still the same.
OK. To be honest it does seem a bit unhelpful to just ignore the values; systems with bogus MPIDRs can always use the topology binding to set something sensible so we end up making work for people who did the right thing. That said...
We can drop that bit of code easily enough but it seemed useful for getting the scaffolding in place separately to the binding parsing and for general robustness, perhaps adding a patch at the end of the series to drop the code handling MPIDR would deal with the application issues?
Can we have the topology infrastructure with a flat initial topology and the actual building in a separate patch?
That would be the effect of just dropping that hunk completely, the default values we initialise if we don't do anything have that effect. I'll do that when I repost.
On 2013-12-11 22:24, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 02:15:03PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:12:24PM +0000, Will Deacon wrote:
I seem to remember Lorenzo having patches already for this, along with bindings, Documentation etc. so it would be good to know how these two series are supposed to interact.
He said he hadn't had time to work on the actual implementation which is why I picked this up - as far as I'm aware we only have the (already merged) binding for this.
Lorenzo (on holiday now) pushed the DT topology bindings but there are no patches yet for reading them. I would much prefer to only describe the topology via DT rather than MPIDR (because hardware people sometimes have strange ideas).
Topology code for ARM32 is based on MPIDR now; should we update it too to describe the topology via DT?
Thanks Hanjun
On Thu, Dec 12, 2013 at 02:59:52PM +0800, Hanjun Guo wrote:
On 2013-12-11 22:24, Catalin Marinas wrote:
Lorenzo (on holiday now) pushed the DT topology bindings but there are no patches yet for reading them. I would much prefer to only describe the topology via DT rather than MPIDR (because hardware people sometimes have strange ideas).
Topology code for ARM32 is based on MPIDR now; should we update it too to describe the topology via DT?
Yes, the binding is supposed to apply to both so it'd be good to support it on pre-v8. However using MPIDR should continue to be supported on pre-v8 as existing systems don't use the topology binding and so presumably have accurate MPIDRs, requiring the binding would be a regression for them.
On 2013-12-12 18:27, Mark Brown wrote:
On Thu, Dec 12, 2013 at 02:59:52PM +0800, Hanjun Guo wrote:
On 2013-12-11 22:24, Catalin Marinas wrote:
Lorenzo (on holiday now) pushed the DT topology bindings but there are no patches yet for reading them. I would much prefer to only describe the topology via DT rather than MPIDR (because hardware people sometimes have strange ideas).
Topology code for ARM32 is based on MPIDR now; should we update it too to describe the topology via DT?
Yes, the binding is supposed to apply to both so it'd be good to support it on pre-v8. However using MPIDR should continue to be supported on pre-v8 as existing systems don't use the topology binding and so presumably have accurate MPIDRs, requiring the binding would be a regression for them.
Ah, that makes sense :)
On Wed, Dec 11, 2013 at 01:13:24PM +0000, Mark Brown wrote:
From: Mark Brown broonie@linaro.org
Add basic CPU topology support to arm64, based on the existing pre-v8 code and some work done by Mark Hambleton. This patch does not implement the ARM CPU topology bindings; it implements support equivalent to the existing pre-v8 capability, using the mandatory MPIDR information in the CPU binding in the device tree and assuming that a simple SMP or multi-cluster topology is in use.
The primary goal is to separate the architecture hookup for providing topology information from the DT parsing in order to ease review and avoid blocking the architecture code (which will be built on by other work) on the DT code review, by providing something simple and basic. Having this support should also make the kernel cope better with incomplete DTs.
Further patches will provide support for overriding this using the topology bindings, providing richer support for a wider range of systems.
Signed-off-by: Mark Brown broonie@linaro.org
arch/arm64/Kconfig | 8 +++ arch/arm64/include/asm/cpu.h | 1 - arch/arm64/include/asm/cputype.h | 9 +++ arch/arm64/include/asm/smp_plat.h | 1 + arch/arm64/include/asm/topology.h | 42 +++++++++++ arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/setup.c | 9 +-- arch/arm64/kernel/smp.c | 19 ++++- arch/arm64/kernel/topology.c | 143 ++++++++++++++++++++++++++++++++++++++ 9 files changed, 227 insertions(+), 6 deletions(-) create mode 100644 arch/arm64/include/asm/topology.h create mode 100644 arch/arm64/kernel/topology.c
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 88c8b6c1341a..7b4dab852937 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -154,6 +154,14 @@ config SMP
If you don't know what to do here, say N.
+config ARM_CPU_TOPOLOGY
+	bool "Support CPU topology definition"
+	depends on SMP
+	default y
+	help
+	  Support CPU topology definition, based on configuration
+	  provided by the firmware.
config NR_CPUS int "Maximum number of CPUs (2-32)" range 2 32 diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h index d67ff011d361..8a26b690110c 100644 --- a/arch/arm64/include/asm/cpu.h +++ b/arch/arm64/include/asm/cpu.h @@ -10,7 +10,6 @@
#include <linux/percpu.h> #include <linux/cpu.h> -#include <linux/topology.h>
struct cpuinfo_arm { struct cpu cpu; diff --git a/arch/arm64/include/asm/cputype.h b/arch/arm64/include/asm/cputype.h index 5fe138e0b828..bd504739cbfd 100644 --- a/arch/arm64/include/asm/cputype.h +++ b/arch/arm64/include/asm/cputype.h @@ -29,6 +29,15 @@ #define INVALID_HWID ULONG_MAX
#define MPIDR_HWID_BITMASK 0xff00ffffff +#define MPIDR_SMP_BITMASK (0x3 << 30) +#define MPIDR_SMP_VALUE (0x2 << 30) +#define MPIDR_MT_BITMASK (0x1 << 24) +#define MPIDR_LEVEL_BITS 8 +#define MPIDR_LEVEL_MASK ((1 << MPIDR_LEVEL_BITS) - 1)
+#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
((mpidr >> (MPIDR_LEVEL_BITS * level)) & MPIDR_LEVEL_MASK)
This macro does not cover all affinity levels; I have a patch that implements this macro in my arm64 cpu_{suspend}/{resume} series.
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-November/213031.h...
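A form of the macro covering all four affinity levels (Aff3 lives in MPIDR_EL1[39:32], so the shift is not a simple multiple of MPIDR_LEVEL_BITS) can be sketched roughly as below; this is only an illustration, the exact macro is the one in the series linked above:

#define MPIDR_LEVEL_BITS_SHIFT	3
#define MPIDR_LEVEL_BITS	(1 << MPIDR_LEVEL_BITS_SHIFT)
#define MPIDR_LEVEL_MASK	((1 << MPIDR_LEVEL_BITS) - 1)

/* Aff0/1/2 sit at bits 0/8/16, Aff3 at bit 32 */
#define MPIDR_LEVEL_SHIFT(level) \
	(((1 << level) >> 1) << MPIDR_LEVEL_BITS_SHIFT)

#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
	((mpidr >> MPIDR_LEVEL_SHIFT(level)) & MPIDR_LEVEL_MASK)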
#define read_cpuid(reg) ({ \ u64 __val; \ diff --git a/arch/arm64/include/asm/smp_plat.h b/arch/arm64/include/asm/smp_plat.h index ed43a0d2b1b2..4ad4ecc93bcf 100644 --- a/arch/arm64/include/asm/smp_plat.h +++ b/arch/arm64/include/asm/smp_plat.h @@ -19,6 +19,7 @@ #ifndef __ASM_SMP_PLAT_H #define __ASM_SMP_PLAT_H
+#include <linux/cpumask.h> #include <asm/types.h>
/* diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h new file mode 100644 index 000000000000..611edefaeaf1 --- /dev/null +++ b/arch/arm64/include/asm/topology.h @@ -0,0 +1,42 @@ +#ifndef _ASM_ARM_TOPOLOGY_H +#define _ASM_ARM_TOPOLOGY_H
+#ifdef CONFIG_ARM_CPU_TOPOLOGY
+#include <linux/cpumask.h>
+struct cputopo_arm {
+	int thread_id;
+	int core_id;
+	int socket_id;
+	cpumask_t thread_sibling;
+	cpumask_t core_sibling;
+};
+extern struct cputopo_arm cpu_topology[NR_CPUS];
+#define topology_physical_package_id(cpu) (cpu_topology[cpu].socket_id) +#define topology_core_id(cpu) (cpu_topology[cpu].core_id) +#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling) +#define topology_thread_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
+#define mc_capable() (cpu_topology[0].socket_id != -1) +#define smt_capable() (cpu_topology[0].thread_id != -1)
+void init_cpu_topology(void); +void store_cpu_topology(unsigned int cpuid);
I do not think the function above should be exported. Topology can be built in one go from DT; the code needing this function was there because the MPIDR on arm32 was "probed", hence all CPUs (primaries and secondaries) had to call it IIRC.
+const struct cpumask *cpu_coregroup_mask(int cpu); +int cluster_to_logical_mask(unsigned int socket_id, cpumask_t *cluster_mask);
I think the function signature above needs changing when we move to DT topology bindings, put it differently the look up won't be based on a socket id anymore, I need some time to think about it.
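For context, a caller of cluster_to_logical_mask() as posted would look roughly like the sketch below; the caller itself is hypothetical, invented only to illustrate the lookup by socket_id:

static void example_report_cluster(unsigned int socket_id)
{
	cpumask_t mask;
	int cpu;

	/* returns -EINVAL if no online CPU reports this socket_id */
	if (cluster_to_logical_mask(socket_id, &mask))
		return;

	for_each_cpu(cpu, &mask)
		pr_info("cluster %u contains CPU%d\n", socket_id, cpu);
}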
[...]
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index a5aeefab03c3..f29c7ffad84a 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -35,7 +35,6 @@ #include <linux/clockchips.h> #include <linux/completion.h> #include <linux/of.h>
#include <asm/atomic.h> #include <asm/cacheflush.h> #include <asm/cputype.h> @@ -48,6 +47,7 @@ #include <asm/sections.h> #include <asm/tlbflush.h> #include <asm/ptrace.h> +#include <asm/cpu.h>
 /*
  * as from 2.5, kernels no longer have an init_tasks structure
@@ -113,6 +113,16 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle) return ret; }
+static void __cpuinit smp_store_cpu_info(unsigned int cpuid)
+{
+	struct cpuinfo_arm *cpu_info = &per_cpu(cpu_data, cpuid);
+
+	cpu_info->loops_per_jiffy = loops_per_jiffy;
+	cpu_info->cpuid = read_cpuid_id();
+
+	store_cpu_topology(cpuid);
All bits of info are in the DT, topology can be built by primary CPU, no need to call store_cpu_topology() on each CPU, that was only needed because on arm32 the topology code relies on each CPU reading its own MPIDR.
 /*
  * This is the secondary CPU boot entry. We're using this CPUs
  * idle thread stack, but a set of temporary page tables.
@@ -150,6 +160,8 @@ asmlinkage void secondary_start_kernel(void)
 	 */
 	notify_cpu_starting(cpu);
+	smp_store_cpu_info(cpu);
+
 	/*
 	 * OK, now it's safe to let the boot CPU continue. Wait for
 	 * the CPU migration code to notice that the CPU is online
@@ -387,6 +399,11 @@ void __init smp_prepare_cpus(unsigned int max_cpus)
 	int err;
 	unsigned int cpu, ncores = num_possible_cpus();
 
+	init_cpu_topology();
+
+	smp_store_cpu_info(smp_processor_id());
+
 	/*
 	 * are we trying to boot more cores than exist?
 	 */
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
new file mode 100644
index 000000000000..e0b40f48b448
--- /dev/null
+++ b/arch/arm64/kernel/topology.c
@@ -0,0 +1,143 @@
+/*
+ * arch/arm64/kernel/topology.c
+ *
+ * Copyright (C) 2011,2013 Linaro Limited.
+ * Written by: Vincent Guittot
+ *
+ * based on arch/sh/kernel/topology.c
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License.  See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/percpu.h>
+#include <linux/node.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+#include <asm/cputype.h>
+#include <asm/smp_plat.h>
+#include <asm/topology.h>
+
+/*
+ * cpu topology table
+ */
+struct cputopo_arm cpu_topology[NR_CPUS];
+EXPORT_SYMBOL_GPL(cpu_topology);
+
+const struct cpumask *cpu_coregroup_mask(int cpu)
+{
+	return &cpu_topology[cpu].core_sibling;
+}
+
+static void update_siblings_masks(unsigned int cpuid)
+{
+	struct cputopo_arm *cpu_topo, *cpuid_topo = &cpu_topology[cpuid];
+	int cpu;
+
+	/* update core and thread sibling masks */
+	for_each_possible_cpu(cpu) {
+		cpu_topo = &cpu_topology[cpu];
+
+		if (cpuid_topo->socket_id != cpu_topo->socket_id)
+			continue;
+
+		cpumask_set_cpu(cpuid, &cpu_topo->core_sibling);
+		if (cpu != cpuid)
+			cpumask_set_cpu(cpu, &cpuid_topo->core_sibling);
+
+		if (cpuid_topo->core_id != cpu_topo->core_id)
+			continue;
+
+		cpumask_set_cpu(cpuid, &cpu_topo->thread_sibling);
+		if (cpu != cpuid)
+			cpumask_set_cpu(cpu, &cpuid_topo->thread_sibling);
+	}
+	smp_wmb();
+}
+
+/*
+ * store_cpu_topology is called at boot when only one cpu is running
+ * and with the mutex cpu_hotplug.lock locked, when several cpus have booted,
+ * which prevents simultaneous write access to cpu_topology array
+ */
+void store_cpu_topology(unsigned int cpuid)
+{
+	struct cputopo_arm *cpuid_topo = &cpu_topology[cpuid];
+	u64 mpidr;
+
+	/* If the cpu topology has been already set, just return */
+	if (cpuid_topo->core_id != -1)
+		return;
+
+	mpidr = cpu_logical_map(cpuid);
+
+	/*
+	 * Create cpu topology mapping, assume the cores are largely
+	 * independent since the DT bindings do not include the flags
+	 * for MT.
+	 */
+	cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);
+	cpuid_topo->socket_id = MPIDR_AFFINITY_LEVEL(mpidr, 1);
This is what we are trying to prevent. The way levels are mapped to core and cluster id is just a recommendation. Better to follow DT bindings, they are stricter and provide us with all the required bits of information.
+	update_siblings_masks(cpuid);
+
+	pr_info("CPU%u: cpu %d, socket %d mapped using MPIDR %llx\n",
+		cpuid, cpu_topology[cpuid].core_id,
+		cpu_topology[cpuid].socket_id, mpidr);
+}
+
+/*
+ * cluster_to_logical_mask - return cpu logical mask of CPUs in a cluster
+ * @socket_id:		cluster HW identifier
+ * @cluster_mask:	the cpumask location to be initialized, modified by the
+ *			function only if return value == 0
+ *
+ * Return:
+ *
+ * 0 on success
+ * -EINVAL if cluster_mask is NULL or there is no record matching socket_id
+ */
+int cluster_to_logical_mask(unsigned int socket_id, cpumask_t *cluster_mask)
+{
+	int cpu;
+
+	if (!cluster_mask)
+		return -EINVAL;
+
+	for_each_online_cpu(cpu)
+		if (socket_id == topology_physical_package_id(cpu)) {
+			cpumask_copy(cluster_mask, topology_core_cpumask(cpu));
+			return 0;
+		}
+
+	return -EINVAL;
+}
As mentioned, I think this function will have to change. Masks can be built using phandles to topology nodes. I know this is how cluster masks are currently built in arm32 kernels, but this does not mean that's the correct approach, given the laxity of the MPIDR specification.
+/*
+ * init_cpu_topology is called at boot when only one cpu is running
+ * which prevent simultaneous write access to cpu_topology array
+ */
+void __init init_cpu_topology(void)
+{
+	unsigned int cpu;
+
+	/* init core mask and power*/
+	for_each_possible_cpu(cpu) {
+		struct cputopo_arm *cpu_topo = &(cpu_topology[cpu]);
+
+		cpu_topo->thread_id = -1;
+		cpu_topo->core_id = -1;
+		cpu_topo->socket_id = -1;
+		cpumask_clear(&cpu_topo->core_sibling);
+		cpumask_clear(&cpu_topo->thread_sibling);
+	}
This is probably the place where the topology should be parsed and built in one go, from DT, I did that and then needed to rewrite the code since topology bindings changed before getting merged.
Thanks, Lorenzo
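Parsing the cpu-map in one go at init time, as suggested above, could look roughly like the sketch below. It ignores thread nodes and most error handling, and the helper names (parse_dt_cpu_map, cpu_node_to_logical) are invented for illustration rather than taken from any posted series:

#include <linux/of.h>

static int __init cpu_node_to_logical(struct device_node *cpu_node)
{
	int cpu;

	for_each_possible_cpu(cpu) {
		struct device_node *n = of_get_cpu_node(cpu, NULL);

		of_node_put(n);
		if (n == cpu_node)
			return cpu;
	}
	return -1;
}

static void __init parse_dt_cpu_map(void)
{
	struct device_node *map, *cluster, *core, *cpu_node;
	char name[16];
	int cluster_id, core_id, cpu;

	map = of_find_node_by_path("/cpus/cpu-map");
	if (!map)
		return;	/* no topology description, keep the flat defaults */

	for (cluster_id = 0; ; cluster_id++) {
		snprintf(name, sizeof(name), "cluster%d", cluster_id);
		cluster = of_get_child_by_name(map, name);
		if (!cluster)
			break;

		for (core_id = 0; ; core_id++) {
			snprintf(name, sizeof(name), "core%d", core_id);
			core = of_get_child_by_name(cluster, name);
			if (!core)
				break;

			cpu_node = of_parse_phandle(core, "cpu", 0);
			if (cpu_node) {
				cpu = cpu_node_to_logical(cpu_node);
				if (cpu >= 0) {
					cpu_topology[cpu].socket_id = cluster_id;
					cpu_topology[cpu].core_id = core_id;
					update_siblings_masks(cpu);
				}
				of_node_put(cpu_node);
			}
			of_node_put(core);
		}
		of_node_put(cluster);
	}
	of_node_put(map);
}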
On Mon, Dec 16, 2013 at 10:57:34AM +0000, Lorenzo Pieralisi wrote:
+#define MPIDR_AFFINITY_LEVEL(mpidr, level) \
((mpidr >> (MPIDR_LEVEL_BITS * level)) & MPIDR_LEVEL_MASK)
This macro does not cover all affinity levels; I have a patch that implements this macro in my arm64 cpu_{suspend}/{resume} series.
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-November/213031.h...
That hunk has been dropped from the latest postings of the series anyway, please see the more current threads (and I'm going to do another spin today too). It'd be good to get things like your helpers integrated so we don't have duplication of effort or people running around trying to keep track of out of tree patches.
On Mon, Dec 16, 2013 at 10:57:34AM +0000, Lorenzo Pieralisi wrote:
On Wed, Dec 11, 2013 at 01:13:24PM +0000, Mark Brown wrote:
Replying again since I didn't notice half the content here - please delete unneeded context from your mails, it makes it much easier to find the content you've added.
+const struct cpumask *cpu_coregroup_mask(int cpu); +int cluster_to_logical_mask(unsigned int socket_id, cpumask_t *cluster_mask);
I think the function signature above needs changing when we move to DT topology bindings, put it differently the look up won't be based on a socket id anymore, I need some time to think about it.
Well, we need to consider the possibility of ACPI or whatever as well.
cpu_info->loops_per_jiffy = loops_per_jiffy;
cpu_info->cpuid = read_cpuid_id();
store_cpu_topology(cpuid);
All bits of info are in the DT, topology can be built by primary CPU, no need to call store_cpu_topology() on each CPU, that was only needed because on arm32 the topology code relies on each CPU reading its own MPIDR.
This is also gone in the current versions of the series.
/*
* Create cpu topology mapping, assume the cores are largely
* independent since the DT bindings do not include the flags
* for MT.
*/
cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);
cpuid_topo->socket_id = MPIDR_AFFINITY_LEVEL(mpidr, 1);
This is what we are trying to prevent. The way levels are mapped to core and cluster id is just a recommendation. Better to follow DT bindings, they are stricter and provide us with all the required bits of information.
Again, this is gone from the current version. Like I said to Catalin it does feel like this is making more work for systems that have done the right thing with their MPIDRs which doesn't seem ideal (and note that all the DTs that you guys are publishing for your models lack any topology information at present).
	for_each_online_cpu(cpu)
		if (socket_id == topology_physical_package_id(cpu)) {
			cpumask_copy(cluster_mask, topology_core_cpumask(cpu));
			return 0;
		}

	return -EINVAL;
+}
As mentioned, I think this function will have to change. Masks can be built using phandles to topology nodes. I know this is how cluster masks are currently built in arm32 kernels, but this does not mean that's the correct approach, given the laxity of the MPIDR specification.
So, this can actually be removed completely since there aren't any references to this function any more; that said, the socket_id assignment is still there...
This isn't being done using MPIDR, it's being done based on using the lowest level of cluster however we found it. What we're doing is that while parsing the topology information source we use it to pick a physical package identifier and then later on we use that identifier as a handle. The socket ID isn't really taken from a field in the MPIDR, it's the result of doing the mapping you mention.
I definitely don't think we should be using phandles directly unless we want to have a separate implementation for ACPI (or whatever else might come up) which would mean less encapsulation of the topology code. Having the parsing code assign socket IDs doesn't seem like a particularly bad way of doing things, we need IDs for the generic topology API functions to use anyway.
	for_each_possible_cpu(cpu) {
		struct cputopo_arm *cpu_topo = &(cpu_topology[cpu]);

		cpu_topo->thread_id = -1;
		cpu_topo->core_id = -1;
		cpu_topo->socket_id = -1;
		cpumask_clear(&cpu_topo->core_sibling);
		cpumask_clear(&cpu_topo->thread_sibling);
	}
This is probably the place where the topology should be parsed and built in one go, from DT, I did that and then needed to rewrite the code since topology bindings changed before getting merged.
Right, and that's where it happens for the DT parsing code.
On Mon, Dec 16, 2013 at 12:29:48PM +0000, Mark Brown wrote:
On Mon, Dec 16, 2013 at 10:57:34AM +0000, Lorenzo Pieralisi wrote:
On Wed, Dec 11, 2013 at 01:13:24PM +0000, Mark Brown wrote:
Replying again since I didn't notice half the content here - please delete unneeded context from your mails, it makes it much easier to find the content you've added.
+const struct cpumask *cpu_coregroup_mask(int cpu); +int cluster_to_logical_mask(unsigned int socket_id, cpumask_t *cluster_mask);
I think the function signature above needs changing when we move to DT topology bindings, put it differently the look up won't be based on a socket id anymore, I need some time to think about it.
Well, we need to consider the possibility of ACPI or whatever as well.
That's a fair point, I will have a look at v2.
[...]
/*
* Create cpu topology mapping, assume the cores are largely
* independent since the DT bindings do not include the flags
* for MT.
*/
cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 0);
cpuid_topo->socket_id = MPIDR_AFFINITY_LEVEL(mpidr, 1);
This is what we are trying to prevent. The way levels are mapped to core and cluster id is just a recommendation. Better to follow DT bindings, they are stricter and provide us with all the required bits of information.
Again, this is gone from the current version. Like I said to Catalin it does feel like this is making more work for systems that have done the right thing with their MPIDRs which doesn't seem ideal (and note that all the DTs that you guys are publishing for your models lack any topology information at present).
This is an age-old question and the problem has always been that the "right thing" is recommended, not enforced. I do not want to turn this into bikeshedding, as long as cpu-map node takes priority if present, fine by me.
	for_each_online_cpu(cpu)
		if (socket_id == topology_physical_package_id(cpu)) {
			cpumask_copy(cluster_mask, topology_core_cpumask(cpu));
			return 0;
		}

	return -EINVAL;
+}
As mentioned, I think this function will have to change. Masks can be built using phandles to topology nodes. I know this is how cluster masks are currently built in arm32 kernels, but this does not mean that's the correct approach, given the laxity of the MPIDR specification.
So, this can actually be removed completely since there aren't any references to this function any more; that said, the socket_id assignment is still there...
This isn't being done using MPIDR, it's being done based on using the lowest level of cluster however we found it. What we're doing is that while parsing the topology information source we use it to pick a physical package identifier and then later on we use that identifier as a handle. The socket ID isn't really taken from a field in the MPIDR, it's the result of doing the mapping you mention.
I definitely don't think we should be using phandles directly unless we want to have a separate implementation for ACPI (or whatever else might come up) which would mean less encapsulation of the topology code. Having the parsing code assign socket IDs doesn't seem like a particularly bad way of doing things, we need IDs for the generic topology API functions to use anyway.
It makes sense, again I will have a look at v2 and comment on that.
Thanks, Lorenzo
On Mon, Dec 16, 2013 at 02:46:38PM +0000, Lorenzo Pieralisi wrote:
On Mon, Dec 16, 2013 at 12:29:48PM +0000, Mark Brown wrote:
Well, we need to consider the possibility of ACPI or whatever as well.
That's a fair point, I will have a look at v2.
Probably best to wait until the v4 or whatever that I'm going to post shortly (need to do a few more checks locally before I post). I'll CC you.
[MPIDR parsing]
Again, this is gone from the current version. Like I said to Catalin it does feel like this is making more work for systems that have done the right thing with their MPIDRs which doesn't seem ideal (and note that all the DTs that you guys are publishing for your models lack any topology information at present).
This is an age-old question and the problem has always been that the "right thing" is recommended, not enforced. I do not want to turn this into bikeshedding, as long as cpu-map node takes priority if present, fine by me.
I already dropped that code, though I could resurrect it (perhaps as a separate patch). The way the code was written was as you describe, as a last resort - MPIDR would only be considered if the explicit topology binding was not present; it was done as a last step before reporting if no other topology information was discovered.
Actually now I think about it if we're not going to parse the MPIDR we should probably update the bindings to say that the topology binding is mandatory for any v8 system with more than one core.
On Mon, Dec 16, 2013 at 03:12:32PM +0000, Mark Brown wrote:
On Mon, Dec 16, 2013 at 02:46:38PM +0000, Lorenzo Pieralisi wrote:
On Mon, Dec 16, 2013 at 12:29:48PM +0000, Mark Brown wrote:
[MPIDR parsing]
Again, this is gone from the current version. Like I said to Catalin it does feel like this is making more work for systems that have done the right thing with their MPIDRs which doesn't seem ideal (and note that all the DTs that you guys are publishing for your models lack any topology information at present).
This is an age-old question and the problem has always been that the "right thing" is recommended, not enforced. I do not want to turn this into bikeshedding, as long as cpu-map node takes priority if present, fine by me.
I already dropped that code, though I could resurrect it (perhaps as a separate patch). The way the code was written was as you describe, as a last resort - MPIDR would only be considered if the explicit topology binding was not present; it was done as a last step before reporting if no other topology information was discovered.
Actually now I think about it if we're not going to parse the MPIDR we should probably update the bindings to say that the topology binding is mandatory for any v8 system with more than one core.
Do we need such information if only a flat topology is needed?
On Tue, Dec 17, 2013 at 11:47:34AM +0000, Catalin Marinas wrote:
On Mon, Dec 16, 2013 at 03:12:32PM +0000, Mark Brown wrote:
Actually now I think about it if we're not going to parse the MPIDR we should probably update the bindings to say that the topology binding is mandatory for any v8 system with more than one core.
Do we need such information if only a flat topology is needed?
Probably not, but it's simpler to specify that than to say it's mandatory only if you need it, and it might be useful to know if everything is a flat collection of threads or a flat collection of cores. It seems better to get the information in there if we can, in case the DT is fixed. Single core systems are the only ones where it's definitely never going to be helpful.
On 12/11/2013 09:13 PM, Mark Brown wrote:
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h new file mode 100644 index 000000000000..611edefaeaf1 --- /dev/null +++ b/arch/arm64/include/asm/topology.h @@ -0,0 +1,42 @@ +#ifndef _ASM_ARM_TOPOLOGY_H +#define _ASM_ARM_TOPOLOGY_H
+#ifdef CONFIG_ARM_CPU_TOPOLOGY
+#include <linux/cpumask.h>
+struct cputopo_arm {
+	int thread_id;
+	int core_id;
+	int socket_id;
+	cpumask_t thread_sibling;
+	cpumask_t core_sibling;
+};
Forgive me if I am stupid. :)
Why don't we need a cluster_id? And can one CPU socket include several clusters?
On Mon, Dec 16, 2013 at 10:45:18PM +0800, Alex Shi wrote:
On 12/11/2013 09:13 PM, Mark Brown wrote:
+struct cputopo_arm {
+	int thread_id;
+	int core_id;
+	int socket_id;
Forgive me if I am stupid. :)
Why don't we need a cluster_id? And can one CPU socket include several clusters?
The ARMv7 code calls a cluster a socket I think because it's trying to maintain similarity with other architectures and the ARMv8 code follows ARMv7 in this regard. At the minute we're using this ID for whatever the lowest level of grouping is above a core and presenting the scheduler with a flat topology there - the topology binding and MPIDR both allow multiple layers of cluster so you could definitely have multiple clusters in a supercluster.
Without practical examples of such systems or more architecture documentation than I've been able to find it's not clear how to represent them to the scheduler, it will depend on how closely associated the clusters are and what the scheduler's features are, perhaps we should describe such a system as NUMA but it's not clear to me that this would produce the desired results. I wonder if we may end up figuring this out from other data such as descriptions of caches and interconnects rather than purely from the topology information.
On 12/16/2013 11:22 PM, Mark Brown wrote:
On Mon, Dec 16, 2013 at 10:45:18PM +0800, Alex Shi wrote:
On 12/11/2013 09:13 PM, Mark Brown wrote:
+struct cputopo_arm {
+	int thread_id;
+	int core_id;
+	int socket_id;
Forgive me if I am stupid. :)
Why don't we need a cluster_id? And can one CPU socket include several clusters?
The ARMv7 code calls a cluster a socket I think because it's trying to maintain similarity with other architectures and the ARMv8 code follows ARMv7 in this regard. At the minute we're using this ID for whatever the lowest level of grouping is above a core and presenting the scheduler with a flat topology there - the topology binding and MPIDR both allow multiple layers of cluster so you could definitely have multiple clusters in a supercluster.
Without practical examples of such systems or more architecture documentation than I've been able to find it's not clear how to represent them to the scheduler, it will depend on how closely associated the clusters are and what the scheduler's features are, perhaps we should describe such a system as NUMA but it's not clear to me that this would produce the desired results. I wonder if we may end up figuring this out from other data such as descriptions of caches and interconnects rather than purely from the topology information.
For topology purposes it may be better to have a cluster concept there. I don't know if there will be a real ARM NUMA system for 64-bit servers; if so, socket_id is a good fit for a NUMA system.
On Tue, Dec 17, 2013 at 03:19:39PM +0800, Alex Shi wrote:
For topology purposes it may be better to have a cluster concept there. I don't know if there will be a real ARM NUMA system for 64-bit servers; if so, socket_id is a good fit for a NUMA system.
The scheduler does have a separate NUMA node ID that we're not currently using, which I'd have expected to be a better fit for actual NUMA stuff; looking at the options we've got, it's not clear to me that sockets aren't a good mapping for what the hardware has, given what the scheduler offers at the minute. There doesn't seem to be anything between threaded cores and physical packages.
From: Mark Brown broonie@linaro.org
In heterogeneous systems like big.LITTLE the scheduler will be able to make better use of the available cores if we provide power numbers to it. Do this by parsing the CPU nodes in the DT.
The power numbers are the same as for ARMv7 since it seems that the expected differential between the big and little cores is very similar on both ARMv7 and ARMv8.
Signed-off-by: Mark Brown broonie@linaro.org --- arch/arm64/kernel/topology.c | 164 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 164 insertions(+)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index e0b40f48b448..f08bb2306cd4 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -18,6 +18,7 @@ #include <linux/percpu.h> #include <linux/node.h> #include <linux/nodemask.h> +#include <linux/of.h> #include <linux/sched.h> #include <linux/slab.h>
@@ -26,6 +27,163 @@ #include <asm/topology.h>
 /*
+ * cpu power scale management
+ */
+
+/*
+ * cpu power table
+ * This per cpu data structure describes the relative capacity of each core.
+ * On a heteregenous system, cores don't have the same computation capacity
+ * and we reflect that difference in the cpu_power field so the scheduler can
+ * take this difference into account during load balance. A per cpu structure
+ * is preferred because each CPU updates its own cpu_power field during the
+ * load balance except for idle cores. One idle core is selected to run the
+ * rebalance_domains for all idle cores and the cpu_power can be updated
+ * during this sequence.
+ */
+static DEFINE_PER_CPU(unsigned long, cpu_scale);
+
+unsigned long arch_scale_freq_power(struct sched_domain *sd, int cpu)
+{
+	return per_cpu(cpu_scale, cpu);
+}
+
+static void set_power_scale(unsigned int cpu, unsigned long power)
+{
+	per_cpu(cpu_scale, cpu) = power;
+}
+
+#ifdef CONFIG_OF
+struct cpu_efficiency {
+	const char *compatible;
+	unsigned long efficiency;
+};
+
+/*
+ * Table of relative efficiency of each processors
+ * The efficiency value must fit in 20bit and the final
+ * cpu_scale value must be in the range
+ *   0 < cpu_scale < 3*SCHED_POWER_SCALE/2
+ * in order to return at most 1 when DIV_ROUND_CLOSEST
+ * is used to compute the capacity of a CPU.
+ * Processors that are not defined in the table,
+ * use the default SCHED_POWER_SCALE value for cpu_scale.
+ */
+static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
+	{ NULL, },
+};
+
+static unsigned long *__cpu_capacity;
+#define cpu_capacity(cpu)	__cpu_capacity[cpu]
+
+static unsigned long middle_capacity = 1;
+
+/*
+ * Iterate all CPUs' descriptor in DT and compute the efficiency
+ * (as per table_efficiency). Also calculate a middle efficiency
+ * as close as possible to  (max{eff_i} - min{eff_i}) / 2
+ * This is later used to scale the cpu_power field such that an
+ * 'average' CPU is of middle power. Also see the comments near
+ * table_efficiency[] and update_cpu_power().
+ */ +static void __init parse_dt_topology(void) +{ + const struct cpu_efficiency *cpu_eff; + struct device_node *cn = NULL; + unsigned long min_capacity = (unsigned long)(-1); + unsigned long max_capacity = 0; + unsigned long capacity = 0; + int alloc_size, cpu; + + alloc_size = nr_cpu_ids * sizeof(*__cpu_capacity); + __cpu_capacity = kzalloc(alloc_size, GFP_NOWAIT); + + for_each_possible_cpu(cpu) { + const u32 *rate; + int len; + + /* Too early to use cpu->of_node */ + cn = of_get_cpu_node(cpu, NULL); + if (!cn) { + pr_err("Missing device node for CPU %d\n", cpu); + continue; + } + + /* check if the cpu is marked as "disabled", if so ignore */ + if (!of_device_is_available(cn)) + continue; + + for (cpu_eff = table_efficiency; cpu_eff->compatible; cpu_eff++) + if (of_device_is_compatible(cn, cpu_eff->compatible)) + break; + + if (cpu_eff->compatible == NULL) { + pr_warn("%s: Unknown CPU type\n", cn->full_name); + continue; + } + + rate = of_get_property(cn, "clock-frequency", &len); + if (!rate || len != 4) { + pr_err("%s: Missing clock-frequency property\n", + cn->full_name); + continue; + } + + capacity = ((be32_to_cpup(rate)) >> 20) * cpu_eff->efficiency; + + /* Save min capacity of the system */ + if (capacity < min_capacity) + min_capacity = capacity; + + /* Save max capacity of the system */ + if (capacity > max_capacity) + max_capacity = capacity; + + cpu_capacity(cpu) = capacity; + } + + /* If min and max capacities are equal we bypass the update of the + * cpu_scale because all CPUs have the same capacity. Otherwise, we + * compute a middle_capacity factor that will ensure that the capacity + * of an 'average' CPU of the system will be as close as possible to + * SCHED_POWER_SCALE, which is the default value, but with the + * constraint explained near table_efficiency[]. + */ + if (min_capacity == max_capacity) + return; + else if (4 * max_capacity < (3 * (max_capacity + min_capacity))) + middle_capacity = (min_capacity + max_capacity) + >> (SCHED_POWER_SHIFT+1); + else + middle_capacity = ((max_capacity / 3) + >> (SCHED_POWER_SHIFT-1)) + 1; + +} + +/* + * Look for a customed capacity of a CPU in the cpu_topo_data table during the + * boot. The update of all CPUs is in O(n^2) for heteregeneous system but the + * function returns directly for SMP system. + */ +static void update_cpu_power(unsigned int cpu, unsigned long hwid) +{ + if (!cpu_capacity(cpu)) + return; + + set_power_scale(cpu, cpu_capacity(cpu) / middle_capacity); + + pr_info("CPU%u: update cpu_power %lu\n", + cpu, arch_scale_freq_power(NULL, cpu)); +} + +#else +static inline void parse_dt_topology(void) {} +static inline void update_cpu_power(unsigned int cpuid, unsigned int mpidr) {} +#endif + +/* * cpu topology table */ struct cputopo_arm cpu_topology[NR_CPUS]; @@ -88,6 +246,8 @@ void store_cpu_topology(unsigned int cpuid)
 	update_siblings_masks(cpuid);
 
+	update_cpu_power(cpuid, mpidr & MPIDR_HWID_BITMASK);
+
 	pr_info("CPU%u: cpu %d, socket %d mapped using MPIDR %llx\n",
 		cpuid, cpu_topology[cpuid].core_id,
 		cpu_topology[cpuid].socket_id, mpidr);
@@ -138,6 +298,10 @@ void __init init_cpu_topology(void)
 		cpu_topo->socket_id = -1;
 		cpumask_clear(&cpu_topo->core_sibling);
 		cpumask_clear(&cpu_topo->thread_sibling);
+
+		set_power_scale(cpu, SCHED_POWER_SCALE);
 	}
 	smp_wmb();
+
+	parse_dt_topology();
 }
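To make the arithmetic concrete, here is a worked example with invented clock rates (the frequencies below are purely illustrative, not taken from any real platform), using SCHED_POWER_SHIFT = 10:

	A57 at 1.8GHz: capacity = (1800000000 >> 20) * 3891 = 1716 * 3891 = 6676956
	A53 at 1.3GHz: capacity = (1300000000 >> 20) * 2048 = 1239 * 2048 = 2537472

	4 * max = 26707824 < 3 * (max + min) = 27643284, so:
	middle_capacity = (min + max) >> (SCHED_POWER_SHIFT + 1)
	                = 9214428 >> 11 = 4499

	cpu_scale(A57) = 6676956 / 4499 = 1484
	cpu_scale(A53) = 2537472 / 4499 = 564

The average of the two scales, (1484 + 564) / 2 = 1024, lands on SCHED_POWER_SCALE, which is the property the comment near table_efficiency[] asks for.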
On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
The power numbers are the same as for ARMv7 since it seems that the expected differential between the big and little cores is very similar on both ARMv7 and ARMv8.
I have no idea ;). We don't have real silicon yet, so that's just a wild guess.
+/*
+ * Table of relative efficiency of each processors
+ * The efficiency value must fit in 20bit and the final
+ * cpu_scale value must be in the range
+ *   0 < cpu_scale < 3*SCHED_POWER_SCALE/2
+ * in order to return at most 1 when DIV_ROUND_CLOSEST
+ * is used to compute the capacity of a CPU.
+ * Processors that are not defined in the table,
+ * use the default SCHED_POWER_SCALE value for cpu_scale.
+ */
+static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
+	{ NULL, },
+};
I also don't think we can just have absolute numbers here. I'm pretty sure these were generated on TC2 but other platforms may have different max CPU frequencies, memory subsystem, level and size of caches. The "average" efficiency and difference will be different.
Can we define this via DT? It's a bit strange since that's a constant used by the Linux scheduler but highly related to hardware.
On Wed, Dec 11, 2013 at 02:47:55PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
The power numbers are the same as for ARMv7 since it seems that the expected differential between the big and little cores is very similar on both ARMv7 and ARMv8.
I have no idea ;). We don't have real silicon yet, so that's just a wild guess.
I was going on some typical DMIPS/MHz numbers that I'd found so hopefully it's not a complete guess, though it will vary and that's just one benchmark with all the realism problems that entails. The ratio seemed to be about the same as the equivalent for the ARMv7 cores so given that it's a finger in the air thing it didn't seem worth drilling down much further.
+static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
+	{ NULL, },
+};
I also don't think we can just have absolute numbers here. I'm pretty sure these were generated on TC2 but other platforms may have different max CPU frequencies, memory subsystem, level and size of caches. The "average" efficiency and difference will be different.
The CPU frequencies at least are taken care of already, these numbers get scaled for each core. Once we're talking about things like the memory I'd also start worrying about application specific effects. There's also going to be stuff like thermal management which get fed in here and which varies during runtime.
I don't know where the numbers came from for v7.
Can we define this via DT? It's a bit strange since that's a constant used by the Linux scheduler but highly related to hardware.
I really don't think that's a good idea at this point, it seems better for the DT to stick to factual descriptions of what's present rather than putting tuning numbers in there. If the wild guesses are in the kernel source it's fairly easy to improve them, if they're baked into system DTs that becomes harder.
I think it's important not to overthink what we're doing here - the information we're trying to convey is that the A57s are a lot faster than the A53s. Getting the numbers "right" is good and helpful but it's not so critical that we should let perfect be the enemy of good. This should at least give ARMv8 implementations about equivalent performance to ARMv7 with this stuff.
I'm also worried about putting numbers into the DT now with all the scheduler work going on, this time next year we may well have a completely different idea of what we want to tell the scheduler. It may be that we end up being able to explicitly tell the scheduler about things like the memory architecture, or that the scheduler just gets smarter and can estimate all this stuff at runtime.
Customisation seems better provided at runtime than in the DT, that's more friendly to application specific tuning and it means that we're less committed to what's in the DT so we can improve things as our understanding increases. If it was punting to platform data and we could just update it if we decided it wasn't ideal it'd be less of an issue but punting to something that ought to be an ABI isn't awesome.
Once we've got more experience with the silicon and the scheduler work has progressed we might decide it's helpful to put tuning controls into DT but starting from that point feels like it's more likely to cause problems than help. With where we are now something simple and in the ballpark is going to get us a long way.
On Wed, 11 Dec 2013, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:47:55PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
The power numbers are the same as for ARMv7 since it seems that the expected differential between the big and little cores is very similar on both ARMv7 and ARMv8.
I have no idea ;). We don't have real silicon yet, so that's just a wild guess.
I was going on some typical DMIPS/MHz numbers that I'd found so hopefully it's not a complete guess, though it will vary and that's just one benchmark with all the realism problems that entails. The ratio seemed to be about the same as the equivalent for the ARMv7 cores so given that it's a finger in the air thing it didn't seem worth drilling down much further.
+static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
+	{ NULL, },
+};
I also don't think we can just have absolute numbers here. I'm pretty sure these were generated on TC2 but other platforms may have different max CPU frequencies, memory subsystem, level and size of caches. The "average" efficiency and difference will be different.
The CPU frequencies at least are taken care of already, these numbers get scaled for each core. Once we're talking about things like the memory I'd also start worrying about application specific effects. There's also going to be stuff like thermal management which get fed in here and which varies during runtime.
I don't know where the numbers came from for v7.
Can we define this via DT? It's a bit strange since that's a constant used by the Linux scheduler but highly related to hardware.
I really don't think that's a good idea at this point, it seems better for the DT to stick to factual descriptions of what's present rather than putting tuning numbers in there. If the wild guesses are in the kernel source it's fairly easy to improve them, if they're baked into system DTs that becomes harder.
I really think putting such things into DT is wrong.
If those numbers were derived from benchmark results, then it is most probably best to try to come up with some kind of equivalent benchmark in the kernel to qualify CPUs at run time. After all this is what actually matters i.e. how CPUs perform relative to each other, and that may vary with many factors that people will forget to update when copying a DT content to enable a new board.
And that wouldn't be the first time some benchmark is used at boot time. Different crypto/RAID algorithms are tested to determine the best one to use, etc.
I'm also worried about putting numbers into the DT now with all the scheduler work going on, this time next year we may well have a completely different idea of what we want to tell the scheduler. It may be that we end up being able to explicitly tell the scheduler about things like the memory architecture, or that the scheduler just gets smarter and can estimate all this stuff at runtime.
Exactly. Which is why the kernel better be self-sufficient to determine such params. Dt should be used only for things that may not be probed at run time. The relative performance of a CPU certainly can be probed at run time.
Obviously the specifics of the actual benchmark might be debated, but the same can be said about static numbers.
Nicolas
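A boot-time probe of the kind Nicolas describes could be sketched roughly as below; the workload, the scaling and all of the names (probe_cpu_performance, probe_one_cpu) are invented purely for illustration, not a proposal of actual code:

#include <linux/ktime.h>
#include <linux/smp.h>
#include <linux/printk.h>

static void probe_one_cpu(void *data)
{
	unsigned long *score = data;
	volatile u32 x = 1;
	ktime_t start = ktime_get();
	s64 us;
	int i;

	/* fixed integer workload; a simple LCG keeps the compiler honest */
	for (i = 0; i < 1000000; i++)
		x = x * 1664525 + 1013904223;

	us = ktime_us_delta(ktime_get(), start);
	*score = 1000000 / (unsigned long)max(1LL, us);	/* iterations per us */
}

static void __init probe_cpu_performance(void)
{
	unsigned long score;
	int cpu;

	for_each_online_cpu(cpu) {
		smp_call_function_single(cpu, probe_one_cpu, &score, 1);
		pr_info("CPU%d: relative performance %lu\n", cpu, score);
		/* a real version would feed this into set_power_scale() */
	}
}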
On Wed, Dec 11, 2013 at 07:27:09PM +0000, Nicolas Pitre wrote:
On Wed, 11 Dec 2013, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:47:55PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
The power numbers are the same as for ARMv7 since it seems that the expected differential between the big and little cores is very similar on both ARMv7 and ARMv8.
I have no idea ;). We don't have real silicon yet, so that's just a wild guess.
I was going on some typical DMIPS/MHz numbers that I'd found so hopefully it's not a complete guess, though it will vary and that's just one benchmark with all the realism problems that entails. The ratio seemed to be about the same as the equivalent for the ARMv7 cores so given that it's a finger in the air thing it didn't seem worth drilling down much further.
+static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
+	{ NULL, },
+};
I also don't think we can just have absolute numbers here. I'm pretty sure these were generated on TC2 but other platforms may have different max CPU frequencies, memory subsystem, level and size of caches. The "average" efficiency and difference will be different.
The CPU frequencies at least are taken care of already, these numbers get scaled for each core. Once we're talking about things like the memory I'd also start worrying about application specific effects. There's also going to be stuff like thermal management which get fed in here and which varies during runtime.
I don't know where the numbers came from for v7.
I'm fairly sure that they are guestimates based on TC2. Vincent should know. I wouldn't consider them accurate in any way as the relative performance varies wildly depending on the workload. However, they are better than having no information at all.
Can we define this via DT? It's a bit strange since that's a constant used by the Linux scheduler but highly related to hardware.
I really don't think that's a good idea at this point, it seems better for the DT to stick to factual descriptions of what's present rather than putting tuning numbers in there. If the wild guesses are in the kernel source it's fairly easy to improve them, if they're baked into system DTs that becomes harder.
I really think putting such things into DT is wrong.
If those numbers were derived from benchmark results, then it is most probably best to try to come up with some kind of equivalent benchmark in the kernel to qualify CPUs at run time. After all this is what actually matters i.e. how CPUs perform relative to each other, and that may vary with many factors that people will forget to update when copying a DT content to enable a new board.
And that wouldn't be the first time some benchmark is used at boot time. Different crypto/RAID algorithms are tested to determine the best one to use, etc.
I'm also worried about putting numbers into the DT now with all the scheduler work going on, this time next year we may well have a completely different idea of what we want to tell the scheduler. It may be that we end up being able to explicitly tell the scheduler about things like the memory architecture, or that the scheduler just gets smarter and can estimate all this stuff at runtime.
I agree. We need to sort the scheduler side out first before we commit to anything. If we are worried about including code into v8 that we are going to change later, then it is probably better to leave this part out. See my response to Mark's patch subset with the same patch for details (I didn't see this thread until afterwards - sorry).
Exactly. Which is why the kernel better be self-sufficient to determine such params. Dt should be used only for things that may not be probed at run time. The relative performance of a CPU certainly can be probed at run time.
Obviously the specifics of the actual benchmark might be debated, but the same can be said about static numbers.
Indeed.
Morten
On Thu, Dec 12, 2013 at 11:56:40AM +0000, Morten Rasmussen wrote:
I'm also worried about putting numbers into the DT now with all the scheduler work going on, this time next year we may well have a completely different idea of what we want to tell the scheduler. It may be that we end up being able to explicitly tell the scheduler about things like the memory architecture, or that the scheduler just gets smarter and can estimate all this stuff at runtime.
I agree. We need to sort the scheduler side out first before we commit to anything. If we are worried about including code into v8 that we are going to change later, then it is probably better to leave this part out. See my response to Mark's patch subset with the same patch for details (I didn't see this thread until afterwards - sorry).
My take on change is that we should be doing as good a job as we can with the scheduler we have so users get whatever we're able to deliver at the current time. Having to change in kernel code shouldn't be that big a deal, especially with something like this where the scheduler is free to ignore what it's told without churning the interface.
On Thu, Dec 12, 2013 at 12:22:36PM +0000, Mark Brown wrote:
On Thu, Dec 12, 2013 at 11:56:40AM +0000, Morten Rasmussen wrote:
I'm also worried about putting numbers into the DT now with all the scheduler work going on, this time next year we may well have a completely different idea of what we want to tell the scheduler. It may be that we end up being able to explicitly tell the scheduler about things like the memory architecture, or that the scheduler just gets smarter and can estimate all this stuff at runtime.
I agree. We need to sort the scheduler side out first before we commit to anything. If we are worried about including code into v8 that we are going to change later, then it is probably better to leave this part out. See my response to Mark's patch subset with the same patch for details (I didn't see this thread until afterwards - sorry).
My take on change is that we should be doing as good a job as we can with the scheduler we have so users get whatever we're able to deliver at the current time. Having to change in kernel code shouldn't be that big a deal, especially with something like this where the scheduler is free to ignore what it's told without churning the interface.
Fair enough. I just wanted to make sure that people knew about the cpu_power issues before deciding whether to do the same for v8.
Morten
On 12 December 2013 12:56, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Wed, Dec 11, 2013 at 07:27:09PM +0000, Nicolas Pitre wrote:
On Wed, 11 Dec 2013, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:47:55PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 01:13:25PM +0000, Mark Brown wrote:
The power numbers are the same as for ARMv7 since it seems that the expected differential between the big and little cores is very similar on both ARMv7 and ARMv8.
I have no idea ;). We don't have real silicon yet, so that's just a wild guess.
I was going on some typical DMIPS/MHz numbers that I'd found so hopefully it's not a complete guess, though it will vary and that's just one benchmark with all the realism problems that entails. The ratio seemed to be about the same as the equivalent for the ARMv7 cores so given that it's a finger in the air thing it didn't seem worth drilling down much further.
+static const struct cpu_efficiency table_efficiency[] = {
+	{ "arm,cortex-a57", 3891 },
+	{ "arm,cortex-a53", 2048 },
+	{ NULL, },
+};
I also don't think we can just have absolute numbers here. I'm pretty sure these were generated on TC2 but other platforms may have different max CPU frequencies, memory subsystem, level and size of caches. The "average" efficiency and difference will be different.
The CPU frequencies at least are taken care of already, these numbers get scaled for each core. Once we're talking about things like the memory I'd also start worrying about application specific effects. There's also going to be stuff like thermal management which get fed in here and which varies during runtime.
I don't know where the numbers came from for v7.
I'm fairly sure that they are guestimates based on TC2. Vincent should know. I wouldn't consider them accurate in any way as the relative
The values are not based on TC2 but on the dmips/Mhz figures from ARM
Vincent
performance varies wildly depending on the workload. However, they are better than having no information at all.
Can we define this via DT? It's a bit strange since that's a constant used by the Linux scheduler but highly related to hardware.
I really don't think that's a good idea at this point, it seems better for the DT to stick to factual descriptions of what's present rather than putting tuning numbers in there. If the wild guesses are in the kernel source it's fairly easy to improve them, if they're baked into system DTs that becomes harder.
I really think putting such things into DT is wrong.
If those numbers were derived from benchmark results, then it is most probably best to try to come up with some kind of equivalent benchmark in the kernel to qualify CPUs at run time. After all, this is what actually matters, i.e. how CPUs perform relative to each other, and that may vary with many factors that people will forget to update when copying DT content to enable a new board.
And that wouldn't be the first time some benchmark is used at boot time. Different crypto/RAID algorithms are tested to determine the best one to use, etc.
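[Purely to illustrate the idea, a minimal sketch that times a fixed chunk of busy work on each online CPU, in the same spirit as the boot-time RAID6/crypto algorithm benchmarks; probe_one_cpu() and probe_relative_performance() are made-up names and this is not a proposal for the benchmark the kernel should actually use:]

#include <linux/cpumask.h>
#include <linux/printk.h>
#include <linux/sched.h>	/* sched_clock() */
#include <linux/workqueue.h>	/* work_on_cpu() */

/* Run a fixed reference workload and return how long it took, in ns. */
static long probe_one_cpu(void *unused)
{
	volatile unsigned long acc = 0;
	unsigned long long t0 = sched_clock();
	unsigned long i;

	/* What is computed here is irrelevant; it just has to be fixed. */
	for (i = 0; i < 1000000; i++)
		acc += i ^ (acc << 1);

	return (long)(sched_clock() - t0);
}

static void probe_relative_performance(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		/* Run the reference loop on 'cpu' and report its time. */
		long ns = work_on_cpu(cpu, probe_one_cpu, NULL);

		pr_info("cpu%d: reference loop took %ld ns\n", cpu, ns);
	}
}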
I'm also worried about putting numbers into the DT now with all the scheduler work going on, this time next year we may well have a completely different idea of what we want to tell the scheduler. It may be that we end up being able to explicitly tell the scheduler about things like the memory architecture, or that the scheduler just gets smarter and can estimate all this stuff at runtime.
I agree. We need to sort the scheduler side out first before we commit to anything. If we are worried about including code into v8 that we are going to change later, then it is probably better to leave this part out. See my response to Mark's patch set with the same patch for details (I didn't see this thread until afterwards - sorry).
Exactly. Which is why the kernel had better be self-sufficient in determining such parameters. DT should be used only for things that cannot be probed at run time. The relative performance of a CPU certainly can be probed at run time.
Obviously the specifics of the actual benchmark might be debated, but the same can be said about static numbers.
Indeed.
Morten
From: Mark Brown broonie@linaro.org
This provides for discovery of the clusters using the CPU topology bindings; it is added as a separate commit to aid testing of the code for systems without explicit topology bindings.
Signed-off-by: Mark Brown broonie@linaro.org
---
 arch/arm64/boot/dts/fvp-base-gicv2-psci.dts | 31 +++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts b/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts
index 736f546123bb..452ba22f3c6e 100644
--- a/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts
+++ b/arch/arm64/boot/dts/fvp-base-gicv2-psci.dts
@@ -119,6 +119,37 @@
 		enable-method = "psci";
 		clock-frequency = <1000000>;
 	};
+
+	cpu-map {
+		cluster0 {
+			core0 {
+				cpu = <&big0>;
+			};
+			core1 {
+				cpu = <&big1>;
+			};
+			core2 {
+				cpu = <&big2>;
+			};
+			core3 {
+				cpu = <&big3>;
+			};
+		};
+		cluster1 {
+			core0 {
+				cpu = <&little0>;
+			};
+			core1 {
+				cpu = <&little1>;
+			};
+			core2 {
+				cpu = <&little2>;
+			};
+			core3 {
+				cpu = <&little3>;
+			};
+		};
+	};
 };

memory@80000000 {
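[For context, a rough sketch of how a kernel could walk such a cpu-map node to recover cluster and core IDs; this is loosely modelled on the topology parsing being discussed (ignoring the optional thread level), and dump_cpu_map() is a name made up for the example, not the actual implementation:]

#include <linux/of.h>
#include <linux/printk.h>

static void __init dump_cpu_map(void)
{
	struct device_node *cpus, *map, *cluster, *core, *cpu;
	int cluster_id = 0;

	cpus = of_find_node_by_path("/cpus");
	if (!cpus)
		return;

	map = of_get_child_by_name(cpus, "cpu-map");
	if (!map)
		goto out;

	/* cluster0, cluster1, ... */
	for_each_child_of_node(map, cluster) {
		int core_id = 0;

		/* core0, core1, ... */
		for_each_child_of_node(cluster, core) {
			/* "cpu" is a phandle back to the real cpu node */
			cpu = of_parse_phandle(core, "cpu", 0);
			if (cpu) {
				pr_info("%s: cluster %d core %d\n",
					cpu->full_name, cluster_id, core_id);
				of_node_put(cpu);
			}
			core_id++;
		}
		cluster_id++;
	}

	of_node_put(map);
out:
	of_node_put(cpus);
}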
On Wed, Dec 11, 2013 at 01:13:21PM +0000, Mark Brown wrote:
+struct cpuinfo_arm {
+	struct cpu cpu;
+	u64 cpuid;
+#ifdef CONFIG_SMP
+	unsigned int loops_per_jiffy;
+#endif
+};
How is this structure used? I haven't seen the ACPI code doing anything with struct cpu (though I haven't dug deep enough). Also, loops_per_jiffy is useless since the delay loop is based on the generic timer.
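[For background, a rough sketch of why loops_per_jiffy isn't needed: the arm64 delay loop can be driven by the architected counter rather than a calibrated busy loop. The sketch_* names are made up for this illustration and it is not the actual kernel implementation:]

#include <linux/types.h>
#include <asm/processor.h>	/* cpu_relax() */

static inline u64 sketch_read_cntvct(void)
{
	u64 cval;

	/* Current value of the virtual counter. */
	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
	return cval;
}

static inline u64 sketch_read_cntfrq(void)
{
	u64 freq;

	/* Counter frequency in Hz. */
	asm volatile("mrs %0, cntfrq_el0" : "=r" (freq));
	return freq;
}

static void sketch_udelay(unsigned long usecs)
{
	u64 start = sketch_read_cntvct();
	u64 ticks = (sketch_read_cntfrq() * usecs) / 1000000;

	/* Spin until the counter has advanced far enough; no
	 * loops_per_jiffy calibration is involved. */
	while ((sketch_read_cntvct() - start) < ticks)
		cpu_relax();
}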
On Wed, Dec 11, 2013 at 02:10:19PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 01:13:21PM +0000, Mark Brown wrote:
+struct cpuinfo_arm {
+	struct cpu cpu;
+	u64 cpuid;
+#ifdef CONFIG_SMP
+	unsigned int loops_per_jiffy;
+#endif
+};
How is this structure used? I haven't seen the ACPI code doing anything with struct cpu (though I haven't dug deep enough). Also, loops_per_jiffy is useless since the delay loop is based on the generic timer.
Now I look again, we can probably drop this for the topology work - it had been pulled in as part of pulling things in from pre-v8, and the cpuid was used in the code while I was working on it but isn't any more, unless I'm misreading the code.
I don't know what the ACPI guys are doing with it, I just saw they added the same thing.
On Wed, Dec 11, 2013 at 02:23:08PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:10:19PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 01:13:21PM +0000, Mark Brown wrote:
+struct cpuinfo_arm {
+	struct cpu cpu;
+	u64 cpuid;
+#ifdef CONFIG_SMP
+	unsigned int loops_per_jiffy;
+#endif
+};
How is this structure used? I haven't seen the ACPI code doing anything with struct cpu (though I haven't dug deep enough). Also, loops_per_jiffy is useless since the delay loop is based on the generic timer.
Now I look again, we can probably drop this for the topology work - it had been pulled in as part of pulling things in from pre-v8, and the cpuid was used in the code while I was working on it but isn't any more, unless I'm misreading the code.
I don't know what the ACPI guys are doing with it, I just saw they added the same thing.
It looked to me that for ACPI an empty asm/cpu.h file would do for now. There are a couple of prototypes for CPU hotplug, I think, but I haven't seen the structure used (not even the x86_cpu one).
On 2013-12-11 22:27, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 02:23:08PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:10:19PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 01:13:21PM +0000, Mark Brown wrote:
+struct cpuinfo_arm {
+	struct cpu cpu;
+	u64 cpuid;
+#ifdef CONFIG_SMP
+	unsigned int loops_per_jiffy;
+#endif
+};
How is this structure used? I haven't seen the ACPI code doing anything with struct cpu (though I haven't dug deep enough). Also, loops_per_jiffy is useless since the delay loop is based on the generic timer.
Now I look again, we can probably drop this for the topology work - it had been pulled in as part of pulling things in from pre-v8, and the cpuid was used in the code while I was working on it but isn't any more, unless I'm misreading the code.
I don't know what the ACPI guys are doing with it, I just saw they added the same thing.
It looked to me that for ACPI an empty asm/cpu.h file would do for now. There are a couple of prototypes for CPU hotplug, I think, but I haven't seen the structure used (not even the x86_cpu one).
In the ACPI code only struct cpu is used, for ACPI-based CPU hotplug; here is the code that will be sent upstream when the ACPI core for ARM is ready.
---
 arch/arm64/include/asm/cpu.h |  5 +++++
 arch/arm64/kernel/topology.c | 26 ++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index dbeb98d..5613e09 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -20,6 +20,11 @@ struct cpuinfo_arm {
 #endif
 };

+#ifdef CONFIG_HOTPLUG_CPU
+extern int arch_register_cpu(int cpu);
+extern void arch_unregister_cpu(int cpu);
+#endif
+
 DECLARE_PER_CPU(struct cpuinfo_arm, cpu_data);

 #endif
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index cb548f1..5c8e69c 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -18,3 +18,29 @@ void arch_fix_phys_package_id(int num, u32 slot)
 }
 EXPORT_SYMBOL_GPL(arch_fix_phys_package_id);

+#ifdef CONFIG_HOTPLUG_CPU
+int __ref arch_register_cpu(int cpu)
+{
+	struct cpuinfo_arm *cpuinfo = &per_cpu(cpu_data, cpu);
+
+	/* BSP can't be taken down on arm */
+	if (cpu)
+		cpuinfo->cpu.hotpluggable = 1;
+
+	return register_cpu(&cpuinfo->cpu, cpu);
+}
+EXPORT_SYMBOL(arch_register_cpu);
+
+void arch_unregister_cpu(int cpu)
+{
+	unregister_cpu(&per_cpu(cpu_data, cpu).cpu);
+}
+EXPORT_SYMBOL(arch_unregister_cpu);
+#else /* CONFIG_HOTPLUG_CPU */
+
+static int __init arch_register_cpu(int cpu)
+{
+	return register_cpu(&per_cpu(cpu_data, cpu).cpu, cpu);
+}
+#endif /* CONFIG_HOTPLUG_CPU */
+
On Thu, Dec 12, 2013 at 06:50:27AM +0000, Hanjun Guo wrote:
On 2013-12-11 22:27, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 02:23:08PM +0000, Mark Brown wrote:
On Wed, Dec 11, 2013 at 02:10:19PM +0000, Catalin Marinas wrote:
On Wed, Dec 11, 2013 at 01:13:21PM +0000, Mark Brown wrote:
+struct cpuinfo_arm {
+	struct cpu cpu;
+	u64 cpuid;
+#ifdef CONFIG_SMP
+	unsigned int loops_per_jiffy;
+#endif
+};
How is this structure used? I haven't seen the ACPI code doing anything with struct cpu (though I haven't dug deep enough). Also, loops_per_jiffy is useless since the delay loop is based on the generic timer.
Now I look again, we can probably drop this for the topology work - it had been pulled in as part of pulling things in from pre-v8, and the cpuid was used in the code while I was working on it but isn't any more, unless I'm misreading the code.
I don't know what the ACPI guys are doing with it, I just saw they added the same thing.
It looked to me that for ACPI an empty asm/cpu.h file would do for now. There are a couple of prototypes for CPU hotplug, I think, but I haven't seen the structure used (not even the x86_cpu one).
In the ACPI code only struct cpu is used, for ACPI-based CPU hotplug; here is the code that will be sent upstream when the ACPI core for ARM is ready.
 arch/arm64/include/asm/cpu.h |  5 +++++
 arch/arm64/kernel/topology.c | 26 ++++++++++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/arch/arm64/include/asm/cpu.h b/arch/arm64/include/asm/cpu.h
index dbeb98d..5613e09 100644
--- a/arch/arm64/include/asm/cpu.h
+++ b/arch/arm64/include/asm/cpu.h
@@ -20,6 +20,11 @@ struct cpuinfo_arm {
 #endif
 };

+#ifdef CONFIG_HOTPLUG_CPU
+extern int arch_register_cpu(int cpu);
+extern void arch_unregister_cpu(int cpu);
+#endif
+
 DECLARE_PER_CPU(struct cpuinfo_arm, cpu_data);

 #endif
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index cb548f1..5c8e69c 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -18,3 +18,29 @@ void arch_fix_phys_package_id(int num, u32 slot)
 }
 EXPORT_SYMBOL_GPL(arch_fix_phys_package_id);

+#ifdef CONFIG_HOTPLUG_CPU
+int __ref arch_register_cpu(int cpu)
+{
+	struct cpuinfo_arm *cpuinfo = &per_cpu(cpu_data, cpu);
+
+	/* BSP can't be taken down on arm */
+	if (cpu)
+		cpuinfo->cpu.hotpluggable = 1;
_why_ does the ACPI standard prohibit hotplugging the boot CPU?
In non-ACPI systems we can hotplug out the boot CPU (we can do so under KVM using PSCI).
I note that the x86 arch_register_cpu allows CPU0 to be hotpluggable on Intel systems as long as there are no dependencies on CPU0 being active. Surely we can test for something more fine-grained rather than disallowing CPU0 hotplug outright?
How does this interact with the existing arm64 hotplug code?
Thanks, Mark.
On 2013-12-12 18:36, Mark Rutland wrote:
On Thu, Dec 12, 2013 at 06:50:27AM +0000, Hanjun Guo wrote:
[...]
 #endif
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index cb548f1..5c8e69c 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -18,3 +18,29 @@ void arch_fix_phys_package_id(int num, u32 slot)
 }
 EXPORT_SYMBOL_GPL(arch_fix_phys_package_id);

+#ifdef CONFIG_HOTPLUG_CPU
+int __ref arch_register_cpu(int cpu)
+{
+	struct cpuinfo_arm *cpuinfo = &per_cpu(cpu_data, cpu);
+
+	/* BSP can't be taken down on arm */
+	if (cpu)
+		cpuinfo->cpu.hotpluggable = 1;
_why_ does the ACPI standard prohibit hotplugging the boot CPU?
The ACPI spec has no limitation on hotplugging the boot CPU; it depends on the arch.
In non-ACPI systems we can hotplug out the boot CPU (we can do so under KVM using PSCI).
If all the things (interrupts, etc.) running on CPU0 can be migrated to other CPUs when offlining it, then it can be hotpluggable in the ACPI way too.
I note that the x86 arch_register_cpu allows CPU0 to be hotpluggable on Intel systems as long as there are no dependencies on CPU0 being active. Surely we can test for something more fine-grained rather than disallowing CPU0 hotplug outright?
Ok, will update it when I formally send this patch out.
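[A minimal sketch of what a finer-grained check could look like; cpu0_can_be_offlined() is a hypothetical placeholder for whatever platform/PSCI query turns out to be appropriate, not an existing kernel API:]

#ifdef CONFIG_HOTPLUG_CPU
int __ref arch_register_cpu(int cpu)
{
	struct cpuinfo_arm *cpuinfo = &per_cpu(cpu_data, cpu);

	/*
	 * Secondary CPUs can always be offlined; the boot CPU only if
	 * the platform/firmware says it can actually be taken down
	 * (cpu0_can_be_offlined() is an assumption for this sketch).
	 */
	if (cpu || cpu0_can_be_offlined())
		cpuinfo->cpu.hotpluggable = 1;

	return register_cpu(&cpuinfo->cpu, cpu);
}
#endif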
How does this interact with the existing arm64 hotplug code?
Some other patches are needed for ACPI-based CPU hotplug; I will send them out when the ACPI core for ARM is ready and cc you, so you will know what's going on :)
Thanks Hanjun