From: Mark Brown broonie@linaro.org
The only requirement the scheduler has on cluster IDs is that they must be unique. When enumerating the topology based on MPIDR information the kernel currently generates cluster IDs by using the first level of affinity above the core ID (either level one or two depending on if the core has multiple threads) however the ARMv8 architecture allows for up to three levels of affinity. This means that an ARMv8 system may contain cores which have MPIDRs identical other than affinity level three which with current code will cause us to report multiple cores with the same identification to the scheduler in violation of its uniqueness requirement.
Ensure that we do not violate the scheduler requirements on systems that uses all the affinity levels by incorporating both affinity levels two and three into the cluser ID when the cores are not threaded.
While no currently known hardware uses multi-level clusters it is better to program defensively, this will help ease bringup of systems that have them and will ensure that things like distribution install media do not need to be respun to replace kernels in order to deploy such systems. In the worst case the system will work but perform suboptimally until a kernel modified to handle the new topology better is installed, in the best case this will be an adequate description of such topologies for the scheduler to perform well.
Signed-off-by: Mark Brown broonie@linaro.org --- arch/arm64/kernel/topology.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index b6ee26b..5752c1b 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -255,7 +255,8 @@ void store_cpu_topology(unsigned int cpuid) /* Multiprocessor system : Multi-threads per core */ cpuid_topo->thread_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 1); - cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2); + cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) | + MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8; } else { /* Multiprocessor system : Single-thread per core */ cpuid_topo->thread_id = -1;
Hi Mark,
On Tue, Aug 19, 2014 at 9:02 PM, Mark Brown broonie@kernel.org wrote:
From: Mark Brown broonie@linaro.org
The only requirement the scheduler has on cluster IDs is that they must be unique. When enumerating the topology based on MPIDR information the kernel currently generates cluster IDs by using the first level of affinity above the core ID (either level one or two depending on if the core has multiple threads) however the ARMv8 architecture allows for up to three levels of affinity. This means that an ARMv8 system may contain cores which have MPIDRs identical other than affinity level three which with current code will cause us to report multiple cores with the same identification to the scheduler in violation of its uniqueness requirement.
Ensure that we do not violate the scheduler requirements on systems that uses all the affinity levels by incorporating both affinity levels two and three into the cluser ID when the cores are not threaded.
While no currently known hardware uses multi-level clusters it is better to program defensively, this will help ease bringup of systems that have them and will ensure that things like distribution install media do not need to be respun to replace kernels in order to deploy such systems. In the worst case the system will work but perform suboptimally until a kernel modified to handle the new topology better is installed, in the best case this will be an adequate description of such topologies for the scheduler to perform well.
Signed-off-by: Mark Brown broonie@linaro.org
arch/arm64/kernel/topology.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index b6ee26b..5752c1b 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -255,7 +255,8 @@ void store_cpu_topology(unsigned int cpuid) /* Multiprocessor system : Multi-threads per core */ cpuid_topo->thread_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 1);
cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2);
cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) |
MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8; } else { /* Multiprocessor system : Single-thread per core */ cpuid_topo->thread_id = -1;
Can you please extend this for non-SMT case also?
-- 2.1.0.rc1
thanks ganapat
On Thu, Sep 11, 2014 at 06:01:50AM +0100, Ganapatrao Kulkarni wrote:
Hi Mark,
On Tue, Aug 19, 2014 at 9:02 PM, Mark Brown broonie@kernel.org wrote:
From: Mark Brown broonie@linaro.org
The only requirement the scheduler has on cluster IDs is that they must be unique. When enumerating the topology based on MPIDR information the kernel currently generates cluster IDs by using the first level of affinity above the core ID (either level one or two depending on if the core has multiple threads) however the ARMv8 architecture allows for up to three levels of affinity. This means that an ARMv8 system may contain cores which have MPIDRs identical other than affinity level three which with current code will cause us to report multiple cores with the same identification to the scheduler in violation of its uniqueness requirement.
Ensure that we do not violate the scheduler requirements on systems that uses all the affinity levels by incorporating both affinity levels two and three into the cluser ID when the cores are not threaded.
While no currently known hardware uses multi-level clusters it is better to program defensively, this will help ease bringup of systems that have them and will ensure that things like distribution install media do not need to be respun to replace kernels in order to deploy such systems. In the worst case the system will work but perform suboptimally until a kernel modified to handle the new topology better is installed, in the best case this will be an adequate description of such topologies for the scheduler to perform well.
Signed-off-by: Mark Brown broonie@linaro.org
arch/arm64/kernel/topology.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c index b6ee26b..5752c1b 100644 --- a/arch/arm64/kernel/topology.c +++ b/arch/arm64/kernel/topology.c @@ -255,7 +255,8 @@ void store_cpu_topology(unsigned int cpuid) /* Multiprocessor system : Multi-threads per core */ cpuid_topo->thread_id = MPIDR_AFFINITY_LEVEL(mpidr, 0); cpuid_topo->core_id = MPIDR_AFFINITY_LEVEL(mpidr, 1);
cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2);
cpuid_topo->cluster_id = MPIDR_AFFINITY_LEVEL(mpidr, 2) |
MPIDR_AFFINITY_LEVEL(mpidr, 3) << 8; } else { /* Multiprocessor system : Single-thread per core */ cpuid_topo->thread_id = -1;
Can you please extend this for non-SMT case also?
To be honest with you all I would have preferred to sort this out with DT, that's already doable now without this patch.
Having said that, for platforms setting MPIDR_EL1 to reasonable values, I think that's what we should do, squash the upper MPIDR levels (for the non SMT case too). Please stick a proper comment for that to the code. We can always rely on DT to fix other cases that can't be treated properly with code above.
AFAIK book topology level is unused in the kernel at the moment, that would have been a candidate for the additional MPIDR level(s).
I took some time to think about that, since I am aware of tools (ie powertop) taking the socket id value verbatim, which can become a big number when we squash it. Since it is not a userspace API (or at least it is written nowhere that the socket id value must be a sequential, monotonic value starting from 0 and and on top of that that's an arch specific id that as far as I know must only be unique) I think the change above (plus additional code for non-SMT case) is acceptable.
Mark, do you want me to write the patch or you will do it ?
Fine either way by me, thanks. Lorenzo
On Fri, Oct 03, 2014 at 05:37:16PM +0100, Lorenzo Pieralisi wrote:
To be honest with you all I would have preferred to sort this out with DT, that's already doable now without this patch.
We'd still have to make sure there's a complete topology binding for ACPI and get through standardisation if any changes are needed. I guess it's probably mostly OK as the scenarios should all be shared with x86 systems but I've not looked to see if some of the subsetting that's being done for ACPI on ARMv8 causes issues.
Having said that, for platforms setting MPIDR_EL1 to reasonable values, I think that's what we should do, squash the upper MPIDR levels (for the non SMT case too). Please stick a proper comment for that to the code. We can always rely on DT to fix other cases that can't be treated properly with code above.
OK, I'll send a patch.
I took some time to think about that, since I am aware of tools (ie powertop) taking the socket id value verbatim, which can become a big number when we squash it. Since it is not a userspace API (or at least it is written nowhere that the socket id value must be a sequential, monotonic value starting from 0 and and on top of that that's an arch specific id that as far as I know must only be unique) I think the change above (plus additional code for non-SMT case) is acceptable.
More than that, it's explicitly documented as being intended to be the hardware platform's ID in cputopology.txt. I've no idea where this assumption that the numbers have to be sequential came from but it's been such a source of pain :/
On Wed, Oct 08, 2014 at 09:56:28PM +0100, Mark Brown wrote:
On Fri, Oct 03, 2014 at 05:37:16PM +0100, Lorenzo Pieralisi wrote:
To be honest with you all I would have preferred to sort this out with DT, that's already doable now without this patch.
We'd still have to make sure there's a complete topology binding for ACPI and get through standardisation if any changes are needed. I guess it's probably mostly OK as the scenarios should all be shared with x86 systems but I've not looked to see if some of the subsetting that's being done for ACPI on ARMv8 causes issues.
Having said that, for platforms setting MPIDR_EL1 to reasonable values, I think that's what we should do, squash the upper MPIDR levels (for the non SMT case too). Please stick a proper comment for that to the code. We can always rely on DT to fix other cases that can't be treated properly with code above.
OK, I'll send a patch.
Mark,
do you have time to send it ?
Please let me know.
Thanks, Lorenzo
On Wed, Nov 12, 2014 at 03:07:56PM +0000, Lorenzo Pieralisi wrote:
On Wed, Oct 08, 2014 at 09:56:28PM +0100, Mark Brown wrote:
I think that's what we should do, squash the upper MPIDR levels (for the non SMT case too). Please stick a proper comment for that to the code. We can always rely on DT to fix other cases that can't be treated properly with code above.
OK, I'll send a patch.
do you have time to send it ?
Should do.
linaro-kernel@lists.linaro.org