On 23 December 2013 18:22, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
Hi Vincent,
On 18/12/13 14:13, Vincent Guittot wrote:
This patch applies on top of the two patches [1][2] that Peter has proposed for creating a new way to initialize sched_domains. It includes some minor compilation fixes and a first attempt at using this new method on an ARM platform.

[1] https://lkml.org/lkml/2013/11/5/239
[2] https://lkml.org/lkml/2013/11/5/449
I came up w/ a similar implementation proposal for an arch specific interface for scheduler domain set-up a couple of days ago:
[1] https://lkml.org/lkml/2013/12/13/182
I had the following requirements in mind:

(1) The arch should not be able to fine-tune individual scheduler
    behaviour, i.e. get rid of the arch-specific SD_FOO_INIT macros.

(2) Unify the set-up code for conventional and NUMA scheduler domains.

(3) The arch is able to specify additional scheduler domain levels, other
    than SMT, MC, BOOK, and CPU.

(4) Allow the integration of additional topology-related data (e.g.
    energy information) into the scheduler.
Moreover, I think now that:
(5) Something like the existing default set-up via default_topology[] is
    needed to avoid code duplication for archs not interested in (3) or (4).
Hi Dietmar,
I agree. This default array is available in Peter's patch, and my patches overwrite it only if the arch wants to add more/new levels.
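For reference, the override mechanism looks roughly like this. This is a sketch only: the struct layout and helper names are assumptions based on my reading of Peter's patches [1][2], not the verbatim API.

/*
 * Sketch only: the shape of the table-driven set-up from Peter's
 * patches. Field layout and helper names are assumptions, not the
 * verbatim API.
 */
static struct sched_domain_topology_level default_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(DIE) },
	{ NULL, },
};

/* The scheduler builds domains from whatever table this points at. */
struct sched_domain_topology_level *sched_domain_topology = default_topology;

An arch that is happy with the defaults does nothing; an arch that wants more/new levels replaces the pointer with its own table, as the ARM snippet at the end of this mail does.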
[snip]
CPU2:
 domain 0: span 2-3 level: SMT
  flags: SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN
  groups: 0 1
 domain 1: span 2-7 level: MC
  flags: SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN
  groups: 2-7 4-5 6-7
 domain 2: span 0-7 level: MC
  flags: SD_SHARE_PKG_RESOURCES
  groups: 2-7 0-1
 domain 3: span 0-15 level: CPU
  flags:
  groups: 0-7 8-15
In this case, we have an additional MC-level sched_domain for this subset (2-7) of cores, so we can trigger some load balancing within this subset before doing it across the complete cluster (which is the last level of cache in my example).
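A table along these lines would yield the hierarchy above. Again a sketch: the power-domain mask helper (cpu_corepower_mask) and the flag functions are assumptions based on this patch set, not necessarily its exact code.

static inline int cpu_corepower_flags(void)
{
	return SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN;
}

static struct sched_domain_topology_level arm_topology[] = {
#ifdef CONFIG_SCHED_SMT
	{ cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
	/* inner MC: cores sharing cache and a power domain (2-7 above) */
	{ cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(MC) },
	/* outer MC: all cores sharing the last level of cache (0-7 above) */
	{ cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
	{ cpu_cpu_mask, SD_INIT_NAME(CPU) },
	{ NULL, },
};

Note the two MC entries with different flags: this is the "duplication" discussed further down.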
I think the weakest point right now is the condition in sd_init() where we convert the topology flags into scheduler behaviour. We not only introduce a very tight coupling between topology flags and scheduler domain levels, but we also need to follow a certain order in the initialization. This bit needs more thinking.
IMHO, these settings will disappear sooner or later; as an example, the idle/busy _idx are going to be removed by Alex's patch.
We can add more levels to describe other dependencies/independencies, such as the frequency scaling dependency; as a result, the final sched_domain topology will have additional levels (if they have not been removed during the degenerate sequence).
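The coupling in question is of this shape. This is a simplified sketch of sd_init(), not the exact code from the patches; the values are illustrative, and the _idx settings are among those expected to disappear.

/*
 * Simplified sketch of the flag -> behaviour translation in sd_init();
 * values are illustrative, not the exact ones from the patches.
 */
static void sd_init(struct sched_domain *sd)
{
	if (sd->flags & SD_SHARE_CPUPOWER) {
		/* SMT siblings: cheap migrations, small imbalance allowed */
		sd->imbalance_pct = 110;
	} else if (sd->flags & SD_SHARE_PKG_RESOURCES) {
		/* cores sharing a cache */
		sd->imbalance_pct = 117;
		sd->cache_nice_tries = 1;
		sd->busy_idx = 2;	/* one of the _idx settings Alex's
					 * patches remove */
	} else {
		/* package/CPU level and above */
		sd->cache_nice_tries = 1;
		sd->busy_idx = 2;
		sd->idle_idx = 1;
	}
}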
My concern is about the configuration of the table that is used to create the sched_domains. Some levels are "duplicated" with different flag configurations, which makes the table harder to read, and we must also take care of the order, because a parent has to gather all the CPUs of its children. So we must choose which capabilities will be a subset of the others. The order is almost straightforward when we describe one or two kinds of capabilities (package resource sharing and power sharing), but it can become complex if we want to add more.
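A hypothetical sanity check (not part of the patches) makes the ordering rule explicit: for every CPU, each level's mask must be a subset of the next level's mask.

/*
 * Hypothetical check illustrating the ordering rule: a parent level
 * must gather all the CPUs of its child level.
 */
static void check_topology_order(struct sched_domain_topology_level *tl)
{
	int cpu, i;

	for_each_possible_cpu(cpu)
		for (i = 0; tl[i + 1].mask; i++)
			WARN_ON(!cpumask_subset(tl[i].mask(cpu),
						tl[i + 1].mask(cpu)));
}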
I'm not sure if the idea to create a dedicated sched_domain level for every topology flag representing a specific functionality will scale. From the perspective of energy-aware scheduling, we need e.g. energy costs (P- and C-state) which can only be populated towards the scheduler via an additional sub-struct and an additional function arch_sd_energy(), as depicted in Morten's email:

[2] https://lkml.org/lkml/2013/11/14/102

It's up to the arch to decide how many levels it wants to add, and whether a dedicated level is needed or whether one level can gather several features/flags. IMHO, having sub-structs for energy information, like what we have for the cpu/group capacity, will not prevent us from having a first and quick topology tree description.
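For context, the proposal in [2] amounts to attaching per-level energy data of roughly this shape; every name below is an illustrative assumption, and the arch_sd_energy() signature is guessed, not quoted.

/*
 * Sketch of per-level energy data along the lines of Morten's email [2];
 * all names are assumptions.
 */
struct capacity_state {
	unsigned long cap;	/* compute capacity at this P-state */
	unsigned long power;	/* power consumed at this P-state */
};

struct sched_group_energy {
	unsigned int nr_cap_states;
	struct capacity_state *cap_states;	/* P-state table */
	unsigned long idle_power;		/* simplified C-state cost */
};

/* Assumed hook for the arch to hand the data to the scheduler. */
const struct sched_group_energy *arch_sd_energy(int cpu, int level);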
[snip]
+static int __init arm_sched_topology(void)
+{
+	sched_domain_topology = arm_topology;
return missing
good catch
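With the missing return added, the function reads:

static int __init arm_sched_topology(void)
{
	/* point the scheduler at the ARM-specific topology table */
	sched_domain_topology = arm_topology;

	return 0;	/* the return Dietmar spotted as missing */
}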
Thanks
Vincent