Hi Leo,
On 12/18/2015 03:29 AM, Leo Yan wrote:
On Thu, Dec 17, 2015 at 07:18:56PM +0000, Dietmar Eggemann wrote:
On 10/11/15 09:58, Leo Yan wrote:
[...]
Your idea is better than my patch, because it maps more clearly onto the hardware topology. I also have a few questions below so I can understand your patch better :)
(1) Hold cluster energy model (EM) data on a single cluster system
For a single cluster system, there is only an 'MC' sched domain (at the cpu level), so there is no 'CPU' sched domain to hold the cluster's EM data. So should we add a sched domain 'SYS' to hold the cluster's EM data, right?
Yes, this is the plan for a single cluster EAS system (s/'CPU'/'DIE'). You only need the SYS sd with one sg spanning the whole system; no EM info is needed on this sd for hikey.
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 5883f9e262ec..805627f61fc5 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -354,6 +354,7 @@ static struct sched_domain_topology_level arm64_topology[] = {
 	{ cpu_coregroup_mask, cpu_corepower_flags, cpu_core_energy, SD_INIT_NAME(MC) },
 #endif
 	{ cpu_cpu_mask, NULL, cpu_cluster_energy, SD_INIT_NAME(DIE) },
+	{ cpu_cpu_mask, NULL, NULL, SD_INIT_NAME(SYS) },
 	{ NULL, },
 };
In the meantime I played a little bit with this patch, and I think you have to add some code to cover your functionality of having a top sd with one sg spanning all cpus. The original patch only covers the case where individual cpus are off-lined on a big.LITTLE system, and the single cluster system where we want to attach EM data to the SYS sd.
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 487bb3c4627b..8b6cb48b8985 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5929,7 +5929,9 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
 				SD_PREFER_SIBLING |
 				SD_SHARE_POWERDOMAIN |
 				SD_SHARE_CAP_STATES);
-		if (parent->groups->sge) {
+		if (parent->groups->sge ||
+		    (sd->groups->sge && parent->groups == parent->groups->next &&
+		     parent->span_weight == num_online_cpus())) {
 			parent->flags &= ~SD_LOAD_BALANCE;
 			return 0;
 		}
The additional if condition is necessary since you don't want to attach EM data (parent->groups->sge) onto your SYS sd.
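To make the intent of that condition easier to see, here is a small user-space sketch of the same boolean logic. The structs (toy_group, toy_domain) and the function name are hypothetical stand-ins for the kernel's sched_group/sched_domain, just enough to model the patched test; this is an illustration, not kernel code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the kernel's sched_group
 * and sched_domain -- just enough to model the patched condition. */
struct toy_group {
	struct toy_group *next;	/* groups form a circular list */
	void *sge;		/* energy model data, NULL if absent */
};

struct toy_domain {
	struct toy_group *groups;
	unsigned int span_weight;	/* number of cpus this sd spans */
};

/*
 * Mirrors the patched test in sd_parent_degenerate(): clear
 * SD_LOAD_BALANCE on 'parent' when the parent itself carries EM data,
 * or when the child does and the parent is a single group spanning
 * every online cpu (the SYS sd case).
 */
static bool should_clear_load_balance(struct toy_domain *sd,
				      struct toy_domain *parent,
				      unsigned int online_cpus)
{
	if (parent->groups->sge)
		return true;
	return sd->groups->sge &&
	       parent->groups == parent->groups->next &&
	       parent->span_weight == online_cpus;
}
```

With a DIE child carrying EM data and a SYS parent that has one group spanning all online cpus, the function returns true, which is why SYS ends up with SD_LOAD_BALANCE cleared.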
JUNO looks like this with these two changes applied on top of the patch 'sched: EAS & cpu hotplug interoperability':
root@genericarmv8:~# cat /proc/schedstat (only for cpu0)
cpu0 0 0 4436 1854 3368 1751 1034319260 310590900 2555
domain0 39 1136 1136 0 0 0 0 0 1136 1 1 0 0 0 0 0 1 1436 1335 94 ...
domain1 3f 900 852 46 25311 2 0 1 770 0 0 0 0 0 0 0 0 1429 1235 ...
domain2 3f 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...
$ cat /proc/sys/kernel/sched_domain/cpu0/domain*/{name,flags}
MC
DIE
SYS
33583
4143
4142 <-- !SD_LOAD_BALANCE
(2) Offer this 'sg_shared_cap = sd->parent->groups' thing on a platform like Hikey
(3) Let cpu and cluster level EM data survive if cpus get off-lined
Just curious, what benefit do we get from this? When a CPU or a whole cluster has been off-lined, we should not have to care about its EM data anymore.
If the entire cluster is offline, then yes, we shouldn't take those cpus or the cluster contribution into consideration. But I was talking about situations where all but one cpu of a cluster are offline, or about the cpus in the other cluster of a two-cluster system.
From the patch header:
For Energy-Aware Scheduling (EAS) to work properly, even in the case that cpus are hot-plugged out, the energy model (EM) data on all energy-aware sched domains has to be present for all online cpus.
Mainline sd hierarchy setup code will remove sd's which are not useful for task scheduling e.g. in the following situations:
1. Only one cpu remains in one cluster of a two cluster system.
This remaining cpu only has DIE and no MC sd.
2. A complete cluster in a two-cluster system is hot-plugged out.
The cpus of the remaining cluster only have MC and no DIE sd.
Obviously, this sd level can't be part of the actual scheduling decisions, so the CFS code has to change slightly for this to work. I attached the appropriate patch. It's only lightly tested on TC2 and JUNO and might not apply cleanly on the current code line.
Would be nice if you can test it and give feedback if you consider it as something you're fine with. Thanks!
Do you suggest I test this based on RFCv5 or RFCv6?
Definitely RFCv5.2. EAS RFCv6 does not exist yet.
-- Dietmar
[...]