On Wed, Nov 13, 2013 at 03:47:16PM +0000, Dietmar Eggemann wrote:
On 12/11/13 18:08, Peter Zijlstra wrote:
On Tue, Nov 12, 2013 at 05:43:36PM +0000, Dietmar Eggemann wrote:
This patch removes the sched_domain initializer macros SD_[SIBLING|MC|BOOK|CPU]_INIT in core.c and in archs and replaces them with calls to the new function sd_init(). The function sd_init incorporates the already existing function sd_numa_init().
Your patch retains far too much of the weird behavioural variations we have, nor does it create a proper separation between topology and behaviour.
Could you please explain a little further what you mean by the weird behavioural variations? Are you referring to the specific SD_ flags or to the sd_domain levels?
+++ b/arch/ia64/kernel/topology.c
@@ -99,6 +99,14 @@ out:
 subsys_initcall(topology_init);
+void arch_sd_customize(int level, struct sched_domain *sd, int cpu)
+{
+	if (level == SD_LVL_CPU) {
+		sd->cache_nice_tries = 2;
+
+		sd->flags &= ~SD_PREFER_SIBLING;
+	}
+}
+++ b/arch/tile/kernel/smp.c
@@ -254,3 +254,15 @@ void smp_send_reschedule(int cpu)
 }

 #endif /* CHIP_HAS_IPI() */
+
+void arch_sd_customize(int level, struct sched_domain *sd, int cpu)
+{
+	if (level == SD_LVL_CPU) {
+		sd->min_interval = 4;
+		sd->max_interval = 128;
+
+		sd->flags &= ~(SD_WAKE_AFFINE | SD_PREFER_SIBLING);
+
+		sd->balance_interval = 32;
+	}
+}
Many of these differences are just bitrot / accidents, and the different min interval for tile was already taken care of by basing the intervals off of the domain weight.
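To make "basing the intervals off of the domain weight" concrete, here is a minimal user-space sketch. It assumes one possible rule (minimum interval equal to the number of CPUs in the domain, maximum twice that); the struct and function names are hypothetical and not taken from the patch under discussion.

```c
/* Hypothetical container for the three balance intervals of a domain. */
struct sd_intervals {
	unsigned long min_interval;
	unsigned long max_interval;
	unsigned long balance_interval;
};

/*
 * Derive the intervals from the domain weight (number of CPUs spanned).
 * Assumed rule: min = weight, max = 2 * weight, initial balance = weight.
 */
static struct sd_intervals sd_intervals_from_weight(unsigned int sd_weight)
{
	struct sd_intervals iv = {
		.min_interval     = sd_weight,
		.max_interval     = 2UL * sd_weight,
		.balance_interval = sd_weight,
	};

	return iv;
}
```

With a rule like this no architecture needs to hard-code values such as 4/128/32; a larger domain simply gets proportionally longer intervals.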
On that, you should also not rely on these SD_LVL things; if we wanted to inject an extra level they'd go all funny.
I agree that this patch doesn't separate behaviour and topology and I will consider this going forward.
Please take the patch I did and work from there.
We might indeed have to have a single arch_() function that adds SD_flags, but please restrict the flags it can set -- never allow it to set behavioural flags.
Understood. Simply exporting an sd_domain pointer is a no-go.
I was more thinking along the lines of:
unsigned long arch_sd_flags(unsigned long sd_flags)
{
	return 0;
}
Used like:
extra_sd_flags = arch_sd_flags(sd->flags);
if (extra_sd_flags & FOO) {
	WARN(1, "silly bugger: %lx\n", extra_sd_flags);
	extra_sd_flags &= ~FOO;
}
sd->flags |= extra_sd_flags;
Or something.
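A slightly fuller user-space sketch of the same idea, using an allow-list rather than a deny-list so that any flag not explicitly marked as topology is rejected. The numeric flag values, the SD_ARCH_ALLOWED mask and the helper sd_apply_arch_flags() are assumptions made for illustration; the arch hook here deliberately misbehaves to show the filtering.

```c
#include <stdio.h>

/* Illustrative values only; the real SD_* constants live in the kernel headers. */
#define SD_WAKE_AFFINE		0x01UL	/* behavioural */
#define SD_PREFER_SIBLING	0x02UL	/* behavioural */
#define SD_SHARE_CPUPOWER	0x04UL	/* topology: siblings share core resources */
#define SD_SHARE_PKG_RESOURCES	0x08UL	/* topology: CPUs share a cache */

/* Hypothetical allow-list: the only flags an arch hook may add. */
#define SD_ARCH_ALLOWED	(SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES)

/* Stand-in arch hook that wrongly returns a behavioural flag as well. */
static unsigned long arch_sd_flags(unsigned long sd_flags)
{
	(void)sd_flags;
	return SD_SHARE_PKG_RESOURCES | SD_PREFER_SIBLING;
}

/* Merge the arch-provided flags, dropping (and reporting) anything not allowed. */
static unsigned long sd_apply_arch_flags(unsigned long sd_flags)
{
	unsigned long extra = arch_sd_flags(sd_flags);
	unsigned long bad = extra & ~SD_ARCH_ALLOWED;

	if (bad) {
		fprintf(stderr, "arch tried to set behavioural flags: %#lx\n", bad);
		extra &= ~bad;
	}

	return sd_flags | extra;
}
```

The allow-list means a newly introduced behavioural flag is rejected by default instead of having to be added to a FOO-style mask.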
Furthermore, I think we want to allow the arch to override the base topology; we've wanted to add per-arch levels in the past, e.g. an L2 level for some x86 variants.
I don't quite understand this one. Are you saying that one idea for the topology side of things is to have an extra arch-specific sd level, which would then be the only sd_domain level that the arch could override?
No, allow an arch to fully override default_topology[] like I did in that s390 case.
The case where the x86 cpu_core_map != cpu_llc_shared_map can currently not be represented. Luckily no recent chips have had this, but there was a time when this was a popular configuration (Intel Core2Quad).
There have been other 'fun' cases, like AMD putting two nodes in 1 package. That's not something we can represent (not sure we need to, but still).
And there's the AMD Bulldozer shared-core thing, which we currently model the same as SMT but which could do with some tweaks - they're distinct in that they share a 'large' L2.
Then there's the Xeon-7400 which is 'similar' to the AMD in that cores share L2.
Anyway, all I'm saying is that even within one architecture there's sufficient variation to allow for runtime topology creation. I'm sure ARM has plenty of weird configurations too.
And I'm not sure we need to represent all the weird variations, but in the past we've had moments where we wanted to but could not (sanely) do things.
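As a rough illustration of what fully overriding default_topology[] could look like, here is a user-space sketch of a NULL-terminated table of levels that an arch may replace at runtime. All names (struct topo_level, set_topology(), the mask helpers) and the 4-CPU layout are hypothetical; the layout mimics the Core2Quad-style case above, where pairs of cores share an L2 that does not span the package.

```c
#include <stddef.h>

#define SD_SHARE_PKG_RESOURCES	0x08UL	/* illustrative value only */

/* Returns a bitmask of the CPUs that share this level with @cpu. */
typedef unsigned long (*topo_mask_fn)(int cpu);

struct topo_level {
	const char	*name;
	topo_mask_fn	mask;
	unsigned long	sd_flags;	/* topology flags only, no behaviour */
};

/* Hypothetical 4-CPU package: CPUs {0,1} and {2,3} each share an L2. */
static unsigned long l2_mask(int cpu)  { return 0x3UL << (cpu & ~1); }
static unsigned long pkg_mask(int cpu) { (void)cpu; return 0xFUL; }

/* Default table: assumes the cache is shared across the whole package. */
static const struct topo_level default_topology[] = {
	{ "MC", pkg_mask, SD_SHARE_PKG_RESOURCES },
	{ NULL, NULL, 0 },
};

/* Arch override: an extra L2 level below a package level without shared cache. */
static const struct topo_level split_l2_topology[] = {
	{ "L2",  l2_mask,  SD_SHARE_PKG_RESOURCES },
	{ "PKG", pkg_mask, 0 },
	{ NULL, NULL, 0 },
};

static const struct topo_level *sched_topology = default_topology;

/* Replace the whole table rather than patching individual levels. */
static void set_topology(const struct topo_level *tl)
{
	sched_topology = tl;
}

/* Number of levels in the active table (up to the NULL terminator). */
static int topology_depth(void)
{
	int depth = 0;

	while (sched_topology[depth].mask)
		depth++;

	return depth;
}
```

Because generic code only walks the table, injecting a level does not depend on fixed SD_LVL-style indices, and the arch supplies masks and topology flags without touching behavioural settings.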