On 11/03/14 13:17, Peter Zijlstra wrote:
On Sat, Mar 08, 2014 at 12:40:58PM +0000, Dietmar Eggemann wrote:
I don't have a strong opinion about whether or not to use a cpu argument for setting the flags of a level (it was part of the initial proposal before we started to completely rework the build of sched_domain). Nevertheless, I see one potential concern: you can end up with completely different flag configurations for the same sd level on two cpus.
Could you elaborate a little bit further regarding the last sentence? Do you think that such completely different flag configurations would make it impossible for the load-balance code to work at all at this sd?
So a problem with such an interface is that it makes it far too easy to generate completely broken domains.
I see the point. What I'm still struggling to understand is why this interface is worse than the one where we set up additional, adjacent sd levels with new cpu_foo_mask functions plus different static sd-flags configurations, and then rely on the sd degenerate functionality in the core scheduler to fold these levels together to achieve different per-cpu sd flags configurations.
IMHO, exposing struct sched_domain_topology_level bar_topology[] to the arch is the reason why the core scheduler has to check if the arch provides a sane sd setup in both cases.
You can, for two cpus in the same domain, provide different flags; such a configuration doesn't make any sense at all.
Now I see why people would like to have this; but unless we can make it robust I'd be very hesitant to go this route.
By making it robust, I guess you mean that the core scheduler has to check that the provided set-ups are sane, something like the following code snippet in sd_init():
	if (WARN_ONCE(tl->sd_flags & ~TOPOLOGY_SD_FLAGS,
			"wrong sd_flags in topology description\n"))
		/* keep only the flags a topology level is allowed to set */
		tl->sd_flags &= TOPOLOGY_SD_FLAGS;
but for per-cpu set-ups. Obviously, this check has to be in sync with the usage of these flags in the core scheduler algorithms. This probably means that a subset of these topology sd flags has to be set for all cpus in an sd level, whereas others can be set for only some cpus.
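
To make the idea a bit more concrete, here is a rough userspace sketch of what such a per-cpu sanity check could look like. It is not kernel code: the flag values are made up, check_level_flags() and UNIFORM_SD_FLAGS are hypothetical names, and which flags would have to be uniform across a level is purely an assumption for illustration.

```c
#include <stdio.h>

/* Flag names follow the kernel; the values are illustrative only. */
#define SD_SHARE_CPUPOWER	0x1u
#define SD_SHARE_PKG_RESOURCES	0x2u
#define SD_SHARE_POWERDOMAIN	0x4u

/* Flags a topology level is allowed to set at all. */
#define TOPOLOGY_SD_FLAGS \
	(SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN)

/*
 * ASSUMPTION: the subset that must be identical for every cpu in a level.
 * The real subset would have to follow how the scheduler uses each flag.
 */
#define UNIFORM_SD_FLAGS	(SD_SHARE_CPUPOWER | SD_SHARE_PKG_RESOURCES)

/*
 * Hypothetical helper: flags[cpu] holds the sd flags an arch provided for
 * each cpu of one sd level. Unknown bits are stripped in place. Returns 0
 * if the set-up was sane, -1 if something had to be fixed or is inconsistent.
 */
static int check_level_flags(unsigned int *flags, int nr_cpus)
{
	unsigned int ref;
	int cpu, ret = 0;

	if (nr_cpus <= 0)
		return 0;

	/* Per-cpu variant of the sd_init() check: drop bits we don't know. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (flags[cpu] & ~TOPOLOGY_SD_FLAGS) {
			fprintf(stderr, "cpu%d: wrong sd_flags in topology description\n", cpu);
			flags[cpu] &= TOPOLOGY_SD_FLAGS;
			ret = -1;
		}
	}

	/* The uniform subset must match across all cpus of the level. */
	ref = flags[0] & UNIFORM_SD_FLAGS;
	for (cpu = 1; cpu < nr_cpus; cpu++) {
		if ((flags[cpu] & UNIFORM_SD_FLAGS) != ref) {
			fprintf(stderr, "cpu%d: sd_flags differ from cpu0 within one level\n", cpu);
			ret = -1;
		}
	}

	return ret;
}
```

With this split, two cpus that differ only in SD_SHARE_POWERDOMAIN would pass, while two cpus that disagree on SD_SHARE_PKG_RESOURCES would be flagged as an inconsistent level.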