On Wed, Apr 29, 2015 at 03:12:38PM +0100, Daniel Lezcano wrote:
On 04/29/2015 03:33 PM, Vincent Guittot wrote:
On 29 April 2015 at 15:27, Daniel Lezcano daniel.lezcano@linaro.org wrote:
On 04/29/2015 03:17 PM, Vincent Guittot wrote:
On 29 April 2015 at 15:10, Daniel Lezcano daniel.lezcano@linaro.org wrote:
On 04/29/2015 02:34 PM, Vincent Guittot wrote:
On 29 April 2015 at 14:22, Daniel Lezcano daniel.lezcano@linaro.org wrote: > > > On 04/29/2015 01:15 PM, Vincent Guittot wrote: >> >> >> >> On 29 April 2015 at 12:34, Daniel Lezcano daniel.lezcano@linaro.org >> wrote: >>> >>> >>> >>> On 04/27/2015 09:46 AM, Michael Turquette wrote: >>>> >>>> >>>> >>>> >>>> From: Morten Rasmussen Morten.Rasmussen@arm.com >>>> >>>> Implements arch-specific function to provide the scheduler with a >>>> frequency scaling correction factor for more accurate load-tracking. >>>> The >>>> factor is: >>>> >>>> current_freq(cpu) << SCHED_CAPACITY_SHIFT / max_freq(cpu) >>>> >>>> This implementation only provides frequency invariance. No >>>> micro-architecture invariance yet. >>>> >>>> Signed-off-by: Morten Rasmussen morten.rasmussen@arm.com >>>> --- >>>> changes since internal v1: >>>> * replaced two commits from eas v3 with this new one from Morten >>>> >>>> arch/arm/include/asm/topology.h | 7 ++++++ >>>> arch/arm/kernel/smp.c | 53 >>>> +++++++++++++++++++++++++++++++++++++++-- >>>> arch/arm/kernel/topology.c | 17 +++++++++++++ >>>> 3 files changed, 75 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/arch/arm/include/asm/topology.h >>>> b/arch/arm/include/asm/topology.h >>>> index 2fe85ff..4b985dc 100644 >>>> --- a/arch/arm/include/asm/topology.h >>>> +++ b/arch/arm/include/asm/topology.h >>>> @@ -24,6 +24,13 @@ void init_cpu_topology(void); >>>> void store_cpu_topology(unsigned int cpuid); >>>> const struct cpumask *cpu_coregroup_mask(int cpu); >>>> >>>> +#define arch_scale_freq_capacity arm_arch_scale_freq_capacity >>> >>> >>> >>> >>> >>> What is for this macro ? >> >> >> >> >> This is used to doesn't add any useless computation in the hot path >> when arch_scale_freq_capacity is not used. This has been asked by >> peter > > > > > What is the difference with having a dummy empty function with a 'weak' > attribute (which is how it is done currently in the kernel) ?
You can have a look at the thread for the full discussion: https://lkml.org/lkml/2015/3/24/113
Thanks for the pointer. The link seems down for the moment, but I was able to dig in the different folder and find the thread in my mailbox (that would be easier if I have been cc'ed).
It is not clear for the me why the macro is better than 'weak'. It sounds like using the 'weak' attribute is the best thing to do, no ?
the weak function adds the useless sequence: value *= 1024 value >>=10
whereas the macro doesn't add any additional instruction
So if the macro is not defined for the architecture, the compilation will fail.
no there is a test in sched.h
#ifndef arch_scale_freq_capacity static __always_inline unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu) { return SCHED_CAPACITY_SCALE; } #endif
Ah, ok I didn't see this one.
I don't see why the function below is not right:
void __weak arch_scale_freq_capacity(struct sched_domain *sd, int cpu) { return SCHED_CAPACITY_SCALE;
}
AFAIU, it doesn't ensure that the function will be inlined and optimized by the compiler.
AFAIU, the __weak attribute guarantees that the function cannot be inlined at all. The __weak symbols are resolved at link time, not at compile time, hence they cannot be inlined. So Daniel's example above, which is pretty much what we used to have prior to 4.1, will always lead to a function call. In addition to that, the compiler has no chance of figuring out that it always returns a constant which is paired with a shift or division in every call site in fair.c which could all have been optimized out.
The _weak trick was fine while we didn't call arch_scale_freq_capacity() very often. Now that we call it all the time, we have to minimize the overhead and the #define trick Peter came up with seems to be the only feasible way to get zero overhead when the architecture doesn't care about frequency scaling of capacity.
Morten