Hi Juri & Morten,
Thanks for taking the quick call today. Here are some of the patches from the EAS V3 which I think are outstanding dependencies for sched-freq to be posted on LKML:
#4 "sched: Make sched entity usage tracking frequency-invariant"
#11 "sched: Make load tracking frequency scale-invariant"
#13 "cpufreq: Architecture specific callback for frequency changes"
Vincent's fix for setting current freq (should be rolled into #13 above): https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0...
#14 "arm: Frequency invariant scheduler load-tracking support"
Let me know what you think.
Regards, Mike
Hi Mike,
On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
Hi Juri & Morten,
Thanks for taking the quick call today. Here are some of the patches from the EAS V3 which I think are outstanding dependencies for sched-freq to be posted on LKML:
#4 "sched: Make sched entity usage tracking frequency-invariant"
This one is already in tip/sched/core with a slightly different subject ('...scale-invariant' instead of '...frequency-invariant'). The patch is the same.
#11 "sched: Make load tracking frequency scale-invariant"
This patch only affect load (runnable priority scaled). If you rely only on usage/utilization (get_cpu_usage()) for the sched/dvfs patches, I don't think you need this one?
#13 "cpufreq: Architecture specific callback for frequency changes"
Vincent's fix for setting current freq (should be rolled into #13 above): https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0...
#14 "arm: Frequency invariant scheduler load-tracking support"
Yes, these two are the essential ones I think. Since we no longer use a __weak function to implement arch_scale_freq_capacity() they will have to be modified a bit. I should be able to that today and provide you with new versions of those two.
Let me know what you think.
To me it looks like it boils down to two patches. If you want blocked utilization as well you can add:
# sched: Include blocked utilization in usage tracking
but that isn't essential.
Including those two in your posting shouldn't cause much harm I think as long we make it clear where they came from and they will continue to into the EAS RFCs.
Thanks, Morten
On Wed, Apr 08, 2015 at 11:38:10AM +0100, Morten Rasmussen wrote:
On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
[...]
Let me know what you think.
To me it looks like it boils down to two patches. If you want blocked utilization as well you can add:
# sched: Include blocked utilization in usage tracking
but that isn't essential.
You will need the patch before as well:
#18 sched: Track blocked utilization contributions
for blocked utilization.
Morten
On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
Hi Mike,
On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
Hi Juri & Morten,
Thanks for taking the quick call today. Here are some of the patches from the EAS V3 which I think are outstanding dependencies for sched-freq to be posted on LKML:
#4 "sched: Make sched entity usage tracking frequency-invariant"
This one is already in tip/sched/core with a slightly different subject ('...scale-invariant' instead of '...frequency-invariant'). The patch is the same.
#11 "sched: Make load tracking frequency scale-invariant"
This patch only affect load (runnable priority scaled). If you rely only on usage/utilization (get_cpu_usage()) for the sched/dvfs patches, I don't think you need this one?
Correct.
#13 "cpufreq: Architecture specific callback for frequency changes"
Vincent's fix for setting current freq (should be rolled into #13 above): https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0...
#14 "arm: Frequency invariant scheduler load-tracking support"
Yes, these two are the essential ones I think. Since we no longer use a __weak function to implement arch_scale_freq_capacity() they will have to be modified a bit. I should be able to that today and provide you with new versions of those two.
Great!
Let me know what you think.
To me it looks like it boils down to two patches. If you want blocked utilization as well you can add:
# sched: Include blocked utilization in usage tracking
but that isn't essential.
I'll skip the blocked load contribution for now.
Including those two in your posting shouldn't cause much harm I think as long we make it clear where they came from and they will continue to into the EAS RFCs.
Ack.
Thanks, Mike
Thanks, Morten
On Wed, Apr 08, 2015 at 04:16:15PM +0100, Mike Turquette wrote:
On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
Hi Mike,
On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
[...]
#13 "cpufreq: Architecture specific callback for frequency changes"
Vincent's fix for setting current freq (should be rolled into #13 above): https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0...
#14 "arm: Frequency invariant scheduler load-tracking support"
Yes, these two are the essential ones I think. Since we no longer use a __weak function to implement arch_scale_freq_capacity() they will have to be modified a bit. I should be able to that today and provide you with new versions of those two.
Great!
I have pushed two updated patches to the usual repo based on a fairly recent snapshot of tip/sched/core. I rolled in an optimization as well so the code looks slightly different now. I did a quick test on TC2 and it appears to be working.
Please let me know if it doesn't :)
http://www.linux-arm.org/git?p=linux-power.git%3Ba=shortlog%3Bh=refs/heads/w...
Let me know what you think.
To me it looks like it boils down to two patches. If you want blocked utilization as well you can add:
# sched: Include blocked utilization in usage tracking
but that isn't essential.
I'll skip the blocked load contribution for now.
Ack.
Thanks, Morten
On Wed, Apr 08, 2015 at 05:47:33PM +0100, Morten Rasmussen wrote:
On Wed, Apr 08, 2015 at 04:16:15PM +0100, Mike Turquette wrote:
On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
Hi Mike,
On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
[...]
#13 "cpufreq: Architecture specific callback for frequency changes"
Vincent's fix for setting current freq (should be rolled into #13 above): https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0...
#14 "arm: Frequency invariant scheduler load-tracking support"
Yes, these two are the essential ones I think. Since we no longer use a __weak function to implement arch_scale_freq_capacity() they will have to be modified a bit. I should be able to that today and provide you with new versions of those two.
Great!
I have pushed two updated patches to the usual repo based on a fairly recent snapshot of tip/sched/core. I rolled in an optimization as well so the code looks slightly different now. I did a quick test on TC2 and it appears to be working.
Please let me know if it doesn't :)
http://www.linux-arm.org/git?p=linux-power.git%3Ba=shortlog%3Bh=refs/heads/w...
I looked into the issue with setting the current frequency at boot time where I included Vincent's fix in the branch above. It turns out that it isn't necessary with the new implementation of arch_scale_freq_capacity() as it will never return 0. If current frequency hasn't been reported by cpufreq (setting the default governor to userspace is the easiest test case) asfc() will return SCHED_CAPACITY_SCALE until the first frequency change occur. That should be fine I think.
One could argue that we are getting the scaling wrong until the first frequency change which should happen fairly soon after boot for any real system. If the frequency doesn't change, say stays at 80% (like it does for TC2 with userspace as default), there isn't much reason to track frequency changes in the first place.
I have removed the extra call to arch_scale_set_curr_freq() from the patch and pushed an updated branch.
Morten
On 10 April 2015 at 15:36, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Wed, Apr 08, 2015 at 05:47:33PM +0100, Morten Rasmussen wrote:
On Wed, Apr 08, 2015 at 04:16:15PM +0100, Mike Turquette wrote:
On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
Hi Mike,
On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
[...]
#13 "cpufreq: Architecture specific callback for frequency changes"
Vincent's fix for setting current freq (should be rolled into #13 above): https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0...
#14 "arm: Frequency invariant scheduler load-tracking support"
Yes, these two are the essential ones I think. Since we no longer use a __weak function to implement arch_scale_freq_capacity() they will have to be modified a bit. I should be able to that today and provide you with new versions of those two.
Great!
I have pushed two updated patches to the usual repo based on a fairly recent snapshot of tip/sched/core. I rolled in an optimization as well so the code looks slightly different now. I did a quick test on TC2 and it appears to be working.
Please let me know if it doesn't :)
http://www.linux-arm.org/git?p=linux-power.git%3Ba=shortlog%3Bh=refs/heads/w...
I looked into the issue with setting the current frequency at boot time where I included Vincent's fix in the branch above. It turns out that it isn't necessary with the new implementation of arch_scale_freq_capacity() as it will never return 0. If current frequency hasn't been reported by cpufreq (setting the default governor to userspace is the easiest test case) asfc() will return SCHED_CAPACITY_SCALE until the first frequency change occur. That should be fine I think.
One could argue that we are getting the scaling wrong until the first frequency change which should happen fairly soon after boot for any real
This can even never happen according to the governor you use
system. If the frequency doesn't change, say stays at 80% (like it does for TC2 with userspace as default), there isn't much reason to track frequency changes in the first place.
With current implementation, if the freq of a cpu is not set by cpufreq, arch_scale_freq_capacity will stay with default value whatever the real freq. If a cluster boots with powersave governor and the current freq is already min freq for this cluster, arch_scale_freq_capacity will return max capacity instead of min
Vincent
I have removed the extra call to arch_scale_set_curr_freq() from the patch and pushed an updated branch.
Morten _______________________________________________ eas-dev mailing list eas-dev@lists.linaro.org https://lists.linaro.org/mailman/listinfo/eas-dev
On Fri, Apr 10, 2015 at 02:51:18PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 15:36, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Wed, Apr 08, 2015 at 05:47:33PM +0100, Morten Rasmussen wrote:
On Wed, Apr 08, 2015 at 04:16:15PM +0100, Mike Turquette wrote:
On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
Hi Mike,
On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
[...]
#13 "cpufreq: Architecture specific callback for frequency changes"
Vincent's fix for setting current freq (should be rolled into #13 above): https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0...
#14 "arm: Frequency invariant scheduler load-tracking support"
Yes, these two are the essential ones I think. Since we no longer use a __weak function to implement arch_scale_freq_capacity() they will have to be modified a bit. I should be able to that today and provide you with new versions of those two.
Great!
I have pushed two updated patches to the usual repo based on a fairly recent snapshot of tip/sched/core. I rolled in an optimization as well so the code looks slightly different now. I did a quick test on TC2 and it appears to be working.
Please let me know if it doesn't :)
http://www.linux-arm.org/git?p=linux-power.git%3Ba=shortlog%3Bh=refs/heads/w...
I looked into the issue with setting the current frequency at boot time where I included Vincent's fix in the branch above. It turns out that it isn't necessary with the new implementation of arch_scale_freq_capacity() as it will never return 0. If current frequency hasn't been reported by cpufreq (setting the default governor to userspace is the easiest test case) asfc() will return SCHED_CAPACITY_SCALE until the first frequency change occur. That should be fine I think.
One could argue that we are getting the scaling wrong until the first frequency change which should happen fairly soon after boot for any real
This can even never happen according to the governor you use
system. If the frequency doesn't change, say stays at 80% (like it does for TC2 with userspace as default), there isn't much reason to track frequency changes in the first place.
With current implementation, if the freq of a cpu is not set by cpufreq, arch_scale_freq_capacity will stay with default value whatever the real freq. If a cluster boots with powersave governor and the current freq is already min freq for this cluster, arch_scale_freq_capacity will return max capacity instead of min
All agreed. But do you think it is a problem?
If you run at a fixed frequency (like the performance and powersave, and even userspace in case the user does requrest any chances), does it really matter that arch_scale_freq_capacity is returning max capacity? It will artificially inflate the utilization/usage tracking, but since we aren't changing frequency the scaling is the same for everyone.
One could of course come up with scenarios where you choose the powersave governor as default on big.LITTLE-like systems where one cluster would have a very wide frequency range and the other didn't. In that case your scaling could be far off the actual capacity. I'm not sure if it worth fixing that one. It shouldn't break anything.
I can put it back in if you think it is necessary. It is not necessary anymore to prevent arch_scale_freq_capacity() from returning 0.
Morten
On 10 April 2015 at 16:14, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Fri, Apr 10, 2015 at 02:51:18PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 15:36, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Wed, Apr 08, 2015 at 05:47:33PM +0100, Morten Rasmussen wrote:
On Wed, Apr 08, 2015 at 04:16:15PM +0100, Mike Turquette wrote:
On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen morten.rasmussen@arm.com wrote:
Hi Mike,
On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
[...]
> #13 "cpufreq: Architecture specific callback for frequency changes" > > Vincent's fix for setting current freq (should be rolled into #13 above): > https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0... > > #14 "arm: Frequency invariant scheduler load-tracking support"
Yes, these two are the essential ones I think. Since we no longer use a __weak function to implement arch_scale_freq_capacity() they will have to be modified a bit. I should be able to that today and provide you with new versions of those two.
Great!
I have pushed two updated patches to the usual repo based on a fairly recent snapshot of tip/sched/core. I rolled in an optimization as well so the code looks slightly different now. I did a quick test on TC2 and it appears to be working.
Please let me know if it doesn't :)
http://www.linux-arm.org/git?p=linux-power.git%3Ba=shortlog%3Bh=refs/heads/w...
I looked into the issue with setting the current frequency at boot time where I included Vincent's fix in the branch above. It turns out that it isn't necessary with the new implementation of arch_scale_freq_capacity() as it will never return 0. If current frequency hasn't been reported by cpufreq (setting the default governor to userspace is the easiest test case) asfc() will return SCHED_CAPACITY_SCALE until the first frequency change occur. That should be fine I think.
One could argue that we are getting the scaling wrong until the first frequency change which should happen fairly soon after boot for any real
This can even never happen according to the governor you use
system. If the frequency doesn't change, say stays at 80% (like it does for TC2 with userspace as default), there isn't much reason to track frequency changes in the first place.
With current implementation, if the freq of a cpu is not set by cpufreq, arch_scale_freq_capacity will stay with default value whatever the real freq. If a cluster boots with powersave governor and the current freq is already min freq for this cluster, arch_scale_freq_capacity will return max capacity instead of min
All agreed. But do you think it is a problem?
If you run at a fixed frequency (like the performance and powersave, and even userspace in case the user does requrest any chances), does it really matter that arch_scale_freq_capacity is returning max capacity? It will artificially inflate the utilization/usage tracking, but since we aren't changing frequency the scaling is the same for everyone.
IHMO, it's matter because the utilization/usage and the weighted_load sooner or later will not track the right value which will disturb the load balancing.
One could of course come up with scenarios where you choose the powersave governor as default on big.LITTLE-like systems where one cluster would have a very wide frequency range and the other didn't. In that case your scaling could be far off the actual capacity. I'm not sure if it worth fixing that one. It shouldn't break anything.
i agree that the system will not crash but it will change the scheduler behavior which will use inaccurate figures. Then, it's not break anything for now but we don't know what could be the future use of this value so we should fix it now that we have noticed it instead of waiting more visible damage linked to a new use of the value
Viincent
I can put it back in if you think it is necessary. It is not necessary anymore to prevent arch_scale_freq_capacity() from returning 0.
Morten
On Fri, Apr 10, 2015 at 03:43:45PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 16:14, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Fri, Apr 10, 2015 at 02:51:18PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 15:36, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Wed, Apr 08, 2015 at 05:47:33PM +0100, Morten Rasmussen wrote:
On Wed, Apr 08, 2015 at 04:16:15PM +0100, Mike Turquette wrote:
On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen morten.rasmussen@arm.com wrote: > Hi Mike, > > On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
[...]
>> #13 "cpufreq: Architecture specific callback for frequency changes" >> >> Vincent's fix for setting current freq (should be rolled into #13 above): >> https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0... >> >> #14 "arm: Frequency invariant scheduler load-tracking support" > > Yes, these two are the essential ones I think. Since we no longer use a > __weak function to implement arch_scale_freq_capacity() they will have > to be modified a bit. I should be able to that today and provide you > with new versions of those two.
Great!
I have pushed two updated patches to the usual repo based on a fairly recent snapshot of tip/sched/core. I rolled in an optimization as well so the code looks slightly different now. I did a quick test on TC2 and it appears to be working.
Please let me know if it doesn't :)
http://www.linux-arm.org/git?p=linux-power.git%3Ba=shortlog%3Bh=refs/heads/w...
I looked into the issue with setting the current frequency at boot time where I included Vincent's fix in the branch above. It turns out that it isn't necessary with the new implementation of arch_scale_freq_capacity() as it will never return 0. If current frequency hasn't been reported by cpufreq (setting the default governor to userspace is the easiest test case) asfc() will return SCHED_CAPACITY_SCALE until the first frequency change occur. That should be fine I think.
One could argue that we are getting the scaling wrong until the first frequency change which should happen fairly soon after boot for any real
This can even never happen according to the governor you use
system. If the frequency doesn't change, say stays at 80% (like it does for TC2 with userspace as default), there isn't much reason to track frequency changes in the first place.
With current implementation, if the freq of a cpu is not set by cpufreq, arch_scale_freq_capacity will stay with default value whatever the real freq. If a cluster boots with powersave governor and the current freq is already min freq for this cluster, arch_scale_freq_capacity will return max capacity instead of min
All agreed. But do you think it is a problem?
If you run at a fixed frequency (like the performance and powersave, and even userspace in case the user does requrest any chances), does it really matter that arch_scale_freq_capacity is returning max capacity? It will artificially inflate the utilization/usage tracking, but since we aren't changing frequency the scaling is the same for everyone.
IHMO, it's matter because the utilization/usage and the weighted_load sooner or later will not track the right value which will disturb the load balancing.
One could of course come up with scenarios where you choose the powersave governor as default on big.LITTLE-like systems where one cluster would have a very wide frequency range and the other didn't. In that case your scaling could be far off the actual capacity. I'm not sure if it worth fixing that one. It shouldn't break anything.
i agree that the system will not crash but it will change the scheduler behavior which will use inaccurate figures. Then, it's not break anything for now but we don't know what could be the future use of this value so we should fix it now that we have noticed it instead of waiting more visible damage linked to a new use of the value
I will put it back in then :)
However, it doesn't mean that we are all good for all cpufreq drivers. AFAICT, drivers are not required to export a 'get' function which means that set policy->cur might not get set. Also, the drivers using 'set_policy' instead of target, might not fire the change notifiers. So there are plenty of other cases to worry about that are worse I think.
Morten
On 10 April 2015 at 17:09, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Fri, Apr 10, 2015 at 03:43:45PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 16:14, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Fri, Apr 10, 2015 at 02:51:18PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 15:36, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Wed, Apr 08, 2015 at 05:47:33PM +0100, Morten Rasmussen wrote:
On Wed, Apr 08, 2015 at 04:16:15PM +0100, Mike Turquette wrote: > On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen > morten.rasmussen@arm.com wrote: > > Hi Mike, > > > > On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote:
[...]
> >> #13 "cpufreq: Architecture specific callback for frequency changes" > >> > >> Vincent's fix for setting current freq (should be rolled into #13 above): > >> https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0... > >> > >> #14 "arm: Frequency invariant scheduler load-tracking support" > > > > Yes, these two are the essential ones I think. Since we no longer use a > > __weak function to implement arch_scale_freq_capacity() they will have > > to be modified a bit. I should be able to that today and provide you > > with new versions of those two. > > Great!
I have pushed two updated patches to the usual repo based on a fairly recent snapshot of tip/sched/core. I rolled in an optimization as well so the code looks slightly different now. I did a quick test on TC2 and it appears to be working.
Please let me know if it doesn't :)
http://www.linux-arm.org/git?p=linux-power.git%3Ba=shortlog%3Bh=refs/heads/w...
I looked into the issue with setting the current frequency at boot time where I included Vincent's fix in the branch above. It turns out that it isn't necessary with the new implementation of arch_scale_freq_capacity() as it will never return 0. If current frequency hasn't been reported by cpufreq (setting the default governor to userspace is the easiest test case) asfc() will return SCHED_CAPACITY_SCALE until the first frequency change occur. That should be fine I think.
One could argue that we are getting the scaling wrong until the first frequency change which should happen fairly soon after boot for any real
This can even never happen according to the governor you use
system. If the frequency doesn't change, say stays at 80% (like it does for TC2 with userspace as default), there isn't much reason to track frequency changes in the first place.
With current implementation, if the freq of a cpu is not set by cpufreq, arch_scale_freq_capacity will stay with default value whatever the real freq. If a cluster boots with powersave governor and the current freq is already min freq for this cluster, arch_scale_freq_capacity will return max capacity instead of min
All agreed. But do you think it is a problem?
If you run at a fixed frequency (like the performance and powersave, and even userspace in case the user does requrest any chances), does it really matter that arch_scale_freq_capacity is returning max capacity? It will artificially inflate the utilization/usage tracking, but since we aren't changing frequency the scaling is the same for everyone.
IHMO, it's matter because the utilization/usage and the weighted_load sooner or later will not track the right value which will disturb the load balancing.
One could of course come up with scenarios where you choose the powersave governor as default on big.LITTLE-like systems where one cluster would have a very wide frequency range and the other didn't. In that case your scaling could be far off the actual capacity. I'm not sure if it worth fixing that one. It shouldn't break anything.
i agree that the system will not crash but it will change the scheduler behavior which will use inaccurate figures. Then, it's not break anything for now but we don't know what could be the future use of this value so we should fix it now that we have noticed it instead of waiting more visible damage linked to a new use of the value
I will put it back in then :)
However, it doesn't mean that we are all good for all cpufreq drivers. AFAICT, drivers are not required to export a 'get' function which means that set policy->cur might not get set. Also, the drivers using 'set_policy' instead of target, might not fire the change notifiers. So there are plenty of other cases to worry about that are worse I think.
So we should also solve them too :-)
Vincent
Morten
On Fri, Apr 10, 2015 at 04:11:48PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 17:09, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Fri, Apr 10, 2015 at 03:43:45PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 16:14, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Fri, Apr 10, 2015 at 02:51:18PM +0100, Vincent Guittot wrote:
On 10 April 2015 at 15:36, Morten Rasmussen morten.rasmussen@arm.com wrote:
On Wed, Apr 08, 2015 at 05:47:33PM +0100, Morten Rasmussen wrote: > On Wed, Apr 08, 2015 at 04:16:15PM +0100, Mike Turquette wrote: > > On Wed, Apr 8, 2015 at 3:38 AM, Morten Rasmussen > > morten.rasmussen@arm.com wrote: > > > Hi Mike, > > > > > > On Tue, Apr 07, 2015 at 09:53:42PM +0100, Mike Turquette wrote: > > [...] > > > >> #13 "cpufreq: Architecture specific callback for frequency changes" > > >> > > >> Vincent's fix for setting current freq (should be rolled into #13 above): > > >> https://git.linaro.org/people/vincent.guittot/kernel.git/commitdiff/3ac2b6a0... > > >> > > >> #14 "arm: Frequency invariant scheduler load-tracking support" > > > > > > Yes, these two are the essential ones I think. Since we no longer use a > > > __weak function to implement arch_scale_freq_capacity() they will have > > > to be modified a bit. I should be able to that today and provide you > > > with new versions of those two. > > > > Great! > > I have pushed two updated patches to the usual repo based on a fairly > recent snapshot of tip/sched/core. I rolled in an optimization as well > so the code looks slightly different now. I did a quick test on TC2 and > it appears to be working. > > Please let me know if it doesn't :) > > http://www.linux-arm.org/git?p=linux-power.git%3Ba=shortlog%3Bh=refs/heads/w...
I looked into the issue with setting the current frequency at boot time where I included Vincent's fix in the branch above. It turns out that it isn't necessary with the new implementation of arch_scale_freq_capacity() as it will never return 0. If current frequency hasn't been reported by cpufreq (setting the default governor to userspace is the easiest test case) asfc() will return SCHED_CAPACITY_SCALE until the first frequency change occur. That should be fine I think.
One could argue that we are getting the scaling wrong until the first frequency change which should happen fairly soon after boot for any real
This can even never happen according to the governor you use
system. If the frequency doesn't change, say stays at 80% (like it does for TC2 with userspace as default), there isn't much reason to track frequency changes in the first place.
With current implementation, if the freq of a cpu is not set by cpufreq, arch_scale_freq_capacity will stay with default value whatever the real freq. If a cluster boots with powersave governor and the current freq is already min freq for this cluster, arch_scale_freq_capacity will return max capacity instead of min
All agreed. But do you think it is a problem?
If you run at a fixed frequency (like the performance and powersave, and even userspace in case the user does requrest any chances), does it really matter that arch_scale_freq_capacity is returning max capacity? It will artificially inflate the utilization/usage tracking, but since we aren't changing frequency the scaling is the same for everyone.
IHMO, it's matter because the utilization/usage and the weighted_load sooner or later will not track the right value which will disturb the load balancing.
One could of course come up with scenarios where you choose the powersave governor as default on big.LITTLE-like systems where one cluster would have a very wide frequency range and the other didn't. In that case your scaling could be far off the actual capacity. I'm not sure if it worth fixing that one. It shouldn't break anything.
i agree that the system will not crash but it will change the scheduler behavior which will use inaccurate figures. Then, it's not break anything for now but we don't know what could be the future use of this value so we should fix it now that we have noticed it instead of waiting more visible damage linked to a new use of the value
I will put it back in then :)
However, it doesn't mean that we are all good for all cpufreq drivers. AFAICT, drivers are not required to export a 'get' function which means that set policy->cur might not get set. Also, the drivers using 'set_policy' instead of target, might not fire the change notifiers. So there are plenty of other cases to worry about that are worse I think.
So we should also solve them too :-)
But I don't want to :-)
cpufreq is a nightmare of optional driver features. And we actually don't want to get the information from there if we could avoid it.
We can't really fix it for drivers like intel_pstate.c and longrun.c as they don't report anything back unless explicitly asked to do so. cpufreq just feeds longrun.c with a max, min, and a 'policy' using set_policy and let it do what it wants. I don't want to hack in polling in cpufreq to feed information back to arch/* code through the arch_scale_set*() functions when arch/* could avoid polling completely by replicating the quite simple code in the driver 'get'-function implementation.
I think we should just keep the arch_scale_* callbacks as an optional thing in cpufreq just like the rest and make sure that it doesn't blow up :-)