Hi,
I was investigating performance issues that we see in some use cases, such as video playback, on OMAP platforms with the ondemand governor. As part of this, I found a tool called cpufreq-bench (http://lwn.net/Articles/339862), which can be used to determine the performance impact of the ondemand governor compared to the performance governor. When I ran this tool on an OMAP3 (ZOOM3) platform with a 2.6.36 kernel using the command below, the worst-case ondemand performance was only 35% of that of the performance governor.

    cpufreq-bench -l 50000 -s 100000 -x 50000 -y 100000 -g ondemand -r 5 -n 5 -v

I tried the same on x86 platforms, and there the worst-case performance is around 88%. Attached are the cpufreq-bench logs for x86 and OMAP3.

Questions:
1. Is this a known limitation of the ondemand governor?
2. How do we support system use cases (such as video playback) with the ondemand governor if the governor is not able to scale frequencies in real time? Are applications expected to adjust scaling_min_freq to raise the MPU frequency?

Regards,
Vishwa
The general problem here is that the ondemand governor is aimed more at power savings than performance. In cases where the ondemand governor performs worse than the performance governor, the "sampling_down_factor" tunable is often useful. I submitted the patch to add this tunable a few weeks ago and it was acked by Venki, but I don't know what happened to it after that. It helps in two ways:
1) the governor spends less overhead re-evaluating load when the CPU is truly busy
2) the governor is a lot less eager to downshift when the CPU is busy -- without this patch, even on a busy system ondemand will blip down in clock speed surprisingly often, hurting performance.
This patch is all about improving peak load performance. On quite a few loads I've tried, this patch with a sampling_down_factor of 100 matches the performance governor quite well, while the original ondemand performance was poor. On the other hand, it is not much help if you are trying to minimize power consumption on light to medium loads. If you set sampling_down_factor to "1", it preserves the default behavior.
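The two effects described above can be illustrated with a toy model. This is only a sketch under loud assumptions: it is not the kernel's ondemand code, it knows just two speeds (max and not-max), it uses a single hypothetical up_threshold of 80 for both directions, and the function name `simulate` and the load trace are made up for illustration. The one idea it borrows from the description is that, while at max speed, the next load evaluation is delayed by sampling_down_factor sampling periods.

```python
def simulate(load_trace, sampling_down_factor=1, up_threshold=80):
    """Toy two-speed governor model.

    load_trace holds one CPU load percentage per sampling period.
    Returns (evaluations, periods_at_max).
    """
    at_max = False
    wait = 0              # sampling periods left before the next evaluation
    evaluations = 0
    periods_at_max = 0
    for load in load_trace:
        if wait == 0:
            evaluations += 1
            at_max = load > up_threshold
            # Assumption: at max speed the interval is stretched by the factor.
            wait = sampling_down_factor if at_max else 1
        wait -= 1
        if at_max:
            periods_at_max += 1
    return evaluations, periods_at_max


# A mostly busy load with a short dip every fourth sampling period.
TRACE = [95, 95, 60, 95] * 5
```

With this trace, `simulate(TRACE, sampling_down_factor=1)` evaluates the load 20 times and spends 15 periods at max speed, while `simulate(TRACE, sampling_down_factor=10)` evaluates it 3 times and spends 19 periods at max speed. The absolute numbers mean nothing for real hardware; the point is only the direction of both effects.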
David C Niemi
Thanks David for the inputs. I tried your patch. In addition to that I reduced transition_latency. With these 2 changes, I do see much better results (worst case performance of ondemand is 88%).
Vishwa
Vishwa,
Have you had a chance to do some usetime tests with these changes?
It would be interesting to measure the power consumption with and without these changes.
/Amit
On Tue, Nov 23, 2010 at 5:59 PM, Vishwanath Sripathy vishwanath.sripathy@linaro.org wrote:
> Thanks David for the inputs. I tried your patch. In addition to that I reduced transition_latency. With these 2 changes, I do see much better results (worst case performance of ondemand is 88%).
> Vishwa
linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
Amit,

On Tue, Nov 23, 2010 at 8:22 PM, Amit Kucheria amit.kucheria@linaro.org wrote:
> Vishwa,
> Have you had a chance to do some usetime tests with these changes?

I did test USB performance with this, and I see that ondemand reaches about 90% of the performance governor's result.

> It would be interesting to measure the power consumption with and without these changes.

The power consumption impact can vary from use case to use case, and extra performance will have some power impact. However, in the idle scenario I feel this should not have much impact, since the ondemand timer is a deferrable timer, which means that it does not prevent cpuidle. I will try to measure it for some use cases and compare the power impact.

Vishwa
Thanks for running the tests, Vishwa. Your results are what I'd expect but it's good to see independent confirmation. In my benchmarks I saw 95-100% of the performance governor's performance, but the conditions were more favorable and the original ondemand governor was "only" degrading performance 20-30% to begin with.
There should be absolutely no changes in power consumption at all for the patch itself, as behavior does not change until you raise sampling_down_factor above 1 (the default). If you set it high, I would expect higher power consumption (but also higher performance) under load and no change in power consumption when idle or close to idle. Setting a high sampling_down_factor causes the governor to reevaluate load less often when at max cpu speed, both to reduce overhead and to let it remain at maximum performance more consistently. Without this change, the ondemand governor jitters a lot in and out of max clock speed when under high loads, which is why its performance can be much worse than the performance governor. Reducing the number of transitions and load evaluations should also improve performance per watt, though the details of that depend on the relative efficiency of the CPU's respective clock speeds.
If you want to balance power consumption and performance, a middle setting of sampling_down_factor like "10" should make a noticeable improvement in performance while not having as much impact on power. But if you want to match the performance governor's performance and are less concerned about transient power consumption, you will want to set it higher.
Another note: I recommend setting io_is_busy to 1 when using sampling_down_factor above 1, as it improves responsiveness to quick load transients involving some I/O. It's also worth considering lowering up_threshold to 50 or even down to 15-20.
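As a rough sketch of how the settings suggested above could be applied from user space: the tunable names and the values 10, 1 and 50 come from this message, but the sysfs directory below is an assumption (its location differs between kernel versions), and the helper name `apply_tunables` is hypothetical. Writing to the real directory needs root.

```python
from pathlib import Path

# Assumed location of the ondemand tunables; check your kernel before relying on it.
ONDEMAND_DIR = Path("/sys/devices/system/cpu/cpufreq/ondemand")

# The "middle" setting from the message above, plus io_is_busy and a lower up_threshold.
BALANCED = {"sampling_down_factor": 10, "io_is_busy": 1, "up_threshold": 50}


def apply_tunables(settings, base_dir=ONDEMAND_DIR):
    """Write each tunable under base_dir.

    Returns the names that were skipped because the file does not exist,
    for example on a kernel that lacks sampling_down_factor.
    """
    base = Path(base_dir)
    skipped = []
    for name, value in settings.items():
        target = base / name
        if not target.is_file():
            skipped.append(name)
            continue
        target.write_text(f"{value}\n")
    return skipped
```

Calling `apply_tunables(BALANCED)` as root would apply the balanced settings; passing a different `base_dir` makes it possible to try the helper against a scratch directory first.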
David C Niemi
Thanks David. If I would like to fine-tune up_threshold and sampling_down_factor for, say, the OMAP platform, is there any way to do it in the kernel itself? I know these are configurable via sysfs entries, but I would like to set optimized values in the kernel. I see that the default values are set in cpufreq_ondemand.c, which is a common kernel file. Can they be set in platform-specific code?
Vishwa
On Thursday 25 November 2010 13:05:49 Vishwanath Sripathy wrote:
> Thanks David. If I would like to fine-tune up_threshold and sampling_down_factor for, say, the OMAP platform, is there any way to do it in the kernel itself? I know these are configurable via sysfs entries, but I would like to set optimized values in the kernel. I see that the default values are set in cpufreq_ondemand.c, which is a common kernel file. Can they be set in platform-specific code?

Ugly to do... Should this be done globally and not per CPU? If per CPU, it could be added to the policy, similar to the latency: set in the driver's init function and evaluated in the governor later.

Possibly a struct could be added to cpufreq.h:

    struct {
        unsigned int sampling_down_factor;
        ...
    } cpufreq_governor_hints;

It would be filled in by the platform driver in its .init function and then evaluated in the governor, but only once, also at init time. Doing this when the governor is already active involves locking, which must be avoided.
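The flow proposed above (driver fills in hints at init, governor reads them once at its own init) can be modeled outside the kernel. The following is a plain Python model, not kernel code and not an existing cpufreq interface: `GovernorHints`, `governor_init`, `omap_driver_init` and the default up_threshold of 80 are all hypothetical names and values chosen for illustration; only the sampling_down_factor default of 1 is taken from this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GovernorHints:
    """Hypothetical per-platform hints; None means 'keep the governor default'."""
    sampling_down_factor: Optional[int] = None
    up_threshold: Optional[int] = None


# Illustrative defaults; up_threshold=80 is an assumption made for this sketch.
GOVERNOR_DEFAULTS = {"sampling_down_factor": 1, "up_threshold": 80}


def governor_init(hints: Optional[GovernorHints] = None) -> dict:
    """Resolve the tunables once, at governor init time.

    Because the hints are read only here and never while the governor
    is running, this model needs no locking.
    """
    tunables = dict(GOVERNOR_DEFAULTS)
    if hints is not None:
        for name in tunables:
            value = getattr(hints, name)
            if value is not None:
                tunables[name] = value
    return tunables


def omap_driver_init() -> GovernorHints:
    """Hypothetical platform driver init that states its preferred values."""
    return GovernorHints(sampling_down_factor=10)
```

In this model, `governor_init(omap_driver_init())` keeps the default up_threshold and overrides only sampling_down_factor, while `governor_init()` with no hints returns the defaults unchanged.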
Thomas
On Mon, Nov 22, 2010 at 11:09:03AM -0500, David C Niemi wrote:
> The general problem here is that the ondemand governor is aimed more at power savings than performance. In cases where the ondemand governor performs worse than the performance governor, the "sampling_down_factor" tunable is often useful. I submitted the patch to add this tunable a few weeks ago and it was acked by Venki, but I don't know what happened to it after that.
Would you like to get it merged into linux-linaro? Given it's been ack'd I think Nicolas might be willing to consider it:
http://kerneltrap.org/mailarchive/linux-kernel/2010/10/6/4628889/thread
Nicolas, can you please merge this patch into the Linaro tree?
Vishwa
I certainly have no objections to it going into the Linaro tree, though I was hoping to get it into the main kernel tree too.
DCN
On Mon, 29 Nov 2010, David C Niemi wrote:
> I certainly have no objections to it going into the Linaro tree, though I was hoping to get it into the main kernel tree too.
What might prevent it from going into mainline at the moment, if anything?
On Mon, Nov 29, 2010 at 10:38:52AM -0500, Nicolas Pitre wrote:
> On Mon, 29 Nov 2010, David C Niemi wrote:
> > I certainly have no objections to it going into the Linaro tree, though I was hoping to get it into the main kernel tree too.
> What might prevent it from going into mainline at the moment, if anything?
What patch are we talking about? The sampling_down_factor tunable was added in commit 3f78a9f7fcee0e9b44a15f39ac382664e301fad5
Dave
Excellent! I had not seen that, but I'm glad it made it in. Thanks for the update.
DCN