Hi Morten,
On Thu, Oct 27, 2016 at 05:14:58PM +0100, Morten Rasmussen wrote:
[...]
Tested video playback on Juno for LB_MIN vs rb tree:
LB_MIN     Nrg:LITTLE   Nrg:Big     Nrg:Sum
           11.3122      8.983429    20.295629
           11.337446    8.174061    19.511507
           11.256941    8.547895    19.804836
           10.994329    9.633028    20.627357
           11.483148    8.522364    20.005512
avg.       11.2768128   8.7721554   20.0489682
stdev                               0.431777914

rb tree    Nrg:LITTLE   Nrg:Big     Nrg:Sum
           11.384301    8.412714    19.797015
           11.673992    8.455219    20.129211
           11.586081    8.414606    20.000687
           11.423509    8.64781     20.071319
           11.43709     8.595252    20.032342
avg.       11.5009946   8.5051202   20.0061148
stdev                               0.1263371635
vs LB_MIN  +1.99%       -3.04%      -0.21%
Should I read this as the energy benefit of the rb-tree solution being negligible? It seems to be much smaller than the error margins.
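The comparison against the error margin can be sketched in a few lines of Python. This is only an illustration: it takes the Nrg:Sum samples quoted above, uses the sample standard deviation (n - 1), and applies a deliberately crude noise check (is the shift in the mean smaller than the larger of the two spreads?). The names `summarize` and `within_noise` are made up for this example.

```python
from statistics import mean, stdev

# Nrg:Sum samples copied from the numbers quoted above (five runs each).
lb_min = [20.295629, 19.511507, 19.804836, 20.627357, 20.005512]
rb_tree = [19.797015, 20.129211, 20.000687, 20.071319, 20.032342]


def summarize(samples):
    """Return (mean, sample standard deviation) of the energy samples."""
    return mean(samples), stdev(samples)


lb_avg, lb_sd = summarize(lb_min)
rb_avg, rb_sd = summarize(rb_tree)

# Difference of the means, absolute and relative to the LB_MIN baseline.
delta = rb_avg - lb_avg
rel_delta_pct = 100.0 * delta / lb_avg

# Crude check (an assumption of this sketch, not a formal significance test):
# treat the result as noise if the mean moved less than the run-to-run spread.
within_noise = abs(delta) < max(lb_sd, rb_sd)

print(f"LB_MIN : {lb_avg:.4f} +/- {lb_sd:.4f}")
print(f"rb tree: {rb_avg:.4f} +/- {rb_sd:.4f}")
print(f"delta  : {delta:+.4f} ({rel_delta_pct:+.2f}%), within noise: {within_noise}")
```

With only five runs per configuration a proper test (for example Welch's t-test) would be more convincing, but the sketch is enough to compare the size of the change with the spread.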
From the power data on Juno, the final difference is quite small. But if we look at the energy for the big cluster and the LITTLE cluster separately, we can see the benefit: the big cores' energy is reduced and more tasks run on the LITTLE cores.
From my previous experience, the power optimization patches can save 10% power on another b.L system for the camera case, but they save only 2.97% CPU power for video playback on Juno. This is because video playback does not have enough small tasks; for a case with many small tasks we can see more benefit.
Do you have a suggestion for a power test case on Juno?
In the end it is real energy consumption that matters; shifting energy from big to little doesn't really matter if the sum is the same, which seems to be the case for the Juno test.
You are suggesting that video doesn't have enough small tasks. Would it be possible to use rtapp to generate a bunch of small tasks of roughly similar size to those you see for the camera case?
Following up on your suggestion, I wrote an rt-app JSON script that generates a synthetic workload for the camera use case and uploaded it to GitHub [1].
Based on this camera case, I ran several tests on Juno to collect power data. In the tests below, I set the big cluster's scaling_min_freq to 600MHz, 1000MHz and 1200MHz separately. This deliberately enlarges the power efficiency difference between the two clusters, so we can see how much benefit we get from the power optimization patches.
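For reference, one way to pin the big cluster's minimum frequency is to write to the cpufreq sysfs files. The sketch below is a hypothetical helper, not the script used for these tests. It assumes that CPUs 1 and 2 are the big (Cortex-A57) cores on Juno, which should be checked against `related_cpus` on the actual board, and it takes the sysfs root as a parameter so it can be tried against a scratch directory first. cpufreq expects the value in kHz.

```python
from pathlib import Path

# Assumption: CPUs 1 and 2 form the big cluster on Juno. Verify with
# /sys/devices/system/cpu/cpu*/cpufreq/related_cpus before relying on it.
BIG_CPUS = (1, 2)


def set_scaling_min_freq(khz, cpus=BIG_CPUS, sysfs_root="/sys/devices/system/cpu"):
    """Write `khz` to scaling_min_freq for each CPU and return the paths written.

    Writing to the real sysfs tree requires root. CPUs that share a cpufreq
    policy see the same value, so writing to each of them is redundant but
    harmless.
    """
    written = []
    for cpu in cpus:
        path = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq" / "scaling_min_freq"
        path.write_text(f"{khz}\n")
        written.append(path)
    return written
```

Calling `set_scaling_min_freq(1000000)` as root would correspond to the 1000MHz case; the value has to be lowered again afterwards to restore the default behaviour.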
Big cluster's scaling_min_freq = 600MHz:
              W/o power opt. (J)   With power opt. (J)   Comparison
Big Nrg.      22.30                10.44                 -53.18%
LITTLE Nrg.   119.98               131.58                +9.67%
Total Nrg.    142.28               142.02                -0.18%

Big cluster's scaling_min_freq = 1000MHz:
              W/o power opt. (J)   With power opt. (J)   Comparison
Big Nrg.      28.53                12.85                 -54.96%
LITTLE Nrg.   120.47               132.06                +9.62%
Total Nrg.    149.00               144.91                -2.74%

Big cluster's scaling_min_freq = 1200MHz:
              W/o power opt. (J)   With power opt. (J)   Comparison
Big Nrg.      38.39                16.58                 -56.81%
LITTLE Nrg.   120.01               131.35                +9.45%
Total Nrg.    158.40               147.93                -6.61%
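The comparison column can be recomputed from the joule values as a sanity check. The sketch below is illustrative only: the numbers are copied from the three cases above, and the names `results`, `pct_change` and `compare` are made up for this example.

```python
# (without, with) power optimization, energy in joules, keyed by the big
# cluster's scaling_min_freq in MHz. Values copied from the cases above.
results = {
    600: {"big": (22.30, 10.44), "little": (119.98, 131.58)},
    1000: {"big": (28.53, 12.85), "little": (120.47, 132.06)},
    1200: {"big": (38.39, 16.58), "little": (120.01, 131.35)},
}


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


def compare(case):
    """Per-cluster and total energy change for one scaling_min_freq setting."""
    total_before = case["big"][0] + case["little"][0]
    total_after = case["big"][1] + case["little"][1]
    return {
        "big": pct_change(*case["big"]),
        "little": pct_change(*case["little"]),
        "total": pct_change(total_before, total_after),
    }


for mhz, case in results.items():
    row = compare(case)
    print(f"{mhz:>4}MHz  big {row['big']:+.2f}%  "
          f"LITTLE {row['little']:+.2f}%  total {row['total']:+.2f}%")
```

The total is derived by summing the two clusters rather than taken from the Total Nrg. row, so the script also checks that the rows add up.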
One thing worth mentioning: I reviewed the power modeling on Juno-r2, and the tables below show its voltage and OPP mapping. For the LITTLE cluster, the 950MHz OPP raises the voltage to 1010mV, an 11.11% increase over the 909mV used at 800MHz. So the LITTLE core's 950MHz OPP has much worse power efficiency, and it should be similar to the big core's power efficiency at 600MHz. This is why, with the big cluster's minimum frequency set to 600MHz, we see no benefit in the total energy (-0.18%) even though small tasks are placed on the LITTLE cluster. So with Juno's default configuration the power optimization patches cannot show much benefit, but we should get a benefit when the two clusters have a bigger gap in power efficiency.
LITTLE core voltage level:
  450MHz  -> 829mV
  800MHz  -> 909mV
  950MHz  -> 1010mV

Big core voltage level:
  600MHz  -> 829mV
  1000MHz -> 909mV
  1200MHz -> 1010mV
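To put a rough number on the voltage effect, the sketch below uses the textbook approximation that dynamic power scales as C * V^2 * f, so the energy per cycle scales as V^2. This is an assumption of the example, not a measurement: it ignores static leakage and says nothing about the different capacitance of big and LITTLE cores, so it only compares OPPs within one cluster. The dictionaries are copied from the tables above and the function name is hypothetical.

```python
# OPP frequency (MHz) -> voltage (mV), copied from the tables above.
LITTLE_OPPS = {450: 829, 800: 909, 950: 1010}
BIG_OPPS = {600: 829, 1000: 909, 1200: 1010}


def relative_energy_per_cycle(opps):
    """Energy per cycle relative to the lowest OPP, assuming E ~ V^2."""
    v_min = opps[min(opps)]
    return {mhz: (mv / v_min) ** 2 for mhz, mv in opps.items()}


# Voltage step from the LITTLE 800MHz OPP to the 950MHz OPP, in percent.
voltage_step_pct = 100.0 * (LITTLE_OPPS[950] - LITTLE_OPPS[800]) / LITTLE_OPPS[800]

print(f"LITTLE 800MHz -> 950MHz voltage step: {voltage_step_pct:+.2f}%")
for name, opps in (("LITTLE", LITTLE_OPPS), ("big", BIG_OPPS)):
    for mhz, rel in relative_energy_per_cycle(opps).items():
        print(f"{name:>6} {mhz:>4}MHz: {rel:.2f}x energy per cycle vs lowest OPP")
```

Under this model the cost of a voltage step grows with the square of the voltage, which is consistent with the top LITTLE OPP being noticeably less efficient than the 800MHz one.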
[1] https://github.com/ARM-software/workload-automation/pull/284/commits/d3c59e2...
Thanks, Leo Yan