On 01/05/2018 11:43 AM, Viresh Kumar wrote:
Hello,
I did some comparisons of Pelt and Walt and have some very interesting performance results that I wanted to share with all of you. I don't have any power numbers, as I don't have a setup for measuring them.
Key points:
All the tests were done on Hikey960, with a 5V Fan placed over the SoC to cool it down.
HDMI port was disconnected while running tests.
CONFIG_SCHED_TUNE was configured out to keep things simple.
Only the PCMark benchmark was tested, with the help of Workload Automation.
The numbers below show the average of 3 runs, performed during a single kernel boot cycle.
Pelt 8/16/32 are the half-life periods.
While testing Pelt, CONFIG_WALT was disabled.
+------------------+----------+------------+------------+------------+
| Test name        | WALT     | Pelt 8 ms  | Pelt 16 ms | Pelt 32 ms |
+------------------+----------+------------+------------+------------+
| DataManipulation | 5341     | 5561       | 5453       | 5400       |
| PhotoEditingV2   | 9015     | 8577       | 7911       | 6043       |
| VideoEditing     | 0        | 4291       | 3746       | 3755       |
| WebV2            | 6202     | 6448       | 5465       | 4648       |
| Workv2           | 0        | 5697       | 5069       | 4517       |
| WritingV2        | 4302     | 4549       | 3811       | 3306       |
+------------------+----------+------------+------------+------------+
As you can see in the results Pelt 8 is very much comparable to the Walt results now. Hurray ? :)
A detailed report is present here with some more useful numbers:
How to replicate setup:
Android kernel tree: https://git.linaro.org/people/vireshk/mylinux.git android-4.9-hikey
This has several patches over latest 4.9-hikey aosp tree.
Some patches to reduce disturbances, which Vincent shared earlier with a document.
"thermal: Add debugfs support for cooling devices" and "cpufreq: stats: New sysfs attribute for clearing statistics" are used to read some more data from userspace after the tests are done. That data can be used to draw conclusions about how pelt/walt work and how they behave differently.
For example, we can know the amount of time we spent at individual cpu frequencies while the test was running, and also the time for which cpu-cooling and devfreq (ddr) have throttled some frequencies.
Pelt 16 and pelt 8 patches.
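As an illustration of the time-per-frequency data mentioned above, here is a minimal sketch (not the attached pelt_walt.sh, whose contents are not shown here) of how cpufreq statistics could be post-processed. It assumes the standard cpufreq stats file format of "<frequency in kHz> <time>" pairs, one per line, as found in /sys/devices/system/cpu/cpuN/cpufreq/stats/time_in_state. The function names and the sample values are made up for illustration.

```python
def parse_time_in_state(text):
    """Parse '<freq_khz> <time>' lines into a dict {freq_khz: time}."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        result[int(parts[0])] = int(parts[1])
    return result


def residency_percent(before, after):
    """Share of time (in %) spent at each frequency between two snapshots."""
    delta = {f: after[f] - before.get(f, 0) for f in after}
    total = sum(delta.values())
    if total == 0:
        return {f: 0.0 for f in delta}
    return {f: 100.0 * d / total for f, d in delta.items()}


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after a test run.
    before = parse_time_in_state("533000 100\n999000 50\n")
    after = parse_time_in_state("533000 150\n999000 200\n")
    for freq, pct in sorted(residency_percent(before, after).items()):
        print(f"{freq} kHz: {pct:.1f}%")
```

If the statistics are cleared right before the test, the "before" snapshot can simply be an empty dict.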
The below changes are required to capture the extra data that I have captured in my sheet above.
I have attached pelt_walt.sh script, which you need to push to /data:
$ adb push pelt_walt.sh /data
And I have updated the pcmark plugin file to run the script and collect data. That is attached as well.
Happy testing !!
I heard from Vincent that ARM did similar testing earlier on but never found anything significant. Why ? I may have an answer to that, not sure though.
I found a patch from Juri which someone is using:
https://android.googlesource.com/kernel/msm/+/b52bb1f248e4cef65edaece54a68c6...
and one of the problems here is that the patch hasn't updated the __accumulated_sum_N32 array, but only runnable_avg_yN_inv and runnable_avg_yN_sum.
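To show why the three tables have to change together, here is a rough sketch that derives all of them from a single decay factor y, chosen so that y^halflife = 0.5. It is not the kernel's own table generator: it uses floating point throughout, so the values can differ from the in-kernel tables by small rounding amounts. The function name, the dict keys and the number of accumulated entries are assumptions made for illustration.

```python
def pelt_tables(halflife_periods, n_multiples=12):
    """Derive PELT-style lookup tables for a given half-life (in periods)."""
    # Decay factor per period: y ** halflife_periods == 0.5
    y = 0.5 ** (1.0 / halflife_periods)

    # y^n scaled to 32-bit fixed point, for n = 0 .. halflife - 1
    yn_inv = [int(0xFFFFFFFF * y ** n) for n in range(halflife_periods)]

    # sums[n] = 1024 * (y + y^2 + ... + y^n), built incrementally
    sums = [0.0]
    for _ in range(halflife_periods * (n_multiples - 1)):
        sums.append((sums[-1] + 1024.0) * y)

    # Partial sums for n = 0 .. halflife
    yn_sum = [int(s) for s in sums[:halflife_periods + 1]]

    # Sums sampled at whole multiples of the half-life
    accumulated = [int(sums[k * halflife_periods]) for k in range(n_multiples)]

    return {"yn_inv": yn_inv, "yn_sum": yn_sum, "accumulated": accumulated}


if __name__ == "__main__":
    for halflife in (8, 16, 32):
        t = pelt_tables(halflife)
        print(halflife, t["yn_sum"][-1], t["accumulated"][-1])
```

Since every entry depends on y, changing the half-life in only two of the three tables leaves the third one describing a different decay curve.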
We are aware of the fact that reducing PELT's half-life periods can make the system more responsive. I remember that earlier product versions actually shipped with 16ms. But being more responsive is only one side of the problem. Energy consumption is the other big problem. And by making PELT too aggressive (making half-life period smaller) you risk overshooting of the signal and you reduce the effect of history, two things which could be fatal for energy savings.
IIRC, for newer product kernels, we went back to 32ms.
Driving this idea of reducing PELT's half-life period further ... you end up not using PELT at all but instantaneous load(/util) (se->load.weight, cfs_rq->load.weight).
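To put rough numbers on the responsiveness side of this trade-off, here is a simplified model of the geometric decay, treating one PELT period as roughly 1 ms and ignoring everything else the scheduler does. For a task that runs continuously starting from zero utilization, the normalized signal after n periods is 1 - y^n, with y^halflife = 0.5. The function names are made up for illustration.

```python
import math


def util_after(n_periods, halflife_ms):
    """Normalized utilization (0..1) of an always-running task after n periods."""
    y = 0.5 ** (1.0 / halflife_ms)
    return 1.0 - y ** n_periods


def periods_to_reach(target, halflife_ms):
    """Periods needed for an always-running task to reach `target` (0 < target < 1)."""
    return math.ceil(halflife_ms * math.log2(1.0 / (1.0 - target)))


if __name__ == "__main__":
    for halflife in (8, 16, 32):
        n = periods_to_reach(0.8, halflife)
        print(f"half-life {halflife} ms: ~{n} ms to reach 80% utilization")
```

In this model the time to reach a given level scales linearly with the half-life, so 8 ms reacts four times faster than 32 ms. The same property means the signal forgets history four times faster, and in the limit of a very short half-life it simply follows the instantaneous state.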
So the energy consumption values that wltests spits out for some tests are also very important to have. And then there is still the issue that h960 (especially close to mainline) is not mature enough to show the same results as when the tests run on a production device.
-- Dietmar
[...]