Hi Viresh,
On Fri, Jan 19, 2018 at 11:28:16AM +0530, Viresh Kumar wrote:
On 11-01-18, 16:44, Pavan Kondeti wrote:
What is the WALT window size (walt_ravg_window) in your setup? I see it is 20 msec here [1], but just want to double-check. Reducing the WALT window size makes it more responsive. Have you tried a different window size, like 10 msec?
Hi Pavan,
Sorry for the delayed response, I wanted to make sure I get back after running some tests with the 10 ms thing.
- So yes, the earlier tests were with the 20 ms window size. I have tested it again with 10 ms and the results have surely improved significantly. No power numbers yet.
Good to know :-) btw what is HZ in your setup? Is it 300?
- Another thing to notice is that the PCMark Video-decoding and Work-V2 tests show proper numbers after this change, which were failing earlier and giving 0 as the results.
It is not clear how the WALT window size is breaking these tests. I have not seen any issues (on an internal platform) with a 20 msec WALT window size.
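On the responsiveness point, here is a rough sketch of why a smaller window reacts faster. It is a toy model, not the WALT code: it assumes the reported utilization is simply the busy fraction of the most recently completed window, whereas real WALT also keeps a history of several windows. The function name and the 5 msec start offset are made up for illustration.

```python
import math

def ms_until_full_util(start_ms, window_ms):
    """Delay after start_ms until one fully busy window has completed.

    Toy model: windows are aligned at multiples of window_ms and a task
    becomes 100% busy at start_ms. The signal can only report 100% once
    a window that was busy from start to end has rolled over.
    """
    first_full_window_start = math.ceil(start_ms / window_ms) * window_ms
    return first_full_window_start + window_ms - start_ms

# A task that goes busy 5 msec into a window (hypothetical offset).
for window_ms in (20, 10):
    print(window_ms, "msec window ->", ms_until_full_util(5, window_ms), "msec")
```

In this model the worst-case delay is just under two windows, so halving the window roughly halves the time before the governor can see a fully busy CPU.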
A few observations from the PELT vs WALT tests I performed:
The cpufreq statistics show that with WALT the big cluster runs at the highest OPP for significantly longer than with PELT, which plays a significant role in boosting WALT's performance numbers.
This may happen due to one of the two reasons (or both):
- The slope of the signal is higher and so cpu_util() increases rapidly.
- We put (more) tasks on the bigger CPU quickly with Walt.
My test shows that the main reason is the second one and it isn't really about the slope of the signal thing.
The results I presented in the WALT vs PELT session were obtained with schedtune.boost = 10 for top-app. So the main thread of PCMark (AsyncTask) is always running on the BIG cluster for both PELT and WALT, and the benefit is purely coming from the fast frequency ramp-up of WALT. The gap is reduced from ~23% to ~10% with Patrick's util_est patches.
If you don't have the boost and util_est features, you would see better BIG cluster residency with WALT since PELT forgets the history. The main thread runs for about 500 msec and sleeps for ~900 msec in many subscores.
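To put rough numbers on "PELT forgets the history", here is a small sketch. It is a simplified continuous model, not the kernel code: it only assumes the PELT signal is geometric with a half-life of about 32 msec, and the helper names are made up for illustration.

```python
import math

HALF_LIFE_MS = 32.0  # PELT halves a past contribution roughly every 32 msec

def decay(util, idle_ms):
    """Utilization left after idle_ms of sleeping (0.0 .. 1.0 scale)."""
    return util * 0.5 ** (idle_ms / HALF_LIFE_MS)

def ramp(util, busy_ms):
    """Utilization after busy_ms of continuous running."""
    return 1.0 - (1.0 - util) * 0.5 ** (busy_ms / HALF_LIFE_MS)

def ms_to_reach(target, start_util=0.0):
    """Continuous run time needed to climb from start_util to target."""
    return HALF_LIFE_MS * math.log2((1.0 - start_util) / (1.0 - target))

# One cycle of the pattern above: run ~500 msec, then sleep ~900 msec.
after_run = ramp(0.0, 500)
after_sleep = decay(after_run, 900)

print(f"after 500 msec running:  {after_run:.5f}")
print(f"after 900 msec sleeping: {after_sleep:.2e}")
print(f"msec to get back to 80%: {ms_to_reach(0.8, after_sleep):.1f}")
```

In this model the signal is essentially zero after the 900 msec sleep, so on every wakeup the task has to build its utilization from scratch, which takes tens of msec before it looks big enough for the BIG cluster or a high OPP.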
- I also performed Walt tests without walt_cpu_high_irqload() thing and the results weren't significantly different than walt with it. So that isn't playing a big role either.
walt_cpu_high_irqload() does not come into the picture at all for PCMark or for any other CPU benchmark. It is meant to avoid placing tasks on CPUs that are busy with IRQs and softirqs, for example in use cases involving high-rate WiFi and data transfers.
Thanks, Pavan
-- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project.