Hi Dietmar,
On 2016-09-13 04:50, Dietmar Eggemann wrote:
Hi Vikram,
I ran the patch set on my x86 box to get familiar with the WALT internals. Unfortunately I'm not part of the EAS product initiative, so this is my first encounter with it. However, my humble knowledge of PELT might help in creating a WALT patch set that LKML folk might appreciate.
Cheers,
-- Dietmar
Thank you for taking the time to actually run the code in addition to reviewing it! Very much appreciated. You and Juri have already brought up good points.
On 03/09/16 00:27, markivx@codeaurora.org wrote:
This patch series implements an alternative window-assisted load tracking mechanism in lieu of PELT-based cpu utilization tracking. Testing has shown that a window-based, non-decaying metric such as WALT guiding cpu frequency and task placement decisions can improve performance/power, especially when running workloads more commonly found on mobile devices. The aim of this series is to incorporate WALT accounting into the scheduler and feed WALT statistics to schedutil in order to guide cpu frequency selection. The implementation is detailed in the commit text of Patch 1. The eventual goal is to also guide placement decisions based on WALT statistics.
By placement you mean EAS/capacity aware wakeup or does it include load-balance?
Wake-up placement mostly; I should clarify that. Using WALT statistics in load-balancing decisions is something to be investigated. This also leads to the question of whether we want to completely eliminate PELT stats for a truly complete comparison.
WALT has existed in out-of-tree kernels for ARM/ARM64 commercialized devices for a few years. This is an effort to bring WALT to mainline as well as to test on multiple architectures and with varied workloads.
This RFC version is mainly to preview what the code will look like on mainline. Future RFC revisions will include a theoretical discussion and benchmark results.
This would be the more interesting part: the 'why' (TM Morten R.) behind replacing PELT util with WALT. This is also relevant in light of Vincent G.'s recent effort on LKML to fix util for task groups.
Tested on an Intel x86_64 machine (on top of 4.7-rc6). (Benchmark results will be sent out separately, and will be included in this message in the next RFC version.)
I ran 'perf bench sched messaging -g 20 -l 5000' on my 'Intel Core i7-4750HQ CPU @ 2.00GHz' on v4.7 and v4.7+ and didn't see much difference. So we probably need some more clever workloads to evaluate the overhead of this extra locking.
[...]
Perhaps measuring function call execution times during migration, and in other cases where we have additional locking, might at least provide a data point. The original CFS patch that removed double_lock_balance from load balancing did not have data along with the patch.
I will also instrument some of the WALT update functions to see how much overhead is added compared to the existing __update_load_avg code. It will also be interesting to see how much the PELT math costs.
Also, power measurement using RAPL counters is fairly easy on Intel. It seems to give me consistent results, and we're looking for A-B comparisons anyway. WALT utilization does pretty well on things like video playback.
Thanks, Vikram