On 2016-09-16 10:58, Dietmar Eggemann wrote:
On 15/09/16 15:43, Srivatsa Vaddagiri wrote:
* Dietmar Eggemann <dietmar.eggemann@arm.com> [2016-09-13 12:50:15]:
This patch implements an alternative window-based CPU utilization tracking mechanism in the scheduler. Per-task and per-CPU counters are updated with utilization statistics using a synchronized (across CPUs) time source, and a single statistic (prev_runnable_sum) is fed to the registered utilization callback listeners. A windowed view of time
What are these 'registered utilization callback listeners'?
Anyone that has an interest in WALT data, mainly cpufreq governors. Also, a minor correction here - I don't think we envisioned any registration mechanism here to share WALT data. In our production kernel, the cpufreq (interactive) governor pulls the data whenever it needs it by calling a function exported by the scheduler.
Ok, understood ... just stumbled upon this registration thing ...
sched/cpufreq.c exposes the registration API for governors. It currently allows just one callback function, but I suppose that may change in the future. I'll disambiguate this a bit.
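To make the "just one callback function" point concrete, here is a minimal user-space sketch of a single-slot hook registry. It is not the sched/cpufreq.c code: every name below (util_hook_fn, util_hook_register, util_hook_unregister, util_hook_notify, record_util) and the callback signature are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: the scheduler side hands over a timestamp,
 * the current utilization and the maximum capacity. */
typedef void (*util_hook_fn)(void *data, uint64_t time, uint64_t util,
			     uint64_t max);

/* Single slot: at most one listener can be registered at a time. */
static util_hook_fn registered_hook;
static void *registered_data;

/* Returns 0 on success, -1 if fn is NULL or the slot is already taken. */
static int util_hook_register(util_hook_fn fn, void *data)
{
	if (fn == NULL || registered_hook != NULL)
		return -1;

	registered_data = data;
	registered_hook = fn;
	return 0;
}

static void util_hook_unregister(void)
{
	registered_hook = NULL;
	registered_data = NULL;
}

/* Called by the (modelled) scheduler whenever utilization changes. */
static void util_hook_notify(uint64_t time, uint64_t util, uint64_t max)
{
	if (registered_hook != NULL)
		registered_hook(registered_data, time, util, max);
}

/* Example listener: remembers the last utilization value it was given. */
static void record_util(void *data, uint64_t time, uint64_t util,
			uint64_t max)
{
	(void)time;
	(void)max;
	*(uint64_t *)data = util;
}
```

A second util_hook_register() call fails until the first listener is removed, which is the single-listener restriction in its simplest form. The pull model used in the production kernel mentioned above would not need any of this; the governor would simply call an accessor.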
[...]
(window size determined by walt_ravg_window) is used to determine CPU utilization.
There are two per-CPU-rq quantities maintained by WALT, both normalized to the max possible frequency and the max efficiency (IPC) of that CPU:
curr_runnable_sum: aggregate utilization of all tasks that
executed during the current (not yet completed) window
prev_runnable_sum: aggregate utilization of all tasks that
executed during the most recent completed window
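The relationship between the two sums can be sketched in user space as follows. This is only a model under stated assumptions, not the patch's code: the struct and helper names are hypothetical, only frequency scaling is shown (the efficiency/IPC factor is left out), and a busy interval is assumed not to straddle a window boundary.

```c
#include <stdint.h>

/* Hypothetical per-rq model. The two *_runnable_sum names follow the
 * description above; everything else is assumed for illustration. */
struct walt_rq_model {
	uint64_t window_start;      /* start of the current window, ns */
	uint64_t window_size;       /* cf. walt_ravg_window, ns */
	uint64_t curr_runnable_sum; /* busy time in the current window */
	uint64_t prev_runnable_sum; /* busy time in the last full window */
};

/* Roll the window forward so that 'now' lies inside the current window. */
static void walt_model_update_window(struct walt_rq_model *rq, uint64_t now)
{
	uint64_t elapsed;

	if (now < rq->window_start + rq->window_size)
		return;

	elapsed = (now - rq->window_start) / rq->window_size;

	/*
	 * One window elapsed: the current sum becomes the previous sum.
	 * More than one: the most recent completed window was fully idle.
	 */
	rq->prev_runnable_sum = (elapsed == 1) ? rq->curr_runnable_sum : 0;
	rq->curr_runnable_sum = 0;
	rq->window_start += elapsed * rq->window_size;
}

/*
 * Account 'delta' ns of busy time ending at 'now', scaled to the maximum
 * frequency. Assumes the interval lies entirely in the current window.
 */
static void walt_model_account_busy(struct walt_rq_model *rq, uint64_t now,
				    uint64_t delta, uint64_t cur_freq,
				    uint64_t max_freq)
{
	walt_model_update_window(rq, now);
	rq->curr_runnable_sum += delta * cur_freq / max_freq;
}
```

With a 20 ms window, 4 ms of execution at half the maximum frequency contributes 2 ms to curr_runnable_sum, and that value moves to prev_runnable_sum once the window completes.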
prev_runnable_sum is the primary statistic used to guide CPU frequency in lieu of PELT's cfs_rq->util_avg. No additional policy is imposed on this
s/cfs_rq->util_avg/cfs_rq->avg.util_avg
statistic, the assumption being that the consumer (e.g., schedutil) will perform appropriate policy decisions (e.g., margin) before deciding the next P-state.
The former paragraph is related to 'return (util >= capacity) ? capacity : util;' in cpu_util()? Just asking because otherwise IMHO this is no different to PELT util.
Not sure I follow you here. Which "former" paragraph is being referred to here?
I was referring to the text above.
To add some clarity on the "policy" stuff Vikram is referring to here, prev_runnable_sum refers to the actual busy time incurred in the previous window. How that is used to decide on the next frequency involves consideration of the desired headroom or idle time. For example, a CPU that was busy for 99% of the previous window when it was running at some frequency f1 may or may not see a frequency increase for the next window, depending on the "idle time" goals set by the user (which is the policy aspect involved here).
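The 99%-busy example can be worked through with a small sketch of such a consumer-side policy. This is an assumption about how a governor might apply an idle-time goal, not what schedutil or the interactive governor actually does: next_freq_for_target_load() and its target_load_pct parameter are hypothetical, and busy_ns stands for a prev_runnable_sum-style value already normalized to max_freq.

```c
#include <stdint.h>

/*
 * Hypothetical policy helper: pick the lowest frequency at which the
 * normalized busy time of the previous window would occupy at most
 * target_load_pct percent of the window. Frequencies are in kHz.
 * Inputs are assumed small enough that the 64-bit product below does
 * not overflow.
 */
static uint64_t next_freq_for_target_load(uint64_t busy_ns,
					  uint64_t window_ns,
					  uint64_t max_freq,
					  unsigned int target_load_pct)
{
	uint64_t freq;

	/* Degenerate inputs: fall back to the maximum frequency. */
	if (window_ns == 0 || target_load_pct == 0)
		return max_freq;

	freq = (max_freq * busy_ns * 100) /
	       (window_ns * (uint64_t)target_load_pct);

	/* Same kind of clamp as the (util >= capacity) check in cpu_util(). */
	return freq > max_freq ? max_freq : freq;
}
```

Take a 20 ms window, f1 = 1 GHz and a 2 GHz maximum: 99% busy at f1 gives a normalized busy time of 9.9 ms. With a 99% target load the helper returns 1 GHz, so there is no increase; with an 80% target it returns about 1.24 GHz. The statistic is the same in both cases; only the policy differs.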
Understood, will have another look into the code trying to grasp it.
This was more of a standalone intro to schedutil+WALT. I really wanted to say that we don't do anything different from schedutil+PELT in terms of enforcing policy, and that the reported number is a true frequency- and capacity-invariant utilization.
Thanks, Vikram