Re: [Eas-dev] [RFC PATCH v1 1/3] sched: Introduce Window Assisted Load Tracking (WALT) to track CPU utilization

16 Sep 2016

      On 16/09/16 19:42, Srivatsa Vaddagiri wrote:
...

Juri Lelli juri.lelli@arm.com [2016-09-16 10:22:52]:

...
...
We realize that not all architectures have a hardware clock that is synchronized
across CPUs, and I think it should still be possible to have synchronized windows
as long as the frequency of the hardware clock is the same on all cpus. That would
be the next major change WALT needs to address.
Do you think it might be actually possible to relax the synchronization
constraint and implement WALT with un-synchronized windows?
The main difficulty would be to adjust busy counters when tasks migrate.
Synchronized windows would make this pretty trivial. We subtract task's current
window contribution from src_cpu and add that to dst_cpu.
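
The fixup described above could be sketched roughly as follows. This is only an illustration: the struct and function names are hypothetical and much simpler than anything in the patch, and the caller is assumed to hold both runqueue locks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; not the structures from the patch. */
struct cpu_sketch {
    uint64_t window_start;   /* start of the current window, in ns */
    uint64_t curr_busy_ns;   /* busy time accumulated in the current window */
};

struct task_sketch {
    uint64_t curr_window_ns; /* this task's contribution to the current window */
};

/*
 * Move a migrating task's current-window contribution from src to dst.
 * The caller is assumed to hold both CPUs' runqueue locks.
 */
static void migrate_busy_time(const struct task_sketch *t,
                              struct cpu_sketch *src, struct cpu_sketch *dst)
{
    /* The plain subtract/add is only valid when both CPUs share a window. */
    assert(src->window_start == dst->window_start);
    assert(src->curr_busy_ns >= t->curr_window_ns);

    src->curr_busy_ns -= t->curr_window_ns;
    dst->curr_busy_ns += t->curr_window_ns;
}
```

With un-synchronized windows the first assertion would not hold, and the task's contribution would have to be split or rescaled across two differently aligned windows.
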
Right. PELT does the removed_{load,utilization} atomic dance to solve
this problem. But, of course, it is not as immediate as an atomic src/dst
update as you do. Signals are still an approximation of real execution
though, so it remains to be seen how much the added locking pays off.
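
For comparison, the deferred removal pattern mentioned above might look something like this in user-space C11. It is a sketch of the idea only, with hypothetical names and stdatomic in place of the kernel's atomic helpers: the migration path records the removed utilization without taking the source lock, and the source CPU folds it in on its next regular update.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified stand-in for a per-CPU runqueue. */
struct cfs_rq_sketch {
    long util_avg;                /* owned by this CPU, updated under its lock */
    atomic_long removed_util_avg; /* written by remote CPUs without the lock */
};

/* Migration path: runs without the source runqueue lock. */
static void remove_task_util(struct cfs_rq_sketch *src, long task_util)
{
    assert(task_util >= 0);
    atomic_fetch_add(&src->removed_util_avg, task_util);
}

/* Next regular update on the source CPU: fold in whatever was removed. */
static void update_util(struct cfs_rq_sketch *rq)
{
    long removed = atomic_exchange(&rq->removed_util_avg, 0);

    /* Clamp at zero so a decayed signal cannot go negative. */
    rq->util_avg = rq->util_avg > removed ? rq->util_avg - removed : 0;
}
```

Between remove_task_util() and the next update_util(), the source CPU still reports the migrated task's utilization, which is the lack of immediacy referred to above.
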
...
In our early version of WALT, we did not have synchronized windows
across CPUs. Windows applied to just tasks and not cpus. Each task tracked its
own window_start and cpus did not even track windows. The cited benefit of WALT
(rapid reclassification of tasks) can still be had from such a scheme.
Yes.
...
The additional advantage we get from synchronized windows and busy-time
adjustment upon migration is related to frequency. Let's say a task is migrating
between the little cpu cluster and the big cpu cluster at the end of a window
(because it got classified as a big task towards the end of the window).
Synchronized windows allow us to migrate the task's busy time away from its
little cpu to the big cpu. The load reported for the little cpu after this
adjustment ensures that the little cpu's frequency for the next window does not
include a representation of the migrated task's needs. Vice versa for the big cpu.
Yeah. I remember we had that problem on the product codeline before
switching to WALT, and the way it was fixed involved kicking a
utilization update in the src runqueue (little cpu in your example) and
then a frequency re-evaluation for the src runqueue's cluster. The big CPU
update is already happening as a consequence of an enqueue there.
...
I haven't given much thought to how feasible this would be on other
architectures. Does anyone foresee this being a show-stopper on any
architecture?
I'm mostly afraid of the fact that you basically reintroduce double
locking on migration after it has been removed (for CFS) a couple of
years ago with commit 163122b7fcfa "sched/fair: Remove
double_lock_balance() from load_balance()".
...
Regarding overheads associated with synchronization, there is only a small
overhead during bootup when secondary cpus need to sync up on 'window_start' for
the first time. After that they roll on their own (provided there is some constant
offset between the hardware clocks of the various cpus).
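
One way to picture "rolling on their own": once every cpu starts from the same 'window_start', each can advance its own copy by whole windows and stay aligned without further communication. The sketch below uses hypothetical names, an assumed 20 ms window, and timestamps that are assumed to be already corrected for any constant per-cpu offset.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_SIZE_NS 20000000ULL /* assumed 20 ms window; illustrative only */

/*
 * Advance *window_start to the start of the window that contains 'now'.
 * Returns the number of whole windows that elapsed since the last roll.
 */
static uint64_t roll_window(uint64_t *window_start, uint64_t now)
{
    assert(now >= *window_start);

    uint64_t nr_windows = (now - *window_start) / WINDOW_SIZE_NS;

    *window_start += nr_windows * WINDOW_SIZE_NS;
    return nr_windows;
}
```

Because window_start only ever moves in multiples of the window size, two cpus that roll at different times still end up with identical window boundaries, as long as their clocks tick at the same rate.
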
Not sure we can assume this for all architectures.
...
The other subtle overhead
related to synchronization is that we require both the src_rq and dst_rq locks
to be held during migration (so that we can fix up busy times). I need to think
some more and see if we may be able to relax that requirement.

vatsa

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a
Linux Foundation Collaborative Project
