On 06/08/2018 05:54 PM, Joel Fernandes wrote:
From: "Joel Fernandes (Google)" joel@joelfernandes.org
Here's a very rough patch, just to discuss preventing decay of a CPU's/task's util_avg signal in case it is preempted by RT or DL. It's likely not correct and needs more work, but it solves the issue I see with my synthetic test.
To reproduce the issue, I wrote a synthetic rt-app test with an RT task preempting a 100% CFS task for 300ms: https://pastebin.com/raw/rXNmRUZY I have seen in traces that util_avg decays quickly, even before the RT task sleeps.
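(For scale, here is a back-of-the-envelope userspace model of that decay, not kernel code; it assumes the stock PELT curve with y^32 = 0.5, treats one 1024us period as roughly 1ms, and starts from a fully built-up util_avg of 1024:)

/* pelt_decay.c: rough estimate of util_avg decay while preempted.
 * Standalone model, not kernel code. Build: gcc -o pelt_decay pelt_decay.c -lm
 */
#include <math.h>
#include <stdio.h>

int main(void)
{
	double y = pow(0.5, 1.0 / 32.0);	/* per-period decay, y^32 == 0.5 */
	double util = 1024.0;			/* 100% task before preemption */

	for (int ms = 0; ms <= 300; ms += 60)
		printf("preempted %3d ms: util_avg ~ %7.1f\n",
		       ms, util * pow(y, ms));
	return 0;
}

By 300ms the signal has decayed from 1024 to roughly 1.5, i.e. it has all but vanished, which matches the traces described above.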
The idea is to mark the CFS task from put_prev_task_fair() if the task was running when put_prev_task() happened, and to unmark it from pick_next_task_fair(). This approach also keeps the solution within CFS, without modifying other classes. What do you think?
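(The unmark side is not shown in the hunks quoted below; based on the description above it would presumably be something like the following in pick_next_task_fair(). A sketch inferred from the prose, not the actual hunk:)

	/* sketch: task is picked to run again, resume normal PELT updates */
	rq->cfs.avg.freeze_updates = 0;
	p->se.avg.freeze_updates = 0;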
Note:
- Just for demo/trial purposes, I used a full int in sched_avg. I am open to any ideas on a flag we can set instead; all that's needed is 1 bit.
- The current UTIL_EST design cannot prevent this issue because it relies on the task sleeping, and in fact it could be affected by this issue itself in case the task's util_est EWMA hasn't built up yet.
- Vincent's patches for PELT in DL/RT are still needed for correct OPP selection, estimation of capacity_of(), etc. This patch solves a different issue.
Utilization (util_avg) is defined as 'accrue+decay' when running and 'decay' when !running. IMHO, this fundamental thing shouldn't be changed, especially if you do this 'time warping' only for a subset of the involved entities (in this case tasks and root cfs_rq's).
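(Spelled out as a per-period recurrence, a simplified floating-point model of the geometric series rather than the kernel's fixed-point arithmetic:)

/* One simplified PELT period (~1ms); y = 2^(-1/32), so y^32 == 0.5. */
static double pelt_step(double util, int running)
{
	const double y = 0.978572;

	/* decay always applies; a contribution accrues only while running */
	return util * y + (running ? (1.0 - y) * 1024.0 : 0.0);
}

The quoted patch effectively introduces a third state in which util' == util, which is the departure from the definition being objected to here.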
[...]
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ca3f3eae8980..46b686192641 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -404,6 +404,7 @@ struct sched_avg {
 	unsigned long			runnable_load_avg;
 	unsigned long			util_avg;
 	struct util_est			util_est;
+	int				freeze_updates;
 } ____cacheline_aligned;
 
 struct sched_statistics {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09b7eb69802c..521656b05b22 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3103,6 +3103,13 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
 	u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
 	u64 periods;
 
+	/*
+	 * If an update is supposed to be skipped, then do nothing.
+	 * sa->last_update_time should still move forward.
+	 */
+	if (sa->freeze_updates)
+		return 0;
+
So load(_avg) and runnable_load(_avg) would be affected as well. Why?
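(For context: the early return sits at the top of accumulate_sum(), which maintains all three sums together. A hand-simplified sketch of its structure, with the fixed-point details and __accumulate_pelt_segments() flattened into a plain per-period decay; not the verbatim mainline code:)

#include <stdint.h>

struct sched_avg_sketch {
	uint64_t	load_sum;
	uint64_t	runnable_load_sum;
	uint32_t	util_sum;
	int		freeze_updates;
};

/* one full period (~1ms) of PELT accounting for one entity */
static void accumulate_one_period(struct sched_avg_sketch *sa,
				  unsigned long load, unsigned long runnable,
				  int running)
{
	const uint32_t contrib = 1024;	/* full-period contribution */

	if (sa->freeze_updates)		/* the patch's early return... */
		return;			/* ...stalls everything below */

	/* decay: all three sums shrink by y = 2^(-1/32) ~= 1002/1024 */
	sa->load_sum = sa->load_sum * 1002 / 1024;
	sa->runnable_load_sum = sa->runnable_load_sum * 1002 / 1024;
	sa->util_sum = (uint32_t)((uint64_t)sa->util_sum * 1002 / 1024);

	/* accrue: load/runnable are weighted, util is capacity-scaled */
	if (load)
		sa->load_sum += load * contrib;
	if (runnable)
		sa->runnable_load_sum += runnable * contrib;
	if (running)
		sa->util_sum += contrib << 10;
}

So freezing here freezes load_avg and runnable_load_avg along with util_avg, even though the stated goal is only to protect utilization.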
[...]
@@ -7006,6 +7017,18 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev)
 		cfs_rq = cfs_rq_of(se);
 		put_prev_entity(cfs_rq, se);
 	}
+
+	/*
+	 * Task is being put because of preemption either by CFS or by a higher
+	 * class. Make sure no util updates happen.
+	 */
+	if (!prev->state) { /* Task is put while running */
+		rq->cfs.avg.freeze_updates = 1;
+		prev->se.avg.freeze_updates = 1;
Only for tasks and root cfs_rq's, not for the cfs_rq's and se's representing task groups! And there will be synchronization issues when we add/subtract se's to/from cfs_rq's.
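(To make the scope concrete: the quoted hunk marks only prev->se, the task, and rq->cfs, the root cfs_rq. A version covering the whole hierarchy would presumably have to walk the group levels as well, something like the following hypothetical sketch using the kernel's for_each_sched_entity() idiom, which with CONFIG_FAIR_GROUP_SCHED walks se->parent up to the root:)

	if (!prev->state) {
		struct sched_entity *se = &prev->se;

		/*
		 * Freeze the task's se, every group se above it, and
		 * the cfs_rq each one is queued on.
		 */
		for_each_sched_entity(se) {
			se->avg.freeze_updates = 1;
			cfs_rq_of(se)->avg.freeze_updates = 1;
		}
	}

Even then, the attach/detach paths would operate on frozen sums, which is the synchronization problem noted above.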
[...]