From: "Joel Fernandes (Google)" <joel(a)joelfernandes.org>
Here's a very rough patch, just to discuss preventing the decay of the
CPU's/task's util_avg signal in case it is preempted by RT or DL. It's
likely not correct and needs more work, but it solves the issue I see
with my synthetic test.

To reproduce the issue, I wrote a synthetic rt-app test in which an RT
task preempts a 100% CFS task for 300ms: https://pastebin.com/raw/rXNmRUZY
I have seen in traces that util_avg decays quickly, even before the RT
task sleeps.

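For intuition on how fast that is: PELT's half-life is 32ms, so a
fully-utilized task's util_avg halves for every 32ms it sits preempted.
Below is a quick userspace approximation of the 300ms window from the
test above (back-of-the-envelope, plain floating point rather than the
kernel's fixed-point per-period math):

/* pelt_decay.c - rough PELT decay estimate, assuming only the
 * documented 32ms half-life.
 * Build: gcc -O2 -o pelt_decay pelt_decay.c -lm
 */
#include <stdio.h>
#include <math.h>

int main(void)
{
        const double halflife_ms = 32.0;        /* PELT half-life */
        const double util = 1024.0;             /* 100% CFS task */
        int ms;

        for (ms = 0; ms <= 300; ms += 50)
                printf("preempted for %3dms: util_avg ~= %7.2f\n",
                       ms, util * pow(0.5, ms / halflife_ms));
        return 0;
}

By 300ms the signal is down to roughly 1.5 out of 1024, i.e. it is
effectively gone before the RT task ever sleeps.
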
The idea is to set a flag from put_prev_task_fair() if the task was
still running when put_prev_task() happened, and to clear it from
pick_next_task_fair(). This approach also keeps the solution contained
within CFS, without modifying the other classes. What do you think?

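A rough sketch of the intended flow, paraphrasing the diff below (this
is pseudocode, not additional changes):

/*
 * put_prev_task_fair(rq, prev):
 *     if (!prev->state)    // prev still runnable => it was preempted
 *         freeze PELT accumulation for rq->cfs and prev->se
 *     else                 // prev went to sleep, decay is correct
 *         leave both unfrozen
 *
 * pick_next_task_fair(rq, ...):
 *     unfreeze rq->cfs and the picked task's se
 *
 * While frozen, accumulate_sum() returns early, so neither the cfs_rq's
 * nor the preempted task's util_avg decays while RT/DL runs. A task
 * that was put but not picked stays frozen until it is picked again.
 */
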
Notes:
- Just for the demo/trial, I used a full int in sched_avg. I am open to
ideas on where the flag could live; all that's needed is 1 bit (see the
sketch after these notes).
- The current UTIL_EST design cannot prevent this issue because it
relies on the task sleeping, and in fact it could be affected by this
issue itself in case the task's util_est EWMA hasn't built up yet.
- Vincent's patches for PELT in DL/RT are still needed for correct OPP
selection, estimation of capacity_of(), etc. This patch solves a
different issue.

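For completeness, one hypothetical way to get the bit without adding a
new field would be to steal the top bit of an existing one: utilization
values fit well below 2^31, so e.g. util_est.enqueued has a spare bit.
Untested sketch; the flag name and helpers below are illustrative only:

#define UTIL_FREEZE     0x80000000U     /* illustrative flag bit */

static inline bool util_updates_frozen(struct sched_avg *sa)
{
        return sa->util_est.enqueued & UTIL_FREEZE;
}

static inline void util_set_frozen(struct sched_avg *sa, bool freeze)
{
        if (freeze)
                sa->util_est.enqueued |= UTIL_FREEZE;
        else
                sa->util_est.enqueued &= ~UTIL_FREEZE;
}

Readers of util_est.enqueued would then have to mask the bit out.
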
CC: "eas-dev" <eas-dev(a)lists.linaro.org>
CC: vincent.guittot(a)linaro.org
CC: patrick.bellasi(a)arm.com
CC: dietmar.eggemann(a)arm.com
CC: chris.redpath(a)arm.com
CC: juri.lelli(a)redhat.com
CC: tkjos(a)google.com
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
include/linux/sched.h | 1 +
kernel/sched/fair.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ca3f3eae8980..46b686192641 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -404,6 +404,7 @@ struct sched_avg {
unsigned long runnable_load_avg;
unsigned long util_avg;
struct util_est util_est;
+ int freeze_updates;
} ____cacheline_aligned;
struct sched_statistics {
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09b7eb69802c..521656b05b22 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3103,6 +3103,13 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
u64 periods;
+ /*
+ * If an update is supposed to be skipped, do nothing;
+ * sa->last_update_time still moves forward in the caller.
+ */
+ if (sa->freeze_updates)
+ return 0;
+
scale_freq = arch_scale_freq_capacity(cpu);
scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
@@ -6963,6 +6970,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
p = task_of(se);
done: __maybe_unused;
+
#ifdef CONFIG_SMP
/*
* Move the next running task to the front of
@@ -6975,6 +6983,8 @@ done: __maybe_unused;
if (hrtick_enabled(rq))
hrtick_start_fair(rq, p);
+ rq->cfs.avg.freeze_updates = 0;
+ p->se.avg.freeze_updates = 0;
return p;
idle:
@@ -6991,6 +7001,7 @@ done: __maybe_unused;
if (new_tasks > 0)
goto again;
+ rq->cfs.avg.freeze_updates = 0;
return NULL;
}
@@ -7006,6 +7017,18 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev)
cfs_rq = cfs_rq_of(se);
put_prev_entity(cfs_rq, se);
}
+
+ /*
+ * The task is being put because it was preempted, either within CFS or
+ * by a higher class. Make sure no util updates happen while it waits.
+ */
+ if (!prev->state) { /* Task is put while running */
+ rq->cfs.avg.freeze_updates = 1;
+ prev->se.avg.freeze_updates = 1;
+ } else {
+ rq->cfs.avg.freeze_updates = 0;
+ prev->se.avg.freeze_updates = 0;
+ }
}
/*
--
2.18.0.rc1.242.g61856ae69a-goog