From: "Joel Fernandes (Google)" joel@joelfernandes.org
Here's a very rough patch, just to discuss preventing the decay of a CPU's/task's util_avg signal in case it is preempted by RT or DL. It is likely not correct and needs more work, but it solves the issue I see with my synthetic test.
To reproduce the issue, I wrote a synthetic rt-app test in which an RT task preempts a 100% CFS task for 300ms: https://pastebin.com/raw/rXNmRUZY . In traces I have seen that util_avg decays quickly, even before the RT task sleeps.
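For illustration, here is a rough user-space C approximation of that scenario (the pastebin config above is the actual test; CPU 0 and FIFO priority 50 are arbitrary choices here, and error handling is omitted). With mainline PELT's 32ms half-life, ~300ms of pure decay scales util_avg by about (1/2)^(300/32), so a fully built-up signal of ~1024 drops to low single digits:

/* Rough approximation of the rt-app test; needs root, build with gcc -O2 -pthread. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <time.h>
#include <unistd.h>

static void pin_to_cpu0(void)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(0, &set);
	sched_setaffinity(0, sizeof(set), &set);
}

static void *cfs_spinner(void *arg)
{
	volatile unsigned long dummy = 0;

	(void)arg;
	pin_to_cpu0();
	for (;;)			/* 100% CFS task */
		dummy++;
	return NULL;
}

int main(void)
{
	struct sched_param sp = { .sched_priority = 50 };
	struct timespec start, now;
	pthread_t cfs;

	pthread_create(&cfs, NULL, cfs_spinner, NULL);
	sleep(1);			/* let the CFS task's util_avg build up */

	pin_to_cpu0();
	sched_setscheduler(0, SCHED_FIFO, &sp);

	/* Busy-loop as RT for 300ms, preempting the CFS spinner. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	do {
		clock_gettime(CLOCK_MONOTONIC, &now);
	} while ((now.tv_sec - start.tv_sec) * 1000000000L +
		 (now.tv_nsec - start.tv_nsec) < 300000000L);

	return 0;
}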
The idea is to mark the CFS class from put_prev_task_fair() if the task was still running when the put_prev_task() happened, and to unmark it from pick_next_task_fair(). This approach keeps the solution within CFS, without modifying the other classes. What do you think?
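To make the intended semantics concrete before the raw diff, here is a small stand-alone toy model (ordinary user-space C, not kernel code; compile with -lm). It mirrors only the mainline PELT constants, a 1024us period and a 32-period half-life, and shows that freezing holds the signal while last_update_time keeps moving forward, so no catch-up decay happens on unfreeze:

#include <math.h>
#include <stdio.h>

/* Toy stand-in for struct sched_avg, just enough to show the freeze idea. */
struct toy_avg {
	double util;		/* models util_avg */
	long last_update;	/* models last_update_time, in periods */
	int freeze_updates;	/* the one bit this patch needs */
};

/* PELT decays by y per 1024us period, with y^32 == 0.5. */
static const double y = 0.97857206;	/* 0.5^(1/32) */

static void toy_update(struct toy_avg *sa, long now)
{
	long periods = now - sa->last_update;

	sa->last_update = now;		/* time moves forward regardless */
	if (sa->freeze_updates)
		return;			/* skip decay, like accumulate_sum() below */
	sa->util *= pow(y, periods);
}

int main(void)
{
	struct toy_avg sa = { .util = 1024.0 };

	/* put_prev while still runnable (preempted by RT): freeze. */
	sa.freeze_updates = 1;
	toy_update(&sa, 293);		/* ~300ms at 1024us per period */
	printf("frozen:   util = %.1f\n", sa.util);	/* stays ~1024.0 */

	/* The same 300ms without the patch: the signal all but vanishes. */
	sa = (struct toy_avg){ .util = 1024.0 };
	toy_update(&sa, 293);
	printf("unfrozen: util = %.1f\n", sa.util);	/* decays to ~1.8 */
	return 0;
}

When pick_next_task_fair() clears the bit, updates simply resume from the frozen value.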
Note:
- Just for demo/trial purposes, I used a full int in sched_avg. I am open
  to any ideas on a flag we can set; all that's needed is 1 bit.
- The current UTIL_EST design cannot prevent this issue because it relies
  on the task sleeping, and in fact could be affected by this issue itself
  in case the task's util-est EWMA hasn't built up.
- Vincent's patches for PELT in DL/RT are still needed for correct OPP
  selection and estimation of capacity_of() etc. This patch solves a
  different issue.
CC: "eas-dev" eas-dev@lists.linaro.org CC: vincent.guittot@linaro.org CC: patrick.bellasi@arm.com CC: dietmar.eggemann@arm.com CC: chris.redpath@arm.com CC: juri.lelli@redhat.com CC: tkjos@google.com Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org --- include/linux/sched.h | 1 + kernel/sched/fair.c | 23 +++++++++++++++++++++++ 2 files changed, 24 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index ca3f3eae8980..46b686192641 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -404,6 +404,7 @@ struct sched_avg {
 	unsigned long			runnable_load_avg;
 	unsigned long			util_avg;
 	struct util_est			util_est;
+	int				freeze_updates;
 } ____cacheline_aligned;
 
 struct sched_statistics {

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 09b7eb69802c..521656b05b22 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3103,6 +3103,13 @@ accumulate_sum(u64 delta, int cpu, struct sched_avg *sa,
 	u32 contrib = (u32)delta; /* p == 0 -> delta < 1024 */
 	u64 periods;
 
+	/*
+	 * If an update is supposed to be skipped, then do nothing.
+	 * sa->last_update_time should still move forward.
+	 */
+	if (sa->freeze_updates)
+		return 0;
+
 	scale_freq = arch_scale_freq_capacity(cpu);
 	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);
@@ -6963,6 +6970,7 @@ pick_next_task_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 	p = task_of(se);
 
 done: __maybe_unused;
+
 #ifdef CONFIG_SMP
 	/*
 	 * Move the next running task to the front of
@@ -6975,6 +6983,8 @@ done: __maybe_unused;
 	if (hrtick_enabled(rq))
 		hrtick_start_fair(rq, p);
 
+	rq->cfs.avg.freeze_updates = 0;
+	p->se.avg.freeze_updates = 0;
 	return p;
 
 idle:
@@ -6991,6 +7001,7 @@ done: __maybe_unused;
 	if (new_tasks > 0)
 		goto again;
 
+	rq->cfs.avg.freeze_updates = 0;
 	return NULL;
 }
 
@@ -7006,6 +7017,18 @@ static void put_prev_task_fair(struct rq *rq, struct task_struct *prev)
 		cfs_rq = cfs_rq_of(se);
 		put_prev_entity(cfs_rq, se);
 	}
+
+	/*
+	 * Task is being put because of preemption either by CFS or by a higher
+	 * class. Make sure no util updates happen.
+	 */
+	if (!prev->state) { /* Task is put while running */
+		rq->cfs.avg.freeze_updates = 1;
+		prev->se.avg.freeze_updates = 1;
+	} else {
+		rq->cfs.avg.freeze_updates = 0;
+		prev->se.avg.freeze_updates = 0;
+	}
 }
 
 /*
-- 
2.18.0.rc1.242.g61856ae69a-goog