Hi Dietmar,
[ + eas-dev ]
On Sun, May 22, 2016 at 06:06:47PM +0100, Dietmar Eggemann wrote:
> On 05/21/2016 06:25 AM, Leo Yan wrote:
> >On Fri, May 20, 2016 at 08:08:16PM +0100, Dietmar Eggemann wrote:
> >>Hi Leo,
> >>
> >>try the attached testcase in LISA which runs the wl's in '/tg_1/tg_11'
> >>You can create different tg level-hierarchies in rfc_tg.config if you wish.
> >>
> >>08:05:17 DEBUG : sudo -- sh -c '/root/devlib-target/bin/shutils cgroups_run_into /tg_1/tg_11 '\''/root/devlib-target/bin/rt-app /root/devlib-target/run_dir/06_pct_00.json'\'''
> >
> >Thanks very much for the test case.
>
> I thought about this again and maybe you want to test task migration of
> a task running in a task group? This would make much more sense than
> only running a task in a task group if you want to test the PELT signals.
After enabling EAS, I can see that a task running in a task group is
migrated between different CPUs when it is woken up.
> I added some functionality to rt-app which lets you restrict the cpu
> affinity of a task per phase of its run so you can create a task inside
> a task group which alternates between two cpus while running. This
> migration is done by the running task (so it's
> sched_setaffinity()->__set_cpus_allowed_ptr()->stop_one_cpu(...,
> migration_cpu_stop, ...)->__migrate_task()->move_queued_task()
>
> So if you're interested in this, just ask me on eas-dev so I can share
> the rt-app functionality and a how-to for building rt-app on the list
> for a broader audience.
Yes, this is another path we should test for task migration. Could you
share this on the mailing list? We could also consider integrating it
into rt-app's repo.
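For reference, here is a minimal standalone illustration of such per-phase
self-migration; it is only a sketch using sched_setaffinity() on CPU0/CPU1
(assumed to exist), not your rt-app change:

/*
 * Illustration only: a task that migrates itself between two CPUs per
 * "phase" by changing its own affinity. The CPU numbers and loop counts
 * are arbitrary; CPU0 and CPU1 are assumed to exist.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static void run_phase_on_cpu(int cpu, unsigned long loops)
{
	cpu_set_t mask;

	CPU_ZERO(&mask);
	CPU_SET(cpu, &mask);

	/* Restrict the running task to a single CPU; the kernel migrates it. */
	if (sched_setaffinity(0, sizeof(mask), &mask))
		perror("sched_setaffinity");

	/* Crude busy loop standing in for one phase of work. */
	for (volatile unsigned long i = 0; i < loops; i++)
		;
}

int main(void)
{
	for (int loop = 0; loop < 10; loop++) {
		run_phase_on_cpu(0, 50000000UL);
		run_phase_on_cpu(1, 50000000UL);
	}
	return 0;
}

Because the affinity change is issued by the running task itself, it
exercises the sched_setaffinity() migration path you listed above.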
Thanks,
Leo Yan
In the current code, the CPU's idle state cpufreq_pstates::idle is initialized
to '-1' and is only set to '0' or '1' (active or idle) when the first "cpu_idle"
event for that CPU is parsed. This causes an error in the P-state statistics:
from the beginning of the trace until the first "cpu_idle" event the CPU's idle
state is '-1', so cpu_change_pstate() always treats the update as the first one
and discards the preceding time.
This introduces a very large error if the CPU is always running and never
enters an idle state. So this patch fixes the issue by initializing the CPU's
idle state before parsing the P-state and C-state times. The CPU's idle state
is initialized according to its first cpu_idle event:
- If the CPU's first cpu_idle state is '-1', the CPU has been in an idle state
  since the beginning of the trace;
- If the CPU's first cpu_idle state is any other value, the CPU is active.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
tracefile_idlestat.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 51 insertions(+)
diff --git a/tracefile_idlestat.c b/tracefile_idlestat.c
index 3430693..fb0b3d8 100644
--- a/tracefile_idlestat.c
+++ b/tracefile_idlestat.c
@@ -152,6 +152,55 @@ int load_text_data_line(char *buffer, struct cpuidle_datas *datas, char *format,
return get_wakeup_irq(datas, buffer);
}
+/**
+ * init_cpu_idle_state - Initialize each CPU's idle state from its first
+ * cpu_idle event. If a CPU's first cpu_idle state is '-1', the CPU has been
+ * in an idle state since the beginning of the trace; otherwise the CPU is
+ * active. Initializing the per-CPU idle flag this way gives more accurate
+ * P-state times.
+ *
+ * @datas: structure for P-state and C-state statistics
+ * @f: the file handle of the idlestat trace file
+ */
+void init_cpu_idle_state(struct cpuidle_datas *datas, FILE *f)
+{
+	struct cpufreq_pstates *ps;
+	unsigned int state, cpu;
+	double time;
+	char buffer[BUFSIZE];
+
+	while (fgets(buffer, BUFSIZE, f)) {
+		if (!strstr(buffer, "cpu_idle"))
+			continue;
+
+		if (sscanf(buffer, TRACE_FORMAT, &time, &state, &cpu) != 3) {
+			fprintf(stderr, "warning: Unrecognized cpuidle "
+				"record. The result of analysis might "
+				"be wrong.\n");
+			break;
+		}
+
+		ps = &(datas->pstates[cpu]);
+
+		/* The CPU's state has already been initialized, skip it */
+		if (ps->idle != -1)
+			continue;
+
+		/*
+		 * If the CPU's first cpu_idle state is '-1', the CPU stayed
+		 * in an idle state until this first cpu_idle event.
+		 * Otherwise the CPU was active at the beginning.
+		 */
+		if (state == (unsigned int)-1)
+			ps->idle = 1;
+		else
+			ps->idle = 0;
+	}
+
+	/* After traversing the file, rewind to the beginning */
+	fseek(f, 0, SEEK_SET);
+}
+
void load_text_data_lines(FILE *f, char *buffer, struct cpuidle_datas *datas)
{
double begin = 0, end = 0;
@@ -159,6 +208,8 @@ void load_text_data_lines(FILE *f, char *buffer, struct cpuidle_datas *datas)
setup_topo_states(datas);
+ init_cpu_idle_state(datas, f);
+
do {
if (load_text_data_line(buffer, datas, TRACE_FORMAT,
&begin, &end, &start) != -1) {
--
1.9.1
Hi Patrick,
[ + eas-dev ]
With a non-root user in Android, I cannot add a PID to SchedTune's cgroup.
At first I thought it was related to the cgroup file node's permissions, so
I used the "root" user to change the permissions with "a+rwx", but even so
the cgroup node still cannot be written by a non-root user.
hikey:/ $ su
hikey:/ # chmod a+rwx /sys/fs/cgroup/stune/performance/cgroup.procs
hikey:/ # exit
hikey:/ $ echo 1937 > /sys/fs/cgroup/stune/performance/cgroup.procs
hikey:/ $ cat /sys/fs/cgroup/stune/performance/cgroup.procs
Do you have a suggestion for the proper way to add a PID to SchedTune's
cgroup with a non-root user?
Thanks,
Leo Yan
When a task is migrated from CPU_A to CPU_B, the scheduler removes the
task's load/util from the task's cfs_rq and adds them to the destination
cfs_rq. But if the kernel enables CONFIG_FAIR_GROUP_SCHED, that cfs_rq is
not necessarily the same as the CPU's cfs_rq. As a result, after the task
is migrated to CPU_B, CPU_A still carries the task's stale load/util
values, while CPU_B does not reflect the new load/util introduced by the
task.
So this patch also applies the task's load/util to the CPU's cfs_rq, so
that the CPU's cfs_rq really reflects the task's migration.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
kernel/sched/fair.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0fe30e6..10ca1a9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2825,12 +2825,24 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq);
static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
{
struct sched_avg *sa = &cfs_rq->avg;
+ struct sched_avg *cpu_sa = NULL;
int decayed, removed = 0;
+ int cpu = cpu_of(rq_of(cfs_rq));
+
+ if (&cpu_rq(cpu)->cfs != cfs_rq)
+ cpu_sa = &cpu_rq(cpu)->cfs.avg;
if (atomic_long_read(&cfs_rq->removed_load_avg)) {
s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
sa->load_avg = max_t(long, sa->load_avg - r, 0);
sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
+
+ if (cpu_sa) {
+ cpu_sa->load_avg = max_t(long, cpu_sa->load_avg - r, 0);
+ cpu_sa->load_sum = max_t(s64,
+ cpu_sa->load_sum - r * LOAD_AVG_MAX, 0);
+ }
+
removed = 1;
}
@@ -2838,6 +2850,12 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
sa->util_avg = max_t(long, sa->util_avg - r, 0);
sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
+
+ if (cpu_sa) {
+ cpu_sa->util_avg = max_t(long, cpu_sa->util_avg - r, 0);
+			cpu_sa->util_sum = max_t(s32,
+				cpu_sa->util_sum - r * LOAD_AVG_MAX, 0);
+ }
}
decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
@@ -2896,6 +2914,8 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
+ int cpu = cpu_of(rq_of(cfs_rq));
+
if (!sched_feat(ATTACH_AGE_LOAD))
goto skip_aging;
@@ -2919,6 +2939,13 @@ skip_aging:
cfs_rq->avg.load_sum += se->avg.load_sum;
cfs_rq->avg.util_avg += se->avg.util_avg;
cfs_rq->avg.util_sum += se->avg.util_sum;
+
+ if (&cpu_rq(cpu)->cfs != cfs_rq) {
+ cpu_rq(cpu)->cfs.avg.load_avg += se->avg.load_avg;
+ cpu_rq(cpu)->cfs.avg.load_sum += se->avg.load_sum;
+ cpu_rq(cpu)->cfs.avg.util_avg += se->avg.util_avg;
+ cpu_rq(cpu)->cfs.avg.util_sum += se->avg.util_sum;
+ }
}
static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
--
1.9.1
When logging the CPU's load and utilization, the CPU's cfs_rq should be
used directly for tracing. If the task's cfs_rq is used instead, wrong
values may be reported, since that may be a task group's cfs_rq rather
than the real CPU's cfs_rq.
Signed-off-by: Leo Yan <leo.yan(a)linaro.org>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6ac9ea3..26f3f2d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2825,7 +2825,7 @@ static inline void update_load_avg(struct sched_entity *se, int update_tg)
if (entity_is_task(se))
trace_sched_load_avg_task(task_of(se), &se->avg);
- trace_sched_load_avg_cpu(cpu, cfs_rq);
+ trace_sched_load_avg_cpu(cpu, &cpu_rq(cpu)->cfs);
}
static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
--
1.9.1
The cap_idx should be stored in energy_env->cap_idx, since it is used by
group_norm_usage() later; returning the index directly from the loop skips
that assignment, so break out of the loop instead.
Signed-off-by: Mark Yang <mark.yang(a)spreadtrum.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 93005c9..ced4a99 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4617,7 +4617,7 @@ static int find_new_capacity(struct energy_env *eenv,
for (idx = 0; idx < sge->nr_cap_states; idx++) {
if (sge->cap_states[idx].cap >= util)
- return idx;
+ break;
}
eenv->cap_idx = idx;
--
2.5.0
Hi all,
As promised during our meeting last week, here is a link to the report I
presented regarding "Task Estimation Utilization":
https://docs.google.com/document/d/1f2NpnYUS0ci_sLn4i2IY6ZKrRVOqRNnktI9xnt4…
Comments can be added to the document.
The patches are still under internal review; we will post the initial
proposal as soon as they are validated.
Cheers Patrick
--
#include <best/regards.h>
Patrick Bellasi
This patch series is a follow-up for EASv5 power profiling on Hikey.
From the profiling results, rt-app-31/38/44 are inconsistent; I finally
found that this issue can be fixed by these 4 patches. After applying
them, we get a good improvement for these cases (energy in mW):
Workload    Energy BestComb  Mainline(ndm)  noEAS(ndm)  EAS(ndm)  EAS(sched)  EAS(Applied Patches)
mp3                     412         604.41      551.79    528.99      530.20               491.10
rt-app-6                676         864.18      846.72    792.88      840.33               759.96
rt-app-13               968        1222.47     1210.35   1673.04     1332.13              1253.99
rt-app-19              1348        1412.08     1474.86   1612.12     1421.28              1355.49
rt-app-25              1619        1718.67     1710.73   2104.41     2028.25              1584.25
rt-app-31              1878        1968.08     1965.87   2318.11     2976.59              1903.69
rt-app-38              2283        2580.23     2540.45   2576.46     2724.32              2241.29
rt-app-44              2578        3092.66     3056.92   2913.91     2669.91              2406.45
rt-app-50              2848        3492.36     3423.26   3489.14     3429.41              3290.25
This patch series is ONLY for EXPERIMENTAL purposes.
Leo Yan (4):
sched/fair: EASv5: Fix CPU shared capacity issue
sched/fair: EASv5: snapshot CPU's utilization
sched/fair: EASv5: Add CPU's total utilization
sched/fair: EASv5: update new capacity index
kernel/sched/fair.c | 88 +++++++++++++++++++++++++++++++++++++++++++---------
kernel/sched/sched.h | 1 +
2 files changed, 74 insertions(+), 15 deletions(-)
--
1.9.1
Hi all,
At Connect, Steve also brought up a related question: pack tasks at a
higher OPP or spread tasks at a lower OPP? So I'd like to summarize this
and combine it with recent profiling results:
- When task_A is woken up, the scheduler needs to decide whether to pack
  task_A onto a busy CPU or spread task_A to an idle CPU.
  If task_A is packed onto a busy CPU, this may introduce a power penalty
  caused by a higher OPP; on the other hand, if task_A is spread to an
  idle CPU (whose cluster may also be in an idle state), this may
  introduce a power penalty caused by waking an extra power domain.
  So I think we can enhance the energy calculation at task wakeup in
  energy_aware_wake_cpu(). For example, we can select two candidate CPUs
  for the woken task: one CPU in the same scheduling group as the task's
  previous CPU, and another CPU in a different scheduling group (the
  group with the best, or equal, power efficiency in the system). Then we
  can decide whether to spread tasks to a different cluster, spread the
  task to a different CPU in the same cluster, or just stay on the
  original CPU (a rough sketch follows after the second point below).
- I also observed another possible scenario. For example, if tasks have
  already been packed onto a few CPUs, even though every task's workload
  is not very high (as in rt-app-13), their load accumulates on one CPU,
  so that CPU ends up running at a high OPP.
  If EAS picks only one of these tasks and tries to migrate it to another
  CPU, the migration usually does not happen. The reason is that even if
  the target CPU is already running at a high OPP, it usually still has
  capacity to run more work at the highest OPP; so energy_diff() reports
  a worse power result after the OPP increase, and the task stays on its
  original CPU. [1][2]
  Even picking an idle CPU from another cluster cannot resolve this
  issue, because if the task is spread to another cluster, the original
  cluster's and CPU's OPP will not decrease, while the new cluster and
  CPU introduce extra power.
  So in this case we should take a global view and define some criteria:
  * CPUs are not staying at the lowest OPP, but the system has idle CPUs;
  * a lower OPP can meet the capacity requirement of the tasks' average
    load;
  * a lower OPP can meet the capacity requirement of the highest-load
    task in the system.
  If these criteria are met, EAS can select an idle CPU from the
  scheduling group with the best power efficiency (see the second sketch
  below).
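To make the wakeup idea from the first point concrete, here is a rough
standalone sketch. The helper names and the energy model below are made up
purely for illustration; this is not the actual energy_aware_wake_cpu() or
energy_diff() code:

/*
 * Hypothetical sketch only: the helpers below are illustrative stand-ins,
 * not the kernel's energy-aware wakeup implementation.
 */
#include <stdio.h>

/* Placeholder: estimated energy change (negative == saves energy) if
 * 'task_util' worth of load wakes up on 'cpu'. Purely a dummy model. */
static long estimate_energy_delta(int cpu, unsigned long task_util)
{
	return (long)task_util - 100 * (cpu + 1);
}

/* Placeholder candidate pickers for the two scheduling groups. */
static int best_cpu_in_group_of(int prev_cpu)      { return prev_cpu; }
static int best_cpu_in_most_efficient_group(void)  { return 0; }

/*
 * Pick between (a) a CPU in the same scheduling group as the previous CPU
 * and (b) a CPU in the most power-efficient group; stay put if neither
 * candidate is estimated to save energy.
 */
static int pick_wake_cpu(int prev_cpu, unsigned long task_util)
{
	int local  = best_cpu_in_group_of(prev_cpu);
	int remote = best_cpu_in_most_efficient_group();
	long d_local  = estimate_energy_delta(local, task_util);
	long d_remote = estimate_energy_delta(remote, task_util);

	/* Stay on the previous CPU unless a candidate saves energy. */
	if (d_local >= 0 && d_remote >= 0)
		return prev_cpu;

	return (d_remote < d_local) ? remote : local;
}

int main(void)
{
	printf("woken task placed on CPU%d\n", pick_wake_cpu(2, 150));
	return 0;
}

The only point is that both a local candidate and a candidate from the
most power-efficient group get an energy estimate before deciding.

And a rough encoding of the three criteria from the second point; the
per-CPU fields and the numbers in main() are invented for the sketch, not
existing kernel state:

/*
 * Rough illustration of the three criteria above; all fields and
 * thresholds are made up for the sketch.
 */
#include <stdbool.h>
#include <stdio.h>

struct cpu_snapshot {
	bool idle;
	unsigned long cur_cap;		/* capacity at the current OPP */
	unsigned long min_opp_cap;	/* capacity at the lowest OPP */
	unsigned long lower_opp_cap;	/* capacity at the next lower OPP */
};

/*
 * Return true if spreading to an idle CPU in the most power-efficient
 * group should be considered: some busy CPU is above its lowest OPP,
 * there are idle CPUs, and a lower OPP still fits both the average task
 * load and the biggest task in the system.
 */
static bool should_spread(const struct cpu_snapshot *cpus, int nr_cpus,
			  unsigned long avg_task_util,
			  unsigned long max_task_util)
{
	bool have_idle = false, above_min_opp = false;

	for (int i = 0; i < nr_cpus; i++) {
		if (cpus[i].idle) {
			have_idle = true;
			continue;
		}
		/* Criterion 1: a busy CPU is not at its lowest OPP. */
		if (cpus[i].cur_cap > cpus[i].min_opp_cap)
			above_min_opp = true;
		/* Criteria 2 and 3: a lower OPP must still fit the loads. */
		if (cpus[i].lower_opp_cap < avg_task_util ||
		    cpus[i].lower_opp_cap < max_task_util)
			return false;
	}

	return have_idle && above_min_opp;
}

int main(void)
{
	struct cpu_snapshot cpus[] = {
		{ .idle = false, .cur_cap = 760, .min_opp_cap = 230,
		  .lower_opp_cap = 610 },
		{ .idle = true },
	};

	printf("spread? %d\n", should_spread(cpus, 2, 120, 300));
	return 0;
}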
I think you guys may have discussed this topic before, so before I start
trying these ideas I want to check whether I missed some discussion; any
suggestions are welcome.
[1] http://people.linaro.org/~leo.yan/eas_profiling/eas_tasks_in_one_cluster_hi…
[2] http://people.linaro.org/~leo.yan/eas_profiling/eas_tasks_in_one_cluster_en…
Thanks,
Leo Yan