Hi Lei,
Thanks for looking at these patches. Have you seen the additional
optimization patch set that was released to go on top of 13.06? It's
linked from the 13.06 release notes at
http://releases.linaro.org/13.06/android/vexpress. The additional
patches contained there will be part of the 13.07 release. Please
note, however, that we're releasing these patches
against a 3.10-based kernel release and we have not attempted to
port or test them on 3.11.
That said, I will attempt to answer your questions inline.
On 29/07/13 06:51, Lei Wen wrote:
Hi list,
I am trying to port the Linaro MP patches to other kernel branches,
and I have found a regression caused by a recent kernel change.
The regression is that when we test the Linaro 13.06 release with an
MP3 scenario, we find CA7 is always busy, while one core of CA15
occasionally ramps up to an average CPU usage of about 2% and the
other core is idle most of the time.
But when we switch to our backported branch, we find both cores in
CA15 become active, each with a usage ratio of around 2%.
With further checking, I found that one recently merged mainline patch causes this:
https://lkml.org/lkml/2013/6/27/152
This patch changes the initial load of a newly forked task to the
largest value, which causes HMP to decide to move all such tasks to
the big cluster.
Since cpu3 (the first CPU of CA15) now gets busy, cpu4 is brought up
to share the load as a consequence. This means ALL 5 CPUs are used in
the MP3 scenario, which should make the power result look worse than
before...
What is your opinion on this regression, especially on how HMP should
behave given the larger initial load introduced by the merged patch?
In general, the changes introduced in Alex Shi's patch set will
probably conflict a bit with big.LITTLE MP patches. Our patches and
his both use the same load tracking metrics and having the two patch
sets present will probably have some subtle interactions even though
on the surface they look relatively benign. We don't intend to work
on forward-porting to 3.11 right now since the Linaro LSK is going
to stay on 3.10 and we and Linaro are supporting the patch set on
that release. Even if you were to remove all the patches from
141965c7 to 2509940f (Alex's switch to use load-tracking) before
integrating big.LITTLE MP, I can't guarantee that there isn't some
other behavioural change elsewhere which would increase power
consumption or damage performance, since we haven't tested it.
Specifically about the initial task load profile, we agree with Alex
that tasks probably should not always start with zero load for
performance reasons and have done something similar in the patch
titled "HMP: Force new non-kernel tasks onto big CPUs until load
stabilises" in the 13.06 Optimization pack. The aim of that patch is
to provide application tasks with access to the maximum compute
capacity as soon as possible, and to keep all other tasks migrating
between big and little clusters only on the basis of need. It's a
kind of initial performance-boost behaviour for user application
threads; after a couple of sched_periods the load profile will
represent the task's history reasonably accurately and task placement
will happen according to need as usual.
It's interesting that the effect is long-lasting enough to use both
A15s - are there tasks being created all the time which are
exercising the big CPUs? Alex only adds a small amount of busy time
to the accounting, so after 10-20ms its impact should be very small.
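For reference, my recollection of the merged change is roughly the
sketch below (paraphrased from the commit linked above, so please
check it against your actual tree). A new task is credited with one
sched_slice of history, all of it counted as busy, so its very first
tracked load is at the maximum even though the absolute amount of
'busy' credit is only a slice's worth:

void init_task_runnable_average(struct task_struct *p)
{
        u32 slice;

        p->se.avg.decay_count = 0;
        /* one slice, converted from ns to the ~1us units of the PELT sums */
        slice = sched_slice(task_cfs_rq(p), &p->se) >> 10;
        p->se.avg.runnable_avg_sum = slice;
        p->se.avg.runnable_avg_period = slice;
        __update_task_entity_contrib(&p->se);
}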
From the behaviour you describe, I feel that something else is
happening to interfere with task placement which is made more
visible by the patch you point to. The only way for you to be sure
of the cause is to investigate the behaviour: which tasks are
resident on the big CPUs, for how long, and what their load profiles
look like.
For investigating those kinds of issues, I generally use Streamline
or Kernelshark.
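If you want a quick look from inside the kernel before reaching for
those tools, something like the following (purely illustrative, not
part of any released patch, and assuming p is task_of(se) as in
hmp_up_migration()) placed next to the up-migration decision will log
each decision to /sys/kernel/debug/tracing/trace:

        /* illustrative only: log which tasks HMP pulls up and their load */
        trace_printk("hmp up-migrate: comm=%s pid=%d load_avg_ratio=%lu\n",
                     p->comm, task_pid_nr(p),
                     (unsigned long)se->avg.load_avg_ratio);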
In the 13.06 Optimization pack patch I mentioned we give each task
an initial load profile, but we chose to make it fully loaded in the
beginning. However, just giving a task an initial load profile is
not the whole story. In the kernels we have tested, the CPU that a
task runs on in its first few slices is largely governed by the
location of the parent task and the overall system load (it can be
balanced elsewhere within its cluster, but only if unbalanced), so
energy consumption and performance over the task's first few
scheduling decisions can be unpredictable.
In order to achieve our desired behaviour, we added further code in
select_task_rq_fair to place new tasks on a big CPU. I did not
expect the power consumption to increase much, since tasks will only
be on a big CPU for a short time unless they have a heavy enough
load to justify it, but it turns out that new tasks are started much
more often than expected in these low-power media use cases, so we
also added code to prevent giving a startup-boost to kernel threads,
rt tasks and indeed any tasks forked from init directly (further
patches in the 13.06 optimization pack). However, the init fork
limit is likely only to be a good idea if you are using an Android
userspace where all the apps generally fork from Zygote.
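To make that concrete, the decision boils down to something like the
sketch below. The helper name, the exact length of the boost window
and the way the checks are expressed are my own illustration rather
than the code in the 13.06 patches, so treat it as a summary of the
idea, not the implementation:

/*
 * Sketch only: should a freshly forked task get the 'start on a big
 * CPU' boost?  Assumes it is called from the fork/wake-up-new path,
 * where current is the parent.
 */
static bool hmp_fork_boost_wanted(struct task_struct *p)
{
        /* kernel threads have no mm; RT tasks are placed by the RT class */
        if (!p->mm || rt_task(p))
                return false;

        /* skip tasks forked directly from init (pid 1); on Android the
         * interesting application threads come from Zygote instead */
        if (task_pid_nr(current) == 1)
                return false;

        /* boost only while the tracked history is still short -
         * roughly a couple of scheduling periods */
        return p->se.avg.runnable_avg_period <
                        2 * (sysctl_sched_latency >> 10);
}

In select_task_rq_fair(), a fork balance for a task passing this test
would then be sent to a suitably idle CPU in the faster HMP domain
instead of going through the usual find_idlest_group() path.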
Actually, I have one patch which forbids moving newly forked tasks to
the faster cluster.
Not sure whether it would cause any other side effects. Comments are
welcome. :)
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -840,10 +840,26 @@ void init_task_runnable_average(struct task_struct *p)
p->se.avg.runnable_avg_period = slice;
__update_task_entity_contrib(&p->se);
}
+
+static inline bool task_is_new_born(struct task_struct *p)
+{
+ u32 slice;
+
+ /* treat the task as new-born until it has built up 4x the initial slice */
+ slice = sched_slice(task_cfs_rq(p), &p->se) >> 8;
+
+ return p->se.avg.runnable_avg_period < slice;
+}
+
#else
void init_task_runnable_average(struct task_struct *p)
{
}
+
+static inline bool task_is_new_born(struct task_struct *p)
+{
+ return true;
+}
#endif
/*
@@ -6331,7 +6347,7 @@ static unsigned int hmp_up_migration(int cpu, struct sched_entity *se)
< hmp_next_up_threshold)
return 0;
- if (se->avg.load_avg_ratio > hmp_up_threshold) {
+ if (!task_is_new_born(p) && se->avg.load_avg_ratio > hmp_up_threshold) {
/* Target domain load < ~94% */
if (hmp_domain_min_load(hmp_faster_domain(cpu), NULL)
> NICE_0_LOAD-64)
Thanks,
Lei
This patch keeps the artificial initial load for all tasks but stops
it from triggering up-migration of newly created ones. Why not just
avoid applying the initial load in the first place if you don't want
it to have an impact, or do you want to retain the load-balance
behaviour without impacting migration?
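To spell out the first option: it would just mean going back to an
empty initial window (sketched below, not an actual patch), so both
the load balancer and HMP see the load build up from real history
only - at the cost of losing the load-balance benefit Alex was after:

void init_task_runnable_average(struct task_struct *p)
{
        p->se.avg.decay_count = 0;
        p->se.avg.runnable_avg_sum = 0;
        p->se.avg.runnable_avg_period = 0;
        p->se.avg.load_avg_contrib = 0;
}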
Good Luck and Best Regards,
Chris