Dear All,
We experimented further with some modifications to the code that makes the up-migration decision.
Instead of fetching the heaviest task on the slower domain, then searching for an idle CPU in the big domain and sending an IPI to that big CPU,
we now first look for the idle CPUs in the bigger domain and then send IPIs to those big idle CPUs as long as heavy tasks remain on the slower domain CPUs.
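To make the flow concrete, here is a small standalone C sketch of the idea. It is a toy model only, not the kernel code in the patch below: the struct, helper names and threshold value are invented for illustration, and it checks just one heavy task per LITTLE CPU instead of walking the runqueue as the patch does.

/*
 * Toy illustration of the new up-migration flow (not kernel code):
 * 1) collect the idle CPUs of the big cluster,
 * 2) walk the LITTLE CPUs and, for every heavy task found,
 *    "send an IPI" to one still-unused idle big CPU,
 * 3) stop as soon as either heavy tasks or idle big CPUs run out.
 * All types, names and numbers here are made up for the example.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_BIG		4
#define NR_LITTLE	4
#define UP_THRESHOLD	700	/* stand-in for hmp_up_threshold */

struct cpu {
	int id;
	int load_avg_ratio;	/* 0 means idle in this toy model */
	int heaviest_task_load;	/* load of the heaviest runnable task */
};

static void send_ipi(int big, int little)
{
	printf("IPI big CPU %d to pull from little CPU %d\n", big, little);
}

static void force_up_migration(struct cpu *big, struct cpu *little)
{
	bool idle[NR_BIG];
	int nr_idle = 0;

	/* Step 1: build the idle mask of the big cluster first. */
	for (int i = 0; i < NR_BIG; i++) {
		idle[i] = (big[i].load_avg_ratio == 0);
		nr_idle += idle[i];
	}
	if (!nr_idle)
		return;		/* nothing to pull into, don't scan the LITTLEs */

	/* Step 2: hand out the idle big CPUs to heavy tasks on LITTLE CPUs. */
	for (int j = 0; j < NR_LITTLE && nr_idle; j++) {
		if (little[j].heaviest_task_load <= UP_THRESHOLD)
			continue;
		for (int i = 0; i < NR_BIG; i++) {
			if (!idle[i])
				continue;
			idle[i] = false;
			nr_idle--;
			send_ipi(big[i].id, little[j].id);
			break;
		}
	}
}

int main(void)
{
	struct cpu big[NR_BIG] = {
		{4, 0, 0}, {5, 300, 0}, {6, 0, 0}, {7, 0, 0}
	};
	struct cpu little[NR_LITTLE] = {
		{0, 900, 950}, {1, 100, 80}, {2, 800, 820}, {3, 50, 40}
	};

	force_up_migration(big, little);
	return 0;
}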
We measured the performance of this patch with the hackbench tool.
Results for 10 samples are:
Without patch:
maximum time of execution: 3.342 secs, minimum time of execution: 3.313 secs
average of 10 samples: 3.368 secs

With patch:
maximum time of execution: 3.319 secs, minimum time of execution: 3.071 secs
average of 10 samples: 3.169 secs
On average, this patch gives a performance improvement of 5.90%.
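For reference, that figure follows directly from the reported averages: (3.368 - 3.169) / 3.368 ≈ 0.059, i.e. roughly a 5.9% reduction in average run time.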
This should make the scheduler more responsive in migrating big tasks to big CPUs.
---------------------------------------------------------------------------------------------------------------------------------
From e1df1fc8f069aafc592b42bc3fcc7780f32048a7 Mon Sep 17 00:00:00 2001
From: rahulkhandelwal <rahul.khandelwal@spreadtrum.com>
Date: Sat, 5 Sep 2015 01:35:23 +0800
Subject: [PATCH 1/1] HMP: send IPI to all available big idle CPUs while there exist heavy tasks on slower domain CPUs
Instead of fetching the heaviest task on the slower domain, then searching for an idle CPU in the big domain and sending an IPI to that big CPU, we now first look for the idle CPUs in the bigger domain and then send IPIs to the big idle CPUs as long as there exist heavy tasks on the slower domain CPUs.
CC: Sanjeev Yadav <sanjeev.yadav@spreadtrum.com>
CC: Gaurav Jindal <gaurav.jindal@spreadtrum.com>
CC: sprd-ind-kernel-group@googlegroups.com
Signed-off-by: rahulkhandelwal <rahul.khandelwal@spreadtrum.com>
---
 kernel/sched/fair.c | 188 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 155 insertions(+), 33 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1baf641..a81ad6c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7156,6 +7156,34 @@ out:
 	put_task_struct(p);
 }
 
+/*
+ * Update idle cpu for hmp_domain(faster or slower)
+ * Return 1 for Success(found idle cpu) otherwise 0
+ */
+
+static inline unsigned int hmp_domain_idle_cpu(struct hmp_domain *hmpd,
+						int *idle_cpu, struct cpumask *affinity)
+{
+	int cpu;
+	unsigned long contrib;
+	struct sched_avg *avg;
+	struct cpumask temp_cpumask;
+
+	cpumask_and(&temp_cpumask, &hmpd->cpus, affinity ? affinity : cpu_online_mask);
+
+	for_each_cpu_mask(cpu, temp_cpumask) {
+		avg = &cpu_rq(cpu)->avg;
+		contrib = avg->load_avg_ratio;
+		if (!contrib && !cpu_rq(cpu)->wake_for_idle_pull) {
+			if (idle_cpu)
+				*idle_cpu = cpu;
+			return 1;
+		}
+	}
+	return 0;
+}
+
+
 static DEFINE_SPINLOCK(hmp_force_migration);
 
 /*
@@ -7164,23 +7192,125 @@ static DEFINE_SPINLOCK(hmp_force_migration);
  */
 static void hmp_force_up_migration(int this_cpu)
 {
-	int cpu, target_cpu;
+	int cpu;
+	int target_cpu = NR_CPUS;
 	struct sched_entity *curr, *orig;
-	struct rq *target;
 	unsigned long flags;
 	unsigned int force, got_target;
 	struct task_struct *p;
+	struct cpumask idle_cpus = {0};
+	struct sched_avg *avg;
+	struct sched_entity *se;
+	struct rq *rq;
+	int nr_count;
 
-	if (!spin_trylock(&hmp_force_migration))
+	struct cpumask *hmp_slowcpu_mask = NULL;
+	struct cpumask *hmp_fastcpu_mask = NULL;
+	u64 now;
+
+	if (!spin_trylock(&hmp_force_migration))
 		return;
-	for_each_online_cpu(cpu) {
+
+	if(hmp_cpu_is_slowest(this_cpu)) {
+		hmp_fastcpu_mask = &hmp_faster_domain(this_cpu)->cpus;
+		hmp_slowcpu_mask = &hmp_cpu_domain(this_cpu)->cpus;
+	}
+	else {
+		hmp_fastcpu_mask = &hmp_cpu_domain(this_cpu)->cpus;
+		hmp_slowcpu_mask = &hmp_slower_domain(this_cpu)->cpus;
+	}
+
+	/* Check for idle_cpus in faster domain, */
+	/* update the idle mask if load_avg_ratio = 0 */
+	for_each_cpu(cpu, hmp_fastcpu_mask) {
+		rq = cpu_rq(cpu);
+		avg = &cpu_rq(cpu)->avg;
+		if(avg->load_avg_ratio == 0)
+			cpumask_set_cpu(cpu, &idle_cpus);
+	}
+
+	/* Don't iterate for slower domain, if no idle cpu in faster domain*/
+	if(cpumask_empty(&idle_cpus))
+		goto DOWN;
+
+	/* Iterate slower domain cpus for up migration*/
+	for_each_cpu(cpu, hmp_slowcpu_mask) {
+		if(!cpu_online(cpu)) {
+			continue;
+		}
+
+		nr_count = hmp_max_tasks;
+		rq = cpu_rq(cpu);
+
+		raw_spin_lock_irqsave(&rq->lock, flags);
+
+		curr = rq->cfs.curr;
+		if (!curr || rq->active_balance) {
+			raw_spin_unlock_irqrestore(&rq->lock, flags);
+			continue;
+		}
+		if (!entity_is_task(curr)) {
+			struct cfs_rq *cfs_rq;
+			cfs_rq = group_cfs_rq(curr);
+			while (cfs_rq) {
+				curr = cfs_rq->curr;
+				cfs_rq = group_cfs_rq(curr);
+			}
+		}
+
+		/* The currently running task is not on the runqueue */
+		se = __pick_first_entity(cfs_rq_of(curr));
+		while(!cpumask_empty(&idle_cpus) && nr_count-- && se) {
+			struct task_struct *p = task_of(se);
+			/* Check for eligible task for up migration */
+			if (entity_is_task(se) &&
+				se->avg.load_avg_ratio > hmp_up_threshold &&
+				cpumask_intersects(hmp_fastcpu_mask, tsk_cpus_allowed(p))) {
+
+#ifdef CONFIG_SCHED_HMP_PRIO_FILTER
+				/* Filter by task priority */
+				if (p->prio >= hmp_up_prio) {
+					se = __pick_next_entity(se);
+					continue;
+				}
+#endif
+
+				/* Let the task load settle before doing another up migration */
+				/* hack - always use clock from first online CPU */
+				now = cpu_rq(cpumask_first(cpu_online_mask))->clock_task;
+				if (((now - se->avg.hmp_last_up_migration) >> 10)
+					< hmp_next_up_threshold) {
+					se = __pick_next_entity(se);
+					continue;
+				}
+
+				/* Wakeup for eligible idle cpu in faster domain */
+				if (hmp_domain_idle_cpu(hmp_faster_domain(cpu), &target_cpu,
+					tsk_cpus_allowed(p))) {
+					cpu_rq(target_cpu)->wake_for_idle_pull = 1;
+					cpumask_clear_cpu(target_cpu, &idle_cpus); /* Update idle cpumask */
+					smp_send_reschedule(target_cpu);
+				}
+			}
+			se = __pick_next_entity(se);
+		}
+		raw_spin_unlock_irqrestore(&rq->lock, flags);
+	}
+
+DOWN:
+
+	for_each_cpu(cpu, hmp_fastcpu_mask) {
+		if(!cpu_online(cpu))
+			continue;
 		force = 0;
 		got_target = 0;
-		target = cpu_rq(cpu);
-		raw_spin_lock_irqsave(&target->lock, flags);
-		curr = target->cfs.curr;
-		if (!curr || target->active_balance) {
-			raw_spin_unlock_irqrestore(&target->lock, flags);
+		rq = cpu_rq(cpu);
+
+		raw_spin_lock_irqsave(&rq->lock, flags);
+
+		curr = rq->cfs.curr;
+		if (!curr || rq->active_balance) {
+			raw_spin_unlock_irqrestore(&rq->lock, flags);
 			continue;
 		}
 		if (!entity_is_task(curr)) {
@@ -7193,19 +7323,7 @@ static void hmp_force_up_migration(int this_cpu)
 			}
 		}
 		orig = curr;
-		curr = hmp_get_heaviest_task(curr, -1);
-		if (!curr) {
-			raw_spin_unlock_irqrestore(&target->lock, flags);
-			continue;
-		}
-		p = task_of(curr);
-		if (hmp_up_migration(cpu, &target_cpu, curr)) {
-			cpu_rq(target_cpu)->wake_for_idle_pull = 1;
-			raw_spin_unlock_irqrestore(&target->lock, flags);
-			spin_unlock(&hmp_force_migration);
-			smp_send_reschedule(target_cpu);
-			return;
-		}
+
 		if (!got_target) {
 			/*
 			 * For now we just check the currently running task.
@@ -7214,13 +7332,13 @@ static void hmp_force_up_migration(int this_cpu)
 			 */
 			curr = hmp_get_lightest_task(orig, 1);
 			p = task_of(curr);
-			target->push_cpu = hmp_offload_down(cpu, curr);
-			if (target->push_cpu < NR_CPUS) {
+			rq->push_cpu = hmp_offload_down(cpu, curr);
+			if (rq->push_cpu < NR_CPUS) {
 				get_task_struct(p);
-				target->migrate_task = p;
+				rq->migrate_task = p;
 				got_target = 1;
-				trace_sched_hmp_migrate(p, target->push_cpu, HMP_MIGRATE_OFFLOAD);
-				hmp_next_down_delay(&p->se, target->push_cpu);
+				trace_sched_hmp_migrate(p, rq->push_cpu, HMP_MIGRATE_OFFLOAD);
+				hmp_next_down_delay(&p->se, rq->push_cpu);
 			}
 		}
 		/*
@@ -7229,24 +7347,28 @@ static void hmp_force_up_migration(int this_cpu)
 		 * CPU stopper take care of it.
 		 */
 		if (got_target) {
-			if (!task_running(target, p)) {
+			if (!task_running(rq, p)) {
 				trace_sched_hmp_migrate_force_running(p, 0);
-				hmp_migrate_runnable_task(target);
+				hmp_migrate_runnable_task(rq);
 			} else {
-				target->active_balance = 1;
+				rq->active_balance = 1;
 				force = 1;
 			}
 		}
 
-		raw_spin_unlock_irqrestore(&target->lock, flags);
+		raw_spin_unlock_irqrestore(&rq->lock, flags);
 
 		if (force)
-			stop_one_cpu_nowait(cpu_of(target),
+			stop_one_cpu_nowait(cpu_of(rq),
 				hmp_active_task_migration_cpu_stop,
-				target, &target->active_balance_work);
+				rq, &rq->active_balance_work);
 	}
+
+
 	spin_unlock(&hmp_force_migration);
+
 }
+
 /*
  * hmp_idle_pull looks at little domain runqueues to see
  * if a task should be pulled.