[PATCH v2 0/2] sched/idle : find the best idle CPU with cpuidle info


Nicolas Pitre

4 Sep 2014

3:32 p.m.

This is a rework of the series initially posted by Daniel Lezcano here:

http://news.gmane.org/group/gmane.linux.power-management.general/thread=4416...

Those patches were straightened up, commit logs are more comprehensive, bugs were fixed, etc.

 drivers/cpuidle/cpuidle.c |  6 ++++++
 kernel/sched/fair.c       | 43 ++++++++++++++++++++++++++++++++++-------
 kernel/sched/idle.c       |  6 ++++++
 kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++
 4 files changed, 87 insertions(+), 7 deletions(-)

Nicolas


Nicolas Pitre

4 Sep

3:32 p.m.

New subject: [PATCH v2 1/2] sched: let the scheduler see CPU idle states

From: Daniel Lezcano <daniel.lezcano@linaro.org>

When the cpu enters idle, it stores the cpuidle state pointer in its struct rq instance which in turn could be used to make a better decision when balancing tasks.

As soon as the cpu exits its idle state, the struct rq reference is cleared.

There are a couple of situations where the idle state pointer could be changed while it is being consulted:

1. For x86/acpi with dynamic c-states, when a laptop switches from battery to AC that could result in removing the deeper idle state. The acpi driver triggers:

	'acpi_processor_cst_has_changed'
		'cpuidle_pause_and_lock'
			'cpuidle_uninstall_idle_handler'
				'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointed object will be set to NULL.

2. The cpuidle driver is unloaded. Logically that could happen but not in practice because the drivers are always compiled in and 95% of them are not coded to unregister themselves. In any case, the unloading code must call 'cpuidle_unregister_device', which calls 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync' as mentioned above.

A race can happen if we use the pointer and then one of these two scenarios occurs at the same moment.

In order to be safe, the idle state pointer stored in the rq must be used inside a rcu_read_lock section where we are protected with the 'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The idle_get_state() and idle_put_state() accessors should be used to that effect.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 drivers/cpuidle/cpuidle.c |  6 ++++++
 kernel/sched/idle.c       |  6 ++++++
 kernel/sched/sched.h      | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ee9df5e3f5..530e3055a2 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void)
 		initialized = 0;
 		kick_all_cpus_sync();
 	}
+
+	/*
+	 * Make sure external observers (such as the scheduler)
+	 * are done looking at pointed idle states.
+	 */
+	rcu_barrier();
 }

 /**
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 11e7bc434f..c47fce75e6 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -147,6 +147,9 @@ use_default:
 	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
 		goto use_default;

+	/* Take note of the planned idle state. */
+	idle_set_state(this_rq(), &drv->states[next_state]);
+
 	/*
 	 * Enter the idle state previously returned by the governor decision.
 	 * This function will block until an interrupt occurs and will take
@@ -154,6 +157,9 @@ use_default:
 	 */
 	entered_state = cpuidle_enter(drv, dev, next_state);

+	/* The cpu is no longer idle or about to enter idle. */
+	idle_set_state(this_rq(), NULL);
+
 	if (broadcast)
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 579712f4e9..aea8baa7a5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
 #include "cpuacct.h"

 struct rq;
+struct cpuidle_state;

 extern __read_mostly int scheduler_running;

@@ -636,6 +637,11 @@ struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_IDLE
+	/* Must be inspected within a rcu lock section */
+	struct cpuidle_state *idle_state;
+#endif
 };

 static inline int cpu_of(struct rq *rq)
@@ -1180,6 +1186,39 @@ static inline void idle_exit_fair(struct rq *rq) { }

 #endif

+#ifdef CONFIG_CPU_IDLE
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+	rq->idle_state = idle_state;
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	rcu_read_lock();
+	return rq->idle_state;
+}
+
+static inline void cpuidle_put_state(struct rq *rq)
+{
+	rcu_read_unlock();
+}
+#else
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	return NULL;
+}
+
+static inline void cpuidle_put_state(struct rq *rq)
+{
+}
+#endif
+
 extern void sysrq_sched_debug_show(void);
 extern void sched_init_granularity(void);
 extern void update_max_interval(void);

-- 
1.8.4.108.g55ea5f6
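For readers following the accessor pattern in the patch above, here is a minimal userspace sketch of the set/get/put life cycle. It is only an illustration: the two structures are cut-down stand-ins for the kernel types, toy_idle_cycle() is a hypothetical helper that does not exist in the patch, and there is no concurrency, so the places where the patch takes and releases the RCU read lock are marked with comments only.

```c
#include <stddef.h>

/* Cut-down stand-ins for the kernel types; only the fields used here. */
struct cpuidle_state {
	const char *name;
	unsigned int exit_latency;	/* microseconds */
};

struct rq {
	struct cpuidle_state *idle_state;	/* NULL unless the CPU is idle */
};

static void idle_set_state(struct rq *rq, struct cpuidle_state *idle_state)
{
	rq->idle_state = idle_state;
}

static struct cpuidle_state *idle_get_state(struct rq *rq)
{
	/* The patch calls rcu_read_lock() here; this model is single-threaded. */
	return rq->idle_state;
}

static void cpuidle_put_state(struct rq *rq)
{
	/* The patch calls rcu_read_unlock() here. */
	(void)rq;
}

/*
 * Hypothetical helper modelling one pass through the idle loop:
 * publish the planned state, let an observer look at it, then clear it.
 * Returns the exit latency the observer saw (0 if it saw no state).
 */
static unsigned int toy_idle_cycle(struct rq *rq, struct cpuidle_state *planned)
{
	unsigned int seen_latency = 0;
	struct cpuidle_state *s;

	idle_set_state(rq, planned);

	/* Observer side: every get is paired with a put. */
	s = idle_get_state(rq);
	if (s)
		seen_latency = s->exit_latency;
	cpuidle_put_state(rq);

	idle_set_state(rq, NULL);	/* the CPU is no longer idle */
	return seen_latency;
}
```

The point of the sketch is the pairing: the pointer returned by idle_get_state() is only meant to be dereferenced before the matching cpuidle_put_state(), and it reads as NULL whenever the CPU is not idle.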

Paul E. McKenney

18 Sep

5:37 p.m.

New subject: [PATCH v2 1/2] sched: let the scheduler see CPU idle states

On Thu, Sep 04, 2014 at 11:32:09AM -0400, Nicolas Pitre wrote:

...


diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ee9df5e3f5..530e3055a2 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void)
 		initialized = 0;
 		kick_all_cpus_sync();
 	}
+
+	/*
+	 * Make sure external observers (such as the scheduler)
+	 * are done looking at pointed idle states.
+	 */
+	rcu_barrier();

Actually, all rcu_barrier() does is to make sure that all previously queued RCU callbacks have been invoked. And given the current implementation, if there are no callbacks queued anywhere in the system, rcu_barrier() is an extended no-op. "Has CPU 0 any callbacks?" "Nope!" "Has CPU 1 any callbacks?" "Nope!" ... "Has CPU nr_cpu_ids-1 any callbacks?" "Nope!" "OK, done!"

This is all done with the current task looking at per-CPU data structures, with no interaction with the scheduler and with no need to actually make those other CPUs do anything.

So what is it that you really need to do here?

A synchronize_sched() will wait for all non-idle online CPUs to pass through the scheduler, where "idle" includes usermode execution in CONFIG_NO_HZ_FULL=y kernels. But it won't wait for CPUs executing in the idle loop.

A synchronize_rcu_tasks() will wait for all non-idle tasks that are currently on a runqueue to do a voluntary context switch. There has been some discussion about extending this to idle tasks, but the current prospective users can live without this. But if you need it, I can push on getting it set up. (Current plans are that synchronize_rcu_tasks() goes into the v3.18 merge window.) And one caveat: There is long latency associated with synchronize_rcu_tasks() by design. Grace periods are measured in seconds.

A stop_cpus() will force a context switch on all CPUs, though it is a rather big hammer.

So again, what do you really need?

Thanx, Paul
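The "extended no-op" behaviour described above can be pictured with a toy model. This is not the kernel implementation; the array, the constant and both functions are hypothetical names invented for the illustration. The sketch only captures the one property under discussion: the barrier waits for callbacks that are already queued and has nothing to wait for when there are none.

```c
#define TOY_NR_CPUS 4

/* Hypothetical per-CPU bookkeeping: RCU callbacks still queued on each CPU. */
static unsigned int toy_queued_callbacks[TOY_NR_CPUS];

/* Pretend to invoke everything queued on one CPU. */
static void toy_invoke_callbacks(int cpu)
{
	toy_queued_callbacks[cpu] = 0;
}

/*
 * Toy barrier: only cares about callbacks that are already queued.
 * Returns how many CPUs it actually had to wait for.
 */
static int toy_rcu_barrier(void)
{
	int cpu, waited = 0;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (toy_queued_callbacks[cpu] == 0)
			continue;	/* "Has CPU n any callbacks?" "Nope!" */
		toy_invoke_callbacks(cpu);
		waited++;
	}
	return waited;
}
```

Note that nothing in the model looks at readers. With no callbacks queued, toy_rcu_barrier() returns 0 without waiting on any CPU, which is why it cannot serve as a "wait until observers are done" primitive.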


-- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

Paul E. McKenney

5:39 p.m.

New subject: [PATCH v2 1/2] sched: let the scheduler see CPU idle states

On Thu, Sep 18, 2014 at 10:37:33AM -0700, Paul E. McKenney wrote:

...

A stop_cpus() will force a context switch on all CPUs, though it is a rather big hammer.

And I was reminded by the very next email that kick_all_cpus_sync() is another possibility -- it forces an interrupt on all online CPUs, idle or not.

Thanx, Paul


Peter Zijlstra

11:15 p.m.

New subject: [PATCH v2 1/2] sched: let the scheduler see CPU idle states

On Thu, Sep 18, 2014 at 10:39:25AM -0700, Paul E. McKenney wrote:

...

On Thu, Sep 18, 2014 at 10:37:33AM -0700, Paul E. McKenney wrote:

...

...
A stop_cpus() will force a context switch on all CPUs, though it is a rather big hammer.

And I was reminded by the very next email that kick_all_cpus_sync() is another possibility -- it forces an interrupt on all online CPUs, idle or not.

I actually have a patch http://lkml.kernel.org/r/1409815075-4180-2-git-send-email-chuansheng.liu@int... that changes that, because apparently there are idle loops that don't actually exit on interrupt :-)

But yes, something like the wake_up_all_idle_cpus() should do.

Nicolas Pitre

6:32 p.m.

New subject: [PATCH v2 1/2] sched: let the scheduler see CPU idle states

On Thu, 18 Sep 2014, Paul E. McKenney wrote:

...


So what is it that you really need to do here?

In short, we don't want the cpuidle data to go away (see the 2 scenarios above) while the scheduler is looking at it. The scheduler uses the provided accessors (see patch 2/2) so we can put any protection mechanism we want in them. A simple spinlock should be good enough.

Nicolas

Peter Zijlstra

11:17 p.m.

New subject: [PATCH v2 1/2] sched: let the scheduler see CPU idle states

On Thu, Sep 18, 2014 at 02:32:25PM -0400, Nicolas Pitre wrote:

...

On Thu, 18 Sep 2014, Paul E. McKenney wrote:

...

...
So what is it that you really need to do here?

In short, we don't want the cpuidle data to go away (see the 2 scenarios above) while the scheduler is looking at it. The scheduler uses the provided accessors (see patch 2/2) so we can put any protection mechanism we want in them. A simple spinlock should be good enough.

rq->lock disables interrupts, so given that, something like kick_all_cpus_sync() will guarantee what you need; wake_up_all_idle_cpus() will not.
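The difference between the two calls can be sketched with a small deterministic model. Everything here is hypothetical (struct toy_cpu, the toy_ prefixed functions, the step counter) and it assumes the behaviour described in this thread: kick_all_cpus_sync() interrupts every CPU and waits for each to respond, and a CPU holding rq->lock with interrupts off cannot respond until it leaves that section, whereas wake_up_all_idle_cpus() only pokes idle CPUs and does not wait for busy ones.

```c
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical per-CPU model; none of these names exist in the kernel. */
struct toy_cpu {
	bool idle;		/* sitting in the idle loop */
	int irq_off_steps;	/* >0: inside an IRQs-off section (rq->lock held) */
	bool ipi_pending;	/* an interrupt was sent and not yet handled */
};

/* Advance one CPU by one step; an interrupt is handled only with IRQs on. */
static void toy_step(struct toy_cpu *c)
{
	if (c->irq_off_steps > 0) {
		c->irq_off_steps--;	/* still finishing its critical section */
		return;
	}
	if (c->ipi_pending) {
		c->ipi_pending = false;
		c->idle = false;	/* an idle CPU leaves idle to handle it */
	}
}

/* Interrupt every CPU and wait until each one has handled the interrupt. */
static void toy_kick_all_cpus_sync(struct toy_cpu *cpus)
{
	bool pending = true;
	int i;

	for (i = 0; i < TOY_NR_CPUS; i++)
		cpus[i].ipi_pending = true;

	while (pending) {
		pending = false;
		for (i = 0; i < TOY_NR_CPUS; i++) {
			toy_step(&cpus[i]);
			pending |= cpus[i].ipi_pending;
		}
	}
}

/* Poke idle CPUs only and return at once; busy CPUs are left alone. */
static void toy_wake_up_all_idle_cpus(struct toy_cpu *cpus)
{
	int i;

	for (i = 0; i < TOY_NR_CPUS; i++)
		if (cpus[i].idle)
			cpus[i].ipi_pending = true;
}

/* True if some CPU is still inside an IRQs-off section. */
static bool toy_any_reader_active(const struct toy_cpu *cpus)
{
	int i;

	for (i = 0; i < TOY_NR_CPUS; i++)
		if (cpus[i].irq_off_steps > 0)
			return true;
	return false;
}
```

In this model, a CPU that was three steps into an IRQs-off section when toy_wake_up_all_idle_cpus() returns is still inside it, while toy_kick_all_cpus_sync() cannot return until that CPU has left the section and handled the interrupt.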

Peter Zijlstra

11:28 p.m.

New subject: [PATCH v2 1/2] sched: let the scheduler see CPU idle states

On Fri, Sep 19, 2014 at 01:17:15AM +0200, Peter Zijlstra wrote:

...


rq->lock disables interrupts, so given that, something like kick_all_cpus_sync() will guarantee what you need; wake_up_all_idle_cpus() will not.

Something like so then?

---
Subject: sched: let the scheduler see CPU idle states
From: Daniel Lezcano <daniel.lezcano@linaro.org>
Date: Thu, 04 Sep 2014 11:32:09 -0400

When the cpu enters idle, it stores the cpuidle state pointer in its struct rq instance which in turn could be used to make a better decision when balancing tasks.

As soon as the cpu exits its idle state, the struct rq reference is cleared.

There are a couple of situations where the idle state pointer could be changed while it is being consulted:

1. For x86/acpi with dynamic c-states, when a laptop switches from battery to AC that could result in removing the deeper idle state. The acpi driver triggers:

	'acpi_processor_cst_has_changed'
		'cpuidle_pause_and_lock'
			'cpuidle_uninstall_idle_handler'
				'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointed object will be set to NULL.

2. The cpuidle driver is unloaded. Logically that could happen but not in practice because the drivers are always compiled in and 95% of them are not coded to unregister themselves. In any case, the unloading code must call 'cpuidle_unregister_device', which calls 'cpuidle_pause_and_lock' leading to 'kick_all_cpus_sync' as mentioned above.

A race can happen if we use the pointer and then one of these two scenarios occurs at the same moment.

In order to be safe, the idle state pointer stored in the rq must be used inside a rcu_read_lock section where we are protected with the 'rcu_barrier' in the 'cpuidle_uninstall_idle_handler' function. The idle_get_state() and idle_put_state() accessors should be used to that effect.

Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 drivers/cpuidle/cpuidle.c |  6 ++++++
 kernel/sched/idle.c       |  6 ++++++
 kernel/sched/sched.h      | 29 +++++++++++++++++++++++++++++
 3 files changed, 41 insertions(+)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -225,6 +225,12 @@ void cpuidle_uninstall_idle_handler(void
 		initialized = 0;
 		wake_up_all_idle_cpus();
 	}
+
+	/*
+	 * Make sure external observers (such as the scheduler)
+	 * are done looking at pointed idle states.
+	 */
+	kick_all_cpus_sync();
 }

 /**
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -147,6 +147,9 @@ static void cpuidle_idle_call(void)
 	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
 		goto use_default;

+	/* Take note of the planned idle state. */
+	idle_set_state(this_rq(), &drv->states[next_state]);
+
 	/*
 	 * Enter the idle state previously returned by the governor decision.
 	 * This function will block until an interrupt occurs and will take
@@ -154,6 +157,9 @@ static void cpuidle_idle_call(void)
 	 */
 	entered_state = cpuidle_enter(drv, dev, next_state);

+	/* The cpu is no longer idle or about to enter idle. */
+	idle_set_state(this_rq(), NULL);
+
 	if (broadcast)
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);

--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
 #include "cpuacct.h"

 struct rq;
+struct cpuidle_state;

 /* task_struct::on_rq states: */
 #define TASK_ON_RQ_QUEUED	1
@@ -640,6 +641,11 @@ struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_IDLE
+	/* Must be inspected within a rcu lock section */
+	struct cpuidle_state *idle_state;
+#endif
 };

 static inline int cpu_of(struct rq *rq)
@@ -1193,6 +1199,29 @@ static inline void idle_exit_fair(struct

 #endif

+#ifdef CONFIG_CPU_IDLE
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+	rq->idle_state = idle_state;
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	return rq->idle_state;
+}
+#else
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	return NULL;
+}
+#endif
+
 extern void sysrq_sched_debug_show(void);
 extern void sched_init_granularity(void);
 extern void update_max_interval(void);

Nicolas Pitre

19 Sep

6:30 p.m.

New subject: [PATCH v2 1/2] sched: let the scheduler see CPU idle states

On Fri, 19 Sep 2014, Peter Zijlstra wrote:

...


Something like so then?

I'll trust you for anything that relates to RCU as its subtleties are still escaping my mind.

Still, the commit log refers to idle_put_state() which is no more, and that should be adjusted.


Nicolas Pitre

4 Sep

3:32 p.m.

New subject: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

The code in find_idlest_cpu() looks for the CPU with the smallest load. However, if multiple CPUs are idle, the first idle CPU is selected irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle state, or the latest to have gone idle if all idle CPUs are in the same state. The latter applies even when cpuidle is configured out.
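The selection rule just described can be tried out in userspace with the sketch below. It is a hedged illustration, not the kernel function: struct toy_cpu and toy_find_idlest_cpu() are hypothetical, the has_idle_state flag stands in for a NULL cpuidle state pointer, and the final return statement (prefer an idle CPU when one was found, else the least loaded one) is an assumption about how the two candidates are combined.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flattened view of one CPU; not a kernel structure. */
struct toy_cpu {
	bool idle;
	bool has_idle_state;		/* false models a NULL cpuidle state */
	unsigned int exit_latency;	/* meaningful only if has_idle_state */
	uint64_t idle_stamp;		/* when the CPU went idle */
	unsigned long load;		/* used only for non-idle CPUs */
};

static int toy_find_idlest_cpu(const struct toy_cpu *cpus, int nr, int this_cpu)
{
	unsigned long min_load = ULONG_MAX;
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_timestamp = 0;
	int least_loaded_cpu = this_cpu;
	int shallowest_idle_cpu = -1;
	int i;

	for (i = 0; i < nr; i++) {
		const struct toy_cpu *c = &cpus[i];

		if (c->idle) {
			if (c->has_idle_state &&
			    c->exit_latency < min_exit_latency) {
				/* Smallest exit latency wins outright. */
				min_exit_latency = c->exit_latency;
				latest_idle_timestamp = c->idle_stamp;
				shallowest_idle_cpu = i;
			} else if ((!c->has_idle_state ||
				    c->exit_latency == min_exit_latency) &&
				   c->idle_stamp > latest_idle_timestamp) {
				/* Tie or unknown state: most recently idled. */
				latest_idle_timestamp = c->idle_stamp;
				shallowest_idle_cpu = i;
			}
		} else if (c->load < min_load ||
			   (c->load == min_load && i == this_cpu)) {
			min_load = c->load;
			least_loaded_cpu = i;
		}
	}

	/* Assumed final step: prefer an idle CPU when one was found. */
	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}
```

For example, with one busy CPU and three idle CPUs whose exit latencies are 200, 50 and 50, the sketch picks whichever of the two 50-latency CPUs has the larger idle_stamp. With no idle CPU at all it falls back to the least loaded one, breaking ties in favour of this_cpu.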

This patch doesn't cover the following issues:

- The idle exit latency of a CPU might be larger than the time needed to migrate the waking task to an already running CPU with sufficient capacity, and therefore performance would benefit from task packing in such case (in most cases task packing is about power saving).

- Some idle states have a non negligible and non abortable entry latency which needs to run to completion before the exit latency can start. A concurrent patch series is making this info available to the cpuidle core. Once available, the entry latency with the idle timestamp could determine when the exit latency may be effective.

Those issues will be handled in due course. In the mean time, what is implemented here should improve things already compared to the current state of affairs.

Based on an initial patch from Daniel Lezcano.

Signed-off-by: Nicolas Pitre nico@linaro.org
---
 kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d..416329e1a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
 #include <linux/latencytop.h>
 #include <linux/sched.h>
 #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
 #include <linux/slab.h>
 #include <linux/profile.h>
 #include <linux/interrupt.h>
@@ -4428,20 +4429,48 @@ static int
 find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
 	unsigned long load, min_load = ULONG_MAX;
-	int idlest = -1;
+	unsigned int min_exit_latency = UINT_MAX;
+	u64 latest_idle_timestamp = 0;
+	int least_loaded_cpu = this_cpu;
+	int shallowest_idle_cpu = -1;
 	int i;
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-		load = weighted_cpuload(i);
-
-		if (load < min_load || (load == min_load && i == this_cpu)) {
-			min_load = load;
-			idlest = i;
+		if (idle_cpu(i)) {
+			struct rq *rq = cpu_rq(i);
+			struct cpuidle_state *idle = idle_get_state(rq);
+			if (idle && idle->exit_latency < min_exit_latency) {
+				/*
+				 * We give priority to a CPU whose idle state
+				 * has the smallest exit latency irrespective
+				 * of any idle timestamp.
+				 */
+				min_exit_latency = idle->exit_latency;
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
+				   rq->idle_stamp > latest_idle_timestamp) {
+				/*
+				 * If equal or no active idle state, then
+				 * the most recently idled CPU might have
+				 * a warmer cache.
+				 */
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			}
+			cpuidle_put_state(rq);
+		} else {
+			load = weighted_cpuload(i);
+			if (load < min_load ||
+			    (load == min_load && i == this_cpu)) {
+				min_load = load;
+				least_loaded_cpu = i;
+			}
 		}
 	}
 
-	return idlest;
+	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
 }

-- 1.8.4.108.g55ea5f6
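For readers who want to experiment with the selection policy outside the kernel, here is a minimal user-space sketch of the loop in patch 2/2. It assumes a flat array of per-CPU snapshots instead of runqueues and cpumasks; `struct cpu_model`, its fields and `pick_cpu()` are hypothetical names introduced only for this illustration, and `has_idle_state == false` models `idle_get_state()` returning NULL.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU snapshot; the kernel reads these from struct rq. */
struct cpu_model {
	bool idle;
	bool has_idle_state;		/* false models idle_get_state() == NULL */
	unsigned int exit_latency;	/* meaningful only if has_idle_state */
	uint64_t idle_stamp;		/* when the CPU went idle */
	unsigned long load;		/* used only for busy CPUs */
};

static int pick_cpu(const struct cpu_model *cpus, int nr_cpus, int this_cpu)
{
	unsigned long min_load = ULONG_MAX;
	unsigned int min_exit_latency = UINT_MAX;
	uint64_t latest_idle_timestamp = 0;
	int least_loaded_cpu = this_cpu;
	int shallowest_idle_cpu = -1;

	for (int i = 0; i < nr_cpus; i++) {
		const struct cpu_model *c = &cpus[i];

		if (c->idle) {
			if (c->has_idle_state &&
			    c->exit_latency < min_exit_latency) {
				/* Shallower state wins regardless of timestamp. */
				min_exit_latency = c->exit_latency;
				latest_idle_timestamp = c->idle_stamp;
				shallowest_idle_cpu = i;
			} else if ((!c->has_idle_state ||
				    c->exit_latency == min_exit_latency) &&
				   c->idle_stamp > latest_idle_timestamp) {
				/* Same depth (or unknown): prefer the most recently idled. */
				latest_idle_timestamp = c->idle_stamp;
				shallowest_idle_cpu = i;
			}
		} else if (c->load < min_load ||
			   (c->load == min_load && i == this_cpu)) {
			min_load = c->load;
			least_loaded_cpu = i;
		}
	}

	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}
```

Any idle CPU beats every busy CPU here; load only matters when no CPU in the group is idle.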

Daniel Lezcano

5 Sep 5 Sep

7:52 a.m.

New subject: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

On 09/04/2014 05:32 PM, Nicolas Pitre wrote:

...

The code in find_idlest_cpu() looks for the CPU with the smallest load. However, if multiple CPUs are idle, the first idle CPU is selected irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle state, or the latest to have gone idle if all idle CPUs are in the same state. The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

The idle exit latency of a CPU might be larger than the time needed to migrate the waking task to an already running CPU with sufficient capacity, and therefore performance would benefit from task packing in such case (in most cases task packing is about power saving).

Some idle states have a non negligible and non abortable entry latency which needs to run to completion before the exit latency can start. A concurrent patch series is making this info available to the cpuidle core. Once available, the entry latency with the idle timestamp could determine when the exit latency may be effective.

Those issues will be handled in due course. In the mean time, what is implemented here should improve things already compared to the current state of affairs.

Based on an initial patch from Daniel Lezcano.

Signed-off-by: Nicolas Pitre nico@linaro.org

Sounds good to me.

Acked-by: Daniel Lezcano daniel.lezcano@linaro.org


Peter Zijlstra

18 Sep 18 Sep

11:46 p.m.

New subject: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

On Thu, Sep 04, 2014 at 11:32:10AM -0400, Nicolas Pitre wrote:

...

The code in find_idlest_cpu() looks for the CPU with the smallest load. However, if multiple CPUs are idle, the first idle CPU is selected irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle state, or the latest to have gone idle if all idle CPUs are in the same state. The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

The idle exit latency of a CPU might be larger than the time needed to migrate the waking task to an already running CPU with sufficient capacity, and therefore performance would benefit from task packing in such case (in most cases task packing is about power saving).

Some idle states have a non negligible and non abortable entry latency which needs to run to completion before the exit latency can start. A concurrent patch series is making this info available to the cpuidle core. Once available, the entry latency with the idle timestamp could determine when the exit latency may be effective.

Those issues will be handled in due course. In the mean time, what is implemented here should improve things already compared to the current state of affairs.

Based on an initial patch from Daniel Lezcano.

Signed-off-by: Nicolas Pitre nico@linaro.org

 kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d..416329e1a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
 #include <linux/latencytop.h>
 #include <linux/sched.h>
 #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
 #include <linux/slab.h>
 #include <linux/profile.h>
 #include <linux/interrupt.h>
@@ -4428,20 +4429,48 @@ static int
 find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)

Ah, now I see, you use it in find_idlest_cpu(), that does not indeed hold rq->lock, but it does already hold rcu_read_lock(), so in that regard sync_rcu() should be the right primitive.

I suppose we want the same kind of logic in select_idle_sibling() and that too already has rcu_read_lock().

So I'll replace the kick_all_cpus_sync() with sync_rcu() and add a WARN_ON(!rcu_read_lock_held()) to idle_get_state(), like the below.

I do however think we need a few words on why we don't need rcu_assign_pointer() and rcu_dereference() for rq->idle_state -- and I do indeed think we do not, because the idle state data is static.

--- Subject: sched: let the scheduler see CPU idle states From: Daniel Lezcano daniel.lezcano@linaro.org Date: Thu, 04 Sep 2014 11:32:09 -0400

When the cpu enters idle, it stores the cpuidle state pointer in its struct rq instance which in turn could be used to make a better decision when balancing tasks.

As soon as the cpu exits its idle state, the struct rq reference is cleared.

There are a couple of situations where the idle state pointer could be changed while it is being consulted:

All cpus will exit their idle state and the pointed object will be set to NULL.

A race can happen if we use the pointer and then one of these two scenarios occurs at the same moment.

Cc: "Rafael J. Wysocki" rjw@rjwysocki.net
Cc: Ingo Molnar mingo@redhat.com
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
Signed-off-by: Nicolas Pitre nico@linaro.org
---
 drivers/cpuidle/cpuidle.c | 6 ++++++
 kernel/sched/idle.c       | 6 ++++++
 kernel/sched/sched.h      | 30 ++++++++++++++++++++++++++++++
 3 files changed, 42 insertions(+)

 /**
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -147,6 +147,9 @@ static void cpuidle_idle_call(void)
 	    clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &dev->cpu))
 		goto use_default;
 
+	/* Take note of the planned idle state. */
+	idle_set_state(this_rq(), &drv->states[next_state]);
+
 	/*
 	 * Enter the idle state previously returned by the governor decision.
 	 * This function will block until an interrupt occurs and will take
@@ -154,6 +157,9 @@ static void cpuidle_idle_call(void)
 	 */
 	entered_state = cpuidle_enter(drv, dev, next_state);
 
+	/* The cpu is no longer idle or about to enter idle. */
+	idle_set_state(this_rq(), NULL);
+
 	if (broadcast)
 		clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_EXIT, &dev->cpu);
 
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@
 #include "cpuacct.h"
 
 struct rq;
+struct cpuidle_state;
 
 static inline int cpu_of(struct rq *rq)
@@ -1193,6 +1199,30 @@ static inline void idle_exit_fair(struct
 
 #endif
 
+#ifdef CONFIG_CPU_IDLE
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+	rq->idle_state = idle_state;
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	WARN_ON(!rcu_read_lock_held());
+	return rq->idle_state;
+}
+#else
+static inline void idle_set_state(struct rq *rq,
+				  struct cpuidle_state *idle_state)
+{
+}
+
+static inline struct cpuidle_state *idle_get_state(struct rq *rq)
+{
+	return NULL;
+}
+#endif
+
 extern void sysrq_sched_debug_show(void);
 extern void sched_init_granularity(void);
 extern void update_max_interval(void);
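One way to read the remark about not needing rcu_assign_pointer() and rcu_dereference(): those primitives order the initialization of a freshly built object against the publication of its pointer, and that ordering has nothing to protect when the pointee is a table initialized long before any reader exists. The user-space sketch below models that reading with C11 relaxed atomics. It is an analogy under stated assumptions, not the kernel mechanism: `struct idle_state_model`, `states[]`, `model_set_state()` and `model_exit_latency()` are hypothetical, and the sketch says nothing about object lifetime, which is what the RCU read-side section and the grace period on the teardown path are for.

```c
#include <stdatomic.h>
#include <stddef.h>

struct idle_state_model {
	unsigned int exit_latency;
};

/* Static table: fully initialized before any reader can run. */
static const struct idle_state_model states[] = {
	{ .exit_latency = 1 },
	{ .exit_latency = 150 },
};

/* Models rq->idle_state for a single CPU; zero-initialized to NULL. */
static _Atomic(const struct idle_state_model *) idle_state;

/*
 * Writer side: a plain (relaxed) store. No release ordering is used
 * because the pointee is never written after static initialization.
 */
static void model_set_state(const struct idle_state_model *s)
{
	atomic_store_explicit(&idle_state, s, memory_order_relaxed);
}

/*
 * Reader side: a relaxed load followed by a dereference. Returns
 * 'fallback' when the modelled CPU is not in an idle state.
 */
static unsigned int model_exit_latency(unsigned int fallback)
{
	const struct idle_state_model *s =
		atomic_load_explicit(&idle_state, memory_order_relaxed);

	return s ? s->exit_latency : fallback;
}
```

If the states were allocated and filled in at runtime just before being published, the relaxed store would no longer be sufficient in this model.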

Peter Zijlstra

19 Sep 19 Sep

12:05 a.m.

New subject: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

On Thu, Sep 04, 2014 at 11:32:10AM -0400, Nicolas Pitre wrote:

...

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d..416329e1a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
 #include <linux/latencytop.h>
 #include <linux/sched.h>
 #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
 #include <linux/slab.h>
 #include <linux/profile.h>
 #include <linux/interrupt.h>
@@ -4428,20 +4429,48 @@ static int
 find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
 	unsigned long load, min_load = ULONG_MAX;
-	int idlest = -1;
+	unsigned int min_exit_latency = UINT_MAX;
+	u64 latest_idle_timestamp = 0;
+	int least_loaded_cpu = this_cpu;
+	int shallowest_idle_cpu = -1;
 	int i;
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-		load = weighted_cpuload(i);
-
-		if (load < min_load || (load == min_load && i == this_cpu)) {
-			min_load = load;
-			idlest = i;
+		if (idle_cpu(i)) {
+			struct rq *rq = cpu_rq(i);
+			struct cpuidle_state *idle = idle_get_state(rq);
+			if (idle && idle->exit_latency < min_exit_latency) {
+				/*
+				 * We give priority to a CPU whose idle state
+				 * has the smallest exit latency irrespective
+				 * of any idle timestamp.
+				 */
+				min_exit_latency = idle->exit_latency;
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
+				   rq->idle_stamp > latest_idle_timestamp) {
+				/*
+				 * If equal or no active idle state, then
+				 * the most recently idled CPU might have
+				 * a warmer cache.
+				 */
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			}
+			cpuidle_put_state(rq);

Right, matching the other changes, I killed that line. The rest looks ok.

...

+		} else {
+			load = weighted_cpuload(i);
+			if (load < min_load ||
+			    (load == min_load && i == this_cpu)) {
+				min_load = load;
+				least_loaded_cpu = i;
+			}
 		}
 	}
 
-	return idlest;
+	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
 }

Yao Dongdong

4:49 a.m.

New subject: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

On 2014/9/4 23:32, Nicolas Pitre wrote:

...

The code in find_idlest_cpu() looks for the CPU with the smallest load. However, if multiple CPUs are idle, the first idle CPU is selected irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle state, or the latest to have gone idle if all idle CPUs are in the same state. The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

The idle exit latency of a CPU might be larger than the time needed to migrate the waking task to an already running CPU with sufficient capacity, and therefore performance would benefit from task packing in such case (in most cases task packing is about power saving).

Some idle states have a non negligible and non abortable entry latency which needs to run to completion before the exit latency can start. A concurrent patch series is making this info available to the cpuidle core. Once available, the entry latency with the idle timestamp could determine when the exit latency may be effective.

Those issues will be handled in due course. In the mean time, what is implemented here should improve things already compared to the current state of affairs.

Based on an initial patch from Daniel Lezcano.

Signed-off-by: Nicolas Pitre nico@linaro.org

 kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++++-------
 1 file changed, 36 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bfa3c86d0d..416329e1a6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -23,6 +23,7 @@
 #include <linux/latencytop.h>
 #include <linux/sched.h>
 #include <linux/cpumask.h>
+#include <linux/cpuidle.h>
 #include <linux/slab.h>
 #include <linux/profile.h>
 #include <linux/interrupt.h>
@@ -4428,20 +4429,48 @@ static int
 find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
 {
 	unsigned long load, min_load = ULONG_MAX;
-	int idlest = -1;
+	unsigned int min_exit_latency = UINT_MAX;
+	u64 latest_idle_timestamp = 0;
+	int least_loaded_cpu = this_cpu;
+	int shallowest_idle_cpu = -1;
 	int i;
 
 	/* Traverse only the allowed CPUs */
 	for_each_cpu_and(i, sched_group_cpus(group), tsk_cpus_allowed(p)) {
-		load = weighted_cpuload(i);
-
-		if (load < min_load || (load == min_load && i == this_cpu)) {
-			min_load = load;
-			idlest = i;
+		if (idle_cpu(i)) {
+			struct rq *rq = cpu_rq(i);
+			struct cpuidle_state *idle = idle_get_state(rq);
+			if (idle && idle->exit_latency < min_exit_latency) {
+				/*
+				 * We give priority to a CPU whose idle state
+				 * has the smallest exit latency irrespective
+				 * of any idle timestamp.
+				 */
+				min_exit_latency = idle->exit_latency;
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			} else if ((!idle || idle->exit_latency == min_exit_latency) &&
+				   rq->idle_stamp > latest_idle_timestamp) {
+				/*
+				 * If equal or no active idle state, then
+				 * the most recently idled CPU might have
+				 * a warmer cache.
+				 */
+				latest_idle_timestamp = rq->idle_stamp;
+				shallowest_idle_cpu = i;
+			}
+			cpuidle_put_state(rq);
+		} else {

I think we needn't test non-idle CPUs once an idle CPU has been found. What about this instead?

		} else if (shallowest_idle_cpu == -1) {

...

+			load = weighted_cpuload(i);
+			if (load < min_load ||
+			    (load == min_load && i == this_cpu)) {
+				min_load = load;
+				least_loaded_cpu = i;
+			}
 		}
 	}
 
-	return idlest;
+	return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
 }
 
 /*
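The effect of the suggested `else if (shallowest_idle_cpu == -1)` can be illustrated with a small user-space sketch that only counts how often the load of a busy CPU would be computed. The function name `count_load_lookups()` and the boolean `idle[]` array are hypothetical; the idle-state comparison itself is elided because it does not change the count.

```c
#include <stdbool.h>

/*
 * Returns how many times the load lookup (weighted_cpuload() in the
 * patch) would run. With skip_after_idle set, busy CPUs are ignored
 * once any idle CPU has been seen, as in the suggestion above.
 */
static int count_load_lookups(const bool *idle, int nr_cpus,
			      bool skip_after_idle)
{
	int shallowest_idle_cpu = -1;
	int lookups = 0;

	for (int i = 0; i < nr_cpus; i++) {
		if (idle[i])
			shallowest_idle_cpu = i;	/* comparison elided */
		else if (!skip_after_idle || shallowest_idle_cpu == -1)
			lookups++;
	}

	return lookups;
}
```

Busy CPUs visited before the first idle one are still evaluated, so the saving depends on where the first idle CPU sits in the iteration order. The returned CPU is unchanged either way, since the least loaded CPU is only used when no idle CPU was found.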

Rik van Riel

30 Sep 30 Sep

9:58 p.m.

New subject: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu


On 09/04/2014 11:32 AM, Nicolas Pitre wrote:

...

The code in find_idlest_cpu() looks for the CPU with the smallest load. However, if multiple CPUs are idle, the first idle CPU is selected irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle state, or the latest to have gone idle if all idle CPUs are in the same state. The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

The main thing it does not cover is already running tasks that get woken up again, since select_idle_sibling() covers everything except for newly forked and newly executed tasks.

I am looking at adding similar logic to select_idle_sibling()

-- All rights reversed

Nicolas Pitre

11:15 p.m.

New subject: [PATCH v2 2/2] sched/fair: leverage the idle state info when choosing the "idlest" cpu

On Tue, 30 Sep 2014, Rik van Riel wrote:

...

On 09/04/2014 11:32 AM, Nicolas Pitre wrote:

...
The code in find_idlest_cpu() looks for the CPU with the smallest load. However, if multiple CPUs are idle, the first idle CPU is selected irrespective of the depth of its idle state.

Among the idle CPUs we should pick the one with the shallowest idle state, or the latest to have gone idle if all idle CPUs are in the same state. The latter applies even when cpuidle is configured out.

This patch doesn't cover the following issues:

The main thing it does not cover is already running tasks that get woken up again, since select_idle_sibling() covers everything except for newly forked and newly executed tasks.

True. Now that you bring this up, I remember that Peter mentioned it as well.

...

I am looking at adding similar logic to select_idle_sibling()

OK thanks.

Nicolas

Rik van Riel

2 Oct 2 Oct

5:15 p.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states

On Tue, 30 Sep 2014 19:15:00 -0400 (EDT) Nicolas Pitre nicolas.pitre@linaro.org wrote:

...

On Tue, 30 Sep 2014, Rik van Riel wrote:

...

...
The main thing it does not cover is already running tasks that get woken up again, since select_idle_sibling() covers everything except for newly forked and newly executed tasks.

True. Now that you bring this up, I remember that Peter mentioned it as well.

...
I am looking at adding similar logic to select_idle_sibling()

OK thanks.

This patch is ugly. I have not bothered cleaning it up, because it causes a regression with hackbench. Apparently for hackbench (and potentially other sync wakeups), locality is more important than idleness.

We may need to add a third clause before the search, to ensure target gets selected if neither target nor i is idle and the wakeup is synchronous, something along the lines of:

	if (sync_wakeup && cpu_rq(target)->nr_running == 1)
		return target;

I still need to run tests with other workloads, too.

Another consideration is that search costs with this patch are potentially much increased. I suspect we may want to simply propagate the load on each sched_group up the tree hierarchically, with delta accounting and propagating the info upwards only when the delta is significant, like done in __update_tg_runnable_avg.

---8<---

Subject: sched,idle: teach select_idle_sibling about idle states

Change select_idle_sibling to take cpu idle exit latency into account. First preference is to select the cpu with the lowest exit latency from a completely idle sched_group inside the CPU; if that is not available, we pick the CPU with the lowest exit latency in any sched_group.

This increases the total search time of select_idle_sibling, we may want to look into propagating load info up the sched_group tree in some way. That information would also be useful to prevent the wake_affine logic from causing a load imbalance between sched_groups.

It is not clear when locality (from staying on the old CPU) beats a lower idle exit latency. Having information on whether the CPU drops content from the CPU caches in certain idle states would help with that, but with multiple CPUs bound together in the same physical CPU core, the hardware often does not do what we tell it, anyway...

Signed-off-by: Rik van Riel riel@redhat.com
---
 kernel/sched/fair.c | 47 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 10a5a28..12540cd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4465,41 +4465,76 @@ static int select_idle_sibling(struct task_struct *p, int target)
 {
 	struct sched_domain *sd;
 	struct sched_group *sg;
+	unsigned int min_exit_latency_thread = UINT_MAX;
+	unsigned int min_exit_latency_core = UINT_MAX;
+	int shallowest_idle_thread = -1;
+	int shallowest_idle_core = -1;
 	int i = task_cpu(p);
 
+	/* target always has some code running and is not in an idle state */
 	if (idle_cpu(target))
 		return target;
 
 	/*
 	 * If the prevous cpu is cache affine and idle, don't be stupid.
+	 * XXX: does i's exit latency exceed sysctl_sched_migration_cost?
 	 */
 	if (i != target && cpus_share_cache(i, target) && idle_cpu(i))
 		return i;
 
 	/*
 	 * Otherwise, iterate the domains and find an elegible idle cpu.
+	 * First preference is finding a totally idle core with a thread
+	 * in a shallow idle state; second preference is whatever idle
+	 * thread has the shallowest idle state anywhere.
 	 */
 	sd = rcu_dereference(per_cpu(sd_llc, target));
 	for_each_lower_domain(sd) {
 		sg = sd->groups;
 		do {
+			unsigned int min_sg_exit_latency = UINT_MAX;
+			int shallowest_sg_idle_thread = -1;
+			bool all_idle = true;
+
 			if (!cpumask_intersects(sched_group_cpus(sg),
 						tsk_cpus_allowed(p)))
 				goto next;
 
 			for_each_cpu(i, sched_group_cpus(sg)) {
-				if (i == target || !idle_cpu(i))
-					goto next;
+				struct rq *rq;
+				struct cpuidle_state *idle;
+
+				if (i == target || !idle_cpu(i)) {
+					all_idle = false;
+					continue;
+				}
+
+				rq = cpu_rq(i);
+				idle = idle_get_state(rq);
+
+				if (idle && idle->exit_latency < min_sg_exit_latency) {
+					min_sg_exit_latency = idle->exit_latency;
+					shallowest_sg_idle_thread = i;
+				}
+			}
+
+			if (all_idle && min_sg_exit_latency < min_exit_latency_core) {
+				shallowest_idle_core = shallowest_sg_idle_thread;
+				min_exit_latency_core = min_sg_exit_latency;
+			} else if (min_sg_exit_latency < min_exit_latency_thread) {
+				shallowest_idle_thread = shallowest_sg_idle_thread;
+				min_exit_latency_thread = min_sg_exit_latency;
 			}
 
-			target = cpumask_first_and(sched_group_cpus(sg),
-					tsk_cpus_allowed(p));
-			goto done;
 next:
 			sg = sg->next;
 		} while (sg != sd->groups);
 	}
-done:
+	if (shallowest_idle_core >= 0)
+		target = shallowest_idle_core;
+	else if (shallowest_idle_thread >= 0)
+		target = shallowest_idle_thread;
+
 	return target;
 }
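The two-tier preference in the RFC (a fully idle core first, otherwise the shallowest idle thread anywhere) can be tried out with the user-space sketch below. It is a simplification under explicit assumptions: cores are modelled as fixed groups of `THREADS_PER_CORE` consecutive CPUs, the early returns, the allowed-CPU mask and the domain walk are left out, and `struct thread_model` and `pick_sibling()` are hypothetical names.

```c
#include <limits.h>
#include <stdbool.h>

#define THREADS_PER_CORE 2	/* assumption for this sketch */

struct thread_model {
	bool idle;
	bool has_idle_state;		/* false models idle_get_state() == NULL */
	unsigned int exit_latency;	/* meaningful only if has_idle_state */
};

static int pick_sibling(const struct thread_model *t, int nr_cores, int target)
{
	unsigned int min_lat_thread = UINT_MAX, min_lat_core = UINT_MAX;
	int idle_thread = -1, idle_core = -1;

	for (int core = 0; core < nr_cores; core++) {
		unsigned int min_sg_lat = UINT_MAX;
		int sg_thread = -1;
		bool all_idle = true;

		for (int j = 0; j < THREADS_PER_CORE; j++) {
			int i = core * THREADS_PER_CORE + j;

			if (i == target || !t[i].idle) {
				all_idle = false;
				continue;
			}
			if (t[i].has_idle_state &&
			    t[i].exit_latency < min_sg_lat) {
				min_sg_lat = t[i].exit_latency;
				sg_thread = i;
			}
		}

		if (all_idle && min_sg_lat < min_lat_core) {
			/* Best thread of a completely idle core so far. */
			idle_core = sg_thread;
			min_lat_core = min_sg_lat;
		} else if (min_sg_lat < min_lat_thread) {
			/* Best idle thread on a partially busy core so far. */
			idle_thread = sg_thread;
			min_lat_thread = min_sg_lat;
		}
	}

	if (idle_core >= 0)
		return idle_core;
	if (idle_thread >= 0)
		return idle_thread;
	return target;
}
```

Note that a thread on a fully idle core is preferred even when a thread on a partially busy core has a lower exit latency, and that every CPU is visited on every call, which is the search cost the replies below object to.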

Mike Galbraith

3 Oct 3 Oct

6:04 a.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states

On Thu, 2014-10-02 at 13:15 -0400, Rik van Riel wrote:

...

This patch is ugly. I have not bothered cleaning it up, because it causes a regression with hackbench. Apparently for hackbench (and potentially other sync wakeups), locality is more important than idleness.

We may need to add a third clause before the search, to ensure target gets selected if neither target nor i is idle and the wakeup is synchronous, something along the lines of:

	if (sync_wakeup && cpu_rq(target)->nr_running == 1)
		return target;

I recommend you forget that trusting sync hint ever sprang to mind, it is often a big fat lie.

-Mike

Mike Galbraith

6:23 a.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states

On Thu, 2014-10-02 at 13:15 -0400, Rik van Riel wrote:

...

Subject: sched,idle: teach select_idle_sibling about idle states

Change select_idle_sibling to take cpu idle exit latency into account. First preference is to select the cpu with the lowest exit latency from a completely idle sched_group inside the CPU; if that is not available, we pick the CPU with the lowest exit latency in any sched_group.

This increases the total search time of select_idle_sibling, we may want to look into propagating load info up the sched_group tree in some way. That information would also be useful to prevent the wake_affine logic from causing a load imbalance between sched_groups.

A generic boo hiss aimed in the general direction of all of this let's go look at every possibility on every wakeup stuff. Less is more.

-Mike

Peter Zijlstra

7:50 a.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states

On Fri, Oct 03, 2014 at 08:23:04AM +0200, Mike Galbraith wrote:

...

On Thu, 2014-10-02 at 13:15 -0400, Rik van Riel wrote:

...
Subject: sched,idle: teach select_idle_sibling about idle states

Change select_idle_sibling to take cpu idle exit latency into account. First preference is to select the cpu with the lowest exit latency from a completely idle sched_group inside the CPU; if that is not available, we pick the CPU with the lowest exit latency in any sched_group.

This increases the total search time of select_idle_sibling, we may want to look into propagating load info up the sched_group tree in some way. That information would also be useful to prevent the wake_affine logic from causing a load imbalance between sched_groups.

A generic boo hiss aimed in the general direction of all of this let's go look at every possibility on every wakeup stuff. Less is more.

I hear you, can you see actual slowdown with the patch? While the worst case doesn't change, it does make the average case equal to the worst case iteration -- where we previously would average out at inspecting half the CPUs before finding an idle one, we'd now always inspect all of them in order to compare all idle ones on their properties.

Also, with the latest generation of Haswell Xeons having 18 cores (36 threads) this is one massively painful loop for sure.

I'm just not sure what to do about it.. I suppose we can artificially split it into smaller groups, but I bet that'll hurt some, but if we can show it gains more we might still be able to do it. The only real problem is actual numbers/workloads (isn't it always) :/

One thing I suppose we could try is keeping a 'busy' flag at the llc domain which is set when all CPUs are busy (we'll clear it from new_idle); that way we can avoid the entire iteration if we know it's pointless.

Hmm...
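A rough user-space sketch of the 'busy' flag idea follows. Everything here is hypothetical: `struct llc_model`, `model_new_idle()` and `model_find_idle()` are illustration-only names, the flag is set lazily when a scan finds no idle CPU (one possible policy, not something the thread specifies), and the `scans` counter exists only to make the saved iterations visible.

```c
#include <stdbool.h>

#define NR_CPUS_MODEL 4	/* assumption for this sketch */

struct llc_model {
	bool cpu_idle[NR_CPUS_MODEL];
	bool busy;	/* set once a scan has found no idle CPU */
	int scans;	/* number of full scans performed */
};

/* A CPU going idle clears the flag, so the next wakeup scans again. */
static void model_new_idle(struct llc_model *llc, int cpu)
{
	llc->cpu_idle[cpu] = true;
	llc->busy = false;
}

/* Returns an idle CPU or -1; skips the scan when the domain is flagged busy. */
static int model_find_idle(struct llc_model *llc)
{
	if (llc->busy)
		return -1;

	llc->scans++;
	for (int i = 0; i < NR_CPUS_MODEL; i++)
		if (llc->cpu_idle[i])
			return i;

	llc->busy = true;
	return -1;
}
```

In this model, repeated wakeups on a saturated domain cost one flag test each instead of a walk over all CPUs. A real implementation would also have to deal with concurrent updates of the flag, which the sketch ignores.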

Mike Galbraith

1:05 p.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states

On Fri, 2014-10-03 at 09:50 +0200, Peter Zijlstra wrote:

...

On Fri, Oct 03, 2014 at 08:23:04AM +0200, Mike Galbraith wrote:

...

...
A generic boo hiss aimed in the general direction of all of this let's go look at every possibility on every wakeup stuff. Less is more.

I hear you, can you see actual slowdown with the patch? While the worst case doesn't change, it does make the average case equal to the worst case iteration -- where we previously would average out at inspecting half the CPUs before finding an idle one, we'd now always inspect all of them in order to compare all idle ones on their properties.

Also, with the latest generation of Haswell Xeons having 18 cores (36 threads) this is one massively painful loop for sure.

Yeah, the things are getting too damn big. I didn't try the patch and measure anything, my gut instantly said "nope, not worth it".

...

I'm just not sure what to do about it.. I suppose we can artificially split it into smaller groups, but I bet that'll hurt some, but if we can show it gains more we might still be able to do it. The only real problem is actual numbers/workloads (isn't it always) :/

One thing I suppose we could try is keeping a 'busy' flag at the llc domain which is set when all CPUs are busy (we'll clear it from new_idle); that way we can avoid the entire iteration if we know it's pointless.

On one of those huge packages, heck, even on an 8 core, that could save a substantial number of busy box cycles.

-Mike

Rik van Riel

2:28 p.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states


On 10/03/2014 03:50 AM, Peter Zijlstra wrote:

...

On Fri, Oct 03, 2014 at 08:23:04AM +0200, Mike Galbraith wrote:

...
On Thu, 2014-10-02 at 13:15 -0400, Rik van Riel wrote:

...
Subject: sched,idle: teach select_idle_sibling about idle states

Change select_idle_sibling to take cpu idle exit latency into account. First preference is to select the cpu with the lowest exit latency from a completely idle sched_group inside the CPU; if that is not available, we pick the CPU with the lowest exit latency in any sched_group.

This increases the total search time of select_idle_sibling, we may want to look into propagating load info up the sched_group tree in some way. That information would also be useful to prevent the wake_affine logic from causing a load imbalance between sched_groups.

A generic boo hiss aimed in the general direction of all of this let's go look at every possibility on every wakeup stuff. Less is more.

I hear you, can you see actual slowdown with the patch? While the worst case doesn't change, it does make the average case equal to the worst case iteration -- where we previously would average out at inspecting half the CPUs before finding an idle one, we'd now always inspect all of them in order to compare all idle ones on their properties.

Also, with the latest generation of Haswell Xeons having 18 cores (36 threads) this is one massively painful loop for sure.

We have 3 different goals when selecting a runqueue for a task:

1) locality: get the task running close to where it has stuff cached
2) work preserving: get the task running ASAP, and preferably on a fully idle core
3) idle state latency: place the task on a CPU that can start running it ASAP

We may also consider the interplay of the above 3 to have an impact on

4) power use: pack tasks on some CPUs so other CPUs can go into deeper idle states

The current implementation is a "compromise" between (1) and (2), with a strong preference for (2), falling back to (1) if no fully idle core is found.

My ugly hack isn't any better, trading off (1) in order to be better at (2) and (3). Whether it even affects (4) remains to be seen.

I know my patch is probably unacceptable, but I do think it is important that we talk about the problem, and hopefully agree on exactly what the problem is that we want to solve.

One big question in my mind is, when is locality more important, and when is work preserving more important? Do we have an answer to that question?

The current code has the potential to be quite painful on systems with a large number of cores per chip, so we will have to change things anyway...

-- All rights reversed


Peter Zijlstra

2:46 p.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states

On Fri, Oct 03, 2014 at 10:28:42AM -0400, Rik van Riel wrote:

...

We have 3 different goals when selecting a runqueue for a task:

1) locality: get the task running close to where it has stuff cached

2) work preserving: get the task running ASAP, and preferably on a fully idle core

3) idle state latency: place the task on a CPU that can start running it ASAP

3 can also be considered part of power aware, seeing how it will try and let CPUs reach their deep idle potential.

...

We may also consider the interplay of the above 3 to have an impact on

4) power use: pack tasks on some CPUs so other CPUs can go into deeper idle states

The current implementation is a "compromise" between (1) and (2), with a strong preference for (2), falling back to (1) if no fully idle core is found.

My ugly hack isn't any better, trading off (1) in order to be better at (2) and (3). Whether it even affects (4) remains to be seen.

I know my patch is probably unacceptable, but I do think it is important that we talk about the problem, and hopefully agree on exactly what the problem is that we want to solve.

Yeah, we've been through this several times, it basically boils down to the amount of fail vs win on 'various' workloads. The endless problem is of course that the fail vs win ratio is entirely workload dependent and as ever there is no comprehensive set.

The last time this came up was when Mike tried to do his cache buddy idea, which basically reduced things to only looking at 2 cpus. That made some things fly and some things tank.

...

One big question in my mind is, when is locality more important, and when is work preserving more important? Do we have an answer to that question?

Typically 2) is important when there's lots of short running tasks around, any queueing typically destroys throughput in that case.

...

The current code has the potential to be quite painful on systems with a large number of cores per chip, so we will have to change things anyway...

What I said.. so far we've failed at coming up with anything sane though, so far we've found that 2 cpus is too small a slice to look at and we're fairly sure 18/36 is too large :-)

Rik van Riel

3:37 p.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states


On 10/03/2014 10:46 AM, Peter Zijlstra wrote:

...

On Fri, Oct 03, 2014 at 10:28:42AM -0400, Rik van Riel wrote:

...

...
The current code has the potential to be quite painful on systems with a large number of cores per chip, so we will have to change things anyway...

What I said.. so far we've failed at coming up with anything sane though, so far we've found that 2 cpus is too small a slice to look at and we're fairly sure 18/36 is too large :-)

Some more brainstorming points...

1) We should probably (lazily/batched?) propagate load information up the sched_group tree. This will be useful for wake_affine, load_balancing, find_idlest_cpu, and select_idle_sibling

2) With both find_idlest_cpu and select_idle_sibling walking down the tree from the LLC level, they could probably share code

3) Counting both blocked and runnable load may give better long term stability of loads, resulting in a reduction in work preserving behaviour, but an improvement in locality - this could be worthwhile, but it is hard to say in advance

4) We can be pretty sure that CPU makers are not going to stop at a mere 18 cores. We need to subdivide things below the LLC level, turning select_idle_sibling and find_idlest_cpu into a tree walk.

This means whatever selection criteria are used by these need to be propagated up the sched_group tree. This, in turn, means we probably need to restrict ourselves to things that do not get changed/updated too often.
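
A rough sketch of point 4, assuming each node of a group tree caches the lowest idle exit latency found below it. `struct group_node`, `refresh_summary()` and `find_idle_cpu()` are hypothetical names and do not mirror the kernel's sched_domain or sched_group layout.

```c
#include <limits.h>
#include <stddef.h>

#define NO_IDLE UINT_MAX	/* summary value when nothing below is idle */

/* Hypothetical tree node; leaves stand for single CPUs. */
struct group_node {
	struct group_node *child;	/* first child, NULL for a leaf */
	struct group_node *sibling;	/* next child of the same parent */
	int cpu;			/* CPU id, meaningful for leaves only */
	unsigned int min_latency;	/* lowest idle exit latency below here */
};

/* Propagate leaf values upward; leaves must already hold their own value. */
unsigned int refresh_summary(struct group_node *node)
{
	unsigned int best = NO_IDLE;

	if (!node->child)
		return node->min_latency;
	for (struct group_node *c = node->child; c; c = c->sibling) {
		unsigned int v = refresh_summary(c);

		if (v < best)
			best = v;
	}
	node->min_latency = best;
	return best;
}

/* Walk down, always following the child with the best cached summary. */
int find_idle_cpu(const struct group_node *node)
{
	if (node->min_latency == NO_IDLE)
		return -1;
	while (node->child) {
		const struct group_node *best = node->child;

		for (const struct group_node *c = best->sibling; c; c = c->sibling) {
			if (c->min_latency < best->min_latency)
				best = c;
		}
		node = best;
	}
	return node->cpu;
}
```

The walk only touches the children along one path instead of every CPU, but it is only as good as the cached summaries, which is why the update frequency of the propagated criteria matters.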

Am I overlooking anything?

-- All rights reversed


Peter Zijlstra

9 Oct

4:04 p.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states

On Fri, Oct 03, 2014 at 11:37:31AM -0400, Rik van Riel wrote:

...

Some more brainstorming points...

1) We should probably (lazily/batched?) propagate load information up the sched_group tree. This will be useful for wake_affine, load_balancing, find_idlest_cpu, and select_idle_sibling

2) With both find_idlest_cpu and select_idle_sibling walking down the tree from the LLC level, they could probably share code

3) Counting both blocked and runnable load may give better long term stability of loads, resulting in a reduction in work preserving behaviour, but an improvement in locality - this could be worthwhile, but it is hard to say in advance

4) We can be pretty sure that CPU makers are not going to stop at a mere 18 cores. We need to subdivide things below the LLC level, turning select_idle_sibling and find_idlest_cpu into a tree walk.

This means whatever selection criteria are used by these need to be propagated up the sched_group tree. This, in turn, means we probably need to restrict ourselves to things that do not get changed/updated too often.

Am I overlooking anything?

Well, we can certainly try something like that; but your last point seems like a contradiction, seeing how _the_ important point for select_idle_sibling() is the actual idle state, and that by definition is something that can change/update often.

But yes, the only viable option is some artificial breakup of the topology and we can indeed try and bridge the gap with some caching.
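
One way to picture "bridging the gap with some caching" is a per-group summary that is tolerated for a bounded age before it is recomputed. This is purely an illustrative assumption, not something proposed in the thread: the structure, the helper names and the timestamp unit are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached per-group value with the time it was last computed. */
struct cached_summary {
	unsigned int min_latency;	/* lowest idle exit latency seen */
	uint64_t stamp;			/* time of last refresh, arbitrary unit */
};

/* True when the cached value is older than max_age and should be redone. */
bool summary_stale(const struct cached_summary *c, uint64_t now,
		   uint64_t max_age)
{
	return now - c->stamp > max_age;
}

/* Record a freshly computed value together with its timestamp. */
void summary_store(struct cached_summary *c, unsigned int value, uint64_t now)
{
	c->min_latency = value;
	c->stamp = now;
}
```

The trade-off is the one raised above: a larger `max_age` makes updates cheaper, but raises the chance that the cached idle state no longer matches reality when a wakeup consults it.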

Nicolas Pitre

3 Oct

6:52 p.m.

New subject: [PATCH RFC] sched,idle: teach select_idle_sibling about idle states

On Fri, 3 Oct 2014, Rik van Riel wrote:

...

We have 3 different goals when selecting a runqueue for a task:

1) locality: get the task running close to where it has stuff cached

2) work preserving: get the task running ASAP, and preferably on a fully idle core

3) idle state latency: place the task on a CPU that can start running it ASAP

We may also consider the interplay of the above 3 to have an impact on

4) power use: pack tasks on some CPUs so other CPUs can go into deeper idle states

In my mind the actual choice is between (1) and (2). Once you decided on (2) then obviously you should imply (3) all the time. And by having (3) then (4) should be a natural side effect by not selecting idle CPUs randomly.

By selecting (1) you already have (4).

The deficient part right now is (3) as a consequence of (2). Fixing (3) should not have to affect (1).

...

The current implementation is a "compromise" between (1) and (2), with a strong preference for (2), falling back to (1) if no fully idle core is found.

My ugly hack isn't any better, trading off (1) in order to be better at (2) and (3). Whether it even affects (4) remains to be seen.

(4) is greatly influenced by (3) on mobile platforms, especially those with a cluster topology. This might not be as significant on server type systems, although performance should benefit as well from the smaller wake-up latency.

On a mobile system losing 10% performance to save 20% on power usage might be an excellent compromise. Maybe not so on a server system where performance is everything.

Nicolas

Nicolas Pitre

10 Sep

9:35 p.m.

Ping.

On Thu, 4 Sep 2014, Nicolas Pitre wrote:

...

This is a rework of the series initially posted by Daniel Lezcano here:

http://news.gmane.org/group/gmane.linux.power-management.general/thread=4416...

Those patches were straightened up, commit logs are more comprehensive, bugs were fixed, etc.

drivers/cpuidle/cpuidle.c | 6 ++++++ kernel/sched/fair.c | 43 ++++++++++++++++++++++++++++++++++------- kernel/sched/idle.c | 6 ++++++ kernel/sched/sched.h | 39 +++++++++++++++++++++++++++++++++++++ 4 files changed, 87 insertions(+), 7 deletions(-)

Nicolas

linaro-kernel mailing list linaro-kernel@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-kernel

Rafael J. Wysocki

10:50 p.m.

On Wednesday, September 10, 2014 05:35:27 PM Nicolas Pitre wrote:

...

Ping.

Is this urgent, and if so, then why?


-- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.

Nicolas Pitre

11:25 p.m.

On Thu, 11 Sep 2014, Rafael J. Wysocki wrote:

...

On Wednesday, September 10, 2014 05:35:27 PM Nicolas Pitre wrote:

...
Ping.

Is this urgent, and if so, then why?

What makes you think this could be urgent?

After almost a week after the original posting without any feedback, one may simply wonder if things could have accidentally fell into a crack, that's all.


Nicolas Pitre

11:28 p.m.

On Wed, 10 Sep 2014, Nicolas Pitre wrote:

...

On Thu, 11 Sep 2014, Rafael J. Wysocki wrote:

...
On Wednesday, September 10, 2014 05:35:27 PM Nicolas Pitre wrote:

...
Ping.

Is this urgent, and if so, then why?

What makes you think this could be urgent?

After almost a week after the original posting without any feedback, one may simply wonder if things could have accidentally fell into a crack, that's all.

s/fell/fallen/ of course.

Nicolas

Rafael J. Wysocki

11:50 p.m.

On Wednesday, September 10, 2014 07:25:37 PM Nicolas Pitre wrote:

...

On Thu, 11 Sep 2014, Rafael J. Wysocki wrote:

...
On Wednesday, September 10, 2014 05:35:27 PM Nicolas Pitre wrote:

...
Ping.

Is this urgent, and if so, then why?

What makes you think this could be urgent?

After almost a week after the original posting without any feedback, one may simply wonder if things could have accidentally fell into a crack, that's all.

Well, they didn't, but some recipients have been traveling quite a bit lately and are in the process of dealing with their email backlogs ...

Sorry about being less responsive than expected.

Rafael

Nicolas Pitre

18 Sep

12:39 a.m.

Ping-ping.

On Wed, 10 Sep 2014, Nicolas Pitre wrote:

...

Ping.


Peter Zijlstra

11:24 p.m.

On Wed, Sep 17, 2014 at 08:39:34PM -0400, Nicolas Pitre wrote:

...

Ping-ping.

Right, finally got to it. Too much travel and some time away from the computer with the 'usual' result :/

Nicolas Pitre

19 Sep

6:22 p.m.

On Fri, 19 Sep 2014, Peter Zijlstra wrote:

...

On Wed, Sep 17, 2014 at 08:39:34PM -0400, Nicolas Pitre wrote:

...
Ping-ping.

Right, finally got to it. Too much travel and some time away from the computer with the 'usual' result :/

No problem. Next Wednesday I'd simply have sent a ping^3. :-)

Nicolas


participants (8)

Daniel Lezcano
Mike Galbraith
Nicolas Pitre
Paul E. McKenney
Peter Zijlstra
Rafael J. Wysocki
Rik van Riel
Yao Dongdong