As everyone should know by now, we want to integrate the cpuidle governor with the scheduler for more efficient idling of CPUs. To help the transition, this small patch series moves the existing interaction with cpuidle from architecture code to generic core code. The ARM, PPC, SH and X86 architectures are affected. No functional change is intended yet.
@peterz: Are you willing to pick up those patches?
Change from v1:
- dropped removal of arch_cpu_idle_prepare()
 arch/arm/kernel/process.c                       | 16 +++------
 arch/powerpc/platforms/pseries/processor_idle.c |  5 +++
 arch/powerpc/platforms/pseries/setup.c          | 34 ++++++++-----------
 arch/sh/kernel/idle.c                           |  4 +--
 arch/x86/kernel/process.c                       |  5 +--
 kernel/Makefile                                 |  1 -
 kernel/cpu/Makefile                             |  1 -
 kernel/sched/Makefile                           |  2 +-
 kernel/{cpu => sched}/idle.c                    |  4 ++-
 9 files changed, 30 insertions(+), 42 deletions(-)
Nicolas
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/cpu/idle.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
index 988573a9a3..ffcd3ee9af 100644
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -3,6 +3,7 @@
  */
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/cpuidle.h>
 #include <linux/tick.h>
 #include <linux/mm.h>
 #include <linux/stackprotector.h>
@@ -95,7 +96,8 @@ static void cpu_idle_loop(void)
 		if (!current_clr_polling_and_test()) {
 			stop_critical_timings();
 			rcu_idle_enter();
-			arch_cpu_idle();
+			if (cpuidle_idle_call())
+				arch_cpu_idle();
 			WARN_ON_ONCE(irqs_disabled());
 			rcu_idle_exit();
 			start_critical_timings();
On Wed, 29 Jan 2014, Nicolas Pitre wrote:
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code.
----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Subject: idle: move the cpuidle entry point to the generic idle loop
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
In both cases, i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already, but let's add a forced IRQ enable here as well. This will allow removing the forced IRQ enable some implementations do locally, letting the warning trigger.
Signed-off-by: Nicolas Pitre nico@linaro.org
diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
index 988573a9a3..14ca43430a 100644
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -3,6 +3,7 @@
  */
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/cpuidle.h>
 #include <linux/tick.h>
 #include <linux/mm.h>
 #include <linux/stackprotector.h>
@@ -95,8 +96,10 @@ static void cpu_idle_loop(void)
 		if (!current_clr_polling_and_test()) {
 			stop_critical_timings();
 			rcu_idle_enter();
-			arch_cpu_idle();
-			WARN_ON_ONCE(irqs_disabled());
+			if (cpuidle_idle_call())
+				arch_cpu_idle();
+			if (WARN_ON_ONCE(irqs_disabled()))
+				local_irq_enable();
 			rcu_idle_exit();
 			start_critical_timings();
 		} else {
Hi Nicolas,
On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
On Wed, 29 Jan 2014, Nicolas Pitre wrote:
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code.
----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Subject: idle: move the cpuidle entry point to the generic idle loop
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and
Why would this patch allow for removing the forced IRQ enable that is being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? It's supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from an idle state but are expected to have done so, isn't it?
Thanks
Regards Preeti U Murthy
allowing for the warning to trigger.
Signed-off-by: Nicolas Pitre nico@linaro.org
diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
index 988573a9a3..14ca43430a 100644
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -3,6 +3,7 @@
  */
 #include <linux/sched.h>
 #include <linux/cpu.h>
+#include <linux/cpuidle.h>
 #include <linux/tick.h>
 #include <linux/mm.h>
 #include <linux/stackprotector.h>
@@ -95,8 +96,10 @@ static void cpu_idle_loop(void)
 		if (!current_clr_polling_and_test()) {
 			stop_critical_timings();
 			rcu_idle_enter();
-			arch_cpu_idle();
-			WARN_ON_ONCE(irqs_disabled());
+			if (cpuidle_idle_call())
+				arch_cpu_idle();
+			if (WARN_ON_ONCE(irqs_disabled()))
+				local_irq_enable();
 			rcu_idle_exit();
 			start_critical_timings();
 		} else {
On Thu, 30 Jan 2014, Preeti U Murthy wrote:
Hi Nicolas,
On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
On Wed, 29 Jan 2014, Nicolas Pitre wrote:
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code.
----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Subject: idle: move the cpuidle entry point to the generic idle loop
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and
Why would this patch allow for removing the forced IRQ enable that is being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? It's supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from an idle state but are expected to have done so, isn't it?
Exact. However x86 currently does this:
	if (cpuidle_idle_call())
		x86_idle();
	else
		local_irq_enable();
So whenever cpuidle_idle_call() is successful, IRQs are unconditionally enabled whether or not the underlying cpuidle driver has properly done it. And the reason is that some of the x86 cpuidle drivers do fail to enable IRQs before returning.
So the idea is to get rid of this unconditional IRQ enabling and let the core issue a warning instead (as well as enabling IRQs to allow the system to run).
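That masking effect can be made concrete with a small userspace C model (a sketch only; the boolean flag and function names below are hypothetical stand-ins for the real IRQ state and kernel idle entry points):

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model: IRQ state as a plain flag (stand-in for hardware state). */
static bool irqs_enabled;

/* Models a buggy cpuidle driver: it succeeds (returns 0, like
 * cpuidle_idle_call()) but forgets to re-enable IRQs before returning. */
static int buggy_cpuidle_idle_call(void)
{
	return 0;
}

/* Old x86 arrangement: the else branch unconditionally enables IRQs on
 * cpuidle success, so the driver bug never surfaces. */
static void old_x86_dispatch(void)
{
	if (buggy_cpuidle_idle_call())
		irqs_enabled = true;	/* x86_idle() would exit with IRQs on */
	else
		irqs_enabled = true;	/* unconditional local_irq_enable() */
}

/* New core arrangement: fall back to arch idle on failure, then detect
 * (warn) and repair a path that left IRQs disabled. Returns whether the
 * warning fired. */
static bool new_core_dispatch(void)
{
	bool warned = false;

	if (buggy_cpuidle_idle_call())
		irqs_enabled = true;	/* arch_cpu_idle() fallback */
	if (!irqs_enabled) {		/* WARN_ON_ONCE(irqs_disabled()) */
		warned = true;
		irqs_enabled = true;	/* safety local_irq_enable() */
	}
	return warned;
}
```

With the old dispatch the system keeps running and the driver bug stays invisible; with the new one the system still runs, but the misbehaving driver is flagged.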
Nicolas
Hi Nicolas,
On 01/30/2014 10:58 AM, Nicolas Pitre wrote:
On Thu, 30 Jan 2014, Preeti U Murthy wrote:
Hi Nicolas,
On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
On Wed, 29 Jan 2014, Nicolas Pitre wrote:
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code.
----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Subject: idle: move the cpuidle entry point to the generic idle loop
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and
Why would this patch allow for removing the forced IRQ enable that is being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? It's supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from an idle state but are expected to have done so, isn't it?
Exact. However x86 currently does this:
	if (cpuidle_idle_call())
		x86_idle();
	else
		local_irq_enable();
So whenever cpuidle_idle_call() is successful, IRQs are unconditionally enabled whether or not the underlying cpuidle driver has properly done it. And the reason is that some of the x86 cpuidle drivers do fail to enable IRQs before returning.
So the idea is to get rid of this unconditional IRQ enabling and let the core issue a warning instead (as well as enabling IRQs to allow the system to run).
Oh ok, thank you for clarifying this:)
Regards Preeti U Murthy
Nicolas
On 01/30/2014 06:28 AM, Nicolas Pitre wrote:
On Thu, 30 Jan 2014, Preeti U Murthy wrote:
Hi Nicolas,
On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
On Wed, 29 Jan 2014, Nicolas Pitre wrote:
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code.
----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Subject: idle: move the cpuidle entry point to the generic idle loop
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and
Why would this patch allow for removing the forced IRQ enable that is being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? It's supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from an idle state but are expected to have done so, isn't it?
Exact. However x86 currently does this:
	if (cpuidle_idle_call())
		x86_idle();
	else
		local_irq_enable();
So whenever cpuidle_idle_call() is successful, IRQs are unconditionally enabled whether or not the underlying cpuidle driver has properly done it. And the reason is that some of the x86 cpuidle drivers do fail to enable IRQs before returning.
So the idea is to get rid of this unconditional IRQ enabling and let the core issue a warning instead (as well as enabling IRQs to allow the system to run).
But what I don't get with your comment is that the local_irq_enable is done from the cpuidle common framework in cpuidle_enter_state(); it is not done from the arch-specific backend cpuidle driver.
So the code above could be:
	if (cpuidle_idle_call())
		x86_idle();
without the else section; this local_irq_enable is pointless. Or maybe I missed something?
On Thu, 30 Jan 2014, Daniel Lezcano wrote:
On 01/30/2014 06:28 AM, Nicolas Pitre wrote:
On Thu, 30 Jan 2014, Preeti U Murthy wrote:
Hi Nicolas,
On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
On Wed, 29 Jan 2014, Nicolas Pitre wrote:
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code.
----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Subject: idle: move the cpuidle entry point to the generic idle loop
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and
Why would this patch allow for removing the forced IRQ enable that is being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? It's supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from an idle state but are expected to have done so, isn't it?
Exact. However x86 currently does this:
	if (cpuidle_idle_call())
		x86_idle();
	else
		local_irq_enable();
So whenever cpuidle_idle_call() is successful, IRQs are unconditionally enabled whether or not the underlying cpuidle driver has properly done it. And the reason is that some of the x86 cpuidle drivers do fail to enable IRQs before returning.
So the idea is to get rid of this unconditional IRQ enabling and let the core issue a warning instead (as well as enabling IRQs to allow the system to run).
But what I don't get with your comment is that the local_irq_enable is done from the cpuidle common framework in cpuidle_enter_state(); it is not done from the arch-specific backend cpuidle driver.
Oh well... This certainly means we'll have to clean this mess as some drivers do it on their own while some others don't. Some drivers also loop on !need_resched() while some others simply return on the first interrupt.
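The two driver idioms mentioned here can be contrasted in a small userspace model (hypothetical names; a counter stands in for interrupt-driven wakeups and a flag for need_resched()):

```c
#include <assert.h>
#include <stdbool.h>

static bool need_resched_flag;
static int wakeups;

/* One hardware idle entry; each call models one interrupt-driven wakeup.
 * Pretend the 3rd interrupt is the one that sets need_resched. */
static void enter_idle_once(void)
{
	wakeups++;
	if (wakeups >= 3)
		need_resched_flag = true;
}

/* Idiom 1: return to the core idle loop on the first interrupt; the core
 * loop decides whether to re-enter idle. */
static void driver_return_on_irq(void)
{
	enter_idle_once();
}

/* Idiom 2: loop locally until there is real work, so spurious wakeups
 * never leave the driver. */
static void driver_loop_on_need_resched(void)
{
	while (!need_resched_flag)
		enter_idle_once();
}
```

Both end up idle until real work arrives; the difference is whether spurious wakeups bounce through the core loop or stay inside the driver, which is exactly the inconsistency being complained about.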
So the code above could be:
	if (cpuidle_idle_call())
		x86_idle();
without the else section; this local_irq_enable is pointless. Or maybe I missed something?
A later patch removes it anyway. But if it is really necessary to enable interrupts then the core will do it but with a warning now.
Nicolas
On 01/30/2014 05:07 PM, Nicolas Pitre wrote:
On Thu, 30 Jan 2014, Daniel Lezcano wrote:
On 01/30/2014 06:28 AM, Nicolas Pitre wrote:
On Thu, 30 Jan 2014, Preeti U Murthy wrote:
Hi Nicolas,
On 01/30/2014 02:01 AM, Nicolas Pitre wrote:
On Wed, 29 Jan 2014, Nicolas Pitre wrote:
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
As mentioned in my reply to Olof's comment on patch #5/6, here's a new version of this patch adding the safety local_irq_enable() to the core code.
----- >8
From: Nicolas Pitre <nicolas.pitre@linaro.org>
Subject: idle: move the cpuidle entry point to the generic idle loop
In order to integrate cpuidle with the scheduler, we must have a better proximity in the core code with what cpuidle is doing and not delegate such interaction to arch code.
Architectures implementing arch_cpu_idle() should simply enter a cheap idle mode in the absence of a proper cpuidle driver.
In both cases i.e. whether it is a cpuidle driver or the default arch_cpu_idle(), the calling convention expects IRQs to be disabled on entry and enabled on exit. There is a warning in place already but let's add a forced IRQ enable here as well. This will allow for removing the forced IRQ enable some implementations do locally and
Why would this patch allow for removing the forced IRQ enable that is being done on some archs in arch_cpu_idle()? Isn't this patch expecting the default arch_cpu_idle() to have re-enabled the interrupts after exiting from the default idle state? It's supposed to only catch faulty cpuidle drivers that haven't enabled IRQs on exit from an idle state but are expected to have done so, isn't it?
Exact. However x86 currently does this:
	if (cpuidle_idle_call())
		x86_idle();
	else
		local_irq_enable();
So whenever cpuidle_idle_call() is successful, IRQs are unconditionally enabled whether or not the underlying cpuidle driver has properly done it. And the reason is that some of the x86 cpuidle drivers do fail to enable IRQs before returning.
So the idea is to get rid of this unconditional IRQ enabling and let the core issue a warning instead (as well as enabling IRQs to allow the system to run).
But what I don't get with your comment is that the local_irq_enable is done from the cpuidle common framework in cpuidle_enter_state(); it is not done from the arch-specific backend cpuidle driver.
Oh well... This certainly means we'll have to clean this mess as some drivers do it on their own while some others don't. Some drivers also loop on !need_resched() while some others simply return on the first interrupt.
Ok, I think the mess is coming from default_idle(), which does not re-enable local IRQs but is used from different places like amd_e400_idle and apm_cpu_idle.
void default_idle(void)
{
	trace_cpu_idle_rcuidle(1, smp_processor_id());
	safe_halt();
	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
}
Considering the system configured without cpuidle (because cpuidle *always* enables local IRQs), we have the following cases:
x86_idle = default_idle(); ==> local_irq_enable is missing
x86_idle = amd_e400_idle();
==> it calls local_irq_disable(), but in the idle loop context local IRQs are already disabled
==> if amd_e400_c1e_detected is true, local IRQs are enabled
==> otherwise they are not
==> default_idle is called from there and does not enable local IRQs
So the code above could be:
	if (cpuidle_idle_call())
		x86_idle();
without the else section; this local_irq_enable is pointless. Or maybe I missed something?
A later patch removes it anyway. But if it is really necessary to enable interrupts then the core will do it but with a warning now.
This WARN should disappear. It was there because it was up to the backend cpuidle driver to enable the irq. But in the meantime, that was consolidated into a single place in the cpuidle framework so no need to try to catch errors.
What about (based on this patchset).
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4505e2a..2d60cbb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -299,6 +299,7 @@ void arch_cpu_idle_dead(void)
 void arch_cpu_idle(void)
 {
 	x86_idle();
+	local_irq_enable();
 }
 
 /*
On Thu, Jan 30, 2014 at 06:28:52PM +0100, Daniel Lezcano wrote:
Ok, I think the mess is coming from default_idle(), which does not re-enable local IRQs but is used from different places like amd_e400_idle and apm_cpu_idle.
void default_idle(void)
{
	trace_cpu_idle_rcuidle(1, smp_processor_id());
	safe_halt();
	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
}
Considering the system configured without cpuidle (because cpuidle *always* enables local IRQs), we have the following cases:
x86_idle = default_idle(); ==> local_irq_enable is missing
safe_halt() is "sti; hlt" and so very much does the irq_enable.
On Thu, 30 Jan 2014, Daniel Lezcano wrote:
On 01/30/2014 05:07 PM, Nicolas Pitre wrote:
On Thu, 30 Jan 2014, Daniel Lezcano wrote:
But what I don't get with your comment is that the local_irq_enable is done from the cpuidle common framework in cpuidle_enter_state(); it is not done from the arch-specific backend cpuidle driver.
Oh well... This certainly means we'll have to clean this mess as some drivers do it on their own while some others don't. Some drivers also loop on !need_resched() while some others simply return on the first interrupt.
Ok, I think the mess is coming from default_idle(), which does not re-enable local IRQs but is used from different places like amd_e400_idle and apm_cpu_idle.
Yet if you look at the code path before my patches you'll see that IRQs were enabled only after cpuidle_idle_call() had returned success.
void default_idle(void)
{
	trace_cpu_idle_rcuidle(1, smp_processor_id());
	safe_halt();
	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
}
Considering the system configured without cpuidle (because cpuidle *always* enables local IRQs),
Yet this is debatable. Given that some hardware does have IRQs turned on upon exiting idle mode, I think we should generalize this so that the explicit enabling of IRQs, when needed, is done as close as possible to the operation that caused idle mode to be entered.
we have the different cases:
x86_idle = default_idle(); ==> local_irq_enable is missing
According to Peter it is not.
x86_idle = amd_e400_idle();
==> it calls local_irq_disable(), but in the idle loop context local IRQs are already disabled.
Since it returned from default_idle(), IRQs are enabled.
==> if amd_e400_c1e_detected is true, local IRQs are enabled
==> otherwise they are not
==> default_idle is called from there and does not enable local IRQs
Again, it does.
So the code above could be:
	if (cpuidle_idle_call())
		x86_idle();
without the else section; this local_irq_enable is pointless. Or maybe I missed something?
A later patch removes it anyway. But if it is really necessary to enable interrupts then the core will do it but with a warning now.
This WARN should disappear. It was there because it was up to the backend cpuidle driver to enable the irq. But in the meantime, that was consolidated into a single place in the cpuidle framework so no need to try to catch errors.
And that consolidation was a mistake IMHO. We should assume that the exiting of idle mode has IRQs enabled already, and do so manually in the backend driver if it is not the case on particular hardware. That's the only way to ensure uniformity at a higher level.
Yet, if a code path is buggy in that regard, whether this is through cpuidle when enabled, or the default idle function otherwise, then the warning is there in cpu_idle_loop() to catch them all.
What about (based on this patchset).
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4505e2a..2d60cbb 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -299,6 +299,7 @@ void arch_cpu_idle_dead(void)
 void arch_cpu_idle(void)
 {
 	x86_idle();
+	local_irq_enable();
 }
Again this is redundant.
Nicolas
The core idle loop now takes care of it.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 arch/arm/kernel/process.c | 16 +++------
 1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 92f7b15dd2..adabeababe 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -30,7 +30,6 @@
 #include <linux/uaccess.h>
 #include <linux/random.h>
 #include <linux/hw_breakpoint.h>
-#include <linux/cpuidle.h>
 #include <linux/leds.h>
 #include <linux/reboot.h>
 
@@ -133,7 +132,11 @@ EXPORT_SYMBOL_GPL(arm_pm_restart);
 
 void (*arm_pm_idle)(void);
 
-static void default_idle(void)
+/*
+ * Called from the core idle loop.
+ */
+
+void arch_cpu_idle(void)
 {
 	if (arm_pm_idle)
 		arm_pm_idle();
@@ -168,15 +171,6 @@ void arch_cpu_idle_dead(void)
 #endif
 
 /*
- * Called from the core idle loop.
- */
-void arch_cpu_idle(void)
-{
-	if (cpuidle_idle_call())
-		default_idle();
-}
-
-/*
  * Called by kexec, immediately prior to machine_kexec().
  *
  * This must completely disable all secondary CPUs; simply causing those CPUs
The core idle loop now takes care of it. However a few things need checking:
- Invocation of cpuidle_idle_call() in pseries_lpar_idle() happened through arch_cpu_idle() and was therefore always preceded by a call to ppc64_runlatch_off(). To preserve this property now that cpuidle_idle_call() is invoked directly from core code, a call to ppc64_runlatch_off() has been added to idle_loop_prolog() in platforms/pseries/processor_idle.c.
- Similarly, cpuidle_idle_call() was followed by ppc64_runlatch_on(), so a call to the latter has been added to idle_loop_epilog().
- And since arch_cpu_idle() always made sure to re-enable IRQs if they were not enabled, this is now done in idle_loop_epilog() as well.
The above was done in order to keep the execution flow close to the original. I don't know if that was strictly necessary. Someone well acquainted with the platform details might find some room for possible optimizations.
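The intended ordering described above (runlatch off before the accounting, IRQ repair and runlatch on after) can be sketched as a userspace trace model; everything here is hypothetical stand-ins, with strings replacing the real SPR and lppaca accesses:

```c
#include <assert.h>
#include <string.h>

static char trace[128];
static int irqs_off = 1;	/* idle is entered with IRQs disabled */

static void ev(const char *e)
{
	strcat(trace, e);
	strcat(trace, ";");
}

/* Sketch of idle_loop_prolog() after the patch: runlatch off first,
 * then the PURR snapshot and the idle indication to the hypervisor. */
static void idle_loop_prolog(void)
{
	ev("runlatch_off");
	ev("purr_snapshot");	/* *in_purr = mfspr(SPRN_PURR) */
	ev("idle=1");		/* get_lppaca()->idle = 1 */
}

/* Sketch of idle_loop_epilog(): accounting, idle indication cleared,
 * then IRQ repair if needed, and runlatch back on last. */
static void idle_loop_epilog(void)
{
	ev("accounting");	/* wait_state_cycles update */
	ev("idle=0");		/* get_lppaca()->idle = 0 */
	if (irqs_off) {
		ev("irq_on");	/* local_irq_enable() */
		irqs_off = 0;
	}
	ev("runlatch_on");
}
```

The point of the model is only the ordering invariant: the runlatch is off for the whole idle residency and IRQs are guaranteed enabled before the epilog returns, matching what arch_cpu_idle() used to guarantee around cpuidle_idle_call().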
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/processor_idle.c |  5 ++++
 arch/powerpc/platforms/pseries/setup.c          | 34 ++++++++----------
 2 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/processor_idle.c b/arch/powerpc/platforms/pseries/processor_idle.c
index a166e38bd6..72ddfe3d2f 100644
--- a/arch/powerpc/platforms/pseries/processor_idle.c
+++ b/arch/powerpc/platforms/pseries/processor_idle.c
@@ -33,6 +33,7 @@ static struct cpuidle_state *cpuidle_state_table;
 
 static inline void idle_loop_prolog(unsigned long *in_purr)
 {
+	ppc64_runlatch_off();
 	*in_purr = mfspr(SPRN_PURR);
 	/*
 	 * Indicate to the HV that we are idle. Now would be
@@ -49,6 +50,10 @@
 	wait_cycles += mfspr(SPRN_PURR) - in_purr;
 	get_lppaca()->wait_state_cycles = cpu_to_be64(wait_cycles);
 	get_lppaca()->idle = 0;
+
+	if (irqs_disabled())
+		local_irq_enable();
+	ppc64_runlatch_on();
 }
 static int snooze_loop(struct cpuidle_device *dev,
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index c1f1908587..7604c19d54 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -39,7 +39,6 @@
 #include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/root_dev.h>
-#include <linux/cpuidle.h>
 #include <linux/of.h>
 #include <linux/kexec.h>
@@ -356,29 +355,24 @@ early_initcall(alloc_dispatch_log_kmem_cache);
 static void pseries_lpar_idle(void)
 {
-	/* This would call on the cpuidle framework, and the back-end pseries
-	 * driver to go to idle states
+	/*
+	 * Default handler to go into low thread priority and possibly
+	 * low power mode by cedeing processor to hypervisor
 	 */
-	if (cpuidle_idle_call()) {
-		/* On error, execute default handler
-		 * to go into low thread priority and possibly
-		 * low power mode by cedeing processor to hypervisor
-		 */
 
-		/* Indicate to hypervisor that we are idle. */
-		get_lppaca()->idle = 1;
+	/* Indicate to hypervisor that we are idle. */
+	get_lppaca()->idle = 1;
 
-		/*
-		 * Yield the processor to the hypervisor. We return if
-		 * an external interrupt occurs (which are driven prior
-		 * to returning here) or if a prod occurs from another
-		 * processor. When returning here, external interrupts
-		 * are enabled.
-		 */
-		cede_processor();
+	/*
+	 * Yield the processor to the hypervisor. We return if
+	 * an external interrupt occurs (which are driven prior
+	 * to returning here) or if a prod occurs from another
+	 * processor. When returning here, external interrupts
+	 * are enabled.
+	 */
+	cede_processor();
 
-		get_lppaca()->idle = 0;
-	}
+	get_lppaca()->idle = 0;
 }
/*
The core idle loop now takes care of it.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 arch/sh/kernel/idle.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 2ea4483fd7..be616ee0cf 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -16,7 +16,6 @@
 #include <linux/thread_info.h>
 #include <linux/irqflags.h>
 #include <linux/smp.h>
-#include <linux/cpuidle.h>
 #include <linux/atomic.h>
 #include <asm/pgalloc.h>
 #include <asm/smp.h>
@@ -40,8 +39,7 @@ void arch_cpu_idle_dead(void)
 void arch_cpu_idle(void)
 {
-	if (cpuidle_idle_call())
-		sh_idle();
+	sh_idle();
 }
void __init select_idle_routine(void)
The core idle loop now takes care of it.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 arch/x86/kernel/process.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95ab8..4505e2a950 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -298,10 +298,7 @@ void arch_cpu_idle_dead(void)
  */
 void arch_cpu_idle(void)
 {
-	if (cpuidle_idle_call())
-		x86_idle();
-	else
-		local_irq_enable();
+	x86_idle();
 }
/*
Hi,
On Wed, Jan 29, 2014 at 9:45 AM, Nicolas Pitre nicolas.pitre@linaro.org wrote:
The core idle loop now takes care of it.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
 arch/x86/kernel/process.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95ab8..4505e2a950 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -298,10 +298,7 @@ void arch_cpu_idle_dead(void)
  */
 void arch_cpu_idle(void)
 {
-	if (cpuidle_idle_call())
-		x86_idle();
-	else
-		local_irq_enable();
+	x86_idle();
 }
You're taking out the local_irq_enable() here, but I don't see the equivalent being added back in the 1/6 patch that moves the cpuidle_idle_call() up to common code. It seems that at least one of the call paths through cpuidle_idle_call() doesn't re-enable it on its own.
Even if this is the right thing to do, why it's OK to do so should probably be documented in the patch description.
-Olof
On Wed, 29 Jan 2014, Olof Johansson wrote:
Hi,
On Wed, Jan 29, 2014 at 9:45 AM, Nicolas Pitre nicolas.pitre@linaro.org wrote:
The core idle loop now takes care of it.
Signed-off-by: Nicolas Pitre nico@linaro.org
Acked-by: Daniel Lezcano daniel.lezcano@linaro.org
 arch/x86/kernel/process.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 3fb8d95ab8..4505e2a950 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -298,10 +298,7 @@ void arch_cpu_idle_dead(void)
  */
 void arch_cpu_idle(void)
 {
-	if (cpuidle_idle_call())
-		x86_idle();
-	else
-		local_irq_enable();
+	x86_idle();
You're taking out the local_irq_enable() here, but I don't see the equivalent being added back in the 1/6 patch that moves the cpuidle_idle_call() up to common code. It seems that at least one of the call paths through cpuidle_idle_call() doesn't re-enable it on its own.
When cpuidle_idle_call() returns non-zero, IRQs are left disabled. When it returns zero, IRQs should be enabled. Same goes for cpuidle drivers. That's the theory at least.
Looking into some cpuidle drivers for x86 I found at least one that doesn't respect this convention. Damn.
Even if this is the right thing to do, why it's OK to do so should probably be documented in the patch description.
Better yet, I'm going to amend patch 1/6 with the below to make things more reliable while still identifying misbehaving drivers.
diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
index ffcd3ee9af..14ca43430a 100644
--- a/kernel/cpu/idle.c
+++ b/kernel/cpu/idle.c
@@ -98,7 +98,8 @@ static void cpu_idle_loop(void)
 				rcu_idle_enter();
 				if (cpuidle_idle_call())
 					arch_cpu_idle();
-				WARN_ON_ONCE(irqs_disabled());
+				if (WARN_ON_ONCE(irqs_disabled()))
+					local_irq_enable();
 				rcu_idle_exit();
 				start_critical_timings();
 			} else {
On Wed, Jan 29, 2014 at 03:14:40PM -0500, Nicolas Pitre wrote:
Looking into some cpuidle drivers for x86 I found at least one that doesn't respect this convention. Damn.
Which one? We should probably fix it :-)
Integration of cpuidle with the scheduler requires that the idle loop be closely integrated with the scheduler proper. Moving cpu/idle.c into the sched directory will allow for a smoother integration, and eliminate a subdirectory which contained only one source file.
Signed-off-by: Nicolas Pitre nico@linaro.org
---
 kernel/Makefile              | 1 -
 kernel/cpu/Makefile          | 1 -
 kernel/sched/Makefile        | 2 +-
 kernel/{cpu => sched}/idle.c | 0
 4 files changed, 1 insertion(+), 3 deletions(-)
 delete mode 100644 kernel/cpu/Makefile
 rename kernel/{cpu => sched}/idle.c (100%)
diff --git a/kernel/Makefile b/kernel/Makefile
index bc010ee272..6f1c7e5cfc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -22,7 +22,6 @@
 obj-y += sched/
 obj-y += locking/
 obj-y += power/
 obj-y += printk/
-obj-y += cpu/
 obj-y += irq/
 obj-y += rcu/
diff --git a/kernel/cpu/Makefile b/kernel/cpu/Makefile
deleted file mode 100644
index 59ab052ef7..0000000000
--- a/kernel/cpu/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-obj-y = idle.o
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 7b621409cf..ac3e0ea68f 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
-obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o proc.o clock.o cputime.o idle_task.o idle.o fair.o rt.o stop_task.o
 obj-y += wait.o completion.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
diff --git a/kernel/cpu/idle.c b/kernel/sched/idle.c
similarity index 100%
rename from kernel/cpu/idle.c
rename to kernel/sched/idle.c
On Wed, Jan 29, 2014 at 12:45:13PM -0500, Nicolas Pitre wrote:
Integration of cpuidle with the scheduler requires that the idle loop be closely integrated with the scheduler proper. Moving cpu/idle.c into the sched directory will allow for a smoother integration, and eliminate a subdirectory which contained only one source file.
Signed-off-by: Nicolas Pitre nico@linaro.org
 kernel/Makefile              | 1 -
 kernel/cpu/Makefile          | 1 -
 kernel/sched/Makefile        | 2 +-
 kernel/{cpu => sched}/idle.c | 0
 4 files changed, 1 insertion(+), 3 deletions(-)
 delete mode 100644 kernel/cpu/Makefile
 rename kernel/{cpu => sched}/idle.c (100%)
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
-obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o proc.o clock.o cputime.o idle_task.o idle.o fair.o rt.o stop_task.o
 obj-y += wait.o completion.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
diff --git a/kernel/cpu/idle.c b/kernel/sched/idle.c
similarity index 100%
rename from kernel/cpu/idle.c
rename to kernel/sched/idle.c
This is not a valid patch for PATCH(1). Please try again.
On Thu, 30 Jan 2014, Peter Zijlstra wrote:
On Wed, Jan 29, 2014 at 12:45:13PM -0500, Nicolas Pitre wrote:
Integration of cpuidle with the scheduler requires that the idle loop be closely integrated with the scheduler proper. Moving cpu/idle.c into the sched directory will allow for a smoother integration, and eliminate a subdirectory which contained only one source file.
Signed-off-by: Nicolas Pitre nico@linaro.org
 kernel/Makefile              | 1 -
 kernel/cpu/Makefile          | 1 -
 kernel/sched/Makefile        | 2 +-
 kernel/{cpu => sched}/idle.c | 0
 4 files changed, 1 insertion(+), 3 deletions(-)
 delete mode 100644 kernel/cpu/Makefile
 rename kernel/{cpu => sched}/idle.c (100%)
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
-obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o proc.o clock.o cputime.o idle_task.o idle.o fair.o rt.o stop_task.o
 obj-y += wait.o completion.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
diff --git a/kernel/cpu/idle.c b/kernel/sched/idle.c
similarity index 100%
rename from kernel/cpu/idle.c
rename to kernel/sched/idle.c
This is not a valid patch for PATCH(1). Please try again.
Don't you use git? ;-)
Here's a plain patch:
----- >8
From 1bf40eb80a44633094e94986a74bd5ffa222f9d4 Mon Sep 17 00:00:00 2001
From: Nicolas Pitre nicolas.pitre@linaro.org
Date: Sun, 26 Jan 2014 23:42:01 -0500
Subject: [PATCH] cpu/idle.c: move to sched/idle.c
Integration of cpuidle with the scheduler requires that the idle loop be closely integrated with the scheduler proper. Moving cpu/idle.c into the sched directory will allow for a smoother integration, and eliminate a subdirectory which contained only one source file.
Signed-off-by: Nicolas Pitre nico@linaro.org
---
 kernel/Makefile       |   1 -
 kernel/cpu/Makefile   |   1 -
 kernel/cpu/idle.c     | 144 --------------------------------------------------
 kernel/sched/Makefile |   2 +-
 kernel/sched/idle.c   | 144 ++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 145 insertions(+), 147 deletions(-)
 delete mode 100644 kernel/cpu/Makefile
 delete mode 100644 kernel/cpu/idle.c
 create mode 100644 kernel/sched/idle.c
diff --git a/kernel/Makefile b/kernel/Makefile
index bc010ee272..6f1c7e5cfc 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -22,7 +22,6 @@
 obj-y += sched/
 obj-y += locking/
 obj-y += power/
 obj-y += printk/
-obj-y += cpu/
 obj-y += irq/
 obj-y += rcu/
diff --git a/kernel/cpu/Makefile b/kernel/cpu/Makefile
deleted file mode 100644
index 59ab052ef7..0000000000
--- a/kernel/cpu/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-obj-y = idle.o
diff --git a/kernel/cpu/idle.c b/kernel/cpu/idle.c
deleted file mode 100644
index 14ca43430a..0000000000
--- a/kernel/cpu/idle.c
+++ /dev/null
@@ -1,144 +0,0 @@
-/*
- * Generic entry point for the idle threads
- */
-#include <linux/sched.h>
-#include <linux/cpu.h>
-#include <linux/cpuidle.h>
-#include <linux/tick.h>
-#include <linux/mm.h>
-#include <linux/stackprotector.h>
-
-#include <asm/tlb.h>
-
-#include <trace/events/power.h>
-
-static int __read_mostly cpu_idle_force_poll;
-
-void cpu_idle_poll_ctrl(bool enable)
-{
-	if (enable) {
-		cpu_idle_force_poll++;
-	} else {
-		cpu_idle_force_poll--;
-		WARN_ON_ONCE(cpu_idle_force_poll < 0);
-	}
-}
-
-#ifdef CONFIG_GENERIC_IDLE_POLL_SETUP
-static int __init cpu_idle_poll_setup(char *__unused)
-{
-	cpu_idle_force_poll = 1;
-	return 1;
-}
-__setup("nohlt", cpu_idle_poll_setup);
-
-static int __init cpu_idle_nopoll_setup(char *__unused)
-{
-	cpu_idle_force_poll = 0;
-	return 1;
-}
-__setup("hlt", cpu_idle_nopoll_setup);
-#endif
-
-static inline int cpu_idle_poll(void)
-{
-	rcu_idle_enter();
-	trace_cpu_idle_rcuidle(0, smp_processor_id());
-	local_irq_enable();
-	while (!tif_need_resched())
-		cpu_relax();
-	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
-	rcu_idle_exit();
-	return 1;
-}
-
-/* Weak implementations for optional arch specific functions */
-void __weak arch_cpu_idle_prepare(void) { }
-void __weak arch_cpu_idle_enter(void) { }
-void __weak arch_cpu_idle_exit(void) { }
-void __weak arch_cpu_idle_dead(void) { }
-void __weak arch_cpu_idle(void)
-{
-	cpu_idle_force_poll = 1;
-	local_irq_enable();
-}
-
-/*
- * Generic idle loop implementation
- */
-static void cpu_idle_loop(void)
-{
-	while (1) {
-		tick_nohz_idle_enter();
-
-		while (!need_resched()) {
-			check_pgt_cache();
-			rmb();
-
-			if (cpu_is_offline(smp_processor_id()))
-				arch_cpu_idle_dead();
-
-			local_irq_disable();
-			arch_cpu_idle_enter();
-
-			/*
-			 * In poll mode we reenable interrupts and spin.
-			 *
-			 * Also if we detected in the wakeup from idle
-			 * path that the tick broadcast device expired
-			 * for us, we don't want to go deep idle as we
-			 * know that the IPI is going to arrive right
-			 * away
-			 */
-			if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
-				cpu_idle_poll();
-			} else {
-				if (!current_clr_polling_and_test()) {
-					stop_critical_timings();
-					rcu_idle_enter();
-					if (cpuidle_idle_call())
-						arch_cpu_idle();
-					if (WARN_ON_ONCE(irqs_disabled()))
-						local_irq_enable();
-					rcu_idle_exit();
-					start_critical_timings();
-				} else {
-					local_irq_enable();
-				}
-				__current_set_polling();
-			}
-			arch_cpu_idle_exit();
-			/*
-			 * We need to test and propagate the TIF_NEED_RESCHED
-			 * bit here because we might not have send the
-			 * reschedule IPI to idle tasks.
-			 */
-			if (tif_need_resched())
-				set_preempt_need_resched();
-		}
-		tick_nohz_idle_exit();
-		schedule_preempt_disabled();
-	}
-}
-
-void cpu_startup_entry(enum cpuhp_state state)
-{
-	/*
-	 * This #ifdef needs to die, but it's too late in the cycle to
-	 * make this generic (arm and sh have never invoked the canary
-	 * init for the non boot cpus!). Will be fixed in 3.11
-	 */
-#ifdef CONFIG_X86
-	/*
-	 * If we're the non-boot CPU, nothing set the stack canary up
-	 * for us. The boot CPU already has it initialized but no harm
-	 * in doing it again. This is a good place for updating it, as
-	 * we wont ever return from this function (so the invalid
-	 * canaries already on the stack wont ever trigger).
-	 */
-	boot_init_stack_canary();
-#endif
-	__current_set_polling();
-	arch_cpu_idle_prepare();
-	cpu_idle_loop();
-}
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 7b621409cf..ac3e0ea68f 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -11,7 +11,7 @@ ifneq ($(CONFIG_SCHED_OMIT_FRAME_POINTER),y)
 CFLAGS_core.o := $(PROFILING) -fno-omit-frame-pointer
 endif
 
-obj-y += core.o proc.o clock.o cputime.o idle_task.o fair.o rt.o stop_task.o
+obj-y += core.o proc.o clock.o cputime.o idle_task.o idle.o fair.o rt.o stop_task.o
 obj-y += wait.o completion.o
 obj-$(CONFIG_SMP) += cpupri.o
 obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
new file mode 100644
index 0000000000..14ca43430a
--- /dev/null
+++ b/kernel/sched/idle.c
@@ -0,0 +1,144 @@
+/*
+ * Generic entry point for the idle threads
+ */
+#include <linux/sched.h>
+#include <linux/cpu.h>
+#include <linux/cpuidle.h>
+#include <linux/tick.h>
+#include <linux/mm.h>
+#include <linux/stackprotector.h>
+
+#include <asm/tlb.h>
+
+#include <trace/events/power.h>
+
+static int __read_mostly cpu_idle_force_poll;
+
+void cpu_idle_poll_ctrl(bool enable)
+{
+	if (enable) {
+		cpu_idle_force_poll++;
+	} else {
+		cpu_idle_force_poll--;
+		WARN_ON_ONCE(cpu_idle_force_poll < 0);
+	}
+}
+
+#ifdef CONFIG_GENERIC_IDLE_POLL_SETUP
+static int __init cpu_idle_poll_setup(char *__unused)
+{
+	cpu_idle_force_poll = 1;
+	return 1;
+}
+__setup("nohlt", cpu_idle_poll_setup);
+
+static int __init cpu_idle_nopoll_setup(char *__unused)
+{
+	cpu_idle_force_poll = 0;
+	return 1;
+}
+__setup("hlt", cpu_idle_nopoll_setup);
+#endif
+
+static inline int cpu_idle_poll(void)
+{
+	rcu_idle_enter();
+	trace_cpu_idle_rcuidle(0, smp_processor_id());
+	local_irq_enable();
+	while (!tif_need_resched())
+		cpu_relax();
+	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
+	rcu_idle_exit();
+	return 1;
+}
+
+/* Weak implementations for optional arch specific functions */
+void __weak arch_cpu_idle_prepare(void) { }
+void __weak arch_cpu_idle_enter(void) { }
+void __weak arch_cpu_idle_exit(void) { }
+void __weak arch_cpu_idle_dead(void) { }
+void __weak arch_cpu_idle(void)
+{
+	cpu_idle_force_poll = 1;
+	local_irq_enable();
+}
+
+/*
+ * Generic idle loop implementation
+ */
+static void cpu_idle_loop(void)
+{
+	while (1) {
+		tick_nohz_idle_enter();
+
+		while (!need_resched()) {
+			check_pgt_cache();
+			rmb();
+
+			if (cpu_is_offline(smp_processor_id()))
+				arch_cpu_idle_dead();
+
+			local_irq_disable();
+			arch_cpu_idle_enter();
+
+			/*
+			 * In poll mode we reenable interrupts and spin.
+			 *
+			 * Also if we detected in the wakeup from idle
+			 * path that the tick broadcast device expired
+			 * for us, we don't want to go deep idle as we
+			 * know that the IPI is going to arrive right
+			 * away
+			 */
+			if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
+				cpu_idle_poll();
+			} else {
+				if (!current_clr_polling_and_test()) {
+					stop_critical_timings();
+					rcu_idle_enter();
+					if (cpuidle_idle_call())
+						arch_cpu_idle();
+					if (WARN_ON_ONCE(irqs_disabled()))
+						local_irq_enable();
+					rcu_idle_exit();
+					start_critical_timings();
+				} else {
+					local_irq_enable();
+				}
+				__current_set_polling();
+			}
+			arch_cpu_idle_exit();
+			/*
+			 * We need to test and propagate the TIF_NEED_RESCHED
+			 * bit here because we might not have send the
+			 * reschedule IPI to idle tasks.
+			 */
+			if (tif_need_resched())
+				set_preempt_need_resched();
+		}
+		tick_nohz_idle_exit();
+		schedule_preempt_disabled();
+	}
+}
+
+void cpu_startup_entry(enum cpuhp_state state)
+{
+	/*
+	 * This #ifdef needs to die, but it's too late in the cycle to
+	 * make this generic (arm and sh have never invoked the canary
+	 * init for the non boot cpus!). Will be fixed in 3.11
+	 */
+#ifdef CONFIG_X86
+	/*
+	 * If we're the non-boot CPU, nothing set the stack canary up
+	 * for us. The boot CPU already has it initialized but no harm
+	 * in doing it again. This is a good place for updating it, as
+	 * we wont ever return from this function (so the invalid
+	 * canaries already on the stack wont ever trigger).
+	 */
+	boot_init_stack_canary();
+#endif
+	__current_set_polling();
+	arch_cpu_idle_prepare();
+	cpu_idle_loop();
+}
On Thu, 2014-01-30 at 17:27 +0100, Peter Zijlstra wrote:
On Thu, Jan 30, 2014 at 11:03:31AM -0500, Nicolas Pitre wrote:
This is not a valid patch for PATCH(1). Please try again.
Don't you use git? ;-)
Nah, git and me don't get along well.
Perhaps you could use a newer version of patch
http://savannah.gnu.org/forum/forum.php?forum_id=7361
GNU patch version 2.7 released
Item posted by Andreas Gruenbacher <agruen> on Wed 12 Sep 2012 02:18:14 PM UTC.
I am pleased to announce that version 2.7 of GNU patch has been released. The following significant changes have happened since the last stable release in December 2009:
* Support for most features of the "diff --git" format, including renames and copies, permission changes, and symlink diffs. Binary diffs are not supported yet; patch will complain and skip them.
On Thu, Jan 30, 2014 at 08:41:16AM -0800, Joe Perches wrote:
Perhaps you could use a newer version of patch
GNU patch version 2.7 released
Yeah, I know about that, I'll wait until its common in all distros, updating all machines I use by hand is just painful.
On Thu, 30 Jan 2014, Peter Zijlstra wrote:
On Thu, Jan 30, 2014 at 11:03:31AM -0500, Nicolas Pitre wrote:
This is not a valid patch for PATCH(1). Please try again.
Don't you use git? ;-)
Nah, git and me don't get along well.
Here's a plain patch:
Thanks!
Hi Peter,
Did you merge those patches in your tree? If so, is it published somewhere? It would be good if these could appear in linux-next so as to prevent people from adding more calls to cpuidle_idle_call() from architecture code. I'm sending you two additional patches right away to remove those that appeared in v3.14-rc1.
Nicolas
On Thu, Feb 06, 2014 at 02:09:59PM +0000, Nicolas Pitre wrote:
Hi Peter,
Did you merge those patches in your tree?
tree, tree, what's in a word. It's in my patch stack, yes. I should get some of that into tip I suppose, been side-tracked a bit this week. Sorry for the delay.
If so, is it published somewhere?
http://programming.kicks-ass.net/sekrit/patches.tar.bz2
It is of varying quality at best.
It would be good if these could appear in linux-next so as to prevent people from adding more calls to cpuidle_idle_call() from architecture code. I'm sending you two additional patches right away to remove those that appeared in v3.14-rc1.
Yeah, once they land in tip they'll end up in -next automagically. I'll try and get that sorted tomorrow somewhere.
On Thu, 6 Feb 2014, Peter Zijlstra wrote:
On Thu, Feb 06, 2014 at 02:09:59PM +0000, Nicolas Pitre wrote:
Hi Peter,
Did you merge those patches in your tree?
tree, tree, what's in a word.
Something you may plant on a patch of grass? "Merging" becomes a strange concept in that context though. :-)
It's in my patch stack, yes.
Quilt I suppose?? (yet another word.)
I should get some of that into tip I suppose, been side-tracked a bit this week. Sorry for the delay.
If you prefer, we can pile those patches (and future ones, after review) ourselves; just let me know. Future patches are likely to be more intimate with the scheduler, so I just need to know who to upstream them through afterwards.
Nicolas
On Fri, Feb 07, 2014 at 11:09:23AM +0000, Nicolas Pitre wrote:
On Thu, 6 Feb 2014, Peter Zijlstra wrote:
tree, tree, what's in a word.
Something you may plant on a patch of grass? "Merging" becomes a strange concept in that context though. :-)
I do know some farmers who splice trees though :-)
Its in my patch stack yes.
Quilt I suppose?? (yet another word.)
Yes, I'm one of the refugee Quilt users. Comes in handy when cold too :-)
I should get some of that into tip I suppose, been side-tracked a bit this week. Sorry for the delay.
If you prefer, we can pile those patches (and future ones, after review) ourselves; just let me know. Future patches are likely to be more intimate with the scheduler, so I just need to know who to upstream them through afterwards.
Normally I get Ingo to pick up the queue 1-2 times a week so latency shouldn't be too bad. But we just had the merge window and then I got side-tracked rewriting all atomic implementations.
So usually submit patches against tip/master unless you know there's other pending bits that conflict, in which case you can grab my queue on top of tip/master -- but I try to make sure that's usually not needed.
On Wed, Jan 29, 2014 at 12:45:07PM -0500, Nicolas Pitre wrote:
As everyone should know by now, we want to integrate the cpuidle governor with the scheduler for a more efficient idling of CPUs. In order to help the transition, this small patch series moves the existing interaction with cpuidle from architecture code to generic core code. The ARM, PPC, SH and X86 architectures are concerned. No functional change should have occurred yet.
@peterz: Are you willing to pick up those patches?
Yeah.. no objections. Should I pick these up or will you be sending another round?
On Thu, 30 Jan 2014, Peter Zijlstra wrote:
On Wed, Jan 29, 2014 at 12:45:07PM -0500, Nicolas Pitre wrote:
As everyone should know by now, we want to integrate the cpuidle governor with the scheduler for a more efficient idling of CPUs. In order to help the transition, this small patch series moves the existing interaction with cpuidle from architecture code to generic core code. The ARM, PPC, SH and X86 architectures are concerned. No functional change should have occurred yet.
@peterz: Are you willing to pick up those patches?
Yeah.. no objections. Should I pick these up or will you be sending another round?
I think you could pick them now, taking care of picking up the amended #1/6.
Nicolas