[PATCH 0/4] time: dynamic irq affinity


Daniel Lezcano

26 Feb 2013

10:17 p.m.

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time framework to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

This patch solves this by setting the irq affinity to the cpu concerned by the nearest timer event. This way, the CPU which is woken up is guaranteed to be the one concerned by the next event, and we avoid an unnecessary wakeup of another idle CPU.

As the irq affinity is not supported by all the archs, a flag is needed to specify which clocksource can handle it.

Daniel Lezcano (3):
  time : pass broadcast parameter
  time : set broadcast irq affinity
  ARM: nomadik: add dynamic irq flag to the timer

Viresh Kumar (1):
  ARM: timer-sp: Set dynamic irq affinity

 arch/arm/common/timer-sp.c        |    3 ++-
 drivers/clocksource/nomadik-mtu.c |    3 ++-
 include/linux/clockchips.h        |    1 +
 kernel/time/tick-broadcast.c      |   40 +++++++++++++++++++++++++++++--------
 4 files changed, 37 insertions(+), 10 deletions(-)

-- 1.7.9.5
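
The selection step described in the cover letter can be sketched in user space. This is only an illustrative model, not kernel code: `struct cpu_timer` and `nearest_event_cpu()` are hypothetical names, and expiries are plain nanosecond counters instead of ktime_t. It mirrors the `next_cpu` tracking that patch 2/4 adds to tick_handle_oneshot_broadcast().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one pending expiry per CPU in the broadcast mask. */
struct cpu_timer {
	int cpu;            /* CPU number */
	int64_t next_event; /* next expiry in ns; INT64_MAX means none */
};

/*
 * Return the CPU that owns the earliest expiry still in the future,
 * or -1 if there is none. Expired entries are skipped because the
 * real handler serves them right away through the IPI mask.
 */
int nearest_event_cpu(const struct cpu_timer *t, size_t n, int64_t now)
{
	int64_t best = INT64_MAX;
	int next_cpu = -1;

	for (size_t i = 0; i < n; i++) {
		if (t[i].next_event <= now)
			continue;
		if (t[i].next_event < best) {
			best = t[i].next_event;
			next_cpu = t[i].cpu;
		}
	}
	return next_cpu;
}
```

The returned CPU is the one the broadcast interrupt would be affined to before the device is rearmed; a result of -1 corresponds to the case where nothing needs to be programmed.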


Daniel Lezcano

26 Feb

10:17 p.m.

New subject: [PATCH 1/4] time : pass broadcast parameter

The broadcast timer could be passed as parameter to the function instead of using again tick_broadcast_device.evtdev which was previously used in the caller function.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/time/tick-broadcast.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 2fb8cb8..6197ac0 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -406,10 +406,9 @@ struct cpumask *tick_get_broadcast_oneshot_mask(void)
 	return to_cpumask(tick_broadcast_oneshot_mask);
 }
 
-static int tick_broadcast_set_event(ktime_t expires, int force)
+static int tick_broadcast_set_event(struct clock_event_device *bc,
+				    ktime_t expires, int force)
 {
-	struct clock_event_device *bc = tick_broadcast_device.evtdev;
-
 	if (bc->mode != CLOCK_EVT_MODE_ONESHOT)
 		clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 
@@ -479,7 +478,7 @@ again:
 		 * Rearm the broadcast device. If event expired,
 		 * repeat the above
 		 */
-		if (tick_broadcast_set_event(next_event, 0))
+		if (tick_broadcast_set_event(dev, next_event, 0))
 			goto again;
 	}
 	raw_spin_unlock(&tick_broadcast_lock);
@@ -522,7 +521,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
-				tick_broadcast_set_event(dev->next_event, 1);
+				tick_broadcast_set_event(bc, dev->next_event, 1);
 		}
 	} else {
 		if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
@@ -591,7 +590,7 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 			clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 			tick_broadcast_init_next_event(to_cpumask(tmpmask),
 						       tick_next_period);
-			tick_broadcast_set_event(tick_next_period, 1);
+			tick_broadcast_set_event(bc, tick_next_period, 1);
 		} else
 			bc->next_event.tv64 = KTIME_MAX;
 	} else {

-- 1.7.9.5

Santosh Shilimkar

27 Feb

5:09 a.m.

New subject: [PATCH 1/4] time : pass broadcast parameter

On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...

The broadcast timer could be passed as parameter to the function instead of using again tick_broadcast_device.evtdev which was previously used in the caller function.

Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org

The change doesn't buy us much as such, even after looking at the next patch, which tries to use bc. No strong opinion though.

Regards, Santosh

Daniel Lezcano

26 Feb

10:17 p.m.

New subject: [PATCH 2/4] time : set broadcast irq affinity

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time framework to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

As the irq affinity is not supported by all the archs, a flag is needed to specify which clocksource can handle it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 include/linux/clockchips.h   |    1 +
 kernel/time/tick-broadcast.c |   39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 6634652..c256cea 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -54,6 +54,7 @@ enum clock_event_nofitiers {
  */
 #define CLOCK_EVT_FEAT_C3STOP		0x000008
 #define CLOCK_EVT_FEAT_DUMMY		0x000010
+#define CLOCK_EVT_FEAT_DYNIRQ		0x000020
 
 /**
  * struct clock_event_device - clock event device descriptor
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6197ac0..1f7b4f4 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -406,13 +406,36 @@ struct cpumask *tick_get_broadcast_oneshot_mask(void)
 	return to_cpumask(tick_broadcast_oneshot_mask);
 }
 
-static int tick_broadcast_set_event(struct clock_event_device *bc,
+/*
+ * Set broadcast interrupt affinity
+ */
+static void tick_broadcast_set_affinity(struct clock_event_device *bc, int cpu)
+{
+	if (!(bc->features & CLOCK_EVT_FEAT_DYNIRQ))
+		return;
+
+	if (cpumask_equal(bc->cpumask, cpumask_of(cpu)))
+		return;
+
+	bc->cpumask = cpumask_of(cpu);
+	irq_set_affinity(bc->irq, bc->cpumask);
+}
+
+static int tick_broadcast_set_event(struct clock_event_device *bc, int cpu,
 				    ktime_t expires, int force)
 {
+	int ret;
+
 	if (bc->mode != CLOCK_EVT_MODE_ONESHOT)
 		clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 
-	return clockevents_program_event(bc, expires, force);
+	ret = clockevents_program_event(bc, expires, force);
+	if (ret)
+		return ret;
+
+	tick_broadcast_set_affinity(bc, cpu);
+
+	return 0;
 }
 
 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
@@ -441,7 +464,7 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 {
 	struct tick_device *td;
 	ktime_t now, next_event;
-	int cpu;
+	int cpu, next_cpu;
 
 	raw_spin_lock(&tick_broadcast_lock);
 again:
@@ -454,8 +477,10 @@ again:
 		td = &per_cpu(tick_cpu_device, cpu);
 		if (td->evtdev->next_event.tv64 <= now.tv64)
 			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
-		else if (td->evtdev->next_event.tv64 < next_event.tv64)
+		else if (td->evtdev->next_event.tv64 < next_event.tv64) {
 			next_event.tv64 = td->evtdev->next_event.tv64;
+			next_cpu = cpu;
+		}
 	}
 
 	/*
@@ -478,7 +503,7 @@ again:
 		 * Rearm the broadcast device. If event expired,
 		 * repeat the above
 		 */
-		if (tick_broadcast_set_event(dev, next_event, 0))
+		if (tick_broadcast_set_event(dev, next_cpu, next_event, 0))
 			goto again;
 	}
 	raw_spin_unlock(&tick_broadcast_lock);
@@ -521,7 +546,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
-				tick_broadcast_set_event(bc, dev->next_event, 1);
+				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
 		}
 	} else {
 		if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
@@ -590,7 +615,7 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 			clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 			tick_broadcast_init_next_event(to_cpumask(tmpmask),
 						       tick_next_period);
-			tick_broadcast_set_event(bc, tick_next_period, 1);
+			tick_broadcast_set_event(bc, cpu, tick_next_period, 1);
 		} else
 			bc->next_event.tv64 = KTIME_MAX;
 	} else {

-- 1.7.9.5
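
The two early returns in tick_broadcast_set_affinity() can be modelled outside the kernel. The sketch below is an assumption-laden stand-in: `struct bc_model`, `bc_set_affinity()` and the 64-bit affinity word are hypothetical, and the assignment to `affinity` merely takes the place of the real irq_set_affinity() call. Only the flag value is taken from the patch.

```c
#include <stdint.h>

#define FEAT_DYNIRQ 0x000020u /* same bit as CLOCK_EVT_FEAT_DYNIRQ in the patch */

/* Hypothetical stand-in for the broadcast clock_event_device. */
struct bc_model {
	unsigned int features;     /* feature bits advertised by the device */
	uint64_t affinity;         /* current IRQ affinity, one bit per CPU */
	unsigned int reprogrammed; /* number of affinity rewrites so far */
};

/* Returns 1 if the affinity was rewritten, 0 if the call was a no-op. */
int bc_set_affinity(struct bc_model *bc, unsigned int cpu)
{
	uint64_t want;

	if (cpu >= 64)
		return 0;          /* outside the range of this toy mask */
	want = UINT64_C(1) << cpu;

	if (!(bc->features & FEAT_DYNIRQ))
		return 0;          /* device cannot retarget its interrupt */
	if (bc->affinity == want)
		return 0;          /* interrupt already targets this CPU */

	bc->affinity = want;       /* stands in for irq_set_affinity() */
	bc->reprogrammed++;
	return 1;
}
```

A device without the flag keeps whatever static affinity it started with, which is the behaviour the series preserves for clock event devices that do not opt in.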

Santosh Shilimkar

27 Feb

5:33 a.m.

New subject: [PATCH 2/4] time : set broadcast irq affinity

On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time frame work to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

This patch solves this by setting the irq affinity to the cpu concerned by the nearest timer event, by this way, the CPU which is wake up is guarantee to be the one concerned by the next event and we are safe with unnecessary wakeup for another idle CPU.

As the irq affinity is not supported by all the archs, a flag is needed to specify which clocksource can handle it.

Minor. Can mention the flag name as well here "CLOCK_EVT_FEAT_DYNIRQ"

...

Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org

 include/linux/clockchips.h   |    1 +
 kernel/time/tick-broadcast.c |   39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 6634652..c256cea 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -54,6 +54,7 @@ enum clock_event_nofitiers {
  */
 #define CLOCK_EVT_FEAT_C3STOP		0x000008
 #define CLOCK_EVT_FEAT_DUMMY		0x000010
+#define CLOCK_EVT_FEAT_DYNIRQ		0x000020

Please add some comments about the usage of the flag.

...

 /**
  * struct clock_event_device - clock event device descriptor
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6197ac0..1f7b4f4 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -406,13 +406,36 @@ struct cpumask *tick_get_broadcast_oneshot_mask(void)
 	return to_cpumask(tick_broadcast_oneshot_mask);
 }
 
-static int tick_broadcast_set_event(struct clock_event_device *bc,
+/*
+ * Set broadcast interrupt affinity
+ */
+static void tick_broadcast_set_affinity(struct clock_event_device *bc, int cpu)
+{

It would be better to make the second parameter a cpumask rather than a CPU number. That matches the semantics of the affinity hook, which you can easily retain.

...

+	if (!(bc->features & CLOCK_EVT_FEAT_DYNIRQ))
+		return;
+
+	if (cpumask_equal(bc->cpumask, cpumask_of(cpu)))
+		return;
+
+	bc->cpumask = cpumask_of(cpu);

You can avoid calling cpumask_of() a couple of times above.

...

+	irq_set_affinity(bc->irq, bc->cpumask);
+}
+
+static int tick_broadcast_set_event(struct clock_event_device *bc, int cpu,
 				    ktime_t expires, int force)
 {
+	int ret;
+
 	if (bc->mode != CLOCK_EVT_MODE_ONESHOT)
 		clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 
-	return clockevents_program_event(bc, expires, force);
+	ret = clockevents_program_event(bc, expires, force);
+	if (ret)
+		return ret;
+
+	tick_broadcast_set_affinity(bc, cpu);

In case you go with a cpumask parameter, then the above can be just tick_broadcast_set_affinity(bc, cpumask_of(cpu));

...

+
+	return 0;
 }
 
 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
@@ -441,7 +464,7 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 {
 	struct tick_device *td;
 	ktime_t now, next_event;
-	int cpu;
+	int cpu, next_cpu;
 
 	raw_spin_lock(&tick_broadcast_lock);
 again:
@@ -454,8 +477,10 @@ again:
 		td = &per_cpu(tick_cpu_device, cpu);
 		if (td->evtdev->next_event.tv64 <= now.tv64)
 			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
-		else if (td->evtdev->next_event.tv64 < next_event.tv64)
+		else if (td->evtdev->next_event.tv64 < next_event.tv64) {
 			next_event.tv64 = td->evtdev->next_event.tv64;
+			next_cpu = cpu;
+		}
 	}
 
 	/*
@@ -478,7 +503,7 @@ again:
 		 * Rearm the broadcast device. If event expired,
 		 * repeat the above
 		 */
-		if (tick_broadcast_set_event(dev, next_event, 0))
+		if (tick_broadcast_set_event(dev, next_cpu, next_event, 0))
 			goto again;
 	}
 	raw_spin_unlock(&tick_broadcast_lock);
@@ -521,7 +546,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
-				tick_broadcast_set_event(bc, dev->next_event, 1);
+				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);

Since you have embedded the irq_set_affinity() call in the above function, the IRQ affinity for bc->irq will remain set to the last CPU on which the interrupt fired. In general that should be fine, but it would be good to clear it on CLOCK_EVT_NOTIFY_BROADCAST_EXIT. Not a must-have though.

Regards, Santosh

Daniel Lezcano

26 Feb

10:17 p.m.

New subject: [PATCH 3/4] ARM: nomadik: add dynamic irq flag to the timer

Add the dynamic irq affinity feature to the timer clock device.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/clocksource/nomadik-mtu.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/clocksource/nomadik-mtu.c b/drivers/clocksource/nomadik-mtu.c
index 7cbcaa0..73dc540 100644
--- a/drivers/clocksource/nomadik-mtu.c
+++ b/drivers/clocksource/nomadik-mtu.c
@@ -136,7 +136,8 @@ static void nmdk_clkevt_mode(enum clock_event_mode mode,
 
 static struct clock_event_device nmdk_clkevt = {
 	.name		= "mtu_1",
-	.features	= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_PERIODIC,
+	.features	= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_PERIODIC |
+			  CLOCK_EVT_FEAT_DYNIRQ,
 	.rating		= 200,
 	.set_mode	= nmdk_clkevt_mode,
 	.set_next_event	= nmdk_clkevt_next,

-- 1.7.9.5

Linus Walleij

1 Mar

1:13 a.m.

New subject: [PATCH 3/4] ARM: nomadik: add dynamic irq flag to the timer

On Tue, Feb 26, 2013 at 11:17 PM, Daniel Lezcano daniel.lezcano@linaro.org wrote:

...

Add the dynamic irq affinity feature to the timer clock device.

Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org

Looks reasonable to me, sadly I do not fully grasp the patch set, Vincent+Rickard can you have a look at this?

Yours, Linus Walleij

Vincent Guittot

8:56 a.m.

New subject: [PATCH 3/4] ARM: nomadik: add dynamic irq flag to the timer

On 1 March 2013 02:13, Linus Walleij linus.walleij@linaro.org wrote:

...

On Tue, Feb 26, 2013 at 11:17 PM, Daniel Lezcano daniel.lezcano@linaro.org wrote:

...
Add the dynamic irq affinity feature to the timer clock device.

Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org

Looks reasonable to me, sadly I do not fully grasp the patch set, Vincent+Rickard can you have a look at this?

ux500 is able to trigger the wake up on one CPU and leave the other one in WFI. This patch will minimize the spurious wake ups of CPU0 when CPU1 is the target CPU of the broadcast timer. One main consequence is that we will not uselessly execute all the deferrable and newly idle activities on CPU0.

you can add my reviewed-by if you want

Vincent

...

Yours, Linus Walleij
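
The effect Vincent describes can be put into numbers with a toy model. Everything here is an assumption made for illustration: the function names are hypothetical, every CPU is taken to be in a deep idle state, and each entry of `owner[]` is the CPU that owns one broadcast expiry. No measurement from the thread is reproduced.

```c
#include <stddef.h>

/*
 * Static affinity: the broadcast interrupt always lands on irq_cpu.
 * Every expiry owned by another CPU wakes irq_cpu for nothing and
 * costs one IPI to reach the real owner.
 */
size_t spurious_wakeups_static(const int *owner, size_t n, int irq_cpu)
{
	size_t spurious = 0;

	for (size_t i = 0; i < n; i++)
		if (owner[i] != irq_cpu)
			spurious++;
	return spurious;
}

/*
 * Dynamic affinity: the interrupt is retargeted to the owner before
 * each expiry, so no other CPU is woken. The cost moves to the number
 * of affinity rewrites, which is what this function counts.
 */
size_t affinity_changes_dynamic(const int *owner, size_t n, int initial_cpu)
{
	size_t changes = 0;
	int cur = initial_cpu;

	for (size_t i = 0; i < n; i++) {
		if (owner[i] != cur) {
			cur = owner[i];
			changes++;
		}
	}
	return changes;
}
```

When one CPU owns a long run of consecutive expiries, the static scheme pays one spurious wakeup per expiry, while the dynamic scheme pays a single affinity change for the whole run.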

Rickard Andersson

1:28 p.m.

New subject: [PATCH 3/4] ARM: nomadik: add dynamic irq flag to the timer

On 03/01/2013 09:56 AM, Vincent Guittot wrote:

...

On 1 March 2013 02:13, Linus Walleij linus.walleij@linaro.org wrote:

...
On Tue, Feb 26, 2013 at 11:17 PM, Daniel Lezcano daniel.lezcano@linaro.org wrote:

...
Add the dynamic irq affinity feature to the timer clock device.

Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org

Looks reasonable to me, sadly I do not fully grasp the patch set, Vincent+Rickard can you have a look at this?

ux500 is able to trig the wake up on one CPU and let the other one in WFI. This patch will minimize the spurious wake up of CPU0 when CPU1 is the target CPU of the broadcast timer. One main consequence is that we will not uselessly execute all the deferrable and newly idle activities on the CPU0 .

you can add my reviewed-by if you want

Vincent

It looks ok to me as well.

BR Rickard

Daniel Lezcano

26 Feb

10:17 p.m.

New subject: [PATCH 4/4] ARM: timer-sp: Set dynamic irq affinity

From: Viresh Kumar viresh.kumar@linaro.org

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time framework to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

This patch fixes this for ARM platforms using timer-sp, by setting CLOCK_EVT_FEAT_DYNIRQ feature.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 arch/arm/common/timer-sp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/common/timer-sp.c b/arch/arm/common/timer-sp.c
index 9d2d3ba..ae3c0f9 100644
--- a/arch/arm/common/timer-sp.c
+++ b/arch/arm/common/timer-sp.c
@@ -158,7 +158,8 @@ static int sp804_set_next_event(unsigned long next,
 }
 
 static struct clock_event_device sp804_clockevent = {
-	.features	= CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
+	.features	= CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT |
+			  CLOCK_EVT_FEAT_DYNIRQ,
 	.set_mode	= sp804_set_mode,
 	.set_next_event	= sp804_set_next_event,
 	.rating		= 300,

-- 1.7.9.5

Santosh Shilimkar

27 Feb

4:56 a.m.

New subject: [PATCH 4/4] ARM: timer-sp: Set dynamic irq affinity

On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...

From: Viresh Kumar viresh.kumar@linaro.org

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time frame work to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

The broadcast device will only wake up the CPU to which the timer IRQ is affined. And in fact, with the subject series, the affinity is also updated to the CPU which owns the last timer expiry event.

...

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

This patch fixes this for ARM platforms using timer-sp, by setting CLOCK_EVT_FEAT_DYNIRQ feature.

Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org

What am I missing here ?

Regards, Santosh

Viresh Kumar

4:59 a.m.

New subject: [PATCH 4/4] ARM: timer-sp: Set dynamic irq affinity

On 27 February 2013 10:26, Santosh Shilimkar santosh.shilimkar@ti.com wrote:

...

On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...
From: Viresh Kumar viresh.kumar@linaro.org

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time frame work to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

Broad-cast device will only open the CPU for which the timer IRQ affined to. And infact with subject series the affinity also is updated for the CPU which owns the last timer expiry event.

What am I missing here ?

Dynamic affinity will work only if the following flag is set for a clock_event_device: CLOCK_EVT_FEAT_DYNIRQ. Otherwise the wakeup happens on the cpu to which the static affinity was set.

Santosh Shilimkar

5:04 a.m.

New subject: [PATCH 4/4] ARM: timer-sp: Set dynamic irq affinity

On Wednesday 27 February 2013 10:29 AM, Viresh Kumar wrote:

...

On 27 February 2013 10:26, Santosh Shilimkar santosh.shilimkar@ti.com wrote:

...
On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...
From: Viresh Kumar viresh.kumar@linaro.org

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time frame work to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

Broad-cast device will only open the CPU for which the timer IRQ affined to. And infact with subject series the affinity also is updated for the CPU which owns the last timer expiry event.

What am I missing here ?

Dynamic affinity will work only if the following flag is set for a clock_event_device: CLOCK_EVT_FEAT_DYNIRQ, otherwise wakeup would happen on the cpu to which static affinity was set to.

I should have looked at the patches in order first :) Sorry for the noise.

Regards, Santosh

Santosh Shilimkar

6 a.m.

On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time framework to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

This patch solves this by setting the irq affinity to the cpu concerned by the nearest timer event, by this way, the CPU which is wake up is guarantee to be the one concerned by the next event and we are safe with unnecessary wakeup for another idle CPU.

As the irq affinity is not supported by all the archs, a flag is needed to specify which clocksource can handle it.

Not completely related to this series, but there is another issue where the local timer not being wakeup capable hurts. So far we are discussing only the timer related future events, which are known and can be programmed with the broadcast device.

But think of the scenarios where we need to send asynchronous IPIs to CPUs to do some work, e.g. generic_exec_single(). If the CPU which is supposed to be available after the IPI call is in a deep low power state, then the IPI (as implemented on ARM) isn't effective. In CPU off idle modes, a GIC SGI will not wake the CPU, and hence a special wakeup is needed to bring those CPUs out of idle. This special wakeup is handled by the broadcast timer in the case of CPUIDLE.

In short, what I mean is that you need to have an IPI which can wake up CPUs from any deep idle power state to address the above. Has anybody thought of this one?

Regards, Santosh

P.S: Time and again it proves that making the local timer wakeup capable solves the issue.

Russell King - ARM Linux

10:47 a.m.

On Wed, Feb 27, 2013 at 11:30:11AM +0530, Santosh Shilimkar wrote:

...

P.S: Time and again it proves that making the local timer wakeup capable solves the issue.

Slightly different take: it proves that hardware people don't talk to software people about what they require to make an operating system work. Hardware people think they understand that and go off and do their own thing, and expect software people to sort out their mess.

This happens all the time; there is no solution for it as long as companies view the creation of hardware as being entirely separate from software.

Thomas Gleixner

10 p.m.

On Wed, 27 Feb 2013, Russell King - ARM Linux wrote:

...

On Wed, Feb 27, 2013 at 11:30:11AM +0530, Santosh Shilimkar wrote:

...
P.S: Time and again it proves that making the local timer wakeup capable solves the issue.

Slightly different take: it proves that hardware people don't talk to software people about what they require to make an operating system work. Hardware people think they understand that and go off and do their own thing, and expect software people to sort out their mess.

This happens all the time; there is no solution for it as long as companies view the creation of hardware as being entirely separate from software.

Amen!

We have seen the mess this kind of thinking creates on x86 already 10+ years ago and we are still suffering.

As I said before, I really can't understand that ARM and the ARM SoC vendors insisted on repeating the same mistakes, which we kernel people have proven to be fatal already. It's even worse: they asked me what problems they should avoid before they went to implement them.

I can halfway understand the little kid who insists on burning his hand on the hot oven instead of listening to parental advice, but this kind of advisory resistance is either caused by abuse of secret drugs or by living in a disconnected universe or both.

Thanks,

tglx

Santosh Shilimkar

10 Mar

5:33 p.m.

On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...

When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time framework to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

This patch solves this by setting the irq affinity to the cpu concerned by the nearest timer event, by this way, the CPU which is wake up is guarantee to be the one concerned by the next event and we are safe with unnecessary wakeup for another idle CPU.

As the irq affinity is not supported by all the archs, a flag is needed to specify which clocksource can handle it.

Daniel Lezcano (3): time : pass broadcast parameter time : set broadcast irq affinity ARM: nomadik: add dynamic irq flag to the timer

Viresh Kumar (1): ARM: timer-sp: Set dynamic irq affinity

Thanks Daniel for addressing the comments from earlier version. This version looks good to me.

Reviewed-by: Santosh Shilimkar santosh.shilimkar@ti.com

Regards, Santosh

P.S: As I mentioned, I found the 'CLOCK_EVT_FEAT_DYNIRQ' optimization risky, at least on OMAP, because you might end up missing asynchronous IPI wakeups due to the current SGI implementation. This must be true for other ARM platforms as well.

Daniel Lezcano

6:22 p.m.

On 03/10/2013 06:33 PM, Santosh Shilimkar wrote:

...

On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...
When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time framework to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

This patch solves this by setting the irq affinity to the cpu concerned by the nearest timer event, by this way, the CPU which is wake up is guarantee to be the one concerned by the next event and we are safe with unnecessary wakeup for another idle CPU.

As the irq affinity is not supported by all the archs, a flag is needed to specify which clocksource can handle it.

Daniel Lezcano (3): time : pass broadcast parameter time : set broadcast irq affinity ARM: nomadik: add dynamic irq flag to the timer

Viresh Kumar (1): ARM: timer-sp: Set dynamic irq affinity

Thanks Daniel for addressing the comments from earlier version. This version looks good to me.

Reviewed-by: Santosh Shilimkar santosh.shilimkar@ti.com

Regards, Santosh P.S: As I mentioned 'CLOCK_EVT_FEAT_DYNIRQ' optimization on OMAP at least I found risky because you might end up missing the asynchronous IPI wakeups because of the current SGI's implementation. This must be true for other ARM platforms as well.

I don't think it is the case for all the ARM platforms. At least we tested it on vexpress TC2 and u8500, and the number of IPIs was reduced very significantly, increasing the idle time for cpu0. TC2 will need another optimization in another area for the idle wake up to gain real improvements.

I will test it on OMAP, but with the coupled idle state I am not sure of the behavior. Could you elaborate a bit on the specifics of OMAP? I am not sure I understand why I may miss some IPI wakeups.

Testing on more boards will be worthwhile, but not until we have correct cpuidle support, with deep idle states.

Thanks -- Daniel


Santosh Shilimkar

11 Mar

3:24 a.m.

On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:

...

On 03/10/2013 06:33 PM, Santosh Shilimkar wrote:

...
On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:

...
When a cpu goes to a deep idle state where its local timer is shutdown, it notifies the time framework to use the broadcast timer instead.

Unfortunately, the broadcast device could wake up any CPU, including an idle one which is not concerned by the wake up at all.

This implies, in the worst case, an idle CPU will wake up to send an IPI to another idle cpu.

This patch solves this by setting the irq affinity to the cpu concerned by the nearest timer event, by this way, the CPU which is wake up is guarantee to be the one concerned by the next event and we are safe with unnecessary wakeup for another idle CPU.

As the irq affinity is not supported by all the archs, a flag is needed to specify which clocksource can handle it.

Daniel Lezcano (3): time : pass broadcast parameter time : set broadcast irq affinity ARM: nomadik: add dynamic irq flag to the timer

Viresh Kumar (1): ARM: timer-sp: Set dynamic irq affinity

Thanks Daniel for addressing the comments from earlier version. This version looks good to me.

Reviewed-by: Santosh Shilimkar santosh.shilimkar@ti.com

Regards, Santosh P.S: As I mentioned 'CLOCK_EVT_FEAT_DYNIRQ' optimization on OMAP at least I found risky because you might end up missing the asynchronous IPI wakeups because of the current SGI's implementation. This must be true for other ARM platforms as well.

I don't think it is the case for all the ARM platforms, at least we tested it on vexpress TC2 and u8500, and the number of IPI were reduced very significantly increasing the idle time for cpu0. TC2 will need another optimization on another area for the idle wake up to gain real improvements.

You are missing my point. TC2 can be an exception, since the SGI can wake up CPUs even from low power states where local timers are stalled. Is that the case with U8500?

...

I will test it on OMAP but with the coupled idle state, I am not sure of the behavior. Could elaborate a bit the specificity of OMAP ? I am not sure to understand why I may miss some IPI wakeups.

I already mentioned the issue here [1]. You might not see any major issues because the missed asynchronous IPIs might eventually get executed when the CPUs wake up from deeper states because of idle wakeups. OMAP is no different regarding the idle wakeup optimisation, and it will surely benefit and work. The main reason I didn't pursue it is the lack of a solution for [1], which, as discussed in the past, is essential from a kernel functional correctness perspective. You might want to verify that by adding a tracepoint on IPIs raised for reasons other than the timer wakeup.

Regards, Santosh

[1] https://lkml.org/lkml/2013/2/27/39

Daniel Lezcano

8:40 a.m.

On 03/11/2013 04:24 AM, Santosh Shilimkar wrote:

...

On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:

[ ... ]

...

...
I don't think it is the case for all the ARM platforms, at least we tested it on vexpress TC2 and u8500, and the number of IPI were reduced very significantly increasing the idle time for cpu0. TC2 will need another optimization on another area for the idle wake up to gain real improvements.

You are missing my point. TC2 can be an exception since the SGI can wakeup CPUs even from low power states where local timer's are stalled. Is that the case with U8500 ?

Well, the cpuidle driver is not going into a deep idle state to check this out.

AFAICT this board has a specific firmware with the PRCMU (a device managing the power on the board) and it replaces the GIC when going to deep idle state, especially by reconnecting the GIC to the A9 cores automatically when an interrupt occurs.

But definitely worth checking.

...

...
I will test it on OMAP but with the coupled idle state, I am not sure of the behavior. Could elaborate a bit the specificity of OMAP ? I am not sure to understand why I may miss some IPI wakeups.

I already mention the issue here [1]. You might not see any major issues because the missed asynchronous IPIs might eventually get executed when CPU's wakeup from deeper states because of idle wakeups. OMAP is no different from idle wakeup optimisation and it will surely benefit and work. The main reason I didn't pursue it because of not having solution for [1] which as discussed in past is very much essential from kernel functional correctness perspective. You might want to verify that by adding a tracepoint on IPI's on other reasons except the timer wakeup.

Oh, ok. I didn't make the connection. I got the point now.

If we can raise a fake hardware interrupt on the GIC (not sure that can be done), maybe we can implement something similar to the broadcast timer mechanism to replace the IPI when the cores are not IPI wakeup capable.

...

[1] https://lkml.org/lkml/2013/2/27/39
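
The concern discussed in this exchange can be modelled in a few lines of user-space C. This is a sketch under stated assumptions, not kernel code: struct model_cpu, model_send_ipi() and model_wakeup() are hypothetical names, and the ipi_wakes_cpu flag stands for the platform property debated above (whether an SGI can wake a CPU from a deep idle state).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU; all fields are assumptions for illustration. */
struct model_cpu {
    bool deep_idle;         /* CPU is in a deep idle state */
    bool ipi_wakes_cpu;     /* platform property: an IPI can wake this state */
    unsigned pending_ipis;  /* raised but not yet handled */
    unsigned handled_ipis;  /* handled so far */
};

/* Raise an IPI. Returns true if it is handled now, false if it stays pending. */
static bool model_send_ipi(struct model_cpu *cpu)
{
    if (cpu->deep_idle && !cpu->ipi_wakes_cpu) {
        cpu->pending_ipis++;    /* waits until something else wakes the CPU */
        return false;
    }
    cpu->deep_idle = false;     /* the IPI itself wakes the CPU if needed */
    cpu->handled_ipis++;
    return true;
}

/* An unrelated wakeup (e.g. a peripheral interrupt) runs the late IPIs. */
static void model_wakeup(struct model_cpu *cpu)
{
    cpu->deep_idle = false;
    cpu->handled_ipis += cpu->pending_ipis;
    cpu->pending_ipis = 0;
}
```

In this model, an IPI sent to a CPU that is not IPI wakeup capable is not lost but is delayed until the next unrelated wakeup, which is why the problem may go unnoticed in testing.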

Santosh Shilimkar

9:12 a.m.

On Monday 11 March 2013 02:10 PM, Daniel Lezcano wrote:

...

On 03/11/2013 04:24 AM, Santosh Shilimkar wrote:

...
On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:

[ ... ]

...
...
I don't think it is the case for all the ARM platforms, at least we tested it on vexpress TC2 and u8500, and the number of IPI were reduced very significantly increasing the idle time for cpu0. TC2 will need another optimization on another area for the idle wake up to gain real improvements.

You are missing my point. TC2 can be an exception since the SGI can wakeup CPUs even from low power states where local timer's are stalled. Is that the case with U8500 ?

Well, the cpuidle driver is not going into a deep idle state to check this out.

AFAICT this board has a specific firmware with the PRCMU (a device managing the power on the board) and it replaces the GIC when going to deep idle state, especially by reconnecting the GIC to the A9 cores automatically when an interrupt occurs.

But most likely it will be limited to peripheral interrupts. SGIs are per-CPU IRQs, so you need to check that part.

...

But definitively worth to check.

...
...
I will test it on OMAP but with the coupled idle state, I am not sure of the behavior. Could elaborate a bit the specificity of OMAP ? I am not sure to understand why I may miss some IPI wakeups.

I already mention the issue here [1]. You might not see any major issues because the missed asynchronous IPIs might eventually get executed when CPU's wakeup from deeper states because of idle wakeups. OMAP is no different from idle wakeup optimisation and it will surely benefit and work. The main reason I didn't pursue it because of not having solution for [1] which as discussed in past is very much essential from kernel functional correctness perspective. You might want to verify that by adding a tracepoint on IPI's on other reasons except the timer wakeup.

Oh, ok. I didn't make the connection. I got the point now.

Good.

...

If we can raise a fake hardware interrupt on the GIC (not sure that could be done), may be we can implement something similar to the broadcast timer mechanism to replace the IPI when the cores are not IPI wakeup capable.

It isn't straightforward. The easiest way is to replace the SGI IPI with an interrupt that is wakeup capable even from deeper states. But then it all depends on the hardware support and on what that hook looks like on each platform.

Regards, Santosh

Rickard Andersson

9:28 a.m.

On 03/11/2013 10:12 AM, Santosh Shilimkar wrote:

...

On Monday 11 March 2013 02:10 PM, Daniel Lezcano wrote:

...
On 03/11/2013 04:24 AM, Santosh Shilimkar wrote:

...
On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:

[ ... ]

...
...
I don't think it is the case for all the ARM platforms, at least we tested it on vexpress TC2 and u8500, and the number of IPI were reduced very significantly increasing the idle time for cpu0. TC2 will need another optimization on another area for the idle wake up to gain real improvements.

You are missing my point. TC2 can be an exception since the SGI can wakeup CPUs even from low power states where local timer's are stalled. Is that the case with U8500 ?

Well, the cpuidle driver is not going into a deep idle state to check this out.

AFAICT this board has a specific firmware with the PRCMU (a device managing the power on the board) and it replaces the GIC when going to deep idle state, especially by reconnecting the GIC to the A9 cores automatically when an interrupt occurs.

But most likely it will be limited to peripheral interrupts. SGI's are per-cpu irq's so you need to check that part.

In the U8500 case, once the first CPU has been woken up, it works fine for that CPU to send an IPI to the other CPU.

BR Rickard

Santosh Shilimkar

10:29 a.m.

On Monday 11 March 2013 02:58 PM, Rickard Andersson wrote:

...

On 03/11/2013 10:12 AM, Santosh Shilimkar wrote:

...
On Monday 11 March 2013 02:10 PM, Daniel Lezcano wrote:

...
On 03/11/2013 04:24 AM, Santosh Shilimkar wrote:

...
On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:

[ ... ]

...
...
I don't think it is the case for all the ARM platforms, at least we tested it on vexpress TC2 and u8500, and the number of IPI were reduced very significantly increasing the idle time for cpu0. TC2 will need another optimization on another area for the idle wake up to gain real improvements.

You are missing my point. TC2 can be an exception since the SGI can wakeup CPUs even from low power states where local timer's are stalled. Is that the case with U8500 ?

Well, the cpuidle driver is not going into a deep idle state to check this out.

AFAICT this board has a specific firmware with the PRCMU (a device managing the power on the board) and it replaces the GIC when going to deep idle state, especially by reconnecting the GIC to the A9 cores automatically when an interrupt occurs.

But most likely it will be limited to peripheral interrupts. SGI's are per-cpu irq's so you need to check that part.

In the U8500 case, when the first CPU is woken up it will work ok for that CPU to send an IPI to the other CPU.

Nice. So in your case, IPIs will always work as long as one of the CPUs is active.

Regards Santosh

linaro-kernel@lists.linaro.org

participants (8)

Daniel Lezcano
Linus Walleij
Rickard Andersson
Russell King - ARM Linux
Santosh Shilimkar
Thomas Gleixner
Vincent Guittot
Viresh Kumar