When a CPU goes into a deep idle state in which its local timer is shut down, it notifies the time framework to use the broadcast timer instead.
Unfortunately, the broadcast device can wake up any CPU, including an idle one that is not concerned by the wakeup at all.
In the worst case, this means an idle CPU wakes up only to send an IPI to another idle CPU.
This patch solves this by setting the IRQ affinity to the CPU concerned by the nearest timer event. This way, the CPU that is woken up is guaranteed to be the one concerned by the next event, and we avoid unnecessary wakeups of other idle CPUs.
As IRQ affinity is not supported on all architectures, a flag is needed to specify which clock event devices can handle it.
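The motivation can be made concrete with a simplified userspace model (not kernel code). It counts the wakeups that land on a CPU other than the one owning the expiring event, and compares a broadcast IRQ statically routed to one CPU against one retargeted to the owner of the nearest event. The function name and the event sequence used below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model, not kernel code: event_owner[i] is the idle CPU that owns
 * the i-th broadcast event. A wakeup is "spurious" when the broadcast IRQ lands
 * on a different CPU, which then only forwards an IPI to the real owner.
 */
static int spurious_wakeups(const int *event_owner, int nr_events,
			    int dynamic, int static_cpu)
{
	int count = 0;

	assert(event_owner != NULL || nr_events == 0);
	for (int i = 0; i < nr_events; i++) {
		/* Static routing: the IRQ always hits static_cpu.
		 * Dynamic routing: the IRQ was retargeted to the owner. */
		int irq_cpu = dynamic ? event_owner[i] : static_cpu;

		if (irq_cpu != event_owner[i])
			count++;
	}
	return count;
}
```

Take the made-up owner sequence {1, 1, 2, 0, 3, 1}. With static routing to CPU0, five of the six wakeups are spurious. With dynamic routing, none are.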
Daniel Lezcano (3):
  time : pass broadcast parameter
  time : set broadcast irq affinity
  ARM: nomadik: add dynamic irq flag to the timer

Viresh Kumar (1):
  ARM: timer-sp: Set dynamic irq affinity

 arch/arm/common/timer-sp.c        |  3 ++-
 drivers/clocksource/nomadik-mtu.c |  3 ++-
 include/linux/clockchips.h        |  1 +
 kernel/time/tick-broadcast.c      | 40 +++++++++++++++++++++++++++++--------
 4 files changed, 37 insertions(+), 10 deletions(-)
The broadcast timer can be passed as a parameter to the function instead of looking up tick_broadcast_device.evtdev again, since the caller has already used it.
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
---
 kernel/time/tick-broadcast.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 2fb8cb8..6197ac0 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -406,10 +406,9 @@ struct cpumask *tick_get_broadcast_oneshot_mask(void)
 	return to_cpumask(tick_broadcast_oneshot_mask);
 }

-static int tick_broadcast_set_event(ktime_t expires, int force)
+static int tick_broadcast_set_event(struct clock_event_device *bc,
+				    ktime_t expires, int force)
 {
-	struct clock_event_device *bc = tick_broadcast_device.evtdev;
-
 	if (bc->mode != CLOCK_EVT_MODE_ONESHOT)
 		clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);

@@ -479,7 +478,7 @@ again:
 		 * Rearm the broadcast device. If event expired,
 		 * repeat the above
 		 */
-		if (tick_broadcast_set_event(next_event, 0))
+		if (tick_broadcast_set_event(dev, next_event, 0))
 			goto again;
 	}
 	raw_spin_unlock(&tick_broadcast_lock);
@@ -522,7 +521,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
-				tick_broadcast_set_event(dev->next_event, 1);
+				tick_broadcast_set_event(bc, dev->next_event, 1);
 		}
 	} else {
 		if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
@@ -591,7 +590,7 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 			clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 			tick_broadcast_init_next_event(to_cpumask(tmpmask),
 						       tick_next_period);
-			tick_broadcast_set_event(tick_next_period, 1);
+			tick_broadcast_set_event(bc, tick_next_period, 1);
 		} else
 			bc->next_event.tv64 = KTIME_MAX;
 	} else {
On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:
The broadcast timer can be passed as a parameter to the function instead of looking up tick_broadcast_device.evtdev again, since the caller has already used it.
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
The change doesn't buy us much, even after looking at the next patch, which tries to use bc. No strong opinion, though.
Regards, Santosh
When a CPU goes into a deep idle state in which its local timer is shut down, it notifies the time framework to use the broadcast timer instead.
Unfortunately, the broadcast device can wake up any CPU, including an idle one that is not concerned by the wakeup at all.
In the worst case, this means an idle CPU wakes up only to send an IPI to another idle CPU.
This patch solves this by setting the IRQ affinity to the CPU concerned by the nearest timer event. This way, the CPU that is woken up is guaranteed to be the one concerned by the next event, and we avoid unnecessary wakeups of other idle CPUs.
As IRQ affinity is not supported on all architectures, a flag is needed to specify which clock event devices can handle it.
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
---
 include/linux/clockchips.h   |  1 +
 kernel/time/tick-broadcast.c | 39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 6634652..c256cea 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -54,6 +54,7 @@ enum clock_event_nofitiers {
  */
 #define CLOCK_EVT_FEAT_C3STOP		0x000008
 #define CLOCK_EVT_FEAT_DUMMY		0x000010
+#define CLOCK_EVT_FEAT_DYNIRQ		0x000020

 /**
  * struct clock_event_device - clock event device descriptor
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6197ac0..1f7b4f4 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -406,13 +406,36 @@ struct cpumask *tick_get_broadcast_oneshot_mask(void)
 	return to_cpumask(tick_broadcast_oneshot_mask);
 }

-static int tick_broadcast_set_event(struct clock_event_device *bc,
+/*
+ * Set broadcast interrupt affinity
+ */
+static void tick_broadcast_set_affinity(struct clock_event_device *bc, int cpu)
+{
+	if (!(bc->features & CLOCK_EVT_FEAT_DYNIRQ))
+		return;
+
+	if (cpumask_equal(bc->cpumask, cpumask_of(cpu)))
+		return;
+
+	bc->cpumask = cpumask_of(cpu);
+	irq_set_affinity(bc->irq, bc->cpumask);
+}
+
+static int tick_broadcast_set_event(struct clock_event_device *bc, int cpu,
 				    ktime_t expires, int force)
 {
+	int ret;
+
 	if (bc->mode != CLOCK_EVT_MODE_ONESHOT)
 		clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);

-	return clockevents_program_event(bc, expires, force);
+	ret = clockevents_program_event(bc, expires, force);
+	if (ret)
+		return ret;
+
+	tick_broadcast_set_affinity(bc, cpu);
+
+	return 0;
 }

 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
@@ -441,7 +464,7 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 {
 	struct tick_device *td;
 	ktime_t now, next_event;
-	int cpu;
+	int cpu, next_cpu;

 	raw_spin_lock(&tick_broadcast_lock);
 again:
@@ -454,8 +477,10 @@ again:
 		td = &per_cpu(tick_cpu_device, cpu);
 		if (td->evtdev->next_event.tv64 <= now.tv64)
 			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
-		else if (td->evtdev->next_event.tv64 < next_event.tv64)
+		else if (td->evtdev->next_event.tv64 < next_event.tv64) {
 			next_event.tv64 = td->evtdev->next_event.tv64;
+			next_cpu = cpu;
+		}
 	}

 	/*
@@ -478,7 +503,7 @@ again:
 		 * Rearm the broadcast device. If event expired,
 		 * repeat the above
 		 */
-		if (tick_broadcast_set_event(dev, next_event, 0))
+		if (tick_broadcast_set_event(dev, next_cpu, next_event, 0))
 			goto again;
 	}
 	raw_spin_unlock(&tick_broadcast_lock);
@@ -521,7 +546,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
-				tick_broadcast_set_event(bc, dev->next_event, 1);
+				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
 		}
 	} else {
 		if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
@@ -590,7 +615,7 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 			clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);
 			tick_broadcast_init_next_event(to_cpumask(tmpmask),
 						       tick_next_period);
-			tick_broadcast_set_event(bc, tick_next_period, 1);
+			tick_broadcast_set_event(bc, cpu, tick_next_period, 1);
 		} else
 			bc->next_event.tv64 = KTIME_MAX;
 	} else {
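The core of this change can be sketched in userspace in two parts:

- A scan that mirrors the loop in tick_handle_oneshot_broadcast(). It marks expired CPUs and remembers which CPU owns the earliest pending event.
- A helper that mirrors tick_broadcast_set_affinity(). It retargets only when the device advertises the flag and the target actually changes.

struct bc_model, scan_next() and set_affinity() are hypothetical stand-ins. A 32-bit integer replaces struct cpumask, and the irq_set_affinity() call is represented by a counter. The model starts the owner at -1 so that the "no pending event" case is explicit. It also omits the clockevents_program_event() call, after which the patch sets the affinity only on success.

```c
#include <assert.h>
#include <stdint.h>

#define KTIME_MAX INT64_MAX /* local stand-in for the kernel's KTIME_MAX */
#define FEAT_DYNIRQ 0x20u   /* same value as CLOCK_EVT_FEAT_DYNIRQ */

/* Hypothetical stand-in for the relevant clock_event_device state. */
struct bc_model {
	unsigned int features;
	int irq_cpu;          /* CPU the broadcast IRQ is currently routed to */
	int affinity_changes; /* number of simulated irq_set_affinity() calls */
};

/* Mirrors tick_broadcast_set_affinity(): retarget only if supported and needed. */
static void set_affinity(struct bc_model *bc, int cpu)
{
	if (!(bc->features & FEAT_DYNIRQ))
		return;
	if (bc->irq_cpu == cpu)
		return;
	bc->irq_cpu = cpu;
	bc->affinity_changes++;
}

/*
 * Mirrors the scan in tick_handle_oneshot_broadcast(): CPUs whose event has
 * expired are set in *expired_mask; among the others, the earliest event and
 * its owner are recorded. Returns the owning CPU, or -1 if nothing is pending.
 */
static int scan_next(const int64_t *next_event, int nr_cpus, int64_t now,
		     uint32_t *expired_mask, int64_t *earliest)
{
	int next_cpu = -1;

	assert(nr_cpus >= 0 && nr_cpus <= 32);
	*expired_mask = 0;
	*earliest = KTIME_MAX;
	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		if (next_event[cpu] <= now) {
			*expired_mask |= 1u << cpu;
		} else if (next_event[cpu] < *earliest) {
			*earliest = next_event[cpu];
			next_cpu = cpu;
		}
	}
	return next_cpu;
}
```

Consider per-CPU events {100, 250, 180, KTIME_MAX} at now = 150. CPU0 is marked expired, and CPU2 owns the earliest pending event (180). The IRQ is therefore retargeted to CPU2 once. A second call for the same CPU is a no-op, matching the cpumask_equal() early return in the patch.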
On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:
[ ... ]
As IRQ affinity is not supported on all architectures, a flag is needed to specify which clock event devices can handle it.
Minor: you could mention the flag name, "CLOCK_EVT_FEAT_DYNIRQ", here as well.
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
 include/linux/clockchips.h   |  1 +
 kernel/time/tick-broadcast.c | 39 ++++++++++++++++++++++++++++++++-------
 2 files changed, 33 insertions(+), 7 deletions(-)
diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 6634652..c256cea 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -54,6 +54,7 @@ enum clock_event_nofitiers {
  */
 #define CLOCK_EVT_FEAT_C3STOP		0x000008
 #define CLOCK_EVT_FEAT_DUMMY		0x000010
+#define CLOCK_EVT_FEAT_DYNIRQ		0x000020
Please add some comments about the usage of the flag.
 /**
  * struct clock_event_device - clock event device descriptor
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 6197ac0..1f7b4f4 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -406,13 +406,36 @@ struct cpumask *tick_get_broadcast_oneshot_mask(void)
 	return to_cpumask(tick_broadcast_oneshot_mask);
 }

-static int tick_broadcast_set_event(struct clock_event_device *bc,
+/*
+ * Set broadcast interrupt affinity
+ */
+static void tick_broadcast_set_affinity(struct clock_event_device *bc, int cpu)
+{
It would be better to make the second parameter a cpumask rather than a CPU number. That matches the semantics of an affinity hook, and you can easily retain it.
+	if (!(bc->features & CLOCK_EVT_FEAT_DYNIRQ))
+		return;
+
+	if (cpumask_equal(bc->cpumask, cpumask_of(cpu)))
+		return;
+
+	bc->cpumask = cpumask_of(cpu);
You can avoid calling cpumask_of() twice above.
+	irq_set_affinity(bc->irq, bc->cpumask);
+}
+
+static int tick_broadcast_set_event(struct clock_event_device *bc, int cpu,
 				    ktime_t expires, int force)
 {
+	int ret;
+
 	if (bc->mode != CLOCK_EVT_MODE_ONESHOT)
 		clockevents_set_mode(bc, CLOCK_EVT_MODE_ONESHOT);

-	return clockevents_program_event(bc, expires, force);
+	ret = clockevents_program_event(bc, expires, force);
+	if (ret)
+		return ret;
+
+	tick_broadcast_set_affinity(bc, cpu);
If you go with a cpumask parameter, the above can simply be tick_broadcast_set_affinity(bc, cpumask_of(cpu));
+
+	return 0;
 }

 int tick_resume_broadcast_oneshot(struct clock_event_device *bc)
@@ -441,7 +464,7 @@ static void tick_handle_oneshot_broadcast(struct clock_event_device *dev)
 {
 	struct tick_device *td;
 	ktime_t now, next_event;
-	int cpu;
+	int cpu, next_cpu;

 	raw_spin_lock(&tick_broadcast_lock);
 again:
@@ -454,8 +477,10 @@ again:
 		td = &per_cpu(tick_cpu_device, cpu);
 		if (td->evtdev->next_event.tv64 <= now.tv64)
 			cpumask_set_cpu(cpu, to_cpumask(tmpmask));
-		else if (td->evtdev->next_event.tv64 < next_event.tv64)
+		else if (td->evtdev->next_event.tv64 < next_event.tv64) {
 			next_event.tv64 = td->evtdev->next_event.tv64;
+			next_cpu = cpu;
+		}
 	}

 	/*
@@ -478,7 +503,7 @@ again:
 		 * Rearm the broadcast device. If event expired,
 		 * repeat the above
 		 */
-		if (tick_broadcast_set_event(dev, next_event, 0))
+		if (tick_broadcast_set_event(dev, next_cpu, next_event, 0))
 			goto again;
 	}
 	raw_spin_unlock(&tick_broadcast_lock);
@@ -521,7 +546,7 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
 			if (dev->next_event.tv64 < bc->next_event.tv64)
-				tick_broadcast_set_event(bc, dev->next_event, 1);
+				tick_broadcast_set_event(bc, cpu, dev->next_event, 1);
Since you have embedded irq_set_affinity() in the above function, the IRQ affinity of bc->irq will remain on the last CPU on which the interrupt fired. In general that should be fine, but it would be good to clear it on CLOCK_EVT_NOTIFY_BROADCAST_EXIT. Not a must-have, though.
Regards, Santosh
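Santosh made two suggestions: pass a mask rather than a CPU number, and reset the affinity when broadcast mode is exited. In the same simplified userspace model, they could look roughly like this. set_affinity_mask(), on_broadcast_exit() and the default_mask field are hypothetical. Restoring a saved default, as opposed to some other policy, is an assumption, since the review only says "clear it".

```c
#include <assert.h>
#include <stdint.h>

#define FEAT_DYNIRQ 0x20u          /* same value as CLOCK_EVT_FEAT_DYNIRQ */
#define MASK_OF(cpu) (1u << (cpu)) /* stand-in for cpumask_of(cpu) */

/* Hypothetical stand-in for the relevant clock_event_device fields. */
struct bc_model {
	unsigned int features;
	uint32_t cpumask;      /* current broadcast IRQ affinity */
	uint32_t default_mask; /* affinity to restore on exit (assumption) */
};

/*
 * Mask-based variant, as suggested in review: the caller computes the mask
 * once, e.g. set_affinity_mask(bc, MASK_OF(cpu)).
 */
static void set_affinity_mask(struct bc_model *bc, uint32_t mask)
{
	assert(mask != 0);
	if (!(bc->features & FEAT_DYNIRQ))
		return;
	if (bc->cpumask == mask)
		return;
	bc->cpumask = mask;
	/* The kernel version would call irq_set_affinity(bc->irq, ...) here. */
}

/* Hypothetical reset hook for when CPUs leave broadcast mode. */
static void on_broadcast_exit(struct bc_model *bc)
{
	set_affinity_mask(bc, bc->default_mask);
}
```

Passing the mask lets the caller evaluate cpumask_of(cpu) once, which is the point of the review comment. Devices without the flag are never touched.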
Add the dynamic irq affinity feature to the timer clock device.
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
---
 drivers/clocksource/nomadik-mtu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/clocksource/nomadik-mtu.c b/drivers/clocksource/nomadik-mtu.c
index 7cbcaa0..73dc540 100644
--- a/drivers/clocksource/nomadik-mtu.c
+++ b/drivers/clocksource/nomadik-mtu.c
@@ -136,7 +136,8 @@ static void nmdk_clkevt_mode(enum clock_event_mode mode,

 static struct clock_event_device nmdk_clkevt = {
 	.name		= "mtu_1",
-	.features	= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_PERIODIC,
+	.features	= CLOCK_EVT_FEAT_ONESHOT | CLOCK_EVT_FEAT_PERIODIC |
+			  CLOCK_EVT_FEAT_DYNIRQ,
 	.rating		= 200,
 	.set_mode	= nmdk_clkevt_mode,
 	.set_next_event	= nmdk_clkevt_next,
On Tue, Feb 26, 2013 at 11:17 PM, Daniel Lezcano daniel.lezcano@linaro.org wrote:
Add the dynamic irq affinity feature to the timer clock device.
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
Looks reasonable to me, but sadly I do not fully grasp the patch set. Vincent+Rickard, can you have a look at this?
Yours, Linus Walleij
On 1 March 2013 02:13, Linus Walleij linus.walleij@linaro.org wrote:
[ ... ]
Looks reasonable to me, but sadly I do not fully grasp the patch set. Vincent+Rickard, can you have a look at this?
ux500 is able to trigger the wakeup on one CPU and leave the other one in WFI. This patch minimizes spurious wakeups of CPU0 when CPU1 is the target CPU of the broadcast timer. One main consequence is that we no longer uselessly execute all the deferrable and newly-idle work on CPU0.
You can add my Reviewed-by if you want.
Vincent
On 03/01/2013 09:56 AM, Vincent Guittot wrote:
[ ... ]
It looks ok to me as well.
BR Rickard
From: Viresh Kumar viresh.kumar@linaro.org
When a CPU goes into a deep idle state in which its local timer is shut down, it notifies the time framework to use the broadcast timer instead.
Unfortunately, the broadcast device can wake up any CPU, including an idle one that is not concerned by the wakeup at all.
In the worst case, this means an idle CPU wakes up only to send an IPI to another idle CPU.
This patch fixes this for ARM platforms using timer-sp by setting the CLOCK_EVT_FEAT_DYNIRQ feature flag.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
---
 arch/arm/common/timer-sp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm/common/timer-sp.c b/arch/arm/common/timer-sp.c
index 9d2d3ba..ae3c0f9 100644
--- a/arch/arm/common/timer-sp.c
+++ b/arch/arm/common/timer-sp.c
@@ -158,7 +158,8 @@ static int sp804_set_next_event(unsigned long next,
 }

 static struct clock_event_device sp804_clockevent = {
-	.features	= CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
+	.features	= CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT |
+			  CLOCK_EVT_FEAT_DYNIRQ,
 	.set_mode	= sp804_set_mode,
 	.set_next_event	= sp804_set_next_event,
 	.rating		= 300,
On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:
From: Viresh Kumar viresh.kumar@linaro.org
When a CPU goes into a deep idle state in which its local timer is shut down, it notifies the time framework to use the broadcast timer instead.
Unfortunately, the broadcast device can wake up any CPU, including an idle one that is not concerned by the wakeup at all.
The broadcast device will only wake up the CPU to which the timer IRQ is affined. And in fact, with this series, the affinity is also updated to the CPU that owns the nearest timer expiry event.
In the worst case, this means an idle CPU wakes up only to send an IPI to another idle CPU.
This patch fixes this for ARM platforms using timer-sp by setting the CLOCK_EVT_FEAT_DYNIRQ feature flag.
Signed-off-by: Viresh Kumar viresh.kumar@linaro.org
Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org
What am I missing here?
Regards, Santosh
On 27 February 2013 10:26, Santosh Shilimkar santosh.shilimkar@ti.com wrote:
[ ... ]
The broadcast device will only wake up the CPU to which the timer IRQ is affined. And in fact, with this series, the affinity is also updated to the CPU that owns the nearest timer expiry event.
What am I missing here?
Dynamic affinity works only if the CLOCK_EVT_FEAT_DYNIRQ flag is set for a clock_event_device. Otherwise, the wakeup happens on the CPU to which the static affinity was set.
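Viresh's point can be illustrated with a minimal model. The core retargets the broadcast IRQ only when the device opts in through its features bitmask, as the sp804 and nomadik-mtu patches do. The DYNIRQ value is the one added by this series. The PERIODIC and ONESHOT values are assumed to match the clockchips.h of that time. wakeup_cpu() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* DYNIRQ as added by this series; PERIODIC/ONESHOT values assumed. */
#define CLOCK_EVT_FEAT_PERIODIC 0x000001
#define CLOCK_EVT_FEAT_ONESHOT  0x000002
#define CLOCK_EVT_FEAT_DYNIRQ   0x000020

/*
 * Hypothetical helper: which CPU a broadcast wakeup lands on. Without the
 * opt-in flag, the core leaves the IRQ on its statically configured CPU.
 */
static int wakeup_cpu(unsigned int features, int static_cpu, int next_cpu)
{
	assert(static_cpu >= 0 && next_cpu >= 0);
	return (features & CLOCK_EVT_FEAT_DYNIRQ) ? next_cpu : static_cpu;
}
```

For example, a device whose features are only PERIODIC | ONESHOT keeps waking the statically configured CPU. Once DYNIRQ is added, the wakeup follows the CPU that owns the next event.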
On Wednesday 27 February 2013 10:29 AM, Viresh Kumar wrote:
[ ... ]
Dynamic affinity works only if the CLOCK_EVT_FEAT_DYNIRQ flag is set for a clock_event_device. Otherwise, the wakeup happens on the CPU to which the static affinity was set.
I should have looked at the patches in order first :) Sorry for the noise.
Regards, Santosh
On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:
[ ... ]
Not completely related to this series, but there is another issue where a local timer that is not wakeup-capable hurts. So far we have discussed only timer-related future events, which are known in advance and can be programmed into the broadcast device.
But think of the scenarios where we need to send asynchronous IPIs to CPUs to do some work, e.g. generic_exec_single(). If the CPU that is supposed to be available after the IPI call is in a deep low-power state, then the IPI (as implemented on ARM) isn't effective. In CPU-off idle modes, a GIC SGI will not wake the CPU, so a special wakeup is needed to bring those CPUs out of idle. In the CPUIDLE case, this special wakeup is handled by the broadcast timer.
In short, to address the above, you need an IPI that can wake up CPUs from any deep idle power state. Has anybody thought about this one?
Regards, Santosh
P.S.: Time and again it proves that making the local timer wakeup capable solves the issue.
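A simplified model of the concern raised above: if the target CPU's idle state does not let a GIC SGI wake it, an asynchronous IPI waits until that CPU wakes up for some other reason, for example its next broadcast timer event. ipi_service_delay() and its parameters are hypothetical, and the model ignores all other wakeup sources.

```c
#include <assert.h>

/*
 * Simplified model, not kernel code: an asynchronous IPI is sent at t_ipi to
 * an idle CPU whose next timer-driven wakeup is at t_next_wakeup. If the idle
 * state lets an SGI wake the CPU, the IPI is handled at once; otherwise it
 * stays pending until that next wakeup. Other wakeup sources are ignored.
 */
static long ipi_service_delay(long t_ipi, long t_next_wakeup, int sgi_can_wake)
{
	if (sgi_can_wake)
		return 0;
	assert(t_next_wakeup >= t_ipi);
	return t_next_wakeup - t_ipi;
}
```

Suppose an IPI is sent at t = 10 to a CPU whose next timer wakeup is at t = 100. The delay is 0 when SGIs can wake the CPU and 90 otherwise. In this model, anything that removes wakeups of the target CPU lengthens that window.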
On Wed, Feb 27, 2013 at 11:30:11AM +0530, Santosh Shilimkar wrote:
P.S: Time and again it proves that making the local timer wakeup capable solves the issue.
Slightly different take: it proves that hardware people don't talk to software people about what they require to make an operating system work. Hardware people think they understand that and go off and do their own thing, and expect software people to sort out their mess.
This happens all the time; there is no solution for it as long as companies view the creation of hardware as being entirely separate from software.
On Wed, 27 Feb 2013, Russell King - ARM Linux wrote:
On Wed, Feb 27, 2013 at 11:30:11AM +0530, Santosh Shilimkar wrote:
P.S: Time and again it proves that making the local timer wakeup capable solves the issue.
Slightly different take: it proves that hardware people don't talk to software people about what they require to make an operating system work. Hardware people think they understand that and go off and do their own thing, and expect software people to sort out their mess.
This happens all the time; there is no solution for it as long as companies view the creation of hardware as being entirely separate from software.
Amen!
We already saw the mess this kind of thinking creates on x86 more than 10 years ago, and we are still suffering.
As I said before, I really can't understand why ARM and the ARM SoC vendors insisted on repeating the same mistakes, which we kernel people have already proven to be fatal. It's even worse: they asked me what problems they should avoid before they went ahead and implemented them.
I can halfway understand the little kid who insists on burning his hand on the hot oven instead of listening to parental advice, but this kind of advice resistance is caused either by abuse of secret drugs, by living in a disconnected universe, or both.
Thanks,
tglx
On Wednesday 27 February 2013 03:47 AM, Daniel Lezcano wrote:
[ ... ]
Thanks Daniel for addressing the comments from the earlier version. This version looks good to me.
Reviewed-by: Santosh Shilimkar santosh.shilimkar@ti.com
Regards, Santosh
P.S.: As I mentioned, at least on OMAP I found the 'CLOCK_EVT_FEAT_DYNIRQ' optimization risky, because you might end up missing asynchronous IPI wakeups due to the current SGI implementation. This must be true for other ARM platforms as well.
On 03/10/2013 06:33 PM, Santosh Shilimkar wrote:
[ ... ]
Thanks Daniel for addressing the comments from the earlier version. This version looks good to me.
Reviewed-by: Santosh Shilimkar santosh.shilimkar@ti.com
Regards, Santosh
P.S.: As I mentioned, at least on OMAP I found the 'CLOCK_EVT_FEAT_DYNIRQ' optimization risky, because you might end up missing asynchronous IPI wakeups due to the current SGI implementation. This must be true for other ARM platforms as well.
I don't think this is the case for all ARM platforms. At least, we tested it on vexpress TC2 and u8500, and the number of IPIs was reduced very significantly, increasing the idle time for CPU0. TC2 will need another optimization in another area for the idle wakeup to bring real improvements.
I will test it on OMAP, but with the coupled idle state I am not sure of the behavior. Could you elaborate a bit on what is specific to OMAP? I am not sure I understand why I may miss some IPI wakeups.
Testing on more boards will be worthwhile, but not until we have correct cpuidle support with deep idle states.
Thanks -- Daniel
On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:
On 03/10/2013 06:33 PM, Santosh Shilimkar wrote:
[ ... ]
Regards, Santosh
P.S.: As I mentioned, at least on OMAP I found the 'CLOCK_EVT_FEAT_DYNIRQ' optimization risky, because you might end up missing asynchronous IPI wakeups due to the current SGI implementation. This must be true for other ARM platforms as well.
I don't think this is the case for all ARM platforms. At least, we tested it on vexpress TC2 and u8500, and the number of IPIs was reduced very significantly, increasing the idle time for CPU0. TC2 will need another optimization in another area for the idle wakeup to bring real improvements.
You are missing my point. TC2 can be an exception, since the SGI can wake up CPUs even from low-power states where the local timers are stalled. Is that the case with U8500?
I will test it on OMAP, but with the coupled idle state I am not sure of the behavior. Could you elaborate a bit on what is specific to OMAP? I am not sure I understand why I may miss some IPI wakeups.
I already mentioned the issue here [1]. You might not see any major issues, because the missed asynchronous IPIs might eventually get executed when the CPUs wake up from deeper states due to idle wakeups. OMAP is no different with respect to the idle wakeup optimisation; it will surely benefit and work. The main reason I didn't pursue it is that there is no solution for [1], which, as discussed in the past, is essential from a kernel functional-correctness perspective. You might want to verify that by adding a tracepoint on IPIs sent for reasons other than the timer wakeup.
Regards, Santosh
On 03/11/2013 04:24 AM, Santosh Shilimkar wrote:
On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:
[ ... ]
You are missing my point. TC2 can be an exception, since the SGI can wake up CPUs even from low-power states where the local timers are stalled. Is that the case with U8500?
Well, the cpuidle driver does not enter a deep idle state, so we cannot check this.
AFAICT this board has specific firmware with the PRCMU (a device managing power on the board), which replaces the GIC when going into a deep idle state, notably by automatically reconnecting the GIC to the A9 cores when an interrupt occurs.
But it is definitely worth checking.
[ ... ]
I already mentioned the issue here [1]. You might not see any major issues, because the missed asynchronous IPIs might eventually get executed when the CPUs wake up from deeper states due to idle wakeups. OMAP is no different with respect to the idle wakeup optimisation; it will surely benefit and work. The main reason I didn't pursue it is that there is no solution for [1], which, as discussed in the past, is essential from a kernel functional-correctness perspective. You might want to verify that by adding a tracepoint on IPIs sent for reasons other than the timer wakeup.
Oh, OK. I didn't make the connection. I get the point now.
If we could raise a fake hardware interrupt on the GIC (not sure that can be done), maybe we could implement something similar to the broadcast timer mechanism to replace the IPI when the cores are not IPI-wakeup-capable.
On Monday 11 March 2013 02:10 PM, Daniel Lezcano wrote:
On 03/11/2013 04:24 AM, Santosh Shilimkar wrote:
On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:
[ ... ]
You are missing my point. TC2 can be an exception, since the SGI can wake up CPUs even from low-power states where the local timers are stalled. Is that the case with U8500?
Well, the cpuidle driver does not enter a deep idle state, so we cannot check this.
AFAICT this board has specific firmware with the PRCMU (a device managing power on the board), which replaces the GIC when going into a deep idle state, notably by automatically reconnecting the GIC to the A9 cores when an interrupt occurs.
But most likely it is limited to peripheral interrupts. SGIs are per-CPU IRQs, so you need to check that part.
But it is definitely worth checking.
[ ... ]
Oh, OK. I didn't make the connection. I get the point now.
Good.
If we could raise a fake hardware interrupt on the GIC (not sure that can be done), maybe we could implement something similar to the broadcast timer mechanism to replace the IPI when the cores are not IPI-wakeup-capable.
It isn't straightforward. The easiest way is to replace the SGI IPI with one that is wakeup-capable even from deeper states. But then it all depends on hardware support and on what that hook looks like on each platform.
Regards, Santosh
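One possible shape for such a per-platform hook, sketched in plain C: prefer a platform-provided, wakeup-capable mechanism, and fall back to the ordinary SGI when none is registered or it fails. struct ipi_ops, send_ipi() and the two sender functions are hypothetical. They do not correspond to an existing kernel API, and counters stand in for the actual hardware writes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-platform hook; not an existing kernel interface. */
struct ipi_ops {
	/* Raise a wakeup-capable interrupt on cpu; return nonzero on success. */
	int (*raise_wakeup_ipi)(int cpu);
};

/* Counters standing in for the actual hardware writes. */
static int sgi_sent;
static int wakeup_irq_sent;

/* Ordinary GIC SGI: may not wake a CPU in a CPU-off idle state. */
static int send_sgi(int cpu)
{
	(void)cpu;
	sgi_sent++;
	return 1;
}

/* Example platform mechanism, assumed to wake CPUs from any idle state. */
static int platform_wakeup_irq(int cpu)
{
	(void)cpu;
	wakeup_irq_sent++;
	return 1;
}

/* Prefer the platform hook when present; otherwise fall back to the SGI. */
static void send_ipi(const struct ipi_ops *ops, int cpu)
{
	assert(cpu >= 0);
	if (ops && ops->raise_wakeup_ipi && ops->raise_wakeup_ipi(cpu))
		return;
	send_sgi(cpu);
}
```

A platform that registers `{ platform_wakeup_irq }` never falls back to the SGI. A NULL ops pointer, or an ops structure without a hook, always uses the SGI. That mirrors the "depends on hardware support" caveat above.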
On 03/11/2013 10:12 AM, Santosh Shilimkar wrote:
On Monday 11 March 2013 02:10 PM, Daniel Lezcano wrote:
On 03/11/2013 04:24 AM, Santosh Shilimkar wrote:
On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:
[ ... ]
But most likely it is limited to peripheral interrupts. SGIs are per-CPU IRQs, so you need to check that part.
In the U8500 case, once the first CPU is woken up, it works fine for that CPU to send an IPI to the other CPU.
BR Rickard
On Monday 11 March 2013 02:58 PM, Rickard Andersson wrote:
On 03/11/2013 10:12 AM, Santosh Shilimkar wrote:
On Monday 11 March 2013 02:10 PM, Daniel Lezcano wrote:
On 03/11/2013 04:24 AM, Santosh Shilimkar wrote:
On Sunday 10 March 2013 11:52 PM, Daniel Lezcano wrote:
[ ... ]
In the U8500 case, once the first CPU is woken up, it works fine for that CPU to send an IPI to the other CPU.
Nice. So in your case, IPIs will always work as long as one of the CPUs is active.
Regards Santosh
linaro-kernel@lists.linaro.org