Hi,
I am working on identifying the different wakeup sources from the interrupts, and I have a question regarding the timer broadcast.
The broadcast timer is set up for the next event, and that will wake up any idle cpu belonging to the "broadcast cpumask", right?
The cpu which has been woken up will look up the next event of each cpu and send an IPI to wake it up.
However, it is possible that the sender of this IPI is not concerned by the timer expiration and has been woken up just to send the IPI, right?
If this is correct, is it possible to set the timer irq affinity to a cpu which is concerned by the timer expiration, so that we prevent an unnecessary wakeup of a cpu?
For example, let's say we have a 2-cpu system.
cpu0 and cpu1 are idle.
The next event is for cpu1, but cpu0 is woken up by the broadcast timer; after checking, it has nothing to do except send an IPI_TIMER to cpu1, and then it goes idle again.
Wouldn't it be worth setting the broadcast timer affinity to cpu1, so that cpu0 is not woken up?
Did I miss something, or does it sound correct?
Thanks -- Daniel
On Tue, 19 Feb 2013, Daniel Lezcano wrote:
> I am working on identifying the different wakeup sources from the interrupts, and I have a question regarding the timer broadcast.
> The broadcast timer is set up for the next event, and that will wake up any idle cpu belonging to the "broadcast cpumask", right?
> The cpu which has been woken up will look up the next event of each cpu and send an IPI to wake it up. However, it is possible that the sender of this IPI is not concerned by the timer expiration and has been woken up just to send the IPI, right?
Correct.
> If this is correct, is it possible to set the timer irq affinity to a cpu which is concerned by the timer expiration, so that we prevent an unnecessary wakeup of a cpu?
It is possible, but we never implemented it.
If we go there, we want to make that conditional on a property flag, because some interrupt controllers, especially on x86, only allow the affinity to be moved from interrupt context, which makes it pointless there.
Thanks,
tglx
On 02/19/2013 07:10 PM, Thomas Gleixner wrote:
> It is possible, but we never implemented it.
> If we go there, we want to make that conditional on a property flag, because some interrupt controllers, especially on x86, only allow the affinity to be moved from interrupt context, which makes it pointless there.
Thanks Thomas for your quick answer. I will write an RFC patchset.
-- Daniel
On 02/19/2013 10:21 AM, Daniel Lezcano wrote:
> Thanks Thomas for your quick answer. I will write an RFC patchset.
I'm curious what the use case is. I played with this code a while ago, and AFAICT it's not used on sensible (i.e. modern) systems. Is there anything other than old x86 machines that needs it?
--Andy
On Tue, 19 Feb 2013, Andy Lutomirski wrote:
> I'm curious what the use case is. I played with this code a while ago, and AFAICT it's not used on sensible (i.e. modern) systems. Is there anything other than old x86 machines that needs it?
If the local apic timer is not affected by C-states, it's irrelevant, but there are enough machines out there which do not have such a timer. The point is that we want a flag on the broadcast device which tells us whether we should use dynamic affinity settings or not. On x86 we would never set that flag.
Thanks,
tglx
On Tuesday 19 February 2013 11:51 PM, Daniel Lezcano wrote:
> Thanks Thomas for your quick answer. I will write an RFC patchset.
Last year I implemented an affinity hook for the broadcast code and experimented with it. Since the system I was using was dual-core, it wasn't very beneficial, and hence I gave up on it later. I do remember discussing the approach with a few folks at a conference.
The patch for the generic broadcast code is at the end of this email (also attached). I didn't look at all the corner cases, though. In arch code you then need to set up the "broadcast_affinity" hook, which should be able to get a handle on the arch irqchip and call the respective affinity handler. A three-line function should do the trick.
As Thomas said, the effectiveness of such an optimization depends solely on how well affinity (in low-power states) is supported by your IRQ chip.
Hope this is helpful for you.
Regards, Santosh
From d70f2d48ec08a3f1d73187c49b16e4e60f81a50c Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@ti.com>
Date: Wed, 25 Jul 2012 03:42:33 +0530
Subject: [PATCH] tick-broadcast: Add tick broadcast affinity support

Current tick broadcast code has affinity set to the boot CPU and hence
the boot CPU will always wake up from low power states when the broadcast
timer is armed, even if the next expiry event doesn't belong to it.

Patch adds broadcast affinity functionality to avoid the above and let
the tick framework set the affinity of the event for the CPU it belongs to.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 include/linux/clockchips.h   |  2 ++
 kernel/time/tick-broadcast.c | 13 ++++++++++++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 8a7096f..5488cdc 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -95,6 +95,8 @@ struct clock_event_device {
 	unsigned long		retries;
 
 	void			(*broadcast)(const struct cpumask *mask);
+	void			(*broadcast_affinity)
+				(const struct cpumask *mask, int irq);
 	void			(*set_mode)(enum clock_event_mode mode,
 					    struct clock_event_device *);
 	void			(*suspend)(struct clock_event_device *);
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index f113755..2ec2425 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -39,6 +39,8 @@ static void tick_broadcast_clear_oneshot(int cpu);
 static inline void tick_broadcast_clear_oneshot(int cpu) { }
 #endif
 
+static inline void dummy_broadcast_affinity(const struct cpumask *mask,
+					int irq) { }
 /*
  * Debugging: see timer_list.c
  */
@@ -485,14 +487,19 @@ void tick_broadcast_oneshot_control(unsigned long reason)
 		if (!cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
 			cpumask_set_cpu(cpu, tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_SHUTDOWN);
-			if (dev->next_event.tv64 < bc->next_event.tv64)
+			if (dev->next_event.tv64 < bc->next_event.tv64) {
 				tick_broadcast_set_event(dev->next_event, 1);
+				bc->broadcast_affinity(
+					tick_get_broadcast_oneshot_mask(), bc->irq);
+			}
 		}
 	} else {
 		if (cpumask_test_cpu(cpu, tick_get_broadcast_oneshot_mask())) {
 			cpumask_clear_cpu(cpu,
 					  tick_get_broadcast_oneshot_mask());
 			clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);
+			bc->broadcast_affinity(
+				tick_get_broadcast_oneshot_mask(), bc->irq);
 			if (dev->next_event.tv64 != KTIME_MAX)
 				tick_program_event(dev->next_event, 1);
 		}
@@ -536,6 +543,10 @@ void tick_broadcast_setup_oneshot(struct clock_event_device *bc)
 
 		bc->event_handler = tick_handle_oneshot_broadcast;
 
+		/* setup dummy broadcast affinity handler if not provided */
+		if (!bc->broadcast_affinity)
+			bc->broadcast_affinity = dummy_broadcast_affinity;
+
 		/* Take the do_timer update */
 		tick_do_timer_cpu = cpu;
On 02/21/2013 07:19 AM, Santosh Shilimkar wrote:
> Last year I implemented an affinity hook for the broadcast code and experimented with it. Since the system I was using was dual-core, it wasn't very beneficial, and hence I gave up on it later. I do remember discussing the approach with a few folks at a conference.
I did a brief test with a similar patch on an ARM u8500 board. The timer is tied to CPU0 by default; setting the irq affinity dynamically considerably reduces the number of IPIs. The difference from your patch is that the affinity is set to a single CPU, the first one which is supposed to be woken up by the timer expiration.
This is easy to spot with a small program doing usleep pinned to CPU1.
We see CPU0 waking up to send an IPI to CPU1 and going idle again.
I don't know how OMAP4 behaves with this patch (which I guess is the board you used), but the coupled idle state traces could be ambiguous if you relied on them to check the benefit of this patch.
IMO, it is worth implementing such a solution, and perhaps we can extend it to optimize the package idle time with the generic power domain tied to the irq. Anyway, it is a random thought; let's look at that later :)
> The patch for the generic broadcast code is at the end of this email (also attached). I didn't look at all the corner cases, though. In arch code you then need to set up the "broadcast_affinity" hook, which should be able to get a handle on the arch irqchip and call the respective affinity handler. A three-line function should do the trick.
> As Thomas said, the effectiveness of such an optimization depends solely on how well affinity (in low-power states) is supported by your IRQ chip.
> Hope this is helpful for you.
Thanks a lot for your patch and your feedback.
-- Daniel
On Thursday 21 February 2013 02:31 PM, Daniel Lezcano wrote:
> I did a brief test with a similar patch on an ARM u8500 board. The timer is tied to CPU0 by default; setting the irq affinity dynamically considerably reduces the number of IPIs. The difference from your patch is that the affinity is set to a single CPU, the first one which is supposed to be woken up by the timer expiration.
> This is easy to spot with a small program doing usleep pinned to CPU1.
> We see CPU0 waking up to send an IPI to CPU1 and going idle again.
> I don't know how OMAP4 behaves with this patch (which I guess is the board you used), but the coupled idle state traces could be ambiguous if you relied on them to check the benefit of this patch.
Across OMAP4- and OMAP5-based devices, the approach was useful only on the general-purpose OMAP5 devices. The rest of the devices had the constraint that the master CPU (CPU0) always wakes up first, which in turn means always pinning the affinity to that CPU, which the current code already does. That was another reason I didn't pursue it further.
> IMO, it is worth implementing such a solution, and perhaps we can extend it to optimize the package idle time with the generic power domain tied to the irq. Anyway, it is a random thought; let's look at that later :)
It is surely a good optimization, especially for multi-core CPUIdle.
>> The patch for the generic broadcast code is at the end of this email (also attached). I didn't look at all the corner cases, though. In arch code you then need to set up the "broadcast_affinity" hook, which should be able to get a handle on the arch irqchip and call the respective affinity handler. A three-line function should do the trick.
>> As Thomas said, the effectiveness of such an optimization depends solely on how well affinity (in low-power states) is supported by your IRQ chip.
>> Hope this is helpful for you.
> Thanks a lot for your patch and your feedback.
I am glad that it was helpful.
Regards, Santosh