Hi,
While trying linux-next at the point where it boots (commit be9b7335e70696bee731c152429b1737e42fe163, after v3.2-rc4), I noticed that the timers do not work properly with CONFIG_NO_HZ.
It is easy to reproduce with 'time sleep 1', where the timer expires 1, 2 or 3 seconds later.
This does not seem to happen with linux-linaro-3.1, but I was able to reproduce the problem on a vanilla 3.1.5 kernel.
Is this a known problem?
Thanks -- Daniel
-- http://www.linaro.org/ Linaro.org │ Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro Facebook | http://twitter.com/#!/linaroorg Twitter | http://www.linaro.org/linaro-blog/ Blog
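A rough way to put numbers on the 'time sleep 1' observation is to time a sleep against CLOCK_MONOTONIC from a small C helper. This is only a sketch: the helper names timespec_diff_ns and measure_sleep_ns are made up for illustration, and it assumes the clocksource itself can be trusted, which is part of what is in question in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed between two CLOCK_MONOTONIC readings. */
static int64_t timespec_diff_ns(const struct timespec *start,
                                const struct timespec *end)
{
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000000LL
           + (int64_t)(end->tv_nsec - start->tv_nsec);
}

/*
 * Sleep for request_ns nanoseconds and return how long the sleep
 * really took, in nanoseconds, or -1 on error.
 */
static int64_t measure_sleep_ns(int64_t request_ns)
{
    struct timespec req, start, end;

    assert(request_ns >= 0);
    req.tv_sec = (time_t)(request_ns / 1000000000LL);
    req.tv_nsec = (long)(request_ns % 1000000000LL);

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;
    /* If a signal interrupts the sleep, resume with the remaining time. */
    while (nanosleep(&req, &req) != 0) {
        if (errno != EINTR)
            return -1;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    return timespec_diff_ns(&start, &end);
}
```

Calling measure_sleep_ns(1000000000LL) in a loop and printing the results should make it obvious whether the overshoot is a few milliseconds (normal) or whole seconds (the behaviour reported here).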
On Wed, Dec 14, 2011 at 4:27 PM, Daniel Lezcano daniel.lezcano@linaro.org wrote:
While trying linux-next at the point where it boots (commit be9b7335e70696bee731c152429b1737e42fe163, after v3.2-rc4), I noticed that the timers do not work properly with CONFIG_NO_HZ.
It is easy to reproduce with 'time sleep 1', where the timer expires 1, 2 or 3 seconds later.
This does not seem to happen with linux-linaro-3.1, but I was able to reproduce the problem on a vanilla 3.1.5 kernel.
Is this a known problem?
Sleeps are only guaranteed at max speed.
Since this is a jiffy-based sleep, I think these patches (which I just updated and put into Russell's patch tracker) are needed:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7210/1
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7211/1
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7212/1
If these patches solve your issue, please ACK them on the linux-arm-kernel mailing list, so Russell et al can see that they solve problems for people...
You will then encounter the same problem with udelay(), mdelay() etc., for which these patches provide a solution (with an additional ux500 MTU patch that is somewhere in our tree):
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6873/1
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6874/1
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6875/1
Linus Walleij
On 12/14/2011 05:02 PM, Linus Walleij wrote:
Sleeps are only guaranteed at max speed.
I am not sure I get the point. Do you mean the cpufreq max frequency?
Since this is jiffy-based sleep I think these patches (which I just updated and put into Russell's patch tracker) are needed: http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7210/1 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7211/1 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7212/1
These three patches do not solve the problem.
If these patches solve your issue please ACK them on the linux-arm-kernel maillist, so Russell et al can see that they solve problems for people...
You will then encounter the same problem at the udelay(), mdelay() etc to which these patches provide a solution (with an additional ux500 MTU patch that is somewhere in our tree): http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6873/1 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6874/1 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6875/1
I tried to apply these patches on linux-next (again at the point where the snowball boots), but they do not apply. They try to modify arch/arm/lib/delay.c, which exists neither in the current commit nor in HEAD. Is there a patch that has to be applied first?
By the way, while reading the description of the patches, I tested with a UP kernel instead of SMP and the problem does not appear.
I tried again with an SMP kernel with cpu1 unplugged, and the problem is still there.
Hope that helps.
-- Daniel
On Thu, Dec 15, 2011 at 1:16 PM, Daniel Lezcano daniel.lezcano@linaro.org wrote:
[Me]
Sleeps are only guaranteed at max speed.
I am not sure I get the point. Do you mean the cpufreq max frequency?
It means that the kernel's idea of sleep(1) is: sleep at least 1 second, possibly more. When the system scales the frequency down, say to half, things start to take twice as long. So sleep(1) may result in 2 seconds of sleep or so.
The patches below are intended to address this...
What happens if you disable CPUfreq?
    cd /sys/devices/system/cpu/cpu0/cpufreq
    cat scaling_max_freq > scaling_min_freq
(This should set the CPU to max speed, always.)
Does the problem go away?
Then it's CPUfreq-related.
If it persists we have to look for something else...
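The two shell commands above can also be issued from a test program. A minimal sketch, assuming the standard cpufreq sysfs layout used in those commands; pin_min_freq_to_max is a hypothetical helper name, and writing to the real sysfs files needs root.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy the contents of <dir>/scaling_max_freq into
 * <dir>/scaling_min_freq, so the governor cannot scale below max.
 * Returns 0 on success, -1 on any error.
 */
static int pin_min_freq_to_max(const char *dir)
{
    char path[512];
    char value[64];
    FILE *f;
    int n;

    assert(dir != NULL);

    n = snprintf(path, sizeof(path), "%s/scaling_max_freq", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(value, sizeof(value), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    n = snprintf(path, sizeof(path), "%s/scaling_min_freq", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the write; sysfs reports rejected values here. */
    return fclose(f) == 0 ? 0 : -1;
}
```

On the board this would be called as pin_min_freq_to_max("/sys/devices/system/cpu/cpu0/cpufreq").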
Since this is jiffy-based sleep I think these patches (which I just updated and put into Russell's patch tracker) are needed: http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7210/1 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7211/1 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=7212/1
These three patches do not solve the problem.
How typical :-/
If these patches solve your issue please ACK them on the linux-arm-kernel maillist, so Russell et al can see that they solve problems for people...
You will then encounter the same problem at the udelay(), mdelay() etc to which these patches provide a solution (with an additional ux500 MTU patch that is somewhere in our tree): http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6873/1 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6874/1 http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=6875/1
I tried to apply these patches on linux-next (again at the point where the snowball boots), but they do not apply. They try to modify arch/arm/lib/delay.c, which exists neither in the current commit nor in HEAD. Is there a patch that has to be applied first?
No, I think they just need to be rebased... nobody seems to be driving this right now.
By the way, while reading the description of the patches, I tested with a UP kernel instead of SMP and the problem does not appear.
Hmmmm.
I tried again with an SMP kernel with cpu1 unplugged, and the problem is still there.
Try with deactivated CPUfreq and see what happens.
Yours, Linus Walleij
On 15 December 2011 13:06, Linus Walleij linus.walleij@linaro.org wrote:
It means that the kernel's idea of sleep(1) is: sleep at least 1 second, possibly more. When the system scales the frequency down, say to half, things start to take twice as long. So sleep(1) may result in 2 seconds of sleep or so.
In that situation, will multiple sleeps on different CPUs behave consistently, or could they take wildly different amounts of time?
The patches below are intended to address this...
I wonder if this is my problem with membase; I'll try and give it a go. In particular, it has a test suite designed to exercise its data expiry mechanism: it stores some data, telling membase to expire it in 1 second's time, then the test suite sleeps for 2 seconds and checks whether the data is still there, which it is some annoyingly small fraction of the time.
Dave
On 12/15/2011 02:06 PM, Linus Walleij wrote:
Try with deactivated CPUfreq and see what happens.
OK, if cpufreq is compiled out, the problem does not occur. If it is compiled in, the problem occurs even if I set the governor to 'performance', or to 'userspace' with the frequency set to 1000000.
On Thu, 2011-12-15 at 14:06 +0100, Linus Walleij wrote:
It means that the kernel's idea of sleep(1) is: sleep at least 1 second, possibly more. When the system scales the frequency down, say to half, things start to take twice as long. So sleep(1) may result in 2 seconds of sleep or so.
Just a minor clarification: while Linus is right that sleep can validly go longer than the requested time (the only promise is that it shouldn't return success early), sleep() should be timer based (not delay based), so even if the frequency drops, you *shouldn't* see freq-proportional delays.
If that were happening, it would seem timekeeping would also be slowed down, which definitely shouldn't happen if we're using a sane clocksource (although broken clocksources, which may change freq with the cpu, have caused symptoms like the above).
That said, Linus knows more about the specific issues around the board, so I'd defer to him in debugging the issue. I just didn't want anyone to get the impression that sleep length *should* be proportional to cpu freq.
thanks -john
On Thu, Dec 15, 2011 at 8:02 PM, john stultz johnstul@us.ibm.com wrote:
On Thu, 2011-12-15 at 14:06 +0100, Linus Walleij wrote:
It means that the kernel's idea of sleep(1) is: sleep at least 1 second, possibly more. When the system scales the frequency down, say to half, things start to take twice as long. So sleep(1) may result in 2 seconds of sleep or so.
Just a minor clarification: while Linus is right that sleep can validly go longer than the requested time (the only promise is that it shouldn't return success early), sleep() should be timer based (not delay based), so even if the frequency drops, you *shouldn't* see freq-proportional delays.
True. What the three patches to SMP_TWD try to do is fix that for the local timers on these ARM systems, but sadly that doesn't seem to cut it :-(
If that were happening, it would seem timekeeping would also be slowed down, which definitely shouldn't happen if we're using a sane clocksource (although broken clocksources, which may change freq with the cpu, have caused symptoms like the above).
Yes, skew in the clocksource would in effect distort the ruler you're measuring with, giving such phenomena.
The ux500 supports two different clock sources. So Daniel, what happens if you reactivate CPUfreq, then use menuconfig to go into drivers/, where you will directly see "Clocksource PRCMU Timer" (sorry, the clksrc subsystem does not have its own submenu...), and deselect that so as to use the more monotonic MTU clock source (it lives in arch/arm/plat-nomadik/timer.c, by the way)?
If that solves it, reactivate the PRCMU clock source and just deselect the "clocksource PRCMU timer sched_clock" and see what happens.
This way we can see if the PRCMU clock source is causing it, or if it's simply caused by using the PRCMU for sched_clock or if neither is causing it.
Yours, Linus Walleij
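To confirm which clocksource the running kernel actually ended up with after such a config change, one can read it back from sysfs. A small sketch, assuming the usual /sys/devices/system/clocksource/clocksource0/current_clocksource file is present on the board; read_first_line is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of the file at path into buf (at most len - 1
 * characters) and strip the trailing newline.
 * Returns 0 on success, -1 on error.
 */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f;

    assert(path != NULL && buf != NULL && len > 0);

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    /* Cut the string at the first newline, if there is one. */
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Pointing it at current_clocksource gives the clocksource in use; the available_clocksource file in the same directory lists the alternatives the kernel registered.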
On 12/16/2011 12:23 AM, Linus Walleij wrote:
The ux500 supports two different clock sources. So Daniel, what happens if you reactivate CPUfreq, then use menuconfig to go into drivers/, where you will directly see "Clocksource PRCMU Timer" (sorry, the clksrc subsystem does not have its own submenu...), and deselect that so as to use the more monotonic MTU clock source (it lives in arch/arm/plat-nomadik/timer.c, by the way)?
I am still seeing the problem ... :/
On Fri, Dec 16, 2011 at 6:56 AM, Daniel Lezcano daniel.lezcano@linaro.org wrote:
[Me]
what happens if you reactivate CPUfreq, then use menuconfig to go into drivers/, where you will directly see "Clocksource PRCMU Timer" (sorry, the clksrc subsystem does not have its own submenu...), and deselect that so as to use the more monotonic MTU clock source (it lives in arch/arm/plat-nomadik/timer.c, by the way)?
I am still seeing the problem ... :/
Hm, let's see what we know:
- It's not due to the clocksources: we tried two of them (both actually known to be good and stable). Since the same timers are used for sched_clock, that is probably not the cause either.
- It's related to CPUfreq.
- The localtimer smp_twd CPUfreq patches do not help on their own.
But my commit ef7a474cef00594ccef432ce0840464e51ea4ac0 adding the smp_twd clock may be giving the wrong frequency to the localtimer. This is on the eternal TODO to fix up the clock implementation for ux500.
Rabin fixed this in the non-mainline kernel: http://git.linaro.org/gitweb?p=bsp/st-ericsson/linux-3.0-ux500.git%3Ba=commi...
As you can see it hacks around in a quite different version of the CPUfreq driver (yes that needs to be fixed up too).
I should have thought about that immediately :-/ Too much in my head.
Can you try something like the below (together with the three smp_twd patches) to see if it solves the problem? It's an ugly fix though; the clock implementation and CPUfreq driver both need to be fixed for real.
diff --git a/arch/arm/mach-ux500/clock.c b/arch/arm/mach-ux500/clock.c
index e832664..60378b3 100644
--- a/arch/arm/mach-ux500/clock.c
+++ b/arch/arm/mach-ux500/clock.c
@@ -743,7 +743,8 @@ err_out:
 late_initcall(clk_debugfs_init);
 #endif /* defined(CONFIG_DEBUG_FS) */

-unsigned long clk_smp_twd_rate = 400000000;
+/* Half the max CPU frequency on most systems (UGLY ASSUMPTION!) */
+unsigned long clk_smp_twd_rate = 500000000;

 unsigned long clk_smp_twd_get_rate(struct clk *clk)
 {
@@ -769,7 +770,7 @@ static int clk_twd_cpufreq_transition(struct notifier_block *nb,

 	if (state == CPUFREQ_PRECHANGE) {
 		/* Save frequency in simple Hz */
-		clk_smp_twd_rate = f->new * 1000;
+		clk_smp_twd_rate = (f->new * 1000) / 2;
 	}

 	return NOTIFY_OK;
Thanks, Linus Walleij
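To see why handing the local timer a wrong rate would stretch sleeps, it helps to do the arithmetic. The sketch below assumes, as the patch comment does, that the timer really runs at half the CPU clock. The function names are made up for illustration, and this is a userspace model rather than kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Ticks loaded into the timer for a requested duration, given the
 * rate (in Hz) the kernel believes the timer runs at.
 */
static uint64_t ticks_for_ns(uint64_t request_ns, uint64_t assumed_hz)
{
    /* Split into whole seconds and remainder to avoid overflow. */
    return (request_ns / NSEC_PER_SEC) * assumed_hz
           + (request_ns % NSEC_PER_SEC) * assumed_hz / NSEC_PER_SEC;
}

/* Time the hardware needs to count that many ticks at its real rate. */
static uint64_t elapsed_ns_for_ticks(uint64_t ticks, uint64_t actual_hz)
{
    assert(actual_hz > 0);
    return (ticks / actual_hz) * NSEC_PER_SEC
           + (ticks % actual_hz) * NSEC_PER_SEC / actual_hz;
}

/* Observed sleep length when the assumed and the real rate differ. */
static uint64_t observed_sleep_ns(uint64_t request_ns,
                                  uint64_t assumed_hz, uint64_t actual_hz)
{
    return elapsed_ns_for_ticks(ticks_for_ns(request_ns, assumed_hz),
                                actual_hz);
}
```

With a 1 s request, an assumed rate of 1 GHz (f->new * 1000 at 1000000 kHz) and a real rate of 500 MHz, observed_sleep_ns() comes out at 2 s, which is the kind of doubling described in this thread. Once the assumed rate is halved to match, the same call returns 1 s.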
On 12/16/2011 08:57 AM, Linus Walleij wrote:
Can you try something like the below (together with the three smp_twd patches) to see if it solves the problem? It's an ugly fix though; the clock implementation and CPUfreq driver both need to be fixed for real.
diff --git a/arch/arm/mach-ux500/clock.c b/arch/arm/mach-ux500/clock.c
index e832664..60378b3 100644
--- a/arch/arm/mach-ux500/clock.c
+++ b/arch/arm/mach-ux500/clock.c
@@ -743,7 +743,8 @@ err_out:
 late_initcall(clk_debugfs_init);
 #endif /* defined(CONFIG_DEBUG_FS) */

-unsigned long clk_smp_twd_rate = 400000000;
+/* Half the max CPU frequency on most systems (UGLY ASSUMPTION!) */
+unsigned long clk_smp_twd_rate = 500000000;

 unsigned long clk_smp_twd_get_rate(struct clk *clk)
 {
@@ -769,7 +770,7 @@ static int clk_twd_cpufreq_transition(struct notifier_block *nb,

 	if (state == CPUFREQ_PRECHANGE) {
 		/* Save frequency in simple Hz */
-		clk_smp_twd_rate = f->new * 1000;
+		clk_smp_twd_rate = (f->new * 1000) / 2;
 	}
Hi Linus,
That fixes the problem.
Are you planning to send a fix to LAKML?
Thanks -- Daniel
On Fri, Dec 16, 2011 at 8:20 PM, Daniel Lezcano daniel.lezcano@linaro.org wrote:
Hi Linus,
That fixes the problem.
Yay!! :-D
Are you planning to send a fix to LAKML?
I don't think I have much of a choice. I would prefer a "perfect" solution, say by registering smp_twd with the right frequency inside the CPUfreq driver itself (why not, by the way?). I'll try that.
Thanks, Linus Walleij
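The notifier logic from the tested patch can be exercised in userspace with a mock, which may be handy while reworking it. This is an assumption-laden sketch: every mock_ type, constant and variable below is a stand-in invented for illustration, not the kernel's definition, and the divide-by-two carries over the same "ugly assumption" as the patch. Registering the right frequency from inside the CPUfreq driver would presumably look different.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's notifier states; values are illustrative. */
enum mock_cpufreq_state {
    MOCK_CPUFREQ_PRECHANGE,
    MOCK_CPUFREQ_POSTCHANGE
};

#define MOCK_NOTIFY_OK 1

/* Stand-in for the frequency pair passed to the notifier, in kHz. */
struct mock_cpufreq_freqs {
    unsigned int old_khz;
    unsigned int new_khz;
};

/* Rate reported for the local timer, in Hz (half of an assumed 1 GHz). */
static unsigned long mock_twd_rate_hz = 500000000UL;

/*
 * Mirror of the patched transition callback: before the CPU frequency
 * changes, record half of the new CPU frequency as the timer rate.
 */
static int mock_twd_cpufreq_transition(enum mock_cpufreq_state state,
                                       const struct mock_cpufreq_freqs *f)
{
    assert(f != NULL);

    if (state == MOCK_CPUFREQ_PRECHANGE) {
        /* kHz to Hz, then halve for the timer clock. */
        mock_twd_rate_hz = ((unsigned long)f->new_khz * 1000UL) / 2;
    }

    return MOCK_NOTIFY_OK;
}
```

Feeding it a PRECHANGE event with new_khz set to 400000, for example, leaves mock_twd_rate_hz at 200000000, while a POSTCHANGE event does not touch the stored rate.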