As pointed out by Lorenzo, the L1 cache must be flushed before a cpu powers down, otherwise:
* data cachelines are not empty and the other cpu may fetch data
* the cpu will lose some data, leading to memory corruption
Note this bug is very difficult to reproduce and this test will not spot the issue every time.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 cpuidle/cpuidle-l1.c   |   72 ++++++++++++++++++++++++++++++++++++++++++++++++
 cpuidle/cpuidle_05.sh  |   42 ++++++++++++++++++++++++++++
 cpuidle/cpuidle_05.txt |    1 +
 3 files changed, 115 insertions(+)
 create mode 100644 cpuidle/cpuidle-l1.c
 create mode 100755 cpuidle/cpuidle_05.sh
 create mode 100644 cpuidle/cpuidle_05.txt

diff --git a/cpuidle/cpuidle-l1.c b/cpuidle/cpuidle-l1.c
new file mode 100644
index 0000000..bbcde28
--- /dev/null
+++ b/cpuidle/cpuidle-l1.c
@@ -0,0 +1,72 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+#include <pthread.h>
+#define BUFSIZE (4*1024)
+#define DEADBEEF 0xDEADBEEF
+static int buffer[BUFSIZE];
+
+static pthread_t threads[64];
+
+void *thread_routine(void *arg)
+{
+	int i, display = *(int *)arg;
+	int dummy;
+
+	for (i = 0; i < 100; i++) {
+
+		int j;
+
+		for (j = 0; j < BUFSIZE * 1000; j++) {
+			dummy = buffer[j % BUFSIZE];
+			dummy++;
+		}
+
+		usleep(200000);
+
+		if (buffer[i] != DEADBEEF) {
+			fprintf(stderr, "memory corruption\n");
+			return (void *)-1;
+		}
+
+		if (display == 0)
+			printf("%d%%%s", i, i < 10 ? "\b\b" : "\b\b\b");
+	}
+
+	if (display == 0)
+		printf(" \b\b\b\b");
+
+	return NULL;
+}
+
+int main(int argc, char *argv[])
+{
+	int i, ret = 0;
+	int nrcpus = sysconf(_SC_NPROCESSORS_ONLN);
+
+	for (i = 0; i < BUFSIZE; i++)
+		buffer[i] = DEADBEEF;
+
+	setbuf(stdout, NULL);
+
+	for(i = 0; i < nrcpus; i++) {
+
+		if (pthread_create(&threads[i], NULL, thread_routine, &i)) {
+			perror("pthread_create");
+			return 1;
+		}
+
+	}
+
+	for (i = 0; i < nrcpus; i++) {
+		void *result;
+		pthread_join(threads[i], &result);
+
+		if (result == (void *)-1)
+			ret = 1;
+	}
+
+	return ret;
+}
diff --git a/cpuidle/cpuidle_05.sh b/cpuidle/cpuidle_05.sh
new file mode 100755
index 0000000..679439d
--- /dev/null
+++ b/cpuidle/cpuidle_05.sh
@@ -0,0 +1,42 @@
+#!/bin/bash
+#
+# PM-QA validation test suite for the power management on Linux
+#
+# Copyright (C) 2011, Linaro Limited.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+#
+# Contributors:
+# Daniel Lezcano <daniel.lezcano@linaro.org> (IBM Corporation)
+# - initial API and implementation
+#
+
+# URL : https://wiki.linaro.org/WorkingGroups/PowerManagement/Resources/TestSuite/Pm...
+
+source ../include/functions.sh
+
+CPUIDLE_L1=./cpuidle-l1
+
+if [ $(id -u) -ne 0 ]; then
+    log_skip "run as non-root"
+    exit 0
+fi
+
+check_cpuidle_l1() {
+    check "Fill L1 cache and sleep" "./$CPUIDLE_L1"
+}
+
+check_cpuidle_l1
+test_status_show
diff --git a/cpuidle/cpuidle_05.txt b/cpuidle/cpuidle_05.txt
new file mode 100644
index 0000000..1f80e36
--- /dev/null
+++ b/cpuidle/cpuidle_05.txt
@@ -0,0 +1 @@
+Run cpuidle L1 test program to catch L1 flush missing vs cpu power down
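One detail in cpuidle-l1.c may be worth a second look: every thread receives `&i`, the address of the loop counter in main(), so the value each thread reads into `display` depends on timing. As far as I can tell this only affects which thread prints the progress indicator, not the corruption check itself. The sketch below shows one possible way to give each thread a stable argument. The names `worker_arg`, `worker` and `run_workers` are hypothetical and not part of the patch.

```c
#include <pthread.h>

#define MAX_THREADS 64	/* same bound as the threads[] array in the patch */

/* Hypothetical per-thread argument: one private slot per thread. */
struct worker_arg {
	int index;	/* loop counter value at creation time */
	int seen;	/* written by the thread, so the caller can check it */
};

static void *worker(void *arg)
{
	struct worker_arg *w = arg;

	/* Nobody else writes this slot, so the value read here is stable. */
	w->seen = w->index;
	return NULL;
}

/* Start n workers (clamped to MAX_THREADS), join them, return 0 on success. */
static int run_workers(struct worker_arg *args, int n)
{
	pthread_t threads[MAX_THREADS];
	int i, started = 0, ret = 0;

	if (n > MAX_THREADS)
		n = MAX_THREADS;

	for (i = 0; i < n; i++) {
		args[i].index = i;
		args[i].seen = -1;
		if (pthread_create(&threads[i], NULL, worker, &args[i])) {
			ret = -1;
			break;
		}
		started++;
	}

	/* Join only the threads that were actually created. */
	for (i = 0; i < started; i++)
		pthread_join(threads[i], NULL);

	return ret;
}
```

Clamping the count to the array size also avoids writing past threads[] on a machine with more than 64 online cpus.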
Hi Daniel,
Have you noticed this on any platform yet with this test?
Regards, Amit
On Tue, Apr 8, 2014 at 4:05 PM, Daniel Lezcano daniel.lezcano@linaro.org wrote:
[...]
On 04/08/2014 01:07 PM, Amit Kucheria wrote:
> Hi Daniel,
> Have you noticed this on any platform yet with this test?
I have noticed a very rare hang on the exynos4 board with this test and the dual cpu support, but it is not reproducible enough to check whether the cache flush fixes it (or whether it is related to the cpuidle driver). I had a long discussion yesterday with Lorenzo, who explained to me what could happen without flushing the cache and exiting the SMP mode.
AFAICT,
* The tc2 flushes its cache.
* exynos5 still needs to disable cpu1 to enter the AFTR state (which is broken today), so the cache is flushed in the hotplug code path. I hope I can spot the issue with a quad core.
* omap4 flushes its cache.
* omap3 is not affected by this because it is a UP system.
* vinci, s3c64, imx5, imx6, ux500, kirkwood do not support cpu power down.
* calxeda hides that through the firmware, I believe.
* I don't know for tegra, but I assume they are flushing and disabling the cache.
I hope that with this test we can spot the issue, if any, with multiple runs on the boards, especially when new drivers are implemented.
Hi Daniel,
- L1 D$ clean (SCTLR C bit clear, DCCISW, clear SMP) is part of the recommended sequence for individual core power down.
- If a core is powered down with dirty lines in L1, then the system should hit an issue (abort) very easily. Maybe the first idle attempt itself is sufficient to break things.
Does any platform (exynos4?) work without doing an L1 D cache clean in the cpuidle individual core power down sequence?
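To make the "dirty lines" point concrete, here is a toy model in plain C of a write-back cache in front of memory. It is only an illustration under simplifying assumptions: the `toy_*` names and the four-line geometry are invented for this sketch and have nothing to do with a real L1 or with the kernel code. Stores land in the private cache first, and a power down without a clean discards whatever has not been written back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NLINES	4		/* toy size, not a real L1 geometry */
#define PATTERN	0xDEADBEEFu

/* Backing memory plus one core's private write-back cache. */
struct toy_cpu {
	uint32_t mem[NLINES];	/* what the rest of the system sees */
	uint32_t cache[NLINES];	/* per-core copy, lost at power down */
	bool dirty[NLINES];	/* cache line is newer than memory */
};

/* A store only updates the cache; memory is updated later, on clean. */
static void toy_store(struct toy_cpu *c, size_t line, uint32_t val)
{
	c->cache[line] = val;
	c->dirty[line] = true;
}

/* Write every dirty line back to memory. */
static void toy_clean(struct toy_cpu *c)
{
	for (size_t i = 0; i < NLINES; i++) {
		if (c->dirty[i]) {
			c->mem[i] = c->cache[i];
			c->dirty[i] = false;
		}
	}
}

/* Power down: cache contents vanish. Returns how many lines lost data. */
static size_t toy_power_down(struct toy_cpu *c)
{
	size_t lost = 0;

	for (size_t i = 0; i < NLINES; i++) {
		if (c->dirty[i])
			lost++;
		c->dirty[i] = false;
		c->cache[i] = 0;
	}
	return lost;
}
```

In this model, storing PATTERN to every line and calling toy_power_down() directly reports NLINES lost lines and leaves mem[] stale, while calling toy_clean() first reports none.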
On 8 April 2014 17:07, Daniel Lezcano daniel.lezcano@linaro.org wrote:
[...]
-- http://www.linaro.org/ Linaro.org │ Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro Facebook | http://twitter.com/#!/linaroorg Twitter | http://www.linaro.org/linaro-blog/ Blog
linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
Hi Sandeep,
On 04/09/2014 07:15 AM, Sandeep Tripathy wrote:
> Hi Daniel,
> - L1 D$ clean (SCTLR C bit clear, DCCISW, clear SMP) is part of the recommended sequence for individual core power down.
Yes, absolutely. It is what Lorenzo and I discussed.
The macro v7_exit_coherency_flush should be used for that.
> - If a core is powered down with dirty lines in L1, then the system should hit an issue (abort) very easily. Maybe the first idle attempt itself is sufficient to break things.
> Does any platform (exynos4?) work without doing an L1 D cache clean in the cpuidle individual core power down sequence?
Yes, the power down sequence is done, but without the L1 D cache clean, and it is very hard to make the board hang. It is so rare that I can't say for sure whether it is related to the driver itself or to something else.
If there is a way to spot the issue, I will be happy to test it.
Thanks
-- Daniel
Hi Daniel,
Just as a test, you may try removing the flush_cache_louis() call from __cpu_suspend_save, because if the driver is using cpu_suspend() in the idle path then almost all the important data in the core's L1 is already clean. It can still fail iff the code after that (cpu_suspend()) modifies some important cacheable data, because the SCTLR.C bit is not cleared yet.
It should just fail if no cache clean is done before powering down, and it should work iff v7_exit_coherency_flush() or similar is done.
Note: This is based on my observation on an A7 quad. Please correct me if my understanding is wrong.
Thanks Sandeep
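The ordering argument above can be illustrated with a small model in C. This is a sketch under strong simplifications, not the kernel sequence: the `cache_enabled` flag merely stands in for the SCTLR.C bit, the single cache line stands in for the whole L1, and all `core_*` names are hypothetical. It shows why a clean done while caching is still enabled can be undone by a later store, whereas disabling caching first and then cleaning leaves nothing behind.

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_core {
	uint32_t mem;		/* value visible to the rest of the system */
	uint32_t line;		/* the core's private cached copy */
	bool dirty;		/* cached copy is newer than memory */
	bool cache_enabled;	/* stands in for the SCTLR.C bit */
};

static void core_init(struct toy_core *c)
{
	c->mem = 0;
	c->line = 0;
	c->dirty = false;
	c->cache_enabled = true;
}

/* With caching on, a store stays in the cache; otherwise it hits memory. */
static void core_store(struct toy_core *c, uint32_t val)
{
	if (c->cache_enabled) {
		c->line = val;
		c->dirty = true;
	} else {
		c->mem = val;
	}
}

/* Write the dirty line back, as a cache clean would. */
static void core_clean(struct toy_core *c)
{
	if (c->dirty) {
		c->mem = c->line;
		c->dirty = false;
	}
}

/* Model of clearing the C bit: later stores bypass the cache. */
static void core_disable_cache(struct toy_core *c)
{
	c->cache_enabled = false;
}

/* Returns true if powering down discarded data that never reached memory. */
static bool core_power_down(struct toy_core *c)
{
	bool lost = c->dirty;

	c->dirty = false;
	c->line = 0;
	return lost;
}
```

In this model the sequence clean, store, power down loses the last store, while disable cache, clean, store, power down does not, which matches the order of the steps listed for the recommended sequence (C bit clear first, then the clean).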
On 04/09/2014 02:19 PM, Sandeep Tripathy wrote:
[...]
Thanks Sandeep for the info.
I think Lorenzo spotted that the SCTLR.C bit must be cleared before powering down the cpu, and that what cpu_suspend does is not enough because some data could be fetched from the other cpu. The only way to handle this properly is to always call v7_exit_coherency_flush before powering down the cpu.
Hello Daniel,
Sorry for the late reply. A few minor comments:
- Is this needed for both ubuntu & android? It needs makefile changes.
- Can you move the cache validation unit to utils, as all other utilities are placed there?
On Tuesday 08 April 2014 04:05 PM, Daniel Lezcano wrote:
[...]