Here is a set of scenarios I am planning to integrate into the pm-qa package.
Any ideas or comments will be appreciated.
Note that the test cases are designed for a host configured to run a minimal number of services, with no pending cron jobs. This prerequisite is needed so that background activity does not alter the expected results.
Thanks -- Daniel
cpufreq:
--------

(1) test the cpufreq framework is available

check the following files are present in the sysfs path:

/sys/devices/system/cpu/cpu[0-9].*
 -> cpufreq/scaling_available_frequencies
 -> cpufreq/scaling_cur_freq
 -> cpufreq/scaling_setspeed

There are also several other files:
 -> cpufreq/cpuinfo_max_freq
 -> cpufreq/cpuinfo_cur_freq
 -> cpufreq/cpuinfo_min_freq
 -> cpufreq/cpuinfo_transition_latency
 -> cpufreq/stats/time_in_state
 -> cpufreq/stats/total_trans
 -> cpufreq/stats/trans_table
 -> ...
Should we do some testing on those, or do we assume that is not up to Linaro, since they are part of the generic cpufreq framework?
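A minimal sketch of check (1) in Python, assuming the three files listed first are the required set. The function name `missing_cpufreq_files` and the `sysfs_cpu_root` parameter are hypothetical, introduced only so the check can be pointed at a fake tree; this is not the pm-qa implementation.

```python
import glob
import os

# Files named in test (1); the optional cpuinfo_* and stats/* files are left out.
REQUIRED_FILES = (
    "cpufreq/scaling_available_frequencies",
    "cpufreq/scaling_cur_freq",
    "cpufreq/scaling_setspeed",
)


def missing_cpufreq_files(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_dir: [missing files]} for every cpuN directory found."""
    missing = {}
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for cpu_dir in sorted(glob.glob(os.path.join(sysfs_cpu_root, "cpu[0-9]*"))):
        absent = [name for name in REQUIRED_FILES
                  if not os.path.exists(os.path.join(cpu_dir, name))]
        if absent:
            missing[cpu_dir] = absent
    return missing
```

An empty result means no cpu directory is missing a file. It is also empty when no cpu directory is found at all, so a real test would probably want to check that case separately.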
(2) test the change of the frequency is effective in 'userspace' mode

- set the governor to 'userspace' policy
- for each frequency and cpu
  - write the frequency
  - wait at least cpuinfo_transition_latency
  - read the frequency
  - check the frequency is the expected one
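The steps of test (2) could be sketched as follows for one cpu; the caller would loop over the cpus. This assumes the governor is selected through the standard `scaling_governor` file (not named in the list above) and that `cpuinfo_transition_latency` is expressed in nanoseconds. The helper names and the `settle` parameter are hypothetical.

```python
import os
import time


def read_value(path):
    with open(path) as f:
        return f.read().strip()


def write_value(path, value):
    with open(path, "w") as f:
        f.write(str(value))


def check_userspace_setspeed(cpufreq_dir, settle=time.sleep):
    """Set each available frequency in turn; return those that did not take effect."""
    write_value(os.path.join(cpufreq_dir, "scaling_governor"), "userspace")
    freqs = read_value(
        os.path.join(cpufreq_dir, "scaling_available_frequencies")).split()
    latency_ns = int(read_value(
        os.path.join(cpufreq_dir, "cpuinfo_transition_latency")))
    failed = []
    for freq in freqs:
        write_value(os.path.join(cpufreq_dir, "scaling_setspeed"), freq)
        # Wait at least the transition latency (nanoseconds -> seconds).
        settle(max(latency_ns, 0) / 1e9)
        if read_value(os.path.join(cpufreq_dir, "scaling_cur_freq")) != freq:
            failed.append(freq)
    return failed
```

On a real board this needs root and would be called with something like `/sys/devices/system/cpu/cpu0/cpufreq`; an empty list means every frequency was applied.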
(3) test that changing the frequency affects the performance of a test program

(*) Write a simple program which takes a cpu list as a parameter in order to set the affinity on it. This program computes the number of cycles (e.g. a simple counter) it did in 1 second and displays the result. If more than one cpu is specified, a process is created for each cpu, with its affinity set to that cpu.

(**)
- set the governor to userspace policy
- set the frequency to the lowest value
- wait at least cpuinfo_transition_latency
- run the test (*) program for each cpu and for combinations of them, and append the results to a file
- for each frequency rerun (**)
- check the result file contains noticeably increasing values
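A rough sketch of the counter program (*) and of the final check, in Python. It is an illustration only: the per-cpu process creation is left out, the counter loop calls the clock on each iteration, and the 5% `min_ratio` threshold is an assumption standing in for "noticeably increasing", which is not defined above. All names are hypothetical.

```python
import os
import time


def count_iterations(duration=1.0):
    """Spin on a simple counter for `duration` seconds and return the count."""
    deadline = time.monotonic() + duration
    count = 0
    while time.monotonic() < deadline:
        count += 1
    return count


def count_on_cpu(cpu, duration=1.0):
    """Pin the calling process to `cpu` (Linux only), then run the counter."""
    os.sched_setaffinity(0, {cpu})
    return count_iterations(duration)


def is_noticeably_increasing(values, min_ratio=1.05):
    """True when each value exceeds the previous one by at least min_ratio."""
    return all(b >= a * min_ratio for a, b in zip(values, values[1:]))
```

The results collected at each frequency, from lowest to highest, would be passed to `is_noticeably_increasing` in that order.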
(3) test the load of the cpu affects the frequency with 'ondemand'

(*) write a simple program which does nothing more than consume cpu (no syscall)

for each cpu
- set the governor to 'ondemand'
- run (*) in background
- wait at least cpuinfo_transition_latency * nr_scaling_available_frequencies
- read the frequency
- kill (*)
- check the frequency is equal to the highest frequency available
- wait at least cpuinfo_transition_latency * nr_scaling_available_frequencies
- read the frequency
- check the frequency is the lowest available
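The 'ondemand' scenario might look like this for one cpu. This is a sketch under assumptions: `cpuinfo_transition_latency` is taken to be in nanoseconds, the governor is written to the standard `scaling_governor` file, and the busy loop is a separate process pinned with `os.sched_setaffinity`. Function names are hypothetical.

```python
import multiprocessing
import os
import time


def read_value(path):
    with open(path) as f:
        return f.read().strip()


def settle_seconds(cpufreq_dir):
    """cpuinfo_transition_latency (ns) * number of available frequencies, in seconds."""
    latency_ns = int(read_value(
        os.path.join(cpufreq_dir, "cpuinfo_transition_latency")))
    nr_freqs = len(read_value(
        os.path.join(cpufreq_dir, "scaling_available_frequencies")).split())
    return max(latency_ns, 0) * nr_freqs / 1e9


def burn_cpu(cpu):
    """Program (*): pin to `cpu` and spin forever without further syscalls."""
    os.sched_setaffinity(0, {cpu})
    while True:
        pass


def check_ondemand(cpu, cpufreq_dir):
    """True if the cpu reaches the highest frequency under load and the lowest after."""
    freqs = sorted(int(f) for f in read_value(
        os.path.join(cpufreq_dir, "scaling_available_frequencies")).split())
    cur_freq = os.path.join(cpufreq_dir, "scaling_cur_freq")
    wait = settle_seconds(cpufreq_dir)
    with open(os.path.join(cpufreq_dir, "scaling_governor"), "w") as f:
        f.write("ondemand")
    burner = multiprocessing.Process(target=burn_cpu, args=(cpu,), daemon=True)
    burner.start()
    time.sleep(wait)
    loaded = int(read_value(cur_freq))
    burner.terminate()
    burner.join()
    time.sleep(wait)
    idle = int(read_value(cur_freq))
    return loaded == freqs[-1] and idle == freqs[0]
```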
(4) test the load of the cpu does not affect the frequency with 'userspace'

for each cpu
- set the governor to 'userspace'
- set the frequency between min and max frequencies
- wait at least cpuinfo_transition_latency * nr_scaling_available_frequencies
- run (*) in background
- wait at least cpuinfo_transition_latency * nr_scaling_available_frequencies
- read the frequency
- kill (*)
- check the frequency is equal to the one we set
(5) test the load of the cpu does not affect the frequency with 'powersave'

for each cpu
- set the governor to 'powersave'
- wait at least cpuinfo_transition_latency * nr_scaling_available_frequencies
- check the frequency is the lowest available
- run (*) in background
- wait at least cpuinfo_transition_latency * nr_scaling_available_frequencies
- read the frequency
- kill (*)
- check the frequency is the lowest available
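Tests (4) and (5) share the same shape: the frequency must stay at a known value while the cpu is loaded. A possible sketch, assuming the governor (and, for 'userspace', the frequency) has already been set by the caller, that `expected_khz` is the value to hold, and that `wait_s` is the settle time in seconds. The `cpu_load` context manager and the other names are hypothetical.

```python
import contextlib
import multiprocessing
import os
import time


def _spin(cpu):
    # Busy loop pinned to one cpu (Linux only).
    os.sched_setaffinity(0, {cpu})
    while True:
        pass


@contextlib.contextmanager
def cpu_load(cpu):
    """Run a busy loop on `cpu` for the duration of the with-block."""
    proc = multiprocessing.Process(target=_spin, args=(cpu,), daemon=True)
    proc.start()
    try:
        yield
    finally:
        proc.terminate()
        proc.join()


def read_cur_freq(cpufreq_dir):
    with open(os.path.join(cpufreq_dir, "scaling_cur_freq")) as f:
        return int(f.read().strip())


def frequency_unaffected_by_load(cpu, cpufreq_dir, expected_khz, wait_s,
                                 load=cpu_load):
    """True if the frequency equals expected_khz both before and during load."""
    time.sleep(wait_s)
    before = read_cur_freq(cpufreq_dir)
    with load(cpu):
        time.sleep(wait_s)
        during = read_cur_freq(cpufreq_dir)
    return before == expected_khz and during == expected_khz
```

For 'powersave', `expected_khz` would be the lowest available frequency; for 'userspace', the frequency that was written to `scaling_setspeed`.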
(6) test the load of the cpu affects the frequency with 'conservative'

for each cpu
- set the governor to 'conservative'
- for each freq step
  - set the up_threshold to the freq step
  - wait at least cpuinfo_transition_latency * nr_scaling_available_frequencies
  - run (*) in background
  - wait at least cpuinfo_transition_latency * nr_scaling_available_frequencies
  - read the frequency
  - check the frequency is equal to the highest one we can reach with the freq_step
  - kill (*)
Has the bug that caused pm qa tests to hang on beagleXM been fixed yet? We had these tests running daily at one time, but had to disable them because they were completely hanging the boards.
Thanks, Paul Larson
On 05/31/2011 05:46 PM, Paul Larson wrote:
Has the bug that caused pm qa tests to hang on beagleXM been fixed yet? We had these tests running daily at one time, but had to disable them because they were completely hanging the boards.
Oh, I was not aware of such a bug. Do you have a pointer to it, please? What do you mean when you say they were 'hanging the boards'? Is the board unresponsive, or are the qa tests blocked?
Since there is no lp project (that I am aware of) for the pmqa tests, I think I just reported it to Amit directly at the time. IIRC, he said to follow up on it when he had someone working on the qa tests, which appears to be you, so I'm doing that now. :) Do you have a place where bugs for it should live? We could put it against abrek I suppose, but that's not really right: it's a bug in the testsuite, not the framework.
What I was seeing at the time was that avail_freq02 would never exit on beagleXM. It should take only a few seconds to complete I think, based on what I saw on panda. But on beagleXM it would sit in this state for hours:
root@linaro:~# abrek run pwrmgmt
[ 241.163391] INFO: task kworker/0:1:24 blocked for more than 120 seconds.
[ 241.170379] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 241.178680] INFO: task avail_freq02.sh:1078 blocked for more than 120 seconds.
[ 241.186218] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

[ 361.186889] INFO: task kworker/0:1:24 blocked for more than 120 seconds.
[ 361.193878] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 361.202209] INFO: task avail_freq02.sh:1078 blocked for more than 120 seconds.
[ 361.209747] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
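For a test harness that wants to detect this state automatically, the hung-task reports above could be pulled out of the console log with something like the sketch below. The regular expression is written against the message format shown in this log only; the function name is hypothetical.

```python
import re

# Matches e.g. "[ 241.163391] INFO: task kworker/0:1:24 blocked for more than 120 seconds."
# The task name may itself contain ':' (kworker/0:1), so the pid is the last
# ":<digits>" before " blocked".
HUNG_TASK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+INFO: task (.+?):(\d+) blocked for more than (\d+) seconds"
)


def blocked_tasks(log_text):
    """Return (timestamp, task name, pid, timeout) for each hung-task report."""
    return [
        (float(ts), name, int(pid), int(timeout))
        for ts, name, pid, timeout in HUNG_TASK_RE.findall(log_text)
    ]
```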
Thanks, Paul Larson
On 05/31/2011 08:48 PM, Paul Larson wrote:
Since there is no lp project (that I am aware of) for the pmqa tests, I think I just reported it to Amit directly at the time. Iirc, he said to follow up on it when he had someone working on the qa tests, which appears to be you, so I'm doing that now. :) Do you have a place where bugs for it should live?
No, not yet. Let me discuss with the power management team about creating a launchpad project.
We could put it against abrek I suppose, but it's not really, it's a bug in the testsuite, not the frameworks.
What I was seeing at the time, was avail_freq02 would never exit on beagleXM. It should take only a few seconds to complete I think, based on what I saw on panda. But on beagleXM it would sit in this state for hours:
I doubt the problem is coming from the test suite. At first glance I think it reveals a kernel bug. It is not normal to have a userspace program blocked in uninterruptible state, and moreover a kthread blocked too. There is probably a domino effect here with a dangling lock in the kernel.
Is it possible to have the kernel version where this problem appears? Does it happen with beagleXM only or with more boards? If the former, it will be hard for me to reproduce the problem as I don't have one. Is there a solution for that (e.g. access to a board farm plus a magic finger to reboot the boards)?
Can you reproduce the problem with hung_task_panic set to 1, so that we get a full stack trace and more context?
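For reference, a small sketch of how a test harness could flip that knob before running the suite. It assumes the kernel is built with hung-task detection so that `/proc/sys/kernel/hung_task_panic` exists, and that the caller is root; the function name and the `path` parameter are hypothetical.

```python
def set_hung_task_panic(enabled=True,
                        path="/proc/sys/kernel/hung_task_panic"):
    """Write 1 or 0 to hung_task_panic and confirm the value read back."""
    wanted = "1" if enabled else "0"
    with open(path, "w") as f:
        f.write(wanted + "\n")
    with open(path) as f:
        return f.read().strip() == wanted
```

With the value set to 1, the kernel panics when a hung task is detected instead of only logging the message, so the board needs a serial console or similar to capture the trace.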
Thanks -- Daniel
I have a board at home I can try to reproduce with if someone on the pm team doesn't. I probably won't be able to get to it until later in the week though.
Thanks, Paul Larson
On 06/01/2011 09:59 AM, Paul Larson wrote:
I have a board at home I can try to reproduce with if someone on the pm team doesn't. I probably won't be able to get to it until later in the week though.
Ok, thanks very much. I will discuss with the PM team and I will try to get a beagleboardXM. That will probably take more than one week for me.
Can you confirm this problem happens only on the beagleXM?
Thanks -- Daniel
On Wed, Jun 1, 2011 at 4:10 AM, Daniel Lezcano daniel.lezcano@linaro.org wrote:
Can you confirm this problem happens only on the beagleXM?
The only other place I was able to test it was panda, and it worked fine there - no problem.
On 06/02/2011 06:14 PM, Paul Larson wrote:
The only other place I was able to test it was panda, and it worked fine there - no problem.
Thanks for checking.
I ran the tests on an igepv2 and had no problems, but the cpufreq driver is not complete with the linaro-2.6.38 kernel. On the panda board it is still a work in progress AFAIK. I will try with the latest 2.6.39 linaro kernel.
On Wed, Jun 1, 2011 at 1:38 AM, Daniel Lezcano daniel.lezcano@linaro.org wrote:
On 05/31/2011 08:48 PM, Paul Larson wrote:
Since there is no lp project (that I am aware of) for the pmqa tests, I think I just reported it to Amit directly at the time. Iirc, he said to follow up on it when he had someone working on the qa tests, which appears to be you, so I'm doing that now. :) Do you have a place where bugs for it should live?
No, not yet. Let me discuss with the power management team about creating a launchpad project.
Yes. I've created a new branch in LP linked to Torez's git tree. We can file a bug against that for now.
https://code.edge.launchpad.net/~linaro-pm-wg/linaro-pm-wg/devel
Once Daniel gets control of the git repo, we can switch to branch.