Hi Andy and Michael,
About the problem of the telnet process consuming CPU (bug 1034218, https://bugs.launchpad.net/linaro-android/+bug/1034218): so far I have tried two ways to verify it:
- Run the CTS test by submitting a lava-job. In this case, the process that consumes the CPU is telnet.
- Run the CTS test from the command line with "lava-android-test run cts". In this case, no process reaches 100% CPU, even though I also had a telnet session open at the same time.
So I guess the problem is in the way we call the telnet command in lava-dispatcher.
From my investigation, it is the select syscall in telnet that consumes the CPU, so I suspect there is some place in lava-dispatcher that reads the output of telnet in a loop without sleeping. However, I did not find such a place in lava-dispatcher.
What do you think?
Finally, I feel this is a problem in lava-dispatcher rather than in lava-android-test or CTS, so can we change it to be a lava-dispatcher bug?
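For comparison, a read loop that does not spin could look like the sketch below. It is not code from lava-dispatcher: the function name read_available is made up for illustration, and a plain pipe stands in for the telnet connection. The point is only that select() with a non-zero timeout sleeps in the kernel while there is no output.

```python
import os
import select


def read_available(fd, timeout=1.0, chunk_size=4096):
    """Return bytes read from fd, b"" on EOF, or None if nothing arrived in time."""
    # select() blocks in the kernel for up to `timeout` seconds, so the caller
    # uses no CPU while waiting. A timeout of 0 inside a tight loop would spin.
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    return os.read(fd, chunk_size)


if __name__ == "__main__":
    # A pipe stands in for the telnet connection in this sketch.
    read_end, write_end = os.pipe()
    os.write(write_end, b"console output\n")
    print(read_available(read_end))               # data is ready, returns at once
    print(read_available(read_end, timeout=0.2))  # nothing pending, waits 0.2 s
    os.close(read_end)
    os.close(write_end)
```

A loop built on a call like this wakes up only when data arrives or the timeout expires, which is the behaviour one would expect to see instead of a process pinned at 100%.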
Hi Andy and Michael,
We need to enable CTS tests for a few of the builds, as they are a requirement for a few projects. Is anyone looking at the issue from the lava-dispatcher end? Load could become an issue once we re-enable CTS.
#mailing list linaro-android@lists.linaro.org linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-android linaro-validation@lists.linaro.org linaro-dev@lists.linaro.org http://lists.linaro.org/pipermail/linaro-validation
On 11/14/2012 12:48 AM, YongQin Liu wrote:
Hi, Andy & Michael
About the problem that the telnet process consumes CPU(bug1034218 https://bugs.launchpad.net/linaro-android/+bug/1034218), For now I tried two ways to verify it:
- Run the CTS test via submitting a lava-job In this way, the process that consumes CPU is telnet
- Run the CTS test via command line "lava-android-test run cts" In this way, there is no process that consumes CPU to 100%, In the meanwhile, I also opened the telnet session.
So I guess the problem is in the way we call the telnet command in lava-dispatcher. From my investigation, it is the select syscall in telnet that consumes the CPU, so I suspect there is some place in lava-dispatcher that reads the output of telnet in a loop without sleeping. However, I did not find such a place in lava-dispatcher.
What do you think?
Do you get 100% CPU when you run the job by hand, i.e. "lava-dispatch jobfile.json"?
This just doesn't make sense. I don't see how the telnet binary's usage of the "select" API is being influenced by its parent process.
Finally, I feel that this problem is of lava-dispatcher, not the problem of lava-android-test or CTS, so can we change it to be a bug of lava-dispatcher?
That's fine. The core problem remains the same. The big question is: what do we need to try and debug next?
On 14 November 2012 23:57, Andy Doan andy.doan@linaro.org wrote:
Do you get 100% CPU when you run the job by hand, i.e. "lava-dispatch jobfile.json"?
I will try to run "lava-dispatch jobfile.json" from the command line tomorrow.
Hi, Andy
I started a CTS job from the command line with the lava-dispatch command when I left work (about 11:00 UTC), and the telnet process is now consuming 100% CPU (since 12:25).
However, the lava-dispatch process has disappeared, maybe because my ssh connection from the office was disconnected. The parent PID of the telnet process has become 1.
The process with PID 7287 is the telnet session connected to panda24:
liuyq0307@staging:~$ ps -ef|grep telnet
1005      7287     1 65 10:36 pts/4    03:23:52 /usr/bin/telnet serial2 7033
root     14409 14324  0 15:47 pts/1    00:00:00 /usr/bin/telnet serial1 7005
1008     14749 13592  0 15:48 pts/7    00:00:00 grep --color=auto telnet
liuyq0307@staging:~$
The output of the top command:
Tasks: 158 total, 2 running, 156 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.9%us, 23.0%sy, 0.0%ni, 73.0%id, 1.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8178504k total, 6159136k used, 2019368k free, 65632k buffers
Swap: 0k total, 0k used, 0k free, 4714592k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 7287 instance  20   0 27516 1520 1228 R  100  0.0 202:05.32 telnet
26256 root      20   0  881m  46m 5740 S    2  0.6  27:02.47 lava-server
 5005 instance  20   0 2439m 141m 8588 S    1  1.8   0:10.40 java
13804 liuyq030  20   0 17340 1296  912 R    0  0.0   0:00.01 top
19202 root      20   0 38792 1488 1016 S    0  0.0   5:46.67 adb
26284 postgres  20   0  127m  32m  28m S    0  0.4   0:38.28 postgres
    1 root      20   0 24460 2340 1244 S    0  0.0   0:00.97 init
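The "parent PID becomes 1" check can be automated. The sketch below is only an illustration: find_orphans is a hypothetical helper, and it assumes the column layout of "ps -ef" as shown above (UID, PID, PPID, C, STIME, TTY, TIME, CMD).

```python
def find_orphans(ps_output, command_name):
    """Return PIDs of processes whose parent is PID 1 and whose command matches."""
    orphans = []
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # skip the header and malformed lines
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        executable = cmd.split()[0]
        if ppid == 1 and executable.endswith(command_name):
            orphans.append(pid)
    return orphans


# Sample taken from the ps output quoted in this message.
SAMPLE = """\
1005      7287     1 65 10:36 pts/4    03:23:52 /usr/bin/telnet serial2 7033
root     14409 14324  0 15:47 pts/1    00:00:00 /usr/bin/telnet serial1 7005
1008     14749 13592  0 15:48 pts/7    00:00:00 grep --color=auto telnet
"""

if __name__ == "__main__":
    print(find_orphans(SAMPLE, "telnet"))  # only the telnet with PPID 1
```

Matching on the executable rather than the whole command line keeps the grep process itself out of the result.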
Thanks, Yongqin Liu
On 11/15/2012 09:56 AM, YongQin Liu wrote:
Hi, Andy
I started a cts job from the command line via lava-dispatch command when I was off my work (about 11:00 UTC), and now the telnet process is consuming the CPU to 100%(started from 12:25).
but the lava-dispatch process is disappeared. that maybe because my ssh connection from company disconnected. And the parent pid of the telnet process becomes 1.
This roughly matches what I remember seeing. However, I can't remember if the dispatcher process was still active or not. I think it would be interesting to run this again - use byobu or screen so you won't have the disconnect problem. If the dispatcher process is really going away, maybe we just have some odd thing where we aren't killing this telnet process?
Andy Doan andy.doan@linaro.org writes:
On 11/15/2012 09:56 AM, YongQin Liu wrote:
Hi, Andy
I started a cts job from the command line via lava-dispatch command when I was off my work (about 11:00 UTC), and now the telnet process is consuming the CPU to 100%(started from 12:25).
but the lava-dispatch process is disappeared. that maybe because my ssh connection from company disconnected. And the parent pid of the telnet process becomes 1.
This roughly matches what I remember seeing. However, I can't remember if the dispatcher process was still active or not.
This issue where telnet ends up with ppid 1 and using 100% cpu is the old problem I remember.
I think it would be interesting to run this again - use byobu or screen so you won't have the disconnect problem. If the dispatcher process is really going away, maybe we just have some odd thing where we aren't killing this telnet process?
We have an atexit hook that kills the telnet process, but in the case of ssh dropping the dispatcher probably got SIGHUPped and so the atexit handler didn't run.
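To make the SIGHUP point concrete: Python does not run atexit handlers when the process is killed by a signal it does not handle. A common workaround is to turn the signal into a normal exit. The sketch below shows that pattern under stated assumptions; it is not the dispatcher's actual code, and exit_on_signal, install_hangup_handler, kill_console and register_cleanup are hypothetical names.

```python
import atexit
import signal
import sys


def exit_on_signal(signum, frame):
    """Turn a fatal signal into a normal interpreter exit so atexit hooks run."""
    # sys.exit raises SystemExit; when it propagates out of the main thread,
    # Python runs the registered atexit handlers before the process ends.
    sys.exit(128 + signum)


def install_hangup_handler():
    """Route SIGHUP and SIGTERM through exit_on_signal (main thread only)."""
    for signum in (signal.SIGHUP, signal.SIGTERM):
        signal.signal(signum, exit_on_signal)


def kill_console(proc):
    """Hypothetical cleanup: terminate the child that owns the serial console."""
    if proc.poll() is None:  # still running
        proc.terminate()


def register_cleanup(proc):
    """Make sure kill_console(proc) runs on normal exit and on SIGHUP/SIGTERM."""
    install_hangup_handler()
    atexit.register(kill_console, proc)
```

Here proc is assumed to be a subprocess.Popen-like object with poll() and terminate(). SIGKILL cannot be caught, so a child can still be orphaned in that case.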
To be clear, the problem we're chasing here is that telnet uses 100% cpu while the dispatcher / lava-android-test is running cts? I think this must be a different problem.
Cheers, mwh
On 16 November 2012 06:30, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Andy Doan andy.doan@linaro.org writes:
On 11/15/2012 09:56 AM, YongQin Liu wrote:
Hi, Andy
I started a cts job from the command line via lava-dispatch command when I was off my work (about 11:00 UTC), and now the telnet process is consuming the CPU to 100%(started from 12:25).
but the lava-dispatch process is disappeared. that maybe because my ssh connection from company disconnected. And the parent pid of the telnet process becomes 1.
This roughly matches what I remember seeing. However, I can't remember if the dispatcher process was still active or not.
This issue where telnet ends up with ppid 1 and using 100% cpu is the old problem I remember.
But this time, I don't think the telnet process is using 100% CPU because it ended up with ppid 1; the consumption started during the test.
I have started a new job from the web and will check whether the ppid of telnet is 1 when it uses 100% CPU.
Thanks, Yongqin Liu
I think it would be interesting to run this again - use byobu or screen so you won't have the disconnect problem. If the dispatcher process is really going away, maybe we just have some odd thing where we aren't killing this telnet process?
We have an atexit hook that kills the telnet process, but in the case of ssh dropping the dispatcher probably got SIGHUPped and so the atexit handler didn't run.
To be clear, the problem we're chasing here is that telnet uses 100% cpu while the dispatcher / lava-android-test is running cts? I think this must be a different problem.
Yes, it should be a different problem from the one mentioned above. Here, lava-android-test and CTS themselves don't use the telnet command; it is lava-dispatcher that uses it. So I still think the way we use telnet in lava-dispatcher has a problem when running the CTS test.
Thanks, Yongqin Liu
Cheers, mwh
On 15 November 2012 20:20, YongQin Liu yongqin.liu@linaro.org wrote:
On 16 November 2012 06:30, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Andy Doan andy.doan@linaro.org writes:
On 11/15/2012 09:56 AM, YongQin Liu wrote:
Hi, Andy
I started a cts job from the command line via lava-dispatch command when I was off my work (about 11:00 UTC), and now the telnet process is consuming the CPU to 100%(started from 12:25).
but the lava-dispatch process is disappeared. that maybe because my ssh connection from company disconnected. And the parent pid of the telnet process becomes 1.
This roughly matches what I remember seeing. However, I can't remember if the dispatcher process was still active or not.
This issue where telnet ends up with ppid 1 and using 100% cpu is the old problem I remember.
But this time, I don't think the telnet process is using 100% CPU because it ended up with ppid 1; the consumption started during the test.
I have started a new job from the web and will check whether the ppid of telnet is 1 when it uses 100% CPU.
I have no idea if these suggestions are relevant or pertinent, so...
What is actually happening when telnet is at 100%? Is it just a glut of data being transmitted? Also, if it is useful data, can we just disconnect from the process and let it run its course, or mark it low priority so it doesn't consume the machine?
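For the "mark it low prio" idea, a minimal sketch using the standard os.setpriority call follows. The helper name deprioritize is made up here, and this only limits the impact on other processes; it would not stop telnet from spinning.

```python
import os


def deprioritize(pid, niceness=19):
    """Set the scheduling niceness of `pid`; 19 is the lowest priority on Linux."""
    # Raising the nice value of a process you own needs no special privileges;
    # lowering it again normally does.
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

Called with the PID of the busy telnet process, this has the same effect as running renice on it from a shell.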
Zach Pfeffer zach.pfeffer@linaro.org writes:
On 15 November 2012 20:20, YongQin Liu yongqin.liu@linaro.org wrote:
I started a new job from web. will check if the ppid of telnet is 1 when it uses 100%CPU.
I have no idea if these suggestions are relevant or pertinent, so...
It's certainly a relevant thought...
What is actually happening when telnet is at 100%, is it just a gut of data being transmitted?
But sadly my answer is a bit vague and negative: I don't think data is being transmitted at the time.
Cheers, mwh
On 18 November 2012 14:59, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Zach Pfeffer zach.pfeffer@linaro.org writes:
On 15 November 2012 20:20, YongQin Liu yongqin.liu@linaro.org wrote:
I started a new job from web. will check if the ppid of telnet is 1 when it uses 100%CPU.
I have no idea if these suggestions are relevant or pertinent, so...
It's certainly a relevant thought...
What is actually happening when telnet is at 100%, is it just a gut of data being transmitted?
But sadly my answer is a bit vague and negative: I don't think data is being transmitted at the time.
Does the process eventually finish? How long does it take?
Cheers, mwh
+Alexander
On 19 November 2012 07:25, Zach Pfeffer zach.pfeffer@linaro.org wrote:
On 18 November 2012 14:59, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Zach Pfeffer zach.pfeffer@linaro.org writes:
On 15 November 2012 20:20, YongQin Liu yongqin.liu@linaro.org wrote:
I started a new job from web. will check if the ppid of telnet is 1 when it uses 100%CPU.
I have no idea if these suggestions are relevant or pertinent, so...
It's certainly a relevant thought...
What is actually happening when telnet is at 100%, is it just a gut of data being transmitted?
But sadly my answer is a bit vague and negative: I don't think data is being transmitted at the time.
Does the process eventually finish? How long does it take?
Cheers, mwh
-- Zach Pfeffer Android Platform Team Lead, Linaro Platform Teams Linaro.org | Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
Hi, Andy & Michael
I submitted the following MP for the CPU consumption of the telnet process: https://code.launchpad.net/~liuyq0307/lava-dispatcher/clean-console-when-tes... Could you help review it?
I tested the modification with two CTS jobs on staging yesterday and did not see the CPU consumption at http://staging-metrics.validation.linaro.org:8080/collectd/bin/index.cgi. (That test was actually on the version before DrainConsoleOutput was moved to utils.)
I am now testing the r465 version; it has finished one job without the CPU consumption happening, and another CTS job is waiting to be executed. So I think this MP resolves the CPU consumption problem.
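The name DrainConsoleOutput comes from the MP, but the class below is not that code. It is only a sketch of the general idea, assuming the fix works by reading and discarding console output in the background while the test runs so that nothing piles up unread. ConsoleDrainer and its attributes are hypothetical names.

```python
import os
import select
import threading


class ConsoleDrainer(threading.Thread):
    """Read and discard everything that arrives on fd until stop() is called."""

    def __init__(self, fd, poll_interval=0.1):
        super().__init__(daemon=True)
        self.fd = fd
        self.poll_interval = poll_interval
        self.discarded = 0  # number of bytes thrown away so far
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            # Sleep in select() until data arrives or the interval expires,
            # so the drainer itself does not become a busy loop.
            readable, _, _ = select.select([self.fd], [], [], self.poll_interval)
            if not readable:
                continue
            data = os.read(self.fd, 4096)
            if not data:  # EOF: the other end closed the connection
                break
            self.discarded += len(data)

    def stop(self):
        """Ask the thread to finish and wait for it."""
        self._stop_event.set()
        self.join()
```

A caller would start the drainer before launching the long-running test and call stop() before it needs to read the console itself again.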
Thanks, Yongqin Liu
On 11/21/2012 04:34 AM, YongQin Liu wrote:
I submitted the following MP for the CPU consumption of telnet process. https://code.launchpad.net/%7Eliuyq0307/lava-dispatcher/clean-console-when-test/+merge/135084 Could you help to review the MP?
Just to keep people in the loop not subscribed to the bug and MP:
This fix is nearly identical to something I've already implemented. I've asked that the two things be unified in the MP so we don't have to maintain both things going forward.
YongQin Liu yongqin.liu@linaro.org writes:
Hi, Andy & Michael
About the problem that the telnet process consumes CPU (bug 1034218, https://bugs.launchpad.net/linaro-android/+bug/1034218), for now I tried two ways to verify it:
- Run the CTS test via submitting a lava-job In this way, the process that consumes CPU is telnet
- Run the CTS test via command line "lava-android-test run cts" In this way, there is no process that consumes CPU to 100%, In the meanwhile, I also opened the telnet session.
So I guess the problem is in the way we call the telnet command in lava-dispatcher. From my investigation, it is the select syscall in telnet that consumes the CPU, so I suspect there is some place in lava-dispatcher that reads the output of telnet in a loop without sleeping. However, I did not find such a place in lava-dispatcher.
What do you think?
This is all fairly strange. However, it seems like this is not the problem we had running CTS before; IIRC then it was java processes causing trouble on control. As the load from telnet is pure CPU load (i.e. it doesn't consume masses of RAM or do loads of IO) I would be in favour of enabling CTS again (carefully) and watching the graphs on http://munin.validation.linaro.org/validation.linaro.org/control.validation....
Cheers, mwh
On 15 November 2012 06:52, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
This is all fairly strange. However, it seems like this is not the problem we had running CTS before; IIRC then it was java processes causing trouble on control. As the load from telnet is pure CPU load (i.e. it doesn't consume masses of RAM or do loads of IO) I would be in favour of enabling CTS again (carefully) and watching the graphs on
http://munin.validation.linaro.org/validation.linaro.org/control.validation....
Okay. I will go ahead and enable it for the builds which have hardware acceleration support:
- https://android-build.linaro.org/builds/~linaro-android/panda-jb-gcc47-tilt-...
- https://android-build.linaro.org/builds/~linaro-android/snowball-jb-gcc47-igloo-stable-blob/#
- https://android-build.linaro.org/builds/~linaro-android/origen-jb-gcc47-samsunglt-tracking-blob/#
Cheers, mwh
On 14 November 2012 23:35, Vishal Bhoj vishal.bhoj@linaro.org wrote:
Okay. I will go ahead and enable it for the builds which have hardware acceleration support: https://android-build.linaro.org/builds/~linaro-android/panda-jb-gcc47-tilt-... https://android-build.linaro.org/builds/~linaro-android/snowball-jb-gcc47-ig... https://android-build.linaro.org/builds/~linaro-android/origen-jb-gcc47-sams...
Awesome. Thanks Vishal.
Cheers, mwh
linaro-validation@lists.linaro.org