On 16 November 2012 06:30, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Andy Doan andy.doan@linaro.org writes:
On 11/15/2012 09:56 AM, YongQin Liu wrote:
Hi, Andy
I started a CTS job from the command line with the lava-dispatch command when I left work (about 11:00 UTC), and the telnet process is now using 100% CPU (since 12:25).
But the lava-dispatch process has disappeared. That may be because my ssh connection from the company was disconnected. The parent pid of the telnet process has become 1.
The process with pid 7287 is the telnet session connected to panda24:
liuyq0307@staging:~$ ps -ef|grep telnet
1005 7287 1 65 10:36 pts/4 03:23:52 /usr/bin/telnet serial2 7033
root 14409 14324 0 15:47 pts/1 00:00:00 /usr/bin/telnet serial1 7005
1008 14749 13592 0 15:48 pts/7 00:00:00 grep --color=auto telnet
liuyq0307@staging:~$
The output of the top command:
Tasks: 158 total, 2 running, 156 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.9%us, 23.0%sy, 0.0%ni, 73.0%id, 1.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8178504k total, 6159136k used, 2019368k free, 65632k buffers
Swap: 0k total, 0k used, 0k free, 4714592k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 7287 instance  20   0 27516 1520 1228 R  100  0.0 202:05.32 telnet
26256 root      20   0  881m  46m 5740 S    2  0.6  27:02.47 lava-server
 5005 instance  20   0 2439m 141m 8588 S    1  1.8   0:10.40 java
13804 liuyq030  20   0 17340 1296  912 R    0  0.0   0:00.01 top
19202 root      20   0 38792 1488 1016 S    0  0.0   5:46.67 adb
26284 postgres  20   0  127m  32m  28m S    0  0.4   0:38.28 postgres
    1 root      20   0 24460 2340 1244 S    0  0.0   0:00.97 init
This roughly matches what I remember seeing. However, I can't remember if the dispatcher process was still active or not.
This issue where telnet ends up with ppid 1 and using 100% cpu is the old problem I remember.
But this time, I don't think the telnet process is using 100% CPU because it ended up with ppid 1. The CPU consumption started during the test.
I started a new job from the web and will check whether the ppid of telnet is 1 when it uses 100% CPU.
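For the ppid check, something like the following could be used. It is only a rough sketch: the function name orphaned_telnet_pids is made up for illustration, and it assumes the standard `ps -ef` column order (UID PID PPID C STIME TTY TIME CMD).

```python
import os


def orphaned_telnet_pids(ps_output):
    """Return pids of telnet processes whose parent pid is 1.

    `ps_output` is the text printed by `ps -ef`.
    """
    pids = []
    for line in ps_output.splitlines():
        # Split into at most 8 fields so the command keeps its arguments.
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue  # blank or truncated line
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # header line
        pid, ppid, cmd = int(fields[1]), int(fields[2]), fields[7]
        program = os.path.basename(cmd.split()[0])
        if ppid == 1 and program == "telnet":
            pids.append(pid)
    return pids
```

On a live system the input could come from subprocess.check_output(["ps", "-ef"], text=True). Matching on the program name rather than the whole command line keeps the `grep telnet` process itself out of the result.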
Thanks, Yongqin Liu
I think it would be interesting to run this again - use byobu or screen so you won't have the disconnect problem. If the dispatcher process is really going away, maybe we just have some odd thing where we aren't killing this telnet process?
We have an atexit hook that kills the telnet process, but when the ssh connection dropped, the dispatcher probably got SIGHUPped, so the atexit handler didn't run.
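The SIGHUP behaviour can be reproduced outside the dispatcher. The sketch below is not lava-dispatcher code, and the names CHILD and atexit_ran_after_sighup are made up for illustration. It starts a child Python process that registers an atexit hook, sends it SIGHUP, and reports whether the hook ran. With the default signal action the process is terminated without running atexit hooks; a handler that calls sys.exit() turns the signal into a normal interpreter shutdown, so the hooks do run. It assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import tempfile
import textwrap

# Child program: registers an atexit hook that creates a marker file.
CHILD = textwrap.dedent("""
    import atexit, signal, sys, time
    marker, install_handler = sys.argv[1], sys.argv[2] == "1"
    atexit.register(lambda: open(marker, "w").close())
    if install_handler:
        # sys.exit() raises SystemExit, so atexit hooks get to run.
        signal.signal(signal.SIGHUP, lambda signum, frame: sys.exit(1))
    else:
        # Force the default action even if the parent ignored SIGHUP.
        signal.signal(signal.SIGHUP, signal.SIG_DFL)
    print("ready", flush=True)
    time.sleep(30)
""")


def atexit_ran_after_sighup(install_handler):
    """Send SIGHUP to the child and report whether its atexit hook ran."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = os.path.join(tmp, "atexit-ran")
        flag = "1" if install_handler else "0"
        with subprocess.Popen(
            [sys.executable, "-c", CHILD, marker, flag],
            stdout=subprocess.PIPE,
        ) as proc:
            proc.stdout.readline()  # wait until the hooks are registered
            proc.send_signal(signal.SIGHUP)
            proc.wait()
        return os.path.exists(marker)
```

If this matches what happens in the dispatcher, installing a SIGHUP (and SIGTERM) handler that exits through sys.exit() would be one way to make sure the telnet cleanup runs when the ssh session goes away.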
To be clear, the problem we're chasing here is that telnet uses 100% cpu while the dispatcher / lava-android-test is running cts? I think this must be a different problem.
Yes, it should be a different problem from the one mentioned above. Here, lava-android-test and CTS itself don't use the telnet command; it's lava-dispatcher that uses it, so I still think the way we use telnet in lava-dispatcher causes problems when running the CTS test.
Thanks, Yongqin Liu
Cheers, mwh