On 16 November 2012 06:30, Michael Hudson-Doyle <michael.hudson@linaro.org> wrote:
Andy Doan <andy.doan@linaro.org> writes:

> On 11/15/2012 09:56 AM, YongQin Liu wrote:
>> Hi, Andy
>>
>> I started a CTS job from the command line with the lava-dispatch command
>> when I left work (about 11:00 UTC),
>> and now the telnet process is using 100% CPU (since 12:25).
>>
>> But the lava-dispatch process has disappeared. That may be because my
>> ssh connection from the company was disconnected. The parent pid of the
>> telnet process is now 1.
>>
>> The process with pid 7287 is the telnet session connected to panda24:
>> liuyq0307@staging:~$ ps -ef|grep telnet
>> 1005      7287     1 65 10:36 pts/4    03:23:52 /usr/bin/telnet serial2 7033
>> root     14409 14324  0 15:47 pts/1    00:00:00 /usr/bin/telnet serial1 7005
>> 1008     14749 13592  0 15:48 pts/7    00:00:00 grep --color=auto telnet
>> liuyq0307@staging:~$
>>
>> The output of the top command:
>> Tasks: 158 total,   2 running, 156 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  2.9%us, 23.0%sy,  0.0%ni, 73.0%id,  1.0%wa,  0.0%hi,  0.1%si,  0.0%st
>> Mem:   8178504k total,  6159136k used,  2019368k free,    65632k buffers
>> Swap:        0k total,        0k used,        0k free,  4714592k cached
>>
>>    PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>   7287 instance  20   0 27516 1520 1228 R  100  0.0 202:05.32 telnet
>> 26256 root      20   0  881m  46m 5740 S    2  0.6  27:02.47 lava-server
>>   5005 instance  20   0 2439m 141m 8588 S    1  1.8   0:10.40 java
>> 13804 liuyq030  20   0 17340 1296  912 R    0  0.0   0:00.01 top
>> 19202 root      20   0 38792 1488 1016 S    0  0.0   5:46.67 adb
>> 26284 postgres  20   0  127m  32m  28m S    0  0.4   0:38.28 postgres
>>      1 root      20   0 24460 2340 1244 S    0  0.0   0:00.97 init
>
> This roughly matches what I remember seeing. However, I can't remember
> if the dispatcher process was still active or not.

This issue where telnet ends up with ppid 1 and using 100% cpu is the
old problem I remember.
But this time, I don't think the telnet process is using 100% CPU because it ended up with ppid 1.
The CPU consumption started during the test.

I started a new job from the web. I will check whether the ppid of telnet is 1 when it uses 100% CPU.
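
For the ppid check, something like the following could be used instead of reading the ps output by eye. This is only a rough sketch, assuming a Linux host with /proc mounted; the helper names (parse_ppid, get_ppid, is_orphaned) are made up for illustration and are not part of lava-dispatcher.

```python
def parse_ppid(stat_text):
    """Return the parent PID from the contents of /proc/<pid>/stat."""
    # Field 2 (the command name) is wrapped in parentheses and may itself
    # contain spaces or ')', so split on the last ')' before tokenizing.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is the process state, rest[1] is the parent PID.
    return int(rest[1])


def get_ppid(pid):
    """Read the parent PID of a running process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        return parse_ppid(f.read())


def is_orphaned(pid):
    """True if the process has been reparented to init (pid 1)."""
    return get_ppid(pid) == 1
```

With the telnet pid from the ps output above, the check would be is_orphaned(7287).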

Thanks,
Yongqin Liu

> I think it would be interesting to run this again - use byobu or
> screen so you won't have the disconnect problem. If the dispatcher
> process is really going away, maybe we just have some odd thing where
> we aren't killing this telnet process?

We have an atexit hook that kills the telnet process, but in the case of
ssh dropping, the dispatcher probably got SIGHUPped and so the atexit
handler didn't run.
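
To illustrate the point: Python does not run atexit handlers when the process is killed by a signal it has no handler for, which is the default for SIGHUP. A minimal sketch of one way around that, assuming the child is started with subprocess (the dispatcher may well do it differently); start_child, exit_on_signal and install_handlers are hypothetical names, not dispatcher code:

```python
import atexit
import signal
import subprocess
import sys


def start_child(argv):
    """Start a child process and register an atexit hook that stops it."""
    child = subprocess.Popen(argv)

    def kill_child():
        # poll() returns None while the child is still running.
        if child.poll() is None:
            child.terminate()
            child.wait()

    atexit.register(kill_child)
    return child


def exit_on_signal(signum, frame):
    # sys.exit() raises SystemExit, so the interpreter shuts down normally
    # and the registered atexit handlers get a chance to run.
    sys.exit(128 + signum)


def install_handlers():
    """Turn SIGHUP and SIGTERM into a normal interpreter exit."""
    for sig in (signal.SIGHUP, signal.SIGTERM):
        signal.signal(sig, exit_on_signal)
```

Calling install_handlers() early in the main thread, before start_child(["/usr/bin/telnet", "serial2", "7033"]), would mean a dropped ssh session triggers the cleanup hook instead of leaving telnet behind. It does not help against SIGKILL.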

To be clear, the problem we're chasing here is that telnet uses 100% cpu
while the dispatcher / lava-android-test is running cts?  I think this
must be a different problem.
Yes, it should be a different problem from the one mentioned above.
Here, lava-android-test and CTS itself don't use the telnet command;
it's lava-dispatcher that uses it,
so I still think the way we use telnet in lava-dispatcher has problems when running the CTS test.

Thanks,
Yongqin Liu

Cheers,
mwh



--
Thanks,
Yongqin Liu
---------------------------------------------------------------
#mailing list
linaro-android@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-android
linaro-validation@lists.linaro.org
http://lists.linaro.org/pipermail/linaro-validation