+ android team
This is becoming a big problem. I just checked and the load on the system was growing out of control again.
Upon inspection, I found several Java (CTS) processes that were consuming lots of CPU cycles, yet there was no ADB connection to the boards they were supposed to be testing.
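One way to spot runs like these is to compare the `--serial` target on each CTS command line with what `adb devices` currently reports. The sketch below is only illustrative: it assumes the default `adb devices` output (a header line followed by `serial<TAB>state` rows), and `connected_serials`, `cts_serial` and `orphaned` are hypothetical helper names, not part of LAVA or CTS.

```python
import shlex


def connected_serials(adb_devices_output):
    """Return the serials that `adb devices` reports in the 'device' state."""
    serials = set()
    for line in adb_devices_output.splitlines():
        parts = line.split()
        # Skip the "List of devices attached" header and blank lines;
        # device rows look like "<serial>\tdevice" (or "offline", etc.).
        if len(parts) == 2 and parts[1] == "device":
            serials.add(parts[0])
    return serials


def cts_serial(cmdline):
    """Extract the value following --serial on a command line, or None."""
    args = shlex.split(cmdline)
    if "--serial" in args:
        idx = args.index("--serial")
        if idx + 1 < len(args):
            return args[idx + 1]
    return None


def orphaned(cmdlines, adb_devices_output):
    """Return the command lines whose --serial target is not connected."""
    live = connected_serials(adb_devices_output)
    result = []
    for cmd in cmdlines:
        serial = cts_serial(cmd)
        if serial is not None and serial not in live:
            result.append(cmd)
    return result
```

Feeding it the command lines taken from `ps` and the text printed by `adb devices` would give the list of runs whose board is gone; what to do with them (kill, log, alert) is left open.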
I also found a couple of places where MonkeyRunner seemed to have crashed but kept running (and kept consuming CPU cycles).
In addition, the mmtest output is ridiculous. It dumps download-percentage info throughout our log file, making the logs really hard to read.
I think we need to do a few things quickly here:
1) We need to limit the number of builds sending CTS jobs until we understand what's going on. I'd suggest doing it for something like Origen or Panda, since we have the most of those boards. Right now, the jobs are queuing up on Snowball faster than it can unsuccessfully execute them.
2) We need to understand what's going on with CTS. Is this due to the patched version we deployed, etc?
3) For sanity's sake, update the logic for the wgets in mmtest.py so that it does not dump so much junk.
We'll need the android team's help on item 1. I also think we may need their help on item 2.
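A possible shape for item 3, as a sketch only: how mmtest.py actually invokes wget is not shown in this thread, so this assumes it shells out to the `wget` binary. Passing `--no-verbose` drops the progress meter while still printing errors and a one-line summary per file. `wget_command` is a hypothetical helper and the URL in the usage note is a placeholder.

```python
def wget_command(url, dest, quiet=True):
    """Build a wget argument list that keeps progress noise out of the logs."""
    cmd = ["wget"]
    if quiet:
        # --no-verbose suppresses the progress meter but still reports
        # errors and basic information, unlike --quiet.
        cmd.append("--no-verbose")
    # -O writes the download to the given destination file.
    cmd += ["-O", dest, url]
    return cmd
```

It could then be run with something like `subprocess.check_call(wget_command("http://example.com/file.bin", "/tmp/file.bin"))`. If some progress output is still wanted, wget's `--progress=dot:mega` is a less chatty alternative.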
On 07/16/2012 10:06 PM, Michael Hudson-Doyle wrote:
Hi gang,
We had a fright today with LAVA being unreachable. Luckily, we could log in again after a time and notice the cause of the load: 10 or so Java processes like this:
root 31180 53.2 0.5 8175148 159856 ? Sl Jul16 603:22 java -cp :./android-cts/tools/../../android-cts/tools/ddmlib-prebuilt.jar:./android-cts/tools/../../android-cts/tools/tradefed-prebuilt.jar:./android-cts/tools/../../android-cts/tools/hosttestlib.jar:./android-cts/tools/../../android-cts/tools/cts-tradefed.jar -DCTS_ROOT=./android-cts/tools/../.. com.android.cts.tradefed.command.CtsConsole run cts --serial 192.168.1.199:5555 --plan CTS
Clearly, this is related to the CTS upgrades we've done recently. There was no device connected to 192.168.1.199:5555 so somehow we're leaking these processes. I guess we should stop that :-)
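One common way to avoid leaking child processes like this is to run each one under a hard timeout and kill its whole process group when the timeout expires. The following is a minimal Python 3 sketch for POSIX hosts; `run_with_timeout` is a hypothetical helper, and nothing here is taken from the LAVA or CTS wrapper code.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; on timeout, kill its whole process group.

    Returns the exit code, or None if the command had to be killed.
    """
    # start_new_session=True puts the child in its own session and
    # process group, so killpg also reaches anything it spawned.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()  # reap the child so it does not linger as a zombie
        return None


if __name__ == "__main__":
    # Stand-in for a stuck test run: a child that sleeps far past the limit.
    stuck = [sys.executable, "-c", "import time; time.sleep(30)"]
    print("exit code:", run_with_timeout(stuck, 1))
```

Whether a fixed timeout suits CTS is a separate question, since a full plan legitimately runs for a long time; the limit would have to be generous or tied to device connectivity.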
While looking into this, I noticed that monkeyrunner tests are quite CPU heavy. Is this expected? Do we need to limit how many of these we run at once?
Cheers, mwh
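If a cap on simultaneous monkeyrunner runs turns out to be needed, a counting semaphore is one simple way to express it. This sketch assumes the jobs are launched as threads from a single Python process, which may not match how the dispatcher really works (separate processes would need a cross-process lock instead); `Limiter` is a hypothetical name.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Limiter:
    """Allow at most `limit` jobs to run at the same time."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._running = 0
        self.peak = 0  # highest concurrency observed, for diagnostics

    def run(self, job):
        with self._slots:  # blocks while `limit` jobs are already running
            with self._lock:
                self._running += 1
                self.peak = max(self.peak, self._running)
            try:
                return job()
            finally:
                with self._lock:
                    self._running -= 1


if __name__ == "__main__":
    limiter = Limiter(2)
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Eight jobs are submitted, but only two run at any moment.
        list(pool.map(lambda _: limiter.run(lambda: time.sleep(0.1)), range(8)))
    print("peak concurrency:", limiter.peak)
```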
On 17 Jul 2012 23:30, "Andy Doan" <andy.doan@linaro.org> wrote:
- We need to understand what's going on with CTS. Is this due to the patched version we deployed, etc?
I have a suggestion. Should we run CTS on release candidates only? This is a time-consuming test. Do we need to run it on all daily builds?
linaro-android mailing list linaro-android@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-android
We can certainly shut off the CTS valve. Before we change all the builds, maybe we can solve the problem at the source. Zyga, do you want to take a look?
On 07/17/2012 01:27 PM, Zach Pfeffer wrote:
We can certainly shut off the CTS valve. Before we change all the builds, maybe we can solve the problem at the source. Zyga, do you want to take a look?
I think we should go ahead and "shut off the CTS valve". LAVA is getting hammered right now, and we need to get back to a sane state.
The investigation shouldn't depend on this being enabled, so I think Zygmunt can still make progress.
Aye. I took cts out of the test list.
On 17.07.2012 20:27, Zach Pfeffer wrote:
We can certainly shut off the CTS valve. Before we change all the builds, maybe we can solve the problem at the source. Zyga, do you want to take a look?
I had a quick look yesterday. From what Andy told me, it's the Java parts that are stuck. I started with the lava-android-tests CTS wrapper code. All I could find is a reference to the CTS that we download from Google's servers. I recall someone mentioning that we've patched CTS somehow, and that does not agree with the code I saw.
So, is the next step to dig into CTS java code that runs on the host?
Thanks ZK
On 18 July 2012 04:11, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
I had a quick look yesterday. From what Andy told me, it's the Java parts that are stuck. I started with the lava-android-tests CTS wrapper code. All I could find is a reference to the CTS that we download from Google's servers. I recall someone mentioning that we've patched CTS somehow, and that does not agree with the code I saw.
So, is the next step to dig into CTS java code that runs on the host?
Yuppers.
For right now, let's just put this on the back burner and pick it back up next cycle. CTS not running sucks, but in light of other priorities I think picking up debugging later may be a good idea.
Thoughts?
Thanks ZK
-- Zygmunt Krynicki Linaro Validation Team s/Validation/Android/
On 18.07.2012 14:39, Zach Pfeffer wrote:
For right now, let's just put this on the back burner and pick it back up next cycle. CTS not running sucks, but in light of other priorities I think picking up debugging later may be a good idea.
Thoughts?
Agreed that this should be in the backlog for now. Is there a bug number we can track?
Thanks ZK
On 18 July 2012 07:48, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
Agreed that this should be in the backlog for now. Is there a bug number we can track?
No. Would you file one based off your debugging experience?
On 18.07.2012 14:55, Zach Pfeffer wrote:
No. Would you file one based off your debugging experience?
Reported as:
https://bugs.launchpad.net/linaro-android/+bug/1026118
NOTE: I have not reproduced this locally yet. I'll give it a go after the meeting today but I don't have much time to spend on that this week.
On 18 July 2012 08:03, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
Reported as:
https://bugs.launchpad.net/linaro-android/+bug/1026118
NOTE: I have not reproduced this locally yet. I'll give it a go after the meeting today but I don't have much time to spend on that this week.
Actually, just leave it for the time being. I think the bug is enough.
linaro-android@lists.linaro.org