On 18 July 2012 04:11, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 17.07.2012 20:27, Zach Pfeffer wrote:
We can certainly shut off the CTS valve. Before we change all the builds, maybe we can solve the problem at the source. Zyga, do you want to take a look?
I had a quick look yesterday. From what Andy told me it's the Java parts that are stuck. I started with lava-android-tests CTS wrapper code. All I could find is a reference to CTS that we download from google servers. I recall someone mentioning that we've patched CTS somehow and this does not agree with the code I saw.
So, is the next step to dig into CTS java code that runs on the host?
Yuppers.
For right now, let's just put this on the back burner and pick it back up next cycle. CTS not running sucks, but in light of other priorities I think picking up debugging later may be a good idea.
Thoughts?
Thanks ZK
On 17 July 2012 13:00, Andy Doan andy.doan@linaro.org wrote:
- android team
This is becoming a big problem. I just checked and the load on the system was growing out of control again.
Upon inspection, I found several Java (CTS) processes that were consuming lots of CPU cycles, even though there was no ADB connection to the boards they were supposed to be testing.
I also found a couple of places where MonkeyRunner seemed to have crashed but kept running (and kept consuming CPU cycles).
In addition, the mmtest output is ridiculous. It just dumps %download info throughout our log file, making it really hard to read through the logs.
I think we need to do a few things quickly here:
- We need to limit the number of builds sending CTS jobs until we understand what's going on. I'd suggest doing it for something like Origen or Panda since we have the most of those boards. Right now, the jobs are queuing up on snowball faster than it can unsuccessfully execute them.
- We need to understand what's going on with CTS. Is this due to the patched version we deployed, etc.?
- For sanity's sake, update the logic for the wgets in mmtest.py to not dump so much junk.
We'll need the android team's help on item 1. I also think we may need their help on item 2.
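For the wget item, here is a rough sketch of what a quieter download call could look like. How mmtest.py actually invokes wget is not shown in this thread, so the function names `quiet_wget_command` and `download` are hypothetical. The only things relied on are wget's standard `-nv` (no-verbose) and `-O` options.

```python
import subprocess


def quiet_wget_command(url, dest):
    """Build a wget argument list that suppresses the progress meter.

    Hypothetical helper: mmtest.py may construct its command differently.
    """
    # -nv (--no-verbose) still reports errors and prints a one-line
    # summary per file, but drops the running progress output that
    # clutters the log.
    return ["wget", "-nv", "-O", dest, url]


def download(url, dest):
    """Run wget quietly; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(quiet_wget_command(url, dest))
```

If even the one-line summary is too much, `-q` silences wget completely, at the cost of losing the error messages as well.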
On 07/16/2012 10:06 PM, Michael Hudson-Doyle wrote:
Hi gang,
We had a fright today with LAVA being unreachable. Luckily, we could log in again after a time and notice the cause of the load: 10 or so Java processes like this:
root 31180 53.2 0.5 8175148 159856 ? Sl Jul16 603:22 java -cp
:./android-cts/tools/../../android-cts/tools/ddmlib-prebuilt.jar:./android-cts/tools/../../android-cts/tools/tradefed-prebuilt.jar:./android-cts/tools/../../android-cts/tools/hosttestlib.jar:./android-cts/tools/../../android-cts/tools/cts-tradefed.jar -DCTS_ROOT=./android-cts/tools/../.. com.android.cts.tradefed.command.CtsConsole run cts --serial 192.168.1.199:5555 --plan CTS
Clearly, this is related to the CTS upgrades we've done recently. There was no device connected to 192.168.1.199:5555 so somehow we're leaking these processes. I guess we should stop that :-)
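As a stopgap for spotting these leaked processes, something like the sketch below might help. It compares the `--serial` argument of each CtsConsole process against the devices that `adb devices` reports as connected. It assumes `ps aux` style lines (PID in the second column, one process per unwrapped line) and the usual `adb devices` output of serial, whitespace, state. The function names are made up for illustration and are not part of LAVA.

```python
import re

# Matches the device serial passed to cts-tradefed, e.g. "--serial 192.168.1.199:5555".
SERIAL_RE = re.compile(r"--serial\s+(\S+)")


def connected_serials(adb_devices_output):
    """Return the set of serials that `adb devices` lists in state 'device'."""
    serials = set()
    for line in adb_devices_output.splitlines():
        parts = line.split()
        # Device lines look like "<serial>\tdevice"; the header line and
        # blank lines do not split into exactly two fields.
        if len(parts) == 2 and parts[1] == "device":
            serials.add(parts[0])
    return serials


def orphaned_cts_processes(ps_lines, serials):
    """Return (pid, serial) pairs for CtsConsole processes with no connected device."""
    orphans = []
    for line in ps_lines:
        if "CtsConsole" not in line:
            continue
        match = SERIAL_RE.search(line)
        if match is None:
            continue
        pid = int(line.split()[1])  # second column of `ps aux` output
        if match.group(1) not in serials:
            orphans.append((pid, match.group(1)))
    return orphans
```

The two inputs would come from capturing `ps auxww` and `adb devices` on the host. What to do with the resulting PIDs (log them, kill them) is left open here, since killing a CTS run that is only briefly disconnected may not be what we want.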
While looking into this, I noticed that monkeyrunner tests are quite CPU heavy. Is this expected? Do we need to limit how many of these we run at once?
Cheers, mwh
-- Zygmunt Krynicki Linaro Validation Team s/Validation/Android/