We're going to be talking about test case management in LAVA at the
Connect. I've brain-dumped some of my thoughts here:
https://wiki.linaro.org/Platform/LAVA/Specs/TestCaseManagement
Comments welcome. But if all you do is read it before coming to the
session, that's enough for me :-)
Cheers,
mwh
Hi,
I've made some code changes to support YAML-based testdefs. I'll show
you a demo tomorrow of what I have, and we can discuss the YAML
structure in more detail and get more stuff in.
My sample testdef looks like the following (testdef.yaml):
<snip1>
metadata:
    name: simple
    version: 1.0
    format: lava-test v1.0
environment:
    image-type: [beagle]
install:
    url:
    steps:
run:
    steps:
        - /bin/echo cache-coherency-switching - PASS
        - ls
        - pwd
parse:
    pattern: (?P<test_case_id>.*-*)\s+:\s+(?P<result>(PASS|FAIL))
    fixupdict:
        PASS: pass
        FAIL: fail
</snip1>
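As an aside, a pattern/fixupdict pair like the one above can be tried
out with a few lines of Python. This is just a standalone sketch, not
the real lava-test parser; note I've matched on the " - " separator
that appears in the sample output, whereas the testdef's own pattern
expects " : ":

```python
import re

# Standalone sketch of applying a parse pattern plus a fixupdict, in the
# spirit of the testdef above. This regex uses the " - " separator seen
# in the sample output, not the " : " the testdef's pattern expects.
pattern = re.compile(r"(?P<test_case_id>.*)\s+-\s+(?P<result>PASS|FAIL)")
fixupdict = {"PASS": "pass", "FAIL": "fail"}

def parse_line(line):
    """Return (test_case_id, result) for a matching line, else None."""
    match = pattern.search(line)
    if match is None:
        return None
    return match.group("test_case_id"), fixupdict[match.group("result")]

print(parse_line("cache-coherency-switching - PASS"))
# -> ('cache-coherency-switching', 'pass')
```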
Following is a snippet from a sample run:
<snip2>
cache-coherency-switching - PASS
install.sh
run.sh
testdef.yaml
/lava/tests/0_simple
<LAVA_TEST_RUNNER>: 0_simple exited with: 0
0_simple-1351674922 build.txt cpuinfo.txt meminfo.txt pkgs.txt
<LAVA_TEST_RUNNER>: exiting
<LAVA_DISPATCHER>2012-10-31 02:44:54 PM INFO: lava_test_shell seems to
have completed
<LAVA_DISPATCHER>2012-10-31 02:44:54 PM INFO: attempting a filesystem
sync before power_off
linaro-test [rc=0]# sync
sync
linaro-test [rc=0]# <LAVA_DISPATCHER>2012-10-31 02:44:57 PM INFO:
[ACTION-E] lava_test_shell is finished successfully.
<LAVA_DISPATCHER>2012-10-31 02:44:57 PM INFO: Submitting the test result
with parameters = {u'stream': u'/anonymous/stylesen/', u'server':
u'http://10.155.13.219/RPC2/'}
dashboard-put-result:
http://10.155.13.219/dashboard/permalink/bundle/9fdccd73c7e825c2eec7850e61d…
<LAVA_DISPATCHER>2012-10-31 02:44:57 PM INFO: Dashboard :
http://10.155.13.219/dashboard/permalink/bundle/9fdccd73c7e825c2eec7850e61d…
</snip2>
Thank You.
--
Senthil Kumaran S
http://www.stylesen.org/
http://www.sasenthilkumaran.com/
Hi all,
I just ran a CTS job on staging now, and found that the CPU usage of
the telnet process is nearly 100%.
Does anyone have any idea about that?
Below is the link of the collectd information installed on staging:
http://staging-metrics.validation.linaro.org:8080/collectd/bin/index.cgi?ho…
The high CPU usage started at 13:40 on CPU3 and moved to CPU1 at 14:30.
Below is the output of the top command:
top - 15:18:26 up 7 days, 5:27, 1 user, load average: 1.81, 1.80, 1.46
Tasks: 144 total, 2 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.5%us, 23.1%sy, 0.0%ni, 74.0%id, 0.3%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 8178504k total, 4866776k used, 3311728k free, 66408k buffers
Swap: 0k total, 0k used, 0k free, 3870820k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31383 root 20 0 27516 1504 1224 R 100 0.0 97:09.40 telnet
3626 root 20 0 2437m 92m 8496 S 0 1.2 0:22.77 java
13383 lava-sta 20 0 198m 47m 5180 S 0 0.6 0:18.85 uwsgi
1 root 20 0 24460 2336 1244 S 0 0.0 0:00.98 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0 0.0 0:08.43 ksoftirqd/0
Thanks,
Yongqin Liu
---------------------------------------------------------------
#mailing list
linaro-android(a)lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-android
linaro-validation(a)lists.linaro.org
http://lists.linaro.org/pipermail/linaro-validation
Hey Guys,
I just looked into the Panda10 health check failures over the past 24 hours.
The good news is my bad code isn't to blame. The bad news is that the SD
card appears to be having some issues.
Hi all,
I've just created a very basic munin installation for the lab:
http://munin.validation.linaro.org/
The monitoring wonk I consulted said that munin is perhaps not the
greatest way of getting graphs of your system but that it's probably the
easiest to set up. Better than nothing :-)
To add a system to munin you need to:
1) apt-get install munin-node on the system
2) Edit /etc/munin/munin-node.conf on the system to contain:
host_name XXX.validation.linaro.org
allow ^192\.168\.1\.32$
3) sudo service munin-node restart on the system
4) Add the following to /etc/munin/munin.conf on linaro-gateway:
[XXX.validation.linaro.org]
address 192.168.1.YYY
use_node_name yes
and that's it! The data viewable at http://munin.validation.linaro.org/
is generated by a */5 cron, so it takes a while for a new host to
appear. If someone wants to add dogfood, the compute nodes, the fast
model instances etc etc be my guest...
Once all the systems are added, the next thing is to start looking at
adding more us-specific metrics -- scheduler queue lengths, request
numbers and duration from django or apache, various postgres stats etc
etc. It would also be nice to add "events" to the graphs such as
rollouts and job start/ends but I don't know if that is supported.
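To give an idea of the shape of a custom metric: a munin plugin is just
an executable that prints graph configuration when run with "config" and
"field.value N" lines otherwise. Here's a minimal sketch in Python; the
queue_length() body is a placeholder assumption, not real lava-server
code:

```python
#!/usr/bin/env python
import sys

# Minimal sketch of a munin plugin for an app-specific metric. Munin
# calls the plugin with "config" to learn the graph layout, and with no
# argument to fetch values, which are plain "field.value N" lines.
# queue_length() is a placeholder standing in for a real query against
# the lava-server database.

def queue_length():
    return 0  # placeholder: would really ask the scheduler/postgres

def main(argv):
    if len(argv) > 1 and argv[1] == "config":
        print("graph_title LAVA scheduler queue length")
        print("graph_vlabel jobs")
        print("queue.label submitted jobs")
    else:
        print("queue.value %d" % queue_length())

if __name__ == "__main__":
    main(sys.argv)
```

Dropped into /etc/munin/plugins/ and made executable, something like
this should get picked up by munin-node on restart.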
Cheers,
mwh
Hi All,
Due to a glitch with UEFI and the latest kernels, we are forced to leave the TC2s offline until the issue is resolved. Ryan Harkin and I have been working to try and resolve this, but the best we could do is to get them to pass their health check (using sticking plaster, string and a large hammer) but they would then fail every test that was submitted to them, which would be kind of pointless. We're working actively to fix this problem, and I'll let you know when we're back up and running.
Thanks, and apologies once again,
Dave
Hi all,
It's all kinds of rough but I've just crossed a milestone: I ran the
dispatcher and had DS-5 capture energy data from my host while it was
running, using this branch (lots of which Andy wrote):
https://code.launchpad.net/~mwhudson/lava-dispatcher/signals/+merge/131128
Currently the output of streamline -report is attached to the test
result as an attribute, which is just awful. Either it should be parsed
into interesting data, or the -report output should be attached to the
test run in a useful way (or both). But it's a start! I'm attaching the
test definition and job file I used.
Cheers,
mwh
Hey Guys,
I just hit a really annoying issue while trying to upgrade control to
our latest lava-dispatcher code.
Everything works great in dogfood and staging. However, I guess the
python version on control is just different enough to cause a problem
with our new use of "configglue". The issue is with our "boot_cmds" that
are set by our device-type .conf files. The faulty snippet is roughly:
string_to_list(boot_cmds)
On a "normal" system, this produces an array of commands. On control we
get an encoding mess that doesn't work with u-boot, e.g.:
['m\x00\x00\x00m\x00\x00\x00c\x00\x00\x00 .......
I think the easiest fix is to change our master.py to call:
string_to_list(boot_cmds.encode('ascii'))
I'm doing another round of unit testing to prove this works before
attempting to deploy.
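For the record, the garbage above looks like the command text re-encoded
as UTF-32. Here's a quick illustration of both the apparent symptom and
the proposed fix (boot_cmds here is just a made-up sample string, and
string_to_list itself is not reproduced):

```python
# Illustration of the symptom and the proposed fix. The command string
# is a made-up sample, not our real boot_cmds config value.
boot_cmds = u"mmc init"

# The mess seen on control looks like the text re-encoded as UTF-32-LE,
# i.e. four bytes per character:
garbled = boot_cmds.encode("utf-32-le")
assert garbled.startswith(b"m\x00\x00\x00m\x00\x00\x00c\x00\x00\x00")

# The proposed fix forces a plain one-byte-per-character ASCII string
# before it gets split into boot commands for u-boot:
fixed = boot_cmds.encode("ascii")
assert fixed == b"mmc init"
```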
For now I've marked all the devices that execute from control as
offline. If the fix takes too long, I'll just revert to the previous
lava deployment.
-andy
It's often difficult to achieve pre-planned hacking goals at Connect, but
Michael and I spent a little time thinking about this topic and wanted
to try and lay out an agenda for LAVA next week.
The general thought process we took for these was:
* Is it beneficial to work on as a group?
* Is it something we'll benefit from even if we only wind up having 20
minutes, and could it also work well if we find the time to work on it
for 2 hours?
With that in mind we are thinking about these items:
= Galaxy Nexus Fastboot Hacking
Have a "learn fastboot" session based on the email thread from last week.
= NI battery simulator
I can show how this works
Zach can try and help with the TCP disconnect issue
= Versatile Express intro hacking
Dave can give us some education on how the VExpress works/boots/etc.
Possibly grab Ryan/Tixy to join.
= Deployment Type Improvements
I think this will roughly be a "get Antonio to teach us Chef" session.
Maybe think about how to get Chef built into open stack image with
cloud-init.
= monitoring – adding app-specific metrics to munin
Michael can talk to us a bit about Munin and how to add new custom
metrics. Then maybe we can hack on adding some, like:
* web/django stuff
* postgres stuff
* job stuff
- num flocks
- device type wait time at various %iles
- device type utilizations
------------
origen02
------------
http://validation.linaro.org/lava-server/scheduler/job/35250
When trying to boot the test image, it went into a panic. I went onto the board and booted it both into master and test images and it was fine. So it looks like it was a random glitch. Back online to re-test.
------------
origen09
------------
http://validation.linaro.org/lava-server/scheduler/job/35134
Dropped into initramfs on test image boot. Booted up test image, and it sat for ages doing recovery on mmcblk0p6, which is testrootfs. Let it complete fsck, then did a reboot. Still recording errors, so replaced sd card with new image. Looks like one of the rare sd card failures. A health check finding a real problem. :)
------------
panda04
------------
http://validation.linaro.org/lava-server/scheduler/job/35095
Android glitch of some sort - home screen problem. Put back online to retest.
--------------------
snowball06/08
--------------------
http://192.168.1.10/lava-server/scheduler/job/35179
eth0 failed to come up. We see this a lot with snowballs. Perhaps there's a known bug in the master image we use (12.02)?
Thanks
Dave