I was just doing a manual update of staging and saw that it broke
because the dashboard code now requires a newer version of
linaro-dashboard-bundle. I doubt we've packaged that new version and
deployed it, so to address it for now I created a local branch for that
module as well.
This is more of a note that we'll need to remember to make appropriate
updates (lava-manifests ...) when we do our next roll-out of dashboard
changes.
Hi James,
Added the validation list to your e-mail for future reference. :)
Dave
On 8 Nov 2012, at 14:45, James Tunnicliffe <james.tunnicliffe(a)linaro.org> wrote:
> (sorry Dave, validation(a)linaro.org bounced. Not sure what your team's
> address is - please forward)
>
> http://askubuntu.com/questions/211603/problems-with-nat-adapater-since-upgr…
>
> This got me. NAT doesn't work in 12.10, but the proposed fix works:
>
> VBoxManage modifyvm "name" --natdnshostresolver1 on
> VBoxManage modifyvm "name" --natdnsproxy1 on
>
> Not sure if it is worth mentioning in the docs or not.
>
> --
> James Tunnicliffe
Hey Guys,
I'm doing a final test on panda24 before deploying a new dispatcher in
production. This means it's no longer in the looping health job mode we
were doing.
-andy
http://staging.validation.linaro.org/scheduler/job/35505/log_file
This is a bit odd. It got confused when we were in the u-boot prompt while trying to boot up the android test image. It may be some code flaw, though I can't see what, other than it took 5 minutes to get from reboot to that point. Perhaps this calls for a similar approach to booting test images, i.e. if it fails, try a couple more times. May be an edge case, but would put our reliability way up. Along with that, we may have to up the timeouts for booting, given we might do it 3 times.
Thoughts?
Dave
On 5 November 2012 17:35, David Long wrote:
> It looks to me like our TI 3.4 nightly LAVA runs are broken for all test
> cases (not just the TILT stuff). Is someone investigating this?
I'm looking at the ci-LT-TI-working-tree-3_4 stream.
It seems the test has failed to run since 2012-10-24.
lava-test doesn't install successfully:
OperationFailed: executing u'apt-get -o
Acquire::http::proxy=http://192.168.1.10:3128/ update' failed with
code 100
I see other errors on other streams since the same date...
OK. We've had 3 failures out of 250 in 24 hours, a 98.8% pass rate. That's better than the 16 or so failures we were getting, but still...
----------------
snowball10
----------------
http://staging.validation.linaro.org/scheduler/job/35883
Test image kernel died on boot.
----------------
beaglexm04
----------------
http://staging.validation.linaro.org/scheduler/job/35901
and
http://staging.validation.linaro.org/scheduler/job/35838
Corrupted serial output in test image. Michael's new code caught it, but it was so corrupted it still failed. Next check went ok.
Conclusion: All 3 of these could have been fixed by attempting a reboot of the test image, just like we're now doing on the master.
Of course, one could take the view that the corrupted serial line means a broken board, but given that it came back and ran consistently afterwards, it looks like we just hit some edge case.
I'll file a bug. I strongly believe that we are close to achieving our 99.9%+ target set by Alexander.
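
For reference, the arithmetic behind those figures can be sketched as below. This is only an illustration: the function names are made up for the example, and the 99.9% figure is the target quoted above.

```python
def success_rate(failures, total):
    """Return the percentage of jobs that did not fail."""
    return 100.0 * (total - failures) / total


def meets_target(failures, total, target_percent=99.9):
    """True if the pass rate reaches the target (99.9% is the quoted goal)."""
    return success_rate(failures, total) >= target_percent


if __name__ == "__main__":
    # 3 failures out of 250 jobs, as reported above.
    print("pass rate: %.1f%%" % success_rate(3, 250))
    print("meets 99.9%% target: %s" % meets_target(3, 250))
```

At 250 jobs a day, a 99.9% pass rate leaves room for less than one failure per day, so even a single failed job puts that day below the target.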
Thanks
Dave
Hi all,
I've just checked in and deployed a more general fix for the network failures on staging, and it's running through its paces now. Essentially, the change now tries to reboot three times if *anything* goes wrong, not just the network coming up. Looking back at the failures of the last 24 hours (2 out of 158) both of those would have been fixed by my change.
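
A minimal sketch of the retry idea described above, not the actual dispatcher code. `boot_once`, `BootFailed` and the default timeout are hypothetical names and values chosen for the illustration; the only things taken from the description are "three attempts" and "retry on any failure".

```python
class BootFailed(Exception):
    """Raised when every boot attempt has failed (hypothetical name)."""


def boot_with_retries(boot_once, attempts=3, timeout_per_attempt=300):
    """Call boot_once(timeout) until it succeeds or attempts run out.

    boot_once is any callable that boots the test image and raises an
    exception if *anything* goes wrong, not just the network check.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return boot_once(timeout_per_attempt)
        except Exception as exc:  # deliberately broad: retry on any failure
            last_error = exc
            print("boot attempt %d/%d failed: %s" % (attempt, attempts, exc))
    raise BootFailed("gave up after %d attempts: %s" % (attempts, last_error))


def worst_case_boot_time(attempts=3, timeout_per_attempt=300):
    """Upper bound on time spent booting if every attempt times out."""
    return attempts * timeout_per_attempt
```

With these assumed defaults the worst case is three full timeouts back to back, so any overall job timeout needs to allow for that.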
We'll know how well we've done in 24 hours' time.
Thanks
Dave
Although munin is working out fine for us currently, an option I'd
prefer in some ways is graphite. This document:
http://www.aosabook.org/en/graphite.html
explains how graphite works in a way that made way more sense to me
than anything else I'd read.
Cheers,
mwh
Hi,
I made code changes to support YAML-based testdefs. I shall show you a
demo tomorrow of what I have, and we can discuss the YAML structure in
more detail and get more stuff in.
My sample testdef looks like the following (testdef.yaml):
<snip1>
metadata:
    name: simple
    version: 1.0
    format: lava-test v1.0
environment:
    image-type: [beagle]
install:
    url:
    steps:
run:
    steps:
        - /bin/echo cache-coherency-switching - PASS
        - ls
        - pwd
parse:
    pattern: (?P<test_case_id>.*-*)\\s+:\\s+(?P<result>(PASS|FAIL))
    fixupdict:
        PASS: pass
        FAIL: fail
</snip1>
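
To illustrate how a `parse` section like this could be applied to test output, here is a minimal sketch. It is not the lava-test implementation: `parse_line` is a hypothetical helper, and it assumes the doubled backslashes in the pattern are meant to end up as single regex escapes (`\s`).

```python
import re

# Pattern from the testdef, assuming "\\s" is meant as the regex escape "\s".
PATTERN = re.compile(r"(?P<test_case_id>.*-*)\s+:\s+(?P<result>(PASS|FAIL))")

# Maps the raw result strings onto the normalised result names.
FIXUPDICT = {"PASS": "pass", "FAIL": "fail"}


def parse_line(line):
    """Return {'test_case_id': ..., 'result': ...} or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    data = match.groupdict()
    # Apply the fixup; unknown results are passed through unchanged.
    data["result"] = FIXUPDICT.get(data["result"], data["result"])
    return data


if __name__ == "__main__":
    print(parse_line("cache-coherency-switching : PASS"))
    print(parse_line("cache-coherency-switching - PASS"))
```

Under this reading the pattern expects a colon between the test case id and the result, so a line that uses " - " as the separator returns None.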
The following is a snippet from a sample run:
<snip2>
cache-coherency-switching - PASS
install.sh
run.sh
testdef.yaml
/lava/tests/0_simple
<LAVA_TEST_RUNNER>: 0_simple exited with: 0
0_simple-1351674922 build.txt cpuinfo.txt meminfo.txt pkgs.txt
<LAVA_TEST_RUNNER>: exiting<LAVA_DISPATCHER>2012-10-31 02:44:54 PM INFO:
lava_test_shell seems to have completed
<LAVA_DISPATCHER>2012-10-31 02:44:54 PM INFO: attempting a filesystem
sync before power_off
linaro-test [rc=0]# sync
sync
linaro-test [rc=0]# <LAVA_DISPATCHER>2012-10-31 02:44:57 PM INFO:
[ACTION-E] lava_test_shell is finished successfully.
<LAVA_DISPATCHER>2012-10-31 02:44:57 PM INFO: Submitting the test result
with parameters = {u'stream': u'/anonymous/stylesen/', u'server':
u'http://10.155.13.219/RPC2/'}
dashboard-put-result:
http://10.155.13.219/dashboard/permalink/bundle/9fdccd73c7e825c2eec7850e61d…
<LAVA_DISPATCHER>2012-10-31 02:44:57 PM INFO: Dashboard :
http://10.155.13.219/dashboard/permalink/bundle/9fdccd73c7e825c2eec7850e61d…
</snip2>
Thank You.
--
Senthil Kumaran S
http://www.stylesen.org/
http://www.sasenthilkumaran.com/