Re: [Linaro-validation] Lab health checks

19 Sep 2012


We need a better way to investigate what's going on.
Are we using dhclient, or how else are we getting the IP?
Anyway, I feel that to nail this down we first need to make it reproducible.
For that, having a "(re-)connect to eth on Ubuntu" test case in our
enablement suite that we just run 10000 times in one test job without
rebooting, to see if it breaks, might do it...
Maybe someone from Ricardo's team or Naresh/Soumya (all three CCed)
can help write a simple lava-test that does that and integrate it into
our enablement test suite for lava-test? Once we can force reproduction
we have a better chance of nailing this down, and we can make sure the
priority is there to fix this in the kernel etc.
Note: I'm not saying this has to happen now, but we need to improve how we
deal with rarely occurring issues so that we actually nail them down.
Otherwise we will never reach 99.999% health job success :).
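
To make the idea concrete, here is a rough sketch of what such a reconnect loop could look like. It is not an existing lava-test, and everything named in it is an assumption: the interface name eth0, the use of ifdown/ifup to bounce the link (the image may well do this differently, which is part of the open dhclient question), and the helper names themselves are hypothetical.

```python
import subprocess
from typing import Callable, List


def run_command(argv: List[str], timeout: float = 60.0) -> bool:
    """Return True if the command exits with status 0 within the timeout."""
    try:
        return subprocess.run(argv, timeout=timeout).returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hung command counts as a failure.
        return False


def reconnect_eth(iface: str = "eth0") -> bool:
    """Bounce the interface. Assumes ifupdown is what the image uses."""
    return run_command(["ifdown", iface]) and run_command(["ifup", iface])


def ping_host(host: str) -> bool:
    """Send one ping and wait up to 5 seconds for the reply."""
    return run_command(["ping", "-c", "1", "-W", "5", host])


def stress_reconnect(
    iterations: int,
    reconnect: Callable[[], bool],
    check: Callable[[], bool],
) -> List[int]:
    """Reconnect and check connectivity `iterations` times without rebooting.

    Returns the indices of the iterations where either step failed.
    """
    failed = []
    for i in range(iterations):
        if not (reconnect() and check()):
            failed.append(i)
    return failed
```

On a board this would be called as something like stress_reconnect(10000, reconnect_eth, lambda: ping_host(control_host)), where control_host is a placeholder for whatever address the health job pings today. The reconnect and check steps are passed in as callables so that the same loop can be reused once we know how the master image really brings up its network.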
On Wed, Sep 19, 2012 at 5:18 PM, Andy Doan andy.doan@linaro.org wrote:
...
On 09/19/2012 06:01 AM, Alexander Sack wrote:
...
Do we know why we have regular networking issues on master images
still? Can we have an effort to nail this down? How can we do that?
We attempted to add some debugging in the past, but so far nothing has
helped much. The biggest problem we have now is repeatability of this issue.
If you ignore the TC2 failures which are skewing the results a little now,
we have about a 5% failure rate (for a 2 week period we actually had 1%!).
Of that 5%, 50% are network related:
 pinging control fails
 downloading *.tgz in master fails
So out of 100 runs we get about 2 "wget" type failures and 2 "ping" type
failures. Regardless of how small the number is, it's _half_ of our issues,
so we do get a good bang for our buck by improving it.
-- 
Alexander Sack
Technical Director, Linaro Platform Teams
http://www.linaro.org | Open source software for ARM SoCs
http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
