Re: [Linaro-validation] Health check failures

16 Oct 2012


      On 16 Oct 2012, at 17:22, Alexander Sack asac@linaro.org wrote:
...
+anmar
On Tue, Oct 16, 2012 at 5:59 PM, Andy Doan andy.doan@linaro.org wrote:
...
On 10/16/2012 02:26 AM, Lee Jones wrote:
...
On Mon, 15 Oct 2012, Andy Doan wrote:
...
On 10/15/2012 01:04 PM, Alexander Sack wrote:
...
...
>>> 
>>> --------------------
>>> snowball06/08
>>> --------------------
>>> http://192.168.1.10/lava-server/scheduler/job/35179
>>> 
>>> eth0 failed to come up. We see this a lot with snowballs.
> 
> 
> "We see this a lot" -- do we have actual numbers?  To everyone:
> assuming not, what can we do to get some?
I keep the log of health check failures at:
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
In the past 5 days it's happened 4 times on snowball.
Prior to that, in a span of 25 health check failures, snowball accounted
for 8 of the failures. Half of those failures look like this
problem, so this snowball issue accounts for around 16% of our
health check failures.
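A quick sanity check of the 16% figure, using only the counts quoted above (25 health check failures, 8 on snowball, about half of those the eth0 problem). The function name is made up for illustration; it is not part of any LAVA tooling.

```python
def failure_share(total_failures: int, board_failures: int, fraction_this_issue: float) -> float:
    """Hypothetical helper: share of all health check failures caused by one issue."""
    # Failures on this board type that look like this particular problem.
    issue_failures = board_failures * fraction_this_issue
    return issue_failures / total_failures

# Counts from the thread: 25 failures, 8 on snowball, half of them eth0 not coming up.
share = failure_share(25, 8, 0.5)
print(f"{share:.0%}")  # prints "16%"
```

That is 4 of 25 failures; with counts this small, one or two extra incidents would move the percentage noticeably.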
So it works sometimes, but not others? Sounds like a h/w bug.
Could be a h/w bug, but in my experience driver bugs can also cause
nondeterministic behaviour in full system stacks (races etc.). Since
we are in the software business, I feel we should look closer at the
software side before dismissing something as a h/w bug ...
How can we pin down the source of this? Maybe there is a kernel that
we have a gut feeling is better than 12.02, and we could give that a
stress test?
Idea for a plan: we take snowball06, run loop tests on 12.{03-09} for a few days, and see whether any one of them behaves better than the others.
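One way the loop-test results could be compared is sketched below. It assumes each run is recorded as a (kernel_release, eth0_came_up) pair; that record format and the function names are hypothetical, not an existing LAVA report. With only a few days of runs per kernel, the raw failure rates can be misleading, so the sketch also attaches a 95% Wilson score interval to each rate.

```python
from collections import defaultdict
from math import sqrt

def wilson_interval(failures: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure rate (z=1.96 gives roughly 95%)."""
    if runs == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p = failures / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    half = z * sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def summarize(results):
    """results: iterable of (kernel_release, eth0_came_up) pairs, one per loop run.

    Returns (kernel, failures, runs, low, high) rows, lowest observed failure rate first.
    """
    runs = defaultdict(int)
    failures = defaultdict(int)
    for kernel, ok in results:
        runs[kernel] += 1
        if not ok:
            failures[kernel] += 1
    summary = []
    for kernel in sorted(runs):
        low, high = wilson_interval(failures[kernel], runs[kernel])
        summary.append((kernel, failures[kernel], runs[kernel], low, high))
    # Stable sort keeps kernels with equal rates in release order.
    summary.sort(key=lambda row: row[1] / row[2])
    return summary
```

If the intervals for two kernels overlap heavily, the runs so far cannot tell them apart, and the loop should simply run longer before drawing conclusions.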
Dave
