+anmar
On Tue, Oct 16, 2012 at 5:59 PM, Andy Doan <andy.doan@linaro.org> wrote:
On 10/16/2012 02:26 AM, Lee Jones wrote:
On Mon, 15 Oct 2012, Andy Doan wrote:
On 10/15/2012 01:04 PM, Alexander Sack wrote:
>> --------------------
>> snowball06/08
>> --------------------
>> http://192.168.1.10/lava-server/scheduler/job/35179
>>
>> eth0 failed to come up. We see this a lot with snowballs.
"We see this a lot" -- do we have actual numbers? To everyone: assuming not, what can we do to get some?
I keep the log of health check failures at:
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
In the past 5 days it's happened 4 times on snowball.
Prior to that, in a span of 25 health check failures, snowball accounted for 8. Half of those look like this problem, so this snowball issue accounts for around 16% of our health check failures.
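For anyone checking the arithmetic behind the 16% figure, here is a minimal sketch using only the counts quoted above. The variable names are illustrative and not taken from the spreadsheet.

```python
# Counts quoted in the message above (earlier window, before the last 5 days).
total_failures = 25                      # all health check failures
snowball_failures = 8                    # failures on snowball boards
eth0_failures = snowball_failures // 2   # "half of those" look like the eth0 problem

# Shares of all health check failures.
snowball_share = snowball_failures / total_failures
eth0_share = eth0_failures / total_failures

print(f"snowball share of failures: {snowball_share:.0%}")
print(f"eth0 issue share of failures: {eth0_share:.0%}")
```

So 4 of 25 failures match the eth0 problem, which is where "around 16%" comes from.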
So it works sometimes, but not others? Sounds like a h/w bug.
Could be a hw bug, but driver bugs can also give nondeterministic behaviour in full system stacks, from what I've experienced (racy things etc.). Since we are in the software business, I feel we should look closer at the software side before dismissing something as a hw bug ...
How can we nail down the source of this? Maybe we have a kernel that our gut feeling says is better than the 12.02 one, and could give that a stress test?
That's interesting. It happens most often on snowball06, but is not limited to it. I also have records of this failure on:
snowball01, snowball03, snowball08
But 6 of the last 8 failures of this type were on snowball06.
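One possible way to get per-board numbers like these out of a failure log is a simple tally. This is only a sketch: the `(board, cause)` record layout, the `failures_by_board` helper, and the sample rows are all hypothetical and are not the contents or format of the real spreadsheet.

```python
from collections import Counter


def failures_by_board(records, cause):
    """Count failures with the given cause per board, most frequent first."""
    counts = Counter(board for board, c in records if c == cause)
    return counts.most_common()


# Made-up rows for illustration only; NOT taken from the real log.
sample = [
    ("snowball06", "eth0"),
    ("snowball06", "eth0"),
    ("snowball01", "eth0"),
    ("snowball03", "other"),
]

print(failures_by_board(sample, "eth0"))
```

Run against an export of the real log, a tally like this would show whether one board really dominates or whether the failures are spread across the farm.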
Are there logs?
Here's the normal way it fails:
http://validation.linaro.org/lava-server/scheduler/job/35517/log_file