On 8 June 2012 06:01, Michael Hudson-Doyle <michael.hudson@linaro.org> wrote:
YongQin Liu <yongqin.liu@linaro.org> writes:

> Hi, All
>
> I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android
> images.
> 4 of them failed because of the network problem:
> 3 on panda01 and 1 on panda06.
> You can see the details here:
> https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYTdDLWZWRVFGaWFxQzRLNWtaNmc#gid=5
>
> For the image file download problem, I guess we can set the retry
> count to 5.

Makes sense.  I think waiting 5 minutes between retries is probably a
touch excessive, so maybe we should scale that down too.

> This time I saw one job on panda01 succeed in downloading the image files on
> the 2nd try.

I've seen this happen a few times too.

I have filed a bug about this.
https://bugs.launchpad.net/lava-dispatcher/+bug/1010285

Do you think setting the wait time to 1 minute is OK?
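
A rough sketch of the retry behaviour being discussed: up to 5 attempts,
with a 60 second wait between them. The function name, its parameters and
the `fetch` callable are hypothetical, made up for illustration; this is not
lava-dispatcher's actual code.

```python
import time


def download_with_retry(fetch, url, retries=5, wait_seconds=60, sleep=time.sleep):
    """Call fetch(url) up to `retries` times, pausing between failed attempts.

    `fetch` is any callable that returns the downloaded data or raises an
    exception on failure (hypothetical; not a lava-dispatcher function).
    `sleep` is injectable so the wait can be skipped when testing.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return fetch(url)
        except Exception as exc:
            last_error = exc
            # No point waiting once the final attempt has failed.
            if attempt < retries:
                sleep(wait_seconds)
    raise RuntimeError(
        "download of %s failed after %d attempts" % (url, retries)
    ) from last_error
```

With these defaults a job that fails every attempt gives up after about
4 minutes of waiting, rather than 5 minutes per retry.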
 
Thanks,
Yongqin Liu
 
> For the network problem, is it related just to specific boards (like
> panda01), or to the entire lab network?
> Does anyone have any thoughts about it?

I don't know what's going on at all.  There is flakiness somewhere, but
I don't know if it's in the panda master kernel, the hardware of some of
the pandas or somewhere in our lab's setup.  It's interesting that it
happens more on particular devices though, which suggests a more
hardware-sided problem (whether with the panda or a loose cable or
something else).

> Another thing I suggest is that we switch to using the panda stable
> images in our health job here.

+1

Cheers,
mwh