YongQin Liu yongqin.liu@linaro.org writes:
Hi, All
I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android images. 4 of them failed because of network problems: 3 on panda01 and 1 on panda06. You can see the details here: https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
For the download problem with the image files, I guess we can set the retry count to 5.
Makes sense. I think waiting 5 minutes between retries is probably a touch excessive as well, so maybe we should scale that down too.
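To make the retry idea concrete, here is a rough sketch of what a download retry loop with 5 attempts and a shorter wait might look like. It is not the dispatcher's actual code: the function name `download_with_retries`, the `fetch` callable, and the 60 second wait are all assumptions picked for illustration (the only thing settled above is that 5 minutes seems too long).

```python
import time


def download_with_retries(fetch, url, max_attempts=5, wait_seconds=60,
                          sleep=time.sleep):
    """Call fetch(url) until it succeeds or max_attempts is used up.

    fetch is any callable that returns the downloaded result or raises
    IOError on a network failure (hypothetical interface). wait_seconds=60
    is an assumed value, not a measured one.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except IOError as exc:
            last_error = exc
            # Only wait if another attempt is still to come.
            if attempt < max_attempts:
                sleep(wait_seconds)
    # Every attempt failed: surface the last network error to the caller.
    raise last_error
```

Passing `sleep` in as a parameter keeps the wait easy to tune and lets the loop be exercised without actually pausing.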
This time I saw one job on panda01 succeed in downloading the image files on the 2nd try.
I've seen this happen a few times too.
For the network problem, is it related only to specific boards (like panda01), or to the entire lab network? Does anyone have any thoughts about it?
I don't know what's going on at all. There is flakiness somewhere, but I don't know if it's in the panda master kernel, the hardware of some of the pandas or somewhere in our lab's setup. It's interesting that it happens more on particular devices though, which suggests a more hardware-sided problem (whether with the panda itself, a loose cable or something else).
Another thing I suggest is that we switch our health job here to use the panda stable images.
+1
Cheers, mwh