Hi, All
I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android images. 4 of them failed because of network problems: 3 on panda01 and 1 on panda06. You can see the details here: https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
For the image download problem, I guess we can set the retry number to 5. This time I saw one job on panda01 succeed in downloading the image files on the 2nd try.
For the network problem, is it related only to specific boards (like panda01), or to the entire lab network? Does anyone have any thoughts about it?
Another thing I suggest is that we switch to the panda stable images in our health job here.
Thanks, Yongqin Liu
On Thu, Jun 7, 2012 at 9:20 AM, YongQin Liu yongqin.liu@linaro.org wrote:
Hi, All
I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android images. 4 of them failed because of network problems: 3 on panda01 and 1 on panda06. You can see the details here: https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
For the image download problem, I guess we can set the retry number to 5. This time I saw one job on panda01 succeed in downloading the image files on the 2nd try.
For the network problem, is it related only to specific boards (like panda01), or to the entire lab network? Does anyone have any thoughts about it?
Interesting find! Is it easy to add a retry feature?
What kernel/image are we using as master image? Are we using the same for _all_ panda boards?
Otherwise, Dave might want to take a look at what's different. Maybe it's a different SoC rev etc.?
Another thing I suggest is that we switch to the panda stable images in our health job here.
You want to move our panda health jobs to use the build you tested (panda-ics-gcc47-tilt-stable-blob#18)?
On 7 June 2012 15:33, Alexander Sack asac@linaro.org wrote:
On Thu, Jun 7, 2012 at 9:20 AM, YongQin Liu yongqin.liu@linaro.org wrote:
Hi, All
I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android images. 4 of them failed because of network problems: 3 on panda01 and 1 on panda06. You can see the details here: https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
For the image download problem, I guess we can set the retry number to 5. This time I saw one job on panda01 succeed in downloading the image files on the 2nd try.
For the network problem, is it related only to specific boards (like panda01), or to the entire lab network? Does anyone have any thoughts about it?
Interesting find! Is it easy to add a retry feature?
What kernel/image are we using as master image? Are we using the same for _all_ panda boards?
Otherwise, Dave might want to take a look at what's different. Maybe it's a different SoC rev etc.?
Yeah, we can ask Dave about the information later.
We can also wait a week to see the trend. If the same error keeps occurring, maybe we can find the reason by then.
Thanks, Yongqin Liu
Another thing I suggest is that we switch to the panda stable images in our health job here.
You want to move our panda health jobs to use the build you tested (panda-ics-gcc47-tilt-stable-blob#18)?
-- Alexander Sack Technical Director, Linaro Platform Teams http://www.linaro.org | Open source software for ARM SoCs http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
On 06/07/2012 02:20 AM, YongQin Liu wrote:
I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android images. 4 of them failed because of network problems: 3 on panda01 and 1 on panda06. You can see the details here: https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
For the image download problem, I guess we can set the retry number to 5. This time I saw one job on panda01 succeed in downloading the image files on the 2nd try.
For the network problem, is it related only to specific boards (like panda01), or to the entire lab network? Does anyone have any thoughts about it?
Another thing I suggest is that we switch to the panda stable images in our health job here.
I'm more than ready to update our health job to use this image. At a minimum it's an improvement on what we have now. Last we talked, you were hesitant to make the change. However, it sounds like you feel good about it now.
Please bring this up in our meeting later today (if I don't). I'd like to make the change as soon as possible.
-andy
YongQin Liu yongqin.liu@linaro.org writes:
Hi, All
I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android images. 4 of them failed because of network problems: 3 on panda01 and 1 on panda06. You can see the details here: https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
For the image download problem, I guess we can set the retry number to 5.
Makes sense. I think waiting 5 minutes between retries is probably a touch excessive too; maybe we should scale that down as well.
This time I saw one job on panda01 succeed in downloading the image files on the 2nd try.
I've seen this happen a few times too.
For the network problem, is it related only to specific boards (like panda01), or to the entire lab network? Does anyone have any thoughts about it?
I don't know what's going on at all. There is flakiness somewhere, but I don't know if it's in the panda master kernel, the hardware of some of the pandas, or somewhere in our lab's setup. It's interesting that it happens more on particular devices though, which suggests a more hardware-sided problem (whether with the panda or a loose cable or something else).
Another thing I suggest is that we switch to the panda stable images in our health job here.
+1
Cheers, mwh
On 8 June 2012 06:01, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
YongQin Liu yongqin.liu@linaro.org writes:
Hi, All
I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android images. 4 of them failed because of network problems: 3 on panda01 and 1 on panda06. You can see the details here: https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
For the image download problem, I guess we can set the retry number to 5.
Makes sense. I think waiting 5 minutes between retries is probably a touch excessive too; maybe we should scale that down as well.
This time I saw one job on panda01 succeed in downloading the image files on the 2nd try.
I've seen this happen a few times too.
I have filed a bug about this. https://bugs.launchpad.net/lava-dispatcher/+bug/1010285
Do you think setting the wait time to 1 minute is OK?
Thanks, Yongqin Liu
For the network problem, is it related only to specific boards (like panda01), or to the entire lab network? Does anyone have any thoughts about it?
I don't know what's going on at all. There is flakiness somewhere, but I don't know if it's in the panda master kernel, the hardware of some of the pandas, or somewhere in our lab's setup. It's interesting that it happens more on particular devices though, which suggests a more hardware-sided problem (whether with the panda or a loose cable or something else).
Another thing I suggest is that we switch to the panda stable images in our health job here.
+1
Cheers, mwh
YongQin Liu yongqin.liu@linaro.org writes:
On 8 June 2012 06:01, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
YongQin Liu yongqin.liu@linaro.org writes:
Hi, All
I just submitted 100 jobs with panda-ics-gcc47-tilt-stable-blob#18 android images. 4 of them failed because of network problems: 3 on panda01 and 1 on panda06. You can see the details here: https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
For the image download problem, I guess we can set the retry number to 5.
Makes sense. I think waiting 5 minutes between retries is probably a touch excessive too; maybe we should scale that down as well.
This time I saw one job on panda01 succeed in downloading the image files on the 2nd try.
I've seen this happen a few times too.
I have filed a bug about this. https://bugs.launchpad.net/lava-dispatcher/+bug/1010285
Thanks.
Do you think setting the wait time to 1 minute is OK?
Yeah, that sounds about right.
Cheers, mwh
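The retry settings discussed in this thread (up to 5 attempts, with about 1 minute between them instead of 5) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual lava-dispatcher code: the function name `download_with_retries`, the `fetch` callable, and the parameter names are all hypothetical.

```python
import time


def download_with_retries(fetch, url, max_attempts=5, wait_seconds=60,
                          sleep=time.sleep):
    """Call fetch(url) until it succeeds or max_attempts is reached.

    `fetch` is a hypothetical callable that returns the downloaded data
    or raises an exception on failure. It stands in for whatever the
    dispatcher really uses to download image files.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception as error:  # real code would catch narrower errors
            last_error = error
            if attempt < max_attempts:
                # Wait before the next attempt, but not after the last one.
                sleep(wait_seconds)
    raise RuntimeError(
        "download of %s failed after %d attempts" % (url, max_attempts)
    ) from last_error
```

With these defaults, a download that fails once and then succeeds (as seen on panda01) costs one 60-second wait, and a download that never succeeds gives up after four waits rather than blocking the job indefinitely. Passing `sleep` as a parameter is just a convenience so the behaviour can be checked without actually waiting.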
linaro-validation@lists.linaro.org