We've seen a significant reduction in health job failures, but I still wanted to send out a report on these so people could see how things are still breaking.
We've had 25 real health failures over the past 2 weeks.
By device type:
6 snowball
1 imx53
1 vexpress
2 beagle
6 origen
9 panda
By failure type:
2 SD cards died: (both on Origen)
7 Serial Console Related:
- 5 connection never established at start of job
- 1 connection dropped during test
- 1 garbage over serial line
10 Network Related:
- 3 network failed to come up
- 3 ping unreachable error
- 3 wget | tar type failure (kind of network we think)
NOTE: A report like this is semi-subjective, given that it's based on my interpretation of the failures. The raw information on this can be found at:
Andy Doan <andy.doan@linaro.org> writes:
> We've seen a significant reduction in health job failures, but I still wanted to send out a report on these so people could see how things are still breaking.
> We've had 25 real health failures over the past 2 weeks.
> By device type:
> 6 snowball
> 1 imx53
> 1 vexpress
> 2 beagle
> 6 origen
> 9 panda
> By failure type:
> 2 SD cards died: (both on Origen)
Yay! That's the sort of problem we are _supposed_ to be finding :)
> 7 Serial Console Related:
> - 5 connection never established at start of job
I'd dearly love to know what's going on here. I could implement a kind of ~exponential back off where we wait 5 seconds, 1 minute, 5 minutes between attempts to reset the port?
> - 1 connection dropped during test
> - 1 garbage over serial line
Not sure what we can do about these in general.
> 10 Network Related:
> - 3 network failed to come up
> - 3 ping unreachable error
> - 3 wget | tar type failure (kind of network we think)
We have a plan around these, at least.
Cheers, mwh
On 07/01/2012 07:03 PM, Michael Hudson-Doyle wrote:
> Andy Doan <andy.doan@linaro.org> writes:
>> We've seen a significant reduction in health job failures, but I still wanted to send out a report on these so people could see how things are still breaking.
>> We've had 25 real health failures over the past 2 weeks.
>> By device type:
>> 6 snowball
>> 1 imx53
>> 1 vexpress
>> 2 beagle
>> 6 origen
>> 9 panda
>> By failure type:
>> 2 SD cards died: (both on Origen)
> Yay! That's the sort of problem we are _supposed_ to be finding :)
>> 7 Serial Console Related:
>> - 5 connection never established at start of job
> I'd dearly love to know what's going on here. I could implement a kind of ~exponential back off where we wait 5 seconds, 1 minute, 5 minutes between attempts to reset the port?
Maybe Dave has some thoughts. I haven't played around enough with that stuff to have a very informed opinion, but that does sound worth trying on the surface.
>> - 1 connection dropped during test
>> - 1 garbage over serial line
> Not sure what we can do about these in general.
Yeah, and if it's that small a number, I'm inclined to wait until they become a higher percentage of our problems.
>> 10 Network Related:
>> - 3 network failed to come up
>> - 3 ping unreachable error
>> - 3 wget | tar type failure (kind of network we think)
> We have a plan around these, at least.
> Cheers, mwh
On 2 Jul 2012, at 17:58, Andy Doan wrote:
> On 07/01/2012 07:03 PM, Michael Hudson-Doyle wrote:
>> Andy Doan <andy.doan@linaro.org> writes:
>>> We've seen a significant reduction in health job failures, but I still wanted to send out a report on these so people could see how things are still breaking.
>>> We've had 25 real health failures over the past 2 weeks.
>>> By device type:
>>> 6 snowball
>>> 1 imx53
>>> 1 vexpress
>>> 2 beagle
>>> 6 origen
>>> 9 panda
>>> By failure type:
>>> 2 SD cards died: (both on Origen)
>> Yay! That's the sort of problem we are _supposed_ to be finding :)
>>> 7 Serial Console Related:
>>> - 5 connection never established at start of job
>> I'd dearly love to know what's going on here. I could implement a kind of ~exponential back off where we wait 5 seconds, 1 minute, 5 minutes between attempts to reset the port?
> Maybe Dave has some thoughts. I haven't played around enough with that stuff to have a very informed opinion, but that does sound worth trying on the surface.
I'm not completely sure what's going on, but I know that essentially when it happens you have to power cycle the board to get back to it. One thing I haven't tried as yet is to reset the serial port when it's stuck like this, but next time it happens I'll play around a bit more, rather than just doing a quick fix to get the board back online.
Dave
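For reference, the back-off idea raised earlier in this thread (reset the port, then wait 5 seconds, 1 minute, 5 minutes between attempts) could be sketched roughly as below. This is only an illustration under assumptions: `connect_with_backoff`, `connect` and `reset_port` are hypothetical names, not existing LAVA functions, and it assumes a failed console connection surfaces as a `ConnectionError`.

```python
import time

# Delays from the suggestion in the thread: 5 seconds, 1 minute, 5 minutes.
RESET_DELAYS_SECONDS = (5, 60, 300)


def connect_with_backoff(connect, reset_port,
                         delays=RESET_DELAYS_SECONDS, sleep=time.sleep):
    """Try connect(); after each failure reset the port, wait, and retry.

    connect and reset_port are hypothetical callables supplied by the caller.
    Returns whatever connect() returns, or re-raises the last ConnectionError
    once every delay has been used up.
    """
    try:
        return connect()
    except ConnectionError as err:
        last_error = err

    for delay in delays:
        reset_port()   # try to un-stick the serial port
        sleep(delay)   # wait longer after each failed attempt
        try:
            return connect()
        except ConnectionError as err:
            last_error = err

    # All retries failed; at this point the board probably needs a power cycle.
    raise last_error
```

The `sleep` parameter is only there so the waiting can be replaced in a test; with the defaults, a board that never responds costs a little over six minutes before the job gives up.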
linaro-validation@lists.linaro.org