Just the one…
------------
panda03
------------
http://validation.linaro.org/lava-server/scheduler/job/38289
Looks like the board locked up just after the startup animation completed. I went onto the board, and it was indeed locked. A hard reset brought it back. Putting it down to a one-off glitch.
Dave
On 11/10/2012 03:17 AM, Dave Pigott wrote:
> Just the one…
> panda03
> http://validation.linaro.org/lava-server/scheduler/job/38289
> Looks like the board locked up just after the startup animation completed. I went onto the board, and it was indeed locked. A hard reset brought it back. Putting it down to a one-off glitch.
Thanks for looking into this. With the new "newline" code, the failure pattern looked different and I wasn't sure what went wrong.
I think we've had this type of failure occur 3 times in the past week on Panda. I think it's becoming our #2 failure reason (we need more days of data to be sure).
On 12 Nov 2012, at 15:33, Andy Doan andy.doan@linaro.org wrote:
> Thanks for looking into this. With the new "newline" code, the failure pattern looked different and I wasn't sure what went wrong.
> I think we've had this type of failure occur 3 times in the past week on Panda. I think it's becoming our #2 failure reason (we need more days of data to be sure).
I suspect you're right, and it underlines what Michael said the other day: we should define a period (week/month) over which we collect stats, give the biggest problem in that period the attention, and iterate until the failures are negligibly small, at which point we move to the next larger sample unit (month/quarter -> quarter/year). The data set is now small enough that it might warrant a monthly cycle, but I'm happy to review each week and see where we are.
Thanks
Dave
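As a rough illustration of the triage cycle described above, the sketch below tallies failure reasons over a chosen period and ranks them, so the top entry is the one that gets the attention. The log entries, the reason strings, and the `rank_failures` helper are all hypothetical and made up for the example; none of it is real data from the lab.

```python
from collections import Counter
from datetime import date

# Hypothetical failure log: (date of failed job, board type, failure reason).
# Reasons and counts are invented for illustration only.
FAILURES = [
    (date(2012, 11, 5), "panda", "network timeout"),
    (date(2012, 11, 5), "panda", "lockup after boot animation"),
    (date(2012, 11, 6), "panda", "network timeout"),
    (date(2012, 11, 7), "panda", "lockup after boot animation"),
    (date(2012, 11, 8), "panda", "network timeout"),
    (date(2012, 11, 9), "panda", "network timeout"),
    (date(2012, 11, 10), "panda", "lockup after boot animation"),
    (date(2012, 11, 11), "panda", "image deployment failed"),
]


def rank_failures(failures, start, end):
    """Return (reason, count) pairs for failures dated within [start, end],
    most frequent first."""
    counts = Counter(
        reason for day, _board, reason in failures if start <= day <= end
    )
    return counts.most_common()


# One-week sample period; widen the dates to move to a month or quarter.
ranking = rank_failures(FAILURES, date(2012, 11, 5), date(2012, 11, 11))
for position, (reason, count) in enumerate(ranking, start=1):
    print(f"#{position}: {reason} ({count})")
```

Changing only the `start` and `end` arguments is enough to switch from a weekly to a monthly review, which keeps the ranking comparable from one period to the next.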
On 11/12/2012 09:44 AM, Dave Pigott wrote:
> I suspect you're right, and it underlines what Michael said the other day: we should define a period (week/month) over which we collect stats, give the biggest problem in that period the attention, and iterate until the failures are negligibly small, at which point we move to the next larger sample unit (month/quarter -> quarter/year). The data set is now small enough that it might warrant a monthly cycle, but I'm happy to review each week and see where we are.
And since we have new people on the team, I keep the stats here:
https://docs.google.com/a/linaro.org/spreadsheet/ccc?key=0AnxpY5uv-BlNdG9zYT...
so it's easy to determine what our issues are.
On 12 Nov 2012, at 15:44, Dave Pigott dpigott@gmail.com wrote:
> I suspect you're right, and it underlines what Michael said the other day: we should define a period (week/month) over which we collect stats, give the biggest problem in that period the attention, and iterate until the failures are negligibly small, at which point we move to the next larger sample unit (month/quarter -> quarter/year). The data set is now small enough that it might warrant a monthly cycle, but I'm happy to review each week and see where we are.
Thoughts (conjecture):
There are two ways in which we don't do a clean boot:
(1) We do a soft reboot
(2) We use the master image u-boot
Both of these could, in theory, contribute to the image locking up.
We can fix (1) easily enough, but until we have a reliable sd-mux solution and/or switch to booting from a USB thumb drive, there's not much we can do about (2). It might be worth running a soak test on staging with a fix for (1) to see whether it brings any improvement.
Thanks
Dave
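One way to judge whether such a soak test shows a real improvement, rather than noise, is to compare lock-up counts between the soft-reboot and hard-reset runs. The sketch below is only an assumption about how that comparison might be done: the job and lock-up counts are invented, the helper names are hypothetical, and the one-sided Fisher exact test is a suggested technique, not something proposed in the thread.

```python
from math import comb


def lockup_rate(lockups, jobs):
    """Fraction of soak-test jobs that ended in a lock-up."""
    return lockups / jobs


def fisher_one_sided(lockups_a, jobs_a, lockups_b, jobs_b):
    """Probability of group B showing lockups_b or fewer lock-ups purely by
    chance, if A and B actually share the same lock-up rate.

    This is the lower tail of the hypergeometric distribution (a one-sided
    Fisher exact test). A small value suggests B is genuinely better.
    """
    total_jobs = jobs_a + jobs_b
    total_lockups = lockups_a + lockups_b
    ways_total = comb(total_jobs, jobs_b)
    ways_tail = sum(
        comb(total_lockups, k) * comb(total_jobs - total_lockups, jobs_b - k)
        for k in range(lockups_b + 1)
    )
    return ways_tail / ways_total


# Invented soak-test results: A = current soft reboot, B = hard reset fix.
soft_jobs, soft_lockups = 200, 6
hard_jobs, hard_lockups = 200, 1

print(f"soft reboot lock-up rate: {lockup_rate(soft_lockups, soft_jobs):.3f}")
print(f"hard reset lock-up rate:  {lockup_rate(hard_lockups, hard_jobs):.3f}")
p_value = fisher_one_sided(soft_lockups, soft_jobs, hard_lockups, hard_jobs)
print(f"chance of a gap this large with no real difference: {p_value:.3f}")
```

With lock-ups this rare, a few hundred jobs per configuration may still leave the result ambiguous, so the soak test would probably need to run long enough to collect more than a handful of failures.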
linaro-validation@lists.linaro.org