On 10/10/2012 08:56 AM, Andrey Konovalov wrote:
Hi Dave,
On 10/10/2012 11:35 AM, Dave Pigott wrote:
Hi all,
I found an interesting health failure today on origen07
http://validation.linaro.org/lava-server/scheduler/job/35016/log_file
When you look at the log, you see that the board starts off at the u-boot prompt. It then tries to do a "reboot", which (obviously) fails. So naturally, it then does a hard reset, and this is where it does something very odd: It interrupts the boot and tries to boot the previously installed test image. I haven't yet looked at the dispatcher code to figure out why (that's my next job).
I'm not sure we can trust anything that occurred in this job after the "deploy_linaro_image is finished with error" message. I think at that point the dispatcher is in an unknown state and doesn't know what it should be sending to the serial console.
In this case, it still tried to do the boot_linaro_image action. However, we didn't successfully deploy an image, so anything going wrong there probably can't be trusted. I would have guessed it would have found the DTB file, but I'm not sure that's worth digging too far into.
I think the real problem we see here is what you and I discussed on IRC earlier. There are certain actions in our job file that, if they fail, should be considered non-recoverable, i.e.:
* if deploy_linaro_image fails, then boot_linaro_image can't run.
* if boot_linaro_image fails, then lava_test_install can't run.
* if lava_test_install fails, well, that's tricky, since it may have installed some of the tests we need but not all.
I'm wondering if we need to spend some time improving how actions relate to one another in code?
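A minimal sketch of that idea in Python, assuming a hypothetical Action record with a `requires` list (this is not the dispatcher's actual API). An action runs only if every action it requires passed; otherwise it is marked "skipped" rather than attempted against a board in an unknown state. The action named submit_results here is purely illustrative, standing in for something with no prerequisites.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Action:
    # Hypothetical action record; not the LAVA dispatcher's real classes.
    name: str
    run: Callable[[], bool]  # returns True on success
    requires: List[str] = field(default_factory=list)


def run_job(actions: List[Action]) -> Dict[str, str]:
    """Run actions in order, skipping any whose prerequisites did not pass."""
    status: Dict[str, str] = {}
    for action in actions:
        # A prerequisite that failed or was itself skipped blocks this action.
        unmet = [r for r in action.requires if status.get(r) != "pass"]
        if unmet:
            status[action.name] = "skipped"
            continue
        try:
            ok = action.run()
        except Exception:
            ok = False
        status[action.name] = "pass" if ok else "fail"
    return status


if __name__ == "__main__":
    job = [
        Action("deploy_linaro_image", lambda: False),
        Action("boot_linaro_image", lambda: True, requires=["deploy_linaro_image"]),
        Action("lava_test_install", lambda: True, requires=["boot_linaro_image"]),
        Action("submit_results", lambda: True),
    ]
    print(run_job(job))
```

With a failed deploy, boot_linaro_image and lava_test_install are both skipped while the independent action still runs. Note this treats every action as all-or-nothing, so it does not address the partial lava_test_install case above.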
What then started alarm bells ringing was that I saw this:
1261680 bytes read reading uInitrd
1532597 bytes read reading board.dtb
** Unable to read "board.dtb" from mmc 0:5 **
So whatever the test image was, it was expecting a device tree blob, which I would have assumed would be installed during deploy_linaro_image(), since if there is one it should just be part of the test boot deployment.