Hi all,
Two health check failures last night, which means we're averaging approximately one failure per day, or a pass rate of about 95%. Not great.
------------ panda04 ------------ http://validation.linaro.org/lava-server/scheduler/job/39484
When it got into the test image, the device was spewing out lots of weird error messages. I went onto the board and rebooted the test image: same problem. The shell prompt was also corrupted. It wasn't clear whether this was a board, SD card, or corrupt image deployment problem, so I booted the master image, and that seems fine. Putting it back online to see if it was a one-off corruption.
If the board passes this time, then the only way to fix this problem would be to set things up so that if anything fails in the test image, we go round and do it all again, including deployment, because rebooting the test image alone wouldn't have worked.
------------ panda06 ------------ http://validation.linaro.org/lava-server/scheduler/job/39477
wget weirdness. It kept getting "Connection reset by peer" and then retrying. Putting it back online to see if it's a one-off glitch.
If the board passes this time, then the way to fix the problem is to reboot to the master image and try again whenever deployment fails.
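A minimal sketch of the retry behaviour suggested for both boards, assuming the health check can be split into three steps. The names `run_health_check`, `boot_master`, `deploy`, `boot_test_and_run`, `DeployError` and `ImageBootError` are hypothetical and are not taken from the LAVA code base; the point is only that any failure sends the board back to the master image for a full redeploy, rather than just rebooting the test image.

```python
from typing import Callable, Optional


class DeployError(Exception):
    """Hypothetical: deployment failed (e.g. wget 'Connection reset by peer')."""


class ImageBootError(Exception):
    """Hypothetical: the test image booted badly or the tests could not run."""


def run_health_check(
    boot_master: Callable[[], None],
    deploy: Callable[[], None],
    boot_test_and_run: Callable[[], None],
    max_attempts: int = 3,
) -> int:
    """Run the whole cycle, restarting from the master image on any failure.

    Returns the number of attempts used. Re-raises the last error if every
    attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            boot_master()        # always start from the known-good image
            deploy()             # redeploy every time; never reuse a suspect image
            boot_test_and_run()  # boot the freshly deployed test image
            return attempt
        except (DeployError, ImageBootError) as err:
            last_error = err     # remember the failure and go round again
    assert last_error is not None
    raise last_error
```

A one-off glitch on the first attempt would then cost one extra deployment instead of a failed health check, while a board that is really broken still fails after `max_attempts` tries.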
Thanks
Dave
On 11/21/2012 02:38 AM, Dave Pigott wrote:
panda04
http://validation.linaro.org/lava-server/scheduler/job/39484
Specifically the kernel failed with:
[ 16.490295] =====================================
[ 16.506622] [ BUG: bad unlock balance detected! ]
[ 16.517089] -------------------------------------
[ 16.527618] syslink_daemon/1141 is trying to release lock (gatemp_module->gate_mutex) at:
[ 16.541992] [<c0743b40>] mutex_unlock+0x18/0x1c
[ 16.552520] but there are no more locks to release!
[ 16.563262]
[ 16.563262] other info that might help us debug this:
[ 16.581390] no locks held by syslink_daemon/1141.
[ 16.592010]
[ 16.592041] stack backtrace:
[ 16.607757] [<c001b5f0>] (unwind_backtrace+0x0/0xec) from [<c0720fc8>] (dump_stack+0x20/0x24)
[ 16.622558] [<c0720fc8>] (dump_stack+0x20/0x24) from [<c072452c>] (print_unlock_inbalance_bug.part.19+0x84/0xac)
[ 16.639373] [<c072452c>] (print_unlock_inbalance_bug.part.19+0x84/0xac) from [<c0094840>] (print_unlock_inbalance_bug+0x44/0x50)
[ 16.663909] [<c0094840>] (print_unlock_inbalance_bug+0x44/0x50) from [<c0097388>] (__lock_release+0x9c/0xd8)
[ 16.680877] [<c0097388>] (__lock_release+0x9c/0xd8) from [<c00974a0>] (lock_release+0xdc/0x100)
[ 16.696624] [<c00974a0>] (lock_release+0xdc/0x100) from [<c0743a94>] (__mutex_unlock_slowpath+0xf8/0x18c)
[ 16.713470] [<c0743a94>] (__mutex_unlock_slowpath+0xf8/0x18c) from [<c0743b40>] (mutex_unlock+0x18/0x1c)
[ 16.730285] [<c0743b40>] (mutex_unlock+0x18/0x1c) from [<c0537f0c>] (gatehwspinlock_leave+0xc0/0xe0)
[ 16.746795] [<c0537f0c>] (gatehwspinlock_leave+0xc0/0xe0) from [<c0538ea8>] (gatemp_leave+0x28/0x2c)
[ 16.763488] [<c0538ea8>] (gatemp_leave+0x28/0x2c) from [<c053b21c>] (heapmemmp_alloc+0x1b8/0x200)
[ 16.779937] [<c053b21c>] (heapmemmp_alloc+0x1b8/0x200) from [<c053c6e4>] (heapmemmp_ioctl+0x13c/0x4f0)
[ 16.796722] [<c053c6e4>] (heapmemmp_ioctl+0x13c/0x4f0) from [<c0544dcc>] (ipc_ioc_router+0x164/0x194)
[ 16.813171] [<c0544dcc>] (ipc_ioc_router+0x164/0x194) from [<c0544e64>] (ipc_ioctl+0x68/0x74)
[ 16.828765] [<c0544e64>] (ipc_ioctl+0x68/0x74) from [<c0127af4>] (do_vfs_ioctl+0x2a8/0x2e4)
[ 16.844085] [<c0127af4>] (do_vfs_ioctl+0x2a8/0x2e4) from [<c0127b90>] (sys_ioctl+0x60/0x84)
[ 16.859344] [<c0127b90>] (sys_ioctl+0x60/0x84) from [<c00130e0>] (ret_fast_syscall+0x0/0x3c)
This doesn't seem to happen very often, but I'm keeping notes just in case.
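For note-keeping, something like the following could pull the BUG banner out of a saved serial log so that occurrences can be counted over time. This is only a sketch: `find_kernel_bugs` is a hypothetical helper, and the pattern assumes the banner always has the `[ BUG: ... ]` shape seen in the trace above.

```python
import re
from typing import List, Tuple

# Matches e.g. "[ 16.506622] [ BUG: bad unlock balance detected! ]"
BUG_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+\[ BUG: (.+?) \]")


def find_kernel_bugs(log_text: str) -> List[Tuple[float, str]]:
    """Return (timestamp, message) for every kernel BUG banner in the log."""
    return [(float(m.group(1)), m.group(2)) for m in BUG_RE.finditer(log_text)]
```

Because it searches the whole text rather than going line by line, it also works on logs where the line breaks have been lost.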
linaro-validation@lists.linaro.org