All on snowballs this time. Some have been stuck for 3 days.
---------------- snowball02 ----------------
http://validation.linaro.org/lava-server/scheduler/job/40045
The usual story. Pushed bundle, and then stuck. Did a cancel and kill -2.
---------------- snowball03 ----------------
http://validation.linaro.org/lava-server/scheduler/job/40221
Bit of an odd one. Running a health check, but it never even started executing. Did a cancel, but then couldn't find a process on control to match it. Went into admin and forced the board offline, then back online, and it started running a health check. Trouble is, it's still showing up as a long-running job. :-/
---------------- snowball04 ----------------
http://validation.linaro.org/lava-server/scheduler/job/40219/log_file
Same story as snowball03, 07 and 08.
---------------- snowball05 ----------------
http://validation.linaro.org/lava-server/scheduler/job/39987
The usual story. Pushed bundle, and then stuck. Did a cancel and kill -2.
---------------- snowball06 ----------------
http://validation.linaro.org/lava-server/scheduler/job/40079
The usual story. Pushed bundle, and then stuck. Did a cancel and kill -2.
---------------- snowball07 ----------------
http://validation.linaro.org/lava-server/scheduler/job/40220
Same story as snowball03, 04 and 08.
---------------- snowball08 ----------------
http://validation.linaro.org/lava-server/scheduler/job/40218
Same story as snowball03, 04 and 07.
Oh, and to add insult to injury, this has skewed the reports. It now thinks the jobs that I killed with -2 failed their health checks. :(
Dave
On 28 Nov 2012, at 11:36, Dave Pigott dave.pigott@linaro.org wrote:
Dave Pigott dave.pigott@linaro.org writes:
All on snowballs this time. Some have been stuck for 3 days.
snowball02
http://validation.linaro.org/lava-server/scheduler/job/40045
The usual story. Pushed bundle, and then stuck. Did a cancel and kill -2
You don't need to cancel the jobs -- just kill -2 them.
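For reference, kill -2 sends signal number 2, which is SIGINT on Linux. A minimal Python sketch of the same action, assuming you already know the PID of the stuck job's process (the helper name interrupt_job is made up for illustration and is not part of LAVA):

```python
import os
import signal


def interrupt_job(pid):
    """Send SIGINT (signal 2) to pid, the same as `kill -2 <pid>` in a shell."""
    os.kill(pid, signal.SIGINT)
```

os.kill raises ProcessLookupError if no process with that PID exists, which is roughly what "couldn't find a process to match it" would look like from a script.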
This is https://bugs.launchpad.net/lava-scheduler/+bug/1043059 -- I'm completely stumped on it, unfortunately. What happens is that we have an extra fd open between the scheduler monitor process and the dispatcher, and the monitor code is never told that this extra fd has closed, so it doesn't think the dispatcher has exited. Probably we should just give up on this --oob-fd cuteness and parse the dashboard-put-result: message out of stdout.
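A rough sketch of what parsing the result out of stdout might look like. This is not the scheduler's actual code: the exact format of the dashboard-put-result: line is not given in this thread, so the sketch simply assumes the value is whatever follows the prefix on that line. The function names and the fake child command are made up for illustration.

```python
import subprocess
import sys

# Assumed marker. The exact line format the dispatcher prints is a guess here.
PREFIX = "dashboard-put-result:"


def find_put_result(lines):
    """Return the text after the last PREFIX line, or None if there is none."""
    result = None
    for line in lines:
        line = line.strip()
        if line.startswith(PREFIX):
            result = line[len(PREFIX):].strip()
    return result


def run_and_find_result(argv):
    """Run argv to completion and scan its stdout for the marker line."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, text=True)
    return proc.returncode, find_put_result(proc.stdout.splitlines())


if __name__ == "__main__":
    # Stand-in child that imitates a dispatcher printing the marker line.
    fake_child = [
        sys.executable,
        "-c",
        "print('booting'); "
        "print('dashboard-put-result: http://example.invalid/bundle/1')",
    ]
    print(run_and_find_result(fake_child))
```

With this approach the only channel the monitor has to watch is stdout, so there is no second fd whose close event can go missing.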
Cheers, mwh
linaro-validation@lists.linaro.org