Adding lkft-triage. I hadn't realized it, but this is impacting us too.
On Thu, Nov 15, 2018 at 08:57:39AM -0600, Dan Rue wrote:
Thanks for the heads-up and the investigation, Dave :)
Adding Chase. This looks related: https://review.linaro.org/#/c/qa/test-definitions/+/29019/
Dan
On Thu, Nov 15, 2018 at 10:30:49AM +0000, Dave Pigott wrote:
Hi guys,
We had a job failure rate of over 20% in the last 24 hours (see https://pastebin.linaro.org/view/52dd817f). The majority are "Unable to open test definition" errors. Example here: https://lkft.validation.linaro.org/scheduler/job/508417#L200
Neil traced this down: "The jobs which failed were submitted *before* the referenced file was added to git.linaro.org - the failed test job records a git commit hash of https://git.linaro.org/qa/test-definitions.git/commit/?id=509ddb3a96eaf8eae5... which precedes https://git.linaro.org/qa/test-definitions.git/commit/?id=6071dd4c6f7827c0e5... which actually adds the file the test jobs needed."
This needs addressing. We need some synchronisation so that jobs cannot be submitted until the commit they depend on has landed on git.linaro.org.
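As a rough sketch of what such a gate could look like (this is not something LAVA or the job submitter does today): poll a clone of test-definitions until the file the job needs is present at the revision that will be used, and only then submit. The function names `file_exists_at_commit` and `wait_until` are hypothetical, and the sketch assumes a local clone that is fetched from the same server the jobs clone from.

```python
import subprocess
import time


def file_exists_at_commit(repo_dir, commit, path):
    """Return True if `path` exists in `commit` of the local clone at `repo_dir`.

    Assumption: the clone has already been fetched, so `commit` reflects
    what the server currently serves.
    """
    # `git cat-file -e <rev>:<path>` exits 0 only when the object exists.
    result = subprocess.run(
        ["git", "cat-file", "-e", f"{commit}:{path}"],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_until(check, timeout=300.0, interval=10.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns True or `timeout` seconds have passed.

    `sleep` and `clock` are injectable so the loop can be tested without
    real delays.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The submitter would wrap the first function in the second, for example `wait_until(lambda: file_exists_at_commit(clone_dir, rev, test_path))`, and hold the job back if it returns False. `clone_dir`, `rev` and `test_path` are placeholders for whatever the submitter already knows about the job.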
Any ideas?
Dave
Dave Pigott
LAVA Lab Lead
Linaro Ltd
t: (+44) (0) 1223 400063