On Thu, Nov 15, 2018 at 11:03:44PM +0800, Chase Qi wrote:
The test plan update https://review.linaro.org/#/c/ci/job/configs/+/29042/ was merged a few hours earlier than its dependency https://review.linaro.org/#/c/qa/test-definitions/+/29019/. I think that is the cause.
When I saw in my morning that the test plan update had been merged, I merged the test case immediately.
Thanks Chase. Adding lkft-triage@.
Dan
Thanks, Chase
On Nov 15, 2018, at 10:57 PM, Dan Rue <dan.rue@linaro.org> wrote:
Thanks for the heads-up and investigation Dave :)
Adding Chase. This looks related: https://review.linaro.org/#/c/qa/test-definitions/+/29019/
Dan
On Thu, Nov 15, 2018 at 10:30:49AM +0000, Dave Pigott wrote:
Hi guys,
We had 20%+ job failures over the last 24 hours (see https://pastebin.linaro.org/view/52dd817f). The majority are "Unable to open test definition". Example here: https://lkft.validation.linaro.org/scheduler/job/508417#L200
Neil traced this down: "The jobs which failed were submitted *before* the referenced file was added to git.linaro.org - the failed test job records a git commit hash of https://git.linaro.org/qa/test-definitions.git/commit/?id=509ddb3a96eaf8eae5... which precedes https://git.linaro.org/qa/test-definitions.git/commit/?id=6071dd4c6f7827c0e5... which actually adds the file the test jobs needed."
This needs addressing. Some synchronisation is needed so that jobs cannot be submitted until the git commit they depend on has landed.
Any ideas?
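One possible shape for such a sync, as a rough sketch only: before submitting, the submitter checks that the file the job refers to exists in the revision the job will clone, and waits (or gives up) if it does not. Nothing here is existing LAVA or LKFT code. `file_in_revision` and `wait_for` are hypothetical helper names, the timeout and interval values are arbitrary, and the sketch assumes a local clone or mirror that the caller keeps fetched.

```python
import subprocess
import time


def file_in_revision(repo_dir, revision, path):
    """Return True if `path` exists in `revision` of the git repo at `repo_dir`.

    Hypothetical helper. `git cat-file -e` exits 0 when the object exists.
    Raises FileNotFoundError if the git binary is not installed.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "cat-file", "-e", f"{revision}:{path}"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def wait_for(check, timeout=600.0, interval=30.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns True or `timeout` seconds have passed.

    Returns True as soon as the check passes, False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A submitter could then wrap the check in a function that first runs `git fetch` in the mirror and then calls `file_in_revision` with the branch and test definition path taken from the job, and only submit when `wait_for` returns True. The fetch step is left out above and would be needed for the poll to see new commits.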
Dave
Dave Pigott LAVA Lab Lead Linaro Ltd t: (+44) (0) 1223 400063
One thing that would be nice here is if LAVA reported this better. As the test definition has requested a file that doesn't exist in the repo that has just been cloned, throwing a nicer exception (RepositoryException or suchlike) would make this easier to understand in future.
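To illustrate the kind of reporting meant here, a minimal sketch, not LAVA's actual code: the file is checked right after the clone, and a dedicated exception names the missing path, the clone directory and (if known) the commit. `RepositoryException` is just the name floated above, and `open_test_definition` and its parameters are hypothetical.

```python
import os


class RepositoryException(Exception):
    """Hypothetical error: the cloned repository lacks a requested file."""


def open_test_definition(clone_dir, relative_path, commit=None):
    """Open a test definition from a fresh clone, or explain why it can't be.

    `commit` is optional and only used to make the error message more useful.
    """
    full_path = os.path.join(clone_dir, relative_path)
    if not os.path.isfile(full_path):
        at = f" at commit {commit}" if commit else ""
        raise RepositoryException(
            f"Test definition '{relative_path}' not found in cloned "
            f"repository{at}: {clone_dir}"
        )
    return open(full_path, encoding="utf-8")
```

With the commit hash in the message, a failure like the ones above would show straight away that the clone predates the commit that adds the file.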
On Thu, 15 Nov 2018 at 15:23, Dan Rue <dan.rue@linaro.org> wrote: