On 4 February 2011 13:53, Paul Larson <paul.larson@linaro.org> wrote:
Hi Mirsad, I'm looking at the recent edits to https://wiki.linaro.org/Platform/Validation/Specs/ValidationScheduler and wanted to start a thread to discuss. Would love to hear thoughts from others as well.
We could probably use some more in the way of implementation details, but this is starting to take shape pretty well, good work. I have a few comments below:
Admin users can also cancel any scheduled jobs.
Job submitters should be allowed to cancel their own jobs too, right?
Correct, that's described for 'normal users': "Normal users will be able to define a test job, submit it for execution, cancel an ongoing job...". I will clarify this more explicitly in the spec.
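Something like the rule below is what I have in mind; the names here are only illustrative, not from the spec:

    # A minimal sketch of the cancellation rule we discussed; the names
    # (user, job, is_admin, submitter) are hypothetical placeholders.
    def can_cancel(user, job):
        """Admins can cancel any job; normal users only the jobs they submitted."""
        return user.is_admin or job.submitter == user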
I think in general, the user stories need tweaking. Many of them center around automatic scheduling of jobs based on some event (adding a machine, adding a test, etc.). Based on the updated design, this kind of logic would be in the piece we were referring to as the driver. The scheduler shouldn't be making those decisions on its own, but it should provide both an interface for humans to schedule jobs (web, CLI) and an API for machines (the driver) to do so.
Agree, we had some discussion about the driver part which didn't end with any specific conclusion, so I just kept the driver user stories in the scheduler. I will remove the driver-specific user stories and develop the scheduler API definition and usage in more detail.
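As a first rough sketch, the API could look something like this; the method names and the XML-RPC transport are only assumptions at this point, nothing is decided yet:

    # Sketch of the kind of API the scheduler could expose to both the CLI
    # and the driver; method names and transport are assumptions.
    import xmlrpc.client

    scheduler = xmlrpc.client.ServerProxy("http://validation.example.org/RPC2")

    with open("job.json") as f:
        job_id = scheduler.submit_job(f.read())   # human (CLI) or driver submits a job definition

    print(scheduler.job_status(job_id))           # poll the job later
    scheduler.cancel_job(job_id)                  # submitter or admin cancels it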
should we avoid scheduling image tests twice because a hwpack is coming in after images or vv.
Is this a question? Again, I don't think that's the scheduler's call. The scheduler isn't deciding what tests to run, or what to run them on. In this case, assuming we have the resources to pull it off, running the new image with both the old and the new hwpack would be good to do.
Agree, will remove this.
Test job definition
Is this different from the job definition used by the dispatcher? Please tell me if I'm missing something here, but I think to schedule something, you only really need two blobs of information:

1a. a specific host to run on -OR-
1b. (any/every system matching given criteria) - This one is tricky, and though it sounds really useful, my personal feeling is that it is of questionable value. In theory, it lets you make more efficient use of your hardware when you have multiple identical machines. In practice, what I've seen on similar systems is that humans typically know exactly which machine they want to run something on. Where it might really come into play is later, when we have a driver automatically scheduling jobs for us.

2. job file - this is the piece that the job dispatcher consumes. It could be handwritten, machine generated, or created based on a web form where the user selects what they want.
This is the same test job definition that the dispatcher will use. The idea here is that the end users define a test job, which is then pushed to the dispatcher in some way. The scheduler will provide the web form you mention under 2. Is there something I'm missing here?
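To make it concrete, a submission could carry those two blobs roughly like this; the field names and job file layout are only illustrative, not a proposal for the final dispatcher format:

    # Illustrative only: one way the submission could carry the two pieces of
    # information. All field names here are hypothetical.
    submission = {
        # 1a. a specific host to run on ...
        "target": "panda01",
        # ... or 1b. criteria matching any/every suitable machine, e.g.
        # "device_type": "panda",

        # 2. the job file the dispatcher consumes (could equally be a pre-written file)
        "job": {
            "image": "http://example.org/images/linaro-headless.img.gz",
            "actions": [
                {"command": "deploy_image"},
                {"command": "boot_image"},
                {"command": "run_test", "parameters": {"test_name": "stream"}},
            ],
        },
    }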
Test job status
One distinction I want to make here is job status vs. test result. A failed test can certainly have a "complete" job status. Incomplete, as a job status, just means that the dispatcher was unable to finish all the steps in the job. A better example: say we had a test that required an image to be deployed, booted, and a test run on it. If we tried to deploy the image and hit a kernel panic on reboot, that is an incomplete job because it never made it far enough to run the specified test.
Exactly, that was my idea in the beginning. I will try to expand the spec on this issue. One question here is whether we want to collect logs from a failed test job and make them visible from the scheduler. I guess we don't want these logs/results pushed to the dashboard.
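For the spec I'm thinking of keeping the two concepts clearly separate, roughly along these lines (the exact status values are just placeholders, not agreed yet):

    # Sketch of job status and test result as separate concepts; the actual
    # set of values is still open.
    from enum import Enum

    class JobStatus(Enum):
        SUBMITTED = "submitted"
        RUNNING = "running"
        COMPLETE = "complete"      # the dispatcher ran every step, even if tests failed
        INCOMPLETE = "incomplete"  # e.g. kernel panic on boot before the test could run
        CANCELED = "canceled"

    class TestResult(Enum):
        PASS = "pass"
        FAIL = "fail"              # a failed test can still belong to a COMPLETE job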
Link to test results in launch-control
If we tie this closely enough with launch-control, it seems we could just communicate the job id to the dispatcher so that it gets rolled up with the bundle. That way the dashboard would have a backlink to the job, and could create the link to the bundle once it is deserialized. Just a different option if it's easier. I don't see an obvious advantage to either approach.
I like the backlink idea and solution, will put it in the spec (dashboard pointing to the test job in the scheduler). And the test job ID is maybe all the scheduler needs to produce a link to the test results in the dashboard. Zygmunt, do you have any comments on this?
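To sketch how the backlink could work end to end (all names and attribute keys here are made up for illustration): the scheduler passes the job id to the dispatcher, the dispatcher attaches it to the bundle it pushes to the dashboard, and the dashboard renders the link back to the job.

    # Hypothetical sketch of the backlink flow; nothing here is the agreed format.
    def bundle_attributes_for(job_id, scheduler_url="http://validation.example.org"):
        # attributes the dispatcher would attach to the result bundle
        return {
            "scheduler.job_id": str(job_id),
            "scheduler.job_url": "%s/scheduler/job/%s" % (scheduler_url, job_id),
        }

    def backlink_from(bundle_attributes):
        # dashboard side: recover the link to the scheduler job, if present
        return bundle_attributes.get("scheduler.job_url")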
Thanks, Paul Larson
Thanks for the comments, Paul!