On 4 February 2011 13:53, Paul Larson <paul.larson@linaro.org> wrote:
Hi Mirsad, I'm looking at the recent edits to https://wiki.linaro.org/Platform/Validation/Specs/ValidationScheduler and wanted to start a thread to discuss. Would love to hear thoughts from others as well.
We could probably use some more in the way of implementation details, but this is starting to take shape pretty well, good work. I have a few comments below:
Admin users can also cancel any scheduled jobs.
Job submitters should be allowed to cancel their own jobs too, right?
Correct, that's described for 'normal users': "Normal users will be able to define a test job, submit it for execution, cancel an ongoing job...". I will clarify this more explicitly in the spec.
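Something like the rule below is what I have in mind; the names here are only illustrative, not from the spec:

    # A minimal sketch of the cancellation rule we discussed; the names
    # (user, job, is_admin, submitter) are hypothetical placeholders.
    def can_cancel(user, job):
        """Admins can cancel any job; normal users only the jobs they submitted."""
        return user.is_admin or job.submitter == user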
I think in general, the user stories need tweaking. Many of them center around automatic scheduling of jobs based on some event (adding a machine, adding a test, etc.). Based on the updated design, this kind of logic would be in the piece we were referring to as the driver. The scheduler shouldn't be making those decisions on its own, but it should provide both an interface for humans to schedule jobs (web, CLI) and an API for machines (the driver) to do so.
Agree, we had some discussion about the driver part which didn't end with any specific conclusion, so I just kept the driver user stories in the scheduler. I will remove the driver-specific user stories and develop the scheduler API definition and usage in more detail.
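As a first rough sketch, the API could look something like this; the method names and the XML-RPC transport are only assumptions at this point, nothing is decided yet:

    # Sketch of the kind of API the scheduler could expose to both the CLI
    # and the driver; method names and transport are assumptions.
    import xmlrpc.client

    scheduler = xmlrpc.client.ServerProxy("http://validation.example.org/RPC2")

    with open("job.json") as f:
        job_id = scheduler.submit_job(f.read())   # human (CLI) or driver submits a job definition

    print(scheduler.job_status(job_id))           # poll the job later
    scheduler.cancel_job(job_id)                  # submitter or admin cancels it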
should we avoid scheduling image tests twice because a hwpack is coming in after images or vv.
Is this a question? Again, I don't think that's the scheduler's call. The scheduler isn't deciding what tests to run, or what to run them on. In this case, assuming we have the resources to pull it off, running the new image with both the old and the new hwpack would be good to do.
Agree, will remove this.
Test job definition
Is this different from the job definition used by the dispatcher? Please tell me if I'm missing something here, but I think to schedule something, you only really need two blobs of information:

1a. a specific host to run on -OR-
1b. (any/every system matching given criteria) - This one is tricky, and though it sounds really useful, my personal feeling is that it is of questionable value. In theory, it lets you make more efficient use of your hardware when you have multiple identical machines. In practice, what I've seen on similar systems is that humans typically know exactly which machine they want to run something on. Where it might really come into play is later, when we have a driver automatically scheduling jobs for us.

2. job file - this is the piece that the job dispatcher consumes. It could be handwritten, machine generated, or created based on a web form where the user selects what they want.
This is the same test job definition that the dispatcher will use. The idea here is that the end users define a test job, which is then pushed to the dispatcher in some way. The scheduler will provide the web form you mention under 2. Is there something I'm missing here?
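To make it concrete, a submission could carry those two blobs roughly like this; the field names and job file layout are only illustrative, not a proposal for the final dispatcher format:

    # Illustrative only: one way the submission could carry the two pieces of
    # information. All field names here are hypothetical.
    submission = {
        # 1a. a specific host to run on ...
        "target": "panda01",
        # ... or 1b. criteria matching any/every suitable machine, e.g.
        # "device_type": "panda",

        # 2. the job file the dispatcher consumes (could equally be a pre-written file)
        "job": {
            "image": "http://example.org/images/linaro-headless.img.gz",
            "actions": [
                {"command": "deploy_image"},
                {"command": "boot_image"},
                {"command": "run_test", "parameters": {"test_name": "stream"}},
            ],
        },
    }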
Test job status
One distinction I want to make here is job status vs. test result. A failed test can certainly have a "complete" job status. Incomplete, as a job status, just means that the dispatcher was unable to finish all the steps in the job. A better example: say we had a test that required an image to be deployed, booted, and a test run on it. If we tried to deploy the image and hit a kernel panic on reboot, that is an incomplete job because it never made it far enough to run the specified test.
Exactly, that was my idea in the beginning. I will try to expand the spec on this issue. One question here is whether we want to collect logs from a failed test job and make them visible from the scheduler. I guess we don't want these logs/results pushed to the dashboard.
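For the spec I'm thinking of keeping the two concepts clearly separate, roughly along these lines (the exact status values are just placeholders, not agreed yet):

    # Sketch of job status and test result as separate concepts; the actual
    # set of values is still open.
    from enum import Enum

    class JobStatus(Enum):
        SUBMITTED = "submitted"
        RUNNING = "running"
        COMPLETE = "complete"      # the dispatcher ran every step, even if tests failed
        INCOMPLETE = "incomplete"  # e.g. kernel panic on boot before the test could run
        CANCELED = "canceled"

    class TestResult(Enum):
        PASS = "pass"
        FAIL = "fail"              # a failed test can still belong to a COMPLETE job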
Link to test results in launch-control
If we tie this closely enough with launch-control, it seems we could just communicate the job id to the dispatcher so that it gets rolled up with the bundle. That way the dashboard would have a backlink to the job, and could create the link to the bundle once it is deserialized. Just a different option if it's easier. I don't see an obvious advantage to either approach.
I like the backlink idea and solution, will put it in the spec (dashboard pointing to the test job in the scheduler). And the test job ID is maybe all the scheduler needs to produce a link to the test results in the dashboard. Zygmunt, do you have any comments on this?
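To sketch how the backlink could work end to end (all names and attribute keys here are made up for illustration): the scheduler passes the job id to the dispatcher, the dispatcher attaches it to the bundle it pushes to the dashboard, and the dashboard renders the link back to the job.

    # Hypothetical sketch of the backlink flow; nothing here is the agreed format.
    def bundle_attributes_for(job_id, scheduler_url="http://validation.example.org"):
        # attributes the dispatcher would attach to the result bundle
        return {
            "scheduler.job_id": str(job_id),
            "scheduler.job_url": "%s/scheduler/job/%s" % (scheduler_url, job_id),
        }

    def backlink_from(bundle_attributes):
        # dashboard side: recover the link to the scheduler job, if present
        return bundle_attributes.get("scheduler.job_url")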
Thanks, Paul Larson
Thanks for the comments, Paul!