Re: [Linaro-validation] CI Runtime API

26 Apr 2013


On Fri Apr 26 02:25:14 UTC 2013 Antonio Terceiro replied to me, but I
wasn't on the list, so I only just spotted it in the web interface...
...
Non-API related thought: I think it is reasonable to have an internal
file server to store the disk images we create during builds, without
having to push them up to snapshots.linaro.org and pull them back
down. It makes far more sense to boot and test an image, then
optionally upload it to the wider world. Let me know if we have this
sort of temporary storage available.
...
We don't have anything like this, but we probably should.
...
...
I need to know if a machine is ready for me to use. I am happy to poll
something.
I need to tell LAVA/CI that I have finished with a machine.
...
If I understand correctly, your assumption is that you receive an
interactive session on the requested device(s) and then issue commands
on it. Is that correct?
Yep. Very similar to the LAVA hack tool that Tyler just demoed.
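To make the two requirements above concrete (knowing when a machine is ready, and telling LAVA/CI when I'm done with it), here is a minimal Python sketch of a poll-then-release pattern. Everything in it is hypothetical: `FakeScheduler`, `is_ready`, `release` and `reserved` stand in for whatever the CI runtime API ends up exposing and are not an existing LAVA interface.

```python
import time
from contextlib import contextmanager


class FakeScheduler:
    """Hypothetical in-memory stand-in for the CI/LAVA side (not a real LAVA API)."""

    def __init__(self, ready_after_polls):
        self._polls_left = ready_after_polls
        self.released = []

    def is_ready(self, device):
        # Pretend the device becomes free after a fixed number of polls.
        self._polls_left -= 1
        return self._polls_left <= 0

    def release(self, device):
        # Record that the caller told the scheduler it has finished.
        self.released.append(device)


@contextmanager
def reserved(scheduler, device, poll_interval=0.0, max_polls=100):
    """Poll until `device` is ready, yield it, and always release it afterwards."""
    for _ in range(max_polls):
        if scheduler.is_ready(device):
            break
        time.sleep(poll_interval)
    else:
        # Only reached if the loop never hit `break`: the device never came free.
        raise TimeoutError(f"{device} did not become ready")
    try:
        yield device
    finally:
        # Runs both on normal exit and when the job script raises.
        scheduler.release(device)
```

The `try`/`finally` means a Python job that raises still hands the machine back, although it does nothing for a job whose process dies outright; that case still needs a timeout on the scheduler side.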
...
Maybe it's too late to ask this, but did you consider the possibility of
having the CI runtime produce "actual" LAVA jobs (i.e. a target device
spec + a non-interactive script), and then using an API to submit those
jobs, poll for their completion (or block until completion, depending on
the use case), and access/manipulate/address the job results,
perhaps to use them as input for other jobs?
I did consider a fixed-function test approach, but for some jobs we
are creating new commands based on the output of previous ones. It
also restricts jobs to a fixed sequence of commands unless you
start implementing branches and loops in the test spec. Those are the
reasons I chose Python: I don't have to reinvent anything.
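For comparison, the submit-and-poll model suggested above could look roughly like this from the CI side: submit a non-interactive job, block until it reaches a final state, and feed its results into the next job. This is a sketch under assumptions: `FakeLava`, `submit`, `status`, `results`, `wait_for` and the status strings in `FINAL_STATES` are all made up for illustration and are not the LAVA scheduler's actual API.

```python
import itertools
import time

# Assumed set of terminal job states for this sketch.
FINAL_STATES = ("Complete", "Incomplete", "Canceled")


class FakeLava:
    """Hypothetical in-memory job queue; not the real LAVA scheduler API."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, device_type, script):
        job_id = next(self._ids)
        # A real scheduler would run `script` on a device; here we just store it.
        self._jobs[job_id] = {
            "polls_left": 2,
            "results": {"device_type": device_type, "script": script},
        }
        return job_id

    def status(self, job_id):
        job = self._jobs[job_id]
        # Pretend each job finishes after a couple of status checks.
        if job["polls_left"] > 0:
            job["polls_left"] -= 1
            return "Running"
        return "Complete"

    def results(self, job_id):
        return self._jobs[job_id]["results"]


def wait_for(lava, job_id, poll_interval=0.0, max_polls=100):
    """Poll until the job reaches a final state, then return its results."""
    for _ in range(max_polls):
        if lava.status(job_id) in FINAL_STATES:
            return lava.results(job_id)
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still running")
```

Chaining is then just `wait_for(lava, lava.submit(...))` with the second script built from the first job's results. The catch for the use case described above is that each decision point costs a full job round trip rather than an extra line of Python.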
...
This approach would have the advantage that, since you don't directly
control the device, you don't need to tell LAVA that you are
finished with it: LAVA knows when the job you submitted is done. Besides,
if a CI job crashes, LAVA won't wait forever to be told that a given
device is free; it doesn't need to handle timeouts, and we don't need
to worry about what the right timeout is.
...
Does that make sense?
Absolutely. I know that with all this added flexibility I am
potentially making life difficult for us, but I think it will
encourage adoption and is a nice, flexible method of doing some really
powerful stuff. I am keen to mitigate the problems it creates as much
as possible. I expect us to have jobs that specify a maximum runtime
anyway (this is standard practice with lots of grid compute and
cluster products). We should have a reasonably short default timeout
for jobs that don't specify one, to "train" users to do this right.
That avoids the crashing issue to some extent.
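The maximum-runtime idea can be sketched as a lease that the scheduler reclaims once a job overruns, so a crashed job cannot hold a device forever. This is a hypothetical illustration only: `Lease`, `grant`, `reclaim_expired` and the 30-minute `DEFAULT_MAX_RUNTIME` are made-up names and values, and the current time is passed in explicitly so the behaviour is easy to check.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical short default (seconds) for jobs that don't declare a maximum runtime.
DEFAULT_MAX_RUNTIME = 30 * 60


@dataclass
class Lease:
    device: str
    start: float
    max_runtime: float

    def expired(self, now: float) -> bool:
        # A lease expires once the job has run for its full allowance.
        return now - self.start >= self.max_runtime


def grant(device: str, now: float, max_runtime: Optional[float] = None) -> Lease:
    """Grant a lease, falling back to the short default when none is given."""
    runtime = DEFAULT_MAX_RUNTIME if max_runtime is None else max_runtime
    return Lease(device, now, runtime)


def reclaim_expired(leases: Dict[str, Lease], now: float) -> List[str]:
    """Drop leases whose job overran its maximum runtime; return the freed devices."""
    freed = [name for name, lease in leases.items() if lease.expired(now)]
    for name in freed:
        del leases[name]
    return freed
```

A job that declares a realistic `max_runtime` keeps its device for as long as it asked for; one that declares nothing gets the short default, which is the "training" pressure described above.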
...
OTOH, I realize that having the ability to reserve a device and receive
an interactive session on it is useful and would open up several other
possibilities, so I don't necessarily think it is a bad idea at all.
--
James Tunnicliffe