On Fri Apr 26 02:25:14 UTC 2013 Antonio Terceiro replied to me, but I wasn't on the list, so I only just spotted it in the web interface...
Non-API related thought: I think it is reasonable to have an internal file server to store disk images on that we create during builds without having to push up to snapshots.linaro.org and pull them back down. It makes far more sense to boot and test an image, then optionally upload it to the wider world. Let me know if we have this sort of temporary storage available.
We don't have something like this, but we probably should have one.
I need to know if a machine is ready for me to use. I am happy to poll something.
I need to tell LAVA/CI that I have finished with a machine.
If I understand correctly, your assumption is to receive an interactive session on the requested device(s), and then issue commands on it. Is that correct?
Yep. Very similar to the LAVA hack tool that Tyler just demoed.
Maybe it's too late to ask this, but did you consider the possibility of having the CI runtime produce "actual" LAVA jobs (i.e. a target device spec + a non-interactive script), and then using an API to submit those jobs, poll for their completion (or block until completion depending on the use case) and accessing/manipulating/addressing the job results, perhaps to use them as input for other jobs?
I did consider a fixed function test approach, but for some jobs we are creating new commands based on the output of previous ones. It also restricts jobs to be a fixed sequence of commands unless you start implementing branches and loops in the test spec. Those are the reasons I chose Python - I don't have to re-invent anything.
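A minimal sketch of the kind of job described above, where later commands are built from the output of earlier ones. The `run` callable is hypothetical (it stands in for whatever the interactive session would provide to send a command to the device and return its output); it is not an existing LAVA API, and the `/boot` checksum task is only an illustration.

```python
def checksum_boot_files(run):
    """Checksum every file in /boot on the device.

    `run` is a hypothetical callable: it takes a shell command string,
    executes it on the reserved device and returns the command's stdout.
    """
    listing = run("ls /boot")
    results = {}
    for name in listing.split():
        # Each follow-up command is built from the previous command's output,
        # which a fixed, pre-written sequence of commands could not do.
        results[name] = run("md5sum /boot/" + name).split()[0]
    return results
```

Expressing the same thing in a fixed test spec would need some form of loop and variable substitution, which is the re-invention an ordinary Python script avoids.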
This approach would have the advantage that since you don't directly control the device, you don't have the need to tell LAVA that you are finished with it. LAVA knows when the job you submitted is done. Besides, if a CI job crashes, LAVA won't stay forever waiting to be told that a given device is done with, and doesn't need to care about handling timeouts, and we don't need to worry about what is the right timeout to wait for etc.
Does that make sense?
Absolutely. I know that with all this added flexibility I am potentially making life difficult for us, but I think it will encourage adoption and is a nice, flexible method of doing some really powerful stuff. I am keen to mitigate the problems it creates as much as possible. I expect us to have jobs that specify a maximum runtime anyway (this is standard practice with lots of grid compute and cluster products). We should have a reasonably short default timeout for jobs that don't specify one to "train" users to do this right. That avoids the crashing issue to some extent.
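As a rough sketch of the maximum-runtime idea, assuming a hypothetical `DeviceLease` object on the LAVA/CI side (the class name, the 600 second default and the injectable `clock` are all illustrative assumptions, not part of any existing API):

```python
import time

# Assumed value: a deliberately short default for jobs that specify no limit.
DEFAULT_TIMEOUT_S = 600


class DeviceLease:
    """Hypothetical reservation that expires after a maximum runtime."""

    def __init__(self, device, max_runtime_s=None, clock=time.monotonic):
        self.device = device
        self._clock = clock
        self._limit = DEFAULT_TIMEOUT_S if max_runtime_s is None else max_runtime_s
        self._started = clock()
        self._released = False

    def release(self):
        """Called by the CI job when it has finished with the device."""
        self._released = True

    def is_active(self):
        """False once released, or once the maximum runtime has elapsed."""
        if self._released:
            return False
        return self._clock() - self._started < self._limit
```

With something like this, a CI job that crashes without calling `release()` only holds the device until its limit runs out, after which the device can be reclaimed.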
OTOH, I realize that having the ability to reserve a device and receive an interactive session on it is useful and would open up several other possibilities, so I don't necessarily think it is a bad idea at all.
-- James Tunnicliffe
linaro-validation@lists.linaro.org