[be wary of the cross post when replying!]
Hi all,
Seeing as LAVA isn't going to support multi-node tests or Highbank in the super near future, I spent a while today hacking up a script to run some tests automatically on the Calxeda nodes in the lab. You can see it in all its gory detail here:
http://bazaar.launchpad.net/~mwhudson/+junk/highbank-bench-scripts/view/head...
(probably best read from the bottom up, come to think of it)
To do things like power cycle and prepare instances in parallel, it's written in an asynchronous style using the Twisted event-driven framework. This was a bit of an experiment and I'm not sure what I think of the result -- it's /reasonably/ clear and it works, but perhaps just using one thread per node being tested and writing blocking code (and using semaphores or whatever to synchronize) would have been clearer. So I guess before I do any more hacking like this, it would be good to hear what you guys (especially Ard I suppose!) think of this style.
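To make the event-driven style concrete, here is a minimal sketch of its shape. It is not taken from the script above: it uses the standard library's asyncio as a stand-in for Twisted, and the step functions, the node names, and the sleeps that stand in for real work are all hypothetical.

```python
import asyncio


async def power_cycle(node, log):
    # Hypothetical step: the sleep stands in for talking to the node.
    await asyncio.sleep(0.01)
    log.append(("power_cycle", node))


async def prepare(node, log):
    # Hypothetical step: the sleep stands in for preparing the instance.
    await asyncio.sleep(0.01)
    log.append(("prepare", node))


async def setup_node(node, log):
    # Within one node the steps run one at a time, in the order written.
    await power_cycle(node, log)
    await prepare(node, log)


async def run_job(nodes):
    log = []
    # Parallelism is explicit: gather() runs the per-node setups concurrently
    # and returns only when every one of them has finished.
    await asyncio.gather(*(setup_node(node, log) for node in nodes))
    # This step depends on all nodes being ready, so it simply comes after.
    log.append(("benchmark", tuple(nodes)))
    return log


if __name__ == "__main__":
    for entry in asyncio.run(run_job(["node1", "node2", "node3"])):
        print(entry)
```

Everything is sequential unless it is wrapped in gather(), which is roughly the trade-off I mean: the dependencies are easy to read off the code, but every step has to be written as a coroutine.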
In general, how to express a job that consists of a number of steps, some of which can be executed in parallel and some of which have dependencies, is an interesting question. I suppose my event-driven approach is more on the side of one-step-at-a-time by default with explicit parallelization, whereas threads + locks would be more on the side of parallel by default with explicit serialization. This has implications for how we write the job descriptions we give to a hypothetical LAVA that supports multi-node tests -- has anyone thought about this yet? I think I prefer the explicit parallelism style myself (makes me think of Cilk and Grand Central Dispatch and CSP...).
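For comparison, here is a rough sketch of the thread-per-node style, again with hypothetical step names and node names, and with the real work replaced by log entries. Each node's thread just runs blocking code, and the one place where the nodes have to wait for each other is spelled out with a barrier.

```python
import threading


def run_job(nodes):
    log = []
    log_lock = threading.Lock()
    # Explicit serialization point: nobody starts the benchmark
    # until every node has finished preparing.
    all_prepared = threading.Barrier(len(nodes))

    def record(step, node):
        # The log is shared between threads, so guard it with a lock.
        with log_lock:
            log.append((step, node))

    def worker(node):
        # Plain blocking code; the nodes run in parallel by default.
        record("power_cycle", node)
        record("prepare", node)
        all_prepared.wait()
        record("benchmark", node)

    threads = [threading.Thread(target=worker, args=(node,)) for node in nodes]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


if __name__ == "__main__":
    for entry in run_job(["node1", "node2", "node3"]):
        print(entry)
```

Here the per-node code reads top to bottom like a shell script, but the dependency between nodes only shows up as the wait() call, and forgetting it (or the lock) fails silently.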
Cheers, mwh