Re: [Linaro-validation] hackish test automation & thoughts about writing multi-node jobs

24 Feb 2013


On Wed, Feb 20, 2013 at 07:56:26PM +1300, Michael Hudson-Doyle wrote:
...
> [be wary of the cross post when replying!]
> Hi all,
> Seeing as LAVA isn't going to support multi-node tests or highbank in
> the super near future, I spent a while today hacking up a script to run
> some tests automatically on the calxeda nodes in the lab.  You can see
> it in all its gory detail here:
> http://bazaar.launchpad.net/~mwhudson/+junk/highbank-bench-scripts/view/head...
> (probably best read from the bottom up, come to think of it)
> To do things like power cycle and prepare instances in parallel, it's
> written in an asynchronous style using the Twisted event-driven
> framework.  This was a bit of an experiment and I'm not sure what I
> think of the result -- it's /reasonably/ clear and it works, but perhaps
> just using one thread per node being tested and writing blocking code
> (and using semaphores or whatever to synchronize) would have been
> clearer.  So I guess before I do any more hacking like this, it would be
> good to hear what you guys (especially Ard I suppose!) think of this
> style.
> In general, how to express a job that consists of a number of steps,
> some of which can be executed in parallel and some of which have
> dependencies is an interesting one.  I suppose my eventy one is more on
> the side of one-step-at-a-time by default with explicit parallelization
> and threads + locks would be more on the side of parallel by default and
> explicit serialization.  This has implications for how we write the job
> descriptions we give to a hypothetical multi-node test supporting LAVA
> -- has anyone thought about this yet?  I think I prefer the explicit
> parallelism style myself (makes me think of cilk and grand central
> dispatch and csp...).
My thoughts, from a LAVA standpoint.
This parallelism style is indeed very elegant, but I couldn't think of
how we could take advantage of that in the existing LAVA infrastructure.
Maybe we could make the dispatcher spawn child dispatchers (one for each
node involved in the test) and wait for all of them to finish.
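As a rough illustration of the "spawn one child per node and wait for all
of them" idea (not existing LAVA code), here is a minimal Python sketch.
The names `run_children` and `CHILD_CODE` are made up for this example, and
the child process only prints its node name where a real child dispatcher
would flash, boot and test the node.

```python
import subprocess
import sys

# Hypothetical stand-in for a child dispatcher: it only reports the node
# name it was given on the command line.
CHILD_CODE = "import sys; print('done ' + sys.argv[1])"


def run_children(nodes):
    """Start one child process per node, then wait for all of them."""
    # Start every child first so that they all run in parallel.
    children = {
        node: subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, node],
            stdout=subprocess.PIPE,
            text=True,
        )
        for node in nodes
    }
    results = {}
    for node, child in children.items():
        output, _ = child.communicate()  # blocks until this child exits
        results[node] = (child.returncode, output.strip())
    return results


if __name__ == "__main__":
    for node, (code, output) in run_children(["node0", "node1"]).items():
        print(node, code, output)
```

The parent only decides that the job is finished once every child has
exited, which is the "wait for all of them to finish" part.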
Inside each child dispatcher invocation, there should be a primitive
that says "wait until all my test buddies are ready" so that after
flashing and booting each one can perform its setup steps (i.e. the
stuff we do before actually running tests), and wait for the others
before executing its part in the distributed job. This communication
might be coordinated by the "parent" dispatcher through signals.  I'm
not sure whether this primitive would be a new dispatcher action (and
thus declared in the job description file), or a binary inside the
target (and thus able to be invoked from inside lava-test-shell
test runs), or both.
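The "wait until all my test buddies are ready" primitive is essentially a
barrier. A minimal in-process sketch, assuming Python threads stand in for
the per-node child dispatchers (in LAVA the coordination would presumably
go through the parent dispatcher instead); `run_group` and `node_job` are
hypothetical names:

```python
import threading


def run_group(nodes):
    """Run each node's setup, synchronise, then run each node's test part."""
    barrier = threading.Barrier(len(nodes))
    events = []
    lock = threading.Lock()

    def node_job(name):
        with lock:
            events.append(("setup", name))  # flashing, booting, setup steps
        barrier.wait()  # block until every node has finished its setup
        with lock:
            events.append(("test", name))  # this node's part of the job

    threads = [threading.Thread(target=node_job, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for phase, node in run_group(["node0", "node1", "node2"]):
        print(phase, node)
```

No node records a "test" event before every node has recorded its "setup"
event, whatever order the threads happen to run in.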
To describe the tests, I thought of adding a new alternative attribute
to the job description called "device_group", mutually exclusive with
"device_type". This description would include a list of device
specifications, including their type and any tags indicating the special
capabilities expected. We can then tag all nodes inside a single calxeda
box with the same tag (say "calxeda-box-1") and use that to
request N devices in the same box for tests that require
low-latency/high-bandwidth networking between the participants.
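To make the "device_group" idea a bit more concrete, here is a hedged
Python sketch of what such a job description and a matching step could look
like. Only "device_group", "device_type" and the tag "calxeda-box-1" come
from the text above; the layout of each specification, the "hostname" field
and the functions `validate` and `allocate` are assumptions made up for this
example, not an actual LAVA schema.

```python
# Hypothetical job description asking for two highbank nodes in one box.
JOB = {
    "job_name": "multi-node-example",
    "device_group": [
        {"device_type": "highbank", "tags": ["calxeda-box-1"]},
        {"device_type": "highbank", "tags": ["calxeda-box-1"]},
    ],
}


def validate(job):
    """Reject jobs that give both, or neither, of the two attributes."""
    if ("device_group" in job) == ("device_type" in job):
        raise ValueError("specify exactly one of device_group or device_type")


def allocate(job, available):
    """Greedily pick one distinct device per specification in a
    device_group job; return their hostnames, or None if one is unmet."""
    validate(job)
    free = list(available)
    chosen = []
    for spec in job["device_group"]:
        wanted = set(spec.get("tags", []))
        match = next(
            (d for d in free
             if d["device_type"] == spec["device_type"]
             and wanted <= set(d["tags"])),
            None,
        )
        if match is None:
            return None
        free.remove(match)
        chosen.append(match["hostname"])
    return chosen


if __name__ == "__main__":
    devices = [
        {"hostname": "hb-01", "device_type": "highbank", "tags": ["calxeda-box-1"]},
        {"hostname": "hb-02", "device_type": "highbank", "tags": ["calxeda-box-1"]},
    ]
    print(allocate(JOB, devices))
```

The greedy matching is only good enough for an illustration; a real
scheduler would also have to deal with busy devices and with specifications
that compete for the same device.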
Are my thoughts too abstract for non-LAVA people?
-- 
Antonio Terceiro
Software Engineer - Linaro
http://www.linaro.org
