Re: [Linaro-validation] hackish test automation & thoughts about writing multi-node jobs

25 Feb 2013

      Antonio Terceiro antonio.terceiro@linaro.org writes:
...
On Mon, Feb 25, 2013 at 12:37:24PM +1300, Michael Hudson-Doyle wrote:
...
Antonio Terceiro antonio.terceiro@linaro.org writes:
...
My thoughts, from a LAVA standpoint.
This parallelism style is indeed very elegant, but I couldn't think of
how we could take advantage of that in the existing LAVA infrastructure.
Yeah, I guess the LAVA trend has been towards being more
device-controlled (lava-test-shell and all that) and that doesn't really
fit with the explicit parallelism style.  Oh well.  I'll get over it :-)
...
Maybe we could make the dispatcher spawn child dispatchers (one for each
node involved in the test) and wait for all of them to finish.
I think on some level this model makes sense (whether it's subprocesses
or threads or the dispatcher does some async stuff doesn't really matter
for the mental model IMHO).
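As a rough illustration of that model, here is a minimal Python sketch in which a parent starts one worker per node and waits for all of them to finish. It uses threads purely for convenience. `run_node` and `run_job` are hypothetical names, not part of the LAVA dispatcher, and the per-node work is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def run_node(name):
    # Hypothetical stand-in for everything a child dispatcher would do
    # for one node (deploy, boot, run tests).
    return f"{name}: done"


def run_job(node_names):
    """Start one worker per node and block until every worker has finished."""
    with ThreadPoolExecutor(max_workers=len(node_names)) as pool:
        futures = {name: pool.submit(run_node, name) for name in node_names}
        # result() blocks, so this returns only once all nodes are done.
        return {name: fut.result() for name, fut in futures.items()}
```

Swapping the thread pool for subprocesses would not change the shape of `run_job`, which is the point made above about the mental model.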
...
Inside each child dispatcher invocation, there should be a primitive
that says "wait until all my test buddies are ready" so that after
flashing and booting each one can perform its setup steps (i.e. the
stuff we do before actually running tests), and wait for the others
before executing its part in the distributed job. This communication
might be coordinated by the "parent" dispatcher through signals.  I'm
not sure whether this primitive would be a new dispatcher action (and
thus declared in the job description file), or a binary inside the
target (and thus able to be invoked from inside lava-test-shell
test runs), or both.
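The semantics of such a primitive are those of a barrier. A minimal sketch, assuming the nodes are modelled as threads in a single process and using Python's `threading.Barrier` in place of the dispatcher-coordinated signals (`run_group` and `node` are hypothetical names):

```python
import threading


def run_group(node_names):
    """Run setup on every node, wait for all buddies, then run the test part."""
    barrier = threading.Barrier(len(node_names))
    log = []
    lock = threading.Lock()

    def node(name):
        with lock:
            log.append(("setup", name))  # per-node setup steps
        barrier.wait()  # "wait until all my test buddies are ready"
        with lock:
            log.append(("run", name))  # this node's part of the job

    threads = [threading.Thread(target=node, args=(n,)) for n in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log every "setup" entry precedes every "run" entry, whatever order the threads happen to be scheduled in.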
I think ... perhaps both?  It seems to me that the difference is around
rebooting: (currently, anyway) a lava_test_shell action implies a
reboot, and one thing a lava_test_shell-invoked script _cannot_ do
(well, easily, there are probably hacks) is reboot.  And I can just
about imagine tests that might want to do some configuration that requires
a reboot to take effect.
A job that requires a reboot could declare the following:

deploy image
boot image
lava-test-shell <- setup.yaml
boot image
lava-test-shell <- run.yaml

Yeah, that would work.  It's kind of crummy in that there is a
dependence between the structure of the job file and the repository with
the yaml files in it but as it's a bit of a special case..
...
the run.yaml lava-test-shell definition could then just call a binary
that implements "wait for buddies".
(this way we don't need a "wait for buddies" dispatcher action, just a
binary that can be called by the test suite).
...
I think we should probably try to write some tests like my simple iperf
test and see what API we would like.
Yep.
...
Here's a fun problem: devices will need to know the IP addresses of the
other devices in the test.  I suppose we could delay starting the
lava-test-shell processes on any device until they have all booted and
acquired an IP address?  Or we could run some service on the host
running the dispatcher that can be queried and informed of IP addresses
or something.
the "wait for buddies" action could report the node's IP to the
dispatcher via a signal. When the dispatcher has received those from all
nodes, it sends a list of all IPs to each node. After receiving
that list, the node can then write an /etc/hosts-like file with the IPs
of the group that can be read by the test scripts being run.
Ah yeah.  Putting it in /etc/hosts would be a neat trick -- I'd been
thinking anyway that the job file should give names to the nodes it
requests (origin-server-1, origin-server-2, proxy-node, load-gen-1,
load-gen-2...).
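The exchange described above could look roughly like the following sketch: the dispatcher side collects (name, IP) reports and, once every expected node has reported, produces the hosts-style text to send back. The function name, the report format and the addresses in the usage example are assumptions for illustration only.

```python
def collect_and_format(expected, reports):
    """Return hosts-style text once every expected node has reported an IP.

    expected: node names declared in the job file.
    reports:  iterable of (name, ip) pairs in arrival order.
    Returns None if some node never reports.
    """
    known = {}
    for name, ip in reports:
        known[name] = ip
        if set(known) >= set(expected):
            # One "IP<TAB>name" line per node, in job-file order.
            return "".join(f"{known[n]}\t{n}\n" for n in expected)
    return None
```

For example, with nodes named `proxy-node` and `load-gen-1` reporting made-up addresses `10.0.0.1` and `10.0.0.2`, the result is two lines that a test script could resolve names against.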
Cheers,
mwh
