On Thu, 29 Mar 2012 11:07:32 +0200, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 29.03.2012 06:33, Michael Hudson-Doyle wrote:
On Mon, 26 Mar 2012 19:55:59 +0200, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
I've registered another blueprint. This time on LAVA Dispatcher. The goal is similar to that of the previous blueprint about the device manager: to gather feedback and to propose some small-step sub-blueprints that could be scheduled in 2012.04.
The general goal is to improve the way we run tests by making them more reliable and more featureful (richer in state) at the same time.
Please read and comment on the mailing list
https://blueprints.launchpad.net/lava-dispatcher/+spec/lava-dispatcher-de-em...
I basically like this. Let's do it.
Glad to hear that.
:)
I think we can implement this incrementally by making a LavaAgentUsingClient or something in the dispatcher, although we'll have to make changes to the dispatcher too -- for example, the lava_test_run actions become a little... different (I guess the job file becomes a little less imperative? Or maybe there is a distinction between actions that execute on the host and those that execute on the board? Or something else?). But nothing impossible.
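To make that musing concrete, here is a purely hypothetical fragment (written as a Python dict; neither the "runs_on" key nor these exact action names are real, they are just my invention for illustration) of what a job file that distinguishes host-side from board-side actions might look like:

    job = {
        "actions": [
            # runs on the host: fetch and deploy the image (hypothetical name)
            {"command": "deploy_image", "runs_on": "host",
             "parameters": {"image": "http://example.org/rootfs.img.gz"}},
            # runs on the board, driven by the agent (hypothetical name)
            {"command": "lava_test_run", "runs_on": "board",
             "parameters": {"test_name": "stream"}},
        ],
    }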
I'll write a more concrete proposal. I'm +1 on starting small and doing iterations, but I want to make it clear that the goal is to have something entirely different from what we have today; we'll start with the dispatcher source code but let's keep our minds open ;-)
The one part of this where we can't really take that attitude is the interface we present to our users, i.e. the format of the job file. Do we want to/need to change that? Can we provide a transition plan if we do need to change it?
I'm not amazingly attached to the current format, it's a bit procedural and redundant in some ways; a format that allows our users to express what they _mean_ a bit more would be good -- I think with the current format we're showing our guts to the world a bit :) Slightly separate discussion, I suppose.
I'd like to lay down a plan for how the implementation will evolve; at each milestone we should be able to deploy this to production with full confidence.
I think there is a second side to incremental -- we don't want to have to redo the master image for each board for each iteration (and I don't see any reason below why we might have to, just making the point).
+0.0 (current dispatcher tree)
+0.1, replace external programs with lava-serial; serial config is now constrained to a serial class (direct, networked) and constructor data
Side note: with lava-device-manager the dispatcher would get this from the outside and would be thus 100% config-free
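For illustration, a minimal sketch of what that abstraction could look like (the class and method names are my invention, not the actual lava-serial API):

    import socket
    import serial  # pyserial


    class DirectSerialConnection(object):
        """Board attached to a local serial port."""

        def __init__(self, device, baudrate=115200):
            self.port = serial.Serial(device, baudrate, timeout=1)

        def read(self, size=1):
            return self.port.read(size)

        def write(self, data):
            self.port.write(data)


    class NetworkedSerialConnection(object):
        """Board reachable through a console server (e.g. ser2net)."""

        def __init__(self, host, port):
            self.sock = socket.create_connection((host, port))

        def read(self, size=1):
            return self.sock.recv(size)

        def write(self, data):
            self.sock.sendall(data)

The dispatcher (or, later, lava-device-manager) would then supply just the class choice and the constructor data.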
+0.2, add mini master agent to master rootfs, make it accept shell commands over IP, master image is scripted with one RPC method similar to subprocess.Popen(). Dispatcher learns of the board IP over serial.
+0.3, add improved master image agent, extra specialized methods for deployment, no shell during image deployment (download and copy to partition driven from python)
Yeah, so with the current reasons for health job failures, this is the bit I really want to see asap :)
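To sketch what the +0.2/+0.3 agent might look like (the method names and the choice of SimpleXMLRPCServer are assumptions on my part, not a settled design):

    import subprocess
    import urllib2
    from SimpleXMLRPCServer import SimpleXMLRPCServer


    class MasterAgent(object):

        def run(self, args):
            # The one RPC method shaped like subprocess.Popen, as in +0.2.
            proc = subprocess.Popen(
                args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
            stdout, stderr = proc.communicate()
            return {"returncode": proc.returncode,
                    "stdout": stdout, "stderr": stderr}

        def deploy_image(self, url, partition):
            # +0.3: download and copy to a partition from python, no shell.
            response = urllib2.urlopen(url)
            target = open(partition, "wb")
            try:
                chunk = response.read(1 << 20)
                while chunk:
                    target.write(chunk)
                    chunk = response.read(1 << 20)
            finally:
                target.close()
            return True


    server = SimpleXMLRPCServer(("0.0.0.0", 8000), allow_none=True)
    server.register_instance(MasterAgent())
    server.serve_forever()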
+0.4, add mini test agent to test image before reboot, mounts testrootfs and unpacks a tarball from master image (so that agent code is synchronized to master image version), test agent supplements current serial scripted code with simple methods (IP discovery, maybe shell execution as in +0.2)
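Roughly, the bootstrap step in +0.4 could look like this, run on the master image before rebooting into the test image (the partition label, mount point and tarball path are all made up):

    import subprocess
    import tarfile

    TESTROOTFS = "/dev/disk/by-label/testrootfs"   # assumed label
    MOUNT_POINT = "/mnt/testrootfs"
    AGENT_TARBALL = "/opt/lava/test-agent.tar.gz"  # shipped with the master image


    def install_test_agent():
        # Mount the test rootfs so we can drop the agent into it.
        subprocess.check_call(["mount", TESTROOTFS, MOUNT_POINT])
        try:
            # Unpack the agent that matches this master image version.
            tarball = tarfile.open(AGENT_TARBALL)
            try:
                tarball.extractall(MOUNT_POINT)
            finally:
                tarball.close()
        finally:
            subprocess.check_call(["umount", MOUNT_POINT])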
+0.5, test agent drives the whole test process, dispatcher job copied by the master agent, data saved to testrootfs partition (TODO: maybe we should pick a better location?)
I think the test agent should store the test results on its own rootfs -- what else could it do?
+0.6, master agent takes over the part of sending the data back to the dispatcher, no hacky/racy webservers, clean code on both ends
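One possible shape for +0.6 (again just an assumption, not a decided design) is to let the dispatcher pull the result bundle through the master agent's existing RPC channel instead of running a throwaway web server of its own:

    import xmlrpclib


    def collect_results(agent_url, bundle_path):
        # read_file() would be a new agent method returning file contents
        # (hypothetical, wrapped in xmlrpclib.Binary for binary payloads).
        agent = xmlrpclib.ServerProxy(agent_url)
        return agent.read_file(bundle_path)

    # bundle = collect_results("http://192.168.1.50:8000/",
    #                          "/mnt/testrootfs/lava/results/bundle.json")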
How does this sound? I just wrote it off the top of my head, no deeper thoughts yet.
I think it's basically fine.
The only thing I'd change is that I don't really see a reason *not* to spam the test output over the serial line and show said spam in the scheduler web UI. We should also store it in neat files on the test image and make sure that those files are what the dashboard's view of events is based on.
Thinking about it, yeah, we may just spam the serial line for now. Ideally I'd like to be able to get perfect separation of sources without losing correlated time. Imagine a scheduler page that has filters for kernel messages (without relying on flaky pattern matching) and can highlight them perfectly in the application run log.
Oh yes, that would be totally awesome. But we shouldn't have a less useful page in the meantime for no good reason.
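For the record, the kind of log record that would enable that filtering might be as simple as this (purely illustrative, no such format exists yet):

    import json
    import time


    def log_record(source, message):
        # A shared timestamp keeps the sources correlated; the tag lets
        # the UI split kernel, console and test output apart without
        # pattern matching. The "source" values are invented examples.
        return json.dumps({
            "ts": time.time(),
            "source": source,   # e.g. "kernel", "console", "lava-test"
            "message": message,
        })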
Cheers, mwh