[Linaro-validation] De-emphasize serial connection for LAVA Dispatcher, introduce LAVA Master Agent, LAVA Test Agent
zygmunt.krynicki at linaro.org
Fri Mar 30 01:52:03 UTC 2012
On 30.03.2012 03:26, Le.chi Thu wrote:
> definition: DUT = device under test
> I do not agree with changing the dispatcher. The suggested solution will
> not solve the root problem LAVA is facing today (an unstable system) and
> also puts an unnecessary constraint on LAVA by only allowing DUT-based
> test scenarios (I mean tests that run isolated on the DUT).
> There are many tests which require host / DUT communication during
> test execution. In those, the test scenario is on the host side and sends
> commands to the target to perform actions on the DUT.
In all of our current tests we don't really need to send anything; we
only do so because that's how we started.
> For example, if you test WLAN roaming, the test scenario is on the host,
> controlling both the WLAN simulator and the DUT.
When we cross that bridge we can think about it. I'm not convinced it's
not possible to do that without talking to the machine that controls
DUTs. Remember that I only want to eradicate the absolute abuse of the
serial line, not every means of communication. In a specialized test where
you really absolutely have to talk to the test controller we could have
a way of doing that. It still does not make the generic pattern of
copying the scenario to the test image and running it there via an agent
Now our issues are:
1) Serial lines lose data
2) In our current architecture that means losing the job (if unlucky).
3) We have very poor code running on the DUT (things like retrying HTTP
requests, which would otherwise be easy to do) because it's basically
limited to whatever we have in busybox/coreutils.
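
To illustrate point 3: with a real programming language available on the
DUT, retrying a download is a few lines. This is only a sketch, assuming
Python 3 is available in the test image; the names retry() and fetch() are
made up for this example and are not part of LAVA.

```python
import time
import urllib.request


def retry(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:  # urllib.error.URLError is a subclass of OSError
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


def fetch(url, timeout=30):
    """Download url and return the response body as bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

An agent could then call something like retry(lambda: fetch(image_url)),
where image_url is whatever the job definition supplies. Doing the same
thing with busybox wget driven over a serial shell is much harder to get
right.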
> Another example is multiple-DUT tests.
No, that's perfectly possible. If the network works, all devices are free
to talk to one another. I just don't want them to need to talk to the
> The serial port problem is a side effect of the LAVA server being
> overloaded. Same with the 'wget image' problem.
Even if serial worked 100% reliably today I'd like to get rid of it as
the architecture is flaky. Sending shell across the wire and hoping for
the best is not the right way to do it.
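
As a rough sketch of the alternative to raw shell text: if every message
carries its length and a checksum, the receiver can tell a damaged command
from a good one instead of executing whatever arrives. The frame layout
below (8 hex digits of length, 8 hex digits of CRC32, then a JSON payload)
is purely an assumption for illustration, not an existing LAVA format.

```python
import json
import zlib


def encode_frame(message):
    """Serialize a dict as: 8 hex digits length + 8 hex digits CRC32 + JSON."""
    payload = json.dumps(message, sort_keys=True).encode("utf-8")
    header = "{:08x}{:08x}".format(len(payload), zlib.crc32(payload))
    return header.encode("ascii") + payload


def decode_frame(frame):
    """Return the decoded dict, or raise ValueError if the frame is damaged."""
    if len(frame) < 16:
        raise ValueError("truncated header")
    try:
        length = int(frame[:8], 16)
        checksum = int(frame[8:16], 16)
    except ValueError:
        raise ValueError("corrupt header")
    payload = frame[16:]
    if len(payload) != length:
        raise ValueError("length mismatch")  # bytes were lost or added
    if zlib.crc32(payload) != checksum:
        raise ValueError("checksum mismatch")  # bytes were altered
    return json.loads(payload.decode("utf-8"))
```

With this, a dropped byte shows up as a ValueError on the receiving side
rather than as a silently different shell command.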
> The solution is to not overload the LAVA server. Possible solutions are:
> * Make the scheduler more intelligent and schedule jobs out evenly
> (it makes no sense to start more jobs than the LAVA server can handle)
Since we don't know how much "too much" is, this will never solve anything.
> * Distribute heavy task to cloud instances
This will happen anyway, we need to scale to other machines.
> * Update lava-dispatcher to retry when some operations fail.
In the current implementation you cannot sensibly retry stuff over
shell. You need some semblance of an API to even attempt that. It's like
trying to build a reliable protocol on top of UDP messages without getting
any ack from the other side. If we lose a byte in the middle of a command
(or a whole block and a ton of logging along with that) we just cannot
assume it's safe to try again.
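
To make the ack argument concrete, here is a minimal sketch of what makes a
retry safe: each request carries an id, the agent remembers which ids it
has already executed, and the caller resends until it sees a matching
acknowledgement. Agent, LossyLink and call() are hypothetical names for
this example only; LossyLink is a stand-in for an unreliable link, not real
transport code.

```python
import itertools


class Agent:
    """Hypothetical agent on the DUT: runs each request id at most once."""

    def __init__(self):
        self.results = {}   # request id -> cached result
        self.executed = []  # commands that were actually run

    def handle(self, request_id, command):
        if request_id not in self.results:
            self.executed.append(command)
            self.results[request_id] = "done: " + command
        # A repeated id gets the cached result; the command is not re-run.
        return request_id, self.results[request_id]


class LossyLink:
    """Stand-in for an unreliable link; drops messages per a fixed pattern."""

    def __init__(self, agent, drop_pattern):
        self.agent = agent
        self.drops = iter(drop_pattern)

    def send(self, request_id, command):
        drop = next(self.drops, None)
        if drop == "request":
            return None  # request lost, the agent never sees it
        reply = self.agent.handle(request_id, command)
        if drop == "reply":
            return None  # command ran, but the ack was lost
        return reply


_request_ids = itertools.count(1)


def call(link, command, attempts=5):
    """Send command and resend it until a matching ack arrives."""
    request_id = next(_request_ids)
    for _ in range(attempts):
        reply = link.send(request_id, command)
        if reply is not None and reply[0] == request_id:
            return reply[1]
    raise TimeoutError("no acknowledgement for request %d" % request_id)
```

The important case is the lost ack: the command has already run, so
without the request id a blind retry would run it twice. Text pushed into
a shell over a serial line gives us nothing comparable.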
Linaro Validation Team