On Tue, Mar 20, 2012 at 12:52:48PM +0100, Zygmunt Krynicki wrote:
On 20.03.2012 11:48, Alexander Sack wrote:
On Tue, Mar 20, 2012 at 11:41:37AM +0100, Zygmunt Krynicki wrote:
Hi
Experimenting with the dispatcher made me realize that forced reboots (on timeouts, for example) are an excellent way to damage the master image. At the very best we are forced to re-check the master image. At the very worst we may damage the superblock and generally hose the master.
Do you think it is feasible to mount the master read-only and only do r/w work on the test partitions?
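A minimal sketch of what that could look like on the board, assuming the dispatcher simply shells out to mount; the test partition device and mount point are made-up placeholders, not existing dispatcher code:

import os
import subprocess

def mount_master_readonly(test_dev="/dev/mmcblk0p5"):
    # Remount the already-mounted master root read-only, so a forced
    # reboot or power cut cannot dirty its filesystem.
    subprocess.check_call(["mount", "-o", "remount,ro", "/"])
    # Keep all read-write work on the dedicated test partition
    # (the device name above is just an example).
    os.makedirs("/mnt/test", exist_ok=True)
    subprocess.check_call(["mount", "-o", "rw", test_dev, "/mnt/test"])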
I like this idea... Combined with always-poweroff-on-reboot, it feels like a good way to compensate for potential issues...
Just curious: why would we always power off on reboot? Do you mean actual power being cut, or the equivalent of poweroff(8)?
That has a different motivation. The key requirement for the automation infrastructure is to ensure that each individual test runs in a controlled environment, with as close to 100% reproducibility of the state as possible.
Soft-rebooting the unit is not guaranteed to bring you back to a known base state. Hence the requirement to always hard reboot, with proper time unpowered in between.
Take the approach known from live CDs into account, such as aufs/unionfs, and things should work well... Maybe the master image doesn't even need a partition anymore, but can be just a .img file on the FAT boot partition, just like how the Ubuntu live CD works...
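One way that union mount could look, assuming aufs is built into the master kernel; the image path and mount points are illustrative only:

import os
import subprocess

def union_mount_master(image="/boot/master.img"):
    for d in ("/mnt/master-ro", "/mnt/master-rw", "/mnt/master"):
        os.makedirs(d, exist_ok=True)
    # Loop-mount the master image read-only, straight from the boot partition.
    subprocess.check_call(["mount", "-o", "loop,ro", image, "/mnt/master-ro"])
    # All writes go to a tmpfs branch that vanishes on power-off.
    subprocess.check_call(["mount", "-t", "tmpfs", "tmpfs", "/mnt/master-rw"])
    # Stack the writable branch over the read-only master.
    subprocess.check_call([
        "mount", "-t", "aufs",
        "-o", "br=/mnt/master-rw=rw:/mnt/master-ro=ro",
        "none", "/mnt/master",
    ])

The tmpfs branch is what drives the memory question below: anything the master writes stays in RAM until the next power cycle.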
I wonder what the complexity of this approach is. I would also like to consider the memory requirements. As an alternative, we could try to mount the master image from NBD: the NBD server already supports "reverting to snapshot" and keeping a delta for each connected client in a temporary file (see the sketch below).
Considering that the master image is nano and boots to the console only, I don't think the memory requirements would exceed what we target the LAVA lab at.
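A rough sketch of the server side, assuming nbd-server's copy-on-write mode is what provides the per-client delta described above; the export path and config location are made up for illustration:

import subprocess
import textwrap

# Hypothetical export of the master image with per-client copy-on-write:
# each client's writes land in a temporary diff file that nbd-server
# discards when that client disconnects.
NBD_CONF = textwrap.dedent("""\
    [generic]
    [master]
        exportname = /srv/lava/master.img
        copyonwrite = true
""")

def serve_master(conf_path="/tmp/lava-nbd.conf"):
    with open(conf_path, "w") as f:
        f.write(NBD_CONF)
    # Point nbd-server at the generated config.
    subprocess.check_call(["nbd-server", "-C", conf_path])

Boards would then attach with nbd-client and mount the exported device, which is exactly the kind of extra moving part the replication concern below is about.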
What I don't like about NBD is that it makes the LAVA infrastructure more complex and harder to replicate. Every time we add a new server/service that isn't the image/board itself, we diverge a bit further from something that can be validated and released efficiently/effectively.
Can anyone think of a reason not to put that into the backlog?
If the LAVA team decides to investigate that path, please check with the DevPlatform team on how they can help...
I think we should seriously consider it as a milestone towards LAVA reliability and automation of master image construction.
Do we have a few empirical examples from the gathered list of LAVA incidents that allow us to identify changes to the master image (not talking about reproducibility here) as a recurring source of unreliability?