On Mon, 24 Sep 2018 at 19:07, Tim Jaacks <tim.jaacks@garz-fricke.com> wrote:

Hello everyone,

Is there a way to reset the LAVA database to the state of a fresh installation? From the docs I thought that the "lava-server manage flush" command would do that.

If flush is mentioned in our docs it needs to be removed. It is not a supported operation. It exists because it is available from Django for local developer use.

There should NEVER be a need to reset a LAVA database to job ID #1. Talk to us BEFORE running any lava-server manage commands other than those explicitly listed in our docs or performed during a package upgrade, and the database integrity WILL be maintained (unless you have out-of-tree changes to one of the models.py files). We have done this a lot, we have tested this to the Nth degree and we know it works. There has never been a need to reset a production instance back to test job #1, no matter how bad it looked, provided our documentation is followed properly and admins talk to us before making things worse, as noted in our docs. It takes time, and unscheduled downtime is always a bit of a panic, but that is why our docs strongly recommend having backups. Then you can decide whether it is better to lose a day's data or to turn an error into an unscheduled maintenance window until we have talked to you, privately, about exactly how to get out of your particular mess.

Last week would have been difficult because we were all at our Linaro Connect conference, and timezones plus conference events would have meant we took longer than usual to respond. We would still have tried, IF you had asked.

As a hint, it is NOT supported to downgrade a lava-server instance without talking to us specifically about which version you have and which you want. Database migrations MUST be handled carefully and WILL cause loss of data. (Installing an old version of lava-server without managing the migrations counts as out-of-tree changes to models.py but is still recoverable IF you talk to us before mangling things further by running unsupported commands.)

As advised in the docs, all admins are strongly recommended to have tested backups of the database at all times.

https://master.lavasoftware.org/static/docs/v2/admin-backups.html#creating-backups

However, there are some things missing afterwards, e.g. the lava-health user and the master worker. Any hints on how to repair this? Re-installing lava-server did not help.

A re-install deliberately does not reinitialise the database; it simply ensures that the existing migrations are up to date. This is a requirement of Debian Policy and good practice for all distributions.

You will have to completely drop the PostgreSQL cluster (which will also drop other tables like lavapdu and reset the database roles and user accounts), as covered in the docs for migrating your database from one PostgreSQL major version to the next. Also ensure you remove /etc/lava-server/instance.conf and /etc/lava-server/secret_key.conf, and remove everything in /var/lib/lava-server/, which includes all your test job log files. Then purge the lava-server package. Stop there, and talk to us again.

https://master.lavasoftware.org/static/docs/v2/debian.html#migrating-postgresql-versions
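As a rough illustration only, the files and directories named above can be checked with a small read-only script before you come back to us. This is a sketch, not part of LAVA: the function name remaining_state and the root parameter are made up for this example, and it deletes nothing. It only reports which of the listed paths are still present.

```python
import pathlib

# Paths named in the message above; everything else here is illustrative.
LAVA_STATE_PATHS = (
    "etc/lava-server/instance.conf",
    "etc/lava-server/secret_key.conf",
    "var/lib/lava-server",
)


def remaining_state(root="/"):
    """Return the listed paths that still exist (or are non-empty) under root."""
    found = []
    for rel in LAVA_STATE_PATHS:
        path = pathlib.Path(root) / rel
        if path.is_dir():
            # A directory counts only if something is still inside it.
            if any(path.iterdir()):
                found.append(str(path))
        elif path.exists():
            found.append(str(path))
    return found


if __name__ == "__main__":
    # Needs read access to the directories; reports, never removes.
    for item in remaining_state():
        print("still present:", item)
```

An empty result only says those three locations are clean. It says nothing about the PostgreSQL cluster or the package state, so it does not replace talking to us.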

If you had talked to us before running "flush", this could have been prevented. If you had a backup, this would have been prevented.

If done properly, a package installation will then find a clean database and re-run the initial migrations.

However, I repeat, this should not have been necessary, and we test this mechanism extensively (i.e. several times every day, whether there are code changes or not, for the last 5 years). We cannot help you if you do not talk to us BEFORE running dangerous commands you do not fully understand. We now test this mechanism on EVERY proposed code change through GitLab CI as well.

Once you have a database back MAKE A BACKUP, TEST THAT THE BACKUP WORKS & AUTOMATE MAKING MORE BACKUPS.

No instance you care about should ever be without a backup. A backup you haven't tested is not worth keeping.
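To make the backup advice above concrete, here is a minimal sketch of how a scheduled job might assemble a dump command and a basic archive check. It is an illustration under assumptions, not the procedure from our docs: the database name "lavaserver", the directory /srv/backups/lava and the function backup_commands are all assumed for this example, and it covers only the database, not the configuration or job logs. The sketch only builds the argument lists; running them (for example with subprocess.run as the postgres user) is left to whatever scheduler you use.

```python
import datetime
import pathlib


def backup_commands(db_name, backup_dir, when):
    """Return (dump_cmd, check_cmd) as argument lists; nothing is executed here."""
    stamp = when.strftime("%Y%m%d-%H%M%S")
    archive = pathlib.PurePosixPath(backup_dir) / f"{db_name}-{stamp}.dump"
    # pg_dump's custom format writes an archive that pg_restore can read.
    dump_cmd = ["pg_dump", "--format=custom", f"--file={archive}", db_name]
    # pg_restore --list only reads the archive's table of contents; it is a
    # sanity check, not a substitute for restoring into a scratch instance.
    check_cmd = ["pg_restore", "--list", str(archive)]
    return dump_cmd, check_cmd


if __name__ == "__main__":
    # "lavaserver" and /srv/backups/lava are assumed values for illustration.
    dump, check = backup_commands(
        "lavaserver", "/srv/backups/lava", datetime.datetime.now()
    )
    print(" ".join(dump))
    print(" ".join(check))
```

Listing the archive contents only shows that the file is readable. A backup counts as tested once it has actually been restored somewhere disposable, such as a VM.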

If you don't care about the data in an instance, run it in a VM and blow away the entire VM. Then either recreate the VM or restore from a snapshot.

This is all standard sysadmin stuff and there are plenty of warnings in the docs about talking to us before you get into this kind of mess.

Don't be a hero - TALK TO US before you make things worse. Used properly, there are Django and LAVA operations which have so far never failed to restore a functional lava-server database with minimal loss of data. Every situation is different and we need specific information and exact error messages (and a fair bit of time), but we have not had to do a full reset yet, across 8 years of LAVA and a dozen production instances. Anyone who has had to do that has done it because they have not followed our docs and have not stopped and talked to us before making things worse.

Mit freundlichen Grüßen / Best regards
i.A. Tim Jaacks

Software Engineering
Garz & Fricke GmbH
Tempowerkring 2, 21079 Hamburg - Germany
Amtsgericht Hamburg HRB 60514
Geschäftsführer: Manfred Garz, Matthias Fricke
Phone: +49 40 791899 - 55
Fax: +49 40 791899 – 39
tim.jaacks@garz-fricke.com
www.garz-fricke.com

Neil Williams
=============
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/