Hello to LAVA admins,
You have a fatal error in LAVA V2. I submitted a job, which hangs in the submitted state indefinitely.
I also see the non-existent job 92 (which is deleted) running there?! http://localhost:8080/scheduler/device/bbb01
254  2018-02-21 12:05  Reserved → Running  Job 92 running
253  2018-02-21 12:04  Idle → Reserved     Reserved for job 92
252  2018-02-21 12:03  Running → Idle      Job 91 cancelled
251  2018-02-21 12:03  Reserved → Running  Job 91 running
250  2018-02-21 12:03  Idle → Reserved     Reserved for job 91
249  2018-02-21 11:33  Running → Idle      Job 89 has ended. Setting job status Incomplete
But when I look to this local pointer: http://localhost:8080/admin/lava_scheduler_app/testjob/
102  Submitted   nobodyless  beaglebone-black  -                                Feb. 21, 2018, 1:07 p.m.   -                          -
76   Incomplete  nobodyless  beaglebone-black  bbb01 (Running, health Unknown)  Feb. 20, 2018, 9:50 a.m.   Feb. 20, 2018, 9:50 a.m.   Feb. 20, 2018, 9:55 a.m.
49   Complete    nobodyless  qemu              qemu01 (Idle, health Unknown)    Feb. 16, 2018, 10:44 a.m.  Feb. 16, 2018, 10:44 a.m.  Feb. 16, 2018, 10:50 a.m.
43   Complete    nobodyless  qemu              qemu01 (Idle, health Unknown)    Feb. 16, 2018, 8:42 a.m.   Feb. 16, 2018, 8:43 a.m.   Feb. 16, 2018, 8:49 a.m.
How do I force a submitted job to become the active job (from both CLI and GUI)?
How do I delete nonexistent jobs (from both CLI and GUI)?
Thank you, Zoran
On 21 February 2018 at 13:33, Zoran S zoran.stojsavljevic.de@gmail.com wrote:
> Hello to LAVA admins,
> You have a fatal error in LAVA V2. I submitted a job, which hangs in the submitted state indefinitely.
This will be down to local misconfiguration. Submitted jobs will stay in submitted until everything is ready for the job to start.
What version of LAVA are you running? There were important changes in scheduling after the removal of V1 in 2018.1.
> I also see the non-existent job 92 (which is deleted) running there?!
Test jobs should not typically be deleted, except possibly as part of archival.
> http://localhost:8080/scheduler/device/bbb01
> 254  2018-02-21 12:05  Reserved → Running  Job 92 running
We don't use this format for device transitions anymore. There were issues with the old transitions but those could not be addressed until V1 was removed.
https://validation.linaro.org/scheduler/device/beaglebone-black02
> 253  2018-02-21 12:04  Idle → Reserved     Reserved for job 92
> 252  2018-02-21 12:03  Running → Idle      Job 91 cancelled
> 251  2018-02-21 12:03  Reserved → Running  Job 91 running
> 250  2018-02-21 12:03  Idle → Reserved     Reserved for job 91
> 249  2018-02-21 11:33  Running → Idle      Job 89 has ended. Setting job status Incomplete
> But when I look to this local pointer: http://localhost:8080/admin/lava_scheduler_app/testjob/
> 102  Submitted   nobodyless  beaglebone-black  -                                Feb. 21, 2018, 1:07 p.m.   -                          -
> 76   Incomplete  nobodyless  beaglebone-black  bbb01 (Running, health Unknown)  Feb. 20, 2018, 9:50 a.m.   Feb. 20, 2018, 9:50 a.m.   Feb. 20, 2018, 9:55 a.m.
> 49   Complete    nobodyless  qemu              qemu01 (Idle, health Unknown)    Feb. 16, 2018, 10:44 a.m.  Feb. 16, 2018, 10:44 a.m.  Feb. 16, 2018, 10:50 a.m.
> 43   Complete    nobodyless  qemu              qemu01 (Idle, health Unknown)    Feb. 16, 2018, 8:42 a.m.   Feb. 16, 2018, 8:43 a.m.   Feb. 16, 2018, 8:49 a.m.
> How do I force a submitted job to become the active job (from both CLI and GUI)?
Check the lava-master and lava-slave logs to find the misconfiguration. It is likely to be an invalid state of the device and/or test job, or a misconfigured device configuration.
> How do I delete nonexistent jobs (from both CLI and GUI)?
Not recommended. lava-server manage has helpers to delete objects, but the first thing to do is debug what is happening on your localhost.
> Thank you, Zoran
_______________________________________________
Lava-users mailing list
Lava-users@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lava-users
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
Hello Neil,
This does not help me at all. I ran the qemu01 device, which was perfectly OK. Now it is running, and stuck in the running state. I cancelled this job, but it is stuck in the cancelling state!?
I need a method to reset the whole of LAVA.
> What version of LAVA are you running? There were important changes in scheduling after the removal of V1 in 2018.1
root@stretch:/etc/lava-server/dispatcher-config/devices# dpkg -l lava-server lava-dispatcher
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name             Version          Architecture  Description
+++-================-================-=============-===================================================
ii  lava-dispatcher  2017.7-1~bpo9+1  amd64         Linaro Automated Validation Architecture dispatcher
ii  lava-server      2017.7-1~bpo9+1  all           Linaro Automated Validation Architecture server
root@stretch:/etc/lava-server/dispatcher-config/devices#
How can I upgrade to the latest LAVA? I know that apt-get upgrade Lava does do the magic.
> Check the lava-master and lava-slave logs to find the misconfiguration. It's likely to be invalid state of the device and/or test job or misconfigured device configuration.
No idea where the logs are? /var/log/apache2/lava-server.log ???
No idea what the output means.
"Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.0.2.2 - - [21/Feb/2018:13:10:04 +0000] "GET /static/docs/v2/_static/js/jquery-fix.js HTTP/1.1" 200 446 "http://localhost:8080/static/docs/v2/simple-admin.html" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.0.2.2 - - [21/Feb/2018:13:10:04 +0000] "GET /static/docs/v2/_static/bootstrap-3.3.6/js/bootstrap.min.js HTTP/1.1" 200 10151 "http://localhost:8080/static/docs/v2/simple-admin.html" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.0.2.2 - - [21/Feb/2018:13:10:04 +0000] "GET /static/docs/v2/_static/bootstrap-sphinx.js HTTP/1.1" 200 2254 "http://localhost:8080/static/docs/v2/simple-admin.html" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.0.2.2 - - [21/Feb/2018:13:10:04 +0000] "GET /static/docs/v2/_static/jquery.js HTTP/1.1" 200 79459 "http://localhost:8080/static/docs/v2/simple-admin.html" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.0.2.2 - - [21/Feb/2018:13:10:04 +0000] "GET /static/docs/v2/_static/js/jquery-1.11.0.min.js HTTP/1.1" 200 33722 "http://localhost:8080/static/docs/v2/simple-admin.html" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.0.2.2 - - [21/Feb/2018:13:10:04 +0000] "GET /static/docs/v2/_static/lava.png HTTP/1.1" 200 990 "http://localhost:8080/static/docs/v2/simple-admin.html" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.0.2.2 - - [21/Feb/2018:13:10:05 +0000] "GET /static/docs/v2/admin-backups.html HTTP/1.1" 200 8382 "http://localhost:8080/static/docs/v2/simple-admin.html" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
10.0.2.2 - - [21/Feb/2018:13:10:05 +0000] "GET /static/docs/v2/_static/favicon.ico HTTP/1.1" 200 807 "-" "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0"
Thank you, Zoran _______
On 21 February 2018 at 13:58, Zoran S zoran.stojsavljevic.de@gmail.com wrote:
> Hello Neil,
> This does not help me at all. I ran the qemu01 device, which was perfectly OK. Now it is running, and stuck in the running state. I cancelled this job, but it is stuck in the cancelling state!?
> I need a method to reset the whole of LAVA.
Not necessarily. You can do that, but you are better off learning about the problems and the causes. LAVA is not a small utility that you reset at the first sign of problems. Everyone installing LAVA needs some level of administrative skills and that can involve a learning curve. apt-get install is only the very beginning of the work.
>> What version of LAVA are you running? There were important changes in scheduling after the removal of V1 in 2018.1
> root@stretch:/etc/lava-server/dispatcher-config/devices# dpkg -l lava-server lava-dispatcher
> Desired=Unknown/Install/Remove/Purge/Hold
> | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
> |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
> ||/ Name             Version          Architecture  Description
> +++-================-================-=============-===================================================
> ii  lava-dispatcher  2017.7-1~bpo9+1  amd64         Linaro Automated Validation Architecture dispatcher
> ii  lava-server      2017.7-1~bpo9+1  all           Linaro Automated Validation Architecture server
> root@stretch:/etc/lava-server/dispatcher-config/devices#
> How can I upgrade to the latest LAVA? I know that apt-get upgrade Lava does do the magic.
The documentation covers upgrades: https://validation.linaro.org/static/docs/v2/installing_on_debian.html#lava-repositories
>> Check the lava-master and lava-slave logs to find the misconfiguration. It's likely to be invalid state of the device and/or test job or misconfigured device configuration.
> No idea where the logs are? /var/log/apache2/lava-server.log ???
This is also in the documentation. https://validation.linaro.org/static/docs/v2/simple-admin.html#where-to-find-debug-information
> No idea what the output means.
There is Apache documentation to help you with this, but Apache is only serving the pages, not doing the scheduling.
> Thank you, Zoran
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
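As a rough illustration of the advice above (look at the lava-master and lava-slave logs rather than the Apache access log), here is a minimal Python sketch that prints only the warning and error lines from the tail of each log. The two file paths are assumptions about a default Debian install and should be checked against the "where to find debug information" documentation for the installed version; the keyword list and the helper name suspicious_lines are likewise illustrative, not part of LAVA.

```python
from pathlib import Path

# ASSUMPTION: default log locations on a Debian install. Confirm them in the
# LAVA documentation for your version before relying on these paths.
CANDIDATE_LOGS = [
    Path("/var/log/lava-server/lava-master.log"),
    Path("/var/log/lava-dispatcher/lava-slave.log"),
]

# ASSUMPTION: the daemons log with the usual level names.
KEYWORDS = ("ERROR", "WARNING", "Traceback")


def suspicious_lines(path, keywords=KEYWORDS, last=200):
    """Return the lines among the final `last` lines of `path` that contain a keyword.

    A missing or unreadable file yields an empty list instead of an exception.
    """
    try:
        lines = Path(path).read_text(errors="replace").splitlines()
    except OSError:
        return []
    return [line for line in lines[-last:] if any(k in line for k in keywords)]


if __name__ == "__main__":
    for log in CANDIDATE_LOGS:
        print(f"== {log}")
        for line in suspicious_lines(log):
            print(line)
```

Run it as root (or as a user allowed to read the logs) on the machine hosting LAVA. Reading only the tail keeps the output focused on what happened around the stuck job.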
Hello Neil,
I found the reason why the whole of LAVA behaves so strangely: the VirtualBox disk stretch.vmdk reached its limit (device full).
Since stretch.vmdk is a fixed-size file, I cloned it into stretch.vdi (a dynamic virtual disk), which is 4x the size of stretch.vmdk. So far, everything on the new (cloned) VM up to LAVA (and there is a lot of software running before LAVA) behaves correctly.
/dev/sda1 38240460 8160436 28118092 23% /
I have tried again to run the scheduler with qemu01, which worked perfectly before. It behaves the same. The job (qemu) stays in the submitted state indefinitely.
This job (qemu) should NOT depend on anything else, correct? If it does (on other, independent jobs), then the whole LAVA project is created with the wrong architecture?!
It is very clear to me that LAVA does not behave correctly. It is LAVA's fault, I am sure.
Please provide a way to reset the whole of LAVA in a synchronized way. Commands such as service lava-server restart do not work.
Thank you, Zoran _______
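For reference, the df line quoted above can be checked by hand, and the same check can be automated so that a full disk is noticed before it damages anything. The sketch below is only an illustration: the helper names and the 10% threshold are arbitrary choices, and the rounding rule (used divided by used plus available, rounded up) is how GNU df is understood to compute Use%.

```python
import math
import shutil


def parse_df_line(line):
    """Split one data line of `df` output into (filesystem, total, used, avail, mount)."""
    filesystem, total, used, avail, _percent, mount = line.split()
    return filesystem, int(total), int(used), int(avail), mount


def percent_used(used_kb, avail_kb):
    """Use% as df reports it: used / (used + available), rounded up."""
    return math.ceil(100 * used_kb / (used_kb + avail_kb))


def has_room(path="/", min_free_fraction=0.10):
    """True if at least `min_free_fraction` of the filesystem holding `path` is free.

    The 10% default is an arbitrary illustrative threshold.
    """
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


if __name__ == "__main__":
    # The df line from the message above.
    _, _, used, avail, mount = parse_df_line("/dev/sda1 38240460 8160436 28118092 23% /")
    print(mount, percent_used(used, avail), "% used")
    print("enough free space on /:", has_room("/"))
```

With the figures from the message, 8160436 / (8160436 + 28118092) is about 22.5%, which rounds up to the 23% that df printed, so the resized disk itself is no longer the limiting factor.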
On 22 February 2018 at 08:57, Zoran S zoran.stojsavljevic.de@gmail.com wrote:
> Hello Neil,
> I found the reason why the whole of LAVA behaves so strangely: the VirtualBox disk stretch.vmdk reached its limit (device full).
> Since stretch.vmdk is a fixed-size file, I cloned it into stretch.vdi (a dynamic virtual disk), which is 4x the size of stretch.vmdk. So far, everything on the new (cloned) VM up to LAVA (and there is a lot of software running before LAVA) behaves correctly.
> /dev/sda1 38240460 8160436 28118092 23% /
> I have tried again to run the scheduler with qemu01, which worked perfectly before. It behaves the same. The job (qemu) stays in the submitted state indefinitely.
Then this is a local issue in the database in that VM resulting from the local problem of ENOSPC (no space left on device). Sorry, but this is not something we can solve; it has to be resolved locally.
> This job (qemu) should NOT depend on anything else, correct?
It is a test job. It is dependent on device configuration like any other. It is dependent on database configuration like any other. It looks like there is a problem in the database state on your local VM. You will need to use the django admin interface and the available log files to resolve that, inside that VM.
> If it does (on other, independent jobs), then the whole LAVA project is created with the wrong architecture?!
Sorry, that makes no sense to me. QEMU can support multiple architectures.
> It is very clear to me that LAVA does not behave correctly. It is LAVA's fault, I am sure.
Sorry, this is a problem inside the database in your local VM. It results from your local VM running out of space and possibly from other issues inside the VM, particularly some of the current values in the database. That can only be resolved by running commands on that database.
> Please provide a way to reset the whole of LAVA in a synchronized way. Commands such as service lava-server restart do not work.
Please understand that a service restart just restarts a single process; the problem is in the database which that process then uses. That needs investigation, not a reset.
Depending on how much data you have in that VM, it might be best to throw it away and start again with a fresh installation without space issues, taking your time to go through the documentation thoroughly and *carefully*. Things get a lot easier if you have a dedicated machine to run Debian Stretch instead of relying on virtual machines; however, we regularly test LAVA installations inside QEMU VMs too. If you have a backup of the database, then restore the backup.
https://staging.validation.linaro.org/static/docs/v2/admin-backups.html
https://staging.validation.linaro.org/results/query/~neil.williams/staging-f...
We also regularly test running jobs in a fresh virtual machine installation.
This particular VM needs work to investigate the database issues before it can be upgraded. Some of the values for device state and test job state have been corrupted and you'll find more information in the log files.
> Thank you, Zoran
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
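To make the "corrupted device state and test job state" point concrete, the sketch below cross-checks the two listings pasted at the start of this thread. It is a hand-built illustration only: it does not query the LAVA database, the tuples are copied from the thread (and that job listing may be partial), and the helper names are made up for this example.

```python
# Rows copied by hand from the admin job listing quoted earlier in the thread:
# (job id, job status, device hostname or None, device state or None)
JOBS = [
    (102, "Submitted", None, None),
    (76, "Incomplete", "bbb01", "Running"),
    (49, "Complete", "qemu01", "Idle"),
    (43, "Complete", "qemu01", "Idle"),
]

# Job ids mentioned in the bbb01 transition log quoted earlier in the thread.
TRANSITION_JOB_IDS = [92, 91, 89]


def running_without_job(jobs):
    """Devices shown as Running although no listed job on them is Running."""
    shown_running = {dev for _, _, dev, state in jobs if state == "Running"}
    actually_busy = {dev for _, status, dev, _ in jobs if status == "Running" and dev}
    return sorted(shown_running - actually_busy)


def unlisted_jobs(jobs, referenced_ids):
    """Job ids referenced by device transitions but absent from the job listing."""
    known = {job_id for job_id, *_ in jobs}
    return sorted(job_id for job_id in referenced_ids if job_id not in known)


if __name__ == "__main__":
    print("devices Running with no running job:", running_without_job(JOBS))
    print("jobs referenced but not listed:", unlisted_jobs(JOBS, TRANSITION_JOB_IDS))
```

On the data from this thread it reports bbb01 as Running with no running job, and jobs 89, 91 and 92 as referenced by the device but missing from the listing. That is the kind of mismatch to look for in the django admin interface before deciding whether to repair the database or reinstall.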
> Sorry, this is a problem inside the database in your local VM resulting from the problems arising from your local VM running out of space and possibly other issues inside the VM, particularly some of the current values in the database. That can only be resolved by running commands on that database.
Can you provide me a way of throwing away the whole existing database inside LAVA? I do not want to uninstall LAVA if I do not need to.
> https://staging.validation.linaro.org/results/query/~neil.williams/staging-f...
As I understand it from this link, I can uninstall the whole of LAVA with the command: apt-get remove lava
And then use the provided .yaml files to re-install LAVA using ansible-playbooks, with (for example) the .yaml file at: https://staging.validation.linaro.org/scheduler/job/211190/definition
Am I correct?
Thank you, Zoran
On Thu, Feb 22, 2018 at 10:14 AM, Neil Williams neil.williams@linaro.org wrote:
On 22 February 2018 at 08:57, Zoran S zoran.stojsavljevic.de@gmail.com wrote:
Hello Neil,
I found the cause why the whole Lava behaves insane. It is that VM Virtual Box stretch.vmdk reached the limit (device full).
Since stretch.vmdk is fixed size file, I've managed to clone and create stretch.vdi (dynamic Virtual disk), which is 4x size of stretch .vmdk. So far the new (cloned) VM, till Lava (and there are tons of SW running prior Lava) behave very correctly.
/dev/sda1 38240460 8160436 28118092 23% /
I have tried again to run the scheduler, with qemu01 which perfectly worked before. It behaves the same. The job (qemu) stays indefinitely in submitted state.
Then this is a local issue in the database in that VM resulting from the local problem of ENOSPACE. Sorry, but this is not something we can solve, it has to be resolved locally.
This job (qemu) should be NOT dependent of anything else, correct?
It is a test job. It is dependent on device configuration like any other. It is dependent on database configuration like any other. It looks like there is a problem in the database state on your local VM. You will need to use the django admin interface and the available log files to resolve that, inside that VM.
If it is (of other independent jobs), the whole Lava project is created as wrong architecture?!
Sorry, that makes no sense to me. QEMU can support multiple architectures.
It is very clear to me that Lava does not behave correctly. It is Lava's fault, I am sure.
Sorry, this is a problem inside the database in your local VM resulting from the problems arising from your local VM running out of space and possibly other issues inside the VM, particularly some of the current values in the database. That can only be resolved by running commands on that database.
Please, provide the way to reset the whole Lava in some synch-ed way.
Commands service lava-server restart and similar do not work.
Please understand that a service restart just restarts a single process - the problem is in the database which that process then uses. That needs investigation, not a reset.
Depending on how much data you have in that VM, it might be best to throw it away and start again with a fresh installation without space issues and taking your time to go through the documentation thoroughly and *carefully*. Things get a lot easier if you have a dedicated machine to run Debian Stretch instead of relying on virtual machines, however, we regularly test LAVA installations inside QEMU VMs too. If you have a backup of the database, then restore the backup.
https://staging.validation.linaro.org/static/docs/v2/admin-backups.html
https://staging.validation.linaro.org/results/query/~neil.williams/staging-f...
We also regularly test running jobs in a fresh virtual machine installation.
This particular VM needs work to investigate the database issues before it can be upgraded. Some of the values for device state and test job state have been corrupted and you'll find more information in the log files.
Thank you, Zoran
On Wed, Feb 21, 2018 at 3:05 PM, Neil Williams neil.williams@linaro.org wrote:
On 21 February 2018 at 13:58, Zoran S zoran.stojsavljevic.de@gmail.com wrote:
Hello Neil,
This does not help me at all. I ran the qemu01 device, which was perfectly OK. Now it is running, and stuck in the running state. I cancelled this job, but it is stuck in the cancelling state!?
I need method to reset the whole LAVA.
Not necessarily. You can do that but you are better off learning about the problems and the causes. LAVA is not a small utility that you reset at the first sign of problems. Everyone installing LAVA needs some level of administrative skills and that can involve a learning curve. apt-get install is only the very beginning of the work.
What version of LAVA are you running? There were important changes in scheduling after the removal of V1 in 2018.1
root@stretch:/etc/lava-server/dispatcher-config/devices# dpkg -l lava-server lava-dispatcher
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==================================-======================-======================-=========================================================================
ii lava-dispatcher 2017.7-1~bpo9+1 amd64 Linaro Automated Validation Architecture dispatcher
ii lava-server 2017.7-1~bpo9+1 all Linaro Automated Validation Architecture server
root@stretch:/etc/lava-server/dispatcher-config/devices#
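The dpkg listing above answers the version question: both packages are 2017.7, which predates the 2018.1 release mentioned earlier. As a rough sketch, the same comparison can be scripted against pasted dpkg -l output. The helper names are hypothetical, and the parsing assumes the year.month version scheme seen in this listing.

```python
import re

# Matches installed ("ii") package lines with a YYYY.M version prefix.
DPKG_LINE = re.compile(r"^ii\s+(\S+)\s+(\d{4})\.(\d{1,2})\S*\s")

SAMPLE = """\
ii lava-dispatcher 2017.7-1~bpo9+1 amd64 Linaro Automated Validation Architecture dispatcher
ii lava-server 2017.7-1~bpo9+1 all Linaro Automated Validation Architecture server
"""


def lava_versions(dpkg_output):
    """Map package name -> (year, month) for installed packages."""
    versions = {}
    for line in dpkg_output.splitlines():
        match = DPKG_LINE.match(line.strip())
        if match:
            name, year, month = match.groups()
            versions[name] = (int(year), int(month))
    return versions


def predates(versions, release=(2018, 1)):
    """Return the sorted names of packages older than `release`."""
    return sorted(name for name, ver in versions.items() if ver < release)


if __name__ == "__main__":
    print(predates(lava_versions(SAMPLE)))
    # prints ['lava-dispatcher', 'lava-server']
```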
How can I upgrade to the latest Lava? I know that apt-get upgrade Lava does do the magic.
The documentation covers upgrades -
https://validation.linaro.org/static/docs/v2/installing_on_debian.html#lava-...
Check the lava-master and lava-slave logs to find the misconfiguration. It is likely to be an invalid state of the device and/or test job, or a misconfigured device configuration.
No idea where the logs are? /var/log/apache2/lava-server.log ???
This is also in the documentation.
https://validation.linaro.org/static/docs/v2/simple-admin.html#where-to-find...
No idea what the output means.
There is apache documentation to help you with this but apache is doing the serving of the pages, not the scheduling.
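Because the apache log only covers page serving, the scheduler-side logs are the ones to scan. The sketch below filters a log file for lines that usually deserve a closer look. The two default paths and the keyword list are assumptions and should be checked against the documentation linked above.

```python
# Assumed default locations; adjust to match the local installation.
DEFAULT_LOGS = [
    "/var/log/lava-server/lava-master.log",
    "/var/log/lava-dispatcher/lava-slave.log",
]

# Assumed markers of trouble in Python-style log output.
KEYWORDS = ("ERROR", "WARNING", "Traceback")


def suspicious_lines(path, keywords=KEYWORDS, limit=20):
    """Return the last `limit` lines of `path` containing any keyword.

    A missing or unreadable file yields an empty list.
    """
    try:
        with open(path, errors="replace") as handle:
            lines = handle.readlines()
    except OSError:
        return []
    hits = [
        line.rstrip("\n")
        for line in lines
        if any(word in line for word in keywords)
    ]
    return hits[-limit:]


if __name__ == "__main__":
    for log in DEFAULT_LOGS:
        print(f"== {log}")
        for line in suspicious_lines(log):
            print(line)
```

Reading these files normally requires root or membership of a group with log access.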
Thank you, Zoran
On Wed, Feb 21, 2018 at 2:45 PM, Neil Williams neil.williams@linaro.org wrote:
On 21 February 2018 at 13:33, Zoran S zoran.stojsavljevic.de@gmail.com wrote:
Hello to LAVA admins,
You have fatal error in Lava V2. I submitted job, which hangs in submitted state indefinitely.
This will be down to local misconfiguration. Submitted jobs will stay in submitted until everything is ready for the job to start.
What version of LAVA are you running? There were important changes in scheduling after the removal of V1 in 2018.1
I also see the non-existent job 92 (which is deleted) running there?!
Test jobs should not typically be deleted - except possibly as part of archival.
http://localhost:8080/scheduler/device/bbb01
254 2018-02-21 12:05 Reserved → Running — Job 92 running
We don't use this format for device transitions anymore. There were issues with the old transitions but those could not be addressed until V1 was removed.
https://validation.linaro.org/scheduler/device/beaglebone-black02
253 2018-02-21 12:04 Idle → Reserved — Reserved for job 92
252 2018-02-21 12:03 Running → Idle — Job 91 cancelled
251 2018-02-21 12:03 Reserved → Running — Job 91 running
250 2018-02-21 12:03 Idle → Reserved — Reserved for job 91
249 2018-02-21 11:33 Running → Idle — Job 89 has ended. Setting job status Incomplete
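The mismatch described here, a device log that refers to jobs the job table no longer contains, can be checked offline from the pasted text. This is only a sketch over copied page output in the old transition format, not a LAVA API. The function names are hypothetical, and the set of known job IDs is taken from the admin listing quoted in this thread.

```python
import re

# One line of the old-style device transition log.
TRANSITION = re.compile(
    r"^(\d+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+(\w+)\s*→\s*(\w+)\s*—\s*(.*)$"
)
JOB_ID = re.compile(r"[Jj]ob (\d+)")

SAMPLE = """\
254 2018-02-21 12:05 Reserved → Running — Job 92 running
253 2018-02-21 12:04 Idle → Reserved — Reserved for job 92
252 2018-02-21 12:03 Running → Idle — Job 91 cancelled
251 2018-02-21 12:03 Reserved → Running — Job 91 running
250 2018-02-21 12:03 Idle → Reserved — Reserved for job 91
249 2018-02-21 11:33 Running → Idle — Job 89 has ended. Setting job status Incomplete
"""


def parse(text):
    """Parse transition lines into dicts, oldest first."""
    rows = []
    for line in text.splitlines():
        m = TRANSITION.match(line.strip())
        if m:
            rows.append({"id": int(m[1]), "when": m[2], "old": m[3],
                         "new": m[4], "message": m[5]})
    return sorted(rows, key=lambda row: row["id"])


def broken_links(rows):
    """Return id pairs where a transition does not start from the previous end state."""
    return [(a["id"], b["id"]) for a, b in zip(rows, rows[1:])
            if a["new"] != b["old"]]


def current_state(rows):
    """Return (state, message) of the newest transition."""
    return rows[-1]["new"], rows[-1]["message"]


def missing_jobs(rows, known_job_ids):
    """Return job IDs mentioned in the transitions but absent from `known_job_ids`."""
    mentioned = set()
    for row in rows:
        found = JOB_ID.search(row["message"])
        if found:
            mentioned.add(int(found.group(1)))
    return sorted(mentioned - set(known_job_ids))


if __name__ == "__main__":
    rows = parse(SAMPLE)
    print(current_state(rows))                    # ('Running', 'Job 92 running')
    print(broken_links(rows))                     # []
    print(missing_jobs(rows, {102, 76, 49, 43}))  # [89, 91, 92]
```

The transition chain itself is consistent; the problem it exposes is that the device is still marked Running for a job that has been deleted.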
But when I look to this local pointer: http://localhost:8080/admin/lava_scheduler_app/testjob/
102  Submitted  nobodyless  beaglebone-black  -  Feb. 21, 2018, 1:07 p.m.  -  -
76  Incomplete  nobodyless  beaglebone-black  bbb01 (Running, health Unknown)  Feb. 20, 2018, 9:50 a.m.  Feb. 20, 2018, 9:50 a.m.  Feb. 20, 2018, 9:55 a.m.
49  Complete  nobodyless  qemu  qemu01 (Idle, health Unknown)  Feb. 16, 2018, 10:44 a.m.  Feb. 16, 2018, 10:44 a.m.  Feb. 16, 2018, 10:50 a.m.
43  Complete  nobodyless  qemu  qemu01 (Idle, health Unknown)  Feb. 16, 2018, 8:42 a.m.  Feb. 16, 2018, 8:43 a.m.  Feb. 16, 2018, 8:49 a.m.
How to force submitted job to be active job (from both CLI and GUI)?
Check the lava-master and lava-slave logs to find the misconfiguration. It is likely to be an invalid state of the device and/or test job, or a misconfigured device configuration.
How to delete nonexistent jobs (from both CLI and GUI)?
Not recommended. lava-server manage has helpers to delete objects but the first thing to do is debug what is happening on your localhost.
Thank you, Zoran
_______________________________________________
Lava-users mailing list
Lava-users@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/lava-users
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
On 22 February 2018 at 11:24, Zoran S zoran.stojsavljevic.de@gmail.com wrote:
Sorry, this is a problem inside the database in your local VM resulting from the problems arising from your local VM running out of space and possibly other issues inside the VM, particularly some of the current values in the database. That can only be resolved by running commands on that database.
Can you provide me the way of throwing the whole existing database inside Lava? I do not want to de-install Lava if I do not need to.
That would be a postgres command, not a LAVA one - but you need to avoid trashing the GRANTS, users and passwords already configured. It's not something I'd recommend for your situation. It can be done but, should it go wrong, it can be tricky to sort out.
I'm sorry, I can't step you through the entire process here. You need to read up on Debian administration, postgresql database admin and investigate what is happening on your own system. Installing LAVA requires a certain level of administrative knowledge and familiarity with systems like Debian.
Alternatively, start with a new VM.
neil.williams/staging-fresh-install
As I understand from this pointer, I can de-install the whole Lava with the command: apt-get remove lava
That just removes the lava metapackage, not lava-server or anything else. It won't actually help your problem at all. It's not how the Debian package system works.
Please read up on management of a Debian system. You need to differentiate between the code installed by the package and the database created (and subsequently modified) by executing that code. apt only knows about the code installed and how to do the initial step to configure the database on a fresh install.
And then use the provided .yaml files to re-install Lava using ansible-playbooks with (for example): https://staging.validation.linaro.org/scheduler/job/211190/definition .yaml file.
Am I correct?
Umm, no. The provided YAML files have very little to do with your problem - that test job is just a way to test installation. That particular job is modified to support automation. In your case, you want the install to be interactive. The commands to install LAVA in an interactive situation are all covered in the documentation. You should also enable the LAVA repositories so that you start off with the current release of LAVA.
Thank you, Zoran