On Tue, 8 Oct 2013 17:48:18 +0100 Dean Arnold Dean.Arnold@arm.com wrote:
Hi Neil,
*Please* keep the list in the loop. I am not the sole point of contact for this issue.
The reason the scheduler log wasn't present is that my scheduler is crashing when I try to run it. Unfortunately the upstart commands didn't show me the failure output.
It's a daemon; stdout and stderr are closed for all daemons - this isn't confined to upstart. That is why I advised running the command manually.
When I attempted to launch the scheduler manually I was able to detect from the command line output that the initial problem was due to the postgres database not accepting TCP/IP connections on port 5432.
If you ever had an older version of postgresql installed at the same time as a new version, postgresql will change the port to 5433, then 5434 and so on for each extra cluster. This is standard postgresql behaviour and nothing to do with LAVA.
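Not part of the original mails, but a quick way to see which port a relocated cluster actually ended up on is to probe the candidates directly. A minimal sketch - the 127.0.0.1 host and the 5432-5435 range are assumptions:

```python
import socket

def listening(port, host="127.0.0.1", timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # postgresql gives 5432 to the first cluster, 5433 to the next, and so on
    for port in range(5432, 5436):
        print(port, "open" if listening(port) else "closed")
```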
WARNING:root:This instance will not use sentry as SENTRY_DSN is not configured
execvp: No such file or directory
2013-10-08 16:26:48,742 [ERROR] [lava_scheduler_daemon.job.SchedulerMonitorPP] scheduler monitor for pdswlava-vetc2-04 crashed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ProcessTerminated'>: A process has ended with a probable error condition: process ended with exit code 1.]
2013-10-08 16:26:48,864 [ERROR] [sentry.errors] No servers configured, and sentry not installed. Cannot send message
Looks like a django error - your database connection is still not correct.
Is this something you have seen before?
No. I just googled SENTRY_DSN.
Did you need to install any extra packages outside of what the lava-deployment-tool provides when running the setupworker/installworker commands?
No. However, note that a postgresql server is not required on the worker, even if you happen to have one installed there.
Could I have missed a configuration step somewhere?
The initial use of setup instead of setupworker could have messed up the database configuration on the worker. It just looks like the worker cannot find the database.
If you had an older version of postgresql ever installed at the same time as a new version, postgresql will change that to 5433, then 5434 and so on for each one. This is standard postgresql behaviour and nothing to do with LAVA.
My master instance of LAVA was installed on a clean Ubuntu server installation, and the only time postgresql was added was during the deployment of LAVA. The port number hadn't changed; the listen_addresses setting hadn't been set, and therefore it just wasn't listening on that port at all. The information regarding the configuration of postgresql was included for completeness only - I wasn't suggesting this was anything to do with LAVA.
Looks like a django error - your database connection is still not correct.
Is this something you have seen before?
No. I just googled SENTRY_DSN.
I too googled this. However, as this is the first time we are setting up LAVA this way, and as this aspect of LAVA configuration is not documented, I didn't think it unreasonable to ask those who had written the system, and who had configured it this way before, whether they had seen this particular error - in case they could point me at a simple fix or a configuration step I may have missed, before I started investigating further.
If you haven't seen this then fair enough - I will carry on with my own investigations.
No - however, if you have a postgresql server installed on the worker, it is not required.
No, there is no postgresql server installed on the worker.
The initial use of setup instead of setupworker could have messed up the database configuration on the worker. It just looks like the worker cannot find the database.
After the mess I made of the previous install, I set up a VM to trial this, so this has been deployed from scratch using the updated commands you gave me.
Possibly the issue is with the configuration of the master. There has already been one issue with the postgresql setup, so possibly there are more. I will look into it.
Cheers Dean
-- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590 ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782
On Tue, 8 Oct 2013 19:43:17 +0100 Dean Arnold Dean.Arnold@arm.com wrote:
next - Use the master information as is
edit - Edit the master information
Please decide what to do [next]: next
./lava-deployment-tool: line 270: defaults_coordinator: command not found
Could this be what is causing my problems?
Sorry, I completely missed this first time around. This is a bug in lava-deployment-tool - it is what caused the lack of the /etc/lava-coordinator/lava-coordinator.conf file which you added manually. I'll fix that problem in lava-deployment-tool. It's a minor change:
diff --git a/lava-deployment-tool b/lava-deployment-tool
index 35486db..ca5d52e 100755
--- a/lava-deployment-tool
+++ b/lava-deployment-tool
@@ -708,6 +708,10 @@ wizard_coordinator () {
     true
 }
 
+defaults_coordinator () {
+    true
+}
+
 defaults_buildout () {
     true
 }
LAVA should still have set up postgresql correctly from a fresh install, especially as you mentioned that this is a fresh Ubuntu 12.04 LTS install.
My master instance of LAVA was installed on a clean Ubuntu server installation and the only time postgresql was added was during the deployment of LAVA. The port number hadn't changed, the listen_addresses setting hadn't been set and therefore it just wasn't listening on that port at all.
I do intermittent reinstalls but most of the time the LAVA team are doing upgrades of existing clients.
Fresh installs tend to be tests in a VM which doesn't involve a remote dispatcher. The remote dispatchers are usually set up later but without these kind of problems.
lava-deployment-tool should have added this line to the postgresql configuration:
"host all all 0.0.0.0/0 trust"
I think that shows where the problem originally lies, and it led me to this resource, which should become the basis of the missing docs.
http://www.linaro.org/connect-lcu13/resources/Q/lce13
ADVANCED LAVA LAB CONFIGURATION - page 11 of this PDF:
http://www.linaro.org/documents/download/74099337b34eb0ab4521fb574f3bcab751e...
*Before* running ./lava-deployment-tool on the *master*, there is a variable to be exported which allows for the remote database connection from remote workers:
export LAVA_DB_ALLOWREMOTE=yes
That variable also works on *upgrades*, so this may actually fix your problem. Upgrade the master with this environment variable set, then restart lava on the worker.
I'll see about adding a note to setupworker or installworker to ensure that this is set up on the master - the problem being that a master doesn't necessarily want this enabled by default, and the security implications should be clearly set out in the docs.
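For reference, these are the two pieces of postgresql configuration involved - shown here as a sketch, with the 9.1 paths assumed from the Ubuntu 12.04 install discussed in this thread:

```
# /etc/postgresql/9.1/main/postgresql.conf
listen_addresses = '*'        # the default, 'localhost', refuses remote workers

# /etc/postgresql/9.1/main/pg_hba.conf
# TYPE  DATABASE  USER  ADDRESS    METHOD
host    all       all   0.0.0.0/0  trust
```

Note that the trust method skips authentication entirely, which is exactly the security implication that needs documenting.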
The information regarding the configuration of postgresql was included for completeness only - I wasn't suggesting this was anything to do with LAVA.
Looks like a django error - your database connection is still not correct.
Is this something you have seen before?
No. I just googled SENTRY_DSN.
I too googled this, however as it is the first time we are setting up LAVA this way, and as this aspect of LAVA configuration is not documented,
... we plan to sort out missing docs like that ... there's already a card for it and I've added a comment to the card as a reminder about the magic environment variable.
The error message is very obscure - lava-deployment-tool could do with a way of testing whether it can see the database and reporting a sensible error message instead.
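As an illustration of the kind of pre-flight check being suggested (not an actual lava-deployment-tool feature - the function name and messages here are made up), something this small would already turn the obscure failure into an actionable message:

```python
import socket
import sys

def check_db(host, port=5432, timeout=3.0):
    """Fail fast, with an actionable message, if the database is unreachable."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            print("postgresql at %s:%d is reachable" % (host, port))
    except OSError as exc:
        sys.exit(
            "Cannot reach postgresql at %s:%d (%s).\n"
            "Check listen_addresses and pg_hba.conf on the master, and that\n"
            "LAVA_DB_ALLOWREMOTE=yes was set when the master was deployed."
            % (host, port, exc)
        )
```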
I didn't think it would be unreasonable to ask those who had written the system, and configured it this way before, whether they had seen this particular error, in case they were able to point me at a simple fix or a configuration step I may have missed, before I started investigating this further.
If you haven't seen this then fair enough - I will carry on with my own investigations.
Sorry to leave you out on a limb with this. The error you saw was unfamiliar, but after investigation, I hope this will fix the problem for you.
No - however, if you have a postgresql server installed on the worker, it is not required.
No, there is no postgresql server installed on the worker.
OK, then this does need investigation, because that kind of setup should "just work", especially on precise (Ubuntu 12.04).
The initial use of setup instead of setupworker could have messed up the database configuration on the worker. It just looks like the worker cannot find the database.
After the mess I made of the previous install, I set up a VM to trial this, so this has been deployed from scratch using the updated commands you gave me.
Possibly the issue is with the configuration of the master. There has already been one issue with the postgresql setup, so possibly there are more. I will look into it.
It may well be with the configuration of the master, down to the missing environment variable.
Hi Neil,
Thanks for the reply. I may not be able to look at this today as we have a release coming up, but as soon as I get a chance, I will apply the patch to my lava-deployment-tool instance and follow the additional steps you have given me.
Hopefully this will sort things out - I will let you know how it goes. Cheers, Dean
Hi Neil,
Earlier today I configured two new virtual machines to test the master/worker setup, and I have managed to get this to work now. For this I installed the master and slave from scratch with the LAVA_DB_ALLOWREMOTE variable set on the master, and all went well. I was able to do a couple of qemu runs on the worker with no issues.
It looks as though the problem I was seeing on our production LAVA setup was down to my master configuration, which wasn't correct. When doing an upgrade of a master instance, the deployment tool doesn't carry out the steps for enabling remote workers, even with the LAVA_DB_ALLOWREMOTE variable set. I had then corrected some of the steps manually, but had missed a couple of bits, which is why I ended up with a *kind of* working setup.
When I did a fresh install on my test setup today, lava-deployment-tool handled the extra steps perfectly.
Thanks for your help with this. Now that I have a working setup to diff my production version against, I can apply the missing bits of config to the master. Alternatively I'll just do a fresh install and restore the database from a backup :)
I don't know if this is something you guys would like to be automated by lava-deployment-tool during upgrades, or whether some steps on the wiki for the extra bit of config would be enough. If you need me to raise a ticket/bug for this, I can do so.
Cheers Dean
On Thu, Oct 10, 2013 at 05:27:03PM +0100, Dean Arnold wrote:
I don't know if this is something you guys would like to be automated by the lava-deployment-tool during upgrades, or whether some steps on the wiki for the extra bit of config would be enough? If you need me to raise a ticket/bug for this I can do?
Yes, please open a bug against lava-deployment-tool. It should be able to handle upgrades from older versions without losing the setup for allowing access to the remote workers.
On Thu, 10 Oct 2013 17:27:03 +0100 Dean Arnold Dean.Arnold@arm.com wrote:
It looks as though the problem I was seeing on our production LAVA setup, was down to my master configuration, which wasn't correct. When doing an upgrade of a master instance, the deployment tool doesn't carry out the steps for enabling remote workers, even with the LAVA_DB_ALLOWREMOTE variable set. I had then corrected some of the steps manually, but had missed a couple of bits, which is why I ended up with a *kind of* working setup.
Please file this as a bug against lava-deployment-tool. There are other parts of the tool which fail to run on upgrade, and it's time this was fixed properly instead of adding another workaround.
TBH the environment variable should be replaced by a wizard question and a config option.
I don't know if this is something you guys would like to be automated by the lava-deployment-tool during upgrades, or whether some steps on the wiki for the extra bit of config would be enough? If you need me to raise a ticket/bug for this I can do?
The errors which occur when it goes wrong are so obscure and misleading that this needs to be handled inside the tool properly, especially as it fails on upgrade.
Please file a bug against lava-deployment-tool to get the remote worker support working without an environment variable during upgrades and add any comments you have regarding why your master postgresql configuration still wasn't right.
It needs to be possible to take a "stand-alone" master with its own devices and add a remote worker using lava-deployment-tool upgrade alone, without breakage.