On Mon, 7 Oct 2013 09:03:09 +0100 Dave Pigott dave.pigott@linaro.org wrote:
Hi Dave/Matt,
I have recently installed a remote worker on a new server, but I am unable to get it to talk to the master properly. I just wanted to check whether what I have done is correct and whether I have missed anything.
Here are the steps I followed:
My master server (pdsw-lava) is running a "production" instance of LAVA. I have updated LAVA to the latest version using lava-deployment-tool from git.
On the new dispatcher (pdsw-lava-dispatcher01), using the latest lava-deployment-tool from git, I installed the remote worker in the following way:
$ lava-deployment-tool setup
$ lava-deployment-tool installworker production
The first command should have been lava-deployment-tool setupworker rather than lava-deployment-tool setup.
You now need to remove the lava-coordinator package: sudo apt-get purge lava-coordinator
Then copy the coordinator configuration from the master, as described here: http://validation.linaro.org/static/docs/lava-dispatcher/multinode.html#lava...
The coordinator problem would block MultiNode jobs but wouldn't affect other jobs.
setup adds packages needed by the master (like apache) which setupworker does not install.
On pdsw-lava-dispatcher01 I created these directories:
/srv/lava/instances/production/etc/lava-dispatcher/devices
/srv/lava/instances/production/etc/lava-dispatcher/device-types
and SCP'd the contents of /srv/lava/instances/production/etc/lava-dispatcher/device-types over from my master server.
I removed the device config file pdswlava-vetc2-06.conf from /srv/lava/instances/production/etc/lava-dispatcher/devices on the master and added this to the same directory on the remote worker.
I then restarted lava on both servers (just in case it was needed).
I submitted a job on the master using pdswlava-vetc2-06 and this stayed in the "submitted" state but never did anything.
To debug the scheduler, check the lava-scheduler log on both the master and the worker, and switch the scheduler daemon to the debug log level as well.
The log file will be something like: /srv/lava/instances/playground/var/log/lava-scheduler.log
To change the daemon to debug mode, edit the loglevel in the command at the end of this file: /etc/init/lava-instance-scheduler.conf
--loglevel=info => --loglevel=debug
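That edit can be sketched as a small shell helper. This is only a sketch: set_scheduler_debug is a hypothetical name, and it assumes the job file contains the literal text --loglevel=info. It keeps a .bak copy next to the file so the change is easy to revert.

```shell
# Hypothetical helper (not part of LAVA): switch the scheduler job file
# from --loglevel=info to --loglevel=debug.
# Assumption: the file contains the literal option --loglevel=info.
set_scheduler_debug() {
    conf="$1"
    [ -f "$conf" ] || { echo "no such file: $conf" >&2; return 1; }
    # Keep a backup, then rewrite the file from that backup.
    cp -p "$conf" "$conf.bak" || return 1
    sed 's/--loglevel=info/--loglevel=debug/' "$conf.bak" > "$conf" || return 1
    # Show the resulting option so the change can be checked by eye.
    grep -n -- '--loglevel=' "$conf"
}
```

Run as root, this would be called as: set_scheduler_debug /etc/init/lava-instance-scheduler.conf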
Restart lava to get the daemon to notice the config change. Cancel the pending job, if it still exists, and submit a new one. Consider using a KVM device type to isolate problems with the device configuration.
After a while I removed the pdswlava-vetc2-06.conf file from /srv/lava/instances/production/etc/lava-dispatcher/devices on the remote worker and added this back into the same directory on the master. Instantly the job ran as expected.
More than likely the answer will be in the lava-scheduler log on the worker.
Just check you have the scheduler enabled on the worker:
e.g. in your equivalent of: /srv/lava/instances/playground/instance.conf
LAVA_SCHEDULER_ENABLED='yes'
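A quick way to check that setting from a shell might look like the following. scheduler_enabled is a hypothetical helper name, and the sketch assumes the setting sits on a line of its own, quoted or unquoted, as in the example above.

```shell
# Hypothetical helper (not part of LAVA): succeed only if the given
# instance.conf has LAVA_SCHEDULER_ENABLED set to yes.
# Assumption: the setting is on its own line, optionally quoted.
scheduler_enabled() {
    conf="$1"
    grep -Eq "^[[:space:]]*LAVA_SCHEDULER_ENABLED=['\"]?yes['\"]?[[:space:]]*\$" "$conf"
}
```

For a "production" instance the path would presumably be /srv/lava/instances/production/instance.conf, for example: scheduler_enabled /srv/lava/instances/production/instance.conf && echo enabled || echo "not enabled"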
Can you think of anything I have missed in my setup which would prevent the master and worker from talking to each other? Is there anything I should be seeing in the web UI of the master to indicate it is connected to a remote worker, or is there an easy way to test the connection between the two?
As long as the scheduler daemon is running on the worker, the scheduler log on the worker should give some indication of whether it can connect.
The log also shows how many configured devices the scheduler thinks it has.
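To compare against the number the log reports, the device config files on the worker can be counted directly. A small sketch, assuming one .conf file per device in the devices directory (as with pdswlava-vetc2-06.conf above); count_device_configs is a hypothetical helper name.

```shell
# Hypothetical helper (not part of LAVA): count the *.conf files
# directly inside a lava-dispatcher devices directory.
# Assumption: each configured device has exactly one .conf file there.
count_device_configs() {
    dir="$1"
    find "$dir" -maxdepth 1 -type f -name '*.conf' | wc -l | tr -d ' '
}
```

On the worker described in this thread the call would be: count_device_configs /srv/lava/instances/production/etc/lava-dispatcher/devices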