On 17 Oct 2012, at 14:39, Antonio Terceiro <antonio.terceiro@linaro.org> wrote:
Dave Pigott wrote:
Hi guys,
I've deployed a new compute node in the cloud, and upped the VCPU allocation in the project, but I'm seeing some oddities.
The "nova-manage service list" command tells me that nova-network and nova-compute are not running on 02 (I knew that and am working on restoring it), 03 and 05. However, the instances running on those nodes are still accessible. Additionally, when I try to create an instance, it errors immediately.
That makes sense: once the VMs were started with KVM, they continue to run independently of the processes that started them (nova-*).
Yeah, I kind of figured something like that was going on. Trouble is we've seen so many different types of fail in the process of getting the cloud running, that I'm never *quite* sure how broken things have got. :)
I've had similar problems in the past, and typically I had to reboot all nodes, starting with the control node (lava-cloud01). I also believe (note wording) I need to do an update/upgrade on every node to bring the cloud up to the latest revisions of nova and its services.
Obviously, if I do this, it will disrupt staging, dogfood and fastmodels, so this is a warning that, absent any dissent, I will reboot the cloud tomorrow morning. If you would rather defer this, let's discuss when it should be deferred to. Getting the v8 fast model instance up is, of course, paramount.
Interrupting staging and dogfood is fine, I think. For fastmodels, if you put the devices offline in the scheduler before the interruption and put them back online after it's done (i.e. control will still be accepting jobs and queuing them), it should be OK.
Of course. Was planning on that, but good to be reminded, and I should have listed it in my e-mail.
Thanks
Dave