Hi all,
I've been worrying this one all weekend, and woke up this morning at 3:30AM with it all whirling round my head. I think I have an outline of how we're going to manage this, and I would like to get your comments/additions/brickbats before I add it all to the BP:
1) Move over to MAC address based DHCP serving for all infrastructure
Currently the IP address of every device in the infrastructure (cloud nodes, gateway, control etc.) is statically assigned. This is bad for a number of reasons. My only concern here (lack of knowledge, rather than well-founded fear) is control itself. Can the DHCP server serve itself an IP address, or is this going to be the one exception?
2) Reserve 192.168.0.x for the public IP addresses for Cloud instances and 192.168.1.x for infrastructure. I'm pretty sure I can do this in dhcp.conf. Essentially this is a block list that only serves addresses in those ranges when they are explicitly assigned by MAC address. The reason for this is to preserve the small existing cloud IP pool, which can't be assigned by DHCP MAC address because it's managed by the cloud. (See my argument with myself in point 4.)
3) Go round every infrastructure node and move it to a DHCP-served address
4) Drop the cloud pool of IP addresses, create the new range, and restart the cloud instances.
Alternatively, I could just extend the pool to add the 192.168.0.x range. This is less disruptive because it means the existing instances won't have to be re-started/re-created. I think I +1 this myself. :D
5) Reconfigure dhcp.conf to cover 192.168.0.0/16
6) Restart DHCP
7) Restart networking on every node
My concern here would be that this will mean some disruption, so I would recommend that we wait until I have the new scheduler backup server in place so that we don't lose any jobs. Once control is back up we should run some test jobs through it just to be on the safe side.
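To make points 1, 2 and 5 a bit more concrete, here is a minimal Python sketch of the address plan. It checks that the two reserved ranges sit inside the /16 without overlapping, and prints one MAC-to-IP entry per infrastructure node. The hostnames and MAC addresses are made up for illustration, and the entry format assumes the server is ISC dhcpd (host / hardware ethernet / fixed-address), which the plan above doesn't actually confirm.

```python
import ipaddress

# Hypothetical inventory: these names and MAC addresses are invented for
# illustration and are not the real lab hardware.
INFRA_HOSTS = {
    "control": "00:16:3e:00:00:01",
    "gateway": "00:16:3e:00:00:02",
}

CLOUD_NET = ipaddress.ip_network("192.168.0.0/24")  # cloud instance pool (point 2)
INFRA_NET = ipaddress.ip_network("192.168.1.0/24")  # infrastructure (point 2)
SITE_NET = ipaddress.ip_network("192.168.0.0/16")   # combined range (point 5)


def assign_addresses(hosts, network):
    """Pair each host with the next usable address in `network`, in name order."""
    free = iter(network.hosts())
    return {name: (mac, next(free)) for name, mac in sorted(hosts.items())}


def render_host_entry(name, mac, ip):
    """Render one fixed-address entry, assuming ISC dhcpd syntax."""
    return (
        f"host {name} {{\n"
        f"  hardware ethernet {mac};\n"
        f"  fixed-address {ip};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    # Both reserved ranges must fall inside the /16 and must not collide.
    assert CLOUD_NET.subnet_of(SITE_NET) and INFRA_NET.subnet_of(SITE_NET)
    assert not CLOUD_NET.overlaps(INFRA_NET)

    for name, (mac, ip) in assign_addresses(INFRA_HOSTS, INFRA_NET).items():
        print(render_host_entry(name, mac, ip))
```

Keeping the inventory in one place like this would mean the per-node entries could be regenerated rather than hand-edited, but that is only a suggestion and not part of the plan as written.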
All thoughts very welcome!
Thanks
Dave
It may be out of hours for the UK, but it's the LAVA leased line.
LAVA Team: We'll need to provide some external support for LAVA for the window in which they're doing maintenance. Any suggestions?
Thanks
Dave
On 2 Dec 2012, at 11:55, Arwen Donaghey <arwen.donaghey(a)linaro.org> wrote:
> JFYI. This is out of hours.
> Regards,
> --
> Arwen Donaghey
> Events Manager
>
> T: +44 1223 400 061| M: +44 7791 279 521
> Suite 220 | The Quorum | Barnwell Road | Cambridge | CB5 8RE
> Linaro.org │ Open source software for ARM SoCs
> Follow Linaro: Facebook | Twitter | Blog
> Registered Number: 07180318 | VAT Number: 990 0273 24
>
>
>
>
> Begin forwarded message:
>
>> From: "Zen - Fault Management" <faultmanagement(a)zen.co.uk>
>> Subject: At Risk - Planned Maintenance 12/12/2012 [5379552:4096964]
>> Date: 2 December 2012 02:03:05 GMT
>> To: <accounts(a)linaro.org>, <arwen.donaghey(a)linaro.org>
>>
>> Dear Leased Circuit Customer,
>>
>>
>> We have been notified of planned maintenance affecting your leased line service, the details of which are included below.
>>
>>
>> Affected installation address:
>>
>> Suite 220
>> The Quorum
>> Cambridge
>>
>> Associated Order ID: 803244
>>
>>
>> Maintenance window start: 12/12/2012 - 21:00
>>
>> Maintenance window end: 13/12/2012 - 01:00
>>
>>
>> Impact: None – At risk only
>>
>> Expected impact duration: None
>>
>>
>> Reason given for maintenance: Updates to router configuration.
>>
>> There should be no downtime experienced.
>>
>> If the maintenance is unsuccessful, a rollback and a router reboot will be required. Service will be restored when the router is back online.
>>
>>
>> Your service will be at risk of outages for the duration of the maintenance window.
>>
>> If you have any queries regarding this maintenance, please contact our Central Monitoring Team using the details below.
>>
>> We apologise for any inconvenience this may cause.
>>
>> Kind Regards,
>>
>> Central Monitoring Team
>>
>> --
>>
>> Zen Central Monitoring
>> E: central.monitoring(a)zen.co.uk
>> T: 0845 058 9010
>> F: 0845 058 9001
>> W: www.zensupport.co.uk
>>
>> Zen Internet Limited is registered in England and Wales
>> Sandbrook Park, Sandbrook Way, Rochdale OL11 1RY
>> Company No. 03101568
>> VAT Reg No. 686 0495 01
>
No joke. As I sent the last e-mail, another board failed:
------------
origen03
------------
http://validation.linaro.org/lava-server/scheduler/job/40335
Looks like serial line corruption when we were trying to deploy the Android test image, so commands failed. Put back online to retest.
Dave