Looks like we need to plan for some downtime while this happens. I'll make sure I take the boards offline. Can we run something outside the lab to keep availability of job submission?
Thanks
Dave
Begin forwarded message:
From: "Lee Heyes" central.monitoring@zen.co.uk
Subject: Notification of Planned Maintenance [5468619:4172292]
Date: 26 January 2013 05:23:06 GMT
To: dave.pigott@linaro.org
Dear Leased Circuit Customer,
We have been notified of planned maintenance affecting your leased line service, the details of which are included below.
Affected installation address: Suite 220 The Quorum Cambridge
Associated Order ID: 803244
Maintenance window start: 06/02/2013 - 23:55
Maintenance window end: 07/02/2013 - 06:00
Impact: Loss of connectivity on leased circuit
Expected impact duration: Up to 30 minutes
Reason given for maintenance: Essential switch Maintenance in London PoP
Your service will be at risk of additional or extended outages for the duration of the maintenance window.
If you have any queries regarding this maintenance, please contact our Central Monitoring Team using the details below.
Information relating to this outage including any updates will be posted on our Service Alerts page http://status.zensupport.co.uk/active/3/2880
We apologise for any inconvenience this may cause.
Kind Regards,
Central Monitoring Team
Zen Central Monitoring E: central.monitoring@zen.co.uk T: 0845 058 9010 F: 0845 058 9001 W: www.zensupport.co.uk
Service Alerts: http://status.zensupport.co.uk/
On 01/28/2013 02:27 AM, Dave Pigott wrote:
Looks like we need to plan for some downtime while this happens. I'll make sure I take the boards offline. Can we run something outside the lab to keep availability of job submission?
This seems to be an idea worth at least discussing. It seems like it would be really cool to create some sort of simple scheduler service that basically accepts all job requests and saves them to disk. Maybe it's pre-seeded with a job ID so that we can return unique job IDs back to the caller.
We then create some type of import tool that, once the service is back online, can suck in this data and execute the jobs.
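A minimal sketch of what such a spooling service could look like, written in Python. Everything here is an assumption made for illustration: the `JobSpool` class, the `next_id` counter file, and the `job-<id>.json` file layout are hypothetical names, not part of the existing scheduler. It is single-process with no locking, so a real service would need to guard the counter against concurrent submissions.

```python
import json
import os


class JobSpool:
    """Accept every job request, save it to disk, and hand back a unique ID.

    Hypothetical sketch: no validation happens here, the job is only stored.
    """

    def __init__(self, spool_dir, first_job_id):
        self.spool_dir = spool_dir
        os.makedirs(spool_dir, exist_ok=True)
        self.counter_path = os.path.join(spool_dir, "next_id")
        # Pre-seed the counter only once, so a restart does not reuse IDs.
        if not os.path.exists(self.counter_path):
            self._write_counter(first_job_id)

    def _write_counter(self, value):
        with open(self.counter_path, "w") as f:
            f.write(str(value))

    def submit(self, job_definition, token):
        """Store the request and return the job ID given to the caller."""
        with open(self.counter_path) as f:
            job_id = int(f.read())
        self._write_counter(job_id + 1)
        record = {"job_id": job_id, "token": token, "definition": job_definition}
        path = os.path.join(self.spool_dir, "job-%d.json" % job_id)
        with open(path, "w") as f:
            json.dump(record, f)
        return job_id

    def pending(self):
        """Yield spooled records in ID order, for the import tool to replay."""
        names = [n for n in os.listdir(self.spool_dir) if n.startswith("job-")]
        # "job-123.json" -> 123
        for name in sorted(names, key=lambda n: int(n[4:-5])):
            with open(os.path.join(self.spool_dir, name)) as f:
                yield json.load(f)
```

The import tool would then iterate over `pending()` once the lab is reachable again and feed each record to the real scheduler. The pre-seeded starting ID has to be chosen above anything the real scheduler could assign during the outage.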
Thoughts? Am I solving the wrong problem?
Andy Doan andy.doan@linaro.org writes:
On 01/28/2013 02:27 AM, Dave Pigott wrote:
Looks like we need to plan for some downtime while this happens. I'll make sure I take the boards offline. Can we run something outside the lab to keep availability of job submission?
This seems to be an idea worth at least discussing.
In fact, I think we have discussed it before :-)
It seems like it would be really cool to create some sort of simple scheduler service that basically accepts all job requests and saves them to disk.
Yeah, feels like it should be fairly simple. It could run in EC2 or whatever.
Maybe it's pre-seeded with a job ID so that we can return unique job IDs back to the caller.
I think this is essential. It might help to move away from using the database-assigned primary key id as the id we present to the user maybe? One way this could work though is people _always_ submit to this simple service in the cloud, in which case it could get to assign the IDs.
We then create some type of import tool that, once the service is back online, can suck in this data and execute the jobs.
Right.
Thoughts? Am I solving the wrong problem?
Well. The thing that occurs to me is that what we are doing here is building a system that aims to be available for writes in the face of network partitions, and other people have already built systems that have this property -- it is basically the whole principle behind Amazon's famous Dynamo [1] and the systems it inspired like Riak and Cassandra. It seems unlikely that we'd do a better job than them.
One thing I don't completely understand how to replicate, if we have a simple job-accepting scheduler in the cloud, is the sanity check that the submitting user is allowed to submit results to the stream specified in the job -- or even whether the token provided when submitting the job is valid, come to think of it!
Cheers, mwh
[1] Everyone in computing should take the 40 minutes or so it takes to read this paper: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf if only for this quote:
"For example, customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados."
On 01/28/2013 03:31 PM, Michael Hudson-Doyle wrote:
Thoughts? Am I solving the wrong problem?
Well. The thing that occurs to me is that what we are doing here is building a system that aims to be available for writes in the face of network partitions, and other people have already built systems that have this property -- it is basically the whole principle behind Amazon's famous Dynamo [1] and the systems it inspired like Riak and Cassandra. It seems unlikely that we'd do a better job than them.
time to read your article below.
One thing I don't completely understand how to replicate, if we have a simple job-accepting scheduler in the cloud, is the sanity check that the submitting user is allowed to submit results to the stream specified in the job -- or even whether the token provided when submitting the job is valid, come to think of it!
I thought about this before my original email and decided to not mention this issue. However, my thinking was that we might need to add a new state to a job called something like "REJECTED". Now normally, we just reject a request before it ever becomes a job. However, in this offline mode, we could wait to reject the job until we came back online and were able to do the proper checks.
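A rough sketch of how that deferred check might look in Python once the lab is back online. All names are hypothetical: `JobState`, `import_spooled_jobs`, the record layout (`job_id`, `token`, `definition`) and the `stream` key are assumptions for illustration, and the two callbacks stand in for whatever token and stream-permission checks the real scheduler performs at submission time.

```python
from enum import Enum


class JobState(Enum):
    SUBMITTED = "SUBMITTED"
    # Proposed new state: the job was accepted offline but failed the checks later.
    REJECTED = "REJECTED"


def import_spooled_jobs(records, is_token_valid, can_submit_to_stream):
    """Run the checks that were skipped while offline and assign a state.

    records: iterable of dicts with "job_id", "token" and "definition" keys
             (an assumed layout for jobs saved during the outage).
    is_token_valid(token) -> bool
    can_submit_to_stream(token, stream) -> bool
    """
    results = []
    for record in records:
        token = record["token"]
        stream = record["definition"].get("stream")
        if not is_token_valid(token):
            state, reason = JobState.REJECTED, "invalid token"
        elif not can_submit_to_stream(token, stream):
            state, reason = JobState.REJECTED, "no access to stream"
        else:
            state, reason = JobState.SUBMITTED, None
        results.append({"job_id": record["job_id"], "state": state, "reason": reason})
    return results
```

The trade-off is that a submitter only learns about a bad token or stream when they next look up the job ID, rather than at submission time.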
Cheers, mwh
[1] Everyone in computing should take the 40 minutes or so it takes to read this paper: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf if only for this quote:
yeah - this is bound to make me smarter :)
"For example, customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados."
Andy Doan andy.doan@linaro.org writes:
On 01/28/2013 03:31 PM, Michael Hudson-Doyle wrote:
Thoughts? Am I solving the wrong problem?
Well. The thing that occurs to me is that what we are doing here is building a system that aims to be available for writes in the face of network partitions, and other people have already built systems that have this property -- it is basically the whole principle behind Amazon's famous Dynamo [1] and the systems it inspired like Riak and Cassandra. It seems unlikely that we'd do a better job than them.
time to read your article below.
To be fair, we're pretty unlikely to build anything on this sort of technology before the outage we're being notified of :-)
One thing I don't completely understand how to replicate, if we have a simple job-accepting scheduler in the cloud, is the sanity check that the submitting user is allowed to submit results to the stream specified in the job -- or even whether the token provided when submitting the job is valid, come to think of it!
I thought about this before my original email and decided to not mention this issue.
:)
However, my thinking was that we might need to add a new state to a job called something like "REJECTED". Now normally, we just reject a request before it ever becomes a job. However, in this offline mode, we could wait to reject the job until we came back online and were able to do the proper checks.
Yeah, that could work. Still not sure if we'll get this done before the outage though...
Cheers, mwh