On 01/28/2013 03:31 PM, Michael Hudson-Doyle wrote:
Thoughts? Am I solving the wrong problem?
Well. The thing that occurs to me is that what we are doing here is building a system that aims to be available for writes in the face of network partitions, and other people have already built systems that have this property -- it is basically the whole principle behind Amazon's famous Dynamo [1] and the systems it inspired, like Riak and Cassandra. It seems unlikely that we'd do a better job than them.
time to read your article below.
One thing that I don't completely understand how to replicate, if we have a simple job-accepting scheduler in the cloud, is the sanity check that the submitting user is able to submit results to the stream specified in the job -- or even that the token provided while submitting the job is valid, come to think of it!
I thought about this before my original email and decided to not mention this issue. However, my thinking was that we might need to add a new state to a job called something like "REJECTED". Now normally, we just reject a request before it ever becomes a job. However, in this offline mode, we could wait to reject the job until we came back online and were able to do the proper checks.
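To make the "REJECTED" idea a bit more concrete, here is a rough Python sketch of what I have in mind. Every name in it (JobState, Job, accept_offline, validate_pending, and the two check callbacks) is made up for illustration and is not taken from the existing scheduler code. The assumption is that the offline scheduler accepts everything unchecked, and the real token and stream checks run later, once we can reach the server again.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class JobState(Enum):
    # Hypothetical states; only REJECTED is the one proposed in this thread.
    PENDING_VALIDATION = "PENDING_VALIDATION"  # accepted offline, not yet checked
    SUBMITTED = "SUBMITTED"                    # passed the deferred checks
    REJECTED = "REJECTED"                      # failed the deferred checks


@dataclass
class Job:
    token: str
    stream: str
    state: JobState = JobState.PENDING_VALIDATION
    reject_reason: str = ""


def accept_offline(queue: List[Job], token: str, stream: str) -> Job:
    """Queue a job with no checks at all (used while we are partitioned)."""
    job = Job(token=token, stream=stream)
    queue.append(job)
    return job


def validate_pending(
    queue: List[Job],
    token_is_valid: Callable[[str], bool],
    can_submit_to: Callable[[str, str], bool],
) -> None:
    """Run the checks we skipped offline; reject jobs that fail them."""
    for job in queue:
        if job.state is not JobState.PENDING_VALIDATION:
            continue  # already decided on an earlier pass
        if not token_is_valid(job.token):
            job.state = JobState.REJECTED
            job.reject_reason = "invalid token"
        elif not can_submit_to(job.token, job.stream):
            job.state = JobState.REJECTED
            job.reject_reason = "user may not submit to this stream"
        else:
            job.state = JobState.SUBMITTED
```

The point of the extra state is that a job that would normally have been refused at submission time still gets a visible outcome, with a reason attached, instead of silently disappearing once we are back online.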
Cheers, mwh
[1] Everyone in computing should take the 40 minutes or so it takes to read this paper: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf if only for this quote:
yeah - this is bound to make me smarter :)
"For example, customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados."