On 07/02/2012 12:31 AM, Michael Hudson-Doyle wrote:
Multi-machine code
This is easy IMHO: all machines should have the same code installed. With the appropriate ssh keys scattered around, it should be easy to write a fabric job or just plain bash script to run ldt update on each machine.
+1
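As a rough illustration of the "plain script" option, here is a minimal Python sketch that runs the update over ssh on each machine in turn. The host names are made up, and ldt update is invoked exactly as named above; any arguments it needs are not shown. It assumes key-based ssh login already works.

```python
import subprocess

# Hypothetical host names, for illustration only.
HOSTS = ["lava-master.example.com", "lava-worker1.example.com"]


def build_command(host, remote_cmd="ldt update"):
    """Build the ssh command line that runs remote_cmd on host."""
    # BatchMode=yes makes ssh fail instead of prompting for a password
    # when the keys are not set up, so the script never hangs on input.
    return ["ssh", "-o", "BatchMode=yes", host, remote_cmd]


def update_all(hosts, run=subprocess.call):
    """Run the update on every host and return {host: exit_code}.

    `run` is injectable so the loop can be exercised without a network.
    """
    results = {}
    for host in hosts:
        results[host] = run(build_command(host))
    return results
```

Calling update_all(HOSTS) and checking for non-zero exit codes would tell us which machines failed to update. A Fabric job would do much the same thing with less plumbing.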
Multi-machine data
Accessing postgres from another machine is a solved problem, to put it mildly :-)
I don't have a good idea on how to access media files across the network. In an ideal world, we would have a Django storage backend that talked to something like http://ceph.com/ceph-storage/object-storage/ or http://hadoop.apache.org/hdfs/ -- we don't need anything like full file system semantics -- but for now, maybe just mounting the media files over NFS might be the quickest way to get things going.
This feels like the messiest part of the problem to me. I keep trying to think of how we can avoid solving it, but if our fastmodel dispatcher conversation is any indicator, we'll have to use some hacks to work around things until it's solved.
Multi-machine configuration
I think by and large the configuration of each instance should be the same. This means we need a mechanism to distribute changes to the instances. One way would be to store the configuration in a branch, and have ldt update upgrade this branch too (I think it would even be fairly easy to have multiple copies of the configuration on disk, similar to the way we have multiple copies of the buildouts, and have each buildout point to a specific revision of the config).
We could also have the revision of the config branch to use be specified in the lava-manifest branch but that doesn't sound friendly to third party deployments -- I think the config branch should be specified as a parameter of the instance set and updating an instance set should update each instance to the latest version of the config branch. This will require a certain discipline in making changes to the branch!
A thought: what if the "master" server had some sort of API that listed what code/config level it was at? The worker nodes could then periodically pull from that and update themselves as needed. This might make upgrades easier. However, maybe this is the wrong idea and we should get to a point where puppet can handle this.
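To make the polling idea a bit more concrete, here is a minimal Python sketch of what a worker might do. Everything specific in it is an assumption: the URL, the JSON shape ({"code": ..., "config": ...}) and the update callback are hypothetical, since no such API exists yet.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint; nothing like this is defined yet.
MASTER_URL = "http://lava-master.example.com/api/levels"


def fetch_levels(url=MASTER_URL):
    """Fetch the master's levels, e.g. {"code": "120", "config": "45"}."""
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def stale_components(local, remote):
    """Return the sorted names whose local level differs from the master's."""
    return sorted(name for name in remote if local.get(name) != remote[name])


def poll_once(local, fetch=fetch_levels, update=None):
    """Compare local levels with the master's and update what is stale.

    `update` is a hypothetical callback that receives the list of stale
    component names; after it returns, `local` records the new levels.
    """
    remote = fetch()
    stale = stale_components(local, remote)
    if stale and update is not None:
        update(stale)
        for name in stale:
            local[name] = remote[name]
    return stale
```

A worker would call poll_once from cron or a small loop. The downside is the same as noted above: this duplicates what puppet is meant to do, so it may only be worth it as a stopgap.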
Setup issues
There will be a requirement to make sure ssh keys etc are set up on the various machines involved. Ideally this would be done via puppet but for now I think we can just do it by hand...
Yeah, we are only scaling to about 2 nodes for now, but puppet does seem like the sanest way to manage this long term.