So we're reaching the point where we are going to want to have more than one machine involved in running LAVA production. The immediate cause is to avoid running fastmodels on the same machine as the database and web server and everything else, but I think as the system grows up we'll want to do this for more functions.
I know there has been some thinking about this in the past, but to my knowledge nothing has been written down. I've spent a good while today thinking about this sort of thing, so here's a bit of a brain dump.
The different parts of a LAVA instance
======================================
A LAVA instance is three things:
1) Code
2) Configuration
3) Data
Data comes in two kinds: a postgres database and "media" files (log files and bundles mostly for us).
There are a few parts to the configuration: there is the dispatcher config, the server/django config and the apache config. TBH, it would be better if the dispatcher config was derived from the database somehow. I still think we don't do a good job of handling the apache config, but it's a messy problem.
The Django configuration includes the information on how to reach the data, both media and db.
I think we'll need to define an "instance set" concept -- a list of machines and the instances on those machines that share code, configuration and data.
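To make that concrete, here is a purely illustrative sketch of what an instance set definition could look like -- nothing like this exists today, and the hostnames, branch URL and field names are all invented:

    # Illustrative only: a possible "instance set" definition.  All names,
    # hosts and fields below are invented for the sake of the example.
    INSTANCE_SET = {
        "name": "production",
        "config_branch": "lp:~linaro-validation/+junk/production-config",
        "instances": [
            {"host": "control.lava.example.com",
             "services": ["uwsgi", "scheduler", "celeryd"],
             "owns_database": True},
            {"host": "fastmodel01.lava.example.com",
             "services": ["celeryd"],
             "owns_database": False},
        ],
    }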
Multi-machine code
==================
This is easy IMHO: all machines should have the same code installed. With the appropriate ssh keys scattered around, it should be easy to write a fabric job or just plain bash script to run ldt update on each machine.
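For example, a minimal fabric (1.x) sketch of the "run ldt update everywhere" job could be as small as this -- the hostnames and the exact ldt invocation are placeholders for whatever we settle on:

    # fabfile.py -- minimal sketch using the fabric 1.x API.  Hostnames
    # and the ldt command line are placeholders.
    from fabric.api import env, sudo

    env.hosts = ["control.lava.example.com", "fastmodel01.lava.example.com"]

    def update():
        """Upgrade the 'production' instance on every host in the set."""
        sudo("ldt update production")

Running "fab update" from any machine with ssh access to the set would then roll the new code out everywhere.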
Multi-machine data
==================
Accessing postgres from another machine is a solved problem, to put it mildly :-)
I don't have a good idea on how to access media files across the network. In an ideal world, we would have a Django storage backend that talked to something like http://ceph.com/ceph-storage/object-storage/ or http://hadoop.apache.org/hdfs/ -- we don't need anything like full file system semantics -- but for now, maybe just mounting the media files over NFS might be the quickest way to get things going.
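As a quickest-path sketch, the NFS option barely needs any code changes -- just make sure MEDIA_ROOT on every instance points at the shared mount. The path below is an assumption, not our current layout:

    # Django settings snippet -- assumes the media directory is NFS-mounted
    # at the same path on every machine in the instance set.
    MEDIA_ROOT = "/srv/lava/instances/production/media"
    MEDIA_URL = "/media/"

    # Longer term we could swap in a custom storage backend that talks to
    # an object store instead; the class name here is hypothetical.
    # DEFAULT_FILE_STORAGE = "lava.storage.backends.CephStorage"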
Multi-machine configuration
===========================
I think by and large the configuration of each instance should be the same. This means we need a mechanism to distribute changes to the instances. One way would be to store the configuration in a branch, and have ldt update upgrade this branch too (I think it would even be fairly easy to have multiple copies of the configuration on disk, similar to the way we have multiple copies of the buildouts, and have each buildout point to a specific revision of the config).
We could also have the revision of the config branch to use be specified in the lava-manifest branch but that doesn't sound friendly to third party deployments -- I think the config branch should be specified as a parameter of the instance set and updating an instance set should update each instance to the latest version of the config branch. This will require a certain discipline in making changes to the branch!
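The "update each instance to the tip of the config branch" step could be as simple as something like this -- the branch URL and on-disk layout are placeholders, and it assumes the config lives in a bzr checkout next to the buildouts:

    # Sketch: pull each instance's config checkout up to the tip of the
    # shared branch, so every instance in the set ends up on the same
    # revision.  Branch URL and path are placeholders.
    import subprocess

    CONFIG_BRANCH = "lp:~linaro-validation/+junk/production-config"
    CONFIG_DIR = "/srv/lava/instances/production/etc"

    def update_config():
        subprocess.check_call(
            ["bzr", "pull", CONFIG_BRANCH, "-d", CONFIG_DIR])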
All that said, we don't want the instances on each machine to be _completely_ identical, leading to...
Differentiating instances
=========================
The point of this exercise is not purely to scale horizontally; we want different instances to do different things. I think the primary way we will differentiate instances is by which services they run: do they run uwsgi or not, do they run the scheduler or not, do they run celeryd or not? We already have a limited form of this, in that you can configure an instance to start the scheduler or not, but it will need systematizing.
In addition one instance in a set will need to 'own' the database: when we upgrade an instance we want one and only one instance to run the migrations.
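Given something like the instance set sketch above, the "exactly one owner" rule is easy to state in code (again, illustrative only):

    # Illustrative only: whatever drives an instance-set upgrade should
    # refuse to run unless exactly one instance owns the database.
    def migration_owner(instance_set):
        owners = [i for i in instance_set["instances"] if i.get("owns_database")]
        assert len(owners) == 1, "exactly one instance must own the database"
        return owners[0]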
I'd also like to push towards a model where we can do rolling upgrades but that's a different kettle of fish I think ...
Setup issues
============
There will be a requirement to make sure ssh keys etc are set up on the various machines involved. Ideally this would be done via puppet but for now I think we can just do it by hand...
Thoughts? Having written this email, I don't think there is a huge amount of work involved in doing a good job here; we shouldn't settle for hacks.
Cheers, mwh
On Mon, 02 Jul 2012 17:31:08 +1200, Michael Hudson-Doyle michael.hudson@canonical.com wrote:
Multi-machine data
Accessing postgres from another machine is a solved problem, to put it mildly :-)
I don't have a good idea on how to access media files across the network. In an ideal world, we would have a Django storage backend that talked to something like http://ceph.com/ceph-storage/object-storage/ or http://hadoop.apache.org/hdfs/ -- we don't need anything like full file system semantics -- but for now, maybe just mounting the media files over NFS might be the quickest way to get things going.
Is it possible to have the other machines explicitly fetch the files they need from the current primary machine?
That avoids having to set up infrastructure to share files in this manner at the cost of having the fetches going on (which should be reasonably quick)
If implicit sharing is wanted/needed then ceph/hdfs seems to be a great way to go, or nfs to start with as you say.
Thanks,
James
James Westby james.westby@canonical.com writes:
On Mon, 02 Jul 2012 17:31:08 +1200, Michael Hudson-Doyle michael.hudson@canonical.com wrote:
Multi-machine data
Accessing postgres from another machine is a solved problem, to put it mildly :-)
I don't have a good idea on how to access media files across the network. In an ideal world, we would have a Django storage backend that talked to something like http://ceph.com/ceph-storage/object-storage/ or http://hadoop.apache.org/hdfs/ -- we don't need anything like full file system semantics -- but for now, maybe just mounting the media files over NFS might be the quickest way to get things going.
Is it possible to have the other machines explicitly fetch the files they need from the current primary machine?
All (or at least >1) nodes need to write.
That avoids having to set up infrastructure to share files in this manner at the cost of having the fetches going on (which should be reasonably quick)
Otherwise that would be a great idea :-)
We could do something like have all files be written locally (only one node will need to write to a given file) and served over http to other nodes but eh. I don't want to implement something myself here.
If implicit sharing is wanted/needed then ceph/hdfs seems to be a great way to go, or nfs to start with as you say.
Cheers, mwh
On Mon, 02 Jul 2012 23:16:19 +1200, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
All (or at least >1) nodes need to write.
Well, would it be possible to add an API so that workers can write results back? Something like the fast-models worker reporting everything in the results, rather than using the filesystem?
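Roughly what I'm imagining from the worker side -- the endpoint name and arguments below are completely made up, just to show the shape of it:

    # Made-up sketch of a worker pushing results to the master over
    # XML-RPC instead of writing to a shared filesystem.  The method name
    # and arguments are invented, not an existing LAVA API.
    import xmlrpclib

    server = xmlrpclib.ServerProxy("http://lava-server.example.com/RPC2/")

    with open("result-bundle.json") as f:
        server.results.submit_bundle("fastmodel-worker-01", f.read())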
Apologies if this makes no sense for your particular use case, I haven't looked at it in depth.
Thanks,
James
On 07/02/2012 12:31 AM, Michael Hudson-Doyle wrote:
Multi-machine code
This is easy IMHO: all machines should have the same code installed. With the appropriate ssh keys scattered around, it should be easy to write a fabric job or just plain bash script to run ldt update on each machine.
+1
Multi-machine data
Accessing postgres from another machine is a solved problem, to put it mildly :-)
I don't have a good idea on how to access media files across the network. In an ideal world, we would have a Django storage backend that talked to something like http://ceph.com/ceph-storage/object-storage/ or http://hadoop.apache.org/hdfs/ -- we don't need anything like full file system semantics -- but for now, maybe just mounting the media files over NFS might be the quickest way to get things going.
This feels like the messiest part of the problem to me. I keep trying to think of how we can avoid solving it, but if our fastmodel dispatcher conversation is any indicator, we'll have to use some hacks to work around things until it's solved.
Multi-machine configuration
I think by and large the configuration of each instance should be the same. This means we need a mechanism to distribute changes to the instances. One way would be to store the configuration in a branch, and have ldt update upgrade this branch too (I think it would even be fairly easy to have multiple copies of the configuration on disk, similar to the way we have multiple copies of the buildouts, and have each buildout point to a specific revision of the config).
We could also have the revision of the config branch to use be specified in the lava-manifest branch but that doesn't sound friendly to third party deployments -- I think the config branch should be specified as a parameter of the instance set and updating an instance set should update each instance to the latest version of the config branch. This will require a certain discipline in making changes to the branch!
A thought: what if the "master" server had some sort of API where it listed what code/config level it was at? Then the worker nodes could periodically pull from that and update themselves as needed. This might make upgrades easier. However, maybe this is the wrong idea and we should get to a point where puppet can handle this.
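Something along these lines is what I have in mind -- the URL, response format and the actual upgrade hook are all invented:

    # Invented sketch: a worker polling a hypothetical version endpoint on
    # the master and noticing when it needs to upgrade itself.
    import json
    import time
    import urllib2

    MASTER_VERSION_URL = "http://control.lava.example.com/api/version"

    def poll(current_revision, interval=300):
        while True:
            master = json.load(urllib2.urlopen(MASTER_VERSION_URL))
            if master["code_revision"] != current_revision:
                # Placeholder: this is where "ldt update" (or whatever we
                # end up with) would be kicked off locally.
                current_revision = master["code_revision"]
            time.sleep(interval)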
Setup issues
There will be a requirement to make sure ssh keys etc are set up on the various machines involved. Ideally this would be done via puppet but for now I think we can just do it by hand...
Yeah - we are only scaling to like 2 nodes, but puppet does seem like the most sane way to manage this long term.
Andy Doan andy.doan@linaro.org writes:
On 07/02/2012 12:31 AM, Michael Hudson-Doyle wrote:
Multi-machine code
This is easy IMHO: all machines should have the same code installed. With the appropriate ssh keys scattered around, it should be easy to write a fabric job or just plain bash script to run ldt update on each machine.
+1
Multi-machine data
Accessing postgres from another machine is a solved problem, to put it mildly :-)
I don't have a good idea on how to access media files across the network. In an ideal world, we would have a Django storage backend that talked to something like http://ceph.com/ceph-storage/object-storage/ or http://hadoop.apache.org/hdfs/ -- we don't need anything like full file system semantics -- but for now, maybe just mounting the media files over NFS might be the quickest way to get things going.
This feels like the messiest part of the problem to me. I keep trying to think of how we can avoid solving it, but if our fastmodel dispatcher conversation is any indicator, we'll have to use some hacks to work around things until it's solved.
So let's think about our requirements here a bit. We have two kinds of storage requirements really:
* bundles -- these come in via XML-RPC (i.e. the web front end) currently, although it's possible that we'll stop sending results from the dispatcher via XML-RPC in favour of some more direct insertion at some point.
They need to be read when being deserialized, which currently is done in the web front end but maybe we should stop doing that. They arrive in one lump and I can't really see that changing.
They only need to be read by the web front end in the "view bundle" tab, and I can't really see that changing.
* dispatcher log files. These are created where the dispatcher runs, although we already have the output handled by a specific process -- that could easily send the output via an API rather than just writing it to a file as it does now. They need to be accessible by the web front end for display and we really want to keep the incremental output we currently have.
Oh, there are also attachments -- but they are like bundles really.
If we get to rework how these files are stored, we should really figure out how to do backups sensibly. I'm sure things like Ceph support snapshots which would be much better than tarballing all the files every day...
That said, it feels like we could get away with keeping the media files on the web front end machine and shipping them from wherever the dispatcher runs to the web front end via celery or something.
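Something like this is what I have in mind for the celery route -- the task, paths and wiring are all assumptions, nothing that exists today:

    # Assumed sketch: a celery task that runs on the web front end machine
    # and writes a file shipped from a dispatcher node into the media tree.
    import os

    from celery.task import task

    MEDIA_ROOT = "/srv/lava/instances/production/media"

    @task
    def store_media_file(relative_path, content):
        destination = os.path.join(MEDIA_ROOT, relative_path)
        directory = os.path.dirname(destination)
        if not os.path.isdir(directory):
            os.makedirs(directory)
        with open(destination, "wb") as f:
            f.write(content)

The dispatcher side would then just call something like store_media_file.delay("job-1234/output.txt", data) as output becomes available (the path is only an example).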
Multi-machine configuration
I think by and large the configuration of each instance should be the same. This means we need a mechanism to distribute changes to the instances. One way would be to store the configuration in a branch, and have ldt update upgrade this branch too (I think it would even be fairly easy to have multiple copies of the configuration on disk, similar to the way we have multiple copies of the buildouts, and have each buildout point to a specific revision of the config).
We could also have the revision of the config branch to use be specified in the lava-manifest branch but that doesn't sound friendly to third party deployments -- I think the config branch should be specified as a parameter of the instance set and updating an instance set should update each instance to the latest version of the config branch. This will require a certain discipline in making changes to the branch!
A thought: what if the "master" server had some sort of API where it listed what code/config level it was at. Then the worker nodes could periodically pull from that and update themselves as needed? This might make upgrades easier. However, maybe this is the wrong idea and we should get to a point where puppet can handle this.
So, the stuff I'm reading suggests that a "push" mode using something like fabric is better for deployments than a "pull" oriented tool like puppet.
Setup issues
There will be a requirement to make sure ssh keys etc are set up on the various machines involved. Ideally this would be done via puppet but for now I think we can just do it by hand...
yeah- we are only scaling to like 2 nodes, but puppet does seem like the most sane way to manage this long term.
My impression is that we'd use something like puppet to do the sort of thing ldt setup does, and also maybe some of the unix-user and ssh key management, and stuff built on fabric or similar to run ldt install/update.
Cheers, mwh