I'd like to move the cbuild infrastructure out of my home office and domains and into the validation lab. That makes things one step cleaner than the current setup and one step closer to hooking things into LAVA.
There are more terse notes at: https://wiki.linaro.org/MichaelHope/Sandbox/CBuildMove
but here's how I'd deploy it:
* Create cbuild.linaro.org to replace ex.seabright.co.nz
* Add a real or virtual medium capacity machine to run the web server, scheduler, snapshotter, storage, and other administrative stuff
* Add 500 GB+ of backup storage
* Add a reverse proxy to expose the server to the internet
* Delete orion that currently runs ex.seabright.co.nz
* Delete the EC2 micro instance that runs apus.seabright.co.nz
* Redirect and delete builds.linaro.org
tcserver01 stays as a build and benchmark machine.
I'm not happy with control being the web server, bounce host, and a build machine. It's too taxing and unreliable. I'd like an unloaded minimal host there instead.
Thoughts? Who can drive this?
-- Michael
Howdy,
Meeting notes from today's meeting...
https://wiki.linaro.org/MichaelHope/Sandbox/CBuildMove
Attendees: Michael, Andy, Joey
Everything is running on an Atom server. Needs a chunk of RAM. Pulls gcc-linaro tip and fires off a build. Keeps an eye on gcc-linaro changes. All low-tech scripts, but very robust.
Wants to move this over to the Lava Lab.
Andy needs to look at machines left over.
Michael wants to have some control changes as part of this to help unburden it.
* Create cbuild.linaro.org to replace ex.seabright.co.nz
* Add a real or virtual medium capacity machine in the LAVA farm to run the web server, scheduler, snapshotter, storage, and other administrative stuff
* Add 500 GB+ of backup storage
* Add a reverse proxy to expose the server to the internet
* Delete orion that currently runs ex.seabright.co.nz
* Delete the EC2 micro instance that runs apus.seabright.co.nz
* Redirect and delete builds.linaro.org
Andy thinks the machine isn't a problem. Agrees with the control rework. Dave Piggot gets jerked around by various activities, so it'll be difficult to get the cycles.
Andy proposes:
* Dave can set up the bounce host since LAVA needs it. Mostly contained in an existing 12.08 blueprint
* We give someone remote access and let that someone take it over. Michael volunteered to do this if we go this route
We could spawn something off our openstack instance but Michael is quite concerned about contention. We're still waiting on the leased line to be installed so we don't have great bandwidth in Cambridge at this point.
IO is the killer for this according to Michael.
If we have enough memory, it would be helpful to load everything into a tmpfs, which would increase performance.
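For reference, a minimal sketch of the tmpfs idea, assuming a machine with plenty of RAM; the mount size, paths, and build entry point below are illustrative, not the lab's actual layout:

  # Mount a RAM-backed filesystem and stage the build there (size and paths are guesses).
  sudo mount -t tmpfs -o size=16G tmpfs /mnt/build-tmpfs

  # Copy the working tree in, build in RAM, then copy only the results back to disk.
  rsync -a ~/cbuild/ /mnt/build-tmpfs/cbuild/
  (cd /mnt/build-tmpfs/cbuild && ./run-build.sh)   # placeholder for the real build driver
  rsync -a /mnt/build-tmpfs/cbuild/results/ ~/cbuild/results/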
Definitely need a dedicated sysadmin in Cambridge. It would be better to have a sysadmin set up the infrastructure as written up by Michael.
Andy would like a lab diagram. Michael will document his proposed layout and then hand that off to Andy and Dave.
Dave and Andy will talk tomorrow, and Andy will set up a follow-up with Michael.
Thanks for the notes and sorry for missing the meeting (was actually around but thought it was a full-day event reminder that was firing rather than a real meeting, and didn't see the meeting in the daily google calendar email! sorry).
I'm fine with the plan, but it strikes me that we're considering Cambridge for this. Ok, we will eventually get a leased line but:
- Cambridge is not a real DC
- we're getting recommendations to use EC2 or other major cloud providers to run everything we're doing rather than trying to build our own infrastructure
- the relatively large storage requirements would be more easily addressed with services like S3 than with the expensive RAID arrays we'd have to buy
It seems whatever the choice, this is blocked on sysadmin time.
On Tue, Jul 24, 2012, Joey STANFORD wrote:
Howdy,
Meeting notes from today's meeting...
https://wiki.linaro.org/MichaelHope/Sandbox/CBuildMove
Attendees: Michael, Andy, Joey
Everything is running on an Atom server. Needs a chunk of RAM. Pulls gcc-linaro tip and fires off a build. Keeps an eye on gcc-linaro changes. All low-tech scripts, but very robust.
Wants to move this over to the Lava Lab.
Andy needs to look at machines left over.
Michael wants to have some control changes as part of this to help unburden it.
- Create cbuild.linaro.org to replace ex.seabright.co.nz
- Add a real or virtual medium capacity machine in the LAVA farm
to run the web server, scheduler, snapshotter, storage, and other administrative stuff
- Add 500 GB+ of backup storage
- Add a reverse proxy to expose the server to the internet
- Delete orion that currently runs ex.seabright.co.nz
- Delete the EC2 micro instance that runs apus.seabright.co.nz
- Redirect and delete builds.linaro.org
Andy thinks the machine isn't a problem. Agrees with the control rework. Dave Piggot gets jerked around by various activities, so it'll be difficult to get the cycles.
Andy proposes:
- Dave can set up the bounce host since LAVA needs it. Mostly
contained in an existing 12.08 blueprint
- We give someone remote access and let that someone take it over.
Michael volunteered to do this if we go this route
We could spawn something off our openstack instance but Michael is quite concerned about contention. We're still waiting on the leased line to be installed so we don't have great bandwidth in Cambridge at this point.
IO is the killer for this according to Michael.
If we have enough memory, it would be helpful to load everything into a tmpfs, which would increase performance.
Definitely need a dedicated sysadmin in Cambridge. It would be better to have a sysadmin set up the infrastructure as written up by Michael.
Andy would like a lab diagram. Michael will document his proposed layout and then hand that off to Andy and Dave.
Dave and Andy will talk tomorrow, and Andy will set up a follow-up with Michael.
On 25 July 2012 23:30, Loïc Minier loic.minier@linaro.org wrote:
Thanks for the notes and sorry for missing the meeting (was actually around but thought it was a full-day event reminder that was firing rather than a real meeting, and didn't see the meeting in the daily google calendar email! sorry).
I'm fine with the plan, but it strikes me that we're considering Cambridge for this. Ok, we will eventually get a leased line but:
- Cambridge is not a real DC
- we're getting recommendations to use EC2 or other major cloud providers to run everything we're doing rather than trying to build our own infrastructure
- the relatively large storage requirements would be more easily addressed with services like S3 than with the expensive RAID arrays we'd have to buy
I don't want a split solution where we use one thing for x86 and another for ARM, and there's no way of doing PandaBoards in the cloud. It might be interesting in the future to treat the Pandas as a cloud-like pool and see if there's a tool that gives an EC2-like API on top.
I want the same storage solution for the x86 and ARM builds. It seems inefficient to push the build results up to S3, especially the ~100 MB binaries. 500 GB is $60/month on S3 and I don't think the sysadmin costs will be any different.
My tools are also bad and use the filesystem as the database. They'd need reworking to run against S3.
-- Michael
On Thu, Jul 26, 2012, Michael Hope wrote:
I don't want a split solution where we use one thing for x86 and another for ARM, and there's no way of doing PandaBoards in the cloud. It might be interesting in the future to treat the Pandas as a cloud-like pool and see if there's a tool that gives an EC2-like API on top.
The EC2 instances would just be your x86 machines; if you need dedicated ones in Cambridge for benchmark accuracy, that's fine. I don't think we care about the EC2 API here; it's more about the fact that Amazon is in the business of building efficient / competitive infrastructure and we're not: anything we build is bound to be expensive and a waste of our brains (hmmm brainnns). We're struggling for sysadmin time; VMs in EC2 at least relieve us from the physical burden of admin and give us the possibility to upgrade/downgrade/revisit our virtual infrastructure.
I want the same storage solution for the x86 and ARM builds. It seems inefficient to push the build results up to S3, especially the ~100 MB binaries. 500 GB is $60/month on S3 and I don't think the sysadmin costs will be any different.
I don't know how much it costs to buy 500 GB of RAID storage, but I'm pretty sure it will take us more than a year to repay just the cost of the disks (disregarding the time to order, the time to set them up, the electricity costs, the cost of the RAID controller and of the server hosting it); then some disks might fail and we will have to replace them. Will we have 24/7 staff to replace the disks? etc.
If you're worried about the speed of the line, it's a worry no matter what; there's more pressure on the quality of the line if we host more things in Cambridge.
My tools are also bad and use the filesystem as the database. They'd need reworking to run against S3.
Why can't they use the local EBS filesystem on the instance they run on for the database side of things and S3 for the build publishing side of things? We could also use an EBS volume to store everything and expose it with Apache from EC2, just like we do for Android builds.
(There are solutions to pretend S3 is a filesystem, but frankly they suck, even more so with large files.)
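For illustration, a rough sketch of that split, assuming s3cmd for the publishing side and a plain EBS attachment for the working data; the device name, mount point, bucket, and file names are all made up:

  # One-off setup: format and mount an EBS volume for the "filesystem as database" side.
  sudo mkfs.ext4 /dev/xvdf
  sudo mkdir -p /srv/cbuild
  sudo mount /dev/xvdf /srv/cbuild

  # Publish finished artefacts to S3 (hypothetical bucket and key).
  s3cmd put --acl-public /srv/cbuild/results/gcc-linaro-4.7.tar.bz2 \
      s3://cbuild-artifacts/gcc-linaro/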
Loic brought up some good points about EC2 and I'll let that thread continue to discuss those merits. However, there are some things I can still discuss that are needed for LAVA regardless but also happen to fit nicely with Michael's goals.
I spoke to Dave this morning and got some guidance. The easiest way to cover things would be to describe how I'd like to see the lab laid out:
As we know, everything goes through "control", and that box is also running LAVA, which is not good. So we start with a new system, let's call it "bounce".
* Runs SSHD
* Runs Apache with multiple vhosts set up for reverse proxy.
This could be the new system Dave's setting up now to give access to the TC2s. I looked at the configuration stuff for this and it's pretty easy to do. We could prototype it on alternate ports before switching live to ensure there's no lapse of service.
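As a sketch of what that prototype could look like on Ubuntu's Apache (the hostname, test port, and backend address are assumptions, not the lab's real values):

  # Enable the proxy modules, listen on a spare port, and write a test vhost.
  sudo a2enmod proxy proxy_http
  echo "Listen 8080" | sudo tee -a /etc/apache2/ports.conf
  printf '%s\n' \
      '<VirtualHost *:8080>' \
      '    ServerName cbuild.linaro.org' \
      '    ProxyPass        / http://10.0.0.10/' \
      '    ProxyPassReverse / http://10.0.0.10/' \
      '</VirtualHost>' \
    | sudo tee /etc/apache2/sites-available/cbuild-proxy
  sudo a2ensite cbuild-proxy
  sudo service apache2 reload

Once that checks out on the alternate port, the same vhost could be switched to port 80 without touching the backend.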
As Michael noted yesterday, it would be nice to grant access to the lab using some type of sync with the SSH keys of users from a Launchpad group. I'm guessing that code has already been written somewhere else before.
We then start taking advantage of our new cloud setup. We currently have 5 System76 systems with these specs:
24 GB RAM, 500 GB disk, Intel Xeon X3450 quad core
One is the cloud controller and the other four can run VMs. For LAVA we'll be creating a new VM called "staging". We'll move the staging instance from our control node to this VM and also run a dogfood instance of LAVA. We'll also probably create a new instance for our FastModel node.
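As a rough sketch of spinning one of those up (all names below are placeholders, and the exact novaclient flag spellings may differ between client versions):

  # Boot a "staging" VM on the System76 cloud; flavor, image, and keypair names are guesses.
  nova boot --flavor m1.medium \
      --image precise-server-cloudimg-amd64 \
      --key_name lava-admin \
      staging
  nova list    # wait for the instance to reach ACTIVE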
So up to this point, I'm just covering basic things we need to do for LAVA. To service Michael, we'd:
* grant ssh access to "bounce"
* create a VM for him
* let him do as he wishes
-andy
On 26 July 2012 08:09, Andy Doan andy.doan@linaro.org wrote:
Loic brought up some good points about EC2 and I'll let that thread continue to discuss those merits. However, there are some things I can still discuss that are needed for LAVA regardless but also happen to fit nicely with Michael's goals.
I spoke to Dave this morning and got some guidance. The easiest way to cover things would be to describe how I'd like to see the lab laid out:
As we know, everything goes through "control", and that box is also running LAVA, which is not good. So we start with a new system, let's call it "bounce".
- Runs SSHD
- Runs Apache with multiple vhosts set up for reverse proxy.
This could be the new system Dave's setting up now to give access to the TC2s. I looked at the configuration stuff for this and it's pretty easy to do. We could prototype it on alternate ports before switching live to ensure there's no lapse of service.
As Michael noted yesterday, it would be nice to grant access to the lab using some type of sync with the SSH keys of users from a Launchpad group. I'm guessing that code has already been written somewhere else before.
We then start taking advantage of our new cloud setup. We currently have 5 System76 systems with these specs:
24 GB RAM, 500 GB disk, Intel Xeon X3450 quad core
One is the cloud controller and the other four can run VMs. For LAVA we'll be creating a new VM called "staging". We'll move the staging instance from our control node to this VM and also run a dogfood instance of LAVA. We'll also probably create a new instance for our FastModel node.
So up to this point, I'm just covering basic things we need to do for LAVA. To service Michael, we'd:
- grant ssh access to "bounce"
- create a VM for him
- let him do as he wishes
I'm happy to do that, but we should pick a philosophy first. I can do it and document it, but it won't be best practice and won't match the other systems.
-- Michael
On Wed, Jul 25, 2012, Andy Doan wrote:
So we start with a new system, let's call it "bounce".
- Runs SSHD
- Runs Apache with multiple vhosts set up for reverse proxy.
Yup, this is pretty much what I run at home as a frontend; simple and effective.
As Michael noted yesterday, it would be nice to grant access to the lab using some type of sync with the SSH keys of users from a Launchpad group. I'm guessing that code has already been written somewhere else before.
I looked at ssh-import-id, but it didn't have a team feature and didn't particularly impress me, so instead I hacked a custom script for the ~linaro-flexlm use case: http://bazaar.launchpad.net/~linaro-sysadmins/linaro-its-tools/trunk/view/he...
It's custom because it hardcodes ~linaro-flexlm and it also checks that people there are members of ~linaro (hardcoded as well). Not hard to make more generic.
It's wrapped by a lock in this script: http://bazaar.launchpad.net/~linaro-sysadmins/linaro-its-tools/trunk/view/he...
and the crontab entry looks like this: @hourly cd linaro-its-tools && bzr pull >/dev/null 2>&1 && ./update-ssh-keys ./flexlm-sshkeys
I've revamped the scripts and split them differently; it's more generic now, especially if you only care about the union of a set of Launchpad teams or persons.
e.g.: ./lp-members-sshkeys --sshkeys linaro-flexlm linaro-validation to dump SSH keys from folks recursively under ~linaro-flexlm or ~linaro-validation.
Cron to update ~/.ssh/authorized_keys would be: @hourly cd linaro-its-tools && bzr pull >/dev/null 2>&1 && PATH="$PATH:`pwd`" && update-ssh-keys lp-members-sshkeys --sshkeys linaro-flexlm linaro-validation
Cheers,
On Thu, Jul 26, 2012, Loïc Minier wrote:
As Michael noted yesterday, it would be nice to grant access to the lab using some type of sync with the SSH keys of users from a Launchpad group. I'm guessing that code has already been written somewhere else before.
I looked at ssh-import-id, but it didn't have a team feature and didn't particularly impress me, so instead I hacked a custom script for the ~linaro-flexlm use case: http://bazaar.launchpad.net/~linaro-sysadmins/linaro-its-tools/trunk/view/he...
It's custom because it hardcodes ~linaro-flexlm and it also checks that people there are members of ~linaro (hardcoded as well). Not hard to make more generic.
It's wrapped by a lock in this script: http://bazaar.launchpad.net/~linaro-sysadmins/linaro-its-tools/trunk/view/he...
and the crontab entry looks like this: @hourly cd linaro-its-tools && bzr pull >/dev/null 2>&1 && ./update-ssh-keys ./flexlm-sshkeys
-- Loïc Minier
That's awesome!
Dave has the new gateway box set up, so I can take a stab at this today.
One question: Does this script have the logic to remove users if they get removed from the LP group, or is that a manual step?
On 07/26/2012 10:04 AM, Loïc Minier wrote:
I've revamped the scripts and split them differently; it's more generic now, especially if you only care about the union of a set of Launchpad teams or persons.
e.g.: ./lp-members-sshkeys --sshkeys linaro-flexlm linaro-validation to dump SSH keys from folks recursively under ~linaro-flexlm or ~linaro-validation.
Cron to update ~/.ssh/authorized_keys would be: @hourly cd linaro-its-tools && bzr pull >/dev/null 2>&1 && PATH="$PATH:`pwd`" && update-ssh-keys lp-members-sshkeys --sshkeys linaro-flexlm linaro-validation
Cheers,
On Thu, Jul 26, 2012, Andy Doan wrote:
One question: Does this script have the logic to remove users if they get removed from the LP group, or is that a manual step?
It doesn't add/remove/merge, it just replaces the authorized_keys wholesale with a new version.
If you're worried and would like to make sure certain keys get included, you can easily extend this to concatenate a set of fixed keys; e.g. create a ~/bin/my-ssh-keys script which runs: cat ~/.ssh/authorized_keys.prepend lp-members-sshkeys --sshkeys linaro-access-team
and then call it from cron: @hourly cd linaro-its-tools && bzr pull >/dev/null 2>&1 && PATH="$PATH:`pwd`" && update-ssh-keys ~/bin/my-ssh-keys
Another good idea is to have SSH keys on the root account as a means to recover from issues on the regular user account.
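To make that concrete, here is a guess at the shape of an update-ssh-keys style wrapper (not the script in linaro-its-tools): it runs whatever key generator it's given, e.g. the ~/bin/my-ssh-keys above, and only swaps the result into place if the generator succeeded and produced something non-empty, so a Launchpad hiccup can't empty out authorized_keys:

  #!/bin/sh
  # Sketch only: regenerate ~/.ssh/authorized_keys atomically from the command in "$@".
  tmp=$(mktemp "$HOME/.ssh/authorized_keys.XXXXXX")
  if "$@" > "$tmp" && [ -s "$tmp" ]; then
      chmod 600 "$tmp"
      mv "$tmp" "$HOME/.ssh/authorized_keys"
  else
      rm -f "$tmp"
      exit 1
  fi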
On 07/26/2012 10:15 AM, Loïc Minier wrote:
On Thu, Jul 26, 2012, Andy Doan wrote:
One question: Does this script have the logic to remove users if they get removed from the LP group, or is that a manual step?
It doesn't add/remove/merge, it just replaces the authorized_keys wholesale with a new version.
If you're worried and would like to make sure certain keys get included, you can easily extend this to concatenate a set of fixed keys; e.g. create a ~/bin/my-ssh-keys script which runs: cat ~/.ssh/authorized_keys.prepend lp-members-sshkeys --sshkeys linaro-access-team
and then call it from cron: @hourly cd linaro-its-tools && bzr pull >/dev/null 2>&1 && PATH="$PATH:`pwd`" && update-ssh-keys ~/bin/my-ssh-keys
Another good idea is to have SSH keys on the root account as a means to recover from issues on the regular user account.
I see how you use it now. I've just made a one-off script based on yours here:
http://bazaar.launchpad.net/~doanac/+junk/linaro-its-tools/view/head:/lp-manage-local
This takes the users in a given LP group and syncs them to local accounts. I've started it on our new server and it seems to be working smoothly so far. Not sure if you are interested in this or not.
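Without having seen lp-manage-local, the sync presumably looks something like the sketch below; the helper invocation, the way user names are fed in, and the account defaults are assumptions, not the actual script:

  #!/bin/sh
  # Illustration only: read Launchpad user names on stdin, create a matching
  # local account for each, and install that person's SSH keys.
  while read user; do
      if ! id "$user" >/dev/null 2>&1; then
          adduser --disabled-password --gecos "" "$user"
      fi
      home=$(getent passwd "$user" | cut -d: -f6)
      install -d -m 700 -o "$user" -g "$user" "$home/.ssh"
      lp-members-sshkeys --sshkeys "$user" > "$home/.ssh/authorized_keys"
      chown "$user:$user" "$home/.ssh/authorized_keys"
      chmod 600 "$home/.ssh/authorized_keys"
  done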