On Mon, 3 Jun 2013 12:57:43 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
[]
I know some of our ARM slaves are a bit CPU-light, but they also tend to have slow network connections. I am sure a bit of experimentation will tell us whether we should always move files off some slaves to an intermediary to do the hash+upload stuff.
Well, I'd personally aim to make client-side publishing support clean and lean, so it is easy to set up and run on any client, including not too powerful ones. Of course, some cases may need an intermediary (like when we need to publish [big] files from a non-networked board (hmm)), but those are niche cases.
Do we want to authenticate this sort of call? It should just be a dictionary or DB lookup, so authentication would probably cost more CPU time than the lookup itself. That said, it could be used to fish for files that already exist but that you don't have access to, so perhaps we need to filter the results per user...
My idea is that all publishing API calls are authed by that "security token". It is by definition limited-use: it can have source IP constraints, timing constraints (usable no earlier than 30 min after issuance and no later than 60 min), and other constraints, like: may be used for no more than 50 API calls, may publish no more than 10 files, etc.
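Such token constraints are easy to express as data plus a check. Below is a minimal Python sketch; the field names and limits are made up for the example (nothing here is an agreed schema), but it shows how cheap enforcing all of them per call would be:

```python
import ipaddress
import time

# Hypothetical token record; field names and values are illustrative only.
TOKEN = {
    "source_cidr": "10.0.0.0/8",  # only accept calls from this network
    "not_before": 1370260000,     # epoch seconds: token usable from here...
    "not_after": 1370263600,      # ...until here (a 60 min window)
    "max_calls": 50,              # at most 50 API calls
    "max_files": 10,              # may publish at most 10 files
    "calls_used": 0,
    "files_used": 0,
}

def token_allows(token, source_ip, now=None):
    """Return True only if a call from source_ip at time `now`
    satisfies every constraint attached to the token."""
    now = time.time() if now is None else now
    if not token["not_before"] <= now <= token["not_after"]:
        return False
    if ipaddress.ip_address(source_ip) not in ipaddress.ip_network(token["source_cidr"]):
        return False
    if token["calls_used"] >= token["max_calls"]:
        return False
    if token["files_used"] >= token["max_files"]:
        return False
    return True
```

The checks are a handful of comparisons, so the auth cost per call stays negligible next to the upload itself.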
OK, it occurs to me that I may not have broadcast my use case, which is: a server gets a token from the publishing service, then passes it to a slave. The slave uses it, and once the job has finished, the server should be able to inform the publishing service that the token is no longer required.
Per security practices, that's a worse solution than specifying constraints upfront. What if the server "forgets" to terminate the token's life? What if the server is DoSed, so it's unable to terminate a token which is meanwhile being used to do bad things? Otherwise, such a usecase is doable, of course.
I don't actually care if this is a public IP facing service.
My primary usecase is EC2 build slaves, for which the service is essentially public (well, we can allow it only for 10.* IPs, but then it's still open to all EC2 instances, including foreign ones).
I have already decided to create a proxy publishing service in the LAVA lab so slaves can share files between themselves internally, some of which can be tagged for upload. I can just run SSH publishing from the proxy.
That just follows the cumbersome design we currently have for Jenkins publishers. The more steps, the harder they are to get right, and then to maintain in the right state.
This can clearly be part of linaro-license-protection / some shared publishing service project, or a separate project. It is going to have quite a large overlap with any other publishing project though...
Yes, so if it's clearly a publishing service, then I'd encourage you to work on a design which solves all currently known publishing requirements and is flexible and generic enough to accommodate future ones (and those should basically be reducible to what we discussed: support sufficiently efficient publishing of arbitrary file sets, lean on the client side).
Actually, I specifically brought this question up to avoid the situation where other engineers go for "spur of the moment" adhoc implementations, and we end up with a bunch of crippled, insecure, hard-to-maintain publishing implementations (the current Jenkins one already has enough holes and enough pain to set up/debug).
If we only issue and use the tokens over HTTPS, I think the best practice is to not restrict use of the service beyond how long the token is issued for.
Well, the constraints above were just an example of what we can easily implement with an HTTP-based system (and not so easily with a PAM-based one). Of course, the idea is that token constraints are flexible: the scheduling server decides how many constraints to request in a token for a particular publishing client. I agree that the basic constraints to start with would be: source IP (important for EC2, maybe less important for LAVA) and max lifetime.
We can easily generate stats and set up alerts that point to potential abuse, but I would rather not have a job fail at the publishing step because they want to upload a large list of files.
api/add_link?type=md5&hash=1234abcd&<same as upload API from here>
It would be very easy to integrate this with most jobs, since it can all be done with CLI tools that are probably in the default install. Maybe not for Android/busybox - haven't looked. It would solve the OE problem quite easily and be flexible enough to allow other jobs to do the same.
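The hash-then-link flow really is just a few lines around standard tools. A rough Python sketch follows; the three callables are hypothetical stand-ins for the HTTP calls (hash query, api/add_link, plain upload), since those endpoints aren't specified yet:

```python
import hashlib

def md5_hex(data):
    """Hash an in-memory byte string (use file_md5 for big artifacts)."""
    return hashlib.md5(data).hexdigest()

def file_md5(path, chunk_size=1 << 20):
    """Stream-hash a file so large artifacts never need to fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def publish(digest, server_has_file, add_link, upload):
    """Cheap path first: if the server already has content with this
    hash, just ask it to link the content into place; otherwise fall
    back to a full upload."""
    if server_has_file(digest):
        add_link(digest)
        return "linked"
    upload()
    return "uploaded"
```

The equivalent on a slave with only a default install would be `md5sum` plus `curl`, which is why busybox coverage is the main open question.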
For file updates or creating a new file based on a diff, as long as both endpoints have the old file you can just create and send a binary diff. That would be simple HTTP[S] stuff as well.
We could do such optimizations, sure. Later, if needed ;-).
If the server has the old file and the client only has the new one, you are into interactive-protocol territory; it would probably be easier to work out how to get rsync to behave nicely than to invent something new. Probably involving a temporary SSH login that you can get over the HTTPS API, where the account login (password or key) is changed/deleted as the first login happens so it can't be re-used?
What about parallel publishers? What pool size of such manually-maintained accounts will be needed? What about race conditions and overall stability?
That way you don't need to mess with LDAP, PAM, Kerberos etc? I dunno, that was the first thing that came to mind!
Well, that feels fragile. And mixing up an HTTP service and a native service approach doesn't seem to be a good idea either. The whole idea of an HTTP service was to skip dealing with system-level (and thus "risky", both in terms of overall system stability and security) services. And we don't really have a usecase for full rsync behavior ("only changed *parts* of a file are uploaded") - we usually use tarballs with a gazillion of not-too-big files. So, even if only one file changed for a *build*, the tar will have lots of mtime changes in file headers interspersed with file contents, and *zip will smear that around, so a diff-style algorithm won't have much benefit.
On the other hand, Neil sent an email saying there are similar challenges for a multi-node LAVA setup. I haven't read through it yet, but my guess is that for (arbitrary) LAVA tests we'd rather use (and let our users use) standard tech like ssh/scp/rsync for inter-node communication; then we'd need "PAM"-level auth anyway, and it makes little sense to have a separate auth scheme just for publishing.
All I was saying was that writing an interactive protocol to update a file instead of sending a diff is a lot of effort and that we should at least think about (and probably prototype) something using rsync if we want to go down that route. I probably shouldn't have said more than that! I personally want to avoid that whole area since rsync + ssh has the same security headaches as we are currently living with and writing our own replacement seems bonkers.
I am glad to hear you don't think full-on rsync is useful :-)
Ok, so that's largely development/implementation details. I should say that I don't personally plan to work on this (unless assigned, of course, or unless it comes up on the critical path). That's why I handed it over to Tyler to spec out better and plan implementation. I'm glad it lies on your path, and I encourage you to pick up this task, as I still have a bunch of unfinished ones in my queue. In that regard, I'm just ping-ponging some requirements and ideas I had.
As for uploading tarballs, that sounds a lot like a case for the build not to create the tarball :-)
Well, it's not us who decide what our builds produce, and even assuming that changing that is justified, it's quite a big scope creep ;-).
I already have a protocol designed to do the file update thing (server has old file, client has new one but not old one) for a pet project that I was going to open source, but it is just a bit of fun and hasn't been tested, so even having got that far I would still tell other people to use rsync.
publish --token=<token> --type=<build_type> --strip=<strip> <build_id> <glob_pattern>...
This seems like a reasonable starting point. Let's make sure that it uses a configuration file to specify what to do with those build types etc. Preferably one that it can update from a public location, so we don't have to re-spin the tool to add a new build type (though I guess we normally check it out of VCS as we go, so that works too).
Well, on the client side, it's ideally just a single file which handles the obvious filtering options (like <glob_pattern> or --strip=) locally and passes the rest to the API/service. The server side can handle the options any way it wants; note that the options above don't require much "configuration" - for example, --type= just maps to a top-level download dir.
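As an illustration of how little the client would need to do, a sketch of the local filtering step in Python (the option semantics are my assumption from the command line above; --strip= is modeled on tar's --strip-components):

```python
import glob

def strip_components(path, strip):
    """Drop `strip` leading directory components from a path, keeping
    at least the file name, like tar's --strip-components."""
    parts = [p for p in path.split("/") if p]
    kept = parts[strip:] if strip < len(parts) else parts[-1:]
    return "/".join(kept)

def expand_files(patterns, strip=0):
    """Resolve <glob_pattern> arguments locally and pair each match
    with the destination path the server should use. Everything else
    (token, build type, build id) is simply forwarded to the API."""
    return [(path, strip_components(path, strip))
            for pattern in patterns
            for path in sorted(glob.glob(pattern))]
```

Everything beyond this stays server-side, so the client really can stay a single small file.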
Except that, well, we already have adhoc behavioral idiosyncrasies, like Android build flattening happening on the server. You can hardly "configure" that (though lambdas in YAML sound cool (for debugging :-D)). A better approach would be to move that stuff to the client side and have simple, well-defined publishing semantics.
Indeed! I don't know why that is done on server at the moment and I assumed that the publish script would just be a wrapper around curl with the destination path modified appropriately (or similar, you know me, I would write the whole thing in Python). I am sure there are good reasons for the current logic, but I don't know what they are :-)
James
On 29 May 2013 16:57, James Tunnicliffe james.tunnicliffe@linaro.org wrote:
Hi Paul,
Thanks for this. I need a mechanism for publishing from CI runtime jobs, so this is important to me. I did look into using SSH/SFTP, and while it is simple to do in a reasonably insecure way, it would be much better to have an HTTP[S]-based solution that uses temporary authentication tokens.
I was looking at this today because I woke up early worrying about it. Clearly I need more interesting stuff to think about! (and now, more sleep).
Anyway, it should be possible to do in Django:
https://docs.djangoproject.com/en/dev/topics/http/file-uploads/
https://pypi.python.org/pypi/django-transfer/0.2-2
http://wiki.nginx.org/HttpUploadModule
My own notes are more focused on an internal server that would upload files to releases/snapshots but could retain files until disk space was needed, acting as a cache. I was going to look at extending linaro-license-protection for this so there was no way to use the cache to avoid licenses. I was also going to have completely private files that you could only access if you had the authentication token that a job was given.
https://docs.google.com/a/linaro.org/document/d/1_ewb-xFDJc8Adk7AijV95XthGMv...
Feel free to add comments or insert more information and thoughts.
Note that for high performance uploads we probably want to hand off the upload to a web server. That django-transfer module doesn't support any Apache related upload stuff, which may mean that it doesn't exist. Moving to an nginx based solution would be easy enough if we needed to (we could replace the mod-xsendfile with the equivalent nginx call).
I think a prototype branch is ~1 day of engineering effort (no nginx, token based upload call in place, probably some kind of token request call, probably limited security, no proxy-publishing). Adding the rest of the features, testing etc probably takes it to more like 1 week.
James
On 29 May 2013 16:26, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
>
> Begin forwarded message:
>
> Date: Wed, 29 May 2013 17:19:31 +0300
> From: Paul Sokolovsky Paul.Sokolovsky@linaro.org
> To: Tyler Baker tyler.baker@linaro.org, Alan Bennett alan.bennett@linaro.org
> Cc: Senthil Kumaran senthil.kumaran@linaro.org, Fathi Boudra fathi.boudra@linaro.org
> Subject: Re: New publishing infra prototype report
>
> Hello Tyler,
>
> As brought up today on IRC, it's a month since the report below and the
> proposal for further steps, and I don't remember any reply. This whole
> publishing thing is peculiar: it sits still while it's not needed, but
> it lies in ambush there, ready to cause havoc at any time.
>
> For example, today Senthil came up with the question of how to publish
> to snapshots. Fortunately, it turned out to be a request for publishing
> a single file manually. But I know the guys are working on Fedora
> builds, and that will definitely need automated publishing (unless the
> initial requirements as provided by Fathi changed). And it's definitely
> needed for the CBuild migration, which I assume will be worked on next
> month.
>
> Btw, I discovered that a BP for this was already submitted by Danilo:
> https://blueprints.launchpad.net/linaro-infrastructure-misc/+spec/file-publi...
>
> Thanks,
> Paul
>
> On Mon, 29 Apr 2013 18:58:39 +0300 Paul Sokolovsky Paul.Sokolovsky@linaro.org wrote:
>
>> Hello,
>>
>> Last month I worked on a blueprint
>> https://blueprints.launchpad.net/linaro-android-infrastructure/+spec/prototy...
>> to prototype an implementation of a publishing framework which
>> wouldn't depend on particular Jenkins features (and misfeatures) and
>> could be reused for other services across the Linaro CI
>> infrastructure. Among these other projects are:
>>
>> 1. OpenEmbedded builds - efficient ("fresh only") publishing of
>> source tarballs and cache files.
>> 2. CBuild - publishing of toolchain build artifacts and logs.
>> 3. Fedora/Lava - publishing of build artifacts and logs.
>>
>> So, the good news is that it was possible to implement a publishing
>> system whose interface is a single script which hides all the
>> publishing complexity underneath. The implementation was cumbersome,
>> because the existing publishing backend was reused, but it already
>> opens the possibility for better logging, debugging, profiling, etc.
>>
>> With a proof-of-concept client side available, the main complexity
>> still lies in the server-side backend. It's clear that the current
>> "SFTP + SSH trigger script" approach doesn't scale well in terms of
>> ease of setup and security. I added my considerations on that topic
>> in the "Conclusions and Future Work" section of
>> http://bazaar.launchpad.net/~linaro-automation/linaro-android-build-tools/tr...
>>
>> So, the action items I suggest based on this report:
>>
>> 1. Tyler to consult with Fathi (Fedora), Marcin (OE) and me (CBuild)
>> and prepare an architecture/spec for the general publishing system.
>> It would be nice to BP this task to start in 13.05.
>> 2. Depending on the time required to prepare the spec, implementation
>> can be scheduled right away, or postponed until LCE13, so we have
>> another chance to discuss it face to face (as an adhoc meeting, or as
>> a session, if it's really worth it).
>>
>> Thanks,
>> Paul
>>
>> Linaro.org | Open source software for ARM SoCs
>> Follow Linaro: http://www.facebook.com/pages/Linaro
>> http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
>
> --
> Best Regards,
> Paul
-- James Tunnicliffe
-- Best Regards, Paul