On Wed, 29 May 2013 17:11:11 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
Issues raised in http://bazaar.launchpad.net/~linaro-automation/linaro-android-build-tools/tr...
Getting a token: I see this as: the service that starts the job has some secret allowing it to request a token, which is then passed on to the job.
Yes, just like that: we have a fixed (and thus trusted) job scheduler, which can reliably authenticate itself to a token server, get a token, and pass that token on to a much less trusted builder.
This may require a Jenkins plugin.
Yes, that's one possibility, as line 94 of the doc you quote says. Another possibility is to have an extra frontend on top of Jenkins, which would request a token and inject it into the Jenkins job (as a parameter at start). That's possible right away with android-build and would be possible with other Jenkins instances under the original ci-frontend plan.
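Roughly, that frontend flow could look like this (a sketch only: the token-server endpoint, secret handling and parameter names are made up; buildWithParameters is Jenkins' standard remote-trigger API):

import requests

TOKEN_SERVER = "https://publish.example.org/api/request_token"  # assumed
JENKINS = "https://ci.example.org"
FRONTEND_SECRET = "..."  # persistent credential, lives only on the trusted frontend

def start_build(job_name, build_id):
    # The trusted frontend exchanges its secret for a short-lived token...
    resp = requests.post(TOKEN_SERVER,
                         data={"secret": FRONTEND_SECRET, "build_id": build_id})
    resp.raise_for_status()
    token = resp.json()["token"]
    # ...and injects it into the Jenkins job as a start parameter.
    requests.post("%s/job/%s/buildWithParameters" % (JENKINS, job_name),
                  data={"PUBLISH_TOKEN": token, "BUILD_ID": build_id})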
OpenEmbedded builds/rsync: This is not required. OE has a list of package sources and versions associated with each build, so we should be able to write a simple "upload packages if that version doesn't exist" script. This probably requires us to host files in openembedded/sources/<package name>/<version>/<package file>, but that is a simple enough change.
James, for me, what you said translates into "let's dig into the adhoc peculiarities of one usecase, because it seems easy to handle, as if we didn't know that in the end it still won't be that easy, and the result will not be reusable". Nope, let's shoot for a general solution to the problem of publishing arbitrary file sets. That's also "easy". The hardest part is to accept that we'll duplicate a best-practice solution (rsync), and that by duplicating it, we'll lose extra features like intra-file optimizations (because *that* we for sure don't want to duplicate).
That's why I'd like anyone who has strong reasons for going with rsync to come up with them before we decide on the architecture (keeping in mind that rsync comes with friends like ssh, LDAP, PAM, Kerberos, etc.).
publish --token=<token> --type=<build_type> --strip=<strip> <build_id> <glob_pattern>...
This seems like a reasonable starting point. Let's make sure that it uses a configuration file to specify what to do with those build types etc. Preferably one that it can update from a public location, so we don't have to re-spin the tool to add a new build type (though I guess we normally check it out of VCS as we go, so that works too).
Well, on the client side, it's ideally just a single script which handles the obvious filtering options (like <glob_pattern> or --strip=) locally and passes the rest to the API/service. The server side can handle the options in any way it wants; note that the options above don't require much "configuration" - for example, --type= just maps to a top-level download dir.
Except that, well, we already have adhoc behavioral idiosyncrasies, like the flattening of Android builds happening on the server. You can hardly "configure" that (though lambdas in YAML sound cool (for debugging :-D)). A better approach would be to move that stuff to the client side and have simple, well-defined publishing semantics.
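To make that concrete, the whole client could be a single-file sketch like this (the /api/publish endpoint and its parameter names are assumptions, not an existing service):

import argparse, glob, os
import requests

API = "https://publish.example.org/api/publish"  # assumed endpoint

def main():
    p = argparse.ArgumentParser()
    p.add_argument("--token", required=True)
    p.add_argument("--type", dest="build_type", required=True)
    p.add_argument("--strip", type=int, default=0)
    p.add_argument("build_id")
    p.add_argument("patterns", nargs="+")
    args = p.parse_args()

    for pattern in args.patterns:
        for path in glob.glob(pattern):
            # --strip=N drops N leading path components, as tar/patch do;
            # this is the only "smart" thing done locally.
            dest = "/".join(path.split(os.sep)[args.strip:])
            with open(path, "rb") as f:
                resp = requests.post(API,
                                     params={"token": args.token,
                                             "type": args.build_type,
                                             "build_id": args.build_id,
                                             "dest": dest},
                                     files={"file": f})
            resp.raise_for_status()

if __name__ == "__main__":
    main()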
James
On 29 May 2013 16:57, James Tunnicliffe james.tunnicliffe@linaro.org wrote:
Hi Paul,
Thanks for this. I need a mechanism for publishing from CI runtime jobs, so this is important to me. I did look into using SSH/SFTP, and while it is simple to do in a reasonably insecure way, it would be much better to have an HTTP[S]-based solution that uses temporary authentication tokens.
I was looking at this today because I woke up early worrying about it. Clearly I need more interesting stuff to think about! (and now, more sleep).
Anyway, it should be possible to do in Django:
https://docs.djangoproject.com/en/dev/topics/http/file-uploads/
https://pypi.python.org/pypi/django-transfer/0.2-2
http://wiki.nginx.org/HttpUploadModule
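A bare-bones sketch of what the Django side could look like (check_token() and UPLOAD_ROOT are placeholders; as noted below, a real deployment would hand the transfer off to the web server):

import os
from django.http import HttpResponse, HttpResponseForbidden

UPLOAD_ROOT = "/srv/publish"  # assumed storage area

def check_token(token):
    # Placeholder: would look the token up in the DB and check its constraints.
    return token is not None

def upload(request):
    if not check_token(request.GET.get("token")):
        return HttpResponseForbidden("bad or missing token")
    # NB: a real version must sanitize "dest" against path traversal.
    dest = os.path.join(UPLOAD_ROOT, request.GET["dest"])
    destdir = os.path.dirname(dest)
    if not os.path.isdir(destdir):
        os.makedirs(destdir)
    with open(dest, "wb") as out:
        for chunk in request.FILES["file"].chunks():
            out.write(chunk)
    return HttpResponse("ok")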
My own notes are more focused on an internal server that would upload files to releases/snapshots, but could retain files until disk space was needed, acting as a cache. I was going to look at extending linaro-license-protection for this, so there would be no way to use the cache to avoid licenses. I was also going to have completely private files that you could only access if you had the authentication token that a job was given.
https://docs.google.com/a/linaro.org/document/d/1_ewb-xFDJc8Adk7AijV95XthGMv...
Feel free to add comments or insert more information and thoughts.
Note that for high-performance uploads we probably want to hand the upload off to a web server. That django-transfer module doesn't support any Apache-related upload mechanism, which may mean that none exists. Moving to an nginx-based solution would be easy enough if we needed to (we could replace mod-xsendfile with the equivalent nginx call).
I think a prototype branch is ~1 day of engineering effort (no nginx, token-based upload call in place, probably some kind of token request call, probably limited security, no proxy-publishing). Adding the rest of the features, testing, etc. probably takes it to more like 1 week.
James
On 29 May 2013 16:26, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
Begin forwarded message:
Date: Wed, 29 May 2013 17:19:31 +0300
From: Paul Sokolovsky Paul.Sokolovsky@linaro.org
To: Tyler Baker tyler.baker@linaro.org, Alan Bennett alan.bennett@linaro.org
Cc: Senthil Kumaran senthil.kumaran@linaro.org, Fathi Boudra fathi.boudra@linaro.org
Subject: Re: New publishing infra prototype report
Hello Tyler,
As brought up today on IRC, it's a month since the below report and proposal for further steps, and I don't remember any reply. This whole publishing thing is peculiar: it sits still when it's not needed, but it's actually lying in ambush, ready to cause havoc at any time.
For example, today Senthil came up with the question of how to publish to snapshots. Fortunately, it turned out to be a request to publish a single file manually. But I know the guys are working on Fedora builds, and that will definitely need automated publishing (unless the initial requirements as provided by Fathi have changed). And it's definitely needed for the CBuild migration, which I assume will be worked on next month.
Btw, I discovered that a BP for this was already submitted by Danilo: https://blueprints.launchpad.net/linaro-infrastructure-misc/+spec/file-publi...
Thanks, Paul
On Mon, 29 Apr 2013 18:58:39 +0300 Paul Sokolovsky Paul.Sokolovsky@linaro.org wrote:
Hello,
Last month I worked on a blueprint https://blueprints.launchpad.net/linaro-android-infrastructure/+spec/prototy... to prototype an implementation of a publishing framework which wouldn't depend on particular Jenkins features (and misfeatures) and could be reused for other services across the Linaro CI infrastructure. Among these other projects are:
1. OpenEmbedded builds - efficient ("fresh only") publishing of source tarballs and cache files.
2. CBuild - publishing of toolchain build artifacts and logs.
3. Fedora/LAVA - publishing of build artifacts and logs.
So, the good news is that it was possible to implement a publishing system whose interface is a single script which hides all the publishing complexity underneath. The implementation was cumbersome, because the existing publishing backend was reused, but it already opens the possibility of better logging, debugging, profiling, etc.
With a proof-of-concept client side available, the main complexity still lies in the server-side backend. It's clear that the current "SFTP + SSH trigger script" approach doesn't scale well in terms of ease of setup and security. I added my considerations on that topic to the "Conclusions and Future Work" section of http://bazaar.launchpad.net/~linaro-automation/linaro-android-build-tools/tr...
So, the action items I suggest based on this report:
1. Tyler to consult with Fathi (Fedora), Marcin (OE) and me (CBuild) and prepare an architecture/spec for the general publishing system. It would be nice to BP this task to start in 13.05.
2. Depending on the time required to prepare the spec, implementation can be scheduled right away, or postponed until LCE13, so we have another chance to discuss it face to face (as an adhoc meeting, or as a session, if it's really worth it).
Thanks, Paul
Linaro.org | Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
-- Best Regards, Paul
Linaro.org | Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
-- Best Regards, Paul
Linaro.org | Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
-- James Tunnicliffe
On 31 May 2013 14:22, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
On Wed, 29 May 2013 17:11:11 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
OpenEmbedded builds/rsync: This is not required. OE has a list of package sources and versions associated with each build, so we should be able to write a simple "upload packages if that version doesn't exist" script. This probably requires us to host files in openembedded/sources/<package name>/<version>/<package file>, but that is a simple enough change.
James, for me, what you said translates into "let's dig into the adhoc peculiarities of one usecase, because it seems easy to handle, as if we didn't know that in the end it still won't be that easy, and the result will not be reusable". Nope, let's shoot for a general solution to the problem of publishing arbitrary file sets. That's also "easy". The hardest part is to accept that we'll duplicate a best-practice solution (rsync), and that by duplicating it, we'll lose extra features like intra-file optimizations (because *that* we for sure don't want to duplicate).
That's why I'd like anyone who has strong reasons for going with rsync to come up with them before we decide on the architecture (keeping in mind that rsync comes with friends like ssh, LDAP, PAM, Kerberos, etc.).
Well, my explanation was very OE-specific, and I didn't get anywhere close to specifying how it would work. Criticism received and understood :-)
How about we make it easy for the client side to identify files that are already on the server, with some kind of "I would upload this, but you already have it" semantic (basically setting up a filesystem link over HTTP[S]):
api/test_exists?type=md5&hash=1234abcd --> return text 0 or 1.
Do we want to authenticate this sort of call? It should just be a dictionary or DB lookup, so it would probably take more CPU time to authenticate the call than to answer it. That said, you could use it to fish for files that already exist but that you don't have access to, so perhaps we need to filter the results per user...
api/add_link?type=md5&hash=1234abcd&<same as upload API from here>
It would be very easy to integrate this with most jobs, since it can all be done with CLI tools that are probably in the default install. Maybe not for Android/busybox; I haven't looked. It would solve the OE problem quite easily and be flexible enough to allow other jobs to do the same.
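The client side of that flow could be sketched like this (the endpoint names follow the proposal above; the base URL and the upload call are assumptions):

import hashlib
import requests

BASE = "https://publish.example.org/api"  # assumed

def md5_of(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def publish(path, token, dest):
    digest = md5_of(path)
    exists = requests.get("%s/test_exists" % BASE,
                          params={"type": "md5", "hash": digest}).text
    if exists.strip() == "1":
        # The server already has the bits: just link them into place.
        requests.post("%s/add_link" % BASE,
                      params={"type": "md5", "hash": digest,
                              "token": token, "dest": dest})
    else:
        with open(path, "rb") as f:
            requests.post("%s/upload" % BASE,
                          params={"token": token, "dest": dest},
                          files={"file": f})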
For file updates, or creating a new file based on a diff: as long as both endpoints have the old file, you can just create and send a binary diff. That would be simple HTTP[S] stuff as well.
If the server has the old file and the client only has the new one, you are into interactive-protocol territory; it would probably be easier to work out how to get rsync to behave nicely than to invent something new. Probably involving a temporary SSH login that you can obtain over the HTTPS API, where the account login (password or key) is changed/deleted as the first login happens so it can't be re-used? That way you don't need to mess with LDAP, PAM, Kerberos, etc.? I dunno, that was the first thing that came to mind!
I already have a protocol designed to do the file-update thing (server has the old file, client has the new one but not the old one) for a pet project that I was going to open source, but it is just a bit of fun and hasn't been tested, so even having got that far, I would still tell other people to use rsync.
Except that, well, we already have adhoc behavioral idiosyncrasies, like the flattening of Android builds happening on the server. You can hardly "configure" that (though lambdas in YAML sound cool (for debugging :-D)). A better approach would be to move that stuff to the client side and have simple, well-defined publishing semantics.
Indeed! I don't know why that is done on the server at the moment; I assumed that the publish script would just be a wrapper around curl with the destination path modified appropriately (or similar - you know me, I would write the whole thing in Python). I am sure there are good reasons for the current logic, but I don't know what they are :-)
James
On Fri, 31 May 2013 15:50:54 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
On 31 May 2013 14:22, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
On Wed, 29 May 2013 17:11:11 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
Issues raised in http://bazaar.launchpad.net/~linaro-automation/linaro-android-build-tools/tr...
This is very similar to what the Multi-Node parent <-> child communication may need.
Getting a token: I see this as: the service that starts the job has some secret allowing it to request a token, which is then passed on to the job.
Precisely what the Multi-Node setup could use. I'm thinking the token could be added as part of the jobdata in lava-dispatcher, commands.py: job = LavaTestJob(jobdata, oob_file, config, self.args.output_dir)
My expectation for this Multi-Node support is that the token would be written into the test environment, and the test would use it to get the list of other clients for the multi-node job. Once each client has booted and "called home" to the parent using the token, clients could get the IP address and role of the other clients. This then allows Multi-Node tests to make calls between clients to test their own protocols and other requirements.
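From the client's side, that call-home step could be sketched like this (every endpoint name and the reply format here are hypothetical; the token is assumed to have been written into the test environment by the dispatcher):

import json
import requests

PARENT = "http://parent.lava.internal/api"  # assumed

def call_home(token, my_role, my_ip):
    # Register this client with the parent for the multi-node job...
    requests.post("%s/call_home" % PARENT,
                  data={"token": token, "role": my_role, "ip": my_ip})
    # ...then fetch the IP address and role of every other client.
    resp = requests.get("%s/clients" % PARENT, params={"token": token})
    return json.loads(resp.text)  # e.g. [{"role": "server", "ip": "10.0.0.2"}, ...]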
Just wondering how much overlap there could be between publishing for CI and publishing IP addresses between clients running a Multi-Node job.
I've only taken the briefest look at the Multi-Node aspects so far; there's a PDF I've shared with some rough flow ideas for Parent-Child-Communication.
On 31 May 2013 16:25, Neil Williams codehelp@debian.org wrote:
My expectation for this Multi-Node support is that the token would be written into the test environment, and the test would use it to get the list of other clients for the multi-node job. Once each client has booted and "called home" to the parent using the token, clients could get the IP address and role of the other clients. This then allows Multi-Node tests to make calls between clients to test their own protocols and other requirements.
Cool. Looks like if one of us finds or invents a library to do this, then we should share. I did use a REST library with Django before that did some token-based auth, but it was a bit lacking in features where I needed them.
linaro-license-protection (which runs snapshots and releases) uses a SQLite database to store some data, so if I wrote something (and frankly, not writing something right now just as an experiment seems impossible) I would likely store a user: key mapping in there and have a nice interface to request a new key and delete old ones. Essentially it sounds like a persistent dictionary. (I know that this doesn't cover group-based stuff, but that doesn't matter for a proof-of-concept hack and would be simple enough to add.)
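A sketch of that persistent dictionary (the table layout and helper names are made up):

import sqlite3
import uuid

class TokenStore(object):
    def __init__(self, path="tokens.db"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS tokens "
                        "(user TEXT, key TEXT PRIMARY KEY)")

    def new_key(self, user):
        key = uuid.uuid4().hex  # opaque random token
        self.db.execute("INSERT INTO tokens VALUES (?, ?)", (user, key))
        self.db.commit()
        return key

    def user_for(self, key):
        row = self.db.execute("SELECT user FROM tokens WHERE key = ?",
                              (key,)).fetchone()
        return row[0] if row else None

    def delete(self, key):
        self.db.execute("DELETE FROM tokens WHERE key = ?", (key,))
        self.db.commit()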
Just wondering how much overlap there could be between publishing for CI and publishing IP addresses between clients running a Multi-Node job.
I've only taken the briefest look at the Multi-Node aspects so far; there's a PDF I've shared with some rough flow ideas for Parent-Child-Communication.
It sounds like a similar problem. Writing a Django app that receives a GET /<job ID> and returns a JSON blob of metadata is simple, and I would be happy to hack something together to get you started, or pair-program with you if that would help. Adding a layer of security, and the ability for clients to push updates to the server, would be easy as well.
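Something along these lines (the URL pattern and the in-memory stand-in for a real model are made up):

import json
from django.http import HttpResponse, Http404

# urls.py would carry something like:
#   url(r'^(?P<job_id>\d+)$', job_metadata)

JOBS = {}  # stand-in for a real model: job_id -> metadata dict

def job_metadata(request, job_id):
    if job_id not in JOBS:
        raise Http404
    return HttpResponse(json.dumps(JOBS[job_id]),
                        content_type="application/json")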
-- James Tunnicliffe
Hello Neil,
On Fri, 31 May 2013 16:25:44 +0100 Neil Williams codehelp@debian.org wrote:
[]
Issues raised in http://bazaar.launchpad.net/~linaro-automation/linaro-android-build-tools/tr...
This is very similar to what the Multi-Node parent <-> child communication may need.
Thanks for looking into this, Neil, and for seeing possibilities for reuse in other scenarios; that's exactly why I wanted a wider discussion of this, led by Tyler, the architect for the entire team, because with an "infrastructure" outlook we can miss something.
My expectation for this Multi-Node support is that the token would be written into the test environment, and the test would use it to get the list of other clients for the multi-node job. Once each client has booted and "called home" to the parent using the token, clients could get the IP address and role of the other clients. This then allows Multi-Node tests to make calls between clients to test their own protocols and other requirements.
Well, my question here would be: what are these "calls" that clients in a multinode setup make among themselves? My first thought would be network testing, but as Antonio wrote in the other email, it doesn't have to be, and the primary communication channel is still serial.
I'm afraid I don't know enough about multinode to assess whether it shares enough commonality in authentication requirements with build slaves (which do publishing at the end of a build). Just to elaborate, with Jenkins build slaves we have:
1. A static build master which schedules builds. This system is considered trusted, as user/3rd-party code cannot be executed there.
2. Actual builds are executed on EC2 slaves, which come and go, and which also execute code specified by users. So we cannot trust the slave environment and don't want to expose any "persistent" authentication credentials there; instead, only one-off tokens should be used, so that if an attacker gets hold of them, they won't get much.
Frankly speaking, as far as I can tell, point 2 applies pretty much to the LAVA board pool too, except for some differences: there's a finite set of boards (i.e. they're under more scrutiny), and they're firewalled. But again, I don't know what kind of security (vs flexibility) you guys want with the multinode setup and what's the best way to implement it...
Hello,
On Fri, 31 May 2013 15:50:54 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
How about we make it easy for the client side to identify files that are already on the server, with some kind of "I would upload this, but you already have it" semantic (basically setting up a filesystem link over HTTP[S]):
api/test_exists?type=md5&hash=1234abcd --> return text 0 or 1.
Yes, that was my idea. Actually, we could/should do it like rsync does: allow either a hash or length/mtime - because a hash may take too long to compute for large files, and mtime may be off between servers and not reliable enough for all usecases.
Do we want to authenticate this sort of call? It should just be a dictionary or DB lookup, so it would probably take more CPU time to authenticate the call than to answer it. That said, you could use it to fish for files that already exist but that you don't have access to, so perhaps we need to filter the results per user...
My idea is that all publishing API calls are authed by that "security token". It is by definition limited-use: it can have source-IP constraints, timing constraints (e.g. valid no earlier than 30 min after issuance and no later than 60 min), and other constraints, like "may be used for no more than 50 API calls", "may publish no more than 10 files", etc.
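Checking those constraints server-side could look roughly like this (the exact fields and limits are illustrative only, not a spec):

import time

class Token(object):
    def __init__(self, value, source_ip, issued_at,
                 not_before=30 * 60, not_after=60 * 60,
                 max_calls=50, max_files=10):
        self.value = value
        self.source_ip = source_ip
        self.issued_at = issued_at
        self.not_before = not_before  # seconds after issuance
        self.not_after = not_after
        self.max_calls = max_calls
        self.max_files = max_files
        self.calls = 0
        self.files = 0

    def check(self, client_ip, publishes_file=False):
        age = time.time() - self.issued_at
        if client_ip != self.source_ip:
            return False
        if not (self.not_before <= age <= self.not_after):
            return False
        self.calls += 1
        if self.calls > self.max_calls:
            return False
        if publishes_file:
            self.files += 1
            if self.files > self.max_files:
                return False
        return True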
For file updates, or creating a new file based on a diff: as long as both endpoints have the old file, you can just create and send a binary diff. That would be simple HTTP[S] stuff as well.
We could do such optimizations, sure. Later, if needed ;-).
If the server has the old file and the client only has the new one, you are into interactive-protocol territory; it would probably be easier to work out how to get rsync to behave nicely than to invent something new. Probably involving a temporary SSH login that you can obtain over the HTTPS API, where the account login (password or key) is changed/deleted as the first login happens so it can't be re-used?
What about parallel publishers? What pool size of such manually-maintained accounts will be needed? What about race conditions and overall stability?
That way you don't need to mess with LDAP, PAM, Kerberos, etc.? I dunno, that was the first thing that came to mind!
Well, that feels fragile. And mixing up an HTTP service and a native-service approach doesn't seem like a good idea either. The whole idea of an HTTP service was to skip dealing with system-level (and thus "risky", both in terms of overall system stability and security) services. And we don't really have a usecase for full rsync behavior ("only changed *parts* of a file are uploaded") - we usually use tarballs with a gazillion not-too-big files. So even if, for a *build*, only one file changed, tar will have lots of mtime changes in file headers interspersed with file contents, and *zip will smear those around, so a diff-style algorithm won't get much benefit.
On the other hand, Neil sent an email saying there're similar challenges for the multi-node LAVA setup. I didn't read through it yet, but my guess is that for (arbitrary) LAVA tests we'd rather use (and let our users use) standard tech like ssh/scp/rsync for inter-node communication; then we'd need "PAM"-level auth anyway, and a separate auth scheme just for publishing makes little sense.
On 3 June 2013 11:40, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
Hello,
api/test_exists?type=md5&hash=1234abcd --> return text 0 or 1.
Yes, that was my idea. Actually, we could/should do it like rsync does: allow either a hash or length/mtime - because a hash may take too long to compute for large files, and mtime may be off between servers and not reliable enough for all usecases.
Indeed, a hash is the gold standard, but file name, size, and modification date are all good first checks.
I wouldn't worry too much about hash-calculation overhead - md5sum is disk-limited on my desktop's magnetic disks (the disk averages 149MB/s, CPU is about 60% of one core). Provided we cache the result on the server, it shouldn't be too bad. If the assumption that files don't change in the storage area holds, we should be able to generate hashes offline and keep them recorded permanently. It would be reasonable to store name, mtime, and hash for each file, so we can do a quick mtime check before returning the hash.
I know some of our ARM slaves are a bit CPU-light, but they also tend to have slow network connections. I am sure a bit of experimentation will tell us whether we should always move files off some slaves to an intermediary to do the hash+upload work.
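The mtime-guarded cache could be as simple as this (a plain dict stands in for the real DB record of name/mtime/hash):

import hashlib
import os

_cache = {}  # path -> (mtime, md5 hexdigest)

def cached_md5(path):
    mtime = os.path.getmtime(path)
    hit = _cache.get(path)
    if hit and hit[0] == mtime:
        return hit[1]  # quick mtime check, no rehash
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    _cache[path] = (mtime, h.hexdigest())
    return _cache[path][1]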
Do we want to authenticate this sort of call? It should just be a dictionary or DB lookup, so it would probably take more CPU time to authenticate the call than to answer it. That said, you could use it to fish for files that already exist but that you don't have access to, so perhaps we need to filter the results per user...
My idea is that all publishing API calls are authed by that "security token". It is by definition limited-use: it can have source-IP constraints, timing constraints (e.g. valid no earlier than 30 min after issuance and no later than 60 min), and other constraints, like "may be used for no more than 50 API calls", "may publish no more than 10 files", etc.
OK, it occurs to me that I may not have broadcast my use case, which is: a server gets a token from the publishing service, then passes it to a slave. The slave uses it, and once the job has finished, the server should be able to inform the publishing service that the token is no longer required.
I don't actually care if this is a public-IP-facing service. I have already decided to create a proxy publishing service in the LAVA lab so slaves can share files between themselves internally, some of which can be tagged for upload. I can just run SSH publishing from the proxy. This could clearly be part of linaro-license-protection / some shared publishing service project, or a separate project. It is going to have quite a large overlap with any other publishing project though...
If we are only issuing and using the tokens over HTTPS, I think best practice is not to restrict the use of the service beyond how long the token is issued for. We can easily generate stats and set up alerts that point to potential abuse, but I would rather not have a job fail at the publishing step because it wants to upload a large list of files.
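That lifecycle could be sketched like this (all endpoint names are made up; "job" is a stand-in for however the server actually runs the slave's work):

import requests

BASE = "https://publish.example.org/api"  # assumed

def run_job(server_secret, job):
    # The trusted server obtains a token on the slave's behalf...
    token = requests.post("%s/request_token" % BASE,
                          data={"secret": server_secret}).json()["token"]
    try:
        # ...hands it to the slave, which publishes with it...
        job.run(env={"PUBLISH_TOKEN": token})
    finally:
        # ...and tells the publishing service it is no longer required.
        requests.post("%s/revoke_token" % BASE,
                      data={"secret": server_secret, "token": token})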
Well, that feels fragile. And mixing up an HTTP service and a native-service approach doesn't seem like a good idea either. The whole idea of an HTTP service was to skip dealing with system-level (and thus "risky", both in terms of overall system stability and security) services. And we don't really have a usecase for full rsync behavior ("only changed *parts* of a file are uploaded") - we usually use tarballs with a gazillion not-too-big files. So even if, for a *build*, only one file changed, tar will have lots of mtime changes in file headers interspersed with file contents, and *zip will smear those around, so a diff-style algorithm won't get much benefit.
All I was saying was that writing an interactive protocol to update a file, instead of sending a diff, is a lot of effort, and that we should at least think about (and probably prototype) something using rsync if we want to go down that route. I probably shouldn't have said more than that! I personally want to avoid that whole area, since rsync + ssh has the same security headaches we are currently living with, and writing our own replacement seems bonkers.
I am glad to hear you don't think full-on rsync is useful :-)
As for uploading tarballs, that sounds a lot like a case for the build not to create the tarball :-)
I already have a protocol designed to do the file update thing (server has old file, client has new one but not old one) for a pet project that I was going to open source, but it is just a bit of fun and hasn't been tested, so even having got that far I would still tell other people to use rsync.
publish --token=<token> --type=<build_type> --strip=<strip> <build_id> <glob_pattern>...
This seems like a reasonable starting point. Lets make sure that it uses a configuration file to specify what to do with those build types etc. Preferably one that it can update from a public location so we don't have to re-spin the tool to add a new build type (though I guess we normally check it out of VCS as we go, so that works too).
Well, on client side, it's ideally just a single file which just handle obvious filtering options (like <glob_pattern> or --strip=) locally and passes the rest to API/service. Server-side can handle the options in any way it wants, note that options above don't require much "configuration", for example --type= just maps to top-level download dir.
Except for, well, we already have adhoc behavioral idiosyncrasies, like Android builds flattening happening on server. You hardly can "configure" that (though lambdas in YAML sounds cool (for debugging :-D)). Better approach would be to move that stuff to client side and have simple well-defined publishing semantics.
Indeed! I don't know why that is done on server at the moment and I assumed that the publish script would just be a wrapper around curl with the destination path modified appropriately (or similar, you know me, I would write the whole thing in Python). I am sure there are good reasons for the current logic, but I don't know what they are :-)
James
James
On 29 May 2013 16:57, James Tunnicliffe james.tunnicliffe@linaro.org wrote:
Hi Paul,
Thanks for this. I need a mechanism for publishing from CI runtime jobs so this is important to me. I did look into using SSH/SFTP and it is simple to do in a reasonably insecure way, it would be much better to have an HTTP[S] based solution that uses temporary authentication tokens.
I was looking at this today because I woke up early worrying about it. Clearly I need more interesting stuff to think about! (and now, more sleep).
Anyway, it should be possible to do in Django:
https://docs.djangoproject.com/en/dev/topics/http/file-uploads/ https://pypi.python.org/pypi/django-transfer/0.2-2 http://wiki.nginx.org/HttpUploadModule
My own notes are more focused on an internal server that would upload files to releases/snapshots but could retain files until disk space was needed to act as a cache. I was going to look at extending linaro-licenese-protection for this so there was no way to use the cache to avoid licenses. I was also going to have completely private files that you could only access if you had the authentication token that a job was given.
https://docs.google.com/a/linaro.org/document/d/1_ewb-xFDJc8Adk7AijV95XthGMv...
Feel free to add comments or insert more information and thoughts.
Note that for high performance uploads we probably want to hand off the upload to a web server. That django-transfer module doesn't support any Apache related upload stuff, which may mean that it doesn't exist. Moving to an nginx based solution would be easy enough if we needed to (we could replace the mod-xsendfile with the equivalent nginx call).
I think a prototype branch is ~1 day of engineering effort (no nginx, token based upload call in place, probably some kind of token request call, probably limited security, no proxy-publishing). Adding the rest of the features, testing etc probably takes it to more like 1 week.
James
On 29 May 2013 16:26, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
Begin forwarded message:
Date: Wed, 29 May 2013 17:19:31 +0300 From: Paul Sokolovsky Paul.Sokolovsky@linaro.org To: Tyler Baker tyler.baker@linaro.org, Alan Bennett alan.bennett@linaro.org Cc: Senthil Kumaran senthil.kumaran@linaro.org, Fathi Boudra fathi.boudra@linaro.org Subject: Re: New publishing infra prototype report
Hello Tyler,
As brought up today on IRC, it's a month since the report and proposal for further steps below, and I don't remember any reply. This whole publishing thing is peculiar: it sits still while it's not needed, but it's actually lying in ambush, ready to cause havoc at any time.
For example, today Senthil came up with the question of how to publish to snapshots. Fortunately, it turned out to be a request to publish a single file manually. But I know the guys are working on Fedora builds, and that will definitely need automated publishing (unless the initial requirements as provided by Fathi changed). And it's definitely needed for the CBuild migration, which I assume will be worked on next month.
Btw, I discovered that a BP for this was already submitted by Danilo: https://blueprints.launchpad.net/linaro-infrastructure-misc/+spec/file-publi...
Thanks, Paul
On Mon, 29 Apr 2013 18:58:39 +0300 Paul Sokolovsky Paul.Sokolovsky@linaro.org wrote:
Hello,
Last month I worked on a blueprint https://blueprints.launchpad.net/linaro-android-infrastructure/+spec/prototy... to prototype an implementation of a publishing framework which wouldn't depend on particular Jenkins features (and misfeatures) and could be reused for other services across Linaro CI infrastructure. Among these other projects are:
1. OpenEmbedded builds - efficient ("fresh only") publishing of source tarballs and cache files.
2. CBuild - publishing of toolchain build artifacts and logs.
3. Fedora/Lava - publishing of build artifacts and logs.
So, the good news is that it was possible to implement a publishing system whose interface is a single script which hides all the publishing complexity underneath. The implementation was cumbersome, because the existing publishing backend was reused, but it already opens the possibility for better logging, debugging, profiling, etc.
With a proof-of-concept client side available, the main complexity still lies in the server-side backend. It's clear that the current "SFTP + SSH trigger script" approach doesn't scale well in terms of ease of setup and security. I added my considerations on that topic in the "Conclusions and Future Work" section of http://bazaar.launchpad.net/~linaro-automation/linaro-android-build-tools/tr...
So, action items I suggest based on this report:
1. Tyler to consult with Fathi (Fedora), Marcin (OE) and me (CBuild) and prepare an architecture/spec for the general publishing system. It would be nice to BP this task to start in 13.05.
2. Depending on the time required to prepare the spec, implementation can be scheduled right away, or postponed until LCE13, so we have another chance to discuss it face to face (as an adhoc meeting, or as a session, if it's really worth it).
Thanks, Paul
Linaro.org | Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
-- Best Regards, Paul
Linaro.org | Open source software for ARM SoCs Follow Linaro: http://www.facebook.com/pages/Linaro http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
-- James Tunnicliffe
On Mon, 3 Jun 2013 12:57:43 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
[]
I know some of our ARM slaves are a bit CPU light, but they also tend to have slow network connections. I am sure a bit of experimentation will tell us if we should always move files off some slaves to an intermediary to do the hash+upload stuff.
Well, I'd personally aim to make client-side publishing support clean and lean, so it is easy to set up and run on any client, including not-too-powerful ones. Of course, some cases may need an intermediary (like when we need to publish [big] files from a non-networked board (hmm)), but those are niche cases.
Do we want to authenticate this sort of call? It should just be a dictionary or DB lookup, so authenticating it would probably take more CPU time than serving it. That said, you can use it to fish for files that already exist that you don't have access to, so perhaps we need to filter the results based on each user...
My idea is that all publishing API calls are authenticated by that "security token". It's by definition limited-use: it can carry source IP constraints, timing constraints (e.g. usable no earlier than 30min after issuance and no later than 60min), and other constraints, like allowing no more than 50 API calls, publishing no more than 10 files, etc.
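For illustration, a sketch of what such a limited-use token could look like on the server side (the exact constraint set and numbers are examples, not a spec):

    # Sketch: a token with upfront constraints, checked on every API call.
    import time

    def make_token(value, source_ip, issued=None):
        issued = issued if issued is not None else time.time()
        return {
            "value": value,
            "source_ip": source_ip,          # only this client may use it
            "not_before": issued,            # could also be issued + delay
            "not_after": issued + 60 * 60,   # hard expiry, no revocation needed
            "max_calls": 50,
            "max_files": 10,
            "calls": 0,
            "files": 0,
        }

    def check_token(tok, client_ip, publishes_file=False):
        now = time.time()
        if client_ip != tok["source_ip"]:
            return False
        if not tok["not_before"] <= now <= tok["not_after"]:
            return False
        tok["calls"] += 1
        if publishes_file:
            tok["files"] += 1
        return tok["calls"] <= tok["max_calls"] and tok["files"] <= tok["max_files"]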
OK, it occurs to me that I may not have broadcast my use case, which is: a server gets a token from the publishing service, then passes it to a slave. The slave uses it, and once the job has finished the server should be able to inform the publishing service that the token is no longer required.
Per security best practices, that's a worse solution than specifying constraints upfront. What if the server "forgets" to terminate the token? What if the server is DoSed and unable to revoke a token which is being used to do bad things in the meantime? Otherwise, such a usecase is doable of course.
I don't actually care if this is a public IP facing service.
My primary usecase is EC2 build slaves, for which the service is essentially public (well, we can allow it only for 10.* IPs, but it's still open for all EC2 instances then, including foreign).
I have already decided to create a proxy publishing service in the LAVA lab so slaves can share files between themselves internally, some of which can be tagged for upload. I can just run SSH publishing from the proxy.
That just follows the cumbersome design we currently have for Jenkins publishers. The more steps, the harder it is to get them right, and then to keep them in the right state.
This can clearly be part of linaro-license-protection / some shared publishing service project, or a separate project. It is going to have quite a large overlap with any other publishing project though...
Yes, so if it's clearly a publishing service, then I'd encourage you to work on a design which solves all currently known publishing requirements and is flexible and generic enough to accommodate future ones (and those should basically be reducible to what we discussed: support for sufficiently efficient publishing of arbitrary file sets, lean on the client side).
Actually, I specifically brought this question up to avoid a situation where other engineers go for "spur of the moment" adhoc implementations and we end up with a bunch of crippled, insecure, hard-to-maintain publishing implementations (the current Jenkins one already has enough holes and is enough pain to set up/debug).
If we are only issuing and using the tokens over HTTPS I think that the best practice is to not restrict the use of the service other than how long the token is issued for.
Well, the constraints above were just an example of what we can easily implement with an HTTP-based system (and not so easily with a PAM-based one). Of course, the idea is that token constraints are flexible: the scheduling server decides which constraints to request on a token for a particular publishing client. I agree that the basic constraints to start with would be source IP (important for EC2, maybe less important for LAVA) and max lifetime.
We can easily generate stats and set up alerts that point to potential abuse, but I would rather not have a job fail at the publishing step because it wants to upload a large list of files.
api/add_link?type=md5&hash=1234abcd&<same as upload API from here>
It would be very easy to integrate this with most jobs since it can all be done with CLI tools that are probably in the default install. Maybe not for Android/busybox. Haven't looked. It would solve the OE problem quite easily and be flexible enough to allow other jobs to do the same.
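A sketch of how a job could use that call to skip re-uploading files the server already has (the endpoint and the "404 means unknown hash" convention are assumptions):

    # Sketch: hash locally, try add_link first, upload only on a miss.
    import hashlib
    import requests

    API = "https://publish.example.org/api"  # assumed endpoint

    def publish_dedup(token, path, name):
        with open(path, "rb") as f:
            md5 = hashlib.md5(f.read()).hexdigest()
        # Ask the server to link an already-stored copy by hash...
        r = requests.post(API + "/add_link",
                          params={"type": "md5", "hash": md5,
                                  "token": token, "name": name})
        if r.status_code == 404:  # hash unknown: upload for real
            with open(path, "rb") as f:
                requests.post(API + "/upload",
                              params={"token": token, "name": name},
                              files={"file": f}).raise_for_status()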
For file updates or creating a new file based on a diff, as long as both endpoints have the old file you can just create and send a binary diff. That would be simple HTTP[S] stuff as well.
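For example, with the third-party bsdiff4 package (one possible choice, not a decision) the client half of that is a few lines; the server would apply the patch with bsdiff4.patch():

    # Sketch: client has both old and new file, sends only a binary delta.
    import bsdiff4  # third-party, pip install bsdiff4
    import requests

    API = "https://publish.example.org/api"  # assumed endpoint

    def send_update(token, old_path, new_path, name):
        with open(old_path, "rb") as f:
            old = f.read()
        with open(new_path, "rb") as f:
            new = f.read()
        patch = bsdiff4.diff(old, new)  # binary delta, usually small
        requests.post(API + "/patch",
                      params={"token": token, "name": name},
                      data=patch).raise_for_status()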
We could do such optimizations, sure. Later, if needed ;-).
If the server has the old file and the client only has the new one, you are into interactive protocol territory; it would probably be easier to work out how to get rsync to work nicely than to invent something new. Probably involving a temporary SSH login that you can obtain over the HTTPS API, with the account credentials (password or key) changed/deleted as the first login happens so it can't be re-used?
What about parallel publishers? What pool size of such manually-maintained accounts will be needed? What about race conditions and overall stability?
That way you don't need to mess with LDAP, PAM, Kerberos etc? I dunno, that was the first thing that came to mind!
Well, that feels fragile. And mixing up the HTTP-service and native-service approaches doesn't seem to be a good idea either. The whole idea of the HTTP service was to skip dealing with system-level (and thus "risky", both in terms of overall system stability and security) services. And we don't really have a usecase for full rsync behavior ("only changed *parts* of a file are uploaded") - we usually use tarballs with a gazillion not-too-big files. So, even if only one file changed between *builds*, the tar will have lots of mtime changes in file headers interspersed with file contents, and *zip will smear those around so that a diff-style algorithm doesn't gain much.
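That claim is easy to check with a throwaway experiment: build the same archive twice with a single mtime changed and count how many bytes still match at the same offsets (gzip spreads the early header change across the rest of the stream):

    # Throwaway experiment: one mtime change in a .tar.gz leaves little
    # byte-level commonality for a diff-style algorithm to exploit.
    import io
    import tarfile

    def make_targz(mtime):
        buf = io.BytesIO()
        with tarfile.open(fileobj=buf, mode="w:gz") as tf:
            data = b"payload " * 4096
            info = tarfile.TarInfo("file.bin")
            info.size = len(data)
            info.mtime = mtime  # the only difference we introduce
            tf.addfile(info, io.BytesIO(data))
        return buf.getvalue()

    a, b = make_targz(0), make_targz(1 << 20)
    same = sum(x == y for x, y in zip(a, b))
    print("%d of %d bytes identical at the same offsets"
          % (same, min(len(a), len(b))))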
On the other hand, Neil sent an email saying there are similar challenges for a multi-node LAVA setup. I didn't read through it yet, but my guess is that for (arbitrary) LAVA tests we'd rather use (and let our users use) standard tech like ssh/scp/rsync for inter-node communication; then we'd need "PAM"-level auth anyway, and it would make little sense to have a separate auth scheme just for publishing.
All I was saying was that writing an interactive protocol to update a file instead of sending a diff is a lot of effort and that we should at least think about (and probably prototype) something using rsync if we want to go down that route. I probably shouldn't have said more than that! I personally want to avoid that whole area since rsync + ssh has the same security headaches as we are currently living with and writing our own replacement seems bonkers.
I am glad to hear you don't think full-on rsync is useful :-)
Ok, so these are largely development/implementation details. I should say that I don't personally plan to work on this (unless assigned, of course, or unless it comes up on the critical path). That's why I handed it over to Tyler to spec out better and plan implementation. I'm glad it lies on your path, and I encourage you to pick up this task, as I still have a bunch of unfinished ones in my queue. In that regard, I'm just ping-ponging some requirements and ideas I had.
As for uploading tarballs, that sounds a lot like a case for the build not to create the tarball :-)
Well, it's not us who decide what our builds produce, and even assuming such a change is justified, it's quite a big scope creep ;-).
I already have a protocol designed to do the file update thing (server has old file, client has new one but not old one) for a pet project that I was going to open source, but it is just a bit of fun and hasn't been tested, so even having got that far I would still tell other people to use rsync.
On 3 June 2013 19:18, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
On Mon, 3 Jun 2013 12:57:43 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
I know some of our ARM slaves are a bit CPU light, but they also tend to have slow network connections. I am sure a bit of experimentation will tell us if we should always move files off some slaves to an intermediary to do the hash+upload stuff.
Well, I'd personally aim to make client-side publishing support clean and lean, so it is easy to set up and run on any client, including not-too-powerful ones. Of course, some cases may need an intermediary (like when we need to publish [big] files from a non-networked board (hmm)), but those are niche cases.
From a MultiNode perspective, clients will set up their own services if they need heavy lifting, e.g. AArch64 MultiNode could easily need to set up saturation-bandwidth big.LITTLE connections over TCP/IP, but that is up to the ARMv8 engineering team to prepare suitable images with this support already implemented. All LAVA would need to do is provide a connection from the child job back to the parent, so that each child can tell the parent the IP address it gets after boot and the parent can tell other child jobs the IP addresses of the other clients for that parent. This only needs a basic level of communication to be supported by LAVA itself. So this sounds like only a small part of what the publishing protocol would need.
Do we want to authenticate this sort of call? It should just be a dictionary or DB lookup, so authenticating it would probably take more CPU time than serving it. That said, you can use it to fish for files that already exist that you don't have access to, so perhaps we need to filter the results based on each user...
My idea is that all publishing API calls are authenticated by that "security token". It's by definition limited-use: it can carry source IP constraints, timing constraints (e.g. usable no earlier than 30min after issuance and no later than 60min), and other constraints, like allowing no more than 50 API calls, publishing no more than 10 files, etc.
OK, it occurs to me that I may not have broadcast my use case, which is: a server gets a token from the publishing service, then passes it to a slave. The slave uses it, and once the job has finished the server should be able to inform the publishing service that the token is no longer required.
Per security best practices, that's a worse solution than specifying constraints upfront. What if the server "forgets" to terminate the token?
We are aware of intermittent problems where a job sits in "Canceling" interminably. The risks of a token not being revoked need to be discussed, but it is a token-based service that I am considering for MultiNode.
Actually, I specifically brought this question up to avoid a situation where other engineers go for "spur of the moment" adhoc implementations and we end up with a bunch of crippled, insecure, hard-to-maintain publishing implementations (the current Jenkins one already has enough holes and is enough pain to set up/debug).
The two mechanisms have a lot in common; we clearly need to work together on both sets of use cases.
If we are only issuing and using the tokens over HTTPS I think that the best practice is to not restrict the use of the service other than how long the token is issued for.
Well, the constraints above were just an example of what we can easily implement with an HTTP-based system (and not so easily with a PAM-based one). Of course, the idea is that token constraints are flexible: the scheduling server decides which constraints to request on a token for a particular publishing client. I agree that the basic constraints to start with would be source IP (important for EC2, maybe less important for LAVA) and max lifetime.
The lifetime being specified in the job JSON?
On the other hand, Neil sent an email saying there are similar challenges for a multi-node LAVA setup. I didn't read through it yet, but my guess is that for (arbitrary) LAVA tests we'd rather use (and let our users use) standard tech like ssh/scp/rsync for inter-node communication; then we'd need "PAM"-level auth anyway, and it would make little sense to have a separate auth scheme just for publishing.
I'm not sure how much of that LAVA would need to set up for MultiNode. It's more likely that the setup of a secure connection between two clients under test would need to be part of the test itself: an image with openssh-server, known users, possibly even pre-configured keys. MultiNode cannot prescribe how clients under test arrange their in-test connections. We just need to allow for a child job to declare its allocated IP details to the parent, and for the parent job to collate that data and serve it back to child jobs of the same parent upon request, via a token set up by the parent on the child filesystem prior to boot. Child jobs interested in a particular node will simply need to loop until the parent has the data from that client, or fail the test on a timeout. LAVA can provide helpers to do the queries to the parent and install those onto the child as part of lava_test_shell. Those helpers could well be the same as the ones which establish the connections used for publishing too?
That would just mean exposing the helper to lava_test_shell so that a test can obtain the data and start using the IP addresses as it sees fit. There would be no need for (and no support for) exposing the actual token outside the helpers. I'm working on the basis that MultiNode exposes only the IP addresses and hostnames of the jobs being managed by the parent, along with the "role" description specified in the job JSON. If a particular client image doesn't manage to set up networking within the timeout specified by the original JSON, that client will simply have a blank IP and hostname section. So as far as authentication goes, I expect MultiNode to only need to do a minimal amount of work: read a token put onto the child before boot, contact the parent with details of the IP address of that child, and then be able to query the parent for the IP addresses of other child jobs of that parent.
("parent" in this context would have to be the lava-dispatcher of the parent job as the contact details of the parent need to be written to the child filesystem prior to boot.)
Neil.
linaro-validation@lists.linaro.org