On 3 June 2013 19:18, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
On Mon, 3 Jun 2013 12:57:43 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
I know some of our ARM slaves are a bit CPU light, but they also tend to have slow network connections. I am sure a bit of experimentation will tell us if we should always move files off some slaves to an intermediary to do the hash+upload stuff.
Well, I'd personally aim to make client-side publishing support clean and lean, so it is easy to set up and run on any client, including not very powerful ones. Of course, some cases may need an intermediary (like when we need to publish [big] files from a non-networked board (hmm)), but those are niche cases.
From a MultiNode perspective, clients will set up their own services if they need heavy lifting, e.g. AArch64 MultiNode could easily need to set up saturation-bandwidth big.LITTLE connections over TCP/IP, but that is up to the ARMv8 engineering team to prepare suitable images with this support already implemented. All LAVA would need to do is provide a connection from the child job back to the parent, so that each child can tell the parent the IP address it gets after boot, and so that the parent can tell the other child jobs under that parent the IP addresses of the other clients. This only needs a basic level of communication to be supported by LAVA itself, so it sounds like only a small part of what the publishing protocol would need.
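To make that IP-exchange step concrete, here is a rough Python sketch of what the parent-side bookkeeping could look like. All of the names here (ParentRegistry, declare, lookup, "role") are hypothetical illustrations of the protocol described above, not an existing LAVA API:

```python
# Hypothetical sketch: the parent dispatcher collates each child's
# boot-time address and serves it back to sibling jobs on request.
class ParentRegistry:
    """Collects each child's post-boot IP, keyed by its job "role"."""

    def __init__(self, expected_roles):
        self.addresses = {}                 # role -> (hostname, ip)
        self.expected = set(expected_roles)

    def declare(self, role, hostname, ip):
        """Called by a child after boot to report its allocated address."""
        self.addresses[role] = (hostname, ip)

    def lookup(self, role):
        """Called by a child wanting the address of another role.

        Returns None until that role has declared itself, so callers
        are expected to poll until data arrives or their timeout fires.
        """
        return self.addresses.get(role)

    def complete(self):
        """True once every expected role has declared an address."""
        return self.expected <= set(self.addresses)
```

The point of the sketch is how little state the parent needs: a single role-to-address map, filled in by children as they boot and read back by their siblings.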
Do we want to authenticate this sort of call? It should just be a dictionary or DB lookup, so it would probably take more CPU time to authenticate the call than to serve it. That said, you could use it to fish for files that already exist but that you don't have access to, so perhaps we need to filter the results per user...
My idea is that all publishing API calls are authenticated by that "security token". It is by definition limited in use: for example, it has source IP constraints, timing constraints (not usable before 30 min after issuance and not after 60 min), and other constraints, like being valid for no more than 50 API calls, or for publishing no more than 10 files, etc.
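As a sketch of what such a constrained token could look like, here is a minimal Python illustration. The class name, field names, and limits are all hypothetical, chosen to mirror the example constraints above, not any real publishing-service API:

```python
# Hypothetical sketch of a limited-use publishing token: every
# constraint is fixed at issuance, so nothing depends on a later
# "revoke" call arriving.
import time
from dataclasses import dataclass, field

@dataclass
class PublishToken:
    issued_at: float
    max_age_s: int = 3600                          # dead 60 min after issuance
    allowed_ips: set = field(default_factory=set)  # source IP constraint
    max_calls: int = 50                            # at most 50 API calls
    max_files: int = 10                            # at most 10 published files
    calls_used: int = 0
    files_published: int = 0

    def check(self, source_ip, now=None, publishing_file=False):
        """Return True (and consume quota) if this call is within limits."""
        now = time.time() if now is None else now
        if now - self.issued_at > self.max_age_s:
            return False
        if self.allowed_ips and source_ip not in self.allowed_ips:
            return False
        if self.calls_used >= self.max_calls:
            return False
        if publishing_file and self.files_published >= self.max_files:
            return False
        self.calls_used += 1
        if publishing_file:
            self.files_published += 1
        return True
```

The design point is that once the token is issued, its lifetime is bounded no matter what the issuing server later does or fails to do.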
OK, it occurs to me that I may not have broadcast my use case, which is: a server gets a token from the publishing service, then passes it to a slave. The slave uses it, and once the job has finished the server should be able to inform the publishing service that the token is no longer required.
Per security best practices, that's a worse solution than specifying constraints upfront. What if the server "forgets" to terminate the token's life?
We are aware of intermittent problems where a job sits in "Canceling" interminably. The risks of a token not being revoked need to be discussed, but it is a token-based service which I am considering for MultiNode.
Actually, I specifically brought this question up to avoid a situation where other engineers go for "spur of the moment" ad hoc implementations, and we end up with a bunch of crippled, insecure, hard-to-maintain publishing implementations (the current Jenkins one already has enough holes and is painful enough to set up and debug).
The two mechanisms have a lot in common; we clearly need to work together on both sets of use cases.
If we are only issuing and using the tokens over HTTPS, I think the best practice is not to restrict use of the service beyond how long the token is issued for.
Well, the constraints above were just an example of what we can easily implement with an HTTP-based system (and not so easily with a PAM-based one). Of course, the idea is that token constraints are flexible: the scheduling server decides which constraints to request on a token for a particular publishing client. I agree that basic constraints to start with would be: source IP (important for EC2, maybe less important for LAVA) and max lifetime.
The lifetime being specified in the job JSON?
On the other hand, Neil sent an email saying there are similar challenges for a multi-node LAVA setup. I haven't read through it yet, but my guess is that for (arbitrary) LAVA tests we'd rather use (and let our users use) standard tech like ssh/scp/rsync for inter-node communication, and then we'd need "PAM"-level auth anyway, at which point it makes little sense to have a separate auth scheme just for publishing.
I'm not sure how much of that LAVA would need to set up for MultiNode. It's more likely that the setup of a secure connection between two clients under test would need to be part of the test itself: an image with openssh-server, known users, possibly even pre-configured keys. MultiNode cannot prescribe how clients under test arrange their in-test connections. We just need to allow for a child job to declare its allocated IP details to the parent, the parent job to collate that data and serve it back to child jobs of the same parent, upon request via a token set up by the parent on the child filesystem prior to boot. Child jobs interested in a particular node will simply need to loop until the parent has the data from that client, or fail the test on a timeout. LAVA can provide helpers to do the queries to the parent and install those onto the child as part of lava_test_shell. Those helpers could well be the same as the ones which establish the connections used for publishing too?
That would just mean exposing the helper to lava_test_shell so that a test can obtain the data and start using the IP addresses as it sees fit. There would be no need for, and no support for, exposing the actual token outside the helpers. I'm working on the basis that MultiNode exposes only the IP addresses and hostnames of the jobs being managed by the parent, along with the "role" description specified in the job JSON. If a particular client image doesn't manage to set up networking within the timeout specified by the original JSON, that client will simply have a blank IP and hostname section. So as far as authentication goes, I expect MultiNode to only need to do a minimal amount of work: read a token put onto the child before boot, contact the parent with details of the IP address of that child, and then be able to query the parent for the IP addresses of other child jobs of that parent.
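The "loop until the parent has the data or fail on a timeout" part of such a helper could be as simple as the following sketch. The function name and the query callback are hypothetical; the clock and sleep functions are injectable only to make the loop easy to exercise:

```python
# Hypothetical child-side helper: poll the parent dispatcher until the
# wanted role's address is known, or fail the test on a timeout.
import time

def wait_for_role(query, role, timeout_s=300, poll_s=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll `query(role)` until it returns a (hostname, ip) pair.

    `query` stands in for the authenticated call to the parent; it
    returns None while that role has not yet declared its address.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        result = query(role)
        if result is not None:
            return result
        sleep(poll_s)
    raise TimeoutError(
        "no address for role %r within %d seconds" % (role, timeout_s))
```

A test image would then call something like `wait_for_role(parent_query, "server")` early in its test run and fail cleanly, rather than hang, if the other client never brings up networking.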
("parent" in this context would have to be the lava-dispatcher of the parent job as the contact details of the parent need to be written to the child filesystem prior to boot.)
Neil.