On 3 June 2013 19:18, Paul Sokolovsky paul.sokolovsky@linaro.org wrote:
On Mon, 3 Jun 2013 12:57:43 +0100 James Tunnicliffe james.tunnicliffe@linaro.org wrote:
I know some of our ARM slaves are a bit CPU light, but they also tend to have slow network connections. I am sure a bit of experimentation will tell us if we should always move files off some slaves to an intermediary to do the hash+upload stuff.
Well, I'd personally aim to make client-side publishing support clean and lean, so it is easy to set up and run on any client, including not very powerful ones. Of course, some cases may need an intermediary (like when we need to publish [big] files from a non-networked board (hmm)), but those are niche cases.
From a MultiNode perspective, clients will set up their own services if they need heavy lifting, e.g. AArch64 MultiNode could easily need to set up saturation-bandwidth big.LITTLE connections over TCP/IP, but that is up to the ARMv8 engineering team to prepare suitable images with this support already implemented. All LAVA would need to do is provide a connection from the child job back to the parent, so that each child can tell the parent the IP address it gets after boot, and so that the parent can tell the other child jobs under that parent the IP addresses of the other clients. This only needs a basic level of communication to be supported by LAVA itself, so it sounds like only a small part of what the publishing protocol would need.
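To make that IP-exchange step concrete, here is a rough Python sketch of what the parent-side bookkeeping could look like. All of the names here (ParentRegistry, declare, lookup, "role") are hypothetical illustrations of the protocol described above, not an existing LAVA API:

```python
# Hypothetical sketch: the parent dispatcher collates each child's
# boot-time address and serves it back to sibling jobs on request.
class ParentRegistry:
    """Collects each child's post-boot IP, keyed by its job "role"."""

    def __init__(self, expected_roles):
        self.addresses = {}                 # role -> (hostname, ip)
        self.expected = set(expected_roles)

    def declare(self, role, hostname, ip):
        """Called by a child after boot to report its allocated address."""
        self.addresses[role] = (hostname, ip)

    def lookup(self, role):
        """Called by a child wanting the address of another role.

        Returns None until that role has declared itself, so callers
        are expected to poll until data arrives or their timeout fires.
        """
        return self.addresses.get(role)

    def complete(self):
        """True once every expected role has declared an address."""
        return self.expected <= set(self.addresses)
```

The point of the sketch is how little state the parent needs: a single role-to-address map, filled in by children as they boot and read back by their siblings.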
Do we want to authenticate this sort of call? It should just be a dictionary or DB lookup, so it would probably take more CPU time to authenticate the call than to serve it. That said, you could use it to fish for files that already exist but that you don't have access to, so perhaps we need to filter the results per user...
My idea is that all publishing API calls are authenticated by that "security token". It is by definition limited in use: for example, it has source IP constraints, timing constraints (not usable before 30 min after issuance and not after 60 min), and other constraints, like being valid for no more than 50 API calls, or for publishing no more than 10 files, etc.
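As a sketch of what such a constrained token could look like, here is a minimal Python illustration. The class name, field names, and limits are all hypothetical, chosen to mirror the example constraints above, not any real publishing-service API:

```python
# Hypothetical sketch of a limited-use publishing token: every
# constraint is fixed at issuance, so nothing depends on a later
# "revoke" call arriving.
import time
from dataclasses import dataclass, field

@dataclass
class PublishToken:
    issued_at: float
    max_age_s: int = 3600                          # dead 60 min after issuance
    allowed_ips: set = field(default_factory=set)  # source IP constraint
    max_calls: int = 50                            # at most 50 API calls
    max_files: int = 10                            # at most 10 published files
    calls_used: int = 0
    files_published: int = 0

    def check(self, source_ip, now=None, publishing_file=False):
        """Return True (and consume quota) if this call is within limits."""
        now = time.time() if now is None else now
        if now - self.issued_at > self.max_age_s:
            return False
        if self.allowed_ips and source_ip not in self.allowed_ips:
            return False
        if self.calls_used >= self.max_calls:
            return False
        if publishing_file and self.files_published >= self.max_files:
            return False
        self.calls_used += 1
        if publishing_file:
            self.files_published += 1
        return True
```

The design point is that once the token is issued, its lifetime is bounded no matter what the issuing server later does or fails to do.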
OK, it occurs to me that I may not have broadcast my use case, which is: a server gets a token from the publishing service, then passes it to a slave. The slave uses it, and once the job has finished the server should be able to inform the publishing service that the token is no longer required.
Per security best practices, that's a worse solution than specifying constraints upfront. What if the server "forgets" to terminate the token's life?
We are aware of intermittent problems where a job sits in "Canceling" interminably. The risks of a token not being revoked need to be discussed, but it is a token-based service which I am considering for MultiNode.
Actually, I specifically brought this question up to avoid a situation where other engineers go for "spur of the moment" ad hoc implementations, and we end up with a bunch of crippled, insecure, hard-to-maintain publishing implementations (the current Jenkins one already has enough holes and is painful enough to set up and debug).
The two mechanisms have a lot in common; we clearly need to work together on both sets of use cases.
If we are only issuing and using the tokens over HTTPS, I think the best practice is not to restrict use of the service beyond how long the token is issued for.
Well, the constraints above were just an example of what we can easily implement with an HTTP-based system (and not so easily with a PAM-based one). Of course, the idea is that token constraints are flexible: the scheduling server decides which constraints to request on a token for a particular publishing client. I agree that basic constraints to start with would be: source IP (important for EC2, maybe less important for LAVA) and max lifetime.
The lifetime being specified in the job JSON?
On the other hand, Neil sent an email saying there are similar challenges for a multi-node LAVA setup. I haven't read through it yet, but my guess is that for (arbitrary) LAVA tests we'd rather use (and let our users use) standard tech like ssh/scp/rsync for inter-node communication, and then we'd need "PAM"-level auth anyway, at which point it makes little sense to have a separate auth scheme just for publishing.
I'm not sure how much of that LAVA would need to set up for MultiNode. It's more likely that the setup of a secure connection between two clients under test would need to be part of the test itself: an image with openssh-server, known users, possibly even pre-configured keys. MultiNode cannot prescribe how clients under test arrange their in-test connections. We just need to allow for a child job to declare its allocated IP details to the parent, the parent job to collate that data and serve it back to child jobs of the same parent, upon request via a token set up by the parent on the child filesystem prior to boot. Child jobs interested in a particular node will simply need to loop until the parent has the data from that client, or fail the test on a timeout. LAVA can provide helpers to do the queries to the parent and install those onto the child as part of lava_test_shell. Those helpers could well be the same as the ones which establish the connections used for publishing too?
That would just mean exposing the helper to lava_test_shell so that a test can obtain the data and start using the IP addresses as it sees fit. There would be no need for, and no support for, exposing the actual token outside the helpers. I'm working on the basis that MultiNode exposes only the IP addresses and hostnames of the jobs being managed by the parent, along with the "role" description specified in the job JSON. If a particular client image doesn't manage to set up networking within the timeout specified by the original JSON, that client will simply have a blank IP and hostname section. So as far as authentication goes, I expect MultiNode to only need to do a minimal amount of work: read a token put onto the child before boot, contact the parent with details of the IP address of that child, and then be able to query the parent for the IP addresses of other child jobs of that parent.
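The "loop until the parent has the data or fail on a timeout" part of such a helper could be as simple as the following sketch. The function name and the query callback are hypothetical; the clock and sleep functions are injectable only to make the loop easy to exercise:

```python
# Hypothetical child-side helper: poll the parent dispatcher until the
# wanted role's address is known, or fail the test on a timeout.
import time

def wait_for_role(query, role, timeout_s=300, poll_s=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll `query(role)` until it returns a (hostname, ip) pair.

    `query` stands in for the authenticated call to the parent; it
    returns None while that role has not yet declared its address.
    Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        result = query(role)
        if result is not None:
            return result
        sleep(poll_s)
    raise TimeoutError(
        "no address for role %r within %d seconds" % (role, timeout_s))
```

A test image would then call something like `wait_for_role(parent_query, "server")` early in its test run and fail cleanly, rather than hang, if the other client never brings up networking.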
("parent" in this context would have to be the lava-dispatcher of the parent job as the contact details of the parent need to be written to the child filesystem prior to boot.)
Neil.