Hi all,
The LAVA team is working on support for private jobs -- we already have some support for private results, but if the log of the job that produced the results is publicly visible, this isn't much privacy.
The model for result security is that a set of results can be:
- anonymous (anyone can see, anyone can write)
- public (anyone can see, only owning user or group can write)
- private (only owning user or group can see or write)
Each non-anonymous set of results is owned by a group or user. I think this model is sufficiently flexible -- the only gap I can see is that it's not possible to have a stream where a subset of the people who can see it can submit results to it.
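As a rough illustration of the three levels above, here is a minimal Python sketch. The names (Access, ResultSet, can_see, can_write) are made up for this example and are not LAVA's actual code; group ownership is modelled simply as a set of member names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Access(Enum):
    ANONYMOUS = "anonymous"
    PUBLIC = "public"
    PRIVATE = "private"


@dataclass(frozen=True)
class ResultSet:
    """Hypothetical stand-in for a set of results (a stream)."""
    access: Access
    # For a personal set: just the owning user. For a group-owned set:
    # the group's members. Empty for anonymous sets.
    owners: FrozenSet[str] = frozenset()

    def can_see(self, user: Optional[str]) -> bool:
        # Anonymous and public sets are visible to everyone.
        if self.access in (Access.ANONYMOUS, Access.PUBLIC):
            return True
        return user is not None and user in self.owners

    def can_write(self, user: Optional[str]) -> bool:
        # Only anonymous sets accept writes from anyone.
        if self.access is Access.ANONYMOUS:
            return True
        return user is not None and user in self.owners
```

Note that can_see and can_write share the same owner set, which is exactly the gap mentioned above: there is no way to let a subset of the viewers write.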
Clearly it makes sense to have the set of people who can see the eventual results and see the job output be the same. Currently the former group is encoded in the stream name of the submit_results action, for example:
{
  "command": "submit_results",
  "parameters": {
    "server": "http://locallava/RPC2/",
    "stream": "/private/personal/mwhudson/test/"
  }
}
would place results in a stream called 'test' that only I can see, or
"stream": "/public/team/linaro/kernel/"
identifies a stream that anyone can see but only members of the linaro group can put results in.
The scheduler *could* read out this parameter from the job JSON and enforce the privacy rules based on this, but that seems a bit fragile somehow. I think a top-level attribute in the JSON describing who can see the job would make sense -- we can then make sure the stream name on the submit_results matches this.
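To make the idea concrete, here is a small Python sketch of such a check. Both the attribute name "visibility" and the top-level "actions" list are assumptions made for this example, not an agreed format; the attribute is assumed to hold a prefix in the same style as the stream names above.

```python
def check_job_visibility(job):
    """Return a list of problems; an empty list means the job is consistent.

    Assumes a hypothetical top-level "visibility" attribute such as
    "/private/personal/mwhudson/" and a list of actions under "actions".
    """
    visibility = job.get("visibility")
    if not isinstance(visibility, str) or not visibility.startswith("/"):
        return ["job has no usable 'visibility' attribute"]
    # Normalise so that "/private/personal/mw" cannot match ".../mwhudson/".
    prefix = visibility if visibility.endswith("/") else visibility + "/"
    problems = []
    for action in job.get("actions", []):
        if action.get("command") != "submit_results":
            continue
        stream = action.get("parameters", {}).get("stream", "")
        if not stream.startswith(prefix):
            problems.append("stream %r is outside %r" % (stream, prefix))
    return problems
```

With this, a job marked "/private/personal/mwhudson/" that submits to "/public/team/linaro/kernel/" would be reported as inconsistent.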
Does the /{public,private}/{personal,team}/{team-or-user-name} syntax make sense to people? I think it's reasonably clear and nicely terse.
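For reference, a sketch of how such names might be parsed in Python. The regular expressions are my own guess at the grammar based on the examples above (including the trailing slash and an optional final stream name); StreamName and parse_stream are hypothetical names, not existing dashboard code.

```python
import re
from typing import NamedTuple, Optional


class StreamName(NamedTuple):
    access: str                 # "anonymous", "public" or "private"
    owner_kind: Optional[str]   # "personal" or "team"; None if anonymous
    owner: Optional[str]        # user or team name; None if anonymous
    name: str                   # stream name, possibly empty


# Assumed grammar: /{public,private}/{personal,team}/{owner}/[{name}/]
_OWNED = re.compile(r"/(public|private)/(personal|team)/([^/]+)/(?:([^/]+)/)?")
# Assumed grammar for anonymous streams: /anonymous/[{name}/]
_ANON = re.compile(r"/anonymous/(?:([^/]+)/)?")


def parse_stream(pathname: str) -> StreamName:
    m = _OWNED.fullmatch(pathname)
    if m:
        access, kind, owner, name = m.groups()
        return StreamName(access, kind, owner, name or "")
    m = _ANON.fullmatch(pathname)
    if m:
        return StreamName("anonymous", None, None, m.group(1) or "")
    raise ValueError("not a valid stream name: %r" % (pathname,))
```

Only the first three components say who can see the stream; the final name just distinguishes streams with the same owner.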
We should do as much validation at submit time as we can (rejecting jobs that submit to streams that do not exist, for example).
Cheers, mwh
On Wed, Feb 22, 2012 at 02:21:57PM +1300, Michael Hudson-Doyle wrote:
Hi all,
The LAVA team is working on support for private jobs -- we already have some support for private results, but if the log of the job that produced the results is publicly visible, this isn't much privacy.
The model for result security is that a set of results can be:
- anonymous (anyone can see, anyone can write)
- public (anyone can see, only owning user or group can write)
- private (only owning user or group can see or write)
Each non-anonymous set of results is owned by a group or user. I think this model is sufficiently flexible -- the only gap I can see is that it's not possible to have a stream where a subset of the people who can see it can submit results to it.
We may, one day, want to implement real permissions but for the moment I think the security model we have is sufficient.
A bigger issue is the abuse of anonymous streams. I'd like to abolish them over the next few months. If anything, they were a workaround for the lack of OAuth support in early versions of the dashboard (something that has since proven a failure for our use case). We should IMO move everyone to non-anonymous streams and reserve anonymous streams for mass-filing of profiling information from end-users, something that we have yet to see being used.
Clearly it makes sense to have the set of people who can see the eventual results and see the job output be the same. Currently the former group is encoded in the stream name of the submit_results action, for example:
{
  "command": "submit_results",
  "parameters": {
    "server": "http://locallava/RPC2/",
    "stream": "/private/personal/mwhudson/test/"
  }
}
would place results in a stream called 'test' that only I can see, or
"stream": "/public/team/linaro/kernel/"
identifies a stream that anyone can see but only members of the linaro group can put results in.
The scheduler *could* read out this parameter from the job JSON and enforce the privacy rules based on this, but that seems a bit fragile somehow. I think a top-level attribute in the JSON describing who can see the job would make sense -- we can then make sure the stream name on the submit_results matches this.
Does the /{public,private}/{personal,team}/{team-or-user-name} syntax make sense to people? I think it's reasonably clear and nicely terse.
You've missed the /{any-other-name,} at the end (a single person can have any number of streams).
Despite being the author I always forget if the privacy flag comes before the owner classification. The words "personal", "private" and "public" are easy to confuse. I was thinking that perhaps we should one day migrate towards something else. The stuff below is my random proposal:
~{team-or-person}/{private,}/{name,}
We should do as much validation at submit time as we can (rejecting jobs that submit to streams that do not exist, for example).
That will break the scheduler / dashboard separation model. You must also remember that scheduler and dashboard can use separate databases so you cannot reason about remote (dashboard) users without an explicit interface (that we don't have).
On a side note, I think that the very first thing we should do is migrate Job to be a RestrictedResource. Then we can simply allow users to submit --private jobs, or delegate ownership to a --team they are a member of. This will immediately unlock a lot of testing that currently cannot happen (toolchain tests with restricted benchmarks).
When that works we can see how we can bring both extensions closer so that users have a better experience. In my opinion that means clearly defining that the scheduler _must_ be in the same database as the dashboard, and discarding the full URL in favour of the stream name. Less confusion, all validation possible, no real use cases lost (exactly who is using a private dispatcher to schedule tests to a public dashboard, or vice versa?).
Best regards, ZK
Argh, resending to include list. Sorry Zygmunt.
On Wed, 22 Feb 2012 10:10:05 +0000, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
On Wed, Feb 22, 2012 at 02:21:57PM +1300, Michael Hudson-Doyle wrote:
Hi all,
The LAVA team is working on support for private jobs -- we already have some support for private results, but if the log of the job that produced the results is publicly visible, this isn't much privacy.
The model for result security is that a set of results can be:
- anonymous (anyone can see, anyone can write)
- public (anyone can see, only owning user or group can write)
- private (only owning user or group can see or write)
Each non-anonymous set of results is owned by a group or user. I think this model is sufficiently flexible -- the only gap I can see is that it's not possible to have a stream where a subset of the people who can see it can submit results to it.
We may, one day, want to implement real permissions but for the moment I think the security model we have is sufficient.
'real permissions'?
A bigger issue is the abuse of anonymous streams. I'd like to abolish them over the next few months. If anything, they were a workaround for the lack of OAuth support in early versions of the dashboard (something that has since proven a failure for our use case). We should IMO move everyone to non-anonymous streams and reserve anonymous streams for mass-filing of profiling information from end-users, something that we have yet to see being used.
Yeah. This should be easy to manage now, though I'm not sure how to arrange the changeover without getting every user to change their job descriptions all at once. Maybe we could say an authenticated request to put a bundle into /anonymous/foo just goes into /public/personal/$user/foo by magic.
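Something like the following Python sketch is what I have in mind for the "magic". The function name effective_stream is invented for this example, and how the authenticated user is identified is left out entirely.

```python
from typing import Optional

ANON_PREFIX = "/anonymous/"


def effective_stream(pathname: str, user: Optional[str]) -> str:
    """Map an anonymous stream to the submitter's public personal stream.

    `user` is the authenticated username, or None for an unauthenticated
    request. Non-anonymous stream names are returned unchanged.
    """
    if user is not None and pathname.startswith(ANON_PREFIX):
        rest = pathname[len(ANON_PREFIX):]
        return "/public/personal/%s/%s" % (user, rest)
    return pathname
```

Unauthenticated submissions would keep going to the anonymous stream, so existing job descriptions keep working during the changeover.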
Clearly it makes sense to have the set of people who can see the eventual results and see the job output be the same. Currently the former group is encoded in the stream name of the submit_results action, for example:
{
  "command": "submit_results",
  "parameters": {
    "server": "http://locallava/RPC2/",
    "stream": "/private/personal/mwhudson/test/"
  }
}
would place results in a stream called 'test' that only I can see, or
"stream": "/public/team/linaro/kernel/"
identifies a stream that anyone can see but only members of the linaro group can put results in.
The scheduler *could* read out this parameter from the job JSON and enforce the privacy rules based on this, but that seems a bit fragile somehow. I think a top-level attribute in the JSON describing who can see the job would make sense -- we can then make sure the stream name on the submit_results matches this.
Does the /{public,private}/{personal,team}/{team-or-user-name} syntax make sense to people? I think it's reasonably clear and nicely terse.
You've missed the /{any-other-name,} at the end (a single person can have any number of streams).
Right, but the name of the stream is not part of the "who can see it" stuff.
Despite being the author I always forget if the privacy flag comes before the owner classification. The words "personal", "private" and "public" are easy to confuse. I was thinking that perhaps we should one day migrate towards something else. The stuff below is my random proposal:
~{team-or-person}/{private,}/{name,}
We should do as much validation at submit time as we can (rejecting jobs that submit to streams that do not exist, for example).
That will break the scheduler / dashboard separation model. You must also remember that scheduler and dashboard can use separate databases so you cannot reason about remote (dashboard) users without an explicit interface (that we don't have).
Well yes. I don't know how much of a benefit that separation is really -- some level of separation so that results can be submitted to a dashboard by a developer running tests on her desk is useful, but I don't know to what extent having the scheduler be able to send results to an entirely different dashboard is.
On a side note, I think that the very first thing we should do is migrate Job to be a RestrictedResource. Then we can simply allow users to submit --private jobs, or delegate ownership to a --team they are a member of. This will immediately unlock a lot of testing that currently cannot happen (toolchain tests with restricted benchmarks).
Yep. That's on the list.
When that works we can see how we can bring both extensions closer so that users have a better experience. In my opinion that means clearly defining that the scheduler _must_ be in the same database as the dashboard, and discarding the full URL in favour of the stream name. Less confusion, all validation possible, no real use cases lost (exactly who is using a private dispatcher to schedule tests to a public dashboard, or vice versa?).
Yeah, I agree. The question I was trying (badly) to ask is twofold:
1) what do we want users to write in their job file?
2) (less important) how do we handle the transition from what we have now to the answer to 1)?
Cheers, mwh