On 19.10.2012 01:36, Michael Hudson-Doyle wrote:
Incidentally that's something we may collaborate on.
Yeah, so how does checkbox deal with this? I guess it doesn't quite have the concept of remote users submitting requests that jobs be run? (i.e. checkbox is more dispatcher than scheduler in lava terminology).
We have largely the same problem but in a different context (there are different internal users).
Checkbox has the concept of "whitelists", which basically specify the test scenario. Each item in the whitelist is a "job" (full test definition) that can use various checkbox "plugins" (like shell, manual and many others that I'm not familiar with). Checkbox then transforms the whitelist (resolving dependencies and things like that) and executes the tests much like the dispatcher would.
I see.
There are several use cases that are currently broken
Such as?
From what I recall, mostly in the way upstream/downstream (and sometimes side-stream) relationships work. The actual details are specific to Canonical (I would gladly explain in a private channel if you wish to know more), but the general idea is that without some API stability (and we offer none today) and script stability (you can think of it as another level of API), our downstream users (which are NOT just consumers) have a hard time following our releases.
The second issue, which this addresses more directly, is that there is no good conduit for actual tests to flow from team to team, so to get "stability" people prefer to keep similar or identical tests to themselves (not secret, just not easy to collaborate on).
One of the proposals would be to build a pypi-like directory of tests and use that as a base for namespacing (first-come first-served name allocation). I'm not entirely sure this would help to solve the problem but it's something that, if available, could give us another vector.
Hm. This is definitely an interesting idea. I had actually already thought that using user-specified distutils- or debian-style versioning would make sense -- you would get the latest version by the chosen algorithm by default, but could still upload revisions of old versions if you wanted to.
I'd rather avoid debian-style versions in favor of a strict, fixed-length version scheme. Let's not have a custom postgresql function for comparing versions again ;)
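A minimal sketch of what I mean, assuming a strict MAJOR.MINOR.PATCH scheme (the scheme itself is just an example):

def parse_version(text):
    # rejects anything that isn't exactly three numeric components
    major, minor, patch = text.split(".")
    return (int(major), int(minor), int(patch))

# tuples compare element-wise, so plain ordering does the right thing
assert parse_version("1.10.0") > parse_version("1.9.3")

No database-side comparison logic needed; the version is just a sortable tuple.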
Part of this would be a command line tool for fetching / publishing test definitions, I guess. In fact this could almost be the main thing: it depends on whether you want to produce (and host, I guess) a single site which is the centrepoint of the test definition world (like pypi.python.org is for Python stuff), or just the tools / protocols people use to run and work with their own repositories (testdef.validation.linaro.org or testdef.qa.ubuntu.com or whatever).
I think that there _should_ be a central repository simply because it means fewer fractures early on. From what I know, people don't deploy their own pypi just to host their pet project. They only do that if they depend on the protocols and tools around pypi and want to keep the code private.
I think that, even if there is a "single centrepoint of the test definition world", we should expect that sites will have local test repositories for one reason or another (as they do with pypi).
Having said what I did above, nothing can prevent others from re-implementing the same protocols or deploying their own archive, but I think we should encourage working in the common pool, as this will improve the ecosystem IMHO (look at easy_install, pip or even crate.io; they would not have happened if there had been a crowd of competing pypi-like systems with none dominant over the others). In other words, the value of pypi is the data that is stored there.
Another way to handle namespacing is to include the name of the user / group that can update a resource in its name, ala branches on LP or repos on github (or bundle streams in LAVA). Not sure if that's a good idea for our use case or not.
I thought about one thing that would warrant the ~user/project approach. Both pypi and launchpad are product-centric -- you go shopping for solutions by product name. GitHub, on the other hand, is developer-centric, as $product can have any number of forks that are equally exposed.
I think for our goals we should focus on product-centric views. The actual code, wherever it lives, should be managed with other tools. I would not like this concept to grow into a DVCS or a code hosting tool.
I wonder if checkbox's rfc822ish format would be better than JSON for test interchange...
Probably, although it's still imperfect and has no good way to carry binary data.
What I'd like to see in practice is a free-for-all web service that can hold test meta-data. I believe that as we go, test meta-data will formalize, and at some point it may become possible to run a lava-test test from checkbox and a checkbox job in lava (given appropriate adapters on both sides) merely by specifying the name of the test.
So that's an argument for aiming for a single site? Maybe. Maybe you'd just give a URL of a testdef rather than the name of a test, so http://testdef.validation.linaro.org/stream rather than just 'stream'.
Imagine pip installing that each time. IMO it's better to stick to names rather than URLs, if we can. People already know how to manage names, while URLs are something we can only google for.
The full URL could be useful for some kind of "packages", but that's not the primary scope of the proposal, I think. Packages are more complicated and secondary; the directory should merely point you at something that you can install from an absolute URL.
Initially it could be a simple RESTful interface based on a dumb HTTP server serving files from a tree structure.
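A rough sketch of the client side of that, assuming the server just maps name and version onto files (host name and URL layout are made up):

from urllib.request import urlopen

def fetch_testdef(name, version):
    # e.g. GET http://testdefs.example.org/stream/1.0b3
    url = "http://testdefs.example.org/%s/%s" % (name, version)
    return urlopen(url).read().decode("utf-8")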
And then it could grow wiki-like features? :-)
I'd rather not go there. IMHO it should only have search and CRUD actions on the content. Anything beyond that works better elsewhere (readthedocs / crate.io). Remember that it's not the 'appstore' experience that we are after here. The goal is to introduce a common component that people can converge and thrive on. This alone may give us better code re-usability, as we gain partial visibility into other developers' work _and_ we fix the release process for test definitions so that people can depend on them indefinitely.
One of the user stories we have is "which tests are available to run on board X with Y deployed to it?" -- if we use test repositories that are entirely disconnected from the LAVA database I think this becomes a bit harder to answer. Although one could make searching a required feature of a test repository...
I think that's something to do in stage 2 as we get a better understanding of what we have. In the end the perfect solution, for LAVA, might be LAVA-specific and we should not sacrifice the generic useful aspects in the quest for something this narrow.
Imagine simple classifiers that might help there:
Environment::Hardware::SoC::OMAP35xx
Environment::Hardware::Board::Panda Board ES
Environment::Hardware::Add-Ons::Linaro::ABCDXYZ-Power-Probe
Environment::Software::Linaro::Ubuntu Desktop
Environment::Software::Ubuntu::Ubuntu Desktop
But this requires building a sensible taxonomy, which is something I don't want to require in the first stage. The important part is to be _able_ to build one, as the meta-data format won't constrain you. As we go we can release "official" meta-data spec releases that standardize what certain things mean. This could then be used as a basis for reliable (as in no false positives) and advanced search tools.
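Even before an official taxonomy exists, a dumb prefix match over whatever classifiers a test declares would already give no-false-positive search; roughly:

def matches(classifiers, query):
    # exact classifier, or anything nested below it in the taxonomy
    return any(c == query or c.startswith(query + "::") for c in classifiers)

declared = ["Environment::Hardware::Board::Panda Board ES",
            "Environment::Software::Ubuntu::Ubuntu Desktop"]
assert matches(declared, "Environment::Hardware")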
This would allow us to try moving some of the experimental meta-data there and build the client parts. If the idea gains traction it could grow from there.
Some considerations:
- Some tests have to be private. I don't know how to solve that with namespaces. One idea that comes to mind is a .private. namespace that is explicitly non-global and can be served by a local "test definition repository".
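For example (names invented): plain "stream" would resolve against the global directory, while ".private.canonical.stream" would only ever resolve against a local repository and could never be claimed globally.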
That would work, I think.
- It should probably be schema free, serving simple rfc822 files with
python-like classifiers (Test::Platform::Android anyone?) as this will allow free experimentation
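One nice property of that shape is that the stdlib can already read it; a sketch (the fields are only examples):

from email.parser import Parser

raw = ("Name: stream\n"
       "Version: 1.0b3\n"
       "Classifier: Test::Platform::Android\n"
       "Classifier: Environment::Hardware::Board::Panda Board ES\n")
testdef = Parser().parsestr(raw)
print(testdef["Name"])                # stream
print(testdef.get_all("Classifier"))  # both classifier lines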
FWIW, I think they're pedantically called "trove classifiers" :-)
Right, thanks!
I guess there would be two mandatory fields: name and version. And maybe format? So you could have
Yeah, name and version is a good start. Obviously each test definition will have a maintainer / owner but that's not something that has to be visible here (and it certainly won't be a part of what gets published "to the archive" if we go that far).
Name: stream
Version: 1.0b3
Format: LAVA testdef version 1.3
We could also prefix all non-standard (not yet standardized) headers with a vendor string (Linaro-, Canonical-), or have a standard custom extension header prefix as in HTTP, X-foo.
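Something like this, say (all header names invented):

Linaro-Device-Type: panda
X-Checkbox-Plugin: shell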
...
and everything else would only need to make sense to LAVA.
Then you would say client side:
$ testdef-get lava-stream
We definitely need a catchy name
But seriously. I'm not entirely sure that the command line tool will be part of the "standard issue". The same way you use pip to install python stuff from pypi, you'd use lava to install test definitions into lava. I can't imagine how a generic tool could know how to interact with lava and checkbox in a way that would still be useful. While your example isn't strictly about running tests (it's about defining them), I think it's important to emphasize: the protocols, and maybe the common repo, matter more than the tools, as the tools may stay domain-specific for a while.
Fetched lava-stream version 1.0b3
$ vi lava-stream.txt  # update stuff
$ testdef-push lava-stream.txt
ERROR: lava-stream version 1.0b3 already exists on server
$ vi lava-stream.txt  # Oops, update version
$ testdef-push lava-stream.txt
Uploaded lava-stream version 1.0b4
I wonder if we could actually cheat and use pypi to prototype this. I don't suppose they have a staging instance where I can register 20 tiny projects with oddball meta-data?
- It should (must?) have pypi-like version support so that a test can
be updated but the old definition is never lost.
Must, imho. I guess support for explicitly removing a version would be good, but the default should be append-only.
No disagreement here
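The rule itself is tiny; a sketch, with the storage shape made up:

def publish(store, name, version, payload):
    # append-only: a given (name, version) pair can be written exactly once
    if (name, version) in store:
        raise ValueError("%s %s already exists on server" % (name, version))
    store[(name, version)] = payload

That is exactly what the testdef-push transcript above would do server-side.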
- It probably does not have to be the download server as anyone can
host tests themselves. Just meta-data would be kept there.
By metadata you mean the key-value data as listed above, right?
Yes
(For small tests that may be enough but I can envision tests with external code and resources)
Yeah, the way lava-test tests can specify URLs and bzr and git repos to be fetched needs to stay I think.
That's the part I hate the most about the current LAVA setup. I think that going forward those should go away and be converted into test definitions that describe the very same code you'd git clone or bzr branch. The reason I believe that is that it will allow you to do reliable releases. Imagine pypi not hosting any tarballs, just git URLs; I think that would defeat the long-term purpose of the directory. Remember that both the test "wrapper" / definition and the test code get consumed by users/testers, so _both_ should be released in the same, reliable way.
In addition to that, having "downloads" makes offline use easier. I'm not entirely sure how that would work with very high-level tests that, say, apt-get install something from the archive and then run some arbitrary commands. One might be tempted to create a reproducible test environment where all the downloads are kept offline and versioned, but perhaps that kind of test needs to be explicitly marked as non-idempotent, and that marking is the actual value it provides.
Thanks
ZK