Recently I've been thinking about the dispatcher a bit (as other mails should have indicated), and I've started thinking about dependencies between actions. If you don't already know, a dispatcher job file mostly consists of a list of actions to execute, for example:
"actions": [ { "command": "deploy_linaro_image", "parameters": {"rootfs": "...", "hwpack": "..."} }, { "command": "lava_test_install", "parameters": {"tests": ["stream", "ltp"]} }, { "command": "boot_linaro_image" }, { "command": "lava_test_run", "parameters": {"test_name": "stream"} }, { "command": "lava_test_run", "parameters": {"test_name": "ltp"} }, { "command": "submit_results", "parameters": { "server": "http://localhost/lava-server/RPC2/", "stream": "/anonymous/test/" } } ]
I hope what the actions do is reasonably clear from their names.
What is easy-ish for us, but probably rather harder for a computer program, is to see the dependencies between the different actions:
boot_linaro_image makes no sense if deploy_linaro_image failed.
Running tests with lava_test_run doesn't make sense if the test failed to install (or if boot_linaro_image failed).
But the lava_test_run actions are independent of one another: even if the stream test hangs, we could still run the ltp tests.
And we should always submit the results; that's a kind of special case (and I'm not sure submitting results should really be an action at all).
It seems like the way we (aim to, at least) handle this isn't too bad: basically any action can veto the running of any more actions (apart from the special case of submit_results).
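To spell out how I think of that control-flow side, it amounts to something like the sketch below. This is purely illustrative -- none of these names come from the actual dispatcher code:

def run_actions(actions, commands, context):
    # 'commands' maps command names to callables; 'context' is shared state.
    # Illustrative sketch of the "any action can veto the rest" idea.
    vetoed = False
    for action in actions:
        name = action["command"]
        if vetoed and name != "submit_results":
            # an earlier failure vetoes everything except submitting results
            print("skipping %s" % name)
            continue
        try:
            commands[name](context, **action.get("parameters", {}))
        except Exception as exc:
            print("%s failed: %s" % (name, exc))
            vetoed = True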
But there's more than control flow going on here -- there is data flow too. The reason I'm writing this mail is that I'm working on testing images in qemu[1]. If we want to use a similar layout for the job files (and I think using the same commands will even be possible), then we'll have an action that builds an image and another that starts up qemu. The action that starts up qemu needs to know where the image the previous action built is!
And of course I've been a bit sneaky here, because there's another, very important kind of data that needs to move around: the test results. Currently we assume that all the test results end up in a particular directory (either on the device, for ubuntu-based tests, or on the host, for android-based tests). This feels a bit grotty to me, and will need to change for tests run under qemu, and possibly for the multi-system tests that were discussed at the Connect.
There is an object in the dispatcher -- the context -- that encapsulates the state that persists through the run, so this is probably where the data should live. We could have a dictionary attached to the context: deploy_linaro_image for a qemu client type could stuff the path of the built image into it, and the boot_linaro_image action for a qemu client could read that path back out (and complain appropriately if it's not there). Additionally, we could have a list of 'result locations' (which could be filesystem paths on the host, or locations on the linaro image) that the submit results step could read to gather together the results.
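To make that a bit more concrete, here is roughly what I have in mind -- all names below are invented for illustration, not code from the dispatcher:

class LavaContext:
    # sketch: shared state that persists across the actions of one job
    def __init__(self):
        self.action_data = {}        # arbitrary data one action leaves for another
        self.result_locations = []   # places test results end up, read at submit time

# deploy_linaro_image for a qemu client type could stuff the image path in:
def deploy_linaro_image_qemu(context, rootfs, hwpack):
    img_path = "/tmp/qemu.img"  # stand-in for the real image-building work
    context.action_data["deploy_linaro_image.qemu_img_path"] = img_path

# and boot_linaro_image for a qemu client could read it back out:
def boot_linaro_image_qemu(context):
    try:
        img_path = context.action_data["deploy_linaro_image.qemu_img_path"]
    except KeyError:
        raise RuntimeError("no image deployed -- deploy_linaro_image must run first")
    print("would start qemu with %s" % img_path)  # stand-in for actually booting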
This feels like it will work, but it is still a bit implicit -- it's still not obvious from the job file that boot_linaro_image depends on something deploy_linaro_image does. Maybe this is information that should be maintained outside the dispatcher, in the job file itself, or perhaps in a JSON schema for job files?
Apologies for the second brain dump today. I think these are the changes I want to make:
1) Change submit results to not be an action.
2) Add a result_locations list and action_data dictionary to LavaContext. My half-thought-through idea is that actions will use the action name as a prefix, e.g. deploy_linaro_image for a qemu client might set 'deploy_linaro_image.qemu_img_path' in this dict.
3) Change lava-test and lava-android-test to store into result_locations, and the submit step to read from there (see the sketch after this list).
4) Use action_data to have deploy_linaro_image and boot_linaro_image (and maybe lava_test_install and lava_test_run) talk to each other.
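For item 3, the result-gathering side could look something like this -- again just an illustrative sketch with made-up names, assuming the context object from above:

# lava_test_run (or lava-android-test) would record where its results landed:
def lava_test_run(context, test_name):
    result_dir = "/tmp/results/%s" % test_name  # stand-in for the real output location
    context.result_locations.append(result_dir)

# and the submit step (no longer an action, per item 1) would just walk that list:
def submit_results(context, server, stream):
    for location in context.result_locations:
        print("would gather results from %s and send to %s (%s)"
              % (location, server, stream))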
What do you guys think?
Cheers, mwh
[1] testing in qemu is perhaps not incredibly useful for us, but doing this forces me to confront some of the issues with testing images in a fast model, which is something we really want to do, as we can get access to the fast model of the cortex-a15 long before we'll get access to hardware