dependencies between actions in the dispatcher - linaro-dev

11 Nov 2011


      Recently I've been thinking about the dispatcher a bit (as other mails
should have indicated) and I've gotten to think about dependencies
between actions.  If you don't already know, a dispatcher job file
mostly consists of a list of actions to execute, for example:
"actions": [
    {
      "command": "deploy_linaro_image",
      "parameters": {"rootfs": "...", "hwpack": "..."}
    },
    {
      "command": "lava_test_install",
      "parameters": {"tests": ["stream", "ltp"]}
    },
    {
      "command": "boot_linaro_image"
    },
    {
      "command": "lava_test_run",
      "parameters": {"test_name": "stream"}
    },
    {
      "command": "lava_test_run",
      "parameters": {"test_name": "ltp"}
    },
    {
      "command": "submit_results",
      "parameters": {
          "server": "http://localhost/lava-server/RPC2/",
          "stream": "/anonymous/test/"
        }
    }
  ]
I hope what the actions do is reasonably clear from their names.
What is easy-ish for us, but probably rather harder for a computer
program to do is to see the data dependencies between the different
actions.
boot_linaro_image makes no sense if deploy_linaro_image failed.
Running tests with lava_test_run doesn't make sense if the test failed
to install (or if boot_linaro_image failed).
But the lava_test_run's are independent, even if the stream test hangs
we could still run the ltp tests.
And we should always submit the results, that's a kind of special case
(and I'm not sure submit results should really be an action).
It seems like the way we (aim to, at least) handle this isn't too bad:
basically any action can veto the running of any more actions (apart
from the special case of submit_results).
But there's more than control flow going on here -- there is data flow
too.  The reason I'm writing this mail is that I'm working on testing
images in qemu[1].  If we want to use a similar layout of the job files
(and I think using the same commands even will be possible), then we'll
have an action that builds an image and another that starts up qemu.
The action that starts up qemu needs to know where the previous action
is!
And of course I've been a bit sneaky here, because there's another, very
important kind of data that needs to move around: the test results.
Currently we assume that all the test results end up in a particular
directory (either on the device (for ubuntu-based tests) or on the host
(android-based tests)).  This feels a bit grotty to me, and will need to
change for tests run under qemu and possibly for the multi-system tests
that were discussed at the connect.
There is an object in the dispatcher -- the context -- that encapsulates
the state that persists through the run, so this is probably where the
data should live -- we could have a dictionary attached to the context
and then deploy_linaro_image for a qemu client type could stuff the path
into this and the boot_linaro_image action for a qemu client could read
this path (and complain appropriately if its not there).  Additionally
we could have a list of 'result locations' (could be filesystem paths on
the host, or locations on the linaro image) and the submit results step
could read from here to gather together the results.
This feels like it will work, but is still a bit implicit -- it's still
not obvious that boot_linaro_image depends on something
deploy_linaro_image does -- but maybe this is information that should be
maintained outside in the job file, maybe in a JSON schema for job files?
Apologies for the second brain dump today.  I think these are the
changes I want to make:
1) Change submit results to not be an action.
2) Add a result_locations list and action_data dictionary to
   LavaContext.  My half-thought through idea is that actions will use
   the action name as a prefix, e.g. deploy_linaro_image for a qemu
   client might set 'deploy_linaro_image.qemu_img_path' in this dict.
3) Change the lava-test and lava-android-test to store into
   result_locations and the submit step to read from there.
4) Use action data to have deploy_linaro_image and boot_linaro_image
   (and maybe lava_test_install and lava_test_run) talk to each other.
What do you guys think?
Cheers,
mwh
[1] testing in qemu is perhaps not incredibly useful for us, but doing
    this forces me to confront some of the issues with testing images
    with in a fast model, which is something we really want to do, as we
    can get access to the fast model of the cortex-a15 long before we'll
    get access to hardware