Hey all,
We've started experimenting with LAVA V2 jobs in our LAVA lab. Thus far things are going well, but it seems there is no way to specify device tags or to target a specific device in V2 yet. Is support for those still in the pipeline? :)
On Sun, 24 Jul 2016 00:21:34 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
Hey all,
We've started experimenting with LAVA V2 jobs in our LAVA lab. Thus far things are going well, but it seems there is no way to specify device tags or to target a specific device in V2 yet. Is support for those still in the pipeline? :)
Device tag support is a missing element from the V2 job submission schema. (It has support in the unit tests.) I'll get this into review next week and then into 2016.8, with some docs too.
There is no support for submitting to specific target devices as this impedes both scheduling and lab management when needing to retire broken hardware.
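To illustrate the distinction drawn here between device tags and naming a specific device, a minimal sketch of tag-based selection follows. It assumes that a job is eligible for any device of the requested type that carries all of the requested tags; the `Device` class, the `matching_devices` helper and the hostnames and tag names are hypothetical and are not part of LAVA's API.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    # Hypothetical model of a lab device; not a LAVA class.
    hostname: str
    device_type: str
    tags: frozenset = field(default_factory=frozenset)


def matching_devices(devices, device_type, required_tags):
    """Return devices of the given type that carry every required tag."""
    required = frozenset(required_tags)
    return [
        d for d in devices
        if d.device_type == device_type and required <= d.tags
    ]


# Hypothetical lab inventory.
lab = [
    Device("bbb-01", "beaglebone-black", frozenset({"usb-flash"})),
    Device("bbb-02", "beaglebone-black", frozenset({"usb-flash", "sata"})),
    Device("qemu-01", "qemu"),
]

# The job names a device type and tags, never a hostname.
candidates = matching_devices(lab, "beaglebone-black", ["sata"])
```

Because the job never names a hostname, a broken board can be retired without invalidating queued jobs, as long as another device with the same tags remains.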
On Sun, 2016-07-24 at 14:29 +0100, Neil Williams wrote:
On Sun, 24 Jul 2016 00:21:34 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
Hey all,
We've started experimenting with LAVA V2 jobs in our LAVA lab. Thus far things are going well, but it seems there is no way to specify device tags or to target a specific device in V2 yet. Is support for those still in the pipeline? :)
Device tag support is a missing element from the V2 job submission schema. (It has support in the unit tests.) I'll get this into review next week and then into 2016.8, with some docs too.
Great thanks!
There is no support for submitting to specific target devices as this impedes both scheduling and lab management when needing to retire broken hardware.
Hmm, that's true. FWIW, what I tend to (ab)use submitting to specific target devices for is mostly hacking sessions and the like, when I need to do some maintenance or other work that really needs one specific target device, rather than any regular jobs. It would be nice to cover that use case somehow.
On Mon, 25 Jul 2016 09:29:28 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
On Sun, 2016-07-24 at 14:29 +0100, Neil Williams wrote:
On Sun, 24 Jul 2016 00:21:34 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
Hey all,
We've started experimenting with LAVA V2 jobs in our LAVA lab. Thus far things are going well, but it seems there is no way to specify device tags or to target a specific device in V2 yet. Is support for those still in the pipeline? :)
Device tag support is a missing element from the V2 job submission schema. (It has support in the unit tests.) I'll get this into review next week and then into 2016.8, with some docs too.
Great thanks!
There is no support for submitting to specific target devices as this impedes both scheduling and lab management when needing to retire broken hardware.
Hmm, that's true. FWIW, what I tend to (ab)use submitting to specific target devices for is mostly hacking sessions and the like, when I need to do some maintenance or other work that really needs one specific target device, rather than any regular jobs. It would be nice to cover that use case somehow.
Hacking sessions are for users though. As an admin, you already have direct access to the device. This was one of the reasons why V1 had all the device configuration on the dispatcher, so that local scripts could parse out the connection_command and power_on_cmd to get a way to get onto the device whilst it was Offline. (This is why we have maintenance mode on a per-device level.)
With V2, that information is available directly from the UI, so all the admin needs to do is take the device offline, ssh onto the dispatcher and have a web browser looking at the device detail page. No need to wait for the hacking session to be scheduled (another job could always get in first, even at high priority a health check takes precedence or there could be another high priority job already in the queue).
Just because hacking sessions log in a user as root, does *not* mean that this is a workable solution for administration - that confuses the issues. TestJobs, like hacking sessions, need to be ephemeral in terms of storage - that way admins can trust that users can't actually undo the admin setup just by using a hacking session themselves.
On Mon, 2016-07-25 at 10:53 +0100, Neil Williams wrote:
On Mon, 25 Jul 2016 09:29:28 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
On Sun, 2016-07-24 at 14:29 +0100, Neil Williams wrote:
On Sun, 24 Jul 2016 00:21:34 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
There is no support for submitting to specific target devices as this impedes both scheduling and lab management when needing to retire broken hardware.
Hmm, that's true. FWIW, what I tend to (ab)use submitting to specific target devices for is mostly hacking sessions and the like, when I need to do some maintenance or other work that really needs one specific target device, rather than any regular jobs. It would be nice to cover that use case somehow.
Hacking sessions are for users though. As an admin, you already have direct access to the device. This was one of the reasons why V1 had all the device configuration on the dispatcher, so that local scripts could parse out the connection_command and power_on_cmd to get a way to get onto the device whilst it was Offline. (This is why we have maintenance mode on a per-device level.)
With V2, that information is available directly from the UI, so all the admin needs to do is take the device offline, ssh onto the dispatcher and have a web browser looking at the device detail page.
But that's basically doing by hand things that lava can already do for you.
Maybe I'm just too lazy, but I like telling LAVA to just go and boot a board for me with a rootfs of my choice, so that I can log in and do whatever needs to be done without having to set things up by hand.
No need to wait for the hacking session to be scheduled (another job could always get in first, even at high priority a health check takes precedence or there could be another high priority job already in the queue).
In my experience health checks don't happen often enough to be problematic for this. For the other aspects, simply restricting submission to the device works well (which, depending on what gets done, is a good choice anyway).
Though a maintenance priority/type of job that runs even if the device is currently offline and trumps all other priorities would be really nice for these kinds of things. Though I bet you disagree on this aspect :)
Just because hacking sessions log in a user as root, does *not* mean that this is a workable solution for administration - that confuses the issues. TestJobs, like hacking sessions, need to be ephemeral in terms of storage - that way admins can trust that users can't actually undo the admin setup just by using a hacking session themselves.
Given that a hacking session gives you root by definition, folks can do whatever they like on a board. Nothing is stopping someone in a hacking session from e.g. reflashing the bootloader :)
On Mon, 25 Jul 2016 12:56:13 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
On Mon, 2016-07-25 at 10:53 +0100, Neil Williams wrote:
On Mon, 25 Jul 2016 09:29:28 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
On Sun, 2016-07-24 at 14:29 +0100, Neil Williams wrote:
On Sun, 24 Jul 2016 00:21:34 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
There is no support for submitting to specific target devices as this impedes both scheduling and lab management when needing to retire broken hardware.
Hmm, that's true. FWIW, what I tend to (ab)use submitting to specific target devices for is mostly hacking sessions and the like, when I need to do some maintenance or other work that really needs one specific target device, rather than any regular jobs. It would be nice to cover that use case somehow.
Hacking sessions are for users though. As an admin, you already have direct access to the device. This was one of the reasons why V1 had all the device configuration on the dispatcher, so that local scripts could parse out the connection_command and power_on_cmd to get a way to get onto the device whilst it was Offline. (This is why we have maintenance mode on a per-device level.)
With V2, that information is available directly from the UI, so all the admin needs to do is take the device offline, ssh onto the dispatcher and have a web browser looking at the device detail page.
But that's basically doing by hand things that lava can already do for you.
Maybe I'm just too lazy, but I like telling LAVA to just go and boot a board for me with a rootfs of my choice, so that I can log in and do whatever needs to be done without having to set things up by hand.
Why do you need a rootfs in the first place?
With LAVA V2, the only software needed on the board is the bootloader - with the exception of devices supporting primary connections. There is nothing that needs to be done in a rootfs for a V2 device.
No need to wait for the hacking session to be scheduled (another job could always get in first, even at high priority a health check takes precedence or there could be another high priority job already in the queue).
In my experience health checks don't happen often enough to be problematic for this.
That's configurable. In a lab running 1,000 jobs a day it is routine.
For the other aspects, simply restricting submission to the device works well (which, depending on what gets done, is a good choice anyway).
Though a maintenance priority/type of job that runs even if the device is currently offline and trumps all other priorities would be really nice for these kinds of things. Though I bet you disagree on this aspect :)
Only a forced health check must ever run on a device which is offline. Health checks always take precedence over any priority settings.
Offline is a maintenance mode, especially for admins. That is the only purpose of having an offline status. Offline means that the device is currently unusable - it could be disconnected, bricked etc. It is up to the admin to be confident that it is safe to run a health check. There is also looping mode for repeating such tests.
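The precedence rules described above can be sketched as a small selection function. This is only an illustration under stated assumptions, not LAVA's scheduler: the `Job` class, its fields and the `next_job` helper are hypothetical, and priority is modelled as a plain integer where higher means more urgent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    # Hypothetical job model; not a LAVA class.
    name: str
    priority: int = 50          # assumption: higher value = more urgent
    health_check: bool = False
    forced: bool = False        # assumption: set when an admin forces a health check


def next_job(queue: List[Job], device_online: bool) -> Optional[Job]:
    """Pick the next job for one device, following the rules in the text."""
    if device_online:
        eligible = queue
    else:
        # Only a forced health check may run on an offline device.
        eligible = [j for j in queue if j.health_check and j.forced]
    if not eligible:
        return None
    # Health checks beat any priority setting; otherwise highest priority wins.
    return max(eligible, key=lambda j: (j.health_check, j.priority))


queue = [
    Job("hacking-session", priority=100),
    Job("health-check", health_check=True),
]
online_choice = next_job(queue, device_online=True)
offline_choice = next_job(queue, device_online=False)
```

With this queue the health check is chosen on an online device despite the hacking session's higher priority, and nothing is chosen while the device is offline because no forced health check is queued.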
We're updating the docs on health checks - stressing that a health check needs to test every type of action supported by the device type (except a hacking session as it still needs to be fully automated). The health check still needs to be quick but it also needs to be thorough.
Just because hacking sessions log in a user as root, does *not* mean that this is a workable solution for administration - that confuses the issues. TestJobs, like hacking sessions, need to be ephemeral in terms of storage - that way admins can trust that users can't actually undo the admin setup just by using a hacking session themselves.
Given that a hacking session gives you root by definition, folks can do whatever they like on a board. Nothing is stopping someone in a hacking session from e.g. reflashing the bootloader :)
Exactly - it is up to the admins to sanction such users as that causes work for the admin. It depends on the device - with a device with sufficient support, the bootloader can be safely replaced by the testjob so it would not be a problem.
I don't see what operations are needed in V2 that can be done inside a hacking session, except possibly updating the UBoot uEnv.txt but that's possible to do from the bootloader shell as well.
On Mon, 2016-07-25 at 19:11 +0100, Neil Williams wrote:
On Mon, 25 Jul 2016 12:56:13 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
On Mon, 2016-07-25 at 10:53 +0100, Neil Williams wrote:
On Mon, 25 Jul 2016 09:29:28 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
On Sun, 2016-07-24 at 14:29 +0100, Neil Williams wrote:
On Sun, 24 Jul 2016 00:21:34 +0200 Sjoerd Simons sjoerd.simons@collabora.co.uk wrote:
Maybe I'm just too lazy, but I like telling LAVA to just go and boot a board for me with a rootfs of my choice, so that I can log in and do whatever needs to be done without having to set things up by hand.
Why do you need a rootfs in the first place?
With LAVA V2, the only software needed on the board is the bootloader - with the exception of devices supporting primary connections. There is nothing that needs to be done in a rootfs for a V2 device.
Yes, V2 makes a lot of these tasks redundant, which is lovely.
What I do use it for, which I think still applies to V2, is:
* Upgrading the bootloader
* For some boards, flashing peripheral firmware
* Diagnosing issues pointed out by health checks
* Diagnosing issues with particular boards exposed by tests
In a perfect world health checks would be able to check everything; in practice we don't have a crystal ball. In those cases it's been very helpful to go and diagnose by hand what's going on, to potentially improve the health checks or general infrastructure.
No need to wait for the hacking session to be scheduled (another job could always get in first, even at high priority a health check takes precedence or there could be another high priority job already in the queue).
In my experience health checks don't happen often enough to be problematic for this.
That's configurable. In a lab running 1,000 jobs a day it is routine.
Of course it's routine in the lab as a whole. I'm just talking from experience: health checks don't happen often enough on any particular board for me to be blocked by them when I need things done on that board (and if they do get in the way, it's a good excuse to get a hot beverage).
For the other aspects, simply restricting submission to the device works well (which, depending on what gets done, is a good choice anyway).
Though a maintenance priority/type of job that runs even if the device is currently offline and trumps all other priorities would be really nice for these kinds of things. Though I bet you disagree on this aspect :)
Only a forced health check must ever run on a device which is offline. Health checks always take precedence over any priority settings.
Offline is a maintenance mode, especially for admins. That is the only purpose of having an offline status. Offline means that the device is currently unusable - it could be disconnected, bricked etc. It is up to the admin to be confident that it is safe to run a health check. There is also looping mode for repeating such tests.
Yes, what I'm saying is that as an admin who has determined all those things, I'd like to run a job which is not a health check for maintenance work on the board. Potentially to fix the reason the health check failed in the first place :)
We're updating the docs on health checks - stressing that a health check needs to test every type of action supported by the device type (except a hacking session as it still needs to be fully automated). The health check still needs to be quick but it also needs to be thorough.
Yes, that's what our health checks do, but as I said there is always a tradeoff. (We've seen funky issues where for some reason an SD card broke in a way that made accessing particular areas very, very slow (probably firmware retrying reads) while other areas were entirely fine; that's not really something a health check can verify while still being quick.)
Just because hacking sessions log in a user as root, does *not* mean that this is a workable solution for administration - that confuses the issues. TestJobs, like hacking sessions, need to be ephemeral in terms of storage - that way admins can trust that users can't actually undo the admin setup just by using a hacking session themselves.
Given that a hacking session gives you root by definition, folks can do whatever they like on a board. Nothing is stopping someone in a hacking session from e.g. reflashing the bootloader :)
Exactly - it is up to the admins to sanction such users as that causes work for the admin. It depends on the device - with a device with sufficient support, the bootloader can be safely replaced by the testjob so it would not be a problem.
Yes, that's what I've done in some cases. But even then I tend to be careful and switch boards one by one to prevent unexpected side effects/issues.
But even if you assume it's safe enough to just run it as a normal test job with no other precautions, how would I ensure that each board has its bootloader replaced, apart from scheduling one job per specific board?
I don't see what operations are needed in V2 that can be done inside a hacking session, except possibly updating the UBoot uEnv.txt but that's possible to do from the bootloader shell as well.
Unfortunately not everything is U-Boot, but yes, for changing the U-Boot environment of course you don't need to go fully into a rootfs.
-- Sjoerd Simons Collabora Ltd.
linaro-validation@lists.linaro.org