Hi all,
One of the remaining pieces of the job health story is the notification side: we should get an email whenever a health job fails.
The reason that I've been procrastinating about this for so long is that it feels cheap to simply do "if job_failed and is_health_job: send_email". It would be better to implement some more general notification scheme and leverage that.
One existing blueprint in the area is this:
https://blueprints.launchpad.net/lava-dashboard/+spec/linaro-platforms-o-not...
which would get us a little of the way there: we could subscribe to the failure of the job_complete test in the lava test suite, but that would tell us about all failing jobs. We could beef up the subscription model to limit to a bundle stream or something, but *that* starts to feel a bit arbitrary (other options would be to filter on test run tags or other bits of metadata).
I guess we should take this problem to the other WGs and find out what notifications they would like to receive (if we can convince them to care at all until we can test the bootloader :(), and in the meantime do the cheap thing?
Cheers, mwh
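To make the "beefed up" subscription idea above a bit more concrete, here is a minimal sketch in Python. It is not the blueprint's design or any existing LAVA API: the `Result` and `Subscription` classes, their field names, and the example.org addresses are all hypothetical, and the only thing taken from the thread is the idea of subscribing to failures of the job_complete test in the lava test suite, optionally narrowed by bundle stream or tags.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Result:
    """One test case result (hypothetical shape, for illustration only)."""
    test_id: str            # e.g. "lava"
    test_case_id: str       # e.g. "job_complete"
    passed: bool
    bundle_stream: str      # hypothetical bundle stream pathname
    tags: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Subscription:
    """A request to be emailed when a given test case fails."""
    email: str
    test_id: str
    test_case_id: str
    bundle_stream: Optional[str] = None          # None means "any stream"
    required_tags: FrozenSet[str] = frozenset()  # empty means "any tags"

    def matches(self, result: Result) -> bool:
        # Only failures trigger a notification.
        if result.passed:
            return False
        if (result.test_id, result.test_case_id) != (self.test_id, self.test_case_id):
            return False
        # Optional narrowing by bundle stream.
        if self.bundle_stream is not None and result.bundle_stream != self.bundle_stream:
            return False
        # Optional narrowing by tags: every required tag must be present.
        return self.required_tags <= result.tags


def recipients(subscriptions: Iterable[Subscription], result: Result) -> List[str]:
    """Return the sorted, de-duplicated addresses to notify for one result."""
    return sorted({s.email for s in subscriptions if s.matches(result)})
```

A subscription with no bundle stream and no tags would hear about every failing job, which is the "too noisy" case described above; the two optional filters are the possible ways of narrowing it.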
On 3 April 2012 07:41, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Hi all,
One of the remaining pieces of the job health story is the notification side: we should get an email whenever a health job fails.
The reason that I've been procrastinating about this for so long is that it feels cheap to simply do "if job_failed and is_health_job: send_email". It would be better to implement some more general notification scheme and leverage that.
+1
One existing blueprint in the area is this:
https://blueprints.launchpad.net/lava-dashboard/+spec/linaro-platforms-o-not...
which would get us a little of the way there: we could subscribe to the failure of the job_complete test in the lava test suite, but that would tell us about all failing jobs. We could beef up the subscription model to limit to a bundle stream or something, but *that* starts to feel a bit arbitrary (other options would be to filter on test run tags or other bits of metadata).
I'd like to share some use cases for job notifications (not specifically health jobs):
* As a user or an admin, I want to know the lab health
  I can subscribe to notifications (rss feed or e-mail) from http://validation.linaro.org/lava-server/scheduler/labhealth/
* As a user, I want to get my job results
  I can subscribe to my notifications in my profile
I guess we should take this problem to the other WGs and find out what notifications they would like to receive (if we can convince them to care at all until we can test the bootloader :(), and in the meantime do the cheap thing?
In general people care but they have to know the job id they're looking for and monitor the job. It can take time to get a slot in the queue and also to run the tests themselves. Receiving notifications is a huge step forward for the users (at least me).
On Tue, Apr 3, 2012 at 3:09 AM, Fathi Boudra fathi.boudra@linaro.org wrote:
On 3 April 2012 07:41, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Hi all,
One of the remaining pieces of the job health story is the notification side: we should get an email whenever a health job fails.
The reason that I've been procrastinating about this for so long is that it feels cheap to simply do "if job_failed and is_health_job: send_email". It would be better to implement some more general notification scheme and leverage that.
+1
One existing blueprint in the area is this:
https://blueprints.launchpad.net/lava-dashboard/+spec/linaro-platforms-o-not...
That was written quite a while back. My current opinion is that we should probably have a notify list in the JSON, of email addresses we wish to notify when the job is complete. This could be modified slightly to have a separate list of email addresses that should only be notified in the event of a failure. For health check jobs, we set the email address in the notify_on_fail list to the lava-notifications mailing list. For jobs that I submit, I would probably want all notifications of job completion to go to my email address so I know when the job has finished running, or has been canceled. Bonus points for including a summary of the results in the email. :)
-Paul Larson
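As a rough illustration of the notify / notify_on_fail proposal above, the sketch below shows how a job file might carry the two lists and how recipients could be picked once a job finishes. This is only a sketch under assumptions: the exact key layout, the `notification_recipients` helper, the `job_name` field, and the example.org addresses are hypothetical and not part of any existing job schema.

```python
import json

# Hypothetical job files: only the two notification keys matter here.
HEALTH_JOB = json.loads("""
{
    "job_name": "health-check",
    "notify_on_fail": ["lava-notifications@example.org"]
}
""")

PERSONAL_JOB = json.loads("""
{
    "job_name": "my-test-run",
    "notify": ["submitter@example.org"]
}
""")


def notification_recipients(job_definition, job_failed):
    """Return the addresses to email when a job finishes.

    Addresses in "notify" are emailed on every completion (including
    failure or cancellation); addresses in "notify_on_fail" are added
    only when the job failed. Both keys are optional.
    """
    recipients = list(job_definition.get("notify", []))
    if job_failed:
        recipients.extend(job_definition.get("notify_on_fail", []))
    # Drop duplicates while keeping the original order.
    return list(dict.fromkeys(recipients))
```

With these two job files, a health check only generates mail when it fails, while the submitter of the personal job hears about every completion. What exactly counts as "failed" is left to the caller here, since the thread does not settle that.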
On Tue, 3 Apr 2012 13:31:20 -0500, Paul Larson paul.larson@linaro.org wrote:
On Tue, Apr 3, 2012 at 3:09 AM, Fathi Boudra fathi.boudra@linaro.org wrote:
On 3 April 2012 07:41, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Hi all,
One of the remaining pieces of the job health story is the notification side: we should get an email whenever a health job fails.
The reason that I've been procrastinating about this for so long is that it feels cheap to simply do "if job_failed and is_health_job: send_email". It would be better to implement some more general notification scheme and leverage that.
+1
One existing blueprint in the area is this:
https://blueprints.launchpad.net/lava-dashboard/+spec/linaro-platforms-o-not...
That was written quite a while back. My current opinion is that we should probably have a notify list in the JSON, of email addresses we wish to notify when the job is complete. This could be modified slightly to have a separate list of email addresses that should only be notified in the event of a failure. For health check jobs, we set the email address in the notify_on_fail list to the lava-notifications mailing list. For jobs that I submit, I would probably want all notifications of job completion to go to my email address so I know when the job has finished running, or has been canceled. Bonus points for including a summary of the results in the email. :)
Yeah, that's probably good enough to get going for now. But it seems suboptimal for the automatic jobs: to get notified of (say) boot failures of the tilt tree on panda, you'd have to talk to the Jenkins admins to get yourself added to the job file that gets submitted, which seems very roundabout. So sure, let's do notify/notify_on_fail (although will users understand what 'fail' means here?) but we should also do something in the spirit of the blueprint I linked to eventually.
Cheers, mwh
On Tue, 3 Apr 2012 11:09:14 +0300, Fathi Boudra fathi.boudra@linaro.org wrote:
On 3 April 2012 07:41, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Hi all,
One of the remaining pieces of the job health story is the notification side: we should get an email whenever a health job fails.
The reason that I've been procrastinating about this for so long is that it feels cheap to simply do "if job_failed and is_health_job: send_email". It would be better to implement some more general notification scheme and leverage that.
+1
One existing blueprint in the area is this:
https://blueprints.launchpad.net/lava-dashboard/+spec/linaro-platforms-o-not...
which would get us a little of the way there: we could subscribe to the failure of the job_complete test in the lava test suite, but that would tell us about all failing jobs. We could beef up the subscription model to limit to a bundle stream or something, but *that* starts to feel a bit arbitrary (other options would be to filter on test run tags or other bits of metadata).
I'd like to share some use cases for job notifications (not specifically health jobs):
Thanks.
- As a user or an admin, I want to know the lab health
I can subscribe to notifications (rss feed or e-mail) from http://validation.linaro.org/lava-server/scheduler/labhealth/
- As a user, I want to get my job results
I can subscribe to my notifications in my profile
Here you're envisioning a setting that says "for all jobs I submit, please email me the results"? Would what Paul suggested, i.e. putting your email address in the job file, be an acceptable compromise?
It would make subscribing to job health failures a lot less self-service but I think for the "own job" case it would be OK.
I guess we should take this problem to the other WGs and find out what notifications they would like to receive (if we can convince them to care at all until we can test the bootloader :(), and in the meantime do the cheap thing?
In general people care but they have to know the job id they're looking for and monitor the job. It can take time to get a slot in the queue and also to run the tests themselves. Receiving notifications is a huge step forward for the users (at least me).
Yeah. Let's get something done this cycle!
Cheers, mwh
On 4 April 2012 01:51, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
On Tue, 3 Apr 2012 11:09:14 +0300, Fathi Boudra fathi.boudra@linaro.org wrote:
On 3 April 2012 07:41, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Hi all,
One of the remaining pieces of the job health story is the notification side: we should get an email whenever a health job fails.
The reason that I've been procrastinating about this for so long is that it feels cheap to simply do "if job_failed and is_health_job: send_email". It would be better to implement some more general notification scheme and leverage that.
+1
One existing blueprint in the area is this:
https://blueprints.launchpad.net/lava-dashboard/+spec/linaro-platforms-o-not...
which would get us a little of the way there: we could subscribe to the failure of the job_complete test in the lava test suite, but that would tell us about all failing jobs. We could beef up the subscription model to limit to a bundle stream or something, but *that* starts to feel a bit arbitrary (other options would be to filter on test run tags or other bits of metadata).
I'd like to share some use cases for job notifications (not specifically health jobs):
Thanks.
- As a user or an admin, I want to know the lab health
I can subscribe to notifications (rss feed or e-mail) from http://validation.linaro.org/lava-server/scheduler/labhealth/
- As a user, I want to get my job results
I can subscribe to my notifications in my profile
Here you're envisioning a setting that says "for all jobs I submit, please email me the results"? Would what Paul suggested, i.e. putting your email address in the job file, be an acceptable compromise?
Yes, that's acceptable.
It would make subscribing to job health failures a lot less self-service but I think for the "own job" case it would be OK.
I guess we should take this problem to the other WGs and find out what notifications they would like to receive (if we can convince them to care at all until we can test the bootloader :(), and in the meantime do the cheap thing?
In general people care but they have to know the job id they're looking for and monitor the job. It can take time to get a slot in the queue and also to run the tests themselves. Receiving notifications is a huge step forward for the users (at least me).
Yeah. Let's get something done this cycle!
Thanks.
linaro-validation@lists.linaro.org