Re: Incident Management (was: Re: pointless mail, (was Re: android-build's are failing...))

11 May 2012

On Fri, 11 May 2012 12:11:36 +1200, Michael Hudson-Doyle michael.hudson@canonical.com wrote:
...
On Fri, 11 May 2012 00:30:26 +0200, Alexander Sack asac@linaro.org wrote:
...
On Fri, May 11, 2012 at 12:24 AM, Ricardo Salveti
...
Sure, I just think there are better places for it :-) Based on issues
we had with LAVA and Jenkins at the previous cycle, if I had one email
for every issue, I'd send at least 20 of them, which is useful but
that still doesn't make me send them to the list.
Actually, I think the LAVA outage was announced. I poked to get more
status updates, so more mails would have been great.
Same goes for ci.linaro.org ... if our CI service used for everything
but android is not available, I want to get a mail that this is the
case.
So, what this discussion points to is: we need a process for handling
disruptions to the services we provide.  When the **** hits the fan, the
last thing you want people to be doing is _thinking_, or at least,
thinking about things that could have been thought through ahead of
time and are not totally specific to the incident at hand.
Just recently within the LAVA team, we've started following such a
process:
https://wiki.linaro.org/Internal/LAVA/Incidents

(apologies to the non-Linaro insiders for the internal link).  The
process will look very familiar to anyone who works at Canonical...
Creating a wiki page for each incident can feel a bit heavyweight,
but it turns out that moin has a funky NewPage macro
(https://wiki.linaro.org/HelpOnMacros#Others) that one can use to make
this really easy.  So we've scrapped the Google document.
Cheers,
mwh
