+android list
On Tue, Aug 21, 2012 at 12:41 PM, Dave Pigott dave.pigott@linaro.org wrote:
Dave Pigott Validation Engineer T: +44 1223 40 00 63 | M +44 7940 45 93 44 Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
On 21 Aug 2012, at 10:19, Alexander Sack wrote:
On Tue, Aug 21, 2012 at 10:37 AM, Dave Pigott dave.pigott@linaro.org wrote:
beaglexm02
http://validation.linaro.org/lava-server/scheduler/job/29737
Absolutely enormous log file. The board was in a very strange state, spewing
out loads of exceptions. Went onto the board and it was still throwing out
exceptions. Did a hard reset and it came back cleanly. Not clear why hard
reset didn't work from the LAVA session.
Put back online to retest.
origen04
http://validation.linaro.org/lava-server/scheduler/job/28745
Failed to get root.tgz. I may be missing something, but if you look at
http://validation.linaro.org/lava-server/scheduler/job/28745/log_file#entry1...
you'll see that it says it's waiting 60 seconds to retry, but doesn't seem
to actually retry. Anyone any ideas?
Put back online to retest
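For reference, the behaviour the log message promises (wait 60 seconds, then try again) could look roughly like the sketch below. This is not the LAVA dispatcher's code: `fetch_with_retry` and its parameters are hypothetical names, and `fetch` stands in for whatever step downloads root.tgz.

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=60, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that raises on failure; here it is
    a hypothetical stand-in for the root.tgz download step.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                # Announce the wait, then go round the loop and try again.
                print("attempt %d failed (%s); retrying in %ds"
                      % (attempt, exc, delay))
                sleep(delay)
    raise RuntimeError("gave up after %d attempts" % attempts) from last_error
```

The point of the sketch is that the "waiting to retry" message and the actual retry live in the same loop, so one cannot happen without the other.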
panda01-05/09/10/12/14-23
http://validation.linaro.org/lava-server/scheduler/job/29825 (as an example)
This is just odd. It says it couldn't get the android artefact
(http://validation.linaro.org/lava-server/scheduler/job/29825/log_file#entry2...)
but it doesn't appear to have even issued a wget!
Looking at the time stamps, something happened between 14:00UTC and 20:00UTC
that stopped things working. Whatever it was, I'm retesting panda01 to see
if it went away, or if (as I suspect) all the other boards will fail when
they run their health check.
Could we maintain an easy-to-find track record of what was deployed when? This might also help us to attach a checklist that people run through and sign off before pushing the production button (e.g. all health jobs must have succeeded on staging before rolling out, etc.).
--
+1 to that, but in this case, it turns out that the android image we use for testing comes from snapshots, and that particular snapshot was retired yesterday. None of the releases seem stable enough to use for health checks so, for the moment, we're re-baselining on a new working snapshot (liuyq is working on this at the moment).
For the future, we're discussing holding the health check images cached locally, so that we're not hampered by these issues again, and we can choose when and how to re-baseline.
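A minimal sketch of the local caching idea, assuming images are fetched by URL and that a plain directory on the server is an acceptable cache. `cached_image`, `_download` and the cache layout are hypothetical and not part of LAVA.

```python
import hashlib
import os
import shutil
import urllib.request


def _download(url, dest):
    # Default fetcher: stream the URL to dest with the standard library.
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def cached_image(url, cache_dir, download=None):
    """Return a local path for url, downloading only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Name the cached file after a hash of the URL, so a retired snapshot
    # URL still resolves to the copy we already hold.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        return path
    if download is None:
        download = _download
    tmp = path + ".part"
    download(url, tmp)
    # Rename only after a complete download, so a partial file is never
    # mistaken for a valid cached image.
    os.replace(tmp, path)
    return path
```

With something like this, re-baselining becomes a deliberate step (point the health check at a new URL) rather than something forced on us when an upstream snapshot disappears.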
Hmm. I would very much prefer if we could figure a way to use released images as our health-check base. Even if it means we do a "special" promotion of a certain daily build outside the monthly cadence if needed ...
On this front: is there an easy way to check that a certain build is suitable for health check? If so, we could make that part of our monthly validation process and daily dashboard and assign priority to bugs that would disqualify a build from being suitable as a health check...
What do you think?
On 21 Aug 2012, at 12:16, Alexander Sack wrote:
+android list
On Tue, Aug 21, 2012 at 12:41 PM, Dave Pigott dave.pigott@linaro.org wrote:
+1 to that, but in this case, it turns out that the android image we use for testing comes from snapshots, and that particular snapshot was retired yesterday. None of the releases seem stable enough to use for health checks so, for the moment, we're re-baselining on a new working snapshot (liuyq is working on this at the moment).
For the future, we're discussing holding the health check images cached locally, so that we're not hampered by these issues again, and we can choose when and how to re-baseline.
Hmm. I would very much prefer if we could figure a way to use released images as our health-check base. Even if it means we do a "special" promotion of a certain daily build outside the monthly cadence if needed ...
After testing, I've baselined to the 12.07 android release for health checks. Every other board type takes its images from releases, so panda was an anomaly.
On this front: is there an easy way to check that a certain build is suitable for health check? If so, we could make that part of our monthly validation process and daily dashboard and assign priority to bugs that would disqualify a build from being suitable as a health check...
What do you think?
Hmm. Nice idea, and I was thinking along similar lines as we were going through this crisis. We need to think about this and how we might achieve it.
Dave
On 08/21/2012 07:48 AM, Dave Pigott wrote:
On 21 Aug 2012, at 12:16, Alexander Sack wrote:
+android list
On Tue, Aug 21, 2012 at 12:41 PM, Dave Pigott dave.pigott@linaro.org wrote:
+1 to that, but in this case, it turns out that the android image we use for testing comes from snapshots, and that particular snapshot was retired yesterday. None of the releases seem stable enough to use for health checks so, for the moment, we're re-baselining on a new working snapshot (liuyq is working on this at the moment).
For the future, we're discussing holding the health check images cached locally, so that we're not hampered by these issues again, and we can choose when and how to re-baseline.
Hmm. I would very much prefer if we could figure a way to use released images as our health-check base. Even if it means we do a "special" promotion of a certain daily build outside the monthly cadence if needed ...
After testing, I've baselined to the 12.07 android release for health checks. Every other board type takes its images from releases, so panda was an anomaly.
When we were trying to fix health check issues, we were having trouble finding a really good panda build. I think YongQin had found this one. 12.07 seems reasonable for now.
On this front: is there an easy way to check that a certain build is suitable for health check? If so, we could make that part of our monthly validation process and daily dashboard and assign priority to bugs that would disqualify a build from being suitable as a health check...
What do you think?
Hmm. Nice idea, and I was thinking along similar lines as we were going through this crisis. We need to think about this and how we might achieve it.
I had my 1x1 with asac and have a thought on this. Let's take the monthly releases and submit something like 50 jobs for that image that work like a health job. We can then look at the results and see how they compare to the previous month's releases. This should help us know what's good and also provide some valuable metrics to the other Platform teams in Linaro.
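The month-over-month comparison could be sketched as below, assuming each health-style job reduces to a simple pass/fail. The function names and the 5% tolerance are illustrative assumptions, not an agreed threshold.

```python
def pass_rate(results):
    """Fraction of runs that passed; results is a list of booleans."""
    if not results:
        raise ValueError("no results to summarise")
    return sum(1 for r in results if r) / len(results)


def compare_releases(previous, current, tolerance=0.05):
    """Compare two batches of health-style job results.

    tolerance is a hypothetical allowance for ordinary lab flakiness; a
    release only counts as regressed if it drops by more than that.
    """
    prev_rate = pass_rate(previous)
    cur_rate = pass_rate(current)
    return {
        "previous": prev_rate,
        "current": cur_rate,
        "regressed": cur_rate < prev_rate - tolerance,
    }
```

For example, feeding in 50 results from last month's release and 50 from this month's gives two pass rates and a single regressed/not-regressed flag that could go on the daily dashboard.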
Andy Doan andy.doan@linaro.org writes:
On 08/21/2012 07:48 AM, Dave Pigott wrote:
On 21 Aug 2012, at 12:16, Alexander Sack wrote:
On this front: is there an easy way to check that a certain build is suitable for health check? If so, we could make that part of our monthly validation process and daily dashboard and assign priority to bugs that would disqualify a build from being suitable as a health check...
What do you think?
Hmm. Nice idea, and I was thinking along similar lines as we were going through this crisis. We need to think about this and how we might achieve it.
I had my 1x1 with asac and have a thought on this. Let's take the monthly releases and submit something like 50 jobs for that image that work like a health job. We can then look at the results and see how they compare to the previous month's releases. This should help us know what's good and also provide some valuable metrics to the other Platform teams in Linaro.
That said, I don't really see why we'd change which image our health check board uses unless we had a good reason. The point of them is to keep an eye on the state of the hardware...
Cheers, mwh
On 22 August 2012 15:36, Michael Hudson-Doyle michael.hudson@linaro.org wrote:
Andy Doan andy.doan@linaro.org writes:
On 08/21/2012 07:48 AM, Dave Pigott wrote:
On 21 Aug 2012, at 12:16, Alexander Sack wrote:
On this front: is there an easy way to check that a certain build is suitable for health check? If so, we could make that part of our monthly validation process and daily dashboard and assign priority to bugs that would disqualify a build from being suitable as a health check...
What do you think?
Hmm. Nice idea, and I was thinking along similar lines as we were going through this crisis. We need to think about this and how we might achieve it.
I had my 1x1 with asac and have a thought on this. Let's take the monthly releases and submit something like 50 jobs for that image that work like a health job. We can then look at the results and see how they compare to the previous month's releases. This should help us know what's good and also provide some valuable metrics to the other Platform teams in Linaro.
That said, I don't really see why we'd change which image our health check board uses unless we had a good reason. The point of them is to keep an eye on the state of the hardware...
If it's all the same, I think we should shut off beagle and stick with Panda. Running beagle when we don't need to has other costs.
Cheers, mwh
linaro-android mailing list linaro-android@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-android
On 08/22/2012 04:26 PM, Zach Pfeffer wrote:
If it's all the same, I think we should shut off beagle and stick with Panda. Running beagle when we don't need to has other costs.
Before we can shut down beagle in the lab, you need to:
* shut down all CI kernel jobs for it
* shut down pre-built image building and submission
* shut down Jenkins hardware pack building for it
That said, they are still handy for us, i.e. it's not a cost to LAVA. It's a benefit.
On 21 August 2012 06:16, Alexander Sack asac@linaro.org wrote:
+android list
On Tue, Aug 21, 2012 at 12:41 PM, Dave Pigott dave.pigott@linaro.org wrote:
On 21 Aug 2012, at 10:19, Alexander Sack wrote:
On Tue, Aug 21, 2012 at 10:37 AM, Dave Pigott dave.pigott@linaro.org wrote:
beaglexm02
Shouldn't beagle be off?
http://validation.linaro.org/lava-server/scheduler/job/29737
Absolutely enormous log file. The board was in a very strange state, spewing
out loads of exceptions. Went onto the board and it was still throwing out
exceptions. Did a hard reset and it came back cleanly. Not clear why hard
reset didn't work from the LAVA session.
Put back online to retest.
origen04
http://validation.linaro.org/lava-server/scheduler/job/28745
Failed to get root.tgz. I may be missing something, but if you look at
http://validation.linaro.org/lava-server/scheduler/job/28745/log_file#entry1...
you'll see that it says it's waiting 60 seconds to retry, but doesn't seem
to actually retry. Anyone any ideas?
Put back online to retest
panda01-05/09/10/12/14-23
http://validation.linaro.org/lava-server/scheduler/job/29825 (as an example)
This is just odd. It says it couldn't get the android artefact
(http://validation.linaro.org/lava-server/scheduler/job/29825/log_file#entry2...)
but it doesn't appear to have even issued a wget!
Looking at the time stamps, something happened between 14:00UTC and 20:00UTC
that stopped things working. Whatever it was, I'm retesting panda01 to see
if it went away, or if (as I suspect) all the other boards will fail when
they run their health check.
Could we maintain an easy-to-find track record of what was deployed when? This might also help us to attach a checklist that people run through and sign off before pushing the production button (e.g. all health jobs must have succeeded on staging before rolling out, etc.).
We have this in Gerrit, and want to move what isn't in Gerrit into Gerrit. This allows us to stage everything change by change and revert things that break the build.
--
+1 to that, but in this case, it turns out that the android image we use for testing comes from snapshots, and that particular snapshot was retired yesterday. None of the releases seem stable enough to use for health checks so, for the moment, we're re-baselining on a new working snapshot (liuyq is working on this at the moment).
For the future, we're discussing holding the health check images cached locally, so that we're not hampered by these issues again, and we can choose when and how to re-baseline.
Hmm. I would very much prefer if we could figure a way to use released images as our health-check base. Even if it means we do a "special" promotion of a certain daily build outside the monthly cadence if needed ...
On this front: is there an easy way to check that a certain build is suitable for health check? If so, we could make that part of our monthly validation process and daily dashboard and assign priority to bugs that would disqualify a build from being suitable as a health check...
What do you think?
-- Alexander Sack Technical Director, Linaro Platform Teams http://www.linaro.org | Open source software for ARM SoCs http://twitter.com/#%21/linaroorg - http://www.linaro.org/linaro-blog
On 21 Aug 2012, at 15:55, Zach Pfeffer wrote:
On 21 August 2012 06:16, Alexander Sack asac@linaro.org wrote:
+android list
On Tue, Aug 21, 2012 at 12:41 PM, Dave Pigott dave.pigott@linaro.org wrote:
On 21 Aug 2012, at 10:19, Alexander Sack wrote:
On Tue, Aug 21, 2012 at 10:37 AM, Dave Pigott dave.pigott@linaro.org wrote:
beaglexm02
Shouldn't beagle be off?
Nobody's told me if that's the case. Are we not supporting beagle any more?
Dave
On 08/22/2012 01:52 AM, Dave Pigott wrote:
On 21 Aug 2012, at 15:55, Zach Pfeffer wrote:
On 21 August 2012 06:16, Alexander Sack asac@linaro.org wrote:
+android list
On Tue, Aug 21, 2012 at 12:41 PM, Dave Pigott dave.pigott@linaro.org wrote:
On 21 Aug 2012, at 10:19, Alexander Sack wrote:
On Tue, Aug 21, 2012 at 10:37 AM, Dave Pigott dave.pigott@linaro.org wrote:
beaglexm02
Shouldn't beagle be off?
Nobody's told me if that's the case. Are we not supporting beagle any more?
I've gotten no requests and there are still kernel CI jobs and pre-built images being submitted daily.
Andy Doan andy.doan@linaro.org writes:
Shouldn't beagle be off?
Nobody's told me if that's the case. Are we not supporting beagle any more?
I've gotten no requests and there are still kernel CI jobs and pre-built images being submitted daily.
Beagles are useful for us to have around because no one really cares if we break them and the health jobs complete really quickly :)
Cheers, mwh