On 16 April 2013 15:31, Paul Sokolovsky <paul.sokolovsky@linaro.org> wrote:
Hello,

On Tue, 16 Apr 2013 06:19:51 +0530
Vishal Bhoj <vishal.bhoj@linaro.org> wrote:

[]
> > > This error is related to infrastructure. I am not sure if this
> > > will be resolved if the publishing is updated.
> >
> > ChannelClosedException, as quoted below, means an EC2 instance got
> > terminated (or otherwise "lost") behind Jenkins' back. This is a
> > known non-deterministic failure, bound to happen from time to time
> > due to the nature of EC2 (a big, complex system with a non-zero
> > stream of errors).
> >
>
> We really need to find a solution for this. We are running into this
> error on a regular basis nowadays.

Yes, I see that the 2 builds I tried yesterday didn't succeed either, with
the same error. Builds of that job look weird: they're still in the
compile phase 3.5hrs after the start, which is too long. At least 2
builds were killed at almost the same time, but it's neither the build
timeout (set at 4h45min, and it looks different) nor the EC2 monitoring
script (not active now; it never killed running builds, only zombie
instances).

I'm cc:ing Phillip in case he knows of anything that might kill EC2
instances in the "old" Linaro EC2 account after 3.5hrs.


I still think the likely cause is master overload due to publishing
issues, though, and I'd like to keep working on resolving that first.
I have good results so far: a "copycat" build on a sandbox finished in
less than 2min:
https://ec2-107-20-93-222.compute-1.amazonaws.com/jenkins/job/pfalcon_galaxynexus-linaro/9/

That's only half of the work needed, though; I'm going to deploy the
needed parts to production and continue with it.

Is there any update on why we are seeing this failure? We continue to see the same error:
https://android-build.linaro.org/jenkins/job/linaro-android_vexpress-linaro-mp/263/console
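
To tell this failure apart from ordinary build breakage when going through
console logs, something like the following might help. It is only a minimal
sketch: the marker strings are copied from the build log snapshot quoted
further down in this thread, while the function names are made up for
illustration and are not part of Jenkins or of our build scripts.

```python
# Marker strings copied from the build log snapshot quoted in this thread.
SLAVE_LOSS_MARKERS = (
    "hudson.remoting.ChannelClosedException",
    "hudson.remoting.RequestAbortedException",
    "hudson.remoting.Channel$OrderlyShutdown",
)


def find_slave_loss_markers(console_text):
    """Return the markers present in a console log, in the order listed above."""
    return [marker for marker in SLAVE_LOSS_MARKERS if marker in console_text]


def is_slave_loss(console_text):
    """Hypothetical helper: True if the log looks like the slave was lost."""
    return bool(find_slave_loss_markers(console_text))
```

Feeding it the text of a build's consoleText page would flag builds that died
because the slave went away, as opposed to builds that failed while compiling.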


>
> >
> > The fact that it happened 3 times is worrying though. We appear to
> > have had another zombie slave storm over the weekend, and on top of
> > that, we have
> > https://bugs.launchpad.net/linaro-android-infrastructure/+bug/1164273 ,
> > which makes builds run far too long, so they hit any EC2-related
> > issue with higher probability. I'm working on that issue as top
> > priority.
> >
> > In the meantime, I restarted
> > https://android-build.linaro.org/builds/~linaro-android/vexpress-linaro-mp/
> > and will watch it to make sure we get a fresh successful build.
> >
> >
> >
> > > On 15 April 2013 10:55, Naresh Kamboju <naresh.kamboju@linaro.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I would like to inform you that the following builds failed.
> > > >
> > > >
> > > >
> > https://android-build.linaro.org/builds/~linaro-android/vexpress-linaro-mp/#build=257
> > > >
> > > >
> > https://android-build.linaro.org/builds/~linaro-android/vexpress-linaro-mp/#build=256
> > > >
> > > >
> > https://android-build.linaro.org/builds/~linaro-android/vexpress-linaro-mp/#build=255
> > > >
> > > > Build log snapshot:
> > > > -----------------------
> > > > Caused by: hudson.remoting.ChannelClosedException: channel is
> > > > already closed
> > > > Caused by: hudson.remoting.Channel$OrderlyShutdown
> > > > Caused by: Command close created at
> > > >
> > > > FATAL: hudson.remoting.RequestAbortedException:
> > > > hudson.remoting.Channel$OrderlyShutdown
> > > >
> > > > Complete build log:
> > > > -----------------------
> > > >
> > > >
> > https://android-build.linaro.org/jenkins/job/linaro-android_vexpress-linaro-mp/257/consoleText
> > > >
> > > > Best regards
> > > > Naresh Kamboju
> > > >
> >
> >
> >
> > --
> > Best Regards,
> > Paul
> >
> > Linaro.org | Open source software for ARM SoCs
> > Follow Linaro: http://www.facebook.com/pages/Linaro
> > http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog
> >



--
Best Regards,
Paul
