Hi all,
I just wanted to forward this thread from LAKML to linaro-dev: http://article.gmane.org/gmane.linux.ports.tegra/10683
Seems there is a lot of desire for improved automated build coverage and automated reporting along with it.
Regards, Mike
On 9 May 2013 20:03, Mike Turquette mturquette@linaro.org wrote:
Hi all,
I just wanted to forward this thread from LAKML to linaro-dev: http://article.gmane.org/gmane.linux.ports.tegra/10683
Seems there is a lot of desire for improved automated build coverage and automated reporting along with it.
I replied to it. We've already got such daily builds with boot testing: https://ci.linaro.org/jenkins/view/kernel-ci
I'm surprised that some people involved in Linaro and this thread didn't mention it. Anyway, it's a good opportunity to remind people that we've got a Kernel CI, and I'll be happy to get more feedback to improve it.
Cheers,
Fathi
On Thu, May 9, 2013 at 7:56 PM, Fathi Boudra fathi.boudra@linaro.org wrote:
Hi all,
I just wanted to forward this thread from LAKML to linaro-dev: http://article.gmane.org/gmane.linux.ports.tegra/10683
Seems there is a lot of desire for improved automated build coverage and automated reporting along with it.
I replied to it. We've already got such daily builds with boot testing: https://ci.linaro.org/jenkins/view/kernel-ci
I'm surprised that some people involved in Linaro and this thread didn't mention it. Anyway, it's a good opportunity to remind people that we've got a Kernel CI, and I'll be happy to get more feedback to improve it.
Hi Fathi,
I have to admit that what we do in terms of Kernel CI is still a bit fuzzy to me, even now that I am an insider. When I was at TI and working closely with the TI Landing team, I don't believe we ever reached the point where Linaro kernel CI was useful for the 'products' we were jointly doing. Now that I am at Linaro, I am going to need LAVA and kernel CI for our project shortly. I have no doubt that what is being done is worthwhile, but I believe a little bit of marketing and/or presentation would be very welcome. It might be nice to highlight the bugs that have been found (and fixed?) *thanks to* Linaro kernel CI, for example.

Also, in the link above all 7 'active' jobs are failing: 3 of them have always failed, and 2 have been failing for 2 weeks. So it's not clear what that means; I am sure it doesn't mean that none of our kernels ever boots ;-) If we want Kernel CI to be useful and kernel devs to rely on it, it should work all the time, so that failures are quickly identified and fixed. Maybe this is why Linaro Kernel CI was not mentioned by Linaro people in that thread.
nico
On 13/05/13 16:12, the mail apparently from Nicolas Dechesne included:
On Thu, May 9, 2013 at 7:56 PM, Fathi Boudra <fathi.boudra@linaro.org> wrote:
> Hi all,
>
> I just wanted to forward this thread from LAKML to linaro-dev:
> http://article.gmane.org/gmane.linux.ports.tegra/10683
>
> Seems there is a lot of desire for improved automated build coverage
> and automated reporting along with it.

I replied to it. We've already got such daily builds with boot testing: https://ci.linaro.org/jenkins/view/kernel-ci

I'm surprised that some people involved in Linaro and this thread didn't mention it. Anyway, it's a good opportunity to remind people that we've got a Kernel CI, and I'll be happy to get more feedback to improve it.
Hi Fathi,
I have to admit that what we do in terms of Kernel CI is still a bit fuzzy to me, even now that I am an insider. When I was at TI and working closely with the TI Landing team, I don't believe we ever reached the point where Linaro kernel CI was useful for the 'products' we were jointly doing. Now that I am at Linaro, I am going to need LAVA and kernel CI for our project shortly. I have no doubt that what is being done is worthwhile, but I believe a little bit of marketing and/or presentation would be very welcome. It might be nice to highlight the bugs that have been found (and fixed?) *thanks to* Linaro kernel CI, for example.

Also, in the link above all 7 'active' jobs are failing: 3 of them have always failed, and 2 have been failing for 2 weeks. So it's not clear what that means; I am sure it doesn't mean that none of our kernels ever boots ;-) If we want Kernel CI to be useful and kernel devs to rely on it, it should work all the time, so that failures are quickly identified and fixed. Maybe this is why Linaro Kernel CI was not mentioned by Linaro people in that thread.
I think TI's use of CI only evolved as far as compiling the thing; it's not hooked up to any actual testing.
The error mails we are still getting spammed with are partly my fault.
Previously, LAVA would remain silent if a build failed. That is quite a bad situation if you're committing to that tree: the last thing you heard was that everything is good, then the build machine hits a problem and stops testing. You may even have committed something that broke the build, but you continue thinking everything is good because nothing tells you the tree is no longer actually being tested.
So after some prompting from me, pointing out that such a false sense of security undermines the point of LAVA, we now get notifications that the build attempt failed. However, since I haven't touched tilt-3.4 or tilt-tracking for months, it's surprising how many failure reports we periodically get that are basically the LAVA infrastructure choking while trying to build it, not any actual problems.
-Andy
Nicolas Dechesne nicolas.dechesne@linaro.org writes:
On Thu, May 9, 2013 at 7:56 PM, Fathi Boudra fathi.boudra@linaro.org wrote:
Hi all,
I just wanted to forward this thread from LAKML to linaro-dev: http://article.gmane.org/gmane.linux.ports.tegra/10683
Seems there is a lot of desire for improved automated build coverage and automated reporting along with it.
I replied to it. We've already got such daily builds with boot testing: https://ci.linaro.org/jenkins/view/kernel-ci
I'm surprised that some people involved in Linaro and this thread didn't mention it. Anyway, it's a good opportunity to remind people that we've got a Kernel CI, and I'll be happy to get more feedback to improve it.
Hi Fathi,
I have to admit that what we do in terms of Kernel CI is still a bit fuzzy to me, even now that I am an insider. When I was at TI and working closely with the TI Landing team, I don't believe we ever reached the point where Linaro kernel CI was useful for the 'products' we were jointly doing. Now that I am at Linaro, I am going to need LAVA and kernel CI for our project shortly. I have no doubt that what is being done is worthwhile, but I believe a little bit of marketing and/or presentation would be very welcome. It might be nice to highlight the bugs that have been found (and fixed?) *thanks to* Linaro kernel CI, for example.

Also, in the link above all 7 'active' jobs are failing: 3 of them have always failed, and 2 have been failing for 2 weeks. So it's not clear what that means; I am sure it doesn't mean that none of our kernels ever boots ;-) If we want Kernel CI to be useful and kernel devs to rely on it, it should work all the time, so that failures are quickly identified and fixed. Maybe this is why Linaro Kernel CI was not mentioned by Linaro people in that thread.
I'll second Nicolas' comments.
As a kernel developer and upstream maintainer (and now working for Linaro), it's not at all clear (or documented anywhere I can find) how kernel CI is structured and implemented, nor how it could be useful for kernel developers and especially maintainers (who are *very* interested in early notification of build/boot failures for various defconfigs and even non-ARM arch builds).
I use my own jenkins setup for some of my stuff, and scripts for the rest, but it would be ideal to not have to duplicate effort here.
A quick scan of https://ci.linaro.org/jenkins/view/kernel-ci makes me realize there is nothing terribly helpful there (for the reasons Nico already pointed out), but it has piqued my curiosity about the Jenkins changes/plugins being used.
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
Kevin
On 14 May 2013 21:52, Kevin Hilman khilman@linaro.org wrote:
I'll second Nicolas' comments.
As a kernel developer and upstream maintainer (and now working for Linaro), it's not at all clear (or documented anywhere I can find) how kernel CI is structured and implemented, nor how it could be useful for kernel developers and especially maintainers (who are *very* interested in early notification of build/boot failures for various defconfigs and even non-ARM arch builds).
https://wiki.linaro.org/Platform/Infrastructure/infrastructure-knowledgebase... but I believe it's now partly outdated.
I use my own jenkins setup for some of my stuff, and scripts for the rest, but it would be ideal to not have to duplicate effort here.
A quick scan of https://ci.linaro.org/jenkins/view/kernel-ci makes me realize there is nothing terribly helpful there (for the reasons Nico already pointed out), but it has piqued my curiosity about the Jenkins changes/plugins being used.
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
You've got access and can even view/modify the setup. I used a multi-configuration project for these jobs. If you have more questions, just ask.
Cheers,
Fathi
Fathi Boudra fathi.boudra@linaro.org writes:
[...]
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
You've got access and can even view/modify the setup.
I'm logged in to ci.linaro.org (via Launchpad) but I don't see the 'configure' option on any of the jobs, or the 'New Job' link to create a new job. What am I missing?
I used a multi-configuration project for these jobs. If you have more questions, just ask.
I use multi-configuration projects locally too, but the output of yours looks different from mine. Not sure if it's because of different jenkins versions, or if you've got some additional jenkins add-ons. That's why I'd like to see the jenkins job details.
Thanks for the help,
Kevin
On 14 May 2013 23:49, Kevin Hilman khilman@linaro.org wrote:
Fathi Boudra fathi.boudra@linaro.org writes:
[...]
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
You've got access and can even view/modify the setup.
I'm logged in to ci.linaro.org (via Launchpad) but I don't see the 'configure' option on any of the jobs, or the 'New Job' link to create a new job. What am I missing?
Make sure "Team membership: linaro-ci-build-test-service" is checked when you log in.
I used a multi-configuration project for these jobs. If you have more questions, just ask.
I use multi-configuration projects locally too, but the output of yours looks different from mine. Not sure if it's because of different jenkins versions, or if you've got some additional jenkins add-ons. That's why I'd like to see the jenkins job details.
to get a list of plugins installed: https://ci.linaro.org/jenkins/pluginManager/api/xml?depth=1 https://ci.linaro.org/jenkins/pluginManager/api/xml?depth=1&xpath=/*/*/s...
Off-topic: I'd like to move the job configurations to a more friendly format: http://ci.openstack.org/jenkins-job-builder/configuration.html
hosted on git.linaro.org. It should make it easier to set up jobs and duplicate them as needed.
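The workflow would look something like this (the repo location and config file name below are hypothetical, just to illustrate the idea):

    # Job definitions kept as YAML in git (hypothetical repo):
    git clone git://git.linaro.org/ci/job-configs.git
    pip install --user jenkins-job-builder

    # Render the YAML to Jenkins XML locally, without touching the server:
    jenkins-jobs test job-configs/ -o rendered-xml/

    # Push the jobs to the Jenkins master configured in jenkins_jobs.ini:
    jenkins-jobs --conf jenkins_jobs.ini update job-configs/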
Thanks for the help,
Kevin
Fathi Boudra fathi.boudra@linaro.org writes:
On 14 May 2013 23:49, Kevin Hilman khilman@linaro.org wrote:
Fathi Boudra fathi.boudra@linaro.org writes:
[...]
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
You've got access and can even view/modify the setup.
I'm logged in to ci.linaro.org (via Launchpad) but I don't see the 'configure' option on any of the jobs, or the 'New Job' link to create a new job. What am I missing?
Make sure "Team membership: linaro-ci-build-test-service" is checked when you log in.
Yup, that works. Now I can see all the configuration. Thanks.
Kevin
Fathi Boudra fathi.boudra@linaro.org writes:
[...]
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
You've got access and can even view/modify the setup. I used a multi-configuration project for these jobs. If you have more questions, just ask.
Now that I have permission on the Linaro Jenkins, I started experimenting with getting something useful for ARM maintainers, and I created a basic job[1] for building all ARM defconfigs (120+) in linux-next.
This is a simple build-only job: no downloading toolchains, Ubuntu packaging, LAVA testing, etc. like the other Linaro CI jobs. Just kernel builds, with output that should make sense to kernel developers. IMO, this is the starting point for having some basic sanity testing for maintainers.
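To make it concrete, the job boils down to roughly the following (an illustrative sketch, not the actual job script; the cross-compiler prefix and output layout are my assumptions):

    #!/bin/sh
    # Build every ARM defconfig and record a per-config pass/fail.
    export ARCH=arm
    export CROSS_COMPILE=arm-linux-gnueabihf-
    for cfg in arch/arm/configs/*_defconfig; do
        name=$(basename "$cfg")
        out="build/$name"
        mkdir -p "$out"
        make O="$out" "$name" >/dev/null
        if make -j"$(nproc)" O="$out" > "$out/build.log" 2>&1; then
            echo "PASS: $name"
        else
            echo "FAIL: $name"
        fi
    done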
I now have several questions (and suggestions) on how to speed this up based on the configuration of the slaves, as well as questions about best practices for using the slaves: how workspaces/tools/scripts are (or aren't) shared between slaves, etc.
The first suggestion is to speed up the git clones/fetches. Even a shallow git clone (--depth=1) is taking > 3 minutes on the slaves. What I do on my home jenkins box is to have a local repo (periodically updated via cron), and then use the advanced options under jenkins git SCM config to point to it using the "Path of the reference repo to use during clone" option. That makes the git clones/fetches very fast since they're (almost) always from a local repo.
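For illustration, the reference-repo trick amounts to something like this (the mirror path and cron details are just examples, not how my box is actually laid out):

    # One-time setup on the slave: keep a bare mirror of linux-next locally.
    git clone --mirror \
        git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git \
        /srv/git/linux-next.git

    # Refreshed periodically (e.g. from cron):
    (cd /srv/git/linux-next.git && git fetch --all --prune)

    # In the job, clones borrow objects from the local mirror, so only the
    # few new objects cross the network (this is what Jenkins' "reference
    # repo" option does under the hood):
    git clone --reference /srv/git/linux-next.git \
        git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git \
        linux-next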
Another suggestion is to have ccache installed on the slaves. Since this job is building much of the same code over and over (120+ defconfigs), ccache would dramatically speed this up, and probably make it more sensible to run all the builds sequentially on the same slave.
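This is the kind of thing I mean — a sketch only, with example paths and compiler prefix, not a description of how the slaves are currently set up:

    # 1) Prefix the compiler: CROSS_COMPILE expands to "ccache arm-linux-gnueabihf-gcc".
    make ARCH=arm CROSS_COMPILE="ccache arm-linux-gnueabihf-" -j"$(nproc)"

    # 2) Or masquerade via PATH, if the distro ships compiler symlinks under
    #    /usr/lib/ccache (it needs a symlink for the cross compiler too):
    export CCACHE_DIR=/srv/ccache       # persistent, shared cache location (example)
    export PATH=/usr/lib/ccache:$PATH
    make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- -j"$(nproc)"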
Kevin
P.S. I hope I'm not overloading the slaves too much[2]
P.P.S. Can someone install the Jenkins Warnings Plugin[3]? It can automatically post-process gcc warnings/errors and plot their history across multiple jobs. It's a really useful plugin.
[1] https://ci.linaro.org/jenkins/job/khilman-kernel-arm-next/ [2] https://ci.linaro.org/jenkins/label/kernel_cloud/load-statistics [3] https://wiki.jenkins-ci.org/display/JENKINS/Warnings+Plugin
On 18 May 2013 04:06, Kevin Hilman khilman@linaro.org wrote:
Fathi Boudra fathi.boudra@linaro.org writes:
[...]
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
You've got access and can even view/modify the setup. I used a multi-configuration project for these jobs. If you have more questions, just ask.
Now that I have permission on the Linaro Jenkins, I started experimenting with getting something useful for ARM maintainers, and I created a basic job[1] for building all ARM defconfigs (120+) in linux-next.
This is a simple build-only job: no downloading toolchains, Ubuntu packaging, LAVA testing, etc. like the other Linaro CI jobs.
It would be great if we can keep using the latest Linaro GCC. We copy a tarball from the master to the slave and extract it only once; it adds less than a minute to the full build time.
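Roughly, the toolchain step looks like this (the tarball name and paths below are placeholders, not the actual locations used on the master):

    # Unpack the Linaro GCC tarball once per slave; later builds reuse it.
    TARBALL=gcc-linaro-arm-linux-gnueabihf.tar.xz     # placeholder name
    if [ ! -d "$HOME/toolchain" ]; then
        mkdir -p "$HOME/toolchain"
        tar -C "$HOME/toolchain" --strip-components=1 -xf "$TARBALL"
    fi
    export PATH="$HOME/toolchain/bin:$PATH"
    export CROSS_COMPILE=arm-linux-gnueabihf-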
We probably want to keep the boot-testing part optional; there are several ways to implement it without impacting the main build-testing job. IMO, something to investigate.
Just kernel builds, with output that should make sense to kernel developers. IMO, this is the starting point for having some basic sanity testing for maintainers.
I now have several questions (and suggestions) on how to speed this up based on the configuration of the slaves, as well as questions about best practices for using the slaves: how workspaces/tools/scripts are (or aren't) shared between slaves, etc.
The first suggestion is to speed up the git clones/fetches. Even a shallow git clone (--depth=1) is taking > 3 minutes on the slaves. What I do on my home jenkins box is to have a local repo (periodically updated via cron), and then use the advanced options under jenkins git SCM config to point to it using the "Path of the reference repo to use during clone" option. That makes the git clones/fetches very fast since they're (almost) always from a local repo.
The difference between the slaves and your home box is that the slaves are ephemeral instances (terminated after a defined timeout) while your box is always up. We'll need to move to a persistent slave (stopped instead of terminated), opening the door to proper caching (local mirror).
We have such a setup for OpenEmbedded builds, and other tricks in our toolbox for Android builds.
Another suggestion is to have ccache installed on the slaves. Since this job is building much of the same code over and over (120+ defconfigs), ccache would dramatically speed this up, and probably make it more sensible to run all the builds sequentially on the same slave.
ccache is already installed, but due to the ephemeral nature of the current instances it isn't exploited. Once again, moving to persistent slaves will resolve the issue.
Also, the EC2 instances aren't I/O optimized. On some jobs, I create a tmpfs directory to build in; it dramatically reduces build time.
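For example, something along these lines (mount point and size are arbitrary; it needs root or a pre-configured mount):

    # Build in RAM to avoid the slow EC2 disk I/O.
    sudo mkdir -p /mnt/build-tmpfs
    sudo mount -t tmpfs -o size=4G tmpfs /mnt/build-tmpfs
    export ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf-
    make O=/mnt/build-tmpfs omap2plus_defconfig
    make O=/mnt/build-tmpfs -j"$(nproc)"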
Kevin
P.S. I hope I'm not overloading the slaves too much[2]
nope :)
P.P.S. Can someone install the Jenkins Warnings Plugin[3]? It can automatically post-process gcc warnings/errors and plot their history across multiple jobs. It's a really useful plugin.
Done. Note that we have the Log Parser plugin doing similar post-processing of gcc warnings/errors.
[1] https://ci.linaro.org/jenkins/job/khilman-kernel-arm-next/ [2] https://ci.linaro.org/jenkins/label/kernel_cloud/load-statistics [3] https://wiki.jenkins-ci.org/display/JENKINS/Warnings+Plugin
Cheers,
--
Fathi Boudra
Builds and Baselines Manager | Release Manager
Linaro.org | Open source software for ARM SoCs
On 18 May 2013 10:12, Fathi Boudra fathi.boudra@linaro.org wrote:
On 18 May 2013 04:06, Kevin Hilman khilman@linaro.org wrote:
Fathi Boudra fathi.boudra@linaro.org writes:
[...]
Now that I have permission on the Linaro Jenkins, I started experimenting with getting something useful for ARM maintainers, and I created a basic job[1] for building all ARM defconfigs (120+) in linux-next.
I've looked at your job for building all ARM defconfigs (120+) in linux-next; it took less than an hour (latest build). That sounds reasonable to me :) What's your build-time expectation/goal?
Fathi Boudra fathi.boudra@linaro.org writes:
On 18 May 2013 04:06, Kevin Hilman khilman@linaro.org wrote:
Fathi Boudra fathi.boudra@linaro.org writes:
[...]
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
You've got access and can even view/modify the setup. I used a multi-configuration project for these jobs. If you have more questions, just ask.
Now that I have permission on the Linaro Jenkins, I started experimenting with getting something useful for ARM maintainers, and I created a basic job[1] for building all ARM defconfigs (120+) in linux-next.
This is a simple build-only job: no downloading toolchains, Ubuntu packaging, LAVA testing, etc. like the other Linaro CI jobs.
It would be great if we can keep using the latest Linaro GCC. We copy a tarball from the master to the slave and extract it only once; it adds less than a minute to the full build time.
OK.
We probably want to keep the boot-testing part optional; there are several ways to implement it without impacting the main build-testing job. IMO, something to investigate.
Yes, the build, packaging, boot test, etc. should all be separate jobs that are independent, but could be chained together as needed.
Just kernel builds, with output that should make sense to kernel developers. IMO, this is the starting point for having some basic sanity testing for maintainers.
I now have several questions (and suggestions) on how to speed this up based on the configuration of the slaves, as well as questions about best practices for using the slaves: how workspaces/tools/scripts are (or aren't) shared between slaves, etc.
The first suggestion is to speed up the git clones/fetches. Even a shallow git clone (--depth=1) is taking > 3 minutes on the slaves. What I do on my home jenkins box is to have a local repo (periodically updated via cron), and then use the advanced options under jenkins git SCM config to point to it using the "Path of the reference repo to use during clone" option. That makes the git clones/fetches very fast since they're (almost) always from a local repo.
The difference between the slaves and your home box is that the slaves are ephemeral instances (terminated after a defined timeout) while your box is always up. We'll need to move to a persistent slave (stopped instead of terminated), opening the door to proper caching (local mirror).
Yes, that sounds better. I'll be glad to test.
We have such a setup for OpenEmbedded builds, and other tricks in our toolbox for Android builds.
Another suggestion is to have ccache installed on the slaves. Since this job is building much of the same code over and over (120+ defconfigs), ccache would dramatically speed this up, and probably make it more sensible to run all the builds sequentially on the same slave.
ccache is already installed, but due to the ephemeral nature of the current instances it isn't exploited. Once again, moving to persistent slaves will resolve the issue.
OK
Also, the EC2 instances aren't I/O optimized.
Yes, I've definitely noticed that.
Here's a quick comparison of my home build box (6 x i7, 16G RAM) and the Linaro build slaves. Build times in seconds:
                         mine   linaro
    allnoconfig            18      128
    exynos4_defconfig      32      201
    imx_v6_v7_defconfig    72      703
    omap2plus_defconfig   100        ?
That's roughly a 10x difference, and that's only comparing the build time, not the time it takes for the git clone, toolchain copy/unpack, etc.
However, since the builds run in parallel across the slaves, all 120+ defconfigs still finish quicker there than on my single box at home.
On some jobs, I create a tmpfs directory to build in; it dramatically reduces build time.
Do you have some examples of how to do that as a normal jenkins user?
I see there's a /run directory on the slaves mounted as a tmpfs but it's only 1.5G and only root accessible.
I'd be willing to run some experiments with a local tmpfs. I'm sure that will make a huge difference.
Kevin
+++ Kevin Hilman [2013-05-14 11:52 -0700]:
Nicolas Dechesne nicolas.dechesne@linaro.org writes:
On Thu, May 9, 2013 at 7:56 PM, Fathi Boudra fathi.boudra@linaro.org wrote:
Hi all,
I just wanted to forward this thread from LAKML to linaro-dev: http://article.gmane.org/gmane.linux.ports.tegra/10683
Seems there is a lot of desire for improved automated build coverage and automated reporting along with it.
I replied to it. We've already got such daily builds with boot testing: https://ci.linaro.org/jenkins/view/kernel-ci
I'm surprised that some people involved in Linaro and this thread didn't mention it.
I have to admit that what we do in terms of Kernel CI is still a bit fuzzy to me, even now that I am an insider.
I'll second Nicolas' comments.
As a kernel developer and upstream maintainer (and now working for Linaro), it's not at all clear (or documented anywhere I can find) how kernel CI is structured and implemented, nor how it could be useful for kernel developers and especially maintainers (who are *very* interested in early notification of build/boot failures for various defconfigs and even non-ARM arch builds).
Is there a way for us (Linaro folks) to see more of the Jenkins setup for these jobs (including the scripts)? There appear to be some useful add-ons in use. Read-only access to the detailed configuration of the Jenkins jobs would be very useful.
I've been mugged into improving our CI, with initial focus particularly on kernels, so I'm quite keen to hear what people actually want/need, and why they aren't using it currently (if they aren't). Speed of builds is one thing that's obviously in need of improvement, and that's coming very soon, but there are no doubt other issues.
I'll set up a session at Connect Dublin on this so we can have a brainstorm, and people can get a better idea of what we have already. But obviously feel free to continue this thread or poke me directly. There is certainly plenty of stuff we can make faster/clearer/more useful.
So far I've got as far as working out how the Jenkins part of it works and implementing faster Ubuntu kernel builds (by cross-building them instead of building them natively on Canonical PPAs). I'm still kicking this, but I hope it'll be running any day now. There are piles of other stuff that only Fathi and the LAVA people currently understand :-)
Glad to hear that some people do actually care and want to use this stuff...
Wookey
On 13 May 2013 11:12, Nicolas Dechesne nicolas.dechesne@linaro.org wrote:
Also, in the link above all 7 'active' jobs are failing: 3 of them have always failed, and 2 have been failing for 2 weeks. So it's not clear what that means.
If we have a look at one of the jobs that is "always failing":
https://ci.linaro.org/jenkins/job/linux-next/
We see that it is the Snowball build that has never built successfully, and thus Jenkins considers the whole job failed. That is of course a somewhat unfair representation, since the other builds are successful. But it also means that nobody has cared about Snowball in linux-next, not even enough to make the defconfig build.
It kind of underlines Russell's point: that CI is not useful if people are not watching the results and reacting to them.