On Tue, Feb 14, 2012 at 3:26 AM, Michael Hudson-Doyle
<michael.hudson(a)canonical.com> wrote:
> On Mon, 13 Feb 2012 22:27:25 +0100, Zygmunt Krynicki <zygmunt.krynicki(a)linaro.org> wrote:
>> Hi.
>>
>> Fast model support is getting better. It seems that with the excellent
>> patches by Peter Maydell we can now boot some kernels (I've only tried
>> one tree, additional trees welcome :-). I'm currently building a
>> from-scratch environment to ensure everything is accounted for and I
>> understand how pieces interact.
>>
>> Having said that I'd like to summarize how LAVA handles fast models:
>>
>> Technically the solution is not unlike QEMU which you are all familiar
>> with. The key differences are:
>> 1) Only NFS boot makes sense. There is no other sensible method that
>> I know of. We could also use an SD card (virtual, obviously) but it is
>> constrained to two gigabytes of data.
>
> As mentioned in the other thread, it would be good to at least let ARM
> know that removing this limit would help us (if we can figure out how to
> do this).
We may figure out how to do this by reading the LISA source code that
came with the model. That's a big task though (maybe grepping for mmc0
is low-hanging fruit; I did not check).
>> 2) The way we actually boot is complicated. There is no U-Boot; the
>> fast model interpreter actually starts an .axf file that can do anything
>> (some examples include running tests and benchmarks without ever
>> starting a kernel or anything like that). There is no easy way to
>> load the kernel and pass a command line. To work around that we're
>> using a special .axf file that uses fast model semihosting features to
>> load the kernel/initrd from the host filesystem, as well as to set up the
>> command line that will be passed to the booting kernel. This allows us
>> to freely configure NFS services and point our virtual kernel at the
>> appropriate IP addresses and pathnames.
>
> So I guess I'd like to understand how this works in a bit more detail.
> Can you brain dump on the topic for a few minutes? :) What is "fast
> model semihosting"?
It's a way to have "syscalls" that connect the "bare hardware" (be it
physical or emulated) to an external debugger or other monitor. You
can find a short introduction in this blog post [1]. For us it means we get
to write bare-metal assembly that does the equivalent of open(),
read(), write() and close(). The files being opened are on the
machine that runs the fast model. You can also print debugging
statements straight to the console this way (we could probably write a
semihosting console driver if no such code exists yet) to get all of
the output to the same tty that runs the model (model_shell). A more
detailed explanation of this topic can be found in [2].
Fast model semihosting simply refers to using semihosting facilities
in a fast model interpreter.
[1]: http://blogs.arm.com/software-enablement/418-semihosting-a-life-saver-durin…
[2]: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471c/CHDJHH…
>
>> 3) After the machine starts up we immediately open a TCP/IP connection
>> to a local TCP socket. We know which port is being used so we can
>> easily allocate ports up front. This port is now the traditional LAVA
>> serial console.
>
> I guess there is a risk here that we will miss early boot messages?
> This might not matter much.
There are other options but currently this seems to work quite okay.
> Once we've found someone at ARM we can officially complain to about fast
> models, an option to have serial comms happen on the process
> stdin/stdout would be nice.
I think the reason they don't happen on the console is that by default
we get four telnet ports to connect to (definitely more than one), so
the logical question they'll ask is "which port should we redirect?".
Maybe there is an option buried somewhere to make that happen, but so
far I have not found it.
>
>> The rest of this looks like QEMU:
>> - you can access the filesystem easily (to gather results)
>> - we can use QEMU to chroot into the NFS root to install additional
>> software (emulation via a fast model is extremely slow)
>
> In my testing, the pip install bzr+lp:lava-test step did not really work
> under QEMU. Maybe it does now, or maybe we can install a tarball or
> something.
I installed lava-test using a release tarball. That has worked pretty well.
In general I think that:
1) we need to reconsider how to do testing on very slow machines
2) we need to decide what can be invoked on the host (parts of the
installation, unless they want to build stuff, plus result parsing and tracking)
3) we need to decide what has to be invoked on the target (test code,
system probes)
It's important to make the intent very clear. If we define that
cmd_install installs something while in the "master image" on the "target",
then we should not break that. I think that it would be sensible to
add a "host_chroot" mode that applies nicely to QEMU and fast models.
Very slow things that don't care about the architecture could be
invoked in that mode without sacrificing performance.
Thanks
ZK
Hi
I took the liberty of updating the launchpad project page for
lava-project [1]. I plan to use that whenever people come asking about
bugs/features/documentation. You may want to review the description.
/me really wishes for rich text editing on launchpad projects,
especially if it could just display the README file, eh
[1]: https://launchpad.net/lava-project/
Hi, I wanted to sync up on where things are with this; as I understand it,
there's still some confusion about how we should get the available kernel
and/or images for testing fast models.
First off, it seems there is no way to just take a kernel .axf as we
thought, because the boot args are wrapped into it as well. Is there really
no way to inject that stuff after build time?
Is it worth revisiting whether we should have proper hwpacks for fast
models? I know there's the 2G max SD size issue with that, but if that's
something ARM can fix, or if we have another way around it, would that help?
Finally, I feel like we've chased a pretty messy route to get fast models
supported here, one which will ultimately break completely when we try to get
Android running as well. Please correct me if I'm wrong here, but it seems
the "recommended" approach is currently:
- LAVA takes as input: the git tree for the kernel we want to build, the
defconfig to use, and a rootfs tarball
- LAVA builds the kernel (LAVA doesn't do this currently, which complicates
things for us quite a bit; it would be better if this step could be done
externally, like in Jenkins where we do the kernel CI builds)
- LAVA pushes the built .axf to another system where we have the fast models
software installed, and provisions the rootfs for booting over NFS
- LAVA boots the system under fast models on this other machine, runs tests,
gathers results, etc.
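To make that flow concrete, here is a purely illustrative sketch of the kind
of job description it implies; every field name and URL below is invented
for illustration and is not LAVA's actual job schema:

    {
      "device_type": "fast-model",
      "actions": [
        {"step": "build_kernel",
         "git_tree": "git://example.org/linux.git",
         "defconfig": "vexpress_defconfig"},
        {"step": "provision_nfsroot",
         "rootfs": "http://example.org/rootfs.tar.gz"},
        {"step": "boot_and_test",
         "tests": ["stream"]}
      ]
    }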
Is that pretty close, Zygmunt? Is there something more straightforward and
less fragile that we can do here?
Thanks,
Paul Larson
Hi.
Fast model support is getting better. It seems that with the excellent
patches by Peter Maydell we can now boot some kernels (I've only tried
one tree, additional trees welcome :-). I'm currently building a
from-scratch environment to ensure everything is accounted for and I
understand how pieces interact.
Having said that I'd like to summarize how LAVA handles fast models:
Technically the solution is not unlike QEMU which you are all familiar
with. The key differences are:
1) Only NFS boot makes sense. There is no other sensible method that
I know of. We could also use an SD card (virtual, obviously) but it is
constrained to two gigabytes of data.
2) The way we actually boot is complicated. There is no U-Boot; the
fast model interpreter actually starts an .axf file that can do anything
(some examples include running tests and benchmarks without ever
starting a kernel or anything like that). There is no easy way to
load the kernel and pass a command line. To work around that we're
using a special .axf file that uses fast model semihosting features to
load the kernel/initrd from the host filesystem, as well as to set up the
command line that will be passed to the booting kernel. This allows us
to freely configure NFS services and point our virtual kernel at the
appropriate IP addresses and pathnames.
3) After the machine starts up we immediately open a TCP/IP connection
to a local TCP socket. We know which port is being used so we can
easily allocate ports up front. This port is now the traditional LAVA
serial console.
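To illustrate point 3, a minimal sketch of attaching to such a port from the
host side; the port number is an example, and the real connection handling
(telnet option negotiation, logging) is more involved:

    import socket

    def open_serial_console(port, host='localhost'):
        # The model exposes its UART as a telnet server on a TCP port we
        # allocated up front, so the "serial console" is just a socket.
        conn = socket.create_connection((host, port))
        return conn.makefile('r')

    # Example: stream boot messages as if reading a serial log.
    for line in open_serial_console(5000):
        print(line.rstrip())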
The rest of this looks like QEMU:
- you can access the filesystem easily (to gather results)
- we can use QEMU to chroot into the NFS root to install additional
software (emulation via a fast model is extremely slow)
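The QEMU-chroot trick in that last bullet boils down to something like this
sketch (using qemu-arm-static from its usual Debian/Ubuntu location; the
real dispatcher code may differ):

    import subprocess

    def install_in_nfsroot(nfsroot, packages):
        # Copy the static user-mode QEMU binary into the rootfs so that
        # binfmt_misc can run the ARM binaries inside it on the x86 host.
        subprocess.check_call(
            ['cp', '/usr/bin/qemu-arm-static', nfsroot + '/usr/bin/'])
        # A plain chroot then works: each ARM executable runs under QEMU
        # user emulation, far faster than the full-system model.
        subprocess.check_call(
            ['chroot', nfsroot, 'apt-get', 'install', '-y'] + packages)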
The rest of LAVA is now related to managing resources and misc tasks:
- running model_shell instances and their 'serial lines'
- unpacking and blessing NFS root directories (blessing = patching them
enough to make the model boot and work for us)
- exporting nfsroot directories to the local network (currently
hardcoded as 192.168.0.0/24; see the sketch after this list)
- managing a local library of models (building, enumerating, adding/removing)
- managing the root privileges required to do those things.
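For the exporting step, the idea is roughly the following sketch; the export
options are typical ones and the subnet is the hardcoded one mentioned above:

    import subprocess

    def export_nfsroot(path):
        # Publish the unpacked root filesystem to the model's network and
        # ask the NFS server to re-read its export table.
        entry = '%s 192.168.0.0/24(rw,no_root_squash,no_subtree_check)\n' % path
        with open('/etc/exports', 'a') as exports:
            exports.write(entry)
        subprocess.check_call(['exportfs', '-ra'])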
I'm still not done but I hope to push this to one of our servers this
week. Ideally I'd like to try it on a virtual machine to see how that
setup copes with the code. In addition I want to ensure I don't put
any magic there, and that all of the installation is managed by
automation tools.
Thanks
ZK
PS: After spending the weekend looking at my scripts crashing I took a
break. On Monday (today) I grabbed Peter to troubleshoot the issue.
The issue was gone; I have no idea what changed (perhaps my long-lived
terminal had accumulated some cruft).
Following on from the discussion earlier this week, I wanted to kick off a
discussion about how to implement one part we discussed: making health
checks something that LAVA natively understands and knows how to do,
rather than just a lump of jobs scheduled daily from cron. Here are my
thoughts, and I'll let you decide how much of it is crack.
1. extend the scheduler's device model to add a text blob for storing a
health check job
   - the downside is that we would have to add one of these for every single board
   - maybe it makes sense to do this per device type instead and have the
scheduler just start it on the proper board (see the sketch below)?
   - we should make sure that these jobs use a locally downloadable,
complete image
2. at the beginning of every day, if the board is NOT in the offline or
going-offline state, insert this job into the queue for the board
   - is there a good way to do this automatically with some internal
mechanism, not cron from the outside?
3. Link the result of this job to the health check stuff already in progress
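A rough sketch of what item 1, in its per-device-type form, could look like
in the scheduler's Django models; the names are illustrative, not the actual
LAVA schema:

    from django.db import models

    class DeviceType(models.Model):
        name = models.SlugField(primary_key=True)
        # Hypothetical field: one health check job definition shared by
        # all boards of this type; NULL means "no health check defined".
        health_check_job = models.TextField(null=True, blank=True)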
thoughts? suggestions?
Hi, I'm looking at the kernel CI results and origen seems to be running
tests pretty frequently, but always failing to boot. Sangwook, could you
take a look at the args that are getting passed and make sure there's not
something there that would explain it hanging on boot? Or is this perhaps
something in the kernel config that is being used?
Thanks,
Paul Larson
Do you find yourself asking that? If the answer is yes, check here:
http://stats.pingdom.com/o15ezbsnv8tb
Following on from the discussion earlier in the week, I checked out this
Pingdom thing that Michael mentioned. It doesn't provide data for system
monitoring, but it will at least give us external availability checking.
Thanks,
Paul Larson
Hello everyone.
I just wanted to give you a heads-up on my plans for today:
1) Review merge requests for lava-{server,dashboard,dispatcher,test}
2) Merge things that seem ready to land
3) Release everything on pypi
4) Build lava-2012-02-07.pybundle
5) Deploy that to production
I also want to figure out how to release Andy's android benchmark
extension. Andy, could you ensure that:
*) you have merged the patches I +1'd on gerrit
*) you have registered a project on pypi (ping me if you need help)
*) you have released a snapshot (setup.py sdist upload) -- ensure that
your MANIFEST.in is correct (look at the source tarball and make sure it
has all the files you wanted).
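For reference, a minimal MANIFEST.in along those lines might look like the
following; the paths are hypothetical and need adjusting to the actual
package layout:

    include README
    recursive-include doc *.rst
    recursive-include lava_android_benchmark *.html *.css *.js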
If you do that I'll just add your extension to the manifest in
lava-deployment-tool and it will be a part of the release today.
Thanks
ZK
I noticed that v.l.o was being obnoxiously slow again and top revealed that
uwsgi had gone on a memory-eating binge again:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3799 lava-pro  20   0 33.3g  24g   28 D    1 82.5   8:20.37 uwsgi
I touched /srv/lava/instances/production/etc/lava-server/uwsgi.reload and after
a few minutes it had cleared up. It's clear the uwsgi changes that were
made previously weren't helping, though. Any ideas?
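One idea, untested here: uWSGI can recycle bloated workers on its own.
Something along these lines in the instance's uwsgi ini config (the values
are guesses):

    ; recycle a worker after it has served this many requests
    max-requests = 500
    ; restart any worker whose resident memory exceeds this many MB
    reload-on-rss = 512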
-Paul Larson
Hi.
I've created a new extension for LAVA called lava-raven. This
extension wraps Raven (which is a client for Sentry) in an easy-to-install
package. The code is up on launchpad [1] and pypi [2]; pypi
also has short documentation on how to set up sentry+raven in a LAVA
instance.
Along with raven I made a few small changes to linaro-django-xmlrpc
(merge request [3]) to support raven better (those changes don't
depend on raven though) and to lava-server (merge request [4]). The latter is
mostly a small helper base class for creating headless extensions. The
MP is quite long because I took the liberty of doing documentation
cleanups at the same time. Please have a look at those so that they
can land quickly.
Lastly, with the help of the Canonical IS team, I've set up a new DNS entry
for our Sentry instance at [5]. Sadly Sentry is not easy to put under a URL
prefix so I had to resort to a separate hostname. Sentry is built
on top of Django so it has a local user database. There is no OpenID
support and currently only I have an account there. Ping me to get
another account if you need it. As we use Sentry we'll see if we can
keep this view public (I strongly doubt it, as it has all the request
data, including (probably) security tokens). Sentry currently runs as a
new system user (sentry) on the validation control server. It's currently
using the SQLite backend so we may want to create a PostgreSQL
user/database for it instead if it gets slow.
In any case, as soon as those bits land we'll have live monitoring of
various parts of LAVA. All of the things that are integrated with LAVA
Server will automatically benefit (that includes the scheduler,
_except_ for the scheduler daemon). As we progress towards using
celery, all of our stack will be transparently using Raven. For those
bits that won't quickly become celery tasks, I think it would be worth
patching in some optional raven support (I'm looking at you,
lava-dispatcher).
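As a sketch of what optional raven support in the dispatcher could look like
(the DSN below is a placeholder; Client and captureException are raven's
standard client API):

    try:
        from raven import Client
    except ImportError:
        Client = None  # raven not installed: reporting becomes a no-op

    # Placeholder DSN; the real one comes from the Sentry project settings.
    _sentry = Client('http://key:secret@sentry.validation.linaro.org/1') if Client else None

    def report_exception():
        # Send the currently-handled exception to Sentry when raven is
        # available, and stay silent otherwise.
        if _sentry is not None:
            _sentry.captureException()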
Best regards
ZK
[1] https://launchpad.net/lava-raven
[2] http://pypi.python.org/pypi/lava-raven/0.1
[3] https://code.launchpad.net/~zkrynicki/linaro-django-xmlrpc/better-sentry-su…
[4] https://code.launchpad.net/~linaro-validation/lava-server/production/+merge…
[5] http://sentry.validation.linaro.org/