Hi.
Fast model support is getting better. It seems that with the excellent
patches by Peter Maydell we can now boot some kernels (I've only tried
one tree, additional trees welcome :-). I'm currently building a
from-scratch environment to ensure everything is accounted for and I
understand how pieces interact.
Having said that, I'd like to summarize how LAVA handles fast models.
Technically the solution is not unlike QEMU, which you are all
familiar with. The key differences are:
1) Only NFS boot makes sense; there is no other sensible method that
I know of. We could also use an SD card (virtual, obviously) but it is
constrained to two gigabytes of data.
2) The way we actually boot is complicated. There is no u-boot; the
fast model interpreter actually starts an .axf file that can do
anything (some examples include running tests and benchmarks without
ever booting a kernel or anything like that). There is no way to
easily load the kernel and pass a command line. To work around that
we're using a special .axf file that uses fast model semihosting
features to load the kernel/initrd from the host filesystem, as well
as to set up the command line that is passed to the booting kernel.
This allows us to freely configure NFS services and point our virtual
kernel at the appropriate IP addresses and pathnames.
3) After the machine starts up we immediately open a TCP/IP connection
to a local TCP socket. We know which port is being used, so we can
easily allocate ports up front. This port then becomes the traditional
LAVA serial console (see the sketch after this list).
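To make 2) and 3) concrete, here is a minimal sketch of what the
dispatcher side could look like. The model_shell arguments and the
semihosting parameter names are assumptions (they vary between Fast
Models versions), and the paths and port number are made up:

    # Sketch only: model_shell arguments and semihosting parameter
    # names are assumptions and differ between model versions.
    import socket
    import subprocess
    import sys

    CONSOLE_PORT = 5003  # allocated up front, one per model instance

    model = subprocess.Popen([
        "model_shell",
        "model.so",  # the compiled fast model
        "boot.axf",  # the special semihosting boot wrapper
        "-C", "motherboard.smsc_91c111.enabled=1",
        "-C", ("cluster.cpu0.semihosting-cmd_line="
               "--kernel /srv/lava/vmlinuz --initrd /srv/lava/initrd "
               "-- console=ttyAMA0 root=/dev/nfs "
               "nfsroot=192.168.0.1:/srv/lava/nfsroot ip=dhcp"),
    ])

    # The model exposes its serial line on a known TCP port; from here
    # on this socket is the traditional LAVA serial console.
    console = socket.create_connection(("127.0.0.1", CONSOLE_PORT))
    for line in console.makefile():
        sys.stdout.write(line)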
The rest of this looks like QEMU:
- you can access the filesystem easily (to gather results)
- we can use QEMU to chroot into the NFS root to install additional
software, as sketched below (emulation via a fast model is extremely slow)
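A sketch of that chroot trick (assumes an x86 host with
qemu-arm-static and binfmt_misc support installed; the paths and the
package name are illustrative):

    # Sketch: install software into the ARM NFS root from the host.
    # Needs to run as root.
    import shutil
    import subprocess

    rootfs = "/srv/lava/nfsroot"

    # binfmt_misc runs ARM binaries inside the chroot through this
    # statically linked qemu interpreter.
    shutil.copy("/usr/bin/qemu-arm-static", rootfs + "/usr/bin/")
    subprocess.check_call(
        ["chroot", rootfs, "apt-get", "install", "-y", "lava-test"])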
The rest of LAVA's job is now managing resources and misc tasks:
- running model_shell instances and their 'serial lines'
- unpacking and blessing nfs root directories (blessing = patching
them just enough to make the model boot and work for us)
- exporting nfsroot directories to the local network (currently
hardcoded as 192.168.0.0/24); a sketch of this follows after the list
- managing a local library of models (building, enumerating,
adding/removing)
- managing root access for the operations that require it.
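For the exporting step the idea is roughly this (a sketch; the export
options are illustrative and a real version would avoid appending
duplicate entries):

    # Sketch: export an unpacked nfsroot to the hardcoded subnet.
    import subprocess

    EXPORT = ("/srv/lava/nfsroot "
              "192.168.0.0/24(rw,no_root_squash,async,no_subtree_check)\n")

    with open("/etc/exports", "a") as exports:
        exports.write(EXPORT)
    subprocess.check_call(["exportfs", "-ra"])  # reload the export table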
I'm still not done but I hope to push this to one of our servers this
week. Ideally I'd like to try it on a virtual machine to see how that
setup copes with the code. In addition I want to ensure I don't put
any magic there, and that all of the installation is managed by
automation tools.
Thanks
ZK
PS: After spending the weekend watching my scripts crash I took a
break. On Monday (today) I grabbed Peter to troubleshoot the issue,
but the issue was gone. I have no idea what changed (perhaps my
long-lived terminal had accumulated some cruft).
Following on from the discussion earlier this week, I wanted to kick
off a discussion about how to implement one part we discussed: making
health checks something that LAVA natively understands and knows how
to do, rather than just a lump of jobs scheduled daily from cron. Here
are my thoughts, and I'll let you decide how much of it is crack.
1. extend the scheduler's device model to add a text blob for storing
a health check job (a rough sketch follows after this list)
- downside is that we would have to add one of these for every single
board
- maybe it makes sense to do this per device type instead and have the
scheduler just start it on the proper board?
- we should make sure that these jobs use a locally downloadable,
complete image
2. at the beginning of every day, if the board is NOT in the offline
or going-offline state, insert this job into the queue for the board
- is there a good way to do this automatically with some internal
mechanism, not cron from the outside?
3. Link the result of this job to the health check stuff already in progress
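For (1), a rough sketch of what the model change might look like if we
hang it off the device type (field and model names are illustrative,
not actual lava-scheduler code):

    # Illustrative only - not actual lava-scheduler code.
    from django.db import models

    class DeviceType(models.Model):
        name = models.SlugField(primary_key=True)
        # Complete health check job definition using a locally
        # downloadable image; blank means "no health check for this
        # type".  Once a day the scheduler would enqueue this for
        # every board of the type that is not offline/going offline.
        health_check_job = models.TextField(null=False, blank=True)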
thoughts? suggestions?
Hi, I'm looking at the kernel CI results and origen seems to be running
tests pretty frequently, but always failing to boot. Sangwook, could you
take a look at the args that are getting passed and make sure there's
nothing there that would explain it hanging on boot? Or is this perhaps
something in the kernel config that is being used?
Thanks,
Paul Larson
Do you find yourself asking that? If the answer is yes, check here:
http://stats.pingdom.com/o15ezbsnv8tb
Following on from the discussion earlier in the week, I checked out this
Pingdom thing that Michael mentioned. It doesn't provide data for system
monitoring, but it will at least give us external availability checking.
Thanks,
Paul Larson
Hello everyone.
I just wanted to give you a heads-up on my plans for today:
1) Review merge requests for lava-{server,dashboard,dispatcher,test}
2) Merge things that seem ready to land
3) Release everything on pypi
4) Build lava-2012-02-07.pybundle
5) Deploy that to production
I also want to figure out how to release Andy's android benchmark
extension. Andy, could you ensure that:
*) You have merged the patches I +1'd on Gerrit
*) You have registered a project on pypi (ping me if you need help)
*) You have released a snapshot (setup.py sdist upload) -- ensure that
your MANIFEST.in is correct (look at the source tarball and make sure
it has all the files you wanted; an illustrative MANIFEST.in follows
below).
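For reference, a MANIFEST.in along these lines usually does the trick
(the directory names are illustrative, adjust to your tree):

    include README
    recursive-include android_benchmark_views_app/templates *.html
    recursive-include android_benchmark_views_app/static *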
If you do that I'll just add your extension to the manifest in
lava-deployment-tool and it will be a part of the release today.
Thanks
ZK
I noticed that v.l.o was being obnoxiously slow again and top revealed
that uwsgi had gone on a memory-eating binge again:

  PID USER     PR NI  VIRT RES SHR S %CPU %MEM   TIME+ COMMAND
 3799 lava-pro 20  0 33.3g 24g  28 D    1 82.5 8:20.37 uwsgi

I touched /srv/lava/instances/production/etc/lava-server/uwsgi.reload
and after a few minutes it had cleared up. It's clear though that the
uwsgi changes that were made previously weren't helping. Any ideas?
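One thing we could try is letting uwsgi police its own memory. A
sketch of extra options for the instance ini (the option names are
from the uwsgi docs, the values are pulled out of thin air):

    # restart a worker once it exceeds ~1GB of resident memory
    reload-on-rss = 1024
    # recycle workers after a fixed number of requests as a safety net
    max-requests = 1000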
-Paul Larson
Hi.
I've created a new extension for LAVA called lava-raven. This
extension wraps Raven (which is a client for Sentry) in an
easy-to-install package. The code is up on launchpad [1] and pypi [2];
pypi also has short documentation on how to set up sentry+raven in a
LAVA instance.
Along with raven I made a few small changes to linaro-django-xmlrpc
(merge request [3]) to better support raven (those changes don't
depend on raven though) and to lava-server (merge request [4]). The
latter is mostly a small helper base class for creating headless
extensions. The MP is quite long because I took the liberty of doing
documentation cleanups at the same time. Please have a look at those
so that they can land quickly.
Lastly, with the help of the Canonical IS team I've set up a new DNS
entry for our Sentry instance at [5]. Sadly sentry is not easy to put
under a URL prefix so I had to resort to another domain name. Sentry
is built on top of django so it has a local user database. There is no
OpenID support and currently only I have an account there; ping me if
you need one. As we use sentry we'll see if we can keep this view
public (I strongly doubt it, as it has all the request data, including
(probably) security tokens). Sentry currently runs as a new system
user (sentry) on the validation control server. It's currently using
the sqlite backend, so we may want to create a PostgreSQL
user/database for it instead if it gets slow.
In any case, as soon as those bits land we'll have live monitoring of
various parts of LAVA. All of the things that are integrated with LAVA
Server will automatically benefit (that includes the scheduler,
_except_ for the scheduler daemon). As we progress towards using
celery, all of our stack will transparently use Raven. For those bits
that won't quickly become celery tasks I think it would be worth
patching in some optional raven support (I'm looking at you,
lava-dispatcher); a sketch follows below.
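Something along these lines (the DSN is a placeholder and the import
guard keeps raven entirely optional):

    # Sketch: optional raven reporting for a non-Django process such
    # as the dispatcher.
    try:
        from raven import Client
    except ImportError:
        Client = None

    _client = None
    if Client is not None:
        # Placeholder DSN; the real one comes from the project
        # settings on our Sentry instance.
        _client = Client("http://public:secret@sentry.validation.linaro.org/1")

    def run_job(job):
        try:
            job.run()
        except Exception:
            if _client is not None:
                _client.captureException()  # ship traceback to Sentry
            raise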
Best regards
ZK
[1] https://launchpad.net/lava-raven
[2] http://pypi.python.org/pypi/lava-raven/0.1
[3] https://code.launchpad.net/~zkrynicki/linaro-django-xmlrpc/better-sentry-su…
[4] https://code.launchpad.net/~linaro-validation/lava-server/production/+merge…
[5] http://sentry.validation.linaro.org/
On 02/02/2012 05:30 PM, Andy Doan wrote:
> Starting a new thread to deal with this project.
>
> We now have this project hosted at:
>
>
> <http://android.git.linaro.org/gitweb?p=lava-server/android_benchmark_views.…>
>
> I've made a number of improvements since this was initially sent out
> including:
Hi Andy.
I'm reading the code now and I have some comments. Essentially, before
this lands you _have_ to add one change; then we can follow up with
iterative improvements.
You _have_ to apply south migrations to your application. Without that
we will not be able to make reasonable upgrades. Adding migrations
later is not an option, as there is a bit that is very hard to
automate. Fortunately this is pretty easy:
1) Add a dependency on south (install_requires in setup.py)
2) Run lava-server manage ... schemamigration
android_benchmark_views_app --initial (--initial is for this first
migration, --auto is for subsequent ones)
3) Inspect the code briefly and add it to version control
4) Push that up for review.
As for the rest:
*) helpers.py: don't use double leading underscores.
*) Try keeping the code that relates to your model in the model class;
also add a few one-liners, if you can, to document each public method.
*) extension.py: in def version(), as the second argument to
versiontools.format_version() please pass your main package object
(android_benchmark_views_app). This will make it follow git version
hashes.
*) models.py, BenchmarkRun: each ForeignKey should have a
related_name. This name is used when accessing the model from the
"remote" end, letting you navigate back using that name. It is
actually required in some cases (when there are clashes, but that's
beyond the scope of this conversation). See the combined sketch after
this list.
*) models.py, BenchmarkReport: don't use comments like that. First of
all, in django it's very, very bad to use nullable text/char fields;
instead always use null=False, blank=True - that makes writing code
saner later. As for comments: instead of using a simple text field,
please consider using the django comment system. It _is_ slightly more
complicated than a free-for-all text area (like launchpad's
whiteboard) but at the same time you get much more in return. You can
read about it here:
https://docs.djangoproject.com/en/dev/ref/contrib/comments/
*) models.py, both classes: add a get_absolute_url() method based on
the django permalink system. It's trivial to use, see this document:
https://docs.djangoproject.com/en/dev/ref/models/instances/#django.db.model…
*) You ship jquery.flot. While that's okay for now, I'd rather see it
managed by a dedicated django app that wraps it in one place. For now
this can stay as-is.
*) Your templates extend dashboard templates. Please don't do that
(those templates are not a stable/public interface). Instead extend
the lava-server templates, which work just as well but are stable.
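To pull the models.py comments together, an illustrative sketch
(made-up field names, not your actual models):

    from django.db import models

    class BenchmarkReport(models.Model):
        # null=False, blank=True: the column is never NULL and "" is
        # the empty value, which keeps the code saner.
        notes = models.TextField(null=False, blank=True)

        @models.permalink
        def get_absolute_url(self):
            return ("benchmark_report_detail", [str(self.pk)])

    class BenchmarkRun(models.Model):
        # related_name lets you navigate back from the "remote" end:
        # report.runs.all()
        report = models.ForeignKey(BenchmarkReport, related_name="runs")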
That's a very quick review. I have one more comment.
EXCELLENT WORK :-)
Could you please join the validation team? :-)
Thanks
ZK
>
> * general improvements to the flot library for rendering things
> * zoom support for graphs
> * DB queries
>
> The DB queries are the main thing Zygmunt complained about in the
> initial code. I now hit the DB with one query to get all the
> measurements I need and let it perform the StdDev/Avg[1]
> calculations. The performance (at least on my laptop) seems pretty
> good now. The DB query improvement also allowed me to get rid of
> some ugly code.
>
> I'm hoping going forward I can start doing submissions as code reviews
> in Gerrit and have someone on this team look at each change.
>
> Let me know what you think - or we can always sit down and chat about it
> directly next week.
>
> [1]: NOTE: StdDev/Avg functions aren't supported by Sqlite3, so this
> feature only works with postgres/mysql.
>
> -andy
--
Zygmunt Krynicki
Linaro Validation Team
Hi folks.
I've noticed Bootstrap [1] and it seems quite interesting as a UI
toolkit. It seems more featureful and consistent than jQuery UI.
Perhaps this is something to look at when we need to build or rebuild
UI elements.
Best regards
ZK
[1]: http://twitter.github.com/bootstrap/