[Linaro-validation] Second update on fast models

Michael Hudson-Doyle michael.hudson at canonical.com
Wed Feb 15 02:29:24 UTC 2012


On Wed, 15 Feb 2012 01:44:44 +0100, Zygmunt Krynicki <zygmunt.krynicki at linaro.org> wrote:
> On Wed, Feb 15, 2012 at 12:31 AM, Michael Hudson-Doyle
> <michael.hudson at canonical.com> wrote:
> > On Tue, 14 Feb 2012 20:24:51 +0100, Zygmunt Krynicki <zygmunt.krynicki at linaro.org> wrote:
> >> On Tue, Feb 14, 2012 at 3:26 AM, Michael Hudson-Doyle
> >> <michael.hudson at canonical.com> wrote:
> >> > On Mon, 13 Feb 2012 22:27:25 +0100, Zygmunt Krynicki <zygmunt.krynicki at linaro.org> wrote:
> >> >> Hi.
> >> >>
> >> >> Fast model support is getting better. It seems that with the excellent
> >> >> patches by Peter Maydell we can now boot some kernels (I've only tried
> >> >> one tree; additional trees welcome :-). I'm currently building a
> >> >> from-scratch environment to ensure everything is accounted for and I
> >> >> understand how the pieces interact.
> >> >>
> >> >> Having said that I'd like to summarize how LAVA handles fast models:
> >> >>
> >> >> Technically the solution is not unlike the QEMU support you are all
> >> >> familiar with. The key differences are:
> >> >> 1) Only NFS boot makes sense. There is no other sensible method that
> >> >> I know of. We could also use an SD card (virtual, obviously) but it
> >> >> is constrained to two gigabytes of data.
> >> >
> >> > As mentioned in the other thread, it would be good to at least let ARM
> >> > know that removing this limit would help us (if we can figure out how to
> >> > do this).
> >>
> >> We may figure out how to do this by reading the LISA source code that
> >> came with the model. That's a big task though (maybe grepping for mmc0
> >> is low-hanging fruit; I did not check).
> >
> > That's not what I was suggesting!  We should try to persuade ARM to do
> > that.  It may be that they can't do it in a reasonable timeframe, or
> > maybe it's simply a need that has not been explained to them yet and
> > is something they could do in a week.
> >
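
(For completeness: the virtual SD card route looks roughly like the
sketch below. The mmc parameter name is from memory, so treat it as
illustrative rather than gospel.)

  import subprocess

  # Create and format a raw image within the 2G limit:
  subprocess.check_call(["qemu-img", "create", "-f", "raw", "sd.img", "2G"])
  subprocess.check_call(["mkfs.ext3", "-F", "sd.img"])

  # The image would then be handed to the model with something like
  # "-C motherboard.mmc.p_mmc_file=sd.img" on the model_shell command line.
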
> >> >> 2) The way we actually boot is complicated. There is no u-boot; the
> >> >> fast model interpreter starts an .axf file that can do anything
> >> >> (some examples include running tests and benchmarks without ever
> >> >> booting a kernel or anything like that). There is no easy way to
> >> >> load a kernel and pass it a command line. To work around that we're
> >> >> using a special .axf file that uses fast model semihosting features
> >> >> to load the kernel/initrd from the host filesystem as well as to set
> >> >> up the command line that will be passed to the booting kernel. This
> >> >> allows us to freely configure NFS services and point our virtual
> >> >> kernel at appropriate IP addresses and pathnames.
> >> >
> >> > So I guess I'd like to understand how this works in a bit more detail.
> >> > Can you brain dump on the topic for a few minutes? :) What is "fast
> >> > model semihosting"?
> >>
> >> It's a way to have "syscalls" that connect the "bare hardware" (be it
> >> physical or emulated) to an external debugger or other monitor. You
> >> can find a short introduction in this blog post [1]. For us it means
> >> we get to write bare-metal assembly that does the equivalent of
> >> open(), read(), write() and close(). The files being opened live on
> >> the machine that runs the fast model. You can also print debugging
> >> statements straight to the console this way (we could probably write
> >> a semihosting console driver if no such code exists yet) to get all
> >> of the output on the same tty that runs the model (model_shell). A
> >> more detailed explanation of this topic can be found in [2].
> >>
> >> Fast model semihosting simply refers to using semihosting facilities
> >> in a fast model interpreter.
> >>
> >> [1]: http://blogs.arm.com/software-enablement/418-semihosting-a-life-saver-during-soc-and-board-bring-up/
> >> [2]: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471c/CHDJHHDI.html
> >
> > Thanks for that.  Sounds a tiny little bit like it's using a
> > JTAG-for-fast-models type facility?
> 
> Only in some ways; it's always device-driven. You cannot use it to
> program memory or flip register values.

Right.
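
To make this concrete, my understanding is that the host side of the
boot ends up looking roughly like the sketch below. The -C parameter
names and the model library name are from memory and may not match a
given installation, so treat them as illustrative:

  import subprocess

  def launch_model(axf, kernel, initrd, nfs_server, nfs_path):
      # Kernel command line pointing the virtual board at the NFS root:
      cmdline = ("console=ttyAMA0 root=/dev/nfs rw "
                 "nfsroot=%s:%s ip=dhcp" % (nfs_server, nfs_path))
      return subprocess.Popen([
          "model_shell",
          "RTSM_VE_Cortex-A15x1.so",  # model library name assumed
          axf,                        # the semihosting boot wrapper
          "-C", "cluster.cpu0.semihosting-enable=1",
          # The wrapper reads this via semihosting and uses it to load
          # the kernel/initrd from the host filesystem:
          "-C", "cluster.cpu0.semihosting-cmd_line="
                "--kernel %s --initrd %s -- %s" % (kernel, initrd, cmdline),
      ])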

> PS: It's also a security risk as it includes a funny SYS_SYSTEM call;
> yay, I get to run commands as whoever is running the model ;-) Model
> chroots, anyone?

Argh.  Spin up a vm for each test run? :)
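
For anyone who hasn't read [2] yet: the interface is just a small table
of operations that the model services on behalf of the guest. The
numbers below are from the ARM documentation:

  # The guest traps into the model (svc 0x123456 in ARM state) with the
  # operation number in r0 and a pointer to the arguments in r1.
  SEMIHOSTING_OPS = {
      0x01: "SYS_OPEN",         # open a file on the host
      0x02: "SYS_CLOSE",
      0x05: "SYS_WRITE",
      0x06: "SYS_READ",
      0x12: "SYS_SYSTEM",       # run a host command (the scary one)
      0x15: "SYS_GET_CMDLINE",  # how the boot wrapper gets its arguments
  }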

> >> >
> >> >> 3) After the machine starts up we immediately open a TCP/IP
> >> >> connection to a local TCP socket. We know which ports are being used
> >> >> so we can easily allocate them up front. This connection now serves
> >> >> as the traditional LAVA serial console.
> >> >
> >> > I guess there is a risk here that we will miss early boot messages?
> >> > This might not matter much.
> >>
> >> There are other options but currently this seems to work quite okay.
> >
> > Fair enough.
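
One thing we could do about the early-boot-message risk is connect as
soon as the model process is up and retry until the port accepts
connections. A minimal sketch of such a console reader (telnet
negotiation bytes are ignored here for brevity):

  import socket
  import time

  def read_console(port, logpath, host="localhost", timeout=30):
      # Retry until the model starts listening; connecting as early as
      # possible narrows the window in which boot output can be lost.
      deadline = time.time() + timeout
      while True:
          try:
              conn = socket.create_connection((host, port))
              break
          except socket.error:
              if time.time() >= deadline:
                  raise
              time.sleep(0.1)
      log = open(logpath, "wb")
      while True:
          data = conn.recv(4096)
          if not data:
              break
          log.write(data)
      log.close()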
> >
> >> > Once we've found someone at ARM we can officially complain to about
> >> > fast models, an option to have serial comms happen on the process
> >> > stdin/stdout would be nice.
> >>
> >> I think the reason they don't happen on the console is that by default
> >> we get four telnet ports to connect to (definitely more than one), so
> >> the logical question they'll ask is "which port should we redirect?".
> >> Maybe there is an option buried somewhere to make that happen, but so
> >> far I have not found it.
> >
> > Again, I'm not saying that this is something we should do...
> >
> >> >> The rest of this looks like QEMU:
> >> >> - you can access the filesystem easily (to gather results)
> >> >> - we can use QEMU to chroot into the NFS root to install additional
> >> >> software (emulation via a fast model is extremely slow)
> >> >
> >> > In my testing, the pip install bzr+lp:lava-test step did not really work
> >> > under QEMU.  Maybe it does now, or maybe we can install a tarball or
> >> > something.
> >>
> >> I installed lava-test using a release tarball. That has worked pretty well.
> >
> > OK.  That makes sense.
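
For reference, the qemu chroot dance is basically the sketch below
(assuming qemu-arm-static and binfmt-misc are set up on the host, and
with illustrative paths):

  import shutil
  import subprocess

  def run_in_rootfs(rootfs, command):
      # Copy the static qemu binary in so binfmt-misc can find it
      # inside the chroot, then run the command there (needs root):
      shutil.copy("/usr/bin/qemu-arm-static", rootfs + "/usr/bin/")
      return subprocess.call(["chroot", rootfs] + command)

  # e.g. unpacking a lava-test release tarball into the test rootfs:
  # run_in_rootfs("/srv/nfs/testrootfs",
  #               ["tar", "-C", "/opt", "-xzf", "/tmp/lava-test.tar.gz"])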
> >
> >> In general I think that:
> >>
> >> 1) We need to reconsider how to do testing on very slow machines
> >
> > You mean a "drive from the outside" approach like lava-android-test uses
> > may make sense?
> 
> Actually that's very reasonable for another reason. Driving adb from
> the outside is the 'lava-agent' idea I've been talking about lately.
> Instead of talking over a busy serial line you use a packetized
> protocol over USB to talk to a piece of code that can do anything with
> your test device.

Right.  Although in terms of the actual operations that get executed on
the device I don't know how much difference this makes.

> Here my main concern is speed. Every bit counts, this thing is slow as
> hell already.

How slow is slow?

> >> 2) What can be invoked on the host (parts of installation, unless it
> >> wants to build stuff; result parsing and tracking)
> >
> > Yeah, this sort of thing is a grey area currently.  More below.
> >
> >> 3) What has to be invoked on the target (test code, system probes)
> >>
> >> It's important to make the intent very clear. If we define that
> >> cmd_install installs something while in the "master image" on the
> >> "target", then we should not break that.
> >
> > Well.  We can't avoid breaking that if there *is no master image*.
> 
> Yes, but so far the master image has lived on an ARM device. What I'd
> like to do is discuss how to sensibly migrate from that concept. The
> master image was supposed to have a reliable kernel with networking.
> Is that still relevant if we can just shove files onto the image?
> Should we just ask people to build their benchmarks before starting
> the device for tests? What about existing tests? How does lava-test
> need to change to support this?

All decent questions.

> > Currently the dispatcher has the concept of a "reliable session", which
> > is meant to be a target-like environment where things like compilation
> > are possible.  For master image based deployments, this is "booted into
> > the master image, chrooted into a mounted testrootfs".  For qemu, it is
> > currently "boot the test image and hope that works"; it could instead
> > be "chrooted into the testrootfs mounted on the host with
> > qemu-arm-static in the right place", but that was less reliable than
> > the other approach when I was testing this.
> 
> Right, I know. I somewhat feel that it's an implementation detail that
> may vary from device to device, and we should think about what the
> framework (lava-test) is going to look like. Unlike dispatcher
> commands it _has to_ be uniform and it _has to_ be backwards
> compatible.
> 
> >> I think that it would be sensible to add a "host_chroot" mode that
> >> applies nicely to qemu and fast models.  Very slow things that don't
> >> care about the architecture could be invoked in that mode without
> >> sacrificing performance.
> 
> +1 I think that's a good abstraction

You are +1ing your own idea here :-)

> > This code exists already.  See _chroot_into_rootfs_session in
> > lava_dispatcher.client.qemu.LavaQEMUClient and surrounds.  The problem
> > is that qemu is some distance from perfect...
> 
> Right, I saw that. I'll have more comments later.
> 
> > Maybe we can limit the things that lava-test install does to things that
> > work under qemu -- I guess installing via dpkg usually works (unless
> > it's something like mono) and gcc probably works ok?  Maybe we can do
> > something like scratchbox where gcc is magically a cross compiler
> > running directly on the host?
> 
> The thing is that our users may want to depend on the original
> behavior. Things like a cross gcc doing bad stuff or qemu
> corrupting/crashing are impossible to rule out. If anything we should
> do what we did before: move slowly and in a compatible way while
> designing and implementing a new version of lava-test, with a sensible
> migration path that allows users to optimize such things.

I think running the installation in the model is going to be the most
reliable thing for A15 testing -- given that one of the goals is to test
KVM, I'd hope plain networking works most of the time in the tested
kernel.  Not a perfect answer of course.

Cheers,
mwh


