Hi, sorry for the topic, I wanted to catch your attention.
This is a quick brain dump based on my own observations/battle with master images last week.
1) Unless we use external USB/ETH adapters, cloning a master image clones the MAC address as well. This has serious consequences and I'm 100% sure that's why lava-test had to be switched to the random UUID mode. This problem applies to the master image mode. In the test image the software can do anything, so we may run with a random MAC or with the MAC that the master image's boot loader set (we should check that). Since making master images is a mess, unless it becomes automated I will not be convinced that people actually know how to make them properly and are not simply copying from someone. There is no reproducible master image creation process that ensures two people with the same board can run a single test in a reproducible way! (different starting rootfs/bootloader/package selection/random mistakes)
2) Running code via serial on the master image is a mess. It is very fragile. We need an agent on the board instead of a random master image+serial shell. The agent will expose board identity, capabilities and standard APIs to LAVA (notably the dispatcher). The same API, if done sensibly, will work for software emulators and hardware boards. Agent API for a software emulator can do different things. Dispatcher should be based on agent API instead of ramming the serial line.
3) The master image, as we know it today, should be booting remotely. The boot loader can stay on the board until we can push it over USB. The only thing that absolutely has to stay on the card is the LAVA board identity file, which would be generated from the web UI. There is no reason to keep rootfs/kernel/initrd there. This means that a single small card can fit all tests as well. It also means we can reset the master image (as currently it is writeable by the board and can be corrupted) before booting to ensure consistent behaviour. I did some work on that and I managed to boot panda over NFS. Ideally I want to boot over NBD (network block device), which is much faster, and with a proper "master image" init script we can expose a single read-only network block device to _all_ the boards.
4) With an agent on each board and an identity file on the SD card, LAVA will know if cloning happened. We could do dynamic board detection (unplug the board -> it goes away, plug it back -> it shows up). We could move a board from system to system and have 0config transitions.
5) Dispatcher should drop all configuration files. Sure, it made sense 12 months ago when the idea was to run it standalone. Now all of that configuration should be in the database and should be provided by the scheduler to the dispatcher as a big serialized argument (or a file descriptor or a temporary file on disk). Setting up the dispatcher for a new instance is a pain, and unless you can copy stuff from the validation server and ask everyone around for help it's very hard to get right. If master images could be constructed programmatically, with an agent on each "master image", LAVA would just get that configuration for free.
6) We should drop conmux. As in the lab we already have TCP/IP sockets for the serial lines, we could just provide my example serial->tcp script as a lava-serial service that people with directly attached boards would use. We could get a similar lava-power service if that would make sense. The lava-serial service could be started as an instance for all USB/SERIAL adapters plugged in if we really wanted (hello upstart!). The lava-power service would be custom and would require some config, but it is very rare. Only the lab and I have something like that. Again, it should be instance based IMHO so I can say: 'start lava-power CONF=/etc/lava-power/magic-hack.conf' and see LAVA know about a power service. One could then say that a particular board uses particular serial and power services.
That's it.
Best regards ZK
On Wed, Dec 7, 2011 at 10:01 AM, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
Hi, sorry for the topic, I wanted to catch your attention.
This is a quick brain dump based on my own observations/battle with master images last week.
- Unless we use external USB/ETH adapters then cloning a master image
clones the mac address as well. This has serious consequences and I'm
This doesn't ring true. We do have different mac addresses, even on boards without flash and on-board ethernet.
100% sure that's why lava-test had to be switched to the random UUID
Incorrect - I hunted down that problem. We can switch back if you really like, but I don't see any advantage to it.
mode. This problem applies to the master image mode. In the test image the software can do anything so we may run with a random MAC or with the mac that master images' boot loader set (we should check that). Since making master images is a mess, unless is becomes automated I will not be convinced that people just know how to make them properly and are not simply copying from someone. There is no reproducible master image creation process that ensure two people with the same board can run a single test in a reproducible way! (different starting rootfs/bootloader/package selection/random mistakes)
That's a pretty big exaggeration to say that it can't be done by others, or that it affects reproducibility of tests. The process isn't *that* hard. It's essentially just a nano image, a couple of extra packages installed, and add a few partitions. However, I do agree with the sentiment that this should be automated as much as possible.
2) Running code via serial on the master image is a mess. It is very
fragile. We need an agent on the board instead of a random master image+serial shell. The agent will expose board identity, capabilities and standard APIs to LAVA (notably the dispatcher). The same API, if done sensibly, will work for software emulators and hardware boards. Agent API for a software emulator can do different things. Dispatcher should be based on agent API instead of ramming the serial line.
This sounds like a good connect topic. It has some advantages, but also a lot of things to address.
- The master image, as we know it today, should be booting remotely.
The boot loader can stay on the board until we can push it over USB.
The problem is getting it to a state that we can push it over usb for every board. Not all boards support this, and the ones that do sometimes have issues with the tools to make it possible. We've talked about other solutions like a SD interface we can write from an external host over USB, then boot the board. One potential pitfall here is that this would mean we can no longer offload the lmc process with celery. It would HAVE to be done from the attached host. That means we are back to serializing LMC processes, or we have a host for every single dev board!
The only thing that absolutely has to stay in the card is the lava
board identity file which would be generated from the web UI. There is
If that's needed, then why couldn't it be written when we deploy to the board?
no reason to keep rootfs/kernel/initrd there. This means that a single small card can fit all tests as well. It also means we can reset the master image (as currently it is writeable by the board and can be corrupted) before booting to ensure consistent behaviour. I did some work on that and I managed to boot panda over NFS. Ideally I want to boot over nbd (netblock device) which is much faster and with proper "master image" init script we can expose a single read only net block device to _all_ the boards.
- With agent on each board, identity file on the SD card LAVA will
know if cloning happened. We could do dynamic board detection (unplug the board -> it goes away, plug it back -> it shows up). We could move a board from system to system and have 0config transitions.
Ok, you lost me here. In the last point you made, you seemed to be advocating for erasing the entire SD with the image we want to deploy.
- Dispatcher should drop all configuration files. Sure it made sense
12 months ago when the idea was to run it standalone. Now all of that configuration should be in the database and should be provided by the scheduler to the dispatcher as a big serialized argument (or a file descriptor or a temporary file on disk). Setting up the dispatcher for a new instance is a pain and unless you can copy stuff from the validation server and ask everyone around for help it's very hard to get right. If master images could be constructed programmatically and with a agent on each "master image" lava would just get that configuration for free.
We should talk to our users about this though. We already *know* we have users that are using the dispatcher standalone today. I think it's possible to still move this config into the database, but we don't want to pull the rug out from under anyone without a good plan of how to get them standing again.
- We should drop conmux. As in the lab we already have TCP/IP sockets
for the serial lines we could just provide my example serial->tcp script as lava-serial service that people with directly attached boards would use. We could get a similar lava-power service if that would make sense. The lava-serial service could be started as an instance for all USB/SERIAL adapters plugged in if we really wanted (hello upstart!). The lava-power service would be custom and would require some config but it is very rare. Only lab and me have something like that. Again it should be instance based IMHO so I can say: 'start lava-power CONF=/etc/lava-power/magic-hack.conf' and see LAVA know about a power service. One could then say that a particular board uses a particular serial and power services.
Another good topic for the connect I think. It needs a lot of fleshing out, but it sounds like you are basically suggesting we should recreate the functionality of conmux in lava. Conmux does two primary things for us:
1. It gives us a single interface for dealing with a variety of boards
2. It provides a safer, more convenient way of dealing with console and power on boards.
To console in to a board, and hardreset the power in conmux:
$ conmux-console panda01
$ ~$hardreset
Without it?
[lookup wiki or database to find the console server/port]
$ telnet console01 7001
[notice it's hung]
^] quit
[lookup wiki or db to find the pdu server/port]
$ telnet pdu01
[go through long menu driven interface to tell it to reset the port]
^] quit
$ telnet console01 7001
[still hung... notice that you accidentally reset some other board that someone was running a test on]
Sure, we could provide a command line tool for looking up those things in the lava database, and give admins an easy interface to just say "take me to the console of this machine", or "hardreset this machine". If we did that, and also added attached serial multiplexing, we will have... rewritten conmux. :)
That being said, I think it could still be useful because it can give us an API to call from within lava, rather than having to call out to a command line tool. However, I don't see a huge urgency for it at the moment.
Thanks, Paul Larson
On 07.12.2011 18:44, Paul Larson wrote:
On Wed, Dec 7, 2011 at 10:01 AM, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
Hi, sorry for the topic, I wanted to catch your attention. This is a quick brain dump based on my own observations/battle with master images last week. 1) Unless we use external USB/ETH adapters then cloning a master image clones the mac address as well. This has serious consequences and I'm
This doesn't ring true. We do have different mac addresses, even on boards without flash and on-board ethernet.
How does it work? As far as I know mac address is burned in boot.scr, if you copy that (and tell me we don't) then we get duplicates.
Update: after a quick discussion on #linaro it seems that the mac address is actually burned into the hardware pack and lmc does not make one (at least not for panda). I have not verified this yet but if true then _all_ pandas with a given hwpack build get the same mac.
100% sure that's why lava-test had to be switched to the random UUID
Incorrect - I hunted down that problem. We can switch back if you really like, but I don't see any advantage to it.
What was the problem? I don't remember we actually traced that to the root cause.
mode. This problem applies to the master image mode. In the test image the software can do anything so we may run with a random MAC or with the mac that master images' boot loader set (we should check that). Since making master images is a mess, unless is becomes automated I will not be convinced that people just know how to make them properly and are not simply copying from someone. There is no reproducible master image creation process that ensure two people with the same board can run a single test in a reproducible way! (different starting rootfs/bootloader/package selection/random mistakes)
That's a pretty big exaggeration to say that it can't be done by others, or that it affects reproducibility of tests.
Try assisting others in getting master images. It takes a long while before you get from "lava installed" to "lava ran stream for the first time". The reason for that is provisioning a board is HARD and should not be.
The process isn't *that* hard. It's essentially just a nano image, a couple of extra packages installed, and add a few partitions. However, I do agree with the sentiment that this should be automated as much as possible.
2) Running code via serial on the master image is a mess. It is very fragile. We need an agent on the board instead of a random master image+serial shell. The agent will expose board identity, capabilities and standard APIs to LAVA (notably the dispatcher). The same API, if done sensibly, will work for software emulators and hardware boards. Agent API for a software emulator can do different things. Dispatcher should be based on agent API instead of ramming the serial line.
This sounds like a good connect topic. It has some advantages, but also a lot of things to address.
3) The master image, as we know it today, should be booting remotely. The boot loader can stay on the board until we can push it over USB.
The problem is getting it to a state that we can push it over usb for every board. Not all boards support this, and the ones that do sometimes have issues with the tools to make it possible.
I don't want to push 100% over usb, but pushing 99.9% (everything except the boot loader) works for all boards as far as I know. This would give us a controllable master image (hell, we could install tests before turning the power on).
We've talked
about other solutions like a SD interface we can write from an external host over USB, then boot the board. One potential pitfall here is that this would mean we can no longer offload the lmc process with celery. It would HAVE to be done from the attached host.
I'm not proposing anything like that. I just want to keep the master rootfs + kernel away from the sd card. It is not in any way related to how we run testrootfs.
That means we are back
to serializing LMC processes, or we have a host for every single dev board!
I'm not proposing anything like that. LMC can still run anywhere we want.
The only thing that absolutely has to stay in the card is the lava board identity file which would be generated from the web UI. There is
If that's needed, then why couldn't it be written when we deploy to the board?
It should be written once (when we create the master image) for a particular board. It should be written then and never touched again.
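To make the "write once, detect clones" idea concrete, here is a minimal sketch, assuming the identity file is a small JSON document holding a UUID. The file layout, `new_identity` and `IdentityRegistry` are hypothetical names for illustration, not anything that exists in LAVA today.

```python
import json
import uuid


def new_identity(board_name):
    """Build the identity record the server would write to the card once."""
    # Hypothetical layout: a board name plus a random UUID.
    return {"board": board_name, "uuid": str(uuid.uuid4())}


class IdentityRegistry:
    """Server-side record of which identities are currently online, and where."""

    def __init__(self):
        self._online = {}  # uuid -> location that reported it

    def check_in(self, identity_text, location):
        """Register a board that just showed up; flag a suspected clone."""
        board_id = json.loads(identity_text)["uuid"]
        current = self._online.get(board_id)
        if current is not None and current != location:
            # The same identity is already online somewhere else.
            return "clone-suspected"
        self._online[board_id] = location
        return "online"

    def check_out(self, identity_text):
        """Forget a board that was unplugged."""
        board_id = json.loads(identity_text)["uuid"]
        self._online.pop(board_id, None)
```

A board that is unplugged (checked out) and plugged into another host checks in cleanly, while the same identity appearing in two places at once is reported as a suspected clone.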
no reason to keep rootfs/kernel/initrd there. This means that a single small card can fit all tests as well. It also means we can reset the master image (as currently it is writeable by the board and can be corrupted) before booting to ensure consistent behaviour. I did some work on that and I managed to boot panda over NFS. Ideally I want to boot over nbd (netblock device) which is much faster and with proper "master image" init script we can expose a single read only net block device to _all_ the boards. 4) With agent on each board, identity file on the SD card LAVA will know if cloning happened. We could do dynamic board detection (unplug the board -> it goes away, plug it back -> it shows up). We could move a board from system to system and have 0config transitions.
Ok, you lost me here. In the last point you made, you seemed to be advocating for erasing the entire SD with the image we want to deploy.
I want to do it in stages as we have time / solve roadblocks:
1) Start with what we have but make it 100% automatic. Fix MAC issues.
2) Allow making the master rootfs empty (and very small as a partition), move to nfs or nbd for root filesystem.
3) Allow lava to manage master rootfs (create, revert to pristine)
4) Do tftp booting (kernel, initrd, device tree) so that we only keep uboot in the SD card.
*) Fork a new R&D topic that will allow to put the bootloader remotely where possible. This would give us 100% remote booting.
5) Keep the cards with two files: master-boot.scr and master-uboot.bin, and an empty, small rootfs partition (for compatibility)
So once we reach this stage we have:
1) Roughly a few MB used on the SD card for any LAVA files (the boot loader and the script that tells it to do something smart). There is nothing unique to this and we can always DD the first few blocks to copy the partition layout and our boot partition over to any board. Only the identity file will need to be updated (but we don't have identity now so that's not a regression).
2) Each boot of the master image is identical apart from the current time. Lava can revert the snapshot or boot the board in read-only mode (where it will behave as an Ubuntu LTSP client).
3) We can install tests remotely if we want to. We don't even need to turn the board on. We can install the test, boot the master image and ask it to deploy an already modified rootfs.
And there is joy and peace of mind. Everyone gets identical (perhaps even binary identical if we wipe logs and stuff like ssh host keys) master images.
5) Dispatcher should drop all configuration files. Sure it made sense 12 months ago when the idea was to run it standalone. Now all of that configuration should be in the database and should be provided by the scheduler to the dispatcher as a big serialized argument (or a file descriptor or a temporary file on disk). Setting up the dispatcher for a new instance is a pain and unless you can copy stuff from the validation server and ask everyone around for help it's very hard to get right. If master images could be constructed programmatically and with a agent on each "master image" lava would just get that configuration for free.
We should talk to our users about this though. We already *know* we have users that are using the dispatcher standalone today. I think it's possible to still move this config into the database, but we don't want to pull the rug out from under anyone without a good plan of how to get them standing again.
Sure. We can retain the old way but it's not how lava the stack should invoke it. This way we can manage this configuration in the web parts and evolve it as needed.
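As a rough illustration of "the scheduler hands the dispatcher its configuration", here is a minimal sketch that serializes a device configuration to a temporary file and reads it back. The field names and the helper names `write_job_config` / `load_job_config` are assumptions made up for this example; the real dispatcher configuration keys are not shown in this thread.

```python
import json
import os
import tempfile


def write_job_config(device_config):
    """Scheduler side: serialize the configuration to a temporary file."""
    fd, path = tempfile.mkstemp(prefix="dispatcher-", suffix=".json")
    with os.fdopen(fd, "w") as stream:
        json.dump(device_config, stream)
    return path


def load_job_config(path):
    """Dispatcher side: read the configuration it was handed."""
    with open(path) as stream:
        return json.load(stream)


# Hypothetical configuration that would live in the database.
example_config = {
    "hostname": "panda01",
    "device_type": "panda",
    "serial": {"host": "console01", "port": 7001},
}
```

The standalone use case can keep working by pointing the dispatcher at a hand-written file of the same shape.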
6) We should drop conmux. As in the lab we already have TCP/IP sockets for the serial lines we could just provide my example serial->tcp script as lava-serial service that people with directly attached boards would use. We could get a similar lava-power service if that would make sense. The lava-serial service could be started as an instance for all USB/SERIAL adapters plugged in if we really wanted (hello upstart!). The lava-power service would be custom and would require some config but it is very rare. Only lab and me have something like that. Again it should be instance based IMHO so I can say: 'start lava-power CONF=/etc/lava-power/magic-hack.conf' and see LAVA know about a power service. One could then say that a particular board uses a particular serial and power services.
Another good topic for the connect I think. It needs a lot of fleshing out, but it sounds like you are basically suggesting we should recreate the functionality of conmux in lava.
Yes, but sans the perl parts and with lava integration (service subscription and more). Plus I don't expect us to hit all those conmux bugs that we cannot reasonably fix.
Conmux does two primary things for us:
- It gives us a single interface for dealing with a variety of boards
We want a TCP socket for the serial. There are better ways of doing that.
- It provides a safer, more convenient way of dealing with console and
power on boards.
I don't see how that is true. The console - sure, but you cannot even attempt to type there (all hell would break loose if conmux allowed this) so the usefulness is zero. The power point is true but that also integrates poorly with the rest of LAVA.
To console in to a board, and hardreset the power in conmux:
$ conmux-console panda01
$ ~$hardreset
Without it?
[lookup wiki or database to find the console server/port]
$ telnet console01 7001
[notice it's hung]
^] quit
[lookup wiki or db to find the pdu server/port]
$ telnet pdu01
[go through long menu driven interface to tell it to reset the port]
^] quit
$ telnet console01 7001
[still hung... notice that you accidentally reset some other board that someone was running a test on]
You don't think that was my suggestion do you?
Sure, we could provide a command line tool for looking up those things in the lava database, and give admins an easy interface to just say "take me to the console of this machine", or "hardreset this machine". If we did that, and also added attached serial multiplexing, we will have... rewritten conmux. :)
lava-server manage boardctl panda01 --reboot
lava-server manage boardctl panda01 --serial-trace
The rest is done in the services, behind the scenes. You get true stuff in the web UI, you get APIs. Try building that on conmux without integration issues.
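What "done in the services, behind the scenes" could look like is sketched below, assuming each board record simply points at a serial service and a power service. All class and method names here (`SerialService`, `PowerService`, `BoardDirectory`, `reboot_request`) are hypothetical; the host names reuse the console01/pdu01 examples from earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialService:
    host: str
    port: int


@dataclass(frozen=True)
class PowerService:
    host: str
    outlet: int


@dataclass(frozen=True)
class Board:
    name: str
    serial: SerialService
    power: PowerService


class BoardDirectory:
    """Lookup table a boardctl-style command could consult."""

    def __init__(self):
        self._boards = {}

    def register(self, board):
        self._boards[board.name] = board

    def serial_endpoint(self, name):
        """Return the (host, port) of the serial line for a board."""
        serial = self._boards[name].serial
        return (serial.host, serial.port)

    def reboot_request(self, name):
        """Describe the power action; actually sending it is out of scope."""
        power = self._boards[name].power
        return {"pdu": power.host, "outlet": power.outlet, "action": "cycle"}
```

Because the outlet is looked up by board name, nobody has to walk a PDU menu by hand and risk resetting the wrong board.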
Without the bugs and with the right lava integration, sounds good to me.
That being said, I think it could still be useful because it can give us an API to call from within lava, rather than having to call out to a command line tool. However, I don't see a huge urgency for it at the moment.
I agree.
My personal view on priority is as follows:
1) Fixing master image story to a point where we can reliably build images such as the ones we have today
2) Putting identity file on the SD card, adding lava extension to manage devices (this will be the place that ultimately holds stuff like dispatcher configs, has actions to do stuff with a board).
3) Putting an agent in the master image so that we don't talk over serial ever again. Simplifying the dispatcher, coming up with a board API. Exposing the stuff in dashboard UI as sensible.
4) Doing more work on remote booting so that we can minimize space usage on the SD card and have immutable master image (that starts exactly the same on each boot because we control the read-only rootfs, initrd, kernel, device tree from the lava-server parts).
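For the agent and board API in point 3 of this list, a minimal sketch of the shape such an interface might take follows. `BoardAgent`, `LocalProcessAgent` and `check_agent` are hypothetical names; the local-process agent merely stands in for a software emulator, and how a real agent would be reached on a hardware board is left open.

```python
import abc
import subprocess
import sys


class BoardAgent(abc.ABC):
    """What the dispatcher would talk to instead of a serial shell."""

    @abc.abstractmethod
    def identity(self):
        """Return the board identity string."""

    @abc.abstractmethod
    def capabilities(self):
        """Return the set of operations this agent supports."""

    @abc.abstractmethod
    def run(self, argv):
        """Run a command and return (exit_code, stdout)."""


class LocalProcessAgent(BoardAgent):
    """Stand-in for an emulator: runs commands on the local host."""

    def __init__(self, identity):
        self._identity = identity

    def identity(self):
        return self._identity

    def capabilities(self):
        return {"run"}

    def run(self, argv):
        result = subprocess.run(argv, capture_output=True, text=True)
        return result.returncode, result.stdout


def check_agent(agent):
    """Dispatcher-side smoke test that works with any BoardAgent."""
    if "run" not in agent.capabilities():
        return False
    code, output = agent.run([sys.executable, "-c", "print('ready')"])
    return code == 0 and output.strip() == "ready"
```

The dispatcher code only depends on `BoardAgent`, so the same `check_agent` call would work whether the other end is an emulator or a physical board.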
Best regards Zygmunt
On 12/08/2011 02:36 AM, Somebody in the thread at some point said:
Just briefly commenting on the bits I have some experience with...
- Unless we use external USB/ETH adapters then cloning a master image
clones the mac address as well. This has serious consequences and I'm
This doesn't ring true. We do have different mac addresses, even on boards without flash and on-board ethernet.
How does it work? As far as I know mac address is burned in boot.scr, if you copy that (and tell me we don't) then we get duplicates.
Update: after a quick discussion on #linaro it seems that the mac address is actually burned into the hardware pack and lmc does not make one (at least not for panda). I have not verified this yet but if true then _all_ pandas with a given hwpack build get the same mac.
If you're running TI LT kernels for Panda, and it has also been true for linux-linaro- on Panda, then I added patches a while back to munge a MAC address from the CPU's allegedly-GUID ID number. That is consistent for a particular Panda board, and there's at least a selection of possible MAC addresses to reduce chance of collision. It's unclear how wide that selection is but it isn't tiny.
corrupted) before booting to ensure consistent behaviour. I did some work on that and I managed to boot panda over NFS. Ideally I want to boot over nbd (netblock device) which is much faster and with proper "master image" init script we can expose a single read only net block device to _all_ the boards.
FWIW I used NBD some years ago and it worked great. You have to take care about building some userland footprint for it but otherwise I found it much more effective and reliable than NFS.
- Start with what we have but make it 100% automatic. Fix MAC issues.
If you're faced with a board that has no on-PCB NV storage for this kind of identity information, if the CPU has a GUID then I can recommend forming the MAC from it.
-Andy
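A minimal sketch of forming a MAC address from a CPU ID, as recommended above. This is not the actual kernel patch mentioned in the thread; it is an assumed approach that hashes the ID and marks the result as a locally administered unicast address, and `mac_from_cpu_id` is a hypothetical name.

```python
import hashlib


def mac_from_cpu_id(cpu_id):
    """Derive a stable, locally administered unicast MAC from a CPU ID."""
    digest = hashlib.sha256(cpu_id).digest()
    octets = bytearray(digest[:6])
    # Set the locally-administered bit and clear the multicast bit.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{octet:02x}" for octet in octets)
```

The same board always gets the same address, and different IDs are very unlikely to collide since 46 bits of the hash are kept.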
Hi Andy.
On Wed, Dec 7, 2011 at 11:24 PM, Andy Green andy.green@linaro.org wrote:
On 12/08/2011 02:36 AM, Somebody in the thread at some point said:
Just briefly commenting on the bits I have some experience with...
- Unless we use external USB/ETH adapters then cloning a master image
clones the mac address as well. This has serious consequences and I'm
This doesn't ring true. We do have different mac addresses, even on boards without flash and on-board ethernet.
How does it work? As far as I know mac address is burned in boot.scr, if you copy that (and tell me we don't) then we get duplicates.
Update: after a quick discussion on #linaro it seems that the mac address is actually burned into the hardware pack and lmc does not make one (at least not for panda). I have not verified this yet but if true then _all_ pandas with a given hwpack build get the same mac.
If you're running TI LT kernels for Panda, and it has also been true for linux-linaro- on Panda, then I added patches a while back to munge a MAC address from the CPU's allegedly-GUID ID number. That is consistent for a particular Panda board, and there's at least a selection of possible MAC addresses to reduce chance of collision. It's unclear how wide that selection is but it isn't tiny.
I see, that would make some sense (sadly not as much as TI putting 1K of NV storage on a panda). I need to check how my boot.scr got a hard-wired MAC address, because I'm 100% sure that I did not touch it.
corrupted) before booting to ensure consistent behaviour. I did some work on that and I managed to boot panda over NFS. Ideally I want to boot over nbd (netblock device) which is much faster and with proper "master image" init script we can expose a single read only net block device to _all_ the boards.
FWIW I used NBD some years ago and it worked great. You have to take care about building some userland footprint for it but otherwise I found it much more effective and reliable than NFS.
Edubuntu uses NBD and I talked with the person that implemented that part. It seems very fast and reliable, and the single read-only rootfs image they export seems to be just perfect for what we want.
- Start with what we have but make it 100% automatic. Fix MAC issues.
If you're faced with a board that has no on-PCB NV storage for this kind of identity information, if the CPU has a GUID then I can recommend forming the MAC from it.
Right now I would use an explicitly set MAC that lava also knows about. It's something you would generate on the server and then plaster into a file on the SD card. The boot script would load that file and set the MAC address accordingly.
This only applies to boards with no NV storage and on-board ETH. For beagle C4 and origen we can depend on non-crappy USB/ETH adapters.
Doing stuff like that based on the board id sounds nice but has some drawbacks:
1) It does not work in all cases (older kernels, older bootloaders maybe)
2) It requires some per board magic which might take a time to happen in all the kernel trees (unless it is already done)
3) Lava does not know it. We can do smarter things when we know the mac of each board we have (like allow it to tftp boot + nbd boot later). We can also do tests on static IP addresses if we want to as we can control dhcp.
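The server-side alternative (an explicitly set MAC that lava knows about) could be sketched like this. `MacRegistry` and `render_mac_file` are hypothetical names, and the one-line `ethaddr=` file format is an assumption chosen for illustration; the boot script that would read it is not shown.

```python
import secrets


def random_local_mac():
    """Generate a random locally administered unicast MAC address."""
    octets = bytearray(secrets.token_bytes(6))
    # Set the locally-administered bit and clear the multicast bit.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{octet:02x}" for octet in octets)


class MacRegistry:
    """Server-side table of the MAC assigned to each board."""

    def __init__(self):
        self._by_board = {}

    def assign(self, board):
        """Return the board's MAC, generating a unique one on first use."""
        if board not in self._by_board:
            used = set(self._by_board.values())
            mac = random_local_mac()
            while mac in used:
                mac = random_local_mac()
            self._by_board[board] = mac
        return self._by_board[board]


def render_mac_file(mac):
    """Contents of the (hypothetical) file placed on the SD card."""
    return f"ethaddr={mac}\n"
```

Since the server holds the table, it could also feed the same addresses into DHCP for the static IP and tftp/nbd boot cases mentioned in point 3.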
Thanks for your feedback! ZK
On Wed, 07 Dec 2011 19:36:01 +0100, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
- Putting identity file on the SD card, adding lava extension to manage
devices (this will be the place that ultimately holds stuff like dispatcher configs, has actions to do stuff with a board).
This should be part of the scheduler, not a new extension.
Cheers, mwh
On 08.12.2011 02:23, Michael Hudson-Doyle wrote:
On Wed, 07 Dec 2011 19:36:01 +0100, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
- Putting identity file on the SD card, adding lava extension to manage
devices (this will be the place that ultimately holds stuff like dispatcher configs, has actions to do stuff with a board).
This should be part of the scheduler, not a new extension.
There are some reasons why it would be better elsewhere but for the moment they are not relevant and we can use that, sure.
ZK
On Wed, Dec 7, 2011 at 4:36 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 07.12.2011 18:44, Paul Larson wrote:
On Wed, Dec 7, 2011 at 10:01 AM, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
Hi, sorry for the topic, I wanted to catch your attention.
This is a quick brain dump based on my own observations/battle with master images last week.
1) Unless we use external USB/ETH adapters then cloning a master image clones the mac address as well. This has serious consequences and I'm
This doesn't ring true. We do have different mac addresses, even on boards without flash and on-board ethernet.
How does it work? As far as I know mac address is burned in boot.scr, if you copy that (and tell me we don't) then we get duplicates.
Update: after a quick discussion on #linaro it seems that the mac address is actually burned into the hardware pack and lmc does not make one (at least not for panda). I have not verified this yet but if true then _all_ pandas with a given hwpack build get the same mac.
As I said on #linaro, there's no default MAC address in any hwpack we produce. If you have a boot.scr with a pre-defined MAC address, then you either customized your own hwpack or it's a bug.
I believe we already have a valid and unique mac address for all the boards we currently support, even if they rely on being calculated during boot time (like the hack that Andy did for panda). Let me know if you're still having issues with random mac address every time you boot your board.
The process isn't *that* hard. It's essentially just a nano image, a couple of extra packages installed, and add a few partitions. However, I do agree with the sentiment that this should be automated as much as possible.
2) Running code via serial on the master image is a mess. It is very fragile. We need an agent on the board instead of a random master image+serial shell. The agent will expose board identity, capabilities and standard APIs to LAVA (notably the dispatcher). The same API, if done sensibly, will work for software emulators and hardware boards. Agent API for a software emulator can do different things. Dispatcher should be based on agent API instead of ramming the serial line.
This sounds like a good connect topic. It has some advantages, but also a lot of things to address.
While I agree that a different implementation might be a nice thing, I also see that it can be quite complicated, and I'm not yet sure that it will actually help much.
I know serial is not the best interface you have, but it's the only one that we know works for all the boards we have :-) Once you start relying on ethernet or such, you can easily be blocked by issues on the kernel/userspace side.
Unfortunately it seems that serial is the most reliable interface you may have with these boards.
3) The master image, as we know it today, should be booting remotely. The boot loader can stay on the board until we can push it over USB.
The problem is getting it to a state where we can push it over USB for every board. Not all boards support this, and the ones that do sometimes have issues with the tools that make it possible.
I don't want to push 100% over USB, but pushing 99.9% (everything except the boot loader) works for all boards as far as I know. This would give us a controllable master image (hell, we could install tests before turning the power on).
I guess this will only be an issue with Origen, since to make ethernet work in the boot loader you also need USB support (similar to Panda). For the others I believe it should just work (if not, it's a bug).
Cheers,
On this thought....
On Wed, Dec 7, 2011 at 7:54 PM, Ricardo Salveti ricardo.salveti@linaro.org wrote:
On Wed, Dec 7, 2011 at 4:36 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 07.12.2011 18:44, Paul Larson wrote:
On Wed, Dec 7, 2011 at 10:01 AM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
Hi, sorry for the topic, I wanted to catch your attention.
This is a quick brain dump based on my own observations/battle with master images last week.
1) Unless we use external USB/ETH adapters then cloning a master image clones the mac address as well. This has serious consequences and I'm
This doesn't ring true. We do have different mac addresses, even on boards without flash and on-board ethernet.
How does it work? As far as I know mac address is burned in boot.scr, if you copy that (and tell me we don't) then we get duplicates.
Update: after a quick discussion on #linaro it seems that the mac address is actually burned into the hardware pack and lmc does not make one (at least not for panda). I have not verified this yet but if true then _all_ pandas with a given hwpack build get the same mac.
As I said at #linaro, there's no default mac address at any hwpack we produce. If you have a boot.scr with a mac address pre-defined, then you either customized your own hwpack or it's a bug.
There is a proposed new option for linaro-media-create that allows you to customize your install. Basically, it allows scripts to be run as part of the linaro-media-create process. Want an updated boot.scr or whatever? Go nuts, it'll allow for it.
This was something alf, mabac and I spec'd out at the last LC.
Ricardo Salveti de Araujo
linaro-dev mailing list linaro-dev@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-dev
On 08.12.2011 02:54, Ricardo Salveti wrote:
On Wed, Dec 7, 2011 at 4:36 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 07.12.2011 18:44, Paul Larson wrote:
On Wed, Dec 7, 2011 at 10:01 AM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
Hi, sorry for the topic, I wanted to catch your attention.
This is a quick brain dump based on my own observations/battle with master images last week.
1) Unless we use external USB/ETH adapters then cloning a master image clones the mac address as well. This has serious consequences and I'm
This doesn't ring true. We do have different mac addresses, even on boards without flash and on-board ethernet.
How does it work? As far as I know mac address is burned in boot.scr, if you copy that (and tell me we don't) then we get duplicates.
Update: after a quick discussion on #linaro it seems that the mac address is actually burned into the hardware pack and lmc does not make one (at least not for panda). I have not verified this yet but if true then _all_ pandas with a given hwpack build get the same mac.
As I said at #linaro, there's no default mac address at any hwpack we produce. If you have a boot.scr with a mac address pre-defined, then you either customized your own hwpack or it's a bug.
I'll dig into it next week
I believe we already have a valid and unique mac address for all the boards we currently support, even if they rely on being calculated during boot time (like the hack that Andy did for panda). Let me know if you're still having issues with random mac address every time you boot your board.
I'll report bugs on all my findings.
The process isn't *that* hard. It's essentially just a nano image, a couple of extra packages installed, and add a few partitions. However, I do agree with the sentiment that this should be automated as much as possible.
2) Running code via serial on the master image is a mess. It is very fragile. We need an agent on the board instead of a random master image+serial shell. The agent will expose board identity, capabilities and standard APIs to LAVA (notably the dispatcher). The same API, if done sensibly, will work for software emulators and hardware boards. Agent API for a software emulator can do different things. Dispatcher should be based on agent API instead of ramming the serial line.
This sounds like a good connect topic. It has some advantages, but also a lot of things to address.
While I agree that a different implementation might be a nice thing, I also see that it can be quite complicated and still not yet sure if this will actually help much.
I know serial is not the best interface you have, but it's the only one that we know it works for all the boards we have :-) Once you start relying on ethernet or such, then you can easily be blocked by issues at the kernel/userspace side.
I don't want to use serial the same way we use it right now, that is, by running everything in a root shell with pexpect driving the process. There _are_ better ways, mostly based on bidirectional packetized transport interfaces. The same rules apply to USB, actually.
Anyway, if you assume that anything we do runs without networking, you'd be surprised. We require master images to have networking right now. For test images we can still kind of manage without a working connection, but IMHO that's a moot point, since the master image usually has to work first and I can count on one hand the number of times we had a broken network in test images and a good network in master images.
Then there is pppd which would give us networking without USB.
Unfortunately it seems that serial is the most reliable interface you may have with these boards.
Sure, then let's use it in a less brute-force way than pexpect.
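As a rough illustration of what a packetized transport over the serial line could look like, here is a minimal framing sketch: each message is wrapped in a marker, a length and a CRC, so the receiver can skip boot noise and detect corruption. The marker bytes, the frame layout and the function names are all assumptions made for this example, not an existing LAVA protocol.

```python
import struct
import zlib

MAGIC = b"\xa5\x5a"  # hypothetical start-of-frame marker

def encode_frame(payload: bytes) -> bytes:
    """Wrap payload as MAGIC + 16-bit length + payload + CRC32 (max 65535 bytes)."""
    header = MAGIC + struct.pack(">H", len(payload))
    return header + payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)

def decode_frames(buffer: bytearray) -> list:
    """Extract all complete, CRC-valid payloads, consuming them from buffer."""
    frames = []
    while True:
        start = buffer.find(MAGIC)
        if start < 0:
            # No marker: drop the noise, but keep the last byte in case it
            # is the first half of a marker that has not fully arrived yet.
            del buffer[:max(0, len(buffer) - 1)]
            break
        del buffer[:start]
        if len(buffer) < 4:
            break  # header incomplete, wait for more bytes
        (length,) = struct.unpack(">H", buffer[2:4])
        end = 4 + length + 4
        if len(buffer) < end:
            break  # frame incomplete, wait for more bytes
        payload = bytes(buffer[4:4 + length])
        (crc,) = struct.unpack(">I", buffer[4 + length:end])
        if zlib.crc32(payload) & 0xFFFFFFFF == crc:
            frames.append(payload)
            del buffer[:end]
        else:
            del buffer[:2]  # bad frame: skip this marker and resynchronise
    return frames
```

The receiver would append whatever bytes arrive on the serial port to one `bytearray` and call `decode_frames` after each read. A real design would also need timeouts and retransmission, which this sketch leaves out.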
3) The master image, as we know it today, should be booting remotely. The boot loader can stay on the board until we can push it over USB.
The problem is getting it to a state that we can push it over usb for every board. Not all boards support this, and the ones that do sometimes have issues with the tools to make it possible.
I don't want to push 100% over usb but pushing 99.9 (all except to boot loader) works for all boards as far as I know. This would give us controllable master image (hell we could install tests before turning the power on).
I guess this will only be an issue with Origen, as to make ethernet to work at the boot loader you also need to have USB support (kind of similar as Panda). For the others I believe it should just work (if not, it's a bug).
Getting it to work with panda would be a major milestone. I don't treat boards as equal as they are not equal. Last time I checked uboot has lots of USB and ethernet support so we might be able to eventually do it assuming actual bugs in both linux kernel and uboot for origen are fixed.
Best regards Zygmunt
On Thu, Dec 8, 2011 at 12:49 AM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 08.12.2011 02:54, Ricardo Salveti wrote:
On Wed, Dec 7, 2011 at 4:36 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
I don't want to push 100% over usb but pushing 99.9 (all except to boot loader) works for all boards as far as I know. This would give us controllable master image (hell we could install tests before turning the power on).
I guess this will only be an issue with Origen, as to make ethernet to work at the boot loader you also need to have USB support (kind of similar as Panda). For the others I believe it should just work (if not, it's a bug).
Getting it to work with panda would be a major milestone. I don't treat boards as equal as they are not equal. Last time I checked uboot has lots of USB and ethernet support so we might be able to eventually do it assuming actual bugs in both linux kernel and uboot for origen are fixed.
For Panda at least you should have everything you need:
1 - USB booting for SPL/U-Boot with USB SPL support;
2 - Ethernet support in U-Boot with TFTP and PXE support;
3 - A unique MAC address in both U-Boot and the kernel (same one, same code to calculate it).
Once you make it work with Panda, we can then try to add the same support for the other boards we have.
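To make the TFTP/PXE idea more concrete, here is a small sketch that generates a per-board, pxelinux-style configuration pointing the kernel at an NFS root. It assumes the boot loader follows the pxelinux convention of looking up `pxelinux.cfg/01-<mac>`; the server address, export path, label and file names are placeholders, not values from this thread.

```python
def pxe_config_path(mac: str) -> str:
    """Per-board config name in the pxelinux convention: 01-<mac with dashes>."""
    return "pxelinux.cfg/01-" + mac.lower().replace(":", "-")

def pxe_config(server: str, export: str,
               kernel: str = "uImage", initrd: str = "uInitrd") -> str:
    """Render a config that boots the kernel with a read-only NFS root."""
    # root=/dev/nfs, nfsroot= and ip=dhcp are standard Linux kernel parameters.
    append = f"root=/dev/nfs nfsroot={server}:{export},ro ip=dhcp"
    return "\n".join([
        "default master",
        "label master",
        f"  kernel {kernel}",
        f"  initrd {initrd}",
        f"  append {append}",
        "",
    ])

# Hypothetical board and server:
print(pxe_config_path("02:AB:CD:00:11:22"))
print(pxe_config("192.168.1.1", "/srv/lava/master"))
```

Since the file name is keyed on the MAC address, this only works if every board really does come up with its own unique address in the boot loader.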
Cheers,
On 12/08/2011 10:56 AM, Somebody in the thread at some point said:
On Thu, Dec 8, 2011 at 12:49 AM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 08.12.2011 02:54, Ricardo Salveti wrote:
On Wed, Dec 7, 2011 at 4:36 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
I don't want to push 100% over usb but pushing 99.9 (all except to boot loader) works for all boards as far as I know. This would give us controllable master image (hell we could install tests before turning the power on).
I guess this will only be an issue with Origen, as to make ethernet to work at the boot loader you also need to have USB support (kind of similar as Panda). For the others I believe it should just work (if not, it's a bug).
What is a bug, bootloader not supporting Ethernet?
Getting it to work with panda would be a major milestone. I don't treat boards as equal as they are not equal. Last time I checked uboot has lots of USB and ethernet support so we might be able to eventually do it assuming actual bugs in both linux kernel and uboot for origen are fixed.
For Panda at least you should have everything you need: 1 - USB booting for SPL/U-Boot with USB SPL support; 2 - Ethernet support at U-Boot with TFTP and PXE support; 3 - Unique mac address at both u-boot and kernel (same one, same code to calculate it);
Once you make it work with Panda, we can later then try to have the same support at the other boards we have.
Do all the supported SoC ROMs support USB booting, or a workable alternative?
-Andy
On 08.12.2011 05:16, Andy Green wrote:
On 12/08/2011 10:56 AM, Somebody in the thread at some point said:
On Thu, Dec 8, 2011 at 12:49 AM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 08.12.2011 02:54, Ricardo Salveti wrote:
On Wed, Dec 7, 2011 at 4:36 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
I don't want to push 100% over usb but pushing 99.9 (all except to boot loader) works for all boards as far as I know. This would give us controllable master image (hell we could install tests before turning the power on).
I guess this will only be an issue with Origen, as to make ethernet to work at the boot loader you also need to have USB support (kind of similar as Panda). For the others I believe it should just work (if not, it's a bug).
What is a bug, bootloader not supporting Ethernet?
Well all the other boot loaders seem to ;-)
Getting it to work with panda would be a major milestone. I don't treat boards as equal as they are not equal. Last time I checked uboot has lots of USB and ethernet support so we might be able to eventually do it assuming actual bugs in both linux kernel and uboot for origen are fixed.
For Panda at least you should have everything you need: 1 - USB booting for SPL/U-Boot with USB SPL support; 2 - Ethernet support at U-Boot with TFTP and PXE support; 3 - Unique mac address at both u-boot and kernel (same one, same code to calculate it);
Once you make it work with Panda, we can later then try to have the same support at the other boards we have.
Do all the supported SoC ROMs support USB booting, or workable alternative?
The alternative is to keep a static copy of uboot on the SD card as we've been doing all the time now.
ZK
On Thu, Dec 8, 2011 at 2:16 AM, Andy Green andy.green@linaro.org wrote:
On 12/08/2011 10:56 AM, Somebody in the thread at some point said:
On Thu, Dec 8, 2011 at 12:49 AM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 08.12.2011 02:54, Ricardo Salveti wrote:
On Wed, Dec 7, 2011 at 4:36 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
I don't want to push 100% over usb but pushing 99.9 (all except to boot loader) works for all boards as far as I know. This would give us controllable master image (hell we could install tests before turning the power on).
I guess this will only be an issue with Origen, as to make ethernet to work at the boot loader you also need to have USB support (kind of similar as Panda). For the others I believe it should just work (if not, it's a bug).
What is a bug, bootloader not supporting Ethernet?
USB support (EHCI) in the bootloader, and then the driver for the device you want to use.
U-Boot supports quite a few ethernet devices, so I believe getting USB to work would probably be the only thing needed.
Getting it to work with panda would be a major milestone. I don't treat boards as equal as they are not equal. Last time I checked uboot has lots of USB and ethernet support so we might be able to eventually do it assuming actual bugs in both linux kernel and uboot for origen are fixed.
For Panda at least you should have everything you need: 1 - USB booting for SPL/U-Boot with USB SPL support; 2 - Ethernet support at U-Boot with TFTP and PXE support; 3 - Unique mac address at both u-boot and kernel (same one, same code to calculate it);
Once you make it work with Panda, we can later then try to have the same support at the other boards we have.
Do all the supported SoC ROMs support USB booting, or workable alternative?
Even if a board can't fully boot from USB, we can at least have a minimal U-Boot/SPL that provides DFU support; that way you could still push another boot loader later on.
Cheers,
On 12/08/2011 12:29 PM, Somebody in the thread at some point said:
On Thu, Dec 8, 2011 at 2:16 AM, Andy Green andy.green@linaro.org wrote:
On 12/08/2011 10:56 AM, Somebody in the thread at some point said:
On Thu, Dec 8, 2011 at 12:49 AM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
On 08.12.2011 02:54, Ricardo Salveti wrote:
On Wed, Dec 7, 2011 at 4:36 PM, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
I don't want to push 100% over usb but pushing 99.9 (all except to boot loader) works for all boards as far as I know. This would give us controllable master image (hell we could install tests before turning the power on).
I guess this will only be an issue with Origen, as to make ethernet to work at the boot loader you also need to have USB support (kind of similar as Panda). For the others I believe it should just work (if not, it's a bug).
What is a bug, bootloader not supporting Ethernet?
USB support (ehci) at the bootloader and then the driver for the device you want to use.
u-boot supports quite a few ethernet devices, so I believe making usb to work would probably be the only thing needed.
Maybe it's worth thinking about whether the work, and testing, to re-implement USB + ethernet in the bootloader as well as in Linux for each SoC is a good plan, compared to just getting it working in Linux and supporting the case where it doesn't yet work in Linux either.
Getting it to work with panda would be a major milestone. I don't treat boards as equal as they are not equal. Last time I checked uboot has lots of USB and ethernet support so we might be able to eventually do it assuming actual bugs in both linux kernel and uboot for origen are fixed.
For Panda at least you should have everything you need: 1 - USB booting for SPL/U-Boot with USB SPL support; 2 - Ethernet support at U-Boot with TFTP and PXE support; 3 - Unique mac address at both u-boot and kernel (same one, same code to calculate it);
Once you make it work with Panda, we can later then try to have the same support at the other boards we have.
Do all the supported SoC ROMs support USB booting, or workable alternative?
Even if not fully able to boot from USB, we can have at least a minimal u-boot/SPL that would deliver DFU support, this way you could still push another boot loader later on.
Wouldn't it be better to boot a canned Linux + initrd for this kind of thing? Then you don't require a bootloader (cf. direct ROM boot) and you get high-quality enablement that you only have to do once.
-Andy
Generally, there are two ways to drive the boards: by serial line or by network (NFS, maybe plus NBD). If we switch everything to NFS, I'm sure other problems will show up in daily use.
The MAC address issue will occur with both the current master/test image layout and an NFS deployment if U-Boot can't get the board's fused MAC address and pass it to the kernel.
For master image generation, I think the best way is to make LMC able to create a custom master SD card layout. It only needs to be extended to shrink the boot and rootfs partitions and to create two or more blank, labelled partitions (testboot and testrootfs) in one command.
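A minimal sketch of the layout calculation such an LMC extension might perform is shown below. The partition labels come from the message above; the sizes, the 1 MiB alignment and the function name are assumptions for illustration only, and this is not linaro-media-create code.

```python
from dataclasses import dataclass

SECTOR = 512          # bytes per sector
ALIGN = 2048          # first partition starts at 1 MiB (2048 sectors)

@dataclass
class Partition:
    label: str
    start: int        # first sector
    size: int         # length in sectors

def plan_layout(sizes_mib):
    """Lay out partitions back to back from a list of (label, size in MiB)."""
    parts = []
    cursor = ALIGN
    for label, mib in sizes_mib:
        size = mib * 1024 * 1024 // SECTOR
        parts.append(Partition(label, cursor, size))
        cursor += size  # whole-MiB sizes keep every start 1 MiB aligned
    return parts

# Hypothetical sizes for a master image card:
for part in plan_layout([("boot", 64), ("rootfs", 1024),
                         ("testboot", 64), ("testrootfs", 2048)]):
    print(part)
```

The resulting table could then be handed to whatever partitioning tool the image builder already uses; that step is deliberately left out here.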
Just wanted to add a couple of comments:
Zygmunt commented earlier that creating master images is hard. I actually disagree, but that is all about perspective and familiarity. I create master images on a *very* regular basis and it's pretty easy once you know what you're doing, but that's the point. It's about what you're most familiar with.
+1 on Spring's idea that we should get l-m-c to create a tailored, partitioned image. I've thought the same for a while, but since its application was only for LAVA I felt it would be a far lower priority than many other things.
Also +1 on the longer term strategy of supporting NBD (or whatever) type solutions.
Like others in this thread, I'm also concerned about the attempt to reproduce conmux within LAVA. Yes, we get rid of a lot of nasty Perl code (in case I offend, which I don't intend to, I mean Perl is nasty, not the implementation,) but we have to consider whether the effort involved would significantly benefit us in the end result.
Additionally, the idea of not supporting the serial interface, which has the distinct advantage of capturing the log all in one place, I find difficult to justify. Whilst I'm not completely wedded to it, as Ricardo said, it is the one interface that is guaranteed to be provided across all boards in one form or another. If we'd not had it, we would be significantly further behind in deploying origens right now.
++2c
Dave Pigott Validation Engineer T: +44 1223 45 00 24 | M +44 7940 45 93 44 Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
On 8 Dec 2011, at 09:39, Spring Zhang wrote:
Generally, there are two ways to drive the boards: by serial line or network(NFS maybe plus NBD). If all switching to NFS, I'm sure there will be other problems in daily using.
For the MAC address issue, it will occur both on current master/tester image layout and NFS deployment if uboot can't get board fused MAC address and pass it to kernel.
For the master image generation, I think the best way is to make LMC be able to create a custom master SD card layout, now it only needs to extend to shrink boot and rootfs partition, and make two or more blank partitions(testboot and testrootfs) with label in one command.
-- Best wishes, Spring Zhang
On 8 December 2011 11:54, Dave Pigott dave.pigott@linaro.org wrote:
Just wanted to add a couple of comments:
Zygmunt commented earlier that creating master images is hard. I actually disagree, but that is all about perspective and familiarity. I create master images on a *very* regular basis and it's pretty easy once you know what you're doing, but that's the point. It's about what you're most familiar with.
Is the official documentation for creating a master image the page at https://wiki.linaro.org/Platform/Validation/Specs/MasterBootImage ? If so, we should convert it into a step-by-step how-to and link it from the Validation knowledge base.
While that outlines what the end result should be, and details a few commands, it is by no means complete. I shall add the step by step instructions.
Dave
On 8 Dec 2011, at 10:09, Fathi Boudra wrote:
On 8 December 2011 11:54, Dave Pigott dave.pigott@linaro.org wrote:
Just wanted to add a couple of comments:
Zygmunt commented earlier that creating master images is hard. I actually disagree, but that is all about perspective and familiarity. I create master images on a *very* regular basis and it's pretty easy once you know what you're doing, but that's the point. It's about what you're most familiar with.
Is the official documentation to create a master image available on https://wiki.linaro.org/Platform/Validation/Specs/MasterBootImage ? If it's the case, we should convert it to a step by step how to and link it to Validation knowledge base.
On Wed, 7 Dec 2011 11:44:05 -0600, Paul Larson paul.larson@linaro.org wrote:
Sure, we could provide a command line tool for looking up those things in the lava database, and give admins an easy interface to just say "take me to the console of this machine", or "hardreset this machine". If we did that, and also added attached serial multiplexing, we will have... rewritten conmux. :)
We don't need multiplexing though, and having a persistent daemon seems to me to be the source of much of the (well, my) angst with conmux.
Cheers, mwh
On Wed, Dec 7, 2011 at 7:27 PM, Michael Hudson-Doyle michael.hudson@canonical.com wrote:
On Wed, 7 Dec 2011 11:44:05 -0600, Paul Larson paul.larson@linaro.org wrote:
Sure, we could provide a command line tool for looking up those things in the lava database, and give admins an easy interface to just say "take me to the console of this machine", or "hardreset this machine". If we did that, and also added attached serial multiplexing, we will have... rewritten conmux. :)
We don't need multiplexing though, and having a persistent daemon seems to me to be the source of much of the (well, my) angst with conmux.
This has come in handy for me before when debugging locally, or dealing with boards that need serial or usb attachment.
Likewise, I've found it useful when we can operate as a tag team both connected to a board, and discussing things over IRC.
On Wed, 7 Dec 2011 17:01:25 +0100, Zygmunt Krynicki zygmunt.krynicki@linaro.org wrote:
Hi, sorry for the topic, I wanted to catch your attention.
This is a quick brain dump based on my own observations/battle with master images last week.
- Running code via serial on the master image is a mess. It is very fragile.
Is it really? It's a bit of a pain, but it seems this part actually works ok for us. It also has the advantage that all the logs are in one place.
Also, I don't see anything in your proposals that would get us away from having to talk to the bootloader over the serial line. Also getting the boot log for a failed boot seems somehow essential and I don't know how we can do that without a serial connection (this is different from running commands, though).
We need an agent on the board instead of a random master image+serial shell. The agent will expose board identity, capabilities and standard APIs to LAVA (notably the dispatcher). The same API, if done sensibly, will work for software emulators and hardware boards. Agent API for a software emulator can do different things. Dispatcher should be based on agent API instead of ramming the serial line.
Well, I just rewrote chunks of the dispatcher to work for software emulators, albeit taking a different approach. Not sure the approach you propose is really any different, although perhaps it would be easier to distribute to different machines.
- The master image, as we know it today, should be booting remotely. The boot loader can stay on the board until we can push it over USB. The only thing that absolutely has to stay in the card is the lava board identity file which would be generated from the web UI. There is no reason to keep rootfs/kernel/initrd there. This means that a single small card can fit all tests as well. It also means we can reset the master image (as currently it is writeable by the board and can be corrupted) before booting to ensure consistent behaviour. I did some work on that and I managed to boot panda over NFS. Ideally I want to boot over nbd (netblock device) which is much faster and with proper "master image" init script we can expose a single read only net block device to _all_ the boards.
This sounds good.
- With agent on each board, identity file on the SD card LAVA will know if cloning happened. We could do dynamic board detection (unplug the board -> it goes away, plug it back -> it shows up). We could move a board from system to system and have 0config transitions.
I'm not sure about this though. How do you tell the difference between the agent going away because booting into the test image failed and it being unplugged at a particular time?
- Dispatcher should drop all configuration files. Sure, it made sense 12 months ago when the idea was to run it standalone. Now all of that configuration should be in the database and should be provided by the scheduler to the dispatcher as a big serialized argument (or a file descriptor or a temporary file on disk). Setting up the dispatcher for a new instance is a pain, and unless you can copy stuff from the validation server and ask everyone around for help it's very hard to get right.
If you're using a type of board that has support 'upstream' it's actually pretty easy, you basically just need to create a file per device that indicates which type it is.
Apart from the fact that it's all a bit all over the place, I don't see how setting up things in the django admin interface is actually easier than setting it up in the filesystem.
Having said all of that, I agree with this goal :)
If master images could be constructed programmatically and with an agent on each "master image" lava would just get that configuration for free.
- We should drop conmux. As in the lab we already have TCP/IP sockets for the serial lines, we could just provide my example serial->tcp script as a lava-serial service that people with directly attached boards would use. We could get a similar lava-power service if that would make sense. The lava-serial service could be started as an instance for all USB/SERIAL adapters plugged in if we really wanted (hello upstart!). The lava-power service would be custom and would require some config, but the need for it is very rare: only the lab and I have something like that. Again, it should be instance based IMHO, so I can say 'start lava-power CONF=/etc/lava-power/magic-hack.conf' and see LAVA know about a power service. One could then say that a particular board uses particular serial and power services.
I agree here. conmux is useful, but we don't need the 'mux' part at all, and I find myself restarting the daemon all the damn time just to get it working again.
Cheers, mwh
On 08.12.2011 02:22, Michael Hudson-Doyle wrote:
On Wed, 7 Dec 2011 17:01:25 +0100, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
Hi, sorry for the topic, I wanted to catch your attention.
This is a quick brain dump based on my own observations/battle with master images last week.
- Running code via serial on the master image is a mess. It is very fragile.
Is it really? It's a bit of a pain, but it seems this part actually works ok for us. It also has the advantage that all the logs are in one place.
<<CONMUX DISCONNECTED>>
@#!@$ !@ serial output, without any sensible way to break it down (and I don't count matching "# echo LAVA DISPATCHER: now doing foo" as sensible).
A few random reasons for not using serial the way we do it today:
1) Random console messages break our system of tracking state and invoking commands.
2) We could put pppd on the serial line to get early networking for our agent, so we could assume we can download stuff in the master image without ethernet (not that it would be very useful at that speed). We could use TCP to have a networked API on the master image.
3) The serial console slows things down a LOT. Check how fast you can boot without a serial console (hint: much faster). We can still keep all the logs around by other means.
Also, I don't see anything in your proposals that would get us away from having to talk to the bootloader over the serial line. Also getting the boot log for a failed boot seems somehow essential and I don't know how we can do that without a serial connection (this is different from running commands, though).
I'm not saying "we should not use the serial line". I'm saying "we should not use the serial line for everything in the most crude form possible".
For the time being (until I patch u-boot to talk to LAVA) the boot loader will stay as is. For the vast amount of wall clock time spent after the boot loader we can do smarter things without waiting for the sun to eclipse and origen networking to work.
You can send a series of commands to a device and get return codes back, without parsing, reliably. You can do structured logging (where the device keeps logs for each command it receives), and it will never be confused by funny output patterns. We can ask the device to reboot while other tasks are hanging. We can download stuff without putting wget on the board and piping it to tar for crying out loud.
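To make the "return codes without parsing" and "structured logging" points concrete, here is a minimal sketch of what the command-running part of such an agent might look like. Everything in it (the `BoardAgent` and `CommandResult` names, the JSON log format) is an assumption made up for illustration; no such agent exists in LAVA today, and a real one would also need a network transport in front of it.

```python
import json
import subprocess
import sys
from dataclasses import asdict, dataclass


@dataclass
class CommandResult:
    """One structured log entry per command (hypothetical format)."""
    argv: list
    returncode: int
    stdout: str
    stderr: str


class BoardAgent:
    """Hypothetical on-board agent: runs commands, keeps a per-command log."""

    def __init__(self):
        self.log = []

    def run(self, argv, timeout=60):
        # The exit status comes straight from the OS, so nothing has to be
        # scraped out of a shared serial console stream.
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
        result = CommandResult(list(argv), proc.returncode, proc.stdout, proc.stderr)
        self.log.append(result)
        return result

    def dump_log(self):
        # Serialize the whole log so the dispatcher could fetch it in one go.
        return json.dumps([asdict(entry) for entry in self.log])
```

A caller would do something like `agent.run(["uname", "-a"])` and read `returncode` and `stdout` from the result, with each command's output kept separate from every other command's.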
We need an agent on the board instead of a random master image+serial shell. The agent will expose board identity, capabilities and standard APIs to LAVA (notably the dispatcher). The same API, if done sensibly, will work for software emulators and hardware boards. Agent API for a software emulator can do different things. Dispatcher should be based on agent API instead of ramming the serial line.
Well, I just rewrote chunks of the dispatcher to work for software emulators, albeit taking a different approach. Not sure the approach you propose is really any different, although perhaps it would be easier to distribute to different machines.
I don't want to deprecate your work. What I'm doing here (apart from hand waving and shouting) is discussing how it should work to be more reliable and future proof. I'm sure that implementing this will take a lot of time in practice and that dispatcher maintenance is as relevant as it was yesterday. I need to dig deeper into current dispatcher code to be able to judge this. Still I think that dispatcher is orthogonal. You can build the dispatcher on top of what it currently does or on top of a board API object. Both code variants can coexist for a long while.
- The master image, as we know it today, should be booting remotely.
The boot loader can stay on the board until we can push it over USB. The only thing that absolutely has to stay in the card is the lava board identity file which would be generated from the web UI. There is no reason to keep rootfs/kernel/initrd there. This means that a single small card can fit all tests as well. It also means we can reset the master image (as currently it is writeable by the board and can be corrupted) before booting to ensure consistent behaviour. I did some work on that and I managed to boot panda over NFS. Ideally I want to boot over nbd (netblock device) which is much faster and with proper "master image" init script we can expose a single read only net block device to _all_ the boards.
This sounds good.
- With an agent on each board and an identity file on the SD card, LAVA will know if cloning happened. We could do dynamic board detection (unplug the board -> it goes away, plug it back -> it shows up). We could move a board from system to system and have 0config transitions.
I'm not sure about this though. How do you tell the difference between the agent going away because booting into the test image failed and it being unplugged at a particular time?
Good point. The state of a device is a little bit more complicated than I presented. I wanted to point out that we could do discovery in a reliable way, something that we currently cannot do (and this prevents us from having foolproof provisioning of additional (or very first) devices).
For actual state we'd still have a few "in flux" moments like when doing a power cycle, transitioning from boot loader to kernel+userspace context etc.
As for totally unplugging devices: if you require a USB connection then you know your device went away ;-) That's what most people will do (one device + laptop) and that's what we'll eventually have to do (no dedicated serial / ethernet on devices, everything muxed through USB). Snowball is just a very simple example of that.
- Dispatcher should drop all configuration files. Sure, it made sense 12 months ago when the idea was to run it standalone. Now all of that configuration should be in the database and should be provided by the scheduler to the dispatcher as a big serialized argument (or a file descriptor or a temporary file on disk). Setting up the dispatcher for a new instance is a pain, and unless you can copy stuff from the validation server and ask everyone around for help it's very hard to get right.
If you're using a type of board that has support 'upstream' it's actually pretty easy, you basically just need to create a file per device that indicates which type it is.
That's good.
Apart from the fact that it's all a bit all over the place, I don't see how setting up things in the django admin interface is actually easier than setting it up in the filesystem.
It is not easier, except that you can do the UI in Django, and then touching the filesystem directly is not an option. I want to get to a point where I can click through some wizards to get my panda working without having to open a console. With a few extra services the system will even _tell_ you that you've got a panda plugged in that needs provisioning.
Having said all of that, I agree with this goal :)
If master images could be constructed programmatically and with an agent on each "master image" lava would just get that configuration for free.
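For the "big serialized argument (or a file descriptor ...)" idea above, one possible shape is for the scheduler to serialize the device configuration as JSON and pipe it to the dispatcher process on stdin. The sketch below only illustrates that hand-off: the config keys (`device`, `device_type`) and the stand-in dispatcher script are invented for the example and are not the real dispatcher's interface.

```python
import json
import subprocess
import sys

# Stand-in for the dispatcher entry point (hypothetical): it reads its whole
# configuration from stdin instead of from files on disk.
DISPATCHER_STUB = """
import json, sys
config = json.load(sys.stdin)
print("would run job on %s (%s)" % (config["device"], config["device_type"]))
"""


def run_dispatcher(config: dict) -> str:
    """Scheduler side: serialize the config and pipe it to the dispatcher."""
    proc = subprocess.run(
        [sys.executable, "-c", DISPATCHER_STUB],
        input=json.dumps(config),  # the "big serialized argument"
        capture_output=True,
        text=True,
        check=True,  # raise if the dispatcher exits non-zero
    )
    return proc.stdout.strip()
```

With this arrangement the dispatcher host needs no per-device files at all; whatever the database knows is what the dispatcher sees for that run.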
- We should drop conmux. As in the lab we already have TCP/IP sockets for the serial lines, we could just provide my example serial->tcp script as a lava-serial service that people with directly attached boards would use. We could get a similar lava-power service if that would make sense. The lava-serial service could be started as an instance for all USB/SERIAL adapters plugged in if we really wanted (hello upstart!). The lava-power service would be custom and would require some config, but the need for it is very rare: only the lab and I have something like that. Again, it should be instance based IMHO, so I can say 'start lava-power CONF=/etc/lava-power/magic-hack.conf' and see LAVA know about a power service. One could then say that a particular board uses particular serial and power services.
I agree here. conmux is useful, but we don't need the 'mux' part at all, and I find myself restarting the daemon all the damn time just to get it working again.
I had the same experience during my (very brief) contact with this.
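For readers who have not seen the serial->tcp script mentioned above, the core of such a bridge can be quite small. The following is not that script, just a rough sketch under stated assumptions: it serves a single client, takes an already-open file descriptor for the serial device, and leaves out baud rate setup (which would need `termios`). The function name `bridge` is made up for the example.

```python
import os
import select
import socket


def bridge(device_fd: int, listener: socket.socket) -> None:
    """Accept one TCP client and copy bytes both ways until either side closes."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            # Wait until the device or the TCP client has data to read.
            readable, _, _ = select.select([device_fd, conn], [], [])
            if device_fd in readable:
                data = os.read(device_fd, 4096)
                if not data:
                    return  # device side closed
                conn.sendall(data)
            if conn in readable:
                data = conn.recv(4096)
                if not data:
                    return  # client disconnected
                os.write(device_fd, data)
```

With a directly attached adapter, `device_fd` would come from something like `os.open("/dev/ttyUSB0", os.O_RDWR | os.O_NOCTTY)`, and one such process per adapter would match the "instance per adapter" idea.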
Thanks ZK
On Thu, 08 Dec 2011 04:05:51 +0100, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
On 08.12.2011 02:22, Michael Hudson-Doyle wrote:
On Wed, 7 Dec 2011 17:01:25 +0100, Zygmunt Krynicki <zygmunt.krynicki@linaro.org> wrote:
Hi, sorry for the topic, I wanted to catch your attention.
This is a quick brain dump based on my own observations/battle with master images last week.
- Running code via serial on the master image is a mess. It is very fragile.
Is it really? It's a bit of a pain, but it seems this part actually works ok for us. It also has the advantage that all the logs are in one place.
<<CONMUX DISCONNECTED>>
Well, that's conmux being stupid. I agree with you (even if others don't) that at least looking at not using conmux would make sense.
I think this might be as simple as including a command to run in the device config file such as "cu -l /dev/ttyUSB0 -s 115200" or "telnet 192.168.1.11 7003" (and a command to run to power cycle the board) and passing that to pexpect rather than "conmux-console $board".
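A rough sketch of that idea, assuming an INI-style device config with a made-up `connection_command` key (not an existing dispatcher setting): the command is read from the config, split into an argument list, and handed to `pexpect.spawn` in place of `conmux-console`. `pexpect` is a third-party package, so it is imported only where it is actually used.

```python
import configparser
import shlex

# Hypothetical device config; section and key names are illustrative only.
EXAMPLE_CONFIG = """
[panda01]
connection_command = telnet 192.168.1.11 7003

[panda02]
connection_command = cu -l /dev/ttyUSB0 -s 115200
"""


def connection_argv(config_text: str, device: str) -> list:
    """Return the configured console command for a device as an argv list."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return shlex.split(parser[device]["connection_command"])


def open_console(config_text: str, device: str):
    """Spawn the configured console command under pexpect."""
    import pexpect  # third-party; imported lazily so parsing works without it

    argv = connection_argv(config_text, device)
    return pexpect.spawn(argv[0], argv[1:])
```

A power-cycle command could be stored and parsed the same way under a second key.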
Let's talk about the other stuff in January :-)
Cheers, mwh