On Mon, 16 Jul 2018 at 18:57, Grant Likely grant.likely@arm.com wrote:
On 16/07/2018 12:35, Neil Williams wrote:
On Mon, 16 Jul 2018 at 18:11, Daniel Thompson <daniel.thompson@linaro.org> wrote:
Hi Folks
Sorry if you have already seen this, but for those who haven't: last Friday Grant pushed out version 0.6 of EBBR. Details below, but the summary is that it is time for feedback.
This v0.6 release of EBBR is a pre-release document. The contents are not final. The purpose of this release is to raise awareness of the EBBR project, and to solicit feedback on the current draft. Please do read and provide comments on the boot-architecture@lists.linaro.org mailing list.
We had a training session at HKG18 looking at best practices for automating devices, and there are aspects of any boot architecture which will make or break automation. Has anyone considered adding to the EBBR based on the objectives of supporting automation of compliant devices?
http://connect.linaro.org/resource/hkg18/hkg18-tr10/
That session was based on this white paper:
https://collaborate.linaro.org/display/CTT/Automation+and+hardware+design
but that URL is still restricted to people with a login on collaborate, so I blogged the content here: https://linux.codehelp.co.uk/automation-and-risk.html (It is a long read)
There is a lot of interest in automating testing of all parts of the boot process on a range of boards. Linaro has a lot of experience of automating a range of devices and ARM is building a similar data set using the same tools.
Useful. I'll read the blog post. Thanks.
Principal elements would be:
- Reliability -
- all identifiers are persistent across updates of the software, reboots etc.
- Reliable reporting of errors.
- Outputting sensible, unique, strings when performing a CPU reset
- Uniqueness - all identifiers are unique and exposed to other parts of
the boot architecture, e.g. NIC needs to be visible to bootloaders like Grub.
- Scalability - avoid need for customised hardware to automate the device
- Deployment - consider how to deploy new software to the device under
automation.
EBBR talks about UEFI Reset but doesn't require that a reset is identifiable when it happens in automation. A simple string like the U-Boot support for "Resetting CPU ..." is enough for the automation to detect that the system we tried to deploy has not booted and that something else may be about to get booted in its place.
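As a rough sketch of what the automation side could do with such a string: the pattern below is the "Resetting CPU ..." text quoted above, while the function name and the sample console lines are hypothetical and only there for illustration.

```python
import re

# The reset message quoted in this thread. Other firmware may print
# something different, which is exactly why a stable string matters.
RESET_PATTERN = re.compile(r"Resetting CPU")


def find_reset(console_lines):
    """Return the index of the first line announcing a reset, or None."""
    for index, line in enumerate(console_lines):
        if RESET_PATTERN.search(line):
            return index
    return None


# Hypothetical serial capture, not taken from a real device.
sample_log = [
    "Starting kernel ...",
    "Resetting CPU ...",
    "Booting from fallback",
]

reset_at = find_reset(sample_log)
if reset_at is not None:
    print(f"reset seen at line {reset_at}: deployed system did not boot")
```

Without a fixed, unique message the automation can only infer a reset from a timeout, which is slower and less reliable.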
It isn't something that has come up yet, but not, I think, for lack of interest. Much of the focus so far has been getting a baseline of functionality defined.
It would be easy to add requirements on boot messages. The process is completely open, so you can propose some language that you think would be suitable. When you do, keep in mind that there are implementation choices in terms of console output. Most likely the vast majority of EBBR platforms will use some form of serial port as the primary console, but that won't always be the case. The proposed text should take into account platforms that use a graphical console instead of a serial port, even if that just means you use language something like "if the primary console is a serial port, then the firmware shall..."
* output stable, reliable and unique messages for all supportable error conditions and failures, including resets
* any changes to messages are to be highlighted in published changelogs
* full changelogs must be available with every revision of firmware
* firmware shall provide a means to write out the version string for the current build, alongside the files to be deployed to the device (this allows testing to reliably identify each build)
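To illustrate why the version string helps, here is a minimal sketch of the check a test job could make, assuming the build ships its version string next to the deployed files and the firmware prints the same string on the console. The function name and the example strings are hypothetical.

```python
def booted_expected_build(console_output: str, version_string: str) -> bool:
    """True if the version string published with the build appears on the console."""
    version = version_string.strip()
    # An empty version string would match anything, so treat it as a failure.
    return bool(version) and version in console_output


# Hypothetical values: a version string as it might be read from a file
# shipped with the build, and a made-up console capture.
published_version = "example-fw 1.2.3\n"
console_capture = "example-fw 1.2.3 (built for illustration)\nDRAM: 2 GiB\n"

print(booted_expected_build(console_capture, published_version))
```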
... any firmware which does not provide a serial console will not be able to support automation.
GUI consoles are impossible to automate reliably. All sorts of complex ideas can be proposed but the evidence is that none of those are reliable or scalable. As covered in the training session and the post, Keep It Simple - complexity undermines automation because reliability decreases out of proportion to any increase in complexity.
If automation is to be an objective, then there needs to be some level of serial support. It doesn't have to be primary but it will need to be available, functional and reliable.
What kind of testing is required needs to be considered too - if all that's required is to watch UEFI go past and then interact with something pre-configured and persistent within the firmware, like Grub, then everything is easy. If there is any requirement to interact with UEFI under automation (adding boot options, setting variables etc.), then a graphical console will completely block automation. I don't think that this is a useful sacrifice. I would push for graphical consoles to be ruled out of the specification as actively harmful to any kind of automation and only to be made available IF there is a serial alternative properly supported. (That serial alternative could, of course, be a BMC - anything so long as there is no need to interact with the graphical console under automation.)
I would also take a look at the UEFI spec first to see if anything is prescribed there. EBBR is intended to inform reading of the UEFI spec. I don't want to take on content that should be in UEFI.
Someone needs to act as an intermediary for that - I am not going to be putting changes into the UEFI spec. (One reason I CC'd Steve.)
This highlights a problem for this kind of effort - the people with the automation experience do not generally have low level knowledge of things like UEFI. We need those with that knowledge to engage with automation and work out what changes are actually feasible.
Runtime variable access plays into the need to ensure that the network PHY is exposed to UEFI applications (like grubaarch64.efi)
How EBBR talks about runtime variables still needs some work. There are three scenarios that need to be considered:
- SetVariable() doesn't work at all -- rare
- SetVariable() works at boot time, but not runtime -- many platforms
- SetVariable() works as defined in UEFI.
UEFI doesn't say anything about #1 or #2. I think GetVariable() should work in all three cases, but SUSE currently assumes #2 if GetVariable() returns nothing at runtime. Text dealing with these scenarios needs to be written before v0.7 can be released.
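The three scenarios can be written down as a small classification, which may help when drafting the text. This is only a sketch: the enum, the function and its two boolean inputs are hypothetical, and how a platform or OS would actually determine those inputs is not covered here.

```python
from enum import Enum


class SetVariableSupport(Enum):
    NONE = 1        # scenario 1: SetVariable() doesn't work at all
    BOOT_ONLY = 2   # scenario 2: works at boot time, but not at runtime
    FULL = 3        # scenario 3: works as defined in UEFI


def classify(boot_time_ok: bool, runtime_ok: bool) -> SetVariableSupport:
    """Map the two observations onto the three scenarios listed above."""
    if boot_time_ok and runtime_ok:
        return SetVariableSupport.FULL
    if boot_time_ok:
        return SetVariableSupport.BOOT_ONLY
    if not runtime_ok:
        return SetVariableSupport.NONE
    # Runtime-only support is not one of the three scenarios in this thread.
    raise ValueError("runtime-only SetVariable() is not a listed scenario")


print(classify(boot_time_ok=True, runtime_ok=False).name)
```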
Partitioning is a frequent problem when AOSP requires multiple partitions and OE testing requires fewer. Devices need to support switching partition tables without needing a full recovery deployment or the deployment of new bits of firmware.
Part of the goal of EBBR is to separate the firmware bits (partitions) from the OS bits. The intent is to be able to deploy firmware to a board regardless of the OS, and then be able to (re)install any OS without it disturbing the firmware partitions.
Thanks for the comments.
Cheers, g.