On Mon, Dec 02, 2013 at 01:51:22PM -0600, Matt Sealey wrote:
Here's where I think this whole thing falls down as being the weirdest possible implementation of this. It defies logic to put this information in the device tree /chosen node while also attempting to boot the kernel using an EFI stub; the stub is going to have this information because it is going to have the pointer to the System Table (since it was called by StartImage()). Why not stash the System Table pointer somewhere safe in the stub?
We do. In the DT.
The information in the device tree is all accessible from Boot Services, and it stays accessible as long as the System Table isn't being thrown away (my suggestion would be: stuff it in r2, set r1 = "EFI\0", then work with arch/arm/kernel/head{-common,}.S code to do the right thing)
You left out the bit of redefining the kernel boot protocol to permit calling it with caches, MMU and interrupts enabled - also known as before ExitBootServices().
It seems like the advantages of booting from UEFI and having all this information and API around are being thrown away very early, and only picked up again later - when it's no longer relevant - to gain access to the very minimal runtime services. What's missing is a UUID for a "Device Tree Blob" in the Configuration Table, so you can very easily go grab that information from the firmware.
Which is what we are going to implement anyway in order to permit firmware to supply DT hardware description in the same way as with ACPI. Yes, we could pass the system table pointer directly - but that doesn't get us the memory map.
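As an illustration only (not code from this series), pulling a DTB out of the configuration table could look something like the sketch below. The GUID value and the helper name are placeholders - whatever GUID ends up being registered for a DTB table is what would actually matter:

#include <Uefi.h>
#include <Library/BaseMemoryLib.h>      /* CompareGuid() */

/* Placeholder GUID for a "Device Tree Blob" configuration table entry. */
STATIC CONST EFI_GUID mDtbTableGuid = {
  0xb1b621d5, 0xf19c, 0x41a5,
  { 0x83, 0x0b, 0xd9, 0x15, 0x2c, 0x69, 0xaa, 0xe0 }
};

/* Walk the configuration table and return the DTB pointer, or NULL. */
STATIC VOID *
FindDtbConfigTable (IN EFI_SYSTEM_TABLE *SystemTable)
{
  UINTN Index;

  for (Index = 0; Index < SystemTable->NumberOfTableEntries; Index++) {
    EFI_CONFIGURATION_TABLE *Entry = &SystemTable->ConfigurationTable[Index];

    if (CompareGuid (&Entry->VendorGuid, &mDtbTableGuid)) {
      return Entry->VendorTable;
    }
  }
  return NULL;
}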
As implemented, these patches employ a very long-winded and complex method of recovering UEFI after throwing the system table pointer away early in boot, and then implement an EFI calling convention which isn't strictly necessary according to the UEFI spec - the question is, is this a workaround for SetVirtualAddressMap() not actually doing the right thing on ARM UEFI implementations? If you can't guarantee that most of the calls from Boot Services or Runtime Services are going to allow this, then having any UEFI support in the kernel at all seems rather weird.
No, it is a workaround for it being explicitly against the kernel boot protocol (not to mention slightly hairy) to enter head.S with MMU and caches enabled and interrupts firing.
The EFI calling convention (as pointed out in the patch itself) is there in order to not have to duplicate code already there for x86.
What I'm worried about is that this is basically a hack to try and shoehorn an existing UEFI implementation to an existing Linux boot method - and implements it in a way nobody is ever going to be able to justify improving. Part of the reason the OpenFirmware CIF got thrown away early in SPARC/PowerPC boot (after "flattening" the device tree using the CIF calls to parse it out) was because you had to disable the MMU, caches, interrupts etc. which caused all kinds of slow firmware code to be all kinds of extra-slow.
I prefer to see it as a way to not reinvent things that do not need reinventing, while not adding more special-case code to the kernel.
What that meant is nobody bothered to implement working, re-entrant, re-locatable firmware to a great degree. This ended up being a self-fulfilling prophecy of "don't trust the bootloader" and "get rid of it as soon as we can," which essentially meant Linux never took advantage of the resources available. In OF's case, the CIF sucked by specification. In UEFI's case here, it's been implemented in Linux in such a way that guarantees poor-performing firmware code with huge penalties to call them, which isn't even required by UEFI if the earlier boot code did the right things in the first place.
I don't follow. In which way does this implementation result in poor performance or reduced functionality?
We deal with a highly quirky set of requirements for calling SetVirtualAddressMap() in a clunky way - after which calls into UEFI are direct and cachable.
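For reference, the call being talked about looks roughly like this - a sketch only, with the wrapper and variable names being placeholders rather than patch code. VirtualMap is a copy of the memory map in which every descriptor carrying EFI_MEMORY_RUNTIME has had VirtualStart filled in by the OS, and the sizes/version come straight from GetMemoryMap():

#include <Uefi.h>

STATIC EFI_STATUS
SwitchToVirtualMode (
  IN EFI_SYSTEM_TABLE      *SystemTable,
  IN UINTN                  MemoryMapSize,
  IN UINTN                  DescriptorSize,
  IN UINT32                 DescriptorVersion,
  IN EFI_MEMORY_DESCRIPTOR *VirtualMap
  )
{
  /* After this succeeds, runtime services must be called through the
     virtual addresses recorded in VirtualMap. */
  return SystemTable->RuntimeServices->SetVirtualAddressMap (
           MemoryMapSize, DescriptorSize, DescriptorVersion, VirtualMap);
}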
/ Leif
On Mon, Dec 2, 2013 at 3:07 PM, Leif Lindholm leif.lindholm@linaro.org wrote:
On Mon, Dec 02, 2013 at 01:51:22PM -0600, Matt Sealey wrote:
Here's where I think this whole thing falls down as being the weirdest possible implementation of this. It defies logic to put this information in the device tree /chosen node while also attempting to boot the kernel using an EFI stub; the stub is going to have this information because it is going to have the pointer to the System Table (since it was called by StartImage()). Why not stash the System Table pointer somewhere safe in the stub?
We do. In the DT.
Hang on... see way below about "reinventing the wheel"
The information in the device tree is all accessible from Boot Services, and it stays accessible as long as the System Table isn't being thrown away (my suggestion would be: stuff it in r2, set r1 = "EFI\0", then work with arch/arm/kernel/head{-common,}.S code to do the right thing)
You left out the bit of redefining the kernel boot protocol to permit calling it with caches, MMU and interrupts enabled - also known as before ExitBootServices().
And that's a horrible idea because of what?
What's evident here is there could be two major ways to generate an image that boots from a UEFI implementation:

* one whereby UEFI is jostled or coerced by a second stage bootloader to load a plain zImage, and you lose all information about UEFI except in the event that that information is preserved in the device tree by the firmware
* one whereby a 'stock' UEFI is used and it boots only on UEFI, because it is in a format very reasonably only capable of being booted by UEFI, and, subordinately,
  - one where that plain zImage got glued to an EFI stub just like the decompressor is glued to the Image
  - one where the kernel needs to be built with support for UEFI, and that somewhat changes the boot path
By the time we get half-way through arch/arm/kernel/head.S the cache and MMU have been turned off and on and off again by the decompressor, and after a large amount of guesswork and arbitrary restriction-based implementation, there's no guarantee that the kernel hasn't been decompressed over some important UEFI feature or some memory hasn't been trashed. You can't make that guarantee because by entering the plain zImage, you forfeited that information. This is, at worst, going to be lots of blank screens and blinking serial console prompts and little more than frustration..
Most of the guessing is ideally not required to be a guess at all; the restrictions are purely there to deal with the lack of trust in the bootloader environment. Why can't we trust UEFI? Or at least hold it to a higher standard: if someone ships a broken UEFI - if they screw up a feature or ship a horrible bug - then hold the fact that Linux doesn't boot on it, and that it's their fault, over their head. It actually works these days: Linux actually has "market share," and companies really do go out of their way to rescue their "image" and resolve the situation when someone blogs about a serious UEFI bug on their $1300 laptops, or even $300 tablets.
Which is what we are going to implement anyway in order to permit firmware to supply DT hardware description in the same way as with ACPI. Yes, we could pass the system table pointer directly - but that doesn't get us the memory map.
Boot Services gives you the ability to get the memory map - and the kinds of things that live in those spots in the memory map. It's at least a better guess than "I am located at a specific place and can infer from linker data and masking off the bottom bits that there's probably this amount of RAM that starts at this location or thereabouts". It at least gives the ability to 'allocate' memory to put the page table in, instead of having a firmware call walk all over it, or having the kernel walk over some parts of firmware, or even not have to do anything except link in a decompressor (eh, sure, it means duplicating decompressor code in some cases, but I also don't think it's a sane requirement to include the entire decompression suite in the kernel proper if it only gets used once at early boot).
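To make that concrete, here is a sketch of the sort of thing Boot Services already offers - the function and variable names are mine, and error handling is trimmed:

#include <Uefi.h>

STATIC EFI_STATUS
GrabMapAndScratch (
  IN  EFI_BOOT_SERVICES      *BS,
  OUT EFI_MEMORY_DESCRIPTOR **Map,
  OUT UINTN                  *MapSize,
  OUT UINTN                  *MapKey,
  OUT UINTN                  *DescSize,
  OUT UINT32                 *DescVer,
  OUT EFI_PHYSICAL_ADDRESS   *Scratch
  )
{
  EFI_STATUS Status;

  /* First call just reports the required buffer size. */
  *MapSize = 0;
  Status = BS->GetMemoryMap (MapSize, NULL, MapKey, DescSize, DescVer);
  if (Status != EFI_BUFFER_TOO_SMALL)
    return Status;

  *MapSize += 2 * *DescSize;            /* slack for the allocations below */
  Status = BS->AllocatePool (EfiLoaderData, *MapSize, (VOID **)Map);
  if (EFI_ERROR (Status))
    return Status;

  /* 4 pages = 16 KiB, enough for a classic ARM first-level page table
     (a real implementation would also have to handle its 16 KiB
     alignment requirement, which AllocatePages alone doesn't give you). */
  Status = BS->AllocatePages (AllocateAnyPages, EfiLoaderData, 4, Scratch);
  if (EFI_ERROR (Status))
    return Status;

  /* Re-fetch the map so it reflects the allocations just made. */
  return BS->GetMemoryMap (MapSize, *Map, MapKey, DescSize, DescVer);
}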
I prefer to see it as a way to not reinvent things that do not need reinventing, while not adding more special-case code to the kernel.
Isn't putting the System Table pointer in the DT specifically reinventing the UEFI boot process?
Booting from UEFI is a special case in itself.. the EFI stub here is putting a round block in a square hole.
There are two much, much better solutions: put the round block in a round hole, or put a square block in that square hole. We could do so much better than gluing the round block into the square hole.
What that meant is nobody bothered to implement working, re-entrant, re-locatable firmware to a great degree. This ended up being a self-fulfilling prophecy of "don't trust the bootloader" and "get rid of it as soon as we can," which essentially meant Linux never took advantage of the resources available. In OF's case, the CIF sucked by specification. In UEFI's case here, it's been implemented in Linux in such a way that guarantees poor-performing firmware code with huge penalties to call them, which isn't even required by UEFI if the earlier boot code did the right things in the first place.
I don't follow. In which way does this implementation result in poor performance or reduced functionality?
I believe what I am trying to object to is this weird process of getting to a state where you can get to UEFI, and why anyone would bother gluing the existing Linux kernel image to the back of an externally-built stub, only to do some really quite obnoxious tricks to get it through a decompressor and then through kernel setup head - tricks that make a bunch of assumptions about the bootloader interface - then try to recover the information that got thrown away and THEN attempt to reinstate some kind of UEFI functionality.
If your platform has UEFI, then your platform has UEFI - if you built a multiplatform kernel that needs to boot on U-Boot, then you glued an EFI stub to it to make it boot. At some point between the stub and the runtime services driver, you're going through 10,000 lines of code with the information that it *is* running on top of UEFI completely lost to the boot process.
I believe I am also objecting to the idea that the way this is BEST implemented is to take a stock zImage (decompressor+Image payload) and glue a stub in front to resolve the interface issue when the implication is extra complication to the boot process.
By not actually using it, nobody actually bothered to improve the firmware or fix bugs in the places where it could have been used. This ends up as a self-fulfilling prophecy of exhausting amounts of broken and unoptimized firmware.
Nobody in firmware-land has any impetus to fix those bugs or add useful optional features.
By "by not actually using it," I do mean the case where someone has UEFI and somehow boots a plain zImage and a DTB modified to include the System Table pointer. Because that door is completely wide open..
Personally I think the well-known environment you have when StartImage() jumps to your EFI application entry point is a great place to simplify the decompressor by integrating it into the stub.
At the point you then jump into kernel/head.S - you can still know you're on UEFI, with data in r1 and r2 strongly implying this is UEFI, and you can branch to a much, MUCH simpler path for initialization where quite a lot of the work it's trying to do may have already been performed by the stub, and quite a lot of the bare-metalling doesn't need to be done.
I am sure, even if modifying head.S for any reason other than to fix a bug or implement some architectural requirement is somehow frowned upon, that comparing r1 to a known constant machine id and branching to a uefi_start() (which, at that point, may as well be a C function, if the stub saw fit to keep around/throw in an early stack) is not going to cause anyone any problems (even if it does add 4 instructions to the entry and slow everyone else down by a nanosecond or two).
Everybody keeps their absolutely fixed entry point to the image proper, that way, so you can still glue your stub (with or without the decompressor as part of the stub) to the front with no changes to the build process for the image or the code path for non-UEFI.. one conditional branch and you gain a much, much easier to maintain boot process..
We deal with a highly quirky set of requirements for calling SetVirtualAddressMap() in a clunky way - after which calls into UEFI are direct and cachable.
The kernel boot process has been derived from years upon years of trial and error and engineering, so it does seem a shame to go do things a different way; you would be right to say it would be a shame not to promote code reuse of the existing process by leaving the zImage stuff and core kernel boot untouched, and just working on the glue and some not-so-early-init code.
But what it does is make the boot process *more* complicated than its already complicated implementation, in the face of a very nice specification of the correct way to deal with booting something from a UEFI implementation..
What might be a much better route to take could be to define a nice, shiny new way of getting Linux to the point that it has full control over its own destiny which does a hell of a lot less, with a less schizophrenic view of using UEFI or not.
Ta, Matt Sealey neko@bakuhatsu.net
On Wed, 2013-12-04 at 15:06 -0600, Matt Sealey wrote:
On Mon, Dec 2, 2013 at 3:07 PM, Leif Lindholm leif.lindholm@linaro.org wrote:
On Mon, Dec 02, 2013 at 01:51:22PM -0600, Matt Sealey wrote:
Here's where I think this whole thing falls down as being the weirdest possible implementation of this. It defies logic to put this information in the device tree /chosen node while also attempting to boot the kernel using an EFI stub; the stub is going to have this information because it is going to have the pointer to the System Table (since it was called by StartImage()). Why not stash the System Table pointer somewhere safe in the stub?
We do. In the DT.
Hang on... see way below about "reinventing the wheel"
The information in the device tree is all accessible from Boot Services, and it stays accessible as long as the System Table isn't being thrown away (my suggestion would be: stuff it in r2, set r1 = "EFI\0", then work with arch/arm/kernel/head{-common,}.S code to do the right thing)
You left out the bit of redefining the kernel boot protocol to permit calling it with caches, MMU and interrupts enabled - also known as before ExitBootServices().
And that's a horrible idea because of what?
Talk about reinventing the wheel.
I look at it like this. UEFI applications have a specific boot protocol. The kernel has a different boot protocol. The purpose of the stub is to go from the UEFI protocol to the kernel protocol. The kernel protocol doesn't currently include an explicit way to pass UEFI info (system table and memory map). It does have a way to pass a DT. Much like x86 and ia64 pass the UEFI info in an already existing boot_params block, arm and arm64 pass that info in the device tree. Not changing the kernel boot protocol seems like the simplest and best way to get the job done. Maybe x86 and now arm are going about it the wrong way and should be doing it differently, but so far, I'm not convinced that is the case.
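As a sketch of what that translation amounts to in the stub - the property names and the helper below are illustrative, not necessarily what the series uses:

#include <libfdt.h>
#include <stdint.h>

/* Record the UEFI handover state in /chosen so the kernel proper can
   find the system table and memory map after ExitBootServices(). */
static int stash_uefi_in_chosen(void *fdt, uint64_t system_table,
                                uint64_t mmap_start, uint64_t mmap_size,
                                uint64_t desc_size, uint32_t desc_ver)
{
        int node = fdt_subnode_offset(fdt, 0, "chosen");

        if (node < 0)
                node = fdt_add_subnode(fdt, 0, "chosen");
        if (node < 0)
                return node;

        fdt_setprop_u64(fdt, node, "linux,uefi-system-table", system_table);
        fdt_setprop_u64(fdt, node, "linux,uefi-mmap-start", mmap_start);
        fdt_setprop_u64(fdt, node, "linux,uefi-mmap-size", mmap_size);
        fdt_setprop_u64(fdt, node, "linux,uefi-mmap-desc-size", desc_size);
        return fdt_setprop_u32(fdt, node, "linux,uefi-mmap-desc-ver", desc_ver);
}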
--Mark
On Wed, Dec 04, 2013 at 03:06:47PM -0600, Matt Sealey wrote:
there's no guarantee that the kernel hasn't been decompressed over some important UEFI feature or some memory hasn't been trashed. You can't make that guarantee because by entering the plain zImage, you forfeited that information.
The stub is responsible for ensuring that the compressed kernel is loaded at a suitable address. Take a look at efi_relocate_kernel().
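For anyone following along, the gist of it is roughly the following - a simplified sketch of the idea, not the actual efi_relocate_kernel() code, and the names are illustrative:

#include <Uefi.h>

STATIC EFI_STATUS
RelocateBelow (
  IN  EFI_BOOT_SERVICES     *BS,
  IN  VOID                  *Image,
  IN  UINTN                  ImageSize,
  IN  UINTN                  ReservedSize,   /* room to decompress into */
  IN  EFI_PHYSICAL_ADDRESS   MaxAddr,
  OUT EFI_PHYSICAL_ADDRESS  *NewAddr
  )
{
  EFI_STATUS Status;
  UINTN      Pages = (ReservedSize + EFI_PAGE_SIZE - 1) / EFI_PAGE_SIZE;

  /* Ask the firmware for a region at or below MaxAddr, so nothing
     UEFI cares about gets decompressed over. */
  *NewAddr = MaxAddr;
  Status = BS->AllocatePages (AllocateMaxAddress, EfiLoaderData,
                              Pages, NewAddr);
  if (EFI_ERROR (Status))
    return Status;

  BS->CopyMem ((VOID *)(UINTN)*NewAddr, Image, ImageSize);
  return EFI_SUCCESS;
}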
Most of the guessing is ideally not required to be a guess at all; the restrictions are purely there to deal with the lack of trust in the bootloader environment. Why can't we trust UEFI? Or at least hold it to a higher standard: if someone ships a broken UEFI - if they screw up a feature or ship a horrible bug - then hold the fact that Linux doesn't boot on it, and that it's their fault, over their head. It actually works these days: Linux actually has "market share," and companies really do go out of their way to rescue their "image" and resolve the situation when someone blogs about a serious UEFI bug on their $1300 laptops, or even $300 tablets.
Yeah, that hasn't actually worked out too well for us.
On Wed, Dec 4, 2013 at 4:44 PM, Matthew Garrett mjg59@srcf.ucam.org wrote:
On Wed, Dec 04, 2013 at 03:06:47PM -0600, Matt Sealey wrote:
there's no guarantee that the kernel hasn't been decompressed over some important UEFI feature or some memory hasn't been trashed. You can't make that guarantee because by entering the plain zImage, you forfeited that information.
The stub is responsible for ensuring that the compressed kernel is loaded at a suitable address. Take a look at efi_relocate_kernel().
My objection is that the suitable address is based on a restriction that booting from UEFI doesn't have, and throws away information UEFI provides that would make the requirements of head.S (both of them) easier to get around. The kernel doesn't need to be within a particular range of the start of memory, nor does the device tree or ramdisk require being in a particular place. What the code before efi_relocate_kernel does is allocate a maximum-sized buffer to safely decompress in, which is just a gross way to do it, then crosses its fingers based on the way it has historically worked - while you might want to assume that the decompression process is quite well defined and reliable, I keep seeing patches come in that stop it from doing weird unsavory behavior - for example decompressing over its own page table.
The decompressor - and the kernel head it jumps to after decompression - *guess* all the information UEFI could have provided and completely regenerate the environment for the decompressor itself (stacks, hacky memory allocations, cache on, off, on, off, on... fudging locations of page tables, zreladdr fixup, low level debug message output, and in the context of UEFI - reimplementation of memcpy, memset). That forfeits a more controlled and lean boot process to capitulate to a historical legacy. Since you're taking over the decompressor head.S anyway, why not take control of the decompression process?
It sets up a page table location the hard way (as above.. also patched recently not to decompress over its own page table). It doesn't need to relocate itself past the end of the decompressed image. It doesn't need to set up the C environment - UEFI did that for it. It makes assumptions about the stack and hacks memory allocations for the decompression.. it turns the cache on, decompresses, then turns it off again... you can just walk through the code under the EFI stub in compressed/head.S and see all this can just fall away.
There's one immediate advantage too, if it's actually implemented and working: for kernel images that are compressed using the standard UEFI compression method, no actual decompression code needs to be added to the stub, and the functionality gets the exact length of the required decompression buffer. That doesn't reduce flexibility in kernel compression as long as there is still the possibility of adding additional compression code to the stub.
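Roughly what that would look like, assuming the platform actually exposes the decompress protocol - a sketch with placeholder names and the error handling on the allocations trimmed:

#include <Uefi.h>
#include <Protocol/Decompress.h>

STATIC EFI_STATUS
SizeAndDecompress (
  IN  EFI_BOOT_SERVICES *BS,
  IN  VOID              *Source,
  IN  UINT32             SourceSize,
  OUT VOID             **Dest,
  OUT UINT32            *DestSize
  )
{
  EFI_DECOMPRESS_PROTOCOL *Dcp;
  VOID                    *Scratch;
  UINT32                   ScratchSize;
  EFI_STATUS               Status;

  Status = BS->LocateProtocol (&gEfiDecompressProtocolGuid, NULL,
                               (VOID **)&Dcp);
  if (EFI_ERROR (Status))
    return Status;

  /* GetInfo() hands back the exact output and scratch sizes, so the
     buffers can be allocated instead of guessed at. */
  Status = Dcp->GetInfo (Dcp, Source, SourceSize, DestSize, &ScratchSize);
  if (EFI_ERROR (Status))
    return Status;

  BS->AllocatePool (EfiLoaderData, *DestSize, Dest);
  BS->AllocatePool (EfiLoaderData, ScratchSize, &Scratch);

  return Dcp->Decompress (Dcp, Source, SourceSize, *Dest, *DestSize,
                          Scratch, ScratchSize);
}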
The second immediate advantage is that the EFI stub/decompressor can actually verify that the *decompressed* image meets Secure Boot requirements.
Once you get past the decompressor and into the kernel proper's head.S - creating the page tables (again), turning the MMU on, pv table patching.. if you still had the information around, that gets simpler too.
Grant suggested I should propose some patches; sure, if I'm not otherwise busy.
Maybe the Linaro guys can recommend a platform (real or emulated) that would be best to test it on with the available UEFI?
Most of the guessing is ideally not required to be a guess at all; the restrictions are purely there to deal with the lack of trust in the bootloader environment. Why can't we trust UEFI? Or at least hold it to a higher standard: if someone ships a broken UEFI - if they screw up a feature or ship a horrible bug - then hold the fact that Linux doesn't boot on it, and that it's their fault, over their head. It actually works these days: Linux actually has "market share," and companies really do go out of their way to rescue their "image" and resolve the situation when someone blogs about a serious UEFI bug on their $1300 laptops, or even $300 tablets.
Yeah, that hasn't actually worked out too well for us.
Aside from teething problems caused by a rush to market ;)
For the "ARM server market" rather than the "get the cheapest tablet/ultrabook out of the door that runs Windows 8/RT" I am sure this is going to get to be VERY important for vendors to take into account. Imagine if Dell shipped a *server* where Linux would brick it out of the box just for setting a variable.. however, if it works the day they ship the server, and Linux gains better support for UEFI booting which breaks the server in question, that's our fault for not doing it in the right way in the first place, and Dell can be just as angry at us as we would be at them. Vendors won't test code that doesn't exist for obvious reasons.
This is what I was trying to get at about vendors not updating their firmware to support the more firmware-aware method if the "ditch firmware early" method worked well enough for them (which means the functionality in the firmware never gets stressed, and the self-fulfilling prophecy of untrustworthy firmware vendors persists). That firmware quality assurance - if not the code itself - will trickle down to consumer tablets and ARM thin-laptop kinds of devices.
Ta, Matt Sealey neko@bakuhatsu.net
On Fri, 6 Dec 2013 11:20:45 -0600, Matt Sealey neko@bakuhatsu.net wrote:
On Wed, Dec 4, 2013 at 4:44 PM, Matthew Garrett mjg59@srcf.ucam.org wrote:
On Wed, Dec 04, 2013 at 03:06:47PM -0600, Matt Sealey wrote:
Grant suggested I should propose some patches; sure, if I'm not otherwise busy.
Maybe the Linaro guys can recommend a platform (real or emulated) that would be best to test it on with the available UEFI?
Roy Franz (cc'd) has got UEFI running under QEMU. A few modifications were required to both stock UEFI and QEMU. I'm not sure what the status of mainlining those patches is. I think there are still a few things that Roy has to fix, but you should be able to get the current patches from him to get going.
g.
On Tue, Dec 10, 2013 at 4:30 AM, Grant Likely grant.likely@linaro.org wrote:
On Fri, 6 Dec 2013 11:20:45 -0600, Matt Sealey neko@bakuhatsu.net wrote:
On Wed, Dec 4, 2013 at 4:44 PM, Matthew Garrett mjg59@srcf.ucam.org wrote:
On Wed, Dec 04, 2013 at 03:06:47PM -0600, Matt Sealey wrote:
Grant suggested I should propose some patches; sure, if I'm not otherwise busy.
Maybe the Linaro guys can recommend a platform (real or emulated) that would be best to test it on with the available UEFI?
Roy Franz (cc'd) has got UEFI running under QEMU. A few modifications were required to both stock UEFI and QEMU. I'm not sure what the status of mainlining those patches is. I think there are still a few things that Roy has to fix, but you should be able to get the current patches from him to get going.
g.
Hi Grant and Matt,
I have put together a quick wiki page describing the current status, with git trees for UEFI and QEMU, and instructions for running the model. I just whipped this up now, so it is pretty basic, but should have all the required information.
https://wiki.linaro.org/LEG/Engineering/Kernel/UEFI/VersatileExpress/QEMU
Please let me know if you have any questions.
Thanks, Roy
On Tue, 10 Dec 2013 10:29:34 -0800, Roy Franz roy.franz@linaro.org wrote:
On Tue, Dec 10, 2013 at 4:30 AM, Grant Likely grant.likely@linaro.org wrote:
On Fri, 6 Dec 2013 11:20:45 -0600, Matt Sealey neko@bakuhatsu.net wrote:
On Wed, Dec 4, 2013 at 4:44 PM, Matthew Garrett mjg59@srcf.ucam.org wrote:
On Wed, Dec 04, 2013 at 03:06:47PM -0600, Matt Sealey wrote:
Grant suggested I should propose some patches; sure, if I'm not otherwise busy.
Maybe the Linaro guys can recommend a platform (real or emulated) that would be best to test it on with the available UEFI?
Roy Franz (cc'd) has got UEFI running under QEMU. A few modifications were required to both stock UEFI and QEMU. I'm not sure what the status of mainlining those patches is. I think there are still a few things that Roy has to fix, but you should be able to get the current patches from him to get going.
g.
Hi Grant and Matt,
I have put together a quick wiki page describing the current status, with git trees for UEFI and QEMU, and instructions for running the model. I just whipped this up now, so it is pretty basic, but should have all the required information.
https://wiki.linaro.org/LEG/Engineering/Kernel/UEFI/VersatileExpress/QEMU
Please let me know if you have any questions.
Thanks Roy.
g.
On Wed, 4 Dec 2013 15:06:47 -0600, Matt Sealey neko@bakuhatsu.net wrote:
If your platform has UEFI, then your platform has UEFI - if you built a multiplatform kernel that needs to boot on U-Boot, then you glued an EFI stub to it to make it boot. At some point between the stub and the runtime services driver, you're going through 10,000 lines of code with the information that it *is* running on top of UEFI completely lost to the boot process.
I believe I am also objecting to the idea that the way this is BEST implemented is to take a stock zImage (decompressor+Image payload) and glue a stub in front to resolve the interface issue when the implication is extra complication to the boot process.
Adding UEFI support to an existing image type was a design goal when we started. Having yet another image format which is not compatible with existing firmware adds yet another barrier to migrating from U-Boot to UEFI, or to supporting multiplatforms.
g.
On Wed, Dec 04, 2013 at 03:06:47PM -0600, Matt Sealey wrote:
By the time we get half-way through arch/arm/kernel/head.S the cache and MMU have been turned off and on and off again by the decompressor, and after a large amount of guesswork and arbitrary restriction-based implementation, there's no guarantee that the kernel hasn't been decompressed over some important UEFI feature or some memory hasn't been trashed. You can't make that guarantee because by entering the plain zImage, you forfeited that information. This is, at worst, going to be lots of blank screens and blinking serial console prompts and little more than frustration..
So, Grant covered the reason _why_ we coexist with zImage, so I won't go into that. I will however point out that we are explicitly using the UEFI interfaces to allocate the regions the zImage will decompress into. This isn't guesswork, and has in fact already turned up issues with a couple of UEFI board ports that reserved memory near 0 (regions which were indeed previously being silently overwritten by the kernel decompression).
We _are_ planning to do more development for subsequent patches, making more use of the UEFI memory map. And by subsequent, I mean hopefully in time for 3.14. I sneakily included this in the version of uefi.txt sent out for separate review in early November: http://permalink.gmane.org/gmane.linux.kernel.efi/2657, but not in the one included with this patch set (since the code isn't there yet). But we considered it more important to get the basic support ready first.
At that point, you will see the stub reading the dram_base from the UEFI memory map rather than DT, and memblock_init getting its input from there too.
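Something along these lines, presumably - a sketch only, not the forthcoming patches, and the helper name is made up:

#include <Uefi.h>

/* Take the lowest EfiConventionalMemory region as dram_base, walking
   the map by DescSize since descriptors may be larger than the struct. */
STATIC EFI_PHYSICAL_ADDRESS
FindDramBase (
  IN EFI_MEMORY_DESCRIPTOR *Map,
  IN UINTN                  MapSize,
  IN UINTN                  DescSize
  )
{
  EFI_PHYSICAL_ADDRESS  Base = ~(EFI_PHYSICAL_ADDRESS)0;
  UINT8                *Cur  = (UINT8 *)Map;
  UINT8                *End  = Cur + MapSize;

  for (; Cur < End; Cur += DescSize) {
    EFI_MEMORY_DESCRIPTOR *Desc = (EFI_MEMORY_DESCRIPTOR *)Cur;

    if (Desc->Type == EfiConventionalMemory && Desc->PhysicalStart < Base)
      Base = Desc->PhysicalStart;
  }
  return Base;
}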
Regards,
Leif