ARM VM System Specification
===========================
Goal
----
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
Image format
------------
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying with the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry must use the partition type GUID C12A7328-F81F-11D2-BA4B-00A0C93EC93B, and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 of [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
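As a non-normative illustration of the layout above, the following Python sketch locates the ESP in a raw disk image by walking the GPT and matching the partition type GUID. The image name and the 512-byte logical sector size are assumptions, and the sketch stops at the partition table; the boot file (\EFI\BOOT\BOOTAA64.EFI or BOOTARM.EFI) would then be looked up inside the FAT file system, for example by loop-mounting the partition or with mtools.

    import struct
    import uuid

    ESP_TYPE_GUID = uuid.UUID("C12A7328-F81F-11D2-BA4B-00A0C93EC93B")
    SECTOR = 512  # assumption: 512-byte logical blocks

    def find_esp(image_path):
        """Return (entry number, first LBA, last LBA) of the ESP, or None."""
        with open(image_path, "rb") as f:
            f.seek(1 * SECTOR)                      # GPT header lives at LBA 1
            hdr = f.read(92)
            if hdr[0:8] != b"EFI PART":
                raise ValueError("no GPT header found")
            entries_lba = struct.unpack_from("<Q", hdr, 72)[0]   # partition entry array LBA
            num_entries = struct.unpack_from("<I", hdr, 80)[0]
            entry_size = struct.unpack_from("<I", hdr, 84)[0]
            f.seek(entries_lba * SECTOR)
            for i in range(num_entries):
                entry = f.read(entry_size)
                type_guid = uuid.UUID(bytes_le=entry[0:16])      # GUIDs are mixed-endian on disk
                if type_guid == ESP_TYPE_GUID:
                    first_lba, last_lba = struct.unpack_from("<QQ", entry, 32)
                    return i + 1, first_lba, last_lba
        return None

    print(find_esp("guest-disk.img"))               # hypothetical image name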
Virtual Firmware
----------------
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation support persistent environment storage for its virtual firmware, in order to enable likely use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
Hardware Description
--------------------
The Linux kernel's proper entry point always takes a pointer to an FDT, regardless of the boot mechanism, firmware, and hardware description method. Even on real hardware which only supports ACPI and UEFI, the kernel entry point will still receive a pointer to a simple FDT, generated by the Linux kernel UEFI stub, containing a pointer to the UEFI system table. The kernel can then discover ACPI from the system tables. The presence of ACPI vs. FDT is therefore always itself discoverable, through the FDT.
Therefore, the VM implementation must provide, through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
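As a non-normative illustration of the two options above, the following Python sketch inspects an FDT blob and reports whether the /chosen UEFI hand-off properties appear in its strings block. The "linux,uefi-system-table" property name follows the Linux kernel's arm/arm64 UEFI documentation and the .dtb path is made up; treat both as assumptions rather than something this spec mandates.

    import struct

    FDT_MAGIC = 0xD00DFEED

    def describe_fdt(blob):
        """Crude check: does the FDT carry a UEFI system-table pointer?"""
        magic, totalsize, off_struct, off_strings = struct.unpack_from(">4I", blob, 0)
        if magic != FDT_MAGIC:
            raise ValueError("not a flattened device tree")
        size_strings = struct.unpack_from(">I", blob, 32)[0]    # size_dt_strings header field
        strings = blob[off_strings:off_strings + size_strings]
        if b"linux,uefi-system-table" in strings:               # assumed property name
            return "UEFI hand-off properties present (system table reachable, ACPI discoverable)"
        return "no UEFI hand-off properties (device tree alone describes the system)"

    with open("guest.dtb", "rb") as f:                          # hypothetical dtb path
        print(describe_fdt(f.read()))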
VM Platform
-----------
The specification does not mandate any specific memory map. The guest OS must be able to enumerate all processing elements, devices, and memory through HW description data (FDT, ACPI) or a bus-specific mechanism such as PCI.
The virtual platform must support at least one of the following ARM execution states:
  (1) aarch32 virtual CPUs on aarch32 physical CPUs
  (2) aarch32 virtual CPUs on aarch64 physical CPUs
  (3) aarch64 virtual CPUs on aarch64 physical CPUs
It is recommended to support both (2) and (3) on aarch64 capable physical systems.
The virtual hardware platform must provide a number of mandatory peripherals:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits the number of virtual CPUs to 8 cores; newer GIC versions remove this limitation.
The ARM virtual timer and counter should be available to the VM as per the ARM Generic Timers specification in the ARM ARM [1].
A hotpluggable bus to support hotplug of at least block and network devices. Suitable buses include a virtual PCIe bus and the Xen PV bus.
We make the following recommendations for the guest OS kernel:
The guest OS must include support for GICv2, to maintain compatibility with older VM implementations, as well as for any available newer version of the GIC architecture.
It is strongly recommended to include support for all available (block, network, console, balloon) virtio-pci, virtio-mmio, and Xen PV drivers in the guest OS kernel or initial ramdisk.
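As a non-normative aid for the recommendation above, a sketch like the following could scan a guest kernel .config for the relevant driver options. The symbol names are assumptions based on common kernel versions and are not defined by this spec.

    # Symbol names are assumptions; adjust them for the kernel version at hand.
    RECOMMENDED = [
        "CONFIG_VIRTIO_PCI", "CONFIG_VIRTIO_MMIO",
        "CONFIG_VIRTIO_BLK", "CONFIG_VIRTIO_NET",
        "CONFIG_VIRTIO_CONSOLE", "CONFIG_VIRTIO_BALLOON",
        "CONFIG_XEN", "CONFIG_XEN_BLKDEV_FRONTEND",
        "CONFIG_XEN_NETDEV_FRONTEND", "CONFIG_HVC_XEN",
    ]

    def check_config(path):
        """Report which recommended options are built in (y) or modular (m)."""
        enabled = set()
        with open(path) as f:
            for line in f:
                line = line.strip()
                if "=" in line and not line.startswith("#"):
                    sym, val = line.split("=", 1)
                    if val in ("y", "m"):
                        enabled.add(sym)
        for sym in RECOMMENDED:
            print("%-32s %s" % (sym, "ok" if sym in enabled else "MISSING"))

    check_config(".config")   # path is an assumption; point it at the guest kernel build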
Other common peripherals for block devices, networking, and more can (and typically will) be provided, but OS software written and compiled to run on ARM VMs cannot make any assumptions about which variations of these should exist or which implementation they use (e.g. VirtIO or Xen PV). See "Hardware Description" above.
Note that this platform specification is separate from the Linux kernel concept of mach-virt, which merely specifies a machine model driven purely from device tree, but does not mandate any peripherals or have any mention of ACPI.
References
----------
[1]: The ARM Architecture Reference Manual, ARMv8, Issue A.b http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.b/index...
[2]: ARM Server Base System Architecture http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0029/index.ht...
[3]: The ARM Generic Interrupt Controller Architecture Specifications v2.0 http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.b/index...
[4]: http://www.secretlab.ca/archives/27
[5]: https://git.linaro.org/people/leif.lindholm/linux.git/blob/refs/heads/uefi-f...
[6]: UEFI Specification 2.4 Revision A http://www.uefi.org/sites/default/files/resources/2_4_Errata_A.pdf
Hi Christoffer,
On 02/26/2014 01:34 PM, Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Would you consider including simulators/emulators as well, such as QEMU in TCG mode?
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
Virtual Firmware
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation supports persistent environment storage for virtual firmware implementation in order to ensure probable use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
Can you elaborate on the motivation for requiring that the kernel be stuffed into a disk image and for requiring such a heavyweight bootloader/firmware? By doing so you would seem to exclude those requiring an optimized boot process.
Hardware Description
The Linux kernel's proper entry point always takes a pointer to an FDT, regardless of the boot mechanism, firmware, and hardware description method. Even on real hardware which only supports ACPI and UEFI, the kernel entry point will still receive a pointer to a simple FDT, generated by the Linux kernel UEFI stub, containing a pointer to the UEFI system table. The kernel can then discover ACPI from the system tables. The presence of ACPI vs. FDT is therefore always itself discoverable, through the FDT.
Therefore, the VM implementation must provide through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
VM Platform
The specification does not mandate any specific memory map. The guest OS must be able to enumerate all processing elements, devices, and memory through HW description data (FDT, ACPI) or a bus-specific mechanism such as PCI.
The virtual platform must support at least one of the following ARM execution states: (1) aarch32 virtual CPUs on aarch32 physical CPUs (2) aarch32 virtual CPUs on aarch64 physical CPUs (3) aarch64 virtual CPUs on aarch64 physical CPUs
It is recommended to support both (2) and (3) on aarch64 capable physical systems.
The virtual hardware platform must provide a number of mandatory peripherals:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits the number of virtual CPUs to 8 cores; newer GIC versions remove this limitation.
The ARM virtual timer and counter should be available to the VM as per the ARM Generic Timers specification in the ARM ARM [1].
A hotpluggable bus to support hotplug of at least block and network devices. Suitable buses include a virtual PCIe bus and the Xen PV bus.
Is VirtIO hotplug capable? Over PCI or MMIO transports or both?
We make the following recommendations for the guest OS kernel:
The guest OS must include support for GICv2 and any available newer version of the GIC architecture to maintain compatibility with older VM implementations.
It is strongly recommended to include support for all available (block, network, console, balloon) virtio-pci, virtio-mmio, and Xen PV drivers in the guest OS kernel or initial ramdisk.
I would love to eventually see some defconfigs for this sort of thing.
Other common peripherals for block devices, networking, and more can (and typically will) be provided, but OS software written and compiled to run on ARM VMs cannot make any assumptions about which variations of these should exist or which implementation they use (e.g. VirtIO or Xen PV). See "Hardware Description" above.
Note that this platform specification is separate from the Linux kernel concept of mach-virt, which merely specifies a machine model driven purely from device tree, but does not mandate any peripherals or have any mention of ACPI.
Well, the commit message for it said it mandated a GIC and architected timers.
Regards, Christopher
On Wed, Feb 26, 2014 at 02:27:40PM -0500, Christopher Covington wrote:
Hi Christoffer,
On 02/26/2014 01:34 PM, Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Would you consider including simulators/emulators as well, such as QEMU in TCG mode?
Yes, though I think KVM and Xen are the most common use cases for this. In fact, for KVM, most of the work to support this would be in QEMU anyhow, and whether you choose to enable KVM or not shouldn't make any difference.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
Virtual Firmware
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation supports persistent environment storage for virtual firmware implementation in order to ensure probable use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
Can you elaborate on the motivation for requiring that the kernel be stuffed into a disk image and for requiring such a heavyweight bootloader/firmware? By doing so you would seem to exclude those requiring an optimized boot process.
What's the alternative? Shipping kernels externally and loading them externally? Sure, you can do that, but then distros can't upgrade the kernel themselves, and you have to come up with a convention for how to ship kernels, initrds, etc.
This works well on x86 today and reflects how most people expect ARM server hardware to behave as well.
Hardware Description
The Linux kernel's proper entry point always takes a pointer to an FDT, regardless of the boot mechanism, firmware, and hardware description method. Even on real hardware which only supports ACPI and UEFI, the kernel entry point will still receive a pointer to a simple FDT, generated by the Linux kernel UEFI stub, containing a pointer to the UEFI system table. The kernel can then discover ACPI from the system tables. The presence of ACPI vs. FDT is therefore always itself discoverable, through the FDT.
Therefore, the VM implementation must provide through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
VM Platform
The specification does not mandate any specific memory map. The guest OS must be able to enumerate all processing elements, devices, and memory through HW description data (FDT, ACPI) or a bus-specific mechanism such as PCI.
The virtual platform must support at least one of the following ARM execution states: (1) aarch32 virtual CPUs on aarch32 physical CPUs (2) aarch32 virtual CPUs on aarch64 physical CPUs (3) aarch64 virtual CPUs on aarch64 physical CPUs
It is recommended to support both (2) and (3) on aarch64 capable physical systems.
The virtual hardware platform must provide a number of mandatory peripherals:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits the number of virtual CPUs to 8 cores; newer GIC versions remove this limitation.
The ARM virtual timer and counter should be available to the VM as per the ARM Generic Timers specification in the ARM ARM [1].
A hotpluggable bus to support hotplug of at least block and network devices. Suitable buses include a virtual PCIe bus and the Xen PV bus.
Is VirtIO hotplug capable? Over PCI or MMIO transports or both?
VirtIO devices attached to a PCIe bus are hotpluggable; the emulated PCIe bus itself would not have anything to do with VirtIO, except that VirtIO devices can hang off of it, AFAIU.
We make the following recommendations for the guest OS kernel:
The guest OS must include support for GICv2 and any available newer version of the GIC architecture to maintain compatibility with older VM implementations.
It is strongly recommended to include support for all available (block, network, console, balloon) virtio-pci, virtio-mmio, and Xen PV drivers in the guest OS kernel or initial ramdisk.
I would love to eventually see some defconfigs for this sort of thing.
Agreed, I think it's beyond the scope of this spec though.
Other common peripherals for block devices, networking, and more can (and typically will) be provided, but OS software written and compiled to run on ARM VMs cannot make any assumptions about which variations of these should exist or which implementation they use (e.g. VirtIO or Xen PV). See "Hardware Description" above.
Note that this platform specification is separate from the Linux kernel concept of mach-virt, which merely specifies a machine model driven purely from device tree, but does not mandate any peripherals or have any mention of ACPI.
Well, the commit message for it said it mandated a GIC and architected timers.
Haven't we been down that road before? I think everyone pretty much agrees this is the definition of mach-virt today, but if this note causes people to start splitting hairs, I can remove the paragraph.
Thanks, -Christoffer
Hi Christoffer,
On 02/26/2014 02:51 PM, Christoffer Dall wrote:
On Wed, Feb 26, 2014 at 02:27:40PM -0500, Christopher Covington wrote:
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
Virtual Firmware
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation supports persistent environment storage for virtual firmware implementation in order to ensure probable use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
Can you elaborate on the motivation for requiring that the kernel be stuffed into a disk image and for requiring such a heavyweight bootloader/firmware? By doing so you would seem to exclude those requiring an optimized boot process.
What's the alternative? Shipping kernels externally and loading them externally? Sure you can do that, but then distros can't upgrade the kernel themselves, and you have to come up with a convention for how to ship kernels, initrd's etc.
The self-hosted upgrades use case makes sense. I can imagine using a pass-through or network filesystem to do it in the case of external loading, something like the following. In the case of P9, the tag could be the same as the GPT GUID. Everything could still be in the /EFI/BOOT directory. The kernel Image could be at BOOT(ARM|AA64).IMG, the zImage at .ZMG, and the initramfs at .RFS. It's more work for distros to support multiple upgrade methods, though, so maybe those who want an optimized boot process should make an external loader capable of carving the necessary components out of a VFAT filesystem inside a GPT partitioned image instead.
VM Platform
The specification does not mandate any specific memory map. The guest OS must be able to enumerate all processing elements, devices, and memory through HW description data (FDT, ACPI) or a bus-specific mechanism such as PCI.
The virtual platform must support at least one of the following ARM execution states: (1) aarch32 virtual CPUs on aarch32 physical CPUs (2) aarch32 virtual CPUs on aarch64 physical CPUs (3) aarch64 virtual CPUs on aarch64 physical CPUs
It is recommended to support both (2) and (3) on aarch64 capable physical systems.
The virtual hardware platform must provide a number of mandatory peripherals:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits the number of virtual CPUs to 8 cores; newer GIC versions remove this limitation.
The ARM virtual timer and counter should be available to the VM as per the ARM Generic Timers specification in the ARM ARM [1].
A hotpluggable bus to support hotplug of at least block and network devices. Suitable buses include a virtual PCIe bus and the Xen PV bus.
Is VirtIO hotplug capable? Over PCI or MMIO transports or both?
VirtIO devices attached on a PCIe bus are hotpluggable, the emulated PCIe bus itself would not have anything to do with virtio, except that virtio devices can hang off of it. AFAIU.
So network/block devices provided only as memory-mapped peripherals (like SMSC or PL SD/MMC) or over VirtIO-MMIO won't meet the specification? Is PCI/VirtIO-PCI on ARM production ready? What's the motivation for requiring hotplug?
Thanks, Christopher
On Thu, Feb 27, 2014 at 08:12:35AM -0500, Christopher Covington wrote:
Hi Christoffer,
On 02/26/2014 02:51 PM, Christoffer Dall wrote:
On Wed, Feb 26, 2014 at 02:27:40PM -0500, Christopher Covington wrote:
[...]
The virtual hardware platform must provide a number of mandatory peripherals:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits the number of virtual CPUs to 8 cores; newer GIC versions remove this limitation.
The ARM virtual timer and counter should be available to the VM as per the ARM Generic Timers specification in the ARM ARM [1].
A hotpluggable bus to support hotplug of at least block and network devices. Suitable buses include a virtual PCIe bus and the Xen PV bus.
Is VirtIO hotplug capable? Over PCI or MMIO transports or both?
VirtIO devices attached on a PCIe bus are hotpluggable, the emulated PCIe bus itself would not have anything to do with virtio, except that virtio devices can hang off of it. AFAIU.
So network/block device only as memory mapped peripherals (like SMSC or PL SD/MMC) or over VirtIO-MMIO won't meet the specification? Is PCI/VirtIO-PCI on ARM production ready? What's the motivation for requiring hotplug?
Platform devices that don't sit on any 'real bus' are generally not hotpluggable.
VM management systems such as OpenStack make heavy use of hotplug to add storage to your VMs, for example.
This spec does not prohibit devices over MMIO or virtio-mmio; in fact, it encourages guest kernels to include support for them. But it mandates that there be some hotpluggable bus, so that a very common VM use case is supported for ARM VMs.
PCI for ARM is not ready yet, but people are working on it. That should not have any bearing on what the right decision for this spec is though - it's all early stage at this point.
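As a non-normative illustration of that use case (not something this spec or thread defines), hot-adding a virtio block device to a QEMU/KVM guest typically goes through QMP. The socket path and the drive/device ids below are made up, and the sketch assumes the VM was started with a QMP socket and a spare -drive if=none,id=drive0,... backend.

    import json
    import socket

    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect("/tmp/qmp.sock")                 # made-up QMP socket path
    rd = sock.makefile("r")

    def qmp(cmd, **args):
        msg = {"execute": cmd}
        if args:
            msg["arguments"] = args
        sock.sendall((json.dumps(msg) + "\n").encode())
        return json.loads(rd.readline())

    print(rd.readline())                          # QMP greeting banner
    print(qmp("qmp_capabilities"))                # enter command mode
    print(qmp("device_add", driver="virtio-blk-pci", drive="drive0", id="vblk1"))

A Xen toolstack offers the equivalent through its own interface (e.g. block-attach); either way, the guest just sees a new device appear on the hotpluggable bus.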
-Christoffer
On Thu, 27 Feb 2014 08:12:35 -0500, Christopher Covington cov@codeaurora.org wrote:
Hi Christoffer,
On 02/26/2014 02:51 PM, Christoffer Dall wrote:
On Wed, Feb 26, 2014 at 02:27:40PM -0500, Christopher Covington wrote:
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
Virtual Firmware
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation supports persistent environment storage for virtual firmware implementation in order to ensure probable use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
Can you elaborate on the motivation for requiring that the kernel be stuffed into a disk image and for requiring such a heavyweight bootloader/firmware? By doing so you would seem to exclude those requiring an optimized boot process.
This spec doesn't exclude or prevent VMs from doing that if the user wants to. It is about specifying the base requirements for a disk image to be portable. Any disk image conforming to this spec should boot on any VM conforming to the spec.
What's the alternative? Shipping kernels externally and loading them externally? Sure you can do that, but then distros can't upgrade the kernel themselves, and you have to come up with a convention for how to ship kernels, initrd's etc.
The self-hosted upgrades use case makes sense. I can imagine using a pass-through or network filesystem to do it in the case of external loading, something like the following.
Network booting is actually something else that is already supported and doesn't use the filesystem protocol at all. DHCP+TFTP to obtain the 2nd stage loader (which can do whatever it wants) is the preferred way to do things. The problem with trying to provide a network filesystem
Section 3.4.2 (Boot via LOAD_FILE_PROTOCOL) and 3.4.2.1 (Network Booting) cover that scenario. Network boot uses most of the Preboot eXecution Environment (PXE) spec, except that it retrieves an EFI executable instead of a PXE executable. It is expected that the PXE server will then supply all the boot configuration to the client. The exact same thing can be done with VMs... all of which is out of scope for this section because it is talking about disk images! :-)
g.
On Wed, 26 Feb 2014 10:34:54 -0800 Christoffer Dall christoffer.dall@linaro.org wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
Virtual Firmware
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation supports persistent environment storage for virtual firmware implementation in order to ensure probable use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
I disagree here; 32-bit should use U-Boot and look like an existing 32-bit system. There is no reason why 32-bit ARM should look different as a guest than it does as a host.
Dennis
[why did you drop everyone from cc here?]
On 26 February 2014 11:42, Dennis Gilmore dennis@gilmore.net.au wrote:
On Wed, 26 Feb 2014 10:34:54 -0800 Christoffer Dall christoffer.dall@linaro.org wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
Virtual Firmware
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation supports persistent environment storage for virtual firmware implementation in order to ensure probable use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
I disagree here, 32 bit should use u-boot and look like an existing 32 bit system. there is no reason why 32 bit arm should look different as a guest from the host
Why? It will look different if you use virtio devices and mach-virt. Sure, you can emulate a vexpress or something else real, but you have to ask yourself why. The main use case we have for 32-bit ARM at the moment is networking, and those workloads need virtio block, virtio net, and device passthrough.
Also, I'm afraid "u-boot and look like an existing 32 bit system" is not much of a spec. How does a distro vendor ship an image based on that description that they can be sure will boot?
If you can write up a more concrete suggestion for something u-boot based that you'd like to have considered instead, we can see what people say.
Personally I think keeping things uniform across both 32-bit and 64-bit is better, and the GPT/EFI image is a modern standard that should work well.
-Christoffer
On Wed, 26 Feb 2014 11:56:53 -0800 Christoffer Dall christoffer.dall@linaro.org wrote:
[why did you drop everyone from cc here?]
Standard reply-to-list behavior; I would appreciate it if you followed it.
On 26 February 2014 11:42, Dennis Gilmore dennis@gilmore.net.au wrote:
On Wed, 26 Feb 2014 10:34:54 -0800 Christoffer Dall christoffer.dall@linaro.org wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
Virtual Firmware
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation supports persistent environment storage for virtual firmware implementation in order to ensure probable use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
I disagree here, 32 bit should use u-boot and look like an existing 32 bit system. there is no reason why 32 bit arm should look different as a guest from the host
Why? It will look different if you use virtio devices and mach-virt. Sure, you can emulate a vexpress or something else real, but you have to ask yourself why. The main use case we have for 32 bit ARM at the moment is networking use cases and those use cases need virtio block, virtio net, and device passthrough.
Who says this is the main use case? So long as we have a dtb that specifies things correctly, it won't matter.
Also, I'm afraid "u-boot and look like an existing 32 bit system" is not much of a spec. How does a distro vendor ship an image based on that description that they can be sure will boot?
Based on the work I have been doing to make a standard boot environment, you pass in the U-Boot binary and things just work: the configuration is loaded from inside the image and the guest acts just like any other system. Really, UEFI is major overkill here and a massive divergence from the real world. What is the argument that justifies the divergence?
If you can write up a more concrete suggestion for something u-boot based that you'd like to have considered instead, we can see what people say.
All the work I have been doing and bringing up on this list for the last six months to make a standard boot environment uses the extlinux configuration format from syslinux.
Personally I think keeping things uniform across both 32-bit and 64-bit is better, and the GPT/EFI image is a modern standard that should work well.
It means that installers will need special code paths to support being installed into virt guests, which is not sustainable or supportable, as hardware won't work the same way.
Dennis
On Wed, Feb 26, 2014 at 9:15 PM, Dennis Gilmore dennis@gilmore.net.au wrote:
On Wed, 26 Feb 2014 11:56:53 -0800 Christoffer Dall christoffer.dall@linaro.org wrote:
[why did you drop everyone from cc here?]
standard reply to list behavior, I would appreciate if you followed it.
Not on the Linaro, infradead or vger lists. We preserve cc's here, always have.
On 26 February 2014 11:42, Dennis Gilmore dennis@gilmore.net.au wrote:
On Wed, 26 Feb 2014 10:34:54 -0800 Christoffer Dall christoffer.dall@linaro.org wrote:
Also, I'm afraid "u-boot and look like an existing 32 bit system" is not much of a spec. How does a distro vendor ship an image based on that description that they can be sure will boot?
based on the work to make a standard boot environment I have been working on, pass in the u-boot binary and things will work by loading config from inside the image and acting just like any system. really UEFI is major overkill here and a massive divergence from the real world. What is the argument that justifies the divergence?
That's what I used to say all the time until I actually looked at it. It isn't the horrid monster that many of us feared it would be. There is a fully open source implementation hosted on sourceforge which is what I would expect most VM vendors to use directly. It isn't unreasonably large and it implements sane behaviour.
Remember, we are talking about what is needed to make a portable VM ecosystem. The folks working on the UEFI spec have spent a lot of time thinking about how to choose what image to boot from a disk and the spec is well defined in this regard. That aspect has not been U-Boot's focus and U-Boot isn't anywhere near as mature as UEFI in that regard (nor would I expect it to be; embedded has never had the same incentive to create portable boot images as general purpose machines).
Also, specifying UEFI for this spec does not in any way prevent someone from running U-Boot in their VM, or executing the kernel directly. This spec is about a platform for portable images and it is important to as much as possible specify things like firmware interfaces without a whole lot of options. Other use-cases can freely disregard the spec and run whatever they want.
Personally I think keeping things uniform across both 32-bit and 64-bit is better, and the GPT/EFI image is a modern standard that should work well.
It means that installers will need special code paths to support being installed into virt guests, which is not sustainable or supportable, as hardware won't work the same way.
Installers already have the EFI code paths; the kernel patches for both 32-bit and 64-bit ARM are in flight and will get merged soon. The GRUB patches are done and merged. Installers will work exactly the same way on real hardware with EFI and on VMs with EFI. It will also work exactly the same way across x86, ARM, and ARM64. What part is unsustainable?
g.
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
I would prefer that we stay as close as possible to the SBSA for individual hardware components, and only stray from it when there is a strong reason. A pl011 subset doesn't sound like a significant problem to implement, especially as the SBSA makes the DMA part of it optional. Can you elaborate on which hypervisor would have a problem with that?
Hardware Description
The Linux kernel's proper entry point always takes a pointer to an FDT, regardless of the boot mechanism, firmware, and hardware description method. Even on real hardware which only supports ACPI and UEFI, the kernel entry point will still receive a pointer to a simple FDT, generated by the Linux kernel UEFI stub, containing a pointer to the UEFI system table. The kernel can then discover ACPI from the system tables. The presence of ACPI vs. FDT is therefore always itself discoverable, through the FDT.
Therefore, the VM implementation must provide through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
However, as ACPI will not be supported by arm32, not having the complete FDT will prevent you from running a 32-bit guest on a 64-bit hypervisor, which I consider an important use case.
VM Platform
The specification does not mandate any specific memory map. The guest OS must be able to enumerate all processing elements, devices, and memory through HW description data (FDT, ACPI) or a bus-specific mechanism such as PCI.
The virtual platform must support at least one of the following ARM execution states: (1) aarch32 virtual CPUs on aarch32 physical CPUs (2) aarch32 virtual CPUs on aarch64 physical CPUs (3) aarch64 virtual CPUs on aarch64 physical CPUs
It is recommended to support both (2) and (3) on aarch64 capable physical systems.
Isn't this more of a CPU capabilities question? Or maybe you should just add 'if aarch32 mode is supported by the host CPU'.
The virtual hardware platform must provide a number of mandatory peripherals:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits the number of virtual CPUs to 8 cores; newer GIC versions remove this limitation.
The ARM virtual timer and counter should be available to the VM as per the ARM Generic Timers specification in the ARM ARM [1].
A hotpluggable bus to support hotplug of at least block and network devices. Suitable buses include a virtual PCIe bus and the Xen PV bus.
I think you should specify exactly what you want PCIe to look like, if present. Otherwise you can get wildly incompatible bus discovery.
Note that this platform specification is separate from the Linux kernel concept of mach-virt, which merely specifies a machine model driven purely from device tree, but does not mandate any peripherals or have any mention of ACPI.
Did you notice we are removing mach-virt now? Probably no point mentioning it here.
Arnd
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
I would prefer if we can stay as close as possible to SBSA for individual hardware components, and only stray from it when there is a strong reason. pl011-subset doesn't sound like a significant problem to implement, especially as SBSA makes the DMA part of that optional. Can you elaborate on what hypervisor would have a problem with that?
The Xen guys are hard-set on not supporting a pl011. If we can convince them or force it upon them, I'm ok with that.
I agree we should stay close to the SBSA, but I think there are considerations for VMs beyond the SBSA that warrants this spec.
Hardware Description
The Linux kernel's proper entry point always takes a pointer to an FDT, regardless of the boot mechanism, firmware, and hardware description method. Even on real hardware which only supports ACPI and UEFI, the kernel entry point will still receive a pointer to a simple FDT, generated by the Linux kernel UEFI stub, containing a pointer to the UEFI system table. The kernel can then discover ACPI from the system tables. The presence of ACPI vs. FDT is therefore always itself discoverable, through the FDT.
Therefore, the VM implementation must provide through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
The most common response I've been getting so far is that people generally want their VMs to look close to the real thing, but I'm not sure how valid an argument that is.
Some people feel strongly about this and seem to think that ARMv8 kernels will only work with ACPI in the future...
Another case is that it's a good development platform. I know nothing of developing and testing ACPI, so I won't judge one way or the other.
However, as ACPI will not be supported by arm32, not having the complete FDT will prevent you from running a 32-bit guest on a 64-bit hypervisor, which I consider an important use case.
Agreed, I didn't appreciate that fact. Hmmm, we need to consider that case.
VM Platform
The specification does not mandate any specific memory map. The guest OS must be able to enumerate all processing elements, devices, and memory through HW description data (FDT, ACPI) or a bus-specific mechanism such as PCI.
The virtual platform must support at least one of the following ARM execution states: (1) aarch32 virtual CPUs on aarch32 physical CPUs (2) aarch32 virtual CPUs on aarch64 physical CPUs (3) aarch64 virtual CPUs on aarch64 physical CPUs
It is recommended to support both (2) and (3) on aarch64 capable physical systems.
Isn't this more of a CPU capabilities question? Or maybe you should just add 'if aarch32 mode is supported by the host CPU'.
The recommendation is to tell people to actually have a -aarch32 (or whatever it would be called) that works in their VM implementation. This can certainly be reworded.
The virtual hardware platform must provide a number of mandatory peripherals:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits the number of virtual CPUs to 8 cores; newer GIC versions remove this limitation.
The ARM virtual timer and counter should be available to the VM as per the ARM Generic Timers specification in the ARM ARM [1].
A hotpluggable bus to support hotplug of at least block and network devices. Suitable buses include a virtual PCIe bus and the Xen PV bus.
I think you should specify exactly what you want PCIe to look like, if present. Otherwise you can get wildly incompatible bus discovery.
As soon as there is more clarity on what it will actually look like, I'll be happy to add this. I'm afraid my PCIe understanding is too piecemeal to fully grasp this, so concrete suggestions for the text would be much appreciated.
Note that this platform specification is separate from the Linux kernel concept of mach-virt, which merely specifies a machine model driven purely from device tree, but does not mandate any peripherals or have any mention of ACPI.
Did you notice we are removing mach-virt now? Probably no point mentioning it here.
Yes, I'm aware. I've just heard people say "why do we need this, isn't mach-virt all we need", and therefore I added the note.
I can definitely get rid of this paragraph in the future if it causes more harm than good.
Thanks! -Christoffer
On Wednesday 26 February 2014 12:05:37 Christoffer Dall wrote:
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
The most common response I've been getting so far is that people generally want their VMs to look close to the real thing, but not sure how valid an argument that is.
Some people feel strongly about this and seem to think that ARMv8 kernels will only work with ACPI in the future...
That is certainly a misconception that has caused a lot of trouble. We will certainly keep supporting FDT boot in ARMv8 indefinitely, and I expect that most systems will not use ACPI at all.
The case for ACPI is really SBSA compliant servers, where ACPI serves to abstract the hardware differences to let you boot an OS that does not know about the system details for things like power management.
For embedded systems that are not SBSA compliant, using ACPI doesn't gain you anything and causes a lot of headache, so we won't do that.
Another case is that it's a good development platform. I know nothing of developing and testing ACPI, so I won't judge one way or the other.
The interesting aspects of developing and testing ACPI are all related to the hardware specific parts. Testing ACPI on a trivial virtual machine doesn't gain you much once the basic support is there.
VM Platform
The specification does not mandate any specific memory map. The guest OS must be able to enumerate all processing elements, devices, and memory through HW description data (FDT, ACPI) or a bus-specific mechanism such as PCI.
The virtual platform must support at least one of the following ARM execution states: (1) aarch32 virtual CPUs on aarch32 physical CPUs (2) aarch32 virtual CPUs on aarch64 physical CPUs (3) aarch64 virtual CPUs on aarch64 physical CPUs
It is recommended to support both (2) and (3) on aarch64 capable physical systems.
Isn't this more of a CPU capabilities question? Or maybe you should just add 'if aarch32 mode is supported by the host CPU'.
The recommendation is to tell people to actually have a -aarch32 (or whatever it would be called) that works in their VM implementation. This can certainly be reworded.
Yes, the intention is good; it just won't work on a few systems where the CPU designers took the shortcut of implementing only the 64-bit instructions. Apparently those systems will exist, but I expect them to be the exception, and I don't know if they will support virtualization.
I think you should specify exactly what you want PCIe to look like, if present. Otherwise you can get wildly incompatible bus discovery.
As soon as there is more clarity on what it will actually look like, I'll be happy to add this. I'm afraid my PCIe understanding is too piecemeal to fully grasp this, so concrete suggestions for the text would be much appreciated.
Will Deacon is currently prototyping a PCI model using kvmtool, trying to make it as simple as possible and compliant with the SBSA. How about we wait for the next version of that to see if we're happy with it, and then figure out whether all the hypervisors we care about can use the same interface?
Note that this platform specification is separate from the Linux kernel concept of mach-virt, which merely specifies a machine model driven purely from device tree, but does not mandate any peripherals or have any mention of ACPI.
Did you notice we are removing mach-virt now? Probably no point mentioning it here.
Yes, I'm aware. I've just heard people say "why do we need this, isn't mach-virt all we need", and therefore I added the note.
I can definitely get rid of this paragraph in the future if it causes more harm than good.
Maybe replace it with something like this:
| On both arm32 and arm64, no platform specific kernel code must be required,
| and all device detection must happen through the device description
| passed from the hypervisor or discoverable buses.
A harder question is what peripherals we should list as mandatory or optional beyond what you have already. We could try coming up with an exhaustive list of devices that are supported by mainline Linux-3.10 (the current longterm release) and implemented by any of the hypervisors, but we probably want to leave open the possibility to extend it later as we implement new virtual devices.
Arnd
On Wed, Feb 26, 2014 at 2:22 PM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 26 February 2014 12:05:37 Christoffer Dall wrote:
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
The most common response I've been getting so far is that people generally want their VMs to look close to the real thing, but not sure how valid an argument that is.
Some people feel strongly about this and seem to think that ARMv8 kernels will only work with ACPI in the future...
That is certainly a misconception that has caused a lot of trouble. We will certainly keep supporting FDT boot in ARMv8 indefinitely, and I expect that most systems will not use ACPI at all.
Furthermore, even enterprise distro kernels will boot DT based kernels assuming the h/w support is mainlined, despite statements to the contrary. It is a requirement in mainline kernels that DT and ACPI support coexist. Distros are not going to go out of their way to undo/break that. And since the boot interface is DT, you can't simply turn off DT. :)
Rob
On Wed, Feb 26, 2014 at 03:56:02PM -0600, Rob Herring wrote:
On Wed, Feb 26, 2014 at 2:22 PM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 26 February 2014 12:05:37 Christoffer Dall wrote:
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
The most common response I've been getting so far is that people generally want their VMs to look close to the real thing, but not sure how valid an argument that is.
Some people feel strongly about this and seem to think that ARMv8 kernels will only work with ACPI in the future...
That is certainly a misconception that has caused a lot of trouble. We will certainly keep supporting FDT boot in ARMv8 indefinitely, and I expect that most systems will not use ACPI at all.
Furthermore, even enterprise distro kernels will boot DT based kernels assuming the h/w support is mainlined, despite statements to the contrary. It is a requirement in mainline kernels that DT and ACPI support coexist. Distros are not going to go out of their way to undo/break that. And since the boot interface is DT, you can't simply turn off DT. :)
Personally I'm all for simplicity so I don't want to push any agenda for ACPI in VMs.
Note that the spec does not mandate the use of ACPI, it just tells you how to do it if you wish to.
But, we can change the spec to require full FDT description of the system, unless of course some of the ACPI-in-VM supporters manage to convince the rest.
-Christoffer
On Wednesday 26 February 2014, Christoffer Dall wrote:
Personally I'm all for simplicity so I don't want to push any agenda for ACPI in VMs.
Note that the spec does not mandate the use of ACPI, it just tells you how to do it if you wish to.
But, we can change the spec to require full FDT description of the system, unless of course some of the ACPI-in-VM supporters manage to convince the rest.
I guess the real question is whether we are interested in running Windows RT in VM guests. I don't personally expect MS to come out with a port for this spec, no matter what we do, but some of you may have information I don't.
Arnd
On 27/02/2014 08:30, Arnd Bergmann wrote:
I guess the real question is whether we are interested in running Windows RT in VM guests. I don't personally expect MS to come out with a port for this spec, no matter what we do, but some of you may have information I don't.
Given enough firmware and driver support there's no reason why Windows and Linux should be any different---they certainly aren't on x86.
Right now Windows on ARM is just tablets and hence RT; but things might change for server-based ARM. So Windows support as a VM should definitely be on the table, but so is writing custom drivers: you certainly will be able at some point to build virtio drivers for ARM, and use them with this spec.
Paolo
On Wed, 26 Feb 2014 14:21:47 -0800, Christoffer Dall christoffer.dall@linaro.org wrote:
On Wed, Feb 26, 2014 at 03:56:02PM -0600, Rob Herring wrote:
On Wed, Feb 26, 2014 at 2:22 PM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 26 February 2014 12:05:37 Christoffer Dall wrote:
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
The most common response I've been getting so far is that people generally want their VMs to look close to the real thing, but not sure how valid an argument that is.
Some people feel strongly about this and seem to think that ARMv8 kernels will only work with ACPI in the future...
That is certainly a misconception that has caused a lot of trouble. We will certainly keep supporting FDT boot in ARMv8 indefinitely, and I expect that most systems will not use ACPI at all.
Furthermore, even enterprise distro kernels will boot DT based kernels assuming the h/w support is mainlined, despite statements to the contrary. It is a requirement in mainline kernels that DT and ACPI support coexist. Distros are not going to go out of their way to undo/break that. And since the boot interface is DT, you can't simply turn off DT. :)
Personally I'm all for simplicity so I don't want to push any agenda for ACPI in VMs.
Note that the spec does not mandate the use of ACPI, it just tells you how to do it if you wish to.
But, we can change the spec to require full FDT description of the system, unless of course some of the ACPI-in-VM supporters manage to convince the rest.
Given that we don't even have a reliable view of what ACPI is going to look like on hardware yet, it probably is appropriate to omit talking about it entirely for v1.0.
... although let me walk through the implications of how that could be done:
Option 1: If v1.0 requires the VM to provide FDT, and v2.0 requires the VM to provide either FDT or ACPI:
- a v1.0 VM shall always provide an FDT
- a v2.0 VM might provide ACPI without FDT
- a v1.0 OS must accept FDT
- a v2.0 OS must accept both ACPI and FDT
Implications:
- a v1.0 OS might not boot on a v2.0 VM (because it cannot handle ACPI)
- a v2.0 OS will always boot on both v1.0 and v2.0 VMs
- ACPI-only OSes are not supported by this spec (Windows; potentially RHEL)
- FDT-only OSes are not supported by v2.0 of the spec. If we're only talking Linux this isn't a problem, but it does put an additional burden on anyone doing a domain-specific OS (for example, a bare-metal networking application using virtualized devices).
Option 2: If v1.0 requires FDT, and v2.0 requires either FDT only or FDT+ACPI:
- v1.0 and v2.0 VMs shall always provide an FDT
- v2.0 VMs may additionally provide ACPI
- v1.0 and v2.0 OSes must accept FDT
- no requirement for the OS to accept ACPI
Implications:
- a v1.0 OS will boot on a v1.0 and v2.0 VM (FDT is always provided)
- a v2.0 OS will boot on a v1.0 and v2.0 VM
- ACPI-only OSes are not supported by this spec in any configuration. There would need to be a stronger version of the spec (v2.0-ACPI) to specify the configuration usable by ACPI OSes.
Option 3: If v1.0 requires FDT, and v2.0 requires both FDT and ACPI:
- v1.0 and v2.0 VMs shall always provide an FDT
- v2.0 VMs shall always provide ACPI
- v1.0 OSes must accept FDT
- a v2.0 OS may accept either ACPI or FDT
Implications:
- a v1.0 OS will boot on a v1.0 and v2.0 VM (FDT is always provided)
- a v2.0 OS will boot on a v1.0 and v2.0 VM
- both FDT-only and ACPI-only OSes are supported by v2.0 of the spec
Someone is going to be unhappy no matter what is chosen. I think it is critical to be really clear about who the audience is. Doing the above also highlights for me the cost of adding either/or options to the spec. Every 'or' option given to VMs adds cost to the OS: i.e. if the spec allows the VM to implement either ACPI or FDT, then a compliant OS must support both. Alternatively, if the spec allows the OS to implement only ACPI or only FDT, then compliant VMs are forced to implement both. In both cases it is a non-trivial burden (as far as I know, no ACPI-on-arm work has been done for *BSD, and it is yet to be done for all the VMs). The spec will be the most useful if as many options as possible are eliminated.
Right now I think option 2 above makes the most sense; require FDT and make ACPI an optional extension. That will support almost all Linux vendors immediately and possibly even FreeBSD. As others have said, Windows is a complete unknown and we have no idea if they will be interested in the spec (nor is this the right forum for that conversation, but I will bring it up with my contacts).
g.
On 26 February 2014 20:05, Christoffer Dall christoffer.dall@linaro.org wrote:
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
The most common response I've been getting so far is that people generally want their VMs to look close to the real thing, but not sure how valid an argument that is.
Some people feel strongly about this and seem to think that ARMv8 kernels will only work with ACPI in the future...
My strong feeling is that AArch64 kernels *may* support ACPI in the future ;).
On a more serious note, both FDT and ACPI will be first-class citizens on AArch64 and I have no intention whatsoever of dropping FDT.
On 26/02/2014 20:55, Arnd Bergmann wrote:
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
In x86 land it's been certainly helpful to abstract hotplug capabilities. For ARM it could be the same. Not so much for PCI (ARM probably can use native PCIe hotplug and standard hotplug controllers; on x86 we started with PCI and also have to deal with Windows's lack of support for SHPC), but it could help for CPU and memory hotplug.
Did you notice we are removing mach-virt now? Probably no point mentioning it here.
Peter, do we still want mach-virt support in QEMU then?
Paolo
On 26 February 2014 20:19, Paolo Bonzini pbonzini@redhat.com wrote:
On 26/02/2014 20:55, Arnd Bergmann wrote:
Did you notice we are removing mach-virt now? Probably no point mentioning it here.
Peter, do we still want mach-virt support in QEMU then?
Yes, it will be about the only thing that supports PCI-e :-) When the kernel guys say "removing mach-virt" they mean removing a specific lump of kernel code, not removing support for having a kernel that can handle a machine described only by the device tree.
thanks -- PMM
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
I would prefer if we can stay as close as possible to SBSA for individual hardware components, and only stray from it when there is a strong reason. pl011-subset doesn't sound like a significant problem to implement, especially as SBSA makes the DMA part of that optional. Can you elaborate on what hypervisor would have a problem with that?
I believe it comes down to how much extra overhead pl011-access-trap would be over virtio-console. If low, then sure. (Since there are certain things we cannot provide SBSA-compliant in the guest anyway, I wouldn't consider lack of pl011 to be a big issue.)
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
The point is that if we need to share any real hw then we need to use whatever the host has.
However, as ACPI will not be supported by arm32, not having the complete FDT will prevent you from running a 32-bit guest on a 64-bit hypervisor, which I consider an important use case.
In which case we would be making an active call to ban anything other than virtio/xen-pv devices for 32-bit guests on hardware without DT.
However, I see the case of a 32-bit guest on 64-bit hypervisor as less likely in the server space than in mobile, and ACPI in mobile as unlikely, so it may end up not being a big issue.
/ Leif
On Wed, Feb 26, 2014 at 09:48:43PM +0000, Leif Lindholm wrote:
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
I would prefer if we can stay as close as possible to SBSA for individual hardware components, and only stray from it when there is a strong reason. pl011-subset doesn't sound like a significant problem to implement, especially as SBSA makes the DMA part of that optional. Can you elaborate on what hypervisor would have a problem with that?
I believe it comes down to how much extra overhead pl011-access-trap would be over virtio-console. If low, then sure. (Since there are certain things we cannot provide SBSA-compliant in the guest anyway, I wouldn't consider lack of pl011 to be a big issue.)
I don't think it's about overhead; sure, pl011 may be slower, but it's a serial port, and does anyone really care about the performance of a console? pl011 should be good enough for sure.
I think the issue is that Xen does no real device emulation today at all, they don't use QEMU etc. That being said, adding a pl011 emulation in the Xen tools doesn't seem like the worst idea in the world, but I will let them chime in on why they are opposed to it.
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
The point is that if we need to share any real hw then we need to use whatever the host has.
However, as ACPI will not be supported by arm32, not having the complete FDT will prevent you from running a 32-bit guest on a 64-bit hypervisor, which I consider an important use case.
In which case we would be making an active call to ban anything other than virtio/xen-pv devices for 32-bit guests on hardware without DT.
However, I see the case of a 32-bit guest on 64-bit hypervisor as less likely in the server space than in mobile, and ACPI in mobile as unlikely, so it may end up not being a big issue.
But you may want a networking appliance, for example, built on ARM 64-bit hardware, with an ARM 64-bit hypervisor on there, and you wish to run some 32-bit system in a VM. If the hardware is ACPI based, it would be a shame if the VM implementation (or the virtual firmware) couldn't provide a usable 32-bit FDT. We may want to call that case out in this document?
Thanks, -Christoffer
On Wed, 26 Feb 2014 14:25:53 -0800, Christoffer Dall christoffer.dall@linaro.org wrote:
On Wed, Feb 26, 2014 at 09:48:43PM +0000, Leif Lindholm wrote:
On Wed, Feb 26, 2014 at 08:55:58PM +0100, Arnd Bergmann wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
I would prefer if we can stay as close as possible to SBSA for individual hardware components, and only stray from it when there is a strong reason. pl011-subset doesn't sound like a significant problem to implement, especially as SBSA makes the DMA part of that optional. Can you elaborate on what hypervisor would have a problem with that?
I believe it comes down to how much extra overhead pl011-access-trap would be over virtio-console. If low, then sure. (Since there are certain things we cannot provide SBSA-compliant in the guest anyway, I wouldn't consider lack of pl011 to be a big issue.)
I don't think it's about overhead, sure pl011 may be slower, but it's a serial port, does anyone care about performance of a console? pl011 should be good enough for sure.
The reason pl011 is specified in the SBSA is to have a reliable output port available from the first instruction. It is a debug feature. Hardware vendors don't even need to wire it up to a physical serial port. It is just as valid to wire it to the RAS controller.
g.
On 02/26/2014 01:48 PM, Leif Lindholm wrote:
However, I see the case of a 32-bit guest on 64-bit hypervisor as less likely in the server space than in mobile, and ACPI in mobile as unlikely, so it may end up not being a big issue.
In Fedora we expect future aarch32 builder hardware refreshes to be fulfilled with aarch64 hardware. It would be handy if we had the deployment option of full virt rather than a chroot or, worse, a 32-bit kernel. Other distros may have a similar desire. Cheers,
On Wed, 26 Feb 2014, Leif Lindholm wrote:
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
The point is that if we need to share any real hw then we need to use whatever the host has.
That's right.
I dislike ACPI as much as the next guy, but unfortunately if the host only supports ACPI, the Linux driver for a particular device only works together with ACPI, and you want to assign that device to a VM, then we might be forced to use ACPI to describe it.
On Thursday 27 February 2014 12:31:55 Stefano Stabellini wrote:
On Wed, 26 Feb 2014, Leif Lindholm wrote:
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
The point is that if we need to share any real hw then we need to use whatever the host has.
I would be more comfortable defining in the spec that you cannot share hardware at all. Obviously that doesn't stop anyone from actually sharing hardware with the guest, but at that point it would become noncompliant with this spec, with the consequence that you couldn't expect a compliant guest image to run on that hardware, but that is exactly something we can't guarantee anyway because we don't know what drivers might be needed.
Also, there is no way to generally do this with either FDT or ACPI: In the former case, the hypervisor needs to modify any properties that point to other device nodes so that they point to nodes visible to the guest. That may be possible for simple things like IRQs and reg properties, but as soon as you get into stuff like dmaengine, pinctrl or PHY references, you just can't solve it in a generic way.
For ACPI it's probably worse: any AML methods that the host has are unlikely to work in the guest, and it's impossible to translate them at all.
Obviously things are different for Xen Dom0 where we share *all* devices between host and guest, and we just use the host firmware interfaces. That case again cannot be covered by the generic VM system specification.
I dislike ACPI as much as the next guy, but unfortunately if the host only supports ACPI, the Linux driver for a particular device only works together with ACPI, and you want to assign that device to a VM, then we might be forced to use ACPI to describe it.
Can anyone think of an example where this would actually work?
The only case I can see where it's possible to share a device with a guest without the hypervisor building up the description is for PCI functions that are passed through with an IOMMU. Those won't need ACPI or DT support however.
Arnd
On 27.02.2014 at 22:00, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 27 February 2014 12:31:55 Stefano Stabellini wrote: On Wed, 26 Feb 2014, Leif Lindholm wrote:
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
The point is that if we need to share any real hw then we need to use whatever the host has.
I would be more comfortable defining in the spec that you cannot share hardware at all. Obviously that
Nonono we want to share hardware.
doesn't stop anyone from actually sharing hardware with the guest, but at that point it would become noncompliant with this spec, with the consequence that you couldn't expect a compliant guest image to run on that hardware, but that is exactly something we can't guarantee anyway because we don't know what drivers might be needed.
Also, there is no way to generally do this with either FDT or ACPI: In the former case, the hypervisor needs to modify any properties that point to other device nodes so that they point to nodes visible to the guest. That may be possible for simple things like IRQs and reg properties, but as soon as you get into stuff like dmaengine, pinctrl or PHY references, you just can't solve it in a generic way.
For ACPI it's probably worse: any AML methods that the host has are unlikely to work in the guest, and it's impossible to translate them at all.
Obviously things are different for Xen Dom0 where we share *all* devices between host and guest, and we just use the host firmware interfaces. That case again cannot be covered by the generic VM system specification.
I dislike ACPI as much as the next guy, but unfortunately if the host only supports ACPI, the Linux driver for a particular device only works together with ACPI, and you want to assign that device to a VM, then we might be forced to use ACPI to describe it.
Can anyone think of an example where this would actually work?
The only case I can see where it's possible to share a device with a guest without the hypervisor building up the description is for PCI functions that are passed through with an IOMMU. Those won't need ACPI or DT support however.
If you want to assign a platform device, you need to generate a respective hardware description (fdt/dsdt) chunk in the hypervisor. You can't take the host's description - it's too tightly coupled to the overall host layout.
Imagine you get an AArch64 notebook with Windows on it. You want to run Linux there, so your host needs to understand ACPI. Now you want to run a Windows guest inside a VM, so you need ACPI in there again.
Replace Windows by "Linux with custom drivers" and you're in the same situation even when you neglect Windows. Reality will be that we will have fdt and acpi based systems.
Alex
On Thursday 27 February 2014 22:24:13 Alexander Graf wrote:
If you want to assign a platform device, you need to generate a respective hardware description (fdt/dsdt) chunk in the hypervisor. You can't take the host's description - it's too tightly coupled to the overall host layout.
But at that point, you need hardware specific drivers in both the hypervisor and in the guest OS. When you have that, why do you still care about a system specification?
Going back to the previous argument, since the hypervisor has to make up the description for the platform device itself, it won't matter whether the host is booted using DT or ACPI: the description that the hypervisor makes up for the guest has to match what the hypervisor uses to describe the rest of the guest environment, which is independent of what the host uses.
Imagine you get an AArch64 notebook with Windows on it. You want to run Linux there, so your host needs to understand ACPI. Now you want to run a Windows guest inside a VM, so you need ACPI in there again.
And you think that Windows is going to support a VM system specification we are writing here? Booting Windows RT in a virtual machine is certainly an interesting use case, but I think we will have to emulate a platform that WinRT supports then, rather than expect it to run on ours.
Replace Windows by "Linux with custom drivers" and you're in the same situation even when you neglect Windows. Reality will be that we will have fdt and acpi based systems.
We will however want to boot all sorts of guests in a standardized virtual environment:
* 32-bit Linux (since some distros don't support biarch or multiarch on arm64) for running applications that are either binary-only or not 64-bit safe
* 32-bit Android
* big-endian Linux for running applications that are not endian-clean (typically network stuff ported from powerpc or mipseb)
* OS/v guests
* NOMMU Linux
* BSD based OSs
* QNX
* random other RTOSs
Most of these will not work with ACPI, or at least not in 32-bit mode. 64-bit Linux will obviously support both DT (always) and ACPI (optionally), depending on the platform, but for a specification like this, I think it's much easier to support fewer options, to make it easier for other guest OSs to ensure they actually run on any compliant hypervisor.
Arnd
On 28.02.2014 at 03:56, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 27 February 2014 22:24:13 Alexander Graf wrote:
If you want to assign a platform device, you need to generate a respective hardware description (fdt/dsdt) chunk in the hypervisor. You can't take the host's description - it's too tightly coupled to the overall host layout.
But at that point, you need hardware specific drivers in both the hypervisor and in the guest OS.
In our case, you need hardware specific drivers in QEMU and the guest, correct.
When you have that, why do you still care about a system specification?
Because I don't want to go back to the system level definition. To me a peripheral is a peripheral - regardless of whether it is on a platform bus or a PCI bus. I want to leverage common ground and only add the few pieces that diverge from it.
Going back to the previous argument, since the hypervisor has to make up the description for the platform device itself, it won't matter whether the host is booted using DT or ACPI: the description that the hypervisor makes up for the guest has to match what the hypervisor uses to describe the rest of the guest environment, which is independent of what the host uses.
I agree 100%. This spec should be completely independent of the host.
The reason I brought up the host is that if you want to migrate an OS from physical to virtual, you may need to pass through devices to make this work. If your host driver developers only ever worked with ACPI, there's a good chance the drivers won't work in a pure dt environment.
Btw, the same argument applies the other way around as well. I don't believe we will get away with generating and mandating a single machine description environment.
Imagine you get an AArch64 notebook with Windows on it. You want to run Linux there, so your host needs to understand ACPI. Now you want to run a Windows guest inside a VM, so you need ACPI in there again.
And you think that Windows is going to support a VM system specification we are writing here? Booting Windows RT in a virtual machine is certainly an interesting use case, but I think we will have to emulate a platform that WinRT supports then, rather than expect it to run on ours.
Point taken :). Though it is a real shame we are in that situation in the first place. x86 makes life a lot easier here.
Replace Windows by "Linux with custom drivers" and you're in the same situation even when you neglect Windows. Reality will be that we will have fdt and acpi based systems.
We will however want to boot all sorts of guests in a standardized virtual environment:
- 32 bit Linux (since some distros don't support biarch or multiarch
on arm64) for running applications that are either binary-only or not 64-bit safe.
- 32-bit Android
- big-endian Linux for running applications that are not endian-clean
(typically network stuff ported from powerpc or mipseb).
- OS/v guests
- NOMMU Linux
- BSD based OSs
- QNX
- random other RTOSs
Most of these will not work with ACPI, or at least not in 32-bit mode. 64-bit Linux will obviously support both DT (always)
Unfortunately not
and ACPI (optionally), depending on the platform, but for a specification like this, I think it's much easier to support fewer options, to make it easier for other guest OSs to ensure they actually run on any compliant hypervisor.
You're forgetting a few pretty important cases here:
* Enterprise grade Linux distribution that only supports ACPI
* Maybe WinRT if we can convince MS to use it
* Non-Linux with x86/ia64 heritage and thus ACPI support
If we want to run those, we need to expose ACPI tables.
Again, I think the only reasonable thing to do is to implement and expose both. That situation sucks, but we got into it ourselves ;).
Alex
On Friday 28 February 2014 08:05:15 Alexander Graf wrote:
On 28.02.2014 at 03:56, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 27 February 2014 22:24:13 Alexander Graf wrote:
When you have that, why do you still care about a system specification?
Because I don't want to go back to the system level definition. To me a peripheral is a peripheral - regardless of whether it is on a platform bus or a PCI bus. I want to leverage common ground and only add the few pieces that diverge from it.
You may be missing a lot of the complexity that describing platform devices in the general case brings then. To pass through an ethernet controller, you may also need to add (any subset of)
* phy device
* clock controller
* voltage regulator
* gpio controller
* LED controller
* DMA engine
* an MSI irqchip
* IOMMU
Each of the above in turn is shared with other peripherals on the host, which brings you to three options:
* Change the driver to not depend on the above, but instead support an abstract virtualized version of the platform device that doesn't need them.
* Pass through all devices this one depends on, giving up on guest isolation. This may work for some embedded use cases, but not for running untrusted guests.
* Implement virtualized versions of the other interfaces and make the hypervisor talk to the real hardware.
I would still argue that each of those approaches is out of scope for this specification.
Going back to the previous argument, since the hypervisor has to make up the description for the platform device itself, it won't matter whether the host is booted using DT or ACPI: the description that the hypervisor makes up for the guest has to match what the hypervisor uses to describe the rest of the guest environment, which is independent of what the host uses.
I agree 100%. This spec should be completely independent of the host.
The reason I brought up the host is that if you want to migrate an OS from physical to virtual, you may need to pass through devices to make this work. If your host driver developers only ever worked with ACPI, there's a good chance the drivers won't work in a pure dt environment.
Brw, the same argument applies the other way around as well. I don't believe we will get around with generating and mandating a single machibe description environment.
I see those two cases as completely distinct. There are good reasons for emulating a real machine for running a guest image that expects certain hardware, and you can easily do this with qemu. But since you are emulating an existing platform and run an existing OS, you don't need a VM System Specification, you just do whatever the platform would normally do that the guest relies on.
The VM system specification on the other hand should allow you to run any OS that is written to support this specification on any hypervisor that implements it.
Replace Windows by "Linux with custom drivers" and you're in the same situation even when you neglect Windows. Reality will be that we will have fdt and acpi based systems.
We will however want to boot all sorts of guests in a standardized virtual environment:
- 32 bit Linux (since some distros don't support biarch or multiarch
on arm64) for running applications that are either binary-only or not 64-bit safe.
- 32-bit Android
- big-endian Linux for running applications that are not endian-clean
(typically network stuff ported from powerpc or mipseb).
- OS/v guests
- NOMMU Linux
- BSD based OSs
- QNX
- random other RTOSs
Most of these will not work with ACPI, or at least not in 32-bit mode. 64-bit Linux will obviously support both DT (always)
Unfortunately not
and ACPI (optionally), depending on the platform, but for a specification like this, I think it's much easier to support fewer options, to make it easier for other guest OSs to ensure they actually run on any compliant hypervisor.
You're forgetting a few pretty important cases here:
- Enterprise grade Linux distribution that only supports ACPI
You can't actually turn off DT support in the kernel, and I don't think there is any point in patching the kernel to remove it. The only sane thing the enterprise distros can do is turn on the "SBSA" platform that supports all compliant machines running ACPI, but turn off all platforms that are not SBSA compliant and boot using DT. With the way that the VM spec is written at this point, Linux will still boot on these guests, since the hardware support is a subset of SBSA, with the addition of a few drivers for hypervisor specific features.
- Maybe WinRT if we can convince MS to use it
I'd argue that would be unlikely.
- Non-Linux with x86/ia64 heritage and thus ACPI support
That assumes that x86 ACPI support is anything like ARM64 ACPI support, which it really isn't. In particular, you can turn off ACPI on any x86 machine and it will still work for the most part, while on ARM64 we will need to use ACPI to describe even the most basic aspects of the platform.
Arnd
On 28 February 2014 02:05, Alexander Graf agraf@suse.de wrote:
On 28.02.2014 at 03:56, Arnd Bergmann arnd@arndb.de wrote
Replace Windows by "Linux with custom drivers" and you're in the same situation even when you neglect Windows. Reality will be that we will have fdt and acpi based systems.
We will however want to boot all sorts of guests in a standardized virtual environment:
- 32 bit Linux (since some distros don't support biarch or multiarch
on arm64) for running applications that are either binary-only or not 64-bit safe.
- 32-bit Android
- big-endian Linux for running applications that are not endian-clean
(typically network stuff ported from powerpc or mipseb).
- OS/v guests
- NOMMU Linux
- BSD based OSs
- QNX
- random other RTOSs
*snip*
You're forgetting a few pretty important cases here:
- Enterprise grade Linux distribution that only supports ACPI
- Maybe WinRT if we can convince MS to use it
- Non-Linux with x86/ia64 heritage and thus ACPI support
I think we need limit the scope of the spec a bit here.
For this VM system specification, we should describe what is simple to set up and high-performance for KVM, Xen and mainline Linux. For everyone else, there is QEMU, which we can mangle to provide whatever the guest OS might want.
Re UEFI,
Riku
On Fri, 28 Feb 2014, Alexander Graf wrote:
We will however want to boot all sorts of guests in a standardized virtual environment:
- 32 bit Linux (since some distros don't support biarch or multiarch
on arm64) for running applications that are either binary-only or not 64-bit safe.
- 32-bit Android
- big-endian Linux for running applications that are not endian-clean
(typically network stuff ported from powerpc or mipseb).
- OS/v guests
- NOMMU Linux
- BSD based OSs
- QNX
- random other RTOSs
8<---
- Enterprise grade Linux distribution that only supports ACPI
- Maybe WinRT if we can convince MS to use it
- Non-Linux with x86/ia64 heritage and thus ACPI support
If we want to run those, we need to expose ACPI tables.
Again, I think the only reasonable thing to do is to implement and expose both. That situation sucks, but we got into it ourselves ;).
I think we should have a clear idea on the purpose of this doc: is it a spec that we expect Linux and other guest OSes to comply to if they want to run on KVM/Xen? Or is it a document that describes the state of the world at the beginning of 2014?
If it is a spec, then we should simply ignore non-collaborative vendors and their products. If we know in advance that they are not going to comply to the spec, what's the point of trying to accommodate them here? We can always carry our workarounds and hacks in the hypervisor if we want to run their products as guests.
On Thu, 27 Feb 2014 15:00:44 +0100, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 27 February 2014 12:31:55 Stefano Stabellini wrote:
On Wed, 26 Feb 2014, Leif Lindholm wrote:
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
What's the point of having ACPI in a virtual machine? You wouldn't need to abstract any of the hardware in AML since you already know what the virtual hardware is, so I can't see how this would help anyone.
The point is that if we need to share any real hw then we need to use whatever the host has.
I would be more comfortable defining in the spec that you cannot share hardware at all. Obviously that doesn't stop anyone from actually sharing hardware with the guest, but at that point it would become noncompliant with this spec, with the consequence that you couldn't expect a compliant guest image to run on that hardware, but that is exactly something we can't guarantee anyway because we don't know what drivers might be needed.
I don't think this spec should say *anything* about sharing hardware. This spec is about producing portable disk images. Assigning hardware into guests is rather orthogonal to whether or not a disk image is portable.
g.
On Wed, Feb 26, 2014 at 1:55 PM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
I would prefer if we can stay as close as possible to SBSA for individual hardware components, and only stray from it when there is a strong reason. pl011-subset doesn't sound like a significant problem to implement, especially as SBSA makes the DMA part of that optional. Can you elaborate on what hypervisor would have a problem with that?
The SBSA only specifies a very minimal pl011 subset which is only suitable for early serial output. Not only is there no DMA, but there are no interrupts and maybe no input. I think it also assumes the uart is enabled and configured already by firmware. It is all somewhat pointless because the location is still not known or discoverable by early code. Just mandating a real pl011 would have been better, but I guess uart IP is value add for some. There is a downside to the pl011, which is that the tty name is different from x86 uarts, and that gets exposed to users and things like libvirt.
I think the VM image just has to support pl011, virtio-console, and xen console. Arguably, an 8250 should also be included in interest of making things just work.
Rob
On 26 February 2014 22:49, Rob Herring robherring2@gmail.com wrote
The SBSA only spec's a very minimal pl011 subset which is only suitable for early serial output. Not only is there no DMA, but there are no interrupts and maybe no input.
No interrupts on a UART in a VM (especially an emulated one) is a good way to spend all your time bouncing around in I/O emulation of the "hey can we send another byte yet?" register...
I think it also assumes the uart is enabled and configured already by firmware. It is all somewhat pointless because the location is still not known or discoverable by early code.
This sounds like we should specify and implement something so we can provide this information in the device tree. Telling the kernel where the hardware is is exactly what DT is for, right?
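For illustration only, the hypervisor-provided DT could carry a fragment along these lines; the node name, address, interrupt number and clock rate below are invented for the example, and the GIC node that would act as interrupt-parent is omitted:

	/dts-v1/;

	/ {
		#address-cells = <2>;
		#size-cells = <2>;

		chosen {
			/* Point console output at the UART node below. */
			stdout-path = "/uart@9000000";
		};

		apb_pclk: clk24mhz {
			compatible = "fixed-clock";
			#clock-cells = <0>;
			clock-frequency = <24000000>;
		};

		uart@9000000 {
			compatible = "arm,pl011", "arm,primecell";
			reg = <0x0 0x9000000 0x0 0x1000>;
			/* SPI 1, level triggered, active high; the GIC
			   interrupt-parent is not shown in this fragment. */
			interrupts = <0 1 4>;
			clocks = <&apb_pclk>, <&apb_pclk>;
			clock-names = "uartclk", "apb_pclk";
		};
	};

With something like that in place the kernel can find the console without any hardcoded addresses.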
I think the VM image just has to support pl011, virtio-console, and xen console. Arguably, an 8250 should also be included in interest of making things just work.
What does the 8250 have to recommend it over just providing the PL011?
thanks -- PMM
On Wed, Feb 26, 2014 at 4:54 PM, Peter Maydell peter.maydell@linaro.org wrote:
On 26 February 2014 22:49, Rob Herring robherring2@gmail.com wrote
The SBSA only spec's a very minimal pl011 subset which is only suitable for early serial output. Not only is there no DMA, but there are no interrupts and maybe no input.
No interrupts on a UART in a VM (especially an emulated one) is a good way to spend all your time bouncing around in I/O emulation of the "hey can we send another byte yet?" register...
I think it also assumes the uart is enabled and configured already by firmware. It is all somewhat pointless because the location is still not known or discoverable by early code.
This sounds like we should specify and implement something so we can provide this information in the device tree. Telling the kernel where the hardware is is exactly what DT is for, right?
Yes, I'm looking into that, but that's not really a concern for this doc as early output is a debug feature.
I think the VM image just has to support pl011, virtio-console, and xen console. Arguably, an 8250 should also be included in interest of making things just work.
What does the 8250 have to recommend it over just providing the PL011?
As I mentioned, it will just work for anything that expects the serial port to be ttyS0 as on x86 rather than ttyAMA0. Really, I'd like to see ttyAMA go away, but evidently that's not an easily fixed issue and it is an ABI.
Rob
On 26 February 2014 23:08, Rob Herring robherring2@gmail.com wrote:
On Wed, Feb 26, 2014 at 4:54 PM, Peter Maydell peter.maydell@linaro.org wrote:
What does the 8250 have to recommend it over just providing the PL011?
As I mentioned, it will just work for anything that expects the serial port to be ttyS0 as on x86 rather than ttyAMA0.
This doesn't seem very compelling to me. Either userspace should just be fixed or the kernel should implement a namespace for serial ports which doesn't randomly change just because the particular h/w driver doing the implementation is different. We shouldn't be papering over other peoples' problems in the VM spec IMHO.
thanks -- PMM
On Wed, 26 Feb 2014, Peter Maydell wrote:
On 26 February 2014 23:08, Rob Herring robherring2@gmail.com wrote:
On Wed, Feb 26, 2014 at 4:54 PM, Peter Maydell peter.maydell@linaro.org wrote:
What does the 8250 have to recommend it over just providing the PL011?
As I mentioned, it will just work for anything that expects the serial port to be ttyS0 as on x86 rather than ttyAMA0.
This doesn't seem very compelling to me. Either userspace should just be fixed or the kernel should implement a namespace for serial ports which doesn't randomly change just because the particular h/w driver doing the implementation is different. We shouldn't be papering over other peoples' problems in the VM spec IMHO.
Indeed.
This is already causing us trouble in other contexts. Please have a look at this patch for a solution:
http://article.gmane.org/gmane.linux.kernel.samsung-soc/27222
Nicolas
On Wed, Feb 26, 2014 at 05:08:38PM -0600, Rob Herring wrote:
What does the 8250 have to recommend it over just providing the PL011?
As I mentioned, it will just work for anything that expects the serial port to be ttyS0 as on x86 rather than ttyAMA0. Really, I'd like to see ttyAMA go away, but evidently that's not an easily fixed issue and it is an ABI.
Kernel parameters (eg. console=...) are going to be embedded inside the guest VM disk image inside a grub configuration or something like that, right? Is there enough here to ensure that I can ship a guest VM disk image that will correctly work (ie. use the right device) for both kernel console output and userspace getty, regardless of what the host is? Reading your post I realise that I'm not sure what I need to do to make this work, since the device may be different. Can something be recommended as part of the spec for this?
On 02/26/2014 05:49 PM, Rob Herring wrote:
On Wed, Feb 26, 2014 at 1:55 PM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 26 February 2014 10:34:54 Christoffer Dall wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
I would prefer if we can stay as close as possible to SBSA for individual hardware components, and only stray from it when there is a strong reason. pl011-subset doesn't sound like a significant problem to implement, especially as SBSA makes the DMA part of that optional. Can you elaborate on what hypervisor would have a problem with that?
The SBSA only spec's a very minimal pl011 subset which is only suitable for early serial output. Not only is there no DMA, but there are no interrupts and maybe no input. I think it also assumes the uart is enabled and configured already by firmware. It is all somewhat pointless because the location is still not known or discoverable by early code. Just mandating a real pl011 would have been better, but I guess uart IP is value add for some. There is a downside to the pl011 which is the tty name is different from x86 uarts which gets exposed to users and things like libvirt.
Can you just use /dev/console? That's what I use in my init scripts for portability from PL011 to DCC.
Christopher
Christoffer Dall christoffer.dall@linaro.org writes:
Hardware Description
The Linux kernel's proper entry point always takes a pointer to an FDT, regardless of the boot mechanism, firmware, and hardware description method. Even on real hardware which only supports ACPI and UEFI, the kernel entry point will still receive a pointer to a simple FDT, generated by the Linux kernel UEFI stub, containing a pointer to the UEFI system table. The kernel can then discover ACPI from the system tables. The presence of ACPI vs. FDT is therefore always itself discoverable, through the FDT.
Therefore, the VM implementation must provide through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
Maybe I'm missing something, but should this last bit say "a trivial FDT" instead of "no FDT"? If not, I don't understand the first paragraph :-)
Cheers, mwh
On Thu, Feb 27, 2014 at 10:05:05AM +1300, Michael Hudson-Doyle wrote:
Christoffer Dall christoffer.dall@linaro.org writes:
Hardware Description
The Linux kernel's proper entry point always takes a pointer to an FDT, regardless of the boot mechanism, firmware, and hardware description method. Even on real hardware which only supports ACPI and UEFI, the kernel entry point will still receive a pointer to a simple FDT, generated by the Linux kernel UEFI stub, containing a pointer to the UEFI system table. The kernel can then discover ACPI from the system tables. The presence of ACPI vs. FDT is therefore always itself discoverable, through the FDT.
Therefore, the VM implementation must provide through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
Maybe I'm missing something, but should this last bit say "a trivial FDT" instead of "no FDT"? If not, I don't understand the first paragraph :-)
That trivial FDT would be generated by the EFI stub in the kernel, not provided by the VM implementation.
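Roughly speaking, that stub-generated FDT is little more than a /chosen node pointing at the UEFI data. The property names below follow the kernel's Documentation/arm/uefi.txt of that era; the values are made-up placeholders:

	/dts-v1/;

	/ {
		chosen {
			bootargs = "console=ttyAMA0 root=/dev/vda2";	/* example only */
			/* 64-bit pointers to the UEFI system table and memory map */
			linux,uefi-system-table = <0x0 0xbff00000>;
			linux,uefi-mmap-start = <0x0 0xbfe00000>;
			linux,uefi-mmap-size = <0x1000>;
			linux,uefi-mmap-desc-size = <0x30>;
			linux,uefi-mmap-desc-ver = <0x1>;
		};
	};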
-Christoffer
Hi Christoffer,
Comments below...
On 26 Feb 2014 18:35, "Christoffer Dall" christoffer.dall@linaro.org wrote:
ARM VM System Specification
Goal
The goal of this spec is to allow suitably-built OS images to run on all ARM virtualization solutions, such as KVM or Xen.
Recommendations in this spec are valid for aarch32 and aarch64 alike, and they aim to be hypervisor agnostic.
Note that simply adhering to the SBSA [2] is not a valid approach, for example because the SBSA mandates EL2, which will not be available for VMs. Further, the SBSA mandates peripherals like the pl011, which may be controversial for some ARM VM implementations to support. This spec also covers the aarch32 execution mode, not covered in the SBSA.
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
This ensures that tools for both Xen and KVM can load a binary UEFI firmware which can read and boot the EFI application in the disk image.
A typical scenario will be GRUB2 packaged as an EFI application, which mounts the system boot partition and boots Linux.
Virtual Firmware
The VM system must be able to boot the EFI application in the ESP. It is recommended that this is achieved by loading a UEFI binary as the first software executed by the VM, which then executes the EFI application. The UEFI implementation should be compliant with UEFI Specification 2.4 Revision A [6] or later.
This document strongly recommends that the VM implementation supports persistent environment storage for virtual firmware implementation in order to ensure probable use cases such as adding additional disk images to a VM or running installers to perform upgrades.
The binary UEFI firmware implementation should not be distributed as part of the VM image, but is specific to the VM implementation.
Hardware Description
The Linux kernel's proper entry point always takes a pointer to an FDT, regardless of the boot mechanism, firmware, and hardware description method. Even on real hardware which only supports ACPI and UEFI, the
kernel
entry point will still receive a pointer to a simple FDT, generated by the Linux kernel UEFI stub, containing a pointer to the UEFI system table. The kernel can then discover ACPI from the system tables. The presence of ACPI vs. FDT is therefore always itself discoverable, through the FDT.
I would drop pretty much all of the above detail of the kernel entry point. The spec should specify UEFI compliance and stop there.
What is relevant is the allowance for the UEFI implementation to provide an FDT and/or ACPI via the configuration table.
Therefore, the VM implementation must provide through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
It is actually valid for the VM to provide both ACPI and FDT. In that scenario it is up to the OS to choose which it will use.
For more information about the arm and arm64 boot conventions, see Documentation/arm/Booting and Documentation/arm64/booting.txt in the Linux kernel source tree.
For more information about UEFI and ACPI booting, see [4] and [5].
VM Platform
The specification does not mandate any specific memory map. The guest OS must be able to enumerate all processing elements, devices, and memory through HW description data (FDT, ACPI) or a bus-specific mechanism such as PCI.
The virtual platform must support at least one of the following ARM execution states: (1) aarch32 virtual CPUs on aarch32 physical CPUs (2) aarch32 virtual CPUs on aarch64 physical CPUs (3) aarch64 virtual CPUs on aarch64 physical CPUs
It is recommended to support both (2) and (3) on aarch64 capable physical systems.
The virtual hardware platform must provide a number of mandatory peripherals:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
For portable disk image, can Xen PV be dropped from the list? pl011 is part of SBSA, and virtio is getting standardised, but Xen PV is implementation specific.
An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer. GICv2 limits the number of virtual CPUs to 8, and newer GIC versions remove this limitation.
The ARM virtual timer and counter should be available to the VM as per the ARM Generic Timers specification in the ARM ARM [1].
A hotpluggable bus to support hotplug of at least block and network devices. Suitable buses include a virtual PCIe bus and the Xen PV bus.
We make the following recommendations for the guest OS kernel:
The guest OS must include support for GICv2 and any available newer version of the GIC architecture to maintain compatibility with older VM implementations.
It is strongly recommended to include support for all available (block, network, console, balloon) virtio-pci, virtio-mmio, and Xen PV drivers in the guest OS kernel or initial ramdisk.
Other common peripherals for block devices, networking, and more can (and typically will) be provided, but OS software written and compiled to run on ARM VMs cannot make any assumptions about which variations of these should exist or which implementation they use (e.g. VirtIO or Xen PV). See "Hardware Description" above.
Note that this platform specification is separate from the Linux kernel concept of mach-virt, which merely specifies a machine model driven purely from device tree, but does not mandate any peripherals or have any mention of ACPI.
References
[1]: The ARM Architecture Reference Manual, ARMv8, Issue A.b
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.b/index...
[2]: ARM Server Base System Architecture
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.den0029/index.ht...
[3]: The ARM Generic Interrupt Controller Architecture Specifications v2.0
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0487a.b/index...
[5]: https://git.linaro.org/people/leif.lindholm/linux.git/blob/refs/heads/uefi-f...
[6]: UEFI Specification 2.4 Revision A http://www.uefi.org/sites/default/files/resources/2_4_Errata_A.pdf
Hi Grant,
Thanks for comments,
On Wed, Feb 26, 2014 at 10:35:54PM +0000, Grant Likely wrote:
On 26 Feb 2014 18:35, "Christoffer Dall" christoffer.dall@linaro.org wrote:
[...]
I would drop pretty much all of the above detail of the kernel entry point. The spec should specify UEFI compliance and stop there.
That probably makes sense. The discussion started out as "should it be DTB or ACPI" and moved on from there, and then interestingly we sort of found out that it doesn't really matter in terms of how to package the kernel, hence this text. But looking over it now, it doesn't add to the clarity of the spec.
What is relevant is the allowance for the UEFI implementation to provide an FDT and/or ACPI via the configuration table.
Therefore, the VM implementation must provide through its UEFI implementation, either:
a complete FDT which describes the entire VM system and will boot mainline kernels driven by device tree alone, or
no FDT. In this case, the VM implementation must provide ACPI, and the OS must be able to locate the ACPI root pointer through the UEFI system table.
It is actually valid for the VM to provide both ACPI and FDT. In that scenario it is up to the OS to choose which it will use.
ok, "either" becomes "at least one of" with the appropriate adjustments.
[...]
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
For portable disk image, can Xen PV be dropped from the list? pl011 is part of SBSA, and virtio is getting standardised, but Xen PV is implementation specific.
It would certainly be easier if everyone just used virtio and we mandated a pl011, but I don't want to preclude Xen from this spec, and the Xen folks don't seem likely to implement virtio or pl011 based on this spec.
What's the problem with the current recommendation? Are you concerned that people will build kernels without virtio-console and Xen PV console?
On Wed, 26 Feb 2014, Grant Likely wrote:
[...]
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
For portable disk image, can Xen PV be dropped from the list? pl011 is part of SBSA, and virtio is getting standardised, but Xen PV is implementation specific.
Does an interface need OASIS' rubber stamp to be "standard"? If so, we should also drop FDT from this document. The SBSA has not been published by any OASIS-like standardization body either, so maybe we should drop the SBSA too.
If it doesn't need OASIS's nice logo on the side to be a standard, then the Xen PV interfaces are a standard too. Take a look at xen/include/public/io: they go back as far as 2004, and they have multiple different implementations of the frontends and backends in multiple operating systems today.
There is no reason why another hypervisor couldn't implement the same interface; in fact, I know that it was considered for KVM.
On Thu, 27 Feb 2014 12:27:58 +0000, Stefano Stabellini stefano.stabellini@eu.citrix.com wrote:
[...]
Allow me to elaborate. I'm not trying to punish Xen here, but I'm deliberately pushing back against "either/or" options in the spec. In this case the spec says the VM must implement one of pl011 *or* virtio *or* xenpv. That gives lots of implementation choice to VM projects.
The downside is that spec-compliant OSes are required to implement support for *all three*. This is a non-issue for Linux guests because all those drivers are already there*, but it is a real cost for niche guests.
The reason why a cut-down pl011 exists in SBSA is to provide a bare-minimum output device that is always available. I would be far happier for this spec to not give the VMs an option at all here. Now that I think about it, it probably isn't appropriate to allow virtio-console to be an option either. It should flat out require the cut-down pl011 register interface. Make everything else optional. 'sane' Linux distros will enable virtio-console and xenpv and the kernel should use whichever it finds in normal operation. Then no matter what crazy guest the user wants to run there is still a known-safe fallback for console output when things go wrong.
* aside from the footprint required to enable all of them
I've just had another thought. In the majority of cases the fallback pl011 won't get used by the guest anyway unless asked to, because the UEFI OS Loader (EFI_STUB for Linux) will use the UEFI console drivers. I expect the Xen UEFI port will use xenpv, and the KVM UEFI port will use virtio. After UEFI exit boot services the OS is responsible for its own console output. If it has drivers for virtio-console or xenpv, then fine; it discovers the interface and all is good. But if it doesn't, then the fallback will at least work.
g.
On 1 March 2014 19:54, Grant Likely grant.likely@linaro.org wrote:
Allow me to elaborate. I'm not trying to punish Xen here, but I'm deliberately pushing back against "either/or" options in the spec. In this case the spec says the VM must implement one of pl011 *or* virtio *or* xenpv. That gives lots of implementation choice to VM projects.
From the VM implementation side, of course, we want to push back against "must be X and only X" settings in the spec. My feeling is that if you're a VM that doesn't already handle X then implementing it is potentially quite a bit of work (especially if it implies implementing a bus framework or whatever that you don't have any support for, as might be the case for requiring virtio in Xen). On the other hand Linux already has support for all the options and is easy to configure with all the options enabled. So when I was looking through / making comments on this spec my preference generally came down on the side of "give the VM side the flexibility to select 1 from some small number of N".
(Sometimes, as with GIC choices, the "flexibility must be in the guest, not the VM" approach is forced by host h/w possibilities.)
For initial console in particular it might be reasonable to require "minimal pl011". On the other hand I don't think we want to imply "as a guest it's OK for you not to support Xen pv console" because that is saying "console performance will be terrible on Xen". How about "must implement minimal pl011, and also at least one of virtio-console, interrupt-driven pl011 or xen pv console" ?
thanks -- PMM
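For illustration, here is a guest-side sketch in Python of the policy Peter proposes; the device type names, the discovery set, and the function are invented for the example and are not spec terminology: prefer a paravirtualized or interrupt-driven console when one is discovered, and fall back to the always-present minimal pl011 otherwise.

    # Guest-side console selection, as a sketch of the policy discussed above.
    # Device names and the discovery mechanism are illustrative assumptions.

    PREFERRED = ("virtio-console", "xen-pv-console", "pl011-interrupt")
    FALLBACK = "pl011-minimal"  # the always-present, polled output device in this sketch

    def pick_console(discovered):
        """Return the best console the guest found, falling back to minimal pl011."""
        for kind in PREFERRED:
            if kind in discovered:
                return kind
        return FALLBACK

    # Example: a Xen guest that only found the PV console and the minimal pl011.
    print(pick_console({"xen-pv-console", "pl011-minimal"}))  # -> xen-pv-console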
On 26 February 2014 22:35, Grant Likely grant.likely@linaro.org wrote:
On 26 Feb 2014 18:35, "Christoffer Dall" christoffer.dall@linaro.org wrote:
Serial console: The platform should provide a console, based on an emulated pl011, a virtio-console, or a Xen PV console.
For portable disk image, can Xen PV be dropped from the list? pl011 is part of SBSA, and virtio is getting standardised, but Xen PV is implementation specific.
The underlying question here is to what extent we want to force VMs to provide a single implementation of something and to what extent we want to force guests to cope with "any choice from some small set". Personally I don't think it's realistic to ask the Xen folk to drop their long-standing PV bus implementation, and so the right answer is roughly what we have here, ie "guest kernels need to cope with both situations". Otherwise Xen is going to go its own way anyway, and you just end up either (a) ruling out Xen as a platform for running portable disk images or (b) having an unofficial requirement to handle Xen PV anyway if you want an actually portable image, which I would assume distros do.
thanks -- PMM
On 2/26/14 10:34 AM, Christoffer Dall wrote:
ARM VM System Specification
See also the thread forked off to the EFI dev list, about using existing EFI ByteCode (EBC) for this new purpose, especially the informative reply from Andrew Fish of Apple.com:
http://sourceforge.net/p/edk2/mailman/message/32031943/
Today, EBC is defined by Intel and targets Intel's three platforms, but no ARM platforms yet. Existing EFI already includes an EBC "VM", and an implementation exists under a BSD license. EBC's goal was to let IHVs share Option ROM style drivers rather than ship multiple ones. Having ARM and Intel use the same VM/bytecode would be even better for IHVs. I'm unclear on your non-EFI use cases, so it may not be useful outside EFI. IMO the main issue with EBC is that only commercial Intel and Microsoft compilers support it, not GCC or Clang. IP clarity of this Intel creation would also be an issue, but apparently the UEFI Forum owns the spec.
There are too many lists CC'ed already, but if this becomes a valid option, the linux-efi list on kernel.org needs to get invited. :-)
Thanks, Lee
Hi Christoffer. I've got another comment on the text, this time about the format of the ESP:
On Wed, 26 Feb 2014 10:34:54 -0800, Christoffer Dall christoffer.dall@linaro.org wrote: [...]
Image format
The image format, as presented to the VM, needs to be well-defined in order for prepared disk images to be bootable across various virtualization implementations.
The raw disk format as presented to the VM must be partitioned with a GUID Partition Table (GPT). The bootable software must be placed in the EFI System Partition (ESP), using the UEFI removable media path, and must be an EFI application complying to the UEFI Specification 2.4 Revision A [6].
The ESP partition's GPT entry's partition type GUID must be C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32 execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution state.
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
g.
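As an aside, the GPT/ESP requirement restated above is easy to check mechanically. Below is a minimal Python sketch (assumptions: 512-byte logical sectors, primary GPT header at LBA 1, and an invented script layout; nothing here comes from the spec text itself) that scans a raw disk image for a partition whose type GUID is the required ESP GUID. It does not parse the FAT32 file system, so it cannot confirm that the removable media path (\EFI\BOOT\BOOTARM.EFI or \EFI\BOOT\BOOTAA64.EFI) is actually populated.

    import struct
    import sys
    import uuid

    SECTOR = 512  # assumption: 512-byte logical sectors
    ESP_TYPE_GUID = uuid.UUID("C12A7328-F81F-11D2-BA4B-00A0C93EC93B")

    def has_esp(image_path):
        """Return True if the raw image is GPT-partitioned and contains an ESP."""
        with open(image_path, "rb") as f:
            f.seek(1 * SECTOR)              # primary GPT header lives at LBA 1
            header = f.read(92)
            if header[0:8] != b"EFI PART":  # not GPT-partitioned
                return False
            entries_lba, = struct.unpack_from("<Q", header, 72)
            num_entries, = struct.unpack_from("<I", header, 80)
            entry_size, = struct.unpack_from("<I", header, 84)
            f.seek(entries_lba * SECTOR)
            table = f.read(num_entries * entry_size)
        for i in range(num_entries):
            entry = table[i * entry_size:(i + 1) * entry_size]
            # The partition type GUID is the first 16 bytes, in GPT's mixed-endian layout.
            if uuid.UUID(bytes_le=bytes(entry[0:16])) == ESP_TYPE_GUID:
                return True
        return False

    if __name__ == "__main__":
        print("ESP found" if has_esp(sys.argv[1]) else "no ESP found")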
On Sat, Mar 01, 2014 at 03:27:56PM +0000, Grant Likely wrote:
[...]
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
ack, thanks.
-Christoffer
On Sat, Mar 01, 2014 at 03:27:56PM +0000, Grant Likely wrote:
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
What happens when the VM is first booted without boot variables, but then the OS expects to be able to set boot variables and see them on next boot?
AIUI, we don't have an implementation of this right now, and even if we did, there are implications for persistent storage of this data further up the stack (a required implementation in libvirt, OpenStack nova providing a storage area for it, etc).
If possible, I would prefer to mandate that the host implementation is permitted to no-op (or otherwise disable) boot variable write operations altogether to avoid having to deal with this. In the common case, I don't see why an OS installation shipped via a VM disk image would need to write boot variables anyway.
Would there be any adverse consequences to doing this?
My reason is that this would save us from blocking a general OpenStack implementation on ARM by requiring that these pieces are implemented further up the stack first, when it would bring actual gain to doing so.
This would not preclude host implementations from implementing writeable variables, or guests from using them. Just that for a _portable VM disk image_, the OS on it cannot assume that this functionality is present.
On 06/03/2014 09:52, Robie Basak wrote:
On Sat, Mar 01, 2014 at 03:27:56PM +0000, Grant Likely wrote:
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
What happens when the VM is first booted without boot variables, but then the OS expects to be able to set boot variables and see them on next boot?
UEFI scans the devices; looks for an EFI system partition on the disks; and builds a default boot order.
If possible, I would prefer to mandate that the host implementation is permitted to no-op (or otherwise disable) boot variable write operations altogether to avoid having to deal with this. In the common case, I don't see why an OS installation shipped via a VM disk image would need to write boot variables anyway.
Would there be any adverse consequences to doing this?
Given the experience on x86 UEFI, no.
Unlike bare metal, it is common to run UEFI VMs without persistent flash storage. In this case the boot variables and boot order are rebuilt on the fly on every boot, and it just works for both Windows and Linux; there's no reason why it should be any different for ARM.
My reason is that this would save us from blocking a general OpenStack implementation on ARM by requiring that these pieces are implemented further up the stack first, when it would bring actual gain to doing so.
This would not preclude host implementations from implementing writeable variables, or guests from using them. Just that for a _portable VM disk image_, the OS on it cannot assume that this functionality is present.
This is already the case for most OSes. Otherwise you wouldn't be able to move a hard disk from one (physical) machine to another.
I strongly suggest that you take a look at the work done in Tiano Core's OvmfPkg, which has support for almost every QEMU feature thanks to the work of Laszlo Ersek and Jordan Justen.
In particular, OvmfPkg has support for specifying a boot order in the VM configuration (which maps to the "-boot" option in QEMU). In this case, the UEFI boot order is overridden by a variable that is placed in some architecture-specific firmware configuration mechanism (on x86 we have one called fw_cfg, on ARM you could look at the fdt). This predates UEFI and is not a UEFI variable; in fact it is a list of OpenFirmware device paths. UEFI will match the OF paths to UEFI paths, and use the result to build a UEFI boot order.
Paolo
On 03/06/14 10:46, Paolo Bonzini wrote:
On 06/03/2014 09:52, Robie Basak wrote:
On Sat, Mar 01, 2014 at 03:27:56PM +0000, Grant Likely wrote:
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
"+1"
[...]
If I understand correctly, the question is this:
Given a hypervisor that doesn't support non-volatile UEFI variables (including BootOrder and Boot####), is it possible to automatically boot a carefully prepared VM image, made available as a fixed media device?
The answer is "yes". See
3.3 Boot Option Variables Default Boot Behavior
in the UEFI spec (already referenced by Grant Likely in the context above).
3.3 Boot Option Variables Default Boot Behavior
The default state of globally-defined variables is firmware vendor specific. However the boot options require a standard default behavior in the exceptional case that valid boot options are not present on a platform. The default behavior must be invoked any time the BootOrder variable does not exist or only points to nonexistent boot options.
If no valid boot options exist, the boot manager will enumerate all removable media devices followed by all fixed media devices. The order within each group is undefined. These new default boot options are not saved to non-volatile storage. The boot manager will then attempt to boot from each boot option. If the device supports the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL then the removable media boot behavior (see Section 3.4.1.1) is executed. Otherwise, the firmware will attempt to boot the device via the EFI_LOAD_FILE_PROTOCOL.
It is expected that this default boot will load an operating system or a maintenance utility. If this is an operating system setup program it is then responsible for setting the requisite environment variables for subsequent boots. The platform firmware may also decide to recover or set to a known set of boot options.
Basically, the "removable media boot behavior" applies to fixed media devices as last resort. You can prepare a disk image where this behavior will "simply" boot the OS.
See also Peter Jones' blog post about this:
http://blog.uncooperative.org/blog/2014/02/06/the-efi-system-partition/
(In short, my email is a "+1" to what Grant said.)
Thanks Laszlo
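For illustration, here is a small Python sketch of the fallback Laszlo quotes: when BootOrder is missing or stale, removable media are enumerated before fixed media, and each device exposing a simple file system is tried via the architecture's removable media path. The Device class, its fields, and the try_boot callback are invented for the example (they are not UEFI interfaces), and the EFI_LOAD_FILE_PROTOCOL branch for devices without a file system is omitted for brevity.

    from dataclasses import dataclass

    REMOVABLE_MEDIA_PATH = {
        "aarch32": "\\EFI\\BOOT\\BOOTARM.EFI",
        "aarch64": "\\EFI\\BOOT\\BOOTAA64.EFI",
    }

    @dataclass
    class Device:                       # invented stand-in for a UEFI boot device
        name: str
        removable: bool
        has_simple_fs: bool
        files: frozenset = frozenset()  # paths present on the ESP, if any

    def default_boot(devices, arch, try_boot):
        """Synthesize and try default boot options when BootOrder is absent or stale."""
        path = REMOVABLE_MEDIA_PATH[arch]
        # Removable media are enumerated before fixed media; the order within
        # each group is undefined by the spec, so the caller's order is kept.
        ordered = [d for d in devices if d.removable] + [d for d in devices if not d.removable]
        for dev in ordered:
            if dev.has_simple_fs and path in dev.files and try_boot(dev, path):
                return dev
        return None

    # Example: a portable VM image on a fixed virtio disk, no removable media.
    disk = Device("vda", removable=False, has_simple_fs=True,
                  files=frozenset({"\\EFI\\BOOT\\BOOTAA64.EFI"}))
    print(default_boot([disk], "aarch64", lambda dev, path: True))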
On Thu, Mar 06, 2014 at 12:44:57PM +0100, Laszlo Ersek wrote:
If I understand correctly, the question is this:
Given a hypervisor that doesn't support non-volatile UEFI variables (including BootOrder and Boot####), is it possible to automatically boot a carefully prepared VM image, made available as a fixed media device?
The answer is "yes". See
Right, but I think there is a subsequent problem.
It is expected that this default boot will load an operating system or a maintenance utility. If this is an operating system setup program it is then responsible for setting the requisite environment variables for subsequent boots. The platform firmware may also decide to recover
^^^^^^^^^^^^^^^^
or set to a known set of boot options.
It seems to me that the guest OS is permitted to assume that persistent boot variables will work after first boot, for subsequent boots.
So, for example, the guest OS might, on bootloader or kernel upgrade, completely replace the boot mechanism, dropping the removable path and replacing it with a fixed disk arrangement, setting boot variables appropriately, and assume that it can reboot and everything will continue to work.
But if the host does not support non-volatile variables, then this will break.
This is why I'm suggesting that the specification mandate that the guest OS shipped in a "portable disk image" as defined by the spec must not make this assumption.
It's either this, or mandate that all hosts must support persistent variables. I have no objection to that in principle, but since we have no implementation currently, it seems easier to avoid this particular roadblock by tweaking the spec in a way that nobody seems to care about anyway.
On 06/03/2014 13:04, Robie Basak wrote:
So, for example, the guest OS might, on bootloader or kernel upgrade, completely replace the boot mechanism, dropping the removable path and replacing it with a fixed disk arrangement, setting boot variables appropriately, and assume that it can reboot and everything will continue to work.
It can, but at the same time it had better keep the fallback method working, where fixed media are treated like removable ones.
In practice, both Linux distributions and Windows do that.
This is why I'm suggesting that the specification mandate that the guest OS shipped in a "portable disk image" as defined by the spec must not make this assumption.
Yeah, that's fine to specify.
Paolo
On Thu, 6 Mar 2014 12:04:50 +0000, Robie Basak robie.basak@canonical.com wrote:
[...]
So, for example, the guest OS might, on bootloader or kernel upgrade, completely replace the boot mechanism, dropping the removable path and replacing it with a fixed disk arrangement, setting boot variables appropriately, and assume that it can reboot and everything will continue to work.
But if the host does not support non-volatile variables, then this will break.
Correct
This is why I'm suggesting that the specification mandate that the guest OS shipped in a "portable disk image" as defined by the spec must not make this assumption.
Also correct... the installer must be aware of this constraint which is why it is part of the draft spec.
It's either this, or mandate that all hosts must support persistent variables. I have no objection to that in principle, but since we have no implementation currently, it seems easier to avoid this particular roadblock by tweaking the spec in a way that nobody seems to care about anyway.
Right. I guess my position is that if persistent storage is not implemented then there are a number of install/upgrade scenarios that won't work. Regardless, portable images must assume an empty boot list and we can build that into the spec.
g.
On Fri, Mar 07, 2014 at 08:24:18PM +0800, Grant Likely wrote:
[...]
Right. I guess my position is that if persistent storage is not implemented then there are a number of install/upgrade scenarios that won't work. Regardless, portable images must assume an empty boot list and we can build that into the spec.
Sorry for the delay in responding - all sorts of unexpected things happened when I returned from LCA14.
I agree with the technical discussion going on here. My conclusion is that we have two options:
1. Simply mandate that VM implementations support persistent variables for their UEFI implementation - with whatever constraints that may put on higher level tools.
2. Require that OSes shipped as part of compliant VM images make no assumption that changes to the UEFI environment will be stored.
I feel that option number two will break in all sorts of cases, just like Grant stated above, and it is fundamentally not practical; if a distribution ships Linux with a UEFI stub that expects to be able to do something, distributions must modify Linux to conform to this spec. I think imagining that this spec controls how UEFI support in Linux/Grub is done in general would be overreaching. Additionally, Michael brought up the fact that it would be non-UEFI compliant.
I know that door #1 may be a pain in terms of libvirt/openstack support and other related tools, but it feels by far the cleanest and most long-term solution and is in my opinion what we should shoot for. We are early enough in the ARM server/VM game that we should be able to make the right decisions at this point.
-Christoffer
On 22/03/2014 03:29, Christoffer Dall wrote:
Simply mandate that VM implementations support persistent variables for their UEFI implementation - with whatever constraints that may put on higher level tools.
Require that OSes shipped as part of compliant VM images make no assumption that changes to the UEFI environment will be stored.
I feel that option number two will break in all sorts of cases, just like Grant stated above, and it is fundamentally not practical; if a distribution ships Linux with a UEFI stub that expects to be able to do something, distributions must modify Linux to conform to this spec. I think imagining that this spec controls how UEFI support in Linux/Grub is done in general would be overreaching. Additionally, Michael brought up the fact that it would be non-UEFI compliant.
OSes are already able to cope with losing changes to the UEFI environment, because losing persistent variables is what happens if you copy an image to a new hard disk.
Asking implementations for support of persistent variables is a good idea; however, independent of what is in the spec, OSes should not expect that users will enable that support---most of them won't.
Paolo
On Sat, Mar 22, 2014 at 09:08:37AM +0100, Paolo Bonzini wrote:
[...]
Asking implementations for support of persistent variables is a good idea; however, independent of what is in the spec, OSes should not expect that users will enable that support---most of them won't.
OK, fair enough, mandating support for persistent variable storage may be overreaching, but at the same time I feel it is unlikely for this spec to reach far enough that a generic UEFI Linux loader, for example, actually follows it, and therefore explicitly mandating that guest OSes must be completely portable in all that they do is an impractical constraint. That was the request that kicked this discussion off, and what I was trying to address.
-Christoffer
On Sat, Mar 22, 2014 at 08:19:52PM -0700, Christoffer Dall wrote:
[...]
After thinking about this a bit more, I think I see what we're actually discussing. It's obvious that if software in a VM makes changes to UEFI variables that are required to be persistent for that VM image to boot again, then the VM image is no longer portable, as per the spec.
It is in fact probably a good idea to spell that out clearly; I found it intuitively obvious, but it could be stated explicitly nevertheless.
-Christoffer
Thank you all for considering this case in more detail.
On Sat, Mar 22, 2014 at 08:29:39PM -0700, Christoffer Dall wrote:
After thinking about this a bit more, I think I see what we're actually discussing. It's obvious that if software in a VM makes changes to UEFI variables that are required to be persistent for that VM image to boot again, then the VM image is no longer portable, as per the spec.
No longer portable, and given the current state of implementation, no longer bootable, since we don't support persistent storage yet; certainly not on OpenStack? Or do we have that now?
Are we really pushing ahead with a specification that nobody can implement today? How far away are we from a fully compliant implementation?
On 24/03/2014 10:57, Robie Basak wrote:
After thinking about this a bit more, I think I see what we're actually discussing. It's obvious that if software in a VM makes changes to UEFI variables that are required to be persistent for that VM image to boot again, then the VM image is no longer portable, as per the spec.
No longer portable, and given the current state of implementation, no longer bootable, since we don't support persistent storage yet; certainly not on OpenStack? Or do we have that now?
Are we really pushing ahead with a specification that nobody can implement today? How far away are we from a fully compliant implementation?
The spec says SHOULD, so I think it's fine. While support for persistent variables in the KVM stack is in the early stages, there is demand for it and it is not ARM-specific. It will be implemented sooner rather than later, at least at the libvirt level, and I suppose qemu-arm, xl, OpenStack, oVirt, XenServer and everything else will follow soon.
Paolo
On Fri, 21 Mar 2014 19:29:50 -0700, Christoffer Dall christoffer.dall@linaro.org wrote:
[...]
I agree on the technical discussion going on here. My conclusion is that we have two options:
Simply mandate that VM implementations support persistent variables for their UEFI implementation - with whatever constraints that may put on higher level tools.
Require that OSes shipped as part of compliant VM images make no assumption that changes to the UEFI environment will be stored.
I feel that option number two will break in all sorts of cases, just like Grant stated above, and it is fundamentally not practical; if a distribution ships Linux with a UEFI stub that expects to be able to do something, distributions must modify Linux to conform to this spec. I think imagining that this spec controls how UEFI support in Linux/Grub is done in general would be overreaching. Additionally, Michael brought up the fact that it would be non-UEFI compliant.
That isn't actually my position. I absolutely think that VMs /should/ implement persistent variables, but the variables are a property of a VM instance, not of the disk image. As far as this spec is concerned, I think portable disk images should operate under the assumption of an empty set of variables, and therefore follow the removable disk requirements in the UEFI spec.
I would propose a modification to option 1:
VM implementations SHOULD to implement persistent variables for their UEFI implementation - with whatever constraints that may be put on higher level tools. Variable storage SHALL be a property of a VM instance, but SHALL NOT be stored as part of a portable disk image. Portable disk images SHALL conform to the UEFI removable disk requirements from the UEFI spec.
g.
On 22/03/2014 13:23, Grant Likely wrote:
VM implementations SHOULD to implement persistent variables for their UEFI implementation - with whatever constraints that may be put on higher level tools. Variable storage SHALL be a property of a VM instance, but SHALL NOT be stored as part of a portable disk image. Portable disk images SHALL conform to the UEFI removable disk requirements from the UEFI spec.
I fully agree with this wording.
Paolo
On Sat, 22 Mar 2014 20:57:58 +0100, Paolo Bonzini pbonzini@redhat.com wrote:
On 22/03/2014 13:23, Grant Likely wrote:
VM implementations SHOULD to implement persistent variables for their UEFI implementation - with whatever constraints that may be put on higher level tools. Variable storage SHALL be a property of a VM instance, but SHALL NOT be stored as part of a portable disk image. Portable disk images SHALL conform to the UEFI removable disk requirements from the UEFI spec.
I fully agree with this wording.
:-)
On a seperate, but related, topic. If we were to define a portable VM spec that includes the entire hardware configuration, then variables storage should be included in that.
g.
On 03/22/2014 03:57 PM, Paolo Bonzini wrote:
On 22/03/2014 13:23, Grant Likely wrote:
VM implementations SHOULD to implement persistent variables for their UEFI implementation - with whatever constraints that may be put on higher level tools. Variable storage SHALL be a property of a VM instance, but SHALL NOT be stored as part of a portable disk image. Portable disk images SHALL conform to the UEFI removable disk requirements from the UEFI spec.
I fully agree with this wording.
Paolo
+1 here
Michael
On 03/23/14 00:38, Michael Casadevall wrote:
On 03/22/2014 03:57 PM, Paolo Bonzini wrote:
On 22/03/2014 13:23, Grant Likely wrote:
VM implementations SHOULD to implement persistent variables for their UEFI implementation - with whatever constraints that may be put on higher level tools. Variable storage SHALL be a property of a VM instance, but SHALL NOT be stored as part of a portable disk image. Portable disk images SHALL conform to the UEFI removable disk requirements from the UEFI spec.
I fully agree with this wording.
Paolo
+1 here
I cannot resist getting productive here, so I'll mention that the first "to" in the language should probably be dropped.
I would also replace the "but" with "and".
Happy to contribute, always. Laszlo :)
On Sat, Mar 22, 2014 at 12:23:54PM +0000, Grant Likely wrote:
[...]
That isn't actually my position. I absolutely think that VMs /should/ implement persistent variables, but the variables are a property of a VM instance, not of the disk image. As far as this spec is concerned, I think portable disk images should operate under the assumption of an empty set of variables, and therefore follow the removable disk requirements in the UEFI spec.
I think we may have misunderstood each other a bit here; I didn't mean anything in my statement above that contradicts what you say.
I am only saying that mandating what OSes do and don't do, once they've been booted, is beyond the scope of this spec.
Portable VM images should absolutely boot with an empty set of variables and I completely agree that the persistent variable storage is a property of the VM.
I would propose a modification to option 1:
VM implementations SHOULD implement persistent variables for their UEFI implementation - with whatever constraints that may be put on higher level tools. Variable storage SHALL be a property of a VM instance, but SHALL NOT be stored as part of a portable disk image. Portable disk images SHALL conform to the UEFI removable disk requirements from the UEFI spec.
I agree with all of the above.
(For the record I wasn't proposing this text as something that should go verbatim in the spec, but I may borrow from your wording here).
Thanks, -Christoffer
On Sat, 2014-03-22 at 12:23 +0000, Grant Likely wrote:
That isn't actually my position. I absolutely think that VMs /should/ implement persistent variables, but the variables are a property of a VM instance, not of the disk image. As far as this spec is concerned, I think portable disk images should operate under the assumption of an empty set of variables, and therefore follow the removable disk requirements in the UEFI spec.
Just to be sure I understand. Your position is:
1. A VM image downloaded from www.distro.org should neither contain nor expect any persistent variables to be present.
2. After a VM image is instantiated into a specific VM instance and booted then it is at liberty to set persistent variables (either on first boot or as part of an upgrade) and the VM should ensure that those variables are retained over reboot for that specific instance.
3. If a VM does not preserve those variables then the instance should have some sane functional fallback (implied by the removable disk requirements from the UEFI spec).
Is that right? I'm pretty sure you meant (1), reasonably sure you meant (2) and not at all sure you meant (3) ;-)
Ian.
On 24/03/2014 10:03, Ian Campbell wrote:
That isn't actually my position. I absolutely think that VMs /should/ implement persistent variables, but the variables are a property of a VM instance, not of the disk image. As far as this spec is concerned, I think portable disk images should operate under the assumption of an empty set of variables, and therefore follow the removable disk requirements in the UEFI spec.
Just to be sure I understand. Your position is:
1. A VM image downloaded from www.distro.org should neither contain nor expect any persistent variables to be present.
2. After a VM image is instantiated into a specific VM instance and booted then it is at liberty to set persistent variables (either on first boot or as part of an upgrade) and the VM should ensure that those variables are retained over reboot for that specific instance.
3. If a VM does not preserve those variables then the instance should have some sane functional fallback (implied by the removable disk requirements from the UEFI spec).
Is that right? I'm pretty sure you meant (1), reasonably sure you meant (2) and not at all sure you meant (3) ;-)
At least I do. :)
Paolo
On Mon, 2014-03-24 at 11:41 +0100, Paolo Bonzini wrote:
On 24/03/2014 10:03, Ian Campbell wrote:
That isn't actually my position. I absolutely think that VMs /should/ implement persistent variables, but the variables are a property of a VM instance, not of the disk image. As far as this spec is concerned, I think portable disk images should operate under the assumption of an empty set of variables, and therefore follow the removable disk requirements in the UEFI spec.
Just to be sure I understand. Your position is:
1. A VM image downloaded from www.distro.org should neither contain nor expect any persistent variables to be present.
I suppose for completeness I should have had 1a here: When a VM image is instantiated into a specific VM instance then it must not expect or require any persistent variables to be present.
2. After a VM image is instantiated into a specific VM instance and booted then it is at liberty to set persistent variables (either on first boot or as part of an upgrade) and the VM should ensure that those variables are retained over reboot for that specific instance.
3. If a VM does not preserve those variables then the instance should have some sane functional fallback (implied by the removable disk requirements from the UEFI spec).
Is that right? I'm pretty sure you meant (1), reasonably sure you meant (2) and not at all sure you meant (3) ;-)
At least I do. :)
Do you mean that you meant it, or that you think Grant meant it? ;-)
Ian.
On Mon, Mar 24, 2014 at 2:03 AM, Ian Campbell Ian.Campbell@citrix.com wrote:
On Sat, 2014-03-22 at 12:23 +0000, Grant Likely wrote:
That isn't actually my position. I absolutely think that VMs /should/ implement persistent variables, but the variables are a property of a VM instance, not of the disk image. As far as this spec is concerned, I think portable disk images should operate under the assumption of an empty set of variables, and therefore follow the removable disk requirements in the UEFI spec.
Just to be sure I understand. Your position is:
1. A VM image downloaded from www.distro.org should neither contain nor expect any persistent variables to be present.
yes
2. After a VM image is instantiated into a specific VM instance and booted then it is at liberty to set persistent variables (either on first boot or as part of an upgrade) and the VM should ensure that those variables are retained over reboot for that specific instance.
yes
3. If a VM does not preserve those variables then the instance should have some sane functional fallback (implied by the removable disk requirements from the UEFI spec).
yes
Is that right? I'm pretty sure you meant (1), reasonably sure you meant (2) and not at all sure you meant (3) ;-)
Ian.
On Thu, 06 Mar 2014 10:46:22 +0100, Paolo Bonzini pbonzini@redhat.com wrote:
On 06/03/2014 09:52, Robie Basak wrote:
On Sat, Mar 01, 2014 at 03:27:56PM +0000, Grant Likely wrote:
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
What happens when the VM is first booted without boot variables, but then the OS expects to be able to set boot variables and see them on next boot?
UEFI scans the devices; looks for an EFI system partition on the disks; and builds a default boot order.
If possible, I would prefer to mandate that the host implementation is permitted to no-op (or otherwise disable) boot variable write operations altogether to avoid having to deal with this. In the common case, I don't see why an OS installation shipped via a VM disk image would need to write boot variables anyway.
Would there be any adverse consequences to doing this?
Given the experience on x86 UEFI, no.
Unlike bare metal, it is common to run UEFI VMs without persistent flash storage. In this case the boot variables and boot order are rebuilt on the fly on every boot, and it just works for both Windows and Linux; there's no reason why it should be any different for ARM.
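To make the behaviour described above a little more concrete, here is a minimal, hypothetical model in plain C (not firmware code) of what a UEFI implementation does when it finds no stored BootOrder/Boot#### variables: enumerate the block devices, look for an EFI System Partition, and register the removable media path defined earlier in this spec as a default boot option. The device names and the block_dev structure are illustrative assumptions only.

    /* Illustrative model of the "no stored variables" fallback described
     * above; this is not edk2 BDS code. */
    #include <stdio.h>

    struct block_dev {
        const char *name;
        int         has_esp;         /* GPT partition with the ESP type GUID */
        const char *removable_path;  /* boot file found on the ESP, if any */
    };

    /* Hypothetical view of the VM's disks as firmware would enumerate them. */
    static const struct block_dev devs[] = {
        { "virtio-disk0", 1, "\\EFI\\BOOT\\BOOTAA64.EFI" },
        { "virtio-disk1", 0, NULL },
    };

    int main(void)
    {
        /* No persistent BootOrder/Boot#### exist, so build a default order
         * on the fly from whatever bootable ESPs are present right now. */
        int boot_index = 0;
        for (unsigned i = 0; i < sizeof(devs) / sizeof(devs[0]); i++) {
            if (devs[i].has_esp && devs[i].removable_path != NULL) {
                printf("Boot%04X -> %s:%s\n", boot_index++,
                       devs[i].name, devs[i].removable_path);
            }
        }
        if (boot_index == 0)
            printf("no bootable ESP found; falling back to firmware UI\n");
        return 0;
    }

The point is simply that a portable image which follows the removable media requirements boots correctly even if every variable write from a previous boot has been lost.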
My reason is that this would save us from blocking a general OpenStack implementation on ARM by requiring that these pieces are implemented further up the stack first, when doing so would bring no actual gain.
This would not preclude host implementations from implementing writeable variables, or guests from using them. Just that for a _portable VM disk image_, the OS on it cannot assume that this functionality is present.
This is already the case for most OSes. Otherwise you wouldn't be able to move a hard disk from one (physical) machine to another.
I strongly suggest that you take a look at the work done in Tiano Core's OvmfPkg, which has support for almost every QEMU feature thanks to the work of Laszlo Ersek and Jordan Justen.
In particular, OvmfPkg has support for specifying a boot order in the VM configuration (which maps to the "-boot" option in QEMU). In this case, the UEFI boot order is overridden by a variable that is placed in some architecture-specific firmware configuration mechanism (on x86 we have one called fw_cfg, on ARM you could look at the fdt). This predates UEFI and is not a UEFI variable; in fact it is a list of OpenFirmware device paths. UEFI will match the OF paths to UEFI paths, and use the result to build a UEFI boot order.
I don't know why we wouldn't want to make the UEFI variable the mechanism for exposing VM boot order to UEFI and the OS. I do completely agree that the boot order should be owned by the VM and be able to be manipulated from the config file and command line.
g.
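As a rough illustration of the fw_cfg mechanism described above (and only an assumption about what an ARM equivalent might look like, since nothing is specified there yet): QEMU hands the firmware a blob of newline-separated OpenFirmware device paths describing the requested boot order, and the firmware matches each path against the UEFI device paths of the devices it enumerated. The sketch below only parses such a blob; the example paths are made up, and the OF-to-UEFI translation step that OvmfPkg performs is omitted.

    /* Sketch: consume a QEMU-style boot order blob of newline-separated
     * OpenFirmware device paths. Example paths are illustrative only. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* Hypothetical blob as the VM configuration might hand it to the
         * firmware (via fw_cfg on x86, or an FDT property on ARM). */
        char bootorder[] =
            "/pci@i0cf8/scsi@4/disk@0,0\n"
            "/pci@i0cf8/ethernet@3/ethernet-phy@0\n";

        int rank = 0;
        for (char *line = strtok(bootorder, "\n"); line != NULL;
             line = strtok(NULL, "\n")) {
            /* A real firmware would translate each OF path to a UEFI device
             * path and append a matching Boot#### option in this order. */
            printf("priority %d: OF path %s\n", rank++, line);
        }
        return 0;
    }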
On 03/06/2014 05:46 PM, Paolo Bonzini wrote:
On 06/03/2014 09:52, Robie Basak wrote:
On Sat, Mar 01, 2014 at 03:27:56PM +0000, Grant Likely wrote:
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
What happens when the VM is first booted without boot variables, but then the OS expects to be able to set boot variables and see them on next boot?
UEFI scans the devices; looks for an EFI system partition on the disks; and builds a default boot order.
If possible, I would prefer to mandate that the host implementation is permitted to no-op (or otherwise disable) boot variable write operations altogether to avoid having to deal with this. In the common case, I don't see why an OS installation shipped via a VM disk image would need to write boot variables anyway.
Would there be any adverse consequences to doing this?
Given the experience on x86 UEFI, no.
Unlike bare metal, it is common to run UEFI VMs without persistent flash storage. In this case the boot variables and boot order are rebuilt on the fly on every boot, and it just works for both Windows and Linux; there's no reason why it should be any different for ARM.
While I realize in the real world, we can live with non-persistent boot variables, this is a *direct* violation of the UEFI spec; we can't call our VMs UEFI-compatible if we do this.
However, I've been looking at the spec, and I think we're within spec if we save the variables on the HDD itself. There's some support for this already (Firmware Volume Block device), but it's possible we could implement boot variables as a file on the system partition (UEFI's default search order can be used to figure out which variable file to use, or some sort of fingerprinting system). The biggest trick, though, is that UEFI's Runtime Services will need to be able to write this file, which may require us to move a large chunk of UEFI into runtime services so the FAT filesystem stuff can stick around. If we give it a proper partition, then we can just do raw block reads/writes. This, however, would require us to mandate that said partition exists, and to make sure there aren't any hidden gotchas in invoking this.
Obviously this isn't ideal, but this might be the middle road solution we need here. I can dig through Tiano to get a realistic idea of how hard this will be in reality if we want to seriously look at this option.
My reason is that this would save us from blocking a general OpenStack implementation on ARM by requiring that these pieces are implemented further up the stack first, when doing so would bring no actual gain.
This would not preclude host implementations from implementing writeable variables, or guests from using them. Just that for a _portable VM disk image_, the OS on it cannot assume that this functionality is present.
This is already the case for most OSes. Otherwise you wouldn't be able to move a hard disk from one (physical) machine to another.
I strongly suggest that you take a look at the work done in Tiano Core's OvmfPkg, which has support for almost every QEMU feature thanks to the work of Laszlo Ersek and Jordan Justen.
In particular, OvmfPkg has support for specifying a boot order in the VM configuration (which maps to the "-boot" option in QEMU). In this case, the UEFI boot order is overridden by a variable that is placed in some architecture-specific firmware configuration mechanism (on x86 we have one called fw_cfg, on ARM you could look at the fdt). This predates UEFI and is not a UEFI variable; in fact it is a list of OpenFirmware device paths. UEFI will match the OF paths to UEFI paths, and use the result to build a UEFI boot order.
This lines up with work to make Tiano itself run on FDT to handle varying boot configurations. Are this behaviour and the DT nodes codified anywhere?
Paolo
On 03/08/14 12:41, Michael Casadevall wrote:
On 03/06/2014 05:46 PM, Paolo Bonzini wrote:
On 06/03/2014 09:52, Robie Basak wrote:
On Sat, Mar 01, 2014 at 03:27:56PM +0000, Grant Likely wrote:
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
What happens when the VM is first booted without boot variables, but then the OS expects to be able to set boot variables and see them on next boot?
UEFI scans the devices; looks for an EFI system partition on the disks; and builds a default boot order.
If possible, I would prefer to mandate that the host implementation is permitted to no-op (or otherwise disable) boot variable write operations altogether to avoid having to deal with this. In the common case, I don't see why an OS installation shipped via a VM disk image would need to write boot variables anyway.
Would there be any adverse consequences to doing this?
Given the experience on x86 UEFI, no.
Unlike bare metal, it is common to run UEFI VMs without persistent flash storage. In this case the boot variables and boot order are rebuilt on the fly on every boot, and it just works for both Windows and Linux; there's no reason why it should be any different for ARM.
While I realize in the real world, we can live with non-persistent boot variables, this is a *direct* violation of the UEFI spec; we can't call our VMs UEFI-compatible if we do this.
However, I've been looking at the spec, and I think we're within spec if we save the variables on the HDD itself. There's some support for this already (Firmware Volume Block device), but it's possible we could implement boot variables as a file on the system partition (UEFI's default search order can be used to figure out which variable file to use, or some sort of fingerprinting system). The biggest trick, though, is that UEFI's Runtime Services will need to be able to write this file, which may require us to move a large chunk of UEFI into runtime services so the FAT filesystem stuff can stick around. If we give it a proper partition, then we can just do raw block reads/writes. This, however, would require us to mandate that said partition exists, and to make sure there aren't any hidden gotchas in invoking this.
This is how OVMF fakes non-volatile variables when pflash is not enabled or supported by the host. It stores variables in a file called \NvVars in the EFI system partition. Before ExitBootServices(), changes are synced out immediately. After ExitBootServices(), changes are kept only in memory. If you then do an in-VM reboot, then you end up again "before" ExitBootServices(), and then the in-memory changes are written out. This was usually good enough to keep installers happy (because the last step they do is install grub, run efibootmgr, and reboot).
Keeping the UEFI FAT driver around at runtime would be just one problem. Another problem is that the OS could itself mount the partition that holds the file, and then the two drivers (UEFI and Linux) would corrupt the filesystem. A "raw" partition and a custom FVB driver would probably be safer in this regard.
Obviously this isn't ideal, but this might be the middle road solution we need here. I can dig through Tiano to get a realistic idea of how hard this will be in reality if we want to seriously look at this option.
You could check out Jordan's implementation for qemu pflash -- "OvmfPkg/QemuFlashFvbServicesRuntimeDxe".
The protocol to implement is EFI_FIRMWARE_VOLUME_BLOCK[2]_PROTOCOL; everything else that should sit on top of it "automatically" is already present in edk2. The protocol is documented in Vol3 of the Platform Init spec, chapter 3.4.2.
(...BLOCK2... is just a typedef to ...BLOCK..., see MdePkg/Include/Protocol/FirmwareVolumeBlock.h)
Thanks Laszlo
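To give a sense of the plumbing being discussed, here is a heavily abbreviated, edk2-style sketch of a runtime DXE driver publishing an EFI_FIRMWARE_VOLUME_BLOCK(2)_PROTOCOL instance on top of some backing store (pflash, a dedicated raw partition, or similar). The FvbRead/FvbWrite stubs, the entry point name and the omitted members are assumptions for illustration; OvmfPkg/QemuFlashFvbServicesRuntimeDxe is the complete implementation to study, including the runtime address conversion that this sketch ignores.

    /* Abbreviated sketch of a variable-store backend exposed through the
     * Firmware Volume Block protocol. Not a working driver: a real one must
     * implement every protocol member and survive SetVirtualAddressMap(). */
    #include <Uefi.h>
    #include <Protocol/FirmwareVolumeBlock.h>
    #include <Library/UefiBootServicesTableLib.h>

    STATIC
    EFI_STATUS
    EFIAPI
    FvbRead (
      IN CONST EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL  *This,
      IN       EFI_LBA                              Lba,
      IN       UINTN                                Offset,
      IN OUT   UINTN                                *NumBytes,
      IN OUT   UINT8                                *Buffer
      )
    {
      /* Copy *NumBytes from the backing store into Buffer. Backing store
         access (pflash, raw partition, ...) is platform specific and
         omitted here. */
      return EFI_UNSUPPORTED;
    }

    STATIC
    EFI_STATUS
    EFIAPI
    FvbWrite (
      IN CONST EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL  *This,
      IN       EFI_LBA                              Lba,
      IN       UINTN                                Offset,
      IN OUT   UINTN                                *NumBytes,
      IN       UINT8                                *Buffer
      )
    {
      /* Persist Buffer to the backing store; must remain callable after
         ExitBootServices() for SetVariable() to keep working at runtime. */
      return EFI_UNSUPPORTED;
    }

    STATIC EFI_FIRMWARE_VOLUME_BLOCK2_PROTOCOL mFvb = {
      NULL,        /* GetAttributes      - required in a real driver */
      NULL,        /* SetAttributes      - required in a real driver */
      NULL,        /* GetPhysicalAddress - required in a real driver */
      NULL,        /* GetBlockSize       - required in a real driver */
      FvbRead,
      FvbWrite,
      NULL,        /* EraseBlocks        - required in a real driver */
      NULL         /* ParentHandle */
    };

    EFI_STATUS
    EFIAPI
    FvbDxeInitialize (
      IN EFI_HANDLE        ImageHandle,
      IN EFI_SYSTEM_TABLE  *SystemTable
      )
    {
      EFI_HANDLE  Handle = NULL;

      /* Publish the FVB instance; the generic fault-tolerant-write and
         variable runtime drivers in edk2 then layer on top of it. */
      return gBS->InstallMultipleProtocolInterfaces (
                    &Handle,
                    &gEfiFirmwareVolumeBlockProtocolGuid,
                    &mFvb,
                    NULL
                    );
    }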
On Thu, 6 Mar 2014 08:52:13 +0000, Robie Basak robie.basak@canonical.com wrote:
On Sat, Mar 01, 2014 at 03:27:56PM +0000, Grant Likely wrote:
I would also reference section 3.3 (Boot Option Variables Default Boot Behavior) and 3.4.1.1 (Removable Media Boot Behavior) here. It's fine to restate the meaning of the requirement in this spec, but the UEFI spec is the authoritative source. Distributed VM disk images fall under the same scenario as the firmware not having any valid boot variables.
What happens when the VM is first booted without boot variables, but then the OS expects to be able to set boot variables and see them on next boot?
AIUI, we don't have an implementation of this right now, and even if we did, there are implications for persistent storage of this data further up the stack (a required implementation in libvirt, OpenStack nova providing a storage area for it, etc).
If possible, I would prefer to mandate that the host implementation is permitted to no-op (or otherwise disable) boot variable write operations altogether to avoid having to deal with this. In the common case, I don't see why an OS installation shipped via a VM disk image would need to write boot variables anyway.
If a VM is booting from a distributed disk image, the boot variables absolutely should start out empty. That's the only sane option.
It is appropriate to implement boot variable storage, but only because it is needed if multiple OSes get installed. Those variables should not get distributed with a disk image.
Would there be any adverse consequences to doing this?
It would be a bad idea to inhibit variable storage. That would break all kinds of boot and install scenarios.
My reason is that this would save us from blocking a general OpenStack implementation on ARM by requiring that these pieces are implemented further up the stack first, when doing so would bring no actual gain.
This would not preclude host implementations from implementing writeable variables, or guests from using them. Just that for a _portable VM disk image_, the OS on it cannot assume that this functionality is present.
Yeah, just to restate what I mean. If you're talking about bringing up a portable disk image, the VM should start with an empty list of boot variables.
g.