Currently, the virt machine model generates Device Tree information dynamically based on the existing devices in the system. This patch series extends the same concept to ACPI information. A total of seven tables have been implemented in this patch series, which is the minimum for basic ARM support.

The set of generated tables is:
- RSDP
- XSDT
- MADT
- GTDT
- FADT
- FACS
- DSDT

The tables are created in standalone buffers, taking into account the needed information passed from the virt machine model. When generation is finalized, the individual buffers are compacted into a single ACPI binary blob, which is injected into the guest memory space at a fixed location. The guest kernel can find the ACPI tables by being given the physical address of the ACPI blob (e.g. via the acpi_rsdp=0x47000000 boot argument).
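For illustration only (a sketch, not code from the series; the helper name and the glib allocation are assumptions), the compaction step amounts to concatenating the per-table buffers into one blob that is then placed at the fixed guest address:

    #include <stdint.h>
    #include <string.h>
    #include <glib.h>

    /* Concatenate the standalone table buffers into one contiguous blob.
     * The caller then copies the blob into guest RAM at a fixed address
     * (0x47000000 in the example above). */
    static uint32_t compact_acpi_tables(void *tables[], const int sizes[],
                                        int num_tables, void **blob_ptr)
    {
        uint32_t total = 0;
        uint8_t *blob, *p;
        int i;

        for (i = 0; i < num_tables; i++) {
            total += sizes[i];
        }

        p = blob = g_malloc0(total);
        for (i = 0; i < num_tables; i++) {
            memcpy(p, tables[i], sizes[i]);
            p += sizes[i];
        }

        *blob_ptr = blob;
        return total;
    }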
This series has been tested on the Foundation Model 0.8 build 5206 and the Juno development board. For kernel and driver support it is based on the "Introduce ACPI for ARM64 based on ACPI 5.1" and "Drivers for Juno to boot from ACPI" patch series from Hanjun Guo.
Alexander Spyridakis (7):
  hw/i386: Move ACPI header definitions in an arch-independent location
  hw/arm/virt-acpi: Basic skeleton for dynamic generation of ACPI tables
  hw/arm/virt-acpi: Generate RSDP and XSDT, add helper functions
  hw/arm/virt-acpi: Generate FACS and FADT, update ACPI headers
  hw/arm/virt-acpi: GIC and Arch Timer definitions in MADT and GTDT
  hw/arm/virt-acpi: Generation of DSDT including virt devices
  hw/arm/virt: Enable dynamic generation of ACPI v5.1 tables

 hw/arm/Makefile.objs        |   2 +-
 hw/arm/boot.c               |  26 +++
 hw/arm/virt-acpi.c          | 555 ++++++++++++++++++++++++++++++++++++++++++++
 hw/arm/virt.c               |  54 ++++-
 hw/i386/acpi-build.c        |   2 +-
 hw/i386/acpi-defs.h         | 368 -----------------------------
 include/hw/acpi/acpi-defs.h | 535 ++++++++++++++++++++++++++++++++++++++++++
 include/hw/arm/arm.h        |   2 +
 include/hw/arm/virt-acpi.h  |  73 ++++++
 tests/bios-tables-test.c    |   2 +-
 10 files changed, 1244 insertions(+), 375 deletions(-)
 create mode 100644 hw/arm/virt-acpi.c
 delete mode 100644 hw/i386/acpi-defs.h
 create mode 100644 include/hw/acpi/acpi-defs.h
 create mode 100644 include/hw/arm/virt-acpi.h
The ACPI-related header file acpi-defs.h includes definitions that apply to other architectures as well. Move it to `include/hw/acpi/` so that it can be sanely included from other architectures.
Signed-off-by: Alvise Rigo a.rigo@virtualopensystems.com
---
 hw/i386/acpi-build.c        |   2 +-
 hw/i386/acpi-defs.h         | 368 --------------------------------------------
 include/hw/acpi/acpi-defs.h | 368 ++++++++++++++++++++++++++++++++++++++++++++
 tests/bios-tables-test.c    |   2 +-
 4 files changed, 370 insertions(+), 370 deletions(-)
 delete mode 100644 hw/i386/acpi-defs.h
 create mode 100644 include/hw/acpi/acpi-defs.h
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index 00be4bb..1888420 100644 --- a/hw/i386/acpi-build.c +++ b/hw/i386/acpi-build.c @@ -33,7 +33,7 @@ #include "hw/i386/pc.h" #include "target-i386/cpu.h" #include "hw/timer/hpet.h" -#include "hw/i386/acpi-defs.h" +#include "hw/acpi/acpi-defs.h" #include "hw/acpi/acpi.h" #include "hw/nvram/fw_cfg.h" #include "bios-linker-loader.h" diff --git a/hw/i386/acpi-defs.h b/hw/i386/acpi-defs.h deleted file mode 100644 index c4468f8..0000000 --- a/hw/i386/acpi-defs.h +++ /dev/null @@ -1,368 +0,0 @@ -/* - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - - * You should have received a copy of the GNU General Public License along - * with this program; if not, see http://www.gnu.org/licenses/. - */ -#ifndef QEMU_ACPI_DEFS_H -#define QEMU_ACPI_DEFS_H - -enum { - ACPI_FADT_F_WBINVD, - ACPI_FADT_F_WBINVD_FLUSH, - ACPI_FADT_F_PROC_C1, - ACPI_FADT_F_P_LVL2_UP, - ACPI_FADT_F_PWR_BUTTON, - ACPI_FADT_F_SLP_BUTTON, - ACPI_FADT_F_FIX_RTC, - ACPI_FADT_F_RTC_S4, - ACPI_FADT_F_TMR_VAL_EXT, - ACPI_FADT_F_DCK_CAP, - ACPI_FADT_F_RESET_REG_SUP, - ACPI_FADT_F_SEALED_CASE, - ACPI_FADT_F_HEADLESS, - ACPI_FADT_F_CPU_SW_SLP, - ACPI_FADT_F_PCI_EXP_WAK, - ACPI_FADT_F_USE_PLATFORM_CLOCK, - ACPI_FADT_F_S4_RTC_STS_VALID, - ACPI_FADT_F_REMOTE_POWER_ON_CAPABLE, - ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL, - ACPI_FADT_F_FORCE_APIC_PHYSICAL_DESTINATION_MODE, - ACPI_FADT_F_HW_REDUCED_ACPI, - ACPI_FADT_F_LOW_POWER_S0_IDLE_CAPABLE, -}; - -/* - * ACPI 2.0 Generic Address Space definition. 
- */ -struct Acpi20GenericAddress { - uint8_t address_space_id; - uint8_t register_bit_width; - uint8_t register_bit_offset; - uint8_t reserved; - uint64_t address; -} QEMU_PACKED; -typedef struct Acpi20GenericAddress Acpi20GenericAddress; - -struct AcpiRsdpDescriptor { /* Root System Descriptor Pointer */ - uint64_t signature; /* ACPI signature, contains "RSD PTR " */ - uint8_t checksum; /* To make sum of struct == 0 */ - uint8_t oem_id [6]; /* OEM identification */ - uint8_t revision; /* Must be 0 for 1.0, 2 for 2.0 */ - uint32_t rsdt_physical_address; /* 32-bit physical address of RSDT */ - uint32_t length; /* XSDT Length in bytes including hdr */ - uint64_t xsdt_physical_address; /* 64-bit physical address of XSDT */ - uint8_t extended_checksum; /* Checksum of entire table */ - uint8_t reserved [3]; /* Reserved field must be 0 */ -} QEMU_PACKED; -typedef struct AcpiRsdpDescriptor AcpiRsdpDescriptor; - -/* Table structure from Linux kernel (the ACPI tables are under the - BSD license) */ - - -#define ACPI_TABLE_HEADER_DEF /* ACPI common table header */ \ - uint32_t signature; /* ACPI signature (4 ASCII characters) */ \ - uint32_t length; /* Length of table, in bytes, including header */ \ - uint8_t revision; /* ACPI Specification minor version # */ \ - uint8_t checksum; /* To make sum of entire table == 0 */ \ - uint8_t oem_id [6]; /* OEM identification */ \ - uint8_t oem_table_id [8]; /* OEM table identification */ \ - uint32_t oem_revision; /* OEM revision number */ \ - uint8_t asl_compiler_id [4]; /* ASL compiler vendor ID */ \ - uint32_t asl_compiler_revision; /* ASL compiler revision number */ - - -struct AcpiTableHeader /* ACPI common table header */ -{ - ACPI_TABLE_HEADER_DEF -} QEMU_PACKED; -typedef struct AcpiTableHeader AcpiTableHeader; - -/* - * ACPI 1.0 Fixed ACPI Description Table (FADT) - */ -struct AcpiFadtDescriptorRev1 -{ - ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint32_t firmware_ctrl; /* Physical address of FACS */ - uint32_t dsdt; /* Physical address of DSDT */ - uint8_t model; /* System Interrupt Model */ - uint8_t reserved1; /* Reserved */ - uint16_t sci_int; /* System vector of SCI interrupt */ - uint32_t smi_cmd; /* Port address of SMI command port */ - uint8_t acpi_enable; /* Value to write to smi_cmd to enable ACPI */ - uint8_t acpi_disable; /* Value to write to smi_cmd to disable ACPI */ - uint8_t S4bios_req; /* Value to write to SMI CMD to enter S4BIOS state */ - uint8_t reserved2; /* Reserved - must be zero */ - uint32_t pm1a_evt_blk; /* Port address of Power Mgt 1a acpi_event Reg Blk */ - uint32_t pm1b_evt_blk; /* Port address of Power Mgt 1b acpi_event Reg Blk */ - uint32_t pm1a_cnt_blk; /* Port address of Power Mgt 1a Control Reg Blk */ - uint32_t pm1b_cnt_blk; /* Port address of Power Mgt 1b Control Reg Blk */ - uint32_t pm2_cnt_blk; /* Port address of Power Mgt 2 Control Reg Blk */ - uint32_t pm_tmr_blk; /* Port address of Power Mgt Timer Ctrl Reg Blk */ - uint32_t gpe0_blk; /* Port addr of General Purpose acpi_event 0 Reg Blk */ - uint32_t gpe1_blk; /* Port addr of General Purpose acpi_event 1 Reg Blk */ - uint8_t pm1_evt_len; /* Byte length of ports at pm1_x_evt_blk */ - uint8_t pm1_cnt_len; /* Byte length of ports at pm1_x_cnt_blk */ - uint8_t pm2_cnt_len; /* Byte Length of ports at pm2_cnt_blk */ - uint8_t pm_tmr_len; /* Byte Length of ports at pm_tm_blk */ - uint8_t gpe0_blk_len; /* Byte Length of ports at gpe0_blk */ - uint8_t gpe1_blk_len; /* Byte Length of ports at gpe1_blk */ - uint8_t gpe1_base; /* Offset in gpe model where gpe1 
events start */ - uint8_t reserved3; /* Reserved */ - uint16_t plvl2_lat; /* Worst case HW latency to enter/exit C2 state */ - uint16_t plvl3_lat; /* Worst case HW latency to enter/exit C3 state */ - uint16_t flush_size; /* Size of area read to flush caches */ - uint16_t flush_stride; /* Stride used in flushing caches */ - uint8_t duty_offset; /* Bit location of duty cycle field in p_cnt reg */ - uint8_t duty_width; /* Bit width of duty cycle field in p_cnt reg */ - uint8_t day_alrm; /* Index to day-of-month alarm in RTC CMOS RAM */ - uint8_t mon_alrm; /* Index to month-of-year alarm in RTC CMOS RAM */ - uint8_t century; /* Index to century in RTC CMOS RAM */ - uint8_t reserved4; /* Reserved */ - uint8_t reserved4a; /* Reserved */ - uint8_t reserved4b; /* Reserved */ - uint32_t flags; -} QEMU_PACKED; -typedef struct AcpiFadtDescriptorRev1 AcpiFadtDescriptorRev1; - -/* - * ACPI 1.0 Root System Description Table (RSDT) - */ -struct AcpiRsdtDescriptorRev1 -{ - ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint32_t table_offset_entry[0]; /* Array of pointers to other */ - /* ACPI tables */ -} QEMU_PACKED; -typedef struct AcpiRsdtDescriptorRev1 AcpiRsdtDescriptorRev1; - -/* - * ACPI 1.0 Firmware ACPI Control Structure (FACS) - */ -struct AcpiFacsDescriptorRev1 -{ - uint32_t signature; /* ACPI Signature */ - uint32_t length; /* Length of structure, in bytes */ - uint32_t hardware_signature; /* Hardware configuration signature */ - uint32_t firmware_waking_vector; /* ACPI OS waking vector */ - uint32_t global_lock; /* Global Lock */ - uint32_t flags; - uint8_t resverved3 [40]; /* Reserved - must be zero */ -} QEMU_PACKED; -typedef struct AcpiFacsDescriptorRev1 AcpiFacsDescriptorRev1; - -/* - * Differentiated System Description Table (DSDT) - */ - -/* - * MADT values and structures - */ - -/* Values for MADT PCATCompat */ - -#define ACPI_DUAL_PIC 0 -#define ACPI_MULTIPLE_APIC 1 - -/* Master MADT */ - -struct AcpiMultipleApicTable -{ - ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint32_t local_apic_address; /* Physical address of local APIC */ - uint32_t flags; -} QEMU_PACKED; -typedef struct AcpiMultipleApicTable AcpiMultipleApicTable; - -/* Values for Type in APIC sub-headers */ - -#define ACPI_APIC_PROCESSOR 0 -#define ACPI_APIC_IO 1 -#define ACPI_APIC_XRUPT_OVERRIDE 2 -#define ACPI_APIC_NMI 3 -#define ACPI_APIC_LOCAL_NMI 4 -#define ACPI_APIC_ADDRESS_OVERRIDE 5 -#define ACPI_APIC_IO_SAPIC 6 -#define ACPI_APIC_LOCAL_SAPIC 7 -#define ACPI_APIC_XRUPT_SOURCE 8 -#define ACPI_APIC_RESERVED 9 /* 9 and greater are reserved */ - -/* - * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE) - */ -#define ACPI_SUB_HEADER_DEF /* Common ACPI sub-structure header */\ - uint8_t type; \ - uint8_t length; - -/* Sub-structures for MADT */ - -struct AcpiMadtProcessorApic -{ - ACPI_SUB_HEADER_DEF - uint8_t processor_id; /* ACPI processor id */ - uint8_t local_apic_id; /* Processor's local APIC id */ - uint32_t flags; -} QEMU_PACKED; -typedef struct AcpiMadtProcessorApic AcpiMadtProcessorApic; - -struct AcpiMadtIoApic -{ - ACPI_SUB_HEADER_DEF - uint8_t io_apic_id; /* I/O APIC ID */ - uint8_t reserved; /* Reserved - must be zero */ - uint32_t address; /* APIC physical address */ - uint32_t interrupt; /* Global system interrupt where INTI - * lines start */ -} QEMU_PACKED; -typedef struct AcpiMadtIoApic AcpiMadtIoApic; - -struct AcpiMadtIntsrcovr { - ACPI_SUB_HEADER_DEF - uint8_t bus; - uint8_t source; - uint32_t gsi; - uint16_t flags; -} QEMU_PACKED; -typedef struct AcpiMadtIntsrcovr 
AcpiMadtIntsrcovr; - -struct AcpiMadtLocalNmi { - ACPI_SUB_HEADER_DEF - uint8_t processor_id; /* ACPI processor id */ - uint16_t flags; /* MPS INTI flags */ - uint8_t lint; /* Local APIC LINT# */ -} QEMU_PACKED; -typedef struct AcpiMadtLocalNmi AcpiMadtLocalNmi; - -/* - * HPET Description Table - */ -struct Acpi20Hpet { - ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint32_t timer_block_id; - Acpi20GenericAddress addr; - uint8_t hpet_number; - uint16_t min_tick; - uint8_t page_protect; -} QEMU_PACKED; -typedef struct Acpi20Hpet Acpi20Hpet; - -/* - * SRAT (NUMA topology description) table - */ - -struct AcpiSystemResourceAffinityTable -{ - ACPI_TABLE_HEADER_DEF - uint32_t reserved1; - uint32_t reserved2[2]; -} QEMU_PACKED; -typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable; - -#define ACPI_SRAT_PROCESSOR 0 -#define ACPI_SRAT_MEMORY 1 - -struct AcpiSratProcessorAffinity -{ - ACPI_SUB_HEADER_DEF - uint8_t proximity_lo; - uint8_t local_apic_id; - uint32_t flags; - uint8_t local_sapic_eid; - uint8_t proximity_hi[3]; - uint32_t reserved; -} QEMU_PACKED; -typedef struct AcpiSratProcessorAffinity AcpiSratProcessorAffinity; - -struct AcpiSratMemoryAffinity -{ - ACPI_SUB_HEADER_DEF - uint8_t proximity[4]; - uint16_t reserved1; - uint64_t base_addr; - uint64_t range_length; - uint32_t reserved2; - uint32_t flags; - uint32_t reserved3[2]; -} QEMU_PACKED; -typedef struct AcpiSratMemoryAffinity AcpiSratMemoryAffinity; - -/* PCI fw r3.0 MCFG table. */ -/* Subtable */ -struct AcpiMcfgAllocation { - uint64_t address; /* Base address, processor-relative */ - uint16_t pci_segment; /* PCI segment group number */ - uint8_t start_bus_number; /* Starting PCI Bus number */ - uint8_t end_bus_number; /* Final PCI Bus number */ - uint32_t reserved; -} QEMU_PACKED; -typedef struct AcpiMcfgAllocation AcpiMcfgAllocation; - -struct AcpiTableMcfg { - ACPI_TABLE_HEADER_DEF; - uint8_t reserved[8]; - AcpiMcfgAllocation allocation[0]; -} QEMU_PACKED; -typedef struct AcpiTableMcfg AcpiTableMcfg; - -/* - * TCPA Description Table - */ -struct Acpi20Tcpa { - ACPI_TABLE_HEADER_DEF /* ACPI common table header */ - uint16_t platform_class; - uint32_t log_area_minimum_length; - uint64_t log_area_start_address; -} QEMU_PACKED; -typedef struct Acpi20Tcpa Acpi20Tcpa; - -/* DMAR - DMA Remapping table r2.2 */ -struct AcpiTableDmar { - ACPI_TABLE_HEADER_DEF - uint8_t host_address_width; /* Maximum DMA physical addressability */ - uint8_t flags; - uint8_t reserved[10]; -} QEMU_PACKED; -typedef struct AcpiTableDmar AcpiTableDmar; - -/* Masks for Flags field above */ -#define ACPI_DMAR_INTR_REMAP 1 -#define ACPI_DMAR_X2APIC_OPT_OUT (1 << 1) - -/* Values for sub-structure type for DMAR */ -enum { - ACPI_DMAR_TYPE_HARDWARE_UNIT = 0, /* DRHD */ - ACPI_DMAR_TYPE_RESERVED_MEMORY = 1, /* RMRR */ - ACPI_DMAR_TYPE_ATSR = 2, /* ATSR */ - ACPI_DMAR_TYPE_HARDWARE_AFFINITY = 3, /* RHSR */ - ACPI_DMAR_TYPE_ANDD = 4, /* ANDD */ - ACPI_DMAR_TYPE_RESERVED = 5 /* Reserved for furture use */ -}; - -/* - * Sub-structures for DMAR - */ -/* Type 0: Hardware Unit Definition */ -struct AcpiDmarHardwareUnit { - uint16_t type; - uint16_t length; - uint8_t flags; - uint8_t reserved; - uint16_t pci_segment; /* The PCI Segment associated with this unit */ - uint64_t address; /* Base address of remapping hardware register-set */ -} QEMU_PACKED; -typedef struct AcpiDmarHardwareUnit AcpiDmarHardwareUnit; - -/* Masks for Flags field above */ -#define ACPI_DMAR_INCLUDE_PCI_ALL 1 - -#endif diff --git a/include/hw/acpi/acpi-defs.h 
b/include/hw/acpi/acpi-defs.h new file mode 100644 index 0000000..c4468f8 --- /dev/null +++ b/include/hw/acpi/acpi-defs.h @@ -0,0 +1,368 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. + */ +#ifndef QEMU_ACPI_DEFS_H +#define QEMU_ACPI_DEFS_H + +enum { + ACPI_FADT_F_WBINVD, + ACPI_FADT_F_WBINVD_FLUSH, + ACPI_FADT_F_PROC_C1, + ACPI_FADT_F_P_LVL2_UP, + ACPI_FADT_F_PWR_BUTTON, + ACPI_FADT_F_SLP_BUTTON, + ACPI_FADT_F_FIX_RTC, + ACPI_FADT_F_RTC_S4, + ACPI_FADT_F_TMR_VAL_EXT, + ACPI_FADT_F_DCK_CAP, + ACPI_FADT_F_RESET_REG_SUP, + ACPI_FADT_F_SEALED_CASE, + ACPI_FADT_F_HEADLESS, + ACPI_FADT_F_CPU_SW_SLP, + ACPI_FADT_F_PCI_EXP_WAK, + ACPI_FADT_F_USE_PLATFORM_CLOCK, + ACPI_FADT_F_S4_RTC_STS_VALID, + ACPI_FADT_F_REMOTE_POWER_ON_CAPABLE, + ACPI_FADT_F_FORCE_APIC_CLUSTER_MODEL, + ACPI_FADT_F_FORCE_APIC_PHYSICAL_DESTINATION_MODE, + ACPI_FADT_F_HW_REDUCED_ACPI, + ACPI_FADT_F_LOW_POWER_S0_IDLE_CAPABLE, +}; + +/* + * ACPI 2.0 Generic Address Space definition. + */ +struct Acpi20GenericAddress { + uint8_t address_space_id; + uint8_t register_bit_width; + uint8_t register_bit_offset; + uint8_t reserved; + uint64_t address; +} QEMU_PACKED; +typedef struct Acpi20GenericAddress Acpi20GenericAddress; + +struct AcpiRsdpDescriptor { /* Root System Descriptor Pointer */ + uint64_t signature; /* ACPI signature, contains "RSD PTR " */ + uint8_t checksum; /* To make sum of struct == 0 */ + uint8_t oem_id [6]; /* OEM identification */ + uint8_t revision; /* Must be 0 for 1.0, 2 for 2.0 */ + uint32_t rsdt_physical_address; /* 32-bit physical address of RSDT */ + uint32_t length; /* XSDT Length in bytes including hdr */ + uint64_t xsdt_physical_address; /* 64-bit physical address of XSDT */ + uint8_t extended_checksum; /* Checksum of entire table */ + uint8_t reserved [3]; /* Reserved field must be 0 */ +} QEMU_PACKED; +typedef struct AcpiRsdpDescriptor AcpiRsdpDescriptor; + +/* Table structure from Linux kernel (the ACPI tables are under the + BSD license) */ + + +#define ACPI_TABLE_HEADER_DEF /* ACPI common table header */ \ + uint32_t signature; /* ACPI signature (4 ASCII characters) */ \ + uint32_t length; /* Length of table, in bytes, including header */ \ + uint8_t revision; /* ACPI Specification minor version # */ \ + uint8_t checksum; /* To make sum of entire table == 0 */ \ + uint8_t oem_id [6]; /* OEM identification */ \ + uint8_t oem_table_id [8]; /* OEM table identification */ \ + uint32_t oem_revision; /* OEM revision number */ \ + uint8_t asl_compiler_id [4]; /* ASL compiler vendor ID */ \ + uint32_t asl_compiler_revision; /* ASL compiler revision number */ + + +struct AcpiTableHeader /* ACPI common table header */ +{ + ACPI_TABLE_HEADER_DEF +} QEMU_PACKED; +typedef struct AcpiTableHeader AcpiTableHeader; + +/* + * ACPI 1.0 Fixed ACPI Description Table (FADT) + */ +struct AcpiFadtDescriptorRev1 +{ + ACPI_TABLE_HEADER_DEF /* ACPI common table header */ + uint32_t firmware_ctrl; /* Physical address of FACS */ + 
uint32_t dsdt; /* Physical address of DSDT */ + uint8_t model; /* System Interrupt Model */ + uint8_t reserved1; /* Reserved */ + uint16_t sci_int; /* System vector of SCI interrupt */ + uint32_t smi_cmd; /* Port address of SMI command port */ + uint8_t acpi_enable; /* Value to write to smi_cmd to enable ACPI */ + uint8_t acpi_disable; /* Value to write to smi_cmd to disable ACPI */ + uint8_t S4bios_req; /* Value to write to SMI CMD to enter S4BIOS state */ + uint8_t reserved2; /* Reserved - must be zero */ + uint32_t pm1a_evt_blk; /* Port address of Power Mgt 1a acpi_event Reg Blk */ + uint32_t pm1b_evt_blk; /* Port address of Power Mgt 1b acpi_event Reg Blk */ + uint32_t pm1a_cnt_blk; /* Port address of Power Mgt 1a Control Reg Blk */ + uint32_t pm1b_cnt_blk; /* Port address of Power Mgt 1b Control Reg Blk */ + uint32_t pm2_cnt_blk; /* Port address of Power Mgt 2 Control Reg Blk */ + uint32_t pm_tmr_blk; /* Port address of Power Mgt Timer Ctrl Reg Blk */ + uint32_t gpe0_blk; /* Port addr of General Purpose acpi_event 0 Reg Blk */ + uint32_t gpe1_blk; /* Port addr of General Purpose acpi_event 1 Reg Blk */ + uint8_t pm1_evt_len; /* Byte length of ports at pm1_x_evt_blk */ + uint8_t pm1_cnt_len; /* Byte length of ports at pm1_x_cnt_blk */ + uint8_t pm2_cnt_len; /* Byte Length of ports at pm2_cnt_blk */ + uint8_t pm_tmr_len; /* Byte Length of ports at pm_tm_blk */ + uint8_t gpe0_blk_len; /* Byte Length of ports at gpe0_blk */ + uint8_t gpe1_blk_len; /* Byte Length of ports at gpe1_blk */ + uint8_t gpe1_base; /* Offset in gpe model where gpe1 events start */ + uint8_t reserved3; /* Reserved */ + uint16_t plvl2_lat; /* Worst case HW latency to enter/exit C2 state */ + uint16_t plvl3_lat; /* Worst case HW latency to enter/exit C3 state */ + uint16_t flush_size; /* Size of area read to flush caches */ + uint16_t flush_stride; /* Stride used in flushing caches */ + uint8_t duty_offset; /* Bit location of duty cycle field in p_cnt reg */ + uint8_t duty_width; /* Bit width of duty cycle field in p_cnt reg */ + uint8_t day_alrm; /* Index to day-of-month alarm in RTC CMOS RAM */ + uint8_t mon_alrm; /* Index to month-of-year alarm in RTC CMOS RAM */ + uint8_t century; /* Index to century in RTC CMOS RAM */ + uint8_t reserved4; /* Reserved */ + uint8_t reserved4a; /* Reserved */ + uint8_t reserved4b; /* Reserved */ + uint32_t flags; +} QEMU_PACKED; +typedef struct AcpiFadtDescriptorRev1 AcpiFadtDescriptorRev1; + +/* + * ACPI 1.0 Root System Description Table (RSDT) + */ +struct AcpiRsdtDescriptorRev1 +{ + ACPI_TABLE_HEADER_DEF /* ACPI common table header */ + uint32_t table_offset_entry[0]; /* Array of pointers to other */ + /* ACPI tables */ +} QEMU_PACKED; +typedef struct AcpiRsdtDescriptorRev1 AcpiRsdtDescriptorRev1; + +/* + * ACPI 1.0 Firmware ACPI Control Structure (FACS) + */ +struct AcpiFacsDescriptorRev1 +{ + uint32_t signature; /* ACPI Signature */ + uint32_t length; /* Length of structure, in bytes */ + uint32_t hardware_signature; /* Hardware configuration signature */ + uint32_t firmware_waking_vector; /* ACPI OS waking vector */ + uint32_t global_lock; /* Global Lock */ + uint32_t flags; + uint8_t resverved3 [40]; /* Reserved - must be zero */ +} QEMU_PACKED; +typedef struct AcpiFacsDescriptorRev1 AcpiFacsDescriptorRev1; + +/* + * Differentiated System Description Table (DSDT) + */ + +/* + * MADT values and structures + */ + +/* Values for MADT PCATCompat */ + +#define ACPI_DUAL_PIC 0 +#define ACPI_MULTIPLE_APIC 1 + +/* Master MADT */ + +struct AcpiMultipleApicTable +{ + 
ACPI_TABLE_HEADER_DEF /* ACPI common table header */ + uint32_t local_apic_address; /* Physical address of local APIC */ + uint32_t flags; +} QEMU_PACKED; +typedef struct AcpiMultipleApicTable AcpiMultipleApicTable; + +/* Values for Type in APIC sub-headers */ + +#define ACPI_APIC_PROCESSOR 0 +#define ACPI_APIC_IO 1 +#define ACPI_APIC_XRUPT_OVERRIDE 2 +#define ACPI_APIC_NMI 3 +#define ACPI_APIC_LOCAL_NMI 4 +#define ACPI_APIC_ADDRESS_OVERRIDE 5 +#define ACPI_APIC_IO_SAPIC 6 +#define ACPI_APIC_LOCAL_SAPIC 7 +#define ACPI_APIC_XRUPT_SOURCE 8 +#define ACPI_APIC_RESERVED 9 /* 9 and greater are reserved */ + +/* + * MADT sub-structures (Follow MULTIPLE_APIC_DESCRIPTION_TABLE) + */ +#define ACPI_SUB_HEADER_DEF /* Common ACPI sub-structure header */\ + uint8_t type; \ + uint8_t length; + +/* Sub-structures for MADT */ + +struct AcpiMadtProcessorApic +{ + ACPI_SUB_HEADER_DEF + uint8_t processor_id; /* ACPI processor id */ + uint8_t local_apic_id; /* Processor's local APIC id */ + uint32_t flags; +} QEMU_PACKED; +typedef struct AcpiMadtProcessorApic AcpiMadtProcessorApic; + +struct AcpiMadtIoApic +{ + ACPI_SUB_HEADER_DEF + uint8_t io_apic_id; /* I/O APIC ID */ + uint8_t reserved; /* Reserved - must be zero */ + uint32_t address; /* APIC physical address */ + uint32_t interrupt; /* Global system interrupt where INTI + * lines start */ +} QEMU_PACKED; +typedef struct AcpiMadtIoApic AcpiMadtIoApic; + +struct AcpiMadtIntsrcovr { + ACPI_SUB_HEADER_DEF + uint8_t bus; + uint8_t source; + uint32_t gsi; + uint16_t flags; +} QEMU_PACKED; +typedef struct AcpiMadtIntsrcovr AcpiMadtIntsrcovr; + +struct AcpiMadtLocalNmi { + ACPI_SUB_HEADER_DEF + uint8_t processor_id; /* ACPI processor id */ + uint16_t flags; /* MPS INTI flags */ + uint8_t lint; /* Local APIC LINT# */ +} QEMU_PACKED; +typedef struct AcpiMadtLocalNmi AcpiMadtLocalNmi; + +/* + * HPET Description Table + */ +struct Acpi20Hpet { + ACPI_TABLE_HEADER_DEF /* ACPI common table header */ + uint32_t timer_block_id; + Acpi20GenericAddress addr; + uint8_t hpet_number; + uint16_t min_tick; + uint8_t page_protect; +} QEMU_PACKED; +typedef struct Acpi20Hpet Acpi20Hpet; + +/* + * SRAT (NUMA topology description) table + */ + +struct AcpiSystemResourceAffinityTable +{ + ACPI_TABLE_HEADER_DEF + uint32_t reserved1; + uint32_t reserved2[2]; +} QEMU_PACKED; +typedef struct AcpiSystemResourceAffinityTable AcpiSystemResourceAffinityTable; + +#define ACPI_SRAT_PROCESSOR 0 +#define ACPI_SRAT_MEMORY 1 + +struct AcpiSratProcessorAffinity +{ + ACPI_SUB_HEADER_DEF + uint8_t proximity_lo; + uint8_t local_apic_id; + uint32_t flags; + uint8_t local_sapic_eid; + uint8_t proximity_hi[3]; + uint32_t reserved; +} QEMU_PACKED; +typedef struct AcpiSratProcessorAffinity AcpiSratProcessorAffinity; + +struct AcpiSratMemoryAffinity +{ + ACPI_SUB_HEADER_DEF + uint8_t proximity[4]; + uint16_t reserved1; + uint64_t base_addr; + uint64_t range_length; + uint32_t reserved2; + uint32_t flags; + uint32_t reserved3[2]; +} QEMU_PACKED; +typedef struct AcpiSratMemoryAffinity AcpiSratMemoryAffinity; + +/* PCI fw r3.0 MCFG table. 
*/ +/* Subtable */ +struct AcpiMcfgAllocation { + uint64_t address; /* Base address, processor-relative */ + uint16_t pci_segment; /* PCI segment group number */ + uint8_t start_bus_number; /* Starting PCI Bus number */ + uint8_t end_bus_number; /* Final PCI Bus number */ + uint32_t reserved; +} QEMU_PACKED; +typedef struct AcpiMcfgAllocation AcpiMcfgAllocation; + +struct AcpiTableMcfg { + ACPI_TABLE_HEADER_DEF; + uint8_t reserved[8]; + AcpiMcfgAllocation allocation[0]; +} QEMU_PACKED; +typedef struct AcpiTableMcfg AcpiTableMcfg; + +/* + * TCPA Description Table + */ +struct Acpi20Tcpa { + ACPI_TABLE_HEADER_DEF /* ACPI common table header */ + uint16_t platform_class; + uint32_t log_area_minimum_length; + uint64_t log_area_start_address; +} QEMU_PACKED; +typedef struct Acpi20Tcpa Acpi20Tcpa; + +/* DMAR - DMA Remapping table r2.2 */ +struct AcpiTableDmar { + ACPI_TABLE_HEADER_DEF + uint8_t host_address_width; /* Maximum DMA physical addressability */ + uint8_t flags; + uint8_t reserved[10]; +} QEMU_PACKED; +typedef struct AcpiTableDmar AcpiTableDmar; + +/* Masks for Flags field above */ +#define ACPI_DMAR_INTR_REMAP 1 +#define ACPI_DMAR_X2APIC_OPT_OUT (1 << 1) + +/* Values for sub-structure type for DMAR */ +enum { + ACPI_DMAR_TYPE_HARDWARE_UNIT = 0, /* DRHD */ + ACPI_DMAR_TYPE_RESERVED_MEMORY = 1, /* RMRR */ + ACPI_DMAR_TYPE_ATSR = 2, /* ATSR */ + ACPI_DMAR_TYPE_HARDWARE_AFFINITY = 3, /* RHSR */ + ACPI_DMAR_TYPE_ANDD = 4, /* ANDD */ + ACPI_DMAR_TYPE_RESERVED = 5 /* Reserved for furture use */ +}; + +/* + * Sub-structures for DMAR + */ +/* Type 0: Hardware Unit Definition */ +struct AcpiDmarHardwareUnit { + uint16_t type; + uint16_t length; + uint8_t flags; + uint8_t reserved; + uint16_t pci_segment; /* The PCI Segment associated with this unit */ + uint64_t address; /* Base address of remapping hardware register-set */ +} QEMU_PACKED; +typedef struct AcpiDmarHardwareUnit AcpiDmarHardwareUnit; + +/* Masks for Flags field above */ +#define ACPI_DMAR_INCLUDE_PCI_ALL 1 + +#endif diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c index 9e4d205..e7985a4 100644 --- a/tests/bios-tables-test.c +++ b/tests/bios-tables-test.c @@ -17,7 +17,7 @@ #include "qemu-common.h" #include "libqtest.h" #include "qemu/compiler.h" -#include "hw/i386/acpi-defs.h" +#include "hw/acpi/acpi-defs.h" #include "hw/i386/smbios.h" #include "qemu/bitmap.h"
Introduce a preliminary skeleton in virt-acpi.c with the main ACPI build functions. The virt machine model is meant to call 'acpi_build_tables' after the platform devices are created, passing all needed information. From that point, each table will be created in a separate memory buffer, and the buffers will finally be copied to guest memory as a single binary blob.

The minimum required ACPI v5.1 tables for ARM are:
- RSDP: Initial table that points to the XSDT
- XSDT: Points to all other tables (except FACS and DSDT)
- FADT: Generic information about the machine
- DSDT: Holds all information about system devices/peripherals
- FACS: Needs to be pointed to from the FADT
Signed-off-by: Alexander Spyridakis a.spyridakis@virtualopensystems.com
Signed-off-by: Alvise Rigo a.rigo@virtualopensystems.com
---
 hw/arm/Makefile.objs       |  2 +-
 hw/arm/virt-acpi.c         | 84 +++++++++++++++++++++++++++++++++++++++++++++
 include/hw/arm/virt-acpi.h | 52 ++++++++++++++++++++++++++++
 3 files changed, 137 insertions(+), 1 deletion(-)
 create mode 100644 hw/arm/virt-acpi.c
 create mode 100644 include/hw/arm/virt-acpi.h
diff --git a/hw/arm/Makefile.objs b/hw/arm/Makefile.objs index 6088e53..7cffb25 100644 --- a/hw/arm/Makefile.objs +++ b/hw/arm/Makefile.objs @@ -2,7 +2,7 @@ obj-y += boot.o collie.o exynos4_boards.o gumstix.o highbank.o obj-$(CONFIG_DIGIC) += digic_boards.o obj-y += integratorcp.o kzm.o mainstone.o musicpal.o nseries.o obj-y += omap_sx1.o palm.o realview.o spitz.o stellaris.o -obj-y += tosa.o versatilepb.o vexpress.o virt.o xilinx_zynq.o z2.o +obj-y += tosa.o versatilepb.o vexpress.o virt.o virt-acpi.o xilinx_zynq.o z2.o
obj-y += armv7m.o exynos4210.o pxa2xx.o pxa2xx_gpio.o pxa2xx_pic.o obj-$(CONFIG_DIGIC) += digic.o diff --git a/hw/arm/virt-acpi.c b/hw/arm/virt-acpi.c new file mode 100644 index 0000000..5c8df45 --- /dev/null +++ b/hw/arm/virt-acpi.c @@ -0,0 +1,84 @@ +/* + * ARM virt ACPI generation + * + * Copyright (c) 2014 Virtual Open Systems + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2 or later, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#include "hw/arm/virt-acpi.h" + +static void *acpi_table[NUM_ACPI_TABLES]; +static int acpi_size[NUM_ACPI_TABLES]; + +static void acpi_create_rsdp(void) +{ + acpi_table[RSDP] = NULL; + acpi_size[RSDP] = 0; +} + +static void acpi_create_xsdt(void) +{ + acpi_table[XSDT] = NULL; + acpi_size[XSDT] = 0; +} + +static void acpi_create_madt(uint32_t smp_cpus, + const struct acpi_madt_info *info) +{ + acpi_table[MADT] = NULL; + acpi_size[MADT] = 0; +} + +static void acpi_create_gtdt(const struct acpi_gtdt_info *irqs) +{ + acpi_table[GTDT] = NULL; + acpi_size[GTDT] = 0; +} + +static void acpi_create_fadt(void) +{ + acpi_table[FADT] = NULL; + acpi_size[FADT] = 0; +} + +static void acpi_create_facs(void) +{ + acpi_table[FACS] = NULL; + acpi_size[FACS] = 0; +} + +static void acpi_create_dsdt(int smp_cpus, const struct acpi_dsdt_info *info) +{ + acpi_table[DSDT] = NULL; + acpi_size[DSDT] = 0; +} + +void acpi_build_tables(int smp_cpus, + const struct acpi_gtdt_info *gtdt_info, + const struct acpi_madt_info *madt_info, + const struct acpi_dsdt_info *dsdt_info) +{ + acpi_create_madt(smp_cpus, madt_info); + acpi_create_gtdt(gtdt_info); + acpi_create_rsdp(); + acpi_create_facs(); + acpi_create_xsdt(); + acpi_create_fadt(); + acpi_create_dsdt(smp_cpus, dsdt_info); +} + +uint32_t acpi_make_blob(void **blob_ptr) +{ + return 0; +} diff --git a/include/hw/arm/virt-acpi.h b/include/hw/arm/virt-acpi.h new file mode 100644 index 0000000..5098118 --- /dev/null +++ b/include/hw/arm/virt-acpi.h @@ -0,0 +1,52 @@ +/* + * + * Copyright (c) 2014 Virtual Open Systems + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + + * You should have received a copy of the GNU General Public License along + * with this program; if not, see http://www.gnu.org/licenses/. 
+ */ +#ifndef QEMU_VIRT_ACPI_BUILD_H +#define QEMU_VIRT_ACPI_BUILD_H + +#include "qemu-common.h" +#include "hw/acpi/acpi-defs.h" + +#define NUM_ACPI_TABLES 7 + +enum { + RSDP, + XSDT, + MADT, + GTDT, + /* New tables should be added before FADT */ + FADT, + FACS, + DSDT, +}; + +struct acpi_gtdt_info { +}; + +struct acpi_madt_info { +}; + +struct acpi_dsdt_info { +}; + +void acpi_build_tables(int smp_cpus, + const struct acpi_gtdt_info *gtdt_info, + const struct acpi_madt_info *madt_info, + const struct acpi_dsdt_info *dsdt_info); +uint32_t acpi_make_blob(void **blob_ptr); + +#endif
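To make the intended flow concrete, here is a minimal sketch of how the virt machine model might consume this skeleton once the table builders are filled in. Only acpi_build_tables() and acpi_make_blob() come from the header above; the function name, the load address, and the rom_add_blob_fixed() call are assumptions for illustration, not part of this patch:

    #include "hw/arm/virt-acpi.h"
    #include "hw/loader.h"

    #define VIRT_ACPI_BLOB_ADDR 0x47000000  /* example address from the cover letter */

    static void virt_acpi_setup(int smp_cpus,
                                const struct acpi_gtdt_info *gtdt,
                                const struct acpi_madt_info *madt,
                                const struct acpi_dsdt_info *dsdt)
    {
        void *blob;
        uint32_t size;

        /* Build every table in its own buffer, then flatten them. */
        acpi_build_tables(smp_cpus, gtdt, madt, dsdt);
        size = acpi_make_blob(&blob);

        if (size) {
            /* Expose the blob to the guest at a fixed physical address,
             * which is then passed to the kernel via acpi_rsdp=. */
            rom_add_blob_fixed("virt.acpi", blob, size, VIRT_ACPI_BLOB_ADDR);
        }
    }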
On 30 October 2014 17:44, Alexander Spyridakis a.spyridakis@virtualopensystems.com wrote:
Introduce a preliminary skeleton in virt-acpi.c with the main ACPI build functions. The virt machine model is meant to call 'acpi_build_tables' after the platform devices are created, passing all needed information. From that point, each table will be created in a separate memory buffer, and the buffers will finally be copied to guest memory as a single binary blob.
Please, no. ACPI should be the guest's problem, not QEMU's. We provide the guest with a device tree it can look at to figure out what h/w is present.
-- PMM
On 30 October 2014 17:43, Alexander Spyridakis a.spyridakis@virtualopensystems.com wrote:
Currently, the virt machine model generates Device Tree information dynamically based on the existing devices in the system. This patch series extends the same concept to ACPI information. A total of seven tables have been implemented in this patch series, which is the minimum for basic ARM support.
The set of generated tables is:
- RSDP
- XSDT
- MADT
- GTDT
- FADT
- FACS
- DSDT
The tables are created in standalone buffers, taking into account the needed information passed from the virt machine model. When generation is finalized, the individual buffers are compacted into a single ACPI binary blob, which is injected into the guest memory space at a fixed location. The guest kernel can find the ACPI tables by being given the physical address of the ACPI blob (e.g. via the acpi_rsdp=0x47000000 boot argument).
(Sorry, I should have waited for the cover letter to arrive before replying.)
I think this is definitely the wrong approach. We already have to generate device tree information for the hardware we have, and having an equivalent parallel infrastructure for generating ACPI as well seems like it would be a tremendous mess. We should support guests that require ACPI by having QEMU boot a UEFI bios blob and have that UEFI code generate ACPI tables based on the DTB we hand it. (Chances seem good that any guest that wants ACPI is going to want UEFI runtime services anyway.)
thanks -- PMM
On Thu, Oct 30, 2014 at 05:52:44PM +0000, Peter Maydell wrote:
On 30 October 2014 17:43, Alexander Spyridakis a.spyridakis@virtualopensystems.com wrote:
Currently, the virt machine model generates Device Tree information dynamically based on the existing devices in the system. This patch series extends the same concept to ACPI information. A total of seven tables have been implemented in this patch series, which is the minimum for basic ARM support.
The set of generated tables is:
- RSDP
- XSDT
- MADT
- GTDT
- FADT
- FACS
- DSDT
The tables are created in standalone buffers, taking into account the needed information passed from the virt machine model. When generation is finalized, the individual buffers are compacted into a single ACPI binary blob, which is injected into the guest memory space at a fixed location. The guest kernel can find the ACPI tables by being given the physical address of the ACPI blob (e.g. via the acpi_rsdp=0x47000000 boot argument).
(Sorry, I should have waited for the cover letter to arrive before replying.)
I think this is definitely the wrong approach. We already have to generate device tree information for the hardware we have, and having an equivalent parallel infrastructure for generating ACPI as well seems like it would be a tremendous mess. We should support guests that require ACPI by having QEMU boot a UEFI bios blob and have that UEFI code generate ACPI tables based on the DTB we hand it. (Chances seem good that any guest that wants ACPI is going to want UEFI runtime services anyway.)
Depending on why people want ACPI in a guest environment, generating ACPI tables from a DTB might not be possible (e.g. if they want to use AML for some reason).
So the important question is _why_ the guest needs to see an ACPI environment. What exactly can ACPI provide to the guest that DT does not already provide, and why is that necessary? What infrastructure is needed for that use case?
Translating DT tables into the equivalent ACPI tables seems like a waste of effort unless it enables something we can't do at the moment.
Thanks, Mark.
Hi all,
On 30.10.2014 19:02, Mark Rutland wrote:
On Thu, Oct 30, 2014 at 05:52:44PM +0000, Peter Maydell wrote:
On 30 October 2014 17:43, Alexander Spyridakis a.spyridakis@virtualopensystems.com wrote:
Currently, the virt machine model generates Device Tree information dynamically based on the existing devices in the system. This patch series extends the same concept to ACPI information. A total of seven tables have been implemented in this patch series, which is the minimum for basic ARM support.
The set of generated tables is:
- RSDP
- XSDT
- MADT
- GTDT
- FADT
- FACS
- DSDT
The tables are created in standalone buffers, taking into account the needed information passed from the virt machine model. When generation is finalized, the individual buffers are compacted into a single ACPI binary blob, which is injected into the guest memory space at a fixed location. The guest kernel can find the ACPI tables by being given the physical address of the ACPI blob (e.g. via the acpi_rsdp=0x47000000 boot argument).
(Sorry, I should have waited for the cover letter to arrive before replying.)
I think this is definitely the wrong approach. We already have to generate device tree information for the hardware we have, and having an equivalent parallel infrastructure for generating ACPI as well seems like it would be a tremendous mess. We should support guests that require ACPI by having QEMU boot a UEFI bios blob and have that UEFI code generate ACPI tables based on the DTB we hand it. (Chances seem good that any guest that wants ACPI is going to want UEFI runtime services anyway.)
Depending on why people want ACPI in a guest environment, generating ACPI tables from a DTB might not be possible (e.g. if they want to use AML for some reason).
So the important question is _why_ the guest needs to see an ACPI environment. What exactly can ACPI provide to the guest that DT does not already provide, and why is that necessary? What infrastructure is needed for that use case?
Translating DT tables into the equivalent ACPI tables seems like a waste of effort unless it enables something we can't do at the moment.
Thanks, Mark.
Please correct me if I am wrong; my understanding at the moment is that for x86 there is an ACPI implementation in hw/acpi, with the table generation happening in hw/i386/acpi-build.c. Couldn't there be some unification where part of the ACPI infrastructure is reused, with arch-specific code specializing for x86 and ARM? Why can ACPI tables be created for x86, but not likewise for ARM?
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
Thanks,
Claudio
On 5 November 2014 09:58, Claudio Fontana claudio.fontana@huawei.com wrote:
Please correct me if I am wrong; my understanding at the moment is that for x86 there is an ACPI implementation in hw/acpi, with the table generation happening in hw/i386/acpi-build.c. Couldn't there be some unification where part of the ACPI infrastructure is reused, with arch-specific code specializing for x86 and ARM? Why can ACPI tables be created for x86, but not likewise for ARM?
Because then for ARM boards we'd be creating a description of the hardware twice, once in device tree and once in ACPI, which seems like unnecessary duplication.
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
thanks -- PMM
On Thu, 6 Nov 2014 12:44:04 +0000 Peter Maydell peter.maydell@linaro.org wrote:
On 5 November 2014 09:58, Claudio Fontana claudio.fontana@huawei.com wrote:
Please correct me if I am wrong; my understanding at the moment is that for x86 there is an ACPI implementation in hw/acpi, with the table generation happening in hw/i386/acpi-build.c. Couldn't there be some unification where part of the ACPI infrastructure is reused, with arch-specific code specializing for x86 and ARM? Why can ACPI tables be created for x86, but not likewise for ARM?
Because then for ARM boards we'd be creating a description of the hardware twice, once in device tree and once in ACPI, which seems like unnecessary duplication.
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Building ACPI in the BIOS also led to the necessity to:
1. update BIOS and QEMU in lockstep if a fix/feature is a must-have,
2. add compatibility hooks so it would work with mismatched versions,
3. endlessly expand the PV QEMU-BIOS interface.
That's the reason ACPI tables are built by QEMU now, and we should probably learn from the x86 experience instead of going through the same issues a second time.
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid that non-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
Regards.
On 6 November 2014 13:33, Alexander Spyridakis a.spyridakis@virtualopensystems.com wrote:
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid that non-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
Can you boot an x86 QEMU/KVM ACPI-using guest without running the BIOS?
-- PMM
Hi,
On Thu, Nov 06, 2014 at 01:33:20PM +0000, Alexander Spyridakis wrote:
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid that non-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
I'm not sure how you plan to use ACPI without UEFI, as there are several pieces of information which ACPI misses, such as the memory map, which must be discovered from UEFI. How do you intend to discover the memory map without UEFI?
Additionally, with Linux and other generic OSs, the expectation is that the ACPI tables are discovered via the UEFI system table. How do you intend to discover the ACPI tables? Or other system information?
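For reference, the lookup described here is a short walk over the UEFI configuration table; roughly (an illustrative sketch using gnu-efi style definitions, not tied to any code in this thread):

    #include <efi.h>
    #include <string.h>

    /* Scan the EFI system table's configuration table for the ACPI 2.0+
     * entry; its VendorTable field points at the RSDP. */
    static void *find_acpi_rsdp(EFI_SYSTEM_TABLE *systab)
    {
        static EFI_GUID acpi20 = ACPI_20_TABLE_GUID;
        UINTN i;

        for (i = 0; i < systab->NumberOfTableEntries; i++) {
            EFI_CONFIGURATION_TABLE *ct = &systab->ConfigurationTable[i];

            if (memcmp(&ct->VendorGuid, &acpi20, sizeof(acpi20)) == 0) {
                return ct->VendorTable;
            }
        }
        return NULL;
    }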
From experience with Linux, querying this information from UEFI is a trivial overhead, though a UEFI implementation might take a while to boot to the point where that is possible. It would be more generally helpful to have an optimized virtualised UEFI for this case (or perhaps just a UEFI frontend that presents the same interface to EFI applications but doesn't have to do any heavy lifting at boot).
So far the general trend with AArch64 at the system level is to use generic interfaces as far as possible. The generic interface for discovering ACPI tables is to boot as an EFI application and then to query the tables from UEFI. That is the interface others are likely to follow, and ACPI without UEFI is unlikely to be of much use to anyone else.
Why is it worth expending the effort on the boot protocol you suggest (which so far is not well defined) when there is already a portable, well-defined standard that others are already following?
Thanks, Mark.
On Tue, Nov 11, 2014 at 03:29:33PM +0000, Mark Rutland wrote:
Hi,
On Thu, Nov 06, 2014 at 01:33:20PM +0000, Alexander Spyridakis wrote:
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid that non-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
I'm not sure how you plan to use ACPI without UEFI, as there are several pieces of information which ACPI misses, such as the memory map, which must be discovered from UEFI. How do you intend to discover the memory map without UEFI?
Additionally, with Linux and other generic OSs, the expectation is that the ACPI tables are discovered via the UEFI system table. How do you intend to discover the ACPI tables? Or other system information?
FWIW, Xen needs to pass the RSDP pointer along with a tiny DT containing the command line and memory information to Dom0 as well. We are currently suggesting adding an RSDP property to the chosen node in the tiny DT, but a command-line argument like kexec proposed could be another option I guess, albeit not a very pretty one.
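As a concrete illustration of that suggestion, the tiny-DT side would amount to something like the following libfdt snippet (the property name "xen,acpi-rsdp-addr" is hypothetical, made up here for illustration; only the /chosen placement reflects the proposal):

    #include <stdint.h>
    #include <libfdt.h>

    /* Record the guest-physical address of the RSDP in /chosen so Dom0
     * can locate the ACPI tables without a UEFI instance. */
    static int set_rsdp_property(void *fdt, uint64_t rsdp_paddr)
    {
        int node = fdt_path_offset(fdt, "/chosen");

        if (node < 0) {
            node = fdt_add_subnode(fdt, 0, "chosen");
        }
        if (node < 0) {
            return node;
        }
        return fdt_setprop_u64(fdt, node, "xen,acpi-rsdp-addr", rsdp_paddr);
    }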
But, what I hear from Huawei is that they don't want any DT and don't want any UEFI, so not sure how they plan on accomplishing that.
-Christoffer
Hi Christoffer,
On Tue, Nov 11, 2014 at 04:31:01PM +0000, Christoffer Dall wrote:
On Tue, Nov 11, 2014 at 03:29:33PM +0000, Mark Rutland wrote:
Hi,
On Thu, Nov 06, 2014 at 01:33:20PM +0000, Alexander Spyridakis wrote:
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid that non-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
I'm not sure how you plan to use ACPI without UEFI, as there are several pieces of information which ACPI misses, such as the memory map, which must be discovered from UEFI. How do you intend to discover the memory map without UEFI?
Additionally, with Linux and other generic OSs, the expectation is that the ACPI tables are discovered via the UEFI system table. How do you intend to discover the ACPI tables? Or other system information?
FWIW, Xen needs to pass the RSDP pointer along with a tiny DT containing the command line and memory information to Dom0 as well.
When you say "memory information", is that pointers to a UEFI memory map, or memory nodes? The former should work for ACPI, but I don't think the latter will. I think there's a need for some discussion regarding the Dom0 boot flow for ACPI. Is there any tree I can take a peek at?
Passing just the RSDP will mean that Dom0 won't get SMBIOS tables and other potentially useful things, in addition to simply being yet another potential boot configuration. I'm a little concerned about that.
We are currently suggesting adding an RSDP property to the chosen node in the tiny DT, but a command-line argument like kexec proposed could be another option I guess, albeit not a very pretty one.
I'm not sure what an RSDP command line property would have to do with kexec. I'll assume I've misunderstood something.
But, what I hear from Huawei is that they don't want any DT and don't want any UEFI, so not sure how they plan on accomplishing that.
Indeed.
Thanks, Mark.
On Tue, Nov 11, 2014 at 04:48:07PM +0000, Mark Rutland wrote:
Hi Christoffer,
On Tue, Nov 11, 2014 at 04:31:01PM +0000, Christoffer Dall wrote:
On Tue, Nov 11, 2014 at 03:29:33PM +0000, Mark Rutland wrote:
Hi,
On Thu, Nov 06, 2014 at 01:33:20PM +0000, Alexander Spyridakis wrote:
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid that non-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
I'm not sure how you plan to use ACPI without UEFI, as there are several pieces of information which ACPI misses, such as the memory map, which must be discovered from UEFI. How do you intend to discover the memory map without UEFI?
Additionally, with Linux and other generic OSs, the expectation is that the ACPI tables are discovered via the UEFI system table. How do you intend to discover the ACPI tables? Or other system information?
FWIW, Xen needs to pass the RSDP pointer along with a tiny DT containing the command line and memory information to Dom0 as well.
When you say "memory information", is that pointers to a UEFI memory map, or memory nodes? The former should work for ACPI, but I don't think the latter will. I think there's a need for some discussion regarding the Dom0 boot flow for ACPI. Is there any tree I can take a peek at?
Plain memory nodes. There is no UEFI instance for Dom0. AFAIU x86 does something similar (although with some custom PV thing instead of DT), and when Dom0 needs UEFI runtime services, this is done through specific hypercalls.
The Xen code is incomplete for this work, but can be followed here: https://git.linaro.org/people/parth.dixit/acpi-rsdp/xen.git/shortlog/refs/he...
The Linux side is stuff based on the LEG kernel I think, not sure if it's pushed anywhere yet.
I'm cc'ing Parth and Julien here, but I agree that having a discussion on this could probably be good.
Passing just the RSDP will mean that Dom0 won't get SMBIOS tables and other potentially useful things, in addition to simply being yet another potential boot configuration. I'm a little concerned about that.
I share your concern, but running another UEFI instance for Dom0 doesn't seem like a viable alternative either. Why is this a problem on ARM and not on x86 though?
We are currently suggesting adding an RSDP property to the chosen node in the tiny DT, but a command-line argument like kexec proposed could be another option I guess, albeit not a very pretty one.
I'm not sure what an RSDP command line property would have to do with kexec. I'll assume I've misunderstood something.
I thought the kexec patches proposed passing the RSDP on the command line to boot the secondary kernel, so if that ended up being supported by the kernel for kexec, maybe that could be leveraged by Xen's boot protocol. It was an idea someone brought to me, just thought I'd mention it.
-Christoffer
On Tue, Nov 11, 2014 at 09:33:12PM +0000, Christoffer Dall wrote:
On Tue, Nov 11, 2014 at 04:48:07PM +0000, Mark Rutland wrote:
Hi Christoffer,
On Tue, Nov 11, 2014 at 04:31:01PM +0000, Christoffer Dall wrote:
On Tue, Nov 11, 2014 at 03:29:33PM +0000, Mark Rutland wrote:
Hi,
On Thu, Nov 06, 2014 at 01:33:20PM +0000, Alexander Spyridakis wrote:
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
> We need ACPI guest support in QEMU for AArch64 over here, with all features > (including the ability to run ACPI code and add specific tables), for > ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid that non-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
I'm not sure how you plan to use ACPI without UEFI, as there are several pieces of information which ACPI misses, such as the memory map, which must be discovered from UEFI. How do you intend to discover the memory map without UEFI?
Additionally, with Linux and other generic OSs, the expectation is that the ACPI tables are discovered via the UEFI system table. How do you intend to discover the ACPI tables? Or other system information?
FWIW, Xen needs to pass the RSDP pointer along with a tiny DT containing the command line and memory information to Dom0 as well.
When you say "memory information", is that pointers to a UEFI memory map, or memory nodes? The former should work for ACPI, but I don't think the latter will. I think there's a need for some discussion regarding the Dom0 boot flow for ACPI. Is there any tree I can take a peek at?
Plain memory nodes. There is no UEFI instance for Dom0. AFAIU x86 does something similar (although with some custom PV thing instead of DT), and when Dom0 needs UEFI runtime services, this is done through specific hypercalls.
The Xen code is incomplete for this work, but can be followed here: https://git.linaro.org/people/parth.dixit/acpi-rsdp/xen.git/shortlog/refs/he...
Thanks.
The Linux side is stuff based on the LEG kernel I think, not sure if it's pushed anywhere yet.
I'm cc'ing Parth and Julien here, but I agree that having a discussion on this could probably be good.
Sounds good to me. That might be worth running as a separate thread so as not to confuse matters.
Perhaps just using memory nodes is fine, but so far all of the discussions I've been in (on mailing lists and in person) regarding ACPI have had the fundamental assumption that ACPI would require UEFI, and the UEFI memory map is in use. Given that assumption seems to be broken for this case, we need to revisit those discussions.
There's also a problem in that this opens the possibility of non-Xen !UEFI + ACPI configurations, which I don't think is something we want to encourage. Xen is somewhat a special case because of the symbiotic relationship with Dom0.
Passing just the RSDP will mean that Dom0 won't get SMBIOS tables and other potentially useful things, in addition to simply being yet another potential boot configuration. I'm a little concerned about that.
I share your concern, but running another UEFI instance for Dom0 doesn't seem like a viable alternative either. Why is this a problem on ARM and not on x86 though?
I believe that on x86 the fallback for !UEFI would be the e820 memory map, which provides info regarding the type of the memory mapping, as opposed to just the base + size. That said, I'm not that familiar with e820, and from a quick look the provided information doesn't seem to be that detailed.
We are currently suggesting adding an RSDP property to the chosen node in the tiny DT, but a command-line argument like the one kexec proposed could be another option I guess, albeit not a very pretty one.
I'm not sure what an RSDP command-line property would have to do with kexec. I'll assume I've misunderstood something.
I thought the kexec patches proposed passing the RSDP on the command-line to boot the secondary kernel, so if that ended up being supported by the kernel for kexec, maybe that could be leveraged by Xen's boot protocol. It was an idea someone brought to me, just thought I'd mention it.
Ah, that's not something I'd heard of.
I'm not a fan of placing fundamentally required system description on the command line. It's fine for explicit overrides but I don't think it should be the default mechanism as that causes its own set of problems (who wants to fight with their hypervisor to pass a command line to a guest kernel?).
Thanks, Mark.
On Wed, Nov 12, 2014 at 10:38:54AM +0000, Mark Rutland wrote:
On Tue, Nov 11, 2014 at 09:33:12PM +0000, Christoffer Dall wrote:
On Tue, Nov 11, 2014 at 04:48:07PM +0000, Mark Rutland wrote:
Hi Christoffer,
On Tue, Nov 11, 2014 at 04:31:01PM +0000, Christoffer Dall wrote:
On Tue, Nov 11, 2014 at 03:29:33PM +0000, Mark Rutland wrote:
Hi,
On Thu, Nov 06, 2014 at 01:33:20PM +0000, Alexander Spyridakis wrote:
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid this not-so-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
I'm not sure how you plan to use ACPI without UEFI, as there are several pieces of information which ACPI misses, such as the memory map, which must be discovered from UEFI. How do you intend to discover the memory map without UEFI?
Additionally, with Linux and other generic OSs, the expectation is that the ACPI tables are discovered via the UEFI system table. How do you intend to discover the ACPI tables? Or other system information?
FWIW, Xen needs to pass the RSDP pointer along with a tiny DT containing the command line and memory information to Dom0 as well.
When you say "memory information", is that pointers to a UEFI memory map, or memory nodes? The former should work for ACPI, but I don't think the latter will. I think there's a need for some discussion regarding the Dom0 boot flow for ACPI. Is there any tree I can take a peek at?
Plain memory nodes. There is no UEFI instance for Dom0. AFAIU x86 does something similar (although with some custom PV thing instead of DT), and when Dom0 needs UEFI runtime services, this is done through specific hypercalls.
The Xen code is incomplete for this work, but can be followed here: https://git.linaro.org/people/parth.dixit/acpi-rsdp/xen.git/shortlog/refs/he...
Thanks.
The Linux side is stuff based on the LEG kernel I think, not sure if it's pushed anywhere yet.
I'm cc'ing Parth and Julien here, but I agree that having a discussion on this could probably be good.
Sounds good to me. That might be worth running as a separate thread so as not to confuse matters.
Agreed.
Perhaps just using memory nodes is fine, but so far all of the discussions I've been in (on mailing lists and in person) regarding ACPI have had the fundamental assumption that ACPI would require UEFI, and the UEFI memory map is in use. Given that assumption seems to be broken for this case, we need to revisit those discussions.
There's also a problem in that this opens the possibility of non-Xen !UEFI + ACPI configurations, which I don't think is something we want to encourage. Xen is somewhat a special case because of the symbiotic relationship with Dom0.
Passing just the RSDP will mean that Dom0 won't get SMBIOS tables and other potentially useful things, in addition to simply being yet another potential boot configuration. I'm a little concerned about that.
I share your concern, but running another UEFI instance for Dom0 doesn't seem like a viable alternative either. Why is this a problem on ARM and not on x86 though?
I believe that on x86 the fallback for !UEFI would be the e820 memory map, which provides info regarding the type of the memory mapping, as opposed to just the base + size. That said, I'm not that familiar with e820, and from a quick look the provided information doesn't seem to be that detailed.
right, the good old PC.
We are currently suggesting adding an RSDP property to the chosen node in the tiny DT, but a command-line argument like the one kexec proposed could be another option I guess, albeit not a very pretty one.
I'm not sure what an RSDP command-line property would have to do with kexec. I'll assume I've misunderstood something.
I thought the kexec patches proposed passing the RSDP on the command-line to boot the secondary kernel, so if that ended up being supported by the kernel for kexec, maybe that could be leveraged by Xen's boot protocol. It was an idea someone brought to me, just thought I'd mention it.
Ah, that's not something I'd heard of.
Maybe there was a misunderstanding then, I thought you were cc'ed on those discussions.
I'm not a fan of placing fundamentally required system description on the command line. It's fine for explicit overrides but I don't think it should be the default mechanism as that causes its own set of problems (who wants to fight with their hypervisor to pass a command line to a guest kernel?).
Agreed completely, but I've been lacking strong technical arguments against passing this stuff on the cmdline. My personal preferred approach for the Xen Dom0 case is to add a property to the DT.
-Christoffer
Hi,
On 12/11/2014 10:44, Christoffer Dall wrote:
I'm not a fan of placing fundamentally required system description on the command line. It's fine for explicit overrides but I don't think it should be the default mechanism as that causes its own set of problems (who wants to fight with their hypervisor to pass a command line to a guest kernel?).
Agreed completely, but I've been lacking strong technical arguments against passing this stuff on the cmdline. My personal preferred approach for the Xen Dom0 case is to add a property to the DT.
IMHO, the cmdline is OS-specific, so this solution wouldn't be a good fit for supporting ACPI on other OSes (e.g. *BSD) without requiring specific implementation/parsing of the command line.
The DT solution would be cleaner, and the bindings are already standardized (and starting to be used on other OSes such as FreeBSD).
Regards,
[...]
We are currently suggesting adding an RSDP property to the chosen node in the tiny DT, but a command-line argument like the one kexec proposed could be another option I guess, albeit not a very pretty one.
I'm not sure what an RSDP command-line property would have to do with kexec. I'll assume I've misunderstood something.
I thought the kexec patches proposed passing the RSDP on the command-line to boot the secondary kernel, so if that ended up being supported by the kernel for kexec, maybe that could be leveraged by Xen's boot protocol. It was an idea someone brought to me, just thought I'd mention it.
Ah, that's not something I'd heard of.
Maybe there was a misunderstanding then, I thought you were cc'ed on those discussions.
I may just have lost them in my inbox. I'm on a few too many mailing lists these days. Sorry about that.
I'm not a fan of placing fundamentally required system description on the command line. It's fine for explicit overrides but I don't think it should be the default mechanism as that causes its own set of problems (who wants to fight with their hypervisor to pass a command line to a guest kernel?).
Agreed completely, but I've been lacking strong technical arguments against passing this stuff on the cmdline. My personal preferred approach for the Xen Dom0 case is to add a property to the DT.
Something under /chosen, or a firmware node, would sound preferable to me. For UEFI we pass the system table address as /chosen/linux,uefi-system-table = <... ...>, and I think the RSDP could be handled similarly if necessary. The user can then override that via the command line if desired.
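Purely as an illustration of that idea, a minimal sketch of such a /chosen property; the property name linux,acpi-rsdp and the placeholder address are assumptions for the sketch, not an agreed or existing binding:

    /dts-v1/;
    / {
        #address-cells = <2>;
        #size-cells = <2>;

        chosen {
            /* hypothetical: physical address of the RSDP, by analogy
             * with /chosen/linux,uefi-system-table */
            linux,acpi-rsdp = <0x0 0x43000000>;
        };
    };

The same property could of course live in a firmware node rather than /chosen; the sketch only shows the general shape.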
Ideally, the user shouldn't have to place anything on the command line to get a usable system. Obviously some things will be necessary (where is my rootfs?), but that should be the user's configuration of the system rather than fundamental properties of said system.
The big issue I'm aware of at the moment that forces people to provide a command line (on non-virtualised systems at least) is the default console and the rate thereof, but that's being looked into currently.
Thanks, Mark.
On Wed, Nov 12, 2014 at 11:07:22AM +0000, Mark Rutland wrote:
[...]
We are currently suggesting adding an RSDP property to the chosen node in the tiny DT, but a command-line argument like the one kexec proposed could be another option I guess, albeit not a very pretty one.
I'm not sure what an RSDP command-line property would have to do with kexec. I'll assume I've misunderstood something.
I thought the kexec patches proposed passing the RSDP on the command-line to boot the secondary kernel, so if that ended up being supported by the kernel for kexec, maybe that could be leveraged by Xen's boot protocol. It was an idea someone brought to me, just thought I'd mention it.
Ah, that's not something I'd heard of.
Maybe there was a misunderstanding then, I thought you were cc'ed on those discussions.
I may just have lost them in my inbox. I'm on a few too many mailing lists these days. Sorry about that.
I'm not a fan of placing fundamentally required system description on the command line. It's fine for explicit overrides but I don't think it should be the default mechanism as that causes its own set of problems (who wants to fight with their hypervisor to pass a command line to a guest kernel?).
Agreed completely, but I've been lacking strong technical arguments against passing this stuff on the cmdline. My personal preferred approach for the Xen Dom0 case is to add a property to the DT.
Something under /chosen, or a firmware node, would sound preferable to me. For UEFI we pass the system table address as /chosen/linux,uefi-system-table = <... ...>, and I think the RSDP could be handled similarly if necessary. The user can then override that via the command line if desired.
We have actually done that in the past when we had to support u-boot and the bootwrapper as bootloaders. It works fine in the kernel and it's a minimal patch to support.
Graeme
On 12/11/2014 11:38, Mark Rutland wrote:
I share your concern, but running another UEFI instance for Dom0 doesn't seem like a viable alternative either. Why is this a problem on ARM and not on x86 though?
I believe that on x86 the fallback for !UEFI would be the e820 memory map, which provides info regarding the type of the memory mapping, as opposed to just the base + size. That said, I'm not that familiar with e820, and from a quick look the provided information doesn't seem to be that detailed.
The e820 memory map is only part of it. On x86 !UEFI you are supposed to scan low memory for magic signatures that provide pointers to the SMBIOS and ACPI tables.
As Christoffer said, "the good old PC". :)
SeaBIOS fishes out information from fw_cfg, and puts it in low memory. On ARM you could use DT binary blobs instead of fw_cfg, as proposed already (I don't remember if it was in this thread or IRC). Then if you want to go !UEFI you can extract the tables from those binary blobs.
Paolo
On Wed, Nov 12, 2014 at 11:52:22AM +0000, Paolo Bonzini wrote:
On 12/11/2014 11:38, Mark Rutland wrote:
I share your concern, but running another UEFI instance for Dom0 doesn't seem like a viable alternative either. Why is this a problem on ARM and not on x86 though?
I believe that on x86 the fallback for !UEFI would be the e820 memory map, which provides info regarding the type of the memory mapping, as opposed to just the base + size. That said, I'm not that familiar with e820, and from a quick look the provided information doesn't seem to be that detailed.
The e820 memory map is only part of it. On x86 !UEFI you are supposed to scan low memory for magic signatures that provide pointers to the SMBIOS and ACPI tables.
Fun...
As Christoffer said, "the good old PC". :)
SeaBIOS fishes out information from fw_cfg, and puts it in low memory. On ARM you could use DT binary blobs instead of fw_cfg, as proposed already (I don't remember if it was in this thread or IRC). Then if you want to go !UEFI you can extract the tables from those binary blobs.
This sounds broken. I am very much not a fan of shoving binary blobs into DT to workaround a shoddy boot interface.
Mark.
On 12/11/2014 13:04, Mark Rutland wrote:
SeaBIOS fishes out information from fw_cfg, and puts it in low memory. On ARM you could use DT binary blobs instead of fw_cfg, as proposed already (I don't remember if it was in this thread or IRC). Then if you want to go !UEFI you can extract the tables from those binary blobs.
This sounds broken. I am very much not a fan of shoving binary blobs into DT to workaround a shoddy boot interface.
We tried spec-ing everything and building the tables in the firmware on x86. You really do not want to do that; it's painful to have to update firmware in lockstep with QEMU, and when you add the next feature it always seems like you got the bindings wrong.
And we only had one client (SeaBIOS) while you will have at least two (TianoCore and Linux, presumably? or is Huawei targeting OSv only?).
What we do now is we have two blobs, one with the ACPI tables and one that tells the firmware how to relocate pointers from one table to another. It's been working very well for both SeaBIOS and OVMF.
Paolo
On 12 November 2014 12:04, Mark Rutland mark.rutland@arm.com wrote:
On Wed, Nov 12, 2014 at 11:52:22AM +0000, Paolo Bonzini wrote:
SeaBIOS fishes out information from fw_cfg, and puts it in low memory. On ARM you could use DT binary blobs instead of fw_cfg, as proposed already (I don't remember if it was in this thread or IRC). Then if you want to go !UEFI you can extract the tables from those binary blobs.
This sounds broken. I am very much not a fan of shoving binary blobs into DT to workaround a shoddy boot interface.
My understanding from an IRC conversation yesterday was that at least some of these ACPI blobs contain data which has to be constructed at the point it is requested (ie is not fixed at the point when QEMU starts up), because OVMF will do:
- startup
- prod some parts of the hardware to configure it
- request ACPI tables via fw_cfg
and the ACPI tables have to reflect the status of the hardware after OVMF's poking, not before.
It wasn't entirely clear to me whether this applies equally to the ARM UEFI setup as to x86 + OVMF, but it does suggest that it would be better to define a memory-mapped variant of fw_cfg rather than stuffing the blobs into the device tree. (I didn't much like throwing the blobs in the dtb myself either.)
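For illustration, a memory-mapped fw_cfg could be advertised to the firmware/guest with a small DT node along these lines; the compatible string, address and size are assumptions for the sketch rather than an existing binding:

    /dts-v1/;
    / {
        #address-cells = <2>;
        #size-cells = <2>;

        fw-cfg@9020000 {
            /* hypothetical MMIO variant of the fw_cfg interface */
            compatible = "qemu,fw-cfg-mmio";
            reg = <0x0 0x09020000 0x0 0x18>;
        };
    };

That keeps the DT describing only where the interface lives, rather than carrying the table blobs themselves.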
If somebody with more x86/ACPI knowledge could clarify what the dynamically-constructed parts of the tables are and whether they're likely to apply to us, that would be good. I think they involved PCI in some way, but I don't have access to my irc logs right now to check. (I could imagine that ARM UEFI might not need to prod and configure a PCI bus and devices in the way that an x86 BIOS expects it has to, but that's speculation on my part, and I dunno that I'd care to bake that assumption into the design anyway.)
-- PMM
On 12/11/2014 14:27, Peter Maydell wrote:
If somebody with more x86/ACPI knowledge could clarify what the dynamically-constructed parts of the tables are and whether they're likely to apply to us, that would be good. I think they involved PCI in some way, but I don't have access to my irc logs right now to check. (I could imagine that ARM UEFI might not need to prod and configure a PCI bus and devices in the way that an x86 BIOS expects it has to, but that's speculation on my part, and I dunno that I'd care to bake that assumption into the design anyway.)
Yes, IIUC it's just for ACPI hotplug on devices behind a PCI-to-PCI bridge. On Linux/x86 we can use the PCI standard hotplug controller (SHPC), but not on Windows. So if it's just PCI, I tend to agree with you. ARM should really just use PCIe and SHPC hotplug and call it a day.
To me that actually goes in favor of putting blobs in the DT. :)
That said, if you really really want to write kernel drivers for stuff that x86 puts in the AML, there's little point in building the ACPI tables in QEMU. You already have a DT interface to talk to non-ACPI devices, Tiano Core can use the same bits to build ACPI tables.
Paolo
On 12.11.2014 14:27, Peter Maydell wrote:
On 12 November 2014 12:04, Mark Rutland mark.rutland@arm.com wrote:
On Wed, Nov 12, 2014 at 11:52:22AM +0000, Paolo Bonzini wrote:
SeaBIOS fishes out information from fw_cfg, and puts it in low memory. On ARM you could use DT binary blobs instead of fw_cfg, as proposed already (I don't remember if it was in this thread or IRC). Then if you want to go !UEFI you can extract the tables from those binary blobs.
This sounds broken. I am very much not a fan of shoving binary blobs into DT to workaround a shoddy boot interface.
My understanding from an IRC conversation yesterday was that at least some of these ACPI blobs contain data which has to be constructed at the point it is requested (ie is not fixed at the point when QEMU starts up), because OVMF will do:
- startup
- prod some parts of the hardware to configure it
- request ACPI tables via fw_cfg
and the ACPI tables have to reflect the status of the hardware after OVMF's poking, not before.
It wasn't entirely clear to me whether this applies equally to the ARM UEFI setup as to x86 + OVMF, but it does suggest that it would be better to define a memory-mapped variant of fw_cfg rather than stuffing the blobs into the device tree. (I didn't much like throwing the blobs in the dtb myself either.)
If somebody with more x86/ACPI knowledge could clarify what the dynamically-constructed parts of the tables are and whether they're likely to apply to use that would be good. I think they involved PCI in some way, but I don't have access to my irc logs right now to check. (I could imagine that ARM UEFI might not need to prod and configure a PCI bus and devices in the way that an x86 BIOS expects it has to, but that's speculation on my part, and I dunno that I'd care to bake that assumption into the design anyway.)
Would the last step you mention allow for guests to start with an already existing PCI interrupt map and the BAR registers preprogrammed to point to somewhere sane?
I ask because on OSv at the moment, the situation is that for x86 we don't need to reprogram anything on PCI, as everything is already nicely set up by the time the guest starts, and thus the BAR addresses can be read directly. On ARM we have to reprogram the BARs manually for all devices.
Couldn't we give an easier time to each OS guest by having everything nicely set up on AArch64 as well?
Claudio
On Wednesday 12 November 2014 16:01:14 Claudio Fontana wrote:
Would the last step you mention allow for guests to start with an already existing PCI interrupt map and the BAR registers preprogrammed to point to somewhere sane?
I ask because on OSv at the moment, the situation is that for x86 we don't need to reprogram anything on PCI, as everything is already nicely set up by the time the guest starts, and thus the BAR addresses can be read directly. On ARM we have to reprogram the BARs manually for all devices.
Couldn't we give an easier time to each OS guest by having everything nicely set up on AArch64 as well?
Definitely, I think having the OS manually program the BARs only makes sense in an environment where you don't have a full-featured boot loader or you don't trust it. In servers and virtual machines, the PCI bus should always come fully set up. This also implies that the OS should not have to deal with registers for setting up the translation between PCI and system address ranges.
Arnd
On 12 November 2014 15:32, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 12 November 2014 16:01:14 Claudio Fontana wrote:
Would the last step you mention allow for guests to start with an already existing PCI interrupt map and the BAR registers preprogrammed to point to somewhere sane?
I ask because on OSv at the moment, the situation is that for x86 we don't need to reprogram anything on PCI, as everything is already nicely set up by the time the guest starts, and thus the BAR addresses can be read directly. On ARM we have to reprogram the BARs manually for all devices.
Couldn't we give an easier time to each OS guest by having everything nicely set up on AArch64 as well?
Definitely, I think having the OS manually program the BARs only makes sense in an environment where you don't have a full-featured boot loader or you don't trust it. In servers and virtual machines, the PCI bus should always come fully set up. This also implies that the OS should not have to deal with registers for setting up the translation between PCI and system address ranges.
It seems to me like complicated stuff like that definitely belongs in the UEFI/bootloader blob, though. I'd rather QEMU just modelled the hardware and let the guest (or the firmware, which is guest code from QEMU's point of view) set it up however it wants.
-- PMM
On 12/11/2014 16:39, Peter Maydell wrote:
Definitely, I think having the OS manually program the BARs only makes sense in an environment where you don't have a full-featured boot loader or you don't trust it. In servers and virtual machines, the PCI bus should always come fully set up. This also implies that the OS should not have to deal with registers for setting up the translation between PCI and system address ranges.
It seems to me like complicated stuff like that definitely belongs in the UEFI/bootloader blob, though. I'd rather QEMU just modelled the hardware and let the guest (or the firmware, which is guest code from QEMU's point of view) set it up however it wants.
It definitely doesn't belong in QEMU!
Paolo
On Wednesday 12 November 2014 16:52:25 Paolo Bonzini wrote:
On 12/11/2014 16:39, Peter Maydell wrote:
Definitely, I think having the OS manually program the BARs only makes sense in an environment where you don't have a full-featured boot loader or you don't trust it. In servers and virtual machines, the PCI bus should always come fully set up. This also implies that the OS should not have to deal with registers for setting up the translation between PCI and system address ranges.
It seems to me like complicated stuff like that definitely belongs in the UEFI/bootloader blob, though. I'd rather QEMU just modelled the hardware and let the guest (or the firmware, which is guest code from QEMU's point of view) set it up however it wants.
It definitely doesn't belong in QEMU!
The easiest option would probably be to make all PCI devices have fixed BARs and not even allow them to be changed. I believe this is what kvmtool does, but I can see how supporting both modes is much harder than either one.
How does it work on x86 with qemu?
Arnd
On 12/11/2014 16:57, Arnd Bergmann wrote:
It seems to me like complicated stuff like that definitely belongs in the UEFI/bootloader blob, though. I'd rather QEMU just modelled the hardware and let the guest (or the firmware, which is guest code from QEMU's point of view) set it up however it wants.
It definitely doesn't belong in QEMU!
The easiest option would probably be to make all PCI devices have fixed BARs and not even allow them to be changed. I believe this is what kvmtool does, but I can see how supporting both modes is much harder than either one.
kvmtool does not have firmware; it starts the kernel directly, so it does all the setup that usually is done by the firmware. It implements a couple real-mode interfaces that Linux uses when booting, but nothing of this deals with PCI.
x86 QEMU always runs firmware. Even if you specify -kernel, the firmware does all the usual initialization and then boots from a small ROM. The ROM contains the bootloader, so it loads and starts the kernel.
How does it work on x86 with qemu?
Same as real hardware. Firmware (SeaBIOS or OVMF) builds the memory map, decides where in the free space the BARs go, and programs the PCI devices accordingly.
kvmtool is the special one here. Xen, VMware, Hyper-V all do the same as QEMU.
Paolo
On Wednesday 12 November 2014 17:04:30 Paolo Bonzini wrote:
On 12/11/2014 16:57, Arnd Bergmann wrote:
It seems to me like complicated stuff like that definitely belongs in the UEFI/bootloader blob, though. I'd rather QEMU just modelled the hardware and let the guest (or the firmware, which is guest code from QEMU's point of view) set it up however it wants.
It definitely doesn't belong in QEMU!
The easiest option would probably be to make all PCI devices have fixed BARs and not even allow them to be changed. I believe this is what kvmtool does, but I can see how supporting both modes is much harder than either one.
kvmtool does not have firmware; it starts the kernel directly, so it does all the setup that usually is done by the firmware. It implements a couple real-mode interfaces that Linux uses when booting, but nothing of this deals with PCI.
x86 QEMU always runs firmware. Even if you specify -kernel, the firmware does all the usual initialization and then boots from a small ROM. The ROM contains the bootloader, so it loads and starts the kernel.
Ok, I see.
How does it work on x86 with qemu?
Same as real hardware. Firmware (SeaBIOS or OVMF) builds the memory map, decides where in the free space the BARs go, and programs the PCI devices accordingly.
kvmtool is the special one here. Xen, VMware, Hyper-V all do the same as QEMU.
Right. I guess embedded ARM images in qemu are a third way then, because these don't have a guest firmware but also don't set up the hardware the way that kvmtool does.
Claudio's request to do this differently on arm64 seems absolutely reasonable to me, but I guess that implies having UEFI or something like it that does the PCI scan. Not sure what the best default for "qemu -kernel image" should be though if you don't explicitly pass a firmware image.
Arnd
On 12/11/2014 17:13, Arnd Bergmann wrote:
Same as real hardware. Firmware (SeaBIOS or OVMF) builds the memory map, decides where in the free space the BARs go, and programs the PCI devices accordingly.
kvmtool is the special one here. Xen, VMware, Hyper-V all do the same as QEMU.
Right. I guess embedded ARM images in qemu are a third way then, because these don't have a guest firmware but also don't set up the hardware the way that kvmtool does.
Claudio's request to do this differently on arm64 seems absolutely reasonable to me, but I guess that implies having UEFI or something like it that does the PCI scan. Not sure what the best default for "qemu -kernel image" should be though if you don't explicitly pass a firmware image.
The PowerPC folks are using u-boot as the firmware. I know Peter disagrees, but I don't understand why so I'll throw this up for discussion too; it is definitely lighter-weight than UEFI. Would that make sense for ARM?
I just stumbled upon patches that port u-Boot to bare x86 (no SeaBIOS): http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/201885 -- they have to do the same PCI BAR initialization that Claudio is doing in OSv. Could Claudio build a u-boot ROM that sets up PCI and then automatically loads the OSv kernel?
Paolo
On 12 November 2014 16:25, Paolo Bonzini pbonzini@redhat.com wrote:
On 12/11/2014 17:13, Arnd Bergmann wrote:
Same as real hardware. Firmware (SeaBIOS or OVMF) builds the memory map, decides where in the free space the BARs go, and programs the PCI devices accordingly.
kvmtool is the special one here. Xen, VMware, Hyper-V all do the same as QEMU.
Right. I guess embedded ARM images in qemu are a third way then, because these don't have a guest firmware but also don't set up the hardware the way that kvmtool does.
Claudio's request to do this differently on arm64 seems absolutely reasonable to me, but I guess that implies having UEFI or something like it that does the PCI scan. Not sure what the best default for "qemu -kernel image" should be though if you don't explicitly pass a firmware image.
The PowerPC folks are using u-boot as the firmware. I know Peter disagrees, but I don't understand why so I'll throw this up for discussion too; it is definitely lighter-weight than UEFI. Would that make sense for ARM?
I don't have any objection to people running u-boot firmware if that's what makes sense for them. Historically QEMU's -kernel option on ARM has behaved as "do the extremely bare minimum that the ARM kernel boot protocol demands, i.e. by-hand setup of the CPU before we start executing". Before the advent of multiprocessor this was pretty much just "set up an ATAGS list and then boot the kernel by setting a couple of registers and jumping to it". Bringing in SMP and then the GIC and now device trees has gradually increased the scope of the little bootloader that's built into QEMU, and it is showing the strains a little (especially now it has to cope with some tweaks for some board/SoC models where their SMP secondary boot protocol is different). TrustZone is probably going to complicate the picture further.
Anyway, this approach works because ARMv7 embedded Linux doesn't really demand much of its bootloader. It has the advantage that we haven't needed to carry around ROM images for all our board models (which are a pain to deal with since you get to pick between needing a target CPU toolchain for building, or distributing binary blobs; and in some cases they aren't redistributable at all unless you go to the effort of writing custom firmware just for QEMU). Unlike x86, we wouldn't be able to just have one firmware and be done with it.
ARMv8 Linux at the moment also has a pretty minimal set of booting requirements (the kernel does not require anything like setting up PCI BARs, for instance), and so we've taken a similar approach with the addition of implementing the PSCI firmware interface for CPU power up/down/reset.
You could make a pretty good argument that the QEMU builtin bootloader really has reached the end of its useful life and we should switch to always booting some lump of firmware. (Indeed, the VM spec for ARMv8 mandates a UEFI firmware.) But for any particular new feature in the past we've ended up taking the pragmatic approach of adding it to the builtin loader rather than committing to replacing the whole thing. If ARMv8 Linux starts mandating PCI setup and other complicated setup then the balance probably tilts towards running at least some firmware in all cases.
-- PMM
Hi,
The PowerPC folks are using u-boot as the firmware. I know Peter disagrees, but I don't understand why so I'll throw this up for discussion too; it is definitely lighter-weight than UEFI. Would that make sense for ARM?
Played around with that. Look here:
https://www.kraxel.org/cgit/jenkins/u-boot/tree/ https://www.kraxel.org/repos/jenkins/u-boot/
[ based on fedora uboot package, some vexpress patches on top ]
Problems I ran into:
* u-boot has no virtio support.
* sdcard emulation is painfully slow on arm kvm.
Oh, and working with u-boot upstream is *ahem* a challenge. After trying to get a patch upstream (rpm patch #8) I'm not really surprised any more that there are so many u-boot forks around.
Making edk2 fly on arm looks more promising to me. Especially for aarch64 of course, but also for armv7 + -M virt.
cheers, Gerd
On Mi, 2014-11-12 at 16:01 +0100, Claudio Fontana wrote:
I ask because on OSv at the moment, the situation is that for x86 we don't need to reprogram anything on PCI, as everything is already nicely set up by the time the guest starts, and thus the BAR addresses can be read directly. On ARM we have to reprogram the BARs manually for all devices.
qemu doesn't do anything here, seabios / ovmf does the pci initialization.
cheers, Gerd
Hi,
My understanding from an IRC conversation yesterday was that at least some of these ACPI blobs contain data which has to be constructed at the point it is requested (ie is not fixed at the point when QEMU starts up), because OVMF will do:
- startup
- prod some parts of the hardware to configure it
- request ACPI tables via fw_cfg
and the ACPI tables have to reflect the status of the hardware after OVMF's poking, not before.
It is this:
[root@fedora ~]# cat /proc/ioports
[ ... ]
0600-063f : 0000:00:01.3
  0600-0603 : ACPI PM1a_EVT_BLK
  0604-0605 : ACPI PM1a_CNT_BLK
  0608-060b : ACPI PM_TMR
0700-070f : 0000:00:01.3
  0700-0707 : piix4_smbus
[ ... ]

[root@fedora ~]# lspci -vs1.3
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
        Subsystem: Red Hat, Inc Qemu virtual machine
        Flags: medium devsel, IRQ 9
        Kernel driver in use: piix4_smbus
        Kernel modules: i2c_piix4
A bunch of acpi registers are in a hidden pci bar of the piix4 acpi device (likewise on q35). The firmware must map this somewhere, and this must be done pretty early because the firmware uses the acpi pm timer for timekeeping.
The acpi tables have references to the acpi registers, so the acpi table content must match the actual register location.
[ q35 only: similar issue with the pci mmconfig xbar ]
So there are three options to solve that:
(1) Use a fixed address. Doable, but takes away some flexibility.
(2) Have qemu define the address locations. Has some nasty initialization order issues. Also would require (a) a paravirtual interface or (b) an acpi table parser in the firmware.
(3) Have firmware define the address location. This is what we did on x86. No paravirt interface needed, the firmware simply programs the registers as it likes, and qemu adapts the acpi tables accordingly.
It wasn't entirely clear to me whether this applies equally to the ARM UEFI setup as to x86 + OVMF,
I think on arm doing (2) is a lot easier, as DT provides a nice way to pass addresses from qemu to firmware/guest.
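As a rough sketch of how (2) could look on arm: QEMU would pick the locations and describe them in the DT it already passes to the firmware. Everything below (node name, compatible string, addresses) is invented for illustration:

    /dts-v1/;
    / {
        #address-cells = <2>;
        #size-cells = <2>;

        acpi-pm@9080000 {
            /* hypothetical: QEMU-chosen location of the ACPI PM register
             * block, which QEMU would then also reflect in the FADT */
            compatible = "qemu,acpi-pm-regs";
            reg = <0x0 0x09080000 0x0 0x10>;
        };
    };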
cheers, Gerd
On 11/13/2014 01:10 AM, Gerd Hoffmann wrote:
Hi,
My understanding from an IRC conversation yesterday was that at least some of these ACPI blobs contain data which has to be constructed at the point it is requested (ie is not fixed at the point when QEMU starts up), because OVMF will do:
- startup
- prod some parts of the hardware to configure it
- request ACPI tables via fw_cfg
and the ACPI tables have to reflect the status of the hardware after OVMF's poking, not before.
It is this:
[root@fedora ~]# cat /proc/ioports
[ ... ]
0600-063f : 0000:00:01.3
  0600-0603 : ACPI PM1a_EVT_BLK
  0604-0605 : ACPI PM1a_CNT_BLK
  0608-060b : ACPI PM_TMR
0700-070f : 0000:00:01.3
  0700-0707 : piix4_smbus
[ ... ]
So this is problematic: the PM1a_EVT_BLK and PM1a_CNT_BLK should not exist if hardware reduced mode ACPI is being used; the values in the FADT should be zero so there should be no ioports (see section 5.2.9 of the ACPI spec). If this is from an ARM platform, it _should_ be in hardware reduced mode. QEMU will have to take that into account.
On 13/11/2014 19:16, Al Stone wrote:
[root@fedora ~]# cat /proc/ioports
[ ... ]
0600-063f : 0000:00:01.3
  0600-0603 : ACPI PM1a_EVT_BLK
  0604-0605 : ACPI PM1a_CNT_BLK
  0608-060b : ACPI PM_TMR
0700-070f : 0000:00:01.3
  0700-0707 : piix4_smbus
[ ... ]
So this is problematic: the PM1a_EVT_BLK and PM1a_CNT_BLK should not exist if hardware reduced mode ACPI is being used; the values in the FADT should be zero so there should be no ioports (see section 5.2.9 of the ACPI spec). If this is from an ARM platform, it _should_ be in hardware reduced mode. QEMU will have to take that into account.
No, this is x86. As mentioned elsewhere in the thread, we'd have to add a GPIO controller (e.g. PL061) to mach-virt, to replace GPE.
Paolo
On Do, 2014-11-13 at 11:16 -0700, Al Stone wrote:
On 11/13/2014 01:10 AM, Gerd Hoffmann wrote:
Hi,
My understanding from an IRC conversation yesterday was that at least some of these ACPI blobs contain data which has to be constructed at the point it is requested (ie is not fixed at the point when QEMU starts up), because OVMF will do:
- startup
- prod some parts of the hardware to configure it
- request ACPI tables via fw_cfg
and the ACPI tables have to reflect the status of the hardware after OVMF's poking, not before.
It is this:
[root@fedora ~]# cat /proc/ioports
[ ... ]
0600-063f : 0000:00:01.3
  0600-0603 : ACPI PM1a_EVT_BLK
  0604-0605 : ACPI PM1a_CNT_BLK
  0608-060b : ACPI PM_TMR
0700-070f : 0000:00:01.3
  0700-0707 : piix4_smbus
[ ... ]
So this is problematic: the PM1a_EVT_BLK and PM1a_CNT_BLK should not exist if hardware reduced mode ACPI is being used;
This is x86 and gives some background on why the "firmware inits hardware -> qemu adjusts addresses in acpi tables -> firmware loads acpi tables via fw_cfg" init sequence is used there.
Figuring out whether you have similar problems/requirements on arm, or whether you can simply go with static acpi tables, is up to you ;)
cheers, Gerd
On 11.11.2014 16:29, Mark Rutland wrote:
Hi,
On Thu, Nov 06, 2014 at 01:33:20PM +0000, Alexander Spyridakis wrote:
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid this not-so-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
I'm not sure how you plan to use ACPI without UEFI, as there are several pieces of information which ACPI misses, such as the memory map, which must be discovered from UEFI. How do you intend to discover the memory map without UEFI?
Additionally, with Linux and other generic OSs, the expectation is that the ACPI tables are discovered via the UEFI system table. How do you intend to discover the ACPI tables? Or other system information?
From experience with Linux, querying this information from UEFI is a trivial overhead, though a UEFI implementation might take a while to boot to the point where that is possible. It would be more generally helpful to have an optimized virtualised UEFI for this case (or perhaps just a UEFI frontend that presents the same interface to EFI applications but doesn't have to do any heavy lifting at boot).
So far the general trend with AArch64 at the system level is to use generic interfaces as far as possible. The generic interface for discovering ACPI tables is to boot as an EFI application and then to query the tables from UEFI. That is the interface others are likely to follow, and ACPI without UEFI is unlikely to be of much use to anyone else.
Why is it worth expending the effort on the boot protocol you suggest (which so far is not well defined) when there is already a portable, well-defined standard that others are already following?
Thanks, Mark.
Hi,
I tend to mostly agree with this, we might look for alternative solutions for speeding up guest startup in the future but in general if getting ACPI in the guest for ARM64 requires also getting UEFI, then I can personally live with that, especially if we strive to have the kind of optimized virtualized UEFI you mention.
We can in the meantime use these patches to test the "fast path" to the guest by just hardcoding or passing on the command line what is needed to be able to test information reading from ACPI from the guest side.
I cc: my colleagues Jani and Paul which have more the use case in mind and might correct me there.
For x86 though, what is the state of UEFI in QEMU? Is it relying on the OVMF project to provide the firmware images, if I understand correctly? How is it working out in practice? Should the same kind of approach be taken for ARM64?
As mentioned by others, I'd rather see an implementation of ACPI in QEMU which learns from the experience of X86 (and possibly shares some code if possible), rather than going in a different direction by creating device trees first, and then converting them to ACPI tables somewhere in the firmware, just because device trees are "already there", for the reasons which have already been mentioned before by Igor and others.
I wouldn't want for ACPI to be "sort of" supported in QEMU, but with a limited functionality which makes it not fully useful in practice. I'd rather see it as a first class citizen instead, including the ability to run AML code.
Thanks,
Claudio
On Wed, Nov 12, 2014 at 09:08:55AM +0000, Claudio Fontana wrote:
On 11.11.2014 16:29, Mark Rutland wrote:
Hi,
On Thu, Nov 06, 2014 at 01:33:20PM +0000, Alexander Spyridakis wrote:
On 6 November 2014 14:44, Peter Maydell peter.maydell@linaro.org wrote:
We need ACPI guest support in QEMU for AArch64 over here, with all features (including the ability to run ACPI code and add specific tables), for ACPI-based guests.
The plan for providing ACPI to guests is that we run a UEFI BIOS blob which is what is responsible for providing ACPI and UEFI runtime services to guests which need them. (The UEFI blob finds out about its hardware by looking at a device tree that QEMU passes it, but that's a detail between QEMU and its bios blob). This pretty much looks like what x86 QEMU used to do with ACPI for a very long time, so we know it's a feasible approach.
Hi Peter,
The rationale behind the proposed approach is to cover cases where the user does not want to rely on external firmware layers. While UEFI could do what you are describing, the point is to avoid this not-so-trivial overhead in the booting process, especially in the case of thin guests, where another software dependency is undesired.
I'm not sure how you plan to use ACPI without UEFI, as there are several pieces of information which ACPI misses, such as the memory map, which must be discovered from UEFI. How do you intend to discover the memory map without UEFI?
Additionally, with Linux and other generic OSs, the expectation is that the ACPI tables are discovered via the UEFI system table. How do you intend to discover the ACPI tables? Or other system information?
From experience with Linux, querying this information from UEFI is a trivial overhead, though a UEFI implementation might take a while to boot to the point where that is possible. It would be more generally helpful to have an optimized virtualised UEFI for this case (or perhaps just a UEFI frontend that presents the same interface to EFI applications but doesn't have to do any heavy lifting at boot).
So far the general trend with AArch64 at the system level is to use generic interfaces as far as possible. The generic interface for discovering ACPI tables is to boot as an EFI application and then to query the tables from UEFI. That is the interface others are likely to follow, and ACPI without UEFI is unlikely to be of much use to anyone else.
Why is it worth expending the effort on the boot protocol you suggest (which so far is not well defined) when there is already a portable, well-defined standard that others are already following?
Thanks, Mark.
Hi,
I tend to mostly agree with this, we might look for alternative solutions for speeding up guest startup in the future but in general if getting ACPI in the guest for ARM64 requires also getting UEFI, then I can personally live with that, especially if we strive to have the kind of optimized virtualized UEFI you mention.
Given that UEFI will be required for other guests (e.g. if you want to boot a distribution's ISO image), I hope that virtualised UEFI will see some optimisation work.
We can in the meantime use these patches to test the "fast path" to the guest by just hardcoding or passing on the command line what is needed to be able to test information reading from ACPI from the guest side.
I cc: my colleagues Jani and Paul which have more the use case in mind and might correct me there.
For x86 though, what is the state of UEFI in QEMU? Is it relying on the OVMF project to provide the firmware images, if I understand correctly? How is it working out in practice? Should the same kind of approach be taken for ARM64?
I'm unfortunately not familiar enough with OVMF to answer any of these questions.
As mentioned by others, I'd rather see an implementation of ACPI in QEMU which learns from the experience of X86 (and possibly shares some code if possible), rather than going in a different direction by creating device trees first, and then converting them to ACPI tables somewhere in the firmware, just because device trees are "already there", for the reasons which have already been mentioned before by Igor and others.
For the features which ACPI provides which device trees do not (e.g. the dynamic addition and removal of memory and CPUs), there will need to be some sort of interface between QEMU and the ACPI implementation. That's already outside of the realm of DT, so as previously mentioned a simple conversion doesn't cover the general case.
For most static things, device tree should be suitable as-is (i.e. without translation to ACPI) for all currently supported guests.
I think any ACPI implementation for a hypervisor should provide a demonstrably useful feature (e.g. hot-add of CPUs) before merging, so we know the infrastructure is suitable.
I wouldn't want for ACPI to be "sort of" supported in QEMU, but with a limited functionality which makes it not fully useful in practice. I'd rather see it as a first class citizen instead, including the ability to run AML code.
I agree that there's no point in having ACPI in a guest unless it provides something which dt does not. I don't know how it should be structured to provide those useful features.
Thanks, Mark.
On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
On Wed, Nov 12, 2014 at 09:08:55AM +0000, Claudio Fontana wrote:
On 11.11.2014 16:29, Mark Rutland wrote:
I tend to mostly agree with this, we might look for alternative solutions for speeding up guest startup in the future but in general if getting ACPI in the guest for ARM64 requires also getting UEFI, then I can personally live with that, especially if we strive to have the kind of optimized virtualized UEFI you mention.
Given that UEFI will be required for other guests (e.g. if you want to boot a distribution's ISO image), I hope that virtualised UEFI will see some optimisation work.
I think the requirement is just for KVM to provide something that behaves exactly like UEFI; it doesn't have to be the full Tianocore implementation if it's easier to reimplement the boot interface.
As mentioned by others, I'd rather see an implementation of ACPI in QEMU which learns from the experience of X86 (and possibly shares some code if possible), rather than going in a different direction by creating device trees first, and then converting them to ACPI tables somewhere in the firmware, just because device trees are "already there", for the reasons which have already been mentioned before by Igor and others.
For the features which ACPI provides which device trees do not (e.g. the dynamic addition and removal of memory and CPUs), there will need to be some sort of interface between QEMU and the ACPI implementation. That's already outside of the realm of DT, so as previously mentioned a simple conversion doesn't cover the general case.
I think we need to support the low-level interfaces in the kernel for this anyway, we should not have to use ACPI just to do memory and CPU hotplugging in KVM, that would be silly. If ACPI is present, it can provide a wrapper for the same interface, but KVM should not need to be aware of the fact that ACPI is used in the guest, after it has passed the initial ACPI blob to the kernel.
I think any ACPI implementation for a hypervisor should provide a demonstrably useful feature (e.g. hot-add of CPUs) before merging, so we know the infrastructure is suitable.
I wouldn't want for ACPI to be "sort of" supported in QEMU, but with a limited functionality which makes it not fully useful in practice. I'd rather see it as a first class citizen instead, including the ability to run AML code.
I agree that there's no point in having ACPI in a guest unless it provides something which dt does not. I don't know how it should be structured to provide those useful features.
I see it the opposite way: we shouldn't have to use ACPI just to make use of some feature in Linux, the only reason why you'd want ACPI support in KVM is to be able to run Windows. It makes sense for the ACPI implementation to be compatible with the Linux ACPI code as well so we can test it better.
Arnd
On Wed, Nov 12, 2014 at 12:15:08PM +0100, Arnd Bergmann wrote:
On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
On Wed, Nov 12, 2014 at 09:08:55AM +0000, Claudio Fontana wrote:
On 11.11.2014 16:29, Mark Rutland wrote:
I tend to mostly agree with this, we might look for alternative solutions for speeding up guest startup in the future but in general if getting ACPI in the guest for ARM64 requires also getting UEFI, then I can personally live with that, especially if we strive to have the kind of optimized virtualized UEFI you mention.
Given that UEFI will be required for other guests (e.g. if you want to boot a distribution's ISO image), I hope that virtualised UEFI will see some optimisation work.
I think the requirement is just for KVM to provide something that behaves exactly like UEFI; it doesn't have to be the full Tianocore implementation if it's easier to reimplement the boot interface.
We already have a port of Tianocore that works for virt, but yes, implementing something lighter is certainly possible.
As mentioned by others, I'd rather see an implementation of ACPI in QEMU which learns from the experience of X86 (and possibly shares some code if possible), rather than going in a different direction by creating device trees first, and then converting them to ACPI tables somewhere in the firmware, just because device trees are "already there", for the reasons which have already been mentioned before by Igor and others.
For the features which ACPI provides which device trees do not (e.g. the dynamic addition and removal of memory and CPUs), there will need to be some sort of interface between QEMU and the ACPI implementation. That's already outside of the realm of DT, so as previously mentioned a simple conversion doesn't cover the general case.
I think we need to support the low-level interfaces in the kernel for this anyway, we should not have to use ACPI just to do memory and CPU hotplugging in KVM, that would be silly.
I had that same intuitive feeling, but lacked good technical arguments for it. Care to elaborate on that?
If ACPI is present, it can provide a wrapper for the same interface, but KVM should not need to be aware of the fact that ACPI is used in the guest, after it has passed the initial ACPI blob to the kernel.
That's where things begin to be a bit foggy for me. AFAIU ACPI already has a method for doing this and I speculate that there is some IRQ assigned to an ACPI event that causes some AML code to be interpreted by your OS. Wouldn't it be a matter of QEMU putting the right AML table fragments in place to wire this up then?
-Christoffer
On 12/11/2014 12:34, Christoffer Dall wrote:
AFAIU ACPI already has a method for doing this
It's not defined in the spec. QEMU defines a bunch of registers to do that, and provides AML that works with those registers.
While these registers can be separated from the ACPI code in QEMU...
and I speculate that there is some IRQ assigned to an ACPI event that causes some AML code to be interpreted by your OS.
... QEMU does exactly this, it uses the "general purpose event" (GPE) mechanism to trigger the parsing of the AML. When you hot-plug/unplug a CPU or memory, an SCI (system control interrupt - the ACPI IRQ) is triggered in the guest and that's not entirely disconnected from ACPI.
Perhaps you could treat it as a shared level-triggered interrupt in DT? I don't know.
Wouldn't it be a matter of QEMU putting the right AML table fragments in place to wire this up then?
Yes.
Paolo
On Wed, Nov 12, 2014 at 11:48:27AM +0000, Paolo Bonzini wrote:
On 12/11/2014 12:34, Christoffer Dall wrote:
AFAIU ACPI already has a method for doing this
It's not defined in the spec. QEMU defines a bunch of registers to do that, and provides AML that works with those registers.
Huh? SCI + AML is the method, and that's defined by the spec.
The registers the AML pokes are an implementation detail abstracted by the AML, and as such are irrelevant to the spec.
While these registers can be separated from the ACPI code in QEMU...
and I speculate that there is some IRQ assigned to an ACPI event that causes some AML code to be interpreted by your OS.
... QEMU does exactly this, it uses the "general purpose event" (GPE) mechanism to trigger the parsing of the AML. When you hot-plug/unplug a CPU or memory, an SCI (system control interrupt - the ACPI IRQ) is triggered in the guest and that's not entirely disconnected from ACPI.
Perhaps you could treat it as a shared level-triggered interrupt in DT? I don't know.
Putting an interrupt in DT is trivial. The hard part is the rest of the interface, which so far there is no specification for.
Thanks, Mark.
On 12/11/2014 13:18, Mark Rutland wrote:
On Wed, Nov 12, 2014 at 11:48:27AM +0000, Paolo Bonzini wrote:
On 12/11/2014 12:34, Christoffer Dall wrote:
AFAIU ACPI already has a method for doing this
It's not defined in the spec. QEMU defines a bunch of registers to do that, and provides AML that works with those registers.
Huh? SCI + AML is the method, and that's defined by the spec.
I thought Christoffer meant a method to do the actual hotplug, not just to signal events. If you want to "support the low-level interfaces in the kernel for this anyway", you certainly need to know the details underneath the AML.
Perhaps you could treat it as a shared level-triggered interrupt in DT? I don't know.
Putting an interrupt in DT is trivial. The hard part is the rest of the interface, which so far there is no specification for.
Have you looked at docs/specs/acpi_{cpu,mem}_hotplug.txt? Writing a DT binding for it is trivial too. Or are we talking about two different things?
Paolo
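For readers who have not seen those documents: the guest-visible side is roughly a small bank of registers that the AML (or, in principle, a native driver) reads and writes. Below is a minimal sketch of polling a CPU-presence bitmap through such an interface; the MMIO base address, register layout and CPU count are illustrative placeholders invented for this example, not values taken from docs/specs/acpi_cpu_hotplug.txt.

```c
#include <stdint.h>

/* Illustrative sketch only: poll a CPU-presence bitmap exposed by the
 * hypervisor, in the spirit of docs/specs/acpi_cpu_hotplug.txt.  The MMIO
 * base address, the one-bit-per-CPU layout and MAX_CPUS are placeholders
 * invented for this example, not values defined by that document. */
#define HOTPLUG_MMIO_BASE   ((volatile uint8_t *)0x09050000UL)  /* placeholder */
#define MAX_CPUS            64                                  /* placeholder */

static int cpu_is_present(unsigned int cpu)
{
    if (cpu >= MAX_CPUS) {
        return 0;
    }
    /* One bit per CPU, packed into consecutive read-only bytes. */
    return (HOTPLUG_MMIO_BASE[cpu / 8] >> (cpu % 8)) & 1;
}
```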
On Wed, Nov 12, 2014 at 1:27 PM, Paolo Bonzini pbonzini@redhat.com wrote:
On 12/11/2014 13:18, Mark Rutland wrote:
On Wed, Nov 12, 2014 at 11:48:27AM +0000, Paolo Bonzini wrote:
On 12/11/2014 12:34, Christoffer Dall wrote:
AFAIU ACPI already has a method for doing this
It's not defined in the spec. QEMU defines a bunch of registers to do that, and provides AML that works with those registers.
Huh? SCI + AML is the method, and that's defined by the spec.
I thought Christoffer meant a method to do the actual hotplug, not just to signal events. If you want to "support the low-level interfaces in the kernel for this anyway", you certainly need to know the details underneath the AML.
I was being deliberately vague, because I don't know how this works (I did use the word speculate, didn't I?).
But yes, one of the reasons why I would argue that we're not in a hurry to accept these patches is that we don't know how the actual underlying mechanisms to do the things we want ACPI to facilitate (e.g. cpu and memory hotplug) really work, and thus we cannot verify the implementation or benefit directly from it today.
-Christoffer
On Wednesday 12 November 2014 13:27:14 Paolo Bonzini wrote:
On 12/11/2014 13:18, Mark Rutland wrote:
On Wed, Nov 12, 2014 at 11:48:27AM +0000, Paolo Bonzini wrote:
Perhaps you could treat it as a shared level-triggered interrupt in DT? I don't know.
Putting an interrupt in DT is trivial. The hard part is the rest of the interface, which so far there is no specification for.
Have you looked at docs/specs/acpi_{cpu,mem}_hotplug.txt? Writing a DT binding for it is trivial too. Or are we talking about two different things?
Interesting. I agree that doing a DT binding for these will be trivial, and the implementation in Linux should also be straightforward, thanks for pointing these out.
However, it seems that the implementation that qemu uses is incompatible with ARM64, since GPE is not part of the ACPI hardware reduced mode.
Arnd
On 12/11/2014 14:08, Arnd Bergmann wrote:
Putting an interrupt in DT is trivial. The hard part is the rest of the interface, which so far there is no specification for.
Have you looked at docs/specs/acpi_{cpu,mem}_hotplug.txt? Writing a DT binding for it is trivial too. Or are we talking about two different things?
Interesting. I agree that doing a DT binding for these will be trivial, and the implementation in Linux should also be straightforward, thanks for pointing these out.
However, it seems that the implementation that qemu uses is incompatible with ARM64, since GPE is not part of the ACPI hardware reduced mode.
That makes it even simpler because you do not have to use SCI.
-M virt currently doesn't have a GPIO controller, but it's easy to add a PL061 and specify a random GPIO pin in the DT bindings. Then you can use GPIO-signaled ACPI events in the AML; the _Lxx methods just move from _GPE to _SB.GPIO under the GPIO controller.
Paolo
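To make that concrete, here is a rough sketch of the QEMU side of wiring in such a controller, using the generic sysbus helper. The "pl061" name matches QEMU's PrimeCell GPIO model, while the base address and the interrupt line passed in are placeholders, and the corresponding DT node and _Lxx AML are not shown.

```c
#include "hw/sysbus.h"

/* Hypothetical helper, not actual virt board code: instantiate a PL061 so
 * that GPIO-signalled ACPI events can be routed through it.  The MMIO base
 * and the interrupt line are placeholders for whatever the virt memory map
 * would actually reserve. */
static void create_gpio_controller(qemu_irq irq)
{
    /* "pl061" is QEMU's model of the ARM PrimeCell GPIO controller. */
    sysbus_create_simple("pl061", 0x09030000 /* placeholder base */, irq);
}
```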
On Wed, Nov 12, 2014 at 12:27:14PM +0000, Paolo Bonzini wrote:
On 12/11/2014 13:18, Mark Rutland wrote:
On Wed, Nov 12, 2014 at 11:48:27AM +0000, Paolo Bonzini wrote:
On 12/11/2014 12:34, Christoffer Dall wrote:
AFAIU ACPI already has a method for doing this
It's not defined in the spec. QEMU defines a bunch of registers to do that, and provides AML that works with those registers.
Huh? SCI + AML is the method, and that's defined by the spec.
I thought Christoffer meant a method to do the actual hotplug, not just to signal events. If you want to "support the low-level interfaces in the kernel for this anyway", you certainly need to know the details underneath the AML.
I think when Christoffer said that ACPI already had a mechanism he was referring to the SCI + AML, as opposed to the underlying implementation.
Perhaps you could treat it as a shared level-triggered interrupt in DT? I don't know.
Putting an interrupt in DT is trivial. The hard part is the rest of the interface, which so far there is no specification for.
Have you looked at docs/specs/acpi_{cpu,mem}_hotplug.txt? Writing a DT binding for it is trivial too. Or are we talking about two different things?
Writing a binding for the particular device might be trivial. How this would fit with the DT model is more complicated, and needs to be specified. As far as I can see, those documents are quite strongly tied to x86 ACPI (they talk in terms of OSPM, OST, GPE, APIC IDs).
I agree that we could do CPU and memory hotplug without ACPI, but we need to specify the full firmware interface for doing so, and how this interacts with the initial DTB if using DT. We can't just pick-and-mix portions of ACPI and state that it's specified and standard.
Thanks, Mark.
On 12/11/2014 14:41, Mark Rutland wrote:
Writing a binding for the particular device might be trivial. How this would fit with the DT model is more complicated, and needs to be specified. As far as I can see, those documents are quite strongly tied to x86 ACPI (they talk in terms of OSPM, OST, GPE, APIC IDs).
OSPM -> replace with kernel driver
OST -> used to generate events, doesn't need to be implemented in the kernel driver. Or just define the meaning based on the ACPI _OST spec.
GPE -> replace with GPIO
APIC ID -> replace with whatever id ARM CPUs have
I agree that we could do CPU and memory hotplug without ACPI, but we need to specify the full firmware interface for doing so, and how this interacts with the initial DTB if using DT.
The initial DTB has to expose the IDs for hotpluggable CPUs and the range for hotpluggable memory. In the ACPI case this goes in the SRAT. But none of this is insurmountable.
We can't just pick-and-mix portions of ACPI and state that it's specified and standard.
But that's what you already do if you want to build ACPI tables from DT. You are already picking-and-mixing the variable portions of the ACPI tables and making DT bindings for them.
All that's left is to de-x86ify the existing spec (which is really easy), and to figure out a DT binding for it. It's really not unlike any other MMIO device.
Paolo
On Wed, Nov 12, 2014 at 01:59:30PM +0000, Paolo Bonzini wrote:
On 12/11/2014 14:41, Mark Rutland wrote:
Writing a binding for the particular device might be trivial. How this would fit with the DT model is more complicated, and needs to be specified. As far as I can see, those documents are quite strongly tied to x86 ACPI (they talk in terms of OSPM, OST, GPE, APIC IDs).
OSPM -> replace with kernel driver
OST -> used to generate events, doesn't need to be implemented in the kernel driver. Or just define the meaning based on the ACPI _OST spec.
GPE -> replace with GPIO
APIC ID -> replace with whatever id ARM CPUs have
I agree that we could do CPU and memory hotplug without ACPI, but we need to specify the full firmware interface for doing so, and how this interacts with the initial DTB if using DT.
The initial DTB has to expose the IDs for hotpluggable CPUs and the range for hotpluggable memory. In the ACPI case this goes in the SRAT. But none of this is insurmountable.
My only point is that it needs to be defined, not that this definition is impossible. That does trickle into a few places -- we currently have no way of defining a CPU in DT which is possible but not present, for example (the status property historically means something different per ePAPR).
We can't just pick-and-mix portions of ACPI and state that it's specified and standard.
But that's what you already do if you want to build ACPI tables from DT. You are already picking-and-mixing the variable portions of the ACPI tables and making DT bindings for them.
I don't follow. I argued _against_ trying to build ACPI tables from DT because the two don't quite match up anyway. I argued _against_ trying to convert ACPI tables to DT in prior discussions for similar reasons.
All that's left is to de-x86ify the existing spec (which is really easy), and to figure out a DT binding for it. It's really not unlike any other MMIO device.
In addition to fixing up the other specs which are affected by this (e.g. how we describe those additional CPUs). There's also some de-ACPIing to be done in addition to de-x86ification, and we need to be careful to ensure we have access to all the information we require in the absence of ACPI, and that we have a well defined behaviour on both sides of the interface for what would previously have been implicit in ACPI.
I'm not saying that this is impossible. It's just a greater body of work than modifying one spec.
Thanks, Mark.
On 12/11/2014 15:10, Mark Rutland wrote:
We can't just pick-and-mix portions of ACPI and state that it's specified and standard.
But that's what you already do if you want to build ACPI tables from DT. You are already picking-and-mixing the variable portions of the ACPI tables and making DT bindings for them.
I don't follow. I argued _against_ trying to build ACPI tables from DT because the two don't quite match up anyway. I argued _against_ trying to convert ACPI tables to DT in prior discussions for similar reasons.
Sorry, that was not you-Mark, but more you-ARM.
And in fact I'd tend to agree with you, but if there are people that want not to use ACPI or UEFI or both, I think it's better if the UEFI firmware swallows the same pill and builds ACPI from DT.
In addition to fixing up the other specs which are affected by this (e.g. how we describe those additional CPUs). There's also some de-ACPIing to be done in addition to de-x86ification, and we need to be careful to ensure we have access to all the information we require in the absence of ACPI, and that we have a well defined behaviour on both sides of the interface for what would previously have been implicit in ACPI.
Yes, I agree. On the QEMU side the de-ACPIfication would have to be done anyway (no GPE because of the reduced hardware), but you need extra de-ACPIfication for stuff like the SRAT.
I'm not saying that this is impossible. It's just a greater body of work than modifying one spec.
No doubt about that.
Paolo
On Wednesday 12 November 2014 12:34:01 Christoffer Dall wrote:
On Wed, Nov 12, 2014 at 12:15:08PM +0100, Arnd Bergmann wrote:
On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
For the features which ACPI provides which device trees do not (e.g. the dynamic addition and removal of memory and CPUs), there will need to be some sort of interface between QEMU and the ACPI implementation. That's already outside of the realm of DT, so as previously mentioned a simple conversion doesn't cover the general case.
I think we need to support the low-level interfaces in the kernel for this anyway; we should not have to use ACPI just to do memory and CPU hotplugging in KVM. That would be silly.
I had that same intuitive feeling, but lacked good technical arguments for it. Care to elaborate on that?
ACPI always has to interface back to the hypervisor to do anything that changes the hardware configuration, so it essentially has to perform a hypercall or touch some virtual register.
If we need to architect that interface in qemu anyway, we should make it sane enough for the kernel to use directly, without having to go through ACPI, as not everyone will want to run ACPI.
If ACPI is present, it can provide a wrapper for the same interface, but KVM should not need to be aware of the fact that ACPI is used in the guest, after it has passed the initial ACPI blob to the kernel.
That's where things begin to be a bit foggy for me. AFAIU ACPI already has a method for doing this and I speculate that there is some IRQ assigned to an ACPI event that causes some AML code to be interpreted by your OS. Wouldn't it be a matter of QEMU putting the right AML table fragments in place to wire this up then?
Yes, that is what I meant with a wrapper. The two choices are:
1. have an interrupt and a hypercall or mmio interface. When the interrupt gets triggered, we ask the interface what happened and do something on the same interface depending on the state of the system.
2. have an interrupt that causes AML code to be run, that code will use the hypercall or mmio interface to find out what happened and create an ACPI event. This event is enqueued to the generic ACPI hotplug handler, which depending on the state of the system decides to do something by calling into AML code again, which will trigger the underlying interface.
From qemu's point of view, the two are doing exactly the same thing, except that the MMIO access can be hidden in AML so the OS doesn't have to know the interface. Note that in case of Xen, the use of hypercalls means that the OS has to know the interface after all, so the second half of the process is handled by drivers/xen/xen-acpi-*hotplug.c.
Note how the implementation that uses the ACPI wrapper is much more complex than the native one that does the same thing:
-rw-r--r-- 1 arnd arnd  2119 Nov 10 16:43 drivers/xen/cpu_hotplug.c
-rw-r--r-- 1 arnd arnd 10987 Nov 10 16:43 drivers/xen/xen-acpi-cpuhotplug.c
-rw-r--r-- 1 arnd arnd  6894 Nov 10 16:43 drivers/xen/xen-balloon.c
-rw-r--r-- 1 arnd arnd 12085 Nov 10 16:43 drivers/xen/xen-acpi-memhotplug.c
Arnd
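As a strawman for option 1, the native driver could be little more than an interrupt handler that asks an MMIO block what happened. The register names, offsets and event encoding below are invented for illustration only, since no such interface has actually been specified yet.

```c
#include <linux/interrupt.h>
#include <linux/io.h>

/* Sketch of option 1 above (native, non-ACPI path).  All register offsets
 * and the event encoding are made up for illustration; hp_regs is assumed
 * to have been ioremap()ed by the driver's probe routine. */
#define HP_REG_EVENT   0x00   /* placeholder: which hotplug event fired */
#define HP_REG_ACK     0x04   /* placeholder: acknowledge/complete the event */

static void __iomem *hp_regs;

static irqreturn_t hotplug_irq_handler(int irq, void *dev_id)
{
	u32 event = readl(hp_regs + HP_REG_EVENT);

	/* ...look at the state of the system and decide what to do... */

	writel(event, hp_regs + HP_REG_ACK);
	return IRQ_HANDLED;
}
```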
On Wed, Nov 12, 2014 at 02:03:01PM +0100, Arnd Bergmann wrote:
On Wednesday 12 November 2014 12:34:01 Christoffer Dall wrote:
On Wed, Nov 12, 2014 at 12:15:08PM +0100, Arnd Bergmann wrote:
On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
For the features which ACPI provides which device trees do not (e.g. the dynamic addition and removal of memory and CPUs), there will need to be some sort of interface between QEMU and the ACPI implementation. That's already outside of the realm of DT, so as previously mentioned a simple conversion doesn't cover the general case.
I think we need to support the low-level interfaces in the kernel for this anyway; we should not have to use ACPI just to do memory and CPU hotplugging in KVM. That would be silly.
I had that same intuitive feeling, but lacked good technical arguments for it. Care to elaborate on that?
ACPI always has to interface back to the hypervisor to do anything that changes the hardware configuration, so it essentially has to perform a hypercall or touch some virtual register.
If we need to architect that interface in qemu anyway, we should make it sane enough for the kernel to use directly, without having to go through ACPI, as not everyone will want to run ACPI.
With the usual benefit that doing it in ACPI will not require updates on both sides if we need to fix something, but also the usual downside of having something obscure and pseudo-hidden, which may be broken underneath, I suppose.
If ACPI is present, it can provide a wrapper for the same interface, but KVM should not need to be aware of the fact that ACPI is used in the guest, after it has passed the initial ACPI blob to the kernel.
That's where things begin to be a bit foggy for me. AFAIU ACPI already has a method for doing this and I speculate that there is some IRQ assigned to an ACPI event that causes some AML code to be interpreted by your OS. Wouldn't it be a matter of QEMU putting the right AML table fragments in place to wire this up then?
Yes, that is what I meant with a wrapper. The two choices are:
1. have an interrupt and a hypercall or mmio interface. When the interrupt gets triggered, we ask the interface what happened and do something on the same interface depending on the state of the system.
2. have an interrupt that causes AML code to be run, that code will use the hypercall or mmio interface to find out what happened and create an ACPI event. This event is enqueued to the generic ACPI hotplug handler, which depending on the state of the system decides to do something by calling into AML code again, which will trigger the underlying interface.
From qemu's point of view, the two are doing exactly the same thing, except that the MMIO access can be hidden in AML so the OS doesn't have to know the interface.
right, thanks for the explanation.
Note that in case of Xen, the use of hypercalls means that the OS has to know the interface after all, so the second half of the process is handled by drivers/xen/xen-acpi-*hotplug.c.
Note how the implementation that uses the ACPI wrapper is much more complex than the native one that does the same thing:
-rw-r--r-- 1 arnd arnd  2119 Nov 10 16:43 drivers/xen/cpu_hotplug.c
-rw-r--r-- 1 arnd arnd 10987 Nov 10 16:43 drivers/xen/xen-acpi-cpuhotplug.c
-rw-r--r-- 1 arnd arnd  6894 Nov 10 16:43 drivers/xen/xen-balloon.c
-rw-r--r-- 1 arnd arnd 12085 Nov 10 16:43 drivers/xen/xen-acpi-memhotplug.c
Interesting.
-Christoffer
On Wed, Nov 12, 2014 at 11:15:08AM +0000, Arnd Bergmann wrote:
On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
On Wed, Nov 12, 2014 at 09:08:55AM +0000, Claudio Fontana wrote:
On 11.11.2014 16:29, Mark Rutland wrote:
I tend to mostly agree with this. We might look for alternative solutions for speeding up guest startup in the future, but in general, if getting ACPI in the guest for ARM64 requires also getting UEFI, then I can personally live with that, especially if we strive to have the kind of optimized virtualized UEFI you mention.
Given that UEFI will be required for other guests (e.g. if you want to boot a distribution's ISO image), I hope that virtualised UEFI will see some optimisation work.
I think the requirement is just for KVM to provide something that behaves exactly like UEFI; it doesn't have to be the full Tianocore implementation if it's easier to reimplement the boot interface.
I agree that we don't need a full Tianocore, but whatever we have must implement the minimal interface UEFI requires (which is more than just the boot interface).
For a "boot this EFI application" workflow you can skip most of the BDS stuff, but you still need boot services and runtime services provided to the application.
As mentioned by others, I'd rather see an implementation of ACPI in QEMU which learns from the experience of X86 (and possibly shares some code if possible), rather than going in a different direction by creating device trees first, and then converting them to ACPI tables somewhere in the firmware, just because device trees are "already there", for the reasons which have already been mentioned before by Igor and others.
For the features which ACPI provides which device trees do not (e.g. the dynamic addition and removal of memory and CPUs), there will need to be some sort of interface between QEMU and the ACPI implementation. That's already outside of the realm of DT, so as previously mentioned a simple conversion doesn't cover the general case.
I think we need to support the low-level interfaces in the kernel for this anyway; we should not have to use ACPI just to do memory and CPU hotplugging in KVM. That would be silly. If ACPI is present, it can provide a wrapper for the same interface, but KVM should not need to be aware of the fact that ACPI is used in the guest, after it has passed the initial ACPI blob to the kernel.
The difficulty here is that there is currently no common low-level interface between the guest and any specific hypervisor for these facilities. This would be up to kvm tool or QEMU, not KVM itself. So we'd have to spec one for !ACPI.
With ACPI, the interface should be common (following the ACPI spec), but we don't have a common interface underlying that.
I am not averse to having mechanisms for !ACPI, but we'd need to spec something.
I think any ACPI implementation for a hypervisor should provide a demonstrably useful feature (e.g. hot-add of CPUs) before merging, so we know the infrastructure is suitable.
I wouldn't want ACPI to be "sort of" supported in QEMU, but with limited functionality which makes it not fully useful in practice. I'd rather see it as a first-class citizen instead, including the ability to run AML code.
I agree that there's no point in having ACPI in a guest unless it provides something which dt does not. I don't know how it should be structured to provide those useful features.
I see it the opposite way: we shouldn't have to use ACPI just to make use of some feature in Linux, the only reason why you'd want ACPI support in KVM is to be able to run Windows. It makes sense for the ACPI implementation to be compatible with the Linux ACPI code as well so we can test it better.
At the moment, ACPI specifies how these features should work. So without inventing a whole new interface, ACPI is the way of providing those features.
Thanks, Mark.
On 12 November 2014 09:08, Claudio Fontana claudio.fontana@huawei.com wrote:
As mentioned by others, I'd rather see an implementation of ACPI in QEMU which learns from the experience of X86 (and possibly shares some code if possible), rather than going in a different direction by creating device trees first, and then converting them to ACPI tables somewhere in the firmware, just because device trees are "already there", for the reasons which have already been mentioned before by Igor and others.
I think the motivation for "leave ACPI entirely to the firmware" is not just that the device trees are already there, but that it allows for a cleaner separation of concerns between QEMU and the firmware and thus makes QEMU simpler and easier to maintain in future. However as a result of the discussion in this thread and on IRC about what x86 QEMU/OVMF do and what the complexities of handling this in UEFI are, I'm not as sure as I was that it's actually feasible in practice.
I agree with you that if we have QEMU generating ACPI information itself then we should definitely follow the existing tested approach that x86 QEMU+OVMF have, and share code, both in QEMU and in the UEFI firmware. (As I understand it there is a common source code base between OVMF and the Tianocore code we're using for the ARM QEMU UEFI firmware. I've probably got the project names wrong here; I'm not familiar with the distinctions between Tianocore, EDK2, OVMF, etc.)
The x86 QEMU-generating-ACPI approach is more complicated than what this RFC patchset does, since it generates various separate tables and hands them individually to the firmware, rather than creating a single (non-relocatable) complete ACPI blob. I would hope we didn't need to support both "provide separated tables to firmware" and "provide a single blob to a standalone guest kernel"; if we're agreed that ACPI should imply UEFI we can forget about the latter, though.
thanks -- PMM
On 12.11.2014 19:10, Peter Maydell wrote:
On 12 November 2014 09:08, Claudio Fontana claudio.fontana@huawei.com wrote:
As mentioned by others, I'd rather see an implementation of ACPI in QEMU which learns from the experience of X86 (and possibly shares some code if possible), rather than going in a different direction by creating device trees first, and then converting them to ACPI tables somewhere in the firmware, just because device trees are "already there", for the reasons which have already been mentioned before by Igor and others.
I think the motivation for "leave ACPI entirely to the firmware" is not just that the device trees are already there, but that it allows for a cleaner separation of concerns between QEMU and the firmware and thus makes QEMU simpler and easier to maintain in future. However as a result of the discussion in this thread and on IRC about what x86 QEMU/OVMF do and what the complexities of handling this in UEFI are, I'm not as sure as I was that it's actually feasible in practice.
I agree with you that if we have QEMU generating ACPI information itself then we should definitely follow the existing tested approach that x86 QEMU+OVMF have, and share code, both in QEMU and in the UEFI firmware. (As I understand it there is a common source code base between OVMF and the Tianocore code we're using for the ARM QEMU UEFI firmware. I've probably got the project names wrong here; I'm not familiar with the distinctions between Tianocore, EDK2, OVMF, etc.)
The x86 QEMU-generating-ACPI approach is more complicated than what this RFC patchset does,
I think that the RFC was useful in its goal, as it generated a lot of comments, and we will also be using it to do some internal early tests with ACPI on the guest to speed up development.
I agree with you that as a result of this discussion, the solution for QEMU upstreaming purposes needs to take everything discussed (possibly more) into account.
since it generates various separate tables and hands them individually to the firmware, rather than creating a single (non-relocatable) complete ACPI blob. I would hope we didn't need to support both "provide separated tables to firmware" and "provide a single blob to a standalone guest kernel"; if we're agreed that ACPI should imply UEFI we can forget about the latter, though.
As I mentioned I am personally fine with the ACPI -> UEFI implication; also Paul (who represents my employer here) mentioned that we can live with this implication if we have to. The hope is to try to keep what needs to be implemented under control, so as to have as small an impact on boot time as possible, while still complying with the specifications.
Thanks,
Claudio
On 13 November 2014 09:57, Claudio Fontana claudio.fontana@huawei.com wrote:
I agree with you that as a result of this discussion, the solution for QEMU upstreaming purposes needs to take everything discussed (possibly more) into account.
(Picking this email as a reasonable if slightly arbitrary place to try to summarise this discussion thread...)
Everybody agrees that we need to support guests that want to use ACPI. There are several possible approaches to this that have been proposed:
1. all-in-QEMU: generate a complete single ACPI table in QEMU and pass it to the guest and/or firmware, as this RFC does
2. all-in-UEFI: don't do any ACPI table generation in QEMU; just create a device tree (and some standardised interfaces for hotplugging, etc), and let the UEFI firmware put together the ACPI tables accordingly
3. mixed-mode: follow the x86/QEMU practice, and have QEMU generate a set of ACPI table fragments, which UEFI then has to stitch together in combination with any extras it wants to add itself before passing the result to the guest. [Some of these fragments might need to be generated on-demand and supplied via a fw_cfg-like conduit device; this is necessary on x86 and might be also for ARM.]
My opinion about the best approach has swung all over the place during the course of this discussion, but at the moment I think mixed-mode is the way to go.
The most common use case for ACPI is going to be with UEFI, and I'm told that the details of Tianocore and the ACPI binary format mean that it's not possible for UEFI to edit a complete ACPI blob to add any extra items it needs to, so for this use case the all-in-QEMU approach won't work. We could in theory support all-in-QEMU as an extra boot protocol for directly booting a guest that used ACPI without UEFI, alongside one of the other two methods, but I'm pretty reluctant to support multiple ways of constructing the ACPI tables and I think we have agreement that we don't need to do that. That means we can drop 'all-in-QEMU' from consideration and concentrate on the other two options.
The all-in-UEFI approach is potentially workable, and from a QEMU maintainer's perspective it has the attraction of a clean separation: QEMU only has to describe hardware in one way, and all the ACPI related code is in the firmware. However it also has some definite demerits:
* we'd need code in UEFI to create ACPI tables based on the device tree information, which is at minimum code that nobody's written yet, and could be difficult even given the restriction that it only has to cope with the particular subset of dtb that QEMU generates
* adding support for something like "hotplug-$FOO" needs changes in QEMU and also in UEFI, even if UEFI would otherwise not need to care about the hotplugging, because it has to translate the relevant dtb entries into ACPI
* we would need to specify and define dt bindings for any (virtual) hardware used for notifying and controlling hotplug type features rather than these being an implementation detail of QEMU (though we might want to define these anyway so that you can use hotplug etc from a non-ACPI non-UEFI guest, of course)
Going with mixed-mode does mean we end up with a sort of triplication of code (once to create the board model, once to describe it in DT and once to describe it in ACPI), but the payoff is that we get to handle ACPI the same way as we do on x86, which includes a lot of preexisting code both in QEMU and in UEFI. We also don't need to update the UEFI code when we add new hotplug features.
Does anybody have an approach that I've missed from my list of three, or any advantages/disadvantages to add that might tilt the decision in a different direction?
thanks -- PMM
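For option 3, the x86 precedent is that QEMU exposes the generated tables as named fw_cfg files ("etc/acpi/tables", "etc/table-loader", "etc/acpi/rsdp") which the firmware locates and patches up. Below is a rough sketch of the lookup step on the firmware side; the transport accessors are placeholders for whatever conduit an ARM port would actually use, and the directory layout shown follows the existing x86 fw_cfg convention rather than anything specified for ARM.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the firmware side of a fw_cfg-like conduit: walk the file
 * directory and find the selector for the ACPI table blob.  The transport
 * hooks below are placeholders (they could be I/O ports, MMIO or DMA); the
 * directory layout and the "etc/acpi/tables" name follow the x86 fw_cfg
 * convention and are assumptions for an ARM port. */

#define FW_CFG_FILE_DIR  0x19          /* file directory selector (x86 convention) */

struct fw_cfg_file {                   /* one directory entry; size/select are big-endian */
    uint32_t size;
    uint16_t select;
    uint16_t reserved;
    char     name[56];
};

/* Placeholder transport hooks, to be implemented against the real device. */
void fw_cfg_select(uint16_t key);
void fw_cfg_read(void *buf, uint32_t len);

static uint16_t find_acpi_tables_selector(void)
{
    uint32_t i, count;

    fw_cfg_select(FW_CFG_FILE_DIR);
    fw_cfg_read(&count, sizeof(count));
    count = __builtin_bswap32(count);  /* entry count is stored big-endian */

    for (i = 0; i < count; i++) {
        struct fw_cfg_file f;

        fw_cfg_read(&f, sizeof(f));
        if (strcmp(f.name, "etc/acpi/tables") == 0) {
            return __builtin_bswap16(f.select);
        }
    }
    return 0;                          /* not found */
}
```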
On 2014-10-31 2:02, Mark Rutland wrote:
On Thu, Oct 30, 2014 at 05:52:44PM +0000, Peter Maydell wrote:
On 30 October 2014 17:43, Alexander Spyridakis a.spyridakis@virtualopensystems.com wrote:
Currently, the virt machine model generates Device Tree information dynamically based on the existing devices in the system. This patch series extends the same concept but for ACPI information instead. A total of seven tables have been implemented in this patch series, which is the minimum for a basic ARM support.
The set of generated tables are:
- RSDP
- XSDT
- MADT
- GTDT
- FADT
- FACS
- DSDT
The tables are created in standalone buffers, taking into account the needed information passed from the virt machine model. When the generation is finalized, the individual buffers are compacted to a single ACPI binary blob, where it is injected on the guest memory space in a fixed location. The guest kernel can find the ACPI tables by providing to it the physical address of the ACPI blob (e.g. acpi_rsdp=0x47000000 boot argument).
(Sorry, I should have waited for the cover letter to arrive before replying.)
I think this is definitely the wrong approach. We already have to generate device tree information for the hardware we have, and having an equivalent parallel infrastructure for generating ACPI as well seems like it would be a tremendous mess. We should support guests that require ACPI by having QEMU boot a UEFI bios blob and have that UEFI code generate ACPI tables based on the DTB we hand it. (Chances seem good that any guest that wants ACPI is going to want UEFI runtime services anyway.)
Depending on why people want ACPI in a guest environment, generating ACPI tables from a DTB might not be possible (e.g. if they want to use AML for some reason).
Agreed.
So the important question is _why_ the guest needs to see an ACPI environment. What exactly can ACPI provide to the guest that DT does not already provide, and why is that necessary? What infrastructure is needed for that use case?
There is an important feature called system device dynamic reconfiguration, you know, hot-add/remove. If a guest needs more/less memory or CPU, can we add or remove them dynamically with DT? ACPI can do this, but I have no idea if DT can. (Sorry, I don't know much about DT)
Thanks Hanjun
On Thu, Nov 06, 2014 at 06:53:03AM +0000, Hanjun Guo wrote:
On 2014-10-31 2:02, Mark Rutland wrote:
On Thu, Oct 30, 2014 at 05:52:44PM +0000, Peter Maydell wrote:
On 30 October 2014 17:43, Alexander Spyridakis a.spyridakis@virtualopensystems.com wrote:
Currently, the virt machine model generates Device Tree information dynamically based on the existing devices in the system. This patch series extends the same concept but for ACPI information instead. A total of seven tables have been implemented in this patch series, which is the minimum for a basic ARM support.
The set of generated tables are:
- RSDP
- XSDT
- MADT
- GTDT
- FADT
- FACS
- DSDT
The tables are created in standalone buffers, taking into account the needed information passed from the virt machine model. When the generation is finalized, the individual buffers are compacted to a single ACPI binary blob, where it is injected on the guest memory space in a fixed location. The guest kernel can find the ACPI tables by providing to it the physical address of the ACPI blob (e.g. acpi_rsdp=0x47000000 boot argument).
(Sorry, I should have waited for the cover letter to arrive before replying.)
I think this is definitely the wrong approach. We already have to generate device tree information for the hardware we have, and having an equivalent parallel infrastructure for generating ACPI as well seems like it would be a tremendous mess. We should support guests that require ACPI by having QEMU boot a UEFI bios blob and have that UEFI code generate ACPI tables based on the DTB we hand it. (Chances seem good that any guest that wants ACPI is going to want UEFI runtime services anyway.)
Depending on why people want ACPI in a guest environment, generating ACPI tables from a DTB might not be possible (e.g. if they want to use AML for some reason).
Agreed.
So the important question is _why_ the guest needs to see an ACPI environment. What exactly can ACPI provide to the guest that DT does not already provide, and why is that necessary? What infrastructure is needed for that use case?
There is an important feature called system device dynamic reconfiguration, you know, hot-add/remove. If a guest needs more/less memory or CPU, can we add or remove them dynamically with DT? ACPI can do this, but I have no idea if DT can. (Sorry, I don't know much about DT)
There is no way of doing this with DT. There has been work into DT fragments/overlays where portions can be added to the tree dynamically, but that's controlled by the OS rather than the hypervisor, and there's no standard for communicating what has been hotplugged to trigger changes to the tree, so it's not quite the same. It really only works for tightly coupled hw/kernel/userspace combinations (i.e. embedded).
Depending on how you implement the hot-add/remove you might be able to get away with an initial static configuration translated from DT. If you need to describe what might be hotplugged from the start, then I suspect you cannot get away with translating a DT in general.
Thanks, Mark.
On Thursday 06 November 2014 13:30:01 Mark Rutland wrote:
There is no way of doing this with DT. There has been work into DT fragments/overlays where portions can be added to the tree dynamically, but that's controlled by the OS rather than the hypervisor, and there's no standard for communicating what has been hotplugged to trigger changes to the tree, so it's not quite the same. It really only works for tightly coupled hw/kernel/userspace combinations (i.e. embedded).
Depending on how you implement the hot-add/remove you might be able to get away with an initial static configuration translated from DT. If you need to describe what might be hotplugged from the start, then I suspect you cannot get away with translating a DT in general.
I believe IBM POWER 5/6/7 servers have an interface to update the DT from the hypervisor, but it's not a nice interface, so we may want to avoid duplicating that.
Arnd
On 06/11/2014 07:53, Hanjun Guo wrote:
So the important question is _why_ the guest needs to see an ACPI environment. What exactly can ACPI provide to the guest that DT does not already provide, and why is that necessary? What infrastructure is needed for that use case?
There is an important feature called system device dynamic reconfiguration, you know, hot-add/remove. If a guest needs more/less memory or CPU, can we add or remove them dynamically with DT? ACPI can do this, but I have no idea if DT can. (Sorry, I don't know much about DT)
Indeed hot-add/remove is the single biggest AML user in x86 QEMU. Whether you really need it depends on what you are adding/removing.
For PCI there is no problem. We can use PCIe from the beginning, and use PCIe hotplug support that is already in QEMU.
Memory and CPU are more problematic. For memory we could perhaps use a PCI memory device, though I'm not sure if that would require drivers in the OS or everything just works.
CPU hotplug, however, probably requires AML. Of course it can be generated in the firmware, like we used to do for x86, but Igor explained why it wasn't a great idea. That said, one of the problems ("never ending expansion of PV QEMU-BIOS interface") could be less important since ARM DT is a better interface than x86 fw_cfg.
Paolo
On Thu, 06 Nov 2014 16:57:47 +0100 Paolo Bonzini pbonzini@redhat.com wrote:
On 06/11/2014 07:53, Hanjun Guo wrote:
So the important question is _why_ the guest needs to see an ACPI environment. What exactly can ACPI provide to the guest that DT does not already provide, and why is that necessary? What infrastructure is needed for that use case?
There is an important feature called system device dynamic reconfiguration, you know, hot-add/remove. If a guest needs more/less memory or CPU, can we add or remove them dynamically with DT? ACPI can do this, but I have no idea if DT can. (Sorry, I don't know much about DT)
Indeed hot-add/remove is the single biggest AML user in x86 QEMU. Whether you really need it depends on what you are adding/removing.
For PCI there is no problem. We can use PCIe from the beginning, and use PCIe hotplug support that is already in QEMU.
Memory and CPU are more problematic. For memory we could perhaps use a PCI memory device, though I'm not sure if that would require drivers in the OS or everything just works.
BTW what's a PCI memory device? Is there any reference I could read about it?
CPU hotplug, however, probably requires AML. Of course it can be generated in the firmware, like we used to do for x86, but Igor explained why it wasn't a great idea. That said, one of the problems ("never ending expansion of PV QEMU-BIOS interface") could be less important since ARM DT is a better interface than x86 fw_cfg.
Unfortunately we would still need to teach UEFI to recognize QEMU-specific DT entries that were just invented; it doesn't matter what transport is used (DT or fw_cfg) to convey new information to UEFI/BIOS.
Paolo
On 06/11/2014 17:18, Igor Mammedov wrote:
On Thu, 06 Nov 2014 16:57:47 +0100 Paolo Bonzini pbonzini@redhat.com wrote:
On 06/11/2014 07:53, Hanjun Guo wrote:
So the important question is _why_ the guest needs to see an ACPI environment. What exactly can ACPI provide to the guest that DT does not already provide, and why is that necessary? What infrastructure is needed for that use case?
There is an important feature called system device dynamic reconfiguration, you know, hot-add/remove. If a guest needs more/less memory or CPU, can we add or remove them dynamically with DT? ACPI can do this, but I have no idea if DT can. (Sorry, I don't know much about DT)
Indeed hot-add/remove is the single biggest AML user in x86 QEMU. Whether you really need it depends on what you are adding/removing.
For PCI there is no problem. We can use PCIe from the beginning, and use PCIe hotplug support that is already in QEMU.
Memory and CPU are more problematic. For memory we could perhaps use a PCI memory device, though I'm not sure if that would require drivers in the OS or everything just works.
BTW what's a PCI memory device? Is there any reference I could read about it?
Just something with a huge BAR. Like ivshmem, but teaching the OS to see it as memory.
CPU hotplug, however, probably requires AML. Of course it can be generated in the firmware, like we used to do for x86, but Igor explained why it wasn't a great idea. That said, one of the problems ("never ending expansion of PV QEMU-BIOS interface") could be less important since ARM DT is a better interface than x86 fw_cfg.
Unfortunately we would still need to teach UEFI to recognize QEMU-specific DT entries that were just invented; it doesn't matter what transport is used (DT or fw_cfg) to convey new information to UEFI/BIOS.
Right, it's just a bit more organized than fw_cfg.
Paolo
On 2014-11-6 23:57, Paolo Bonzini wrote:
On 06/11/2014 07:53, Hanjun Guo wrote:
So the important question is _why_ the guest needs to see an ACPI environment. What exactly can ACPI provide to the guest that DT does not already provide, and why is that necessary? What infrastructure is needed for that use case?
There is an important feature called system device dynamic reconfiguration, you know, hot-add/remove. If a guest needs more/less memory or CPU, can we add or remove them dynamically with DT? ACPI can do this, but I have no idea if DT can. (Sorry, I don't know much about DT)
Indeed hot-add/remove is the single biggest AML user in x86 QEMU. Whether you really need it depends on what you are adding/removing.
For PCI there is no problem. We can use PCIe from the beginning, and use PCIe hotplug support that is already in QEMU.
Memory and CPU are more problematic. For memory we could perhaps use a PCI memory device, though I'm not sure if that would require drivers in the OS or everything just works.
I haven't seen any code for hot-adding system memory as a PCI device, but people from Fujitsu are working on that in another solution, QEMU memory hot-unplug support:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg251355.html
For the guest, ACPI-based memory hot-add/remove is already supported.
Thanks Hanjun
Sorry for being late to the party, missed this set when it went out.
On Thu, Oct 30, 2014 at 05:52:44PM +0000, Peter Maydell wrote:
On 30 October 2014 17:43, Alexander Spyridakis a.spyridakis@virtualopensystems.com wrote:
Currently, the virt machine model generates Device Tree information dynamically based on the existing devices in the system. This patch series extends the same concept but for ACPI information instead. A total of seven tables have been implemented in this patch series, which is the minimum for a basic ARM support.
The set of generated tables are:
- RSDP
- XSDT
- MADT
- GTDT
- FADT
- FACS
- DSDT
The tables are created in standalone buffers, taking into account the needed information passed from the virt machine model. When the generation is finalized, the individual buffers are compacted to a single ACPI binary blob, where it is injected on the guest memory space in a fixed location. The guest kernel can find the ACPI tables by providing to it the physical address of the ACPI blob (e.g. acpi_rsdp=0x47000000 boot argument).
(Sorry, I should have waited for the cover letter to arrive before replying.)
I think this is definitely the wrong approach. We already have to generate device tree information for the hardware we have, and having an equivalent parallel infrastructure for generating ACPI as well seems like it would be a tremendous mess.
I don't see this as duplication; I see it as platform completeness. And it would be really useful for firmware validation.
We should support guests that require ACPI by having QEMU boot a UEFI bios blob and have that UEFI code generate ACPI tables based on the DTB we hand it. (Chances seem good that any guest that wants ACPI is going to want UEFI runtime services anyway.)
Yes, we could do that, but in doing so we would be treating ACPI as some artificial tickbox thing rather than a platform property. And it would introduce an entirely new class of ARM-specific errors.
Moreover, we currently have separate EDK2 platforms for ARM*/x86 QEMU (ArmVirtualizationQEMU vs. Ovmf). The goal there is to progressively move towards greater amounts of shared code between the two, and hopefully merge them into one eventually.
Deferring, specifically on ARM, the responsibility of generating (all) ACPI tables to the firmware leads to a reduced amount of code shared between ARM/x86 in both QEMU and EDK2, and a reduced ability to share future developments in that area (x86 QEMU seems well on the way towards letting you inject AML blobs from the command line).
/ Leif
On 9 March 2015 at 21:12, Leif Lindholm leif.lindholm@linaro.org wrote:
Sorry for being late to the party, missed this set when it went out.
You seem to be replying to an email thread which is now many months old and at a part of the conversation which was obsoleted by the subsequent discussion in this thread...
Summary-of-consensus email from this thread: http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg02529.html
I think this is the most recent patchset, implementing the "mixed-mode" design: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg03290.html
(I might have missed an update, though, I've been on holiday and not tracking the list as closely the last few weeks.)
thanks -- PMM
On 2015/3/9 20:28, Peter Maydell wrote:
On 9 March 2015 at 21:12, Leif Lindholm leif.lindholm@linaro.org wrote:
Sorry for being late to the party, missed this set when it went out.
You seem to be replying to an email thread which is now many months old and at a part of the conversation which was obsoleted by the subsequent discussion in this thread...
Summary-of-consensus email from this thread: http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg02529.html
I think this is the most recent patchset, implementing the "mixed-mode" design: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg03290.html
(I might have missed an update, though, I've been on holiday and not tracking the list as closely the last few weeks.)
Hi Peter,
You haven't missed it. I haven't sent an updated version since V3. I'm waiting for the dependencies (the patches from Igor, Marcel, and Michael) to be settled. Then I'll send a new version.
On Mon, Mar 09, 2015 at 09:28:05PM +0900, Peter Maydell wrote:
On 9 March 2015 at 21:12, Leif Lindholm leif.lindholm@linaro.org wrote:
Sorry for being late to the party, missed this set when it went out.
You seem to be replying to an email thread which is now many months old and at a part of the conversation which was obsoleted by the subsequent discussion in this thread...
And here I was getting all prepared to have to argue :)
Summary-of-consensus email from this thread: http://lists.nongnu.org/archive/html/qemu-devel/2014-11/msg02529.html
Thanks!
I think this is the most recent patchset, implementing the "mixed-mode" design: http://lists.gnu.org/archive/html/qemu-devel/2015-02/msg03290.html
Excellent, that looks like exactly what I was hoping for. I'll give that a spin - thanks again and sorry for the noise.
/ Leif