From a functional point of view, this series can be split into the following logical parts:
1. Make the MMCONFIG code arch-agnostic, which allows all architectures to collect PCI config regions and use them when necessary.
2. Move non-arch-specific bits to the core code.
3. Use the MMCONFIG code to implement a generic ACPI-based PCI host controller driver.
4. Enable the above driver on ARM64.
The patches have been built on top of 4.5-rc3 and can be found here: git@github.com:semihalf-nowicki-tomasz/linux.git (pci-acpi-v5)
NOTE: this patch set depends on Lorenzo's fixes (https://patchwork.ozlabs.org/patch/576450/), which can be found in the pci-acpi-v5 branch.
This has been tested on Cavium ThunderX server, JunoR2, HP RX2660 IA64, x86, Hip05, X-Gene and QEMU-aarch64. Any help with reviewing and testing is much appreciated.
v4 -> v5
- dropped MCFG refactoring group patches 1-6 from series v4 and integrated Jayachandran's patch https://patchwork.ozlabs.org/patch/575525/
- rewrite PCI legacy IRQs allocation
- squashed two patches 11 and 12 from series v4, fixed bisection issue
- changelog improvements
- rebased to 4.5-rc3
v3 -> v4
- dropped Jiang's fix http://lkml.iu.edu/hypermail/linux/kernel/1601.1/04318.html
- added Lorenzo's fix patch 19/24
- ACPI PCI bus domain number assigning cleanup
- changed resource management, we now claim and reassign resources
- improvements for applying quirks
- dropped Matthew's http://www.spinics.net/lists/linux-pci/msg45950.html dependency
- rebased to 4.5-rc1
v2 -> v3
- fix legacy IRQ assigning and IO ports registration
- remove reference to arch specific companion device for ia64
- move ACPI PCI host controller driver to pci_root.c
- drop generic domain assignment for x86 and ia64 as I am not able to run all necessary test variants
- drop patch which cleaned legacy IRQ assignment since it belongs to Matthew's series: https://patchwork.ozlabs.org/patch/557504/
- extend MCFG quirk code
- rebased to 4.4
v1 -> v2
- moved non-arch specific piece of code to drivers/acpi/ directory
- fixed IO resource handling
- introduced PCI config accessors quirks matching
- moved ACPI_COMPANION_SET to generic code
v1 - https://lkml.org/lkml/2015/10/27/504
v2 - https://lkml.org/lkml/2015/12/16/246
v3 - http://lkml.iu.edu/hypermail/linux/kernel/1601.1/04308.html
v4 - https://lkml.org/lkml/2016/2/4/646
Jayachandran C (1):
  ACPI: MCFG: Move mmcfg_list management to drivers/acpi

Lorenzo Pieralisi (1):
  drivers: pci: add generic code to claim bus resources

Tomasz Nowicki (13):
  acpi, pci, mcfg: Provide default RAW ACPI PCI config space accessors.
  arm64, acpi: Use MCFG library and empty PCI config space accessors from pci_mcfg.c file.
  pci, acpi, ecam: Add flag to indicate whether ECAM region was hot added or not.
  x86, pci: Cleanup platform specific MCFG data by using ECAM hot_added flag.
  pci, acpi, x86, ia64: Move ACPI host bridge device companion assignment to core code.
  pci, acpi: Provide generic way to assign bus domain number.
  x86, ia64: Include acpi_pci_{add|remove}_bus to the default pcibios_{add|remove}_bus implementation.
  acpi, mcfg: Add default PCI config accessors implementation and initial support for related quirks.
  pci, of: Move the PCI I/O space management to PCI core code.
  pci, acpi: Support for ACPI based generic PCI host controller initialization
  pci, acpi: Match PCI config space accessors against platfrom specific quirks.
  arm64, pci, acpi: Assign legacy IRQs once device is enable.
  arm64, pci, acpi: Start using ACPI based PCI host bridge driver for ARM64.
 arch/arm64/Kconfig                 |   5 +
 arch/arm64/kernel/pci.c            |  35 +---
 arch/ia64/hp/common/sba_iommu.c    |   2 +-
 arch/ia64/include/asm/pci.h        |   1 -
 arch/ia64/pci/pci.c                |  26 ---
 arch/ia64/sn/kernel/io_acpi_init.c |   4 +-
 arch/x86/include/asm/pci.h         |   3 -
 arch/x86/include/asm/pci_x86.h     |  24 +--
 arch/x86/pci/acpi.c                |  47 +----
 arch/x86/pci/common.c              |  10 -
 arch/x86/pci/mmconfig-shared.c     | 269 ++++---------------------
 arch/x86/pci/mmconfig_32.c         |   1 +
 arch/x86/pci/mmconfig_64.c         |   1 +
 arch/x86/pci/numachip.c            |   1 +
 drivers/acpi/Kconfig               |   7 +
 drivers/acpi/Makefile              |   1 +
 drivers/acpi/pci_mcfg.c            | 392 +++++++++++++++++++++++++++++++++++++
 drivers/acpi/pci_root.c            | 154 ++++++++++++++-
 drivers/of/address.c               | 116 +----------
 drivers/pci/pci.c                  | 126 +++++++++++-
 drivers/pci/probe.c                |   5 +
 drivers/pci/setup-bus.c            |  63 ++++++
 drivers/xen/pci.c                  |   5 +-
 include/acpi/acpi_bus.h            |   1 +
 include/asm-generic/vmlinux.lds.h  |   7 +
 include/linux/of_address.h         |   9 -
 include/linux/pci-acpi.h           |  68 +++++++
 include/linux/pci.h                |   6 +
 28 files changed, 892 insertions(+), 497 deletions(-)
 create mode 100644 drivers/acpi/pci_mcfg.c
From: Jayachandran C <jchandra@broadcom.com>

Move pci_mmcfg_list handling to drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h.

As part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map an MCFG entry, pci_mmconfig_unmap_resource() to do the corresponding unmap, and pci_mmconfig_enabled() to see if the arch setup of MCFG entries was successful. We also provide weak implementations of these, which will be used on ARM64. On x86, we retain the old logic by providing a platform-specific implementation.

This patch purely rearranges code; it should not have any impact on the logic of MCFG parsing or list handling.
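
For readers less familiar with weak symbols, here is a minimal userspace sketch of the default-plus-override mechanism described above. It is not kernel code: the `sketch_` names are hypothetical, and `__attribute__((weak))` is the GCC/Clang spelling that the kernel's `__weak` expands to.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for the generic default: used only when no
 * other object file provides a non-weak definition of the same symbol.
 */
__attribute__((weak)) int sketch_mmconfig_enabled(void)
{
	return 1;	/* generic code assumes MMCONFIG is usable */
}

/*
 * Generic caller, modelled loosely on the first checks in
 * pci_mmconfig_insert(). It never needs to know whether the arch
 * replaced the weak default.
 */
int sketch_mmconfig_insert(unsigned int start_bus, unsigned int end_bus)
{
	if (!sketch_mmconfig_enabled())
		return -ENODEV;
	if (start_bus > end_bus)
		return -EINVAL;
	return 0;
}
```

If another object file linked into the same program defines a non-weak sketch_mmconfig_enabled(), the linker picks that definition instead, which is how x86 is meant to keep its pci_probe-based check while ARM64 falls back to the default.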
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
[Xen parts:]
Acked-by: David Vrabel <david.vrabel@citrix.com>
---
 arch/x86/include/asm/pci_x86.h |  24 +---
 arch/x86/pci/mmconfig-shared.c | 269 +++++------------------------------
 arch/x86/pci/mmconfig_32.c     |   1 +
 arch/x86/pci/mmconfig_64.c     |   1 +
 arch/x86/pci/numachip.c        |   1 +
 drivers/acpi/Makefile          |   1 +
 drivers/acpi/pci_mcfg.c        | 312 +++++++++++++++++++++++++++++++++++++++++
 drivers/xen/pci.c              |   5 +-
 include/linux/pci-acpi.h       |  33 +++++
 9 files changed, 386 insertions(+), 261 deletions(-)
 create mode 100644 drivers/acpi/pci_mcfg.c

diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index 46873fb..7824626 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -122,33 +122,11 @@ extern int pci_legacy_init(void); extern void pcibios_fixup_irqs(void);
/* pci-mmconfig.c */ - -/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ -#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2) - -struct pci_mmcfg_region { - struct list_head list; - struct resource res; - u64 address; - char __iomem *virt; - u16 segment; - u8 start_bus; - u8 end_bus; - char name[PCI_MMCFG_RESOURCE_NAME_LEN]; -}; - +struct pci_mmcfg_region; extern int __init pci_mmcfg_arch_init(void); extern void __init pci_mmcfg_arch_free(void); extern int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg); extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg); -extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, - phys_addr_t addr); -extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end); -extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus); - -extern struct list_head pci_mmcfg_list; - -#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
/* * On AMD Fam10h CPUs, all PCI MMIO configuration space accesses must use diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index dd30b7e..626710b 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -12,13 +12,12 @@
#include <linux/pci.h> #include <linux/init.h> -#include <linux/sfi_acpi.h> #include <linux/bitmap.h> -#include <linux/dmi.h> #include <linux/slab.h> #include <linux/mutex.h> #include <linux/rculist.h> #include <asm/e820.h> +#include <linux/pci-acpi.h> #include <asm/pci_x86.h> #include <asm/acpi.h>
@@ -27,9 +26,6 @@ /* Indicate if the mmcfg resources have been placed into the resource table. */ static bool pci_mmcfg_running_state; static bool pci_mmcfg_arch_init_failed; -static DEFINE_MUTEX(pci_mmcfg_lock); - -LIST_HEAD(pci_mmcfg_list);
static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg) { @@ -48,83 +44,6 @@ static void __init free_all_mmcfg(void) pci_mmconfig_remove(cfg); }
-static void list_add_sorted(struct pci_mmcfg_region *new) -{ - struct pci_mmcfg_region *cfg; - - /* keep list sorted by segment and starting bus number */ - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) { - if (cfg->segment > new->segment || - (cfg->segment == new->segment && - cfg->start_bus >= new->start_bus)) { - list_add_tail_rcu(&new->list, &cfg->list); - return; - } - } - list_add_tail_rcu(&new->list, &pci_mmcfg_list); -} - -static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start, - int end, u64 addr) -{ - struct pci_mmcfg_region *new; - struct resource *res; - - if (addr == 0) - return NULL; - - new = kzalloc(sizeof(*new), GFP_KERNEL); - if (!new) - return NULL; - - new->address = addr; - new->segment = segment; - new->start_bus = start; - new->end_bus = end; - - res = &new->res; - res->start = addr + PCI_MMCFG_BUS_OFFSET(start); - res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1; - res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; - snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN, - "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end); - res->name = new->name; - - return new; -} - -static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start, - int end, u64 addr) -{ - struct pci_mmcfg_region *new; - - new = pci_mmconfig_alloc(segment, start, end, addr); - if (new) { - mutex_lock(&pci_mmcfg_lock); - list_add_sorted(new); - mutex_unlock(&pci_mmcfg_lock); - - pr_info(PREFIX - "MMCONFIG for domain %04x [bus %02x-%02x] at %pR " - "(base %#lx)\n", - segment, start, end, &new->res, (unsigned long)addr); - } - - return new; -} - -struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus) -{ - struct pci_mmcfg_region *cfg; - - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) - if (cfg->segment == segment && - cfg->start_bus <= bus && bus <= cfg->end_bus) - return cfg; - - return NULL; -} - static const char *__init pci_mmcfg_e7520(void) { u32 win; @@ -543,73 +462,6 @@ static void __init 
pci_mmcfg_reject_broken(int early) } }
-static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, - struct acpi_mcfg_allocation *cfg) -{ - int year; - - if (cfg->address < 0xFFFFFFFF) - return 0; - - if (!strncmp(mcfg->header.oem_id, "SGI", 3)) - return 0; - - if (mcfg->header.revision >= 1) { - if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) && - year >= 2010) - return 0; - } - - pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx " - "is above 4GB, ignored\n", cfg->pci_segment, - cfg->start_bus_number, cfg->end_bus_number, cfg->address); - return -EINVAL; -} - -static int __init pci_parse_mcfg(struct acpi_table_header *header) -{ - struct acpi_table_mcfg *mcfg; - struct acpi_mcfg_allocation *cfg_table, *cfg; - unsigned long i; - int entries; - - if (!header) - return -EINVAL; - - mcfg = (struct acpi_table_mcfg *)header; - - /* how many config structures do we have */ - free_all_mmcfg(); - entries = 0; - i = header->length - sizeof(struct acpi_table_mcfg); - while (i >= sizeof(struct acpi_mcfg_allocation)) { - entries++; - i -= sizeof(struct acpi_mcfg_allocation); - } - if (entries == 0) { - pr_err(PREFIX "MMCONFIG has no entries\n"); - return -ENODEV; - } - - cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1]; - for (i = 0; i < entries; i++) { - cfg = &cfg_table[i]; - if (acpi_mcfg_check_entry(mcfg, cfg)) { - free_all_mmcfg(); - return -ENODEV; - } - - if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number, - cfg->end_bus_number, cfg->address) == NULL) { - pr_warn(PREFIX "no memory for MCFG entries\n"); - free_all_mmcfg(); - return -ENOMEM; - } - } - - return 0; -} - #ifdef CONFIG_ACPI_APEI extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size, void *data), void *data); @@ -662,13 +514,20 @@ static void __init __pci_mmcfg_init(int early)
 static int __initdata known_bridge;
 
+static void __init pci_mmcfg_list_setup(void)
+{
+	free_all_mmcfg();
+	if (pci_mmconfig_parse_table())
+		free_all_mmcfg();
+}
+
 void __init pci_mmcfg_early_init(void)
 {
 	if (pci_probe & PCI_PROBE_MMCONF) {
 		if (pci_mmcfg_check_hostbridge())
 			known_bridge = 1;
 		else
-			acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+			pci_mmcfg_list_setup();
 		__pci_mmcfg_init(1);
 
 		set_apei_filter();
@@ -686,7 +545,7 @@ void __init pci_mmcfg_late_init(void)
 
 	/* MMCONFIG hasn't been enabled yet, try again */
 	if (pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF) {
-		acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+		pci_mmcfg_list_setup();
 		__pci_mmcfg_init(0);
 	}
 }
@@ -720,99 +579,41 @@ static int __init pci_mmcfg_late_insert_resources(void)
  */
 late_initcall(pci_mmcfg_late_insert_resources);
 
-/* Add MMCFG information for host bridges */ -int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, - phys_addr_t addr) +int pci_mmconfig_map_resource(struct device *dev, struct pci_mmcfg_region *cfg) { - int rc; - struct resource *tmp = NULL; - struct pci_mmcfg_region *cfg; + struct resource *tmp;
- if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed) - return -ENODEV; - - if (start > end) - return -EINVAL; - - mutex_lock(&pci_mmcfg_lock); - cfg = pci_mmconfig_lookup(seg, start); - if (cfg) { - if (cfg->end_bus < end) - dev_info(dev, FW_INFO - "MMCONFIG for " - "domain %04x [bus %02x-%02x] " - "only partially covers this bridge\n", - cfg->segment, cfg->start_bus, cfg->end_bus); - mutex_unlock(&pci_mmcfg_lock); - return -EEXIST; - } - - if (!addr) { - mutex_unlock(&pci_mmcfg_lock); - return -EINVAL; - } - - rc = -EBUSY; - cfg = pci_mmconfig_alloc(seg, start, end, addr); - if (cfg == NULL) { - dev_warn(dev, "fail to add MMCONFIG (out of memory)\n"); - rc = -ENOMEM; - } else if (!pci_mmcfg_check_reserved(dev, cfg, 0)) { + if (!pci_mmcfg_check_reserved(dev, cfg, 0)) { dev_warn(dev, FW_BUG "MMCONFIG %pR isn't reserved\n", &cfg->res); - } else { - /* Insert resource if it's not in boot stage */ - if (pci_mmcfg_running_state) - tmp = insert_resource_conflict(&iomem_resource, - &cfg->res); - + return -EBUSY; + } + /* Insert resource if it's not in boot stage */ + if (pci_mmcfg_running_state) { + tmp = insert_resource_conflict(&iomem_resource, + &cfg->res); if (tmp) { - dev_warn(dev, - "MMCONFIG %pR conflicts with " - "%s %pR\n", - &cfg->res, tmp->name, tmp); - } else if (pci_mmcfg_arch_map(cfg)) { - dev_warn(dev, "fail to map MMCONFIG %pR.\n", - &cfg->res); - } else { - list_add_sorted(cfg); - dev_info(dev, "MMCONFIG at %pR (base %#lx)\n", - &cfg->res, (unsigned long)addr); - cfg = NULL; - rc = 0; + dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n", + &cfg->res, tmp->name, tmp); + return -EBUSY; } } - - if (cfg) { - if (cfg->res.parent) - release_resource(&cfg->res); - kfree(cfg); + if (pci_mmcfg_arch_map(cfg)) { + dev_warn(dev, "fail to map MMCONFIG %pR.\n", &cfg->res); + return -EBUSY; } - - mutex_unlock(&pci_mmcfg_lock); - - return rc; + return 0; }
-/* Delete MMCFG information for host bridges */ -int pci_mmconfig_delete(u16 seg, u8 start, u8 end) +void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *cfg) { - struct pci_mmcfg_region *cfg; - - mutex_lock(&pci_mmcfg_lock); - list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) - if (cfg->segment == seg && cfg->start_bus == start && - cfg->end_bus == end) { - list_del_rcu(&cfg->list); - synchronize_rcu(); - pci_mmcfg_arch_unmap(cfg); - if (cfg->res.parent) - release_resource(&cfg->res); - mutex_unlock(&pci_mmcfg_lock); - kfree(cfg); - return 0; - } - mutex_unlock(&pci_mmcfg_lock); + pci_mmcfg_arch_unmap(cfg); + if (cfg->res.parent) + release_resource(&cfg->res); + cfg->res.parent = NULL; +}
-	return -ENOENT;
+int pci_mmconfig_enabled(void)
+{
+	return (pci_probe & PCI_PROBE_MMCONF) && !pci_mmcfg_arch_init_failed;
 }
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 43984bc..38a37f8 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -12,6 +12,7 @@
 #include <linux/pci.h>
 #include <linux/init.h>
 #include <linux/rcupdate.h>
+#include <linux/pci-acpi.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
 
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index bea5249..29253ec 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -10,6 +10,7 @@
 #include <linux/acpi.h>
 #include <linux/bitmap.h>
 #include <linux/rcupdate.h>
+#include <linux/pci-acpi.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
 
diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c
index 2e565e6..c181eeb 100644
--- a/arch/x86/pci/numachip.c
+++ b/arch/x86/pci/numachip.c
@@ -14,6 +14,7 @@
  */
 
 #include <linux/pci.h>
+#include <linux/pci-acpi.h>
 #include <asm/pci_x86.h>
 
static u8 limit __read_mostly; diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 7ea903d..e5e4393 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -40,6 +40,7 @@ acpi-$(CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC) += processor_pdc.o acpi-y += ec.o acpi-$(CONFIG_ACPI_DOCK) += dock.o acpi-y += pci_root.o pci_link.o pci_irq.o +acpi-$(CONFIG_PCI_MMCONFIG) += pci_mcfg.o acpi-y += acpi_lpss.o acpi_apd.o acpi-y += acpi_platform.o acpi-y += acpi_pnp.o diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c new file mode 100644 index 0000000..ea84365 --- /dev/null +++ b/drivers/acpi/pci_mcfg.c @@ -0,0 +1,312 @@ +/* + * pci_mcfg.c + * + * Common code to maintain the MCFG areas and mappings + * + * This has been extracted from arch/x86/pci/mmconfig-shared.c + * and moved here so that other architectures can use this code. + */ + +#include <linux/pci.h> +#include <linux/init.h> +#include <linux/dmi.h> +#include <linux/pci-acpi.h> +#include <linux/sfi_acpi.h> +#include <linux/slab.h> +#include <linux/mutex.h> +#include <linux/rculist.h> + +#define PREFIX "ACPI: " + +static DEFINE_MUTEX(pci_mmcfg_lock); +LIST_HEAD(pci_mmcfg_list); + +static void list_add_sorted(struct pci_mmcfg_region *new) +{ + struct pci_mmcfg_region *cfg; + + /* keep list sorted by segment and starting bus number */ + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) { + if (cfg->segment > new->segment || + (cfg->segment == new->segment && + cfg->start_bus >= new->start_bus)) { + list_add_tail_rcu(&new->list, &cfg->list); + return; + } + } + list_add_tail_rcu(&new->list, &pci_mmcfg_list); +} + +static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start, + int end, u64 addr) +{ + struct pci_mmcfg_region *new; + struct resource *res; + + if (addr == 0) + return NULL; + + new = kzalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return NULL; + + new->address = addr; + new->segment = segment; + new->start_bus = start; + new->end_bus = end; + + res = &new->res; + res->start 
= addr + PCI_MMCFG_BUS_OFFSET(start); + res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1; + res->flags = IORESOURCE_MEM | IORESOURCE_BUSY; + snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN, + "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end); + res->name = new->name; + + return new; +} + +struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start, + int end, u64 addr) +{ + struct pci_mmcfg_region *new; + + new = pci_mmconfig_alloc(segment, start, end, addr); + if (new) { + mutex_lock(&pci_mmcfg_lock); + list_add_sorted(new); + mutex_unlock(&pci_mmcfg_lock); + + pr_info(PREFIX + "MMCONFIG for domain %04x [bus %02x-%02x] at %pR " + "(base %#lx)\n", + segment, start, end, &new->res, (unsigned long)addr); + } + + return new; +} + +struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus) +{ + struct pci_mmcfg_region *cfg; + + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) + if (cfg->segment == segment && + cfg->start_bus <= bus && bus <= cfg->end_bus) + return cfg; + + return NULL; +} + +/* + * Map a pci_mmcfg_region, can be overrriden by arch + */ +int __weak pci_mmconfig_map_resource(struct device *dev, + struct pci_mmcfg_region *mcfg) +{ + struct resource *tmp; + void __iomem *vaddr; + + tmp = insert_resource_conflict(&iomem_resource, &mcfg->res); + if (tmp) { + dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n", + &mcfg->res, tmp->name, tmp); + return -EBUSY; + } + + vaddr = ioremap(mcfg->res.start, resource_size(&mcfg->res)); + if (!vaddr) { + release_resource(&mcfg->res); + return -ENOMEM; + } + + mcfg->virt = vaddr; + return 0; +} + +/* + * Unmap a pci_mmcfg_region, can be overrriden by arch + */ +void __weak pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg) +{ + if (mcfg->virt) { + iounmap(mcfg->virt); + mcfg->virt = NULL; + } + if (mcfg->res.parent) { + release_resource(&mcfg->res); + mcfg->res.parent = NULL; + } +} + +/* + * check if the mmconfig is enabled and configured + */ +int __weak pci_mmconfig_enabled(void) +{ 
+ return 1; +} + +/* Add MMCFG information for host bridges */ +int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, + phys_addr_t addr) +{ + struct pci_mmcfg_region *cfg; + int rc; + + if (!pci_mmconfig_enabled()) + return -ENODEV; + if (start > end) + return -EINVAL; + + mutex_lock(&pci_mmcfg_lock); + cfg = pci_mmconfig_lookup(seg, start); + if (cfg) { + if (cfg->end_bus < end) + dev_info(dev, FW_INFO + "MMCONFIG for " + "domain %04x [bus %02x-%02x] " + "only partially covers this bridge\n", + cfg->segment, cfg->start_bus, cfg->end_bus); + rc = -EEXIST; + goto err; + } + + if (!addr) { + rc = -EINVAL; + goto err; + } + + cfg = pci_mmconfig_alloc(seg, start, end, addr); + if (cfg == NULL) { + dev_warn(dev, "fail to add MMCONFIG (out of memory)\n"); + rc = -ENOMEM; + goto err; + } + rc = pci_mmconfig_map_resource(dev, cfg); + if (!rc) { + list_add_sorted(cfg); + dev_info(dev, "MMCONFIG at %pR (base %#lx)\n", + &cfg->res, (unsigned long)addr); + return 0; + } else { + if (cfg->res.parent) + release_resource(&cfg->res); + kfree(cfg); + } + +err: + mutex_unlock(&pci_mmcfg_lock); + return rc; +} + +/* Delete MMCFG information for host bridges */ +int pci_mmconfig_delete(u16 seg, u8 start, u8 end) +{ + struct pci_mmcfg_region *cfg; + + mutex_lock(&pci_mmcfg_lock); + list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) + if (cfg->segment == seg && cfg->start_bus == start && + cfg->end_bus == end) { + list_del_rcu(&cfg->list); + synchronize_rcu(); + pci_mmconfig_unmap_resource(cfg); + mutex_unlock(&pci_mmcfg_lock); + kfree(cfg); + return 0; + } + mutex_unlock(&pci_mmcfg_lock); + + return -ENOENT; +} + +static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, + struct acpi_mcfg_allocation *cfg) +{ + int year; + + if (!config_enabled(CONFIG_X86)) + return 0; + + if (cfg->address < 0xFFFFFFFF) + return 0; + + if (!strncmp(mcfg->header.oem_id, "SGI", 3)) + return 0; + + if (mcfg->header.revision >= 1) { + if (dmi_get_date(DMI_BIOS_DATE, &year, 
NULL, NULL) && + year >= 2010) + return 0; + } + + pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx " + "is above 4GB, ignored\n", cfg->pci_segment, + cfg->start_bus_number, cfg->end_bus_number, cfg->address); + return -EINVAL; +} + +static int __init pci_parse_mcfg(struct acpi_table_header *header) +{ + struct acpi_table_mcfg *mcfg; + struct acpi_mcfg_allocation *cfg_table, *cfg; + unsigned long i; + int entries; + + if (!header) + return -EINVAL; + + mcfg = (struct acpi_table_mcfg *)header; + + /* how many config structures do we have */ + entries = 0; + i = header->length - sizeof(struct acpi_table_mcfg); + while (i >= sizeof(struct acpi_mcfg_allocation)) { + entries++; + i -= sizeof(struct acpi_mcfg_allocation); + } + if (entries == 0) { + pr_err(PREFIX "MMCONFIG has no entries\n"); + return -ENODEV; + } + + cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1]; + for (i = 0; i < entries; i++) { + cfg = &cfg_table[i]; + if (acpi_mcfg_check_entry(mcfg, cfg)) + return -ENODEV; + + if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number, + cfg->end_bus_number, cfg->address) == NULL) { + pr_warn(PREFIX "no memory for MCFG entries\n"); + return -ENOMEM; + } + } + + return 0; +} + +int __init pci_mmconfig_parse_table(void) +{ + return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg); +} + +void __weak __init pci_mmcfg_late_init(void) +{ + int err, n = 0; + struct pci_mmcfg_region *cfg; + + err = pci_mmconfig_parse_table(); + if (err) { + pr_err(PREFIX " Failed to parse MCFG (%d)\n", err); + return; + } + + list_for_each_entry(cfg, &pci_mmcfg_list, list) { + pci_mmconfig_map_resource(NULL, cfg); + n++; + } + + pr_info(PREFIX " MCFG table loaded %d entries\n", n); +} diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c index 7494dbe..97aa9d3 100644 --- a/drivers/xen/pci.c +++ b/drivers/xen/pci.c @@ -27,9 +27,6 @@ #include <asm/xen/hypervisor.h> #include <asm/xen/hypercall.h> #include "../pci/pci.h" -#ifdef CONFIG_PCI_MMCONFIG -#include <asm/pci_x86.h> 
-#endif
 
 static bool __read_mostly pci_seg_supported = true;
 
@@ -221,7 +218,7 @@ static int __init xen_mcfg_late(void)
 	if (!xen_initial_domain())
 		return 0;
 
-	if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+	if (!pci_mmconfig_enabled())
 		return 0;
 
if (list_empty(&pci_mmcfg_list)) diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index 89ab057..e9450ef 100644 --- a/include/linux/pci-acpi.h +++ b/include/linux/pci-acpi.h @@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[]; #define RESET_DELAY_DSM 0x08 #define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */ +/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ +#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2) + +struct pci_mmcfg_region { + struct list_head list; + struct resource res; + u64 address; + char __iomem *virt; + u16 segment; + u8 start_bus; + u8 end_bus; + char name[PCI_MMCFG_RESOURCE_NAME_LEN]; +}; + +extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, + phys_addr_t addr); +extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end); + +extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus); +extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start, + int end, u64 addr); +extern int pci_mmconfig_map_resource(struct device *dev, + struct pci_mmcfg_region *mcfg); +extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg); +extern int pci_mmconfig_enabled(void); +extern int __init pci_mmconfig_parse_table(void); + +extern struct list_head pci_mmcfg_list; + +#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20) +#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12) + #else /* CONFIG_ACPI */ static inline void acpi_pci_add_bus(struct pci_bus *bus) { } static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
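
As a worked example of the two offset macros added to linux/pci-acpi.h above, the following userspace sketch reproduces the same shifts with fixed-width types. The `SKETCH_`/`sketch_` names and the helper functions are hypothetical and only illustrate the arithmetic: each bus gets 1 MiB of config space (shift by 20) and each device/function gets 4 KiB (shift by 12).

```c
#include <stdint.h>

/* Same shifts as PCI_MMCFG_BUS_OFFSET() and PCI_MMCFG_OFFSET() in the patch. */
#define SKETCH_BUS_OFFSET(bus)		((uint64_t)(bus) << 20)
#define SKETCH_OFFSET(bus, devfn)	((uint64_t)(bus) << 20 | (uint64_t)(devfn) << 12)

/* devfn packs a 5-bit device number and a 3-bit function number. */
static inline uint8_t sketch_devfn(uint8_t dev, uint8_t fn)
{
	return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Byte offset of a config register relative to the entry's base address. */
static inline uint64_t sketch_cfg_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
	return SKETCH_OFFSET(bus, devfn) | (uint64_t)(reg & 0xfff);
}

/*
 * Size of the window that pci_mmconfig_alloc() describes, i.e.
 * res->end - res->start + 1 for the bus range [start_bus, end_bus].
 */
static inline uint64_t sketch_region_size(uint8_t start_bus, uint8_t end_bus)
{
	return SKETCH_BUS_OFFSET((uint64_t)end_bus + 1) - SKETCH_BUS_OFFSET(start_bus);
}
```

For instance, bus 1, device 2, function 3, register 0x10 lands at offset 0x113010 from the base address, and a full [bus 00-ff] entry covers 256 MiB.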
On Tue, Feb 16, 2016 at 02:53:31PM +0100, Tomasz Nowicki wrote:
> From: Jayachandran C <jchandra@broadcom.com>
>
> Move pci_mmcfg_list handling to drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h.
>
> As part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map an MCFG entry, pci_mmconfig_unmap_resource() to do the corresponding unmap, and pci_mmconfig_enabled() to see if the arch setup of MCFG entries was successful. We also provide weak implementations of these, which will be used on ARM64. On x86, we retain the old logic by providing a platform-specific implementation.
>
> This patch purely rearranges code; it should not have any impact on the logic of MCFG parsing or list handling.
>
> Signed-off-by: Jayachandran C <jchandra@broadcom.com>
> [Xen parts:]
> Acked-by: David Vrabel <david.vrabel@citrix.com>
> ---
>  arch/x86/include/asm/pci_x86.h |  24 +---
>  arch/x86/pci/mmconfig-shared.c | 269 +++++------------------------------
>  arch/x86/pci/mmconfig_32.c     |   1 +
>  arch/x86/pci/mmconfig_64.c     |   1 +
>  arch/x86/pci/numachip.c        |   1 +
>  drivers/acpi/Makefile          |   1 +
>  drivers/acpi/pci_mcfg.c        | 312 +++++++++++++++++++++++++++++++++++++++++
>  drivers/xen/pci.c              |   5 +-
>  include/linux/pci-acpi.h       |  33 +++++
>  9 files changed, 386 insertions(+), 261 deletions(-)
>  create mode 100644 drivers/acpi/pci_mcfg.c

This patch makes perfect sense to me and manages to move MCFG handling to common code in a seamless manner. It is basically a code move, with weak functions to cater for X86-specific legacy bits which are otherwise pretty complex to untangle, so (apart from a few nits below):

Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

[...]
> diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
> new file mode 100644
> index 0000000..ea84365
> --- /dev/null
> +++ b/drivers/acpi/pci_mcfg.c
> @@ -0,0 +1,312 @@
> +/*
> + * pci_mcfg.c
> + *
> + * Common code to maintain the MCFG areas and mappings
> + *
> + * This has been extracted from arch/x86/pci/mmconfig-shared.c
> + * and moved here so that other architectures can use this code.
> + */
> +
> +#include <linux/pci.h>
> +#include <linux/init.h>
> +#include <linux/dmi.h>
> +#include <linux/pci-acpi.h>
> +#include <linux/sfi_acpi.h>
> +#include <linux/slab.h>
> +#include <linux/mutex.h>
> +#include <linux/rculist.h>

Nit: while at it order them alphabetically.

> +#define PREFIX "ACPI: "
> +
> +static DEFINE_MUTEX(pci_mmcfg_lock);
> +LIST_HEAD(pci_mmcfg_list);
> +
> +static void list_add_sorted(struct pci_mmcfg_region *new)
> +{
> +	struct pci_mmcfg_region *cfg;
> +
> +	/* keep list sorted by segment and starting bus number */
> +	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
> +		if (cfg->segment > new->segment ||
> +		    (cfg->segment == new->segment &&
> +		     cfg->start_bus >= new->start_bus)) {
> +			list_add_tail_rcu(&new->list, &cfg->list);
> +			return;
> +		}
> +	}
> +	list_add_tail_rcu(&new->list, &pci_mmcfg_list);
> +}
> +
> +static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
> +						    int end, u64 addr)
> +{
> +	struct pci_mmcfg_region *new;
> +	struct resource *res;
> +
> +	if (addr == 0)
> +		return NULL;
> +
> +	new = kzalloc(sizeof(*new), GFP_KERNEL);
> +	if (!new)
> +		return NULL;
> +
> +	new->address = addr;
> +	new->segment = segment;
> +	new->start_bus = start;
> +	new->end_bus = end;
> +
> +	res = &new->res;
> +	res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
> +	res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
> +	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
> +	snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
> +		 "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
> +	res->name = new->name;
> +
> +	return new;
> +}
> +
> +struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
> +					   int end, u64 addr)
> +{
> +	struct pci_mmcfg_region *new;
> +
> +	new = pci_mmconfig_alloc(segment, start, end, addr);
> +	if (new) {
> +		mutex_lock(&pci_mmcfg_lock);
> +		list_add_sorted(new);
> +		mutex_unlock(&pci_mmcfg_lock);
> +
> +		pr_info(PREFIX
> +		       "MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
> +		       "(base %#lx)\n",
> +		       segment, start, end, &new->res, (unsigned long)addr);
> +	}
> +
> +	return new;
> +}
> +
> +struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
> +{
> +	struct pci_mmcfg_region *cfg;
> +
> +	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
> +		if (cfg->segment == segment &&
> +		    cfg->start_bus <= bus && bus <= cfg->end_bus)
> +			return cfg;
> +
> +	return NULL;
> +}
> +
> +/*
> + * Map a pci_mmcfg_region, can be overrriden by arch

s/overrriden/overridden/
[...]
> +static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
> +					struct acpi_mcfg_allocation *cfg)
> +{
> +	int year;
> +
> +	if (!config_enabled(CONFIG_X86))
> +		return 0;

This check in generic code may ruffle someone's feathers. I even think we can run this function safely on ARM64, but to prevent surprises we'd better keep the X86 check; alternatives like adding a weak function just for a quirk do not make much sense to me.
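
To make the decision order under discussion concrete, here is a simplified userspace model of the check. It is only a sketch: the struct and the `sketch_` names are hypothetical, the X86 test is passed in as a flag rather than being a compile-time config check, and the BIOS year is assumed to have been looked up already (0 meaning no date was found).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the fields the check looks at. */
struct sketch_mcfg_entry {
	uint64_t address;	/* base address from the MCFG allocation */
	char oem_id[6];		/* OEM ID from the table header */
	uint8_t revision;	/* MCFG table revision */
	int bios_year;		/* BIOS year, 0 if no date was found */
};

/* Returns 0 if the entry is accepted, -1 if it is rejected. */
int sketch_check_entry(bool is_x86, const struct sketch_mcfg_entry *e)
{
	if (!is_x86)
		return 0;	/* the quirk is applied on x86 only */
	if (e->address < 0xFFFFFFFFULL)
		return 0;	/* region below 4GB is always accepted */
	if (!strncmp(e->oem_id, "SGI", 3))
		return 0;	/* SGI firmware is trusted above 4GB */
	if (e->revision >= 1 && e->bios_year >= 2010)
		return 0;	/* newer table revision with a recent BIOS */
	return -1;		/* above 4GB on old firmware: ignored */
}
```

With this ordering, an entry above 4GB is never rejected on a non-x86 build, which is the "no surprises" behaviour the X86 guard is there to preserve.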
Lorenzo
- if (cfg->address < 0xFFFFFFFF)
return 0;
- if (!strncmp(mcfg->header.oem_id, "SGI", 3))
return 0;
- if (mcfg->header.revision >= 1) {
if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
year >= 2010)
return 0;
- }
- pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx "
"is above 4GB, ignored\n", cfg->pci_segment,
cfg->start_bus_number, cfg->end_bus_number, cfg->address);
- return -EINVAL;
+}
+static int __init pci_parse_mcfg(struct acpi_table_header *header) +{
- struct acpi_table_mcfg *mcfg;
- struct acpi_mcfg_allocation *cfg_table, *cfg;
- unsigned long i;
- int entries;
- if (!header)
return -EINVAL;
- mcfg = (struct acpi_table_mcfg *)header;
- /* how many config structures do we have */
- entries = 0;
- i = header->length - sizeof(struct acpi_table_mcfg);
- while (i >= sizeof(struct acpi_mcfg_allocation)) {
entries++;
i -= sizeof(struct acpi_mcfg_allocation);
- }
- if (entries == 0) {
pr_err(PREFIX "MMCONFIG has no entries\n");
return -ENODEV;
- }
- cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
- for (i = 0; i < entries; i++) {
cfg = &cfg_table[i];
if (acpi_mcfg_check_entry(mcfg, cfg))
return -ENODEV;
if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
cfg->end_bus_number, cfg->address) == NULL) {
pr_warn(PREFIX "no memory for MCFG entries\n");
return -ENOMEM;
}
- }
- return 0;
+}
+int __init pci_mmconfig_parse_table(void) +{
- return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
+void __weak __init pci_mmcfg_late_init(void) +{
- int err, n = 0;
- struct pci_mmcfg_region *cfg;
- err = pci_mmconfig_parse_table();
- if (err) {
pr_err(PREFIX " Failed to parse MCFG (%d)\n", err);
return;
- }
- list_for_each_entry(cfg, &pci_mmcfg_list, list) {
pci_mmconfig_map_resource(NULL, cfg);
n++;
- }
- pr_info(PREFIX " MCFG table loaded %d entries\n", n);
+}
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 7494dbe..97aa9d3 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -27,9 +27,6 @@
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 #include "../pci/pci.h"
-#ifdef CONFIG_PCI_MMCONFIG
-#include <asm/pci_x86.h>
-#endif

 static bool __read_mostly pci_seg_supported = true;

@@ -221,7 +218,7 @@ static int __init xen_mcfg_late(void)
 	if (!xen_initial_domain())
 		return 0;

-	if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+	if (!pci_mmconfig_enabled())
 		return 0;
if (list_empty(&pci_mmcfg_list)) diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index 89ab057..e9450ef 100644 --- a/include/linux/pci-acpi.h +++ b/include/linux/pci-acpi.h @@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[]; #define RESET_DELAY_DSM 0x08 #define FUNCTION_DELAY_DSM 0x09 +/* common API to maintain list of MCFG regions */ +/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ +#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+struct pci_mmcfg_region {
- struct list_head list;
- struct resource res;
- u64 address;
- char __iomem *virt;
- u16 segment;
- u8 start_bus;
- u8 end_bus;
- char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus); +extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
- struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg); +extern int pci_mmconfig_enabled(void); +extern int __init pci_mmconfig_parse_table(void);
+extern struct list_head pci_mmcfg_list;
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20) +#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
#else /* CONFIG_ACPI */ static inline void acpi_pci_add_bus(struct pci_bus *bus) { } static inline void acpi_pci_remove_bus(struct pci_bus *bus) { } -- 1.9.1
Hi Tomasz
On 2016/2/16 21:53, Tomasz Nowicki wrote:
From: Jayachandran C jchandra@broadcom.com
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map a mcfg entry, pci_mmconfig_unmap_resource to do the corresponding unmap and pci_mmconfig_enabled to see if the arch setup of mcfg entries was successful. We also provide weak implementations of these, which will be used from ARM64. On x86, we retain the old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any impact on the logic of MCFG parsing or list handling.
Signed-off-by: Jayachandran C jchandra@broadcom.com [Xen parts:] Acked-by: David Vrabel david.vrabel@citrix.com
arch/x86/include/asm/pci_x86.h | 24 +--- arch/x86/pci/mmconfig-shared.c | 269 +++++------------------------------ arch/x86/pci/mmconfig_32.c | 1 + arch/x86/pci/mmconfig_64.c | 1 + arch/x86/pci/numachip.c | 1 + drivers/acpi/Makefile | 1 + drivers/acpi/pci_mcfg.c | 312 +++++++++++++++++++++++++++++++++++++++++ drivers/xen/pci.c | 5 +- include/linux/pci-acpi.h | 33 +++++ 9 files changed, 386 insertions(+), 261 deletions(-) create mode 100644 drivers/acpi/pci_mcfg.c
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index 46873fb..7824626 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -122,33 +122,11 @@ extern int pci_legacy_init(void); extern void pcibios_fixup_irqs(void);
/* pci-mmconfig.c */
-/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ -#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
-struct pci_mmcfg_region {
- struct list_head list;
- struct resource res;
- u64 address;
- char __iomem *virt;
- u16 segment;
- u8 start_bus;
- u8 end_bus;
- char name[PCI_MMCFG_RESOURCE_NAME_LEN];
-};
+struct pci_mmcfg_region; extern int __init pci_mmcfg_arch_init(void); extern void __init pci_mmcfg_arch_free(void); extern int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg); extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg); -extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr);
-extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end); -extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
-extern struct list_head pci_mmcfg_list;
-#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
/*
- On AMD Fam10h CPUs, all PCI MMIO configuration space accesses must use
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c index dd30b7e..626710b 100644 --- a/arch/x86/pci/mmconfig-shared.c +++ b/arch/x86/pci/mmconfig-shared.c @@ -12,13 +12,12 @@
#include <linux/pci.h> #include <linux/init.h> -#include <linux/sfi_acpi.h> #include <linux/bitmap.h> -#include <linux/dmi.h> #include <linux/slab.h> #include <linux/mutex.h> #include <linux/rculist.h> #include <asm/e820.h> +#include <linux/pci-acpi.h> #include <asm/pci_x86.h> #include <asm/acpi.h>
@@ -27,9 +26,6 @@ /* Indicate if the mmcfg resources have been placed into the resource table. */ static bool pci_mmcfg_running_state; static bool pci_mmcfg_arch_init_failed; -static DEFINE_MUTEX(pci_mmcfg_lock);
-LIST_HEAD(pci_mmcfg_list);
static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg) { @@ -48,83 +44,6 @@ static void __init free_all_mmcfg(void) pci_mmconfig_remove(cfg); }
-static void list_add_sorted(struct pci_mmcfg_region *new) -{
- struct pci_mmcfg_region *cfg;
- /* keep list sorted by segment and starting bus number */
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
if (cfg->segment > new->segment ||
(cfg->segment == new->segment &&
cfg->start_bus >= new->start_bus)) {
list_add_tail_rcu(&new->list, &cfg->list);
return;
}
- }
- list_add_tail_rcu(&new->list, &pci_mmcfg_list);
-}
-static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
int end, u64 addr)
-{
- struct pci_mmcfg_region *new;
- struct resource *res;
- if (addr == 0)
return NULL;
- new = kzalloc(sizeof(*new), GFP_KERNEL);
- if (!new)
return NULL;
- new->address = addr;
- new->segment = segment;
- new->start_bus = start;
- new->end_bus = end;
- res = &new->res;
- res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
- res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
- res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
"PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
- res->name = new->name;
- return new;
-}
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
int end, u64 addr)
-{
- struct pci_mmcfg_region *new;
- new = pci_mmconfig_alloc(segment, start, end, addr);
- if (new) {
mutex_lock(&pci_mmcfg_lock);
list_add_sorted(new);
mutex_unlock(&pci_mmcfg_lock);
pr_info(PREFIX
"MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
"(base %#lx)\n",
segment, start, end, &new->res, (unsigned long)addr);
- }
- return new;
-}
-struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus) -{
- struct pci_mmcfg_region *cfg;
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
if (cfg->segment == segment &&
cfg->start_bus <= bus && bus <= cfg->end_bus)
return cfg;
- return NULL;
-}
- static const char *__init pci_mmcfg_e7520(void) { u32 win;
@@ -543,73 +462,6 @@ static void __init pci_mmcfg_reject_broken(int early) } }
-static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
struct acpi_mcfg_allocation *cfg)
-{
- int year;
- if (cfg->address < 0xFFFFFFFF)
return 0;
- if (!strncmp(mcfg->header.oem_id, "SGI", 3))
return 0;
- if (mcfg->header.revision >= 1) {
if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
year >= 2010)
return 0;
- }
- pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx "
"is above 4GB, ignored\n", cfg->pci_segment,
cfg->start_bus_number, cfg->end_bus_number, cfg->address);
- return -EINVAL;
-}
-static int __init pci_parse_mcfg(struct acpi_table_header *header) -{
- struct acpi_table_mcfg *mcfg;
- struct acpi_mcfg_allocation *cfg_table, *cfg;
- unsigned long i;
- int entries;
- if (!header)
return -EINVAL;
- mcfg = (struct acpi_table_mcfg *)header;
- /* how many config structures do we have */
- free_all_mmcfg();
- entries = 0;
- i = header->length - sizeof(struct acpi_table_mcfg);
- while (i >= sizeof(struct acpi_mcfg_allocation)) {
entries++;
i -= sizeof(struct acpi_mcfg_allocation);
- }
- if (entries == 0) {
pr_err(PREFIX "MMCONFIG has no entries\n");
return -ENODEV;
- }
- cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
- for (i = 0; i < entries; i++) {
cfg = &cfg_table[i];
if (acpi_mcfg_check_entry(mcfg, cfg)) {
free_all_mmcfg();
return -ENODEV;
}
if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
cfg->end_bus_number, cfg->address) == NULL) {
pr_warn(PREFIX "no memory for MCFG entries\n");
free_all_mmcfg();
return -ENOMEM;
}
- }
- return 0;
-}
- #ifdef CONFIG_ACPI_APEI extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size, void *data), void *data);
@@ -662,13 +514,20 @@ static void __init __pci_mmcfg_init(int early)

 static int __initdata known_bridge;

+static void __init pci_mmcfg_list_setup(void)
+{
+	free_all_mmcfg();
+	if (pci_mmconfig_parse_table())
+		free_all_mmcfg();
+}
+
 void __init pci_mmcfg_early_init(void)
 {
 	if (pci_probe & PCI_PROBE_MMCONF) {
 		if (pci_mmcfg_check_hostbridge())
 			known_bridge = 1;
 		else
-			acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+			pci_mmcfg_list_setup();
 		__pci_mmcfg_init(1);

 		set_apei_filter();
@@ -686,7 +545,7 @@ void __init pci_mmcfg_late_init(void)

 	/* MMCONFIG hasn't been enabled yet, try again */
 	if (pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF) {
-		acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+		pci_mmcfg_list_setup();
 		__pci_mmcfg_init(0);
 	}
 }
@@ -720,99 +579,41 @@ static int __init pci_mmcfg_late_insert_resources(void) */ late_initcall(pci_mmcfg_late_insert_resources);
-/* Add MMCFG information for host bridges */ -int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr)
+int pci_mmconfig_map_resource(struct device *dev, struct pci_mmcfg_region *cfg) {
- int rc;
- struct resource *tmp = NULL;
- struct pci_mmcfg_region *cfg;
- struct resource *tmp;
- if (!(pci_probe & PCI_PROBE_MMCONF) || pci_mmcfg_arch_init_failed)
return -ENODEV;
- if (start > end)
return -EINVAL;
- mutex_lock(&pci_mmcfg_lock);
- cfg = pci_mmconfig_lookup(seg, start);
- if (cfg) {
if (cfg->end_bus < end)
dev_info(dev, FW_INFO
"MMCONFIG for "
"domain %04x [bus %02x-%02x] "
"only partially covers this bridge\n",
cfg->segment, cfg->start_bus, cfg->end_bus);
mutex_unlock(&pci_mmcfg_lock);
return -EEXIST;
- }
- if (!addr) {
mutex_unlock(&pci_mmcfg_lock);
return -EINVAL;
- }
- rc = -EBUSY;
- cfg = pci_mmconfig_alloc(seg, start, end, addr);
- if (cfg == NULL) {
dev_warn(dev, "fail to add MMCONFIG (out of memory)\n");
rc = -ENOMEM;
- } else if (!pci_mmcfg_check_reserved(dev, cfg, 0)) {
- if (!pci_mmcfg_check_reserved(dev, cfg, 0)) { dev_warn(dev, FW_BUG "MMCONFIG %pR isn't reserved\n", &cfg->res);
- } else {
/* Insert resource if it's not in boot stage */
if (pci_mmcfg_running_state)
tmp = insert_resource_conflict(&iomem_resource,
&cfg->res);
return -EBUSY;
- }
- /* Insert resource if it's not in boot stage */
- if (pci_mmcfg_running_state) {
tmp = insert_resource_conflict(&iomem_resource,
if (tmp) {&cfg->res);
dev_warn(dev,
"MMCONFIG %pR conflicts with "
"%s %pR\n",
&cfg->res, tmp->name, tmp);
} else if (pci_mmcfg_arch_map(cfg)) {
dev_warn(dev, "fail to map MMCONFIG %pR.\n",
&cfg->res);
} else {
list_add_sorted(cfg);
dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
&cfg->res, (unsigned long)addr);
cfg = NULL;
rc = 0;
dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
&cfg->res, tmp->name, tmp);
} }return -EBUSY;
- if (cfg) {
if (cfg->res.parent)
release_resource(&cfg->res);
kfree(cfg);
- if (pci_mmcfg_arch_map(cfg)) {
dev_warn(dev, "fail to map MMCONFIG %pR.\n", &cfg->res);
}return -EBUSY;
- mutex_unlock(&pci_mmcfg_lock);
- return rc;
- return 0; }
-/* Delete MMCFG information for host bridges */ -int pci_mmconfig_delete(u16 seg, u8 start, u8 end) +void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *cfg) {
- struct pci_mmcfg_region *cfg;
- mutex_lock(&pci_mmcfg_lock);
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
if (cfg->segment == seg && cfg->start_bus == start &&
cfg->end_bus == end) {
list_del_rcu(&cfg->list);
synchronize_rcu();
pci_mmcfg_arch_unmap(cfg);
if (cfg->res.parent)
release_resource(&cfg->res);
mutex_unlock(&pci_mmcfg_lock);
kfree(cfg);
return 0;
}
- mutex_unlock(&pci_mmcfg_lock);
- pci_mmcfg_arch_unmap(cfg);
- if (cfg->res.parent)
release_resource(&cfg->res);
- cfg->res.parent = NULL;
+}
- return -ENOENT;
+int pci_mmconfig_enabled(void) +{
- return (pci_probe & PCI_PROBE_MMCONF) && !pci_mmcfg_arch_init_failed; }
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c index 43984bc..38a37f8 100644 --- a/arch/x86/pci/mmconfig_32.c +++ b/arch/x86/pci/mmconfig_32.c @@ -12,6 +12,7 @@ #include <linux/pci.h> #include <linux/init.h> #include <linux/rcupdate.h> +#include <linux/pci-acpi.h> #include <asm/e820.h> #include <asm/pci_x86.h>
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c index bea5249..29253ec 100644 --- a/arch/x86/pci/mmconfig_64.c +++ b/arch/x86/pci/mmconfig_64.c @@ -10,6 +10,7 @@ #include <linux/acpi.h> #include <linux/bitmap.h> #include <linux/rcupdate.h> +#include <linux/pci-acpi.h> #include <asm/e820.h> #include <asm/pci_x86.h>
diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c index 2e565e6..c181eeb 100644 --- a/arch/x86/pci/numachip.c +++ b/arch/x86/pci/numachip.c @@ -14,6 +14,7 @@ */
#include <linux/pci.h> +#include <linux/pci-acpi.h> #include <asm/pci_x86.h>
static u8 limit __read_mostly; diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index 7ea903d..e5e4393 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -40,6 +40,7 @@ acpi-$(CONFIG_ARCH_MIGHT_HAVE_ACPI_PDC) += processor_pdc.o acpi-y += ec.o acpi-$(CONFIG_ACPI_DOCK) += dock.o acpi-y += pci_root.o pci_link.o pci_irq.o +acpi-$(CONFIG_PCI_MMCONFIG) += pci_mcfg.o acpi-y += acpi_lpss.o acpi_apd.o acpi-y += acpi_platform.o acpi-y += acpi_pnp.o diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c new file mode 100644 index 0000000..ea84365 --- /dev/null +++ b/drivers/acpi/pci_mcfg.c @@ -0,0 +1,312 @@ +/*
- pci_mcfg.c
- Common code to maintain the MCFG areas and mappings
- This has been extracted from arch/x86/pci/mmconfig-shared.c
- and moved here so that other architectures can use this code.
- */
+#include <linux/pci.h> +#include <linux/init.h> +#include <linux/dmi.h> +#include <linux/pci-acpi.h> +#include <linux/sfi_acpi.h> +#include <linux/slab.h> +#include <linux/mutex.h> +#include <linux/rculist.h>
+#define PREFIX "ACPI: "
+static DEFINE_MUTEX(pci_mmcfg_lock); +LIST_HEAD(pci_mmcfg_list);
+static void list_add_sorted(struct pci_mmcfg_region *new) +{
- struct pci_mmcfg_region *cfg;
- /* keep list sorted by segment and starting bus number */
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
if (cfg->segment > new->segment ||
(cfg->segment == new->segment &&
cfg->start_bus >= new->start_bus)) {
list_add_tail_rcu(&new->list, &cfg->list);
return;
}
- }
- list_add_tail_rcu(&new->list, &pci_mmcfg_list);
+}
+static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
int end, u64 addr)
+{
- struct pci_mmcfg_region *new;
- struct resource *res;
- if (addr == 0)
return NULL;
- new = kzalloc(sizeof(*new), GFP_KERNEL);
- if (!new)
return NULL;
- new->address = addr;
- new->segment = segment;
- new->start_bus = start;
- new->end_bus = end;
- res = &new->res;
- res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
- res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
- res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
- snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
"PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
- res->name = new->name;
- return new;
+}
+struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
int end, u64 addr)
+{
- struct pci_mmcfg_region *new;
- new = pci_mmconfig_alloc(segment, start, end, addr);
- if (new) {
mutex_lock(&pci_mmcfg_lock);
list_add_sorted(new);
mutex_unlock(&pci_mmcfg_lock);
pr_info(PREFIX
"MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
"(base %#lx)\n",
segment, start, end, &new->res, (unsigned long)addr);
- }
- return new;
+}
+struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus) +{
- struct pci_mmcfg_region *cfg;
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
if (cfg->segment == segment &&
cfg->start_bus <= bus && bus <= cfg->end_bus)
return cfg;
- return NULL;
+}
+/*
- Map a pci_mmcfg_region, can be overrriden by arch
- */
+int __weak pci_mmconfig_map_resource(struct device *dev,
- struct pci_mmcfg_region *mcfg)
+{
- struct resource *tmp;
- void __iomem *vaddr;
- tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
- if (tmp) {
dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
&mcfg->res, tmp->name, tmp);
return -EBUSY;
- }
- vaddr = ioremap(mcfg->res.start, resource_size(&mcfg->res));
- if (!vaddr) {
release_resource(&mcfg->res);
return -ENOMEM;
- }
- mcfg->virt = vaddr;
This should be changed to
	mcfg->virt = vaddr - PCI_MMCFG_BUS_OFFSET(mcfg->start_bus);
otherwise, when the PCIe host's "start_bus" is not 0, configuration accesses will go to the wrong address.

See the v3 or v4 patchset, which has "addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus);":

static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg)
{
	void __iomem *addr;
	u64 start, size;
	int num_buses;

	start = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->start_bus);
	num_buses = cfg->end_bus - cfg->start_bus + 1;
	size = PCI_MMCFG_BUS_OFFSET(num_buses);
	addr = ioremap_nocache(start, size);
	if (addr)
		addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus);
	return addr;
}

Thanks,
Dongdong
- return 0;
+}
+/*
- Unmap a pci_mmcfg_region, can be overrriden by arch
- */
+void __weak pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg) +{
- if (mcfg->virt) {
iounmap(mcfg->virt);
mcfg->virt = NULL;
- }
- if (mcfg->res.parent) {
release_resource(&mcfg->res);
mcfg->res.parent = NULL;
- }
+}
+/*
- check if the mmconfig is enabled and configured
- */
+int __weak pci_mmconfig_enabled(void) +{
- return 1;
+}
+/* Add MMCFG information for host bridges */ +int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr)
+{
- struct pci_mmcfg_region *cfg;
- int rc;
- if (!pci_mmconfig_enabled())
return -ENODEV;
- if (start > end)
return -EINVAL;
- mutex_lock(&pci_mmcfg_lock);
- cfg = pci_mmconfig_lookup(seg, start);
- if (cfg) {
if (cfg->end_bus < end)
dev_info(dev, FW_INFO
"MMCONFIG for "
"domain %04x [bus %02x-%02x] "
"only partially covers this bridge\n",
cfg->segment, cfg->start_bus, cfg->end_bus);
rc = -EEXIST;
goto err;
- }
- if (!addr) {
rc = -EINVAL;
goto err;
- }
- cfg = pci_mmconfig_alloc(seg, start, end, addr);
- if (cfg == NULL) {
dev_warn(dev, "fail to add MMCONFIG (out of memory)\n");
rc = -ENOMEM;
goto err;
- }
- rc = pci_mmconfig_map_resource(dev, cfg);
- if (!rc) {
list_add_sorted(cfg);
dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
&cfg->res, (unsigned long)addr);
return 0;
- } else {
if (cfg->res.parent)
release_resource(&cfg->res);
kfree(cfg);
- }
+err:
- mutex_unlock(&pci_mmcfg_lock);
- return rc;
+}
+/* Delete MMCFG information for host bridges */ +int pci_mmconfig_delete(u16 seg, u8 start, u8 end) +{
- struct pci_mmcfg_region *cfg;
- mutex_lock(&pci_mmcfg_lock);
- list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
if (cfg->segment == seg && cfg->start_bus == start &&
cfg->end_bus == end) {
list_del_rcu(&cfg->list);
synchronize_rcu();
pci_mmconfig_unmap_resource(cfg);
mutex_unlock(&pci_mmcfg_lock);
kfree(cfg);
return 0;
}
- mutex_unlock(&pci_mmcfg_lock);
- return -ENOENT;
+}
+static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
struct acpi_mcfg_allocation *cfg)
+{
- int year;
- if (!config_enabled(CONFIG_X86))
return 0;
- if (cfg->address < 0xFFFFFFFF)
return 0;
- if (!strncmp(mcfg->header.oem_id, "SGI", 3))
return 0;
- if (mcfg->header.revision >= 1) {
if (dmi_get_date(DMI_BIOS_DATE, &year, NULL, NULL) &&
year >= 2010)
return 0;
- }
- pr_err(PREFIX "MCFG region for %04x [bus %02x-%02x] at %#llx "
"is above 4GB, ignored\n", cfg->pci_segment,
cfg->start_bus_number, cfg->end_bus_number, cfg->address);
- return -EINVAL;
+}
+static int __init pci_parse_mcfg(struct acpi_table_header *header) +{
- struct acpi_table_mcfg *mcfg;
- struct acpi_mcfg_allocation *cfg_table, *cfg;
- unsigned long i;
- int entries;
- if (!header)
return -EINVAL;
- mcfg = (struct acpi_table_mcfg *)header;
- /* how many config structures do we have */
- entries = 0;
- i = header->length - sizeof(struct acpi_table_mcfg);
- while (i >= sizeof(struct acpi_mcfg_allocation)) {
entries++;
i -= sizeof(struct acpi_mcfg_allocation);
- }
- if (entries == 0) {
pr_err(PREFIX "MMCONFIG has no entries\n");
return -ENODEV;
- }
- cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
- for (i = 0; i < entries; i++) {
cfg = &cfg_table[i];
if (acpi_mcfg_check_entry(mcfg, cfg))
return -ENODEV;
if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
cfg->end_bus_number, cfg->address) == NULL) {
pr_warn(PREFIX "no memory for MCFG entries\n");
return -ENOMEM;
}
- }
- return 0;
+}
+int __init pci_mmconfig_parse_table(void) +{
- return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
+void __weak __init pci_mmcfg_late_init(void) +{
- int err, n = 0;
- struct pci_mmcfg_region *cfg;
- err = pci_mmconfig_parse_table();
- if (err) {
pr_err(PREFIX " Failed to parse MCFG (%d)\n", err);
return;
- }
- list_for_each_entry(cfg, &pci_mmcfg_list, list) {
pci_mmconfig_map_resource(NULL, cfg);
n++;
- }
- pr_info(PREFIX " MCFG table loaded %d entries\n", n);
+} diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c index 7494dbe..97aa9d3 100644 --- a/drivers/xen/pci.c +++ b/drivers/xen/pci.c @@ -27,9 +27,6 @@ #include <asm/xen/hypervisor.h> #include <asm/xen/hypercall.h> #include "../pci/pci.h" -#ifdef CONFIG_PCI_MMCONFIG -#include <asm/pci_x86.h> -#endif
static bool __read_mostly pci_seg_supported = true;
@@ -221,7 +218,7 @@ static int __init xen_mcfg_late(void)
 	if (!xen_initial_domain())
 		return 0;

-	if ((pci_probe & PCI_PROBE_MMCONF) == 0)
+	if (!pci_mmconfig_enabled())
 		return 0;

 	if (list_empty(&pci_mmcfg_list))
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index 89ab057..e9450ef 100644 --- a/include/linux/pci-acpi.h +++ b/include/linux/pci-acpi.h @@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[]; #define RESET_DELAY_DSM 0x08 #define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */ +/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ +#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+struct pci_mmcfg_region {
- struct list_head list;
- struct resource res;
- u64 address;
- char __iomem *virt;
- u16 segment;
- u8 start_bus;
- u8 end_bus;
- char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus); +extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
- struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg); +extern int pci_mmconfig_enabled(void); +extern int __init pci_mmconfig_parse_table(void);
+extern struct list_head pci_mmcfg_list;
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20) +#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
- #else /* CONFIG_ACPI */ static inline void acpi_pci_add_bus(struct pci_bus *bus) { } static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
On Thu, Feb 18, 2016 at 08:25:35PM +0800, liudongdong (C) wrote:
[...]
+/*
- Map a pci_mmcfg_region, can be overrriden by arch
- */
+int __weak pci_mmconfig_map_resource(struct device *dev,
- struct pci_mmcfg_region *mcfg)
+{
- struct resource *tmp;
- void __iomem *vaddr;
- tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
- if (tmp) {
dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
&mcfg->res, tmp->name, tmp);
return -EBUSY;
- }
- vaddr = ioremap(mcfg->res.start, resource_size(&mcfg->res));
^^ while at it, stray white space
- if (!vaddr) {
release_resource(&mcfg->res);
return -ENOMEM;
- }
- mcfg->virt = vaddr;
This should be changed to
	mcfg->virt = vaddr - PCI_MMCFG_BUS_OFFSET(mcfg->start_bus);
otherwise, when the PCIe host's "start_bus" is not 0, configuration accesses will go to the wrong address.
Well spotted, thanks.
Lorenzo
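To make the start_bus point concrete, here is a minimal userspace sketch of the arithmetic. It assumes the accessors index the mapping by absolute bus number, as the PCI_MMCFG_OFFSET() macro in the patch suggests. The struct and function names (ecam_window, ecam_biased_base, ecam_addr) are hypothetical and only model the mapped address as an integer; this is not the kernel code.

```c
#include <stdint.h>

/* Same shape as the macros in the patch: 1 MiB of config space per bus,
 * 4 KiB per function. */
#define ECAM_BUS_OFFSET(bus)    ((uint64_t)(bus) << 20)
#define ECAM_OFFSET(bus, devfn) (((uint64_t)(bus) << 20) | ((uint64_t)(devfn) << 12))

/* Hypothetical stand-in for struct pci_mmcfg_region. The mapping is
 * modeled as a plain integer so the arithmetic can be checked. */
struct ecam_window {
	uint64_t mapped;     /* what ioremap() of res.start would return */
	uint8_t  start_bus;  /* first bus covered by the mapping */
	uint8_t  end_bus;    /* last bus covered by the mapping */
};

/* Base biased by start_bus, as suggested in the review. */
static uint64_t ecam_biased_base(const struct ecam_window *w)
{
	return w->mapped - ECAM_BUS_OFFSET(w->start_bus);
}

/* Accessor that indexes by absolute bus number (an assumption here). */
static uint64_t ecam_addr(uint64_t base, unsigned int bus,
			  unsigned int devfn, unsigned int reg)
{
	return base + ECAM_OFFSET(bus, devfn) + reg;
}
```

With start_bus = 0x40, the unbiased base places bus 0x40 at 0x40 << 20 bytes past the start of the mapping, which is outside the window when the region covers fewer than 0x40 buses beyond start_bus. The biased base places it at offset 0, which is what the mapping actually contains.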
Hi Tomasz, Jayachandran, et al,
On Tue, Feb 16, 2016 at 02:53:31PM +0100, Tomasz Nowicki wrote:
From: Jayachandran C jchandra@broadcom.com
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map a mcfg entry, pci_mmconfig_unmap_resource to do the corresponding unmap and pci_mmconfig_enabled to see if the arch setup of mcfg entries was successful. We also provide weak implementations of these, which will be used from ARM64. On x86, we retain the old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64. I need to ponder this some more, so these are just some initial thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated disaster, and (b) we're trying a little too hard to make that mess generic. I think we might be better served if we came up with some cleaner, more generic code that we can use for ARM64 today, and migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be "arch-independent", we will be perpetuating some interfaces and designs that shouldn't be allowed to escape arch/x86.
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really ACPI-specific, and could potentially be used for non-ACPI bridges that support ECAM. I'd like to see that sort of code moved to a new file like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in pci-host-generic.c. I'm not sure whether or how to include that, but it's a very good example of how simple this *should* be: probe the host bridge, discover the ECAM region, request the region, ioremap it, done.
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c new file mode 100644 index 0000000..ea84365 --- /dev/null +++ b/drivers/acpi/pci_mcfg.c ... +int __weak pci_mmconfig_map_resource(struct device *dev,
- struct pci_mmcfg_region *mcfg)
+{
- struct resource *tmp;
- void __iomem *vaddr;
- tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
- if (tmp) {
dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
&mcfg->res, tmp->name, tmp);
return -EBUSY;
- }
I think this is a mistake in the x86 MCFG support that we should not carry over to a generic implementation. We should not use the MCFG table for resource reservation because MCFG is not defined by the ACPI spec and an OS need not include support for it. The platform must indicate in some other, more generic way, that ECAM space is reserved. This probably means ECAM space should be declared in a PNP0C02 _CRS method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a pcibios hook or something here; I just don't think it should be generic.
+int __init pci_mmconfig_parse_table(void) +{
- return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once here. I think we should look for the information we need when we are claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might not be a great fit for the way ACPI table management works, but I think it's better to do things on-demand rather than just-in-case.
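A rough sketch of what an on-demand lookup could look like, written as plain userspace C. The struct mcfg_entry and the function mcfg_find() are hypothetical; they only mirror the fields an MCFG allocation carries (base address, segment, bus range). This does not show how the table would be obtained from ACPI at host bridge claim time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of one MCFG allocation entry. */
struct mcfg_entry {
	uint64_t address;    /* ECAM base address for bus 0 of the segment */
	uint16_t segment;    /* PCI segment (domain) number */
	uint8_t  start_bus;  /* first bus decoded by this entry */
	uint8_t  end_bus;    /* last bus decoded by this entry */
};

/*
 * Return the entry that fully covers [bus_start, bus_end] in 'segment',
 * or NULL if no single entry does. A host bridge would call this once,
 * when it is being claimed, instead of walking a global list later.
 */
static const struct mcfg_entry *
mcfg_find(const struct mcfg_entry *tbl, size_t n, uint16_t segment,
	  uint8_t bus_start, uint8_t bus_end)
{
	for (size_t i = 0; i < n; i++) {
		const struct mcfg_entry *e = &tbl[i];

		if (e->segment == segment &&
		    e->start_bus <= bus_start && bus_end <= e->end_bus)
			return e;
	}
	return NULL;
}
```

Because the result is handed straight to the bridge being probed, a design along these lines would not need a shared list or the lock that protects it. Whether that fits the ACPI table management model is the open question raised above.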
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index 89ab057..e9450ef 100644 --- a/include/linux/pci-acpi.h +++ b/include/linux/pci-acpi.h @@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[]; #define RESET_DELAY_DSM 0x08 #define FUNCTION_DELAY_DSM 0x09 +/* common API to maintain list of MCFG regions */ +/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ +#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+struct pci_mmcfg_region {
- struct list_head list;
- struct resource res;
- u64 address;
- char __iomem *virt;
- u16 segment;
- u8 start_bus;
- u8 end_bus;
- char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus); +extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
- struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg); +extern int pci_mmconfig_enabled(void); +extern int __init pci_mmconfig_parse_table(void);
+extern struct list_head pci_mmcfg_list;
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20) +#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
With the exception of pci_mmconfig_parse_table(), nothing here is ACPI-specific. I'd like to see the PCI ECAM-related interfaces (hopefully not these exact ones, but a more rational set) put in something like include/linux/pci-ecam.h.
#else /* CONFIG_ACPI */ static inline void acpi_pci_add_bus(struct pci_bus *bus) { } static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
Bjorn
On Fri, Mar 4, 2016 at 4:21 AM, Bjorn Helgaas helgaas@kernel.org wrote:
Hi Tomasz, Jayachandran, et al,
On Tue, Feb 16, 2016 at 02:53:31PM +0100, Tomasz Nowicki wrote:
From: Jayachandran C jchandra@broadcom.com
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map a mcfg entry, pci_mmconfig_unmap_resource to do the corresponding unmap and pci_mmconfig_enabled to see if the arch setup of mcfg entries was successful. We also provide weak implementations of these, which will be used from ARM64. On x86, we retain the old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64. I need to ponder this some more, so these are just some initial thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated disaster, and (b) we're trying a little too hard to make that mess generic. I think we might be better served if we came up with some cleaner, more generic code that we can use for ARM64 today, and migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be "arch-independent", we will be perpetuating some interfaces and designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list for all architectures or not. My initial plan was not to do this because of the mess (basically the ECAM region info should be attached to the PCI root and not maintained in a separate list that needs locking). The patch I posted initially (https://patchwork.ozlabs.org/patch/553464/) had a much simpler way of handling the MCFG table without using the list.
In the x86 case it is not feasible to stop using pci_mmcfg_list. The only outside user of it is Xen, which can be fixed up.
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really ACPI-specific, and could potentially be used for non-ACPI bridges that support ECAM. I'd like to see that sort of code moved to a new file like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in pci-host-generic.c. I'm not sure whether or how to include that, but it's a very good example of how simple this *should* be: probe the host bridge, discover the ECAM region, request the region, ioremap it, done.
I had a similar approach in my initial patchset; please see the patch above. The ECAM resource is mapped similarly to the way pci-host-generic.c handles it. An additional step I could take is to move the common code (ioremap and map_bus) into a common file and share it with pci-host-generic.c.
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
struct pci_mmcfg_region *mcfg)
+{
struct resource *tmp;
void __iomem *vaddr;
tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
if (tmp) {
dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
&mcfg->res, tmp->name, tmp);
return -EBUSY;
}
I think this is a mistake in the x86 MCFG support that we should not carry over to a generic implementation. We should not use the MCFG table for resource reservation because MCFG is not defined by the ACPI spec and an OS need not include support for it. The platform must indicate in some other, more generic way, that ECAM space is reserved. This probably means ECAM space should be declared in a PNP0C02 _CRS method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a pcibios hook or something here; I just don't think it should be generic.
+int __init pci_mmconfig_parse_table(void)
+{
return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once here. I think we should look for the information we need when we are claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might not be a great fit for the way ACPI table management works, but I think it's better to do things on-demand rather than just-in-case.
Looking up this table has an overhead, and the information available there is very limited (i.e., segment, start_bus, end_bus and address). My approach in the above patch is to save this info into an array at boot time and avoid multiple lookups.
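A minimal userspace sketch of that idea, assuming a small fixed-size array filled once at boot. The names mcfg_entry, mcfg_save() and mcfg_find() are hypothetical and chosen only for this illustration; they are not taken from the patch linked above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boot-time copy of one MCFG allocation entry. */
struct mcfg_entry {
	uint64_t addr;		/* ECAM base address from the table */
	uint16_t segment;	/* PCI segment group number */
	uint8_t  bus_start;
	uint8_t  bus_end;
};

#define MCFG_MAX 8		/* arbitrary limit for the sketch */

static struct mcfg_entry mcfg_entries[MCFG_MAX];
static size_t mcfg_count;

/* Record one entry; returns 0, or -1 if the array is full or the range is invalid. */
static int mcfg_save(uint16_t seg, uint8_t start, uint8_t end, uint64_t addr)
{
	if (mcfg_count >= MCFG_MAX || start > end)
		return -1;
	mcfg_entries[mcfg_count++] = (struct mcfg_entry){
		.addr = addr, .segment = seg,
		.bus_start = start, .bus_end = end,
	};
	return 0;
}

/* Find the saved entry covering (seg, bus); no table parsing, no locking. */
static const struct mcfg_entry *mcfg_find(uint16_t seg, uint8_t bus)
{
	assert(mcfg_count <= MCFG_MAX);
	for (size_t i = 0; i < mcfg_count; i++) {
		const struct mcfg_entry *e = &mcfg_entries[i];

		if (e->segment == seg && bus >= e->bus_start && bus <= e->bus_end)
			return e;
	}
	return NULL;
}
```

In this sketch the array is written only during init and read afterwards, which is what would let a host bridge probe look up its ECAM window without re-parsing the table.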
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
 #define RESET_DELAY_DSM 0x08
 #define FUNCTION_DELAY_DSM 0x09
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+struct pci_mmcfg_region {
struct list_head list;
struct resource res;
u64 address;
char __iomem *virt;
u16 segment;
u8 start_bus;
u8 end_bus;
char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
	int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
	struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+extern struct list_head pci_mmcfg_list;
+#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12)
With the exception of pci_mmconfig_parse_table(), nothing here is ACPI-specific. I'd like to see the PCI ECAM-related interfaces (hopefully not these exact ones, but a more rational set) put in something like include/linux/pci-ecam.h.
 #else /* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with pci-host-generic.c
if that is what you are looking for. The code will end up looking much simpler.
Thanks, JC.
On Fri, Mar 04, 2016 at 02:05:56PM +0530, Jayachandran Chandrashekaran Nair wrote:
I think the major decision is whether to maintain the pci_mmcfg_list for all architectures or not. My initial plan was not to do this because of the mess (basically the ECAM region info should be attached to the PCI root and not maintained in a separate list that needs locking). The patch I posted initially (https://patchwork.ozlabs.org/patch/553464/) had a much simpler way of handling the MCFG table without using the list.
I agree that ECAM info should be attached to the PCI host controller. That should simplify locking and hot-add and hot-removal of host controllers.
I think pci_mmcfg_list is an implementation detail that may not need to be generic. I certainly don't think it needs to be part of the interface.
Looking up this table has an overhead, and the information available there is very limited (i.e., segment, start_bus, end_bus and address). My approach in the above patch is to save this info into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think there's any performance issue here. But we do use acpi_table_parse(), which is __init, and *that* is a reason why we might need to parse the entire MCFG at boot-time. But this is the least of our worries in any case.
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with pci-host-generic.c
if that is what you are looking for. The code will end up looking much simpler.
I think we should ignore x86 mmconfig for now. It is absurdly complicated and I'm not sure it's fixable. I *do* want to keep drivers/acpi/pci_root.c for all ACPI host bridges, including x86, ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch for arm64. Something like this:
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be called from acpi_init() to copy MCFG info to something we can access after __init. This would not reserve resources, but probably does have to ioremap() the regions to support raw_pci_read().
- Implement raw_pci_read(), which is "special" because ACPI needs it for PCI config access from AML. It's supposed to be "always accessible" and we don't have a struct pci_bus *, so this probably has to use the MCFG copy and the ioremap done above. Maybe it should go in the same file. This is completely independent of the PCI core and PCI data structures.
- Implement arm64 pci_acpi_scan_root() that calls acpi_pci_root_create() with an .init_info() function that calls acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails, looks up the bus range in the MCFG copy from above. It should call request_mem_region(). For a region from _CBA, it should call ioremap(). For regions from MCFG it can probably use the ioremap done by acpi_mcfg_init().
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr() before calling pci_acpi_scan_root(), but I think that's wrong because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA and MCFG should be handled in the same place.
I know calling request_mem_region() here will probably be an ordering problem because the PNP0C02 driver hasn't reserved resources yet. But the host bridge driver is using the region and it should reserve it.
- If we store the ECAM mapped base address in the sysdata or struct pci_host_bridge, the normal config accessors can use pci_generic_config_read() with a new generic .map_bus() function.
On x86, the normal config access path is:
  pci_read(struct pci_bus *, ...)
    raw_pci_read(seg, bus#, ...)
      raw_pci_ext_ops->read(seg, bus#, ...)
        pci_mmcfg_read(seg, bus#, ...)
          pci_dev_base
            pci_mmconfig_lookup(seg, bus#)
I think this is somewhat backwards because we start with a pci_bus pointer, so we *could* have a nice simple bus-specific accessor, but we throw that pointer away, so pci_mmcfg_read() has to start over and look up the ECAM offset from scratch, which makes it all unnecessarily complicated.
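As a rough userspace sketch of the bus-specific alternative: if the mapped ECAM window hangs off per-bridge data, a .map_bus()-style helper needs only arithmetic and no list lookup. The names ecam_window, ecam_map_bus() and ecam_read32() are hypothetical, a plain byte buffer stands in for the ioremap()ed region, and the sketch assumes the window base corresponds to bus_start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-host-bridge ECAM window, as it might hang off sysdata. */
struct ecam_window {
	uint8_t *base;		/* stands in for the ioremap()ed region */
	uint8_t  bus_start;	/* first bus covered; base maps to this bus */
	uint8_t  bus_end;	/* last bus covered */
};

/* Return the address of a config register, or NULL if it is out of range. */
static uint8_t *ecam_map_bus(const struct ecam_window *win, uint8_t bus,
			     uint8_t devfn, unsigned int where)
{
	assert(win != NULL);
	if (bus < win->bus_start || bus > win->bus_end || where >= 4096)
		return NULL;
	/* 1 MiB per bus, 4 KiB per function, relative to bus_start. */
	return win->base + (((size_t)(bus - win->bus_start) << 20) |
			    ((size_t)devfn << 12) | where);
}

/* Read one aligned 32-bit little-endian register; all ones on failure. */
static int ecam_read32(const struct ecam_window *win, uint8_t bus,
		       uint8_t devfn, unsigned int where, uint32_t *val)
{
	const uint8_t *p = (where & 3) ? NULL :
			   ecam_map_bus(win, bus, devfn, where);

	if (!p) {
		*val = 0xffffffffu;
		return -1;
	}
	*val = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
	return 0;
}
```

The accessor starts from the window it was handed, so there is no segment/bus search and no shared list to lock on the access path.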
Bjorn
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please have a look at my previous patch set v4 and check how many of your comments are already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646 Especially patches [0-6] which handle MMCONFIG refactoring.
On 05.03.2016 05:14, Bjorn Helgaas wrote:
I think we should ignore x86 mmconfig for now. It is absurdly complicated and I'm not sure it's fixable. I *do* want to keep drivers/acpi/pci_root.c for all ACPI host bridges, including x86, ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch for arm64. Something like this:
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be called from acpi_init() to copy MCFG info to something we can access after __init. This would not reserve resources, but probably does have to ioremap() the regions to support raw_pci_read().
As said, ECAM and ACPI specific code was isolated in previous patch. There, I tried to leave the x86 complications in arch/x86/ and extract the generic functionality into drivers/pci/ecam.c as a library.
- Implement raw_pci_read(), which is "special" because ACPI needs it for PCI config access from AML. It's supposed to be "always accessible" and we don't have a struct pci_bus *, so this probably has to use the MCFG copy and the ioremap done above. Maybe it should go in the same file. This is completely independent of the PCI core and PCI data structures.
We were looking for a use case that would justify raw PCI config accessors in the ARM64 world. Unfortunately, nobody was able to show a real use case for ARM64. Do you see a reason why we need them? Our conclusion was to leave them empty for ARM64, which in turn makes the code simpler. I was not an ASWG member while that was under discussion, so I will ask Lorenzo to elaborate on this.
- Implement arm64 pci_acpi_scan_root() that calls acpi_pci_root_create() with an .init_info() function that calls acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails, looks up the bus range in the MCFG copy from above. It should call request_mem_region(). For a region from _CBA, it should call ioremap(). For regions from MCFG it can probably use the ioremap done by acpi_mcfg_init().
Yes, expanding .init_info() to check for _CBA is a good point.
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr() before calling pci_acpi_scan_root(), but I think that's wrong because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA and MCFG should be handled in the same place.
I know calling request_mem_region() here will probably be an ordering problem because the PNP0C02 driver hasn't reserved resources yet. But the host bridge driver is using the region and it should reserve it.
- If we store the ECAM mapped base address in the sysdata or struct pci_host_bridge, the normal config accessors can use pci_generic_config_read() with a new generic .map_bus() function.
pci_generic_config_{read|write}() is what we want to use, and actually we already do, but associating the ECAM region with sysdata will remove the ECAM region lookup step (see patch 09/15 of this series).
On x86, the normal config access path is:
  pci_read(struct pci_bus *, ...)
    raw_pci_read(seg, bus#, ...)
      raw_pci_ext_ops->read(seg, bus#, ...)
        pci_mmcfg_read(seg, bus#, ...)
          pci_dev_base
            pci_mmconfig_lookup(seg, bus#)
I think this is somewhat backwards because we start with a pci_bus pointer, so we *could* have a nice simple bus-specific accessor, but we throw that pointer away, so pci_mmcfg_read() has to start over and look up the ECAM offset from scratch, which makes it all unnecessarily complicated.
As you pointed out, raw_pci_{read|write}() make things complicated, so IMO we should either say they are absolutely necessary (and then think about how to simplify them) or just use a simple bus-specific accessor (patch 02/15), e.g. for ARM64.
Any comments appreciated.
Thanks, Tomasz
On 09.03.2016 10:13, Tomasz Nowicki wrote:
So I think we should write generic MCFG and ECAM support from scratch for arm64. Something like this:
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be called from acpi_init() to copy MCFG info to something we can access after __init. This would not reserve resources, but probably does have to ioremap() the regions to support raw_pci_read().
As said, ECAM and ACPI specific code was isolated in previous patch.
I meant to say, in my previous patch set (V4), sorry.
Tomasz
Hi Tomasz,
On Wed, Mar 9, 2016 at 2:43 PM, Tomasz Nowicki tn@semihalf.com wrote:
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please have a look at my previous patch set v4 and check how many of your comments are already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646 Especially patches [0-6] which handle MMCONFIG refactoring.
On 05.03.2016 05:14, Bjorn Helgaas wrote:
On Fri, Mar 04, 2016 at 02:05:56PM +0530, Jayachandran Chandrashekaran Nair wrote:
On Fri, Mar 4, 2016 at 4:21 AM, Bjorn Helgaas helgaas@kernel.org wrote:
Hi Tomasz, Jayachandran, et al,
On Tue, Feb 16, 2016 at 02:53:31PM +0100, Tomasz Nowicki wrote:
From: Jayachandran C jchandra@broadcom.com
Move pci_mmcfg_list handling to drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h.
As a part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map a mcfg entry, pci_mmconfig_unmap_resource to do the corresponding unmap and pci_mmconfig_enabled to see if the arch setup of mcfg entries was successful. We also provide weak implementations of these, which will be used from ARM64. On x86, we retain the old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64. I need to ponder this some more, so these are just some initial thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated disaster, and (b) we're trying a little too hard to make that mess generic. I think we might be better served if we came up with some cleaner, more generic code that we can use for ARM64 today, and migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be "arch-independent", we will be perpetuating some interfaces and designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list for all architectures or not. My initial plan was not to do this because of the mess (basically the ECAM region info should be attached to the PCI root and not maintained in a separate list that needs locking). The patch I posted initially (https://patchwork.ozlabs.org/patch/553464/) had a much simpler way of handling the MCFG table without using the list.
I agree that ECAM info should be attached to the PCI host controller. That should simplify locking and hot-add and hot-removal of host controllers.
I think pci_mmcfg_list is an implementation detail that may not need to be generic. I certainly don't think it needs to be part of the interface.
In the x86 case it is not feasible to stop using pci_mmcfg_list. The only outside user of it is Xen, which can be fixed up.
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really ACPI-specific, and could potentially be used for non-ACPI bridges that support ECAM. I'd like to see that sort of code moved to a new file like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in pci-host-generic.c. I'm not sure whether or how to include that, but it's a very good example of how simple this *should* be: probe the host bridge, discover the ECAM region, request the region, ioremap it, done.
I had a similar approach in my initial patchset; please see the patch above. The ECAM resource is mapped similarly to the way pci-host-generic.c handles it. An additional step I could take is to move the common code (ioremap and map_bus) into a common file and share it with pci-host-generic.c.
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
struct pci_mmcfg_region *mcfg)
+{
struct resource *tmp;
void __iomem *vaddr;
tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
if (tmp) {
dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
&mcfg->res, tmp->name, tmp);
return -EBUSY;
}
I think this is a mistake in the x86 MCFG support that we should not carry over to a generic implementation. We should not use the MCFG table for resource reservation because MCFG is not defined by the ACPI spec and an OS need not include support for it. The platform must indicate in some other, more generic way, that ECAM space is reserved. This probably means ECAM space should be declared in a PNP0C02 _CRS method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a pcibios hook or something here; I just don't think it should be generic.
+int __init pci_mmconfig_parse_table(void)
+{
+	return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once here. I think we should look for the information we need when we are claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might not be a great fit for the way ACPI table management works, but I think it's better to do things on-demand rather than just-in-case.
There is an overhead to looking up this table, and the information available there is very limited (i.e., segment, start_bus, end_bus and address). My approach in the above patch is to save this info into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think there's any performance issue here. But we do use acpi_table_parse(), which is __init, and *that* is a reason why we might need to parse the entire MCFG at boot-time. But this is the least of our worries in any case.
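The "copy MCFG at boot, look it up once per host bridge" idea discussed here can be sketched in plain C. This is only an illustration under stated assumptions: struct mcfg_entry, mcfg_copy[] and mcfg_lookup() are hypothetical names that appear in neither patch, and the table contents are made-up sample values standing in for what a boot-time parser would have saved.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boot-time copy of one MCFG entry (not a kernel type). */
struct mcfg_entry {
	uint64_t addr;		/* ECAM base address reported for the segment */
	uint16_t segment;	/* PCI segment (domain) number */
	uint8_t  bus_start;	/* first bus covered by this entry */
	uint8_t  bus_end;	/* last bus covered by this entry */
};

/* Made-up sample data; a real table would be filled in by the parser. */
static const struct mcfg_entry mcfg_copy[] = {
	{ 0x40000000ULL, 0, 0x00, 0x7f },
	{ 0x60000000ULL, 1, 0x00, 0xff },
};

/* Return the entry covering (segment, bus), or NULL if there is none. */
static const struct mcfg_entry *mcfg_lookup(uint16_t seg, uint8_t bus)
{
	size_t i;

	for (i = 0; i < sizeof(mcfg_copy) / sizeof(mcfg_copy[0]); i++) {
		const struct mcfg_entry *e = &mcfg_copy[i];

		if (e->segment == seg &&
		    bus >= e->bus_start && bus <= e->bus_end)
			return e;
	}
	return NULL;
}
```

A linear scan is enough for this purpose because the lookup runs once per host bridge, which is the point made above about performance not being the concern.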
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
 #define RESET_DELAY_DSM		0x08
 #define FUNCTION_DELAY_DSM	0x09
 
+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+	struct list_head list;
+	struct resource res;
+	u64 address;
+	char __iomem *virt;
+	u16 segment;
+	u8 start_bus;
+	u8 end_bus;
+	char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+			       phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+						  int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+				     struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus)	((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn)	((bus) << 20 | (devfn) << 12)
+
With the exception of pci_mmconfig_parse_table(), nothing here is ACPI-specific. I'd like to see the PCI ECAM-related interfaces (hopefully not these exact ones, but a more rational set) put in something like include/linux/pci-ecam.h.
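For readers less familiar with ECAM, the arithmetic behind the PCI_MMCFG_OFFSET() macro quoted above can be worked through in a few lines of C. The helper names make_devfn() and ecam_reg_offset() are made up for this illustration and are not part of the patch; only the shift amounts (20 for the bus, 12 for devfn) come from the quoted header.

```c
#include <stdint.h>

/* Same arithmetic as the PCI_MMCFG_OFFSET() macro in the quoted header. */
#define ECAM_OFFSET(bus, devfn) \
	(((uint32_t)(bus) << 20) | ((uint32_t)(devfn) << 12))

/* devfn packs a 5-bit device number and a 3-bit function number. */
static inline uint8_t make_devfn(uint8_t dev, uint8_t fn)
{
	return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Byte offset of a config register (0..4095) inside the ECAM window. */
static inline uint32_t ecam_reg_offset(uint8_t bus, uint8_t devfn,
				       uint16_t reg)
{
	return ECAM_OFFSET(bus, devfn) | (reg & 0xfff);
}
```

As a worked case, bus 1, device 2, function 3 gives devfn 0x13, so register 0x10 of that function sits at offset 0x100000 + 0x13000 + 0x10 = 0x113010. Each function gets 4 KiB and each bus 1 MiB, which is why a full 256-bus segment needs a 256 MiB window.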
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with pci-host-generic.c
if that is what you are looking for. The code will end up looking much simpler.
I think we should ignore x86 mmconfig for now. It is absurdly complicated and I'm not sure it's fixable. I *do* want to keep drivers/acpi/pci_root.c for all ACPI host bridges, including x86, ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch for arm64. Something like this:
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be called from acpi_init() to copy MCFG info to something we can access after __init. This would not reserve resources, but probably does have to ioremap() the regions to support raw_pci_read().
As said, the ECAM- and ACPI-specific code was isolated in the previous patch set. There, I tried to leave the x86 complications in arch/x86/ and extract the generic functionality into drivers/pci/ecam.c as a library.
- Implement raw_pci_read(), which is "special" because ACPI needs it for PCI config access from AML. It's supposed to be "always accessible" and we don't have a struct pci_bus *, so this probably has to use the MCFG copy and the ioremap done above. Maybe it should go in the same file. This is completely independent of the PCI core and PCI data structures.
We were looking for a use case that would justify raw PCI config accessors in the ARM64 world. Unfortunately, nobody was able to show a real one. Do you see a reason we need them? Our conclusion was to leave them empty for ARM64, which in turn makes the code simpler. I was not an ASWG member while that was under discussion, so I will ask Lorenzo to elaborate more on this.
- Implement arm64 pci_acpi_scan_root() that calls acpi_pci_root_create() with an .init_info() function that calls acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails, looks up the bus range in the MCFG copy from above. It should call request_mem_region(). For a region from _CBA, it should call ioremap(). For regions from MCFG it can probably use the ioremap done by acpi_mcfg_init().
Yes, expanding .init_info() to check for _CBA is a good point.
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr() before calling pci_acpi_scan_root(), but I think that's wrong because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA and MCFG should be handled in the same place. I know calling request_mem_region() here will probably be an ordering problem because the PNP0C02 driver hasn't reserved resources yet. But the host bridge driver is using the region and it should reserve it.
- If we store the ECAM mapped base address in the sysdata or struct pci_host_bridge, the normal config accessors can use pci_generic_config_read() with a new generic .map_bus() function.
pci_generic_config_{read|write}() is what we want to use, and in fact we already do, but associating the ECAM region with sysdata will remove the ECAM region lookup step (see patch 09/15 of this series).
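The ordering proposed in the third step above (try _CBA first, fall back to the MCFG copy) boils down to a small decision function. The sketch below models only that decision in plain C: struct cba_result, struct mcfg_copy_entry and pick_ecam_base() are hypothetical stand-ins, not kernel interfaces, and the real code would also have to call request_mem_region() and ioremap() as described in the proposal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for "evaluate _CBA on this host bridge". */
struct cba_result {
	bool     present;	/* false models "_CBA not implemented" */
	uint64_t base;
};

/* Hypothetical stand-in for the matching entry of the MCFG copy. */
struct mcfg_copy_entry {
	bool     valid;		/* false models "no entry for this bus range" */
	uint64_t base;
};

enum ecam_source { ECAM_NONE, ECAM_FROM_CBA, ECAM_FROM_MCFG };

/*
 * Pick the ECAM base for a host bridge: prefer _CBA, fall back to MCFG.
 * The return value tells the caller where the base came from, so it can
 * decide whether a fresh mapping is needed (the _CBA case) or an
 * existing early mapping can be reused (the MCFG case).
 */
static enum ecam_source pick_ecam_base(struct cba_result cba,
				       struct mcfg_copy_entry mcfg,
				       uint64_t *base)
{
	if (cba.present) {
		*base = cba.base;
		return ECAM_FROM_CBA;
	}
	if (mcfg.valid) {
		*base = mcfg.base;
		return ECAM_FROM_MCFG;
	}
	return ECAM_NONE;	/* *base is left untouched */
}
```

Keeping both sources behind one function is the design point argued for in the thread: _CBA and MCFG are handled in the same place rather than split between acpi_pci_root_add() and the arch code.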
On x86, the normal config access path is:

  pci_read(struct pci_bus *, ...)
    raw_pci_read(seg, bus#, ...)
      raw_pci_ext_ops->read(seg, bus#, ...)
        pci_mmcfg_read(seg, bus#, ...)
          pci_dev_base
            pci_mmconfig_lookup(seg, bus#)

I think this is somewhat backwards because we start with a pci_bus pointer, so we *could* have a nice simple bus-specific accessor, but we throw that pointer away, so pci_mmcfg_read() has to start over and look up the ECAM offset from scratch, which makes it all unnecessarily complicated.
As you pointed out, raw_pci_{read|write} make things complicated, so IMO we should either say they are absolutely necessary (and then think about how to simplify them) or just use a simple bus-specific accessor (patch 02/15), e.g. for ARM64.
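To make the contrast with the x86 path concrete, here is a minimal user-space sketch of a bus-specific accessor of the kind a generic .map_bus() would provide. It assumes the mapped ECAM window is stored per host bridge and that its base corresponds to the first bus of the range; struct ecam_window and ecam_map_bus() are hypothetical names, and plain memory stands in for the ioremap()ed region.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-host-bridge data. In the kernel this would hang off
 * sysdata or struct pci_host_bridge, as suggested in the thread.
 */
struct ecam_window {
	uint8_t *base;		/* mapped ECAM window (plain memory here) */
	uint8_t  bus_start;	/* first bus decoded by this window */
	uint8_t  bus_end;	/* last bus decoded by this window */
};

/*
 * Bus-specific "map_bus" sketch: no (segment, bus) list lookup, only
 * arithmetic on the window already attached to the bridge. Returns
 * NULL when the bus or register offset falls outside the window.
 */
static void *ecam_map_bus(const struct ecam_window *w, uint8_t bus,
			  uint8_t devfn, uint16_t where)
{
	if (bus < w->bus_start || bus > w->bus_end || where >= 4096)
		return NULL;

	return w->base + (((size_t)(bus - w->bus_start) << 20) |
			  ((size_t)devfn << 12) | where);
}
```

Because the caller already holds the bridge data, the accessor needs neither a global list nor locking, which is the simplification argued for above.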
Both raw_pci_read/write and a host controller with pci_generic_read/write can be done without much trouble, please see the patch I had at: https://patchwork.ozlabs.org/patch/575526/
I have been looking at Bjorn's suggestions and trying to see if I can update my MCFG patch taking care of them. I will post an updated patchset soon, unless you want to take this up.
JC.
Hi Jayachandran,
On 09.03.2016 11:10, Jayachandran Chandrashekaran Nair wrote:
Hi Tomasz,
On Wed, Mar 9, 2016 at 2:43 PM, Tomasz Nowicki tn@semihalf.com wrote:
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please have a look at my previous patch set v4 and check how many of your comments are already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646 Especially patches [0-6] which handle MMCONFIG refactoring.
On 05.03.2016 05:14, Bjorn Helgaas wrote:
On Fri, Mar 04, 2016 at 02:05:56PM +0530, Jayachandran Chandrashekaran Nair wrote:
On Fri, Mar 4, 2016 at 4:21 AM, Bjorn Helgaas helgaas@kernel.org wrote:
Hi Tomasz, Jayachandran, et al,
On Tue, Feb 16, 2016 at 02:53:31PM +0100, Tomasz Nowicki wrote:
From: Jayachandran C jchandra@broadcom.com
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map a mcfg entry, pci_mmconfig_unmap_resource to do the corresponding unmap and pci_mmconfig_enabled to see if the arch setup of mcfg entries was successful. We also provide weak implementations of these, which will be used from ARM64. On x86, we retain the old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64. I need to ponder this some more, so these are just some initial thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated disaster, and (b) we're trying a little too hard to make that mess generic. I think we might be better served if we came up with some cleaner, more generic code that we can use for ARM64 today, and migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be "arch-independent", we will be perpetuating some interfaces and designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list for all architectures or not. My initial plan was not to do this because of the mess (basically the ECAM region info should be attached to the pci root and not maintained in a separate list that needs locking). The patch I posted initially https://patchwork.ozlabs.org/patch/553464/ had a much simpler way of handling the MCFG table without using the list.
I agree that ECAM info should be attached to the PCI host controller. That should simplify locking and hot-add and hot-removal of host controllers.
Both raw_pci_read/write and a host controller with pci_generic_read/write can be done without much trouble, please see the patch I had at: https://patchwork.ozlabs.org/patch/575526/
Yes, it is doable; I implemented it the same way in one of my initial patch series, about a year ago. I'm questioning the presence of raw accessors on a per-arch basis. If they are really needed for all archs, then we definitely should implement them. If ARM64 does not care about them, there is no point in complicating it. In particular, I mean all the kinds of PCI config space quirks we will need to handle right after this patch gets merged, see: [PATCH V5 13/15] pci, acpi: Match PCI config space accessors against platfrom specific quirks.
and
https://lkml.org/lkml/2016/2/9/627 https://lkml.org/lkml/2016/2/8/967
Given these quirks, raw accessors are not so easy.
I have been looking at Bjorn's suggestions and trying to see if I can update my MCFG patch taking care of them. I will post an updated patch set soon, unless you want to take this up.
Yes, I want to post the next version and keep this patch set together, if you and Bjorn are okay with that. I feel that my previous patch set is close to what Bjorn suggested, modulo the way we keep MCFG regions. Let's discuss it here, then I will post it as the next version. I am looking forward to hearing Bjorn's comments on my previous patch set.
Tomasz
Hi Tomasz,
On Wed, Mar 9, 2016 at 4:20 PM, Tomasz Nowicki tn@semihalf.com wrote:
Hi Jayachandran,
On 09.03.2016 11:10, Jayachandran Chandrashekaran Nair wrote:
Hi Tomasz,
On Wed, Mar 9, 2016 at 2:43 PM, Tomasz Nowicki tn@semihalf.com wrote:
Hi Bjorn,
Thanks for your pointers! See my comments inline. Aslo, can you please have a look at my previous patch set v4 and check how many of your comments are already addressed there. We may want to back to it then.
https://lkml.org/lkml/2016/2/4/646 Especially patches [0-6] which handle MMCONFIG refactoring.
On 05.03.2016 05:14, Bjorn Helgaas wrote:
On Fri, Mar 04, 2016 at 02:05:56PM +0530, Jayachandran Chandrashekaran Nair wrote:
On Fri, Mar 4, 2016 at 4:21 AM, Bjorn Helgaas helgaas@kernel.org wrote:
Hi Tomasz, Jayachandran, et al,
On Tue, Feb 16, 2016 at 02:53:31PM +0100, Tomasz Nowicki wrote: > > > From: Jayachandran C jchandra@broadcom.com > > Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is > to share the API and code with ARM64 later. The corresponding > declarations are moved from asm/pci_x86.h to linux/pci-acpi.h > > As a part of this we introduce three functions that can be > implemented by the arch code: pci_mmconfig_map_resource() to map a > mcfg entry, pci_mmconfig_unmap_resource to do the corresponding > unmap and pci_mmconfig_enabled to see if the arch setup of > mcfg entries was successful. We also provide weak implementations > of these, which will be used from ARM64. On x86, we retain the > old logic by providing platform specific implementation. > > This patch is purely rearranging code, it should not have any > impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64. I need to ponder this some more, so these are just some initial thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated disaster, and (b) we're trying a little too hard to make that mess generic. I think we might be better served if we came up with some cleaner, more generic code that we can use for ARM64 today, and migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be "arch-independent", we will be perpetuating some interfaces and designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list for all architectures or not. My initial plan was not to do this because of the mess (basically the ECAM region info should be attached to the pci root and not maintained in a separate list that needs locking), The patch I posted initially https://patchwork.ozlabs.org/patch/553464/ had a much simpler way of handling the MCFG table without using the list.
I agree that ECAM info should be attached to the PCI host controller. That should simplify locking and hot-add and hot-removal of host controllers.
I think pci_mmcfg_list is an implementation detail that may not need to be generic. I certainly don't think it needs to be part of the interface.
In x86 case it is not feasible to remove using the pci_mmcfg_list. The only use of it outside is in xen that can be fixed up.
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really ACPI-specific, and could potentially be used for non-ACPI bridges that support ECAM. I'd like to see that sort of code moved to a new file like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in pci-host-generic.c. I'm not sure whether or how to include that, but it's a very good example of how simple this *should* be: probe the host bridge, discover the ECAM region, request the region, ioremap it, done.
I had a similar approach in my initial patchset, please see the patch above. The resource for ECAM is mapped similar to the the way pci-host-generic.c handled it. An additional step I could do was to move the common code (ioremap and mapbus) into a common file and share the code with pci-host-generic.c
> diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c > new file mode 100644 > index 0000000..ea84365 > --- /dev/null > +++ b/drivers/acpi/pci_mcfg.c > ... > +int __weak pci_mmconfig_map_resource(struct device *dev, > + struct pci_mmcfg_region *mcfg) > +{ > + struct resource *tmp; > + void __iomem *vaddr; > + > + tmp = insert_resource_conflict(&iomem_resource, &mcfg->res); > + if (tmp) { > + dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n", > + &mcfg->res, tmp->name, tmp); > + return -EBUSY; > + }
I think this is a mistake in the x86 MCFG support that we should not carry over to a generic implementation. We should not use the MCFG table for resource reservation because MCFG is not defined by the ACPI spec and an OS need not include support for it. The platform must indicate in some other, more generic way, that ECAM space is reserved. This probably means ECAM space should be declared in a PNP0C02 _CRS method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a pcibios hook or something here; I just don't think it should be generic.
> +int __init pci_mmconfig_parse_table(void) > +{ > + return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg); > +}
I don't like the fact that we parse the entire MCFG table at once here. I think we should look for the information we need when we are claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might not be a great fit for the way ACPI table management works, but I think it's better to do things on-demand rather than just-in-case.
There is an overhead of looking up this table, and the information available there is very limited (i.e, segment, start_bus, end_bus and address). My approach in the above patch is to save this info into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think there's any performance issue here. But we do use acpi_table_parse(), which is __init, and *that* is a reason why we might need to parse the entire MCFG at boot-time. But this is the least of our worries in any case.
> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h > index 89ab057..e9450ef 100644 > --- a/include/linux/pci-acpi.h > +++ b/include/linux/pci-acpi.h > @@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[]; > #define RESET_DELAY_DSM 0x08 > #define FUNCTION_DELAY_DSM 0x09 > > +/* common API to maintain list of MCFG regions */ > +/* "PCI MMCONFIG %04x [bus %02x-%02x]" */ > +#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2) > + > +struct pci_mmcfg_region { > + struct list_head list; > + struct resource res; > + u64 address; > + char __iomem *virt; > + u16 segment; > + u8 start_bus; > + u8 end_bus; > + char name[PCI_MMCFG_RESOURCE_NAME_LEN]; > +}; > + > +extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 > start, > u8 end, > + phys_addr_t addr); > +extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end); > + > +extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int > bus); > +extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int > start, > + int end, u64 > addr); > +extern int pci_mmconfig_map_resource(struct device *dev, > + struct pci_mmcfg_region *mcfg); > +extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region > *mcfg); > +extern int pci_mmconfig_enabled(void); > +extern int __init pci_mmconfig_parse_table(void); > + > +extern struct list_head pci_mmcfg_list; > + > +#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20) > +#define PCI_MMCFG_OFFSET(bus, devfn) ((bus) << 20 | (devfn) << 12) > +
With the exception of pci_mmconfig_parse_table(), nothing here is ACPI-specific. I'd like to see the PCI ECAM-related interfaces (hopefully not these exact ones, but a more rational set) put in something like include/linux/pci-ecam.h.
> #else /* CONFIG_ACPI */ > static inline void acpi_pci_add_bus(struct pci_bus *bus) { } > static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with pci-host-generic.c
if that is what you are looking for. The code will end up looking much simpler.
I think we should ignore x86 mmconfig for now. It is absurdly complicated and I'm not sure it's fixable. I *do* want to keep drivers/acpi/pci_root.c for all ACPI host bridges, including x86, ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch for arm64. Something like this:
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be called from acpi_init() to copy MCFG info to something we can access after __init. This would not reserve resources, but probably does have to ioremap() the regions to support raw_pci_read().
As said, ECAM and ACPI specific code was isolated in previous patch. There, I tried to leave x86 complication in arch/x86/ and extract generic functionalities to driver/pci/ecam.c as the library.
- Implement raw_pci_read(), which is "special" because ACPI needs it for PCI config access from AML. It's supposed to be "always accessible" and we don't have a struct pci_bus *, so this probably has to use the MCFG copy and the ioremap done above. Maybe it should go in the same file. This is completely independent of the PCI core and PCI data structures.
We were looking for the answer which would justify RAW PCI config accessors being for ARM64 world. Unfortunately, nobody was able to show real use case for ARM64. Do you see the reason we need this? Our conclusion was to leave it empty for ARM64 which in turn makes code simpler. I am not ASWG member while that was under discussion so I will ask Lorenzo to elaborate more on this.
- Implement arm64 pci_acpi_scan_root() that calls acpi_pci_root_create() with an .init_info() function that calls acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails, looks up the bus range in the MCFG copy from above. It should call request_mem_region(). For a region from _CBA, it should call ioremap(). For regions from MCFG it can probably use the ioremap done by acpi_mcfg_init().
Yes, Expanding .init_info() to check for _CBA is good point.
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr() before calling pci_acpi_scan_root(), but I think that's wrong because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA and MCFG should be handled in the same place. I know calling request_mem_region() here will probably be an ordering problem because the PNP0C02 driver hasn't reserved resources yet. But the host bridge driver is using the region and it should reserve it. - If we store the ECAM mapped base address in the sysdata or struct pci_host_bridge, the normal config accessors can use pci_generic_config_read() with a new generic .map_bus() function.
pci_generic_config_{read|write}() is what we want to use, actually we do now, but ECAM region and sysdata association will remove ECAM region lookup step (see patch 09/15 of this series).
On x86, the normal config access path is: pci_read(struct pci_bus *, ...) raw_pci_read(seg, bus#, ...) raw_pci_ext_ops->read(seg, bus#, ...) pci_mmcfg_read(seg, bus#, ...) pci_dev_base pci_mmconfig_lookup(seg, bus#) I think this is somewhat backwards because we start with a pci_bus pointer, so we *could* have a nice simple bus-specific accessor, but we throw that pointer away, so pci_mmcfg_read() has to start over and look up the ECAM offset from scratch, which makes it all unnecessarily complicated.
As you pointed out raw_pci_{read|write} make things complicated, so IMO we should either say they are absolutely necessary (and then think how to simplify it) or just use simple bus-specific accessor (patch 02/15) e.g. for ARM64.
Both raw_pci_read/write and a host controller with pci_generic_read/write can be done without much trouble, please see the patch I had at: https://patchwork.ozlabs.org/patch/575526/
Yes it is doable, I implemented it in the same way in one of my initial patch series, about year ago. I'm questioning raw accessors presence on per-arch basis. If it is really needed for all archs, then we definitely should implement it. If ARM64 does not care for it, there is no point to complicate it. Especially, I mean all kind of PCI config space quirk we will need to handle right after this patch got merged, see: [PATCH V5 13/15] pci, acpi: Match PCI config space accessors against platfrom specific quirks.
and
https://lkml.org/lkml/2016/2/9/627 https://lkml.org/lkml/2016/2/8/967
Given these quirks, raw accessors are not so easy.
The whole quirk handling infrastructure seems to be an overkill to me. I will leave it to maintainers to comment further.
I have been looking at Bjorn's suggestions and trying to see if I can update my MCFG patch taking care of them. I will post an updated patch set soon, unless you want to take this up.
Yes, I want to post the next version and keep this patch set together, if you and Bjorn are okay with that. I feel that my previous patch set is close to what Bjorn suggested, modulo the way we keep MCFG regions. Let's discuss it here, then I will post it as the next version. I am looking forward to hearing Bjorn's comments on my previous patch set.
I have been looking thru the code, and I have a reasonable implementation which updates this one patch. This pulls in common code from pci-host-generic.c as well. I will post it by next week and you can decide whether to use it to update your patchset.
Thanks, JC.
Hi Bjorn,
Here is a new patchset for the ACPI PCI controller driver based on the earlier discussion[1].
The first two patches in the patchset implement pci/ecam.c for generic config space access and use it in pci-host-generic.c and related files.
The third patch implements the ACPI PCI host driver using the same ecam access functions. The fourth patch adds the implementation of raw operations.
I have not used the pci_mmcfg_list or the region definitions from x86, but have used a much simpler approach here.
This should apply cleanly on top of the current pci next tree, and can be reviewed as a patchset. To use it on ARM64, we need to pull in about 7 more patches from Tomasz's patchset that fix various issues (like stub code in arm64 pci.c, ACPI companion setup, domain number assignment, IO resources fixup etc.).
If you are okay with this approach, I will work with Tomasz and post the full patchset.
This has been tested on qemu with OVMF for the ACPI part and with device tree for pci-host-generic code.
Thanks, JC.
[1] https://lkml.org/lkml/2016/3/3/921
Jayachandran C (4):
  PCI: Provide generic ECAM mapping functions
  PCI: generic,thunder: Use generic config functions
  ACPI: PCI: Add generic PCI host controller
  ACPI: PCI: Add raw_pci_read/write operations

 drivers/acpi/Kconfig                |   9 +
 drivers/acpi/Makefile               |   1 +
 drivers/acpi/pci_gen_host.c         | 334 ++++++++++++++++++++++++++++++++++++
 drivers/pci/Kconfig                 |   3 +
 drivers/pci/Makefile                |   2 +
 drivers/pci/ecam.c                  | 127 ++++++++++++++
 drivers/pci/host/Kconfig            |   1 +
 drivers/pci/host/pci-host-common.c  |  68 ++++----
 drivers/pci/host/pci-host-common.h  |  25 +--
 drivers/pci/host/pci-host-generic.c |  51 +-----
 drivers/pci/host/pci-thunder-ecam.c |  33 +---
 drivers/pci/host/pci-thunder-pem.c  |  41 ++---
 include/linux/pci.h                 |  10 ++
 13 files changed, 560 insertions(+), 145 deletions(-)
 create mode 100644 drivers/acpi/pci_gen_host.c
 create mode 100644 drivers/pci/ecam.c
Add config option PCI_GENERIC_ECAM and file drivers/pci/ecam.c to provide generic functions to access memory mapped PCI config space.
The API defines 'struct pci_config_window' to hold the mappings. The function pci_generic_map_config() is provided to allocate the struct, request the memory region, and do the ioremap() calls needed to setup the mapping. pci_generic_unmap_config() is provided to delete a mapping.
A helper function pci_generic_map_bus() is also provided to be used to implement pci_ops map_bus method.
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
---
 drivers/pci/Kconfig  |   3 ++
 drivers/pci/Makefile |   2 +
 drivers/pci/ecam.c   | 127 +++++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h  |  10 ++++
 4 files changed, 142 insertions(+)
 create mode 100644 drivers/pci/ecam.c
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 209292e..e930d62 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -83,6 +83,9 @@ config HT_IRQ
 config PCI_ATS
 	bool

+config PCI_GENERIC_ECAM
+	bool
+
 config PCI_IOV
 	bool "PCI IOV support"
 	depends on PCI
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 2154092..810aec8 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -55,6 +55,8 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o

 obj-$(CONFIG_PCI_STUB) += pci-stub.o

+obj-$(CONFIG_PCI_GENERIC_ECAM) += ecam.o
+
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o

 obj-$(CONFIG_OF) += of.o
diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
new file mode 100644
index 0000000..6a63901
--- /dev/null
+++ b/drivers/pci/ecam.c
@@ -0,0 +1,127 @@
+/*
+ * Copyright 2016 Broadcom
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation (the "GPL").
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License version 2 (GPLv2) for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * version 2 (GPLv2) along with this source code.
+ */
+
+#include <linux/device.h>
+#include <linux/io.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+
+/*
+ * On 64 bit systems, we do a single ioremap for the whole config space
+ * since we have enough virtual address range available. On 32 bit, do an
+ * ioremap per bus.
+ */
+static const bool per_bus_mapping = !config_enabled(CONFIG_64BIT);
+
+/*
+ * struct to hold the mappings of a config space window. This
+ * will be allocated with enough entries in win[] to hold all
+ * the mappings for the bus range.
+ */
+struct pci_config_window {
+	phys_addr_t	cfgaddr;
+	u8		bus_start;
+	u8		bus_end;
+	u8		bus_shift;
+	u8		devfn_shift;
+	void __iomem	*win[0];
+};
+
+/*
+ * helper function provided to implement the pci_ops ->map_bus method
+ */
+void __iomem *pci_generic_map_bus(struct pci_config_window *cfg,
+		unsigned int busn, unsigned int devfn, int where)
+{
+	void __iomem *base;
+
+	if (busn < cfg->bus_start || busn > cfg->bus_end)
+		return NULL;
+
+	busn -= cfg->bus_start;
+	if (per_bus_mapping)
+		base = cfg->win[busn];
+	else
+		base = cfg->win[0] + (busn << cfg->bus_shift);
+	return base + (devfn << cfg->devfn_shift) + where;
+}
+
+/*
+ * Create a PCI config space window
+ *  - reserve mem region
+ *  - alloc struct pci_config_window with space for all mappings
+ *  - ioremap the config space
+ */
+struct pci_config_window *pci_generic_map_config(phys_addr_t addr,
+		u8 bus_start, u8 bus_end, u8 bus_shift, u8 devfn_shift)
+{
+	struct pci_config_window *cfg;
+	unsigned int bus_range, bsz, mapsz;
+	int i, nidx;
+
+	if (bus_end < bus_start)
+		return ERR_PTR(-EINVAL);
+
+	bus_range = bus_end - bus_start + 1;
+	bsz = 1 << bus_shift;
+	nidx = per_bus_mapping ? bus_range : 1;
+	mapsz = per_bus_mapping ? bsz : bus_range * bsz;
+	cfg = kzalloc(sizeof(*cfg) + nidx * sizeof(cfg->win[0]), GFP_KERNEL);
+	if (!cfg)
+		return ERR_PTR(-ENOMEM);
+
+	cfg->bus_start = bus_start;
+	cfg->bus_end = bus_end;
+	cfg->bus_shift = bus_shift;
+	cfg->devfn_shift = devfn_shift;
+
+	if (!request_mem_region(addr, bus_range * bsz, "Configuration Space"))
+		goto err_exit;
+
+	/* cfgaddr has to be set after request_mem_region */
+	cfg->cfgaddr = addr;
+
+	for (i = 0; i < nidx; i++) {
+		cfg->win[i] = ioremap(addr + i * mapsz, mapsz);
+		if (!cfg->win[i])
+			goto err_exit;
+	}
+	return cfg;
+
+err_exit:
+	pci_generic_unmap_config(cfg);
+	return ERR_PTR(-ENOMEM);
+}
+
+/*
+ * Free a config space mapping
+ */
+void pci_generic_unmap_config(struct pci_config_window *cfg)
+{
+	unsigned int bus_range;
+	int i, nidx;
+
+	bus_range = cfg->bus_end - cfg->bus_start + 1;
+	nidx = per_bus_mapping ? bus_range : 1;
+	for (i = 0; i < nidx; i++)
+		if (cfg->win[i])
+			iounmap(cfg->win[i]);
+	if (cfg->cfgaddr)
+		release_mem_region(cfg->cfgaddr, bus_range << cfg->bus_shift);
+	kfree(cfg);
+}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 3df9e37..33c46bc 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -2013,6 +2013,16 @@ static inline bool pci_ari_enabled(struct pci_bus *bus)
 	return bus->self && bus->self->ari_enabled;
 }

+/* Generic ECAM mapping API */
+#ifdef CONFIG_PCI_GENERIC_ECAM
+struct pci_config_window;
+void __iomem *pci_generic_map_bus(struct pci_config_window *cfg,
+		unsigned int busn, unsigned int devfn, int where);
+struct pci_config_window *pci_generic_map_config(phys_addr_t addr,
+		u8 bus_start, u8 bus_end, u8 bus_shift, u8 devfn_shift);
+void pci_generic_unmap_config(struct pci_config_window *cfg);
+#endif
+
 /* provide the legacy pci_dma_* API */
 #include <linux/pci-dma-compat.h>
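The arithmetic in the patch above can be modelled outside the kernel. The following is a minimal userspace sketch, not the patch's code: struct window_layout, cfg_window_layout() and cfg_window_offset() are hypothetical names, and 64-bit types are used for sizes and offsets as an assumption of this sketch (the patch itself uses unsigned int for bsz and mapsz). It shows how the number and size of mappings depend on per_bus_mapping, and how a (bus, devfn, where) triple turns into a byte offset within the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of how a config window would be mapped. */
struct window_layout {
	unsigned int nidx;	/* number of separate mappings */
	uint64_t mapsz;		/* size of each mapping in bytes */
};

/* Mirrors the nidx/mapsz computation: one mapping per bus, or one for all. */
struct window_layout cfg_window_layout(uint8_t bus_start, uint8_t bus_end,
				       uint8_t bus_shift, bool per_bus_mapping)
{
	struct window_layout l;
	unsigned int bus_range;
	uint64_t bsz;

	assert(bus_end >= bus_start);
	bus_range = (unsigned int)(bus_end - bus_start) + 1;
	bsz = (uint64_t)1 << bus_shift;
	l.nidx = per_bus_mapping ? bus_range : 1;
	l.mapsz = per_bus_mapping ? bsz : bus_range * bsz;
	return l;
}

/* Byte offset from the start of the window, or -1 if busn is out of range. */
int64_t cfg_window_offset(uint8_t bus_start, uint8_t bus_end,
			  uint8_t bus_shift, uint8_t devfn_shift,
			  unsigned int busn, unsigned int devfn,
			  unsigned int where)
{
	if (busn < bus_start || busn > bus_end)
		return -1;
	return ((int64_t)(busn - bus_start) << bus_shift) +
	       ((int64_t)devfn << devfn_shift) + where;
}
```

With standard ECAM shifts (20 and 12) and a full 0-255 bus range, the single-mapping case covers 256 MiB, while the per-bus case uses 256 mappings of 1 MiB each.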
Use functions provided by pci/ecam.c for mapping config space. The gen_pci structure is updated to store a pointer to the mapping.
'struct gen_pci_cfg_windows' is no longer needed since the mapping is handled by generic code and 'struct gen_pci_cfg_bus_ops' can be removed since bus shift is handled by generic code. The rest of the data related to generic host controller are moved into 'struct gen_pci' itself.
The generic code handles the bus and devfn shifts, so a common function gen_pci_map_cfg_bus() has been added which can be used as ->map_bus by all users.
With the updated interface, the users have to allocate a gen_pci instance and setup the bus_shift and pci_ops fields in the structure before calling pci_host_common_probe().
Update pci-host-common.c, pci-host-generic.c, pci-thunder-ecam.c and pci-thunder-pem.c to use the new interface.
Also add dependency of PCI_GENERIC_ECAM for PCI_HOST_GENERIC in Kconfig
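As an illustration of how a driver could derive its shifts under the updated interface, here is a small userspace sketch. struct match_entry, match_table, bus_shift_for() and devfn_shift_for() are hypothetical stand-ins for the of_device_id table and probe code; only the compatible strings, the bus shifts 16 and 20, and the "bus_shift - 8" relation are taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct of_device_id: a name plus opaque data. */
struct match_entry {
	const char *compatible;
	const void *data;	/* bus shift stored directly in the pointer */
};

static const struct match_entry match_table[] = {
	{ "pci-host-cam-generic",  (const void *)16 },
	{ "pci-host-ecam-generic", (const void *)20 },
	{ NULL, NULL },
};

/* Returns the bus shift for a compatible string, or 0 if it is unknown. */
unsigned int bus_shift_for(const char *compatible)
{
	const struct match_entry *m;

	for (m = match_table; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return (unsigned int)(uintptr_t)m->data;
	return 0;
}

/* devfn occupies 8 bits directly below the bus number. */
unsigned int devfn_shift_for(unsigned int bus_shift)
{
	return bus_shift - 8;
}
```

This gives a devfn shift of 8 for CAM and 12 for ECAM. The same relation yields 16 for the Thunder PEM bus shift of 24, matching the (devfn << 16) in the removed thunder_pem_map_bus().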
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
---
 drivers/pci/host/Kconfig            |  1 +
 drivers/pci/host/pci-host-common.c  | 68 +++++++++++++++++++------------------
 drivers/pci/host/pci-host-common.h  | 25 +++++---------
 drivers/pci/host/pci-host-generic.c | 51 +++++-----------------------
 drivers/pci/host/pci-thunder-ecam.c | 33 +++++-------------
 drivers/pci/host/pci-thunder-pem.c  | 41 +++++++---------------
 6 files changed, 74 insertions(+), 145 deletions(-)
diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig index da61fa77..d27b989 100644 --- a/drivers/pci/host/Kconfig +++ b/drivers/pci/host/Kconfig @@ -82,6 +82,7 @@ config PCI_HOST_GENERIC bool "Generic PCI host controller" depends on (ARM || ARM64) && OF select PCI_HOST_COMMON + select PCI_GENERIC_ECAM help Say Y here if you want to support a simple generic PCI host controller, such as the one emulated by kvmtool. diff --git a/drivers/pci/host/pci-host-common.c b/drivers/pci/host/pci-host-common.c index e9f850f..d3b0664 100644 --- a/drivers/pci/host/pci-host-common.c +++ b/drivers/pci/host/pci-host-common.c @@ -24,6 +24,14 @@
#include "pci-host-common.h"
+void __iomem *gen_pci_map_cfg_bus(struct pci_bus *bus, + unsigned int devfn, int where) +{ + struct gen_pci *pci = bus->sysdata; + + return pci_generic_map_bus(pci->cfg, bus->number, devfn, where); +} + static void gen_pci_release_of_pci_ranges(struct gen_pci *pci) { pci_free_resource_list(&pci->resources); @@ -60,7 +68,7 @@ static int gen_pci_parse_request_of_pci_ranges(struct gen_pci *pci) res_valid |= !(res->flags & IORESOURCE_PREFETCH); break; case IORESOURCE_BUS: - pci->cfg.bus_range = res; + pci->bus_range = res; default: continue; } @@ -83,50 +91,44 @@ out_release_res: return err; }
+static void gen_pci_generic_unmap_cfg(void *ptr) +{ + pci_generic_unmap_config((struct pci_config_window *)ptr); +} + static int gen_pci_parse_map_cfg_windows(struct gen_pci *pci) { int err; - u8 bus_max; - resource_size_t busn; - struct resource *bus_range; + struct resource *cfgres = &pci->cfgres; + struct resource *bus_range = pci->bus_range; + unsigned int bus_shift = pci->bus_shift; struct device *dev = pci->host.dev.parent; struct device_node *np = dev->of_node; - u32 sz = 1 << pci->cfg.ops->bus_shift; + struct pci_config_window *cfg;
-	err = of_address_to_resource(np, 0, &pci->cfg.res);
+	err = of_address_to_resource(np, 0, cfgres);
 	if (err) {
 		dev_err(dev, "missing \"reg\" property\n");
 		return err;
 	}
/* Limit the bus-range to fit within reg */ - bus_max = pci->cfg.bus_range->start + - (resource_size(&pci->cfg.res) >> pci->cfg.ops->bus_shift) - 1; - pci->cfg.bus_range->end = min_t(resource_size_t, - pci->cfg.bus_range->end, bus_max); - - pci->cfg.win = devm_kcalloc(dev, resource_size(pci->cfg.bus_range), - sizeof(*pci->cfg.win), GFP_KERNEL); - if (!pci->cfg.win) - return -ENOMEM; - - /* Map our Configuration Space windows */ - if (!devm_request_mem_region(dev, pci->cfg.res.start, - resource_size(&pci->cfg.res), - "Configuration Space")) - return -ENOMEM; - - bus_range = pci->cfg.bus_range; - for (busn = bus_range->start; busn <= bus_range->end; ++busn) { - u32 idx = busn - bus_range->start; - - pci->cfg.win[idx] = devm_ioremap(dev, - pci->cfg.res.start + idx * sz, - sz); - if (!pci->cfg.win[idx]) - return -ENOMEM; + bus_range->end = min(bus_range->end, + bus_range->start + (resource_size(cfgres) >> bus_shift) - 1); + + cfg = pci_generic_map_config(cfgres->start, bus_range->start, + bus_range->end, bus_shift, bus_shift - 8); + + if (IS_ERR(cfg)) + return PTR_ERR(cfg); + + err = devm_add_action(dev, gen_pci_generic_unmap_cfg, cfg); + if (err) { + gen_pci_generic_unmap_cfg(cfg); + return err; }
+ pci->cfg = cfg; return 0; }
@@ -168,8 +170,8 @@ int pci_host_common_probe(struct platform_device *pdev, pci_add_flags(PCI_REASSIGN_ALL_RSRC | PCI_REASSIGN_ALL_BUS);
- bus = pci_scan_root_bus(dev, pci->cfg.bus_range->start, - &pci->cfg.ops->ops, pci, &pci->resources); + bus = pci_scan_root_bus(dev, pci->bus_range->start, + pci->ops, pci, &pci->resources); if (!bus) { dev_err(dev, "Scanning rootbus failed"); return -ENODEV; diff --git a/drivers/pci/host/pci-host-common.h b/drivers/pci/host/pci-host-common.h index 09f3fa0..4eb2ff0 100644 --- a/drivers/pci/host/pci-host-common.h +++ b/drivers/pci/host/pci-host-common.h @@ -22,26 +22,19 @@ #include <linux/kernel.h> #include <linux/platform_device.h>
-struct gen_pci_cfg_bus_ops { - u32 bus_shift; - struct pci_ops ops; -}; - -struct gen_pci_cfg_windows { - struct resource res; - struct resource *bus_range; - void __iomem **win; - - struct gen_pci_cfg_bus_ops *ops; -}; - struct gen_pci { - struct pci_host_bridge host; - struct gen_pci_cfg_windows cfg; - struct list_head resources; + struct pci_host_bridge host; + struct resource *bus_range; + unsigned int bus_shift; + struct resource cfgres; + struct pci_config_window *cfg; + struct pci_ops *ops; + struct list_head resources; };
int pci_host_common_probe(struct platform_device *pdev, struct gen_pci *pci); +void __iomem *gen_pci_map_cfg_bus(struct pci_bus *bus, + unsigned int devfn, int where);
#endif /* _PCI_HOST_COMMON_H */ diff --git a/drivers/pci/host/pci-host-generic.c b/drivers/pci/host/pci-host-generic.c index e8aa78f..7b6cee1 100644 --- a/drivers/pci/host/pci-host-generic.c +++ b/drivers/pci/host/pci-host-generic.c @@ -27,51 +27,15 @@
#include "pci-host-common.h"
-static void __iomem *gen_pci_map_cfg_bus_cam(struct pci_bus *bus, - unsigned int devfn, - int where) -{ - struct gen_pci *pci = bus->sysdata; - resource_size_t idx = bus->number - pci->cfg.bus_range->start; - - return pci->cfg.win[idx] + ((devfn << 8) | where); -} - -static struct gen_pci_cfg_bus_ops gen_pci_cfg_cam_bus_ops = { - .bus_shift = 16, - .ops = { - .map_bus = gen_pci_map_cfg_bus_cam, - .read = pci_generic_config_read, - .write = pci_generic_config_write, - } -}; - -static void __iomem *gen_pci_map_cfg_bus_ecam(struct pci_bus *bus, - unsigned int devfn, - int where) -{ - struct gen_pci *pci = bus->sysdata; - resource_size_t idx = bus->number - pci->cfg.bus_range->start; - - return pci->cfg.win[idx] + ((devfn << 12) | where); -} - -static struct gen_pci_cfg_bus_ops gen_pci_cfg_ecam_bus_ops = { - .bus_shift = 20, - .ops = { - .map_bus = gen_pci_map_cfg_bus_ecam, - .read = pci_generic_config_read, - .write = pci_generic_config_write, - } +static struct pci_ops gen_pci_ops = { + .map_bus = gen_pci_map_cfg_bus, + .read = pci_generic_config_read, + .write = pci_generic_config_write, };
static const struct of_device_id gen_pci_of_match[] = { - { .compatible = "pci-host-cam-generic", - .data = &gen_pci_cfg_cam_bus_ops }, - - { .compatible = "pci-host-ecam-generic", - .data = &gen_pci_cfg_ecam_bus_ops }, - + { .compatible = "pci-host-cam-generic", .data = (void *)16}, + { .compatible = "pci-host-ecam-generic", .data = (void *)20}, { }, }; MODULE_DEVICE_TABLE(of, gen_pci_of_match); @@ -86,7 +50,8 @@ static int gen_pci_probe(struct platform_device *pdev) return -ENOMEM;
of_id = of_match_node(gen_pci_of_match, dev->of_node); - pci->cfg.ops = (struct gen_pci_cfg_bus_ops *)of_id->data; + pci->bus_shift = (unsigned long)of_id->data; + pci->ops = &gen_pci_ops;
return pci_host_common_probe(pdev, pci); } diff --git a/drivers/pci/host/pci-thunder-ecam.c b/drivers/pci/host/pci-thunder-ecam.c index d71935cb..34714df 100644 --- a/drivers/pci/host/pci-thunder-ecam.c +++ b/drivers/pci/host/pci-thunder-ecam.c @@ -15,17 +15,6 @@
#include "pci-host-common.h"
-/* Mapping is standard ECAM */ -static void __iomem *thunder_ecam_map_bus(struct pci_bus *bus, - unsigned int devfn, - int where) -{ - struct gen_pci *pci = bus->sysdata; - resource_size_t idx = bus->number - pci->cfg.bus_range->start; - - return pci->cfg.win[idx] + ((devfn << 12) | where); -} - static void set_val(u32 v, int where, int size, u32 *val) { int shift = (where & 3) * 8; @@ -129,7 +118,7 @@ static int thunder_ecam_p2_config_read(struct pci_bus *bus, unsigned int devfn, * the config space access window. Since we are working with * the high-order 32 bits, shift everything down by 32 bits. */ - node_bits = (pci->cfg.res.start >> 32) & (1 << 12); + node_bits = (pci->cfgres.start >> 32) & (1 << 12);
v |= node_bits; set_val(v, where, size, val); @@ -358,19 +347,14 @@ static int thunder_ecam_config_write(struct pci_bus *bus, unsigned int devfn, return pci_generic_config_write(bus, devfn, where, size, val); }
-static struct gen_pci_cfg_bus_ops thunder_ecam_bus_ops = { - .bus_shift = 20, - .ops = { - .map_bus = thunder_ecam_map_bus, - .read = thunder_ecam_config_read, - .write = thunder_ecam_config_write, - } +static struct pci_ops thunder_ecam_pci_ops = { + .map_bus = gen_pci_map_cfg_bus, + .read = thunder_ecam_config_read, + .write = thunder_ecam_config_write, };
static const struct of_device_id thunder_ecam_of_match[] = { - { .compatible = "cavium,pci-host-thunder-ecam", - .data = &thunder_ecam_bus_ops }, - + { .compatible = "cavium,pci-host-thunder-ecam" }, { }, }; MODULE_DEVICE_TABLE(of, thunder_ecam_of_match); @@ -378,14 +362,13 @@ MODULE_DEVICE_TABLE(of, thunder_ecam_of_match); static int thunder_ecam_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; - const struct of_device_id *of_id; struct gen_pci *pci = devm_kzalloc(dev, sizeof(*pci), GFP_KERNEL);
if (!pci) return -ENOMEM;
- of_id = of_match_node(thunder_ecam_of_match, dev->of_node); - pci->cfg.ops = (struct gen_pci_cfg_bus_ops *)of_id->data; + pci->ops = &thunder_ecam_pci_ops; + pci->bus_shift = 20;
return pci_host_common_probe(pdev, pci); } diff --git a/drivers/pci/host/pci-thunder-pem.c b/drivers/pci/host/pci-thunder-pem.c index cabb92a..eb248c1 100644 --- a/drivers/pci/host/pci-thunder-pem.c +++ b/drivers/pci/host/pci-thunder-pem.c @@ -31,15 +31,6 @@ struct thunder_pem_pci { void __iomem *pem_reg_base; };
-static void __iomem *thunder_pem_map_bus(struct pci_bus *bus, - unsigned int devfn, int where) -{ - struct gen_pci *pci = bus->sysdata; - resource_size_t idx = bus->number - pci->cfg.bus_range->start; - - return pci->cfg.win[idx] + ((devfn << 16) | where); -} - static int thunder_pem_bridge_read(struct pci_bus *bus, unsigned int devfn, int where, int size, u32 *val) { @@ -134,15 +125,15 @@ static int thunder_pem_config_read(struct pci_bus *bus, unsigned int devfn, { struct gen_pci *pci = bus->sysdata;
- if (bus->number < pci->cfg.bus_range->start || - bus->number > pci->cfg.bus_range->end) + if (bus->number < pci->bus_range->start || + bus->number > pci->bus_range->end) return PCIBIOS_DEVICE_NOT_FOUND;
/* * The first device on the bus is the PEM PCIe bridge. * Special case its config access. */ - if (bus->number == pci->cfg.bus_range->start) + if (bus->number == pci->bus_range->start) return thunder_pem_bridge_read(bus, devfn, where, size, val);
return pci_generic_config_read(bus, devfn, where, size, val); @@ -258,33 +249,28 @@ static int thunder_pem_config_write(struct pci_bus *bus, unsigned int devfn, { struct gen_pci *pci = bus->sysdata;
- if (bus->number < pci->cfg.bus_range->start || - bus->number > pci->cfg.bus_range->end) + if (bus->number < pci->bus_range->start || + bus->number > pci->bus_range->end) return PCIBIOS_DEVICE_NOT_FOUND; /* * The first device on the bus is the PEM PCIe bridge. * Special case its config access. */ - if (bus->number == pci->cfg.bus_range->start) + if (bus->number == pci->bus_range->start) return thunder_pem_bridge_write(bus, devfn, where, size, val);
return pci_generic_config_write(bus, devfn, where, size, val); }
-static struct gen_pci_cfg_bus_ops thunder_pem_bus_ops = { - .bus_shift = 24, - .ops = { - .map_bus = thunder_pem_map_bus, - .read = thunder_pem_config_read, - .write = thunder_pem_config_write, - } +static struct pci_ops thunder_pem_pci_ops = { + .map_bus = gen_pci_map_cfg_bus, + .read = thunder_pem_config_read, + .write = thunder_pem_config_write, };
static const struct of_device_id thunder_pem_of_match[] = { - { .compatible = "cavium,pci-host-thunder-pem", - .data = &thunder_pem_bus_ops }, - + { .compatible = "cavium,pci-host-thunder-pem" }, { }, }; MODULE_DEVICE_TABLE(of, thunder_pem_of_match); @@ -292,7 +278,6 @@ MODULE_DEVICE_TABLE(of, thunder_pem_of_match); static int thunder_pem_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; - const struct of_device_id *of_id; resource_size_t bar4_start; struct resource *res_pem; struct thunder_pem_pci *pem_pci; @@ -301,8 +286,8 @@ static int thunder_pem_probe(struct platform_device *pdev) if (!pem_pci) return -ENOMEM;
- of_id = of_match_node(thunder_pem_of_match, dev->of_node); - pem_pci->gen_pci.cfg.ops = (struct gen_pci_cfg_bus_ops *)of_id->data; + pem_pci->gen_pci.ops = &thunder_pem_pci_ops; + pem_pci->gen_pci.bus_shift = 24;
/* * The second register range is the PEM bridge to the PCIe
Add a generic ACPI based PCI host controller, and provide config option ACPI_PCI_HOST_GENERIC to enable it.
The implementation selects PCI_MMCONFIG and implements function pci_mmcfg_late_init() to parse MCFG table and save its entries. PCI_GENERIC_ECAM is also selected and pci/ecam.c functions are used to map the config space of the PCI controller.
acpi_pci_root_ops is setup so that init_info looks up the saved MCFG entries and sets up the config space mapping. release_info deletes the mapping and frees the memory. Generic PCI functions are used for accessing config space.
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
---
 drivers/acpi/Kconfig        |   9 ++
 drivers/acpi/Makefile       |   1 +
 drivers/acpi/pci_gen_host.c | 247 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 257 insertions(+)
 create mode 100644 drivers/acpi/pci_gen_host.c
diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 82b96ee..f178f2e 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -343,6 +343,15 @@ config ACPI_PCI_SLOT i.e., segment/bus/device/function tuples, with physical slots in the system. If you are unsure, say N.
+config ACPI_PCI_HOST_GENERIC + bool "Generic ACPI based PCI controller" + depends on ARM64 + select PCI_MMCONFIG + select PCI_GENERIC_ECAM + help + Say Y if you want the generic ACPI based PCI controller + implementation. + config X86_PM_TIMER bool "Power Management Timer Support" if EXPERT depends on X86 diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile index cb648a4..ceab5b0 100644 --- a/drivers/acpi/Makefile +++ b/drivers/acpi/Makefile @@ -65,6 +65,7 @@ obj-$(CONFIG_ACPI_BUTTON) += button.o obj-$(CONFIG_ACPI_FAN) += fan.o obj-$(CONFIG_ACPI_VIDEO) += video.o obj-$(CONFIG_ACPI_PCI_SLOT) += pci_slot.o +obj-$(CONFIG_ACPI_PCI_HOST_GENERIC) += pci_gen_host.o obj-$(CONFIG_ACPI_PROCESSOR) += processor.o obj-$(CONFIG_ACPI) += container.o obj-$(CONFIG_ACPI_THERMAL) += thermal.o diff --git a/drivers/acpi/pci_gen_host.c b/drivers/acpi/pci_gen_host.c new file mode 100644 index 0000000..a43f2cee --- /dev/null +++ b/drivers/acpi/pci_gen_host.c @@ -0,0 +1,247 @@ +/* + * Copyright 2016 Broadcom + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License, version 2, as + * published by the Free Software Foundation (the "GPL"). + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License version 2 (GPLv2) for more details. + * + * You should have received a copy of the GNU General Public License + * version 2 (GPLv2) along with this source code. 
+ */ +#include <linux/kernel.h> +#include <linux/pci.h> +#include <linux/pci-acpi.h> +#include <linux/sfi_acpi.h> +#include <linux/slab.h> + +#define PREFIX "ACPI: " + +/* + * Structure to hold entries from the MCFG table, these need to be + * mapped until a pci host bridge claims them for raw_pci_read/write + * to work + */ +struct mcfg_entry { + phys_addr_t addr; + u16 segment; + u8 bus_start; + u8 bus_end; +}; + +/* + * Global to save mcfg entries + */ +static struct { + struct mcfg_entry *cfg; + int size; +} mcfgsav; + +/* List of all ACPI PCI roots, needed for raw operations */ +static struct list_head gen_acpi_pci_roots; + +/* lock for global table AND list above */ +static struct mutex gen_acpi_pci_lock; + +/* ACPI info for generic ACPI PCI controller */ +struct acpi_pci_generic_root_info { + struct acpi_pci_root_info common; + struct pci_config_window *cfg; /* config space mapping */ + struct list_head node; /* node in acpi_pci_roots */ +}; + +/* Call generic map_bus after getting cfg pointer */ +static void __iomem *gen_acpi_map_cfg_bus(struct pci_bus *bus, + unsigned int devfn, int where) +{ + struct acpi_pci_generic_root_info *ri = bus->sysdata; + + return pci_generic_map_bus(ri->cfg, bus->number, devfn, where); +} + +static struct pci_ops acpi_pci_ops = { + .map_bus = gen_acpi_map_cfg_bus, + .read = pci_generic_config_read, + .write = pci_generic_config_write, +}; + +/* find the entry in mcfgsav.cfg which contains range bus_start..bus_end */ +static int mcfg_lookup(u16 seg, u8 bus_start, u8 bus_end) +{ + struct mcfg_entry *e; + int i; + + for (i = 0, e = mcfgsav.cfg; i < mcfgsav.size; i++, e++) { + if (seg != e->segment) + continue; + if (bus_start >= e->bus_start && bus_start <= e->bus_end) + return (bus_end <= e->bus_end) ? i : -EINVAL; + else if (bus_end >= e->bus_start && bus_end <= e->bus_end) + return -EINVAL; + } + return -ENOENT; +} + +/* + * init_info - lookup the bus range for the domain in MCFG, and set up + * config space mapping. 
+ */ +static int pci_acpi_generic_init_info(struct acpi_pci_root_info *ci) +{ + struct acpi_pci_generic_root_info *ri; + struct acpi_pci_root *root = ci->root; + u16 seg = root->segment; + u8 bus_start = root->secondary.start; + u8 bus_end = root->secondary.end; + phys_addr_t addr = root->mcfg_addr; + struct mcfg_entry *e; + int ret; + + ri = container_of(ci, struct acpi_pci_generic_root_info, common); + + mutex_lock(&gen_acpi_pci_lock); + ret = mcfg_lookup(seg, bus_start, bus_end); + switch (ret) { + case -ENOENT: + if (addr != 0) /* use address from _CBA */ + break; + pr_err("%04x:%02x-%02x mcfg lookup failed\n", seg, + bus_start, bus_end); + goto err_out; + case -EINVAL: + pr_err("%04x:%02x-%02x bus range error\n", seg, bus_start, + bus_end); + goto err_out; + default: + e = &mcfgsav.cfg[ret]; + if (addr == 0) + addr = e->addr; + if (bus_start != e->bus_start) { + pr_err("%04x:%02x-%02x bus range mismatch %02x\n", + seg, bus_start, bus_end, e->bus_start); + goto err_out; + } + if (addr != e->addr) { + pr_warn("%04x:%02x-%02x addr mismatch, ignoring MCFG\n", + seg, bus_start, bus_end); + } else if (bus_end != e->bus_end) { + pr_warn("%04x:%02x-%02x bus end mismatch %02x\n", + seg, bus_start, bus_end, e->bus_end); + bus_end = min(bus_end, e->bus_end); + } + break; + } + + ri->cfg = pci_generic_map_config(addr, bus_start, bus_end, 20, 12); + if (IS_ERR(ri->cfg)) { + ret = PTR_ERR(ri->cfg); + pr_err("%04x:%02x-%02x error %d mapping CFG\n", seg, bus_start, + bus_end, ret); + goto err_out; + } + list_add_tail(&ri->node, &gen_acpi_pci_roots); +err_out: + mutex_unlock(&gen_acpi_pci_lock); + return ret; +} + +/* release_info: free resrouces allocated by init_info */ +static void pci_acpi_generic_release_info(struct acpi_pci_root_info *ci) +{ + struct acpi_pci_generic_root_info *ri; + + ri = container_of(ci, struct acpi_pci_generic_root_info, common); + + mutex_lock(&gen_acpi_pci_lock); + list_del(&ri->node); + pci_generic_unmap_config(ri->cfg); + 
mutex_unlock(&gen_acpi_pci_lock); + + kfree(ri); +} + +static struct acpi_pci_root_ops acpi_pci_root_ops = { + .pci_ops = &acpi_pci_ops, + .init_info = pci_acpi_generic_init_info, + .release_info = pci_acpi_generic_release_info, +}; + +/* Interface called from ACPI code to setup PCI host controller */ +struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) +{ + int node = acpi_get_node(root->device->handle); + struct acpi_pci_generic_root_info *ri; + struct pci_bus *bus, *child; + + ri = kzalloc_node(sizeof(*ri), GFP_KERNEL, node); + if (!ri) + return NULL; + + bus = acpi_pci_root_create(root, &acpi_pci_root_ops, &ri->common, ri); + if (!bus) + return NULL; + + pci_bus_size_bridges(bus); + pci_bus_assign_resources(bus); + + list_for_each_entry(child, &bus->children, node) + pcie_bus_configure_settings(child); + + return bus; +} + +/* handle MCFG table entries */ +static __init int handle_mcfg(struct acpi_table_header *header) +{ + struct acpi_table_mcfg *mcfg; + struct acpi_mcfg_allocation *mptr; + struct mcfg_entry *e, *arr; + int i, n; + + if (!header) + return -EINVAL; + + mcfg = (struct acpi_table_mcfg *)header; + mptr = (struct acpi_mcfg_allocation *) &mcfg[1]; + n = (header->length - sizeof(*mcfg)) / sizeof(*mptr); + if (n <= 0 || n > 255) { + pr_err(PREFIX " MCFG has incorrect entries (%d).\n", n); + return -EINVAL; + } + + arr = kcalloc(n, sizeof(*arr), GFP_KERNEL); + if (!arr) + return -ENOMEM; + + for (i = 0, e = arr; i < n; i++, mptr++, e++) { + e->segment = mptr->pci_segment; + e->addr = mptr->address; + e->bus_start = mptr->start_bus_number; + e->bus_end = mptr->end_bus_number; + } + + mcfgsav.cfg = arr; + mcfgsav.size = n; + return 0; +} + +/* Interface called by ACPI - parse and save MCFG table */ +void __init pci_mmcfg_late_init(void) +{ + int err; + + mutex_init(&gen_acpi_pci_lock); + INIT_LIST_HEAD(&gen_acpi_pci_roots); + err = acpi_sfi_table_parse(ACPI_SIG_MCFG, handle_mcfg); + if (err) { + pr_err(PREFIX " Failed to parse MCFG (%d)\n", 
err); + mcfgsav.size = 0; + } else { + pr_info(PREFIX " MCFG table at %p, %d entries.\n", + mcfgsav.cfg, mcfgsav.size); + } +}
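The containment rules used by mcfg_lookup() in the patch above are easy to misread in diff form, so here is a userspace model of the same logic. struct mcfg_entry_model and mcfg_lookup_model() are hypothetical names; the table is passed as a parameter instead of using a global, which is a simplification made for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a saved MCFG entry (the address is omitted). */
struct mcfg_entry_model {
	uint16_t segment;
	uint8_t bus_start;
	uint8_t bus_end;
};

/*
 * Returns the index of the entry that fully contains bus_start..bus_end,
 * -EINVAL if the range only partially overlaps an entry, and -ENOENT if
 * neither end of the range falls inside any entry of that segment.
 */
int mcfg_lookup_model(const struct mcfg_entry_model *tbl, int n,
		      uint16_t seg, uint8_t bus_start, uint8_t bus_end)
{
	int i;

	for (i = 0; i < n; i++) {
		const struct mcfg_entry_model *e = &tbl[i];

		if (seg != e->segment)
			continue;
		if (bus_start >= e->bus_start && bus_start <= e->bus_end)
			return (bus_end <= e->bus_end) ? i : -EINVAL;
		if (bus_end >= e->bus_start && bus_end <= e->bus_end)
			return -EINVAL;
	}
	return -ENOENT;
}
```

One consequence of this logic is that a requested range which strictly encloses an entry (starts before it and ends after it) is reported as -ENOENT rather than -EINVAL, because neither endpoint lies inside the entry.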
Provide implementations of raw_pci_read and raw_pci_write needed by ACPI.
We already maintain a gen_acpi_pci_roots list with all the ACPI PCI controllers, so we walk thru this list to see if there is an existing mapping. If there is one, we can use the generic config space access.
If there is no existing mapping, create a temporary mapping with information available in the saved MCFG table and use it to do raw config space access.
Signed-off-by: Jayachandran C <jchandra@broadcom.com>
---
 drivers/acpi/pci_gen_host.c | 87 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 87 insertions(+)
diff --git a/drivers/acpi/pci_gen_host.c b/drivers/acpi/pci_gen_host.c index a43f2cee..b5472f0 100644 --- a/drivers/acpi/pci_gen_host.c +++ b/drivers/acpi/pci_gen_host.c @@ -245,3 +245,90 @@ void __init pci_mmcfg_late_init(void) mcfgsav.cfg, mcfgsav.size); } } + +/* + * First walk thru the root infos of all the known ACPI PCI + * controllers to see if there is an existing mapping and + * use it if we find one. + * Otherwise check the MCFG table and setup a temporary mapping + */ +static int raw_pci_op(int domain, unsigned int busn, unsigned int devfn, + int reg, int len, u32 *val, bool write) +{ + void __iomem *m; + struct acpi_pci_generic_root_info *ri; + struct acpi_pci_root *root; + bool tmpmap = false; + int i, ret = PCIBIOS_DEVICE_NOT_FOUND; + + mutex_lock(&gen_acpi_pci_lock); + list_for_each_entry(ri, &gen_acpi_pci_roots, node) { + root = ri->common.root; + if (domain == root->segment && + busn >= (u8)root->secondary.start && + busn <= (u8)root->secondary.end) { + m = pci_generic_map_bus(ri->cfg, busn, devfn, reg); + if (m) + goto found; + else + goto err_out; + } + } + + /* not found in existing root buses, check in mcfg */ + i = mcfg_lookup(domain, busn, busn); + if (i < 0) + goto err_out; + + /* get a temporary mapping */ + busn -= mcfgsav.cfg[i].bus_start; + m = ioremap(mcfgsav.cfg[i].addr + (busn << 20 | devfn << 12), 1 << 12); + if (!m) + goto err_out; + tmpmap = true; + m += reg; +found: + if (write) { + switch (len) { + case 1: + writeb(*val, m); + break; + case 2: + writew(*val, m); + break; + case 4: + writel(*val, m); + break; + } + } else { + switch (len) { + case 1: + *val = readb(m); + break; + case 2: + *val = readw(m); + break; + case 4: + *val = readl(m); + break; + } + } + if (tmpmap) + iounmap(m); + ret = 0; +err_out: + mutex_unlock(&gen_acpi_pci_lock); + return ret; +} + +int raw_pci_read(unsigned int domain, unsigned int busn, unsigned int devfn, + int reg, int len, u32 *val) +{ + return raw_pci_op(domain, busn, devfn, reg, len, val, 
false); +} + +int raw_pci_write(unsigned int domain, unsigned int busn, unsigned int devfn, + int reg, int len, u32 val) +{ + return raw_pci_op(domain, busn, devfn, reg, len, &val, true); +}
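For readers following the address arithmetic in raw_pci_op(), here is a minimal user-space sketch of the same two steps: computing the ECAM offset (bus << 20 | devfn << 12 | reg) and dispatching on the access width. It is not the patch's code. The names ecam_offset() and ecam_sim_access() are hypothetical, and the "window" is ordinary memory rather than an ioremap()ed region. Unlike the patch, the sketch rejects widths other than 1, 2 and 4, and it keeps the window base and the register offset separate, so the base pointer is never modified.

```c
#include <stddef.h>
#include <stdint.h>

#define ECAM_BUS_SHIFT   20	/* 1 MiB of config space per bus */
#define ECAM_DEVFN_SHIFT 12	/* 4 KiB of config space per function */

/*
 * Byte offset of a register inside an ECAM window whose first bus is
 * bus_start (hypothetical helper, mirrors the arithmetic in the patch).
 */
static size_t ecam_offset(unsigned int bus_start, unsigned int busn,
			  unsigned int devfn, unsigned int reg)
{
	return ((size_t)(busn - bus_start) << ECAM_BUS_SHIFT) |
	       ((size_t)devfn << ECAM_DEVFN_SHIFT) | reg;
}

/*
 * Simulated 1/2/4-byte access; PCI config space is little-endian, so the
 * value is assembled byte by byte. Returns 0 on success and -1 for an
 * unsupported width (an assumption of this sketch, not of the patch).
 */
static int ecam_sim_access(uint8_t *win, size_t off, int len,
			   uint32_t *val, int write)
{
	uint32_t v = 0;
	int i;

	if (len != 1 && len != 2 && len != 4)
		return -1;

	if (write) {
		for (i = 0; i < len; i++)
			win[off + i] = (uint8_t)(*val >> (8 * i));
		return 0;
	}

	for (i = 0; i < len; i++)
		v |= (uint32_t)win[off + i] << (8 * i);
	*val = v;
	return 0;
}
```

A caller would compute the offset once with ecam_offset() and pass it, together with the unmodified window base, to ecam_sim_access().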
On Fri, Mar 18, 2016 at 1:48 AM, Jayachandran C jchandra@broadcom.com wrote:
Hi Bjorn,
Here is a new patchset for the ACPI PCI controller driver based on the earlier discussion[1].
The first two patches in the patchset implement pci/ecam.c for generic config space access and use it in pci-host-generic.c and related files.
The third patch implements the ACPI PCI host driver using the same ecam access functions. The fourth patch adds the implementation of raw operations.
I have not used the pci_mmcfg_list or the region definitions from x86, but have used a much simpler approach here.
This should apply cleanly on top of the current pci next tree, and can be reviewed as a patchset. To use it on ARM64, we need to pull in about 7 more patches from Tomasz's patchset that fix various issues (like stub code in arm64 pci.c, ACPI companion setup, domain number assignment, IO resources fixup etc.).
If you are okay with this approach, I will work with Tomasz and post the full patchset.
This has been tested on qemu with OVMF for the ACPI part and with device tree for pci-host-generic code.
The full patchset is available at https://github.com/jchandra-brcm/linux.git on branch arm64-acpi-pci, if anyone wants to try it.
Comments, suggestions and testing would be welcome.
Thanks, JC.
Hi Jayachandran
-----Original Message----- From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel- owner@vger.kernel.org] On Behalf Of Jayachandran C Sent: 18 March 2016 17:48 To: Bjorn Helgaas; Tomasz Nowicki; rafael@kernel.org Cc: Jayachandran C; Arnd Bergmann; Will Deacon; Catalin Marinas; Hanjun Guo; Lorenzo Pieralisi; okaya@codeaurora.org; jiang.liu@linux.intel.com; Stefano Stabellini; robert.richter@caviumnetworks.com; Marcin Wojtas; Liviu.Dudau@arm.com; David Daney; Wangyijing; Suravee.Suthikulpanit@amd.com; msalter@redhat.com; linux-pci@vger.kernel.org; linux-arm- kernel@lists.infradead.org; linux-acpi@vger.kernel.org; linux- kernel@vger.kernel.org; linaro-acpi@lists.linaro.org; Jon Masters Subject: Re: [RFC PATCH 0/4] ACPI based PCI host driver with generic ECAM
[...]
I had a look at your patchset and also, in your git repo, at the other patches that you ported over from Tomasz; it seems that we are now missing a quirk mechanism to enable controllers that are not fully ECAM compliant.
This was provided before by Tomasz in: https://lkml.org/lkml/2016/2/16/410
I think we should put something like that back in...
Thanks
Gab
Comments, suggestions and testing would be welcome.
Thanks, JC.
Hi,
On 3/23/2016 6:22 AM, Gabriele Paoloni wrote:
I had a look at your patchset and also, in your git repo, at the other patches that you ported over from Tomasz; it seems that we are now missing a quirk mechanism to enable controllers that are not fully ECAM compliant.
This was provided before by Tomasz in: https://lkml.org/lkml/2016/2/16/410
I think we should put something like that back in...
Thanks
Gab
I was requested to test your patchset. I'll need this mechanism before I can start.
Sinan
On Mon, Mar 28, 2016 at 7:12 PM, Sinan Kaya okaya@codeaurora.org wrote:
Hi,
On 3/23/2016 6:22 AM, Gabriele Paoloni wrote:
I had a look at your patchset and also, in your git repo, at the other patches that you ported over from Tomasz; it seems that we are now missing a quirk mechanism to enable controllers that are not fully ECAM compliant.
This was provided before by Tomasz in: https://lkml.org/lkml/2016/2/16/410
I think we should put something like that back in...
As Tomasz mentioned in his mail, his approach does not work for raw operations. I have added raw operations in my patchset, so we have to come up with a new approach or decide that raw operations can be dropped.
I am waiting for overall acceptance of the patch set before going further along this path.
Thanks
Gab
I was requested to test your patchset. I'll need this mechanism before I can start.
Please see above, we will need to look at the quirks again.
Sinan
JC.
On 09.03.2016 10:13, Tomasz Nowicki wrote:
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please have a look at my previous patch set v4 and check how many of your comments are already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646 Especially patches [0-6] which handle MMCONFIG refactoring.
On 05.03.2016 05:14, Bjorn Helgaas wrote:
On Fri, Mar 04, 2016 at 02:05:56PM +0530, Jayachandran Chandrashekaran Nair wrote:
On Fri, Mar 4, 2016 at 4:21 AM, Bjorn Helgaas helgaas@kernel.org wrote:
Hi Tomasz, Jayachandran, et al,
On Tue, Feb 16, 2016 at 02:53:31PM +0100, Tomasz Nowicki wrote:
From: Jayachandran C jchandra@broadcom.com
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map a mcfg entry, pci_mmconfig_unmap_resource to do the corresponding unmap and pci_mmconfig_enabled to see if the arch setup of mcfg entries was successful. We also provide weak implementations of these, which will be used from ARM64. On x86, we retain the old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64. I need to ponder this some more, so these are just some initial thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated disaster, and (b) we're trying a little too hard to make that mess generic. I think we might be better served if we came up with some cleaner, more generic code that we can use for ARM64 today, and migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be "arch-independent", we will be perpetuating some interfaces and designs that shouldn't be allowed to escape arch/x86.
I think the major decision is whether to maintain the pci_mmcfg_list for all architectures or not. My initial plan was not to do this because of the mess (basically the ECAM region info should be attached to the pci root and not maintained in a separate list that needs locking). The patch I posted initially https://patchwork.ozlabs.org/patch/553464/ had a much simpler way of handling the MCFG table without using the list.
I agree that ECAM info should be attached to the PCI host controller. That should simplify locking and hot-add and hot-removal of host controllers.
I think pci_mmcfg_list is an implementation detail that may not need to be generic. I certainly don't think it needs to be part of the interface.
In the x86 case it is not feasible to stop using the pci_mmcfg_list. Its only use outside the x86 code is in Xen, and that can be fixed up.
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really ACPI-specific, and could potentially be used for non-ACPI bridges that support ECAM. I'd like to see that sort of code moved to a new file like drivers/pci/ecam.c.
There's a little bit of overlap here with the ECAM code in pci-host-generic.c. I'm not sure whether or how to include that, but it's a very good example of how simple this *should* be: probe the host bridge, discover the ECAM region, request the region, ioremap it, done.
I had a similar approach in my initial patchset, please see the patch above. The resource for ECAM is mapped similarly to the way pci-host-generic.c handles it. An additional step I could take is to move the common code (ioremap and mapbus) into a common file and share it with pci-host-generic.c.
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
new file mode 100644
index 0000000..ea84365
--- /dev/null
+++ b/drivers/acpi/pci_mcfg.c
...
+int __weak pci_mmconfig_map_resource(struct device *dev,
+				     struct pci_mmcfg_region *mcfg)
+{
+	struct resource *tmp;
+	void __iomem *vaddr;
+
+	tmp = insert_resource_conflict(&iomem_resource, &mcfg->res);
+	if (tmp) {
+		dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+			 &mcfg->res, tmp->name, tmp);
+		return -EBUSY;
+	}
I think this is a mistake in the x86 MCFG support that we should not carry over to a generic implementation. We should not use the MCFG table for resource reservation because MCFG is not defined by the ACPI spec and an OS need not include support for it. The platform must indicate in some other, more generic way, that ECAM space is reserved. This probably means ECAM space should be declared in a PNP0C02 _CRS method (see the PCI Firmware Spec r3.0, sec 4.1.2, note 2).
We might need some kind of x86-specific quirk that does this, or a pcibios hook or something here; I just don't think it should be generic.
+int __init pci_mmconfig_parse_table(void)
+{
+	return acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+}
I don't like the fact that we parse the entire MCFG table at once here. I think we should look for the information we need when we are claiming a PCI host bridge, e.g., in acpi_pci_root_add(). This might not be a great fit for the way ACPI table management works, but I think it's better to do things on-demand rather than just-in-case.
There is an overhead in looking up this table, and the information available there is very limited (i.e., segment, start_bus, end_bus and address). My approach in the above patch is to save this info into an array at boot time and avoid multiple lookups.
We need to look up MCFG info once per host bridge, so I don't think there's any performance issue here. But we do use acpi_table_parse(), which is __init, and *that* is a reason why we might need to parse the entire MCFG at boot-time. But this is the least of our worries in any case.
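The "save the MCFG info into an array at boot time" idea discussed above can be illustrated with a short user-space sketch. It assumes a saved copy holding only the four fields mentioned in the thread (segment, start bus, end bus, address). The names struct mcfg_entry and mcfg_find() are hypothetical and are not taken from any of the posted patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical saved copy of one MCFG allocation entry. */
struct mcfg_entry {
	uint64_t addr;		/* physical base of the ECAM region */
	uint16_t segment;	/* PCI segment (domain) number */
	uint8_t  bus_start;	/* first bus decoded by the region */
	uint8_t  bus_end;	/* last bus decoded by the region */
};

/*
 * Return the index of the entry in 'segment' that covers the whole bus
 * range [bus_lo, bus_hi], or -1 if no saved entry matches.
 */
static int mcfg_find(const struct mcfg_entry *tbl, size_t n,
		     unsigned int segment,
		     unsigned int bus_lo, unsigned int bus_hi)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].segment == segment &&
		    tbl[i].bus_start <= bus_lo &&
		    bus_hi <= tbl[i].bus_end)
			return (int)i;
	}
	return -1;
}
```

With one lookup per host bridge, as noted above, a linear scan over such an array should be cheap; the array only exists so that the data stays reachable after the __init table parsing code is gone.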
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 89ab057..e9450ef 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -106,6 +106,39 @@ extern const u8 pci_acpi_dsm_uuid[];
 #define RESET_DELAY_DSM		0x08
 #define FUNCTION_DELAY_DSM	0x09

+/* common API to maintain list of MCFG regions */
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+	struct list_head list;
+	struct resource res;
+	u64 address;
+	char __iomem *virt;
+	u16 segment;
+	u8 start_bus;
+	u8 end_bus;
+	char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+			       phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+						 int end, u64 addr);
+extern int pci_mmconfig_map_resource(struct device *dev,
+				     struct pci_mmcfg_region *mcfg);
+extern void pci_mmconfig_unmap_resource(struct pci_mmcfg_region *mcfg);
+extern int pci_mmconfig_enabled(void);
+extern int __init pci_mmconfig_parse_table(void);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus)	((bus) << 20)
+#define PCI_MMCFG_OFFSET(bus, devfn)	((bus) << 20 | (devfn) << 12)
With the exception of pci_mmconfig_parse_table(), nothing here is ACPI-specific. I'd like to see the PCI ECAM-related interfaces (hopefully not these exact ones, but a more rational set) put in something like include/linux/pci-ecam.h.
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I can update this patch to
- drop the pci_mmcfg_list handling from generic case
- move common ECAM code so that it can be shared with pci-host-generic.c
if that is what you are looking for. The code will end up looking much simpler.
I think we should ignore x86 mmconfig for now. It is absurdly complicated and I'm not sure it's fixable. I *do* want to keep drivers/acpi/pci_root.c for all ACPI host bridges, including x86, ia64, and arm64.
So I think we should write generic MCFG and ECAM support from scratch for arm64. Something like this:
- Add an acpi_mcfg_init(), maybe in drivers/acpi/pci_mcfg.c, to be called from acpi_init() to copy MCFG info to something we can access after __init. This would not reserve resources, but probably does have to ioremap() the regions to support raw_pci_read().
As said, the ECAM and ACPI specific code was isolated in the previous patch set. There, I tried to leave the x86 complications in arch/x86/ and extract the generic functionality into drivers/pci/ecam.c as a library.
- Implement raw_pci_read(), which is "special" because ACPI needs it for PCI config access from AML. It's supposed to be "always accessible" and we don't have a struct pci_bus *, so this probably has to use the MCFG copy and the ioremap done above. Maybe it should go in the same file. This is completely independent of the PCI core and PCI data structures.
We were looking for a use case that would justify raw PCI config accessors in the ARM64 world. Unfortunately, nobody was able to show a real one. Do you see a reason why we need them? Our conclusion was to leave them empty for ARM64, which in turn makes the code simpler. I was not an ASWG member while that was under discussion, so I will ask Lorenzo to elaborate on this.
- Implement arm64 pci_acpi_scan_root() that calls acpi_pci_root_create() with an .init_info() function that calls acpi_pci_root_get_mcfg_addr() to read _CBA, and if that fails, looks up the bus range in the MCFG copy from above. It should call request_mem_region(). For a region from _CBA, it should call ioremap(). For regions from MCFG it can probably use the ioremap done by acpi_mcfg_init().
Yes, expanding .init_info() to check for _CBA is a good point.
I know acpi_pci_root_add() calls acpi_pci_root_get_mcfg_addr() before calling pci_acpi_scan_root(), but I think that's wrong because (a) some arches, e.g., ia64, don't use ECAM and (b) _CBA and MCFG should be handled in the same place. I know calling request_mem_region() here will probably be an ordering problem because the PNP0C02 driver hasn't reserved resources yet. But the host bridge driver is using the region and it should reserve it.
- If we store the ECAM mapped base address in the sysdata or struct pci_host_bridge, the normal config accessors can use pci_generic_config_read() with a new generic .map_bus() function.
pci_generic_config_{read|write}() is what we want to use, and in fact we already do. Associating the ECAM region with sysdata will additionally remove the ECAM region lookup step (see patch 09/15 of this series).
On x86, the normal config access path is:

  pci_read(struct pci_bus *, ...)
    raw_pci_read(seg, bus#, ...)
      raw_pci_ext_ops->read(seg, bus#, ...)
        pci_mmcfg_read(seg, bus#, ...)
          pci_dev_base
            pci_mmconfig_lookup(seg, bus#)

I think this is somewhat backwards because we start with a pci_bus pointer, so we *could* have a nice simple bus-specific accessor, but we throw that pointer away, so pci_mmcfg_read() has to start over and look up the ECAM offset from scratch, which makes it all unnecessarily complicated.
As you pointed out raw_pci_{read|write} make things complicated, so IMO we should either say they are absolutely necessary (and then think how to simplify it) or just use simple bus-specific accessor (patch 02/15) e.g. for ARM64.
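To make the contrast with the x86 path concrete, here is a rough user-space sketch of the bus-specific accessor idea: the ECAM window is resolved once per host bridge (a _CBA result first, otherwise the MCFG copy) and hung off the bus, so the accessor needs only arithmetic and no global lookup. All names here (struct ecam_window, struct sim_bus, sim_map_bus(), pick_ecam_base()) are hypothetical stand-ins, not kernel interfaces, and treating a zero address as "not available" is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-bridge ECAM description. */
struct ecam_window {
	uint8_t *base;		/* stands in for an ioremap()ed base */
	uint8_t  bus_start;	/* first bus covered by the window */
	uint8_t  bus_end;	/* last bus covered by the window */
};

/* Hypothetical stand-in for struct pci_bus with its sysdata pointer. */
struct sim_bus {
	uint8_t number;
	struct ecam_window *sysdata;
};

/*
 * Analogue of a generic .map_bus() hook: translate (bus, devfn, where)
 * into an address using only data reachable from the bus itself.
 */
static uint8_t *sim_map_bus(const struct sim_bus *bus, unsigned int devfn,
			    unsigned int where)
{
	const struct ecam_window *w = bus->sysdata;

	if (bus->number < w->bus_start || bus->number > w->bus_end)
		return NULL;

	return w->base + (((size_t)(bus->number - w->bus_start) << 20) |
			  ((size_t)devfn << 12) | where);
}

/*
 * Choose the ECAM base for a bridge: prefer the address reported by _CBA,
 * fall back to the MCFG-derived one. Zero means "not available" here.
 */
static uint64_t pick_ecam_base(uint64_t cba_addr, uint64_t mcfg_addr)
{
	return cba_addr ? cba_addr : mcfg_addr;
}
```

The raw accessors have no bus pointer to start from, which is why they cannot use this shortcut and need some form of segment/bus lookup instead.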
Any comments appreciated.
Hi Bjorn,
Kind reminder. I would like to move on with this patch set. Can you please comment on it so that we can decide which way to go?
Regards, Tomasz
Hi Tomasz,
On Tue, Apr 05, 2016 at 04:11:55PM +0200, Tomasz Nowicki wrote:
Kind reminder. I would like to move on with this patch set. Can you please comment on it so that we can decide which way to go?
Can you repost your current proposal with a version number higher than any previous ones? It's OK if the content is the same as v4; I just think it's confusing if we resurrect v4 and have to follow discussion from v3 to v4 to v5 and back to v4. The archives would be a bit of a muddle.
Bjorn
Hi Bjorn,
On 05.04.2016 18:41, Bjorn Helgaas wrote:
Hi Tomasz,
[...]
As you pointed out raw_pci_{read|write} make things complicated, so IMO we should either say they are absolutely necessary (and then think how to simplify it) or just use simple bus-specific accessor (patch 02/15) e.g. for ARM64.
Any comments appreciated.
Kind reminder. I would like to move on with this patch set. Can you please comment on it so that we can decide which way to go?
Can you repost your current proposal with a version number higher than any previous ones? It's OK if the content is the same as v4; I just think it's confusing if we resurrect v4 and have to follow discussion from v3 to v4 to v5 and back to v4. The archives would be a bit of a muddle.
Sure I will repost ASAP.
Thanks! Tomasz
Hi Bjorn,
On Tue, Apr 5, 2016 at 10:11 PM, Bjorn Helgaas helgaas@kernel.org wrote:
Hi Tomasz,
On Tue, Apr 05, 2016 at 04:11:55PM +0200, Tomasz Nowicki wrote:
On 09.03.2016 10:13, Tomasz Nowicki wrote:
Hi Bjorn,
Thanks for your pointers! See my comments inline. Also, can you please have a look at my previous patch set v4 and check how many of your comments are already addressed there? We may want to go back to it then.
https://lkml.org/lkml/2016/2/4/646 Especially patches [0-6] which handle MMCONFIG refactoring.
On 05.03.2016 05:14, Bjorn Helgaas wrote:
[...]
As you pointed out raw_pci_{read|write} make things complicated, so IMO we should either say they are absolutely necessary (and then think how to simplify it) or just use simple bus-specific accessor (patch 02/15) e.g. for ARM64.
Any comments appreciated.
Kind reminder. I would like to move on with this patch set. Can you please comment on it so that we can decide which way to go?
Can you repost your current proposal with a version number higher than any previous ones? It's OK if the content is the same as v4; I just think it's confusing if we resurrect v4 and have to follow discussion from v3 to v4 to v5 and back to v4. The archives would be a bit of a muddle.
I had posted a patchset based on your suggestions in this thread https://lkml.org/lkml/2016/3/17/621
I would appreciate any comments on that. As I said in the earlier mail, if this is a reasonable approach, I can combine it with Tomasz's patchset to provide the full patchset for ACPI support.
Thanks, JC.
Hi Bjorn,
On 03.03.2016 23:51, Bjorn Helgaas wrote:
Hi Tomasz, Jayachandran, et al,
On Tue, Feb 16, 2016 at 02:53:31PM +0100, Tomasz Nowicki wrote:
From: Jayachandran C jchandra@broadcom.com
Move pci_mmcfg_list handling to a drivers/acpi/pci_mcfg.c. This is to share the API and code with ARM64 later. The corresponding declarations are moved from asm/pci_x86.h to linux/pci-acpi.h
As a part of this we introduce three functions that can be implemented by the arch code: pci_mmconfig_map_resource() to map a mcfg entry, pci_mmconfig_unmap_resource to do the corresponding unmap and pci_mmconfig_enabled to see if the arch setup of mcfg entries was successful. We also provide weak implementations of these, which will be used from ARM64. On x86, we retain the old logic by providing platform specific implementation.
This patch is purely rearranging code, it should not have any impact on the logic of MCFG parsing or list handling.
I definitely want to figure out how to make this work well on ARM64. I need to ponder this some more, so these are just some initial thoughts.
My first impression is that (a) the x86 MCFG code is an unmitigated disaster, and (b) we're trying a little too hard to make that mess generic. I think we might be better served if we came up with some cleaner, more generic code that we can use for ARM64 today, and migrate x86 toward that over time.
My concern is that if we elevate the current x86 code to be "arch-independent", we will be perpetuating some interfaces and designs that shouldn't be allowed to escape arch/x86.
Some of the code that moved to drivers/acpi/pci_mcfg.c is not really ACPI-specific, and could potentially be used for non-ACPI bridges that support ECAM. I'd like to see that sort of code moved to a new file like drivers/pci/ecam.c.
Actually I split it as you suggested in the previous patch set. Please have a look at: https://lkml.org/lkml/2016/2/4/646
Especially patches [0-6] which handle MMCONFIG refactoring.
Thanks, Tomasz
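As background for the ECAM discussion above, here is a minimal sketch of how a config-space register maps to a byte offset from the MCFG base address: each bus gets 1 MiB and each device/function gets 4 KiB. The helper name demo_ecam_offset is hypothetical and not part of this series; the bus shift matches what PCI_MMCFG_BUS_OFFSET() computes on x86.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_ECAM_BUS_SHIFT	20	/* 1 MiB of config space per bus */
#define DEMO_ECAM_DEVFN_SHIFT	12	/* 4 KiB per device/function */

/* Hypothetical helper: byte offset of (bus, devfn, reg) from the MCFG base. */
static inline uint64_t demo_ecam_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
	assert(reg < 4096);	/* extended config space is 4 KiB per function */
	return ((uint64_t)bus << DEMO_ECAM_BUS_SHIFT) |
	       ((uint64_t)devfn << DEMO_ECAM_DEVFN_SHIFT) | reg;
}
```

Code of this kind does not depend on ACPI at all, which is the point behind the suggestion to keep it in a bridge-independent file.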
Let's keep RAW ACPI PCI config space accessors empty by default, since we are not sure if they are necessary across all archs. Once we sort this out, we can provide a generic version or let architectures override them, as x86 does now.
Suggested-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Tested-by: Jeremy Linton jeremy.linton@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 drivers/acpi/pci_mcfg.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index ea84365..0467b00 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -21,6 +21,26 @@ static DEFINE_MUTEX(pci_mmcfg_lock);
 
 LIST_HEAD(pci_mmcfg_list);
 
+/*
+ * raw_pci_read/write - raw ACPI PCI config space accessors.
+ *
+ * By defauly (__weak) these accessors are empty and should be overwritten
+ * by architectures which support operations on ACPI PCI_Config regions,
+ * see osl.c file.
+ */
+
+int __weak raw_pci_read(unsigned int domain, unsigned int bus,
+			unsigned int devfn, int reg, int len, u32 *val)
+{
+	return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
+int __weak raw_pci_write(unsigned int domain, unsigned int bus,
+			 unsigned int devfn, int reg, int len, u32 val)
+{
+	return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
 static void list_add_sorted(struct pci_mmcfg_region *new)
 {
 	struct pci_mmcfg_region *cfg;
On Tue, Feb 16, 2016 at 02:53:32PM +0100, Tomasz Nowicki wrote:
Let's keep RAW ACPI PCI config space accessors empty by default, since we are not sure if they are necessary across all archs. Once we sort this out, we can provide a generic version or let architectures override them, as x86 does now.
"ACPICA code requires raw PCI bus accessors in order to give AML access to PCI_Config regions in platforms where they are actually used. The raw PCI bus accessors implementation is arch-dependent, therefore this patch adds a weak generic implementation (for now empty but can be generalized if common functionality is found among arches) allowing arches where PCI_Config regions are currently required to override it (eg x86) as needed and providing at the same time default stubs for arches that do not require them".
?
Suggested-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Tested-by: Jeremy Linton jeremy.linton@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org

 drivers/acpi/pci_mcfg.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index ea84365..0467b00 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -21,6 +21,26 @@ static DEFINE_MUTEX(pci_mmcfg_lock);
 
 LIST_HEAD(pci_mmcfg_list);
 
+/*
+ * raw_pci_read/write - raw ACPI PCI config space accessors.
+ *
+ * By defauly (__weak) these accessors are empty and should be overwritten

s/defauly/default

+ * by architectures which support operations on ACPI PCI_Config regions,
+ * see osl.c file.

Add the path or remove the file reference.

+ */
+
+int __weak raw_pci_read(unsigned int domain, unsigned int bus,
+			unsigned int devfn, int reg, int len, u32 *val)
+{
+	return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
+int __weak raw_pci_write(unsigned int domain, unsigned int bus,
+			 unsigned int devfn, int reg, int len, u32 val)
+{
+	return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
 static void list_add_sorted(struct pci_mmcfg_region *new)
 {
 	struct pci_mmcfg_region *cfg;

Note: this patch is not strictly required, but it is nice because it removes the raw/dumb/empty accessors from ARM64 code (where they do not belong), so:
Reviewed-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
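The weak-default pattern discussed in this patch can be illustrated outside the kernel with a small sketch. It assumes a GCC/Clang toolchain, where __attribute__((weak)) is what the kernel's __weak expands to. The demo_ names are hypothetical, and DEMO_PCIBIOS_DEVICE_NOT_FOUND only mirrors the kernel's value of PCIBIOS_DEVICE_NOT_FOUND.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PCIBIOS_DEVICE_NOT_FOUND 0x86	/* same value as in the kernel */

/*
 * Weak default stubs: the linker picks these only when no other object
 * file provides a strong definition with the same name, which is how an
 * architecture such as x86 can override them.
 */
__attribute__((weak))
int demo_raw_pci_read(unsigned int domain, unsigned int bus,
		      unsigned int devfn, int reg, int len, uint32_t *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len;
	assert(val != NULL);	/* the stub never writes through val */
	return DEMO_PCIBIOS_DEVICE_NOT_FOUND;
}

__attribute__((weak))
int demo_raw_pci_write(unsigned int domain, unsigned int bus,
		       unsigned int devfn, int reg, int len, uint32_t val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len; (void)val;
	return DEMO_PCIBIOS_DEVICE_NOT_FOUND;
}
```

With no strong definition linked in, callers simply get the "device not found" status back, which is the behaviour an architecture without PCI_Config regions would see.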
We can now enable the MCFG library. Currently, there is no ARM64 use case for RAW PCI config accessors, so let's use the empty ones for now. At the same time, we can clean up the old implementation of RAW accessors in arch/arm64/kernel/pci.c.
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Tested-by: Jeremy Linton jeremy.linton@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 arch/arm64/Kconfig      |  4 ++++
 arch/arm64/kernel/pci.c | 15 ---------------
 2 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cc6228..552e996 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -238,6 +238,10 @@ source "drivers/pci/Kconfig"
 source "drivers/pci/pcie/Kconfig"
 source "drivers/pci/hotplug/Kconfig"
 
+config PCI_MMCONFIG
+	def_bool y
+	depends on ACPI
+
 endmenu
 
 menu "Kernel Features"
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index b3d098b..023b983 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -61,21 +61,6 @@ int pcibios_add_device(struct pci_dev *dev)
 	return 0;
 }
 
-/*
- * raw_pci_read/write - Platform-specific PCI config space access.
- */
-int raw_pci_read(unsigned int domain, unsigned int bus,
-		  unsigned int devfn, int reg, int len, u32 *val)
-{
-	return -ENXIO;
-}
-
-int raw_pci_write(unsigned int domain, unsigned int bus,
-		unsigned int devfn, int reg, int len, u32 val)
-{
-	return -ENXIO;
-}
-
 #ifdef CONFIG_ACPI
 /* Root bridge scanning */
 struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
There are two ways to obtain ECAM (aka MCFG) regions using ACPI: first from the static MCFG table and second from the _CBA method. We cannot remove static regions; however, regions coming from _CBA should be removed when the bridge device is removed.
In light of the above, we need a flag to mark hot-added ECAM entries, and users need to call pci_mmconfig_insert when adding regions from the _CBA method and, similarly, pci_mmconfig_delete when removing hot-added regions.
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Tested-by: Jeremy Linton jeremy.linton@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 drivers/acpi/pci_mcfg.c  | 4 +++-
 include/linux/pci-acpi.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0467b00..3282f2a 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -74,6 +74,7 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 	new->segment = segment;
 	new->start_bus = start;
 	new->end_bus = end;
+	new->hot_added = false;
 
 	res = &new->res;
 	res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
@@ -205,6 +206,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 	}
 	rc = pci_mmconfig_map_resource(dev, cfg);
 	if (!rc) {
+		cfg->hot_added = true;
 		list_add_sorted(cfg);
 		dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
 			 &cfg->res, (unsigned long)addr);
@@ -228,7 +230,7 @@ int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
 	mutex_lock(&pci_mmcfg_lock);
 	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
 		if (cfg->segment == seg && cfg->start_bus == start &&
-		    cfg->end_bus == end) {
+		    cfg->end_bus == end && cfg->hot_added) {
 			list_del_rcu(&cfg->list);
 			synchronize_rcu();
 			pci_mmconfig_unmap_resource(cfg);
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index e9450ef..94d8f38 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -119,6 +119,7 @@ struct pci_mmcfg_region {
 	u8 start_bus;
 	u8 end_bus;
 	char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+	bool hot_added;
 };
 
 extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
On Tue, Feb 16, 2016 at 02:53:34PM +0100, Tomasz Nowicki wrote:
There are two ways to obtain ECAM (aka MCFG) regions using ACPI: first from the static MCFG table and second from the _CBA method. We cannot remove static regions; however, regions coming from _CBA should be removed when the bridge device is removed.
In light of the above, we need a flag to mark hot-added ECAM entries, and users need to call pci_mmconfig_insert when adding regions from the _CBA method and, similarly, pci_mmconfig_delete when removing hot-added regions.
"According to the PCI firmware specification, ACPI provides two standard mechanisms to retrieve ECAM memory mapped configuration regions (aka MCFG). For non-hot-removable bridges, ECAM bridge configurations are retrieved from the static MCFG table and have to be considered non-hot-removable for the current boot; hot-removable PCI host bridges configurations are retrieved through bridges _CBA methods.
When ECAM regions are added through _CBA methods, they can be marked as hot-added so that, upon hot-removal of the respective PCI host bridge, they can be unmapped and deleted, since they are no longer needed.
This patch adds a flag to MCFG regions that marks them as hot-added, so that they can be deleted when the corresponding PCI bridge is hot-removed."
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Tested-by: Jeremy Linton jeremy.linton@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org

 drivers/acpi/pci_mcfg.c  | 4 +++-
 include/linux/pci-acpi.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

It would be great if x86 people can have a look, we no longer associate a MCFG region to a bridge structure, the end result should be equivalent though, so:
Reviewed-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0467b00..3282f2a 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -74,6 +74,7 @@ static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 	new->segment = segment;
 	new->start_bus = start;
 	new->end_bus = end;
+	new->hot_added = false;
 
 	res = &new->res;
 	res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
@@ -205,6 +206,7 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 	}
 	rc = pci_mmconfig_map_resource(dev, cfg);
 	if (!rc) {
+		cfg->hot_added = true;
 		list_add_sorted(cfg);
 		dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
 			 &cfg->res, (unsigned long)addr);
@@ -228,7 +230,7 @@ int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
 	mutex_lock(&pci_mmcfg_lock);
 	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
 		if (cfg->segment == seg && cfg->start_bus == start &&
-		    cfg->end_bus == end) {
+		    cfg->end_bus == end && cfg->hot_added) {
 			list_del_rcu(&cfg->list);
 			synchronize_rcu();
 			pci_mmconfig_unmap_resource(cfg);
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index e9450ef..94d8f38 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -119,6 +119,7 @@ struct pci_mmcfg_region {
 	u8 start_bus;
 	u8 end_bus;
 	char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+	bool hot_added;
 };
 
 extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
-- 
1.9.1
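The hot_added rule from this patch can be sketched in plain C as follows. This is a simplified stand-in, assuming a fixed-size table instead of the kernel's RCU-protected list; all demo_ names are hypothetical and the mapping/unmapping steps are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_REGIONS 8

struct demo_region {
	uint16_t segment;
	uint8_t start_bus;
	uint8_t end_bus;
	bool hot_added;	/* true only for regions that came from _CBA */
	bool in_use;
};

static struct demo_region demo_regions[DEMO_MAX_REGIONS];

/* Register a region; returns 0 on success, -1 if the table is full. */
int demo_region_add(uint16_t seg, uint8_t start, uint8_t end, bool hot_added)
{
	assert(start <= end);
	for (int i = 0; i < DEMO_MAX_REGIONS; i++) {
		if (!demo_regions[i].in_use) {
			demo_regions[i] = (struct demo_region){
				seg, start, end, hot_added, true
			};
			return 0;
		}
	}
	return -1;
}

/* Remove a matching region only if it was hot-added; static ones stay. */
int demo_region_delete(uint16_t seg, uint8_t start, uint8_t end)
{
	for (int i = 0; i < DEMO_MAX_REGIONS; i++) {
		struct demo_region *r = &demo_regions[i];

		if (r->in_use && r->segment == seg && r->start_bus == start &&
		    r->end_bus == end && r->hot_added) {
			r->in_use = false;
			return 0;
		}
	}
	return -1;
}
```

Because the delete path checks the flag itself, a caller can request deletion unconditionally on bridge removal without tracking whether it added the region, which is what the follow-up x86 cleanup relies on.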
x86 uses lots of arch-specific data to maintain MCFG regions. However, there is no need to. Firstly, information like start_bus and end_bus can be obtained from the acpi_pci_root structure. Secondly, the role of the mcfg_added flag is already integrated into the MCFG library, so it is enough to call pci_mmconfig_insert and pci_mmconfig_delete, which handle hot-plugged MCFG regions internally.
This patch implements the above improvements; as a result we get a much smaller pci_root_info structure.
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 arch/x86/pci/acpi.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index cec68e7..081dc70 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -11,11 +11,6 @@
 struct pci_root_info {
 	struct acpi_pci_root_info common;
 	struct pci_sysdata sd;
-#ifdef CONFIG_PCI_MMCONFIG
-	bool mcfg_added;
-	u8 start_bus;
-	u8 end_bus;
-#endif
 };
 
 static bool pci_use_crs = true;
@@ -179,16 +174,13 @@ static int check_segment(u16 seg, struct device *dev, char *estr)
 
 static int setup_mcfg_map(struct acpi_pci_root_info *ci)
 {
-	int result, seg;
-	struct pci_root_info *info;
+	int result, seg, start, end;
 	struct acpi_pci_root *root = ci->root;
 	struct device *dev = &ci->bridge->dev;
 
-	info = container_of(ci, struct pci_root_info, common);
-	info->start_bus = (u8)root->secondary.start;
-	info->end_bus = (u8)root->secondary.end;
-	info->mcfg_added = false;
-	seg = info->sd.domain;
+	seg = root->segment;
+	start = root->secondary.start;
+	end = root->secondary.end;
 
 	/* return success if MMCFG is not in use */
 	if (raw_pci_ext_ops && raw_pci_ext_ops != &pci_mmcfg)
@@ -197,13 +189,11 @@ static int setup_mcfg_map(struct acpi_pci_root_info *ci)
 	if (!(pci_probe & PCI_PROBE_MMCONF))
 		return check_segment(seg, dev, "MMCONFIG is disabled,");
 
-	result = pci_mmconfig_insert(dev, seg, info->start_bus, info->end_bus,
-				     root->mcfg_addr);
+	result = pci_mmconfig_insert(dev, seg, start, end, root->mcfg_addr);
 	if (result == 0) {
 		/* enable MMCFG if it hasn't been enabled yet */
 		if (raw_pci_ext_ops == NULL)
 			raw_pci_ext_ops = &pci_mmcfg;
-		info->mcfg_added = true;
 	} else if (result != -EEXIST)
 		return check_segment(seg, dev,
 			 "fail to add MMCONFIG information,");
@@ -213,14 +203,10 @@ static int setup_mcfg_map(struct acpi_pci_root_info *ci)
 
 static void teardown_mcfg_map(struct acpi_pci_root_info *ci)
 {
-	struct pci_root_info *info;
+	struct acpi_pci_root *root = ci->root;
 
-	info = container_of(ci, struct pci_root_info, common);
-	if (info->mcfg_added) {
-		pci_mmconfig_delete(info->sd.domain,
-				    info->start_bus, info->end_bus);
-		info->mcfg_added = false;
-	}
+	pci_mmconfig_delete(root->segment, root->secondary.start,
+			    root->secondary.end);
 }
 #else
 static int setup_mcfg_map(struct acpi_pci_root_info *ci)
Currently we have two platforms (x86 & ia64) capable of PCI ACPI host bridge initialization. They both use arch-specific sysdata to pass down parent device reference and both rely on NULL parent in pci_create_root_bus() to validate sysdata content.
It looks hacky and prevents us from getting some firmware specific info for PCI host controller based on its acpi_device structure in generic pci_create_root_bus() function. However, we overcome that blocker by passing down parent device via pci_create_root_bus parameter (as the ACPI device type). Then we use ACPI_COMPANION_SET in core code for ACPI boot method only. ACPI_COMPANION_SET is safe to run for all cases DT, ACPI and DT&ACPI.
Since the PCI core code now sets the ACPI companion device for us, the x86 & ia64 specific ACPI companion device setting becomes dead code. We can get rid of it, including the related companion reference in the PCI sysdata structure. Also, the PCI_CONTROLLER macro can no longer return a valid companion device. Therefore we need to convert its usage to ACPI_COMPANION.
Suggested-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Reviewed-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 arch/ia64/hp/common/sba_iommu.c    |  2 +-
 arch/ia64/include/asm/pci.h        |  1 -
 arch/ia64/pci/pci.c                | 16 ----------------
 arch/ia64/sn/kernel/io_acpi_init.c |  4 ++--
 arch/x86/include/asm/pci.h         |  3 ---
 arch/x86/pci/acpi.c                | 17 -----------------
 drivers/acpi/pci_root.c            |  8 +++++++-
 drivers/pci/probe.c                |  2 ++
 8 files changed, 12 insertions(+), 41 deletions(-)

diff --git a/arch/ia64/hp/common/sba_iommu.c b/arch/ia64/hp/common/sba_iommu.c
index a6d6190..78e4444 100644
--- a/arch/ia64/hp/common/sba_iommu.c
+++ b/arch/ia64/hp/common/sba_iommu.c
@@ -1981,7 +1981,7 @@ sba_connect_bus(struct pci_bus *bus)
 	if (PCI_CONTROLLER(bus)->iommu)
 		return;
 
-	handle = acpi_device_handle(PCI_CONTROLLER(bus)->companion);
+	handle = acpi_device_handle(ACPI_COMPANION(bus->bridge));
 	if (!handle)
 		return;
 
diff --git a/arch/ia64/include/asm/pci.h b/arch/ia64/include/asm/pci.h
index 07039d1..5050748 100644
--- a/arch/ia64/include/asm/pci.h
+++ b/arch/ia64/include/asm/pci.h
@@ -65,7 +65,6 @@ extern int pci_mmap_legacy_page_range(struct pci_bus *bus,
 #define pci_legacy_write platform_pci_legacy_write
 
 struct pci_controller {
-	struct acpi_device *companion;
 	void *iommu;
 	int segment;
 	int node;		/* nearest node with memory or NUMA_NO_NODE for global allocation */
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 8f6ac2f..978d6af 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -301,28 +301,12 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	}
 
 	info->controller.segment = root->segment;
-	info->controller.companion = device;
 	info->controller.node = acpi_get_node(device->handle);
 	INIT_LIST_HEAD(&info->io_resources);
 	return acpi_pci_root_create(root, &pci_acpi_root_ops,
 				    &info->common, &info->controller);
 }
 
-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
-{
-	/*
-	 * We pass NULL as parent to pci_create_root_bus(), so if it is not NULL
-	 * here, pci_create_root_bus() has been called by someone else and
-	 * sysdata is likely to be different from what we expect.  Let it go in
-	 * that case.
-	 */
-	if (!bridge->dev.parent) {
-		struct pci_controller *controller = bridge->bus->sysdata;
-		ACPI_COMPANION_SET(&bridge->dev, controller->companion);
-	}
-	return 0;
-}
-
 void pcibios_fixup_device_resources(struct pci_dev *dev)
 {
 	int idx;
diff --git a/arch/ia64/sn/kernel/io_acpi_init.c b/arch/ia64/sn/kernel/io_acpi_init.c
index 0640739..bcfddc2 100644
--- a/arch/ia64/sn/kernel/io_acpi_init.c
+++ b/arch/ia64/sn/kernel/io_acpi_init.c
@@ -132,7 +132,7 @@ sn_get_bussoft_ptr(struct pci_bus *bus)
 	struct acpi_resource_vendor_typed *vendor;
 
 
-	handle = acpi_device_handle(PCI_CONTROLLER(bus)->companion);
+	handle = acpi_device_handle(ACPI_COMPANION(bus->bridge));
 	status = acpi_get_vendor_resource(handle, METHOD_NAME__CRS,
 					  &sn_uuid, &buffer);
 	if (ACPI_FAILURE(status)) {
@@ -360,7 +360,7 @@ sn_acpi_get_pcidev_info(struct pci_dev *dev, struct pcidev_info **pcidev_info,
 	acpi_status status;
 	struct acpi_buffer name_buffer = { ACPI_ALLOCATE_BUFFER, NULL };
 
-	rootbus_handle = acpi_device_handle(PCI_CONTROLLER(dev)->companion);
+	rootbus_handle = acpi_device_handle(ACPI_COMPANION(dev->bus->bridge));
 	status = acpi_evaluate_integer(rootbus_handle, METHOD_NAME__SEG, NULL,
 				       &segment);
 	if (ACPI_SUCCESS(status)) {
diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 4625943..a98c022 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -14,9 +14,6 @@
 struct pci_sysdata {
 	int		domain;		/* PCI domain */
 	int		node;		/* NUMA node */
-#ifdef CONFIG_ACPI
-	struct acpi_device *companion;	/* ACPI companion device */
-#endif
 #ifdef CONFIG_X86_64
 	void		*iommu;		/* IOMMU private data */
 #endif
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 081dc70..c67932e 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -334,7 +334,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 		struct pci_sysdata sd = {
 			.domain = domain,
 			.node = node,
-			.companion = root->device
 		};
 
 		memcpy(bus->sysdata, &sd, sizeof(sd));
@@ -349,7 +348,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 		else {
 			info->sd.domain = domain;
 			info->sd.node = node;
-			info->sd.companion = root->device;
 			bus = acpi_pci_root_create(root, &acpi_pci_root_ops,
 						   &info->common, &info->sd);
 		}
@@ -367,21 +365,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
 	return bus;
 }
 
-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
-{
-	/*
-	 * We pass NULL as parent to pci_create_root_bus(), so if it is not NULL
-	 * here, pci_create_root_bus() has been called by someone else and
-	 * sysdata is likely to be different from what we expect.  Let it go in
-	 * that case.
-	 */
-	if (!bridge->dev.parent) {
-		struct pci_sysdata *sd = bridge->bus->sysdata;
-		ACPI_COMPANION_SET(&bridge->dev, sd->companion);
-	}
-	return 0;
-}
-
 int __init pci_acpi_init(void)
 {
 	struct pci_dev *dev = NULL;
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index ae3fe4e..c2bd6dd 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -846,7 +846,13 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
 
 	pci_acpi_root_add_resources(info);
 	pci_add_resource(&info->resources, &root->secondary);
-	bus = pci_create_root_bus(NULL, busnum, ops->pci_ops,
+
+	/*
+	 * pci_create_root_bus() needs to detect the parent device type,
+	 * so initialize its companion data accordingly.
+	 */
+	ACPI_COMPANION_SET(&device->dev, device);
+	bus = pci_create_root_bus(&device->dev, busnum, ops->pci_ops,
 				  sysdata, &info->resources);
 	if (!bus)
 		goto out_release_info;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d7ab9b..88a4734 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2100,6 +2100,8 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
 	bridge->dev.parent = parent;
 	bridge->dev.release = pci_release_host_bridge_dev;
 	dev_set_name(&bridge->dev, "pci%04x:%02x", pci_domain_nr(b), bus);
+	if (parent)
+		ACPI_COMPANION_SET(&bridge->dev, ACPI_COMPANION(parent));
 	error = pcibios_root_bridge_prepare(bridge);
 	if (error) {
 		kfree(bridge);
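The core idea of the patch above, a bridge inheriting its firmware companion from a real parent device instead of from arch sysdata, can be sketched with stand-in types. The demo_ structures below are hypothetical simplifications of struct device and struct acpi_device, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct demo_fw_node {
	const char *name;	/* stands in for an ACPI device object */
};

struct demo_device {
	struct demo_device *parent;
	struct demo_fw_node *companion;
};

/*
 * Mirror of the probe.c hunk: when a parent is passed down, the bridge
 * takes over the parent's companion; with no parent there is nothing to
 * inherit and the companion stays unset.
 */
void demo_bridge_init(struct demo_device *bridge, struct demo_device *parent)
{
	assert(bridge != NULL);
	bridge->parent = parent;
	bridge->companion = parent ? parent->companion : NULL;
}
```

In this model the core code no longer needs to guess the boot method from a NULL parent; any firmware-specific lookup can start from the companion pointer.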
As we now have valid PCI host bridge device reference we can introduce code that is going to find its bus domain number using ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain assign methods into the corresponding helpers.
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Reviewed-by: Liviu Dudau Liviu.Dudau@arm.com
Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Tested-by: Jeremy Linton jeremy.linton@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 drivers/acpi/pci_root.c  | 18 ++++++++++++++++++
 drivers/pci/pci.c        | 11 +++++++++--
 include/linux/pci-acpi.h |  2 ++
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index c2bd6dd..3b284dc 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -419,6 +419,24 @@ out:
 }
 EXPORT_SYMBOL(acpi_pci_osc_control_set);
 
+int acpi_pci_bus_domain_nr(struct device *parent)
+{
+	struct acpi_device *acpi_dev = to_acpi_device(parent);
+	unsigned long long segment = 0;
+	acpi_status status;
+
+	/*
+	 * If _SEG method does not exist, following ACPI spec (6.5.6)
+	 * all PCI buses belong to domain 0.
+	 */
+	status = acpi_evaluate_integer(acpi_dev->handle, METHOD_NAME__SEG, NULL,
+				       &segment);
+	if (ACPI_FAILURE(status) && status != AE_NOT_FOUND)
+		dev_err(&acpi_dev->dev, "can't evaluate _SEG\n");
+
+	return segment;
+}
+
 static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm)
 {
 	u32 support, control, requested;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 602eb42..d6c768e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -19,6 +19,7 @@
 #include <linux/spinlock.h>
 #include <linux/string.h>
 #include <linux/log2.h>
+#include <linux/pci-acpi.h>
 #include <linux/pci-aspm.h>
 #include <linux/pm_wakeup.h>
 #include <linux/interrupt.h>
@@ -4769,7 +4770,7 @@ int pci_get_new_domain_nr(void)
 }
 
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
-void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
+static int of_pci_bus_domain_nr(struct device *parent)
 {
 	static int use_dt_domains = -1;
 	int domain = of_get_pci_domain_nr(parent->of_node);
@@ -4811,7 +4812,13 @@ void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
 		domain = -1;
 	}
 
-	bus->domain_nr = domain;
+	return domain;
+}
+
+void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent)
+{
+	bus->domain_nr = acpi_disabled ? of_pci_bus_domain_nr(parent) :
+					 acpi_pci_bus_domain_nr(parent);
 }
 #endif
 #endif
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 94d8f38..b4f87ba9 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -22,6 +22,7 @@ static inline acpi_status pci_acpi_remove_pm_notifier(struct acpi_device *dev)
 {
 	return acpi_remove_pm_notifier(dev);
 }
+extern int acpi_pci_bus_domain_nr(struct device *parent);
 extern phys_addr_t acpi_pci_root_get_mcfg_addr(acpi_handle handle);
 
 static inline acpi_handle acpi_find_root_bridge_handle(struct pci_dev *pdev)
@@ -143,6 +144,7 @@ extern struct list_head pci_mmcfg_list;
 #else	/* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
+static inline int acpi_pci_bus_domain_nr(struct device *parent) { return -1; }
 #endif	/* CONFIG_ACPI */
 
 #ifdef CONFIG_ACPI_APEI
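The _SEG fallback rule described in this patch (an absent _SEG means domain 0, and only other failures are worth reporting) can be sketched in isolation. The status enum, the callback type and all demo_ names are hypothetical stand-ins for acpi_status and acpi_evaluate_integer(); this is not the kernel code.

```c
#include <assert.h>
#include <stddef.h>

enum demo_status { DEMO_OK, DEMO_NOT_FOUND, DEMO_ERROR };

/* Stand-in for evaluating _SEG on a host bridge. */
typedef enum demo_status (*demo_seg_eval_fn)(unsigned long long *out);

enum demo_status demo_seg_absent(unsigned long long *out)
{
	(void)out;
	return DEMO_NOT_FOUND;	/* firmware does not provide _SEG */
}

enum demo_status demo_seg_three(unsigned long long *out)
{
	*out = 3;		/* firmware reports segment 3 */
	return DEMO_OK;
}

enum demo_status demo_seg_broken(unsigned long long *out)
{
	(void)out;
	return DEMO_ERROR;	/* evaluation failed for another reason */
}

/*
 * Returns the domain number. *real_error is set to 1 only for failures
 * other than "method not found", since a missing _SEG is legal.
 */
int demo_bus_domain_nr(demo_seg_eval_fn eval_seg, int *real_error)
{
	unsigned long long segment = 0;
	enum demo_status st;

	assert(eval_seg != NULL && real_error != NULL);
	st = eval_seg(&segment);
	*real_error = (st != DEMO_OK && st != DEMO_NOT_FOUND);
	if (st != DEMO_OK)
		segment = 0;	/* default domain when _SEG gives no value */
	return (int)segment;
}
```

The sketch forces the result to 0 on any failure; the patch gets the same effect by initialising segment to 0 before the evaluation.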
Tomasz, Lorenzo,
On Tue, Feb 16, 2016 at 7:23 PM, Tomasz Nowicki tn@semihalf.com wrote:
As we now have valid PCI host bridge device reference we can introduce code that is going to find its bus domain number using ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for this without calling the _SEG method again. Please see http://www.spinics.net/lists/arm-kernel/msg478167.html at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
JC.
On 17.02.2016 14:44, Jayachandran Chandrashekaran Nair wrote:
Tomasz, Lorenzo,
On Tue, Feb 16, 2016 at 7:23 PM, Tomasz Nowicki tn@semihalf.com wrote:
As we now have valid PCI host bridge device reference we can introduce code that is going to find its bus domain number using ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for this without calling the _SEG method again. Please see http://www.spinics.net/lists/arm-kernel/msg478167.html at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly way. This may hit us again once we want to obtain another firmware specific info e.g. numa node. IMO we need to fix it this way.
Tomasz
On Wed, Feb 17, 2016 at 7:37 PM, Tomasz Nowicki tn@semihalf.com wrote:
On 17.02.2016 14:44, Jayachandran Chandrashekaran Nair wrote:
Tomasz, Lorenzo,
On Tue, Feb 16, 2016 at 7:23 PM, Tomasz Nowicki tn@semihalf.com wrote:
As we now have valid PCI host bridge device reference we can introduce code that is going to find its bus domain number using ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for this without calling the _SEG method again. Please see http://www.spinics.net/lists/arm-kernel/msg478167.html at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly way. This may hit us again once we want to obtain another firmware specific info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL in case of ACPI, and the check is needed not to crash (unless that has changed).
The main part was the macro acpi_pci_get_segment() and the use of acpi_pci_root_info from sysdata to do this.
JC.
On 17.02.2016 15:21, Jayachandran Chandrashekaran Nair wrote:
On Wed, Feb 17, 2016 at 7:37 PM, Tomasz Nowicki tn@semihalf.com wrote:
On 17.02.2016 14:44, Jayachandran Chandrashekaran Nair wrote:
Tomasz, Lorenzo,
On Tue, Feb 16, 2016 at 7:23 PM, Tomasz Nowicki tn@semihalf.com wrote:
As we now have valid PCI host bridge device reference we can introduce code that is going to find its bus domain number using ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for this without calling the _SEG method again. Please see http://www.spinics.net/lists/arm-kernel/msg478167.html at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly way. This may hit us again once we want to obtain another firmware specific info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL in case of ACPI, and the check is needed not to crash (unless that has changed).
This series passes down valid parent, see [PATCH V5 06/15].
The main part was the macro acpi_pci_get_segment() and the use of acpi_pci_root_info from sysdata to do this.
Since we can obtain related firmware specific data from valid parent device (without defining another accessors), I do not see the point to use sysdata. Let me know your opinion.
Tomasz
On Wed, Feb 17, 2016 at 8:35 PM, Tomasz Nowicki tn@semihalf.com wrote:
On 17.02.2016 15:21, Jayachandran Chandrashekaran Nair wrote:
On Wed, Feb 17, 2016 at 7:37 PM, Tomasz Nowicki tn@semihalf.com wrote:
On 17.02.2016 14:44, Jayachandran Chandrashekaran Nair wrote:
Tomasz, Lorenzo,
On Tue, Feb 16, 2016 at 7:23 PM, Tomasz Nowicki tn@semihalf.com wrote:
As we now have valid PCI host bridge device reference we can introduce code that is going to find its bus domain number using ACPI _SEG method.
Note that _SEG method is optional, therefore _SEG absence means that all PCI buses belong to domain 0.
While at it, for the sake of code clarity we put ACPI and DT domain assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for this without calling the _SEG method again. Please see http://www.spinics.net/lists/arm-kernel/msg478167.html at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly way. This may hit us again once we want to obtain another firmware specific info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL in case of ACPI, and the check is needed not to crash (unless that has changed).
This series passes down valid parent, see [PATCH V5 06/15].
The main part was the macro acpi_pci_get_segment() and the use of acpi_pci_root_info from sysdata to do this.
Since we can obtain related firmware specific data from valid parent device (without defining another accessors), I do not see the point to use sysdata. Let me know your opinion.
In the patch, you use the parent info and call _SEG method again. The segment information is available in the ->root->segment of acpi_pci_root_info if you setup the sysdata like in my patch
JC.
On 17.02.2016 16:21, Jayachandran Chandrashekaran Nair wrote:
On Wed, Feb 17, 2016 at 8:35 PM, Tomasz Nowicki tn@semihalf.com wrote:
On 17.02.2016 15:21, Jayachandran Chandrashekaran Nair wrote:
On Wed, Feb 17, 2016 at 7:37 PM, Tomasz Nowicki tn@semihalf.com wrote:
On 17.02.2016 14:44, Jayachandran Chandrashekaran Nair wrote:
Tomasz, Lorenzo,
On Tue, Feb 16, 2016 at 7:23 PM, Tomasz Nowicki tn@semihalf.com wrote:
> As we now have valid PCI host bridge device reference we can
> introduce code that is going to find its bus domain number using
> ACPI _SEG method.
>
> Note that _SEG method is optional, therefore _SEG absence means
> that all PCI buses belong to domain 0.
>
> While at it, for the sake of code clarity we put ACPI and DT domain
> assign methods into the corresponding helpers.
In my patchset, I had a slightly different and I think better approach for this without calling the _SEG method again. Please see http://www.spinics.net/lists/arm-kernel/msg478167.html at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly way. This may hit us again once we want to obtain another firmware specific info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL in case of ACPI, and the check is needed not to crash (unless that has changed).
This series passes down valid parent, see [PATCH V5 06/15].
The main part was the macro acpi_pci_get_segment() and the use of acpi_pci_root_info from sysdata to do this.
Since we can obtain related firmware specific data from valid parent device (without defining another accessors), I do not see the point to use sysdata. Let me know your opinion.
In the patch, you use the parent info and call _SEG method again. The segment information is available in the ->root->segment of acpi_pci_root_info if you setup the sysdata like in my patch
I know it is in sysdata->root->segment, but the way it is passed down is wrong. sysdata is the pointer to unknown content (void *) so we need to validate it before we can use it. If we merge this patch we can remove first _SEG call.
Tomasz
Guys,
On Wed, Feb 17, 2016 at 04:35:30PM +0100, Tomasz Nowicki wrote:
[...]
In my patchset, I had a slightly different and I think better approach for this without calling the _SEG method again. Please see http://www.spinics.net/lists/arm-kernel/msg478167.html at the last part of http://www.spinics.net/lists/arm-kernel/msg478169.html
Relying on NULL parent device to make decision on boot method is really ugly way. This may hit us again once we want to obtain another firmware specific info e.g. numa node. IMO we need to fix it this way.
I am not relying on NULL there, in the current code parent is NULL in case of ACPI, and the check is needed not to crash (unless that has changed).
This series passes down valid parent, see [PATCH V5 06/15].
The main part was the macro acpi_pci_get_segment() and the use of acpi_pci_root_info from sysdata to do this.
Since we can obtain related firmware specific data from valid parent device (without defining another accessors), I do not see the point to use sysdata. Let me know your opinion.
In the patch, you use the parent info and call _SEG method again. The segment information is available in the ->root->segment of acpi_pci_root_info if you setup the sysdata like in my patch
I know it is in sysdata->root->segment, but the way it is passed down is wrong. sysdata is the pointer to unknown content (void *) so we need to validate it before we can use it. If we merge this patch we can remove first _SEG call.
I personally do not think there is such a significant difference, both solutions have pros and cons, it is worth keeping in mind though that reading _SEG again to set the bus domain number works only if the value we stash in acpi_pci_root.segment is not overridden, if it is (ie see x86 - agreed that's to fix a FW bug) we have a disconnect.
On the other hand Tomasz's code allows removing some IA64 code in the process (code that sets the bridge companion, so part of the patch should be kept regardless).
So, there are two things to do:
- Assign the bridge companion in PCI core code
- Decide where to get the domain number from (acpi_pci_root.segment vs calling _SEG again). At present they are equivalent so I do not see any compelling reason to change this patch.
Side note: there is already a function (pci_domain_nr()) that you can implement in ACPI PCI host generic (by deselecting PCI_DOMAINS_GENERIC if ACPI) so there is no need for acpi_pci_get_segment() in case we have to override _SEG value in the future, at present there is no need, comments appreciated.
Lorenzo
x86 and ia64 are the only arches that implement pcibios_{add|remove}_bus hooks and implement them in the same way. Moreover ARM64 is going to do the same. So it seems that acpi_pci_{add|remove}_bus is generic enough to be default option for pcibios_{add|remove}_bus hooks. Also, it is always safe to run acpi_pci_{add|remove}_bus as they have empty stubs for !ACPI case and return if ACPI has been switched off in run time.
With that in place, we can remove the x86 and ia64 pcibios_{add|remove}_bus implementations.
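As a rough illustration of the mechanism described above, here is a minimal user-space sketch of a weak default hook that forwards to a helper which bails out when ACPI is off. The demo_* names are hypothetical stand-ins rather than kernel symbols, and the sketch assumes GCC or Clang on an ELF target.

```c
/* User-space model of a __weak default hook; all demo_* names are hypothetical. */
int demo_acpi_disabled;		/* models ACPI being switched off at run time */
int demo_buses_added;		/* counts buses the ACPI helper actually handled */

/* Models acpi_pci_add_bus(): returns early when ACPI is off. */
void demo_acpi_pci_add_bus(int busnr)
{
	(void)busnr;
	if (demo_acpi_disabled)
		return;
	demo_buses_added++;
}

/*
 * Weak default: the linker uses it unless another object file provides
 * a strong definition of the same symbol (an arch-specific override).
 */
__attribute__((weak)) void demo_pcibios_add_bus(int busnr)
{
	demo_acpi_pci_add_bus(busnr);
}
```

An architecture that needs different behaviour would simply supply its own non-weak demo_pcibios_add_bus(); everyone else gets the forwarding default.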
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Reviewed-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 arch/ia64/pci/pci.c   | 10 ----------
 arch/x86/pci/common.c | 10 ----------
 drivers/pci/probe.c   |  3 +++
 3 files changed, 3 insertions(+), 20 deletions(-)

diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c
index 978d6af..be4c9ef 100644
--- a/arch/ia64/pci/pci.c
+++ b/arch/ia64/pci/pci.c
@@ -358,16 +358,6 @@ void pcibios_fixup_bus(struct pci_bus *b)
 	platform_pci_fixup_bus(b);
 }
 
-void pcibios_add_bus(struct pci_bus *bus)
-{
-	acpi_pci_add_bus(bus);
-}
-
-void pcibios_remove_bus(struct pci_bus *bus)
-{
-	acpi_pci_remove_bus(bus);
-}
-
 void pcibios_set_master (struct pci_dev *dev)
 {
 	/* No special bus mastering setup handling */
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 2879efc..5aa25f1 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -171,16 +171,6 @@ void pcibios_fixup_bus(struct pci_bus *b)
 		pcibios_fixup_device_resources(dev);
 }
 
-void pcibios_add_bus(struct pci_bus *bus)
-{
-	acpi_pci_add_bus(bus);
-}
-
-void pcibios_remove_bus(struct pci_bus *bus)
-{
-	acpi_pci_remove_bus(bus);
-}
-
 /*
  * Only use DMI information to set this if nothing was passed
  * on the kernel command line (which was parsed earlier).
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 88a4734..9859b12 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -12,6 +12,7 @@
 #include <linux/slab.h>
 #include <linux/module.h>
 #include <linux/cpumask.h>
+#include <linux/pci-acpi.h>
 #include <linux/pci-aspm.h>
 #include <linux/aer.h>
 #include <linux/acpi.h>
@@ -2060,10 +2061,12 @@ int __weak pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
 
 void __weak pcibios_add_bus(struct pci_bus *bus)
 {
+	acpi_pci_add_bus(bus);
 }
 
 void __weak pcibios_remove_bus(struct pci_bus *bus)
 {
+	acpi_pci_remove_bus(bus);
 }
 
 struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
We use generic accessors from access.c by default. However, we already know of platforms that need special handling when accessing PCI config space. These platforms will need a different set of accessors, matched against a <platform ID, domain, bus> tuple. Therefore we are going to add (in the future) DECLARE_ACPI_MCFG_FIXUP, which will register platform-specific custom accessors. For now, we let pci_mcfg_get_ops take an acpi_pci_root structure as an argument and leave some room for the quirk matching algorithm.
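To make the address arithmetic behind the default map_bus accessor concrete, here is a small sketch of how a config-space offset is formed within an MMCONFIG region. The helper names are hypothetical; the shifts mirror PCI_MMCFG_BUS_OFFSET() and the devfn << 12 term used in this patch.

```c
#include <stdint.h>

/* devfn packs a 5-bit device number and a 3-bit function number. */
uint8_t demo_devfn(uint8_t dev, uint8_t fn)
{
	return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/*
 * Offset of a register inside the mapped region: 1 MB per bus,
 * 4 KB per function, plus the register offset itself.
 */
uint32_t demo_ecam_offset(uint8_t bus, uint8_t devfn, uint16_t where)
{
	return ((uint32_t)bus << 20 | (uint32_t)devfn << 12) + where;
}
```

For bus 1, device 1, function 0 and register 0x10 this gives 0x100000 + 0x8000 + 0x10; the accessor then adds that offset to the region's virtual base.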
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 drivers/acpi/pci_mcfg.c  | 30 ++++++++++++++++++++++++++++++
 include/linux/pci-acpi.h | 12 ++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 3282f2a..0062257 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,36 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
 	return PCIBIOS_DEVICE_NOT_FOUND;
 }
 
+void __iomem *
+pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset)
+{
+	struct pci_mmcfg_region *cfg;
+
+	cfg = pci_mmconfig_lookup(pci_domain_nr(bus), bus->number);
+	if (cfg && cfg->virt)
+		return cfg->virt +
+			(PCI_MMCFG_BUS_OFFSET(bus->number) | (devfn << 12)) +
+			offset;
+	return NULL;
+}
+
+/* Default generic PCI config accessors */
+static struct pci_ops default_pci_mcfg_ops = {
+	.map_bus = pci_mcfg_dev_base,
+	.read = pci_generic_config_read,
+	.write = pci_generic_config_write,
+};
+
+struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
+{
+	/*
+	 * TODO: Match against platform specific quirks and return
+	 * corresponding PCI config space accessor set.
+	 */
+
+	return &default_pci_mcfg_ops;
+}
+
 static void list_add_sorted(struct pci_mmcfg_region *new)
 {
 	struct pci_mmcfg_region *cfg;
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index b4f87ba9..3dc6a8c 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -141,6 +141,18 @@ extern struct list_head pci_mmcfg_list;
 #define PCI_MMCFG_BUS_OFFSET(bus)      ((bus) << 20)
 #define PCI_MMCFG_OFFSET(bus, devfn)   ((bus) << 20 | (devfn) << 12)
 
+#ifdef CONFIG_PCI_MMCONFIG
+extern struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root);
+extern void __iomem *pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn,
+				       int offset);
+#else
+static inline struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
+{ return NULL; }
+static inline void __iomem *pci_mcfg_dev_base(struct pci_bus *bus,
+					      unsigned int devfn, int offset)
+{ return NULL; }
+#endif
+
 #else /* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
On Tue, Feb 16, 2016 at 02:53:39PM +0100, Tomasz Nowicki wrote:
We use generic accessors from access.c by default. However, we already know of platforms that need special handling when accessing PCI config space. These platforms will need a different set of accessors, matched against a <platform ID, domain, bus> tuple. Therefore we are going to add (in the future) DECLARE_ACPI_MCFG_FIXUP, which will register platform-specific custom accessors. For now, we let pci_mcfg_get_ops take an acpi_pci_root structure as an argument and leave some room for the quirk matching algorithm.
You should not describe the future (because you do not know if/when that will be implemented), you should describe what the patch does in its current form.
"This patch implements MCFG based PCI bus operations through MCFG map function and generic PCI accessors".
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org

 drivers/acpi/pci_mcfg.c  | 30 ++++++++++++++++++++++++++++++
 include/linux/pci-acpi.h | 12 ++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 3282f2a..0062257 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,36 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
 	return PCIBIOS_DEVICE_NOT_FOUND;
 }
 
+void __iomem *
+pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset)
+{
+	struct pci_mmcfg_region *cfg;
+
+	cfg = pci_mmconfig_lookup(pci_domain_nr(bus), bus->number);
+	if (cfg && cfg->virt)
+		return cfg->virt +
+			(PCI_MMCFG_BUS_OFFSET(bus->number) | (devfn << 12)) +
+			offset;
+	return NULL;
+}
+
+/* Default generic PCI config accessors */
+static struct pci_ops default_pci_mcfg_ops = {
+	.map_bus = pci_mcfg_dev_base,
+	.read = pci_generic_config_read,
+	.write = pci_generic_config_write,
+};
Nit: s/default_pci_mcfg_ops/pci_mcfg_ops
+struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
+{
+	/*
+	 * TODO: Match against platform specific quirks and return
+	 * corresponding PCI config space accessor set.
+	 */
Remove this comment, see above.
+	return &default_pci_mcfg_ops;
See above.
+}
+
 static void list_add_sorted(struct pci_mmcfg_region *new)
 {
 	struct pci_mmcfg_region *cfg;
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index b4f87ba9..3dc6a8c 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -141,6 +141,18 @@ extern struct list_head pci_mmcfg_list;
 #define PCI_MMCFG_BUS_OFFSET(bus)      ((bus) << 20)
 #define PCI_MMCFG_OFFSET(bus, devfn)   ((bus) << 20 | (devfn) << 12)
 
+#ifdef CONFIG_PCI_MMCONFIG
+extern struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root);
+extern void __iomem *pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn,
+				       int offset);
+#else
+static inline struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
+{ return NULL; }
+static inline void __iomem *pci_mcfg_dev_base(struct pci_bus *bus,
+					      unsigned int devfn, int offset)
+{ return NULL; }
+#endif
+
 #else /* CONFIG_ACPI */
 static inline void acpi_pci_add_bus(struct pci_bus *bus) { }
 static inline void acpi_pci_remove_bus(struct pci_bus *bus) { }
I think it can even be squashed, anyway:
Reviewed-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
No functional changes in this patch.
PCI I/O space mapping code does not depend on OF, therefore it can be moved to PCI core code. This way we will be able to use it e.g. in ACPI PCI code.
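For context, the bookkeeping being moved can be modelled in user space as follows. This is only a sketch under simplifying assumptions: it uses a fixed array instead of a locked list, drops the 64K fallback, and assumes a 16 MB logical port space; all demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_MAX_RANGES		8
#define DEMO_IO_SPACE_LIMIT	0xffffffUL	/* assumed size of logical port space */
#define DEMO_BAD_PIO		((unsigned long)-1)

struct demo_io_range {
	uint64_t start;		/* CPU physical address of the I/O window */
	uint64_t size;
};

struct demo_io_range demo_ranges[DEMO_MAX_RANGES];
int demo_nr_ranges;

/* Record a window; returns 0 on success or if already known, -1 otherwise. */
int demo_register_io_range(uint64_t addr, uint64_t size)
{
	uint64_t allocated = 0;
	int i;

	for (i = 0; i < demo_nr_ranges; i++) {
		struct demo_io_range *r = &demo_ranges[i];

		if (addr >= r->start && addr + size <= r->start + r->size)
			return 0;	/* already registered */
		allocated += r->size;
	}
	if (demo_nr_ranges == DEMO_MAX_RANGES ||
	    allocated + size - 1 > DEMO_IO_SPACE_LIMIT)
		return -1;

	demo_ranges[demo_nr_ranges].start = addr;
	demo_ranges[demo_nr_ranges].size = size;
	demo_nr_ranges++;
	return 0;
}

/* Translate a CPU address into a logical port number. */
unsigned long demo_address_to_pio(uint64_t address)
{
	uint64_t offset = 0;
	int i;

	for (i = 0; i < demo_nr_ranges; i++) {
		struct demo_io_range *r = &demo_ranges[i];

		if (address >= r->start && address < r->start + r->size)
			return (unsigned long)(address - r->start + offset);
		offset += r->size;
	}
	return DEMO_BAD_PIO;
}
```

Windows are packed back to back in registration order, so the port number of an address is its offset within its window plus the total size of all windows registered before it. Nothing here depends on a device tree, which is the point of the move.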
Suggested-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Signed-off-by: Tomasz Nowicki tn@semihalf.com
CC: Arnd Bergmann arnd@arndb.de
CC: Liviu Dudau Liviu.Dudau@arm.com
CC: Lorenzo Pieralisi Lorenzo.Pieralisi@arm.com
---
 drivers/of/address.c       | 116 +--------------------------------------------
 drivers/pci/pci.c          | 115 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/of_address.h |   9 ----
 include/linux/pci.h        |   5 ++
 4 files changed, 121 insertions(+), 124 deletions(-)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 91a469d..0a553c0 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -4,6 +4,7 @@
 #include <linux/ioport.h>
 #include <linux/module.h>
 #include <linux/of_address.h>
+#include <linux/pci.h>
 #include <linux/pci_regs.h>
 #include <linux/sizes.h>
 #include <linux/slab.h>
@@ -673,121 +674,6 @@ const __be32 *of_get_address(struct device_node *dev, int index, u64 *size,
 }
 EXPORT_SYMBOL(of_get_address);
 
-#ifdef PCI_IOBASE
-struct io_range {
-	struct list_head list;
-	phys_addr_t start;
-	resource_size_t size;
-};
-
-static LIST_HEAD(io_range_list);
-static DEFINE_SPINLOCK(io_range_lock);
-#endif
-
-/*
- * Record the PCI IO range (expressed as CPU physical address + size).
- * Return a negative value if an error has occured, zero otherwise
- */
-int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size)
-{
-	int err = 0;
-
-#ifdef PCI_IOBASE
-	struct io_range *range;
-	resource_size_t allocated_size = 0;
-
-	/* check if the range hasn't been previously recorded */
-	spin_lock(&io_range_lock);
-	list_for_each_entry(range, &io_range_list, list) {
-		if (addr >= range->start && addr + size <= range->start + size) {
-			/* range already registered, bail out */
-			goto end_register;
-		}
-		allocated_size += range->size;
-	}
-
-	/* range not registed yet, check for available space */
-	if (allocated_size + size - 1 > IO_SPACE_LIMIT) {
-		/* if it's too big check if 64K space can be reserved */
-		if (allocated_size + SZ_64K - 1 > IO_SPACE_LIMIT) {
-			err = -E2BIG;
-			goto end_register;
-		}
-
-		size = SZ_64K;
-		pr_warn("Requested IO range too big, new size set to 64K\n");
-	}
-
-	/* add the range to the list */
-	range = kzalloc(sizeof(*range), GFP_ATOMIC);
-	if (!range) {
-		err = -ENOMEM;
-		goto end_register;
-	}
-
-	range->start = addr;
-	range->size = size;
-
-	list_add_tail(&range->list, &io_range_list);
-
-end_register:
-	spin_unlock(&io_range_lock);
-#endif
-
-	return err;
-}
-
-phys_addr_t pci_pio_to_address(unsigned long pio)
-{
-	phys_addr_t address = (phys_addr_t)OF_BAD_ADDR;
-
-#ifdef PCI_IOBASE
-	struct io_range *range;
-	resource_size_t allocated_size = 0;
-
-	if (pio > IO_SPACE_LIMIT)
-		return address;
-
-	spin_lock(&io_range_lock);
-	list_for_each_entry(range, &io_range_list, list) {
-		if (pio >= allocated_size && pio < allocated_size + range->size) {
-			address = range->start + pio - allocated_size;
-			break;
-		}
-		allocated_size += range->size;
-	}
-
-	spin_unlock(&io_range_lock);
-#endif
-
-	return address;
-}
-
-unsigned long __weak pci_address_to_pio(phys_addr_t address)
-{
-#ifdef PCI_IOBASE
-	struct io_range *res;
-	resource_size_t offset = 0;
-	unsigned long addr = -1;
-
-	spin_lock(&io_range_lock);
-	list_for_each_entry(res, &io_range_list, list) {
-		if (address >= res->start && address < res->start + res->size) {
-			addr = address - res->start + offset;
-			break;
-		}
-		offset += res->size;
-	}
-	spin_unlock(&io_range_lock);
-
-	return addr;
-#else
-	if (address > IO_SPACE_LIMIT)
-		return (unsigned long)-1;
-
-	return (unsigned long) address;
-#endif
-}
-
 static int __of_address_to_resource(struct device_node *dev,
 		const __be32 *addrp, u64 size, unsigned int flags,
 		const char *name, struct resource *r)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d6c768e..3a516c0 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3023,6 +3023,121 @@ int pci_request_regions_exclusive(struct pci_dev *pdev, const char *res_name)
 }
 EXPORT_SYMBOL(pci_request_regions_exclusive);
 
+#ifdef PCI_IOBASE
+struct io_range {
+	struct list_head list;
+	phys_addr_t start;
+	resource_size_t size;
+};
+
+static LIST_HEAD(io_range_list);
+static DEFINE_SPINLOCK(io_range_lock);
+#endif
+
+/*
+ * Record the PCI IO range (expressed as CPU physical address + size).
+ * Return a negative value if an error has occured, zero otherwise
+ */
+int __weak pci_register_io_range(phys_addr_t addr, resource_size_t size)
+{
+	int err = 0;
+
+#ifdef PCI_IOBASE
+	struct io_range *range;
+	resource_size_t allocated_size = 0;
+
+	/* check if the range hasn't been previously recorded */
+	spin_lock(&io_range_lock);
+	list_for_each_entry(range, &io_range_list, list) {
+		if (addr >= range->start && addr + size <= range->start + size) {
+			/* range already registered, bail out */
+			goto end_register;
+		}
+		allocated_size += range->size;
+	}
+
+	/* range not registed yet, check for available space */
+	if (allocated_size + size - 1 > IO_SPACE_LIMIT) {
+		/* if it's too big check if 64K space can be reserved */
+		if (allocated_size + SZ_64K - 1 > IO_SPACE_LIMIT) {
+			err = -E2BIG;
+			goto end_register;
+		}
+
+		size = SZ_64K;
+		pr_warn("Requested IO range too big, new size set to 64K\n");
+	}
+
+	/* add the range to the list */
+	range = kzalloc(sizeof(*range), GFP_ATOMIC);
+	if (!range) {
+		err = -ENOMEM;
+		goto end_register;
+	}
+
+	range->start = addr;
+	range->size = size;
+
+	list_add_tail(&range->list, &io_range_list);
+
+end_register:
+	spin_unlock(&io_range_lock);
+#endif
+
+	return err;
+}
+
+phys_addr_t pci_pio_to_address(unsigned long pio)
+{
+	phys_addr_t address = (phys_addr_t)OF_BAD_ADDR;
+
+#ifdef PCI_IOBASE
+	struct io_range *range;
+	resource_size_t allocated_size = 0;
+
+	if (pio > IO_SPACE_LIMIT)
+		return address;
+
+	spin_lock(&io_range_lock);
+	list_for_each_entry(range, &io_range_list, list) {
+		if (pio >= allocated_size && pio < allocated_size + range->size) {
+			address = range->start + pio - allocated_size;
+			break;
+		}
+		allocated_size += range->size;
+	}
+
+	spin_unlock(&io_range_lock);
+#endif
+
+	return address;
+}
+
+unsigned long __weak pci_address_to_pio(phys_addr_t address)
+{
+#ifdef PCI_IOBASE
+	struct io_range *res;
+	resource_size_t offset = 0;
+	unsigned long addr = -1;
+
+	spin_lock(&io_range_lock);
+	list_for_each_entry(res, &io_range_list, list) {
+		if (address >= res->start && address < res->start + res->size) {
+			addr = address - res->start + offset;
+			break;
+		}
+		offset += res->size;
+	}
+	spin_unlock(&io_range_lock);
+
+	return addr;
+#else
+	if (address > IO_SPACE_LIMIT)
+		return (unsigned long)-1;
+
+	return (unsigned long) address;
+#endif
+}
+
 /**
  * pci_remap_iospace - Remap the memory mapped I/O space
  * @res: Resource describing the I/O space
diff --git a/include/linux/of_address.h b/include/linux/of_address.h
index 01c0a55..3786473 100644
--- a/include/linux/of_address.h
+++ b/include/linux/of_address.h
@@ -47,10 +47,6 @@ void __iomem *of_io_request_and_map(struct device_node *device,
 extern const __be32 *of_get_address(struct device_node *dev, int index,
 			   u64 *size, unsigned int *flags);
 
-extern int pci_register_io_range(phys_addr_t addr, resource_size_t size);
-extern unsigned long pci_address_to_pio(phys_addr_t addr);
-extern phys_addr_t pci_pio_to_address(unsigned long pio);
-
 extern int of_pci_range_parser_init(struct of_pci_range_parser *parser,
 			struct device_node *node);
 extern struct of_pci_range *of_pci_range_parser_one(
@@ -86,11 +82,6 @@ static inline const __be32 *of_get_address(struct device_node *dev, int index,
 	return NULL;
 }
 
-static inline phys_addr_t pci_pio_to_address(unsigned long pio)
-{
-	return 0;
-}
-
 static inline int of_pci_range_parser_init(struct of_pci_range_parser *parser,
 			struct device_node *node)
 {
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 27df4a6..dac677c 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1168,6 +1168,9 @@ int __must_check pci_bus_alloc_resource(struct pci_bus *bus,
 						  void *alignf_data);
 
+int pci_register_io_range(phys_addr_t addr, resource_size_t size);
+unsigned long pci_address_to_pio(phys_addr_t addr);
+phys_addr_t pci_pio_to_address(unsigned long pio);
 int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);
 
 static inline pci_bus_addr_t pci_bus_address(struct pci_dev *pdev, int bar)
@@ -1488,6 +1491,8 @@ static inline int pci_request_regions(struct pci_dev *dev, const char *res_name)
 { return -EIO; }
 static inline void pci_release_regions(struct pci_dev *dev) { }
 
+static inline unsigned long pci_address_to_pio(phys_addr_t addr) { return -1; }
+
 static inline void pci_block_cfg_access(struct pci_dev *dev) { }
 static inline int pci_block_cfg_access_in_atomic(struct pci_dev *dev)
 { return 0; }
From: Lorenzo Pieralisi Lorenzo.Pieralisi@arm.com
PCI core code contains a set of functions, eg:
pci_assign_unassigned_bus_resources()
that allow the PCI resources for a given bus to be assigned after enumeration.
On systems where the PCI BARs are immutable (ie they must not and can not be assigned), PCI resources must be claimed in order to be validated and inserted in the PCI resources tree, but there is no generic PCI kernel function for that purpose and the resource claiming is implemented in an arch specific fashion which resulted in arches implementations that contain duplicated code.
This patch, based on the ia64 resource claiming arch implementation, implements a set of functions in core PCI code that provides a PCI core interface for resources claiming for a given PCI bus hierarchy, paving the way for further resource claiming consolidation across architectures.
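The two-pass order used by the new interface can be sketched on a toy bus tree: bridge windows are claimed first, walking down from the root, and only then are the device resources on each bus claimed. The structure and the demo_* names below are hypothetical and only record the order of claims; they do not model real resource conflicts.

```c
#include <stddef.h>

#define DEMO_MAX_LOG	16

struct demo_bus {
	int id;
	int is_root;			/* a root bus has no upstream bridge */
	int ndevs;			/* number of endpoint devices on this bus */
	struct demo_bus *child[2];	/* buses behind bridges on this bus */
};

int demo_claim_log[DEMO_MAX_LOG];
int demo_claim_cnt;

void demo_log_claim(int tag)
{
	if (demo_claim_cnt < DEMO_MAX_LOG)
		demo_claim_log[demo_claim_cnt++] = tag;
}

/* Pass 1: claim bridge windows, parents before children (tag 100 + id). */
void demo_claim_bridge_windows(struct demo_bus *b)
{
	size_t i;

	if (!b->is_root)
		demo_log_claim(100 + b->id);
	for (i = 0; i < 2; i++)
		if (b->child[i])
			demo_claim_bridge_windows(b->child[i]);
}

/* Pass 2: claim device resources on every bus (tag 200 + id). */
void demo_claim_device_resources(struct demo_bus *b)
{
	size_t i;

	if (b->ndevs)
		demo_log_claim(200 + b->id);
	for (i = 0; i < 2; i++)
		if (b->child[i])
			demo_claim_device_resources(b->child[i]);
}

void demo_bus_claim_resources(struct demo_bus *root)
{
	demo_claim_bridge_windows(root);
	demo_claim_device_resources(root);
}
```

Claiming all bridge windows before any device resource means every device resource finds its parent window already inserted in the resource tree.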
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Cc: Arnd Bergmann arnd@arndb.de
Cc: Bjorn Helgaas bhelgaas@google.com
Cc: Yinghai Lu yinghai@kernel.org
---
 drivers/pci/setup-bus.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci.h     |  1 +
 2 files changed, 64 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7796d0a..c959398 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1424,6 +1424,69 @@ void pci_bus_assign_resources(const struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_assign_resources);
 
+static void pci_claim_device_resources(struct pci_dev *dev)
+{
+	int i;
+
+	for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (!r->flags || r->parent)
+			continue;
+
+		pci_claim_resource(dev, i);
+	}
+}
+
+static void pci_claim_bridge_resources(struct pci_dev *dev)
+{
+	int i;
+
+	for (i = PCI_BRIDGE_RESOURCES; i < PCI_NUM_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+		if (!r->flags || r->parent)
+			continue;
+
+		pci_claim_bridge_resource(dev, i);
+	}
+}
+
+static void pci_bus_allocate_dev_resources(struct pci_bus *b)
+{
+	struct pci_dev *dev;
+	struct pci_bus *child;
+
+	list_for_each_entry(dev, &b->devices, bus_list) {
+		pci_claim_device_resources(dev);
+
+		child = dev->subordinate;
+		if (child)
+			pci_bus_allocate_dev_resources(child);
+	}
+}
+
+static void pci_bus_allocate_resources(struct pci_bus *b)
+{
+	struct pci_bus *child;
+
+	/* Depth-First Search on bus tree */
+	if (b->self) {
+		pci_read_bridge_bases(b);
+		pci_claim_bridge_resources(b->self);
+	}
+
+	list_for_each_entry(child, &b->children, node)
+		pci_bus_allocate_resources(child);
+}
+
+void pci_bus_claim_resources(struct pci_bus *b)
+{
+	pci_bus_allocate_resources(b);
+	pci_bus_allocate_dev_resources(b);
+}
+EXPORT_SYMBOL(pci_bus_claim_resources);
+
 static void __pci_bridge_assign_resources(const struct pci_dev *bridge,
 					  struct list_head *add_head,
 					  struct list_head *fail_head)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index dac677c..6faf994 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1119,6 +1119,7 @@ ssize_t pci_write_vpd(struct pci_dev *dev, loff_t pos, size_t count, const void
 /* Helper functions for low-level code (drivers/pci/setup-[bus,res].c) */
 resource_size_t pcibios_retrieve_fw_addr(struct pci_dev *dev, int idx);
 void pci_bus_assign_resources(const struct pci_bus *bus);
+void pci_bus_claim_resources(struct pci_bus *bus);
 void pci_bus_size_bridges(struct pci_bus *bus);
 int pci_claim_resource(struct pci_dev *, int);
 int pci_claim_bridge_resource(struct pci_dev *bridge, int i);
This patch implements a generic PCI host controller driver for the ACPI world, similar to what the pci-host-generic.c driver does for the DT world.
All such drivers that we have seen so far were implemented within the arch/ directory since they carried some arch assumptions (x86 and ia64). However, they all do similar things, so it makes sense to find the common code and abstract it into a generic driver.
This driver aims to initialize PCI host controller without architecture assumptions. It uses MCFG library to manage PCI config space regions properly. Also, it parses _CRS content to find out host bridge's resources (i.e. MEM/IO). As mentioned in Kconfig help section, ACPI_PCI_HOST_GENERIC choice should be made on a per-architecture basis.
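One step of the _CRS handling that is easy to misread is the rewrite of an I/O window once a logical port base has been assigned to it. The sketch below models only that arithmetic; struct demo_window and demo_translate_io_window() are hypothetical names, and the port value is assumed to come from a prior address-to-port lookup.

```c
#include <stdint.h>

struct demo_window {
	uint64_t start;		/* CPU address on entry, port number on return */
	uint64_t end;
	uint64_t offset;	/* window start minus the PCI bus address */
};

/* Rewrite an I/O window in terms of the logical port base assigned to it. */
void demo_translate_io_window(struct demo_window *w, uint64_t port)
{
	uint64_t cpu_addr = w->start;
	uint64_t pci_addr = cpu_addr - w->offset;
	uint64_t length = w->end - w->start + 1;

	w->start = port;
	w->end = port + length - 1;
	w->offset = port - pci_addr;
}
```

After the rewrite the resource describes logical ports rather than CPU addresses, while the offset still recovers the address seen on the PCI bus.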
Signed-off-by: Tomasz Nowicki tn@semihalf.com
Signed-off-by: Hanjun Guo hanjun.guo@linaro.org
Signed-off-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Signed-off-by: Lorenzo Pieralisi Lorenzo.Pieralisi@arm.com
TO: Bjorn Helgaas helgaas@kernel.org
TO: Rafael J. Wysocki rafael@kernel.org
Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
Tested-by: Jeremy Linton jeremy.linton@arm.com
Tested-by: Duc Dang dhdang@apm.com
Tested-by: Dongdong Liu liudongdong3@huawei.com
Tested-by: Hanjun Guo hanjun.guo@linaro.org
Tested-by: Graeme Gregory graeme.gregory@linaro.org
Tested-by: Sinan Kaya okaya@codeaurora.org
---
 drivers/acpi/Kconfig     |   7 +++
 drivers/acpi/pci_root.c  | 128 +++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci-acpi.h |  10 ++--
 3 files changed, 141 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 183ffa3..1c7f57bd 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -346,6 +346,13 @@ config ACPI_PCI_SLOT
 	  i.e., segment/bus/device/function tuples, with physical slots in
 	  the system.  If you are unsure, say N.
 
+config ACPI_PCI_HOST_GENERIC
+	bool
+	help
+	  Select this config option from the architecture Kconfig,
+	  if it is preferred to enable ACPI PCI host controller driver which
+	  has no arch-specific assumptions.
+
 config X86_PM_TIMER
 	bool "Power Management Timer Support" if EXPERT
 	depends on X86
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 3b284dc..02fd690 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -532,6 +532,134 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm)
 	}
 }
 
+#ifdef CONFIG_ACPI_PCI_HOST_GENERIC
+static int pci_acpi_setup_mcfg_map(struct acpi_pci_root_info *ci)
+{
+	struct acpi_pci_root *root = ci->root;
+	int ret;
+
+	ret = pci_mmconfig_insert(&ci->bridge->dev, root->segment,
+				  root->secondary.start, root->secondary.end,
+				  root->mcfg_addr);
+	if (ret == -EEXIST)
+		ret = 0;
+
+	return ret;
+}
+
+static void pci_acpi_teardown_mcfg_map(struct acpi_pci_root_info *ci)
+{
+	struct acpi_pci_root *root = ci->root;
+
+	pci_mmconfig_delete(root->segment, root->secondary.start,
+			    root->secondary.end);
+	kfree(ci);
+}
+
+static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci)
+{
+	struct list_head *list = &ci->resources;
+	struct acpi_device *device = ci->bridge;
+	struct resource_entry *entry, *tmp;
+	unsigned long flags;
+	int ret;
+
+	flags = IORESOURCE_IO | IORESOURCE_MEM;
+	ret = acpi_dev_get_resources(device, list,
+				     acpi_dev_filter_resource_type_cb,
+				     (void *)flags);
+	if (ret < 0) {
+		dev_warn(&device->dev,
+			 "failed to parse _CRS method, error code %d\n", ret);
+		return ret;
+	} else if (ret == 0)
+		dev_dbg(&device->dev,
+			"no IO and memory resources present in _CRS\n");
+
+	resource_list_for_each_entry_safe(entry, tmp, &ci->resources) {
+		struct resource *res = entry->res;
+
+		if (entry->res->flags & IORESOURCE_DISABLED)
+			resource_list_destroy_entry(entry);
+		else
+			res->name = ci->name;
+
+		if (res->flags & IORESOURCE_IO) {
+			resource_size_t cpu_addr = res->start;
+			resource_size_t pci_addr = cpu_addr - entry->offset;
+			resource_size_t length = resource_size(res);
+			unsigned long port;
+
+			if (pci_register_io_range(cpu_addr, length)) {
+				resource_list_destroy_entry(entry);
+				continue;
+			}
+
+			port = pci_address_to_pio(cpu_addr);
+			if (port == (unsigned long)-1) {
+				resource_list_destroy_entry(entry);
+				continue;
+			}
+
+			res->start = port;
+			res->end = port + length - 1;
+			entry->offset = port - pci_addr;
+
+			if (pci_remap_iospace(res, cpu_addr) < 0)
+				resource_list_destroy_entry(entry);
+		}
+	}
+
+	return ret;
+}
+
+static struct acpi_pci_root_ops acpi_pci_root_ops = {
+	.init_info = pci_acpi_setup_mcfg_map,
+	.release_info = pci_acpi_teardown_mcfg_map,
+	.prepare_resources = pci_acpi_root_prepare_resources,
+};
+
+/* Root bridge scanning */
+struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
+{
+	int node = acpi_get_node(root->device->handle);
+	int domain = root->segment;
+	int busnum = root->secondary.start;
+	struct acpi_pci_root_info *info;
+	struct pci_bus *bus, *child;
+
+	if (domain && !pci_domains_supported) {
+		pr_warn("PCI %04x:%02x: multiple domains not supported.\n",
+			domain, busnum);
+		return NULL;
+	}
+
+	info = kzalloc_node(sizeof(*info), GFP_KERNEL, node);
+	if (!info) {
+		dev_err(&root->device->dev,
+			"pci_bus %04x:%02x: ignored (out of memory)\n",
+			domain, busnum);
+		return NULL;
+	}
+
+	acpi_pci_root_ops.pci_ops = pci_mcfg_get_ops(root);
+	bus = acpi_pci_root_create(root, &acpi_pci_root_ops, info, root);
+	if (!bus)
+		return NULL;
+
+	pci_bus_claim_resources(bus);
+	pci_assign_unassigned_bus_resources(bus);
+
+	/*
+	 * After the PCI-E bus has been walked and all devices discovered,
+	 * configure any settings of the fabric that might be necessary.
+	 */
+	list_for_each_entry(child, &bus->children, node)
+		pcie_bus_configure_settings(child);
+
+	return bus;
+}
+#endif /* CONFIG_ACPI_PCI_HOST_GENERIC */
+
 static int acpi_pci_root_add(struct acpi_device *device,
 			     const struct acpi_device_id *not_used)
 {
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 3dc6a8c..93feb04 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -123,10 +123,6 @@ struct pci_mmcfg_region {
 	bool hot_added;
 };
 
-extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
-			       phys_addr_t addr);
-extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
-
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
 extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
 							int end, u64 addr);
@@ -142,10 +138,16 @@ extern struct list_head pci_mmcfg_list;
 #define PCI_MMCFG_OFFSET(bus, devfn)   ((bus) << 20 | (devfn) << 12)
 
 #ifdef CONFIG_PCI_MMCONFIG
+extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
+			       phys_addr_t addr);
+extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
 extern struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root);
 extern void __iomem *pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn,
 				       int offset);
 #else
+static inline int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start,
+				      u8 end, phys_addr_t addr) { return 0; }
+static inline int pci_mmconfig_delete(u16 seg, u8 start, u8 end) { return 0; }
 static inline struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
 { return NULL; }
 static inline void __iomem *pci_mcfg_dev_base(struct pci_bus *bus,
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set prior to PCI bus enumeration. The algorithm traverses the available quirk list, matches against the <platform ID (DMI), domain, bus number> tuple and returns the corresponding accessors. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained. Example:
static const struct dmi_system_id foo_dmi[] = {
	{
		.ident = "<Platform ident string>",
		.callback = <handler>,
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
			DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
		},
	},
	{ }
};

static struct pci_ops foo_ecam_pci_ops = {
	.map_bus = pci_mcfg_dev_base,
	.read = foo_ecam_config_read,
	.write = foo_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(foo_dmi, NULL, &foo_ecam_pci_ops, <domain_nr>, <bus_nr>);
More custom (non-DMI) matching can be done via an extra match handler. Note that it is possible to assign quirk-related private data to root->sysdata, which is then available in the read/write accessors. Example:
static int boo_match(struct pci_mcfg_fixup *fixup, struct acpi_pci_root *root)
{
	return [condition] ? 1 : 0;
}

int boo_ecam_config_read(struct pci_bus *bus, unsigned int devfn, int where,
			 int size, u32 *val)
{
	struct acpi_pci_root *root = bus->sysdata;
	struct boo_priv_data *boo_data = root->sysdata;

	[..]
}

static struct pci_ops boo_ecam_pci_ops = {
	.map_bus = pci_mcfg_dev_base,
	.read = boo_ecam_config_read,
	.write = boo_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(NULL, boo_match, &boo_ecam_pci_ops, <domain_nr>, <bus_nr>);
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Duc Dang <dhdang@apm.com>
Tested-by: Dongdong Liu <liudongdong3@huawei.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Graeme Gregory <graeme.gregory@linaro.org>
Tested-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/acpi/pci_mcfg.c           | 32 ++++++++++++++++++++++++++++++--
 include/acpi/acpi_bus.h           |  1 +
 include/asm-generic/vmlinux.lds.h |  7 +++++++
 include/linux/pci-acpi.h          | 18 ++++++++++++++++++
 4 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0062257..b343547 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,29 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
 	return PCIBIOS_DEVICE_NOT_FOUND;
 }
 
+extern struct pci_mcfg_fixup __start_acpi_mcfg_fixups[];
+extern struct pci_mcfg_fixup __end_acpi_mcfg_fixups[];
+
+static struct pci_ops *pci_mcfg_check_quirks(struct acpi_pci_root *root)
+{
+	struct pci_mcfg_fixup *f;
+	int bus_num = root->secondary.start;
+	int domain = root->segment;
+
+	/*
+	 * First match against PCI topology domain:bus then use DMI or
+	 * custom match handler.
+	 */
+	for (f = __start_acpi_mcfg_fixups; f < __end_acpi_mcfg_fixups; f++) {
+		if ((f->domain == domain || f->domain == PCI_MCFG_DOMAIN_ANY) &&
+		    (f->bus_num == bus_num || f->bus_num == PCI_MCFG_BUS_ANY) &&
+		    (f->system ? dmi_check_system(f->system) : 1 &&
+		     f->match ? f->match(f, root) : 1))
+			return f->ops;
+	}
+	return NULL;
+}
+
 void __iomem *
 pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset)
 {
@@ -63,10 +86,15 @@ static struct pci_ops default_pci_mcfg_ops = {
 
 struct pci_ops *pci_mcfg_get_ops(struct acpi_pci_root *root)
 {
+	struct pci_ops *pci_mcfg_ops_quirk;
+
 	/*
-	 * TODO: Match against platform specific quirks and return
-	 * corresponding PCI config space accessor set.
+	 * Match against platform specific quirks and return corresponding
+	 * PCI config space accessor set.
 	 */
+	pci_mcfg_ops_quirk = pci_mcfg_check_quirks(root);
+	if (pci_mcfg_ops_quirk)
+		return pci_mcfg_ops_quirk;
 
 	return &default_pci_mcfg_ops;
 }
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 14362a8..0fc6f13 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -556,6 +556,7 @@ struct acpi_pci_root {
 	struct pci_bus *bus;
 	u16 segment;
 	struct resource secondary;	/* downstream bus range */
+	void *sysdata;
 
 	u32 osc_support_set;	/* _OSC state of support bits */
 	u32 osc_control_set;	/* _OSC state of control bits */
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2..c93fc97 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -298,6 +298,13 @@
 		VMLINUX_SYMBOL(__end_pci_fixups_suspend_late) = .;	\
 	}								\
 									\
+	/* ACPI MCFG quirks */						\
+	.acpi_fixup : AT(ADDR(.acpi_fixup) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start_acpi_mcfg_fixups) = .;		\
+		*(.acpi_fixup_mcfg)					\
+		VMLINUX_SYMBOL(__end_acpi_mcfg_fixups) = .;		\
+	}								\
+									\
 	/* Built-in firmware blobs */					\
 	.builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) {		\
 		VMLINUX_SYMBOL(__start_builtin_fw) = .;			\
diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h
index 93feb04..9e1bedd 100644
--- a/include/linux/pci-acpi.h
+++ b/include/linux/pci-acpi.h
@@ -123,6 +123,24 @@ struct pci_mmcfg_region {
 	bool hot_added;
 };
 
+struct pci_mcfg_fixup {
+	const struct dmi_system_id *system;
+	int (*match)(struct pci_mcfg_fixup *, struct acpi_pci_root *);
+	struct pci_ops *ops;
+	int domain;
+	int bus_num;
+};
+
+#define PCI_MCFG_DOMAIN_ANY	-1
+#define PCI_MCFG_BUS_ANY	-1
+
+/* Designate a routine to fix up buggy MCFG */
+#define DECLARE_ACPI_MCFG_FIXUP(system, match, ops, dom, bus)		\
+	static const struct pci_mcfg_fixup __mcfg_fixup_##system##dom##bus\
+	 __used __attribute__((__section__(".acpi_fixup_mcfg"),		\
+				aligned((sizeof(void *))))) =		\
+	{ system, match, ops, dom, bus };
+
 extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
 extern struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
 							int end, u64 addr);
On Tue, 2016-02-16 at 14:53 +0100, Tomasz Nowicki wrote:
diff --git a/drivers/acpi/pci_mcfg.c b/drivers/acpi/pci_mcfg.c
index 0062257..b343547 100644
--- a/drivers/acpi/pci_mcfg.c
+++ b/drivers/acpi/pci_mcfg.c
@@ -41,6 +41,29 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
 	return PCIBIOS_DEVICE_NOT_FOUND;
 }
 
+extern struct pci_mcfg_fixup __start_acpi_mcfg_fixups[];
+extern struct pci_mcfg_fixup __end_acpi_mcfg_fixups[];
+
+static struct pci_ops *pci_mcfg_check_quirks(struct acpi_pci_root *root)
+{
+	struct pci_mcfg_fixup *f;
+	int bus_num = root->secondary.start;
+	int domain = root->segment;
+
+	/*
+	 * First match against PCI topology domain:bus then use DMI or
+	 * custom match handler.
+	 */
+	for (f = __start_acpi_mcfg_fixups; f < __end_acpi_mcfg_fixups; f++) {
+		if ((f->domain == domain || f->domain == PCI_MCFG_DOMAIN_ANY) &&
+		    (f->bus_num == bus_num || f->bus_num == PCI_MCFG_BUS_ANY) &&
+		    (f->system ? dmi_check_system(f->system) : 1 &&
+		     f->match ? f->match(f, root) : 1))
The parens are not quite right here. Since && binds tighter than ?:, f->match won't get called whenever f->system is set.

This should be:

	(f->system ? dmi_check_system(f->system) : 1) &&
	(f->match ? f->match(f, root) : 1)
+			return f->ops;
+	}
+	return NULL;
+}
On 18.03.2016 16:49, Mark Salter wrote:
On Tue, 2016-02-16 at 14:53 +0100, Tomasz Nowicki wrote:
+static struct pci_ops *pci_mcfg_check_quirks(struct acpi_pci_root *root)
+{
+	struct pci_mcfg_fixup *f;
+	int bus_num = root->secondary.start;
+	int domain = root->segment;
+
+	/*
+	 * First match against PCI topology <domain:bus> then use DMI or
+	 * custom match handler.
+	 */
+	for (f = __start_acpi_mcfg_fixups; f < __end_acpi_mcfg_fixups; f++) {
+		if ((f->domain == domain || f->domain == PCI_MCFG_DOMAIN_ANY) &&
+		    (f->bus_num == bus_num || f->bus_num == PCI_MCFG_BUS_ANY) &&
+		    (f->system ? dmi_check_system(f->system) : 1 &&
+		     f->match ? f->match(f, root) : 1))
The parens are not quite right here. Since && binds tighter than ?:, f->match won't get called whenever f->system is set.

This should be:

	(f->system ? dmi_check_system(f->system) : 1) &&
	(f->match ? f->match(f, root) : 1)
Well spotted. Thanks!
Tomasz
This is the last step before enabling generic ACPI PCI host controller for ARM64. We need to take care of legacy IRQ mapping for non-MSI(X) PCI devices. pcibios_alloc_irq() evaluation is not sensitive to ACPI device enumeration order, so it is the best place to assign a device's IRQs for the ACPI boot method. Also, it does not hurt DT to be initialized from the same place.
NOTE: *This is going to be a temporary solution*. There is ongoing work which aims at cleaning legacy IRQ allocation out of arch specific code. We can consider this patch a necessary evil which will be removed once the cleanup series lands in mainline in the near future.
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Suggested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm64/kernel/pci.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 023b983..6e77e1b 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -52,11 +52,16 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 }
 
 /*
- * Try to assign the IRQ number from DT when adding a new device
+ * Try to assign the IRQ number when probing a new device
  */
-int pcibios_add_device(struct pci_dev *dev)
+int pcibios_alloc_irq(struct pci_dev *dev)
 {
-	dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
+	if (acpi_disabled)
+		dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
+#ifdef CONFIG_ACPI
+	else
+		return acpi_pci_irq_enable(dev);
+#endif
 
 	return 0;
 }
[+ Duc, this needs testing on DT PCI hosts that do not call pci_fixup_irqs()]
On Tue, Feb 16, 2016 at 02:53:44PM +0100, Tomasz Nowicki wrote:
Subject is wrong, leftover from previous posting (ie you do not allocate at device enable anymore).
This is the last step before enabling generic ACPI PCI host controller for ARM64. We need to take care of legacy IRQ mapping for non-MSI(X)
You do not check MSIs anymore.
PCI devices. pcibios_alloc_irq() evaluation is not sensitive to ACPI device enumeration order, so it is the best place to assign a device's IRQs for the ACPI boot method. Also, it does not hurt DT to be initialized from the same place.
NOTE: *This is going to be a temporary solution*. There is ongoing work which aims at cleaning legacy IRQ allocation out of arch specific code. We can consider this patch a necessary evil which will be removed once the cleanup series lands in mainline in the near future.
"To enable PCI legacy IRQs on platforms booting with ACPI, arch code should include ACPI specific callbacks that parse and set-up the device IRQ number, equivalent to the DT boot path. Owing to the current ACPI core scan handlers implementation, ACPI PCI legacy IRQs bindings cannot be parsed at device add time, since that would trigger ACPI scan handlers ordering issues depending on how the ACPI tables are defined.
To solve this problem and consolidate FW PCI legacy IRQs parsing in one single pcibios callback (pending final removal), this patch moves DT PCI IRQ parsing to the pcibios_alloc_irq() callback (called by PCI core code at device probe time) and adds ACPI PCI legacy IRQs parsing to the same callback too, so that FW PCI legacy IRQs parsing is confined in one single arch callback that can be easily removed when code parsing PCI legacy IRQs is consolidated and moved to core PCI code".
?
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Suggested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
---
 arch/arm64/kernel/pci.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 023b983..6e77e1b 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -52,11 +52,16 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 }
 
 /*
- * Try to assign the IRQ number from DT when adding a new device
+ * Try to assign the IRQ number when probing a new device
  */
-int pcibios_add_device(struct pci_dev *dev)
+int pcibios_alloc_irq(struct pci_dev *dev)
 {
-	dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
+	if (acpi_disabled)
+		dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
+#ifdef CONFIG_ACPI
+	else
+		return acpi_pci_irq_enable(dev);
+#endif
 
 	return 0;
 }
It is good this code is now in one single function, it will be removed more quickly :D
So pending APM X-gene DT testing:
Reviewed-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
It is perfectly fine to use ACPI_PCI_HOST_GENERIC for ARM64, so let's get rid of the empty PCI init stub and the related ACPI header, and go with the full-blown PCI host controller driver.
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
TO: Catalin Marinas <catalin.marinas@arm.com>
TO: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
TO: Will Deacon <will.deacon@arm.com>
TO: Arnd Bergmann <arnd@arndb.de>
CC: Liviu Dudau <Liviu.Dudau@arm.com>
Tested-by: Duc Dang <dhdang@apm.com>
Tested-by: Dongdong Liu <liudongdong3@huawei.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Tested-by: Graeme Gregory <graeme.gregory@linaro.org>
Tested-by: Sinan Kaya <okaya@codeaurora.org>
---
 arch/arm64/Kconfig      | 1 +
 arch/arm64/kernel/pci.c | 9 ---------
 2 files changed, 1 insertion(+), 9 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 552e996..09c49ea 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ config ARM64
 	def_bool y
 	select ACPI_CCA_REQUIRED if ACPI
 	select ACPI_GENERIC_GSI if ACPI
+	select ACPI_PCI_HOST_GENERIC if ACPI
 	select ACPI_REDUCED_HARDWARE_ONLY if ACPI
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 6e77e1b..1de0168 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -65,12 +65,3 @@ int pcibios_alloc_irq(struct pci_dev *dev)
 
 	return 0;
 }
-
-#ifdef CONFIG_ACPI
-/* Root bridge scanning */
-struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
-{
-	/* TODO: Should be revisited when implementing PCI on ACPI */
-	return NULL;
-}
-#endif
Hi Bjorn, Rafael,
On Tue, Feb 16, 2016 at 02:53:30PM +0100, Tomasz Nowicki wrote:
From the functionality point of view this series might be split into the following logic parts:
- Make MMCONFIG code arch-agnostic which allows all architectures to collect PCI config regions and used when necessary.
- Move non-arch specific bits to the core code.
- Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
- Enable above driver on ARM64
I think that, apart from some pending review comments that will force some minor patch updates, the overall structure of this patchset is in reasonable shape. I would be grateful if you could have a look from the PCI and ACPI perspectives to see whether some serious rework is needed and/or you want us to do things differently.
In particular, the MCFG rework (along with some PCI core changes ie PCI ACPI bridge companion) affects x86 so we definitely need some feedback on that code, otherwise we are stuck and can't enable ACPI PCI support for ARM64.
Thank you very much.
Cheers, Lorenzo
Patches have been built on top of 4.5-rc3 and can be found here: git@github.com:semihalf-nowicki-tomasz/linux.git (pci-acpi-v5)
NOTE, this patch set depends on Lorenzo's fixes: https://patchwork.ozlabs.org/patch/576450/ which can be found in pci-acpi-v5 branch.
This has been tested on Cavium ThunderX server, JunoR2, HP RX2660 IA64, x86, Hip05, X-Gene and QEMU-aarch64. Any help in reviewing and testing is much appreciated.
v4 -> v5
- dropped MCFG refactoring group patches 1-6 from series v4 and integrated Jayachandran's patch https://patchwork.ozlabs.org/patch/575525/
- rewrite PCI legacy IRQs allocation
- squashed two patches 11 and 12 from series v4, fixed bisection issue
- changelog improvements
- rebased to 4.5-rc3
v3 -> v4
- dropped Jiang's fix http://lkml.iu.edu/hypermail/linux/kernel/1601.1/04318.html
- added Lorenzo's fix patch 19/24
- ACPI PCI bus domain number assigning cleanup
- changed resource management, we now claim and reassign resources
- improvements for applying quirks
- dropped Matthew's http://www.spinics.net/lists/linux-pci/msg45950.html dependency
- rebased to 4.5-rc1
v2 -> v3
- fix legacy IRQ assigning and IO ports registration
- remove reference to arch specific companion device for ia64
- move ACPI PCI host controller driver to pci_root.c
- drop generic domain assignment for x86 and ia64 as I am not able to run all necessary test variants
- drop patch which cleaned legacy IRQ assignment since it belongs to Mathew's series: https://patchwork.ozlabs.org/patch/557504/
- extend MCFG quirk code
- rebased to 4.4
v1 -> v2
- moved non-arch specific piece of code to drivers/acpi/ directory
- fixed IO resource handling
- introduced PCI config accessors quirks matching
- moved ACPI_COMPANION_SET to generic code
v1 - https://lkml.org/lkml/2015/10/27/504
v2 - https://lkml.org/lkml/2015/12/16/246
v3 - http://lkml.iu.edu/hypermail/linux/kernel/1601.1/04308.html
v4 - https://lkml.org/lkml/2016/2/4/646

Jayachandran C (1):
  ACPI: MCFG: Move mmcfg_list management to drivers/acpi

Lorenzo Pieralisi (1):
  drivers: pci: add generic code to claim bus resources

Tomasz Nowicki (13):
  acpi, pci, mcfg: Provide default RAW ACPI PCI config space accessors.
  arm64, acpi: Use MCFG library and empty PCI config space accessors from pci_mcfg.c file.
  pci, acpi, ecam: Add flag to indicate whether ECAM region was hot added or not.
  x86, pci: Cleanup platform specific MCFG data by using ECAM hot_added flag.
  pci, acpi, x86, ia64: Move ACPI host bridge device companion assignment to core code.
  pci, acpi: Provide generic way to assign bus domain number.
  x86, ia64: Include acpi_pci_{add|remove}_bus to the default pcibios_{add|remove}_bus implementation.
  acpi, mcfg: Add default PCI config accessors implementation and initial support for related quirks.
  pci, of: Move the PCI I/O space management to PCI core code.
  pci, acpi: Support for ACPI based generic PCI host controller initialization
  pci, acpi: Match PCI config space accessors against platfrom specific quirks.
  arm64, pci, acpi: Assign legacy IRQs once device is enable.
  arm64, pci, acpi: Start using ACPI based PCI host bridge driver for ARM64.

 arch/arm64/Kconfig                 |   5 +
 arch/arm64/kernel/pci.c            |  35 +---
 arch/ia64/hp/common/sba_iommu.c    |   2 +-
 arch/ia64/include/asm/pci.h        |   1 -
 arch/ia64/pci/pci.c                |  26 ---
 arch/ia64/sn/kernel/io_acpi_init.c |   4 +-
 arch/x86/include/asm/pci.h         |   3 -
 arch/x86/include/asm/pci_x86.h     |  24 +--
 arch/x86/pci/acpi.c                |  47 +----
 arch/x86/pci/common.c              |  10 -
 arch/x86/pci/mmconfig-shared.c     | 269 ++++---------------------
 arch/x86/pci/mmconfig_32.c         |   1 +
 arch/x86/pci/mmconfig_64.c         |   1 +
 arch/x86/pci/numachip.c            |   1 +
 drivers/acpi/Kconfig               |   7 +
 drivers/acpi/Makefile              |   1 +
 drivers/acpi/pci_mcfg.c            | 392 +++++++++++++++++++++++++++++++++++++
 drivers/acpi/pci_root.c            | 154 ++++++++++++++-
 drivers/of/address.c               | 116 +----------
 drivers/pci/pci.c                  | 126 +++++++++++-
 drivers/pci/probe.c                |   5 +
 drivers/pci/setup-bus.c            |  63 ++++++
 drivers/xen/pci.c                  |   5 +-
 include/acpi/acpi_bus.h            |   1 +
 include/asm-generic/vmlinux.lds.h  |   7 +
 include/linux/of_address.h         |   9 -
 include/linux/pci-acpi.h           |  68 +++++++
 include/linux/pci.h                |   6 +
 28 files changed, 892 insertions(+), 497 deletions(-)
 create mode 100644 drivers/acpi/pci_mcfg.c

--
1.9.1
On 2/16/2016 8:53 AM, Tomasz Nowicki wrote:
From the functionality point of view this series might be split into the following logic parts:
- Make MMCONFIG code arch-agnostic which allows all architectures to collect PCI config regions and used when necessary.
- Move non-arch specific bits to the core code.
- Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
- Enable above driver on ARM64
Having tested v4 and v5, I'm seeing some resource assignment problems and address conflicts. And problems booting QEMU.
Anybody else seeing the same?
[+ Yinghai]
On Mon, Feb 29, 2016 at 02:03:45PM -0500, Sinan Kaya wrote:
Having tested v4 and v5, I'm seeing some resource assignment problems and address conflicts. And problems booting QEMU.
I asked Tomasz to add resource claiming code in v4 to make sure that, if FW has left resources in a reasonable set-up, we reuse it as-is.
Now, I was and I am aware this could trigger resource allocation issues (in particular in relation to bridge aperture sizing), which can nonetheless be solved by forcing the kernel to reallocate resources (pci=realloc, that's exactly what it's there for: release the bridge apertures, resize the busses downstream and reassign the respective hierarchy).
I am not entirely aware of how consistently pci=realloc was used on x86, what I am aware of is the panoply of pci=* command line parameters defined for x86 and I would certainly avoid that.
The decision on whether we claim resources before reassigning them is either dictated by the boot method (ie ACPI->claim resources by default) or we should control it via a FW option or a command line option. The PCI standard (PCI FW revision 3.1, 3.5 "Device State at Firmware/Operating System Handoff") IIUC does not strictly mandate FW configuring the whole PCI hierarchy (and to be 100% compliant we should check the device IO/MEM enable bits before claiming, as x86 does - see pcibios_allocate_dev_resources() in arch/x86/pci/i386.c).
x86 and IA64 claim PCI resources on boot and live with that (well, minus the gazillions x86 pci= parameters that change the PCI resources assignment one way or another), comments very welcome in particular on the pci=realloc option and its usage.
What's certain is, if we do not claim resources by default now we will *never* be able to start doing it later, since that would certainly trigger regressions.
Lorenzo
On 3/3/2016 6:23 AM, Lorenzo Pieralisi wrote:
x86 and IA64 claim PCI resources on boot and live with that (well, minus the gazillions x86 pci= parameters that change the PCI resources assignment one way or another), comments very welcome in particular on the pci=realloc option and its usage.
I have been working with Linux PCIe over 3 years. I never used pci=realloc argument.
The v5 series minus [PATCH V5 11/15] drivers: pci: add generic code to claim bus resources is working just fine and is ready to go upstream in my opinion. It passed my internal testing with different types of endpoints.
The inclusion of this patch is now requiring everybody to add pci=realloc argument otherwise the resources assigned by the UEFI BIOS are not working.
I think there is still some work to be done in this patch and is too early to be included into the series. It is blocking progress of the series which is sitting on review over 1 year already.
[ 0.752916] pci 0000:01:00.0: VF(n) BAR2 space: [mem 0x80360800000-0x8037fffffff 64bit pref] (contains BAR2 for 63 VFs)
[ 0.771799] pci 0000:00:00.0: PCI bridge to [bus 01-06]
[ 0.777054] pci 0000:00:00.0: root [mem 0x80100100000-0x8013fffffff window] res [mem 0x8013ff00000-0x8013fffffff] nr 14
[ 0.787846] pci 0000:00:00.0: pci_claim_bridge_resource:714 1: i:14
[ 0.794135] pci 0000:00:00.0: root [mem 0x80300000000-0x8037fffffff window] res [mem 0x80360000000-0x8037fffffff 64bit pref] nr 15
[ 0.805881] pci 0000:00:00.0: pci_claim_bridge_resource:714 1: i:15
[ 0.812155] pci 0000:01:00.0: root [mem 0x8013ff00000-0x8013fffffff] res [mem 0x8013ff00000-0x8013fffffff 64bit] nr 0
[ 0.822773] pci 0000:01:00.0: root [mem 0x80360000000-0x8037fffffff 64bit pref] res [mem 0x80360000000-0x803607fffff 64bit pref] nr 2
[ 0.834778] pci 0000:01:00.0: root [mem 0x80360000000-0x8037fffffff 64bit pref] res [mem 0x8037ff00000-0x8037fffffff pref] nr 6
[ 0.846265] pci 0000:01:00.0: can't claim BAR 9 [mem 0x80360800000-0x8037fffffff 64bit pref]: address conflict with 0000:01:00.0 [mem 0x8037ff00000-0x8037fffffff pref]
[ 0.861237] pci 0000:01:00.0: BAR 9: no space for [mem size 0x1f800000 64bit pref]
[ 0.868811] pci 0000:01:00.0: BAR 9: failed to assign [mem size 0x1f800000 64bit pref]
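The arithmetic behind the "address conflict" line can be checked with a few lines of Python. The address ranges are copied from the log above; the helper names `overlaps()` and `size()` are made up for this sketch and are not kernel functions.

```python
def overlaps(a, b):
    """True if two inclusive (start, end) ranges share at least one address."""
    return a[0] <= b[1] and b[0] <= a[1]


def size(r):
    """Length in bytes of an inclusive (start, end) range."""
    return r[1] - r[0] + 1


# Ranges of device 0000:01:00.0 as reported in the log.
bar2 = (0x80360000000, 0x803607fffff)    # nr 2, 64bit pref
res6 = (0x8037ff00000, 0x8037fffffff)    # nr 6, pref
vf_bar = (0x80360800000, 0x8037fffffff)  # BAR 9, "VF(n) BAR2 space" for 63 VFs

print(hex(size(vf_bar)))                  # total VF BAR2 space
print(size(vf_bar) // 63 == size(bar2))   # per-VF size vs. the PF's BAR 2
print(overlaps(vf_bar, res6))             # the conflict the kernel reports
print(overlaps(vf_bar, bar2))             # BAR 2 itself is not in the way
```

The VF space as left by firmware runs right up to the end of the bridge's prefetchable window, so its last 1 MiB coincides with resource nr 6, which was claimed first. With nr 2 and nr 6 in place, the remaining gap in the window is smaller than 0x1f800000, which is consistent with the "no space" and "failed to assign" lines that follow.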
I keep saying this but the type of CPU is not important when it comes to PCIe. Both PCIe and ACPI are governed by specs. If it is working for x86 and IA64, it needs to work for ARM64 as well.
Even ARM64 has the luxury to omit the old BIOS behaviors. Most ARM64 systems use tianocore based UEFI BIOS.
This is pointing to an implementation problem in arm64 adaptation. Need to figure out what is different.
On Thu, Mar 03, 2016 at 09:24:56AM -0500, Sinan Kaya wrote:
On 3/3/2016 6:23 AM, Lorenzo Pieralisi wrote:
x86 and IA64 claim PCI resources on boot and live with that (well, minus the gazillions x86 pci= parameters that change the PCI resources assignment one way or another), comments very welcome in particular on the pci=realloc option and its usage.
I have been working with Linux PCIe over 3 years. I never used pci=realloc argument.
The v5 series minus [PATCH V5 11/15] drivers: pci: add generic code to claim bus resources is working just fine and is ready to go upstream in my opinion. It passed my internal testing with different types of endpoints.
The inclusion of this patch is now requiring everybody to add pci=realloc argument otherwise the resources assigned by the UEFI BIOS are not working.
I think there is still some work to be done in this patch and is too early to be included into the series. It is blocking progress of the series which is sitting on review over 1 year already.
First off, I think that's specious, patch 11 is not blocking anything, if you and Tomasz want to drop it go ahead and take responsibility of the consequences.
I am not saying patch 11 is perfect, it is there to review, if you spot bugs point them out.
If you are interested and willing to make an effort to understand why I asked Tomasz to integrate it, a bit of background here:
http://permalink.gmane.org/gmane.linux.kernel.pci/44830
If we want to drop patch 11, we are going to discard whatever FW set-up at FW/OS hand-off and reassign everything. Want to do it ? Go ahead.
I wrote it in my previous email, probably it was not clear, so, here we go again.
If we want to at least consider the FW PCI configuration at FW/OS handoff, we should read the PCI bridge apertures and claim them, when that fails reassign the corresponding PCI bus hierarchy (which means releasing the bridge resources and downstream devices and reassign them), that's what pci=realloc does.
I think that it is a command line option since it has to be a choice, ie overriding FW set-up should be an option, not a default.
Patch 11 does what x86 does in arch code arch/x86/pci/i386.c,
pcibios_resource_survey()
and that works for them (of course, minus quirks that do exist).
I could integrate the code implementing pci=realloc in patch 11 so that we realloc by default all resources claimed that failed (which means that bridges are resized accordingly and you won't be forced to use pci=realloc on command line).
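The "claim what FW set up, reassign only what fails" policy described above can be modelled in user space. This is a toy sketch under stated assumptions, not the kernel's algorithm: `claim()`, `assign()` and `survey()` are hypothetical names, assignment is plain first-fit, and real reallocation (releasing and resizing bridge apertures) is only stood in for by handing the model a larger, made-up window. The three ranges and the bridge window come from the boot log quoted in this thread; the alignments are assumptions.

```python
def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def align_up(x, align):
    return (x + align - 1) // align * align


def claim(window, claimed, res):
    """Keep the FW range if it lies in the window and hits nothing claimed."""
    inside = window[0] <= res[0] and res[1] <= window[1]
    return inside and not any(overlaps(res, c) for c in claimed)


def assign(window, claimed, size, align):
    """First-fit search for a free, aligned range; None if nothing fits."""
    start = align_up(window[0], align)
    while start + size - 1 <= window[1]:
        cand = (start, start + size - 1)
        hits = [c for c in claimed if overlaps(cand, c)]
        if not hits:
            return cand
        start = align_up(max(c[1] for c in hits) + 1, align)
    return None


def survey(window, fw_setup):
    """Pass 1 claims FW ranges, pass 2 reassigns only the ones that failed."""
    claimed, retry, unassigned = {}, [], []
    for name, res, align in fw_setup:
        if claim(window, list(claimed.values()), res):
            claimed[name] = res
        else:
            retry.append((name, res[1] - res[0] + 1, align))
    for name, size, align in retry:
        res = assign(window, list(claimed.values()), size, align)
        if res is None:
            unassigned.append(name)
        else:
            claimed[name] = res
    return claimed, unassigned


# (name, FW range, assumed alignment) for 0000:01:00.0, ranges from the log.
fw_setup = [
    ("BAR 2", (0x80360000000, 0x803607fffff), 0x800000),
    ("nr 6", (0x8037ff00000, 0x8037fffffff), 0x100000),
    ("BAR 9", (0x80360800000, 0x8037fffffff), 0x800000),
]
fw_window = (0x80360000000, 0x8037fffffff)     # bridge pref window from the log
grown_window = (0x80360000000, 0x803ffffffff)  # hypothetical resized window

print(survey(fw_window, fw_setup)[1])
print(survey(grown_window, fw_setup)[1])
```

With the window exactly as FW left it, BAR 9 can be neither claimed nor reassigned, which mirrors the failure in the log. Only when the window is allowed to grow does the second pass find room, and that is the part of the job the realloc code would have to do by default.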
I keep saying this but the type of CPU is not important when it comes to PCIe. Both PCIe and ACPI are governed by specs. If it is working for x86 and IA64, it needs to work for ARM64 as well.
That's theory. In practice there is massive legacy there and PCI resource assignment is carried out in an arch specific way (otherwise there would be no pci claiming/assignment code in arch/* right ?) and the resource claiming/assignment strictly depends on FW set-up, like it or lump it, that's the way it *currently* is.
As I wrote in my previous email, the status of PCI resources at OS/FW handoff is not strictly mandated by the PCI standard AFAIK (it is covered by section 3.5 "Device State at Firmware/Operating System Handoff" in the PCI FW spec revision 3.1), so what I suggest above is the only option we have. The alternative is to discard the FW configuration altogether, which is what happens if all PCI resources are reassigned; it is a choice to be made and it is neither correct nor wrong, much as I wish it were clear-cut.
Even ARM64 has the luxury to omit the old BIOS behaviors. Most ARM64 systems use tianocore based UEFI BIOS.
This is pointing to an implementation problem in arm64 adaptation. Need to figure out what is different.
Look no further, firmware is different. How do we want to proceed ?
Lorenzo
On 04.03.2016 11:55, Lorenzo Pieralisi wrote:
On Thu, Mar 03, 2016 at 09:24:56AM -0500, Sinan Kaya wrote:
On 3/3/2016 6:23 AM, Lorenzo Pieralisi wrote:
x86 and IA64 claim PCI resources on boot and live with that (well, minus the gazillions x86 pci= parameters that change the PCI resources assignment one way or another), comments very welcome in particular on the pci=realloc option and its usage.
I have been working with Linux PCIe over 3 years. I never used pci=realloc argument.
The v5 series minus [PATCH V5 11/15] drivers: pci: add generic code to claim bus resources is working just fine and is ready to go upstream in my opinion. It passed my internal testing with different types of endpoints.
The inclusion of this patch is now requiring everybody to add pci=realloc argument otherwise the resources assigned by the UEFI BIOS are not working.
I think there is still some work to be done in this patch and is too early to be included into the series. It is blocking progress of the series which is sitting on review over 1 year already.
First off, I think that's specious, patch 11 is not blocking anything, if you and Tomasz want to drop it go ahead and take responsibility of the consequences.
I am not saying patch 11 is perfect, it is there to review, if you spot bugs point them out.
If you are interested and willing to make an effort to understand why I asked Tomasz to integrate it, a bit of background here:
http://permalink.gmane.org/gmane.linux.kernel.pci/44830
If we want to drop patch 11, we are going to discard whatever FW set-up at FW/OS hand-off and reassign everything. Want to do it ? Go ahead.
I wrote it in my previous email, probably it was not clear, so, here we go again.
If we want to at least consider the FW PCI configuration at FW/OS handoff, we should read the PCI bridge apertures and claim them, when that fails reassign the corresponding PCI bus hierarchy (which means releasing the bridge resources and downstream devices and reassign them), that's what pci=realloc does.
I think that it is a command line option since it has to be a choice, ie overriding FW set-up should be an option, not a default.
Patch 11 does what x86 does in arch code arch/x86/pci/i386.c,
pcibios_resource_survey()
and that works for them (of course, minus quirks that do exist).
I could integrate the code implementing pci=realloc in patch 11 so that we realloc by default all resources claimed that failed (which means that bridges are resized accordingly and you won't be forced to use pci=realloc on command line).
I agree with Lorenzo. Just because v3 works it does not mean we want to go this way. Also, I think we should realloc all resources claimed that failed, w/o need to use pci=realloc on command line.
Tomasz
Posting on top of Tomasz's email...
On 3/4/2016 7:01 AM, Tomasz Nowicki wrote:
On 04.03.2016 11:55, Lorenzo Pieralisi wrote:
On Thu, Mar 03, 2016 at 09:24:56AM -0500, Sinan Kaya wrote:
On 3/3/2016 6:23 AM, Lorenzo Pieralisi wrote:
x86 and IA64 claim PCI resources on boot and live with that (well, minus the gazillions x86 pci= parameters that change the PCI resources assignment one way or another), comments very welcome in particular on the pci=realloc option and its usage.
I have been working with Linux PCIe over 3 years. I never used pci=realloc argument.
The v5 series minus [PATCH V5 11/15] drivers: pci: add generic code to claim bus resources is working just fine and is ready to go upstream in my opinion. It passed my internal testing with different types of endpoints.
The inclusion of this patch is now requiring everybody to add pci=realloc argument otherwise the resources assigned by the UEFI BIOS are not working.
I think there is still some work to be done in this patch and is too early to be included into the series. It is blocking progress of the series which is sitting on review over 1 year already.
First off, I think that's specious, patch 11 is not blocking anything, if you and Tomasz want to drop it go ahead and take responsibility of the consequences.
I am not saying patch 11 is perfect, it is there to review, if you spot bugs point them out.
If you are interested and willing to make an effort to understand why I asked Tomasz to integrate it, a bit of background here:
http://permalink.gmane.org/gmane.linux.kernel.pci/44830
If we want to drop patch 11, we are going to discard whatever FW set-up at FW/OS hand-off and reassign everything. Want to do it ? Go ahead.
Yes, we should ideally reuse the BAR addresses assigned by FW. And, as I said, that is not working here. It happens to work on Intel architectures. I'm saying that this patch needs some more work, not that it is right or wrong.
The question is how long it takes to figure this out. It could have been dealt with separately. If you can come up with a solution in the near future, I'm ready to test.
There was some big push on v3 to get it tested by multiple vendors. I was under the impression that we are trying to get some version accepted.
Since I'm the only one complaining right now, I guess nobody else is testing v4 and v5.
I wrote it in my previous email, probably it was not clear, so, here we go again.
If we want to at least consider the FW PCI configuration at FW/OS handoff, we should read the PCI bridge apertures and claim them, when that fails reassign the corresponding PCI bus hierarchy (which means releasing the bridge resources and downstream devices and reassign them), that's what pci=realloc does.
I think that it is a command line option since it has to be a choice, ie overriding FW set-up should be an option, not a default.
Patch 11 does what x86 does in arch code arch/x86/pci/i386.c,
pcibios_resource_survey()
and that works for them (of course, minus quirks that do exist).
I could integrate the code implementing pci=realloc in patch 11 so that we realloc by default all resources claimed that failed (which means that bridges are resized accordingly and you won't be forced to use pci=realloc on command line).
I agree with Lorenzo. Just because v3 works it does not mean we want to go this way. Also, I think we should realloc all resources claimed that failed, w/o need to use pci=realloc on command line.
Let's give this a try. I have seen the kernel messages with and without realloc option too. I don't want to see any kind of error messages if it is actually working.
I don't want to get a support request that PCIe is broken even though it is not just because of some error message in boot log that eventually got corrected by the architecture.
Tomasz
On Fri, Mar 04, 2016 at 09:52:17AM -0500, Sinan Kaya wrote:
[...]
I could integrate the code implementing pci=realloc in patch 11 so that we realloc by default all resources claimed that failed (which means that bridges are resized accordingly and you won't be forced to use pci=realloc on command line).
I agree with Lorenzo. Just because v3 works it does not mean we want to go this way. Also, I think we should realloc all resources claimed that failed, w/o need to use pci=realloc on command line.
Let's give this a try. I have seen the kernel messages with and without realloc option too. I don't want to see any kind of error messages if it is actually working.
I agree, resource claiming failures are too noisy; it is a pet peeve of mine too. The code to realloc resources is in the kernel already, so it is just a matter of defining how to use it (ie trigger it by default without a command line option; the kernel can actually already be compiled to enable realloc by default, see CONFIG_PCI_REALLOC_ENABLE_AUTO). That's why I added Yinghai to the thread: he and Bjorn have more insight into how this has been used on current systems than I do, and I am really keen on getting their opinion. Writing the patch itself should be simple enough.
Thanks ! Lorenzo