From a functional point of view, this series can be split into the
following logical parts:
1. Make the MMCONFIG code arch-agnostic, which allows all architectures to
   collect PCI config regions and use them when necessary.
2. Move non-arch-specific bits to the core code.
3. Use the MMCONFIG code to implement a generic ACPI-based PCI host
   controller driver.
4. Enable the above driver on ARM64.
The patches have been built on top of 4.4-rc4 and can be found here:
git@github.com:semihalf-nowicki-tomasz/linux.git (pci-acpi-v2)
NOTE: this patch set depends on Matthew's patches:
http://www.spinics.net/lists/linux-pci/msg45950.html
https://github.com/Vality/linux/tree/pci-fixes
This has been tested on a Cavium ThunderX single-socket server and on QEMU.
Any help with reviewing and testing is much appreciated.
v1 -> v2
- moved non-arch-specific code to the drivers/acpi/ directory
- fixed IO resource handling
- introduced PCI config accessor quirk matching
- moved ACPI_COMPANION_SET to generic code
Liu Jiang (1):
  ACPI, PCI: Refine the way to handle translation_offset for ACPI
    resources

Tomasz Nowicki (22):
  x86, pci: Reorder logic of pci_mmconfig_insert() function
  x86, pci, acpi: Move arch-agnostic MMCONFIG (aka ECAM) and ACPI code
    out of arch/x86/ directory
  pci, acpi, mcfg: Provide generic implementation of MCFG code
    initialization.
  x86, pci: mmconfig_{32,64}.c code refactoring - remove code
    duplication.
  x86, pci, ecam: mmconfig_64.c becomes default implementation for ECAM
    driver.
  XEN / PCI: Remove the dependence on arch x86 when PCI_MMCONFIG=y
  pci, acpi, mcfg: Provide default RAW ACPI PCI config space accessors.
  arm64, acpi: Use empty PCI config space accessors from mcfg.c file.
  pci, acpi, ecam: Add flag to indicate whether ECAM region was hot
    added or not.
  x86, pci: Cleanup platform specific MCFG data using previously added
    ECAM hot_added flag.
  arm64, pci: Remove useless boot time IRQ assignment when booting with
    DT.
  pci, acpi: Move ACPI host bridge device companion assignment to core
    code.
  x86, ia64, pci: Remove ACPI companion device from platform specific
    data.
  pci, acpi: Provide generic way to assign bus domain number.
  x86, ia64, pci: Convert arches to use PCI_DOMAINS_GENERIC.
  x86, ia64: Include acpi_pci_{add|remove}_bus to the default
    pcibios_{add|remove}_bus implementation.
  acpi, mcfg: Implement two calls that might be used to inject/remove
    MCFG region.
  x86, acpi, pci: Use equivalent function introduced in previous patch.
  acpi, mcfg: Add default PCI config accessors implementation and
    initial support for related quirks.
  pci, acpi: Support for ACPI based PCI hostbridge init
  pci, acpi: Match PCI config space accessors against platform specific
    quirks.
  arm64, pci, acpi: Start using ACPI based PCI host bridge driver for
    ARM64.
 arch/arm64/Kconfig                |  10 ++
 arch/arm64/kernel/pci.c           |  35 ------
 arch/ia64/Kconfig                 |   3 +
 arch/ia64/include/asm/pci.h       |   3 -
 arch/ia64/pci/pci.c               |  53 +++-----
 arch/x86/Kconfig                  |   7 ++
 arch/x86/include/asm/pci.h        |  10 --
 arch/x86/include/asm/pci_x86.h    |  28 +----
 arch/x86/pci/acpi.c               |  43 ++-----
 arch/x86/pci/common.c             |  10 --
 arch/x86/pci/irq.c                |  10 --
 arch/x86/pci/mmconfig-shared.c    | 250 ++++++--------------------------------
 arch/x86/pci/mmconfig_32.c        |  11 +-
 arch/x86/pci/mmconfig_64.c        |  67 +---------
 arch/x86/pci/numachip.c           |   1 +
 drivers/acpi/Makefile             |   1 +
 drivers/acpi/mcfg.c               | 203 +++++++++++++++++++++++++++++++
 drivers/acpi/pci_root.c           |   5 +-
 drivers/acpi/resource.c           |  12 +-
 drivers/pci/Kconfig               |  10 ++
 drivers/pci/Makefile              |   5 +
 drivers/pci/ecam.c                | 234 +++++++++++++++++++++++++++++++++++
 drivers/pci/host/Kconfig          |   6 +
 drivers/pci/host/Makefile         |   1 +
 drivers/pci/host/pci-host-acpi.c  | 138 +++++++++++++++++++++
 drivers/pci/pci.c                 |  29 ++++-
 drivers/pci/probe.c               |   5 +
 drivers/xen/pci.c                 |   7 +-
 include/asm-generic/vmlinux.lds.h |   7 ++
 include/linux/acpi.h              |   2 +
 include/linux/ecam.h              |  61 ++++++++++
 include/linux/pci-acpi.h          |  17 +++
 32 files changed, 817 insertions(+), 467 deletions(-)
 create mode 100644 drivers/acpi/mcfg.c
 create mode 100644 drivers/pci/ecam.c
 create mode 100644 drivers/pci/host/pci-host-acpi.c
 create mode 100644 include/linux/ecam.h
This patch is the first step of the MMCONFIG refactoring process.

Code that uses pci_mmcfg_lock will be moved to a common file and become
accessible to all architectures. pci_mmconfig_insert() cannot be moved so
easily, since it mixes generic mmconfig code with x86-specific logic inside
a mutually exclusive block guarded by pci_mmcfg_lock.
To get rid of that constraint, we reorder the actions as follows:
1. sanity check for mmconfig region presence; if we already have such a
   region, it doesn't make sense to allocate a new mmconfig list entry
2. mmconfig entry allocation, no need to lock
3. insertion into iomem_resource has its own lock, no need to wrap it in
   the mutex
4. insertion into the mmconfig list can be done as the final step in a
   separate function (a candidate for further refactoring) and needs
   another mmconfig lookup to avoid a race condition.
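The ordering above can be illustrated with a small userspace sketch. It is
not the kernel code: struct region, region_lookup(), region_inject() and
region_insert() are hypothetical stand-ins, a pthread mutex replaces both
pci_mmcfg_lock and the RCU read section, and step 3 (the iomem_resource
insertion) is left out.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_mmcfg_region. */
struct region {
	struct region *next;
	int segment;
	int start_bus;
	int end_bus;
};

static pthread_mutex_t region_lock = PTHREAD_MUTEX_INITIALIZER;
static struct region *region_list;

/* Return the region covering (segment, bus), or NULL. Caller holds the lock. */
static struct region *region_lookup(int segment, int bus)
{
	struct region *r;

	for (r = region_list; r; r = r->next)
		if (r->segment == segment &&
		    r->start_bus <= bus && bus <= r->end_bus)
			return r;
	return NULL;
}

/* Step 4: look up again under the lock, then publish the entry. */
static int region_inject(struct region *new)
{
	int err = 0;

	assert(new != NULL);
	pthread_mutex_lock(&region_lock);
	if (region_lookup(new->segment, new->start_bus)) {
		err = -EEXIST;	/* another caller won the race */
	} else {
		new->next = region_list;
		region_list = new;
	}
	pthread_mutex_unlock(&region_lock);
	return err;
}

static int region_insert(int segment, int start_bus, int end_bus)
{
	struct region *new;
	int err;

	if (start_bus > end_bus)
		return -EINVAL;

	/* Step 1: early check, so that we do not allocate needlessly. */
	pthread_mutex_lock(&region_lock);
	new = region_lookup(segment, start_bus);
	pthread_mutex_unlock(&region_lock);
	if (new)
		return -EEXIST;

	/* Step 2: allocation happens outside of any lock. */
	new = calloc(1, sizeof(*new));
	if (!new)
		return -ENOMEM;
	new->segment = segment;
	new->start_bus = start_bus;
	new->end_bus = end_bus;

	/* Step 4: the final insertion re-checks for a conflicting entry. */
	err = region_inject(new);
	if (err)
		free(new);
	return err;
}
```

The second lookup in region_inject() is what makes dropping the lock between
steps 1 and 4 safe: an entry added in the meantime is detected and the new
allocation is freed.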
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/pci/mmconfig-shared.c | 101 +++++++++++++++++++++++------------------
 1 file changed, 58 insertions(+), 43 deletions(-)
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index dd30b7e..c8bb9b0 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -720,6 +720,38 @@ static int __init pci_mmcfg_late_insert_resources(void)
  */
 late_initcall(pci_mmcfg_late_insert_resources);
 
+static int pci_mmconfig_inject(struct pci_mmcfg_region *cfg)
+{
+	struct pci_mmcfg_region *cfg_conflict;
+	int err = 0;
+
+	mutex_lock(&pci_mmcfg_lock);
+	cfg_conflict = pci_mmconfig_lookup(cfg->segment, cfg->start_bus);
+	if (cfg_conflict) {
+		if (cfg_conflict->end_bus < cfg->end_bus)
+			pr_info(FW_INFO "MMCONFIG for "
+				"domain %04x [bus %02x-%02x] "
+				"only partially covers this bridge\n",
+				cfg_conflict->segment, cfg_conflict->start_bus,
+				cfg_conflict->end_bus);
+		err = -EEXIST;
+		goto out;
+	}
+
+	if (pci_mmcfg_arch_map(cfg)) {
+		pr_warn("fail to map MMCONFIG %pR.\n", &cfg->res);
+		err = -ENOMEM;
+		goto out;
+	} else {
+		list_add_sorted(cfg);
+		pr_info("MMCONFIG at %pR (base %#lx)\n",
+			&cfg->res, (unsigned long)cfg->address);
+	}
+out:
+	mutex_unlock(&pci_mmcfg_lock);
+	return err;
+}
+
 /* Add MMCFG information for host bridges */
 int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			phys_addr_t addr)
@@ -734,63 +766,46 @@ int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 	if (start > end)
 		return -EINVAL;
 
-	mutex_lock(&pci_mmcfg_lock);
+	rcu_read_lock();
 	cfg = pci_mmconfig_lookup(seg, start);
-	if (cfg) {
-		if (cfg->end_bus < end)
-			dev_info(dev, FW_INFO
-				 "MMCONFIG for "
-				 "domain %04x [bus %02x-%02x] "
-				 "only partially covers this bridge\n",
-				 cfg->segment, cfg->start_bus, cfg->end_bus);
-		mutex_unlock(&pci_mmcfg_lock);
+	rcu_read_unlock();
+	if (cfg)
 		return -EEXIST;
-	}
 
-	if (!addr) {
-		mutex_unlock(&pci_mmcfg_lock);
+	if (!addr)
 		return -EINVAL;
-	}
 
-	rc = -EBUSY;
 	cfg = pci_mmconfig_alloc(seg, start, end, addr);
-	if (cfg == NULL) {
+	if (!cfg) {
 		dev_warn(dev, "fail to add MMCONFIG (out of memory)\n");
-		rc = -ENOMEM;
-	} else if (!pci_mmcfg_check_reserved(dev, cfg, 0)) {
+		return -ENOMEM;
+	}
+
+	rc = -EBUSY;
+	if (!pci_mmcfg_check_reserved(dev, cfg, 0)) {
 		dev_warn(dev, FW_BUG "MMCONFIG %pR isn't reserved\n",
 			 &cfg->res);
-	} else {
-		/* Insert resource if it's not in boot stage */
-		if (pci_mmcfg_running_state)
-			tmp = insert_resource_conflict(&iomem_resource,
-						       &cfg->res);
-
-		if (tmp) {
-			dev_warn(dev,
-				 "MMCONFIG %pR conflicts with "
-				 "%s %pR\n",
-				 &cfg->res, tmp->name, tmp);
-		} else if (pci_mmcfg_arch_map(cfg)) {
-			dev_warn(dev, "fail to map MMCONFIG %pR.\n",
-				 &cfg->res);
-		} else {
-			list_add_sorted(cfg);
-			dev_info(dev, "MMCONFIG at %pR (base %#lx)\n",
-				 &cfg->res, (unsigned long)addr);
-			cfg = NULL;
-			rc = 0;
-		}
+		goto error;
 	}
 
-	if (cfg) {
-		if (cfg->res.parent)
-			release_resource(&cfg->res);
-		kfree(cfg);
+	/* Insert resource if it's not in boot stage */
+	if (pci_mmcfg_running_state)
+		tmp = insert_resource_conflict(&iomem_resource, &cfg->res);
+
+	if (tmp) {
+		dev_warn(dev, "MMCONFIG %pR conflicts with %s %pR\n",
+			 &cfg->res, tmp->name, tmp);
+		goto error;
 	}
 
-	mutex_unlock(&pci_mmcfg_lock);
+	rc = pci_mmconfig_inject(cfg);
+	if (!rc)
+		return 0;
 
+error:
+	if (cfg->res.parent)
+		release_resource(&cfg->res);
+	kfree(cfg);
 	return rc;
 }
 
The ECAM standard and the MCFG table are architecture independent, so it
makes sense to share common code across all architectures. The two move to
their corresponding files: ecam.c and mcfg.c.

While we are here, rename pci_parse_mcfg() to acpi_parse_mcfg(). We already
have an acpi_parse_mcfg() prototype that is used nowhere. At the same time,
we need pci_parse_mcfg() to be global, so the existing acpi_parse_mcfg()
name fits perfectly here.
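For readers less familiar with ECAM, the address arithmetic the shared code
relies on can be sketched as below. The 20-bit bus shift is the same as
PCI_MMCFG_BUS_OFFSET() in this patch, and the window bounds mirror how
pci_mmconfig_alloc() fills res->start and res->end; the helper names
(ecam_offset(), ecam_window_start(), ecam_window_end()) are hypothetical and
only serve the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ECAM_BUS_SHIFT		20	/* 1 MiB of config space per bus */
#define ECAM_DEVFN_SHIFT	12	/* 4 KiB of config space per function */

/* Offset of a config register relative to the ECAM base of its segment. */
static uint64_t ecam_offset(unsigned int bus, unsigned int devfn,
			    unsigned int reg)
{
	assert(bus < 256 && devfn < 256 && reg < 4096);
	return ((uint64_t)bus << ECAM_BUS_SHIFT) |
	       ((uint64_t)devfn << ECAM_DEVFN_SHIFT) | reg;
}

/* First byte of the window covering [start_bus, end_bus]. */
static uint64_t ecam_window_start(uint64_t base, unsigned int start_bus)
{
	return base + ((uint64_t)start_bus << ECAM_BUS_SHIFT);
}

/* Last byte of the window covering [start_bus, end_bus]. */
static uint64_t ecam_window_end(uint64_t base, unsigned int end_bus)
{
	return base + ((uint64_t)(end_bus + 1) << ECAM_BUS_SHIFT) - 1;
}
```

Nothing in this arithmetic depends on the CPU architecture, which is the
reason the region bookkeeping can live in drivers/pci/ecam.c.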
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/Kconfig               |   3 +
 arch/x86/include/asm/pci_x86.h |  21 -----
 arch/x86/pci/acpi.c            |   1 +
 arch/x86/pci/mmconfig-shared.c | 205 +----------------------------------------
 arch/x86/pci/mmconfig_32.c     |   1 +
 arch/x86/pci/mmconfig_64.c     |   1 +
 arch/x86/pci/numachip.c        |   1 +
 drivers/acpi/Makefile          |   1 +
 drivers/acpi/mcfg.c            |  59 ++++++++++++
 drivers/pci/Kconfig            |   7 ++
 drivers/pci/Makefile           |   5 +
 drivers/pci/ecam.c             | 175 +++++++++++++++++++++++++++++++++++
 drivers/xen/pci.c              |   1 +
 include/linux/acpi.h           |   2 +
 include/linux/ecam.h           |  37 ++++++++
 15 files changed, 299 insertions(+), 221 deletions(-)
 create mode 100644 drivers/acpi/mcfg.c
 create mode 100644 drivers/pci/ecam.c
 create mode 100644 include/linux/ecam.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index db3622f..350bd52 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -128,6 +128,7 @@ config X86
 	select HAVE_MIXED_BREAKPOINTS_REGS
 	select HAVE_OPROFILE
 	select HAVE_OPTPROBES
+	select HAVE_PCI_ECAM
 	select HAVE_PCSPKR_PLATFORM
 	select HAVE_PERF_EVENTS
 	select HAVE_PERF_EVENTS_NMI
@@ -2365,6 +2366,7 @@ config PCI_DIRECT
 
 config PCI_MMCONFIG
 	def_bool y
+	select PCI_ECAM
 	depends on X86_32 && PCI && (ACPI || SFI) && (PCI_GOMMCONFIG || PCI_GOANY)
 
 config PCI_OLPC
@@ -2382,6 +2384,7 @@ config PCI_DOMAINS
 
 config PCI_MMCONFIG
 	bool "Support mmconfig PCI config space access"
+	select PCI_ECAM
 	depends on X86_64 && PCI && ACPI
 
 config PCI_CNB20LE_QUIRK
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index 16fd8e9..c1c0f37 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -123,33 +123,12 @@ extern int pci_legacy_init(void);
 extern int pcibios_fixup_irq(struct pci_dev *dev, u8 pin);
 
 /* pci-mmconfig.c */
-
-/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
-#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
-
-struct pci_mmcfg_region {
-	struct list_head list;
-	struct resource res;
-	u64 address;
-	char __iomem *virt;
-	u16 segment;
-	u8 start_bus;
-	u8 end_bus;
-	char name[PCI_MMCFG_RESOURCE_NAME_LEN];
-};
-
 extern int __init pci_mmcfg_arch_init(void);
 extern void __init pci_mmcfg_arch_free(void);
 extern int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg);
 extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg);
 extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			       phys_addr_t addr);
-extern int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
-extern struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
-
-extern struct list_head pci_mmcfg_list;
-
-#define PCI_MMCFG_BUS_OFFSET(bus)      ((bus) << 20)
 
 /*
  * AMD Fam10h CPUs are buggy, and cannot access MMIO config space
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index dda4bc1..64caf2b 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -5,6 +5,7 @@
 #include <linux/dmi.h>
 #include <linux/slab.h>
 #include <linux/pci-acpi.h>
+#include <linux/ecam.h>
 #include <asm/numa.h>
 #include <asm/pci_x86.h>
 
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index c8bb9b0..ce2c2e4 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -18,6 +18,7 @@
 #include <linux/slab.h>
 #include <linux/mutex.h>
 #include <linux/rculist.h>
+#include <linux/ecam.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
 #include <asm/acpi.h>
@@ -27,103 +28,6 @@
 /* Indicate if the mmcfg resources have been placed into the resource table. */
 static bool pci_mmcfg_running_state;
 static bool pci_mmcfg_arch_init_failed;
-static DEFINE_MUTEX(pci_mmcfg_lock);
-
-LIST_HEAD(pci_mmcfg_list);
-
-static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg)
-{
-	if (cfg->res.parent)
-		release_resource(&cfg->res);
-	list_del(&cfg->list);
-	kfree(cfg);
-}
-
-static void __init free_all_mmcfg(void)
-{
-	struct pci_mmcfg_region *cfg, *tmp;
-
-	pci_mmcfg_arch_free();
-	list_for_each_entry_safe(cfg, tmp, &pci_mmcfg_list, list)
-		pci_mmconfig_remove(cfg);
-}
-
-static void list_add_sorted(struct pci_mmcfg_region *new)
-{
-	struct pci_mmcfg_region *cfg;
-
-	/* keep list sorted by segment and starting bus number */
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
-		if (cfg->segment > new->segment ||
-		    (cfg->segment == new->segment &&
-		     cfg->start_bus >= new->start_bus)) {
-			list_add_tail_rcu(&new->list, &cfg->list);
-			return;
-		}
-	}
-	list_add_tail_rcu(&new->list, &pci_mmcfg_list);
-}
-
-static struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
-						   int end, u64 addr)
-{
-	struct pci_mmcfg_region *new;
-	struct resource *res;
-
-	if (addr == 0)
-		return NULL;
-
-	new = kzalloc(sizeof(*new), GFP_KERNEL);
-	if (!new)
-		return NULL;
-
-	new->address = addr;
-	new->segment = segment;
-	new->start_bus = start;
-	new->end_bus = end;
-
-	res = &new->res;
-	res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
-	res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
-	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-	snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
-		 "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
-	res->name = new->name;
-
-	return new;
-}
-
-static struct pci_mmcfg_region *__init pci_mmconfig_add(int segment, int start,
-							int end, u64 addr)
-{
-	struct pci_mmcfg_region *new;
-
-	new = pci_mmconfig_alloc(segment, start, end, addr);
-	if (new) {
-		mutex_lock(&pci_mmcfg_lock);
-		list_add_sorted(new);
-		mutex_unlock(&pci_mmcfg_lock);
-
-		pr_info(PREFIX
-		       "MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
-		       "(base %#lx)\n",
-		       segment, start, end, &new->res, (unsigned long)addr);
-	}
-
-	return new;
-}
-
-struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
-{
-	struct pci_mmcfg_region *cfg;
-
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
-		if (cfg->segment == segment &&
-		    cfg->start_bus <= bus && bus <= cfg->end_bus)
-			return cfg;
-
-	return NULL;
-}
 
 static const char *__init pci_mmcfg_e7520(void)
 {
@@ -543,8 +447,8 @@ static void __init pci_mmcfg_reject_broken(int early)
 	}
 }
 
-static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
-					struct acpi_mcfg_allocation *cfg)
+int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
+				 struct acpi_mcfg_allocation *cfg)
 {
 	int year;
 
@@ -566,50 +470,6 @@ static int __init acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
 	return -EINVAL;
 }
 
-static int __init pci_parse_mcfg(struct acpi_table_header *header)
-{
-	struct acpi_table_mcfg *mcfg;
-	struct acpi_mcfg_allocation *cfg_table, *cfg;
-	unsigned long i;
-	int entries;
-
-	if (!header)
-		return -EINVAL;
-
-	mcfg = (struct acpi_table_mcfg *)header;
-
-	/* how many config structures do we have */
-	free_all_mmcfg();
-	entries = 0;
-	i = header->length - sizeof(struct acpi_table_mcfg);
-	while (i >= sizeof(struct acpi_mcfg_allocation)) {
-		entries++;
-		i -= sizeof(struct acpi_mcfg_allocation);
-	}
-	if (entries == 0) {
-		pr_err(PREFIX "MMCONFIG has no entries\n");
-		return -ENODEV;
-	}
-
-	cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
-	for (i = 0; i < entries; i++) {
-		cfg = &cfg_table[i];
-		if (acpi_mcfg_check_entry(mcfg, cfg)) {
-			free_all_mmcfg();
-			return -ENODEV;
-		}
-
-		if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
-				   cfg->end_bus_number, cfg->address) == NULL) {
-			pr_warn(PREFIX "no memory for MCFG entries\n");
-			free_all_mmcfg();
-			return -ENOMEM;
-		}
-	}
-
-	return 0;
-}
-
 #ifdef CONFIG_ACPI_APEI
 extern int (*arch_apei_filter_addr)(int (*func)(__u64 start, __u64 size,
 				     void *data), void *data);
@@ -668,7 +528,7 @@ void __init pci_mmcfg_early_init(void)
 		if (pci_mmcfg_check_hostbridge())
 			known_bridge = 1;
 		else
-			acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+			acpi_sfi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
 		__pci_mmcfg_init(1);
 
 		set_apei_filter();
@@ -686,7 +546,7 @@ void __init pci_mmcfg_late_init(void)
 
 	/* MMCONFIG hasn't been enabled yet, try again */
 	if (pci_probe & PCI_PROBE_MASK & ~PCI_PROBE_MMCONF) {
-		acpi_sfi_table_parse(ACPI_SIG_MCFG, pci_parse_mcfg);
+		acpi_sfi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
 		__pci_mmcfg_init(0);
 	}
 }
@@ -720,38 +580,6 @@ static int __init pci_mmcfg_late_insert_resources(void)
  */
 late_initcall(pci_mmcfg_late_insert_resources);
 
-static int pci_mmconfig_inject(struct pci_mmcfg_region *cfg)
-{
-	struct pci_mmcfg_region *cfg_conflict;
-	int err = 0;
-
-	mutex_lock(&pci_mmcfg_lock);
-	cfg_conflict = pci_mmconfig_lookup(cfg->segment, cfg->start_bus);
-	if (cfg_conflict) {
-		if (cfg_conflict->end_bus < cfg->end_bus)
-			pr_info(FW_INFO "MMCONFIG for "
-				"domain %04x [bus %02x-%02x] "
-				"only partially covers this bridge\n",
-				cfg_conflict->segment, cfg_conflict->start_bus,
-				cfg_conflict->end_bus);
-		err = -EEXIST;
-		goto out;
-	}
-
-	if (pci_mmcfg_arch_map(cfg)) {
-		pr_warn("fail to map MMCONFIG %pR.\n", &cfg->res);
-		err = -ENOMEM;
-		goto out;
-	} else {
-		list_add_sorted(cfg);
-		pr_info("MMCONFIG at %pR (base %#lx)\n",
-			&cfg->res, (unsigned long)cfg->address);
-	}
-out:
-	mutex_unlock(&pci_mmcfg_lock);
-	return err;
-}
-
 /* Add MMCFG information for host bridges */
 int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			phys_addr_t addr)
@@ -808,26 +636,3 @@ error:
 	kfree(cfg);
 	return rc;
 }
-
-/* Delete MMCFG information for host bridges */
-int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
-{
-	struct pci_mmcfg_region *cfg;
-
-	mutex_lock(&pci_mmcfg_lock);
-	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
-		if (cfg->segment == seg && cfg->start_bus == start &&
-		    cfg->end_bus == end) {
-			list_del_rcu(&cfg->list);
-			synchronize_rcu();
-			pci_mmcfg_arch_unmap(cfg);
-			if (cfg->res.parent)
-				release_resource(&cfg->res);
-			mutex_unlock(&pci_mmcfg_lock);
-			kfree(cfg);
-			return 0;
-		}
-	mutex_unlock(&pci_mmcfg_lock);
-
-	return -ENOENT;
-}
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 43984bc..246f135 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -12,6 +12,7 @@
 #include <linux/pci.h>
 #include <linux/init.h>
 #include <linux/rcupdate.h>
+#include <linux/ecam.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
 
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index bea5249..b14fcd3 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -10,6 +10,7 @@
 #include <linux/acpi.h>
 #include <linux/bitmap.h>
 #include <linux/rcupdate.h>
+#include <linux/ecam.h>
 #include <asm/e820.h>
 #include <asm/pci_x86.h>
 
diff --git a/arch/x86/pci/numachip.c b/arch/x86/pci/numachip.c
index 2e565e6..55fbd18 100644
--- a/arch/x86/pci/numachip.c
+++ b/arch/x86/pci/numachip.c
@@ -13,6 +13,7 @@
  *
  */
 
+#include <linux/ecam.h>
 #include <linux/pci.h>
 #include <asm/pci_x86.h>
 
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 96b3183..265eb90 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -65,6 +65,7 @@ obj-$(CONFIG_ACPI_BUTTON)	+= button.o
 obj-$(CONFIG_ACPI_FAN)		+= fan.o
 obj-$(CONFIG_ACPI_VIDEO)	+= video.o
 obj-$(CONFIG_ACPI_PCI_SLOT)	+= pci_slot.o
+obj-$(CONFIG_PCI_MMCONFIG)	+= mcfg.o
 obj-$(CONFIG_ACPI_PROCESSOR)	+= processor.o
 obj-y				+= container.o
 obj-$(CONFIG_ACPI_THERMAL)	+= thermal.o
diff --git a/drivers/acpi/mcfg.c b/drivers/acpi/mcfg.c
new file mode 100644
index 0000000..5ecef20
--- /dev/null
+++ b/drivers/acpi/mcfg.c
@@ -0,0 +1,59 @@
+/*
+ * MCFG ACPI table parser.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/acpi.h>
+#include <linux/ecam.h>
+
+#include <asm/pci_x86.h> /* Temp hack before refactoring arch-specific calls */
+
+#define PREFIX	"MCFG: "
+
+int __init acpi_parse_mcfg(struct acpi_table_header *header)
+{
+	struct acpi_table_mcfg *mcfg;
+	struct acpi_mcfg_allocation *cfg_table, *cfg;
+	unsigned long i;
+	int entries;
+
+	if (!header)
+		return -EINVAL;
+
+	mcfg = (struct acpi_table_mcfg *)header;
+
+	/* how many config structures do we have */
+	free_all_mmcfg();
+	entries = 0;
+	i = header->length - sizeof(struct acpi_table_mcfg);
+	while (i >= sizeof(struct acpi_mcfg_allocation)) {
+		entries++;
+		i -= sizeof(struct acpi_mcfg_allocation);
+	}
+	if (entries == 0) {
+		pr_err(PREFIX "MCFG table has no entries\n");
+		return -ENODEV;
+	}
+
+	cfg_table = (struct acpi_mcfg_allocation *) &mcfg[1];
+	for (i = 0; i < entries; i++) {
+		cfg = &cfg_table[i];
+		if (acpi_mcfg_check_entry(mcfg, cfg)) {
+			free_all_mmcfg();
+			return -ENODEV;
+		}
+
+		if (pci_mmconfig_add(cfg->pci_segment, cfg->start_bus_number,
+				   cfg->end_bus_number, cfg->address) == NULL) {
+			pr_warn(PREFIX "no memory for MCFG entries\n");
+			free_all_mmcfg();
+			return -ENOMEM;
+		}
+	}
+
+	return 0;
+}
diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 73de4ef..9950248 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -26,6 +26,13 @@ config PCI_MSI_IRQ_DOMAIN
 	depends on PCI_MSI
 	select GENERIC_MSI_IRQ_DOMAIN
 
+config PCI_ECAM
+	bool "Enhanced Configuration Access Mechanism (ECAM)"
+	depends on PCI && HAVE_PCI_ECAM
+
+config HAVE_PCI_ECAM
+	bool
+
 config PCI_DEBUG
 	bool "PCI Debugging"
 	depends on PCI && DEBUG_KERNEL
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 8417f55..c41acf1 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -30,6 +30,11 @@ obj-$(CONFIG_PCI_ATS) += ats.o
 obj-$(CONFIG_PCI_IOV) += iov.o
 
 #
+# Enhanced Configuration Access Mechanism (ECAM)
+#
+obj-$(CONFIG_PCI_ECAM)	+= ecam.o
+
+#
 # ACPI Related PCI FW Functions
 # ACPI _DSM provided firmware instance and string name
 #
diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
new file mode 100644
index 0000000..d221dba
--- /dev/null
+++ b/drivers/pci/ecam.c
@@ -0,0 +1,175 @@
+/*
+ * Arch agnostic direct PCI config space access via
+ * ECAM (Enhanced Configuration Access Mechanism)
+ *
+ * Per-architecture code takes care of the mappings, region validation and
+ * accesses themselves.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/mutex.h>
+#include <linux/rculist.h>
+#include <linux/ecam.h>
+
+#include <asm/io.h>
+#include <asm/pci_x86.h> /* Temp hack before refactoring arch-specific calls */
+
+#define PREFIX "PCI: "
+
+static DEFINE_MUTEX(pci_mmcfg_lock);
+
+LIST_HEAD(pci_mmcfg_list);
+
+static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg)
+{
+	if (cfg->res.parent)
+		release_resource(&cfg->res);
+	list_del(&cfg->list);
+	kfree(cfg);
+}
+
+void __init free_all_mmcfg(void)
+{
+	struct pci_mmcfg_region *cfg, *tmp;
+
+	pci_mmcfg_arch_free();
+	list_for_each_entry_safe(cfg, tmp, &pci_mmcfg_list, list)
+		pci_mmconfig_remove(cfg);
+}
+
+void list_add_sorted(struct pci_mmcfg_region *new)
+{
+	struct pci_mmcfg_region *cfg;
+
+	/* keep list sorted by segment and starting bus number */
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list) {
+		if (cfg->segment > new->segment ||
+		    (cfg->segment == new->segment &&
+		     cfg->start_bus >= new->start_bus)) {
+			list_add_tail_rcu(&new->list, &cfg->list);
+			return;
+		}
+	}
+	list_add_tail_rcu(&new->list, &pci_mmcfg_list);
+}
+
+struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
+					    int end, u64 addr)
+{
+	struct pci_mmcfg_region *new;
+	struct resource *res;
+
+	if (addr == 0)
+		return NULL;
+
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return NULL;
+
+	new->address = addr;
+	new->segment = segment;
+	new->start_bus = start;
+	new->end_bus = end;
+
+	res = &new->res;
+	res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
+	res->end = addr + PCI_MMCFG_BUS_OFFSET(end + 1) - 1;
+	res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+	snprintf(new->name, PCI_MMCFG_RESOURCE_NAME_LEN,
+		 "PCI MMCONFIG %04x [bus %02x-%02x]", segment, start, end);
+	res->name = new->name;
+
+	return new;
+}
+
+struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+					  int end, u64 addr)
+{
+	struct pci_mmcfg_region *new;
+
+	new = pci_mmconfig_alloc(segment, start, end, addr);
+	if (new) {
+		mutex_lock(&pci_mmcfg_lock);
+		list_add_sorted(new);
+		mutex_unlock(&pci_mmcfg_lock);
+
+		pr_info(PREFIX
+		       "MMCONFIG for domain %04x [bus %02x-%02x] at %pR "
+		       "(base %#lx)\n",
+		       segment, start, end, &new->res, (unsigned long)addr);
+	}
+
+	return new;
+}
+
+struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus)
+{
+	struct pci_mmcfg_region *cfg;
+
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+		if (cfg->segment == segment &&
+		    cfg->start_bus <= bus && bus <= cfg->end_bus)
+			return cfg;
+
+	return NULL;
+}
+
+/* Delete MMCFG information for host bridges */
+int pci_mmconfig_delete(u16 seg, u8 start, u8 end)
+{
+	struct pci_mmcfg_region *cfg;
+
+	mutex_lock(&pci_mmcfg_lock);
+	list_for_each_entry_rcu(cfg, &pci_mmcfg_list, list)
+		if (cfg->segment == seg && cfg->start_bus == start &&
+		    cfg->end_bus == end) {
+			list_del_rcu(&cfg->list);
+			synchronize_rcu();
+			pci_mmcfg_arch_unmap(cfg);
+			if (cfg->res.parent)
+				release_resource(&cfg->res);
+			mutex_unlock(&pci_mmcfg_lock);
+			kfree(cfg);
+			return 0;
+		}
+	mutex_unlock(&pci_mmcfg_lock);
+
+	return -ENOENT;
+}
+
+int pci_mmconfig_inject(struct pci_mmcfg_region *cfg)
+{
+	struct pci_mmcfg_region *cfg_conflict;
+	int err = 0;
+
+	mutex_lock(&pci_mmcfg_lock);
+	cfg_conflict = pci_mmconfig_lookup(cfg->segment, cfg->start_bus);
+	if (cfg_conflict) {
+		if (cfg_conflict->end_bus < cfg->end_bus)
+			pr_info(FW_INFO "MMCONFIG for "
+				"domain %04x [bus %02x-%02x] "
+				"only partially covers this bridge\n",
+				cfg_conflict->segment, cfg_conflict->start_bus,
+				cfg_conflict->end_bus);
+		err = -EEXIST;
+		goto out;
+	}
+
+	if (pci_mmcfg_arch_map(cfg)) {
+		pr_warn("fail to map MMCONFIG %pR.\n", &cfg->res);
+		err = -ENOMEM;
+		goto out;
+	} else {
+		list_add_sorted(cfg);
+		pr_info("MMCONFIG at %pR (base %#lx)\n",
+			&cfg->res, (unsigned long)cfg->address);
+
+	}
+out:
+	mutex_unlock(&pci_mmcfg_lock);
+	return err;
+}
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 7494dbe..6785ebb 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -20,6 +20,7 @@
 #include <linux/pci.h>
 #include <linux/acpi.h>
 #include <linux/pci-acpi.h>
+#include <linux/ecam.h>
 #include <xen/xen.h>
 #include <xen/interface/physdev.h>
 #include <xen/interface/xen.h>
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index cc91c15..e95eab2 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -162,6 +162,8 @@ int acpi_table_parse_madt(enum acpi_madt_type id,
 			  acpi_tbl_entry_handler handler,
 			  unsigned int max_entries);
 int acpi_parse_mcfg (struct acpi_table_header *header);
+int acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
+			  struct acpi_mcfg_allocation *cfg);
 void acpi_table_print_madt_entry (struct acpi_subtable_header *madt);
 
 /* the following four functions are architecture-dependent */
diff --git a/include/linux/ecam.h b/include/linux/ecam.h
new file mode 100644
index 0000000..dec3b52
--- /dev/null
+++ b/include/linux/ecam.h
@@ -0,0 +1,37 @@
+#ifndef __ECAM_H
+#define __ECAM_H
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+#include <linux/acpi.h>
+
+/* "PCI MMCONFIG %04x [bus %02x-%02x]" */
+#define PCI_MMCFG_RESOURCE_NAME_LEN (22 + 4 + 2 + 2)
+
+struct pci_mmcfg_region {
+	struct list_head list;
+	struct resource res;
+	u64 address;
+	char __iomem *virt;
+	u16 segment;
+	u8 start_bus;
+	u8 end_bus;
+	char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+};
+
+struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
+struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
+					    int end, u64 addr);
+int pci_mmconfig_inject(struct pci_mmcfg_region *cfg);
+struct pci_mmcfg_region *pci_mmconfig_add(int segment, int start,
+					  int end, u64 addr);
+void list_add_sorted(struct pci_mmcfg_region *new);
+void free_all_mmcfg(void);
+int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+
+extern struct list_head pci_mmcfg_list;
+
+#define PCI_MMCFG_BUS_OFFSET(bus)      ((bus) << 20)
+
+#endif  /* __KERNEL__ */
+#endif  /* __ECAM_H */
The first function, acpi_mcfg_check_entry(), does not apply any quirks by
default.

The last two functions are required by the ACPI subsystem to make PCI config
space accessible. The generic code does nothing for the early init call,
but the late init call does the following:
- parse the MCFG table and add regions to the ECAM resource list
- map regions
- add regions to iomem_resource
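As an illustration of the first late-init step and of the overridable
default check, here is a userspace sketch. It assumes the usual MCFG layout
(a 44-byte table header followed by 16-byte little-endian allocation
entries) and a GCC/Clang-style weak attribute; struct mcfg_entry,
mcfg_check_entry(), get_le() and mcfg_parse() are hypothetical names, not
kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCFG_HEADER_LEN	44u	/* 36-byte ACPI header + 8 reserved bytes */
#define MCFG_ENTRY_LEN	16u	/* one allocation entry */

/* Hypothetical decoded form of one MCFG allocation entry. */
struct mcfg_entry {
	uint64_t address;
	uint16_t segment;
	uint8_t start_bus;
	uint8_t end_bus;
};

/*
 * Weak default: accept every entry. A platform file may provide a strong
 * definition with the same name to apply its own quirks.
 */
__attribute__((weak)) int mcfg_check_entry(const struct mcfg_entry *e)
{
	(void)e;
	return 0;
}

/* Decode a little-endian integer of the given width. */
static uint64_t get_le(const uint8_t *p, int bytes)
{
	uint64_t v = 0;

	while (bytes--)
		v = (v << 8) | p[bytes];
	return v;
}

/* Return the number of entries stored in out[], or -1 on a bad table. */
static int mcfg_parse(const uint8_t *table, size_t len,
		      struct mcfg_entry *out, int max)
{
	size_t off;
	int n = 0;

	if (!table || len < MCFG_HEADER_LEN + MCFG_ENTRY_LEN)
		return -1;	/* no table, or a table without entries */

	for (off = MCFG_HEADER_LEN;
	     off + MCFG_ENTRY_LEN <= len && n < max;
	     off += MCFG_ENTRY_LEN) {
		struct mcfg_entry e;

		e.address = get_le(table + off, 8);
		e.segment = (uint16_t)get_le(table + off + 8, 2);
		e.start_bus = table[off + 10];
		e.end_bus = table[off + 11];
		if (mcfg_check_entry(&e))
			return -1;	/* rejected by the (possibly overridden) check */
		out[n++] = e;
	}
	return n;
}
```

Keeping the check weak lets an architecture reject or fix up entries without
touching the generic walker, which is the same split this patch makes
between mcfg.c and the arch code.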
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 drivers/acpi/mcfg.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
diff --git a/drivers/acpi/mcfg.c b/drivers/acpi/mcfg.c
index 5ecef20..fad9917 100644
--- a/drivers/acpi/mcfg.c
+++ b/drivers/acpi/mcfg.c
@@ -57,3 +57,29 @@ int __init acpi_parse_mcfg(struct acpi_table_header *header)
 
 	return 0;
 }
+
+int __init __weak acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg,
+					struct acpi_mcfg_allocation *cfg)
+{
+	return 0;
+}
+
+void __init __weak pci_mmcfg_early_init(void)
+{
+
+}
+
+void __init __weak pci_mmcfg_late_init(void)
+{
+	struct pci_mmcfg_region *cfg;
+
+	acpi_table_parse(ACPI_SIG_MCFG, acpi_parse_mcfg);
+
+	if (list_empty(&pci_mmcfg_list))
+		return;
+	if (!pci_mmcfg_arch_init())
+		free_all_mmcfg();
+
+	list_for_each_entry(cfg, &pci_mmcfg_list, list)
+		insert_resource(&iomem_resource, &cfg->res);
+}
The mmconfig_64.c version is going to be the default implementation for
low-level operations on mmconfig regions. However, it currently initializes
the raw_pci_ext_ops pointer, which is specific to x86 only. Moreover,
mmconfig_32.c does the same thing. So let's move the assignment to
mmconfig-shared.c, so that it becomes common to both and mmconfig_64.c
becomes purely arch-agnostic.
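The shape of the change can be sketched in userspace as follows. This is an
illustration only: struct raw_ops, mmcfg_read(), mmcfg_write(),
mmcfg_arch_init(), shared_init() and the fake_cfg array are hypothetical
stand-ins for pci_raw_ops, pci_mmcfg_read(), pci_mmcfg_write(),
pci_mmcfg_arch_init(), __pci_mmcfg_init() and real config space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_raw_ops. */
struct raw_ops {
	int (*read)(int reg, uint32_t *value);
	int (*write)(int reg, uint32_t value);
};

/* Prototypes exported by a shared header, so the accessors are not static. */
int mmcfg_read(int reg, uint32_t *value);
int mmcfg_write(int reg, uint32_t value);
int mmcfg_arch_init(void);
void shared_init(void);

/* Low-level side: knows nothing about the ops pointer. */
static uint32_t fake_cfg[16];	/* pretend 64-byte config space */

int mmcfg_read(int reg, uint32_t *value)
{
	if (reg < 0 || reg >= 64 || (reg & 3)) {
		*value = 0xffffffffu;
		return -1;
	}
	*value = fake_cfg[reg / 4];
	return 0;
}

int mmcfg_write(int reg, uint32_t value)
{
	if (reg < 0 || reg >= 64 || (reg & 3))
		return -1;
	fake_cfg[reg / 4] = value;
	return 0;
}

/* Returns non-zero on success, like pci_mmcfg_arch_init() in the diff. */
int mmcfg_arch_init(void)
{
	return 1;
}

/* Shared side: the only place that wires the ops table up. */
static const struct raw_ops mmcfg_ops = {
	.read = mmcfg_read,
	.write = mmcfg_write,
};

static const struct raw_ops *ext_ops;

void shared_init(void)
{
	if (mmcfg_arch_init())
		ext_ops = &mmcfg_ops;
}
```

With the pointer assignment confined to the shared file, the low-level
accessors carry no x86-only state and can later be reused by other
architectures.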
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/include/asm/pci_x86.h |  5 +++++
 arch/x86/pci/mmconfig-shared.c | 10 ++++++++--
 arch/x86/pci/mmconfig_32.c     | 10 ++--------
 arch/x86/pci/mmconfig_64.c     | 11 ++---------
 4 files changed, 17 insertions(+), 19 deletions(-)
diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
index c1c0f37..0482807 100644
--- a/arch/x86/include/asm/pci_x86.h
+++ b/arch/x86/include/asm/pci_x86.h
@@ -130,6 +130,11 @@ extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg);
 extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end,
 			       phys_addr_t addr);
 
+int pci_mmcfg_read(unsigned int seg, unsigned int bus, unsigned int devfn,
+		   int reg, int len, u32 *value);
+int pci_mmcfg_write(unsigned int seg, unsigned int bus, unsigned int devfn,
+		    int reg, int len, u32 value);
+
 /*
  * AMD Fam10h CPUs are buggy, and cannot access MMIO config space
  * on their northbrige except through the * %eax register. As such, you MUST
diff --git a/arch/x86/pci/mmconfig-shared.c b/arch/x86/pci/mmconfig-shared.c
index ce2c2e4..980f304 100644
--- a/arch/x86/pci/mmconfig-shared.c
+++ b/arch/x86/pci/mmconfig-shared.c
@@ -29,6 +29,11 @@
 static bool pci_mmcfg_running_state;
 static bool pci_mmcfg_arch_init_failed;
 
+const struct pci_raw_ops pci_mmcfg = {
+	.read = pci_mmcfg_read,
+	.write = pci_mmcfg_write,
+};
+
 static const char *__init pci_mmcfg_e7520(void)
 {
 	u32 win;
@@ -512,9 +517,10 @@ static void __init __pci_mmcfg_init(int early)
 		}
 	}
 
-	if (pci_mmcfg_arch_init())
+	if (pci_mmcfg_arch_init()) {
+		raw_pci_ext_ops = &pci_mmcfg;
 		pci_probe = (pci_probe & ~PCI_PROBE_MASK) | PCI_PROBE_MMCONF;
-	else {
+	} else {
 		free_all_mmcfg();
 		pci_mmcfg_arch_init_failed = true;
 	}
diff --git a/arch/x86/pci/mmconfig_32.c b/arch/x86/pci/mmconfig_32.c
index 246f135..2ded56f 100644
--- a/arch/x86/pci/mmconfig_32.c
+++ b/arch/x86/pci/mmconfig_32.c
@@ -50,7 +50,7 @@ static void pci_exp_set_dev_base(unsigned int base, int bus, int devfn)
 	}
 }
 
-static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
+int pci_mmcfg_read(unsigned int seg, unsigned int bus,
 			  unsigned int devfn, int reg, int len, u32 *value)
 {
 	unsigned long flags;
@@ -89,7 +89,7 @@ err:		*value = -1;
 	return 0;
 }
 
-static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
+int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 			   unsigned int devfn, int reg, int len, u32 value)
 {
 	unsigned long flags;
@@ -126,15 +126,9 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 	return 0;
 }
 
-const struct pci_raw_ops pci_mmcfg = {
-	.read = pci_mmcfg_read,
-	.write = pci_mmcfg_write,
-};
-
 int __init pci_mmcfg_arch_init(void)
 {
 	printk(KERN_INFO "PCI: Using MMCONFIG for extended config space\n");
-	raw_pci_ext_ops = &pci_mmcfg;
 	return 1;
 }
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c
index b14fcd3..d0c48eb 100644
--- a/arch/x86/pci/mmconfig_64.c
+++ b/arch/x86/pci/mmconfig_64.c
@@ -25,7 +25,7 @@ static char __iomem *pci_dev_base(unsigned int seg, unsigned int bus, unsigned i
 	return NULL;
 }
 
-static int pci_mmcfg_read(unsigned int seg, unsigned int bus,
+int pci_mmcfg_read(unsigned int seg, unsigned int bus,
 			  unsigned int devfn, int reg, int len, u32 *value)
 {
 	char __iomem *addr;
@@ -59,7 +59,7 @@ err:		*value = -1;
 	return 0;
 }
 
-static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
+int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 			   unsigned int devfn, int reg, int len, u32 value)
 {
 	char __iomem *addr;
@@ -91,11 +91,6 @@ static int pci_mmcfg_write(unsigned int seg, unsigned int bus,
 	return 0;
 }
 
-const struct pci_raw_ops pci_mmcfg = {
-	.read = pci_mmcfg_read,
-	.write = pci_mmcfg_write,
-};
-
 static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg)
 {
 	void __iomem *addr;
@@ -121,8 +116,6 @@ int __init pci_mmcfg_arch_init(void)
 			return 0;
 		}
 
-	raw_pci_ext_ops = &pci_mmcfg;
-
 	return 1;
 }
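
The resulting division of labour can be sketched in plain user-space C. This is only an illustrative model, not kernel code: every name ending in _sketch, the stub accessors and the mapping_ok parameter are assumptions made for the example. It shows the arch hook reporting only success or failure, while the shared caller owns the single ops table and installs the pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct pci_raw_ops (hypothetical). */
struct raw_ops_sketch {
	int (*read)(unsigned int seg, unsigned int bus, unsigned int devfn,
		    int reg, int len, uint32_t *value);
	int (*write)(unsigned int seg, unsigned int bus, unsigned int devfn,
		     int reg, int len, uint32_t value);
};

/* Stub accessors; in the patch the real ones stay in mmconfig_{32,64}.c. */
static int read_sketch(unsigned int seg, unsigned int bus, unsigned int devfn,
		       int reg, int len, uint32_t *value)
{
	(void)seg; (void)bus; (void)devfn; (void)reg; (void)len;
	*value = 0xffffffffu;
	return 0;
}

static int write_sketch(unsigned int seg, unsigned int bus, unsigned int devfn,
			int reg, int len, uint32_t value)
{
	(void)seg; (void)bus; (void)devfn; (void)reg; (void)len; (void)value;
	return 0;
}

/* One shared table, modelling pci_mmcfg in mmconfig-shared.c. */
static const struct raw_ops_sketch mmcfg_ops_sketch = {
	.read = read_sketch,
	.write = write_sketch,
};

/* Models raw_pci_ext_ops; stays NULL until init succeeds. */
static const struct raw_ops_sketch *ext_ops_sketch;

/* Arch hook: reports 1 on success and 0 on failure, nothing else. */
static int arch_init_sketch(int mapping_ok)
{
	return mapping_ok ? 1 : 0;
}

/* Shared caller: installs the ops pointer only when the hook succeeded. */
static int mmcfg_init_sketch(int mapping_ok)
{
	if (!arch_init_sketch(mapping_ok))
		return 0;

	ext_ops_sketch = &mmcfg_ops_sketch;
	return 1;
}
```

With this split, the arch hook no longer touches the x86-only pointer, which is what allows the 64-bit file to be reused elsewhere.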
Hosts with custom ECAM hooks (like 32-bit x86) should select ARCH_HAS_CUSTOM_PCI_ECAM. Otherwise, the host will use the generic version provided by this patch (as 64-bit x86 does).

Note that we left the x86-specific PCI config accessors in their corresponding files.

Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/Kconfig               |  1 +
 arch/x86/include/asm/pci_x86.h |  4 ---
 arch/x86/pci/mmconfig_64.c     | 55 ---------------------------------------
 drivers/acpi/mcfg.c            |  2 --
 drivers/pci/Kconfig            |  3 +++
 drivers/pci/ecam.c             | 59 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/ecam.h           |  6 +++++
 7 files changed, 68 insertions(+), 62 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 350bd52..102d7d1 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -30,6 +30,7 @@ config X86 select ARCH_HAS_PMEM_API if X86_64 select ARCH_HAS_MMIO_FLUSH select ARCH_HAS_SG_CHAIN + select ARCH_HAS_CUSTOM_PCI_ECAM if X86_32 select ARCH_HAVE_NMI_SAFE_CMPXCHG select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI select ARCH_MIGHT_HAVE_PC_PARPORT diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h index 0482807..be091b4 100644 --- a/arch/x86/include/asm/pci_x86.h +++ b/arch/x86/include/asm/pci_x86.h @@ -123,10 +123,6 @@ extern int pci_legacy_init(void); extern int pcibios_fixup_irq(struct pci_dev *dev, u8 pin);
/* pci-mmconfig.c */ -extern int __init pci_mmcfg_arch_init(void); -extern void __init pci_mmcfg_arch_free(void); -extern int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg); -extern void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg); extern int pci_mmconfig_insert(struct device *dev, u16 seg, u8 start, u8 end, phys_addr_t addr);
diff --git a/arch/x86/pci/mmconfig_64.c b/arch/x86/pci/mmconfig_64.c index d0c48eb..fd356cc 100644 --- a/arch/x86/pci/mmconfig_64.c +++ b/arch/x86/pci/mmconfig_64.c @@ -90,58 +90,3 @@ int pci_mmcfg_write(unsigned int seg, unsigned int bus,
return 0; } - -static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg) -{ - void __iomem *addr; - u64 start, size; - int num_buses; - - start = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->start_bus); - num_buses = cfg->end_bus - cfg->start_bus + 1; - size = PCI_MMCFG_BUS_OFFSET(num_buses); - addr = ioremap_nocache(start, size); - if (addr) - addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus); - return addr; -} - -int __init pci_mmcfg_arch_init(void) -{ - struct pci_mmcfg_region *cfg; - - list_for_each_entry(cfg, &pci_mmcfg_list, list) - if (pci_mmcfg_arch_map(cfg)) { - pci_mmcfg_arch_free(); - return 0; - } - - return 1; -} - -void __init pci_mmcfg_arch_free(void) -{ - struct pci_mmcfg_region *cfg; - - list_for_each_entry(cfg, &pci_mmcfg_list, list) - pci_mmcfg_arch_unmap(cfg); -} - -int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg) -{ - cfg->virt = mcfg_ioremap(cfg); - if (!cfg->virt) { - pr_err(PREFIX "can't map MMCONFIG at %pR\n", &cfg->res); - return -ENOMEM; - } - - return 0; -} - -void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg) -{ - if (cfg && cfg->virt) { - iounmap(cfg->virt + PCI_MMCFG_BUS_OFFSET(cfg->start_bus)); - cfg->virt = NULL; - } -} diff --git a/drivers/acpi/mcfg.c b/drivers/acpi/mcfg.c index fad9917..745b83e 100644 --- a/drivers/acpi/mcfg.c +++ b/drivers/acpi/mcfg.c @@ -10,8 +10,6 @@ #include <linux/acpi.h> #include <linux/ecam.h>
-#include <asm/pci_x86.h> /* Temp hack before refactoring arch-specific calls */ - #define PREFIX "MCFG: "
int __init acpi_parse_mcfg(struct acpi_table_header *header) diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig index 9950248..b2e27c8 100644 --- a/drivers/pci/Kconfig +++ b/drivers/pci/Kconfig @@ -33,6 +33,9 @@ config PCI_ECAM config HAVE_PCI_ECAM bool
+config ARCH_HAS_CUSTOM_PCI_ECAM + bool + config PCI_DEBUG bool "PCI Debugging" depends on PCI && DEBUG_KERNEL diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c index d221dba..8a5eef7 100644 --- a/drivers/pci/ecam.c +++ b/drivers/pci/ecam.c @@ -16,7 +16,6 @@ #include <linux/ecam.h>
#include <asm/io.h> -#include <asm/pci_x86.h> /* Temp hack before refactoring arch-specific calls */
#define PREFIX "PCI: "
@@ -24,6 +23,64 @@ static DEFINE_MUTEX(pci_mmcfg_lock);
LIST_HEAD(pci_mmcfg_list);
+#ifndef CONFIG_ARCH_HAS_CUSTOM_PCI_ECAM + +static void __iomem *mcfg_ioremap(struct pci_mmcfg_region *cfg) +{ + void __iomem *addr; + u64 start, size; + int num_buses; + + start = cfg->address + PCI_MMCFG_BUS_OFFSET(cfg->start_bus); + num_buses = cfg->end_bus - cfg->start_bus + 1; + size = PCI_MMCFG_BUS_OFFSET(num_buses); + addr = ioremap_nocache(start, size); + if (addr) + addr -= PCI_MMCFG_BUS_OFFSET(cfg->start_bus); + return addr; +} + +int __init pci_mmcfg_arch_init(void) +{ + struct pci_mmcfg_region *cfg; + + list_for_each_entry(cfg, &pci_mmcfg_list, list) + if (pci_mmcfg_arch_map(cfg)) { + pci_mmcfg_arch_free(); + return 0; + } + + return 1; +} + +void __init pci_mmcfg_arch_free(void) +{ + struct pci_mmcfg_region *cfg; + + list_for_each_entry(cfg, &pci_mmcfg_list, list) + pci_mmcfg_arch_unmap(cfg); +} + +int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg) +{ + cfg->virt = mcfg_ioremap(cfg); + if (!cfg->virt) { + pr_err(PREFIX "can't map MMCONFIG at %pR\n", &cfg->res); + return -ENOMEM; + } + + return 0; +} + +void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg) +{ + if (cfg && cfg->virt) { + iounmap(cfg->virt + PCI_MMCFG_BUS_OFFSET(cfg->start_bus)); + cfg->virt = NULL; + } +} +#endif + static void __init pci_mmconfig_remove(struct pci_mmcfg_region *cfg) { if (cfg->res.parent) diff --git a/include/linux/ecam.h b/include/linux/ecam.h index dec3b52..813acd1 100644 --- a/include/linux/ecam.h +++ b/include/linux/ecam.h @@ -29,6 +29,12 @@ void list_add_sorted(struct pci_mmcfg_region *new); void free_all_mmcfg(void); int pci_mmconfig_delete(u16 seg, u8 start, u8 end);
+/* Arch specific calls */ +int pci_mmcfg_arch_init(void); +void pci_mmcfg_arch_free(void); +int pci_mmcfg_arch_map(struct pci_mmcfg_region *cfg); +void pci_mmcfg_arch_unmap(struct pci_mmcfg_region *cfg); + extern struct list_head pci_mmcfg_list;
#define PCI_MMCFG_BUS_OFFSET(bus) ((bus) << 20)
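
The address arithmetic used by mcfg_ioremap() and PCI_MMCFG_BUS_OFFSET() can be checked with a small user-space sketch. The helper names below are hypothetical, and the devfn << 12 term follows the standard ECAM layout rather than anything shown in this patch. Pointers are modelled as plain 64-bit integers.

```c
#include <stdint.h>

/* Same shift as PCI_MMCFG_BUS_OFFSET() in the patch: 1 MiB per bus. */
#define BUS_OFFSET_SKETCH(bus) ((uint64_t)(bus) << 20)

/* Size of the window that gets mapped for buses [start_bus, end_bus]. */
static uint64_t ecam_window_size(uint8_t start_bus, uint8_t end_bus)
{
	uint64_t num_buses = (uint64_t)end_bus - start_bus + 1;

	return BUS_OFFSET_SKETCH(num_buses);
}

/* Physical address at which the mapping starts. */
static uint64_t ecam_window_start(uint64_t base, uint8_t start_bus)
{
	return base + BUS_OFFSET_SKETCH(start_bus);
}

/*
 * mcfg_ioremap() biases the returned address downwards by the offset of
 * start_bus, so that later lookups can add the full bus offset directly.
 */
static uint64_t biased_base(uint64_t mapped, uint8_t start_bus)
{
	return mapped - BUS_OFFSET_SKETCH(start_bus);
}

/* Offset of one config register (assumed ECAM layout). */
static uint64_t ecam_offset(uint8_t bus, uint8_t devfn, uint16_t reg)
{
	return BUS_OFFSET_SKETCH(bus) | ((uint64_t)devfn << 12) | (reg & 0xfff);
}
```

For a region covering buses 0x10 to 0x1f, only 16 MiB is mapped, yet biased_base() plus ecam_offset() for bus 0x10 lands exactly on the first mapped byte.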
In drivers/xen/pci.c there is x86-dependent code that is built when CONFIG_PCI_MMCONFIG is enabled. Since CONFIG_PCI_MMCONFIG depends on ACPI, this prevents Xen PCI from running on other architectures that use ACPI with PCI_MMCONFIG enabled (such as ARM64).

Fortunately, it can be solved in a simple way. In drivers/xen/pci.c, the only x86-dependent code is the check if ((pci_probe & PCI_PROBE_MMCONF) == 0), whose symbols are defined in asm/pci_x86.h. The check means that if PCI resources were not probed the PCI_PROBE_MMCONF way, the Xen MCFG init is skipped. The check is actually redundant: if PCI resources were not probed the PCI_PROBE_MMCONF way, pci_mmcfg_list will be empty, and the if (list_empty()) check that follows does the same job.

So just remove the arch-related code and the header file include. This is no functional change for x86, and it makes xen/pci.c usable on other architectures.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 drivers/xen/pci.c | 6 ------
 1 file changed, 6 deletions(-)
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 6785ebb..9a8dbe3 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -28,9 +28,6 @@
 #include <asm/xen/hypervisor.h>
 #include <asm/xen/hypercall.h>
 #include "../pci/pci.h"
-#ifdef CONFIG_PCI_MMCONFIG
-#include <asm/pci_x86.h>
-#endif
 
 static bool __read_mostly pci_seg_supported = true;
 
@@ -222,9 +219,6 @@ static int __init xen_mcfg_late(void)
 	if (!xen_initial_domain())
 		return 0;
 
-	if ((pci_probe & PCI_PROBE_MMCONF) == 0)
-		return 0;
-
 	if (list_empty(&pci_mmcfg_list))
 		return 0;
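
The redundancy argument from the commit message can be written down as a tiny user-space model. The struct and function names are hypothetical, and the model simply takes the commit message's claim as an invariant: when MMCONFIG probing was not used, the region list is empty.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the two facts xen_mcfg_late() looks at. */
struct mcfg_state_sketch {
	bool probed_mmconf;	/* models (pci_probe & PCI_PROBE_MMCONF) != 0 */
	size_t nr_regions;	/* models the length of pci_mmcfg_list */
};

/* Before the patch: explicit probe check, then the list check. */
static bool proceeds_before_patch(const struct mcfg_state_sketch *s)
{
	if (!s->probed_mmconf)
		return false;
	if (s->nr_regions == 0)
		return false;
	return true;
}

/* After the patch: only the list check remains. */
static bool proceeds_after_patch(const struct mcfg_state_sketch *s)
{
	return s->nr_regions != 0;
}
```

Both functions agree for every state that satisfies the invariant; they would differ only for a state with probed_mmconf false and a non-empty list, which the commit message says cannot occur.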
On 16.12.2015 16:16, Tomasz Nowicki wrote:
diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c index 6785ebb..9a8dbe3 100644 --- a/drivers/xen/pci.c +++ b/drivers/xen/pci.c @@ -28,9 +28,6 @@ #include <asm/xen/hypervisor.h> #include <asm/xen/hypercall.h> #include "../pci/pci.h" -#ifdef CONFIG_PCI_MMCONFIG -#include <asm/pci_x86.h> -#endif
I noticed that I forgot about: +#include <linux/ecam.h>
Sorry.
Tomasz
On 17.12.2015 11:25, Tomasz Nowicki wrote:
On 16.12.2015 16:16, Tomasz Nowicki wrote:
I noticed that I forgot about: +#include <linux/ecam.h>
I did not forget, I added it in patch [PATCH V2 02/23]. Sorry for noise.
Tomasz
On Wed, 16 Dec 2015, Tomasz Nowicki wrote:
Acked-by: Stefano Stabellini stefano.stabellini@eu.citrix.com
On 21.12.2015 19:12, Stefano Stabellini wrote:
On Wed, 16 Dec 2015, Tomasz Nowicki wrote:
Acked-by: Stefano Stabellini stefano.stabellini@eu.citrix.com
Thanks Stefano.
Tomasz
Let's keep the RAW ACPI PCI config space accessors empty by default, since we are not sure whether they are necessary across all architectures. Once we sort this out, we can provide a generic version or let architectures override them, as x86 does now.

Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 drivers/acpi/mcfg.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)
diff --git a/drivers/acpi/mcfg.c b/drivers/acpi/mcfg.c
index 745b83e..3e1e7be 100644
--- a/drivers/acpi/mcfg.c
+++ b/drivers/acpi/mcfg.c
@@ -9,9 +9,30 @@
 
 #include <linux/acpi.h>
 #include <linux/ecam.h>
+#include <linux/pci.h>
 
 #define PREFIX "MCFG: "
 
+/*
+ * raw_pci_read/write - raw ACPI PCI config space accessors.
+ *
+ * By default (__weak) these accessors are empty and should be overwritten
+ * by architectures which support operations on ACPI PCI_Config regions,
+ * see osl.c file.
+ */
+
+int __weak raw_pci_read(unsigned int domain, unsigned int bus,
+			unsigned int devfn, int reg, int len, u32 *val)
+{
+	return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
+int __weak raw_pci_write(unsigned int domain, unsigned int bus,
+			 unsigned int devfn, int reg, int len, u32 val)
+{
+	return PCIBIOS_DEVICE_NOT_FOUND;
+}
+
 int __init acpi_parse_mcfg(struct acpi_table_header *header)
 {
 	struct acpi_table_mcfg *mcfg;
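
The __weak default pattern can be tried outside the kernel with the GCC/Clang weak attribute. The following is a sketch under that assumption: the _sketch names are hypothetical, and the constant mirrors the value of PCIBIOS_DEVICE_NOT_FOUND (0x86) from include/linux/pci.h.

```c
#include <stdint.h>

/* Same value as PCIBIOS_DEVICE_NOT_FOUND in include/linux/pci.h. */
#define DEVICE_NOT_FOUND_SKETCH 0x86

/*
 * Weak default: used only when no other object file provides a strong
 * definition of the same symbol. It reports "device not found" and
 * leaves *val untouched.
 */
__attribute__((weak))
int raw_pci_read_sketch(unsigned int domain, unsigned int bus,
			unsigned int devfn, int reg, int len, uint32_t *val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len; (void)val;
	return DEVICE_NOT_FOUND_SKETCH;
}

__attribute__((weak))
int raw_pci_write_sketch(unsigned int domain, unsigned int bus,
			 unsigned int devfn, int reg, int len, uint32_t val)
{
	(void)domain; (void)bus; (void)devfn; (void)reg; (void)len; (void)val;
	return DEVICE_NOT_FOUND_SKETCH;
}
```

An architecture that supports these accesses would supply a non-weak function with the same name in another file, and the linker would pick that one instead.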
We can now use the previously prepared empty ACPI RAW accessors and clean up a bit before adding full support for the PCI host bridge driver.

Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/arm64/Kconfig      |  6 ++++++
 arch/arm64/kernel/pci.c | 15 ---------------
 2 files changed, 6 insertions(+), 15 deletions(-)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 871f217..d65d315 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -92,6 +92,7 @@ config ARM64
 	select SPARSE_IRQ
 	select SYSCTL_EXCEPTION_TRACE
 	select HAVE_CONTEXT_TRACKING
+	select HAVE_PCI_ECAM
 	help
 	  ARM 64-bit (AArch64) Linux support.
 
@@ -207,6 +208,11 @@ source "drivers/pci/Kconfig"
 source "drivers/pci/pcie/Kconfig"
 source "drivers/pci/hotplug/Kconfig"
 
+config PCI_MMCONFIG
+	def_bool y
+	select PCI_ECAM
+	depends on ACPI
+
 endmenu
 
 menu "Kernel Features"
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index b3d098b..023b983 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -61,21 +61,6 @@ int pcibios_add_device(struct pci_dev *dev)
 	return 0;
 }
 
-/*
- * raw_pci_read/write - Platform-specific PCI config space access.
- */
-int raw_pci_read(unsigned int domain, unsigned int bus,
-		  unsigned int devfn, int reg, int len, u32 *val)
-{
-	return -ENXIO;
-}
-
-int raw_pci_write(unsigned int domain, unsigned int bus,
-		unsigned int devfn, int reg, int len, u32 val)
-{
-	return -ENXIO;
-}
-
 #ifdef CONFIG_ACPI
 /* Root bridge scanning */
 struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
There are two ways to get ECAM (aka MCFG) regions using ACPI: from the static MCFG table and from the _CBA method. We cannot remove static regions; however, regions coming from _CBA should be removed when the bridge device is removed.

In light of the above, we need a flag to mark hot-added ECAM entries. Users should therefore call pci_mmconfig_inject() when adding regions from the _CBA method, since that is where the flag gets set.

Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 drivers/pci/ecam.c   | 2 ++
 include/linux/ecam.h | 1 +
 2 files changed, 3 insertions(+)
diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
index 8a5eef7..da35b4c 100644
--- a/drivers/pci/ecam.c
+++ b/drivers/pci/ecam.c
@@ -131,6 +131,7 @@ struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 	new->segment = segment;
 	new->start_bus = start;
 	new->end_bus = end;
+	new->hot_added = false;
 
 	res = &new->res;
 	res->start = addr + PCI_MMCFG_BUS_OFFSET(start);
@@ -221,6 +222,7 @@ int pci_mmconfig_inject(struct pci_mmcfg_region *cfg)
 		err = -ENOMEM;
 		goto out;
 	} else {
+		cfg->hot_added = true;
 		list_add_sorted(cfg);
 		pr_info("MMCONFIG at %pR (base %#lx)\n",
 			&cfg->res, (unsigned long)cfg->address);
diff --git a/include/linux/ecam.h b/include/linux/ecam.h
index 813acd1..e0f322e 100644
--- a/include/linux/ecam.h
+++ b/include/linux/ecam.h
@@ -17,6 +17,7 @@ struct pci_mmcfg_region {
 	u8 start_bus;
 	u8 end_bus;
 	char name[PCI_MMCFG_RESOURCE_NAME_LEN];
+	bool hot_added;
 };
 
 struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
Now that we have the hot_added flag and all the information we need in struct acpi_pci_root, we can get rid of the arch-specific MCFG data in struct pci_root_info.

Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
Tested-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
---
 arch/x86/pci/acpi.c | 36 ++++++++++++++----------------------
 1 file changed, 14 insertions(+), 22 deletions(-)
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c
index 64caf2b..56714a9 100644
--- a/arch/x86/pci/acpi.c
+++ b/arch/x86/pci/acpi.c
@@ -12,11 +12,6 @@
 struct pci_root_info {
 	struct acpi_pci_root_info common;
 	struct pci_sysdata sd;
-#ifdef CONFIG_PCI_MMCONFIG
-	bool mcfg_added;
-	u8 start_bus;
-	u8 end_bus;
-#endif
 };
 
 static bool pci_use_crs = true;
@@ -180,16 +175,13 @@ static int check_segment(u16 seg, struct device *dev, char *estr)
 
 static int setup_mcfg_map(struct acpi_pci_root_info *ci)
 {
-	int result, seg;
-	struct pci_root_info *info;
+	int result, seg, start, end;
 	struct acpi_pci_root *root = ci->root;
 	struct device *dev = &ci->bridge->dev;
 
-	info = container_of(ci, struct pci_root_info, common);
-	info->start_bus = (u8)root->secondary.start;
-	info->end_bus = (u8)root->secondary.end;
-	info->mcfg_added = false;
-	seg = info->sd.domain;
+	seg = root->segment;
+	start = root->secondary.start;
+	end = root->secondary.end;
 
 	/* return success if MMCFG is not in use */
 	if (raw_pci_ext_ops && raw_pci_ext_ops != &pci_mmcfg)
@@ -198,13 +190,11 @@ static int setup_mcfg_map(struct acpi_pci_root_info *ci)
 	if (!(pci_probe & PCI_PROBE_MMCONF))
 		return check_segment(seg, dev, "MMCONFIG is disabled,");
 
-	result = pci_mmconfig_insert(dev, seg, info->start_bus, info->end_bus,
-				     root->mcfg_addr);
+	result = pci_mmconfig_insert(dev, seg, start, end, root->mcfg_addr);
 	if (result == 0) {
 		/* enable MMCFG if it hasn't been enabled yet */
 		if (raw_pci_ext_ops == NULL)
 			raw_pci_ext_ops = &pci_mmcfg;
-		info->mcfg_added = true;
 	} else if (result != -EEXIST)
 		return check_segment(seg, dev,
 			 "fail to add MMCONFIG information,");
@@ -214,14 +204,16 @@ static int setup_mcfg_map(struct acpi_pci_root_info *ci)
 
 static void teardown_mcfg_map(struct acpi_pci_root_info *ci)
 {
-	struct pci_root_info *info;
+	struct acpi_pci_root *root = ci->root;
+	struct pci_mmcfg_region *cfg;
 
-	info = container_of(ci, struct pci_root_info, common);
-	if (info->mcfg_added) {
-		pci_mmconfig_delete(info->sd.domain,
-				    info->start_bus, info->end_bus);
-		info->mcfg_added = false;
-	}
+	cfg = pci_mmconfig_lookup(root->segment, root->secondary.start);
+	if (!cfg)
+		return;
+
+	if (cfg->hot_added)
+		pci_mmconfig_delete(root->segment, root->secondary.start,
+				    root->secondary.end);
 }
 #else
 static int setup_mcfg_map(struct acpi_pci_root_info *ci)
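
The teardown rule that the hot_added flag enables can be modelled in a few lines of user-space C. This is a sketch only: the fixed-size table and all _sketch names are assumptions standing in for the kernel's sorted list and for pci_mmconfig_lookup() and pci_mmconfig_delete().

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS_SKETCH 8

/* Cut-down stand-in for struct pci_mmcfg_region. */
struct region_sketch {
	bool in_use;
	int segment;
	int start_bus;
	int end_bus;
	bool hot_added;	/* false for static MCFG entries, true for _CBA ones */
};

static struct region_sketch regions_sketch[MAX_REGIONS_SKETCH];

/* Adds a region; 'hot' says whether it came from _CBA. */
static struct region_sketch *region_add(int seg, int start, int end, bool hot)
{
	for (size_t i = 0; i < MAX_REGIONS_SKETCH; i++) {
		struct region_sketch *r = &regions_sketch[i];

		if (!r->in_use) {
			*r = (struct region_sketch){ true, seg, start, end, hot };
			return r;
		}
	}
	return NULL;
}

/* Finds the region covering (seg, bus), or NULL. */
static struct region_sketch *region_lookup(int seg, int bus)
{
	for (size_t i = 0; i < MAX_REGIONS_SKETCH; i++) {
		struct region_sketch *r = &regions_sketch[i];

		if (r->in_use && r->segment == seg &&
		    r->start_bus <= bus && bus <= r->end_bus)
			return r;
	}
	return NULL;
}

/* Models teardown_mcfg_map(): only hot-added regions are removed. */
static bool region_teardown(int seg, int start_bus)
{
	struct region_sketch *r = region_lookup(seg, start_bus);

	if (!r || !r->hot_added)
		return false;

	r->in_use = false;
	return true;
}
```

Because the flag travels with the region itself, the caller no longer has to remember whether it was the one that added the entry.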
In order to probe the PCIe host controller when booting with DT, ARM64 uses drivers which defer IRQ assignment to device enable time. This means that the boot-time, DT-specific IRQ map initialization is always overridden, so let's remove that code.

Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
---
 arch/arm64/kernel/pci.c | 10 ----------
 1 file changed, 10 deletions(-)
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index 023b983..f7948f5 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -51,16 +51,6 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 	return pci_enable_resources(dev, mask);
 }
 
-/*
- * Try to assign the IRQ number from DT when adding a new device
- */
-int pcibios_add_device(struct pci_dev *dev)
-{
-	dev->irq = of_irq_parse_and_map_pci(dev, 0, 0);
-
-	return 0;
-}
-
 #ifdef CONFIG_ACPI
 /* Root bridge scanning */
 struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
On Wed, Dec 16, 2015 at 04:16:21PM +0100, Tomasz Nowicki wrote:
Actually, this patch should be part of Matthew's series:
http://comments.gmane.org/gmane.linux.kernel.pci/46461
Lorenzo
On 12.01.2016 14:50, Lorenzo Pieralisi wrote:
On Wed, Dec 16, 2015 at 04:16:21PM +0100, Tomasz Nowicki wrote:
Actually, this patch should be part of Matthew's series:
Agree.
Matthew can you please add this patch to your series?
Thanks, Tomasz
On 01/12/2016 08:13 AM, Tomasz Nowicki wrote:
On 12.01.2016 14:50, Lorenzo Pieralisi wrote:
On Wed, Dec 16, 2015 at 04:16:21PM +0100, Tomasz Nowicki wrote:
How was this tested? Or in other words, what PCI devices that use legacy INT{A,B,C,D} interrupts were used in testing this patch?
David Daney
On 12.01.2016 18:56, David Daney wrote:
On 01/12/2016 08:13 AM, Tomasz Nowicki wrote:
On 12.01.2016 14:50, Lorenzo Pieralisi wrote:
On Wed, Dec 16, 2015 at 04:16:21PM +0100, Tomasz Nowicki wrote:
How was this tested? Or in other words, what PCI devices that use legacy INT{A,B,C,D} interrupts were used in testing this patch?
I used QEMU and e1000 NIC:
sudo ./qemu/aarch64-softmmu/qemu-system-aarch64 -smp 1 -m 1024 -M virt \
	-cpu cortex-a57 -nographic \
	-device e1000,netdev=net1,mac=52:54:00:12:34:56 \
	-netdev type=tap,ifname=tun1,id=net1,script=no,downscript=no \
	-drive file=qemu/ubuntu.img,id=root,if=none,format=raw \
	-device virtio-blk-device,drive=root \
	-kernel linux-aarch64/arch/arm64/boot/Image \
	-append "console=ttyAMA0 earlycon=pl011,0x9000000 rw root=/dev/vda"

root@ubuntu:~# ifconfig
eth0      Link encap:Ethernet  HWaddr 52:54:00:12:34:56
          inet addr:10.0.0.48  Bcast:10.0.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:107 errors:0 dropped:0 overruns:0 frame:0
          TX packets:32 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:18087 (18.0 KB)  TX bytes:3818 (3.8 KB)

root@ubuntu:~# cat /proc/interrupts
           CPU0
 39:        280       GIC  36 Level     eth0

root@ubuntu:~# lspci -vvv
00:00.0 Host bridge: Red Hat, Inc. Device 0008
	Subsystem: Red Hat, Inc Device 1100
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

00:01.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
	Subsystem: Red Hat, Inc QEMU Virtual Machine
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 39
	Region 0: Memory at 10040000 (32-bit, non-prefetchable) [size=128K]
	Region 1: I/O ports at 1000 [size=64]
	[virtual] Expansion ROM at 10000000 [disabled] [size=256K]
	Kernel driver in use: e1000
Any specific concern w.r.t. this patch?
Tomasz
Currently we have two platforms (x86 & ia64) capable of PCI ACPI host bridge initialization. Both use sysdata to pass down the parent device reference, and both rely on a NULL parent in pci_create_root_bus() to validate the sysdata content.

This looks hacky and prevents us from getting some firmware-specific info for the PCI host controller, e.g. the bus domain number. However, it seems we can overcome that blocker by passing down the parent device via the pci_create_root_bus() parameter (as the ACPI device type) and using ACPI_COMPANION_SET in core code for the ACPI boot method. ACPI_COMPANION_SET is safe to run in all cases: DT, ACPI and DT&ACPI.

Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
---
 drivers/acpi/pci_root.c | 5 ++++-
 drivers/pci/probe.c     | 2 ++
 2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index ae3fe4e..a65c8c2 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -846,7 +846,10 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root,
 
 	pci_acpi_root_add_resources(info);
 	pci_add_resource(&info->resources, &root->secondary);
-	bus = pci_create_root_bus(NULL, busnum, ops->pci_ops,
+
+	/* Root bridge device needs to be sure of parent ACPI type */
+	ACPI_COMPANION_SET(&device->dev, device);
+	bus = pci_create_root_bus(&device->dev, busnum, ops->pci_ops,
 				  sysdata, &info->resources);
 	if (!bus)
 		goto out_release_info;
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 553a029..cad836f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2107,6 +2107,8 @@ struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
 	bridge->dev.parent = parent;
 	bridge->dev.release = pci_release_host_bridge_dev;
 	dev_set_name(&bridge->dev, "pci%04x:%02x", pci_domain_nr(b), bus);
+	ACPI_COMPANION_SET(&bridge->dev,
+			   parent ? to_acpi_device_node(parent->fwnode) : NULL);
 	error = pcibios_root_bridge_prepare(bridge);
 	if (error) {
 		kfree(bridge);
Since the PCI core code now sets the ACPI companion device for us, the platform-specific ACPI companion setting has become dead code. Therefore we can get rid of it, including the related companion reference in the PCI sysdata structure.
Signed-off-by: Tomasz Nowicki tn@semihalf.com --- arch/ia64/include/asm/pci.h | 1 - arch/ia64/pci/pci.c | 16 ---------------- arch/x86/include/asm/pci.h | 3 --- arch/x86/pci/acpi.c | 2 -- arch/x86/pci/irq.c | 10 ---------- 5 files changed, 32 deletions(-)
diff --git a/arch/ia64/include/asm/pci.h b/arch/ia64/include/asm/pci.h index 07039d1..5050748 100644 --- a/arch/ia64/include/asm/pci.h +++ b/arch/ia64/include/asm/pci.h @@ -65,7 +65,6 @@ extern int pci_mmap_legacy_page_range(struct pci_bus *bus, #define pci_legacy_write platform_pci_legacy_write
struct pci_controller { - struct acpi_device *companion; void *iommu; int segment; int node; /* nearest node with memory or NUMA_NO_NODE for global allocation */ diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index 8f6ac2f..978d6af 100644 --- a/arch/ia64/pci/pci.c +++ b/arch/ia64/pci/pci.c @@ -301,28 +301,12 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) }
info->controller.segment = root->segment; - info->controller.companion = device; info->controller.node = acpi_get_node(device->handle); INIT_LIST_HEAD(&info->io_resources); return acpi_pci_root_create(root, &pci_acpi_root_ops, &info->common, &info->controller); }
-int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge) -{ - /* - * We pass NULL as parent to pci_create_root_bus(), so if it is not NULL - * here, pci_create_root_bus() has been called by someone else and - * sysdata is likely to be different from what we expect. Let it go in - * that case. - */ - if (!bridge->dev.parent) { - struct pci_controller *controller = bridge->bus->sysdata; - ACPI_COMPANION_SET(&bridge->dev, controller->companion); - } - return 0; -} - void pcibios_fixup_device_resources(struct pci_dev *dev) { int idx; diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index 4625943..a98c022 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -14,9 +14,6 @@ struct pci_sysdata { int domain; /* PCI domain */ int node; /* NUMA node */ -#ifdef CONFIG_ACPI - struct acpi_device *companion; /* ACPI companion device */ -#endif #ifdef CONFIG_X86_64 void *iommu; /* IOMMU private data */ #endif diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c index 56714a9..286e0f5 100644 --- a/arch/x86/pci/acpi.c +++ b/arch/x86/pci/acpi.c @@ -333,7 +333,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) struct pci_sysdata sd = { .domain = domain, .node = node, - .companion = root->device };
memcpy(bus->sysdata, &sd, sizeof(sd)); @@ -348,7 +347,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) else { info->sd.domain = domain; info->sd.node = node; - info->sd.companion = root->device; bus = acpi_pci_root_create(root, &acpi_pci_root_ops, &info->common, &info->sd); } diff --git a/arch/x86/pci/irq.c b/arch/x86/pci/irq.c index 7032798..cc62226 100644 --- a/arch/x86/pci/irq.c +++ b/arch/x86/pci/irq.c @@ -1153,16 +1153,6 @@ void __init pcibios_irq_init(void)
int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge) { - /* - * We pass NULL as parent to pci_create_root_bus(), so if it is not NULL - * here, pci_create_root_bus() has been called by someone else and - * sysdata is likely to be different from what we expect. Let it go in - * that case. - */ - if (!bridge->dev.parent) { - struct pci_sysdata *sd = bridge->bus->sysdata; - ACPI_COMPANION_SET(&bridge->dev, sd->companion); - } bridge->map_irq = pci_map_irq; return 0; }
As we now have a valid PCI host bridge device reference, we can introduce code that finds its bus domain number using the ACPI _SEG method.
Note that the _SEG method is optional; its absence means that all PCI buses belong to domain 0.
Signed-off-by: Tomasz Nowicki tn@semihalf.com Reviewed-by: Liviu Dudau Liviu.Dudau@arm.com Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com --- drivers/pci/pci.c | 29 +++++++++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 7274006..39a985b 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -25,6 +25,7 @@ #include <linux/device.h> #include <linux/pm_runtime.h> #include <linux/pci_hotplug.h> +#include <linux/acpi.h> #include <asm-generic/pci-bridge.h> #include <asm/setup.h> #include <linux/aer.h> @@ -4796,14 +4797,34 @@ void pci_bus_assign_domain_nr(struct pci_bus *bus, struct device *parent) * API and update the use_dt_domains value to keep track of method we * are using to assign domain numbers (use_dt_domains = 0). * + * IF ACPI, we expect non-DT method (use_dt_domains == -1) + * and call _SEG method for corresponding host bridge device. + * If _SEG method does not exist, following ACPI spec (6.5.6) + * all PCI buses belong to domain 0. + * * All other combinations imply we have a platform that is trying - * to mix domain numbers obtained from DT and pci_get_new_domain_nr(), - * which is a recipe for domain mishandling and it is prevented by - * invalidating the domain value (domain = -1) and printing a - * corresponding error. + * to mix domain numbers obtained from DT, ACPI and + * pci_get_new_domain_nr(), which is a recipe for domain mishandling and + * it is prevented by invalidating the domain value (domain = -1) and + * printing a corresponding error. */ + if (domain >= 0 && use_dt_domains) { use_dt_domains = 1; +#ifdef CONFIG_ACPI + } else if (!acpi_disabled && use_dt_domains == -1) { + struct acpi_device *acpi_dev = to_acpi_device(parent); + unsigned long long segment = 0; + acpi_status status; + + status = acpi_evaluate_integer(acpi_dev->handle, + METHOD_NAME__SEG, NULL, + &segment); + if (ACPI_FAILURE(status) && status != AE_NOT_FOUND) + dev_err(&acpi_dev->dev, "can't evaluate _SEG\n"); + + domain = segment; +#endif } else if (domain < 0 && use_dt_domains != 1) { use_dt_domains = 0; domain = pci_get_new_domain_nr();
Since we now have a generic way to retrieve the domain number using the _SEG method, x86 and ia64 can take advantage of it and drop another piece of platform-specific data from pci_sysdata.
Signed-off-by: Tomasz Nowicki tn@semihalf.com --- arch/ia64/Kconfig | 3 +++ arch/ia64/include/asm/pci.h | 2 -- arch/ia64/pci/pci.c | 1 - arch/x86/Kconfig | 3 +++ arch/x86/include/asm/pci.h | 7 ------- arch/x86/pci/acpi.c | 2 -- 6 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index eb0249e..6fecd04 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -572,6 +572,9 @@ config PCI config PCI_DOMAINS def_bool PCI
+config PCI_DOMAINS_GENERIC + def_bool PCI + config PCI_SYSCALL def_bool PCI
diff --git a/arch/ia64/include/asm/pci.h b/arch/ia64/include/asm/pci.h index 5050748..4214be1 100644 --- a/arch/ia64/include/asm/pci.h +++ b/arch/ia64/include/asm/pci.h @@ -66,7 +66,6 @@ extern int pci_mmap_legacy_page_range(struct pci_bus *bus,
struct pci_controller { void *iommu; - int segment; int node; /* nearest node with memory or NUMA_NO_NODE for global allocation */
void *platform_data; @@ -74,7 +73,6 @@ struct pci_controller {
#define PCI_CONTROLLER(busdev) ((struct pci_controller *) busdev->sysdata) -#define pci_domain_nr(busdev) (PCI_CONTROLLER(busdev)->segment)
extern struct pci_ops pci_root_ops;
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index 978d6af..fe96bc9 100644 --- a/arch/ia64/pci/pci.c +++ b/arch/ia64/pci/pci.c @@ -300,7 +300,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) return NULL; }
- info->controller.segment = root->segment; info->controller.node = acpi_get_node(device->handle); INIT_LIST_HEAD(&info->io_resources); return acpi_pci_root_create(root, &pci_acpi_root_ops, diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 102d7d1..63cc4b7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2383,6 +2383,9 @@ config PCI_DOMAINS def_bool y depends on PCI
+config PCI_DOMAINS_GENERIC + def_bool PCI + config PCI_MMCONFIG bool "Support mmconfig PCI config space access" select PCI_ECAM diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h index a98c022..1dc1ba1 100644 --- a/arch/x86/include/asm/pci.h +++ b/arch/x86/include/asm/pci.h @@ -12,7 +12,6 @@ #ifdef __KERNEL__
struct pci_sysdata { - int domain; /* PCI domain */ int node; /* NUMA node */ #ifdef CONFIG_X86_64 void *iommu; /* IOMMU private data */ @@ -26,12 +25,6 @@ extern int noioapicreroute; #ifdef CONFIG_PCI
#ifdef CONFIG_PCI_DOMAINS -static inline int pci_domain_nr(struct pci_bus *bus) -{ - struct pci_sysdata *sd = bus->sysdata; - return sd->domain; -} - static inline int pci_proc_domain(struct pci_bus *bus) { return pci_domain_nr(bus); diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c index 286e0f5..5f78595 100644 --- a/arch/x86/pci/acpi.c +++ b/arch/x86/pci/acpi.c @@ -331,7 +331,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) * its bus->sysdata. */ struct pci_sysdata sd = { - .domain = domain, .node = node, };
@@ -345,7 +344,6 @@ struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root) "pci_bus %04x:%02x: ignored (out of memory)\n", domain, busnum); else { - info->sd.domain = domain; info->sd.node = node; bus = acpi_pci_root_create(root, &acpi_pci_root_ops, &info->common, &info->sd);
The arches in the subject are the only ones that use the pcibios_{add|remove}_bus hooks, and they implement them in the same way. Moreover, ARM64 is going to do the same. So it seems that acpi_pci_{add|remove}_bus is generic enough to be the default option for the pcibios_{add|remove}_bus hooks. Also, it is always safe to run acpi_pci_{add|remove}_bus, as they have empty stubs for the !ACPI case and return early if ACPI has been switched off at run time.
With that in place, we can remove the x86 and ia64 pcibios_{add|remove}_bus implementations.
Signed-off-by: Tomasz Nowicki tn@semihalf.com --- arch/ia64/pci/pci.c | 10 ---------- arch/x86/pci/common.c | 10 ---------- drivers/pci/probe.c | 3 +++ 3 files changed, 3 insertions(+), 20 deletions(-)
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index fe96bc9..c1e8ed5 100644 --- a/arch/ia64/pci/pci.c +++ b/arch/ia64/pci/pci.c @@ -357,16 +357,6 @@ void pcibios_fixup_bus(struct pci_bus *b) platform_pci_fixup_bus(b); }
-void pcibios_add_bus(struct pci_bus *bus) -{ - acpi_pci_add_bus(bus); -} - -void pcibios_remove_bus(struct pci_bus *bus) -{ - acpi_pci_remove_bus(bus); -} - void pcibios_set_master (struct pci_dev *dev) { /* No special bus mastering setup handling */ diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c index eccd4d9..ed3236d 100644 --- a/arch/x86/pci/common.c +++ b/arch/x86/pci/common.c @@ -171,16 +171,6 @@ void pcibios_fixup_bus(struct pci_bus *b) pcibios_fixup_device_resources(dev); }
-void pcibios_add_bus(struct pci_bus *bus) -{ - acpi_pci_add_bus(bus); -} - -void pcibios_remove_bus(struct pci_bus *bus) -{ - acpi_pci_remove_bus(bus); -} - /* * Only use DMI information to set this if nothing was passed * on the kernel command line (which was parsed earlier). diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index cad836f..2fbf840 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -12,6 +12,7 @@ #include <linux/slab.h> #include <linux/module.h> #include <linux/cpumask.h> +#include <linux/pci-acpi.h> #include <linux/pci-aspm.h> #include <linux/aer.h> #include <linux/acpi.h> @@ -2067,10 +2068,12 @@ int __weak pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
void __weak pcibios_add_bus(struct pci_bus *bus) { + acpi_pci_add_bus(bus); }
void __weak pcibios_remove_bus(struct pci_bus *bus) { + acpi_pci_remove_bus(bus); }
struct pci_bus *pci_create_root_bus(struct device *parent, int bus,
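The default-hook pattern used by the patch above can be sketched in user space. This is a model under assumptions: the `model_*` names and the counter are hypothetical, and `__attribute__((weak))` (GCC/Clang) stands in for the kernel's `__weak`.

```c
#include <assert.h>

static int acpi_disabled;      /* models ACPI being switched off at run time */
static int acpi_tracked_buses; /* stand-in for per-bus ACPI bookkeeping */

/* Generic helper: returns early when ACPI is disabled. */
static void model_acpi_pci_add_bus(void)
{
	if (acpi_disabled)
		return;
	acpi_tracked_buses++;
}

static void model_acpi_pci_remove_bus(void)
{
	if (acpi_disabled)
		return;
	assert(acpi_tracked_buses > 0);
	acpi_tracked_buses--;
}

/*
 * Weak defaults: an architecture that needs something else can still
 * provide a strong definition with the same name in another file.
 */
__attribute__((weak)) void model_pcibios_add_bus(void)
{
	model_acpi_pci_add_bus();
}

__attribute__((weak)) void model_pcibios_remove_bus(void)
{
	model_acpi_pci_remove_bus();
}
```

With the helper as the weak default, an arch gets the ACPI behaviour without writing any hook of its own, which is what lets the x86 and ia64 copies go away.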
Let's abstract two calls that inject and remove MCFG regions, which may come from the DSDT table. These calls will be used by the x86 and ARM64 PCI host bridge drivers in later patches.
Signed-off-by: Tomasz Nowicki tn@semihalf.com --- drivers/acpi/mcfg.c | 38 ++++++++++++++++++++++++++++++++++++++ include/linux/pci-acpi.h | 9 +++++++++ 2 files changed, 47 insertions(+)
diff --git a/drivers/acpi/mcfg.c b/drivers/acpi/mcfg.c index 3e1e7be..dca4c4e 100644 --- a/drivers/acpi/mcfg.c +++ b/drivers/acpi/mcfg.c @@ -10,6 +10,7 @@ #include <linux/acpi.h> #include <linux/ecam.h> #include <linux/pci.h> +#include <linux/pci-acpi.h>
#define PREFIX "MCFG: "
@@ -77,6 +78,43 @@ int __init acpi_parse_mcfg(struct acpi_table_header *header) return 0; }
+int pci_mmcfg_setup_map(struct acpi_pci_root_info *ci) +{ + struct pci_mmcfg_region *cfg; + struct acpi_pci_root *root; + int seg, start, end, err; + + root = ci->root; + seg = root->segment; + start = root->secondary.start; + end = root->secondary.end; + + cfg = pci_mmconfig_lookup(seg, start); + if (cfg) + return 0; + + cfg = pci_mmconfig_alloc(seg, start, end, root->mcfg_addr); + if (!cfg) + return -ENOMEM; + + err = pci_mmconfig_inject(cfg); + return err; +} + +void pci_mmcfg_teardown_map(struct acpi_pci_root_info *ci) +{ + struct acpi_pci_root *root = ci->root; + struct pci_mmcfg_region *cfg; + + cfg = pci_mmconfig_lookup(root->segment, root->secondary.start); + if (!cfg) + return; + + if (cfg->hot_added) + pci_mmconfig_delete(root->segment, root->secondary.start, + root->secondary.end); +} + int __init __weak acpi_mcfg_check_entry(struct acpi_table_mcfg *mcfg, struct acpi_mcfg_allocation *cfg) { diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index 89ab057..c277415 100644 --- a/include/linux/pci-acpi.h +++ b/include/linux/pci-acpi.h @@ -79,6 +79,15 @@ extern struct pci_bus *acpi_pci_root_create(struct acpi_pci_root *root, void acpi_pci_add_bus(struct pci_bus *bus); void acpi_pci_remove_bus(struct pci_bus *bus);
+#ifdef CONFIG_PCI_MMCONFIG +int pci_mmcfg_setup_map(struct acpi_pci_root_info *ci); +void pci_mmcfg_teardown_map(struct acpi_pci_root_info *ci); +#else +static inline int pci_mmcfg_setup_map(struct acpi_pci_root_info *ci) +{ return 0; } +static inline void pci_mmcfg_teardown_map(struct acpi_pci_root_info *ci) { } +#endif + #ifdef CONFIG_ACPI_PCI_SLOT void acpi_pci_slot_init(void); void acpi_pci_slot_enumerate(struct pci_bus *bus);
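The setup/teardown pairing introduced above can be illustrated with a small user-space model. It is a sketch only: the fixed-size table and the `region_*` helpers are hypothetical, and it assumes that injected regions are the ones marked hot_added, as the earlier hot_added patch in this series suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 8

struct region {
	bool used;
	bool hot_added;	/* true only for regions injected at run time */
	int seg, start, end;
};

static struct region regions[MAX_REGIONS];

/* Find the region of segment 'seg' whose bus range covers 'bus'. */
static struct region *region_lookup(int seg, int bus)
{
	for (size_t i = 0; i < MAX_REGIONS; i++)
		if (regions[i].used && regions[i].seg == seg &&
		    regions[i].start <= bus && bus <= regions[i].end)
			return &regions[i];
	return NULL;
}

static struct region *region_insert(int seg, int start, int end, bool hot)
{
	for (size_t i = 0; i < MAX_REGIONS; i++) {
		if (!regions[i].used) {
			regions[i] = (struct region){ true, hot, seg, start, end };
			return &regions[i];
		}
	}
	return NULL;
}

/* Like pci_mmcfg_setup_map(): do nothing if the region is already known. */
static int region_setup(int seg, int start, int end)
{
	if (region_lookup(seg, start))
		return 0;
	return region_insert(seg, start, end, true) ? 0 : -1;
}

/* Like pci_mmcfg_teardown_map(): drop only regions injected at run time. */
static void region_teardown(int seg, int start)
{
	struct region *r = region_lookup(seg, start);

	if (r && r->hot_added)
		r->used = false;
}
```

The point of the hot_added check is that regions parsed from the static MCFG table at boot survive a host bridge removal, while regions injected on behalf of a hot-added bridge do not.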
Signed-off-by: Tomasz Nowicki tn@semihalf.com --- arch/x86/pci/acpi.c | 20 +------------------- 1 file changed, 1 insertion(+), 19 deletions(-)
diff --git a/arch/x86/pci/acpi.c b/arch/x86/pci/acpi.c index 5f78595..33233a7 100644 --- a/arch/x86/pci/acpi.c +++ b/arch/x86/pci/acpi.c @@ -201,29 +201,11 @@ static int setup_mcfg_map(struct acpi_pci_root_info *ci)
return 0; } - -static void teardown_mcfg_map(struct acpi_pci_root_info *ci) -{ - struct acpi_pci_root *root = ci->root; - struct pci_mmcfg_region *cfg; - - cfg = pci_mmconfig_lookup(root->segment, root->secondary.start); - if (!cfg) - return; - - if (cfg->hot_added) - pci_mmconfig_delete(root->segment, root->secondary.start, - root->secondary.end); -} #else static int setup_mcfg_map(struct acpi_pci_root_info *ci) { return 0; } - -static void teardown_mcfg_map(struct acpi_pci_root_info *ci) -{ -} #endif
static int pci_acpi_root_get_node(struct acpi_pci_root *root) @@ -251,7 +233,7 @@ static int pci_acpi_root_init_info(struct acpi_pci_root_info *ci)
static void pci_acpi_root_release_info(struct acpi_pci_root_info *ci) { - teardown_mcfg_map(ci); + pci_mmcfg_teardown_map(ci); kfree(container_of(ci, struct pci_root_info, common)); }
We use generic accessors from access.c by default. However, we already know of platforms that need special handling when accessing PCI config space. These platforms will need a different accessor set, matched against the (platform ID, domain, bus) tuple. Therefore we are going to add (in the future) DECLARE_ACPI_MCFG_FIXUP, which will register platform-specific custom accessors. For now we let pci_mcfg_get_ops take domain and bus arguments and leave room for the matching algorithm.
Signed-off-by: Tomasz Nowicki tn@semihalf.com --- drivers/acpi/mcfg.c | 30 ++++++++++++++++++++++++++++++ include/linux/pci-acpi.h | 8 ++++++++ 2 files changed, 38 insertions(+)
diff --git a/drivers/acpi/mcfg.c b/drivers/acpi/mcfg.c index dca4c4e..a9b2231 100644 --- a/drivers/acpi/mcfg.c +++ b/drivers/acpi/mcfg.c @@ -34,6 +34,36 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus, return PCIBIOS_DEVICE_NOT_FOUND; }
+void __iomem * +pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset) +{ + struct pci_mmcfg_region *cfg; + + cfg = pci_mmconfig_lookup(pci_domain_nr(bus), bus->number); + if (cfg && cfg->virt) + return cfg->virt + + (PCI_MMCFG_BUS_OFFSET(bus->number) | (devfn << 12)) + + offset; + return NULL; +} + +/* Default generic PCI config accessors */ +static struct pci_ops default_pci_mcfg_ops = { + .map_bus = pci_mcfg_dev_base, + .read = pci_generic_config_read, + .write = pci_generic_config_write, +}; + +struct pci_ops *pci_mcfg_get_ops(int domain, int bus) +{ + /* + * TODO: Match against platform specific quirks and return + * corresponding PCI config space accessor set. + */ + + return &default_pci_mcfg_ops; +} + int __init acpi_parse_mcfg(struct acpi_table_header *header) { struct acpi_table_mcfg *mcfg; diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h index c277415..3790197 100644 --- a/include/linux/pci-acpi.h +++ b/include/linux/pci-acpi.h @@ -82,10 +82,18 @@ void acpi_pci_remove_bus(struct pci_bus *bus); #ifdef CONFIG_PCI_MMCONFIG int pci_mmcfg_setup_map(struct acpi_pci_root_info *ci); void pci_mmcfg_teardown_map(struct acpi_pci_root_info *ci); +struct pci_ops *pci_mcfg_get_ops(int domain, int bus); +void __iomem * +pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset); #else static inline int pci_mmcfg_setup_map(struct acpi_pci_root_info *ci) { return 0; } static inline void pci_mmcfg_teardown_map(struct acpi_pci_root_info *ci) { } +static inline struct pci_ops *pci_mcfg_get_ops(int domain, int bus) +{ return NULL; } +static inline void __iomem * +pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset) +{ return NULL; } #endif
#ifdef CONFIG_ACPI_PCI_SLOT
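The address arithmetic inside pci_mcfg_dev_base() follows the standard ECAM layout: 1 MiB per bus, 4 KiB per function. A minimal sketch of just the offset computation, assuming PCI_MMCFG_BUS_OFFSET(bus) expands to bus << 20 as in the x86 MMCONFIG code; `ecam_devfn()` and `ecam_offset()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 5-bit device number and a 3-bit function number into devfn. */
static uint32_t ecam_devfn(unsigned int dev, unsigned int fn)
{
	return ((dev & 0x1f) << 3) | (fn & 0x7);
}

/*
 * Offset of a config register relative to the ECAM base for bus 0:
 * bits 27:20 select the bus, bits 19:12 the devfn, bits 11:0 the register.
 */
static uint32_t ecam_offset(unsigned int bus, unsigned int devfn,
			    unsigned int reg)
{
	assert(reg < 4096);	/* 4 KiB of config space per function */
	return ((uint32_t)(bus & 0xff) << 20) |
	       ((uint32_t)(devfn & 0xff) << 12) | reg;
}
```

For example, register 0x10 of device 3, function 1 on bus 2 lands at offset 0x219010.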
From: Liu Jiang jiang.liu@linux.intel.com
Some architectures, such as IA64 and ARM64, have no instructions to directly access PCI IO ports, so they map PCI IO ports into PCI MMIO address space. Typically PCI host bridges on those architectures take the responsibility to map (translate) PCI IO port transactions into Memory-Mapped IO transactions. The ACPI specification supports this use case through the resource translation_offset.
But the current ACPI resource parsing interface isn't neutral enough; it still has some special logic for IA64. So refine the ACPI resource parsing interface and the IA64 code to handle translation_offset neutrally:
1) The ACPI resource parsing interface doesn't do any translation; it just saves the translation_offset to be used by arch code.
2) Arch code does the mapping (translation) based on arch-specific information. Typically it does:
2.a) Translate the per-PCI-domain IO port address space into the system global IO port address space.
2.b) Set up the MMIO address mapping for IO ports.

	void handle_io_resource(struct resource_entry *io_entry)
	{
		struct resource *mmio_res;
		unsigned long base;

		mmio_res = kzalloc(sizeof(*mmio_res), GFP_KERNEL);
		mmio_res->flags = IORESOURCE_MEM;
		mmio_res->start = io_entry->offset + io_entry->res->start;
		mmio_res->end = io_entry->offset + io_entry->res->end;
		insert_resource(&iomem_resource, mmio_res);

		base = map_to_system_ioport_address(io_entry);
		io_entry->offset = base;
		io_entry->res->start += base;
		io_entry->res->end += base;
	}
Signed-off-by: Jiang Liu jiang.liu@linux.intel.com --- arch/ia64/pci/pci.c | 26 ++++++++++++++++---------- drivers/acpi/resource.c | 12 +++++------- 2 files changed, 21 insertions(+), 17 deletions(-)
diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index c1e8ed5..d496976 100644 --- a/arch/ia64/pci/pci.c +++ b/arch/ia64/pci/pci.c @@ -154,7 +154,7 @@ static int add_io_space(struct device *dev, struct pci_root_info *info, struct resource_entry *iospace; struct resource *resource, *res = entry->res; char *name; - unsigned long base, min, max, base_port; + unsigned long base_mmio, base_port; unsigned int sparse = 0, space_nr, len;
len = strlen(info->common.name) + 32; @@ -172,12 +172,10 @@ static int add_io_space(struct device *dev, struct pci_root_info *info, goto free_resource;
name = (char *)(iospace + 1); - min = res->start - entry->offset; - max = res->end - entry->offset; - base = __pa(io_space[space_nr].mmio_base); + base_mmio = __pa(io_space[space_nr].mmio_base); base_port = IO_SPACE_BASE(space_nr); snprintf(name, len, "%s I/O Ports %08lx-%08lx", info->common.name, - base_port + min, base_port + max); + base_port + res->start, base_port + res->end);
/* * The SDM guarantees the legacy 0-64K space is sparse, but if the @@ -190,19 +188,27 @@ static int add_io_space(struct device *dev, struct pci_root_info *info, resource = iospace->res; resource->name = name; resource->flags = IORESOURCE_MEM; - resource->start = base + (sparse ? IO_SPACE_SPARSE_ENCODING(min) : min); - resource->end = base + (sparse ? IO_SPACE_SPARSE_ENCODING(max) : max); + resource->start = base_mmio; + resource->end = base_mmio; + if (sparse) { + resource->start += IO_SPACE_SPARSE_ENCODING(res->start); + resource->end += IO_SPACE_SPARSE_ENCODING(res->end); + } else { + resource->start += res->start; + resource->end += res->end; + } if (insert_resource(&iomem_resource, resource)) { dev_err(dev, "can't allocate host bridge io space resource %pR\n", resource); goto free_resource; } + resource_list_add_tail(iospace, &info->io_resources);
+ /* Adjust base of original IO port resource descriptor */ entry->offset = base_port; - res->start = min + base_port; - res->end = max + base_port; - resource_list_add_tail(iospace, &info->io_resources); + res->start += base_port; + res->end += base_port;
return 0;
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index cdc5c25..6578f68 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -190,8 +190,7 @@ static bool acpi_decode_space(struct resource_win *win, { u8 iodec = attr->granularity == 0xfff ? ACPI_DECODE_10 : ACPI_DECODE_16; bool wp = addr->info.mem.write_protect; - u64 len = attr->address_length; - u64 start, end, offset = 0; + u64 len = attr->address_length, offset = 0; struct resource *res = &win->res;
/* @@ -215,14 +214,13 @@ static bool acpi_decode_space(struct resource_win *win, else if (attr->translation_offset) pr_debug("ACPI: translation_offset(%lld) is invalid for non-bridge device.\n", attr->translation_offset); - start = attr->minimum + offset; - end = attr->maximum + offset;
win->offset = offset; - res->start = start; - res->end = end; + res->start = attr->minimum; + res->end = attr->maximum; if (sizeof(resource_size_t) < sizeof(u64) && - (offset != win->offset || start != res->start || end != res->end)) { + (offset != win->offset || attr->minimum != res->start || + attr->maximum != res->end)) { pr_warn("acpi resource window ([%#llx-%#llx] ignored, not CPU addressable)\n", attr->minimum, attr->maximum); return false;
Because of two patch series:
1. Jiang Liu's common interface to support PCI host bridge init
2. MMCONFIG refactoring (part of this patch set)
we can now think about a generic ACPI based PCI host bridge driver outside of the arch/ directory.
This driver uses information from the MCFG table (PCI config space regions) and the _CRS method (IO/irq resources) to initialize the PCI host bridge.
TBD: We are still not sure whether we should reassign resources after PCI bus enumeration or trust firmware to do all that work for us properly.
Signed-off-by: Tomasz Nowicki tn@semihalf.com Signed-off-by: Hanjun Guo hanjun.guo@linaro.org Signed-off-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com CC: Arnd Bergmann arnd@arndb.de CC: Catalin Marinas catalin.marinas@arm.com CC: Liviu Dudau Liviu.Dudau@arm.com CC: Lorenzo Pieralisi Lorenzo.Pieralisi@arm.com CC: Will Deacon will.deacon@arm.com Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com --- drivers/pci/host/Kconfig | 6 ++ drivers/pci/host/Makefile | 1 + drivers/pci/host/pci-host-acpi.c | 138 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 145 insertions(+) create mode 100644 drivers/pci/host/pci-host-acpi.c
diff --git a/drivers/pci/host/Kconfig b/drivers/pci/host/Kconfig index f131ba9..0e5e339 100644 --- a/drivers/pci/host/Kconfig +++ b/drivers/pci/host/Kconfig @@ -60,6 +60,12 @@ config PCI_HOST_GENERIC Say Y here if you want to support a simple generic PCI host controller, such as the one emulated by kvmtool.
+config PCI_HOST_GENERIC_ACPI + bool "Generic ACPI PCI host controller" + depends on ACPI && ARCH_PCI_HOST_GENERIC_ACPI + help + Say Y here if you want to support generic ACPI PCI host controller. + config PCIE_SPEAR13XX bool "STMicroelectronics SPEAr PCIe controller" depends on ARCH_SPEAR13XX diff --git a/drivers/pci/host/Makefile b/drivers/pci/host/Makefile index 9d4d3c6..9117894 100644 --- a/drivers/pci/host/Makefile +++ b/drivers/pci/host/Makefile @@ -7,6 +7,7 @@ obj-$(CONFIG_PCI_TEGRA) += pci-tegra.o obj-$(CONFIG_PCI_RCAR_GEN2) += pci-rcar-gen2.o obj-$(CONFIG_PCI_RCAR_GEN2_PCIE) += pcie-rcar.o obj-$(CONFIG_PCI_HOST_GENERIC) += pci-host-generic.o +obj-$(CONFIG_PCI_HOST_GENERIC_ACPI) += pci-host-acpi.o obj-$(CONFIG_PCIE_SPEAR13XX) += pcie-spear13xx.o obj-$(CONFIG_PCI_KEYSTONE) += pci-keystone-dw.o pci-keystone.o obj-$(CONFIG_PCIE_XILINX) += pcie-xilinx.o diff --git a/drivers/pci/host/pci-host-acpi.c b/drivers/pci/host/pci-host-acpi.c new file mode 100644 index 0000000..29175f5 --- /dev/null +++ b/drivers/pci/host/pci-host-acpi.c @@ -0,0 +1,138 @@ +/* + * ACPI based generic PCI host controller driver + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ * + * Copyright (C) 2015 Semihalf + * Author: Tomasz Nowicki tn@semihalf.com + */ + +#include <linux/acpi.h> +#include <linux/ecam.h> +#include <linux/of_address.h> +#include <linux/pci.h> +#include <linux/pci-acpi.h> + +static int pcibios_map_irq(struct pci_dev *dev, u8 slot, u8 pin) +{ + if (pci_dev_msi_enabled(dev)) + return 0; + + return acpi_pci_irq_enable(dev); +} + +int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge) +{ + bridge->map_irq = pcibios_map_irq; + return 0; +} + +static void pci_mcfg_release_info(struct acpi_pci_root_info *ci) +{ + pci_mmcfg_teardown_map(ci); + kfree(ci); +} + +static int pci_acpi_root_prepare_resources(struct acpi_pci_root_info *ci) +{ + struct resource_entry *entry, *tmp; + int ret; + + ret = acpi_pci_probe_root_resources(ci); + if (ret <= 0) + return ret; + + resource_list_for_each_entry_safe(entry, tmp, &ci->resources) { + struct resource *res = entry->res; + + /* + * TODO: need to move pci_register_io_range() function out + * of drivers/of/address.c for both used by DT and ACPI + */ + if (res->flags & IORESOURCE_IO) { + resource_size_t cpu_addr = res->start + entry->offset; + resource_size_t pci_addr = res->start; + resource_size_t length = res->end - res->start; + unsigned long port; + int err; + + err = pci_register_io_range(cpu_addr, length); + if (err) { + resource_list_destroy_entry(entry); + continue; + } + + port = pci_address_to_pio(cpu_addr); + if (port == (unsigned long)-1) { + resource_list_destroy_entry(entry); + continue; + } + + res->start = port; + res->end = port + length; + entry->offset = port - pci_addr; + + if (pci_remap_iospace(res, cpu_addr) < 0) + resource_list_destroy_entry(entry); + } + } + + return ret; +} + +static struct acpi_pci_root_ops acpi_pci_root_ops = { + .init_info = pci_mmcfg_setup_map, + .release_info = pci_mcfg_release_info, + .prepare_resources = pci_acpi_root_prepare_resources, +}; + +/* Root bridge scanning */ +struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root 
*root) +{ + int node = acpi_get_node(root->device->handle); + int domain = root->segment; + int busnum = root->secondary.start; + struct acpi_pci_root_info *info; + struct pci_bus *bus; + + if (domain && !pci_domains_supported) { + pr_warn("PCI %04x:%02x: multiple domains not supported.\n", + domain, busnum); + return NULL; + } + + info = kzalloc_node(sizeof(*info), GFP_KERNEL, node); + if (!info) { + dev_err(&root->device->dev, + "pci_bus %04x:%02x: ignored (out of memory)\n", + domain, busnum); + return NULL; + } + + acpi_pci_root_ops.pci_ops = pci_mcfg_get_ops(domain, busnum); + bus = acpi_pci_root_create(root, &acpi_pci_root_ops, info, root); + + /* After the PCI-E bus has been walked and all devices discovered, + * configure any settings of the fabric that might be necessary. + */ + if (bus) { + struct pci_bus *child; + pci_bus_size_bridges(bus); + pci_bus_assign_resources(bus); + + list_for_each_entry(child, &bus->children, node) + pcie_bus_configure_settings(child); + } + + return bus; +}
On Wednesday 16 December 2015 16:16:31 Tomasz Nowicki wrote:
Because of two patch series:
- Jiang Liu's common interface to support PCI host bridge init
- MMCONFIG refactoring (part of this patch set)
now we can think about generic ACPI based PCI host bridge driver out of arch/ directory.
This driver use information from MCFG table (PCI config space regions) and _CRS method (IO/irq resources) to initialize PCI hostbridge.
TBD: We are still not sure whether we should reassign resources after PCI bus enumeration or trust firmware to do all that work for us properly.
Signed-off-by: Tomasz Nowicki tn@semihalf.com Signed-off-by: Hanjun Guo hanjun.guo@linaro.org Signed-off-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com CC: Arnd Bergmann arnd@arndb.de CC: Catalin Marinas catalin.marinas@arm.com CC: Liviu Dudau Liviu.Dudau@arm.com CC: Lorenzo Pieralisi Lorenzo.Pieralisi@arm.com CC: Will Deacon will.deacon@arm.com Tested-by: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com
I think this code could better live in drivers/acpi/pci_root.c along with all the related functions. It's not really a driver by itself and cannot be a loadable module or built on other architectures.
You can put all the code inside an #ifdef ARCH_PCI_HOST_GENERIC_ACPI there.
Arnd
On 18.12.2015 13:40, Arnd Bergmann wrote:
I think this code could better live in drivers/acpi/pci_root.c along with all the related functions. It's not really a driver by itself and cannot be a loadable module or built on other architectures.
You can put all the code inside an #ifdef ARCH_PCI_HOST_GENERIC_ACPI there.
Makes sense to me, thanks.
Tomasz
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.
example:
static const struct dmi_system_id yyy[] = {
	{
		.ident = "<Platform ident string>",
		.callback = <handler>,
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
			DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
		},
	},
	{ }
};

static struct pci_ops ecam_pci_ops = {
	.map_bus = pci_mcfg_dev_base,
	.read = xxxx_ecam_config_read,
	.write = xxxx_ecam_config_write,
};
DECLARE_ACPI_MCFG_FIXUP(yyy, &ecam_pci_ops, <domain_nr>, <bus_nr>);
Note that more custom actions can be done via the DMI callback hook.
Signed-off-by: Tomasz Nowicki tn@semihalf.com
---
 drivers/acpi/mcfg.c               | 35 +++++++++++++++++++++++++++++++++--
 include/asm-generic/vmlinux.lds.h |  7 +++++++
 include/linux/ecam.h              | 17 +++++++++++++++++
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/mcfg.c b/drivers/acpi/mcfg.c
index a9b2231..6d0194d 100644
--- a/drivers/acpi/mcfg.c
+++ b/drivers/acpi/mcfg.c
@@ -8,6 +8,7 @@
  */

 #include <linux/acpi.h>
+#include <linux/dmi.h>
 #include <linux/ecam.h>
 #include <linux/pci.h>
 #include <linux/pci-acpi.h>
@@ -34,6 +35,31 @@ int __weak raw_pci_write(unsigned int domain, unsigned int bus,
 	return PCIBIOS_DEVICE_NOT_FOUND;
 }

+extern struct pci_mcfg_fixup __start_acpi_mcfg_fixups[];
+extern struct pci_mcfg_fixup __end_acpi_mcfg_fixups[];
+
+static struct pci_ops *pci_mcfg_check_quirks(int domain, int bus_number)
+{
+	struct pci_mcfg_fixup *fixup;
+
+	fixup = __start_acpi_mcfg_fixups;
+	while (fixup < __end_acpi_mcfg_fixups) {
+		if (dmi_check_system(fixup->system) &&
+		    (fixup->domain == domain ||
+		     fixup->domain == PCI_MCFG_DOMAIN_ANY) &&
+		    (fixup->bus_number == bus_number ||
+		     fixup->bus_number == PCI_MCFG_BUS_ANY)) {
+			pr_info(PREFIX "Fixup applied: Platform [%s] domain [%d] bus number [%d]\n",
+				fixup->system->ident, fixup->domain,
+				fixup->bus_number);
+			return fixup->ops;
+		}
+		++fixup;
+	}
+
+	return NULL;
+}
+
 void __iomem *
 pci_mcfg_dev_base(struct pci_bus *bus, unsigned int devfn, int offset)
 {
@@ -56,10 +82,15 @@ static struct pci_ops default_pci_mcfg_ops = {

 struct pci_ops *pci_mcfg_get_ops(int domain, int bus)
 {
+	struct pci_ops *pci_mcfg_ops_quirk;
+
 	/*
-	 * TODO: Match against platform specific quirks and return
-	 * corresponding PCI config space accessor set.
+	 * Match against platform specific quirks and return corresponding
+	 * PCI config space accessor set.
 	 */
+	pci_mcfg_ops_quirk = pci_mcfg_check_quirks(domain, bus);
+	if (pci_mcfg_ops_quirk)
+		return pci_mcfg_ops_quirk;

 	return &default_pci_mcfg_ops;
 }
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2..c93fc97 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -298,6 +298,13 @@
 		VMLINUX_SYMBOL(__end_pci_fixups_suspend_late) = .;	\
 	}								\
 									\
+	/* ACPI MCFG quirks */						\
+	.acpi_fixup : AT(ADDR(.acpi_fixup) - LOAD_OFFSET) {		\
+		VMLINUX_SYMBOL(__start_acpi_mcfg_fixups) = .;		\
+		*(.acpi_fixup_mcfg)					\
+		VMLINUX_SYMBOL(__end_acpi_mcfg_fixups) = .;		\
+	}								\
+									\
 	/* Built-in firmware blobs */					\
 	.builtin_fw : AT(ADDR(.builtin_fw) - LOAD_OFFSET) {		\
 		VMLINUX_SYMBOL(__start_builtin_fw) = .;			\
diff --git a/include/linux/ecam.h b/include/linux/ecam.h
index e0f322e..1319fa8 100644
--- a/include/linux/ecam.h
+++ b/include/linux/ecam.h
@@ -20,6 +20,23 @@ struct pci_mmcfg_region {
 	bool hot_added;
 };

+struct pci_mcfg_fixup {
+	const struct dmi_system_id *system;
+	struct pci_ops *ops;
+	int domain;
+	int bus_number;
+};
+
+#define PCI_MCFG_DOMAIN_ANY	-1
+#define PCI_MCFG_BUS_ANY	-1
+
+/* Designate a routine to fix up buggy MCFG */
+#define DECLARE_ACPI_MCFG_FIXUP(system, ops, dom, bus)			\
+	static const struct pci_mcfg_fixup __mcfg_fixup_##system##dom##bus\
+	 __used	__attribute__((__section__(".acpi_fixup_mcfg"),		\
+				aligned((sizeof(void *))))) =		\
+	{ system, ops, dom, bus };
+
 struct pci_mmcfg_region *pci_mmconfig_lookup(int segment, int bus);
 struct pci_mmcfg_region *pci_mmconfig_alloc(int segment, int start,
 						int end, u64 addr);
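The matching rule used by pci_mcfg_check_quirks() can be modelled outside the kernel. The following is only a minimal userspace sketch, not the patch's code: struct mcfg_fixup_model, mcfg_find_fixup() and the example_fixups table are hypothetical names, and the platform_match flag stands in for the result of dmi_check_system(). It illustrates that the first entry whose platform, domain and bus all match wins, with -1 acting as a wildcard, and that NULL means "fall back to the default accessors".

```c
#include <stdbool.h>
#include <stddef.h>

#define MCFG_DOMAIN_ANY (-1)	/* mirrors PCI_MCFG_DOMAIN_ANY */
#define MCFG_BUS_ANY    (-1)	/* mirrors PCI_MCFG_BUS_ANY */

/* Hypothetical stand-in for struct pci_mcfg_fixup. */
struct mcfg_fixup_model {
	bool platform_match;	/* assumed result of dmi_check_system() */
	int domain;
	int bus_number;
};

/* Made-up table: an exact entry, a wildcard entry, and an entry
 * belonging to a platform we are not running on. */
static const struct mcfg_fixup_model example_fixups[] = {
	{ true,  0, 0 },
	{ true,  MCFG_DOMAIN_ANY, MCFG_BUS_ANY },
	{ false, 2, 0 },
};

/* Return the first fixup matching platform, domain and bus, or NULL
 * when the caller should use the default accessor set. */
static const struct mcfg_fixup_model *
mcfg_find_fixup(const struct mcfg_fixup_model *tbl, size_t n,
		int domain, int bus)
{
	for (size_t i = 0; i < n; i++) {
		const struct mcfg_fixup_model *f = &tbl[i];

		if (f->platform_match &&
		    (f->domain == domain || f->domain == MCFG_DOMAIN_ANY) &&
		    (f->bus_number == bus || f->bus_number == MCFG_BUS_ANY))
			return f;
	}
	return NULL;
}
```

Because the scan stops at the first hit, a wildcard entry placed early in the table shadows any more specific entry that follows it, so ordering matters for anyone relying on both.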
Hi Tomasz
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Tomasz Nowicki
Sent: 16 December 2015 15:17
To: bhelgaas@google.com; arnd@arndb.de; will.deacon@arm.com; catalin.marinas@arm.com; rjw@rjwysocki.net; hanjun.guo@linaro.org; Lorenzo.Pieralisi@arm.com; okaya@codeaurora.org; jiang.liu@linux.intel.com; Stefano.Stabellini@eu.citrix.com
Cc: robert.richter@caviumnetworks.com; mw@semihalf.com; Liviu.Dudau@arm.com; ddaney@caviumnetworks.com; tglx@linutronix.de; Wangyijing; Suravee.Suthikulpanit@amd.com; msalter@redhat.com; linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linaro-acpi@lists.linaro.org; jchandra@broadcom.com; jcm@redhat.com; Tomasz Nowicki
Subject: [PATCH V2 22/23] pci, acpi: Match PCI config space accessors against platfrom specific quirks.
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.
I've got a couple of comments/questions about this patch..
1) So according to this mechanism quirks would be supported only by vendors whose BIOS are SMBIOS compliant. Now personally I am ok with this but I don't know if this is OK in general as it would narrow down the number of platforms that would be able to define the quirks... Lorenzo, Arnd what is your opinion here?
2) In the quirk mechanism you proposed, I see that the callback function allows some preparation work to be done for the host bridge. For example, in the Hisilicon hip05 case we would need to read some values from the ACPI table (see the acpi_pci_root_hisi_add() function in https://lkml.org/lkml/2015/12/3/426). I am quite new to ACPI and I wonder if it is OK to add such "Packages" to the PCI host bridge ACPI device...or maybe we need to declare a new one...?
Many Thanks
Gab
On Monday 21 December 2015, Gabriele Paoloni wrote:
I've got a couple of comments/questions about this patch..
1) So according to this mechanism quirks would be supported only by vendors whose BIOS are SMBIOS compliant. Now personally I am ok with this but I don't know if this is OK in general as it would narrow down the number of platforms that would be able to define the quirks... Lorenzo, Arnd what is your opinion here?
I'd rather not see the quirks in mainline at all, and only support SBSA compliant machines, or require the BIOS to work around the hardware quirks differently (e.g. by trapping config space access through secure firmware, or going through an AML method to be defined). I'm certainly ok with making it depend on SMBIOS if we are going to use something like this.
Arnd
On 12/21/2015 06:10 AM, Arnd Bergmann wrote:
I'd rather not see the quirks in mainline at all, and only support SBSA compliant machines,
There seems to exist a class of systems that were intended to be SBSA compliant, but after they were manufactured turned out not to be fully compliant. It would be nice to have Linux kernel support for some of these systems.
There also seems to be historical precedent for quirk frameworks in various kernel subsystems to handle systems and devices that are not fully "Spec. Compliant".
or require the BIOS to work around the hardware quirks differently (e.g. by trapping config space access through secure firmware, or going through an AML method to be defined).
Some systems don't seem to have this capability. For example, in ARMv8 (A.K.A. arm64), I haven't been able to figure out how to trap these accesses to EL3 for emulation. The specification is 5700 pages long though, so perhaps I missed that bit.
I'm certainly ok with making it depend on SMBIOS if we are going to use something like this.
Arnd
On Monday 21 December 2015, David Daney wrote:
On 12/21/2015 06:10 AM, Arnd Bergmann wrote:
On Monday 21 December 2015, Gabriele Paoloni wrote:
or require the BIOS to work around the hardware quirks differently (e.g. by trapping config space access through secure firmware, or going through an AML method to be defined).
Some systems don't seem to have this capability. For example, in ARMv8 (A.K.A. arm64), I haven't been able to figure out how to trap these accesses to EL3 for emulation. The specification is 5700 pages long though, so perhaps I missed that bit.
How about using AML then? This would be similar to what CHRP used with RTAS calls to do PCI config space access.
Arnd
On 12/21/2015 05:42 PM, Arnd Bergmann wrote:
On Monday 21 December 2015, David Daney wrote:
On 12/21/2015 06:10 AM, Arnd Bergmann wrote:
On Monday 21 December 2015, Gabriele Paoloni wrote:
or require the BIOS to work around the hardware quirks differently (e.g. by trapping config space access through secure firmware, or going through an AML method to be defined).
Some systems don't seem to have this capability. For example, in ARMv8 (A.K.A. arm64), I haven't been able to figure out how to trap these accesses to EL3 for emulation. The specification is 5700 pages long though, so perhaps I missed that bit.
There isn't a way to directly trap to EL3 for emulation (caveat - there might be some nasty hack with an SMMU that wouldn't be supportable). I requested the implementation of a generic mechanism for LPC type emulation (complete with "SMI" traps to EL3 for fixups) about 4 years ago. That wouldn't have helped with this situation, but this was to be on the radar afterward. However, on ARM, it is still early days with respect to transparently trapping and emulating hardware workarounds.
How about using AML then? This would be similar to what CHRP used with RTAS calls to do PCI config space access.
The best way to do it for now (IMO) is via a DMI quirk match and a special method for the early SoCs that aren't implementing MMCONFIG correctly. An effort is underway to correct that in third party IP, and similar directly with the partners for future generations. So this should not get "much" more out of hand than it sadly is so far. Once we have a good upstream solution (which is vital) then it will be an error and a pre-tapeout bug to not be able to boot an upstream kernel with stock ACPI hostbridge working sans nasty quirks. Therefore, the sooner this is upstream, the better it will be for everyone involved.
Jon.
Sorry for top-posting. A quick note that SMBIOS3 is required by SBBR so it can be presumed that compliant platforms will provide quirks via DMI.
On 22.12.2015 00:10, Jon Masters wrote:
Sorry for top-posting. A quick note that SMBIOS3 is required by SBBR so it can be presumed that compliant platforms will provide quirks via DMI.
Thanks Jon for confirmation.
Tomasz
Hi Jon, thanks for replying
-----Original Message-----
From: Jon Masters [mailto:jcm@redhat.com]
Sent: 21 December 2015 23:11
To: Arnd Bergmann
Cc: Gabriele Paoloni; Tomasz Nowicki; bhelgaas@google.com; will.deacon@arm.com; catalin.marinas@arm.com; rjw@rjwysocki.net; hanjun.guo@linaro.org; Lorenzo.Pieralisi@arm.com; okaya@codeaurora.org; jiang.liu@linux.intel.com; Stefano.Stabellini@eu.citrix.com; robert.richter@caviumnetworks.com; mw@semihalf.com; Liviu.Dudau@arm.com; ddaney@caviumnetworks.com; tglx@linutronix.de; Wangyijing; Suravee.Suthikulpanit@amd.com; msalter@redhat.com; linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linaro-acpi@lists.linaro.org; jchandra@broadcom.com
Subject: Re: [PATCH V2 22/23] pci, acpi: Match PCI config space accessors against platfrom specific quirks.
Sorry for top-posting. A quick note that SMBIOS3 is required by SBBR so it can be presumed that compliant platforms will provide quirks via DMI.
Ok so you completely clarified my question 1). Many Thanks for this
Gab
On 12/22/2015 04:29 AM, Gabriele Paoloni wrote:
Hi Jon, thanks for replying
Ok so you completely clarified my question 1). Many Thanks for this
No problem. One of the (many) reasons I pushed for requiring SMBIOS/DMI in SBBR (I was lead author of one of the early drafts of that document) was to make the experience ultimately equivalent across architectures. We already know how to do quirks and handle platform deviations, and the major Operating System vendors did not want to reinvent the wheel.
Jon.
On 12/22/2015 11:36 AM, Jon Masters wrote:
No problem. One of the (many) reasons I pushed for requiring SMBIOS/DMI in SBBR (I was lead author of one of the early drafts of that document) was to make the experience ultimately equivalent across architectures. We already know how to do quirks and handle platform deviations, and the major Operating System vendors did not want to reinvent the wheel.
Additional: it is clear that more prescription is required to get the vendors onto the bandwagon that we have with other architectures (e.g. that other one). So there will be a Red Hat "ARM server whitepaper" coming in the early new year that will include the kind of "server 101" material we want to make sure people know. Things like making sure you implement and test PCIe correctly, handle backward compatibility (you will build hardware in the future that runs my existing OS release), design the hardware to allow for workarounds later, etc. I expect some other Operating System vendors to be involved in reviewing that.
Ultimately my objective is to make this whole thing dull and boring. You will get RHEL(SA)/upstream kernels and it will either boot or it will not. If it does not boot, vendor X screwed up their hardware. We know this story, it's been this way for over a decade already, and that is exactly how it is going to be with ARM servers shortly.
Jon.
-----Original Message-----
From: Jon Masters [mailto:jcm@redhat.com]
Sent: 22 December 2015 16:45
To: Gabriele Paoloni; Arnd Bergmann
Cc: Tomasz Nowicki; bhelgaas@google.com; will.deacon@arm.com; catalin.marinas@arm.com; rjw@rjwysocki.net; hanjun.guo@linaro.org; Lorenzo.Pieralisi@arm.com; okaya@codeaurora.org; jiang.liu@linux.intel.com; Stefano.Stabellini@eu.citrix.com; robert.richter@caviumnetworks.com; mw@semihalf.com; Liviu.Dudau@arm.com; ddaney@caviumnetworks.com; tglx@linutronix.de; Wangyijing; Suravee.Suthikulpanit@amd.com; msalter@redhat.com; linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linaro-acpi@lists.linaro.org; jchandra@broadcom.com
Subject: Re: [PATCH V2 22/23] pci, acpi: Match PCI config space accessors against platfrom specific quirks.
Additional: it is clear that more prescription is required to get the vendors onto the bandwagon that we have with other architectures (e.g. that other one). So there will be a Red Hat "ARM server whitepaper" coming in the early new year that will include the kind of "server 101" material we want to make sure people know. Things like making sure you implement and test PCIe correctly, handle backward compatibility (you will build hardware in the future that runs my existing OS release), design the hardware to allow for workarounds later, etc. I expect some other Operating System vendors to be involved in reviewing that.
Ultimately my objective is to make this whole thing dull and boring. You will get RHEL(SA)/upstream kernels and it will either boot or it will not. If it does not boot, vendor X screwed up their hardware. We know this story, it's been this way for over a decade already, and that is exactly how it is going to be with ARM servers shortly.
Ok got it.
Many Thanks
Gab
Jon.
On 21.12.2015 12:47, Gabriele Paoloni wrote:
2) In the quirk mechanism you proposed, I see that the callback function allows some preparation work to be done for the host bridge. For example, in the Hisilicon hip05 case we would need to read some values from the ACPI table (see the acpi_pci_root_hisi_add() function in https://lkml.org/lkml/2015/12/3/426). I am quite new to ACPI and I wonder if it is OK to add such "Packages" to the PCI host bridge ACPI device...or maybe we need to declare a new one...?
I may be missing something, so please correct me if that is the case.
https://lkml.org/lkml/2015/12/3/426 shows that you need special handling for the root->secondary.start bus number only, right? So how about creating a special MCFG region rc-base:rc-base+rc-size only for <segment,bus>. Like this:
[0008] Base Address : <rc-base>
[0002] Segment Group Number : <segment>
[0001] Start Bus Number : <root->secondary.start>
[0001] End Bus Number : <root->secondary.start>
[0004] Reserved : 00000000

static const struct dmi_system_id hisi_quirk[] = {
	{
		.ident = "HiSi...",
		.matches = {
			DMI_MATCH(<whatever you need to match your platform>),
		},
	},
	{ }
};

static struct pci_ops hisi_ecam_pci_ops = {
	.map_bus = pci_mcfg_dev_base,
	.read = hisi_pcie_cfg_read,
	.write = hisi_pcie_cfg_write,
};

DECLARE_ACPI_MCFG_FIXUP(hisi_quirk, &hisi_ecam_pci_ops, <segment>, <bus>);
With the above code you can use your custom PCI config accessors only for that region.
Let me know if that is not enough for you.
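To illustrate why a one-bus MCFG entry isolates the root bus, here is a small userspace sketch of a segment/bus range lookup. It is an assumption-laden model rather than kernel code: struct mcfg_entry_model, mcfg_lookup() and the example_mcfg table are hypothetical, and the base addresses are invented placeholders. Only the idea of matching on segment plus a start/end bus range comes from the MCFG entry layout shown above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one MCFG allocation entry. */
struct mcfg_entry_model {
	uint64_t base;		/* placeholder value, not real hardware */
	uint16_t segment;
	uint8_t start_bus;
	uint8_t end_bus;
};

/* Made-up layout: the root bus gets its own entry, so a quirk keyed
 * on <segment 0, bus 0> touches nothing else. */
static const struct mcfg_entry_model example_mcfg[] = {
	{ 0xa0000000ULL, 0, 0, 0 },	/* root bus only */
	{ 0xb0000000ULL, 0, 1, 255 },	/* everything below the root bus */
};

/* Return the entry covering (segment, bus), or NULL if none does. */
static const struct mcfg_entry_model *
mcfg_lookup(const struct mcfg_entry_model *tbl, size_t n,
	    uint16_t segment, uint8_t bus)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].segment == segment &&
		    tbl[i].start_bus <= bus && bus <= tbl[i].end_bus)
			return &tbl[i];
	}
	return NULL;
}
```

With this split, config accesses to bus 0 resolve to the first entry and can be paired with the custom accessors, while buses 1 to 255 resolve to the second entry and keep the default ones.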
Tomasz
Hi Tomasz
-----Original Message-----
From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Tomasz Nowicki
Sent: 22 December 2015 10:20
To: Gabriele Paoloni; bhelgaas@google.com; arnd@arndb.de; will.deacon@arm.com; catalin.marinas@arm.com; rjw@rjwysocki.net; hanjun.guo@linaro.org; Lorenzo.Pieralisi@arm.com; okaya@codeaurora.org; jiang.liu@linux.intel.com; Stefano.Stabellini@eu.citrix.com
Cc: robert.richter@caviumnetworks.com; mw@semihalf.com; Liviu.Dudau@arm.com; ddaney@caviumnetworks.com; tglx@linutronix.de; Wangyijing; Suravee.Suthikulpanit@amd.com; msalter@redhat.com; linux-pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linaro-acpi@lists.linaro.org; jchandra@broadcom.com; jcm@redhat.com
Subject: Re: [PATCH V2 22/23] pci, acpi: Match PCI config space accessors against platfrom specific quirks.
In principle I think it can work...
Liudongdong, Guo Hanjun, what is your opinion on this?
Thanks
Gab
On 12/22/2015 10:48 PM, Gabriele Paoloni wrote:
Hi Tomasz
-----Original Message----- From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel- owner@vger.kernel.org] On Behalf Of Tomasz Nowicki Sent: 22 December 2015 10:20 To: Gabriele Paoloni; bhelgaas@google.com; arnd@arndb.de; will.deacon@arm.com; catalin.marinas@arm.com; rjw@rjwysocki.net; hanjun.guo@linaro.org; Lorenzo.Pieralisi@arm.com; okaya@codeaurora.org; jiang.liu@linux.intel.com; Stefano.Stabellini@eu.citrix.com Cc: robert.richter@caviumnetworks.com; mw@semihalf.com; Liviu.Dudau@arm.com; ddaney@caviumnetworks.com; tglx@linutronix.de; Wangyijing; Suravee.Suthikulpanit@amd.com; msalter@redhat.com; linux- pci@vger.kernel.org; linux-arm-kernel@lists.infradead.org; linux- acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linaro- acpi@lists.linaro.org; jchandra@broadcom.com; jcm@redhat.com Subject: Re: [PATCH V2 22/23] pci, acpi: Match PCI config space accessors against platfrom specific quirks.
On 21.12.2015 12:47, Gabriele Paoloni wrote:
- In the quirk mechanism you proposed, I see that the callback function allows to do some preparation work for the host bridge. For example in Hisilicon hip05 case we would need to read some values from the ACPI table (see acpi_pci_root_hisi_add() function in https://lkml.org/lkml/2015/12/3/426). I am quite new to ACPI and I wonder if it is OK to add such "Packages" to the PCI host bridge ACPI device...or maybe we need to declare a new one...?
I may miss sth so please correct me in that case.
https://lkml.org/lkml/2015/12/3/426 shows that you need special handling for root->secondary.start bus number only, right? So how about creating special MCFG region rc-base:rc-base+rc-size only for <segment,bus>. Like that:
[0008] Base Address : <rc-base>
[0002] Segment Group Number : <segment>
[0001] Start Bus Number : <root->secondary.start>
[0001] End Bus Number : <root->secondary.start>
[0004] Reserved : 00000000

static const struct dmi_system_id hisi_quirk[] = {
	{
		.ident = "HiSi...",
		.matches = {
			DMI_MATCH(<whatever you need to match your platform>),
		},
	},
	{ }
};

static struct pci_ops hisi_ecam_pci_ops = {
	.map_bus = pci_mcfg_dev_base,
	.read = hisi_pcie_cfg_read,
	.write = hisi_pcie_cfg_write,
};

DECLARE_ACPI_MCFG_FIXUP(hisi_quirk, &hisi_ecam_pci_ops, <segment>, <bus>);
With above code you can use your custom PCI config accessor only for that region.
Let me know if that is not enough for you.
In principle I think it can work...
Liudongdong, Guo Hanjun, what is your opinion on this?
Let me and Dongdong prepare a patch for Hip05, and then we will come back to this discussion to see whether we have run into any problems.
Thanks Hanjun
On Wed, 2015-12-16 at 16:16 +0100, Tomasz Nowicki wrote:
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.

example:

static const struct dmi_system_id yyy[] = {
	{
		.ident = "<Platform ident string>",
		.callback = <handler>,
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
			DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
		},
	},
	{ }
};
This seems awkward to me in the case where the quirk is SoC-based and there may be multiple platforms affected. Needing a DECLARE_ACPI_MCFG_FIXUP for each platform using such a SoC (i.e. Mustang and Moonshot) doesn't seem right. In that case, I think it'd be better to check CPUID and possibly some SoC register to cover all platforms affected.
Also, there doesn't seem to be a way to connect a given quirk check to the MCFG/device requesting the ops. So if there is a platform with multiple PCIE roots and not all of them have quirks, how does one know whether to override the default ecam ops?
On 08.01.2016 15:16, Mark Salter wrote:
On Wed, 2015-12-16 at 16:16 +0100, Tomasz Nowicki wrote:
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.

example:

static const struct dmi_system_id yyy[] = {
	{
		.ident = "<Platform ident string>",
		.callback = <handler>,
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
			DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
		},
	},
	{ }
};
This seems awkward to me in the case where the quirk is SoC-based and there may be multiple platforms affected. Needing a DECLARE_ACPI_MCFG_FIXUP for each platform using such a SoC (i.e. Mustang and Moonshot) doesn't seem right. In that case, I think it'd be better to check CPUID and possibly some SoC register to cover all platforms affected.
Right, my next version already has an alternative to the DMI match handler, so there will be two ways to match:
- DMI, like in this patch set
- int (*match)(struct pci_mcfg_fixup *) where you can read CPUID, and whatever is necessary.
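A rough userland sketch of how an entry could offer both ways to match is shown below. It is not the code from the next version of the series: `struct platform_info`, `struct fixup`, `fixup_applies()` and the SoC ID value are hypothetical, and the callback signature is simplified compared to the `int (*match)(struct pci_mcfg_fixup *)` mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform description; a kernel would consult DMI instead. */
struct platform_info {
	const char *dmi_product;	/* product name string from firmware */
	unsigned int soc_id;		/* made-up value a callback could read */
};

/* Hypothetical fixup entry that supports both matching methods. */
struct fixup {
	const char *dmi_product;	/* NULL: no DMI matching */
	bool (*match)(const struct platform_info *info);	/* NULL: no callback */
};

/* An entry applies if its DMI string matches or its callback says so. */
static bool fixup_applies(const struct fixup *f,
			  const struct platform_info *info)
{
	if (f->dmi_product && info->dmi_product &&
	    strcmp(f->dmi_product, info->dmi_product) == 0)
		return true;
	if (f->match)
		return f->match(info);
	return false;
}

/* Example callback: match every board built around one (made-up) SoC ID. */
static bool match_soc_0x42(const struct platform_info *info)
{
	return info->soc_id == 0x42;
}
```

With a callback, one entry can cover several boards that share a SoC, which is the case the DMI-only table handles poorly.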
Also, there doesn't seem to be a way to connect a given quirk check to the MCFG/device requesting the ops. So if there is a platform with multiple PCIE roots and not all of them have quirks, how does one know whether to override the default ecam ops?
Then we can identify them using domain:bus. I was thinking of passing the ACPI device handle to the match handler for the case where we need, e.g., extra properties from the related DSDT device descriptor. Does that make sense to you?
Tomasz
On Fri, 2016-01-08 at 15:36 +0100, Tomasz Nowicki wrote:
On 08.01.2016 15:16, Mark Salter wrote:
On Wed, 2015-12-16 at 16:16 +0100, Tomasz Nowicki wrote:
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.

example:

static const struct dmi_system_id yyy[] = {
	{
		.ident = "<Platform ident string>",
		.callback = <handler>,
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
			DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
		},
	},
	{ }
};
This seems awkward to me in the case where the quirk is SoC-based and there may be multiple platforms affected. Needing a DECLARE_ACPI_MCFG_FIXUP for each platform using such a SoC (i.e. Mustang and Moonshot) doesn't seem right. In that case, I think it'd be better to check CPUID and possibly some SoC register to cover all platforms affected.
Right, my next version already has an alternative to the DMI match handler, so there will be two ways to match:
- DMI, like in this patch set
- int (*match)(struct pci_mcfg_fixup *) where you can read CPUID, and whatever is necessary.
Great. Thanks.
Also, there doesn't seem to be a way to connect a given quirk check to the MCFG/device requesting the ops. So if there is a platform with multiple PCIE roots and not all of them have quirks, how does one know whether to override the default ecam ops?
Then we can identify them using domain:bus. I was thinking of passing the ACPI device handle to the match handler for the case where we need, e.g., extra properties from the related DSDT device descriptor. Does that make sense to you?
The thing with domain:bus is that it is really a firmware setting. So on one platform domain 0 may be associated with hwdev X but on another platform it may be associated with hwdev Y. But yes, an acpi handle would make it much easier for the match handler to find out which hw dev it is being asked to match.
On 01/08/2016 08:16 AM, Mark Salter wrote:
On Wed, 2015-12-16 at 16:16 +0100, Tomasz Nowicki wrote:
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.
...
This seems awkward to me in the case where the quirk is SoC-based and there may be multiple platforms affected. Needing a DECLARE_ACPI_MCFG_FIXUP for each platform using such a SoC (i.e. Mustang and Moonshot) doesn't seem right. In that case, I think it'd be better to check CPUID and possibly some SoC register to cover all platforms affected.
Also, there doesn't seem to be a way to connect a given quirk check to the MCFG/device requesting the ops. So if there is a platform with multiple PCIE roots and not all of them have quirks, how does one know whether to override the default ecam ops?
That was the thinking that led me to quirk the xgene based m400 using the root bridge VID/DID and DECLARE_PCI_FIXUP_EARLY(). That solution works because the default config space accessors work well enough to read the VID/DID.
I think that solution might work for at least one other vendor as well, and IMHO is better than using DMI for the reasons you list.
I will rebase/post those patches RSN...
On Fri, Jan 08, 2016 at 09:16:21AM -0500, Mark Salter wrote:
On Wed, 2015-12-16 at 16:16 +0100, Tomasz Nowicki wrote:
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.

example:

static const struct dmi_system_id yyy[] = {
	{
		.ident = "<Platform ident string>",
		.callback = <handler>,
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
			DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
		},
	},
	{ }
};
This seems awkward to me in the case where the quirk is SoC-based and there may be multiple platforms affected. Needing a DECLARE_ACPI_MCFG_FIXUP for each platform using such a SoC (i.e. Mustang and Moonshot) doesn't seem right. In that case, I think it'd be better to check CPUID and possibly some SoC register to cover all platforms affected.
CPUs get reused across SoCs, so as you've implicitly noted, the CPUID alone is insufficient.
Given that IP blocks get moved around between SoC variants, I don't think you can check "some SoC register" based on the CPU ID -- you can end up bringing the board down at that point.
If the CPU ID alone is insufficient to tell you about a component, it cannot give you enough information about a component you can use to query more information from.
If your platform requires a quirk, it's always going to be painful (and to some extent, rightfully so). We should aim for correctness here with explicit matching.
Thanks, Mark.
On Fri, Jan 08, 2016 at 03:01:37PM +0000, Mark Rutland wrote:
On Fri, Jan 08, 2016 at 09:16:21AM -0500, Mark Salter wrote:
On Wed, 2015-12-16 at 16:16 +0100, Tomasz Nowicki wrote:
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.

example:

static const struct dmi_system_id yyy[] = {
	{
		.ident = "<Platform ident string>",
		.callback = <handler>,
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
			DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
		},
	},
	{ }
};
This seems awkward to me in the case where the quirk is SoC-based and there may be multiple platforms affected. Needing a DECLARE_ACPI_MCFG_FIXUP for each platform using such a SoC (i.e. Mustang and Moonshot) doesn't seem right. In that case, I think it'd be better to check CPUID and possibly some SoC register to cover all platforms affected.
CPUs get reused across SoCs, so as you've implicitly noted, the CPUID alone is insufficient.
Given that IP blocks get moved around between SoC variants, I don't think you can check "some SoC register" based on the CPU ID -- you can end up bringing the board down at that point.
If the CPU ID alone is insufficient to tell you about a component, it cannot give you enough information about a component you can use to query more information from.
If your platform requires a quirk, it's always going to be painful (and to some extent, rightfully so). We should aim for correctness here with explicit matching.
Further, if there is going to be an ever-expanding set of platforms requiring quirks, then we need a standard mechanism in ACPI to enable the platform to tell us explicitly either which specific PCI implementation is used, or which common quirk is necessary.
Thanks, Mark.
On Fri, 2016-01-08 at 15:12 +0000, Mark Rutland wrote:
On Fri, Jan 08, 2016 at 03:01:37PM +0000, Mark Rutland wrote:
On Fri, Jan 08, 2016 at 09:16:21AM -0500, Mark Salter wrote:
On Wed, 2015-12-16 at 16:16 +0100, Tomasz Nowicki wrote:
Some platforms may not be fully compliant with the generic set of PCI config accessors. For these cases we implement a way to override the accessor set before PCI bus enumeration. The algorithm that overrides the accessors matches against platform ID (DMI), domain and bus number, which is hopefully enough for all cases. All quirks can be defined using DECLARE_ACPI_MCFG_FIXUP() and kept self-contained.

example:

static const struct dmi_system_id yyy[] = {
	{
		.ident = "<Platform ident string>",
		.callback = <handler>,
		.matches = {
			DMI_MATCH(DMI_SYS_VENDOR, "<system vendor>"),
			DMI_MATCH(DMI_PRODUCT_NAME, "<product name>"),
			DMI_MATCH(DMI_PRODUCT_VERSION, "product version"),
		},
	},
	{ }
};
This seems awkward to me in the case where the quirk is SoC-based and there may be multiple platforms affected. Needing a DECLARE_ACPI_MCFG_FIXUP for each platform using such a SoC (i.e. Mustang and Moonshot) doesn't seem right. In that case, I think it'd be better to check CPUID and possibly some SoC register to cover all platforms affected.
CPUs get reused across SoCs, so as you've implicitly noted, the CPUID alone is insufficient.
Given that IP blocks get moved around between SoC variants, I don't think you can check "some SoC register" based on the CPU ID -- you can end up bringing the board down at that point.
If the CPU ID alone is insufficient to tell you about a component, it cannot give you enough information about a component you can use to query more information from.
If your platform requires a quirk, it's always going to be painful (and to some extent, rightfully so). We should aim for correctness here with explicit matching.
Further, if there is going to be an ever-expanding set of platforms requiring quirks, then we need a standard mechanism in ACPI to enable the platform to tell us explicitly either which specific PCI implementation is used, or which common quirk is necessary.
No, an ever-expanding set is exactly what we don't want. I think you've convinced me that I'm taking a wrong view of the problem. Putting something in the ACPI standard would be going too far and I think a hard sell to the standards folk. There really is no foolproof way to match a plug and play ACPI PCIe root to specific hardware without considering the exact platform and/or BIOS info. So yeah, it should be painful in order to give incentive to the silicon vendors to get it right.
Let's get rid of the empty PCI init stub and the related ACPI header, and go with a full-blown PCI host bridge driver.
Signed-off-by: Tomasz Nowicki <tn@semihalf.com>
CC: Arnd Bergmann <arnd@arndb.de>
CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Liviu Dudau <Liviu.Dudau@arm.com>
CC: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
CC: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/Kconfig      |  4 ++++
 arch/arm64/kernel/pci.c | 10 ----------
 2 files changed, 4 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index d65d315..71032ed 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -86,6 +86,7 @@ config ARM64
 	select OF_EARLY_FLATTREE
 	select OF_RESERVED_MEM
 	select PERF_USE_VMALLOC
+	select PCI_HOST_GENERIC_ACPI if ACPI
 	select POWER_RESET
 	select POWER_SUPPLY
 	select RTC_LIB
@@ -213,6 +214,9 @@ config PCI_MMCONFIG
 	select PCI_ECAM
 	depends on ACPI
 
+config ARCH_PCI_HOST_GENERIC_ACPI
+	def_bool ACPI
+
 endmenu
 
 menu "Kernel Features"
diff --git a/arch/arm64/kernel/pci.c b/arch/arm64/kernel/pci.c
index f7948f5..e9cf58b 100644
--- a/arch/arm64/kernel/pci.c
+++ b/arch/arm64/kernel/pci.c
@@ -10,7 +10,6 @@
  *
  */
 
-#include <linux/acpi.h>
 #include <linux/init.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
@@ -50,12 +49,3 @@ int pcibios_enable_device(struct pci_dev *dev, int mask)
 
 	return pci_enable_resources(dev, mask);
 }
-
-#ifdef CONFIG_ACPI
-/* Root bridge scanning */
-struct pci_bus *pci_acpi_scan_root(struct acpi_pci_root *root)
-{
-	/* TODO: Should be revisited when implementing PCI on ACPI */
-	return NULL;
-}
-#endif
Hi Tomasz,
On 12/16/2015 10:16 AM, Tomasz Nowicki wrote:
From the functionality point of view this series might be split into the following logic parts:
- Make MMCONFIG code arch-agnostic which allows all architectures to collect PCI config regions and used when necessary.
- Move non-arch specific bits to the core code.
- Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
- Enable above driver on ARM64
Patches have been built on top of 4.4-rc4 and can be found here: git@github.com:semihalf-nowicki-tomasz/linux.git (pci-acpi-v2)
NOTE, this patch set depends on Matthew's patches: http://www.spinics.net/lists/linux-pci/msg45950.html https://github.com/Vality/linux/tree/pci-fixes
This has been tested on a Cavium ThunderX 1-socket server and QEMU. Any help in reviewing and testing is much appreciated.
v1 -> v2
- moved non-arch specific piece of code to the drivers/acpi/ directory
- fixed IO resource handling
- introduced PCI config accessors quirks matching
- moved ACPI_COMPANION_SET to generic code
Just tested your series. I'm seeing a resource assignment problem below. The bus addresses show as memory addresses and memory addresses show as bus addresses and IO resource did not show up.
Tomasz V2
[ 2.520852] ACPI: PCI Interrupt Link [LN1C] (IRQs *238)
[ 2.535472] ACPI: PCI Interrupt Link [LN1D] (IRQs *239)
[ 2.550562] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-1f])
[ 2.567813] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.591270] acpi PNP0A08:02: _OSC: platform does not support [PCIeHotplug]
[ 2.611144] acpi PNP0A08:02: _OSC: OS now controls [PME AER PCIeCapability]
[ 2.630299] ACPI: IORT: can't find node related to (null) device
[ 2.647184] acpi PNP0A08:02: PCI host bridge to bus 0002:00
[ 2.662663] pci_bus 0002:00: root bus resource [mem 0x00100000-0x3fffffff window] (bus address [0xfffff5ff00100000-0xfffff5ff3fffffff])
[ 2.703561] pci_bus 0002:00: root bus resource [mem 0x40000000-0x7fffffff window] (bus address [0xfffff5fe80000000-0xfffff5febfffffff])
[ 2.737737] pci_bus 0002:00: root bus resource [mem 0x80000000-0xffffffff window] (bus address [0xfffff5fe00000000-0xfffff5fe7fffffff])
[ 2.794961] pci_bus 0002:00: root bus resource [bus 00-1f]

Mark Salter's patches

[ 2.730011] ACPI: PCI Interrupt Link [LN1C] (IRQs *238)
[ 2.744648] ACPI: PCI Interrupt Link [LN1D] (IRQs *239)
[ 2.759330] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-1f])
[ 2.783295] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.806726] acpi PNP0A08:02: _OSC: platform does not support [PCIeHotplug]
[ 2.826005] acpi PNP0A08:02: _OSC: OS now controls [PME AER PCIeCapability]
[ 2.845361] PCI host bridge to bus 0002:00
[ 2.856719] pci_bus 0002:00: root bus resource [bus 00-1f]
[ 2.872056] pci_bus 0002:00: root bus resource [mem 0xa0100100000-0xa013fffffff] (bus address [0x00100000-0x3fffffff])
[ 2.902008] pci_bus 0002:00: root bus resource [mem 0xa0200000000-0xa023fffffff] (bus address [0x40000000-0x7fffffff])
[ 2.932396] pci_bus 0002:00: root bus resource [mem 0xa0300000000-0xa037fffffff] (bus address [0x80000000-0xffffffff])
[ 2.983827] pci_bus 0002:00: root bus resource [io 0x0000-0xffff]

Here is what the ACPI table looks like:

QWORDMemory( // Consumed-and-produced resource (all of memory space)
    ResourceProducer,   // bit 0 of general flags is 0
    PosDecode,          // positive Decode: _DEC
    MinFixed,           // Range is fixed: _MIF
    MaxFixed,           // Range is fixed: _MAF
    NonCacheable,       // _MEM
    ReadWrite,          // _RW
    0x00000000,         // Granularity: _GRA
    0x00100000,         // Min - PCI Memory start: _MIN
    0x3FFFFFFF,         // Max - PCI Memory end: _MAX
    0xA0100000000,      // Translation: _TRA
    0x3FF00000,         // Range Length: _LEN
    ,                   // Optional field left blank
    ,                   // Optional field left blank
    MEM0,               // Name declaration for this descriptor
    AddressRangeMemory,
    TypeStatic
)
Any thoughts?
On 17.12.2015 22:24, Sinan Kaya wrote:
Hi Tomasz,
On 12/16/2015 10:16 AM, Tomasz Nowicki wrote:
From the functionality point of view this series might be split into the following logic parts:
- Make MMCONFIG code arch-agnostic which allows all architectures to collect PCI config regions and used when necessary.
- Move non-arch specific bits to the core code.
- Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
- Enable above driver on ARM64
Patches have been built on top of 4.4-rc4 and can be found here: git@github.com:semihalf-nowicki-tomasz/linux.git (pci-acpi-v2)
NOTE, this patch set depends on Matthew's patches: http://www.spinics.net/lists/linux-pci/msg45950.html https://github.com/Vality/linux/tree/pci-fixes
This has been tested on a Cavium ThunderX 1-socket server and QEMU. Any help in reviewing and testing is much appreciated.
v1 -> v2
- moved non-arch specific piece of code to the drivers/acpi/ directory
- fixed IO resource handling
- introduced PCI config accessors quirks matching
- moved ACPI_COMPANION_SET to generic code
Just tested your series. I'm seeing a resource assignment problem below. The bus addresses show as memory addresses and memory addresses show as bus addresses and IO resource did not show up.
Tomasz V2
[ 2.520852] ACPI: PCI Interrupt Link [LN1C] (IRQs *238)
[ 2.535472] ACPI: PCI Interrupt Link [LN1D] (IRQs *239)
[ 2.550562] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-1f])
[ 2.567813] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.591270] acpi PNP0A08:02: _OSC: platform does not support [PCIeHotplug]
[ 2.611144] acpi PNP0A08:02: _OSC: OS now controls [PME AER PCIeCapability]
[ 2.630299] ACPI: IORT: can't find node related to (null) device
[ 2.647184] acpi PNP0A08:02: PCI host bridge to bus 0002:00
[ 2.662663] pci_bus 0002:00: root bus resource [mem 0x00100000-0x3fffffff window] (bus address [0xfffff5ff00100000-0xfffff5ff3fffffff])
[ 2.703561] pci_bus 0002:00: root bus resource [mem 0x40000000-0x7fffffff window] (bus address [0xfffff5fe80000000-0xfffff5febfffffff])
[ 2.737737] pci_bus 0002:00: root bus resource [mem 0x80000000-0xffffffff window] (bus address [0xfffff5fe00000000-0xfffff5fe7fffffff])
[ 2.794961] pci_bus 0002:00: root bus resource [bus 00-1f]

Mark Salter's patches

[ 2.730011] ACPI: PCI Interrupt Link [LN1C] (IRQs *238)
[ 2.744648] ACPI: PCI Interrupt Link [LN1D] (IRQs *239)
[ 2.759330] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-1f])
[ 2.783295] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.806726] acpi PNP0A08:02: _OSC: platform does not support [PCIeHotplug]
[ 2.826005] acpi PNP0A08:02: _OSC: OS now controls [PME AER PCIeCapability]
[ 2.845361] PCI host bridge to bus 0002:00
[ 2.856719] pci_bus 0002:00: root bus resource [bus 00-1f]
[ 2.872056] pci_bus 0002:00: root bus resource [mem 0xa0100100000-0xa013fffffff] (bus address [0x00100000-0x3fffffff])
[ 2.902008] pci_bus 0002:00: root bus resource [mem 0xa0200000000-0xa023fffffff] (bus address [0x40000000-0x7fffffff])
[ 2.932396] pci_bus 0002:00: root bus resource [mem 0xa0300000000-0xa037fffffff] (bus address [0x80000000-0xffffffff])
[ 2.983827] pci_bus 0002:00: root bus resource [io 0x0000-0xffff]

Here is what the ACPI table looks like:

QWORDMemory( // Consumed-and-produced resource (all of memory space)
    ResourceProducer,   // bit 0 of general flags is 0
    PosDecode,          // positive Decode: _DEC
    MinFixed,           // Range is fixed: _MIF
    MaxFixed,           // Range is fixed: _MAF
    NonCacheable,       // _MEM
    ReadWrite,          // _RW
    0x00000000,         // Granularity: _GRA
    0x00100000,         // Min - PCI Memory start: _MIN
    0x3FFFFFFF,         // Max - PCI Memory end: _MAX
    0xA0100000000,      // Translation: _TRA
    0x3FF00000,         // Range Length: _LEN
    ,                   // Optional field left blank
    ,                   // Optional field left blank
    MEM0,               // Name declaration for this descriptor
    AddressRangeMemory,
    TypeStatic
)
Any thoughts?
Yes, this is because of: [PATCH V2 20/23] ACPI, PCI: Refine the way to handle translation_offset for ACPI resources, which should have carried an RFC tag. I posted this patch to restart the discussion on this.
The patch does not add the translation offset to the start address of MMIO-type resources, and in acpi_pci_probe_root_resources(ci) that causes problems like this. Indeed, MMIO has to be fixed.
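The numbers in the two logs are consistent with simple 64-bit arithmetic on the first QWORDMemory descriptor (_MIN = 0x00100000, _TRA = 0xA0100000000). The sketch below is only a worked check in userland C, assuming that the bus address is derived by subtracting the offset from the resource start; the struct and function names are hypothetical and do not come from the kernel.

```c
#include <stdint.h>

struct window {
	uint64_t cpu_start;	/* address the CPU uses */
	uint64_t bus_start;	/* address seen on the PCI bus */
};

/* Expected decoding: CPU address = bus address (_MIN) + translation (_TRA). */
static struct window decode_with_offset(uint64_t min, uint64_t tra)
{
	struct window w = { min + tra, min };
	return w;
}

/*
 * Assumed failure mode: the resource start is left at _MIN, yet the offset
 * is still subtracted to obtain the bus address, so the value wraps around.
 */
static struct window decode_without_offset(uint64_t min, uint64_t tra)
{
	struct window w = { min, min - tra };
	return w;
}
```

With these inputs the first function gives a CPU start of 0xa0100100000, as in the log from Mark Salter's patches, while the second gives a bus address of 0xfffff5ff00100000, as in the V2 log.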
But the IO resource type is more problematic. The open questions are how acpi_decode_space() should parse resources and which ACPI IO descriptor should be used for ARM64: QWORDIO (offset == 0 vs offset != 0) or DWordIO (TypeStatic vs TypeTranslation), plus backward compatibility with IA64...
Please refer to: https://lkml.org/lkml/2015/11/5/581
As Lorenzo pointed out, we *all* need to agree upon the IO resource ACPI descriptor and its parsing method.
Any comments are much appreciated!
Tomasz
On 17.12.2015 22:24, Sinan Kaya wrote:
Hi Tomasz,
On 12/16/2015 10:16 AM, Tomasz Nowicki wrote:
From the functionality point of view this series might be split into the following logic parts:
- Make MMCONFIG code arch-agnostic which allows all architectures to collect PCI config regions and used when necessary.
- Move non-arch specific bits to the core code.
- Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
- Enable above driver on ARM64
Patches have been built on top of 4.4-rc4 and can be found here: git@github.com:semihalf-nowicki-tomasz/linux.git (pci-acpi-v2)
NOTE, this patch set depends on Matthew's patches: http://www.spinics.net/lists/linux-pci/msg45950.html https://github.com/Vality/linux/tree/pci-fixes
This has been tested on a Cavium ThunderX 1-socket server and QEMU. Any help in reviewing and testing is much appreciated.
v1 -> v2
- moved non-arch specific piece of code to the drivers/acpi/ directory
- fixed IO resource handling
- introduced PCI config accessors quirks matching
- moved ACPI_COMPANION_SET to generic code
Just tested your series. I'm seeing a resource assignment problem below. The bus addresses show as memory addresses and memory addresses show as bus addresses and IO resource did not show up.
Tomasz V2
[ 2.520852] ACPI: PCI Interrupt Link [LN1C] (IRQs *238)
[ 2.535472] ACPI: PCI Interrupt Link [LN1D] (IRQs *239)
[ 2.550562] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-1f])
[ 2.567813] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.591270] acpi PNP0A08:02: _OSC: platform does not support [PCIeHotplug]
[ 2.611144] acpi PNP0A08:02: _OSC: OS now controls [PME AER PCIeCapability]
[ 2.630299] ACPI: IORT: can't find node related to (null) device
[ 2.647184] acpi PNP0A08:02: PCI host bridge to bus 0002:00
[ 2.662663] pci_bus 0002:00: root bus resource [mem 0x00100000-0x3fffffff window] (bus address [0xfffff5ff00100000-0xfffff5ff3fffffff])
[ 2.703561] pci_bus 0002:00: root bus resource [mem 0x40000000-0x7fffffff window] (bus address [0xfffff5fe80000000-0xfffff5febfffffff])
[ 2.737737] pci_bus 0002:00: root bus resource [mem 0x80000000-0xffffffff window] (bus address [0xfffff5fe00000000-0xfffff5fe7fffffff])
[ 2.794961] pci_bus 0002:00: root bus resource [bus 00-1f]

Mark Salter's patches

[ 2.730011] ACPI: PCI Interrupt Link [LN1C] (IRQs *238)
[ 2.744648] ACPI: PCI Interrupt Link [LN1D] (IRQs *239)
[ 2.759330] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-1f])
[ 2.783295] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.806726] acpi PNP0A08:02: _OSC: platform does not support [PCIeHotplug]
[ 2.826005] acpi PNP0A08:02: _OSC: OS now controls [PME AER PCIeCapability]
[ 2.845361] PCI host bridge to bus 0002:00
[ 2.856719] pci_bus 0002:00: root bus resource [bus 00-1f]
[ 2.872056] pci_bus 0002:00: root bus resource [mem 0xa0100100000-0xa013fffffff] (bus address [0x00100000-0x3fffffff])
[ 2.902008] pci_bus 0002:00: root bus resource [mem 0xa0200000000-0xa023fffffff] (bus address [0x40000000-0x7fffffff])
[ 2.932396] pci_bus 0002:00: root bus resource [mem 0xa0300000000-0xa037fffffff] (bus address [0x80000000-0xffffffff])
[ 2.983827] pci_bus 0002:00: root bus resource [io 0x0000-0xffff]

Here is what the ACPI table looks like:

QWORDMemory( // Consumed-and-produced resource (all of memory space)
    ResourceProducer,   // bit 0 of general flags is 0
    PosDecode,          // positive Decode: _DEC
    MinFixed,           // Range is fixed: _MIF
    MaxFixed,           // Range is fixed: _MAF
    NonCacheable,       // _MEM
    ReadWrite,          // _RW
    0x00000000,         // Granularity: _GRA
    0x00100000,         // Min - PCI Memory start: _MIN
    0x3FFFFFFF,         // Max - PCI Memory end: _MAX
    0xA0100000000,      // Translation: _TRA
    0x3FF00000,         // Range Length: _LEN
    ,                   // Optional field left blank
    ,                   // Optional field left blank
    MEM0,               // Name declaration for this descriptor
    AddressRangeMemory,
    TypeStatic
)
Any thoughts?
Yes, this is because of: [PATCH V2 20/23] ACPI, PCI: Refine the way to handle translation_offset for ACPI resources, which should have carried an RFC tag. I posted this patch to restart the discussion on this.
The patch does not add the translation offset to the start address of MMIO-type resources, and in acpi_pci_probe_root_resources(ci) that causes problems like this. Indeed, MMIO has to be fixed.
OK. I assume you'll post a patch for this soon similar to what Liu Jiang is doing in IA64 directory (arch/ia64/pci/pci.c) as I can't proceed with my testing without this bugfix.
But the IO resource type is more problematic. The open questions are how acpi_decode_space() should parse resources and which ACPI IO descriptor should be used for ARM64: QWORDIO (offset == 0 vs offset != 0) or DWordIO (TypeStatic vs TypeTranslation), plus backward compatibility with IA64...
Please refer to: https://lkml.org/lkml/2015/11/5/581
As Lorenzo pointed out, we *all* need to agree upon the IO resource ACPI descriptor and its parsing method.
Here is what I have as an IO resource.
QWORDIO( // Consumed-and-produced resource
    ResourceProducer,   // bit 0 of general flags is 0
    MinFixed,           // Range is fixed
    MaxFixed,           // Range is fixed
    PosDecode,
    EntireRange,
    0x0000,             // Granularity
    0x1000,             // Min, 0 is not accepted
    0x10FFF,            // Max
    0x8FFFFFEF000,      // Translation
    0x10000,            // Range Length
    ,,
    PI00
)
I don't have any type specified.
I agree with Lorenzo's assessment. The min and max values represent the PCI IO bus addresses. The translation offset is added to these values to figure out the CPU view of the PCI IO range.
The endpoints BAR addresses are programmed with IO addresses ranging between 0x1000 and 0x10FFF for this example above.
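As a worked check of that reading, the CPU view of the QWORDIO window above can be computed by adding the translation offset to both ends of the bus range. This is a minimal userland sketch under that assumption; `struct range` and `bus_to_cpu()` are hypothetical names and not kernel functions.

```c
#include <stdint.h>

/* Inclusive address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* CPU view of a window: both ends shifted by the translation offset. */
static struct range bus_to_cpu(struct range bus, uint64_t translation)
{
	struct range cpu = { bus.start + translation, bus.end + translation };
	return cpu;
}

/* Length of an inclusive range, for comparison with the _LEN field. */
static uint64_t range_len(struct range r)
{
	return r.end - r.start + 1;
}
```

For the descriptor above (bus range 0x1000-0x10FFF, translation 0x8FFFFFEF000) this yields a CPU range of 0x8FFFFFF0000-0x8FFFFFFFFFF, and the length of 0x10000 agrees with the Range Length field.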
Here is another question. Chris Covington and I asked this question on a private email to you but we didn't hear back.
We were referring to a Linaro IO hack patch as we were not sure whether this was a limitation of the hack or a general expectation for ARM64 PCI in general.
I'll repeat it here.
I have multiple root ports with the same IO port configuration in the current ACPI table.
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
Each root port can have the same IO address range configuration. Are we expecting IO port numbers to be unique across the whole system for ARM64?
Something like
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x11000-0x20FFF
Root port 2 = IO range 0x21000-0x30FFF
since the IO addresses are being remapped into PCI IO range printed during boot.
PCI I/O : 0xffff7ffffae00000 - 0xffff7ffffbe00000 ( 16 MB)
and each root port would remap to 64k of the 16MB range.
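One possible layout, if every root port were given its own 64 KB slice of the 16 MB PCI I/O region, can be sketched as below. This only illustrates the question being asked and is not how the kernel assigns I/O windows; the macro and function names are hypothetical.

```c
#include <stdint.h>

#define PCI_IO_REGION_SIZE	(16u * 1024 * 1024)	/* 16 MB region from the boot log */
#define ROOT_PORT_IO_SIZE	(64u * 1024)		/* 64 KB slice per root port */

/*
 * Store the offset of a root port's slice inside the 16 MB region.
 * Returns 0 on success, or -1 when the region has no slice left.
 */
static int io_window_offset(unsigned int root_port, uint32_t *offset)
{
	if (root_port >= PCI_IO_REGION_SIZE / ROOT_PORT_IO_SIZE)
		return -1;
	*offset = root_port * ROOT_PORT_IO_SIZE;
	return 0;
}
```

Under this assumption the region has room for 256 root ports, and port numbers become unique system-wide because each port's slice starts at a different offset.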
Any comments are very appreciated!
Tomasz
To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 18.12.2015 19:56, okaya@codeaurora.org wrote:
On 17.12.2015 22:24, Sinan Kaya wrote:
Hi Tomasz,
On 12/16/2015 10:16 AM, Tomasz Nowicki wrote:
From the functionality point of view this series might be split into the following logical parts:
1. Make MMCONFIG code arch-agnostic, which allows all architectures to collect PCI config regions and use them when necessary.
2. Move non-arch specific bits to the core code.
3. Use MMCONFIG code and implement generic ACPI based PCI host controller driver.
4. Enable above driver on ARM64
Patches have been built on top of 4.4-rc4 and can be found here: git@github.com:semihalf-nowicki-tomasz/linux.git (pci-acpi-v2)
NOTE, this patch set depends on Matthew's patches: http://www.spinics.net/lists/linux-pci/msg45950.html https://github.com/Vality/linux/tree/pci-fixes
This has been tested on Cavium ThunderX 1 socket server and QEMU. Any help in reviewing and testing is very appreciated.
v1 -> v2
- moved non-arch specific piece of code to drivers/acpi/ directory
- fixed IO resource handling
- introduced PCI config accessors quirks matching
- moved ACPI_COMPANION_SET to generic code
Just tested your series. I'm seeing a resource assignment problem below. The bus addresses show as memory addresses and memory addresses show as bus addresses and IO resource did not show up.
Tomasz V2
[ 2.520852] ACPI: PCI Interrupt Link [LN1C] (IRQs *238)
[ 2.535472] ACPI: PCI Interrupt Link [LN1D] (IRQs *239)
[ 2.550562] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-1f])
[ 2.567813] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.591270] acpi PNP0A08:02: _OSC: platform does not support [PCIeHotplug]
[ 2.611144] acpi PNP0A08:02: _OSC: OS now controls [PME AER PCIeCapability]
[ 2.630299] ACPI: IORT: can't find node related to (null) device
[ 2.647184] acpi PNP0A08:02: PCI host bridge to bus 0002:00
[ 2.662663] pci_bus 0002:00: root bus resource [mem 0x00100000-0x3fffffff window] (bus address [0xfffff5ff00100000-0xfffff5ff3fffffff])
[ 2.703561] pci_bus 0002:00: root bus resource [mem 0x40000000-0x7fffffff window] (bus address [0xfffff5fe80000000-0xfffff5febfffffff])
[ 2.737737] pci_bus 0002:00: root bus resource [mem 0x80000000-0xffffffff window] (bus address [0xfffff5fe00000000-0xfffff5fe7fffffff])
[ 2.794961] pci_bus 0002:00: root bus resource [bus 00-1f]
Mark Salter's patches
[ 2.730011] ACPI: PCI Interrupt Link [LN1C] (IRQs *238)
[ 2.744648] ACPI: PCI Interrupt Link [LN1D] (IRQs *239)
[ 2.759330] ACPI: PCI Root Bridge [PCI2] (domain 0002 [bus 00-1f])
[ 2.783295] acpi PNP0A08:02: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
[ 2.806726] acpi PNP0A08:02: _OSC: platform does not support [PCIeHotplug]
[ 2.826005] acpi PNP0A08:02: _OSC: OS now controls [PME AER PCIeCapability]
[ 2.845361] PCI host bridge to bus 0002:00
[ 2.856719] pci_bus 0002:00: root bus resource [bus 00-1f]
[ 2.872056] pci_bus 0002:00: root bus resource [mem 0xa0100100000-0xa013fffffff] (bus address [0x00100000-0x3fffffff])
[ 2.902008] pci_bus 0002:00: root bus resource [mem 0xa0200000000-0xa023fffffff] (bus address [0x40000000-0x7fffffff])
[ 2.932396] pci_bus 0002:00: root bus resource [mem 0xa0300000000-0xa037fffffff] (bus address [0x80000000-0xffffffff])
[ 2.983827] pci_bus 0002:00: root bus resource [io 0x0000-0xffff]
Here is what the ACPI table looks like:
QWORDMemory( // Consumed-And-produced resource (all of memory space)
    ResourceProducer,  // bit 0 of general flags is 0
    PosDecode,         // positive Decode: _DEC
    MinFixed,          // Range is fixed: _MIF
    MaxFixed,          // Range is fixed: _MAF
    NonCacheable,      // _MEM
    ReadWrite,         // _RW
    0x00000000,        // Granularity: _GRA
    0x00100000,        // Min - PCI Memory start: _MIN
    0x3FFFFFFF,        // Max - PCI Memory end: _MAX
    0xA0100000000,     // Translation: _TRA
    0x3FF00000,        // Range Length: _LEN
    ,                  // Optional field left blank
    ,                  // Optional field left blank
    MEM0,              // Name declaration for this descriptor
    AddressRangeMemory,
    TypeStatic
)
Any thoughts?
Yes, this is because of: [PATCH V2 20/23] ACPI, PCI: Refine the way to handle translation_offset for ACPI resources, which should have had an RFC tag. I posted this patch to re-trigger discussion on this.
The patch does not add the translation offset to the start address of MMIO-type resources, and in acpi_pci_probe_root_resources(ci) that causes problems like this one. Indeed, MMIO has to be fixed.
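To make the expected MMIO result concrete, here is a small C sketch that builds the CPU-side window from the MEM0 descriptor quoted above and maps it back to bus addresses. struct host_window, window_from_descriptor() and cpu_to_bus() are hypothetical names used only for this note; this is not what patch 20/23 or acpi_pci_probe_root_resources() actually does, only the arithmetic one would expect if the translation offset is added to the resource.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a host bridge window: CPU range plus offset. */
struct host_window {
	uint64_t cpu_start;	/* first CPU physical address */
	uint64_t cpu_end;	/* last CPU physical address */
	uint64_t offset;	/* CPU address minus bus address */
};

/* Build the CPU-side window from _MIN, _MAX and _TRA of a descriptor. */
struct host_window window_from_descriptor(uint64_t min, uint64_t max,
					  uint64_t tra)
{
	struct host_window w;

	assert(max >= min);
	w.cpu_start = min + tra;
	w.cpu_end = max + tra;
	w.offset = tra;
	return w;
}

/* Bus address seen by the device for a CPU address inside the window. */
uint64_t cpu_to_bus(const struct host_window *w, uint64_t cpu_addr)
{
	assert(cpu_addr >= w->cpu_start && cpu_addr <= w->cpu_end);
	return cpu_addr - w->offset;
}
```

For _MIN 0x00100000, _MAX 0x3FFFFFFF and _TRA 0xA0100000000 this yields the resource 0xa0100100000-0xa013fffffff with bus address 0x00100000-0x3fffffff, which is the pairing shown in the log from Mark Salter's patches.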
OK. I assume you'll post a patch for this soon similar to what Liu Jiang is doing in IA64 directory (arch/ia64/pci/pci.c) as I can't proceed with my testing without this bugfix.
But IO resource type is more problematic. Actually, how acpi_decode_space() should parse resources and which ACPI IO descriptor should be used for ARM64: QWORDIO (offset == 0 vs offset != 0), DWordIO (TypeStatic vs TypeTranslation) + backward compatibility with IA64...
Please refer to: https://lkml.org/lkml/2015/11/5/581
As Lorenzo pointed out, we *all* need to agree upon the IO resource ACPI descriptor and its parsing method.
Here is what I have as an IO resource.
QWORDIO( //Consumed-And-produced resource
    ResourceProducer,  // bit 0 of general flags is 0
    MinFixed,          // Range is fixed
    MaxFixed,          // Range is fixed
    PosDecode,
    EntireRange,
    0x0000,            // Granularity
    0x1000,            // Min, 0 is not accepted
    0x10FFF,           // Max
    0x8FFFFFEF000,     // Translation
    0x10000,           // Range Length
    ,,
    PI00
)
I don't have any type specified.
I agree with Lorenzo's assessment. The min and max values represent the PCI IO bus addresses. The translation offset is added to these values to figure out the CPU view of the PCI IO range.
The endpoints BAR addresses are programmed with IO addresses ranging between 0x1000 and 0x10FFF for this example above.
Here is another question. Chris Covington and I asked this question on a private email to you but we didn't hear back.
I have not seen any mails like that in my mailbox, unless you sent them to my Linaro address, which I can no longer access. Please use @semihalf
We were referring to a Linaro IO hack patch as we were not sure whether this was a limitation of the hack or a general expectation for ARM64 PCI in general.
I'll repeat it here.
I have multiple root ports with the same IO port configuration in the current ACPI table.
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
Each root port can have the same IO address range configuration, are we expecting IO port numbers to be unique across the whole system for ARM64?
Looking at pci_register_io_range, which is currently used on ARM64, I would say no, you can't have the same CPU addressable IO ranges.
pci_register_io_range does not allow regions that overlap each other.
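The restriction described above can be pictured with a simplified C model of a range registry that refuses overlapping CPU windows. This is only a sketch and not the kernel's pci_register_io_range() implementation; struct io_registry, registry_add() and the fixed table size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 8	/* arbitrary table size for this sketch */

struct io_range {
	uint64_t start;	/* CPU physical start of the I/O window */
	uint64_t size;
};

struct io_registry {
	struct io_range ranges[MAX_RANGES];
	size_t count;
};

/* Two half-open ranges overlap when each starts before the other ends. */
int ranges_overlap(uint64_t a_start, uint64_t a_size,
		   uint64_t b_start, uint64_t b_size)
{
	return a_start < b_start + b_size && b_start < a_start + a_size;
}

/* Returns 0 on success, -1 when the range overlaps or the table is full. */
int registry_add(struct io_registry *reg, uint64_t start, uint64_t size)
{
	size_t i;

	assert(size > 0);
	for (i = 0; i < reg->count; i++) {
		if (ranges_overlap(start, size, reg->ranges[i].start,
				   reg->ranges[i].size))
			return -1;
	}
	if (reg->count == MAX_RANGES)
		return -1;
	reg->ranges[reg->count].start = start;
	reg->ranges[reg->count].size = size;
	reg->count++;
	return 0;
}
```

In this model, registering the same CPU window for a second root port fails, while a second window at a different CPU address succeeds even if both carry the same bus range 0x1000-0x10FFF.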
Bjorn, Arnd, Will, any opinion on this apart from current code restrictions?
Tomasz
On Fri, Dec 18, 2015 at 06:56:39PM +0000, okaya@codeaurora.org wrote:
[...]
Here is what I have as an IO resource.
QWORDIO( //Consumed-And-produced resource
    ResourceProducer,  // bit 0 of general flags is 0
    MinFixed,          // Range is fixed
    MaxFixed,          // Range is fixed
    PosDecode,
    EntireRange,
    0x0000,            // Granularity
    0x1000,            // Min, 0 is not accepted
    0x10FFF,           // Max
    0x8FFFFFEF000,     // Translation
    0x10000,           // Range Length
    ,,
    PI00
)
I don't have any type specified.
I agree with Lorenzo's assessment. The min and max values represent the PCI IO bus addresses. The translation offset is added to these values to figure out the CPU view of the PCI IO range.
The endpoints BAR addresses are programmed with IO addresses ranging between 0x1000 and 0x10FFF for this example above.
Here is another question. Chris Covington and I asked this question on a private email to you but we didn't hear back.
We were referring to a Linaro IO hack patch as we were not sure whether this was a limitation of the hack or a general expectation for ARM64 PCI in general.
I'll repeat it here.
I have multiple root ports with the same IO port configuration in the current ACPI table.
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
It is fine. You end up mapping for each of those a 4k window of the virtual address space allocated to IO and that's what you will have in the kernel PCI resources (not in the HW BARs though). If that was a problem it would be even for the current DT host controllers eg:
arch/arm64/boot/dts/apm/apm-storm.dtsi
it should not be (again I will let Arnd comment on this since he may be aware of issues encountered on other arches/platforms).
Lorenzo
Each root port can have the same IO address range configuration, are we expecting IO port numbers to be unique across the whole system for ARM64?
Something like
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x11000-0x20FFF
Root port 2 = IO range 0x21000-0x30FFF
since the IO addresses are being remapped into PCI IO range printed during boot.
PCI I/O : 0xffff7ffffae00000 - 0xffff7ffffbe00000 ( 16 MB)
and each root port would remap to 64k of the 16MB range.
Any comments are very appreciated!
Tomasz
On 21.12.2015 13:10, Lorenzo Pieralisi wrote:
On Fri, Dec 18, 2015 at 06:56:39PM +0000, okaya@codeaurora.org wrote:
[...]
Here is what I have as an IO resource.
QWORDIO( //Consumed-And-produced resource
    ResourceProducer,  // bit 0 of general flags is 0
    MinFixed,          // Range is fixed
    MaxFixed,          // Range is fixed
    PosDecode,
    EntireRange,
    0x0000,            // Granularity
    0x1000,            // Min, 0 is not accepted
    0x10FFF,           // Max
    0x8FFFFFEF000,     // Translation
    0x10000,           // Range Length
    ,,
    PI00
)
I don't have any type specified.
I agree with Lorenzo's assessment. The min and max values represent the PCI IO bus addresses. The translation offset is added to these values to figure out the CPU view of the PCI IO range.
The endpoints BAR addresses are programmed with IO addresses ranging between 0x1000 and 0x10FFF for this example above.
Here is another question. Chris Covington and I asked this question on a private email to you but we didn't hear back.
We were referring to a Linaro IO hack patch as we were not sure whether this was a limitation of the hack or a general expectation for ARM64 PCI in general.
I'll repeat it here.
I have multiple root ports with the same IO port configuration in the current ACPI table.
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
It is fine. You end up mapping for each of those a 4k window of the virtual address space allocated to IO and that's what you will have in the kernel PCI resources (not in the HW BARs though). If that was a problem it would be even for the current DT host controllers eg:
arch/arm64/boot/dts/apm/apm-storm.dtsi
it should not be (again I will let Arnd comment on this since he may be aware of issues encountered on other arches/platforms).
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
If above ranges are mapped into different CPU windows, then yes, it is fine.
Tomasz
On Monday 21 December 2015, Tomasz Nowicki wrote:
On 21.12.2015 13:10, Lorenzo Pieralisi wrote:
On Fri, Dec 18, 2015 at 06:56:39PM +0000, okaya@codeaurora.org wrote:
I have multiple root ports with the same IO port configuration in the current ACPI table.
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
It is fine. You end up mapping for each of those a 4k window of the virtual address space allocated to IO and that's what you will have in the kernel PCI resources (not in the HW BARs though). If that was a problem it would be even for the current DT host controllers eg:
arch/arm64/boot/dts/apm/apm-storm.dtsi
it should not be (again I will let Arnd comment on this since he may be aware of issues encountered on other arches/platforms).
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
If above ranges are mapped into different CPU windows, then yes, it is fine.
Ideally, they should all be the same CPU address so we only have to map the window once, each device gets an address below 64K, and you can have legacy port numbers (below 4K) on any bus, which is required to make certain GPUs work.
I haven't actually seen anyone do that on ARM though, every implementation so far has a separate mapping per host bridge, and we can cope with that too, and we can live with either overlapping bus addresses or unique bus addresses, any of them can be expressed by the PCI core in Linux, we just have to make sure that we correctly translate the firmware tables into our internal structures.
Arnd
On Monday 21 December 2015, Tomasz Nowicki wrote:
On 21.12.2015 13:10, Lorenzo Pieralisi wrote:
On Fri, Dec 18, 2015 at 06:56:39PM +0000, okaya@codeaurora.org wrote:
I have multiple root ports with the same IO port configuration in the current ACPI table.
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
It is fine. You end up mapping for each of those a 4k window of the virtual address space allocated to IO and that's what you will have in the kernel PCI resources (not in the HW BARs though). If that was a problem it would be even for the current DT host controllers eg:
arch/arm64/boot/dts/apm/apm-storm.dtsi
it should not be (again I will let Arnd comment on this since he may be aware of issues encountered on other arches/platforms).
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
If above ranges are mapped into different CPU windows, then yes, it is fine.
Ideally, they should all be the same CPU address so we only have to map the window once, each device gets an address below 64K, and you can have legacy port numbers (below 4K) on any bus, which is required to make certain GPUs work.
I haven't actually seen anyone do that on ARM though, every implementation so far has a separate mapping per host bridge, and we can cope with that too, and we can live with either overlapping bus addresses or unique bus addresses, any of them can be expressed by the PCI core in Linux, we just have to make sure that we correctly translate the firmware tables into our internal structures.
Arnd
Thanks, I won't be touching the acpi tables then and I will assume the hack had a problem. It was trying to remap the io range of the second root port to the first port io address map.
I was getting a warning from resource.c
Btw, when I tested the io ranges before, kernel didn't accept anything below 1k like 0. That is why my range starts at 1k.
On Monday 21 December 2015, Okaya@codeaurora.org wrote:
Thanks, I won't be touching the acpi tables then and I will assume the hack had a problem. It was trying to remap the io range of the second root port to the first port io address map.
If all domains share the same I/O space, you should only map it once of course.
I was getting a warning from resource.c
Btw, when I tested the io ranges before, kernel didn't accept anything below 1k like 0. That is why my range starts at 1k.
This is PCIBIOS_MIN_IO, it defines what I/O port numbers can be dynamically assigned to PCI devices, but you should still map the entire 64K area per domain, including the first 4K that can be used for legacy ISA compatibility.
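The distinction made above, between the mapped 64K area and the port numbers that may be handed out dynamically, can be sketched in C as follows. The MIN_IO value of 0x1000 and the alloc_port() helper are assumptions for this note, standing in for PCIBIOS_MIN_IO and the kernel's real resource allocator, which works differently in detail.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_SIZE 0x10000u	/* the whole 64K area is mapped per domain */
#define MIN_IO      0x1000u	/* assumed stand-in for PCIBIOS_MIN_IO */

/*
 * Pick the first aligned port at or above max(hint, MIN_IO) where `size`
 * bytes still fit inside the 64K window. Returns -1 when nothing fits.
 * Ports below MIN_IO stay mapped but are never handed out dynamically.
 */
int64_t alloc_port(uint32_t hint, uint32_t size, uint32_t align)
{
	uint32_t start;

	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
	start = hint < MIN_IO ? MIN_IO : hint;
	start = (start + align - 1) & ~(align - 1);
	if (start > WINDOW_SIZE || size > WINDOW_SIZE - start)
		return -1;
	return start;
}
```

A request with hint 0 is pushed up to 0x1000 in this sketch, which would look from the outside as if the kernel did not accept low port numbers, even though the legacy range below MIN_IO is still part of the mapping.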
Arnd
On Mon, Dec 21, 2015 at 03:26:14PM +0000, Okaya@codeaurora.org wrote:
On Monday 21 December 2015, Tomasz Nowicki wrote:
On 21.12.2015 13:10, Lorenzo Pieralisi wrote:
On Fri, Dec 18, 2015 at 06:56:39PM +0000, okaya@codeaurora.org wrote:
I have multiple root ports with the same IO port configuration in the current ACPI table.
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
It is fine. You end up mapping for each of those a 4k window of the virtual address space allocated to IO and that's what you will have in the kernel PCI resources (not in the HW BARs though). If that was a problem it would be even for the current DT host controllers eg:
arch/arm64/boot/dts/apm/apm-storm.dtsi
it should not be (again I will let Arnd comment on this since he may be aware of issues encountered on other arches/platforms).
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
If above ranges are mapped into different CPU windows, then yes, it is fine.
Ideally, they should all be the same CPU address so we only have to map the window once, each device gets an address below 64K, and you can have legacy port numbers (below 4K) on any bus, which is required to make certain GPUs work.
I haven't actually seen anyone do that on ARM though, every implementation so far has a separate mapping per host bridge, and we can cope with that too, and we can live with either overlapping bus addresses or unique bus addresses, any of them can be expressed by the PCI core in Linux, we just have to make sure that we correctly translate the firmware tables into our internal structures.
Arnd
Thanks, I won't be touching the acpi tables then and I will assume the hack had a problem. It was trying to remap the io range of the second root port to the first port io address map.
I was getting a warning from resource.c
Btw, when I tested the io ranges before, kernel didn't accept anything below 1k like 0. That is why my range starts at 1k.
And that's what you should not do. ACPI tables have to have a 1:1 correspondence with HW resources and you must not change them to make the kernel (which version by the way, given that ARM64 ACPI PCI support is not in the mainline) work. I already said that: we must not interpret ACPI tables, we must define them according to specifications, and that's what everyone should follow to write FW.
So, why does your PCI IO range start at 1k ? Please define what you mean by "kernel didn't accept anything below 1k", I want to understand what you are referring to.
Thanks, Lorenzo
On 1/11/2016 10:39 AM, Lorenzo Pieralisi wrote:
Thanks, I won't be touching the acpi tables then and I will assume the hack had a problem. It was trying to remap the io range of the second root port to the first port io address map.
I was getting a warning from resource.c
Btw, when I tested the io ranges before, kernel didn't accept anything below 1k like 0. That is why my range starts at 1k.
And that's what you should not do. ACPI tables have to have a 1:1 correspondence with HW resources and you must not change them to make the kernel (which version by the way, given that ARM64 ACPI PCI support is not in the mainline) work. I already said that: we must not interpret ACPI tables, we must define them according to specifications, and that's what everyone should follow to write FW.
What confused me was the kernel view of IO addresses vs. PCI IO addresses. I looked at Mark Salter's patch and I realized that the kernel is maintaining global IO addresses with offsets to real PCI IO addresses.
I was expecting to see the PCI addresses to match kernel IO addresses. I wondered if somebody put some restriction into Linux or not which happened to me below.
# cat /proc/ioports
00000000-0000efff : PCI Bus 0000:00
  00001000-00001fff : PCI Bus 0000:01
0000f000-0001dfff : PCI Bus 0002:00
  0000f000-0000ffff : PCI Bus 0002:01
0001e000-0002cfff : PCI Bus 0006:00
  0001e000-0001efff : PCI Bus 0006:01
    0001e000-0001e01f : 0006:01:00.0
    0001e020-0001e03f : 0006:01:00.1
/ #
# dmesg | grep resource
[ 2.945762] pci_bus 0000:00: root bus resource [io 0x0000-0xefff window] (bus address [0x1000-0xffff])
[ 3.652201] pci_bus 0002:00: root bus resource [io 0xf000-0x1dfff window] (bus address [0x1000-0xffff])
[ 6.546716] pci_bus 0006:00: root bus resource [io 0x1e000-0x2cfff window] (bus address [0x1000-0xffff])
/ #
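The relation between the global Linux IO addresses and the per-bridge PCI IO addresses in the output above can be modeled with a short C sketch. It assumes windows are simply placed one after another starting at Linux port 0; struct io_window, window_alloc() and linux_port_to_bus() are hypothetical names and not the kernel's pci_address_to_pio() code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one host bridge I/O window. */
struct io_window {
	uint64_t linux_start;	/* first Linux port number of the window */
	uint64_t bus_start;	/* first PCI bus I/O address */
	uint64_t size;
};

/* Place a window of `size` bytes at the next free Linux port number. */
struct io_window window_alloc(uint64_t *next_free, uint64_t bus_start,
			      uint64_t size)
{
	struct io_window w;

	assert(size > 0);
	w.linux_start = *next_free;
	w.bus_start = bus_start;
	w.size = size;
	*next_free += size;
	return w;
}

/* Translate a Linux port number inside the window to its bus address. */
uint64_t linux_port_to_bus(const struct io_window *w, uint64_t port)
{
	assert(port >= w->linux_start && port - w->linux_start < w->size);
	return w->bus_start + (port - w->linux_start);
}
```

Allocating three windows of 0xF000 bytes, each with bus start 0x1000, reproduces the Linux ranges 0x0000-0xefff, 0xf000-0x1dfff and 0x1e000-0x2cfff from the log, and Linux port 0x1e020 then corresponds to bus address 0x1020 on domain 0006.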
Since we are talking about what ACPI dictates vs. what the kernel does, here is something that got me while testing.
Somebody sneaked in a 0x10003 upper limit on PCI addresses for some reason below. There is nothing magic about 0x10003 and I'm wondering why we have this limit.
static void acpi_dev_ioresource_flags(struct resource *res, u64 len,
				      u8 io_decode, u8 translation_type)
{
	res->flags = IORESOURCE_IO;

	if (!acpi_dev_resource_len_valid(res->start, res->end, len, true))
		res->flags |= IORESOURCE_DISABLED | IORESOURCE_UNSET;

	if (res->end >= 0x10003)
		res->flags |= IORESOURCE_DISABLED | IORESOURCE_UNSET;
I did a debug session with Tomasz last week. He fixed the issue. The ranges for IO resources were not being registered properly. The second root port was causing a bug check in the kernel because its IO range was overlapping with the first root port. The issue is fixed now.
So, why does your PCI IO range start at 1k ? Please define what you mean by "kernel didn't accept anything below 1k", I want to understand what you are referring to.
I created my own version of ACPI PCI root port patch by porting ia64 to ARMv7 two years ago and wrote the ACPI table on an ARMv7 platform. I have been reusing the same tables since then.
The issue was what Arnd described in his email to this thread. (PCIBIOS_MIN_IO) macro. I have tested the IO range starting from 0 on Tomasz's patches. I'm not seeing any problems.
On Monday 11 January 2016 10:56:30 Sinan Kaya wrote:
# dmesg | grep resource
[ 2.945762] pci_bus 0000:00: root bus resource [io 0x0000-0xefff window] (bus address [0x1000-0xffff])
[ 3.652201] pci_bus 0002:00: root bus resource [io 0xf000-0x1dfff window] (bus address [0x1000-0xffff])
[ 6.546716] pci_bus 0006:00: root bus resource [io 0x1e000-0x2cfff window] (bus address [0x1000-0xffff])
/ #
This is bad. We normally want to stay out of the first 0x1000 bytes of the Linux space, to prevent drivers from poking into the ISA registers.
We can have one of the buses be the "primary" bus that has its first 0x1000 bytes of I/O space mapped into the respective Linux addresses, but mapping the second 0x1000 bytes into the reserved space is the worst possible outcome here, as legacy ISA drivers will now poke at random other devices that are intentionally moved to high addresses to stay out of that range.
Since we are talking about what ACPI dictates vs. what the kernel does, here is something that got me while testing.
Somebody sneaked in a 0x10003 upper limit on PCI addresses for some reason below. There is nothing magic about 0x10003 and I'm wondering why we have this limit.
I/O ports are at aligned addresses, the highest 4-byte address you can access is 0xfffc with the default 0x10000 port limit per bus. This limit is generally seen as sufficient because that is what x86 has. Most PCI devices have no I/O ports at all, and the ones that have them have only a couple of bytes of address space in it.
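The alignment argument above can be checked with a few lines of C. The sketch assumes naturally aligned accesses of 1, 2 or 4 bytes and the 0x10000 per-bus port limit mentioned in the reply; both helper names are invented for this note.

```c
#include <assert.h>
#include <stdint.h>

#define IO_PORT_LIMIT 0x10000u	/* default per-bus port limit (as on x86) */

/* Highest naturally aligned port for an access of `width` bytes. */
uint32_t highest_aligned_port(uint32_t width)
{
	assert(width == 1 || width == 2 || width == 4);
	return IO_PORT_LIMIT - width;
}

/* True when the access is aligned and ends below the port limit. */
int port_access_ok(uint32_t port, uint32_t width)
{
	assert(width == 1 || width == 2 || width == 4);
	return (port % width) == 0 && port <= IO_PORT_LIMIT - width;
}
```

For a 4-byte access the highest usable port comes out as 0xfffc, so the last byte touched is 0xffff and nothing above the 64K boundary is needed.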
Arnd
On Tue, Jan 12, 2016 at 03:30:25PM +0100, Arnd Bergmann wrote:
On Monday 11 January 2016 10:56:30 Sinan Kaya wrote:
# dmesg | grep resource
[ 2.945762] pci_bus 0000:00: root bus resource [io 0x0000-0xefff window] (bus address [0x1000-0xffff])
[ 3.652201] pci_bus 0002:00: root bus resource [io 0xf000-0x1dfff window] (bus address [0x1000-0xffff])
[ 6.546716] pci_bus 0006:00: root bus resource [io 0x1e000-0x2cfff window] (bus address [0x1000-0xffff])
/ #
This is bad. We normally want to stay out of the first 0x1000 bytes of the Linux space, to prevent drivers from poking into the ISA registers.
You are referring to:
pci_bus 0000:00: root bus resource [io 0x0000-0xefff window]
                                       ^^^^^^
here, right ? [0x0 - PCIBIOS_MIN_IO] is not assigned by the PCI code that reassigns resources anyway, so devices with IO BARs won't get assigned [0x0 - PCIBIOS_MIN_IO] address space (Linux space).
Are you saying we should disallow the [0x0 - 0x1000] in the PCI busses IO resource (Linux space) ?
In pci_address_to_pio() the offset (Linux IO resource) we assign starts from 0x0, so we always allocate that chunk of IO address space (that is an offset into the Linux virtual address space), am I correct ?
We can have one of the buses be the "primary" bus that has its first 0x1000 bytes of I/O space mapped into the respective Linux addresses, but mapping the second 0x1000 bytes into the reserved space is the worst possible outcome here, as legacy ISA drivers will now poke at random other devices that are intentionally moved to high addresses to stay out of that range.
And you are referring to:
root bus resource [io 0x0000-0xefff window] (bus address [0x1000-0xffff])
                      ^^^^^^                              ^^^^^^
here ? If ISA drivers poke at addresses in the [0x0 - 0x1000] range (Linux space IO offset) they end up on the PCI bus with addresses above 0x1000, is that what you are saying when you refer to "moved to high addresses to stay out of that range" ?
Thanks, Lorenzo
On Tuesday 12 January 2016 18:38:54 Lorenzo Pieralisi wrote:
On Tue, Jan 12, 2016 at 03:30:25PM +0100, Arnd Bergmann wrote:
On Monday 11 January 2016 10:56:30 Sinan Kaya wrote:
# dmesg | grep resource
[ 2.945762] pci_bus 0000:00: root bus resource [io 0x0000-0xefff window] (bus address [0x1000-0xffff])
[ 3.652201] pci_bus 0002:00: root bus resource [io 0xf000-0x1dfff window] (bus address [0x1000-0xffff])
[ 6.546716] pci_bus 0006:00: root bus resource [io 0x1e000-0x2cfff window] (bus address [0x1000-0xffff])
/ #
This is bad. We normally want to stay out of the first 0x1000 bytes of the Linux space, to prevent drivers from poking into the ISA registers.
You are referring to:
pci_bus 0000:00: root bus resource [io 0x0000-0xefff window]
                                       ^^^^^^
here, right ? [0x0 - PCIBIOS_MIN_IO] is not assigned by the PCI code that reassigns resources anyway, so devices with IO BARs won't get assigned [0x0 - PCIBIOS_MIN_IO] address space (Linux space).
Are you saying we should disallow the [0x0 - 0x1000] in the PCI busses IO resource (Linux space) ?
In pci_address_to_pio() the offset (Linux IO resource) we assign starts from 0x0, so we always allocate that chunk of IO address space (that is an offset into the Linux virtual address space), am I correct ?
I think we can assign the address zero of the Linux I/O port range, but we should never assign it to a bus port range that does not also start at zero.
If we encounter a firmware description that has a bus range which excludes the first 1k, we should probably assign it to somewhere after 0x10000 (65536), so we can later assign a primary I/O space to a bus that has an ISA or LPC bridge with actual devices below 0x1000 (4096).
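The placement rule suggested above can be written down as a small C sketch. It is only one possible reading of the proposal, not existing kernel behavior: a window gets Linux port 0 only if its bus range also starts at zero, and everything else is placed after the first 64K. struct io_space and place_window() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PRIMARY_IO_SIZE 0x10000u	/* Linux ports reserved for a primary bus */

/* Hypothetical allocator state; next_free must start at PRIMARY_IO_SIZE. */
struct io_space {
	int primary_taken;	/* non-zero once Linux port 0 is in use */
	uint64_t next_free;	/* next free Linux port above the primary area */
};

/* Returns the Linux port number at which the window is placed. */
uint64_t place_window(struct io_space *s, uint64_t bus_start, uint64_t size)
{
	uint64_t linux_start;

	assert(size > 0 && s->next_free >= PRIMARY_IO_SIZE);

	/* Only a bus range that starts at zero may own Linux port zero. */
	if (bus_start == 0 && size <= PRIMARY_IO_SIZE && !s->primary_taken) {
		s->primary_taken = 1;
		return 0;
	}

	/* Everything else goes above the primary area. */
	linux_start = s->next_free;
	s->next_free += size;
	return linux_start;
}
```

With this rule a window whose bus range is 0x1000-0xffff would land at Linux port 0x10000 instead of 0x0, so a legacy ISA driver touching ports below 0x1000 could not reach devices on that bus by accident.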
We can have one of the buses be the "primary" bus that has its first 0x1000 bytes of I/O space mapped into the respective Linux addresses, but mapping the second 0x1000 bytes into the reserved space is the worst possible outcome here, as legacy ISA drivers will now poke at random other devices that are intentionally moved to high addresses to stay out of that range.
And you are referring to:
root bus resource [io 0x0000-0xefff window] (bus address [0x1000-0xffff])
                      ^^^^^^                              ^^^^^^
here ? If ISA drivers poke at addresses in the [0x0 - 0x1000] range (Linux space IO offset) they end up on the PCI bus with addresses above 0x1000, is that what you are saying when you refer to "moved to high addresses to stay out of that range" ?
Correct.
Arnd
Hi Arnd,
On Mon, Dec 21, 2015 at 03:15:54PM +0100, Arnd Bergmann wrote:
On Monday 21 December 2015, Tomasz Nowicki wrote:
On 21.12.2015 13:10, Lorenzo Pieralisi wrote:
On Fri, Dec 18, 2015 at 06:56:39PM +0000, okaya@codeaurora.org wrote:
I have multiple root ports with the same IO port configuration in the current ACPI table.
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
It is fine. You end up mapping for each of those a 4k window of the virtual address space allocated to IO and that's what you will have in the kernel PCI resources (not in the HW BARs though). If that was a problem it would be even for the current DT host controllers eg:
arch/arm64/boot/dts/apm/apm-storm.dtsi
it should not be (again I will let Arnd comment on this since he may be aware of issues encountered on other arches/platforms).
Root port 0 = IO range 0x1000-0x10FFF
Root port 1 = IO range 0x1000-0x10FFF
Root port 2 = IO range 0x1000-0x10FFF
If above ranges are mapped into different CPU windows, then yes, it is fine.
Ideally, they should all be the same CPU address so we only have to map the window once, each device gets an address below 64K, and you can have legacy port numbers (below 4K) on any bus, which is required to make certain GPUs work.
Can I ask you to elaborate on the above please ? Do you mean a single CPU physical address range mapping the whole PCI IO address space ? I did not quite get what you mean by "you can have legacy port numbers on any bus", I think it would be good to clarify so that we are all on the same page.
Thanks a lot ! Lorenzo