From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
On a platform with APEI (ACPI Platform Error Interface) enabled, firmware updates a memory region with hardware error records using the nocache attribute. When the OS reads the region, it maps the region with the cached attribute even though the EFI memory map defines this region as uncached, so the OS gets stale data and erroneously reports that there is no new HW error.
With this series, when the ghes driver maps the memory region, it uses the cache attribute given by the EFI memory map, provided the EFI memory map feature is enabled at runtime.
Since both arch/x86 and arch/ia64 implement the architecture-agnostic EFI memory map attribute lookup function efi_mem_attributes(), the code is moved from arch/x86 into the EFI subsystem and is declared as __weak; archs other than ia64 should not override the default implementation.
V6: 1. Implemented arch_apei_get_mem_attributes() for arm64 as inline function. 2. Rebased to efi-next-14364 of efi/next, pm+acpi-4.2-rc3 of linux-pm/master, arm64-upstream-13521 of arm64/master, next-20150720 of linux-next/master.
V5: 1. Rebased to next-20150713 of linux-next/master, efi-next-14359 of efi/next, pm+acpi-4.2-rc2 of linux-pm/master, arm64-fixes-1215 of arm64/master. 2. Added comment for efi_mem_attributes(), explained why it is marked as __weak at the function definition site.
V4: 1. Introduced arch_apei_get_mem_attributes() to allow arch specific implementation of getting pgprot_t appropriate for a physical address. 2. Implemented arch_apei_get_mem_attributes() for x86 and for arm64.
V3: 1. Rebased to v4.1-rc7. 2. Moved efi_mem_attributes() from arch/x86 to drivers/firmware/efi and declared it as __weak. 3. Introduced ARCH_APEI_PAGE_KERNEL_UC to allow arch specific page protection type for UC. 4. Removed efi_ioremap(). It can not be used for GHES memory region mapping purpose since ioremap can not be used in atomic context.
V2: 1. Rebased to v4.1-rc5. 2. Split removal of efi_mem_attributes() and creation of efi_ioremap() into two patches.
Jonathan (Zhixiong) Zhang (4):
  efi: x86: rearrange efi_mem_attributes()
  x86: acpi: implement arch_apei_get_mem_attributes()
  arm64: apei: implement arch_apei_get_mem_attributes()
  acpi, apei: use appropriate pgprot_t to map GHES memory

 arch/arm64/include/asm/acpi.h | 15 +++++++++++++++
 arch/x86/kernel/acpi/apei.c   | 10 ++++++++++
 arch/x86/platform/efi/efi.c   | 18 ------------------
 drivers/acpi/apei/ghes.c      |  6 ++++--
 drivers/firmware/efi/efi.c    | 31 +++++++++++++++++++++++++++++++
 include/acpi/apei.h           |  1 +
 6 files changed, 61 insertions(+), 20 deletions(-)
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
x86 and ia64 implement efi_mem_attributes() differently. This function needs to be available for other arch (such as arm64) as well, such as for the purpose of ACPI/APEI.
ia64 efi does not setup memmap variable and does not set EFI_MEMMAP flag, so it needs to have its unique implementation of efi_mem_attributes().
Move efi_mem_attributes() implementation from x86 to efi, and declare it with __weak. It is recommended that other archs should not override the default implementation.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
---
 arch/x86/platform/efi/efi.c | 18 ------------------
 drivers/firmware/efi/efi.c  | 31 +++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 18 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index dbc8627a5cdf..88b3ebaeb72f 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -917,24 +917,6 @@ u32 efi_mem_type(unsigned long phys_addr)
 	return 0;
 }

-u64 efi_mem_attributes(unsigned long phys_addr)
-{
-	efi_memory_desc_t *md;
-	void *p;
-
-	if (!efi_enabled(EFI_MEMMAP))
-		return 0;
-
-	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
-		md = p;
-		if ((md->phys_addr <= phys_addr) &&
-		    (phys_addr < (md->phys_addr +
-		    (md->num_pages << EFI_PAGE_SHIFT))))
-			return md->attribute;
-	}
-	return 0;
-}
-
 static int __init arch_parse_efi_cmdline(char *str)
 {
 	if (parse_option_str(str, "old_map"))
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 3061bb8629dc..bf4190a4f3f5 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -517,3 +517,34 @@ char * __init efi_md_typeattr_format(char *buf, size_t size,
 		 attr & EFI_MEMORY_UC ? "UC" : "");
 	return buf;
 }
+
+/*
+ * efi_mem_attributes - lookup memmap attributes for physical address
+ * @phys_addr: the physical address to lookup
+ *
+ * Search in the EFI memory map for the region covering
+ * @phys_addr. Returns the EFI memory attributes if the region
+ * was found in the memory map, 0 otherwise.
+ *
+ * Despite being marked __weak, most architectures should *not*
+ * override this function. It is __weak solely for the benefit
+ * of ia64 which has a funky EFI memory map that doesn't work
+ * the same way as other architectures.
+ */
+u64 __weak efi_mem_attributes(unsigned long phys_addr)
+{
+	efi_memory_desc_t *md;
+	void *p;
+
+	if (!efi_enabled(EFI_MEMMAP))
+		return 0;
+
+	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+		md = p;
+		if ((md->phys_addr <= phys_addr) &&
+		    (phys_addr < (md->phys_addr +
+		    (md->num_pages << EFI_PAGE_SHIFT))))
+			return md->attribute;
+	}
+	return 0;
+}
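For anyone who wants to poke at the lookup outside a kernel tree, here is a minimal user-space sketch of the same walk. It is not the kernel code: the descriptor struct is cut down to the three fields the loop reads, and every sketch_ name (struct sketch_md, struct sketch_memmap, sketch_mem_attributes(), the enabled flag standing in for efi_enabled(EFI_MEMMAP)) is made up for illustration. What it tries to show is the stride by desc_size rather than by sizeof(descriptor), and the half-open range check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EFI_PAGE_SHIFT 12      /* EFI pages are 4 KiB */
#define SKETCH_EFI_MEMORY_UC  0x1ULL  /* uncacheable attribute bit */
#define SKETCH_EFI_MEMORY_WB  0x8ULL  /* write-back attribute bit */

/* Hypothetical, cut-down descriptor: only the fields the lookup reads. */
struct sketch_md {
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/* Hypothetical stand-in for the kernel's memmap bookkeeping. */
struct sketch_memmap {
	const void *map;      /* first descriptor */
	const void *map_end;  /* one past the last descriptor */
	size_t desc_size;     /* stride between descriptors */
	int enabled;          /* models efi_enabled(EFI_MEMMAP) */
};

/* Return the attributes of the region covering phys_addr, 0 if none. */
static uint64_t sketch_mem_attributes(const struct sketch_memmap *mm,
				      uint64_t phys_addr)
{
	const char *p;

	assert(mm != NULL);
	if (!mm->enabled)
		return 0;

	/* Step by desc_size, not sizeof(struct sketch_md). */
	for (p = mm->map; p < (const char *)mm->map_end; p += mm->desc_size) {
		const struct sketch_md *md = (const struct sketch_md *)p;
		uint64_t end = md->phys_addr +
			       (md->num_pages << SKETCH_EFI_PAGE_SHIFT);

		/* Region is [phys_addr, end): the end address is excluded. */
		if (md->phys_addr <= phys_addr && phys_addr < end)
			return md->attribute;
	}
	return 0;
}
```

Note that a return value of 0 is ambiguous: it means either "no map available" or "address not covered", which is part of what the later review comments on the x86 patch are about.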
On Mon, 20 Jul, at 05:32:36PM, Jonathan (Zhixiong) Zhang wrote:
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
x86 and ia64 implement efi_mem_attributes() differently. This function needs to be available for other arch (such as arm64) as well, such as for the purpose of ACPI/APEI.
ia64 efi does not setup memmap variable and does not set EFI_MEMMAP flag, so it needs to have its unique implementation of efi_mem_attributes().
Move efi_mem_attributes() implementation from x86 to efi, and declare it with __weak. It is recommended that other archs should not override the default implementation.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
 arch/x86/platform/efi/efi.c | 18 ------------------
 drivers/firmware/efi/efi.c  | 31 +++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 18 deletions(-)
Reviewed-by: Matt Fleming matt.fleming@intel.com
Thank you, Matt, for your ack.
On 7/22/2015 4:11 AM, Matt Fleming wrote:
On Mon, 20 Jul, at 05:32:36PM, Jonathan (Zhixiong) Zhang wrote:
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
x86 and ia64 implement efi_mem_attributes() differently. This function needs to be available for other arch (such as arm64) as well, such as for the purpose of ACPI/APEI.
ia64 efi does not setup memmap variable and does not set EFI_MEMMAP flag, so it needs to have its unique implementation of efi_mem_attributes().
Move efi_mem_attributes() implementation from x86 to efi, and declare it with __weak. It is recommended that other archs should not override the default implementation.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
 arch/x86/platform/efi/efi.c | 18 ------------------
 drivers/firmware/efi/efi.c  | 31 +++++++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 18 deletions(-)
Reviewed-by: Matt Fleming matt.fleming@intel.com
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
... to allow arch specific implementation of getting page protection type associated with a physical address.
If the physical address has memory attributes defined by EFI memmap as EFI_MEMORY_UC, the page protection type is PAGE_KERNEL_NOCACHE. Otherwise, the page protection type is PAGE_KERNEL.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
---
 arch/x86/kernel/acpi/apei.c | 10 ++++++++++
 include/acpi/apei.h         |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
index c280df6b2aa2..9c6b3c8d81e4 100644
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -14,6 +14,8 @@

 #include <acpi/apei.h>

+#include <linux/efi.h>
+
 #include <asm/mce.h>
 #include <asm/tlbflush.h>

@@ -60,3 +62,11 @@ void arch_apei_flush_tlb_one(unsigned long addr)
 {
 	__flush_tlb_one(addr);
 }
+
+pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+	if (efi_mem_attributes(addr) & EFI_MEMORY_UC)
+		return PAGE_KERNEL_NOCACHE;
+
+	return PAGE_KERNEL;
+}
diff --git a/include/acpi/apei.h b/include/acpi/apei.h
index 284801ac7042..64a12ce9880b 100644
--- a/include/acpi/apei.h
+++ b/include/acpi/apei.h
@@ -46,6 +46,7 @@ int erst_clear(u64 record_id);
 int arch_apei_enable_cmcff(struct acpi_hest_header *hest_hdr, void *data);
 void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err);
 void arch_apei_flush_tlb_one(unsigned long addr);
+pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr);

 #endif
 #endif
On Mon, 20 Jul, at 05:32:37PM, Jonathan (Zhixiong) Zhang wrote:
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
... to allow arch specific implementation of getting page protection type associated with a physical address.
If the physical address has memory attributes defined by EFI memmap as EFI_MEMORY_UC, the page protection type is PAGE_KERNEL_NOCACHE. Otherwise, the page protection type is PAGE_KERNEL.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
 arch/x86/kernel/acpi/apei.c | 10 ++++++++++
 include/acpi/apei.h         |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
index c280df6b2aa2..9c6b3c8d81e4 100644
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -14,6 +14,8 @@

 #include <acpi/apei.h>

+#include <linux/efi.h>
+
 #include <asm/mce.h>
 #include <asm/tlbflush.h>

@@ -60,3 +62,11 @@ void arch_apei_flush_tlb_one(unsigned long addr)
 {
 	__flush_tlb_one(addr);
 }
+
+pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+	if (efi_mem_attributes(addr) & EFI_MEMORY_UC)
+		return PAGE_KERNEL_NOCACHE;
+
+	return PAGE_KERNEL;
+}
Like I mentioned before, this is theoretically racy because depending on when you call arch_apei_get_mem_attribute() during boot, you'll potentially return a different protection for the *same* memory region. This is because on x86 we discard the EFI memory map in efi_free_boot_services(), after which time efi_mem_attributes() will always return 0.
Now, hitting that race would depend on a number of things but most importantly it would require the region of RAM containing the Hardware Error data to have EFI_MEMORY_UC set in the EFI memmap. For x86 I think it's fair to say that's extremely unlikely given our cache coherency architecture.
Also, as Will noted for arm64, this really wants to be static inline. I'm still hoping the x86/ACPI folks will chime in on this patch.
For x86 we don't need to perform this lookup today for GHES so I would just always return PAGE_KERNEL but include a comment explaining that doing anything else is unneeded. Something like this?
---
static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
{
	/*
	 * We currently have no way to lookup the EFI memory map
	 * attributes for a region in a consistent way because the
	 * memmap is discarded after efi_free_boot_services(). So if
	 * you call efi_mem_attributes() during boot and at runtime,
	 * you could theoretically see different attributes.
	 *
	 * Since we've yet to see any x86 platforms that require
	 * anything other than PAGE_KERNEL (some arm64 platforms
	 * require the equivalent of PAGE_KERNEL_NOCACHE), return that
	 * until we know different.
	 */
	return PAGE_KERNEL;
}
Thank you Matt for the great feedback.
On 7/22/2015 5:09 AM, Matt Fleming wrote:
On Mon, 20 Jul, at 05:32:37PM, Jonathan (Zhixiong) Zhang wrote:
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
... to allow arch specific implementation of getting page protection type associated with a physical address.
If the physical address has memory attributes defined by EFI memmap as EFI_MEMORY_UC, the page protection type is PAGE_KERNEL_NOCACHE. Otherwise, the page protection type is PAGE_KERNEL.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
 arch/x86/kernel/acpi/apei.c | 10 ++++++++++
 include/acpi/apei.h         |  1 +
 2 files changed, 11 insertions(+)

diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
index c280df6b2aa2..9c6b3c8d81e4 100644
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -14,6 +14,8 @@

 #include <acpi/apei.h>

+#include <linux/efi.h>
+
 #include <asm/mce.h>
 #include <asm/tlbflush.h>

@@ -60,3 +62,11 @@ void arch_apei_flush_tlb_one(unsigned long addr)
 {
 	__flush_tlb_one(addr);
 }
+
+pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+	if (efi_mem_attributes(addr) & EFI_MEMORY_UC)
+		return PAGE_KERNEL_NOCACHE;
+
+	return PAGE_KERNEL;
+}
Like I mentioned before, this is theoretically racy because depending on when you call arch_apei_get_mem_attribute() during boot, you'll potentially return a different protection for the *same* memory region. This is because on x86 we discard the EFI memory map in efi_free_boot_services(), after which time efi_mem_attributes() will always return 0.
Now, hitting that race would depend on a number of things but most importantly it would require the region of RAM containing the Hardware Error data to have EFI_MEMORY_UC set in the EFI memmap. For x86 I think it's fair to say that's extremely unlikely given our cache coherency architecture.
Also, as Will noted for arm64, this really wants to be static inline.
Yes, will do.
I'm still hoping the x86/ACPI folks will chime in on this patch.
Same here.
For x86 we don't need to perform this lookup today for GHES so I would just always return PAGE_KERNEL but include a comment explaining that doing anything else is unneeded. Something like this?
The analysis and comments added make total sense to me. Will do so in V8 of the patch set after Will's feedback on v7.
static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
{
	/*
	 * We currently have no way to lookup the EFI memory map
	 * attributes for a region in a consistent way because the
	 * memmap is discarded after efi_free_boot_services(). So if
	 * you call efi_mem_attributes() during boot and at runtime,
	 * you could theoretically see different attributes.
	 *
	 * Since we've yet to see any x86 platforms that require
	 * anything other than PAGE_KERNEL (some arm64 platforms
	 * require the equivalent of PAGE_KERNEL_NOCACHE), return that
	 * until we know different.
	 */
	return PAGE_KERNEL;
}
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
If the physical address has memory attributes defined by EFI memmap as EFI_MEMORY_UC, the page protection type is PROT_DEVICE_nGnRE. Otherwise, the page protection type is PAGE_KERNEL.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
---
This patch applies cleanly to efi-next-14364 of efi/next and arm64-upstream-13521 of arm64/master, but needs a slight change to apply to next-20150720 of linux-next/master and pm+acpi-4.2-rc3 of linux-pm/master. The latter two branches have a newer arch/arm64/include/asm/acpi.h with the following patch:
b6cfb277378e ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro
---
 arch/arm64/include/asm/acpi.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 39248d3adf5d..42e4fd8aaf34 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -19,6 +19,11 @@
 #include <asm/psci.h>
 #include <asm/smp_plat.h>

+#ifdef CONFIG_ACPI_APEI
+#include <linux/efi.h>
+#include <asm/pgtable.h>
+#endif
+
 /* Basic configuration for ACPI */
 #ifdef CONFIG_ACPI
 /* ACPI table mapping after acpi_gbl_permanent_mmap is set */
@@ -84,4 +89,15 @@ static inline const char *acpi_get_enable_method(int cpu)
 {
 	return acpi_psci_present() ? "psci" : NULL;
 }
+
+#ifdef CONFIG_ACPI_APEI
+static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+	if (efi_mem_attributes(addr) & EFI_MEMORY_UC)
+		return PROT_DEVICE_nGnRE;
+
+	return PAGE_KERNEL;
+}
+#endif
+
 #endif /*_ASM_ACPI_H*/
On Tue, Jul 21, 2015 at 01:32:38AM +0100, Jonathan (Zhixiong) Zhang wrote:
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
If the physical address has memory attributes defined by EFI memmap as EFI_MEMORY_UC, the page protection type is PROT_DEVICE_nGnRE. Otherwise, the page protection type is PAGE_KERNEL.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
This patch applies cleanly to efi-next-14364 of efi/next and arm64-upstream-13521 of arm64/master, but needs a slight change to apply to next-20150720 of linux-next/master and pm+acpi-4.2-rc3 of linux-pm/master. The latter two branches have a newer arch/arm64/include/asm/acpi.h with the following patch: b6cfb277378e ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro
 arch/arm64/include/asm/acpi.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 39248d3adf5d..42e4fd8aaf34 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -19,6 +19,11 @@
 #include <asm/psci.h>
 #include <asm/smp_plat.h>

+#ifdef CONFIG_ACPI_APEI
+#include <linux/efi.h>
+#include <asm/pgtable.h>
+#endif
+
 /* Basic configuration for ACPI */
 #ifdef CONFIG_ACPI
 /* ACPI table mapping after acpi_gbl_permanent_mmap is set */
@@ -84,4 +89,15 @@ static inline const char *acpi_get_enable_method(int cpu)
 {
 	return acpi_psci_present() ? "psci" : NULL;
 }
+
+#ifdef CONFIG_ACPI_APEI
+static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+	if (efi_mem_attributes(addr) & EFI_MEMORY_UC)
+		return PROT_DEVICE_nGnRE;
The EFI spec says this should be nGnRnE afaict.
+	return PAGE_KERNEL;
What about WC and WT?
Will
Thanks Will. I will create a new patch in this patch set to supplement arm64's page protection type definitions accordingly to meet the needs as defined in UEFI 2.5 table 8. More comments inline...
On 7/21/2015 8:08 AM, Will Deacon wrote:
On Tue, Jul 21, 2015 at 01:32:38AM +0100, Jonathan (Zhixiong) Zhang wrote:
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
If the physical address has memory attributes defined by EFI memmap as EFI_MEMORY_UC, the page protection type is PROT_DEVICE_nGnRE. Otherwise, the page protection type is PAGE_KERNEL.
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
This patch applies cleanly to efi-next-14364 of efi/next and arm64-upstream-13521 of arm64/master, but needs a slight change to apply to next-20150720 of linux-next/master and pm+acpi-4.2-rc3 of linux-pm/master. The latter two branches have a newer arch/arm64/include/asm/acpi.h with the following patch: b6cfb277378e ACPI / ARM64: add BAD_MADT_GICC_ENTRY() macro
 arch/arm64/include/asm/acpi.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/arm64/include/asm/acpi.h b/arch/arm64/include/asm/acpi.h
index 39248d3adf5d..42e4fd8aaf34 100644
--- a/arch/arm64/include/asm/acpi.h
+++ b/arch/arm64/include/asm/acpi.h
@@ -19,6 +19,11 @@
 #include <asm/psci.h>
 #include <asm/smp_plat.h>

+#ifdef CONFIG_ACPI_APEI
+#include <linux/efi.h>
+#include <asm/pgtable.h>
+#endif
+
 /* Basic configuration for ACPI */
 #ifdef CONFIG_ACPI
 /* ACPI table mapping after acpi_gbl_permanent_mmap is set */
@@ -84,4 +89,15 @@ static inline const char *acpi_get_enable_method(int cpu)
 {
 	return acpi_psci_present() ? "psci" : NULL;
 }
+
+#ifdef CONFIG_ACPI_APEI
+static inline pgprot_t arch_apei_get_mem_attribute(phys_addr_t addr)
+{
+	if (efi_mem_attributes(addr) & EFI_MEMORY_UC)
+		return PROT_DEVICE_nGnRE;
The EFI spec says this should be nGnRnE afaict.
Yes, I will define PROT_DEVICE_nGnRnE.
+	return PAGE_KERNEL;
What about WC and WT?
For WC, PROT_NORMAL_NC will be returned. For WT, I will define PROT_NORMAL_WT. To enable that, I will also need to add MT_NORMAL_WT to MAIR_EL1.
Will
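The mapping discussed in this exchange (UC to nGnRnE, WC to normal non-cacheable, WT to normal write-through, otherwise PAGE_KERNEL) can be sketched in user space as below. This is only an illustration: the enum and the sketch_ names are hypothetical stand-ins for the arm64 pgprot values, and the order in which the bits are tested when a region advertises several attributes is an assumption of the sketch, not something this thread settles. The attribute bit values are the ones from the UEFI specification.

```c
#include <assert.h>
#include <stdint.h>

/* EFI memory attribute bits, values as defined by the UEFI specification. */
#define SKETCH_EFI_MEMORY_UC 0x1ULL
#define SKETCH_EFI_MEMORY_WC 0x2ULL
#define SKETCH_EFI_MEMORY_WT 0x4ULL
#define SKETCH_EFI_MEMORY_WB 0x8ULL

/* Hypothetical stand-ins for the arm64 page protection values. */
enum sketch_arm64_prot {
	SKETCH_PAGE_KERNEL,
	SKETCH_PROT_DEVICE_nGnRnE,
	SKETCH_PROT_NORMAL_NC,
	SKETCH_PROT_NORMAL_WT,
};

/*
 * Pick a protection from the EFI attribute mask.
 * Assumption: UC is tested first, then WC, then WT, mirroring the
 * "UC first, PAGE_KERNEL as fallback" shape of the posted patch.
 */
static enum sketch_arm64_prot sketch_arm64_prot_for(uint64_t efi_attr)
{
	if (efi_attr & SKETCH_EFI_MEMORY_UC)
		return SKETCH_PROT_DEVICE_nGnRnE;
	if (efi_attr & SKETCH_EFI_MEMORY_WC)
		return SKETCH_PROT_NORMAL_NC;
	if (efi_attr & SKETCH_EFI_MEMORY_WT)
		return SKETCH_PROT_NORMAL_WT;
	return SKETCH_PAGE_KERNEL;
}
```

If the attribute mask is treated as a list of capabilities rather than the current setting, a different precedence (for example preferring WB when it is advertised) would be just as easy to express; only the order of the tests changes.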
From: "Jonathan (Zhixiong) Zhang" zjzhang@codeaurora.org
With ACPI APEI firmware-first handling, the generic hardware error record is updated by firmware in the GHES memory region. When firmware updates the GHES memory region with an uncached access attribute while Linux maps it as cached, Linux reads stale data from the cache.
The GHES memory region should be mapped with the page protection type returned by arch_apei_get_mem_attribute(), instead of always with PAGE_KERNEL (i.e. cached attribute).
Signed-off-by: Jonathan (Zhixiong) Zhang zjzhang@codeaurora.org
---
 drivers/acpi/apei/ghes.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index b979b5dbe5bc..98609b404dae 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -173,8 +173,10 @@ static void __iomem *ghes_ioremap_pfn_irq(u64 pfn)
 	unsigned long vaddr;

 	vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr);
-	ioremap_page_range(vaddr, vaddr + PAGE_SIZE,
-			   pfn << PAGE_SHIFT, PAGE_KERNEL);
+	ioremap_page_range(vaddr,
+			   vaddr + PAGE_SIZE,
+			   pfn << PAGE_SHIFT,
+			   arch_apei_get_mem_attribute(pfn << PAGE_SHIFT));

 	return (void __iomem *)vaddr;
 }
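As a rough user-space illustration of what the hunk above changes: the page frame number is first turned into a physical address, and that address (not the pfn) is what the per-arch hook is asked about. Everything here is hypothetical scaffolding for the sketch: the 4 KiB page size, the sketch_attr_fn callback standing in for arch_apei_get_mem_attribute(), and the demo hook with its made-up uncached window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumes 4 KiB pages */

enum sketch_prot { SKETCH_PAGE_KERNEL, SKETCH_PROT_UNCACHED };

/* Hypothetical per-arch hook, standing in for arch_apei_get_mem_attribute(). */
typedef enum sketch_prot (*sketch_attr_fn)(uint64_t phys_addr);

/* What the mapping call would be given for one GHES page. */
struct sketch_mapping {
	uint64_t phys_addr;
	enum sketch_prot prot;
};

/* Convert the pfn to a physical address, then ask the hook about it. */
static struct sketch_mapping sketch_ghes_map_pfn(uint64_t pfn,
						 sketch_attr_fn get_attr)
{
	struct sketch_mapping m;

	assert(get_attr != NULL);
	m.phys_addr = pfn << SKETCH_PAGE_SHIFT;
	m.prot = get_attr(m.phys_addr);
	return m;
}

/* Demo hook: pretend a 64 KiB window at 0x80000000 is uncached. */
static enum sketch_prot sketch_demo_attr(uint64_t phys_addr)
{
	if (phys_addr >= 0x80000000ULL && phys_addr < 0x80010000ULL)
		return SKETCH_PROT_UNCACHED;
	return SKETCH_PAGE_KERNEL;
}
```

Before this patch the equivalent of m.prot was hard-wired to the cached protection, which is how the stale reads described in the commit message could happen on platforms where firmware writes the region uncached.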