With this series we are able to inject a GHES error from userspace. The kernel can parse the error status block and print the error message in a more descriptive way:
[...]
[ 0.744715] GHES: APEI firmware first mode is enabled by APEI bit.
[ 0.744749] EINJ: Error INJection is initialized.
[...]
# echo 0x20 > /sys/kernel/debug/apei/einj/error_inject
[ 149.010380] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[ 149.017080] {1}[Hardware Error]: APEI generic hardware error status
[ 149.023217] {1}[Hardware Error]: severity: 1, fatal
[ 149.027998] {1}[Hardware Error]: section: 0, severity: 0, recoverable
[ 149.034317] {1}[Hardware Error]: flags: 0x00
[ 149.038501] {1}[Hardware Error]: section_type: memory error
Here 0x20 could be any error ID: because of the hack, the value is not interpreted. Please see the following patches for more explanation.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
---
 drivers/acpi/apei/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/Kconfig b/drivers/acpi/apei/Kconfig
index f0c1ce9..935d823 100644
--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -2,7 +2,7 @@ config ACPI_APEI
 	bool "ACPI Platform Error Interface (APEI)"
 	select MISC_FILESYSTEMS
 	select PSTORE
-	depends on X86
+	depends on X86 || ARM || ARM64
 	help
 	  APEI allows to report errors (for example from the chipset)
 	  to the operating system. This improves NMI handling
@@ -11,7 +11,7 @@ config ACPI_APEI

 config ACPI_APEI_GHES
 	bool "APEI Generic Hardware Error Source"
-	depends on ACPI_APEI && X86
+	depends on ACPI_APEI && (X86 || ARM || ARM64)
 	select ACPI_HED
 	select IRQ_WORK
 	select GENERIC_ALLOCATOR
Since HEST (hardware error sources table) can describe more than PCI specific errors, it needs to be moved out of acpi_pci_root_init.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
---
 drivers/acpi/pci_root.c | 2 --
 drivers/acpi/scan.c     | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index e427dc5..8e88431 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -574,8 +574,6 @@ static void acpi_pci_root_remove(struct acpi_device *device)

 void __init acpi_pci_root_init(void)
 {
-	acpi_hest_init();
-
 	if (!acpi_pci_disabled) {
 		pci_acpi_crs_quirks();
 		acpi_scan_add_handler(&pci_root_handler);
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index da13061..f732f1d 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -13,6 +13,7 @@
 #include <linux/nls.h>

 #include <acpi/acpi_drivers.h>
+#include <acpi/apei.h>

 #include "internal.h"

@@ -2036,6 +2037,7 @@ int __init acpi_scan_init(void)
 		printk(KERN_ERR PREFIX "Could not register bus type\n");
 	}

+	acpi_hest_init();
 #if defined(CONFIG_PCI)
 	acpi_pci_root_init();
 	acpi_pci_link_init();
On 06/19/2013 05:38 AM, Tomasz Nowicki wrote:
Since HEST (hardware error sources table) can describe more than PCI specific errors, it needs to be moved out of acpi_pci_root_init.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org
drivers/acpi/pci_root.c | 2 -- drivers/acpi/scan.c | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c index e427dc5..8e88431 100644 --- a/drivers/acpi/pci_root.c +++ b/drivers/acpi/pci_root.c @@ -574,8 +574,6 @@ static void acpi_pci_root_remove(struct acpi_device *device)
void __init acpi_pci_root_init(void) {
- acpi_hest_init();
- if (!acpi_pci_disabled) { pci_acpi_crs_quirks(); acpi_scan_add_handler(&pci_root_handler);
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index da13061..f732f1d 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -13,6 +13,7 @@ #include <linux/nls.h>
#include <acpi/acpi_drivers.h> +#include <acpi/apei.h>
#include "internal.h"
@@ -2036,6 +2037,7 @@ int __init acpi_scan_init(void) printk(KERN_ERR PREFIX "Could not register bus type\n"); }
+ acpi_hest_init(); #if defined(CONFIG_PCI) acpi_pci_root_init(); acpi_pci_link_init();
I'm not sure I know what the answer is, but should this call occur sooner? For example, would it make any sense to have acpi_boot_init() call acpi_hest_init()? Or is that just far too early to be of any use?
On 19.06.2013 23:45, Al Stone wrote:
I'm not sure I know what the answer is, but should this call occur sooner? For example, would it make any sense to have acpi_boot_init() call acpi_hest_init()? Or is that just far too early to be of any use?
Once we move acpi_hest_init() to acpi_boot_init(), it becomes an architecture-specific thing. Also, I don't see any other piece of code that could need it earlier. I suggest leaving it here. If parsing of the HEST table is required sooner, we will move it then. What do you say, Al?
Tomasz
Until now, __flush_tlb_one() was used when unmapping virtual memory, but that function is x86-specific. Replace it with the more generic flush_tlb_kernel_range().
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
---
 drivers/acpi/apei/ghes.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fcd7d91..0d83ac7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -193,7 +193,7 @@ static void ghes_iounmap_nmi(void __iomem *vaddr_ptr)

 	BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base));
 	unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
-	__flush_tlb_one(vaddr);
+	flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
 }

 static void ghes_iounmap_irq(void __iomem *vaddr_ptr)
@@ -203,7 +203,7 @@ static void ghes_iounmap_irq(void __iomem *vaddr_ptr)

 	BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base));
 	unmap_kernel_range_noflush(vaddr, PAGE_SIZE);
-	__flush_tlb_one(vaddr);
+	flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
 }

 static int ghes_estatus_pool_init(void)
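
A note on the arguments of the replacement call: flush_tlb_kernel_range() takes a start and an end address, so flushing the single page at vaddr means passing vaddr and vaddr + PAGE_SIZE. The userspace sketch below only models that range arithmetic. struct flush_range, one_page_range(), range_pages() and the 4 KiB MODEL_PAGE_SIZE are invented for this illustration and are not kernel interfaces.

```c
#include <stddef.h>

/* Assumption for the example: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical helper type: a half-open address range [start, end). */
struct flush_range {
	unsigned long start;
	unsigned long end;
};

/* Range that covers exactly the one page starting at vaddr. */
static struct flush_range one_page_range(unsigned long vaddr)
{
	struct flush_range r = { vaddr, vaddr + MODEL_PAGE_SIZE };

	return r;
}

/* Number of whole pages covered by a range. */
static unsigned long range_pages(struct flush_range r)
{
	return (r.end - r.start) / MODEL_PAGE_SIZE;
}
```

The point is only that the generic call describes a range, whereas __flush_tlb_one() took a single address.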
GHES sources that are supposed to be handled by NMI are specific to the x86 architecture; ARM does not have such an interrupt. It needs to be replaced by some equivalent.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org --- drivers/acpi/apei/ghes.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 0d83ac7..5d8c2f9 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -50,9 +50,13 @@ #include <linux/aer.h>
#include <acpi/ghes.h> +#ifdef CONFIG_X86 #include <asm/mce.h> +#endif #include <asm/tlbflush.h> +#ifdef CONFIG_X86 #include <asm/nmi.h> +#endif
#include "apei-internal.h"
@@ -87,7 +91,9 @@ bool ghes_disable; module_param_named(disable, ghes_disable, bool, 0);
+#ifdef CONFIG_X86 static int ghes_panic_timeout __read_mostly = 30; +#endif
/* * All error sources notified with SCI shares one notifier function, @@ -101,11 +107,13 @@ static LIST_HEAD(ghes_sci); static LIST_HEAD(ghes_nmi); static DEFINE_MUTEX(ghes_list_mutex);
+#ifdef CONFIG_X86 /* * NMI may be triggered on any CPU, so ghes_nmi_lock is used for * mutual exclusion. */ static DEFINE_RAW_SPINLOCK(ghes_nmi_lock); +#endif
/* * Because the memory area used to transfer hardware error information @@ -250,10 +258,12 @@ static int ghes_estatus_pool_expand(unsigned long len) return 0; }
+#ifdef CONFIG_X86 static void ghes_estatus_pool_shrink(unsigned long len) { ghes_estatus_pool_size_request -= PAGE_ALIGN(len); } +#endif
static struct ghes *ghes_new(struct acpi_hest_generic *generic) { @@ -749,6 +759,7 @@ static void ghes_proc_in_irq(struct irq_work *irq_work) } }
+#ifdef CONFIG_X86 static void ghes_print_queued_estatus(void) { struct llist_node *llnode; @@ -844,11 +855,13 @@ out: raw_spin_unlock(&ghes_nmi_lock); return ret; } +#endif
static struct notifier_block ghes_notifier_sci = { .notifier_call = ghes_notify_sci, };
+#ifdef CONFIG_X86 static unsigned long ghes_esource_prealloc_size( const struct acpi_hest_generic *generic) { @@ -863,12 +876,15 @@ static unsigned long ghes_esource_prealloc_size(
return prealloc_size; } +#endif
static int ghes_probe(struct platform_device *ghes_dev) { struct acpi_hest_generic *generic; struct ghes *ghes = NULL; +#ifdef CONFIG_X86 unsigned long len; +#endif int rc = -EINVAL;
generic = *(struct acpi_hest_generic **)ghes_dev->dev.platform_data; @@ -940,6 +956,7 @@ static int ghes_probe(struct platform_device *ghes_dev) mutex_unlock(&ghes_list_mutex); break; case ACPI_HEST_NOTIFY_NMI: +#ifdef CONFIG_X86 len = ghes_esource_prealloc_size(generic); ghes_estatus_pool_expand(len); mutex_lock(&ghes_list_mutex); @@ -948,6 +965,7 @@ static int ghes_probe(struct platform_device *ghes_dev) "ghes"); list_add_rcu(&ghes->list, &ghes_nmi); mutex_unlock(&ghes_list_mutex); +#endif break; default: BUG(); @@ -969,7 +987,9 @@ static int ghes_remove(struct platform_device *ghes_dev) { struct ghes *ghes; struct acpi_hest_generic *generic; +#ifdef CONFIG_X86 unsigned long len; +#endif
ghes = platform_get_drvdata(ghes_dev); generic = ghes->generic; @@ -990,6 +1010,7 @@ static int ghes_remove(struct platform_device *ghes_dev) mutex_unlock(&ghes_list_mutex); break; case ACPI_HEST_NOTIFY_NMI: +#ifdef CONFIG_X86 mutex_lock(&ghes_list_mutex); list_del_rcu(&ghes->list); if (list_empty(&ghes_nmi)) @@ -1002,6 +1023,7 @@ static int ghes_remove(struct platform_device *ghes_dev) synchronize_rcu(); len = ghes_esource_prealloc_size(generic); ghes_estatus_pool_shrink(len); +#endif break; default: BUG();
Platform-Wide OSPM Capabilities (_SB._OSC control method) describe which platform features are supported; in this case, the APEI bit.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org --- arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl | 16 ++++++++++++++++ arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl | 19 +++++++++++++++++++ 2 files changed, 35 insertions(+)
diff --git a/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl b/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl index 67b1b42..d385de8 100644 --- a/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl +++ b/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl @@ -51,5 +51,21 @@ DefinitionBlock ( Return (20000) } } + + Method (_OSC, 4, NotSerialized) + { + /* Platform-Wide OSPM Capabilities */ + If(LEqual(Arg0,ToUUID("0811B06E-4A27-44F9-8D60-3CBBC22E7B48"))) + { + /* APEI support unconditionally */ + Return (Arg3) + } Else { + Return (Buffer (0x10) + { + /* 0000 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + /* 0008 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 + }) + } + } } } diff --git a/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl b/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl index 7f06af0..7ab0727 100644 --- a/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl +++ b/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl @@ -19,4 +19,23 @@ DefinitionBlock ( Processor (CPU0, 0x01, 0x00000410, 0x06) {} Processor (CPU1, 0x02, 0x00000410, 0x06) {} } + + Scope (_SB) + { + Method (_OSC, 4, NotSerialized) + { + /* Platform-Wide OSPM Capabilities */ + If(LEqual(Arg0,ToUUID("0811B06E-4A27-44F9-8D60-3CBBC22E7B48"))) + { + /* APEI support unconditionally */ + Return (Arg3) + } Else { + Return (Buffer (0x10) + { + /* 0000 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + /* 0008 */ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 + }) + } + } + } }
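
For readers less familiar with _OSC, here is a small userspace model of what the ASL above does: for the platform-wide UUID it returns the capabilities buffer (Arg3) unchanged, and the OS then looks for the APEI bit in the support DWORD of the returned buffer. This is a sketch under assumptions, not the kernel code path. The model_* names are hypothetical, the buffer is reduced to two DWORDs, and the bit value 0x10 is taken to match OSC_SB_APEI_SUPPORT in Linux.

```c
#include <stdint.h>
#include <string.h>

/* Assumed to mirror OSC_SB_APEI_SUPPORT: bit 4 of the support DWORD. */
#define MODEL_OSC_SB_APEI_SUPPORT 0x00000010u

enum {
	MODEL_OSC_QUERY_DWORD = 0,	/* status / query flags */
	MODEL_OSC_SUPPORT_DWORD = 1,	/* supported features */
	MODEL_OSC_DWORDS = 2
};

/* Platform-Wide OSPM Capabilities UUID, as used in the ASL. */
static const char model_sb_uuid[] = "0811B06E-4A27-44F9-8D60-3CBBC22E7B48";

/*
 * Stand-in for the ASL method: echo the capabilities for the known
 * UUID ("APEI support unconditionally"), return zeros otherwise.
 */
static void model_osc(const char *uuid, const uint32_t *caps, uint32_t *ret)
{
	if (strcmp(uuid, model_sb_uuid) == 0)
		memcpy(ret, caps, MODEL_OSC_DWORDS * sizeof(*ret));
	else
		memset(ret, 0, MODEL_OSC_DWORDS * sizeof(*ret));
}

/* OS side: was the APEI bit acknowledged in the returned buffer? */
static int model_apei_granted(const uint32_t *ret)
{
	return (ret[MODEL_OSC_SUPPORT_DWORD] & MODEL_OSC_SB_APEI_SUPPORT) != 0;
}
```

Because the method simply returns Arg3, every bit the OS offers comes back set, which is why the ASL comment calls the APEI support unconditional.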
HED is meant to mediate between the occurrence of a hardware error and the SCI handler, so this device is needed to convey correctable hardware errors to the OS.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
---
 arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl | 6 ++++++
 arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl    | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl b/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl
index d385de8..5b5a9f4 100644
--- a/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl
+++ b/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl
@@ -67,5 +67,11 @@ DefinitionBlock (
                 })
             }
         }
+
+        Device (HED)
+        {
+            Name (_HID, EisaId ("PNP0C33"))
+            Name (_UID, 0x00)
+        }
     }
 }
diff --git a/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl b/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl
index 7ab0727..68f195c 100644
--- a/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl
+++ b/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl
@@ -37,5 +37,11 @@ DefinitionBlock (
                 })
             }
         }
+
+        Device (HED)
+        {
+            Name (_HID, EisaId ("PNP0C33"))
+            Name (_UID, 0x00)
+        }
     }
 }
There are a couple of reasons why this hack was applied:
 o lack of UEFI, thus no firmware region is reserved to exchange
   information about errors between firmware and OS
 o the EINJ table should operate on registers, but GPIO interrupts are
   available now

We set aside some pretend physical space (described identically in the tables) and fill it in with some error information, e.g. a memory error. Then it should be triggered. The easiest way to trigger a hardware error is to call an AML method directly, which in turn notifies the HED (hardware error device). Later on, the HED calls the SCI handler, which traverses the whole GHES list, matches the appropriate GHES and starts to parse.
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org --- arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl | 5 ++ arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl | 2 +- arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl | 5 ++ arch/arm64/boot/asl/foundation-v8.acpi/hest.asl | 2 +- drivers/acpi/apei/einj.c | 74 ++++++++++++++++++++ drivers/acpi/apei/ghes.c | 9 ++- include/acpi/ghes.h | 4 ++ 7 files changed, 98 insertions(+), 3 deletions(-)
diff --git a/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl b/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl index 5b5a9f4..84d6d51 100644 --- a/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl +++ b/arch/arm/boot/asl/exynos5250-arndale.acpi/dsdt.asl @@ -73,5 +73,10 @@ DefinitionBlock ( Name (_HID, EisaId ("PNP0C33")) Name (_UID, 0x00) } + + Method (TRIG, 0, NotSerialized) + { + Notify (HED, 0x80) + } } } diff --git a/arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl b/arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl index ad29f6f..4d66762 100644 --- a/arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl +++ b/arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl @@ -148,7 +148,7 @@ [0001] Bit Width : 40 [0001] Bit Offset : 00 [0001] Encoded Access Width : 04 [QWord Access:64] -[0008] Address : 0000000000000000 +[0008] Address : 0x42010000
[0028] Notify : [Hardware Error Notification Structure] [0001] Notify Type : 03 [SCI] diff --git a/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl b/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl index 68f195c..7f8595e 100644 --- a/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl +++ b/arch/arm64/boot/asl/foundation-v8.acpi/dsdt.asl @@ -43,5 +43,10 @@ DefinitionBlock ( Name (_HID, EisaId ("PNP0C33")) Name (_UID, 0x00) } + + Method (TRIG, 0, NotSerialized) + { + Notify (HED, 0x80) + } } } diff --git a/arch/arm64/boot/asl/foundation-v8.acpi/hest.asl b/arch/arm64/boot/asl/foundation-v8.acpi/hest.asl index f133704..56d8bc4 100644 --- a/arch/arm64/boot/asl/foundation-v8.acpi/hest.asl +++ b/arch/arm64/boot/asl/foundation-v8.acpi/hest.asl @@ -148,7 +148,7 @@ [0001] Bit Width : 40 [0001] Bit Offset : 00 [0001] Encoded Access Width : 04 [QWord Access:64] -[0008] Address : 0000000000000000 +[0008] Address : 0x88180000
[0028] Notify : [Hardware Error Notification Structure] [0001] Notify Type : 03 [SCI] diff --git a/drivers/acpi/apei/einj.c b/drivers/acpi/apei/einj.c index 8d457b5..5e8b593 100644 --- a/drivers/acpi/apei/einj.c +++ b/drivers/acpi/apei/einj.c @@ -34,6 +34,10 @@ #include <linux/delay.h> #include <acpi/acpi.h>
+#if defined (CONFIG_ARM) || defined (CONFIG_ARM64) +#include <acpi/ghes.h> +#endif + #include "apei-internal.h"
#define EINJ_PFX "EINJ: " @@ -143,6 +147,10 @@ static DEFINE_MUTEX(einj_mutex);
static void *einj_param;
+#if defined (CONFIG_ARM) || defined (CONFIG_ARM64) +extern struct list_head ghes_sci; +#endif + static void einj_exec_ctx_init(struct apei_exec_context *ctx) { apei_exec_ctx_init(ctx, einj_ins_type, ARRAY_SIZE(einj_ins_type), @@ -613,10 +621,72 @@ DEFINE_SIMPLE_ATTRIBUTE(error_type_fops, error_type_get,
 static int error_inject_set(void *data, u64 val)
 {
+#if defined (CONFIG_ARM) || defined (CONFIG_ARM64)
+	/*
+	 * Simulate error injection by calling AML control method directly.
+	 * We need this hack because of lack in GPIO functionality.
+	 *
+	 * Lets simulate platform memory error.
+	 */
+
+#define SIZE sizeof(u64) + \
+	sizeof(struct acpi_hest_generic_status) + \
+	sizeof(struct acpi_hest_generic_data) + \
+	sizeof (struct cper_sec_mem_err)
+
+	char buf[SIZE];
+	struct acpi_hest_generic_status *block_ptr;
+	struct acpi_hest_generic_data *gdata;
+	struct cper_sec_mem_err *mem_err;
+	struct ghes *ghes;
+	u64 paddr;
+	u64 *add_ptr;
+	int status;
+
+	list_for_each_entry_rcu(ghes, &ghes_sci, list) {
+		if (!ghes || !ghes->generic)
+			return ACPI_EINJ_FAILURE;
+
+		paddr = ghes->generic->error_status_address.address;
+
+		memset(buf, 0, SIZE);
+
+		/* First point to generic error status block */
+		add_ptr = (u64 *) buf;
+		*add_ptr = paddr + sizeof(u64);
+
+		/* Fill in generic error status block */
+		block_ptr = (struct acpi_hest_generic_status *) (++add_ptr);
+		block_ptr->block_status = 1;
+		block_ptr->data_length = sizeof(struct acpi_hest_generic_data);
+		block_ptr->error_severity = GHES_SEV_CORRECTED;
+
+		/* Fill in generic error data entry */
+		gdata = (struct acpi_hest_generic_data *) (block_ptr + 1);
+		memcpy(gdata->section_type, (void *) &CPER_SEC_PLATFORM_MEM,
+		       sizeof(uuid_le));
+		gdata->error_data_length = sizeof(struct cper_sec_mem_err);
+		block_ptr->data_length += gdata->error_data_length;
+
+		mem_err = (struct cper_sec_mem_err *) (gdata + 1);
+
+		/* Copy into the physical region */
+		ghes_copy_tofrom_phys(buf, paddr, SIZE, 0);
+
+		status = acpi_evaluate_object(NULL, "\\_SB.TRIG", NULL, NULL);
+		if (status != ACPI_EINJ_SUCCESS) {
+			pr_err("Failure during AML control method.\n");
+			break;
+		}
+	}
+
+	return status;
+#else
 	if (!error_type)
 		return -EINVAL;

 	return einj_error_inject(error_type, error_param1, error_param2);
+#endif
 }
DEFINE_SIMPLE_ATTRIBUTE(error_inject_fops, NULL, @@ -668,6 +738,7 @@ static int __init einj_init(void) einj_debug_dir = debugfs_create_dir("einj", apei_get_debugfs_dir()); if (!einj_debug_dir) goto err_cleanup; +#if !defined (CONFIG_ARM) && !defined (CONFIG_ARM64) fentry = debugfs_create_file("available_error_type", S_IRUSR, einj_debug_dir, NULL, &available_error_type_fops); @@ -677,11 +748,13 @@ static int __init einj_init(void) einj_debug_dir, NULL, &error_type_fops); if (!fentry) goto err_cleanup; +#endif fentry = debugfs_create_file("error_inject", S_IWUSR, einj_debug_dir, NULL, &error_inject_fops); if (!fentry) goto err_cleanup;
+#if !defined (CONFIG_ARM) && !defined (CONFIG_ARM64) apei_resources_init(&einj_resources); einj_exec_ctx_init(&ctx); rc = apei_exec_collect_resources(&ctx, &einj_resources); @@ -723,6 +796,7 @@ static int __init einj_init(void) if (!fentry) goto err_unmap; } +#endif
pr_info(EINJ_PFX "Error INJection is initialized.\n");
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 5d8c2f9..b59cd4f 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -103,7 +103,11 @@ static int ghes_panic_timeout __read_mostly = 30; * RCU is used for these lists, so ghes_list_mutex is only used for * list changing, not for traversing. */ +#if defined (CONFIG_ARM) || defined (CONFIG_ARM64) +LIST_HEAD(ghes_sci); +#else static LIST_HEAD(ghes_sci); +#endif static LIST_HEAD(ghes_nmi); static DEFINE_MUTEX(ghes_list_mutex);
@@ -324,7 +328,10 @@ static inline int ghes_severity(int severity) } }
-static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, +#if !defined (CONFIG_ARM) && !defined (CONFIG_ARM64) +static +#endif +void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, int from_phys) { void __iomem *vaddr; diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h index 720446c..22b8007 100644 --- a/include/acpi/ghes.h +++ b/include/acpi/ghes.h @@ -70,3 +70,7 @@ static inline void ghes_edac_unregister(struct ghes *ghes) { } #endif + +#if defined (CONFIG_ARM) || defined (CONFIG_ARM64) +void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, int from_phys); +#endif
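
To make the layout of the faked region in the EINJ hack easier to follow, here is a userspace sketch of the same buffer arrangement: an 8-byte slot that holds the address of the status block, then the generic error status block, then one generic error data entry, then the section payload. It is an illustration under assumptions only. The model_* structs are simplified stand-ins whose field lists are assumed to follow the ACPI generic error status and data entry formats, MODEL_PAYLOAD_LEN is a made-up placeholder for the size of the memory error section, and __attribute__((packed)) assumes GCC or Clang. The section type and severity fields are left zero here.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the generic error status block header. */
struct model_status {
	uint32_t block_status;
	uint32_t raw_data_offset;
	uint32_t raw_data_length;
	uint32_t data_length;
	uint32_t error_severity;
} __attribute__((packed));

/* Simplified stand-in for one generic error data entry header. */
struct model_data {
	uint8_t  section_type[16];
	uint32_t error_severity;
	uint16_t revision;
	uint8_t  validation_bits;
	uint8_t  flags;
	uint32_t error_data_length;
	uint8_t  fru_id[16];
	uint8_t  fru_text[20];
} __attribute__((packed));

/* Hypothetical payload size standing in for the memory error section. */
#define MODEL_PAYLOAD_LEN 73u

#define MODEL_BUF_LEN (sizeof(uint64_t) + sizeof(struct model_status) + \
		       sizeof(struct model_data) + MODEL_PAYLOAD_LEN)

/* Build the region image that would be copied to physical address paddr. */
static void model_fill(uint8_t *buf, uint64_t paddr)
{
	/* The status block sits right behind the 8-byte address slot. */
	uint64_t status_addr = paddr + sizeof(uint64_t);
	struct model_status st;
	struct model_data gd;

	memset(buf, 0, MODEL_BUF_LEN);
	memcpy(buf, &status_addr, sizeof(status_addr));

	memset(&st, 0, sizeof(st));
	st.block_status = 1;
	/* data_length covers the data entry header plus its payload. */
	st.data_length = (uint32_t)(sizeof(gd) + MODEL_PAYLOAD_LEN);
	memcpy(buf + sizeof(uint64_t), &st, sizeof(st));

	memset(&gd, 0, sizeof(gd));
	gd.error_data_length = MODEL_PAYLOAD_LEN;
	memcpy(buf + sizeof(uint64_t) + sizeof(st), &gd, sizeof(gd));
}
```

The address slot at the start is what lets the GHES code find the status block: it first reads a physical address from the location named in HEST and then reads the status block from that address.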
 o fix the number of Injection Entries in EINJ
 o leave only one generic hardware error source to get rid of error
   messages in dmesg; some of the sources were x86-specific or are not
   fully supported on ARM yet
Signed-off-by: Tomasz Nowicki tomasz.nowicki@linaro.org --- arch/arm/boot/asl/exynos5250-arndale.acpi/einj.asl | 2 +- arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl | 146 +------------------- arch/arm64/boot/asl/foundation-v8.acpi/einj.asl | 2 +- arch/arm64/boot/asl/foundation-v8.acpi/hest.asl | 146 +------------------- 4 files changed, 4 insertions(+), 292 deletions(-)
diff --git a/arch/arm/boot/asl/exynos5250-arndale.acpi/einj.asl b/arch/arm/boot/asl/exynos5250-arndale.acpi/einj.asl index 2127989..fa04b80 100644 --- a/arch/arm/boot/asl/exynos5250-arndale.acpi/einj.asl +++ b/arch/arm/boot/asl/exynos5250-arndale.acpi/einj.asl @@ -20,7 +20,7 @@ [0004] Injection Header Length : 00000030 [0001] Flags : 00 [0003] Reserved : 000000 -[0004] Injection Entry Count : 0000000A +[0004] Injection Entry Count : 00000008
[0001] Action : 00 [Begin Operation] [0001] Instruction : 00 [Read Register] diff --git a/arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl b/arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl index 4d66762..db275f5 100644 --- a/arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl +++ b/arch/arm/boot/asl/exynos5250-arndale.acpi/hest.asl @@ -17,122 +17,7 @@ [0004] Asl Compiler ID : "INTL" [0004] Asl Compiler Revision : 20100528
-[0004] Error Source Count : 00000004 - -[0002] Subtable Type : 0000 [IA-32 Machine Check Exception] -[0002] Source Id : 0000 -[0002] Reserved1 : 0000 -[0001] Flags (decoded below) : 00 - Firmware First : 0 -[0001] Enabled : 01 -[0004] Records To Preallocate : 00000001 -[0004] Max Sections Per Record : 00000001 -[0008] Global Capability Data : 0000000000000000 -[0008] Global Control Data : 0000000000000000 -[0001] Num Hardware Banks : 02 -[0007] Reserved2 : 00000000000000 - -[0001] Bank Number : 00 -[0001] Clear Status On Init : 00 -[0001] Status Format : 00 -[0001] Reserved : 00 -[0004] Control Register : 00000000 -[0008] Control Data : 0000000000000000 -[0004] Status Register : 00000000 -[0004] Address Register : 00000000 -[0004] Misc Register : 00000000 - -[0001] Bank Number : 01 -[0001] Clear Status On Init : 00 -[0001] Status Format : 00 -[0001] Reserved : 00 -[0004] Control Register : 00000000 -[0008] Control Data : 0000000000000000 -[0004] Status Register : 00000000 -[0004] Address Register : 00000000 -[0004] Misc Register : 00000000 - -[0002] Subtable Type : 0001 [IA-32 Corrected Machine Check] -[0002] Source Id : 0001 -[0002] Reserved1 : 0000 -[0001] Flags (decoded below) : 00 - Firmware First : 0 -[0001] Enabled : 01 -[0004] Records To Preallocate : 00000001 -[0004] Max Sections Per Record : 00000001 - -[0028] Notify : [Hardware Error Notification Structure] -[0001] Notify Type : 00 [Polled] -[0001] Notify Length : 00 -[0002] Configuration Write Enable : 0000 -[0004] PollInterval : 00000000 -[0004] Vector : 00000000 -[0004] Polling Threshold Value : 00000000 -[0004] Polling Threshold Window : 00000000 -[0004] Error Threshold Value : 00000000 -[0004] Error Threshold Window : 00000000 - -[0001] Num Hardware Banks : 02 -[0003] Reserved2 : 000000 - -[0001] Bank Number : 00 -[0001] Clear Status On Init : 00 -[0001] Status Format : 00 -[0001] Reserved : 00 -[0004] Control Register : 00000000 -[0008] Control Data : 0000000000000000 -[0004] Status Register : 
00000000 -[0004] Address Register : 00000000 -[0004] Misc Register : 00000000 - -[0001] Bank Number : 01 -[0001] Clear Status On Init : 00 -[0001] Status Format : 00 -[0001] Reserved : 00 -[0004] Control Register : 00000000 -[0008] Control Data : 0000000000000000 -[0004] Status Register : 00000000 -[0004] Address Register : 00000000 -[0004] Misc Register : 00000000 - -[0002] Subtable Type : 0007 [PCI Express AER (AER Endpoint)] -[0002] Source Id : 0000 -[0002] Reserved : 0000 -[0001] Flags (decoded below) : 00 - Firmware First : 0 -[0001] Enabled : 01 -[0004] Records To Preallocate : 00000001 -[0004] Max Sections Per Record : 00000001 -[0004] Bus : 00000000 -[0002] Device : 0000 -[0002] Function : 0000 -[0002] DeviceControl : 0000 -[0002] Reserved : 0000 -[0004] Uncorrectable Mask : 00000000 -[0004] Uncorrectable Severity : 00000000 -[0004] Correctable Mask : 00000000 -[0004] Advanced Capabilities : 00000000 - -[0002] Subtable Type : 0008 [PCI Express/PCI-X Bridge AER] -[0002] Source Id : 0000 -[0002] Reserved : 0000 -[0001] Flags (decoded below) : 00 - Firmware First : 0 -[0001] Enabled : 01 -[0004] Records To Preallocate : 00000001 -[0004] Max Sections Per Record : 00000001 -[0004] Bus : 00000000 -[0002] Device : 0000 -[0002] Function : 0000 -[0002] DeviceControl : 0000 -[0002] Reserved : 0000 -[0004] Uncorrectable Mask : 00000000 -[0004] Uncorrectable Severity : 00000000 -[0004] Correctable Mask : 00000000 -[0004] Advanced Capabilities : 00000000 -[0004] 2nd Uncorrectable Mask : 00000000 -[0004] 2nd Uncorrectable Severity : 00000000 -[0004] 2nd Advanced Capabilities : 00000000 +[0004] Error Source Count : 00000001
[0002] Subtable Type : 0009 [Generic Hardware Error Source] [0002] Source Id : 0002 @@ -162,32 +47,3 @@ [0004] Error Threshold Window : 00000000
[0004] Error Status Block Length : 00001000 - -[0002] Subtable Type : 0009 [Generic Hardware Error Source] -[0002] Source Id : 0003 -[0002] Related Source Id : 0000 -[0001] Reserved : 00 -[0001] Enabled : 01 -[0004] Records To Preallocate : 00000001 -[0004] Max Sections Per Record : 00000001 -[0004] Max Raw Data Length : 00001000 - -[0012] Error Status Address : [Generic Address Structure] -[0001] Space ID : 00 [SystemMemory] -[0001] Bit Width : 40 -[0001] Bit Offset : 00 -[0001] Encoded Access Width : 04 [QWord Access:64] -[0008] Address : 0000000000000000 - -[0028] Notify : [Hardware Error Notification Structure] -[0001] Notify Type : 04 [NMI] -[0001] Notify Length : 1C -[0002] Configuration Write Enable : 0000 -[0004] PollInterval : 00000000 -[0004] Vector : 00000000 -[0004] Polling Threshold Value : 00000000 -[0004] Polling Threshold Window : 00000000 -[0004] Error Threshold Value : 00000000 -[0004] Error Threshold Window : 00000000 - -[0004] Error Status Block Length : 00001000 diff --git a/arch/arm64/boot/asl/foundation-v8.acpi/einj.asl b/arch/arm64/boot/asl/foundation-v8.acpi/einj.asl index 449aea1..33930c6 100644 --- a/arch/arm64/boot/asl/foundation-v8.acpi/einj.asl +++ b/arch/arm64/boot/asl/foundation-v8.acpi/einj.asl @@ -20,7 +20,7 @@ [0004] Injection Header Length : 00000030 [0001] Flags : 00 [0003] Reserved : 000000 -[0004] Injection Entry Count : 0000000A +[0004] Injection Entry Count : 00000008
[0001] Action : 00 [Begin Operation] [0001] Instruction : 00 [Read Register] diff --git a/arch/arm64/boot/asl/foundation-v8.acpi/hest.asl b/arch/arm64/boot/asl/foundation-v8.acpi/hest.asl index 56d8bc4..37e93a1 100644 --- a/arch/arm64/boot/asl/foundation-v8.acpi/hest.asl +++ b/arch/arm64/boot/asl/foundation-v8.acpi/hest.asl @@ -17,122 +17,7 @@ [0004] Asl Compiler ID : "INTL" [0004] Asl Compiler Revision : 20100528
-[0004] Error Source Count : 00000004
-
-[0002] Subtable Type : 0000 [IA-32 Machine Check Exception]
-[0002] Source Id : 0000
-[0002] Reserved1 : 0000
-[0001] Flags (decoded below) : 00
- Firmware First : 0
-[0001] Enabled : 01
-[0004] Records To Preallocate : 00000001
-[0004] Max Sections Per Record : 00000001
-[0008] Global Capability Data : 0000000000000000
-[0008] Global Control Data : 0000000000000000
-[0001] Num Hardware Banks : 02
-[0007] Reserved2 : 00000000000000
-
-[0001] Bank Number : 00
-[0001] Clear Status On Init : 00
-[0001] Status Format : 00
-[0001] Reserved : 00
-[0004] Control Register : 00000000
-[0008] Control Data : 0000000000000000
-[0004] Status Register : 00000000
-[0004] Address Register : 00000000
-[0004] Misc Register : 00000000
-
-[0001] Bank Number : 01
-[0001] Clear Status On Init : 00
-[0001] Status Format : 00
-[0001] Reserved : 00
-[0004] Control Register : 00000000
-[0008] Control Data : 0000000000000000
-[0004] Status Register : 00000000
-[0004] Address Register : 00000000
-[0004] Misc Register : 00000000
-
-[0002] Subtable Type : 0001 [IA-32 Corrected Machine Check]
-[0002] Source Id : 0001
-[0002] Reserved1 : 0000
-[0001] Flags (decoded below) : 00
- Firmware First : 0
-[0001] Enabled : 01
-[0004] Records To Preallocate : 00000001
-[0004] Max Sections Per Record : 00000001
-
-[0028] Notify : [Hardware Error Notification Structure]
-[0001] Notify Type : 00 [Polled]
-[0001] Notify Length : 00
-[0002] Configuration Write Enable : 0000
-[0004] PollInterval : 00000000
-[0004] Vector : 00000000
-[0004] Polling Threshold Value : 00000000
-[0004] Polling Threshold Window : 00000000
-[0004] Error Threshold Value : 00000000
-[0004] Error Threshold Window : 00000000
-
-[0001] Num Hardware Banks : 02
-[0003] Reserved2 : 000000
-
-[0001] Bank Number : 00
-[0001] Clear Status On Init : 00
-[0001] Status Format : 00
-[0001] Reserved : 00
-[0004] Control Register : 00000000
-[0008] Control Data : 0000000000000000
-[0004] Status Register : 00000000
-[0004] Address Register : 00000000
-[0004] Misc Register : 00000000
-
-[0001] Bank Number : 01
-[0001] Clear Status On Init : 00
-[0001] Status Format : 00
-[0001] Reserved : 00
-[0004] Control Register : 00000000
-[0008] Control Data : 0000000000000000
-[0004] Status Register : 00000000
-[0004] Address Register : 00000000
-[0004] Misc Register : 00000000
-
-[0002] Subtable Type : 0007 [PCI Express AER (AER Endpoint)]
-[0002] Source Id : 0000
-[0002] Reserved : 0000
-[0001] Flags (decoded below) : 00
- Firmware First : 0
-[0001] Enabled : 01
-[0004] Records To Preallocate : 00000001
-[0004] Max Sections Per Record : 00000001
-[0004] Bus : 00000000
-[0002] Device : 0000
-[0002] Function : 0000
-[0002] DeviceControl : 0000
-[0002] Reserved : 0000
-[0004] Uncorrectable Mask : 00000000
-[0004] Uncorrectable Severity : 00000000
-[0004] Correctable Mask : 00000000
-[0004] Advanced Capabilities : 00000000
-
-[0002] Subtable Type : 0008 [PCI Express/PCI-X Bridge AER]
-[0002] Source Id : 0000
-[0002] Reserved : 0000
-[0001] Flags (decoded below) : 00
- Firmware First : 0
-[0001] Enabled : 01
-[0004] Records To Preallocate : 00000001
-[0004] Max Sections Per Record : 00000001
-[0004] Bus : 00000000
-[0002] Device : 0000
-[0002] Function : 0000
-[0002] DeviceControl : 0000
-[0002] Reserved : 0000
-[0004] Uncorrectable Mask : 00000000
-[0004] Uncorrectable Severity : 00000000
-[0004] Correctable Mask : 00000000
-[0004] Advanced Capabilities : 00000000
-[0004] 2nd Uncorrectable Mask : 00000000
-[0004] 2nd Uncorrectable Severity : 00000000
-[0004] 2nd Advanced Capabilities : 00000000
+[0004] Error Source Count : 00000001

 [0002] Subtable Type : 0009 [Generic Hardware Error Source]
 [0002] Source Id : 0002
@@ -162,32 +47,3 @@
 [0004] Error Threshold Window : 00000000

 [0004] Error Status Block Length : 00001000
-
-[0002] Subtable Type : 0009 [Generic Hardware Error Source]
-[0002] Source Id : 0003
-[0002] Related Source Id : 0000
-[0001] Reserved : 00
-[0001] Enabled : 01
-[0004] Records To Preallocate : 00000001
-[0004] Max Sections Per Record : 00000001
-[0004] Max Raw Data Length : 00001000
-
-[0012] Error Status Address : [Generic Address Structure]
-[0001] Space ID : 00 [SystemMemory]
-[0001] Bit Width : 40
-[0001] Bit Offset : 00
-[0001] Encoded Access Width : 04 [QWord Access:64]
-[0008] Address : 0000000000000000
-
-[0028] Notify : [Hardware Error Notification Structure]
-[0001] Notify Type : 04 [NMI]
-[0001] Notify Length : 1C
-[0002] Configuration Write Enable : 0000
-[0004] PollInterval : 00000000
-[0004] Vector : 00000000
-[0004] Polling Threshold Value : 00000000
-[0004] Polling Threshold Window : 00000000
-[0004] Error Threshold Value : 00000000
-[0004] Error Threshold Window : 00000000
-
-[0004] Error Status Block Length : 00001000
On 06/19/2013 05:38 AM, Tomasz Nowicki wrote:
We are able to inject GHES from userspace. Kernel can parse error status block and print error message in a more descriptive way:
[...]
[ 0.744715] GHES: APEI firmware first mode is enabled by APEI bit.
[ 0.744749] EINJ: Error INJection is initialized.
[...]
# echo 0x20 > /sys/kernel/debug/apei/einj/error_inject
[ 149.010380] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[ 149.017080] {1}[Hardware Error]: APEI generic hardware error status
[ 149.023217] {1}[Hardware Error]: severity: 1, fatal
[ 149.027998] {1}[Hardware Error]: section: 0, severity: 0, recoverable
[ 149.034317] {1}[Hardware Error]: flags: 0x00
[ 149.038501] {1}[Hardware Error]: section_type: memory error
where 0x20 can be random error ID because of the hack. Please see following patches for more explanation.
Nice patch set. Are there documents or pointers to documentation on the Linaro wiki describing how to use EINJ? If not, it would be nice to add something that can be pointed to as part of the demo at LCE.
On 19.06.2013 23:47, Al Stone wrote:
Nice patch set. Are there documents or pointers to documentation on the Linaro wiki describing how to use EINJ? If not, it would be nice to add something that can be pointed to as part of the demo at LCE.
Hi Al,
I have created a wiki page describing how to use EINJ and how it is built: https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/EINJ
I have also added a link to the EINJ page as a reference on: https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/RASandACPI
Please take a look; comments are welcome.
Regards, Tomasz
Tomasz,
congratulations, excellent work!
/Andrea
On 07/01/2013 08:53 AM, Tomasz Nowicki wrote:
Hi Al,
I have created a wiki page describing how to use EINJ and how it is built: https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/EINJ
I have also added a link to the EINJ page as a reference on: https://wiki.linaro.org/LEG/Engineering/Kernel/ACPI/RASandACPI
Please take a look; comments are welcome.
Regards, Tomasz
Very nice; I found one minor, minor typo but that was it. Thanks for writing this up!
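For anyone checking the injected-error output quoted earlier in this thread, here is a minimal sketch of a log parser. It is not part of the patch set. The function names and regular expressions are made up for illustration, and the severity table is an assumption based on the kernel's CPER severity names: only codes 0 and 1 actually appear in the quoted log, while 2 and 3 are filled in from memory of the CPER definitions.

```python
import re

# Assumed CPER severity codes -> names. Codes 0 and 1 match the quoted log;
# codes 2 and 3 are not shown in this thread and are an assumption.
CPER_SEVERITY = {0: "recoverable", 1: "fatal", 2: "corrected", 3: "info"}

# One "[Hardware Error]" line: timestamp, record sequence number, message.
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s*\{(?P<seq>\d+)\}\[Hardware Error\]:\s*(?P<msg>.*)"
)
SEVERITY_RE = re.compile(r"severity:\s*(\d+),\s*(\w+)")


def parse_hw_error_lines(log_text):
    """Return (timestamp, seq, message) for each Hardware Error line."""
    records = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m:
            records.append((float(m["ts"]), int(m["seq"]), m["msg"].strip()))
    return records


def severities(records):
    """Extract (code, name) pairs from messages carrying a severity field."""
    found = []
    for _ts, _seq, msg in records:
        m = SEVERITY_RE.search(msg)
        if m:
            found.append((int(m.group(1)), m.group(2)))
    return found


# Log lines copied from the injection example in this thread.
SAMPLE_LOG = """\
[ 149.010380] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
[ 149.017080] {1}[Hardware Error]: APEI generic hardware error status
[ 149.023217] {1}[Hardware Error]: severity: 1, fatal
[ 149.027998] {1}[Hardware Error]: section: 0, severity: 0, recoverable
[ 149.034317] {1}[Hardware Error]: flags: 0x00
[ 149.038501] {1}[Hardware Error]: section_type: memory error
"""

if __name__ == "__main__":
    recs = parse_hw_error_lines(SAMPLE_LOG)
    for code, name in severities(recs):
        status = "ok" if CPER_SEVERITY.get(code) == name else "mismatch"
        print(code, name, status)
```

Feeding it the output of dmesg after an injection should give a quick check that the numeric severity and the printed name agree for both the status block and each section.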