Hi,
This is an interesting answer to my question, so I am sharing it with you.
Tomasz
-------- Original Message --------
Subject: Re: APEI question
Date: Thu, 24 Oct 2013 02:38:26 -0400
From: Chen, Gong gong.chen@linux.intel.com
To: Tomasz Nowicki tn@semihalf.com
On Wed, Oct 23, 2013 at 11:16:13AM +0200, Tomasz Nowicki wrote:
Date: Wed, 23 Oct 2013 11:16:13 +0200
From: Tomasz Nowicki tn@semihalf.com
To: chen gong Chen@gchen.bj.intel.com
CC: gong.chen@linux.intel.com
Subject: Re: APEI question
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.0
Let me put this a different way.
ghes_do_proc() processes errors that are reported via APEI. It handles the CPER_SEC_PLATFORM_MEM and CPER_SEC_PCIE error types. I wonder why this function does not take care of CPER_SEC_PROC_GENERIC.
Yes, the spec covers all kinds of situations, but reality is cruel. Even when FFM is enabled, not all errors can be covered by firmware. In practice, only memory/IO CE and IO UC errors can be handled well by firmware for now. For processor errors, I have in fact never seen so-called CE errors, only UC or even fatal errors. As a result, you will get an MCE. You can consider that firmware just *bypasses* this kind of error. In a way, that's why eMCA came along as a successor. That's all. Otherwise I will cross the red line ;-).
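For readers who have not looked at the function, the behaviour asked about above can be pictured as a dispatch on the section type of each error record. What follows is only a minimal user-space sketch under stated assumptions, not the kernel code: every sketch_* name is hypothetical, and an enum stands in for the section-type GUIDs that the kernel actually compares.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the CPER section types named in the thread.
 * The real code identifies sections by GUID; an enum keeps the sketch short. */
enum sketch_sec_type {
    SKETCH_SEC_PLATFORM_MEM,
    SKETCH_SEC_PCIE,
    SKETCH_SEC_PROC_GENERIC
};

/* Hypothetical outcomes of the dispatch. */
enum sketch_action {
    SKETCH_HANDLE_MEM,   /* memory error section: dedicated handling */
    SKETCH_HANDLE_PCIE,  /* PCIe error section: dedicated handling */
    SKETCH_NO_HANDLER    /* anything else: no dedicated handling */
};

/* Hypothetical, heavily reduced view of one section in an error record. */
struct sketch_section {
    enum sketch_sec_type type;
};

/* Pick an action for one section. Only the two section types mentioned in
 * the question get a dedicated branch; the processor-generic type falls
 * through to the default case. */
enum sketch_action sketch_dispatch(const struct sketch_section *sec)
{
    assert(sec != NULL);

    switch (sec->type) {
    case SKETCH_SEC_PLATFORM_MEM:
        return SKETCH_HANDLE_MEM;
    case SKETCH_SEC_PCIE:
        return SKETCH_HANDLE_PCIE;
    default:
        return SKETCH_NO_HANDLER;
    }
}

/* Count the sections in a record that have no dedicated handler. */
size_t sketch_count_unhandled(const struct sketch_section *secs, size_t n)
{
    size_t unhandled = 0;

    for (size_t i = 0; i < n; i++) {
        if (sketch_dispatch(&secs[i]) == SKETCH_NO_HANDLER)
            unhandled++;
    }
    return unhandled;
}
```

Calling sketch_dispatch() on a section of type SKETCH_SEC_PROC_GENERIC returns SKETCH_NO_HANDLER, which mirrors the gap the question points at: the section is seen, but nothing specific is done with it.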
Hi Tomasz,
The Linux code was written for x86. I'm not an x86 guy, but I can assume that ghes_do_proc() does not deal with CPER_SEC_PROC_GENERIC because for x86 it has no meaning. For x86 there are only memory and PEX errors. All other IOs are behind PEX. Anyway, ghes_do_proc() processes GHES errors, which means these are not standard errors, or the platform vendor decided to use FFM to run an errata fix in firmware. Otherwise, x86 architectural errors will be reported using APEI error types 0-2 and 6-8. I do not understand what Chen means when he refers to “processor errors”. Moreover, I think only SW can decide whether an uncorrectable error is fatal or not.
Thanks.
-----Original Message----- From: Tomasz Nowicki [mailto:tomasz.nowicki@linaro.org] Sent: Thursday, October 24, 2013 10:33 AM To: linaro-acpi Cc: Assaf Hoffman; Robert Richter; Al Stone Subject: Fwd: Re: APEI question