On 1/18/2016 11:08 AM, Borislav Petkov wrote:
> On Mon, Jan 18, 2016 at 10:08:00AM -0500, Abdulhamid, Harb wrote:
>> Here is my crack at massaging the language a bit more: "Under
>> normal circumstances, when a hardware error occurs, the kernel gets
>> notified via an NMI, MCE or some other method. When the error has a
>> fatal severity or is unrecoverable, the kernel would normally
>> panic.
>
> So this is still not exact. It all depends on what the hardware
> does. Even more importantly, does the hardware even run the error
> handler and let it access MCA banks to find out about the error, or
> does it directly warm-reset the system?
>
> The error can happen, it is critical, *nothing* might be visible in
> the MCA registers (this is x86-specific) and the machine would reset.
> Only when you warm-reset, you may or may not see anything in there.
>
> In reading the BERT explanation in the ACPI spec, I have to say, it
> sounds pretty ok to me:
>
> "18.3.1 Boot Error Source
>
> Under normal circumstances, when a hardware error occurs, the error
> handler receives control and processes the error. This gives OSPM a
> chance to process the error condition, report it, and optionally
> attempt recovery. In some cases, the system is unable to process an
> error. For example, system firmware or a management controller may
> choose to reset the system or the system might experience an
> uncontrolled crash or reset. The boot error source is used to report
> unhandled errors that occurred in a previous boot. This mechanism is
> described in the BERT table."
>
> I think we should take that text. :)
Agreed. That would be best.
Apologies for the format of the last email.
--
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center,
Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
Linux Foundation Collaborative Project