On 1/18/2016 11:08 AM, Borislav Petkov wrote:
> On Mon, Jan 18, 2016 at 10:08:00AM -0500, Abdulhamid, Harb wrote:
>> Here is my crack at massaging the language a bit more: "Under
>> normal circumstances, when a hardware error occurs, the kernel gets
>> notified via an NMI, MCE or some other method. When the error has a
>> fatal severity or is unrecoverable, the kernel would normally
>> panic.
>
> So this is still not exact. It all depends on what the hardware
> does. Even more importantly, does the hardware even run the error
> handler and let it access MCA banks to find about the error or does
> it directly warm-reset the system.
>
> The error can happen, it is critical, *nothing* might be visible in
> the MCA registers (this is x86-specific) and the machine would reset.
> Only when you warm-reset, you may or may not see anything in there.
>
> In reading the BERT explanation in the ACPI spec, I have to say, it
> sounds pretty ok to me:
>
> "18.3.1 Boot Error Source
>
> Under normal circumstances, when a hardware error occurs, the error
> handler receives control and processes the error. This gives OSPM a
> chance to process the error condition, report it, and optionally
> attempt recovery. In some cases, the system is unable to process an
> error. For example, system firmware or a management controller may
> choose to reset the system or the system might experience an
> uncontrolled crash or reset. The boot error source is used to report
> unhandled errors that occurred in a previous boot. This mechanism is
> described in the BERT table."
>
> I think we should take that text. :)

Agreed. That would be best.

Apologies for the format of the last email.

--
Qualcomm Technologies, Inc. on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project