Hi,
I am able to see APEI events (memory errors) via the perf tool, and I am now switching to other types of errors. According to our plan, cache ECC errors are next.
I want to place tracepoints at the end of every available APEI error processing path (this is already done for memory errors). This way a RAS daemon could keep an eye on them. Of course, more types of errors will come from other ARM SoC peripherals.
The GHES driver unpacks the error data into a "Common Platform Error Record" and tries to obtain the error section (as defined in the UEFI spec). Cache ECC errors fall within the "Processor Error Sections". This is where the problems appear, since most of the "Common Platform Error Record" section definitions are x86/IA64 oriented.
So my proposals would be to:
1. Implement all available APEI errors as tracepoints, that is, processor and PCI errors.
2. In the meantime, create proposals for the UEFI spec to extend the "Common Platform Error Record" for the ARM architecture. This would require changing the existing structures and adding new sections for SoC peripherals (USB, Ethernet, etc.).
3. In turn, new sections for SoC peripherals in the UEFI spec would require an ACPI spec change as well, i.e. new error sources for the HEST table.
4. Once the above proposals are accepted, we could fill in tracepoints for the new error types.
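For step 1, the core of each processing path is matching a section descriptor's Section Type GUID and routing it to the right event. Below is a minimal userspace sketch of that dispatch, not the GHES code itself. The three GUID values are the UEFI CPER (Appendix N) section types for platform memory, generic processor and PCIe. The enum, the counter array and the functions classify_section()/dispatch_section() are hypothetical names for illustration only. The counter increment marks where a kernel version would call a tracepoint.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the 16-byte EFI_GUID layout used in CPER section descriptors. */
typedef struct {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} cper_guid;

static_assert(sizeof(cper_guid) == 16, "EFI_GUID must be 16 bytes");

/* Section type GUIDs from the UEFI CPER appendix. */
static const cper_guid SEC_PLATFORM_MEM = {0xA5BC1114, 0x6F64, 0x4EDE,
    {0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1}};
static const cper_guid SEC_PROC_GENERIC = {0x9876CCAD, 0x47B4, 0x4BDB,
    {0xB6, 0x5E, 0x16, 0xF1, 0x93, 0xC4, 0xF3, 0xDB}};
static const cper_guid SEC_PCIE = {0xD995E954, 0xBBC1, 0x430F,
    {0xAD, 0x91, 0xB4, 0x4D, 0xCB, 0x3C, 0x6F, 0x35}};

/* Hypothetical event categories, one per tracepoint we would emit. */
enum ras_event { RAS_EV_MEMORY, RAS_EV_PROCESSOR, RAS_EV_PCIE, RAS_EV_UNKNOWN };

/* Table mapping a known section type to its event category. */
static const struct {
    const cper_guid *guid;
    enum ras_event ev;
} section_map[] = {
    { &SEC_PLATFORM_MEM, RAS_EV_MEMORY },
    { &SEC_PROC_GENERIC, RAS_EV_PROCESSOR },
    { &SEC_PCIE,         RAS_EV_PCIE },
};

/* Hypothetical per-category counters standing in for trace_*() calls. */
static unsigned long ras_event_count[RAS_EV_UNKNOWN + 1];

/* Return the event category for a section type GUID. */
static enum ras_event classify_section(const cper_guid *section_type)
{
    size_t i;

    for (i = 0; i < sizeof(section_map) / sizeof(section_map[0]); i++)
        if (memcmp(section_type, section_map[i].guid, sizeof(cper_guid)) == 0)
            return section_map[i].ev;
    return RAS_EV_UNKNOWN;   /* e.g. a future ARM/SoC peripheral section */
}

/* Classify one section and record it; a kernel path would fire a tracepoint here. */
static enum ras_event dispatch_section(const cper_guid *section_type)
{
    enum ras_event ev = classify_section(section_type);

    ras_event_count[ev]++;
    return ev;
}
```

New section types proposed in steps 2 and 3 would then only need a new table entry plus a matching tracepoint; until then they land in RAS_EV_UNKNOWN rather than being silently dropped.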
I would appreciate any input on this.
Regards, Tomasz