On Tue, Apr 07, 2020 at 09:48:02PM -0700, Andy Lutomirski wrote:
> I’m fine with the flow being different. do_machine_check() could have entirely different logic to decide the error in PV.
Nope, do_machine_check() is already as ugly as it gets. I don't want any more crap in it.
> But I think we should reuse the overall flow: kernel gets #MC with RIP pointing to the offending instruction. If there’s an extable entry that can handle memory failure, handle it. If it’s a user access, handle it. If it’s an unrecoverable error because it was a non-extable kernel access, oops or panic.
> The actual PV part could be extremely simple: the host just needs to tell the guest “this #MC is due to memory failure at this guest physical address”. No banks, no DIMM slot, no rendezvous crap (LMCE), no other nonsense. It would be nifty if the host also told the guest what the guest virtual address was if the host knows it.
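For reference, the flow proposed above could be sketched roughly as below. This is only an illustration: every name in it (pv_mce_info, pv_mce_ctx, pv_mce_classify and the enum values) is hypothetical and does not exist in the kernel, and the two booleans stand in for the real extable lookup and user-mode check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical payload: the only thing the host reports to the guest. */
struct pv_mce_info {
	uint64_t gpa;	/* guest physical address of the failed memory */
};

/* Possible outcomes of the proposed flow. */
enum pv_mce_action {
	PV_MCE_FIXUP,	/* an extable entry can handle the memory failure */
	PV_MCE_USER,	/* the failing access was a user access */
	PV_MCE_FATAL,	/* non-extable kernel access: oops or panic */
};

/* Hypothetical summary of the context that RIP pointed into. */
struct pv_mce_ctx {
	struct pv_mce_info info;
	bool has_mc_extable_entry;	/* stand-in for an extable lookup */
	bool user_access;		/* stand-in for a user-mode check */
};

/* Apply the checks in the order given in the quoted proposal. */
static enum pv_mce_action pv_mce_classify(const struct pv_mce_ctx *ctx)
{
	if (ctx->has_mc_extable_entry)
		return PV_MCE_FIXUP;
	if (ctx->user_access)
		return PV_MCE_USER;
	return PV_MCE_FATAL;
}
```

Note that nothing here touches banks, DIMM slots or LMCE state; the guest physical address is the only input from the host.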
It better be a whole different path and a whole different vector. If you wanna keep it simple and apart from all of the other nonsense, then you can just as well use a completely different vector.
Thx.