The exiting-event identification field can also have bit 13 set, indicating that a nested exception was encountered and caused the VM exit. When reinjecting the exception into the guest, KVM needs to set the "nested" bit, right? I suspect some changes to, e.g., handle_exception_nmi() are needed.
The current patch relies on kvm_multiple_exception() to do that. But TBH, I'm not sure it can recognize all nested cases. I probably should revisit it.
So the conclusion is that kvm_multiple_exception() is smart enough, and a VMM doesn't have to check bit 13 of the Exiting-event identification.
In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception Support, there is a statement at the end of the Exiting-event identification description:
(The value of this bit is always identical to that of the valid bit of the original-event identification field.)
This means that even without VMX nested-exception support, a VMM already knows whether the exception in an exception-caused VM exit (exit reason 0) is a nested exception encountered during delivery of another event. KVM does this by reading IDT_VECTORING_INFO_FIELD and calling vmx_complete_interrupts() immediately after a VM exit.
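To make the relationship between the two fields concrete, here is a minimal standalone sketch, not KVM code. It assumes the usual VMX event-information layout from the Intel SDM (vector in bits 7:0, type in bits 10:8, valid in bit 31) plus the bit 13 discussed above, and it assumes the original-event identification is what IDT_VECTORING_INFO_FIELD returns. All EVT_* macros and evt_* helpers are names made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit positions assumed from the SDM layout and the thread. */
#define EVT_VECTOR_MASK  0x000000ffu   /* bits 7:0: vector */
#define EVT_TYPE_MASK    0x00000700u   /* bits 10:8: event type */
#define EVT_TYPE_SHIFT   8
#define EVT_NESTED_BIT   (1u << 13)    /* nested exception (exiting event) */
#define EVT_VALID_BIT    (1u << 31)    /* field holds a valid event */

#define EVT_TYPE_HW_EXC  3u            /* hardware exception */

static inline uint8_t evt_vector(uint32_t info)
{
	return (uint8_t)(info & EVT_VECTOR_MASK);
}

static inline uint8_t evt_type(uint32_t info)
{
	return (uint8_t)((info & EVT_TYPE_MASK) >> EVT_TYPE_SHIFT);
}

/* Bit 13 as reported in the exiting-event identification. */
static inline bool evt_nested_bit(uint32_t exiting_info)
{
	return (exiting_info & EVT_NESTED_BIT) != 0;
}

/*
 * What a VMM can derive without looking at bit 13: the exit happened
 * during delivery of another event iff the original-event field is valid.
 */
static inline bool evt_is_nested(uint32_t original_info)
{
	return (original_info & EVT_VALID_BIT) != 0;
}

/* Example: #PF (14) raised while #GP (13) was being delivered. */
void evt_self_check(void)
{
	uint32_t original = EVT_VALID_BIT |
			    (EVT_TYPE_HW_EXC << EVT_TYPE_SHIFT) | 13;
	uint32_t exiting = EVT_VALID_BIT | EVT_NESTED_BIT |
			   (EVT_TYPE_HW_EXC << EVT_TYPE_SHIFT) | 14;

	assert(evt_vector(exiting) == 14);
	assert(evt_nested_bit(exiting) == evt_is_nested(original));
}
```

If the quoted spec statement holds, evt_nested_bit() on the exiting event and evt_is_nested() on the original event always agree, which is why checking bit 13 separately adds no information.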
vmx_complete_interrupts() simply queues the original exception if there is one, and later the nested exception causing the VM exit could be cancelled if it is a shadow page fault. However, if the shadow page fault is caused by a guest page fault, KVM injects it as a nested exception to have the guest fix its page tables.
I will add comments about this background in the next iteration.
Is it possible that the CPU encounters an exception that causes a VM exit while injecting an __interrupt__? In this case, no __exception__ will be (re-)queued by vmx_complete_interrupts().
I guess the following case is what you're suggesting: KVM injects an external interrupt after shadow page tables are nuked.
vmx_complete_interrupts() is called after each VM exit to clear both the interrupt and exception queues, which means it always pushes the deepest event if there is an original event. In the above case, the original event is the external interrupt KVM just tried to inject.