The exiting-event identification can also have bit 13 set, indicating that a nested exception was encountered and caused the VM exit. When reinjecting the exception into the guest, KVM needs to set the "nested" bit, right? I suspect some changes to, e.g., handle_exception_nmi() are needed.
The current patch relies on kvm_multiple_exception() to do that. But TBH, I'm not sure it can recognize all nested cases. I probably should revisit it.
So the conclusion is that kvm_multiple_exception() is smart enough, and a VMM doesn't have to check bit 13 of the Exiting-event identification.
In FRED spec 5.0, section 9.2 - New VMX Feature: VMX Nested-Exception Support, there is a statement at the end of Exiting-event identification:
(The value of this bit is always identical to that of the valid bit of the original-event identification field.)
It means that even w/o VMX Nested-Exception support, a VMM already knows whether an exception is a nested exception encountered during delivery of another event in an exception-caused VM exit (exit reason 0). KVM does this by reading IDT_VECTORING_INFO_FIELD and calling vmx_complete_interrupts() immediately after VM exits.
vmx_complete_interrupts() simply queues the original exception if there is one, and later the nested exception causing the VM exit could be cancelled if it is a shadow page fault. However, if the shadow page fault is caused by a guest page fault, KVM injects it as a nested exception to have the guest fix its page table.
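To make the decoding concrete, here is a minimal user-space sketch of how the original-event identification (IDT_VECTORING_INFO_FIELD) can be picked apart. The bit layout (vector in bits 7:0, type in bits 10:8, valid in bit 31) is the one the SDM documents for this field; the macro and function names are made up for illustration and are not the ones KVM uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Original-event identification layout (illustrative names, not KVM's). */
#define ORIG_EVENT_VECTOR_MASK		0x000000ffu	/* bits 7:0 */
#define ORIG_EVENT_TYPE_MASK		0x00000700u	/* bits 10:8 */
#define ORIG_EVENT_VALID_MASK		0x80000000u	/* bit 31 */

/* Event types relevant to this thread. */
#define ORIG_EVENT_TYPE_EXT_INTR	0u
#define ORIG_EVENT_TYPE_NMI		2u
#define ORIG_EVENT_TYPE_HW_EXCEPTION	3u

/* True if the VM exit happened while another event was being delivered. */
static bool orig_event_valid(uint32_t info)
{
	return (info & ORIG_EVENT_VALID_MASK) != 0;
}

/* The type is only meaningful when the valid bit is set. */
static unsigned int orig_event_type(uint32_t info)
{
	assert(orig_event_valid(info));
	return (info & ORIG_EVENT_TYPE_MASK) >> 8;
}

/* The vector is only meaningful when the valid bit is set. */
static unsigned int orig_event_vector(uint32_t info)
{
	assert(orig_event_valid(info));
	return info & ORIG_EVENT_VECTOR_MASK;
}
```

For example, 0x80000b0e decodes as a valid hardware exception with vector 14 (#PF) that was being delivered when the VM exit occurred.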
I will add comments about this background in the next iteration.
Is it possible that the CPU encounters an exception that causes a VM exit while injecting an __interrupt__? In this case, no __exception__ will be (re-)queued by vmx_complete_interrupts().
I guess the following case is what you're suggesting: KVM injects an external interrupt after shadow page tables are nuked.
vmx_complete_interrupts() is called after each VM exit to clear both the interrupt and exception queues, which means it always pushes the deepest event if there is an original event. In the above case, the original event is the external interrupt KVM just tried to inject.
In my understanding, your point is:
1. If bit 13 of the exiting-event identification is set, the original-event identification field should be valid.
2. vmx_complete_interrupts() is done immediately after VM exits; it reads the original-event identification and reinjects the event there.
3. If KVM injects the exception in the exiting-event identification into the guest, KVM doesn't need to read bit 13, because kvm_multiple_exception() is "smart enough" to recognize the exception as a nested exception: if bit 13 is 1, one exception must have been queued in step 2.
My question is: what if the event in the original-event identification is an interrupt, e.g., an external interrupt or NMI, rather than an exception? vmx_complete_interrupts() won't queue an exception, so how can KVM or kvm_multiple_exception() know that the exception that caused the VM exit is a nested exception w/o reading bit 13 of the exiting-event identification?
The good news is that vmx_complete_interrupts() still queues the event even if it's not a hardware exception. It's just that kvm_multiple_exception() doesn't check whether there is an original interrupt or NMI, because IDT event delivery doesn't care about such a case.
I think your point is more that we should check it when FRED is enabled for a guest. Yes, architecturally we should do it.
What I want to emphasize is that bit 13 of the exiting-event identification is set to the valid bit of the original-event identification, so they are logically the same thing when FRED is enabled. It doesn't matter which one a VMM reads and uses. But a VMM doesn't need to differentiate FRED and IDT if it reads the info from the original-event identification.
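In other words, something along these lines. This is only a sketch under the assumption stated in the spec text quoted earlier (bit 13 mirrors the valid bit); exit_exception_is_nested() and the macro names are hypothetical, not existing KVM helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXIT_EVENT_NESTED_MASK	(1u << 13)	/* exiting-event identification, bit 13 */
#define ORIG_EVENT_VALID_MASK	(1u << 31)	/* original-event identification, valid bit */

/*
 * Hypothetical helper: report whether the exception that caused the VM exit
 * was encountered during delivery of another event, using only the
 * original-event identification. The same read works for FRED and IDT.
 */
static bool exit_exception_is_nested(uint32_t exit_event, uint32_t orig_event,
				     bool fred_enabled)
{
	bool orig_valid = (orig_event & ORIG_EVENT_VALID_MASK) != 0;

	/* With FRED, bit 13 is expected to mirror the valid bit. */
	if (fred_enabled)
		assert(((exit_event & EXIT_EVENT_NESTED_MASK) != 0) == orig_valid);

	return orig_valid;
}
```

The assert only documents the expected invariant for the FRED case; the return value never depends on bit 13.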