On Tue, Jan 04, 2022, Paolo Bonzini wrote:
On 12/29/21 14:13, Yang Zhong wrote:
Highly appreciate for your review. This version mostly addressed the comments from Sean. Most comments are adopted except three which are not closed and need more discussions:
- Move the entire xfd write emulation code to x86.c. Doing so requires introducing a new kvm_x86_ops callback to disable msr write bitmap. According to Paolo's earlier comment he prefers to handle it in vmx.c.
Yes, I do.
No objection, my comments were prior to seeing the patches that manipulated the bitmap, e.g. in the earlier patches, having anything in vmx.c is unnecessary.
- Directly check msr_bitmap in update_exception_bitmap() (for trapping #NM) and vcpu_enter_guest() (for syncing guest xfd after vm-exit) instead of introducing an extra flag in the last patch. However, doing so requires another new kvm_x86_ops callback for checking msr_bitmap since vcpu_enter_guest() is x86 common code. Having an extra flag sounds simpler here (at least for the initial AMX support). It does penalize nested guest with one xfd sync per exit, but it's not worse than a normal guest which initializes xfd but doesn't run AMX applications at all. Those could be improved afterwards.
The thing to do here would be to move MAX_POSSIBLE_PASSTHROUGH_MSRS/MAX_DIRECT_ACCESS_MSRS from VMX/SVM to core code. For now we can keep the flag.
- Disable #NM trap for nested guest. This version still chooses to always trap #NM (regardless in L1 or L2) as long as xfd write interception is disabled. In reality #NM is rare if nested guest doesn't intend to run AMX applications and always-trap is safer than dynamic trap for the basic support in case of any oversight here.
Sean was justifying this with lack of support for nested AMX, but I'm not sure actually what is missing at all. That is, an L1 hypervisor could expose AMX to L2, and then an L2->L0->L2 exit/reentry would have to trap #NM. Otherwise it would miss an XFD_ERR update.
Ya, I was assuming there was something L0 needed to do to supported nested AMX, but as Paolo pointed out there are no VMCS bits, so L0 just needs to correctly handle #NM and MSR interceptions according to vmcs12.