On Wed, Jul 06, 2022, Chao Peng wrote:
A page fault can carry the private/shared access information for a KVM_MEM_PRIVATE memslot; this can be filled in by architecture code (e.g. TDX code). To handle a page fault for such an access, KVM maps the page only when the private property of the access matches the host's view of the page.

For a successful match, the private pfn is obtained with the memfile_notifier callbacks from the private fd, and the shared pfn is obtained with the existing get_user_pages().

For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to userspace. Userspace can then convert the memory between private and shared from the host's view and retry the access.
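For reference, the userspace side of that flow might look like the following sketch, where convert_to_private()/convert_to_shared() are hypothetical helpers standing in for whatever conversion mechanism the VMM uses before re-entering the guest:

	/* Hypothetical VMM handler for the new exit; after converting, the
	 * VMM re-enters the guest via KVM_RUN and the access is retried. */
	static void handle_memory_fault_exit(struct kvm_run *run)
	{
		if (run->memory.flags & KVM_MEMORY_EXIT_FLAG_PRIVATE)
			convert_to_private(run->memory.gpa, run->memory.size);
		else
			convert_to_shared(run->memory.gpa, run->memory.size);
	}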
Co-developed-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
 arch/x86/kvm/mmu/mmu.c          | 60 ++++++++++++++++++++++++++++++++-
 arch/x86/kvm/mmu/mmu_internal.h | 18 ++++++++++
 arch/x86/kvm/mmu/mmutrace.h     |  1 +
 include/linux/kvm_host.h        | 35 ++++++++++++++++++-
 4 files changed, 112 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 545eb74305fe..27dbdd4fe8d1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3004,6 +3004,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
+	if (kvm_mem_is_private(kvm, gfn))
+		return max_level;
+
 	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
 	return min(host_level, max_level);
 }
 
@@ -4101,10 +4104,52 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
 }
 
+static inline u8 order_to_level(int order)
+{
+	enum pg_level level;
+
+	for (level = KVM_MAX_HUGEPAGE_LEVEL; level > PG_LEVEL_4K; level--)
Curly braces needed for the for-loop.
And I think it makes sense to take in fault->max_level; that way this is slightly more performant when the guest mapping is smaller than the host's, e.g.

	for (level = max_level; level > PG_LEVEL_4K; level--)
		...

	return level;
Though I think I'd vote to avoid a loop entirely and do:
	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);

	if (order > ???)
		return PG_LEVEL_1G;

	if (order > ???)
		return PG_LEVEL_2M;

	return PG_LEVEL_4K;
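(For illustration only, assuming "order" counts 4K pages, i.e. the same units as the patch's page_level_shift(level) - PAGE_SHIFT comparison, the ??? placeholders could plausibly be filled like so; the exact cutoffs are for the patch author to confirm.)

	/* Hypothetical instantiation; the thresholds assume "order" is in
	 * 4K-page units, matching page_level_shift(level) - PAGE_SHIFT. */
	if (order >= PUD_SHIFT - PAGE_SHIFT)	/* order 18: a 1G chunk */
		return PG_LEVEL_1G;
	if (order >= PMD_SHIFT - PAGE_SHIFT)	/* order 9: a 2M chunk */
		return PG_LEVEL_2M;

	return PG_LEVEL_4K;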
+		if (order >= page_level_shift(level) - PAGE_SHIFT)
+			return level;
+
+	return level;
+}
+
+static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+				   struct kvm_page_fault *fault)
+{
+	int order;
+	struct kvm_memory_slot *slot = fault->slot;
+	bool private_exist = kvm_mem_is_private(vcpu->kvm, fault->gfn);
+
+	if (fault->is_private != private_exist) {
+		vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+		if (fault->is_private)
+			vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
+		else
+			vcpu->run->memory.flags = 0;
+		vcpu->run->memory.padding = 0;
+		vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+		vcpu->run->memory.size = PAGE_SIZE;
+		return RET_PF_USER;
+	}
+
+	if (fault->is_private) {
+		if (kvm_private_mem_get_pfn(slot, fault->gfn, &fault->pfn, &order))
+			return RET_PF_RETRY;
+		fault->max_level = min(order_to_level(order), fault->max_level);
+		fault->map_writable = !(slot->flags & KVM_MEM_READONLY);
+		return RET_PF_FIXED;
+	}
+
+	/* Fault is shared, fallthrough. */
+	return RET_PF_CONTINUE;
+}
+
 static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_memory_slot *slot = fault->slot;
 	bool async;
+	int r;
 
 	/*
 	 * Retry the page fault if the gfn hit a memslot that is being deleted
@@ -4133,6 +4178,12 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		return RET_PF_EMULATE;
 	}
 
+	if (kvm_slot_can_be_private(slot)) {
+		r = kvm_faultin_pfn_private(vcpu, fault);
+		if (r != RET_PF_CONTINUE)
+			return r == RET_PF_FIXED ? RET_PF_CONTINUE : r;
I apologize if I've given you conflicting feedback in the past. Now that this returns RET_PF_* directly, I definitely think it makes sense to do:
	if (kvm_slot_can_be_private(slot) &&
	    fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
		vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
		if (fault->is_private)
			vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
		else
			vcpu->run->memory.flags = 0;
		vcpu->run->memory.padding = 0;
		vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
		vcpu->run->memory.size = PAGE_SIZE;
		return RET_PF_USER;
	}

	if (fault->is_private)
		return kvm_faultin_pfn_private(vcpu, fault);
That way kvm_faultin_pfn_private() only handles private faults, and this doesn't need to play games with RET_PF_FIXED.
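With that restructuring, kvm_faultin_pfn_private() reduces to something like this sketch (untested; presumably returning RET_PF_CONTINUE directly on success so the common path proceeds to map the pfn):

	static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
					   struct kvm_page_fault *fault)
	{
		struct kvm_memory_slot *slot = fault->slot;
		int order;

		if (kvm_private_mem_get_pfn(slot, fault->gfn, &fault->pfn, &order))
			return RET_PF_RETRY;

		fault->max_level = min(order_to_level(order), fault->max_level);
		fault->map_writable = !(slot->flags & KVM_MEM_READONLY);

		/* Success: fall through to the normal mapping path. */
		return RET_PF_CONTINUE;
	}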
+	}
+
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
 					  fault->write, &fault->map_writable,
@@ -4241,7 +4292,11 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		read_unlock(&vcpu->kvm->mmu_lock);
 	else
 		write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(fault->pfn);
+
+	if (fault->is_private)
+		kvm_private_mem_put_pfn(fault->slot, fault->pfn);
+	else
+		kvm_release_pfn_clean(fault->pfn);
AFAIK, we never bottomed out on whether or not this is needed[*]. Can you follow up with Kirill to get an answer before posting v8?
[*] https://lore.kernel.org/all/20220620141647.GC2016793@chaop.bj.intel.com