We found that the !is_zero_page() in kvm_is_mmio_pfn() was submmited in commit:85c8555ff0("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()"), but reverted in commit:bf4bea8e9a("kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()").
Maybe just adding !is_zero_page() to kvm_is_reserved_pfn() is too rough. According to commit:a78986aae9("KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved"), special handling in some other flows is also need by zero_page, if we would not treat zero_page as being reserved.
And we check the code of v4.9.y v4.10.y v4.11.y v4.12.y, this bug exists in v4.11.y and later, but not in v4.9.y v4.10.y or before. After commit:e86c59b1b1("mm/ksm: improve deduplication of zero pages with colouring"), ksm will use zero pages with colouring.
We use crash tools attaching to /proc/kcore to check the refcount of zero_page, then create and destroy vm. The refcount stays at 1 on v4.9.y, well it increases only after v4.11.y.
Fix commit:7df003c852("KVM: fix overflow of zero page refcount with ksm running")
Cc: stable@vger.kernel.org Signed-off-by: LinFeng linfeng23@huawei.com Signed-off-by: Zhuang Yanying ann.zhuangyanying@huawei.com --- Well, as fixing all functions reference to kvm_is_reserved_pfn() in this patch, we found that only kvm_release_pfn_clean() and kvm_get_pfn() don't need special handling.
So, we thought why not only check is_zero_page() in before get and put page, and revert our last commit:7df003c852("KVM: fix overflow of zero page refcount with ksm running"). Instead of adding !is_zero_page() in kvm_is_reserved_pfn(), new idea is as follow:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c 7f9ee2929cfe..f9a1f9cf188e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1695,7 +1695,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
void kvm_release_pfn_clean(kvm_pfn_t pfn) {
- if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
- if (!is_error_noslot_pfn(pfn) &&
put_page(pfn_to_page(pfn));(!kvm_is_reserved_pfn(pfn) || is_zero_pfn(pfn)))
} EXPORT_SYMBOL_GPL(kvm_release_pfn_clean); @@ -1734,7 +1735,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
void kvm_get_pfn(kvm_pfn_t pfn) {
- if (!kvm_is_reserved_pfn(pfn))
- if (!kvm_is_reserved_pfn(pfn) || is_zero_pfn(pfn)) get_page(pfn_to_page(pfn));
} EXPORT_SYMBOL_GPL(kvm_get_pfn);
We are confused why ZONE_DEVICE not do this, but treating it as no reserved. Is it racy if we only use the patch above, and revert our last commit:7df003c852("KVM: fix overflow of zero page refcount with ksm running"). --- arch/x86/kvm/mmu/mmu.c | 4 +++- virt/kvm/kvm_main.c | 8 +++++--- 2 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 87e9ba27ada1..c82c0dfd3a67 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -3285,7 +3285,8 @@ static int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn, if (unlikely(max_level == PT_PAGE_TABLE_LEVEL)) return PT_PAGE_TABLE_LEVEL;
- if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn)) + if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn) || + is_zero_pfn(pfn)) return PT_PAGE_TABLE_LEVEL;
slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, true); @@ -5914,6 +5915,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm, * mapping if the indirect sp has level = 1. */ if (sp->role.direct && !kvm_is_reserved_pfn(pfn) && + !is_zero_pfn(pfn) && (kvm_is_zone_device_pfn(pfn) || PageCompound(pfn_to_page(pfn)))) { pte_list_remove(rmap_head, sptep); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 70f03ce0e5c1..dff3b94e6270 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1800,7 +1800,7 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn) if (is_error_noslot_pfn(pfn)) return KVM_ERR_PTR_BAD_PAGE;
- if (kvm_is_reserved_pfn(pfn)) { + if (kvm_is_reserved_pfn(pfn) && !is_zero_pfn(pfn)) { WARN_ON(1); return KVM_ERR_PTR_BAD_PAGE; } @@ -2007,14 +2007,16 @@ EXPORT_SYMBOL_GPL(kvm_release_pfn_dirty);
void kvm_set_pfn_dirty(kvm_pfn_t pfn) { - if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) + if (!kvm_is_reserved_pfn(pfn) && + !kvm_is_zone_device_pfn(pfn) && !is_zero_pfn(pfn)) SetPageDirty(pfn_to_page(pfn)); } EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
void kvm_set_pfn_accessed(kvm_pfn_t pfn) { - if (!kvm_is_reserved_pfn(pfn) && !kvm_is_zone_device_pfn(pfn)) + if (!kvm_is_reserved_pfn(pfn) && + !kvm_is_zone_device_pfn(pfn) && !is_zero_pfn(pfn)) mark_page_accessed(pfn_to_page(pfn)); } EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);