From: LinFeng <linfeng23@huawei.com>
We found that the !is_zero_pfn() check in kvm_is_mmio_pfn() was submitted in commit 90cff5a8cc ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()"), but reverted in commit 0ef2459983 ("kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()").
Maybe just adding !is_zero_pfn() to kvm_is_reserved_pfn() is too rough. According to commit e433e83bc3 ("KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved"), the zero page would also need special handling in some other flows if we stopped treating it as reserved.
While fixing up all the functions that reference kvm_is_reserved_pfn() in this patch, we found that only kvm_release_pfn_clean() and kvm_get_pfn() do not need special handling.
So we thought: why not check is_zero_pfn() only around getting and putting the page, and revert our last commit 31e813f38f ("KVM: fix overflow of zero page refcount with ksm running") in master? Instead of adding !is_zero_pfn() in kvm_is_reserved_pfn(), the new idea is as follows:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7f9ee2929cfe..f9a1f9cf188e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1695,7 +1695,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);
 
 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
-	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
+	if (!is_error_noslot_pfn(pfn) &&
+	    (!kvm_is_reserved_pfn(pfn) || is_zero_pfn(pfn)))
 		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
@@ -1734,7 +1735,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
 
 void kvm_get_pfn(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn))
+	if (!kvm_is_reserved_pfn(pfn) || is_zero_pfn(pfn))
 		get_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_get_pfn);
We are confused about why ZONE_DEVICE does not do this, and is instead treated as not reserved. Would it be racy if we applied only the patch in this cover letter, without the series patches?
We also checked the code of v4.9.y, v4.10.y, v4.11.y and v4.12.y: this bug exists in v4.11.y and later, but not in v4.9.y, v4.10.y or before. After commit e86c59b1b1 ("mm/ksm: improve deduplication of zero pages with colouring"), KSM uses coloured zero pages. That feature was added in v4.11, so I wonder why v4.9.y would have this bug.
We used the crash tool attached to /proc/kcore to check the refcount of the zero_page while creating and destroying VMs. The refcount stays at 1 on v4.9.y, but keeps increasing on v4.11.y and later. Are you sure it is the same bug you ran into? Is there something we are missing?
LinFeng (1):
  KVM: special handling of zero_page in some flows

Zhuang Yanying (1):
  KVM: fix overflow of zero page refcount with ksm running
 arch/x86/kvm/mmu.c  | 2 ++
 virt/kvm/kvm_main.c | 9 +++++----
 2 files changed, 7 insertions(+), 4 deletions(-)
We are testing virtual machines with KSM on the v5.4-rc2 kernel, and found a zero_page refcount overflow. The refcount is increased in try_async_pf() (get_user_pages) without being decreased in mmu_set_spte() while handling an EPT violation. In kvm_release_pfn_clean(), only an unreserved page gets a put_page() call; however, the zero page is reserved. So, as VMs are created and destroyed, the refcount of the zero page keeps increasing until it overflows.
step1:
  echo 10000 > /sys/kernel/mm/ksm/pages_to_scan
  echo 1 > /sys/kernel/mm/ksm/run
  echo 1 > /sys/kernel/mm/ksm/use_zero_pages
step2: create several normal QEMU KVM VMs, destroy them after 10 seconds, and repeat continuously.
After a long period of time, all domains hang because the refcount of the zero page overflows.
Qemu prints an error log as follows:

error: kvm run failed Bad address
EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd
ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4
EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f7070 00000037
IDT=     000f70ae 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04
Meanwhile, a kernel warning is reported:
[40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30
[40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G OE 5.2.0-rc2 #5
[40914.836415] RIP: 0010:try_get_page+0x1f/0x30
[40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8
[40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286
[40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000
[40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440
[40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000
[40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8
[40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440
[40914.836423] FS:  00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000
[40914.836425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0
[40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[40914.836428] Call Trace:
[40914.836433]  follow_page_pte+0x302/0x47b
[40914.836437]  __get_user_pages+0xf1/0x7d0
[40914.836441]  ? irq_work_queue+0x9/0x70
[40914.836443]  get_user_pages_unlocked+0x13f/0x1e0
[40914.836469]  __gfn_to_pfn_memslot+0x10e/0x400 [kvm]
[40914.836486]  try_async_pf+0x87/0x240 [kvm]
[40914.836503]  tdp_page_fault+0x139/0x270 [kvm]
[40914.836523]  kvm_mmu_page_fault+0x76/0x5e0 [kvm]
[40914.836588]  vcpu_enter_guest+0xb45/0x1570 [kvm]
[40914.836632]  kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm]
[40914.836645]  kvm_vcpu_ioctl+0x26e/0x5d0 [kvm]
[40914.836650]  do_vfs_ioctl+0xa9/0x620
[40914.836653]  ksys_ioctl+0x60/0x90
[40914.836654]  __x64_sys_ioctl+0x16/0x20
[40914.836658]  do_syscall_64+0x5b/0x180
[40914.836664]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[40914.836666] RIP: 0033:0x7fb61cb6bfc7
Signed-off-by: LinFeng <linfeng23@huawei.com>
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
---
 virt/kvm/kvm_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7e80f62f034c..7f7c22a687c0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -133,7 +133,8 @@ static bool largepages_enabled = true;
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
 	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+		return PageReserved(pfn_to_page(pfn)) &&
+		       !is_zero_pfn(pfn);
 
 	return true;
 }
On Wed, Apr 01, 2020 at 08:50:55PM +0800, Zhuang Yanying wrote:
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
From: LinFeng <linfeng23@huawei.com>
Just adding !is_zero_pfn() to kvm_is_reserved_pfn() is too rough. According to commit e433e83bc3 ("KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved"), the zero page also needs special handling in some other flows once we no longer treat it as reserved.
Signed-off-by: LinFeng <linfeng23@huawei.com>
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
---
 arch/x86/kvm/mmu.c  | 2 ++
 virt/kvm/kvm_main.c | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 732c0270a489..e8db17d87c1a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2961,6 +2961,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	 * here.
 	 */
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
+	    !is_zero_pfn(pfn) &&
 	    level == PT_PAGE_TABLE_LEVEL &&
 	    PageTransCompoundMap(pfn_to_page(pfn)) &&
 	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
@@ -5010,6 +5011,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 	 */
 	if (sp->role.direct &&
 	    !kvm_is_reserved_pfn(pfn) &&
+	    !is_zero_pfn(pfn) &&
 	    PageTransCompoundMap(pfn_to_page(pfn))) {
 		drop_spte(kvm, sptep);
 		need_tlb_flush = 1;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7f7c22a687c0..fc0446bb393b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1660,7 +1660,7 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
 	if (is_error_noslot_pfn(pfn))
 		return KVM_ERR_PTR_BAD_PAGE;
 
-	if (kvm_is_reserved_pfn(pfn)) {
+	if (kvm_is_reserved_pfn(pfn) && is_zero_pfn(pfn)) {
 		WARN_ON(1);
 		return KVM_ERR_PTR_BAD_PAGE;
 	}
@@ -1719,7 +1719,7 @@ static void kvm_release_pfn_dirty(kvm_pfn_t pfn)
 
 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn)) {
+	if (!kvm_is_reserved_pfn(pfn) && !is_zero_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);
 
 		if (!PageReserved(page))
@@ -1730,7 +1730,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);
 
 void kvm_set_pfn_accessed(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn))
+	if (!kvm_is_reserved_pfn(pfn) && !is_zero_pfn(pfn))
 		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
On Wed, Apr 01, 2020 at 08:50:56PM +0800, Zhuang Yanying wrote:
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
On Wed, Apr 01, 2020 at 08:50:54PM +0800, Zhuang Yanying wrote:
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
linux-stable-mirror@lists.linaro.org