Hello,
the following one-line commit

commit 7df003c85218b5f5b10a7f6418208f31e813f38f
Author: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Date:   Sat Oct 12 11:37:31 2019 +0800

    KVM: fix overflow of zero page refcount with ksm running

applies cleanly to v5.5.y, v5.4.y, v4.19.y, v4.14.y, and v4.9.y.
I actually ran into that bug on v4.9.y.
Thanks in advance,
Thomas
On 26/03/20 18:43, Thomas Voegtle wrote:
Hello,
the following one-line commit

commit 7df003c85218b5f5b10a7f6418208f31e813f38f
Author: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Date:   Sat Oct 12 11:37:31 2019 +0800

    KVM: fix overflow of zero page refcount with ksm running

applies cleanly to v5.5.y, v5.4.y, v4.19.y, v4.14.y, and v4.9.y.
I actually ran into that bug on v4.9.y.
Thanks in advance,
Thomas
Yes, indeed. It's not a trivial backport though, so I prefer to do it manually. I can help with that, or with reviews if Yanying already has patches ready.
Thanks,
Paolo
On Thu, 26 Mar 2020, Paolo Bonzini wrote:
On 26/03/20 18:43, Thomas Voegtle wrote:
Hello,
the following one-line commit

commit 7df003c85218b5f5b10a7f6418208f31e813f38f
Author: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Date:   Sat Oct 12 11:37:31 2019 +0800

    KVM: fix overflow of zero page refcount with ksm running

applies cleanly to v5.5.y, v5.4.y, v4.19.y, v4.14.y, and v4.9.y.
I actually ran into that bug on v4.9.y.
Thanks in advance,
Thomas
Yes, indeed. It's not a trivial backport though, so I prefer to do it manually. I can help with that, or with reviews if Yanying already has patches ready.
Are you sure we are talking about the same commit? Or am I really seeing something wrong here? It's a one-liner. It applies cleanly and compiles. I don't know if it fixes the bug, though.
Thomas
On 26/03/20 21:08, Thomas Voegtle wrote:
On Thu, 26 Mar 2020, Paolo Bonzini wrote:
On 26/03/20 18:43, Thomas Voegtle wrote:
Hello,
the following one-line commit

commit 7df003c85218b5f5b10a7f6418208f31e813f38f
Author: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Date:   Sat Oct 12 11:37:31 2019 +0800

    KVM: fix overflow of zero page refcount with ksm running

applies cleanly to v5.5.y, v5.4.y, v4.19.y, v4.14.y, and v4.9.y.
I actually ran into that bug on v4.9.y.
Thanks in advance,
Thomas
Yes, indeed. It's not a trivial backport though, so I prefer to do it manually. I can help with that, or with reviews if Yanying already has patches ready.
Are you sure we are talking about the same commit? Or am I really seeing something wrong here? It's a one-liner. It applies cleanly and compiles. I don't know if it fixes the bug, though.
It's a one-liner, but the usage of the function it touches has changed subtly over the years. So before committing it to stable I need to check that all uses of kvm_is_reserved_pfn() match 5.6 (and kvm_is_zone_device_pfn() is probably not there in older kernels, though I might be missing more stable backports there).
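For reference, with the fix applied the helper ends up looking roughly like this (reconstructed from the diff reposted later in this thread; individual stable trees may differ):

bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
{
	if (pfn_valid(pfn))
		/*
		 * The zero page is PG_reserved but is refcounted like an
		 * ordinary page, so it must not be reported as reserved.
		 */
		return PageReserved(pfn_to_page(pfn)) &&
		       !is_zero_pfn(pfn);

	return true;
}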
Paolo
On Thu, Mar 26, 2020 at 07:04:15PM +0100, Paolo Bonzini wrote:
On 26/03/20 18:43, Thomas Voegtle wrote:
Hello,
the following one-line commit

commit 7df003c85218b5f5b10a7f6418208f31e813f38f
Author: Zhuang Yanying <ann.zhuangyanying@huawei.com>
Date:   Sat Oct 12 11:37:31 2019 +0800

    KVM: fix overflow of zero page refcount with ksm running

applies cleanly to v5.5.y, v5.4.y, v4.19.y, v4.14.y, and v4.9.y.
I actually ran into that bug on v4.9.y.
Thanks in advance,
Thomas
Yes, indeed. It's not a trivial backport though, so I prefer to do it manually. I can help with that, or with reviews if Yanying already has patches ready.
Backports would be great; I'll wait for them before applying anything.
thanks,
greg k-h
From: LinFeng <linfeng23@huawei.com>
We found that the !is_zero_pfn() check in kvm_is_mmio_pfn() was originally added in commit 90cff5a8cc ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()"), but reverted in commit 0ef2459983 ("kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()").

Maybe just adding !is_zero_pfn() to kvm_is_reserved_pfn() is too rough. According to commit e433e83bc3 ("KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved"), the zero page also needs special handling in some other flows if it is no longer treated as being reserved.

While fixing up all the functions that reference kvm_is_reserved_pfn() in this series, we found that only kvm_release_pfn_clean() and kvm_get_pfn() do not need that special handling.

So we thought: why not check is_zero_pfn() only around getting and putting the page, and revert our earlier commit 7df003c852 ("KVM: fix overflow of zero page refcount with ksm running")? Instead of adding !is_zero_pfn() to kvm_is_reserved_pfn(), the new idea is as follows:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7f9ee2929cfe..f9a1f9cf188e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1695,7 +1695,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);

 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
-	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
+	if (!is_error_noslot_pfn(pfn) &&
+	    (!kvm_is_reserved_pfn(pfn) || is_zero_pfn(pfn)))
 		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
@@ -1734,7 +1735,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);

 void kvm_get_pfn(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn))
+	if (!kvm_is_reserved_pfn(pfn) || is_zero_pfn(pfn))
 		get_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_get_pfn);
We are confused about why ZONE_DEVICE pages are not handled this way, but are instead treated as not reserved. Would it be racy if we applied only the patch shown in this cover letter, without the patches in the series?
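For comparison, upstream handles ZONE_DEVICE not by changing kvm_is_reserved_pfn() but with a separate helper that the affected call sites check explicitly. Roughly, from the ZONE_DEVICE commit cited above (a sketch from memory, so treat the details as approximate):

bool kvm_is_zone_device_pfn(kvm_pfn_t pfn)
{
	/*
	 * is_zone_device_page() is only guaranteed to give a valid
	 * answer while the page is pinned, e.g. by get_user_pages(),
	 * hence the refcount sanity check.
	 */
	if (!pfn_valid(pfn) || WARN_ON_ONCE(!page_count(pfn_to_page(pfn))))
		return false;

	return is_zone_device_page(pfn_to_page(pfn));
}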
LinFeng (1):
  KVM: special handling of zero_page in some flows

Zhuang Yanying (1):
  KVM: fix overflow of zero page refcount with ksm running

 arch/x86/kvm/mmu.c  | 2 ++
 virt/kvm/kvm_main.c | 9 +++++----
 2 files changed, 7 insertions(+), 4 deletions(-)
We were testing virtual machines with KSM on a v5.4-rc2 kernel and found a zero_page refcount overflow. The refcount is increased in try_async_pf() (via get_user_pages()) without being decreased in mmu_set_spte() while handling an EPT violation. In kvm_release_pfn_clean(), put_page() is only called for unreserved pages, and the zero page is reserved. So, as VMs are repeatedly created and destroyed, the refcount of the zero page keeps increasing until it overflows.
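In other words, the fault path takes a reference that the release path never drops; a simplified sketch of the two sides (based on the pre-fix code and the call trace below, not a verbatim kernel excerpt):

/*
 * Fault path (simplified): try_async_pf() -> __gfn_to_pfn_memslot()
 * -> get_user_pages() bumps the zero page's refcount via try_get_page().
 *
 * Release path (pre-fix): the reference is never dropped, because
 * kvm_is_reserved_pfn() returns true for the PG_reserved zero page.
 */
void kvm_release_pfn_clean(kvm_pfn_t pfn)
{
	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
		put_page(pfn_to_page(pfn));	/* skipped for the zero page */
}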
step 1:
echo 10000 > /sys/kernel/mm/ksm/pages_to_scan
echo 1 > /sys/kernel/mm/ksm/run
echo 1 > /sys/kernel/mm/ksm/use_zero_pages
step 2: create several normal QEMU/KVM VMs, destroy them after about 10 seconds, and repeat continuously.

After a long enough period of time, all domains hang because the zero page's refcount has overflowed.
QEMU prints an error log like the following:
…
error: kvm run failed Bad address
EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd
ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4
EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f7070 00000037
IDT=     000f70ae 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04 …
Meanwhile, a kernel warning is reported:
[40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30
[40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G OE 5.2.0-rc2 #5
[40914.836415] RIP: 0010:try_get_page+0x1f/0x30
[40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8
[40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286
[40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000
[40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440
[40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000
[40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8
[40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440
[40914.836423] FS:  00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000
[40914.836425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0
[40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[40914.836428] Call Trace:
[40914.836433]  follow_page_pte+0x302/0x47b
[40914.836437]  __get_user_pages+0xf1/0x7d0
[40914.836441]  ? irq_work_queue+0x9/0x70
[40914.836443]  get_user_pages_unlocked+0x13f/0x1e0
[40914.836469]  __gfn_to_pfn_memslot+0x10e/0x400 [kvm]
[40914.836486]  try_async_pf+0x87/0x240 [kvm]
[40914.836503]  tdp_page_fault+0x139/0x270 [kvm]
[40914.836523]  kvm_mmu_page_fault+0x76/0x5e0 [kvm]
[40914.836588]  vcpu_enter_guest+0xb45/0x1570 [kvm]
[40914.836632]  kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm]
[40914.836645]  kvm_vcpu_ioctl+0x26e/0x5d0 [kvm]
[40914.836650]  do_vfs_ioctl+0xa9/0x620
[40914.836653]  ksys_ioctl+0x60/0x90
[40914.836654]  __x64_sys_ioctl+0x16/0x20
[40914.836658]  do_syscall_64+0x5b/0x180
[40914.836664]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[40914.836666] RIP: 0033:0x7fb61cb6bfc7
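The warning fires in try_get_page(), which since v5.1 refuses to raise a page refcount that is zero or has wrapped negative. In kernels of that era it reads roughly as follows (a sketch of the v5.2-vintage include/linux/mm.h; the exact line number varies by tree):

static inline __must_check bool try_get_page(struct page *page)
{
	page = compound_head(page);
	/*
	 * After roughly 2^31 unbalanced gets, the 32-bit refcount wraps
	 * into the negative range; from then on every GUP of the zero
	 * page fails, which QEMU ultimately reports as "Bad address".
	 */
	if (WARN_ON_ONCE(page_ref_count(page) <= 0))
		return false;
	page_ref_inc(page);
	return true;
}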
Signed-off-by: LinFeng <linfeng23@huawei.com>
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
---
 virt/kvm/kvm_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7f9ee2929cfe..d49cdd6ca883 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -128,7 +128,8 @@ static bool largepages_enabled = true;
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
 	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+		return PageReserved(pfn_to_page(pfn)) &&
+		       !is_zero_pfn(pfn);

 	return true;
 }
On Mon, Mar 30, 2020 at 09:32:41PM +0800, Zhuang Yanying wrote:
We were testing virtual machines with KSM on a v5.4-rc2 kernel and found a zero_page refcount overflow. The refcount is increased in try_async_pf() (via get_user_pages()) without being decreased in mmu_set_spte() while handling an EPT violation. In kvm_release_pfn_clean(), put_page() is only called for unreserved pages, and the zero page is reserved. So, as VMs are repeatedly created and destroyed, the refcount of the zero page keeps increasing until it overflows.

step 1:
echo 10000 > /sys/kernel/mm/ksm/pages_to_scan
echo 1 > /sys/kernel/mm/ksm/run
echo 1 > /sys/kernel/mm/ksm/use_zero_pages

step 2: create several normal QEMU/KVM VMs, destroy them after about 10 seconds, and repeat continuously.

After a long enough period of time, all domains hang because the zero page's refcount has overflowed.

QEMU prints an error log like the following:
…
error: kvm run failed Bad address
EAX=00006cdc EBX=00000008 ECX=80202001 EDX=078bfbfd
ESI=ffffffff EDI=00000000 EBP=00000008 ESP=00006cc4
EIP=000efd75 EFL=00010002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00c09b00 DPL=0 CS32 [-RA]
SS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00c09300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     000f7070 00000037
IDT=     000f70ae 00000000
CR0=00000011 CR2=00000000 CR3=00000000 CR4=00000000
DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
DR6=00000000ffff0ff0 DR7=0000000000000400
EFER=0000000000000000
Code=00 01 00 00 00 e9 e8 00 00 00 c7 05 4c 55 0f 00 01 00 00 00 <8b> 35 00 00 01 00 8b 3d 04 00 01 00 b8 d8 d3 00 00 c1 e0 08 0c ea a3 00 00 01 00 c7 05 04 …

Meanwhile, a kernel warning is reported:

[40914.836375] WARNING: CPU: 3 PID: 82067 at ./include/linux/mm.h:987 try_get_page+0x1f/0x30
[40914.836412] CPU: 3 PID: 82067 Comm: CPU 0/KVM Kdump: loaded Tainted: G OE 5.2.0-rc2 #5
[40914.836415] RIP: 0010:try_get_page+0x1f/0x30
[40914.836417] Code: 40 00 c3 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8 01 75 11 8b 47 34 85 c0 7e 10 f0 ff 47 34 b8 01 00 00 00 c3 48 8d 78 ff eb e9 <0f> 0b 31 c0 c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 48 8b 47 08 a8
[40914.836418] RSP: 0018:ffffb4144e523988 EFLAGS: 00010286
[40914.836419] RAX: 0000000080000000 RBX: 0000000000000326 RCX: 0000000000000000
[40914.836420] RDX: 0000000000000000 RSI: 00004ffdeba10000 RDI: ffffdf07093f6440
[40914.836421] RBP: ffffdf07093f6440 R08: 800000424fd91225 R09: 0000000000000000
[40914.836421] R10: ffff9eb41bfeebb8 R11: 0000000000000000 R12: ffffdf06bbd1e8a8
[40914.836422] R13: 0000000000000080 R14: 800000424fd91225 R15: ffffdf07093f6440
[40914.836423] FS:  00007fb60ffff700(0000) GS:ffff9eb4802c0000(0000) knlGS:0000000000000000
[40914.836425] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[40914.836426] CR2: 0000000000000000 CR3: 0000002f220e6002 CR4: 00000000003626e0
[40914.836427] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[40914.836427] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[40914.836428] Call Trace:
[40914.836433]  follow_page_pte+0x302/0x47b
[40914.836437]  __get_user_pages+0xf1/0x7d0
[40914.836441]  ? irq_work_queue+0x9/0x70
[40914.836443]  get_user_pages_unlocked+0x13f/0x1e0
[40914.836469]  __gfn_to_pfn_memslot+0x10e/0x400 [kvm]
[40914.836486]  try_async_pf+0x87/0x240 [kvm]
[40914.836503]  tdp_page_fault+0x139/0x270 [kvm]
[40914.836523]  kvm_mmu_page_fault+0x76/0x5e0 [kvm]
[40914.836588]  vcpu_enter_guest+0xb45/0x1570 [kvm]
[40914.836632]  kvm_arch_vcpu_ioctl_run+0x35d/0x580 [kvm]
[40914.836645]  kvm_vcpu_ioctl+0x26e/0x5d0 [kvm]
[40914.836650]  do_vfs_ioctl+0xa9/0x620
[40914.836653]  ksys_ioctl+0x60/0x90
[40914.836654]  __x64_sys_ioctl+0x16/0x20
[40914.836658]  do_syscall_64+0x5b/0x180
[40914.836664]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[40914.836666] RIP: 0033:0x7fb61cb6bfc7
Signed-off-by: LinFeng <linfeng23@huawei.com>
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
---
 virt/kvm/kvm_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7f9ee2929cfe..d49cdd6ca883 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -128,7 +128,8 @@ static bool largepages_enabled = true;
 bool kvm_is_reserved_pfn(kvm_pfn_t pfn)
 {
 	if (pfn_valid(pfn))
-		return PageReserved(pfn_to_page(pfn));
+		return PageReserved(pfn_to_page(pfn)) &&
+		       !is_zero_pfn(pfn);

 	return true;
 }
--
2.23.0
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
From: LinFeng <linfeng23@huawei.com>
Just adding !is_zero_pfn() to kvm_is_reserved_pfn() is too rough. According to commit e433e83bc3 ("KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved"), the zero page also needs special handling in some other flows once it is no longer treated as being reserved.
Signed-off-by: LinFeng <linfeng23@huawei.com>
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
---
 arch/x86/kvm/mmu.c  | 2 ++
 virt/kvm/kvm_main.c | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9c7e986b4e4..63155e67abf7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2826,6 +2826,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	 * here.
 	 */
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
+	    !is_zero_pfn(pfn) &&
 	    level == PT_PAGE_TABLE_LEVEL &&
 	    PageTransCompoundMap(pfn_to_page(pfn)) &&
 	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
@@ -4788,6 +4789,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 */
 		if (sp->role.direct &&
 		    !kvm_is_reserved_pfn(pfn) &&
+		    !is_zero_pfn(pfn) &&
 		    PageTransCompoundMap(pfn_to_page(pfn))) {
 			drop_spte(kvm, sptep);
 			need_tlb_flush = 1;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d49cdd6ca883..ddc8c2a6e206 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1658,7 +1658,7 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
 	if (is_error_noslot_pfn(pfn))
 		return KVM_ERR_PTR_BAD_PAGE;

-	if (kvm_is_reserved_pfn(pfn)) {
+	if (kvm_is_reserved_pfn(pfn) && is_zero_pfn(pfn)) {
 		WARN_ON(1);
 		return KVM_ERR_PTR_BAD_PAGE;
 	}
@@ -1717,7 +1717,7 @@ static void kvm_release_pfn_dirty(kvm_pfn_t pfn)

 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn)) {
+	if (!kvm_is_reserved_pfn(pfn) && !is_zero_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);

 		if (!PageReserved(page))
@@ -1728,7 +1728,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);

 void kvm_set_pfn_accessed(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn))
+	if (!kvm_is_reserved_pfn(pfn) && !is_zero_pfn(pfn))
 		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
On Mon, Mar 30, 2020 at 09:32:42PM +0800, Zhuang Yanying wrote:
From: LinFeng <linfeng23@huawei.com>
Just adding !is_zero_pfn() to kvm_is_reserved_pfn() is too rough. According to commit e433e83bc3 ("KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved"), the zero page also needs special handling in some other flows once it is no longer treated as being reserved.
Signed-off-by: LinFeng <linfeng23@huawei.com>
Signed-off-by: Zhuang Yanying <ann.zhuangyanying@huawei.com>
---
 arch/x86/kvm/mmu.c  | 2 ++
 virt/kvm/kvm_main.c | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9c7e986b4e4..63155e67abf7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2826,6 +2826,7 @@ static void transparent_hugepage_adjust(struct kvm_vcpu *vcpu,
 	 * here.
 	 */
 	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) &&
+	    !is_zero_pfn(pfn) &&
 	    level == PT_PAGE_TABLE_LEVEL &&
 	    PageTransCompoundMap(pfn_to_page(pfn)) &&
 	    !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) {
@@ -4788,6 +4789,7 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 		 */
 		if (sp->role.direct &&
 		    !kvm_is_reserved_pfn(pfn) &&
+		    !is_zero_pfn(pfn) &&
 		    PageTransCompoundMap(pfn_to_page(pfn))) {
 			drop_spte(kvm, sptep);
 			need_tlb_flush = 1;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d49cdd6ca883..ddc8c2a6e206 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1658,7 +1658,7 @@ static struct page *kvm_pfn_to_page(kvm_pfn_t pfn)
 	if (is_error_noslot_pfn(pfn))
 		return KVM_ERR_PTR_BAD_PAGE;

-	if (kvm_is_reserved_pfn(pfn)) {
+	if (kvm_is_reserved_pfn(pfn) && is_zero_pfn(pfn)) {
 		WARN_ON(1);
 		return KVM_ERR_PTR_BAD_PAGE;
 	}
@@ -1717,7 +1717,7 @@ static void kvm_release_pfn_dirty(kvm_pfn_t pfn)

 void kvm_set_pfn_dirty(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn)) {
+	if (!kvm_is_reserved_pfn(pfn) && !is_zero_pfn(pfn)) {
 		struct page *page = pfn_to_page(pfn);

 		if (!PageReserved(page))
@@ -1728,7 +1728,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_dirty);

 void kvm_set_pfn_accessed(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn))
+	if (!kvm_is_reserved_pfn(pfn) && !is_zero_pfn(pfn))
 		mark_page_accessed(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);
--
2.23.0
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>
On Mon, Mar 30, 2020 at 09:32:40PM +0800, Zhuang Yanying wrote:
From: LinFeng <linfeng23@huawei.com>
We found that the !is_zero_pfn() check in kvm_is_mmio_pfn() was originally added in commit 90cff5a8cc ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()"), but reverted in commit 0ef2459983 ("kvm: fix kvm_is_mmio_pfn() and rename to kvm_is_reserved_pfn()").

Maybe just adding !is_zero_pfn() to kvm_is_reserved_pfn() is too rough. According to commit e433e83bc3 ("KVM: MMU: Do not treat ZONE_DEVICE pages as being reserved"), the zero page also needs special handling in some other flows if it is no longer treated as being reserved.

While fixing up all the functions that reference kvm_is_reserved_pfn() in this series, we found that only kvm_release_pfn_clean() and kvm_get_pfn() do not need that special handling.

So we thought: why not check is_zero_pfn() only around getting and putting the page, and revert our earlier commit 7df003c852 ("KVM: fix overflow of zero page refcount with ksm running")? Instead of adding !is_zero_pfn() to kvm_is_reserved_pfn(), the new idea is as follows:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7f9ee2929cfe..f9a1f9cf188e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1695,7 +1695,8 @@ EXPORT_SYMBOL_GPL(kvm_release_page_clean);

 void kvm_release_pfn_clean(kvm_pfn_t pfn)
 {
-	if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn))
+	if (!is_error_noslot_pfn(pfn) &&
+	    (!kvm_is_reserved_pfn(pfn) || is_zero_pfn(pfn)))
 		put_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_release_pfn_clean);
@@ -1734,7 +1735,7 @@ EXPORT_SYMBOL_GPL(kvm_set_pfn_accessed);

 void kvm_get_pfn(kvm_pfn_t pfn)
 {
-	if (!kvm_is_reserved_pfn(pfn))
+	if (!kvm_is_reserved_pfn(pfn) || is_zero_pfn(pfn))
 		get_page(pfn_to_page(pfn));
 }
 EXPORT_SYMBOL_GPL(kvm_get_pfn);
We are confused about why ZONE_DEVICE pages are not handled this way, but are instead treated as not reserved. Would it be racy if we applied only the patch shown in this cover letter, without the patches in the series?
LinFeng (1):
  KVM: special handling of zero_page in some flows

Zhuang Yanying (1):
  KVM: fix overflow of zero page refcount with ksm running

 arch/x86/kvm/mmu.c  | 2 ++
 virt/kvm/kvm_main.c | 9 +++++----
 2 files changed, 7 insertions(+), 4 deletions(-)

--
2.23.0
<formletter>
This is not the correct way to submit patches for inclusion in the stable kernel tree. Please read: https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html for how to do this properly.
</formletter>