From: Paolo Bonzini <pbonzini@redhat.com>
commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This, however, is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this, however, KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, but it is a bug nonetheless. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped, or it can be passed back to the caller via *writable.
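For readability, the shape of the fixed lookup is sketched below, condensed from the diff that follows (a sketch only: the fixup_user_fault retry path and the reference-taking step are elided, and this backport uses follow_pte_pmd where upstream uses follow_pte):

	pte_t *ptep;
	spinlock_t *ptl;
	unsigned long pfn;
	int r;

	/* Resolve the PTE and take its lock, rather than calling follow_pfn(). */
	r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
	if (r)
		return r;

	/* A write fault on a read-only PTE now fails instead of succeeding. */
	if (write_fault && !pte_write(*ptep)) {
		pfn = KVM_PFN_ERR_RO_FAULT;
		goto out;
	}

	if (writable)
		*writable = pte_write(*ptep);	/* actual writability, not an assumed "true" */
	pfn = pte_pfn(*ptep);			/* PFN read while the PTE cannot change */

out:
	pte_unmap_unlock(ptep, ptl);		/* drop the PTE lock */
	*p_pfn = pfn;
	return 0;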
Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable.
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd@google.com>
Cc: 3pvd@google.com
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[OP: backport to 4.14, adjust follow_pte() -> follow_pte_pmd()]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
 virt/kvm/kvm_main.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 547ae59199db..4e23d0b4b810 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1491,9 +1491,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       kvm_pfn_t *p_pfn)
 {
 	unsigned long pfn;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	int r;
 
-	r = follow_pfn(vma, addr, &pfn);
+	r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
 	if (r) {
 		/*
 		 * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1508,14 +1510,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		if (r)
 			return r;
 
-		r = follow_pfn(vma, addr, &pfn);
+		r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
 		if (r)
 			return r;
+	}
 
+	if (write_fault && !pte_write(*ptep)) {
+		pfn = KVM_PFN_ERR_RO_FAULT;
+		goto out;
 	}
 
 	if (writable)
-		*writable = true;
+		*writable = pte_write(*ptep);
+	pfn = pte_pfn(*ptep);
 
 	/*
 	 * Get a reference here because callers of *hva_to_pfn* and
@@ -1530,6 +1537,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 */
 	kvm_get_pfn(pfn);
 
+out:
+	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
 	return 0;
 }
From: Nicholas Piggin <npiggin@gmail.com>
commit f8be156be163a052a067306417cd0ff679068c97 upstream.
It's possible to create a region which maps valid but non-refcounted pages (e.g., tail pages of non-compound higher order allocations). These host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family of APIs, which take a reference to the page, raising its refcount from 0 to 1. When the reference is dropped, this will free the page incorrectly.
Fix this by only taking a reference if the page's refcount was already non-zero, which indicates that the page participates in normal refcounting (and can be released with put_page).
This addresses CVE-2021-22543.
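The key primitive is get_page_unless_zero, which takes a reference only when the refcount is already non-zero. A standalone userspace model of that behaviour (the struct, names, and simplified counter here are illustrative stand-ins, not kernel code):

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Simplified stand-in for struct page's _refcount. */
	struct page { atomic_int refcount; };

	/* Model of get_page_unless_zero(): increment the refcount only if it
	 * is already non-zero, i.e. only if the page participates in normal
	 * refcounting. */
	static bool get_page_unless_zero_model(struct page *p)
	{
		int old = atomic_load(&p->refcount);

		while (old != 0) {
			if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
				return true;	/* reference taken; put_page() is now safe */
		}
		return false;	/* count was 0: taking a reference would let a
				 * later put_page() free the page incorrectly */
	}

	int main(void)
	{
		struct page tail   = { .refcount = 0 };	/* e.g. tail page of a
							 * non-compound allocation */
		struct page normal = { .refcount = 1 };

		printf("tail page: %d, normal page: %d\n",
		       get_page_unless_zero_model(&tail),
		       get_page_unless_zero_model(&normal));	/* prints 0, 1 */
		return 0;
	}

A page sitting at refcount 0 is left untouched, so the caller's eventual put_page can never take the count through a bogus 1 -> 0 transition and free memory that was never meant to be refcounted.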
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4e23d0b4b810..469361d01116 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1485,6 +1485,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -1534,13 +1541,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
On 03/08/21 15:55, Ovidiu Panait wrote:
> From: Nicholas Piggin <npiggin@gmail.com>
>
> commit f8be156be163a052a067306417cd0ff679068c97 upstream.
>
> It's possible to create a region which maps valid but non-refcounted pages (e.g., tail pages of non-compound higher order allocations). These host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family of APIs, which take a reference to the page, raising its refcount from 0 to 1. When the reference is dropped, this will free the page incorrectly.
>
> Fix this by only taking a reference if the page's refcount was already non-zero, which indicates that the page participates in normal refcounting (and can be released with put_page).
>
> This addresses CVE-2021-22543.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Tested-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
>  virt/kvm/kvm_main.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4e23d0b4b810..469361d01116 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1485,6 +1485,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
>  	return true;
>  }
>  
> +static int kvm_try_get_pfn(kvm_pfn_t pfn)
> +{
> +	if (kvm_is_reserved_pfn(pfn))
> +		return 1;
> +	return get_page_unless_zero(pfn_to_page(pfn));
> +}
> +
>  static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  			       unsigned long addr, bool *async,
>  			       bool write_fault, bool *writable,
> @@ -1534,13 +1541,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  	 * Whoever called remap_pfn_range is also going to call e.g.
>  	 * unmap_mapping_range before the underlying pages are freed,
>  	 * causing a call to our MMU notifier.
> +	 *
> +	 * Certain IO or PFNMAP mappings can be backed with valid
> +	 * struct pages, but be allocated without refcounting e.g.,
> +	 * tail pages of non-compound higher order allocations, which
> +	 * would then underflow the refcount when the caller does the
> +	 * required put_page. Don't allow those pages here.
>  	 */
> -	kvm_get_pfn(pfn);
> +	if (!kvm_try_get_pfn(pfn))
> +		r = -EFAULT;
>  
>  out:
>  	pte_unmap_unlock(ptep, ptl);
>  	*p_pfn = pfn;
> -	return 0;
> +
> +	return r;
>  }
>  
>  /*

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
From: Sean Christopherson <seanjc@google.com>
commit a9545779ee9e9e103648f6f2552e73cfe808d0f4 upstream
Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving a so-called "remapped" hva/pfn pair. In theory, the hva could resolve to a pfn in high memory on a 32-bit kernel.
This bug was inadvertently exposed by commit bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn"), which added an error PFN value to the mix, causing gcc to complain about overflowing the unsigned long.
arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’ to ‘long unsigned int’ changes value from ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
   89 | #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
      |                              ^
virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
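The truncation is easy to demonstrate outside the kernel. In this standalone sketch, the mask value is read off the gcc diagnostic above and the typedef mirrors kvm_pfn_t being a u64; everything else is illustrative:

	#include <stdint.h>
	#include <stdio.h>

	typedef uint64_t kvm_pfn_t;	/* kvm_pfn_t is a u64 */

	#define KVM_PFN_ERR_MASK	(0x7ffULL << 52)	/* 9218868437227405312 */
	#define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)	/* 9218868437227405314 */

	int main(void)
	{
		kvm_pfn_t wide = KVM_PFN_ERR_RO_FAULT;
		/* On a 32-bit build, unsigned long is 32 bits: the high error
		 * bits are truncated away and only the "+ 2" survives, turning
		 * an error marker into an apparently valid low PFN. The cast
		 * below forces the truncation so the effect is visible on a
		 * 64-bit host as well. */
		uint32_t narrow = (uint32_t)wide;

		printf("kvm_pfn_t: %llu, truncated: %u\n",
		       (unsigned long long)wide, narrow);	/* ..., truncated: 2 */
		return 0;
	}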
Cc: stable@vger.kernel.org
Fixes: add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210208201940.1258328-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 469361d01116..36b9f2b29071 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1497,7 +1497,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       bool write_fault, bool *writable,
 			       kvm_pfn_t *p_pfn)
 {
-	unsigned long pfn;
+	kvm_pfn_t pfn;
 	pte_t *ptep;
 	spinlock_t *ptl;
 	int r;
On 03/08/21 15:55, Ovidiu Panait wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> commit a9545779ee9e9e103648f6f2552e73cfe808d0f4 upstream
>
> Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving a so-called "remapped" hva/pfn pair. In theory, the hva could resolve to a pfn in high memory on a 32-bit kernel.
>
> This bug was inadvertently exposed by commit bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn"), which added an error PFN value to the mix, causing gcc to complain about overflowing the unsigned long.
>
> arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
> include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’ to ‘long unsigned int’ changes value from ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
>    89 | #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
>       |                              ^
> virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
>
> Cc: stable@vger.kernel.org
> Fixes: add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-Id: <20210208201940.1258328-1-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
>  virt/kvm/kvm_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 469361d01116..36b9f2b29071 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1497,7 +1497,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  			       bool write_fault, bool *writable,
>  			       kvm_pfn_t *p_pfn)
>  {
> -	unsigned long pfn;
> +	kvm_pfn_t pfn;
>  	pte_t *ptep;
>  	spinlock_t *ptl;
>  	int r;

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On 03/08/21 15:55, Ovidiu Panait wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
>
> In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This, however, is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
>
> In doing this, however, KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, but it is a bug nonetheless. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped, or it can be passed back to the caller via *writable.
>
> Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable.
>
> Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
> Reported-by: David Stevens <stevensd@google.com>
> Cc: 3pvd@google.com
> Cc: Jann Horn <jannh@google.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [OP: backport to 4.14, adjust follow_pte() -> follow_pte_pmd()]
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
>  virt/kvm/kvm_main.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 547ae59199db..4e23d0b4b810 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1491,9 +1491,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  			       kvm_pfn_t *p_pfn)
>  {
>  	unsigned long pfn;
> +	pte_t *ptep;
> +	spinlock_t *ptl;
>  	int r;
>  
> -	r = follow_pfn(vma, addr, &pfn);
> +	r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
>  	if (r) {
>  		/*
>  		 * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
> @@ -1508,14 +1510,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  		if (r)
>  			return r;
>  
> -		r = follow_pfn(vma, addr, &pfn);
> +		r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
>  		if (r)
>  			return r;
> +	}
>  
> +	if (write_fault && !pte_write(*ptep)) {
> +		pfn = KVM_PFN_ERR_RO_FAULT;
> +		goto out;
>  	}
>  
>  	if (writable)
> -		*writable = true;
> +		*writable = pte_write(*ptep);
> +	pfn = pte_pfn(*ptep);
>  
>  	/*
>  	 * Get a reference here because callers of *hva_to_pfn* and
> @@ -1530,6 +1537,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  	 */
>  	kvm_get_pfn(pfn);
>  
> +out:
> +	pte_unmap_unlock(ptep, ptl);
>  	*p_pfn = pfn;
>  	return 0;
>  }

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On Tue, Aug 03, 2021 at 04:55:19PM +0300, Ovidiu Panait wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
>
> In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This, however, is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
>
> In doing this, however, KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, but it is a bug nonetheless. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped, or it can be passed back to the caller via *writable.
>
> Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable.
>
> Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
> Reported-by: David Stevens <stevensd@google.com>
> Cc: 3pvd@google.com
> Cc: Jann Horn <jannh@google.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [OP: backport to 4.14, adjust follow_pte() -> follow_pte_pmd()]
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
>  virt/kvm/kvm_main.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
All now queued up, thanks.
greg k-h