From: Paolo Bonzini <pbonzini@redhat.com>
commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This, however, is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this, however, KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, but it is a bug nonetheless. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped, or it can be passed back to the caller via *writable.
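For readability, the shape of the fixed lookup is sketched below, condensed from the diff that follows (a sketch only: the fixup_user_fault retry path and the reference-taking step are elided, and this backport uses follow_pte_pmd where upstream uses follow_pte):

	pte_t *ptep;
	spinlock_t *ptl;
	unsigned long pfn;
	int r;

	/* Resolve the PTE and take its lock, rather than calling follow_pfn(). */
	r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
	if (r)
		return r;

	/* A write fault on a read-only PTE now fails instead of succeeding. */
	if (write_fault && !pte_write(*ptep)) {
		pfn = KVM_PFN_ERR_RO_FAULT;
		goto out;
	}

	if (writable)
		*writable = pte_write(*ptep);	/* actual writability, not an assumed "true" */
	pfn = pte_pfn(*ptep);			/* PFN read while the PTE cannot change */

out:
	pte_unmap_unlock(ptep, ptl);		/* drop the PTE lock */
	*p_pfn = pfn;
	return 0;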
Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable.
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd@google.com>
Cc: 3pvd@google.com
Cc: Jann Horn <jannh@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[OP: backport to 4.14, adjust follow_pte() -> follow_pte_pmd()]
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
 virt/kvm/kvm_main.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 547ae59199db..4e23d0b4b810 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1491,9 +1491,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       kvm_pfn_t *p_pfn)
 {
 	unsigned long pfn;
+	pte_t *ptep;
+	spinlock_t *ptl;
 	int r;
 
-	r = follow_pfn(vma, addr, &pfn);
+	r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
 	if (r) {
 		/*
 		 * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1508,14 +1510,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 		if (r)
 			return r;
 
-		r = follow_pfn(vma, addr, &pfn);
+		r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
 		if (r)
 			return r;
+	}
 
+	if (write_fault && !pte_write(*ptep)) {
+		pfn = KVM_PFN_ERR_RO_FAULT;
+		goto out;
 	}
 
 	if (writable)
-		*writable = true;
+		*writable = pte_write(*ptep);
+	pfn = pte_pfn(*ptep);
 
 	/*
 	 * Get a reference here because callers of *hva_to_pfn* and
@@ -1530,6 +1537,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 */
 	kvm_get_pfn(pfn);
 
+out:
+	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
 	return 0;
 }
From: Nicholas Piggin <npiggin@gmail.com>
commit f8be156be163a052a067306417cd0ff679068c97 upstream.
It's possible to create a region which maps valid but non-refcounted pages (e.g., tail pages of non-compound higher order allocations). These host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family of APIs, which take a reference to the page, raising its refcount from 0 to 1. When the reference is dropped, this will free the page incorrectly.
Fix this by only taking a reference if the page's refcount was already non-zero, which indicates that the page participates in normal refcounting (and can be released with put_page).
This addresses CVE-2021-22543.
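The key primitive is get_page_unless_zero, which takes a reference only when the refcount is already non-zero. A standalone userspace model of that behaviour (the struct, names, and simplified counter here are illustrative stand-ins, not kernel code):

	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	/* Simplified stand-in for struct page's _refcount. */
	struct page { atomic_int refcount; };

	/* Model of get_page_unless_zero(): increment the refcount only if it
	 * is already non-zero, i.e. only if the page participates in normal
	 * refcounting. */
	static bool get_page_unless_zero_model(struct page *p)
	{
		int old = atomic_load(&p->refcount);

		while (old != 0) {
			if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
				return true;	/* reference taken; put_page() is now safe */
		}
		return false;	/* count was 0: taking a reference would let a
				 * later put_page() free the page incorrectly */
	}

	int main(void)
	{
		struct page tail   = { .refcount = 0 };	/* e.g. tail page of a
							 * non-compound allocation */
		struct page normal = { .refcount = 1 };

		printf("tail page: %d, normal page: %d\n",
		       get_page_unless_zero_model(&tail),
		       get_page_unless_zero_model(&normal));	/* prints 0, 1 */
		return 0;
	}

A page sitting at refcount 0 is left untouched, so the caller's eventual put_page can never take the count through a bogus 1 -> 0 transition and free memory that was never meant to be refcounted.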
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Tested-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
 virt/kvm/kvm_main.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4e23d0b4b810..469361d01116 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1485,6 +1485,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
 	return true;
 }
 
+static int kvm_try_get_pfn(kvm_pfn_t pfn)
+{
+	if (kvm_is_reserved_pfn(pfn))
+		return 1;
+	return get_page_unless_zero(pfn_to_page(pfn));
+}
+
 static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       unsigned long addr, bool *async,
 			       bool write_fault, bool *writable,
@@ -1534,13 +1541,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 	 * Whoever called remap_pfn_range is also going to call e.g.
 	 * unmap_mapping_range before the underlying pages are freed,
 	 * causing a call to our MMU notifier.
+	 *
+	 * Certain IO or PFNMAP mappings can be backed with valid
+	 * struct pages, but be allocated without refcounting e.g.,
+	 * tail pages of non-compound higher order allocations, which
+	 * would then underflow the refcount when the caller does the
+	 * required put_page. Don't allow those pages here.
 	 */
-	kvm_get_pfn(pfn);
+	if (!kvm_try_get_pfn(pfn))
+		r = -EFAULT;
 
 out:
 	pte_unmap_unlock(ptep, ptl);
 	*p_pfn = pfn;
-	return 0;
+
+	return r;
 }
 
 /*
On 03/08/21 15:55, Ovidiu Panait wrote:
> From: Nicholas Piggin <npiggin@gmail.com>
>
> commit f8be156be163a052a067306417cd0ff679068c97 upstream.
>
> It's possible to create a region which maps valid but non-refcounted pages (e.g., tail pages of non-compound higher order allocations). These host pages can then be returned by the gfn_to_page, gfn_to_pfn, etc. family of APIs, which take a reference to the page, raising its refcount from 0 to 1. When the reference is dropped, this will free the page incorrectly.
>
> Fix this by only taking a reference if the page's refcount was already non-zero, which indicates that the page participates in normal refcounting (and can be released with put_page).
>
> This addresses CVE-2021-22543.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Tested-by: Paolo Bonzini <pbonzini@redhat.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
>  virt/kvm/kvm_main.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4e23d0b4b810..469361d01116 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1485,6 +1485,13 @@ static bool vma_is_valid(struct vm_area_struct *vma, bool write_fault)
>  	return true;
>  }
>  
> +static int kvm_try_get_pfn(kvm_pfn_t pfn)
> +{
> +	if (kvm_is_reserved_pfn(pfn))
> +		return 1;
> +	return get_page_unless_zero(pfn_to_page(pfn));
> +}
> +
>  static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  			       unsigned long addr, bool *async,
>  			       bool write_fault, bool *writable,
> @@ -1534,13 +1541,21 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  	 * Whoever called remap_pfn_range is also going to call e.g.
>  	 * unmap_mapping_range before the underlying pages are freed,
>  	 * causing a call to our MMU notifier.
> +	 *
> +	 * Certain IO or PFNMAP mappings can be backed with valid
> +	 * struct pages, but be allocated without refcounting e.g.,
> +	 * tail pages of non-compound higher order allocations, which
> +	 * would then underflow the refcount when the caller does the
> +	 * required put_page. Don't allow those pages here.
>  	 */
> -	kvm_get_pfn(pfn);
> +	if (!kvm_try_get_pfn(pfn))
> +		r = -EFAULT;
>  
>  out:
>  	pte_unmap_unlock(ptep, ptl);
>  	*p_pfn = pfn;
> -	return 0;
> +
> +	return r;
>  }
>  
>  /*

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
From: Sean Christopherson <seanjc@google.com>
commit a9545779ee9e9e103648f6f2552e73cfe808d0f4 upstream
Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving a so-called "remapped" hva/pfn pair. In theory, the hva could resolve to a pfn in high memory on a 32-bit kernel.
This bug was inadvertently exposed by commit bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn"), which added an error PFN value to the mix, causing gcc to complain about overflowing the unsigned long.
arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’ to ‘long unsigned int’ changes value from ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
   89 | #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
      |                              ^
virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
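The truncation is easy to demonstrate outside the kernel. In this standalone sketch, the mask value is read off the gcc diagnostic above and the typedef mirrors kvm_pfn_t being a u64; everything else is illustrative:

	#include <stdint.h>
	#include <stdio.h>

	typedef uint64_t kvm_pfn_t;	/* kvm_pfn_t is a u64 */

	#define KVM_PFN_ERR_MASK	(0x7ffULL << 52)	/* 9218868437227405312 */
	#define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)	/* 9218868437227405314 */

	int main(void)
	{
		kvm_pfn_t wide = KVM_PFN_ERR_RO_FAULT;
		/* On a 32-bit build, unsigned long is 32 bits: the high error
		 * bits are truncated away and only the "+ 2" survives, turning
		 * an error marker into an apparently valid low PFN. The cast
		 * below forces the truncation so the effect is visible on a
		 * 64-bit host as well. */
		uint32_t narrow = (uint32_t)wide;

		printf("kvm_pfn_t: %llu, truncated: %u\n",
		       (unsigned long long)wide, narrow);	/* ..., truncated: 2 */
		return 0;
	}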
Cc: stable@vger.kernel.org
Fixes: add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210208201940.1258328-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
 virt/kvm/kvm_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 469361d01116..36b9f2b29071 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1497,7 +1497,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
 			       bool write_fault, bool *writable,
 			       kvm_pfn_t *p_pfn)
 {
-	unsigned long pfn;
+	kvm_pfn_t pfn;
 	pte_t *ptep;
 	spinlock_t *ptl;
 	int r;
On 03/08/21 15:55, Ovidiu Panait wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> commit a9545779ee9e9e103648f6f2552e73cfe808d0f4 upstream
>
> Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving a so-called "remapped" hva/pfn pair. In theory, the hva could resolve to a pfn in high memory on a 32-bit kernel.
>
> This bug was inadvertently exposed by commit bd2fae8da794 ("KVM: do not assume PTE is writable after follow_pfn"), which added an error PFN value to the mix, causing gcc to complain about overflowing the unsigned long.
>
> arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
> include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’ to ‘long unsigned int’ changes value from ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
>    89 | #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
>       |                              ^
> virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
>
> Cc: stable@vger.kernel.org
> Fixes: add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Message-Id: <20210208201940.1258328-1-seanjc@google.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
>  virt/kvm/kvm_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 469361d01116..36b9f2b29071 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1497,7 +1497,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  			       bool write_fault, bool *writable,
>  			       kvm_pfn_t *p_pfn)
>  {
> -	unsigned long pfn;
> +	kvm_pfn_t pfn;
>  	pte_t *ptep;
>  	spinlock_t *ptl;
>  	int r;

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On 03/08/21 15:55, Ovidiu Panait wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
>
> In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This, however, is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
>
> In doing this, however, KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, but it is a bug nonetheless. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped, or it can be passed back to the caller via *writable.
>
> Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable.
>
> Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
> Reported-by: David Stevens <stevensd@google.com>
> Cc: 3pvd@google.com
> Cc: Jann Horn <jannh@google.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [OP: backport to 4.14, adjust follow_pte() -> follow_pte_pmd()]
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
>  virt/kvm/kvm_main.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 547ae59199db..4e23d0b4b810 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1491,9 +1491,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  			       kvm_pfn_t *p_pfn)
>  {
>  	unsigned long pfn;
> +	pte_t *ptep;
> +	spinlock_t *ptl;
>  	int r;
>  
> -	r = follow_pfn(vma, addr, &pfn);
> +	r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
>  	if (r) {
>  		/*
>  		 * get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
> @@ -1508,14 +1510,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  		if (r)
>  			return r;
>  
> -		r = follow_pfn(vma, addr, &pfn);
> +		r = follow_pte_pmd(vma->vm_mm, addr, NULL, NULL, &ptep, NULL, &ptl);
>  		if (r)
>  			return r;
> +	}
>  
> +	if (write_fault && !pte_write(*ptep)) {
> +		pfn = KVM_PFN_ERR_RO_FAULT;
> +		goto out;
>  	}
>  
>  	if (writable)
> -		*writable = true;
> +		*writable = pte_write(*ptep);
> +	pfn = pte_pfn(*ptep);
>  
>  	/*
>  	 * Get a reference here because callers of *hva_to_pfn* and
> @@ -1530,6 +1537,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
>  	 */
>  	kvm_get_pfn(pfn);
>  
> +out:
> +	pte_unmap_unlock(ptep, ptl);
>  	*p_pfn = pfn;
>  	return 0;
>  }

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
On Tue, Aug 03, 2021 at 04:55:19PM +0300, Ovidiu Panait wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
>
> commit bd2fae8da794b55bf2ac02632da3a151b10e664c upstream.
>
> In order to convert an HVA to a PFN, KVM usually tries to use the get_user_pages family of functions. This, however, is not possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
>
> In doing this, however, KVM loses the information on whether the PFN is writable. That is usually not a problem because the main use of VM_IO vmas with KVM is for BARs in PCI device assignment, but it is a bug nonetheless. To fix it, use follow_pte and check pte_write while under the protection of the PTE lock. The information can be used to fail hva_to_pfn_remapped, or it can be passed back to the caller via *writable.
>
> Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up", 2016-07-05); however, even older versions have the same issue, all the way back to commit 2e2e3738af33 ("KVM: Handle vma regions with no backing page", 2008-07-20), as they also did not check whether the PFN was writable.
>
> Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
> Reported-by: David Stevens <stevensd@google.com>
> Cc: 3pvd@google.com
> Cc: Jann Horn <jannh@google.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: stable@vger.kernel.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> [OP: backport to 4.14, adjust follow_pte() -> follow_pte_pmd()]
> Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
> ---
>  virt/kvm/kvm_main.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
All now queued up, thanks.
greg k-h