If a stage-2 page-table contains an executable, read-only mapping at the pte level (e.g. due to dirty logging being enabled), a subsequent write fault to the same page which tries to install a larger block mapping (e.g. due to dirty logging having been disabled) will erroneously inherit the exec permission and consequently skip I-cache invalidation for the rest of the block.
Ensure that exec permission is only inherited by write faults when the new mapping is of the same size as the existing one. A subsequent instruction abort will result in I-cache invalidation for the entire block mapping.
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
---
Found by code inspection, rather than something actually going wrong.
 arch/arm64/kvm/mmu.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8c0035cab6b6..69dc36d1d486 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1326,7 +1326,7 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
 	return true;
 }
 
-static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
+static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr, unsigned long sz)
 {
 	pud_t *pudp;
 	pmd_t *pmdp;
@@ -1338,9 +1338,9 @@ static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
 		return false;
 
 	if (pudp)
-		return kvm_s2pud_exec(pudp);
+		return sz == PUD_SIZE && kvm_s2pud_exec(pudp);
 	else if (pmdp)
-		return kvm_s2pmd_exec(pmdp);
+		return sz == PMD_SIZE && kvm_s2pmd_exec(pmdp);
 	else
 		return kvm_s2pte_exec(ptep);
 }
@@ -1958,7 +1958,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 * execute permissions, and we preserve whatever we have.
 	 */
 	needs_exec = exec_fault ||
-		(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
+		(fault_status == FSC_PERM &&
+		 stage2_is_exec(kvm, fault_ipa, vma_pagesize));
 
 	if (vma_pagesize == PUD_SIZE) {
 		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
Hey Will,
On Wednesday 22 Jul 2020 at 14:15:10 (+0100), Will Deacon wrote:
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 8c0035cab6b6..69dc36d1d486 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1326,7 +1326,7 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
>  	return true;
>  }
>  
> -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> +static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr, unsigned long sz)
>  {
>  	pud_t *pudp;
>  	pmd_t *pmdp;
> @@ -1338,9 +1338,9 @@ static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
>  		return false;
>  
>  	if (pudp)
> -		return kvm_s2pud_exec(pudp);
> +		return sz == PUD_SIZE && kvm_s2pud_exec(pudp);
>  	else if (pmdp)
> -		return kvm_s2pmd_exec(pmdp);
> +		return sz == PMD_SIZE && kvm_s2pmd_exec(pmdp);
>  	else
> 		return kvm_s2pte_exec(ptep);
This wants a 'sz == PAGE_SIZE' check, otherwise you'll happily inherit the exec flag when a PTE has exec rights while you create a block mapping on top.
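IOW, the pte case should check the fault size explicitly too -- a rough sketch of what I mean (untested):

	else
		/* Inherit exec only if the existing leaf really is a pte */
		return sz == PAGE_SIZE && kvm_s2pte_exec(ptep);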
Also, I think it should be safe to make the PMD and PUD case more permissive, as 'sz <= PMD_SIZE' for instance, as the icache invalidation shouldn't be an issue there? That probably doesn't matter all that much though.
>  }
> @@ -1958,7 +1958,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  	 * execute permissions, and we preserve whatever we have.
>  	 */
>  	needs_exec = exec_fault ||
> -		(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
> +		(fault_status == FSC_PERM &&
> +		 stage2_is_exec(kvm, fault_ipa, vma_pagesize));
>  
>  	if (vma_pagesize == PUD_SIZE) {
>  		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
> -- 
> 2.28.0.rc0.105.gf9edc3c819-goog
FWIW, I reproduced the issue with a dummy guest accessing memory just the wrong way, and toggling dirty logging at the right moment. And this patch + my suggestion above seems to cure things. So, with the above applied:
Reviewed-by: Quentin Perret <qperret@google.com>
Tested-by: Quentin Perret <qperret@google.com>
Thanks,
Quentin
Hi Quentin,
On Wed, Jul 22, 2020 at 04:54:28PM +0100, Quentin Perret wrote:
> On Wednesday 22 Jul 2020 at 14:15:10 (+0100), Will Deacon wrote:
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 8c0035cab6b6..69dc36d1d486 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -1326,7 +1326,7 @@ static bool stage2_get_leaf_entry(struct kvm *kvm, phys_addr_t addr,
> >  	return true;
> >  }
> >  
> > -static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> > +static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr, unsigned long sz)
> >  {
> >  	pud_t *pudp;
> >  	pmd_t *pmdp;
> > @@ -1338,9 +1338,9 @@ static bool stage2_is_exec(struct kvm *kvm, phys_addr_t addr)
> >  		return false;
> >  
> >  	if (pudp)
> > -		return kvm_s2pud_exec(pudp);
> > +		return sz == PUD_SIZE && kvm_s2pud_exec(pudp);
> >  	else if (pmdp)
> > -		return kvm_s2pmd_exec(pmdp);
> > +		return sz == PMD_SIZE && kvm_s2pmd_exec(pmdp);
> >  	else
> > 		return kvm_s2pte_exec(ptep);
> This wants a 'sz == PAGE_SIZE' check, otherwise you'll happily inherit
> the exec flag when a PTE has exec rights while you create a block
> mapping on top.
Nice catch! Somehow I thought we always had PAGE_SIZE in the 'else' case, but that's obviously not true now that you've pointed it out.
> Also, I think it should be safe to make the PMD and PUD case more
> permissive, as 'sz <= PMD_SIZE' for instance, as the icache invalidation
> shouldn't be an issue there? That probably doesn't matter all that much
> though.
I'll make that change anyway.
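So the end result would be something along these lines (untested until I respin the patch):

	if (pudp)
		/* A mapping of PUD_SIZE or smaller is covered by the PUD's I-cache invalidation */
		return sz <= PUD_SIZE && kvm_s2pud_exec(pudp);
	else if (pmdp)
		/* Likewise for the PMD case */
		return sz <= PMD_SIZE && kvm_s2pmd_exec(pmdp);
	else
		/* Only a same-size (page) mapping may inherit exec from a pte */
		return sz == PAGE_SIZE && kvm_s2pte_exec(ptep);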
> >  }
> > @@ -1958,7 +1958,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  	 * execute permissions, and we preserve whatever we have.
> >  	 */
> >  	needs_exec = exec_fault ||
> > -		(fault_status == FSC_PERM && stage2_is_exec(kvm, fault_ipa));
> > +		(fault_status == FSC_PERM &&
> > +		 stage2_is_exec(kvm, fault_ipa, vma_pagesize));
> >  
> >  	if (vma_pagesize == PUD_SIZE) {
> >  		pud_t new_pud = kvm_pfn_pud(pfn, mem_type);
> > -- 
> > 2.28.0.rc0.105.gf9edc3c819-goog
> FWIW, I reproduced the issue with a dummy guest accessing memory just
> the wrong way, and toggling dirty logging at the right moment. And this
> patch + my suggestion above seems to cure things.
Testing?! It'll never catch on...
> So, with the above applied:
> Reviewed-by: Quentin Perret <qperret@google.com>
> Tested-by: Quentin Perret <qperret@google.com>
Cheers. v2 coming up.
Will