On 09.02.21 16:43, Claudio Imbrenda wrote:
Extend kvm_s390_shadow_fault to return the pointer to the valid leaf DAT table entry, or to the invalid entry.
Also return some flags in the lower bits of the address:
DAT_PROT: indicates that DAT protection applies because of the protection bit in the segment (or, if EDAT, region) tables
NOT_PTE: indicates that the address of the DAT table entry returned does not refer to a PTE, but to a segment or region table.
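A minimal sketch of how such flags can share a word with a table entry address, assuming entries are 8-byte aligned so the low three bits are free. DAT_PROT and NOT_PTE use the values from the patch; the helper names (datptr_pack, datptr_addr, datptr_is_protected, datptr_is_pte) and the DATPTR_FLAGS mask are hypothetical and not part of the kernel.

```c
#include <assert.h>

/* Flag values as defined by the patch in gaccess.h. */
#define DAT_PROT 2
#define NOT_PTE 4

/* Hypothetical mask covering both flag bits. */
#define DATPTR_FLAGS ((unsigned long)(DAT_PROT | NOT_PTE))

/* Hypothetical helper: combine an entry address with the two flags. */
static unsigned long datptr_pack(unsigned long entry_addr,
				 int dat_protection, int not_pte)
{
	/* Assumption: DAT table entries are 8 bytes, so these bits are zero. */
	assert((entry_addr & DATPTR_FLAGS) == 0);

	if (dat_protection)
		entry_addr |= DAT_PROT;
	if (not_pte)
		entry_addr |= NOT_PTE;
	return entry_addr;
}

/* Hypothetical helper: strip the flags to recover the entry address. */
static unsigned long datptr_addr(unsigned long datptr)
{
	return datptr & ~DATPTR_FLAGS;
}

/* Hypothetical helper: nonzero if DAT protection applies. */
static int datptr_is_protected(unsigned long datptr)
{
	return (datptr & DAT_PROT) != 0;
}

/* Hypothetical helper: nonzero if the address refers to a PTE. */
static int datptr_is_pte(unsigned long datptr)
{
	return (datptr & NOT_PTE) == 0;
}
```

A caller would first test the flags and then mask them off with datptr_addr() before using the value as an address.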
I've been thinking about one possible issue, but I think it's not actually an issue. Just sharing so others can verify:
In case our guest uses huge pages / gigantic pages / ASCE R, we create fake page tables (GMAP_SHADOW_FAKE_TABLE).
So, it could be that kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup() succeeds, but we have a fake PTE in our hands and have lost the actual guest STE/RTE address. (I think it could be recovered somehow via page->index, though.)
But I guess, if there is a fake PTE, then there is actually nothing that could go wrong in gmap_shadow_page() anymore that could lead us to respond with something wrong to the guest. We can only really fail with -EINVAL, -ENOMEM or -EFAULT.
So if the guest changed anything in the meantime (e.g., zapped a segment), we would have unshadowed the whole fake page table hierarchy and would simply retry.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Cc: stable@vger.kernel.org
 arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
 arch/s390/kvm/gaccess.h |  5 ++++-
 arch/s390/kvm/vsie.c    |  8 ++++----
 3 files changed, 33 insertions(+), 10 deletions(-)
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 6d6b57059493..e0ab83f051d2 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
  * kvm_s390_shadow_tables - walk the guest page table and create shadow tables
  * @sg: pointer to the shadow guest address space structure
  * @saddr: faulting address in the shadow gmap
- * @pgt: pointer to the page table address result
+ * @pgt: pointer to the beginning of the page table for the given address if
+ *	 successful (return value 0), or to the first invalid DAT entry in
+ *	 case of exceptions (return value > 0)
  * @fake: pgt references contiguous guest memory block, not a pgtable
  */
 static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
@@ -1034,6 +1036,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rfte.val = ptr;
 			goto shadow_r2t;
 		}
+		*pgt = ptr + vaddr.rfx * 8;
 		rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val);
Using
gmap_read_table(parent, *pgt, &rfte.val);
or similar with a local variable might then be even clearer. But no strong opinion.
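A rough sketch of that idea, assuming the entry address is computed once and then reused for both the read and the reported pointer. Everything here is a hypothetical stand-in for illustration: struct mock_parent, mock_read_table() and read_entry() are not kernel code, and mock_read_table() only imitates the shape of gmap_read_table() (it returns -1 on failure rather than a kernel error code).

```c
#include <assert.h>

/* Hypothetical stand-in for the parent address space: one small table. */
struct mock_parent {
	unsigned long base;		/* guest address of the table origin */
	unsigned long entries[4];	/* table contents, one value per entry */
};

/*
 * Hypothetical stand-in for gmap_read_table(): read the entry at guest
 * address addr. Returns 0 on success, -1 if addr is outside the table
 * or not on an 8-byte entry boundary.
 */
static int mock_read_table(const struct mock_parent *parent,
			   unsigned long addr, unsigned long *val)
{
	unsigned long idx;

	if (addr < parent->base || (addr - parent->base) % 8)
		return -1;
	idx = (addr - parent->base) / 8;
	if (idx >= 4)
		return -1;
	*val = parent->entries[idx];
	return 0;
}

/*
 * Compute the entry address once, store it in *pgt, and read through the
 * same value, so the reported address and the accessed address cannot
 * drift apart.
 */
static int read_entry(const struct mock_parent *parent, unsigned long ptr,
		      unsigned long index, unsigned long *pgt,
		      unsigned long *val)
{
	*pgt = ptr + index * 8;
	return mock_read_table(parent, *pgt, val);
}
```

Note that *pgt is set before the read, so it still names the entry that was accessed when the read itself fails.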
 		if (rc)
 			return rc;
@@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rste.val = ptr;
 			goto shadow_r3t;
 		}
+		*pgt = ptr + vaddr.rsx * 8;
 		rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val);
 		if (rc)
 			return rc;
@@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			rtte.val = ptr;
 			goto shadow_sgt;
 		}
+		*pgt = ptr + vaddr.rtx * 8;
 		rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val);
 		if (rc)
 			return rc;
@@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
 			ste.val = ptr;
 			goto shadow_pgt;
 		}
+		*pgt = ptr + vaddr.sx * 8;
 		rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val);
 		if (rc)
 			return rc;
@@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
  * @vcpu: virtual cpu
  * @sg: pointer to the shadow guest address space structure
  * @saddr: faulting address in the shadow gmap
+ * @datptr: will contain the address of the faulting DAT table entry, or of
+ *	    the valid leaf, plus some flags
  * Returns: - 0 if the shadow fault was successfully resolved
  *	    - > 0 (pgm exception code) on exceptions while faulting
@@ -1165,11 +1173,11 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
  *	    - -ENOMEM if out of memory
  */
 int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
-			  unsigned long saddr)
+			  unsigned long saddr, unsigned long *datptr)
 {
 	union vaddress vaddr;
 	union page_table_entry pte;
-	unsigned long pgt;
+	unsigned long pgt = 0;
 	int dat_protection, fake;
 	int rc;
@@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
 		pte.val = pgt + vaddr.px * PAGE_SIZE;
 		goto shadow_page;
 	}
-	if (!rc)
-		rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val);
+	switch (rc) {
+	case PGM_SEGMENT_TRANSLATION:
+	case PGM_REGION_THIRD_TRANS:
+	case PGM_REGION_SECOND_TRANS:
+	case PGM_REGION_FIRST_TRANS:
+		pgt |= NOT_PTE;
+		break;
+	case 0:
+		pgt += vaddr.px * 8;
+		rc = gmap_read_table(sg->parent, pgt, &pte.val);
+	}
+	if (*datptr)
+		*datptr = pgt | dat_protection * DAT_PROT;
 	if (!rc && pte.i)
 		rc = PGM_PAGE_TRANSLATION;
 	if (!rc && pte.z)
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index f4c51756c462..fec26bbb17ba 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -359,7 +359,10 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
 int ipte_lock_held(struct kvm_vcpu *vcpu);
 int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
+#define DAT_PROT 2
+#define NOT_PTE 4
What if our guest is using ASCE.R?