Correctly handle the MVPG instruction when issued by a VSIE guest.
Fixes: a3508fbe9dc6d ("KVM: s390: vsie: initial support for nested virtualization") Cc: stable@vger.kernel.org Signed-off-by: Claudio Imbrenda imbrenda@linux.ibm.com --- arch/s390/kvm/vsie.c | 73 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+)
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c index ada49583e530..6c3069868acd 100644 --- a/arch/s390/kvm/vsie.c +++ b/arch/s390/kvm/vsie.c @@ -977,6 +977,75 @@ static int handle_stfle(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) return 0; }
+static u64 vsie_get_register(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page, u8 reg) +{ + reg &= 0xf; + switch (reg) { + case 15: + return vsie_page->scb_s.gg15; + case 14: + return vsie_page->scb_s.gg14; + default: + return vcpu->run->s.regs.gprs[reg]; + } +} + +static int vsie_handle_mvpg(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) +{ + struct kvm_s390_sie_block *scb_s = &vsie_page->scb_s; + unsigned long r1, r2, mask = PAGE_MASK; + int rc; + + if (psw_bits(scb_s->gpsw).eaba == PSW_BITS_AMODE_24BIT) + mask = 0xfff000; + else if (psw_bits(scb_s->gpsw).eaba == PSW_BITS_AMODE_31BIT) + mask = 0x7ffff000; + + r1 = vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 20) & mask; + r2 = vsie_get_register(vcpu, vsie_page, scb_s->ipb >> 16) & mask; + rc = kvm_s390_vsie_mvpg_check(vcpu, r1, r2, &vsie_page->scb_o->mcic); + + /* + * Guest translation was not successful. The host needs to forward + * the intercept to the guest and let the guest fix its page tables. + * The guest needs then to retry the instruction. + */ + if (rc == -ENOENT) + return 1; + + retry_vsie_icpt(vsie_page); + + /* + * Guest translation was not successful. The page tables of the guest + * are broken. Try again and let the hardware deliver the fault. + */ + if (rc == -EFAULT) + return 0; + + /* + * Guest translation was successful. The host needs to fix up its + * page tables and retry the instruction in the nested guest. + * In case of failure, the instruction will intercept again, and + * a different path will be taken. + */ + if (!rc) { + kvm_s390_shadow_fault(vcpu, vsie_page->gmap, r2); + kvm_s390_shadow_fault(vcpu, vsie_page->gmap, r1); + return 0; + } + + /* + * An exception happened during guest translation, it needs to be + * delivered to the guest. This can happen if the host has EDAT1 + * enabled and the guest has not, or for other causes. The guest + * needs to process the exception and return to the nested guest. + */ + if (rc > 0) + return kvm_s390_inject_prog_cond(vcpu, rc); + + return 1; +} + /* * Run the vsie on a shadow scb and a shadow gmap, without any further * sanity checks, handling SIE faults. @@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) if ((scb_s->ipa & 0xf000) != 0xf000) scb_s->ipa += 0x1000; break; + case ICPT_PARTEXEC: + if (scb_s->ipa == 0xb254) + rc = vsie_handle_mvpg(vcpu, vsie_page); + break; } return rc; }
On 18.12.20 15:18, Claudio Imbrenda wrote:
Correctly handle the MVPG instruction when issued by a VSIE guest.
I remember that MVPG SIE documentation was completely crazy and full of corner cases. :)
Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot that
1. "This interception can only happen for guests with DAT disabled ..." 2. KVM does not make use of any mvpg state inside the SCB.
Can this be observed with Linux guests?
Can I get some information on what information is stored at [0xc0, 0xd) inside the SCB? I assume it's:
0xc0: guest physical address of source PTE 0xc8: guest physical address of target PTE
Also, which conditions have to be met such that we get a ICPT_PARTEXEC:
a) State of guest DAT (I assume off?) b) State of PTEs: What happens if there is no PTE (I assume we need two PTEs, otherwise no such intercept)? I assume we get an intercept if one of both PTEs is not present or the destination PTE is protected. Correct?
So, when we (g1) get an intercept for g3, can we be sure 0xc0 and 0xc8 in the scb are both valid g1 addresses pointing at our PTE, and what do we know about these PTEs (one not present or destination protected)?
[...]
/*
- Run the vsie on a shadow scb and a shadow gmap, without any further
- sanity checks, handling SIE faults.
@@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) if ((scb_s->ipa & 0xf000) != 0xf000) scb_s->ipa += 0x1000; break;
- case ICPT_PARTEXEC:
if (scb_s->ipa == 0xb254)
Old code hat "/* MVPG only */" - why is this condition now necessary?
rc = vsie_handle_mvpg(vcpu, vsie_page);
} return rc;break;
}
On Sun, 20 Dec 2020 11:13:57 +0100 David Hildenbrand david@redhat.com wrote:
On 18.12.20 15:18, Claudio Imbrenda wrote:
Correctly handle the MVPG instruction when issued by a VSIE guest.
I remember that MVPG SIE documentation was completely crazy and full of corner cases. :)
you remember correctly
Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot that
- "This interception can only happen for guests with DAT disabled
..." 2. KVM does not make use of any mvpg state inside the SCB.
Can this be observed with Linux guests?
a Linux guest will typically not run with DAT disabled
Can I get some information on what information is stored at [0xc0, 0xd) inside the SCB? I assume it's:
0xc0: guest physical address of source PTE 0xc8: guest physical address of target PTE
yes (plus 3 flags in the lower bits of each)
Also, which conditions have to be met such that we get a ICPT_PARTEXEC:
a) State of guest DAT (I assume off?) b) State of PTEs: What happens if there is no PTE (I assume we need two PTEs, otherwise no such intercept)? I assume we get an intercept if one of both PTEs is not present or the destination PTE is protected. Correct?
we get the intercept if the guest has DAT off, and at least one of the host PTEs is invalid (and I think if the destination is valid but protected)
So, when we (g1) get an intercept for g3, can we be sure 0xc0 and 0xc8 in the scb are both valid g1 addresses pointing at our PTE, and what do we know about these PTEs (one not present or destination protected)?
we only know that at least one of the following holds true: * source invalid * destination invalid * destination protected
there is a bit that tells us if the destination was protected (bit 62), but that does not exclude an invalid source
[...]
/*
- Run the vsie on a shadow scb and a shadow gmap, without any
further
- sanity checks, handling SIE faults.
@@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) if ((scb_s->ipa & 0xf000) != 0xf000) scb_s->ipa += 0x1000; break;
- case ICPT_PARTEXEC:
if (scb_s->ipa == 0xb254)
Old code hat "/* MVPG only */" - why is this condition now necessary?
old code was wrong ;)
rc = vsie_handle_mvpg(vcpu, vsie_page);
} return rc;break;
}
On 04.01.21 16:22, Claudio Imbrenda wrote:
On Sun, 20 Dec 2020 11:13:57 +0100 David Hildenbrand david@redhat.com wrote:
On 18.12.20 15:18, Claudio Imbrenda wrote:
Correctly handle the MVPG instruction when issued by a VSIE guest.
I remember that MVPG SIE documentation was completely crazy and full of corner cases. :)
you remember correctly
Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot that
- "This interception can only happen for guests with DAT disabled
..." 2. KVM does not make use of any mvpg state inside the SCB.
Can this be observed with Linux guests?
a Linux guest will typically not run with DAT disabled
Can I get some information on what information is stored at [0xc0, 0xd) inside the SCB? I assume it's:
0xc0: guest physical address of source PTE 0xc8: guest physical address of target PTE
yes (plus 3 flags in the lower bits of each)
Thanks! Do the flags tell us what the deal with the PTE was? If yes, what's the meaning of the separate flags?
I assume something like "invalid, proteced, ??"
I'm asking because I think we can handle this a little easier.
[...]
/*
- Run the vsie on a shadow scb and a shadow gmap, without any
further
- sanity checks, handling SIE faults.
@@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) if ((scb_s->ipa & 0xf000) != 0xf000) scb_s->ipa += 0x1000; break;
- case ICPT_PARTEXEC:
if (scb_s->ipa == 0xb254)
Old code hat "/* MVPG only */" - why is this condition now necessary?
old code was wrong ;)
arch/s390/kvm/intercept.c:handle_partial_execution() we only seem to handle
1. MVPG 2. SIGP PEI
The latter is only relevant for external calls. IIRC, this is only active with sigp interpretation - which is never active under vsie (ECA_SIGPI).
On Mon, 4 Jan 2021 17:08:15 +0100 David Hildenbrand david@redhat.com wrote:
On 04.01.21 16:22, Claudio Imbrenda wrote:
On Sun, 20 Dec 2020 11:13:57 +0100 David Hildenbrand david@redhat.com wrote:
On 18.12.20 15:18, Claudio Imbrenda wrote:
Correctly handle the MVPG instruction when issued by a VSIE guest.
I remember that MVPG SIE documentation was completely crazy and full of corner cases. :)
you remember correctly
Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot that
- "This interception can only happen for guests with DAT disabled
..." 2. KVM does not make use of any mvpg state inside the SCB.
Can this be observed with Linux guests?
a Linux guest will typically not run with DAT disabled
Can I get some information on what information is stored at [0xc0, 0xd) inside the SCB? I assume it's:
0xc0: guest physical address of source PTE 0xc8: guest physical address of target PTE
yes (plus 3 flags in the lower bits of each)
Thanks! Do the flags tell us what the deal with the PTE was? If yes, what's the meaning of the separate flags?
I assume something like "invalid, proteced, ??"
bit 61 indicates that the address is a region or segment table entry, when EDAT applies bit 62 is "protected" when the protected bit is set in the segment table entry (or region, if EDAT applies) bit 63 is set when the operand was translated with a real-space ASCE
but you can check if the PTE is valid just by dereferencing the pointers...
I'm asking because I think we can handle this a little easier.
what is your idea?
[...]
/*
- Run the vsie on a shadow scb and a shadow gmap, without any
further
- sanity checks, handling SIE faults.
@@ -1063,6 +1132,10 @@ static int do_vsie_run(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page) if ((scb_s->ipa & 0xf000) != 0xf000) scb_s->ipa += 0x1000; break;
- case ICPT_PARTEXEC:
if (scb_s->ipa == 0xb254)
Old code hat "/* MVPG only */" - why is this condition now necessary?
old code was wrong ;)
arch/s390/kvm/intercept.c:handle_partial_execution() we only seem to handle
- MVPG
- SIGP PEI
The latter is only relevant for external calls. IIRC, this is only active with sigp interpretation - which is never active under vsie (ECA_SIGPI).
I think putting an explicit check is better than just a jump in the dark.
On 04.01.21 17:36, Claudio Imbrenda wrote:
On Mon, 4 Jan 2021 17:08:15 +0100 David Hildenbrand david@redhat.com wrote:
On 04.01.21 16:22, Claudio Imbrenda wrote:
On Sun, 20 Dec 2020 11:13:57 +0100 David Hildenbrand david@redhat.com wrote:
On 18.12.20 15:18, Claudio Imbrenda wrote:
Correctly handle the MVPG instruction when issued by a VSIE guest.
I remember that MVPG SIE documentation was completely crazy and full of corner cases. :)
you remember correctly
Looking at arch/s390/kvm/intercept.c:handle_mvpg_pei(), I can spot that
- "This interception can only happen for guests with DAT disabled
..." 2. KVM does not make use of any mvpg state inside the SCB.
Can this be observed with Linux guests?
a Linux guest will typically not run with DAT disabled
Can I get some information on what information is stored at [0xc0, 0xd) inside the SCB? I assume it's:
0xc0: guest physical address of source PTE 0xc8: guest physical address of target PTE
yes (plus 3 flags in the lower bits of each)
Thanks! Do the flags tell us what the deal with the PTE was? If yes, what's the meaning of the separate flags?
I assume something like "invalid, proteced, ??"
bit 61 indicates that the address is a region or segment table entry, when EDAT applies bit 62 is "protected" when the protected bit is set in the segment table entry (or region, if EDAT applies) bit 63 is set when the operand was translated with a real-space ASCE
Thanks!
but you can check if the PTE is valid just by dereferencing the pointers...
The pgtable might already have been unshadowed and repurposed I think. So for vSIE, the PTE content, therefore, is a little unreliable.
We could, of course, try using them to make a guess.
"Likely valid" "Likely invalid"
A rerun of the vSIE will fixup any wrong guess.
I'm asking because I think we can handle this a little easier.
what is your idea?
I was wondering if we can
1. avoid essentially two translations per PTE, obtaining the information we need while tying to shadow. kvm_s390_shadow_fault() on steroids that
a) gives us the last guest pte address (tricky for segment.region table I think ... will have to think about this) b) the final protection
2. avoid faulting/shadowing in case we know an entry is not problematic. E.g., no need to shadow/fault the source in case the PTE is there and not invalid. "likely valid" case above.
The idea would be to call the new kvm_s390_shadow_fault() two times (or only once due to our guesses) and either rerun the vsie, inject an interrupt, or create the partial intercept.
Essentially avoiding kvm_s390_vsie_mvpg_check(). Will have to think about this.
[...]
arch/s390/kvm/intercept.c:handle_partial_execution() we only seem to handle
- MVPG
- SIGP PEI
The latter is only relevant for external calls. IIRC, this is only active with sigp interpretation - which is never active under vsie (ECA_SIGPI).
I think putting an explicit check is better than just a jump in the dark.
Agreed, but that should then be called out somewhere why the change as done. (e.g., separate cleanup patch)
linux-stable-mirror@lists.linaro.org