Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to transferring to a KVM guest, which is roughly equivalent to an exit to userspace and processes many of the same pending actions. While the task cannot be in an rseq critical section as the KVM path is reachable only via ioctl(KVM_RUN), the side effects that apply to rseq outside of a critical section still apply, e.g. the CPU ID needs to be updated if the task is migrated.
Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults and other badness in userspace VMMs that use rseq in combination with KVM, e.g. due to the CPU ID being stale after task migration.
Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function") Reported-by: Peter Foley pefoley@google.com Bisected-by: Doug Evans dje@google.com Cc: Shakeel Butt shakeelb@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com --- kernel/entry/kvm.c | 4 +++- kernel/rseq.c | 4 ++-- 2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c index 49972ee99aff..049fd06b4c3d 100644 --- a/kernel/entry/kvm.c +++ b/kernel/entry/kvm.c @@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu, unsigned long ti_work) if (ti_work & _TIF_NEED_RESCHED) schedule();
- if (ti_work & _TIF_NOTIFY_RESUME) + if (ti_work & _TIF_NOTIFY_RESUME) { tracehook_notify_resume(NULL); + rseq_handle_notify_resume(NULL, NULL); + }
ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work); if (ret) diff --git a/kernel/rseq.c b/kernel/rseq.c index 35f7bd0fced0..58c79a7918cd 100644 --- a/kernel/rseq.c +++ b/kernel/rseq.c @@ -236,7 +236,7 @@ static bool in_rseq_cs(unsigned long ip, struct rseq_cs *rseq_cs)
static int rseq_ip_fixup(struct pt_regs *regs) { - unsigned long ip = instruction_pointer(regs); + unsigned long ip = regs ? instruction_pointer(regs) : 0; struct task_struct *t = current; struct rseq_cs rseq_cs; int ret; @@ -250,7 +250,7 @@ static int rseq_ip_fixup(struct pt_regs *regs) * If not nested over a rseq critical section, restart is useless. * Clear the rseq_cs pointer and return. */ - if (!in_rseq_cs(ip, &rseq_cs)) + if (!regs || !in_rseq_cs(ip, &rseq_cs)) return clear_rseq_cs(t); ret = rseq_need_restart(t, rseq_cs.flags); if (ret <= 0)