The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b18c2b9487d8e797fc0a757e57ac3645348c5fba Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier(a)arm.com>
Date: Tue, 3 Oct 2017 18:14:12 +0100
Subject: [PATCH] bus: arm-ccn: Fix use of smp_processor_id() in preemptible
context
Booting a DEBUG_PREEMPT enabled kernel on a CCN-based system
results in the following splat:
[...]
arm-ccn e8000000.ccn: No access to interrupts, using timer.
BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1
caller is debug_smp_processor_id+0x1c/0x28
CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.13.0 #6111
Hardware name: AMD Seattle/Seattle, BIOS 17:08:23 Jun 26 2017
Call trace:
[<ffff000008089e78>] dump_backtrace+0x0/0x278
[<ffff00000808a22c>] show_stack+0x24/0x30
[<ffff000008bc3bc4>] dump_stack+0x8c/0xb0
[<ffff00000852b534>] check_preemption_disabled+0xfc/0x100
[<ffff00000852b554>] debug_smp_processor_id+0x1c/0x28
[<ffff000008551bd8>] arm_ccn_probe+0x358/0x4f0
[...]
as we use smp_processor_id() in the wrong context.
Turn this into a get_cpu()/put_cpu() that extends over the CPU hotplug
registration, making sure that we don't race against a CPU down operation.
Signed-off-by: Marc Zyngier <marc.zyngier(a)arm.com>
Acked-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: stable(a)vger.kernel.org # 4.2+
Signed-off-by: Pawel Moll <pawel.moll(a)arm.com>
diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c
index bbc1a2ef9639..508a1a389b7e 100644
--- a/drivers/bus/arm-ccn.c
+++ b/drivers/bus/arm-ccn.c
@@ -1300,7 +1300,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
}
/* Pick one CPU which we will use to collect data from CCN... */
- cpumask_set_cpu(smp_processor_id(), &ccn->dt.cpu);
+ cpumask_set_cpu(get_cpu(), &ccn->dt.cpu);
/* Also make sure that the overflow interrupt is handled by this CPU */
if (ccn->irq) {
@@ -1317,10 +1317,12 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
cpuhp_state_add_instance_nocalls(CPUHP_AP_PERF_ARM_CCN_ONLINE,
&ccn->dt.node);
+ put_cpu();
return 0;
error_pmu_register:
error_set_affinity:
+ put_cpu();
error_choose_name:
ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id);
for (i = 0; i < ccn->num_xps; i++)
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 24771179c5c138f0ea3ef88b7972979f62f2d5db Mon Sep 17 00:00:00 2001
From: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Date: Sun, 27 Aug 2017 11:06:50 +0100
Subject: [PATCH] bus: arm-ccn: Check memory allocation failure
Check memory allocation failures and return -ENOMEM in such cases
This avoids a potential NULL pointer dereference.
Signed-off-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Acked-by: Scott Branden <scott.branden(a)broadcom.com>
Cc: stable(a)vger.kernel.org # 3.17+
Signed-off-by: Pawel Moll <pawel.moll(a)arm.com>
diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c
index 7625bf762acb..a7951662f85b 100644
--- a/drivers/bus/arm-ccn.c
+++ b/drivers/bus/arm-ccn.c
@@ -1271,6 +1271,10 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
int len = snprintf(NULL, 0, "ccn_%d", ccn->dt.id);
name = devm_kzalloc(ccn->dev, len + 1, GFP_KERNEL);
+ if (!name) {
+ err = -ENOMEM;
+ goto error_choose_name;
+ }
snprintf(name, len + 1, "ccn_%d", ccn->dt.id);
}
@@ -1319,6 +1323,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
error_pmu_register:
error_set_affinity:
+error_choose_name:
ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id);
for (i = 0; i < ccn->num_xps; i++)
writel(0, ccn->xp[i].base + CCN_XP_DT_CONTROL);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 24771179c5c138f0ea3ef88b7972979f62f2d5db Mon Sep 17 00:00:00 2001
From: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Date: Sun, 27 Aug 2017 11:06:50 +0100
Subject: [PATCH] bus: arm-ccn: Check memory allocation failure
Check memory allocation failures and return -ENOMEM in such cases
This avoids a potential NULL pointer dereference.
Signed-off-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Acked-by: Scott Branden <scott.branden(a)broadcom.com>
Cc: stable(a)vger.kernel.org # 3.17+
Signed-off-by: Pawel Moll <pawel.moll(a)arm.com>
diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c
index 7625bf762acb..a7951662f85b 100644
--- a/drivers/bus/arm-ccn.c
+++ b/drivers/bus/arm-ccn.c
@@ -1271,6 +1271,10 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
int len = snprintf(NULL, 0, "ccn_%d", ccn->dt.id);
name = devm_kzalloc(ccn->dev, len + 1, GFP_KERNEL);
+ if (!name) {
+ err = -ENOMEM;
+ goto error_choose_name;
+ }
snprintf(name, len + 1, "ccn_%d", ccn->dt.id);
}
@@ -1319,6 +1323,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn)
error_pmu_register:
error_set_affinity:
+error_choose_name:
ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id);
for (i = 0; i < ccn->num_xps; i++)
writel(0, ccn->xp[i].base + CCN_XP_DT_CONTROL);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4608af8aa53e7f3922ddee695d023b7bcd5cb35b Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier(a)arm.com>
Date: Tue, 3 Oct 2017 18:14:13 +0100
Subject: [PATCH] bus: arm-cci: Fix use of smp_processor_id() in preemptible
context
The ARM CCI driver seem to be using smp_processor_id() in a
preemptible context, which is likely to make a DEBUG_PREMPT
kernel scream at boot time.
Turn this into a get_cpu()/put_cpu() that extends over the CPU
hotplug registration, making sure that we don't race against
a CPU down operation.
Signed-off-by: Marc Zyngier <marc.zyngier(a)arm.com>
Acked-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: stable(a)vger.kernel.org # 4.2+
Signed-off-by: Pawel Moll <pawel.moll(a)arm.com>
diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c
index 3c29d36702a8..5426c04fe24b 100644
--- a/drivers/bus/arm-cci.c
+++ b/drivers/bus/arm-cci.c
@@ -1755,14 +1755,17 @@ static int cci_pmu_probe(struct platform_device *pdev)
raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock);
mutex_init(&cci_pmu->reserve_mutex);
atomic_set(&cci_pmu->active_events, 0);
- cpumask_set_cpu(smp_processor_id(), &cci_pmu->cpus);
+ cpumask_set_cpu(get_cpu(), &cci_pmu->cpus);
ret = cci_pmu_init(cci_pmu, pdev);
- if (ret)
+ if (ret) {
+ put_cpu();
return ret;
+ }
cpuhp_state_add_instance_nocalls(CPUHP_AP_PERF_ARM_CCI_ONLINE,
&cci_pmu->node);
+ put_cpu();
pr_info("ARM %s PMU driver probed", cci_pmu->model->name);
return 0;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 64afe6e9eb4841f35317da4393de21a047a883b3 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier(a)arm.com>
Date: Thu, 16 Nov 2017 17:58:17 +0000
Subject: [PATCH] KVM: arm/arm64: vgic-its: Preserve the revious read from the
pending table
The current pending table parsing code assumes that we keep the
previous read of the pending bits, but keep that variable in
the current block, making sure it is discarded on each loop.
We end-up using whatever is on the stack. Who knows, it might
just be the right thing...
Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table")
Cc: stable(a)vger.kernel.org # 4.8
Reported-by: AKASHI Takahiro <takahiro.akashi(a)linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall(a)linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier(a)arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall(a)linaro.org>
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 1f761a9991e7..cb2d0a2dbe5a 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -421,6 +421,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
u32 *intids;
int nr_irqs, i;
unsigned long flags;
+ u8 pendmask;
nr_irqs = vgic_copy_lpi_list(vcpu, &intids);
if (nr_irqs < 0)
@@ -428,7 +429,6 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
for (i = 0; i < nr_irqs; i++) {
int byte_offset, bit_nr;
- u8 pendmask;
byte_offset = intids[i] / BITS_PER_BYTE;
bit_nr = intids[i] % BITS_PER_BYTE;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 64afe6e9eb4841f35317da4393de21a047a883b3 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier(a)arm.com>
Date: Thu, 16 Nov 2017 17:58:17 +0000
Subject: [PATCH] KVM: arm/arm64: vgic-its: Preserve the revious read from the
pending table
The current pending table parsing code assumes that we keep the
previous read of the pending bits, but keep that variable in
the current block, making sure it is discarded on each loop.
We end-up using whatever is on the stack. Who knows, it might
just be the right thing...
Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table")
Cc: stable(a)vger.kernel.org # 4.8
Reported-by: AKASHI Takahiro <takahiro.akashi(a)linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall(a)linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier(a)arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall(a)linaro.org>
diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c
index 1f761a9991e7..cb2d0a2dbe5a 100644
--- a/virt/kvm/arm/vgic/vgic-its.c
+++ b/virt/kvm/arm/vgic/vgic-its.c
@@ -421,6 +421,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
u32 *intids;
int nr_irqs, i;
unsigned long flags;
+ u8 pendmask;
nr_irqs = vgic_copy_lpi_list(vcpu, &intids);
if (nr_irqs < 0)
@@ -428,7 +429,6 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
for (i = 0; i < nr_irqs; i++) {
int byte_offset, bit_nr;
- u8 pendmask;
byte_offset = intids[i] / BITS_PER_BYTE;
bit_nr = intids[i] % BITS_PER_BYTE;
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5553b142be11e794ebc0805950b2e8313f93d718 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier(a)arm.com>
Date: Thu, 16 Nov 2017 17:58:21 +0000
Subject: [PATCH] arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
VTTBR_BADDR_MASK is used to sanity check the size and alignment of the
VTTBR address. It seems to currently be off by one, thereby only
allowing up to 39-bit addresses (instead of 40-bit) and also
insufficiently checking the alignment. This patch fixes it.
This patch is the 32bit pendent of Kristina's arm64 fix, and
she deserves the actual kudos for pinpointing that one.
Fixes: f7ed45be3ba52 ("KVM: ARM: World-switch implementation")
Cc: <stable(a)vger.kernel.org> # 3.9
Reported-by: Kristina Martsenko <kristina.martsenko(a)arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall(a)linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier(a)arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall(a)linaro.org>
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index c8781450905b..3ab8b3781bfe 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -161,8 +161,7 @@
#else
#define VTTBR_X (5 - KVM_T0SZ)
#endif
-#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
-#define VTTBR_BADDR_MASK (((_AC(1, ULL) << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
+#define VTTBR_BADDR_MASK (((_AC(1, ULL) << (40 - VTTBR_X)) - 1) << VTTBR_X)
#define VTTBR_VMID_SHIFT _AC(48, ULL)
#define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5553b142be11e794ebc0805950b2e8313f93d718 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <marc.zyngier(a)arm.com>
Date: Thu, 16 Nov 2017 17:58:21 +0000
Subject: [PATCH] arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
VTTBR_BADDR_MASK is used to sanity check the size and alignment of the
VTTBR address. It seems to currently be off by one, thereby only
allowing up to 39-bit addresses (instead of 40-bit) and also
insufficiently checking the alignment. This patch fixes it.
This patch is the 32bit pendent of Kristina's arm64 fix, and
she deserves the actual kudos for pinpointing that one.
Fixes: f7ed45be3ba52 ("KVM: ARM: World-switch implementation")
Cc: <stable(a)vger.kernel.org> # 3.9
Reported-by: Kristina Martsenko <kristina.martsenko(a)arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall(a)linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier(a)arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall(a)linaro.org>
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index c8781450905b..3ab8b3781bfe 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -161,8 +161,7 @@
#else
#define VTTBR_X (5 - KVM_T0SZ)
#endif
-#define VTTBR_BADDR_SHIFT (VTTBR_X - 1)
-#define VTTBR_BADDR_MASK (((_AC(1, ULL) << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT)
+#define VTTBR_BADDR_MASK (((_AC(1, ULL) << (40 - VTTBR_X)) - 1) << VTTBR_X)
#define VTTBR_VMID_SHIFT _AC(48, ULL)
#define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a87e55f89f0b0dc541d89248a8445635936a3858 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Wed, 29 Nov 2017 17:37:30 +0200
Subject: [PATCH] drm/i915: Fix vblank timestamp/frame counter jumps on gen2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previously I was under the impression that the scanline counter
reads 0 when the pipe is off. Turns out that's not correct, and
instead the scanline counter simply stops when the pipe stops, and
it retains it's last value until the pipe starts up again, at which
point the scanline counter jumps to vblank start.
These jumps can cause the timestamp to jump backwards by one frame.
Since we use the timestamps to guesstimage also the frame counter
value on gen2, that would cause the frame counter to also jump
backwards, which leads to a massice difference from the previous value.
The end result is that flips/vblank events don't appear to complete as
they're stuck waiting for the frame counter to catch up to that massive
difference.
Fix the problem properly by actually making sure the scanline counter
has started to move before we assume that it's safe to enable vblank
processing.
v2: Less pointless duplication in the code (Chris)
Cc: stable(a)vger.kernel.org
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Fixes: b7792d8b54cc ("drm/i915: Wait for pipe to start before sampling vblank timestamps on gen2")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171129153732.3612-1-ville.s…
(cherry picked from commit 8fedd64dabc86d0f31a0d1e152be3aa23c323553)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 878acc432a4b..e8ccf89cb17b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1000,7 +1000,8 @@ enum transcoder intel_pipe_to_cpu_transcoder(struct drm_i915_private *dev_priv,
return crtc->config->cpu_transcoder;
}
-static bool pipe_dsl_stopped(struct drm_i915_private *dev_priv, enum pipe pipe)
+static bool pipe_scanline_is_moving(struct drm_i915_private *dev_priv,
+ enum pipe pipe)
{
i915_reg_t reg = PIPEDSL(pipe);
u32 line1, line2;
@@ -1015,7 +1016,28 @@ static bool pipe_dsl_stopped(struct drm_i915_private *dev_priv, enum pipe pipe)
msleep(5);
line2 = I915_READ(reg) & line_mask;
- return line1 == line2;
+ return line1 != line2;
+}
+
+static void wait_for_pipe_scanline_moving(struct intel_crtc *crtc, bool state)
+{
+ struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
+ enum pipe pipe = crtc->pipe;
+
+ /* Wait for the display line to settle/start moving */
+ if (wait_for(pipe_scanline_is_moving(dev_priv, pipe) == state, 100))
+ DRM_ERROR("pipe %c scanline %s wait timed out\n",
+ pipe_name(pipe), onoff(state));
+}
+
+static void intel_wait_for_pipe_scanline_stopped(struct intel_crtc *crtc)
+{
+ wait_for_pipe_scanline_moving(crtc, false);
+}
+
+static void intel_wait_for_pipe_scanline_moving(struct intel_crtc *crtc)
+{
+ wait_for_pipe_scanline_moving(crtc, true);
}
/*
@@ -1038,7 +1060,6 @@ static void intel_wait_for_pipe_off(struct intel_crtc *crtc)
{
struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
enum transcoder cpu_transcoder = crtc->config->cpu_transcoder;
- enum pipe pipe = crtc->pipe;
if (INTEL_GEN(dev_priv) >= 4) {
i915_reg_t reg = PIPECONF(cpu_transcoder);
@@ -1049,9 +1070,7 @@ static void intel_wait_for_pipe_off(struct intel_crtc *crtc)
100))
WARN(1, "pipe_off wait timed out\n");
} else {
- /* Wait for the display line to settle */
- if (wait_for(pipe_dsl_stopped(dev_priv, pipe), 100))
- WARN(1, "pipe_off wait timed out\n");
+ intel_wait_for_pipe_scanline_stopped(crtc);
}
}
@@ -1936,15 +1955,14 @@ static void intel_enable_pipe(struct intel_crtc *crtc)
POSTING_READ(reg);
/*
- * Until the pipe starts DSL will read as 0, which would cause
- * an apparent vblank timestamp jump, which messes up also the
- * frame count when it's derived from the timestamps. So let's
- * wait for the pipe to start properly before we call
- * drm_crtc_vblank_on()
+ * Until the pipe starts PIPEDSL reads will return a stale value,
+ * which causes an apparent vblank timestamp jump when PIPEDSL
+ * resets to its proper value. That also messes up the frame count
+ * when it's derived from the timestamps. So let's wait for the
+ * pipe to start properly before we call drm_crtc_vblank_on()
*/
- if (dev->max_vblank_count == 0 &&
- wait_for(intel_get_crtc_scanline(crtc) != crtc->scanline_offset, 50))
- DRM_ERROR("pipe %c didn't start\n", pipe_name(pipe));
+ if (dev->max_vblank_count == 0)
+ intel_wait_for_pipe_scanline_moving(crtc);
}
/**
@@ -14643,6 +14661,8 @@ void i830_enable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe)
void i830_disable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe)
{
+ struct intel_crtc *crtc = intel_get_crtc_for_pipe(dev_priv, pipe);
+
DRM_DEBUG_KMS("disabling pipe %c due to force quirk\n",
pipe_name(pipe));
@@ -14652,8 +14672,7 @@ void i830_disable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe)
I915_WRITE(PIPECONF(pipe), 0);
POSTING_READ(PIPECONF(pipe));
- if (wait_for(pipe_dsl_stopped(dev_priv, pipe), 100))
- DRM_ERROR("pipe %c off wait timed out\n", pipe_name(pipe));
+ intel_wait_for_pipe_scanline_stopped(crtc);
I915_WRITE(DPLL(pipe), DPLL_VGA_MODE_DIS);
POSTING_READ(DPLL(pipe));
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e19182c0fff451e3744c1107d98f072e7ca377a0 Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Mon, 4 Dec 2017 13:11:45 -0500
Subject: [PATCH] btrfs: fix missing error return in btrfs_drop_snapshot
If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the
error but then return 0 anyway due to mixing err and ret.
Fixes: 79787eaab4612 ("btrfs: replace many BUG_ONs with proper error handling")
Cc: <stable(a)vger.kernel.org> # v3.4+
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 784d41e95ed9..16e46ee3cd16 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9220,6 +9220,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
ret = btrfs_del_root(trans, fs_info, &root->root_key);
if (ret) {
btrfs_abort_transaction(trans, ret);
+ err = ret;
goto out_end_trans;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e19182c0fff451e3744c1107d98f072e7ca377a0 Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Mon, 4 Dec 2017 13:11:45 -0500
Subject: [PATCH] btrfs: fix missing error return in btrfs_drop_snapshot
If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the
error but then return 0 anyway due to mixing err and ret.
Fixes: 79787eaab4612 ("btrfs: replace many BUG_ONs with proper error handling")
Cc: <stable(a)vger.kernel.org> # v3.4+
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 784d41e95ed9..16e46ee3cd16 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9220,6 +9220,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
ret = btrfs_del_root(trans, fs_info, &root->root_key);
if (ret) {
btrfs_abort_transaction(trans, ret);
+ err = ret;
goto out_end_trans;
}
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f775b13eedee2f7f3c6fdd4e90fb79090ce5d339 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel(a)redhat.com>
Date: Tue, 14 Nov 2017 16:54:23 -0500
Subject: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.
This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.
This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.
This can fix a bug where KVM calls get_user_pages while owning the
FPU, and the file system ends up requesting the FPU again:
[258270.527947] __warn+0xcb/0xf0
[258270.527948] warn_slowpath_null+0x1d/0x20
[258270.527951] kernel_fpu_disable+0x3f/0x50
[258270.527953] __kernel_fpu_begin+0x49/0x100
[258270.527955] kernel_fpu_begin+0xe/0x10
[258270.527958] crc32c_pcl_intel_update+0x84/0xb0
[258270.527961] crypto_shash_update+0x3f/0x110
[258270.527968] crc32c+0x63/0x8a [libcrc32c]
[258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
[258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data]
[258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
[258270.527988] submit_io+0x170/0x1b0 [dm_bufio]
[258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio]
[258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio]
[258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio]
[258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
[258270.528002] shrink_slab.part.40+0x1f5/0x420
[258270.528004] shrink_node+0x22c/0x320
[258270.528006] do_try_to_free_pages+0xf5/0x330
[258270.528008] try_to_free_pages+0xe9/0x190
[258270.528009] __alloc_pages_slowpath+0x40f/0xba0
[258270.528011] __alloc_pages_nodemask+0x209/0x260
[258270.528014] alloc_pages_vma+0x1f1/0x250
[258270.528017] do_huge_pmd_anonymous_page+0x123/0x660
[258270.528021] handle_mm_fault+0xfd3/0x1330
[258270.528025] __get_user_pages+0x113/0x640
[258270.528027] get_user_pages+0x4f/0x60
[258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
[258270.528108] try_async_pf+0x66/0x230 [kvm]
[258270.528135] tdp_page_fault+0x130/0x280 [kvm]
[258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm]
[258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel]
[258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy.
Cc: stable(a)vger.kernel.org
Signed-off-by: Rik van Riel <riel(a)redhat.com>
Suggested-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 977de5fb968b..62527e053ee4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
+ /*
+ * QEMU userspace and the guest each have their own FPU state.
+ * In vcpu_run, we switch between the user and guest FPU contexts.
+ * While running a VCPU, the VCPU thread will have the guest FPU
+ * context.
+ *
+ * Note that while the PKRU state lives inside the fpu registers,
+ * it is switched out separately at VMENTER and VMEXIT time. The
+ * "guest_fpu" state here contains the guest FPU context, with the
+ * host PRKU bits.
+ */
+ struct fpu user_fpu;
struct fpu guest_fpu;
+
u64 xcr0;
u64 guest_supported_xcr0;
u32 guest_xstate_size;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eee8e7faf1af..c8da1680a7d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2937,7 +2937,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
srcu_read_unlock(&vcpu->kvm->srcu, idx);
pagefault_enable();
kvm_x86_ops->vcpu_put(vcpu);
- kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
}
@@ -5254,13 +5253,10 @@ static void emulator_halt(struct x86_emulate_ctxt *ctxt)
static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
{
- preempt_disable();
- kvm_load_guest_fpu(emul_to_vcpu(ctxt));
}
static void emulator_put_fpu(struct x86_emulate_ctxt *ctxt)
{
- preempt_enable();
}
static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
@@ -6952,7 +6948,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
preempt_disable();
kvm_x86_ops->prepare_guest_switch(vcpu);
- kvm_load_guest_fpu(vcpu);
/*
* Disable IRQs before setting IN_GUEST_MODE. Posted interrupt
@@ -7297,12 +7292,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
}
+ kvm_load_guest_fpu(vcpu);
+
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
- goto out;
+ goto out_fpu;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
@@ -7311,6 +7308,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
else
r = vcpu_run(vcpu);
+out_fpu:
+ kvm_put_guest_fpu(vcpu);
out:
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
@@ -7704,32 +7703,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 |= X86_CR0_ET;
}
+/* Swap (qemu) user FPU context for the guest FPU context. */
void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
- if (vcpu->guest_fpu_loaded)
- return;
-
- /*
- * Restore all possible states in the guest,
- * and assume host would use all available bits.
- * Guest xcr0 would be loaded later.
- */
- vcpu->guest_fpu_loaded = 1;
- __kernel_fpu_begin();
+ preempt_disable();
+ copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
/* PKRU is separately restored in kvm_x86_ops->run. */
__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
~XFEATURE_MASK_PKRU);
+ preempt_enable();
trace_kvm_fpu(1);
}
+/* When vcpu_run ends, restore user space FPU context. */
void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
- if (!vcpu->guest_fpu_loaded)
- return;
-
- vcpu->guest_fpu_loaded = 0;
+ preempt_disable();
copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
- __kernel_fpu_end();
+ copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
+ preempt_enable();
++vcpu->stat.fpu_reload;
trace_kvm_fpu(0);
}
@@ -7846,7 +7838,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
* To avoid have the INIT path from kvm_apic_has_events() that be
* called with loaded FPU and does not let userspace fix the state.
*/
- kvm_put_guest_fpu(vcpu);
+ if (init_event)
+ kvm_put_guest_fpu(vcpu);
mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
XFEATURE_MASK_BNDREGS);
if (mpx_state_buffer)
@@ -7855,6 +7848,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
XFEATURE_MASK_BNDCSR);
if (mpx_state_buffer)
memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
+ if (init_event)
+ kvm_load_guest_fpu(vcpu);
}
if (!init_event) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 893d6d606cd0..6bdd4b9f6611 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -232,7 +232,7 @@ struct kvm_vcpu {
struct mutex mutex;
struct kvm_run *run;
- int guest_fpu_loaded, guest_xcr0_loaded;
+ int guest_xcr0_loaded;
struct swait_queue_head wq;
struct pid __rcu *pid;
int sigset_active;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f775b13eedee2f7f3c6fdd4e90fb79090ce5d339 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel(a)redhat.com>
Date: Tue, 14 Nov 2017 16:54:23 -0500
Subject: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.
This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.
This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.
This can fix a bug where KVM calls get_user_pages while owning the
FPU, and the file system ends up requesting the FPU again:
[258270.527947] __warn+0xcb/0xf0
[258270.527948] warn_slowpath_null+0x1d/0x20
[258270.527951] kernel_fpu_disable+0x3f/0x50
[258270.527953] __kernel_fpu_begin+0x49/0x100
[258270.527955] kernel_fpu_begin+0xe/0x10
[258270.527958] crc32c_pcl_intel_update+0x84/0xb0
[258270.527961] crypto_shash_update+0x3f/0x110
[258270.527968] crc32c+0x63/0x8a [libcrc32c]
[258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
[258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data]
[258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
[258270.527988] submit_io+0x170/0x1b0 [dm_bufio]
[258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio]
[258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio]
[258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio]
[258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
[258270.528002] shrink_slab.part.40+0x1f5/0x420
[258270.528004] shrink_node+0x22c/0x320
[258270.528006] do_try_to_free_pages+0xf5/0x330
[258270.528008] try_to_free_pages+0xe9/0x190
[258270.528009] __alloc_pages_slowpath+0x40f/0xba0
[258270.528011] __alloc_pages_nodemask+0x209/0x260
[258270.528014] alloc_pages_vma+0x1f1/0x250
[258270.528017] do_huge_pmd_anonymous_page+0x123/0x660
[258270.528021] handle_mm_fault+0xfd3/0x1330
[258270.528025] __get_user_pages+0x113/0x640
[258270.528027] get_user_pages+0x4f/0x60
[258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
[258270.528108] try_async_pf+0x66/0x230 [kvm]
[258270.528135] tdp_page_fault+0x130/0x280 [kvm]
[258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm]
[258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel]
[258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy.
Cc: stable(a)vger.kernel.org
Signed-off-by: Rik van Riel <riel(a)redhat.com>
Suggested-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 977de5fb968b..62527e053ee4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
+ /*
+ * QEMU userspace and the guest each have their own FPU state.
+ * In vcpu_run, we switch between the user and guest FPU contexts.
+ * While running a VCPU, the VCPU thread will have the guest FPU
+ * context.
+ *
+ * Note that while the PKRU state lives inside the fpu registers,
+ * it is switched out separately at VMENTER and VMEXIT time. The
+ * "guest_fpu" state here contains the guest FPU context, with the
+ * host PRKU bits.
+ */
+ struct fpu user_fpu;
struct fpu guest_fpu;
+
u64 xcr0;
u64 guest_supported_xcr0;
u32 guest_xstate_size;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eee8e7faf1af..c8da1680a7d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2937,7 +2937,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
srcu_read_unlock(&vcpu->kvm->srcu, idx);
pagefault_enable();
kvm_x86_ops->vcpu_put(vcpu);
- kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
}
@@ -5254,13 +5253,10 @@ static void emulator_halt(struct x86_emulate_ctxt *ctxt)
static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
{
- preempt_disable();
- kvm_load_guest_fpu(emul_to_vcpu(ctxt));
}
static void emulator_put_fpu(struct x86_emulate_ctxt *ctxt)
{
- preempt_enable();
}
static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
@@ -6952,7 +6948,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
preempt_disable();
kvm_x86_ops->prepare_guest_switch(vcpu);
- kvm_load_guest_fpu(vcpu);
/*
* Disable IRQs before setting IN_GUEST_MODE. Posted interrupt
@@ -7297,12 +7292,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
}
+ kvm_load_guest_fpu(vcpu);
+
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
- goto out;
+ goto out_fpu;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
@@ -7311,6 +7308,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
else
r = vcpu_run(vcpu);
+out_fpu:
+ kvm_put_guest_fpu(vcpu);
out:
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
@@ -7704,32 +7703,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 |= X86_CR0_ET;
}
+/* Swap (qemu) user FPU context for the guest FPU context. */
void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
- if (vcpu->guest_fpu_loaded)
- return;
-
- /*
- * Restore all possible states in the guest,
- * and assume host would use all available bits.
- * Guest xcr0 would be loaded later.
- */
- vcpu->guest_fpu_loaded = 1;
- __kernel_fpu_begin();
+ preempt_disable();
+ copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
/* PKRU is separately restored in kvm_x86_ops->run. */
__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
~XFEATURE_MASK_PKRU);
+ preempt_enable();
trace_kvm_fpu(1);
}
+/* When vcpu_run ends, restore user space FPU context. */
void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
- if (!vcpu->guest_fpu_loaded)
- return;
-
- vcpu->guest_fpu_loaded = 0;
+ preempt_disable();
copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
- __kernel_fpu_end();
+ copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
+ preempt_enable();
++vcpu->stat.fpu_reload;
trace_kvm_fpu(0);
}
@@ -7846,7 +7838,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
* To avoid have the INIT path from kvm_apic_has_events() that be
* called with loaded FPU and does not let userspace fix the state.
*/
- kvm_put_guest_fpu(vcpu);
+ if (init_event)
+ kvm_put_guest_fpu(vcpu);
mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
XFEATURE_MASK_BNDREGS);
if (mpx_state_buffer)
@@ -7855,6 +7848,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
XFEATURE_MASK_BNDCSR);
if (mpx_state_buffer)
memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
+ if (init_event)
+ kvm_load_guest_fpu(vcpu);
}
if (!init_event) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 893d6d606cd0..6bdd4b9f6611 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -232,7 +232,7 @@ struct kvm_vcpu {
struct mutex mutex;
struct kvm_run *run;
- int guest_fpu_loaded, guest_xcr0_loaded;
+ int guest_xcr0_loaded;
struct swait_queue_head wq;
struct pid __rcu *pid;
int sigset_active;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f775b13eedee2f7f3c6fdd4e90fb79090ce5d339 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel(a)redhat.com>
Date: Tue, 14 Nov 2017 16:54:23 -0500
Subject: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.
This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.
This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.
This can fix a bug where KVM calls get_user_pages while owning the
FPU, and the file system ends up requesting the FPU again:
[258270.527947] __warn+0xcb/0xf0
[258270.527948] warn_slowpath_null+0x1d/0x20
[258270.527951] kernel_fpu_disable+0x3f/0x50
[258270.527953] __kernel_fpu_begin+0x49/0x100
[258270.527955] kernel_fpu_begin+0xe/0x10
[258270.527958] crc32c_pcl_intel_update+0x84/0xb0
[258270.527961] crypto_shash_update+0x3f/0x110
[258270.527968] crc32c+0x63/0x8a [libcrc32c]
[258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
[258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data]
[258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
[258270.527988] submit_io+0x170/0x1b0 [dm_bufio]
[258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio]
[258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio]
[258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio]
[258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
[258270.528002] shrink_slab.part.40+0x1f5/0x420
[258270.528004] shrink_node+0x22c/0x320
[258270.528006] do_try_to_free_pages+0xf5/0x330
[258270.528008] try_to_free_pages+0xe9/0x190
[258270.528009] __alloc_pages_slowpath+0x40f/0xba0
[258270.528011] __alloc_pages_nodemask+0x209/0x260
[258270.528014] alloc_pages_vma+0x1f1/0x250
[258270.528017] do_huge_pmd_anonymous_page+0x123/0x660
[258270.528021] handle_mm_fault+0xfd3/0x1330
[258270.528025] __get_user_pages+0x113/0x640
[258270.528027] get_user_pages+0x4f/0x60
[258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
[258270.528108] try_async_pf+0x66/0x230 [kvm]
[258270.528135] tdp_page_fault+0x130/0x280 [kvm]
[258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm]
[258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel]
[258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy.
Cc: stable(a)vger.kernel.org
Signed-off-by: Rik van Riel <riel(a)redhat.com>
Suggested-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 977de5fb968b..62527e053ee4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
+ /*
+ * QEMU userspace and the guest each have their own FPU state.
+ * In vcpu_run, we switch between the user and guest FPU contexts.
+ * While running a VCPU, the VCPU thread will have the guest FPU
+ * context.
+ *
+ * Note that while the PKRU state lives inside the fpu registers,
+ * it is switched out separately at VMENTER and VMEXIT time. The
+ * "guest_fpu" state here contains the guest FPU context, with the
+ * host PRKU bits.
+ */
+ struct fpu user_fpu;
struct fpu guest_fpu;
+
u64 xcr0;
u64 guest_supported_xcr0;
u32 guest_xstate_size;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eee8e7faf1af..c8da1680a7d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2937,7 +2937,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
srcu_read_unlock(&vcpu->kvm->srcu, idx);
pagefault_enable();
kvm_x86_ops->vcpu_put(vcpu);
- kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
}
@@ -5254,13 +5253,10 @@ static void emulator_halt(struct x86_emulate_ctxt *ctxt)
static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
{
- preempt_disable();
- kvm_load_guest_fpu(emul_to_vcpu(ctxt));
}
static void emulator_put_fpu(struct x86_emulate_ctxt *ctxt)
{
- preempt_enable();
}
static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
@@ -6952,7 +6948,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
preempt_disable();
kvm_x86_ops->prepare_guest_switch(vcpu);
- kvm_load_guest_fpu(vcpu);
/*
* Disable IRQs before setting IN_GUEST_MODE. Posted interrupt
@@ -7297,12 +7292,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
}
+ kvm_load_guest_fpu(vcpu);
+
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
- goto out;
+ goto out_fpu;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
@@ -7311,6 +7308,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
else
r = vcpu_run(vcpu);
+out_fpu:
+ kvm_put_guest_fpu(vcpu);
out:
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
@@ -7704,32 +7703,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 |= X86_CR0_ET;
}
+/* Swap (qemu) user FPU context for the guest FPU context. */
void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
- if (vcpu->guest_fpu_loaded)
- return;
-
- /*
- * Restore all possible states in the guest,
- * and assume host would use all available bits.
- * Guest xcr0 would be loaded later.
- */
- vcpu->guest_fpu_loaded = 1;
- __kernel_fpu_begin();
+ preempt_disable();
+ copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
/* PKRU is separately restored in kvm_x86_ops->run. */
__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
~XFEATURE_MASK_PKRU);
+ preempt_enable();
trace_kvm_fpu(1);
}
+/* When vcpu_run ends, restore user space FPU context. */
void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
- if (!vcpu->guest_fpu_loaded)
- return;
-
- vcpu->guest_fpu_loaded = 0;
+ preempt_disable();
copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
- __kernel_fpu_end();
+ copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
+ preempt_enable();
++vcpu->stat.fpu_reload;
trace_kvm_fpu(0);
}
@@ -7846,7 +7838,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
* To avoid have the INIT path from kvm_apic_has_events() that be
* called with loaded FPU and does not let userspace fix the state.
*/
- kvm_put_guest_fpu(vcpu);
+ if (init_event)
+ kvm_put_guest_fpu(vcpu);
mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
XFEATURE_MASK_BNDREGS);
if (mpx_state_buffer)
@@ -7855,6 +7848,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
XFEATURE_MASK_BNDCSR);
if (mpx_state_buffer)
memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
+ if (init_event)
+ kvm_load_guest_fpu(vcpu);
}
if (!init_event) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 893d6d606cd0..6bdd4b9f6611 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -232,7 +232,7 @@ struct kvm_vcpu {
struct mutex mutex;
struct kvm_run *run;
- int guest_fpu_loaded, guest_xcr0_loaded;
+ int guest_xcr0_loaded;
struct swait_queue_head wq;
struct pid __rcu *pid;
int sigset_active;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f775b13eedee2f7f3c6fdd4e90fb79090ce5d339 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel(a)redhat.com>
Date: Tue, 14 Nov 2017 16:54:23 -0500
Subject: [PATCH] x86,kvm: move qemu/guest FPU switching out to vcpu_run
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Currently, every time a VCPU is scheduled out, the host kernel will
first save the guest FPU/xstate context, then load the qemu userspace
FPU context, only to then immediately save the qemu userspace FPU
context back to memory. When scheduling in a VCPU, the same extraneous
FPU loads and saves are done.
This could be avoided by moving from a model where the guest FPU is
loaded and stored with preemption disabled, to a model where the
qemu userspace FPU is swapped out for the guest FPU context for
the duration of the KVM_RUN ioctl.
This is done under the VCPU mutex, which is also taken when other
tasks inspect the VCPU FPU context, so the code should already be
safe for this change. That should come as no surprise, given that
s390 already has this optimization.
This can fix a bug where KVM calls get_user_pages while owning the
FPU, and the file system ends up requesting the FPU again:
[258270.527947] __warn+0xcb/0xf0
[258270.527948] warn_slowpath_null+0x1d/0x20
[258270.527951] kernel_fpu_disable+0x3f/0x50
[258270.527953] __kernel_fpu_begin+0x49/0x100
[258270.527955] kernel_fpu_begin+0xe/0x10
[258270.527958] crc32c_pcl_intel_update+0x84/0xb0
[258270.527961] crypto_shash_update+0x3f/0x110
[258270.527968] crc32c+0x63/0x8a [libcrc32c]
[258270.527975] dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
[258270.527978] node_prepare_for_write+0x44/0x70 [dm_persistent_data]
[258270.527985] dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
[258270.527988] submit_io+0x170/0x1b0 [dm_bufio]
[258270.527992] __write_dirty_buffer+0x89/0x90 [dm_bufio]
[258270.527994] __make_buffer_clean+0x4f/0x80 [dm_bufio]
[258270.527996] __try_evict_buffer+0x42/0x60 [dm_bufio]
[258270.527998] dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
[258270.528002] shrink_slab.part.40+0x1f5/0x420
[258270.528004] shrink_node+0x22c/0x320
[258270.528006] do_try_to_free_pages+0xf5/0x330
[258270.528008] try_to_free_pages+0xe9/0x190
[258270.528009] __alloc_pages_slowpath+0x40f/0xba0
[258270.528011] __alloc_pages_nodemask+0x209/0x260
[258270.528014] alloc_pages_vma+0x1f1/0x250
[258270.528017] do_huge_pmd_anonymous_page+0x123/0x660
[258270.528021] handle_mm_fault+0xfd3/0x1330
[258270.528025] __get_user_pages+0x113/0x640
[258270.528027] get_user_pages+0x4f/0x60
[258270.528063] __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
[258270.528108] try_async_pf+0x66/0x230 [kvm]
[258270.528135] tdp_page_fault+0x130/0x280 [kvm]
[258270.528149] kvm_mmu_page_fault+0x60/0x120 [kvm]
[258270.528158] handle_ept_violation+0x91/0x170 [kvm_intel]
[258270.528162] vmx_handle_exit+0x1ca/0x1400 [kvm_intel]
No performance changes were detected in quick ping-pong tests on
my 4 socket system, which is expected since an FPU+xstate load is
on the order of 0.1us, while ping-ponging between CPUs is on the
order of 20us, and somewhat noisy.
Cc: stable(a)vger.kernel.org
Signed-off-by: Rik van Riel <riel(a)redhat.com>
Suggested-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
[Fixed a bug where reset_vcpu called put_fpu without preceding load_fpu,
which happened inside from KVM_CREATE_VCPU ioctl. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 977de5fb968b..62527e053ee4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -536,7 +536,20 @@ struct kvm_vcpu_arch {
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
+ /*
+ * QEMU userspace and the guest each have their own FPU state.
+ * In vcpu_run, we switch between the user and guest FPU contexts.
+ * While running a VCPU, the VCPU thread will have the guest FPU
+ * context.
+ *
+ * Note that while the PKRU state lives inside the fpu registers,
+ * it is switched out separately at VMENTER and VMEXIT time. The
+ * "guest_fpu" state here contains the guest FPU context, with the
+ * host PRKU bits.
+ */
+ struct fpu user_fpu;
struct fpu guest_fpu;
+
u64 xcr0;
u64 guest_supported_xcr0;
u32 guest_xstate_size;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index eee8e7faf1af..c8da1680a7d6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2937,7 +2937,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
srcu_read_unlock(&vcpu->kvm->srcu, idx);
pagefault_enable();
kvm_x86_ops->vcpu_put(vcpu);
- kvm_put_guest_fpu(vcpu);
vcpu->arch.last_host_tsc = rdtsc();
}
@@ -5254,13 +5253,10 @@ static void emulator_halt(struct x86_emulate_ctxt *ctxt)
static void emulator_get_fpu(struct x86_emulate_ctxt *ctxt)
{
- preempt_disable();
- kvm_load_guest_fpu(emul_to_vcpu(ctxt));
}
static void emulator_put_fpu(struct x86_emulate_ctxt *ctxt)
{
- preempt_enable();
}
static int emulator_intercept(struct x86_emulate_ctxt *ctxt,
@@ -6952,7 +6948,6 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
preempt_disable();
kvm_x86_ops->prepare_guest_switch(vcpu);
- kvm_load_guest_fpu(vcpu);
/*
* Disable IRQs before setting IN_GUEST_MODE. Posted interrupt
@@ -7297,12 +7292,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
}
}
+ kvm_load_guest_fpu(vcpu);
+
if (unlikely(vcpu->arch.complete_userspace_io)) {
int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
vcpu->arch.complete_userspace_io = NULL;
r = cui(vcpu);
if (r <= 0)
- goto out;
+ goto out_fpu;
} else
WARN_ON(vcpu->arch.pio.count || vcpu->mmio_needed);
@@ -7311,6 +7308,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
else
r = vcpu_run(vcpu);
+out_fpu:
+ kvm_put_guest_fpu(vcpu);
out:
post_kvm_run_save(vcpu);
kvm_sigset_deactivate(vcpu);
@@ -7704,32 +7703,25 @@ static void fx_init(struct kvm_vcpu *vcpu)
vcpu->arch.cr0 |= X86_CR0_ET;
}
+/* Swap (qemu) user FPU context for the guest FPU context. */
void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
- if (vcpu->guest_fpu_loaded)
- return;
-
- /*
- * Restore all possible states in the guest,
- * and assume host would use all available bits.
- * Guest xcr0 would be loaded later.
- */
- vcpu->guest_fpu_loaded = 1;
- __kernel_fpu_begin();
+ preempt_disable();
+ copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
/* PKRU is separately restored in kvm_x86_ops->run. */
__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
~XFEATURE_MASK_PKRU);
+ preempt_enable();
trace_kvm_fpu(1);
}
+/* When vcpu_run ends, restore user space FPU context. */
void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
{
- if (!vcpu->guest_fpu_loaded)
- return;
-
- vcpu->guest_fpu_loaded = 0;
+ preempt_disable();
copy_fpregs_to_fpstate(&vcpu->arch.guest_fpu);
- __kernel_fpu_end();
+ copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
+ preempt_enable();
++vcpu->stat.fpu_reload;
trace_kvm_fpu(0);
}
@@ -7846,7 +7838,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
* To avoid have the INIT path from kvm_apic_has_events() that be
* called with loaded FPU and does not let userspace fix the state.
*/
- kvm_put_guest_fpu(vcpu);
+ if (init_event)
+ kvm_put_guest_fpu(vcpu);
mpx_state_buffer = get_xsave_addr(&vcpu->arch.guest_fpu.state.xsave,
XFEATURE_MASK_BNDREGS);
if (mpx_state_buffer)
@@ -7855,6 +7848,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
XFEATURE_MASK_BNDCSR);
if (mpx_state_buffer)
memset(mpx_state_buffer, 0, sizeof(struct mpx_bndcsr));
+ if (init_event)
+ kvm_load_guest_fpu(vcpu);
}
if (!init_event) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 893d6d606cd0..6bdd4b9f6611 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -232,7 +232,7 @@ struct kvm_vcpu {
struct mutex mutex;
struct kvm_run *run;
- int guest_fpu_loaded, guest_xcr0_loaded;
+ int guest_xcr0_loaded;
struct swait_queue_head wq;
struct pid __rcu *pid;
int sigset_active;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4dca6ea1d9432052afb06baf2e3ae78188a4410b Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Fri, 8 Dec 2017 15:13:27 +0000
Subject: [PATCH] KEYS: add missing permission check for request_key()
destination
When the request_key() syscall is not passed a destination keyring, it
links the requested key (if constructed) into the "default" request-key
keyring. This should require Write permission to the keyring. However,
there is actually no permission check.
This can be abused to add keys to any keyring to which only Search
permission is granted. This is because Search permission allows joining
the keyring. keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_SESSION_KEYRING)
then will set the default request-key keyring to the session keyring.
Then, request_key() can be used to add keys to the keyring.
Both negatively and positively instantiated keys can be added using this
method. Adding negative keys is trivial. Adding a positive key is a
bit trickier. It requires that either /sbin/request-key positively
instantiates the key, or that another thread adds the key to the process
keyring at just the right time, such that request_key() misses it
initially but then finds it in construct_alloc_key().
Fix this bug by checking for Write permission to the keyring in
construct_get_dest_keyring() when the default keyring is being used.
We don't do the permission check for non-default keyrings because that
was already done by the earlier call to lookup_user_key(). Also,
request_key_and_link() is currently passed a 'struct key *' rather than
a key_ref_t, so the "possessed" bit is unavailable.
We also don't do the permission check for the "requestor keyring", to
continue to support the use case described by commit 8bbf4976b59f
("KEYS: Alter use of key instantiation link-to-keyring argument") where
/sbin/request-key recursively calls request_key() to add keys to the
original requestor's destination keyring. (I don't know of any users
who actually do that, though...)
Fixes: 3e30148c3d52 ("[PATCH] Keys: Make request-key create an authorisation key")
Cc: <stable(a)vger.kernel.org> # v2.6.13+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
diff --git a/security/keys/request_key.c b/security/keys/request_key.c
index c6880af8b411..114f7408feee 100644
--- a/security/keys/request_key.c
+++ b/security/keys/request_key.c
@@ -251,11 +251,12 @@ static int construct_key(struct key *key, const void *callout_info,
* The keyring selected is returned with an extra reference upon it which the
* caller must release.
*/
-static void construct_get_dest_keyring(struct key **_dest_keyring)
+static int construct_get_dest_keyring(struct key **_dest_keyring)
{
struct request_key_auth *rka;
const struct cred *cred = current_cred();
struct key *dest_keyring = *_dest_keyring, *authkey;
+ int ret;
kenter("%p", dest_keyring);
@@ -264,6 +265,8 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
/* the caller supplied one */
key_get(dest_keyring);
} else {
+ bool do_perm_check = true;
+
/* use a default keyring; falling through the cases until we
* find one that we actually have */
switch (cred->jit_keyring) {
@@ -278,8 +281,10 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
dest_keyring =
key_get(rka->dest_keyring);
up_read(&authkey->sem);
- if (dest_keyring)
+ if (dest_keyring) {
+ do_perm_check = false;
break;
+ }
}
case KEY_REQKEY_DEFL_THREAD_KEYRING:
@@ -314,11 +319,29 @@ static void construct_get_dest_keyring(struct key **_dest_keyring)
default:
BUG();
}
+
+ /*
+ * Require Write permission on the keyring. This is essential
+ * because the default keyring may be the session keyring, and
+ * joining a keyring only requires Search permission.
+ *
+ * However, this check is skipped for the "requestor keyring" so
+ * that /sbin/request-key can itself use request_key() to add
+ * keys to the original requestor's destination keyring.
+ */
+ if (dest_keyring && do_perm_check) {
+ ret = key_permission(make_key_ref(dest_keyring, 1),
+ KEY_NEED_WRITE);
+ if (ret) {
+ key_put(dest_keyring);
+ return ret;
+ }
+ }
}
*_dest_keyring = dest_keyring;
kleave(" [dk %d]", key_serial(dest_keyring));
- return;
+ return 0;
}
/*
@@ -444,11 +467,15 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
if (ctx->index_key.type == &key_type_keyring)
return ERR_PTR(-EPERM);
- user = key_user_lookup(current_fsuid());
- if (!user)
- return ERR_PTR(-ENOMEM);
+ ret = construct_get_dest_keyring(&dest_keyring);
+ if (ret)
+ goto error;
- construct_get_dest_keyring(&dest_keyring);
+ user = key_user_lookup(current_fsuid());
+ if (!user) {
+ ret = -ENOMEM;
+ goto error_put_dest_keyring;
+ }
ret = construct_alloc_key(ctx, dest_keyring, flags, user, &key);
key_user_put(user);
@@ -463,7 +490,7 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
} else if (ret == -EINPROGRESS) {
ret = 0;
} else {
- goto couldnt_alloc_key;
+ goto error_put_dest_keyring;
}
key_put(dest_keyring);
@@ -473,8 +500,9 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx,
construction_failed:
key_negate_and_link(key, key_negative_timeout, NULL, NULL);
key_put(key);
-couldnt_alloc_key:
+error_put_dest_keyring:
key_put(dest_keyring);
+error:
kleave(" = %d", ret);
return ERR_PTR(ret);
}
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e0058f3a874ebb48b25be7ff79bc3b4e59929f90 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Fri, 8 Dec 2017 15:13:27 +0000
Subject: [PATCH] ASN.1: fix out-of-bounds read when parsing indefinite length
item
In asn1_ber_decoder(), indefinitely-sized ASN.1 items were being passed
to the action functions before their lengths had been computed, using
the bogus length of 0x80 (ASN1_INDEFINITE_LENGTH). This resulted in
reading data past the end of the input buffer, when given a specially
crafted message.
Fix it by rearranging the code so that the indefinite length is resolved
before the action is called.
This bug was originally found by fuzzing the X.509 parser in userspace
using libFuzzer from the LLVM project.
KASAN report (cleaned up slightly):
BUG: KASAN: slab-out-of-bounds in memcpy ./include/linux/string.h:341 [inline]
BUG: KASAN: slab-out-of-bounds in x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366
Read of size 128 at addr ffff880035dd9eaf by task keyctl/195
CPU: 1 PID: 195 Comm: keyctl Not tainted 4.14.0-09238-g1d3b78bbc6e9 #26
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
Call Trace:
__dump_stack lib/dump_stack.c:17 [inline]
dump_stack+0xd1/0x175 lib/dump_stack.c:53
print_address_description+0x78/0x260 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x23f/0x350 mm/kasan/report.c:409
memcpy+0x1f/0x50 mm/kasan/kasan.c:302
memcpy ./include/linux/string.h:341 [inline]
x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366
asn1_ber_decoder+0xb4a/0x1fd0 lib/asn1_decoder.c:447
x509_cert_parse+0x1c7/0x620 crypto/asymmetric_keys/x509_cert_parser.c:89
x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174
asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388
key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850
SYSC_add_key security/keys/keyctl.c:122 [inline]
SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96
Allocated by task 195:
__do_kmalloc_node mm/slab.c:3675 [inline]
__kmalloc_node+0x47/0x60 mm/slab.c:3682
kvmalloc ./include/linux/mm.h:540 [inline]
SYSC_add_key security/keys/keyctl.c:104 [inline]
SyS_add_key+0x19e/0x290 security/keys/keyctl.c:62
entry_SYSCALL_64_fastpath+0x1f/0x96
Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder")
Reported-by: Alexander Potapenko <glider(a)google.com>
Cc: <stable(a)vger.kernel.org> # v3.7+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
diff --git a/lib/asn1_decoder.c b/lib/asn1_decoder.c
index 1ef0cec38d78..d77cdfc4b554 100644
--- a/lib/asn1_decoder.c
+++ b/lib/asn1_decoder.c
@@ -313,42 +313,47 @@ int asn1_ber_decoder(const struct asn1_decoder *decoder,
/* Decide how to handle the operation */
switch (op) {
- case ASN1_OP_MATCH_ANY_ACT:
- case ASN1_OP_MATCH_ANY_ACT_OR_SKIP:
- case ASN1_OP_COND_MATCH_ANY_ACT:
- case ASN1_OP_COND_MATCH_ANY_ACT_OR_SKIP:
- ret = actions[machine[pc + 1]](context, hdr, tag, data + dp, len);
- if (ret < 0)
- return ret;
- goto skip_data;
-
- case ASN1_OP_MATCH_ACT:
- case ASN1_OP_MATCH_ACT_OR_SKIP:
- case ASN1_OP_COND_MATCH_ACT_OR_SKIP:
- ret = actions[machine[pc + 2]](context, hdr, tag, data + dp, len);
- if (ret < 0)
- return ret;
- goto skip_data;
-
case ASN1_OP_MATCH:
case ASN1_OP_MATCH_OR_SKIP:
+ case ASN1_OP_MATCH_ACT:
+ case ASN1_OP_MATCH_ACT_OR_SKIP:
case ASN1_OP_MATCH_ANY:
case ASN1_OP_MATCH_ANY_OR_SKIP:
+ case ASN1_OP_MATCH_ANY_ACT:
+ case ASN1_OP_MATCH_ANY_ACT_OR_SKIP:
case ASN1_OP_COND_MATCH_OR_SKIP:
+ case ASN1_OP_COND_MATCH_ACT_OR_SKIP:
case ASN1_OP_COND_MATCH_ANY:
case ASN1_OP_COND_MATCH_ANY_OR_SKIP:
- skip_data:
+ case ASN1_OP_COND_MATCH_ANY_ACT:
+ case ASN1_OP_COND_MATCH_ANY_ACT_OR_SKIP:
+
if (!(flags & FLAG_CONS)) {
if (flags & FLAG_INDEFINITE_LENGTH) {
+ size_t tmp = dp;
+
ret = asn1_find_indefinite_length(
- data, datalen, &dp, &len, &errmsg);
+ data, datalen, &tmp, &len, &errmsg);
if (ret < 0)
goto error;
- } else {
- dp += len;
}
pr_debug("- LEAF: %zu\n", len);
}
+
+ if (op & ASN1_OP_MATCH__ACT) {
+ unsigned char act;
+
+ if (op & ASN1_OP_MATCH__ANY)
+ act = machine[pc + 1];
+ else
+ act = machine[pc + 2];
+ ret = actions[act](context, hdr, tag, data + dp, len);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (!(flags & FLAG_CONS))
+ dp += len;
pc += asn1_op_lengths[op];
goto next_op;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d9e427f6ab8142d6868eb719e6a7851aafea56b6 Mon Sep 17 00:00:00 2001
From: Jan Stancek <jstancek(a)redhat.com>
Date: Fri, 1 Dec 2017 10:50:28 +0100
Subject: [PATCH] virtio_balloon: fix increment of vb->num_pfns in
fill_balloon()
commit c7cdff0e8647 ("virtio_balloon: fix deadlock on OOM")
changed code to increment vb->num_pfns before call to
set_page_pfns(), which used to happen only after.
This patch fixes boot hang for me on ppc64le KVM guests.
Fixes: c7cdff0e8647 ("virtio_balloon: fix deadlock on OOM")
Cc: Michael S. Tsirkin <mst(a)redhat.com>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Wei Wang <wei.w.wang(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jan Stancek <jstancek(a)redhat.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 7960746f7597..a1fb52cb3f0a 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -174,13 +174,12 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num)
while ((page = balloon_page_pop(&pages))) {
balloon_page_enqueue(&vb->vb_dev_info, page);
- vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
-
set_page_pfns(vb, vb->pfns + vb->num_pfns, page);
vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE;
if (!virtio_has_feature(vb->vdev,
VIRTIO_BALLOON_F_DEFLATE_ON_OOM))
adjust_managed_page_count(page, -1);
+ vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE;
}
num_allocated_pages = vb->num_pfns;
From: Dexuan Cui <decui(a)microsoft.com>
Fixes: c2e5df616e1a ("vmbus: add per-channel sysfs info")
Without the patch, a device can't be thoroughly destroyed, because
vmbus_device_register() -> kset_create_and_add() still holds a reference
to the hv_device's device.kobj.
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Cc: Stephen Hemminger <sthemmin(a)microsoft.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: K. Y. Srinivasan <kys(a)microsoft.com>
---
drivers/hv/vmbus_drv.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 6a86746..4f3faf5 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1378,6 +1378,8 @@ void vmbus_device_unregister(struct hv_device *device_obj)
pr_debug("child device %s unregistered\n",
dev_name(&device_obj->device));
+ kset_unregister(device_obj->channels_kset);
+
/*
* Kick off the process of unregistering the device.
* This will call vmbus_remove() and eventually vmbus_device_release()
--
1.7.1
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: dvb_frontend: be sure to init dvb_frontend_handle_ioctl() return code
Author: Mauro Carvalho Chehab <mchehab(a)s-opensource.com>
Date: Wed Nov 1 17:05:39 2017 -0400
As smatch warned:
drivers/media/dvb-core/dvb_frontend.c:2468 dvb_frontend_handle_ioctl() error: uninitialized symbol 'err'.
The ioctl handler actually got a regression here: before changeset
d73dcf0cdb95 ("media: dvb_frontend: cleanup ioctl handling logic"),
the code used to return -EOPNOTSUPP if an ioctl handler was not
implemented on a driver. After the change, it may return a random
value.
Fixes: d73dcf0cdb95 ("media: dvb_frontend: cleanup ioctl handling logic")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)s-opensource.com>
Tested-by: Daniel Scheller <d.scheller(a)gmx.net>
Signed-off-by: Mauro Carvalho Chehab <mchehab(a)s-opensource.com>
drivers/media/dvb-core/dvb_frontend.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
---
diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c
index 2afaa8226342..46f977177faf 100644
--- a/drivers/media/dvb-core/dvb_frontend.c
+++ b/drivers/media/dvb-core/dvb_frontend.c
@@ -2110,7 +2110,7 @@ static int dvb_frontend_handle_ioctl(struct file *file,
struct dvb_frontend *fe = dvbdev->priv;
struct dvb_frontend_private *fepriv = fe->frontend_priv;
struct dtv_frontend_properties *c = &fe->dtv_property_cache;
- int i, err;
+ int i, err = -EOPNOTSUPP;
dev_dbg(fe->dvb->device, "%s:\n", __func__);
@@ -2145,6 +2145,7 @@ static int dvb_frontend_handle_ioctl(struct file *file,
}
}
kfree(tvp);
+ err = 0;
break;
}
case FE_GET_PROPERTY: {
@@ -2196,6 +2197,7 @@ static int dvb_frontend_handle_ioctl(struct file *file,
return -EFAULT;
}
kfree(tvp);
+ err = 0;
break;
}
Hi,
This series corrects a number of issues with NT_PRFPREG regset, most
importantly an FCSR access API regression introduced with the addition of
MSA support, and then a few smaller issues with the get/set handlers.
I have decided to factor out non-MSA and MSA context helpers as the first
step to avoid the issue with excessive indentation that would inevitably
happen if the regression fix was applied to current code as it stands.
It shouldn't be a big deal with backporting as this code hasn't changed
much since the regression, and it will make any future bacports easier.
Only a call to `init_fp_ctx' will have to be trivially resolved (though
arguably commit ac9ad83bc318 ("MIPS: prevent FP context set via ptrace
being discarded"), which has added `init_fp_ctx', would be good to
backport as far as possible instead).
These changes have been verified by examining the register state recorded
in core dumps manually with GDB, as well as by running the GDB test suite.
No user of ptrace(2) PTRACE_GETREGSET and PTRACE_SETREGSET requests is
known for the MIPS port, so this part remains not covered, however it is
assumed to remain consistent with how the creation of core file works.
See individual patch descriptions for further details.
Maciej
On Mon, 2017-12-11 at 14:51 +0100, Richard Schütz wrote:
> Hello,
>
> as per netdev-FAQ.txt I'm requesting the submission of commit
> 57629915d568c522ac1422df7bba4bee5b5c7a7c ("mac80211: Fix addition of
> mesh configuration element") to stable. Because of automatic selection
> commit 76f43b4c0a9337af22827d78de4f2b8fd5328489 ("mac80211: Remove
> invalid flag operations in mesh TSF synchronization") was introduced
> into stable recently without this accompanying fix.
It seems that due to the Fixes: tag Greg should pick it up anyway, but
in any case I don't really deal with stable myself (in that sense, I
guess wireless isn't really part of netdev...)
johannes
From: Eric Biggers <ebiggers(a)google.com>
All the ChaCha20 algorithms as well as the ARM bit-sliced AES-XTS
algorithms call skcipher_walk_virt(), then access the IV (walk.iv)
before checking whether any bytes need to be processed (walk.nbytes).
But if the input is empty, then skcipher_walk_virt() doesn't set the IV,
and the algorithms crash trying to use the uninitialized IV pointer.
Fix it by setting the IV earlier in skcipher_walk_virt(). Also fix it
for the AEAD walk functions.
This isn't a perfect solution because we can't actually align the IV to
->cra_alignmask unless there are bytes to process, for one because the
temporary buffer for the aligned IV is freed by skcipher_walk_done(),
which is only called when there are bytes to process. Thus, algorithms
that require aligned IVs will still need to avoid accessing the IV when
walk.nbytes == 0. Still, many algorithms/architectures are fine with
IVs having any alignment, and even for those that aren't, a misaligned
pointer bug is much less severe than an uninitialized pointer bug.
This change also matches the behavior of the older blkcipher_walk API.
Fixes: 0cabf2af6f5a ("crypto: skcipher - Fix crash on zero-length input")
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Cc: <stable(a)vger.kernel.org> # v4.14+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
crypto/skcipher.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 778e0ff42bfa..11af5fd6a443 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -449,6 +449,8 @@ static int skcipher_walk_skcipher(struct skcipher_walk *walk,
walk->total = req->cryptlen;
walk->nbytes = 0;
+ walk->iv = req->iv;
+ walk->oiv = req->iv;
if (unlikely(!walk->total))
return 0;
@@ -456,9 +458,6 @@ static int skcipher_walk_skcipher(struct skcipher_walk *walk,
scatterwalk_start(&walk->in, req->src);
scatterwalk_start(&walk->out, req->dst);
- walk->iv = req->iv;
- walk->oiv = req->iv;
-
walk->flags &= ~SKCIPHER_WALK_SLEEP;
walk->flags |= req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
SKCIPHER_WALK_SLEEP : 0;
@@ -510,6 +509,8 @@ static int skcipher_walk_aead_common(struct skcipher_walk *walk,
int err;
walk->nbytes = 0;
+ walk->iv = req->iv;
+ walk->oiv = req->iv;
if (unlikely(!walk->total))
return 0;
@@ -525,9 +526,6 @@ static int skcipher_walk_aead_common(struct skcipher_walk *walk,
scatterwalk_done(&walk->in, 0, walk->total);
scatterwalk_done(&walk->out, 0, walk->total);
- walk->iv = req->iv;
- walk->oiv = req->iv;
-
if (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP)
walk->flags |= SKCIPHER_WALK_SLEEP;
else
--
2.15.0