The 4.9.77 version of "x86/pti/efi: broken conversion from efi to kernel
page table" looked nicer than the 4.4.112 version, but was suboptimal on
machines booted with "pti=off" (or on AMD machines): it allocated the
pgd with an order-1 page regardless of the setting of kaiser_enabled.
Fix that by moving the definition of PGD_ALLOCATION_ORDER from
asm/pgalloc.h to asm/pgtable.h, which already defines kaiser_enabled.
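For context, the allocation site this order feeds into looks roughly
like the following (a sketch of the 4.9-era _pgd_alloc(), assumed from
the KAISER backport rather than quoted from it); with kaiser_enabled == 0
the order is now 0, so only a single 4k page is allocated:

	/* sketch: the allocator consumes PGD_ALLOCATION_ORDER directly */
	static inline pgd_t *_pgd_alloc(void)
	{
		return (pgd_t *)__get_free_pages(PGALLOC_GFP,
						 PGD_ALLOCATION_ORDER);
	}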
Fixes: 1b92c48a2eeb ("x86/pti/efi: broken conversion from efi to kernel page table")
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Steven Sistare <steven.sistare(a)oracle.com>
Cc: Jiri Kosina <jkosina(a)suse.cz>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hugh Dickins <hughd(a)google.com>
---
arch/x86/include/asm/pgalloc.h | 11 -----------
arch/x86/include/asm/pgtable.h | 6 ++++++
2 files changed, 6 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index 1178a51b77f3..b6d425999f99 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -27,17 +27,6 @@ static inline void paravirt_release_pud(unsigned long pfn) {}
*/
extern gfp_t __userpte_alloc_gfp;
-#ifdef CONFIG_PAGE_TABLE_ISOLATION
-/*
- * Instead of one PGD, we acquire two PGDs. Being order-1, it is
- * both 8k in size and 8k-aligned. That lets us just flip bit 12
- * in a pointer to swap between the two 4k halves.
- */
-#define PGD_ALLOCATION_ORDER 1
-#else
-#define PGD_ALLOCATION_ORDER 0
-#endif
-
/*
* Allocate and free page tables.
*/
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 2536f90cd30c..5af0401ccff2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -20,9 +20,15 @@
#ifdef CONFIG_PAGE_TABLE_ISOLATION
extern int kaiser_enabled;
+/*
+ * Instead of one PGD, we acquire two PGDs. Being order-1, it is
+ * both 8k in size and 8k-aligned. That lets us just flip bit 12
+ * in a pointer to swap between the two 4k halves.
+ */
#else
#define kaiser_enabled 0
#endif
+#define PGD_ALLOCATION_ORDER kaiser_enabled
void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd);
void ptdump_walk_pgd_level_checkwx(void);
--
2.16.0.rc1.238.g530d649a79-goog
From: Dave Hansen <dave.hansen(a)linux.intel.com>
commit 445b69e3b75e42362a5bdc13c8b8f61599e2228a upstream
The initial fix for trusted boot and PTI potentially misses the pgd clearing
if pud_alloc() sets a PGD. It probably works in *practice* because for two
adjacent calls to map_tboot_page() that share a PGD entry, the first will
clear NX, *then* allocate and set the PGD (without NX clear). The second
call will *not* allocate but will clear the NX bit.
Defer the NX clearing to a point after it is known that all top-level
allocations have occurred. Add a comment to clarify why.
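For illustration, the resulting ordering in map_tboot_page() is roughly
the following (a simplified sketch of the 4-level version, error
handling omitted):

	pgd = pgd_offset(&tboot_mm, vaddr);
	pud = pud_alloc(&tboot_mm, pgd, vaddr);	/* may populate *pgd, NX still set */
	pmd = pmd_alloc(&tboot_mm, pud, vaddr);
	pte = pte_alloc_map(&tboot_mm, pmd, vaddr);
	set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot));
	pte_unmap(pte);
	pgd->pgd &= ~_PAGE_NX;	/* clear the poison only after all allocations */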
[ tglx: Massaged changelog ]
Fixes: 262b6b30087 ("x86/tboot: Unbreak tboot with PTI enabled")
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Jon Masters <jcm(a)redhat.com>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: gnomes(a)lxorguk.ukuu.org.uk
Cc: peterz(a)infradead.org
Cc: ning.sun(a)intel.com
Cc: tboot-devel(a)lists.sourceforge.net
Cc: andi(a)firstfloor.org
Cc: luto(a)kernel.org
Cc: law(a)redhat.com
Cc: pbonzini(a)redhat.com
Cc: torvalds(a)linux-foundation.org
Cc: gregkh(a)linux-foundation.org
Cc: dwmw(a)amazon.co.uk
Cc: nickc(a)redhat.com
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20180110224939.2695CD47@viggo.jf.intel.com
Cc: Jiri Kosina <jkosina(a)suse.cz>
Signed-off-by: Hugh Dickins <hughd(a)google.com>
hughd notes: I have not tested tboot, but this looks to me as necessary
and as safe in old-Kaiser backports as it is upstream; I'm not submitting
the commit-to-be-fixed 262b6b30087, since it was undone by 445b69e3b75e,
and makes conflict trouble because of 5-level's p4d versus 4-level's pgd.
---
arch/x86/kernel/tboot.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 8402907825b0..21454e254a4c 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -134,6 +134,16 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
return -1;
set_pte_at(&tboot_mm, vaddr, pte, pfn_pte(pfn, prot));
pte_unmap(pte);
+
+ /*
+ * PTI poisons low addresses in the kernel page tables in the
+ * name of making them unusable for userspace. To execute
+ * code at such a low address, the poison must be cleared.
+ *
+ * Note: 'pgd' actually gets set in pud_alloc().
+ */
+ pgd->pgd &= ~_PAGE_NX;
+
return 0;
}
--
2.16.0.rc1.238.g530d649a79-goog
Initialize the request queue lock earlier such that the following
race can no longer occur:
blk_init_queue_node()               blkcg_print_blkgs()
  blk_alloc_queue_node (1)
    q->queue_lock = &q->__queue_lock (2)
    blkcg_init_queue(q) (3)
                                      spin_lock_irq(blkg->q->queue_lock) (4)
  q->queue_lock = lock (5)
                                      spin_unlock_irq(blkg->q->queue_lock) (6)
(1) allocate an uninitialized queue;
(2) initialize queue_lock to its default internal lock;
(3) initialize the blkcg part of the request queue, which creates a
blkg and inserts it into blkg_list;
(4) traverse blkg_list, find the newly created blkg, and take its
queue lock; at this point that is still the default *internal lock*;
(5) *race window*: queue_lock is now overridden with the *driver
specified lock*;
(6) unlock the *driver specified lock* instead of the still-locked
*internal lock*; the unlock balance breaks.
The changes in this patch are as follows:
- Move the .queue_lock initialization from blk_init_queue_node() into
blk_alloc_queue_node().
- For all block drivers that initialize .queue_lock explicitly,
change the blk_alloc_queue() call in the driver into a
blk_alloc_queue_node() call and remove the explicit .queue_lock
initialization. Additionally, initialize the spin lock that will
be used as the queue lock earlier if necessary, as sketched below.
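After this change the driver-side pattern becomes, for example (a
sketch based on the umem changes below):

	spin_lock_init(&card->lock);
	card->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE,
					   &card->lock);

instead of allocating the queue first and assigning .queue_lock
afterwards.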
Reported-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Philipp Reisner <philipp.reisner(a)linbit.com>
Cc: Ulf Hansson <ulf.hansson(a)linaro.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
---
block/blk-core.c | 24 ++++++++++++++++--------
drivers/block/drbd/drbd_main.c | 3 +--
drivers/block/umem.c | 7 +++----
drivers/mmc/core/queue.c | 3 +--
4 files changed, 21 insertions(+), 16 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 860a039fd1a8..c2c81c5b7420 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -946,6 +946,20 @@ static void blk_rq_timed_out_timer(struct timer_list *t)
kblockd_schedule_work(&q->timeout_work);
}
+/**
+ * blk_alloc_queue_node - allocate a request queue
+ * @gfp_mask: memory allocation flags
+ * @node_id: NUMA node to allocate memory from
+ * @lock: Pointer to a spinlock that will be used to e.g. serialize calls to
+ * the legacy .request_fn(). Only set this pointer for queues that use
+ * legacy mode and not for queues that use blk-mq.
+ *
+ * Note: pass the queue lock as the third argument to this function instead of
+ * setting the queue lock pointer explicitly to avoid triggering a crash in
+ * the blkcg throttling code. That code namely makes sysfs attributes visible
+ * in user space before this function returns and the show methods of these
+ * sysfs attributes use the queue lock.
+ */
struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
spinlock_t *lock)
{
@@ -998,11 +1012,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id,
mutex_init(&q->sysfs_lock);
spin_lock_init(&q->__queue_lock);
- /*
- * By default initialize queue_lock to internal lock and driver can
- * override it later if need be.
- */
- q->queue_lock = &q->__queue_lock;
+ q->queue_lock = lock ? : &q->__queue_lock;
/*
* A queue starts its life with bypass turned on to avoid
@@ -1089,13 +1099,11 @@ blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
{
struct request_queue *q;
- q = blk_alloc_queue_node(GFP_KERNEL, node_id, NULL);
+ q = blk_alloc_queue_node(GFP_KERNEL, node_id, lock);
if (!q)
return NULL;
q->request_fn = rfn;
- if (lock)
- q->queue_lock = lock;
if (blk_init_allocated_queue(q) < 0) {
blk_cleanup_queue(q);
return NULL;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 4b4697a1f963..058247bc2f30 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2822,7 +2822,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
drbd_init_set_defaults(device);
- q = blk_alloc_queue(GFP_KERNEL);
+ q = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE, &resource->req_lock);
if (!q)
goto out_no_q;
device->rq_queue = q;
@@ -2854,7 +2854,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
/* Setting the max_hw_sectors to an odd value of 8kibyte here
This triggers a max_bio_size message upon first attach or connect */
blk_queue_max_hw_sectors(q, DRBD_MAX_BIO_SIZE_SAFE >> 8);
- q->queue_lock = &resource->req_lock;
device->md_io.page = alloc_page(GFP_KERNEL);
if (!device->md_io.page)
diff --git a/drivers/block/umem.c b/drivers/block/umem.c
index 8077123678ad..5c7fb8cc4149 100644
--- a/drivers/block/umem.c
+++ b/drivers/block/umem.c
@@ -888,13 +888,14 @@ static int mm_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
card->Active = -1; /* no page is active */
card->bio = NULL;
card->biotail = &card->bio;
+ spin_lock_init(&card->lock);
- card->queue = blk_alloc_queue(GFP_KERNEL);
+ card->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE,
+ &card->lock);
if (!card->queue)
goto failed_alloc;
blk_queue_make_request(card->queue, mm_make_request);
- card->queue->queue_lock = &card->lock;
card->queue->queuedata = card;
tasklet_init(&card->tasklet, process_page, (unsigned long)card);
@@ -968,8 +969,6 @@ static int mm_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)
dev_printk(KERN_INFO, &card->dev->dev,
"Window size %d bytes, IRQ %d\n", data, dev->irq);
- spin_lock_init(&card->lock);
-
pci_set_drvdata(dev, card);
if (pci_write_cmd != 0x0F) /* If not Memory Write & Invalidate */
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 5ecd54088988..bcf6ae03fa97 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -216,10 +216,9 @@ int mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
int ret = -ENOMEM;
mq->card = card;
- mq->queue = blk_alloc_queue(GFP_KERNEL);
+ mq->queue = blk_alloc_queue_node(GFP_KERNEL, NUMA_NO_NODE, lock);
if (!mq->queue)
return -ENOMEM;
- mq->queue->queue_lock = lock;
mq->queue->request_fn = mmc_request_fn;
mq->queue->init_rq_fn = mmc_init_request;
mq->queue->exit_rq_fn = mmc_exit_request;
--
2.16.0
Upstream commit 1c9de5bf4286 ("usbip: vhci-hcd: Add USB3 SuperSpeed
support")
vhci_hcd clears all the port_status bits instead of clearing just
the USB_PORT_STAT_POWER bit when it handles ClearPortFeature:
USB_PORT_FEAT_POWER. This leaves the attach in a bad state: the
device is still attached, but it is unusable by the client.
The problem was fixed upstream as part of the larger change adding
USB3 SuperSpeed support. This patch backports just the change that
clears only the USB_PORT_STAT_POWER bit.
Signed-off-by: Shuah Khan <shuahkh(a)osg.samsung.com>
---
drivers/usb/usbip/vhci_hcd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 7f161b095176..dbe615ba07c9 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -300,7 +300,7 @@ static int vhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
case USB_PORT_FEAT_POWER:
usbip_dbg_vhci_rh(
" ClearPortFeature: USB_PORT_FEAT_POWER\n");
- dum->port_status[rhport] = 0;
+ dum->port_status[rhport] &= ~USB_PORT_STAT_POWER;
dum->resuming = 0;
break;
case USB_PORT_FEAT_C_RESET:
--
2.14.1
Currently we see sporadic timeouts during CDCLK changing both on BXT
and GLK, as reported in the Bugzilla ticket referenced below. It's
easy to reproduce this by changing the frequency in a tight loop after
blanking the display. Based on my tests the upper bound for the
completion time is 800us, so increase the timeout from the current
500us to 2ms; with that I couldn't trigger the problem on either BXT
or GLK.
Note that timeouts happened during both the change-notification and
the voltage-level-setting PCODE requests. (For the latter, BSpec
doesn't require us to wait for completion before further HW
programming.)
This issue is similar to
2c7d0602c815 ("drm/i915/gen9: Fix PCODE polling during CDCLK change
notification"),
but there the PCODE request does complete (as shown by the mbox busy
flag); it's only the reply we get from PCODE that indicates a failure.
So there we keep resending the request until we get a success reply,
whereas here we just have to increase the timeout of the one PCODE
request we send.
v2:
- s/snb_pcode_request/sandybridge_pcode_write_timeout/ (Ville)
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v4.4+
Acked-by: Chris Wilson <chris(a)chris-wilson.co.uk> (v1)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103326
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/i915_drv.h | 6 +++++-
drivers/gpu/drm/i915/intel_cdclk.c | 22 +++++++++++++++++-----
drivers/gpu/drm/i915/intel_pm.c | 6 +++---
3 files changed, 25 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 454d8f937fae..7e83d0d3e632 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3723,7 +3723,11 @@ extern void intel_display_print_error_state(struct drm_i915_error_state_buf *e,
struct intel_display_error_state *error);
int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val);
-int sandybridge_pcode_write(struct drm_i915_private *dev_priv, u32 mbox, u32 val);
+int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv, u32 mbox,
+ u32 val, int timeout_us);
+#define sandybridge_pcode_write(dev_priv, mbox, val) \
+ sandybridge_pcode_write_timeout(dev_priv, mbox, val, 500)
+
int skl_pcode_request(struct drm_i915_private *dev_priv, u32 mbox, u32 request,
u32 reply_mask, u32 reply, int timeout_base_ms);
diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c
index c4392ea34a3d..a423b674fcec 100644
--- a/drivers/gpu/drm/i915/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/intel_cdclk.c
@@ -1370,10 +1370,15 @@ static void bxt_set_cdclk(struct drm_i915_private *dev_priv,
break;
}
- /* Inform power controller of upcoming frequency change */
+ /*
+ * Inform power controller of upcoming frequency change. BSpec
+ * requires us to wait up to 150usec, but that leads to timeouts;
+ * the 2ms used here is based on experiment.
+ */
mutex_lock(&dev_priv->pcu_lock);
- ret = sandybridge_pcode_write(dev_priv, HSW_PCODE_DE_WRITE_FREQ_REQ,
- 0x80000000);
+ ret = sandybridge_pcode_write_timeout(dev_priv,
+ HSW_PCODE_DE_WRITE_FREQ_REQ,
+ 0x80000000, 2000);
mutex_unlock(&dev_priv->pcu_lock);
if (ret) {
@@ -1404,8 +1409,15 @@ static void bxt_set_cdclk(struct drm_i915_private *dev_priv,
I915_WRITE(CDCLK_CTL, val);
mutex_lock(&dev_priv->pcu_lock);
- ret = sandybridge_pcode_write(dev_priv, HSW_PCODE_DE_WRITE_FREQ_REQ,
- cdclk_state->voltage_level);
+ /*
+ * The timeout isn't specified, the 2ms used here is based on
+ * experiment.
+ * FIXME: Waiting for the request completion could be delayed until
+ * the next PCODE request based on BSpec.
+ */
+ ret = sandybridge_pcode_write_timeout(dev_priv,
+ HSW_PCODE_DE_WRITE_FREQ_REQ,
+ cdclk_state->voltage_level, 2000);
mutex_unlock(&dev_priv->pcu_lock);
if (ret) {
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 0b92ea1dbd40..a6098932be5a 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -9169,8 +9169,8 @@ int sandybridge_pcode_read(struct drm_i915_private *dev_priv, u32 mbox, u32 *val
return 0;
}
-int sandybridge_pcode_write(struct drm_i915_private *dev_priv,
- u32 mbox, u32 val)
+int sandybridge_pcode_write_timeout(struct drm_i915_private *dev_priv,
+ u32 mbox, u32 val, int timeout_us)
{
int status;
@@ -9193,7 +9193,7 @@ int sandybridge_pcode_write(struct drm_i915_private *dev_priv,
if (__intel_wait_for_register_fw(dev_priv,
GEN6_PCODE_MAILBOX, GEN6_PCODE_READY, 0,
- 500, 0, NULL)) {
+ timeout_us, 0, NULL)) {
DRM_ERROR("timeout waiting for pcode write of 0x%08x to mbox %x to finish for %ps\n",
val, mbox, __builtin_return_address(0));
return -ETIMEDOUT;
--
2.13.2