The patch below does not apply to the 4.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f572a034d9e07157dd07d2b6be3a1459b5574b58 Mon Sep 17 00:00:00 2001
From: Greentime Hu <greentime(a)andestech.com>
Date: Thu, 16 Nov 2017 19:33:35 +0800
Subject: [PATCH] earlycon: add reg-offset to physical address before mapping
earlycon_map() will return the wrong virtual address because port->mapbase
does not yet include the reg-offset. We have to add the offset before
earlycon_map() is called.
Signed-off-by: Greentime Hu <greentime(a)andestech.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Rob Herring <robh(a)kernel.org>
Cc: Peter Hurley <peter(a)hurleysoftware.com>
Cc: stable(a)vger.kernel.org
Fixes: 088da2a17619 ("of: earlycon: Initialize port fields from DT properties")
diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c
index 870e84fb6e39..a24278380fec 100644
--- a/drivers/tty/serial/earlycon.c
+++ b/drivers/tty/serial/earlycon.c
@@ -245,11 +245,12 @@ int __init of_setup_earlycon(const struct earlycon_id *match,
}
port->mapbase = addr;
port->uartclk = BASE_BAUD * 16;
- port->membase = earlycon_map(port->mapbase, SZ_4K);
val = of_get_flat_dt_prop(node, "reg-offset", NULL);
if (val)
port->mapbase += be32_to_cpu(*val);
+ port->membase = earlycon_map(port->mapbase, SZ_4K);
+
val = of_get_flat_dt_prop(node, "reg-shift", NULL);
if (val)
port->regshift = be32_to_cpu(*val);
The patch below does not apply to the 4.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f572a034d9e07157dd07d2b6be3a1459b5574b58 Mon Sep 17 00:00:00 2001
From: Greentime Hu <greentime(a)andestech.com>
Date: Thu, 16 Nov 2017 19:33:35 +0800
Subject: [PATCH] earlycon: add reg-offset to physical address before mapping
earlycon_map() will return the wrong virtual address because port->mapbase
does not yet include the reg-offset. We have to add the offset before
earlycon_map() is called.
Signed-off-by: Greentime Hu <greentime(a)andestech.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Rob Herring <robh(a)kernel.org>
Cc: Peter Hurley <peter(a)hurleysoftware.com>
Cc: stable(a)vger.kernel.org
Fixes: 088da2a17619 ("of: earlycon: Initialize port fields from DT properties")
diff --git a/drivers/tty/serial/earlycon.c b/drivers/tty/serial/earlycon.c
index 870e84fb6e39..a24278380fec 100644
--- a/drivers/tty/serial/earlycon.c
+++ b/drivers/tty/serial/earlycon.c
@@ -245,11 +245,12 @@ int __init of_setup_earlycon(const struct earlycon_id *match,
}
port->mapbase = addr;
port->uartclk = BASE_BAUD * 16;
- port->membase = earlycon_map(port->mapbase, SZ_4K);
val = of_get_flat_dt_prop(node, "reg-offset", NULL);
if (val)
port->mapbase += be32_to_cpu(*val);
+ port->membase = earlycon_map(port->mapbase, SZ_4K);
+
val = of_get_flat_dt_prop(node, "reg-shift", NULL);
if (val)
port->regshift = be32_to_cpu(*val);
Hi,
Building 4.9.94 in the same way we have been building previous 4.9 releases yields the following error:
DEBUG: tests/code-reading.c: In function 'read_object_code':
DEBUG: tests/code-reading.c:228:19: error: 'KMOD_DECOMP_LEN' undeclared (first use in this function)
DEBUG: char decomp_name[KMOD_DECOMP_LEN];
DEBUG: ^
DEBUG: tests/code-reading.c:228:19: note: each undeclared identifier is reported only once for each
function it appears in
DEBUG: tests/code-reading.c:291:3: warning: implicit declaration of function
'dso__decompress_kmodule_path' [-Wimplicit-function-declaration]
DEBUG: if (dso__decompress_kmodule_path(al.map->dso, objdump_name,
DEBUG: ^
DEBUG: tests/code-reading.c:291:3: warning: nested extern declaration of
'dso__decompress_kmodule_path' [-Wnested-externs]
DEBUG: tests/code-reading.c:228:7: warning: unused variable 'decomp_name' [-Wunused-variable]
DEBUG: char decomp_name[KMOD_DECOMP_LEN];
DEBUG: ^
DEBUG: CC tests/topology.o
DEBUG: CC tests/cpumap.o
DEBUG: CC tests/stat.o
DEBUG: CC tests/event_update.o
DEBUG: mv: cannot stat 'tests/.code-reading.o.tmp': No such file or directory
DEBUG: make[3]: *** [tests/code-reading.o] Error 1
DEBUG: make[3]: *** Waiting for unfinished jobs....
DEBUG: make[2]: *** [util] Error 2
DEBUG: make[1]: *** [libperf-in.o] Error 2
DEBUG: make[1]: *** Waiting for unfinished jobs....
DEBUG: LD bench/perf-in.o
DEBUG: make[2]: *** [tests] Error 2
DEBUG: make[1]: *** [perf-in.o] Error 2
As far as I can see, KMOD_DECOMP_LEN was introduced by 7525a238be8f ("perf tests: Decompress kernel
module before objdump"), but I have zero deep knowledge in this area so I may be very wrong here.
Cheers,
Pavlos Parissis
Normal request completion can happen before or while BLK_EH_RESET_TIMER
is being handled, and this race may cause the request to never be
completed, since the driver's .timeout() may keep returning
BLK_EH_RESET_TIMER.
This issue can't be fixed completely in the driver, since the normal
completion can happen between .timeout() returning and BLK_EH_RESET_TIMER
being handled.
This patch fixes the race by introducing the rq state
MQ_RQ_COMPLETE_IN_TIMEOUT and by reading/writing the rq's state while
holding the queue lock. The lock could actually be per-request, but it
isn't worth introducing a dedicated lock for such an unusual event.
Also handle the timed-out requests in two steps:
1) in the 1st step, call .timeout(), and reset the timer for
BLK_EH_RESET_TIMER
2) in the 2nd step, sync with the normal completion path by holding the
queue lock, to avoid the race between BLK_EH_RESET_TIMER and normal
completion.
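For illustration, here is a minimal user-space model of the hand-off
described above. It is not kernel code: the names (queue_lock, rq_state,
aborted_for_timeout) are made up for this sketch, and only the
lock-protected pattern is mirrored, not the real blk-mq state machine.
Whichever path runs second finishes the request, so the completion can
no longer be lost:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

enum rq_state { RQ_IN_FLIGHT, RQ_COMPLETE_IN_TIMEOUT, RQ_COMPLETE };

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static enum rq_state state = RQ_IN_FLIGHT;
/* The timeout path has already claimed the request (aborted_gstate set). */
static bool aborted_for_timeout = true;

static void finish_request(const char *who)
{
	state = RQ_COMPLETE;
	printf("request completed by %s path\n", who);
}

/* Normal (irq) completion racing with the timeout handler. */
static void *normal_completion(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&queue_lock);
	if (!aborted_for_timeout)
		finish_request("normal");	/* timeout already gave it back */
	else
		state = RQ_COMPLETE_IN_TIMEOUT;	/* record: completed meanwhile */
	pthread_mutex_unlock(&queue_lock);
	return NULL;
}

/* Timeout handler whose driver callback returned "reset the timer". */
static void *timeout_reset_timer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&queue_lock);
	aborted_for_timeout = false;		/* give the request back */
	if (state == RQ_COMPLETE_IN_TIMEOUT)
		finish_request("timeout");	/* don't re-arm: it already finished */
	else
		printf("timer re-armed, request still in flight\n");
	pthread_mutex_unlock(&queue_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, normal_completion, NULL);
	pthread_create(&b, NULL, timeout_reset_timer, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("final state: %s\n", state == RQ_COMPLETE ? "COMPLETE" : "LOST");
	return 0;
}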
Cc: "jianchao.wang" <jianchao.w.wang(a)oracle.com>
Cc: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Sagi Grimberg <sagi(a)grimberg.me>
Cc: Israel Rukshin <israelr(a)mellanox.com>,
Cc: Max Gurtovoy <maxg(a)mellanox.com>
Cc: stable(a)vger.kernel.org
Cc: Martin Steigerwald <Martin(a)Lichtvoll.de>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-mq.c | 116 ++++++++++++++++++++++++++++++++++++++++---------
block/blk-mq.h | 1 +
include/linux/blkdev.h | 6 +++
3 files changed, 102 insertions(+), 21 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index d6a21898933d..9415e65302a8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -630,10 +630,27 @@ void blk_mq_complete_request(struct request *rq)
* However, that would complicate paths which want to synchronize
* against us. Let stay in sync with the issue path so that
* hctx_lock() covers both issue and completion paths.
+ *
+ * Cover complete vs BLK_EH_RESET_TIMER race in slow path with
+ * holding queue lock.
*/
hctx_lock(hctx, &srcu_idx);
if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
__blk_mq_complete_request(rq);
+ else {
+ unsigned long flags;
+ bool need_complete = false;
+
+ spin_lock_irqsave(q->queue_lock, flags);
+ if (!blk_mq_rq_aborted_gstate(rq))
+ need_complete = true;
+ else
+ blk_mq_rq_update_state(rq, MQ_RQ_COMPLETE_IN_TIMEOUT);
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
+ if (need_complete)
+ __blk_mq_complete_request(rq);
+ }
hctx_unlock(hctx, srcu_idx);
}
EXPORT_SYMBOL(blk_mq_complete_request);
@@ -810,7 +827,7 @@ struct blk_mq_timeout_data {
unsigned int nr_expired;
};
-static void blk_mq_rq_timed_out(struct request *req, bool reserved)
+static void blk_mq_rq_pre_timed_out(struct request *req, bool reserved)
{
const struct blk_mq_ops *ops = req->q->mq_ops;
enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
@@ -818,18 +835,44 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
if (ops->timeout)
ret = ops->timeout(req, reserved);
+ if (ret == BLK_EH_RESET_TIMER)
+ blk_add_timer(req);
+
+ req->timeout_ret = ret;
+}
+
+static void blk_mq_rq_timed_out(struct request *req, bool reserved)
+{
+ enum blk_eh_timer_return ret = req->timeout_ret;
+ unsigned long flags;
+
switch (ret) {
case BLK_EH_HANDLED:
+ spin_lock_irqsave(req->q->queue_lock, flags);
+ complete_rq:
+ if (blk_mq_rq_state(req) == MQ_RQ_COMPLETE_IN_TIMEOUT)
+ blk_mq_rq_update_state(req, MQ_RQ_IN_FLIGHT);
+ spin_unlock_irqrestore(req->q->queue_lock, flags);
__blk_mq_complete_request(req);
break;
case BLK_EH_RESET_TIMER:
/*
- * As nothing prevents from completion happening while
- * ->aborted_gstate is set, this may lead to ignored
- * completions and further spurious timeouts.
+ * The normal completion may happen during handling the
+ * timeout, or even after returning from .timeout(), so
+ * once the request has been completed, we can't reset
+ * timer any more since this request may be handled as
+ * BLK_EH_RESET_TIMER in next timeout handling too, and
+ * it has to be completed in this situation.
+ *
+ * Holding the queue lock to cover read/write rq's
+ * aborted_gstate and normal state, so the race can be
+ * avoided completely.
*/
+ spin_lock_irqsave(req->q->queue_lock, flags);
blk_mq_rq_update_aborted_gstate(req, 0);
- blk_add_timer(req);
+ if (blk_mq_rq_state(req) == MQ_RQ_COMPLETE_IN_TIMEOUT)
+ goto complete_rq;
+ spin_unlock_irqrestore(req->q->queue_lock, flags);
break;
case BLK_EH_NOT_HANDLED:
req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED;
@@ -875,7 +918,7 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
}
}
-static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
+static void blk_mq_prepare_expired(struct blk_mq_hw_ctx *hctx,
struct request *rq, void *priv, bool reserved)
{
/*
@@ -887,9 +930,40 @@ static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
*/
if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) &&
READ_ONCE(rq->gstate) == rq->aborted_gstate)
+ blk_mq_rq_pre_timed_out(rq, reserved);
+}
+
+static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
+ struct request *rq, void *priv, bool reserved)
+{
+ if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) &&
+ READ_ONCE(rq->gstate) == rq->aborted_gstate)
blk_mq_rq_timed_out(rq, reserved);
}
+static void blk_mq_timeout_synchronize_rcu(struct request_queue *q,
+ bool reset_expired)
+{
+ struct blk_mq_hw_ctx *hctx;
+ int i;
+ bool has_rcu = false;
+
+ queue_for_each_hw_ctx(q, hctx, i) {
+ if (!hctx->nr_expired)
+ continue;
+
+ if (!(hctx->flags & BLK_MQ_F_BLOCKING))
+ has_rcu = true;
+ else
+ synchronize_srcu(hctx->srcu);
+
+ if (reset_expired)
+ hctx->nr_expired = 0;
+ }
+ if (has_rcu)
+ synchronize_rcu();
+}
+
static void blk_mq_timeout_work(struct work_struct *work)
{
struct request_queue *q =
@@ -899,8 +973,6 @@ static void blk_mq_timeout_work(struct work_struct *work)
.next_set = 0,
.nr_expired = 0,
};
- struct blk_mq_hw_ctx *hctx;
- int i;
/* A deadlock might occur if a request is stuck requiring a
* timeout at the same time a queue freeze is waiting
@@ -922,27 +994,26 @@ static void blk_mq_timeout_work(struct work_struct *work)
blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &data);
if (data.nr_expired) {
- bool has_rcu = false;
-
/*
* Wait till everyone sees ->aborted_gstate. The
* sequential waits for SRCUs aren't ideal. If this ever
* becomes a problem, we can add per-hw_ctx rcu_head and
* wait in parallel.
*/
- queue_for_each_hw_ctx(q, hctx, i) {
- if (!hctx->nr_expired)
- continue;
+ blk_mq_timeout_synchronize_rcu(q, false);
- if (!(hctx->flags & BLK_MQ_F_BLOCKING))
- has_rcu = true;
- else
- synchronize_srcu(hctx->srcu);
+ /* call .timeout() for timed-out requests */
+ blk_mq_queue_tag_busy_iter(q, blk_mq_prepare_expired, NULL);
- hctx->nr_expired = 0;
- }
- if (has_rcu)
- synchronize_rcu();
+ /*
+ * If .timeout returns BLK_EH_HANDLED, wait till current
+ * completion is done, for avoiding to update state on
+ * completed request.
+ *
+ * If .timeout returns BLK_EH_RESET_TIMER, wait till
+ * blk_add_timer() is commited before completing this rq.
+ */
+ blk_mq_timeout_synchronize_rcu(q, true);
/* terminate the ones we won */
blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL);
@@ -952,6 +1023,9 @@ static void blk_mq_timeout_work(struct work_struct *work)
data.next = blk_rq_timeout(round_jiffies_up(data.next));
mod_timer(&q->timeout, data.next);
} else {
+ struct blk_mq_hw_ctx *hctx;
+ int i;
+
/*
* Request timeouts are handled as a forward rolling timer. If
* we end up here it means that no requests are pending and
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 88c558f71819..0426d048743d 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -35,6 +35,7 @@ enum mq_rq_state {
MQ_RQ_IDLE = 0,
MQ_RQ_IN_FLIGHT = 1,
MQ_RQ_COMPLETE = 2,
+ MQ_RQ_COMPLETE_IN_TIMEOUT = 3,
MQ_RQ_STATE_BITS = 2,
MQ_RQ_STATE_MASK = (1 << MQ_RQ_STATE_BITS) - 1,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9af3e0f430bc..8278f67d39a6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -252,8 +252,14 @@ struct request {
struct list_head timeout_list;
union {
+ /* used after completion */
struct __call_single_data csd;
+
+ /* used in io scheduler, before dispatch */
u64 fifo_time;
+
+ /* used after dispatch and before completion */
+ int timeout_ret;
};
/*
--
2.9.5
Normal request completion can happen before or while BLK_EH_RESET_TIMER
is being handled, and this race may cause the request to never be
completed, since the driver's .timeout() may keep returning
BLK_EH_RESET_TIMER.
This issue can't be fixed completely in the driver, since the normal
completion can happen between .timeout() returning and BLK_EH_RESET_TIMER
being handled.
This patch fixes the race by introducing the rq state
MQ_RQ_COMPLETE_IN_TIMEOUT and by reading/writing the rq's state while
holding the queue lock. The lock could actually be per-request, but it
isn't worth introducing a dedicated lock for such an unusual event.
Also, when .timeout() returns BLK_EH_HANDLED, sync with the normal
completion path before finally completing this timed-out rq, to avoid
having the rq's state touched by the normal completion.
Cc: "jianchao.wang" <jianchao.w.wang(a)oracle.com>
Cc: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Sagi Grimberg <sagi(a)grimberg.me>
Cc: Israel Rukshin <israelr(a)mellanox.com>,
Cc: Max Gurtovoy <maxg(a)mellanox.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
V3:
- before completing rq for BLK_EH_HANDLED, sync with normal
completion path
- make sure rq's state updated as MQ_RQ_IN_FLIGHT before completing
V2:
- rename the new flag as MQ_RQ_COMPLETE_IN_TIMEOUT
- fix lock uses in blk_mq_rq_timed_out
- document re-order between blk_add_timer() and
blk_mq_rq_update_aborted_gstate(req, 0)
block/blk-mq.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++-----------
block/blk-mq.h | 1 +
2 files changed, 70 insertions(+), 16 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0dc9e341c2a7..d70f69a32226 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -198,23 +198,12 @@ void blk_mq_quiesce_queue_nowait(struct request_queue *q)
}
EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue_nowait);
-/**
- * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
- * @q: request queue.
- *
- * Note: this function does not prevent that the struct request end_io()
- * callback function is invoked. Once this function is returned, we make
- * sure no dispatch can happen until the queue is unquiesced via
- * blk_mq_unquiesce_queue().
- */
-void blk_mq_quiesce_queue(struct request_queue *q)
+static void blk_mq_queue_synchronize_rcu(struct request_queue *q)
{
struct blk_mq_hw_ctx *hctx;
unsigned int i;
bool rcu = false;
- blk_mq_quiesce_queue_nowait(q);
-
queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->flags & BLK_MQ_F_BLOCKING)
synchronize_srcu(hctx->srcu);
@@ -224,6 +213,21 @@ void blk_mq_quiesce_queue(struct request_queue *q)
if (rcu)
synchronize_rcu();
}
+
+/**
+ * blk_mq_quiesce_queue() - wait until all ongoing dispatches have finished
+ * @q: request queue.
+ *
+ * Note: this function does not prevent that the struct request end_io()
+ * callback function is invoked. Once this function is returned, we make
+ * sure no dispatch can happen until the queue is unquiesced via
+ * blk_mq_unquiesce_queue().
+ */
+void blk_mq_quiesce_queue(struct request_queue *q)
+{
+ blk_mq_quiesce_queue_nowait(q);
+ blk_mq_queue_synchronize_rcu(q);
+}
EXPORT_SYMBOL_GPL(blk_mq_quiesce_queue);
/*
@@ -630,10 +634,27 @@ void blk_mq_complete_request(struct request *rq)
* However, that would complicate paths which want to synchronize
* against us. Let stay in sync with the issue path so that
* hctx_lock() covers both issue and completion paths.
+ *
+ * Cover complete vs BLK_EH_RESET_TIMER race in slow path with
+ * holding queue lock.
*/
hctx_lock(hctx, &srcu_idx);
if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
__blk_mq_complete_request(rq);
+ else {
+ unsigned long flags;
+ bool need_complete = false;
+
+ spin_lock_irqsave(q->queue_lock, flags);
+ if (!blk_mq_rq_aborted_gstate(rq))
+ need_complete = true;
+ else
+ blk_mq_rq_update_state(rq, MQ_RQ_COMPLETE_IN_TIMEOUT);
+ spin_unlock_irqrestore(q->queue_lock, flags);
+
+ if (need_complete)
+ __blk_mq_complete_request(rq);
+ }
hctx_unlock(hctx, srcu_idx);
}
EXPORT_SYMBOL(blk_mq_complete_request);
@@ -814,6 +835,7 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
{
const struct blk_mq_ops *ops = req->q->mq_ops;
enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
+ unsigned long flags;
req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED;
@@ -822,16 +844,47 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
switch (ret) {
case BLK_EH_HANDLED:
+ /*
+ * If .timeout returns BLK_EH_HANDLED, this rq shouldn't
+ * be completed by normal irq context any more, but for
+ * the sake of safety, sync with normal completion path
+ * before completing this request finally because the
+ * normal completion path may touch this rq's state.
+ */
+ blk_mq_queue_synchronize_rcu(req->q);
+
+ spin_lock_irqsave(req->q->queue_lock, flags);
+ complete_rq:
+ if (blk_mq_rq_state(req) == MQ_RQ_COMPLETE_IN_TIMEOUT)
+ blk_mq_rq_update_state(req, MQ_RQ_IN_FLIGHT);
+ spin_unlock_irqrestore(req->q->queue_lock, flags);
__blk_mq_complete_request(req);
break;
case BLK_EH_RESET_TIMER:
/*
- * As nothing prevents from completion happening while
- * ->aborted_gstate is set, this may lead to ignored
- * completions and further spurious timeouts.
+ * The normal completion may happen during handling the
+ * timeout, or even after returning from .timeout(), so
+ * once the request has been completed, we can't reset
+ * timer any more since this request may be handled as
+ * BLK_EH_RESET_TIMER in next timeout handling too, and
+ * it has to be completed in this situation.
+ *
+ * Holding the queue lock to cover read/write rq's
+ * aborted_gstate and normal state, so the race can be
+ * avoided completely.
+ *
+ * blk_add_timer() may be re-ordered with resetting
+ * aborted_gstate, and the only side-effect is that if this
+ * request is recycled after aborted_gstate is cleared, it
+ * may be timed out a bit late, that is what we can survive
+ * given timeout event is so unusual.
*/
- blk_mq_rq_update_aborted_gstate(req, 0);
+ spin_lock_irqsave(req->q->queue_lock, flags);
+ if (blk_mq_rq_state(req) == MQ_RQ_COMPLETE_IN_TIMEOUT)
+ goto complete_rq;
blk_add_timer(req);
+ blk_mq_rq_update_aborted_gstate(req, 0);
+ spin_unlock_irqrestore(req->q->queue_lock, flags);
break;
case BLK_EH_NOT_HANDLED:
break;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 88c558f71819..0426d048743d 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -35,6 +35,7 @@ enum mq_rq_state {
MQ_RQ_IDLE = 0,
MQ_RQ_IN_FLIGHT = 1,
MQ_RQ_COMPLETE = 2,
+ MQ_RQ_COMPLETE_IN_TIMEOUT = 3,
MQ_RQ_STATE_BITS = 2,
MQ_RQ_STATE_MASK = (1 << MQ_RQ_STATE_BITS) - 1,
--
2.9.5
From: Takashi Iwai <tiwai(a)suse.de>
[ Upstream commit d7f910bfedd863d13ea320030fe98e42d0938ed5 ]
For accessing the snd_timer_user queue indices, we take tu->qlock.
But it's forgotten in a couple of places.
The one in snd_timer_user_params() should be safe without the
spinlock as the timer is already stopped. But it's better for
consistency.
The one in poll is just a read-out, so it's not inevitably needed, but
it'd be good to make the result consistent, too.
Tested-by: Alexander Potapenko <glider(a)google.com>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
---
sound/core/timer.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/sound/core/timer.c b/sound/core/timer.c
index 48eaccba82a3..fd622aa0bb93 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1771,6 +1771,7 @@ static int snd_timer_user_params(struct file *file,
}
}
}
+ spin_lock_irq(&tu->qlock);
tu->qhead = tu->qtail = tu->qused = 0;
if (tu->timeri->flags & SNDRV_TIMER_IFLG_EARLY_EVENT) {
if (tu->tread) {
@@ -1791,6 +1792,7 @@ static int snd_timer_user_params(struct file *file,
}
tu->filter = params.filter;
tu->ticks = params.ticks;
+ spin_unlock_irq(&tu->qlock);
err = 0;
_end:
if (copy_to_user(_params, &params, sizeof(params)))
@@ -2029,10 +2031,12 @@ static unsigned int snd_timer_user_poll(struct file *file, poll_table * wait)
poll_wait(file, &tu->qchange_sleep, wait);
mask = 0;
+ spin_lock_irq(&tu->qlock);
if (tu->qused)
mask |= POLLIN | POLLRDNORM;
if (tu->disconnected)
mask |= POLLERR;
+ spin_unlock_irq(&tu->qlock);
return mask;
}
--
2.15.1
From: Fabio Estevam <fabio.estevam(a)nxp.com>
imx6ul and imx7 report the following error:
caam_jr 2142000.jr1: 40000789: DECO: desc idx 7:
Protocol Size Error - A protocol has seen an error in size. When
running RSA, pdb size N < (size of F) when no formatting is used; or
pdb size N < (F + 11) when formatting is used.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at crypto/asymmetric_keys/public_key.c:148
public_key_verify_signature+0x27c/0x2b0
This error happens because the signature contains 257 bytes, including
a leading zero as the first element.
Fix the problem by stripping off the leading zero from input data
before feeding it to the CAAM accelerator.
Fixes: 8c419778ab57e497b5 ("crypto: caam - add support for RSA algorithm")
Cc: <stable(a)vger.kernel.org>
Reported-by: Martin Townsend <mtownsend1973(a)gmail.com>
Signed-off-by: Fabio Estevam <fabio.estevam(a)nxp.com>
---
Changes since v1:
- Use a temp pointer
- Assign len to req->src_len, so that more than one leading zero
can be taken into account
drivers/crypto/caam/caampkc.c | 45 +++++++++++++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 8 deletions(-)
diff --git a/drivers/crypto/caam/caampkc.c b/drivers/crypto/caam/caampkc.c
index 7a897209..5f3e627 100644
--- a/drivers/crypto/caam/caampkc.c
+++ b/drivers/crypto/caam/caampkc.c
@@ -166,6 +166,14 @@ static void rsa_priv_f3_done(struct device *dev, u32 *desc, u32 err,
akcipher_request_complete(req, err);
}
+static void caam_rsa_drop_leading_zeros(const u8 **ptr, size_t *nbytes)
+{
+ while (!**ptr && *nbytes) {
+ (*ptr)++;
+ (*nbytes)--;
+ }
+}
+
static struct rsa_edesc *rsa_edesc_alloc(struct akcipher_request *req,
size_t desclen)
{
@@ -178,7 +186,36 @@ static struct rsa_edesc *rsa_edesc_alloc(struct akcipher_request *req,
int sgc;
int sec4_sg_index, sec4_sg_len = 0, sec4_sg_bytes;
int src_nents, dst_nents;
+ const u8 *temp;
+ void *buffer;
+ size_t len;
+
+ buffer = kzalloc(req->src_len, GFP_ATOMIC);
+ if (!buffer)
+ return ERR_PTR(-ENOMEM);
+
+ sg_copy_to_buffer(req->src, sg_nents(req->src),
+ buffer, req->src_len);
+ temp = (u8 *)buffer;
+ len = req->src_len;
+ /*
+ * Check if the buffer contains leading zeros and if
+ * it does, drop the leading zeros
+ */
+ if (temp[0] == 0) {
+ caam_rsa_drop_leading_zeros(&temp, &len);
+ if (!temp) {
+ kfree(buffer);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ req->src_len = len;
+ sg_copy_from_buffer(req->src, sg_nents(req->src),
+ (void *)temp, req->src_len);
+ }
+
+ kfree(buffer);
src_nents = sg_nents_for_len(req->src, req->src_len);
dst_nents = sg_nents_for_len(req->dst, req->dst_len);
@@ -683,14 +720,6 @@ static void caam_rsa_free_key(struct caam_rsa_key *key)
memset(key, 0, sizeof(*key));
}
-static void caam_rsa_drop_leading_zeros(const u8 **ptr, size_t *nbytes)
-{
- while (!**ptr && *nbytes) {
- (*ptr)++;
- (*nbytes)--;
- }
-}
-
/**
* caam_read_rsa_crt - Used for reading dP, dQ, qInv CRT members.
* dP, dQ and qInv could decode to less than corresponding p, q length, as the
--
2.7.4
When blk_queue_enter() waits for a queue to unfreeze, or unset the
PREEMPT_ONLY flag, do not allow it to be interrupted by a signal.
The PREEMPT_ONLY flag was introduced later in commit 3a0a529971ec
("block, scsi: Make SCSI quiesce and resume work reliably"). Note the SCSI
device is resumed asynchronously, i.e. after un-freezing userspace tasks.
So that commit exposed the bug as a regression in v4.15. A mysterious
SIGBUS (or -EIO) sometimes happened during the time the device was being
resumed. Most frequently, there was no kernel log message, and we saw Xorg
or Xwayland killed by SIGBUS.[1]
[1] E.g. https://bugzilla.redhat.com/show_bug.cgi?id=1553979
Without this fix, I get an IO error in this test:
# dd if=/dev/sda of=/dev/null iflag=direct & \
while killall -SIGUSR1 dd; do sleep 0.1; done & \
echo mem > /sys/power/state ; \
sleep 5; killall dd # stop after 5 seconds
The interruptible wait was added to blk_queue_enter in
commit 3ef28e83ab15 ("block: generic request_queue reference counting").
Before then, the interruptible wait was only in blk-mq, but I don't think
it could ever have been correct.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alan Jenkins <alan.christopher.jenkins(a)gmail.com>
---
block/blk-core.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index abcb8684ba67..5a6d20069364 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -915,7 +915,6 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
while (true) {
bool success = false;
- int ret;
rcu_read_lock();
if (percpu_ref_tryget_live(&q->q_usage_counter)) {
@@ -947,14 +946,12 @@ int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags)
*/
smp_rmb();
- ret = wait_event_interruptible(q->mq_freeze_wq,
+ wait_event(q->mq_freeze_wq,
(atomic_read(&q->mq_freeze_depth) == 0 &&
(preempt || !blk_queue_preempt_only(q))) ||
blk_queue_dying(q));
if (blk_queue_dying(q))
return -ENODEV;
- if (ret)
- return ret;
}
}
--
2.14.3
From: Fabio Estevam <fabio.estevam(a)nxp.com>
imx6ul and imx7 report the following error:
caam_jr 2142000.jr1: 40000789: DECO: desc idx 7:
Protocol Size Error - A protocol has seen an error in size. When
running RSA, pdb size N < (size of F) when no formatting is used; or
pdb size N < (F + 11) when formatting is used.
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at crypto/asymmetric_keys/public_key.c:148
public_key_verify_signature+0x27c/0x2b0
This error happens because the signature contains 257 bytes, including
a leading zero as the first element.
Fix the problem by stripping off the leading zero from input data
before feeding it to the CAAM accelerator.
Fixes: 8c419778ab57e497b5 ("crypto: caam - add support for RSA algorithm")
Cc: <stable(a)vger.kernel.org>
Reported-by: Martin Townsend <mtownsend1973(a)gmail.com>
Signed-off-by: Fabio Estevam <fabio.estevam(a)nxp.com>
---
drivers/crypto/caam/caampkc.c | 43 +++++++++++++++++++++++++++++++++++--------
1 file changed, 35 insertions(+), 8 deletions(-)
diff --git a/drivers/crypto/caam/caampkc.c b/drivers/crypto/caam/caampkc.c
index 7a897209..d2ad547 100644
--- a/drivers/crypto/caam/caampkc.c
+++ b/drivers/crypto/caam/caampkc.c
@@ -166,6 +166,14 @@ static void rsa_priv_f3_done(struct device *dev, u32 *desc, u32 err,
akcipher_request_complete(req, err);
}
+static void caam_rsa_drop_leading_zeros(const u8 **ptr, size_t *nbytes)
+{
+ while (!**ptr && *nbytes) {
+ (*ptr)++;
+ (*nbytes)--;
+ }
+}
+
static struct rsa_edesc *rsa_edesc_alloc(struct akcipher_request *req,
size_t desclen)
{
@@ -178,7 +186,34 @@ static struct rsa_edesc *rsa_edesc_alloc(struct akcipher_request *req,
int sgc;
int sec4_sg_index, sec4_sg_len = 0, sec4_sg_bytes;
int src_nents, dst_nents;
+ const u8 *buffer;
+ size_t len;
+
+ buffer = kzalloc(req->src_len, GFP_ATOMIC);
+ if (!buffer)
+ return ERR_PTR(-ENOMEM);
+
+ sg_copy_to_buffer(req->src, sg_nents(req->src),
+ (void *)buffer, req->src_len);
+ len = req->src_len;
+ /*
+ * Check if the buffer contains leading zero and if
+ * it does, drop the leading zero
+ */
+ if (buffer[0] == 0) {
+ caam_rsa_drop_leading_zeros(&buffer, &len);
+ if (!buffer) {
+ kfree(buffer);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ req->src_len -= 1;
+ sg_copy_from_buffer(req->src, sg_nents(req->src),
+ (void *)buffer, req->src_len);
+ }
+
+ kfree(buffer);
src_nents = sg_nents_for_len(req->src, req->src_len);
dst_nents = sg_nents_for_len(req->dst, req->dst_len);
@@ -683,14 +718,6 @@ static void caam_rsa_free_key(struct caam_rsa_key *key)
memset(key, 0, sizeof(*key));
}
-static void caam_rsa_drop_leading_zeros(const u8 **ptr, size_t *nbytes)
-{
- while (!**ptr && *nbytes) {
- (*ptr)++;
- (*nbytes)--;
- }
-}
-
/**
* caam_read_rsa_crt - Used for reading dP, dQ, qInv CRT members.
* dP, dQ and qInv could decode to less than corresponding p, q length, as the
--
2.7.4
Hi Greg,
There are two important v4l2-core fixes on the patches merged this week by
Linux.
1) media: v4l2-core: fix size of devnode_nums[] bitarray
This patch corrects a regression against Kernel 4.16. It affects notebooks
with advanced Synaptics mice (and similar touch devices). On those devices,
the pad produces an image which is handled via V4L2. Without this patch,
the input driver OOPSes at probing time:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
2) v4l2-compat-ioctl32: don't oops on overlay
This patch complements the security fix we've made to the V4L2 core
compat32 logic. It fixes an illegal use of a __user pointer without
first converting it into a kernel pointer with get_user(). It wasn't
detected before, as it uses an obscure streaming mode of V4L2 (overlay
mode):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
This one should go all the way down to stable Kernels. Here, I was able
to reproduce the bug with both upstream Kernel and Kernel 3.18. The
patch applied without any conflicts on both.
Could you please add both on your next set of -stable releases?
Thanks,
Mauro
Please queue up the following networking bug fixes for v4.4 and
v4.16 -stable, respectively.
Note, you may wish to take patch "vhost: fix vhost_vq_access_ok() log check"
(upstream d14d2b78090c7de0557362b26a4ca591aa6a9faa) for v4.15 as well,
because the change it is fixing went into v4.15.17.
Thanks!
Pixel 2 field testers reported that when they tried to reboot their
phones with some USB devices plugged in, the reboot would get wedged and
eventually trigger watchdog reset. Once the Pixel kernel team found a
reliable repro case, they narrowed it down to this commit's 4.4.y
backport. Reverting the change made the issue go away.
This reverts commit b07c12517f2aed0add8ce18146bb426b14099392.
Cc: stable(a)vger.kernel.org
Signed-off-by: Greg Hackmann <ghackmann(a)google.com>
---
drivers/usb/host/xhci-plat.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index df327dcc2bac..ea089fdda611 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -420,7 +420,6 @@ MODULE_DEVICE_TABLE(acpi, usb_xhci_acpi_match);
static struct platform_driver usb_xhci_driver = {
.probe = xhci_plat_probe,
.remove = xhci_plat_remove,
- .shutdown = usb_hcd_platform_shutdown,
.driver = {
.name = "xhci-hcd",
.pm = &xhci_plat_pm_ops,
--
2.17.0.484.g0c8726318c-goog
The blk-mq timeout handling code ignores completions that occur after
blk_mq_check_expired() has been called and before blk_mq_rq_timed_out()
has reset rq->aborted_gstate. If a block driver timeout handler always
returns BLK_EH_RESET_TIMER then the result will be that the request
never terminates.
Since the request state can be updated from two different contexts,
namely regular completion and request timeout, this race cannot be
fixed with RCU synchronization only. Fix this race as follows:
- Use the deadline instead of the request generation to detect whether
or not a request timer fired after reinitialization of a request.
- Store the request state in the lowest two bits of the deadline instead
of the lowest two bits of 'gstate'.
- Rename MQ_RQ_STATE_MASK into RQ_STATE_MASK and change it from an
enumeration member into a #define such that its type can be changed
into unsigned long. That allows writing & ~RQ_STATE_MASK instead of
~(unsigned long)RQ_STATE_MASK.
- Remove all request member variables that became superfluous due to
this change: gstate, aborted_gstate, gstate_seq and aborted_gstate_sync.
- Remove the request state information that became superfluous due to this
patch, namely RQF_MQ_TIMEOUT_EXPIRED.
- Remove the hctx member that became superfluous due to these changes,
namely nr_expired.
- Remove the code that became superfluous due to this change, namely
the RCU lock and unlock statements in blk_mq_complete_request() and
also the synchronize_rcu() call in the timeout handler.
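For illustration, a minimal user-space sketch of the deadline/state packing
described above. It is a model only: the struct and helper names loosely
mirror the patch, C11 atomics stand in for the kernel's READ_ONCE()/cmpxchg(),
and nothing else of blk-mq is represented:

#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define RQ_STATE_MASK	0x3UL			/* lowest two bits of __deadline */

enum mq_rq_state { MQ_RQ_IDLE = 0, MQ_RQ_IN_FLIGHT = 1, MQ_RQ_COMPLETE = 2 };

struct rq_model {
	_Atomic unsigned long __deadline;	/* deadline | state, one word */
};

static enum mq_rq_state rq_state(struct rq_model *rq)
{
	return atomic_load(&rq->__deadline) & RQ_STATE_MASK;
}

static unsigned long rq_deadline(struct rq_model *rq)
{
	return atomic_load(&rq->__deadline) & ~RQ_STATE_MASK;
}

/* Atomically move from @old_state to @new_state, keeping the deadline. */
static bool change_rq_state(struct rq_model *rq, enum mq_rq_state old_state,
			    enum mq_rq_state new_state)
{
	unsigned long old_val, new_val;

	do {
		old_val = atomic_load(&rq->__deadline);
		if ((old_val & RQ_STATE_MASK) != old_state)
			return false;
		new_val = (old_val & ~RQ_STATE_MASK) | new_state;
	} while (!atomic_compare_exchange_weak(&rq->__deadline, &old_val,
					       new_val));
	return true;
}

/* Atomically set a new deadline and state, iff the state is @old_state. */
static bool set_rq_deadline(struct rq_model *rq, unsigned long time,
			    enum mq_rq_state old_state,
			    enum mq_rq_state new_state)
{
	unsigned long old_val, new_val;

	do {
		old_val = atomic_load(&rq->__deadline);
		if ((old_val & RQ_STATE_MASK) != old_state)
			return false;
		new_val = (time & ~RQ_STATE_MASK) | new_state;
	} while (!atomic_compare_exchange_weak(&rq->__deadline, &old_val,
					       new_val));
	return true;
}

int main(void)
{
	struct rq_model rq = { .__deadline = MQ_RQ_IDLE };

	/* Issue: set the deadline and mark the request in flight in one go. */
	assert(set_rq_deadline(&rq, 5000UL, MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT));
	assert(rq_state(&rq) == MQ_RQ_IN_FLIGHT && rq_deadline(&rq) == 5000UL);

	/* Only one of a racing completion and timeout can win this transition. */
	assert(change_rq_state(&rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE));
	assert(!change_rq_state(&rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE));
	return 0;
}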
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Ming Lei <ming.lei(a)redhat.com>
Cc: Sagi Grimberg <sagi(a)grimberg.me>
Cc: Israel Rukshin <israelr(a)mellanox.com>,
Cc: Max Gurtovoy <maxg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org> # v4.16
---
Changes compared to v4:
- Addressed multiple review comments from Christoph. The most important are
that atomic_long_cmpxchg() has been changed into cmpxchg() and also that
there is now a nice and clean split between the legacy and blk-mq versions
of blk_add_timer().
- Changed the patch name and modified the patch description because there is
disagreement about whether or not the v4.16 blk-mq core can complete a
single request twice. Kept the "Cc: stable" tag because of
https://bugzilla.kernel.org/show_bug.cgi?id=199077.
Changes compared to v3 (see also https://www.mail-archive.com/linux-block@vger.kernel.org/msg20073.html):
- Removed the spinlock again that was introduced to protect the request state.
v4 uses atomic_long_cmpxchg() instead.
- Split __deadline into two variables - one for the legacy block layer and one
for blk-mq.
Changes compared to v2 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18338.html):
- Rebased and retested on top of kernel v4.16.
Changes compared to v1 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18089.html):
- Removed the gstate and aborted_gstate members of struct request and used
the __deadline member to encode both the generation and state information.
block/blk-core.c | 2 -
block/blk-mq-debugfs.c | 1 -
block/blk-mq.c | 174 +++++--------------------------------------------
block/blk-mq.h | 65 ++++++++++--------
block/blk-timeout.c | 89 ++++++++++++++-----------
block/blk.h | 13 ++--
include/linux/blk-mq.h | 1 -
include/linux/blkdev.h | 30 ++-------
8 files changed, 118 insertions(+), 257 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 8625ec929fe5..181b1a688a5b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -200,8 +200,6 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
rq->start_time = jiffies;
set_start_time_ns(rq);
rq->part = NULL;
- seqcount_init(&rq->gstate_seq);
- u64_stats_init(&rq->aborted_gstate_sync);
}
EXPORT_SYMBOL(blk_rq_init);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 6f72413b6cab..80c7c585769f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -345,7 +345,6 @@ static const char *const rqf_name[] = {
RQF_NAME(STATS),
RQF_NAME(SPECIAL_PAYLOAD),
RQF_NAME(ZONE_WRITE_LOCKED),
- RQF_NAME(MQ_TIMEOUT_EXPIRED),
RQF_NAME(MQ_POLL_SLEPT),
};
#undef RQF_NAME
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7816d28b7219..0680977d6d98 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -305,7 +305,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
rq->special = NULL;
/* tag was already set */
rq->extra_len = 0;
- rq->__deadline = 0;
INIT_LIST_HEAD(&rq->timeout_list);
rq->timeout = 0;
@@ -481,7 +480,8 @@ void blk_mq_free_request(struct request *rq)
if (blk_rq_rl(rq))
blk_put_rl(blk_rq_rl(rq));
- blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
+ if (!blk_mq_change_rq_state(rq, blk_mq_rq_state(rq), MQ_RQ_IDLE))
+ WARN_ON_ONCE(true);
if (rq->tag != -1)
blk_mq_put_tag(hctx, hctx->tags, ctx, rq->tag);
if (sched_tag != -1)
@@ -527,8 +527,7 @@ static void __blk_mq_complete_request(struct request *rq)
bool shared = false;
int cpu;
- WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT);
- blk_mq_rq_update_state(rq, MQ_RQ_COMPLETE);
+ WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE);
if (rq->internal_tag != -1)
blk_mq_sched_completed_request(rq);
@@ -577,36 +576,6 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
*srcu_idx = srcu_read_lock(hctx->srcu);
}
-static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
-{
- unsigned long flags;
-
- /*
- * blk_mq_rq_aborted_gstate() is used from the completion path and
- * can thus be called from irq context. u64_stats_fetch in the
- * middle of update on the same CPU leads to lockup. Disable irq
- * while updating.
- */
- local_irq_save(flags);
- u64_stats_update_begin(&rq->aborted_gstate_sync);
- rq->aborted_gstate = gstate;
- u64_stats_update_end(&rq->aborted_gstate_sync);
- local_irq_restore(flags);
-}
-
-static u64 blk_mq_rq_aborted_gstate(struct request *rq)
-{
- unsigned int start;
- u64 aborted_gstate;
-
- do {
- start = u64_stats_fetch_begin(&rq->aborted_gstate_sync);
- aborted_gstate = rq->aborted_gstate;
- } while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start));
-
- return aborted_gstate;
-}
-
/**
* blk_mq_complete_request - end I/O on a request
* @rq: the request being processed
@@ -618,27 +587,12 @@ static u64 blk_mq_rq_aborted_gstate(struct request *rq)
void blk_mq_complete_request(struct request *rq)
{
struct request_queue *q = rq->q;
- struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
- int srcu_idx;
if (unlikely(blk_should_fake_timeout(q)))
return;
- /*
- * If @rq->aborted_gstate equals the current instance, timeout is
- * claiming @rq and we lost. This is synchronized through
- * hctx_lock(). See blk_mq_timeout_work() for details.
- *
- * Completion path never blocks and we can directly use RCU here
- * instead of hctx_lock() which can be either RCU or SRCU.
- * However, that would complicate paths which want to synchronize
- * against us. Let stay in sync with the issue path so that
- * hctx_lock() covers both issue and completion paths.
- */
- hctx_lock(hctx, &srcu_idx);
- if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
+ if (blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE))
__blk_mq_complete_request(rq);
- hctx_unlock(hctx, srcu_idx);
}
EXPORT_SYMBOL(blk_mq_complete_request);
@@ -662,27 +616,7 @@ void blk_mq_start_request(struct request *rq)
wbt_issue(q->rq_wb, &rq->issue_stat);
}
- WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
-
- /*
- * Mark @rq in-flight which also advances the generation number,
- * and register for timeout. Protect with a seqcount to allow the
- * timeout path to read both @rq->gstate and @rq->deadline
- * coherently.
- *
- * This is the only place where a request is marked in-flight. If
- * the timeout path reads an in-flight @rq->gstate, the
- * @rq->deadline it reads together under @rq->gstate_seq is
- * guaranteed to be the matching one.
- */
- preempt_disable();
- write_seqcount_begin(&rq->gstate_seq);
-
- blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
- blk_add_timer(rq);
-
- write_seqcount_end(&rq->gstate_seq);
- preempt_enable();
+ blk_mq_add_timer(rq, MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT);
if (q->dma_drain_size && blk_rq_bytes(rq)) {
/*
@@ -695,22 +629,19 @@ void blk_mq_start_request(struct request *rq)
}
EXPORT_SYMBOL(blk_mq_start_request);
-/*
- * When we reach here because queue is busy, it's safe to change the state
- * to IDLE without checking @rq->aborted_gstate because we should still be
- * holding the RCU read lock and thus protected against timeout.
- */
static void __blk_mq_requeue_request(struct request *rq)
{
struct request_queue *q = rq->q;
+ enum mq_rq_state old_state = blk_mq_rq_state(rq);
blk_mq_put_driver_tag(rq);
trace_block_rq_requeue(q, rq);
wbt_requeue(q->rq_wb, &rq->issue_stat);
- if (blk_mq_rq_state(rq) != MQ_RQ_IDLE) {
- blk_mq_rq_update_state(rq, MQ_RQ_IDLE);
+ if (old_state != MQ_RQ_IDLE) {
+ if (!blk_mq_change_rq_state(rq, old_state, MQ_RQ_IDLE))
+ WARN_ON_ONCE(true);
if (q->dma_drain_size && blk_rq_bytes(rq))
rq->nr_phys_segments--;
}
@@ -811,7 +742,6 @@ EXPORT_SYMBOL(blk_mq_tag_to_rq);
struct blk_mq_timeout_data {
unsigned long next;
unsigned int next_set;
- unsigned int nr_expired;
};
static void blk_mq_rq_timed_out(struct request *req, bool reserved)
@@ -819,8 +749,6 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
const struct blk_mq_ops *ops = req->q->mq_ops;
enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
- req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED;
-
if (ops->timeout)
ret = ops->timeout(req, reserved);
@@ -829,13 +757,7 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
__blk_mq_complete_request(req);
break;
case BLK_EH_RESET_TIMER:
- /*
- * As nothing prevents from completion happening while
- * ->aborted_gstate is set, this may lead to ignored
- * completions and further spurious timeouts.
- */
- blk_mq_rq_update_aborted_gstate(req, 0);
- blk_add_timer(req);
+ blk_mq_add_timer(req, MQ_RQ_COMPLETE, MQ_RQ_IN_FLIGHT);
break;
case BLK_EH_NOT_HANDLED:
break;
@@ -849,60 +771,23 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
struct request *rq, void *priv, bool reserved)
{
struct blk_mq_timeout_data *data = priv;
- unsigned long gstate, deadline;
- int start;
-
- might_sleep();
-
- if (rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED)
- return;
-
- /* read coherent snapshots of @rq->state_gen and @rq->deadline */
- while (true) {
- start = read_seqcount_begin(&rq->gstate_seq);
- gstate = READ_ONCE(rq->gstate);
- deadline = blk_rq_deadline(rq);
- if (!read_seqcount_retry(&rq->gstate_seq, start))
- break;
- cond_resched();
- }
+ unsigned long deadline = blk_rq_deadline(rq);
- /* if in-flight && overdue, mark for abortion */
- if ((gstate & MQ_RQ_STATE_MASK) == MQ_RQ_IN_FLIGHT &&
- time_after_eq(jiffies, deadline)) {
- blk_mq_rq_update_aborted_gstate(rq, gstate);
- data->nr_expired++;
- hctx->nr_expired++;
+ if (time_after_eq(jiffies, deadline) &&
+ blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) {
+ blk_mq_rq_timed_out(rq, reserved);
} else if (!data->next_set || time_after(data->next, deadline)) {
data->next = deadline;
data->next_set = 1;
}
-}
-static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
- struct request *rq, void *priv, bool reserved)
-{
- /*
- * We marked @rq->aborted_gstate and waited for RCU. If there were
- * completions that we lost to, they would have finished and
- * updated @rq->gstate by now; otherwise, the completion path is
- * now guaranteed to see @rq->aborted_gstate and yield. If
- * @rq->aborted_gstate still matches @rq->gstate, @rq is ours.
- */
- if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) &&
- READ_ONCE(rq->gstate) == rq->aborted_gstate)
- blk_mq_rq_timed_out(rq, reserved);
}
static void blk_mq_timeout_work(struct work_struct *work)
{
struct request_queue *q =
container_of(work, struct request_queue, timeout_work);
- struct blk_mq_timeout_data data = {
- .next = 0,
- .next_set = 0,
- .nr_expired = 0,
- };
+ struct blk_mq_timeout_data data = { };
struct blk_mq_hw_ctx *hctx;
int i;
@@ -925,33 +810,6 @@ static void blk_mq_timeout_work(struct work_struct *work)
/* scan for the expired ones and set their ->aborted_gstate */
blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &data);
- if (data.nr_expired) {
- bool has_rcu = false;
-
- /*
- * Wait till everyone sees ->aborted_gstate. The
- * sequential waits for SRCUs aren't ideal. If this ever
- * becomes a problem, we can add per-hw_ctx rcu_head and
- * wait in parallel.
- */
- queue_for_each_hw_ctx(q, hctx, i) {
- if (!hctx->nr_expired)
- continue;
-
- if (!(hctx->flags & BLK_MQ_F_BLOCKING))
- has_rcu = true;
- else
- synchronize_srcu(hctx->srcu);
-
- hctx->nr_expired = 0;
- }
- if (has_rcu)
- synchronize_rcu();
-
- /* terminate the ones we won */
- blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL);
- }
-
if (data.next_set) {
data.next = blk_rq_timeout(round_jiffies_up(data.next));
mod_timer(&q->timeout, data.next);
@@ -2087,8 +1945,6 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
return ret;
}
- seqcount_init(&rq->gstate_seq);
- u64_stats_init(&rq->aborted_gstate_sync);
return 0;
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 88c558f71819..368cd73a00bd 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,18 +27,11 @@ struct blk_mq_ctx {
struct kobject kobj;
} ____cacheline_aligned_in_smp;
-/*
- * Bits for request->gstate. The lower two bits carry MQ_RQ_* state value
- * and the upper bits the generation number.
- */
+/* Lowest two bits of request->__deadline. */
enum mq_rq_state {
MQ_RQ_IDLE = 0,
MQ_RQ_IN_FLIGHT = 1,
MQ_RQ_COMPLETE = 2,
-
- MQ_RQ_STATE_BITS = 2,
- MQ_RQ_STATE_MASK = (1 << MQ_RQ_STATE_BITS) - 1,
- MQ_RQ_GEN_INC = 1 << MQ_RQ_STATE_BITS,
};
void blk_mq_freeze_queue(struct request_queue *q);
@@ -100,37 +93,55 @@ extern void blk_mq_hctx_kobj_init(struct blk_mq_hw_ctx *hctx);
void blk_mq_release(struct request_queue *q);
+/*
+ * If the state of request @rq equals @old_state, update deadline and request
+ * state atomically to @time and @new_state. blk-mq only.
+ */
+static inline bool blk_mq_rq_set_deadline(struct request *rq,
+ unsigned long new_time,
+ enum mq_rq_state old_state,
+ enum mq_rq_state new_state)
+{
+ unsigned long old_val, new_val;
+
+ do {
+ old_val = READ_ONCE(rq->__deadline);
+ if ((old_val & RQ_STATE_MASK) != old_state)
+ return false;
+ new_val = (new_time & ~RQ_STATE_MASK) |
+ (new_state & RQ_STATE_MASK);
+ } while (cmpxchg(&rq->__deadline, old_val, new_val) != old_val);
+
+ return true;
+}
+
/**
* blk_mq_rq_state() - read the current MQ_RQ_* state of a request
* @rq: target request.
*/
-static inline int blk_mq_rq_state(struct request *rq)
+static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
{
- return READ_ONCE(rq->gstate) & MQ_RQ_STATE_MASK;
+ return READ_ONCE(rq->__deadline) & RQ_STATE_MASK;
}
/**
- * blk_mq_rq_update_state() - set the current MQ_RQ_* state of a request
- * @rq: target request.
- * @state: new state to set.
+ * blk_mq_change_rq_state - atomically test and set request state
+ * @rq: Request pointer.
+ * @old_state: Old request state.
+ * @new_state: New request state.
*
- * Set @rq's state to @state. The caller is responsible for ensuring that
- * there are no other updaters. A request can transition into IN_FLIGHT
- * only from IDLE and doing so increments the generation number.
+ * Returns %true if and only if the old state was @old and if the state has
+ * been changed into @new.
*/
-static inline void blk_mq_rq_update_state(struct request *rq,
- enum mq_rq_state state)
+static inline bool blk_mq_change_rq_state(struct request *rq,
+ enum mq_rq_state old_state,
+ enum mq_rq_state new_state)
{
- u64 old_val = READ_ONCE(rq->gstate);
- u64 new_val = (old_val & ~MQ_RQ_STATE_MASK) | state;
-
- if (state == MQ_RQ_IN_FLIGHT) {
- WARN_ON_ONCE((old_val & MQ_RQ_STATE_MASK) != MQ_RQ_IDLE);
- new_val += MQ_RQ_GEN_INC;
- }
+ unsigned long old_val = (READ_ONCE(rq->__deadline) & ~RQ_STATE_MASK) |
+ old_state;
+ unsigned long new_val = (old_val & ~RQ_STATE_MASK) | new_state;
- /* avoid exposing interim values */
- WRITE_ONCE(rq->gstate, new_val);
+ return cmpxchg(&rq->__deadline, old_val, new_val) == old_val;
}
static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 50a191720055..e98da6db7d4b 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -165,8 +165,9 @@ void blk_abort_request(struct request *req)
* immediately and that scan sees the new timeout value.
* No need for fancy synchronizations.
*/
- blk_rq_set_deadline(req, jiffies);
- kblockd_schedule_work(&req->q->timeout_work);
+ if (blk_mq_rq_set_deadline(req, jiffies, MQ_RQ_IN_FLIGHT,
+ MQ_RQ_IN_FLIGHT))
+ kblockd_schedule_work(&req->q->timeout_work);
} else {
if (blk_mark_rq_complete(req))
return;
@@ -187,52 +188,17 @@ unsigned long blk_rq_timeout(unsigned long timeout)
return timeout;
}
-/**
- * blk_add_timer - Start timeout timer for a single request
- * @req: request that is about to start running.
- *
- * Notes:
- * Each request has its own timer, and as it is added to the queue, we
- * set up the timer. When the request completes, we cancel the timer.
- */
-void blk_add_timer(struct request *req)
+static void __blk_add_timer(struct request *req)
{
struct request_queue *q = req->q;
unsigned long expiry;
- if (!q->mq_ops)
- lockdep_assert_held(q->queue_lock);
-
- /* blk-mq has its own handler, so we don't need ->rq_timed_out_fn */
- if (!q->mq_ops && !q->rq_timed_out_fn)
- return;
-
- BUG_ON(!list_empty(&req->timeout_list));
-
- /*
- * Some LLDs, like scsi, peek at the timeout to prevent a
- * command from being retried forever.
- */
- if (!req->timeout)
- req->timeout = q->rq_timeout;
-
- blk_rq_set_deadline(req, jiffies + req->timeout);
- req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
-
- /*
- * Only the non-mq case needs to add the request to a protected list.
- * For the mq case we simply scan the tag map.
- */
- if (!q->mq_ops)
- list_add_tail(&req->timeout_list, &req->q->timeout_list);
-
/*
* If the timer isn't already pending or this timeout is earlier
* than an existing one, modify the timer. Round up to next nearest
* second.
*/
expiry = blk_rq_timeout(round_jiffies_up(blk_rq_deadline(req)));
-
if (!timer_pending(&q->timeout) ||
time_before(expiry, q->timeout.expires)) {
unsigned long diff = q->timeout.expires - expiry;
@@ -247,5 +213,52 @@ void blk_add_timer(struct request *req)
if (!timer_pending(&q->timeout) || (diff >= HZ / 2))
mod_timer(&q->timeout, expiry);
}
+}
+
+/**
+ * blk_add_timer - Start timeout timer for a single request
+ * @req: request that is about to start running.
+ *
+ * Notes:
+ * Each request has its own timer, and as it is added to the queue, we
+ * set up the timer. When the request completes, we cancel the timer.
+ */
+void blk_add_timer(struct request *req)
+{
+ struct request_queue *q = req->q;
+
+ lockdep_assert_held(q->queue_lock);
+ if (!q->rq_timed_out_fn)
+ return;
+ if (!req->timeout)
+ req->timeout = q->rq_timeout;
+
+ blk_rq_set_deadline(req, jiffies + req->timeout);
+ list_add_tail(&req->timeout_list, &req->q->timeout_list);
+
+ return __blk_add_timer(req);
+}
+
+/**
+ * blk_mq_add_timer - set the deadline for a single request
+ * @req: request for which to set the deadline.
+ * @old: current request state.
+ * @new: new request state.
+ *
+ * Sets the deadline of a request if and only if it has state @old and
+ * at the same time changes the request state from @old into @new. The caller
+ * must guarantee that the request state won't be modified while this function
+ * is in progress.
+ */
+void blk_mq_add_timer(struct request *req, enum mq_rq_state old,
+ enum mq_rq_state new)
+{
+ struct request_queue *q = req->q;
+
+ if (!req->timeout)
+ req->timeout = q->rq_timeout;
+ if (!blk_mq_rq_set_deadline(req, jiffies + req->timeout, old, new))
+ WARN_ON_ONCE(true);
+ return __blk_add_timer(req);
}
diff --git a/block/blk.h b/block/blk.h
index b034fd2460c4..7cd64f533a46 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -170,6 +170,8 @@ static inline bool bio_integrity_endio(struct bio *bio)
void blk_timeout_work(struct work_struct *work);
unsigned long blk_rq_timeout(unsigned long timeout);
void blk_add_timer(struct request *req);
+void blk_mq_add_timer(struct request *req, enum mq_rq_state old,
+ enum mq_rq_state new);
void blk_delete_timer(struct request *);
@@ -308,18 +310,19 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
}
/*
- * Steal a bit from this field for legacy IO path atomic IO marking. Note that
- * setting the deadline clears the bottom bit, potentially clearing the
- * completed bit. The user has to be OK with this (current ones are fine).
+ * Steal two bits from this field. The legacy IO path uses the lowest bit for
+ * atomic IO marking. Note that setting the deadline clears the bottom bit,
+ * potentially clearing the completed bit. The current legacy block layer is
+ * fine with that. Must be called with the request queue lock held.
*/
static inline void blk_rq_set_deadline(struct request *rq, unsigned long time)
{
- rq->__deadline = time & ~0x1UL;
+ rq->__deadline = time & RQ_STATE_MASK;
}
static inline unsigned long blk_rq_deadline(struct request *rq)
{
- return rq->__deadline & ~0x1UL;
+ return rq->__deadline & ~RQ_STATE_MASK;
}
/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49796a3..13ccbb418e89 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -51,7 +51,6 @@ struct blk_mq_hw_ctx {
unsigned int queue_num;
atomic_t nr_active;
- unsigned int nr_expired;
struct hlist_node cpuhp_dead;
struct kobject kobj;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6075d1a6760c..302ce237c728 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,7 +27,6 @@
#include <linux/percpu-refcount.h>
#include <linux/scatterlist.h>
#include <linux/blkzoned.h>
-#include <linux/seqlock.h>
#include <linux/u64_stats_sync.h>
struct module;
@@ -125,10 +124,8 @@ typedef __u32 __bitwise req_flags_t;
#define RQF_SPECIAL_PAYLOAD ((__force req_flags_t)(1 << 18))
/* The per-zone write lock is held for this request */
#define RQF_ZONE_WRITE_LOCKED ((__force req_flags_t)(1 << 19))
-/* timeout is expired */
-#define RQF_MQ_TIMEOUT_EXPIRED ((__force req_flags_t)(1 << 20))
/* already slept for hybrid poll */
-#define RQF_MQ_POLL_SLEPT ((__force req_flags_t)(1 << 21))
+#define RQF_MQ_POLL_SLEPT ((__force req_flags_t)(1 << 20))
/* flags that prevent us from merging requests: */
#define RQF_NOMERGE_FLAGS \
@@ -226,27 +223,12 @@ struct request {
unsigned int extra_len; /* length of alignment and padding */
/*
- * On blk-mq, the lower bits of ->gstate (generation number and
- * state) carry the MQ_RQ_* state value and the upper bits the
- * generation number which is monotonically incremented and used to
- * distinguish the reuse instances.
- *
- * ->gstate_seq allows updates to ->gstate and other fields
- * (currently ->deadline) during request start to be read
- * atomically from the timeout path, so that it can operate on a
- * coherent set of information.
+ * Access through blk_rq_deadline() and blk_rq_set_deadline(),
+ * blk_mark_rq_complete(), blk_clear_rq_complete() and
+ * blk_rq_is_complete() for legacy queues or blk_mq_rq_state() for
+ * blk-mq queues.
*/
- seqcount_t gstate_seq;
- u64 gstate;
-
- /*
- * ->aborted_gstate is used by the timeout to claim a specific
- * recycle instance of this request. See blk_mq_timeout_work().
- */
- struct u64_stats_sync aborted_gstate_sync;
- u64 aborted_gstate;
-
- /* access through blk_rq_set_deadline, blk_rq_deadline */
+#define RQ_STATE_MASK 0x3UL
unsigned long __deadline;
struct list_head timeout_list;
--
2.16.2
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, slab: reschedule cache_reap() on the same CPU
cache_reap() is initially scheduled in start_cpu_timer() via
schedule_delayed_work_on(). But then the next iterations are scheduled
via schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.
Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work
on wq_unbound_cpumask CPUs") there is no guarantee the future iterations
will run on the originally intended cpu, although it's still preferred. I
was able to demonstrate this with
/sys/module/workqueue/parameters/debug_force_rr_cpu. IIUC, it may also
happen due to migrating timers in nohz context. As a result, some CPUs
would end up calling cache_reap() more frequently while others never would.
This patch uses schedule_delayed_work_on() with the current cpu when
scheduling the next iteration.
Link: http://lkml.kernel.org/r/20180411070007.32225-1-vbabka@suse.cz
Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Pekka Enberg <penberg(a)kernel.org>
Acked-by: Christoph Lameter <cl(a)linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: John Stultz <john.stultz(a)linaro.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN mm/slab.c~mm-slab-reschedule-cache_reap-on-the-same-cpu mm/slab.c
--- a/mm/slab.c~mm-slab-reschedule-cache_reap-on-the-same-cpu
+++ a/mm/slab.c
@@ -4086,7 +4086,8 @@ next:
next_reap_node();
out:
/* Set up the next iteration */
- schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
+ schedule_delayed_work_on(smp_processor_id(), work,
+ round_jiffies_relative(REAPTIMEOUT_AC));
}
void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
_
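For context, the pattern the fix switches to looks roughly like the module-style
sketch below: the work handler re-arms itself with schedule_delayed_work_on() on
the CPU it is currently running on, instead of schedule_delayed_work(), which
since ef557180447f may queue to any CPU in wq_unbound_cpumask. This is an
illustrative sketch, not code from the patch; demo_work, demo_reap and the
intervals are made up.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>
#include <linux/smp.h>

static struct delayed_work demo_work;

static void demo_reap(struct work_struct *w)
{
	/* ... periodic per-CPU maintenance would go here ... */

	/*
	 * Re-arm on the CPU we are running on. Plain schedule_delayed_work()
	 * queues to WORK_CPU_UNBOUND and could migrate the next iteration.
	 */
	schedule_delayed_work_on(smp_processor_id(), &demo_work,
				 round_jiffies_relative(2 * HZ));
}

static int __init demo_init(void)
{
	INIT_DELAYED_WORK(&demo_work, demo_reap);
	/* Start on one known CPU, mirroring what start_cpu_timer() does. */
	schedule_delayed_work_on(0, &demo_work, round_jiffies_relative(HZ));
	return 0;
}

static void __exit demo_exit(void)
{
	cancel_delayed_work_sync(&demo_work);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");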
From: Eric Biggers <ebiggers(a)google.com>
Subject: ipc/shm: fix use-after-free of shm file via remap_file_pages()
syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages(). Unfortunately
it couldn't generate a reproducer, but I found a bug which I think caused
it. When remap_file_pages() is passed a full System V shared memory
segment, the memory is first unmapped, then a new map is created using the
->vm_file. Between these steps, the shm ID can be removed and reused for
a new shm segment. But, shm_mmap() only checks whether the ID is
currently valid before calling the underlying file's ->mmap(); it doesn't
check whether it was reused. Thus it can use the wrong underlying file,
one that was already freed.
Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches the
one associated with the "outer" file. Taking the reference to the real
shm file is needed to fully solve the problem, since otherwise sfd->file
could point to a freed file, which then could be reallocated for the
reused shm ID, causing the wrong shm segment to be mapped (and without the
required permission checks).
Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because it
didn't consider the case where the shm ID is reused.
The following program usually reproduces this bug:
#include <stdlib.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <unistd.h>
int main()
{
	int is_parent = (fork() != 0);
	srand(getpid());
	for (;;) {
		int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
		if (is_parent) {
			void *addr = shmat(id, NULL, 0);
			usleep(rand() % 50);
			while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
		} else {
			usleep(rand() % 50);
			shmctl(id, IPC_RMID, NULL);
		}
	}
}
It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed. (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report. But I think it's possible
with this bug; it would just take a more extraordinary race...)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
[...]
Call Trace:
file_accessed include/linux/fs.h:2063 [inline]
shmem_mmap+0x25/0x40 mm/shmem.c:2149
call_mmap include/linux/fs.h:1789 [inline]
shm_mmap+0x34/0x80 ipc/shm.c:465
call_mmap include/linux/fs.h:1789 [inline]
mmap_region+0x309/0x5b0 mm/mmap.c:1712
do_mmap+0x294/0x4a0 mm/mmap.c:1483
do_mmap_pgoff include/linux/mm.h:2235 [inline]
SYSC_remap_file_pages mm/mmap.c:2853 [inline]
SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ebiggers(a)google.com: add comment]
Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439(a)syzkaller.appspotmail.com
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso(a)suse.de>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: "Eric W . Biederman" <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/shm.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff -puN ipc/shm.c~ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages ipc/shm.c
--- a/ipc/shm.c~ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages
+++ a/ipc/shm.c
@@ -225,6 +225,12 @@ static int __shm_open(struct vm_area_str
if (IS_ERR(shp))
return PTR_ERR(shp);
+ if (shp->shm_file != sfd->file) {
+ /* ID was reused */
+ shm_unlock(shp);
+ return -EINVAL;
+ }
+
shp->shm_atim = ktime_get_real_seconds();
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_nattch++;
@@ -455,8 +461,9 @@ static int shm_mmap(struct file *file, s
int ret;
/*
- * In case of remap_file_pages() emulation, the file can represent
- * removed IPC ID: propogate shm_lock() error to caller.
+ * In case of remap_file_pages() emulation, the file can represent an
+ * IPC ID that was removed, and possibly even reused by another shm
+ * segment already. Propagate this case as an error to caller.
*/
ret = __shm_open(vma);
if (ret)
@@ -480,6 +487,7 @@ static int shm_release(struct inode *ino
struct shm_file_data *sfd = shm_file_data(file);
put_ipc_ns(sfd->ns);
+ fput(sfd->file);
shm_file_data(file) = NULL;
kfree(sfd);
return 0;
@@ -1445,7 +1453,16 @@ long do_shmat(int shmid, char __user *sh
file->f_mapping = shp->shm_file->f_mapping;
sfd->id = shp->shm_perm.id;
sfd->ns = get_ipc_ns(ns);
- sfd->file = shp->shm_file;
+ /*
+ * We need to take a reference to the real shm file to prevent the
+ * pointer from becoming stale in cases where the lifetime of the outer
+ * file extends beyond that of the shm segment. It's not usually
+ * possible, but it can happen during remap_file_pages() emulation as
+ * that unmaps the memory, then does ->mmap() via file reference only.
+ * We'll deny the ->mmap() if the shm segment was since removed, but to
+ * detect shm ID reuse we need to compare the file pointers.
+ */
+ sfd->file = get_file(shp->shm_file);
sfd->vm_ops = NULL;
err = security_mmap_file(file, prot, flags);
_
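The shape of the fix above is a generic "pin and re-check" pattern: the wrapper
takes its own reference to the backing object (the get_file() in do_shmat()),
and every later open compares the object currently registered under the ID
against the pinned pointer (the shp->shm_file != sfd->file test), so a recycled
ID is rejected with -EINVAL instead of silently mapping someone else's segment.
A minimal userspace sketch of that idea, with invented demo_* names and a toy
ID table standing in for the IPC namespace, might look like this:

#include <stdio.h>
#include <stdlib.h>

struct demo_seg {
	int refcount;
	int payload;
};

struct demo_attach {
	int id;				/* ID recorded at attach time */
	struct demo_seg *pinned;	/* our own reference to the segment */
};

#define DEMO_IDS 4
static struct demo_seg *demo_table[DEMO_IDS];	/* IDs get recycled */

static struct demo_seg *demo_get(struct demo_seg *s) { s->refcount++; return s; }
static void demo_put(struct demo_seg *s) { if (--s->refcount == 0) free(s); }

/* Analogue of __shm_open(): reject IDs that were removed or reused. */
static int demo_reopen(struct demo_attach *a)
{
	struct demo_seg *cur = demo_table[a->id];

	if (!cur || cur != a->pinned)
		return -1;		/* ID gone or recycled: refuse */
	return 0;
}

int main(void)
{
	struct demo_attach a = { .id = 0 };

	demo_table[0] = calloc(1, sizeof(*demo_table[0]));
	demo_table[0]->refcount = 1;
	a.pinned = demo_get(demo_table[0]);	/* like get_file() */

	/* Segment removed, then the ID recycled for a brand new one. */
	demo_put(demo_table[0]);
	demo_table[0] = calloc(1, sizeof(*demo_table[0]));
	demo_table[0]->refcount = 1;

	printf("reopen after reuse: %d\n", demo_reopen(&a));	/* prints -1 */

	demo_put(a.pinned);			/* like fput() on release */
	return 0;
}

The pinned reference is what keeps the old segment's memory valid even after it
is removed from the table, and the pointer comparison is what turns ID reuse
into a clean error rather than an attach to the wrong segment.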