The patch titled
Subject: mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
has been added to the -mm tree. Its filename is
fix-null-pointer-in-page_cache_tree_insert.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/fix-null-pointer-in-page_cache_tre…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/fix-null-pointer-in-page_cache_tre…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Matthew Wilcox <mawilcox(a)microsoft.com>
Subject: mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
f2fs specifies the __GFP_ZERO flag when allocating some of its pages.
Unfortunately, the page cache also uses the mapping's GFP flags for
allocating radix tree nodes. It has always masked off the __GFP_HIGHMEM
flag, but masks off __GFP_ZERO only in some paths, not all. That causes
radix tree nodes to be allocated with a NULL list_head, which causes
backtraces like:
[<ffffff80086f4de0>] __list_del_entry+0x30/0xd0
[<ffffff8008362018>] list_lru_del+0xac/0x1ac
[<ffffff800830f04c>] page_cache_tree_insert+0xd8/0x110
The __GFP_DMA and __GFP_DMA32 flags would also be able to sneak through
if they were ever used. Fix them all by applying GFP_RECLAIM_MASK at the
innermost location, and remove the masking from earlier in the call chain.
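For illustration, here is a minimal standalone C sketch of the masking
idea. The flag values are invented for the example and do not match the
kernel's real GFP bit layout; only the difference between clearing just
__GFP_HIGHMEM and keeping only reclaim-relevant bits is meant to carry
over:

/* Illustrative only; made-up flag values, not the kernel's. */
#include <stdio.h>

#define F_HIGHMEM 0x01u
#define F_ZERO    0x02u
#define F_DMA     0x04u
#define F_IO      0x08u  /* reclaim-relevant */
#define F_FS      0x10u  /* reclaim-relevant */

/* Counterpart of GFP_RECLAIM_MASK: only reclaim-relevant bits pass. */
#define RECLAIM_MASK (F_IO | F_FS)

int main(void)
{
	unsigned int mapping_gfp = F_ZERO | F_IO | F_FS; /* f2fs-like mapping */

	/* Old code: only HIGHMEM stripped, so F_ZERO leaks into node allocation. */
	printf("old: %#x\n", mapping_gfp & ~F_HIGHMEM);   /* 0x1a, F_ZERO kept */

	/* New code: the mask keeps only reclaim bits, F_ZERO can never leak. */
	printf("new: %#x\n", mapping_gfp & RECLAIM_MASK); /* 0x18, F_ZERO gone */
	return 0;
}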
Link: http://lkml.kernel.org/r/20180411060320.14458-2-willy@infradead.org
Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Reported-by: Chris Fries <cfries(a)google.com>
Debugged-by: Minchan Kim <minchan(a)kernel.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff -puN mm/filemap.c~fix-null-pointer-in-page_cache_tree_insert mm/filemap.c
--- a/mm/filemap.c~fix-null-pointer-in-page_cache_tree_insert
+++ a/mm/filemap.c
@@ -786,7 +786,7 @@ int replace_page_cache_page(struct page
VM_BUG_ON_PAGE(!PageLocked(new), new);
VM_BUG_ON_PAGE(new->mapping, new);
- error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+ error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
if (!error) {
struct address_space *mapping = old->mapping;
void (*freepage)(struct page *);
@@ -842,7 +842,7 @@ static int __add_to_page_cache_locked(st
return error;
}
- error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
+ error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
if (error) {
if (!huge)
mem_cgroup_cancel_charge(page, memcg, false);
@@ -1585,8 +1585,7 @@ no_page:
if (fgp_flags & FGP_ACCESSED)
__SetPageReferenced(page);
- err = add_to_page_cache_lru(page, mapping, offset,
- gfp_mask & GFP_RECLAIM_MASK);
+ err = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
if (unlikely(err)) {
put_page(page);
page = NULL;
@@ -2387,7 +2386,7 @@ static int page_cache_read(struct file *
if (!page)
return -ENOMEM;
- ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
+ ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
if (ret == 0)
ret = mapping->a_ops->readpage(file, page);
else if (ret == -EEXIST)
_
Patches currently in -mm which might be from mawilcox(a)microsoft.com are
fix-null-pointer-in-page_cache_tree_insert.patch
slab-__gfp_zero-is-incompatible-with-a-constructor.patch
The below commit
"drm/atomic: Try to preserve the crtc enabled state in drm_atomic_remove_fb, v2"
introduces a slight behavioral change to rmfb. Instead of disabling a crtc
when its primary plane is disabled, it now preserves the enabled state.
Since DC is currently not equipped to handle this, we need to fail such
a commit, otherwise we might see a corrupted screen.
This is based on Shirish's previous approach but avoids adding all
planes to the new atomic state, which would lead to a full update in DC
for every commit and is not what we intend.
Theoretically, DM should be able to deal with states with fully populated
planes, even for simple updates such as cursor updates. This should still
be addressed in the future.
Signed-off-by: Harry Wentland <harry.wentland(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 6f92a19bebd6..0bdc6b484bad 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -4683,6 +4683,7 @@ static int dm_update_crtcs_state(struct amdgpu_display_manager *dm,
struct amdgpu_dm_connector *aconnector = NULL;
struct drm_connector_state *new_con_state = NULL;
struct dm_connector_state *dm_conn_state = NULL;
+ struct drm_plane_state *new_plane_state = NULL;
new_stream = NULL;
@@ -4690,6 +4691,13 @@ static int dm_update_crtcs_state(struct amdgpu_display_manager *dm,
dm_new_crtc_state = to_dm_crtc_state(new_crtc_state);
acrtc = to_amdgpu_crtc(crtc);
+ new_plane_state = drm_atomic_get_new_plane_state(state, new_crtc_state->crtc->primary);
+
+ if (new_crtc_state->enable && new_plane_state && !new_plane_state->fb) {
+ ret = -EINVAL;
+ goto fail;
+ }
+
aconnector = amdgpu_dm_find_first_crtc_matching_connector(state, crtc);
/* TODO This hack should go away */
@@ -4894,7 +4902,7 @@ static int dm_update_planes_state(struct dc *dc,
if (!dm_old_crtc_state->stream)
continue;
- DRM_DEBUG_DRIVER("Disabling DRM plane: %d on DRM crtc %d\n",
+ DRM_DEBUG_ATOMIC("Disabling DRM plane: %d on DRM crtc %d\n",
plane->base.id, old_plane_crtc->base.id);
if (!dc_remove_plane_from_context(
--
2.17.0
Hi.
[This is an automated email]
This commit has been processed by the -stable helper bot and determined
to be a high-probability candidate for -stable trees. (score: 13.1846)
The bot has tested the following trees: v4.15.15, v4.14.32, v4.9.92, v4.4.126.
v4.15.15: Build OK!
v4.14.32: Build OK!
v4.9.92: Failed to apply! Possible dependencies:
3920ad4951e2 ("usbip: vhc_hcd: prevent module being removed while device are attached")
a38711a88b7e ("usbip: auto retry for concurrent attach")
v4.4.126: Failed to apply! Possible dependencies:
0775a9cbc694 ("usbip: vhci extension: modifications to vhci driver")
3920ad4951e2 ("usbip: vhc_hcd: prevent module being removed while device are attached")
a38711a88b7e ("usbip: auto retry for concurrent attach")
Please let us know if you'd like to have this patch included in a stable tree.
--
Thanks.
Sasha
If a completion occurs after blk_mq_rq_timed_out() has reset
rq->aborted_gstate, and the request is again in flight when the timeout
expires, then the request will be completed twice: a first time by the
timeout handler and a second time when the regular completion occurs.
Additionally, the blk-mq timeout handling code ignores completions that
occur after blk_mq_check_expired() has been called and before
blk_mq_rq_timed_out() has reset rq->aborted_gstate. If a block driver
timeout handler always returns BLK_EH_RESET_TIMER, the result will be
that the request never terminates.
Since the request state can be updated from two different contexts,
namely regular completion and request timeout, this race cannot be
fixed with RCU synchronization only. Fix this race as follows:
- Split __deadline in two variables, namely lq_deadline for legacy
request queues and mq_deadline for blk-mq request queues. Use atomic
operations to update mq_deadline.
- Use the deadline instead of the request generation to detect whether
or not a request timer fired after reinitialization of a request.
- Store the request state in the lowest two bits of the deadline instead
of the lowest two bits of 'gstate' (see the sketch after this list).
- Remove all request member variables that became superfluous due to
this change: gstate, aborted_gstate, gstate_seq and aborted_gstate_sync.
- Remove the request state information that became superfluous due to this
patch, namely RQF_MQ_TIMEOUT_EXPIRED.
- Remove the hctx member that became superfluous due to these changes,
namely nr_expired.
- Remove the code that became superfluous due to this change, namely
the RCU lock and unlock statements in blk_mq_complete_request() and
also the synchronize_rcu() call in the timeout handler.
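As a minimal illustration of that two-bit encoding, the following
standalone C11 sketch (invented names and values, not the kernel
implementation) shows how packing the state into the low bits of a
single atomic word lets exactly one of two racing paths claim a
request:

/*
 * Illustrative only: STATE_MASK, rq_deadline and change_rq_state()
 * are stand-ins for MQ_RQ_STATE_MASK, rq->mq_deadline and
 * blk_mq_change_rq_state(). Build with: cc -std=c11 sketch.c
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum rq_state { RQ_IDLE = 0, RQ_IN_FLIGHT = 1, RQ_COMPLETE = 2 };
#define STATE_MASK 0x3UL

static atomic_ulong rq_deadline; /* deadline in the high bits, state in the low two */

/* Succeeds only if the current state is 'old'; cmpxchg makes it atomic. */
static bool change_rq_state(enum rq_state old, enum rq_state new)
{
	unsigned long old_d = (atomic_load(&rq_deadline) & ~STATE_MASK) | old;
	unsigned long new_d = (old_d & ~STATE_MASK) | new;

	return atomic_compare_exchange_strong(&rq_deadline, &old_d, new_d);
}

int main(void)
{
	/* Start a request: deadline "1000", state IN_FLIGHT. */
	atomic_store(&rq_deadline, (1000UL & ~STATE_MASK) | RQ_IN_FLIGHT);

	/* Completion and timeout race; only one transition can win. */
	printf("first claim:  %d\n", change_rq_state(RQ_IN_FLIGHT, RQ_COMPLETE));
	printf("second claim: %d\n", change_rq_state(RQ_IN_FLIGHT, RQ_COMPLETE));
	return 0;
}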
This patch fixes the following kernel crashes:
BUG: unable to handle kernel NULL pointer dereference at (null)
Oops: 0000 [#1] PREEMPT SMP
CPU: 2 PID: 151 Comm: kworker/2:1H Tainted: G W 4.15.0-dbg+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: kblockd blk_mq_timeout_work
RIP: 0010:scsi_times_out+0x17/0x2c0 [scsi_mod]
Call Trace:
blk_mq_terminate_expired+0x42/0x80
bt_iter+0x3d/0x50
blk_mq_queue_tag_busy_iter+0xe9/0x200
blk_mq_timeout_work+0x181/0x2e0
process_one_work+0x21c/0x6d0
worker_thread+0x35/0x380
kthread+0x117/0x130
ret_from_fork+0x24/0x30
This patch also fixes a double completion problem in the NVMeOF
initiator driver. See also http://lists.infradead.org/pipermail/linux-nvme/2018-February/015848.html.
Fixes: 1d9bd5161ba3 ("blk-mq: replace timeout synchronization with a RCU and generation based scheme")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Sagi Grimberg <sagi(a)grimberg.me>
Cc: Israel Rukshin <israelr(a)mellanox.com>
Cc: Max Gurtovoy <maxg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org> # v4.16
---
Changes compared to v3 (see also https://www.mail-archive.com/linux-block@vger.kernel.org/msg20073.html):
- Removed the spinlock that had been introduced to protect the request state;
v4 uses atomic_long_cmpxchg() instead.
- Split __deadline into two variables - one for the legacy block layer and one
for blk-mq.
Changes compared to v2 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18338.html):
- Rebased and retested on top of kernel v4.16.
Changes compared to v1 (https://www.mail-archive.com/linux-block@vger.kernel.org/msg18089.html):
- Removed the gstate and aborted_gstate members of struct request and used
the __deadline member to encode both the generation and state information.
block/blk-core.c | 2 -
block/blk-mq-debugfs.c | 1 -
block/blk-mq.c | 166 +++----------------------------------------------
block/blk-mq.h | 47 ++++++++------
block/blk-timeout.c | 57 ++++++++++++-----
block/blk.h | 41 ++++++++++--
include/linux/blk-mq.h | 1 -
include/linux/blkdev.h | 32 +++-------
8 files changed, 122 insertions(+), 225 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 0c48bef8490f..422b79b61bb9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -200,8 +200,6 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
rq->start_time = jiffies;
set_start_time_ns(rq);
rq->part = NULL;
- seqcount_init(&rq->gstate_seq);
- u64_stats_init(&rq->aborted_gstate_sync);
}
EXPORT_SYMBOL(blk_rq_init);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 6f72413b6cab..80c7c585769f 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -345,7 +345,6 @@ static const char *const rqf_name[] = {
RQF_NAME(STATS),
RQF_NAME(SPECIAL_PAYLOAD),
RQF_NAME(ZONE_WRITE_LOCKED),
- RQF_NAME(MQ_TIMEOUT_EXPIRED),
RQF_NAME(MQ_POLL_SLEPT),
};
#undef RQF_NAME
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7816d28b7219..337e10a5a30c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -305,7 +305,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
rq->special = NULL;
/* tag was already set */
rq->extra_len = 0;
- rq->__deadline = 0;
INIT_LIST_HEAD(&rq->timeout_list);
rq->timeout = 0;
@@ -527,8 +526,7 @@ static void __blk_mq_complete_request(struct request *rq)
bool shared = false;
int cpu;
- WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT);
- blk_mq_rq_update_state(rq, MQ_RQ_COMPLETE);
+ WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_COMPLETE);
if (rq->internal_tag != -1)
blk_mq_sched_completed_request(rq);
@@ -577,36 +575,6 @@ static void hctx_lock(struct blk_mq_hw_ctx *hctx, int *srcu_idx)
*srcu_idx = srcu_read_lock(hctx->srcu);
}
-static void blk_mq_rq_update_aborted_gstate(struct request *rq, u64 gstate)
-{
- unsigned long flags;
-
- /*
- * blk_mq_rq_aborted_gstate() is used from the completion path and
- * can thus be called from irq context. u64_stats_fetch in the
- * middle of update on the same CPU leads to lockup. Disable irq
- * while updating.
- */
- local_irq_save(flags);
- u64_stats_update_begin(&rq->aborted_gstate_sync);
- rq->aborted_gstate = gstate;
- u64_stats_update_end(&rq->aborted_gstate_sync);
- local_irq_restore(flags);
-}
-
-static u64 blk_mq_rq_aborted_gstate(struct request *rq)
-{
- unsigned int start;
- u64 aborted_gstate;
-
- do {
- start = u64_stats_fetch_begin(&rq->aborted_gstate_sync);
- aborted_gstate = rq->aborted_gstate;
- } while (u64_stats_fetch_retry(&rq->aborted_gstate_sync, start));
-
- return aborted_gstate;
-}
-
/**
* blk_mq_complete_request - end I/O on a request
* @rq: the request being processed
@@ -618,27 +586,12 @@ static u64 blk_mq_rq_aborted_gstate(struct request *rq)
void blk_mq_complete_request(struct request *rq)
{
struct request_queue *q = rq->q;
- struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, rq->mq_ctx->cpu);
- int srcu_idx;
if (unlikely(blk_should_fake_timeout(q)))
return;
- /*
- * If @rq->aborted_gstate equals the current instance, timeout is
- * claiming @rq and we lost. This is synchronized through
- * hctx_lock(). See blk_mq_timeout_work() for details.
- *
- * Completion path never blocks and we can directly use RCU here
- * instead of hctx_lock() which can be either RCU or SRCU.
- * However, that would complicate paths which want to synchronize
- * against us. Let stay in sync with the issue path so that
- * hctx_lock() covers both issue and completion paths.
- */
- hctx_lock(hctx, &srcu_idx);
- if (blk_mq_rq_aborted_gstate(rq) != rq->gstate)
+ if (blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE))
__blk_mq_complete_request(rq);
- hctx_unlock(hctx, srcu_idx);
}
EXPORT_SYMBOL(blk_mq_complete_request);
@@ -662,27 +615,8 @@ void blk_mq_start_request(struct request *rq)
wbt_issue(q->rq_wb, &rq->issue_stat);
}
- WARN_ON_ONCE(blk_mq_rq_state(rq) != MQ_RQ_IDLE);
-
- /*
- * Mark @rq in-flight which also advances the generation number,
- * and register for timeout. Protect with a seqcount to allow the
- * timeout path to read both @rq->gstate and @rq->deadline
- * coherently.
- *
- * This is the only place where a request is marked in-flight. If
- * the timeout path reads an in-flight @rq->gstate, the
- * @rq->deadline it reads together under @rq->gstate_seq is
- * guaranteed to be the matching one.
- */
- preempt_disable();
- write_seqcount_begin(&rq->gstate_seq);
-
- blk_mq_rq_update_state(rq, MQ_RQ_IN_FLIGHT);
- blk_add_timer(rq);
-
- write_seqcount_end(&rq->gstate_seq);
- preempt_enable();
+ /* Mark @rq in-flight and set its deadline. */
+ blk_mq_add_timer(rq, MQ_RQ_IDLE, MQ_RQ_IN_FLIGHT);
if (q->dma_drain_size && blk_rq_bytes(rq)) {
/*
@@ -695,11 +629,6 @@ void blk_mq_start_request(struct request *rq)
}
EXPORT_SYMBOL(blk_mq_start_request);
-/*
- * When we reach here because queue is busy, it's safe to change the state
- * to IDLE without checking @rq->aborted_gstate because we should still be
- * holding the RCU read lock and thus protected against timeout.
- */
static void __blk_mq_requeue_request(struct request *rq)
{
struct request_queue *q = rq->q;
@@ -811,7 +740,6 @@ EXPORT_SYMBOL(blk_mq_tag_to_rq);
struct blk_mq_timeout_data {
unsigned long next;
unsigned int next_set;
- unsigned int nr_expired;
};
static void blk_mq_rq_timed_out(struct request *req, bool reserved)
@@ -819,8 +747,6 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
const struct blk_mq_ops *ops = req->q->mq_ops;
enum blk_eh_timer_return ret = BLK_EH_RESET_TIMER;
- req->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED;
-
if (ops->timeout)
ret = ops->timeout(req, reserved);
@@ -829,13 +755,7 @@ static void blk_mq_rq_timed_out(struct request *req, bool reserved)
__blk_mq_complete_request(req);
break;
case BLK_EH_RESET_TIMER:
- /*
- * As nothing prevents from completion happening while
- * ->aborted_gstate is set, this may lead to ignored
- * completions and further spurious timeouts.
- */
- blk_mq_rq_update_aborted_gstate(req, 0);
- blk_add_timer(req);
+ blk_mq_add_timer(req, MQ_RQ_COMPLETE, MQ_RQ_IN_FLIGHT);
break;
case BLK_EH_NOT_HANDLED:
break;
@@ -849,60 +769,23 @@ static void blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
struct request *rq, void *priv, bool reserved)
{
struct blk_mq_timeout_data *data = priv;
- unsigned long gstate, deadline;
- int start;
-
- might_sleep();
-
- if (rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED)
- return;
-
- /* read coherent snapshots of @rq->state_gen and @rq->deadline */
- while (true) {
- start = read_seqcount_begin(&rq->gstate_seq);
- gstate = READ_ONCE(rq->gstate);
- deadline = blk_rq_deadline(rq);
- if (!read_seqcount_retry(&rq->gstate_seq, start))
- break;
- cond_resched();
- }
+ unsigned long deadline = blk_mq_rq_deadline(rq);
- /* if in-flight && overdue, mark for abortion */
- if ((gstate & MQ_RQ_STATE_MASK) == MQ_RQ_IN_FLIGHT &&
- time_after_eq(jiffies, deadline)) {
- blk_mq_rq_update_aborted_gstate(rq, gstate);
- data->nr_expired++;
- hctx->nr_expired++;
+ if (time_after_eq(jiffies, deadline) &&
+ blk_mq_change_rq_state(rq, MQ_RQ_IN_FLIGHT, MQ_RQ_COMPLETE)) {
+ blk_mq_rq_timed_out(rq, reserved);
} else if (!data->next_set || time_after(data->next, deadline)) {
data->next = deadline;
data->next_set = 1;
}
-}
-static void blk_mq_terminate_expired(struct blk_mq_hw_ctx *hctx,
- struct request *rq, void *priv, bool reserved)
-{
- /*
- * We marked @rq->aborted_gstate and waited for RCU. If there were
- * completions that we lost to, they would have finished and
- * updated @rq->gstate by now; otherwise, the completion path is
- * now guaranteed to see @rq->aborted_gstate and yield. If
- * @rq->aborted_gstate still matches @rq->gstate, @rq is ours.
- */
- if (!(rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED) &&
- READ_ONCE(rq->gstate) == rq->aborted_gstate)
- blk_mq_rq_timed_out(rq, reserved);
}
static void blk_mq_timeout_work(struct work_struct *work)
{
struct request_queue *q =
container_of(work, struct request_queue, timeout_work);
- struct blk_mq_timeout_data data = {
- .next = 0,
- .next_set = 0,
- .nr_expired = 0,
- };
+ struct blk_mq_timeout_data data = { };
struct blk_mq_hw_ctx *hctx;
int i;
@@ -925,33 +808,6 @@ static void blk_mq_timeout_work(struct work_struct *work)
/* scan for the expired ones and set their ->aborted_gstate */
blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &data);
- if (data.nr_expired) {
- bool has_rcu = false;
-
- /*
- * Wait till everyone sees ->aborted_gstate. The
- * sequential waits for SRCUs aren't ideal. If this ever
- * becomes a problem, we can add per-hw_ctx rcu_head and
- * wait in parallel.
- */
- queue_for_each_hw_ctx(q, hctx, i) {
- if (!hctx->nr_expired)
- continue;
-
- if (!(hctx->flags & BLK_MQ_F_BLOCKING))
- has_rcu = true;
- else
- synchronize_srcu(hctx->srcu);
-
- hctx->nr_expired = 0;
- }
- if (has_rcu)
- synchronize_rcu();
-
- /* terminate the ones we won */
- blk_mq_queue_tag_busy_iter(q, blk_mq_terminate_expired, NULL);
- }
-
if (data.next_set) {
data.next = blk_rq_timeout(round_jiffies_up(data.next));
mod_timer(&q->timeout, data.next);
@@ -2087,8 +1943,6 @@ static int blk_mq_init_request(struct blk_mq_tag_set *set, struct request *rq,
return ret;
}
- seqcount_init(&rq->gstate_seq);
- u64_stats_init(&rq->aborted_gstate_sync);
return 0;
}
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 88c558f71819..4f96fd66eb8a 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,10 +27,7 @@ struct blk_mq_ctx {
struct kobject kobj;
} ____cacheline_aligned_in_smp;
-/*
- * Bits for request->gstate. The lower two bits carry MQ_RQ_* state value
- * and the upper bits the generation number.
- */
+/* Lowest two bits of request->mq_deadline. */
enum mq_rq_state {
MQ_RQ_IDLE = 0,
MQ_RQ_IN_FLIGHT = 1,
@@ -38,7 +35,6 @@ enum mq_rq_state {
MQ_RQ_STATE_BITS = 2,
MQ_RQ_STATE_MASK = (1 << MQ_RQ_STATE_BITS) - 1,
- MQ_RQ_GEN_INC = 1 << MQ_RQ_STATE_BITS,
};
void blk_mq_freeze_queue(struct request_queue *q);
@@ -104,9 +100,30 @@ void blk_mq_release(struct request_queue *q);
* blk_mq_rq_state() - read the current MQ_RQ_* state of a request
* @rq: target request.
*/
-static inline int blk_mq_rq_state(struct request *rq)
+static inline enum mq_rq_state blk_mq_rq_state(struct request *rq)
{
- return READ_ONCE(rq->gstate) & MQ_RQ_STATE_MASK;
+ return atomic_long_read(&rq->mq_deadline) & MQ_RQ_STATE_MASK;
+}
+
+/**
+ * blk_mq_change_rq_state - atomically test and set request state
+ * @rq: Request pointer.
+ * @old: Old request state.
+ * @new: New request state.
+ *
+ * Returns %true if and only if the old state was @old and if the state has
+ * been changed into @new.
+ */
+static inline bool blk_mq_change_rq_state(struct request *rq,
+ enum mq_rq_state old_s,
+ enum mq_rq_state new_s)
+{
+ unsigned long old_d = (atomic_long_read(&rq->mq_deadline) &
+ ~(unsigned long)MQ_RQ_STATE_MASK) | old_s;
+ unsigned long new_d = (old_d & ~(unsigned long)MQ_RQ_STATE_MASK) |
+ new_s;
+
+ return atomic_long_cmpxchg(&rq->mq_deadline, old_d, new_d) == old_d;
}
/**
@@ -114,23 +131,13 @@ static inline int blk_mq_rq_state(struct request *rq)
* @rq: target request.
* @state: new state to set.
*
- * Set @rq's state to @state. The caller is responsible for ensuring that
- * there are no other updaters. A request can transition into IN_FLIGHT
- * only from IDLE and doing so increments the generation number.
+ * Set @rq's state to @state.
*/
static inline void blk_mq_rq_update_state(struct request *rq,
- enum mq_rq_state state)
+ enum mq_rq_state new_s)
{
- u64 old_val = READ_ONCE(rq->gstate);
- u64 new_val = (old_val & ~MQ_RQ_STATE_MASK) | state;
-
- if (state == MQ_RQ_IN_FLIGHT) {
- WARN_ON_ONCE((old_val & MQ_RQ_STATE_MASK) != MQ_RQ_IDLE);
- new_val += MQ_RQ_GEN_INC;
+ while (!blk_mq_change_rq_state(rq, blk_mq_rq_state(rq), new_s)) {
}
-
- /* avoid exposing interim values */
- WRITE_ONCE(rq->gstate, new_val);
}
static inline struct blk_mq_ctx *__blk_mq_get_ctx(struct request_queue *q,
diff --git a/block/blk-timeout.c b/block/blk-timeout.c
index 50a191720055..3ca829dce2d6 100644
--- a/block/blk-timeout.c
+++ b/block/blk-timeout.c
@@ -165,8 +165,9 @@ void blk_abort_request(struct request *req)
* immediately and that scan sees the new timeout value.
* No need for fancy synchronizations.
*/
- blk_rq_set_deadline(req, jiffies);
- kblockd_schedule_work(&req->q->timeout_work);
+ if (blk_mq_rq_set_deadline(req, jiffies, MQ_RQ_IN_FLIGHT,
+ MQ_RQ_IN_FLIGHT))
+ kblockd_schedule_work(&req->q->timeout_work);
} else {
if (blk_mark_rq_complete(req))
return;
@@ -187,15 +188,8 @@ unsigned long blk_rq_timeout(unsigned long timeout)
return timeout;
}
-/**
- * blk_add_timer - Start timeout timer for a single request
- * @req: request that is about to start running.
- *
- * Notes:
- * Each request has its own timer, and as it is added to the queue, we
- * set up the timer. When the request completes, we cancel the timer.
- */
-void blk_add_timer(struct request *req)
+static void __blk_add_timer(struct request *req, enum mq_rq_state old,
+ enum mq_rq_state new)
{
struct request_queue *q = req->q;
unsigned long expiry;
@@ -216,15 +210,17 @@ void blk_add_timer(struct request *req)
if (!req->timeout)
req->timeout = q->rq_timeout;
- blk_rq_set_deadline(req, jiffies + req->timeout);
- req->rq_flags &= ~RQF_MQ_TIMEOUT_EXPIRED;
-
/*
* Only the non-mq case needs to add the request to a protected list.
* For the mq case we simply scan the tag map.
*/
- if (!q->mq_ops)
+ if (!q->mq_ops) {
+ blk_rq_set_deadline(req, jiffies + req->timeout);
list_add_tail(&req->timeout_list, &req->q->timeout_list);
+ } else {
+ WARN_ON_ONCE(!blk_mq_rq_set_deadline(req, jiffies +
+ req->timeout, old, new));
+ }
/*
* If the timer isn't already pending or this timeout is earlier
@@ -249,3 +245,34 @@ void blk_add_timer(struct request *req)
}
}
+
+/**
+ * blk_add_timer - Start timeout timer for a single request
+ * @req: request that is about to start running.
+ *
+ * Notes:
+ * Each request has its own timer, and as it is added to the queue, we
+ * set up the timer. When the request completes, we cancel the timer.
+ */
+void blk_add_timer(struct request *req)
+{
+ return __blk_add_timer(req, MQ_RQ_IDLE/*ignored*/,
+ MQ_RQ_IDLE/*ignored*/);
+}
+
+/**
+ * blk_mq_add_timer - set the deadline for a single request
+ * @req: request for which to set the deadline.
+ * @old: current request state.
+ * @new: new request state.
+ *
+ * Sets the deadline of a request if and only if it has state @old and
+ * at the same time changes the request state from @old into @new. The caller
+ * must guarantee that the request state won't be modified while this function
+ * is in progress.
+ */
+void blk_mq_add_timer(struct request *req, enum mq_rq_state old,
+ enum mq_rq_state new)
+{
+ return __blk_add_timer(req, old, new);
+}
diff --git a/block/blk.h b/block/blk.h
index b034fd2460c4..7665d4af777e 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -170,6 +170,8 @@ static inline bool bio_integrity_endio(struct bio *bio)
void blk_timeout_work(struct work_struct *work);
unsigned long blk_rq_timeout(unsigned long timeout);
void blk_add_timer(struct request *req);
+void blk_mq_add_timer(struct request *req, enum mq_rq_state old,
+ enum mq_rq_state new);
void blk_delete_timer(struct request *);
@@ -191,21 +193,21 @@ void blk_account_io_done(struct request *req);
/*
* EH timer and IO completion will both attempt to 'grab' the request, make
* sure that only one of them succeeds. Steal the bottom bit of the
- * __deadline field for this.
+ * lq_deadline field for this.
*/
static inline int blk_mark_rq_complete(struct request *rq)
{
- return test_and_set_bit(0, &rq->__deadline);
+ return test_and_set_bit(0, &rq->lq_deadline);
}
static inline void blk_clear_rq_complete(struct request *rq)
{
- clear_bit(0, &rq->__deadline);
+ clear_bit(0, &rq->lq_deadline);
}
static inline bool blk_rq_is_complete(struct request *rq)
{
- return test_bit(0, &rq->__deadline);
+ return test_bit(0, &rq->lq_deadline);
}
/*
@@ -311,15 +313,42 @@ static inline void req_set_nomerge(struct request_queue *q, struct request *req)
* Steal a bit from this field for legacy IO path atomic IO marking. Note that
* setting the deadline clears the bottom bit, potentially clearing the
* completed bit. The user has to be OK with this (current ones are fine).
+ * Must be called with the request queue lock held.
*/
static inline void blk_rq_set_deadline(struct request *rq, unsigned long time)
{
- rq->__deadline = time & ~0x1UL;
+ rq->lq_deadline = time & ~0x1UL;
}
static inline unsigned long blk_rq_deadline(struct request *rq)
{
- return rq->__deadline & ~0x1UL;
+ return rq->lq_deadline & ~0x1UL;
+}
+
+/*
+ * If the state of request @rq equals @old_s, update deadline and request state
+ * atomically to @time and @new_s. blk-mq only.
+ */
+static inline bool blk_mq_rq_set_deadline(struct request *rq,
+ unsigned long time,
+ enum mq_rq_state old_s,
+ enum mq_rq_state new_s)
+{
+ unsigned long old_d, new_d;
+
+ do {
+ old_d = atomic_long_read(&rq->mq_deadline);
+ if ((old_d & MQ_RQ_STATE_MASK) != old_s)
+ return false;
+ new_d = (time & ~0x3UL) | (new_s & 3UL);
+ } while (atomic_long_cmpxchg(&rq->mq_deadline, old_d, new_d) != old_d);
+
+ return true;
+}
+
+static inline unsigned long blk_mq_rq_deadline(struct request *rq)
+{
+ return atomic_long_read(&rq->mq_deadline) & ~0x3UL;
}
/*
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 8efcf49796a3..13ccbb418e89 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -51,7 +51,6 @@ struct blk_mq_hw_ctx {
unsigned int queue_num;
atomic_t nr_active;
- unsigned int nr_expired;
struct hlist_node cpuhp_dead;
struct kobject kobj;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6075d1a6760c..abf78819014b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,7 +27,6 @@
#include <linux/percpu-refcount.h>
#include <linux/scatterlist.h>
#include <linux/blkzoned.h>
-#include <linux/seqlock.h>
#include <linux/u64_stats_sync.h>
struct module;
@@ -125,8 +124,6 @@ typedef __u32 __bitwise req_flags_t;
#define RQF_SPECIAL_PAYLOAD ((__force req_flags_t)(1 << 18))
/* The per-zone write lock is held for this request */
#define RQF_ZONE_WRITE_LOCKED ((__force req_flags_t)(1 << 19))
-/* timeout is expired */
-#define RQF_MQ_TIMEOUT_EXPIRED ((__force req_flags_t)(1 << 20))
/* already slept for hybrid poll */
#define RQF_MQ_POLL_SLEPT ((__force req_flags_t)(1 << 21))
@@ -226,28 +223,15 @@ struct request {
unsigned int extra_len; /* length of alignment and padding */
/*
- * On blk-mq, the lower bits of ->gstate (generation number and
- * state) carry the MQ_RQ_* state value and the upper bits the
- * generation number which is monotonically incremented and used to
- * distinguish the reuse instances.
- *
- * ->gstate_seq allows updates to ->gstate and other fields
- * (currently ->deadline) during request start to be read
- * atomically from the timeout path, so that it can operate on a
- * coherent set of information.
+ * Access through blk_rq_set_deadline(), blk_rq_deadline() and
+ * blk_mark_rq_complete(), blk_clear_rq_complete() and
+ * blk_rq_is_complete() for legacy queues or blk_mq_rq_set_deadline(),
+ * blk_mq_rq_deadline() and blk_mq_rq_state() for blk-mq queues.
*/
- seqcount_t gstate_seq;
- u64 gstate;
-
- /*
- * ->aborted_gstate is used by the timeout to claim a specific
- * recycle instance of this request. See blk_mq_timeout_work().
- */
- struct u64_stats_sync aborted_gstate_sync;
- u64 aborted_gstate;
-
- /* access through blk_rq_set_deadline, blk_rq_deadline */
- unsigned long __deadline;
+ union {
+ unsigned long lq_deadline;
+ atomic_long_t mq_deadline;
+ };
struct list_head timeout_list;
--
2.16.2
------------------------
NOTE, this is the last expected 4.15.y release. After this one, the
tree is end-of-life. Please move to 4.16.y at this point in time.
------------------------
This is the start of the stable review cycle for the 4.15.18 release.
There are 53 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu Apr 19 15:57:06 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.18-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.15.18-rc1
Amir Goldstein <amir73il(a)gmail.com>
ovl: set lower layer st_dev only if setting lower st_ino
Sudhir Sreedharan <ssreedharan(a)mvista.com>
rtl8187: Fix NULL pointer dereference in priv->conf_mutex
Hans de Goede <hdegoede(a)redhat.com>
Bluetooth: hci_bcm: Treat Interrupt ACPI resources as always being active-low
Szymon Janc <szymon.janc(a)codecoup.pl>
Bluetooth: Fix connection if directed advertising and privacy is used
Al Viro <viro(a)zeniv.linux.org.uk>
getname_kernel() needs to make sure that ->name != ->iname in long case
Michael S. Tsirkin <mst(a)redhat.com>
mm/gup_benchmark: handle gup failures
Michael S. Tsirkin <mst(a)redhat.com>
get_user_pages_fast(): return -EFAULT on access_ok failure
Heiko Carstens <heiko.carstens(a)de.ibm.com>
s390/compat: fix setup_frame32
Vasily Gorbik <gor(a)linux.ibm.com>
s390/ipl: ensure loadparm valid flag is set
Julian Wiedmann <jwi(a)linux.vnet.ibm.com>
s390/qdio: don't merge ERROR output buffers
Julian Wiedmann <jwi(a)linux.vnet.ibm.com>
s390/qdio: don't retry EQBS after CCQ 96
Dan Williams <dan.j.williams(a)intel.com>
nfit: fix region registration vs block-data-window ranges
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
block/loop: fix deadlock after loop_set_status
John Johansen <john.johansen(a)canonical.com>
apparmor: fix resource audit messages when auditing peer
John Johansen <john.johansen(a)canonical.com>
apparmor: fix display of .ns_name for containers
John Johansen <john.johansen(a)canonical.com>
apparmor: fix logging of the existence test for signals
Bill Kuzeja <William.Kuzeja(a)stratus.com>
scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure
J. Bruce Fields <bfields(a)redhat.com>
nfsd: fix incorrect umasks
Mike Kravetz <mike.kravetz(a)oracle.com>
hugetlbfs: fix bug in pgoff overflow checking
Simon Gaiser <simon(a)invisiblethingslab.com>
xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling
Amir Goldstein <amir73il(a)gmail.com>
ovl: fix lookup with middle layer opaque dir and absolute path redirects
Ming Lei <ming.lei(a)redhat.com>
blk-mq: don't keep offline CPUs mapped to hctx 0
Ming Lei <ming.lei(a)redhat.com>
blk-mq: order getting budget and driver tag
Yury Norov <ynorov(a)caviumnetworks.com>
lib: fix stall in __bitmap_parselist()
Keith Busch <keith.busch(a)intel.com>
nvme: Skip checking heads without namespaces
Bart Van Assche <bart.vanassche(a)wdc.com>
block: Change a rcu_read_{lock,unlock}_sched() pair into rcu_read_{lock,unlock}()
Yunlong Song <yunlong.song(a)huawei.com>
f2fs: fix heap mode to reset it back
Eric Biggers <ebiggers(a)google.com>
sunrpc: remove incorrect HMAC request initialization
Li RongQing <lirongqing(a)baidu.com>
x86/apic: Fix signedness bug in APIC ID validity checks
Toke Høiland-Jørgensen <toke(a)toke.dk>
ath9k: Protect queue draining by rcu_read_lock()
Marek Szyprowski <m.szyprowski(a)samsung.com>
hwmon: (ina2xx) Fix access to uninitialized mutex
Yazen Ghannam <yazen.ghannam(a)amd.com>
x86/mce/AMD: Get address from already initialized block
Prashant Bhole <bhole_prashant_q7(a)lab.ntt.co.jp>
perf/core: Fix use-after-free in uprobe_perf_close()
Nicholas Piggin <npiggin(a)gmail.com>
KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
Dexuan Cui <decui(a)microsoft.com>
PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()
Dexuan Cui <decui(a)microsoft.com>
PCI: hv: Serialize the present and eject work items
Dexuan Cui <decui(a)microsoft.com>
Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
Helge Deller <deller(a)gmx.de>
parisc: Fix HPMC handler by increasing size to multiple of 16 bytes
Helge Deller <deller(a)gmx.de>
parisc: Fix out of array access in match_pci_device()
Corey Minyard <cminyard(a)mvista.com>
ipmi: Fix some error cleanup issues
Kieran Bingham <kieran.bingham+renesas(a)ideasonboard.com>
media: v4l: vsp1: Fix header display list status check in continuous mode
Mauro Carvalho Chehab <mchehab(a)kernel.org>
media: v4l2-compat-ioctl32: don't oops on overlay
Phil Elwell <phil(a)raspberrypi.org>
lan78xx: Correctly indicate invalid OTP
Eric Auger <eric.auger(a)redhat.com>
vhost: Fix vhost_copy_to_user()
Sabrina Dubroca <sd(a)queasysnail.net>
ip_gre: clear feature flags when incompatible o_flags are set
Guillaume Nault <g.nault(a)alphalink.fr>
l2tp: fix race in duplicate tunnel detection
Guillaume Nault <g.nault(a)alphalink.fr>
l2tp: fix races in tunnel creation
Stefan Hajnoczi <stefanha(a)redhat.com>
vhost: fix vhost_vq_access_ok() log check
Tejaswi Tanikella <tejaswit(a)codeaurora.org>
slip: Check if rstate is initialized before uncompressing
Ka-Cheong Poon <ka-cheong.poon(a)oracle.com>
rds: MP-RDS may use an invalid c_path
Bassem Boubaker <bassem.boubaker(a)actia.fr>
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
Jozsef Kadlecsik <kadlec(a)blackhole.kfki.hu>
netfilter: ipset: Missing nfnl_lock()/nfnl_unlock() is added to ip_set_net_exit()
Manasi Navare <manasi.d.navare(a)intel.com>
drm/i915/edp: Do not do link training fallback or prune modes on EDP
-------------
Diffstat:
Makefile | 4 +-
arch/parisc/kernel/drivers.c | 4 +
arch/parisc/kernel/hpmc.S | 6 +-
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 4 -
arch/s390/kernel/compat_signal.c | 2 +-
arch/s390/kernel/ipl.c | 1 +
arch/x86/include/asm/apic.h | 4 +-
arch/x86/kernel/acpi/boot.c | 13 +-
arch/x86/kernel/apic/apic_common.c | 2 +-
arch/x86/kernel/apic/apic_numachip.c | 2 +-
arch/x86/kernel/apic/x2apic.h | 2 +-
arch/x86/kernel/apic/x2apic_phys.c | 2 +-
arch/x86/kernel/apic/x2apic_uv_x.c | 2 +-
arch/x86/kernel/cpu/mcheck/mce_amd.c | 15 ++
arch/x86/xen/apic.c | 2 +-
block/blk-core.c | 4 +-
block/blk-mq-cpumap.c | 5 -
block/blk-mq.c | 21 +-
drivers/acpi/nfit/core.c | 22 +-
drivers/block/loop.c | 12 +-
drivers/bluetooth/hci_bcm.c | 20 +-
drivers/char/ipmi/ipmi_si_intf.c | 18 +-
drivers/gpu/drm/i915/intel_dp_link_training.c | 26 ++-
drivers/hv/channel_mgmt.c | 2 +-
drivers/hwmon/ina2xx.c | 3 +-
drivers/media/platform/vsp1/vsp1_dl.c | 3 +-
drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 +-
drivers/net/slip/slhc.c | 5 +
drivers/net/usb/cdc_ether.c | 6 +
drivers/net/usb/lan78xx.c | 3 +-
drivers/net/wireless/ath/ath9k/xmit.c | 4 +
drivers/net/wireless/realtek/rtl818x/rtl8187/dev.c | 2 +-
drivers/nvme/host/core.c | 1 +
drivers/pci/host/pci-hyperv.c | 92 +++++++--
drivers/s390/cio/qdio_main.c | 42 ++--
drivers/scsi/qla2xxx/qla_os.c | 44 ++--
drivers/vhost/vhost.c | 10 +-
drivers/xen/xenbus/xenbus_dev_frontend.c | 2 +-
fs/f2fs/gc.c | 5 +-
fs/f2fs/segment.c | 3 +-
fs/hugetlbfs/inode.c | 10 +-
fs/namei.c | 3 +-
fs/nfsd/nfs4proc.c | 12 +-
fs/nfsd/nfs4xdr.c | 8 +-
fs/nfsd/xdr4.h | 2 +
fs/overlayfs/inode.c | 7 +-
fs/overlayfs/namei.c | 9 +
include/net/bluetooth/hci_core.h | 2 +-
include/net/slhc_vj.h | 1 +
kernel/events/core.c | 6 +
lib/bitmap.c | 2 +-
lib/test_bitmap.c | 4 +
mm/gup.c | 5 +-
mm/gup_benchmark.c | 4 +-
net/bluetooth/hci_conn.c | 29 ++-
net/bluetooth/hci_event.c | 15 +-
net/bluetooth/l2cap_core.c | 2 +-
net/ipv4/ip_gre.c | 6 +
net/l2tp/l2tp_core.c | 225 +++++++++------------
net/l2tp/l2tp_core.h | 4 +-
net/l2tp/l2tp_netlink.c | 22 +-
net/l2tp/l2tp_ppp.c | 9 +
net/netfilter/ipset/ip_set_core.c | 2 +
net/rds/send.c | 15 +-
net/sunrpc/auth_gss/gss_krb5_crypto.c | 3 -
security/apparmor/apparmorfs.c | 4 +-
security/apparmor/include/audit.h | 8 +-
security/apparmor/include/sig_names.h | 4 +-
security/apparmor/ipc.c | 2 +-
69 files changed, 504 insertions(+), 345 deletions(-)
This is the start of the stable review cycle for the 4.14.35 release.
There are 49 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu Apr 19 15:56:59 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.35-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.35-rc1
J. Bruce Fields <bfields(a)redhat.com>
nfsd: fix incorrect umasks
Mike Kravetz <mike.kravetz(a)oracle.com>
hugetlbfs: fix bug in pgoff overflow checking
Simon Gaiser <simon(a)invisiblethingslab.com>
xen: xenbus_dev_frontend: Fix XS_TRANSACTION_END handling
Amir Goldstein <amir73il(a)gmail.com>
ovl: fix lookup with middle layer opaque dir and absolute path redirects
Ming Lei <ming.lei(a)redhat.com>
blk-mq: don't keep offline CPUs mapped to hctx 0
Yury Norov <ynorov(a)caviumnetworks.com>
lib: fix stall in __bitmap_parselist()
Yunlong Song <yunlong.song(a)huawei.com>
f2fs: fix heap mode to reset it back
Eric Biggers <ebiggers(a)google.com>
sunrpc: remove incorrect HMAC request initialization
Toke Høiland-Jørgensen <toke(a)toke.dk>
ath9k: Protect queue draining by rcu_read_lock()
Marek Szyprowski <m.szyprowski(a)samsung.com>
hwmon: (ina2xx) Fix access to uninitialized mutex
Yazen Ghannam <yazen.ghannam(a)amd.com>
x86/mce/AMD: Get address from already initialized block
Yazen Ghannam <yazen.ghannam(a)amd.com>
x86/mce/AMD, EDAC/mce_amd: Enumerate Reserved SMCA bank type
Yazen Ghannam <yazen.ghannam(a)amd.com>
x86/mce/AMD: Pass the bank number to smca_get_bank_type()
Yazen Ghannam <yazen.ghannam(a)amd.com>
x86/MCE: Report only DRAM ECC as memory errors on AMD systems
Sudhir Sreedharan <ssreedharan(a)mvista.com>
rtl8187: Fix NULL pointer dereference in priv->conf_mutex
Hans de Goede <hdegoede(a)redhat.com>
Bluetooth: hci_bcm: Treat Interrupt ACPI resources as always being active-low
Szymon Janc <szymon.janc(a)codecoup.pl>
Bluetooth: Fix connection if directed advertising and privacy is used
Al Viro <viro(a)zeniv.linux.org.uk>
getname_kernel() needs to make sure that ->name != ->iname in long case
Michael S. Tsirkin <mst(a)redhat.com>
get_user_pages_fast(): return -EFAULT on access_ok failure
Vasily Gorbik <gor(a)linux.ibm.com>
s390/ipl: ensure loadparm valid flag is set
Julian Wiedmann <jwi(a)linux.vnet.ibm.com>
s390/qdio: don't merge ERROR output buffers
Julian Wiedmann <jwi(a)linux.vnet.ibm.com>
s390/qdio: don't retry EQBS after CCQ 96
Dan Williams <dan.j.williams(a)intel.com>
nfit: fix region registration vs block-data-window ranges
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
block/loop: fix deadlock after loop_set_status
John Johansen <john.johansen(a)canonical.com>
apparmor: fix resource audit messages when auditing peer
John Johansen <john.johansen(a)canonical.com>
apparmor: fix display of .ns_name for containers
John Johansen <john.johansen(a)canonical.com>
apparmor: fix logging of the existence test for signals
Bill Kuzeja <William.Kuzeja(a)stratus.com>
scsi: qla2xxx: Fix small memory leak in qla2x00_probe_one on probe failure
Yazen Ghannam <yazen.ghannam(a)amd.com>
x86/MCE/AMD: Define a function to get SMCA bank type
Arnd Bergmann <arnd(a)arndb.de>
radeon: hide pointless #warning when compile testing
Prashant Bhole <bhole_prashant_q7(a)lab.ntt.co.jp>
perf/core: Fix use-after-free in uprobe_perf_close()
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix timestamp following overflow
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix error recovery from missing TIP packet
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix sync_switch
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
Nicholas Piggin <npiggin(a)gmail.com>
KVM: PPC: Book3S HV: trace_tlbie must not be called in realmode
Dexuan Cui <decui(a)microsoft.com>
PCI: hv: Serialize the present and eject work items
Dexuan Cui <decui(a)microsoft.com>
Drivers: hv: vmbus: do not mark HV_PCIE as perf_device
Helge Deller <deller(a)gmx.de>
parisc: Fix HPMC handler by increasing size to multiple of 16 bytes
Helge Deller <deller(a)gmx.de>
parisc: Fix out of array access in match_pci_device()
Kieran Bingham <kieran.bingham+renesas(a)ideasonboard.com>
media: v4l: vsp1: Fix header display list status check in continuous mode
Mauro Carvalho Chehab <mchehab(a)kernel.org>
media: v4l2-compat-ioctl32: don't oops on overlay
Phil Elwell <phil(a)raspberrypi.org>
lan78xx: Correctly indicate invalid OTP
Eric Auger <eric.auger(a)redhat.com>
vhost: Fix vhost_copy_to_user()
Stefan Hajnoczi <stefanha(a)redhat.com>
vhost: fix vhost_vq_access_ok() log check
Tejaswi Tanikella <tejaswit(a)codeaurora.org>
slip: Check if rstate is initialized before uncompressing
Ka-Cheong Poon <ka-cheong.poon(a)oracle.com>
rds: MP-RDS may use an invalid c_path
Bassem Boubaker <bassem.boubaker(a)actia.fr>
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
Jozsef Kadlecsik <kadlec(a)blackhole.kfki.hu>
netfilter: ipset: Missing nfnl_lock()/nfnl_unlock() is added to ip_set_net_exit()
-------------
Diffstat:
Makefile | 4 +-
arch/parisc/kernel/drivers.c | 4 ++
arch/parisc/kernel/hpmc.S | 6 +-
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 4 --
arch/s390/kernel/ipl.c | 1 +
arch/x86/include/asm/mce.h | 3 +
arch/x86/kernel/cpu/mcheck/mce.c | 4 +-
arch/x86/kernel/cpu/mcheck/mce_amd.c | 54 ++++++++++++++++--
block/blk-mq-cpumap.c | 5 --
drivers/acpi/nfit/core.c | 22 +++++---
drivers/block/loop.c | 12 ++--
drivers/bluetooth/hci_bcm.c | 20 +------
drivers/edac/mce_amd.c | 11 ++--
drivers/gpu/drm/radeon/radeon_object.c | 3 +-
drivers/hv/channel_mgmt.c | 2 +-
drivers/hwmon/ina2xx.c | 3 +-
drivers/media/platform/vsp1/vsp1_dl.c | 3 +-
drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 4 +-
drivers/net/slip/slhc.c | 5 ++
drivers/net/usb/cdc_ether.c | 6 ++
drivers/net/usb/lan78xx.c | 3 +-
drivers/net/wireless/ath/ath9k/xmit.c | 4 ++
drivers/net/wireless/realtek/rtl818x/rtl8187/dev.c | 2 +-
drivers/pci/host/pci-hyperv.c | 34 ++++++------
drivers/s390/cio/qdio_main.c | 42 +++++++-------
drivers/scsi/qla2xxx/qla_os.c | 44 +++++++--------
drivers/vhost/vhost.c | 10 ++--
drivers/xen/xenbus/xenbus_dev_frontend.c | 2 +-
fs/f2fs/gc.c | 5 +-
fs/f2fs/segment.c | 3 +-
fs/hugetlbfs/inode.c | 10 +++-
fs/namei.c | 3 +-
fs/nfsd/nfs4proc.c | 12 +++-
fs/nfsd/nfs4xdr.c | 8 +--
fs/nfsd/xdr4.h | 2 +
fs/overlayfs/namei.c | 9 +++
include/net/bluetooth/hci_core.h | 2 +-
include/net/slhc_vj.h | 1 +
kernel/events/core.c | 6 ++
lib/bitmap.c | 2 +-
lib/test_bitmap.c | 4 ++
mm/gup.c | 5 +-
net/bluetooth/hci_conn.c | 29 +++++++---
net/bluetooth/hci_event.c | 15 +++--
net/bluetooth/l2cap_core.c | 2 +-
net/netfilter/ipset/ip_set_core.c | 2 +
net/rds/send.c | 15 +++--
net/sunrpc/auth_gss/gss_krb5_crypto.c | 3 -
security/apparmor/apparmorfs.c | 4 +-
security/apparmor/include/audit.h | 8 +--
security/apparmor/include/sig_names.h | 4 +-
security/apparmor/ipc.c | 2 +-
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 64 +++++++++++-----------
.../perf/util/intel-pt-decoder/intel-pt-decoder.h | 2 +-
tools/perf/util/intel-pt.c | 37 ++++++++++---
55 files changed, 361 insertions(+), 215 deletions(-)
It is not possible to get DMA32 zone memory through kmalloc, causing
the vboxguest driver to malfunction due to getting memory above
4G, which the PCI device cannot handle.
This commit changes the kmalloc calls where the 4G limit matters to
use __get_free_pages(), fixing vboxguest not working on x86_64 guests
with more than 4G of RAM.
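For reference, a small standalone C sketch of the order computation the
new code relies on; get_order() and PAGE_ALIGN() below are local
stand-ins for the kernel helpers of the same names, assuming 4 KiB
pages:

/* Illustrative only: userspace model of get_order()/PAGE_ALIGN(). */
#include <stdio.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Smallest order such that (PAGE_SIZE << order) >= size. */
static int get_order(unsigned long size)
{
	int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

int main(void)
{
	unsigned long lens[] = { 64, 4096, 4097, 65536 };

	for (int i = 0; i < 4; i++) {
		int order = get_order(PAGE_ALIGN(lens[i]));

		printf("len=%5lu -> order %d (%lu bytes allocated)\n",
		       lens[i], order, PAGE_SIZE << order);
	}
	return 0;
}

With this, even the largest allowed ioctl buffer (capped at SZ_16M in
the handler below) maps to an order-12 allocation of physically
contiguous pages below 4G.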
Cc: stable(a)vger.kernel.org
Reported-by: Eloy Coto Pereiro <eloy.coto(a)gmail.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/virt/vboxguest/vboxguest_linux.c | 19 ++++++++++++++++---
drivers/virt/vboxguest/vboxguest_utils.c | 5 +++--
2 files changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/virt/vboxguest/vboxguest_linux.c b/drivers/virt/vboxguest/vboxguest_linux.c
index 82e280d38cc2..398d22693234 100644
--- a/drivers/virt/vboxguest/vboxguest_linux.c
+++ b/drivers/virt/vboxguest/vboxguest_linux.c
@@ -87,6 +87,7 @@ static long vbg_misc_device_ioctl(struct file *filp, unsigned int req,
struct vbg_session *session = filp->private_data;
size_t returned_size, size;
struct vbg_ioctl_hdr hdr;
+ bool is_vmmdev_req;
int ret = 0;
void *buf;
@@ -106,8 +107,17 @@ static long vbg_misc_device_ioctl(struct file *filp, unsigned int req,
if (size > SZ_16M)
return -E2BIG;
- /* __GFP_DMA32 because IOCTL_VMMDEV_REQUEST passes this to the host */
- buf = kmalloc(size, GFP_KERNEL | __GFP_DMA32);
+ /*
+ * IOCTL_VMMDEV_REQUEST needs the buffer to be below 4G to avoid
+ * the need for a bounce-buffer and another copy later on.
+ */
+ is_vmmdev_req = (req & ~IOCSIZE_MASK) == VBG_IOCTL_VMMDEV_REQUEST(0) ||
+ req == VBG_IOCTL_VMMDEV_REQUEST_BIG;
+
+ if (is_vmmdev_req)
+ buf = vbg_req_alloc(size, VBG_IOCTL_HDR_TYPE_DEFAULT);
+ else
+ buf = kmalloc(size, GFP_KERNEL);
if (!buf)
return -ENOMEM;
@@ -132,7 +142,10 @@ static long vbg_misc_device_ioctl(struct file *filp, unsigned int req,
ret = -EFAULT;
out:
- kfree(buf);
+ if (is_vmmdev_req)
+ vbg_req_free(buf, size);
+ else
+ kfree(buf);
return ret;
}
diff --git a/drivers/virt/vboxguest/vboxguest_utils.c b/drivers/virt/vboxguest/vboxguest_utils.c
index bad915463359..bf4474214b4d 100644
--- a/drivers/virt/vboxguest/vboxguest_utils.c
+++ b/drivers/virt/vboxguest/vboxguest_utils.c
@@ -65,8 +65,9 @@ VBG_LOG(vbg_debug, pr_debug);
void *vbg_req_alloc(size_t len, enum vmmdev_request_type req_type)
{
struct vmmdev_request_header *req;
+ int order = get_order(PAGE_ALIGN(len));
- req = kmalloc(len, GFP_KERNEL | __GFP_DMA32);
+ req = (void *)__get_free_pages(GFP_KERNEL | GFP_DMA32, order);
if (!req)
return NULL;
@@ -87,7 +88,7 @@ void vbg_req_free(void *req, size_t len)
if (!req)
return;
- kfree(req);
+ free_pages((unsigned long)req, get_order(PAGE_ALIGN(len)));
}
/* Note this function returns a VBox status code, not a negative errno!! */
--
2.17.0
This patch series has fixes for bugs in the SWIM floppy disk controller
driver, including an oops and a soft lockup.
One way to apply these patches to v4.14+ is by first cherry-picking
these commits:
b87eaec27eca3def6c8ed617e3b1bac08d7bc715
e5f0d2e2a153b18dcf31e1a633e210c37829d759
There are of course other ways to fix the patch rejects, but this way
would be convenient for me because it would simplify my own backporting.
Changes since v1:
- Dropped the two IOP patches as they aren't simple fixes. This way,
the entire series is suitable for stable trees.
- Added Cc, Fixes, Acked-by and Reviewed-by tags.
Finn Thain (10):
m68k/mac: Revisit floppy disc controller base addresses
m68k/mac: Fix SWIM memory resource end address
m68k/mac: Don't remap SWIM MMIO region
block/swim: Fix array bounds check
block/swim: Remove extra put_disk() call from error path
block/swim: Don't log an error message for an invalid ioctl
block/swim: Rename macros to avoid inconsistent inverted logic
block/swim: Check drive type
block/swim: Fix IO error at end of medium
block/swim: Select appropriate drive on device open
arch/m68k/include/asm/macintosh.h | 10 +--
arch/m68k/mac/config.c | 126 ++++++++++++++++++++------------------
drivers/block/swim.c | 49 +++++++--------
drivers/block/swim3.c | 6 +-
4 files changed, 96 insertions(+), 95 deletions(-)
--
2.16.1
AOSP uses a userspace firmware loader to load firmware, which will
return -EAGAIN in case qca/rampatch_00440302.bin is not found.
Since there is no rampatch for the dragonboard820c QCA controller
revision, just make it work as is.
CC: Loic Poulain <loic.poulain(a)linaro.org>
CC: Nicolas Dechesne <nicolas.dechesne(a)linaro.org>
CC: Marcel Holtmann <marcel(a)holtmann.org>
CC: Johan Hedberg <johan.hedberg(a)gmail.com>
CC: Stable <stable(a)vger.kernel.org>
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
---
drivers/bluetooth/hci_qca.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 05ec530b8a3a..330e9b29e145 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -935,6 +935,12 @@ static int qca_setup(struct hci_uart *hu)
} else if (ret == -ENOENT) {
/* No patch/nvm-config found, run with original fw/config */
ret = 0;
+ } else if (ret == -EAGAIN) {
+ /*
+ * Userspace firmware loader will return -EAGAIN in case no
+ * patch/nvm-config is found, so run with original fw/config.
+ */
+ ret = 0;
}
/* Setup bdaddr */
--
2.7.4
Hi Jason and Doug,
Here are most of our updates for 4.17. I will follow this up with a small
16B series; I still have a few more patches that are waiting on
some more thorough testing. I should be able to get them on the list tomorrow
or Friday at the latest, but wanted to get these out now. I don't think
anything is really that scary in here.
These are mostly driver fixes. Patches 4, 7, 8, 11, and 14 are marked stable.
They didn't get sent for the -rc because they fix really old issues.
Patch 5 is a core fix. I should have sent it a bit sooner, sorry about that,
but it's pretty trivial, so I decided to include it as well rather than wait
for 4.18.
---
Alex Estrin (2):
IB/hfi1: Complete check for locally terminated smp
IB/{hfi1,qib}: Add handling of kernel restart
Ashutosh Dixit (1):
IB/core: Fix rkey invalidation from user space into the kernel
Michael J. Ruhl (5):
IB/hfi1: Return actual error value from program_rcvarray()
IB/hfi1: Use after free race condition in send context error path
IB/hfi1: Use correct type for num_user_context
IB/hfi1: Return correct value for device state
IB/hfi1: Reorder incorrect send context disable
Mike Marciniszyn (3):
IB/hfi1: Fix handling of FECN marked multicast packet
IB/hfi1: Fix fault injection init/exit issues
IB/hfi1: Fix loss of BECN with AHG
Sebastian Sanchez (2):
IB/hfi1: Prevent LNI hang when LCB can't obtain lanes
IB/{hfi1,rdmavt,qib}: Fit kernel completions into single aligned cache-line
drivers/infiniband/core/uverbs_cmd.c | 4 +
drivers/infiniband/hw/hfi1/chip.c | 59 ++++++++---
drivers/infiniband/hw/hfi1/chip.h | 15 ++-
drivers/infiniband/hw/hfi1/chip_registers.h | 7 +
drivers/infiniband/hw/hfi1/debugfs.c | 8 +
drivers/infiniband/hw/hfi1/driver.c | 19 +++-
drivers/infiniband/hw/hfi1/file_ops.c | 2
drivers/infiniband/hw/hfi1/hfi.h | 9 +-
drivers/infiniband/hw/hfi1/init.c | 9 +-
drivers/infiniband/hw/hfi1/mad.c | 36 ++++---
drivers/infiniband/hw/hfi1/pio.c | 44 ++++++--
drivers/infiniband/hw/hfi1/rc.c | 2
drivers/infiniband/hw/hfi1/ruc.c | 54 ++++++++--
drivers/infiniband/hw/hfi1/uc.c | 2
drivers/infiniband/hw/hfi1/ud.c | 10 +-
drivers/infiniband/hw/hfi1/user_exp_rcv.c | 1
drivers/infiniband/hw/qib/qib.h | 1
drivers/infiniband/hw/qib/qib_init.c | 5 +
drivers/infiniband/hw/qib/qib_rc.c | 2
drivers/infiniband/hw/qib/qib_ruc.c | 4 -
drivers/infiniband/hw/qib/qib_uc.c | 2
drivers/infiniband/hw/qib/qib_ud.c | 4 -
drivers/infiniband/sw/rdmavt/cq.c | 146 ++++++++++++++++++---------
drivers/infiniband/sw/rdmavt/qp.c | 4 -
drivers/infiniband/sw/rdmavt/trace_cq.h | 6 +
include/rdma/ib_verbs.h | 5 +
include/rdma/rdmavt_cq.h | 35 +++++-
include/rdma/rdmavt_qp.h | 2
28 files changed, 344 insertions(+), 153 deletions(-)
--
-Denny
The patch titled
Subject: mm, oom: fix concurrent munlock and oom reaper unmap
has been added to the -mm tree. Its filename is
mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-oom-fix-concurrent-munlock-and-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-oom-fix-concurrent-munlock-and-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Rientjes <rientjes(a)google.com>
Subject: mm, oom: fix concurrent munlock and oom reaper unmap
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.
This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma. Since munlock_vma_pages_range() depends on
clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.
This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires kick_all_cpus_sync(). If the pmd is zapped
by the oom reaper during follow_page_mask() after the check for pmd_none()
is bypassed, this ends up dereferencing a NULL ptl.
Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
reaped. This prevents the concurrent munlock_vma_pages_range() and
unmap_page_range(). The oom reaper will simply not operate on an mm that
has the bit set and leave the unmapping to exit_mmap().
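As a standalone illustration of that pattern (a pthread model, not the
kernel code; names are invented), the exit side sets a flag and then
cycles the lock for writing, which guarantees that any reaper that
checked the flag under the read lock has either seen it or already
finished before the exit path proceeds:

/* Illustrative only. Build with: cc -pthread model.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_rwlock_t mmap_sem = PTHREAD_RWLOCK_INITIALIZER;
static _Atomic bool unstable; /* stands in for MMF_UNSTABLE */

static void *reaper(void *arg)
{
	(void)arg;
	pthread_rwlock_rdlock(&mmap_sem);
	if (unstable)
		printf("reaper: mm is being torn down, backing off\n");
	else
		printf("reaper: safe to unmap under the read lock\n");
	pthread_rwlock_unlock(&mmap_sem);
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, reaper, NULL);

	/* exit_mmap() side: set the flag, then cycle the write lock. */
	unstable = true;
	pthread_rwlock_wrlock(&mmap_sem);
	pthread_rwlock_unlock(&mmap_sem);

	/* From here on, no reaper can still be operating on the mm. */
	printf("exit path: proceeding with munlock/unmap\n");

	pthread_join(t, NULL);
	return 0;
}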
Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1804171545460.53786@chino.kir.corp…
Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 38 ++++++++++++++++++++------------------
mm/oom_kill.c | 19 ++++++++-----------
2 files changed, 28 insertions(+), 29 deletions(-)
diff -puN mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/mmap.c
--- a/mm/mmap.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/mmap.c
@@ -3015,6 +3015,25 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
+ if (unlikely(mm_is_oom_victim(mm))) {
+ /*
+ * Wait for oom_reap_task() to stop working on this mm. Because
+ * MMF_UNSTABLE is already set before calling down_read(),
+ * oom_reap_task() will not run on this mm after up_write().
+ * oom_reap_task() also depends on a stable VM_LOCKED flag to
+ * indicate it should not unmap during munlock_vma_pages_all().
+ *
+ * mm_is_oom_victim() cannot be set from under us because
+ * victim->mm is already set to NULL under task_lock before
+ * calling mmput() and victim->signal->oom_mm is set by the oom
+ * killer only if victim->mm is non-NULL while holding
+ * task_lock().
+ */
+ set_bit(MMF_UNSTABLE, &mm->flags);
+ down_write(&mm->mmap_sem);
+ up_write(&mm->mmap_sem);
+ }
+
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -3036,26 +3055,9 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
-
- if (unlikely(mm_is_oom_victim(mm))) {
- /*
- * Wait for oom_reap_task() to stop working on this
- * mm. Because MMF_OOM_SKIP is already set before
- * calling down_read(), oom_reap_task() will not run
- * on this "mm" post up_write().
- *
- * mm_is_oom_victim() cannot be set from under us
- * either because victim->mm is already set to NULL
- * under task_lock before calling mmput and oom_mm is
- * set not NULL by the OOM killer only if victim->mm
- * is found not NULL while holding the task_lock.
- */
- set_bit(MMF_OOM_SKIP, &mm->flags);
- down_write(&mm->mmap_sem);
- up_write(&mm->mmap_sem);
- }
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
+ set_bit(MMF_OOM_SKIP, &mm->flags);
/*
* Walk the list again, actually closing and freeing it,
diff -puN mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap mm/oom_kill.c
--- a/mm/oom_kill.c~mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap
+++ a/mm/oom_kill.c
@@ -521,12 +521,17 @@ static bool __oom_reap_task_mm(struct ta
}
/*
- * MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
- * work on the mm anymore. The check for MMF_OOM_SKIP must run
+ * Tell all users of get_user/copy_from_user etc... that the content
+ * is no longer stable. No barriers really needed because unmapping
+ * should imply barriers already and the reader would hit a page fault
+ * if it stumbled over reaped memory.
+ *
+ * MMF_UNSTABLE is also set by exit_mmap when the OOM reaper shouldn't
+ * work on the mm anymore. The check for MMF_UNSTABLE must run
* under mmap_sem for reading because it serializes against the
* down_write();up_write() cycle in exit_mmap().
*/
- if (test_bit(MMF_OOM_SKIP, &mm->flags)) {
+ if (test_and_set_bit(MMF_UNSTABLE, &mm->flags)) {
up_read(&mm->mmap_sem);
trace_skip_task_reaping(tsk->pid);
goto unlock_oom;
@@ -534,14 +539,6 @@ static bool __oom_reap_task_mm(struct ta
trace_start_task_reaping(tsk->pid);
- /*
- * Tell all users of get_user/copy_from_user etc... that the content
- * is no longer stable. No barriers really needed because unmapping
- * should imply barriers already and the reader would hit a page fault
- * if it stumbled over a reaped memory.
- */
- set_bit(MMF_UNSTABLE, &mm->flags);
-
for (vma = mm->mmap ; vma; vma = vma->vm_next) {
if (!can_madv_dontneed_vma(vma))
continue;
_
Patches currently in -mm which might be from rientjes(a)google.com are
mm-oom-fix-concurrent-munlock-and-oom-reaper-unmap.patch
Since SCSI scanning occurs asynchronously, since sd_revalidate_disk()
is called from sd_probe_async() and since sd_revalidate_disk() calls
sd_zbc_read_zones(), sd_zbc_read_zones() can be called concurrently
with blkdev_report_zones() and/or blkdev_reset_zones(). That can cause
these functions to fail with -EIO because sd_zbc_read_zones() e.g.
sets q->nr_zones to zero before restoring it to the actual value, even
if no drive characteristics have changed.
Avoid this by making the following changes:
- Protect the code that updates zone information with blk_queue_enter()
and blk_queue_exit().
- Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
these functions do not modify struct scsi_disk before all zone
information has been obtained.
Note: since commit 055f6e18e08f ("block: Make q_usage_counter also
track legacy requests"; kernel v4.15) the request queue freezing
mechanism also affects legacy request queues.
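A rough sketch of the resulting update sequence in sd_zbc_setup()
(simplified, error handling omitted; all names appear in the diff
below):

/* Prepare the new zone information without touching the queue. */
seq_zones_wlock = sd_zbc_alloc_zone_bitmap(nr_zones, q->node);
seq_zones_bitmap = sd_zbc_setup_seq_zones_bitmap(sdkp, zone_shift,
                                                 nr_zones);

/* Swap it in only while no requests are being processed. */
blk_mq_freeze_queue(q);
if (q->nr_zones != nr_zones) {
        sdkp->nr_zones = nr_zones;
        q->nr_zones = nr_zones;
        swap(q->seq_zones_wlock, seq_zones_wlock);
        swap(q->seq_zones_bitmap, seq_zones_bitmap);
}
blk_mq_unfreeze_queue(q);

/* Frees either the unused spares or the replaced bitmaps. */
kfree(seq_zones_wlock);
kfree(seq_zones_bitmap);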
Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Signed-off-by: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Damien Le Moal <damien.lemoal(a)wdc.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Hannes Reinecke <hare(a)suse.com>
Cc: stable(a)vger.kernel.org # v4.10
---
drivers/scsi/sd_zbc.c | 140 +++++++++++++++++++++++++++++--------------------
include/linux/blkdev.h | 5 ++
2 files changed, 87 insertions(+), 58 deletions(-)
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 2d0c06f7db3e..323e3dc4bc59 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -390,8 +390,10 @@ static int sd_zbc_check_capacity(struct scsi_disk *sdkp, unsigned char *buf)
*
* Check that all zones of the device are equal. The last zone can however
* be smaller. The zone size must also be a power of two number of LBAs.
+ *
+ * Returns the zone size in bytes upon success or an error code upon failure.
*/
-static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
+static s64 sd_zbc_check_zone_size(struct scsi_disk *sdkp)
{
u64 zone_blocks = 0;
sector_t block = 0;
@@ -402,8 +404,6 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
int ret;
u8 same;
- sdkp->zone_blocks = 0;
-
/* Get a buffer */
buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
if (!buf)
@@ -435,16 +435,17 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
/* Parse zone descriptors */
while (rec < buf + buf_len) {
- zone_blocks = get_unaligned_be64(&rec[8]);
- if (sdkp->zone_blocks == 0) {
- sdkp->zone_blocks = zone_blocks;
- } else if (zone_blocks != sdkp->zone_blocks &&
- (block + zone_blocks < sdkp->capacity
- || zone_blocks > sdkp->zone_blocks)) {
- zone_blocks = 0;
+ u64 this_zone_blocks = get_unaligned_be64(&rec[8]);
+
+ if (zone_blocks == 0) {
+ zone_blocks = this_zone_blocks;
+ } else if (this_zone_blocks != zone_blocks &&
+ (block + this_zone_blocks < sdkp->capacity
+ || this_zone_blocks > zone_blocks)) {
+ this_zone_blocks = 0;
goto out;
}
- block += zone_blocks;
+ block += this_zone_blocks;
rec += 64;
}
@@ -457,8 +458,6 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
} while (block < sdkp->capacity);
- zone_blocks = sdkp->zone_blocks;
-
out:
if (!zone_blocks) {
if (sdkp->first_scan)
@@ -478,8 +477,7 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
"Zone size too large\n");
ret = -ENODEV;
} else {
- sdkp->zone_blocks = zone_blocks;
- sdkp->zone_shift = ilog2(zone_blocks);
+ ret = zone_blocks;
}
out_free:
@@ -490,15 +488,14 @@ static int sd_zbc_check_zone_size(struct scsi_disk *sdkp)
/**
* sd_zbc_alloc_zone_bitmap - Allocate a zone bitmap (one bit per zone).
- * @sdkp: The disk of the bitmap
+ * @nr_zones: Number of zones to allocate space for.
+ * @numa_node: NUMA node to allocate the memory from.
*/
-static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
+static inline unsigned long *
+sd_zbc_alloc_zone_bitmap(u32 nr_zones, int numa_node)
{
- struct request_queue *q = sdkp->disk->queue;
-
- return kzalloc_node(BITS_TO_LONGS(sdkp->nr_zones)
- * sizeof(unsigned long),
- GFP_KERNEL, q->node);
+ return kzalloc_node(BITS_TO_LONGS(nr_zones) * sizeof(unsigned long),
+ GFP_KERNEL, numa_node);
}
/**
@@ -506,6 +503,7 @@ static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
* @sdkp: disk used
* @buf: report reply buffer
* @buflen: length of @buf
+ * @zone_shift: logarithm base 2 of the number of blocks in a zone
* @seq_zones_bitmap: bitmap of sequential zones to set
*
* Parse reported zone descriptors in @buf to identify sequential zones and
@@ -515,7 +513,7 @@ static inline unsigned long *sd_zbc_alloc_zone_bitmap(struct scsi_disk *sdkp)
* Return the LBA after the last zone reported.
*/
static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
- unsigned int buflen,
+ unsigned int buflen, u32 zone_shift,
unsigned long *seq_zones_bitmap)
{
sector_t lba, next_lba = sdkp->capacity;
@@ -534,7 +532,7 @@ static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
if (type != ZBC_ZONE_TYPE_CONV &&
cond != ZBC_ZONE_COND_READONLY &&
cond != ZBC_ZONE_COND_OFFLINE)
- set_bit(lba >> sdkp->zone_shift, seq_zones_bitmap);
+ set_bit(lba >> zone_shift, seq_zones_bitmap);
next_lba = lba + get_unaligned_be64(&rec[8]);
rec += 64;
}
@@ -543,12 +541,16 @@ static sector_t sd_zbc_get_seq_zones(struct scsi_disk *sdkp, unsigned char *buf,
}
/**
- * sd_zbc_setup_seq_zones_bitmap - Initialize the disk seq zone bitmap.
+ * sd_zbc_setup_seq_zones_bitmap - Initialize a seq zone bitmap.
* @sdkp: target disk
+ * @zone_shift: logarithm base 2 of the number of blocks in a zone
+ * @nr_zones: number of zones to set up a seq zone bitmap for
*
* Allocate a zone bitmap and initialize it by identifying sequential zones.
*/
-static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
+static unsigned long *
+sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp, u32 zone_shift,
+ u32 nr_zones)
{
struct request_queue *q = sdkp->disk->queue;
unsigned long *seq_zones_bitmap;
@@ -556,9 +558,9 @@ static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
unsigned char *buf;
int ret = -ENOMEM;
- seq_zones_bitmap = sd_zbc_alloc_zone_bitmap(sdkp);
+ seq_zones_bitmap = sd_zbc_alloc_zone_bitmap(nr_zones, q->node);
if (!seq_zones_bitmap)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
buf = kmalloc(SD_ZBC_BUF_SIZE, GFP_KERNEL);
if (!buf)
@@ -569,7 +571,7 @@ static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
if (ret)
goto out;
lba = sd_zbc_get_seq_zones(sdkp, buf, SD_ZBC_BUF_SIZE,
- seq_zones_bitmap);
+ zone_shift, seq_zones_bitmap);
}
if (lba != sdkp->capacity) {
@@ -581,12 +583,9 @@ static int sd_zbc_setup_seq_zones_bitmap(struct scsi_disk *sdkp)
kfree(buf);
if (ret) {
kfree(seq_zones_bitmap);
- return ret;
+ return ERR_PTR(ret);
}
-
- q->seq_zones_bitmap = seq_zones_bitmap;
-
- return 0;
+ return seq_zones_bitmap;
}
static void sd_zbc_cleanup(struct scsi_disk *sdkp)
@@ -602,44 +601,64 @@ static void sd_zbc_cleanup(struct scsi_disk *sdkp)
q->nr_zones = 0;
}
-static int sd_zbc_setup(struct scsi_disk *sdkp)
+static int sd_zbc_setup(struct scsi_disk *sdkp, u32 zone_blocks)
{
struct request_queue *q = sdkp->disk->queue;
+ u32 zone_shift = ilog2(zone_blocks);
+ u32 nr_zones;
int ret;
- /* READ16/WRITE16 is mandatory for ZBC disks */
- sdkp->device->use_16_for_rw = 1;
- sdkp->device->use_10_for_rw = 0;
-
/* chunk_sectors indicates the zone size */
- blk_queue_chunk_sectors(sdkp->disk->queue,
- logical_to_sectors(sdkp->device, sdkp->zone_blocks));
- sdkp->nr_zones =
- round_up(sdkp->capacity, sdkp->zone_blocks) >> sdkp->zone_shift;
+ blk_queue_chunk_sectors(q,
+ logical_to_sectors(sdkp->device, zone_blocks));
+ nr_zones = round_up(sdkp->capacity, zone_blocks) >> zone_shift;
/*
* Initialize the device request queue information if the number
* of zones changed.
*/
- if (sdkp->nr_zones != q->nr_zones) {
-
- sd_zbc_cleanup(sdkp);
-
- q->nr_zones = sdkp->nr_zones;
- if (sdkp->nr_zones) {
- q->seq_zones_wlock = sd_zbc_alloc_zone_bitmap(sdkp);
- if (!q->seq_zones_wlock) {
+ if (nr_zones != sdkp->nr_zones || nr_zones != q->nr_zones) {
+ unsigned long *seq_zones_wlock = NULL, *seq_zones_bitmap = NULL;
+ size_t zone_bitmap_size;
+
+ if (nr_zones) {
+ seq_zones_wlock = sd_zbc_alloc_zone_bitmap(nr_zones,
+ q->node);
+ if (!seq_zones_wlock) {
ret = -ENOMEM;
goto err;
}
- ret = sd_zbc_setup_seq_zones_bitmap(sdkp);
- if (ret) {
- sd_zbc_cleanup(sdkp);
+ seq_zones_bitmap = sd_zbc_setup_seq_zones_bitmap(sdkp,
+ zone_shift, nr_zones);
+ if (IS_ERR(seq_zones_bitmap)) {
+ ret = PTR_ERR(seq_zones_bitmap);
+ kfree(seq_zones_wlock);
goto err;
}
}
-
+ zone_bitmap_size = BITS_TO_LONGS(nr_zones) *
+ sizeof(unsigned long);
+ blk_mq_freeze_queue(q);
+ if (q->nr_zones != nr_zones) {
+ /* READ16/WRITE16 is mandatory for ZBC disks */
+ sdkp->device->use_16_for_rw = 1;
+ sdkp->device->use_10_for_rw = 0;
+
+ sdkp->zone_blocks = zone_blocks;
+ sdkp->zone_shift = zone_shift;
+ sdkp->nr_zones = nr_zones;
+ q->nr_zones = nr_zones;
+ swap(q->seq_zones_wlock, seq_zones_wlock);
+ swap(q->seq_zones_bitmap, seq_zones_bitmap);
+ } else if (memcmp(q->seq_zones_bitmap, seq_zones_bitmap,
+ zone_bitmap_size) != 0) {
+ memcpy(q->seq_zones_bitmap, seq_zones_bitmap,
+ zone_bitmap_size);
+ }
+ blk_mq_unfreeze_queue(q);
+ kfree(seq_zones_wlock);
+ kfree(seq_zones_bitmap);
}
return 0;
@@ -651,6 +670,7 @@ static int sd_zbc_setup(struct scsi_disk *sdkp)
int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
{
+ int64_t zone_blocks;
int ret;
if (!sd_is_zoned(sdkp))
@@ -687,12 +707,16 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, unsigned char *buf)
* Check zone size: only devices with a constant zone size (except
* an eventual last runt zone) that is a power of 2 are supported.
*/
- ret = sd_zbc_check_zone_size(sdkp);
- if (ret)
+ zone_blocks = sd_zbc_check_zone_size(sdkp);
+ ret = -EFBIG;
+ if (zone_blocks != (u32)zone_blocks)
+ goto err;
+ ret = zone_blocks;
+ if (ret < 0)
goto err;
/* The drive satisfies the kernel restrictions: set it up */
- ret = sd_zbc_setup(sdkp);
+ ret = sd_zbc_setup(sdkp, zone_blocks);
if (ret)
goto err;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9af3e0f430bc..21e21f273a21 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -605,6 +605,11 @@ struct request_queue {
* initialized by the low level device driver (e.g. scsi/sd.c).
* Stacking drivers (device mappers) may or may not initialize
* these fields.
+ *
+ * Reads of this information must be protected with blk_queue_enter() /
+ * blk_queue_exit(). Modifying this information is only allowed while
+ * no requests are being processed. See also blk_mq_freeze_queue() and
+ * blk_mq_unfreeze_queue().
*/
unsigned int nr_zones;
unsigned long *seq_zones_bitmap;
--
2.16.3
From: Long Li <longli(a)microsoft.com>
When sending the last iov, which may be broken into smaller buffers to
fit the transfer size, it is necessary to check whether it is the last
iov. If it is, stop and proceed to send pages.
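A heavily simplified sketch of the loop shape being fixed (the real
smbd_send() also splits an iov that exceeds the transfer size;
max_send_size stands in for the connection's transfer limit, the other
names follow struct smb_rqst):

/* Without the bounds check, 'i' walks past rq_nvec once the last
 * iov has been consumed. */
while (buflen < max_send_size) {
        buflen += rqst->rq_iov[i].iov_len;
        i++;
        if (i == rqst->rq_nvec)  /* last iov consumed: stop here */
                break;           /* and proceed to send the pages */
}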
Signed-off-by: Long Li <longli(a)microsoft.com>
---
fs/cifs/smbdirect.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/cifs/smbdirect.c b/fs/cifs/smbdirect.c
index 90e673c..b5c6c0d 100644
--- a/fs/cifs/smbdirect.c
+++ b/fs/cifs/smbdirect.c
@@ -2197,6 +2197,8 @@ int smbd_send(struct smbd_connection *info, struct smb_rqst *rqst)
goto done;
}
i++;
+ if (i == rqst->rq_nvec)
+ break;
}
start = i;
buflen = 0;
--
2.7.4
The patch titled
Subject: mm, slab: reschedule cache_reap() on the same CPU
has been removed from the -mm tree. Its filename was
mm-slab-reschedule-cache_reap-on-the-same-cpu.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, slab: reschedule cache_reap() on the same CPU
cache_reap() is initially scheduled in start_cpu_timer() via
schedule_delayed_work_on(). But then the next iterations are scheduled
via schedule_delayed_work(), i.e. using WORK_CPU_UNBOUND.
Thus since commit ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work
on wq_unbound_cpumask CPUs") there is no guarantee the future iterations
will run on the originally intended CPU, although it's still preferred. I
was able to demonstrate this with
/sys/module/workqueue/parameters/debug_force_rr_cpu. IIUC, it may also
happen due to migrating timers in nohz context. As a result, some CPUs
would call cache_reap() more frequently and others never.
This patch uses schedule_delayed_work_on() with the current cpu when
scheduling the next iteration.
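Schematically, the requeue changes from unbound queueing to an
explicitly pinned one:

/* before: WORK_CPU_UNBOUND, so wq_unbound_cpumask (or
 * debug_force_rr_cpu) may move the next iteration to another CPU */
schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));

/* after: requeue on the CPU this iteration is running on, keeping
 * the per-CPU cadence that start_cpu_timer() set up */
schedule_delayed_work_on(smp_processor_id(), work,
                         round_jiffies_relative(REAPTIMEOUT_AC));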
Link: http://lkml.kernel.org/r/20180411070007.32225-1-vbabka@suse.cz
Fixes: ef557180447f ("workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Pekka Enberg <penberg(a)kernel.org>
Acked-by: Christoph Lameter <cl(a)linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Lai Jiangshan <jiangshanlai(a)gmail.com>
Cc: John Stultz <john.stultz(a)linaro.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN mm/slab.c~mm-slab-reschedule-cache_reap-on-the-same-cpu mm/slab.c
--- a/mm/slab.c~mm-slab-reschedule-cache_reap-on-the-same-cpu
+++ a/mm/slab.c
@@ -4086,7 +4086,8 @@ next:
next_reap_node();
out:
/* Set up the next iteration */
- schedule_delayed_work(work, round_jiffies_relative(REAPTIMEOUT_AC));
+ schedule_delayed_work_on(smp_processor_id(), work,
+ round_jiffies_relative(REAPTIMEOUT_AC));
}
void get_slabinfo(struct kmem_cache *cachep, struct slabinfo *sinfo)
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
The patch titled
Subject: ipc/shm: fix use-after-free of shm file via remap_file_pages()
has been removed from the -mm tree. Its filename was
ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Eric Biggers <ebiggers(a)google.com>
Subject: ipc/shm: fix use-after-free of shm file via remap_file_pages()
syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages(). Unfortunately
it couldn't generate a reproducer, but I found a bug which I think caused
it. When remap_file_pages() is passed a full System V shared memory
segment, the memory is first unmapped, then a new map is created using the
->vm_file. Between these steps, the shm ID can be removed and reused for
a new shm segment. But, shm_mmap() only checks whether the ID is
currently valid before calling the underlying file's ->mmap(); it doesn't
check whether it was reused. Thus it can use the wrong underlying file,
one that was already freed.
Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches the
one associated with the "outer" file. Taking the reference to the real
shm file is needed to fully solve the problem, since otherwise sfd->file
could point to a freed file, which then could be reallocated for the
reused shm ID, causing the wrong shm segment to be mapped (and without the
required permission checks).
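In outline, the fix has two halves (simplified from the hunks below):

/* 1) do_shmat(): pin the real shm file for the lifetime of the
 * outer file; the reference is dropped via fput() in shm_release(). */
sfd->file = get_file(shp->shm_file);

/* 2) __shm_open(): a removed-and-reused ID points at a different
 * shm file, so reject it. */
if (shp->shm_file != sfd->file) {
        shm_unlock(shp);
        return -EINVAL;
}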
Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because it
didn't consider the case where the shm ID is reused.
The following program usually reproduces this bug:
#include <stdlib.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <unistd.h>

int main()
{
        int is_parent = (fork() != 0);

        srand(getpid());
        for (;;) {
                int id = shmget(0xF00F, 4096, IPC_CREAT|0700);

                if (is_parent) {
                        void *addr = shmat(id, NULL, 0);

                        usleep(rand() % 50);
                        while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
                } else {
                        usleep(rand() % 50);
                        shmctl(id, IPC_RMID, NULL);
                }
        }
}
It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed. (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report. But I think it's possible
with this bug; it would just take a more extraordinary race...)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
[...]
Call Trace:
file_accessed include/linux/fs.h:2063 [inline]
shmem_mmap+0x25/0x40 mm/shmem.c:2149
call_mmap include/linux/fs.h:1789 [inline]
shm_mmap+0x34/0x80 ipc/shm.c:465
call_mmap include/linux/fs.h:1789 [inline]
mmap_region+0x309/0x5b0 mm/mmap.c:1712
do_mmap+0x294/0x4a0 mm/mmap.c:1483
do_mmap_pgoff include/linux/mm.h:2235 [inline]
SYSC_remap_file_pages mm/mmap.c:2853 [inline]
SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ebiggers(a)google.com: add comment]
Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439(a)syzkaller.appspotmail.com
Fixes: c8d78c1823f4 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso(a)suse.de>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: "Eric W . Biederman" <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/shm.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)
diff -puN ipc/shm.c~ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages ipc/shm.c
--- a/ipc/shm.c~ipc-shm-fix-use-after-free-of-shm-file-via-remap_file_pages
+++ a/ipc/shm.c
@@ -225,6 +225,12 @@ static int __shm_open(struct vm_area_str
if (IS_ERR(shp))
return PTR_ERR(shp);
+ if (shp->shm_file != sfd->file) {
+ /* ID was reused */
+ shm_unlock(shp);
+ return -EINVAL;
+ }
+
shp->shm_atim = ktime_get_real_seconds();
ipc_update_pid(&shp->shm_lprid, task_tgid(current));
shp->shm_nattch++;
@@ -455,8 +461,9 @@ static int shm_mmap(struct file *file, s
int ret;
/*
- * In case of remap_file_pages() emulation, the file can represent
- * removed IPC ID: propogate shm_lock() error to caller.
+ * In case of remap_file_pages() emulation, the file can represent an
+ * IPC ID that was removed, and possibly even reused by another shm
+ * segment already. Propagate this case as an error to caller.
*/
ret = __shm_open(vma);
if (ret)
@@ -480,6 +487,7 @@ static int shm_release(struct inode *ino
struct shm_file_data *sfd = shm_file_data(file);
put_ipc_ns(sfd->ns);
+ fput(sfd->file);
shm_file_data(file) = NULL;
kfree(sfd);
return 0;
@@ -1445,7 +1453,16 @@ long do_shmat(int shmid, char __user *sh
file->f_mapping = shp->shm_file->f_mapping;
sfd->id = shp->shm_perm.id;
sfd->ns = get_ipc_ns(ns);
- sfd->file = shp->shm_file;
+ /*
+ * We need to take a reference to the real shm file to prevent the
+ * pointer from becoming stale in cases where the lifetime of the outer
+ * file extends beyond that of the shm segment. It's not usually
+ * possible, but it can happen during remap_file_pages() emulation as
+ * that unmaps the memory, then does ->mmap() via file reference only.
+ * We'll deny the ->mmap() if the shm segment was since removed, but to
+ * detect shm ID reuse we need to compare the file pointers.
+ */
+ sfd->file = get_file(shp->shm_file);
sfd->vm_ops = NULL;
err = security_mmap_file(file, prot, flags);
_
Patches currently in -mm which might be from ebiggers(a)google.com are
The patch titled
Subject: get_user_pages_fast(): return -EFAULT on access_ok failure
has been removed from the -mm tree. Its filename was
gup-return-efault-on-access_ok-failure.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Michael S. Tsirkin" <mst(a)redhat.com>
Subject: get_user_pages_fast(): return -EFAULT on access_ok failure
get_user_pages_fast is supposed to be a faster drop-in equivalent of
get_user_pages. As such, callers expect it to return a negative return
code when passed an invalid address, and never expect it to return 0 when
passed a positive number of pages, since its documentation says:
* Returns number of pages pinned. This may be fewer than the number
* requested. If nr_pages is 0 or negative, returns 0. If no pages
* were pinned, returns -errno.
When get_user_pages_fast falls back on get_user_pages this is exactly what
happens. Unfortunately the implementation is inconsistent: it returns 0
if passed a kernel address, confusing callers: for example, the following
is pretty common but does not appear to do the right thing with a kernel
address:
ret = get_user_pages_fast(addr, 1, writeable, &page);
if (ret < 0)
return ret;
Change get_user_pages_fast to return -EFAULT when supplied a kernel
address to make it match expectations.
All callers have been audited for consistency with the documented
semantics.
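With that, the common pattern quoted above behaves as documented;
schematically (the ret == 0 policy below is the caller's choice, shown
only for illustration):

ret = get_user_pages_fast(addr, 1, writeable, &page);
if (ret < 0)
        return ret;       /* a kernel address now lands here (-EFAULT) */
if (ret == 0)
        return -EAGAIN;   /* nothing pinned; retry or fail */
/* ret == 1: page is pinned, put_page(page) when done */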
Link: http://lkml.kernel.org/r/1522962072-182137-4-git-send-email-mst@redhat.com
Fixes: 5b65c4677a57 ("mm, x86/mm: Fix performance regression in get_user_pages_fast()")
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Reported-by: syzbot+6304bf97ef436580fede(a)syzkaller.appspotmail.com
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff -puN mm/gup.c~gup-return-efault-on-access_ok-failure mm/gup.c
--- a/mm/gup.c~gup-return-efault-on-access_ok-failure
+++ a/mm/gup.c
@@ -1806,9 +1806,12 @@ int get_user_pages_fast(unsigned long st
len = (unsigned long) nr_pages << PAGE_SHIFT;
end = start + len;
+ if (nr_pages <= 0)
+ return 0;
+
if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ,
(void __user *)start, len)))
- return 0;
+ return -EFAULT;
if (gup_fast_permitted(start, nr_pages, write)) {
local_irq_disable();
_
Patches currently in -mm which might be from mst(a)redhat.com are
The patch titled
Subject: mm/gup_benchmark: handle gup failures
has been removed from the -mm tree. Its filename was
mm-gup_benchmark-handle-gup-failures.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Michael S. Tsirkin" <mst(a)redhat.com>
Subject: mm/gup_benchmark: handle gup failures
Patch series "mm/get_user_pages_fast fixes, cleanups", v2.
Turns out get_user_pages_fast and __get_user_pages_fast return different
values on error when given a single page: __get_user_pages_fast returns 0.
get_user_pages_fast returns either 0 or an error.
Callers of get_user_pages_fast expect an error so fix it up to return an
error consistently.
Stress the difference between get_user_pages_fast and
__get_user_pages_fast to make sure callers aren't confused.
This patch (of 3):
__gup_benchmark_ioctl does not handle the case where get_user_pages_fast
fails:
- a negative return code will cause a buffer overrun
- returning with partial success will cause use of
uninitialized memory.
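Schematically, the unfixed loop body and its failure modes (the hunk
below adds the missing check):

/* pages[] comes from kvmalloc(), i.e. it starts out uninitialized */
nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
i += nr;        /* nr < 0: 'i' moves backwards and later writes
                 * overrun the buffer; 0 <= nr < requested: the
                 * unwritten tail of pages[] is later put_page()'d,
                 * i.e. uninitialized memory is used */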
[akpm(a)linux-foundation.org: simplification]
Link: http://lkml.kernel.org/r/1522962072-182137-3-git-send-email-mst@redhat.com
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup_benchmark.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff -puN mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures mm/gup_benchmark.c
--- a/mm/gup_benchmark.c~mm-gup_benchmark-handle-gup-failures
+++ a/mm/gup_benchmark.c
@@ -23,7 +23,7 @@ static int __gup_benchmark_ioctl(unsigne
struct page **pages;
nr_pages = gup->size / PAGE_SIZE;
- pages = kvmalloc(sizeof(void *) * nr_pages, GFP_KERNEL);
+ pages = kvzalloc(sizeof(void *) * nr_pages, GFP_KERNEL);
if (!pages)
return -ENOMEM;
@@ -41,6 +41,8 @@ static int __gup_benchmark_ioctl(unsigne
}
nr = get_user_pages_fast(addr, nr, gup->flags & 1, pages + i);
+ if (nr <= 0)
+ break;
i += nr;
}
end_time = ktime_get();
_
Patches currently in -mm which might be from mst(a)redhat.com are
The patch titled
Subject: resource: fix integer overflow at reallocation
has been removed from the -mm tree. Its filename was
resource-fix-integer-overflow-at-reallocation-v1.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Takashi Iwai <tiwai(a)suse.de>
Subject: resource: fix integer overflow at reallocation
We've got a bug report indicating a kernel panic at booting on an x86-32
system, and it turned out to be an invalid PCI resource assigned after
reallocation. __find_resource() first aligns the resource start address
and resets the end address with start+size-1 accordingly, then checks
whether it's contained. Here the end address may overflow the integer
type, although resource_contains() still returns true because the
function validates only the start and end addresses. So this ends up
returning an invalid resource (start > end).
There was already an attempt to cover such a problem in commit
47ea91b4052d ("Resource: fix wrong resource window calculation"), but
this case was overlooked.
This patch adds a validity check of the newly calculated resource to
avoid the integer overflow problem.
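The wraparound is easy to demonstrate in isolation; a small standalone
sketch of the arithmetic (hypothetical values, 32-bit resource
addresses as on x86-32):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint32_t start = 0xfffff000;        /* aligned allocation start */
        uint32_t size  = 0x2000;            /* requested size */
        uint32_t end   = start + size - 1;  /* wraps around to 0xfff */

        /* checking only that start and end lie inside the available
         * window cannot catch this; the explicit start <= end test does */
        if (start > end)
                printf("invalid resource: [%#x-%#x]\n", start, end);
        return 0;
}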
Bugzilla: http://bugzilla.opensuse.org/show_bug.cgi?id=1086739
Link: http://lkml.kernel.org/r/s5hpo37d5l8.wl-tiwai@suse.de
Fixes: 23c570a67448 ("resource: ability to resize an allocated resource")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Reported-by: Michael Henders <hendersm(a)shaw.ca>
Tested-by: Michael Henders <hendersm(a)shaw.ca>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ram Pai <linuxram(a)us.ibm.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/resource.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff -puN kernel/resource.c~resource-fix-integer-overflow-at-reallocation-v1 kernel/resource.c
--- a/kernel/resource.c~resource-fix-integer-overflow-at-reallocation-v1
+++ a/kernel/resource.c
@@ -651,7 +651,8 @@ static int __find_resource(struct resour
alloc.start = constraint->alignf(constraint->alignf_data, &avail,
size, constraint->align);
alloc.end = alloc.start + size - 1;
- if (resource_contains(&avail, &alloc)) {
+ if (alloc.start <= alloc.end &&
+ resource_contains(&avail, &alloc)) {
new->start = alloc.start;
new->end = alloc.end;
return 0;
_
Patches currently in -mm which might be from tiwai(a)suse.de are