This is a note to let you know that I've just added the patch titled
clk: meson: mpll: use 64-bit maths in params_from_rate
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
clk-meson-mpll-use-64-bit-maths-in-params_from_rate.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 13:58:16 CEST 2018
From: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Date: Sat, 23 Dec 2017 22:38:32 +0100
Subject: clk: meson: mpll: use 64-bit maths in params_from_rate
From: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
[ Upstream commit 86aacdca66774051cbc0958110a48074b57a060b ]
"rem * SDM_DEN" can easily overflow on the 32-bit Meson8 and Meson8b
SoCs if the "remainder" (after the division operation) is greater than
262143Hz. This is likely to happen since the input clock for the MPLLs
on Meson8 and Meson8b is "fixed_pll", which is running at a rate of
2550MHz.
One example where this was observed to be problematic was the Ethernet
clock calculation (which takes MPLL2 as input). When requesting a rate
of 125MHz there is a remainder of 2500000Hz.
The resulting MPLL2 rate before this patch was 127488329Hz.
The resulting MPLL2 rate after this patch is 124999103Hz.
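As a side note, here is a small standalone userspace sketch of the overflow
(illustrative only, not kernel code; it assumes SDM_DEN is 16384 as defined
in clk-mpll.c and reuses the 125MHz example above):

#include <stdint.h>
#include <stdio.h>

#define SDM_DEN 16384	/* assumed value, as in clk-mpll.c */

int main(void)
{
	uint32_t rem = 2500000;		/* remainder for the 125MHz request */
	uint32_t requested_rate = 125000000;

	/* mimics the 32-bit SoC: the unsigned 32-bit product wraps,
	 * since 2500000 * 16384 is far above UINT32_MAX */
	uint32_t wrapped = rem * SDM_DEN;

	/* widening one operand keeps the full product, as the patch does */
	uint64_t full = (uint64_t)rem * SDM_DEN;

	printf("wrapped product: %u\n", wrapped);
	printf("full product:    %llu\n", (unsigned long long)full);
	/* open-coded DIV_ROUND_UP on the 64-bit product */
	printf("sdm: %llu\n",
	       (unsigned long long)((full + requested_rate - 1) / requested_rate));
	return 0;
}

With SDM_DEN at 16384, any remainder above 262143Hz (UINT32_MAX / 16384)
makes the 32-bit product wrap, which matches the threshold mentioned above.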
Commit b609338b26f5 ("clk: meson: mpll: use 64bit math in
rate_from_params") already fixed a similar issue in rate_from_params.
Fixes: 007e6e5c5f01d3 ("clk: meson: mpll: add rw operation")
Signed-off-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Signed-off-by: Jerome Brunet <jbrunet(a)baylibre.com>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/clk/meson/clk-mpll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/clk/meson/clk-mpll.c
+++ b/drivers/clk/meson/clk-mpll.c
@@ -98,7 +98,7 @@ static void params_from_rate(unsigned lo
*sdm = SDM_DEN - 1;
} else {
*n2 = div;
- *sdm = DIV_ROUND_UP(rem * SDM_DEN, requested_rate);
+ *sdm = DIV_ROUND_UP_ULL((u64)rem * SDM_DEN, requested_rate);
}
}
Patches currently in stable-queue which might be from martin.blumenstingl(a)googlemail.com are
queue-4.14/clk-meson-mpll-use-64-bit-maths-in-params_from_rate.patch
This is a note to let you know that I've just added the patch titled
clk: divider: fix incorrect usage of container_of
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
clk-divider-fix-incorrect-usage-of-container_of.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 13:58:16 CEST 2018
From: Jerome Brunet <jbrunet(a)baylibre.com>
Date: Thu, 21 Dec 2017 17:30:54 +0100
Subject: clk: divider: fix incorrect usage of container_of
From: Jerome Brunet <jbrunet(a)baylibre.com>
[ Upstream commit 12a26c298d2a8b1cab498533fa65198e49e3afd3 ]
divider_recalc_rate() is a helper function used by clock dividers of
different types, so the structure containing the 'hw' pointer is not
always a 'struct clk_divider'.
At the following line:
> div = _get_div(table, val, flags, divider->width);
in several cases, the value of 'divider->width' is garbage, as the actual
structure behind this memory is not a 'struct clk_divider'.
Fortunately, this width value is used by _get_div() only when the
CLK_DIVIDER_MAX_AT_ZERO flag is set. This has never been the case so
far when the structure is not a 'struct clk_divider'. This is probably
why we did not notice this bug before.
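For illustration, a minimal userspace sketch of why container_of() against
the wrong outer type yields garbage (the structs here are simplified
stand-ins, not the real kernel types):

#include <stddef.h>
#include <stdio.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct clk_hw { int dummy; };

/* stand-ins: only clk_divider really has a 'width' field */
struct clk_divider   { struct clk_hw hw; unsigned char width; };
struct other_divider { struct clk_hw hw; unsigned long mask; };

int main(void)
{
	struct other_divider od = { .hw = { 0 }, .mask = 0xffffffff };
	struct clk_hw *hw = &od.hw;

	/* wrong: reinterprets other_divider's memory as a clk_divider,
	 * so 'width' reads whatever bytes happen to sit at that offset */
	struct clk_divider *divider = container_of(hw, struct clk_divider, hw);
	printf("bogus width = %u\n", divider->width);
	return 0;
}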
Fixes: afe76c8fd030 ("clk: allow a clk divider with max divisor when zero")
Signed-off-by: Jerome Brunet <jbrunet(a)baylibre.com>
Acked-by: Alexandre Belloni <alexandre.belloni(a)free-electrons.com>
Acked-by: Sylvain Lemieux <slemieux.tyco(a)gmail.com>
Signed-off-by: Stephen Boyd <sboyd(a)codeaurora.org>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/clk/clk-divider.c | 7 +++----
drivers/clk/hisilicon/clkdivider-hi6220.c | 2 +-
drivers/clk/nxp/clk-lpc32xx.c | 2 +-
drivers/clk/qcom/clk-regmap-divider.c | 2 +-
drivers/clk/sunxi-ng/ccu_div.c | 2 +-
drivers/gpu/drm/msm/dsi/pll/dsi_pll_14nm.c | 2 +-
drivers/rtc/rtc-ac100.c | 6 ++++--
include/linux/clk-provider.h | 2 +-
8 files changed, 13 insertions(+), 12 deletions(-)
--- a/drivers/clk/clk-divider.c
+++ b/drivers/clk/clk-divider.c
@@ -118,12 +118,11 @@ static unsigned int _get_val(const struc
unsigned long divider_recalc_rate(struct clk_hw *hw, unsigned long parent_rate,
unsigned int val,
const struct clk_div_table *table,
- unsigned long flags)
+ unsigned long flags, unsigned long width)
{
- struct clk_divider *divider = to_clk_divider(hw);
unsigned int div;
- div = _get_div(table, val, flags, divider->width);
+ div = _get_div(table, val, flags, width);
if (!div) {
WARN(!(flags & CLK_DIVIDER_ALLOW_ZERO),
"%s: Zero divisor and CLK_DIVIDER_ALLOW_ZERO not set\n",
@@ -145,7 +144,7 @@ static unsigned long clk_divider_recalc_
val &= div_mask(divider->width);
return divider_recalc_rate(hw, parent_rate, val, divider->table,
- divider->flags);
+ divider->flags, divider->width);
}
static bool _is_valid_table_div(const struct clk_div_table *table,
--- a/drivers/clk/hisilicon/clkdivider-hi6220.c
+++ b/drivers/clk/hisilicon/clkdivider-hi6220.c
@@ -56,7 +56,7 @@ static unsigned long hi6220_clkdiv_recal
val &= div_mask(dclk->width);
return divider_recalc_rate(hw, parent_rate, val, dclk->table,
- CLK_DIVIDER_ROUND_CLOSEST);
+ CLK_DIVIDER_ROUND_CLOSEST, dclk->width);
}
static long hi6220_clkdiv_round_rate(struct clk_hw *hw, unsigned long rate,
--- a/drivers/clk/nxp/clk-lpc32xx.c
+++ b/drivers/clk/nxp/clk-lpc32xx.c
@@ -956,7 +956,7 @@ static unsigned long clk_divider_recalc_
val &= div_mask(divider->width);
return divider_recalc_rate(hw, parent_rate, val, divider->table,
- divider->flags);
+ divider->flags, divider->width);
}
static long clk_divider_round_rate(struct clk_hw *hw, unsigned long rate,
--- a/drivers/clk/qcom/clk-regmap-divider.c
+++ b/drivers/clk/qcom/clk-regmap-divider.c
@@ -59,7 +59,7 @@ static unsigned long div_recalc_rate(str
div &= BIT(divider->width) - 1;
return divider_recalc_rate(hw, parent_rate, div, NULL,
- CLK_DIVIDER_ROUND_CLOSEST);
+ CLK_DIVIDER_ROUND_CLOSEST, divider->width);
}
const struct clk_ops clk_regmap_div_ops = {
--- a/drivers/clk/sunxi-ng/ccu_div.c
+++ b/drivers/clk/sunxi-ng/ccu_div.c
@@ -71,7 +71,7 @@ static unsigned long ccu_div_recalc_rate
parent_rate);
val = divider_recalc_rate(hw, parent_rate, val, cd->div.table,
- cd->div.flags);
+ cd->div.flags, cd->div.width);
if (cd->common.features & CCU_FEATURE_FIXED_POSTDIV)
val /= cd->fixed_post_div;
--- a/drivers/gpu/drm/msm/dsi/pll/dsi_pll_14nm.c
+++ b/drivers/gpu/drm/msm/dsi/pll/dsi_pll_14nm.c
@@ -698,7 +698,7 @@ static unsigned long dsi_pll_14nm_postdi
val &= div_mask(width);
return divider_recalc_rate(hw, parent_rate, val, NULL,
- postdiv->flags);
+ postdiv->flags, width);
}
static long dsi_pll_14nm_postdiv_round_rate(struct clk_hw *hw,
--- a/drivers/rtc/rtc-ac100.c
+++ b/drivers/rtc/rtc-ac100.c
@@ -137,13 +137,15 @@ static unsigned long ac100_clkout_recalc
div = (reg >> AC100_CLKOUT_PRE_DIV_SHIFT) &
((1 << AC100_CLKOUT_PRE_DIV_WIDTH) - 1);
prate = divider_recalc_rate(hw, prate, div,
- ac100_clkout_prediv, 0);
+ ac100_clkout_prediv, 0,
+ AC100_CLKOUT_PRE_DIV_WIDTH);
}
div = (reg >> AC100_CLKOUT_DIV_SHIFT) &
(BIT(AC100_CLKOUT_DIV_WIDTH) - 1);
return divider_recalc_rate(hw, prate, div, NULL,
- CLK_DIVIDER_POWER_OF_TWO);
+ CLK_DIVIDER_POWER_OF_TWO,
+ AC100_CLKOUT_DIV_WIDTH);
}
static long ac100_clkout_round_rate(struct clk_hw *hw, unsigned long rate,
--- a/include/linux/clk-provider.h
+++ b/include/linux/clk-provider.h
@@ -412,7 +412,7 @@ extern const struct clk_ops clk_divider_
unsigned long divider_recalc_rate(struct clk_hw *hw, unsigned long parent_rate,
unsigned int val, const struct clk_div_table *table,
- unsigned long flags);
+ unsigned long flags, unsigned long width);
long divider_round_rate_parent(struct clk_hw *hw, struct clk_hw *parent,
unsigned long rate, unsigned long *prate,
const struct clk_div_table *table,
Patches currently in stable-queue which might be from jbrunet(a)baylibre.com are
queue-4.14/clk-divider-fix-incorrect-usage-of-container_of.patch
queue-4.14/clk-meson-mpll-use-64-bit-maths-in-params_from_rate.patch
This is a note to let you know that I've just added the patch titled
block, bfq: put async queues for root bfq groups too
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
block-bfq-put-async-queues-for-root-bfq-groups-too.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 13:58:16 CEST 2018
From: Paolo Valente <paolo.valente(a)linaro.org>
Date: Tue, 9 Jan 2018 10:27:58 +0100
Subject: block, bfq: put async queues for root bfq groups too
From: Paolo Valente <paolo.valente(a)linaro.org>
[ Upstream commit 52257ffbfcaf58d247b13fb148e27ed17c33e526 ]
For each pair [device for which bfq is selected as I/O scheduler,
group in blkio/io], bfq maintains a corresponding bfq group. Each such
bfq group contains a set of async queues, with each async queue
created on demand, i.e., when some I/O request arrives for it. On
creation, an async queue gets an extra reference, to make sure that
the queue is not freed as long as its bfq group exists. Accordingly,
to allow the queue to be freed once the group has exited, this extra
reference must be released on group exit.
The above also holds for a bfq root group, i.e., for the bfq group
corresponding to the blkio/io root for a given device. Yet, by
mistake, the references to the existing async queues of a root group
are not released when the latter exits. This causes a memory leak when
the instance of bfq for a given device exits. In a similar vein,
bfqg_stats_xfer_dead is not executed for a root group.
This commit fixes bfq_pd_offline so that it executes the above missing
operations for a root group too.
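As an illustration, a toy userspace model of the patched control flow (a
paraphrase of the idea, not the kernel code):

#include <stdbool.h>
#include <stdio.h>

/* one async queue holding the extra reference taken at creation time */
static int async_queue_refs = 1;

static void put_async_queues(void)
{
	async_queue_refs--;	/* stands in for bfq_put_async_queues() */
}

static void pd_offline(bool root_group)
{
	/* lock would be taken here, covering both paths */
	if (root_group)
		goto put_async_queues;	/* before the fix this was a plain
					 * return, leaking the extra ref */

	/* non-root group: empty service trees, deactivate the entity, ... */

put_async_queues:
	put_async_queues();	/* now runs for the root group too */
	/* unlock */
}

int main(void)
{
	pd_offline(true);
	printf("refs left after root-group offline: %d\n", async_queue_refs);
	return 0;
}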
Reported-by: Holger Hoffstätte <holger(a)applied-asynchrony.com>
Reported-by: Guoqing Jiang <gqjiang(a)suse.com>
Tested-by: Holger Hoffstätte <holger(a)applied-asynchrony.com>
Signed-off-by: Davide Ferrari <davideferrari8(a)gmail.com>
Signed-off-by: Paolo Valente <paolo.valente(a)linaro.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
block/bfq-cgroup.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -749,10 +749,11 @@ static void bfq_pd_offline(struct blkg_p
unsigned long flags;
int i;
+ spin_lock_irqsave(&bfqd->lock, flags);
+
if (!entity) /* root group */
- return;
+ goto put_async_queues;
- spin_lock_irqsave(&bfqd->lock, flags);
/*
* Empty all service_trees belonging to this group before
* deactivating the group itself.
@@ -783,6 +784,8 @@ static void bfq_pd_offline(struct blkg_p
}
__bfq_deactivate_entity(entity, false);
+
+put_async_queues:
bfq_put_async_queues(bfqd, bfqg);
spin_unlock_irqrestore(&bfqd->lock, flags);
Patches currently in stable-queue which might be from paolo.valente(a)linaro.org are
queue-4.14/block-bfq-put-async-queues-for-root-bfq-groups-too.patch
This is a note to let you know that I've just added the patch titled
blk-mq: fix race between updating nr_hw_queues and switching io sched
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
blk-mq-fix-race-between-updating-nr_hw_queues-and-switching-io-sched.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 13:58:16 CEST 2018
From: Ming Lei <ming.lei(a)redhat.com>
Date: Sat, 6 Jan 2018 16:27:40 +0800
Subject: blk-mq: fix race between updating nr_hw_queues and switching io sched
From: Ming Lei <ming.lei(a)redhat.com>
[ Upstream commit fb350e0ad99359768e1e80b4784692031ec340e4 ]
In both elevator_switch_mq() and blk_mq_update_nr_hw_queues(), sched tags
can be allocated and q->nr_hw_queues is used, so a race is inevitable; for
example, blk_mq_init_sched() may trigger a use-after-free on an hctx that
is freed in blk_mq_realloc_hw_ctxs() when nr_hw_queues is decreased.
This patch fixes the race by holding q->sysfs_lock.
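To illustrate the serialization this buys, here is a toy userspace model
(pthreads, compile with -pthread; the names are stand-ins, not the actual
block layer code) in which both the hw-queue realloc path and the scheduler
path take the same lock, so neither can observe the other's half-updated
state:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* toy stand-ins: 'hctxs' plays q->queue_hw_ctx, 'nr' plays q->nr_hw_queues */
static pthread_mutex_t sysfs_lock = PTHREAD_MUTEX_INITIALIZER;
static int *hctxs;
static int nr;

/* models blk_mq_realloc_hw_ctxs(): may shrink and reallocate the array */
static void *update_nr_hw_queues(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&sysfs_lock);
	nr = 2;
	hctxs = realloc(hctxs, nr * sizeof(*hctxs));
	pthread_mutex_unlock(&sysfs_lock);
	return NULL;
}

/* models the scheduler switch walking the hw queues */
static void *switch_io_sched(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&sysfs_lock);
	for (int i = 0; i < nr; i++)
		hctxs[i] = i;	/* safe: cannot race with the realloc above */
	pthread_mutex_unlock(&sysfs_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	nr = 4;
	hctxs = calloc(nr, sizeof(*hctxs));

	pthread_create(&a, NULL, update_nr_hw_queues, NULL);
	pthread_create(&b, NULL, switch_io_sched, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	printf("final nr_hw_queues: %d\n", nr);
	free(hctxs);
	return 0;
}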
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reported-by: Yi Zhang <yi.zhang(a)redhat.com>
Tested-by: Yi Zhang <yi.zhang(a)redhat.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
block/blk-mq.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2314,6 +2314,9 @@ static void blk_mq_realloc_hw_ctxs(struc
struct blk_mq_hw_ctx **hctxs = q->queue_hw_ctx;
blk_mq_sysfs_unregister(q);
+
+ /* protect against switching io scheduler */
+ mutex_lock(&q->sysfs_lock);
for (i = 0; i < set->nr_hw_queues; i++) {
int node;
@@ -2358,6 +2361,7 @@ static void blk_mq_realloc_hw_ctxs(struc
}
}
q->nr_hw_queues = i;
+ mutex_unlock(&q->sysfs_lock);
blk_mq_sysfs_register(q);
}
Patches currently in stable-queue which might be from ming.lei(a)redhat.com are
queue-4.14/blk-mq-avoid-to-map-cpu-into-stale-hw-queue.patch
queue-4.14/blk-mq-fix-kernel-oops-in-blk_mq_tag_idle.patch
queue-4.14/blk-mq-fix-race-between-updating-nr_hw_queues-and-switching-io-sched.patch
This is a note to let you know that I've just added the patch titled
bcache: stop writeback thread after detaching
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bcache-stop-writeback-thread-after-detaching.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 13:58:16 CEST 2018
From: Tang Junhui <tang.junhui(a)zte.com.cn>
Date: Mon, 8 Jan 2018 12:21:19 -0800
Subject: bcache: stop writeback thread after detaching
From: Tang Junhui <tang.junhui(a)zte.com.cn>
[ Upstream commit 8d29c4426b9f8afaccf28de414fde8a722b35fdf ]
Currently, when a cached device is detached from the cache, the writeback
thread is not stopped and the writeback_rate_update work is not canceled.
For example, after the following command:
echo 1 >/sys/block/sdb/bcache/detach
you can still see the writeback thread. If you then attach the device to
the cache again, bcache will create another writeback thread; for example,
after the command below:
echo ba0fb5cd-658a-4533-9806-6ce166d883b9 > /sys/block/sdb/bcache/attach
then you will see two writeback threads.
This patch stops the writeback thread and cancels the writeback_rate_update
work when a cached device is detached from the cache.
Compared with patch v1, this v2 patch moves the code down under the register
lock, for safety against any future changes, as Coly and Mike suggested.
[edit by mlyle: commit log spelling/formatting]
Signed-off-by: Tang Junhui <tang.junhui(a)zte.com.cn>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/bcache/super.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -893,6 +893,12 @@ static void cached_dev_detach_finish(str
mutex_lock(&bch_register_lock);
+ cancel_delayed_work_sync(&dc->writeback_rate_update);
+ if (!IS_ERR_OR_NULL(dc->writeback_thread)) {
+ kthread_stop(dc->writeback_thread);
+ dc->writeback_thread = NULL;
+ }
+
memset(&dc->sb.set_uuid, 0, 16);
SET_BDEV_STATE(&dc->sb, BDEV_STATE_NONE);
Patches currently in stable-queue which might be from tang.junhui(a)zte.com.cn are
queue-4.14/bcache-segregate-flash-only-volume-write-streams.patch
queue-4.14/bcache-stop-writeback-thread-after-detaching.patch
This is a note to let you know that I've just added the patch titled
bcache: segregate flash only volume write streams
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bcache-segregate-flash-only-volume-write-streams.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 13:58:16 CEST 2018
From: Tang Junhui <tang.junhui(a)zte.com.cn>
Date: Mon, 8 Jan 2018 12:21:21 -0800
Subject: bcache: segregate flash only volume write streams
From: Tang Junhui <tang.junhui(a)zte.com.cn>
[ Upstream commit 4eca1cb28d8b0574ca4f1f48e9331c5f852d43b9 ]
In a scenario where there are some flash only volumes and some cached
devices, when many tasks send requests to these devices in writeback mode,
the write IOs may fall into the same bucket, as below:
| cached data | flash data | cached data | cached data| flash data|
then, after writeback of these cached devices, the bucket would look like
the one below:
| free | flash data | free | free | flash data |
So there is a lot of free space in this bucket, but since data of the flash
only volumes still exists, this bucket cannot be reclaimed, which wastes
bucket space.
In this patch, we segregate flash only volume write streams from
cached devices, so data from flash only volumes and cached devices
can be stored in different buckets.
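To make the selection rule concrete, here is a toy userspace sketch of the
idea (simplified stand-ins; the real code also matches on the bkey and the
UUID_FLASH_ONLY flag, see the hunk below): buckets whose stream class
differs from the request's are simply skipped.

#include <stdbool.h>
#include <stdio.h>

/* simplified stand-in for an open bucket */
struct open_bucket {
	bool flash_only;	/* which stream class this bucket currently holds */
	int last_write_point;
};

static struct open_bucket buckets[] = {
	{ .flash_only = true,  .last_write_point = 1 },
	{ .flash_only = false, .last_write_point = 2 },
};

/* pick a bucket, skipping any whose stream class differs from the request */
static struct open_bucket *pick_data_bucket(bool req_flash_only, int write_point)
{
	struct open_bucket *ret_task = NULL;

	for (int i = 0; i < 2; i++) {
		struct open_bucket *ret = &buckets[i];

		if (ret->flash_only != req_flash_only)
			continue;	/* the segregation added by this patch */
		if (ret->last_write_point == write_point)
			ret_task = ret;
	}
	return ret_task;
}

int main(void)
{
	struct open_bucket *b = pick_data_bucket(false, 2);

	printf("picked bucket: flash_only=%d\n", b ? (int)b->flash_only : -1);
	return 0;
}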
Compared to the v1 patch, this patch does not add an additional open bucket
list; it makes a best effort to segregate flash only volume write streams
from cached devices. Sectors of flash only volumes may still be mixed
with dirty sectors of cached devices, but the number is very small.
[mlyle: fixed commit log formatting, permissions, line endings]
Signed-off-by: Tang Junhui <tang.junhui(a)zte.com.cn>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/bcache/alloc.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -515,15 +515,21 @@ struct open_bucket {
/*
* We keep multiple buckets open for writes, and try to segregate different
- * write streams for better cache utilization: first we look for a bucket where
- * the last write to it was sequential with the current write, and failing that
- * we look for a bucket that was last used by the same task.
+ * write streams for better cache utilization: first we try to segregate flash
+ * only volume write streams from cached devices, secondly we look for a bucket
+ * where the last write to it was sequential with the current write, and
+ * failing that we look for a bucket that was last used by the same task.
*
* The ideas is if you've got multiple tasks pulling data into the cache at the
* same time, you'll get better cache utilization if you try to segregate their
* data and preserve locality.
*
- * For example, say you've starting Firefox at the same time you're copying a
+ * For example, dirty sectors of flash only volume is not reclaimable, if their
+ * dirty sectors mixed with dirty sectors of cached device, such buckets will
+ * be marked as dirty and won't be reclaimed, though the dirty data of cached
+ * device have been written back to backend device.
+ *
+ * And say you've starting Firefox at the same time you're copying a
* bunch of files. Firefox will likely end up being fairly hot and stay in the
* cache awhile, but the data you copied might not be; if you wrote all that
* data to the same buckets it'd get invalidated at the same time.
@@ -540,7 +546,10 @@ static struct open_bucket *pick_data_buc
struct open_bucket *ret, *ret_task = NULL;
list_for_each_entry_reverse(ret, &c->data_buckets, list)
- if (!bkey_cmp(&ret->key, search))
+ if (UUID_FLASH_ONLY(&c->uuids[KEY_INODE(&ret->key)]) !=
+ UUID_FLASH_ONLY(&c->uuids[KEY_INODE(search)]))
+ continue;
+ else if (!bkey_cmp(&ret->key, search))
goto found;
else if (ret->last_write_point == write_point)
ret_task = ret;
Patches currently in stable-queue which might be from tang.junhui(a)zte.com.cn are
queue-4.14/bcache-segregate-flash-only-volume-write-streams.patch
queue-4.14/bcache-stop-writeback-thread-after-detaching.patch
This is a note to let you know that I've just added the patch titled
bcache: ret IOERR when read meets metadata error
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bcache-ret-ioerr-when-read-meets-metadata-error.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Apr 9 13:58:16 CEST 2018
From: Rui Hua <huarui.dev(a)gmail.com>
Date: Mon, 8 Jan 2018 12:21:18 -0800
Subject: bcache: ret IOERR when read meets metadata error
From: Rui Hua <huarui.dev(a)gmail.com>
[ Upstream commit b221fc130c49c50f4c2250d22e873420765a9fa2 ]
The read request might meet an error when searching the btree, but the error
was not handled in cache_lookup(), and this kind of metadata failure will
not go into cached_dev_read_error(); in the end, the upper layer will
receive bi_status=0. In this patch we detect the metadata error from the
return value of bch_btree_map_keys(); there are two potential paths that
give rise to the error:
1. Because the btree is not totally cached in memory, we may get an error
   when reading a btree node from the cache device (see
   bch_btree_node_get()); the likely errnos are -EIO and -ENOMEM
2. When a read miss happens, bch_btree_insert_check_key() will be called to
   insert a "replace_key" into the btree (see cached_dev_cache_miss(); this
   is just preparatory work before inserting the missed data into the cache
   device); a failure can also happen in this situation, and the likely
   errno is -ENOMEM
bch_btree_map_keys() will return MAP_DONE in the normal scenario, but we
will get either -EIO or -ENOMEM in the above two cases. If this happens, we
should NOT recover data from the backing device (when the cache device is
dirty) because we don't know whether all the bkeys covered by the read
request are clean. And after that has happened, s->iop.status still has its
initial value (0) from before we submit s->bio.bio; we set it to
BLK_STS_IOERR so it can go into cached_dev_read_error() and finally be
passed to the upper layer, or be recovered by rereading from the backing
device.
[edit by mlyle: patch formatting, word-wrap, comment spelling,
commit log format]
Signed-off-by: Hua Rui <huarui.dev(a)gmail.com>
Reviewed-by: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Michael Lyle <mlyle(a)lyle.org>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Sasha Levin <alexander.levin(a)microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/md/bcache/request.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -568,6 +568,7 @@ static void cache_lookup(struct closure
{
struct search *s = container_of(cl, struct search, iop.cl);
struct bio *bio = &s->bio.bio;
+ struct cached_dev *dc;
int ret;
bch_btree_op_init(&s->op, -1);
@@ -580,6 +581,27 @@ static void cache_lookup(struct closure
return;
}
+ /*
+ * We might meet err when searching the btree, If that happens, we will
+ * get negative ret, in this scenario we should not recover data from
+ * backing device (when cache device is dirty) because we don't know
+ * whether bkeys the read request covered are all clean.
+ *
+ * And after that happened, s->iop.status is still its initial value
+ * before we submit s->bio.bio
+ */
+ if (ret < 0) {
+ BUG_ON(ret == -EINTR);
+ if (s->d && s->d->c &&
+ !UUID_FLASH_ONLY(&s->d->c->uuids[s->d->id])) {
+ dc = container_of(s->d, struct cached_dev, disk);
+ if (dc && atomic_read(&dc->has_dirty))
+ s->recoverable = false;
+ }
+ if (!s->iop.status)
+ s->iop.status = BLK_STS_IOERR;
+ }
+
closure_return(cl);
}
Patches currently in stable-queue which might be from huarui.dev(a)gmail.com are
queue-4.14/bcache-ret-ioerr-when-read-meets-metadata-error.patch