The journal no-space deadlock was reported time to time. Such deadlock
can happen in the following situation.
When all journal buckets are fully filled by active jset with heavy
write I/O load, the cache set registration (after a reboot) will load
all active jsets and inserting them into the btree again (which is
called journal replay). If a journaled bkey is inserted into a btree
node and results btree node split, new journal request might be
triggered. For example, the btree grows one more level after the node
split, then the root node record in cache device super block will be
upgrade by bch_journal_meta() from bch_btree_set_root(). But there is no
space in journal buckets, the journal replay has to wait for new journal
bucket to be reclaimed after at least one journal bucket replayed. This
is one example that how the journal no-space deadlock happens.
The solution to avoid the deadlock is to reserve 1 journal bucket in
run time, and only permit the reserved journal bucket to be used during
cache set registration procedure for things like journal replay. Then
the journal space will never be fully filled, there is no chance for
journal no-space deadlock to happen anymore.
This patch adds a new member "bool do_reserve" in struct journal, it is
inititalized to 0 (false) when struct journal is allocated, and set to
1 (true) by bch_journal_space_reserve() when all initialization done in
run_cache_set(). In the run time when journal_reclaim() tries to
allocate a new journal bucket, free_journal_buckets() is called to check
whether there are enough free journal buckets to use. If there is only
1 free journal bucket and journal->do_reserve is 1 (true), the last
bucket is reserved and free_journal_buckets() will return 0 to indicate
no free journal bucket. Then journal_reclaim() will give up, and try
next time to see whetheer there is free journal bucket to allocate. By
this method, there is always 1 jouranl bucket reserved in run time.
During the cache set registration, journal->do_reserve is 0 (false), so
the reserved journal bucket can be used to avoid the no-space deadlock.
Reported-by: Nikhil Kshirsagar <nkshirsagar(a)gmail.com>
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/journal.c | 31 ++++++++++++++++++++++++++-----
drivers/md/bcache/journal.h | 2 ++
drivers/md/bcache/super.c | 1 +
3 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index df5347ea450b..e5da469a4235 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -405,6 +405,11 @@ int bch_journal_replay(struct cache_set *s, struct list_head *list)
return ret;
}
+void bch_journal_space_reserve(struct journal *j)
+{
+ j->do_reserve = true;
+}
+
/* Journalling */
static void btree_flush_write(struct cache_set *c)
@@ -621,12 +626,30 @@ static void do_journal_discard(struct cache *ca)
}
}
+static unsigned int free_journal_buckets(struct cache_set *c)
+{
+ struct journal *j = &c->journal;
+ struct cache *ca = c->cache;
+ struct journal_device *ja = &c->cache->journal;
+ unsigned int n;
+
+ /* In case njournal_buckets is not power of 2 */
+ if (ja->cur_idx >= ja->discard_idx)
+ n = ca->sb.njournal_buckets + ja->discard_idx - ja->cur_idx;
+ else
+ n = ja->discard_idx - ja->cur_idx;
+
+ if (n > (1 + j->do_reserve))
+ return n - (1 + j->do_reserve);
+
+ return 0;
+}
+
static void journal_reclaim(struct cache_set *c)
{
struct bkey *k = &c->journal.key;
struct cache *ca = c->cache;
uint64_t last_seq;
- unsigned int next;
struct journal_device *ja = &ca->journal;
atomic_t p __maybe_unused;
@@ -649,12 +672,10 @@ static void journal_reclaim(struct cache_set *c)
if (c->journal.blocks_free)
goto out;
- next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
- /* No space available on this device */
- if (next == ja->discard_idx)
+ if (!free_journal_buckets(c))
goto out;
- ja->cur_idx = next;
+ ja->cur_idx = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
k->ptr[0] = MAKE_PTR(0,
bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
ca->sb.nr_this_dev);
diff --git a/drivers/md/bcache/journal.h b/drivers/md/bcache/journal.h
index f2ea34d5f431..cd316b4a1e95 100644
--- a/drivers/md/bcache/journal.h
+++ b/drivers/md/bcache/journal.h
@@ -105,6 +105,7 @@ struct journal {
spinlock_t lock;
spinlock_t flush_write_lock;
bool btree_flushing;
+ bool do_reserve;
/* used when waiting because the journal was full */
struct closure_waitlist wait;
struct closure io;
@@ -182,5 +183,6 @@ int bch_journal_replay(struct cache_set *c, struct list_head *list);
void bch_journal_free(struct cache_set *c);
int bch_journal_alloc(struct cache_set *c);
+void bch_journal_space_reserve(struct journal *j);
#endif /* _BCACHE_JOURNAL_H */
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index bf3de149d3c9..2bb55278d22d 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -2128,6 +2128,7 @@ static int run_cache_set(struct cache_set *c)
flash_devs_run(c);
+ bch_journal_space_reserve(&c->journal);
set_bit(CACHE_SET_RUNNING, &c->flags);
return 0;
err:
--
2.35.3
After making bch_sectors_dirty_init() being multithreaded, the existing
incremental dirty sector counting in bch_root_node_dirty_init() doesn't
release btree occupation after iterating 500000 (INIT_KEYS_EACH_TIME)
bkeys. Because a read lock is added on btree root node to prevent the
btree to be split during the dirty sectors counting, other I/O requester
has no chance to gain the write lock even restart bcache_btree().
That is to say, the incremental dirty sectors counting is incompatible
to the multhreaded bch_sectors_dirty_init(). We have to choose one and
drop another one.
In my testing, with 512 bytes random writes, I generate 1.2T dirty data
and a btree with 400K nodes. With single thread and incremental dirty
sectors counting, it takes 30+ minites to register the backing device.
And with multithreaded dirty sectors counting, the backing device
registration can be accomplished within 2 minutes.
The 30+ minutes V.S. 2- minutes difference makes me decide to keep
multithreaded bch_sectors_dirty_init() and drop the incremental dirty
sectors counting. This is what this patch does.
But INIT_KEYS_EACH_TIME is kept, in sectors_dirty_init_fn() the CPU
will be released by cond_resched() after every INIT_KEYS_EACH_TIME keys
iterated. This is to avoid the watchdog reports a bogus soft lockup
warning.
Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/writeback.c | 41 +++++++++++------------------------
1 file changed, 13 insertions(+), 28 deletions(-)
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index d24c09490f8e..75b71199800d 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -805,13 +805,11 @@ static int bch_writeback_thread(void *arg)
/* Init */
#define INIT_KEYS_EACH_TIME 500000
-#define INIT_KEYS_SLEEP_MS 100
struct sectors_dirty_init {
struct btree_op op;
unsigned int inode;
size_t count;
- struct bkey start;
};
static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b,
@@ -827,11 +825,8 @@ static int sectors_dirty_init_fn(struct btree_op *_op, struct btree *b,
KEY_START(k), KEY_SIZE(k));
op->count++;
- if (atomic_read(&b->c->search_inflight) &&
- !(op->count % INIT_KEYS_EACH_TIME)) {
- bkey_copy_key(&op->start, k);
- return -EAGAIN;
- }
+ if (!(op->count % INIT_KEYS_EACH_TIME))
+ cond_resched();
return MAP_CONTINUE;
}
@@ -846,24 +841,16 @@ static int bch_root_node_dirty_init(struct cache_set *c,
bch_btree_op_init(&op.op, -1);
op.inode = d->id;
op.count = 0;
- op.start = KEY(op.inode, 0, 0);
-
- do {
- ret = bcache_btree(map_keys_recurse,
- k,
- c->root,
- &op.op,
- &op.start,
- sectors_dirty_init_fn,
- 0);
- if (ret == -EAGAIN)
- schedule_timeout_interruptible(
- msecs_to_jiffies(INIT_KEYS_SLEEP_MS));
- else if (ret < 0) {
- pr_warn("sectors dirty init failed, ret=%d!\n", ret);
- break;
- }
- } while (ret == -EAGAIN);
+
+ ret = bcache_btree(map_keys_recurse,
+ k,
+ c->root,
+ &op.op,
+ &KEY(op.inode, 0, 0),
+ sectors_dirty_init_fn,
+ 0);
+ if (ret < 0)
+ pr_warn("sectors dirty init failed, ret=%d!\n", ret);
return ret;
}
@@ -907,7 +894,6 @@ static int bch_dirty_init_thread(void *arg)
goto out;
}
skip_nr--;
- cond_resched();
}
if (p) {
@@ -917,7 +903,6 @@ static int bch_dirty_init_thread(void *arg)
p = NULL;
prev_idx = cur_idx;
- cond_resched();
}
out:
@@ -956,11 +941,11 @@ void bch_sectors_dirty_init(struct bcache_device *d)
bch_btree_op_init(&op.op, -1);
op.inode = d->id;
op.count = 0;
- op.start = KEY(op.inode, 0, 0);
for_each_key_filter(&c->root->keys,
k, &iter, bch_ptr_invalid)
sectors_dirty_init_fn(&op.op, c->root, k);
+
rw_unlock(0, c->root);
return;
}
--
2.35.3
Commit 8e7102273f59 ("bcache: make bch_btree_check() to be
multithreaded") makes bch_btree_check() to be much faster when checking
all btree nodes during cache device registration. But it isn't in ideal
shap yet, still can be improved.
This patch does the following thing to improve current parallel btree
nodes check by multiple threads in bch_btree_check(),
- Add read lock to root node while checking all the btree nodes with
multiple threads. Although currently it is not mandatory but it is
good to have a read lock in code logic.
- Remove local variable 'char name[32]', and generate kernel thread name
string directly when calling kthread_run().
- Allocate local variable "struct btree_check_state check_state" on the
stack and avoid unnecessary dynamic memory allocation for it.
- Increase check_state->started to count created kernel thread after it
succeeds to create.
- When wait for all checking kernel threads to finish, use wait_event()
to replace wait_event_interruptible().
With this change, the code is more clear, and some potential error
conditions are avoided.
Fixes: 8e7102273f59 ("bcache: make bch_btree_check() to be multithreaded")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
---
drivers/md/bcache/btree.c | 58 ++++++++++++++++++---------------------
1 file changed, 26 insertions(+), 32 deletions(-)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index ad9f16689419..2362bb8ef6d1 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -2006,8 +2006,7 @@ int bch_btree_check(struct cache_set *c)
int i;
struct bkey *k = NULL;
struct btree_iter iter;
- struct btree_check_state *check_state;
- char name[32];
+ struct btree_check_state check_state;
/* check and mark root node keys */
for_each_key_filter(&c->root->keys, k, &iter, bch_ptr_invalid)
@@ -2018,63 +2017,58 @@ int bch_btree_check(struct cache_set *c)
if (c->root->level == 0)
return 0;
- check_state = kzalloc(sizeof(struct btree_check_state), GFP_KERNEL);
- if (!check_state)
- return -ENOMEM;
-
- check_state->c = c;
- check_state->total_threads = bch_btree_chkthread_nr();
- check_state->key_idx = 0;
- spin_lock_init(&check_state->idx_lock);
- atomic_set(&check_state->started, 0);
- atomic_set(&check_state->enough, 0);
- init_waitqueue_head(&check_state->wait);
+ check_state.c = c;
+ check_state.total_threads = bch_btree_chkthread_nr();
+ check_state.key_idx = 0;
+ spin_lock_init(&check_state.idx_lock);
+ atomic_set(&check_state.started, 0);
+ atomic_set(&check_state.enough, 0);
+ init_waitqueue_head(&check_state.wait);
+ rw_lock(0, c->root, c->root->level);
/*
* Run multiple threads to check btree nodes in parallel,
- * if check_state->enough is non-zero, it means current
+ * if check_state.enough is non-zero, it means current
* running check threads are enough, unncessary to create
* more.
*/
- for (i = 0; i < check_state->total_threads; i++) {
- /* fetch latest check_state->enough earlier */
+ for (i = 0; i < check_state.total_threads; i++) {
+ /* fetch latest check_state.enough earlier */
smp_mb__before_atomic();
- if (atomic_read(&check_state->enough))
+ if (atomic_read(&check_state.enough))
break;
- check_state->infos[i].result = 0;
- check_state->infos[i].state = check_state;
- snprintf(name, sizeof(name), "bch_btrchk[%u]", i);
- atomic_inc(&check_state->started);
+ check_state.infos[i].result = 0;
+ check_state.infos[i].state = &check_state;
- check_state->infos[i].thread =
+ check_state.infos[i].thread =
kthread_run(bch_btree_check_thread,
- &check_state->infos[i],
- name);
- if (IS_ERR(check_state->infos[i].thread)) {
+ &check_state.infos[i],
+ "bch_btrchk[%d]", i);
+ if (IS_ERR(check_state.infos[i].thread)) {
pr_err("fails to run thread bch_btrchk[%d]\n", i);
for (--i; i >= 0; i--)
- kthread_stop(check_state->infos[i].thread);
+ kthread_stop(check_state.infos[i].thread);
ret = -ENOMEM;
goto out;
}
+ atomic_inc(&check_state.started);
}
/*
* Must wait for all threads to stop.
*/
- wait_event_interruptible(check_state->wait,
- atomic_read(&check_state->started) == 0);
+ wait_event(check_state.wait, atomic_read(&check_state.started) == 0);
- for (i = 0; i < check_state->total_threads; i++) {
- if (check_state->infos[i].result) {
- ret = check_state->infos[i].result;
+ for (i = 0; i < check_state.total_threads; i++) {
+ if (check_state.infos[i].result) {
+ ret = check_state.infos[i].result;
goto out;
}
}
out:
- kfree(check_state);
+ rw_unlock(0, c->root);
return ret;
}
--
2.35.3
Dear Altruist,
With confidence that this message will find you in good fate, i wish
to share with you my last wish. I have a terminal ailment and writing
from my hospital bed. I got your contact from humanitarian event with
hope you have the welfare of less privilege around the globe at heart.
I wish to submit my remaining fund the sum of ($ 6,200,000.00 Dollars)
to you for continuation of charity service. My life is no longer
promising as a result of cancer.
I need your urgent cooperation for more details.
Thanks in advance for your kind heart and quick response to this gesture.
Yours faithfully,
Elodie Antoine.
Good days to you
Please kindly accept my apology for sending you this email without
your consent i am Mr.Azzedine Guessous,The director in charge of
auditing and accounting section of Bank Of Africa Ouagadougou
Burkina-Faso in West Africa, I am writing to request your assistance
to transfer the sum of ( $18.6 Million US DOLLARS) feel free to
contact me here (azzedineguessous1(a)gmail.com) for more clarifications
if you are really interested in my proposal Have a nice day
Mr.Azzedine Guessous
Hi,
starting with 5.4.188 wie see a massive performance regression on our
nfs-server. It basically is serving requests very very slowly with cpu
utilization of 100% (with 5.4.187 and earlier it is 10%) so that it is
unusable as a fileserver.
The culprit are commits (or one of it):
c32f1041382a88b17da5736886da4a492353a1bb "nfsd: cleanup
nfsd_file_lru_dispose()"
628adfa21815f74c04724abc85847f24b5dd1645 "nfsd: Containerise filecache
laundrette"
(upstream 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63 and
9542e6a643fc69d528dfb3303f145719c61d3050)
If I revert them in v5.4.192 the kernel works as before and performance
is ok again.
I did not try to revert them one by one as any disruption of our
nfs-server is a severe problem for us and I'm not sure if they are
related.
5.10 and 5.15 both always performed very badly on our nfs-server in a
similar way so we were stuck with 5.4.
I now think this is because of 36ebbdb96b694dd9c6b25ad98f2bbd263d022b63
and/or 9542e6a643fc69d528dfb3303f145719c61d3050 though I didn't tried to
revert them in 5.15 yet.
Regards,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
Align the power-up sequence with the known-good procedure documented in [1]:
un-swap dvdd12 and dvdd25, and allow a little extra time for them to settle
before de-asserting reset.
Fixes: 6aa192698089b ("drm/bridge: Add Analogix anx6345 support")
[1] https://github.com/OLIMEX/DIY-LAPTOP/blob/master/
HARDWARE/A64-TERES/TERES-PCB1-A64-MAIN/Rev.C/TERES_PCB1-A64-MAIN_Rev.C.pdf
(page 5, blue comment down left)
Reported-by: Harald Geyer <harald(a)ccbib.org>
Signed-off-by: Torsten Duwe <duwe(a)lst.de>
Cc: stable(a)vger.kernel.org
---
This fixes the problem that e.g. X screensaver turns the screen black,
and it stays black until the next reboot; definitely on the Teres-I,
probably on the pinebook64, too.
--- a/drivers/gpu/drm/bridge/analogix/analogix-anx6345.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix-anx6345.c
@@ -309,27 +309,27 @@ static void anx6345_poweron(struct anx63
gpiod_set_value_cansleep(anx6345->gpiod_reset, 1);
usleep_range(1000, 2000);
- err = regulator_enable(anx6345->dvdd12);
+ err = regulator_enable(anx6345->dvdd25);
if (err) {
- DRM_ERROR("Failed to enable dvdd12 regulator: %d\n",
+ DRM_ERROR("Failed to enable dvdd25 regulator: %d\n",
err);
return;
}
- /* T1 - delay between VDD12 and VDD25 should be 0-2ms */
+ /* T1 - delay between VDD25 and VDD12 should be 0-2ms */
usleep_range(1000, 2000);
- err = regulator_enable(anx6345->dvdd25);
+ err = regulator_enable(anx6345->dvdd12);
if (err) {
- DRM_ERROR("Failed to enable dvdd25 regulator: %d\n",
+ DRM_ERROR("Failed to enable dvdd12 regulator: %d\n",
err);
return;
}
/* T2 - delay between RESETN and all power rail stable,
- * should be 2-5ms
+ * should be at least 2-5ms, 10ms to be safe.
*/
- usleep_range(2000, 5000);
+ usleep_range(9000, 11000);
gpiod_set_value_cansleep(anx6345->gpiod_reset, 0);
These 3 commits from upstream allow us to have more fine grained control
over the MMC command timeouts and this solves the following timeouts
that we have seen on our systems across suspend/resume cycles:
[ 14.907496] usb usb2: root hub lost power or was reset
[ 15.216232] usb 1-1: reset high-speed USB device number 2 using
xhci-hcd
[ 15.485812] bcmgenet 8f00000.ethernet eth0: Link is Down
[ 15.525328] mmc1: error -110 doing runtime resume
[ 15.531864] OOM killer enabled.
Thanks!
Ulf Hansson (3):
mmc: core: Specify timeouts for BKOPS and CACHE_FLUSH for eMMC
mmc: block: Use generic_cmd6_time when modifying
INAND_CMD38_ARG_EXT_CSD
mmc: core: Default to generic_cmd6_time as timeout in __mmc_switch()
drivers/mmc/core/block.c | 6 +++---
drivers/mmc/core/mmc_ops.c | 25 +++++++++++++------------
2 files changed, 16 insertions(+), 15 deletions(-)
--
2.25.1
Every time I retest your email, it tells me to check with my ISP or
Log onto incoming mail server (POP3): Your e-mail server rejected .
Kindly verify if your email is still valid for us to talk.