On 24 Nov 2023, at 21:55, Markus Weippert markus@gekmihesg.de wrote:
On Fri, 2023-11-24 at 21:46 +0800, Coly Li wrote:
On 24 Nov 2023, at 21:29, Markus Weippert markus@gekmihesg.de wrote:
On 23.11.23 14:53, Stefan Förster wrote:
starting with kernel 6.1.39, we see the following error message with heavy I/O loads. We needed to revert
Thx for the report. I assume that problem still occurs with the latest 6.1.y kernel?
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v...
FWIW, that is mainline commit 028ddcac477b69 ("bcache: Remove unnecessary NULL point check in node allocations") [v6.5-rc1].
Did a quick check and noticed a fix for that change was recently mainlined as f72f4312d43883 ("bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce()") [v6.7-rc2-post]: https://lore.kernel.org/all/20231118163852.9692-1-colyli@suse.de/
It is expected to be integrated into a 6.1.y kernel soon.
But maybe it's something else. I CCed the involved people, they might know.
We applied f72f4312d43883 to the current Debian kernel (based on 6.1.55) but it didn't help, same stack trace. Looking at the description, __bch_btree_node_alloc() should never be able to return NULL anyway after https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v... But I didn't verify all callers, so this might still be correct, if it's not always initialized with the return value of __bch_btree_node_alloc().
Anyway, I think we fixed it by applying this:
diff -Naurp a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
--- a/drivers/md/bcache/btree.c	2023-09-23 11:11:13.000000000 +0200
+++ b/drivers/md/bcache/btree.c	2023-11-24 13:13:09.840013759 +0100
@@ -1489,7 +1489,7 @@ out_nocoalesce:
 	bch_keylist_free(&keylist);
 
 	for (i = 0; i < nodes; i++)
-		if (!IS_ERR(new_nodes[i])) {
+		if (!IS_ERR_OR_NULL(new_nodes[i])) {
 			btree_node_free(new_nodes[i]);
 			rw_unlock(true, new_nodes[i]);
 		}
The above change is what commit f72f4312d43883 ("bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce()") does.
But f72f4312d43883 reverts @@ -1340,7 +1340,7 @@, while the patch we applied reverts @@ -1487,7 +1487,7 @@ instead. Applying f72f4312d43883 didn't help for us.
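For readers following along, here is a minimal userspace sketch of why the two checks behave differently on a NULL slot. The ERR_PTR(), IS_ERR() and IS_ERR_OR_NULL() definitions below are simplified stand-ins for the kernel helpers, not the real ones, and would_free_old(), would_free_fixed() and null_slots_freed() are hypothetical names used only for illustration. The sketch assumes that some entries of new_nodes[] can still be NULL when the cleanup loop runs, which is what the patch above guards against.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers live in the top MAX_ERRNO values of the address range. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return ptr == NULL || IS_ERR(ptr);
}

/* Hypothetical predicates: would the cleanup loop try to free this slot? */
static bool would_free_old(const void *slot)
{
	return !IS_ERR(slot);		/* check before the patch */
}

static bool would_free_fixed(const void *slot)
{
	return !IS_ERR_OR_NULL(slot);	/* check after the patch */
}

/* Count the NULL slots that a cleanup loop using would_free() would touch. */
static int null_slots_freed(void *const *slots, int n,
			    bool (*would_free)(const void *))
{
	int bad = 0;

	for (int i = 0; i < n; i++)
		if (would_free(slots[i]) && slots[i] == NULL)
			bad++;
	return bad;
}
```

With an array holding one valid pointer, one ERR_PTR(-12) and one NULL, the old predicate lets the NULL slot through to the free path, while the fixed predicate skips it. In the kernel that difference would mean btree_node_free() being called on a NULL pointer.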
OK, I know what you mean. Yes, your fix is necessary too.
Would you like to post a patch for your fix?
Thanks.
Coly Li
Although the above patch is suggested to go into 6.5+ kernels, for this condition it should go into all stable kernels into which commit 028ddcac477b69 ("bcache: Remove unnecessary NULL point check in node allocations") was merged.
[snipped]