Re: [REGRESSION] 6.7.1: md: raid5 hang and unresponsive system; successfully bisected

2 Mar 2024


      ...
> I have not root-caused this yet, but would like to share some findings from
> the vmcore Dan shared. From what I can see, this doesn't look like an md
> issue, but something wrong with the block layer or below.

Below is one other thing I found that might be of interest. This is
from the original email thread [1] that was linked to in the original
issue from 2022, which the change in question reverts:

On 2022-09-02 17:46, Logan Gunthorpe wrote:
> ...
> I've made some progress on this nasty bug. I've got far enough to know it's not
> related to the blk-wbt or the block layer.
>
> Turns out a bunch of bios are stuck queued in a blk_plug in the md_raid5
> thread while that thread appears to be stuck in an infinite loop (so it never
> schedules or does anything to flush the plug).
>
> I'm still debugging to try and find out the root cause of that infinite loop,
> but I just wanted to send an update that the previous place I was stuck at
> was not correct.
>
> Logan

This certainly sounds like it has some similarities to what we are
seeing when that change is reverted. The md0_raid5 thread appears to be
in an infinite loop, consuming 100% CPU, but not actually doing any
work.

-- Dan

[1] https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com
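
For readers unfamiliar with plugging, the failure mode Logan describes can be illustrated with a toy userspace model. This is only a sketch under stated assumptions, not kernel code: the names Plug, Thread, submit_bio and flush_plug are made up for illustration and do not correspond to the kernel API. It assumes only what the quoted message says, namely that bios queued in a plug reach the device once the owning thread flushes the plug, and that a thread spinning without ever flushing leaves them queued.

```python
# Toy userspace model (NOT kernel code; all names here are hypothetical).
# A "plug" holds back submitted bios until the owning thread flushes it.

class Plug:
    def __init__(self):
        self.pending = []          # bios held back by the plug


class Thread:
    def __init__(self, device):
        self.device = device       # plain list standing in for the device queue
        self.plug = Plug()

    def submit_bio(self, bio):
        # With a plug active, the bio is only queued locally.
        self.plug.pending.append(bio)

    def flush_plug(self):
        # Stands in for what happens when the thread schedules or
        # otherwise flushes its plug.
        self.device.extend(self.plug.pending)
        self.plug.pending.clear()


def run(thread, iterations, flushes):
    """Loop `iterations` times; flush the plug only if `flushes` is true.

    Returns (bios that reached the device, bios still stuck in the plug).
    """
    for i in range(iterations):
        thread.submit_bio(f"bio{i}")
        if flushes:
            thread.flush_plug()
    return len(thread.device), len(thread.plug.pending)


healthy = run(Thread([]), 3, flushes=True)
stuck = run(Thread([]), 3, flushes=False)
print(healthy)  # (3, 0): every bio reached the device
print(stuck)    # (0, 3): the loop stayed busy, but nothing was ever issued
```

The second case matches the symptom in this thread only in outline: the loop keeps running, yet no I/O makes progress because nothing flushes the plug. The model says nothing about why the real thread is looping.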
