I have not root cause this yet, but would like share some findings from the vmcore Dan shared. From what i can see, this doesn't look like a md issue, but something wrong with block layer or below.
Below is one other thing I found that might be of interest. This is from the original email thread [1] that was linked to in the original issue from 2022, which the change in question reverts:
On 2022-09-02 17:46, Logan Gunthorpe wrote:
I've made some progress on this nasty bug. I've got far enough to know it's not related to the blk-wbt or the block layer.
Turns out a bunch of bios are stuck queued in a blk_plug in the md_raid5 thread while that thread appears to be stuck in an infinite loop (so it never schedules or does anything to flush the plug).
I'm still debugging to try and find out the root cause of that infinite loop, but I just wanted to send an update that the previous place I was stuck at was not correct.
Logan
This certainly sounds like it has some similarities to what we are seeing when that change is reverted. The md0_raid5 thread appears to be in an infinite loop, consuming 100% CPU, but not actually doing any work.
-- Dan
[1] https://lore.kernel.org/r/7f3b87b6-b52a-f737-51d7-a4eec5c44112@deltatee.com