On 02.03.24 01:05, Song Liu wrote:
On Fri, Mar 1, 2024 at 3:12 PM Dan Moulding <dan@danm.net> wrote:
- Looks like the block layer or the underlying layer (scsi/virtio-scsi) may have
some issue which leads to the I/O request from the md layer staying in a partially completed state. I can't see how this can be related to the commit bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
There is no question that the above-mentioned commit makes this problem appear. While it may be that the root cause ultimately lies outside the md/raid5 code (I'm not able to make such an assessment), I can tell you that this change is what turned it into a runtime regression. Prior to that change, I cannot reproduce the problem. One of my RAID-5 arrays has been running on every kernel version since 4.8 without issue. Then, with kernel 6.7.1, the problem appeared within hours of running the new code and affected not just one but two different machines with RAID-5 arrays. With that change reverted, the problem is not reproducible. Then, when I recently upgraded to 6.8-rc5, I immediately hit the problem again (because the change hadn't been reverted in mainline yet). I'm now running 6.8.0-rc5 on one of my affected machines without issue after reverting that commit on top of it.
[...] I also tried again to reproduce the issue, but haven't had any luck. While I will continue trying to reproduce the issue, I will also send the revert to the 6.8 kernel.
Is that revert on the way meanwhile? I'm asking because Linus might release 6.8 on Sunday.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.