New subject: Stable backport request (was Re: read regression for dm-snapshot with loopback)

3 Oct 2024


      Hello,
I have been investigating a read performance regression of dm-snapshot
on top of loopback in which the read time for a dd command increased
from 2min to 40min. I bisected the issue to dc5fc361d89 ("block:
attempt direct issue of plug list"). I blktraced before and after this
commit and the main difference I saw was that before this commit, when
the performance was good, there were a lot of IO unplugs on the loop
dev. After this commit, I saw 0 IO unplugs.
On the mainline, I was also able to bisect to a commit which fixed
this issue: 667ea36378cf ("loop: don't set QUEUE_FLAG_NOMERGES"). I
also blktraced before and after this commit, and unsurprisingly, the
main difference was that commit resulted in IO merges whereas
previously there were none being.
I don't totally understand what is going on with the first commit
which introduced the issue but I'd guess some modifying of the plug
list behavior resulted in IO not getting merged/grouped but when we
enabled QUEUE_FLAG_NOMERGES, we were then able to optimize through
this mechanism. Buuuut 2min->40min seems like a huge performance drop
just from merged vs non-merged IO, no? So perhaps it's more
complicated than that...
dc5fc361d89 -> 5.16
667ea36378c -> 6.11
6.6.y and 6.1.y and were both experiencing the performance issue. I
tried porting 667ea36378 to these branches; it applied cleanly and
resolved the issue for both. So perhaps we should consider it for the
stable trees, but it'd be great if someone from the block layer could
chime in with a better idea of what's going on here.
- leah