Hello,
I have been investigating a read performance regression of dm-snapshot on top of loopback in which the read time for a dd command increased from 2min to 40min. I bisected the issue to dc5fc361d89 ("block: attempt direct issue of plug list"). I blktraced before and after this commit and the main difference I saw was that before this commit, when the performance was good, there were a lot of IO unplugs on the loop dev. After this commit, I saw 0 IO unplugs.
On mainline, I was also able to bisect to a commit that fixed this issue: 667ea36378cf ("loop: don't set QUEUE_FLAG_NOMERGES"). I also blktraced before and after this commit, and unsurprisingly, the main difference was that the commit resulted in IO merges, whereas previously there were none.
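For anyone who wants to repeat the before/after comparison, here is a rough sketch of how the unplug and merge events in a trace could be tallied. It assumes the text comes from blkparse with its default per-event layout (device, cpu, sequence, timestamp, pid, action, ...), so the action code is the sixth field; count_actions and summarize are made-up helper names, not part of any tool.

```python
import re
from collections import Counter

# Assumption: default blkparse layout, i.e.
#   maj,min cpu seq timestamp pid action rwbs ...
# so the action code is the sixth whitespace-separated field.
ACTION_FIELD = 5
DEVICE_RE = re.compile(r"^\d+,\d+$")


def count_actions(lines):
    """Count blkparse action codes (e.g. U = unplug, M/F = back/front merge)."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip summary lines and anything that does not start with "maj,min".
        if len(fields) <= ACTION_FIELD or not DEVICE_RE.match(fields[0]):
            continue
        counts[fields[ACTION_FIELD]] += 1
    return counts


def summarize(lines):
    """Reduce the per-action counts to the numbers compared in this report."""
    counts = count_actions(lines)
    return {
        "unplugs": counts["U"] + counts["UT"],  # explicit + timer unplugs
        "merges": counts["M"] + counts["F"],    # back + front merges
        "dispatches": counts["D"],              # requests issued to the driver
    }
```

Running summarize() over the parsed trace from each kernel (for example, two text files saved from blkparse) gives the unplug and merge counts side by side.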
I don't totally understand what is going on with the first commit, which introduced the issue, but I'd guess that some change to the plug list behavior resulted in IO not getting merged/grouped, and that once we stopped setting QUEUE_FLAG_NOMERGES, we were then able to optimize through the merge mechanism instead. But 2min -> 40min seems like a huge performance drop just from merged vs. non-merged IO, no? So perhaps it's more complicated than that...
dc5fc361d89 -> 5.16
667ea36378c -> 6.11
6.6.y and 6.1.y were both experiencing the performance issue. I tried porting 667ea36378 to these branches; it applied cleanly and resolved the issue for both. So perhaps we should consider it for the stable trees, but it'd be great if someone from the block layer could chime in with a better idea of what's going on here.
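As a quick sanity check on a backported kernel, one could look at the queue's nomerges attribute in sysfs to see whether merging is still disabled on the loop device. This is only a sketch: it assumes the attribute lives at /sys/block/<dev>/queue/nomerges and that a value of 2 means no merging is attempted (0 = all merges enabled, 1 = only simple one-hit merges); read_nomerges and merges_disabled are made-up helper names.

```python
from pathlib import Path


def read_nomerges(dev, sysfs_root="/sys/block"):
    """Return the integer stored in <sysfs_root>/<dev>/queue/nomerges."""
    path = Path(sysfs_root, dev, "queue", "nomerges")
    return int(path.read_text().strip())


def merges_disabled(dev, sysfs_root="/sys/block"):
    """True if the queue reports that no merging is attempted.

    Assumed meaning of the value: 0 = all merges enabled,
    1 = only simple one-hit merges, 2 = no merging.
    """
    return read_nomerges(dev, sysfs_root) == 2
```

For example, merges_disabled("loop0") would be expected to return True without the fix and False with it, assuming loop0 is the device backing the snapshot.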
- leah