Hi Andrew
Bcache is designed to boost the I/O performance of slower storage (HDDs, network-attached storage) by leveraging fast SSDs as a block cache. This functionality is critical in significantly reducing I/O latency. Therefore, any notable increase in bcache's latency severely diminishes its value. For instance, our tests show a P100 (max) latency spike from 600 ms to 2.4 seconds every 5 minutes due to this regression. In real-world environments, this increase will cause frequent timeouts and stalls in end-user applications that rely on bcache's latency improvements, highlighting the urgent need to address this issue.
Best regards Robert Pang
On Fri, Jun 13, 2025 at 3:16 PM Kuan-Wei Chiu visitorckw@gmail.com wrote:
Hi Andrew,
On Wed, Jun 11, 2025 at 06:48:17PM -0700, Andrew Morton wrote:
On Wed, 11 Jun 2025 05:55:08 +0800 Kuan-Wei Chiu visitorckw@gmail.com wrote:
This patch series introduces equality-aware variants of the min heap API that use a top-down heapify strategy to improve performance when many elements are equal under the comparison function. It also updates the documentation accordingly and modifies bcache to use the new APIs to fix a performance regression caused by the switch to the generic min heap library.
In particular, invalidate_buckets_lru() in bcache suffered from increased comparison overhead due to the bottom-up strategy introduced in commit 866898efbb25 ("bcache: remove heap-related macros and switch to generic min_heap"). The regression is addressed by switching to the equality-aware variants and using the inline versions to avoid function call overhead in this hot path.
Cc: stable@vger.kernel.org
To justify a -stable backport this performance regression would need to have a pretty significant impact upon real-world userspace. Especially as the patchset is large.
Unfortunately the changelog provides no indication of the magnitude of the userspace impact. Please tell us this, in detail.
I'll work with Robert to provide a more detailed explanation of the real-world impact on userspace.
Also, if we are to address this regression in -stable kernels then reverting 866898efbb25 is an obvious way - it is far far safer. So please also tell us why the proposed patchset is a better way for us to go.
I agree that reverting 866898efbb25 is a much safer and smaller change for backporting. In fact, I previously raised the discussion of whether we should revert the commit or instead introduce an equality-aware API and use it. The bcache maintainer preferred the latter, and I also believe that it is a more forward-looking approach. Given that bcache has run into this issue, it's likely that other users with similar use cases may encounter it as well. We wouldn't want those users to continue relying on the current default heapify behavior. So, although reverting may be more suitable for stable in isolation, adding an equality-aware API could better serve a broader set of use cases going forward.
(Also, each patch should have a fixes:866898efbb25 to help direct the backporting efforts)
Ack. Will do.
I'll add the patches to mm.git to get you some testing but from what I'm presently seeing the -stable backporting would be unwise.
Thanks!
Regards, Kuan-Wei