This patch series introduces equality-aware variants of the min heap API that use a top-down heapify strategy to improve performance when many elements are equal under the comparison function. It also updates the documentation accordingly and modifies bcache to use the new APIs to fix a performance regression caused by the switch to the generic min heap library.
In particular, invalidate_buckets_lru() in bcache suffered from increased comparison overhead due to the bottom-up strategy introduced in commit 866898efbb25 ("bcache: remove heap-related macros and switch to generic min_heap"). The regression is addressed by switching to the equality-aware variants and using the inline versions to avoid function call overhead in this hot path.
Cc: stable@vger.kernel.org
---
To avoid duplicated effort and expedite resolution, Robert kindly agreed that I should submit my already-completed series instead. Many thanks to him for his cooperation and support.
Kuan-Wei Chiu (8):
  lib min_heap: Add equal-elements-aware sift_down variant
  lib min_heap: Add typedef for sift_down function pointer
  lib min_heap: add eqaware variant of min_heapify_all()
  lib min_heap: add eqaware variant of min_heap_pop()
  lib min_heap: add eqaware variant of min_heap_pop_push()
  lib min_heap: add eqaware variant of min_heap_del()
  Documentation/core-api: min_heap: Document _eqaware variants of min-heap APIs
  bcache: Fix the tail IO latency regression by using equality-aware min heap API
 Documentation/core-api/min_heap.rst |  20 +++++
 drivers/md/bcache/alloc.c           |  15 ++--
 include/linux/min_heap.h            | 131 +++++++++++++++++++++++-----
 lib/min_heap.c                      |  23 +++--
 4 files changed, 154 insertions(+), 35 deletions(-)
The existing min_heap_sift_down() uses the bottom-up heapify variant, which reduces the number of comparisons from ~2 * log2(n) to ~1 * log2(n) when all elements are distinct. However, in workloads where the heap contains many equal elements, this bottom-up variant can degenerate and perform up to 2 * log2(n) comparisons, while the traditional top-down variant needs only O(1) comparisons in such cases.
To address this, introduce min_heap_sift_down_eqaware(), a top-down heapify variant optimized for scenarios with many equal elements. This variant avoids unnecessary comparisons and swaps when elements are already equal or in the correct position.
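For intuition, here is a minimal userspace sketch (illustrative only, not
part of the patch; the int-array heap and comparison counter are made up
for the example) showing why a top-down sift_down needs only O(1)
comparisons when every element is equal:

#include <stdio.h>
#include <stddef.h>

static long ncmp;

static int less(int a, int b)
{
	ncmp++;
	return a < b;
}

/* Classic top-down sift_down on an int array: stop as soon as the
 * parent is not greater than either child. */
static void sift_down_topdown(int *h, size_t pos, size_t n)
{
	for (;;) {
		size_t l = 2 * pos + 1, r = l + 1, smallest = pos;
		int tmp;

		if (l < n && less(h[l], h[smallest]))
			smallest = l;
		if (r < n && less(h[r], h[smallest]))
			smallest = r;
		if (smallest == pos)
			break;	/* all-equal input exits here immediately */
		tmp = h[pos];
		h[pos] = h[smallest];
		h[smallest] = tmp;
		pos = smallest;
	}
}

int main(void)
{
	int h[1024] = { 0 };	/* every element equal */

	sift_down_topdown(h, 0, 1024);
	printf("comparisons: %ld\n", ncmp);	/* prints 2, not ~2 * log2(1024) */
	return 0;
}

A bottom-up sift_down, by contrast, first descends all the way to a leaf
along the "smaller child" path (one comparison per level) and then walks
back up (another comparison per level), so equal elements give it no
early exit.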
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 include/linux/min_heap.h | 51 ++++++++++++++++++++++++++++++++++++++++
 lib/min_heap.c           |  7 ++++++
 2 files changed, 58 insertions(+)
diff --git a/include/linux/min_heap.h b/include/linux/min_heap.h
index 79ddc0adbf2b..b0d603fe5379 100644
--- a/include/linux/min_heap.h
+++ b/include/linux/min_heap.h
@@ -292,6 +292,52 @@ void __min_heap_sift_down_inline(min_heap_char *heap, size_t pos, size_t elem_si
 	__min_heap_sift_down_inline(container_of(&(_heap)->nr, min_heap_char, nr), _pos,	\
 				    __minheap_obj_size(_heap), _func, _args)
 
+/*
+ * Sift the element at pos down the heap.
+ *
+ * Variants of heap functions using an equal-elements-aware sift_down.
+ * These may perform better when the heap contains many equal elements.
+ */
+static __always_inline
+void __min_heap_sift_down_eqaware_inline(min_heap_char * heap, size_t pos, size_t elem_size,
+					 const struct min_heap_callbacks *func, void *args)
+{
+	void *data = heap->data;
+	void (*swp)(void *lhs, void *rhs, void *args) = func->swp;
+	/* pre-scale counters for performance */
+	size_t a = pos * elem_size;
+	size_t b, c, smallest;
+	size_t n = heap->nr * elem_size;
+
+	if (!swp)
+		swp = select_swap_func(data, elem_size);
+
+	for (;;) {
+		b = 2 * a + elem_size;
+		c = b + elem_size;
+		smallest = a;
+
+		if (b >= n)
+			break;
+
+		if (func->less(data + b, data + smallest, args))
+			smallest = b;
+
+		if (c < n && func->less(data + c, data + smallest, args))
+			smallest = c;
+
+		if (smallest == a)
+			break;
+
+		do_swap(data + a, data + smallest, elem_size, swp, args);
+		a = smallest;
+	}
+}
+
+#define min_heap_sift_down_eqaware_inline(_heap, _pos, _func, _args)	\
+	__min_heap_sift_down_inline(container_of(&(_heap)->nr, min_heap_char, nr), _pos,	\
+				    __minheap_obj_size(_heap), _func, _args)
+
 /* Sift up ith element from the heap, O(log2(nr)). */
 static __always_inline
 void __min_heap_sift_up_inline(min_heap_char *heap, size_t elem_size, size_t idx,
@@ -433,6 +479,8 @@ void *__min_heap_peek(struct min_heap_char *heap);
 bool __min_heap_full(min_heap_char *heap);
 void __min_heap_sift_down(min_heap_char *heap, size_t pos, size_t elem_size,
 			  const struct min_heap_callbacks *func, void *args);
+void __min_heap_sift_down_eqaware(min_heap_char *heap, size_t pos, size_t elem_size,
+				  const struct min_heap_callbacks *func, void *args);
 void __min_heap_sift_up(min_heap_char *heap, size_t elem_size, size_t idx,
 			const struct min_heap_callbacks *func, void *args);
 void __min_heapify_all(min_heap_char *heap, size_t elem_size,
@@ -455,6 +503,9 @@ bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx,
 #define min_heap_sift_down(_heap, _pos, _func, _args)	\
 	__min_heap_sift_down(container_of(&(_heap)->nr, min_heap_char, nr), _pos,	\
 			     __minheap_obj_size(_heap), _func, _args)
+#define min_heap_sift_down_eqaware(_heap, _pos, _func, _args)	\
+	__min_heap_sift_down_eqaware(container_of(&(_heap)->nr, min_heap_char, nr), _pos,	\
+				     __minheap_obj_size(_heap), _func, _args)
 #define min_heap_sift_up(_heap, _idx, _func, _args)	\
 	__min_heap_sift_up(container_of(&(_heap)->nr, min_heap_char, nr),	\
 			   __minheap_obj_size(_heap), _idx, _func, _args)
diff --git a/lib/min_heap.c b/lib/min_heap.c
index 96f01a4c5fb6..2225f40d0d7a 100644
--- a/lib/min_heap.c
+++ b/lib/min_heap.c
@@ -27,6 +27,13 @@ void __min_heap_sift_down(min_heap_char *heap, size_t pos, size_t elem_size,
 }
 EXPORT_SYMBOL(__min_heap_sift_down);
 
+void __min_heap_sift_down_eqaware(min_heap_char *heap, size_t pos, size_t elem_size,
+				  const struct min_heap_callbacks *func, void *args)
+{
+	__min_heap_sift_down_eqaware_inline(heap, pos, elem_size, func, args);
+}
+EXPORT_SYMBOL(__min_heap_sift_down_eqaware);
+
 void __min_heap_sift_up(min_heap_char *heap, size_t elem_size, size_t idx,
 			const struct min_heap_callbacks *func, void *args)
 {
Hi Kuan-Wei
Thanks for this patch series to address the bcache latency regression. I tested it, but the results show the regression still remains. On reviewing the patch changes, I noticed that the min_heap_sift_down_eqaware_inline #define macro in this patch may have been mapped incorrectly:
+#define min_heap_sift_down_eqaware_inline(_heap, _pos, _func, _args)	\
+	__min_heap_sift_down_inline(container_of(&(_heap)->nr, min_heap_char, nr), _pos,	\
+				    __minheap_obj_size(_heap), _func, _args)
I changed it to map to its "eqaware" counterpart like this, and the regression no longer occurs.
+#define min_heap_sift_down_eqaware_inline(_heap, _pos, _func, _args)	\
+	__min_heap_sift_down_eqaware_inline(container_of(&(_heap)->nr, min_heap_char, nr), _pos,	\
+					    __minheap_obj_size(_heap), _func, _args)
Do you think this correction is appropriate?
Best regards
Robert Pang
On Wed, Jun 11, 2025 at 6:55 AM Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> [...]
On Thu, Jun 12, 2025 at 10:00:14PM +0900, Robert Pang wrote:
> Hi Kuan-Wei
>
> Thanks for this patch series to address the bcache latency regression.
> I tested it, but the results show the regression still remains. On
> reviewing the patch changes, I noticed that the
> min_heap_sift_down_eqaware_inline #define macro in this patch may have
> been mapped incorrectly:
>
> +#define min_heap_sift_down_eqaware_inline(_heap, _pos, _func, _args)	\
> +	__min_heap_sift_down_inline(container_of(&(_heap)->nr, min_heap_char, nr), _pos,	\
> +				    __minheap_obj_size(_heap), _func, _args)
>
> I changed it to map to its "eqaware" counterpart like this, and the
> regression no longer occurs.
>
> +#define min_heap_sift_down_eqaware_inline(_heap, _pos, _func, _args)	\
> +	__min_heap_sift_down_eqaware_inline(container_of(&(_heap)->nr, min_heap_char, nr), _pos,	\
> +					    __minheap_obj_size(_heap), _func, _args)
>
> Do you think this correction is appropriate?
That's definitely my mistake. Thanks for testing and pointing it out. I'll fix the typo in the next version.
Regards,
Kuan-Wei
> Best regards
> Robert Pang
>
> On Wed, Jun 11, 2025 at 6:55 AM Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> > [...]
Several min-heap operations such as min_heapify_all(), min_heap_pop(), min_heap_pop_push(), and min_heap_del() use min_heap_sift_down() internally. With the addition of the equal-elements-aware variant min_heap_sift_down_eqaware(), these functions now need to choose between multiple sift_down implementations.
Introduce a siftdown_fn_t typedef to represent the function pointer type for sift_down routines. This avoids repeating verbose function pointer declarations and simplifies code that dynamically selects the appropriate implementation based on heap characteristics.
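For illustration, this is the selection pattern the typedef enables (a
sketch mirroring how the later patches in this series use it):

	siftdown_fn_t sift_down = eqaware ? __min_heap_sift_down_eqaware_inline :
					    __min_heap_sift_down_inline;

	sift_down(heap, pos, elem_size, func, args);

Without the typedef, every such call site would have to spell out the
full five-argument function pointer type to make this choice.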
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 include/linux/min_heap.h | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/include/linux/min_heap.h b/include/linux/min_heap.h
index b0d603fe5379..4cd8fd9db259 100644
--- a/include/linux/min_heap.h
+++ b/include/linux/min_heap.h
@@ -49,6 +49,9 @@ struct min_heap_callbacks {
 	void (*swp)(void *lhs, void *rhs, void *args);
 };
 
+typedef void (*siftdown_fn_t)(min_heap_char *heap, size_t pos, size_t elem_size,
+			      const struct min_heap_callbacks *func, void *args);
+
 /**
  * is_aligned - is this pointer & size okay for word-wide copying?
  * @base: pointer to data
Introduce min_heapify_all_eqaware() as a variant of min_heapify_all() that uses the equality-aware version of sift_down, which is implemented in a top-down manner.
This top-down sift_down reduces the number of comparisons from O(log2(n)) to O(1) in cases where many elements have equal priority. It enables more efficient heap construction when the heap contains a large number of equal elements.
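As a usage sketch (hedged: the int heap and int_less() callback below are
hypothetical, not taken from any in-tree user):

DEFINE_MIN_HEAP(int, min_heap_int);

static bool int_less(const void *lhs, const void *rhs, void *args)
{
	return *(const int *)lhs < *(const int *)rhs;
}

static const struct min_heap_callbacks cb = {
	.less = int_less,
	.swp = NULL,	/* NULL: the library selects a default swap */
};

static void build_heap(struct min_heap_int *heap, int *buf, size_t n)
{
	min_heap_init(heap, buf, n);
	heap->nr = n;	/* buf[] is assumed to already hold n unordered elements */
	min_heapify_all_eqaware(heap, &cb, NULL);
}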
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 include/linux/min_heap.h | 19 ++++++++++++++-----
 lib/min_heap.c           |  4 ++--
 2 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/include/linux/min_heap.h b/include/linux/min_heap.h
index 4cd8fd9db259..c2f6e1450505 100644
--- a/include/linux/min_heap.h
+++ b/include/linux/min_heap.h
@@ -371,17 +371,23 @@ void __min_heap_sift_up_inline(min_heap_char *heap, size_t elem_size, size_t idx
 /* Floyd's approach to heapification that is O(nr). */
 static __always_inline
 void __min_heapify_all_inline(min_heap_char *heap, size_t elem_size,
-			      const struct min_heap_callbacks *func, void *args)
+			      const struct min_heap_callbacks *func, void *args, bool eqaware)
 {
 	ssize_t i;
+	siftdown_fn_t sift_down = eqaware ? __min_heap_sift_down_eqaware_inline :
+					    __min_heap_sift_down_inline;
 
 	for (i = heap->nr / 2 - 1; i >= 0; i--)
-		__min_heap_sift_down_inline(heap, i, elem_size, func, args);
+		sift_down(heap, i, elem_size, func, args);
 }
 
 #define min_heapify_all_inline(_heap, _func, _args)	\
 	__min_heapify_all_inline(container_of(&(_heap)->nr, min_heap_char, nr),	\
-				 __minheap_obj_size(_heap), _func, _args)
+				 __minheap_obj_size(_heap), _func, _args, false)
+
+#define min_heapify_all_eqaware_inline(_heap, _func, _args)	\
+	__min_heapify_all_inline(container_of(&(_heap)->nr, min_heap_char, nr),	\
+				 __minheap_obj_size(_heap), _func, _args, true)
 
 /* Remove minimum element from the heap, O(log2(nr)). */
 static __always_inline
@@ -487,7 +493,7 @@ void __min_heap_sift_down_eqaware(min_heap_char *heap, size_t pos, size_t elem_s
 void __min_heap_sift_up(min_heap_char *heap, size_t elem_size, size_t idx,
 			const struct min_heap_callbacks *func, void *args);
 void __min_heapify_all(min_heap_char *heap, size_t elem_size,
-		       const struct min_heap_callbacks *func, void *args);
+		       const struct min_heap_callbacks *func, void *args, bool eqaware);
 bool __min_heap_pop(min_heap_char *heap, size_t elem_size,
 		    const struct min_heap_callbacks *func, void *args);
 void __min_heap_pop_push(min_heap_char *heap, const void *element, size_t elem_size,
 			 const struct min_heap_callbacks *func, void *args);
@@ -514,7 +520,10 @@ bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx,
 			   __minheap_obj_size(_heap), _idx, _func, _args)
 #define min_heapify_all(_heap, _func, _args)	\
 	__min_heapify_all(container_of(&(_heap)->nr, min_heap_char, nr),	\
-			  __minheap_obj_size(_heap), _func, _args)
+			  __minheap_obj_size(_heap), _func, _args, false)
+#define min_heapify_all_eqaware(_heap, _func, _args)	\
+	__min_heapify_all(container_of(&(_heap)->nr, min_heap_char, nr),	\
+			  __minheap_obj_size(_heap), _func, _args, true)
 #define min_heap_pop(_heap, _func, _args)	\
 	__min_heap_pop(container_of(&(_heap)->nr, min_heap_char, nr),	\
 		       __minheap_obj_size(_heap), _func, _args)
diff --git a/lib/min_heap.c b/lib/min_heap.c
index 2225f40d0d7a..a422cfaff196 100644
--- a/lib/min_heap.c
+++ b/lib/min_heap.c
@@ -42,9 +42,9 @@ void __min_heap_sift_up(min_heap_char *heap, size_t elem_size, size_t idx,
 EXPORT_SYMBOL(__min_heap_sift_up);
 
 void __min_heapify_all(min_heap_char *heap, size_t elem_size,
-		       const struct min_heap_callbacks *func, void *args)
+		       const struct min_heap_callbacks *func, void *args, bool eqaware)
 {
-	__min_heapify_all_inline(heap, elem_size, func, args);
+	__min_heapify_all_inline(heap, elem_size, func, args, eqaware);
 }
 EXPORT_SYMBOL(__min_heapify_all);
Introduce min_heap_pop_eqaware() as a variant of min_heap_pop() that uses the equality-aware version of sift_down, which is implemented in a top-down manner.
This top-down sift_down reduces the number of comparisons from O(log2(n)) to O(1) in cases where many elements have equal priority. It makes popping more efficient when the heap contains a large number of equal elements.
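Continuing the hypothetical int-heap sketch from the previous patch, a
drain loop using the new variant might look like:

static void drain(struct min_heap_int *heap)
{
	while (heap->nr) {
		int smallest = min_heap_peek(heap)[0];

		consume(smallest);	/* consume() is a placeholder */
		min_heap_pop_eqaware(heap, &cb, NULL);
	}
}

Each pop moves the last element to the root and sifts it down, so runs of
equal keys are exactly where the top-down variant saves comparisons.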
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 include/linux/min_heap.h | 19 ++++++++++++++-----
 lib/min_heap.c           |  4 ++--
 2 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/include/linux/min_heap.h b/include/linux/min_heap.h
index c2f6e1450505..6c45d617b027 100644
--- a/include/linux/min_heap.h
+++ b/include/linux/min_heap.h
@@ -392,9 +392,11 @@ void __min_heapify_all_inline(min_heap_char *heap, size_t elem_size,
 /* Remove minimum element from the heap, O(log2(nr)). */
 static __always_inline
 bool __min_heap_pop_inline(min_heap_char *heap, size_t elem_size,
-			   const struct min_heap_callbacks *func, void *args)
+			   const struct min_heap_callbacks *func, void *args, bool eqaware)
 {
 	void *data = heap->data;
+	siftdown_fn_t sift_down = eqaware ? __min_heap_sift_down_eqaware_inline :
+					    __min_heap_sift_down_inline;
 
 	if (WARN_ONCE(heap->nr <= 0, "Popping an empty heap"))
 		return false;
@@ -402,14 +404,18 @@ bool __min_heap_pop_inline(min_heap_char *heap, size_t elem_size,
 	/* Place last element at the root (position 0) and then sift down. */
 	heap->nr--;
 	memcpy(data, data + (heap->nr * elem_size), elem_size);
-	__min_heap_sift_down_inline(heap, 0, elem_size, func, args);
+	sift_down(heap, 0, elem_size, func, args);
 
 	return true;
 }
 
 #define min_heap_pop_inline(_heap, _func, _args)	\
 	__min_heap_pop_inline(container_of(&(_heap)->nr, min_heap_char, nr),	\
-			      __minheap_obj_size(_heap), _func, _args)
+			      __minheap_obj_size(_heap), _func, _args, false)
+
+#define min_heap_pop_eqaware_inline(_heap, _func, _args)	\
+	__min_heap_pop_inline(container_of(&(_heap)->nr, min_heap_char, nr),	\
+			      __minheap_obj_size(_heap), _func, _args, true)
 
 /*
  * Remove the minimum element and then push the given element. The
@@ -495,7 +501,7 @@ void __min_heap_sift_up(min_heap_char *heap, size_t elem_size, size_t idx,
 void __min_heapify_all(min_heap_char *heap, size_t elem_size,
 		       const struct min_heap_callbacks *func, void *args, bool eqaware);
 bool __min_heap_pop(min_heap_char *heap, size_t elem_size,
-		    const struct min_heap_callbacks *func, void *args);
+		    const struct min_heap_callbacks *func, void *args, bool eqaware);
 void __min_heap_pop_push(min_heap_char *heap, const void *element, size_t elem_size,
 			 const struct min_heap_callbacks *func, void *args);
 bool __min_heap_push(min_heap_char *heap, const void *element, size_t elem_size,
@@ -526,7 +532,10 @@ bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx,
 			  __minheap_obj_size(_heap), _func, _args, true)
 #define min_heap_pop(_heap, _func, _args)	\
 	__min_heap_pop(container_of(&(_heap)->nr, min_heap_char, nr),	\
-		       __minheap_obj_size(_heap), _func, _args)
+		       __minheap_obj_size(_heap), _func, _args, false)
+#define min_heap_pop_eqaware(_heap, _func, _args)	\
+	__min_heap_pop(container_of(&(_heap)->nr, min_heap_char, nr),	\
+		       __minheap_obj_size(_heap), _func, _args, true)
 #define min_heap_pop_push(_heap, _element, _func, _args)	\
 	__min_heap_pop_push(container_of(&(_heap)->nr, min_heap_char, nr), _element,	\
 			    __minheap_obj_size(_heap), _func, _args)
diff --git a/lib/min_heap.c b/lib/min_heap.c
index a422cfaff196..dae3ed39421a 100644
--- a/lib/min_heap.c
+++ b/lib/min_heap.c
@@ -49,9 +49,9 @@ void __min_heapify_all(min_heap_char *heap, size_t elem_size,
 EXPORT_SYMBOL(__min_heapify_all);
 
 bool __min_heap_pop(min_heap_char *heap, size_t elem_size,
-		    const struct min_heap_callbacks *func, void *args)
+		    const struct min_heap_callbacks *func, void *args, bool eqaware)
 {
-	return __min_heap_pop_inline(heap, elem_size, func, args);
+	return __min_heap_pop_inline(heap, elem_size, func, args, eqaware);
 }
 EXPORT_SYMBOL(__min_heap_pop);
Introduce min_heap_pop_push_eqaware() as a variant of min_heap_pop_push() that uses the equality-aware version of sift_down, which is implemented in a top-down manner.
This top-down sift_down reduces the number of comparisons from O(log2(n)) to O(1) in cases where many elements have equal priority. It makes combined pop-and-push operations more efficient when the heap contains a large number of equal elements.
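A sketch of the intended replace-the-minimum pattern (same hypothetical
int heap and callbacks as in the earlier patches):

static void replace_min(struct min_heap_int *heap, int new_elem)
{
	/* Overwrite the root, then restore the heap property with the
	 * top-down equality-aware sift_down, in one O(log2(n)) step
	 * instead of a separate pop + push: */
	min_heap_pop_push_eqaware(heap, &new_elem, &cb, NULL);
}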
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 include/linux/min_heap.h | 20 +++++++++++++-----
 lib/min_heap.c           |  4 ++--
 2 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/include/linux/min_heap.h b/include/linux/min_heap.h
index 6c45d617b027..d7bf8dd0f6b1 100644
--- a/include/linux/min_heap.h
+++ b/include/linux/min_heap.h
@@ -424,15 +424,22 @@ bool __min_heap_pop_inline(min_heap_char *heap, size_t elem_size,
  */
 static __always_inline
 void __min_heap_pop_push_inline(min_heap_char *heap, const void *element, size_t elem_size,
-				const struct min_heap_callbacks *func, void *args)
+				const struct min_heap_callbacks *func, void *args, bool eqaware)
 {
+	siftdown_fn_t sift_down = eqaware ? __min_heap_sift_down_eqaware_inline :
+					    __min_heap_sift_down_inline;
+
 	memcpy(heap->data, element, elem_size);
-	__min_heap_sift_down_inline(heap, 0, elem_size, func, args);
+	sift_down(heap, 0, elem_size, func, args);
 }
 
 #define min_heap_pop_push_inline(_heap, _element, _func, _args)	\
 	__min_heap_pop_push_inline(container_of(&(_heap)->nr, min_heap_char, nr), _element,	\
-				   __minheap_obj_size(_heap), _func, _args)
+				   __minheap_obj_size(_heap), _func, _args, false)
+
+#define min_heap_pop_push_eqaware_inline(_heap, _element, _func, _args)	\
+	__min_heap_pop_push_inline(container_of(&(_heap)->nr, min_heap_char, nr), _element,	\
+				   __minheap_obj_size(_heap), _func, _args, true)
 
 /* Push an element on to the heap, O(log2(nr)). */
 static __always_inline
@@ -503,7 +510,7 @@ void __min_heapify_all(min_heap_char *heap, size_t elem_size,
 bool __min_heap_pop(min_heap_char *heap, size_t elem_size,
 		    const struct min_heap_callbacks *func, void *args, bool eqaware);
 void __min_heap_pop_push(min_heap_char *heap, const void *element, size_t elem_size,
-			 const struct min_heap_callbacks *func, void *args);
+			 const struct min_heap_callbacks *func, void *args, bool eqaware);
 bool __min_heap_push(min_heap_char *heap, const void *element, size_t elem_size,
 		     const struct min_heap_callbacks *func, void *args);
 bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx,
@@ -538,7 +545,10 @@ bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx,
 		       __minheap_obj_size(_heap), _func, _args, true)
 #define min_heap_pop_push(_heap, _element, _func, _args)	\
 	__min_heap_pop_push(container_of(&(_heap)->nr, min_heap_char, nr), _element,	\
-			    __minheap_obj_size(_heap), _func, _args)
+			    __minheap_obj_size(_heap), _func, _args, false)
+#define min_heap_pop_push_eqaware(_heap, _element, _func, _args)	\
+	__min_heap_pop_push(container_of(&(_heap)->nr, min_heap_char, nr), _element,	\
+			    __minheap_obj_size(_heap), _func, _args, true)
 #define min_heap_push(_heap, _element, _func, _args)	\
 	__min_heap_push(container_of(&(_heap)->nr, min_heap_char, nr), _element,	\
 			__minheap_obj_size(_heap), _func, _args)
diff --git a/lib/min_heap.c b/lib/min_heap.c
index dae3ed39421a..a69d8b80d443 100644
--- a/lib/min_heap.c
+++ b/lib/min_heap.c
@@ -56,9 +56,9 @@ bool __min_heap_pop(min_heap_char *heap, size_t elem_size,
 EXPORT_SYMBOL(__min_heap_pop);
 
 void __min_heap_pop_push(min_heap_char *heap, const void *element, size_t elem_size,
-			 const struct min_heap_callbacks *func, void *args)
+			 const struct min_heap_callbacks *func, void *args, bool eqaware)
 {
-	__min_heap_pop_push_inline(heap, element, elem_size, func, args);
+	__min_heap_pop_push_inline(heap, element, elem_size, func, args, eqaware);
 }
 EXPORT_SYMBOL(__min_heap_pop_push);
Introduce min_heap_del_eqaware() as a variant of min_heap_del() that uses the equality-aware version of sift_down, which is implemented in a top-down manner.
This top-down sift_down reduces the number of comparisons from O(log2(n)) to O(1) in cases where many elements have equal priority. It makes element deletion more efficient when the heap contains a large number of equal elements.
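A sketch of deletion by index with the new variant (same hypothetical int
heap as above):

static bool remove_at(struct min_heap_int *heap, size_t idx)
{
	/* The displaced last element is first sifted up, then sifted
	 * down with the equality-aware variant: */
	return min_heap_del_eqaware(heap, idx, &cb, NULL);
}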
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 include/linux/min_heap.h | 19 ++++++++++++++-----
 lib/min_heap.c           |  4 ++--
 2 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/include/linux/min_heap.h b/include/linux/min_heap.h
index d7bf8dd0f6b1..f34df8dd2e17 100644
--- a/include/linux/min_heap.h
+++ b/include/linux/min_heap.h
@@ -470,10 +470,12 @@ bool __min_heap_push_inline(min_heap_char *heap, const void *element, size_t ele
 /* Remove ith element from the heap, O(log2(nr)). */
 static __always_inline
 bool __min_heap_del_inline(min_heap_char *heap, size_t elem_size, size_t idx,
-			   const struct min_heap_callbacks *func, void *args)
+			   const struct min_heap_callbacks *func, void *args, bool eqaware)
 {
 	void *data = heap->data;
 	void (*swp)(void *lhs, void *rhs, void *args) = func->swp;
+	siftdown_fn_t sift_down = eqaware ? __min_heap_sift_down_eqaware_inline :
+					    __min_heap_sift_down_inline;
 
 	if (WARN_ONCE(heap->nr <= 0, "Popping an empty heap"))
 		return false;
@@ -487,14 +489,18 @@ bool __min_heap_del_inline(min_heap_char *heap, size_t elem_size, size_t idx,
 		return true;
 	do_swap(data + (idx * elem_size), data + (heap->nr * elem_size), elem_size, swp, args);
 	__min_heap_sift_up_inline(heap, elem_size, idx, func, args);
-	__min_heap_sift_down_inline(heap, idx, elem_size, func, args);
+	sift_down(heap, idx, elem_size, func, args);
 
 	return true;
 }
 
 #define min_heap_del_inline(_heap, _idx, _func, _args)	\
 	__min_heap_del_inline(container_of(&(_heap)->nr, min_heap_char, nr),	\
-			      __minheap_obj_size(_heap), _idx, _func, _args)
+			      __minheap_obj_size(_heap), _idx, _func, _args, false)
+
+#define min_heap_del_eqaware_inline(_heap, _idx, _func, _args)	\
+	__min_heap_del_inline(container_of(&(_heap)->nr, min_heap_char, nr),	\
+			      __minheap_obj_size(_heap), _idx, _func, _args, true)
 
 void __min_heap_init(min_heap_char *heap, void *data, size_t size);
 void *__min_heap_peek(struct min_heap_char *heap);
@@ -514,7 +520,7 @@ void __min_heap_pop_push(min_heap_char *heap, const void *element, size_t elem_s
 bool __min_heap_push(min_heap_char *heap, const void *element, size_t elem_size,
 		     const struct min_heap_callbacks *func, void *args);
 bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx,
-		    const struct min_heap_callbacks *func, void *args);
+		    const struct min_heap_callbacks *func, void *args, bool eqaware);
 
 #define min_heap_init(_heap, _data, _size)	\
 	__min_heap_init(container_of(&(_heap)->nr, min_heap_char, nr), _data, _size)
@@ -554,6 +560,9 @@ bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx,
 			__minheap_obj_size(_heap), _func, _args)
 #define min_heap_del(_heap, _idx, _func, _args)	\
 	__min_heap_del(container_of(&(_heap)->nr, min_heap_char, nr),	\
-		       __minheap_obj_size(_heap), _idx, _func, _args)
+		       __minheap_obj_size(_heap), _idx, _func, _args, false)
+#define min_heap_del_eqaware(_heap, _idx, _func, _args)	\
+	__min_heap_del(container_of(&(_heap)->nr, min_heap_char, nr),	\
+		       __minheap_obj_size(_heap), _idx, _func, _args, true)
 
 #endif /* _LINUX_MIN_HEAP_H */
diff --git a/lib/min_heap.c b/lib/min_heap.c
index a69d8b80d443..50f224f174d5 100644
--- a/lib/min_heap.c
+++ b/lib/min_heap.c
@@ -70,8 +70,8 @@ bool __min_heap_push(min_heap_char *heap, const void *element, size_t elem_size,
 EXPORT_SYMBOL(__min_heap_push);
 
 bool __min_heap_del(min_heap_char *heap, size_t elem_size, size_t idx,
-		    const struct min_heap_callbacks *func, void *args)
+		    const struct min_heap_callbacks *func, void *args, bool eqaware)
 {
-	return __min_heap_del_inline(heap, elem_size, idx, func, args);
+	return __min_heap_del_inline(heap, elem_size, idx, func, args, eqaware);
 }
 EXPORT_SYMBOL(__min_heap_del);
Add documentation for equality-aware variants of min-heap maintenance functions, which use a top-down sift_down strategy. These variants, suffixed with _eqaware, reduce the number of comparisons to O(1) in workloads with many elements of equal priority and can be used as drop-in replacements for their standard counterparts.
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 Documentation/core-api/min_heap.rst | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
diff --git a/Documentation/core-api/min_heap.rst b/Documentation/core-api/min_heap.rst
index 9f57766581df..012c82038b46 100644
--- a/Documentation/core-api/min_heap.rst
+++ b/Documentation/core-api/min_heap.rst
@@ -30,6 +30,26 @@ more expensive. As with the non-inline versions, it is important to use the
 macro wrappers for inline functions instead of directly calling the functions
 themselves.
 
+Equality-Aware Heap Maintenance
+-------------------------------
+
+In some workloads, a large number of elements in the heap may be equal under
+the user-defined comparison function. For such cases, the standard
+``min_heap_sift_down()`` implementation, which uses the bottom-up heapify
+strategy, can be inefficient. While bottom-up heapify reduces the number of
+comparisons by approximately 50% for randomly ordered data, it may perform up
+to :math:`2 \times \log_2(n)` comparisons in the presence of many equal
+elements.
+
+To address this, equality-aware versions of heap maintenance APIs are
+provided. These versions use the traditional top-down heapify strategy, which
+performs better - sometimes requiring only :math:`\mathcal{O}(1)` comparisons
+- when many elements are equal.
+
+The equality-aware APIs are suffixed with ``_eqaware``, and serve as drop-in
+replacements for their standard counterparts when equal elements are expected.
+
 Data Structures
 ===============
Commit 866898efbb25 ("bcache: remove heap-related macros and switch to generic min_heap") replaced the original top-down heap macros in bcache with the generic min heap library, which uses a bottom-up heapify strategy. However, in scenarios like invalidate_buckets_lru() - especially before the cache is fully populated - many buckets remain unfilled. This causes new_bucket_prio() to frequently return zero, leading to a high rate of equal comparisons.
Bottom-up sift_down performs up to 2 * log2(n) comparisons in such cases, resulting in a performance regression.
Switch to the _eqaware variants of the min heap API to restore the original top-down sift_down behavior, which requires only O(1) comparisons when many elements are equal.
Also use the inline versions of the heap functions to avoid performance degradation introduced by commit 92a8b224b833 ("lib/min_heap: introduce non-inline versions of min heap API functions"), as invalidate_buckets_lru() is on a performance-critical hot path.
Fixes: 866898efbb25 ("bcache: remove heap-related macros and switch to generic min_heap")
Fixes: 92a8b224b833 ("lib/min_heap: introduce non-inline versions of min heap API functions")
Reported-by: Robert Pang <robertpang@google.com>
Closes: https://lore.kernel.org/linux-bcache/CAJhEC06F_AtrPgw2-7CvCqZgeStgCtitbD-ryu...
Cc: stable@vger.kernel.org # 6.11+
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
---
 drivers/md/bcache/alloc.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c
index 8998e61efa40..625c5c4eb962 100644
--- a/drivers/md/bcache/alloc.c
+++ b/drivers/md/bcache/alloc.c
@@ -207,15 +207,16 @@ static void invalidate_buckets_lru(struct cache *ca)
 		if (!bch_can_invalidate_bucket(ca, b))
 			continue;
 
-		if (!min_heap_full(&ca->heap))
-			min_heap_push(&ca->heap, &b, &bucket_max_cmp_callback, ca);
-		else if (!new_bucket_max_cmp(&b, min_heap_peek(&ca->heap), ca)) {
+		if (!min_heap_full_inline(&ca->heap))
+			min_heap_push_inline(&ca->heap, &b, &bucket_max_cmp_callback, ca);
+		else if (!new_bucket_max_cmp(&b, min_heap_peek_inline(&ca->heap), ca)) {
 			ca->heap.data[0] = b;
-			min_heap_sift_down(&ca->heap, 0, &bucket_max_cmp_callback, ca);
+			min_heap_sift_down_eqaware_inline(&ca->heap, 0, &bucket_max_cmp_callback,
+							  ca);
 		}
 	}
 
-	min_heapify_all(&ca->heap, &bucket_min_cmp_callback, ca);
+	min_heapify_all_eqaware_inline(&ca->heap, &bucket_min_cmp_callback, ca);
 
 	while (!fifo_full(&ca->free_inc)) {
 		if (!ca->heap.nr) {
@@ -227,8 +228,8 @@ static void invalidate_buckets_lru(struct cache *ca)
 			wake_up_gc(ca->set);
 			return;
 		}
-		b = min_heap_peek(&ca->heap)[0];
-		min_heap_pop(&ca->heap, &bucket_min_cmp_callback, ca);
+		b = min_heap_peek_inline(&ca->heap)[0];
+		min_heap_pop_eqaware_inline(&ca->heap, &bucket_min_cmp_callback, ca);
 
 		bch_invalidate_one_bucket(ca, b);
 	}
On Wed, 11 Jun 2025 05:55:08 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> [...]
>
> Cc: stable@vger.kernel.org
To justify a -stable backport this performance regression would need to have a pretty significant impact upon real-world userspace. Especially as the patchset is large.
Unfortunately the changelog provides no indication of the magnitude of the userspace impact. Please tell us this, in detail.
Also, if we are to address this regression in -stable kernels then reverting 866898efbb25 is an obvious way - it is far far safer. So please also tell us why the proposed patchset is a better way for us to go.
(Also, each patch should have a fixes:866898efbb25 to help direct the backporting efforts)
I'll add the patches to mm.git to get you some testing but from what I'm presently seeing the -stable backporting would be unwise.
On Wed, 11 Jun 2025 18:48:17 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
> Unfortunately the changelog provides no indication of the magnitude of the userspace impact. Please tell us this, in detail.
OK I found details in the 8th patch's Closes: link. That was tricky ;)
It's impressively detailed but I'm still struggling to understand how much pain this regression causes users in real life. So some sort of reader-friendly executive summary would be great, please.
Hi Andrew,
On Wed, Jun 11, 2025 at 06:48:17PM -0700, Andrew Morton wrote:
> On Wed, 11 Jun 2025 05:55:08 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> > [...]
> >
> > Cc: stable@vger.kernel.org
> To justify a -stable backport this performance regression would need to have a pretty significant impact upon real-world userspace. Especially as the patchset is large.
>
> Unfortunately the changelog provides no indication of the magnitude of the userspace impact. Please tell us this, in detail.
I'll work with Robert to provide a more detailed explanation of the real-world impact on userspace.
> Also, if we are to address this regression in -stable kernels then reverting 866898efbb25 is an obvious way - it is far far safer. So please also tell us why the proposed patchset is a better way for us to go.
I agree that reverting 866898efbb25 is a much safer and smaller change for backporting. In fact, I previously raised the discussion of whether we should revert the commit or instead introduce an equality-aware API and use it. The bcache maintainer preferred the latter, and I also believe that it is a more forward-looking approach. Given that bcache has run into this issue, it's likely that other users with similar use cases may encounter it as well. We wouldn't want those users to continue relying on the current default heapify behavior. So, although reverting may be more suitable for stable in isolation, adding an equality-aware API could better serve a broader set of use cases going forward.
> (Also, each patch should have a fixes:866898efbb25 to help direct the backporting efforts)
Ack. Will do.
> I'll add the patches to mm.git to get you some testing but from what I'm presently seeing the -stable backporting would be unwise.
Thanks!
Regards,
Kuan-Wei
Hi Andrew
Bcache is designed to boost the I/O performance of slower storage (HDDs, network-attached storage) by leveraging fast SSDs as a block cache. This functionality is critical in significantly reducing I/O latency. Therefore, any notable increase in bcache's latency severely diminishes its value. For instance, our tests show a P100 (max) latency spike from 600 ms to 2.4 seconds every 5 minutes due to this regression. In real-world environments, this increase will cause frequent timeouts and stalls in end-user applications that rely on bcache's latency improvements, highlighting the urgent need to address this issue.
Best regards
Robert Pang
On Fri, Jun 13, 2025 at 3:16 PM Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> [...]
On Fri, 13 Jun 2025 23:26:33 +0900 Robert Pang <robertpang@google.com> wrote:
> Hi Andrew
>
> [...] For instance, our tests show a P100 (max) latency spike from 600 ms to 2.4 seconds every 5 minutes due to this regression. [...]
Great, thanks. Let's please incorporate this into the v2 changelogging.
> > > Also, if we are to address this regression in -stable kernels then reverting 866898efbb25 is an obvious way - it is far far safer. So please also tell us why the proposed patchset is a better way for us to go.
> >
> > I agree that reverting 866898efbb25 is a much safer and smaller change for backporting. [...]
"much safer and smaller" is very desirable for backporting, please. After all, 866898efbb25 didn't really fix anything and reverting that takes us back to a known-to-work implementation.
I of course have no problem making the changes in this patchset for "going forward"!
So if agreeable, please prepare a patch which reverts 866898efbb25. Robert's words above are a great basis for that patch's description.
Hi Andrew,
On Fri, Jun 13, 2025 at 11:04:15AM -0700, Andrew Morton wrote:
> On Fri, 13 Jun 2025 23:26:33 +0900 Robert Pang <robertpang@google.com> wrote:
> > [...]
> Great, thanks. Let's please incorporate this into the v2 changelogging.
> > > > Also, if we are to address this regression in -stable kernels then reverting 866898efbb25 is an obvious way - it is far far safer. [...]
> > >
> > > I agree that reverting 866898efbb25 is a much safer and smaller change for backporting. [...]
"much safer and smaller" is very desirable for backporting, please. After all, 866898efbb25 didn't really fix anything and reverting that takes us back to a known-to-work implementation.
I of course have no problem making the changes in this patchset for "going forward"!
So if agreeable, please prepare a patch which reverts 866898efbb25. Robert's words above are a great basis for that patch's description.
Sure, I'll prepare a revert patch to address the issue and plan to submit it for backporting to -stable.
However, I'd like to confirm whether the following patch series structure would be appropriate:
- Patch 1: Revert 866898efbb25 and CC it to stable
- Patch 2-8: Introduce the new equality-aware heap API
- Patch 9: Revert Patch 1 and switch bcache to use the new API
In this case, we would only backport Patch 1 to stable.
Alternatively, would you prefer we simply revert 866898efbb25 without introducing and using the new API in the same series?
Regards,
Kuan-Wei
On Sat, 14 Jun 2025 07:19:51 +0800 Kuan-Wei Chiu <visitorckw@gmail.com> wrote:
> Sure, I'll prepare a revert patch to address the issue and plan to submit it for backporting to -stable.
>
> However, I'd like to confirm whether the following patch series structure would be appropriate:
>
> - Patch 1: Revert 866898efbb25 and CC it to stable
> - Patch 2-8: Introduce the new equality-aware heap API
> - Patch 9: Revert Patch 1 and switch bcache to use the new API
>
> In this case, we would only backport Patch 1 to stable.
>
> Alternatively, would you prefer we simply revert 866898efbb25 without introducing and using the new API in the same series?
Yes, just the revert for now, please. Let's not make that dependent on new development.