Replace the freelist (O(n)) used for free block management with a red-black tree, providing more efficient O(log n) search, insert, and delete operations. This improves scalability and performance when managing large numbers of free blocks per order (e.g., hundreds or thousands).
In the VK-CTS memory stress subtest, the buddy manager merges fragmented memory and inserts freed blocks into the freelist. Since freelist insertion is O(n), this becomes a bottleneck as fragmentation increases. Benchmarking shows list_insert_sorted() consumes ~52.69% CPU with the freelist, compared to just 0.03% with the RB tree (rbtree_insert.isra.0), despite performing the same sorted insert.
This also improves performance in heavily fragmented workloads, such as games or graphics tests that stress memory.
As the buddy allocator evolves with new features such as clear-page tracking, the resulting fragmentation and complexity have grown. These RB-tree based design changes are introduced to address that growth and ensure the allocator continues to perform efficiently under fragmented conditions.
The RB tree implementation with separate clear/dirty trees provides:
- O(n log n) aggregate complexity for all operations instead of O(n^2)
- Elimination of soft lockups and system instability
- Improved code maintainability and clarity
- Better scalability for large memory systems
- Predictable performance under fragmentation
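The intuition behind the aggregate figures: inserting k freed blocks one at a time into a sorted list costs 1 + 2 + ... + k = O(k^2) comparisons in the worst case, whereas the same k inserts into a balanced tree cost log(1) + log(2) + ... + log(k) = O(k log k).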
v3(Matthew):
- Remove RB_EMPTY_NODE check in force_merge function.
- Rename rb for loop macros to have less generic names and move to .c file.
- Make the rb node rb and link field as union.

v4(Jani Nikula):
- The kernel-doc comment should be "/**"
- Move all the rbtree macros to rbtree.h and add parens to ensure correct precedence.

v5:
- Remove the inline in a .c file (Jani Nikula).
Cc: stable@vger.kernel.org
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
---
 drivers/gpu/drm/drm_buddy.c | 142 ++++++++++++++++++++++--------------
 include/drm/drm_buddy.h     |   9 ++-
 include/linux/rbtree.h      |  56 ++++++++++++++
 3 files changed, 152 insertions(+), 55 deletions(-)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index a94061f373de..8b340f47f73d 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -31,6 +31,8 @@ static struct drm_buddy_block *drm_block_alloc(struct drm_buddy *mm,
     block->header |= order;
     block->parent = parent;

+    RB_CLEAR_NODE(&block->rb);
+
     BUG_ON(block->header & DRM_BUDDY_HEADER_UNUSED);
     return block;
 }
@@ -41,23 +43,53 @@ static void drm_block_free(struct drm_buddy *mm,
     kmem_cache_free(slab_blocks, block);
 }

-static void list_insert_sorted(struct drm_buddy *mm,
-                               struct drm_buddy_block *block)
+static void rbtree_insert(struct drm_buddy *mm,
+                          struct drm_buddy_block *block)
 {
+    struct rb_root *root = &mm->free_tree[drm_buddy_block_order(block)];
+    struct rb_node **link = &root->rb_node;
+    struct rb_node *parent = NULL;
     struct drm_buddy_block *node;
-    struct list_head *head;
+    u64 offset;
+
+    offset = drm_buddy_block_offset(block);

-    head = &mm->free_list[drm_buddy_block_order(block)];
-    if (list_empty(head)) {
-        list_add(&block->link, head);
-        return;
+    while (*link) {
+        parent = *link;
+        node = rb_entry(parent, struct drm_buddy_block, rb);
+
+        if (offset < drm_buddy_block_offset(node))
+            link = &parent->rb_left;
+        else
+            link = &parent->rb_right;
     }

-    list_for_each_entry(node, head, link)
-        if (drm_buddy_block_offset(block) < drm_buddy_block_offset(node))
-            break;
+    rb_link_node(&block->rb, parent, link);
+    rb_insert_color(&block->rb, root);
+}
+
+static void rbtree_remove(struct drm_buddy *mm,
+                          struct drm_buddy_block *block)
+{
+    struct rb_root *root;
+
+    root = &mm->free_tree[drm_buddy_block_order(block)];
+    rb_erase(&block->rb, root);
+
+    RB_CLEAR_NODE(&block->rb);
+}
+
+static struct drm_buddy_block *
+rbtree_last_entry(struct drm_buddy *mm, unsigned int order)
+{
+    struct rb_node *node = rb_last(&mm->free_tree[order]);

-    __list_add(&block->link, node->link.prev, &node->link);
+    return node ? rb_entry(node, struct drm_buddy_block, rb) : NULL;
+}
+
+static bool rbtree_is_empty(struct drm_buddy *mm, unsigned int order)
+{
+    return RB_EMPTY_ROOT(&mm->free_tree[order]);
 }

 static void clear_reset(struct drm_buddy_block *block)
@@ -70,12 +102,13 @@ static void mark_cleared(struct drm_buddy_block *block)
     block->header |= DRM_BUDDY_HEADER_CLEAR;
 }

-static void mark_allocated(struct drm_buddy_block *block)
+static void mark_allocated(struct drm_buddy *mm,
+                           struct drm_buddy_block *block)
 {
     block->header &= ~DRM_BUDDY_HEADER_STATE;
     block->header |= DRM_BUDDY_ALLOCATED;

-    list_del(&block->link);
+    rbtree_remove(mm, block);
 }

 static void mark_free(struct drm_buddy *mm,
@@ -84,15 +117,16 @@ static void mark_free(struct drm_buddy *mm,
     block->header &= ~DRM_BUDDY_HEADER_STATE;
     block->header |= DRM_BUDDY_FREE;

-    list_insert_sorted(mm, block);
+    rbtree_insert(mm, block);
 }

-static void mark_split(struct drm_buddy_block *block)
+static void mark_split(struct drm_buddy *mm,
+                       struct drm_buddy_block *block)
 {
     block->header &= ~DRM_BUDDY_HEADER_STATE;
     block->header |= DRM_BUDDY_SPLIT;

-    list_del(&block->link);
+    rbtree_remove(mm, block);
 }

 static inline bool overlaps(u64 s1, u64 e1, u64 s2, u64 e2)
@@ -148,7 +182,7 @@ static unsigned int __drm_buddy_free(struct drm_buddy *mm,
             mark_cleared(parent);
         }

-        list_del(&buddy->link);
+        rbtree_remove(mm, buddy);
         if (force_merge && drm_buddy_block_is_clear(buddy))
             mm->clear_avail -= drm_buddy_block_size(mm, buddy);

@@ -179,9 +213,11 @@ static int __force_merge(struct drm_buddy *mm,
         return -EINVAL;

     for (i = min_order - 1; i >= 0; i--) {
-        struct drm_buddy_block *block, *prev;
+        struct drm_buddy_block *block, *prev_block, *first_block;
+
+        first_block = rb_entry(rb_first(&mm->free_tree[i]), struct drm_buddy_block, rb);

-        list_for_each_entry_safe_reverse(block, prev, &mm->free_list[i], link) {
+        rbtree_reverse_for_each_entry_safe(block, prev_block, &mm->free_tree[i], rb) {
             struct drm_buddy_block *buddy;
             u64 block_start, block_end;

@@ -206,10 +242,14 @@ static int __force_merge(struct drm_buddy *mm,
              * block in the next iteration as we would free the
              * buddy block as part of the free function.
              */
-            if (prev == buddy)
-                prev = list_prev_entry(prev, link);
+            if (prev_block && prev_block == buddy) {
+                if (prev_block != first_block)
+                    prev_block = rb_entry(rb_prev(&prev_block->rb),
+                                          struct drm_buddy_block,
+                                          rb);
+            }

-            list_del(&block->link);
+            rbtree_remove(mm, block);
             if (drm_buddy_block_is_clear(block))
                 mm->clear_avail -= drm_buddy_block_size(mm, block);
@@ -258,14 +298,14 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size)
BUG_ON(mm->max_order > DRM_BUDDY_MAX_ORDER);
-    mm->free_list = kmalloc_array(mm->max_order + 1,
-                                  sizeof(struct list_head),
+    mm->free_tree = kmalloc_array(mm->max_order + 1,
+                                  sizeof(struct rb_root),
                                   GFP_KERNEL);
-    if (!mm->free_list)
+    if (!mm->free_tree)
         return -ENOMEM;

     for (i = 0; i <= mm->max_order; ++i)
-        INIT_LIST_HEAD(&mm->free_list[i]);
+        mm->free_tree[i] = RB_ROOT;
mm->n_roots = hweight64(size);
@@ -273,7 +313,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size)
                         sizeof(struct drm_buddy_block *),
                         GFP_KERNEL);
     if (!mm->roots)
-        goto out_free_list;
+        goto out_free_tree;

     offset = 0;
     i = 0;
@@ -312,8 +352,8 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size)
     while (i--)
         drm_block_free(mm, mm->roots[i]);
     kfree(mm->roots);
-out_free_list:
-    kfree(mm->free_list);
+out_free_tree:
+    kfree(mm->free_tree);
     return -ENOMEM;
 }
 EXPORT_SYMBOL(drm_buddy_init);
@@ -323,7 +363,7 @@ EXPORT_SYMBOL(drm_buddy_init);
  *
  * @mm: DRM buddy manager to free
  *
- * Cleanup memory manager resources and the freelist
+ * Cleanup memory manager resources and the freetree
  */
 void drm_buddy_fini(struct drm_buddy *mm)
 {
@@ -350,7 +390,7 @@ void drm_buddy_fini(struct drm_buddy *mm)
     WARN_ON(mm->avail != mm->size);

     kfree(mm->roots);
-    kfree(mm->free_list);
+    kfree(mm->free_tree);
 }
 EXPORT_SYMBOL(drm_buddy_fini);

@@ -383,7 +423,7 @@ static int split_block(struct drm_buddy *mm,
         clear_reset(block);
     }

-    mark_split(block);
+    mark_split(mm, block);

     return 0;
 }
@@ -412,7 +452,7 @@ EXPORT_SYMBOL(drm_get_buddy);
  * @is_clear: blocks clear state
  *
  * Reset the clear state based on @is_clear value for each block
- * in the freelist.
+ * in the freetree.
  */
 void drm_buddy_reset_clear(struct drm_buddy *mm, bool is_clear)
 {
@@ -433,7 +473,7 @@ void drm_buddy_reset_clear(struct drm_buddy *mm, bool is_clear)
     for (i = 0; i <= mm->max_order; ++i) {
         struct drm_buddy_block *block;

-        list_for_each_entry_reverse(block, &mm->free_list[i], link) {
+        rbtree_reverse_for_each_entry(block, &mm->free_tree[i], rb) {
             if (is_clear != drm_buddy_block_is_clear(block)) {
                 if (is_clear) {
                     mark_cleared(block);
@@ -641,7 +681,7 @@ get_maxblock(struct drm_buddy *mm, unsigned int order,
     for (i = order; i <= mm->max_order; ++i) {
         struct drm_buddy_block *tmp_block;

-        list_for_each_entry_reverse(tmp_block, &mm->free_list[i], link) {
+        rbtree_reverse_for_each_entry(tmp_block, &mm->free_tree[i], rb) {
             if (block_incompatible(tmp_block, flags))
                 continue;

@@ -667,7 +707,7 @@ get_maxblock(struct drm_buddy *mm, unsigned int order,
 }

 static struct drm_buddy_block *
-alloc_from_freelist(struct drm_buddy *mm,
+alloc_from_freetree(struct drm_buddy *mm,
                     unsigned int order,
                     unsigned long flags)
 {
@@ -684,7 +724,7 @@ alloc_from_freelist(struct drm_buddy *mm,
     for (tmp = order; tmp <= mm->max_order; ++tmp) {
         struct drm_buddy_block *tmp_block;

-        list_for_each_entry_reverse(tmp_block, &mm->free_list[tmp], link) {
+        rbtree_reverse_for_each_entry(tmp_block, &mm->free_tree[tmp], rb) {
             if (block_incompatible(tmp_block, flags))
                 continue;

@@ -700,10 +740,8 @@ alloc_from_freelist(struct drm_buddy *mm,
     if (!block) {
         /* Fallback method */
         for (tmp = order; tmp <= mm->max_order; ++tmp) {
-            if (!list_empty(&mm->free_list[tmp])) {
-                block = list_last_entry(&mm->free_list[tmp],
-                                        struct drm_buddy_block,
-                                        link);
+            if (!rbtree_is_empty(mm, tmp)) {
+                block = rbtree_last_entry(mm, tmp);
                 if (block)
                     break;
             }
@@ -771,7 +809,7 @@ static int __alloc_range(struct drm_buddy *mm,
         if (contains(start, end, block_start, block_end)) {
             if (drm_buddy_block_is_free(block)) {
-                mark_allocated(block);
+                mark_allocated(mm, block);
                 total_allocated += drm_buddy_block_size(mm, block);
                 mm->avail -= drm_buddy_block_size(mm, block);
                 if (drm_buddy_block_is_clear(block))
@@ -849,7 +887,6 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm,
 {
     u64 rhs_offset, lhs_offset, lhs_size, filled;
     struct drm_buddy_block *block;
-    struct list_head *list;
     LIST_HEAD(blocks_lhs);
     unsigned long pages;
     unsigned int order;
@@ -862,11 +899,10 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm,
     if (order == 0)
         return -ENOSPC;

-    list = &mm->free_list[order];
-    if (list_empty(list))
+    if (rbtree_is_empty(mm, order))
         return -ENOSPC;

-    list_for_each_entry_reverse(block, list, link) {
+    rbtree_reverse_for_each_entry(block, &mm->free_tree[order], rb) {
         /* Allocate blocks traversing RHS */
         rhs_offset = drm_buddy_block_offset(block);
         err = __drm_buddy_alloc_range(mm, rhs_offset, size,
@@ -976,7 +1012,7 @@ int drm_buddy_block_trim(struct drm_buddy *mm,
     list_add(&block->tmp_link, &dfs);
     err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL);
     if (err) {
-        mark_allocated(block);
+        mark_allocated(mm, block);
         mm->avail -= drm_buddy_block_size(mm, block);
         if (drm_buddy_block_is_clear(block))
             mm->clear_avail -= drm_buddy_block_size(mm, block);
@@ -999,8 +1035,8 @@ __drm_buddy_alloc_blocks(struct drm_buddy *mm,
         return __drm_buddy_alloc_range_bias(mm, start, end,
                                             order, flags);
     else
-        /* Allocate from freelist */
-        return alloc_from_freelist(mm, order, flags);
+        /* Allocate from freetree */
+        return alloc_from_freetree(mm, order, flags);
 }

 /**
@@ -1017,8 +1053,8 @@ __drm_buddy_alloc_blocks(struct drm_buddy *mm,
  * alloc_range_bias() called on range limitations, which traverses
  * the tree and returns the desired block.
  *
- * alloc_from_freelist() called when *no* range restrictions
- * are enforced, which picks the block from the freelist.
+ * alloc_from_freetree() called when *no* range restrictions
+ * are enforced, which picks the block from the freetree.
  *
  * Returns:
  * 0 on success, error code on failure.
@@ -1120,7 +1156,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm,
         }
     } while (1);

-    mark_allocated(block);
+    mark_allocated(mm, block);
     mm->avail -= drm_buddy_block_size(mm, block);
     if (drm_buddy_block_is_clear(block))
         mm->clear_avail -= drm_buddy_block_size(mm, block);
@@ -1204,7 +1240,7 @@ void drm_buddy_print(struct drm_buddy *mm, struct drm_printer *p)
         struct drm_buddy_block *block;
         u64 count = 0, free;

-        list_for_each_entry(block, &mm->free_list[order], link) {
+        rbtree_for_each_entry(block, &mm->free_tree[order], rb) {
             BUG_ON(!drm_buddy_block_is_free(block));
             count++;
         }
diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
index 513837632b7d..091823592034 100644
--- a/include/drm/drm_buddy.h
+++ b/include/drm/drm_buddy.h
@@ -10,6 +10,7 @@
 #include <linux/list.h>
 #include <linux/slab.h>
 #include <linux/sched.h>
+#include <linux/rbtree.h>
#include <drm/drm_print.h>
@@ -53,7 +54,11 @@ struct drm_buddy_block {
      * a list, if so desired. As soon as the block is freed with
      * drm_buddy_free* ownership is given back to the mm.
      */
-    struct list_head link;
+    union {
+        struct rb_node rb;
+        struct list_head link;
+    };
+
     struct list_head tmp_link;
 };

@@ -68,7 +73,7 @@ struct drm_buddy_block {
  */
 struct drm_buddy {
     /* Maintain a free list for each order. */
-    struct list_head *free_list;
+    struct rb_root *free_tree;

     /*
      * Maintain explicit binary tree(s) to track the allocation of the
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 8d2ba3749866..17190bb4837c 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -79,6 +79,62 @@ static inline void rb_link_node_rcu(struct rb_node *node, struct rb_node *parent
     ____ptr ? rb_entry(____ptr, type, member) : NULL; \
 })

+/**
+ * rbtree_for_each_entry - iterate in-order over rb_root of given type
+ *
+ * @pos:    the 'type *' to use as a loop cursor.
+ * @root:   'rb_root *' of the rbtree.
+ * @member: the name of the rb_node field within 'type'.
+ */
+#define rbtree_for_each_entry(pos, root, member) \
+    for ((pos) = rb_entry_safe(rb_first(root), typeof(*(pos)), member); \
+         (pos); \
+         (pos) = rb_entry_safe(rb_next(&(pos)->member), typeof(*(pos)), member))
+
+/**
+ * rbtree_reverse_for_each_entry - iterate in reverse in-order over rb_root
+ * of given type
+ *
+ * @pos:    the 'type *' to use as a loop cursor.
+ * @root:   'rb_root *' of the rbtree.
+ * @member: the name of the rb_node field within 'type'.
+ */
+#define rbtree_reverse_for_each_entry(pos, root, member) \
+    for ((pos) = rb_entry_safe(rb_last(root), typeof(*(pos)), member); \
+         (pos); \
+         (pos) = rb_entry_safe(rb_prev(&(pos)->member), typeof(*(pos)), member))
+
+/**
+ * rbtree_for_each_entry_safe - iterate in-order over rb_root safe against removal
+ *
+ * @pos:    the 'type *' to use as a loop cursor
+ * @n:      another 'type *' to use as temporary storage
+ * @root:   'rb_root *' of the rbtree
+ * @member: the name of the rb_node field within 'type'
+ */
+#define rbtree_for_each_entry_safe(pos, n, root, member) \
+    for ((pos) = rb_entry_safe(rb_first(root), typeof(*(pos)), member), \
+         (n) = (pos) ? rb_entry_safe(rb_next(&(pos)->member), typeof(*(pos)), member) : NULL; \
+         (pos); \
+         (pos) = (n), \
+         (n) = (pos) ? rb_entry_safe(rb_next(&(pos)->member), typeof(*(pos)), member) : NULL)
+
+/**
+ * rbtree_reverse_for_each_entry_safe - iterate in reverse in-order over rb_root
+ * safe against removal
+ *
+ * @pos:    the struct type * to use as a loop cursor.
+ * @n:      another struct type * to use as temporary storage.
+ * @root:   pointer to struct rb_root to iterate.
+ * @member: name of the rb_node field within the struct.
+ */
+#define rbtree_reverse_for_each_entry_safe(pos, n, root, member) \
+    for ((pos) = rb_entry_safe(rb_last(root), typeof(*(pos)), member), \
+         (n) = (pos) ? rb_entry_safe(rb_prev(&(pos)->member), typeof(*(pos)), member) : NULL; \
+         (pos); \
+         (pos) = (n), \
+         (n) = (pos) ? rb_entry_safe(rb_prev(&(pos)->member), typeof(*(pos)), member) : NULL)
+
 /**
  * rbtree_postorder_for_each_entry_safe - iterate in post-order over rb_root of
  * given type allowing the backing memory of @pos to be invalidated
base-commit: 7156602d56e5ad689ae11e03680ab6326238b5e3
Hi Matthew,
I have added a new performance test case in this version. Could you please take a look and provide your RB? It is important to get these patches upstream soon.
Thanks, Arun.
On 9/9/2025 3:26 PM, Arunpravin Paneer Selvam wrote:
[SNIP]
Hi Arun,
On 09.09.25 11:56, Arunpravin Paneer Selvam wrote: [SNIP]
+/**
+ * rbtree_for_each_entry_safe - iterate in-order over rb_root safe against removal
+ *
+ * @pos:    the 'type *' to use as a loop cursor
+ * @n:      another 'type *' to use as temporary storage
+ * @root:   'rb_root *' of the rbtree
+ * @member: the name of the rb_node field within 'type'
+ */
+#define rbtree_for_each_entry_safe(pos, n, root, member) \
+    for ((pos) = rb_entry_safe(rb_first(root), typeof(*(pos)), member), \
+         (n) = (pos) ? rb_entry_safe(rb_next(&(pos)->member), typeof(*(pos)), member) : NULL; \
+         (pos); \
+         (pos) = (n), \
+         (n) = (pos) ? rb_entry_safe(rb_next(&(pos)->member), typeof(*(pos)), member) : NULL)
As far as I know exactly that operation does not work on an R/B tree.
See the _safe() variants of the for_each_ macros are usually used to iterate over a container while being able to remove entries.
But because of the potential re-balance storing just the next entry is not sufficient for an R/B tree to do that as far as I know.
Please explain how exactly you want to use this macro.
Regards, Christian.
On Tue, Sep 09, 2025 at 02:04:30PM +0200, Christian König wrote:
Hi Arun,
On 09.09.25 11:56, Arunpravin Paneer Selvam wrote: [SNIP]
As far as I know exactly that operation does not work on an R/B tree.
See the _safe() variants of the for_each_ macros are usually used to iterate over a container while being able to remove entries.
But because of the potential re-balance storing just the next entry is not sufficient for an R/B tree to do that as far as I know.
Please explain how exactly you want to use this macro.
So I don't much like these iterators; I've said so before. Either we should introduce a properly threaded rb-tree (where the NULL child pointers encode a linked list), or simply keep a list_head next to the rb_node and use that.
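As a rough sketch of the second option (illustrative only; the struct and helper names below are made up, not taken from the patch): the tree is only used to find the sorted position, while the list is what gets iterated, so next/prev are O(1) and removal during a walk needs no rbtree handling at all.

#include <linux/list.h>
#include <linux/rbtree.h>

struct free_block {
	struct rb_node rb;	/* keyed by block offset */
	struct list_head link;	/* kept in the same (offset) order as the tree */
	u64 offset;
};

static void free_block_insert(struct rb_root *root, struct list_head *head,
			      struct free_block *new)
{
	struct rb_node **p = &root->rb_node, *parent = NULL;

	while (*p) {
		parent = *p;
		if (new->offset < rb_entry(parent, struct free_block, rb)->offset)
			p = &parent->rb_left;
		else
			p = &parent->rb_right;
	}
	rb_link_node(&new->rb, parent, p);
	rb_insert_color(&new->rb, root);

	/* Mirror the tree position in the list: link in after the in-order
	 * predecessor, or at the list head if this is the smallest offset. */
	if (rb_prev(&new->rb))
		list_add(&new->link,
			 &rb_entry(rb_prev(&new->rb), struct free_block, rb)->link);
	else
		list_add(&new->link, head);
}

Removal is then just rb_erase() plus list_del(), and reverse walks can keep using list_for_each_entry_safe_reverse().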
The rb_{next,prev}() things are O(ln n), in the worst case they do a full traversal up the tree and a full traversal down the other branch.
That said; given 'next' will remain an existing node, only the 'pos' node gets removed, rb_next() will still work correctly, even in the face of rebalance.
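A minimal illustration of that point, using nothing but the plain rbtree API: the successor is looked up before the erase, and because it is a node that stays in the tree, continuing the walk from it remains correct even if the erase recolours or rotates nodes around it.

#include <linux/rbtree.h>

static void erase_all(struct rb_root *root)
{
	struct rb_node *node = rb_first(root), *next;

	while (node) {
		next = rb_next(node);	/* grab the successor first ... */
		rb_erase(node, root);	/* ... then erase; @next stays valid */
		node = next;
	}
}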
On 09.09.25 16:05, Peter Zijlstra wrote:
On Tue, Sep 09, 2025 at 02:04:30PM +0200, Christian König wrote:
Hi Arun,
On 09.09.25 11:56, Arunpravin Paneer Selvam wrote: [SNIP]
So I don't much like these iterators; I've said so before. Either we should introduce a properly threaded rb-tree (where the NULL child pointers encode a linked list), or simply keep a list_head next to the rb_node and use that.
I agree, something is clearly fishy here.
The rb_{next,prev}() things are O(ln n), in the worst case they do a full traversal up the tree and a full traversal down the other branch.
Yeah from the logic that is exactly what is supposed to happen in the __force_merge() function.
The question is rather why does that function exists in the first place? The operation doesn't look logical to me.
For drm_buddy_reset_clear() and drm_buddy_fini() we should use rbtree_postorder_for_each_entry_safe() instead.
And during normal allocation __force_merge() should never be used.
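For reference, a rough sketch of the drm_buddy_reset_clear() suggestion above (assuming the per-order free_tree from this patch; the loop body is abbreviated and the clear_avail accounting is left out):

	for (i = 0; i <= mm->max_order; ++i) {
		struct drm_buddy_block *block, *tmp;

		/* Post-order walk over the whole tree; nothing is erased or
		 * re-inserted here, only the per-block clear state changes. */
		rbtree_postorder_for_each_entry_safe(block, tmp, &mm->free_tree[i], rb) {
			if (is_clear != drm_buddy_block_is_clear(block)) {
				if (is_clear)
					mark_cleared(block);
				else
					clear_reset(block);
			}
		}
	}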
That said; given 'next' will remain an existing node, only the 'pos' node gets removed, rb_next() will still work correctly, even in the face of rebalance.
Good to know!
Regards, Christian.
Hi Christian,
On 9/9/2025 9:55 PM, Christian König wrote:
On 09.09.25 16:05, Peter Zijlstra wrote:
On Tue, Sep 09, 2025 at 02:04:30PM +0200, Christian König wrote:
Hi Arun,
On 09.09.25 11:56, Arunpravin Paneer Selvam wrote: [SNIP]
As far as I know exactly that operation does not work on an R/B tree.
See the _safe() variants of the for_each_ macros are usually used to iterate over a container while being able to remove entries.
But because of the potential re-balance storing just the next entry is not sufficient for an R/B tree to do that as far as I know.
Please explain how exactly you want to use this macro.
Thanks for the pointer. Yes, this will not work on an RB tree. We need a reverse-safe variant for use in the force_merge() function, similar to the list_for_each_entry_safe_reverse() macro in list.h. The reason is that in force_merge(), we remove the block from the free tree before invoking drm_buddy_free(), which merges and frees buddy blocks to form a larger block.
So I don't much like these iterators; I've said so before. Either we should introduce a properly threaded rb-tree (where the NULL child pointers encode a linked list), or simply keep a list_head next to the rb_node and use that.
I agree, something is clearly fishy here.
The rb_{next,prev}() things are O(ln n), in the worst case they do a full traversal up the tree and a full traversal down the other branch.
Yeah from the logic that is exactly what is supposed to happen in the __force_merge() function.
The question is rather why does that function exists in the first place? The operation doesn't look logical to me.
For drm_buddy_reset_clear() and drm_buddy_fini() we should use rbtree_postorder_for_each_entry_safe() instead.
And during normal allocation __force_merge() should never be used.
In normal allocation, the force_merge() function is used when no free blocks of the requested order are available. In such cases, smaller blocks must be merged on demand to satisfy the allocation. Mainly, this does not involve traversing the entire tree to merge all blocks, but only merging as needed. For example, if the requested order is 6, and the minimum order is 5, drm_buddy_alloc_blocks() will first attempt to allocate an order-6 block. If none are available, it will try to allocate two order-5 blocks. If those are also unavailable, it will invoke force_merge() to merge lower order blocks (4, 3, 2, 1, 0) in order to coalesce into a higher-order block of order 5.
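In rough pseudo-C, the fallback described above looks something like this (simplified sketch, not the exact drm_buddy_alloc_blocks() loop):

	do {
		block = __drm_buddy_alloc_blocks(mm, start, end, order, flags);
		if (!IS_ERR(block))
			break;			/* found a free block of @order */

		if (order-- == min_order) {
			/* Nothing free at min_order or above: coalesce
			 * lower-order blocks (4, 3, 2, 1, 0 in the example)
			 * into one min_order block and retry. */
			if (__force_merge(mm, start, end, min_order))
				return -ENOSPC;	/* nothing left to merge */
			order = min_order;
		}
	} while (1);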
In drm_buddy_fini(), force_merge() is called to ensure all blocks are merged before tearing down the allocator. This guarantees that all mm->roots are freed and not held by the driver at shutdown. If any blocks remain allocated, drm_buddy_fini() will issue a warning.
In drm_buddy_reset_clear(), which is invoked at device suspend/resume, it is an ideal place to call force_merge(). This ensures that all possible blocks are merged before resetting the clear state, thereby reducing fragmentation and improving allocation efficiency after resume.
I tried using this rbtree_postorder_for_each_entry_safe() macro in force_merge() and it works, but we also need a reverse variant, since in normal allocation we don't want to disturb the lower addresses.
Thanks, Arun.
On 10.09.25 14:37, Arunpravin Paneer Selvam wrote:
Hi Christian,
On 9/9/2025 9:55 PM, Christian König wrote:
On 09.09.25 16:05, Peter Zijlstra wrote:
On Tue, Sep 09, 2025 at 02:04:30PM +0200, Christian König wrote:
Hi Arun,
On 09.09.25 11:56, Arunpravin Paneer Selvam wrote: [SNIP]
For drm_buddy_reset_clear() and drm_buddy_fini() we should use rbtree_postorder_for_each_entry_safe() instead.
And during normal allocation __force_merge() should never be used.
In normal allocation, the force_merge() function is used when no free blocks of the requested order are available. In such cases, smaller blocks must be merged on demand to satisfy the allocation. Mainly, this does not involve traversing the entire tree to merge all blocks, but only merging as needed. For example, if the requested order is 6, and the minimum order is 5, drm_buddy_alloc_blocks() will first attempt to allocate an order-6 block. If none are available, it will try to allocate two order-5 blocks. If those are also unavailable, it will invoke force_merge() to merge lower order blocks (4, 3, 2, 1, 0) in order to coalesce into a higher-order block of order 5.
Yeah and exactly that is what should never be necessary in the first place.
The idea of a buddy allocator is that blocks are merged when they are freed and not on demand.
The only use case for the force_merge() I can see is when cleared blocks are merged with non-cleared ones, but that is orthogonal to the discussion here.
In drm_buddy_fini(), force_merge() is called to ensure all blocks are merged before tearing down the allocator. This guarantees that all mm->roots are freed and not held by the driver at shutdown. If any blocks remain allocated, drm_buddy_fini() will issue a warning.
In drm_buddy_reset_clear(), which is invoked at device suspend/resume, it is an ideal place to call force_merge(). This ensures that all possible blocks are merged before resetting the clear state, thereby reducing fragmentation and improving allocation efficiency after resume.
That's where rbtree_postorder_for_each_entry_safe() should be used.
I tried using this rbtree_postorder_for_each_entry_safe() macro in force_merge() and it works, but we also need a reverse variant, since in normal allocation we don't want to disturb the lower addresses.
I don't get what you mean here.
Regards, Christian.