This patchset introduces a new DMA heap, the chunk heap, which makes it easy to perform bulk allocation of high-order pages. It was created to help optimize 4K/8K HDR video playback with secure DRM HW that protects contents in memory. The HW needs physically contiguous memory chunks of up to several hundred MB.
The chunk heap is registered via device tree with an alignment and a memory node of the Contiguous Memory Allocator (CMA). The alignment defines the chunk page size; for example, an alignment of 0x1_0000 means the chunk page size is 64KB. The phandle in the memory node points at the CMA area. If the device node doesn't have a CMA area, registration of the chunk heap fails.
This patchset is based on next-20201110.
The patchset includes the following:
- cma_alloc_bulk API
- export dma-heap API to register kernel-module dma heaps
- add chunk heap implementation
- devicetree binding for the chunk heap
Hyesoo Yu (3):
  dma-buf: add export symbol for dma-heap
  dma-buf: heaps: add chunk heap to dmabuf heaps
  dma-heap: Devicetree binding for chunk heap
Minchan Kim (1): mm: introduce cma_alloc_bulk API
 .../bindings/dma-buf/chunk_heap.yaml |  52 ++
 drivers/dma-buf/dma-heap.c           |   2 +
 drivers/dma-buf/heaps/Kconfig        |   9 +
 drivers/dma-buf/heaps/Makefile       |   1 +
 drivers/dma-buf/heaps/chunk_heap.c   | 458 ++++++++++++++++++
 include/linux/cma.h                  |   5 +
 include/linux/page-isolation.h       |   1 +
 mm/cma.c                             | 129 ++++-
 mm/page_alloc.c                      |  19 +-
 mm/page_isolation.c                  |   3 +-
 10 files changed, 666 insertions(+), 13 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
 create mode 100644 drivers/dma-buf/heaps/chunk_heap.c
Some special HW requires bulk allocation of high-order pages. For example, it needs 4800 order-4 pages at a minimum, and sometimes more.
To meet the requirement, one option is to reserve a 300M CMA area and request the whole 300M as one contiguous allocation. However, that doesn't work if even one page in the range is long-term pinned, directly or indirectly. The other option is to repeatedly ask for a larger order (e.g., 2M) than the requested order (64K) until the driver has gathered the necessary amount of memory. This approach makes the allocation very slow because cma_alloc itself is slow, and it can get stuck on a single pageblock if it encounters an unmigratable page.
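For illustration, the second option boils down to a per-chunk loop roughly like the following sketch (the CMA area, chunk count, and order-4/64KB chunk size are placeholders, not actual driver code):

#include <linux/cma.h>
#include <linux/gfp.h>

/* Sketch of the status quo: gather nr_chunks order-4 (64KB) chunks one at a time. */
static int gather_chunks_slow(struct cma *my_cma, struct page **chunks,
			      size_t nr_chunks)
{
	size_t i;

	for (i = 0; i < nr_chunks; i++) {
		/* Each call re-scans the CMA bitmap and migrates pages again. */
		chunks[i] = cma_alloc(my_cma, 1 << 4, 4, false);
		if (!chunks[i])
			return -ENOMEM;	/* and it may stall on an unmovable page */
	}
	return 0;
}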
To solve the issue, this patch introduces cma_alloc_bulk.
int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
		   unsigned int order, size_t nr_requests,
		   struct page **page_array, size_t *nr_allocated);
Most parameters are the same as cma_alloc, but it additionally takes an array in which to store the allocated pages. Unlike cma_alloc, it skips a pageblock without waiting or stopping if that pageblock has an unmovable page, so the API continues scanning other pageblocks to find pages of the requested order.
cma_alloc_bulk is a best-effort approach in that, unlike cma_alloc, it skips pageblocks that contain unmovable pages. The result doesn't need to be perfect on the first attempt, so the API can trade success ratio for performance. Thus, the API takes a gfp_t to support __GFP_NORETRY, which is propagated into alloc_contig_range to skip the expensive operations used to increase the CMA allocation success ratio (e.g., migration retries and per-pageblock PCP/LRU draining), accepting a lower success ratio in exchange. If the caller couldn't allocate enough pages with __GFP_NORETRY, it can call the API again without __GFP_NORETRY to increase the success ratio, this time paying the overhead for it.
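As a usage sketch (a hypothetical caller; order 4 and alignment 4 are example values only), the two-pass policy described above would look like:

#include <linux/cma.h>
#include <linux/gfp.h>

/* Best-effort first pass with __GFP_NORETRY, then a slower retry for the remainder. */
static int gather_chunks(struct cma *cma, struct page **pages, size_t want)
{
	size_t total, got;
	int ret;

	ret = cma_alloc_bulk(cma, 4, __GFP_NORETRY, 4, want, pages, &got);
	total = got;
	if (!ret)
		return 0;

	/* Second pass pays the full migration/drain cost for what is still missing. */
	ret = cma_alloc_bulk(cma, 4, 0, 4, want - total, pages + total, &got);
	total += got;
	if (ret) {
		while (total--)
			cma_release(cma, pages[total], 1 << 4);
		return ret;
	}
	return 0;
}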
Signed-off-by: Minchan Kim minchan@kernel.org
---
 include/linux/cma.h            |   5 ++
 include/linux/page-isolation.h |   1 +
 mm/cma.c                       | 126 +++++++++++++++++++++++++++++++--
 mm/page_alloc.c                |  19 +++--
 mm/page_isolation.c            |   3 +-
 5 files changed, 141 insertions(+), 13 deletions(-)
diff --git a/include/linux/cma.h b/include/linux/cma.h index 217999c8a762..2fc8d2b7cf99 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -46,6 +46,11 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size, struct cma **res_cma); extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, bool no_warn); + +extern int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask, + unsigned int order, size_t nr_requests, + struct page **page_array, size_t *nr_allocated); + extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data); diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 572458016331..0e105dce2a15 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -32,6 +32,7 @@ static inline bool is_migrate_isolate(int migratetype)
 #define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2
+#define SKIP_PCP_DRAIN	0x4
struct page *has_unmovable_pages(struct zone *zone, struct page *page, int migratetype, int flags); diff --git a/mm/cma.c b/mm/cma.c index 3692a34e2353..7c11ec2dc04c 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -32,6 +32,7 @@ #include <linux/highmem.h> #include <linux/io.h> #include <linux/kmemleak.h> +#include <linux/swap.h> #include <trace/events/cma.h>
#include "cma.h" @@ -397,6 +398,14 @@ static void cma_debug_show_areas(struct cma *cma) static inline void cma_debug_show_areas(struct cma *cma) { } #endif
+static void reset_page_kasan_tag(struct page *page, int count) +{ + int i; + + for (i = 0; i < count; i++) + page_kasan_tag_reset(page + i); +} + /** * cma_alloc() - allocate pages from contiguous area * @cma: Contiguous memory region for which the allocation is performed. @@ -414,7 +423,6 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, unsigned long pfn = -1; unsigned long start = 0; unsigned long bitmap_maxno, bitmap_no, bitmap_count; - size_t i; struct page *page = NULL; int ret = -ENOMEM;
@@ -478,10 +486,8 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, * blocks being marked with different tags. Reset the tags to ignore * those page blocks. */ - if (page) { - for (i = 0; i < count; i++) - page_kasan_tag_reset(page + i); - } + if (page) + reset_page_kasan_tag(page, count);
if (ret && !no_warn) { pr_err("%s: alloc failed, req-size: %zu pages, ret: %d\n", @@ -493,6 +499,116 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, return page; }
+/* + * cma_alloc_bulk() - allocate high order bulk pages from contiguous area with + * best effort. It will usually be used for private @cma + * + * @cma: contiguous memory region for which the allocation is performed. + * @align: requested alignment of pages (in PAGE_SIZE order). + * @gfp_mask: memory allocation flags + * @order: requested page order + * @nr_requests: the number of 2^order pages requested to be allocated as input, + * @page_array: page_array pointer to store allocated pages (must have space + * for at least nr_requests) + * @nr_allocated: the number of 2^order pages allocated as output + * + * This function tries to allocate up to @nr_requests @order pages on specific + * contiguous memory area. If @gfp_mask has __GFP_NORETRY, it will avoid costly + * functions to increase allocation success ratio so it will be fast but might + * return less than requested number of pages. User could retry with + * !__GFP_NORETRY if it is needed. + * + * Return: it will return 0 only if all pages requested by @nr_requestsed are + * allocated. Otherwise, it returns negative error code. + * + * Note: Regardless of success/failure, user should check @nr_allocated to see + * how many @order pages are allocated and free those pages when they are not + * needed. + */ +int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask, + unsigned int order, size_t nr_requests, + struct page **page_array, size_t *nr_allocated) +{ + int ret = 0; + size_t i = 0; + unsigned long nr_pages_needed = nr_requests * (1 << order); + unsigned long nr_chunk_pages, nr_pages; + unsigned long mask, offset; + unsigned long pfn = -1; + unsigned long start = 0; + unsigned long bitmap_maxno, bitmap_no, bitmap_count; + struct page *page = NULL; + gfp_t gfp = GFP_KERNEL|__GFP_NOWARN|gfp_mask; + + *nr_allocated = 0; + if (!cma || !cma->count || !cma->bitmap || !page_array) + return -EINVAL; + + if (!nr_pages_needed) + return 0; + + nr_chunk_pages = 1 << max_t(unsigned int, order, pageblock_order); + + mask = cma_bitmap_aligned_mask(cma, align); + offset = cma_bitmap_aligned_offset(cma, align); + bitmap_maxno = cma_bitmap_maxno(cma); + + lru_add_drain_all(); + drain_all_pages(NULL); + + while (nr_pages_needed) { + nr_pages = min(nr_chunk_pages, nr_pages_needed); + + bitmap_count = cma_bitmap_pages_to_bits(cma, nr_pages); + mutex_lock(&cma->lock); + bitmap_no = bitmap_find_next_zero_area_off(cma->bitmap, + bitmap_maxno, start, bitmap_count, mask, + offset); + if (bitmap_no >= bitmap_maxno) { + mutex_unlock(&cma->lock); + break; + } + bitmap_set(cma->bitmap, bitmap_no, bitmap_count); + /* + * It's safe to drop the lock here. If the migration fails + * cma_clear_bitmap will take the lock again and unmark it. 
+ */ + mutex_unlock(&cma->lock); + + pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit); + ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA, gfp); + if (ret) { + cma_clear_bitmap(cma, pfn, nr_pages); + if (ret != -EBUSY) + break; + + /* continue to search next block */ + start = (pfn + nr_pages - cma->base_pfn) >> + cma->order_per_bit; + continue; + } + + page = pfn_to_page(pfn); + while (nr_pages) { + page_array[i++] = page; + reset_page_kasan_tag(page, 1 << order); + page += 1 << order; + nr_pages -= 1 << order; + nr_pages_needed -= 1 << order; + } + + start = bitmap_no + bitmap_count; + } + + *nr_allocated = i; + + if (!ret && nr_pages_needed) + ret = -EBUSY; + + return ret; +} +EXPORT_SYMBOL_GPL(cma_alloc_bulk); + /** * cma_release() - release allocated pages * @cma: Contiguous memory region for which the allocation is performed. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f84b7eea39ec..097cc83097bb 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8404,7 +8404,8 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
/* [start, end) must belong to a single zone. */ static int __alloc_contig_migrate_range(struct compact_control *cc, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, + unsigned int max_tries) { /* This function is based on compact_zone() from compaction.c. */ unsigned int nr_reclaimed; @@ -8432,7 +8433,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc, break; } tries = 0; - } else if (++tries == 5) { + } else if (++tries == max_tries) { ret = ret < 0 ? ret : -EBUSY; break; } @@ -8478,6 +8479,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, unsigned long outer_start, outer_end; unsigned int order; int ret = 0; + bool no_retry = gfp_mask & __GFP_NORETRY;
struct compact_control cc = { .nr_migratepages = 0, @@ -8516,7 +8518,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, */
ret = start_isolate_page_range(pfn_max_align_down(start), - pfn_max_align_up(end), migratetype, 0); + pfn_max_align_up(end), migratetype, + no_retry ? SKIP_PCP_DRAIN : 0); if (ret) return ret;
@@ -8530,7 +8533,7 @@ int alloc_contig_range(unsigned long start, unsigned long end, * allocated. So, if we fall through be sure to clear ret so that * -EBUSY is not accidentally used or returned to caller. */ - ret = __alloc_contig_migrate_range(&cc, start, end); + ret = __alloc_contig_migrate_range(&cc, start, end, no_retry ? 1 : 5); if (ret && ret != -EBUSY) goto done; ret =0; @@ -8552,7 +8555,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, * isolated thus they won't get removed from buddy. */
-	lru_add_drain_all();
+	if (!no_retry)
+		lru_add_drain_all();
order = 0; outer_start = start; @@ -8579,8 +8583,9 @@ int alloc_contig_range(unsigned long start, unsigned long end,
/* Make sure the range is really isolated. */ if (test_pages_isolated(outer_start, end, 0)) { - pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n", - __func__, outer_start, end); + if (!no_retry) + pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n", + __func__, outer_start, end); ret = -EBUSY; goto done; } diff --git a/mm/page_isolation.c b/mm/page_isolation.c index abbf42214485..31b1dcc1a395 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -49,7 +49,8 @@ static int set_migratetype_isolate(struct page *page, int migratetype, int isol_
 	__mod_zone_freepage_state(zone, -nr_pages, mt);
 	spin_unlock_irqrestore(&zone->lock, flags);
-	drain_all_pages(zone);
+	if (!(isol_flags & SKIP_PCP_DRAIN))
+		drain_all_pages(zone);
 	return 0;
 }
On 17.11.20 19:19, Minchan Kim wrote:
Some special HW requires bulk allocation of high-order pages. For example, it needs 4800 order-4 pages at a minimum, and sometimes more.

To meet the requirement, one option is to reserve a 300M CMA area and request the whole 300M as one contiguous allocation. However, that doesn't work if even one page in the range is long-term pinned, directly or indirectly. The other option is to repeatedly ask for a larger order (e.g., 2M) than the requested order (64K) until the driver has gathered the necessary amount of memory. This approach makes the allocation very slow because cma_alloc itself is slow, and it can get stuck on a single pageblock if it encounters an unmigratable page.
To solve the issue, this patch introduces cma_alloc_bulk.
int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
		   unsigned int order, size_t nr_requests,
		   struct page **page_array, size_t *nr_allocated);
Most parameters are the same as cma_alloc, but it additionally takes an array in which to store the allocated pages. Unlike cma_alloc, it skips a pageblock without waiting or stopping if that pageblock has an unmovable page, so the API continues scanning other pageblocks to find pages of the requested order.

cma_alloc_bulk is a best-effort approach in that, unlike cma_alloc, it skips pageblocks that contain unmovable pages. The result doesn't need to be perfect on the first attempt, so the API can trade success ratio for performance. Thus, the API takes a gfp_t to support __GFP_NORETRY, which is propagated into alloc_contig_range to skip the expensive operations used to increase the CMA allocation success ratio (e.g., migration retries and per-pageblock PCP/LRU draining), accepting a lower success ratio in exchange. If the caller couldn't allocate enough pages with __GFP_NORETRY, it can call the API again without __GFP_NORETRY to increase the success ratio, this time paying the overhead for it.
I'm not a friend of connecting __GFP_NORETRY to PCP and LRU draining. Also, gfp flags apply mostly to compaction (e.g., how to allocate free pages for migration), so this seems a little wrong.
Can we instead introduce
enum alloc_contig_mode {
	/*
	 * Normal mode:
	 *
	 * Retry page migration 5 times, ... TBD
	 */
	ALLOC_CONTIG_NORMAL = 0,
	/*
	 * Fast mode: e.g., used for bulk allocations.
	 *
	 * Don't retry page migration if it fails, don't drain PCP
	 * lists, don't drain LRU.
	 */
	ALLOC_CONTIG_FAST,
};
To be extended by ALLOC_CONTIG_HARD in the future to be used e.g., by virtio-mem (disable PCP, retry a couple of times more often ) ...
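Concretely, alloc_contig_range() would grow a mode argument, roughly like this (only a sketch of the suggestion, matching the conversion shown further down the thread):

/* gfp flags keep governing compaction; the mode selects the drain/retry policy. */
int alloc_contig_range(unsigned long start, unsigned long end,
		       unsigned int migratetype, gfp_t gfp_mask,
		       enum alloc_contig_mode mode);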
On Mon, Nov 23, 2020 at 03:15:37PM +0100, David Hildenbrand wrote:
On 17.11.20 19:19, Minchan Kim wrote:
Some special HW requires bulk allocation of high-order pages. For example, it needs 4800 order-4 pages at a minimum, and sometimes more.

To meet the requirement, one option is to reserve a 300M CMA area and request the whole 300M as one contiguous allocation. However, that doesn't work if even one page in the range is long-term pinned, directly or indirectly. The other option is to repeatedly ask for a larger order (e.g., 2M) than the requested order (64K) until the driver has gathered the necessary amount of memory. This approach makes the allocation very slow because cma_alloc itself is slow, and it can get stuck on a single pageblock if it encounters an unmigratable page.
To solve the issue, this patch introduces cma_alloc_bulk.
int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
		   unsigned int order, size_t nr_requests,
		   struct page **page_array, size_t *nr_allocated);
Most parameters are the same as cma_alloc, but it additionally takes an array in which to store the allocated pages. Unlike cma_alloc, it skips a pageblock without waiting or stopping if that pageblock has an unmovable page, so the API continues scanning other pageblocks to find pages of the requested order.

cma_alloc_bulk is a best-effort approach in that, unlike cma_alloc, it skips pageblocks that contain unmovable pages. The result doesn't need to be perfect on the first attempt, so the API can trade success ratio for performance. Thus, the API takes a gfp_t to support __GFP_NORETRY, which is propagated into alloc_contig_range to skip the expensive operations used to increase the CMA allocation success ratio (e.g., migration retries and per-pageblock PCP/LRU draining), accepting a lower success ratio in exchange. If the caller couldn't allocate enough pages with __GFP_NORETRY, it can call the API again without __GFP_NORETRY to increase the success ratio, this time paying the overhead for it.
I'm not a friend of connecting __GFP_NORETRY to PCP and LRU draining.
I was also not a fan of the gfp flags in the cma functions since they could easily cause misunderstanding, but I saw talk about bringing gfp_t back into cma_alloc. Since that seems to have been dropped, let's use another term.
diff --git a/mm/cma.c b/mm/cma.c index 7c11ec2dc04c..806280050307 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -505,7 +505,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, * * @cma: contiguous memory region for which the allocation is performed. * @align: requested alignment of pages (in PAGE_SIZE order). - * @gfp_mask: memory allocation flags + * @fast: will skip costly opeartions if it's true. * @order: requested page order * @nr_requests: the number of 2^order pages requested to be allocated as input, * @page_array: page_array pointer to store allocated pages (must have space @@ -513,10 +513,10 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, * @nr_allocated: the number of 2^order pages allocated as output * * This function tries to allocate up to @nr_requests @order pages on specific - * contiguous memory area. If @gfp_mask has __GFP_NORETRY, it will avoid costly - * functions to increase allocation success ratio so it will be fast but might - * return less than requested number of pages. User could retry with - * !__GFP_NORETRY if it is needed. + * contiguous memory area. If @fast has true, it will avoid costly functions + * to increase allocation success ratio so it will be faster but might return + * less than requested number of pages. User could retry it with true if it is + * needed. * * Return: it will return 0 only if all pages requested by @nr_requestsed are * allocated. Otherwise, it returns negative error code. @@ -525,7 +525,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, * how many @order pages are allocated and free those pages when they are not * needed. */ -int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask, +int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast, unsigned int order, size_t nr_requests, struct page **page_array, size_t *nr_allocated) { @@ -538,8 +538,8 @@ int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask, unsigned long start = 0; unsigned long bitmap_maxno, bitmap_no, bitmap_count; struct page *page = NULL; - gfp_t gfp = GFP_KERNEL|__GFP_NOWARN|gfp_mask; - + enum alloc_contig_mode mode = fast ? ALLOC_CONTIG_FAST : + ALLOC_CONTIG_NORMAL; *nr_allocated = 0; if (!cma || !cma->count || !cma->bitmap || !page_array) return -EINVAL; @@ -576,7 +576,8 @@ int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask, mutex_unlock(&cma->lock);
pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit); - ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA, gfp); + ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA, + GFP_KERNEL|__GFP_NOWARN, mode); if (ret) { cma_clear_bitmap(cma, pfn, nr_pages); if (ret != -EBUSY)
Also, gfp flags apply mostly to compaction (e.g., how to allocate free pages for migration), so this seems a little wrong.
Can we instead introduce
enum alloc_contig_mode {
	/*
	 * Normal mode:
	 *
	 * Retry page migration 5 times, ... TBD
	 */
	ALLOC_CONTIG_NORMAL = 0,
	/*
	 * Fast mode: e.g., used for bulk allocations.
	 *
	 * Don't retry page migration if it fails, don't drain PCP
	 * lists, don't drain LRU.
	 */
	ALLOC_CONTIG_FAST,
};
Yeah, the mode is better. Let's have it as a preparation patch.
From: Hyesoo Yu hyesoo.yu@samsung.com
Heaps can be added as modules, so some functions should be exported so that modules can register dma-heaps. A modular dma-heap can also use a CMA area to allocate and free memory, but the CMA-related functions are not exported yet. Let's export them for the following patches.
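For illustration only, a modular heap would use these exports roughly as follows (a sketch; the my_heap_* names are placeholders and not part of this patch):

#include <linux/dma-heap.h>
#include <linux/err.h>
#include <linux/module.h>

static int my_heap_allocate(struct dma_heap *heap, unsigned long len,
			    unsigned long fd_flags, unsigned long heap_flags)
{
	/* A real heap would allocate here, e.g. via the exported cma_alloc(). */
	return -EINVAL;
}

static const struct dma_heap_ops my_heap_ops = {
	.allocate = my_heap_allocate,
};

static int __init my_heap_init(void)
{
	struct dma_heap_export_info exp_info = {
		.name = "my_heap",	/* placeholder name */
		.ops = &my_heap_ops,
	};

	/* dma_heap_add() needs the EXPORT_SYMBOL_GPL added by this patch. */
	return PTR_ERR_OR_ZERO(dma_heap_add(&exp_info));
}
module_init(my_heap_init);
MODULE_LICENSE("GPL");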
Signed-off-by: Hyesoo Yu hyesoo.yu@samsung.com
Signed-off-by: Minchan Kim minchan@kernel.org
---
 drivers/dma-buf/dma-heap.c | 2 ++
 mm/cma.c                   | 3 +++
 2 files changed, 5 insertions(+)
diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c index afd22c9dbdcf..cc6339cbca09 100644 --- a/drivers/dma-buf/dma-heap.c +++ b/drivers/dma-buf/dma-heap.c @@ -189,6 +189,7 @@ void *dma_heap_get_drvdata(struct dma_heap *heap) { return heap->priv; } +EXPORT_SYMBOL_GPL(dma_heap_get_drvdata);
struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) { @@ -272,6 +273,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) kfree(heap); return err_ret; } +EXPORT_SYMBOL_GPL(dma_heap_add);
static char *dma_heap_devnode(struct device *dev, umode_t *mode) { diff --git a/mm/cma.c b/mm/cma.c index 7c11ec2dc04c..87834e2966fa 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -54,6 +54,7 @@ const char *cma_get_name(const struct cma *cma) { return cma->name; } +EXPORT_SYMBOL_GPL(cma_get_name);
static unsigned long cma_bitmap_aligned_mask(const struct cma *cma, unsigned int align_order) @@ -498,6 +499,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, pr_debug("%s(): returned %p\n", __func__, page); return page; } +EXPORT_SYMBOL_GPL(cma_alloc);
/* * cma_alloc_bulk() - allocate high order bulk pages from contiguous area with @@ -641,6 +643,7 @@ bool cma_release(struct cma *cma, const struct page *pages, unsigned int count)
return true; } +EXPORT_SYMBOL_GPL(cma_release);
int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data) {
On Tue, Nov 17, 2020 at 10:19 AM Minchan Kim minchan@kernel.org wrote:
From: Hyesoo Yu hyesoo.yu@samsung.com
Heaps can be added as modules, so some functions should be exported so that modules can register dma-heaps. A modular dma-heap can also use a CMA area to allocate and free memory, but the CMA-related functions are not exported yet. Let's export them for the following patches.
Signed-off-by: Hyesoo Yu hyesoo.yu@samsung.com
Signed-off-by: Minchan Kim minchan@kernel.org
 drivers/dma-buf/dma-heap.c | 2 ++
 mm/cma.c                   | 3 +++
 2 files changed, 5 insertions(+)
diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c index afd22c9dbdcf..cc6339cbca09 100644 --- a/drivers/dma-buf/dma-heap.c +++ b/drivers/dma-buf/dma-heap.c @@ -189,6 +189,7 @@ void *dma_heap_get_drvdata(struct dma_heap *heap) { return heap->priv; } +EXPORT_SYMBOL_GPL(dma_heap_get_drvdata);
struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) { @@ -272,6 +273,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info) kfree(heap); return err_ret; } +EXPORT_SYMBOL_GPL(dma_heap_add);
static char *dma_heap_devnode(struct device *dev, umode_t *mode) {
Thanks so much for sending this series along! I'm ok with the dma-heap exports to support modules.
diff --git a/mm/cma.c b/mm/cma.c index 7c11ec2dc04c..87834e2966fa 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -54,6 +54,7 @@ const char *cma_get_name(const struct cma *cma) { return cma->name; } +EXPORT_SYMBOL_GPL(cma_get_name);
static unsigned long cma_bitmap_aligned_mask(const struct cma *cma, unsigned int align_order) @@ -498,6 +499,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align, pr_debug("%s(): returned %p\n", __func__, page); return page; } +EXPORT_SYMBOL_GPL(cma_alloc);
/*
- cma_alloc_bulk() - allocate high order bulk pages from contiguous area with
@@ -641,6 +643,7 @@ bool cma_release(struct cma *cma, const struct page *pages, unsigned int count)
return true;
} +EXPORT_SYMBOL_GPL(cma_release);
int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data) { --
Though Christoph's (cc'ed) input would probably be good for the cma ones, as I know he had concerns previously with similar patches.
thanks -john
From: Hyesoo Yu hyesoo.yu@samsung.com
This patch adds the chunk heap, which allocates buffers that are made up of a list of fixed-size chunks taken from a CMA area.
The chunk heap doesn't use the heap-helpers, even though they would remove some duplicated code, since the heap-helpers are in the process of being deprecated. [1]
[1] https://lore.kernel.org/patchwork/patch/1336002
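For reference, userspace allocates from this heap through the standard dma-heap ioctl, roughly like the sketch below (the /dev/dma_heap node name depends on the CMA area name and is a placeholder here):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

/* Sketch: allocate a dma-buf of 'len' bytes from the chunk heap. */
static int alloc_from_chunk_heap(size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	/* "chunk_memory" is a placeholder; the node is named after the CMA area. */
	int heap_fd = open("/dev/dma_heap/chunk_memory", O_RDONLY | O_CLOEXEC);
	int ret;

	if (heap_fd < 0)
		return -1;
	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);
	return ret < 0 ? -1 : (int)data.fd;	/* data.fd is the exported dma-buf fd */
}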
Signed-off-by: Hyesoo Yu hyesoo.yu@samsung.com
Signed-off-by: Minchan Kim minchan@kernel.org
---
 drivers/dma-buf/heaps/Kconfig      |   9 +
 drivers/dma-buf/heaps/Makefile     |   1 +
 drivers/dma-buf/heaps/chunk_heap.c | 458 +++++++++++++++++++++++++++++
 3 files changed, 468 insertions(+)
 create mode 100644 drivers/dma-buf/heaps/chunk_heap.c
diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig index a5eef06c4226..9cc5366b8f5e 100644 --- a/drivers/dma-buf/heaps/Kconfig +++ b/drivers/dma-buf/heaps/Kconfig @@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA Choose this option to enable dma-buf CMA heap. This heap is backed by the Contiguous Memory Allocator (CMA). If your system has these regions, you should say Y here. + +config DMABUF_HEAPS_CHUNK + tristate "DMA-BUF CHUNK Heap" + depends on DMABUF_HEAPS && DMA_CMA + help + Choose this option to enable dma-buf CHUNK heap. This heap is backed + by the Contiguous Memory Allocator (CMA) and allocates the buffers that + arranged into a list of fixed size chunks taken from CMA. Chunk size + is configured when the heap is created. diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile index 6e54cdec3da0..3b2a09869fd8 100644 --- a/drivers/dma-buf/heaps/Makefile +++ b/drivers/dma-buf/heaps/Makefile @@ -2,3 +2,4 @@ obj-y += heap-helpers.o obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o +obj-$(CONFIG_DMABUF_HEAPS_CHUNK) += chunk_heap.o diff --git a/drivers/dma-buf/heaps/chunk_heap.c b/drivers/dma-buf/heaps/chunk_heap.c new file mode 100644 index 000000000000..427594f56e18 --- /dev/null +++ b/drivers/dma-buf/heaps/chunk_heap.c @@ -0,0 +1,458 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * ION Memory Allocator chunk heap exporter + * + * Copyright (c) 2020 Samsung Electronics Co., Ltd. + * Author: hyesoo.yu@samsung.com for Samsung Electronics. + */ + +#include <linux/platform_device.h> +#include <linux/cma.h> +#include <linux/device.h> +#include <linux/dma-buf.h> +#include <linux/dma-heap.h> +#include <linux/dma-map-ops.h> +#include <linux/err.h> +#include <linux/errno.h> +#include <linux/highmem.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/scatterlist.h> +#include <linux/sched/signal.h> +#include <linux/of_reserved_mem.h> +#include <linux/of.h> + +struct chunk_heap { + struct dma_heap *heap; + unsigned int order; + struct cma *cma; +}; + +struct chunk_heap_buffer { + struct chunk_heap *heap; + struct list_head attachments; + struct mutex lock; + struct sg_table sg_table; + unsigned long len; + int vmap_cnt; + void *vaddr; +}; + +struct chunk_heap_attachment { + struct device *dev; + struct sg_table *table; + struct list_head list; + bool mapped; +}; + +static struct sg_table *dup_sg_table(struct sg_table *table) +{ + struct sg_table *new_table; + int ret, i; + struct scatterlist *sg, *new_sg; + + new_table = kzalloc(sizeof(*new_table), GFP_KERNEL); + if (!new_table) + return ERR_PTR(-ENOMEM); + + ret = sg_alloc_table(new_table, table->orig_nents, GFP_KERNEL); + if (ret) { + kfree(new_table); + return ERR_PTR(-ENOMEM); + } + + new_sg = new_table->sgl; + for_each_sgtable_sg(table, sg, i) { + sg_set_page(new_sg, sg_page(sg), sg->length, sg->offset); + new_sg = sg_next(new_sg); + } + + return new_table; +} + +static int chunk_heap_attach(struct dma_buf *dmabuf, struct dma_buf_attachment *attachment) +{ + struct chunk_heap_buffer *buffer = dmabuf->priv; + struct chunk_heap_attachment *a; + struct sg_table *table; + + a = kzalloc(sizeof(*a), GFP_KERNEL); + if (!a) + return -ENOMEM; + + table = dup_sg_table(&buffer->sg_table); + if (IS_ERR(table)) { + kfree(a); + return -ENOMEM; + } + + a->table = table; + a->dev = attachment->dev; + INIT_LIST_HEAD(&a->list); + a->mapped = false; + + attachment->priv = a; + + mutex_lock(&buffer->lock); + list_add(&a->list, &buffer->attachments); + 
mutex_unlock(&buffer->lock); + + return 0; +} + +static void chunk_heap_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attachment) +{ + struct chunk_heap_buffer *buffer = dmabuf->priv; + struct chunk_heap_attachment *a = attachment->priv; + + mutex_lock(&buffer->lock); + list_del(&a->list); + mutex_unlock(&buffer->lock); + + sg_free_table(a->table); + kfree(a->table); + kfree(a); +} + +static struct sg_table *chunk_heap_map_dma_buf(struct dma_buf_attachment *attachment, + enum dma_data_direction direction) +{ + struct chunk_heap_attachment *a = attachment->priv; + struct sg_table *table = a->table; + int ret; + + ret = dma_map_sgtable(attachment->dev, table, direction, 0); + if (ret) + return ERR_PTR(ret); + + a->mapped = true; + return table; +} + +static void chunk_heap_unmap_dma_buf(struct dma_buf_attachment *attachment, + struct sg_table *table, + enum dma_data_direction direction) +{ + struct chunk_heap_attachment *a = attachment->priv; + + a->mapped = false; + dma_unmap_sgtable(attachment->dev, table, direction, 0); +} + +static int chunk_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, + enum dma_data_direction direction) +{ + struct chunk_heap_buffer *buffer = dmabuf->priv; + struct chunk_heap_attachment *a; + + mutex_lock(&buffer->lock); + + if (buffer->vmap_cnt) + invalidate_kernel_vmap_range(buffer->vaddr, buffer->len); + + list_for_each_entry(a, &buffer->attachments, list) { + if (!a->mapped) + continue; + dma_sync_sgtable_for_cpu(a->dev, a->table, direction); + } + mutex_unlock(&buffer->lock); + + return 0; +} + +static int chunk_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf, + enum dma_data_direction direction) +{ + struct chunk_heap_buffer *buffer = dmabuf->priv; + struct chunk_heap_attachment *a; + + mutex_lock(&buffer->lock); + + if (buffer->vmap_cnt) + flush_kernel_vmap_range(buffer->vaddr, buffer->len); + + list_for_each_entry(a, &buffer->attachments, list) { + if (!a->mapped) + continue; + dma_sync_sgtable_for_device(a->dev, a->table, direction); + } + mutex_unlock(&buffer->lock); + + return 0; +} + +static int chunk_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma) +{ + struct chunk_heap_buffer *buffer = dmabuf->priv; + struct sg_table *table = &buffer->sg_table; + unsigned long addr = vma->vm_start; + struct sg_page_iter piter; + int ret; + + for_each_sgtable_page(table, &piter, vma->vm_pgoff) { + struct page *page = sg_page_iter_page(&piter); + + ret = remap_pfn_range(vma, addr, page_to_pfn(page), PAGE_SIZE, + vma->vm_page_prot); + if (ret) + return ret; + addr = PAGE_SIZE; + if (addr >= vma->vm_end) + return 0; + } + return 0; +} + +static void *chunk_heap_do_vmap(struct chunk_heap_buffer *buffer) +{ + struct sg_table *table = &buffer->sg_table; + int npages = PAGE_ALIGN(buffer->len) / PAGE_SIZE; + struct page **pages = vmalloc(sizeof(struct page *) * npages); + struct page **tmp = pages; + struct sg_page_iter piter; + void *vaddr; + + if (!pages) + return ERR_PTR(-ENOMEM); + + for_each_sgtable_page(table, &piter, 0) { + WARN_ON(tmp - pages >= npages); + *tmp++ = sg_page_iter_page(&piter); + } + + vaddr = vmap(pages, npages, VM_MAP, PAGE_KERNEL); + vfree(pages); + + if (!vaddr) + return ERR_PTR(-ENOMEM); + + return vaddr; +} + +static int chunk_heap_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map) +{ + struct chunk_heap_buffer *buffer = dmabuf->priv; + int ret = 0; + void *vaddr; + + mutex_lock(&buffer->lock); + if (buffer->vmap_cnt) { + vaddr = buffer->vaddr; + goto done; + } + + vaddr = chunk_heap_do_vmap(buffer); + if 
(IS_ERR(vaddr)) { + ret = PTR_ERR(vaddr); + goto err; + } + + buffer->vaddr = vaddr; +done: + buffer->vmap_cnt++; + dma_buf_map_set_vaddr(map, vaddr); +err: + mutex_unlock(&buffer->lock); + + return ret; +} + +static void chunk_heap_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map) +{ + struct chunk_heap_buffer *buffer = dmabuf->priv; + + mutex_lock(&buffer->lock); + if (!--buffer->vmap_cnt) { + vunmap(buffer->vaddr); + buffer->vaddr = NULL; + } + mutex_unlock(&buffer->lock); +} + +static void chunk_heap_dma_buf_release(struct dma_buf *dmabuf) +{ + struct chunk_heap_buffer *buffer = dmabuf->priv; + struct chunk_heap *chunk_heap = buffer->heap; + struct sg_table *table; + struct scatterlist *sg; + int i; + + table = &buffer->sg_table; + for_each_sgtable_sg(table, sg, i) + cma_release(chunk_heap->cma, sg_page(sg), 1 << chunk_heap->order); + sg_free_table(table); + kfree(buffer); +} + +static const struct dma_buf_ops chunk_heap_buf_ops = { + .attach = chunk_heap_attach, + .detach = chunk_heap_detach, + .map_dma_buf = chunk_heap_map_dma_buf, + .unmap_dma_buf = chunk_heap_unmap_dma_buf, + .begin_cpu_access = chunk_heap_dma_buf_begin_cpu_access, + .end_cpu_access = chunk_heap_dma_buf_end_cpu_access, + .mmap = chunk_heap_mmap, + .vmap = chunk_heap_vmap, + .vunmap = chunk_heap_vunmap, + .release = chunk_heap_dma_buf_release, +}; + +static int chunk_heap_allocate(struct dma_heap *heap, unsigned long len, + unsigned long fd_flags, unsigned long heap_flags) +{ + struct chunk_heap *chunk_heap = dma_heap_get_drvdata(heap); + struct chunk_heap_buffer *buffer; + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + struct dma_buf *dmabuf; + struct sg_table *table; + struct scatterlist *sg; + struct page **pages; + unsigned int chunk_size = PAGE_SIZE << chunk_heap->order; + unsigned int count, alloced = 0; + unsigned int num_retry = 5; + int ret = -ENOMEM; + pgoff_t pg; + + buffer = kzalloc(sizeof(*buffer), GFP_KERNEL); + if (!buffer) + return ret; + + INIT_LIST_HEAD(&buffer->attachments); + mutex_init(&buffer->lock); + buffer->heap = chunk_heap; + buffer->len = ALIGN(len, chunk_size); + count = buffer->len / chunk_size; + + pages = kvmalloc_array(count, sizeof(*pages), GFP_KERNEL); + if (!pages) + goto err_pages; + + while (num_retry--) { + unsigned long nr_pages; + + ret = cma_alloc_bulk(chunk_heap->cma, chunk_heap->order, + num_retry ? 
__GFP_NORETRY : 0, + chunk_heap->order, count - alloced, + pages + alloced, &nr_pages); + alloced += nr_pages; + if (alloced == count) + break; + if (ret != -EBUSY) + break; + + } + if (ret < 0) + goto err_alloc; + + table = &buffer->sg_table; + if (sg_alloc_table(table, count, GFP_KERNEL)) + goto err_alloc; + + sg = table->sgl; + for (pg = 0; pg < count; pg++) { + sg_set_page(sg, pages[pg], chunk_size, 0); + sg = sg_next(sg); + } + + exp_info.ops = &chunk_heap_buf_ops; + exp_info.size = buffer->len; + exp_info.flags = fd_flags; + exp_info.priv = buffer; + dmabuf = dma_buf_export(&exp_info); + if (IS_ERR(dmabuf)) { + ret = PTR_ERR(dmabuf); + goto err_export; + } + kvfree(pages); + + ret = dma_buf_fd(dmabuf, fd_flags); + if (ret < 0) { + dma_buf_put(dmabuf); + return ret; + } + + return 0; +err_export: + sg_free_table(table); +err_alloc: + for (pg = 0; pg < alloced; pg++) + cma_release(chunk_heap->cma, pages[pg], 1 << chunk_heap->order); + kvfree(pages); +err_pages: + kfree(buffer); + + return ret; +} + +static void rmem_remove_callback(void *p) +{ + of_reserved_mem_device_release((struct device *)p); +} + +static const struct dma_heap_ops chunk_heap_ops = { + .allocate = chunk_heap_allocate, +}; + +static int chunk_heap_probe(struct platform_device *pdev) +{ + struct chunk_heap *chunk_heap; + struct dma_heap_export_info exp_info; + unsigned int alignment; + int ret; + + ret = of_reserved_mem_device_init(&pdev->dev); + if (ret || !pdev->dev.cma_area) { + dev_err(&pdev->dev, "The CMA reserved area is not assigned (ret %d)", ret); + return -EINVAL; + } + + ret = devm_add_action(&pdev->dev, rmem_remove_callback, &pdev->dev); + if (ret) { + of_reserved_mem_device_release(&pdev->dev); + return ret; + } + + chunk_heap = devm_kzalloc(&pdev->dev, sizeof(*chunk_heap), GFP_KERNEL); + if (!chunk_heap) + return -ENOMEM; + + if (of_property_read_u32(pdev->dev.of_node, "alignment", &alignment)) + chunk_heap->order = 0; + else + chunk_heap->order = get_order(alignment); + + chunk_heap->cma = pdev->dev.cma_area; + + exp_info.name = cma_get_name(pdev->dev.cma_area); + exp_info.ops = &chunk_heap_ops; + exp_info.priv = chunk_heap; + + chunk_heap->heap = dma_heap_add(&exp_info); + if (IS_ERR(chunk_heap->heap)) + return PTR_ERR(chunk_heap->heap); + + return 0; +} + +static const struct of_device_id chunk_heap_of_match[] = { + { .compatible = "dma_heap,chunk", }, + { }, +}; + +MODULE_DEVICE_TABLE(of, chunk_heap_of_match); + +static struct platform_driver chunk_heap_driver = { + .driver = { + .name = "chunk_heap", + .of_match_table = chunk_heap_of_match, + }, + .probe = chunk_heap_probe, +}; + +static int __init chunk_heap_init(void) +{ + return platform_driver_register(&chunk_heap_driver); +} +module_init(chunk_heap_init); +MODULE_DESCRIPTION("DMA-BUF Chunk Heap"); +MODULE_LICENSE("GPL v2");
From: Hyesoo Yu hyesoo.yu@samsung.com
Document devicetree binding for chunk heap on dma heap framework
Signed-off-by: Hyesoo Yu hyesoo.yu@samsung.com
Signed-off-by: Minchan Kim minchan@kernel.org
---
 .../bindings/dma-buf/chunk_heap.yaml | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
diff --git a/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml b/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
new file mode 100644
index 000000000000..f382bee02778
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/dma-buf/chunk_heap.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Device tree binding for chunk heap on DMA HEAP FRAMEWORK
+
+maintainers:
+  - Sumit Semwal sumit.semwal@linaro.org
+
+description: |
+  The chunk heap is backed by the Contiguous Memory Allocator (CMA) and
+  allocates the buffers that are made up of a list of fixed size chunks
+  taken from CMA. Chunk sizes are configured when the heaps are created.
+
+properties:
+  compatible:
+    enum:
+      - dma_heap,chunk
+
+  memory-region:
+    maxItems: 1
+
+  alignment:
+    maxItems: 1
+
+required:
+  - compatible
+  - memory-region
+  - alignment
+
+additionalProperties: false
+
+examples:
+  - |
+    reserved-memory {
+        #address-cells = <2>;
+        #size-cells = <1>;
+
+        chunk_memory: chunk_memory {
+            compatible = "shared-dma-pool";
+            reusable;
+            size = <0x10000000>;
+        };
+    };
+
+    chunk_default_heap: chunk_default_heap {
+        compatible = "dma_heap,chunk";
+        memory-region = <&chunk_memory>;
+        alignment = <0x10000>;
+    };
On Tue, Nov 17, 2020 at 10:19 AM Minchan Kim minchan@kernel.org wrote:
From: Hyesoo Yu hyesoo.yu@samsung.com
Document devicetree binding for chunk heap on dma heap framework
Signed-off-by: Hyesoo Yu hyesoo.yu@samsung.com
Signed-off-by: Minchan Kim minchan@kernel.org
 .../bindings/dma-buf/chunk_heap.yaml | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
diff --git a/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml b/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
new file mode 100644
index 000000000000..f382bee02778
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/dma-buf/chunk_heap.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+title: Device tree binding for chunk heap on DMA HEAP FRAMEWORK
+maintainers:
- Sumit Semwal sumit.semwal@linaro.org
+description: |
- The chunk heap is backed by the Contiguous Memory Allocator (CMA) and
- allocates the buffers that are made up to a list of fixed size chunks
- taken from CMA. Chunk sizes are configurated when the heaps are created.
+properties:
- compatible:
- enum:
- dma_heap,chunk
- memory-region:
- maxItems: 1
- alignment:
- maxItems: 1
+required:
- compatible
- memory-region
- alignment
+additionalProperties: false
+examples:
- |
- reserved-memory {
#address-cells = <2>;
#size-cells = <1>;
chunk_memory: chunk_memory {
compatible = "shared-dma-pool";
reusable;
size = <0x10000000>;
};
- };
- chunk_default_heap: chunk_default_heap {
compatible = "dma_heap,chunk";
memory-region = <&chunk_memory>;
alignment = <0x10000>;
- };
So I suspect Rob will push back on this as he has for other dt bindings related to ion/dmabuf heaps (I tried to push a similar solution to exporting multiple CMA areas via dmabuf heaps).
The proposal he seemed to like best was having an in-kernel function that a driver would call to initialize the heap (associated with the CMA region the driver is interested in). Similar to Kunihiko Hayashi's patch here: - https://lore.kernel.org/lkml/1594948208-4739-1-git-send-email-hayashi.kunihi...
The one sticking point for that patch (which I think is a good one), is that we don't have any in-tree users, so it couldn't be merged yet.
A similar approach might be good here, but again we probably need to have at least one in-tree user which could call such a registration function.
thanks -john
On Tue, Nov 17, 2020 at 07:00:54PM -0800, John Stultz wrote:
On Tue, Nov 17, 2020 at 10:19 AM Minchan Kim minchan@kernel.org wrote:
From: Hyesoo Yu hyesoo.yu@samsung.com
Document devicetree binding for chunk heap on dma heap framework
Signed-off-by: Hyesoo Yu hyesoo.yu@samsung.com
Signed-off-by: Minchan Kim minchan@kernel.org
 .../bindings/dma-buf/chunk_heap.yaml | 52 +++++++++++++++++++
 1 file changed, 52 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
diff --git a/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml b/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
new file mode 100644
index 000000000000..f382bee02778
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/dma-buf/chunk_heap.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+title: Device tree binding for chunk heap on DMA HEAP FRAMEWORK
+maintainers:
- Sumit Semwal sumit.semwal@linaro.org
+description: |
- The chunk heap is backed by the Contiguous Memory Allocator (CMA) and
- allocates the buffers that are made up to a list of fixed size chunks
- taken from CMA. Chunk sizes are configurated when the heaps are created.
+properties:
- compatible:
- enum:
- dma_heap,chunk
- memory-region:
- maxItems: 1
- alignment:
- maxItems: 1
+required:
- compatible
- memory-region
- alignment
+additionalProperties: false
+examples:
- |
- reserved-memory {
#address-cells = <2>;
#size-cells = <1>;
chunk_memory: chunk_memory {
compatible = "shared-dma-pool";
reusable;
size = <0x10000000>;
};
- };
- chunk_default_heap: chunk_default_heap {
compatible = "dma_heap,chunk";
memory-region = <&chunk_memory>;
alignment = <0x10000>;
- };
So I suspect Rob will push back on this as he has for other dt bindings related to ion/dmabuf heaps (I tried to push a similar solution to exporting multiple CMA areas via dmabuf heaps).
The proposal he seemed to like best was having an in-kernel function that a driver would call to initialize the heap (associated with the CMA region the driver is interested in). Similar to Kunihiko Hayashi's patch here:
The one sticking point for that patch (which I think is a good one), is that we don't have any in-tree users, so it couldn't be merged yet.
A similar approach might be good here, but again we probably need to have at least one in-tree user which could call such a registration function.
thanks -john
Thanks for your review.
The chunk heap is not intended for device-specific reserved memory or a specific driver. It is similar to the system heap, except that it only collects high-order pages, using a dedicated CMA area for performance.

It seems strange to require an in-tree user to register the chunk heap. (Wouldn't it be just as strange for some user to register the system heap?)

Is there a reason to use the dma-heap framework to add a CMA area for a specific device?

Even if some in-tree user registers a dma-heap with a CMA area, the buffers could be allocated in userland and shared among other devices. For exclusive access, I guess the device doesn't need to register a dma-heap for the CMA area.
Please let me know if I misunderstood what you said.
Thanks, Regards.
On Wed, Nov 18, 2020 at 5:22 PM Hyesoo Yu hyesoo.yu@samsung.com wrote:
On Tue, Nov 17, 2020 at 07:00:54PM -0800, John Stultz wrote:
So I suspect Rob will push back on this as he has for other dt bindings related to ion/dmabuf heaps (I tried to push a similar solution to exporting multiple CMA areas via dmabuf heaps).
The proposal he seemed to like best was having an in-kernel function that a driver would call to initialize the heap (associated with the CMA region the driver is interested in). Similar to Kunihiko Hayashi's patch here:
The one sticking point for that patch (which I think is a good one), is that we don't have any in-tree users, so it couldn't be merged yet.
A similar approach might be good here, but again we probably need to have at least one in-tree user which could call such a registration function.
Thanks for your review.
The chunk heap is not considered for device-specific reserved memory and specific driver. It is similar to system heap, but it only collects high-order pages by using specific cma-area for performance.
So, yes I agree, the chunk heap isn't device specific. It's just that the CMA regions usually are tied to devices.
The main objection to this style of solution has been due to the fact that the DTS is supposed to describe the physical hardware (in an OS agnostic way), rather than define configuration info for Linux software drivers.
Obviously this can be quibbled about, as the normal way of tying devices to CMA has some assumptions of what the driver will use that region for, rather than somehow representing a physical tie between a memory reservation and a device. Nonetheless, Rob has been hesitant to take any sort of ION/DmaBuf Heap DT devices, and has been more interested in some device having the memory reservation reference and the driver for that doing the registration.
It is strange that there is in-tree user who registers chunk heap. (Wouldn't it be strange for some users to register the system heap?)
Well, as there's no reservation/configuration needed, the system heap can register itself.
The CMA heap currently only registers the default CMA heap, as we didn't want to expose all CMA regions and there's otherwise no way to pick and choose.
Is there a reason to use dma-heap framework to add cma-area for specific device ?
Even if some in-tree users register dma-heap with cma-area, the buffers could be allocated in user-land and these could be shared among other devices. For exclusive access, I guess, the device don't need to register dma-heap for cma area.
It's not really about exclusive access. More just that if you want to bind a memory reservation/region (cma or otherwise), at least for DTS, it needs to bind with some device in DT.
Then the device driver can register that region with a heap driver. This avoids adding new Linux-specific software bindings to DT. It becomes a driver implementation detail instead. The primary user of the heap type would probably be a practical pick (ie the display or isp driver).
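In other words, something along these lines (a purely hypothetical sketch; chunk_heap_add() doesn't exist and just stands in for whatever registration entry point the heap would export):

#include <linux/of_reserved_mem.h>
#include <linux/platform_device.h>

/* Hypothetical: a display/ISP driver owns the reserved-memory node and
 * registers its CMA area as a chunk heap during probe.
 */
static int my_display_probe(struct platform_device *pdev)
{
	int ret;

	/* Bind the memory-region phandle from this device's DT node. */
	ret = of_reserved_mem_device_init(&pdev->dev);
	if (ret)
		return ret;

	ret = chunk_heap_add(pdev->dev.cma_area);	/* hypothetical helper */
	if (ret)
		return ret;

	/* ... normal display/ISP driver setup continues here ... */
	return 0;
}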
The other potential solution Rob has suggested is that we create some tag for the memory reservation (ie: like we do with cma: "reusable"), which can be used to register the region to a heap. But this has the problem that each tag has to be well defined and map to a known heap.
thanks -john
On Wed, Nov 18, 2020 at 07:19:07PM -0800, John Stultz wrote:
On Wed, Nov 18, 2020 at 5:22 PM Hyesoo Yu hyesoo.yu@samsung.com wrote:
On Tue, Nov 17, 2020 at 07:00:54PM -0800, John Stultz wrote:
So I suspect Rob will push back on this as he has for other dt bindings related to ion/dmabuf heaps (I tried to push a similar solution to exporting multiple CMA areas via dmabuf heaps).
The proposal he seemed to like best was having an in-kernel function that a driver would call to initialize the heap (associated with the CMA region the driver is interested in). Similar to Kunihiko Hayashi's patch here:
The one sticking point for that patch (which I think is a good one), is that we don't have any in-tree users, so it couldn't be merged yet.
A similar approach might be good here, but again we probably need to have at least one in-tree user which could call such a registration function.
Thanks for your review.
The chunk heap is not considered for device-specific reserved memory and specific driver. It is similar to system heap, but it only collects high-order pages by using specific cma-area for performance.
So, yes I agree, the chunk heap isn't device specific. It's just that the CMA regions usually are tied to devices.
The main objection to this style of solution has been due to the fact that the DTS is supposed to describe the physical hardware (in an OS agnostic way), rather than define configuration info for Linux software drivers.
Obviously this can be quibbled about, as the normal way of tying devices to CMA has some assumptions of what the driver will use that region for, rather than somehow representing a physical tie between a memory reservation and a device. Nonetheless, Rob has been hesitant to take any sort of ION/DmaBuf Heap DT devices, and has been more interested in some device having the memory reservation reference and the driver for that doing the registration.
It is strange that there is in-tree user who registers chunk heap. (Wouldn't it be strange for some users to register the system heap?)
Well, as there's no reservation/configuration needed, the system heap can register itself.
The CMA heap currently only registers the default CMA heap, as we didn't want to expose all CMA regions and there's otherwise no way to pick and choose.
Is there a reason to use dma-heap framework to add cma-area for specific device ?
Even if some in-tree users register dma-heap with cma-area, the buffers could be allocated in user-land and these could be shared among other devices. For exclusive access, I guess, the device don't need to register dma-heap for cma area.
It's not really about exclusive access. More just that if you want to bind a memory reservation/region (cma or otherwise), at least for DTS, it needs to bind with some device in DT.
Then the device driver can register that region with a heap driver. This avoids adding new Linux-specific software bindings to DT. It becomes a driver implementation detail instead. The primary user of the heap type would probably be a practical pick (ie the display or isp driver).
The other potential solution Rob has suggested is that we create some tag for the memory reservation (ie: like we do with cma: "reusable"), which can be used to register the region to a heap. But this has the problem that each tag has to be well defined and map to a known heap.
thanks -john
Thanks for the detailed reply.
I understand exactly what you mean. I agree with your opinion that we should avoid software bindings in DT.

Registering the heap from a specific device driver creates a dependency between the heap and whichever device driver we happen to pick. If that device driver is changed or removed whenever the H/W changes, the chunk heap is affected regardless of our intentions.

As you said, the other solution of adding tags requires the tags to be well defined. I guess that will be a long-term solution.

First of all, we just want to register the chunk heap to allocate high-order pages. I'm going to change to a simple solution that uses the default CMA area, like the CMA heap does, without using DT.
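Roughly like the CMA heap's self-registration, e.g. (a sketch only, assuming dev_get_cma_area(NULL) returns the default CMA area and reusing the chunk_heap ops from this series):

#include <linux/cma.h>
#include <linux/dma-heap.h>
#include <linux/dma-map-ops.h>	/* dev_get_cma_area() */
#include <linux/err.h>
#include <linux/module.h>

static struct chunk_heap chunk_heap_default;

/* Register the chunk heap on the default CMA area, no DT involved. */
static int __init add_default_chunk_heap(void)
{
	struct cma *default_cma = dev_get_cma_area(NULL);
	struct dma_heap_export_info exp_info = { };

	if (!default_cma)
		return 0;

	chunk_heap_default.cma = default_cma;
	chunk_heap_default.order = 4;		/* e.g. 64KB chunks; placeholder */

	exp_info.name = cma_get_name(default_cma);
	exp_info.ops = &chunk_heap_ops;		/* ops from this series */
	exp_info.priv = &chunk_heap_default;

	chunk_heap_default.heap = dma_heap_add(&exp_info);
	return PTR_ERR_OR_ZERO(chunk_heap_default.heap);
}
module_init(add_default_chunk_heap);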
Thanks. Regards.
On Wed, Nov 18, 2020 at 07:19:07PM -0800, John Stultz wrote:
On Wed, Nov 18, 2020 at 5:22 PM Hyesoo Yu hyesoo.yu@samsung.com wrote:
On Tue, Nov 17, 2020 at 07:00:54PM -0800, John Stultz wrote:
So I suspect Rob will push back on this as he has for other dt bindings related to ion/dmabuf heaps (I tried to push a similar solution to exporting multiple CMA areas via dmabuf heaps).
The proposal he seemed to like best was having an in-kernel function that a driver would call to initialize the heap (associated with the CMA region the driver is interested in). Similar to Kunihiko Hayashi's patch here:
The one sticking point for that patch (which I think is a good one), is that we don't have any in-tree users, so it couldn't be merged yet.
A similar approach might be good here, but again we probably need to have at least one in-tree user which could call such a registration function.
Thanks for your review.
The chunk heap is not considered for device-specific reserved memory and specific driver. It is similar to system heap, but it only collects high-order pages by using specific cma-area for performance.
So, yes I agree, the chunk heap isn't device specific. It's just that the CMA regions usually are tied to devices.
The main objection to this style of solution has been due to the fact that the DTS is supposed to describe the physical hardware (in an OS agnostic way), rather than define configuration info for Linux software drivers.
Obviously this can be quibbled about, as the normal way of tying devices to CMA has some assumptions of what the driver will use that region for, rather than somehow representing a physical tie between a memory reservation and a device. Nonetheless, Rob has been hesitant to take any sort of ION/DmaBuf Heap DT devices, and has been more interested in some device having the memory reservation reference and the driver for that doing the registration.
It is strange that there is in-tree user who registers chunk heap. (Wouldn't it be strange for some users to register the system heap?)
Well, as there's no reservation/configuration needed, the system heap can register itself.
The CMA heap currently only registers the default CMA heap, as we didn't want to expose all CMA regions and there's otherwise no way to pick and choose.
Yub.
dma-buf really needs a way to make an exclusive CMA area. Otherwise, the default CMA area would be shared among drivers and easily fragmented, since we can't control the other drivers. In that respect, I don't think the current cma-heap works if userspace needs a big memory chunk.

Here, the problem is that there is no in-kernel user to bind the specific CMA area to, because the owner will be some arbitrary userspace client using the dma-buf interface.
Is there a reason to use dma-heap framework to add cma-area for specific device ?
Even if some in-tree users register dma-heap with cma-area, the buffers could be allocated in user-land and these could be shared among other devices. For exclusive access, I guess, the device don't need to register dma-heap for cma area.
It's not really about exclusive access. More just that if you want to bind a memory reservation/region (cma or otherwise), at least for DTS, it needs to bind with some device in DT.
Then the device driver can register that region with a heap driver. This avoids adding new Linux-specific software bindings to DT. It becomes a driver implementation detail instead. The primary user of the heap type would probably be a practical pick (ie the display or isp driver).
If that's the only solution, we could create a dummy driver which has only a module_init and bind it from there, but I don't think that's a good idea.
The other potential solution Rob has suggested is that we create some tag for the memory reservation (ie: like we do with cma: "reusable"), which can be used to register the region to a heap. But this has the problem that each tag has to be well defined and map to a known heap.
Do you think that's the only way to make progress on this feature? If so, could you elaborate a bit more, or could the dma-buf folks share any other ideas?
On Wed, Dec 9, 2020 at 3:53 PM Minchan Kim minchan@kernel.org wrote:
On Wed, Nov 18, 2020 at 07:19:07PM -0800, John Stultz wrote:
The CMA heap currently only registers the default CMA heap, as we didn't want to expose all CMA regions and there's otherwise no way to pick and choose.
Yub.
dma-buf really needs a way to create an exclusive CMA area. Otherwise, the default CMA area is shared among drivers and fragments easily, since we cannot control the other drivers. In that respect, I don't think the current cma-heap works when userspace needs a big memory chunk.
Yes, the default CMA region is not always optimal.
That's why I was hopeful for Kunihiko Hayashi's patch to allow for exposing specific cma regions: https://lore.kernel.org/lkml/1594948208-4739-1-git-send-email-hayashi.kunihi...
I think it would be a good solution, but all we need is *some* driver which can be considered the primary user/owner of the cma region which would then explicitly export it via the dmabuf heaps.
Here, the problem is that there is no in-kernel user to bind the specific CMA area to, because the owner will be an arbitrary userspace client using the dma-buf interface.
Well, while I agree that conceptually the dmabuf heaps allow allocations for multi-device pipelines, and thus are not tied to specific devices, I do think that the memory types exposed are likely to have specific devices/drivers in the pipeline that they matter most to. So I don't see a big issue with an in-kernel driver registering a specific CMA region as a dmabuf heap.
Is there a reason to use the dma-heap framework to add a CMA area for a specific device?
Even if some in-tree user registers a dma-heap with a CMA area, the buffers could be allocated in userland and shared with other devices. For exclusive access, I guess, the device doesn't need to register a dma-heap for the CMA area.
It's not really about exclusive access. More just that if you want to bind a memory reservation/region (cma or otherwise), at least for DTS, it needs to bind with some device in DT.
Then the device driver can register that region with a heap driver. This avoids adding new Linux-specific software bindings to DT. It becomes a driver implementation detail instead. The primary user of the heap type would probably be a practical pick (ie the display or isp driver).
If that's the only solution, we could create some dummy driver that has only a module_init and bind it from there, but I don't think that's a good idea.
Yea, an un-upstreamable dummy driver is maybe what it devolves to in the worst case. But I suspect it would be cleaner for a display or ISP driver that benefits most from the heap type to add the reserved memory reference to their DT node, and on init for them to register the region with the dmabuf heap code.
The other potential solution Rob has suggested is that we create some tag for the memory reservation (ie: like we do with cma: "reusable"), which can be used to register the region to a heap. But this has the problem that each tag has to be well defined and map to a known heap.
Do you think that's the only way to make progress on this feature? If so, could you elaborate a bit more, or are there other ideas from the dma-buf folks?
I'm skeptical of that DT tag approach working out, as we'd need a new DT binding for the new tag name, and we'd have to do so for each new heap type that needs this (so non-default cma, your chunk heap, whatever other similar heap types that use reserved regions folks come up with). Having *some* driver take ownership for the reserved region and register it with the appropriate heap type seems much cleaner/flexible and avoids mucking with the DT ABI.
thanks -john
On Thu, Dec 10, 2020 at 12:15:15AM -0800, John Stultz wrote:
On Wed, Dec 9, 2020 at 3:53 PM Minchan Kim minchan@kernel.org wrote:
On Wed, Nov 18, 2020 at 07:19:07PM -0800, John Stultz wrote:
The CMA heap currently only registers the default CMA heap, as we didn't want to expose all CMA regions and there's otherwise no way to pick and choose.
Yub.
dma-buf really needs a way to create an exclusive CMA area. Otherwise, the default CMA area is shared among drivers and fragments easily, since we cannot control the other drivers. In that respect, I don't think the current cma-heap works when userspace needs a big memory chunk.
Yes, the default CMA region is not always optimal.
That's why I was hopeful for Kunihiko Hayashi's patch to allow for exposing specific cma regions: https://lore.kernel.org/lkml/1594948208-4739-1-git-send-email-hayashi.kunihi...
I think it would be a good solution, but all we need is *some* driver which can be considered the primary user/owner of the cma region which would then explicitly export it via the dmabuf heaps.
Here, the problem is that there is no in-kernel user to bind the specific CMA area to, because the owner will be an arbitrary userspace client using the dma-buf interface.
Well, while I agree that conceptually the dmabuf heaps allow allocations for multi-device pipelines, and thus are not tied to specific devices, I do think that the memory types exposed are likely to have specific devices/drivers in the pipeline that they matter most to. So I don't see a big issue with an in-kernel driver registering a specific CMA region as a dmabuf heap.
Then I am worried that we would spread dma_heap_add_cma out over too many drivers, since the kernel doesn't know how userspace will use it. For example, system 1 could have a device A-B-C pipeline, so they add it to the A driver. After that, system 2 could have a device B-C-D pipeline, so they add dma_heap_add_cma to the B driver.
Is there a reason to use the dma-heap framework to add a CMA area for a specific device?
Even if some in-tree user registers a dma-heap with a CMA area, the buffers could be allocated in userland and shared with other devices. For exclusive access, I guess, the device doesn't need to register a dma-heap for the CMA area.
It's not really about exclusive access. More just that if you want to bind a memory reservation/region (cma or otherwise), at least for DTS, it needs to bind with some device in DT.
Then the device driver can register that region with a heap driver. This avoids adding new Linux-specific software bindings to DT. It becomes a driver implementation detail instead. The primary user of the heap type would probably be a practical pick (ie the display or isp driver).
If that's the only solution, we could create some dummy driver that has only a module_init and bind it from there, but I don't think that's a good idea.
Yea, an un-upstreamable dummy driver is maybe what it devolves to in the worst case. But I suspect it would be cleaner for a display or ISP driver that benefits most from the heap type to add the reserved memory reference to their DT node, and on init for them to register the region with the dmabuf heap code.
As I mentioned above, it could be the display driver at this moment, but it could be a different driver next time. If I'm missing your point, let me know.
Thanks for the review, John.
The other potential solution Rob has suggested is that we create some tag for the memory reservation (ie: like we do with cma: "reusable"), which can be used to register the region to a heap. But this has the problem that each tag has to be well defined and map to a known heap.
Do you think that's the only way to make progress on this feature? If so, could you elaborate a bit more, or are there other ideas from the dma-buf folks?
I'm skeptical of that DT tag approach working out, as we'd need a new DT binding for the new tag name, and we'd have to do so for each new heap type that needs this (so non-default cma, your chunk heap, whatever other similar heap types that use reserved regions folks come up with). Having *some* driver take ownership for the reserved region and register it with the appropriate heap type seems much cleaner/flexible and avoids mucking with the DT ABI.
thanks -john
On Thu, Dec 10, 2020 at 8:06 AM Minchan Kim minchan@kernel.org wrote:
On Thu, Dec 10, 2020 at 12:15:15AM -0800, John Stultz wrote:
Well, while I agree that conceptually the dmabuf heaps allow allocations for multi-device pipelines, and thus are not tied to specific devices, I do think that the memory types exposed are likely to have specific devices/drivers in the pipeline that they matter most to. So I don't see a big issue with an in-kernel driver registering a specific CMA region as a dmabuf heap.
Then I am worried that we would spread dma_heap_add_cma out over too many drivers, since the kernel doesn't know how userspace will use it. For example, system 1 could have a device A-B-C pipeline, so they add it to the A driver. After that, system 2 could have a device B-C-D pipeline, so they add dma_heap_add_cma to the B driver.
I'm not sure I see this as a major issue? If the drivers add it based on the dt memory reference, those will be configured to not add duplicate heaps (and even so the heap driver can also ensure we don't try to add a heap twice).
Yea, an un-upstreamable dummy driver is maybe what it devolves to in the worst case. But I suspect it would be cleaner for a display or ISP driver that benefits most from the heap type to add the reserved memory reference to their DT node, and on init for them to register the region with the dmabuf heap code.
As I mentioned above, it could be the display driver at this moment, but it could be a different driver next time. If I'm missing your point, let me know.
I guess I just don't see potentially having the registration calls added to multiple drivers as a big problem.
Ideally, yes, I'd probably rather see a DT node that would allow the heap driver to register specified regions, but that's been NACKed multiple times. Given that, having hooks in device drivers to export the region seems to me like the next best approach, as it avoids DT ABI concerns (and if it ends up being a bad approach, it's not something we have to keep).
The bigger problem right now is not that there are too many places the registration call would be made from, but that there aren't upstream drivers which I'm aware of where it would currently make sense to add specific dma_heap_add_cma() registration hooks to. We need an upstream user of Kunihiko Hayashi's patch.
thanks -john
On Thu, Dec 10, 2020 at 02:40:38PM -0800, John Stultz wrote:
On Thu, Dec 10, 2020 at 8:06 AM Minchan Kim minchan@kernel.org wrote:
On Thu, Dec 10, 2020 at 12:15:15AM -0800, John Stultz wrote:
Well, while I agree that conceptually the dmabuf heaps allow allocations for multi-device pipelines, and thus are not tied to specific devices, I do think that the memory types exposed are likely to have specific devices/drivers in the pipeline that they matter most to. So I don't see a big issue with an in-kernel driver registering a specific CMA region as a dmabuf heap.
Then I am worried that we would spread dma_heap_add_cma out over too many drivers, since the kernel doesn't know how userspace will use it. For example, system 1 could have a device A-B-C pipeline, so they add it to the A driver. After that, system 2 could have a device B-C-D pipeline, so they add dma_heap_add_cma to the B driver.
I'm not sure I see this as a major issue? If the drivers add it based on the dt memory reference, those will be configured to not add duplicate heaps (and even so the heap driver can also ensure we don't try to add a heap twice).
Sure, it would work without problems, but from a design point of view it looks weird to me that some arbitrary driver in a specific scenario should carry the heap registration, and that we then propagate the heap registration logic to other drivers over time. We avoid binding dmabuf to DT directly, but it seems we end up with a bad design anyway. To me, it sounds like tossing the DT-versus-anonymous-dmabuf binding problem over to the device drivers.
Yea, an un-upstreamable dummy driver is maybe what it devolves to in the worst case. But I suspect it would be cleaner for a display or ISP driver that benefits most from the heap type to add the reserved memory reference to their DT node, and on init for them to register the region with the dmabuf heap code.
As I mentioned above, it could be the display driver at this moment, but it could be a different driver next time. If I'm missing your point, let me know.
I guess I just don't see potentially having the registration calls added to multiple drivers as a big problem.
Ideally, yes, I'd probably rather see a DT node that would allow the heap driver to register specified regions, but that's been NACKed multiple times. Given that, having hooks in device drivers to export the region seems to me like the next best approach, as it avoids DT ABI concerns (and if it ends up being a bad approach, it's not something we have to keep).
The bigger problem right now is not that there are too many places the registration call would be made from, but that there aren't upstream drivers which I'm aware of where it would currently make sense to add specific dma_heap_add_cma() registration hooks to. We need an upstream user of Kunihiko Hayashi's patch.
Hmm, if that's the only way to proceed: Hyesoo, do you have any specific driver in mind to bind the CMA area to?