This patch series introduces a new dma heap, the chunk heap. The heap is needed for special HW that requires bulk allocation of fixed high-order pages. For example, a 64MB dma-buf is made up of 1024 fixed order-4 pages.

The chunk heap uses alloc_pages_bulk to allocate high-order pages. https://lore.kernel.org/linux-mm/20200814173131.2803002-1-minchan@kernel.org

The chunk heap is registered by device tree with an alignment and the memory node of the contiguous memory allocator (CMA). The alignment defines the chunk page size. For example, an alignment of 0x1_0000 means the chunk page size is 64KB. The phandle to the memory node indicates the CMA region. If the device node doesn't have a CMA region, the registration of the chunk heap fails.
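(As a quick sanity check of the alignment-to-chunk-size mapping, assuming 4KB base pages — a standalone illustration, not driver code:)

/* Illustration: how the DT "alignment" maps to a chunk order, assuming 4KB pages. */
#include <stdio.h>

int main(void)
{
        unsigned long alignment = 0x10000;      /* from the DT property */
        unsigned long page_size = 4096;
        unsigned int order = 0;

        while ((page_size << order) < alignment)        /* mimics the kernel's get_order() */
                order++;

        printf("alignment 0x%lx -> order-%u chunks of %lu KB\n",
               alignment, order, (page_size << order) >> 10);
        /* prints: alignment 0x10000 -> order-4 chunks of 64 KB;
         * 1024 such chunks make up a 64MB dma-buf.
         */
        return 0;
}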
The patchset includes the following:
- export the dma-heap API so kernel modules can register dma heaps
- add the chunk heap implementation
- document the devicetree binding for the chunk heap
Hyesoo Yu (3):
  dma-buf: add missing EXPORT_SYMBOL_GPL() for dma heaps
  dma-buf: heaps: add chunk heap to dmabuf heaps
  dma-heap: Devicetree binding for chunk heap
 .../devicetree/bindings/dma-buf/chunk_heap.yaml |  46 +++++
 drivers/dma-buf/dma-heap.c                      |   2 +
 drivers/dma-buf/heaps/Kconfig                   |   9 +
 drivers/dma-buf/heaps/Makefile                  |   1 +
 drivers/dma-buf/heaps/chunk_heap.c              | 222 +++++++++++++++++++++
 drivers/dma-buf/heaps/heap-helpers.c            |   2 +
 6 files changed, 282 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
 create mode 100644 drivers/dma-buf/heaps/chunk_heap.c
The dma-heap interface is used from kernel modules to register dma heaps; without these exports, such modules fail to build at link (modpost) time.
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
---
 drivers/dma-buf/dma-heap.c           | 2 ++
 drivers/dma-buf/heaps/heap-helpers.c | 2 ++
 2 files changed, 4 insertions(+)
diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index afd22c9..cc6339c 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -189,6 +189,7 @@ void *dma_heap_get_drvdata(struct dma_heap *heap)
 {
        return heap->priv;
 }
+EXPORT_SYMBOL_GPL(dma_heap_get_drvdata);
 
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
 {
@@ -272,6 +273,7 @@ struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info)
        kfree(heap);
        return err_ret;
 }
+EXPORT_SYMBOL_GPL(dma_heap_add);
 
 static char *dma_heap_devnode(struct device *dev, umode_t *mode)
 {
diff --git a/drivers/dma-buf/heaps/heap-helpers.c b/drivers/dma-buf/heaps/heap-helpers.c
index 9f964ca..741bae0 100644
--- a/drivers/dma-buf/heaps/heap-helpers.c
+++ b/drivers/dma-buf/heaps/heap-helpers.c
@@ -24,6 +24,7 @@ void init_heap_helper_buffer(struct heap_helper_buffer *buffer,
        INIT_LIST_HEAD(&buffer->attachments);
        buffer->free = free;
 }
+EXPORT_SYMBOL_GPL(init_heap_helper_buffer);
 
 struct dma_buf *heap_helper_export_dmabuf(struct heap_helper_buffer *buffer,
                                          int fd_flags)
@@ -37,6 +38,7 @@ struct dma_buf *heap_helper_export_dmabuf(struct heap_helper_buffer *buffer,
 
        return dma_buf_export(&exp_info);
 }
+EXPORT_SYMBOL_GPL(heap_helper_export_dmabuf);
 
 static void *dma_heap_map_kernel(struct heap_helper_buffer *buffer)
 {
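(For illustration only, the kind of modular heap these exports enable might look like the sketch below — a hypothetical module; the heap ops are elided and the name "example" is made up:)

// SPDX-License-Identifier: GPL-2.0
/* Hypothetical sketch of a heap module built against the exported API. */
#include <linux/dma-heap.h>
#include <linux/err.h>
#include <linux/module.h>

static const struct dma_heap_ops example_heap_ops = {
        /* .allocate = example_allocate, elided for brevity */
};

static int __init example_heap_init(void)
{
        struct dma_heap_export_info exp_info = {
                .name = "example",              /* made-up heap name */
                .ops = &example_heap_ops,
                .priv = NULL,
        };
        struct dma_heap *heap;

        /* callable from a module only because of EXPORT_SYMBOL_GPL(dma_heap_add) */
        heap = dma_heap_add(&exp_info);
        return PTR_ERR_OR_ZERO(heap);
}
module_init(example_heap_init);
MODULE_LICENSE("GPL");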
This patch adds support for a chunk heap that allows for buffers that are made up of a list of fixed size chunks taken from a CMA. Chunk sizes are configured when the heaps are created.
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
---
 drivers/dma-buf/heaps/Kconfig      |   9 ++
 drivers/dma-buf/heaps/Makefile     |   1 +
 drivers/dma-buf/heaps/chunk_heap.c | 222 +++++++++++++++++++++++++++++++++++++
 3 files changed, 232 insertions(+)
 create mode 100644 drivers/dma-buf/heaps/chunk_heap.c
diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
index a5eef06..98552fa 100644
--- a/drivers/dma-buf/heaps/Kconfig
+++ b/drivers/dma-buf/heaps/Kconfig
@@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA
          Choose this option to enable dma-buf CMA heap. This heap is backed
          by the Contiguous Memory Allocator (CMA). If your system has these
          regions, you should say Y here.
+
+config DMABUF_HEAPS_CHUNK
+       tristate "DMA-BUF CHUNK Heap"
+       depends on DMABUF_HEAPS && DMA_CMA
+       help
+         Choose this option to enable dma-buf CHUNK heap. This heap is backed
+         by the Contiguous Memory Allocator (CMA) and allocates buffers that
+         are made up of a list of fixed size chunks taken from CMA. Chunk
+         sizes are configured when the heaps are created.
diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 6e54cde..3b2a0986 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -2,3 +2,4 @@
 obj-y += heap-helpers.o
 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o
 obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o
+obj-$(CONFIG_DMABUF_HEAPS_CHUNK) += chunk_heap.o
diff --git a/drivers/dma-buf/heaps/chunk_heap.c b/drivers/dma-buf/heaps/chunk_heap.c
new file mode 100644
index 0000000..1eefaec
--- /dev/null
+++ b/drivers/dma-buf/heaps/chunk_heap.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ION Memory Allocator chunk heap exporter
+ *
+ * Copyright (c) 2020 Samsung Electronics Co., Ltd.
+ * Author: <hyesoo.yu@samsung.com> for Samsung Electronics.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/cma.h>
+#include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-heap.h>
+#include <linux/dma-contiguous.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/sched/signal.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/of.h>
+
+#include "heap-helpers.h"
+
+struct chunk_heap {
+       struct dma_heap *heap;
+       phys_addr_t base;
+       phys_addr_t size;
+       atomic_t cur_pageblock_idx;
+       unsigned int max_num_pageblocks;
+       unsigned int order;
+};
+
+static void chunk_heap_free(struct heap_helper_buffer *buffer)
+{
+       struct chunk_heap *chunk_heap = dma_heap_get_drvdata(buffer->heap);
+       pgoff_t pg;
+
+       for (pg = 0; pg < buffer->pagecount; pg++)
+               __free_pages(buffer->pages[pg], chunk_heap->order);
+       kvfree(buffer->pages);
+       kfree(buffer);
+}
+
+static inline unsigned long chunk_get_next_pfn(struct chunk_heap *chunk_heap)
+{
+       unsigned long i = atomic_inc_return(&chunk_heap->cur_pageblock_idx) %
+               chunk_heap->max_num_pageblocks;
+
+       return PHYS_PFN(chunk_heap->base) + i * pageblock_nr_pages;
+}
+
+static int chunk_alloc_pages(struct chunk_heap *chunk_heap, struct page **pages,
+                            unsigned int order, unsigned int count)
+{
+       unsigned long base;
+       unsigned int i = 0, nr_block = 0, nr_elem;
+       int ret;
+
+       while (count) {
+               /*
+                * If the number of scanned page blocks reaches the maximum,
+                * the allocation fails.
+                */
+               if (nr_block++ == chunk_heap->max_num_pageblocks) {
+                       ret = -ENOMEM;
+                       goto err_bulk;
+               }
+               base = chunk_get_next_pfn(chunk_heap);
+               nr_elem = min_t(unsigned int, count, pageblock_nr_pages >> order);
+               ret = alloc_pages_bulk(base, base + pageblock_nr_pages, MIGRATE_CMA,
+                                      GFP_KERNEL, order, nr_elem, pages + i);
+               if (ret < 0)
+                       goto err_bulk;
+
+               i += ret;
+               count -= ret;
+       }
+
+       return 0;
+
+err_bulk:
+       while (i-- > 0)
+               __free_pages(pages[i], order);
+
+       return ret;
+}
+
+static int chunk_heap_allocate(struct dma_heap *heap, unsigned long len,
+                              unsigned long fd_flags, unsigned long heap_flags)
+{
+       struct chunk_heap *chunk_heap = dma_heap_get_drvdata(heap);
+       struct heap_helper_buffer *helper_buffer;
+       struct dma_buf *dmabuf;
+       unsigned int count = DIV_ROUND_UP(len, PAGE_SIZE << chunk_heap->order);
+       int ret = -ENOMEM;
+
+       helper_buffer = kzalloc(sizeof(*helper_buffer), GFP_KERNEL);
+       if (!helper_buffer)
+               return ret;
+
+       init_heap_helper_buffer(helper_buffer, chunk_heap_free);
+
+       helper_buffer->heap = heap;
+       helper_buffer->size = ALIGN(len, PAGE_SIZE << chunk_heap->order);
+       helper_buffer->pagecount = count;
+       helper_buffer->pages = kvmalloc_array(helper_buffer->pagecount,
+                                             sizeof(*helper_buffer->pages), GFP_KERNEL);
+       if (!helper_buffer->pages)
+               goto err0;
+
+       ret = chunk_alloc_pages(chunk_heap, helper_buffer->pages,
+                               chunk_heap->order, helper_buffer->pagecount);
+       if (ret < 0)
+               goto err1;
+
+       dmabuf = heap_helper_export_dmabuf(helper_buffer, fd_flags);
+       if (IS_ERR(dmabuf)) {
+               ret = PTR_ERR(dmabuf);
+               goto err2;
+       }
+
+       helper_buffer->dmabuf = dmabuf;
+
+       ret = dma_buf_fd(dmabuf, fd_flags);
+       if (ret < 0) {
+               dma_buf_put(dmabuf);
+               return ret;
+       }
+
+       return ret;
+
+err2:
+       while (count-- > 0)
+               __free_pages(helper_buffer->pages[count], chunk_heap->order);
+err1:
+       kvfree(helper_buffer->pages);
+err0:
+       kfree(helper_buffer);
+
+       return ret;
+}
+
+static void rmem_remove_callback(void *p)
+{
+       of_reserved_mem_device_release((struct device *)p);
+}
+
+static const struct dma_heap_ops chunk_heap_ops = {
+       .allocate = chunk_heap_allocate,
+};
+
+static int chunk_heap_probe(struct platform_device *pdev)
+{
+       struct chunk_heap *chunk_heap;
+       struct reserved_mem *rmem;
+       struct device_node *rmem_np;
+       struct dma_heap_export_info exp_info;
+       unsigned int alignment;
+       int ret;
+
+       ret = of_reserved_mem_device_init(&pdev->dev);
+       if (ret || !pdev->dev.cma_area) {
+               dev_err(&pdev->dev, "The CMA reserved area is not assigned (ret %d)", ret);
+               return -EINVAL;
+       }
+
+       ret = devm_add_action(&pdev->dev, rmem_remove_callback, &pdev->dev);
+       if (ret) {
+               of_reserved_mem_device_release(&pdev->dev);
+               return ret;
+       }
+
+       rmem_np = of_parse_phandle(pdev->dev.of_node, "memory-region", 0);
+       rmem = of_reserved_mem_lookup(rmem_np);
+
+       chunk_heap = devm_kzalloc(&pdev->dev, sizeof(*chunk_heap), GFP_KERNEL);
+       if (!chunk_heap)
+               return -ENOMEM;
+
+       chunk_heap->base = rmem->base;
+       chunk_heap->size = rmem->size;
+       chunk_heap->max_num_pageblocks = rmem->size >> (pageblock_order + PAGE_SHIFT);
+
+       of_property_read_u32(pdev->dev.of_node, "alignment", &alignment);
+       chunk_heap->order = get_order(alignment);
+
+       exp_info.name = rmem->name;
+       exp_info.ops = &chunk_heap_ops;
+       exp_info.priv = chunk_heap;
+
+       chunk_heap->heap = dma_heap_add(&exp_info);
+       if (IS_ERR(chunk_heap->heap))
+               return PTR_ERR(chunk_heap->heap);
+
+       return 0;
+}
+
+static const struct of_device_id chunk_heap_of_match[] = {
+       { .compatible = "dma_heap,chunk", },
+       { },
+};
+
+MODULE_DEVICE_TABLE(of, chunk_heap_of_match);
+
+static struct platform_driver chunk_heap_driver = {
+       .driver = {
+               .name = "chunk_heap",
+               .of_match_table = chunk_heap_of_match,
+       },
+       .probe = chunk_heap_probe,
+};
+
+static int __init chunk_heap_init(void)
+{
+       return platform_driver_register(&chunk_heap_driver);
+}
+module_init(chunk_heap_init);
+MODULE_DESCRIPTION("DMA-BUF Chunk Heap");
+MODULE_LICENSE("GPL v2");
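(For context, allocating from a registered chunk heap from user space goes through the standard dma-heap ioctl; a minimal sketch follows. The /dev name is an assumption — it follows the reserved-memory node name, "chunk_memory" in the binding example:)

/* Minimal user-space sketch: allocate 64MB from the chunk heap. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

int main(void)
{
        struct dma_heap_allocation_data data = {
                .len = 64 << 20,                /* one 64MB buffer */
                .fd_flags = O_RDWR | O_CLOEXEC,
        };
        /* device name assumed; it comes from the reserved-memory node name */
        int heap_fd = open("/dev/dma_heap/chunk_memory", O_RDONLY | O_CLOEXEC);

        if (heap_fd < 0 || ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data) < 0) {
                perror("chunk heap alloc");
                return 1;
        }
        printf("dma-buf fd: %u\n", data.fd);
        close(data.fd);
        close(heap_fd);
        return 0;
}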
On 18.08.20 10:04, Hyesoo Yu wrote:
This patch adds support for a chunk heap that allows for buffers that are made up of a list of fixed size chunks taken from a CMA. Chunk sizes are configured when the heaps are created.
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
 drivers/dma-buf/heaps/Kconfig      |   9 ++
 drivers/dma-buf/heaps/Makefile     |   1 +
 drivers/dma-buf/heaps/chunk_heap.c | 222 +++++++++++++++++++++++++++++++++++++
 3 files changed, 232 insertions(+)
 create mode 100644 drivers/dma-buf/heaps/chunk_heap.c
diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
index a5eef06..98552fa 100644
--- a/drivers/dma-buf/heaps/Kconfig
+++ b/drivers/dma-buf/heaps/Kconfig
@@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA
          Choose this option to enable dma-buf CMA heap. This heap is backed
          by the Contiguous Memory Allocator (CMA). If your system has these
          regions, you should say Y here.
+
+config DMABUF_HEAPS_CHUNK
+       tristate "DMA-BUF CHUNK Heap"
+       depends on DMABUF_HEAPS && DMA_CMA
+       help
+         Choose this option to enable dma-buf CHUNK heap. This heap is backed
+         by the Contiguous Memory Allocator (CMA) and allocates buffers that
+         are made up of a list of fixed size chunks taken from CMA. Chunk
+         sizes are configured when the heaps are created.
diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 6e54cde..3b2a0986 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -2,3 +2,4 @@
 obj-y += heap-helpers.o
 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o
 obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o
+obj-$(CONFIG_DMABUF_HEAPS_CHUNK) += chunk_heap.o
diff --git a/drivers/dma-buf/heaps/chunk_heap.c b/drivers/dma-buf/heaps/chunk_heap.c
new file mode 100644
index 0000000..1eefaec
--- /dev/null
+++ b/drivers/dma-buf/heaps/chunk_heap.c
@@ -0,0 +1,222 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ION Memory Allocator chunk heap exporter
+ *
+ * Copyright (c) 2020 Samsung Electronics Co., Ltd.
+ * Author: <hyesoo.yu@samsung.com> for Samsung Electronics.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/cma.h>
+#include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-heap.h>
+#include <linux/dma-contiguous.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/sched/signal.h>
+#include <linux/of_reserved_mem.h>
+#include <linux/of.h>
+#include "heap-helpers.h"
+
+struct chunk_heap {
+       struct dma_heap *heap;
+       phys_addr_t base;
+       phys_addr_t size;
+       atomic_t cur_pageblock_idx;
+       unsigned int max_num_pageblocks;
+       unsigned int order;
+};
+
+static void chunk_heap_free(struct heap_helper_buffer *buffer)
+{
+       struct chunk_heap *chunk_heap = dma_heap_get_drvdata(buffer->heap);
+       pgoff_t pg;
+
+       for (pg = 0; pg < buffer->pagecount; pg++)
+               __free_pages(buffer->pages[pg], chunk_heap->order);
+       kvfree(buffer->pages);
+       kfree(buffer);
+}
+
+static inline unsigned long chunk_get_next_pfn(struct chunk_heap *chunk_heap)
+{
+       unsigned long i = atomic_inc_return(&chunk_heap->cur_pageblock_idx) %
+               chunk_heap->max_num_pageblocks;
+
+       return PHYS_PFN(chunk_heap->base) + i * pageblock_nr_pages;
+}
+
+static int chunk_alloc_pages(struct chunk_heap *chunk_heap, struct page **pages,
+                            unsigned int order, unsigned int count)
+{
+       unsigned long base;
+       unsigned int i = 0, nr_block = 0, nr_elem;
+       int ret;
+
+       while (count) {
+               /*
+                * If the number of scanned page blocks reaches the maximum,
+                * the allocation fails.
+                */
+               if (nr_block++ == chunk_heap->max_num_pageblocks) {
+                       ret = -ENOMEM;
+                       goto err_bulk;
+               }
+               base = chunk_get_next_pfn(chunk_heap);
+               nr_elem = min_t(unsigned int, count, pageblock_nr_pages >> order);
+               ret = alloc_pages_bulk(base, base + pageblock_nr_pages, MIGRATE_CMA,
+                                      GFP_KERNEL, order, nr_elem, pages + i);
So you are bypassing the complete CMA allocator here. This all smells like a complete hack to me. No, I don't think this is the right way to support (or rather, speed up allocations for) special, weird hardware.
Document devicetree binding for chunk heap on dma heap framework
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
---
 .../devicetree/bindings/dma-buf/chunk_heap.yaml | 46 ++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
diff --git a/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml b/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
new file mode 100644
index 0000000..1ee8fad
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
@@ -0,0 +1,46 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/dma-buf/chunk_heap.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Device tree binding for chunk heap on the dma heap framework
+
+maintainers:
+  - Sumit Semwal <sumit.semwal@linaro.org>
+
+description: |
+  The chunk heap is backed by the Contiguous Memory Allocator (CMA) and
+  allocates buffers that are made up of a list of fixed size chunks taken
+  from CMA. Chunk sizes are configured when the heaps are created.
+
+properties:
+  compatible:
+    enum:
+      - dma_heap,chunk
+
+required:
+  - compatible
+  - memory-region
+  - alignment
+
+additionalProperties: false
+
+examples:
+  - |
+    reserved-memory {
+        #address-cells = <2>;
+        #size-cells = <1>;
+
+        chunk_memory: chunk_memory {
+            compatible = "shared-dma-pool";
+            reusable;
+            size = <0x10000000>;
+        };
+    };
+
+    chunk_default_heap: chunk_default_heap {
+        compatible = "dma_heap,chunk";
+        memory-region = <&chunk_memory>;
+        alignment = <0x10000>;
+    };
On Tue, 18 Aug 2020 17:04:15 +0900, Hyesoo Yu wrote:
Document devicetree binding for chunk heap on dma heap framework
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
 .../devicetree/bindings/dma-buf/chunk_heap.yaml | 46 ++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
My bot found errors running 'make dt_binding_check' on your patch:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/dma-buf/chunk_heap.example.dt.yaml: chunk_default_heap: 'alignment', 'memory-region' do not match any of the regexes: 'pinctrl-[0-9]+'
See https://patchwork.ozlabs.org/patch/1346687
If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure dt-schema is up to date:
pip3 install git+https://github.com/devicetree-org/dt-schema.git@master --upgrade
Please check and re-submit.
Hi,
On Tue, Aug 18, 2020 at 10:48:12AM -0600, Rob Herring wrote:
On Tue, 18 Aug 2020 17:04:15 +0900, Hyesoo Yu wrote:
Document devicetree binding for chunk heap on dma heap framework
Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com>
 .../devicetree/bindings/dma-buf/chunk_heap.yaml | 46 ++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
My bot found errors running 'make dt_binding_check' on your patch:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/dma-buf/chunk_heap.example.dt.yaml: chunk_default_heap: 'alignment', 'memory-region' do not match any of the regexes: 'pinctrl-[0-9]+'
See https://patchwork.ozlabs.org/patch/1346687
If you already ran 'make dt_binding_check' and didn't see the above error(s), then make sure dt-schema is up to date:
pip3 install git+https://github.com/devicetree-org/dt-schema.git@master --upgrade
Please check and re-submit.
Thanks for the reply. I missed the alignment and memory-region properties in the binding. I added them, ran dt_binding_check again, and it all passed.
I will re-submit the patch v2.
Regards,
Hyesoo Yu
Hi,
On Tue, Aug 18, 2020 at 05:04:12PM +0900, Hyesoo Yu wrote:
This patch series introduces a new dma heap, the chunk heap. The heap is needed for special HW that requires bulk allocation of fixed high-order pages. For example, a 64MB dma-buf is made up of 1024 fixed order-4 pages.

The chunk heap uses alloc_pages_bulk to allocate high-order pages. https://lore.kernel.org/linux-mm/20200814173131.2803002-1-minchan@kernel.org

The chunk heap is registered by device tree with an alignment and the memory node of the contiguous memory allocator (CMA). The alignment defines the chunk page size. For example, an alignment of 0x1_0000 means the chunk page size is 64KB. The phandle to the memory node indicates the CMA region. If the device node doesn't have a CMA region, the registration of the chunk heap fails.
This reminds me of an ion heap developed at Arm several years ago: https://git.linaro.org/landing-teams/working/arm/kernel.git/tree/drivers/sta...
Some more descriptive text here: https://github.com/ARM-software/CPA
It maintains a pool of high-order pages with a worker thread to attempt compaction and allocation to keep the pool filled, with high and low watermarks to trigger freeing/allocating of chunks. It implements a shrinker to allow the system to reclaim the pool under high memory pressure.
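(For readers unfamiliar with CPA, a rough sketch of the watermark idea — simplified, not the actual Arm code; work-item init and the shrinker half are elided:)

/* Simplified sketch of a watermarked high-order page pool, in the spirit of CPA. */
#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>

struct chunk_pool {
        struct list_head chunks;        /* pooled high-order pages, via page->lru */
        unsigned int count;
        unsigned int low_wmark;         /* refill worker kicks in below this */
        unsigned int high_wmark;        /* refill worker stops at this level */
        unsigned int order;
        spinlock_t lock;
        struct work_struct refill_work; /* INIT_WORK(..., chunk_pool_refill) elided */
};

static void chunk_pool_refill(struct work_struct *work)
{
        struct chunk_pool *pool = container_of(work, struct chunk_pool,
                                               refill_work);

        spin_lock(&pool->lock);
        while (pool->count < pool->high_wmark) {
                struct page *page;

                /* drop the lock around the (possibly slow) allocation */
                spin_unlock(&pool->lock);
                page = alloc_pages(GFP_KERNEL | __GFP_NOWARN, pool->order);
                spin_lock(&pool->lock);
                if (!page)
                        break;
                list_add(&page->lru, &pool->chunks);
                pool->count++;
        }
        spin_unlock(&pool->lock);
}

static struct page *chunk_pool_get(struct chunk_pool *pool)
{
        struct page *page = NULL;

        spin_lock(&pool->lock);
        if (pool->count) {
                page = list_first_entry(&pool->chunks, struct page, lru);
                list_del(&page->lru);
                pool->count--;
        }
        if (pool->count < pool->low_wmark)
                schedule_work(&pool->refill_work);
        spin_unlock(&pool->lock);

        return page;    /* caller falls back to alloc_pages() on NULL */
}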
Is maintaining a pool something you considered? From the alloc_pages_bulk thread it sounds like you want to allocate 300M at a time, so I expect if you tuned the pool size to match that it could work quite well.
That implementation isn't using a CMA region, but a similar approach could definitely be applied.
Thanks, -Brian
The patchset includes the following:
- export the dma-heap API so kernel modules can register dma heaps
- add the chunk heap implementation
- document the devicetree binding for the chunk heap
Hyesoo Yu (3):
  dma-buf: add missing EXPORT_SYMBOL_GPL() for dma heaps
  dma-buf: heaps: add chunk heap to dmabuf heaps
  dma-heap: Devicetree binding for chunk heap
 .../devicetree/bindings/dma-buf/chunk_heap.yaml |  46 +++++
 drivers/dma-buf/dma-heap.c                      |   2 +
 drivers/dma-buf/heaps/Kconfig                   |   9 +
 drivers/dma-buf/heaps/Makefile                  |   1 +
 drivers/dma-buf/heaps/chunk_heap.c              | 222 +++++++++++++++++++++
 drivers/dma-buf/heaps/heap-helpers.c            |   2 +
 6 files changed, 282 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma-buf/chunk_heap.yaml
 create mode 100644 drivers/dma-buf/heaps/chunk_heap.c
-- 2.7.4
On Tue, Aug 18, 2020 at 11:55:57AM +0100, Brian Starkey wrote:
Hi,
On Tue, Aug 18, 2020 at 05:04:12PM +0900, Hyesoo Yu wrote:
This patch series introduces a new dma heap, the chunk heap. The heap is needed for special HW that requires bulk allocation of fixed high-order pages. For example, a 64MB dma-buf is made up of 1024 fixed order-4 pages.

The chunk heap uses alloc_pages_bulk to allocate high-order pages. https://lore.kernel.org/linux-mm/20200814173131.2803002-1-minchan@kernel.org

The chunk heap is registered by device tree with an alignment and the memory node of the contiguous memory allocator (CMA). The alignment defines the chunk page size. For example, an alignment of 0x1_0000 means the chunk page size is 64KB. The phandle to the memory node indicates the CMA region. If the device node doesn't have a CMA region, the registration of the chunk heap fails.
This reminds me of an ion heap developed at Arm several years ago: https://git.linaro.org/landing-teams/working/arm/kernel.git/tree/drivers/sta...
Some more descriptive text here: https://github.com/ARM-software/CPA
It maintains a pool of high-order pages with a worker thread to attempt compaction and allocation to keep the pool filled, with high and low watermarks to trigger freeing/allocating of chunks. It implements a shrinker to allow the system to reclaim the pool under high memory pressure.
Is maintaining a pool something you considered? From the alloc_pages_bulk thread it sounds like you want to allocate 300M at a time, so I expect if you tuned the pool size to match that it could work quite well.
That implementation isn't using a CMA region, but a similar approach could definitely be applied.
I have seriously considered CPA in our product but we developed our own because of the pool in CPA. The high-order pages are required by some specific users like the Netflix app. Moreover, the required number of bytes is increasing dramatically because of today's high-resolution videos and displays.
Gathering lots of free high-order pages in the background during run-time means reserving that amount of pages from the entire available system memory. Moreover, the gathered pages are soon reclaimed whenever the system is suffering from memory pressure (i.e. camera recording, heavy games). So we had to consider allocating hundreds of megabytes at a time. Of course we don't allocate all buffers by a single call to alloc_pages_bulk(), but a single buffer is still very large. A single frame of 8K HDR video needs 95MB (7680*4320*2*1.5). Even a single frame of 4K HDR video needs 24MB, and 4K HDR is now popular on Netflix, YouTube and Google Play video.
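(A back-of-the-envelope check of those numbers against the chunk heap's rounding — the driver computes count = DIV_ROUND_UP(len, PAGE_SIZE << order); standalone illustration only:)

/* How many order-4 (64KB) chunks per video frame, using the driver's rounding. */
#include <stdio.h>

int main(void)
{
        unsigned long chunk = 64 * 1024;                        /* order-4, 4KB pages */
        unsigned long frame_8k = 7680UL * 4320 * 2 * 3 / 2;     /* ~95MB */
        unsigned long frame_4k = 3840UL * 2160 * 2 * 3 / 2;     /* ~24MB */

        /* same rounding as DIV_ROUND_UP(len, PAGE_SIZE << order) */
        printf("8K HDR frame: %lu chunks\n", (frame_8k + chunk - 1) / chunk);
        printf("4K HDR frame: %lu chunks\n", (frame_4k + chunk - 1) / chunk);
        return 0;
}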
Thanks, -Brian
Thank you!
KyongHo
Hi KyongHo,
On Wed, Aug 19, 2020 at 12:46:26PM +0900, Cho KyongHo wrote:
I have seriously considered CPA in our product but we developed our own because of the pool in CPA.
Oh good, I'm glad you considered it :-)
The high-order pages are required by some specific users like the Netflix app. Moreover, the required number of bytes is increasing dramatically because of today's high-resolution videos and displays.

Gathering lots of free high-order pages in the background during run-time means reserving that amount of pages from the entire available system memory. Moreover, the gathered pages are soon reclaimed whenever the system is suffering from memory pressure (i.e. camera recording, heavy games).
Aren't these two things in contradiction? If they're easily reclaimed then they aren't "reserved" in any detrimental way. And if you don't want them to be reclaimed, then you need them to be reserved...
The approach you have here assigns the chunk of memory as a reserved CMA region which the kernel is going to try not to use too - similar to the CPA pool.
I suppose it's a balance depending on how much you're willing to wait for migration on the allocation path. CPA has the potential to get you faster allocations, but the downside is you need to make it a little more "greedy".
Cheers, -Brian
Hi Brian,
On Wed, Aug 19, 2020 at 02:22:04PM +0100, Brian Starkey wrote:
Hi KyongHo,
On Wed, Aug 19, 2020 at 12:46:26PM +0900, Cho KyongHo wrote:
I have seriously considered CPA in our product but we developed our own because of the pool in CPA.
Oh good, I'm glad you considered it :-)
The high-order pages are required by some specific users like the Netflix app. Moreover, the required number of bytes is increasing dramatically because of today's high-resolution videos and displays.

Gathering lots of free high-order pages in the background during run-time means reserving that amount of pages from the entire available system memory. Moreover, the gathered pages are soon reclaimed whenever the system is suffering from memory pressure (i.e. camera recording, heavy games).
Aren't these two things in contradiction? If they're easily reclaimed then they aren't "reserved" in any detrimental way. And if you don't want them to be reclaimed, then you need them to be reserved...
The approach you have here assigns the chunk of memory as a reserved CMA region which the kernel is going to try not to use too - similar to the CPA pool.
I suppose it's a balance depending on how much you're willing to wait for migration on the allocation path. CPA has the potential to get you faster allocations, but the downside is you need to make it a little more "greedy".
I understand why you see it as a contradiction, but I don't think it is. The kernel page allocator now prefers free pages in CMA when allocating movable pages, since commit https://lore.kernel.org/linux-mm/CAAmzW4P6+3O_RLvgy_QOKD4iXw+Hk3HE7Toc4Ky7kv... .

We are trying to reduce unused pages to improve performance, so unused pages in a pool should be easily reclaimed. That is why we do not secure free pages in a special pool for a specific use-case. Instead, we have tried to reduce the performance bottlenecks in page migration so that a large amount of memory can be allocated when it is needed.
On Tue, Aug 18, 2020 at 12:45 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
This patch series introduces a new dma heap, the chunk heap. The heap is needed for special HW that requires bulk allocation of fixed high-order pages. For example, a 64MB dma-buf is made up of 1024 fixed order-4 pages.

The chunk heap uses alloc_pages_bulk to allocate high-order pages. https://lore.kernel.org/linux-mm/20200814173131.2803002-1-minchan@kernel.org

The chunk heap is registered by device tree with an alignment and the memory node of the contiguous memory allocator (CMA). The alignment defines the chunk page size. For example, an alignment of 0x1_0000 means the chunk page size is 64KB. The phandle to the memory node indicates the CMA region. If the device node doesn't have a CMA region, the registration of the chunk heap fails.
The patchset includes the following:
- export the dma-heap API so kernel modules can register dma heaps
- add chunk heap implementation.
- document the devicetree binding for the chunk heap
Hyesoo Yu (3):
  dma-buf: add missing EXPORT_SYMBOL_GPL() for dma heaps
  dma-buf: heaps: add chunk heap to dmabuf heaps
  dma-heap: Devicetree binding for chunk heap
Hey! Thanks so much for sending this out! I'm really excited to see these heaps be submitted and reviewed on the list!
The first general concern I have with your series is that it adds a dt binding for the chunk heap, which we've gotten a fair amount of pushback on.
A possible alternative might be something like what Kunihiko Hayashi proposed for non-default CMA heaps: https://lore.kernel.org/lkml/1594948208-4739-1-git-send-email-hayashi.kunihi...
This approach would instead allow a driver to register a CMA area with the chunk heap implementation.

However (and this was the catch with Kunihiko Hayashi's patch), this requires that the driver also be upstream, as we need an in-tree user of such code. A sketch of what that registration might look like follows below.
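(To make that alternative concrete — an entirely hypothetical shape for such an API; chunk_heap_register() does not exist in this series:)

/* Hypothetical sketch: a driver hands its CMA region to the chunk heap
 * instead of using a DT binding. chunk_heap_register() is made up here.
 */
#include <linux/cma.h>
#include <linux/dma-contiguous.h>
#include <linux/platform_device.h>

/* hypothetical helper that the chunk heap module would export */
int chunk_heap_register(const char *name, struct cma *cma, unsigned int order);

static int example_driver_probe(struct platform_device *pdev)
{
        struct cma *cma = dev_get_cma_area(&pdev->dev);

        if (!cma)
                return -ENODEV;

        /* register the device's CMA area as an order-4 chunk heap */
        return chunk_heap_register(dev_name(&pdev->dev), cma, 4);
}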
Also, it might be good to provide some further rationale on why this heap is beneficial over the existing CMA heap. In general, focusing the commit messages more on why we might want the patch, rather than on what the patch does, is helpful.
"Special hardware" that doesn't have upstream drivers isn't very compelling for most maintainers.
That said, I'm very excited to see these sorts of submissions, as I know lots of vendors have historically had very custom out of tree ION heaps, and I think it would be a great benefit to the community to better understand the experience vendors have in optimizing performance on their devices, so we can create good common solutions upstream. So I look forward to your insights on future revisions of this patch series!
thanks -john