[PATCH 0/8] dma-buf: heaps: Support carved-out heaps and ECC related-flags

List overview All Threads
Download

newer

older

Re: [PATCH 1/2] dma-buf: heaps:...

Maxime Ripard

15 May 2024 15 May '24

1:56 p.m.

Hi,

This series is the follow-up of the discussion that John and I had a few months ago here:

https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrro...

The initial problem we were discussing was that I'm currently working on a platform which has a memory layout with ECC enabled. However, enabling the ECC has a number of drawbacks on that platform: lower performance, increased memory usage, etc. So for things like framebuffers, the trade-off isn't great and thus there's a memory region with ECC disabled to allocate from for such use cases.

After a suggestion from John, I chose to start using heap allocations flags to allow for userspace to ask for a particular ECC setup. This is then backed by a new heap type that runs from reserved memory chunks flagged as such, and the existing DT properties to specify the ECC properties.

We could also easily extend this mechanism to support more flags, or through a new ioctl to discover which flags a given heap supports.

I submitted a draft PR to the DT schema for the bindings used in this PR: https://github.com/devicetree-org/dt-schema/pull/138

Let me know what you think, Maxime

Signed-off-by: Maxime Ripard mripard@kernel.org --- Maxime Ripard (8): dma-buf: heaps: Introduce a new heap for reserved memory of: Add helper to retrieve ECC memory bits dma-buf: heaps: Import uAPI header dma-buf: heaps: Add ECC protection flags dma-buf: heaps: system: Remove global variable dma-buf: heaps: system: Handle ECC flags dma-buf: heaps: cma: Handle ECC flags dma-buf: heaps: carveout: Handle ECC flags

drivers/dma-buf/dma-heap.c | 4 + drivers/dma-buf/heaps/Kconfig | 8 + drivers/dma-buf/heaps/Makefile | 1 + drivers/dma-buf/heaps/carveout_heap.c | 330 ++++++++++++++++++++++++++++++++++ drivers/dma-buf/heaps/cma_heap.c | 10 ++ drivers/dma-buf/heaps/system_heap.c | 29 ++- include/linux/dma-heap.h | 2 + include/linux/of.h | 25 +++ include/uapi/linux/dma-heap.h | 5 +- 9 files changed, 407 insertions(+), 7 deletions(-) --- base-commit: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6 change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e

Best regards,

-- Maxime Ripard mripard@kernel.org

Show replies by date

Maxime Ripard

15 May 15 May

1:56 p.m.

New subject: [PATCH 1/8] dma-buf: heaps: Introduce a new heap for reserved memory

Some reserved memory regions might have particular memory setup or attributes that make them good candidates for heaps.

Let's provide a heap type that will create a new heap for each reserved memory region flagged as such.

Signed-off-by: Maxime Ripard mripard@kernel.org --- drivers/dma-buf/heaps/Kconfig | 8 + drivers/dma-buf/heaps/Makefile | 1 + drivers/dma-buf/heaps/carveout_heap.c | 314 ++++++++++++++++++++++++++++++++++ 3 files changed, 323 insertions(+)

diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig index a5eef06c4226..c6981d696733 100644 --- a/drivers/dma-buf/heaps/Kconfig +++ b/drivers/dma-buf/heaps/Kconfig @@ -1,5 +1,13 @@ +config DMABUF_HEAPS_CARVEOUT + bool "Carveout Heaps" + depends on DMABUF_HEAPS + help + Choose this option to enable the carveout dmabuf heap. The carveout + heap is backed by pages from reserved memory regions flagged as + exportable. If in doubt, say Y. + config DMABUF_HEAPS_SYSTEM bool "DMA-BUF System Heap" depends on DMABUF_HEAPS help Choose this option to enable the system dmabuf heap. The system heap diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile index 974467791032..b734647ad5c8 100644 --- a/drivers/dma-buf/heaps/Makefile +++ b/drivers/dma-buf/heaps/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 +obj-$(CONFIG_DMABUF_HEAPS_CARVEOUT) += carveout_heap.o obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o diff --git a/drivers/dma-buf/heaps/carveout_heap.c b/drivers/dma-buf/heaps/carveout_heap.c new file mode 100644 index 000000000000..896ca67e6bd9 --- /dev/null +++ b/drivers/dma-buf/heaps/carveout_heap.c @@ -0,0 +1,314 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/dma-buf.h> +#include <linux/dma-heap.h> +#include <linux/genalloc.h> +#include <linux/of_reserved_mem.h> + +struct carveout_heap_priv { + struct dma_heap *heap; + struct gen_pool *pool; +}; + +struct carveout_heap_buffer_priv { + struct mutex lock; + struct list_head attachments; + + unsigned long len; + unsigned long num_pages; + struct carveout_heap_priv *heap; + void *buffer; +}; + +struct carveout_heap_attachment { + struct list_head head; + struct sg_table table; + + struct device *dev; + bool mapped; +}; + +static int carveout_heap_attach(struct dma_buf *buf, + struct dma_buf_attachment *attachment) +{ + struct carveout_heap_buffer_priv *priv = buf->priv; + struct carveout_heap_attachment *a; + int ret; + + a = kzalloc(sizeof(*a), GFP_KERNEL); + if (!a) + return -ENOMEM; + INIT_LIST_HEAD(&a->head); + a->dev = attachment->dev; + attachment->priv = a; + + ret = sg_alloc_table(&a->table, priv->num_pages, GFP_KERNEL); + if (ret) + goto err_cleanup_attach; + + mutex_lock(&priv->lock); + list_add(&a->head, &priv->attachments); + mutex_unlock(&priv->lock); + + return 0; + +err_cleanup_attach: + kfree(a); + return ret; +} + +static void carveout_heap_detach(struct dma_buf *dmabuf, + struct dma_buf_attachment *attachment) +{ + struct carveout_heap_buffer_priv *priv = dmabuf->priv; + struct carveout_heap_attachment *a = attachment->priv; + + mutex_lock(&priv->lock); + list_del(&a->head); + mutex_unlock(&priv->lock); + + sg_free_table(&a->table); + kfree(a); +} + +static struct sg_table * +carveout_heap_map_dma_buf(struct dma_buf_attachment *attachment, + enum dma_data_direction direction) +{ + struct carveout_heap_attachment *a = attachment->priv; + struct sg_table *table = &a->table; + int ret; + + ret = dma_map_sgtable(a->dev, table, direction, 0); + if (ret) + return ERR_PTR(-ENOMEM); + + a->mapped = true; + + return table; +} + +static void carveout_heap_unmap_dma_buf(struct dma_buf_attachment *attachment, + struct sg_table *table, + enum dma_data_direction direction) +{ + struct carveout_heap_attachment *a = attachment->priv; + + a->mapped = false; + dma_unmap_sgtable(a->dev, table, direction, 0); +} + +static int +carveout_heap_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, + enum dma_data_direction direction) +{ + struct carveout_heap_buffer_priv *priv = dmabuf->priv; + struct carveout_heap_attachment *a; + + mutex_lock(&priv->lock); + + list_for_each_entry(a, &priv->attachments, head) { + if (!a->mapped) + continue; + + dma_sync_sgtable_for_cpu(a->dev, &a->table, direction); + } + + mutex_unlock(&priv->lock); + + return 0; +} + +static int +carveout_heap_dma_buf_end_cpu_access(struct dma_buf *dmabuf, + enum dma_data_direction direction) +{ + struct carveout_heap_buffer_priv *priv = dmabuf->priv; + struct carveout_heap_attachment *a; + + mutex_lock(&priv->lock); + + list_for_each_entry(a, &priv->attachments, head) { + if (!a->mapped) + continue; + + dma_sync_sgtable_for_device(a->dev, &a->table, direction); + } + + mutex_unlock(&priv->lock); + + return 0; +} + +static int carveout_heap_mmap(struct dma_buf *dmabuf, + struct vm_area_struct *vma) +{ + struct carveout_heap_buffer_priv *priv = dmabuf->priv; + struct page *page = virt_to_page(priv->buffer); + + return remap_pfn_range(vma, vma->vm_start, page_to_pfn(page), + priv->num_pages * PAGE_SIZE, vma->vm_page_prot); +} + +static void carveout_heap_dma_buf_release(struct dma_buf *buf) +{ + struct carveout_heap_buffer_priv *buffer_priv = buf->priv; + struct carveout_heap_priv *heap_priv = buffer_priv->heap; + + gen_pool_free(heap_priv->pool, (unsigned long)buffer_priv->buffer, + buffer_priv->len); + kfree(buffer_priv); +} + +static const struct dma_buf_ops carveout_heap_buf_ops = { + .attach = carveout_heap_attach, + .detach = carveout_heap_detach, + .map_dma_buf = carveout_heap_map_dma_buf, + .unmap_dma_buf = carveout_heap_unmap_dma_buf, + .begin_cpu_access = carveout_heap_dma_buf_begin_cpu_access, + .end_cpu_access = carveout_heap_dma_buf_end_cpu_access, + .mmap = carveout_heap_mmap, + .release = carveout_heap_dma_buf_release, +}; + +static struct dma_buf *carveout_heap_allocate(struct dma_heap *heap, + unsigned long len, + unsigned long fd_flags, + unsigned long heap_flags) +{ + struct carveout_heap_priv *heap_priv = dma_heap_get_drvdata(heap); + struct carveout_heap_buffer_priv *buffer_priv; + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + struct dma_buf *buf; + dma_addr_t daddr; + void *buffer; + int ret; + + buffer_priv = kzalloc(sizeof(*buffer_priv), GFP_KERNEL); + if (!buffer_priv) + return ERR_PTR(-ENOMEM); + + INIT_LIST_HEAD(&buffer_priv->attachments); + mutex_init(&buffer_priv->lock); + + buffer = gen_pool_dma_zalloc(heap_priv->pool, len, &daddr); + if (!buffer) { + ret = -ENOMEM; + goto err_free_buffer_priv; + } + + buffer_priv->buffer = buffer; + buffer_priv->heap = heap_priv; + buffer_priv->len = len; + buffer_priv->num_pages = len / PAGE_SIZE; + + /* create the dmabuf */ + exp_info.exp_name = dma_heap_get_name(heap); + exp_info.ops = &carveout_heap_buf_ops; + exp_info.size = buffer_priv->len; + exp_info.flags = fd_flags; + exp_info.priv = buffer_priv; + + buf = dma_buf_export(&exp_info); + if (IS_ERR(buf)) { + ret = PTR_ERR(buf); + goto err_free_buffer; + } + + return buf; + +err_free_buffer: + gen_pool_free(heap_priv->pool, (unsigned long)buffer, len); +err_free_buffer_priv: + kfree(buffer_priv); + + return ERR_PTR(ret); +} + +static const struct dma_heap_ops carveout_heap_ops = { + .allocate = carveout_heap_allocate, +}; + +static int __init carveout_heap_setup(struct device_node *node) +{ + struct dma_heap_export_info exp_info = {}; + const struct reserved_mem *rmem; + struct carveout_heap_priv *priv; + struct dma_heap *heap; + struct gen_pool *pool; + void *base; + int ret; + + rmem = of_reserved_mem_lookup(node); + if (!rmem) + return -EINVAL; + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE); + if (!pool) { + ret = -ENOMEM; + goto err_cleanup_heap; + } + priv->pool = pool; + + base = memremap(rmem->base, rmem->size, MEMREMAP_WB); + if (!base) { + ret = -ENOMEM; + goto err_release_mem_region; + } + + ret = gen_pool_add_virt(pool, (unsigned long)base, rmem->base, + rmem->size, NUMA_NO_NODE); + if (ret) + goto err_unmap; + + exp_info.name = node->full_name; + exp_info.ops = &carveout_heap_ops; + exp_info.priv = priv; + + heap = dma_heap_add(&exp_info); + if (IS_ERR(heap)) { + ret = PTR_ERR(heap); + goto err_cleanup_pool_region; + } + priv->heap = heap; + + return 0; + +err_cleanup_pool_region: + gen_pool_free(pool, (unsigned long)base, rmem->size); +err_unmap: + memunmap(base); +err_release_mem_region: + gen_pool_destroy(pool); +err_cleanup_heap: + kfree(priv); + return ret; +} + +static int __init carveout_heap_init(void) +{ + struct device_node *rmem_node; + struct device_node *node; + int ret; + + rmem_node = of_find_node_by_path("/reserved-memory"); + if (!rmem_node) + return 0; + + for_each_child_of_node(rmem_node, node) { + if (!of_property_read_bool(node, "export")) + continue; + + ret = carveout_heap_setup(node); + if (ret) + return ret; + } + + return 0; +} + +module_init(carveout_heap_init);

-- 2.44.0

Maxime Ripard

1:56 p.m.

New subject: [PATCH 2/8] of: Add helper to retrieve ECC memory bits

The /memory device tree bindings allow to store the ECC detection and correction bits through the ecc-detection-bits and ecc-correction-bits properties.

Our next patches rely on whether ECC is enabled, so let's add a helper to retrieve the ECC correction bits from the /memory node.

Signed-off-by: Maxime Ripard mripard@kernel.org --- include/linux/of.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)

diff --git a/include/linux/of.h b/include/linux/of.h index a0bedd038a05..2fbee65a7aa9 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -1510,10 +1510,35 @@ static inline int of_get_available_child_count(const struct device_node *np) num++;

return num; }

+/** + * of_memory_get_ecc_correction_bits() - Returns the number of ECC correction bits + * + * Search for the number of bits in memory that can be corrected by the + * ECC algorithm. + * + * Returns: + * The number of ECC bits, 0 if there's no ECC support, a negative error + * code on failure. + */ +static inline int of_memory_get_ecc_correction_bits(void) +{ + struct device_node *mem; + u32 val = 0; + + mem = of_find_node_by_path("/memory"); + if (!mem) + return -ENODEV; + + of_property_read_u32(mem, "ecc-correction-bits", &val); + of_node_put(mem); + + return val; +} + #define _OF_DECLARE_STUB(table, name, compat, fn, fn_type) \ static const struct of_device_id __of_table_##name \ __attribute__((unused)) \ = { .compatible = compat, \ .data = (fn == (fn_type)NULL) ? fn : fn }

-- 2.44.0

Maxime Ripard

1:56 p.m.

New subject: [PATCH 3/8] dma-buf: heaps: Import uAPI header

The uAPI header has a bunch of constants and structures that might be handy in drivers.

Let's include the header in the driver-side dma-heap header.

Signed-off-by: Maxime Ripard mripard@kernel.org --- include/linux/dma-heap.h | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h index 0c05561cad6e..e7cf110c5fdc 100644 --- a/include/linux/dma-heap.h +++ b/include/linux/dma-heap.h @@ -10,10 +10,12 @@ #define _DMA_HEAPS_H

#include <linux/cdev.h> #include <linux/types.h>

+#include <uapi/linux/dma-heap.h> + struct dma_heap;

/** * struct dma_heap_ops - ops to operate on a given heap * @allocate: allocate dmabuf and return struct dma_buf ptr

-- 2.44.0

Christian König

16 May 16 May

8:14 a.m.

New subject: [PATCH 3/8] dma-buf: heaps: Import uAPI header

Am 15.05.24 um 15:56 schrieb Maxime Ripard:

...

The uAPI header has a bunch of constants and structures that might be handy in drivers.

Let's include the header in the driver-side dma-heap header.

Well as long as this header doesn't need any symbols from the uAPI itself I think that is a no-go.

Includes should only be applied for things which are really necessary and not because some driver might need it.

Regards, Christian.

...

Signed-off-by: Maxime Ripard mripard@kernel.org

include/linux/dma-heap.h | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h index 0c05561cad6e..e7cf110c5fdc 100644 --- a/include/linux/dma-heap.h +++ b/include/linux/dma-heap.h @@ -10,10 +10,12 @@ #define _DMA_HEAPS_H #include <linux/cdev.h> #include <linux/types.h> +#include <uapi/linux/dma-heap.h>

struct dma_heap;

/**

struct dma_heap_ops - ops to operate on a given heap

@allocate: allocate dmabuf and return struct dma_buf ptr

Maxime Ripard

15 May 15 May

1:56 p.m.

New subject: [PATCH 4/8] dma-buf: heaps: Add ECC protection flags

Some systems run with ECC enabled on the memory by default, but with some memory regions with ECC disabled to mitigate the downsides of enabling ECC (reduced performances, increased memory usage) for buffers that don't require it.

Let's create some allocation flags to ask for a particular ECC setup when allocate a dma-buf through a heap.

Signed-off-by: Maxime Ripard mripard@kernel.org --- drivers/dma-buf/dma-heap.c | 4 ++++ include/uapi/linux/dma-heap.h | 5 +++-- 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c index 84ae708fafe7..a96c1865b627 100644 --- a/drivers/dma-buf/dma-heap.c +++ b/drivers/dma-buf/dma-heap.c @@ -106,10 +106,14 @@ static long dma_heap_ioctl_allocate(struct file *file, void *data) return -EINVAL;

if (heap_allocation->heap_flags & ~DMA_HEAP_VALID_HEAP_FLAGS) return -EINVAL;

+ if ((heap_allocation->heap_flags & DMA_HEAP_FLAG_ECC_PROTECTED) && + (heap_allocation->heap_flags & DMA_HEAP_FLAG_ECC_UNPROTECTED)) + return -EINVAL; + fd = dma_heap_buffer_alloc(heap, heap_allocation->len, heap_allocation->fd_flags, heap_allocation->heap_flags); if (fd < 0) return fd; diff --git a/include/uapi/linux/dma-heap.h b/include/uapi/linux/dma-heap.h index 6f84fa08e074..677b6355552d 100644 --- a/include/uapi/linux/dma-heap.h +++ b/include/uapi/linux/dma-heap.h @@ -16,12 +16,13 @@ */

/* Valid FD_FLAGS are O_CLOEXEC, O_RDONLY, O_WRONLY, O_RDWR */ #define DMA_HEAP_VALID_FD_FLAGS (O_CLOEXEC | O_ACCMODE)

-/* Currently no heap flags */ -#define DMA_HEAP_VALID_HEAP_FLAGS (0) +#define DMA_HEAP_FLAG_ECC_PROTECTED BIT(0) +#define DMA_HEAP_FLAG_ECC_UNPROTECTED BIT(1) +#define DMA_HEAP_VALID_HEAP_FLAGS (DMA_HEAP_FLAG_ECC_PROTECTED | DMA_HEAP_FLAG_ECC_UNPROTECTED)

/** * struct dma_heap_allocation_data - metadata passed from userspace for * allocations * @len: size of the allocation

-- 2.44.0

Maxime Ripard

1:57 p.m.

New subject: [PATCH 5/8] dma-buf: heaps: system: Remove global variable

The system heap has been using its struct dma_heap pointer but wasn't using it anywhere.

Since we'll need additional parameters to attach to that heap type, let's create a private structure and set it as the dma_heap drvdata, removing the global variable in the process.

Signed-off-by: Maxime Ripard mripard@kernel.org --- drivers/dma-buf/heaps/system_heap.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c index 9076d47ed2ef..8b5e6344eea4 100644 --- a/drivers/dma-buf/heaps/system_heap.c +++ b/drivers/dma-buf/heaps/system_heap.c @@ -19,11 +19,13 @@ #include <linux/module.h> #include <linux/scatterlist.h> #include <linux/slab.h> #include <linux/vmalloc.h>

-static struct dma_heap *sys_heap; +struct system_heap { + struct dma_heap *heap; +};

struct system_heap_buffer { struct dma_heap *heap; struct list_head attachments; struct mutex lock; @@ -422,17 +424,22 @@ static const struct dma_heap_ops system_heap_ops = { };

static int system_heap_create(void) { struct dma_heap_export_info exp_info; + struct system_heap *sys_heap; + + sys_heap = kzalloc(sizeof(*sys_heap), GFP_KERNEL); + if (!sys_heap) + return -ENOMEM;

exp_info.name = "system"; exp_info.ops = &system_heap_ops; - exp_info.priv = NULL; + exp_info.priv = sys_heap;

- sys_heap = dma_heap_add(&exp_info); - if (IS_ERR(sys_heap)) - return PTR_ERR(sys_heap); + sys_heap->heap = dma_heap_add(&exp_info); + if (IS_ERR(sys_heap->heap)) + return PTR_ERR(sys_heap->heap);

return 0; } module_init(system_heap_create);

-- 2.44.0

Maxime Ripard

1:57 p.m.

New subject: [PATCH 6/8] dma-buf: heaps: system: Handle ECC flags

Now that we have introduced ECC-related flags for the dma-heaps buffer allocations, let's honour these flags depending on the memory setup.

Signed-off-by: Maxime Ripard mripard@kernel.org --- drivers/dma-buf/heaps/system_heap.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c index 8b5e6344eea4..dd7c8b6f9cf6 100644 --- a/drivers/dma-buf/heaps/system_heap.c +++ b/drivers/dma-buf/heaps/system_heap.c @@ -18,13 +18,15 @@ #include <linux/mm.h> #include <linux/module.h> #include <linux/scatterlist.h> #include <linux/slab.h> #include <linux/vmalloc.h> +#include <linux/of.h>

struct system_heap { struct dma_heap *heap; + bool ecc_enabled; };

struct system_heap_buffer { struct dma_heap *heap; struct list_head attachments; @@ -336,10 +338,11 @@ static struct page *alloc_largest_available(unsigned long size, static struct dma_buf *system_heap_allocate(struct dma_heap *heap, unsigned long len, unsigned long fd_flags, unsigned long heap_flags) { + struct system_heap *sys_heap = dma_heap_get_drvdata(heap); struct system_heap_buffer *buffer; DEFINE_DMA_BUF_EXPORT_INFO(exp_info); unsigned long size_remaining = len; unsigned int max_order = orders[0]; struct dma_buf *dmabuf; @@ -347,10 +350,16 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap, struct scatterlist *sg; struct list_head pages; struct page *page, *tmp_page; int i, ret = -ENOMEM;

+ if (!sys_heap->ecc_enabled && (heap_flags & DMA_HEAP_FLAG_ECC_PROTECTED)) + return ERR_PTR(-EINVAL); + + if (sys_heap->ecc_enabled && (heap_flags & DMA_HEAP_FLAG_ECC_UNPROTECTED)) + return ERR_PTR(-EINVAL); + buffer = kzalloc(sizeof(*buffer), GFP_KERNEL); if (!buffer) return ERR_PTR(-ENOMEM);

INIT_LIST_HEAD(&buffer->attachments); @@ -430,10 +439,13 @@ static int system_heap_create(void)

sys_heap = kzalloc(sizeof(*sys_heap), GFP_KERNEL); if (!sys_heap) return -ENOMEM;

+ if (of_memory_get_ecc_correction_bits() > 0) + sys_heap->ecc_enabled = true; + exp_info.name = "system"; exp_info.ops = &system_heap_ops; exp_info.priv = sys_heap;

sys_heap->heap = dma_heap_add(&exp_info);

-- 2.44.0

Maxime Ripard

1:57 p.m.

New subject: [PATCH 7/8] dma-buf: heaps: cma: Handle ECC flags

Now that we have introduced ECC-related flags for the dma-heaps buffer allocations, let's honour these flags depending on the memory setup.

Signed-off-by: Maxime Ripard mripard@kernel.org --- drivers/dma-buf/heaps/cma_heap.c | 10 ++++++++++ 1 file changed, 10 insertions(+)

diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c index 4a63567e93ba..1e6babbd8eb5 100644 --- a/drivers/dma-buf/heaps/cma_heap.c +++ b/drivers/dma-buf/heaps/cma_heap.c @@ -24,10 +24,11 @@

struct cma_heap { struct dma_heap *heap; struct cma *cma; + bool ecc_enabled; };

struct cma_heap_buffer { struct cma_heap *heap; struct list_head attachments; @@ -286,10 +287,16 @@ static struct dma_buf *cma_heap_allocate(struct dma_heap *heap, struct page *cma_pages; struct dma_buf *dmabuf; int ret = -ENOMEM; pgoff_t pg;

+ if (!cma_heap->ecc_enabled && (heap_flags & DMA_HEAP_FLAG_ECC_PROTECTED)) + return -EINVAL; + + if (cma_heap->ecc_enabled && (heap_flags & DMA_HEAP_FLAG_ECC_UNPROTECTED)) + return -EINVAL; + buffer = kzalloc(sizeof(*buffer), GFP_KERNEL); if (!buffer) return ERR_PTR(-ENOMEM);

INIT_LIST_HEAD(&buffer->attachments); @@ -374,10 +381,13 @@ static int __add_cma_heap(struct cma *cma, void *data) cma_heap = kzalloc(sizeof(*cma_heap), GFP_KERNEL); if (!cma_heap) return -ENOMEM; cma_heap->cma = cma;

+ if (of_memory_get_ecc_correction_bits() > 0) + cma_heap->ecc_enabled = true; + exp_info.name = cma_get_name(cma); exp_info.ops = &cma_heap_ops; exp_info.priv = cma_heap;

cma_heap->heap = dma_heap_add(&exp_info);

-- 2.44.0

kernel test robot

16 May 16 May

5:55 a.m.

New subject: [PATCH 7/8] dma-buf: heaps: cma: Handle ECC flags

Hi Maxime,

kernel test robot noticed the following build warnings:

[auto build test WARNING on a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6]

url: https://github.com/intel-lab-lkp/linux/commits/Maxime-Ripard/dma-buf-heaps-I... base: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6 patch link: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-7-54cbbd049511%40kern... patch subject: [PATCH 7/8] dma-buf: heaps: cma: Handle ECC flags config: mips-allmodconfig (https://download.01.org/0day-ci/archive/20240516/202405161341.XBePS2s0-lkp@i...) compiler: mips-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240516/202405161341.XBePS2s0-lkp@i...)

If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot lkp@intel.com | Closes: https://lore.kernel.org/oe-kbuild-all/202405161341.XBePS2s0-lkp@intel.com/

All warnings (new ones prefixed by >>):

drivers/dma-buf/heaps/cma_heap.c: In function 'cma_heap_allocate':

...

...
drivers/dma-buf/heaps/cma_heap.c:293:24: warning: returning 'int' from a function with return type 'struct dma_buf *' makes pointer from integer without a cast [-Wint-conversion]

293 | return -EINVAL; | ^ drivers/dma-buf/heaps/cma_heap.c:296:24: warning: returning 'int' from a function with return type 'struct dma_buf *' makes pointer from integer without a cast [-Wint-conversion] 296 | return -EINVAL; | ^ drivers/dma-buf/heaps/cma_heap.c: In function '__add_cma_heap': drivers/dma-buf/heaps/cma_heap.c:386:13: error: implicit declaration of function 'of_memory_get_ecc_correction_bits' [-Werror=implicit-function-declaration] 386 | if (of_memory_get_ecc_correction_bits() > 0) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors

vim +293 drivers/dma-buf/heaps/cma_heap.c

275 276 static struct dma_buf *cma_heap_allocate(struct dma_heap *heap, 277 unsigned long len, 278 unsigned long fd_flags, 279 unsigned long heap_flags) 280 { 281 struct cma_heap *cma_heap = dma_heap_get_drvdata(heap); 282 struct cma_heap_buffer *buffer; 283 DEFINE_DMA_BUF_EXPORT_INFO(exp_info); 284 size_t size = PAGE_ALIGN(len); 285 pgoff_t pagecount = size >> PAGE_SHIFT; 286 unsigned long align = get_order(size); 287 struct page *cma_pages; 288 struct dma_buf *dmabuf; 289 int ret = -ENOMEM; 290 pgoff_t pg; 291 292 if (!cma_heap->ecc_enabled && (heap_flags & DMA_HEAP_FLAG_ECC_PROTECTED))

...

293 return -EINVAL;

294 295 if (cma_heap->ecc_enabled && (heap_flags & DMA_HEAP_FLAG_ECC_UNPROTECTED)) 296 return -EINVAL; 297 298 buffer = kzalloc(sizeof(*buffer), GFP_KERNEL); 299 if (!buffer) 300 return ERR_PTR(-ENOMEM); 301 302 INIT_LIST_HEAD(&buffer->attachments); 303 mutex_init(&buffer->lock); 304 buffer->len = size; 305 306 if (align > CONFIG_CMA_ALIGNMENT) 307 align = CONFIG_CMA_ALIGNMENT; 308 309 cma_pages = cma_alloc(cma_heap->cma, pagecount, align, false); 310 if (!cma_pages) 311 goto free_buffer; 312 313 /* Clear the cma pages */ 314 if (PageHighMem(cma_pages)) { 315 unsigned long nr_clear_pages = pagecount; 316 struct page *page = cma_pages; 317 318 while (nr_clear_pages > 0) { 319 void *vaddr = kmap_atomic(page); 320 321 memset(vaddr, 0, PAGE_SIZE); 322 kunmap_atomic(vaddr); 323 /* 324 * Avoid wasting time zeroing memory if the process 325 * has been killed by by SIGKILL 326 */ 327 if (fatal_signal_pending(current)) 328 goto free_cma; 329 page++; 330 nr_clear_pages--; 331 } 332 } else { 333 memset(page_address(cma_pages), 0, size); 334 } 335 336 buffer->pages = kmalloc_array(pagecount, sizeof(*buffer->pages), GFP_KERNEL); 337 if (!buffer->pages) { 338 ret = -ENOMEM; 339 goto free_cma; 340 } 341 342 for (pg = 0; pg < pagecount; pg++) 343 buffer->pages[pg] = &cma_pages[pg]; 344 345 buffer->cma_pages = cma_pages; 346 buffer->heap = cma_heap; 347 buffer->pagecount = pagecount; 348 349 /* create the dmabuf */ 350 exp_info.exp_name = dma_heap_get_name(heap); 351 exp_info.ops = &cma_heap_buf_ops; 352 exp_info.size = buffer->len; 353 exp_info.flags = fd_flags; 354 exp_info.priv = buffer; 355 dmabuf = dma_buf_export(&exp_info); 356 if (IS_ERR(dmabuf)) { 357 ret = PTR_ERR(dmabuf); 358 goto free_pages; 359 } 360 return dmabuf; 361 362 free_pages: 363 kfree(buffer->pages); 364 free_cma: 365 cma_release(cma_heap->cma, cma_pages, pagecount); 366 free_buffer: 367 kfree(buffer); 368 369 return ERR_PTR(ret); 370 } 371

-- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

kernel test robot

12:48 p.m.

New subject: [PATCH 7/8] dma-buf: heaps: cma: Handle ECC flags

Hi Maxime,

kernel test robot noticed the following build errors:

[auto build test ERROR on a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6]

url: https://github.com/intel-lab-lkp/linux/commits/Maxime-Ripard/dma-buf-heaps-I... base: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6 patch link: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-7-54cbbd049511%40kern... patch subject: [PATCH 7/8] dma-buf: heaps: cma: Handle ECC flags config: mips-allmodconfig (https://download.01.org/0day-ci/archive/20240516/202405162048.CExrV8yy-lkp@i...) compiler: mips-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240516/202405162048.CExrV8yy-lkp@i...)

All errors (new ones prefixed by >>):

drivers/dma-buf/heaps/cma_heap.c: In function 'cma_heap_allocate': drivers/dma-buf/heaps/cma_heap.c:293:24: warning: returning 'int' from a function with return type 'struct dma_buf *' makes pointer from integer without a cast [-Wint-conversion] 293 | return -EINVAL; | ^ drivers/dma-buf/heaps/cma_heap.c:296:24: warning: returning 'int' from a function with return type 'struct dma_buf *' makes pointer from integer without a cast [-Wint-conversion] 296 | return -EINVAL; | ^ drivers/dma-buf/heaps/cma_heap.c: In function '__add_cma_heap':

...

...
drivers/dma-buf/heaps/cma_heap.c:386:13: error: implicit declaration of function 'of_memory_get_ecc_correction_bits' [-Werror=implicit-function-declaration]

386 | if (of_memory_get_ecc_correction_bits() > 0) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors

vim +/of_memory_get_ecc_correction_bits +386 drivers/dma-buf/heaps/cma_heap.c

...

293 return -EINVAL;

...

386 if (of_memory_get_ecc_correction_bits() > 0)

387 cma_heap->ecc_enabled = true; 388 389 exp_info.name = cma_get_name(cma); 390 exp_info.ops = &cma_heap_ops; 391 exp_info.priv = cma_heap; 392 393 cma_heap->heap = dma_heap_add(&exp_info); 394 if (IS_ERR(cma_heap->heap)) { 395 int ret = PTR_ERR(cma_heap->heap); 396 397 kfree(cma_heap); 398 return ret; 399 } 400 401 return 0; 402 } 403

-- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

Maxime Ripard

15 May 15 May

1:57 p.m.

New subject: [PATCH 8/8] dma-buf: heaps: carveout: Handle ECC flags

Now that we have introduced ECC-related flags for the dma-heaps buffer allocations, let's honour these flags depending on the memory setup.

Signed-off-by: Maxime Ripard mripard@kernel.org --- drivers/dma-buf/heaps/carveout_heap.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)

diff --git a/drivers/dma-buf/heaps/carveout_heap.c b/drivers/dma-buf/heaps/carveout_heap.c index 896ca67e6bd9..81b167785999 100644 --- a/drivers/dma-buf/heaps/carveout_heap.c +++ b/drivers/dma-buf/heaps/carveout_heap.c @@ -6,10 +6,11 @@ #include <linux/of_reserved_mem.h>

struct carveout_heap_priv { struct dma_heap *heap; struct gen_pool *pool; + bool ecc_enabled; };

struct carveout_heap_buffer_priv { struct mutex lock; struct list_head attachments; @@ -182,10 +183,16 @@ static struct dma_buf *carveout_heap_allocate(struct dma_heap *heap, struct dma_buf *buf; dma_addr_t daddr; void *buffer; int ret;

+ if (!heap_priv->ecc_enabled && (heap_flags & DMA_HEAP_FLAG_ECC_PROTECTED)) + return ERR_PTR(-EINVAL); + + if (heap_priv->ecc_enabled && (heap_flags & DMA_HEAP_FLAG_ECC_UNPROTECTED)) + return ERR_PTR(-EINVAL); + buffer_priv = kzalloc(sizeof(*buffer_priv), GFP_KERNEL); if (!buffer_priv) return ERR_PTR(-ENOMEM);

INIT_LIST_HEAD(&buffer_priv->attachments); @@ -235,20 +242,29 @@ static int __init carveout_heap_setup(struct device_node *node) const struct reserved_mem *rmem; struct carveout_heap_priv *priv; struct dma_heap *heap; struct gen_pool *pool; void *base; + u32 val = 0; int ret;

rmem = of_reserved_mem_lookup(node); if (!rmem) return -EINVAL;

priv = kzalloc(sizeof(*priv), GFP_KERNEL); if (!priv) return -ENOMEM;

+ of_property_read_u32(node, "ecc-correction-bits", &val); + if (val <= 0) { + if (of_memory_get_ecc_correction_bits() > 0) + priv->ecc_enabled = true; + } else { + priv->ecc_enabled = true; + } + pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE); if (!pool) { ret = -ENOMEM; goto err_cleanup_heap; }

-- 2.44.0

John Stultz

6:42 p.m.

On Wed, May 15, 2024 at 6:57 AM Maxime Ripard mripard@kernel.org wrote:

...

This series is the follow-up of the discussion that John and I had a few months ago here:

https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrro...

The initial problem we were discussing was that I'm currently working on a platform which has a memory layout with ECC enabled. However, enabling the ECC has a number of drawbacks on that platform: lower performance, increased memory usage, etc. So for things like framebuffers, the trade-off isn't great and thus there's a memory region with ECC disabled to allocate from for such use cases.

After a suggestion from John, I chose to start using heap allocations flags to allow for userspace to ask for a particular ECC setup. This is then backed by a new heap type that runs from reserved memory chunks flagged as such, and the existing DT properties to specify the ECC properties.

We could also easily extend this mechanism to support more flags, or through a new ioctl to discover which flags a given heap supports.

Hey! Thanks for sending this along! I'm eager to see more heap related work being done upstream.

The only thing that makes me a bit hesitant, is the introduction of allocation flags (as opposed to a uniquely specified/named "ecc" heap).

We did talk about this earlier, and my earlier press that only if the ECC flag was general enough to apply to the majority of heaps then it makes sense as a flag, and your patch here does apply it to all the heaps. So I don't have an objection.

But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another thing to discuss, that I didn't see in your mail: Do we have an open-source user of this new flag?

thanks -john

Daniel Vetter

16 May 16 May

10:56 a.m.

On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...

On Wed, May 15, 2024 at 6:57 AM Maxime Ripard mripard@kernel.org wrote:

...
This series is the follow-up of the discussion that John and I had a few months ago here:

https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrro...

The initial problem we were discussing was that I'm currently working on a platform which has a memory layout with ECC enabled. However, enabling the ECC has a number of drawbacks on that platform: lower performance, increased memory usage, etc. So for things like framebuffers, the trade-off isn't great and thus there's a memory region with ECC disabled to allocate from for such use cases.

After a suggestion from John, I chose to start using heap allocations flags to allow for userspace to ask for a particular ECC setup. This is then backed by a new heap type that runs from reserved memory chunks flagged as such, and the existing DT properties to specify the ECC properties.

We could also easily extend this mechanism to support more flags, or through a new ioctl to discover which flags a given heap supports.

Hey! Thanks for sending this along! I'm eager to see more heap related work being done upstream.

The only thing that makes me a bit hesitant, is the introduction of allocation flags (as opposed to a uniquely specified/named "ecc" heap).

We did talk about this earlier, and my earlier press that only if the ECC flag was general enough to apply to the majority of heaps then it makes sense as a flag, and your patch here does apply it to all the heaps. So I don't have an objection.

But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

...

Another thing to discuss, that I didn't see in your mail: Do we have an open-source user of this new flag?

I think one option might be to just start using these internally, but not sure the dma-api would understand a fallback cadence of allocators (afaik you can specify specific cma regions already, but that doesn't really covere the case where you can fall back to pages and iommu to remap to contig dma space) ... And I don't think abandonding the dma-api for allocating cma buffers is going to be a popular proposal. -Sima

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Maxime Ripard

12:41 p.m.

On Thu, May 16, 2024 at 12:56:27PM +0200, Daniel Vetter wrote:

...

On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
On Wed, May 15, 2024 at 6:57 AM Maxime Ripard mripard@kernel.org wrote:

...
This series is the follow-up of the discussion that John and I had a few months ago here:

https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrro...

The initial problem we were discussing was that I'm currently working on a platform which has a memory layout with ECC enabled. However, enabling the ECC has a number of drawbacks on that platform: lower performance, increased memory usage, etc. So for things like framebuffers, the trade-off isn't great and thus there's a memory region with ECC disabled to allocate from for such use cases.

After a suggestion from John, I chose to start using heap allocations flags to allow for userspace to ask for a particular ECC setup. This is then backed by a new heap type that runs from reserved memory chunks flagged as such, and the existing DT properties to specify the ECC properties.

We could also easily extend this mechanism to support more flags, or through a new ioctl to discover which flags a given heap supports.

Hey! Thanks for sending this along! I'm eager to see more heap related work being done upstream.

The only thing that makes me a bit hesitant, is the introduction of allocation flags (as opposed to a uniquely specified/named "ecc" heap).

We did talk about this earlier, and my earlier press that only if the ECC flag was general enough to apply to the majority of heaps then it makes sense as a flag, and your patch here does apply it to all the heaps. So I don't have an objection.

But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

I guess it depends what we want to use the heaps for exactly. If we create a heap by type, then the number of heaps is going to explode and their name is going to be super weird and inconsistent.

Using the ECC setup here as an example, it means that we would need to create system (with the default ECC setup for the system), system-ecc, system-no-ecc, cma, cma-ecc, cma-no-ecc.

Let's say we introduce caching next. do we want to triple the number of heaps again?

So I guess it all boils down to whether we want to consider heaps as allocators, and then we need the flags to fine-tune the attributes/exact semantics, or the combination of an allocator and the semantics which will make the number of heaps explode (and reduce their general usefulness, I guess).

Maxime

John Stultz

4:51 p.m.

On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote:

...

On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

thanks -john

Daniel Vetter

21 May 21 May

12:06 p.m.

On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:

...

On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Also wondering whether we should get the symlink/allocator idea off the ground first, but given that that hasn't moved in a decade it might be too much. But then the question is, what userspace are we going to use for all these new heaps (or heaps with new flags)?

Cheers, Sima

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Maxime Ripard

22 May 22 May

1:18 p.m.

On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...

On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:

...
On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

Right, but "not useable" is still kind of usage dependent, which reinforces the need for flags (and possibly some way to discover what the heap supports).

Like, if I just want to allocate a buffer for a single writeback frame, then I probably don't have the same requirements than a compositor that needs to output a frame at 120Hz.

The former probably doesn't care about the buffer attributes aside that it's accessible by the device. The latter probably can't make any kind of compromise over what kind of memory characteristics it uses.

If we look into the current discussions we have, a compositor would probably need a buffer without ECC, non-secure, and probably wouldn't care about caching and being physically contiguous.

Libcamera's SoftISP would probably require that the buffer is cacheable, non-secure, without ECC and might ask for physically contiguous buffers.

As we add more memory types / attributes, I think being able to discover and enforce a particular set of flags will be more and more important, even more so if we tie heaps to devices, because it just gives a hint about the memory being reachable from the device, but as you said, you can still get a buffer with shit performance that won't be what you want.

...

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Also wondering whether we should get the symlink/allocator idea off the ground first, but given that that hasn't moved in a decade it might be too much. But then the question is, what userspace are we going to use for all these new heaps (or heaps with new flags)?

For ECC here, the compositors are the obvious target. Which loops backs into the discussion with John. Do you consider dma-buf code have the same uapi requirements as DRM?

Maxime

Daniel Vetter

23 May 23 May

9:45 a.m.

On Wed, May 22, 2024 at 03:18:02PM +0200, Maxime Ripard wrote:

...

On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...
On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:

...
On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

Right, but "not useable" is still kind of usage dependent, which reinforces the need for flags (and possibly some way to discover what the heap supports).

Like, if I just want to allocate a buffer for a single writeback frame, then I probably don't have the same requirements than a compositor that needs to output a frame at 120Hz.

The former probably doesn't care about the buffer attributes aside that it's accessible by the device. The latter probably can't make any kind of compromise over what kind of memory characteristics it uses.

If we look into the current discussions we have, a compositor would probably need a buffer without ECC, non-secure, and probably wouldn't care about caching and being physically contiguous.

Libcamera's SoftISP would probably require that the buffer is cacheable, non-secure, without ECC and might ask for physically contiguous buffers.

As we add more memory types / attributes, I think being able to discover and enforce a particular set of flags will be more and more important, even more so if we tie heaps to devices, because it just gives a hint about the memory being reachable from the device, but as you said, you can still get a buffer with shit performance that won't be what you want.

...
So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Also wondering whether we should get the symlink/allocator idea off the ground first, but given that that hasn't moved in a decade it might be too much. But then the question is, what userspace are we going to use for all these new heaps (or heaps with new flags)?

For ECC here, the compositors are the obvious target. Which loops backs into the discussion with John. Do you consider dma-buf code have the same uapi requirements as DRM?

Imo yes, otherwise we'll get really funny stuff like people bypass drm's userspace requirement for e.g. content protected buffers by just shipping the feature in a dma-buf heap ... Been there, done that.

Also I think especially with interop across components there's a huge difference between a quick test program toy and the real thing. And dma-buf heaps are kinda all about cross component interop. -Sima

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Thierry Reding

28 Jun 28 Jun

11:29 a.m.

On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...

On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:

...
On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

For example, if an applications wants to use a protected buffer, the application doesn't (and shouldn't need to) care about whether the heap for that buffer supports ECC or is backed by CMA. All it really needs to know is that it's the system's "protected" heap.

This rather than try to represent every possible combination we basically make this a "configuration" issue. System designers need to settle on whatever combination of flags work for all the desired use- cases and then we expose that combination as a named heap.

One problem that this doesn't solve is that we still don't have a way of retrieving these flags in drivers which may need them. Perhaps one way to address this would be to add in-kernel APIs to allocate from a heap. That way a DRM/KMS driver (for example) could find a named heap, allocate from it and implicitly store flags about the heap/buffer. Or maybe we could add in-kernel API to retrieve flags, which would be a bit better than having to expose them to userspace.

Thierry

Maxime Ripard

1:08 p.m.

Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...

On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...
On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:

...
On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

...

Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

And if we allow any platform to change a given heap name, then a generic application won't be able to support that without some kind of platform-specific configuration.

...

For example, if an applications wants to use a protected buffer, the application doesn't (and shouldn't need to) care about whether the heap for that buffer supports ECC or is backed by CMA. All it really needs to know is that it's the system's "protected" heap.

I mean... "protected" very much means backed by CMA already, it's pretty much the only thing we document, and we call it as such in Kconfig.

But yeah, I agree that being backed by CMA is probably not what an application cares about (and we even have might some discussions about that), but if the ECC protection comes at a performance cost then it will very much care about it. Or if it comes with caches enabled or not.

...

This rather than try to represent every possible combination we basically make this a "configuration" issue. System designers need to settle on whatever combination of flags work for all the desired use- cases and then we expose that combination as a named heap.

This just pushes the problem down to applications, and carry the flags mentioned earlier in the heap name. So the same information, but harder to process or discover for an application.

...

One problem that this doesn't solve is that we still don't have a way of retrieving these flags in drivers which may need them.

I'm not sure drivers should actually need to allocate from heaps, but we could do it just like I suggested we'd do it for applications: we add a new function that allows to discover what a given heap capabilities are. And then we just have to iterate and choose the best suited for our needs.

Maxime

Thierry Reding

2:42 p.m.

On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...

Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...
On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...
On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:

...
On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

...

...
Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume. For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

...

And if we allow any platform to change a given heap name, then a generic application won't be able to support that without some kind of platform-specific configuration.

We could still standardize on common use-cases so that applications would know what heaps to allocate from. But there's also no need to arbitrarily restrict this. For example there could be cases that are very specific to a particular platform and which just doesn't exist anywhere else. Platform designers could then still use this mechanism to define that very particular heap and have a very specialized userspace application use that heap for their purpose.

...

...
For example, if an applications wants to use a protected buffer, the application doesn't (and shouldn't need to) care about whether the heap for that buffer supports ECC or is backed by CMA. All it really needs to know is that it's the system's "protected" heap.

I mean... "protected" very much means backed by CMA already, it's pretty much the only thing we document, and we call it as such in Kconfig.

Well, CMA is really just an implementation detail, right? It doesn't make sense to advertise that to anything outside the kernel. Maybe it's an interesting fact that buffers allocated from these heaps will be physically contiguous? In the majority of cases that's probably not even something that matters because we get a DMA-BUF anyway and we can map that any way we want.

Irrespective of that, physically contigous buffers could be allocated in any number of ways, CMA is just a convenient implementation of one such allocator.

...

But yeah, I agree that being backed by CMA is probably not what an application cares about (and we even have might some discussions about that), but if the ECC protection comes at a performance cost then it will very much care about it. Or if it comes with caches enabled or not.

True, no doubt about that. However, I'm saying there may be advantages in hiding all of this from applications. Let's say we're trying to implement video decoding. We can create a special "protected-video" heap that is specifically designed to allocate encrypted/protected scanout buffers from.

When you design that system, you would most certainly not enable ECC protection on that heap because it leads to bad performance. You would also want to make sure that all of the buffers in that heap are cached and whatever other optimizations your chip may provide.

Your application doesn't have to care about this, though, because it can simply look for a heap named "protected-video" and allocate buffers from it.

...

...
This rather than try to represent every possible combination we basically make this a "configuration" issue. System designers need to settle on whatever combination of flags work for all the desired use- cases and then we expose that combination as a named heap.

This just pushes the problem down to applications, and carry the flags mentioned earlier in the heap name. So the same information, but harder to process or discover for an application.

Yes, this pushes the problem down to the application. But given the above I don't think it becomes at all hard to process. We may sacrifice some flexibility, but I'm arguing that it's flexibility that we don't need anyway.

...

...
One problem that this doesn't solve is that we still don't have a way of retrieving these flags in drivers which may need them.

I'm not sure drivers should actually need to allocate from heaps, but we could do it just like I suggested we'd do it for applications: we add a new function that allows to discover what a given heap capabilities are. And then we just have to iterate and choose the best suited for our needs.

Yeah, that's an interesting option as well. I think contrary to userspace it makes more sense to work off of a set of flags at the kernel level.

The obvious downside to this is that userspace now also needs driver- specific implementations for the allocation. Similar to the above it gives us a lot of flexibility at the cost of simplicity.

Thierry

Maxime Ripard

4 Jul 4 Jul

12:24 p.m.

On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:

...

On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...
Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...
On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...
On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:

...
On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote:

...
On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote: > But it makes me a little nervous to add a new generic allocation flag > for a feature most hardware doesn't support (yet, at least). So it's > hard to weigh how common the actual usage will be across all the > heaps. > > I apologize as my worry is mostly born out of seeing vendors really > push opaque feature flags in their old ion heaps, so in providing a > flags argument, it was mostly intended as an escape hatch for > obviously common attributes. So having the first be something that > seems reasonable, but isn't actually that common makes me fret some. > > So again, not an objection, just something for folks to stew on to > make sure this is really the right approach.

Another good reason to go with full heap names instead of opaque flags on existing heaps is that with the former we can use symlinks in sysfs to specify heaps, with the latter we need a new idea. We haven't yet gotten around to implement this anywhere, but it's been in the dma-buf/heap todo since forever, and I like it as a design approach. So would be a good idea to not toss it. With that display would have symlinks to cma-ecc and cma, and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

If the names are opaque, and we don't publish what the exact capabilities are, how can an application figure out which heap to use in the first place?

...

...
...
Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume.

We don't have any specific heap upstream at the moment, only generic ones.

...

For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

Right. But also, libcamera has started to use dma-heaps to allocate dma-capable buffers and do software processing on it before sending it to some hardware controller.

Caches are critical here, and getting a non-cacheable buffer would be a clear regression.

How can it know which heap to allocate from on a given platform?

Similarly with the ECC support we started that discussion with. ECC will introduce a significant performance cost. How can a generic application, such as a compositor, will know which heap to allocate from without:

a) Trying to bundle up a list of heaps for each platform it might or might not run

b) and handling the name difference between BSPs and mainline.

If some hardware-specific applications / middleware want to take a shortcut and use the name, that's fine. But we need to find a way for generic applications to discover which heap is best suited for their needs without the name.

...

...
And if we allow any platform to change a given heap name, then a generic application won't be able to support that without some kind of platform-specific configuration.

We could still standardize on common use-cases so that applications would know what heaps to allocate from. But there's also no need to arbitrarily restrict this. For example there could be cases that are very specific to a particular platform and which just doesn't exist anywhere else. Platform designers could then still use this mechanism to define that very particular heap and have a very specialized userspace application use that heap for their purpose.

We could just add a different capabitily flag to make sure those would get ignored.

...

...
...
For example, if an applications wants to use a protected buffer, the application doesn't (and shouldn't need to) care about whether the heap for that buffer supports ECC or is backed by CMA. All it really needs to know is that it's the system's "protected" heap.

I mean... "protected" very much means backed by CMA already, it's pretty much the only thing we document, and we call it as such in Kconfig.

Well, CMA is really just an implementation detail, right? It doesn't make sense to advertise that to anything outside the kernel. Maybe it's an interesting fact that buffers allocated from these heaps will be physically contiguous?

CMA itself might be an implementation detail, but it's still right there in the name on ARM.

And being able to get physically contiguous buffers is critical on platforms without an IOMMU.

...

In the majority of cases that's probably not even something that matters because we get a DMA-BUF anyway and we can map that any way we want.

Irrespective of that, physically contigous buffers could be allocated in any number of ways, CMA is just a convenient implementation of one such allocator.

...
But yeah, I agree that being backed by CMA is probably not what an application cares about (and we even have might some discussions about that), but if the ECC protection comes at a performance cost then it will very much care about it. Or if it comes with caches enabled or not.

True, no doubt about that. However, I'm saying there may be advantages in hiding all of this from applications. Let's say we're trying to implement video decoding. We can create a special "protected-video" heap that is specifically designed to allocate encrypted/protected scanout buffers from.

When you design that system, you would most certainly not enable ECC protection on that heap because it leads to bad performance. You would also want to make sure that all of the buffers in that heap are cached and whatever other optimizations your chip may provide.

Your application doesn't have to care about this, though, because it can simply look for a heap named "protected-video" and allocate buffers from it.

I mean, I disagree. Or rather, in an environment where you have a system architect, and the application is targeted for a particular system only, and where "protected-video" means whatever the team decided in general, yeah, that works.

So, in a BSP or Android, that works fine.

On a mainline based system, with generic stacks like libcamera, it just doesn't fly anymore.

Let's use the two heaps we currently support: their name isn't stable across architectures, nobody ever documented the set of attributes that particular heap has, and since it's not documented, good luck trying to avoid regressions.

So, today, with a very limited number of heaps and no vendor involvement so far, the "let's just use the name" policy doesn't work already.

Maxime

Thierry Reding

5 Jul 5 Jul

2:31 p.m.

On Thu, Jul 04, 2024 at 02:24:49PM GMT, Maxime Ripard wrote:

...

On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:

...
On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...
Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...
On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...
On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote:

...
On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote: > On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote: > > But it makes me a little nervous to add a new generic allocation flag > > for a feature most hardware doesn't support (yet, at least). So it's > > hard to weigh how common the actual usage will be across all the > > heaps. > > > > I apologize as my worry is mostly born out of seeing vendors really > > push opaque feature flags in their old ion heaps, so in providing a > > flags argument, it was mostly intended as an escape hatch for > > obviously common attributes. So having the first be something that > > seems reasonable, but isn't actually that common makes me fret some. > > > > So again, not an objection, just something for folks to stew on to > > make sure this is really the right approach. > > Another good reason to go with full heap names instead of opaque flags on > existing heaps is that with the former we can use symlinks in sysfs to > specify heaps, with the latter we need a new idea. We haven't yet gotten > around to implement this anywhere, but it's been in the dma-buf/heap todo > since forever, and I like it as a design approach. So would be a good idea > to not toss it. With that display would have symlinks to cma-ecc and cma, > and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a > SoC where the display needs contig memory for scanout.

So indeed that is a good point to keep in mind, but I also think it might re-inforce the choice of having ECC as a flag here.

Since my understanding of the sysfs symlinks to heaps idea is about being able to figure out a common heap from a collection of devices, it's really about the ability for the driver to access the type of memory. If ECC is just an attribute of the type of memory (as in this patch series), it being on or off won't necessarily affect compatibility of the buffer with the device. Similarly "uncached" seems more of an attribute of memory type and not a type itself. Hardware that can access non-contiguous "system" buffers can access uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

If the names are opaque, and we don't publish what the exact capabilities are, how can an application figure out which heap to use in the first place?

This would need to be based on conventions. The idea is to standardize on a set of names for specific, well-known use-cases.

...

...
...
...
Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume.

We don't have any specific heap upstream at the moment, only generic ones.

But we're trying to add more specific ones, right?

...

...
For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

Right. But also, libcamera has started to use dma-heaps to allocate dma-capable buffers and do software processing on it before sending it to some hardware controller.

Caches are critical here, and getting a non-cacheable buffer would be a clear regression.

I understand that. My point is that maybe we shouldn't try to design a complex mechanism that allows full discoverability of everything that a heap supports or is capable of. Instead if the camera has specific requirements, it could look for a heap named "camera". Or if it can share a heap with other multimedia devices, maybe call the heap "multimedia".

The idea is that heaps for these use-cases are quite specific, so you would likely not find an arbitrary number of processes try to use the same heap.

...

How can it know which heap to allocate from on a given platform?

Similarly with the ECC support we started that discussion with. ECC will introduce a significant performance cost. How can a generic application, such as a compositor, will know which heap to allocate from without:

a) Trying to bundle up a list of heaps for each platform it might or might not run

b) and handling the name difference between BSPs and mainline.

Obviously some standardization of heap names is a requirement here, otherwise such a proposal does indeed not make sense.

...

If some hardware-specific applications / middleware want to take a shortcut and use the name, that's fine. But we need to find a way for generic applications to discover which heap is best suited for their needs without the name.

You can still have fairly generic names for heaps. If you want protected content, you could try to use a standard "video-protected" heap. If you need ECC protected memory, maybe you want to allocate from a heap named "safety", or whatever.

...

...
...
And if we allow any platform to change a given heap name, then a generic application won't be able to support that without some kind of platform-specific configuration.

We could still standardize on common use-cases so that applications would know what heaps to allocate from. But there's also no need to arbitrarily restrict this. For example there could be cases that are very specific to a particular platform and which just doesn't exist anywhere else. Platform designers could then still use this mechanism to define that very particular heap and have a very specialized userspace application use that heap for their purpose.

We could just add a different capabitily flag to make sure those would get ignored.

Sure you can do all of this with a myriad of flags. But again, I'm trying to argue that we may not need this additional complexity. In a typical system, how many heaps do you encounter? You may need a generic one and then perhaps a handful specific ones? Or do you need more?

...

...
...
...
For example, if an applications wants to use a protected buffer, the application doesn't (and shouldn't need to) care about whether the heap for that buffer supports ECC or is backed by CMA. All it really needs to know is that it's the system's "protected" heap.

I mean... "protected" very much means backed by CMA already, it's pretty much the only thing we document, and we call it as such in Kconfig.

Well, CMA is really just an implementation detail, right? It doesn't make sense to advertise that to anything outside the kernel. Maybe it's an interesting fact that buffers allocated from these heaps will be physically contiguous?

CMA itself might be an implementation detail, but it's still right there in the name on ARM.

That doesn't mean we can do something more useful going forward (and perhaps symlink for backwards-compatibility if needed).

...

And being able to get physically contiguous buffers is critical on platforms without an IOMMU.

Again, I'm not trying to dispute the necessity of contiguous buffers. I'm trying to say that contextual names can be a viable alternative to full discoverability. If you want contiguous buffers, go call the heap "contiguous" and it's quite clear what it means.

You can even hide details such as IOMMU availability from userspace that way. On a system where an IOMMU is present, you could for example go and use IOMMU-backed memory in a "contiguous" heap, while on a system without an IOMMU the memory for the "contiguous" heap could come from CMA.

...

...
In the majority of cases that's probably not even something that matters because we get a DMA-BUF anyway and we can map that any way we want.

Irrespective of that, physically contigous buffers could be allocated in any number of ways, CMA is just a convenient implementation of one such allocator.

...
But yeah, I agree that being backed by CMA is probably not what an application cares about (and we even have might some discussions about that), but if the ECC protection comes at a performance cost then it will very much care about it. Or if it comes with caches enabled or not.

True, no doubt about that. However, I'm saying there may be advantages in hiding all of this from applications. Let's say we're trying to implement video decoding. We can create a special "protected-video" heap that is specifically designed to allocate encrypted/protected scanout buffers from.

When you design that system, you would most certainly not enable ECC protection on that heap because it leads to bad performance. You would also want to make sure that all of the buffers in that heap are cached and whatever other optimizations your chip may provide.

Your application doesn't have to care about this, though, because it can simply look for a heap named "protected-video" and allocate buffers from it.

I mean, I disagree. Or rather, in an environment where you have a system architect, and the application is targeted for a particular system only, and where "protected-video" means whatever the team decided in general, yeah, that works.

So, in a BSP or Android, that works fine.

On a mainline based system, with generic stacks like libcamera, it just doesn't fly anymore.

I'm not sure I know of a system that isn't architected. Even very "generic" devices have a set of functionality that the manufacturer wanted the device to provide.

Aren't generic stacks not also build to provide a specific function? Again, libcamera could try to use a "camera" heap, or maybe it would fit into that "multimedia" category.

For truly generic systems you typically don't need any of this, right? A generic system like a PC usually gets by with just system memory and maybe video RAM for some specific cases.

Hardware where you need these extra heaps usually need them for very narrow use-cases.

...

Let's use the two heaps we currently support: their name isn't stable across architectures, nobody ever documented the set of attributes that particular heap has, and since it's not documented, good luck trying to avoid regressions.

So, today, with a very limited number of heaps and no vendor involvement so far, the "let's just use the name" policy doesn't work already.

I'm not sure I understand this argument. Today we don't expose flags to userspace either. Does that mean there's nothing we can do about it? Of course not.

Just because these heaps are currently suboptimally named doesn't mean that can't be changed. If we wanted we could forge ahead and standardize names and perhaps add backwards-compatible links for the existing ones.

Thierry

Daniel Vetter

3:35 p.m.

Just figured I'll jump in on one detail here.

On Fri, Jul 05, 2024 at 04:31:34PM +0200, Thierry Reding wrote:

...

On Thu, Jul 04, 2024 at 02:24:49PM GMT, Maxime Ripard wrote:

...
On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:

...
On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...
Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...
On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...
On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote: > On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote: > > On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote: > > > But it makes me a little nervous to add a new generic allocation flag > > > for a feature most hardware doesn't support (yet, at least). So it's > > > hard to weigh how common the actual usage will be across all the > > > heaps. > > > > > > I apologize as my worry is mostly born out of seeing vendors really > > > push opaque feature flags in their old ion heaps, so in providing a > > > flags argument, it was mostly intended as an escape hatch for > > > obviously common attributes. So having the first be something that > > > seems reasonable, but isn't actually that common makes me fret some. > > > > > > So again, not an objection, just something for folks to stew on to > > > make sure this is really the right approach. > > > > Another good reason to go with full heap names instead of opaque flags on > > existing heaps is that with the former we can use symlinks in sysfs to > > specify heaps, with the latter we need a new idea. We haven't yet gotten > > around to implement this anywhere, but it's been in the dma-buf/heap todo > > since forever, and I like it as a design approach. So would be a good idea > > to not toss it. With that display would have symlinks to cma-ecc and cma, > > and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a > > SoC where the display needs contig memory for scanout. > > So indeed that is a good point to keep in mind, but I also think it > might re-inforce the choice of having ECC as a flag here. > > Since my understanding of the sysfs symlinks to heaps idea is about > being able to figure out a common heap from a collection of devices, > it's really about the ability for the driver to access the type of > memory. If ECC is just an attribute of the type of memory (as in this > patch series), it being on or off won't necessarily affect > compatibility of the buffer with the device. Similarly "uncached" > seems more of an attribute of memory type and not a type itself. > Hardware that can access non-contiguous "system" buffers can access > uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

If the names are opaque, and we don't publish what the exact capabilities are, how can an application figure out which heap to use in the first place?

This would need to be based on conventions. The idea is to standardize on a set of names for specific, well-known use-cases.

...
...
...
...
Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume.

We don't have any specific heap upstream at the moment, only generic ones.

But we're trying to add more specific ones, right?

...
...
For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

Right. But also, libcamera has started to use dma-heaps to allocate dma-capable buffers and do software processing on it before sending it to some hardware controller.

Caches are critical here, and getting a non-cacheable buffer would be a clear regression.

I understand that. My point is that maybe we shouldn't try to design a complex mechanism that allows full discoverability of everything that a heap supports or is capable of. Instead if the camera has specific requirements, it could look for a heap named "camera". Or if it can share a heap with other multimedia devices, maybe call the heap "multimedia".

The idea is that heaps for these use-cases are quite specific, so you would likely not find an arbitrary number of processes try to use the same heap.

Yeah the idea to sort this out was to have symlinks in sysfs from the device to each heap. We could then have priorities for each such link, so that applications can pick the "best" heap that will work with all devices. Or also special links for special use-cases, like for a display+render drm device you might want to have separate links for the display and the render-only use-case.

I think trying to encode this all into the name of a heap without linking it to the device is not going to work well in general.

We still have that entire "make sysfs symlinks work for dma-buf heaps" on our todos, and that idea is almost as old as dma-buf itself :-/ -Sima

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Christian König

8 Jul 8 Jul

7:14 a.m.

Am 05.07.24 um 17:35 schrieb Daniel Vetter:

...

Just figured I'll jump in on one detail here.

On Fri, Jul 05, 2024 at 04:31:34PM +0200, Thierry Reding wrote:

...
On Thu, Jul 04, 2024 at 02:24:49PM GMT, Maxime Ripard wrote:

...
On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:

...
On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...
Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...
On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote: > On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote: >> On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote: >>> On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote: >>>> But it makes me a little nervous to add a new generic allocation flag >>>> for a feature most hardware doesn't support (yet, at least). So it's >>>> hard to weigh how common the actual usage will be across all the >>>> heaps. >>>> >>>> I apologize as my worry is mostly born out of seeing vendors really >>>> push opaque feature flags in their old ion heaps, so in providing a >>>> flags argument, it was mostly intended as an escape hatch for >>>> obviously common attributes. So having the first be something that >>>> seems reasonable, but isn't actually that common makes me fret some. >>>> >>>> So again, not an objection, just something for folks to stew on to >>>> make sure this is really the right approach. >>> Another good reason to go with full heap names instead of opaque flags on >>> existing heaps is that with the former we can use symlinks in sysfs to >>> specify heaps, with the latter we need a new idea. We haven't yet gotten >>> around to implement this anywhere, but it's been in the dma-buf/heap todo >>> since forever, and I like it as a design approach. So would be a good idea >>> to not toss it. With that display would have symlinks to cma-ecc and cma, >>> and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a >>> SoC where the display needs contig memory for scanout. >> So indeed that is a good point to keep in mind, but I also think it >> might re-inforce the choice of having ECC as a flag here. >> >> Since my understanding of the sysfs symlinks to heaps idea is about >> being able to figure out a common heap from a collection of devices, >> it's really about the ability for the driver to access the type of >> memory. If ECC is just an attribute of the type of memory (as in this >> patch series), it being on or off won't necessarily affect >> compatibility of the buffer with the device. Similarly "uncached" >> seems more of an attribute of memory type and not a type itself. >> Hardware that can access non-contiguous "system" buffers can access >> uncached system buffers. > Yeah, but in graphics there's a wide band where "shit performance" is > defacto "not useable (as intended at least)". > > So if we limit the symlink idea to just making sure zero-copy access is > possible, then we might not actually solve the real world problem we need > to solve. And so the symlinks become somewhat useless, and we need to > somewhere encode which flags you need to use with each symlink. > > But I also see the argument that there's a bit a combinatorial explosion > possible. So I guess the question is where we want to handle it ... Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

If the names are opaque, and we don't publish what the exact capabilities are, how can an application figure out which heap to use in the first place?

This would need to be based on conventions. The idea is to standardize on a set of names for specific, well-known use-cases.

...
...
...
...
Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume.

We don't have any specific heap upstream at the moment, only generic ones.

But we're trying to add more specific ones, right?

...
...
For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

Right. But also, libcamera has started to use dma-heaps to allocate dma-capable buffers and do software processing on it before sending it to some hardware controller.

Caches are critical here, and getting a non-cacheable buffer would be a clear regression.

I understand that. My point is that maybe we shouldn't try to design a complex mechanism that allows full discoverability of everything that a heap supports or is capable of. Instead if the camera has specific requirements, it could look for a heap named "camera". Or if it can share a heap with other multimedia devices, maybe call the heap "multimedia".

The idea is that heaps for these use-cases are quite specific, so you would likely not find an arbitrary number of processes try to use the same heap.

Yeah the idea to sort this out was to have symlinks in sysfs from the device to each heap. We could then have priorities for each such link, so that applications can pick the "best" heap that will work with all devices. Or also special links for special use-cases, like for a display+render drm device you might want to have separate links for the display and the render-only use-case.

I think trying to encode this all into the name of a heap without linking it to the device is not going to work well in general.

We still have that entire "make sysfs symlinks work for dma-buf heaps" on our todos, and that idea is almost as old as dma-buf itself :-/

I still have the draft patches for that lying around on my harddisk somewhere with zero time to look into it.

If anybody wants to pick it up feel free to ping me, but be aware that you need to write more documentation than code.

Regards, Christian.

...

-Sima

Thierry Reding

11 Jul 11 Jul

12:44 p.m.

On Mon, Jul 08, 2024 at 09:14:14AM GMT, Christian König wrote:

...

Am 05.07.24 um 17:35 schrieb Daniel Vetter:

...
Just figured I'll jump in on one detail here.

On Fri, Jul 05, 2024 at 04:31:34PM +0200, Thierry Reding wrote:

...
On Thu, Jul 04, 2024 at 02:24:49PM GMT, Maxime Ripard wrote:

...
On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:

...
On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...
Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote: > On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote: > > On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote: > > > On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote: > > > > On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote: > > > > > But it makes me a little nervous to add a new generic allocation flag > > > > > for a feature most hardware doesn't support (yet, at least). So it's > > > > > hard to weigh how common the actual usage will be across all the > > > > > heaps. > > > > > > > > > > I apologize as my worry is mostly born out of seeing vendors really > > > > > push opaque feature flags in their old ion heaps, so in providing a > > > > > flags argument, it was mostly intended as an escape hatch for > > > > > obviously common attributes. So having the first be something that > > > > > seems reasonable, but isn't actually that common makes me fret some. > > > > > > > > > > So again, not an objection, just something for folks to stew on to > > > > > make sure this is really the right approach. > > > > Another good reason to go with full heap names instead of opaque flags on > > > > existing heaps is that with the former we can use symlinks in sysfs to > > > > specify heaps, with the latter we need a new idea. We haven't yet gotten > > > > around to implement this anywhere, but it's been in the dma-buf/heap todo > > > > since forever, and I like it as a design approach. So would be a good idea > > > > to not toss it. With that display would have symlinks to cma-ecc and cma, > > > > and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a > > > > SoC where the display needs contig memory for scanout. > > > So indeed that is a good point to keep in mind, but I also think it > > > might re-inforce the choice of having ECC as a flag here. > > > > > > Since my understanding of the sysfs symlinks to heaps idea is about > > > being able to figure out a common heap from a collection of devices, > > > it's really about the ability for the driver to access the type of > > > memory. If ECC is just an attribute of the type of memory (as in this > > > patch series), it being on or off won't necessarily affect > > > compatibility of the buffer with the device. Similarly "uncached" > > > seems more of an attribute of memory type and not a type itself. > > > Hardware that can access non-contiguous "system" buffers can access > > > uncached system buffers. > > Yeah, but in graphics there's a wide band where "shit performance" is > > defacto "not useable (as intended at least)". > > > > So if we limit the symlink idea to just making sure zero-copy access is > > possible, then we might not actually solve the real world problem we need > > to solve. And so the symlinks become somewhat useless, and we need to > > somewhere encode which flags you need to use with each symlink. > > > > But I also see the argument that there's a bit a combinatorial explosion > > possible. So I guess the question is where we want to handle it ... > Sorry for jumping into this discussion so late. But are we really > concerned about this combinatorial explosion in practice? It may be > theoretically possible to create any combination of these, but do we > expect more than a couple of heaps to exist in any given system? I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

If the names are opaque, and we don't publish what the exact capabilities are, how can an application figure out which heap to use in the first place?

This would need to be based on conventions. The idea is to standardize on a set of names for specific, well-known use-cases.

...
...
...
> Would it perhaps make more sense to let a platform override the heap > name to make it more easily identifiable? Maybe this is a naive > assumption, but aren't userspace applications and drivers not primarily > interested in the "type" of heap rather than whatever specific flags > have been set for it? I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume.

We don't have any specific heap upstream at the moment, only generic ones.

But we're trying to add more specific ones, right?

...
...
For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

Right. But also, libcamera has started to use dma-heaps to allocate dma-capable buffers and do software processing on it before sending it to some hardware controller.

Caches are critical here, and getting a non-cacheable buffer would be a clear regression.

I understand that. My point is that maybe we shouldn't try to design a complex mechanism that allows full discoverability of everything that a heap supports or is capable of. Instead if the camera has specific requirements, it could look for a heap named "camera". Or if it can share a heap with other multimedia devices, maybe call the heap "multimedia".

The idea is that heaps for these use-cases are quite specific, so you would likely not find an arbitrary number of processes try to use the same heap.

Yeah the idea to sort this out was to have symlinks in sysfs from the device to each heap. We could then have priorities for each such link, so that applications can pick the "best" heap that will work with all devices. Or also special links for special use-cases, like for a display+render drm device you might want to have separate links for the display and the render-only use-case.

I think trying to encode this all into the name of a heap without linking it to the device is not going to work well in general.

We still have that entire "make sysfs symlinks work for dma-buf heaps" on our todos, and that idea is almost as old as dma-buf itself :-/

I still have the draft patches for that lying around on my harddisk somewhere with zero time to look into it.

If anybody wants to pick it up feel free to ping me, but be aware that you need to write more documentation than code.

I'm interested, so if you can dig those out that'd be a great reference.

Thanks, Thierry

Maxime Ripard

10 Jul 10 Jul

12:10 p.m.

On Fri, Jul 05, 2024 at 04:31:34PM GMT, Thierry Reding wrote:

...

On Thu, Jul 04, 2024 at 02:24:49PM GMT, Maxime Ripard wrote:

...
On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:

...
On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...
Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...
On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote:

...
On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote: > On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote: > > On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote: > > > But it makes me a little nervous to add a new generic allocation flag > > > for a feature most hardware doesn't support (yet, at least). So it's > > > hard to weigh how common the actual usage will be across all the > > > heaps. > > > > > > I apologize as my worry is mostly born out of seeing vendors really > > > push opaque feature flags in their old ion heaps, so in providing a > > > flags argument, it was mostly intended as an escape hatch for > > > obviously common attributes. So having the first be something that > > > seems reasonable, but isn't actually that common makes me fret some. > > > > > > So again, not an objection, just something for folks to stew on to > > > make sure this is really the right approach. > > > > Another good reason to go with full heap names instead of opaque flags on > > existing heaps is that with the former we can use symlinks in sysfs to > > specify heaps, with the latter we need a new idea. We haven't yet gotten > > around to implement this anywhere, but it's been in the dma-buf/heap todo > > since forever, and I like it as a design approach. So would be a good idea > > to not toss it. With that display would have symlinks to cma-ecc and cma, > > and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a > > SoC where the display needs contig memory for scanout. > > So indeed that is a good point to keep in mind, but I also think it > might re-inforce the choice of having ECC as a flag here. > > Since my understanding of the sysfs symlinks to heaps idea is about > being able to figure out a common heap from a collection of devices, > it's really about the ability for the driver to access the type of > memory. If ECC is just an attribute of the type of memory (as in this > patch series), it being on or off won't necessarily affect > compatibility of the buffer with the device. Similarly "uncached" > seems more of an attribute of memory type and not a type itself. > Hardware that can access non-contiguous "system" buffers can access > uncached system buffers.

Yeah, but in graphics there's a wide band where "shit performance" is defacto "not useable (as intended at least)".

So if we limit the symlink idea to just making sure zero-copy access is possible, then we might not actually solve the real world problem we need to solve. And so the symlinks become somewhat useless, and we need to somewhere encode which flags you need to use with each symlink.

But I also see the argument that there's a bit a combinatorial explosion possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

If the names are opaque, and we don't publish what the exact capabilities are, how can an application figure out which heap to use in the first place?

This would need to be based on conventions. The idea is to standardize on a set of names for specific, well-known use-cases.

How can undocumented, unenforced, conventions can work in practice?

...

...
...
...
...
Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume.

We don't have any specific heap upstream at the moment, only generic ones.

But we're trying to add more specific ones, right?

...
...
For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

Right. But also, libcamera has started to use dma-heaps to allocate dma-capable buffers and do software processing on it before sending it to some hardware controller.

Caches are critical here, and getting a non-cacheable buffer would be a clear regression.

I understand that. My point is that maybe we shouldn't try to design a complex mechanism that allows full discoverability of everything that a heap supports or is capable of. Instead if the camera has specific requirements, it could look for a heap named "camera". Or if it can share a heap with other multimedia devices, maybe call the heap "multimedia".

That kind of vague categorization is pointless though. Some criteria are about hardwar (ie, can the device access it in the first place?), so is purely about a particular context and policy and will change from one application to the other.

A camera app using an ISP will not care about caches. A software rendering library will. A compositor will not want ECC. A safety component probably will.

All of them are "multimedia".

We *need* to be able to differentiate policy from hardware requirements.

...

The idea is that heaps for these use-cases are quite specific, so you would likely not find an arbitrary number of processes try to use the same heap.

Some of them are specific, some of them aren't.

...

...
How can it know which heap to allocate from on a given platform?

Similarly with the ECC support we started that discussion with. ECC will introduce a significant performance cost. How can a generic application, such as a compositor, will know which heap to allocate from without:

a) Trying to bundle up a list of heaps for each platform it might or might not run

b) and handling the name difference between BSPs and mainline.

Obviously some standardization of heap names is a requirement here, otherwise such a proposal does indeed not make sense.

...
If some hardware-specific applications / middleware want to take a shortcut and use the name, that's fine. But we need to find a way for generic applications to discover which heap is best suited for their needs without the name.

You can still have fairly generic names for heaps. If you want protected content, you could try to use a standard "video-protected" heap. If you need ECC protected memory, maybe you want to allocate from a heap named "safety", or whatever.

And if I need cacheable, physically contiguous, "multimedia" buffers from ECC protected memory?

...

...
...
...
And if we allow any platform to change a given heap name, then a generic application won't be able to support that without some kind of platform-specific configuration.

We could still standardize on common use-cases so that applications would know what heaps to allocate from. But there's also no need to arbitrarily restrict this. For example there could be cases that are very specific to a particular platform and which just doesn't exist anywhere else. Platform designers could then still use this mechanism to define that very particular heap and have a very specialized userspace application use that heap for their purpose.

We could just add a different capabitily flag to make sure those would get ignored.

Sure you can do all of this with a myriad of flags. But again, I'm trying to argue that we may not need this additional complexity. In a typical system, how many heaps do you encounter? You may need a generic one and then perhaps a handful specific ones? Or do you need more?

It's not a matter of the number of heaps, but what they provide.

...

...
...
...
...
For example, if an applications wants to use a protected buffer, the application doesn't (and shouldn't need to) care about whether the heap for that buffer supports ECC or is backed by CMA. All it really needs to know is that it's the system's "protected" heap.

I mean... "protected" very much means backed by CMA already, it's pretty much the only thing we document, and we call it as such in Kconfig.

Well, CMA is really just an implementation detail, right? It doesn't make sense to advertise that to anything outside the kernel. Maybe it's an interesting fact that buffers allocated from these heaps will be physically contiguous?

CMA itself might be an implementation detail, but it's still right there in the name on ARM.

That doesn't mean we can do something more useful going forward (and perhaps symlink for backwards-compatibility if needed).

...
And being able to get physically contiguous buffers is critical on platforms without an IOMMU.

Again, I'm not trying to dispute the necessity of contiguous buffers. I'm trying to say that contextual names can be a viable alternative to full discoverability. If you want contiguous buffers, go call the heap "contiguous" and it's quite clear what it means.

You can even hide details such as IOMMU availability from userspace that way. On a system where an IOMMU is present, you could for example go and use IOMMU-backed memory in a "contiguous" heap, while on a system without an IOMMU the memory for the "contiguous" heap could come from CMA.

I can see the benefits from that, and it would be quite nice indeed. However, it still only addresses the "hardware" part of the requirements (ie, is it contiguous, accessible, etc.). It doesn't address applications having different requirements when it comes to what kind of attributes they'd like/need to get from the buffer.

If one application in the system wants contiguous (using your definition just above) buffers without caches, and the other wants to have contiguous cacheable buffers, if we're only using the name we'd need to instantiate two heaps, from the same allocator, for what's essentially a mapping attribute.

It's more complex for the kernel, more code to maintain, and more complex for applications too because they need to know about what a given name means for that particular context.

...

...
...
In the majority of cases that's probably not even something that matters because we get a DMA-BUF anyway and we can map that any way we want.

Irrespective of that, physically contigous buffers could be allocated in any number of ways, CMA is just a convenient implementation of one such allocator.

...
But yeah, I agree that being backed by CMA is probably not what an application cares about (and we even have might some discussions about that), but if the ECC protection comes at a performance cost then it will very much care about it. Or if it comes with caches enabled or not.

True, no doubt about that. However, I'm saying there may be advantages in hiding all of this from applications. Let's say we're trying to implement video decoding. We can create a special "protected-video" heap that is specifically designed to allocate encrypted/protected scanout buffers from.

When you design that system, you would most certainly not enable ECC protection on that heap because it leads to bad performance. You would also want to make sure that all of the buffers in that heap are cached and whatever other optimizations your chip may provide.

Your application doesn't have to care about this, though, because it can simply look for a heap named "protected-video" and allocate buffers from it.

I mean, I disagree. Or rather, in an environment where you have a system architect, and the application is targeted for a particular system only, and where "protected-video" means whatever the team decided in general, yeah, that works.

So, in a BSP or Android, that works fine.

On a mainline based system, with generic stacks like libcamera, it just doesn't fly anymore.

I'm not sure I know of a system that isn't architected. Even very "generic" devices have a set of functionality that the manufacturer wanted the device to provide.

Aren't generic stacks not also build to provide a specific function? Again, libcamera could try to use a "camera" heap, or maybe it would fit into that "multimedia" category.

For truly generic systems you typically don't need any of this, right? A generic system like a PC usually gets by with just system memory and maybe video RAM for some specific cases.

Why wouldn't we need this for a truly generic system?

With ARM laptops around the corner, pretty much the same SoC can be used in a tablet, in a car, or in a "generic system like a PC".

Maxime

Thierry Reding

11 Jul 11 Jul

12:43 p.m.

On Wed, Jul 10, 2024 at 02:10:09PM GMT, Maxime Ripard wrote:

...

On Fri, Jul 05, 2024 at 04:31:34PM GMT, Thierry Reding wrote:

...
On Thu, Jul 04, 2024 at 02:24:49PM GMT, Maxime Ripard wrote:

...
On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:

...
On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...
Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...
On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote: > On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote: > > On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote: > > > On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote: > > > > But it makes me a little nervous to add a new generic allocation flag > > > > for a feature most hardware doesn't support (yet, at least). So it's > > > > hard to weigh how common the actual usage will be across all the > > > > heaps. > > > > > > > > I apologize as my worry is mostly born out of seeing vendors really > > > > push opaque feature flags in their old ion heaps, so in providing a > > > > flags argument, it was mostly intended as an escape hatch for > > > > obviously common attributes. So having the first be something that > > > > seems reasonable, but isn't actually that common makes me fret some. > > > > > > > > So again, not an objection, just something for folks to stew on to > > > > make sure this is really the right approach. > > > > > > Another good reason to go with full heap names instead of opaque flags on > > > existing heaps is that with the former we can use symlinks in sysfs to > > > specify heaps, with the latter we need a new idea. We haven't yet gotten > > > around to implement this anywhere, but it's been in the dma-buf/heap todo > > > since forever, and I like it as a design approach. So would be a good idea > > > to not toss it. With that display would have symlinks to cma-ecc and cma, > > > and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a > > > SoC where the display needs contig memory for scanout. > > > > So indeed that is a good point to keep in mind, but I also think it > > might re-inforce the choice of having ECC as a flag here. > > > > Since my understanding of the sysfs symlinks to heaps idea is about > > being able to figure out a common heap from a collection of devices, > > it's really about the ability for the driver to access the type of > > memory. If ECC is just an attribute of the type of memory (as in this > > patch series), it being on or off won't necessarily affect > > compatibility of the buffer with the device. Similarly "uncached" > > seems more of an attribute of memory type and not a type itself. > > Hardware that can access non-contiguous "system" buffers can access > > uncached system buffers. > > Yeah, but in graphics there's a wide band where "shit performance" is > defacto "not useable (as intended at least)". > > So if we limit the symlink idea to just making sure zero-copy access is > possible, then we might not actually solve the real world problem we need > to solve. And so the symlinks become somewhat useless, and we need to > somewhere encode which flags you need to use with each symlink. > > But I also see the argument that there's a bit a combinatorial explosion > possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

If the names are opaque, and we don't publish what the exact capabilities are, how can an application figure out which heap to use in the first place?

This would need to be based on conventions. The idea is to standardize on a set of names for specific, well-known use-cases.

How can undocumented, unenforced, conventions can work in practice?

...
...
...
...
...
Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume.

We don't have any specific heap upstream at the moment, only generic ones.

But we're trying to add more specific ones, right?

...
...
For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

Right. But also, libcamera has started to use dma-heaps to allocate dma-capable buffers and do software processing on it before sending it to some hardware controller.

Caches are critical here, and getting a non-cacheable buffer would be a clear regression.

I understand that. My point is that maybe we shouldn't try to design a complex mechanism that allows full discoverability of everything that a heap supports or is capable of. Instead if the camera has specific requirements, it could look for a heap named "camera". Or if it can share a heap with other multimedia devices, maybe call the heap "multimedia".

That kind of vague categorization is pointless though. Some criteria are about hardwar (ie, can the device access it in the first place?), so is purely about a particular context and policy and will change from one application to the other.

A camera app using an ISP will not care about caches. A software rendering library will. A compositor will not want ECC. A safety component probably will.

All of them are "multimedia".

We *need* to be able to differentiate policy from hardware requirements.

...
The idea is that heaps for these use-cases are quite specific, so you would likely not find an arbitrary number of processes try to use the same heap.

Some of them are specific, some of them aren't.

...
...
How can it know which heap to allocate from on a given platform?

Similarly with the ECC support we started that discussion with. ECC will introduce a significant performance cost. How can a generic application, such as a compositor, will know which heap to allocate from without:

a) Trying to bundle up a list of heaps for each platform it might or might not run

b) and handling the name difference between BSPs and mainline.

Obviously some standardization of heap names is a requirement here, otherwise such a proposal does indeed not make sense.

...
If some hardware-specific applications / middleware want to take a shortcut and use the name, that's fine. But we need to find a way for generic applications to discover which heap is best suited for their needs without the name.

You can still have fairly generic names for heaps. If you want protected content, you could try to use a standard "video-protected" heap. If you need ECC protected memory, maybe you want to allocate from a heap named "safety", or whatever.

And if I need cacheable, physically contiguous, "multimedia" buffers from ECC protected memory?

...
...
...
...
And if we allow any platform to change a given heap name, then a generic application won't be able to support that without some kind of platform-specific configuration.

We could still standardize on common use-cases so that applications would know what heaps to allocate from. But there's also no need to arbitrarily restrict this. For example there could be cases that are very specific to a particular platform and which just doesn't exist anywhere else. Platform designers could then still use this mechanism to define that very particular heap and have a very specialized userspace application use that heap for their purpose.

We could just add a different capabitily flag to make sure those would get ignored.

Sure you can do all of this with a myriad of flags. But again, I'm trying to argue that we may not need this additional complexity. In a typical system, how many heaps do you encounter? You may need a generic one and then perhaps a handful specific ones? Or do you need more?

It's not a matter of the number of heaps, but what they provide.

...
...
...
...
...
For example, if an applications wants to use a protected buffer, the application doesn't (and shouldn't need to) care about whether the heap for that buffer supports ECC or is backed by CMA. All it really needs to know is that it's the system's "protected" heap.

I mean... "protected" very much means backed by CMA already, it's pretty much the only thing we document, and we call it as such in Kconfig.

Well, CMA is really just an implementation detail, right? It doesn't make sense to advertise that to anything outside the kernel. Maybe it's an interesting fact that buffers allocated from these heaps will be physically contiguous?

CMA itself might be an implementation detail, but it's still right there in the name on ARM.

That doesn't mean we can do something more useful going forward (and perhaps symlink for backwards-compatibility if needed).

...
And being able to get physically contiguous buffers is critical on platforms without an IOMMU.

Again, I'm not trying to dispute the necessity of contiguous buffers. I'm trying to say that contextual names can be a viable alternative to full discoverability. If you want contiguous buffers, go call the heap "contiguous" and it's quite clear what it means.

You can even hide details such as IOMMU availability from userspace that way. On a system where an IOMMU is present, you could for example go and use IOMMU-backed memory in a "contiguous" heap, while on a system without an IOMMU the memory for the "contiguous" heap could come from CMA.

I can see the benefits from that, and it would be quite nice indeed. However, it still only addresses the "hardware" part of the requirements (ie, is it contiguous, accessible, etc.). It doesn't address applications having different requirements when it comes to what kind of attributes they'd like/need to get from the buffer.

If one application in the system wants contiguous (using your definition just above) buffers without caches, and the other wants to have contiguous cacheable buffers, if we're only using the name we'd need to instantiate two heaps, from the same allocator, for what's essentially a mapping attribute.

It's more complex for the kernel, more code to maintain, and more complex for applications too because they need to know about what a given name means for that particular context.

...
...
...
In the majority of cases that's probably not even something that matters because we get a DMA-BUF anyway and we can map that any way we want.

Irrespective of that, physically contigous buffers could be allocated in any number of ways, CMA is just a convenient implementation of one such allocator.

...
But yeah, I agree that being backed by CMA is probably not what an application cares about (and we even have might some discussions about that), but if the ECC protection comes at a performance cost then it will very much care about it. Or if it comes with caches enabled or not.

True, no doubt about that. However, I'm saying there may be advantages in hiding all of this from applications. Let's say we're trying to implement video decoding. We can create a special "protected-video" heap that is specifically designed to allocate encrypted/protected scanout buffers from.

When you design that system, you would most certainly not enable ECC protection on that heap because it leads to bad performance. You would also want to make sure that all of the buffers in that heap are cached and whatever other optimizations your chip may provide.

Your application doesn't have to care about this, though, because it can simply look for a heap named "protected-video" and allocate buffers from it.

I mean, I disagree. Or rather, in an environment where you have a system architect, and the application is targeted for a particular system only, and where "protected-video" means whatever the team decided in general, yeah, that works.

So, in a BSP or Android, that works fine.

On a mainline based system, with generic stacks like libcamera, it just doesn't fly anymore.

I'm not sure I know of a system that isn't architected. Even very "generic" devices have a set of functionality that the manufacturer wanted the device to provide.

Aren't generic stacks not also build to provide a specific function? Again, libcamera could try to use a "camera" heap, or maybe it would fit into that "multimedia" category.

For truly generic systems you typically don't need any of this, right? A generic system like a PC usually gets by with just system memory and maybe video RAM for some specific cases.

Why wouldn't we need this for a truly generic system?

Because ARM systems really aren't that generic. That's why we need these special carveouts and such in the first place.

Once you start making an ARM system more generic (say, by adding things like PCI devices and such into the mix), then these specific cases tend to go away.

Another way of saying this is that these carveouts are usually needed for some SoC-specific functionality, so they are inherently bound to that SoC and no longer generic.

...

With ARM laptops around the corner, pretty much the same SoC can be used in a tablet, in a car, or in a "generic system like a PC".

A "generic system like a PC" based on ARM would still be tied to the specific ARM SoC that's being used, so it's not generic in the same way that a PC is.

Fundamentally the same SoC is going to need the same carveouts, whether it's used in a tablet, in a car or in a laptop. The carveout's use is tied to a particular function of the system. Anything that is not tied to a particular function is just plain old system memory, isn't it?

Of course I may be completely ignorant of whatever it is that you have in mind, so maybe you can provide some concrete examples of where this isn't the case?

Thierry

Thierry Reding

12 Jul 12 Jul

10:37 a.m.

On Wed, Jul 10, 2024 at 02:10:09PM GMT, Maxime Ripard wrote:

...

On Fri, Jul 05, 2024 at 04:31:34PM GMT, Thierry Reding wrote:

...
On Thu, Jul 04, 2024 at 02:24:49PM GMT, Maxime Ripard wrote:

...
On Fri, Jun 28, 2024 at 04:42:35PM GMT, Thierry Reding wrote:

...
On Fri, Jun 28, 2024 at 03:08:46PM GMT, Maxime Ripard wrote:

...
Hi,

On Fri, Jun 28, 2024 at 01:29:17PM GMT, Thierry Reding wrote:

...
On Tue, May 21, 2024 at 02:06:19PM GMT, Daniel Vetter wrote: > On Thu, May 16, 2024 at 09:51:35AM -0700, John Stultz wrote: > > On Thu, May 16, 2024 at 3:56 AM Daniel Vetter daniel@ffwll.ch wrote: > > > On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote: > > > > But it makes me a little nervous to add a new generic allocation flag > > > > for a feature most hardware doesn't support (yet, at least). So it's > > > > hard to weigh how common the actual usage will be across all the > > > > heaps. > > > > > > > > I apologize as my worry is mostly born out of seeing vendors really > > > > push opaque feature flags in their old ion heaps, so in providing a > > > > flags argument, it was mostly intended as an escape hatch for > > > > obviously common attributes. So having the first be something that > > > > seems reasonable, but isn't actually that common makes me fret some. > > > > > > > > So again, not an objection, just something for folks to stew on to > > > > make sure this is really the right approach. > > > > > > Another good reason to go with full heap names instead of opaque flags on > > > existing heaps is that with the former we can use symlinks in sysfs to > > > specify heaps, with the latter we need a new idea. We haven't yet gotten > > > around to implement this anywhere, but it's been in the dma-buf/heap todo > > > since forever, and I like it as a design approach. So would be a good idea > > > to not toss it. With that display would have symlinks to cma-ecc and cma, > > > and rendering maybe cma-ecc, shmem, cma heaps (in priority order) for a > > > SoC where the display needs contig memory for scanout. > > > > So indeed that is a good point to keep in mind, but I also think it > > might re-inforce the choice of having ECC as a flag here. > > > > Since my understanding of the sysfs symlinks to heaps idea is about > > being able to figure out a common heap from a collection of devices, > > it's really about the ability for the driver to access the type of > > memory. If ECC is just an attribute of the type of memory (as in this > > patch series), it being on or off won't necessarily affect > > compatibility of the buffer with the device. Similarly "uncached" > > seems more of an attribute of memory type and not a type itself. > > Hardware that can access non-contiguous "system" buffers can access > > uncached system buffers. > > Yeah, but in graphics there's a wide band where "shit performance" is > defacto "not useable (as intended at least)". > > So if we limit the symlink idea to just making sure zero-copy access is > possible, then we might not actually solve the real world problem we need > to solve. And so the symlinks become somewhat useless, and we need to > somewhere encode which flags you need to use with each symlink. > > But I also see the argument that there's a bit a combinatorial explosion > possible. So I guess the question is where we want to handle it ...

Sorry for jumping into this discussion so late. But are we really concerned about this combinatorial explosion in practice? It may be theoretically possible to create any combination of these, but do we expect more than a couple of heaps to exist in any given system?

I don't worry too much about the number of heaps available in a given system, it would indeed be fairly low.

My concern is about the semantics combinatorial explosion. So far, the name has carried what semantics we were supposed to get from the buffer we allocate from that heap.

The more variations and concepts we'll have, the more heap names we'll need, and with confusing names since we wouldn't be able to change the names of the heaps we already have.

What I was trying to say is that none of this matters if we make these names opaque. If these names are contextual for the given system it doesn't matter what the exact capabilities are. It only matters that their purpose is known and that's what applications will be interested in.

If the names are opaque, and we don't publish what the exact capabilities are, how can an application figure out which heap to use in the first place?

This would need to be based on conventions. The idea is to standardize on a set of names for specific, well-known use-cases.

Sorry, hadn't seen all of your comments in this mail before, a few more notes below.

...

How can undocumented, unenforced, conventions can work in practice?

Unenforced, perhaps, yes, but who says that these conventions need to be undocumented?

...

...
...
...
...
...
Would it perhaps make more sense to let a platform override the heap name to make it more easily identifiable? Maybe this is a naive assumption, but aren't userspace applications and drivers not primarily interested in the "type" of heap rather than whatever specific flags have been set for it?

I guess it depends on what you call the type of a heap. Where we allocate the memory from, sure, an application won't care about that. How the buffer behaves on the other end is definitely something applications are going to be interested in though.

Most of these heaps will be very specific, I would assume.

We don't have any specific heap upstream at the moment, only generic ones.

But we're trying to add more specific ones, right?

...
...
For example a heap that is meant to be protected for protected video decoding is both going to be created in such a way as to allow that use-case (i.e. it doesn't make sense for it to be uncached, for example) and it's also not going to be useful for any other use-case (i.e. there's no reason to use that heap for GPU jobs or networking, or whatever).

Right. But also, libcamera has started to use dma-heaps to allocate dma-capable buffers and do software processing on it before sending it to some hardware controller.

Caches are critical here, and getting a non-cacheable buffer would be a clear regression.

I understand that. My point is that maybe we shouldn't try to design a complex mechanism that allows full discoverability of everything that a heap supports or is capable of. Instead if the camera has specific requirements, it could look for a heap named "camera". Or if it can share a heap with other multimedia devices, maybe call the heap "multimedia".

That kind of vague categorization is pointless though. Some criteria are about hardwar (ie, can the device access it in the first place?), so is purely about a particular context and policy and will change from one application to the other.

A camera app using an ISP will not care about caches. A software rendering library will. A compositor will not want ECC. A safety component probably will.

All of them are "multimedia".

We *need* to be able to differentiate policy from hardware requirements.

Do we really? My point is that if we have, say, a safety component that needs hardware and software to access certain memory, then by definition that memory needs to properties that satisfy both the hardware *and* the software components involved with that memory. Otherwise it's all just not going to work.

If you have an ISP that never needs to pass the buffer to software for post-processing or whatever, then there's hardly a need for that buffer to be cached. On the other hand, if the system requires software post- processing, I bet you that the system will be designed such that the ISP and software can efficiently access that particular shared memory region or else, again, the system won't work.

Given that these are special purpose carveout regions, I have a hard time imagining somebody creating arbitrary heaps just for the sake of it.

...

...
The idea is that heaps for these use-cases are quite specific, so you would likely not find an arbitrary number of processes try to use the same heap.

Some of them are specific, some of them aren't.

Which ones wouldn't be specific? Of course I can /think/ of arbitrarily generic heaps, but the real question is whether we are going to encounter these in practice.

...

...
...
How can it know which heap to allocate from on a given platform?

Similarly with the ECC support we started that discussion with. ECC will introduce a significant performance cost. How can a generic application, such as a compositor, will know which heap to allocate from without:

a) Trying to bundle up a list of heaps for each platform it might or might not run

b) and handling the name difference between BSPs and mainline.

Obviously some standardization of heap names is a requirement here, otherwise such a proposal does indeed not make sense.

...
If some hardware-specific applications / middleware want to take a shortcut and use the name, that's fine. But we need to find a way for generic applications to discover which heap is best suited for their needs without the name.

You can still have fairly generic names for heaps. If you want protected content, you could try to use a standard "video-protected" heap. If you need ECC protected memory, maybe you want to allocate from a heap named "safety", or whatever.

And if I need cacheable, physically contiguous, "multimedia" buffers from ECC protected memory?

Again, I think you're trying to design for a very theoretically generic use-case that doesn't exist.

Note also that I'm not necessarily talking about global names here, but if necessary these could be per-device or per-use-case. If you have ECC protected memory that you may want to use in certain cases, you could call this "safety" *in the context* of "multimedia". So you could associate multiple multimedia heaps with a video encoder. One could be used if only plain physically contiguous memory is needed, and another would be used if ECC protection is needed.

These two heaps could be different from regular and safety heaps of a camera, for example.

So even if we have a fairly large number of heaps globally, I expect the number of heaps per-use-case to be very small (and easily named).

...

...
...
...
...
And if we allow any platform to change a given heap name, then a generic application won't be able to support that without some kind of platform-specific configuration.

We could still standardize on common use-cases so that applications would know what heaps to allocate from. But there's also no need to arbitrarily restrict this. For example there could be cases that are very specific to a particular platform and which just doesn't exist anywhere else. Platform designers could then still use this mechanism to define that very particular heap and have a very specialized userspace application use that heap for their purpose.

We could just add a different capabitily flag to make sure those would get ignored.

Sure you can do all of this with a myriad of flags. But again, I'm trying to argue that we may not need this additional complexity. In a typical system, how many heaps do you encounter? You may need a generic one and then perhaps a handful specific ones? Or do you need more?

It's not a matter of the number of heaps, but what they provide.

It sounds like you want to design a system that allows any arbitrary number of carveouts to be defined, each with its own unique combination of capabilities. I'm afraid that's going to be overly complex and end up in a system that is very difficult to use. If I recall correctly there have been attempts to do something like this is the past (GBM allocator) and they didn't really go anywhere.

Ultimately I think we need to find the practical applications for this and then base the design on what the real world requirements are.

...

...
...
...
...
...
For example, if an applications wants to use a protected buffer, the application doesn't (and shouldn't need to) care about whether the heap for that buffer supports ECC or is backed by CMA. All it really needs to know is that it's the system's "protected" heap.

I mean... "protected" very much means backed by CMA already, it's pretty much the only thing we document, and we call it as such in Kconfig.

Well, CMA is really just an implementation detail, right? It doesn't make sense to advertise that to anything outside the kernel. Maybe it's an interesting fact that buffers allocated from these heaps will be physically contiguous?

CMA itself might be an implementation detail, but it's still right there in the name on ARM.

That doesn't mean we can do something more useful going forward (and perhaps symlink for backwards-compatibility if needed).

...
And being able to get physically contiguous buffers is critical on platforms without an IOMMU.

Again, I'm not trying to dispute the necessity of contiguous buffers. I'm trying to say that contextual names can be a viable alternative to full discoverability. If you want contiguous buffers, go call the heap "contiguous" and it's quite clear what it means.

You can even hide details such as IOMMU availability from userspace that way. On a system where an IOMMU is present, you could for example go and use IOMMU-backed memory in a "contiguous" heap, while on a system without an IOMMU the memory for the "contiguous" heap could come from CMA.

I can see the benefits from that, and it would be quite nice indeed. However, it still only addresses the "hardware" part of the requirements (ie, is it contiguous, accessible, etc.). It doesn't address applications having different requirements when it comes to what kind of attributes they'd like/need to get from the buffer.

If one application in the system wants contiguous (using your definition just above) buffers without caches, and the other wants to have contiguous cacheable buffers, if we're only using the name we'd need to instantiate two heaps, from the same allocator, for what's essentially a mapping attribute.

This sounds very hypothetical to me. Maybe we have a fundamentally different view of what these heaps are supposed to be, but in my view they are very specific regions of memory that serve a special purpose, so they are very unlikely going to need a lot of flexibility. If one application is going to require uncached buffers, then any application is likely going to require uncached buffers for that particular use- case. In fact, I'd say there's probably only one application using the functionality in the first place.

Again, I realize that I may have a very limited picture of what is needed for existing use-cases, so maybe we can start collecting some data about real-world use-cases for these carveouts to get a better understanding of what we need?

...

It's more complex for the kernel, more code to maintain, and more complex for applications too because they need to know about what a given name means for that particular context.

I don't think it will be very complex or a lot of code to make this name-based. In fact I expect it to become quite simple. There's going to have to be some (generic) code that knows how to link carveouts to the devices that use them, but the rest should be pretty straightforward.

As for applications, isn't it going to be much easier to request a heap allocation "by name" rather than having to discover all heaps and determining the best one?

Thierry

Maxime Ripard

16 May 16 May

12:21 p.m.

Hi John,

Thanks for your feedback

On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...

On Wed, May 15, 2024 at 6:57 AM Maxime Ripard mripard@kernel.org wrote:

...
This series is the follow-up of the discussion that John and I had a few months ago here:

https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrro...

The initial problem we were discussing was that I'm currently working on a platform which has a memory layout with ECC enabled. However, enabling the ECC has a number of drawbacks on that platform: lower performance, increased memory usage, etc. So for things like framebuffers, the trade-off isn't great and thus there's a memory region with ECC disabled to allocate from for such use cases.

After a suggestion from John, I chose to start using heap allocations flags to allow for userspace to ask for a particular ECC setup. This is then backed by a new heap type that runs from reserved memory chunks flagged as such, and the existing DT properties to specify the ECC properties.

We could also easily extend this mechanism to support more flags, or through a new ioctl to discover which flags a given heap supports.

Hey! Thanks for sending this along! I'm eager to see more heap related work being done upstream.

The only thing that makes me a bit hesitant, is the introduction of allocation flags (as opposed to a uniquely specified/named "ecc" heap).

We did talk about this earlier, and my earlier press that only if the ECC flag was general enough to apply to the majority of heaps then it makes sense as a flag, and your patch here does apply it to all the heaps. So I don't have an objection.

But it makes me a little nervous to add a new generic allocation flag for a feature most hardware doesn't support (yet, at least). So it's hard to weigh how common the actual usage will be across all the heaps.

I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

I understand your hesitation and concern :) Is there anything we could provide that would help moving the discussion forward?

...

Another thing to discuss, that I didn't see in your mail: Do we have an open-source user of this new flag?

Not at the moment. I guess it would be one of the things that would reduce your concerns, but is it a requirement?

Thanks! Maxime

John Stultz

5:09 p.m.

On Thu, May 16, 2024 at 5:22 AM Maxime Ripard mripard@kernel.org wrote:

...

On Wed, May 15, 2024 at 11:42:58AM -0700, John Stultz wrote:

...
I apologize as my worry is mostly born out of seeing vendors really push opaque feature flags in their old ion heaps, so in providing a flags argument, it was mostly intended as an escape hatch for obviously common attributes. So having the first be something that seems reasonable, but isn't actually that common makes me fret some.

So again, not an objection, just something for folks to stew on to make sure this is really the right approach.

I understand your hesitation and concern :) Is there anything we could provide that would help moving the discussion forward?

Mostly I just want to make sure it's discussed, which is why I raise it as a concern.

Getting APIs right is hard, and I know I've made my share of mistakes (for instance, a situation that very much echoes this current question: the *_ALARM clockids probably should have used a flag). So I've found highlighting the trade-offs and getting other folk's perspectives useful to avoid such issues. But I also don't intend to needlessly delay things.

...

...
Another thing to discuss, that I didn't see in your mail: Do we have an open-source user of this new flag?

Not at the moment. I guess it would be one of the things that would reduce your concerns, but is it a requirement?

So I'd defer to Sima on this. There have been a number of heap releated changes that have had to be held out of tree on this requirement, but I'm hoping recent efforts on upstream device support will eventually unblock those.

thanks -john

475

days inactive

533

days old

linaro-mm-sig@lists.linaro.org

31 comments

participants

tags (0)

participants (8)

Christian König
Christian König
Daniel Vetter
Daniel Vetter
John Stultz
kernel test robot
Maxime Ripard
Thierry Reding