Hi,
Why: ==== While sharing buffers using dma-buf, currently there's no mechanism to let devices share their memory access constraints with each other to allow for delayed allocation of backing storage.
This RFC attempts to introduce the idea of memory constraints of a device, and how these constraints can be shared and used to help allocate buffers that can satisfy requirements of all devices attached to a particular dma-buf.
How: ==== A constraints_mask is added to dma_parms of the device, and at the time of each device attachment to a dma-buf, the dma-buf uses this constraints_mask to calculate the access_mask for the dma-buf.
Allocators can be defined for each of these constraints_masks, and then helper functions can be used to allocate the backing storage from the matching allocator satisfying the constraints of all devices interested.
A new miscdevice, /dev/cenalloc [1] is created, which acts as the dma-buf exporter to make this transparent to the devices.
More details in the patch description of "cenalloc: Constraint-Enabled Allocation helpers for dma-buf".
At present, the constraint_mask is only a bitmask, but it should be possible to change it to a struct and adapt the constraint_mask calculation accordingly, based on discussion.
Important requirement: ====================== Of course, delayed allocation can only work if all participating devices will wait for other devices to have 'attached' before mapping the buffer for the first time.
As of now, users of dma-buf(drm prime, v4l2 etc) call the attach() and then map_attachment() almost immediately after it. This would need to be changed if they were to benefit from constraints.
What 'cenalloc' is not: ======================= - not 'general' allocator helpers - useful only for constraints-enabled devices that share buffers with others using dma-buf. - not a replacement for existing allocation mechanisms inside various subsystems; merely a possible alternative. - no page-migration - it would be very complementary to the delayed allocation suggested here.
TODOs: ====== - demonstration test cases - vma helpers for allocators - more sample allocators - userspace ioctl (It should be a simple one, and we have one ready, but wanted to agree on the kernel side of things first)
May the brickbats begin, please! :)
Best regards, ~Sumit.
[1]: 'C'onstraints 'EN'abled 'ALLOC'ation helpers = cenalloc: it might not be a very appealing name, so suggestions are very welcome!
Benjamin Gaignard (1): cenalloc: a sample allocator for contiguous page allocation
Sumit Semwal (3): dma-buf: Add constraints sharing information cenalloc: Constraint-Enabled Allocation helpers for dma-buf cenalloc: Build files for constraint-enabled allocation helpers
MAINTAINERS | 1 + drivers/Kconfig | 2 + drivers/Makefile | 1 + drivers/cenalloc/Kconfig | 8 + drivers/cenalloc/Makefile | 3 + drivers/cenalloc/cenalloc.c | 597 ++++++++++++++++++++++++++++++ drivers/cenalloc/cenalloc.h | 99 +++++ drivers/cenalloc/cenalloc_priv.h | 188 ++++++++++ drivers/cenalloc/cenalloc_system_contig.c | 225 +++++++++++ drivers/dma-buf/dma-buf.c | 50 ++- include/linux/device.h | 7 +- include/linux/dma-buf.h | 14 + 12 files changed, 1189 insertions(+), 6 deletions(-) create mode 100644 drivers/cenalloc/Kconfig create mode 100644 drivers/cenalloc/Makefile create mode 100644 drivers/cenalloc/cenalloc.c create mode 100644 drivers/cenalloc/cenalloc.h create mode 100644 drivers/cenalloc/cenalloc_priv.h create mode 100644 drivers/cenalloc/cenalloc_system_contig.c
At present, struct device lacks a mechanism of exposing memory access constraints for the device.
Consequently, there is also no mechanism to share these constraints while sharing buffers using dma-buf.
If we add support for sharing such constraints, we could use that to try to collect requirements of different buffer-sharing devices to allocate buffers from a pool that satisfies requirements of all such devices.
This is an attempt to add this support; at the moment, only a bitmask is added, but if post discussion, we realise we need more information, we could always extend the definition of constraint.
A new dma-buf op is also added, to allow exporters to interpret or decide on constraint-masks on their own. A default implementation is provided to just AND (&) all the constraint-masks.
What constitutes a constraint-mask could be left for interpretation on a per-platform basis, while defining some common masks.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org --- drivers/dma-buf/dma-buf.c | 50 ++++++++++++++++++++++++++++++++++++++++++----- include/linux/device.h | 7 ++++++- include/linux/dma-buf.h | 14 +++++++++++++ 3 files changed, 65 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index f3014c4..33bdb6a 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -264,6 +264,30 @@ static inline int is_dma_buf_file(struct file *file) return file->f_op == &dma_buf_fops; }
+/* + * def_calc_access_constraints - default implementation of constraint checking + */ +static int def_calc_access_constraints(struct dma_buf *dmabuf, + struct dma_buf_attachment *attach) +{ + unsigned long access_mask; + + access_mask = attach->dev->dma_parms->access_constraints_mask; + + if (!access_mask) { + pr_warn("%s dev has no access_constraints_mask; using default\n", + dev_name(attach->dev)); + access_mask = DMA_BUF_ALL_MEMORY_ACCESS_MASK; + } + + dmabuf->access_constraints_mask &= access_mask; + + if (!dmabuf->access_constraints_mask) + return -EINVAL; + + return 0; +} + /** * dma_buf_export_named - Creates a new dma_buf, and associates an anon file * with this buffer, so it can be exported. @@ -313,6 +337,8 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops, dmabuf->ops = ops; dmabuf->size = size; dmabuf->exp_name = exp_name; + dmabuf->access_constraints_mask = DMA_BUF_ALL_MEMORY_ACCESS_MASK; + init_waitqueue_head(&dmabuf->poll); dmabuf->cb_excl.poll = dmabuf->cb_shared.poll = &dmabuf->poll; dmabuf->cb_excl.active = dmabuf->cb_shared.active = 0; @@ -410,8 +436,10 @@ void dma_buf_put(struct dma_buf *dmabuf) EXPORT_SYMBOL_GPL(dma_buf_put);
/** - * dma_buf_attach - Add the device to dma_buf's attachments list; optionally, - * calls attach() of dma_buf_ops to allow device-specific attach functionality + * dma_buf_attach - Add the device to dma_buf's attachments list; + * calculates access_constraints and throws error if constraints aren't + * satisfied. Optionally, calls attach() of dma_buf_ops to allow + * device-specific attach functionality. * @dmabuf: [in] buffer to attach device to. * @dev: [in] device to be attached. * @@ -436,11 +464,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
mutex_lock(&dmabuf->lock);
+ if (!dmabuf->ops->calc_access_constraints) + ret = def_calc_access_constraints(dmabuf, attach); + else + ret = dmabuf->ops->calc_access_constraints(dmabuf, attach); + + if (ret) + goto err_attach; + if (dmabuf->ops->attach) { ret = dmabuf->ops->attach(dmabuf, dev, attach); if (ret) goto err_attach; } + list_add(&attach->node, &dmabuf->attachments);
mutex_unlock(&dmabuf->lock); @@ -785,7 +822,7 @@ static int dma_buf_describe(struct seq_file *s) return ret;
seq_puts(s, "\nDma-buf Objects:\n"); - seq_puts(s, "size\tflags\tmode\tcount\texp_name\n"); + seq_puts(s, "size\tflags\tmode\tcount\tconstraints\texp_name\n");
list_for_each_entry(buf_obj, &db_list.head, list_node) { ret = mutex_lock_interruptible(&buf_obj->lock); @@ -796,10 +833,11 @@ static int dma_buf_describe(struct seq_file *s) continue; }
- seq_printf(s, "%08zu\t%08x\t%08x\t%08ld\t%s\n", + seq_printf(s, "%08zu\t%08x\t%08x\t%08ld\t%08lx\t%s\n", buf_obj->size, buf_obj->file->f_flags, buf_obj->file->f_mode, (long)(buf_obj->file->f_count.counter), + buf_obj->access_constraints_mask, buf_obj->exp_name);
seq_puts(s, "\tAttached Devices:\n"); @@ -808,7 +846,9 @@ static int dma_buf_describe(struct seq_file *s) list_for_each_entry(attach_obj, &buf_obj->attachments, node) { seq_puts(s, "\t");
- seq_printf(s, "%s\n", dev_name(attach_obj->dev)); + seq_printf(s, "%s\t:%lx\n", + dev_name(attach_obj->dev), + attach_obj->dev->dma_parms->access_constraints_mask); attach_count++; }
diff --git a/include/linux/device.h b/include/linux/device.h index a608e23..f9aefa2 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -647,6 +647,11 @@ struct device_dma_parameters { */ unsigned int max_segment_size; unsigned long segment_boundary_mask; + /* + * access_constraints_mask: this would be used to share constraints + * about memories that this device can access. + */ + unsigned long access_constraints_mask; };
struct acpi_device; @@ -696,7 +701,7 @@ struct acpi_dev_node { * such descriptors. * @dma_pfn_offset: offset of DMA memory range relatively of RAM * @dma_parms: A low level driver may set these to teach IOMMU code about - * segment limitations. + * segment limitations, and access constraints. * @dma_pools: Dma pools (if dma'ble device). * @dma_mem: Internal for coherent mem override. * @cma_area: Contiguous memory area for dma allocations diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 694e1fe..8429a38 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -37,6 +37,8 @@ struct device; struct dma_buf; struct dma_buf_attachment;
+#define DMA_BUF_ALL_MEMORY_ACCESS_MASK ((unsigned long)-1) + /** * struct dma_buf_ops - operations possible on struct dma_buf * @attach: [optional] allows different devices to 'attach' themselves to the @@ -44,6 +46,12 @@ struct dma_buf_attachment; * is already allocated and incompatible with the requirements * of requesting device. * @detach: [optional] detach a given device from this buffer. + * @calc_access_constraints(): [optional] will be called at the end of each + * attach - to calculate and set the constraints for this dma_buf + * according to this attachment's access_constraint_mask in + * dev->dma_parms. + * A default implementation is provided, but exporters are free to + * provide custom version if needed. * @map_dma_buf: returns list of scatter pages allocated, increases usecount * of the buffer. Requires atleast one attach to be called * before. Returned sg list should already be mapped into @@ -77,6 +85,9 @@ struct dma_buf_ops {
void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
+ int (*calc_access_constraints)(struct dma_buf *, + struct dma_buf_attachment *); + /* For {map,unmap}_dma_buf below, any specific buffer attributes * required should get added to device_dma_parameters accessible * via dev->dma_params. @@ -86,6 +97,7 @@ struct dma_buf_ops { void (*unmap_dma_buf)(struct dma_buf_attachment *, struct sg_table *, enum dma_data_direction); + /* TODO: Add try_map_dma_buf version, to return immed with -EBUSY * if the call would block. */ @@ -116,6 +128,7 @@ struct dma_buf_ops { * @ops: dma_buf_ops associated with this buffer object. * @exp_name: name of the exporter; useful for debugging. * @list_node: node for dma_buf accounting and debugging. + * @access_constraints_mask: mask to share access constraints. * @priv: exporter specific private data for this buffer object. * @resv: reservation object linked to this dma-buf */ @@ -130,6 +143,7 @@ struct dma_buf { void *vmap_ptr; const char *exp_name; struct list_head list_node; + unsigned long access_constraints_mask; void *priv; struct reservation_object *resv;
On Sat, Oct 11, 2014 at 01:37:55AM +0530, Sumit Semwal wrote:
At present, struct device lacks a mechanism of exposing memory access constraints for the device.
Consequently, there is also no mechanism to share these constraints while sharing buffers using dma-buf.
If we add support for sharing such constraints, we could use that to try to collect requirements of different buffer-sharing devices to allocate buffers from a pool that satisfies requirements of all such devices.
This is an attempt to add this support; at the moment, only a bitmask is added, but if post discussion, we realise we need more information, we could always extend the definition of constraint.
A new dma-buf op is also added, to allow exporters to interpret or decide on constraint-masks on their own. A default implementation is provided to just AND (&) all the constraint-masks.
What constitutes a constraint-mask could be left for interpretation on a per-platform basis, while defining some common masks.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
Just a few high-level comments, I'm between conference travel but hopefully I can discuss this a bit at plumbers next week.
- I agree that for the insane specific cases we need something opaque like the access constraints mask you propose here. But for the normal case I think the existing dma constraints in dma_params would go a long way, and I think we should look at Rob's RFC from aeons ago to solve those:
https://lkml.org/lkml/2012/7/19/285
With this we should be able to cover the allocation constraints of 90% of all cases hopefully.
- I'm not sure whether an opaque bitmask is good enough really, I suspect that we also need various priorities between different allocators. With the option that some allocators are flat-out incompatible.
- The big bummer imo with ION is that it fully side-steps, but this proposal here also seems to add entirely new allocators. My rough idea was that at allocate/attach time we iterate over all attached devices like in Rob's patch and compute the most constrained allocation requirements. Then we pick the underlying dma api allocator for these constraints. That probably means that we need to open up the dma api a bit. But I guess for a start we could simply try to allocate from the most constrained device. Together with the opaque bits you propose here we could even map additional crazy requirements like that an allocation must come from a specific memory bank (provided by a special-purpose CMA region). That might also mean that requirements are exclusive and no allocation is possible.
- I'm not sure we should allow drivers to override the access constraint checks really - the dma_buf interfaces already provide this possibility through the ->attach callback. In there exporters are allowed to reject the attachment for any reason whatsover.
- I think we should at least provide a helper implementation to allocate dma-buffers for multiple devices using the dma constraints logic we implement here. I think we should even go as far as providing a default implementation for dma-bufs which uses dma_alloc_coherent and this new dma contstraints computation code internally. This should be good enough for almost all devices, except those that do crazy stuff like swap support of buffer objects (gem/ttm), virtual hardware buffers (vmwgfx) or have other special needs (e.g. non-coherent buffers as speed optimization).
Cheers, Daniel
drivers/dma-buf/dma-buf.c | 50 ++++++++++++++++++++++++++++++++++++++++++----- include/linux/device.h | 7 ++++++- include/linux/dma-buf.h | 14 +++++++++++++ 3 files changed, 65 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index f3014c4..33bdb6a 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -264,6 +264,30 @@ static inline int is_dma_buf_file(struct file *file) return file->f_op == &dma_buf_fops; } +/*
- def_calc_access_constraints - default implementation of constraint checking
- */
+static int def_calc_access_constraints(struct dma_buf *dmabuf,
struct dma_buf_attachment *attach)
+{
- unsigned long access_mask;
- access_mask = attach->dev->dma_parms->access_constraints_mask;
- if (!access_mask) {
pr_warn("%s dev has no access_constraints_mask; using default\n",
dev_name(attach->dev));
access_mask = DMA_BUF_ALL_MEMORY_ACCESS_MASK;
- }
- dmabuf->access_constraints_mask &= access_mask;
- if (!dmabuf->access_constraints_mask)
return -EINVAL;
- return 0;
+}
/**
- dma_buf_export_named - Creates a new dma_buf, and associates an anon file
- with this buffer, so it can be exported.
@@ -313,6 +337,8 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops, dmabuf->ops = ops; dmabuf->size = size; dmabuf->exp_name = exp_name;
- dmabuf->access_constraints_mask = DMA_BUF_ALL_MEMORY_ACCESS_MASK;
- init_waitqueue_head(&dmabuf->poll); dmabuf->cb_excl.poll = dmabuf->cb_shared.poll = &dmabuf->poll; dmabuf->cb_excl.active = dmabuf->cb_shared.active = 0;
@@ -410,8 +436,10 @@ void dma_buf_put(struct dma_buf *dmabuf) EXPORT_SYMBOL_GPL(dma_buf_put); /**
- dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
- calls attach() of dma_buf_ops to allow device-specific attach functionality
- dma_buf_attach - Add the device to dma_buf's attachments list;
- calculates access_constraints and throws error if constraints aren't
- satisfied. Optionally, calls attach() of dma_buf_ops to allow
- device-specific attach functionality.
- @dmabuf: [in] buffer to attach device to.
- @dev: [in] device to be attached.
@@ -436,11 +464,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, mutex_lock(&dmabuf->lock);
- if (!dmabuf->ops->calc_access_constraints)
ret = def_calc_access_constraints(dmabuf, attach);
- else
ret = dmabuf->ops->calc_access_constraints(dmabuf, attach);
- if (ret)
goto err_attach;
- if (dmabuf->ops->attach) { ret = dmabuf->ops->attach(dmabuf, dev, attach); if (ret) goto err_attach; }
- list_add(&attach->node, &dmabuf->attachments);
mutex_unlock(&dmabuf->lock); @@ -785,7 +822,7 @@ static int dma_buf_describe(struct seq_file *s) return ret; seq_puts(s, "\nDma-buf Objects:\n");
- seq_puts(s, "size\tflags\tmode\tcount\texp_name\n");
- seq_puts(s, "size\tflags\tmode\tcount\tconstraints\texp_name\n");
list_for_each_entry(buf_obj, &db_list.head, list_node) { ret = mutex_lock_interruptible(&buf_obj->lock); @@ -796,10 +833,11 @@ static int dma_buf_describe(struct seq_file *s) continue; }
seq_printf(s, "%08zu\t%08x\t%08x\t%08ld\t%s\n",
seq_printf(s, "%08zu\t%08x\t%08x\t%08ld\t%08lx\t%s\n", buf_obj->size, buf_obj->file->f_flags, buf_obj->file->f_mode, (long)(buf_obj->file->f_count.counter),
buf_obj->access_constraints_mask, buf_obj->exp_name);
seq_puts(s, "\tAttached Devices:\n"); @@ -808,7 +846,9 @@ static int dma_buf_describe(struct seq_file *s) list_for_each_entry(attach_obj, &buf_obj->attachments, node) { seq_puts(s, "\t");
seq_printf(s, "%s\n", dev_name(attach_obj->dev));
seq_printf(s, "%s\t:%lx\n",
dev_name(attach_obj->dev),
}attach_obj->dev->dma_parms->access_constraints_mask); attach_count++;
diff --git a/include/linux/device.h b/include/linux/device.h index a608e23..f9aefa2 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -647,6 +647,11 @@ struct device_dma_parameters { */ unsigned int max_segment_size; unsigned long segment_boundary_mask;
- /*
* access_constraints_mask: this would be used to share constraints
* about memories that this device can access.
*/
- unsigned long access_constraints_mask;
}; struct acpi_device; @@ -696,7 +701,7 @@ struct acpi_dev_node {
such descriptors.
- @dma_pfn_offset: offset of DMA memory range relatively of RAM
- @dma_parms: A low level driver may set these to teach IOMMU code about
segment limitations.
segment limitations, and access constraints.
- @dma_pools: Dma pools (if dma'ble device).
- @dma_mem: Internal for coherent mem override.
- @cma_area: Contiguous memory area for dma allocations
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 694e1fe..8429a38 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -37,6 +37,8 @@ struct device; struct dma_buf; struct dma_buf_attachment; +#define DMA_BUF_ALL_MEMORY_ACCESS_MASK ((unsigned long)-1)
/**
- struct dma_buf_ops - operations possible on struct dma_buf
- @attach: [optional] allows different devices to 'attach' themselves to the
@@ -44,6 +46,12 @@ struct dma_buf_attachment;
is already allocated and incompatible with the requirements
of requesting device.
- @detach: [optional] detach a given device from this buffer.
- @calc_access_constraints(): [optional] will be called at the end of each
attach - to calculate and set the constraints for this dma_buf
according to this attachment's access_constraint_mask in
dev->dma_parms.
A default implementation is provided, but exporters are free to
provide custom version if needed.
- @map_dma_buf: returns list of scatter pages allocated, increases usecount
of the buffer. Requires atleast one attach to be called
before. Returned sg list should already be mapped into
@@ -77,6 +85,9 @@ struct dma_buf_ops { void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
- int (*calc_access_constraints)(struct dma_buf *,
struct dma_buf_attachment *);
- /* For {map,unmap}_dma_buf below, any specific buffer attributes
- required should get added to device_dma_parameters accessible
- via dev->dma_params.
@@ -86,6 +97,7 @@ struct dma_buf_ops { void (*unmap_dma_buf)(struct dma_buf_attachment *, struct sg_table *, enum dma_data_direction);
- /* TODO: Add try_map_dma_buf version, to return immed with -EBUSY
*/
- if the call would block.
@@ -116,6 +128,7 @@ struct dma_buf_ops {
- @ops: dma_buf_ops associated with this buffer object.
- @exp_name: name of the exporter; useful for debugging.
- @list_node: node for dma_buf accounting and debugging.
*/
- @access_constraints_mask: mask to share access constraints.
- @priv: exporter specific private data for this buffer object.
- @resv: reservation object linked to this dma-buf
@@ -130,6 +143,7 @@ struct dma_buf { void *vmap_ptr; const char *exp_name; struct list_head list_node;
- unsigned long access_constraints_mask; void *priv; struct reservation_object *resv;
1.9.1
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
On 10/11/2014 11:55 AM, Daniel Vetter wrote:
On Sat, Oct 11, 2014 at 01:37:55AM +0530, Sumit Semwal wrote:
At present, struct device lacks a mechanism of exposing memory access constraints for the device.
Consequently, there is also no mechanism to share these constraints while sharing buffers using dma-buf.
If we add support for sharing such constraints, we could use that to try to collect requirements of different buffer-sharing devices to allocate buffers from a pool that satisfies requirements of all such devices.
This is an attempt to add this support; at the moment, only a bitmask is added, but if post discussion, we realise we need more information, we could always extend the definition of constraint.
A new dma-buf op is also added, to allow exporters to interpret or decide on constraint-masks on their own. A default implementation is provided to just AND (&) all the constraint-masks.
What constitutes a constraint-mask could be left for interpretation on a per-platform basis, while defining some common masks.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
Just a few high-level comments, I'm between conference travel but hopefully I can discuss this a bit at plumbers next week.
I agree that for the insane specific cases we need something opaque like the access constraints mask you propose here. But for the normal case I think the existing dma constraints in dma_params would go a long way, and I think we should look at Rob's RFC from aeons ago to solve those:
https://lkml.org/lkml/2012/7/19/285
With this we should be able to cover the allocation constraints of 90% of all cases hopefully.
I'm not sure whether an opaque bitmask is good enough really, I suspect that we also need various priorities between different allocators. With the option that some allocators are flat-out incompatible.
From my experience with Ion, the bitmask is okay if you have only a few types but as soon as there are multiple regions it gets complicated and when you start adding in priority via id it really gets unwieldy.
Thanks, Laura
On Mon, Oct 13, 2014 at 01:14:58AM -0700, Laura Abbott wrote:
On 10/11/2014 11:55 AM, Daniel Vetter wrote:
On Sat, Oct 11, 2014 at 01:37:55AM +0530, Sumit Semwal wrote:
At present, struct device lacks a mechanism of exposing memory access constraints for the device.
Consequently, there is also no mechanism to share these constraints while sharing buffers using dma-buf.
If we add support for sharing such constraints, we could use that to try to collect requirements of different buffer-sharing devices to allocate buffers from a pool that satisfies requirements of all such devices.
This is an attempt to add this support; at the moment, only a bitmask is added, but if post discussion, we realise we need more information, we could always extend the definition of constraint.
A new dma-buf op is also added, to allow exporters to interpret or decide on constraint-masks on their own. A default implementation is provided to just AND (&) all the constraint-masks.
What constitutes a constraint-mask could be left for interpretation on a per-platform basis, while defining some common masks.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
Just a few high-level comments, I'm between conference travel but hopefully I can discuss this a bit at plumbers next week.
I agree that for the insane specific cases we need something opaque like the access constraints mask you propose here. But for the normal case I think the existing dma constraints in dma_params would go a long way, and I think we should look at Rob's RFC from aeons ago to solve those:
https://lkml.org/lkml/2012/7/19/285
With this we should be able to cover the allocation constraints of 90% of all cases hopefully.
I'm not sure whether an opaque bitmask is good enough really, I suspect that we also need various priorities between different allocators. With the option that some allocators are flat-out incompatible.
From my experience with Ion, the bitmask is okay if you have only a few types but as soon as there are multiple regions it gets complicated and when you start adding in priority via id it really gets unwieldy.
My idea is to just have a priority field for all devices which have special requirements (i.e. beyond dma masks/alignement/segment limits). Same priority on different devices would indicate an incompatibility, but higher priority on a parent device would indicate a possible allocator. That way you could have a root allocator if there's a way to unify everything. Not sure though whether this will work, since on intel devices we just don't have these kinds of special constraints.
Overall my idea is to reuse the existing dma allocation code in upstream linux as much as possible. So having a completely separate hirarchy of memory allocators for shared buffers should imo be avoided. -Daniel
Hi Daniel,
Thanks a bunch for your review comments! A few comments, post our discussion at LPC;
On 12 October 2014 at 00:25, Daniel Vetter daniel@ffwll.ch wrote:
On Sat, Oct 11, 2014 at 01:37:55AM +0530, Sumit Semwal wrote:
At present, struct device lacks a mechanism of exposing memory access constraints for the device.
Consequently, there is also no mechanism to share these constraints while sharing buffers using dma-buf.
If we add support for sharing such constraints, we could use that to try to collect requirements of different buffer-sharing devices to allocate buffers from a pool that satisfies requirements of all such devices.
This is an attempt to add this support; at the moment, only a bitmask is added, but if post discussion, we realise we need more information, we could always extend the definition of constraint.
A new dma-buf op is also added, to allow exporters to interpret or decide on constraint-masks on their own. A default implementation is provided to just AND (&) all the constraint-masks.
What constitutes a constraint-mask could be left for interpretation on a per-platform basis, while defining some common masks.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
Just a few high-level comments, I'm between conference travel but hopefully I can discuss this a bit at plumbers next week.
I agree that for the insane specific cases we need something opaque like the access constraints mask you propose here. But for the normal case I think the existing dma constraints in dma_params would go a long way, and I think we should look at Rob's RFC from aeons ago to solve those:
https://lkml.org/lkml/2012/7/19/285
With this we should be able to cover the allocation constraints of 90% of all cases hopefully.
I'm not sure whether an opaque bitmask is good enough really, I suspect that we also need various priorities between different allocators. With the option that some allocators are flat-out incompatible.
Your/Rob's idea to figure out the constraints wrt max number of segments in the sg_list can provide, like you said, maybe 80-90% of the allocation constraints hopefully. The opaque mask should help for the remaining 'crazy' cases, so I'll be glad to merge Rob's and my approach on defining the constraints.
I should think a little bit more about the priority idea that you propose here (and in another patch), but atm I am unable to see how that could help solve the finding-out-constraints problem.
- The big bummer imo with ION is that it fully side-steps, but this proposal here also seems to add entirely new allocators. My rough idea
This proposal does borrow this bit from ION, but once we have the required changes done in the dma api itself, the allocators can just become shims to the dma api allocators (eg dma_alloc_coherent etc) for cases where they can be used directly, while leaving provision for any crazy platform-specific allocators, without the userspace having to worry about it.
was that at allocate/attach time we iterate over all attached devices like in Rob's patch and compute the most constrained allocation requirements. Then we pick the underlying dma api allocator for these constraints. That probably means that we need to open up the dma api a bit. But I guess for a start we could simply try to allocate from the most constrained device. Together with the opaque bits you propose here we could even map additional crazy requirements like that an allocation must come from a specific memory bank (provided by a special-purpose CMA region). That might also mean that requirements are exclusive and no allocation is possible.
My idea was a little variation on what you said here - rather than do compute the most constraint allocation 'after' devices have attached (and right now, we don't really have a way to know that - but that's another point), I'd proposed to do the compute on each attach request, so the requesting drivers can know immediately if the attachment will not work for the other currently attached devices.
- I'm not sure we should allow drivers to override the access constraint checks really - the dma_buf interfaces already provide this possibility through the ->attach callback. In there exporters are allowed to reject the attachment for any reason whatsover.
This override the access constraint check is again meant only as a helper, but I could sure drop it.
- I think we should at least provide a helper implementation to allocate dma-buffers for multiple devices using the dma constraints logic we implement here. I think we should even go as far as providing a default implementation for dma-bufs which uses dma_alloc_coherent and this new dma contstraints computation code internally. This should be good enough
Ok, my idea was to keep the allocation helpers separate from dma-buf framework - hence the cenalloc idea; if it seems like an extremely terrible approach to separate out helpers, I could try and do an RFC based on your idea.
for almost all devices, except those that do crazy stuff like swap support of buffer objects (gem/ttm), virtual hardware buffers (vmwgfx) or have other special needs (e.g. non-coherent buffers as speed optimization).
Cenalloc type of idea could allow for these special needs I think!
Cheers, Daniel
drivers/dma-buf/dma-buf.c | 50 ++++++++++++++++++++++++++++++++++++++++++----- include/linux/device.h | 7 ++++++- include/linux/dma-buf.h | 14 +++++++++++++ 3 files changed, 65 insertions(+), 6 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index f3014c4..33bdb6a 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -264,6 +264,30 @@ static inline int is_dma_buf_file(struct file *file) return file->f_op == &dma_buf_fops; }
+/*
- def_calc_access_constraints - default implementation of constraint checking
- */
+static int def_calc_access_constraints(struct dma_buf *dmabuf,
struct dma_buf_attachment *attach)
+{
unsigned long access_mask;
access_mask = attach->dev->dma_parms->access_constraints_mask;
if (!access_mask) {
pr_warn("%s dev has no access_constraints_mask; using default\n",
dev_name(attach->dev));
access_mask = DMA_BUF_ALL_MEMORY_ACCESS_MASK;
}
dmabuf->access_constraints_mask &= access_mask;
if (!dmabuf->access_constraints_mask)
return -EINVAL;
return 0;
+}
/**
- dma_buf_export_named - Creates a new dma_buf, and associates an anon file
- with this buffer, so it can be exported.
@@ -313,6 +337,8 @@ struct dma_buf *dma_buf_export_named(void *priv, const struct dma_buf_ops *ops, dmabuf->ops = ops; dmabuf->size = size; dmabuf->exp_name = exp_name;
dmabuf->access_constraints_mask = DMA_BUF_ALL_MEMORY_ACCESS_MASK;
init_waitqueue_head(&dmabuf->poll); dmabuf->cb_excl.poll = dmabuf->cb_shared.poll = &dmabuf->poll; dmabuf->cb_excl.active = dmabuf->cb_shared.active = 0;
@@ -410,8 +436,10 @@ void dma_buf_put(struct dma_buf *dmabuf) EXPORT_SYMBOL_GPL(dma_buf_put);
/**
- dma_buf_attach - Add the device to dma_buf's attachments list; optionally,
- calls attach() of dma_buf_ops to allow device-specific attach functionality
- dma_buf_attach - Add the device to dma_buf's attachments list;
- calculates access_constraints and throws error if constraints aren't
- satisfied. Optionally, calls attach() of dma_buf_ops to allow
- device-specific attach functionality.
- @dmabuf: [in] buffer to attach device to.
- @dev: [in] device to be attached.
@@ -436,11 +464,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
mutex_lock(&dmabuf->lock);
if (!dmabuf->ops->calc_access_constraints)
ret = def_calc_access_constraints(dmabuf, attach);
else
ret = dmabuf->ops->calc_access_constraints(dmabuf, attach);
if (ret)
goto err_attach;
if (dmabuf->ops->attach) { ret = dmabuf->ops->attach(dmabuf, dev, attach); if (ret) goto err_attach; }
list_add(&attach->node, &dmabuf->attachments); mutex_unlock(&dmabuf->lock);
@@ -785,7 +822,7 @@ static int dma_buf_describe(struct seq_file *s) return ret;
seq_puts(s, "\nDma-buf Objects:\n");
seq_puts(s, "size\tflags\tmode\tcount\texp_name\n");
seq_puts(s, "size\tflags\tmode\tcount\tconstraints\texp_name\n"); list_for_each_entry(buf_obj, &db_list.head, list_node) { ret = mutex_lock_interruptible(&buf_obj->lock);
@@ -796,10 +833,11 @@ static int dma_buf_describe(struct seq_file *s) continue; }
seq_printf(s, "%08zu\t%08x\t%08x\t%08ld\t%s\n",
seq_printf(s, "%08zu\t%08x\t%08x\t%08ld\t%08lx\t%s\n", buf_obj->size, buf_obj->file->f_flags, buf_obj->file->f_mode, (long)(buf_obj->file->f_count.counter),
buf_obj->access_constraints_mask, buf_obj->exp_name); seq_puts(s, "\tAttached Devices:\n");
@@ -808,7 +846,9 @@ static int dma_buf_describe(struct seq_file *s) list_for_each_entry(attach_obj, &buf_obj->attachments, node) { seq_puts(s, "\t");
seq_printf(s, "%s\n", dev_name(attach_obj->dev));
seq_printf(s, "%s\t:%lx\n",
dev_name(attach_obj->dev),
attach_obj->dev->dma_parms->access_constraints_mask); attach_count++; }
diff --git a/include/linux/device.h b/include/linux/device.h index a608e23..f9aefa2 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -647,6 +647,11 @@ struct device_dma_parameters { */ unsigned int max_segment_size; unsigned long segment_boundary_mask;
/*
* access_constraints_mask: this would be used to share constraints
* about memories that this device can access.
*/
unsigned long access_constraints_mask;
};
struct acpi_device; @@ -696,7 +701,7 @@ struct acpi_dev_node {
such descriptors.
- @dma_pfn_offset: offset of DMA memory range relatively of RAM
- @dma_parms: A low level driver may set these to teach IOMMU code about
segment limitations.
segment limitations, and access constraints.
- @dma_pools: Dma pools (if dma'ble device).
- @dma_mem: Internal for coherent mem override.
- @cma_area: Contiguous memory area for dma allocations
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 694e1fe..8429a38 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -37,6 +37,8 @@ struct device; struct dma_buf; struct dma_buf_attachment;
+#define DMA_BUF_ALL_MEMORY_ACCESS_MASK ((unsigned long)-1)
/**
- struct dma_buf_ops - operations possible on struct dma_buf
- @attach: [optional] allows different devices to 'attach' themselves to the
@@ -44,6 +46,12 @@ struct dma_buf_attachment;
is already allocated and incompatible with the requirements
of requesting device.
- @detach: [optional] detach a given device from this buffer.
- @calc_access_constraints(): [optional] will be called at the end of each
attach - to calculate and set the constraints for this dma_buf
according to this attachment's access_constraint_mask in
dev->dma_parms.
A default implementation is provided, but exporters are free to
provide custom version if needed.
- @map_dma_buf: returns list of scatter pages allocated, increases usecount
of the buffer. Requires atleast one attach to be called
before. Returned sg list should already be mapped into
@@ -77,6 +85,9 @@ struct dma_buf_ops {
void (*detach)(struct dma_buf *, struct dma_buf_attachment *);
int (*calc_access_constraints)(struct dma_buf *,
struct dma_buf_attachment *);
/* For {map,unmap}_dma_buf below, any specific buffer attributes * required should get added to device_dma_parameters accessible * via dev->dma_params.
@@ -86,6 +97,7 @@ struct dma_buf_ops { void (*unmap_dma_buf)(struct dma_buf_attachment *, struct sg_table *, enum dma_data_direction);
/* TODO: Add try_map_dma_buf version, to return immed with -EBUSY * if the call would block. */
@@ -116,6 +128,7 @@ struct dma_buf_ops {
- @ops: dma_buf_ops associated with this buffer object.
- @exp_name: name of the exporter; useful for debugging.
- @list_node: node for dma_buf accounting and debugging.
*/
- @access_constraints_mask: mask to share access constraints.
- @priv: exporter specific private data for this buffer object.
- @resv: reservation object linked to this dma-buf
@@ -130,6 +143,7 @@ struct dma_buf { void *vmap_ptr; const char *exp_name; struct list_head list_node;
unsigned long access_constraints_mask; void *priv; struct reservation_object *resv;
-- 1.9.1
dri-devel mailing list dri-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/dri-devel
-- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Wed, Dec 10, 2014 at 07:01:16PM +0530, Sumit Semwal wrote:
Hi Daniel,
Thanks a bunch for your review comments! A few comments, post our discussion at LPC;
On 12 October 2014 at 00:25, Daniel Vetter daniel@ffwll.ch wrote:
On Sat, Oct 11, 2014 at 01:37:55AM +0530, Sumit Semwal wrote:
At present, struct device lacks a mechanism of exposing memory access constraints for the device.
Consequently, there is also no mechanism to share these constraints while sharing buffers using dma-buf.
If we add support for sharing such constraints, we could use that to try to collect requirements of different buffer-sharing devices to allocate buffers from a pool that satisfies requirements of all such devices.
This is an attempt to add this support; at the moment, only a bitmask is added, but if post discussion, we realise we need more information, we could always extend the definition of constraint.
A new dma-buf op is also added, to allow exporters to interpret or decide on constraint-masks on their own. A default implementation is provided to just AND (&) all the constraint-masks.
What constitutes a constraint-mask could be left for interpretation on a per-platform basis, while defining some common masks.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
Just a few high-level comments, I'm between conference travel but hopefully I can discuss this a bit at plumbers next week.
I agree that for the insane specific cases we need something opaque like the access constraints mask you propose here. But for the normal case I think the existing dma constraints in dma_params would go a long way, and I think we should look at Rob's RFC from aeons ago to solve those:
https://lkml.org/lkml/2012/7/19/285
With this we should be able to cover the allocation constraints of 90% of all cases hopefully.
I'm not sure whether an opaque bitmask is good enough really, I suspect that we also need various priorities between different allocators. With the option that some allocators are flat-out incompatible.
Your/Rob's idea to figure out the constraints wrt max number of segments in the sg_list can provide, like you said, maybe 80-90% of the allocation constraints hopefully. The opaque mask should help for the remaining 'crazy' cases, so I'll be glad to merge Rob's and my approach on defining the constraints.
I should think a little bit more about the priority idea that you propose here (and in another patch), but atm I am unable to see how that could help solve the finding-out-constraints problem.
- The big bummer imo with ION is that it fully side-steps, but this proposal here also seems to add entirely new allocators. My rough idea
This proposal does borrow this bit from ION, but once we have the required changes done in the dma api itself, the allocators can just become shims to the dma api allocators (eg dma_alloc_coherent etc) for cases where they can be used directly, while leaving provision for any crazy platform-specific allocators, without the userspace having to worry about it.
was that at allocate/attach time we iterate over all attached devices like in Rob's patch and compute the most constrained allocation requirements. Then we pick the underlying dma api allocator for these constraints. That probably means that we need to open up the dma api a bit. But I guess for a start we could simply try to allocate from the most constrained device. Together with the opaque bits you propose here we could even map additional crazy requirements like that an allocation must come from a specific memory bank (provided by a special-purpose CMA region). That might also mean that requirements are exclusive and no allocation is possible.
My idea was a little variation on what you said here - rather than do compute the most constraint allocation 'after' devices have attached (and right now, we don't really have a way to know that - but that's another point), I'd proposed to do the compute on each attach request, so the requesting drivers can know immediately if the attachment will not work for the other currently attached devices.
Well I said allocate/attach ;-) But yeah if we check at attach and reject anything that doesn't work then there's no need to check again when allocating, it /should/ work. But perhaps good to be paranoid and check again.
- I'm not sure we should allow drivers to override the access constraint checks really - the dma_buf interfaces already provide this possibility through the ->attach callback. In there exporters are allowed to reject the attachment for any reason whatsover.
This override the access constraint check is again meant only as a helper, but I could sure drop it.
- I think we should at least provide a helper implementation to allocate dma-buffers for multiple devices using the dma constraints logic we implement here. I think we should even go as far as providing a default implementation for dma-bufs which uses dma_alloc_coherent and this new dma contstraints computation code internally. This should be good enough
Ok, my idea was to keep the allocation helpers separate from dma-buf framework - hence the cenalloc idea; if it seems like an extremely terrible approach to separate out helpers, I could try and do an RFC based on your idea.
Oh, I like helpers, it'd just put them into the dma-buf code and integrate it directly instead of creating something separate.
for almost all devices, except those that do crazy stuff like swap support of buffer objects (gem/ttm), virtual hardware buffers (vmwgfx) or have other special needs (e.g. non-coherent buffers as speed optimization).
Cenalloc type of idea could allow for these special needs I think!
Well imo we should aim for 90% first, fix out fallout and then reasses what's needed. Tends to leat to better design overall. -Daniel
For me dmabuf and cenalloc offer two different features: one is buffer sharing (dmabuf) and one is buffer allocation (cenalloc).
You may want to use dmabuf sharing feature whithout need of the buffer allocation feature, that is what for drm, v4l2, ION and other use dmabuf.
In addition of dmabuf we need something aware of hardware devices constraints to allocate the buffer, that will be the role of cenalloc. Like ION, cenalloc will also provide a usespace API to allocate buffers but , unlike ION, the selection of the allocator will be based on devices constraints not on heap ID.
In first step I think we can forget the bitmask approach and only use the current fields of devices structure like coherent_dma_mask, dma_parms->max_segment_size or dma_parms->segment_boundary_mask to find matching allocator.
What I propose is to add in allocator ops a match(struct device*) function which will be call at attache time. In this way cenalloc could ask to each allocator if it is able to allocate buffer for all attached devices.
Regards, Benjamin
2014-12-10 14:47 GMT+01:00 Daniel Vetter daniel@ffwll.ch:
On Wed, Dec 10, 2014 at 07:01:16PM +0530, Sumit Semwal wrote:
Hi Daniel,
Thanks a bunch for your review comments! A few comments, post our discussion at LPC;
On 12 October 2014 at 00:25, Daniel Vetter daniel@ffwll.ch wrote:
On Sat, Oct 11, 2014 at 01:37:55AM +0530, Sumit Semwal wrote:
At present, struct device lacks a mechanism of exposing memory access constraints for the device.
Consequently, there is also no mechanism to share these constraints while sharing buffers using dma-buf.
If we add support for sharing such constraints, we could use that to try to collect requirements of different buffer-sharing devices to allocate buffers from a pool that satisfies requirements of all such devices.
This is an attempt to add this support; at the moment, only a bitmask is added, but if post discussion, we realise we need more information, we could always extend the definition of constraint.
A new dma-buf op is also added, to allow exporters to interpret or decide on constraint-masks on their own. A default implementation is provided to just AND (&) all the constraint-masks.
What constitutes a constraint-mask could be left for interpretation on a per-platform basis, while defining some common masks.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
Just a few high-level comments, I'm between conference travel but hopefully I can discuss this a bit at plumbers next week.
I agree that for the insane specific cases we need something opaque like the access constraints mask you propose here. But for the normal case I think the existing dma constraints in dma_params would go a long way, and I think we should look at Rob's RFC from aeons ago to solve those:
https://lkml.org/lkml/2012/7/19/285
With this we should be able to cover the allocation constraints of 90% of all cases hopefully.
I'm not sure whether an opaque bitmask is good enough really, I suspect that we also need various priorities between different allocators. With the option that some allocators are flat-out incompatible.
Your/Rob's idea to figure out the constraints wrt max number of segments in the sg_list can provide, like you said, maybe 80-90% of the allocation constraints hopefully. The opaque mask should help for the remaining 'crazy' cases, so I'll be glad to merge Rob's and my approach on defining the constraints.
I should think a little bit more about the priority idea that you propose here (and in another patch), but atm I am unable to see how that could help solve the finding-out-constraints problem.
- The big bummer imo with ION is that it fully side-steps, but this proposal here also seems to add entirely new allocators. My rough idea
This proposal does borrow this bit from ION, but once we have the required changes done in the dma api itself, the allocators can just become shims to the dma api allocators (eg dma_alloc_coherent etc) for cases where they can be used directly, while leaving provision for any crazy platform-specific allocators, without the userspace having to worry about it.
was that at allocate/attach time we iterate over all attached devices like in Rob's patch and compute the most constrained allocation requirements. Then we pick the underlying dma api allocator for these constraints. That probably means that we need to open up the dma api a bit. But I guess for a start we could simply try to allocate from the most constrained device. Together with the opaque bits you propose here we could even map additional crazy requirements like that an allocation must come from a specific memory bank (provided by a special-purpose CMA region). That might also mean that requirements are exclusive and no allocation is possible.
My idea was a little variation on what you said here - rather than do compute the most constraint allocation 'after' devices have attached (and right now, we don't really have a way to know that - but that's another point), I'd proposed to do the compute on each attach request, so the requesting drivers can know immediately if the attachment will not work for the other currently attached devices.
Well I said allocate/attach ;-) But yeah if we check at attach and reject anything that doesn't work then there's no need to check again when allocating, it /should/ work. But perhaps good to be paranoid and check again.
- I'm not sure we should allow drivers to override the access constraint checks really - the dma_buf interfaces already provide this possibility through the ->attach callback. In there exporters are allowed to reject the attachment for any reason whatsover.
This override the access constraint check is again meant only as a helper, but I could sure drop it.
- I think we should at least provide a helper implementation to allocate dma-buffers for multiple devices using the dma constraints logic we implement here. I think we should even go as far as providing a default implementation for dma-bufs which uses dma_alloc_coherent and this new dma contstraints computation code internally. This should be good enough
Ok, my idea was to keep the allocation helpers separate from dma-buf framework - hence the cenalloc idea; if it seems like an extremely terrible approach to separate out helpers, I could try and do an RFC based on your idea.
Oh, I like helpers, it'd just put them into the dma-buf code and integrate it directly instead of creating something separate.
for almost all devices, except those that do crazy stuff like swap support of buffer objects (gem/ttm), virtual hardware buffers (vmwgfx) or have other special needs (e.g. non-coherent buffers as speed optimization).
Cenalloc type of idea could allow for these special needs I think!
Well imo we should aim for 90% first, fix out fallout and then reasses what's needed. Tends to leat to better design overall.
-Daniel
Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
Devices sharing buffers using dma-buf could benefit from sharing their constraints via struct device, and dma-buf framework would manage the common constraints for all attached devices per buffer.
With that information, we could have a 'generic' allocator helper in the form of a central dma-buf exporter, which can create dma-bufs, and allocate backing storage at the time of first call to dma_buf_map_attachment.
This allocation would utilise the constraint-mask by matching it to the right allocator from a pool of allocators, and then allocating buffer backing storage from this allocator.
The pool of allocators could be platform-dependent, allowing for platforms to hide the specifics of these allocators from the devices that access the dma-buf buffers.
A sample sequence could be: - get handle to cenalloc_device, - create a dmabuf using cenalloc_buffer_create; - use this dmabuf to attach each device, which has its constraints set in the constraints mask (dev->dma_params->access_constraints_mask) - at each dma_buf_attach() call, dma-buf will check to see if the constraint mask for the device requesting attachment is compatible with the constraints of devices already attached to the dma-buf; returns an error if it isn't. - after all devices have attached, the first call to dma_buf_map_attachment() will allocate the backing storage for the buffer. - follow the dma-buf api for map / unmap etc usage. - detach all attachments, - call cenalloc_buffer_free to free the buffer if refcount reaches zero;
** IMPORTANT** This mechanism of delayed allocation based on constraint-enablement will work *ONLY IF* the first map_attachment() call is made AFTER all attach() calls are done.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org --- MAINTAINERS | 1 + drivers/cenalloc/cenalloc.c | 597 +++++++++++++++++++++++++++++++++++++++ drivers/cenalloc/cenalloc.h | 99 +++++++ drivers/cenalloc/cenalloc_priv.h | 188 ++++++++++++ 4 files changed, 885 insertions(+) create mode 100644 drivers/cenalloc/cenalloc.c create mode 100644 drivers/cenalloc/cenalloc.h create mode 100644 drivers/cenalloc/cenalloc_priv.h
diff --git a/MAINTAINERS b/MAINTAINERS index 40d4796..e88ac81 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3039,6 +3039,7 @@ L: linux-media@vger.kernel.org L: dri-devel@lists.freedesktop.org L: linaro-mm-sig@lists.linaro.org F: drivers/dma-buf/ +F: drivers/cenalloc/ F: include/linux/dma-buf* F: include/linux/reservation.h F: include/linux/*fence.h diff --git a/drivers/cenalloc/cenalloc.c b/drivers/cenalloc/cenalloc.c new file mode 100644 index 0000000..f278056 --- /dev/null +++ b/drivers/cenalloc/cenalloc.c @@ -0,0 +1,597 @@ +/* + * Allocator helper framework for constraints-aware dma-buf backing storage + * allocation. + * This allows constraint-sharing devices to deferred-allocate buffers shared + * via dma-buf. + * + * Copyright(C) 2014 Linaro Limited. All rights reserved. + * Author: Sumit Semwal sumit.semwal@linaro.org + * + * Structure for management of clients, buffers etc heavily derived from + * Android's ION framework. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/device.h> +#include <linux/err.h> +#include <linux/file.h> +#include <linux/fs.h> +#include <linux/miscdevice.h> +#include <linux/rbtree.h> +#include <linux/slab.h> +#include <linux/vmalloc.h> +#include <linux/debugfs.h> +#include <linux/module.h> + +#include "cenalloc.h" +#include "cenalloc_priv.h" + +/* + * Constraints-aware allocator framework helper is meant to facilitate + * deferred allocation of backing storage for dma-buf buffers. + * It works for devices that can share their constraints via dma_params. + * These dma_params are then used by dma_buf_attach() to create a mask of + * common constraints. The cenalloc constraint helpers then allocate + * for the preferred allocator according to the constraint mask. + * Allocators and their corresponding constraint masks are pre-populated + * for a given system - likely at the time of platform initialization. + */ +/** + * struct cenalloc_device - the metadata of the cenalloc device node + * @dev: the actual misc device + * @buffers: an rb tree of all the existing buffers + * @buffer_lock: lock protecting the tree of buffers & handles + * @lock: rwsem protecting the tree of allocators + * @clients: list of all the clients created + * @allocators: list of all the allocators in the system + */ +struct cenalloc_device { + struct miscdevice dev; + struct rb_root buffers; + + struct mutex buffer_lock; + struct rw_semaphore lock; + + struct plist_head allocators; + +}; + + +/* this function should only be called while dev->buffer_lock is held */ +static void cenalloc_buffer_add(struct cenalloc_device *dev, + struct cenalloc_buffer *buffer) +{ + struct rb_node **p = &dev->buffers.rb_node; + struct rb_node *parent = NULL; + struct cenalloc_buffer *entry; + + while (*p) { + parent = *p; + entry = rb_entry(parent, struct cenalloc_buffer, node); + + if (buffer < entry) { + p = &(*p)->rb_left; + } else if (buffer > entry) { + p = &(*p)->rb_right; + } else { + pr_err("%s: buffer already found.", __func__); + BUG(); + } + } + + rb_link_node(&buffer->node, parent, p); + rb_insert_color(&buffer->node, &dev->buffers); +} + +static struct dma_buf_ops ca_dma_buf_ops; + +static bool is_cenalloc_buffer(struct dma_buf *dmabuf); + +/* + * cenalloc_buffer_create creates a buffer, exports the dma-buf handle and + * associates a dma-buf handle with it. + * Returns: + * on success, pointer to the associated dma_buf; + * error if dma-buf cannot be exported or if it is out of memory. + */ +struct dma_buf *cenalloc_buffer_create(struct cenalloc_device *dev, + unsigned long len, + unsigned long align, + unsigned long flags) +{ + struct cenalloc_buffer *buffer; + struct dma_buf *dmabuf; + + buffer = kzalloc(sizeof(struct cenalloc_buffer), GFP_KERNEL); + if (!buffer) + return ERR_PTR(-ENOMEM); + + buffer->flags = flags; + kref_init(&buffer->ref); + + buffer->dev = dev; + buffer->size = len; + buffer->align = align; + + dmabuf = dma_buf_export(buffer, &ca_dma_buf_ops, buffer->size, O_RDWR, + NULL); + if (IS_ERR(dmabuf)) + goto err; + + buffer->dmabuf = dmabuf; + dmabuf->priv = buffer; + + mutex_init(&buffer->lock); + + mutex_lock(&dev->buffer_lock); + cenalloc_buffer_add(dev, buffer); + mutex_unlock(&dev->buffer_lock); + + return dmabuf; + +err: + kfree(buffer); + return ERR_CAST(dmabuf); +} +EXPORT_SYMBOL_GPL(cenalloc_buffer_create); + + +static void cenalloc_buffer_destroy(struct cenalloc_buffer *buffer) +{ + if (WARN_ON(buffer->kmap_cnt > 0)) + buffer->allocator->ops->unmap_kernel(buffer->allocator, buffer); + buffer->allocator->ops->unmap_dma(buffer->allocator, buffer); + buffer->allocator->ops->free(buffer); + kfree(buffer); +} + +static void _cenalloc_buffer_destroy(struct cenalloc_buffer *buffer) +{ + struct cenalloc_device *dev = buffer->dev; + + mutex_lock(&dev->buffer_lock); + rb_erase(&buffer->node, &dev->buffers); + mutex_unlock(&dev->buffer_lock); + + cenalloc_buffer_destroy(buffer); +} + +void cenalloc_buffer_free(struct dma_buf *dmabuf) +{ + if (is_cenalloc_buffer(dmabuf)) + dma_buf_put(dmabuf); +} +EXPORT_SYMBOL_GPL(cenalloc_buffer_free); + +int cenalloc_phys(struct dma_buf *dmabuf, + phys_addr_t *addr, size_t *len) +{ + struct cenalloc_buffer *buffer; + int ret; + + if (is_cenalloc_buffer(dmabuf)) + buffer = (struct cenalloc_buffer *)dmabuf->priv; + else + return -EINVAL; + + if (!buffer->allocator->ops->phys) { + pr_err("%s: cenalloc_phys is not implemented by this allocator.\n", + __func__); + return -ENODEV; + } + mutex_lock(&buffer->lock); + ret = buffer->allocator->ops->phys(buffer->allocator, buffer, addr, + len); + mutex_lock(&buffer->lock); + return ret; +} +EXPORT_SYMBOL_GPL(cenalloc_phys); + +static void *cenalloc_buffer_kmap_get(struct cenalloc_buffer *buffer) +{ + void *vaddr; + + if (buffer->kmap_cnt) { + buffer->kmap_cnt++; + return buffer->vaddr; + } + vaddr = buffer->allocator->ops->map_kernel(buffer->allocator, buffer); + if (WARN_ONCE(vaddr == NULL, + "allocator->ops->map_kernel should return ERR_PTR on error")) + return ERR_PTR(-EINVAL); + if (IS_ERR(vaddr)) + return vaddr; + buffer->vaddr = vaddr; + buffer->kmap_cnt++; + return vaddr; +} + +static void cenalloc_buffer_kmap_put(struct cenalloc_buffer *buffer) +{ + buffer->kmap_cnt--; + if (!buffer->kmap_cnt) { + buffer->allocator->ops->unmap_kernel(buffer->allocator, buffer); + buffer->vaddr = NULL; + } +} + +struct sg_table *cenalloc_sg_table(struct dma_buf *dmabuf) +{ + struct sg_table *table; + struct cenalloc_buffer *buffer; + + if (is_cenalloc_buffer(dmabuf)) + buffer = (struct cenalloc_buffer *)dmabuf->priv; + else { + pr_err("%s: invalid buffer passed to sg_table.\n", + __func__); + return ERR_PTR(-EINVAL); + } + + mutex_lock(&buffer->lock); + table = buffer->sg_table; + mutex_unlock(&buffer->lock); + return table; +} +EXPORT_SYMBOL_GPL(cenalloc_sg_table); + +/* + * dma_buf ops implementation + */ + +static void cenalloc_buffer_sync_for_device(struct cenalloc_buffer *buffer, + struct device *dev, + enum dma_data_direction direction); + +/* + * This will find the right allocator for the buffer passed; + * assumption is, that all the interested importers have called dma_buf_attach() + * with their right constraint masks before the first call to + * dma_buf_map_attachment(). + * At present, using the same priority based mechanism as ION. + */ +static int cenalloc_find_allocator(struct cenalloc_device *dev, + struct cenalloc_buffer *buf) +{ + struct cenalloc_allocator *allocator; + unsigned long constraints_mask = buf->dmabuf->access_constraints_mask; + size_t len = buf->size; + + /* + * traverse the list of allocators available in this system in priority + * order. If the allocator type is supported by the client, and matches + * the request of the caller allocate from it. Repeat until allocate + * has succeeded or all allocators have been tried. + */ + len = PAGE_ALIGN(len); + + if (!len) + return -EINVAL; + + plist_for_each_entry(allocator, &dev->allocators, node) { + /* if the caller didn't specify this allocator id */ + if (!((1 << allocator->id) & constraints_mask)) + continue; + buf->allocator = allocator; + } + + if (buf->allocator == NULL) + return -ENODEV; + + return 0; +} + +static struct sg_table *cenalloc_buffer_first_alloc( + struct cenalloc_buffer *buffer) +{ + struct cenalloc_allocator *allocator = buffer->allocator; + struct sg_table *table; + + int num_pages = PAGE_ALIGN(buffer->size) / PAGE_SIZE; + struct scatterlist *sg; + int ret, i, j, k = 0; + + mutex_lock(&buffer->lock); + ret = cenalloc_find_allocator(buffer->dev, buffer); + + if (ret) { + mutex_unlock(&buffer->lock); + return ERR_PTR(-ENODEV); + } + + ret = allocator->ops->allocate(allocator, buffer, buffer->size, + buffer->align, buffer->flags); + if (ret) + goto cannot_allocate; + + table = allocator->ops->map_dma(allocator, buffer); + if (WARN_ONCE(table == NULL, + "allocator->ops->map_dma should return ERR_PTR on error")) + table = ERR_PTR(-EINVAL); + + if (IS_ERR(table)) { + /* TODO: find the right way to handle an error here? */ + allocator->ops->free(buffer); + + mutex_unlock(&buffer->lock); + return ERR_CAST(table); + } + + buffer->pages = vmalloc(sizeof(struct page *) * num_pages); + if (!buffer->pages) { + ret = -ENOMEM; + goto cannot_allocate; + } + + for_each_sg(table->sgl, sg, table->nents, i) { + struct page *page = sg_page(sg); + + for (j = 0; j < sg->length / PAGE_SIZE; j++) + buffer->pages[k++] = page++; + } + + mutex_unlock(&buffer->lock); + + return table; + +cannot_allocate: + mutex_unlock(&buffer->lock); + return ERR_PTR(ret); +} + +static void cenalloc_buffer_sync_for_device(struct cenalloc_buffer *buffer, + struct device *dev, + enum dma_data_direction dir) +{ + + if (!buffer->allocator->ops->sync_for_device) { + pr_err("%s: this allocator does not define a method for sync-to-device\n", + __func__); + return; + } + + buffer->allocator->ops->sync_for_device(buffer->allocator, buffer, dev, + dir); + +} + +/* + * cenalloc_map_dma_buf() models delayed allocation; so if the buffer is not + * backed up by storage, the allocation shall happen here for the first time, + * based on the constraint_mask of the dma_buf, which is set based on devices + * currently attached to the dma_buf. + * + * IMP: Assumption is that all participating devices call dma_buf_attach() + * before the first dma_buf_map_attachment() is called. + * + * Migration is not supported at this time. + */ +static struct sg_table *cenalloc_map_dma_buf(struct dma_buf_attachment *attach, + enum dma_data_direction direction) +{ + struct dma_buf *dmabuf = attach->dmabuf; + struct cenalloc_buffer *buffer = dmabuf->priv; + struct sg_table *table = NULL; + + if (!buffer->sg_table) { + down_read(&(buffer->dev->lock)); + table = cenalloc_buffer_first_alloc(buffer); + up_read(&(buffer->dev->lock)); + if (IS_ERR(table)) + return table; + } + + mutex_lock(&buffer->lock); + buffer->sg_table = table; + + cenalloc_buffer_sync_for_device(buffer, attach->dev, direction); + mutex_unlock(&buffer->lock); + + return buffer->sg_table; + +} + +static void cenalloc_unmap_dma_buf(struct dma_buf_attachment *attachment, + struct sg_table *table, + enum dma_data_direction direction) +{ + struct dma_buf *dmabuf = attachment->dmabuf; + struct cenalloc_buffer *buffer = dmabuf->priv; + + buffer->allocator->ops->unmap_dma(buffer->allocator, buffer); +} + +static int cenalloc_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma) +{ + struct cenalloc_buffer *buffer = dmabuf->priv; + int ret = 0; + + if (!buffer->sg_table) { + pr_err( + "%s: buffer needs to be mapped first before userspace mapping\n", + __func__); + return -EINVAL; + } + + if (!buffer->allocator->ops->map_user) { + pr_err( + "%s: allocator does not define a method for userspace mapping\n", + __func__); + return -EINVAL; + } + + /* + * NOTE: For now, assume that the allocators will take care of any VMA + * management for the buffer - includes providing vma_ops, and / or + * managing mmap faults. + * struct cenalloc_buffer will still provide struct **pages, + * and also the vma area list for the buffer. + * This also allows for device-specific vma operations. + * + */ + + mutex_lock(&buffer->lock); + /* now map it to userspace */ + ret = buffer->allocator->ops->map_user(buffer->allocator, buffer, vma); + mutex_unlock(&buffer->lock); + + if (ret) + pr_err("%s: failure mapping buffer to userspace\n", + __func__); + + return ret; +} + +static void cenalloc_dma_buf_release(struct dma_buf *dmabuf) +{ + struct cenalloc_buffer *buffer = dmabuf->priv; + + _cenalloc_buffer_destroy(buffer); +} + + +static void *cenalloc_dma_buf_kmap(struct dma_buf *dmabuf, unsigned long offset) +{ + struct cenalloc_buffer *buffer = dmabuf->priv; + + return buffer->vaddr + offset * PAGE_SIZE; +} + +static void cenalloc_dma_buf_kunmap(struct dma_buf *dmabuf, + unsigned long offset, + void *ptr) +{ + /* TODO */ +} + +static int cenalloc_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, + size_t start, size_t len, + enum dma_data_direction dir) +{ + struct cenalloc_buffer *buffer = dmabuf->priv; + void *vaddr; + + if (!buffer->allocator->ops->map_kernel) { + pr_err("%s: map kernel is not implemented by this allocator.\n", + __func__); + return -ENODEV; + } + + mutex_lock(&buffer->lock); + vaddr = cenalloc_buffer_kmap_get(buffer); + mutex_unlock(&buffer->lock); + return PTR_ERR_OR_ZERO(vaddr); +} + +static void cenalloc_dma_buf_end_cpu_access(struct dma_buf *dmabuf, + size_t start, size_t len, + enum dma_data_direction dir) +{ + struct cenalloc_buffer *buffer = dmabuf->priv; + + mutex_lock(&buffer->lock); + cenalloc_buffer_kmap_put(buffer); + mutex_unlock(&buffer->lock); +} + +static struct dma_buf_ops ca_dma_buf_ops = { + .map_dma_buf = cenalloc_map_dma_buf, + .unmap_dma_buf = cenalloc_unmap_dma_buf, + .mmap = cenalloc_mmap, + .release = cenalloc_dma_buf_release, + .begin_cpu_access = cenalloc_dma_buf_begin_cpu_access, + .end_cpu_access = cenalloc_dma_buf_end_cpu_access, + .kmap = cenalloc_dma_buf_kmap, + .kunmap = cenalloc_dma_buf_kunmap, +}; + +static bool is_cenalloc_buffer(struct dma_buf *dmabuf) +{ + return(dmabuf->ops == &ca_dma_buf_ops); +} + +/** + * cenalloc_device_add_allocator: + * Adds an allocator to the cenalloc_device; This should be called at + * platform initialization time, for all allocators for the platform. + */ +void cenalloc_device_add_allocator(struct cenalloc_device *dev, + struct cenalloc_allocator *allocator) +{ + + if (!allocator->ops->allocate || !allocator->ops->free || + !allocator->ops->map_dma || !allocator->ops->unmap_dma) + pr_err("%s: can not add allocator with invalid ops struct.\n", + __func__); + + allocator->dev = dev; + down_write(&dev->lock); + /* use negative allocator->id to reverse the priority -- when traversing + the list later attempt higher id numbers first */ + plist_node_init(&allocator->node, -allocator->id); + plist_add(&allocator->node, &dev->allocators); + + up_write(&dev->lock); +} +EXPORT_SYMBOL_GPL(cenalloc_device_add_allocator); + +/* + * Device Init and Remove + */ +static struct cenalloc_device cenalloc_dev = { + .dev.minor = MISC_DYNAMIC_MINOR, + .dev.name = "cenalloc", + .dev.parent = NULL, +}; + +/* + * TODO: this mechanism of getting a cenalloc device isn't the best, + * Need to have a better way of getting handle to device. + */ +struct cenalloc_device *cenalloc_get_device(void) +{ + return &cenalloc_dev; +} +EXPORT_SYMBOL_GPL(cenalloc_get_device); + +static int __init cenalloc_device_init(void) +{ + int ret; + + ret = misc_register(&cenalloc_dev.dev); + if (ret) { + pr_err("cenalloc: failed to register misc device.\n"); + return ret; + } + + cenalloc_dev.buffers = RB_ROOT; + mutex_init(&cenalloc_dev.buffer_lock); + init_rwsem(&cenalloc_dev.lock); + plist_head_init(&cenalloc_dev.allocators); + return ret; +} + +static void __exit cenalloc_device_remove(void) +{ + misc_deregister(&cenalloc_dev.dev); + + /* XXX need to free the allocators? */ +} + +module_init(cenalloc_device_init); +module_exit(cenalloc_device_remove); + +MODULE_AUTHOR("Sumit Semwal sumit.semwal@linaro.org"); +MODULE_DESCRIPTION("Constraint Aware Central Allocation Helper"); +MODULE_LICENSE("GPL"); diff --git a/drivers/cenalloc/cenalloc.h b/drivers/cenalloc/cenalloc.h new file mode 100644 index 0000000..91e07b2 --- /dev/null +++ b/drivers/cenalloc/cenalloc.h @@ -0,0 +1,99 @@ +/* + * Header file for allocator helper framework for constraints-aware + * dma-buf backing storage allocation. + * + * Copyright(C) 2014 Linaro Limited. All rights reserved. + * Author: Sumit Semwal sumit.semwal@linaro.org + * + * Structure for management of device, buffers etc heavily derived from + * Android's ION framework. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef CENALLOC_H_ +#define CENALLOC_H_ + +#include <linux/types.h> +#include <linux/dma-buf.h> + +struct cenalloc_device; + +/** + * cenalloc_get_device: + * gets a reference to cenalloc_device; this should be used in the + * call to cenalloc_buffer_create. + * + * TODO: might need to have a better way of getting this device. + */ +const struct cenalloc_device *cenalloc_get_device(void); + +/** + * cenalloc_buffer_create: + * creates a cenalloc_buffer, associates a dma_buf buffer with it, + * and returns the dma_buf; other importers can then use references + * to this dma_buf and attach themselves to it. + * + * Note: Since this is delayed-allocation model, no actual allocation + * will happen at this call. + * + * @dev: cenalloc_device to create the buffer from + * @len: size of the buffer + * @align: alignment info, if any + * @flags: flags for the buffer, if any + * + */ +struct dma_buf *cenalloc_buffer_create(struct cenalloc_device *dev, + unsigned long len, + unsigned long align, + unsigned long flags); + +/** + * cenalloc_buffer_free: + * calls dma_buf_put(), which in turn may call allocator->free() + * if this was the last reference held. + * For dma-bufs created with cenalloc_buffer_create, this should be + * called instead of dma_buf_put() directly. + * + * @dma_buf: the dma_buf to free. + */ +void cenalloc_buffer_free(struct dma_buf *dmabuf); + +/** + * cenalloc_phys: + * returns the phys address and len associated with this buffer - this + * will get refined as the ION 'abuse' of phys_addr_t is corrected. + * This is valid only for buffers that are allocated from physically + * contiguous memory; its output is invalid otherwise. For such cases, + * cenalloc_sg_table() should be used instead. + * Will return -EINVAL if the buffer is invalid. + * + * @dmabuf: buffer for which phys address is needed + * @phys: pointer to the phys address + * @len: pointer to teh length of the buffer + * + */ +int cenalloc_phys(struct dma_buf *dmabuf, + phys_addr_t *addr, size_t *len); + +/** + * cenalloc_sg_table: + * returns the sg_table associated with the dma_buf. + * Will return -EINVAL in case of error. + * + * @dmabuf: handle to buffer who's sg_table is to be returned. + * + */ +struct sg_table *cenalloc_sg_table(struct dma_buf *dmabuf); + +#endif /* CENALLOC_H_ */ diff --git a/drivers/cenalloc/cenalloc_priv.h b/drivers/cenalloc/cenalloc_priv.h new file mode 100644 index 0000000..31f5e59 --- /dev/null +++ b/drivers/cenalloc/cenalloc_priv.h @@ -0,0 +1,188 @@ +/* + * Private header file for allocator helper framework for constraints-aware + * dma-buf backing storage allocation. + * + * Copyright(C) 2014 Linaro Limited. All rights reserved. + * Author: Sumit Semwal sumit.semwal@linaro.org + * + * Structure for management of clients, buffers etc heavily derived from + * Android's ION framework. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ +#ifndef CENALLOC_PRIV_H_ +#define CENALLOC_PRIV_H_ + +#include <linux/device.h> +#include <linux/dma-direction.h> +#include <linux/kref.h> +#include <linux/mm_types.h> +#include <linux/mutex.h> +#include <linux/rbtree.h> +#include <linux/sched.h> +#include <linux/types.h> + +#include "cenalloc.h" + +/** + * TODO: Find a better way to generate and manage allocator IDs + * For now, define a couple of allocator types + * + * @CENALLOC_ALLOCATOR_SYSTEM: eg memory allocated via vmalloc + * @CENALLOC_ALLOCATOR_SYSTEM_CONTIG: eg memory allocated via kmalloc + * @CENALLOC_NUM_ALLOCATORS: helper for iterating over Allocators + * a bit mask is used to identify the + * allocators, so only 32 total allocator + * types are supported + */ +#define CENALLOC_ALLOCATOR_SYSTEM 0x1 +#define CENALLOC_ALLOCATOR_SYSTEM_CONTIG (CENALLOC_ALLOCATOR_SYSTEM + 1) + +#define CENALLOC_NUM_ALLOCATORS 32 + +#define CENALLOC_ALLOCATOR_SYSTEM_MASK (1 << CENALLOC_ALLOCATOR_SYSTEM) +#define CENALLOC_ALLOCATOR_SYSTEM_CONTIG_MASK \ + (1 << CENALLOC_ALLOCATOR_SYSTEM_CONTIG) + + +struct cenalloc_device; +/** + * struct cenalloc_buffer - metadata for a particular buffer + * @node: node in the cenalloc_device buffers tree + * @dev: back pointer to the cenalloc_device + * @allocator: back pointer to the allocator the buffer came from + * @flags: buffer specific flags + * @private_flags: internal buffer specific flags + * @size: size of the buffer + * @priv_virt: private data to the buffer representable as + * a void * + * @lock: protects the buffers cnt fields + * @pages: flat array of pages in the buffer -- used by fault + * handler and only valid for buffers that are faulted in + * @vmas: list of vma's mapping this buffer + * @sg_table: the sg table for the buffer if dmap_cnt is not zero + * @handle_count: count of handles referencing this buffer +*/ +struct cenalloc_buffer { + struct rb_node node; + struct cenalloc_device *dev; + struct cenalloc_allocator *allocator; + unsigned long flags; + unsigned long align; + unsigned long private_flags; + + size_t size; + + struct kref ref; + unsigned int kmap_cnt; + struct mutex lock; + void *vaddr; + struct page **pages; + struct list_head vmas; + + struct sg_table *sg_table; + struct dma_buf *dmabuf; +}; + +/** + * struct cenalloc_allocator_ops - ops to operate on a given allocator + * @allocate: allocate memory + * @free: free memory + * @phys: get physical address of a buffer (only define on + * physically contiguous allocators) + * @map_dma: map the memory for dma to a scatterlist + * @unmap_dma: unmap the memory for dma + * @map_kernel: map memory to the kernel + * @unmap_kernel: unmap memory to the kernel + * @map_user: map memory to userspace + * @sync_for_device: sync the memory for device + * + * allocate, phys, and map_user return 0 on success, -errno on error. + * map_dma and map_kernel return pointer on success, ERR_PTR on + * error. + */ +struct cenalloc_allocator_ops { + int (*allocate)(struct cenalloc_allocator *allocator, + struct cenalloc_buffer *buffer, unsigned long len, + unsigned long align, unsigned long flags); + void (*free)(struct cenalloc_buffer *buffer); + int (*phys)(struct cenalloc_allocator *allocator, + struct cenalloc_buffer *buffer, + phys_addr_t *addr, size_t *len); + struct sg_table * (*map_dma)(struct cenalloc_allocator *allocator, + struct cenalloc_buffer *buffer); + void (*unmap_dma)(struct cenalloc_allocator *allocator, + struct cenalloc_buffer *buffer); + void * (*map_kernel)(struct cenalloc_allocator *allocator, + struct cenalloc_buffer *buffer); + void (*unmap_kernel)(struct cenalloc_allocator *allocator, + struct cenalloc_buffer *buffer); + int (*map_user)(struct cenalloc_allocator *mapper, + struct cenalloc_buffer *buffer, + struct vm_area_struct *vma); + + void (*sync_for_device)(struct cenalloc_allocator *allocator, + struct cenalloc_buffer *buffer, + struct device *dev, + enum dma_data_direction dir); +}; + +/** + * struct cenalloc_allocator - represents a allocator in the system + * @node: rb node to put the allocator on the device's tree of + * allocators + * @dev: back pointer to the cenalloc_device + * @type: type of allocator + * @ops: ops struct as above + * @flags: flags + * @id: id of allocator, also indicates priority of this + * allocator when allocating. These MUST be unique. + * @name: used for debugging + * @debug_show: called when allocator debug file is read to add any + * allocator specific debug info to output + * + * Represents a pool of memory from which buffers can be allocated. In some + * systems the only allocator is regular system memory allocated via vmalloc. + * On others, some blocks might require large physically contiguous buffers + * that are allocated from a specially reserved allocator. + */ +struct cenalloc_allocator { + struct plist_node node; + struct cenalloc_device *dev; + unsigned long type; + struct cenalloc_allocator_ops *ops; + struct vm_operations_struct *vma_ops; + unsigned long flags; + unsigned int id; + const char *name; + + int (*debug_show)(struct cenalloc_allocator *allocator, + struct seq_file *, void *); +}; + +/** + * cenalloc_device_add_allocator - adds an allocator to the cenalloc device + * @dev: the device + * @allocator: the allocator to add + */ +void cenalloc_device_add_allocator(struct cenalloc_device *dev, + struct cenalloc_allocator *allocator); + +/** + * TODO: add some helpers for common allocator operations on buffers; + * some candidates might be: + * cenalloc_allocator_{map_kernel, unmap_kernel, map_user} + * some vma_* management operations + */ + +#endif /* CENALLOC_PRIV_H_ */
On Sat, Oct 11, 2014 at 01:37:56AM +0530, Sumit Semwal wrote:
Devices sharing buffers using dma-buf could benefit from sharing their constraints via struct device, and dma-buf framework would manage the common constraints for all attached devices per buffer.
With that information, we could have a 'generic' allocator helper in the form of a central dma-buf exporter, which can create dma-bufs, and allocate backing storage at the time of first call to dma_buf_map_attachment.
This allocation would utilise the constraint-mask by matching it to the right allocator from a pool of allocators, and then allocating buffer backing storage from this allocator.
The pool of allocators could be platform-dependent, allowing for platforms to hide the specifics of these allocators from the devices that access the dma-buf buffers.
A sample sequence could be:
- get handle to cenalloc_device,
- create a dmabuf using cenalloc_buffer_create;
- use this dmabuf to attach each device, which has its constraints set in the constraints mask (dev->dma_params->access_constraints_mask)
- at each dma_buf_attach() call, dma-buf will check to see if the constraint mask for the device requesting attachment is compatible with the constraints of devices already attached to the dma-buf; returns an error if it isn't.
- after all devices have attached, the first call to dma_buf_map_attachment() will allocate the backing storage for the buffer.
- follow the dma-buf api for map / unmap etc usage.
- detach all attachments,
- call cenalloc_buffer_free to free the buffer if refcount reaches zero;
** IMPORTANT** This mechanism of delayed allocation based on constraint-enablement will work *ONLY IF* the first map_attachment() call is made AFTER all attach() calls are done.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
MAINTAINERS | 1 + drivers/cenalloc/cenalloc.c | 597 +++++++++++++++++++++++++++++++++++++++ drivers/cenalloc/cenalloc.h | 99 +++++++ drivers/cenalloc/cenalloc_priv.h | 188 ++++++++++++ 4 files changed, 885 insertions(+) create mode 100644 drivers/cenalloc/cenalloc.c create mode 100644 drivers/cenalloc/cenalloc.h create mode 100644 drivers/cenalloc/cenalloc_priv.h
diff --git a/MAINTAINERS b/MAINTAINERS index 40d4796..e88ac81 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3039,6 +3039,7 @@ L: linux-media@vger.kernel.org L: dri-devel@lists.freedesktop.org L: linaro-mm-sig@lists.linaro.org F: drivers/dma-buf/ +F: drivers/cenalloc/ F: include/linux/dma-buf* F: include/linux/reservation.h F: include/linux/*fence.h diff --git a/drivers/cenalloc/cenalloc.c b/drivers/cenalloc/cenalloc.c new file mode 100644 index 0000000..f278056 --- /dev/null +++ b/drivers/cenalloc/cenalloc.c @@ -0,0 +1,597 @@ +/*
- Allocator helper framework for constraints-aware dma-buf backing storage
- allocation.
- This allows constraint-sharing devices to deferred-allocate buffers shared
- via dma-buf.
- Copyright(C) 2014 Linaro Limited. All rights reserved.
- Author: Sumit Semwal sumit.semwal@linaro.org
- Structure for management of clients, buffers etc heavily derived from
- Android's ION framework.
Does that mean we can drop ION after this gets merged?
/me dreams
Anyway, why a new directory? Why not just put it in drivers/dma-buf ? Or a subdir below there?
thanks,
greg k-h
On Fri, Oct 10, 2014 at 04:09:00PM -0700, Greg Kroah-Hartman wrote:
On Sat, Oct 11, 2014 at 01:37:56AM +0530, Sumit Semwal wrote:
Devices sharing buffers using dma-buf could benefit from sharing their constraints via struct device, and dma-buf framework would manage the common constraints for all attached devices per buffer.
With that information, we could have a 'generic' allocator helper in the form of a central dma-buf exporter, which can create dma-bufs, and allocate backing storage at the time of first call to dma_buf_map_attachment.
This allocation would utilise the constraint-mask by matching it to the right allocator from a pool of allocators, and then allocating buffer backing storage from this allocator.
The pool of allocators could be platform-dependent, allowing for platforms to hide the specifics of these allocators from the devices that access the dma-buf buffers.
A sample sequence could be:
- get handle to cenalloc_device,
- create a dmabuf using cenalloc_buffer_create;
- use this dmabuf to attach each device, which has its constraints set in the constraints mask (dev->dma_params->access_constraints_mask)
- at each dma_buf_attach() call, dma-buf will check to see if the constraint mask for the device requesting attachment is compatible with the constraints of devices already attached to the dma-buf; returns an error if it isn't.
- after all devices have attached, the first call to dma_buf_map_attachment() will allocate the backing storage for the buffer.
- follow the dma-buf api for map / unmap etc usage.
- detach all attachments,
- call cenalloc_buffer_free to free the buffer if refcount reaches zero;
** IMPORTANT** This mechanism of delayed allocation based on constraint-enablement will work *ONLY IF* the first map_attachment() call is made AFTER all attach() calls are done.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
MAINTAINERS | 1 + drivers/cenalloc/cenalloc.c | 597 +++++++++++++++++++++++++++++++++++++++ drivers/cenalloc/cenalloc.h | 99 +++++++ drivers/cenalloc/cenalloc_priv.h | 188 ++++++++++++ 4 files changed, 885 insertions(+) create mode 100644 drivers/cenalloc/cenalloc.c create mode 100644 drivers/cenalloc/cenalloc.h create mode 100644 drivers/cenalloc/cenalloc_priv.h
diff --git a/MAINTAINERS b/MAINTAINERS index 40d4796..e88ac81 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3039,6 +3039,7 @@ L: linux-media@vger.kernel.org L: dri-devel@lists.freedesktop.org L: linaro-mm-sig@lists.linaro.org F: drivers/dma-buf/ +F: drivers/cenalloc/ F: include/linux/dma-buf* F: include/linux/reservation.h F: include/linux/*fence.h diff --git a/drivers/cenalloc/cenalloc.c b/drivers/cenalloc/cenalloc.c new file mode 100644 index 0000000..f278056 --- /dev/null +++ b/drivers/cenalloc/cenalloc.c @@ -0,0 +1,597 @@ +/*
- Allocator helper framework for constraints-aware dma-buf backing storage
- allocation.
- This allows constraint-sharing devices to deferred-allocate buffers shared
- via dma-buf.
- Copyright(C) 2014 Linaro Limited. All rights reserved.
- Author: Sumit Semwal sumit.semwal@linaro.org
- Structure for management of clients, buffers etc heavily derived from
- Android's ION framework.
Does that mean we can drop ION after this gets merged?
Yeah, I hope so. Not sure whetether this hope is shared by google android people ...
/me dreams
I guess we can collectively dream about this next week at plumbers ;-) I'll try to squeeze in some light review of Sumit's patches between conference travels ...
Cheers, Daniel
Hi Greg, Daniel!
On 12 October 2014 00:10, Daniel Vetter daniel@ffwll.ch wrote:
On Fri, Oct 10, 2014 at 04:09:00PM -0700, Greg Kroah-Hartman wrote:
On Sat, Oct 11, 2014 at 01:37:56AM +0530, Sumit Semwal wrote:
Devices sharing buffers using dma-buf could benefit from sharing their constraints via struct device, and dma-buf framework would manage the common constraints for all attached devices per buffer.
With that information, we could have a 'generic' allocator helper in the form of a central dma-buf exporter, which can create dma-bufs, and allocate backing storage at the time of first call to dma_buf_map_attachment.
This allocation would utilise the constraint-mask by matching it to the right allocator from a pool of allocators, and then allocating buffer backing storage from this allocator.
The pool of allocators could be platform-dependent, allowing for platforms to hide the specifics of these allocators from the devices that access the dma-buf buffers.
A sample sequence could be:
- get handle to cenalloc_device,
- create a dmabuf using cenalloc_buffer_create;
- use this dmabuf to attach each device, which has its constraints set in the constraints mask (dev->dma_params->access_constraints_mask)
- at each dma_buf_attach() call, dma-buf will check to see if the constraint mask for the device requesting attachment is compatible with the constraints of devices already attached to the dma-buf; returns an error if it isn't.
- after all devices have attached, the first call to dma_buf_map_attachment() will allocate the backing storage for the buffer.
- follow the dma-buf api for map / unmap etc usage.
- detach all attachments,
- call cenalloc_buffer_free to free the buffer if refcount reaches zero;
** IMPORTANT** This mechanism of delayed allocation based on constraint-enablement will work *ONLY IF* the first map_attachment() call is made AFTER all attach() calls are done.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org
MAINTAINERS | 1 + drivers/cenalloc/cenalloc.c | 597 +++++++++++++++++++++++++++++++++++++++ drivers/cenalloc/cenalloc.h | 99 +++++++ drivers/cenalloc/cenalloc_priv.h | 188 ++++++++++++ 4 files changed, 885 insertions(+) create mode 100644 drivers/cenalloc/cenalloc.c create mode 100644 drivers/cenalloc/cenalloc.h create mode 100644 drivers/cenalloc/cenalloc_priv.h
diff --git a/MAINTAINERS b/MAINTAINERS index 40d4796..e88ac81 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3039,6 +3039,7 @@ L: linux-media@vger.kernel.org L: dri-devel@lists.freedesktop.org L: linaro-mm-sig@lists.linaro.org F: drivers/dma-buf/ +F: drivers/cenalloc/ F: include/linux/dma-buf* F: include/linux/reservation.h F: include/linux/*fence.h diff --git a/drivers/cenalloc/cenalloc.c b/drivers/cenalloc/cenalloc.c new file mode 100644 index 0000000..f278056 --- /dev/null +++ b/drivers/cenalloc/cenalloc.c @@ -0,0 +1,597 @@ +/*
- Allocator helper framework for constraints-aware dma-buf backing storage
- allocation.
- This allows constraint-sharing devices to deferred-allocate buffers shared
- via dma-buf.
- Copyright(C) 2014 Linaro Limited. All rights reserved.
- Author: Sumit Semwal sumit.semwal@linaro.org
- Structure for management of clients, buffers etc heavily derived from
- Android's ION framework.
Does that mean we can drop ION after this gets merged?
Yeah, I hope so. Not sure whetether this hope is shared by google android people ...
Apologies for the delay in response; was travelling for LPC and so couldn't respond.
Yes, that is certainly the hope, but this is the first-step RFC which would need a few more things before ION can be replaced completely.
/me dreams
I guess we can collectively dream about this next week at plumbers ;-) I'll try to squeeze in some light review of Sumit's patches between conference travels ...
Cheers, Daniel
Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On 10/10/2014 1:07 PM, Sumit Semwal wrote:
Devices sharing buffers using dma-buf could benefit from sharing their constraints via struct device, and dma-buf framework would manage the common constraints for all attached devices per buffer.
With that information, we could have a 'generic' allocator helper in the form of a central dma-buf exporter, which can create dma-bufs, and allocate backing storage at the time of first call to dma_buf_map_attachment.
This allocation would utilise the constraint-mask by matching it to the right allocator from a pool of allocators, and then allocating buffer backing storage from this allocator.
The pool of allocators could be platform-dependent, allowing for platforms to hide the specifics of these allocators from the devices that access the dma-buf buffers.
A sample sequence could be:
- get handle to cenalloc_device,
- create a dmabuf using cenalloc_buffer_create;
- use this dmabuf to attach each device, which has its constraints set in the constraints mask (dev->dma_params->access_constraints_mask)
- at each dma_buf_attach() call, dma-buf will check to see if the constraint mask for the device requesting attachment is compatible with the constraints of devices already attached to the dma-buf; returns an error if it isn't.
- after all devices have attached, the first call to dma_buf_map_attachment() will allocate the backing storage for the buffer.
- follow the dma-buf api for map / unmap etc usage.
- detach all attachments,
- call cenalloc_buffer_free to free the buffer if refcount reaches zero;
** IMPORTANT** This mechanism of delayed allocation based on constraint-enablement will work *ONLY IF* the first map_attachment() call is made AFTER all attach() calls are done.
My first instinct is 'I wonder which drivers will call map_attachment at the wrong time and screw things up'. Are there any plans for synchronization and/or debugging output to catch drivers violating this requirement?
[...]
+int cenalloc_phys(struct dma_buf *dmabuf,
phys_addr_t *addr, size_t *len)
+{
- struct cenalloc_buffer *buffer;
- int ret;
- if (is_cenalloc_buffer(dmabuf))
buffer = (struct cenalloc_buffer *)dmabuf->priv;
- else
return -EINVAL;
- if (!buffer->allocator->ops->phys) {
pr_err("%s: cenalloc_phys is not implemented by this allocator.\n",
__func__);
return -ENODEV;
- }
- mutex_lock(&buffer->lock);
- ret = buffer->allocator->ops->phys(buffer->allocator, buffer, addr,
len);
- mutex_lock(&buffer->lock);
- return ret;
+} +EXPORT_SYMBOL_GPL(cenalloc_phys);
The .phys operation makes it difficult to have drivers which can handle both contiguous and non contiguous memory (too much special casing). Any chance we could drop this API and just have drivers treat an sg_table with 1 entry as contiguous memory?
Thanks, Laura
Hi Laura,
On 13 October 2014 14:05, Laura Abbott lauraa@codeaurora.org wrote:
On 10/10/2014 1:07 PM, Sumit Semwal wrote:
Devices sharing buffers using dma-buf could benefit from sharing their constraints via struct device, and dma-buf framework would manage the common constraints for all attached devices per buffer.
With that information, we could have a 'generic' allocator helper in the form of a central dma-buf exporter, which can create dma-bufs, and allocate backing storage at the time of first call to dma_buf_map_attachment.
This allocation would utilise the constraint-mask by matching it to the right allocator from a pool of allocators, and then allocating buffer backing storage from this allocator.
The pool of allocators could be platform-dependent, allowing for platforms to hide the specifics of these allocators from the devices that access the dma-buf buffers.
A sample sequence could be:
- get handle to cenalloc_device,
- create a dmabuf using cenalloc_buffer_create;
- use this dmabuf to attach each device, which has its constraints set in the constraints mask (dev->dma_params->access_constraints_mask)
- at each dma_buf_attach() call, dma-buf will check to see if the
constraint mask for the device requesting attachment is compatible with the constraints of devices already attached to the dma-buf; returns an error if it isn't.
- after all devices have attached, the first call to
dma_buf_map_attachment() will allocate the backing storage for the buffer.
- follow the dma-buf api for map / unmap etc usage.
- detach all attachments,
- call cenalloc_buffer_free to free the buffer if refcount reaches zero;
** IMPORTANT** This mechanism of delayed allocation based on constraint-enablement will work *ONLY IF* the first map_attachment() call is made AFTER all attach() calls are done.
My first instinct is 'I wonder which drivers will call map_attachment at the wrong time and screw things up'. Are there any plans for synchronization and/or debugging output to catch drivers violating this requirement?
Well, of course you're right - at the moment, no mechanism to do so. That will certainly be the next step - we could discuss it sometime this week at LPC to see what makes better sense.
[...]
+int cenalloc_phys(struct dma_buf *dmabuf,
phys_addr_t *addr, size_t *len)
+{
struct cenalloc_buffer *buffer;
int ret;
if (is_cenalloc_buffer(dmabuf))
buffer = (struct cenalloc_buffer *)dmabuf->priv;
else
return -EINVAL;
if (!buffer->allocator->ops->phys) {
pr_err("%s: cenalloc_phys is not implemented by this
allocator.\n",
__func__);
return -ENODEV;
}
mutex_lock(&buffer->lock);
ret = buffer->allocator->ops->phys(buffer->allocator, buffer,
addr,
len);
mutex_lock(&buffer->lock);
return ret;
+} +EXPORT_SYMBOL_GPL(cenalloc_phys);
The .phys operation makes it difficult to have drivers which can handle both contiguous and non contiguous memory (too much special casing). Any chance we could drop this API and just have drivers treat an sg_table with 1 entry as contiguous memory?
I am not sure I understand how having a .phys makes it more difficult - and also, for cases where you're sharing buffers between CPU and a co-processor like DSP, my understanding is that we'd need an equivalent of a phys address.
Thanks, Laura
-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Add the build files for cenalloc, the constraints-enabled allocation helper framework for dma-buf.
Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org --- drivers/Kconfig | 2 ++ drivers/Makefile | 1 + drivers/cenalloc/Kconfig | 8 ++++++++ drivers/cenalloc/Makefile | 3 +++ 4 files changed, 14 insertions(+) create mode 100644 drivers/cenalloc/Kconfig create mode 100644 drivers/cenalloc/Makefile
diff --git a/drivers/Kconfig b/drivers/Kconfig index 1a693d3..f40d2ce 100644 --- a/drivers/Kconfig +++ b/drivers/Kconfig @@ -182,4 +182,6 @@ source "drivers/ras/Kconfig"
source "drivers/thunderbolt/Kconfig"
+source "drivers/cenalloc/Kconfig" + endmenu diff --git a/drivers/Makefile b/drivers/Makefile index ebee555..a04e516 100644 --- a/drivers/Makefile +++ b/drivers/Makefile @@ -161,3 +161,4 @@ obj-$(CONFIG_POWERCAP) += powercap/ obj-$(CONFIG_MCB) += mcb/ obj-$(CONFIG_RAS) += ras/ obj-$(CONFIG_THUNDERBOLT) += thunderbolt/ +obj-$(CONFIG_CENALLOC) += cenalloc/ diff --git a/drivers/cenalloc/Kconfig b/drivers/cenalloc/Kconfig new file mode 100644 index 0000000..9472d5d --- /dev/null +++ b/drivers/cenalloc/Kconfig @@ -0,0 +1,8 @@ +menuconfig CENALLOC + bool "cenalloc helper functions" + default n + select ANON_INODES + help + This option enables the helper allocation framework for drivers using + dma-buf buffer-sharing. It uses constraints of participating devices + to help find out best suited allocator. diff --git a/drivers/cenalloc/Makefile b/drivers/cenalloc/Makefile new file mode 100644 index 0000000..d36b531 --- /dev/null +++ b/drivers/cenalloc/Makefile @@ -0,0 +1,3 @@ +# Makefile for the cenalloc helper + +obj-y += cenalloc.o
From: Benjamin Gaignard benjamin.gaignard@linaro.org
Signed-off-by: Benjamin Gaignard benjamin.gaignard@linaro.org Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Cc: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: linux-media@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linaro-mm-sig@lists.linaro.org --- drivers/cenalloc/Makefile | 2 +- drivers/cenalloc/cenalloc_system_contig.c | 225 ++++++++++++++++++++++++++++++ 2 files changed, 226 insertions(+), 1 deletion(-) create mode 100644 drivers/cenalloc/cenalloc_system_contig.c
diff --git a/drivers/cenalloc/Makefile b/drivers/cenalloc/Makefile index d36b531..2f69b61 100644 --- a/drivers/cenalloc/Makefile +++ b/drivers/cenalloc/Makefile @@ -1,3 +1,3 @@ # Makefile for the cenalloc helper
-obj-y += cenalloc.o +obj-y += cenalloc.o cenalloc_system_contig.o diff --git a/drivers/cenalloc/cenalloc_system_contig.c b/drivers/cenalloc/cenalloc_system_contig.c new file mode 100644 index 0000000..ecf0c03 --- /dev/null +++ b/drivers/cenalloc/cenalloc_system_contig.c @@ -0,0 +1,225 @@ +/* + * central allocator using kmalloc + * + * Copyright(C) 2014 Linaro Limited. All rights reserved. + * Author: Benjamin Gaignard benjamin.gaignard@linaro.org + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License version 2 as published by + * the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <asm/page.h> +#include <linux/dma-mapping.h> +#include <linux/err.h> +#include <linux/highmem.h> +#include <linux/mm.h> +#include <linux/scatterlist.h> +#include <linux/seq_file.h> +#include <linux/slab.h> +#include <linux/vmalloc.h> + +#include "cenalloc_priv.h" + +static gfp_t low_order_gfp_flags = (GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN); + +void cenalloc_pages_sync_for_device(struct device *dev, struct page *page, + size_t size, enum dma_data_direction dir) +{ + struct scatterlist sg; + + sg_init_table(&sg, 1); + sg_set_page(&sg, page, size, 0); + /* + * This is not correct - sg_dma_address needs a dma_addr_t that is valid + * for the the targeted device, but this works on the currently targeted + * hardware. + */ + sg_dma_address(&sg) = page_to_phys(page); + dma_sync_sg_for_device(dev, &sg, 1, dir); +} + +static int cenalloc_system_contig_allocate(struct cenalloc_allocator *allocator, + struct cenalloc_buffer *buffer, unsigned long len, + unsigned long align, unsigned long flags) +{ + int order = get_order(len); + struct page *page; + struct sg_table *table; + unsigned long i; + int ret; + + if (align > (PAGE_SIZE << order)) + return -EINVAL; + + page = alloc_pages(low_order_gfp_flags, order); + if (!page) + return -ENOMEM; + + split_page(page, order); + + len = PAGE_ALIGN(len); + for (i = len >> PAGE_SHIFT; i < (1 << order); i++) + __free_page(page + i); + + table = kmalloc(sizeof(struct sg_table), GFP_KERNEL); + if (!table) { + ret = -ENOMEM; + goto free_pages; + } + + ret = sg_alloc_table(table, 1, GFP_KERNEL); + if (ret) + goto free_table; + + sg_set_page(table->sgl, page, len, 0); + + buffer->sg_table = table; + + cenalloc_pages_sync_for_device(NULL, page, len, DMA_BIDIRECTIONAL); + + return 0; + +free_table: + kfree(table); +free_pages: + for (i = 0; i < len >> PAGE_SHIFT; i++) + __free_page(page + i); + + return ret; + +} + +static void cenalloc_system_contig_free(struct cenalloc_buffer *buffer) +{ + struct sg_table *table = buffer->sg_table; + struct page *page = sg_page(table->sgl); + unsigned long pages = PAGE_ALIGN(buffer->size) >> PAGE_SHIFT; + unsigned long i; + + for (i = 0; i < pages; i++) + __free_page(page + i); + sg_free_table(table); + kfree(table); +} + +static struct sg_table *cenalloc_system_contig_map_dma + (struct cenalloc_allocator *allocator, struct cenalloc_buffer *buffer) +{ + return buffer->sg_table; +} + +static void cenalloc_system_contig_unmap_dma + (struct cenalloc_allocator *allocator, struct cenalloc_buffer *buffer) +{ + +} + +static void *cenalloc_system_contig_map_kernel + (struct cenalloc_allocator *allocator, struct cenalloc_buffer *buffer) +{ + struct scatterlist *sg; + int i, j; + void *vaddr; + pgprot_t pgprot; + struct sg_table *table = buffer->sg_table; + int npages = PAGE_ALIGN(buffer->size) / PAGE_SIZE; + struct page **pages = vmalloc(sizeof(struct page *) * npages); + struct page **tmp = pages; + + if (!pages) + return NULL; + + pgprot = pgprot_writecombine(PAGE_KERNEL); + + for_each_sg(table->sgl, sg, table->nents, i) { + int npages_this_entry = PAGE_ALIGN(sg->length) / PAGE_SIZE; + struct page *page = sg_page(sg); + + BUG_ON(i >= npages); + for (j = 0; j < npages_this_entry; j++) + *(tmp++) = page++; + } + vaddr = vmap(pages, npages, VM_MAP, pgprot); + vfree(pages); + + if (vaddr == NULL) + return ERR_PTR(-ENOMEM); + + return vaddr; +} + +static void cenalloc_system_contig_unmap_kernel + (struct cenalloc_allocator *allocator, struct cenalloc_buffer *buffer) +{ + vunmap(buffer->vaddr); +} + +static int cenalloc_system_contig_map_user + (struct cenalloc_allocator *mapper, struct cenalloc_buffer *buffer, + struct vm_area_struct *vma) +{ + struct sg_table *table = buffer->sg_table; + unsigned long addr = vma->vm_start; + unsigned long offset = vma->vm_pgoff * PAGE_SIZE; + struct scatterlist *sg; + int i; + int ret; + + for_each_sg(table->sgl, sg, table->nents, i) { + struct page *page = sg_page(sg); + unsigned long remainder = vma->vm_end - addr; + unsigned long len = sg->length; + + if (offset >= sg->length) { + offset -= sg->length; + continue; + } else if (offset) { + page += offset / PAGE_SIZE; + len = sg->length - offset; + offset = 0; + } + len = min(len, remainder); + ret = remap_pfn_range(vma, addr, page_to_pfn(page), len, + vma->vm_page_prot); + if (ret) + return ret; + addr += len; + if (addr >= vma->vm_end) + return 0; + } + return 0; +} + +static void cenalloc_system_contig_sync_for_device + (struct cenalloc_allocator *allocator, struct cenalloc_buffer *buffer, + struct device *dev, enum dma_data_direction dir) +{ + /* TODO */ +} + +static struct cenalloc_allocator_ops system_contig_ops = { + .allocate = cenalloc_system_contig_allocate, + .free = cenalloc_system_contig_free, + .map_dma = cenalloc_system_contig_map_dma, + .unmap_dma = cenalloc_system_contig_unmap_dma, + .map_kernel = cenalloc_system_contig_map_kernel, + .unmap_kernel = cenalloc_system_contig_unmap_kernel, + .map_user = cenalloc_system_contig_map_user, + .sync_for_device = cenalloc_system_contig_sync_for_device, +}; + +struct cenalloc_allocator system_contig = { + .ops = &system_contig_ops, + .type = CENALLOC_ALLOCATOR_SYSTEM, + .id = CENALLOC_ALLOCATOR_SYSTEM, + .name = "system contiguous memory allocator", +};
On 10/10/2014 1:07 PM, Sumit Semwal wrote:
Hi,
Why:
While sharing buffers using dma-buf, currently there's no mechanism to let devices share their memory access constraints with each other to allow for delayed allocation of backing storage.
This RFC attempts to introduce the idea of memory constraints of a device, and how these constraints can be shared and used to help allocate buffers that can satisfy requirements of all devices attached to a particular dma-buf.
How:
A constraints_mask is added to dma_parms of the device, and at the time of each device attachment to a dma-buf, the dma-buf uses this constraints_mask to calculate the access_mask for the dma-buf.
Allocators can be defined for each of these constraints_masks, and then helper functions can be used to allocate the backing storage from the matching allocator satisfying the constraints of all devices interested.
A new miscdevice, /dev/cenalloc [1] is created, which acts as the dma-buf exporter to make this transparent to the devices.
More details in the patch description of "cenalloc: Constraint-Enabled Allocation helpers for dma-buf".
At present, the constraint_mask is only a bitmask, but it should be possible to change it to a struct and adapt the constraint_mask calculation accordingly, based on discussion.
Important requirement:
Of course, delayed allocation can only work if all participating devices will wait for other devices to have 'attached' before mapping the buffer for the first time.
As of now, users of dma-buf(drm prime, v4l2 etc) call the attach() and then map_attachment() almost immediately after it. This would need to be changed if they were to benefit from constraints.
What 'cenalloc' is not:
- not 'general' allocator helpers - useful only for constraints-enabled devices that share buffers with others using dma-buf.
- not a replacement for existing allocation mechanisms inside various subsystems; merely a possible alternative.
- no page-migration - it would be very complementary to the delayed allocation suggested here.
TODOs:
- demonstration test cases
- vma helpers for allocators
- more sample allocators
- userspace ioctl (It should be a simple one, and we have one ready, but wanted to agree on the kernel side of things first)
I'm interested to see the userspace ioctl. The mask based approach of Ion does not scale well to a userspace ABI so I'm curious if cenalloc does better.
Thanks, Laura
Hi Laura,
On 13 October 2014 13:42, Laura Abbott lauraa@codeaurora.org wrote:
On 10/10/2014 1:07 PM, Sumit Semwal wrote:
Hi,
Why:
While sharing buffers using dma-buf, currently there's no mechanism to let devices share their memory access constraints with each other to allow for delayed allocation of backing storage.
This RFC attempts to introduce the idea of memory constraints of a device, and how these constraints can be shared and used to help allocate buffers that can satisfy requirements of all devices attached to a particular dma-buf.
How:
A constraints_mask is added to dma_parms of the device, and at the time of each device attachment to a dma-buf, the dma-buf uses this constraints_mask to calculate the access_mask for the dma-buf.
Allocators can be defined for each of these constraints_masks, and then helper functions can be used to allocate the backing storage from the matching allocator satisfying the constraints of all devices interested.
A new miscdevice, /dev/cenalloc [1] is created, which acts as the dma-buf exporter to make this transparent to the devices.
More details in the patch description of "cenalloc: Constraint-Enabled Allocation helpers for dma-buf".
At present, the constraint_mask is only a bitmask, but it should be possible to change it to a struct and adapt the constraint_mask calculation accordingly, based on discussion.
Important requirement:
Of course, delayed allocation can only work if all participating devices will wait for other devices to have 'attached' before mapping the buffer for the first time.
As of now, users of dma-buf(drm prime, v4l2 etc) call the attach() and then map_attachment() almost immediately after it. This would need to be changed if they were to benefit from constraints.
What 'cenalloc' is not:
- not 'general' allocator helpers - useful only for constraints-enabled devices that share buffers with others using dma-buf.
- not a replacement for existing allocation mechanisms inside various subsystems; merely a possible alternative.
- no page-migration - it would be very complementary to the delayed
allocation suggested here.
TODOs:
- demonstration test cases
- vma helpers for allocators
- more sample allocators
- userspace ioctl (It should be a simple one, and we have one ready, but
wanted to agree on the kernel side of things first)
I'm interested to see the userspace ioctl. The mask based approach of Ion does not scale well to a userspace ABI so I'm curious if cenalloc does better.
Apologies for the delay in response. Since with cenalloc, the decision of 'which pool to allocate from' is not with the userspace, but is calculated based on the devices that attach, the userspace ABI should be just a simple xxx_create, which returns an fd that'd be the dma-buf fd. That will allow easy sharing with other dma-buf importers via standard dma-buf API.
Thanks, Laura
-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
linaro-mm-sig@lists.linaro.org