Hi all,
This series implements an interface to enable cpu access from the kernel context to dma_buf objects. The main design goal of this proposal is to support buffer objects that reside in highmem.
Comments, flames, ideas and questions highly welcome. Although I might be a bit slow in responding - I'm at conferences and on vacation for the next 2 weeks.
Cheers, Daniel
Daniel Vetter (3):
  dma-buf: don't hold the mutex around map/unmap calls
  dma-buf: add support for kernel cpu access
  dma_buf: Add documentation for the new cpu access support
 Documentation/dma-buf-sharing.txt |  102 +++++++++++++++++++++++++++++-
 drivers/base/dma-buf.c            |  124 +++++++++++++++++++++++++++++++++++-
 include/linux/dma-buf.h           |   62 ++++++++++++++++++-
 3 files changed, 280 insertions(+), 8 deletions(-)
The mutex protects the attachment list and hence needs to be held around the callback to the exporter's (optional) attach/detach functions.
Holding the mutex around the map/unmap calls doesn't protect any dma_buf state. Exporters need to properly protect any of their own state anyway (to protect against calls from their own interfaces). So this only makes the locking messier (and lockdep easier to anger).
Therefore let's just drop this.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/base/dma-buf.c  |    4 ----
 include/linux/dma-buf.h |    2 +-
 2 files changed, 1 insertions(+), 5 deletions(-)
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index e38ad24..1b11192 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -258,10 +258,8 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 	if (WARN_ON(!attach || !attach->dmabuf || !attach->dmabuf->ops))
 		return ERR_PTR(-EINVAL);
 
-	mutex_lock(&attach->dmabuf->lock);
 	if (attach->dmabuf->ops->map_dma_buf)
 		sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
-	mutex_unlock(&attach->dmabuf->lock);
 
 	return sg_table;
 }
@@ -282,10 +280,8 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 						|| !attach->dmabuf->ops))
 		return;
 
-	mutex_lock(&attach->dmabuf->lock);
 	if (attach->dmabuf->ops->unmap_dma_buf)
 		attach->dmabuf->ops->unmap_dma_buf(attach, sg_table);
-	mutex_unlock(&attach->dmabuf->lock);
 
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index f8ac076..f7ad2ca 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -86,7 +86,7 @@ struct dma_buf {
 	struct file		*file;
 	struct list_head	attachments;
 	const struct dma_buf_ops *ops;
-	/* mutex to serialize list manipulation and other ops */
+	/* mutex to serialize list manipulation and attach/detach */
 	struct mutex		lock;
 	void			*priv;
 };
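[Editor's illustration] To make the commit message's point about exporters protecting their own state more concrete, here is a minimal, hypothetical sketch of an exporter's map_dma_buf callback. The my_buffer structure and all names are invented for the example and not taken from any real driver:

#include <linux/dma-buf.h>
#include <linux/mutex.h>
#include <linux/scatterlist.h>

/* Hypothetical exporter-private buffer object. */
struct my_buffer {
	struct mutex lock;	/* exporter's own lock */
	struct sg_table *sgt;	/* backing storage, set up elsewhere */
};

static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
				       enum dma_data_direction dir)
{
	struct my_buffer *buf = attach->dmabuf->priv;
	struct sg_table *sgt;

	/* The exporter already serializes against its own interfaces with
	 * its own lock here, so additionally holding dmabuf->lock around
	 * this callback would protect nothing and only invite lock
	 * nesting problems. */
	mutex_lock(&buf->lock);
	sgt = buf->sgt;
	mutex_unlock(&buf->lock);

	return sgt;
}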
On Thu, Mar 1, 2012 at 9:35 AM, Daniel Vetter daniel.vetter@ffwll.ch wrote:
The mutex protects the attachment list and hence needs to be held around the callback to the exporter's (optional) attach/detach functions.
Holding the mutex around the map/unmap calls doesn't protect any dma_buf state. Exporters need to properly protect any of their own state anyway (to protect against calls from their own interfaces). So this only makes the locking messier (and lockdep easier to anger).
Therefore let's just drop this.
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
Reviewed-by: Rob Clark rob.clark@linaro.org
drivers/base/dma-buf.c | 4 ---- include/linux/dma-buf.h | 2 +- 2 files changed, 1 insertions(+), 5 deletions(-)
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c index e38ad24..1b11192 100644 --- a/drivers/base/dma-buf.c +++ b/drivers/base/dma-buf.c @@ -258,10 +258,8 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach, if (WARN_ON(!attach || !attach->dmabuf || !attach->dmabuf->ops)) return ERR_PTR(-EINVAL);
- mutex_lock(&attach->dmabuf->lock);
if (attach->dmabuf->ops->map_dma_buf) sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);
- mutex_unlock(&attach->dmabuf->lock);
return sg_table; } @@ -282,10 +280,8 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach, || !attach->dmabuf->ops)) return;
- mutex_lock(&attach->dmabuf->lock);
if (attach->dmabuf->ops->unmap_dma_buf) attach->dmabuf->ops->unmap_dma_buf(attach, sg_table);
- mutex_unlock(&attach->dmabuf->lock);
} EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index f8ac076..f7ad2ca 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -86,7 +86,7 @@ struct dma_buf { struct file *file; struct list_head attachments; const struct dma_buf_ops *ops;
- /* mutex to serialize list manipulation and other ops */
+ /* mutex to serialize list manipulation and attach/detach */
struct mutex lock; void *priv; }; -- 1.7.7.5
A big difference from other contenders in the field (like ion) is that this also supports highmem, so we have to split up the cpu access from the kernel side into a prepare and a kmap step.
Prepare is allowed to fail and should do everything required so that the kmap calls can succeed (like swapin/backing storage allocation, flushing, ...).
More in-depth explanations will follow in the follow-up documentation patch.
Changes in v2:
- Clear up begin_cpu_access confusion noticed by Sumit Semwal.
- Don't automatically fallback from the _atomic variants to the
  non-atomic variants. The _atomic callbacks are not allowed to sleep, so
  we want exporters to make this decision explicit. The function
  signatures are explicit, so simpler exporters can still use the same
  function for both.
- Make the unmap functions optional. Simpler exporters with permanent
  mappings don't need to do anything at unmap time.
Signed-Off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 drivers/base/dma-buf.c  |  120 +++++++++++++++++++++++++++++++++++++++++
 include/linux/dma-buf.h |   60 +++++++++++++++++++++++
 2 files changed, 180 insertions(+), 0 deletions(-)
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index 1b11192..bf54b89 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -285,3 +285,123 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 
 }
 EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
+
+
+/**
+ * dma_buf_begin_cpu_access - Must be called before accessing a dma_buf from the
+ * cpu in the kernel context. Calls begin_cpu_access to allow exporter-specific
+ * preparations. Coherency is only guaranteed in the specified range for the
+ * specified access direction.
+ * @dma_buf:	[in]	buffer to prepare cpu access for.
+ * @start:	[in]	start of range for cpu access.
+ * @len:	[in]	length of range for cpu access.
+ * @direction:	[in]	length of range for cpu access.
+ *
+ * Can return negative error values, returns 0 on success.
+ */
+int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
+			     enum dma_data_direction direction)
+{
+	int ret = 0;
+
+	if (WARN_ON(!dmabuf || !dmabuf->ops))
+		return EINVAL;
+
+	if (dmabuf->ops->begin_cpu_access)
+		ret = dmabuf->ops->begin_cpu_access(dmabuf, start, len, direction);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
+
+/**
+ * dma_buf_end_cpu_access - Must be called after accessing a dma_buf from the
+ * cpu in the kernel context. Calls end_cpu_access to allow exporter-specific
+ * actions. Coherency is only guaranteed in the specified range for the
+ * specified access direction.
+ * @dma_buf:	[in]	buffer to complete cpu access for.
+ * @start:	[in]	start of range for cpu access.
+ * @len:	[in]	length of range for cpu access.
+ * @direction:	[in]	length of range for cpu access.
+ *
+ * This call must always succeed.
+ */
+void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
+			    enum dma_data_direction direction)
+{
+	WARN_ON(!dmabuf || !dmabuf->ops);
+
+	if (dmabuf->ops->end_cpu_access)
+		dmabuf->ops->end_cpu_access(dmabuf, start, len, direction);
+}
+EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access);
+
+/**
+ * dma_buf_kmap_atomic - Map a page of the buffer object into kernel address
+ * space. The same restrictions as for kmap_atomic and friends apply.
+ * @dma_buf:	[in]	buffer to map page from.
+ * @page_num:	[in]	page in PAGE_SIZE units to map.
+ *
+ * This call must always succeed, any necessary preparations that might fail
+ * need to be done in begin_cpu_access.
+ */
+void *dma_buf_kmap_atomic(struct dma_buf *dmabuf, unsigned long page_num)
+{
+	WARN_ON(!dmabuf || !dmabuf->ops);
+
+	return dmabuf->ops->kmap_atomic(dmabuf, page_num);
+}
+EXPORT_SYMBOL_GPL(dma_buf_kmap_atomic);
+
+/**
+ * dma_buf_kunmap_atomic - Unmap a page obtained by dma_buf_kmap_atomic.
+ * @dma_buf:	[in]	buffer to unmap page from.
+ * @page_num:	[in]	page in PAGE_SIZE units to unmap.
+ * @vaddr:	[in]	kernel space pointer obtained from dma_buf_kmap_atomic.
+ *
+ * This call must always succeed.
+ */
+void dma_buf_kunmap_atomic(struct dma_buf *dmabuf, unsigned long page_num,
+			   void *vaddr)
+{
+	WARN_ON(!dmabuf || !dmabuf->ops);
+
+	if (dmabuf->ops->kunmap_atomic)
+		dmabuf->ops->kunmap_atomic(dmabuf, page_num, vaddr);
+}
+EXPORT_SYMBOL_GPL(dma_buf_kunmap_atomic);
+
+/**
+ * dma_buf_kmap - Map a page of the buffer object into kernel address space. The
+ * same restrictions as for kmap and friends apply.
+ * @dma_buf:	[in]	buffer to map page from.
+ * @page_num:	[in]	page in PAGE_SIZE units to map.
+ *
+ * This call must always succeed, any necessary preparations that might fail
+ * need to be done in begin_cpu_access.
+ */
+void *dma_buf_kmap(struct dma_buf *dmabuf, unsigned long page_num)
+{
+	WARN_ON(!dmabuf || !dmabuf->ops);
+
+	return dmabuf->ops->kmap(dmabuf, page_num);
+}
+EXPORT_SYMBOL_GPL(dma_buf_kmap);
+
+/**
+ * dma_buf_kunmap - Unmap a page obtained by dma_buf_kmap.
+ * @dma_buf:	[in]	buffer to unmap page from.
+ * @page_num:	[in]	page in PAGE_SIZE units to unmap.
+ * @vaddr:	[in]	kernel space pointer obtained from dma_buf_kmap.
+ *
+ * This call must always succeed.
+ */
+void dma_buf_kunmap(struct dma_buf *dmabuf, unsigned long page_num,
+		    void *vaddr)
+{
+	WARN_ON(!dmabuf || !dmabuf->ops);
+
+	if (dmabuf->ops->kunmap)
+		dmabuf->ops->kunmap(dmabuf, page_num, vaddr);
+}
+EXPORT_SYMBOL_GPL(dma_buf_kunmap);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index f7ad2ca..72eb922 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -49,6 +49,17 @@ struct dma_buf_attachment;
  * @unmap_dma_buf: decreases usecount of buffer, might deallocate scatter
  *		   pages.
  * @release: release this buffer; to be called after the last dma_buf_put.
+ * @begin_cpu_access: [optional] called before cpu access to invalidate cpu
+ *		      caches and allocate backing storage (if not yet done)
+ *		      respectively pin the objet into memory.
+ * @end_cpu_access: [optional] called after cpu access to flush cashes.
+ * @kmap_atomic: maps a page from the buffer into kernel address
+ *		 space, users may not block until the subsequent unmap call.
+ *		 This callback must not sleep.
+ * @kunmap_atomic: [optional] unmaps a atomically mapped page from the buffer.
+ *		   This Callback must not sleep.
+ * @kmap: maps a page from the buffer into kernel address space.
+ * @kunmap: [optional] unmaps a page from the buffer.
  */
 struct dma_buf_ops {
 	int (*attach)(struct dma_buf *, struct device *,
@@ -71,6 +82,14 @@ struct dma_buf_ops {
 	/* after final dma_buf_put() */
 	void (*release)(struct dma_buf *);
 
+	int (*begin_cpu_access)(struct dma_buf *, size_t, size_t,
+				enum dma_data_direction);
+	void (*end_cpu_access)(struct dma_buf *, size_t, size_t,
+			       enum dma_data_direction);
+	void *(*kmap_atomic)(struct dma_buf *, unsigned long);
+	void (*kunmap_atomic)(struct dma_buf *, unsigned long, void *);
+	void *(*kmap)(struct dma_buf *, unsigned long);
+	void (*kunmap)(struct dma_buf *, unsigned long, void *);
 };
 
 /**
@@ -123,6 +142,15 @@ void dma_buf_put(struct dma_buf *dmabuf);
 struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *,
 					enum dma_data_direction);
 void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *);
+
+int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t len,
+			     enum dma_data_direction dir);
+void dma_buf_end_cpu_access(struct dma_buf *dma_buf, size_t start, size_t len,
+			    enum dma_data_direction dir);
+void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long);
+void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
+void *dma_buf_kmap(struct dma_buf *, unsigned long);
+void dma_buf_kunmap(struct dma_buf *, unsigned long, void *);
 #else
 
 static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
@@ -171,6 +199,38 @@ static inline void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
 	return;
 }
 
+static inline int dma_buf_begin_cpu_access(struct dma_buf *,
+					   size_t, size_t,
+					   enum dma_data_direction)
+{
+	return -ENODEV;
+}
+
+static inline void dma_buf_end_cpu_access(struct dma_buf *,
+					  size_t, size_t,
+					  enum dma_data_direction)
+{
+}
+
+static inline void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long)
+{
+	return NULL;
+}
+
+static inline void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long,
+					 void *)
+{
+}
+
+static inline void *dma_buf_kmap(struct dma_buf *, unsigned long)
+{
+	return NULL;
+}
+
+static inline void dma_buf_kunmap(struct dma_buf *, unsigned long,
+				  void *)
+{
+}
 #endif /* CONFIG_DMA_SHARED_BUFFER */
 
 #endif /* __DMA_BUF_H__ */
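[Editor's illustration] To make the changelog a bit more concrete, here is a hypothetical exporter whose buffer has a permanent kernel mapping: "mapping" a page is just pointer arithmetic and never sleeps, so the same helper can be wired up as both the atomic and the non-atomic kmap callback, and the optional kunmap callbacks can simply be left out. All names are invented for the sketch and assume the backing storage is always pinned and coherent:

#include <linux/dma-buf.h>
#include <linux/mm.h>

/* Hypothetical exporter state: the whole buffer is permanently mapped
 * into the kernel address space (e.g. vmap'ed at export time). */
struct my_buffer {
	void *vaddr;	/* permanent kernel mapping */
	size_t size;
};

static void *my_kmap(struct dma_buf *dmabuf, unsigned long page_num)
{
	struct my_buffer *buf = dmabuf->priv;

	/* Never sleeps, hence usable as both .kmap and .kmap_atomic. */
	return buf->vaddr + page_num * PAGE_SIZE;
}

static const struct dma_buf_ops my_dma_buf_ops = {
	/* .attach/.detach/.map_dma_buf/.unmap_dma_buf/.release elided */
	.kmap_atomic	= my_kmap,
	.kmap		= my_kmap,
	/* .kunmap_atomic and .kunmap omitted: nothing to undo for a
	 * permanent mapping. begin/end_cpu_access omitted as well, since
	 * this sketch assumes always-pinned, coherent backing storage. */
};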
On Thu, Mar 1, 2012 at 9:36 AM, Daniel Vetter daniel.vetter@ffwll.ch wrote:
A big difference from other contenders in the field (like ion) is that this also supports highmem, so we have to split up the cpu access from the kernel side into a prepare and a kmap step.
Prepare is allowed to fail and should do everything required so that the kmap calls can succeed (like swapin/backing storage allocation, flushing, ...).
More in-depth explanations will follow in the follow-up documentation patch.
Changes in v2:
- Clear up begin_cpu_access confusion noticed by Sumit Semwal.
- Don't automatically fallback from the _atomic variants to the
non-atomic variants. The _atomic callbacks are not allowed to sleep, so we want exporters to make this decision explicit. The function signatures are explicit, so simpler exporters can still use the same function for both.
- Make the unmap functions optional. Simpler exporters with permanent
mappings don't need to do anything at unmap time.
Signed-Off-by: Daniel Vetter daniel.vetter@ffwll.ch
drivers/base/dma-buf.c | 120 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 60 +++++++++++++++++++++++ 2 files changed, 180 insertions(+), 0 deletions(-)
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c index 1b11192..bf54b89 100644 --- a/drivers/base/dma-buf.c +++ b/drivers/base/dma-buf.c @@ -285,3 +285,123 @@ void dma_buf_unmap_attachment(struct dma_buf_attachment *attach,
} EXPORT_SYMBOL_GPL(dma_buf_unmap_attachment);
+/**
- dma_buf_begin_cpu_access - Must be called before accessing a dma_buf from the
- cpu in the kernel context. Calls begin_cpu_access to allow exporter-specific
- preparations. Coherency is only guaranteed in the specified range for the
- specified access direction.
- @dma_buf: [in] buffer to prepare cpu access for.
- @start: [in] start of range for cpu access.
- @len: [in] length of range for cpu access.
- @direction: [in] length of range for cpu access.
- Can return negative error values, returns 0 on success.
- */
+int dma_buf_begin_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
- enum dma_data_direction direction)
+{
- int ret = 0;
- if (WARN_ON(!dmabuf || !dmabuf->ops))
- return EINVAL;
- if (dmabuf->ops->begin_cpu_access)
- ret = dmabuf->ops->begin_cpu_access(dmabuf, start, len, direction);
- return ret;
+} +EXPORT_SYMBOL_GPL(dma_buf_begin_cpu_access);
+/**
- dma_buf_end_cpu_access - Must be called after accessing a dma_buf from the
- cpu in the kernel context. Calls end_cpu_access to allow exporter-specific
- actions. Coherency is only guaranteed in the specified range for the
- specified access direction.
- @dma_buf: [in] buffer to complete cpu access for.
- @start: [in] start of range for cpu access.
- @len: [in] length of range for cpu access.
- @direction: [in] length of range for cpu access.
- This call must always succeed.
- */
+void dma_buf_end_cpu_access(struct dma_buf *dmabuf, size_t start, size_t len,
- enum dma_data_direction direction)
+{
- WARN_ON(!dmabuf || !dmabuf->ops);
- if (dmabuf->ops->end_cpu_access)
- dmabuf->ops->end_cpu_access(dmabuf, start, len, direction);
+} +EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access);
+/**
- dma_buf_kmap_atomic - Map a page of the buffer object into kernel address
- space. The same restrictions as for kmap_atomic and friends apply.
- @dma_buf: [in] buffer to map page from.
- @page_num: [in] page in PAGE_SIZE units to map.
- This call must always succeed, any necessary preparations that might fail
- need to be done in begin_cpu_access.
- */
+void *dma_buf_kmap_atomic(struct dma_buf *dmabuf, unsigned long page_num) +{
- WARN_ON(!dmabuf || !dmabuf->ops);
- return dmabuf->ops->kmap_atomic(dmabuf, page_num);
Perhaps we should check somewhere for required dmabuf ops fxns (like kmap_atomic here), rather than just calling unconditionally what might be a null ptr. At least put it in the WARN_ON(), but it might be nicer to catch a missing required fxn at export time, rather than waiting for an importer to try and call it. That way it's less likely that newly added required functions go unnoticed.
(same comment applies below for the non-atomic variant.. and possibly some other existing dmabuf ops)
BR, -R
+} +EXPORT_SYMBOL_GPL(dma_buf_kmap_atomic);
+/**
- dma_buf_kunmap_atomic - Unmap a page obtained by dma_buf_kmap_atomic.
- @dma_buf: [in] buffer to unmap page from.
- @page_num: [in] page in PAGE_SIZE units to unmap.
- @vaddr: [in] kernel space pointer obtained from dma_buf_kmap_atomic.
- This call must always succeed.
- */
+void dma_buf_kunmap_atomic(struct dma_buf *dmabuf, unsigned long page_num,
- void *vaddr)
+{
- WARN_ON(!dmabuf || !dmabuf->ops);
- if (dmabuf->ops->kunmap_atomic)
- dmabuf->ops->kunmap_atomic(dmabuf, page_num, vaddr);
+} +EXPORT_SYMBOL_GPL(dma_buf_kunmap_atomic);
+/**
- dma_buf_kmap - Map a page of the buffer object into kernel address space. The
- same restrictions as for kmap and friends apply.
- @dma_buf: [in] buffer to map page from.
- @page_num: [in] page in PAGE_SIZE units to map.
- This call must always succeed, any necessary preparations that might fail
- need to be done in begin_cpu_access.
- */
+void *dma_buf_kmap(struct dma_buf *dmabuf, unsigned long page_num) +{
- WARN_ON(!dmabuf || !dmabuf->ops);
- return dmabuf->ops->kmap(dmabuf, page_num);
+} +EXPORT_SYMBOL_GPL(dma_buf_kmap);
+/**
- dma_buf_kunmap - Unmap a page obtained by dma_buf_kmap.
- @dma_buf: [in] buffer to unmap page from.
- @page_num: [in] page in PAGE_SIZE units to unmap.
- @vaddr: [in] kernel space pointer obtained from dma_buf_kmap.
- This call must always succeed.
- */
+void dma_buf_kunmap(struct dma_buf *dmabuf, unsigned long page_num,
- void *vaddr)
+{
- WARN_ON(!dmabuf || !dmabuf->ops);
- if (dmabuf->ops->kunmap)
- dmabuf->ops->kunmap(dmabuf, page_num, vaddr);
+} +EXPORT_SYMBOL_GPL(dma_buf_kunmap); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index f7ad2ca..72eb922 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -49,6 +49,17 @@ struct dma_buf_attachment; * @unmap_dma_buf: decreases usecount of buffer, might deallocate scatter * pages. * @release: release this buffer; to be called after the last dma_buf_put.
- @begin_cpu_access: [optional] called before cpu access to invalidate cpu
- caches and allocate backing storage (if not yet done)
- respectively pin the objet into memory.
- @end_cpu_access: [optional] called after cpu access to flush cashes.
- @kmap_atomic: maps a page from the buffer into kernel address
- space, users may not block until the subsequent unmap call.
- This callback must not sleep.
- @kunmap_atomic: [optional] unmaps a atomically mapped page from the buffer.
- This Callback must not sleep.
- @kmap: maps a page from the buffer into kernel address space.
- @kunmap: [optional] unmaps a page from the buffer.
*/ struct dma_buf_ops { int (*attach)(struct dma_buf *, struct device *, @@ -71,6 +82,14 @@ struct dma_buf_ops { /* after final dma_buf_put() */ void (*release)(struct dma_buf *);
- int (*begin_cpu_access)(struct dma_buf *, size_t, size_t,
- enum dma_data_direction);
- void (*end_cpu_access)(struct dma_buf *, size_t, size_t,
- enum dma_data_direction);
- void *(*kmap_atomic)(struct dma_buf *, unsigned long);
- void (*kunmap_atomic)(struct dma_buf *, unsigned long, void *);
- void *(*kmap)(struct dma_buf *, unsigned long);
- void (*kunmap)(struct dma_buf *, unsigned long, void *);
};
/** @@ -123,6 +142,15 @@ void dma_buf_put(struct dma_buf *dmabuf); struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *, enum dma_data_direction); void dma_buf_unmap_attachment(struct dma_buf_attachment *, struct sg_table *);
+int dma_buf_begin_cpu_access(struct dma_buf *dma_buf, size_t start, size_t len,
- enum dma_data_direction dir);
+void dma_buf_end_cpu_access(struct dma_buf *dma_buf, size_t start, size_t len,
- enum dma_data_direction dir);
+void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long); +void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *); +void *dma_buf_kmap(struct dma_buf *, unsigned long); +void dma_buf_kunmap(struct dma_buf *, unsigned long, void *); #else
static inline struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf, @@ -171,6 +199,38 @@ static inline void dma_buf_unmap_attachment(struct dma_buf_attachment *attach, return; }
+static inline int dma_buf_begin_cpu_access(struct dma_buf *,
- size_t, size_t,
- enum dma_data_direction)
+{
- return -ENODEV;
+}
+static inline void dma_buf_end_cpu_access(struct dma_buf *,
- size_t, size_t,
- enum dma_data_direction)
+{ +}
+static inline void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long) +{
- return NULL;
+}
+static inline void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long,
- void *)
+{ +}
+static inline void *dma_buf_kmap(struct dma_buf *, unsigned long) +{
- return NULL;
+}
+static inline void dma_buf_kunmap(struct dma_buf *, unsigned long,
- void *)
+{ +} #endif /* CONFIG_DMA_SHARED_BUFFER */
#endif /* __DMA_BUF_H__ */
1.7.7.5
On Fri, Mar 2, 2012 at 23:24, Rob Clark robdclark@gmail.com wrote:
Perhaps we should check somewhere for required dmabuf ops fxns (like kmap_atomic here), rather than just calling unconditionally what might be a null ptr. At least put it in the WARN_ON(), but it might be nicer to catch a missing required fxn at export time, rather than waiting for an importer to try and call it. That way it's less likely that newly added required functions go unnoticed.
(same comment applies below for the non-atomic variant.. and possibly some other existing dmabuf ops)
Agreed, I'll rework the patch to do that when rebasing onto Sumit's latest tree. -Daniel
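[Editor's illustration] For reference, the export-time check being discussed could look roughly like the sketch below. The helper name and the exact set of callbacks treated as mandatory are assumptions for illustration, not necessarily what the reworked patch ends up doing:

#include <linux/dma-buf.h>
#include <linux/types.h>

/* Sketch: a helper dma_buf_export() could call before doing anything else,
 * rejecting exporters that leave out callbacks the core invokes without a
 * NULL check (mandatory set assumed here: map/unmap_dma_buf, release,
 * kmap_atomic and kmap). */
static bool dma_buf_ops_complete(const struct dma_buf_ops *ops)
{
	return ops &&
	       ops->map_dma_buf &&
	       ops->unmap_dma_buf &&
	       ops->release &&
	       ops->kmap_atomic &&
	       ops->kmap;
}

dma_buf_export() would then bail out early, e.g. with if (WARN_ON(!dma_buf_ops_complete(ops))) return ERR_PTR(-EINVAL);, so a missing callback is caught once at export time instead of when an importer first calls it.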
Hi Daniel, On Tue, Mar 6, 2012 at 12:27 AM, Daniel Vetter daniel.vetter@ffwll.ch wrote:
On Fri, Mar 2, 2012 at 23:24, Rob Clark robdclark@gmail.com wrote:
Perhaps we should check somewhere for required dmabuf ops fxns (like kmap_atomic here), rather than just calling unconditionally what might be a null ptr. At least put it in the WARN_ON(), but it might be nicer to catch a missing required fxn at export time, rather than waiting for an importer to try and call it. That way it's less likely that newly added required functions go unnoticed.
(same comment applies below for the non-atomic variant.. and possibly some other existing dmabuf ops)
Agreed, I'll rework the patch to do that when rebasing onto Sumit's latest tree.
In addition, you'd not need to check for !dmabuf->ops since the export should already catch it.
As I sent in the other mail a while back, could you please rebase on for-next at git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git
Best regards, ~Sumit.
-Daniel
On Thu, 1 Mar 2012 16:36:00 +0100, Daniel Vetter daniel.vetter@ffwll.ch wrote:
A big difference from other contenders in the field (like ion) is that this also supports highmem, so we have to split up the cpu access from the kernel side into a prepare and a kmap step.
Prepare is allowed to fail and should do everything required so that the kmap calls can succeed (like swapin/backing storage allocation, flushing, ...).
More in-depth explanations will follow in the follow-up documentation patch.
Changes in v2:
- Clear up begin_cpu_access confusion noticed by Sumit Semwal.
- Don't automatically fallback from the _atomic variants to the non-atomic variants. The _atomic callbacks are not allowed to sleep, so we want exporters to make this decision explicit. The function signatures are explicit, so simpler exporters can still use the same function for both.
- Make the unmap functions optional. Simpler exporters with permanent mappings don't need to do anything at unmap time.
Are we going to have to have a dma_buf->ops->begin_async_access(&me, dir) variant for coherency control of rendering with an imported dma_buf? There is also no concurrency control here between multiple importers doing simultaneous begin_cpu_access(). I presume that is going to be a common function across all exporters so the midlayer might offer a semaphore as a library function and then the dma_buf->ops->begin_cpu_access() becomes mandatory as at a minimum it has to point to the default implementation. -Chris
On Fri, Mar 2, 2012 at 4:38 PM, Chris Wilson chris@chris-wilson.co.uk wrote:
On Thu, 1 Mar 2012 16:36:00 +0100, Daniel Vetter daniel.vetter@ffwll.ch wrote:
A big difference from other contenders in the field (like ion) is that this also supports highmem, so we have to split up the cpu access from the kernel side into a prepare and a kmap step.
Prepare is allowed to fail and should do everything required so that the kmap calls can succeed (like swapin/backing storage allocation, flushing, ...).
More in-depth explanations will follow in the follow-up documentation patch.
Changes in v2:
- Clear up begin_cpu_access confusion noticed by Sumit Semwal.
- Don't automatically fallback from the _atomic variants to the
non-atomic variants. The _atomic callbacks are not allowed to sleep, so we want exporters to make this decision explicit. The function signatures are explicit, so simpler exporters can still use the same function for both.
- Make the unmap functions optional. Simpler exporters with permanent
mappings don't need to do anything at unmap time.
Are we going to have to have a dma_buf->ops->begin_async_access(&me, dir) variant for coherency control of rendering with an imported dma_buf? There is also no concurrency control here between multiple importers doing simultaneous begin_cpu_access(). I presume that is going to be a common function across all exporters so the midlayer might offer a semaphore as a library function and then the dma_buf->ops->begin_cpu_access() becomes mandatory as at a minimum it has to point to the default implementation.
Initially the expectation was that userspace would not pass a buffer to multiple subsystems for writing (or that if it did, it would get the undefined results that one could expect).. so dealing w/ synchronization was punted.
I expect, though, that one of the next steps is some sort of sync-object mechanism to supplement dmabuf
BR, -R
-Chris
-- Chris Wilson, Intel Open Source Technology Centre
On Fri, Mar 2, 2012 at 23:53, Rob Clark rob.clark@linaro.org wrote:
Initially the expectation was that userspace would not pass a buffer to multiple subsystems for writing (or that if it did, it would get the undefined results that one could expect).. so dealing w/ synchronization was punted.
Imo synchronization should not be part of the dma_buf core, i.e. userspace needs to ensure that access is synchronized. begin/end_cpu_access are the coherency brackets (like map/unmap for device dma). And if userspace asks for a gun and some bullets, the kernel should just deliver. Even in drm/i915 gem land we don't (and simply can't) make any promises about concurrent reads/writes/ioctls.
I expect, though, that one of the next steps is some sort of sync-object mechanism to supplement dmabuf
Imo the only reason to add sync objects as explicit things is to make device-to-device sync more efficient by using hw semaphores and signalling lines. Or maybe a quick irq handler in the kernel that kicks off the next device. I don't think we should design these to make userspace simpler.
Cheers, Daniel
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
---
 Documentation/dma-buf-sharing.txt |  102 +++++++++++++++++++++++++++++++++++-
 1 files changed, 99 insertions(+), 3 deletions(-)
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 225f96d..f12542b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -32,8 +32,12 @@ The buffer-user
 *IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details]
 For this first version, A buffer shared using the dma_buf sharing API:
 - *may* be exported to user space using "mmap" *ONLY* by exporter, outside of
-  this framework.
-- may be used *ONLY* by importers that do not need CPU access to the buffer.
+  this framework.
+- with this new iteration of the dma-buf api cpu access from the kernel has been
+  enable, see below for the details.
+
+dma-buf operations for device dma only
+--------------------------------------
 
 The dma_buf buffer sharing API usage contains the following steps:
 
@@ -219,7 +223,99 @@ NOTES:
    If the exporter chooses not to allow an attach() operation once a
    map_dma_buf() API has been called, it simply returns an error.
 
-Miscellaneous notes:
+Kernel cpu access to a dma-buf buffer object
+--------------------------------------------
+
+The motivation to allow cpu access from the kernel to a dma-buf object from the
+importers side are:
+- fallback operations, e.g. if the devices is connected to a usb bus and the
+  kernel needs to shuffle the data around first before sending it away.
+- full transperancy for existing users on the importer side, i.e. userspace
+  should not notice the difference between a normal object from that subsystem
+  and an imported one backed by a dma-buf. This is really important for drm
+  opengl drivers that expect to still use all the existing upload/download
+  paths.
+
+Access to a dma_buf from the kernel context involves three steps:
+
+1. Prepare access, which invalidate any necessary caches and make the object
+   available for cpu access.
+2. Access the object page-by-page with the dma_buf map apis
+3. Finish access, which will flush any necessary cpu caches and free reserved
+   resources.
+
+1. Prepare acces
+
+   Before an importer can acces a dma_buf object with the cpu from the kernel
+   context, it needs to notice the exporter of the access that is about to
+   happen.
+
+   Interface:
+      int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
+				   size_t start, size_t len,
+				   enum dma_data_direction direction)
+
+   This allows the exporter to ensure that the memory is actually available for
+   cpu access - the exporter might need to allocate or swap-in and pin the
+   backing storage. The exporter also needs to ensure that cpu access is
+   coherent for the given range and access direction. The range and access
+   direction can be used by the exporter to optimize the cache flushing, i.e.
+   access outside of the range or with a different direction (read instead of
+   write) might return stale or even bogus data (e.g. when the exporter needs to
+   copy the data to temporaray storage).
+
+   This step might fail, e.g. in oom conditions.
+
+2. Accessing the buffer
+
+   To support dma_buf objects residing in highmem cpu access is page-based using
+   an api similar to kmap. Accessing a dma_buf is done in aligned chunks of
+   PAGE_SIZE size. Before accessing a chunk it needs to be mapped, which returns
+   a pointer in kernel virtual address space. Afterwards the chunk needs to be
+   unmapped again. There is no limit on how often a given chunk can be mapped
+   and unmmapped, i.e. the importer does not need to call begin_cpu_access again
+   before mapping the same chunk again.
+
+   Interfaces:
+      void *dma_buf_kmap(struct dma_buf *, unsigned long);
+      void dma_buf_kunmap(struct dma_buf *, unsigned long, void *);
+
+   There are also atomic variants of these interfaces. Like for kmap they
+   facilitate non-blocking fast-paths. Neither the importer nor the exporter (in
+   the callback) is allowed to block when using these.
+
+   Interfaces:
+      void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long);
+      void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
+
+   For importers all the restrictions of using kmap apply, like the limited
+   supply of kmap_atomic slots. Hence an importer shall only hold onto at most 2
+   atomic dma_buf kmaps at the same time (in any given process context).
+
+   dma_buf kmap calls outside of the range specified in begin_cpu_access are
+   undefined. If the range is not PAGE_SIZE aligned, kmap needs to succeed on
+   the partial chunks at the beginning and end but may return stale or bogus
+   data outside of the range (in these partial chunks).
+
+   Note that these calls need to always succeed. The exporter needs to complete
+   any preparations that might fail in begin_cpu_access.
+
+3. Finish access
+
+   When the importer is done accessing the range specified in begin_cpu_acces,
+   it needs to announce this to the exporter (to facilitate cache flushing and
+   unpinning of any pinned resources). The result of of any dma_buf kmap calls
+   after end_cpu_access is undefined.
+
+   Interface:
+      void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
+				  size_t start, size_t len,
+				  enum dma_data_direction dir);
+
+
+Miscellaneous notes
+-------------------
+
 - Any exporters or users of the dma-buf buffer sharing framework must have
   a 'select DMA_SHARED_BUFFER' in their respective Kconfigs.
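[Editor's illustration] As a rough sketch of the three documented steps from the importer's side, here is a hypothetical fallback path that copies the first len bytes of a dma_buf into a kernel buffer. The function name is invented, and the use of DMA_FROM_DEVICE (cpu reading data that a device wrote) is an assumption for the example:

#include <linux/dma-buf.h>
#include <linux/highmem.h>
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/string.h>

/* Hypothetical importer helper: read 'len' bytes from the start of a
 * dma_buf into 'dst', following the prepare/map/finish steps above. */
static int my_read_dma_buf(struct dma_buf *dmabuf, void *dst, size_t len)
{
	size_t done = 0;
	int ret;

	/* 1. Prepare access - this may fail, e.g. under memory pressure. */
	ret = dma_buf_begin_cpu_access(dmabuf, 0, len, DMA_FROM_DEVICE);
	if (ret)
		return ret;

	/* 2. Access the object page-by-page; kmap may not fail after a
	 *    successful begin_cpu_access. */
	while (done < len) {
		size_t chunk = min_t(size_t, len - done, PAGE_SIZE);
		void *vaddr = dma_buf_kmap(dmabuf, done >> PAGE_SHIFT);

		memcpy(dst + done, vaddr, chunk);
		dma_buf_kunmap(dmabuf, done >> PAGE_SHIFT, vaddr);
		done += chunk;
	}

	/* 3. Finish access - lets the exporter flush caches and unpin. */
	dma_buf_end_cpu_access(dmabuf, 0, len, DMA_FROM_DEVICE);
	return 0;
}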
minor comments from the typo-police..
On Thu, Mar 1, 2012 at 9:36 AM, Daniel Vetter daniel.vetter@ffwll.ch wrote:
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
Documentation/dma-buf-sharing.txt | 102 +++++++++++++++++++++++++++++++++++- 1 files changed, 99 insertions(+), 3 deletions(-)
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 225f96d..f12542b 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -32,8 +32,12 @@ The buffer-user *IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details] For this first version, A buffer shared using the dma_buf sharing API: - *may* be exported to user space using "mmap" *ONLY* by exporter, outside of
- this framework.
-- may be used *ONLY* by importers that do not need CPU access to the buffer.
- this framework.
+- with this new iteration of the dma-buf api cpu access from the kernel has been
- enable, see below for the details.
+dma-buf operations for device dma only +--------------------------------------
The dma_buf buffer sharing API usage contains the following steps:
@@ -219,7 +223,99 @@ NOTES: If the exporter chooses not to allow an attach() operation once a map_dma_buf() API has been called, it simply returns an error.
-Miscellaneous notes: +Kernel cpu access to a dma-buf buffer object +--------------------------------------------
+The motivation to allow cpu access from the kernel to a dma-buf object from the +importers side are: +- fallback operations, e.g. if the devices is connected to a usb bus and the
- kernel needs to shuffle the data around first before sending it away.
+- full transperancy for existing users on the importer side, i.e. userspace
s/transperancy/transparency/
- should not notice the difference between a normal object from that subsystem
- and an imported one backed by a dma-buf. This is really important for drm
- opengl drivers that expect to still use all the existing upload/download
- paths.
+Access to a dma_buf from the kernel context involves three steps:
+1. Prepare access, which invalidate any necessary caches and make the object
- available for cpu access.
+2. Access the object page-by-page with the dma_buf map apis +3. Finish access, which will flush any necessary cpu caches and free reserved
- resources.
+1. Prepare acces
s/acces/access/
- Before an importer can acces a dma_buf object with the cpu from the kernel
s/acces/access/
- context, it needs to notice the exporter of the access that is about to
s/notice/notify/ (I assume?)
- happen.
- Interface:
- int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
- size_t start, size_t len,
- enum dma_data_direction direction)
- This allows the exporter to ensure that the memory is actually available for
- cpu access - the exporter might need to allocate or swap-in and pin the
- backing storage. The exporter also needs to ensure that cpu access is
- coherent for the given range and access direction. The range and access
- direction can be used by the exporter to optimize the cache flushing, i.e.
- access outside of the range or with a different direction (read instead of
- write) might return stale or even bogus data (e.g. when the exporter needs to
- copy the data to temporaray storage).
s/temporaray/temporary/
- This step might fail, e.g. in oom conditions.
+2. Accessing the buffer
- To support dma_buf objects residing in highmem cpu access is page-based using
- an api similar to kmap. Accessing a dma_buf is done in aligned chunks of
- PAGE_SIZE size. Before accessing a chunk it needs to be mapped, which returns
- a pointer in kernel virtual address space. Afterwards the chunk needs to be
- unmapped again. There is no limit on how often a given chunk can be mapped
- and unmmapped, i.e. the importer does not need to call begin_cpu_access again
s/unmmapped/unmapped/
- before mapping the same chunk again.
- Interfaces:
- void *dma_buf_kmap(struct dma_buf *, unsigned long);
- void dma_buf_kunmap(struct dma_buf *, unsigned long, void *);
- There are also atomic variants of these interfaces. Like for kmap they
- facilitate non-blocking fast-paths. Neither the importer nor the exporter (in
- the callback) is allowed to block when using these.
- Interfaces:
- void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long);
- void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
- For importers all the restrictions of using kmap apply, like the limited
- supply of kmap_atomic slots. Hence an importer shall only hold onto at most 2
- atomic dma_buf kmaps at the same time (in any given process context).
- dma_buf kmap calls outside of the range specified in begin_cpu_access are
- undefined. If the range is not PAGE_SIZE aligned, kmap needs to succeed on
- the partial chunks at the beginning and end but may return stale or bogus
- data outside of the range (in these partial chunks).
- Note that these calls need to always succeed. The exporter needs to complete
- any preparations that might fail in begin_cpu_access.
+3. Finish access
- When the importer is done accessing the range specified in begin_cpu_acces,
s/begin_cpu_acces/begin_cpu_access/
BR, -R
- it needs to announce this to the exporter (to facilitate cache flushing and
- unpinning of any pinned resources). The result of of any dma_buf kmap calls
- after end_cpu_access is undefined.
- Interface:
- void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
- size_t start, size_t len,
- enum dma_data_direction dir);
+Miscellaneous notes +-------------------
- Any exporters or users of the dma-buf buffer sharing framework must have a 'select DMA_SHARED_BUFFER' in their respective Kconfigs.
-- 1.7.7.5
Hi Daniel,
Thanks for the patch.
On Thu, Mar 01, 2012 at 04:36:01PM +0100, Daniel Vetter wrote:
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
Documentation/dma-buf-sharing.txt | 102 +++++++++++++++++++++++++++++++++++- 1 files changed, 99 insertions(+), 3 deletions(-)
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 225f96d..f12542b 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -32,8 +32,12 @@ The buffer-user *IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details] For this first version, A buffer shared using the dma_buf sharing API:
- *may* be exported to user space using "mmap" *ONLY* by exporter, outside of
- this framework.
-- may be used *ONLY* by importers that do not need CPU access to the buffer.
- this framework.
+- with this new iteration of the dma-buf api cpu access from the kernel has been
- enable, see below for the details.
+dma-buf operations for device dma only +-------------------------------------- The dma_buf buffer sharing API usage contains the following steps: @@ -219,7 +223,99 @@ NOTES: If the exporter chooses not to allow an attach() operation once a map_dma_buf() API has been called, it simply returns an error. -Miscellaneous notes: +Kernel cpu access to a dma-buf buffer object +--------------------------------------------
+The motivation to allow cpu access from the kernel to a dma-buf object from the +importers side are: +- fallback operations, e.g. if the devices is connected to a usb bus and the
- kernel needs to shuffle the data around first before sending it away.
+- full transperancy for existing users on the importer side, i.e. userspace
- should not notice the difference between a normal object from that subsystem
- and an imported one backed by a dma-buf. This is really important for drm
- opengl drivers that expect to still use all the existing upload/download
- paths.
+Access to a dma_buf from the kernel context involves three steps:
+1. Prepare access, which invalidate any necessary caches and make the object
- available for cpu access.
+2. Access the object page-by-page with the dma_buf map apis +3. Finish access, which will flush any necessary cpu caches and free reserved
- resources.
Where should it be decided which operations are done to the buffer when it is passed to user space and back to kernel space?
How about splitting these operations into those done the first time the buffer is passed to user space (mapping to kernel address space, for example) and those required every time the buffer is passed from kernel to user and back (cache flushing)?
I'm asking since any unnecessary time-consuming operations, especially as heavy as mapping the buffer, should be avoidable in subsystems dealing with streaming video, cameras etc., i.e. non-GPU users.
+1. Prepare acces
- Before an importer can acces a dma_buf object with the cpu from the kernel
- context, it needs to notice the exporter of the access that is about to
- happen.
- Interface:
int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
size_t start, size_t len,
enum dma_data_direction direction)
- This allows the exporter to ensure that the memory is actually available for
- cpu access - the exporter might need to allocate or swap-in and pin the
- backing storage. The exporter also needs to ensure that cpu access is
- coherent for the given range and access direction. The range and access
- direction can be used by the exporter to optimize the cache flushing, i.e.
- access outside of the range or with a different direction (read instead of
- write) might return stale or even bogus data (e.g. when the exporter needs to
- copy the data to temporaray storage).
- This step might fail, e.g. in oom conditions.
+2. Accessing the buffer
- To support dma_buf objects residing in highmem cpu access is page-based using
- an api similar to kmap. Accessing a dma_buf is done in aligned chunks of
- PAGE_SIZE size. Before accessing a chunk it needs to be mapped, which returns
- a pointer in kernel virtual address space. Afterwards the chunk needs to be
- unmapped again. There is no limit on how often a given chunk can be mapped
- and unmmapped, i.e. the importer does not need to call begin_cpu_access again
- before mapping the same chunk again.
- Interfaces:
void *dma_buf_kmap(struct dma_buf *, unsigned long);
void dma_buf_kunmap(struct dma_buf *, unsigned long, void *);
- There are also atomic variants of these interfaces. Like for kmap they
- facilitate non-blocking fast-paths. Neither the importer nor the exporter (in
- the callback) is allowed to block when using these.
- Interfaces:
void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long);
void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
- For importers all the restrictions of using kmap apply, like the limited
- supply of kmap_atomic slots. Hence an importer shall only hold onto at most 2
- atomic dma_buf kmaps at the same time (in any given process context).
- dma_buf kmap calls outside of the range specified in begin_cpu_access are
- undefined. If the range is not PAGE_SIZE aligned, kmap needs to succeed on
- the partial chunks at the beginning and end but may return stale or bogus
- data outside of the range (in these partial chunks).
- Note that these calls need to always succeed. The exporter needs to complete
- any preparations that might fail in begin_cpu_access.
+3. Finish access
- When the importer is done accessing the range specified in begin_cpu_acces,
- it needs to announce this to the exporter (to facilitate cache flushing and
- unpinning of any pinned resources). The result of of any dma_buf kmap calls
- after end_cpu_access is undefined.
- Interface:
void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
size_t start, size_t len,
enum dma_data_direction dir);
+Miscellaneous notes +-------------------
- Any exporters or users of the dma-buf buffer sharing framework must have a 'select DMA_SHARED_BUFFER' in their respective Kconfigs.
Kind regards,
On Fri, Mar 2, 2012 at 6:23 PM, Sakari Ailus sakari.ailus@iki.fi wrote:
Hi Daniel,
Thanks for the patch.
On Thu, Mar 01, 2012 at 04:36:01PM +0100, Daniel Vetter wrote:
Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch
Documentation/dma-buf-sharing.txt | 102 +++++++++++++++++++++++++++++++++++- 1 files changed, 99 insertions(+), 3 deletions(-)
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt index 225f96d..f12542b 100644 --- a/Documentation/dma-buf-sharing.txt +++ b/Documentation/dma-buf-sharing.txt @@ -32,8 +32,12 @@ The buffer-user *IMPORTANT*: [see https://lkml.org/lkml/2011/12/20/211 for more details] For this first version, A buffer shared using the dma_buf sharing API: - *may* be exported to user space using "mmap" *ONLY* by exporter, outside of
- this framework.
-- may be used *ONLY* by importers that do not need CPU access to the buffer.
- this framework.
+- with this new iteration of the dma-buf api cpu access from the kernel has been
- enable, see below for the details.
+dma-buf operations for device dma only +--------------------------------------
The dma_buf buffer sharing API usage contains the following steps:
@@ -219,7 +223,99 @@ NOTES: If the exporter chooses not to allow an attach() operation once a map_dma_buf() API has been called, it simply returns an error.
-Miscellaneous notes: +Kernel cpu access to a dma-buf buffer object +--------------------------------------------
+The motivation to allow cpu access from the kernel to a dma-buf object from the +importers side are: +- fallback operations, e.g. if the devices is connected to a usb bus and the
- kernel needs to shuffle the data around first before sending it away.
+- full transperancy for existing users on the importer side, i.e. userspace
- should not notice the difference between a normal object from that subsystem
- and an imported one backed by a dma-buf. This is really important for drm
- opengl drivers that expect to still use all the existing upload/download
- paths.
+Access to a dma_buf from the kernel context involves three steps:
+1. Prepare access, which invalidate any necessary caches and make the object
- available for cpu access.
+2. Access the object page-by-page with the dma_buf map apis +3. Finish access, which will flush any necessary cpu caches and free reserved
- resources.
Where should it be decided which operations are done to the buffer when it is passed to user space and back to kernel space?
How about splitting these operations into those done the first time the buffer is passed to user space (mapping to kernel address space, for example) and those required every time the buffer is passed from kernel to user and back (cache flushing)?
I'm asking since any unnecessary time-consuming operations, especially as heavy as mapping the buffer, should be avoidable in subsystems dealing with streaming video, cameras etc., i.e. non-GPU users.
Well, this is really something for the buffer exporter to deal with.. since there is no way for an importer to create a userspace mmap'ing of the buffer. A lot of these expensive operations go away if you don't even create a userspace virtual mapping in the first place ;-)
BR, -R
+1. Prepare acces
- Before an importer can acces a dma_buf object with the cpu from the kernel
- context, it needs to notice the exporter of the access that is about to
- happen.
- Interface:
- int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
- size_t start, size_t len,
- enum dma_data_direction direction)
- This allows the exporter to ensure that the memory is actually available for
- cpu access - the exporter might need to allocate or swap-in and pin the
- backing storage. The exporter also needs to ensure that cpu access is
- coherent for the given range and access direction. The range and access
- direction can be used by the exporter to optimize the cache flushing, i.e.
- access outside of the range or with a different direction (read instead of
- write) might return stale or even bogus data (e.g. when the exporter needs to
- copy the data to temporaray storage).
- This step might fail, e.g. in oom conditions.
+2. Accessing the buffer
- To support dma_buf objects residing in highmem cpu access is page-based using
- an api similar to kmap. Accessing a dma_buf is done in aligned chunks of
- PAGE_SIZE size. Before accessing a chunk it needs to be mapped, which returns
- a pointer in kernel virtual address space. Afterwards the chunk needs to be
- unmapped again. There is no limit on how often a given chunk can be mapped
- and unmmapped, i.e. the importer does not need to call begin_cpu_access again
- before mapping the same chunk again.
- Interfaces:
- void *dma_buf_kmap(struct dma_buf *, unsigned long);
- void dma_buf_kunmap(struct dma_buf *, unsigned long, void *);
- There are also atomic variants of these interfaces. Like for kmap they
- facilitate non-blocking fast-paths. Neither the importer nor the exporter (in
- the callback) is allowed to block when using these.
- Interfaces:
- void *dma_buf_kmap_atomic(struct dma_buf *, unsigned long);
- void dma_buf_kunmap_atomic(struct dma_buf *, unsigned long, void *);
- For importers all the restrictions of using kmap apply, like the limited
- supply of kmap_atomic slots. Hence an importer shall only hold onto at most 2
- atomic dma_buf kmaps at the same time (in any given process context).
- dma_buf kmap calls outside of the range specified in begin_cpu_access are
- undefined. If the range is not PAGE_SIZE aligned, kmap needs to succeed on
- the partial chunks at the beginning and end but may return stale or bogus
- data outside of the range (in these partial chunks).
- Note that these calls need to always succeed. The exporter needs to complete
- any preparations that might fail in begin_cpu_access.
+3. Finish access
- When the importer is done accessing the range specified in begin_cpu_acces,
- it needs to announce this to the exporter (to facilitate cache flushing and
- unpinning of any pinned resources). The result of of any dma_buf kmap calls
- after end_cpu_access is undefined.
- Interface:
- void dma_buf_end_cpu_access(struct dma_buf *dma_buf,
- size_t start, size_t len,
- enum dma_data_direction dir);
+Miscellaneous notes +-------------------
- Any exporters or users of the dma-buf buffer sharing framework must have a 'select DMA_SHARED_BUFFER' in their respective Kconfigs.
Kind regards,
-- Sakari Ailus e-mail: sakari.ailus@iki.fi jabber/XMPP/Gmail: sailus@retiisi.org.uk
On Sat, Mar 3, 2012 at 01:23, Sakari Ailus sakari.ailus@iki.fi wrote:
Where it should be decided which operations are being done to the buffer when it is passed to user space and back to kernel space?
How about spliting these operations to those done on the first time the buffer is passed to the user space (mapping to kernel address space, for example) and those required every time buffer is passed from kernel to user and back (cache flusing)?
I'm asking since any unnecessary time-consuming operations, especially as heavy as mapping the buffer, should be avoidable in subsystems dealing with streaming video, cameras etc., i.e. non-GPU users.
I'm a bit confused about your comments because this interface extension doesn't support userspace mmap. So userspace isn't even part of the picture. Adding mmap support is something entirely different imo, and I have no idea yet how we should handle cache coherency for that.
Yours, Daniel