On Wed, Sep 30, 2020 at 11:39:06AM +0200, Michel Dänzer wrote:
> On 2020-03-17 10:21 p.m., Jason Ekstrand wrote:
> > Explicit synchronization is the future. At least, that seems to be what
> > most userspace APIs are agreeing on at this point. However, most of our
> > Linux APIs (both userspace and kernel UAPI) are currently built around
> > implicit synchronization with dma-buf. While work is ongoing to change
> > many of the userspace APIs and protocols to an explicit synchronization
> > model, switching over piecemeal is difficult due to the number of
> > potential components involved. On the kernel side, many drivers use
> > dma-buf including GPU (3D/compute), display, v4l, and others. In
> > userspace, we have X11, several Wayland compositors, 3D drivers, compute
> > drivers (OpenCL etc.), media encode/decode, and the list goes on.
> >
> > This patch provides a path forward by allowing userspace to manually
> > manage the fences attached to a dma-buf. Alternatively, one can think
> > of this as making dma-buf's implicit synchronization simply a carrier
> > for an explicit fence. This is accomplished by adding two IOCTLs to
> > dma-buf for importing and exporting a sync file to/from the dma-buf.
> > This way a userspace component which uses explicit synchronization,
> > such as a Vulkan driver, can manually set the write fence on a buffer
> > before handing it off to an implicitly synchronized component such as a
> > Wayland compositor or video encoder. In this way, each of the different
> > components can be upgraded to an explicit synchronization model one at a
> > time as long as the userspace pieces connecting them are aware of it and
> > import/export fences at the right times.
> >
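As an illustration, the import step described above might look like this
from userspace. This is a sketch only: it uses the uapi names from later
revisions of this series (struct dma_buf_import_sync_file,
DMA_BUF_IOCTL_IMPORT_SYNC_FILE, DMA_BUF_SYNC_WRITE), while the v5 uapi
quoted here has no flags and always targets the exclusive fence.

  #include <sys/ioctl.h>
  #include <linux/dma-buf.h>

  /* Install a render-done fence as the buffer's write fence before
   * handing the dma-buf to an implicitly synchronized consumer. */
  static int attach_write_fence(int dmabuf_fd, int sync_file_fd)
  {
          struct dma_buf_import_sync_file args = {
                  .flags = DMA_BUF_SYNC_WRITE,
                  .fd = sync_file_fd,
          };

          return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
  }
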
> > There is a potential race condition with this API if userspace is not
> > careful. A typical use case for implicit synchronization is to wait for
> > the dma-buf to be ready, use it, and then signal it for some other
> > component. Because a sync_file cannot be created until it is guaranteed
> > to complete in finite time, userspace can only signal the dma-buf after
> > it has already submitted the work which uses it to the kernel and has
> > received a sync_file back. There is no way to atomically submit a
> > wait-use-signal operation. This is not, however, really a problem with
> > this API so much as it is a problem with explicit synchronization
> > itself. The way this is typically handled is to have very explicit
> > ownership transfer points in the API or protocol which ensure that only
> > one component is using it at any given time. Both X11 (via the PRESENT
> > extension) and Wayland provide such ownership transfer points via
> > explicit present and idle messages.
> >
> > The decision was intentionally made in this patch to make the import and
> > export operations IOCTLs on the dma-buf itself rather than as a DRM
> > IOCTL. This makes the import/export operation universal across all
> > components which use dma-buf including GPU, display, v4l, and others.
> > It also means that a userspace component can do the import/export
> > without access to the DRM fd, which may be tricky to get in cases where
> > the client communicates with DRM via a userspace API such as OpenGL or
> > Vulkan. At a future date we may choose to add direct import/export APIs
> > to components such as drm_syncobj to avoid allocating a file descriptor
> > and going through two ioctls. However, that seems to be something of a
> > micro-optimization as import/export operations are likely to happen at a
> > rate of a few per frame of rendered or decoded video.
> >
> > v2 (Jason Ekstrand):
> > - Use a wrapper dma_fence_array of all fences including the new one
> > when importing an exclusive fence.
> >
> > v3 (Jason Ekstrand):
> > - Lock around setting shared fences as well as exclusive
> > - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> > - Initialize ret to 0 in dma_buf_wait_sync_file
> >
> > v4 (Jason Ekstrand):
> > - Use the new dma_resv_get_singleton helper
> >
> > v5 (Jason Ekstrand):
> > - Rename the IOCTLs to import/export rather than wait/signal
> > - Drop the WRITE flag and always get/set the exclusive fence
> >
> > Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
>
> What's the status of this? DMA_BUF_IOCTL_EXPORT_SYNC_FILE would be useful
> for Wayland compositors to wait for client buffers to become ready without
> being prone to getting delayed by later HW access to them, so it would be
> nice to merge that at least (if DMA_BUF_IOCTL_IMPORT_SYNC_FILE is still
> controversial).
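To sketch that compositor-side flow (again using sync-file uapi names from
later revisions of this series, which may differ from the version quoted
above): export a sync_file that snapshots the buffer's current fences, then
poll it, so that HW access submitted afterwards cannot delay the wait.

  #include <errno.h>
  #include <poll.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/dma-buf.h>

  /* Wait for all pending writes to a client buffer, unaffected by
   * work submitted after the snapshot is taken. */
  static int wait_buffer_ready(int dmabuf_fd)
  {
          struct dma_buf_export_sync_file args = {
                  .flags = DMA_BUF_SYNC_READ, /* fences a reader must wait on */
          };
          struct pollfd pfd;
          int ret;

          if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
                  return -errno;

          pfd.fd = args.fd;
          pfd.events = POLLIN; /* a sync_file reports POLLIN once signaled */
          do {
                  ret = poll(&pfd, 1, -1);
          } while (ret < 0 && errno == EINTR);

          close(args.fd);
          return ret < 0 ? -errno : 0;
  }
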
I think the missing bits are just the usual stuff:
- igt testcases
- userspace using the new ioctls
- review of the entire pile
I don't think there's any fundamental objections aside from "no one ever
pushed this over the finish line".
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Thu, Dec 17, 2020 at 4:31 AM <siyanteng01@gmail.com> wrote:
>
> From: siyanteng <siyanteng01@gmail.com>
>
> When building cma_heap the following error shows up:
>
> drivers/dma-buf/heaps/cma_heap.c:195:10: error: implicit declaration of function 'vmap'; did you mean 'kmap'? [-Werror=implicit-function-declaration]
> 195 | vaddr = vmap(buffer->pages, buffer->pagecount, VM_MAP, PAGE_KERNEL);
> | ^~~~
> | kmap
>
> Fix this by including <linux/vmalloc.h>, which provides the vmap()
> declaration.
>
> Signed-off-by: siyanteng <siyanteng01@gmail.com>
Thanks for submitting this! My apologies for the trouble!
We already have a similar patch queued here:
https://cgit.freedesktop.org/drm/drm-misc/commit/?h=drm-misc-next-fixes&id=…
so hopefully that will land upstream soon.
thanks again!
-john
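For reference, the queued fix amounts to pulling in the header that
declares vmap() (a sketch; the actual queued patch may differ in detail):

  /* drivers/dma-buf/heaps/cma_heap.c */
  #include <linux/vmalloc.h>	/* for vmap()/vunmap() */
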
Also try to clarify a bit when dma_buf_begin/end_cpu_access should
be called.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
---
drivers/dma-buf/dma-buf.c | 20 ++++++++++++++------
include/linux/dma-buf.h | 25 +++++++++----------------
2 files changed, 23 insertions(+), 22 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index e63684d4cd90..a12fdffa130f 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1001,15 +1001,15 @@ EXPORT_SYMBOL_GPL(dma_buf_move_notify);
* vmalloc space might be limited and result in vmap calls failing.
*
* Interfaces::
+ *
* void \*dma_buf_vmap(struct dma_buf \*dmabuf)
* void dma_buf_vunmap(struct dma_buf \*dmabuf, void \*vaddr)
*
* The vmap call can fail if there is no vmap support in the exporter, or if
- * it runs out of vmalloc space. Fallback to kmap should be implemented. Note
- * that the dma-buf layer keeps a reference count for all vmap access and
- * calls down into the exporter's vmap function only when no vmapping exists,
- * and only unmaps it once. Protection against concurrent vmap/vunmap calls is
- * provided by taking the dma_buf->lock mutex.
+ * it runs out of vmalloc space. Note that the dma-buf layer keeps a reference
+ * count for all vmap access and calls down into the exporter's vmap function
+ * only when no vmapping exists, and only unmaps it once. Protection against
+ * concurrent vmap/vunmap calls is provided by taking the &dma_buf.lock mutex.
*
* - For full compatibility on the importer side with existing userspace
* interfaces, which might already support mmap'ing buffers. This is needed in
@@ -1098,6 +1098,11 @@ static int __dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 * dma_buf_end_cpu_access(). Only when cpu access is bracketed by both calls is
* it guaranteed to be coherent with other DMA access.
*
+ * This function will also wait for any DMA transactions tracked through
+ * implicit synchronization in &dma_buf.resv. For DMA transactions with explicit
+ * synchronization this function will only ensure cache coherency, callers must
+ * ensure synchronization with such DMA transactions on their own.
+ *
* Can return negative error values, returns 0 on success.
*/
int dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
@@ -1199,7 +1204,10 @@ EXPORT_SYMBOL_GPL(dma_buf_mmap);
* This call may fail due to lack of virtual mapping address space.
* These calls are optional in drivers. The intended use for them
* is for mapping objects linear in kernel space for high use objects.
- * Please attempt to use kmap/kunmap before thinking about these interfaces.
+ *
+ * To ensure coherency users must call dma_buf_begin_cpu_access() and
+ * dma_buf_end_cpu_access() around any cpu access performed through this
+ * mapping.
*
* Returns 0 on success, or a negative errno code otherwise.
*/
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index cf72699cb2bc..7eca37c8b10c 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -183,24 +183,19 @@ struct dma_buf_ops {
* @begin_cpu_access:
*
* This is called from dma_buf_begin_cpu_access() and allows the
- * exporter to ensure that the memory is actually available for cpu
- * access - the exporter might need to allocate or swap-in and pin the
- * backing storage. The exporter also needs to ensure that cpu access is
- * coherent for the access direction. The direction can be used by the
- * exporter to optimize the cache flushing, i.e. access with a different
+ * exporter to ensure that the memory is actually coherent for cpu
+ * access. The exporter also needs to ensure that cpu access is coherent
+ * for the access direction. The direction can be used by the exporter
+ * to optimize the cache flushing, i.e. access with a different
* direction (read instead of write) might return stale or even bogus
* data (e.g. when the exporter needs to copy the data to temporary
* storage).
*
- * This callback is optional.
+ * Note that this is both called through the DMA_BUF_IOCTL_SYNC IOCTL
+ * command for userspace mappings established through @mmap, and also
+ * for kernel mappings established with @vmap.
*
- * FIXME: This is both called through the DMA_BUF_IOCTL_SYNC command
- * from userspace (where storage shouldn't be pinned to avoid handing
- * de-factor mlock rights to userspace) and for the kernel-internal
- * users of the various kmap interfaces, where the backing storage must
- * be pinned to guarantee that the atomic kmap calls can succeed. Since
- * there's no in-kernel users of the kmap interfaces yet this isn't a
- * real problem.
+ * This callback is optional.
*
* Returns:
*
@@ -216,9 +211,7 @@ struct dma_buf_ops {
*
* This is called from dma_buf_end_cpu_access() when the importer is
* done accessing the CPU. The exporter can use this to flush caches and
- * unpin any resources pinned in @begin_cpu_access.
- * The result of any dma_buf kmap calls after end_cpu_access is
- * undefined.
+ * undo anything else done in @begin_cpu_access.
*
* This callback is optional.
*
--
2.29.2
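To make the documented rules concrete, here is a minimal sketch of an
importer bracketing kernel CPU access through a vmap'ed mapping. It assumes
the dma_buf_map-based vmap interface and a system-memory (non-iomem)
mapping; read_back() is a hypothetical helper, not part of the API.

  #include <linux/dma-buf.h>
  #include <linux/dma-buf-map.h>
  #include <linux/dma-direction.h>
  #include <linux/string.h>

  /* Copy out buffer contents coherently: bracket the CPU reads with
   * dma_buf_begin_cpu_access()/dma_buf_end_cpu_access(). The caller
   * guarantees len <= dmabuf->size. */
  static int read_back(struct dma_buf *dmabuf, void *dst, size_t len)
  {
          struct dma_buf_map map;
          int ret;

          ret = dma_buf_vmap(dmabuf, &map);
          if (ret)
                  return ret;

          ret = dma_buf_begin_cpu_access(dmabuf, DMA_FROM_DEVICE);
          if (ret)
                  goto out_vunmap;

          memcpy(dst, map.vaddr, len); /* assumes !dma_buf_map_is_iomem(&map) */

          ret = dma_buf_end_cpu_access(dmabuf, DMA_FROM_DEVICE);

  out_vunmap:
          dma_buf_vunmap(dmabuf, &map);
          return ret;
  }
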
On Wed, Dec 09, 2020 at 03:25:24PM +0100, Thomas Zimmermann wrote:
> Implementations of the vmap/vunmap GEM callbacks may perform pinning
> of the BO and may acquire the associated reservation object's lock.
> Callers that only require a mapping of the contained memory can thus
> interfere with other tasks that require exact pinning, such as scanout.
> This is less of an issue with private CMA buffers, but may happen
> with imported ones.
>
> Therefore provide the new interface drm_gem_cma_vmap_local(), which only
> performs the vmap operations. Callers have to hold the reservation lock
> while the mapping persists.
>
> This patch also wires up the new helper as the equivalent GEM object
> function, &drm_gem_object_funcs.vmap_local.
>
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
> drivers/gpu/drm/drm_gem_cma_helper.c | 35 ++++++++++++++++++++++++++++
> drivers/gpu/drm/vc4/vc4_bo.c | 13 +++++++++++
> drivers/gpu/drm/vc4/vc4_drv.h | 1 +
> include/drm/drm_gem_cma_helper.h | 1 +
> 4 files changed, 50 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
> index 7942cf05cd93..40b3e8e3fc42 100644
> --- a/drivers/gpu/drm/drm_gem_cma_helper.c
> +++ b/drivers/gpu/drm/drm_gem_cma_helper.c
> @@ -38,6 +38,7 @@ static const struct drm_gem_object_funcs drm_gem_cma_default_funcs = {
> .print_info = drm_gem_cma_print_info,
> .get_sg_table = drm_gem_cma_get_sg_table,
> .vmap = drm_gem_cma_vmap,
> + .vmap_local = drm_gem_cma_vmap_local,
> .mmap = drm_gem_cma_mmap,
> .vm_ops = &drm_gem_cma_vm_ops,
> };
> @@ -471,6 +472,40 @@ int drm_gem_cma_vmap(struct drm_gem_object *obj, struct dma_buf_map *map)
> }
> EXPORT_SYMBOL_GPL(drm_gem_cma_vmap);
>
> +/**
> + * drm_gem_cma_vmap_local - map a CMA GEM object into the kernel's virtual
> + * address space
> + * @obj: GEM object
> + * @map: Returns the kernel virtual address of the CMA GEM object's backing
> + * store.
> + *
> + * This function maps a buffer into the kernel's
> + * virtual address space. Since the CMA buffers are already mapped into the
> + * kernel virtual address space this simply returns the cached virtual
> + * address. Drivers using the CMA helpers should set this as their DRM
> + * driver's &drm_gem_object_funcs.vmap_local callback.
> + *
> + * Returns:
> + * 0 on success, or a negative error code otherwise.
> + */
> +int drm_gem_cma_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map)
> +{
> + struct drm_gem_cma_object *cma_obj = to_drm_gem_cma_obj(obj);
> +
> + /*
> + * TODO: The code in drm_gem_cma_prime_import_sg_table_vmap()
> + * establishes this mapping. The correct solution would
> + * be to call dma_buf_vmap_local() here.
> + *
> + * If we find a case where we absolutely have to call
> + * dma_buf_vmap_local(), the code needs to be restructured.
dma_buf_vmap_local() is only relevant for dynamic importers; pinning at
import time is what you get anyway. That's what Christian meant with his
comments for the ->pin hook. So the TODO here doesn't make sense imo, just
delete it. We're very far away from making cma dynamic :-)
> + */
> + dma_buf_map_set_vaddr(map, cma_obj->vaddr);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_cma_vmap_local);
> +
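As a usage sketch of the locking contract (clear_bo() is a hypothetical
caller; per this patch, the reservation lock must be held for as long as
the mapping is used):

  #include <drm/drm_gem.h>
  #include <drm/drm_gem_cma_helper.h>
  #include <linux/dma-buf-map.h>
  #include <linux/dma-resv.h>
  #include <linux/string.h>

  static int clear_bo(struct drm_gem_object *obj)
  {
          struct dma_buf_map map;
          int ret;

          ret = dma_resv_lock(obj->resv, NULL);
          if (ret)
                  return ret;

          ret = drm_gem_cma_vmap_local(obj, &map);
          if (!ret)
                  memset(map.vaddr, 0, obj->size); /* access only under the lock */

          dma_resv_unlock(obj->resv);
          return ret;
  }
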
> /**
> * drm_gem_cma_mmap - memory-map an exported CMA GEM object
> * @obj: GEM object
> diff --git a/drivers/gpu/drm/vc4/vc4_bo.c b/drivers/gpu/drm/vc4/vc4_bo.c
> index dc316cb79e00..ec57326c69c4 100644
> --- a/drivers/gpu/drm/vc4/vc4_bo.c
> +++ b/drivers/gpu/drm/vc4/vc4_bo.c
> @@ -387,6 +387,7 @@ static const struct drm_gem_object_funcs vc4_gem_object_funcs = {
> .export = vc4_prime_export,
> .get_sg_table = drm_gem_cma_get_sg_table,
> .vmap = vc4_prime_vmap,
> + .vmap_local = vc4_prime_vmap_local,
> .vm_ops = &vc4_vm_ops,
> };
>
> @@ -797,6 +798,18 @@ int vc4_prime_vmap(struct drm_gem_object *obj, struct dma_buf_map *map)
> return drm_gem_cma_vmap(obj, map);
> }
>
> +int vc4_prime_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map)
> +{
> + struct vc4_bo *bo = to_vc4_bo(obj);
> +
> + if (bo->validated_shader) {
This freaks me out. It should be impossible to export a validated shader
as a dma-buf, and indeed the check exists already.
All the wrapper functions here are imo pointless. Either we should remove
them, or replace the if with a BUG_ON here since if that ever happens we
have a security bug already. I'd go with removing, less code. Maybe throw
a patch on top?
Anyway this patch looks good, with the todo deleted:
Reviewed-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
> + DRM_DEBUG("mmaping of shader BOs not allowed.\n");
> + return -EINVAL;
> + }
> +
> + return drm_gem_cma_vmap_local(obj, map);
> +}
> +
> struct drm_gem_object *
> vc4_prime_import_sg_table(struct drm_device *dev,
> struct dma_buf_attachment *attach,
> diff --git a/drivers/gpu/drm/vc4/vc4_drv.h b/drivers/gpu/drm/vc4/vc4_drv.h
> index 43a1af110b3e..efb6c47d318f 100644
> --- a/drivers/gpu/drm/vc4/vc4_drv.h
> +++ b/drivers/gpu/drm/vc4/vc4_drv.h
> @@ -812,6 +812,7 @@ struct drm_gem_object *vc4_prime_import_sg_table(struct drm_device *dev,
> struct dma_buf_attachment *attach,
> struct sg_table *sgt);
> int vc4_prime_vmap(struct drm_gem_object *obj, struct dma_buf_map *map);
> +int vc4_prime_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map);
> int vc4_bo_cache_init(struct drm_device *dev);
> int vc4_bo_inc_usecnt(struct vc4_bo *bo);
> void vc4_bo_dec_usecnt(struct vc4_bo *bo);
> diff --git a/include/drm/drm_gem_cma_helper.h b/include/drm/drm_gem_cma_helper.h
> index 0a9711caa3e8..05122e71bc6d 100644
> --- a/include/drm/drm_gem_cma_helper.h
> +++ b/include/drm/drm_gem_cma_helper.h
> @@ -99,6 +99,7 @@ drm_gem_cma_prime_import_sg_table(struct drm_device *dev,
> struct dma_buf_attachment *attach,
> struct sg_table *sgt);
> int drm_gem_cma_vmap(struct drm_gem_object *obj, struct dma_buf_map *map);
> +int drm_gem_cma_vmap_local(struct drm_gem_object *obj, struct dma_buf_map *map);
> int drm_gem_cma_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma);
>
> /**
> --
> 2.29.2
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch