Am 23.01.24 um 14:02 schrieb Paul Cercueil:
[SNIP]

          
That an exporter has to call extra functions to access his own
buffers
is a complete no-go for the design since this forces exporters
into
doing extra steps for allowing importers to access their data.
Then what about we add these dma_buf_{begin,end}_access(), with
only
implementations for "dumb" exporters e.g. udmabuf or the dmabuf
heaps?
And only importers (who cache the mapping and actually care about
non-
coherency) would have to call these.
No, the problem is still that you would have to change all importers
to 
mandatory use dma_buf_begin/end.

But going a step back caching the mapping is irrelevant for
coherency. 
Even if you don't cache the mapping you don't get coherency.
You actually do - at least with udmabuf, as in that case
dma_buf_map_attachment() / dma_buf_unmap_attachment() will handle cache
coherency when the SGs are mapped/unmapped.

Well I just double checked the source in 6.7.1 and I can't see udmabuf doing anything for cache coherency in map/unmap.

All it does is calling dma_map_sgtable() and dma_unmap_sgtable() to create and destroy the SG table and those are not supposed to sync anything to the CPU cache.

In other words drivers usually use DMA_ATTR_SKIP_CPU_SYNC here, it's just that this is missing from udmabuf.

The problem was then that dma_buf_unmap_attachment cannot be called
before the dma_fence is signaled, and calling it after is already too
late (because the fence would be signaled before the data is sync'd).

Well what sync are you talking about? CPU sync? In DMA-buf that is handled differently.

For importers it's mandatory that they can be coherent with the exporter. That usually means they can snoop the CPU cache if the exporter can snoop the CPU cache.

For exporters you can implement the begin/end CPU access functions which allows you to implement something even if your exporting device can't snoop the CPU cache.

Daniel / Sima suggested then that I cache the mapping and add new
functions to ensure cache coherency, which is what these patches are
about.

Yeah, I've now catched up on the latest mail. Sorry I haven't seen that before.


In other words exporters are not require to call sync_to_cpu or 
sync_to_device when you create a mapping.

What exactly is your use case here? And why does coherency matters?
My use-case is, I create DMABUFs with udmabuf, that I attach to
USB/functionfs with the interface introduced by this patchset. I attach
them to IIO with a similar interface (being upstreamed in parallel),
and transfer data from USB to IIO and vice-versa in a zero-copy
fashion.

This works perfectly fine as long as the USB and IIO hardware are
coherent between themselves, which is the case on most of our boards.
However I do have a board (with a Xilinx Ultrascale SoC) where it is
not the case, and cache flushes/sync are needed. So I was trying to
rework these new interfaces to work on that system too.

Yeah, that sounds strongly like one of the use cases we have rejected so far.

If this really is a no-no, then I am fine with the assumption that
devices sharing a DMABUF must be coherent between themselves; but
that's something that should probably be enforced rather than assumed.

(and I *think* there is a way to force coherency in the Ultrascale's
interconnect - we're investigating it)

What you can do is that instead of using udmabuf or dma-heaps is that the device which can't provide coherency act as exporters of the buffers.

The exporter is allowed to call sync_for_cpu/sync_for_device on it's own buffers and also gets begin/end CPU access notfications. So you can then handle coherency between the exporter and the CPU.

If you really don't have coherency between devices then that would be a really new use case and we would need much more agreement on how to do this.

Regards,
Christian.


Cheers,
-Paul

At the very least, is there a way to check that "the data can be
accessed coherently by the involved devices"? So that my importer
can
EPERM if there is no coherency vs. a device that's already
attached.
Yeah, there is functionality for this in the DMA subsystem. I've once
created prototype patches for enforcing the same coherency approach 
between importer and exporter, but we never got around to upstream
them.



Cheers,
-Paul

That in turn is pretty much un-testable unless you have every
possible
importer around while testing the exporter.

Regards,
Christian.

Regards,
Christian.
Cheers,
-Paul

Signed-off-by: Paul Cercueil <paul@crapouillou.net>

---
v5: New patch
---
    drivers/dma-buf/dma-buf.c | 66
+++++++++++++++++++++++++++++++++++++++
    include/linux/dma-buf.h   | 37 ++++++++++++++++++++++
    2 files changed, 103 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-
buf/dma-
buf.c
index 8fe5aa67b167..a8bab6c18fcd 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -830,6 +830,8 @@ static struct sg_table *
__map_dma_buf(struct
dma_buf_attachment *attach,
     *     - dma_buf_mmap()
     *     - dma_buf_begin_cpu_access()
     *     - dma_buf_end_cpu_access()
+ *     - dma_buf_begin_access()
+ *     - dma_buf_end_access()
     *     - dma_buf_map_attachment_unlocked()
     *     - dma_buf_unmap_attachment_unlocked()
     *     - dma_buf_vmap_unlocked()
@@ -1602,6 +1604,70 @@ void dma_buf_vunmap_unlocked(struct
dma_buf
*dmabuf, struct iosys_map *map)
    }
    EXPORT_SYMBOL_NS_GPL(dma_buf_vunmap_unlocked, DMA_BUF);
    
+/**
+ * @dma_buf_begin_access - Call before any hardware access
from/to
the DMABUF
+ * @attach:	[in]	attachment used for hardware
access
+ * @sg_table:	[in]	scatterlist used for the DMA
transfer
+ * @direction:  [in]    direction of DMA transfer
+ */
+int dma_buf_begin_access(struct dma_buf_attachment
*attach,
+			 struct sg_table *sgt, enum
dma_data_direction dir)
+{
+	struct dma_buf *dmabuf;
+	bool cookie;
+	int ret;
+
+	if (WARN_ON(!attach))
+		return -EINVAL;
+
+	dmabuf = attach->dmabuf;
+
+	if (!dmabuf->ops->begin_access)
+		return 0;
+
+	cookie = dma_fence_begin_signalling();
+	ret = dmabuf->ops->begin_access(attach, sgt, dir);
+	dma_fence_end_signalling(cookie);
+
+	if (WARN_ON_ONCE(ret))
+		return ret;
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_begin_access, DMA_BUF);
+
+/**
+ * @dma_buf_end_access - Call after any hardware access
from/to
the DMABUF
+ * @attach:	[in]	attachment used for hardware
access
+ * @sg_table:	[in]	scatterlist used for the DMA
transfer
+ * @direction:  [in]    direction of DMA transfer
+ */
+int dma_buf_end_access(struct dma_buf_attachment *attach,
+		       struct sg_table *sgt, enum
dma_data_direction dir)
+{
+	struct dma_buf *dmabuf;
+	bool cookie;
+	int ret;
+
+	if (WARN_ON(!attach))
+		return -EINVAL;
+
+	dmabuf = attach->dmabuf;
+
+	if (!dmabuf->ops->end_access)
+		return 0;
+
+	cookie = dma_fence_begin_signalling();
+	ret = dmabuf->ops->end_access(attach, sgt, dir);
+	dma_fence_end_signalling(cookie);
+
+	if (WARN_ON_ONCE(ret))
+		return ret;
+
+	return 0;
+}
+EXPORT_SYMBOL_NS_GPL(dma_buf_end_access, DMA_BUF);
+
    #ifdef CONFIG_DEBUG_FS
    static int dma_buf_debug_show(struct seq_file *s, void
*unused)
    {
diff --git a/include/linux/dma-buf.h b/include/linux/dma-
buf.h
index 8ff4add71f88..8ba612c7cc16 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -246,6 +246,38 @@ struct dma_buf_ops {
    	 */
    	int (*end_cpu_access)(struct dma_buf *, enum
dma_data_direction);
    
+	/**
+	 * @begin_access:
+	 *
+	 * This is called from dma_buf_begin_access() when
a
device driver
+	 * wants to access the data of the DMABUF. The
exporter
can use this
+	 * to flush/sync the caches if needed.
+	 *
+	 * This callback is optional.
+	 *
+	 * Returns:
+	 *
+	 * 0 on success or a negative error code on
failure.
+	 */
+	int (*begin_access)(struct dma_buf_attachment *,
struct
sg_table *,
+			    enum dma_data_direction);
+
+	/**
+	 * @end_access:
+	 *
+	 * This is called from dma_buf_end_access() when a
device
driver is
+	 * done accessing the data of the DMABUF. The
exporter
can
use this
+	 * to flush/sync the caches if needed.
+	 *
+	 * This callback is optional.
+	 *
+	 * Returns:
+	 *
+	 * 0 on success or a negative error code on
failure.
+	 */
+	int (*end_access)(struct dma_buf_attachment *,
struct
sg_table *,
+			  enum dma_data_direction);
+
    	/**
    	 * @mmap:
    	 *
@@ -606,6 +638,11 @@ void dma_buf_detach(struct dma_buf
*dmabuf,
    int dma_buf_pin(struct dma_buf_attachment *attach);
    void dma_buf_unpin(struct dma_buf_attachment *attach);
    
+int dma_buf_begin_access(struct dma_buf_attachment
*attach,
+			 struct sg_table *sgt, enum
dma_data_direction dir);
+int dma_buf_end_access(struct dma_buf_attachment *attach,
+		       struct sg_table *sgt, enum
dma_data_direction dir);
+
    struct dma_buf *dma_buf_export(const struct
dma_buf_export_info
*exp_info);
    
    int dma_buf_fd(struct dma_buf *dmabuf, int flags);