[V4 was: "iio: Add buffer write() support"][1]
Hi Jonathan,
This is a respin of the V3 of my patchset that introduced a new interface based on DMABUF objects [2].
The V4 split the patchset, in an attempt to upstream the buffer write() support first. Since there is no upstream user for it yet, however, it was not merged. This V5 does the opposite: it contains the new DMABUF interface without the buffer write() support, and can already be used with the upstream adi-axi-adc driver.
In user-space, Libiio uses it to transfer back and forth blocks of samples between the hardware and the applications, without having to copy the data.
On a ZCU102 with an FMComms3 daughter board, running Libiio from the pcercuei/dev-new-dmabuf-api branch [3], compiled with WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):

  sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
  Throughput: 116 MiB/s
Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON):

  sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc
  Throughput: 475 MiB/s
This benchmark only measures the speed at which the data can be fetched to iio_rwdev's internal buffers, and does not actually try to read the data (e.g. to pipe it to stdout). It shows that fetching the data is more than 4x faster using the new interface.
When actually reading the data, the performance difference is less impressive (perhaps because, in the DMABUF case, the data is not in the cache):
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio):

  sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress
  2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
WITH_LOCAL_DMABUF_API=ON:

  sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress
  2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
One interesting thing to note is that fileio is (currently) actually faster than the DMABUF interface if you increase the buffer size a lot. My explanation is that the cache invalidation routine takes longer and longer the bigger the DMABUF gets. This is because the DMABUF is backed by small pages, so a 64 MiB DMABUF, for example, is backed by up to 16 thousand pages that have to be invalidated one by one. This could be addressed by using huge pages, but the udmabuf driver does not (yet) support creating DMABUFs backed by huge pages.
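For reference, such a DMABUF is typically created from userspace with the udmabuf driver, which backs it with regular (small) memfd pages - hence the per-page invalidation cost described above. A minimal sketch, with illustrative names and no error handling:

#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Create a DMABUF of 'size' bytes backed by a sealed memfd. */
static int create_udmabuf(size_t size)
{
	struct udmabuf_create create;
	int memfd, devfd, dmabuf_fd;

	memfd = memfd_create("iio-block", MFD_ALLOW_SEALING);
	ftruncate(memfd, size);
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);	/* udmabuf requires this seal */

	devfd = open("/dev/udmabuf", O_RDWR);

	memset(&create, 0, sizeof(create));
	create.memfd = memfd;
	create.offset = 0;
	create.size = size;		/* must be a multiple of the page size */

	/* e.g. a 64 MiB buffer ends up backed by 16384 4-KiB pages */
	dmabuf_fd = ioctl(devfd, UDMABUF_CREATE, &create);

	close(devfd);
	close(memfd);
	return dmabuf_fd;
}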
Anyway, the real benefits happen when the DMABUFs are either shared between IIO devices, or between the IIO subsystem and another filesystem. In that case, the DMABUFs are simply passed around drivers, without the data being copied at any moment.
We use that feature to transfer samples from our transceivers to USB, using a DMABUF interface to FunctionFS [4].
This drastically increases the throughput, to about 274 MiB/s over a USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to the FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load avg.).
Based on linux-next/next-20231219.
Cheers, -Paul
[1] https://lore.kernel.org/all/20230807112113.47157-1-paul@crapouillou.net/
[2] https://lore.kernel.org/all/20230403154800.215924-1-paul@crapouillou.net/
[3] https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api
[4] https://lore.kernel.org/all/20230322092118.9213-1-paul@crapouillou.net/
---
Changelog:
- [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec' struct. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function?
- [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's .device_prep_slave_dma_array().
@Vinod: this patch will cause a small conflict with my other patchset adding scatter-gather support to the axi-dmac driver. This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the prototype of this function changed in my other patchset - it would have to be passed the "chan" variable. I don't know how you prefer it to be resolved. Worst case scenario (and if @Jonathan is okay with that) this one patch can be re-sent later, but it would make this patchset less "atomic".
- [5/8]:
  - Use dev_err() instead of pr_err()
  - Inline to_iio_dma_fence()
  - Add comment to explain why we unref twice when detaching dmabuf
  - Remove TODO comment. It is actually safe to free the file's private data even when transfers are still pending because it won't be accessed.
  - Fix documentation of new fields in struct iio_buffer_access_funcs
  - iio_dma_resv_lock() does not need to be exported, make it static
- [7/8]:
  - Use the new dmaengine_prep_slave_dma_vec().
  - Restrict to input buffers, since output buffers are not yet supported by IIO buffers.
- [8/8]: Use description lists for the documentation of the three new IOCTLs instead of abusing subsections.
---
Alexandru Ardelean (1):
  iio: buffer-dma: split iio_dma_buffer_fileio_free() function
Paul Cercueil (7):
  iio: buffer-dma: Get rid of outgoing queue
  dmaengine: Add API function dmaengine_prep_slave_dma_vec()
  dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
  iio: core: Add new DMABUF interface infrastructure
  iio: buffer-dma: Enable support for DMABUFs
  iio: buffer-dmaengine: Support new DMABUF based userspace API
  Documentation: iio: Document high-speed DMABUF based API
 Documentation/iio/dmabuf_api.rst              |  54 +++
 Documentation/iio/index.rst                   |   2 +
 drivers/dma/dma-axi-dmac.c                    |  40 ++
 drivers/iio/buffer/industrialio-buffer-dma.c  | 242 ++++++++---
 .../buffer/industrialio-buffer-dmaengine.c    |  52 ++-
 drivers/iio/industrialio-buffer.c             | 402 ++++++++++++++++++
 include/linux/dmaengine.h                     |  25 ++
 include/linux/iio/buffer-dma.h                |  33 +-
 include/linux/iio/buffer_impl.h               |  26 ++
 include/uapi/linux/iio/buffer.h               |  22 +
 10 files changed, 836 insertions(+), 62 deletions(-)
 create mode 100644 Documentation/iio/dmabuf_api.rst
The buffer-dma code was using two queues, incoming and outgoing, to manage the state of the blocks in use.
While this totally works, it adds some complexity to the code, especially since the code only manages 2 blocks. It is much easier to just check each block's state manually, and keep a counter for the next block to dequeue.
Since the new DMABUF based API wouldn't use the outgoing queue anyway, getting rid of it now makes the upcoming changes simpler.
With this change, the IIO_BLOCK_STATE_DEQUEUED is now useless, and can be removed.
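For readability, here is the new dequeue path from the diff below, restated as plain code (the same logic this patch introduces):

static struct iio_dma_buffer_block *iio_dma_buffer_dequeue(
	struct iio_dma_buffer_queue *queue)
{
	struct iio_dma_buffer_block *block;
	unsigned int idx;

	spin_lock_irq(&queue->list_lock);

	/* Look at the block the counter points to and check its state,
	 * instead of popping from the old outgoing list. */
	idx = queue->fileio.next_dequeue;
	block = queue->fileio.blocks[idx];

	if (block->state == IIO_BLOCK_STATE_DONE) {
		idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks);
		queue->fileio.next_dequeue = idx;
	} else {
		block = NULL;
	}

	spin_unlock_irq(&queue->list_lock);

	return block;
}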
Signed-off-by: Paul Cercueil paul@crapouillou.net
---
v2: - Only remove the outgoing queue, and keep the incoming queue, as we want the buffer to start streaming data as soon as it is enabled.
    - Remove IIO_BLOCK_STATE_DEQUEUED, since it is now functionally the same as IIO_BLOCK_STATE_DONE.
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 44 ++++++++++----------
 include/linux/iio/buffer-dma.h               |  7 ++--
 2 files changed, 26 insertions(+), 25 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c index d348af8b9705..1fc91467d1aa 100644 --- a/drivers/iio/buffer/industrialio-buffer-dma.c +++ b/drivers/iio/buffer/industrialio-buffer-dma.c @@ -179,7 +179,7 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block( }
block->size = size; - block->state = IIO_BLOCK_STATE_DEQUEUED; + block->state = IIO_BLOCK_STATE_DONE; block->queue = queue; INIT_LIST_HEAD(&block->head); kref_init(&block->kref); @@ -191,16 +191,8 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
static void _iio_dma_buffer_block_done(struct iio_dma_buffer_block *block) { - struct iio_dma_buffer_queue *queue = block->queue; - - /* - * The buffer has already been freed by the application, just drop the - * reference. - */ - if (block->state != IIO_BLOCK_STATE_DEAD) { + if (block->state != IIO_BLOCK_STATE_DEAD) block->state = IIO_BLOCK_STATE_DONE; - list_add_tail(&block->head, &queue->outgoing); - } }
/** @@ -261,7 +253,6 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block) * not support abort and has not given back the block yet. */ switch (block->state) { - case IIO_BLOCK_STATE_DEQUEUED: case IIO_BLOCK_STATE_QUEUED: case IIO_BLOCK_STATE_DONE: return true; @@ -317,7 +308,6 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) * dead. This means we can reset the lists without having to fear * corrution. */ - INIT_LIST_HEAD(&queue->outgoing); spin_unlock_irq(&queue->list_lock);
INIT_LIST_HEAD(&queue->incoming); @@ -456,14 +446,20 @@ static struct iio_dma_buffer_block *iio_dma_buffer_dequeue( struct iio_dma_buffer_queue *queue) { struct iio_dma_buffer_block *block; + unsigned int idx;
spin_lock_irq(&queue->list_lock); - block = list_first_entry_or_null(&queue->outgoing, struct - iio_dma_buffer_block, head); - if (block != NULL) { - list_del(&block->head); - block->state = IIO_BLOCK_STATE_DEQUEUED; + + idx = queue->fileio.next_dequeue; + block = queue->fileio.blocks[idx]; + + if (block->state == IIO_BLOCK_STATE_DONE) { + idx = (idx + 1) % ARRAY_SIZE(queue->fileio.blocks); + queue->fileio.next_dequeue = idx; + } else { + block = NULL; } + spin_unlock_irq(&queue->list_lock);
return block; @@ -539,6 +535,7 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf) struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buf); struct iio_dma_buffer_block *block; size_t data_available = 0; + unsigned int i;
/* * For counting the available bytes we'll use the size of the block not @@ -552,8 +549,15 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf) data_available += queue->fileio.active_block->size;
spin_lock_irq(&queue->list_lock); - list_for_each_entry(block, &queue->outgoing, head) - data_available += block->size; + + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) { + block = queue->fileio.blocks[i]; + + if (block != queue->fileio.active_block + && block->state == IIO_BLOCK_STATE_DONE) + data_available += block->size; + } + spin_unlock_irq(&queue->list_lock); mutex_unlock(&queue->lock);
@@ -617,7 +621,6 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue, queue->ops = ops;
INIT_LIST_HEAD(&queue->incoming); - INIT_LIST_HEAD(&queue->outgoing);
mutex_init(&queue->lock); spin_lock_init(&queue->list_lock); @@ -645,7 +648,6 @@ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue) continue; queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD; } - INIT_LIST_HEAD(&queue->outgoing); spin_unlock_irq(&queue->list_lock);
INIT_LIST_HEAD(&queue->incoming); diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h index 6564bdcdac66..18d3702fa95d 100644 --- a/include/linux/iio/buffer-dma.h +++ b/include/linux/iio/buffer-dma.h @@ -19,14 +19,12 @@ struct device;
/** * enum iio_block_state - State of a struct iio_dma_buffer_block - * @IIO_BLOCK_STATE_DEQUEUED: Block is not queued * @IIO_BLOCK_STATE_QUEUED: Block is on the incoming queue * @IIO_BLOCK_STATE_ACTIVE: Block is currently being processed by the DMA * @IIO_BLOCK_STATE_DONE: Block is on the outgoing queue * @IIO_BLOCK_STATE_DEAD: Block has been marked as to be freed */ enum iio_block_state { - IIO_BLOCK_STATE_DEQUEUED, IIO_BLOCK_STATE_QUEUED, IIO_BLOCK_STATE_ACTIVE, IIO_BLOCK_STATE_DONE, @@ -73,12 +71,15 @@ struct iio_dma_buffer_block { * @active_block: Block being used in read() * @pos: Read offset in the active block * @block_size: Size of each block + * @next_dequeue: index of next block that will be dequeued */ struct iio_dma_buffer_queue_fileio { struct iio_dma_buffer_block *blocks[2]; struct iio_dma_buffer_block *active_block; size_t pos; size_t block_size; + + unsigned int next_dequeue; };
/** @@ -93,7 +94,6 @@ struct iio_dma_buffer_queue_fileio { * list and typically also a list of active blocks in the part that handles * the DMA controller * @incoming: List of buffers on the incoming queue - * @outgoing: List of buffers on the outgoing queue * @active: Whether the buffer is currently active * @fileio: FileIO state */ @@ -105,7 +105,6 @@ struct iio_dma_buffer_queue { struct mutex lock; spinlock_t list_lock; struct list_head incoming; - struct list_head outgoing;
bool active;
On Tue, 19 Dec 2023 18:50:02 +0100 Paul Cercueil paul@crapouillou.net wrote:
The buffer-dma code was using two queues, incoming and outgoing, to manage the state of the blocks in use.
While this totally works, it adds some complexity to the code, especially since the code only manages 2 blocks. It is much easier to just check each block's state manually, and keep a counter for the next block to dequeue.
Since the new DMABUF based API wouldn't use the outgoing queue anyway, getting rid of it now makes the upcoming changes simpler.
With this change, the IIO_BLOCK_STATE_DEQUEUED is now useless, and can be removed.
Signed-off-by: Paul Cercueil paul@crapouillou.net
I've applied this in the interest of reducing the outstanding set of patches, and because it stands fine on its own.
Applied to the togreg branch of iio.git and pushed out as testing. Note this is now almost certainly 6.9 material given timing.
Jonathan
From: Alexandru Ardelean alexandru.ardelean@analog.com
This change splits the logic into a separate function, which will be re-used later.
Signed-off-by: Alexandru Ardelean alexandru.ardelean@analog.com
Cc: Alexandru Ardelean ardeleanalex@gmail.com
Signed-off-by: Paul Cercueil paul@crapouillou.net
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 43 +++++++++++---------
 1 file changed, 24 insertions(+), 19 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c index 1fc91467d1aa..5610ba67925e 100644 --- a/drivers/iio/buffer/industrialio-buffer-dma.c +++ b/drivers/iio/buffer/industrialio-buffer-dma.c @@ -346,6 +346,29 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) } EXPORT_SYMBOL_GPL(iio_dma_buffer_request_update);
+static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue) +{ + unsigned int i; + + spin_lock_irq(&queue->list_lock); + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) { + if (!queue->fileio.blocks[i]) + continue; + queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD; + } + spin_unlock_irq(&queue->list_lock); + + INIT_LIST_HEAD(&queue->incoming); + + for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) { + if (!queue->fileio.blocks[i]) + continue; + iio_buffer_block_put(queue->fileio.blocks[i]); + queue->fileio.blocks[i] = NULL; + } + queue->fileio.active_block = NULL; +} + static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue, struct iio_dma_buffer_block *block) { @@ -638,27 +661,9 @@ EXPORT_SYMBOL_GPL(iio_dma_buffer_init); */ void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue) { - unsigned int i; - mutex_lock(&queue->lock);
- spin_lock_irq(&queue->list_lock); - for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) { - if (!queue->fileio.blocks[i]) - continue; - queue->fileio.blocks[i]->state = IIO_BLOCK_STATE_DEAD; - } - spin_unlock_irq(&queue->list_lock); - - INIT_LIST_HEAD(&queue->incoming); - - for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) { - if (!queue->fileio.blocks[i]) - continue; - iio_buffer_block_put(queue->fileio.blocks[i]); - queue->fileio.blocks[i] = NULL; - } - queue->fileio.active_block = NULL; + iio_dma_buffer_fileio_free(queue); queue->ops = NULL;
mutex_unlock(&queue->lock);
On Tue, 19 Dec 2023 18:50:03 +0100 Paul Cercueil paul@crapouillou.net wrote:
From: Alexandru Ardelean alexandru.ardelean@analog.com
This change splits the logic into a separate function, which will be re-used later.
Signed-off-by: Alexandru Ardelean alexandru.ardelean@analog.com
Cc: Alexandru Ardelean ardeleanalex@gmail.com
Signed-off-by: Paul Cercueil paul@crapouillou.net
Harmless refactor so I'm fine taking this now to reduce noise on any v6
Applied
Jonathan
This function can be used to initiate a scatter-gather DMA transfer, where the address and size of each segment is located in one entry of the dma_vec array.
The major difference from dmaengine_prep_slave_sg() is that it allows the length of each segment to be specified; trying to override the transfer length with dmaengine_prep_slave_sg() is a very tedious process. The introduction of a new API function is also justified by the fact that scatterlists are on their way out.
Note that dmaengine_prep_interleaved_dma() is not helpful either in this case, as it assumes that the address of each segment will be higher than that of the previous segment, which we simply cannot guarantee for a scatter-gather transfer.
Signed-off-by: Paul Cercueil paul@crapouillou.net
--- v3: New patch
v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct 'dma_vec'. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function?
---
 include/linux/dmaengine.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 3df70d6131c8..ee5931ddb42f 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -160,6 +160,16 @@ struct dma_interleaved_template { struct data_chunk sgl[]; };
+/** + * struct dma_vec - DMA vector + * @addr: Bus address of the start of the vector + * @len: Length in bytes of the DMA vector + */ +struct dma_vec { + dma_addr_t addr; + size_t len; +}; + /** * enum dma_ctrl_flags - DMA flags to augment operation preparation, * control completion, and communicate status. @@ -910,6 +920,10 @@ struct dma_device { struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)( struct dma_chan *chan, unsigned long flags);
+ struct dma_async_tx_descriptor *(*device_prep_slave_dma_vec)( + struct dma_chan *chan, const struct dma_vec *vecs, + size_t nents, enum dma_transfer_direction direction, + unsigned long flags); struct dma_async_tx_descriptor *(*device_prep_slave_sg)( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction, @@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single( dir, flags, NULL); }
+static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec( + struct dma_chan *chan, const struct dma_vec *vecs, size_t nents, + enum dma_transfer_direction dir, unsigned long flags) +{ + if (!chan || !chan->device || !chan->device->device_prep_slave_dma_vec) + return NULL; + + return chan->device->device_prep_slave_dma_vec(chan, vecs, nents, + dir, flags); +} + static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction dir, unsigned long flags)
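For context, a rough sketch of how a client driver could use the new helper; the channel setup, addresses, direction and flags below are illustrative and not taken from this series:

#include <linux/dmaengine.h>

static int submit_two_segments(struct dma_chan *chan, dma_addr_t addr0,
			       dma_addr_t addr1, size_t len)
{
	struct dma_async_tx_descriptor *desc;
	struct dma_vec vecs[2];
	dma_cookie_t cookie;

	/* One entry per contiguous segment; each has its own length, and the
	 * addresses do not need to be in increasing order. */
	vecs[0].addr = addr0;
	vecs[0].len = len;
	vecs[1].addr = addr1;
	vecs[1].len = len;

	desc = dmaengine_prep_slave_dma_vec(chan, vecs, 2, DMA_DEV_TO_MEM,
					    DMA_PREP_INTERRUPT);
	if (!desc)
		return -ENOMEM;

	cookie = dmaengine_submit(desc);
	if (dma_submit_error(cookie))
		return -EIO;

	dma_async_issue_pending(chan);
	return 0;
}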
On Tue, 19 Dec 2023 18:50:04 +0100 Paul Cercueil paul@crapouillou.net wrote:
This function can be used to initiate a scatter-gather DMA transfer, where the address and size of each segment is located in one entry of the dma_vec array.
The major difference with dmaengine_prep_slave_sg() is that it supports specifying the lengths of each DMA transfer; as trying to override the length of the transfer with dmaengine_prep_slave_sg() is a very tedious process. The introduction of a new API function is also justified by the fact that scatterlists are on their way out.
Note that dmaengine_prep_interleaved_dma() is not helpful either in that case, as it assumes that the address of each segment will be higher than the one of the previous segment, which we just cannot guarantee in case of a scatter-gather transfer.
Signed-off-by: Paul Cercueil paul@crapouillou.net
This and the next patch look fine to me, as they clearly simplify things for our use cases, but they are really something for the dmaengine maintainers to comment on.
Jonathan
On 19-12-23, 18:50, Paul Cercueil wrote:
This function can be used to initiate a scatter-gather DMA transfer, where the address and size of each segment is located in one entry of the dma_vec array.
The major difference with dmaengine_prep_slave_sg() is that it supports specifying the lengths of each DMA transfer; as trying to override the length of the transfer with dmaengine_prep_slave_sg() is a very tedious process. The introduction of a new API function is also justified by the fact that scatterlists are on their way out.
Note that dmaengine_prep_interleaved_dma() is not helpful either in that case, as it assumes that the address of each segment will be higher than the one of the previous segment, which we just cannot guarantee in case of a scatter-gather transfer.
Signed-off-by: Paul Cercueil paul@crapouillou.net
v3: New patch
v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct 'dma_vec'. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function?
include/linux/dmaengine.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 3df70d6131c8..ee5931ddb42f 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -160,6 +160,16 @@ struct dma_interleaved_template { struct data_chunk sgl[]; }; +/**
+ * struct dma_vec - DMA vector
+ * @addr: Bus address of the start of the vector
+ * @len: Length in bytes of the DMA vector
+ */
+struct dma_vec {
+	dma_addr_t addr;
+	size_t len;
+};
So you want to transfer multiple buffers, right? Why not use dmaengine_prep_slave_sg()? Is there a reason for not using that one?
Furthermore, I missed replying to your earlier email on the use of dmaengine_prep_interleaved_dma(); my apologies. That can be made to work for you as well. Please see the notes where icg can be ignored - it does not need the icg value to be set.
In fact, the interleaved API can be made to work in most of the cases I can think of...
/**
- enum dma_ctrl_flags - DMA flags to augment operation preparation,
- control completion, and communicate status.
@@ -910,6 +920,10 @@ struct dma_device { struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)( struct dma_chan *chan, unsigned long flags);
- struct dma_async_tx_descriptor *(*device_prep_slave_dma_vec)(
struct dma_chan *chan, const struct dma_vec *vecs,
size_t nents, enum dma_transfer_direction direction,
struct dma_async_tx_descriptor *(*device_prep_slave_sg)( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction,unsigned long flags);
@@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single( dir, flags, NULL); } +static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec(
- struct dma_chan *chan, const struct dma_vec *vecs, size_t nents,
- enum dma_transfer_direction dir, unsigned long flags)
+{
- if (!chan || !chan->device || !chan->device->device_prep_slave_dma_vec)
return NULL;
- return chan->device->device_prep_slave_dma_vec(chan, vecs, nents,
dir, flags);
+}
static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction dir, unsigned long flags) -- 2.43.0
Hi Vinod,
Le jeudi 21 décembre 2023 à 20:44 +0530, Vinod Koul a écrit :
On 19-12-23, 18:50, Paul Cercueil wrote:
This function can be used to initiate a scatter-gather DMA transfer, where the address and size of each segment is located in one entry of the dma_vec array.
The major difference with dmaengine_prep_slave_sg() is that it supports specifying the lengths of each DMA transfer; as trying to override the length of the transfer with dmaengine_prep_slave_sg() is a very tedious process. The introduction of a new API function is also justified by the fact that scatterlists are on their way out.
Note that dmaengine_prep_interleaved_dma() is not helpful either in that case, as it assumes that the address of each segment will be higher than the one of the previous segment, which we just cannot guarantee in case of a scatter-gather transfer.
Signed-off-by: Paul Cercueil paul@crapouillou.net
v3: New patch
v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct 'dma_vec'. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function?
include/linux/dmaengine.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 3df70d6131c8..ee5931ddb42f 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -160,6 +160,16 @@ struct dma_interleaved_template { struct data_chunk sgl[]; }; +/**
+ * struct dma_vec - DMA vector
+ * @addr: Bus address of the start of the vector
+ * @len: Length in bytes of the DMA vector
+ */
+struct dma_vec {
+	dma_addr_t addr;
+	size_t len;
+};
so you want to transfer multiple buffers, right? why not use dmaengine_prep_slave_sg(). If there is reason for not using that one?
Well, I think I answered that in the commit message, didn't I?
Furthermore I missed replying to your email earlier on use of dmaengine_prep_interleaved_dma(), my apologies. That can be made to work for you as well. Please see the notes where icg can be ignored and it does not need icg value to be set
Infact, interleaved api can be made to work in most of these cases I can think of...
So if I want to transfer 16 bytes from 0x10, then 16 bytes from 0x0, then 16 bytes from 0x20, how should I configure the dma_interleaved_template?
Cheers, -Paul
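For illustration, the transfer described above maps naturally onto the proposed dma_vec API (the addresses are the ones from the question, purely illustrative, and 'chan' is assumed to be an already-requested channel):

#include <linux/dmaengine.h>

static struct dma_async_tx_descriptor *prep_example(struct dma_chan *chan)
{
	static const struct dma_vec vecs[] = {
		{ .addr = 0x10, .len = 16 },	/* 16 bytes from 0x10 */
		{ .addr = 0x00, .len = 16 },	/* then 16 bytes from 0x0 */
		{ .addr = 0x20, .len = 16 },	/* then 16 bytes from 0x20 */
	};

	return dmaengine_prep_slave_dma_vec(chan, vecs, 3, DMA_MEM_TO_DEV, 0);
}

With dma_interleaved_template, on the other hand, each data_chunk either keeps the address fixed or moves it forward (by its size plus the icg), so the second segment, which sits at a lower address than the first, cannot be expressed.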
/** * enum dma_ctrl_flags - DMA flags to augment operation preparation, * control completion, and communicate status. @@ -910,6 +920,10 @@ struct dma_device { struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)( struct dma_chan *chan, unsigned long flags);
- struct dma_async_tx_descriptor
*(*device_prep_slave_dma_vec)(
struct dma_chan *chan, const struct dma_vec *vecs,
size_t nents, enum dma_transfer_direction
direction,
unsigned long flags);
struct dma_async_tx_descriptor *(*device_prep_slave_sg)( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction, @@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single( dir, flags, NULL); } +static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec(
- struct dma_chan *chan, const struct dma_vec *vecs, size_t
nents,
- enum dma_transfer_direction dir, unsigned long flags)
+{
- if (!chan || !chan->device || !chan->device-
device_prep_slave_dma_vec)
return NULL;
- return chan->device->device_prep_slave_dma_vec(chan, vecs,
nents,
dir,
flags); +}
static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction dir, unsigned long flags) -- 2.43.0
Hi Vinod,
Le jeudi 21 décembre 2023 à 20:44 +0530, Vinod Koul a écrit :
On 19-12-23, 18:50, Paul Cercueil wrote:
This function can be used to initiate a scatter-gather DMA transfer, where the address and size of each segment is located in one entry of the dma_vec array.
The major difference with dmaengine_prep_slave_sg() is that it supports specifying the lengths of each DMA transfer; as trying to override the length of the transfer with dmaengine_prep_slave_sg() is a very tedious process. The introduction of a new API function is also justified by the fact that scatterlists are on their way out.
Note that dmaengine_prep_interleaved_dma() is not helpful either in that case, as it assumes that the address of each segment will be higher than the one of the previous segment, which we just cannot guarantee in case of a scatter-gather transfer.
Signed-off-by: Paul Cercueil paul@crapouillou.net
v3: New patch
v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct 'dma_vec'. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function?
include/linux/dmaengine.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 3df70d6131c8..ee5931ddb42f 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -160,6 +160,16 @@ struct dma_interleaved_template { struct data_chunk sgl[]; }; +/**
+ * struct dma_vec - DMA vector
+ * @addr: Bus address of the start of the vector
+ * @len: Length in bytes of the DMA vector
+ */
+struct dma_vec {
+	dma_addr_t addr;
+	size_t len;
+};
I don't want to be pushy, but I'd like to know how to solve this now; otherwise I'll just send the same patches for my v6.
so you want to transfer multiple buffers, right? why not use dmaengine_prep_slave_sg(). If there is reason for not using that one?
The reason is that we want the possibility of transferring less than the total size of the scatterlist, and that is currently very hard to do - scatterlists were designed not to be tampered with.
Christian König then suggested to introduce a "dma_vec" which had been on his TODO list for a while now.
Furthermore I missed replying to your email earlier on use of dmaengine_prep_interleaved_dma(), my apologies. That can be made to work for you as well. Please see the notes where icg can be ignored and it does not need icg value to be set
Infact, interleaved api can be made to work in most of these cases I can think of...
The interleaved API only supports incrementing addresses; I see no way to decrement the address (short of crude hacks, e.g. overflowing size_t), and I can't guarantee that my DMABUF's pages are ordered in memory.
Cheers, -Paul
/** * enum dma_ctrl_flags - DMA flags to augment operation preparation, * control completion, and communicate status. @@ -910,6 +920,10 @@ struct dma_device { struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)( struct dma_chan *chan, unsigned long flags);
- struct dma_async_tx_descriptor
*(*device_prep_slave_dma_vec)(
struct dma_chan *chan, const struct dma_vec *vecs,
size_t nents, enum dma_transfer_direction
direction,
unsigned long flags);
struct dma_async_tx_descriptor *(*device_prep_slave_sg)( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction, @@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single( dir, flags, NULL); } +static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec(
- struct dma_chan *chan, const struct dma_vec *vecs, size_t
nents,
- enum dma_transfer_direction dir, unsigned long flags)
+{
- if (!chan || !chan->device || !chan->device-
device_prep_slave_dma_vec)
return NULL;
- return chan->device->device_prep_slave_dma_vec(chan, vecs,
nents,
dir,
flags); +}
static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction dir, unsigned long flags) -- 2.43.0
Hi Paul,
On 08-01-24, 13:20, Paul Cercueil wrote:
Hi Vinod,
Le jeudi 21 décembre 2023 à 20:44 +0530, Vinod Koul a écrit :
On 19-12-23, 18:50, Paul Cercueil wrote:
This function can be used to initiate a scatter-gather DMA transfer, where the address and size of each segment is located in one entry of the dma_vec array.
The major difference with dmaengine_prep_slave_sg() is that it supports specifying the lengths of each DMA transfer; as trying to override the length of the transfer with dmaengine_prep_slave_sg() is a very tedious process. The introduction of a new API function is also justified by the fact that scatterlists are on their way out.
Note that dmaengine_prep_interleaved_dma() is not helpful either in that case, as it assumes that the address of each segment will be higher than the one of the previous segment, which we just cannot guarantee in case of a scatter-gather transfer.
Signed-off-by: Paul Cercueil paul@crapouillou.net
v3: New patch
v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct 'dma_vec'. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function?
include/linux/dmaengine.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+)
diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 3df70d6131c8..ee5931ddb42f 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -160,6 +160,16 @@ struct dma_interleaved_template { struct data_chunk sgl[]; }; +/**
+ * struct dma_vec - DMA vector
+ * @addr: Bus address of the start of the vector
+ * @len: Length in bytes of the DMA vector
+ */
+struct dma_vec {
+	dma_addr_t addr;
+	size_t len;
+};
I don't want to be pushy, but I'd like to know how to solve this now, otherwise I'll just send the same patches for my v6.
so you want to transfer multiple buffers, right? why not use dmaengine_prep_slave_sg(). If there is reason for not using that one?
The reason is that we want to have the possibility to transfer less than the total size of the scatterlist, and that's currently very hard to do - scatterlists were designed to not be tampered with.
Christian König then suggested to introduce a "dma_vec" which had been on his TODO list for a while now.
Yeah, for this interleaved seems overkill. Let's go with this API. I would suggest changing the name of the API, replacing "slave" with "peripheral", though.
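For clarity, the suggested rename would make the prototype look roughly like this (a hypothetical final form, to be confirmed in the next revision, assuming the device callback is renamed accordingly):

static inline struct dma_async_tx_descriptor *dmaengine_prep_peripheral_dma_vec(
	struct dma_chan *chan, const struct dma_vec *vecs, size_t nents,
	enum dma_transfer_direction dir, unsigned long flags)
{
	/* Same semantics as dmaengine_prep_slave_dma_vec() above */
	if (!chan || !chan->device || !chan->device->device_prep_peripheral_dma_vec)
		return NULL;

	return chan->device->device_prep_peripheral_dma_vec(chan, vecs, nents,
							    dir, flags);
}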
Add implementation of the .device_prep_slave_dma_vec() callback.
Signed-off-by: Paul Cercueil paul@crapouillou.net
--- v3: New patch
v5: Implement .device_prep_slave_dma_vec() instead of v3's .device_prep_slave_dma_array().
---
 drivers/dma/dma-axi-dmac.c | 40 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)
diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c index 2457a420c13d..363808088cc5 100644 --- a/drivers/dma/dma-axi-dmac.c +++ b/drivers/dma/dma-axi-dmac.c @@ -535,6 +535,45 @@ static struct axi_dmac_sg *axi_dmac_fill_linear_sg(struct axi_dmac_chan *chan, return sg; }
+static struct dma_async_tx_descriptor * +axi_dmac_prep_slave_dma_vec(struct dma_chan *c, const struct dma_vec *vecs, + size_t nb, enum dma_transfer_direction direction, + unsigned long flags) +{ + struct axi_dmac_chan *chan = to_axi_dmac_chan(c); + struct axi_dmac_desc *desc; + unsigned int num_sgs = 0; + struct axi_dmac_sg *dsg; + size_t i; + + if (direction != chan->direction) + return NULL; + + for (i = 0; i < nb; i++) + num_sgs += DIV_ROUND_UP(vecs[i].len, chan->max_length); + + desc = axi_dmac_alloc_desc(num_sgs); + if (!desc) + return NULL; + + dsg = desc->sg; + + for (i = 0; i < nb; i++) { + if (!axi_dmac_check_addr(chan, vecs[i].addr) || + !axi_dmac_check_len(chan, vecs[i].len)) { + kfree(desc); + return NULL; + } + + dsg = axi_dmac_fill_linear_sg(chan, direction, vecs[i].addr, 1, + vecs[i].len, dsg); + } + + desc->cyclic = false; + + return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags); +} + static struct dma_async_tx_descriptor *axi_dmac_prep_slave_sg( struct dma_chan *c, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction, @@ -957,6 +996,7 @@ static int axi_dmac_probe(struct platform_device *pdev) dma_dev->device_tx_status = dma_cookie_status; dma_dev->device_issue_pending = axi_dmac_issue_pending; dma_dev->device_prep_slave_sg = axi_dmac_prep_slave_sg; + dma_dev->device_prep_slave_dma_vec = axi_dmac_prep_slave_dma_vec; dma_dev->device_prep_dma_cyclic = axi_dmac_prep_dma_cyclic; dma_dev->device_prep_interleaved_dma = axi_dmac_prep_interleaved; dma_dev->device_terminate_all = axi_dmac_terminate_all;
Add the necessary infrastructure to the IIO core to support a new optional DMABUF based interface.
With this new interface, DMABUF objects (externally created) can be attached to an IIO buffer, and subsequently used for data transfer.
A userspace application can then use this interface to share DMABUF objects between several interfaces, allowing it to transfer data in a zero-copy fashion, for instance between IIO and the USB stack.
The userspace application can also memory-map the DMABUF objects, and access the sample data directly. The advantage of doing this vs. the read() interface is that it avoids an extra copy of the data between the kernel and userspace. This is particularly useful for high-speed devices which produce several megabytes or even gigabytes of data per second.
As part of the interface, 3 new IOCTLs have been added:
IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd): Attach the DMABUF object identified by the given file descriptor to the buffer.
IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd): Detach the DMABUF object identified by the given file descriptor from the buffer. Note that closing the IIO buffer's file descriptor will automatically detach all previously attached DMABUF objects.
IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *): Request a data transfer to/from the given DMABUF object. Its file descriptor, as well as the transfer size and flags are provided in the "iio_dmabuf" structure.
These three IOCTLs have to be performed on the IIO buffer's file descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
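A minimal userspace sketch of the flow, using the uapi names introduced by this patch (the DMABUF fd is assumed to come from an external exporter such as udmabuf; error handling trimmed):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/iio/buffer.h>

int attach_and_enqueue(int iio_dev_fd, int dmabuf_fd, uint64_t bytes_used)
{
	struct iio_dmabuf req = { 0 };
	int buf_fd = 0;		/* in: index of the IIO buffer; out: buffer FD */
	int ret;

	/* The DMABUF ioctls operate on the per-buffer FD */
	ret = ioctl(iio_dev_fd, IIO_BUFFER_GET_FD_IOCTL, &buf_fd);
	if (ret < 0)
		return ret;

	/* Attach the externally created DMABUF to this IIO buffer */
	ret = ioctl(buf_fd, IIO_BUFFER_DMABUF_ATTACH_IOCTL, &dmabuf_fd);
	if (ret < 0)
		return ret;

	/* Queue one transfer to/from the DMABUF */
	req.fd = dmabuf_fd;
	req.flags = 0;
	req.bytes_used = bytes_used;

	return ioctl(buf_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &req);
}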
Signed-off-by: Paul Cercueil paul@crapouillou.net
--- v2: Only allow the new IOCTLs on the buffer FD created with IIO_BUFFER_GET_FD_IOCTL().
v3: - Get rid of the old IOCTLs. The IIO subsystem does not create or manage DMABUFs anymore, and only attaches/detaches externally created DMABUFs.
    - Add IIO_BUFFER_DMABUF_CYCLIC to the supported flags.
v5: - Use dev_err() instead of pr_err()
    - Inline to_iio_dma_fence()
    - Add comment to explain why we unref twice when detaching dmabuf
    - Remove TODO comment. It is actually safe to free the file's private data even when transfers are still pending because it won't be accessed.
    - Fix documentation of new fields in struct iio_buffer_access_funcs
    - iio_dma_resv_lock() does not need to be exported, make it static
---
 drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++++++++++++++
 include/linux/iio/buffer_impl.h   |  26 ++
 include/uapi/linux/iio/buffer.h   |  22 ++
 3 files changed, 450 insertions(+)
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c index 09c41e9ccf87..24c040e073a7 100644 --- a/drivers/iio/industrialio-buffer.c +++ b/drivers/iio/industrialio-buffer.c @@ -13,10 +13,14 @@ #include <linux/kernel.h> #include <linux/export.h> #include <linux/device.h> +#include <linux/dma-buf.h> +#include <linux/dma-fence.h> +#include <linux/dma-resv.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/cdev.h> #include <linux/slab.h> +#include <linux/mm.h> #include <linux/poll.h> #include <linux/sched/signal.h>
@@ -28,6 +32,31 @@ #include <linux/iio/buffer.h> #include <linux/iio/buffer_impl.h>
+#define DMABUF_ENQUEUE_TIMEOUT_MS 5000 + +struct iio_dma_fence; + +struct iio_dmabuf_priv { + struct list_head entry; + struct kref ref; + + struct iio_buffer *buffer; + struct iio_dma_buffer_block *block; + + u64 context; + spinlock_t lock; + + struct dma_buf_attachment *attach; + struct iio_dma_fence *fence; +}; + +struct iio_dma_fence { + struct dma_fence base; + struct iio_dmabuf_priv *priv; + struct sg_table *sgt; + enum dma_data_direction dir; +}; + static const char * const iio_endian_prefix[] = { [IIO_BE] = "be", [IIO_LE] = "le", @@ -332,6 +361,7 @@ void iio_buffer_init(struct iio_buffer *buffer) { INIT_LIST_HEAD(&buffer->demux_list); INIT_LIST_HEAD(&buffer->buffer_list); + INIT_LIST_HEAD(&buffer->dmabufs); init_waitqueue_head(&buffer->pollq); kref_init(&buffer->ref); if (!buffer->watermark) @@ -1519,14 +1549,54 @@ static void iio_buffer_unregister_legacy_sysfs_groups(struct iio_dev *indio_dev) kfree(iio_dev_opaque->legacy_scan_el_group.attrs); }
+static void iio_buffer_dmabuf_release(struct kref *ref) +{ + struct iio_dmabuf_priv *priv = container_of(ref, struct iio_dmabuf_priv, ref); + struct dma_buf_attachment *attach = priv->attach; + struct iio_buffer *buffer = priv->buffer; + struct dma_buf *dmabuf = attach->dmabuf; + + buffer->access->detach_dmabuf(buffer, priv->block); + + dma_buf_detach(attach->dmabuf, attach); + dma_buf_put(dmabuf); + kfree(priv); +} + +void iio_buffer_dmabuf_get(struct dma_buf_attachment *attach) +{ + struct iio_dmabuf_priv *priv = attach->importer_priv; + + kref_get(&priv->ref); +} +EXPORT_SYMBOL_GPL(iio_buffer_dmabuf_get); + +void iio_buffer_dmabuf_put(struct dma_buf_attachment *attach) +{ + struct iio_dmabuf_priv *priv = attach->importer_priv; + + kref_put(&priv->ref, iio_buffer_dmabuf_release); +} +EXPORT_SYMBOL_GPL(iio_buffer_dmabuf_put); + static int iio_buffer_chrdev_release(struct inode *inode, struct file *filep) { struct iio_dev_buffer_pair *ib = filep->private_data; struct iio_dev *indio_dev = ib->indio_dev; struct iio_buffer *buffer = ib->buffer; + struct iio_dmabuf_priv *priv, *tmp;
wake_up(&buffer->pollq);
+ /* Close all attached DMABUFs */ + list_for_each_entry_safe(priv, tmp, &buffer->dmabufs, entry) { + list_del_init(&priv->entry); + iio_buffer_dmabuf_put(priv->attach); + } + + if (!list_empty(&buffer->dmabufs)) + dev_warn(&indio_dev->dev, "Buffer FD closed with active transfers\n"); + kfree(ib); clear_bit(IIO_BUSY_BIT_POS, &buffer->flags); iio_device_put(indio_dev); @@ -1534,11 +1604,343 @@ static int iio_buffer_chrdev_release(struct inode *inode, struct file *filep) return 0; }
+static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock) +{ + int ret; + + ret = dma_resv_lock_interruptible(dmabuf->resv, NULL); + if (ret) { + if (ret != -EDEADLK) + goto out; + if (nonblock) { + ret = -EBUSY; + goto out; + } + + ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL); + } + +out: + return ret; +} + +static struct dma_buf_attachment * +iio_buffer_find_attachment(struct iio_dev *indio_dev, struct dma_buf *dmabuf) +{ + struct dma_buf_attachment *elm, *attach = NULL; + int ret; + + ret = iio_dma_resv_lock(dmabuf, false); + if (ret) + return ERR_PTR(ret); + + list_for_each_entry(elm, &dmabuf->attachments, node) { + if (elm->dev == indio_dev->dev.parent) { + attach = elm; + break; + } + } + + if (attach) + iio_buffer_dmabuf_get(elm); + + dma_resv_unlock(dmabuf->resv); + + return attach ?: ERR_PTR(-EPERM); +} + +static int iio_buffer_attach_dmabuf(struct iio_dev_buffer_pair *ib, + int __user *user_fd) +{ + struct iio_dev *indio_dev = ib->indio_dev; + struct iio_buffer *buffer = ib->buffer; + struct dma_buf_attachment *attach; + struct iio_dmabuf_priv *priv; + struct dma_buf *dmabuf; + int err, fd; + + if (!buffer->access->attach_dmabuf + || !buffer->access->detach_dmabuf + || !buffer->access->enqueue_dmabuf) + return -EPERM; + + if (copy_from_user(&fd, user_fd, sizeof(fd))) + return -EFAULT; + + priv = kzalloc(sizeof(*priv), GFP_KERNEL); + if (!priv) + return -ENOMEM; + + spin_lock_init(&priv->lock); + priv->context = dma_fence_context_alloc(1); + + dmabuf = dma_buf_get(fd); + if (IS_ERR(dmabuf)) { + err = PTR_ERR(dmabuf); + goto err_free_priv; + } + + attach = dma_buf_attach(dmabuf, indio_dev->dev.parent); + if (IS_ERR(attach)) { + err = PTR_ERR(attach); + goto err_dmabuf_put; + } + + kref_init(&priv->ref); + priv->buffer = buffer; + priv->attach = attach; + attach->importer_priv = priv; + + priv->block = buffer->access->attach_dmabuf(buffer, attach); + if (IS_ERR(priv->block)) { + err = PTR_ERR(priv->block); + goto err_dmabuf_detach; + } + + list_add(&priv->entry, &buffer->dmabufs); + + return 0; + +err_dmabuf_detach: + dma_buf_detach(dmabuf, attach); +err_dmabuf_put: + dma_buf_put(dmabuf); +err_free_priv: + kfree(priv); + + return err; +} + +static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req) +{ + struct dma_buf_attachment *attach; + struct iio_dmabuf_priv *priv; + struct dma_buf *dmabuf; + int dmabuf_fd, ret = 0; + + if (copy_from_user(&dmabuf_fd, user_req, sizeof(dmabuf_fd))) + return -EFAULT; + + dmabuf = dma_buf_get(dmabuf_fd); + if (IS_ERR(dmabuf)) + return PTR_ERR(dmabuf); + + attach = iio_buffer_find_attachment(ib->indio_dev, dmabuf); + if (IS_ERR(attach)) { + ret = PTR_ERR(attach); + goto out_dmabuf_put; + } + + priv = attach->importer_priv; + list_del_init(&priv->entry); + + /* + * Unref twice to release the reference obtained with + * iio_buffer_find_attachment() above, and the one obtained in + * iio_buffer_attach_dmabuf(). 
+ */ + iio_buffer_dmabuf_put(attach); + iio_buffer_dmabuf_put(attach); + +out_dmabuf_put: + dma_buf_put(dmabuf); + + return ret; +} + +static const char * +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence) +{ + return "iio"; +} + +static void iio_buffer_dma_fence_release(struct dma_fence *fence) +{ + struct iio_dma_fence *iio_fence = + container_of(fence, struct iio_dma_fence, base); + + kfree(iio_fence); +} + +static const struct dma_fence_ops iio_buffer_dma_fence_ops = { + .get_driver_name = iio_buffer_dma_fence_get_driver_name, + .get_timeline_name = iio_buffer_dma_fence_get_driver_name, + .release = iio_buffer_dma_fence_release, +}; + +static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair *ib, + struct iio_dmabuf __user *iio_dmabuf_req, + bool nonblock) +{ + struct iio_dev *indio_dev = ib->indio_dev; + struct iio_buffer *buffer = ib->buffer; + struct iio_dmabuf iio_dmabuf; + struct dma_buf_attachment *attach; + struct iio_dmabuf_priv *priv; + enum dma_data_direction dir; + struct iio_dma_fence *fence; + struct dma_buf *dmabuf; + struct sg_table *sgt; + unsigned long timeout; + bool dma_to_ram; + bool cyclic; + int ret; + + if (copy_from_user(&iio_dmabuf, iio_dmabuf_req, sizeof(iio_dmabuf))) + return -EFAULT; + + if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS) + return -EINVAL; + + cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC; + + /* Cyclic flag is only supported on output buffers */ + if (cyclic && buffer->direction != IIO_BUFFER_DIRECTION_OUT) + return -EINVAL; + + dmabuf = dma_buf_get(iio_dmabuf.fd); + if (IS_ERR(dmabuf)) + return PTR_ERR(dmabuf); + + if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used > dmabuf->size) { + ret = -EINVAL; + goto err_dmabuf_put; + } + + attach = iio_buffer_find_attachment(indio_dev, dmabuf); + if (IS_ERR(attach)) { + ret = PTR_ERR(attach); + goto err_dmabuf_put; + } + + priv = attach->importer_priv; + + dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN; + dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE; + + sgt = dma_buf_map_attachment(attach, dir); + if (IS_ERR(sgt)) { + ret = PTR_ERR(sgt); + dev_err(&indio_dev->dev, "Unable to map attachment: %d\n", ret); + goto err_attachment_put; + } + + fence = kmalloc(sizeof(*fence), GFP_KERNEL); + if (!fence) { + ret = -ENOMEM; + goto err_unmap_attachment; + } + + fence->priv = priv; + fence->sgt = sgt; + fence->dir = dir; + priv->fence = fence; + + dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops, + &priv->lock, priv->context, 0); + + ret = iio_dma_resv_lock(dmabuf, nonblock); + if (ret) + goto err_fence_put; + + timeout = nonblock ? 
0 : msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS); + + /* Make sure we don't have writers */ + ret = (int) dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_WRITE, + true, timeout); + if (ret == 0) + ret = -EBUSY; + if (ret < 0) + goto err_resv_unlock; + + if (dma_to_ram) { + /* + * If we're writing to the DMABUF, make sure we don't have + * readers + */ + ret = (int) dma_resv_wait_timeout(dmabuf->resv, + DMA_RESV_USAGE_READ, true, + timeout); + if (ret == 0) + ret = -EBUSY; + if (ret < 0) + goto err_resv_unlock; + } + + ret = dma_resv_reserve_fences(dmabuf->resv, 1); + if (ret) + goto err_resv_unlock; + + dma_resv_add_fence(dmabuf->resv, &fence->base, + dma_resv_usage_rw(dma_to_ram)); + dma_resv_unlock(dmabuf->resv); + + ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt, + iio_dmabuf.bytes_used, cyclic); + if (ret) + iio_buffer_signal_dmabuf_done(attach, ret); + + dma_buf_put(dmabuf); + + return ret; + +err_resv_unlock: + dma_resv_unlock(dmabuf->resv); +err_fence_put: + dma_fence_put(&fence->base); +err_unmap_attachment: + dma_buf_unmap_attachment(attach, sgt, dir); +err_attachment_put: + iio_buffer_dmabuf_put(attach); +err_dmabuf_put: + dma_buf_put(dmabuf); + + return ret; +} + +void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret) +{ + struct iio_dmabuf_priv *priv = attach->importer_priv; + struct iio_dma_fence *fence = priv->fence; + enum dma_data_direction dir = fence->dir; + struct sg_table *sgt = fence->sgt; + + dma_fence_get(&fence->base); + fence->base.error = ret; + dma_fence_signal(&fence->base); + dma_fence_put(&fence->base); + + dma_buf_unmap_attachment(attach, sgt, dir); + iio_buffer_dmabuf_put(attach); +} +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done); + +static long iio_buffer_chrdev_ioctl(struct file *filp, + unsigned int cmd, unsigned long arg) +{ + struct iio_dev_buffer_pair *ib = filp->private_data; + void __user *_arg = (void __user *)arg; + + switch (cmd) { + case IIO_BUFFER_DMABUF_ATTACH_IOCTL: + return iio_buffer_attach_dmabuf(ib, _arg); + case IIO_BUFFER_DMABUF_DETACH_IOCTL: + return iio_buffer_detach_dmabuf(ib, _arg); + case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL: + return iio_buffer_enqueue_dmabuf(ib, _arg, + filp->f_flags & O_NONBLOCK); + default: + return IIO_IOCTL_UNHANDLED; + } +} + static const struct file_operations iio_buffer_chrdev_fileops = { .owner = THIS_MODULE, .llseek = noop_llseek, .read = iio_buffer_read, .write = iio_buffer_write, + .unlocked_ioctl = iio_buffer_chrdev_ioctl, + .compat_ioctl = compat_ptr_ioctl, .poll = iio_buffer_poll, .release = iio_buffer_chrdev_release, }; diff --git a/include/linux/iio/buffer_impl.h b/include/linux/iio/buffer_impl.h index 89c3fd7c29ca..55d93705c96b 100644 --- a/include/linux/iio/buffer_impl.h +++ b/include/linux/iio/buffer_impl.h @@ -9,8 +9,11 @@ #include <uapi/linux/iio/buffer.h> #include <linux/iio/buffer.h>
+struct dma_buf_attachment; struct iio_dev; +struct iio_dma_buffer_block; struct iio_buffer; +struct sg_table;
/** * INDIO_BUFFER_FLAG_FIXED_WATERMARK - Watermark level of the buffer can not be @@ -39,6 +42,13 @@ struct iio_buffer; * device stops sampling. Calles are balanced with @enable. * @release: called when the last reference to the buffer is dropped, * should free all resources allocated by the buffer. + * @attach_dmabuf: called from userspace via ioctl to attach one external + * DMABUF. + * @detach_dmabuf: called from userspace via ioctl to detach one previously + * attached DMABUF. + * @enqueue_dmabuf: called from userspace via ioctl to queue this DMABUF + * object to this buffer. Requires a valid DMABUF fd, that + * was previouly attached to this buffer. * @modes: Supported operating modes by this buffer type * @flags: A bitmask combination of INDIO_BUFFER_FLAG_* * @@ -68,6 +78,14 @@ struct iio_buffer_access_funcs {
void (*release)(struct iio_buffer *buffer);
+ struct iio_dma_buffer_block * (*attach_dmabuf)(struct iio_buffer *buffer, + struct dma_buf_attachment *attach); + void (*detach_dmabuf)(struct iio_buffer *buffer, + struct iio_dma_buffer_block *block); + int (*enqueue_dmabuf)(struct iio_buffer *buffer, + struct iio_dma_buffer_block *block, + struct sg_table *sgt, size_t size, bool cyclic); + unsigned int modes; unsigned int flags; }; @@ -136,6 +154,9 @@ struct iio_buffer {
/* @ref: Reference count of the buffer. */ struct kref ref; + + /* @dmabufs: List of DMABUF attachments */ + struct list_head dmabufs; };
/** @@ -156,9 +177,14 @@ int iio_update_buffers(struct iio_dev *indio_dev, **/ void iio_buffer_init(struct iio_buffer *buffer);
+void iio_buffer_dmabuf_get(struct dma_buf_attachment *attach); +void iio_buffer_dmabuf_put(struct dma_buf_attachment *attach); + struct iio_buffer *iio_buffer_get(struct iio_buffer *buffer); void iio_buffer_put(struct iio_buffer *buffer);
+void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret); + #else /* CONFIG_IIO_BUFFER */
static inline void iio_buffer_get(struct iio_buffer *buffer) {} diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h index 13939032b3f6..c666aa95e532 100644 --- a/include/uapi/linux/iio/buffer.h +++ b/include/uapi/linux/iio/buffer.h @@ -5,6 +5,28 @@ #ifndef _UAPI_IIO_BUFFER_H_ #define _UAPI_IIO_BUFFER_H_
+#include <linux/types.h> + +/* Flags for iio_dmabuf.flags */ +#define IIO_BUFFER_DMABUF_CYCLIC (1 << 0) +#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS 0x00000001 + +/** + * struct iio_dmabuf - Descriptor for a single IIO DMABUF object + * @fd: file descriptor of the DMABUF object + * @flags: one or more IIO_BUFFER_DMABUF_* flags + * @bytes_used: number of bytes used in this DMABUF for the data transfer. + * Should generally be set to the DMABUF's size. + */ +struct iio_dmabuf { + __u32 fd; + __u32 flags; + __u64 bytes_used; +}; + #define IIO_BUFFER_GET_FD_IOCTL _IOWR('i', 0x91, int) +#define IIO_BUFFER_DMABUF_ATTACH_IOCTL _IOW('i', 0x92, int) +#define IIO_BUFFER_DMABUF_DETACH_IOCTL _IOW('i', 0x93, int) +#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL _IOW('i', 0x94, struct iio_dmabuf)
#endif /* _UAPI_IIO_BUFFER_H_ */
On Tue, 19 Dec 2023 18:50:06 +0100 Paul Cercueil paul@crapouillou.net wrote:
Add the necessary infrastructure to the IIO core to support a new optional DMABUF based interface.
With this new interface, DMABUF objects (externally created) can be attached to an IIO buffer, and subsequently used for data transfer.
A userspace application can then use this interface to share DMABUF objects between several interfaces, allowing it to transfer data in a zero-copy fashion, for instance between IIO and the USB stack.
The userspace application can also memory-map the DMABUF objects, and access the sample data directly. The advantage of doing this vs. the read() interface is that it avoids an extra copy of the data between the kernel and userspace. This is particularly useful for high-speed devices which produce several megabytes or even gigabytes of data per second.
As part of the interface, 3 new IOCTLs have been added:
IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd): Attach the DMABUF object identified by the given file descriptor to the buffer.
IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd): Detach the DMABUF object identified by the given file descriptor from the buffer. Note that closing the IIO buffer's file descriptor will automatically detach all previously attached DMABUF objects.
IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *): Request a data transfer to/from the given DMABUF object. Its file descriptor, as well as the transfer size and flags are provided in the "iio_dmabuf" structure.
These three IOCTLs have to be performed on the IIO buffer's file descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
Fair enough - so they don't apply to the 'legacy' buffer, which simplifies things, but in one place you assume that logic is used (given the error return values).
Signed-off-by: Paul Cercueil paul@crapouillou.net
This is big and complex and I'm out of time for now, so I've made some comments but should revisit it. I'm also looking for review from those more familiar with dmabuf side of things than I am!
Jonathan
+static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock) +{
- int ret;
- ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
- if (ret) {
if (ret != -EDEADLK)
goto out;
if (nonblock) {
ret = -EBUSY;
goto out;
}
ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);
- }
+out:
- return ret;
I'm not a fan of gotos that do nothing. Just return in the appropriate places above.
+}
+static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req) +{
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- struct dma_buf *dmabuf;
- int dmabuf_fd, ret = 0;
- if (copy_from_user(&dmabuf_fd, user_req, sizeof(dmabuf_fd)))
return -EFAULT;
- dmabuf = dma_buf_get(dmabuf_fd);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- attach = iio_buffer_find_attachment(ib->indio_dev, dmabuf);
- if (IS_ERR(attach)) {
ret = PTR_ERR(attach);
goto out_dmabuf_put;
- }
- priv = attach->importer_priv;
- list_del_init(&priv->entry);
- /*
* Unref twice to release the reference obtained with
* iio_buffer_find_attachment() above, and the one obtained in
* iio_buffer_attach_dmabuf().
*/
- iio_buffer_dmabuf_put(attach);
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a bit of __free(dma_buf_put) magic would be nice to have.
- return ret;
+}
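Jonathan's __free() idea could look something like this - a sketch only, assuming a DEFINE_FREE() helper for dma_buf_put() gets added, and relying on the helpers defined elsewhere in this patch:

#include <linux/cleanup.h>
#include <linux/dma-buf.h>

/* Assumed helper; dma-buf.h does not provide one today */
DEFINE_FREE(dma_buf_put, struct dma_buf *, if (!IS_ERR_OR_NULL(_T)) dma_buf_put(_T))

static int iio_buffer_detach_dmabuf_sketch(struct iio_dev_buffer_pair *ib,
					   int __user *user_req)
{
	struct dma_buf_attachment *attach;
	struct iio_dmabuf_priv *priv;
	int dmabuf_fd;

	if (copy_from_user(&dmabuf_fd, user_req, sizeof(dmabuf_fd)))
		return -EFAULT;

	/* dma_buf_put() now runs automatically on every return path */
	struct dma_buf *dmabuf __free(dma_buf_put) = dma_buf_get(dmabuf_fd);

	if (IS_ERR(dmabuf))
		return PTR_ERR(dmabuf);

	attach = iio_buffer_find_attachment(ib->indio_dev, dmabuf);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	priv = attach->importer_priv;
	list_del_init(&priv->entry);

	/* Drop the find_attachment() reference and the attach-time reference */
	iio_buffer_dmabuf_put(attach);
	iio_buffer_dmabuf_put(attach);

	return 0;
}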
+static const char * +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence) +{
- return "iio";
+}
+static void iio_buffer_dma_fence_release(struct dma_fence *fence) +{
- struct iio_dma_fence *iio_fence =
container_of(fence, struct iio_dma_fence, base);
- kfree(iio_fence);
+}
+static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
- .get_driver_name = iio_buffer_dma_fence_get_driver_name,
- .get_timeline_name = iio_buffer_dma_fence_get_driver_name,
- .release = iio_buffer_dma_fence_release,
+};
+static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair *ib,
struct iio_dmabuf __user *iio_dmabuf_req,
bool nonblock)
+{
- struct iio_dev *indio_dev = ib->indio_dev;
- struct iio_buffer *buffer = ib->buffer;
- struct iio_dmabuf iio_dmabuf;
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- enum dma_data_direction dir;
- struct iio_dma_fence *fence;
- struct dma_buf *dmabuf;
- struct sg_table *sgt;
- unsigned long timeout;
- bool dma_to_ram;
- bool cyclic;
- int ret;
- if (copy_from_user(&iio_dmabuf, iio_dmabuf_req, sizeof(iio_dmabuf)))
return -EFAULT;
- if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
return -EINVAL;
- cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
- /* Cyclic flag is only supported on output buffers */
- if (cyclic && buffer->direction != IIO_BUFFER_DIRECTION_OUT)
return -EINVAL;
- dmabuf = dma_buf_get(iio_dmabuf.fd);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used > dmabuf->size) {
ret = -EINVAL;
goto err_dmabuf_put;
- }
- attach = iio_buffer_find_attachment(indio_dev, dmabuf);
- if (IS_ERR(attach)) {
ret = PTR_ERR(attach);
goto err_dmabuf_put;
Might be worth some cleanup.h magic given this put happens in all exit paths.
- }
- priv = attach->importer_priv;
- dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
- dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
- sgt = dma_buf_map_attachment(attach, dir);
- if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
dev_err(&indio_dev->dev, "Unable to map attachment: %d\n", ret);
goto err_attachment_put;
- }
- fence = kmalloc(sizeof(*fence), GFP_KERNEL);
- if (!fence) {
ret = -ENOMEM;
goto err_unmap_attachment;
- }
- fence->priv = priv;
- fence->sgt = sgt;
- fence->dir = dir;
- priv->fence = fence;
- dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
&priv->lock, priv->context, 0);
- ret = iio_dma_resv_lock(dmabuf, nonblock);
- if (ret)
goto err_fence_put;
- timeout = nonblock ? 0 : msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
- /* Make sure we don't have writers */
- ret = (int) dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_WRITE,
true, timeout);
I'd handle this and similar cases as a long rather than adding the odd-looking cast and making me think too much about it.
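One possible shape, as a sketch: a small hypothetical helper that keeps the long-valued return of dma_resv_wait_timeout() internal and hands back a plain int, so the call sites stay cast-free:

static int iio_dma_resv_wait(struct dma_resv *resv, enum dma_resv_usage usage,
			     unsigned long timeout)
{
	long ret;

	ret = dma_resv_wait_timeout(resv, usage, true, timeout);
	if (ret == 0)
		return -EBUSY;	/* timed out */
	if (ret < 0)
		return ret;	/* interrupted or other error */

	return 0;
}

The two call sites would then simply be ret = iio_dma_resv_wait(dmabuf->resv, DMA_RESV_USAGE_WRITE, timeout); and the DMA_RESV_USAGE_READ equivalent.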
- if (ret == 0)
ret = -EBUSY;
- if (ret < 0)
goto err_resv_unlock;
- if (dma_to_ram) {
/*
* If we're writing to the DMABUF, make sure we don't have
* readers
*/
ret = (int) dma_resv_wait_timeout(dmabuf->resv,
DMA_RESV_USAGE_READ, true,
timeout);
if (ret == 0)
ret = -EBUSY;
if (ret < 0)
goto err_resv_unlock;
- }
- ret = dma_resv_reserve_fences(dmabuf->resv, 1);
- if (ret)
goto err_resv_unlock;
- dma_resv_add_fence(dmabuf->resv, &fence->base,
dma_resv_usage_rw(dma_to_ram));
- dma_resv_unlock(dmabuf->resv);
- ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
iio_dmabuf.bytes_used, cyclic);
- if (ret)
iio_buffer_signal_dmabuf_done(attach, ret);
I'd like a comment on why using the 'successful' path cleanup makes sense in this error case. It's possible to figure it out, but reviewers are lazy and generally we like the cleanup to be obvious and local on error paths.
- dma_buf_put(dmabuf);
- return ret;
+err_resv_unlock:
- dma_resv_unlock(dmabuf->resv);
+err_fence_put:
- dma_fence_put(&fence->base);
+err_unmap_attachment:
- dma_buf_unmap_attachment(attach, sgt, dir);
+err_attachment_put:
- iio_buffer_dmabuf_put(attach);
+err_dmabuf_put:
- dma_buf_put(dmabuf);
- return ret;
+}
+void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret) +{
- struct iio_dmabuf_priv *priv = attach->importer_priv;
- struct iio_dma_fence *fence = priv->fence;
- enum dma_data_direction dir = fence->dir;
- struct sg_table *sgt = fence->sgt;
- dma_fence_get(&fence->base);
I don't know much about dma_fence, but is it valid to access contents of it (sgt, etc.) before getting a reference? Ultimately dma_fence_put() can result in a kfree_rcu(), so it seems unlikely to be safe and definitely fails the 'obviously correct' test. Given those are, I assume, trivial accesses, just do them down here perhaps?
- fence->base.error = ret;
- dma_fence_signal(&fence->base);
- dma_fence_put(&fence->base);
- dma_buf_unmap_attachment(attach, sgt, dir);
- iio_buffer_dmabuf_put(attach);
+} +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
+static long iio_buffer_chrdev_ioctl(struct file *filp,
unsigned int cmd, unsigned long arg)
+{
- struct iio_dev_buffer_pair *ib = filp->private_data;
- void __user *_arg = (void __user *)arg;
- switch (cmd) {
- case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
return iio_buffer_attach_dmabuf(ib, _arg);
- case IIO_BUFFER_DMABUF_DETACH_IOCTL:
return iio_buffer_detach_dmabuf(ib, _arg);
- case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
return iio_buffer_enqueue_dmabuf(ib, _arg,
filp->f_flags & O_NONBLOCK);
- default:
return IIO_IOCTL_UNHANDLED;
Given you aren't using the ioctl handling on the legacy buffer, I think this should simply return an error code, not the magic value we use to indicate 'all fine, but it's not mine'. Probably -EINVAL or similar. Note that the wrapper around the legacy buffer ioctls translates this to -ENODEV rather than returning it from the ioctl.
- }
+}
static const struct file_operations iio_buffer_chrdev_fileops = { .owner = THIS_MODULE, .llseek = noop_llseek, .read = iio_buffer_read, .write = iio_buffer_write,
- .unlocked_ioctl = iio_buffer_chrdev_ioctl,
- .compat_ioctl = compat_ptr_ioctl, .poll = iio_buffer_poll, .release = iio_buffer_chrdev_release,
};
Hi Jonathan,
On Thursday, 21 December 2023 at 12:06 +0000, Jonathan Cameron wrote:
On Tue, 19 Dec 2023 18:50:06 +0100 Paul Cercueil <paul@crapouillou.net> wrote:
Add the necessary infrastructure to the IIO core to support a new optional DMABUF based interface.
With this new interface, DMABUF objects (externally created) can be attached to a IIO buffer, and subsequently used for data transfer.
A userspace application can then use this interface to share DMABUF objects between several interfaces, allowing it to transfer data in a zero-copy fashion, for instance between IIO and the USB stack.
The userspace application can also memory-map the DMABUF objects, and access the sample data directly. The advantage of doing this vs. the read() interface is that it avoids an extra copy of the data between the kernel and userspace. This is particularly useful for high-speed devices which produce several megabytes or even gigabytes of data per second.
As part of the interface, 3 new IOCTLs have been added:
IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd): Attach the DMABUF object identified by the given file descriptor to the buffer.
IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd): Detach the DMABUF object identified by the given file descriptor from the buffer. Note that closing the IIO buffer's file descriptor will automatically detach all previously attached DMABUF objects.
IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *): Request a data transfer to/from the given DMABUF object. Its file descriptor, as well as the transfer size and flags are provided in the "iio_dmabuf" structure.
These three IOCTLs have to be performed on the IIO buffer's file descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
Fair enough - so they don't apply to the 'legacy' buffer which simplifies things but in one place you assume that logic is used (given error return values).
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
This is big and complex and I'm out of time for now, so I've made some comments but should revisit it. I'm also looking for review from those more familiar with dmabuf side of things than I am!
Jonathan
+static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock) +{
- int ret;
- ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
- if (ret) {
if (ret != -EDEADLK)
goto out;
if (nonblock) {
ret = -EBUSY;
goto out;
}
ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);
- }
+out:
- return ret;
I'm not a fan of gotos that do nothing. Just return in the appropriate places above.
+}
+static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req) +{
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- struct dma_buf *dmabuf;
- int dmabuf_fd, ret = 0;
- if (copy_from_user(&dmabuf_fd, user_req,
sizeof(dmabuf_fd)))
return -EFAULT;
- dmabuf = dma_buf_get(dmabuf_fd);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- attach = iio_buffer_find_attachment(ib->indio_dev,
dmabuf);
- if (IS_ERR(attach)) {
ret = PTR_ERR(attach);
goto out_dmabuf_put;
- }
- priv = attach->importer_priv;
- list_del_init(&priv->entry);
- /*
* Unref twice to release the reference obtained with
* iio_buffer_find_attachment() above, and the one
obtained in
* iio_buffer_attach_dmabuf().
*/
- iio_buffer_dmabuf_put(attach);
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.
Whoa, never heard about this. That looks great!
- return ret;
+}
+static const char * +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence) +{
- return "iio";
+}
+static void iio_buffer_dma_fence_release(struct dma_fence *fence) +{
- struct iio_dma_fence *iio_fence =
container_of(fence, struct iio_dma_fence, base);
- kfree(iio_fence);
+}
+static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
- .get_driver_name =
iio_buffer_dma_fence_get_driver_name,
- .get_timeline_name =
iio_buffer_dma_fence_get_driver_name,
- .release = iio_buffer_dma_fence_release,
+};
+static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair *ib,
struct iio_dmabuf __user
*iio_dmabuf_req,
bool nonblock)
+{
- struct iio_dev *indio_dev = ib->indio_dev;
- struct iio_buffer *buffer = ib->buffer;
- struct iio_dmabuf iio_dmabuf;
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- enum dma_data_direction dir;
- struct iio_dma_fence *fence;
- struct dma_buf *dmabuf;
- struct sg_table *sgt;
- unsigned long timeout;
- bool dma_to_ram;
- bool cyclic;
- int ret;
- if (copy_from_user(&iio_dmabuf, iio_dmabuf_req,
sizeof(iio_dmabuf)))
return -EFAULT;
- if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
return -EINVAL;
- cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
- /* Cyclic flag is only supported on output buffers */
- if (cyclic && buffer->direction !=
IIO_BUFFER_DIRECTION_OUT)
return -EINVAL;
- dmabuf = dma_buf_get(iio_dmabuf.fd);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used >
dmabuf->size) {
ret = -EINVAL;
goto err_dmabuf_put;
- }
- attach = iio_buffer_find_attachment(indio_dev, dmabuf);
- if (IS_ERR(attach)) {
ret = PTR_ERR(attach);
goto err_dmabuf_put;
Might be worth some cleanup.h magic given this put happens in all exit paths.
- }
- priv = attach->importer_priv;
- dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
- dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
- sgt = dma_buf_map_attachment(attach, dir);
- if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
dev_err(&indio_dev->dev, "Unable to map
attachment: %d\n", ret);
goto err_attachment_put;
- }
- fence = kmalloc(sizeof(*fence), GFP_KERNEL);
- if (!fence) {
ret = -ENOMEM;
goto err_unmap_attachment;
- }
- fence->priv = priv;
- fence->sgt = sgt;
- fence->dir = dir;
- priv->fence = fence;
- dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
&priv->lock, priv->context, 0);
- ret = iio_dma_resv_lock(dmabuf, nonblock);
- if (ret)
goto err_fence_put;
- timeout = nonblock ? 0 :
msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
- /* Make sure we don't have writers */
- ret = (int) dma_resv_wait_timeout(dmabuf->resv,
DMA_RESV_USAGE_WRITE,
true, timeout);
I'd handle this and similar cases as a long rather than adding the odd-looking cast and making me think too much about it.
- if (ret == 0)
ret = -EBUSY;
- if (ret < 0)
goto err_resv_unlock;
- if (dma_to_ram) {
/*
* If we're writing to the DMABUF, make sure we
don't have
* readers
*/
ret = (int) dma_resv_wait_timeout(dmabuf->resv,
DMA_RESV_USAGE_READ, true,
timeout);
if (ret == 0)
ret = -EBUSY;
if (ret < 0)
goto err_resv_unlock;
- }
- ret = dma_resv_reserve_fences(dmabuf->resv, 1);
- if (ret)
goto err_resv_unlock;
- dma_resv_add_fence(dmabuf->resv, &fence->base,
dma_resv_usage_rw(dma_to_ram));
- dma_resv_unlock(dmabuf->resv);
- ret = buffer->access->enqueue_dmabuf(buffer, priv->block,
sgt,
iio_dmabuf.bytes_used, cyclic);
- if (ret)
iio_buffer_signal_dmabuf_done(attach, ret);
I'd like a comment on why using the 'successful' path cleanup makes sense in this error case. It's possible to figure it out, but reviewers are lazy and generally we like the cleanup to be obvious and local on error paths.
- dma_buf_put(dmabuf);
- return ret;
+err_resv_unlock:
- dma_resv_unlock(dmabuf->resv);
+err_fence_put:
- dma_fence_put(&fence->base);
+err_unmap_attachment:
- dma_buf_unmap_attachment(attach, sgt, dir);
+err_attachment_put:
- iio_buffer_dmabuf_put(attach);
+err_dmabuf_put:
- dma_buf_put(dmabuf);
- return ret;
+}
+void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret) +{
- struct iio_dmabuf_priv *priv = attach->importer_priv;
- struct iio_dma_fence *fence = priv->fence;
- enum dma_data_direction dir = fence->dir;
- struct sg_table *sgt = fence->sgt;
- dma_fence_get(&fence->base);
I don't know much about dma_fence, but is it valid to access contents of it (sgt, etc.) before getting a reference? Ultimately dma_fence_put() can result in a kfree_rcu(), so it seems unlikely to be safe and definitely fails the 'obviously correct' test. Given those are, I assume, trivial accesses, just do them down here perhaps?
It is valid to access the fence before getting a reference - the fence won't be freed before it is signaled down below. It would be illegal to access it after the dma_fence_put() though, which I'm not doing here.
- fence->base.error = ret;
- dma_fence_signal(&fence->base);
- dma_fence_put(&fence->base);
- dma_buf_unmap_attachment(attach, sgt, dir);
- iio_buffer_dmabuf_put(attach);
+} +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
+static long iio_buffer_chrdev_ioctl(struct file *filp,
unsigned int cmd, unsigned
long arg) +{
- struct iio_dev_buffer_pair *ib = filp->private_data;
- void __user *_arg = (void __user *)arg;
- switch (cmd) {
- case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
return iio_buffer_attach_dmabuf(ib, _arg);
- case IIO_BUFFER_DMABUF_DETACH_IOCTL:
return iio_buffer_detach_dmabuf(ib, _arg);
- case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
return iio_buffer_enqueue_dmabuf(ib, _arg,
filp->f_flags &
O_NONBLOCK);
- default:
return IIO_IOCTL_UNHANDLED;
Given you aren't using the ioctl handling on the legacy buffer, I think this should simply return an error code, not the magic value we use to indicate 'all fine, but it's not mine'. Probably -EINVAL or similar. Note that the wrapper around the legacy buffer ioctls translates this to -ENODEV rather than returning it from the ioctl.
ACK for this and the other comments.
Thanks for reviewing!
Cheers, -Paul
- }
+}
static const struct file_operations iio_buffer_chrdev_fileops = { .owner = THIS_MODULE, .llseek = noop_llseek, .read = iio_buffer_read, .write = iio_buffer_write,
- .unlocked_ioctl = iio_buffer_chrdev_ioctl,
- .compat_ioctl = compat_ptr_ioctl,
.poll = iio_buffer_poll, .release = iio_buffer_chrdev_release, };
Hi Jonathan,
On Thursday, 21 December 2023 at 12:06 +0000, Jonathan Cameron wrote:
On Tue, 19 Dec 2023 18:50:06 +0100 Paul Cercueil <paul@crapouillou.net> wrote:
Add the necessary infrastructure to the IIO core to support a new optional DMABUF based interface.
With this new interface, DMABUF objects (externally created) can be attached to a IIO buffer, and subsequently used for data transfer.
A userspace application can then use this interface to share DMABUF objects between several interfaces, allowing it to transfer data in a zero-copy fashion, for instance between IIO and the USB stack.
The userspace application can also memory-map the DMABUF objects, and access the sample data directly. The advantage of doing this vs. the read() interface is that it avoids an extra copy of the data between the kernel and userspace. This is particularly useful for high-speed devices which produce several megabytes or even gigabytes of data per second.
As part of the interface, 3 new IOCTLs have been added:
IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd): Attach the DMABUF object identified by the given file descriptor to the buffer.
IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd): Detach the DMABUF object identified by the given file descriptor from the buffer. Note that closing the IIO buffer's file descriptor will automatically detach all previously attached DMABUF objects.
IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *): Request a data transfer to/from the given DMABUF object. Its file descriptor, as well as the transfer size and flags are provided in the "iio_dmabuf" structure.
These three IOCTLs have to be performed on the IIO buffer's file descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
Fair enough - so they don't apply to the 'legacy' buffer which simplifies things but in one place you assume that logic is used (given error return values).
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
This is big and complex and I'm out of time for now, so I've made some comments but should revisit it. I'm also looking for review from those more familiar with dmabuf side of things than I am!
Jonathan
+static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock) +{
- int ret;
- ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
- if (ret) {
if (ret != -EDEADLK)
goto out;
if (nonblock) {
ret = -EBUSY;
goto out;
}
ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);
- }
+out:
- return ret;
I'm not a fan of gotos that do nothing. Just return in the appropriate places above.
+}
+static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req) +{
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- struct dma_buf *dmabuf;
- int dmabuf_fd, ret = 0;
- if (copy_from_user(&dmabuf_fd, user_req,
sizeof(dmabuf_fd)))
return -EFAULT;
- dmabuf = dma_buf_get(dmabuf_fd);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- attach = iio_buffer_find_attachment(ib->indio_dev,
dmabuf);
- if (IS_ERR(attach)) {
ret = PTR_ERR(attach);
goto out_dmabuf_put;
- }
- priv = attach->importer_priv;
- list_del_init(&priv->entry);
- /*
* Unref twice to release the reference obtained with
* iio_buffer_find_attachment() above, and the one
obtained in
* iio_buffer_attach_dmabuf().
*/
- iio_buffer_dmabuf_put(attach);
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.
I'm working on the patches right now, just one quick question.
Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in <linux/dma-buf.h>, which has not been done yet.
That would mean carrying a dma-buf specific patch in your tree, are you OK with that?
Cheers, -Paul
- return ret;
+}
+static const char * +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence) +{
- return "iio";
+}
+static void iio_buffer_dma_fence_release(struct dma_fence *fence) +{
- struct iio_dma_fence *iio_fence =
container_of(fence, struct iio_dma_fence, base);
- kfree(iio_fence);
+}
+static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
- .get_driver_name =
iio_buffer_dma_fence_get_driver_name,
- .get_timeline_name =
iio_buffer_dma_fence_get_driver_name,
- .release = iio_buffer_dma_fence_release,
+};
+static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair *ib,
struct iio_dmabuf __user
*iio_dmabuf_req,
bool nonblock)
+{
- struct iio_dev *indio_dev = ib->indio_dev;
- struct iio_buffer *buffer = ib->buffer;
- struct iio_dmabuf iio_dmabuf;
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- enum dma_data_direction dir;
- struct iio_dma_fence *fence;
- struct dma_buf *dmabuf;
- struct sg_table *sgt;
- unsigned long timeout;
- bool dma_to_ram;
- bool cyclic;
- int ret;
- if (copy_from_user(&iio_dmabuf, iio_dmabuf_req,
sizeof(iio_dmabuf)))
return -EFAULT;
- if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
return -EINVAL;
- cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
- /* Cyclic flag is only supported on output buffers */
- if (cyclic && buffer->direction !=
IIO_BUFFER_DIRECTION_OUT)
return -EINVAL;
- dmabuf = dma_buf_get(iio_dmabuf.fd);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used >
dmabuf->size) {
ret = -EINVAL;
goto err_dmabuf_put;
- }
- attach = iio_buffer_find_attachment(indio_dev, dmabuf);
- if (IS_ERR(attach)) {
ret = PTR_ERR(attach);
goto err_dmabuf_put;
Might be worth some cleanup.h magic given this put happens in all exit paths.
- }
- priv = attach->importer_priv;
- dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
- dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
- sgt = dma_buf_map_attachment(attach, dir);
- if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
dev_err(&indio_dev->dev, "Unable to map
attachment: %d\n", ret);
goto err_attachment_put;
- }
- fence = kmalloc(sizeof(*fence), GFP_KERNEL);
- if (!fence) {
ret = -ENOMEM;
goto err_unmap_attachment;
- }
- fence->priv = priv;
- fence->sgt = sgt;
- fence->dir = dir;
- priv->fence = fence;
- dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
&priv->lock, priv->context, 0);
- ret = iio_dma_resv_lock(dmabuf, nonblock);
- if (ret)
goto err_fence_put;
- timeout = nonblock ? 0 :
msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
- /* Make sure we don't have writers */
- ret = (int) dma_resv_wait_timeout(dmabuf->resv,
DMA_RESV_USAGE_WRITE,
true, timeout);
I'd handle this and similar cases as a long rather than adding the odd-looking cast and making me think too much about it.
- if (ret == 0)
ret = -EBUSY;
- if (ret < 0)
goto err_resv_unlock;
- if (dma_to_ram) {
/*
* If we're writing to the DMABUF, make sure we
don't have
* readers
*/
ret = (int) dma_resv_wait_timeout(dmabuf->resv,
DMA_RESV_USAGE_READ, true,
timeout);
if (ret == 0)
ret = -EBUSY;
if (ret < 0)
goto err_resv_unlock;
- }
- ret = dma_resv_reserve_fences(dmabuf->resv, 1);
- if (ret)
goto err_resv_unlock;
- dma_resv_add_fence(dmabuf->resv, &fence->base,
dma_resv_usage_rw(dma_to_ram));
- dma_resv_unlock(dmabuf->resv);
- ret = buffer->access->enqueue_dmabuf(buffer, priv->block,
sgt,
iio_dmabuf.bytes_used, cyclic);
- if (ret)
iio_buffer_signal_dmabuf_done(attach, ret);
I'd like a comment on why using the 'successful' path cleanup makes sense in this error case. It's possible to figure it out, but reviewers are lazy and generally we like the cleanup to be obvious and local on error paths.
- dma_buf_put(dmabuf);
- return ret;
+err_resv_unlock:
- dma_resv_unlock(dmabuf->resv);
+err_fence_put:
- dma_fence_put(&fence->base);
+err_unmap_attachment:
- dma_buf_unmap_attachment(attach, sgt, dir);
+err_attachment_put:
- iio_buffer_dmabuf_put(attach);
+err_dmabuf_put:
- dma_buf_put(dmabuf);
- return ret;
+}
+void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret) +{
- struct iio_dmabuf_priv *priv = attach->importer_priv;
- struct iio_dma_fence *fence = priv->fence;
- enum dma_data_direction dir = fence->dir;
- struct sg_table *sgt = fence->sgt;
- dma_fence_get(&fence->base);
I don't know much about dma_fence, but is it valid to access contents of it (sgt, etc.) before getting a reference? Ultimately dma_fence_put() can result in a kfree_rcu(), so it seems unlikely to be safe and definitely fails the 'obviously correct' test. Given those are, I assume, trivial accesses, just do them down here perhaps?
- fence->base.error = ret;
- dma_fence_signal(&fence->base);
- dma_fence_put(&fence->base);
- dma_buf_unmap_attachment(attach, sgt, dir);
- iio_buffer_dmabuf_put(attach);
+} +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
+static long iio_buffer_chrdev_ioctl(struct file *filp,
unsigned int cmd, unsigned
long arg) +{
- struct iio_dev_buffer_pair *ib = filp->private_data;
- void __user *_arg = (void __user *)arg;
- switch (cmd) {
- case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
return iio_buffer_attach_dmabuf(ib, _arg);
- case IIO_BUFFER_DMABUF_DETACH_IOCTL:
return iio_buffer_detach_dmabuf(ib, _arg);
- case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
return iio_buffer_enqueue_dmabuf(ib, _arg,
filp->f_flags &
O_NONBLOCK);
- default:
return IIO_IOCTL_UNHANDLED;
Given you aren't using the ioctl handling on the legacy buffer, I think this should simply return an error code, not the magic value we use to indicate 'all fine, but it's not mine'. Probably -EINVAL or similar. Note that the wrapper around the legacy buffer ioctls translates this to -ENODEV rather than returning it from the ioctl.
- }
+}
static const struct file_operations iio_buffer_chrdev_fileops = { .owner = THIS_MODULE, .llseek = noop_llseek, .read = iio_buffer_read, .write = iio_buffer_write,
- .unlocked_ioctl = iio_buffer_chrdev_ioctl,
- .compat_ioctl = compat_ptr_ioctl,
.poll = iio_buffer_poll, .release = iio_buffer_chrdev_release, };
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.
I'm working on the patches right now, just one quick question.
Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in <linux/dma-buf.h>, which has not been done yet.
That would mean carrying a dma-buf specific patch in your tree, are you OK with that?
Needs an ACK from appropriate maintainer, but otherwise I'm fine doing so. Alternative is to circle back to this later after this code is upstream.
Cheers, -Paul
On 27.01.24 at 17:50, Jonathan Cameron wrote:
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.
I'm working on the patches right now, just one quick question.
Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in <linux/dma-buf.h>, which has not been done yet.
That would mean carrying a dma-buf specific patch in your tree, are you OK with that?
Needs an ACK from appropriate maintainer, but otherwise I'm fine doing so. Alternative is to circle back to this later after this code is upstream.
Separate patches for that please, the autocleanup feature is so new that I'm not 100% convinced that everything works out smoothly from the start.
Regards, Christian.
Cheers, -Paul
Hi Christian,
On Monday, 29 January 2024 at 13:52 +0100, Christian König wrote:
On 27.01.24 at 17:50, Jonathan Cameron wrote:
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.
I'm working on the patches right now, just one quick question.
Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in <linux/dma-buf.h>, which has not been done yet.
That would mean carrying a dma-buf specific patch in your tree, are you OK with that?
Needs an ACK from appropriate maintainer, but otherwise I'm fine doing so. Alternative is to circle back to this later after this code is upstream.
Separate patches for that please, the autocleanup feature is so new that I'm not 100% convinced that everything works out smoothly from the start.
Separate patches is a given, did you mean outside this patchset? Because I can send a separate patchset that introduces scope-based management for dma_fence and dma_buf, but then it won't have users.
Cheers, -Paul
On 29.01.24 at 14:06, Paul Cercueil wrote:
Hi Christian,
On Monday, 29 January 2024 at 13:52 +0100, Christian König wrote:
On 27.01.24 at 17:50, Jonathan Cameron wrote:
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.
I'm working on the patches right now, just one quick question.
Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in <linux/dma-buf.h>, which has not been done yet.
That would mean carrying a dma-buf specific patch in your tree, are you OK with that?
Needs an ACK from appropriate maintainer, but otherwise I'm fine doing so. Alternative is to circle back to this later after this code is upstream.
Separate patches for that please, the autocleanup feature is so new that I'm not 100% convinced that everything works out smoothly from the start.
Separate patches is a given, did you mean outside this patchset? Because I can send a separate patchset that introduces scope-based management for dma_fence and dma_buf, but then it won't have users.
Outside of the patchset, this is essentially brand new stuff.
IIRC we have quite a number of dma_fence selftests and sw_sync which is basically code inside the drivers/dma-buf directory only there for testing DMA-buf functionality.
Convert those over as well and I'm more than happy to upstream this change.
Thanks, Christian.
Cheers, -Paul
On Monday, 29 January 2024 at 14:17 +0100, Christian König wrote:
On 29.01.24 at 14:06, Paul Cercueil wrote:
Hi Christian,
On Monday, 29 January 2024 at 13:52 +0100, Christian König wrote:
On 27.01.24 at 17:50, Jonathan Cameron wrote:
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.
I'm working on the patches right now, just one quick question.
Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in <linux/dma-buf.h>, which has not been done yet.
That would mean carrying a dma-buf specific patch in your tree, are you OK with that?
Needs an ACK from appropriate maintainer, but otherwise I'm fine doing so. Alternative is to circle back to this later after this code is upstream.
Separate patches for that please, the autocleanup feature is so new that I'm not 100% convinced that everything works out smoothly from the start.
Separate patches is a given, did you mean outside this patchset? Because I can send a separate patchset that introduces scope-based management for dma_fence and dma_buf, but then it won't have users.
Outside of the patchset, this is essentially brand new stuff.
IIRC we have quite a number of dma_fence selftests and sw_sync which is basically code inside the drivers/dma-buf directory only there for testing DMA-buf functionality.
Convert those over as well and I'm more than happy to upstream this change.
Well there is very little to convert there; you can use scope-based management when the unref is done in all exit points of the functional block, and the only place I could find that does that in drivers/dma-buf/ was in dma_fence_chain_enable_signaling() in dma-fence-chain.c.
Cheers, -Paul
On Monday, 29 January 2024 at 14:32 +0100, Paul Cercueil wrote:
On Monday, 29 January 2024 at 14:17 +0100, Christian König wrote:
On 29.01.24 at 14:06, Paul Cercueil wrote:
Hi Christian,
On Monday, 29 January 2024 at 13:52 +0100, Christian König wrote:
On 27.01.24 at 17:50, Jonathan Cameron wrote:
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have.
I'm working on the patches right now, just one quick question.
Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in <linux/dma-buf.h>, which has not been done yet.
That would mean carrying a dma-buf specific patch in your tree, are you OK with that?
Needs an ACK from appropriate maintainer, but otherwise I'm fine doing so. Alternative is to circle back to this later after this code is upstream.
Separate patches for that please, the autocleanup feature is so new that I'm not 100% convinced that everything works out smoothly from the start.
Separate patches is a given, did you mean outside this patchset? Because I can send a separate patchset that introduces scope- based management for dma_fence and dma_buf, but then it won't have users.
Outside of the patchset, this is essentially brand new stuff.
IIRC we have quite a number of dma_fence selftests and sw_sync which is basically code inside the drivers/dma-buf directory only there for testing DMA-buf functionality.
Convert those over as well and I'm more than happy to upstream this change.
Well there is very little to convert there; you can use scope-based management when the unref is done in all exit points of the functional block, and the only place I could find that does that in drivers/dma-buf/ was in dma_fence_chain_enable_signaling() in dma-fence-chain.c.
Actually - not even that, since it doesn't call dma_fence_get() and dma_fence_put() on the same fence.
So I cannot use it anywhere in drivers/dma-buf/.
-Paul
On Tue, Dec 19, 2023 at 06:50:06PM +0100, Paul Cercueil wrote:
Add the necessary infrastructure to the IIO core to support a new optional DMABUF based interface.
With this new interface, DMABUF objects (externally created) can be attached to a IIO buffer, and subsequently used for data transfer.
A userspace application can then use this interface to share DMABUF objects between several interfaces, allowing it to transfer data in a zero-copy fashion, for instance between IIO and the USB stack.
The userspace application can also memory-map the DMABUF objects, and access the sample data directly. The advantage of doing this vs. the read() interface is that it avoids an extra copy of the data between the kernel and userspace. This is particularly useful for high-speed devices which produce several megabytes or even gigabytes of data per second.
As part of the interface, 3 new IOCTLs have been added:
IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd): Attach the DMABUF object identified by the given file descriptor to the buffer.
IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd): Detach the DMABUF object identified by the given file descriptor from the buffer. Note that closing the IIO buffer's file descriptor will automatically detach all previously attached DMABUF objects.
IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *): Request a data transfer to/from the given DMABUF object. Its file descriptor, as well as the transfer size and flags are provided in the "iio_dmabuf" structure.
These three IOCTLs have to be performed on the IIO buffer's file descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
v2: Only allow the new IOCTLs on the buffer FD created with IIO_BUFFER_GET_FD_IOCTL().
v3: - Get rid of the old IOCTLs. The IIO subsystem does not create or manage DMABUFs anymore, and only attaches/detaches externally created DMABUFs. - Add IIO_BUFFER_DMABUF_CYCLIC to the supported flags.
v5: - Use dev_err() instead of pr_err() - Inline to_iio_dma_fence() - Add comment to explain why we unref twice when detaching dmabuf - Remove TODO comment. It is actually safe to free the file's private data even when transfers are still pending because it won't be accessed. - Fix documentation of new fields in struct iio_buffer_access_funcs - iio_dma_resv_lock() does not need to be exported, make it static
drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++++++++++++++ include/linux/iio/buffer_impl.h | 26 ++ include/uapi/linux/iio/buffer.h | 22 ++ 3 files changed, 450 insertions(+)
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c index 09c41e9ccf87..24c040e073a7 100644 --- a/drivers/iio/industrialio-buffer.c +++ b/drivers/iio/industrialio-buffer.c @@ -13,10 +13,14 @@ #include <linux/kernel.h> #include <linux/export.h> #include <linux/device.h> +#include <linux/dma-buf.h> +#include <linux/dma-fence.h> +#include <linux/dma-resv.h> #include <linux/file.h> #include <linux/fs.h> #include <linux/cdev.h> #include <linux/slab.h> +#include <linux/mm.h> #include <linux/poll.h> #include <linux/sched/signal.h> @@ -28,6 +32,31 @@ #include <linux/iio/buffer.h> #include <linux/iio/buffer_impl.h> +#define DMABUF_ENQUEUE_TIMEOUT_MS 5000
+struct iio_dma_fence;
+struct iio_dmabuf_priv {
- struct list_head entry;
- struct kref ref;
- struct iio_buffer *buffer;
- struct iio_dma_buffer_block *block;
- u64 context;
- spinlock_t lock;
- struct dma_buf_attachment *attach;
- struct iio_dma_fence *fence;
+};
+struct iio_dma_fence {
- struct dma_fence base;
- struct iio_dmabuf_priv *priv;
- struct sg_table *sgt;
- enum dma_data_direction dir;
+};
static const char * const iio_endian_prefix[] = { [IIO_BE] = "be", [IIO_LE] = "le", @@ -332,6 +361,7 @@ void iio_buffer_init(struct iio_buffer *buffer) { INIT_LIST_HEAD(&buffer->demux_list); INIT_LIST_HEAD(&buffer->buffer_list);
- INIT_LIST_HEAD(&buffer->dmabufs); init_waitqueue_head(&buffer->pollq); kref_init(&buffer->ref); if (!buffer->watermark)
@@ -1519,14 +1549,54 @@ static void iio_buffer_unregister_legacy_sysfs_groups(struct iio_dev *indio_dev) kfree(iio_dev_opaque->legacy_scan_el_group.attrs); } +static void iio_buffer_dmabuf_release(struct kref *ref) +{
- struct iio_dmabuf_priv *priv = container_of(ref, struct iio_dmabuf_priv, ref);
- struct dma_buf_attachment *attach = priv->attach;
- struct iio_buffer *buffer = priv->buffer;
- struct dma_buf *dmabuf = attach->dmabuf;
- buffer->access->detach_dmabuf(buffer, priv->block);
- dma_buf_detach(attach->dmabuf, attach);
- dma_buf_put(dmabuf);
- kfree(priv);
+}
+void iio_buffer_dmabuf_get(struct dma_buf_attachment *attach) +{
- struct iio_dmabuf_priv *priv = attach->importer_priv;
- kref_get(&priv->ref);
+} +EXPORT_SYMBOL_GPL(iio_buffer_dmabuf_get);
+void iio_buffer_dmabuf_put(struct dma_buf_attachment *attach) +{
- struct iio_dmabuf_priv *priv = attach->importer_priv;
- kref_put(&priv->ref, iio_buffer_dmabuf_release);
+} +EXPORT_SYMBOL_GPL(iio_buffer_dmabuf_put);
static int iio_buffer_chrdev_release(struct inode *inode, struct file *filep) { struct iio_dev_buffer_pair *ib = filep->private_data; struct iio_dev *indio_dev = ib->indio_dev; struct iio_buffer *buffer = ib->buffer;
- struct iio_dmabuf_priv *priv, *tmp;
wake_up(&buffer->pollq);
- /* Close all attached DMABUFs */
- list_for_each_entry_safe(priv, tmp, &buffer->dmabufs, entry) {
list_del_init(&priv->entry);
iio_buffer_dmabuf_put(priv->attach);
- }
- if (!list_empty(&buffer->dmabufs))
dev_warn(&indio_dev->dev, "Buffer FD closed with active transfers\n");
- kfree(ib); clear_bit(IIO_BUSY_BIT_POS, &buffer->flags); iio_device_put(indio_dev);
@@ -1534,11 +1604,343 @@ static int iio_buffer_chrdev_release(struct inode *inode, struct file *filep) return 0; } +static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock) +{
- int ret;
- ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
- if (ret) {
if (ret != -EDEADLK)
goto out;
if (nonblock) {
ret = -EBUSY;
goto out;
}
ret = dma_resv_lock_slow_interruptible(dmabuf->resv, NULL);
This is overkill; without a reservation context you never get -EDEADLK and so never have to go into the slowpath locking mode. You can check this with the CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y build option.
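A sketch of the simplification being hinted at here; note that this also changes the nonblock case to actually not block (the quoted code still sleeps in dma_resv_lock_interruptible() even when nonblock is set):

static int iio_dma_resv_lock(struct dma_buf *dmabuf, bool nonblock)
{
	if (!nonblock)
		return dma_resv_lock_interruptible(dmabuf->resv, NULL);

	/* Non-blocking callers just try once and bail out */
	if (!dma_resv_trylock(dmabuf->resv))
		return -EBUSY;

	return 0;
}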
- }
+out:
- return ret;
+}
+static struct dma_buf_attachment * +iio_buffer_find_attachment(struct iio_dev *indio_dev, struct dma_buf *dmabuf) +{
- struct dma_buf_attachment *elm, *attach = NULL;
- int ret;
- ret = iio_dma_resv_lock(dmabuf, false);
- if (ret)
return ERR_PTR(ret);
- list_for_each_entry(elm, &dmabuf->attachments, node) {
if (elm->dev == indio_dev->dev.parent) {
attach = elm;
break;
}
- }
- if (attach)
iio_buffer_dmabuf_get(elm);
Same comment as on your usb gadget support: This must be a kref_get_unless_zero, and I'd really prefer if you use your own list+locking instead of digging around in dma-buf internals in a lifetime-relevant way.
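A sketch of the kref part of this comment (it still walks dmabuf->attachments, so it only addresses the first half; moving the lookup to a list owned by the IIO buffer is sketched further down):

static struct dma_buf_attachment *
iio_buffer_find_attachment(struct iio_dev *indio_dev, struct dma_buf *dmabuf)
{
	struct dma_buf_attachment *elm, *attach = NULL;
	struct iio_dmabuf_priv *priv;
	int ret;

	ret = iio_dma_resv_lock(dmabuf, false);
	if (ret)
		return ERR_PTR(ret);

	list_for_each_entry(elm, &dmabuf->attachments, node) {
		if (elm->dev != indio_dev->dev.parent)
			continue;

		/* Only take the reference if the last one isn't already gone */
		priv = elm->importer_priv;
		if (kref_get_unless_zero(&priv->ref))
			attach = elm;
		break;
	}

	dma_resv_unlock(dmabuf->resv);

	return attach ?: ERR_PTR(-EPERM);
}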
- dma_resv_unlock(dmabuf->resv);
- return attach ?: ERR_PTR(-EPERM);
+}
+static int iio_buffer_attach_dmabuf(struct iio_dev_buffer_pair *ib,
int __user *user_fd)
+{
- struct iio_dev *indio_dev = ib->indio_dev;
- struct iio_buffer *buffer = ib->buffer;
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- struct dma_buf *dmabuf;
- int err, fd;
- if (!buffer->access->attach_dmabuf
|| !buffer->access->detach_dmabuf
|| !buffer->access->enqueue_dmabuf)
return -EPERM;
- if (copy_from_user(&fd, user_fd, sizeof(fd)))
return -EFAULT;
- priv = kzalloc(sizeof(*priv), GFP_KERNEL);
- if (!priv)
return -ENOMEM;
- spin_lock_init(&priv->lock);
- priv->context = dma_fence_context_alloc(1);
- dmabuf = dma_buf_get(fd);
- if (IS_ERR(dmabuf)) {
err = PTR_ERR(dmabuf);
goto err_free_priv;
- }
- attach = dma_buf_attach(dmabuf, indio_dev->dev.parent);
- if (IS_ERR(attach)) {
err = PTR_ERR(attach);
goto err_dmabuf_put;
- }
- kref_init(&priv->ref);
- priv->buffer = buffer;
- priv->attach = attach;
- attach->importer_priv = priv;
- priv->block = buffer->access->attach_dmabuf(buffer, attach);
- if (IS_ERR(priv->block)) {
err = PTR_ERR(priv->block);
goto err_dmabuf_detach;
- }
- list_add(&priv->entry, &buffer->dmabufs);
This list seems to have no locking. And I think you want to tie the attach refcount 1:1 to this list, to make sure userspace can't double-detach and hence underrun any refcount here. Would also address my concern with your find_attachment() function.
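As a sketch of the kind of lookup this suggests, over a list owned by the IIO buffer rather than dma-buf internals: 'dmabufs_mutex' is a hypothetical new field on struct iio_buffer protecting buffer->dmabufs, with attach adding to the list and detach removing from it under the same lock.

static struct iio_dmabuf_priv *
iio_buffer_find_dmabuf_priv(struct iio_buffer *buffer, struct dma_buf *dmabuf)
{
	struct iio_dmabuf_priv *priv, *found = NULL;

	mutex_lock(&buffer->dmabufs_mutex);

	list_for_each_entry(priv, &buffer->dmabufs, entry) {
		if (priv->attach->dmabuf != dmabuf)
			continue;

		/* Refcount is tied to list membership, so it can't underflow */
		if (kref_get_unless_zero(&priv->ref))
			found = priv;
		break;
	}

	mutex_unlock(&buffer->dmabufs_mutex);

	return found;
}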
- return 0;
+err_dmabuf_detach:
- dma_buf_detach(dmabuf, attach);
+err_dmabuf_put:
- dma_buf_put(dmabuf);
+err_free_priv:
- kfree(priv);
- return err;
+}
+static int iio_buffer_detach_dmabuf(struct iio_dev_buffer_pair *ib, int *user_req) +{
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- struct dma_buf *dmabuf;
- int dmabuf_fd, ret = 0;
- if (copy_from_user(&dmabuf_fd, user_req, sizeof(dmabuf_fd)))
return -EFAULT;
- dmabuf = dma_buf_get(dmabuf_fd);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- attach = iio_buffer_find_attachment(ib->indio_dev, dmabuf);
- if (IS_ERR(attach)) {
ret = PTR_ERR(attach);
goto out_dmabuf_put;
- }
- priv = attach->importer_priv;
- list_del_init(&priv->entry);
- /*
* Unref twice to release the reference obtained with
* iio_buffer_find_attachment() above, and the one obtained in
* iio_buffer_attach_dmabuf().
*/
Again, like in the usb gadget code, this looks like it's exploitable to provoke a refcount underflow by userspace.
- iio_buffer_dmabuf_put(attach);
- iio_buffer_dmabuf_put(attach);
+out_dmabuf_put:
- dma_buf_put(dmabuf);
- return ret;
+}
+static const char * +iio_buffer_dma_fence_get_driver_name(struct dma_fence *fence) +{
- return "iio";
+}
+static void iio_buffer_dma_fence_release(struct dma_fence *fence) +{
- struct iio_dma_fence *iio_fence =
container_of(fence, struct iio_dma_fence, base);
- kfree(iio_fence);
+}
+static const struct dma_fence_ops iio_buffer_dma_fence_ops = {
- .get_driver_name = iio_buffer_dma_fence_get_driver_name,
- .get_timeline_name = iio_buffer_dma_fence_get_driver_name,
- .release = iio_buffer_dma_fence_release,
+};
+static int iio_buffer_enqueue_dmabuf(struct iio_dev_buffer_pair *ib,
struct iio_dmabuf __user *iio_dmabuf_req,
bool nonblock)
+{
- struct iio_dev *indio_dev = ib->indio_dev;
- struct iio_buffer *buffer = ib->buffer;
- struct iio_dmabuf iio_dmabuf;
- struct dma_buf_attachment *attach;
- struct iio_dmabuf_priv *priv;
- enum dma_data_direction dir;
- struct iio_dma_fence *fence;
- struct dma_buf *dmabuf;
- struct sg_table *sgt;
- unsigned long timeout;
- bool dma_to_ram;
- bool cyclic;
- int ret;
- if (copy_from_user(&iio_dmabuf, iio_dmabuf_req, sizeof(iio_dmabuf)))
return -EFAULT;
- if (iio_dmabuf.flags & ~IIO_BUFFER_DMABUF_SUPPORTED_FLAGS)
return -EINVAL;
- cyclic = iio_dmabuf.flags & IIO_BUFFER_DMABUF_CYCLIC;
- /* Cyclic flag is only supported on output buffers */
- if (cyclic && buffer->direction != IIO_BUFFER_DIRECTION_OUT)
return -EINVAL;
- dmabuf = dma_buf_get(iio_dmabuf.fd);
- if (IS_ERR(dmabuf))
return PTR_ERR(dmabuf);
- if (!iio_dmabuf.bytes_used || iio_dmabuf.bytes_used > dmabuf->size) {
ret = -EINVAL;
goto err_dmabuf_put;
- }
- attach = iio_buffer_find_attachment(indio_dev, dmabuf);
- if (IS_ERR(attach)) {
ret = PTR_ERR(attach);
goto err_dmabuf_put;
- }
- priv = attach->importer_priv;
- dma_to_ram = buffer->direction == IIO_BUFFER_DIRECTION_IN;
- dir = dma_to_ram ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
- sgt = dma_buf_map_attachment(attach, dir);
- if (IS_ERR(sgt)) {
ret = PTR_ERR(sgt);
dev_err(&indio_dev->dev, "Unable to map attachment: %d\n", ret);
goto err_attachment_put;
- }
- fence = kmalloc(sizeof(*fence), GFP_KERNEL);
- if (!fence) {
ret = -ENOMEM;
goto err_unmap_attachment;
- }
- fence->priv = priv;
- fence->sgt = sgt;
- fence->dir = dir;
- priv->fence = fence;
- dma_fence_init(&fence->base, &iio_buffer_dma_fence_ops,
&priv->lock, priv->context, 0);
Same comment as for the usb gadget patch: You need a real seqno here (and iio must then guarantee that all transactions are ordered), or a new unordered dma_fence (meaning a new context for each fence, currently there's no non-hackish way to make that happen).
- ret = iio_dma_resv_lock(dmabuf, nonblock);
- if (ret)
goto err_fence_put;
- timeout = nonblock ? 0 : msecs_to_jiffies(DMABUF_ENQUEUE_TIMEOUT_MS);
- /* Make sure we don't have writers */
- ret = (int) dma_resv_wait_timeout(dmabuf->resv, DMA_RESV_USAGE_WRITE,
true, timeout);
- if (ret == 0)
ret = -EBUSY;
- if (ret < 0)
goto err_resv_unlock;
- if (dma_to_ram) {
/*
* If we're writing to the DMABUF, make sure we don't have
* readers
*/
ret = (int) dma_resv_wait_timeout(dmabuf->resv,
DMA_RESV_USAGE_READ, true,
timeout);
if (ret == 0)
ret = -EBUSY;
if (ret < 0)
goto err_resv_unlock;
- }
- ret = dma_resv_reserve_fences(dmabuf->resv, 1);
- if (ret)
goto err_resv_unlock;
- dma_resv_add_fence(dmabuf->resv, &fence->base,
dma_resv_usage_rw(dma_to_ram));
- dma_resv_unlock(dmabuf->resv);
Please add dma_fence_begin/end_signalling annotations here to make sure the locking/memory allocation rules for dma_fence are followed. Same suggestion would also be good in the usb gadget code.
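As a sketch, the annotation would wrap the section between publishing the fence and the point where nothing can block its signalling any more, roughly like this inside iio_buffer_enqueue_dmabuf(), reusing the variables from the quoted code:

	bool cookie;

	/* Nothing inside this section may block on memory allocation or on
	 * locks that are also taken in the dma_fence signalling path. */
	cookie = dma_fence_begin_signalling();

	ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
					     iio_dmabuf.bytes_used, cyclic);
	if (ret)
		/* The fence must still be signalled on failure */
		iio_buffer_signal_dmabuf_done(attach, ret);

	dma_fence_end_signalling(cookie);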
- ret = buffer->access->enqueue_dmabuf(buffer, priv->block, sgt,
iio_dmabuf.bytes_used, cyclic);
- if (ret)
iio_buffer_signal_dmabuf_done(attach, ret);
- dma_buf_put(dmabuf);
- return ret;
+err_resv_unlock:
- dma_resv_unlock(dmabuf->resv);
+err_fence_put:
- dma_fence_put(&fence->base);
+err_unmap_attachment:
- dma_buf_unmap_attachment(attach, sgt, dir);
+err_attachment_put:
- iio_buffer_dmabuf_put(attach);
+err_dmabuf_put:
- dma_buf_put(dmabuf);
- return ret;
+}
+void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret) +{
- struct iio_dmabuf_priv *priv = attach->importer_priv;
- struct iio_dma_fence *fence = priv->fence;
- enum dma_data_direction dir = fence->dir;
- struct sg_table *sgt = fence->sgt;
- dma_fence_get(&fence->base);
- fence->base.error = ret;
- dma_fence_signal(&fence->base);
- dma_fence_put(&fence->base);
- dma_buf_unmap_attachment(attach, sgt, dir);
- iio_buffer_dmabuf_put(attach);
Like with the usb gadget code I have concerns that you might hold up the entire completion machinery here by taking the wrong locks. You probably want to add dma_fence_begin/end_signalling annotations here, and also split out the final attachment unref with all the refcount unravelling into a preallocated worker.
The other issue is dma_buf_unmap_attachment here. That must be called with dma_resv_lock held, but you can't do that here because this is dma_fence completion code.
Usually drivers cache this stuff, but I guess you could also just put that into your unref worker. The more gnarly issue is that if you need this for cache coherency maintenance, then that should be _before_ the dma_fence_signal(), but currently we don't have a dma_buf function which does only the cache maintenance (which wouldn't need dma_resv_lock) and not also the unmapping.
I think to make sure we don't have a big design issue here we need:
- dma_fence_begin/end_signalling critical section annotations in both iio and usb gadget code, for anything that could potentially hold up the dma_fence_signal at any point after a fence has been installed into the dma_resv object.
- probably dma-api debugging or a platform that needs cache flushes to make sure this works. For gpu dma-buf sharing we pretty much side-step this all by assuming everyone does only write-combined mappings ever, or at least the interconnect fabric is reasonable enough that flushing only around cpu access is enough. This assumption very much does not hold in general, and it's fallen apart enough times even for gpu dma-buf sharing in the past.
Otherwise I think we might end up opening pandoras box here a bit and merge code that works in tech demo mode, but which would need serious amounts of subsystem rework in iio or usb gadget to make it work correctly across the board.
Cheers, Sima
+} +EXPORT_SYMBOL_GPL(iio_buffer_signal_dmabuf_done);
+static long iio_buffer_chrdev_ioctl(struct file *filp,
unsigned int cmd, unsigned long arg)
+{
- struct iio_dev_buffer_pair *ib = filp->private_data;
- void __user *_arg = (void __user *)arg;
- switch (cmd) {
- case IIO_BUFFER_DMABUF_ATTACH_IOCTL:
return iio_buffer_attach_dmabuf(ib, _arg);
- case IIO_BUFFER_DMABUF_DETACH_IOCTL:
return iio_buffer_detach_dmabuf(ib, _arg);
- case IIO_BUFFER_DMABUF_ENQUEUE_IOCTL:
return iio_buffer_enqueue_dmabuf(ib, _arg,
filp->f_flags & O_NONBLOCK);
- default:
return IIO_IOCTL_UNHANDLED;
- }
+}
static const struct file_operations iio_buffer_chrdev_fileops = { .owner = THIS_MODULE, .llseek = noop_llseek, .read = iio_buffer_read, .write = iio_buffer_write,
- .unlocked_ioctl = iio_buffer_chrdev_ioctl,
- .compat_ioctl = compat_ptr_ioctl, .poll = iio_buffer_poll, .release = iio_buffer_chrdev_release,
}; diff --git a/include/linux/iio/buffer_impl.h b/include/linux/iio/buffer_impl.h index 89c3fd7c29ca..55d93705c96b 100644 --- a/include/linux/iio/buffer_impl.h +++ b/include/linux/iio/buffer_impl.h @@ -9,8 +9,11 @@ #include <uapi/linux/iio/buffer.h> #include <linux/iio/buffer.h> +struct dma_buf_attachment; struct iio_dev; +struct iio_dma_buffer_block; struct iio_buffer; +struct sg_table; /**
- INDIO_BUFFER_FLAG_FIXED_WATERMARK - Watermark level of the buffer can not be
@@ -39,6 +42,13 @@ struct iio_buffer;
device stops sampling. Calls are balanced with @enable.
- @release: called when the last reference to the buffer is dropped,
should free all resources allocated by the buffer.
- @attach_dmabuf: called from userspace via ioctl to attach one external
DMABUF.
- @detach_dmabuf: called from userspace via ioctl to detach one previously
attached DMABUF.
- @enqueue_dmabuf: called from userspace via ioctl to queue this DMABUF
object to this buffer. Requires a valid DMABUF fd, that
was previously attached to this buffer.
- @modes: Supported operating modes by this buffer type
- @flags: A bitmask combination of INDIO_BUFFER_FLAG_*
@@ -68,6 +78,14 @@ struct iio_buffer_access_funcs { void (*release)(struct iio_buffer *buffer);
- struct iio_dma_buffer_block * (*attach_dmabuf)(struct iio_buffer *buffer,
struct dma_buf_attachment *attach);
- void (*detach_dmabuf)(struct iio_buffer *buffer,
struct iio_dma_buffer_block *block);
- int (*enqueue_dmabuf)(struct iio_buffer *buffer,
struct iio_dma_buffer_block *block,
struct sg_table *sgt, size_t size, bool cyclic);
- unsigned int modes; unsigned int flags;
}; @@ -136,6 +154,9 @@ struct iio_buffer { /* @ref: Reference count of the buffer. */ struct kref ref;
- /* @dmabufs: List of DMABUF attachments */
- struct list_head dmabufs;
}; /** @@ -156,9 +177,14 @@ int iio_update_buffers(struct iio_dev *indio_dev, **/ void iio_buffer_init(struct iio_buffer *buffer); +void iio_buffer_dmabuf_get(struct dma_buf_attachment *attach); +void iio_buffer_dmabuf_put(struct dma_buf_attachment *attach);
struct iio_buffer *iio_buffer_get(struct iio_buffer *buffer); void iio_buffer_put(struct iio_buffer *buffer); +void iio_buffer_signal_dmabuf_done(struct dma_buf_attachment *attach, int ret);
#else /* CONFIG_IIO_BUFFER */ static inline void iio_buffer_get(struct iio_buffer *buffer) {} diff --git a/include/uapi/linux/iio/buffer.h b/include/uapi/linux/iio/buffer.h index 13939032b3f6..c666aa95e532 100644 --- a/include/uapi/linux/iio/buffer.h +++ b/include/uapi/linux/iio/buffer.h @@ -5,6 +5,28 @@ #ifndef _UAPI_IIO_BUFFER_H_ #define _UAPI_IIO_BUFFER_H_ +#include <linux/types.h>
+/* Flags for iio_dmabuf.flags */ +#define IIO_BUFFER_DMABUF_CYCLIC (1 << 0) +#define IIO_BUFFER_DMABUF_SUPPORTED_FLAGS 0x00000001
+/**
- struct iio_dmabuf - Descriptor for a single IIO DMABUF object
- @fd: file descriptor of the DMABUF object
- @flags: one or more IIO_BUFFER_DMABUF_* flags
- @bytes_used: number of bytes used in this DMABUF for the data transfer.
Should generally be set to the DMABUF's size.
- */
+struct iio_dmabuf {
- __u32 fd;
- __u32 flags;
- __u64 bytes_used;
+};
#define IIO_BUFFER_GET_FD_IOCTL _IOWR('i', 0x91, int) +#define IIO_BUFFER_DMABUF_ATTACH_IOCTL _IOW('i', 0x92, int) +#define IIO_BUFFER_DMABUF_DETACH_IOCTL _IOW('i', 0x93, int) +#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL _IOW('i', 0x94, struct iio_dmabuf)
#endif /* _UAPI_IIO_BUFFER_H_ */
2.43.0
Implement iio_dma_buffer_attach_dmabuf(), iio_dma_buffer_detach_dmabuf() and iio_dma_buffer_transfer_dmabuf(), which can then be used by the IIO DMA buffer implementations.
Signed-off-by: Paul Cercueil <paul@crapouillou.net>
--- v3: Update code to provide the functions that will be used as callbacks for the new IOCTLs. --- drivers/iio/buffer/industrialio-buffer-dma.c | 157 +++++++++++++++++-- include/linux/iio/buffer-dma.h | 26 +++ 2 files changed, 170 insertions(+), 13 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c index 5610ba67925e..adb64f975064 100644 --- a/drivers/iio/buffer/industrialio-buffer-dma.c +++ b/drivers/iio/buffer/industrialio-buffer-dma.c @@ -14,6 +14,7 @@ #include <linux/poll.h> #include <linux/iio/buffer_impl.h> #include <linux/iio/buffer-dma.h> +#include <linux/dma-buf.h> #include <linux/dma-mapping.h> #include <linux/sizes.h>
@@ -94,14 +95,24 @@ static void iio_buffer_block_release(struct kref *kref) { struct iio_dma_buffer_block *block = container_of(kref, struct iio_dma_buffer_block, kref); + struct iio_dma_buffer_queue *queue = block->queue;
- WARN_ON(block->state != IIO_BLOCK_STATE_DEAD); + WARN_ON(block->fileio && block->state != IIO_BLOCK_STATE_DEAD);
- dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size), - block->vaddr, block->phys_addr); + mutex_lock(&queue->lock);
- iio_buffer_put(&block->queue->buffer); + if (block->fileio) { + dma_free_coherent(queue->dev, PAGE_ALIGN(block->size), + block->vaddr, block->phys_addr); + queue->num_fileio_blocks--; + } + + queue->num_blocks--; kfree(block); + + mutex_unlock(&queue->lock); + + iio_buffer_put(&queue->buffer); }
static void iio_buffer_block_get(struct iio_dma_buffer_block *block) @@ -163,7 +174,7 @@ static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf) }
static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block( - struct iio_dma_buffer_queue *queue, size_t size) + struct iio_dma_buffer_queue *queue, size_t size, bool fileio) { struct iio_dma_buffer_block *block;
@@ -171,13 +182,16 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block( if (!block) return NULL;
- block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size), - &block->phys_addr, GFP_KERNEL); - if (!block->vaddr) { - kfree(block); - return NULL; + if (fileio) { + block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size), + &block->phys_addr, GFP_KERNEL); + if (!block->vaddr) { + kfree(block); + return NULL; + } }
+ block->fileio = fileio; block->size = size; block->state = IIO_BLOCK_STATE_DONE; block->queue = queue; @@ -186,6 +200,9 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
iio_buffer_get(&queue->buffer);
+ queue->num_blocks++; + queue->num_fileio_blocks += fileio; + return block; }
@@ -211,6 +228,9 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block) _iio_dma_buffer_block_done(block); spin_unlock_irqrestore(&queue->list_lock, flags);
+ if (!block->fileio) + iio_buffer_signal_dmabuf_done(block->attach, 0); + iio_buffer_block_put_atomic(block); wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM); } @@ -237,10 +257,14 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue, list_del(&block->head); block->bytes_used = 0; _iio_dma_buffer_block_done(block); + + if (!block->fileio) + iio_buffer_signal_dmabuf_done(block->attach, -EINTR); iio_buffer_block_put_atomic(block); } spin_unlock_irqrestore(&queue->list_lock, flags);
+ queue->fileio.enabled = false; wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM); } EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort); @@ -261,6 +285,12 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block) } }
+static bool iio_dma_buffer_fileio_mode(struct iio_dma_buffer_queue *queue) +{ + return queue->fileio.enabled || + queue->num_blocks == queue->num_fileio_blocks; +} + /** * iio_dma_buffer_request_update() - DMA buffer request_update callback * @buffer: The buffer which to request an update @@ -287,6 +317,12 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer)
mutex_lock(&queue->lock);
+ queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue); + + /* If DMABUFs were created, disable fileio interface */ + if (!queue->fileio.enabled) + goto out_unlock; + /* Allocations are page aligned */ if (PAGE_ALIGN(queue->fileio.block_size) == PAGE_ALIGN(size)) try_reuse = true; @@ -317,7 +353,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) block = queue->fileio.blocks[i]; if (block->state == IIO_BLOCK_STATE_DEAD) { /* Could not reuse it */ - iio_buffer_block_put(block); + iio_buffer_block_put_atomic(block); block = NULL; } else { block->size = size; @@ -327,7 +363,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) }
if (!block) { - block = iio_dma_buffer_alloc_block(queue, size); + block = iio_dma_buffer_alloc_block(queue, size, true); if (!block) { ret = -ENOMEM; goto out_unlock; @@ -363,7 +399,7 @@ static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue) for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) { if (!queue->fileio.blocks[i]) continue; - iio_buffer_block_put(queue->fileio.blocks[i]); + iio_buffer_block_put_atomic(queue->fileio.blocks[i]); queue->fileio.blocks[i] = NULL; } queue->fileio.active_block = NULL; @@ -384,8 +420,12 @@ static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue,
block->state = IIO_BLOCK_STATE_ACTIVE; iio_buffer_block_get(block); + ret = queue->ops->submit(queue, block); if (ret) { + if (!block->fileio) + iio_buffer_signal_dmabuf_done(block->attach, ret); + /* * This is a bit of a problem and there is not much we can do * other then wait for the buffer to be disabled and re-enabled @@ -588,6 +628,97 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf) } EXPORT_SYMBOL_GPL(iio_dma_buffer_data_available);
+struct iio_dma_buffer_block * +iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer, + struct dma_buf_attachment *attach) +{ + struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer); + struct iio_dma_buffer_block *block; + int err; + + mutex_lock(&queue->lock); + + /* + * If the buffer is enabled and in fileio mode new blocks can't be + * allocated. + */ + if (queue->fileio.enabled) { + err = -EBUSY; + goto err_unlock; + } + + block = iio_dma_buffer_alloc_block(queue, attach->dmabuf->size, false); + if (!block) { + err = -ENOMEM; + goto err_unlock; + } + + block->attach = attach; + + /* Free memory that might be in use for fileio mode */ + iio_dma_buffer_fileio_free(queue); + + mutex_unlock(&queue->lock); + + return block; + +err_unlock: + mutex_unlock(&queue->lock); + return ERR_PTR(err); +} +EXPORT_SYMBOL_GPL(iio_dma_buffer_attach_dmabuf); + +void iio_dma_buffer_detach_dmabuf(struct iio_buffer *buffer, + struct iio_dma_buffer_block *block) +{ + block->state = IIO_BLOCK_STATE_DEAD; + iio_buffer_block_put_atomic(block); +} +EXPORT_SYMBOL_GPL(iio_dma_buffer_detach_dmabuf); + +static int iio_dma_can_enqueue_block(struct iio_dma_buffer_block *block) +{ + struct iio_dma_buffer_queue *queue = block->queue; + + /* If in fileio mode buffers can't be enqueued. */ + if (queue->fileio.enabled) + return -EBUSY; + + switch (block->state) { + case IIO_BLOCK_STATE_QUEUED: + return -EPERM; + case IIO_BLOCK_STATE_DONE: + return 0; + default: + return -EBUSY; + } +} + +int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer, + struct iio_dma_buffer_block *block, + struct sg_table *sgt, + size_t size, bool cyclic) +{ + struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer); + int ret = 0; + + mutex_lock(&queue->lock); + ret = iio_dma_can_enqueue_block(block); + if (ret < 0) + goto out_mutex_unlock; + + block->bytes_used = size; + block->cyclic = cyclic; + block->sg_table = sgt; + + iio_dma_buffer_enqueue(queue, block); + +out_mutex_unlock: + mutex_unlock(&queue->lock); + return ret; +} +EXPORT_SYMBOL_GPL(iio_dma_buffer_enqueue_dmabuf); + /** * iio_dma_buffer_set_bytes_per_datum() - DMA buffer set_bytes_per_datum callback * @buffer: Buffer to set the bytes-per-datum for diff --git a/include/linux/iio/buffer-dma.h b/include/linux/iio/buffer-dma.h index 18d3702fa95d..7be12a6bff5b 100644 --- a/include/linux/iio/buffer-dma.h +++ b/include/linux/iio/buffer-dma.h @@ -16,6 +16,8 @@ struct iio_dma_buffer_queue; struct iio_dma_buffer_ops; struct device; +struct dma_buf_attachment; +struct sg_table;
/** * enum iio_block_state - State of a struct iio_dma_buffer_block @@ -41,6 +43,8 @@ enum iio_block_state { * @queue: Parent DMA buffer queue * @kref: kref used to manage the lifetime of block * @state: Current state of the block + * @cyclic: True if this is a cyclic buffer + * @fileio: True if this buffer is used for fileio mode */ struct iio_dma_buffer_block { /* May only be accessed by the owner of the block */ @@ -63,6 +67,12 @@ struct iio_dma_buffer_block { * queue->list_lock if the block is not owned by the core. */ enum iio_block_state state; + + bool cyclic; + bool fileio; + + struct dma_buf_attachment *attach; + struct sg_table *sg_table; };
/** @@ -72,6 +82,7 @@ struct iio_dma_buffer_block { * @pos: Read offset in the active block * @block_size: Size of each block * @next_dequeue: index of next block that will be dequeued + * @enabled: Whether the buffer is operating in fileio mode */ struct iio_dma_buffer_queue_fileio { struct iio_dma_buffer_block *blocks[2]; @@ -80,6 +91,7 @@ struct iio_dma_buffer_queue_fileio { size_t block_size;
unsigned int next_dequeue; + bool enabled; };
/** @@ -95,6 +107,8 @@ struct iio_dma_buffer_queue_fileio { * the DMA controller * @incoming: List of buffers on the incoming queue * @active: Whether the buffer is currently active + * @num_blocks: Total number of DMA blocks + * @num_fileio_blocks: Number of DMA blocks for fileio mode * @fileio: FileIO state */ struct iio_dma_buffer_queue { @@ -107,6 +121,8 @@ struct iio_dma_buffer_queue { struct list_head incoming;
bool active; + unsigned int num_blocks; + unsigned int num_fileio_blocks;
struct iio_dma_buffer_queue_fileio fileio; }; @@ -142,4 +158,14 @@ int iio_dma_buffer_init(struct iio_dma_buffer_queue *queue, void iio_dma_buffer_exit(struct iio_dma_buffer_queue *queue); void iio_dma_buffer_release(struct iio_dma_buffer_queue *queue);
+struct iio_dma_buffer_block * +iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer, + struct dma_buf_attachment *attach); +void iio_dma_buffer_detach_dmabuf(struct iio_buffer *buffer, + struct iio_dma_buffer_block *block); +int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer, + struct iio_dma_buffer_block *block, + struct sg_table *sgt, + size_t size, bool cyclic); + #endif
On Tue, 19 Dec 2023 18:50:07 +0100 Paul Cercueil paul@crapouillou.net wrote:
Implement iio_dma_buffer_attach_dmabuf(), iio_dma_buffer_detach_dmabuf() and iio_dma_buffer_transfer_dmabuf(), which can then be used by the IIO DMA buffer implementations.
Signed-off-by: Paul Cercueil paul@crapouillou.net
Hi Paul,
A few comments in here. Mostly places where the cleanup.h guard() stuff can simplify error paths.
Overall this looks reasonable to me.
Jonathan
v3: Update code to provide the functions that will be used as callbacks for the new IOCTLs.
drivers/iio/buffer/industrialio-buffer-dma.c | 157 +++++++++++++++++-- include/linux/iio/buffer-dma.h | 26 +++ 2 files changed, 170 insertions(+), 13 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c index 5610ba67925e..adb64f975064 100644 --- a/drivers/iio/buffer/industrialio-buffer-dma.c +++ b/drivers/iio/buffer/industrialio-buffer-dma.c @@ -14,6 +14,7 @@ #include <linux/poll.h> #include <linux/iio/buffer_impl.h> #include <linux/iio/buffer-dma.h> +#include <linux/dma-buf.h> #include <linux/dma-mapping.h> #include <linux/sizes.h> @@ -94,14 +95,24 @@ static void iio_buffer_block_release(struct kref *kref) { struct iio_dma_buffer_block *block = container_of(kref, struct iio_dma_buffer_block, kref);
- struct iio_dma_buffer_queue *queue = block->queue;
- WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
- WARN_ON(block->fileio && block->state != IIO_BLOCK_STATE_DEAD);
- dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size),
block->vaddr, block->phys_addr);
- mutex_lock(&queue->lock);
- iio_buffer_put(&block->queue->buffer);
- if (block->fileio) {
dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
block->vaddr, block->phys_addr);
queue->num_fileio_blocks--;
- }
- queue->num_blocks--; kfree(block);
- mutex_unlock(&queue->lock);
- iio_buffer_put(&queue->buffer);
} static void iio_buffer_block_get(struct iio_dma_buffer_block *block) @@ -163,7 +174,7 @@ static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf) } static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
- struct iio_dma_buffer_queue *queue, size_t size)
- struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
{ struct iio_dma_buffer_block *block; @@ -171,13 +182,16 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block( if (!block) return NULL;
- block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
&block->phys_addr, GFP_KERNEL);
- if (!block->vaddr) {
kfree(block);
return NULL;
- if (fileio) {
block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
&block->phys_addr, GFP_KERNEL);
if (!block->vaddr) {
kfree(block);
return NULL;
}
}
- block->fileio = fileio; block->size = size; block->state = IIO_BLOCK_STATE_DONE; block->queue = queue;
@@ -186,6 +200,9 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block( iio_buffer_get(&queue->buffer);
- queue->num_blocks++;
- queue->num_fileio_blocks += fileio;
Adding a boolean is non intuitive.
if (fileio) queue->num_fileio_blocks++;
probably easier to read and compiler should be able to figure out how to optimise the condition away.
- return block;
} @@ -211,6 +228,9 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block) _iio_dma_buffer_block_done(block); spin_unlock_irqrestore(&queue->list_lock, flags);
- if (!block->fileio)
iio_buffer_signal_dmabuf_done(block->attach, 0);
- iio_buffer_block_put_atomic(block); wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM);
} @@ -237,10 +257,14 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue, list_del(&block->head); block->bytes_used = 0; _iio_dma_buffer_block_done(block);
if (!block->fileio)
iio_buffer_signal_dmabuf_done(block->attach, -EINTR);
iio_buffer_block_put_atomic(block); } spin_unlock_irqrestore(&queue->list_lock, flags);
- queue->fileio.enabled = false;
While this obviously doesn't need to be conditional if it can already be false, it might be easier to follow the code flow if it didn't check whether we were doing fileio or not before disabling it.
wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM); } EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort); @@ -261,6 +285,12 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block) } } +static bool iio_dma_buffer_fileio_mode(struct iio_dma_buffer_queue *queue) +{
- return queue->fileio.enabled ||
queue->num_blocks == queue->num_fileio_blocks;
This is a little odd. So it would be good to have a comment on why this condition. Or rename the function to imply it's checking if enabled, or can be enabled.
At first glance I expected a function with this name to just be an accessor function, e.g. return queue->fileio.enabled;
+}
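(As an illustration of the comment above, and not part of the patch: the helper could be renamed and commented along these lines, keeping the exact condition from the patch. The name is only a suggestion.)

static bool iio_dma_buffer_can_use_fileio(struct iio_dma_buffer_queue *queue)
{
	/*
	 * Fileio can only be used (or stay enabled) as long as all blocks
	 * belong to fileio, i.e. no DMABUF has been attached yet. As soon
	 * as a DMABUF block exists, num_blocks != num_fileio_blocks and
	 * the legacy read() path has to stay disabled.
	 */
	return queue->fileio.enabled ||
	       queue->num_blocks == queue->num_fileio_blocks;
}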
/**
- iio_dma_buffer_request_update() - DMA buffer request_update callback
- @buffer: The buffer which to request an update
@@ -287,6 +317,12 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) mutex_lock(&queue->lock);
- queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);
- /* If DMABUFs were created, disable fileio interface */
- if (!queue->fileio.enabled)
goto out_unlock;
- /* Allocations are page aligned */ if (PAGE_ALIGN(queue->fileio.block_size) == PAGE_ALIGN(size)) try_reuse = true;
@@ -317,7 +353,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) block = queue->fileio.blocks[i]; if (block->state == IIO_BLOCK_STATE_DEAD) { /* Could not reuse it */
iio_buffer_block_put(block);
iio_buffer_block_put_atomic(block); block = NULL; } else { block->size = size;
@@ -327,7 +363,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) } if (!block) {
block = iio_dma_buffer_alloc_block(queue, size);
block = iio_dma_buffer_alloc_block(queue, size, true); if (!block) { ret = -ENOMEM; goto out_unlock;
@@ -363,7 +399,7 @@ static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue) for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) { if (!queue->fileio.blocks[i]) continue;
iio_buffer_block_put(queue->fileio.blocks[i]);
iio_buffer_block_put_atomic(queue->fileio.blocks[i]);
For these cases that are atomic or not, it might be worth calling out why they need to be atomic.
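(One way such a call-out could read, assuming the reason is the locking change in this patch: iio_buffer_block_release() now takes queue->lock, which is already held at these call sites, so the final put has to go through the deferred/atomic variant. A sketch, not the author's wording.)

	/*
	 * Use the _atomic() variant: this may drop the last reference, and
	 * iio_buffer_block_release() now takes queue->lock, which is
	 * already held here.
	 */
	iio_buffer_block_put_atomic(queue->fileio.blocks[i]);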
queue->fileio.blocks[i] = NULL;
} queue->fileio.active_block = NULL; @@ -384,8 +420,12 @@ static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue, block->state = IIO_BLOCK_STATE_ACTIVE; iio_buffer_block_get(block);
- ret = queue->ops->submit(queue, block); if (ret) {
if (!block->fileio)
iio_buffer_signal_dmabuf_done(block->attach, ret);
- /*
- This is a bit of a problem and there is not much we can do
- other then wait for the buffer to be disabled and re-enabled
@@ -588,6 +628,97 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf) } EXPORT_SYMBOL_GPL(iio_dma_buffer_data_available); +struct iio_dma_buffer_block * +iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer,
struct dma_buf_attachment *attach)
+{
- struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
- struct iio_dma_buffer_block *block;
- int err;
- mutex_lock(&queue->lock);
guard(mutex)(&queue->lock);
- /*
* If the buffer is enabled and in fileio mode new blocks can't be
* allocated.
*/
- if (queue->fileio.enabled) {
err = -EBUSY;
return ERR_PTR(-EBUSY);
goto err_unlock;
- }
- block = iio_dma_buffer_alloc_block(queue, attach->dmabuf->size, false);
- if (!block) {
err = -ENOMEM;
return
goto err_unlock;
- }
- block->attach = attach;
- /* Free memory that might be in use for fileio mode */
- iio_dma_buffer_fileio_free(queue);
- mutex_unlock(&queue->lock);
Drop this as unneeded if you use guard()
- return block;
+err_unlock:
- mutex_unlock(&queue->lock);
- return ERR_PTR(err);
+} +EXPORT_SYMBOL_GPL(iio_dma_buffer_attach_dmabuf);
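(For reference, a rough sketch of the attach path with the guard() suggestion applied, using cleanup.h; this is not the code from the series, only the review comments above folded in.)

struct iio_dma_buffer_block *
iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer,
			     struct dma_buf_attachment *attach)
{
	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
	struct iio_dma_buffer_block *block;

	guard(mutex)(&queue->lock);

	/*
	 * If the buffer is enabled and in fileio mode new blocks can't be
	 * allocated.
	 */
	if (queue->fileio.enabled)
		return ERR_PTR(-EBUSY);

	block = iio_dma_buffer_alloc_block(queue, attach->dmabuf->size, false);
	if (!block)
		return ERR_PTR(-ENOMEM);

	block->attach = attach;

	/* Free memory that might be in use for fileio mode */
	iio_dma_buffer_fileio_free(queue);

	return block;
}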
+static int iio_dma_can_enqueue_block(struct iio_dma_buffer_block *block) +{
- struct iio_dma_buffer_queue *queue = block->queue;
- /* If in fileio mode buffers can't be enqueued. */
- if (queue->fileio.enabled)
return -EBUSY;
- switch (block->state) {
- case IIO_BLOCK_STATE_QUEUED:
return -EPERM;
- case IIO_BLOCK_STATE_DONE:
return 0;
- default:
return -EBUSY;
Is this a real condition or just avoiding a compile warning? If it's real I'd like the various states that lead to it to be listed here just so we can more easily understand why -EBUSY is appropriate.
- }
+}
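(If it helps, here is the same switch with the remaining states spelled out, using the iio_block_state values visible elsewhere in this patch; whether -EBUSY is the right return for both of them is exactly the question above.)

	switch (block->state) {
	case IIO_BLOCK_STATE_QUEUED:
		return -EPERM;
	case IIO_BLOCK_STATE_DONE:
		return 0;
	case IIO_BLOCK_STATE_ACTIVE:	/* already submitted to the DMA */
	case IIO_BLOCK_STATE_DEAD:	/* block is being torn down */
		return -EBUSY;
	}

	return -EBUSY;	/* not reachable, keeps the compiler happy */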
+int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
struct iio_dma_buffer_block *block,
struct sg_table *sgt,
size_t size, bool cyclic)
+{
- struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
- int ret = 0;
No need to init as it's always set.
- mutex_lock(&queue->lock);
guard(mutex)(&queue->lock);
Then can do direct returns on error and not bother with the manual unlock.
- ret = iio_dma_can_enqueue_block(block);
- if (ret < 0)
goto out_mutex_unlock;
- block->bytes_used = size;
- block->cyclic = cyclic;
- block->sg_table = sgt;
- iio_dma_buffer_enqueue(queue, block);
+out_mutex_unlock:
- mutex_unlock(&queue->lock);
- return ret;
+} +EXPORT_SYMBOL_GPL(iio_dma_buffer_enqueue_dmabuf);
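(Likewise, a sketch of the enqueue path with guard() and without the ret initialisation, per the two comments above; again, not the code as posted.)

int iio_dma_buffer_enqueue_dmabuf(struct iio_buffer *buffer,
				  struct iio_dma_buffer_block *block,
				  struct sg_table *sgt,
				  size_t size, bool cyclic)
{
	struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer);
	int ret;

	guard(mutex)(&queue->lock);

	ret = iio_dma_can_enqueue_block(block);
	if (ret < 0)
		return ret;

	block->bytes_used = size;
	block->cyclic = cyclic;
	block->sg_table = sgt;

	iio_dma_buffer_enqueue(queue, block);

	return 0;
}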
On Thu, 2023-12-21 at 16:04 +0000, Jonathan Cameron wrote:
On Tue, 19 Dec 2023 18:50:07 +0100 Paul Cercueil paul@crapouillou.net wrote:
Implement iio_dma_buffer_attach_dmabuf(), iio_dma_buffer_detach_dmabuf() and iio_dma_buffer_transfer_dmabuf(), which can then be used by the IIO DMA buffer implementations.
Signed-off-by: Paul Cercueil paul@crapouillou.net
Hi Paul,
A few comments in here. Mostly places where the cleanup.h guard() stuff can simplify error paths.
Overall this looks reasonable to me.
Jonathan
v3: Update code to provide the functions that will be used as callbacks for the new IOCTLs.
drivers/iio/buffer/industrialio-buffer-dma.c | 157 +++++++++++++++++-- include/linux/iio/buffer-dma.h | 26 +++ 2 files changed, 170 insertions(+), 13 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c index 5610ba67925e..adb64f975064 100644 --- a/drivers/iio/buffer/industrialio-buffer-dma.c +++ b/drivers/iio/buffer/industrialio-buffer-dma.c @@ -14,6 +14,7 @@ #include <linux/poll.h> #include <linux/iio/buffer_impl.h> #include <linux/iio/buffer-dma.h> +#include <linux/dma-buf.h> #include <linux/dma-mapping.h> #include <linux/sizes.h> @@ -94,14 +95,24 @@ static void iio_buffer_block_release(struct kref *kref) { struct iio_dma_buffer_block *block = container_of(kref, struct iio_dma_buffer_block, kref); + struct iio_dma_buffer_queue *queue = block->queue; - WARN_ON(block->state != IIO_BLOCK_STATE_DEAD); + WARN_ON(block->fileio && block->state != IIO_BLOCK_STATE_DEAD); - dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size), - block->vaddr, block->phys_addr); + mutex_lock(&queue->lock); - iio_buffer_put(&block->queue->buffer); + if (block->fileio) { + dma_free_coherent(queue->dev, PAGE_ALIGN(block->size), + block->vaddr, block->phys_addr); + queue->num_fileio_blocks--; + }
+ queue->num_blocks--; kfree(block);
+ mutex_unlock(&queue->lock);
+ iio_buffer_put(&queue->buffer); } static void iio_buffer_block_get(struct iio_dma_buffer_block *block) @@ -163,7 +174,7 @@ static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf) } static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block( - struct iio_dma_buffer_queue *queue, size_t size) + struct iio_dma_buffer_queue *queue, size_t size, bool fileio) { struct iio_dma_buffer_block *block; @@ -171,13 +182,16 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block( if (!block) return NULL; - block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size), - &block->phys_addr, GFP_KERNEL); - if (!block->vaddr) { - kfree(block); - return NULL; + if (fileio) { + block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size), + &block->phys_addr, GFP_KERNEL); + if (!block->vaddr) { + kfree(block); + return NULL; + } } + block->fileio = fileio; block->size = size; block->state = IIO_BLOCK_STATE_DONE; block->queue = queue; @@ -186,6 +200,9 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block( iio_buffer_get(&queue->buffer); + queue->num_blocks++; + queue->num_fileio_blocks += fileio;
Adding a boolean is non intuitive.
if (fileio) queue->num_fileio_blocks++;
probably easier to read and compiler should be able to figure out how to optimise the condition away.
return block; } @@ -211,6 +228,9 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block) _iio_dma_buffer_block_done(block); spin_unlock_irqrestore(&queue->list_lock, flags); + if (!block->fileio) + iio_buffer_signal_dmabuf_done(block->attach, 0);
iio_buffer_block_put_atomic(block); wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM); } @@ -237,10 +257,14 @@ void iio_dma_buffer_block_list_abort(struct iio_dma_buffer_queue *queue, list_del(&block->head); block->bytes_used = 0; _iio_dma_buffer_block_done(block);
+ if (!block->fileio) + iio_buffer_signal_dmabuf_done(block->attach, -EINTR); iio_buffer_block_put_atomic(block); } spin_unlock_irqrestore(&queue->list_lock, flags); + queue->fileio.enabled = false;
While this obviously doesn't need to be conditional if it can already be false, it might be easier to follow the code flow if it didn't check whether we were doing fileio or not before disabling it.
wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDNORM); } EXPORT_SYMBOL_GPL(iio_dma_buffer_block_list_abort); @@ -261,6 +285,12 @@ static bool iio_dma_block_reusable(struct iio_dma_buffer_block *block) } } +static bool iio_dma_buffer_fileio_mode(struct iio_dma_buffer_queue *queue) +{ + return queue->fileio.enabled || + queue->num_blocks == queue->num_fileio_blocks;
This is a little odd. So it would be good to have a comment on why this condition. Or rename the function to imply it's checking if enabled, or can be enabled.
At first glance I expected a function with this name to just be an accessor function, e.g. return queue->fileio.enabled;
+}
/** * iio_dma_buffer_request_update() - DMA buffer request_update callback * @buffer: The buffer which to request an update @@ -287,6 +317,12 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) mutex_lock(&queue->lock); + queue->fileio.enabled = iio_dma_buffer_fileio_mode(queue);
+ /* If DMABUFs were created, disable fileio interface */ + if (!queue->fileio.enabled) + goto out_unlock;
/* Allocations are page aligned */ if (PAGE_ALIGN(queue->fileio.block_size) == PAGE_ALIGN(size)) try_reuse = true; @@ -317,7 +353,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) block = queue->fileio.blocks[i]; if (block->state == IIO_BLOCK_STATE_DEAD) { /* Could not reuse it */ - iio_buffer_block_put(block); + iio_buffer_block_put_atomic(block); block = NULL; } else { block->size = size; @@ -327,7 +363,7 @@ int iio_dma_buffer_request_update(struct iio_buffer *buffer) } if (!block) { - block = iio_dma_buffer_alloc_block(queue, size); + block = iio_dma_buffer_alloc_block(queue, size, true); if (!block) { ret = -ENOMEM; goto out_unlock; @@ -363,7 +399,7 @@ static void iio_dma_buffer_fileio_free(struct iio_dma_buffer_queue *queue) for (i = 0; i < ARRAY_SIZE(queue->fileio.blocks); i++) { if (!queue->fileio.blocks[i]) continue; - iio_buffer_block_put(queue->fileio.blocks[i]); + iio_buffer_block_put_atomic(queue->fileio.blocks[i]);
For these cases that are atomic or not, it might be worth calling out why they need to be atomic.
queue->fileio.blocks[i] = NULL; } queue->fileio.active_block = NULL; @@ -384,8 +420,12 @@ static void iio_dma_buffer_submit_block(struct iio_dma_buffer_queue *queue, block->state = IIO_BLOCK_STATE_ACTIVE; iio_buffer_block_get(block);
ret = queue->ops->submit(queue, block); if (ret) { + if (!block->fileio) + iio_buffer_signal_dmabuf_done(block->attach, ret);
/* * This is a bit of a problem and there is not much we can do * other then wait for the buffer to be disabled and re-enabled @@ -588,6 +628,97 @@ size_t iio_dma_buffer_data_available(struct iio_buffer *buf) } EXPORT_SYMBOL_GPL(iio_dma_buffer_data_available); +struct iio_dma_buffer_block * +iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer, + struct dma_buf_attachment *attach) +{ + struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer); + struct iio_dma_buffer_block *block; + int err;
+ mutex_lock(&queue->lock);
guard(mutex)(&queue->lock);
I'm also a big fan of this new stuff but shouldn't we have a prep patch making the transition to that? Otherwise we'll end up with a mix of styles. I'm happy to clean up stuff with follow up patches (even some coding style could be improved IIRC) but typically that's not how we handle things like this I believe...
- Nuno Sá
+struct iio_dma_buffer_block * +iio_dma_buffer_attach_dmabuf(struct iio_buffer *buffer, + struct dma_buf_attachment *attach) +{ + struct iio_dma_buffer_queue *queue = iio_buffer_to_queue(buffer); + struct iio_dma_buffer_block *block; + int err;
+ mutex_lock(&queue->lock);
guard(mutex)(&queue->lock);
I'm also a big fan of this new stuff but shouldn't we have a prep patch making the transition to that? Otherwise we'll end up with a mix of styles. I'm happy to clean up stuff with follow up patches (even some coding style could be improved IIRC) but typically that's not how we handle things like this I believe...
Ideally yes, I think a prep patch would make sense - but I'm not that fussed about it and follow up patches would be fine here.
There are cases for using this that are much easier to justify than others, so I don't really mind if simple
mutex_lock(); do_something mutex_unlock();
cases are left for ever not using guard(), though I also don't mind if people want to use scoped_guard() or guard for these just to be consistent. Either way is pretty easy to read. We'll probably also continue to gain new uses of this logic such as the conditional guard stuff that is queued up for the next merge window and whilst that is going on we are going to have a bit of mixed style.
Jonathan
- Nuno Sá
Use the functions provided by the buffer-dma core to implement the DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.
Since we want to be able to transfer an arbitrary number of bytes and not necessarily the full DMABUF, the associated scatterlist is converted to an array of DMA addresses + lengths, which is then passed to dmaengine_prep_slave_dma_array().
Signed-off-by: Paul Cercueil paul@crapouillou.net
--- v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to work with the new functions introduced in industrialio-buffer-dma.c.
v5: - Use the new dmaengine_prep_slave_dma_vec(). - Restrict to input buffers, since output buffers are not yet supported by IIO buffers. --- .../buffer/industrialio-buffer-dmaengine.c | 52 ++++++++++++++++--- 1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c index 5f85ba38e6f6..825d76a24a67 100644 --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c @@ -64,15 +64,51 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue, struct dmaengine_buffer *dmaengine_buffer = iio_buffer_to_dmaengine_buffer(&queue->buffer); struct dma_async_tx_descriptor *desc; + unsigned int i, nents; + struct scatterlist *sgl; + struct dma_vec *vecs; + size_t max_size; dma_cookie_t cookie; + size_t len_total;
- block->bytes_used = min(block->size, dmaengine_buffer->max_size); - block->bytes_used = round_down(block->bytes_used, - dmaengine_buffer->align); + if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) { + /* We do not yet support output buffers. */ + return -EINVAL; + }
- desc = dmaengine_prep_slave_single(dmaengine_buffer->chan, - block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM, - DMA_PREP_INTERRUPT); + if (block->sg_table) { + sgl = block->sg_table->sgl; + nents = sg_nents_for_len(sgl, block->bytes_used); + + vecs = kmalloc_array(nents, sizeof(*vecs), GFP_KERNEL); + if (!vecs) + return -ENOMEM; + + len_total = block->bytes_used; + + for (i = 0; i < nents; i++) { + vecs[i].addr = sg_dma_address(sgl); + vecs[i].len = min(sg_dma_len(sgl), len_total); + len_total -= vecs[i].len; + + sgl = sg_next(sgl); + } + + desc = dmaengine_prep_slave_dma_vec(dmaengine_buffer->chan, + vecs, nents, DMA_DEV_TO_MEM, + DMA_PREP_INTERRUPT); + kfree(vecs); + } else { + max_size = min(block->size, dmaengine_buffer->max_size); + max_size = round_down(max_size, dmaengine_buffer->align); + block->bytes_used = max_size; + + desc = dmaengine_prep_slave_single(dmaengine_buffer->chan, + block->phys_addr, + block->bytes_used, + DMA_DEV_TO_MEM, + DMA_PREP_INTERRUPT); + } if (!desc) return -ENOMEM;
@@ -120,6 +156,10 @@ static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = { .data_available = iio_dma_buffer_data_available, .release = iio_dmaengine_buffer_release,
+ .enqueue_dmabuf = iio_dma_buffer_enqueue_dmabuf, + .attach_dmabuf = iio_dma_buffer_attach_dmabuf, + .detach_dmabuf = iio_dma_buffer_detach_dmabuf, + .modes = INDIO_BUFFER_HARDWARE, .flags = INDIO_BUFFER_FLAG_FIXED_WATERMARK, };
On Tue, 19 Dec 2023 18:50:08 +0100 Paul Cercueil paul@crapouillou.net wrote:
Use the functions provided by the buffer-dma core to implement the DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.
Since we want to be able to transfer an arbitrary number of bytes and not necessarily the full DMABUF, the associated scatterlist is converted to an array of DMA addresses + lengths, which is then passed to dmaengine_prep_slave_dma_array().
Signed-off-by: Paul Cercueil paul@crapouillou.net
One question inline. Otherwise looks fine to me.
J
v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to work with the new functions introduced in industrialio-buffer-dma.c.
v5: - Use the new dmaengine_prep_slave_dma_vec(). - Restrict to input buffers, since output buffers are not yet supported by IIO buffers.
.../buffer/industrialio-buffer-dmaengine.c | 52 ++++++++++++++++--- 1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c index 5f85ba38e6f6..825d76a24a67 100644 --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c @@ -64,15 +64,51 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue, struct dmaengine_buffer *dmaengine_buffer = iio_buffer_to_dmaengine_buffer(&queue->buffer); struct dma_async_tx_descriptor *desc;
- unsigned int i, nents;
- struct scatterlist *sgl;
- struct dma_vec *vecs;
- size_t max_size; dma_cookie_t cookie;
- size_t len_total;
- block->bytes_used = min(block->size, dmaengine_buffer->max_size);
- block->bytes_used = round_down(block->bytes_used,
dmaengine_buffer->align);
- if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
/* We do not yet support output buffers. */
return -EINVAL;
- }
- desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
DMA_PREP_INTERRUPT);
- if (block->sg_table) {
sgl = block->sg_table->sgl;
nents = sg_nents_for_len(sgl, block->bytes_used);
Are we guaranteed the length in the sglist is enough? If not this can return an error code.
vecs = kmalloc_array(nents, sizeof(*vecs), GFP_KERNEL);
if (!vecs)
return -ENOMEM;
len_total = block->bytes_used;
for (i = 0; i < nents; i++) {
vecs[i].addr = sg_dma_address(sgl);
vecs[i].len = min(sg_dma_len(sgl), len_total);
len_total -= vecs[i].len;
sgl = sg_next(sgl);
}
desc = dmaengine_prep_slave_dma_vec(dmaengine_buffer->chan,
vecs, nents, DMA_DEV_TO_MEM,
DMA_PREP_INTERRUPT);
kfree(vecs);
- } else {
max_size = min(block->size, dmaengine_buffer->max_size);
max_size = round_down(max_size, dmaengine_buffer->align);
block->bytes_used = max_size;
desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
block->phys_addr,
block->bytes_used,
DMA_DEV_TO_MEM,
DMA_PREP_INTERRUPT);
- } if (!desc) return -ENOMEM;
Hi Jonathan,
Le jeudi 21 décembre 2023 à 16:12 +0000, Jonathan Cameron a écrit :
On Tue, 19 Dec 2023 18:50:08 +0100 Paul Cercueil paul@crapouillou.net wrote:
Use the functions provided by the buffer-dma core to implement the DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.
Since we want to be able to transfer an arbitrary number of bytes and not necessarily the full DMABUF, the associated scatterlist is converted to an array of DMA addresses + lengths, which is then passed to dmaengine_prep_slave_dma_array().
Signed-off-by: Paul Cercueil paul@crapouillou.net
One question inline. Otherwise looks fine to me.
J
v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to work with the new functions introduced in industrialio-buffer- dma.c.
v5: - Use the new dmaengine_prep_slave_dma_vec(). - Restrict to input buffers, since output buffers are not yet supported by IIO buffers.
.../buffer/industrialio-buffer-dmaengine.c | 52 ++++++++++++++++--- 1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c index 5f85ba38e6f6..825d76a24a67 100644 --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c @@ -64,15 +64,51 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue, struct dmaengine_buffer *dmaengine_buffer = iio_buffer_to_dmaengine_buffer(&queue->buffer); struct dma_async_tx_descriptor *desc;
- unsigned int i, nents;
- struct scatterlist *sgl;
- struct dma_vec *vecs;
- size_t max_size;
dma_cookie_t cookie;
- size_t len_total;
- block->bytes_used = min(block->size, dmaengine_buffer->max_size);
- block->bytes_used = round_down(block->bytes_used,
dmaengine_buffer->align);
- if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
/* We do not yet support output buffers. */
return -EINVAL;
- }
- desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
block->phys_addr, block->bytes_used,
DMA_DEV_TO_MEM,
DMA_PREP_INTERRUPT);
- if (block->sg_table) {
sgl = block->sg_table->sgl;
nents = sg_nents_for_len(sgl, block->bytes_used);
Are we guaranteed the length in the sglist is enough? If not this can return an error code.
The length of the sglist will always be enough, the iio_buffer_enqueue_dmabuf() function already checks that block->bytes_used is equal or smaller than the size of the DMABUF.
It is quite a few functions above in the call stack though, so I can handle the errors of sg_nents_for_len() here if you think it makes sense.
Cheers, -Paul
vecs = kmalloc_array(nents, sizeof(*vecs),
GFP_KERNEL);
if (!vecs)
return -ENOMEM;
len_total = block->bytes_used;
for (i = 0; i < nents; i++) {
vecs[i].addr = sg_dma_address(sgl);
vecs[i].len = min(sg_dma_len(sgl),
len_total);
len_total -= vecs[i].len;
sgl = sg_next(sgl);
}
desc =
dmaengine_prep_slave_dma_vec(dmaengine_buffer->chan,
vecs, nents,
DMA_DEV_TO_MEM,
DMA_PREP_INTERRUPT);
kfree(vecs);
- } else {
max_size = min(block->size, dmaengine_buffer->max_size);
max_size = round_down(max_size, dmaengine_buffer->align);
block->bytes_used = max_size;
desc =
dmaengine_prep_slave_single(dmaengine_buffer->chan,
block->phys_addr,
block->bytes_used,
DMA_DEV_TO_MEM,
DMA_PREP_INTERRUPT);
- }
if (!desc) return -ENOMEM;
On Thu, 2023-12-21 at 18:30 +0100, Paul Cercueil wrote:
Hi Jonathan,
Le jeudi 21 décembre 2023 à 16:12 +0000, Jonathan Cameron a écrit :
On Tue, 19 Dec 2023 18:50:08 +0100 Paul Cercueil paul@crapouillou.net wrote:
Use the functions provided by the buffer-dma core to implement the DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.
Since we want to be able to transfer an arbitrary number of bytes and not necessarily the full DMABUF, the associated scatterlist is converted to an array of DMA addresses + lengths, which is then passed to dmaengine_prep_slave_dma_array().
Signed-off-by: Paul Cercueil paul@crapouillou.net
One question inline. Otherwise looks fine to me.
J
v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to work with the new functions introduced in industrialio-buffer- dma.c.
v5: - Use the new dmaengine_prep_slave_dma_vec(). - Restrict to input buffers, since output buffers are not yet supported by IIO buffers.
.../buffer/industrialio-buffer-dmaengine.c | 52 ++++++++++++++++--- 1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c index 5f85ba38e6f6..825d76a24a67 100644 --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c @@ -64,15 +64,51 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue, struct dmaengine_buffer *dmaengine_buffer = iio_buffer_to_dmaengine_buffer(&queue->buffer); struct dma_async_tx_descriptor *desc; + unsigned int i, nents; + struct scatterlist *sgl; + struct dma_vec *vecs; + size_t max_size; dma_cookie_t cookie; + size_t len_total; - block->bytes_used = min(block->size, dmaengine_buffer-
max_size);
- block->bytes_used = round_down(block->bytes_used, - dmaengine_buffer->align); + if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) { + /* We do not yet support output buffers. */ + return -EINVAL; + } - desc = dmaengine_prep_slave_single(dmaengine_buffer->chan, - block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM, - DMA_PREP_INTERRUPT); + if (block->sg_table) { + sgl = block->sg_table->sgl; + nents = sg_nents_for_len(sgl, block->bytes_used);
Are we guaranteed the length in the sglist is enough? If not this can return an error code.
The length of the sglist will always be enough, the iio_buffer_enqueue_dmabuf() function already checks that block->bytes_used is equal or smaller than the size of the DMABUF.
It is quite a few functions above in the call stack though, so I can handle the errors of sg_nents_for_len() here if you think it makes sense.
Maybe putting something like the above in a comment?
- Nuno Sá
On Fri, 22 Dec 2023 09:58:29 +0100 Nuno Sá noname.nuno@gmail.com wrote:
On Thu, 2023-12-21 at 18:30 +0100, Paul Cercueil wrote:
Hi Jonathan,
Le jeudi 21 décembre 2023 à 16:12 +0000, Jonathan Cameron a écrit :
On Tue, 19 Dec 2023 18:50:08 +0100 Paul Cercueil paul@crapouillou.net wrote:
Use the functions provided by the buffer-dma core to implement the DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.
Since we want to be able to transfer an arbitrary number of bytes and not necessarily the full DMABUF, the associated scatterlist is converted to an array of DMA addresses + lengths, which is then passed to dmaengine_prep_slave_dma_array().
Signed-off-by: Paul Cercueil paul@crapouillou.net
One question inline. Otherwise looks fine to me.
J
v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to work with the new functions introduced in industrialio-buffer- dma.c.
v5: - Use the new dmaengine_prep_slave_dma_vec(). - Restrict to input buffers, since output buffers are not yet supported by IIO buffers.
.../buffer/industrialio-buffer-dmaengine.c | 52 ++++++++++++++++--- 1 file changed, 46 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c index 5f85ba38e6f6..825d76a24a67 100644 --- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c +++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c @@ -64,15 +64,51 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue, struct dmaengine_buffer *dmaengine_buffer = iio_buffer_to_dmaengine_buffer(&queue->buffer); struct dma_async_tx_descriptor *desc; + unsigned int i, nents; + struct scatterlist *sgl; + struct dma_vec *vecs; + size_t max_size; dma_cookie_t cookie; + size_t len_total; - block->bytes_used = min(block->size, dmaengine_buffer-
max_size);
- block->bytes_used = round_down(block->bytes_used, - dmaengine_buffer->align); + if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) { + /* We do not yet support output buffers. */ + return -EINVAL; + } - desc = dmaengine_prep_slave_single(dmaengine_buffer->chan, - block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM, - DMA_PREP_INTERRUPT); + if (block->sg_table) { + sgl = block->sg_table->sgl; + nents = sg_nents_for_len(sgl, block->bytes_used);
Are we guaranteed the length in the sglist is enough? If not this can return an error code.
The length of the sglist will always be enough, the iio_buffer_enqueue_dmabuf() function already checks that block->bytes_used is equal or smaller than the size of the DMABUF.
It is quite a few functions above in the call stack though, so I can handle the errors of sg_nents_for_len() here if you think it makes sense.
Maybe putting something like the above in a comment?
Either comment, or an explicit check so we don't need the comment is fine as far as I'm concerned.
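(A sketch of what the explicit check in iio_dmaengine_buffer_submit_block() could look like; the local 'ret' is hypothetical here, and sg_nents_for_len() returns a negative errno when the sglist is shorter than the requested length.)

	sgl = block->sg_table->sgl;
	ret = sg_nents_for_len(sgl, block->bytes_used);
	if (ret < 0) {
		/*
		 * Should not happen: bytes_used has already been checked
		 * against the DMABUF size in iio_buffer_enqueue_dmabuf().
		 */
		return ret;
	}
	nents = ret;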
Jonathan
- Nuno Sá
Document the new DMABUF based API.
Signed-off-by: Paul Cercueil paul@crapouillou.net
--- v2: - Explicitly state that the new interface is optional and is not implemented by all drivers. - The IOCTLs can now only be called on the buffer FD returned by IIO_BUFFER_GET_FD_IOCTL. - Move the page up a bit in the index since it is core stuff and not driver-specific.
v3: Update the documentation to reflect the new API.
v5: Use description lists for the documentation of the three new IOCTLs instead of abusing subsections. --- Documentation/iio/dmabuf_api.rst | 54 ++++++++++++++++++++++++++++++++ Documentation/iio/index.rst | 2 ++ 2 files changed, 56 insertions(+) create mode 100644 Documentation/iio/dmabuf_api.rst
diff --git a/Documentation/iio/dmabuf_api.rst b/Documentation/iio/dmabuf_api.rst new file mode 100644 index 000000000000..1cd6cd51a582 --- /dev/null +++ b/Documentation/iio/dmabuf_api.rst @@ -0,0 +1,54 @@ +.. SPDX-License-Identifier: GPL-2.0 + +=================================== +High-speed DMABUF interface for IIO +=================================== + +1. Overview +=========== + +The Industrial I/O subsystem supports access to buffers through a +file-based interface, with read() and write() access calls through the +IIO device's dev node. + +It additionally supports a DMABUF based interface, where the userspace +can attach DMABUF objects (externally created) to a IIO buffer, and +subsequently use them for data transfers. + +A userspace application can then use this interface to share DMABUF +objects between several interfaces, allowing it to transfer data in a +zero-copy fashion, for instance between IIO and the USB stack. + +The userspace application can also memory-map the DMABUF objects, and +access the sample data directly. The advantage of doing this vs. the +read() interface is that it avoids an extra copy of the data between the +kernel and userspace. This is particularly useful for high-speed devices +which produce several megabytes or even gigabytes of data per second. +It does however increase the userspace-kernelspace synchronization +overhead, as the DMA_BUF_SYNC_START and DMA_BUF_SYNC_END IOCTLs have to +be used for data integrity. + +2. User API +=========== + +As part of this interface, three new IOCTLs have been added. These three +IOCTLs have to be performed on the IIO buffer's file descriptor, +obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl. + + ``IIO_BUFFER_DMABUF_ATTACH_IOCTL(int)`` + Attach the DMABUF object, identified by its file descriptor, to the + IIO buffer. Returns zero on success, and a negative errno value on + error. + + ``IIO_BUFFER_DMABUF_DETACH_IOCTL(int)`` + Detach the given DMABUF object, identified by its file descriptor, + from the IIO buffer. Returns zero on success, and a negative errno + value on error. + + Note that closing the IIO buffer's file descriptor will + automatically detach all previously attached DMABUF objects. + + ``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *iio_dmabuf)`` + Enqueue a previously attached DMABUF object to the buffer queue. + Enqueued DMABUFs will be read from (if output buffer) or written to + (if input buffer) as long as the buffer is enabled. diff --git a/Documentation/iio/index.rst b/Documentation/iio/index.rst index 1b7292c58cd0..3eae8fcb1938 100644 --- a/Documentation/iio/index.rst +++ b/Documentation/iio/index.rst @@ -9,6 +9,8 @@ Industrial I/O
iio_configfs
+ dmabuf_api + ep93xx_adc
bno055
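(To make the documented flow concrete, an untested userspace sketch follows. The three IIO_BUFFER_DMABUF_* ioctls and struct iio_dmabuf come from this series' uapi additions, which are not quoted in this patch, so the struct layout and whether the attach/detach ioctls take the DMABUF fd by value or by pointer should be checked against include/uapi/linux/iio/buffer.h. The DMA_BUF_IOCTL_SYNC part is the standard <linux/dma-buf.h> interface.)

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>
#include <linux/iio/buffer.h>

/* Capture one block of samples into an already-created DMABUF. */
static int capture_block(int dev_fd, int dmabuf_fd, uint64_t nbytes)
{
	struct dma_buf_sync sync = { 0 };
	struct iio_dmabuf req = { 0 };	/* field names assumed, see uapi header */
	int buf_fd, ret;

	/* The DMABUF ioctls only work on the buffer FD, not on the dev node. */
	ret = ioctl(dev_fd, IIO_BUFFER_GET_FD_IOCTL, &buf_fd);
	if (ret < 0)
		return ret;

	/* Attach the externally created DMABUF to the IIO buffer. */
	ret = ioctl(buf_fd, IIO_BUFFER_DMABUF_ATTACH_IOCTL, &dmabuf_fd);
	if (ret < 0)
		return ret;

	/* Queue it for the next transfer; the hardware will fill it. */
	req.fd = dmabuf_fd;
	req.bytes_used = nbytes;
	ret = ioctl(buf_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &req);
	if (ret < 0)
		return ret;

	/* ... enable the buffer and wait for the transfer to complete ... */

	/* Bracket CPU accesses to the mmap()ed DMABUF with the sync ioctls. */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
	/* process the samples here */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
	ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	return 0;
}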
On Tue, 19 Dec 2023 18:50:09 +0100 Paul Cercueil paul@crapouillou.net wrote:
Document the new DMABUF based API.
Signed-off-by: Paul Cercueil paul@crapouillou.net
One minor comment inline.
v2: - Explicitly state that the new interface is optional and is not implemented by all drivers. - The IOCTLs can now only be called on the buffer FD returned by IIO_BUFFER_GET_FD_IOCTL. - Move the page up a bit in the index since it is core stuff and not driver-specific.
v3: Update the documentation to reflect the new API.
v5: Use description lists for the documentation of the three new IOCTLs instead of abusing subsections.
Documentation/iio/dmabuf_api.rst | 54 ++++++++++++++++++++++++++++++++ Documentation/iio/index.rst | 2 ++ 2 files changed, 56 insertions(+) create mode 100644 Documentation/iio/dmabuf_api.rst
diff --git a/Documentation/iio/dmabuf_api.rst b/Documentation/iio/dmabuf_api.rst new file mode 100644 index 000000000000..1cd6cd51a582 --- /dev/null +++ b/Documentation/iio/dmabuf_api.rst @@ -0,0 +1,54 @@ +.. SPDX-License-Identifier: GPL-2.0
+=================================== +High-speed DMABUF interface for IIO +===================================
+1. Overview +===========
+The Industrial I/O subsystem supports access to buffers through a +file-based interface, with read() and write() access calls through the +IIO device's dev node.
+It additionally supports a DMABUF based interface, where the userspace +can attach DMABUF objects (externally created) to a IIO buffer, and +subsequently use them for data transfers.
+A userspace application can then use this interface to share DMABUF +objects between several interfaces, allowing it to transfer data in a +zero-copy fashion, for instance between IIO and the USB stack.
+The userspace application can also memory-map the DMABUF objects, and +access the sample data directly. The advantage of doing this vs. the +read() interface is that it avoids an extra copy of the data between the +kernel and userspace. This is particularly useful for high-speed devices +which produce several megabytes or even gigabytes of data per second. +It does however increase the userspace-kernelspace synchronization +overhead, as the DMA_BUF_SYNC_START and DMA_BUF_SYNC_END IOCTLs have to +be used for data integrity.
+2. User API +===========
+As part of this interface, three new IOCTLs have been added. These three +IOCTLs have to be performed on the IIO buffer's file descriptor, +obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
I would call out that they do not work on the main file descriptor (which is arguably also an IIO buffer file descriptor).
- ``IIO_BUFFER_DMABUF_ATTACH_IOCTL(int)``
- Attach the DMABUF object, identified by its file descriptor, to the
- IIO buffer. Returns zero on success, and a negative errno value on
- error.
- ``IIO_BUFFER_DMABUF_DETACH_IOCTL(int)``
- Detach the given DMABUF object, identified by its file descriptor,
- from the IIO buffer. Returns zero on success, and a negative errno
- value on error.
- Note that closing the IIO buffer's file descriptor will
- automatically detach all previously attached DMABUF objects.
- ``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *iio_dmabuf)``
- Enqueue a previously attached DMABUF object to the buffer queue.
- Enqueued DMABUFs will be read from (if output buffer) or written to
- (if input buffer) as long as the buffer is enabled.
On Tue, 19 Dec 2023 18:50:01 +0100 Paul Cercueil paul@crapouillou.net wrote:
[V4 was: "iio: Add buffer write() support"][1]
Hi Jonathan,
Hi Paul,
This is a respin of the V3 of my patchset that introduced a new interface based on DMABUF objects [2].
Great to see this moving forwards.
The V4 was a split of the patchset, to attempt to upstream buffer write() support first. But since there is no current user upstream, it was not merged. This V5 is about doing the opposite, and contains the new DMABUF interface, without adding the buffer write() support. It can already be used with the upstream adi-axi-adc driver.
Seems like a sensible path in the short term.
In user-space, Libiio uses it to transfer back and forth blocks of samples between the hardware and the applications, without having to copy the data.
On a ZCU102 with a FMComms3 daughter board, running Libiio from the pcercuei/dev-new-dmabuf-api branch [3], compiled with WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 116 MiB/s
Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 475 MiB/s
This benchmark only measures the speed at which the data can be fetched to iio_rwdev's internal buffers, and does not actually try to read the data (e.g. to pipe it to stdout). It shows that fetching the data is more than 4x faster using the new interface.
When actually reading the data, the performance difference isn't that impressive (maybe because in case of DMABUF the data is not in cache):
This needs a bit more investigation ideally. Perhaps perf counters can be used to establish that cache misses are the main difference between dropping it on the floor and actually reading the data.
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
WITH_LOCAL_DMABUF_API=ON: sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
One interesting thing to note is that fileio is (currently) actually faster than the DMABUF interface if you increase a lot the buffer size. My explanation is that the cache invalidation routine takes more and more time the bigger the DMABUF gets. This is because the DMABUF is backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up to 16 thousands pages, that have to be invalidated one by one. This can be addressed by using huge pages, but the udmabuf driver does not (yet) support creating DMABUFs backed by huge pages.
I'd imagine folios of reasonable size will help short of a huge page, as then hopefully it will use the flush-by-VA-range instructions if available.
Anyway, the real benefits happen when the DMABUFs are either shared between IIO devices, or between the IIO subsystem and another filesystem. In that case, the DMABUFs are simply passed around drivers, without the data being copied at any moment.
We use that feature to transfer samples from our transceivers to USB, using a DMABUF interface to FunctionFS [4].
This drastically increases the throughput, to about 274 MiB/s over a USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to the FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load avg.).
This is a nice example. Where are you with getting the patch merged?
Overall, this code looks fine to me, though there are some parts that need review by other maintainers (e.g. Vinod for the dmaengine callback) and I'd like a 'looks fine' at least from those who know a lot more about dmabuf than I do.
To actually make this useful it sounds like either udmabuf needs some perf improvements, or there has to be an upstream case of sharing it with something else (e.g. your functionfs patches). So what do we need to get in before the positive benefit becomes worth carrying this extra complexity? (which isn't too bad, so I'm fine with a small benefit and promises of riches :)
Jonathan
Based on linux-next/next-20231219.
Cheers, -Paul
[1] https://lore.kernel.org/all/20230807112113.47157-1-paul@crapouillou.net/ [2] https://lore.kernel.org/all/20230403154800.215924-1-paul@crapouillou.net/ [3] https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api [4] https://lore.kernel.org/all/20230322092118.9213-1-paul@crapouillou.net/
Changelog:
[3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec' struct. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function?
[4/8]: Implement .device_prep_slave_dma_vec() instead of V3's .device_prep_slave_dma_array().
@Vinod: this patch will cause a small conflict with my other patchset adding scatter-gather support to the axi-dmac driver. This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the prototype of this function changed in my other patchset - it would have to be passed the "chan" variable. I don't know how you prefer it to be resolved. Worst case scenario (and if @Jonathan is okay with that) this one patch can be re-sent later, but it would make this patchset less "atomic".
[5/8]:
- Use dev_err() instead of pr_err()
- Inline to_iio_dma_fence()
- Add comment to explain why we unref twice when detaching dmabuf
- Remove TODO comment. It is actually safe to free the file's private data even when transfers are still pending because it won't be accessed.
- Fix documentation of new fields in struct iio_buffer_access_funcs
- iio_dma_resv_lock() does not need to be exported, make it static
[7/8]:
- Use the new dmaengine_prep_slave_dma_vec().
- Restrict to input buffers, since output buffers are not yet supported by IIO buffers.
[8/8]: Use description lists for the documentation of the three new IOCTLs instead of abusing subsections.
Alexandru Ardelean (1): iio: buffer-dma: split iio_dma_buffer_fileio_free() function
Paul Cercueil (7): iio: buffer-dma: Get rid of outgoing queue dmaengine: Add API function dmaengine_prep_slave_dma_vec() dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec iio: core: Add new DMABUF interface infrastructure iio: buffer-dma: Enable support for DMABUFs iio: buffer-dmaengine: Support new DMABUF based userspace API Documentation: iio: Document high-speed DMABUF based API
Documentation/iio/dmabuf_api.rst | 54 +++ Documentation/iio/index.rst | 2 + drivers/dma/dma-axi-dmac.c | 40 ++ drivers/iio/buffer/industrialio-buffer-dma.c | 242 ++++++++--- .../buffer/industrialio-buffer-dmaengine.c | 52 ++- drivers/iio/industrialio-buffer.c | 402 ++++++++++++++++++ include/linux/dmaengine.h | 25 ++ include/linux/iio/buffer-dma.h | 33 +- include/linux/iio/buffer_impl.h | 26 ++ include/uapi/linux/iio/buffer.h | 22 + 10 files changed, 836 insertions(+), 62 deletions(-) create mode 100644 Documentation/iio/dmabuf_api.rst
Hi Jonathan,
Le jeudi 21 décembre 2023 à 16:30 +0000, Jonathan Cameron a écrit :
On Tue, 19 Dec 2023 18:50:01 +0100 Paul Cercueil paul@crapouillou.net wrote:
[V4 was: "iio: Add buffer write() support"][1]
Hi Jonathan,
Hi Paul,
This is a respin of the V3 of my patchset that introduced a new interface based on DMABUF objects [2].
Great to see this moving forwards.
The V4 was a split of the patchset, to attempt to upstream buffer write() support first. But since there is no current user upstream, it was not merged. This V5 is about doing the opposite, and contains the new DMABUF interface, without adding the buffer write() support. It can already be used with the upstream adi-axi-adc driver.
Seems like a sensible path in the short term.
In user-space, Libiio uses it to transfer back and forth blocks of samples between the hardware and the applications, without having to copy the data.
On a ZCU102 with a FMComms3 daughter board, running Libiio from the pcercuei/dev-new-dmabuf-api branch [3], compiled with WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 116 MiB/s
Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 475 MiB/s
This benchmark only measures the speed at which the data can be fetched to iio_rwdev's internal buffers, and does not actually try to read the data (e.g. to pipe it to stdout). It shows that fetching the data is more than 4x faster using the new interface.
When actually reading the data, the performance difference isn't that impressive (maybe because in case of DMABUF the data is not in cache):
This needs a bit more investigation ideally. Perhaps perf counters can be used to establish that cache misses are the main difference between dropping it on the floor and actually reading the data.
Yes, we'll work on it. The other big difference is that fileio uses dma_alloc_coherent() while the DMABUFs use non-coherent mappings. I guess coherent memory is faster for the typical access pattern (which is "read/write everything sequentially once").
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
WITH_LOCAL_DMABUF_API=ON: sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
One interesting thing to note is that fileio is (currently) actually faster than the DMABUF interface if you increase a lot the buffer size. My explanation is that the cache invalidation routine takes more and more time the bigger the DMABUF gets. This is because the DMABUF is backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up to 16 thousands pages, that have to be invalidated one by one. This can be addressed by using huge pages, but the udmabuf driver does not (yet) support creating DMABUFs backed by huge pages.
I'd imagine folios of reasonable size will help short of a huge page, as then hopefully it will use the flush-by-VA-range instructions if available.
Anyway, the real benefits happen when the DMABUFs are either shared between IIO devices, or between the IIO subsystem and another filesystem. In that case, the DMABUFs are simply passed around drivers, without the data being copied at any moment.
We use that feature to transfer samples from our transceivers to USB, using a DMABUF interface to FunctionFS [4].
This drastically increases the throughput, to about 274 MiB/s over a USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to the FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load avg.).
This is a nice example. Where are you with getting the patch merged?
I'll send a new version (mostly a [RESEND]...) in the coming days. As you can see from the review on my last attempt, the main blocker is that nobody wants to merge a new interface if the rest of the kernel bits aren't upstream yet. Kind of a chicken-and-egg problem :)
Overall, this code looks fine to me, though there are some parts that need review by other maintainers (e.g. Vinod for the dmaengine callback) and I'd like a 'looks fine' at least from those who know a lot more about dmabuf than I do.
To actually make this useful, it sounds like either udmabuf needs some perf improvements, or there has to be an upstream case of sharing it with something else (e.g. your FunctionFS patches). So what do we need to get in before the positive benefit becomes worth carrying this extra complexity? (Which isn't too bad, so I'm fine with a small benefit and promises of riches :)
I think the FunctionFS DMABUF interface can be pushed as well for 5.9, in parallel with this one, as the feedback on the V1 was good. I might just need some help pushing it forward (kind of an "I merge it if you merge it" guarantee).
Cheers, -Paul
Jonathan
Based on linux-next/next-20231219.
Cheers, -Paul
[1] https://lore.kernel.org/all/20230807112113.47157-1-paul@crapouillou.net/ [2] https://lore.kernel.org/all/20230403154800.215924-1-paul@crapouillou.net/ [3] https://github.com/analogdevicesinc/libiio/tree/pcercuei/dev-new-dmabuf-api [4] https://lore.kernel.org/all/20230322092118.9213-1-paul@crapouillou.net/
Changelog:
- [3/8]: Replace V3's dmaengine_prep_slave_dma_array() with a new
dmaengine_prep_slave_dma_vec(), which uses a new 'dma_vec' struct. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function?
- [4/8]: Implement .device_prep_slave_dma_vec() instead of V3's
.device_prep_slave_dma_array().
@Vinod: this patch will cause a small conflict with my other patchset adding scatter-gather support to the axi-dmac driver. This patch adds a call to axi_dmac_alloc_desc(num_sgs), but the prototype of this function changed in my other patchset - it would have to be passed the "chan" variable. I don't know how you prefer it to be resolved. Worst case scenario (and if @Jonathan is okay with that) this one patch can be re-sent later, but it would make this patchset less "atomic".
- [5/8]:
- Use dev_err() instead of pr_err()
- Inline to_iio_dma_fence()
- Add comment to explain why we unref twice when detaching dmabuf
- Remove TODO comment. It is actually safe to free the file's private data even when transfers are still pending because it won't be accessed.
- Fix documentation of new fields in struct iio_buffer_access_funcs
- iio_dma_resv_lock() does not need to be exported, make it static
- [7/8]:
- Use the new dmaengine_prep_slave_dma_vec().
- Restrict to input buffers, since output buffers are not yet supported by IIO buffers.
- [8/8]:
Use description lists for the documentation of the three new IOCTLs instead of abusing subsections.
Alexandru Ardelean (1):
  iio: buffer-dma: split iio_dma_buffer_fileio_free() function
Paul Cercueil (7):
  iio: buffer-dma: Get rid of outgoing queue
  dmaengine: Add API function dmaengine_prep_slave_dma_vec()
  dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
  iio: core: Add new DMABUF interface infrastructure
  iio: buffer-dma: Enable support for DMABUFs
  iio: buffer-dmaengine: Support new DMABUF based userspace API
  Documentation: iio: Document high-speed DMABUF based API
 Documentation/iio/dmabuf_api.rst              |  54 +++
 Documentation/iio/index.rst                   |   2 +
 drivers/dma/dma-axi-dmac.c                    |  40 ++
 drivers/iio/buffer/industrialio-buffer-dma.c  | 242 ++++++++---
 .../buffer/industrialio-buffer-dmaengine.c    |  52 ++-
 drivers/iio/industrialio-buffer.c             | 402 ++++++++++++++++++
 include/linux/dmaengine.h                     |  25 ++
 include/linux/iio/buffer-dma.h                |  33 +-
 include/linux/iio/buffer_impl.h               |  26 ++
 include/uapi/linux/iio/buffer.h               |  22 +
 10 files changed, 836 insertions(+), 62 deletions(-)
 create mode 100644 Documentation/iio/dmabuf_api.rst
On Thu, 21 Dec 2023 18:56:52 +0100 Paul Cercueil paul@crapouillou.net wrote:
Hi Jonathan,
On Thursday 21 December 2023 at 16:30 +0000, Jonathan Cameron wrote:
On Tue, 19 Dec 2023 18:50:01 +0100 Paul Cercueil paul@crapouillou.net wrote:
[V4 was: "iio: Add buffer write() support"][1]
Hi Jonathan,
Hi Paul,
This is a respin of the V3 of my patchset that introduced a new interface based on DMABUF objects [2].
Great to see this moving forwards.
The V4 was a split of the patchset, to attempt to upstream buffer write() support first. But since there is no current user upstream, it was not merged. This V5 is about doing the opposite, and contains the new DMABUF interface, without adding the buffer write() support. It can already be used with the upstream adi-axi-adc driver.
Seems like a sensible path in the short term.
In user-space, Libiio uses it to transfer back and forth blocks of samples between the hardware and the applications, without having to copy the data.
On a ZCU102 with a FMComms3 daughter board, running Libiio from the pcercuei/dev-new-dmabuf-api branch [3], compiled with WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 116 MiB/s
Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 475 MiB/s
This benchmark only measures the speed at which the data can be fetched to iio_rwdev's internal buffers, and does not actually try to read the data (e.g. to pipe it to stdout). It shows that fetching the data is more than 4x faster using the new interface.
When actually reading the data, the performance difference isn't that impressive (maybe because in case of DMABUF the data is not in cache):
This needs a bit more investigation, ideally. Perhaps perf counters can be used to establish that cache misses are the main difference between dropping it on the floor and actually reading the data.
Yes, we'll work on it. The other big difference is that fileio uses dma_alloc_coherent() while the DMABUFs use non-coherent mappings. I guess coherent memory is faster for the typical access pattern (which is "read/write everything sequentially once").
It's a long time since I last worked much with a platform that wasn't always IO coherent, so I've forgotten how all this works (it all ends up as no-ops on the platforms I tend to use these days!). Good luck; I'll be interested to see what this turns out to be.
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
WITH_LOCAL_DMABUF_API=ON: sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
One interesting thing to note is that fileio is (currently) actually faster than the DMABUF interface if you increase a lot the buffer size. My explanation is that the cache invalidation routine takes more and more time the bigger the DMABUF gets. This is because the DMABUF is backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up to 16 thousands pages, that have to be invalidated one by one. This can be addressed by using huge pages, but the udmabuf driver does not (yet) support creating DMABUFs backed by huge pages.
I'd imagine folios of a reasonable size would help, short of a full huge page, as then hopefully it will use the flush-by-VA-range instructions if available.
Anyway, the real benefits happen when the DMABUFs are either shared between IIO devices, or between the IIO subsystem and another filesystem. In that case, the DMABUFs are simply passed around drivers, without the data being copied at any moment.
We use that feature to transfer samples from our transceivers to USB, using a DMABUF interface to FunctionFS [4].
This drastically increases the throughput, to about 274 MiB/s over a USB3 link, vs. 127 MiB/s using IIO's fileio interface + write() to the FunctionFS endpoints, for a lower CPU usage (0.85 vs. 0.65 load avg.).
This is a nice example. Where are you with getting the patch merged?
I'll send a new version (mostly a [RESEND]...) in the coming days. As you can see from the review on my last attempt, the main blocker is that nobody wants to merge a new interface if the rest of the kernel bits aren't upstream yet. Kind of a chicken-and-egg problem :)
Overall, this code looks fine to me, though there are some parts that need review by other maintainers (e.g. Vinod for the dmaengine callback) and I'd like a 'looks fine' at least from those who know a lot more about dmabuf than I do.
To actually make this useful, it sounds like either udmabuf needs some perf improvements, or there has to be an upstream case of sharing it with something else (e.g. your FunctionFS patches). So what do we need to get in before the positive benefit becomes worth carrying this extra complexity? (Which isn't too bad, so I'm fine with a small benefit and promises of riches :)
I think the FunctionFS DMABUF interface can be pushed as well for 5.9, in parallel with this one, as the feedback on the V1 was good. I might just need some help pushing it forward (kind of an "I merge it if you merge it" guarantee).
Ok. If we get a 'fine by us' from DMABUF folk I'd be happy to make that commitment for the IIO parts.
Jonathan
On 12/19/23 11:50 AM, Paul Cercueil wrote:
[V4 was: "iio: Add buffer write() support"][1]
Hi Jonathan,
This is a respin of the V3 of my patchset that introduced a new interface based on DMABUF objects [2].
The V4 was a split of the patchset, to attempt to upstream buffer write() support first. But since there is no current user upstream, it was not merged. This V5 is about doing the opposite, and contains the new DMABUF interface, without adding the buffer write() support. It can already be used with the upstream adi-axi-adc driver.
In user-space, Libiio uses it to transfer back and forth blocks of samples between the hardware and the applications, without having to copy the data.
On a ZCU102 with a FMComms3 daughter board, running Libiio from the pcercuei/dev-new-dmabuf-api branch [3], compiled with WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 116 MiB/s
Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 475 MiB/s
This benchmark only measures the speed at which the data can be fetched to iio_rwdev's internal buffers, and does not actually try to read the data (e.g. to pipe it to stdout). It shows that fetching the data is more than 4x faster using the new interface.
When actually reading the data, the performance difference isn't that impressive (maybe because in case of DMABUF the data is not in cache):
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
WITH_LOCAL_DMABUF_API=ON: sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
One interesting thing to note is that fileio is (currently) actually faster than the DMABUF interface if you increase a lot the buffer size. My explanation is that the cache invalidation routine takes more and more time the bigger the DMABUF gets. This is because the DMABUF is backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up to 16 thousands pages, that have to be invalidated one by one. This can be addressed by using huge pages, but the udmabuf driver does not (yet) support creating DMABUFs backed by huge pages.
Have you tried DMABUFs created using the DMABUF System heap exporter (drivers/dma-buf/heaps/system_heap.c)? It should be able to handle larger allocations better here, and if you don't have any active mmaps or vmaps then it can skip CPU-side coherency maintenance (useful for device-to-device transfers).
Allocating DMABUFs out of user pages has a bunch of other issues you might run into also. I'd argue udmabuf is now completely superseded by DMABUF system heaps. Try it out :)
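For anyone who wants to try it, a minimal userspace sketch of allocating from the system heap (error handling trimmed; the helper name is made up):

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dma-heap.h>

/* Returns a DMABUF fd allocated from the system heap, or -1 on error */
static int system_heap_alloc(size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int heap_fd, ret;

	heap_fd = open("/dev/dma_heap/system", O_RDWR | O_CLOEXEC);
	if (heap_fd < 0)
		return -1;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);

	return ret < 0 ? -1 : (int)data.fd;
}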
Andrew
Hi Andrew,
On Monday 8 January 2024 at 15:12 -0600, Andrew Davis wrote:
On 12/19/23 11:50 AM, Paul Cercueil wrote:
[V4 was: "iio: Add buffer write() support"][1]
Hi Jonathan,
This is a respin of the V3 of my patchset that introduced a new interface based on DMABUF objects [2].
The V4 was a split of the patchset, to attempt to upstream buffer write() support first. But since there is no current user upstream, it was not merged. This V5 is about doing the opposite, and contains the new DMABUF interface, without adding the buffer write() support. It can already be used with the upstream adi-axi-adc driver.
In user-space, Libiio uses it to transfer back and forth blocks of samples between the hardware and the applications, without having to copy the data.
On a ZCU102 with a FMComms3 daughter board, running Libiio from the pcercuei/dev-new-dmabuf-api branch [3], compiled with WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 116 MiB/s
Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 475 MiB/s
This benchmark only measures the speed at which the data can be fetched to iio_rwdev's internal buffers, and does not actually try to read the data (e.g. to pipe it to stdout). It shows that fetching the data is more than 4x faster using the new interface.
When actually reading the data, the performance difference isn't that impressive (maybe because in case of DMABUF the data is not in cache):
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
WITH_LOCAL_DMABUF_API=ON: sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
One interesting thing to note is that fileio is (currently) actually faster than the DMABUF interface if you increase a lot the buffer size. My explanation is that the cache invalidation routine takes more and more time the bigger the DMABUF gets. This is because the DMABUF is backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up to 16 thousands pages, that have to be invalidated one by one. This can be addressed by using huge pages, but the udmabuf driver does not (yet) support creating DMABUFs backed by huge pages.
Have you tried DMABUFs created using the DMABUF System heap exporter (drivers/dma-buf/heaps/system_heap.c)? It should be able to handle larger allocations better here, and if you don't have any active mmaps or vmaps then it can skip CPU-side coherency maintenance (useful for device-to-device transfers).
I didn't know about it!
But udmabuf also allows you to skip CPU-side coherency maintenance, since DMABUFs have two ioctls to start/finish CPU access anyway.
Allocating DMABUFs out of user pages has a bunch of other issues you might run into also. I'd argue udmabuf is now completely superseded by DMABUF system heaps. Try it out :)
I'm curious, what other issues?
The good thing about udmabuf is that the memory is backed by pages, so we can use MSG_ZEROCOPY on sockets to transfer the mmapped data over the network (having a DMABUF interface to the network stack would be better, but I'm not opening that can of worms).
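For reference, the MSG_ZEROCOPY path described here looks roughly like the sketch below (helper name made up; reaping completions from the socket error queue is omitted, and SO_ZEROCOPY/MSG_ZEROCOPY need reasonably recent kernel headers):

#include <sys/socket.h>

/* Send one mmap()ed block without copying */
static ssize_t send_block_zerocopy(int sock, const void *buf, size_t len)
{
	int one = 1;

	/* Idempotent; real code would do this once at socket setup */
	if (setsockopt(sock, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one)))
		return -1;

	/*
	 * The pages are pinned and transmitted in place; the buffer must not
	 * be reused until the completion arrives on the MSG_ERRQUEUE.
	 */
	return send(sock, buf, len, MSG_ZEROCOPY);
}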
Andrew
Cheers, -Paul
On 1/11/24 3:20 AM, Paul Cercueil wrote:
Hi Andrew,
On Monday 8 January 2024 at 15:12 -0600, Andrew Davis wrote:
On 12/19/23 11:50 AM, Paul Cercueil wrote:
[V4 was: "iio: Add buffer write() support"][1]
Hi Jonathan,
This is a respin of the V3 of my patchset that introduced a new interface based on DMABUF objects [2].
The V4 was a split of the patchset, to attempt to upstream buffer write() support first. But since there is no current user upstream, it was not merged. This V5 is about doing the opposite, and contains the new DMABUF interface, without adding the buffer write() support. It can already be used with the upstream adi-axi-adc driver.
In user-space, Libiio uses it to transfer back and forth blocks of samples between the hardware and the applications, without having to copy the data.
On a ZCU102 with a FMComms3 daughter board, running Libiio from the pcercuei/dev-new-dmabuf-api branch [3], compiled with WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 116 MiB/s
Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 475 MiB/s
This benchmark only measures the speed at which the data can be fetched to iio_rwdev's internal buffers, and does not actually try to read the data (e.g. to pipe it to stdout). It shows that fetching the data is more than 4x faster using the new interface.
When actually reading the data, the performance difference isn't that impressive (maybe because in case of DMABUF the data is not in cache):
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
WITH_LOCAL_DMABUF_API=ON: sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
One interesting thing to note is that fileio is (currently) actually faster than the DMABUF interface if you increase a lot the buffer size. My explanation is that the cache invalidation routine takes more and more time the bigger the DMABUF gets. This is because the DMABUF is backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up to 16 thousands pages, that have to be invalidated one by one. This can be addressed by using huge pages, but the udmabuf driver does not (yet) support creating DMABUFs backed by huge pages.
Have you tried DMABUFs created using the DMABUF System heap exporter (drivers/dma-buf/heaps/system_heap.c)? It should be able to handle larger allocations better here, and if you don't have any active mmaps or vmaps then it can skip CPU-side coherency maintenance (useful for device-to-device transfers).
I didn't know about it!
But udmabuf also allows you to skip CPU-side coherency maintenance, since DMABUFs have two ioctls to start/finish CPU access anyway.
The only way it lets you skip that is if your application just doesn't call those begin/end ioctls, which is wrong. That may work on a system where CPU caches can be snooped by all devices that could attach to a buffer (x86), but that might not work on others (ARM). So calling those begin/end ioctls is required[0]. If maintenance is not actually needed then the kernel will turn those calls into NOPs for you, but only the kernel can know when that is correct (based on the running system and the devices attached to that buffer), not userspace.
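Concretely, those are the DMA_BUF_IOCTL_SYNC calls from <linux/dma-buf.h>; a minimal sketch of wrapping a CPU read (helper names made up):

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

static int dmabuf_begin_cpu_read(int buf_fd)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ,
	};

	return ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}

static int dmabuf_end_cpu_read(int buf_fd)
{
	struct dma_buf_sync sync = {
		.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ,
	};

	return ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync);
}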
Allocating DMABUFs out of user pages has a bunch of other issues you might run into also. I'd argue udmabuf is now completely superseded by DMABUF system heaps. Try it out :)
I'm curious, what other issues?
For starters, the {begin,end}_cpu_access() callbacks don't actually sync the pages for any of the devices attached to the DMABUF; it only makes a fake mapping for the misc device (the CPU) and then syncs with that. That probably works for the QEMU case it was designed for, where the device is always a VM instance running on the same CPU, but for any real device the sync never happens towards it.
I have some patches fixing the above that I'll post this cycle, but it won't help with folks doing reads/writes on the original shmem/memfd outside of the begin/end ioctls. So there is a fundamental issue with the ownership/lifecycle of the buffer's backing memory that makes udmabuf broken by design.
The DMABUF System Heap owns the backing memory and manages that memory's lifecycle as all correct DMABUF exporters must.
The good thing about udmabuf is that the memory is backed by pages, so we can use MSG_ZEROCOPY on sockets to transfer the mmapped data over the network (having a DMABUF interface to the network stack would be better, but I'm not opening that can of worms).
Yes, having a DMABUF importer interface for the network stack would be the best long-term solution here, and one will probably end up being needed for zero-copy buffer passing directly between HW and the network, which seems to be a growing area of interest. It would also help solve some cases where MSG_ZEROCOPY fails (such as devices without scatter-gather) by making sure the backing buffer meets the needs of all attached devices, etc. But I do agree, let's leave those worm-cans for someone else to open :)
I wonder what would happen if you tried a MSG_ZEROCOPY send on a buffer that was an mmap'd address from a DMABUF... probably nothing good, but it might be worth looking into.
Andrew
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/driv...
Hi Andrew,
On Thursday 11 January 2024 at 11:30 -0600, Andrew Davis wrote:
On 1/11/24 3:20 AM, Paul Cercueil wrote:
Hi Andrew,
On Monday 8 January 2024 at 15:12 -0600, Andrew Davis wrote:
On 12/19/23 11:50 AM, Paul Cercueil wrote:
[V4 was: "iio: Add buffer write() support"][1]
Hi Jonathan,
This is a respin of the V3 of my patchset that introduced a new interface based on DMABUF objects [2].
The V4 was a split of the patchset, to attempt to upstream buffer write() support first. But since there is no current user upstream, it was not merged. This V5 is about doing the opposite, and contains the new DMABUF interface, without adding the buffer write() support. It can already be used with the upstream adi-axi-adc driver.
In user-space, Libiio uses it to transfer back and forth blocks of samples between the hardware and the applications, without having to copy the data.
On a ZCU102 with a FMComms3 daughter board, running Libiio from the pcercuei/dev-new-dmabuf-api branch [3], compiled with WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 116 MiB/s
Same hardware, with the DMABUF API (WITH_LOCAL_DMABUF_API=ON): sudo utils/iio_rwdev -b 4096 -B cf-ad9361-lpc Throughput: 475 MiB/s
This benchmark only measures the speed at which the data can be fetched to iio_rwdev's internal buffers, and does not actually try to read the data (e.g. to pipe it to stdout). It shows that fetching the data is more than 4x faster using the new interface.
When actually reading the data, the performance difference isn't that impressive (maybe because in case of DMABUF the data is not in cache):
WITH_LOCAL_DMABUF_API=OFF (so that it uses fileio): sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2446422528 bytes (2.4 GB, 2.3 GiB) copied, 22 s, 111 MB/s
WITH_LOCAL_DMABUF_API=ON: sudo utils/iio_rwdev -b 4096 cf-ad9361-lpc | dd of=/dev/zero status=progress 2334388736 bytes (2.3 GB, 2.2 GiB) copied, 21 s, 114 MB/s
One interesting thing to note is that fileio is (currently) actually faster than the DMABUF interface if you increase a lot the buffer size. My explanation is that the cache invalidation routine takes more and more time the bigger the DMABUF gets. This is because the DMABUF is backed by small-size pages, so a (e.g.) 64 MiB DMABUF is backed by up to 16 thousands pages, that have to be invalidated one by one. This can be addressed by using huge pages, but the udmabuf driver does not (yet) support creating DMABUFs backed by huge pages.
Have you tried DMABUFs created using the DMABUF System heap exporter (drivers/dma-buf/heaps/system_heap.c)? It should be able to handle larger allocations better here, and if you don't have any active mmaps or vmaps then it can skip CPU-side coherency maintenance (useful for device-to-device transfers).
I didn't know about it!
But udmabuf also allows you to skip CPU-side coherency maintenance, since DMABUFs have two ioctls to start/finish CPU access anyway.
The only way it lets you skip that is if your application just doesn't call those begin/end ioctls, which is wrong. That may work on a system where CPU caches can be snooped by all devices that could attach to a buffer (x86), but that might not work on others (ARM). So calling those begin/end ioctls is required[0]. If maintenance is not actually needed then the kernel will turn those calls into NOPs for you, but only the kernel can know when that is correct (based on the running system and the devices attached to that buffer), not userspace.
My application only calls these begin/end IOCTLs when the DMABUF's data is accessed (through its mmapped address), and not, e.g., when I just pass the DMABUF around to another device driver. In that case I don't care that the CPU caches aren't sync'd.
Allocating DMABUFs out of user pages has a bunch of other issues you might run into also. I'd argue udmabuf is now completely superseded by DMABUF system heaps. Try it out :)
I'm curious, what other issues?
For starters, the {begin,end}_cpu_access() callbacks don't actually sync the pages for any of the devices attached to the DMABUF; it only makes a fake mapping for the misc device (the CPU) and then syncs with that. That probably works for the QEMU case it was designed for, where the device is always a VM instance running on the same CPU, but for any real device the sync never happens towards it.
I have some patches fixing the above that I'll post this cycle, but it won't help with folks doing reads/writes on the original shmem/memfd outside of the begin/end ioctls. So there is a fundamental issue with the ownership/lifecycle of the buffer's backing memory that makes udmabuf broken by design.
The DMABUF System Heap owns the backing memory and manages that memory's lifecycle as all correct DMABUF exporters must.
Sounds good.
One other issue I was having with udmabuf is that it does not like huge pages, so I had to use small 4096-byte pages. Since my DMABUFs can be huge (half of the RAM), this caused my SGs to be very long, which in turn caused the CPU to spend an enormous amount of time inside dma_sync_sg_for_cpu().
At least the DMABUF system heap seems better designed in that regard.
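For context, this is roughly how a udmabuf is created today, which is why the backing store ends up as plain 4 KiB shmem pages (sketch only, error handling omitted; the helper name is made up):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/udmabuf.h>

/* Returns a DMABUF fd backed by the pages of a sealed memfd */
static int udmabuf_from_memfd(size_t size)
{
	struct udmabuf_create create = { 0 };
	int memfd, dev_fd, buf_fd;

	memfd = memfd_create("iio-block", MFD_ALLOW_SEALING);
	ftruncate(memfd, size);
	/* udmabuf refuses memfds that are not sealed against shrinking */
	fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK);

	dev_fd = open("/dev/udmabuf", O_RDWR | O_CLOEXEC);

	create.memfd = memfd;
	create.offset = 0;
	create.size = size;	/* must be a multiple of the page size */
	buf_fd = ioctl(dev_fd, UDMABUF_CREATE, &create);

	close(dev_fd);
	close(memfd);
	return buf_fd;
}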
The good thing about udmabuf is that the memory is backed by pages, so we can use MSG_ZEROCOPY on sockets to transfer the mmapped data over the network (having a DMABUF interface to the network stack would be better, but I'm not opening that can of worms).
Yes, having a DMABUF importer interface for the network stack would be the best long-term solution here, and one will probably end up being needed for zero-copy buffer passing directly between HW and the network, which seems to be a growing area of interest. It would also help solve some cases where MSG_ZEROCOPY fails (such as devices without scatter-gather) by making sure the backing buffer meets the needs of all attached devices, etc. But I do agree, let's leave those worm-cans for someone else to open :)
I wonder what would happen if you tried a MSG_ZEROCOPY send on a buffer that was an mmap'd address from a DMABUF... probably nothing good, but it might be worth looking into.
I'll give it a try at some point.
Cheers, -Paul