On Wed, May 22, 2019 at 7:28 PM Christian König ckoenig.leichtzumerken@gmail.com wrote:
On 22.05.19 at 18:17, Sumit Semwal wrote:
Hi Christian,
On Sat, 27 Apr 2019 at 05:31, Liam Mark lmark@codeaurora.org wrote:
On Tue, 16 Apr 2019, Christian König wrote:
To allow a smooth transition from pinning buffer objects to dynamic invalidation we first start to cache the sg_table for an attachment unless the driver explicitly says to not do so.
 drivers/dma-buf/dma-buf.c | 24 ++++++++++++++++++++++++
 include/linux/dma-buf.h   | 11 +++++++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 7c858020d14b..65161a82d4d5 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -573,6 +573,20 @@ struct dma_buf_attachment *dma_buf_attach(struct dma_buf *dmabuf,
 	list_add(&attach->node, &dmabuf->attachments);

 	mutex_unlock(&dmabuf->lock);

+	if (!dmabuf->ops->dynamic_sgt_mapping) {
+		struct sg_table *sgt;
+
+		sgt = dmabuf->ops->map_dma_buf(attach, DMA_BIDIRECTIONAL);
+		if (!sgt)
+			sgt = ERR_PTR(-ENOMEM);
+		if (IS_ERR(sgt)) {
+			dma_buf_detach(dmabuf, attach);
+			return ERR_CAST(sgt);
+		}
+		attach->sgt = sgt;
+	}
+
 	return attach;

err_attach:
@@ -595,6 +609,10 @@ void dma_buf_detach(struct dma_buf *dmabuf, struct dma_buf_attachment *attach)
 	if (WARN_ON(!dmabuf || !attach))
 		return;

+	if (attach->sgt)
+		dmabuf->ops->unmap_dma_buf(attach, attach->sgt,
+					   DMA_BIDIRECTIONAL);
+
 	mutex_lock(&dmabuf->lock);
 	list_del(&attach->node);
 	if (dmabuf->ops->detach)
@@ -630,6 +648,9 @@ struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment *attach,
 	if (WARN_ON(!attach || !attach->dmabuf))
 		return ERR_PTR(-EINVAL);

+	if (attach->sgt)
+		return attach->sgt;
+
I am concerned by this change making caching of the sg_table the default behavior, as it results in the exporter's map_dma_buf/unmap_dma_buf calls no longer being made from dma_buf_map_attachment/dma_buf_unmap_attachment.
Probably this concern from Liam got lost between versions of your patches; could we please request a reply to these points here?
Sorry, I indeed never got this mail, but this is actually not an issue because Daniel had similar concerns and we didn't make this the default in the final version.
This seems concerning to me as it appears to ignore the cache maintenance aspect of the map_dma_buf/unmap_dma_buf calls. For example, won't this potentially cause issues for clients of ION?
If we had the following
- #1 dma_buf_attach coherent_device
- #2 dma_buf attach non_coherent_device
- #3 dma_buf_map_attachment non_coherent_device
- #4 non_coherent_device writes to buffer
- #5 dma_buf_unmap_attachment non_coherent_device
- #6 dma_buf_map_attachment coherent_device
- #7 coherent_device reads buffer
- #8 dma_buf_unmap_attachment coherent_device
There wouldn't be any cache maintenance operation (CMO) at step #5 anymore (specifically no invalidate), so at step #7 the coherent_device could read a stale cache line.
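To make the missing step concrete, here is a hedged sketch (not from the patch; the function name and details are illustrative) of what a non-coherent exporter's unmap callback typically looks like. The dma_unmap_sg() call is the point where the architecture's DMA API performs the CPU cache invalidate that step #5 would normally trigger, so caching the sg_table across unmap also skips this:

static void example_unmap_dma_buf(struct dma_buf_attachment *attach,
				  struct sg_table *sgt,
				  enum dma_data_direction dir)
{
	/* On a non-coherent architecture this invalidates/flushes the
	 * CPU cache lines covering the buffer as part of the unmap,
	 * and tears down the IOMMU mapping for attach->dev. */
	dma_unmap_sg(attach->dev, sgt->sgl, sgt->nents, dir);
	sg_free_table(sgt);
	kfree(sgt);
}

If unmap_dma_buf is no longer reached from dma_buf_unmap_attachment, none of the above runs until detach time.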
Also, by default dma_buf_unmap_attachment no longer removes the mappings from the IOMMU, so dma_buf_unmap_attachment is no longer doing what I would expect, and clients lose the potential sandboxing benefit of having the mappings removed. Shouldn't this caching behavior be something that clients opt into, instead of being the default?
Well, it seems you are making incorrect assumptions about the cache maintenance of DMA-buf here.
At least for all DRM devices I'm aware of mapping/unmapping an attachment does *NOT* have any cache maintenance implications.
E.g. the use case you describe above would certainly fail with amdgpu, radeon, nouveau and i915 because mapping a DMA-buf doesn't stop the exporter from reading/writing to that buffer (just the opposite actually).
All of them assume perfectly coherent access to the underlying memory. As far as I know there are no documented cache maintenance requirements for DMA-buf.
I think it is documented. It's just that on x86 we ignore that, because the dma-api pretends there's never a need for cache flushing on x86 and that everything snoops the cpu caches. Which hasn't been true since over 20 years ago, when AGP happened. The actual rules for x86 dma-buf are very much ad-hoc (and we occasionally reapply some duct tape when cacheline noise shows up somewhere).
I've just filed this away as another instance of the dma-api not fitting gpus, and I think, given recent discussions, that won't improve anytime soon. So we're stuck with essentially undefined dma-buf behaviour. -Daniel
The IOMMU concern on the other hand is certainly valid and I perfectly agree that keeping the mapping time as short as possible is desirable.
Regards, Christian.
Liam
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Best, Sumit.
dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel