Re: [Linaro-mm-sig] [PATCH 3/4] dma-buf: add support for mapping with dma mapping attributes - Linaro-mm-sig

22 Jan 2019


On Tue, 22 Jan 2019, Andrew F. Davis wrote:
...
On 1/21/19 4:12 PM, Liam Mark wrote:
...
On Mon, 21 Jan 2019, Christoph Hellwig wrote:
...
On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
...
The main use case is for allowing clients to pass in 
DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance 
which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In 
ION the buffers aren't usually accessed from the CPU so this allows 
clients to often avoid doing unnecessary cache maintenance.
This can't work.  The cpu can still easily speculate into this area.
Can you provide more detail on your concern here?
The use case I am thinking about here is a cached buffer which is accessed 
by a non IO-coherent device (quite a common use case for ION).
Guessing on your concern:
The speculative access can be an issue if you are going to access the 
buffer from the CPU after the device has written to it. However, if you 
know you aren't going to do any CPU access before the buffer is again 
returned to the device, then I don't think the speculative access is a 
concern.
...
Moreover in general these operations should be cheap if the addresses
aren't cached.
I am thinking of use cases with cached buffers here, so CMO isn't cheap.
These buffers are cacheable, not cached; if you haven't written anything,
the data won't actually be in cache.
That's true
...
And in the case of speculative cache
filling, the lines are marked clean. In either case the only cost is the
little 7 instruction loop calling the clean/invalidate instruction (dc
civac for ARMv8) for the cache-lines. Unless that is the cost you are
trying to avoid?
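As a rough illustration of the by-line loop described above, the following user-space C sketch walks a buffer one cache line at a time and counts the lines visited. The names for_each_cache_line and line_op_fn are hypothetical and do not come from the kernel; a real implementation would issue a cache maintenance instruction (dc civac on ARMv8) per line instead of calling a C callback, and the line size would be read from the hardware rather than passed in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-line hook. In a kernel this would be a cache
 * maintenance instruction such as "dc civac", not a C callback. */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/* Walk [start, start + size) one cache line at a time and return the
 * number of lines visited. line_size must be a non-zero power of two.
 * The start address is rounded down to a line boundary, so a range
 * that straddles a boundary touches both lines. */
static size_t for_each_cache_line(uintptr_t start, size_t size,
                                  size_t line_size,
                                  line_op_fn op, void *ctx)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    if (size == 0)
        return 0;

    uintptr_t addr = start & ~(uintptr_t)(line_size - 1);
    uintptr_t end = start + size;
    size_t count = 0;

    for (; addr < end; addr += line_size) {
        if (op)
            op(addr, ctx); /* stand-in for the per-line instruction */
        count++;
    }
    return count;
}
```

Assuming a 64-byte line, for_each_cache_line(0x1000, 8u * 1024 * 1024, 64, NULL, NULL) visits 131072 lines for an 8 MiB buffer, which is the per-map cost being debated here.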
This is the cost I am trying to avoid and this comes back to our previous 
discussion.  We have a coherent system cache, so if you are doing this for 
every cache line on a large buffer, the cost adds up, both from this work 
and from going out to the bus.
For example I believe 1080P buffers are 8MB, and 4K buffers are even 
larger.
I also still think you would want to solve this properly such that 
invalidates aren't being done unnecessarily.
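To put rough numbers on the buffer sizes mentioned above, here is a small C sketch that estimates how many cache lines one frame spans. The helper names, the 4 bytes-per-pixel format, the 1920x1080 and 3840x2160 resolutions, and the 64-byte cache line are all assumptions chosen for illustration; actual formats and line sizes vary by platform.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one tightly packed frame (no stride padding). */
static size_t frame_bytes(size_t width, size_t height,
                          size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Number of cache lines needed to cover `bytes`, rounding up.
 * Assumes the buffer starts on a cache-line boundary. */
static size_t cache_lines(size_t bytes, size_t line_size)
{
    assert(line_size != 0);
    return (bytes + line_size - 1) / line_size;
}
```

Under those assumptions, frame_bytes(1920, 1080, 4) is 8294400 bytes (close to the 8MB figure quoted), which covers 129600 lines of 64 bytes; frame_bytes(3840, 2160, 4) is 33177600 bytes, or 518400 lines. Each map or unmap that performs cache maintenance would touch every one of those lines.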
...
In that case if you are mapping and unmapping so much that the little
CMO here is hurting performance then I would argue your usage is broken
and needs to be re-worked a bit.
I am not sure I would say it is broken; the large buffers (for example, 
1080P buffers) are mapped and unmapped on every frame. I don't think there 
is any clean way to avoid that in a pipelining framework. You could ask 
clients to keep the buffers DMA-mapped, but there isn't necessarily a good 
time to tell them to unmap.
It would be unfortunate not to consider this something legitimate for 
userspace to do in a pipelining use case.
Requiring devices to stay attached doesn't seem very clean to me, as there 
isn't necessarily a nice place to tell them when to detach.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project