On Tue, 22 Jan 2019, Andrew F. Davis wrote:
On 1/21/19 4:12 PM, Liam Mark wrote:
On Mon, 21 Jan 2019, Christoph Hellwig wrote:
On Mon, Jan 21, 2019 at 11:44:10AM -0800, Liam Mark wrote:
The main use case is allowing clients to pass in DMA_ATTR_SKIP_CPU_SYNC in order to skip the default cache maintenance which happens in dma_buf_map_attachment and dma_buf_unmap_attachment. In ION, the buffers aren't usually accessed from the CPU, so this often lets clients avoid unnecessary cache maintenance.
This can't work. The CPU can still easily speculate into this area.
Can you provide more detail on your concern here? The use case I am thinking about is a cached buffer which is accessed by a non-IO-coherent device (quite a common use case for ION).
Guessing at your concern: speculative access can be an issue if you are going to access the buffer from the CPU after the device has written to it. However, if you know you aren't going to do any CPU access before the buffer is returned to the device, then I don't think speculative access is a concern.
Moreover, in general these operations should be cheap if the addresses aren't cached.
I am thinking of use cases with cached buffers here, so CMO isn't cheap.
These buffers are cacheable, not cached. If you haven't written anything, the data won't actually be in cache.
That's true.
And in the case of speculative cache filling, the lines are marked clean. In either case the only cost is the little 7-instruction loop calling the clean/invalidate instruction (dc civac for ARMv8) for the cache lines. Unless that is the cost you are trying to avoid?
This is the cost I am trying to avoid, and this comes back to our previous discussion. We have a coherent system cache, so if you are doing this for every cache line of a large buffer, the maintenance work plus the traffic going out to the bus adds up. For example, I believe 1080p buffers are 8MB, and 4K buffers are even larger.
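To put rough numbers on the per-cache-line cost described above, here is a minimal sketch in plain userspace C. It is not kernel code. It assumes a 64-byte cache line and a packed 4-bytes-per-pixel frame with no row padding; neither figure comes from this thread, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Assumed cache line size in bytes. 64 is common on ARMv8 cores but not
 * universal; this value is an assumption, not something from the thread. */
#define ASSUMED_CACHE_LINE_BYTES 64u

/* Bytes in one frame, assuming a packed layout with no row padding. */
static uint64_t frame_bytes(uint32_t width, uint32_t height,
                            uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * bytes_per_pixel;
}

/* Number of cache lines covered by a line-aligned buffer of `bytes` bytes,
 * rounded up. One clean/invalidate instruction is issued per line. */
static uint64_t cache_lines_for(uint64_t bytes)
{
    return (bytes + ASSUMED_CACHE_LINE_BYTES - 1) / ASSUMED_CACHE_LINE_BYTES;
}
```

Under those assumptions, frame_bytes(1920, 1080, 4) is 8294400 bytes, which is in line with the 8MB figure, and cache_lines_for() on that size gives 129600 lines for a single pass over the buffer. A 3840x2160 frame at the same depth is four times that.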
I also still think you would want to solve this properly such that invalidates aren't being done unnecessarily.
In that case, if you are mapping and unmapping so much that the little CMO here is hurting performance, then I would argue your usage is broken and needs to be re-worked a bit.
I am not sure I would say it is broken. The large buffers (for example, 1080p buffers) are mapped and unmapped on every frame. I don't think there is any clean way to avoid that in a pipelining framework: you could ask clients to keep the buffers DMA-mapped, but there isn't necessarily a good time to tell them to unmap.
It would be unfortunate not to consider this something legitimate for userspace to do in a pipelining use case. Requiring devices to stay attached doesn't seem very clean to me, as there isn't necessarily a nice place to tell them when to detach.
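The per-frame map/unmap pattern can be sketched as a small userspace model. This is only an illustration under stated assumptions: the model_* names and MODEL_ATTR_SKIP_CPU_SYNC are hypothetical stand-ins, not the kernel's dma-buf API or the real DMA_ATTR_SKIP_CPU_SYNC value, and it assumes one full pass over the buffer at map and one at unmap with a 64-byte cache line.

```c
#include <stdint.h>

/* Hypothetical stand-in for a "skip CPU sync" mapping attribute.
 * The bit value is arbitrary and is not the kernel's definition. */
#define MODEL_ATTR_SKIP_CPU_SYNC (1ul << 0)

/* Assumed cache line size in bytes (an assumption, not from the thread). */
#define MODEL_CACHE_LINE_BYTES 64u

struct model_buffer {
    uint64_t size_bytes; /* size of the frame buffer */
    uint64_t line_ops;   /* running count of per-line maintenance operations */
};

/* Cache lines covered by the buffer, rounded up. */
static uint64_t model_lines(const struct model_buffer *buf)
{
    return (buf->size_bytes + MODEL_CACHE_LINE_BYTES - 1) / MODEL_CACHE_LINE_BYTES;
}

/* Model of a map call: one pass over the buffer unless the caller
 * asks for cache maintenance to be skipped. */
static void model_map(struct model_buffer *buf, unsigned long attrs)
{
    if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC))
        buf->line_ops += model_lines(buf);
}

/* Model of an unmap call: same rule as model_map(). */
static void model_unmap(struct model_buffer *buf, unsigned long attrs)
{
    if (!(attrs & MODEL_ATTR_SKIP_CPU_SYNC))
        buf->line_ops += model_lines(buf);
}

/* Map and unmap the buffer once per frame, as a pipelined client would,
 * and return the total number of per-line operations issued. */
static uint64_t model_run_frames(uint64_t size_bytes, uint32_t frames,
                                 unsigned long attrs)
{
    struct model_buffer buf = { .size_bytes = size_bytes, .line_ops = 0 };

    for (uint32_t i = 0; i < frames; i++) {
        model_map(&buf, attrs);
        /* The device would process the frame here; the CPU never touches it. */
        model_unmap(&buf, attrs);
    }
    return buf.line_ops;
}
```

With an 8294400-byte buffer and 60 frames, the model counts 15552000 per-line operations when no attribute is passed and 0 when MODEL_ATTR_SKIP_CPU_SYNC is set. It says nothing about whether skipping is safe, which is the question being debated here; it only shows how the count scales with buffer size and frame rate.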
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project