On Wed, Apr 25, 2018 at 12:04:29PM +0200, Daniel Vetter wrote:
Coordinating the backport of a trivial helper in the arm tree is not the end of the world. Really, this cowboy attitude is a good reason why graphics folks have such a bad rep. You keep poking into random kernel internals, don't talk to anoyone and then complain if people are upset. This shouldn't be surprising.
Not really agreeing on the cowboy thing. The fundamental problem is that the dma api provides abstraction that seriously gets in the way of writing a gpu driver. Some examples:
So talk to other people. Maybe people share your frustation. Or maybe other people have a way to help.
- We never want bounce buffers, ever. dma_map_sg gives us that, so there's hacks to fall back to a cache of pages allocated using dma_alloc_coherent if you build a kernel with bounce buffers.
get_required_mask() is supposed to tell you if you are safe. However we are missing lots of implementations of it for iommus so you might get some false negatives, improvements welcome. It's been on my list of things to fix in the DMA API, but it is nowhere near the top.
- dma api hides the cache flushing requirements from us. GPUs love non-snooped access, and worse give userspace control over that. We want a strict separation between mapping stuff and flushing stuff. With the IOMMU api we mostly have the former, but for the later arch maintainers regularly tells they won't allow that. So we have drm_clflush.c.
The problem is that a cache flushing API entirely separate is hard. That being said if you look at my generic dma-noncoherent API series it tries to move that way. So far it is in early stages and apparently rather buggy unfortunately.
- dma api hides how/where memory is allocated. Kinda similar problem, except now for CMA or address limits. So either we roll our own allocators and then dma_map_sg (and pray it doesn't bounce buffer), or we use dma_alloc_coherent and then grab the sgt to get at the CMA allocations because that's the only way. Which sucks, because we can't directly tell CMA how to back off if there's some way to make CMA memory available through other means (gpus love to hog all of memory, so we have shrinkers and everything).
If you really care about doing explicitly cache flushing anyway (see above) allocating your own memory and mapping it where needed is by far the superior solution. On cache coherent architectures dma_alloc_coherent is nothing but allocate memory + dma_map_single. On non coherent allocations the memory might come through a special pool or must be used through a special virtual address mapping that is set up either statically or dynamically. For that case splitting allocation and mapping is a good idea in many ways, and I plan to move towards that once the number of dma mapping implementations is down to a reasonable number so that it can actually be done.