On Tue, Apr 17, 2012 at 04:23:08PM +0200, Tomasz Stanislawski wrote:
On 04/17/2012 03:03 PM, Daniel Vetter wrote:
On Tue, Apr 17, 2012 at 01:40:47PM +0200, Tomasz Stanislawski wrote:
Hello everyone, I would like to ask about the agreed behaviour of a DMABUF exporter for dma_buf_map_attachment.
According to the DMABUF spec the exporter should return a scatterlist mapped into the importer's DMA space. However, there have been issues with this concept.
I made a short survey of the mapping strategy in the DMABUF patches for some drivers:
- V4L2 - support for dmabuf importing, hopefully consistent with the dmabuf spec. The patch "v4l: vb2-dma-contig: change map/unmap behaviour for importers" implements DMA mapping performed on the importer side. However, the patch can be dropped at no cost.
- Exynos DRM - the latest version implements mapping on the exporter side
- Omap/DRM - 'mapping' is done on the exporter side by setting a physical address as the DMA address in the scatterlist. dma_map_sg should be used for this purpose instead.
- nouveau/i915 by Dave Airlie - mapping for client is done on importer side. http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2
Does it mean that it is agreed that the exporter is responsible for mapping into the client space?
Hi Daniel,
No, and imo your examples are void: 4. Dave Airlie's latest rfc patches implement the dma api mapping on the exporter's side. For i915 see e.g. commit http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=f5c6... which directly grabs the sg list from the dma_buf exporter and shoves it into the intel gtt functions that require an already dma-mapped sg list.
Sorry, my mistake. I was thinking about 'exporter' when I wrote 'importer'. I agree that i915 is consistent with DMABUF spec. Sorry for the confusion.
Doesn't matter, drm/i915 exporter code in that branch does the dma_map_sg for the attached device:
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=3212...
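To make the exporter-side convention concrete, here's a rough sketch (not the actual i915 code; my_map_dma_buf, struct my_obj and its pages[]/npages fields are made-up stand-ins) of a ->map_dma_buf() implementation that hands back an sg table already mapped for the attached device:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

struct my_obj {                         /* made-up exporter buffer object */
        struct page **pages;
        unsigned int npages;
};

static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
                                       enum dma_data_direction dir)
{
        struct my_obj *obj = attach->dmabuf->priv;
        struct sg_table *sgt;
        struct scatterlist *sg;
        int i, ret;

        sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
        if (!sgt)
                return ERR_PTR(-ENOMEM);

        ret = sg_alloc_table(sgt, obj->npages, GFP_KERNEL);
        if (ret) {
                kfree(sgt);
                return ERR_PTR(ret);
        }

        /* fill the table from the exporter's backing pages */
        for_each_sg(sgt->sgl, sg, obj->npages, i)
                sg_set_page(sg, obj->pages[i], PAGE_SIZE, 0);

        /* map it for the attached (importing) device before handing it out */
        if (!dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir)) {
                sg_free_table(sgt);
                kfree(sgt);
                return ERR_PTR(-ENOMEM);
        }

        return sgt;
}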
- Omap: Afaik the address returned is from the TILER, which can be accessed from any hw device. So it's already mapped into device address space, just not through an iommu managed by the dma-api (but conceptually on the same level). This use case is very much one of the reasons for exporting the sg list already mapped.
No. If the client device has an IOMMU then it cannot access the TILER based only on a physical address. The client has to map the physical address into the device's address space. It can be done by transforming the paddr into a scatterlist and mapping it using dma_map_sg. Otherwise the physical address is useless from the perspective of the client device.
Therefore returning a phys address as a dma_address is not a valid way of mapping memory.
Of course, if none of the HW pieces uses an IOMMU, everything works fine. However, imo the introduction of an IOMMU for some client device should __not__ force any change to the exporter code.
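To illustrate the mapping step described above, here is a minimal sketch (all names are made up) of how a physical address can be wrapped in a scatterlist and mapped for the client device; it assumes the physical range is backed by struct pages, which, as discussed further down in this thread, is not guaranteed for every kind of memory:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int map_paddr_for_client(struct device *client_dev, phys_addr_t paddr,
                                size_t size, struct scatterlist *sg,
                                dma_addr_t *dma_addr)
{
        sg_init_table(sg, 1);
        sg_set_page(sg, pfn_to_page(paddr >> PAGE_SHIFT), size, 0);

        /* let the dma api (and hence the client's iommu) create the mapping */
        if (dma_map_sg(client_dev, sg, 1, DMA_BIDIRECTIONAL) != 1)
                return -EIO;

        /* the address the client device must actually use */
        *dma_addr = sg_dma_address(sg);
        return 0;
}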
Well, there are 3 ways to make this work with the current dma_buf spec:
- There are no other iommus (like you've noticed).
- There are other iommus, but the tiler area is mapped 1:1 into them (current approach used in omap according to Rob Clark).
- Things from the tiler are mapped on demand. Currently we don't have any dma interface to remap random pfns, so using these iommus managed by the dma api on top of the omap tiler won't work. What would work is adding a platform-specific bit into the device's dma options to tell the omap tiler code to either map the dma_buf into the tiler or use the platform dma api to map it into device address space.
This use case of a special, driver-managed iommu which is tightly integrated with e.g. the gpu, but can also be accessed by other related devices, is the core scenario I have in mind for mandating device address space mappings. Yes, this needs some special code in the exporter to make it work, but e.g. with omap it's the only way to share buffer objects in a tiled layout (all mapped into device address space through the tiler). For fast video processing pipelines, using tiled objects is crucial for performance, and on many devices the solution for a common tiling layout is a common tiling unit like omap's tiler.
So I don't see the problem here.
- Exynos uses the iommu api as a nice interface to handle its graphics aperture (akin to the omap tiler or the i915 gtt). Last time I checked, the discussion about how this should be integrated (and at what level) with the dma api was still ongoing. So it's imo way too early to conclude that exynos doesn't return device dma addresses, because this iommu might not end up being managed by the core dma api.
- That's your code ;-)
Last but not least your dma_pages_from_sg (or whatever it's called) trick
The function was called dma_get_pages. As mentioned before, this function was a workaround for deficiencies in the DMA api. I agree that it is not possible to transform a buffer into a list of pages in the general case, i.e. there might be no struct pages for a given range of physical addresses at which RAM is available.
However, if it is possible to transform a DMA buffer into a list of pages, then the DMA subsystem is the best place to do it. That is why dma_get_pages was introduced to the DMA api in my RFC. The alternative might be dma_get_sg to obtain a scatterlist, but that is only a cosmetic difference from dma_get_pages.
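As a purely hypothetical illustration of how such a helper would be used by an exporter (dma_get_pages is only an RFC, the signature assumed below may differ from the actual proposal, and export_coherent_buffer is made up), the point being that a page list can trivially be turned into a scatterlist:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

static struct sg_table *export_coherent_buffer(struct device *dev, void *vaddr,
                                               dma_addr_t handle, size_t size)
{
        int npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
        struct page **pages;
        struct sg_table *sgt;
        int ret;

        pages = kmalloc_array(npages, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return ERR_PTR(-ENOMEM);

        /* RFC helper, assumed signature: fails if no struct pages back the buffer */
        ret = dma_get_pages(dev, vaddr, handle, size, pages, npages);
        if (ret < 0)
                goto err_pages;

        sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
        if (!sgt) {
                ret = -ENOMEM;
                goto err_pages;
        }

        ret = sg_alloc_table_from_pages(sgt, pages, npages, 0, size, GFP_KERNEL);
        if (ret) {
                kfree(sgt);
                goto err_pages;
        }

        kfree(pages);
        return sgt;

err_pages:
        kfree(pages);
        return ERR_PTR(ret);
}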
Still, a hack that works sometimes but not always is guaranteed to blow up sooner or later. I'd like to get this right and not rework it too often, hence my stubborn opinion on it.
You can criticize as much as you like, but please try to help us find a better way to export a DMA buffer allocated by the DMA api using functions like dma_alloc_{coherent/noncoherent/writecombine}.
To be frank, the current dma api is not suitable to be used with dma_buf if one of your importers cannot deal with struct pages allocated with page_alloc, which means all the devices that need CMA memory allocated through the dma api.
Now for the fully dynamic & insane case of a drm device being the exporter, I think we can live with this. Atm we only have seriously useful drm drivers on x86 and there we don't need CMA.
But I agree that this isn't a workable solution on arm and embedded platforms, and I think we need to pimp up the dma api to be useful for the basic case of sharing a statically allocated piece of memory.
My plan here is that we should have a default dma_buf exporter implementation that uses these new dma api interfaces so that for the usual case you don't have to write implementations for all these nutty dma_buf interface pieces. That would only be required for special cases that need the full generality of the interface (i.e. a drm exporter) or for memory allocated in a special manner (e.g. v4l userpointer buffer or memory allocated from a carveout not backed by struct pages).
To make that happen we need a new way to allocate coherent dma memory, so I guess we'll have a
void *dma_alloc_attrs_for_devices(struct device **devices, int num_devices, dma_addr_t **dma_handles, gfp_t gfp, struct dma_attrs *attrs)
which would allocate coherent dma memory suitable for all the num_devices listed in devices, handing back the dma_handles in the array pointed to by dma_handles.
For the no-iommu case (i.e. dma_generic_alloc_coherent) it would be enough to simply AND together all the dma masks. On platforms that require more magic (e.g. CMA) this would be more complicated.
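A purely illustrative use of the proposed interface (the function does not exist; the prototype above presumably also needs a size argument, which is assumed here, and cam_dev/disp_dev are made-up members of a capture-to-display pipeline):

#include <linux/dma-attrs.h>
#include <linux/dma-mapping.h>

static void *alloc_shared_frame(struct device *cam_dev, struct device *disp_dev,
                                size_t size, dma_addr_t handles[2])
{
        struct device *devs[2] = { cam_dev, disp_dev };
        dma_addr_t *handle_ptrs[2] = { &handles[0], &handles[1] };
        DEFINE_DMA_ATTRS(attrs);

        dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);

        /*
         * proposed semantics: pick backing memory that satisfies the dma
         * constraints of all listed devices (in the no-iommu case simply
         * the AND of their dma masks) and return one dma handle per device
         */
        return dma_alloc_attrs_for_devices(devs, 2, size, handle_ptrs,
                                           GFP_KERNEL, &attrs);
}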
To make dynamic pipeline reconfiguration possible for the example dma_buf exporter, the exporter could allocate a new buffer when a new device gets attached and copy over the data from the old buffer to the new one, then release the old one. If that doesn't work, it can simply return -EBUSY or -ENOMEM.
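A hedged sketch of that reallocate-on-attach idea for the example exporter (struct example_buf and example_realloc_for() are invented; the ->attach() hook uses the current dma_buf_ops signature, which passes the attaching struct device explicitly):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>

struct example_buf {                    /* invented example exporter state */
        dma_addr_t dma_handle;
        size_t size;
};

static int example_attach(struct dma_buf *dmabuf, struct device *dev,
                          struct dma_buf_attachment *attach)
{
        struct example_buf *buf = dmabuf->priv;

        /*
         * is the current backing storage (a contiguous, no-iommu buffer in
         * this sketch) already reachable by the newly attached device?
         */
        if (buf->dma_handle + buf->size - 1 <= dev->coherent_dma_mask)
                return 0;

        /*
         * otherwise allocate a new buffer that satisfies all attached
         * devices plus 'dev', copy the contents over and release the old
         * storage; if that's impossible, simply refuse the attachment
         */
        if (example_realloc_for(buf, dev))
                return -EBUSY;

        return 0;
}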
If we later on also want to support the streaming dma api better for common dma_buf exporters, we can look into that. But I think that would be a second step after getting this dma api extension in.
I hope we've found the real reason for our disagreement on the dma_buf spec, if I'm totally off, please keep on shouting at me.
Yours, Daniel