Hey there,
We ran a call today on the topics of V4L2/dmabuf integration and the DMA mapping changes; I've included minutes and actions below.
In summary:
- We're not going to see the V4L2 patches in 3.4; they haven't had enough review or testing and Mauro is very busy this merge window.
- We're going to put in effort to validate the work on Exynos4 and OMAP4 using test applications that Rob and Tomasz are maintaining.
- Sumit's maintaining an up-to-date branch containing all the in-flight DRM/V4L2/dmabuf-dependent work that we can carry meanwhile: http://git.linaro.org/gitweb?p=people/sumitsemwal/linux-3.x.git;a=shortlog...
Laurent, I missed one thing that you mentioned in the call and it's included in an XXX: below; if you could note what it was it would be great.
Please feel free to correct my possibly wild misunderstandings in the minutes and summary above. Thanks!
Attendees are all on the To: line.
Tomasz: At the moment, have PoC support for import/export dma-buf for V4L2
  - Modified patches by Sumit
  - Supporting importer of dma-buf in Exynos DRM and drm-prime
  - Test application worked fine for V4L capture and DRM output
  - Test application between two V4L devices
  - Laurent has sent in review comments

Tomasz: proposed extension to DMA Mapping -- dma_get_pages
  - Currently difficult to change the camera address into a list of pages
  - The DMA framework has knowledge of this list and could do this
  - Depends on dma_get_pages, which needs to be merged first
  - Test application posted to dri-devel with dependencies to run the demo
  - Many dependencies

Topic: dmabuf importer from V4L2
  - For sharing with DRM, do not need an exporter
  - Need some patches added to drm-prime for Exynos
  - ACTION: Tomasz to check with Inki Dae and David A. on status
  - Will send request to Mauro? Doesn't think so, won't have enough time for testing
    - The RC is already open -- not enough time for that
    - API details need consideration
    - Depends on how much time the current project takes: 2-3 weeks after the merge window closes, 1 month to actually review
  - Would like to see Exynos working; Mauro has a Samsung V310/Exynos4 dev board
  - Also wants to see it tested with a V4L driver: virtual drivers also supporting dma-buf, both importer and exporter?
  - ACTION: Sumit to look into the VV (vivi) driver used for V4L2 testing; needs importer and exporter
Laurent has work that he can push forward in parallel
  - API change / ioctl addition that could be reviewed

Rob: the demo had V4L importer and DRM exporter (?)
  - There are some changes needed for drm-prime
  - Laurent: needs to implement importer/exporter for the camera piece
  - With the drm-prime-dmabuf patches, drivers can be simplified

Sumit: we have a new 3.3-WIP branch
  - Rob could use that as a base and put DRM updates on it
  - ACTION: Rob to pull in changes and update demonstration
  - Rob: what camera does Mauro have? On the Samsung SDK, M5MOLS
  - Mauro does have a Pandaboard with an ov5650 camera, but it needs setting up and a potential hardware mod
  - ACTION: Mauro to take picture of setup for Sumit
  - As a backup, Rob could add a Tested-by for the changes Mauro essentially wants a test on

With CPU access in dmabuf can use vmalloc
  - Without dependency on DMA mapping at all
  - Without Russell's acceptance can't go forward with the ARM-related pieces
  - ACTION: Kiko to check with Nicolas on this piece, see if he can review or help Marek with the ARM-related pieces

Tomasz could use the 3.3-WIP branch
  - Sumit: rebasing not a good idea, but could pull in for-next
  - Suggests Tomasz bases work on Linus' mainline
  - Had problems with Tomasz's branch that is based on -next
  - His branch includes Tomasz's RFCv2 patches as well
  - Laurent: agrees with Sumit
  - ACTION: Sumit to email Tomasz and CC: Inki Dae
  - ACTION: send around latest drm-prime branch

Tomasz: so no XXX: exporter support for V4L?
  - Laurent: doesn't have time to update XXX at the moment
  - Needs porting to videobuf2

Rob: Looks like Daniel V. has also replied to the V4L patches, so it's going to be hard for 3.4
  - Mauro's lack of time makes 3.4 not possible anyway
  - mmap support also likely to miss 3.4
  - EGLImage extension needs sorting out
  - Can be carried in our WIP branch meanwhile

Mauro: what are the test applications being used for dmabuf?
  - Rob: using Panda with camera adapter board (omap-drm)
  - Would like others to have similar setups
  - Requires a YUV sensor, but there are patches that support RAW-to-YUV conversion, which allows other sensors to be used
  - Mauro: but what software app are you using?
  - Rob: has test code on github: https://github.com/robclark/omapdrmtest
Tomasz: Have posted one test application on dri-devel (March 6th)
  http://www.spinics.net/lists/dri-devel/msg19634.html
  Second application posted to linux-media, as a reply to the RFCv1 dma-buf V4L2 patches (Jan 26th)
  http://www.mail-archive.com/linux-media@vger.kernel.org/msg42522.html
Mauro: include both applications together with patches when posting
Sumit: could ask Inki to provide similar exynos-drm test application in parallel to Rob's omap-drm; same interface will simplify testing
Marek: device-to-device coordination will be needed
  - Will start next week ideally
  - For a device-to-device operation, need this to avoid touching the CPU cache
  - Not sure about the ARM stuff for 3.4; needs review
  - If they don't go in, will keep them on dma-mapping-next
  - Konrad's comments are addressed and Reviewed-bys added
Take care,
Hi Kiko,
On Thursday 22 March 2012 11:54:55 Christian Robottom Reis wrote:
Hey there,
We ran a call today on the topics of V4L2/dmabuf integration and the
DMA mapping changes; I've included minutes and actions below.
In summary:
- We're not going to see the V4L2 patches in 3.4; they haven't had enough review or testing and Mauro is very busy this merge window.
- We're going to put in effort to validate the work on Exynos4 and OMAP4 using test applications that Rob and Tomasz are maintaining.
Just for reference, I've also tested the V4L2 dmabuf importer role on a Renesas AP4 platform (sharing buffers between V4L2 and FB).
- Sumit's maintaining an up-to-date branch containing all the in-flight DRM/V4L2/dmabuf-dependent work that we can carry meanwhile:
http://git.linaro.org/gitweb?p=people/sumitsemwal/linux-3.x.git;a=shortlog;h=refs/heads/umm-3.3-wip
Laurent, I missed one thing that you mentioned in the call and it's included in an XXX: below; if you could note what it was it would be great.
Sure, done below.
Please feel free to correct my possibly wild misunderstandings in the minutes and summary above. Thanks!
Attendees are all on the To: line.
[...] Laurent has work that he can push forward in parallel API change ioctl addition that could be reviewed
I don't really have work that I can push forward in parallel. My point was that the modifications to the V4L2 API and the implementation in videobuf2 are two separate tasks that should be pushed forward in parallel (extending the V4L2 API takes quite a lot of time in my experience, our developers like nit-picking on the mailing list when they review the documentation - which I consider as a good thing, just for the record).
Rob: the demo had V4L importer and DRM exporter (?) There are some changes for prime changes Laurent: needs to implement importer/exporter camera piece
s/piece/ISP/ (Image Signal Processor, the camera interface in the OMAP3)
[...] Tomasz: so no XXX: exporter support for V4L? Laurent: doesn't have time to update XXX at the moment Needs porting to videobuf2
I was talking about the OMAP3 ISP driver (camera interface). It could be a good test case for the dmabuf API, but needs to be ported to videobuf2 first.
On Thu, Mar 22, 2012 at 04:17:12PM +0100, Laurent Pinchart wrote:
Just for reference, I've also tested the V4L2 dmabuf importer role on a Renesas AP4 platform (sharing buffers between V4L2 and FB).
Is it worth adding explicit Tested-Bys for this (or has Tomasz done that already)?
Laurent has work that he can push forward in parallel API change ioctl addition that could be reviewed
I don't really have work that I can push forward in parallel. My point was that the modifications to the V4L2 API and the implementation in videobuf2 are two separate tasks that should be pushed forward in parallel. [...]
Thanks for the clarification. I hope this makes sense to somebody who knows enough to understand why these two pieces of work are separate <wink>
Hi there,
Quick action review:
On Thu, Mar 22, 2012 at 04:17:12PM +0100, Laurent Pinchart wrote:
Topic: dmabuf importer from V4L2
  - For sharing with DRM, do not need an exporter
  - Need some patches added to drm-prime for Exynos
  - ACTION: Tomasz to check with Inki Dae and David A. on status
Inki, Tomasz, are you happy with the latest posting -- i.e. does this look sensible and useful for the V4L and display controller work?
Also wants to see it tested with a V4L driver: virtual drivers also supporting dma-buf, both importer and exporter?
  - ACTION: Sumit to look into the VV driver used for v4l2 testing; needs importer and exporter
Sumit, Tomasz: have you two synced up on this?
I don't really have work that I can push forward in parallel. My point was that the modifications to the V4L2 API and the implementation in videobuf2 are two separate tasks that should be pushed forward in parallel. [...]
Laurent, is there an action here to kick off the discussion of the V4L2 API changes (whose current status I don't understand very well)?
Sumit: we have a new 3.3-WIP branch
  - Rob could use that as a base and put DRM updates on it
  - ACTION: Rob to pull in changes and update demonstration
I think Rob's said he'd have time to look at this in a week or two so I won't inquire on this.
Rob: what camera does Mauro have? On the Samsung SDK, M5MOLS
  - Mauro does have a Pandaboard with ov5650 camera but needs setting up and potential hardware mod
  - ACTION: Mauro to take picture of setup for Sumit
Mauro probably still needs the Panda set up at least, or an up-to-date Samsung board. Can either of you provide him something to make testing this work easier? If you have someone I should ask inside the company just let me know -- if you don't reply I'll assume I need to go and ask the Linaro member contact.
Without Russell's acceptance can't go forward with ARM-related pieces
  - ACTION: Kiko to check with Nicolas on this piece, see if he can review or help Marek with the ARM-related pieces
Still waiting for Nico on this one..
Tomasz could use the 3.3-WIP branch
  - Sumit: rebasing not a good idea, but could pull in for-next
  - Suggests Tomasz bases work on Linus' mainline
  - Had problems with Tomasz's branch that is based on -next
  - His branch includes Tomasz's RFCv2 patches as well
  - Laurent: agrees with Sumit
  - ACTION: Sumit to email Tomasz and CC: Inki Dae
  - ACTION: send around latest drm-prime branch
Tomasz, Inki: are you happy with this?
Thanks,
Hi Kiko,
On 28 March 2012 20:35, Christian Robottom Reis kiko@linaro.org wrote:
Also wants to see tested with V4L driver virtual drivers also supporting dma-buf wants both importer and exporter? ACTION: Sumit to look into VV driver used for v4l2 testing needs importer and exporter
Sumit, Tomasz: have you two synced up on this?
I hadn't started yet, and saw a mail from Tomasz that he's already started. Like he mentioned in his mail, he's having some problem since vivi is a virtual driver. We'll discuss potential solutions soon.
Rob: what camera does Mauro have? On the Samsung SDK, M5MOLS Mauro does have a Pandaboard with ov5650 camera but needs setting up and potential hardware mod ACTION: Mauro to take picture of setup for Sumit
Mauro probably still needs the Panda set up at least, or an up-to-date Samsung board. Can either of you provide him something to make testing this work easier? If you have someone I should ask inside the company just let me know -- if you don't reply I'll assume I need to go and ask the Linaro member contact.
Unfortunately I couldn't clearly understand the connector from his description; I can send him a couple of pics of how it should fit, and hopefully that'd be that. For the record: I guess there are three parts in total - 1. the connector to be soldered onto the board, 2. the adapter PCB, and 3. the sensor itself. I will send the pics to him directly.
Hi Kiko,
On Wednesday 28 March 2012 12:05:02 Christian Robottom Reis wrote:
On Thu, Mar 22, 2012 at 04:17:12PM +0100, Laurent Pinchart wrote:
I don't really have work that I can push forward in parallel. My point was that the modifications to the V4L2 API and the implementation in videobuf2 are two separate tasks that should be pushed forward in parallel. [...]
Laurent, is there an action here to kick off the discussion of the V4L2 API changes (which I don't understand current status of very well)?
Patches have already been posted to the linux-media mailing list. They lack documentation though; I think it would help to get reviewers if we documented the API. Tomasz, you mentioned that you would work on that for the next version - have you had time to update the patch with the Documentation/DocBook/media/v4l changes?
On 03/29/2012 01:17 PM, Laurent Pinchart wrote:
Patches have already been posted to the linux-media mailing list. They miss documentation though, I think it would help getting reviewers if we documented the API. Tomasz, you mentioned that you would work on that for the next version, have you had time to update the patch with the Documentation/DocBook/media/v4l changes ?
Hi Laurent,
The documentation patches should be split into importer and exporter parts. Due to the problems with dma_get_pages (or another form of DMA cookie extractor), the exporter part is probably going to be postponed until the DMA-mapping redesign reaches a steady state.
I haven't focused on the documentation part yet, but I will try to prepare at least a sketch of the documentation for the DMABUF importer. Quite probably, it will be posted next week.
Regards, Tomasz Stanislawski
Hi, Kiko
On 29 March 2012 00:05, Christian Robottom Reis kiko@linaro.org wrote:
Tomasz could use the 3.3-WIP branch
  - Sumit: rebasing not a good idea, but could pull in for-next
  - Suggests Tomasz bases work on Linus' mainline
  - Had problems with Tomasz's branch that is based on -next
  - His branch includes Tomasz's RFCv2 patches as well
  - Laurent: agrees with Sumit
  - ACTION: Sumit to email Tomasz and CC: Inki Dae
  - ACTION: send around latest drm-prime branch
Tomasz, Inki: are you happy with this?
If so, it would be very useful for me. We are working on updating the Exynos-specific drm-prime feature with the latest dmabuf and drm-prime code posted recently, and that will be tested with the V4L2 side, including the dmabuf feature, soon.
Thanks, Inki Dae
On Thu, Mar 22, 2012 at 11:54:55AM -0300, Christian Robottom Reis wrote:
Tomasz: proposed extension to DMA Mapping -- dma_get_pages
  - Currently difficult to change the camera address into list of pages
  - DMA framework has the knowledge of this list and could do this
  - Depends on dma_get_pages, needs to be merged first
  - Test application posted to dri-devel with dependencies to run demo
  - Many dependencies
I kinda missed to yell at this patch when it first showed up, so I'll do that here ;-)
I think this is a gross layering violation and I don't like it at all. The entire point of the dma api is that device drivers only get to see device addresses and can forget about all the remapping/contig-alloc madness. And dma-buf should just follow this with its map/unmap interfaces.
Furthermore the exporter memory might simply not have any associated struct pages. The two examples I always bring up:
- special purpose remapping units (like omap's TILER) which are managed by the exporter and can do crazy things like tiling or rotation transparently for all devices.
- special carve-out memory which is unknown to linux memory management. drm/i915 is totally abusing this, mostly because windows is lame and doesn't have decent largepage allocation support. This is just plain system memory, but there's no struct page for it (because it's not part of the system map).
Now the core dma api isn't fully up to snuff for everything yet and there are things missing. But it's certainly not dma_get_pages, but more things like mmap support for coherent memory or allocating coherent memory which doesn't have a static mapping in the kernel address space. I very much hope that the interfaces we develop for dma-buf (and the insights gained) could help as examples here, so that in the future there's not such a gaping difference for the driver between dma_coherent allocations of its own and imported buffer objects.
Yours, Daniel
Hi Daniel,
On Thursday 22 March 2012 19:01:01 Daniel Vetter wrote:
On Thu, Mar 22, 2012 at 11:54:55AM -0300, Christian Robottom Reis wrote:
Tomasz: proposed extension to DMA Mapping -- dma_get_pages
Currently difficult to change the camera address into list of pages DMA framework has the knowledge of this list and could do this Depends on dma_get_pages Needs to be merged first Test application posted to dri-devel with dependencies to run demo Many dependencies
I kinda missed to yell at this patch when it first showed up, so I'll do that here ;-)
I think this is a gross layering violation and I don't like it at all. The entire point of the dma api is that device drivers only get to see device addresses and can forget about all the remapping/contig-alloc madness. And dma-buf should just follow this with it's map/unmap interfaces.
Furthermore the exporter memory might simply not have any associated struct pages. The two examples I always bring up:
- special purpose remapping units (like omap's TILER) which are managed by the exporter and can do crazy things like tiling or rotation transparently for all devices.
- special carve-out memory which is unknown to linux memory management. drm/i915 is totally abusing this, mostly because windows is lame and doesn't have decent largepage allocation support. This is just plain system memory, but there's no struct page for it (because it's not part of the system map).
I agree with you that the DMA API is the proper layer to abstract physical memory and provide devices with a DMA address. DMA addresses are specific to a device, while dma-buf needs to share buffers between separate devices (otherwise it would be pretty pointless). As DMA addresses are device-local, they can't be used to describe a cross-device buffer.
When allocating a buffer using the DMA API, memory is "allocated" behind the scenes and mapped to the device address space ("allocated" in this case means anything from plain physical memory allocation to reservation of a special-purpose memory range, like in the OMAP TILER example). All the device driver gets to see is the DMA address and/or the DMA scatter list. So far, so good.
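To make that concrete, here is a minimal sketch of what the driver sees (struct my_buffer and my_alloc_buffer are made-up names for the example; dma_alloc_coherent() is the only real API used):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>

struct my_buffer {		/* hypothetical driver-private type */
	void *vaddr;
	dma_addr_t dma_addr;	/* only meaningful for the allocating device */
	size_t size;
};

static int my_alloc_buffer(struct device *dev, size_t size, struct my_buffer *buf)
{
	/* The DMA API hides how the memory was obtained (plain pages, CMA,
	 * IOMMU remapping, carveout, ...): the driver only gets back a CPU
	 * address and a DMA address that is local to this device. */
	buf->vaddr = dma_alloc_coherent(dev, size, &buf->dma_addr, GFP_KERNEL);
	if (!buf->vaddr)
		return -ENOMEM;
	buf->size = size;
	return 0;
}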
Then, when we want to share the memory with a second device, we need a way to map the memory to the second device's address space. There are several options here (and this is related to the "[RFCv2 PATCH 7/9] v4l: vb2-dma-contig: change map/unmap behaviour" mail thread).
- Let the importer driver map the memory to its own address space. This makes sense from the importer device's point of view, as that's where knowledge about the importer device is located (although you could argue that knowledge about the importer device is located in its struct device, which can be passed around - and I could agree with that). The importer driver would thus need to receive a cookie identifying the memory. As explained before, the exporter's DMA address isn't enough. There are various options here as well (list of pages or page frame numbers, exporter's DMA address + exporter's struct device, a new kind of DMA API-related cookie, ... to just list a few). The importer driver would then use that cookie to map the memory to the importer device's address space (and this should most probably be implemented in the DMA API, which would require extensions).
- Let the exporter driver map the memory to the importer device's address space. This makes sense from the exporter device's point of view, as that's where knowledge about the exported memory is located. In this case we also most probably want to extend the DMA API to handle the mapping operation, and we will need to pass the same kind of cookie as in the first option to the API.
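Purely as an illustration (not taken from any of the patches discussed here), a sketch of what the second option boils down to with the current dma-buf interface, for an exporter whose buffer happens to be backed by struct pages; struct my_buf and my_map_dma_buf() are made up, and the key point is that dma_map_sg() is called with attach->dev, the importer's device:

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

struct my_buf {			/* hypothetical exporter-private buffer */
	struct page **pages;
	unsigned int nr_pages;
};

static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
				       enum dma_data_direction dir)
{
	struct my_buf *buf = attach->dmabuf->priv;
	struct sg_table *sgt;
	struct scatterlist *sg;
	int i;

	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
	if (!sgt)
		return ERR_PTR(-ENOMEM);

	if (sg_alloc_table(sgt, buf->nr_pages, GFP_KERNEL)) {
		kfree(sgt);
		return ERR_PTR(-ENOMEM);
	}

	for_each_sg(sgt->sgl, sg, buf->nr_pages, i)
		sg_set_page(sg, buf->pages[i], PAGE_SIZE, 0);

	/* Map into the *importer's* address space: attach->dev is the device
	 * that called dma_buf_attach(), not the exporter. This only works
	 * because the buffer is page-backed, which is exactly the objection
	 * raised above for TILER/carveout memory. */
	if (!dma_map_sg(attach->dev, sgt->sgl, sgt->orig_nents, dir)) {
		sg_free_table(sgt);
		kfree(sgt);
		return ERR_PTR(-EIO);
	}

	return sgt;
}

For memory without struct pages (TILER, carveouts) this simple recipe obviously doesn't apply, which is the crux of the disagreement.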
Hello everyone,
I would like to ask about the agreement on the behaviour of a DMABUF exporter for dma_buf_map_attachment.
According to the DMABUF spec the exporter should return a scatterlist mapped into the importer's DMA space. However there were issues with the concept.
I made a short survey of the mapping strategy in the DMABUF patches for some drivers:
1. V4L2 - support for dmabuf importing, hopefully consistent with the dmabuf spec. The patch "v4l: vb2-dma-contig: change map/unmap behaviour for importers" implements DMA mapping performed on the importer side. However, the patch can be dropped at no cost.
2. Exynos DRM - the latest version implements mapping on the exporter side.
3. Omap/DRM - 'mapping' is done on the exporter side by setting a physical address as the DMA address in the scatterlist. The dma_map_sg should be used for this purpose.
4. nouveau/i915 by Dave Airlie - mapping for the client is done on the importer side. http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2
Does it mean that it is agreed that the exporter is responsible for mapping into the client space?
Regards, Tomasz Stanislawski
Hi,
What about the mapping for importing devices which have an IOMMU? To update the mapping in the page tables accessed by the importing device's IOMMU, do we need to create the mapping on the exporter side, or must the importing device use the mapped sg returned by the exporter and create a mapping for its IOMMU?
Regards, Abhinav
Hi,
I never saw this answered (sorry if it was and I just missed it) and it seemed like a generally useful detail to clarify, so here's my understanding (from Documentation/dma-buf-sharing.txt):
When the importer calls dma_buf_map_attachment(), the struct sg_table* returned by the exporter will already have been appropriately mapped for the importer's IOMMU. This is expected as part of the API contract and is possible because of the struct device* passed in by the importer in the call to dma_buf_attach().
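For completeness, the importer side then reduces to roughly the following sequence (a sketch of the dma-buf calls only; my_import() is an illustrative name and error handling is abbreviated):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

static struct sg_table *my_import(struct device *dev, int fd,
				  struct dma_buf_attachment **out_attach)
{
	struct dma_buf *dmabuf;
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	dmabuf = dma_buf_get(fd);		/* fd received from userspace */
	if (IS_ERR(dmabuf))
		return ERR_CAST(dmabuf);

	attach = dma_buf_attach(dmabuf, dev);	/* tells the exporter who we are */
	if (IS_ERR(attach)) {
		dma_buf_put(dmabuf);
		return ERR_CAST(attach);
	}

	/* Per the documentation, the returned sg_table is expected to be
	 * already mapped for 'dev', so sg_dma_address()/sg_dma_len() can be
	 * used directly to program the hardware. */
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		dma_buf_put(dmabuf);
		return ERR_CAST(sgt);
	}

	*out_attach = attach;
	return sgt;
}

The reverse path is dma_buf_unmap_attachment(), dma_buf_detach() and dma_buf_put(), in that order.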
cheers, Jesse
On Tue, Apr 17, 2012 at 01:40:47PM +0200, Tomasz Stanislawski wrote:
Hello everyone, I would like to ask about the agreement on a behavior of a DMABUF exporter for dma_buf_map_attachment.
According to the DMABUF spec the exporter should return a scatterlist mapped into importers DMA space. However there were issues about the concept.
I made a short survey for mapping strategy for DMABUF patches for some drivers:
- V4L2 - support for dmabuf importing hopefully consistent with dmabuf spec. The patch "v4l: vb2-dma-contig: change map/unmap behaviour for importers" implement DMA mapping performed on the importer side. However the patch can be dropped at no cost.
- Exynos DRM - the latest version implements mapping on the exporter side
- Omap/DRM - 'mapping' is done on exporter side by setting a physical address as DMA address in the scatterlist. The dma_map_sg should be used for this purpose
- nouveau/i915 by Dave Airlie - mapping for client is done on importer side. http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2
Does it mean that it is agreed that the exporter is responsible for mapping into the client space?
No, and imo your examples are void: 4. Dave Airlie's latest rfc patches implement dma api mapping on the importer's side. For i915 see e.g. commit http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=f5c6... which directly grabs the sg list from the dma_buf exporter and shoves it into the intel gtt functions that require an already dma-mapped sg list.
3. Omap: Afaik the address returned is from the TILER, which can be accessed from any hw device. So it's already mapped into device address space, just not through an iommu managed by dma-api (but conceptually on the same level). This use case is very much one of the reasons for exporting the sg list already mapped.
2. Exynos uses the iommu api as a nice interface to handle its graphics aperture (akin to omap tiler or i915 gtt). Last time I've checked, the discussion about how this should be integrated (and at what level) with the dma api is still ongoing. So it's imo way too early to conclude that exynos doesn't return device dma addresses, because this iommu might not end up being managed by the core dma api.
1. That's your code ;-)
Last but not least your dma_pages_from_sg (or whatever it's called) trick is one gross layering hack which can easily blow up. I pretty much expect drm/i915 (and also other drivers) to dma_buf export objects that are _not_ backed by struct pages sooner or later.
Cheers, Daniel
On 04/17/2012 03:03 PM, Daniel Vetter wrote:
Hi Daniel,
No, and imo you're examples are void: 4. Dave Airlie's latest rfc patches implement dma api mapping on the importers side. For i915 see e.g. commit http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=f5c6... which directly grabs the sg list from the dma_buf exporter and shoves it into the into the intel gtt functions that require and already dma mapped sg list.
Sorry, my mistake. I was thinking about 'exporter' when I wrote 'importer'. I agree that i915 is consistent with DMABUF spec. Sorry for the confusion.
- Omap: Afaik the address returned is from the TILER, which can be
accessed from any hw device. So it's already mapped into device address space, just not through an iommu managed by dma-api (but conceptually on the same level). This use case is very much one fo the reasons for exporting the sg list already mapped.
No. If the client driver has an IOMMU then it cannot access the TILER based only on the physical address. The client has to map the physical address to the device's address space. It can be done by transforming the paddr into a scatterlist and mapping it using dma_map_sg. Otherwise the physical address is useless from the perspective of the client device.
Therefore returning a phys address as a dma_address is not a valid way of mapping memory.
Of course if none of the HW pieces uses an IOMMU, everything works fine. However, imo the introduction of an IOMMU for some client device should __not__ force any change to the exporter code.
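As a sketch of the paddr-to-scatterlist step described above (assuming, for the sake of the example, a physically contiguous range that is ordinary RAM backed by struct pages - which, as noted in this thread, is not guaranteed for TILER or carveout memory; map_paddr_for_client() is just an illustrative name):

#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/pfn.h>
#include <linux/scatterlist.h>

static int map_paddr_for_client(struct device *client, phys_addr_t paddr,
				unsigned int size, struct sg_table *sgt)
{
	int ret;

	ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
	if (ret)
		return ret;

	/* Only valid when the range really has struct pages behind it. */
	sg_set_page(sgt->sgl, pfn_to_page(PFN_DOWN(paddr)), size, 0);

	/* dma_map_sg() lets the client's IOMMU (if any) set up its own
	 * mapping and fills in sg_dma_address() for the client device. */
	if (!dma_map_sg(client, sgt->sgl, sgt->orig_nents, DMA_BIDIRECTIONAL)) {
		sg_free_table(sgt);
		return -EIO;
	}

	return 0;
}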
- Exynos uses the iommu api as a nice interface to handle it's graphics
aperture (akin to omap tiler or i915 gtt). Last time I've checked the discussion about how this should be integrated (and at what level) with the dma api is still ongoing. So it's imo way to early to conclude that exynos doesn't return device dma addresses because this iommu might not end up being managed by the core dma api.
- That's your code ;-)
Last but not least your dma_pages_from_sg (or whatever it's called) trick is one gross layering hack which can easily blow up. I pretty much expect drm/i915 (and also other drivers) to dma_buf export objects that are _not_ backed by struct pages sooner or later.
The function was called dma_get_pages. As was mentioned before, this function was a workaround for deficiencies in the DMA api. I agree that it is not possible to transform a buffer into a list of pages in the general case, i.e. there might be no struct pages for a given range of physical addresses at which RAM is available.
However, if it is possible to transform a DMA buffer into a list of pages then the DMA subsystem is the best place to do it. That is why dma_get_pages was introduced to the DMA api in my RFC. The alternative might be dma_get_sg to obtain the scatterlist, but it is only a cosmetic difference from dma_get_pages.
You can criticize as you like, but please try to help us find a better way to export a DMA buffer allocated by the DMA api using functions like dma_alloc_{coherent/noncoherent/writecombine}.
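To make the shape of the proposal concrete, the exporter-side usage being argued for would look roughly like this; dma_get_sgtable() is used here only as a stand-in name for the proposed dma_get_pages/dma_get_sg helper, and struct my_coherent_buf is made up, so treat the whole snippet as a hypothetical sketch rather than an existing interface:

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

struct my_coherent_buf {	/* hypothetical: result of dma_alloc_coherent() */
	void *vaddr;
	dma_addr_t dma_addr;
	size_t size;
};

/* Hypothetical sketch: the DMA layer, which allocated the buffer and knows
 * what backs it, describes it as a scatterlist that the exporter can then
 * map for the importing device with dma_map_sg(). */
static int my_map_for_importer(struct device *exporter_dev,
			       struct device *importer_dev,
			       struct my_coherent_buf *buf,
			       struct sg_table *sgt,
			       enum dma_data_direction dir)
{
	int ret;

	ret = dma_get_sgtable(exporter_dev, sgt, buf->vaddr,
			      buf->dma_addr, buf->size);
	if (ret)
		return ret;	/* e.g. nothing page-like backs the buffer */

	if (!dma_map_sg(importer_dev, sgt->sgl, sgt->orig_nents, dir)) {
		sg_free_table(sgt);
		return -EIO;
	}

	return 0;
}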
To everyone:
I see that there are multiple open issues around DMABUF. My proposal is to organize a brainstorming session (circa 3 days) in Warsaw, similar to the V4L brainstorming that happened about a year ago. It would focus on topics related to DMA mapping and DMABUF.
What do you think about this idea? Who would like to attend?
Regards, Tomasz Stanislawski
On Tue, Apr 17, 2012 at 04:23:08PM +0200, Tomasz Stanislawski wrote:
On 04/17/2012 03:03 PM, Daniel Vetter wrote:
On Tue, Apr 17, 2012 at 01:40:47PM +0200, Tomasz Stanislawski wrote:
Hello everyone, I would like to ask about the agreement on a behavior of a DMABUF exporter for dma_buf_map_attachment.
According to the DMABUF spec the exporter should return a scatterlist mapped into importers DMA space. However there were issues about the concept.
I made a short survey for mapping strategy for DMABUF patches for some drivers:
- V4L2 - support for dmabuf importing hopefully consistent with dmabuf spec. The patch "v4l: vb2-dma-contig: change map/unmap behaviour for importers" implement DMA mapping performed on the importer side. However the patch can be dropped at no cost.
- Exynos DRM - the latest version implements mapping on the exporter side
- Omap/DRM - 'mapping' is done on exporter side by setting a physical address as DMA address in the scatterlist. The dma_map_sg should be used for this purpose
- nouveau/i915 by Dave Airlie - mapping for client is done on importer side. http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2
Does it mean that it is agreed that the exporter is responsible for mapping into the client space?
Hi Daniel,
No, and imo your examples are void: 4. Dave Airlie's latest rfc patches implement dma api mapping on the importer's side. For i915 see e.g. commit http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=f5c6... which directly grabs the sg list from the dma_buf exporter and shoves it into the intel gtt functions that require an already dma-mapped sg list.
Sorry, my mistake. I was thinking about 'exporter' when I wrote 'importer'. I agree that i915 is consistent with DMABUF spec. Sorry for the confusion.
Doesn't matter, drm/i915 exporter code in that branch does the dma_map_sg for the attached device:
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=3212...
- Omap: Afaik the address returned is from the TILER, which can be
accessed from any hw device. So it's already mapped into device address space, just not through an iommu managed by dma-api (but conceptually on the same level). This use case is very much one of the reasons for exporting the sg list already mapped.
No. If the client driver has an IOMMU then it cannot access the TILER based only on a physical address. The client has to map the physical address into the device's address space. This can be done by transforming the paddr into a scatterlist and mapping it using dma_map_sg. Otherwise the physical address is useless from the perspective of the client device.
Therefore returning a phys address as a dma_address is not a valid way of mapping memory.
Of course if none of the HW pieces uses an IOMMU, everything works fine. However, imo the introduction of an IOMMU for some client device should __not__ force any change to the exporter code.
Well, there are 3 ways to make this work with the current dma_buf spec:
- There are no other iommus (like you've noticed).
- There are other iommus, but the tiler area is mapped 1:1 into them (the current approach used in omap according to Rob Clark).
- Things from the tiler are mapped on-demand. Currently we don't have any dma interface to remap random pfns, so using these iommus managed by the dma api on top of the omap tiler won't work. What would work is adding a platform-specific bit into the device's dma options to tell the omap tiler code to either map the dma_buf into the tiler or use the platform dma api to map it into device address space.
This use case of a special, driver-managed iommu which is tightly integrated with e.g. the gpu, but can also be accessed by other related devices, is the core scenario I have in mind for mandating device address space mappings. Yes, this needs some special code in the exporter to make it work, but e.g. with omap it's the only way to share buffer objects in a tiled layout (all mapped into device address space through the tiler). For fast video processing pipelines, using tiled objects is crucial for performance, and on many devices the solution for a common tiling layout is a common tiling unit like omap's tiler.
So I don't see the problem here.
- Exynos uses the iommu api as a nice interface to handle its graphics
aperture (akin to omap tiler or i915 gtt). Last time I checked, the discussion about how this should be integrated (and at what level) with the dma api is still ongoing. So it's imo way too early to conclude that exynos doesn't return device dma addresses, because this iommu might not end up being managed by the core dma api.
- That's your code ;-)
Last but not least your dma_pages_from_sg (or whatever it's called) trick
The function was called dma_get_pages. As mentioned before, this function was a workaround for deficiencies in the DMA api. I agree that it is not possible to transform a buffer into a list of pages in the general case, i.e. there might be no struct pages for a given range of physical addresses at which RAM is available.
However, if it is possible to transform a DMA buffer into a list of pages then the DMA subsystem is the best place to do it. That is why dma_get_pages was introduced to the DMA api in my RFC. The alternative might be dma_get_sg to obtain the scatterlist, but that is only a cosmetic difference from dma_get_pages.
Still, a hack that works sometimes but not always is guaranteed to blow up sooner or later. I'd like to get this right and not rework it too often, hence my stubborn opinion on it.
You can criticize as you like, but please try to help us find a better way to export a DMA buffer allocated by the DMA api using functions like dma_alloc_{coherent/noncoherent/writecombine}.
To be frank, the current dma api is not suitable to be used with dma_buf if one of your importers cannot deal with struct pages allocated with page_alloc. Which means all the devices that need CMA allocated through the dma api.
Now for the fully dynamic & insane case of a drm device being the exporter, I think we can live with this. Atm we only have seriously useful drm drivers on x86 and there we don't need CMA.
But I agree that this isn't a workable solution on arm and embedded platforms, and I think we need to pimp up the dma api to be useful for the basic case of sharing a statically allocated piece of memory.
My plan here is that we should have a default dma_buf exporter implementation that uses these new dma api interfaces so that for the usual case you don't have to write implementations for all these nutty dma_buf interface pieces. That would only be required for special cases that need the full generality of the interface (i.e. a drm exporter) or for memory allocated in a special manner (e.g. v4l userpointer buffer or memory allocated from a carveout not backed by struct pages).
To make that happen we need a new way to allocate coherent dma memory, so I guess we'll have a
void * dma_alloc_attrs_for_devices(struct device **devices, int num_devices, dma_addr_t **dma_handles, gfp_t gfp, struct dma_attrs *attrs)
which would allocate coherent dma memory suitable for all the num_devices listed in devices, handing back the dma_handles in the array pointed to by dma_handles.
For the no-iommu case (i.e. dma_generic_alloc_coherent) it would be enough to simply & together all the dma masks. On platforms that require more magic (e.g. CMA) this would be more complicated.
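Purely as an illustration of how such an interface might be called by a simple exporter: the function below does not exist, its prototype is the one proposed above with a size argument added on the assumption that it was simply left out of the sketch, and alloc_shared_buffer() is an invented helper.

#include <linux/dma-mapping.h>
#include <linux/dma-attrs.h>
#include <linux/gfp.h>

/* Proposed (non-existent) interface, size argument assumed. */
void *dma_alloc_attrs_for_devices(struct device **devices, int num_devices,
                                  size_t size, dma_addr_t **dma_handles,
                                  gfp_t gfp, struct dma_attrs *attrs);

/* Hypothetical exporter allocating one buffer usable by both devices. */
static void *alloc_shared_buffer(struct device *v4l_dev,
                                 struct device *drm_dev,
                                 size_t size, dma_addr_t *handles[2])
{
        struct device *devs[2] = { v4l_dev, drm_dev };
        DEFINE_DMA_ATTRS(attrs);

        /*
         * The allocator would be expected to pick memory satisfying the
         * constraints of every listed device, e.g. the AND of their dma
         * masks in the no-iommu case, or a suitable CMA pool otherwise.
         */
        return dma_alloc_attrs_for_devices(devs, 2, size, handles,
                                           GFP_KERNEL, &attrs);
}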
To make dynamic pipeline reconfiguration possible for the example dma_buf exporter, the example exporter could alloc a new buffer when a new device gets attached and copy over the data from the old buffer to the new one, then release the old one. If that doesn't work, it can simply return -EBUSY or -ENOMEM.
If we later on also want to support the streaming dma api better for common dma_buf exporters, we can look into that. But I think that would be a second step after getting this dma api extension in.
I hope we've found the real reason for our disagreement on the dma_buf spec, if I'm totally off, please keep on shouting at me.
Yours, Daniel
Hi Daniel,
On Tuesday 17 April 2012 19:21:37 Daniel Vetter wrote:
On Tue, Apr 17, 2012 at 04:23:08PM +0200, Tomasz Stanislawski wrote:
On 04/17/2012 03:03 PM, Daniel Vetter wrote:
On Tue, Apr 17, 2012 at 01:40:47PM +0200, Tomasz Stanislawski wrote:
Hello everyone, I would like to ask about the agreement on a behavior of a DMABUF exporter for dma_buf_map_attachment.
According to the DMABUF spec the exporter should return a scatterlist mapped into importers DMA space. However there were issues about the concept.
I made a short survey for mapping strategy for DMABUF patches for some drivers:
V4L2 - support for dmabuf importing, hopefully consistent with the dmabuf spec. The patch "v4l: vb2-dma-contig: change map/unmap behaviour for importers" implements DMA mapping on the importer side. However, the patch can be dropped at no cost.
Exynos DRM - the latest version implements mapping on the exporter side
Omap/DRM - 'mapping' is done on exporter side by setting a physical address as DMA address in the scatterlist. The dma_map_sg should be used for this purpose
nouveau/i915 by Dave Airlie - mapping for client is done on importer side. http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2
Does it mean that it is agreed that the exporter is responsible for mapping into the client space?
Hi Daniel,
No, and imo your examples are void: 4. Dave Airlie's latest rfc patches implement dma api mapping on the importer's side. For i915 see e.g. commit http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=f5c6bc7c5483ceaa79ed23c6b0af9b967ebc009f which directly grabs the sg list from the dma_buf exporter and shoves it into the intel gtt functions that require an already dma-mapped sg list.
Sorry, my mistake. I was thinking about 'exporter' when I wrote 'importer'. I agree that i915 is consistent with DMABUF spec. Sorry for the confusion.
Doesn't matter, drm/i915 exporter code in that branch does the dma_map_sg for the attached device:
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=3212... afcce81c993e274a4d1ff28ddf86b3b0
I think Tomasz agrees with you on this one. The latest DMABUF exporter support in DRM/i915 implements mapping to the importer's device in the exporter driver.
- Omap: Afaik the address returned is from the TILER, which can be
accessed from any hw device. So it's already mapped into device address space, just not through an iommu managed by dma-api (but conceptually on the same level). This use case is very much one of the reasons for exporting the sg list already mapped.
No. If the client driver has an IOMMU then it cannot access the TILER based only on a physical address. The client has to map the physical address into the device's address space. This can be done by transforming the paddr into a scatterlist and mapping it using dma_map_sg. Otherwise the physical address is useless from the perspective of the client device.
Therefore returning a phys address as a dma_address is not a valid way of mapping memory.
Of course if none of the HW pieces uses an IOMMU, everything works fine. However, imo the introduction of an IOMMU for some client device should __not__ force any change to the exporter code.
Well, there are 3 ways to make this work with the current dma_buf spec
- There are no other iommus (like you've noticed).
- There are other iommus, but the tiler area is mapped 1:1 into them (the current approach used in omap according to Rob Clark).
- Things from the tiler are mapped on-demand. Currently we don't have any dma interface to remap random pfns, so using these iommus managed by the dma api on top of the omap tiler won't work. What would work is adding a platform-specific bit into the device's dma options to tell the omap tiler code to either map the dma_buf into the tiler or use the platform dma api to map it into device address space.
This use case of a special, driver-managed iommu which is tightly integrated with e.g. the gpu, but can also be accessed by other related devices, is the core scenario I have in mind for mandating device address space mappings. Yes, this needs some special code in the exporter to make it work, but e.g. with omap it's the only way to share buffer objects in a tiled layout (all mapped into device address space through the tiler). For fast video processing pipelines, using tiled objects is crucial for performance, and on many devices the solution for a common tiling layout is a common tiling unit like omap's tiler.
So I don't see the problem here.
I don't see any issue with the first two use cases (no IOMMU or 1:1 mappings), as I expect they will be all we need for tiling units. Tiling units expose a physically contiguous aperture, IOMMU mappings are thus pretty useless. I just cross my fingers and hope we won't get tiling units that expose non-contiguous apertures.
However, I'm not really sure what the implementation will look like. The OMAP tiler can be considered in a way as a special-purpose IOMMU. Even if we use 1:1 IOMMU mappings for the importer, we will need to set those up, resulting in two cascaded IOMMUs. I don't think the DMA API should handle that, so I expect the exporter to come up with an SG list of physical addresses corresponding to the tiler-mapped aperture, and map it to the importer's device using the DMA API. For tiling units that are part of the exporter, or at least tightly coupled with it, I don't expect issues there. For independent tiling units that can be used by several exporter devices, we will need a tiler API to map ranges on demand. We should be careful here, as creating a second IOMMU API wouldn't be such a great idea.
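A rough sketch of that exporter-side flow, under the assumptions stated in the comments (my_buffer and its fields are invented; only the dma_buf_ops callback shape follows the existing dma-buf interface, and the aperture range is assumed to be backed by struct pages):

#include <linux/dma-buf.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/mm.h>
#include <linux/pfn.h>

struct my_buffer {                 /* hypothetical exporter-private object */
        phys_addr_t tiler_paddr;   /* contiguous aperture address from the tiler */
        size_t size;
};

static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
                                       enum dma_data_direction dir)
{
        struct my_buffer *buf = attach->dmabuf->priv;
        struct sg_table *sgt;

        sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
        if (!sgt)
                return ERR_PTR(-ENOMEM);

        if (sg_alloc_table(sgt, 1, GFP_KERNEL))
                goto free_sgt;

        /* Describe the tiler-mapped aperture (assumed struct-page backed). */
        sg_set_page(sgt->sgl, pfn_to_page(PFN_DOWN(buf->tiler_paddr)),
                    buf->size, offset_in_page(buf->tiler_paddr));

        /* Create the mapping for the importer's device via the DMA API. */
        if (!dma_map_sg(attach->dev, sgt->sgl, sgt->nents, dir))
                goto free_table;

        return sgt;

free_table:
        sg_free_table(sgt);
free_sgt:
        kfree(sgt);
        return ERR_PTR(-ENOMEM);
}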
The problem of selecting how to tile the data is also left unanswered. Should the user see multiple instances of the same buffer, each with a different layout, and select the one the importer will need, or should that be handled automagically in the kernel ?
- Exynos uses the iommu api as a nice interface to handle its graphics
aperture (akin to omap tiler or i915 gtt). Last time I checked, the discussion about how this should be integrated (and at what level) with the dma api is still ongoing. So it's imo way too early to conclude that exynos doesn't return device dma addresses, because this iommu might not end up being managed by the core dma api.
- That's your code ;-)
Last but not least your dma_pages_from_sg (or whatever it's called) trick
The function was called dma_get_pages. As mentioned before, this function was a workaround for deficiencies in the DMA api. I agree that it is not possible to transform a buffer into a list of pages in the general case, i.e. there might be no struct pages for a given range of physical addresses at which RAM is available.
However, if it is possible to transform a DMA buffer into a list of pages then the DMA subsystem is the best place to do it. That is why dma_get_pages was introduced to the DMA api in my RFC. The alternative might be dma_get_sg to obtain the scatterlist, but that is only a cosmetic difference from dma_get_pages.
Still, a hack that works sometimes but not always is guaranteed to blow up sooner or later. I'd like to get this right and not rework it too often, hence my stubborn opinion on it.
I'm pretty sure none of us want to get it wrong ;-)
I agree that dma_get_pages() is the wrong level of abstraction, as pages can't correctly describe all possible buffers. dma_get_sg() is slightly better, do you have another proposition ?
You can criticize as you like, but please try to help us find a better way to export a DMA buffer allocated by the DMA api using functions like dma_alloc_{coherent/noncoherent/writecombine}.
To be frank, the current dma api is not suitable to be used with dma_buf if one of your importers cannot deal with struct pages allocated with page_alloc. Which means all the devices that need CMA allocated through the dma api.
I don't think the problem is on the importer's side. Whether the memory comes from page_alloc, from a tiling unit or from a reserved memory block results from how the exporter has allocated memory in the first place. From the importer's point of view all we need is an SG list mapped to the importer's device. Whether the memory is backed by struct page or not doesn't matter once the mapping has been created.
Now for the fully dynamic & insane case of a drm device being the exporter, I think we can live with this. Atm we only have seriously useful drm drivers on x86 and there we don't need CMA.
But I agree that this isn't a workable solution on arm and embedded platforms, and I think we need to pimp up the dma api to be useful for the basic case of sharing a statically allocated piece of memory.
My plan here is that we should have a default dma_buf exporter implementation that uses these new dma api interfaces so that for the usual case you don't have to write implementations for all these nutty dma_buf interface pieces. That would only be required for special cases that need the full generality of the interface (i.e. a drm exporter) or for memory allocated in a special manner (e.g. v4l userpointer buffer or memory allocated from a carveout not backed by struct pages).
To make that happen we need a new way to allocate coherent dma memory, so I guess we'll have a
void * dma_alloc_attrs_for_devices(struct device **devices, int num_devices, dma_addr_t **dma_handles, gfp_t gfp, struct dma_attrs *attrs)
which would allocate coherent dma memory suitable for all the num_devices listed in devices, handing back the dma_handles in the array pointed to by dma_handles.
This would require a one-size-fits-them-all generic memory allocator that knows about all possible devices in a system. We've already discussed this in the past, and I'd *really* like to avoid that as it would quickly become insanely complex to handle. Such an API would also require knowing all the devices that a buffer will be used with beforehand, which isn't provided by the current DMABUF API.
For the no-iommu case (i.e. dma_generic_alloc_coherent) it would be enough to simply & together all the dma masks. On platforms that require more magic (e.g. CMA) this would be more complicated.
To make dynamic pipeline reconfiguration possible for the example dma_buf exporter, the example exporter could alloc a new buffer when a new device gets attached and copy over the data from the old buffer to the new one, then release the old one. If that doesn't work, it can simply return -EBUSY or -ENOMEM.
If we later on also want to support the streaming dma api better for common dma_buf exporters, we can look into that. But I think that would be a second step after getting this dma api extension in.
I hope we've found the real reason for our disagreement on the dma_buf spec, if I'm totally off, please keep on shouting at me.
I don't think there's such a fundamental disagreement. The spec requires exporters to map buffers to the importer's device, and that's what is implemented by the latest work-in-progress patches. The V4L2 patches that move mapping to the importer's side are getting dropped.
Creating the importer's device mapping in the exporter driver was a bit counter-intuitive to me to start with, but I don't have a strong preference now. Things like tiler mappings might be easier to handle in the exporter's device, but at the end of the day the core issue is mapping a piece of memory allocated for one device to another device.
I agree that passing physical addresses (regardless of whether those physical addresses are struct page pointers, PFNs or anything else) around is probably not a good solution. We already have a DMA API to abstract physical memory and provide devices with a DMA address, we should use and extend it. However, DMA addresses are specific to a device, while dma-buf needs to share buffers between separate devices (otherwise it would be pretty pointless), they can't be used to describe a cross-device buffer.
When allocating a buffer using the DMA API, memory is "allocated" behind the scenes and mapped to the device address space ("allocated" in this case means anything from plain physical memory allocation to reservation of a special-purpose memory range, like in the OMAP TILER example). All the allocator device driver gets to see is the DMA address and/or the DMA scatter list. Then, when we want to share the memory with a second device, we need a way to map the memory to the second device's address space. Whether that operation is performed in the importer or exporter driver isn't the core issue; in both cases the mapping will be delegated to the DMA API.
We currently have no way in the DMA API to map memory allocated and mapped to one device to a second device. From an API point of view, we need a call to perform that mapping. That call will need a pointer to the device to map the memory to, and a reference to the memory. One possible solution is to identify the memory by the combination of allocator device pointer and dma address. This would be pretty simple for the caller, but it might be complex to implement in the DMA API. I haven't looked at how this could be done yet.
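One possible, entirely hypothetical shape for such a call, just to make the idea concrete; the name and signature below are invented and nothing like it exists in the DMA API today:

#include <linux/dma-mapping.h>

/*
 * Hypothetical: create a mapping for 'importer' of memory that was
 * allocated for, and is identified by, 'allocator' + 'dma_addr'.
 * Error reporting is left out of this sketch.
 */
dma_addr_t dma_map_for_device(struct device *allocator, dma_addr_t dma_addr,
                              size_t size, struct device *importer,
                              enum dma_data_direction dir);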
On Thu, Apr 19, 2012 at 03:20:30PM +0200, Laurent Pinchart wrote:
Hi Daniel,
On Tuesday 17 April 2012 19:21:37 Daniel Vetter wrote:
On Tue, Apr 17, 2012 at 04:23:08PM +0200, Tomasz Stanislawski wrote:
On 04/17/2012 03:03 PM, Daniel Vetter wrote:
On Tue, Apr 17, 2012 at 01:40:47PM +0200, Tomasz Stanislawski wrote:
Hello everyone, I would like to ask about the agreement on a behavior of a DMABUF exporter for dma_buf_map_attachment.
According to the DMABUF spec the exporter should return a scatterlist mapped into importers DMA space. However there were issues about the concept.
I made a short survey for mapping strategy for DMABUF patches for some drivers:
V4L2 - support for dmabuf importing, hopefully consistent with the dmabuf spec. The patch "v4l: vb2-dma-contig: change map/unmap behaviour for importers" implements DMA mapping on the importer side. However, the patch can be dropped at no cost.
Exynos DRM - the latest version implements mapping on the exporter side
Omap/DRM - 'mapping' is done on exporter side by setting a physical address as DMA address in the scatterlist. The dma_map_sg should be used for this purpose
nouveau/i915 by Dave Airlie - mapping for client is done on importer side. http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2
Does it mean that it is agreed that the exporter is responsible for mapping into the client space?
Hi Daniel,
No, and imo your examples are void: 4. Dave Airlie's latest rfc patches implement dma api mapping on the importer's side. For i915 see e.g. commit http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=f5c6bc7c5483ceaa79ed23c6b0af9b967ebc009f which directly grabs the sg list from the dma_buf exporter and shoves it into the intel gtt functions that require an already dma-mapped sg list.
Sorry, my mistake. I was thinking about 'exporter' when I wrote 'importer'. I agree that i915 is consistent with DMABUF spec. Sorry for the confusion.
Doesn't matter, drm/i915 exporter code in that branch does the dma_map_sg for the attached device:
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=3212... afcce81c993e274a4d1ff28ddf86b3b0
I think Tomasz agrees with you on this one. The latest DMABUF exporter support in DRM/i915 implements mapping to the importer's device in the exporter driver.
- Omap: Afaik the address returned is from the TILER, which can be
accessed from any hw device. So it's already mapped into device address space, just not through an iommu managed by dma-api (but conceptually on the same level). This use case is very much one of the reasons for exporting the sg list already mapped.
No. If the client driver has an IOMMU then it cannot access the TILER based only on a physical address. The client has to map the physical address into the device's address space. This can be done by transforming the paddr into a scatterlist and mapping it using dma_map_sg. Otherwise the physical address is useless from the perspective of the client device.
Therefore returning a phys address as a dma_address is not a valid way of mapping memory.
Of course if none of the HW pieces uses an IOMMU, everything works fine. However, imo the introduction of an IOMMU for some client device should __not__ force any change to the exporter code.
Well, there are 3 ways to make this work with the current dma_buf spec
- There are no other iommus (like you've noticed).
- There are other iommus, but the tiler area is mapped 1:1 into them (the current approach used in omap according to Rob Clark).
- Things from the tiler are mapped on-demand. Currently we don't have any dma interface to remap random pfns, so using these iommus managed by the dma api on top of the omap tiler won't work. What would work is adding a platform-specific bit into the device's dma options to tell the omap tiler code to either map the dma_buf into the tiler or use the platform dma api to map it into device address space.
This use case of a special, driver-managed iommu which is tightly integrated with e.g. the gpu, but can also be accessed by other related devices, is the core scenario I have in mind for mandating device address space mappings. Yes, this needs some special code in the exporter to make it work, but e.g. with omap it's the only way to share buffer objects in a tiled layout (all mapped into device address space through the tiler). For fast video processing pipelines, using tiled objects is crucial for performance, and on many devices the solution for a common tiling layout is a common tiling unit like omap's tiler.
So I don't see the problem here.
I don't see any issue with the first two use cases (no IOMMU or 1:1 mappings), as I expect they will be all we need for tiling units. Tiling units expose a physically contiguous aperture, IOMMU mappings are thus pretty useless. I just cross my fingers and hope we won't get tiling units that expose non-contiguous apertures.
However, I'm not really sure what the implementation will look like. The OMAP tiler can be considered in a way as a special-purpose IOMMU. Even if we use 1:1 IOMMU mappings for the importer, we will need to set those up, resulting in two cascaded IOMMUs. I don't think the DMA API should handle that, so I expect the exporter to come up with an SG list of physical addresses corresponding to the tiler-mapped aperture, and map it to the importer's device using the DMA API. For tiling units that are part of the exporter, or at least tightly coupled with it, I don't expect issues there. For independent tiling units that can be used by several exporter devices, we will need a tiler API to map ranges on demand. We should be careful here, as creating a second IOMMU API wouldn't be such a great idea.
The problem of selecting how to tile the data is also left unanswered. Should the user see multiple instances of the same buffer, each with a different layout, and select the one the importer will need, or should that be handled automagically in the kernel ?
That's a problem I'm trying to dodge atm. But tiling is already an extremely platform/gpu-specific thing and I don't expect that we will ever leak this information to applications. So if you want to share a tiled buffer between a codec and the gpu, this will happen in a platform-specific way and needs tight integration of the codec and GL userspace driver parts anyway. So for this we could just let the kernel do whatever works best for the hw, i.e. if we have a tiling iommu we could say device access goes through that (and essentially looks untiled), or if both devices support detiling natively the exporter could expose dma addresses directly. Or a mix of these.
- Exynos uses the iommu api as a nice interface to handle its graphics
aperture (akin to omap tiler or i915 gtt). Last time I checked, the discussion about how this should be integrated (and at what level) with the dma api is still ongoing. So it's imo way too early to conclude that exynos doesn't return device dma addresses, because this iommu might not end up being managed by the core dma api.
- That's your code ;-)
Last but not least your dma_pages_from_sg (or whatever it's called) trick
The function was called dma_get_pages. As mentioned before, this function was a workaround for deficiencies in the DMA api. I agree that it is not possible to transform a buffer into a list of pages in the general case, i.e. there might be no struct pages for a given range of physical addresses at which RAM is available.
However, if it is possible to transform a DMA buffer into a list of pages then the DMA subsystem is the best place to do it. That is why dma_get_pages was introduced to the DMA api in my RFC. The alternative might be dma_get_sg to obtain the scatterlist, but that is only a cosmetic difference from dma_get_pages.
Still, a hack that works sometimes but not always is guaranteed to blow up sooner or later. I'd like to get this right and not rework it too often, hence my stubborn opinion on it.
I'm pretty sure none of us want to get it wrong ;-)
I agree that dma_get_pages() is the wrong level of abstraction, as pages can't correctly describe all possible buffers. dma_get_sg() is slightly better, do you have another proposition ?
Hm, I haven't seen a dma_get_sg proposal floating around anywhere, can you give a pointer?
You can criticize as you like, but please try to help us find a better way to export a DMA buffer allocated by the DMA api using functions like dma_alloc_{coherent/noncoherent/writecombine}.
To be frank, the current dma api is not suitable to be used with dma_buf if one of your importers cannot deal with struct pages allocated with page_alloc. Which means all the devices that need CMA allocated through the dma api.
I don't think the problem is on the importer's side. Whether the memory comes from page_alloc, from a tiling unit or from a reserved memory block results from how the exporter has allocated memory in the first place. From the importer's point of view all we need is an SG list mapped to the importer's device. Whether the memory is backed by struct page or not doesn't matter once the mapping has been created.
Bad phrasing on my side, the problem is not with the importer. But with the current dma interfaces the exporter simply cannot allocate the memory required by the importer if the importer needs contiguous coherent dma memory.
Now for the fully dynamic & insane case of a drm device being the exporter, I think we can live with this. Atm we only have seriously useful drm drivers on x86 and there we don't need CMA.
But I agree that this isn't a workable solution on arm and embedded platforms, and I think we need to pimp up the dma api to be useful for the basic case of sharing a statically allocated piece of memory.
My plan here is that we should have a default dma_buf exporter implementation that uses these new dma api interfaces so that for the usual case you don't have to write implementations for all these nutty dma_buf interface pieces. That would only be required for special cases that need the full generality of the interface (i.e. a drm exporter) or for memory allocated in a special manner (e.g. v4l userpointer buffer or memory allocated from a carveout not backed by struct pages).
To make that happen we need a new way to allocate coherent dma memory, so I guess we'll have a
void * dma_alloc_attrs_for_devices(struct device **devices, int num_devices, dma_addr_t **dma_handles, gfp_t gfp, struct dma_attrs *attrs)
which would allocate coherent dma memory suitable for all the num_devices listed in devices, handing back the dma_handles in the array pointed to by dma_handles.
This would require a one-size-fits-them-all generic memory allocator that knows about all possible devices in a system. We've already discussed this in the past, and I'd *really* like to avoid that as it would quickly become insanely complex to handle. Such an API would also require knowing all the devices that a buffer will be used with beforehand, which isn't provided by the current DMABUF API.
The idea is that we reallocate this memory when a new device shows up. If that is not possible, you'll simply get an error. I think this would also be more in line with v4l behaviour, which demands that memory is allocated up-front (or must fail).
And I don't think this memory allocator will be too complicated to implement. One thing to note is that it only tackles coherent dma memory, i.e. static allocation, fixed kernel mapping. That leaves out tons of special cases needed in drm land with all their implications on the dma_buf interface.
The other thing is that the dma api can already allocate this memory and we only need to figure out a way to aggregate all these constraints and allocate it from the right memory pool. For simple stuff like the dma_mask we can just & them together, for crazier stuff like contiguous memory or a special CMA pool we could use flags and check which allocators support all the flags we need.
I've already looked a bit through the dma generic stuff and it shouldn't be too hard to integrate it there. Integration with CMA pools will be uglier, but should be doable.
Once the memory is allocated, the only thing left is wrapping the allocation in a sg list and calling the map_sg function of the relevant dma_map_ops for each device.
Totally crazy boards/SoCs/platforms are simply left with creating their own hacks, but I think for somewhat sane platforms this should work.
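A tiny illustration of the mask-aggregation step mentioned above; aggregate_dma_mask() is an invented helper, not an existing or proposed kernel function:

#include <linux/dma-mapping.h>

static u64 aggregate_dma_mask(struct device **devices, int num_devices)
{
        u64 mask = DMA_BIT_MASK(64);
        int i;

        /* Keep only the bits acceptable to every attached device. */
        for (i = 0; i < num_devices; i++)
                mask &= dma_get_mask(devices[i]);

        return mask;
}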
For the no-iommu case (i.e. dma_generic_alloc_coherent) it would be enough to simply & together all the dma masks. On platforms that require more magic (e.g. CMA) this would be more complicated.
To make dynamic pipeline reconfiguration possible for the example dma_buf exporter, the example exporter could alloc a new buffer when a new device gets attached and copy over the data from the old buffer to the new one, then release the old one. If that doesn't work, it can simply return -EBUSY or -ENOMEM.
If we later on also want to support the streaming dma api better for common dma_buf exporters, we can look into that. But I think that would be a second step after getting this dma api extension in.
I hope we've found the real reason for our disagreement on the dma_buf spec, if I'm totally off, please keep on shouting at me.
I don't think there's such a fundamental disagreement. The spec requires exporters to map buffers to the importer's device, and that's what is implemented by the latest work-in-progress patches. The V4L2 patches that move mapping to the importer's side are getting dropped.
Creating the importer's device mapping in the exporter driver was a bit counter-intuitive to me to start with, but I don't have a strong preference now. Things like tiler mappings might be easier to handle in the exporter's device, but at the end of the day the core issue is mapping a piece of memory allocated for one device to another device.
I agree that passing physical addresses (regardless of whether those physical addresses are struct page pointers, PFNs or anything else) around is probably not a good solution. We already have a DMA API to abstract physical memory and provide devices with a DMA address, we should use and extend it. However, DMA addresses are specific to a device, while dma-buf needs to share buffers between separate devices (otherwise it would be pretty pointless), they can't be used to describe a cross-device buffer.
When allocating a buffer using the DMA API, memory is "allocated" behind the scenes and mapped to the device address space ("allocated" in this case means anything from plain physical memory allocation to reservation of a special-purpose memory range, like in the OMAP TILER example). All the allocator device driver gets to see is the DMA address and/or the DMA scatter list. Then, when we want to share the memory with a second device, we need a way to map the memory to the second device's address space. Whether that operation is performed in the importer or exporter driver isn't the core issue; in both cases the mapping will be delegated to the DMA API.
We currently have no way in the DMA API to map memory allocated and mapped to one device to a second device. From an API point of view, we need a call to perform that mapping. That call will need a pointer to the device to map the memory to, and a reference to the memory. One possible solution is to identify the memory by the combination of allocator device pointer and dma address. This would be pretty simple for the caller, but it might be complex to implement in the DMA API. I haven't looked at how this could be done yet.
I think the allocate&map for multiple devices issue is solvable, see above. It's gonna be a bit of work to piece things together, though ;-)
Cheers, Daniel
Hi Daniel,
On Thursday 19 April 2012 15:59:42 Daniel Vetter wrote:
On Thu, Apr 19, 2012 at 03:20:30PM +0200, Laurent Pinchart wrote:
On Tuesday 17 April 2012 19:21:37 Daniel Vetter wrote:
On Tue, Apr 17, 2012 at 04:23:08PM +0200, Tomasz Stanislawski wrote:
[snip]
The function was called dma_get_pages. As mentioned before, this function was a workaround for deficiencies in the DMA api. I agree that it is not possible to transform a buffer into a list of pages in the general case, i.e. there might be no struct pages for a given range of physical addresses at which RAM is available.
However, if it is possible to transform a DMA buffer into a list of pages then the DMA subsystem is the best place to do it. That is why dma_get_pages was introduced to the DMA api in my RFC. The alternative might be dma_get_sg to obtain the scatterlist, but that is only a cosmetic difference from dma_get_pages.
Still, a hack that works sometimes but not always is guaranteed to blow up sooner or later. I'd like to get this right and not rework it too often, hence my stubborn opinion on it.
I'm pretty sure none of us want to get it wrong ;-)
I agree that dma_get_pages() is the wrong level of abstraction, as pages can't correctly describe all possible buffers. dma_get_sg() is slightly better, do you have another proposition ?
Hm, I haven't seen a dma_get_sg proposal floating around anywhere, can you give a pointer?
I don't think one has been posted. My point was that a DMA API function that returned a scatter list instead of a table of struct page would be better.
You can criticize as you like, but please try to help us find a better way to export a DMA buffer allocated by the DMA api using functions like dma_alloc_{coherent/noncoherent/writecombine}.
To be frank, the current dma api is not suitable to be used with dma_buf if one of your importers cannot deal with struct pages allocated with page_alloc. Which means all the devices that need CMA allocated through the dma api.
I don't think the problem is on the importer's side. Whether the memory comes from page_alloc, from a tiling unit or from a reserved memory block results from how the exporter has allocated memory in the first place. From the importer's point of view all we need is an SG list mapped to the importer's device. Whether the memory is backed by struct page or not doesn't matter once the mapping has been created.
Bad phrasing on my side, the problem is not with the importer. But with the current dma interfaces the exporter simply cannot allocate the memory required by the importer if the importer needs contiguous coherent dma memory.
Now for the fully dynamic & insane case of a drm device being the exporter, I think we can live with this. Atm we only have seriously useful drm drivers on x86 and there we don't need CMA.
But I agree that this isn't a workable solution on arm and embedded platforms, and I think we need to pimp up the dma api to be useful for the basic case of sharing a statically allocated piece of memory.
My plan here is that we should have a default dma_buf exporter implementation that uses these new dma api interfaces so that for the usual case you don't have to write implementations for all these nutty dma_buf interface pieces. That would only be required for special cases that need the full generality of the interface (i.e. a drm exporter) or for memory allocated in a special manner (e.g. v4l userpointer buffer or memory allocated from a carveout not backed by struct pages).
To make that happen we need a new way to allocate coherent dma memory, so I guess we'll have a
void * dma_alloc_attrs_for_devices(struct device **devices, int num_devices,
dma_addr_t **dma_handles, gfp_t gfp, struct dma_attrs *attrs)
which would allocate coherent dma memory suitable for all the num_devices listed in devices, handing back the dma_handles in the array pointed to by dma_handles.
This would require a one-size-fits-them-all generic memory allocator that knows about all possible devices in a system. We've already discussed this in the past, and I'd *really* like to avoid that as it would quickly become insanely complex to handle. Such an API would also require knowing all the devices that a buffer will be used with beforehand, which isn't provided by the current DMABUF API.
The idea is that we reallocate this memory when a new device shows up. If that is not possible, you'll simply get an error. I think this would also be more in line with v4l behaviour, which demands that memory is allocated up-front (or must fail).
That's an interesting idea, although I'm not sure we should really do so. Sure, it completely abstracts memory allocation from userspace and provides something that just works (that is, in a couple of years' time, when we'll be done with the implementation :-)). However, the performance impact of reallocating the memory is likely prohibitive, so we will run into issues if applications don't act smartly. Providing an API that just works but then fails in some cases because of performance issues (especially when the issues look potentially random from a userspace perspective) is asking for trouble. I'd rather have a solid API that requires a bit of thinking on userspace's side, to make sure userspace developers think about what they do.
And I don't think this memory allocator will be too complicated to implement. One thing to note is that it only tackles coherent dma memory, i.e. static allocation, fixed kernel mapping. That leaves out tons of special cases needed in drm land with all their implications on the dma_buf interface.
The other thing is that the dma api can already allocate this memory and we only need to figure out a way to aggregate all these constraints and allocate it from the right memory pool. For simple stuff like the dma_mask we can just & them together, for crazier stuff like contiguous memory or a special CMA pool we could use flags and check which allocators support all the flags we need.
One of those crazy constraints comes from Samsung. They need to allocate the luma and chroma planes (for NV formats) in different memory banks for some of their IP blocks. Flags could be used to direct allocation to a specific CMA pool, but we would then need to know about the buffer content format to know what flag to set.
I've already looked a bit through the dma generic stuff and it shouldn't be too hard to integrate it there. Integration with CMA pools will be uglier, but should be doable.
Once the memory is allocated, the only thing left is wrapping the allocation in a sg list and calling the map_sg function of the relevant dma_map_ops for each device.
Totally crazy boards/SoCs/platforms are simply left with creating their own hacks, but I think for somewhat sane platforms this should work.
For the no-iommu case (i.e. dma_generic_alloc_coherent) it would be enough to simply & together all the dma masks. On platforms that require more magic (e.g. CMA) this would be more complicated.
To make dynamic pipeline reconfiguration possible for the example dma_buf exporter, the example exporter could alloc a new buffer when a new device gets attached and copy over the data from the old buffer to the new one, then release the old one. If that doesn't work, it can simply return -EBUSY or -ENOMEM.
If we later on also want to support the streaming dma api better for common dma_buf exporters, we can look into that. But I think that would be a second step after getting this dma api extension in.
I hope we've found the real reason for our disagreement on the dma_buf spec, if I'm totally off, please keep on shouting at me.
I don't think there's such a fundamental disagreement. The spec requires exporters to map buffers to the importer's device, and that's what is implemented by the latest work-in-progress patches. The V4L2 patches that move mapping to the importer's side are getting dropped.
Creating the importer's device mapping in the exporter driver was a bit counter-intuitive to me to start with, but I don't have a strong preference now. Things like tiler mappings might be easier to handle in the exporter's device, but at the end of the day the core issue is mapping a piece of memory allocated for one device to another device.
I agree that passing physical addresses (regardless of whether those physical addresses are struct page pointers, PFNs or anything else) around is probably not a good solution. We already have a DMA API to abstract physical memory and provide devices with a DMA address, we should use and extend it. However, DMA addresses are specific to a device, while dma-buf needs to share buffers between separate devices (otherwise it would be pretty pointless), they can't be used to describe a cross-device buffer.
When allocating a buffer using the DMA API, memory is "allocated" behind the scenes and mapped to the device address space ("allocated" in this case means anything from plain physical memory allocation to reservation of a special-purpose memory range, like in the OMAP TILER example). All the allocator device driver gets to see is the DMA address and/or the DMA scatter list. Then, when we want to share the memory with a second device, we need a way to map the memory to the second device's address space. Whether that operation is performed in the importer or exporter driver isn't the core issue; in both cases the mapping will be delegated to the DMA API.
We currently have no way in the DMA API to map memory allocated and mapped to one device to a second device. From an API point of view, we need a call to perform that mapping. That call will need a pointer to the device to map the memory to, and a reference to the memory. One possible solution is to identify the memory by the combination of allocator device pointer and dma address. This would be pretty simple for the caller, but it might be complex to implement in the DMA API. I haven't looked at how this could be done yet.
I think the allocate&map for multiple devices issue is solvable, see above. It's gonna be a bit of work to piece things together, though ;-)
Do you think it would be that complex to split allocation and mapping ? Surely enough we could also provide an allocate & map function, but if we can separate the two operations we could also add an importer at runtime if needed.
Hi Tomasz,
On Tuesday 17 April 2012 16:23:08 Tomasz Stanislawski wrote:
[snip]
To everyone:
I see that there are multiple issues around DMABUF. My proposition is to organize a brainstorming session (circa 3 days) in Warsaw, similar to the V4L brainstorming that happened about a year ago. It would focus on topics related to DMA mapping and DMABUF.
What do you think about this idea? Who would like to attend?
I like the idea, as it would considerably speed up the process, but that requires at least one person from each relevant subsystem/framework to be present. I definitely would like to attend.
On Thu, Apr 19, 2012 at 08:26:31AM +0200, Laurent Pinchart wrote:
To everyone:
I see that there are multiple issues around DMABUF. My proposition is to organize a brainstorming session (circa 3 days) in Warsaw, similar to the V4L brainstorming that happened about a year ago. It would focus on topics related to DMA mapping and DMABUF.
What do you think about this idea? Who would like to attend?
I like the idea, as it would considerably speed up the process, but that requires at least one person from each relevant subsystem/framework to be present. I definitely would like to attend.
I think it's going to be hard to get Rob or Sumit to go to Europe this month. We could definitely help get calls and/or hangout conferences going (though the timezone spread makes the duration of them somewhat limited).
We are currently planning on running an integration camp in the UK in early May, and if by then this issue hasn't advanced otherwise we can definitely use some of that time to flesh out the remaining issues.
Hi Kiko,
On Thursday 19 April 2012 11:41:04 Christian Robottom Reis wrote:
On Thu, Apr 19, 2012 at 08:26:31AM +0200, Laurent Pinchart wrote:
To everyone:
I see that there are multiple issues around DMABUF. My proposition is to organize a brainstorming session (circa 3 days) in Warsaw, similar to the V4L brainstorming that happened about a year ago. It would focus on topics related to DMA mapping and DMABUF.
What do you think about this idea? Who would like to attend?
I like the idea, as it would considerably speed up the process, but that requires at least one person from each relevant subsystem/framework to be present. I definitely would like to attend.
I think it's going to be hard to get Rob or Sumit to go to Europe this month. We could definitely help get calls and/or hangout conferences going (though the timezone spread makes the duration of them somewhat limited).
We are currently planning on running an integration camp in the UK in early May, and if by then this issue hasn't advanced otherwise we can definitely use some of that time to flesh out the remaining issues.
Early May is only a couple of weeks away; I don't think we will organize anything before then anyway.
Hi Daniel,
I have already written about this issue in the dma mapping patch thread, but I would like to add it here as well, to keep all known issues in one place, as Kiko wanted. This will probably help him track the missing pieces of the UMM work.
On 03/22/2012 11:31 PM, Daniel Vetter wrote:
On Thu, Mar 22, 2012 at 11:54:55AM -0300, Christian Robottom Reis wrote:
Tomasz: proposed extension to DMA Mapping -- dma_get_pages Currently difficult to change the camera address into list of pages DMA framework has the knowledge of this list and could do this Depends on dma_get_pages Needs to be merged first Test application posted to dri-devel with dependencies to run demo Many dependencies
I kinda missed yelling at this patch when it first showed up, so I'll do that here ;-)
I think this is a gross layering violation and I don't like it at all. The entire point of the dma api is that device drivers only get to see device addresses and can forget about all the remapping/contig-alloc madness. And dma-buf should just follow this with its map/unmap interfaces.
Furthermore the exporter memory might simply not have any associated struct pages. The two examples I always bring up:
- special purpose remapping units (like omap's TILER) which are managed by the exporter and can do crazy things like tiling or rotation transparently for all devices.
- special carve-out memory which is unknown to linux memory management. drm/i915 is totally abusing this, mostly because windows is lame and doesn't have decent largepage allocation support. This is just plain system memory, but there's no struct page for it (because it's not part of the system map).
Now the core dma api isn't fully up to snuff for everything yet and there are things missing. But what's missing is certainly not dma_get_pages; it's more things like mmap support for coherent memory, or allocating coherent memory which doesn't have a static mapping in the kernel address space. I very much hope that the interfaces we develop for dma-buf (and the insights gained) can help as examples here, so that in the future there's not such a gaping difference for the driver between dma_coherent allocations of its own and imported buffer objects.
I have already faced the coherent memory issue. I would like to mention the scenario in which this was observed.
Origen has an encoder/decoder IP which can declare coherent regions. Now when dma_buf was used to share a buffer between this IP and another IP (for color space conversion), it was observed that the cache operations (aka cpu access to buffers) failed. The reason for this is that coherent memory won't have struct page entries, nor kernel addresses (it is not remapped into the consistent dma area). When the cpu accesses the page structures during the map_sg operations, it leads to a fault.
I had a discussion with Marek on IRC about this. We need to find a mechanism to solve this issue. Please correct me if I am wrong somewhere in my analysis.
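For illustration only, the kind of guard that is missing in such a path today: memory from a coherent carveout may have no struct page, so page-based scatterlist handling and cache maintenance on it will fault. The helper below is a sketch, not code from any of the patches under discussion:

#include <linux/mm.h>
#include <linux/pfn.h>

/* Hypothetical check before describing a region with struct pages. */
static bool region_has_struct_pages(phys_addr_t paddr, size_t size)
{
        unsigned long pfn = PFN_DOWN(paddr);
        unsigned long last = PFN_DOWN(paddr + size - 1);

        for (; pfn <= last; pfn++)
                if (!pfn_valid(pfn))
                        return false;   /* e.g. carveout/coherent pool memory */

        return true;
}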
Yours, Daniel
Regards, Subash
Hi Christian,
A few comments. See below:
Em 22-03-2012 11:54, Christian Robottom Reis escreveu:
Hey there,
We ran a call today on the topics of V4L2/dmabuf integration and the
DMA mapping changes; I've included minutes and actions below.
In summary:
- We're not going to see the V4L2 patches in 3.4; they haven't had enough review or testing and Mauro is very busy this merge window. - We're going to put in effort to validate the work on Exynos4 and OMAP4 using test applications that Rob and Tomasz are maintaining. - Sumit's maintaining an up-to-date branch containing all the in-flight DRM/V4L2/dmabuf-dependent work that we can carry meanwhile: http://git.linaro.org/gitweb?p=people/sumitsemwal/linux-3.x.git;a=shortlog;h=refs/heads/umm-3.3-wip
Laurent, I missed one thing that you mentioned in the call and it's included in an XXX: below; if you could note what it was it would be great.
Please feel free to correct my possibly wild misunderstandings in the minutes and summary above. Thanks!
Attendees are all on the To: line.
Tomasz: At the moment, have PoC support for import/export dma-buf for V4L2 Modified patches by Sumit Supporting importer of dma-buf Exynos DRM and drm-prime Test application worked fine for V4L capture and DRM output Test application between two V4L devices Laurent has sent in review comments Tomasz: proposed extension to DMA Mapping -- dma_get_pages Currently difficult to change the camera address into list of pages DMA framework has the knowledge of this list and could do this Depends on dma_get_pages Needs to be merged first Test application posted to dri-devel with dependencies to run demo Many dependencies Topic: dmabuf importer from V4L2 For sharing with DRM do not need exporter Need some patches added to DRM prime prime for Exynos drm-prime ACTION: Tomasz to check with Inki Dae and David A. on status- Will send request to Mauro? Doesn't think so, won't have enough time for testing RC is already open -- not enough time for that API details needs consideration Depends on how much time current project 2-3 weeks after the merge window closes 1 month to actually review Would like to see Exynos working Mauro has a Samsung V310/Exynos4 dev board Also wants to see tested with V4L driver virtual drivers also supporting dma-buf
It should be, instead:
Also wants to see tested with V4L virtual driver (vivi) also supporting dma-buf
wants both importer and exporter?
I want to simulate what's currently done with the Xorg v4l driver/xawtv and a board with overlay support, as dma-buf is, to my understanding, a replacement for it.
ACTION: Sumit to look into VV driver used for v4l2 testing needs importer and exporter
s/VV/vivi/
Laurent has work that he can push forward in parallel API change ioctl addition that could be reviewed Rob: the demo had V4L importer and DRM exporter (?) There are some changes for prime changes Laurent: needs to implement importer/exporter camera piece With drm-prime-dmabuf patches drivers can be simplified Sumit: we have a new 3.3-WIP branch Rob could use that as a base and put DRM updates on it ACTION: Rob to pull in changes and update demonstration Rob: what camera does Mauro have? On the Samsung SDK, M5MOLS Mauro does have a Pandaboard with ov5650 camera but needs setting up and potential hardware mod ACTION: Mauro to take picture of setup for Sumit
Tried to take a few pictures of it, but they weren't clear enough.
What happens is that there's a connector to be soldered in order to allow me to plug in the camera daughter board. The connector Sumit sent me is higher on one of the sides. There is a small "peek" on it (probably at the side that should match pin 1), as if it were expecting a hole in the board. Without either adding a hole to the motherboard or cutting the "peek" off, the pins can't be soldered.
Should I try to cut it? How do I know what's the right position for the connector?
As a backup Rob could add a Tested-By for the changes Mauro essentially wants a test on With CPU access in dmabuf can use vmalloc Without dependency on dma mapping at all Without Russell's acceptance can't go forward with ARM-related pieces ACTION: Kiko to check with Nicolas on this piece, see if he can review or help Marek with the ARM-related pieces Tomasz could use the 3.3-WIP branch Sumit: rebasing not a good idea, but could pull in for-next Suggests Tomasz bases work on Linus' mainline Had problems with Tomasz' branch that is based on -next His branch includes Tomasz RFCv2 patches as well Laurent: agree with Sumit ACTION: Sumit to email Tomasz and CC: Inki Dae ACTION: send around latest drm-prime branch Tomasz: so no XXX: exporter support for V4L? Laurent: doesn't have time to update XXX at the moment Needs porting to videobuf2 Rob: Looks like Daniel V. has also replied to the V4L patches so it's going to be hard for 3.4 Mauro's lack of time makes 3.4 not possible anyway mmap support also likely to miss 3.4 EGLImage extension needs sorting out Can be carried in our WIP branch meanwhile Mauro: what are the test applications being used for dmabuf? Rob: using Panda with camera adapter board (omap-drm) Would like others to have similar setups Requires YUV sensor, but there are patches that support the RAW YUV conversion which allows other sensors to be used Mauro: but what software app are you using? Rob: has test code in github: https://github.com/robclark/omapdrmtest Tomasz: Have posted one test application on dri-devel (March 6th) http://www.spinics.net/lists/dri-devel/msg19634.html Second application posted to linux-media As a reply to RFCv1 dma-buf V4L2 patches (Jan 26th) http://www.mail-archive.com/linux-media@vger.kernel.org/msg42522.html Mauro: include both applications together with patches when posting
A small addition to it: the applications should either be sent to linux-media or, better, a link should be provided there so that the test environment can be downloaded.
Sumit: could ask Inki to provide a similar exynos-drm test application in parallel to Rob's omap-drm; the same interface will simplify testing
Marek: device-to-device coordination will be needed; will ideally start next week
For a device-to-device operation, need this to avoid touching the CPU cache
Not sure about the ARM stuff for 3.4; needs review
If they don't go in, will keep them on dma-mapping-next
Konrad's comments are addressed and Reviewed-bys added
Take care,
Hi Kiko, everyone,
Some update on my Action items:
On 22 March 2012 20:24, Christian Robottom Reis kiko@linaro.org wrote:
<snip>
ACTION: Sumit to look into VV driver used for v4l2 testing needs importer and exporter
I will start looking into this soon - though most likely it should be simple enough if vivi is using the dma-contig allocator of videobuf2.
<snip>
ACTION: Sumit to email Tomasz and CC: Inki Dae
ACTION: send around latest drm-prime branch
I will send this email by today - Dave A has posted the next version of DRM Prime a short while back.
<snip>
On 27 March 2012 16:11, Sumit Semwal sumit.semwal@linaro.org wrote:
<snip>
> ACTION: send around latest drm-prime branch
I will send this email by today - Dave A has posted the next version of DRM Prime a short while back.
DRM Prime support: the latest version is at http://cgit.freedesktop.org/~airlied/linux/log/?h=drm-dmabuf2 (he posted out v5 a little while back here: http://lists.freedesktop.org/archives/dri-devel/2012-March/020611.html)
Latest dma-buf: http://git.linaro.org/git/people/sumitsemwal/linux-dma-buf.git branch: for-linus-3.4
<snip> --
Best regards, ~Sumit.
Hi Everyone,
I started preparing support for DMABUF in the VIVI allocator. I encountered a problem that may involve making an important design decision.
Option I
Use the existing dma_buf_attach/dma_buf_map_attachment mechanism.
The vb2-vmalloc allocator (and thus VIVI) would be relatively easy to implement if calling dma_buf_attach on a NULL device were allowed. AFAIK, the dmabuf code does not dereference this pointer. Permitting a NULL device pointer would allow a DMABUF to be accessed by an importer that is not associated with any device (like VIVI). After obtaining the sglist, the importer would map it into its kernel address space using kmap, vm_map_ram or remap_pfn_range. Note that the memory would be pinned after calling dma_buf_map_attachment, so it will not be freed in parallel.
Cache flushing would still be an unsolved problem, but the same situation occurs for non-NULL devices. It may be fixed by a future extension to the dmabuf API.
I prefer this approach because it is compatible with the 'importer-maps-memory-for-importer' strategy; a rough sketch of this path is shown below.
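For concreteness, here is a minimal sketch of how such a device-less importer could be wired up, assuming dma_buf_attach() tolerated a NULL device (exactly the relaxation argued for above, not current behaviour); the function name, the output parameters and the use of vmap() instead of vm_map_ram() are illustrative choices, not part of videobuf2:

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/err.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/*
 * Illustrative Option I importer for a device-less user such as vb2-vmalloc.
 * ASSUMPTION: dma_buf_attach() accepts dev == NULL; that is the proposed API
 * relaxation discussed above, not guaranteed current behaviour.
 */
static void *vmalloc_import_dmabuf_sketch(struct dma_buf *dbuf,
					  struct dma_buf_attachment **pattach,
					  struct sg_table **psgt)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;
	struct scatterlist *sg;
	struct page **pages;
	void *vaddr;
	unsigned int n = 0, j;
	int i;

	attach = dma_buf_attach(dbuf, NULL);	/* NULL device: the proposal */
	if (IS_ERR(attach))
		return NULL;

	/* Pins the backing memory until dma_buf_unmap_attachment(). */
	sgt = dma_buf_map_attachment(attach, DMA_FROM_DEVICE);
	if (IS_ERR(sgt))
		goto err_detach;

	for_each_sg(sgt->sgl, sg, sgt->nents, i)
		n += PAGE_ALIGN(sg->length) >> PAGE_SHIFT;

	pages = kcalloc(n, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		goto err_unmap;

	/* Flatten the sglist into a page array ... */
	n = 0;
	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
		unsigned int chunk_pages = PAGE_ALIGN(sg->length) >> PAGE_SHIFT;

		for (j = 0; j < chunk_pages; j++)
			pages[n++] = sg_page(sg) + j;
	}

	/* ... and map it contiguously into the kernel address space. */
	vaddr = vmap(pages, n, VM_MAP, PAGE_KERNEL);
	kfree(pages);
	if (!vaddr)
		goto err_unmap;

	*pattach = attach;
	*psgt = sgt;
	return vaddr;	/* vunmap() and unmap/detach when releasing the buffer */

err_unmap:
	dma_buf_unmap_attachment(attach, sgt, DMA_FROM_DEVICE);
err_detach:
	dma_buf_detach(dbuf, attach);
	return NULL;
}

The cache flushing caveat above still applies: nothing in this path synchronizes the CPU view with the exporting device.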
Option II
Recently, support for kernel CPU access to DMABUF was proposed by Daniel Vetter. It seems to be suitable for VIVI and the vb2-vmalloc allocator.
However, there are some issues.
1. VIVI requires that the whole kernel mapping is contiguous in the VMALLOC area, accessible through a single pointer to the first byte of the buffer. The interface proposed by Daniel involves calling dma_buf_kmap. This function takes a page number as its argument, but the spec does not guarantee that page n and page n+1 will be mapped to sequential addresses. I think this requirement should be added to the spec.
2. AFAIK, using the dma_buf_kmap interface does not require any attach operation, so CPU access is more or less a parallel mechanism for memory access.
Calling dma_buf_map_attachment seems to be equivalent to calling dma_buf_begin_cpu_access for the whole buffer, plus kmap for all pages if the exporter is responsible for mapping memory at dma_buf_map_attachment time (see the sketch below).
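To make the comparison concrete, here is a rough sketch of whole-buffer access through the proposed CPU-access interface, with the signatures as I read them in Daniel's RFC (dma_buf_begin_cpu_access/dma_buf_end_cpu_access taking a start/len range, dma_buf_kmap/dma_buf_kunmap taking a page number); the helper name is made up and the details may still change:

#include <linux/dma-buf.h>
#include <linux/dma-direction.h>
#include <linux/mm.h>
#include <linux/string.h>

/*
 * Sketch of whole-buffer CPU access via the proposed kmap interface.
 * ASSUMPTION: signatures follow Daniel's RFC. Note that there is no single
 * linear pointer: each page is mapped and unmapped individually, and adjacent
 * pages need not end up at adjacent kernel addresses.
 */
static int cpu_touch_dmabuf_sketch(struct dma_buf *dbuf, size_t size)
{
	unsigned long pgnum, npages = PAGE_ALIGN(size) >> PAGE_SHIFT;
	int ret;

	ret = dma_buf_begin_cpu_access(dbuf, 0, size, DMA_FROM_DEVICE);
	if (ret)
		return ret;

	for (pgnum = 0; pgnum < npages; pgnum++) {
		void *vaddr = dma_buf_kmap(dbuf, pgnum);

		if (!vaddr) {
			ret = -ENOMEM;
			break;
		}
		/* Process one page at a time. */
		memset(vaddr, 0, PAGE_SIZE);
		dma_buf_kunmap(dbuf, pgnum, vaddr);
	}

	dma_buf_end_cpu_access(dbuf, 0, size, DMA_FROM_DEVICE);
	return ret;
}

This is exactly the per-page pattern that clashes with vb2-vmalloc's expectation of a single contiguous mapping.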
Is it really worth introducing a parallel mechanism?
Could you give me a hint which solution is better?
Regards, Tomasz Stanislawski
On Tue, Mar 27, 2012 at 1:25 PM, Tomasz Stanislawski t.stanislaws@samsung.com wrote:
<snip>
Could you give me a hint which solution is better?
Option III - write a vmap interface ala,
http://cgit.freedesktop.org/~airlied/linux/commit/?h=drm-dmabuf2&id=c481...
I'm using this for i915->udl mappings, though vmap is a limited resource and I'm sure on ARM it's even more limited.
Dave.
Hi Dave,
Thank you for the suggestion of using the vmap/vunmap extensions for dmabuf. It was exactly what I needed. It leads to a trivial (circa 60 lines) implementation of a DMABUF importer for vmalloc.
I prepared a PoC implementation and successfully tested it using s5p-tv as the exporter and VIVI as the importer. I will prepare and post patches and a test application for the DRM-to-VIVI version next week. The Exynos DRM makes use of dma_alloc_coherent for buffer allocation, making the implementation of vmap/vunmap trivial (see the sketch below).
Memory coherence fixes all caching problems :)
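To show why the exporter side is trivial with a coherent allocation, here is a sketch of the vmap/vunmap callbacks under the assumptions above; the op signatures are taken from Dave's drm-dmabuf2 branch, and the buffer structure is hypothetical, not the actual Exynos DRM code:

#include <linux/dma-buf.h>

/* Hypothetical exporter state: kvaddr comes from dma_alloc_coherent(). */
struct coherent_buf_sketch {
	void *kvaddr;
};

static void *exporter_vmap_sketch(struct dma_buf *dbuf)
{
	struct coherent_buf_sketch *buf = dbuf->priv;

	/* The coherent allocation is already mapped in the kernel; hand it out. */
	return buf->kvaddr;
}

static void exporter_vunmap_sketch(struct dma_buf *dbuf, void *vaddr)
{
	/* Nothing to undo: the mapping lives as long as the coherent allocation. */
}

These would be plugged into the exporter's dma_buf_ops as the vmap/vunmap hooks from that branch.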
I decided to give up on setting up CPU access using the sglist provided by dma_buf_map_attachment. I agree with you that mappings in the vmalloc area are a scarce resource, therefore mapping a scatterlist into the vmalloc area is not a good idea.
The kmap interface presented by Vetter is much more generic and robust, but it does not suit vb2-vmalloc's and VIVI's designs well.
Adding a requirement that consecutive pages are mapped at consecutive addresses would help little, because VIVI has to touch the whole buffer anyway. Therefore vb2-vmalloc would have to kmap all pages during the map_dmabuf operation. Moreover, requiring kmap to map consecutive pages at consecutive addresses may lead to heavy fragmentation of the vmalloc area.
Calling a simple vmap callback would do exactly what needs to be done for vmalloc.
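On the importer side, the whole thing boils down to something like the sketch below, assuming the dma_buf_vmap()/dma_buf_vunmap() helpers from Dave's branch (names as in that branch, not yet in mainline); the function name is illustrative:

#include <linux/dma-buf.h>
#include <linux/err.h>

/* Minimal vb2-vmalloc style import: one call yields a contiguous mapping. */
static void *vb2_vmalloc_import_sketch(struct dma_buf *dbuf)
{
	void *vaddr = dma_buf_vmap(dbuf);

	if (!vaddr)
		return ERR_PTR(-ENOMEM);
	return vaddr;	/* released later with dma_buf_vunmap(dbuf, vaddr) */
}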
Regards, Tomasz Stanislawski
On 03/27/2012 02:48 PM, Dave Airlie wrote:
<snip>
Hi Tomasz,
I have updated DRM Prime for Exynos DRM. For this you can refer to the link below: git://git.infradead.org/users/kmpark/linux-samsung exynos-drm-prime
This branch is based on git://people.freedesktop.org/~airlied/linux.git drm-prime-dmabuf-initial, posted by Dave recently. I have already tested this version internally (just for import and export) and it worked fine. I will then try to test it with the V4L2-based MFC and FIMC drivers on an Exynos4412 SoC next week.
Please let me know if there is any problem.
Thanks, Inki Dae
On 30 March 2012 at 05:35 AM, Tomasz Stanislawski t.stanislaws@samsung.com wrote:
<snip>
Just as a heads-up, I've been in contact with Mauro who is a) still quite busy and b) still blocked on hardware with an operational sensor. I'm working on unblocking him at least on b) and I'm really hoping we're all thinking about 3.5 as the window for this work to go in.
On Fri, Mar 30, 2012 at 02:35:52PM +0200, Tomasz Stanislawski wrote:
<snip>
On Thu, Mar 22, 2012 at 11:54:55AM -0300, Christian Robottom Reis wrote:
Marek: Not sure about ARM stuff for 3.4; need review
If they don't go in, will keep on dma-mapping-next
Just a heads-up that Linus is looking for users of the DMA mapping framework to confirm they would like to see Marek's changes accepted:
https://lkml.org/lkml/2012/3/31/214
[...]
* DMA-mapping framework. The tree now has a few more acks from people, and it's largely in the same situation as HSI is: I'll probably pull, but I really wanted the users who are impacted to actually talk to me about it.
Please chime in and say why this patchset will make the world better! Thanks,