(changed subject and decoupled from the udmabuf thread)
On Wed, Apr 11, 2018 at 08:59:32AM +0300, Oleksandr Andrushchenko wrote:
> On 04/10/2018 08:26 PM, Dongwon Kim wrote:
> >On Tue, Apr 10, 2018 at 09:37:53AM +0300, Oleksandr Andrushchenko wrote:
> >>On 04/06/2018 09:57 PM, Dongwon Kim wrote:
> >>>On Fri, Apr 06, 2018 at 03:36:03PM +0300, Oleksandr Andrushchenko wrote:
> >>>>On 04/06/2018 02:57 PM, Gerd Hoffmann wrote:
> >>>>> Hi,
> >>>>>
> >>>>>>>I fail to see any common ground for xen-zcopy and udmabuf ...
> >>>>>>Does the above mean you can assume that xen-zcopy and udmabuf
> >>>>>>can co-exist as two different solutions?
> >>>>>Well, udmabuf route isn't fully clear yet, but yes.
> >>>>>
> >>>>>See also gvt (intel vgpu), where the hypervisor interface is abstracted
> >>>>>away into a separate kernel module even though most of the actual vgpu
> >>>>>emulation code is common.
> >>>>Thank you for your input, I'm just trying to figure out
> >>>>which of the three z-copy solutions intersect and how much they overlap.
> >>>>>>And what about hyper-dmabuf?
> >>>The xen z-copy solution is fundamentally pretty similar to hyper_dmabuf
> >>>in terms of these core sharing features:
> >>>
> >>>1. the sharing process - import prime/dmabuf from the producer -> extract
> >>>underlying pages and get those shared -> return references for shared pages
> >Another thing is danvet was kind of against the idea of importing an existing
> >dmabuf/prime buffer and forwarding it to the other domain due to synchronization
> >issues. He proposed to make hyper_dmabuf work only as an exporter so that it
> >can have full control over the buffer. I think we need to talk about this
> >further as well.
> Yes, I saw this. But this limits the use-cases so much.
I agree. Our current approach is a lot more flexible. You can find very
similar feedback in my reply to those review messages. However, I also
understand Daniel's concern. I believe we need more discussion
regarding this matter.
> For instance, running Android as a Guest (which uses ION to allocate
> buffers) means that ultimately the HW composer will import the dma-buf into
> the DRM driver. Then, in case of xen-front for example, it needs to be
> shared with the backend (Host side). Of course, we can change user-space
> to make xen-front allocate the buffers (make it the exporter), but what we try
> to avoid is changing user-space which would otherwise have remained
> unchanged.
> So, I do think we have to support this use-case and just have to understand
> the complexity.
>
> >
> >danvet, can you comment on this topic?
> >
> >>>2. the page sharing mechanism - it uses the Xen grant table (see the sketch below).
> >>>
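For readers less familiar with either driver, here is a minimal sketch of what
steps 1 and 2 above boil down to on the exporting side. It is purely
illustrative and not the actual code of xen-zcopy or hyper_dmabuf; function and
variable names are invented, and error handling, cleanup and allocation of
refs[] are omitted.

#include <linux/dma-buf.h>
#include <linux/scatterlist.h>
#include <xen/grant_table.h>
#include <xen/page.h>

/* Illustrative only: import a dma-buf, walk its backing pages and grant
 * each page to a remote Xen domain.
 */
static int share_dmabuf_pages(struct device *dev, int prime_fd,
			      domid_t remote_domid, grant_ref_t *refs)
{
	struct dma_buf *dbuf = dma_buf_get(prime_fd);
	struct dma_buf_attachment *att = dma_buf_attach(dbuf, dev);
	struct sg_table *sgt = dma_buf_map_attachment(att, DMA_BIDIRECTIONAL);
	struct sg_page_iter iter;
	int i = 0;

	/* step 1: extract the pages backing the imported buffer */
	for_each_sg_page(sgt->sgl, &iter, sgt->nents, 0) {
		struct page *page = sg_page_iter_page(&iter);

		/* step 2: one grant reference per shared page */
		refs[i++] = gnttab_grant_foreign_access(remote_domid,
						xen_page_to_gfn(page), 0);
	}
	return i;	/* refs[] is what the importing domain receives */
}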
> >>>And to give you a quick summary of differences as far as I understand
> >>>between the two implementations (please correct me if I am wrong, Oleksandr.)
> >>>
> >>>1. xen-zcopy is DRM-specific - it can import only DRM PRIME buffers
> >>>while hyper_dmabuf can export any dmabuf regardless of its originator
> >>Well, this is true. And at the same time this is just a matter
> >>of extending the API: xen-zcopy is a helper driver designed for
> >>the xen-front/back use-case, which is why it only has a DRM PRIME API
> >>>2. xen-zcopy doesn't seem to have dma-buf synchronization between the two VMs,
> >>>while hyper_dmabuf (what danvet called remote dma-buf API sharing) sends
> >>>out synchronization messages to the exporting VM.
> >>This is true. Again, this is because of the use-cases it covers.
> >>But having synchronization for a generic solution seems to be a good idea.
> >Yeah, understood, xen-zcopy works OK with your use-case. But I am just curious
> >whether it is OK not to have any inter-domain synchronization in this sharing model.
> The synchronization is done with the displif protocol [1]
> >The buffer being shared is technically a dma-buf and the originator needs to be
> >able to keep track of it.
> As I am working in DRM terms, the tracking is done for me by the DRM core
> for free. (This might be one of the reasons Daniel sees a DRM-based
> implementation as a very good fit from a code-reuse POV.)
Yeah, but once you have a DRM object (whether it's a dmabuf or not) in a remote
domain, it is a totally new object and out of sync (correct me if I am wrong)
with the original DRM PRIME object, isn't it? How could these two different
objects, backed by the same pages, be synchronized?
> >
> >>>3. 1-level references - when using the grant table for sharing pages, there will
> >>>be the same # of refs (each 8 byte)
> >>To be precise, grant ref is 4 bytes
> >You are right. Thanks for correction.;)
> >
> >>>as the # of shared pages, which is passed to
> >>>userspace to be shared with the importing VM in the case of xen-zcopy.
> >>The reason for that is that xen-zcopy is a helper driver, e.g.
> >>the grant references come from the display backend [1], which implements
> >>Xen display protocol [2]. So, effectively the backend extracts references
> >>from frontend's requests and passes those to xen-zcopy as an array
> >>of refs.
> >>> Compared
> >>>to this, hyper_dmabuf does multi-level addressing to generate only one
> >>>reference id that represents all shared pages.
> >>In the protocol [2] only one reference to the gref directory is passed
> >>between VMs (and the gref directory is a singly-linked list of shared pages
> >>containing all of the grefs of the buffer).
> >ok, good to know. I will look into its implementation in more detail, but is
> >this gref directory (chained grefs) something that can be used for any general
> >memory sharing use-case or is it just for xen-display (in the current code base)?
> Not to mislead you: one grant ref is passed via displif protocol,
> but the page it's referencing contains the rest of the grant refs.
>
I checked displif.h. I like the concept of chaining 2nd-level grefs.
As you have probably realized, our multi-level addressing is almost
identical to the gref directory, except that we defined another level on top
to address multiple 2nd-level grefs instead of creating a linked list. And I
see there would be an advantage in terms of memory saving in your method.
Now I wonder why this should remain just a displif feature. I think
we could expand it to any type of large-buffer sharing use-case in Xen
(possibly as an extension to the grant-table driver?)
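For reference, the gref directory that displif/sndif use is conceptually a
page-sized node of a singly-linked list, roughly like the sketch below. This is
simplified; the struct and field names may differ slightly from the actual
displif.h definition.

#include <xen/interface/grant_table.h>	/* grant_ref_t */

struct xen_page_directory {
	grant_ref_t gref_dir_next_page;	/* grant ref of the next directory
					 * page; 0 terminates the chain */
	grant_ref_t gref[1];		/* grant refs of the shared buffer
					 * pages, filling the rest of this
					 * 4 KiB page */
};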
> As to whether this can be used for any memory: yes. It is the same for
> the sndif and displif Xen protocols, but defined twice since, strictly speaking,
> sndif and displif are two separate protocols.
>
> While reviewing your RFC v2 one of the comments I had [2] was whether we
> could start by defining such a generic protocol for hyper-dmabuf.
> It could be a header file, which not only has the description part
> (which then becomes part of a Documentation/...rst file), but also defines
> all the required constants for requests and responses, message formats,
> state diagrams etc. all in one place. Of course this protocol must not be
> Xen-specific, but OS/hypervisor-agnostic.
> Having that will trigger a new round of discussion, so we have it all
> designed and discussed before we start implementing.
>
> Besides the protocol we have to design the UAPI part as well and make sure
> hyper-dmabuf is not only accessible from user-space, but also usable by a
> number of kernel-space users.
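Purely as an illustration of what such a generic, hypervisor-agnostic header
could contain: all names, constants and the message layout below are made up
for this example and are not a proposal of the actual protocol.

#include <linux/types.h>

/* Hypothetical wire-protocol sketch: only plain integer types, request/
 * response constants and a fixed message layout, so any backend (Xen,
 * KVM, ...) can implement the transport underneath.
 */
#define HDMABUF_REQ_EXPORT	0x01	/* exporter announces a new buffer   */
#define HDMABUF_REQ_UNEXPORT	0x02	/* exporter revokes a shared buffer  */
#define HDMABUF_REQ_SYNC	0x03	/* dma-buf operation synchronization */
#define HDMABUF_RESP_OK		0x80
#define HDMABUF_RESP_ERROR	0x81

struct hdmabuf_msg {
	__u32 cmd;	/* HDMABUF_REQ_* or HDMABUF_RESP_*       */
	__u32 seq;	/* matches a response to its request     */
	__u64 buf_id;	/* hypervisor-agnostic buffer id         */
	__u32 data[8];	/* command-specific payload (priv data)  */
};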
> >
> >>>4. inter-VM messaging (hyper_dmabuf only) - hyper_dmabuf has inter-VM msg
> >>>communication defined for dmabuf synchronization and private data (the meta
> >>>info that Matt Roper mentioned) exchange.
> >>This is true, xen-zcopy has no means for inter VM sync and meta-data,
> >>simply because it doesn't have any code for inter VM exchange in it,
> >>e.g. the inter VM protocol is handled by the backend [1].
> >>>5. driver-to-driver notification (hyper_dmabuf only) - the importing VM gets
> >>>notified when a new dmabuf is exported from another VM - a uevent can optionally
> >>>be generated when this happens.
> >>>
> >>>6. structure - hyper_dmabuf is targeting a generic solution for
> >>>inter-domain dmabuf sharing for most hypervisors, which is why it has two
> >>>layers as mattrope mentioned: a front-end that contains the standard API and a
> >>>backend that is specific to the hypervisor.
> >>Again, xen-zcopy is decoupled from inter VM communication
> >>>>>No idea, didn't look at it in detail.
> >>>>>
> >>>>>Looks pretty complex from a distant view. Maybe because it tries to
> >>>>>build a communication framework using dma-bufs instead of a simple
> >>>>>dma-buf passing mechanism.
> >>>we started with simple dma-buf sharing but realized there are many
> >>>things we need to consider in real use-cases, so we added communication,
> >>>notification and dma-buf synchronization, then re-structured it into
> >>>front-end and back-end (this made things more complicated..) since Xen
> >>>was not our only target. Also, we thought passing the reference for the
> >>>buffer (hyper_dmabuf_id) around was not secure, so we added the uevent mechanism later.
> >>>
> >>>>Yes, I am looking at it now, trying to figure out the full story
> >>>>and its implementation. BTW, Intel guys were about to share some
> >>>>test application for hyper-dmabuf, maybe I have missed one.
> >>>>It could probably better explain the use-cases and the complexity
> >>>>they have in hyper-dmabuf.
> >>>One example is actually on github. If you want to take a look at it, please
> >>>visit:
> >>>
> >>>https://github.com/downor/linux_hyper_dmabuf_test/tree/xen/simple_export
> >>Thank you, I'll have a look
> >>>>>Like xen-zcopy it seems to depend on the idea that since the hypervisor
> >>>>>manages all memory it is easy for guests to share pages with the help of
> >>>>>the hypervisor.
> >>>>So, for xen-zcopy we were not trying to make it generic,
> >>>>it just solves display (dumb) zero-copying use-cases for Xen.
> >>>>We implemented it as a DRM helper driver because we can't see any
> >>>>other use-cases as of now.
> >>>>For example, we also have Xen para-virtualized sound driver, but
> >>>>its buffer memory usage is not comparable to what display wants
> >>>>and it works somewhat differently (e.g. there is no "frame done"
> >>>>event, so one can't tell when the sound buffer can be "flipped").
> >>>>At the same time, we do not use virtio-gpu, so this could probably
> >>>>be one more candidate for shared dma-bufs some day.
> >>>>> Which simply isn't the case on kvm.
> >>>>>
> >>>>>hyper-dmabuf and xen-zcopy could maybe share code, or hyper-dmabuf build
> >>>>>on top of xen-zcopy.
> >>>>Hm, I can imagine that: xen-zcopy could be library code for hyper-dmabuf
> >>>>in terms of implementing all that page sharing fun in multiple directions,
> >>>>e.g. Host->Guest, Guest->Host, Guest<->Guest.
> >>>>But I'll let Matt and Dongwon comment on that.
> >>>I think we can definitely collaborate. Especially since maybe we are using an
> >>>outdated sharing/grant-table mechanism in our Xen backend (thanks
> >>>for bringing that up, Oleksandr). However, the question is, once we collaborate
> >>>somehow, can xen-zcopy's use-case use the standard API that hyper_dmabuf
> >>>provides? I don't think we need different IOCTLs that do the same thing in the
> >>>final solution.
> >>>
> >>If you think of xen-zcopy as a library (which implements the Xen
> >>grant reference mangling) and a DRM PRIME wrapper on top of that
> >>library, we can probably define a proper API for that library,
> >>so both xen-zcopy and hyper-dmabuf can use it. What is more, I am
> >>about to start upstreaming the Xen para-virtualized sound device driver soon,
> >>which also uses similar code and the same gref passing mechanism [3].
> >>(Actually, I was about to upstream drm/xen-front, drm/xen-zcopy and
> >>snd/xen-front and then propose a Xen helper library for sharing big buffers,
> >>so the common code of the above drivers can be shared w/o
> >>duplication.)
> >I think it is possible to use your functions for the memory sharing part in
> >hyper_dmabuf's backend (here 'backend' means the layer that does page sharing
> >and inter-VM communication in a Xen-specific way), so why don't we work on the
> >"Xen helper library for sharing big buffers" first while we continue our
> >discussion on the common API layer that can cover any dmabuf sharing case.
> >
> Well, I would love for us to reuse the code that I have, but I also
> understand that it is limited by my use-cases. So, I do not
> insist we have to ;)
> If we start designing and discussing hyper-dmabuf protocol we of course
> can work on this helper library in parallel.
> >>Thank you,
> >>Oleksandr
> >>
> >>P.S. All, is it a good idea to move this out of udmabuf thread into a
> >>dedicated one?
> >Either way is fine with me.
> So, if you can start designing the protocol we may have a dedicated mail
> thread for that. I will try to help with the protocol as much as I can
Sure, thanks. We can talk about it. Just FYI, I have prepared an application
note that contains the definitions of the hyper_dmabuf messages included in the
RFC v2 patch. That would be a great starting point. It would be great if you
could review it.
>
> >>>>>cheers,
> >>>>> Gerd
> >>>>>
> >>>>Thank you,
> >>>>Oleksandr
> >>>>
> >>>>P.S. Sorry for making your original mail thread discuss things much
> >>>>broader than your RFC...
> >>>>
> >>[1] https://github.com/xen-troops/displ_be
> >>[2] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/…
> >>[3] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/…
> >>
> [1] https://elixir.bootlin.com/linux/v4.16-rc7/source/include/xen/interface/io/…
> [2]
> https://lists.xenproject.org/archives/html/xen-devel/2018-04/msg00685.html
This patch series contains the implementation of a new device driver, the
hyper_DMABUF driver, which provides a way to expand the boundary of
Linux DMA-BUF sharing across different VM instances in a multi-OS platform
enabled by a hypervisor (e.g. Xen).
This version 2 series is basically a refactored version of the old series starting
with "[RFC PATCH 01/60] hyper_dmabuf: initial working version of hyper_dmabuf
drv".
Implementation details of this driver are described in the reference guide
added by the second patch, "[RFC PATCH v2 2/5] hyper_dmabuf: architecture
specification and reference guide".
Attaching 'Overview' section here as a quick summary.
------------------------------------------------------------------------------
Section 1. Overview
------------------------------------------------------------------------------
The Hyper_DMABUF driver is a Linux device driver running on multiple Virtual
Machines (VMs), which expands the DMA-BUF sharing capability to the VM environment
where multiple different OS instances need to share the same physical data without
copying it across VMs.
To share a DMA_BUF across VMs, an instance of the Hyper_DMABUF driver on the
exporting VM (the so-called "exporter") imports a local DMA_BUF from the original
producer of the buffer, then re-exports it to the importing VM (the so-called
"importer") with a unique ID, hyper_dmabuf_id, for the buffer.
Another instance of the Hyper_DMABUF driver on the importer registers
the hyper_dmabuf_id together with reference information for the shared physical
pages associated with the DMA_BUF in its database when the export happens.
The actual mapping of the DMA_BUF on the importer’s side is done by
the Hyper_DMABUF driver when user space issues the IOCTL command to access
the shared DMA_BUF. The Hyper_DMABUF driver works as both an importing and
exporting driver as is, that is, no special configuration is required.
Consequently, only a single module per VM is needed to enable cross-VM DMA_BUF
exchange.
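As a rough picture of the flow described above, here is a hedged user-space
sketch. The ioctl names and structures are invented for illustration only and
do not match the driver's real UAPI, which is defined in
include/uapi/linux/hyper_dmabuf.h in this series.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Illustrative structures/ioctls only, not the driver's actual UAPI. */
struct ex_remote { int dmabuf_fd; int remote_domain; uint64_t hid; };
struct ex_fd     { uint64_t hid; int dmabuf_fd; };
#define IOCTL_EXPORT_REMOTE _IOWR('H', 0, struct ex_remote)
#define IOCTL_EXPORT_FD     _IOWR('H', 1, struct ex_fd)

/* exporting VM: local dma-buf fd -> hyper_dmabuf_id */
static uint64_t export_side(int drv_fd, int dmabuf_fd, int importer_domid)
{
	struct ex_remote er = { dmabuf_fd, importer_domid, 0 };

	ioctl(drv_fd, IOCTL_EXPORT_REMOTE, &er);
	return er.hid;	/* sent to the importing VM out of band */
}

/* importing VM: hyper_dmabuf_id -> local dma-buf fd; the shared pages
 * are mapped by the driver when this ioctl is issued */
static int import_side(int drv_fd, uint64_t hid)
{
	struct ex_fd ef = { hid, -1 };

	ioctl(drv_fd, IOCTL_EXPORT_FD, &ef);
	return ef.dmabuf_fd;
}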
------------------------------------------------------------------------------
There is a git repository at github.com where this series of patches are all
integrated in Linux kernel tree based on the commit:
commit ae64f9bd1d3621b5e60d7363bc20afb46aede215
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Sun Dec 3 11:01:47 2017 -0500
Linux 4.15-rc2
https://github.com/downor/linux_hyper_dmabuf.git hyper_dmabuf_integration_v4
Dongwon Kim, Mateusz Polrola (9):
hyper_dmabuf: initial upload of hyper_dmabuf drv core framework
hyper_dmabuf: architecture specification and reference guide
MAINTAINERS: adding Hyper_DMABUF driver section in MAINTAINERS
hyper_dmabuf: user private data attached to hyper_DMABUF
hyper_dmabuf: hyper_DMABUF synchronization across VM
hyper_dmabuf: query ioctl for retreiving various hyper_DMABUF info
hyper_dmabuf: event-polling mechanism for detecting a new hyper_DMABUF
hyper_dmabuf: threaded interrupt in Xen-backend
hyper_dmabuf: default backend for XEN hypervisor
Documentation/hyper-dmabuf-sharing.txt | 734 ++++++++++++++++
MAINTAINERS | 11 +
drivers/dma-buf/Kconfig | 2 +
drivers/dma-buf/Makefile | 1 +
drivers/dma-buf/hyper_dmabuf/Kconfig | 50 ++
drivers/dma-buf/hyper_dmabuf/Makefile | 44 +
.../backends/xen/hyper_dmabuf_xen_comm.c | 944 +++++++++++++++++++++
.../backends/xen/hyper_dmabuf_xen_comm.h | 78 ++
.../backends/xen/hyper_dmabuf_xen_comm_list.c | 158 ++++
.../backends/xen/hyper_dmabuf_xen_comm_list.h | 67 ++
.../backends/xen/hyper_dmabuf_xen_drv.c | 46 +
.../backends/xen/hyper_dmabuf_xen_drv.h | 53 ++
.../backends/xen/hyper_dmabuf_xen_shm.c | 525 ++++++++++++
.../backends/xen/hyper_dmabuf_xen_shm.h | 46 +
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_drv.c | 410 +++++++++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_drv.h | 122 +++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_event.c | 122 +++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_event.h | 38 +
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_id.c | 135 +++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_id.h | 53 ++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ioctl.c | 794 +++++++++++++++++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ioctl.h | 52 ++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_list.c | 295 +++++++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_list.h | 73 ++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_msg.c | 416 +++++++++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_msg.h | 89 ++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ops.c | 415 +++++++++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ops.h | 34 +
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_query.c | 174 ++++
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_query.h | 36 +
.../hyper_dmabuf/hyper_dmabuf_remote_sync.c | 324 +++++++
.../hyper_dmabuf/hyper_dmabuf_remote_sync.h | 32 +
.../dma-buf/hyper_dmabuf/hyper_dmabuf_sgl_proc.c | 257 ++++++
.../dma-buf/hyper_dmabuf/hyper_dmabuf_sgl_proc.h | 43 +
drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_struct.h | 143 ++++
include/uapi/linux/hyper_dmabuf.h | 134 +++
36 files changed, 6950 insertions(+)
create mode 100644 Documentation/hyper-dmabuf-sharing.txt
create mode 100644 drivers/dma-buf/hyper_dmabuf/Kconfig
create mode 100644 drivers/dma-buf/hyper_dmabuf/Makefile
create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_comm.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_comm.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_comm_list.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_comm_list.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_drv.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_drv.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_shm.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_shm.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_drv.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_drv.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_event.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_event.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_id.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_id.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ioctl.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ioctl.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_list.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_list.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_msg.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_msg.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ops.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ops.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_query.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_query.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_remote_sync.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_remote_sync.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_sgl_proc.c
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_sgl_proc.h
create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_struct.h
create mode 100644 include/uapi/linux/hyper_dmabuf.h
--
2.16.1
Most of the other cross-driver gfx infrastructure (dma_buf, dma_fence)
also gets cross posted to all the relevant gfx/memory lists. Doing the
same for ION means people won't miss relevant patches.
Cc: Laura Abbott <labbott(a)redhat.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: devel(a)driverdev.osuosl.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org (moderated for non-subscribers)
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 555db72d4eb7..d43cdfca3eb5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -902,6 +902,8 @@ ANDROID ION DRIVER
M: Laura Abbott <labbott(a)redhat.com>
M: Sumit Semwal <sumit.semwal(a)linaro.org>
L: devel(a)driverdev.osuosl.org
+L: dri-devel(a)lists.freedesktop.org
+L: linaro-mm-sig(a)lists.linaro.org (moderated for non-subscribers)
S: Supported
F: drivers/staging/android/ion
F: drivers/staging/android/uapi/ion.h
--
2.16.2
On Thu, Mar 29, 2018 at 09:58:54PM -0400, Jerome Glisse wrote:
> dma_map_resource() is the right API (though its current implementation
> is full of x86 assumptions). So I would argue that an arch can decide to
> implement it or simply return a DMA error address, which triggers the fallback
> path in the caller (at least for GPU drivers). An SG variant can be added
> on top.
It isn't in general. It doesn't integrate with scatterlists (see my
comment to page one), and it doesn't integrate with all the subsystems
that also need a kernel virtual address.
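For context, a minimal sketch (illustrative only, function name and error
handling invented) of how dma_map_resource() gets used to map part of a peer
PCI device's BAR for DMA, including the error check that triggers the fallback
path Jerome mentions:

#include <linux/dma-mapping.h>
#include <linux/pci.h>

/* Map 'size' bytes at 'offset' into 'bar' of a peer PCI device so that
 * 'dev' can DMA to it.  On architectures without support the mapping
 * fails and the caller is expected to take a fallback path. */
static dma_addr_t map_peer_bar(struct device *dev, struct pci_dev *peer,
			       int bar, resource_size_t offset, size_t size)
{
	phys_addr_t phys = pci_resource_start(peer, bar) + offset;
	dma_addr_t addr = dma_map_resource(dev, phys, size,
					   DMA_BIDIRECTIONAL, 0);

	if (dma_mapping_error(dev, addr))
		return 0;	/* caller falls back (e.g. bounce via RAM) */
	return addr;
}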
Am 29.03.2018 um 18:25 schrieb Logan Gunthorpe:
>
> On 29/03/18 10:10 AM, Christian König wrote:
>> Why not? I mean the dma_map_resource() function is for P2P while other
>> dma_map_* functions are only for system memory.
> Oh, hmm, I wasn't aware dma_map_resource was exclusively for mapping
> P2P. Though it's a bit odd seeing we've been working under the
> assumption that PCI P2P is different as it has to translate the PCI bus
> address, whereas P2P for devices on other buses is a big unknown.
Yeah, completely agree. On my TODO list (but rather far down) is
actually supporting P2P with USB devices.
And no, I don't have the slightest idea how to do this at the moment.
>>> And this is necessary to
>>> check if the DMA ops in use support it or not. We can't have the
>>> dma_map_X() functions do the wrong thing because they don't support it yet.
>> Well that sounds like we should just return an error from
>> dma_map_resource() when an architecture doesn't support P2P yet, as Alex
>> suggested.
> Yes, well except in our patch-set we can't easily use
> dma_map_resource() as we either have SGLs to deal with or we need to
> create whole new interfaces to a number of subsystems.
Agree as well. I was also clearly in favor of extending the SGLs to have a
flag for this instead of the dma_map_resource() interface, but for some
reason that didn't make it into the kernel.
>> You don't seem to understand the implications: The devices do have a
>> common upstream bridge! In other words your code would currently claim
>> that P2P is supported, but in practice it doesn't work.
> Do they? They don't on any of the Intel machines I'm looking at. The
> previous version of the patchset not only required a common upstream
> bridge but two layers of upstream bridges on both devices which would
> effectively limit transfers to PCIe switches only. But Bjorn did not
> like this.
At least to me that sounds like a good idea, it would at least disable
(the incorrect) auto detection of P2P for such devices.
>> You need to include both drivers which participate in the P2P
>> transaction to make sure that both support this and give them the
>> opportunity to chicken out and, in the case of AMD APUs, even redirect the
>> request to another location (e.g. participate in the DMA translation).
> I don't think it's the driver's responsibility to reject P2P. The
> topology is what governs support or not. The discussions we had with
> Bjorn settled on: if the devices are all behind the same bridge they can
> communicate with each other. This is essentially guaranteed by the PCI spec.
Well it is not only about rejecting P2P; see, the devices I need to worry about
are essentially part of the CPU. Their resources look like a PCI BAR to
the BIOS and OS, but are actually backed by stolen system memory.
So, as crazy as it sounds, what you get is an operation which starts as
P2P, but then the GPU driver sees it and says: Hey, please don't write
that to my PCIe BAR, but rather to system memory location X.
>> DMA-buf fortunately seems to handle all this already, that's why we
>> choose it as base for our implementation.
> Well, unfortunately DMA-buf doesn't help for the drivers we are working
> with as neither the block layer nor the RDMA subsystem have any
> interfaces for it.
A fact that gives me quite some sleepless nights as well. I think we
sooner or later need to extend those interfaces to work with DMA-bufs as
well.
I will try to give your patch set a review when I'm back from vacation
and rebase my DMA-buf work on top of that.
Regards,
Christian.
>
> Logan
Am 29.03.2018 um 17:45 schrieb Logan Gunthorpe:
>
> On 29/03/18 05:44 AM, Christian König wrote:
>> Am 28.03.2018 um 21:53 schrieb Logan Gunthorpe:
>>> On 28/03/18 01:44 PM, Christian König wrote:
>>>> Well, isn't that exactly what dma_map_resource() is good for? As far as
>>>> I can see it makes sure IOMMU is aware of the access route and
>>>> translates a CPU address into a PCI Bus address.
>>>> I'm using that with the AMD IOMMU driver and at least there it works
>>>> perfectly fine.
>>> Yes, it would be nice, but no arch has implemented this yet. We are just
>>> lucky in the x86 case because that arch is simple and doesn't need to do
>>> anything for P2P (partially due to the Bus and CPU addresses being the
>>> same). But in the general case, you can't rely on it.
>> Well, that an arch hasn't implemented it doesn't mean that we don't have
>> the right interface to do it.
> Yes, but right now we don't have a performant way to check if we are
> doing P2P or not in the dma_map_X() wrappers.
Why not? I mean the dma_map_resource() function is for P2P while other
dma_map_* functions are only for system memory.
> And this is necessary to
> check if the DMA ops in use support it or not. We can't have the
> dma_map_X() functions do the wrong thing because they don't support it yet.
Well that sounds like we should just return an error from
dma_map_resource() when an architecture doesn't support P2P yet, as Alex
suggested.
>> Devices integrated in the CPU usually only "claim" to be PCIe devices.
>> In reality their memory request path goes directly through the integrated
>> north bridge. The reason for this is simple: better throughput/latency.
> These are just more reasons why our patchset restricts to devices behind
> a switch. And more mess for someone to deal with if they need to relax
> that restriction.
You don't seem to understand the implications: The devices do have a
common upstream bridge! In other words your code would currently claim
that P2P is supported, but in practice it doesn't work.
You need to include both drivers which participate in the P2P
transaction to make sure that both support this and give them the
opportunity to chicken out and, in the case of AMD APUs, even redirect the
request to another location (e.g. participate in the DMA translation).
DMA-buf fortunately seems to handle all this already, that's why we
choose it as base for our implementation.
Regards,
Christian.
Sorry, didn't mean to drop the lists here. re-adding.
On Wed, Mar 28, 2018 at 4:05 PM, Alex Deucher <alexdeucher(a)gmail.com> wrote:
> On Wed, Mar 28, 2018 at 3:53 PM, Logan Gunthorpe <logang(a)deltatee.com> wrote:
>>
>>
>> On 28/03/18 01:44 PM, Christian König wrote:
>>> Well, isn't that exactly what dma_map_resource() is good for? As far as
>>> I can see it makes sure IOMMU is aware of the access route and
>>> translates a CPU address into a PCI Bus address.
>>
>>> I'm using that with the AMD IOMMU driver and at least there it works
>>> perfectly fine.
>>
>> Yes, it would be nice, but no arch has implemented this yet. We are just
>> lucky in the x86 case because that arch is simple and doesn't need to do
>> anything for P2P (partially due to the Bus and CPU addresses being the
>> same). But in the general case, you can't rely on it.
>
> Could we do something for the arches where it works? I feel like peer
> to peer has dragged out for years because everyone is trying to boil
> the ocean for all arches. There are a huge number of use cases for
> peer to peer on these "simple" architectures which actually represent
> a good deal of the users that want this.
>
> Alex
>
>>
>>>>> Yeah, but not for ours. See if you want to do real peer 2 peer you need
>>>>> to keep both the operation as well as the direction into account.
>>>> Not sure what you are saying here... I'm pretty sure we are doing "real"
>>>> peer 2 peer...
>>>>
>>>>> For example when you can do writes between A and B that doesn't mean
>>>>> that writes between B and A work. And reads are generally less likely to
>>>>> work than writes. etc...
>>>> If both devices are behind a switch then the PCI spec guarantees that A
>>>> can both read and write B and vice versa.
>>>
>>> Sorry to say that, but I know a whole bunch of PCI devices which
>>> horribly ignore that.
>>
>> Can you elaborate? As far as the device is concerned it shouldn't know
>> whether a request comes from a peer or from the host. If it does do
>> crazy stuff like that it's well out of spec. It's up to the switch (or
>> root complex if good support exists) to route the request to the device
>> and it's the root complex that tends to be what drops the load requests
>> which causes the asymmetries.
>>
>> Logan
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx(a)lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx