Linaro-mm-sig March 2018

linaro-mm-sig@lists.linaro.org

13 participants
18 discussions

[RFC] android: ion: How to properly clean caches for uncached allocations

by Liam Mark

The issue:

Currently in ION, if you allocate uncached memory, it is possible that there are still dirty lines in the cache. Often these dirty lines are the zeros which were meant to clear out any sensitive kernel data.

What this means is that if you allocate uncached memory from ION and then subsequently write to that buffer (using the uncached mapping you are provided by ION), the data you have written could be corrupted at some point in the future if a dirty line is evicted from the cache.

This also means there is a potential security issue. If an unprivileged userspace user allocates uncached memory (for example from the system heap) and then reads from that buffer (through the uncached mapping provided by ION), and some of the zeros which were written to that memory are still in the cache, then this unprivileged user could read potentially sensitive kernel data.

An unacceptable fix:

I have included some sample code which should resolve this issue for the system heap and the cma heap on some architectures. However, this code would not be acceptable for upstreaming since it uses hacks to clean the cache. Similar changes would need to be made to the carveout heap and the chunk heap.

I would appreciate some feedback on the proper way for ION to clean the caches for memory it has allocated that is intended for uncached access.

I realize that it may be tempting, as a solution to this problem, to simply strip uncached support from ION. I hope that we can find a solution which preserves uncached memory support, as ION uncached memory is often used (though perhaps not in upstreamed code) in multimedia use cases where no CPU access is required, in secure heap allocations, and in some cases where there is minimal CPU access and uncached memory therefore performs better.
Signed-off-by: Liam Mark <lmark(a)codeaurora.org>
---
 drivers/staging/android/ion/ion.c             | 16 ++++++++++++++++
 drivers/staging/android/ion/ion.h             |  5 ++++-
 drivers/staging/android/ion/ion_cma_heap.c    |  3 +++
 drivers/staging/android/ion/ion_page_pool.c   |  8 ++++++--
 drivers/staging/android/ion/ion_system_heap.c |  7 ++++++-
 5 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 57e0d8035b2e..10e967b0a0f4 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -38,6 +38,22 @@ bool ion_buffer_cached(struct ion_buffer *buffer)
 	return !!(buffer->flags & ION_FLAG_CACHED);
 }

+void ion_pages_sync_for_device(struct page *page, size_t size,
+			       enum dma_data_direction dir)
+{
+	struct scatterlist sg;
+	struct device dev = {0};
+
+	/* hack, use dummy device */
+	arch_setup_dma_ops(&dev, 0, 0, NULL, false);
+
+	sg_init_table(&sg, 1);
+	sg_set_page(&sg, page, size, 0);
+	/* hack, use phys address for dma address */
+	sg_dma_address(&sg) = page_to_phys(page);
+	dma_sync_sg_for_device(&dev, &sg, 1, dir);
+}
+
 /* this function should only be called while dev->lock is held */
 static void ion_buffer_add(struct ion_device *dev,
 			   struct ion_buffer *buffer)
diff --git a/drivers/staging/android/ion/ion.h b/drivers/staging/android/ion/ion.h
index a238f23c9116..227b9928d185 100644
--- a/drivers/staging/android/ion/ion.h
+++ b/drivers/staging/android/ion/ion.h
@@ -192,6 +192,9 @@ struct ion_heap {
  */
 bool ion_buffer_cached(struct ion_buffer *buffer);

+void ion_pages_sync_for_device(struct page *page, size_t size,
+			       enum dma_data_direction dir);
+
 /**
  * ion_buffer_fault_user_mappings - fault in user mappings of this buffer
  * @buffer: buffer
@@ -333,7 +336,7 @@ struct ion_page_pool {
 struct ion_page_pool *ion_page_pool_create(gfp_t gfp_mask, unsigned int order,
 					   bool cached);
 void ion_page_pool_destroy(struct ion_page_pool *pool);
-struct page *ion_page_pool_alloc(struct ion_page_pool *pool);
+struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool);
 void ion_page_pool_free(struct ion_page_pool *pool, struct page *page);

 /** ion_page_pool_shrink - shrinks the size of the memory cached in the pool
diff --git a/drivers/staging/android/ion/ion_cma_heap.c b/drivers/staging/android/ion/ion_cma_heap.c
index 49718c96bf9e..82e80621d114 100644
--- a/drivers/staging/android/ion/ion_cma_heap.c
+++ b/drivers/staging/android/ion/ion_cma_heap.c
@@ -59,6 +59,9 @@ static int ion_cma_allocate(struct ion_heap *heap, struct ion_buffer *buffer,
 		memset(page_address(pages), 0, size);
 	}

+	if (!ion_buffer_cached(buffer))
+		ion_pages_sync_for_device(pages, size, DMA_BIDIRECTIONAL);
+
 	table = kmalloc(sizeof(*table), GFP_KERNEL);
 	if (!table)
 		goto err;
diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index b3017f12835f..169a321778ed 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -63,7 +63,7 @@ static struct page *ion_page_pool_remove(struct ion_page_pool *pool, bool high)
 	return page;
 }

-struct page *ion_page_pool_alloc(struct ion_page_pool *pool)
+struct page *ion_page_pool_alloc(struct ion_page_pool *pool, bool *from_pool)
 {
 	struct page *page = NULL;

@@ -76,8 +76,12 @@ struct page *ion_page_pool_alloc(struct ion_page_pool *pool)
 		page = ion_page_pool_remove(pool, false);
 	mutex_unlock(&pool->mutex);

-	if (!page)
+	if (!page) {
 		page = ion_page_pool_alloc_pages(pool);
+		*from_pool = false;
+	} else {
+		*from_pool = true;
+	}

 	return page;
 }
diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c
index bc19cdd30637..3bb4604e032b 100644
--- a/drivers/staging/android/ion/ion_system_heap.c
+++ b/drivers/staging/android/ion/ion_system_heap.c
@@ -57,13 +57,18 @@ static struct page *alloc_buffer_page(struct ion_system_heap *heap,
 	bool cached = ion_buffer_cached(buffer);
 	struct ion_page_pool *pool;
 	struct page *page;
+	bool from_pool;

 	if (!cached)
 		pool = heap->uncached_pools[order_to_index(order)];
 	else
 		pool = heap->cached_pools[order_to_index(order)];

-	page = ion_page_pool_alloc(pool);
+	page = ion_page_pool_alloc(pool, &from_pool);
+
+	if (!from_pool && !ion_buffer_cached(buffer))
+		ion_pages_sync_for_device(page, PAGE_SIZE << order,
+					  DMA_BIDIRECTIONAL);

 	return page;
 }
--
1.8.5.2
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
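
The two failure modes described in this RFC (a late eviction overwriting an uncached write, and stale data being visible before the zeros reach memory) can be illustrated with a toy userspace model. This is only a sketch under simplifying assumptions: the `toy_*` types, the 8-byte line size, and the one-line-per-block "cache" are invented for illustration and do not correspond to any kernel or ION interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 8			/* toy cache-line size, bytes */
#define MEM_SIZE  32			/* toy DRAM size, bytes */
#define NUM_LINES (MEM_SIZE / LINE_SIZE)

struct toy_line {
	bool dirty;
	uint8_t data[LINE_SIZE];
};

struct toy_system {
	uint8_t dram[MEM_SIZE];
	struct toy_line cache[NUM_LINES];	/* one line per DRAM block */
};

/* Zero the buffer through the cached mapping: the zeros stay in the cache. */
static void cached_zero(struct toy_system *s)
{
	for (int i = 0; i < NUM_LINES; i++) {
		memset(s->cache[i].data, 0, LINE_SIZE);
		s->cache[i].dirty = true;
	}
}

/* Write one dirty line back to DRAM (an explicit clean or a later eviction). */
static void writeback_line(struct toy_system *s, int line)
{
	assert(line >= 0 && line < NUM_LINES);
	if (!s->cache[line].dirty)
		return;
	memcpy(&s->dram[line * LINE_SIZE], s->cache[line].data, LINE_SIZE);
	s->cache[line].dirty = false;
}

/* Clean the whole cache, as an allocator would before handing out the buffer. */
static void clean_all(struct toy_system *s)
{
	for (int i = 0; i < NUM_LINES; i++)
		writeback_line(s, i);
}

/* Simulate allocation: DRAM holds stale data, then gets zeroed via the cache. */
static void toy_init(struct toy_system *s, bool clean_before_use)
{
	memset(s, 0, sizeof(*s));
	memset(s->dram, 0xAA, MEM_SIZE);	/* stale "sensitive" data */
	cached_zero(s);
	if (clean_before_use)
		clean_all(s);
}

/* Byte a user reads through the uncached mapping right after allocation. */
static uint8_t first_uncached_read(bool clean_before_use)
{
	struct toy_system s;

	toy_init(&s, clean_before_use);
	return s.dram[3];
}

/* Byte left in DRAM after an uncached write followed by a late eviction. */
static uint8_t write_then_evict(bool clean_before_use)
{
	struct toy_system s;

	toy_init(&s, clean_before_use);
	s.dram[3] = 0x5C;		/* uncached write goes straight to DRAM */
	writeback_line(&s, 0);		/* line 0 is evicted some time later */
	return s.dram[3];
}
```

In this model, skipping the clean lets the first uncached read observe the stale byte and lets the later eviction replace the user's write with zeros; cleaning before use avoids both.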

6 years

[PATCH 1/8] lib/scatterlist: add sg_set_dma_addr() helper

by Christian König

Use this function to set an sg entry to point to device resources mapped
using dma_map_resource(). The page pointer is set to NULL and only the DMA
address, length and offset values are valid.

Signed-off-by: Christian König <christian.koenig(a)amd.com>
---
 include/linux/scatterlist.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 22b2131bcdcd..f944ee4e482c 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -149,6 +149,29 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
 	sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
 }

+/**
+ * sg_set_dma_addr - Set sg entry to point at specified dma address
+ * @sg:		 SG entry
+ * @address:	 DMA address to set
+ * @len:	 Length of data
+ * @offset:	 Offset into page
+ *
+ * Description:
+ *   Use this function to set an sg entry to point to device resources mapped
+ *   using dma_map_resource(). The page pointer is set to NULL and only the DMA
+ *   address, length and offset values are valid.
+ *
+ **/
+static inline void sg_set_dma_addr(struct scatterlist *sg, dma_addr_t address,
+				   unsigned int len, unsigned int offset)
+{
+	sg_set_page(sg, NULL, len, offset);
+	sg->dma_address = address;
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
+	sg->dma_length = len;
+#endif
+}
+
 /*
  * Loop over each sg element, following the pointer to a new list if necessary
  */
--
2.14.1
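
A practical consequence of this helper is that a scatterlist may then contain entries with no backing page, so code walking the list has to check before touching memory through the CPU. The following is a minimal userspace sketch of that idea. `struct toy_sg` and the `toy_*` functions are simplified stand-ins invented here; the real `struct scatterlist` stores an encoded `page_link` rather than a plain pointer, and `dma_length` exists only under `CONFIG_NEED_SG_DMA_LENGTH`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_dma_addr_t;	/* stand-in for the kernel's dma_addr_t */

/* Simplified stand-in for struct scatterlist (hypothetical layout). */
struct toy_sg {
	void *page;			/* NULL for resource-backed entries */
	unsigned int offset;
	unsigned int length;
	toy_dma_addr_t dma_address;
	unsigned int dma_length;
};

/* Mirrors what sg_set_page() does in spirit: record page, length, offset. */
static void toy_sg_set_page(struct toy_sg *sg, void *page,
			    unsigned int len, unsigned int offset)
{
	sg->page = page;
	sg->offset = offset;
	sg->length = len;
}

/* Mirrors the proposed helper: no page, only DMA address and length. */
static void toy_sg_set_dma_addr(struct toy_sg *sg, toy_dma_addr_t address,
				unsigned int len, unsigned int offset)
{
	toy_sg_set_page(sg, NULL, len, offset);
	sg->dma_address = address;
	sg->dma_length = len;
}

/* Count the entries a CPU-side consumer could safely access via a page. */
static size_t toy_sg_count_cpu_entries(const struct toy_sg *sgl, size_t nents)
{
	size_t n = 0;

	assert(sgl != NULL || nents == 0);
	for (size_t i = 0; i < nents; i++)
		if (sgl[i].page != NULL)
			n++;
	return n;
}
```

The design point the sketch tries to make is that page-less entries are only meaningful to the device side; any consumer that assumes every entry has a page would need an explicit check like the one above.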

6 years, 7 months

[RfC PATCH] Add udmabuf misc device

by Gerd Hoffmann

A driver to let userspace turn iovecs into dma-bufs.

Use case: Allows qemu to pass around dmabufs for the guest framebuffer.
https://www.kraxel.org/cgit/qemu/log/?h=sirius/udmabuf has an experimental
patch.

Also allows qemu to export guest virtio-gpu resources as host dmabufs.
Should be possible to use it to display guest wayland windows on the host
display server. virtio-gpu resources can be chunked, so we will actually
need multiple iovec entries.

UNTESTED. I want to collect some feedback on the general approach with this
RfC series. Can this work? If not, better ideas?

Question: Must this be hooked into some kind of mlock accounting, to limit
the amount of memory userspace is allowed to pin this way? Or will
get_user_pages_fast() handle that for me?

Known issue: Driver API isn't complete yet. Need to add some flags, for
example to support read-only buffers.

Cc: David Airlie <airlied(a)linux.ie>
Cc: Tomeu Vizoso <tomeu.vizoso(a)collabora.com>
Signed-off-by: Gerd Hoffmann <kraxel(a)redhat.com>
---
 include/uapi/linux/udmabuf.h |  21 ++++
 drivers/dma-buf/udmabuf.c    | 250 +++++++++++++++++++++++++++++++++++++++++++
 drivers/dma-buf/Kconfig      |   7 ++
 drivers/dma-buf/Makefile     |   1 +
 4 files changed, 279 insertions(+)
 create mode 100644 include/uapi/linux/udmabuf.h
 create mode 100644 drivers/dma-buf/udmabuf.c

diff --git a/include/uapi/linux/udmabuf.h b/include/uapi/linux/udmabuf.h
new file mode 100644
index 0000000000..fd2fa441fe
--- /dev/null
+++ b/include/uapi/linux/udmabuf.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_UDMABUF_H
+#define _UAPI_LINUX_UDMABUF_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+struct udmabuf_iovec {
+	__u64 base;
+	__u64 len;
+};
+
+struct udmabuf_create {
+	__u32 flags;
+	__u32 niov;
+	struct udmabuf_iovec iovs[];
+};
+
+#define UDMABUF_CREATE _IOW(0x42, 0x23, struct udmabuf_create)
+
+#endif /* _UAPI_LINUX_UDMABUF_H */
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c new
file mode 100644 index 0000000000..ec012d7ac7 --- /dev/null +++ b/drivers/dma-buf/udmabuf.c @@ -0,0 +1,250 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +#include <linux/init.h> +#include <linux/module.h> +#include <linux/device.h> +#include <linux/kernel.h> +#include <linux/slab.h> +#include <linux/miscdevice.h> +#include <linux/dma-buf.h> +#include <linux/highmem.h> + +#include <uapi/linux/udmabuf.h> + +struct udmabuf { + u32 pagecount; + struct page **pages; +}; + +static int udmabuf_vm_fault(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + struct udmabuf *ubuf = vma->vm_private_data; + + if (WARN_ON(vmf->pgoff >= ubuf->pagecount)) + return VM_FAULT_SIGBUS; + + vmf->page = ubuf->pages[vmf->pgoff]; + get_page(vmf->page); + return 0; +} + +static const struct vm_operations_struct udmabuf_vm_ops = { + .fault = udmabuf_vm_fault, +}; + +static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma) +{ + struct udmabuf *ubuf = buf->priv; + + if ((vma->vm_flags & VM_SHARED) == 0) + return -EINVAL; + + vma->vm_ops = &udmabuf_vm_ops; + vma->vm_private_data = ubuf; + return 0; +} + +static struct sg_table *map_udmabuf(struct dma_buf_attachment *at, + enum dma_data_direction direction) +{ + struct udmabuf *ubuf = at->dmabuf->priv; + struct sg_table *sg; + + sg = kzalloc(sizeof(*sg), GFP_KERNEL); + if (!sg) + goto err1; + if (sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount, + 0, ubuf->pagecount << PAGE_SHIFT, + GFP_KERNEL) < 0) + goto err2; + if (!dma_map_sg(at->dev, sg->sgl, sg->nents, direction)) + goto err3; + + return sg; + +err3: + sg_free_table(sg); +err2: + kfree(sg); +err1: + return ERR_PTR(-ENOMEM); +} + +static void unmap_udmabuf(struct dma_buf_attachment *at, + struct sg_table *sg, + enum dma_data_direction direction) +{ + sg_free_table(sg); + kfree(sg); +} + +static 
void release_udmabuf(struct dma_buf *buf) +{ + struct udmabuf *ubuf = buf->priv; + pgoff_t pg; + + for (pg = 0; pg < ubuf->pagecount; pg++) + put_page(ubuf->pages[pg]); + kfree(ubuf->pages); + kfree(ubuf); +} + +static void *kmap_atomic_udmabuf(struct dma_buf *buf, unsigned long offset) +{ + struct udmabuf *ubuf = buf->priv; + struct page *page = ubuf->pages[offset >> PAGE_SHIFT]; + + return kmap_atomic(page); +} + +static void *kmap_udmabuf(struct dma_buf *buf, unsigned long offset) +{ + struct udmabuf *ubuf = buf->priv; + struct page *page = ubuf->pages[offset >> PAGE_SHIFT]; + + return kmap(page); +} + +static struct dma_buf_ops udmabuf_ops = { + .map_dma_buf = map_udmabuf, + .unmap_dma_buf = unmap_udmabuf, + .release = release_udmabuf, + .map_atomic = kmap_atomic_udmabuf, + .map = kmap_udmabuf, + .mmap = mmap_udmabuf, +}; + +static long udmabuf_ioctl_create(struct file *filp, unsigned long arg) +{ + struct udmabuf_create create; + struct udmabuf_iovec *iovs; + struct udmabuf *ubuf; + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); + struct dma_buf *buf; + pgoff_t pgoff, pgcnt; + u32 iov; + int ret; + + if (copy_from_user(&create, (void __user *)arg, + sizeof(struct udmabuf_create))) + return -EFAULT; + + iovs = kmalloc_array(create.niov, sizeof(struct udmabuf_iovec), + GFP_KERNEL); + if (!iovs) + return -ENOMEM; + + arg += offsetof(struct udmabuf_create, iovs); + ret = -EFAULT; + if (copy_from_user(iovs, (void __user *)arg, + create.niov * sizeof(struct udmabuf_iovec))) + goto err_free_iovs; + + ubuf = kzalloc(sizeof(struct udmabuf), GFP_KERNEL); + if (!ubuf) + goto err_free_iovs; + + ret = -EINVAL; + for (iov = 0; iov < create.niov; iov++) { + if (!IS_ALIGNED(iovs[iov].base, PAGE_SIZE)) + goto err_free_iovs; + if (!IS_ALIGNED(iovs[iov].len, PAGE_SIZE)) + goto err_free_iovs; + ubuf->pagecount += iovs[iov].len >> PAGE_SHIFT; + } + + ret = -ENOMEM; + ubuf->pages = kmalloc_array(ubuf->pagecount, sizeof(struct page*), + GFP_KERNEL); + if (!ubuf->pages) + goto err_free_buf; + 
+ pgoff = 0; + for (iov = 0; iov < create.niov; iov++) { + pgcnt = iovs[iov].len >> PAGE_SHIFT; + while (pgcnt > 0) { + ret = get_user_pages_fast(iovs[iov].base, pgcnt, + true, /* write */ + ubuf->pages + pgoff); + if (ret < 0) + goto err_put_pages; + pgoff += ret; + pgcnt -= ret; + } + } + + exp_info.ops = &udmabuf_ops; + exp_info.size = ubuf->pagecount << PAGE_SHIFT; + exp_info.priv = ubuf; + + buf = dma_buf_export(&exp_info); + if (IS_ERR(buf)) { + ret = PTR_ERR(buf); + goto err_put_pages; + } + + kfree(iovs); + return dma_buf_fd(buf, 0); + +err_put_pages: + while (pgoff > 0) + put_page(ubuf->pages[--pgoff]); +err_free_buf: + kfree(ubuf->pages); + kfree(ubuf); +err_free_iovs: + kfree(iovs); + return ret; +} + +static long udmabuf_ioctl(struct file *filp, unsigned int ioctl, + unsigned long arg) +{ + long ret; + + switch (ioctl) { + case UDMABUF_CREATE: + ret = udmabuf_ioctl_create(filp, arg); + break; + default: + ret = -EINVAL; + break; + } + return ret; +} + +static const struct file_operations udmabuf_fops = { + .owner = THIS_MODULE, + .unlocked_ioctl = udmabuf_ioctl, +}; + +static struct miscdevice udmabuf_misc = { + .minor = MISC_DYNAMIC_MINOR, + .name = "udmabuf", + .fops = &udmabuf_fops, +}; + +static int __init udmabuf_dev_init(void) +{ + int ret; + + ret = misc_register(&udmabuf_misc); + if (ret) + return ret; + + return 0; +} + +static void __exit udmabuf_dev_exit(void) +{ + misc_deregister(&udmabuf_misc); +} + +module_init(udmabuf_dev_init) +module_exit(udmabuf_dev_exit) + +MODULE_LICENSE("GPL v2"); diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig index ed3b785bae..5876b52554 100644 --- a/drivers/dma-buf/Kconfig +++ b/drivers/dma-buf/Kconfig @@ -30,4 +30,11 @@ config SW_SYNC WARNING: improper use of this can result in deadlocking kernel drivers from userspace. Intended for test and debug only. 
+config UDMABUF + tristate "userspace dmabuf misc driver" + default n + depends on DMA_SHARED_BUFFER + ---help--- + A driver to let userspace turn iovs into dma-bufs. + endmenu diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile index c33bf88631..0913a6ccab 100644 --- a/drivers/dma-buf/Makefile +++ b/drivers/dma-buf/Makefile @@ -1,3 +1,4 @@ obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o obj-$(CONFIG_SYNC_FILE) += sync_file.o obj-$(CONFIG_SW_SYNC) += sw_sync.o sync_debug.o +obj-$(CONFIG_UDMABUF) += udmabuf.o -- 2.9.3
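
Since the RFC notes that chunked virtio-gpu resources need multiple iovec entries, here is a small userspace sketch of building such a request and pre-checking it the way the driver's create ioctl does (page-aligned base and length, page count summed over all entries). The `rfc_`-prefixed structures are local copies of the layout proposed in this RFC's header and may not match any later version of the interface; the helper function names are hypothetical. The sketch deliberately stops short of issuing the ioctl, so it can be compiled without the header or the device node.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Local copies of the structures proposed in this RFC (assumed layout). */
struct rfc_udmabuf_iovec {
	uint64_t base;
	uint64_t len;
};

struct rfc_udmabuf_create {
	uint32_t flags;
	uint32_t niov;
	struct rfc_udmabuf_iovec iovs[];
};

/* Allocate and fill a request with niov entries; NULL on allocation failure. */
static struct rfc_udmabuf_create *
rfc_udmabuf_build(const struct rfc_udmabuf_iovec *iovs, uint32_t niov)
{
	struct rfc_udmabuf_create *c;

	c = malloc(sizeof(*c) + (size_t)niov * sizeof(c->iovs[0]));
	if (!c)
		return NULL;
	c->flags = 0;
	c->niov = niov;
	for (uint32_t i = 0; i < niov; i++)
		c->iovs[i] = iovs[i];
	return c;
}

/*
 * Mirror the driver's validation: every base and length must be page
 * aligned. Returns the total page count, or -1 if any entry is misaligned.
 */
static long rfc_udmabuf_count_pages(const struct rfc_udmabuf_create *c,
				    uint64_t page_size)
{
	long pages = 0;

	/* page_size must be a non-zero power of two */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	for (uint32_t i = 0; i < c->niov; i++) {
		if (c->iovs[i].base % page_size)
			return -1;
		if (c->iovs[i].len % page_size)
			return -1;
		pages += (long)(c->iovs[i].len / page_size);
	}
	return pages;
}
```

A caller would pass the resulting structure to the `UDMABUF_CREATE` ioctl on the misc device; checking alignment up front simply turns an `-EINVAL` from the kernel into an earlier, local error.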

6 years, 8 months

[RFC PATCH v2 0/9] hyper_dmabuf: Hyper_DMABUF driver

by Dongwon Kim

This patch series contains the implementation of a new device driver, the
hyper_DMABUF driver, which provides a way to expand the boundary of Linux
DMA-BUF sharing across different VM instances on a Multi-OS platform enabled
by a Hypervisor (e.g. XEN).

This version 2 series is basically a refactored version of the old series
starting with "[RFC PATCH 01/60] hyper_dmabuf: initial working version of
hyper_dmabuf drv".

Implementation details of this driver are described in the reference guide
added by the second patch, "[RFC PATCH v2 2/9] hyper_dmabuf: architecture
specification and reference guide".

Attaching the 'Overview' section here as a quick summary.

------------------------------------------------------------------------------
Section 1. Overview
------------------------------------------------------------------------------

The Hyper_DMABUF driver is a Linux device driver running on multiple Virtual
Machines (VMs), which expands DMA-BUF sharing capability to the VM
environment, where multiple different OS instances need to share the same
physical data without copying data across VMs.

To share a DMA_BUF across VMs, an instance of the Hyper_DMABUF driver on the
exporting VM (the so-called “exporter”) imports a local DMA_BUF from the
original producer of the buffer, then re-exports it to the importing VM (the
so-called “importer”) with a unique ID for the buffer, hyper_dmabuf_id.

Another instance of the Hyper_DMABUF driver on the importer registers the
hyper_dmabuf_id, together with reference information for the shared physical
pages associated with the DMA_BUF, in its database when the export happens.

The actual mapping of the DMA_BUF on the importer’s side is done by the
Hyper_DMABUF driver when user space issues the IOCTL command to access the
shared DMA_BUF. The Hyper_DMABUF driver works as both an importing and an
exporting driver as is; that is, no special configuration is required.
Consequently, only a single module per VM is needed to enable cross-VM
DMA_BUF exchange.
------------------------------------------------------------------------------

There is a git repository at github.com where this series of patches is
integrated into a Linux kernel tree based on the commit:

        commit ae64f9bd1d3621b5e60d7363bc20afb46aede215
        Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
        Date:   Sun Dec 3 11:01:47 2017 -0500

            Linux 4.15-rc2

https://github.com/downor/linux_hyper_dmabuf.git hyper_dmabuf_integration_v4

Dongwon Kim, Mateusz Polrola (9):
  hyper_dmabuf: initial upload of hyper_dmabuf drv core framework
  hyper_dmabuf: architecture specification and reference guide
  MAINTAINERS: adding Hyper_DMABUF driver section in MAINTAINERS
  hyper_dmabuf: user private data attached to hyper_DMABUF
  hyper_dmabuf: hyper_DMABUF synchronization across VM
  hyper_dmabuf: query ioctl for retreiving various hyper_DMABUF info
  hyper_dmabuf: event-polling mechanism for detecting a new hyper_DMABUF
  hyper_dmabuf: threaded interrupt in Xen-backend
  hyper_dmabuf: default backend for XEN hypervisor

 Documentation/hyper-dmabuf-sharing.txt | 734 ++++++++++++++++
 MAINTAINERS | 11 +
 drivers/dma-buf/Kconfig | 2 +
 drivers/dma-buf/Makefile | 1 +
 drivers/dma-buf/hyper_dmabuf/Kconfig | 50 ++
 drivers/dma-buf/hyper_dmabuf/Makefile | 44 +
 .../backends/xen/hyper_dmabuf_xen_comm.c | 944 +++++++++++++++++++++
 .../backends/xen/hyper_dmabuf_xen_comm.h | 78 ++
 .../backends/xen/hyper_dmabuf_xen_comm_list.c | 158 ++++
 .../backends/xen/hyper_dmabuf_xen_comm_list.h | 67 ++
 .../backends/xen/hyper_dmabuf_xen_drv.c | 46 +
 .../backends/xen/hyper_dmabuf_xen_drv.h | 53 ++
 .../backends/xen/hyper_dmabuf_xen_shm.c | 525 ++++++++++++
 .../backends/xen/hyper_dmabuf_xen_shm.h | 46 +
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_drv.c | 410 +++++++++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_drv.h | 122 +++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_event.c | 122 +++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_event.h | 38 +
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_id.c | 135 +++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_id.h | 53 ++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ioctl.c | 794 +++++++++++++++++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ioctl.h | 52 ++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_list.c | 295 +++++++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_list.h | 73 ++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_msg.c | 416 +++++++++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_msg.h | 89 ++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ops.c | 415 +++++++++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ops.h | 34 +
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_query.c | 174 ++++
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_query.h | 36 +
 .../hyper_dmabuf/hyper_dmabuf_remote_sync.c | 324 +++++++
 .../hyper_dmabuf/hyper_dmabuf_remote_sync.h | 32 +
 .../dma-buf/hyper_dmabuf/hyper_dmabuf_sgl_proc.c | 257 ++++++
 .../dma-buf/hyper_dmabuf/hyper_dmabuf_sgl_proc.h | 43 +
 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_struct.h | 143 ++++
 include/uapi/linux/hyper_dmabuf.h | 134 +++
 36 files changed, 6950 insertions(+)
 create mode 100644 Documentation/hyper-dmabuf-sharing.txt
 create mode 100644 drivers/dma-buf/hyper_dmabuf/Kconfig
 create mode 100644 drivers/dma-buf/hyper_dmabuf/Makefile
 create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_comm.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_comm.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_comm_list.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_comm_list.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_drv.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_drv.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_shm.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/backends/xen/hyper_dmabuf_xen_shm.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_drv.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_drv.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_event.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_event.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_id.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_id.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ioctl.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ioctl.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_list.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_list.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_msg.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_msg.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ops.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_ops.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_query.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_query.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_remote_sync.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_remote_sync.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_sgl_proc.c
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_sgl_proc.h
 create mode 100644 drivers/dma-buf/hyper_dmabuf/hyper_dmabuf_struct.h
 create mode 100644 include/uapi/linux/hyper_dmabuf.h
--
2.16.1

6 years, 8 months

[PATCH v2] Add udmabuf misc device

by Gerd Hoffmann

A driver to let userspace turn iovecs into dma-bufs.

Use case: Allows qemu to create dmabufs for the vga framebuffer or
virtio-gpu resources. They can then be passed around to display guest
content on the host: to a spice client for classic full-framebuffer
display, and hopefully some day to a wayland server for seamless guest
window display.

Those dma-bufs are accounted against the user's shm mlock bucket, as the
pages are effectively locked in memory.

Cc: David Airlie <airlied(a)linux.ie>
Cc: Tomeu Vizoso <tomeu.vizoso(a)collabora.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Signed-off-by: Gerd Hoffmann <kraxel(a)redhat.com>
---
 include/uapi/linux/udmabuf.h                      |  23 ++
 drivers/dma-buf/udmabuf.c                         | 261 ++++++++++++++++++++++
 tools/testing/selftests/drivers/dma-buf/udmabuf.c |  69 ++++++
 drivers/dma-buf/Kconfig                           |   7 +
 drivers/dma-buf/Makefile                          |   1 +
 tools/testing/selftests/drivers/dma-buf/Makefile  |   5 +
 6 files changed, 366 insertions(+)
 create mode 100644 include/uapi/linux/udmabuf.h
 create mode 100644 drivers/dma-buf/udmabuf.c
 create mode 100644 tools/testing/selftests/drivers/dma-buf/udmabuf.c
 create mode 100644 tools/testing/selftests/drivers/dma-buf/Makefile

diff --git a/include/uapi/linux/udmabuf.h b/include/uapi/linux/udmabuf.h
new file mode 100644
index 0000000000..54ceba203a
--- /dev/null
+++ b/include/uapi/linux/udmabuf.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_LINUX_UDMABUF_H
+#define _UAPI_LINUX_UDMABUF_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+struct udmabuf_iovec {
+	__u64 base;
+	__u64 len;
+};
+
+#define UDMABUF_FLAGS_CLOEXEC	0x01
+
+struct udmabuf_create {
+	__u32 flags;
+	__u32 niov;
+	struct udmabuf_iovec iovs[];
+};
+
+#define UDMABUF_CREATE _IOW(0x42, 0x23, struct udmabuf_create)
+
+#endif /* _UAPI_LINUX_UDMABUF_H */
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
new file mode 100644
index 0000000000..664ab4ee4e
--- /dev/null
+++ b/drivers/dma-buf/udmabuf.c
@@ -0,0
+1,261 @@ +/* + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ +#include <linux/init.h> +#include <linux/module.h> +#include <linux/device.h> +#include <linux/kernel.h> +#include <linux/slab.h> +#include <linux/miscdevice.h> +#include <linux/dma-buf.h> +#include <linux/highmem.h> +#include <linux/cred.h> + +#include <uapi/linux/udmabuf.h> + +struct udmabuf { + u32 pagecount; + struct page **pages; + struct user_struct *owner; +}; + +static int udmabuf_vm_fault(struct vm_fault *vmf) +{ + struct vm_area_struct *vma = vmf->vma; + struct udmabuf *ubuf = vma->vm_private_data; + + if (WARN_ON(vmf->pgoff >= ubuf->pagecount)) + return VM_FAULT_SIGBUS; + + vmf->page = ubuf->pages[vmf->pgoff]; + get_page(vmf->page); + return 0; +} + +static const struct vm_operations_struct udmabuf_vm_ops = { + .fault = udmabuf_vm_fault, +}; + +static int mmap_udmabuf(struct dma_buf *buf, struct vm_area_struct *vma) +{ + struct udmabuf *ubuf = buf->priv; + + if ((vma->vm_flags & VM_SHARED) == 0) + return -EINVAL; + + vma->vm_ops = &udmabuf_vm_ops; + vma->vm_private_data = ubuf; + return 0; +} + +static struct sg_table *map_udmabuf(struct dma_buf_attachment *at, + enum dma_data_direction direction) +{ + struct udmabuf *ubuf = at->dmabuf->priv; + struct sg_table *sg; + + sg = kzalloc(sizeof(*sg), GFP_KERNEL); + if (!sg) + goto err1; + if (sg_alloc_table_from_pages(sg, ubuf->pages, ubuf->pagecount, + 0, ubuf->pagecount << PAGE_SHIFT, + GFP_KERNEL) < 0) + goto err2; + if (!dma_map_sg(at->dev, sg->sgl, sg->nents, direction)) + goto err3; + + return sg; + +err3: + sg_free_table(sg); +err2: + kfree(sg); +err1: + return ERR_PTR(-ENOMEM); +} + +static void unmap_udmabuf(struct dma_buf_attachment *at, + struct sg_table *sg, + enum dma_data_direction direction) +{ + sg_free_table(sg); + kfree(sg); +} + +static void release_udmabuf(struct dma_buf *buf) +{ + 
	struct udmabuf *ubuf = buf->priv;
+	pgoff_t pg;
+
+	for (pg = 0; pg < ubuf->pagecount; pg++)
+		put_page(ubuf->pages[pg]);
+	user_shm_unlock(ubuf->pagecount << PAGE_SHIFT, ubuf->owner);
+	free_uid(ubuf->owner);
+	kfree(ubuf->pages);
+	kfree(ubuf);
+}
+
+static void *kmap_atomic_udmabuf(struct dma_buf *buf, unsigned long page_num)
+{
+	struct udmabuf *ubuf = buf->priv;
+	struct page *page = ubuf->pages[page_num];
+
+	return kmap_atomic(page);
+}
+
+static void *kmap_udmabuf(struct dma_buf *buf, unsigned long page_num)
+{
+	struct udmabuf *ubuf = buf->priv;
+	struct page *page = ubuf->pages[page_num];
+
+	return kmap(page);
+}
+
+static struct dma_buf_ops udmabuf_ops = {
+	.map_dma_buf = map_udmabuf,
+	.unmap_dma_buf = unmap_udmabuf,
+	.release = release_udmabuf,
+	.map_atomic = kmap_atomic_udmabuf,
+	.map = kmap_udmabuf,
+	.mmap = mmap_udmabuf,
+};
+
+static long udmabuf_ioctl_create(struct file *filp, unsigned long arg)
+{
+	struct udmabuf_create create;
+	struct udmabuf_iovec *iovs;
+	struct udmabuf *ubuf;
+	DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
+	struct dma_buf *buf;
+	pgoff_t pgoff, pgcnt;
+	u32 iov, flags;
+	int ret;
+
+	if (copy_from_user(&create, (void __user *)arg,
+			   sizeof(struct udmabuf_create)))
+		return -EFAULT;
+
+	iovs = kmalloc_array(create.niov, sizeof(struct udmabuf_iovec),
+			     GFP_KERNEL);
+	if (!iovs)
+		return -ENOMEM;
+
+	arg += offsetof(struct udmabuf_create, iovs);
+	ret = -EFAULT;
+	if (copy_from_user(iovs, (void __user *)arg,
+			   create.niov * sizeof(struct udmabuf_iovec)))
+		goto err_free_iovs;
+
+	ubuf = kzalloc(sizeof(struct udmabuf), GFP_KERNEL);
+	if (!ubuf)
+		goto err_free_iovs;
+
+	ret = -EINVAL;
+	for (iov = 0; iov < create.niov; iov++) {
+		if (!IS_ALIGNED(iovs[iov].base, PAGE_SIZE))
+			goto err_free_buf;
+		if (!IS_ALIGNED(iovs[iov].len, PAGE_SIZE))
+			goto err_free_buf;
+		ubuf->pagecount += iovs[iov].len >> PAGE_SHIFT;
+	}
+
+	/* this effectively mlocks the pages so account it accordingly */
+	ret = -ENOMEM;
+	ubuf->owner = current_user();
+	if (!user_shm_lock(ubuf->pagecount << PAGE_SHIFT, ubuf->owner))
+		goto err_free_buf;
+
+	ubuf->pages = kmalloc_array(ubuf->pagecount, sizeof(struct page*),
+				    GFP_KERNEL);
+	if (!ubuf->pages)
+		goto err_shm_unlock;
+
+	pgoff = 0;
+	for (iov = 0; iov < create.niov; iov++) {
+		pgcnt = iovs[iov].len >> PAGE_SHIFT;
+		while (pgcnt > 0) {
+			ret = get_user_pages_fast(iovs[iov].base, pgcnt,
+						  true, /* write */
+						  ubuf->pages + pgoff);
+			if (ret < 0)
+				goto err_put_pages;
+			pgoff += ret;
+			pgcnt -= ret;
+		}
+	}
+
+	exp_info.ops = &udmabuf_ops;
+	exp_info.size = ubuf->pagecount << PAGE_SHIFT;
+	exp_info.priv = ubuf;
+
+	buf = dma_buf_export(&exp_info);
+	if (IS_ERR(buf)) {
+		ret = PTR_ERR(buf);
+		goto err_put_pages;
+	}
+
+	flags = 0;
+	if (create.flags & UDMABUF_FLAGS_CLOEXEC)
+		flags |= O_CLOEXEC;
+
+	kfree(iovs);
+	return dma_buf_fd(buf, flags);
+
+err_put_pages:
+	while (pgoff > 0)
+		put_page(ubuf->pages[--pgoff]);
+err_shm_unlock:
+	user_shm_unlock(ubuf->pagecount << PAGE_SHIFT, ubuf->owner);
+err_free_buf:
+	free_uid(ubuf->owner);
+	kfree(ubuf->pages);
+	kfree(ubuf);
+err_free_iovs:
+	kfree(iovs);
+	return ret;
+}
+
+static long udmabuf_ioctl(struct file *filp, unsigned int ioctl,
+			  unsigned long arg)
+{
+	long ret;
+
+	switch (ioctl) {
+	case UDMABUF_CREATE:
+		ret = udmabuf_ioctl_create(filp, arg);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+	return ret;
+}
+
+static const struct file_operations udmabuf_fops = {
+	.owner = THIS_MODULE,
+	.unlocked_ioctl = udmabuf_ioctl,
+};
+
+static struct miscdevice udmabuf_misc = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "udmabuf",
+	.fops = &udmabuf_fops,
+};
+
+static int __init udmabuf_dev_init(void)
+{
+	return misc_register(&udmabuf_misc);
+}
+
+static void __exit udmabuf_dev_exit(void)
+{
+	misc_deregister(&udmabuf_misc);
+}
+
+module_init(udmabuf_dev_init)
+module_exit(udmabuf_dev_exit)
+
+MODULE_AUTHOR("Gerd Hoffmann <kraxel(a)redhat.com>");
+MODULE_LICENSE("GPL v2");
diff --git a/tools/testing/selftests/drivers/dma-buf/udmabuf.c b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
new file mode 100644
index 0000000000..3472c8ee49
--- /dev/null
+++ b/tools/testing/selftests/drivers/dma-buf/udmabuf.c
@@ -0,0 +1,69 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <malloc.h>
+
+#include <sys/ioctl.h>
+#include <linux/udmabuf.h>
+
+#define TEST_PREFIX "drivers/dma-buf/udmabuf"
+#define NUM_PAGES 4
+
+int main(int argc, char *argv[])
+{
+	struct udmabuf_create *create;
+	void *mem;
+	int dev, fd;
+
+	dev = open("/dev/udmabuf", O_RDWR);
+	if (dev < 0) {
+		printf("%s: [skip]\n", TEST_PREFIX);
+		exit(77);
+	}
+
+	mem = memalign(getpagesize(), getpagesize() * NUM_PAGES);
+	if (mem == NULL) {
+		printf("%s: [FAIL]\n", TEST_PREFIX);
+		exit (1);
+	}
+
+	create = malloc(sizeof(struct udmabuf_create) +
+			sizeof(struct udmabuf_iovec));
+	create->flags = 0;
+	create->niov = 1;
+
+	/* should fail (base not page aligned) */
+	create->iovs[0].base = (intptr_t)mem + getpagesize()/2;
+	create->iovs[0].len = getpagesize();
+	fd = ioctl(dev, UDMABUF_CREATE, create);
+	if (fd >= 0) {
+		printf("%s: [FAIL]\n", TEST_PREFIX);
+		exit(1);
+	}
+
+	/* should fail (size not multiple of page) */
+	create->iovs[0].base = (intptr_t)mem;
+	create->iovs[0].len = getpagesize()/2;
+	fd = ioctl(dev, UDMABUF_CREATE, create);
+	if (fd >= 0) {
+		printf("%s: [FAIL]\n", TEST_PREFIX);
+		exit(1);
+	}
+
+	/* should work */
+	create->iovs[0].base = (intptr_t)mem;
+	create->iovs[0].len = getpagesize() * NUM_PAGES;
+	fd = ioctl(dev, UDMABUF_CREATE, create);
+	if (fd < 0) {
+		printf("%s: [FAIL]\n", TEST_PREFIX);
+		exit(1);
+	}
+	close(fd);
+
+	fprintf(stderr, "%s: ok\n", TEST_PREFIX);
+	close(dev);
+	return 0;
+}
diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
index ed3b785bae..19be3ec62d 100644
--- a/drivers/dma-buf/Kconfig
+++ b/drivers/dma-buf/Kconfig
@@ -30,4 +30,11 @@ config SW_SYNC
 	  WARNING: improper use of this can result in deadlocking kernel
 	  drivers from userspace. Intended for test and debug only.
 
+config UDMABUF
+	bool "userspace dmabuf misc driver"
+	default n
+	depends on DMA_SHARED_BUFFER
+	---help---
+	  A driver to let userspace turn iovs into dma-bufs.
+
 endmenu
diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index c33bf88631..0913a6ccab 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,3 +1,4 @@
 obj-y := dma-buf.o dma-fence.o dma-fence-array.o reservation.o seqno-fence.o
 obj-$(CONFIG_SYNC_FILE) += sync_file.o
 obj-$(CONFIG_SW_SYNC) += sw_sync.o sync_debug.o
+obj-$(CONFIG_UDMABUF) += udmabuf.o
diff --git a/tools/testing/selftests/drivers/dma-buf/Makefile b/tools/testing/selftests/drivers/dma-buf/Makefile
new file mode 100644
index 0000000000..4154c3d7aa
--- /dev/null
+++ b/tools/testing/selftests/drivers/dma-buf/Makefile
@@ -0,0 +1,5 @@
+CFLAGS += -I../../../../../usr/include/
+
+TEST_GEN_PROGS := udmabuf
+
+include ../../lib.mk
-- 
2.9.3

6 years, 8 months

Re: [Linaro-mm-sig] [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

by Christoph Hellwig

On Thu, Mar 29, 2018 at 09:58:54PM -0400, Jerome Glisse wrote:
> dma_map_resource() is the right API (thought its current implementation
> is fill with x86 assumptions). So i would argue that arch can decide to
> implement it or simply return dma error address which trigger fallback
> path into the caller (at least for GPU drivers). SG variant can be added
> on top.

It isn't in general. It doesn't integrate with scatterlists (see my
comment to page one), and it doesn't integrate with all the subsystems
that also need a kernel virtual address.
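
The fallback idea quoted above (the mapping call reports an error and the caller picks another path) can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions, not kernel code: `toy_map_resource`, `toy_pick_path`, `TOY_MAPPING_ERROR` and the `toy_arch` fields are all hypothetical names invented for illustration. It also does not address the objection raised in the reply, since it handles a single address and has no notion of scatterlists or kernel virtual addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for a "DMA mapping error" address. */
#define TOY_MAPPING_ERROR UINT64_MAX

enum toy_path { TOY_PATH_P2P, TOY_PATH_BOUNCE };

/* Hypothetical description of what a toy architecture can do. */
struct toy_arch {
	bool supports_p2p;   /* does this toy arch implement the mapping? */
	uint64_t bus_offset; /* toy offset from CPU physical to bus address */
};

/* Toy stand-in for a resource-mapping call: translate or report failure. */
static uint64_t toy_map_resource(const struct toy_arch *arch, uint64_t phys)
{
	if (!arch->supports_p2p)
		return TOY_MAPPING_ERROR;
	return phys + arch->bus_offset;
}

/*
 * Caller-side fallback: try the direct peer mapping first, and use a
 * bounce buffer in system memory when the toy arch refuses the mapping.
 */
static enum toy_path toy_pick_path(const struct toy_arch *arch,
				   uint64_t peer_phys, uint64_t bounce_addr,
				   uint64_t *dma_addr)
{
	uint64_t addr;

	assert(arch && dma_addr);
	addr = toy_map_resource(arch, peer_phys);
	if (addr == TOY_MAPPING_ERROR) {
		*dma_addr = bounce_addr;
		return TOY_PATH_BOUNCE;
	}
	*dma_addr = addr;
	return TOY_PATH_P2P;
}
```

The point of the model is only that the decision lives in the caller: an architecture that cannot map the peer resource returns the error value, and nothing silently does the wrong thing.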

6 years, 8 months

Re: [Linaro-mm-sig] [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

by Christian König

On 29.03.2018 at 18:25, Logan Gunthorpe wrote:
>
> On 29/03/18 10:10 AM, Christian König wrote:
>> Why not? I mean the dma_map_resource() function is for P2P while other
>> dma_map_* functions are only for system memory.
> Oh, hmm, I wasn't aware dma_map_resource was exclusively for mapping
> P2P. Though it's a bit odd seeing we've been working under the
> assumption that PCI P2P is different as it has to translate the PCI bus
> address. Where as P2P for devices on other buses is a big unknown.

Yeah, completely agree. On my TODO list (but rather far down) is
actually supporting P2P with USB devices.

And no, I don't have the slightest idea how to do this at the moment.

>>> And this is necessary to
>>> check if the DMA ops in use support it or not. We can't have the
>>> dma_map_X() functions do the wrong thing because they don't support it yet.
>> Well that sounds like we should just return an error from
>> dma_map_resources() when an architecture doesn't support P2P yet as Alex
>> suggested.
> Yes, well except in our patch-set we can't easily use
> dma_map_resources() as we either have SGLs to deal with or we need to
> create whole new interfaces to a number of subsystems.

Agree as well. I was also clearly in favor of extending the SGLs to have
a flag for this instead of the dma_map_resource() interface, but for some
reason that didn't make it into the kernel.

>> You don't seem to understand the implications: The devices do have a
>> common upstream bridge! In other words your code would currently claim
>> that P2P is supported, but in practice it doesn't work.
> Do they? They don't on any of the Intel machines I'm looking at. The
> previous version of the patchset not only required a common upstream
> bridge but two layers of upstream bridges on both devices which would
> effectively limit transfers to PCIe switches only. But Bjorn did not
> like this.

At least to me that sounds like a good idea; it would at least disable
the (incorrect) auto-detection of P2P for such devices.

>> You need to include both drivers which participate in the P2P
>> transaction to make sure that both supports this and give them
>> opportunity to chicken out and in the case of AMD APUs even redirect the
>> request to another location (e.g. participate in the DMA translation).
> I don't think it's the drivers responsibility to reject P2P . The
> topology is what governs support or not. The discussions we had with
> Bjorn settled on if the devices are all behind the same bridge they can
> communicate with each other. This is essentially guaranteed by the PCI spec.

Well, it is not only about rejecting P2P. See, the devices I need to
worry about are essentially part of the CPU. Their resources look like a
PCI BAR to the BIOS and OS, but are actually backed by stolen system
memory.

So as crazy as it sounds, what you get is an operation which starts as
P2P, but then the GPU driver sees it and says: Hey, please don't write
that to my PCIe BAR, but rather to system memory location X.

>> DMA-buf fortunately seems to handle all this already, that's why we
>> choose it as base for our implementation.
> Well, unfortunately DMA-buf doesn't help for the drivers we are working
> with as neither the block layer nor the RDMA subsystem have any
> interfaces for it.

A fact that gives me quite some sleepless nights as well. I think we
sooner or later need to extend those interfaces to work with DMA-bufs as
well.

I will try to give your patch set a review when I'm back from vacation
and rebase my DMA-buf work on top of that.

Regards,
Christian.

>
> Logan
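
The "starts as P2P, ends up in system memory" case described in this message can be pictured with a userspace toy model in which the target side gets a say in where a peer write lands. This is a sketch under stated assumptions only: `toy_target`, `toy_resolve` and the verdict names are hypothetical and do not correspond to any DMA-buf or PCI interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcomes when a peer asks to write into a target's BAR. */
enum toy_verdict { TOY_P2P_DIRECT, TOY_P2P_REDIRECT, TOY_P2P_REJECT };

/* Hypothetical target description. */
struct toy_target {
	uint64_t bar_start;
	uint64_t bar_len;
	bool accepts_peers;           /* target driver may refuse entirely */
	bool backed_by_system_memory; /* BAR is really stolen system memory */
	uint64_t sysmem_base;         /* where that memory actually lives */
};

/*
 * Let the target decide: refuse, accept the BAR address as-is, or
 * redirect the request to the system memory that backs the BAR.
 */
static enum toy_verdict toy_resolve(const struct toy_target *t,
				    uint64_t bar_addr, uint64_t *out)
{
	assert(t && out);
	if (!t->accepts_peers)
		return TOY_P2P_REJECT;
	if (bar_addr < t->bar_start || bar_addr - t->bar_start >= t->bar_len)
		return TOY_P2P_REJECT; /* address is outside the BAR */
	if (t->backed_by_system_memory) {
		*out = t->sysmem_base + (bar_addr - t->bar_start);
		return TOY_P2P_REDIRECT;
	}
	*out = bar_addr;
	return TOY_P2P_DIRECT;
}
```

A topology-only check has no place to express the redirect outcome, which is the reason given above for involving both drivers in the decision.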

6 years, 8 months

Re: [Linaro-mm-sig] [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

by Christian König

On 29.03.2018 at 17:45, Logan Gunthorpe wrote:
>
> On 29/03/18 05:44 AM, Christian König wrote:
>> On 28.03.2018 at 21:53, Logan Gunthorpe wrote:
>>> On 28/03/18 01:44 PM, Christian König wrote:
>>>> Well, isn't that exactly what dma_map_resource() is good for? As far as
>>>> I can see it makes sure IOMMU is aware of the access route and
>>>> translates a CPU address into a PCI Bus address.
>>>> I'm using that with the AMD IOMMU driver and at least there it works
>>>> perfectly fine.
>>> Yes, it would be nice, but no arch has implemented this yet. We are just
>>> lucky in the x86 case because that arch is simple and doesn't need to do
>>> anything for P2P (partially due to the Bus and CPU addresses being the
>>> same). But in the general case, you can't rely on it.
>> Well, that an arch hasn't implemented it doesn't mean that we don't have
>> the right interface to do it.
> Yes, but right now we don't have a performant way to check if we are
> doing P2P or not in the dma_map_X() wrappers.

Why not? I mean the dma_map_resource() function is for P2P while other
dma_map_* functions are only for system memory.

> And this is necessary to
> check if the DMA ops in use support it or not. We can't have the
> dma_map_X() functions do the wrong thing because they don't support it yet.

Well, that sounds like we should just return an error from
dma_map_resource() when an architecture doesn't support P2P yet, as Alex
suggested.

>> Devices integrated in the CPU usually only "claim" to be PCIe devices.
>> In reality their memory request path go directly through the integrated
>> north bridge. The reason for this is simple better throughput/latency.
> These are just more reasons why our patchset restricts to devices behind
> a switch. And more mess for someone to deal with if they need to relax
> that restriction.

You don't seem to understand the implications: The devices do have a
common upstream bridge! In other words your code would currently claim
that P2P is supported, but in practice it doesn't work.

You need to include both drivers which participate in the P2P
transaction to make sure that both support this and give them the
opportunity to chicken out and, in the case of AMD APUs, even redirect
the request to another location (e.g. participate in the DMA
translation).

DMA-buf fortunately seems to handle all this already; that's why we
chose it as the base for our implementation.

Regards,
Christian.
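
The disagreement above, that two devices can share an upstream bridge and still be unable to reach each other, can be shown with a toy device tree. This is a userspace sketch under stated assumptions, not the code from the patch under discussion: `toy_dev`, `toy_common_upstream`, `toy_p2p_allowed` and the `cpu_integrated` flag are hypothetical names, and a real implementation would walk actual PCI structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device node; "parent" is the upstream bridge (NULL at the root). */
struct toy_dev {
	const struct toy_dev *parent;
	bool is_root_complex; /* node stands for the root complex */
	bool cpu_integrated;  /* only "claims" to be PCIe, BAR not peer-visible */
};

/* Closest bridge that is upstream of both devices, or NULL if none. */
static const struct toy_dev *toy_common_upstream(const struct toy_dev *a,
						 const struct toy_dev *b)
{
	const struct toy_dev *x, *y;

	assert(a && b);
	for (x = a->parent; x; x = x->parent)
		for (y = b->parent; y; y = y->parent)
			if (x == y)
				return x;
	return NULL;
}

/* Topology check plus a per-device veto for integrated devices. */
static bool toy_p2p_allowed(const struct toy_dev *a, const struct toy_dev *b)
{
	const struct toy_dev *up = toy_common_upstream(a, b);

	if (!up || up->is_root_complex)
		return false; /* traffic would have to cross the root complex */
	if (a->cpu_integrated || b->cpu_integrated)
		return false; /* common bridge exists, but peers cannot reach the BAR */
	return true;
}
```

In this model two integrated devices below the same bridge pass the topology test but fail the overall check, which is the situation the message warns about.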

6 years, 8 months

Re: [Linaro-mm-sig] [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

by Alex Deucher

Sorry, didn't mean to drop the lists here. re-adding.

On Wed, Mar 28, 2018 at 4:05 PM, Alex Deucher <alexdeucher(a)gmail.com> wrote:
> On Wed, Mar 28, 2018 at 3:53 PM, Logan Gunthorpe <logang(a)deltatee.com> wrote:
>>
>>
>> On 28/03/18 01:44 PM, Christian König wrote:
>>> Well, isn't that exactly what dma_map_resource() is good for? As far as
>>> I can see it makes sure IOMMU is aware of the access route and
>>> translates a CPU address into a PCI Bus address.
>>
>>> I'm using that with the AMD IOMMU driver and at least there it works
>>> perfectly fine.
>>
>> Yes, it would be nice, but no arch has implemented this yet. We are just
>> lucky in the x86 case because that arch is simple and doesn't need to do
>> anything for P2P (partially due to the Bus and CPU addresses being the
>> same). But in the general case, you can't rely on it.
>
> Could we do something for the arches where it works? I feel like peer
> to peer has dragged out for years because everyone is trying to boil
> the ocean for all arches. There are a huge number of use cases for
> peer to peer on these "simple" architectures which actually represent
> a good deal of the users that want this.
>
> Alex
>
>>
>>>>> Yeah, but not for ours. See if you want to do real peer 2 peer you need
>>>>> to keep both the operation as well as the direction into account.
>>>> Not sure what you are saying here... I'm pretty sure we are doing "real"
>>>> peer 2 peer...
>>>>
>>>>> For example when you can do writes between A and B that doesn't mean
>>>>> that writes between B and A work. And reads are generally less likely to
>>>>> work than writes. etc...
>>>> If both devices are behind a switch then the PCI spec guarantees that A
>>>> can both read and write B and vice versa.
>>>
>>> Sorry to say that, but I know a whole bunch of PCI devices which
>>> horrible ignores that.
>>
>> Can you elaborate? As far as the device is concerned it shouldn't know
>> whether a request comes from a peer or from the host. If it does do
>> crazy stuff like that it's well out of spec. It's up to the switch (or
>> root complex if good support exists) to route the request to the device
>> and it's the root complex that tends to be what drops the load requests
>> which causes the asymmetries.
>>
>> Logan
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx(a)lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

6 years, 8 months

Re: [Linaro-mm-sig] [PATCH 2/8] PCI: Add pci_find_common_upstream_dev()

by Christian König

On 28.03.2018 at 21:53, Logan Gunthorpe wrote:
>
> On 28/03/18 01:44 PM, Christian König wrote:
>> Well, isn't that exactly what dma_map_resource() is good for? As far as
>> I can see it makes sure IOMMU is aware of the access route and
>> translates a CPU address into a PCI Bus address.
>> I'm using that with the AMD IOMMU driver and at least there it works
>> perfectly fine.
> Yes, it would be nice, but no arch has implemented this yet. We are just
> lucky in the x86 case because that arch is simple and doesn't need to do
> anything for P2P (partially due to the Bus and CPU addresses being the
> same). But in the general case, you can't rely on it.

Well, that an arch hasn't implemented it doesn't mean that we don't have
the right interface to do it.

>>>> Yeah, but not for ours. See if you want to do real peer 2 peer you need
>>>> to keep both the operation as well as the direction into account.
>>> Not sure what you are saying here... I'm pretty sure we are doing "real"
>>> peer 2 peer...
>>>
>>>> For example when you can do writes between A and B that doesn't mean
>>>> that writes between B and A work. And reads are generally less likely to
>>>> work than writes. etc...
>>> If both devices are behind a switch then the PCI spec guarantees that A
>>> can both read and write B and vice versa.
>> Sorry to say that, but I know a whole bunch of PCI devices which
>> horrible ignores that.
> Can you elaborate? As far as the device is concerned it shouldn't know
> whether a request comes from a peer or from the host. If it does do
> crazy stuff like that it's well out of spec. It's up to the switch (or
> root complex if good support exists) to route the request to the device
> and it's the root complex that tends to be what drops the load requests
> which causes the asymmetries.

Devices integrated in the CPU usually only "claim" to be PCIe devices.
In reality their memory request paths go directly through the integrated
north bridge. The reason for this is simply better throughput/latency.

That is hidden from the software; for example, the BIOS just allocates
address space for the BARs as if it's a normal PCIe device.

The only crux is that when you then do peer2peer, your requests simply go
into nirvana and are not handled by anything, because the BARs are only
visible from the CPU side of the northbridge.

Regards,
Christian.

>
> Logan
> _______________________________________________
> amd-gfx mailing list
> amd-gfx(a)lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
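
The quoted point that both the operation and the direction matter (a write from A to B working says nothing about B to A, or about reads) can be expressed as a small capability table. This is a userspace sketch under stated assumptions: `toy_caps`, `toy_can` and `toy_can_copy` are hypothetical names, and the table contents would have to come from real knowledge of the hardware.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_op { TOY_OP_READ, TOY_OP_WRITE, TOY_OP_COUNT };

#define TOY_NDEV 3 /* toy system with three devices, indexed 0..2 */

/* ok[initiator][target][op]: can this initiator perform op on that target? */
struct toy_caps {
	bool ok[TOY_NDEV][TOY_NDEV][TOY_OP_COUNT];
};

static bool toy_can(const struct toy_caps *c, int initiator, int target,
		    enum toy_op op)
{
	assert(initiator >= 0 && initiator < TOY_NDEV);
	assert(target >= 0 && target < TOY_NDEV);
	return c->ok[initiator][target][op];
}

/*
 * A copy from src to dst driven by the DMA engine of "engine" needs read
 * access to src and write access to dst; access to the engine's own
 * memory is treated as always possible.
 */
static bool toy_can_copy(const struct toy_caps *c, int engine, int src, int dst)
{
	bool can_read = (engine == src) || toy_can(c, engine, src, TOY_OP_READ);
	bool can_write = (engine == dst) || toy_can(c, engine, dst, TOY_OP_WRITE);

	return can_read && can_write;
}
```

With only "device 0 may write to device 1" set, a copy from 0 to 1 works when device 0 pushes the data, but not when device 1 tries to pull it with reads.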

6 years, 8 months
