Hello,
Following the discussion about the IOMMU controller driver for the Samsung Exynos4 platform and Arnd's suggestions, I've decided to start working on a redesign of the dma-mapping implementation for the ARM architecture. The goal is to add IOMMU support in the way preferred by the community :)
Some of the ideas about merging the dma-mapping API and the IOMMU API come from the following threads:
http://www.spinics.net/lists/linux-media/msg31453.html
http://www.spinics.net/lists/arm-kernel/msg122552.html
http://www.spinics.net/lists/arm-kernel/msg124416.html
They were also discussed at the Linaro memory management meeting at UDS (Budapest, 9-12 May).
I've finally managed to clean up my work a bit and present the initial, very proof-of-concept version of the patches that were ready just before the Linaro meeting.
What has been implemented:
1. Introduced arm_dma_ops
dma_map_ops from include/linux/dma-mapping.h suffers from the following limitations:
- lack of start address for sync operations
- lack of write-combine methods
- lack of mmap to user-space methods
- lack of map_single method
For the initial version I've decided to use a custom arm_dma_ops structure. Extending the common interface will take time, and until then I wanted to have something already working.
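
The core of the change is a per-device ops pointer with a global fallback; the dispatch helper from the patch below boils down to this:

static inline struct arm_dma_map_ops *get_dma_ops(struct device *dev)
{
        /* per-device ops (e.g. IOMMU-backed ones) override the global default */
        if (dev->archdata.dma_ops)
                return dev->archdata.dma_ops;
        return &dma_ops;
}

All the dma_* inline wrappers simply look up the ops with get_dma_ops() and call the corresponding method.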
dma_{alloc,free,mmap}_{coherent,writecombine} have been consolidated into dma_{alloc,free,mmap}_attrs, as suggested at the Linaro meeting. A new attribute for WRITE_COMBINE memory has been introduced.
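
For illustration, a write-combine allocation now boils down to setting an attribute before calling the common allocator; the sketch below mirrors the inline wrapper added in the first patch (the function name is just an example):

static void *example_alloc_writecombine(struct device *dev, size_t size,
                                        dma_addr_t *dma_handle, gfp_t gfp)
{
        DEFINE_DMA_ATTRS(attrs);

        /* request write-combined memory instead of fully uncached memory */
        dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);

        /* one alloc_attrs method now covers both coherent and WC allocations;
         * the real wrapper additionally registers the buffer with dma-debug */
        return get_dma_ops(dev)->alloc_attrs(dev, size, dma_handle, gfp, &attrs);
}

The old dma_alloc_writecombine() entry point is kept as a thin wrapper like this one, so existing drivers do not need to be converted right away.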
2. Moved all inline ARM dma-mapping related operations to arch/arm/mm/dma-mapping.c and turned them into methods of the generic arm_dma_ops structure. The dma-mapping.c code definitely needs cleanup, but this is just a first step.
3. Added very initial IOMMU support. Right now it is limited to dma_alloc_attrs, dma_free_attrs and dma_mmap_attrs. It has been tested with the s5p-fimc driver on the Samsung Exynos4 platform.
4. Adapted the Samsung Exynos4 IOMMU driver to make use of the introduced iommu_dma proposal.
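
The intended platform-side usage looks roughly like the sketch below (the base address and size are purely illustrative); arm_iommu_assign_device() from the second patch installs the IOMMU-backed dma ops for the device, so later dma_alloc_coherent() calls from the driver end up creating IOMMU mappings:

static int __init example_assign_fimc_iommu(struct device *dev)
{
        /* reserve a 128 MiB IO virtual address range for this device */
        return arm_iommu_assign_device(dev, 0x20000000, SZ_128M);
}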
This patch series contains only the patches for the common dma-mapping part. There is also a patch that adds a driver for the Samsung IOMMU controller on the Exynos4 platform. All required patches are available at:
git://git.infradead.org/users/kmpark/linux-2.6-samsung dma-mapping branch
Git web interface: http://git.infradead.org/users/kmpark/linux-2.6-samsung/shortlog/refs/heads/...
Future:
1. Add all missing operations for IOMMU mappings (map_single/page/sg, sync_*)
2. Move the sync_* operations into separate functions for better code sharing between the IOMMU and non-IOMMU dma-mapping code
3. Split the dmabounce code out from the non-bounce code into a separate set of dma methods. Right now the dmabounce code is compiled conditionally and spread over arch/arm/mm/dma-mapping.c and arch/arm/common/dmabounce.c.
4. Merge dma_map_single with dma_map_page (see the sketch after this list). I haven't investigated deeply why they have separate implementations on ARM. If separate implementations are a requirement, then dma_map_ops needs to be extended with another method.
5. Fix dma_alloc to unmap the allocated buffer from the kernel linear mapping.
6. Convert the IO address space management code from genalloc to a simpler bitmap-based solution.
7. Resolve issues that might arise during discussion & review.
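
For item 4, one possible direction (just a sketch of an idea, not something implemented in these patches) would be to express the single-buffer mapping in terms of the page-based method, which only works for lowmem buffers:

static inline dma_addr_t example_map_single(struct device *dev, void *ptr,
                                            size_t size,
                                            enum dma_data_direction dir)
{
        struct arm_dma_map_ops *ops = get_dma_ops(dev);

        /* lowmem only: the buffer must be covered by a struct page */
        return ops->map_page(dev, virt_to_page(ptr),
                             (unsigned long)ptr & ~PAGE_MASK, size, dir);
}

Whether dmabounce or any platform really needs the kernel virtual address here is exactly the open question mentioned above.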
Please note that this is a very early version of the patches, definitely NOT intended for merging. I just wanted to make sure that the direction is right and to share the code with others who might want to cooperate on dma-mapping improvements.
Best regards
This is the initial step in the ARM DMA-mapping redesign. All calls have been moved into separate methods of the arm_dma_ops structure, which can be set separately for a particular device.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/include/asm/device.h      |   1 +
 arch/arm/include/asm/dma-mapping.h | 653 +++++++++++++-----------------------
 arch/arm/mm/dma-mapping.c          | 491 ++++++++++++++++++++++++----
 arch/arm/mm/vmregion.h             |   2 +-
 include/linux/dma-attrs.h          |   1 +
 5 files changed, 674 insertions(+), 474 deletions(-)
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h index 9f390ce..005791a 100644 --- a/arch/arm/include/asm/device.h +++ b/arch/arm/include/asm/device.h @@ -7,6 +7,7 @@ #define ASMARM_DEVICE_H
struct dev_archdata { + struct arm_dma_map_ops *dma_ops; #ifdef CONFIG_DMABOUNCE struct dmabounce_device_info *dmabounce; #endif diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h index 4fff837..42f4625 100644 --- a/arch/arm/include/asm/dma-mapping.h +++ b/arch/arm/include/asm/dma-mapping.h @@ -6,155 +6,219 @@ #include <linux/mm_types.h> #include <linux/scatterlist.h> #include <linux/dma-debug.h> +#include <linux/kmemcheck.h>
#include <asm-generic/dma-coherent.h> #include <asm/memory.h>
-#ifdef __arch_page_to_dma -#error Please update to __arch_pfn_to_dma -#endif - -/* - * dma_to_pfn/pfn_to_dma/dma_to_virt/virt_to_dma are architecture private - * functions used internally by the DMA-mapping API to provide DMA - * addresses. They must not be used by drivers. - */ -#ifndef __arch_pfn_to_dma -static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn) +#define DMA_ERROR_CODE (~(dma_addr_t)0x0) + +struct arm_dma_map_ops { + void *(*alloc_attrs)(struct device *, size_t, dma_addr_t *, gfp_t, + struct dma_attrs *attrs); + void (*free_attrs)(struct device *, size_t, void *, dma_addr_t, + struct dma_attrs *attrs); + int (*mmap_attrs)(struct device *, struct vm_area_struct *, + void *, dma_addr_t, size_t, struct dma_attrs *attrs); + + /* map single and map page might be merged together */ + dma_addr_t (*map_single)(struct device *dev, void *cpu_addr, + size_t size, enum dma_data_direction dir); + void (*unmap_single)(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir); + dma_addr_t (*map_page)(struct device *dev, struct page *page, + unsigned long offset, size_t size, enum dma_data_direction dir); + void (*unmap_page)(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir); + + int (*map_sg)(struct device *, struct scatterlist *, int, + enum dma_data_direction); + void (*unmap_sg)(struct device *, struct scatterlist *, int, + enum dma_data_direction); + + void (*sync_single_for_device)(struct device *dev, + dma_addr_t handle, unsigned long offset, size_t size, + enum dma_data_direction dir); + void (*sync_single_for_cpu)(struct device *dev, + dma_addr_t handle, unsigned long offset, size_t size, + enum dma_data_direction dir); + + void (*sync_sg_for_cpu)(struct device *, struct scatterlist *, int, + enum dma_data_direction); + void (*sync_sg_for_device)(struct device *, struct scatterlist *, int, + enum dma_data_direction); + +}; + +extern struct arm_dma_map_ops dma_ops; + +static inline struct arm_dma_map_ops *get_dma_ops(struct device *dev) { - return (dma_addr_t)__pfn_to_bus(pfn); + if (dev->archdata.dma_ops) + return dev->archdata.dma_ops; + return &dma_ops; }
-static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr) +static inline void set_dma_ops(struct device *dev, struct arm_dma_map_ops *ops) { - return __bus_to_pfn(addr); + dev->archdata.dma_ops = ops; }
-static inline void *dma_to_virt(struct device *dev, dma_addr_t addr) -{ - return (void *)__bus_to_virt(addr); -} +/********************************/
-static inline dma_addr_t virt_to_dma(struct device *dev, void *addr) +static inline void *dma_alloc_coherent(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t flag) { - return (dma_addr_t)__virt_to_bus((unsigned long)(addr)); + struct arm_dma_map_ops *ops = get_dma_ops(dev); + void *cpu_addr; + + BUG_ON(!ops); + + cpu_addr = ops->alloc_attrs(dev, size, dma_handle, flag, NULL); + debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); + return cpu_addr; } -#else -static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn) + +static inline void dma_free_coherent(struct device *dev, size_t size, + void *cpu_addr, dma_addr_t dma_handle) { - return __arch_pfn_to_dma(dev, pfn); + struct arm_dma_map_ops *ops = get_dma_ops(dev); + + BUG_ON(!ops); + + debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); + ops->free_attrs(dev, size, cpu_addr, dma_handle, NULL); }
-static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr) +static inline void *dma_alloc_writecombine(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t flag) { - return __arch_dma_to_pfn(dev, addr); + DEFINE_DMA_ATTRS(attrs); + struct arm_dma_map_ops *ops = get_dma_ops(dev); + void *cpu_addr; + dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs); + BUG_ON(!ops); + + cpu_addr = ops->alloc_attrs(dev, size, dma_handle, flag, &attrs); + debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); + return cpu_addr; }
-static inline void *dma_to_virt(struct device *dev, dma_addr_t addr) +static inline void dma_free_writecombine(struct device *dev, size_t size, + void *cpu_addr, dma_addr_t dma_handle) { - return __arch_dma_to_virt(dev, addr); + DEFINE_DMA_ATTRS(attrs); + struct arm_dma_map_ops *ops = get_dma_ops(dev); + dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs); + + BUG_ON(!ops); + + + debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); + ops->free_attrs(dev, size, cpu_addr, dma_handle, &attrs); }
-static inline dma_addr_t virt_to_dma(struct device *dev, void *addr) +static inline dma_addr_t dma_map_single(struct device *dev, void *ptr, + size_t size, + enum dma_data_direction dir) { - return __arch_virt_to_dma(dev, addr); + struct arm_dma_map_ops *ops = get_dma_ops(dev); + dma_addr_t addr; + + kmemcheck_mark_initialized(ptr, size); + BUG_ON(!valid_dma_direction(dir)); + addr = ops->map_single(dev, ptr, size, dir); + debug_dma_map_page(dev, virt_to_page(ptr), + (unsigned long)ptr & ~PAGE_MASK, size, + dir, addr, true); + return addr; } -#endif
-/* - * The DMA API is built upon the notion of "buffer ownership". A buffer - * is either exclusively owned by the CPU (and therefore may be accessed - * by it) or exclusively owned by the DMA device. These helper functions - * represent the transitions between these two ownership states. - * - * Note, however, that on later ARMs, this notion does not work due to - * speculative prefetches. We model our approach on the assumption that - * the CPU does do speculative prefetches, which means we clean caches - * before transfers and delay cache invalidation until transfer completion. - * - * Private support functions: these are not part of the API and are - * liable to change. Drivers must not use these. - */ -static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size, - enum dma_data_direction dir) +static inline void dma_unmap_single(struct device *dev, dma_addr_t addr, + size_t size, + enum dma_data_direction dir) { - extern void ___dma_single_cpu_to_dev(const void *, size_t, - enum dma_data_direction); + struct arm_dma_map_ops *ops = get_dma_ops(dev);
- if (!arch_is_coherent()) - ___dma_single_cpu_to_dev(kaddr, size, dir); + BUG_ON(!valid_dma_direction(dir)); + if (ops->unmap_single) + ops->unmap_single(dev, addr, size, dir); + debug_dma_unmap_page(dev, addr, size, dir, true); }
-static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size, - enum dma_data_direction dir) +static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, + size_t offset, size_t size, + enum dma_data_direction dir) { - extern void ___dma_single_dev_to_cpu(const void *, size_t, - enum dma_data_direction); + struct arm_dma_map_ops *ops = get_dma_ops(dev); + dma_addr_t addr; + + kmemcheck_mark_initialized(page_address(page) + offset, size); + BUG_ON(!valid_dma_direction(dir)); + addr = ops->map_page(dev, page, offset, size, dir); + debug_dma_map_page(dev, page, offset, size, dir, addr, false);
- if (!arch_is_coherent()) - ___dma_single_dev_to_cpu(kaddr, size, dir); + return addr; }
-static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +static inline void dma_unmap_page(struct device *dev, dma_addr_t addr, + size_t size, enum dma_data_direction dir) { - extern void ___dma_page_cpu_to_dev(struct page *, unsigned long, - size_t, enum dma_data_direction); + struct arm_dma_map_ops *ops = get_dma_ops(dev);
- if (!arch_is_coherent()) - ___dma_page_cpu_to_dev(page, off, size, dir); + BUG_ON(!valid_dma_direction(dir)); + if (ops->unmap_page) + ops->unmap_page(dev, addr, size, dir); + debug_dma_unmap_page(dev, addr, size, dir, false); }
-static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) +static inline void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr, + size_t size, + enum dma_data_direction dir) { - extern void ___dma_page_dev_to_cpu(struct page *, unsigned long, - size_t, enum dma_data_direction); + struct arm_dma_map_ops *ops = get_dma_ops(dev);
- if (!arch_is_coherent()) - ___dma_page_dev_to_cpu(page, off, size, dir); + BUG_ON(!valid_dma_direction(dir)); + if (ops->sync_single_for_cpu) + ops->sync_single_for_cpu(dev, addr, 0, size, dir); + debug_dma_sync_single_for_cpu(dev, addr, size, dir); }
-/* - * Return whether the given device DMA address mask can be supported - * properly. For example, if your device can only drive the low 24-bits - * during bus mastering, then you would pass 0x00ffffff as the mask - * to this function. - * - * FIXME: This should really be a platform specific issue - we should - * return false if GFP_DMA allocations may not satisfy the supplied 'mask'. - */ -static inline int dma_supported(struct device *dev, u64 mask) +static inline void dma_sync_single_for_device(struct device *dev, + dma_addr_t addr, size_t size, + enum dma_data_direction dir) { - if (mask < ISA_DMA_THRESHOLD) - return 0; - return 1; + struct arm_dma_map_ops *ops = get_dma_ops(dev); + + BUG_ON(!valid_dma_direction(dir)); + if (ops->sync_single_for_device) + ops->sync_single_for_device(dev, addr, 0, size, dir); + debug_dma_sync_single_for_device(dev, addr, size, dir); }
-static inline int dma_set_mask(struct device *dev, u64 dma_mask) +static inline void dma_sync_single_range_for_cpu(struct device *dev, dma_addr_t addr, + unsigned long offset, size_t size, + enum dma_data_direction dir) { -#ifdef CONFIG_DMABOUNCE - if (dev->archdata.dmabounce) { - if (dma_mask >= ISA_DMA_THRESHOLD) - return 0; - else - return -EIO; - } -#endif - if (!dev->dma_mask || !dma_supported(dev, dma_mask)) - return -EIO; + struct arm_dma_map_ops *ops = get_dma_ops(dev);
- *dev->dma_mask = dma_mask; - - return 0; + BUG_ON(!valid_dma_direction(dir)); + if (ops->sync_single_for_cpu) + ops->sync_single_for_cpu(dev, addr, offset, size, dir); + debug_dma_sync_single_for_cpu(dev, addr, size, dir); }
-/* - * DMA errors are defined by all-bits-set in the DMA address. - */ -static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) +static inline void dma_sync_single_range_for_device(struct device *dev, + dma_addr_t addr, + unsigned long offset, size_t size, + enum dma_data_direction dir) { - return dma_addr == ~0; + struct arm_dma_map_ops *ops = get_dma_ops(dev); + + BUG_ON(!valid_dma_direction(dir)); + if (ops->sync_single_for_device) + ops->sync_single_for_device(dev, addr, offset, size, dir); + debug_dma_sync_single_for_device(dev, addr, size, dir); }
/* @@ -172,358 +236,117 @@ static inline void dma_free_noncoherent(struct device *dev, size_t size, { }
-/** - * dma_alloc_coherent - allocate consistent memory for DMA - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @size: required memory size - * @handle: bus-specific DMA address - * - * Allocate some uncached, unbuffered memory for a device for - * performing DMA. This function allocates pages, and will - * return the CPU-viewed address, and sets @handle to be the - * device-viewed address. - */ -extern void *dma_alloc_coherent(struct device *, size_t, dma_addr_t *, gfp_t); - -/** - * dma_free_coherent - free memory allocated by dma_alloc_coherent - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @size: size of memory originally requested in dma_alloc_coherent - * @cpu_addr: CPU-view address returned from dma_alloc_coherent - * @handle: device-view address returned from dma_alloc_coherent - * - * Free (and unmap) a DMA buffer previously allocated by - * dma_alloc_coherent(). - * - * References to memory and mappings associated with cpu_addr/handle - * during and after this call executing are illegal. - */ -extern void dma_free_coherent(struct device *, size_t, void *, dma_addr_t); - -/** - * dma_mmap_coherent - map a coherent DMA allocation into user space - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @vma: vm_area_struct describing requested user mapping - * @cpu_addr: kernel CPU-view address returned from dma_alloc_coherent - * @handle: device-view address returned from dma_alloc_coherent - * @size: size of memory originally requested in dma_alloc_coherent - * - * Map a coherent DMA buffer previously allocated by dma_alloc_coherent - * into user space. The coherent DMA buffer must not be freed by the - * driver until the user space mapping has been released. - */ -int dma_mmap_coherent(struct device *, struct vm_area_struct *, - void *, dma_addr_t, size_t); - - -/** - * dma_alloc_writecombine - allocate writecombining memory for DMA - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @size: required memory size - * @handle: bus-specific DMA address - * - * Allocate some uncached, buffered memory for a device for - * performing DMA. This function allocates pages, and will - * return the CPU-viewed address, and sets @handle to be the - * device-viewed address. - */ -extern void *dma_alloc_writecombine(struct device *, size_t, dma_addr_t *, - gfp_t); - -#define dma_free_writecombine(dev,size,cpu_addr,handle) \ - dma_free_coherent(dev,size,cpu_addr,handle) - -int dma_mmap_writecombine(struct device *, struct vm_area_struct *, - void *, dma_addr_t, size_t); - - -#ifdef CONFIG_DMABOUNCE -/* - * For SA-1111, IXP425, and ADI systems the dma-mapping functions are "magic" - * and utilize bounce buffers as needed to work around limited DMA windows. - * - * On the SA-1111, a bug limits DMA to only certain regions of RAM. - * On the IXP425, the PCI inbound window is 64MB (256MB total RAM) - * On some ADI engineering systems, PCI inbound window is 32MB (12MB total RAM) - * - * The following are helper functions used by the dmabounce subystem - * - */ - -/** - * dmabounce_register_dev - * - * @dev: valid struct device pointer - * @small_buf_size: size of buffers to use with small buffer pool - * @large_buf_size: size of buffers to use with large buffer pool (can be 0) - * - * This function should be called by low-level platform code to register - * a device as requireing DMA buffer bouncing. The function will allocate - * appropriate DMA pools for the device. 
- * - */ -extern int dmabounce_register_dev(struct device *, unsigned long, - unsigned long); - -/** - * dmabounce_unregister_dev - * - * @dev: valid struct device pointer - * - * This function should be called by low-level platform code when device - * that was previously registered with dmabounce_register_dev is removed - * from the system. - * - */ -extern void dmabounce_unregister_dev(struct device *); - -/** - * dma_needs_bounce - * - * @dev: valid struct device pointer - * @dma_handle: dma_handle of unbounced buffer - * @size: size of region being mapped - * - * Platforms that utilize the dmabounce mechanism must implement - * this function. - * - * The dmabounce routines call this function whenever a dma-mapping - * is requested to determine whether a given buffer needs to be bounced - * or not. The function must return 0 if the buffer is OK for - * DMA access and 1 if the buffer needs to be bounced. - * - */ -extern int dma_needs_bounce(struct device*, dma_addr_t, size_t); - -/* - * The DMA API, implemented by dmabounce.c. See below for descriptions. - */ -extern dma_addr_t __dma_map_single(struct device *, void *, size_t, - enum dma_data_direction); -extern void __dma_unmap_single(struct device *, dma_addr_t, size_t, - enum dma_data_direction); -extern dma_addr_t __dma_map_page(struct device *, struct page *, - unsigned long, size_t, enum dma_data_direction); -extern void __dma_unmap_page(struct device *, dma_addr_t, size_t, - enum dma_data_direction); - -/* - * Private functions - */ -int dmabounce_sync_for_cpu(struct device *, dma_addr_t, unsigned long, - size_t, enum dma_data_direction); -int dmabounce_sync_for_device(struct device *, dma_addr_t, unsigned long, - size_t, enum dma_data_direction); -#else -static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr, - unsigned long offset, size_t size, enum dma_data_direction dir) +static inline int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size) { - return 1; + struct arm_dma_map_ops *ops = get_dma_ops(dev); + return ops->mmap_attrs(dev, vma, cpu_addr, dma_addr, size, NULL); }
-static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr, - unsigned long offset, size_t size, enum dma_data_direction dir) +static inline int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size) { - return 1; + DEFINE_DMA_ATTRS(attrs); + struct arm_dma_map_ops *ops = get_dma_ops(dev); + dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs); + return ops->mmap_attrs(dev, vma, cpu_addr, dma_addr, size, &attrs); }
- -static inline dma_addr_t __dma_map_single(struct device *dev, void *cpu_addr, - size_t size, enum dma_data_direction dir) +static inline int dma_map_sg(struct device *dev, struct scatterlist *sg, + int nents, enum dma_data_direction dir) { - __dma_single_cpu_to_dev(cpu_addr, size, dir); - return virt_to_dma(dev, cpu_addr); -} + struct arm_dma_map_ops *ops = get_dma_ops(dev); + int i, ents; + struct scatterlist *s;
-static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page, - unsigned long offset, size_t size, enum dma_data_direction dir) -{ - __dma_page_cpu_to_dev(page, offset, size, dir); - return pfn_to_dma(dev, page_to_pfn(page)) + offset; -} + for_each_sg(sg, s, nents, i) + kmemcheck_mark_initialized(sg_virt(s), s->length); + BUG_ON(!valid_dma_direction(dir)); + ents = ops->map_sg(dev, sg, nents, dir); + debug_dma_map_sg(dev, sg, nents, ents, dir);
-static inline void __dma_unmap_single(struct device *dev, dma_addr_t handle, - size_t size, enum dma_data_direction dir) -{ - __dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir); + return ents; }
-static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle, - size_t size, enum dma_data_direction dir) +static inline void dma_unmap_sg(struct device *dev, struct scatterlist *sg, + int nents, enum dma_data_direction dir) { - __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)), - handle & ~PAGE_MASK, size, dir); -} -#endif /* CONFIG_DMABOUNCE */ - -/** - * dma_map_single - map a single buffer for streaming DMA - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @cpu_addr: CPU direct mapped address of buffer - * @size: size of buffer to map - * @dir: DMA transfer direction - * - * Ensure that any data held in the cache is appropriately discarded - * or written back. - * - * The device owns this memory once this call has completed. The CPU - * can regain ownership by calling dma_unmap_single() or - * dma_sync_single_for_cpu(). - */ -static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr, - size_t size, enum dma_data_direction dir) -{ - dma_addr_t addr; + struct arm_dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir)); - - addr = __dma_map_single(dev, cpu_addr, size, dir); - debug_dma_map_page(dev, virt_to_page(cpu_addr), - (unsigned long)cpu_addr & ~PAGE_MASK, size, - dir, addr, true); - - return addr; + debug_dma_unmap_sg(dev, sg, nents, dir); + if (ops->unmap_sg) + ops->unmap_sg(dev, sg, nents, dir); }
-/** - * dma_map_page - map a portion of a page for streaming DMA - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @page: page that buffer resides in - * @offset: offset into page for start of buffer - * @size: size of buffer to map - * @dir: DMA transfer direction - * - * Ensure that any data held in the cache is appropriately discarded - * or written back. - * - * The device owns this memory once this call has completed. The CPU - * can regain ownership by calling dma_unmap_page(). - */ -static inline dma_addr_t dma_map_page(struct device *dev, struct page *page, - unsigned long offset, size_t size, enum dma_data_direction dir) +static inline void +dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, + int nelems, enum dma_data_direction dir) { - dma_addr_t addr; + struct arm_dma_map_ops *ops = get_dma_ops(dev);
BUG_ON(!valid_dma_direction(dir)); - - addr = __dma_map_page(dev, page, offset, size, dir); - debug_dma_map_page(dev, page, offset, size, dir, addr, false); - - return addr; + if (ops->sync_sg_for_cpu) + ops->sync_sg_for_cpu(dev, sg, nelems, dir); + debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir); }
-/** - * dma_unmap_single - unmap a single buffer previously mapped - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @handle: DMA address of buffer - * @size: size of buffer (same as passed to dma_map_single) - * @dir: DMA transfer direction (same as passed to dma_map_single) - * - * Unmap a single streaming mode DMA translation. The handle and size - * must match what was provided in the previous dma_map_single() call. - * All other usages are undefined. - * - * After this call, reads by the CPU to the buffer are guaranteed to see - * whatever the device wrote there. - */ -static inline void dma_unmap_single(struct device *dev, dma_addr_t handle, - size_t size, enum dma_data_direction dir) +static inline void +dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, + int nelems, enum dma_data_direction dir) { - debug_dma_unmap_page(dev, handle, size, dir, true); - __dma_unmap_single(dev, handle, size, dir); -} + struct arm_dma_map_ops *ops = get_dma_ops(dev); + + BUG_ON(!valid_dma_direction(dir)); + if (ops->sync_sg_for_device) + ops->sync_sg_for_device(dev, sg, nelems, dir); + debug_dma_sync_sg_for_device(dev, sg, nelems, dir);
-/** - * dma_unmap_page - unmap a buffer previously mapped through dma_map_page() - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @handle: DMA address of buffer - * @size: size of buffer (same as passed to dma_map_page) - * @dir: DMA transfer direction (same as passed to dma_map_page) - * - * Unmap a page streaming mode DMA translation. The handle and size - * must match what was provided in the previous dma_map_page() call. - * All other usages are undefined. - * - * After this call, reads by the CPU to the buffer are guaranteed to see - * whatever the device wrote there. - */ -static inline void dma_unmap_page(struct device *dev, dma_addr_t handle, - size_t size, enum dma_data_direction dir) -{ - debug_dma_unmap_page(dev, handle, size, dir, false); - __dma_unmap_page(dev, handle, size, dir); }
-/** - * dma_sync_single_range_for_cpu - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @handle: DMA address of buffer - * @offset: offset of region to start sync - * @size: size of region to sync - * @dir: DMA transfer direction (same as passed to dma_map_single) - * - * Make physical memory consistent for a single streaming mode DMA - * translation after a transfer. +/********************************/ + +/* + * Return whether the given device DMA address mask can be supported + * properly. For example, if your device can only drive the low 24-bits + * during bus mastering, then you would pass 0x00ffffff as the mask + * to this function. * - * If you perform a dma_map_single() but wish to interrogate the - * buffer using the cpu, yet do not wish to teardown the PCI dma - * mapping, you must call this function before doing so. At the - * next point you give the PCI dma address back to the card, you - * must first the perform a dma_sync_for_device, and then the - * device again owns the buffer. + * FIXME: This should really be a platform specific issue - we should + * return false if GFP_DMA allocations may not satisfy the supplied 'mask'. */ -static inline void dma_sync_single_range_for_cpu(struct device *dev, - dma_addr_t handle, unsigned long offset, size_t size, - enum dma_data_direction dir) +static inline int dma_supported(struct device *dev, u64 mask) { - BUG_ON(!valid_dma_direction(dir)); - - debug_dma_sync_single_for_cpu(dev, handle + offset, size, dir); - - if (!dmabounce_sync_for_cpu(dev, handle, offset, size, dir)) - return; - - __dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir); + if (mask < ISA_DMA_THRESHOLD) + return 0; + return 1; }
-static inline void dma_sync_single_range_for_device(struct device *dev, - dma_addr_t handle, unsigned long offset, size_t size, - enum dma_data_direction dir) +static inline int dma_set_mask(struct device *dev, u64 dma_mask) { - BUG_ON(!valid_dma_direction(dir)); - - debug_dma_sync_single_for_device(dev, handle + offset, size, dir); - - if (!dmabounce_sync_for_device(dev, handle, offset, size, dir)) - return; - - __dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir); -} +#ifdef CONFIG_DMABOUNCE + if (dev->archdata.dmabounce) { + if (dma_mask >= ISA_DMA_THRESHOLD) + return 0; + else + return -EIO; + } +#endif + if (!dev->dma_mask || !dma_supported(dev, dma_mask)) + return -EIO;
-static inline void dma_sync_single_for_cpu(struct device *dev, - dma_addr_t handle, size_t size, enum dma_data_direction dir) -{ - dma_sync_single_range_for_cpu(dev, handle, 0, size, dir); -} + *dev->dma_mask = dma_mask;
-static inline void dma_sync_single_for_device(struct device *dev, - dma_addr_t handle, size_t size, enum dma_data_direction dir) -{ - dma_sync_single_range_for_device(dev, handle, 0, size, dir); + return 0; }
/* - * The scatter list versions of the above methods. + * DMA errors are defined by all-bits-set in the DMA address. */ -extern int dma_map_sg(struct device *, struct scatterlist *, int, - enum dma_data_direction); -extern void dma_unmap_sg(struct device *, struct scatterlist *, int, - enum dma_data_direction); -extern void dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int, - enum dma_data_direction); -extern void dma_sync_sg_for_device(struct device *, struct scatterlist *, int, - enum dma_data_direction); - +static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) +{ + return dma_addr == ~0; +}
#endif /* __KERNEL__ */ #endif diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 82a093c..f8c6972 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -18,6 +18,7 @@ #include <linux/device.h> #include <linux/dma-mapping.h> #include <linux/highmem.h> +#include <linux/slab.h>
#include <asm/memory.h> #include <asm/highmem.h> @@ -25,6 +26,389 @@ #include <asm/tlbflush.h> #include <asm/sizes.h>
+#ifdef __arch_page_to_dma +#error Please update to __arch_pfn_to_dma +#endif + +/* + * dma_to_pfn/pfn_to_dma/dma_to_virt/virt_to_dma are architecture private + * functions used internally by the DMA-mapping API to provide DMA + * addresses. They must not be used by drivers. + */ +#ifndef __arch_pfn_to_dma +static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn) +{ + return (dma_addr_t)__pfn_to_bus(pfn); +} + +static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr) +{ + return __bus_to_pfn(addr); +} + +static inline void *dma_to_virt(struct device *dev, dma_addr_t addr) +{ + return (void *)__bus_to_virt(addr); +} + +static inline dma_addr_t virt_to_dma(struct device *dev, void *addr) +{ + return (dma_addr_t)__virt_to_bus((unsigned long)(addr)); +} +#else +static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn) +{ + return __arch_pfn_to_dma(dev, pfn); +} + +static inline unsigned long dma_to_pfn(struct device *dev, dma_addr_t addr) +{ + return __arch_dma_to_pfn(dev, addr); +} + +static inline void *dma_to_virt(struct device *dev, dma_addr_t addr) +{ + return __arch_dma_to_virt(dev, addr); +} + +static inline dma_addr_t virt_to_dma(struct device *dev, void *addr) +{ + return __arch_virt_to_dma(dev, addr); +} +#endif + +/* + * The DMA API is built upon the notion of "buffer ownership". A buffer + * is either exclusively owned by the CPU (and therefore may be accessed + * by it) or exclusively owned by the DMA device. These helper functions + * represent the transitions between these two ownership states. + * + * Note, however, that on later ARMs, this notion does not work due to + * speculative prefetches. We model our approach on the assumption that + * the CPU does do speculative prefetches, which means we clean caches + * before transfers and delay cache invalidation until transfer completion. + * + * Private support functions: these are not part of the API and are + * liable to change. Drivers must not use these. + */ +static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size, + enum dma_data_direction dir) +{ + extern void ___dma_single_cpu_to_dev(const void *, size_t, + enum dma_data_direction); + + if (!arch_is_coherent()) + ___dma_single_cpu_to_dev(kaddr, size, dir); +} + +static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size, + enum dma_data_direction dir) +{ + extern void ___dma_single_dev_to_cpu(const void *, size_t, + enum dma_data_direction); + + if (!arch_is_coherent()) + ___dma_single_dev_to_cpu(kaddr, size, dir); +} + +static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off, + size_t size, enum dma_data_direction dir) +{ + extern void ___dma_page_cpu_to_dev(struct page *, unsigned long, + size_t, enum dma_data_direction); + + if (!arch_is_coherent()) + ___dma_page_cpu_to_dev(page, off, size, dir); +} + +static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off, + size_t size, enum dma_data_direction dir) +{ + extern void ___dma_page_dev_to_cpu(struct page *, unsigned long, + size_t, enum dma_data_direction); + + if (!arch_is_coherent()) + ___dma_page_dev_to_cpu(page, off, size, dir); +} + + +#ifdef CONFIG_DMABOUNCE +/* + * For SA-1111, IXP425, and ADI systems the dma-mapping functions are "magic" + * and utilize bounce buffers as needed to work around limited DMA windows. + * + * On the SA-1111, a bug limits DMA to only certain regions of RAM. 
+ * On the IXP425, the PCI inbound window is 64MB (256MB total RAM) + * On some ADI engineering systems, PCI inbound window is 32MB (12MB total RAM) + * + * The following are helper functions used by the dmabounce subystem + * + */ + +/** + * dmabounce_register_dev + * + * @dev: valid struct device pointer + * @small_buf_size: size of buffers to use with small buffer pool + * @large_buf_size: size of buffers to use with large buffer pool (can be 0) + * + * This function should be called by low-level platform code to register + * a device as requireing DMA buffer bouncing. The function will allocate + * appropriate DMA pools for the device. + * + */ +extern int dmabounce_register_dev(struct device *, unsigned long, + unsigned long); + +/** + * dmabounce_unregister_dev + * + * @dev: valid struct device pointer + * + * This function should be called by low-level platform code when device + * that was previously registered with dmabounce_register_dev is removed + * from the system. + * + */ +extern void dmabounce_unregister_dev(struct device *); + +/** + * dma_needs_bounce + * + * @dev: valid struct device pointer + * @dma_handle: dma_handle of unbounced buffer + * @size: size of region being mapped + * + * Platforms that utilize the dmabounce mechanism must implement + * this function. + * + * The dmabounce routines call this function whenever a dma-mapping + * is requested to determine whether a given buffer needs to be bounced + * or not. The function must return 0 if the buffer is OK for + * DMA access and 1 if the buffer needs to be bounced. + * + */ +extern int dma_needs_bounce(struct device*, dma_addr_t, size_t); + +/* + * The DMA API, implemented by dmabounce.c. See below for descriptions. + */ +extern dma_addr_t __dma_map_single(struct device *, void *, size_t, + enum dma_data_direction); +extern void __dma_unmap_single(struct device *, dma_addr_t, size_t, + enum dma_data_direction); +extern dma_addr_t __dma_map_page(struct device *, struct page *, + unsigned long, size_t, enum dma_data_direction); +extern void __dma_unmap_page(struct device *, dma_addr_t, size_t, + enum dma_data_direction); + +/* + * Private functions + */ +int dmabounce_sync_for_cpu(struct device *, dma_addr_t, unsigned long, + size_t, enum dma_data_direction); +int dmabounce_sync_for_device(struct device *, dma_addr_t, unsigned long, + size_t, enum dma_data_direction); +#else +static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr, + unsigned long offset, size_t size, enum dma_data_direction dir) +{ + return 1; +} + +static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr, + unsigned long offset, size_t size, enum dma_data_direction dir) +{ + return 1; +} + + +static inline dma_addr_t __dma_map_single(struct device *dev, void *cpu_addr, + size_t size, enum dma_data_direction dir) +{ + __dma_single_cpu_to_dev(cpu_addr, size, dir); + return virt_to_dma(dev, cpu_addr); +} + +static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, enum dma_data_direction dir) +{ + __dma_page_cpu_to_dev(page, offset, size, dir); + return pfn_to_dma(dev, page_to_pfn(page)) + offset; +} + +static inline void __dma_unmap_single(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir) +{ + __dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir); +} + +static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir) +{ + 
__dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)), + handle & ~PAGE_MASK, size, dir); +} +#endif /* CONFIG_DMABOUNCE */ + +/** + * dma_map_single - map a single buffer for streaming DMA + * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices + * @cpu_addr: CPU direct mapped address of buffer + * @size: size of buffer to map + * @dir: DMA transfer direction + * + * Ensure that any data held in the cache is appropriately discarded + * or written back. + * + * The device owns this memory once this call has completed. The CPU + * can regain ownership by calling dma_unmap_single() or + * dma_sync_single_for_cpu(). + */ +static inline dma_addr_t arm_dma_map_single(struct device *dev, void *cpu_addr, + size_t size, enum dma_data_direction dir) +{ + dma_addr_t addr; + + BUG_ON(!valid_dma_direction(dir)); + + addr = __dma_map_single(dev, cpu_addr, size, dir); + debug_dma_map_page(dev, virt_to_page(cpu_addr), + (unsigned long)cpu_addr & ~PAGE_MASK, size, + dir, addr, true); + + return addr; +} + +/** + * dma_map_page - map a portion of a page for streaming DMA + * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices + * @page: page that buffer resides in + * @offset: offset into page for start of buffer + * @size: size of buffer to map + * @dir: DMA transfer direction + * + * Ensure that any data held in the cache is appropriately discarded + * or written back. + * + * The device owns this memory once this call has completed. The CPU + * can regain ownership by calling dma_unmap_page(). + */ +static inline dma_addr_t arm_dma_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, enum dma_data_direction dir) +{ + dma_addr_t addr; + + BUG_ON(!valid_dma_direction(dir)); + + addr = __dma_map_page(dev, page, offset, size, dir); + debug_dma_map_page(dev, page, offset, size, dir, addr, false); + + return addr; +} + +/** + * dma_unmap_single - unmap a single buffer previously mapped + * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices + * @handle: DMA address of buffer + * @size: size of buffer (same as passed to dma_map_single) + * @dir: DMA transfer direction (same as passed to dma_map_single) + * + * Unmap a single streaming mode DMA translation. The handle and size + * must match what was provided in the previous dma_map_single() call. + * All other usages are undefined. + * + * After this call, reads by the CPU to the buffer are guaranteed to see + * whatever the device wrote there. + */ +static inline void arm_dma_unmap_single(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir) +{ + debug_dma_unmap_page(dev, handle, size, dir, true); + __dma_unmap_single(dev, handle, size, dir); +} + +/** + * dma_unmap_page - unmap a buffer previously mapped through dma_map_page() + * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices + * @handle: DMA address of buffer + * @size: size of buffer (same as passed to dma_map_page) + * @dir: DMA transfer direction (same as passed to dma_map_page) + * + * Unmap a page streaming mode DMA translation. The handle and size + * must match what was provided in the previous dma_map_page() call. + * All other usages are undefined. + * + * After this call, reads by the CPU to the buffer are guaranteed to see + * whatever the device wrote there. 
+ */ +static inline void arm_dma_unmap_page(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir) +{ + debug_dma_unmap_page(dev, handle, size, dir, false); + __dma_unmap_page(dev, handle, size, dir); +} + +/** + * dma_sync_single_range_for_cpu + * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices + * @handle: DMA address of buffer + * @offset: offset of region to start sync + * @size: size of region to sync + * @dir: DMA transfer direction (same as passed to dma_map_single) + * + * Make physical memory consistent for a single streaming mode DMA + * translation after a transfer. + * + * If you perform a dma_map_single() but wish to interrogate the + * buffer using the cpu, yet do not wish to teardown the PCI dma + * mapping, you must call this function before doing so. At the + * next point you give the PCI dma address back to the card, you + * must first the perform a dma_sync_for_device, and then the + * device again owns the buffer. + */ +static inline void arm_dma_sync_single_range_for_cpu(struct device *dev, + dma_addr_t handle, unsigned long offset, size_t size, + enum dma_data_direction dir) +{ + BUG_ON(!valid_dma_direction(dir)); + + debug_dma_sync_single_for_cpu(dev, handle + offset, size, dir); + + if (!dmabounce_sync_for_cpu(dev, handle, offset, size, dir)) + return; + + __dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir); +} + +static inline void arm_dma_sync_single_range_for_device(struct device *dev, + dma_addr_t handle, unsigned long offset, size_t size, + enum dma_data_direction dir) +{ + BUG_ON(!valid_dma_direction(dir)); + + debug_dma_sync_single_for_device(dev, handle + offset, size, dir); + + if (!dmabounce_sync_for_device(dev, handle, offset, size, dir)) + return; + + __dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir); +} + +static inline void arm_dma_sync_single_for_cpu(struct device *dev, + dma_addr_t handle, unsigned long offset, size_t size, + enum dma_data_direction dir) +{ + dma_sync_single_range_for_cpu(dev, handle, offset, size, dir); +} + +static inline void arm_dma_sync_single_for_device(struct device *dev, + dma_addr_t handle, unsigned long offset, size_t size, + enum dma_data_direction dir) +{ + dma_sync_single_range_for_device(dev, handle, offset, size, dir); +} + static u64 get_coherent_dma_mask(struct device *dev) { u64 mask = ISA_DMA_THRESHOLD; @@ -224,7 +608,7 @@ __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot) u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
pte = consistent_pte[idx] + off; - c->vm_pages = page; + c->priv = page;
do { BUG_ON(!pte_none(*pte)); @@ -330,39 +714,36 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, * Allocate DMA-coherent memory space and return both the kernel remapped * virtual and bus address for that space. */ -void * -dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp) +void *arm_dma_alloc_attrs(struct device *dev, size_t size, dma_addr_t *handle, + gfp_t gfp, struct dma_attrs *attrs) { void *memory;
- if (dma_alloc_from_coherent(dev, size, handle, &memory)) - return memory; - - return __dma_alloc(dev, size, handle, gfp, - pgprot_dmacoherent(pgprot_kernel)); -} -EXPORT_SYMBOL(dma_alloc_coherent); - -/* - * Allocate a writecombining region, in much the same way as - * dma_alloc_coherent above. - */ -void * -dma_alloc_writecombine(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp) -{ - return __dma_alloc(dev, size, handle, gfp, + if (dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs)) { + return __dma_alloc(dev, size, handle, gfp, pgprot_writecombine(pgprot_kernel)); + } else { + if (dma_alloc_from_coherent(dev, size, handle, &memory)) + return memory; + return __dma_alloc(dev, size, handle, gfp, + pgprot_dmacoherent(pgprot_kernel)); + } } -EXPORT_SYMBOL(dma_alloc_writecombine);
-static int dma_mmap(struct device *dev, struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t dma_addr, size_t size) +static int arm_dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + struct dma_attrs *attrs) { - int ret = -ENXIO; -#ifdef CONFIG_MMU unsigned long user_size, kern_size; struct arm_vmregion *c; + int ret = -ENXIO; + + if (dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs)) + vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); + else + vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot);
+#ifdef CONFIG_MMU user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr); @@ -373,8 +754,9 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma,
if (off < kern_size && user_size <= (kern_size - off)) { + struct page *vm_pages = c->priv; ret = remap_pfn_range(vma, vma->vm_start, - page_to_pfn(c->vm_pages) + off, + page_to_pfn(vm_pages) + off, user_size << PAGE_SHIFT, vma->vm_page_prot); } @@ -384,27 +766,12 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma, return ret; }
-int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t dma_addr, size_t size) -{ - vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot); - return dma_mmap(dev, vma, cpu_addr, dma_addr, size); -} -EXPORT_SYMBOL(dma_mmap_coherent); - -int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t dma_addr, size_t size) -{ - vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); - return dma_mmap(dev, vma, cpu_addr, dma_addr, size); -} -EXPORT_SYMBOL(dma_mmap_writecombine); - /* * free a page as defined by the above mapping. * Must not be called with IRQs disabled. */ -void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle) +void arm_dma_free_attrs(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t handle, struct dma_attrs *attrs) { WARN_ON(irqs_disabled());
@@ -418,7 +785,6 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size); } -EXPORT_SYMBOL(dma_free_coherent);
/* * Make an area consistent for devices. @@ -443,7 +809,6 @@ void ___dma_single_cpu_to_dev(const void *kaddr, size_t size, } /* FIXME: non-speculating: flush on bidirectional mappings? */ } -EXPORT_SYMBOL(___dma_single_cpu_to_dev);
void ___dma_single_dev_to_cpu(const void *kaddr, size_t size, enum dma_data_direction dir) @@ -459,7 +824,6 @@ void ___dma_single_dev_to_cpu(const void *kaddr, size_t size,
dmac_unmap_area(kaddr, size, dir); } -EXPORT_SYMBOL(___dma_single_dev_to_cpu);
static void dma_cache_maint_page(struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir, @@ -520,7 +884,6 @@ void ___dma_page_cpu_to_dev(struct page *page, unsigned long off, } /* FIXME: non-speculating: flush on bidirectional mappings? */ } -EXPORT_SYMBOL(___dma_page_cpu_to_dev);
void ___dma_page_dev_to_cpu(struct page *page, unsigned long off, size_t size, enum dma_data_direction dir) @@ -540,10 +903,9 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off, if (dir != DMA_TO_DEVICE && off == 0 && size >= PAGE_SIZE) set_bit(PG_dcache_clean, &page->flags); } -EXPORT_SYMBOL(___dma_page_dev_to_cpu);
/** - * dma_map_sg - map a set of SG buffers for streaming mode DMA + * arm_dma_map_sg - map a set of SG buffers for streaming mode DMA * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @sg: list of buffers * @nents: number of buffers to map @@ -558,7 +920,7 @@ EXPORT_SYMBOL(___dma_page_dev_to_cpu); * Device ownership issues as mentioned for dma_map_single are the same * here. */ -int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, +static int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir) { struct scatterlist *s; @@ -580,10 +942,9 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, __dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir); return 0; } -EXPORT_SYMBOL(dma_map_sg);
/** - * dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg + * arm_dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @sg: list of buffers * @nents: number of buffers to unmap (same as was passed to dma_map_sg) @@ -592,7 +953,7 @@ EXPORT_SYMBOL(dma_map_sg); * Unmap a set of streaming mode DMA translations. Again, CPU access * rules concerning calls here are the same as for dma_unmap_single(). */ -void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, +static void arm_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir) { struct scatterlist *s; @@ -603,7 +964,6 @@ void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, for_each_sg(sg, s, nents, i) __dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir); } -EXPORT_SYMBOL(dma_unmap_sg);
/** * dma_sync_sg_for_cpu @@ -612,7 +972,7 @@ EXPORT_SYMBOL(dma_unmap_sg); * @nents: number of buffers to map (returned from dma_map_sg) * @dir: DMA transfer direction (same as was passed to dma_map_sg) */ -void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, +static void arm_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir) { struct scatterlist *s; @@ -629,16 +989,15 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
debug_dma_sync_sg_for_cpu(dev, sg, nents, dir); } -EXPORT_SYMBOL(dma_sync_sg_for_cpu);
/** - * dma_sync_sg_for_device + * arm_dma_sync_sg_for_device * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @sg: list of buffers * @nents: number of buffers to map (returned from dma_map_sg) * @dir: DMA transfer direction (same as was passed to dma_map_sg) */ -void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, +static void arm_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir) { struct scatterlist *s; @@ -655,7 +1014,23 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
debug_dma_sync_sg_for_device(dev, sg, nents, dir); } -EXPORT_SYMBOL(dma_sync_sg_for_device); + +struct arm_dma_map_ops dma_ops = { + .alloc_attrs = arm_dma_alloc_attrs, + .free_attrs = arm_dma_free_attrs, + .map_single = arm_dma_map_single, + .unmap_single = arm_dma_unmap_single, + .map_page = arm_dma_map_page, + .unmap_page = arm_dma_unmap_page, + .map_sg = arm_dma_map_sg, + .unmap_sg = arm_dma_unmap_sg, + .sync_single_for_device = arm_dma_sync_single_for_device, + .sync_single_for_cpu = arm_dma_sync_single_for_cpu, + .sync_sg_for_cpu = arm_dma_sync_sg_for_cpu, + .sync_sg_for_device = arm_dma_sync_sg_for_device, + .mmap_attrs = arm_dma_mmap_attrs, +}; +EXPORT_SYMBOL_GPL(dma_ops);
#define PREALLOC_DMA_DEBUG_ENTRIES 4096
diff --git a/arch/arm/mm/vmregion.h b/arch/arm/mm/vmregion.h index 15e9f04..6bbc402 100644 --- a/arch/arm/mm/vmregion.h +++ b/arch/arm/mm/vmregion.h @@ -17,7 +17,7 @@ struct arm_vmregion { struct list_head vm_list; unsigned long vm_start; unsigned long vm_end; - struct page *vm_pages; + void *priv; int vm_active; };
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h index 71ad34e..ada61e1 100644 --- a/include/linux/dma-attrs.h +++ b/include/linux/dma-attrs.h @@ -13,6 +13,7 @@ enum dma_attr { DMA_ATTR_WRITE_BARRIER, DMA_ATTR_WEAK_ORDERING, + DMA_ATTR_WRITE_COMBINE, DMA_ATTR_MAX, };
Add an initial proof-of-concept implementation of the DMA-mapping API for devices that have IOMMU support. Right now only the dma_alloc_coherent, dma_free_coherent and dma_mmap_coherent functions are supported.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/Kconfig                 |   1 +
 arch/arm/include/asm/device.h    |   2 +
 arch/arm/include/asm/dma-iommu.h |  30 ++++
 arch/arm/mm/dma-mapping.c        | 326 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 359 insertions(+), 0 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 377a7a5..61900f1 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -8,6 +8,8 @@ config ARM
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
 	select GENERIC_ATOMIC64 if (CPU_V6 || !CPU_32v6K || !AEABI)
+	select GENERIC_ALLOCATOR
+	select HAVE_DMA_ATTRS
 	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
 	select HAVE_ARCH_KGDB
 	select HAVE_KPROBES if (!XIP_KERNEL && !THUMB2_KERNEL)
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 005791a..d3ec1e9 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -11,6 +11,8 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
+	void *iommu_priv;
+	struct dma_iommu_mapping *mapping;
 };
 
 struct pdev_archdata {
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
new file mode 100644
index 0000000..c246ff3
--- /dev/null
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -0,0 +1,30 @@
+#ifndef ASMARM_DMA_IOMMU_H
+#define ASMARM_DMA_IOMMU_H
+
+#ifdef __KERNEL__
+
+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-debug.h>
+#include <linux/kmemcheck.h>
+
+#include <asm/memory.h>
+
+struct dma_iommu_mapping {
+	/* iommu specific data */
+	struct iommu_domain	*domain;
+
+	/* address space data */
+	struct gen_pool		*pool;
+
+	dma_addr_t		base;
+	size_t			size;
+
+	atomic_t		ref_count;
+	struct mutex		lock;
+};
+
+int __init arm_iommu_assign_device(struct device *dev, dma_addr_t base, dma_addr_t size);
+
+#endif /* __KERNEL__ */
+#endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f8c6972..b6397c1 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -19,6 +19,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/highmem.h>
 #include <linux/slab.h>
+#include <linux/genalloc.h>
 
 #include <asm/memory.h>
 #include <asm/highmem.h>
@@ -26,6 +27,9 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 
+#include <linux/iommu.h>
+#include <asm/dma-iommu.h>
+
 #ifdef __arch_page_to_dma
 #error Please update to __arch_pfn_to_dma
 #endif
@@ -1040,3 +1044,325 @@ static int __init dma_debug_do_init(void)
 	return 0;
 }
 fs_initcall(dma_debug_do_init);
+
+
+/* IOMMU */
+
+/*
+ * Allocate a DMA buffer for 'dev' of size 'size' using the
+ * specified gfp mask. Note that 'size' must be page aligned.
+ */
+static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
+{
+	struct page **pages;
+	int count = size >> PAGE_SHIFT;
+	void *ptr;
+	int i;
+
+	pages = kzalloc(count * sizeof(struct page*), gfp);
+	if (!pages)
+		return NULL;
+
+	printk("IOMMU: page table allocated\n");
+
+	for (i=0; i<count; i++) {
+		pages[i] = alloc_page(gfp); //alloc_pages(gfp, 0);
+
+		if (!pages[i])
+			goto error;
+
+		/*
+		 * Ensure that the allocated pages are zeroed, and that any data
+		 * lurking in the kernel direct-mapped region is invalidated.
+		 */
+		ptr = page_address(pages[i]);
+		memset(ptr, 0, PAGE_SIZE);
+		dmac_flush_range(ptr, ptr + PAGE_SIZE);
+		outer_flush_range(__pa(ptr), __pa(ptr) + PAGE_SIZE);
+	}
+	printk("IOMMU: pages allocated\n");
+	return pages;
+error:
+	printk("IOMMU: error allocating pages\n");
+	while (--i)
+		if (pages[i])
+			__free_pages(pages[i], 0);
+	kfree(pages);
+	return NULL;
+}
+
+static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t size)
+{
+	int count = size >> PAGE_SHIFT;
+	int i;
+	for (i=0; i< count; i++)
+		if (pages[i])
+			__free_pages(pages[i], 0);
+	kfree(pages);
+	return 0;
+}
+
+static void *
+__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot)
+{
+	struct arm_vmregion *c;
+	size_t align;
+	size_t count = size >> PAGE_SHIFT;
+	int bit;
+
+	if (!consistent_pte[0]) {
+		printk(KERN_ERR "%s: not initialised\n", __func__);
+		dump_stack();
+		return NULL;
+	}
+
+	/*
+	 * Align the virtual region allocation - maximum alignment is
+	 * a section size, minimum is a page size. This helps reduce
+	 * fragmentation of the DMA space, and also prevents allocations
+	 * smaller than a section from crossing a section boundary.
+	 */
+	bit = fls(size - 1);
+	if (bit > SECTION_SHIFT)
+		bit = SECTION_SHIFT;
+	align = 1 << bit;
+
+	/*
+	 * Allocate a virtual address in the consistent mapping region.
+	 */
+	c = arm_vmregion_alloc(&consistent_head, align, size,
+			    gfp & ~(__GFP_DMA | __GFP_HIGHMEM));
+	if (c) {
+		pte_t *pte;
+		int idx = CONSISTENT_PTE_INDEX(c->vm_start);
+		int i = 0;
+		u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
+
+		pte = consistent_pte[idx] + off;
+		c->priv = pages;
+
+		do {
+			BUG_ON(!pte_none(*pte));
+
+			set_pte_ext(pte, mk_pte(pages[i], prot), 0);
+			pte++;
+			off++;
+			i++;
+			if (off >= PTRS_PER_PTE) {
+				off = 0;
+				pte = consistent_pte[++idx];
+			}
+		} while (i < count);
+
+		dsb();
+
+		return (void *)c->vm_start;
+	}
+	return NULL;
+}
+
+static dma_addr_t __iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+	unsigned int count = size >> PAGE_SHIFT;
+	dma_addr_t dma_addr, iova;
+	int i, ret = 0;
+
+	printk("IOMMU: mapping %p\n", mapping);
+
+	iova = gen_pool_alloc(mapping->pool, size);
+
+	printk("IOMMU: gen_alloc res %x\n", iova);
+
+	if (iova == 0)
+		goto fail;
+
+	dma_addr = iova;
+
+	for (i=0; i<count; i++) {
+		unsigned int phys = page_to_phys(pages[i]);
+		ret = iommu_map(mapping->domain, iova, phys, 0, 0);
+		if (ret < 0)
+			goto fail;
+		iova += PAGE_SIZE;
+	}
+
+	return dma_addr;
+fail:
+	return 0;
+}
+
+static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+	unsigned int count = size >> PAGE_SHIFT;
+	int i;
+
+	gen_pool_free(mapping->pool, iova, size);
+
+	for (i=0; i<count; i++) {
+		iommu_unmap(mapping->domain, iova, 0);
+		iova += PAGE_SIZE;
+	}
+	return 0;
+}
+
+int arm_iommu_init(struct device *dev);
+
+static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
+		dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+	struct page **pages;
+	void *addr = NULL;
+	pgprot_t prot;
+
+	if (dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
+		prot = pgprot_writecombine(pgprot_kernel);
+	else
+		prot = pgprot_dmacoherent(pgprot_kernel);
+
+	arm_iommu_init(dev);
+
+	mutex_lock(&mapping->lock);
+
+	*handle = ~0;
+	size = PAGE_ALIGN(size);
+
+	printk("IOMMU: requested size %d\n", size);
+
+	pages = __iommu_alloc_buffer(dev, size, gfp);
+	if (!pages)
+		return NULL;
+
+	printk("IOMMU: allocated pages: %p\n", pages);
+
+	*handle = __iommu_create_mapping(dev, pages, size);
+
+	printk("IOMMU: created iova: %08x\n", *handle);
+
+	if (!*handle)
+		goto err_buffer;
+
+	addr = __iommu_alloc_remap(pages, size, gfp, prot);
+	if (!addr)
+		goto err_iommu;
+
+	printk("IOMMU: allocated iova %08x, virt %p\n", *handle, addr);
+
+err_iommu:
+err_buffer:
+	mutex_unlock(&mapping->lock);
+	return addr;
+}
+
+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+		void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		struct dma_attrs *attrs)
+{
+	unsigned long user_size;
+	struct arm_vmregion *c;
+
+	if (dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
+		vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
+	else
+		vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot);
+
+	printk("IOMMU: mmap virt %p, dma %08x, size %d\n", cpu_addr, dma_addr, size);
+
+	user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+
+	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
+	if (c) {
+		struct page **pages = c->priv;
+
+		unsigned long uaddr = vma->vm_start;
+		unsigned long usize = vma->vm_end - vma->vm_start;
+		int i = 0;
+
+		do {
+			int ret;
+
+			ret = vm_insert_page(vma, uaddr, pages[i++]);
+			if (ret) {
+				printk(KERN_ERR "Remapping memory, error: %d\n", ret);
+				return ret;
+			}
+
+			uaddr += PAGE_SIZE;
+			usize -= PAGE_SIZE;
+		} while (usize > 0);
+	}
+	return 0;
+}
+
+/*
+ * free a page as defined by the above mapping.
+ * Must not be called with IRQs disabled.
+ */
+void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+			  dma_addr_t handle, struct dma_attrs *attrs)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+	struct arm_vmregion *c;
+	size = PAGE_ALIGN(size);
+
+	mutex_lock(&mapping->lock);
+	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
+	if (c) {
+		struct page **pages = c->priv;
+		__dma_free_remap(cpu_addr, size);
+		__iommu_remove_mapping(dev, handle, size);
+		__iommu_free_buffer(dev, pages, size);
+	}
+	mutex_unlock(&mapping->lock);
+}
+
+struct arm_dma_map_ops iommu_ops = {
+	.alloc_attrs	= arm_iommu_alloc_attrs,
+	.free_attrs	= arm_iommu_free_attrs,
+	.mmap_attrs	= arm_iommu_mmap_attrs,
+};
+EXPORT_SYMBOL_GPL(iommu_ops);
+
+int arm_iommu_init(struct device *dev)
+{
+	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
+
+	if (mapping->pool)
+		return 0;
+
+	mutex_init(&mapping->lock);
+
+	mapping->pool = gen_pool_create(16, -1);
+	if (!mapping->pool)
+		return -ENOMEM;
+
+	if (gen_pool_add(mapping->pool, mapping->base, mapping->size, -1) != 0)
+		return -ENOMEM;
+
+	mapping->domain = iommu_domain_alloc();
+	if (!mapping->domain)
+		return -ENOMEM;
+
+	if (iommu_attach_device(mapping->domain, dev) != 0)
+		return -ENOMEM;
+
+	return 0;
+}
+
+int __init arm_iommu_assign_device(struct device *dev, dma_addr_t base, dma_addr_t size)
+{
+	struct dma_iommu_mapping *mapping;
+	mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
+	if (!mapping)
+		return -ENOMEM;
+	mapping->base = base;
+	mapping->size = size;
+
+	dev->archdata.mapping = mapping;
+	set_dma_ops(dev, &iommu_ops);
+	printk(KERN_INFO "Assigned IOMMU device to %s\n", dev_name(dev));
+
+	return 0;
+}
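For the record, the intended usage of the new hook is roughly the following. This is only a sketch: the device pointer, the IOVA window and the example platform device name are made-up illustrations, not part of the patch.

/*
 * Board/platform code reserves a per-device IOVA window and switches the
 * device to the IOMMU-backed dma_ops (example values only).
 */
static int __init example_assign_iommu(void)
{
	struct device *dev = &example_fimc_device.dev;	/* hypothetical platform device */

	return arm_iommu_assign_device(dev, 0x20000000, SZ_256M);
}

/*
 * After that the driver keeps using the ordinary DMA API; the calls are
 * routed through the iommu_ops installed by set_dma_ops():
 *
 *	void *buf = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);
 *	...
 *	dma_free_coherent(dev, size, buf, dma_handle);
 */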
I don't think dma_alloc_writecombine() is useful, because it is actually no different from dma_alloc_coherent(). Moreover, no architecture except ARM and AVR32 implements it, and 'struct dma_map_ops' in <linux/dma-mapping.h> does not cover it.
The only difference between dma_alloc_writecombine() and dma_alloc_coherent() is whether the caller needs to issue a memory barrier after writing to the buffer returned by dma_alloc_writecombine().
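To illustrate that barrier point with a generic sketch (not taken from any particular driver; the register name is made up): with a write-combined buffer the producer has to make sure the combined writes have been drained before the device is told to start.

	memcpy(wc_buf, frame, frame_len);	/* CPU writes may still sit in the write buffer */
	wmb();					/* drain them before the device looks at the buffer */
	writel(dma_handle, regs + EXAMPLE_DMA_START_REG);	/* hypothetical doorbell register */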
Of course, the mapping created by dma_alloc_writecombine() may be more efficient for the CPU when it updates the DMA buffer. But I don't think the mapping returned by dma_alloc_coherent() is such a performance bottleneck.
I think it is better to remove dma_alloc_writecombine() and replace all of its uses with dma_alloc_coherent().
In addition, IMHO, mapping into the user's address space is not a duty of dma_map_ops. dma_mmap_*() is not suitable for a system that has an IOMMU, because a DMA address is not semantically equal to its corresponding physical address.
I think the ARM DMA APIs must be changed drastically to support IOMMU, because the IOMMU API does not manage virtual address space.
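For reference, the bare IOMMU API that the patch above builds on looks roughly like this (a sketch using the same calls as the patch); note that the caller has to pick the I/O virtual address itself, which is exactly the address-space management the DMA-mapping layer has to add on top:

	struct iommu_domain *domain = iommu_domain_alloc();

	iommu_attach_device(domain, dev);

	/* the caller decides that this page lives at IOVA 0x20000000; nothing
	 * in the IOMMU API allocates or tracks that address for us */
	iommu_map(domain, 0x20000000, page_to_phys(page), 0 /* order */, 0 /* prot */);
	...
	iommu_unmap(domain, 0x20000000, 0);
	iommu_detach_device(domain, dev);
	iommu_domain_free(domain);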
I've also been concerned about IOMMU support in the ARM architecture for several months, but I found that there are some obstacles to overcome.
Best regards.
On Wed, May 25, 2011 at 4:35 PM, Marek Szyprowski m.szyprowski@samsung.com wrote:
Patch summary:
Marek Szyprowski (2):
  ARM: Move dma related inlines into arm_dma_ops methods
  ARM: initial proof-of-concept IOMMU mapper for DMA-mapping

 arch/arm/Kconfig                   |    1 +
 arch/arm/include/asm/device.h      |    3 +
 arch/arm/include/asm/dma-iommu.h   |   30 ++
 arch/arm/include/asm/dma-mapping.h |  653 +++++++++++------------------
 arch/arm/mm/dma-mapping.c          |  817 +++++++++++++++++++++++++++++++++---
 arch/arm/mm/vmregion.h             |    2 +-
 include/linux/dma-attrs.h          |    1 +
 7 files changed, 1033 insertions(+), 474 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h
On Monday 13 June 2011 16:12:05 KyongHo Cho wrote:
Of course, the mapping created by dma_alloc_writecombine() may be more efficient for CPU to update the DMA buffer. But I think mapping with dma_alloc_coherent() is not such a performance bottleneck.
I think it is better to remove dma_alloc_writecombine() and replace all of it with dma_alloc_coherent().
I'm sure that the graphics people will disagree with you on that. Having the frame buffer mapped in write-combine mode is rather important when you want to efficiently output videos from your CPU.
In addition, IMHO, mapping to user's address is not a duty of dma_map_ops. dma_mmap_*() is not suitable for a system that has IOMMU because a DMA address does not equal to its correspondent physical address semantically.
I think DMA APIs of ARM must be changed drastically to support IOMMU because IOMMU API does not manage virtual address space.
I can understand that there are arguments why mapping a DMA buffer into user space doesn't belong into dma_map_ops, but I don't see how the presence of an IOMMU is one of them.
The entire purpose of dma_map_ops is to hide from the user whether you have an IOMMU or not, so that would be the main argument for putting it in there, not against doing so.
Arnd
Hi.
On Tue, Jun 14, 2011 at 12:07 AM, Arnd Bergmann arnd@arndb.de wrote:
I'm sure that the graphics people will disagree with you on that. Having the frame buffer mapped in write-combine mode is rather important when you want to efficiently output videos from your CPU.
I agree with you, but I am discussing dma_alloc_writecombine() in ARM. You can see that only ARM and AVR32 implement it and few drivers use it. No function in dma_map_ops corresponds to dma_alloc_writecombine(). That's why Marek tried to add 'alloc_writecombine' to dma_map_ops.
I can understand that there are arguments why mapping a DMA buffer into user space doesn't belong into dma_map_ops, but I don't see how the presence of an IOMMU is one of them.
The entire purpose of dma_map_ops is to hide from the user whether you have an IOMMU or not, so that would be the main argument for putting it in there, not against doing so.
I also understand the reasons why dma_map_ops maps a buffer into user space. Mapping into device and user space at the same time, or in a simple approach, may look good. But I think mapping to user space must be driver-specific. Moreover, the kernel already provides various ways to map physical memory to user space. And I think that remapping a DMA address, which lives in the device's address space, to user space is not a good idea, because a DMA address is not semantically the same as a physical address once IOMMU features are implemented.
On Tue, Jun 14, 2011 at 12:30:44AM +0900, KyongHo Cho wrote:
On Tue, Jun 14, 2011 at 12:07 AM, Arnd Bergmann arnd@arndb.de wrote:
I'm sure that the graphics people will disagree with you on that. Having the frame buffer mapped in write-combine mode is rather important when you want to efficiently output videos from your CPU.
I agree with you. But I am discussing about dma_alloc_writecombine() in ARM. You can see that only ARM and AVR32 implement it and there are few drivers which use it. No function in dma_map_ops corresponds to dma_alloc_writecombine(). That's why Marek tried to add 'alloc_writecombine' to dma_map_ops.
FWIW, on ARMv6 and later hardware, the dma_alloc_coherent() provides writecombine memory (i.e. Normal Noncacheable), so no need for dma_alloc_writecombine(). On earlier architectures it is creating Strongly Ordered mappings (no writecombine).
On Monday 13 June 2011 17:30:44 KyongHo Cho wrote:
On Tue, Jun 14, 2011 at 12:07 AM, Arnd Bergmann arnd@arndb.de wrote:
I'm sure that the graphics people will disagree with you on that. Having the frame buffer mapped in write-combine mode is rather important when you want to efficiently output videos from your CPU.
I agree with you. But I am discussing about dma_alloc_writecombine() in ARM. You can see that only ARM and AVR32 implement it and there are few drivers which use it. No function in dma_map_ops corresponds to dma_alloc_writecombine(). That's why Marek tried to add 'alloc_writecombine' to dma_map_ops.
Yes, and I think Marek's patch is really necessary. The reason we need dma_alloc_writecombine on ARM is because the page attributes in the kernel need to match the ones in user space, while other architectures either handle the writecombine flag outside of the page table or can have multiple conflicting mappings.
The reason that I suspect AVR32 needs it is to share device drivers with ARM.
I can understand that there are arguments why mapping a DMA buffer into user space doesn't belong into dma_map_ops, but I don't see how the presence of an IOMMU is one of them.
The entire purpose of dma_map_ops is to hide from the user whether you have an IOMMU or not, so that would be the main argument for putting it in there, not against doing so.
I also understand the reasons why dma_map_ops maps a buffer into user space. Mapping in device and user space at the same time or in a simple approach may look good. But I think mapping to user must be and driver-specific. Moreover, kernel already provides various ways to map physical memory to user space.
I believe the idea of providing dma_mmap_... is to ensure that the page attributes are not conflicting and the DMA code is the place that decides on the page attributes for the kernel mapping, so no other place in the kernel can really know what it should be in user space.
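A minimal sketch of what that looks like from a driver on ARM today (the buffer bookkeeping structure here is made up; dma_mmap_writecombine() is the existing ARM helper):

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct example_buffer *buf = file->private_data;	/* hypothetical per-buffer state */

	/* defer the page-attribute decision to the DMA layer instead of
	 * calling remap_pfn_range() with a driver-chosen pgprot */
	return dma_mmap_writecombine(buf->dev, vma, buf->cpu_addr,
				     buf->dma_handle,
				     vma->vm_end - vma->vm_start);
}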
And I think that remapping DMA address that is in device address space to user space is not a good idea because DMA address is not same to physical address semantically if features of IOMMU are implemented.
I'm totally not following this argument. This has nothing to do with IOMMU or not. If you have an IOMMU, the dma code will know where the pages are anyway, so it can always map them into user space. The dma code might have an easier way to do it other than following the page tables.
Arnd
I'm totally not following this argument. This has nothing to do with IOMMU or not. If you have an IOMMU, the dma code will know where the pages are anyway, so it can always map them into user space. The dma code might have an easier way to do it other than following the page tables.
Ah, sorry for that. I mixed dma_alloc_* up with dma_map_*. Now I see the reason why mmap_* in dma_map_ops is required: you mean that nothing but the DMA API knows which pages will be mapped to user space. Thanks anyway.
KyongHo.
FWIW, on ARMv6 and later hardware, the dma_alloc_coherent() provides writecombine memory (i.e. Normal Noncacheable), so no need for dma_alloc_writecombine(). On earlier architectures it is creating Strongly Ordered mappings (no writecombine).
Thanks.
Do you mean that dma_alloc_coherent() and dma_alloc_writecombine() are not different on ARM, apart from some additional features of dma_alloc_coherent()?
The need to allocate pages for "write combining" access goes deeper than anything to do with DMA or IOMMUs. Please keep "write combine" distinct from "coherent" in the allocation/mapping APIs.
Write-combining is a special case because it's an end-to-end requirement, usually architecturally invisible, and getting it to happen requires a very specific combination of mappings and code. There's a good explanation here of the requirements on some Intel implementations of the x86 architecture: http://software.intel.com/en-us/articles/copying-accelerated-video-decode-fr... . As I understand it, similar considerations apply on at least some ARMv7 implementations, with NEON multi-register load/store operations taking the place of MOVNTDQ. (See http://www.arm.com/files/pdf/A8_Paper.pdf for instance; although I don't think there's enough detail about the conditions under which "if the full cache line is written, the Level-2 line is simply marked dirty and no external memory requests are required.")
As far as I can tell, there is not yet any way to get real cache-bypassing write-combining from userland in a mainline kernel, for x86/x86_64 or ARM. I have been able to do it from inside a driver on x86, including in an ISR with some fixes to the kernel's FPU context save/restore code (patch attached, if you're curious); otherwise I haven't yet seen write-combining in operation on Linux. The code that needs to bypass the cache is part of a SoC silicon erratum workaround supplied by Intel. It didn't work as delivered -- it oopsed the kernel -- but is now shipping inside our product, and no problems have been reported from QA or the field. So I'm fairly sure that the changes I made are effective.
I am not expert in this area; I was just forced to learn something about it in order to make a product work. My assertion that "there's no way to do it yet" is almost certainly wrong. I am hoping and expecting to be immediately contradicted, with a working code example and benchmarks that show that cache lines are not being fetched, clobbered, and stored again, with the latencies hidden inside the cache architecture. :-) (Seriously: there are four bits in the Cortex-A8's "L2 Cache Auxiliary Control Register" that control various aspects of this mechanism, and if you don't have a fairly good explanation of which bits do and don't affect your benchmark, then I contend that the job isn't done. I don't begin to understand the equivalent for the multi-core A9 I'm targeting next.)
If some kind person doesn't help me see the error of my ways, I'm going to have to figure it out for myself on ARM in the next couple of months, this time for performance reasons rather than to work around silicon errata. Unfortunately, I do not expect it to be particularly low-hanging fruit. I expect to switch to the hard-float ABI first (the only remaining obstacle being a couple of TI-supplied binary-only libraries). That might provide enough of a system-level performance win (by allowing the compiler to reorder fetches to NEON registers across function/method calls) to obviate the need.
Cheers, - Michael
On Mon, Jun 13, 2011 at 05:00:16PM +0100, KyongHo Cho wrote:
FWIW, on ARMv6 and later hardware, the dma_alloc_coherent() provides writecombine memory (i.e. Normal Noncacheable), so no need for dma_alloc_writecombine(). On earlier architectures it is creating Strongly Ordered mappings (no writecombine).
Do you mean that dma_alloc_coherent() and dma_alloc_writecombine() are not different except some additional features of dma_alloc_coherent() in ARM?
When CONFIG_DMA_MEM_BUFFERABLE is enabled (by default on ARMv7 and ARMv6 with some exceptions because of hardware issues), the resulting mapping for both coherent and writecombine is the same. In both cases the mapping is done as L_PTE_MT_BUFFERABLE which is what you want with writecombine. You can check the pgprot_writecombine() and pgprot_dmacoherent() macros in asm/pgtable.h
On Mon, 13 Jun 2011 10:55:59 -0700 "Michael K. Edwards" m.k.edwards@gmail.com wrote:
As far as I can tell, there is not yet any way to get real cache-bypassing write-combining from userland in a mainline kernel, for x86/x86_64 or ARM.
Well only if things are really broken. sysfs exposes _wc resource files to allow userland drivers to map a given PCI BAR using write combining, if the underlying platform supports it.
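For completeness, that path is just an mmap() of the resourceN_wc file in sysfs; roughly (the PCI address below is an arbitrary example and error handling is omitted):

	#include <fcntl.h>
	#include <sys/mman.h>

	int fd = open("/sys/bus/pci/devices/0000:01:00.0/resource0_wc", O_RDWR);
	void *bar = mmap(NULL, bar_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	/* 'bar' is now a write-combined userspace mapping of BAR0, provided
	   the underlying platform supports WC mappings of that BAR */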
Similarly, userland mappings of GEM objects through the GTT are supposed to be write combined, though I need to verify this (we've had trouble with it in the past).
Hello,
On Monday, June 13, 2011 5:31 PM KyongHo Cho wrote:
On Tue, Jun 14, 2011 at 12:07 AM, Arnd Bergmann arnd@arndb.de wrote:
I'm sure that the graphics people will disagree with you on that. Having the frame buffer mapped in write-combine mode is rather important when you want to efficiently output videos from your CPU.
I agree with you. But I am discussing about dma_alloc_writecombine() in ARM. You can see that only ARM and AVR32 implement it and there are few drivers which use it. No function in dma_map_ops corresponds to dma_alloc_writecombine(). That's why Marek tried to add 'alloc_writecombine' to dma_map_ops.
I also introduced dma_alloc_attrs() to allow other types of memory and mapping combinations in the future. For example, in the IOMMU case the driver might like to call a function that allocates a buffer that 'works best with the hardware'. This means that the buffer might be built from pages larger than 4KiB, aligned to particular IOMMU requirements. Handling such requirements is definitely not a part of the driver; only the particular dma-mapping implementation knows about them. The driver may just provide some hints about how the memory will be used. The ones I'm particularly thinking of are different types of caching.
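With the proposed interface that would look roughly like the code below; the *_attrs wrapper names are the ones used in this series and may still change, and DMA_ATTR_WRITE_COMBINE is the new attribute added by these patches:

	DEFINE_DMA_ATTRS(attrs);
	dma_addr_t dma_handle;
	void *vaddr;

	/* hint: the buffer will be filled by the CPU and only read by the
	 * device, so a write-combined mapping is preferred */
	dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs);

	vaddr = dma_alloc_attrs(dev, size, &dma_handle, GFP_KERNEL, &attrs);
	if (!vaddr)
		return -ENOMEM;
	...
	dma_free_attrs(dev, size, vaddr, dma_handle, &attrs);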
I can understand that there are arguments why mapping a DMA buffer into user space doesn't belong into dma_map_ops, but I don't see how the presence of an IOMMU is one of them.
The entire purpose of dma_map_ops is to hide from the user whether you have an IOMMU or not, so that would be the main argument for putting it in there, not against doing so.
I also understand the reasons why dma_map_ops maps a buffer into user space. Mapping in device and user space at the same time or in a simple approach may look good. But I think mapping to user must be and driver-specific. Moreover, kernel already provides various ways to map physical memory to user space. And I think that remapping DMA address that is in device address space to user space is not a good idea because DMA address is not same to physical address semantically if features of IOMMU are implemented.
Mapping a DMA buffer to user space is a common feature of various APIs (framebuffer, V4L2, ALSA). In most cases the kernel virtual address is not even required by such drivers, because they just want to expose the buffer contents to userspace. It would be great if dma-mapping allowed allocating a coherent buffer without mapping it into kernel space at all. Kernel virtual space is really limited: for some multimedia processing (like capturing and encoding an HD movie from a camera sensor) we might need buffers with a total size of over 128MB.
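If the attribute interface were extended in that direction, it could look like the sketch below; DMA_ATTR_NO_KERNEL_MAPPING is a purely hypothetical attribute name, used here only to illustrate the idea of skipping the kernel mapping:

	DEFINE_DMA_ATTRS(attrs);
	dma_addr_t dma_handle;
	void *cookie;

	dma_set_attr(DMA_ATTR_NO_KERNEL_MAPPING, &attrs);	/* hypothetical attribute */
	cookie = dma_alloc_attrs(dev, SZ_128M, &dma_handle, GFP_KERNEL, &attrs);

	/* 'cookie' would only identify the buffer for dma_mmap_attrs() and
	 * dma_free_attrs(); it would not be a dereferenceable kernel address,
	 * so no precious kernel virtual space is consumed */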
Best regards
On Mon, Jun 13, 2011 at 11:54 AM, Jesse Barnes jbarnes@virtuousgeek.org wrote:
Well only if things are really broken. sysfs exposes _wc resource files to allow userland drivers to map a given PCI BAR using write combining, if the underlying platform supports it.
Mmm, I hadn't spotted that; that is useful, at least as sample code. Doesn't do me any good directly, though; I'm not on a PCI device, I'm on a SoC. And what I need to do is to allocate normal memory through an uncacheable write-combining page table entry (with certainty that it is not aliased by a cacheable entry for the same physical memory), and use it for interchange of data (GPU assets, compressed video) with other on-chip cores. (Or with off-chip PCI devices which use DMA to transfer data to/from these buffers and then interrupt the CPU to notify it to rotate them.)
What doesn't seem to be straightforward to do from userland is to allocate pages that are locked to physical memory and mapped for write-combining. The device driver shouldn't have to mediate their allocation, just map to a physical address (or set up an IOMMU entry, I suppose) and pass that to the hardware that needs it. Typical userland code that could use such a mechanism would be the Qt/OpenGL back end (which needs to store decompressed images and other pre-rendered assets in GPU-ready buffers) and media pipelines.
Similarly, userland mapping of GEM objects through the GTT are supposed to be write combined, though I need to verify this (we've had trouble with it in the past).
Also a nice source of sample code; though, again, I don't want this to be driver-specific. I might want a stage in my media pipeline that uses the GPU to perform, say, lens distortion correction. I shouldn't have to go through contortions to use the same buffers from the GPU and the video capture device. The two devices are likely to have their own variants on scatter-gather DMA, with a circularly linked list of block descriptors with ownership bits and all that jazz; but the actual data buffers should be generic, and the userland pipeline setup code should just allocate them (presumably as contiguous regions in a write-combining hugepage) and feed them to the plumbing.
Cheers, - Michael
On Tue, 14 Jun 2011 11:15:38 -0700 "Michael K. Edwards" m.k.edwards@gmail.com wrote:
What doesn't seem to be straightforward to do from userland is to allocate pages that are locked to physical memory and mapped for write-combining. The device driver shouldn't have to mediate their allocation, just map to a physical address (or set up an IOMMU entry, I suppose) and pass that to the hardware that needs it. Typical userland code that could use such a mechanism would be the Qt/OpenGL back end (which needs to store decompressed images and other pre-rendered assets in GPU-ready buffers) and media pipelines.
We try to avoid allowing userspace to pin arbitrary buffers though. So on the gfx side, userspace can allocate buffers, but they're only actually pinned when some operation is performed on them (e.g. they're referenced in a command buffer or used for a mode set operation).
Something like ION or GEM can provide the basic alloc & map API, but the platform code still has to deal with grabbing hunks of memory, making them uncached or write combine, and mapping them to app space without conflicts.
Also a nice source of sample code; though, again, I don't want this to be driver-specific. I might want a stage in my media pipeline that uses the GPU to perform, say, lens distortion correction. I shouldn't have to go through contortions to use the same buffers from the GPU and the video capture device. The two devices are likely to have their own variants on scatter-gather DMA, with a circularly linked list of block descriptors with ownership bits and all that jazz; but the actual data buffers should be generic, and the userland pipeline setup code should just allocate them (presumably as contiguous regions in a write-combining hugepage) and feed them to the plumbing.
Totally agree. That's one reason I don't think enhancing the DMA mapping API in the kernel is a complete solution. Sure, the platform code needs to be able to map buffers to devices and use any available IOMMUs, but we still need a userspace API for all of that, with its associated changes to the CPU MMU handling.
On 14 June 2011 13:21, Jesse Barnes jbarnes@virtuousgeek.org wrote:
On Tue, 14 Jun 2011 11:15:38 -0700 "Michael K. Edwards" m.k.edwards@gmail.com wrote:
What doesn't seem to be straightforward to do from userland is to allocate pages that are locked to physical memory and mapped for write-combining. The device driver shouldn't have to mediate their allocation, just map to a physical address (or set up an IOMMU entry, I suppose) and pass that to the hardware that needs it. Typical userland code that could use such a mechanism would be the Qt/OpenGL back end (which needs to store decompressed images and other pre-rendered assets in GPU-ready buffers) and media pipelines.
We try to avoid allowing userspace to pin arbitrary buffers though. So on the gfx side, userspace can allocate buffers, but they're only actually pinned when some operation is performed on them (e.g. they're referenced in a command buffer or used for a mode set operation).
Something like ION or GEM can provide the basic alloc & map API, but the platform code still has to deal with grabbing hunks of memory, making them uncached or write combine, and mapping them to app space without conflicts.
Also a nice source of sample code; though, again, I don't want this to be driver-specific. I might want a stage in my media pipeline that uses the GPU to perform, say, lens distortion correction. I shouldn't have to go through contortions to use the same buffers from the GPU and the video capture device. The two devices are likely to have their own variants on scatter-gather DMA, with a circularly linked list of block descriptors with ownership bits and all that jazz; but the actual data buffers should be generic, and the userland pipeline setup code should just allocate them (presumably as contiguous regions in a write-combining hugepage) and feed them to the plumbing.
Totally agree. That's one reason I don't think enhancing the DMA mapping API in the kernel is a complete solution. Sure, the platform code needs to be able to map buffers to devices and use any available IOMMUs, but we still need a userspace API for all of that, with its associated changes to the CPU MMU handling.
I haven't seen all the discussions but it sounds like creating the correct userspace abstraction and then looking at how the kernel needs to change (instead of the other way around) may add some clarity to things.
On Tue, Jun 14, 2011 at 11:21 AM, Jesse Barnes jbarnes@virtuousgeek.org wrote:
We try to avoid allowing userspace to pin arbitrary buffers though. So on the gfx side, userspace can allocate buffers, but they're only actually pinned when some operation is performed on them (e.g. they're referenced in a command buffer or used for a mode set operation).
The issue isn't so much pinning; I don't really care if the physical memory moves out from under me as long as the mappings are properly updated in all the process page tables that share it and all the hardware units that care. But the mapping has to have the right cache policy from the beginning, so that I get the important part of write combining (the fill buffer allocation -- without bothering to load contents from DRAM that are likely to be completely clobbered -- and the cache-line-sized flush once it's filled). In any case, supposedly there are weird aliasing issues if you try to take a page that is already mapped cacheable and remap it write-combine; and in the case of shared pages, you'd need to look up all processes that have the page mapped and alter their page tables, even if they're currently running on other SMP cores. Nasty.
Besides, I don't want little 4K pages; I want a hugepage with the right cache policy, in which I can build a malloc pool (tcmalloc, jemalloc, something like that) and allocate buffers for a variety of purposes. (I also want to use this to pass whole data structures, like priority search trees built using offset pointers, among cores that don't share a cache hierarchy or a cache coherency protocol.)
Presumably the privilege of write-combine buffer allocation would be limited to processes that have been granted the appropriate capability; but then that process should be able to share it with others. I would think the natural thing would be for the special-page allocation API to return a file descriptor, which can then be passed over local domain sockets and mmap()ed by as many processes as necessary. For many usage patterns, there will be no need for a kernel virtual mapping; hardware wants physical addresses (or IOMMU mappings) anyway.
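Passing such a buffer between processes as a file descriptor is standard SCM_RIGHTS plumbing; a rough userland sketch (error handling trimmed, and the cache policy of the resulting mapping would still be whatever the exporting driver enforces):

#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* sender: hand buf_fd to a peer over a connected AF_UNIX socket */
static int send_buffer_fd(int sock, int buf_fd)
{
	char dummy = 'F';
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &buf_fd, sizeof(int));

	return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

/* receiver: pick up the descriptor and map the buffer */
static void *recv_and_map_buffer(int sock, size_t len)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union { char buf[CMSG_SPACE(sizeof(int))]; struct cmsghdr align; } u;
	struct msghdr msg = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = u.buf, .msg_controllen = sizeof(u.buf),
	};
	int fd;

	if (recvmsg(sock, &msg, 0) < 0)
		return MAP_FAILED;
	memcpy(&fd, CMSG_DATA(CMSG_FIRSTHDR(&msg)), sizeof(int));

	return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}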
Something like ION or GEM can provide the basic alloc & map API, but the platform code still has to deal with grabbing hunks of memory, making them uncached or write combine, and mapping them to app space without conflicts.
Absolutely. Much like any other hugepage allocation, right? Not really something ION or GEM or any other device driver needs to be involved in. Except for alignment issues, I suppose; I haven't given that much thought.
The part about setting up corresponding mappings to the same physical addresses in the device's DMA mechanics is not buffer *allocation*, it's buffer *registration*. That's sort of like V4L2's "user pointer I/O" mode, in which the userspace app allocates the buffers and uses the QBUF ioctl to register them. I see no reason why the enforcement of minimum alignment and cache policy couldn't be done at buffer registration time rather than region allocation time.
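For reference, that registration step in V4L2 looks like this (a fragment; it assumes the buffers were requested beforehand with VIDIOC_REQBUFS and V4L2_MEMORY_USERPTR, and 'video_fd' / 'my_buffer' are placeholders):

	#include <string.h>
	#include <sys/ioctl.h>
	#include <linux/videodev2.h>

	struct v4l2_buffer buf;

	memset(&buf, 0, sizeof(buf));
	buf.type      = V4L2_BUF_TYPE_VIDEO_CAPTURE;
	buf.memory    = V4L2_MEMORY_USERPTR;
	buf.index     = 0;
	buf.m.userptr = (unsigned long)my_buffer;	/* application-owned memory */
	buf.length    = my_buffer_len;

	if (ioctl(video_fd, VIDIOC_QBUF, &buf) < 0)
		perror("VIDIOC_QBUF");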
Cheers, - Michael
Hi Marek,
In function: dma_alloc_coherent()->arm_iommu_alloc_attrs()->__iommu_alloc_buffer()
I have the following questions:
a) Before we come to this point, we would have enabled the SYSMMU in a call to arm_iommu_init(). Shouldn't the SYSMMU be enabled after the call to __iommu_alloc_buffer(), but before __iommu_create_mapping()? If __iommu_alloc_buffer() fails, we don't disable the SYSMMU.
b) For huge buffer sizes, the pressure on the SYSMMU would be very high. Can't we have an option to dictate the page size for the IOMMU from the driver in such cases? Should it always be the size of system pages?
Regards, Subash SISO-SLG
Hello,
On Monday, June 20, 2011 4:31 PM Subash Patel wrote:
In function: dma_alloc_coherent()->arm_iommu_alloc_attrs()->__iommu_alloc_buffer()
I have the following questions:
a) Before we come to this point, we would have enabled the SYSMMU in a call to arm_iommu_init(). Shouldn't the SYSMMU be enabled after the call to __iommu_alloc_buffer(), but before __iommu_create_mapping()? If __iommu_alloc_buffer() fails, we don't disable the SYSMMU.
I want to move enabling and disabling of the SYSMMU completely to the runtime PM framework. As you can notice, the updated SYSMMU driver automatically becomes a parent of the respective multimedia device and a child of the power domain to which both belong. This means that the SYSMMU will operate only when the multimedia device is enabled, which really makes sense. The SYSMMU driver will need to be updated not to poke into the registers if it is disabled, but this should be a really trivial change.
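In practice the client driver would then just bracket its hardware activity with runtime PM calls and the SYSMMU, being its parent, gets powered and enabled implicitly; a sketch, assuming the parent/child relation described above:

	/* in the multimedia driver, around a hardware job */
	pm_runtime_get_sync(&pdev->dev);	/* also runtime-resumes the parent SYSMMU */

	/* ... program the device and run the job ... */

	pm_runtime_put_sync(&pdev->dev);	/* parent SYSMMU may be powered down again */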
b) For huge buffer sizes, the pressure on the SYSMMU would be very high. Can't we have an option to dictate the page size for the IOMMU from the driver in such cases? Should it always be the size of system pages?
This was just a first version of the dma-mapping and IOMMU integration, meant to show the development road and start the discussion. Of course, in the final version support for pages larger than 4KiB is highly expected. We can even reuse the recently posted CMA to allocate large pages for the IOMMU, to improve performance and to make sure that the framework will be able to allocate such pages even if the device has been running for a long time and memory has been fragmented by typically-movable pages.
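The IOMMU API already allows mapping bigger chunks in one call once the allocator can produce them; for example, a physically contiguous 64KiB piece could be inserted as a single entry (a sketch, using the same iommu_map() signature as in the patch above):

	/* order 4 == 16 contiguous 4KiB pages == 64KiB */
	struct page *chunk = alloc_pages(GFP_KERNEL, 4);

	if (chunk)
		ret = iommu_map(mapping->domain, iova, page_to_phys(chunk),
				4 /* gfp_order */, 0 /* prot */);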
Best regards