[PATCHv9 00/10] ARM: DMA-mapping framework redesign

Marek Szyprowski

18 Apr 2012

1:44 p.m.

Hello,

This is a quick update on the dma-mapping redesign patches for ARM. I made some minor fixes suggested by Arnd and extended the commit messages for a few patches. Like the previous version, the patches have been rebased onto the latest Linux v3.4-rc3, which comes with dma_map_ops-related preparation changes.

The patches are also available on my git repository at: git://git.linaro.org/people/mszyprowski/linux-dma-mapping.git 3.4-rc3-arm-dma-v9

The code has been tested on Samsung Exynos4 'UniversalC210' and NURI boards with the IOMMU driver posted by KyongHo Cho. The integration patch has been posted in the following thread: http://www.spinics.net/lists/arm-kernel/msg169030.html

History of the development:

v1: (initial version of the DMA-mapping redesign patches): http://www.spinics.net/lists/linux-mm/msg21241.html

v2: http://lists.linaro.org/pipermail/linaro-mm-sig/2011-September/000571.html http://lists.linaro.org/pipermail/linaro-mm-sig/2011-September/000577.html

v3: http://www.spinics.net/lists/linux-mm/msg25490.html

v4 and v5: http://www.spinics.net/lists/arm-kernel/msg151147.html http://www.spinics.net/lists/arm-kernel/msg154889.html

v6: http://www.spinics.net/lists/linux-mm/msg29903.html

v7: http://www.spinics.net/lists/arm-kernel/msg162149.html

v8: http://www.spinics.net/lists/arm-kernel/msg168478.html

Best regards
Marek Szyprowski
Samsung Poland R&D Center

Patch summary:

Marek Szyprowski (10):
  common: add dma_mmap_from_coherent() function
  ARM: dma-mapping: use pr_* instread of printk
  ARM: dma-mapping: introduce DMA_ERROR_CODE constant
  ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops
  ARM: dma-mapping: use asm-generic/dma-mapping-common.h
  ARM: dma-mapping: implement dma sg methods on top of any generic dma ops
  ARM: dma-mapping: move all dma bounce code to separate dma ops structure
  ARM: dma-mapping: remove redundant code and cleanup
  ARM: dma-mapping: use alloc, mmap, free from dma_ops
  ARM: dma-mapping: add support for IOMMU mapper

 arch/arm/Kconfig                   |    9 +
 arch/arm/common/dmabounce.c        |   84 +++-
 arch/arm/include/asm/device.h      |    4 +
 arch/arm/include/asm/dma-iommu.h   |   34 ++
 arch/arm/include/asm/dma-mapping.h |  407 ++++-----------
 arch/arm/mm/dma-mapping.c          | 1015 ++++++++++++++++++++++++++++++------
 arch/arm/mm/vmregion.h             |    2 +-
 drivers/base/dma-coherent.c        |   42 ++
 include/asm-generic/dma-coherent.h |    4 +-
 9 files changed, 1134 insertions(+), 467 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h

-- 1.7.1.569.g6f426

Marek Szyprowski

18 Apr

1:44 p.m.

New subject: [PATCHv9 01/10] common: add dma_mmap_from_coherent() function

Add a common helper for the dma-mapping core for mapping a coherent buffer to userspace.

Reported-by: Subash Patel <subashrp@gmail.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
---
 drivers/base/dma-coherent.c        |   42 ++++++++++++++++++++++++++++++++++++
 include/asm-generic/dma-coherent.h |    4 ++-
 2 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
index bb0025c..1b85949 100644
--- a/drivers/base/dma-coherent.c
+++ b/drivers/base/dma-coherent.c
@@ -10,6 +10,7 @@
 struct dma_coherent_mem {
 	void		*virt_base;
 	dma_addr_t	device_base;
+	phys_addr_t	pfn_base;
 	int		size;
 	int		flags;
 	unsigned long	*bitmap;
@@ -44,6 +45,7 @@ int dma_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,

 	dev->dma_mem->virt_base = mem_base;
 	dev->dma_mem->device_base = device_addr;
+	dev->dma_mem->pfn_base = PFN_DOWN(bus_addr);
 	dev->dma_mem->size = pages;
 	dev->dma_mem->flags = flags;

@@ -176,3 +178,43 @@ int dma_release_from_coherent(struct device *dev, int order, void *vaddr)
 	return 0;
 }
 EXPORT_SYMBOL(dma_release_from_coherent);
+
+/**
+ * dma_mmap_from_coherent() - try to mmap the memory allocated from
+ * per-device coherent memory pool to userspace
+ * @dev:	device from which the memory was allocated
+ * @vma:	vm_area for the userspace memory
+ * @vaddr:	cpu address returned by dma_alloc_from_coherent
+ * @size:	size of the memory buffer allocated by dma_alloc_from_coherent
+ *
+ * This checks whether the memory was allocated from the per-device
+ * coherent memory pool and if so, maps that memory to the provided vma.
+ *
+ * Returns 1 if we correctly mapped the memory, or 0 if
+ * dma_release_coherent() should proceed with mapping memory from
+ * generic pools.
+ */
+int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma,
+			   void *vaddr, size_t size, int *ret)
+{
+	struct dma_coherent_mem *mem = dev ? dev->dma_mem : NULL;
+
+	if (mem && vaddr >= mem->virt_base && vaddr + size <=
+		   (mem->virt_base + (mem->size << PAGE_SHIFT))) {
+		unsigned long off = vma->vm_pgoff;
+		int start = (vaddr - mem->virt_base) >> PAGE_SHIFT;
+		int user_count = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+		int count = size >> PAGE_SHIFT;
+
+		*ret = -ENXIO;
+		if (off < count && user_count <= count - off) {
+			unsigned pfn = mem->pfn_base + start + off;
+			*ret = remap_pfn_range(vma, vma->vm_start, pfn,
+					       user_count << PAGE_SHIFT,
+					       vma->vm_page_prot);
+		}
+		return 1;
+	}
+	return 0;
+}
+EXPORT_SYMBOL(dma_mmap_from_coherent);
diff --git a/include/asm-generic/dma-coherent.h b/include/asm-generic/dma-coherent.h
index 85a3ffa..abfb268 100644
--- a/include/asm-generic/dma-coherent.h
+++ b/include/asm-generic/dma-coherent.h
@@ -3,13 +3,15 @@

 #ifdef CONFIG_HAVE_GENERIC_DMA_COHERENT
 /*
- * These two functions are only for dma allocator.
+ * These three functions are only for dma allocator.
  * Don't use them in device drivers.
  */
 int dma_alloc_from_coherent(struct device *dev, ssize_t size,
 				       dma_addr_t *dma_handle, void **ret);
 int dma_release_from_coherent(struct device *dev, int order, void *vaddr);

+int dma_mmap_from_coherent(struct device *dev, struct vm_area_struct *vma,
+			    void *cpu_addr, size_t size, int *ret);
 /*
  * Standard interface
  */

-- 1.7.1.569.g6f426
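
Editorial sketch, not part of the patch: the range check at the heart of dma_mmap_from_coherent() can be modelled in plain userspace C. The helper name coherent_mmap_first_page(), the selftest function and the 4 KiB PAGE_SHIFT are assumptions made for illustration only. On success the kernel function goes on to call remap_pfn_range(), which this model leaves out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Hypothetical userspace model of the bounds check in
 * dma_mmap_from_coherent().
 *
 * start:    page index of the buffer inside the coherent pool
 * size:     buffer size in bytes
 * vm_pgoff: page offset requested by userspace
 * vma_len:  length of the userspace mapping in bytes
 *
 * Returns the page index (relative to pfn_base) at which the mapping
 * would begin, or -ENXIO if the request does not fit in the buffer.
 */
static long coherent_mmap_first_page(unsigned long start, size_t size,
				     unsigned long vm_pgoff, size_t vma_len)
{
	unsigned long count = size >> PAGE_SHIFT;
	unsigned long user_count = vma_len >> PAGE_SHIFT;

	if (vm_pgoff < count && user_count <= count - vm_pgoff)
		return (long)(start + vm_pgoff);

	return -ENXIO;
}

/* Usage: a 4-page buffer that starts at pool page 8. */
static void coherent_mmap_selftest(void)
{
	/* 3 pages from page offset 1 fit into 4 pages. */
	assert(coherent_mmap_first_page(8, 4 * 4096, 1, 3 * 4096) == 9);
	/* 3 pages from page offset 2 would run past the end. */
	assert(coherent_mmap_first_page(8, 4 * 4096, 2, 3 * 4096) == -ENXIO);
}
```

Writing the second condition as user_count <= count - off, rather than off + user_count <= count, keeps the subtraction from underflowing because off < count has already been checked.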

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 02/10] ARM: dma-mapping: use pr_* instread of printk

Replace all calls to printk with the pr_* family of functions.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
---
 arch/arm/mm/dma-mapping.c |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index db23ae4..366f3a2 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -184,14 +184,14 @@ static int __init consistent_init(void)

 		pud = pud_alloc(&init_mm, pgd, base);
 		if (!pud) {
-			printk(KERN_ERR "%s: no pud tables\n", __func__);
+			pr_err("%s: no pud tables\n", __func__);
 			ret = -ENOMEM;
 			break;
 		}

 		pmd = pmd_alloc(&init_mm, pud, base);
 		if (!pmd) {
-			printk(KERN_ERR "%s: no pmd tables\n", __func__);
+			pr_err("%s: no pmd tables\n", __func__);
 			ret = -ENOMEM;
 			break;
 		}
@@ -199,7 +199,7 @@ static int __init consistent_init(void)

 		pte = pte_alloc_kernel(pmd, base);
 		if (!pte) {
-			printk(KERN_ERR "%s: no pte tables\n", __func__);
+			pr_err("%s: no pte tables\n", __func__);
 			ret = -ENOMEM;
 			break;
 		}
@@ -222,7 +222,7 @@ __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
 	int bit;

 	if (!consistent_pte) {
-		printk(KERN_ERR "%s: not initialised\n", __func__);
+		pr_err("%s: not initialised\n", __func__);
 		dump_stack();
 		return NULL;
 	}
@@ -281,14 +281,14 @@ static void __dma_free_remap(void *cpu_addr, size_t size)

 	c = arm_vmregion_find_remove(&consistent_head, (unsigned long)cpu_addr);
 	if (!c) {
-		printk(KERN_ERR "%s: trying to free invalid coherent area: %p\n",
+		pr_err("%s: trying to free invalid coherent area: %p\n",
 		       __func__, cpu_addr);
 		dump_stack();
 		return;
 	}

 	if ((c->vm_end - c->vm_start) != size) {
-		printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n",
+		pr_err("%s: freeing wrong coherent size (%ld != %d)\n",
 		       __func__, c->vm_end - c->vm_start, size);
 		dump_stack();
 		size = c->vm_end - c->vm_start;
@@ -310,8 +310,8 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 		}

 		if (pte_none(pte) || !pte_present(pte))
-			printk(KERN_CRIT "%s: bad page in kernel page table\n",
-			       __func__);
+			pr_crit("%s: bad page in kernel page table\n",
+				__func__);
 	} while (size -= PAGE_SIZE);

 	flush_tlb_kernel_range(c->vm_start, c->vm_end);

-- 1.7.1.569.g6f426
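
Editorial sketch, not part of the patch: the conversion is mechanical because pr_err() and pr_crit() are thin macros that prepend the log level to the format string. The userspace model below only imitates that. The printk() stand-in, log_buf, pr_selftest() and the "<3>"/"<2>" level strings are assumptions for illustration; the real definitions live in the kernel's printk headers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in log levels (assumed values, for the demo only). */
#define KERN_CRIT "<2>"
#define KERN_ERR  "<3>"

static char log_buf[256];	/* hypothetical capture buffer */

/* Userspace stand-in for printk(): formats into log_buf. */
static int printk(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(log_buf, sizeof(log_buf), fmt, ap);
	va_end(ap);
	return n;
}

/* Same shape as the kernel helpers; ##__VA_ARGS__ is a GNU extension. */
#define pr_fmt(fmt) fmt
#define pr_err(fmt, ...)  printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
#define pr_crit(fmt, ...) printk(KERN_CRIT pr_fmt(fmt), ##__VA_ARGS__)

/* Usage: the old and new spellings produce the same log text. */
static void pr_selftest(void)
{
	char old_text[sizeof(log_buf)];

	printk(KERN_ERR "%s: no pud tables\n", "consistent_init");
	strcpy(old_text, log_buf);

	pr_err("%s: no pud tables\n", "consistent_init");
	assert(strcmp(old_text, log_buf) == 0);
}
```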

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 03/10] ARM: dma-mapping: introduce DMA_ERROR_CODE constant

Replace all uses of ~0 with DMA_ERROR_CODE, which should make the code easier to read.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
---
 arch/arm/common/dmabounce.c        |    6 +++---
 arch/arm/include/asm/dma-mapping.h |    4 +++-
 arch/arm/mm/dma-mapping.c          |    2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 595ecd29..210ad1b 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -254,7 +254,7 @@ static inline dma_addr_t map_single(struct device *dev, void *ptr, size_t size,
 	if (buf == NULL) {
 		dev_err(dev, "%s: unable to map unsafe buffer %p!\n",
 		       __func__, ptr);
-		return ~0;
+		return DMA_ERROR_CODE;
 	}

 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
@@ -320,7 +320,7 @@ dma_addr_t __dma_map_page(struct device *dev, struct page *page,

 	ret = needs_bounce(dev, dma_addr, size);
 	if (ret < 0)
-		return ~0;
+		return DMA_ERROR_CODE;

 	if (ret == 0) {
 		__dma_page_cpu_to_dev(page, offset, size, dir);
@@ -329,7 +329,7 @@ dma_addr_t __dma_map_page(struct device *dev, struct page *page,

 	if (PageHighMem(page)) {
 		dev_err(dev, "DMA buffer bouncing of HIGHMEM pages is not supported\n");
-		return ~0;
+		return DMA_ERROR_CODE;
 	}

 	return map_single(dev, page_address(page) + offset, size, dir);
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index cb3b7c9..6a838da 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -10,6 +10,8 @@
 #include <asm-generic/dma-coherent.h>
 #include <asm/memory.h>

+#define DMA_ERROR_CODE	(~0)
+
 #ifdef __arch_page_to_dma
 #error Please update to __arch_pfn_to_dma
 #endif
@@ -123,7 +125,7 @@ extern int dma_set_mask(struct device *, u64);
  */
 static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
 {
-	return dma_addr == ~0;
+	return dma_addr == DMA_ERROR_CODE;
 }

 /*
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 366f3a2..0d6e203 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -342,7 +342,7 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 	 */
 	gfp &= ~(__GFP_COMP);

-	*handle = ~0;
+	*handle = DMA_ERROR_CODE;
 	size = PAGE_ALIGN(size);

 	page = __dma_alloc_buffer(dev, size, gfp);

-- 1.7.1.569.g6f426
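
Editorial sketch, not part of the patch: a small userspace model of how the named constant reads at a call site. The 32-bit dma_addr_t typedef and the helpers map_model(), mapping_error_model() and dma_error_selftest() are assumptions for illustration. The explicit cast is added here only to keep compilers quiet about the signed/unsigned comparison; the patch itself compares without it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t dma_addr_t;	/* assumption: 32-bit DMA addresses */

#define DMA_ERROR_CODE	(~0)	/* same definition as in the patch */

/* Hypothetical stand-in for a mapping routine that can fail. */
static dma_addr_t map_model(int ok, dma_addr_t addr)
{
	return ok ? addr : (dma_addr_t)DMA_ERROR_CODE;
}

/* Models the all-bits-set test performed by dma_mapping_error(). */
static int mapping_error_model(dma_addr_t dma_addr)
{
	return dma_addr == (dma_addr_t)DMA_ERROR_CODE;
}

/* Usage: callers test the result instead of comparing against ~0. */
static void dma_error_selftest(void)
{
	assert(mapping_error_model(map_model(0, 0x1000)) == 1);
	assert(mapping_error_model(map_model(1, 0x1000)) == 0);
}
```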

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 04/10] ARM: dma-mapping: remove offset parameter to prepare for generic dma_ops

This patch removes the need for the offset parameter in dma bounce functions. This is required to let the dma-mapping framework on the ARM architecture use the common, generic dma_map_ops based dma-mapping helpers.

Background and more detailed explanation:

dma_*_range_* functions have been available since the early days of the dma mapping api. They are the correct way of doing partial syncs on a buffer (usually used by network device drivers). This patch changes only the internal implementation of the dma bounce functions to let them tunnel through the dma_map_ops structure. The driver api stays unchanged, so drivers are obliged to call the dma_*_range_* functions to keep the code clean and easy to understand.

The only drawback of this patch is reduced detection of dma api abuse. Let us consider the following code:

	dma_addr = dma_map_single(dev, ptr, 64, DMA_TO_DEVICE);
	dma_sync_single_range_for_cpu(dev, dma_addr+16, 0, 32, DMA_TO_DEVICE);

Without the patch such code fails, because dma bounce code is unable to find the bounce buffer for the given dma_address. After the patch the above sync call will be equivalent to:

dma_sync_single_range_for_cpu(dev, dma_addr, 16, 32, DMA_TO_DEVICE);

which succeeds.

I don't consider this a real problem, because DMA API abuse should be caught by the debug_dma_* function family. This patch lets us simplify the internal low-level implementation without changing the driver-visible API.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
---
 arch/arm/common/dmabounce.c        |   13 +++++--
 arch/arm/include/asm/dma-mapping.h |   67 +++++++++++++++++------------------
 arch/arm/mm/dma-mapping.c          |    4 +-
 3 files changed, 45 insertions(+), 39 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c
index 210ad1b..32e9cc6 100644
--- a/arch/arm/common/dmabounce.c
+++ b/arch/arm/common/dmabounce.c
@@ -173,7 +173,8 @@ find_safe_buffer(struct dmabounce_device_info *device_info, dma_addr_t safe_dma_
 	read_lock_irqsave(&device_info->lock, flags);

 	list_for_each_entry(b, &device_info->safe_buffers, node)
-		if (b->safe_dma_addr == safe_dma_addr) {
+		if (b->safe_dma_addr <= safe_dma_addr &&
+		    b->safe_dma_addr + b->size > safe_dma_addr) {
 			rb = b;
 			break;
 		}
@@ -362,9 +363,10 @@ void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,
 EXPORT_SYMBOL(__dma_unmap_page);

 int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;

 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
@@ -373,6 +375,8 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 	if (!buf)
 		return 1;

+	off = addr - buf->safe_dma_addr;
+
 	BUG_ON(buf->direction != dir);

 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
@@ -391,9 +395,10 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr,
 EXPORT_SYMBOL(dmabounce_sync_for_cpu);

 int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
-		unsigned long off, size_t sz, enum dma_data_direction dir)
+		size_t sz, enum dma_data_direction dir)
 {
 	struct safe_buffer *buf;
+	unsigned long off;

 	dev_dbg(dev, "%s(dma=%#x,off=%#lx,sz=%zx,dir=%x)\n",
 		__func__, addr, off, sz, dir);
@@ -402,6 +407,8 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr,
 	if (!buf)
 		return 1;

+	off = addr - buf->safe_dma_addr;
+
 	BUG_ON(buf->direction != dir);

 	dev_dbg(dev, "%s: unsafe buffer %p (dma=%#x) mapped to %p (dma=%#x)\n",
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 6a838da..eeddbe2 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -266,19 +266,17 @@ extern void __dma_unmap_page(struct device *, dma_addr_t, size_t,
 /*
  * Private functions
  */
-int dmabounce_sync_for_cpu(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
-int dmabounce_sync_for_device(struct device *, dma_addr_t, unsigned long,
-		size_t, enum dma_data_direction);
+int dmabounce_sync_for_cpu(struct device *, dma_addr_t, size_t, enum dma_data_direction);
+int dmabounce_sync_for_device(struct device *, dma_addr_t, size_t, enum dma_data_direction);
 #else
 static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
+	size_t size, enum dma_data_direction dir)
 {
 	return 1;
 }

 static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr,
-	unsigned long offset, size_t size, enum dma_data_direction dir)
+	size_t size, enum dma_data_direction dir)
 {
 	return 1;
 }
@@ -401,6 +399,33 @@ static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
 	__dma_unmap_page(dev, handle, size, dir);
 }

+
+static inline void dma_sync_single_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	BUG_ON(!valid_dma_direction(dir));
+
+	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
+
+	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
+}
+
+static inline void dma_sync_single_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	BUG_ON(!valid_dma_direction(dir));
+
+	debug_dma_sync_single_for_device(dev, handle, size, dir);
+
+	if (!dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
+}
+
 /**
  * dma_sync_single_range_for_cpu
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -423,40 +448,14 @@ static inline void dma_sync_single_range_for_cpu(struct device *dev,
 		dma_addr_t handle, unsigned long offset, size_t size,
 		enum dma_data_direction dir)
 {
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_cpu(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_cpu(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle) + offset, size, dir);
+	dma_sync_single_for_cpu(dev, handle + offset, size, dir);
 }

 static inline void dma_sync_single_range_for_device(struct device *dev,
 		dma_addr_t handle, unsigned long offset, size_t size,
 		enum dma_data_direction dir)
 {
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_device(dev, handle + offset, size, dir);
-
-	if (!dmabounce_sync_for_device(dev, handle, offset, size, dir))
-		return;
-
-	__dma_single_cpu_to_dev(dma_to_virt(dev, handle) + offset, size, dir);
-}
-
-static inline void dma_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_cpu(dev, handle, 0, size, dir);
-}
-
-static inline void dma_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	dma_sync_single_range_for_device(dev, handle, 0, size, dir);
+	dma_sync_single_for_device(dev, handle + offset, size, dir);
 }

 /*
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0d6e203..cd228c6 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -657,7 +657,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 	int i;

 	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s), 0,
+		if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s),
 					    sg_dma_len(s), dir))
 			continue;

@@ -683,7 +683,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 	int i;

 	for_each_sg(sg, s, nents, i) {
-		if (!dmabounce_sync_for_device(dev, sg_dma_address(s), 0,
+		if (!dmabounce_sync_for_device(dev, sg_dma_address(s),
 					       sg_dma_len(s), dir))
 			continue;

-- 1.7.1.569.g6f426
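
Editorial sketch, not part of the patch: the change to find_safe_buffer() turns an exact-match lookup into a range lookup, and the sync functions then derive the offset from the address they were given. The userspace model below shows that behaviour for the dma_addr+16 case from the commit message. The struct, the array-based lookup (the kernel walks a list under a lock) and all *_model and demo_* names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t dma_addr_t;	/* assumption: 32-bit DMA addresses */

/* Hypothetical cut-down version of struct safe_buffer. */
struct safe_buffer_model {
	dma_addr_t safe_dma_addr;
	size_t size;
};

/* Two demo bounce buffers: 64 bytes at 0x1000, 128 bytes at 0x2000. */
static const struct safe_buffer_model demo_bufs[] = {
	{ 0x1000, 64 },
	{ 0x2000, 128 },
};

/* Range lookup: any address inside a buffer finds that buffer. */
static const struct safe_buffer_model *
find_safe_buffer_model(const struct safe_buffer_model *bufs, size_t n,
		       dma_addr_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (bufs[i].safe_dma_addr <= addr &&
		    bufs[i].safe_dma_addr + bufs[i].size > addr)
			return &bufs[i];
	return NULL;
}

/* Offset as computed by the sync functions, or -1 if nothing matches. */
static long sync_offset_model(const struct safe_buffer_model *bufs, size_t n,
			      dma_addr_t addr)
{
	const struct safe_buffer_model *b = find_safe_buffer_model(bufs, n, addr);

	return b ? (long)(addr - b->safe_dma_addr) : -1;
}

/* Usage: syncing at dma_addr + 16 resolves to offset 16 of the first buffer. */
static void sync_offset_selftest(void)
{
	assert(sync_offset_model(demo_bufs, 2, 0x1000 + 16) == 16);
	assert(sync_offset_model(demo_bufs, 2, 0x1000 + 64) == -1);
}
```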

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 05/10] ARM: dma-mapping: use asm-generic/dma-mapping-common.h

This patch modifies the dma-mapping implementation on the ARM architecture to use the common dma_map_ops structure and the asm-generic/dma-mapping-common.h helpers.
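
Editorial sketch, not part of the patch or its commit message: the core idea is a per-device operations table with a default fallback, as done by get_dma_ops() and set_dma_ops() in the diff below. The userspace model reduces the table to a single callback. Every *_model name, the two map functions and the 0x1000 translation are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct device_model;

/* Hypothetical cut-down ops table with a single callback. */
struct dma_map_ops_model {
	unsigned long (*map_page)(struct device_model *dev, unsigned long phys);
};

struct device_model {
	struct dma_map_ops_model *dma_ops;	/* NULL selects the default */
};

static unsigned long linear_map_page(struct device_model *dev,
				     unsigned long phys)
{
	(void)dev;
	return phys;		/* identity mapping */
}

static unsigned long offset_map_page(struct device_model *dev,
				     unsigned long phys)
{
	(void)dev;
	return phys + 0x1000;	/* made-up translation */
}

static struct dma_map_ops_model default_ops = { .map_page = linear_map_page };
static struct dma_map_ops_model offset_ops = { .map_page = offset_map_page };
static struct device_model demo_dev;	/* starts with dma_ops == NULL */

/* Same fallback logic as get_dma_ops() in the patch. */
static struct dma_map_ops_model *get_dma_ops_model(struct device_model *dev)
{
	if (dev && dev->dma_ops)
		return dev->dma_ops;
	return &default_ops;
}

static void set_dma_ops_model(struct device_model *dev,
			      struct dma_map_ops_model *ops)
{
	assert(dev);		/* the patch uses BUG_ON(!dev) here */
	dev->dma_ops = ops;
}

/* Generic entry point: dispatches through whichever table applies. */
static unsigned long dma_map_page_model(struct device_model *dev,
					unsigned long phys)
{
	return get_dma_ops_model(dev)->map_page(dev, phys);
}
```

With this shape, callers keep using one entry point while individual devices can be switched to a different implementation, which is what the later dmabounce and IOMMU patches in the series rely on.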

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
---
 arch/arm/Kconfig                   |    1 +
 arch/arm/include/asm/device.h      |    1 +
 arch/arm/include/asm/dma-mapping.h |  196 +++++-------------------------------
 arch/arm/mm/dma-mapping.c          |  148 ++++++++++++++++-----------
 4 files changed, 115 insertions(+), 231 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index cf006d4..0fd27d4 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -4,6 +4,7 @@ config ARM
 	select HAVE_AOUT
 	select HAVE_DMA_API_DEBUG
 	select HAVE_IDE if PCI || ISA || PCMCIA
+	select HAVE_DMA_ATTRS
 	select HAVE_MEMBLOCK
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 7aa3680..6e2cb0e 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -7,6 +7,7 @@
 #define ASMARM_DEVICE_H

 struct dev_archdata {
+	struct dma_map_ops	*dma_ops;
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index eeddbe2..6725a08 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -11,6 +11,27 @@
 #include <asm/memory.h>

 #define DMA_ERROR_CODE	(~0)
+extern struct dma_map_ops arm_dma_ops;
+
+static inline struct dma_map_ops *get_dma_ops(struct device *dev)
+{
+	if (dev && dev->archdata.dma_ops)
+		return dev->archdata.dma_ops;
+	return &arm_dma_ops;
+}
+
+static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+{
+	BUG_ON(!dev);
+	dev->archdata.dma_ops = ops;
+}
+
+#include <asm-generic/dma-mapping-common.h>
+
+static inline int dma_set_mask(struct device *dev, u64 mask)
+{
+	return get_dma_ops(dev)->set_dma_mask(dev, mask);
+}

 #ifdef __arch_page_to_dma
 #error Please update to __arch_pfn_to_dma
@@ -119,7 +140,6 @@ static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off,

 extern int dma_supported(struct device *, u64);
 extern int dma_set_mask(struct device *, u64);
-
 /*
  * DMA errors are defined by all-bits-set in the DMA address.
  */
@@ -297,179 +317,17 @@ static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle,
 }
 #endif /* CONFIG_DMABOUNCE */

-/**
- * dma_map_single - map a single buffer for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @cpu_addr: CPU direct mapped address of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_single() or
- * dma_sync_single_for_cpu().
- */
-static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
-		size_t size, enum dma_data_direction dir)
-{
-	unsigned long offset;
-	struct page *page;
-	dma_addr_t addr;
-
-	BUG_ON(!virt_addr_valid(cpu_addr));
-	BUG_ON(!virt_addr_valid(cpu_addr + size - 1));
-	BUG_ON(!valid_dma_direction(dir));
-
-	page = virt_to_page(cpu_addr);
-	offset = (unsigned long)cpu_addr & ~PAGE_MASK;
-	addr = __dma_map_page(dev, page, offset, size, dir);
-	debug_dma_map_page(dev, page, offset, size, dir, addr, true);
-
-	return addr;
-}
-
-/**
- * dma_map_page - map a portion of a page for streaming DMA
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Ensure that any data held in the cache is appropriately discarded
- * or written back.
- *
- * The device owns this memory once this call has completed.  The CPU
- * can regain ownership by calling dma_unmap_page().
- */
-static inline dma_addr_t dma_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir)
-{
-	dma_addr_t addr;
-
-	BUG_ON(!valid_dma_direction(dir));
-
-	addr = __dma_map_page(dev, page, offset, size, dir);
-	debug_dma_map_page(dev, page, offset, size, dir, addr, false);
-
-	return addr;
-}
-
-/**
- * dma_unmap_single - unmap a single buffer previously mapped
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_single)
- * @dir: DMA transfer direction (same as passed to dma_map_single)
- *
- * Unmap a single streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_single() call.
- * All other usages are undefined.
- *
- * After this call, reads by the CPU to the buffer are guaranteed to see
- * whatever the device wrote there.
- */
-static inline void dma_unmap_single(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	debug_dma_unmap_page(dev, handle, size, dir, true);
-	__dma_unmap_page(dev, handle, size, dir);
-}
-
-/**
- * dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * Unmap a page streaming mode DMA translation.  The handle and size
- * must match what was provided in the previous dma_map_page() call.
- * All other usages are undefined.
- *
- * After this call, reads by the CPU to the buffer are guaranteed to see
- * whatever the device wrote there.
- */
-static inline void dma_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir)
-{
-	debug_dma_unmap_page(dev, handle, size, dir, false);
-	__dma_unmap_page(dev, handle, size, dir);
-}
-
-
-static inline void dma_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_cpu(dev, handle, size, dir);
-
-	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
-		return;
-
-	__dma_single_dev_to_cpu(dma_to_virt(dev, handle), size, dir);
-}
-
-static inline void dma_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	BUG_ON(!valid_dma_direction(dir));
-
-	debug_dma_sync_single_for_device(dev, handle, size, dir);
-
-	if (!dmabounce_sync_for_device(dev, handle, size, dir))
-		return;
-
-	__dma_single_cpu_to_dev(dma_to_virt(dev, handle), size, dir);
-}
-
-/**
- * dma_sync_single_range_for_cpu
- * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
- * @handle: DMA address of buffer
- * @offset: offset of region to start sync
- * @size: size of region to sync
- * @dir: DMA transfer direction (same as passed to dma_map_single)
- *
- * Make physical memory consistent for a single streaming mode DMA
- * translation after a transfer.
- *
- * If you perform a dma_map_single() but wish to interrogate the
- * buffer using the cpu, yet do not wish to teardown the PCI dma
- * mapping, you must call this function before doing so.  At the
- * next point you give the PCI dma address back to the card, you
- * must first the perform a dma_sync_for_device, and then the
- * device again owns the buffer.
- */
-static inline void dma_sync_single_range_for_cpu(struct device *dev,
-		dma_addr_t handle, unsigned long offset, size_t size,
-		enum dma_data_direction dir)
-{
-	dma_sync_single_for_cpu(dev, handle + offset, size, dir);
-}
-
-static inline void dma_sync_single_range_for_device(struct device *dev,
-		dma_addr_t handle, unsigned long offset, size_t size,
-		enum dma_data_direction dir)
-{
-	dma_sync_single_for_device(dev, handle + offset, size, dir);
-}
-
 /*
  * The scatter list versions of the above methods.
  */
-extern int dma_map_sg(struct device *, struct scatterlist *, int,
-		enum dma_data_direction);
-extern void dma_unmap_sg(struct device *, struct scatterlist *, int,
+extern int arm_dma_map_sg(struct device *, struct scatterlist *, int,
+		enum dma_data_direction, struct dma_attrs *attrs);
+extern void arm_dma_unmap_sg(struct device *, struct scatterlist *, int,
+		enum dma_data_direction, struct dma_attrs *attrs);
+extern void arm_dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
 		enum dma_data_direction);
-extern void dma_sync_sg_for_cpu(struct device *, struct scatterlist *, int,
+extern void arm_dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
 		enum dma_data_direction);
-extern void dma_sync_sg_for_device(struct device *, struct scatterlist *, int,
-		enum dma_data_direction);
-

 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index cd228c6..92ffb0a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -29,6 +29,85 @@

 #include "mm.h"

+/**
+ * arm_dma_map_page - map a portion of a page for streaming DMA
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @page: page that buffer resides in
+ * @offset: offset into page for start of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * Ensure that any data held in the cache is appropriately discarded
+ * or written back.
+ *
+ * The device owns this memory once this call has completed.  The CPU
+ * can regain ownership by calling dma_unmap_page().
+ */
+static inline dma_addr_t arm_dma_map_page(struct device *dev, struct page *page,
+	     unsigned long offset, size_t size, enum dma_data_direction dir,
+	     struct dma_attrs *attrs)
+{
+	return __dma_map_page(dev, page, offset, size, dir);
+}
+
+/**
+ * arm_dma_unmap_page - unmap a buffer previously mapped through dma_map_page()
+ * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
+ * @handle: DMA address of buffer
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
+ *
+ * Unmap a page streaming mode DMA translation.  The handle and size
+ * must match what was provided in the previous dma_map_page() call.
+ * All other usages are undefined.
+ *
+ * After this call, reads by the CPU to the buffer are guaranteed to see
+ * whatever the device wrote there.
+ */
+static inline void arm_dma_unmap_page(struct device *dev, dma_addr_t handle,
+		size_t size, enum dma_data_direction dir,
+		struct dma_attrs *attrs)
+{
+	__dma_unmap_page(dev, handle, size, dir);
+}
+
+static inline void arm_dma_sync_single_for_cpu(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	unsigned int offset = handle & (PAGE_SIZE - 1);
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
+	if (!dmabounce_sync_for_cpu(dev, handle, size, dir))
+		return;
+
+	__dma_page_dev_to_cpu(page, offset, size, dir);
+}
+
+static inline void arm_dma_sync_single_for_device(struct device *dev,
+		dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
+	unsigned int offset = handle & (PAGE_SIZE - 1);
+	struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset));
+	if (!dmabounce_sync_for_device(dev, handle, size, dir))
+		return;
+
+	__dma_page_cpu_to_dev(page, offset, size, dir);
+}
+
+static int arm_dma_set_mask(struct device *dev, u64 dma_mask);
+
+struct dma_map_ops arm_dma_ops = {
+	.map_page		= arm_dma_map_page,
+	.unmap_page		= arm_dma_unmap_page,
+	.map_sg			= arm_dma_map_sg,
+	.unmap_sg		= arm_dma_unmap_sg,
+	.sync_single_for_cpu	= arm_dma_sync_single_for_cpu,
+	.sync_single_for_device	= arm_dma_sync_single_for_device,
+	.sync_sg_for_cpu	= arm_dma_sync_sg_for_cpu,
+	.sync_sg_for_device	= arm_dma_sync_sg_for_device,
+	.set_dma_mask		= arm_dma_set_mask,
+};
+EXPORT_SYMBOL(arm_dma_ops);
+
 static u64 get_coherent_dma_mask(struct device *dev)
 {
 	u64 mask = (u64)arm_dma_limit;
@@ -458,47 +537,6 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 }
 EXPORT_SYMBOL(dma_free_coherent);

-/*
- * Make an area consistent for devices.
- * Note: Drivers should NOT use this function directly, as it will break
- * platforms with CONFIG_DMABOUNCE.
- * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
- */
-void ___dma_single_cpu_to_dev(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	unsigned long paddr;
-
-	BUG_ON(!virt_addr_valid(kaddr) || !virt_addr_valid(kaddr + size - 1));
-
-	dmac_map_area(kaddr, size, dir);
-
-	paddr = __pa(kaddr);
-	if (dir == DMA_FROM_DEVICE) {
-		outer_inv_range(paddr, paddr + size);
-	} else {
-		outer_clean_range(paddr, paddr + size);
-	}
-	/* FIXME: non-speculating: flush on bidirectional mappings? */
-}
-EXPORT_SYMBOL(___dma_single_cpu_to_dev);
-
-void ___dma_single_dev_to_cpu(const void *kaddr, size_t size,
-	enum dma_data_direction dir)
-{
-	BUG_ON(!virt_addr_valid(kaddr) || !virt_addr_valid(kaddr + size - 1));
-
-	/* FIXME: non-speculating: not required */
-	/* don't bother invalidating if DMA to device */
-	if (dir != DMA_TO_DEVICE) {
-		unsigned long paddr = __pa(kaddr);
-		outer_inv_range(paddr, paddr + size);
-	}
-
-	dmac_unmap_area(kaddr, size, dir);
-}
-EXPORT_SYMBOL(___dma_single_dev_to_cpu);
-
 static void dma_cache_maint_page(struct page *page, unsigned long offset,
 	size_t size, enum dma_data_direction dir,
 	void (*op)(const void *, size_t, int))
@@ -596,21 +634,18 @@ EXPORT_SYMBOL(___dma_page_dev_to_cpu);
  * Device ownership issues as mentioned for dma_map_single are the same
  * here.
  */
-int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
-		enum dma_data_direction dir)
+int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	struct scatterlist *s;
 	int i, j;

-	BUG_ON(!valid_dma_direction(dir));
-
 	for_each_sg(sg, s, nents, i) {
 		s->dma_address = __dma_map_page(dev, sg_page(s), s->offset,
 						s->length, dir);
 		if (dma_mapping_error(dev, s->dma_address))
 			goto bad_mapping;
 	}
-	debug_dma_map_sg(dev, sg, nents, nents, dir);
 	return nents;

  bad_mapping:
@@ -618,7 +653,6 @@ int dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
 	return 0;
 }
-EXPORT_SYMBOL(dma_map_sg);

 /**
  * dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
@@ -630,18 +664,15 @@ EXPORT_SYMBOL(dma_map_sg);
  * Unmap a set of streaming mode DMA translations.  Again, CPU access
  * rules concerning calls here are the same as for dma_unmap_single().
  */
-void dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
-		enum dma_data_direction dir)
+void arm_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
+		enum dma_data_direction dir, struct dma_attrs *attrs)
 {
 	struct scatterlist *s;
 	int i;

-	debug_dma_unmap_sg(dev, sg, nents, dir);
-
 	for_each_sg(sg, s, nents, i)
 		__dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir);
 }
-EXPORT_SYMBOL(dma_unmap_sg);

 /**
  * dma_sync_sg_for_cpu
@@ -650,7 +681,7 @@ EXPORT_SYMBOL(dma_unmap_sg);
  * @nents: number of buffers to map (returned from dma_map_sg)
  * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  */
-void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
+void arm_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 			int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *s;
@@ -664,10 +695,7 @@ void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
 		__dma_page_dev_to_cpu(sg_page(s), s->offset,
 				      s->length, dir);
 	}
-
-	debug_dma_sync_sg_for_cpu(dev, sg, nents, dir);
 }
-EXPORT_SYMBOL(dma_sync_sg_for_cpu);

 /**
  * dma_sync_sg_for_device
@@ -676,7 +704,7 @@ EXPORT_SYMBOL(dma_sync_sg_for_cpu);
  * @nents: number of buffers to map (returned from dma_map_sg)
  * @dir: DMA transfer direction (same as was passed to dma_map_sg)
  */
-void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
+void arm_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 			int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *s;
@@ -690,10 +718,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
 		__dma_page_cpu_to_dev(sg_page(s), s->offset,
 				      s->length, dir);
 	}
-
-	debug_dma_sync_sg_for_device(dev, sg, nents, dir);
 }
-EXPORT_SYMBOL(dma_sync_sg_for_device);
/** * dma_sync_sg_for_device @@ -676,7 +704,7 @@ EXPORT_SYMBOL(dma_sync_sg_for_cpu); * @nents: number of buffers to map (returned from dma_map_sg) * @dir: DMA transfer direction (same as was passed to dma_map_sg) */ -void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, +void arm_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir) { struct scatterlist *s; @@ -690,10 +718,7 @@ void dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); } - - debug_dma_sync_sg_for_device(dev, sg, nents, dir); } -EXPORT_SYMBOL(dma_sync_sg_for_device);

/* * Return whether the given device DMA address mask can be supported @@ -709,7 +734,7 @@ int dma_supported(struct device *dev, u64 mask) } EXPORT_SYMBOL(dma_supported);

-int dma_set_mask(struct device *dev, u64 dma_mask) +static int arm_dma_set_mask(struct device *dev, u64 dma_mask) { if (!dev->dma_mask || !dma_supported(dev, dma_mask)) return -EIO; @@ -720,7 +745,6 @@ int dma_set_mask(struct device *dev, u64 dma_mask)

return 0; } -EXPORT_SYMBOL(dma_set_mask);

#define PREALLOC_DMA_DEBUG_ENTRIES 4096

-- 1.7.1.569.g6f426

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 06/10] ARM: dma-mapping: implement dma sg methods on top of any generic dma ops

This patch makes all dma_sg methods generic, that is, independent of the current DMA mapping implementation for the ARM architecture. All dma sg operations are now implemented on top of the respective dma_map_page/dma_sync_single_for* operations from the dma_map_ops structure.

Before this patch there were custom methods for all scatter/gather related operations. They iterated over the whole scatter list and called the cache related operations directly (which in turn checked whether the dma bounce code was in use and called the respective version). This patch removes that shortcut. Instead, each method runs a similar loop over the scatter list and calls the methods from the device's dma_map_ops structure. This lets us use device dependent implementations of the cache related operations (direct linear or dma bounce), selected by the dma_map_ops structure provided for the device.
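The loop-over-ops pattern described above can be modelled in plain userspace C. The following is only a sketch under stated assumptions: struct map_ops, struct sg_entry, linear_map_page() and the fail_at knob are hypothetical stand-ins invented for illustration, not kernel types. It shows the scatter list code calling map_page/unmap_page through the device's ops table and unwinding the already mapped entries on failure, as arm_dma_map_sg() does in the diff below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;
#define DMA_ERROR ((dma_addr_t)~0ULL)   /* hypothetical error marker */

struct device;

/* Hypothetical, cut-down stand-in for struct dma_map_ops. */
struct map_ops {
	dma_addr_t (*map_page)(struct device *dev, uintptr_t page, size_t len);
	void (*unmap_page)(struct device *dev, dma_addr_t addr, size_t len);
};

struct device {
	const struct map_ops *ops;
	int mapped;    /* number of live mappings */
	int fail_at;   /* index of the map_page call that fails, -1 for never */
	int calls;     /* number of map_page calls so far */
};

struct sg_entry {
	uintptr_t page;
	size_t length;
	dma_addr_t dma_address;
};

/* "Direct linear" implementation: the DMA address equals the page address. */
static dma_addr_t linear_map_page(struct device *dev, uintptr_t page, size_t len)
{
	(void)len;
	if (dev->calls++ == dev->fail_at)
		return DMA_ERROR;
	dev->mapped++;
	return (dma_addr_t)page;
}

static void linear_unmap_page(struct device *dev, dma_addr_t addr, size_t len)
{
	(void)addr;
	(void)len;
	dev->mapped--;
}

static const struct map_ops linear_ops = {
	.map_page = linear_map_page,
	.unmap_page = linear_unmap_page,
};

/* Generic sg mapping: knows nothing about the implementation behind ops. */
static int generic_map_sg(struct device *dev, struct sg_entry *sg, int nents)
{
	const struct map_ops *ops = dev->ops;
	int i, j;

	assert(ops != NULL && ops->map_page != NULL && ops->unmap_page != NULL);

	for (i = 0; i < nents; i++) {
		sg[i].dma_address = ops->map_page(dev, sg[i].page, sg[i].length);
		if (sg[i].dma_address == DMA_ERROR)
			goto bad_mapping;
	}
	return nents;

bad_mapping:
	/* Undo only the entries that were mapped before the failure. */
	for (j = 0; j < i; j++)
		ops->unmap_page(dev, sg[j].dma_address, sg[j].length);
	return 0;
}
```

Swapping linear_ops for a different table (for example a bounce-buffer one) would change the per-entry behaviour without touching generic_map_sg(), which is the point of the conversion.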

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 92ffb0a..c08909e 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -619,7 +619,7 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off, EXPORT_SYMBOL(___dma_page_dev_to_cpu);

/** - * dma_map_sg - map a set of SG buffers for streaming mode DMA + * arm_dma_map_sg - map a set of SG buffers for streaming mode DMA * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @sg: list of buffers * @nents: number of buffers to map @@ -637,12 +637,13 @@ EXPORT_SYMBOL(___dma_page_dev_to_cpu); int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir, struct dma_attrs *attrs) { + struct dma_map_ops *ops = get_dma_ops(dev); struct scatterlist *s; int i, j;

for_each_sg(sg, s, nents, i) { - s->dma_address = __dma_map_page(dev, sg_page(s), s->offset, - s->length, dir); + s->dma_address = ops->map_page(dev, sg_page(s), s->offset, + s->length, dir, attrs); if (dma_mapping_error(dev, s->dma_address)) goto bad_mapping; } @@ -650,12 +651,12 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,

bad_mapping: for_each_sg(sg, s, i, j) - __dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir); + ops->unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir, attrs); return 0; }

/** - * dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg + * arm_dma_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @sg: list of buffers * @nents: number of buffers to unmap (same as was passed to dma_map_sg) @@ -667,15 +668,17 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, void arm_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir, struct dma_attrs *attrs) { + struct dma_map_ops *ops = get_dma_ops(dev); struct scatterlist *s; + int i;

for_each_sg(sg, s, nents, i) - __dma_unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir); + ops->unmap_page(dev, sg_dma_address(s), sg_dma_len(s), dir, attrs); }

/** - * dma_sync_sg_for_cpu + * arm_dma_sync_sg_for_cpu * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @sg: list of buffers * @nents: number of buffers to map (returned from dma_map_sg) @@ -684,21 +687,17 @@ void arm_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, void arm_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir) { + struct dma_map_ops *ops = get_dma_ops(dev); struct scatterlist *s; int i;

- for_each_sg(sg, s, nents, i) { - if (!dmabounce_sync_for_cpu(dev, sg_dma_address(s), - sg_dma_len(s), dir)) - continue; - - __dma_page_dev_to_cpu(sg_page(s), s->offset, - s->length, dir); - } + for_each_sg(sg, s, nents, i) + ops->sync_single_for_cpu(dev, sg_dma_address(s), s->length, + dir); }

/** - * dma_sync_sg_for_device + * arm_dma_sync_sg_for_device * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @sg: list of buffers * @nents: number of buffers to map (returned from dma_map_sg) @@ -707,17 +706,13 @@ void arm_dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, void arm_dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents, enum dma_data_direction dir) { + struct dma_map_ops *ops = get_dma_ops(dev); struct scatterlist *s; int i;

- for_each_sg(sg, s, nents, i) { - if (!dmabounce_sync_for_device(dev, sg_dma_address(s), - sg_dma_len(s), dir)) - continue; - - __dma_page_cpu_to_dev(sg_page(s), s->offset, - s->length, dir); - } + for_each_sg(sg, s, nents, i) + ops->sync_single_for_device(dev, sg_dma_address(s), s->length, + dir); }

-- 1.7.1.569.g6f426

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 07/10] ARM: dma-mapping: move all dma bounce code to separate dma ops structure

This patch removes the dma bounce hooks from the common dma mapping implementation on the ARM architecture and creates a separate set of dma_map_ops for dma bounce devices.
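As a rough illustration of this split, the userspace sketch below keeps two ops tables: a default one that only does cache maintenance, and a bounce one that handles addresses inside its bounce window and delegates everything else to the default table. All names here (struct sync_ops, get_ops(), register_bounce(), the counters) are hypothetical and chosen for the example. The NULL fallback in get_ops() is an assumption about how a per-device ops pointer is resolved, modelled on the set_dma_ops(dev, NULL) call in dmabounce_unregister_dev() below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

struct device;

/* Hypothetical, cut-down ops table with a single sync method. */
struct sync_ops {
	void (*sync_for_cpu)(struct device *dev, dma_addr_t addr, size_t size);
};

struct device {
	const struct sync_ops *ops;  /* NULL means "use the default ops" */
	dma_addr_t bounce_start;     /* start of the bounce window */
	size_t bounce_size;          /* size of the bounce window */
	int cache_syncs;             /* times the default path ran */
	int bounce_copies;           /* times the bounce path ran */
};

/* Default path: stands in for plain cache maintenance. */
static void default_sync_for_cpu(struct device *dev, dma_addr_t addr, size_t size)
{
	(void)addr;
	(void)size;
	dev->cache_syncs++;
}

static const struct sync_ops default_ops = {
	.sync_for_cpu = default_sync_for_cpu,
};

/* Bounce path: handle bounce buffers here, delegate everything else. */
static void bounce_sync_for_cpu(struct device *dev, dma_addr_t addr, size_t size)
{
	if (addr >= dev->bounce_start &&
	    addr - dev->bounce_start < dev->bounce_size) {
		dev->bounce_copies++;   /* would copy the safe buffer back */
		return;
	}
	default_ops.sync_for_cpu(dev, addr, size);
}

static const struct sync_ops bounce_ops = {
	.sync_for_cpu = bounce_sync_for_cpu,
};

/* Resolve the ops for a device, falling back to the default table. */
static const struct sync_ops *get_ops(const struct device *dev)
{
	assert(dev != NULL);
	return dev->ops ? dev->ops : &default_ops;
}

static void register_bounce(struct device *dev, dma_addr_t start, size_t size)
{
	dev->bounce_start = start;
	dev->bounce_size = size;
	dev->ops = &bounce_ops;
}

static void unregister_bounce(struct device *dev)
{
	dev->ops = NULL;
}
```

With this shape the default implementation needs no bounce checks at all; only devices that registered for bouncing pay for the extra lookup.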

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
Acked-by: Kyungmin Park kyungmin.park@samsung.com
Tested-By: Subash Patel subash.ramaswamy@linaro.org
---
 arch/arm/common/dmabounce.c | 62 ++++++++++++++++++-----
 arch/arm/include/asm/dma-mapping.h | 99 +-----------------------------------
 arch/arm/mm/dma-mapping.c | 79 +++++++++++++++++++++++++----
 3 files changed, 120 insertions(+), 120 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c index 32e9cc6..a4874af 100644 --- a/arch/arm/common/dmabounce.c +++ b/arch/arm/common/dmabounce.c @@ -308,8 +308,9 @@ static inline void unmap_single(struct device *dev, struct safe_buffer *buf, * substitute the safe buffer for the unsafe one. * (basically move the buffer from an unsafe area to a safe one) */ -dma_addr_t __dma_map_page(struct device *dev, struct page *page, - unsigned long offset, size_t size, enum dma_data_direction dir) +static dma_addr_t dmabounce_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, enum dma_data_direction dir, + struct dma_attrs *attrs) { dma_addr_t dma_addr; int ret; @@ -324,7 +325,7 @@ dma_addr_t __dma_map_page(struct device *dev, struct page *page, return DMA_ERROR_CODE;

if (ret == 0) { - __dma_page_cpu_to_dev(page, offset, size, dir); + arm_dma_ops.sync_single_for_device(dev, dma_addr, size, dir); return dma_addr; }

@@ -335,7 +336,6 @@ dma_addr_t __dma_map_page(struct device *dev, struct page *page,

return map_single(dev, page_address(page) + offset, size, dir); } -EXPORT_SYMBOL(__dma_map_page);

/* * see if a mapped address was really a "safe" buffer and if so, copy @@ -343,8 +343,8 @@ EXPORT_SYMBOL(__dma_map_page); * the safe buffer. (basically return things back to the way they * should be) */ -void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size, - enum dma_data_direction dir) +static void dmabounce_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size, + enum dma_data_direction dir, struct dma_attrs *attrs) { struct safe_buffer *buf;

@@ -353,16 +353,14 @@ void __dma_unmap_page(struct device *dev, dma_addr_t dma_addr, size_t size,

buf = find_safe_buffer_dev(dev, dma_addr, __func__); if (!buf) { - __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, dma_addr)), - dma_addr & ~PAGE_MASK, size, dir); + arm_dma_ops.sync_single_for_cpu(dev, dma_addr, size, dir); return; }

unmap_single(dev, buf, size, dir); } -EXPORT_SYMBOL(__dma_unmap_page);

-int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr, +static int __dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr, size_t sz, enum dma_data_direction dir) { struct safe_buffer *buf; @@ -392,9 +390,17 @@ int dmabounce_sync_for_cpu(struct device *dev, dma_addr_t addr, } return 0; } -EXPORT_SYMBOL(dmabounce_sync_for_cpu);

-int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr, +static void dmabounce_sync_for_cpu(struct device *dev, + dma_addr_t handle, size_t size, enum dma_data_direction dir) +{ + if (!__dmabounce_sync_for_cpu(dev, handle, size, dir)) + return; + + arm_dma_ops.sync_single_for_cpu(dev, handle, size, dir); +} + +static int __dmabounce_sync_for_device(struct device *dev, dma_addr_t addr, size_t sz, enum dma_data_direction dir) { struct safe_buffer *buf; @@ -424,7 +430,35 @@ int dmabounce_sync_for_device(struct device *dev, dma_addr_t addr, } return 0; } -EXPORT_SYMBOL(dmabounce_sync_for_device); + +static void dmabounce_sync_for_device(struct device *dev, + dma_addr_t handle, size_t size, enum dma_data_direction dir) +{ + if (!__dmabounce_sync_for_device(dev, handle, size, dir)) + return; + + arm_dma_ops.sync_single_for_device(dev, handle, size, dir); +} + +static int dmabounce_set_mask(struct device *dev, u64 dma_mask) +{ + if (dev->archdata.dmabounce) + return 0; + + return arm_dma_ops.set_dma_mask(dev, dma_mask); +} + +static struct dma_map_ops dmabounce_ops = { + .map_page = dmabounce_map_page, + .unmap_page = dmabounce_unmap_page, + .sync_single_for_cpu = dmabounce_sync_for_cpu, + .sync_single_for_device = dmabounce_sync_for_device, + .map_sg = generic_dma_map_sg, + .unmap_sg = generic_dma_unmap_sg, + .sync_sg_for_cpu = generic_dma_sync_sg_for_cpu, + .sync_sg_for_device = generic_dma_sync_sg_for_device, + .set_dma_mask = dmabounce_set_mask, +};

static int dmabounce_init_pool(struct dmabounce_pool *pool, struct device *dev, const char *name, unsigned long size) @@ -486,6 +520,7 @@ int dmabounce_register_dev(struct device *dev, unsigned long small_buffer_size, #endif

dev->archdata.dmabounce = device_info; + set_dma_ops(dev, &dmabounce_ops);

dev_info(dev, "dmabounce: registered device\n");

@@ -504,6 +539,7 @@ void dmabounce_unregister_dev(struct device *dev) struct dmabounce_device_info *device_info = dev->archdata.dmabounce;

dev->archdata.dmabounce = NULL; + set_dma_ops(dev, NULL);

if (!device_info) { dev_warn(dev, diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h index 6725a08..7a7c3c7 100644 --- a/arch/arm/include/asm/dma-mapping.h +++ b/arch/arm/include/asm/dma-mapping.h @@ -85,62 +85,6 @@ static inline dma_addr_t virt_to_dma(struct device *dev, void *addr) #endif

/* - * The DMA API is built upon the notion of "buffer ownership". A buffer - * is either exclusively owned by the CPU (and therefore may be accessed - * by it) or exclusively owned by the DMA device. These helper functions - * represent the transitions between these two ownership states. - * - * Note, however, that on later ARMs, this notion does not work due to - * speculative prefetches. We model our approach on the assumption that - * the CPU does do speculative prefetches, which means we clean caches - * before transfers and delay cache invalidation until transfer completion. - * - * Private support functions: these are not part of the API and are - * liable to change. Drivers must not use these. - */ -static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size, - enum dma_data_direction dir) -{ - extern void ___dma_single_cpu_to_dev(const void *, size_t, - enum dma_data_direction); - - if (!arch_is_coherent()) - ___dma_single_cpu_to_dev(kaddr, size, dir); -} - -static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size, - enum dma_data_direction dir) -{ - extern void ___dma_single_dev_to_cpu(const void *, size_t, - enum dma_data_direction); - - if (!arch_is_coherent()) - ___dma_single_dev_to_cpu(kaddr, size, dir); -} - -static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) -{ - extern void ___dma_page_cpu_to_dev(struct page *, unsigned long, - size_t, enum dma_data_direction); - - if (!arch_is_coherent()) - ___dma_page_cpu_to_dev(page, off, size, dir); -} - -static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) -{ - extern void ___dma_page_dev_to_cpu(struct page *, unsigned long, - size_t, enum dma_data_direction); - - if (!arch_is_coherent()) - ___dma_page_dev_to_cpu(page, off, size, dir); -} - -extern int dma_supported(struct device *, u64); -extern int dma_set_mask(struct device *, u64); -/* 
* DMA errors are defined by all-bits-set in the DMA address. */ static inline int dma_mapping_error(struct device *dev, dma_addr_t dma_addr) @@ -163,6 +107,8 @@ static inline void dma_free_noncoherent(struct device *dev, size_t size, { }

+extern int dma_supported(struct device *dev, u64 mask); + /** * dma_alloc_coherent - allocate consistent memory for DMA * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices @@ -235,7 +181,6 @@ int dma_mmap_writecombine(struct device *, struct vm_area_struct *, extern void __init init_consistent_dma_size(unsigned long size);

-#ifdef CONFIG_DMABOUNCE /* * For SA-1111, IXP425, and ADI systems the dma-mapping functions are "magic" * and utilize bounce buffers as needed to work around limited DMA windows. @@ -275,47 +220,7 @@ extern int dmabounce_register_dev(struct device *, unsigned long, */ extern void dmabounce_unregister_dev(struct device *);

-/* - * The DMA API, implemented by dmabounce.c. See below for descriptions. - */ -extern dma_addr_t __dma_map_page(struct device *, struct page *, - unsigned long, size_t, enum dma_data_direction); -extern void __dma_unmap_page(struct device *, dma_addr_t, size_t, - enum dma_data_direction); - -/* - * Private functions - */ -int dmabounce_sync_for_cpu(struct device *, dma_addr_t, size_t, enum dma_data_direction); -int dmabounce_sync_for_device(struct device *, dma_addr_t, size_t, enum dma_data_direction); -#else -static inline int dmabounce_sync_for_cpu(struct device *d, dma_addr_t addr, - size_t size, enum dma_data_direction dir) -{ - return 1; -} - -static inline int dmabounce_sync_for_device(struct device *d, dma_addr_t addr, - size_t size, enum dma_data_direction dir) -{ - return 1; -} -

-static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page, - unsigned long offset, size_t size, enum dma_data_direction dir) -{ - __dma_page_cpu_to_dev(page, offset, size, dir); - return pfn_to_dma(dev, page_to_pfn(page)) + offset; -} - -static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle, - size_t size, enum dma_data_direction dir) -{ - __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)), - handle & ~PAGE_MASK, size, dir); -} -#endif /* CONFIG_DMABOUNCE */

/* * The scatter list versions of the above methods. diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index c08909e..a4db5ec 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -29,6 +29,75 @@

#include "mm.h"

+/* + * The DMA API is built upon the notion of "buffer ownership". A buffer + * is either exclusively owned by the CPU (and therefore may be accessed + * by it) or exclusively owned by the DMA device. These helper functions + * represent the transitions between these two ownership states. + * + * Note, however, that on later ARMs, this notion does not work due to + * speculative prefetches. We model our approach on the assumption that + * the CPU does do speculative prefetches, which means we clean caches + * before transfers and delay cache invalidation until transfer completion. + * + * Private support functions: these are not part of the API and are + * liable to change. Drivers must not use these. + */ +static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size, + enum dma_data_direction dir) +{ + extern void ___dma_single_cpu_to_dev(const void *, size_t, + enum dma_data_direction); + + if (!arch_is_coherent()) + ___dma_single_cpu_to_dev(kaddr, size, dir); +} + +static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size, + enum dma_data_direction dir) +{ + extern void ___dma_single_dev_to_cpu(const void *, size_t, + enum dma_data_direction); + + if (!arch_is_coherent()) + ___dma_single_dev_to_cpu(kaddr, size, dir); +} + +static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off, + size_t size, enum dma_data_direction dir) +{ + extern void ___dma_page_cpu_to_dev(struct page *, unsigned long, + size_t, enum dma_data_direction); + + if (!arch_is_coherent()) + ___dma_page_cpu_to_dev(page, off, size, dir); +} + +static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off, + size_t size, enum dma_data_direction dir) +{ + extern void ___dma_page_dev_to_cpu(struct page *, unsigned long, + size_t, enum dma_data_direction); + + if (!arch_is_coherent()) + ___dma_page_dev_to_cpu(page, off, size, dir); +} + + +static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page, + unsigned long 
offset, size_t size, enum dma_data_direction dir) +{ + __dma_page_cpu_to_dev(page, offset, size, dir); + return pfn_to_dma(dev, page_to_pfn(page)) + offset; +} + +static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir) +{ + __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)), + handle & ~PAGE_MASK, size, dir); +} + /** * arm_dma_map_page - map a portion of a page for streaming DMA * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices @@ -76,9 +145,6 @@ static inline void arm_dma_sync_single_for_cpu(struct device *dev, { unsigned int offset = handle & (PAGE_SIZE - 1); struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset)); - if (!dmabounce_sync_for_cpu(dev, handle, size, dir)) - return; - __dma_page_dev_to_cpu(page, offset, size, dir); }

@@ -87,9 +153,6 @@ static inline void arm_dma_sync_single_for_device(struct device *dev, { unsigned int offset = handle & (PAGE_SIZE - 1); struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset)); - if (!dmabounce_sync_for_device(dev, handle, size, dir)) - return; - __dma_page_cpu_to_dev(page, offset, size, dir); }

@@ -596,7 +659,6 @@ void ___dma_page_cpu_to_dev(struct page *page, unsigned long off, } /* FIXME: non-speculating: flush on bidirectional mappings? */ } -EXPORT_SYMBOL(___dma_page_cpu_to_dev);

void ___dma_page_dev_to_cpu(struct page *page, unsigned long off, size_t size, enum dma_data_direction dir) @@ -616,7 +678,6 @@ void ___dma_page_dev_to_cpu(struct page *page, unsigned long off, if (dir != DMA_TO_DEVICE && off == 0 && size >= PAGE_SIZE) set_bit(PG_dcache_clean, &page->flags); } -EXPORT_SYMBOL(___dma_page_dev_to_cpu);

/** * arm_dma_map_sg - map a set of SG buffers for streaming mode DMA @@ -734,9 +795,7 @@ static int arm_dma_set_mask(struct device *dev, u64 dma_mask) if (!dev->dma_mask || !dma_supported(dev, dma_mask)) return -EIO;

-#ifndef CONFIG_DMABOUNCE *dev->dma_mask = dma_mask; -#endif

return 0; }

-- 1.7.1.569.g6f426

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 08/10] ARM: dma-mapping: remove redundant code and cleanup

This patch performs a global cleanup of the DMA mapping implementation for the ARM architecture. Some of the tiny helper functions have been folded into their callers, and some have been merged together.

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
Acked-by: Kyungmin Park kyungmin.park@samsung.com
Acked-by: Arnd Bergmann arnd@arndb.de
Tested-By: Subash Patel subash.ramaswamy@linaro.org
---
 arch/arm/mm/dma-mapping.c | 88 ++++++++++++--------------------------------
 1 files changed, 24 insertions(+), 64 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index a4db5ec..615fabd 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -40,64 +40,12 @@ * the CPU does do speculative prefetches, which means we clean caches * before transfers and delay cache invalidation until transfer completion. * - * Private support functions: these are not part of the API and are - * liable to change. Drivers must not use these. */ -static inline void __dma_single_cpu_to_dev(const void *kaddr, size_t size, - enum dma_data_direction dir) -{ - extern void ___dma_single_cpu_to_dev(const void *, size_t, - enum dma_data_direction); - - if (!arch_is_coherent()) - ___dma_single_cpu_to_dev(kaddr, size, dir); -} - -static inline void __dma_single_dev_to_cpu(const void *kaddr, size_t size, - enum dma_data_direction dir) -{ - extern void ___dma_single_dev_to_cpu(const void *, size_t, - enum dma_data_direction); - - if (!arch_is_coherent()) - ___dma_single_dev_to_cpu(kaddr, size, dir); -} - -static inline void __dma_page_cpu_to_dev(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) -{ - extern void ___dma_page_cpu_to_dev(struct page *, unsigned long, +static void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t, enum dma_data_direction); - - if (!arch_is_coherent()) - ___dma_page_cpu_to_dev(page, off, size, dir); -} - -static inline void __dma_page_dev_to_cpu(struct page *page, unsigned long off, - size_t size, enum dma_data_direction dir) -{ - extern void ___dma_page_dev_to_cpu(struct page *, unsigned long, +static void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t, enum dma_data_direction);

- if (!arch_is_coherent()) - ___dma_page_dev_to_cpu(page, off, size, dir); -} - - -static inline dma_addr_t __dma_map_page(struct device *dev, struct page *page, - unsigned long offset, size_t size, enum dma_data_direction dir) -{ - __dma_page_cpu_to_dev(page, offset, size, dir); - return pfn_to_dma(dev, page_to_pfn(page)) + offset; -} - -static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle, - size_t size, enum dma_data_direction dir) -{ - __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)), - handle & ~PAGE_MASK, size, dir); -} - /** * arm_dma_map_page - map a portion of a page for streaming DMA * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices @@ -112,11 +60,13 @@ static inline void __dma_unmap_page(struct device *dev, dma_addr_t handle, * The device owns this memory once this call has completed. The CPU * can regain ownership by calling dma_unmap_page(). */ -static inline dma_addr_t arm_dma_map_page(struct device *dev, struct page *page, +static dma_addr_t arm_dma_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs) { - return __dma_map_page(dev, page, offset, size, dir); + if (!arch_is_coherent()) + __dma_page_cpu_to_dev(page, offset, size, dir); + return pfn_to_dma(dev, page_to_pfn(page)) + offset; }

/** @@ -133,27 +83,31 @@ static inline dma_addr_t arm_dma_map_page(struct device *dev, struct page *page, * After this call, reads by the CPU to the buffer are guaranteed to see * whatever the device wrote there. */ -static inline void arm_dma_unmap_page(struct device *dev, dma_addr_t handle, +static void arm_dma_unmap_page(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir, struct dma_attrs *attrs) { - __dma_unmap_page(dev, handle, size, dir); + if (!arch_is_coherent()) + __dma_page_dev_to_cpu(pfn_to_page(dma_to_pfn(dev, handle)), + handle & ~PAGE_MASK, size, dir); }

-static inline void arm_dma_sync_single_for_cpu(struct device *dev, +static void arm_dma_sync_single_for_cpu(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir) { unsigned int offset = handle & (PAGE_SIZE - 1); struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset)); - __dma_page_dev_to_cpu(page, offset, size, dir); + if (!arch_is_coherent()) + __dma_page_dev_to_cpu(page, offset, size, dir); }

-static inline void arm_dma_sync_single_for_device(struct device *dev, +static void arm_dma_sync_single_for_device(struct device *dev, dma_addr_t handle, size_t size, enum dma_data_direction dir) { unsigned int offset = handle & (PAGE_SIZE - 1); struct page *page = pfn_to_page(dma_to_pfn(dev, handle-offset)); - __dma_page_cpu_to_dev(page, offset, size, dir); + if (!arch_is_coherent()) + __dma_page_cpu_to_dev(page, offset, size, dir); }

static int arm_dma_set_mask(struct device *dev, u64 dma_mask); @@ -644,7 +598,13 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset, } while (left); }

-void ___dma_page_cpu_to_dev(struct page *page, unsigned long off, +/* + * Make an area consistent for devices. + * Note: Drivers should NOT use this function directly, as it will break + * platforms with CONFIG_DMABOUNCE. + * Use the driver DMA support - see dma-mapping.h (dma_sync_*) + */ +static void __dma_page_cpu_to_dev(struct page *page, unsigned long off, size_t size, enum dma_data_direction dir) { unsigned long paddr; @@ -660,7 +620,7 @@ void ___dma_page_cpu_to_dev(struct page *page, unsigned long off, /* FIXME: non-speculating: flush on bidirectional mappings? */ }

-void ___dma_page_dev_to_cpu(struct page *page, unsigned long off, +static void __dma_page_dev_to_cpu(struct page *page, unsigned long off, size_t size, enum dma_data_direction dir) { unsigned long paddr = page_to_phys(page) + off;

-- 1.7.1.569.g6f426

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 09/10] ARM: dma-mapping: use alloc, mmap, free from dma_ops

This patch converts the dma_alloc/free/mmap_{coherent,writecombine} functions to use the generic alloc/free/mmap methods from the dma_map_ops structure. A new DMA_ATTR_WRITE_COMBINE DMA attribute has been introduced to implement the writecombine methods.
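The attribute-based dispatch can be sketched in userspace C as follows. This is a simplified model, not the kernel code: struct attrs, struct buffer, model_alloc() and the mem_type tag are hypothetical names, and malloc() merely stands in for the real page allocation. What it illustrates is that the coherent and writecombine entry points become thin wrappers around a single alloc method, differing only in the attributes they pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical attribute set, loosely modelled on struct dma_attrs. */
enum attr { ATTR_WRITE_COMBINE };

struct attrs {
	unsigned long flags;
};

static void set_attr(enum attr a, struct attrs *attrs)
{
	attrs->flags |= 1UL << a;
}

/* A NULL attrs pointer means "no attributes set". */
static bool get_attr(enum attr a, const struct attrs *attrs)
{
	return attrs != NULL && (attrs->flags & (1UL << a)) != 0;
}

enum mem_type { MEM_UNCACHED, MEM_WRITE_COMBINE };

struct buffer {
	void *cpu_addr;
	enum mem_type type;   /* which mapping type the allocator picked */
};

/* Hypothetical, cut-down ops table with alloc and free only. */
struct alloc_ops {
	struct buffer (*alloc)(size_t size, const struct attrs *attrs);
	void (*free)(struct buffer *buf);
};

/* One allocator serves both variants; the attribute selects the type. */
static struct buffer model_alloc(size_t size, const struct attrs *attrs)
{
	struct buffer buf;

	buf.cpu_addr = malloc(size);
	buf.type = get_attr(ATTR_WRITE_COMBINE, attrs) ?
			MEM_WRITE_COMBINE : MEM_UNCACHED;
	return buf;
}

static void model_free(struct buffer *buf)
{
	free(buf->cpu_addr);
	buf->cpu_addr = NULL;
}

static const struct alloc_ops model_ops = {
	.alloc = model_alloc,
	.free = model_free,
};

static struct buffer alloc_attrs(const struct alloc_ops *ops, size_t size,
				 const struct attrs *attrs)
{
	assert(ops != NULL);   /* plays the role of BUG_ON(!ops) */
	return ops->alloc(size, attrs);
}

/* Coherent allocation: no attributes. */
static struct buffer alloc_coherent(const struct alloc_ops *ops, size_t size)
{
	return alloc_attrs(ops, size, NULL);
}

/* Writecombine allocation: same path, one attribute set. */
static struct buffer alloc_writecombine(const struct alloc_ops *ops, size_t size)
{
	struct attrs attrs = { 0 };

	set_attr(ATTR_WRITE_COMBINE, &attrs);
	return alloc_attrs(ops, size, &attrs);
}
```

Adding another allocation flavour in this model would mean adding one attribute and one wrapper, rather than a new exported function per flavour.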

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
Acked-by: Kyungmin Park kyungmin.park@samsung.com
Acked-by: Arnd Bergmann arnd@arndb.de
Tested-By: Subash Patel subash.ramaswamy@linaro.org
---
 arch/arm/common/dmabounce.c | 3 +
 arch/arm/include/asm/dma-mapping.h | 107 ++++++++++++++++++++++++++----------
 arch/arm/mm/dma-mapping.c | 54 ++++++------------
 3 files changed, 98 insertions(+), 66 deletions(-)

diff --git a/arch/arm/common/dmabounce.c b/arch/arm/common/dmabounce.c index a4874af..fefe040 100644 --- a/arch/arm/common/dmabounce.c +++ b/arch/arm/common/dmabounce.c @@ -449,6 +449,9 @@ static int dmabounce_set_mask(struct device *dev, u64 dma_mask) }

static struct dma_map_ops dmabounce_ops = { + .alloc = arm_dma_alloc, + .free = arm_dma_free, + .mmap = arm_dma_mmap, .map_page = dmabounce_map_page, .unmap_page = dmabounce_unmap_page, .sync_single_for_cpu = dmabounce_sync_for_cpu, diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h index 7a7c3c7..bbef15d 100644 --- a/arch/arm/include/asm/dma-mapping.h +++ b/arch/arm/include/asm/dma-mapping.h @@ -5,6 +5,7 @@

#include <linux/mm_types.h> #include <linux/scatterlist.h> +#include <linux/dma-attrs.h> #include <linux/dma-debug.h>

#include <asm-generic/dma-coherent.h> @@ -110,68 +111,115 @@ static inline void dma_free_noncoherent(struct device *dev, size_t size, extern int dma_supported(struct device *dev, u64 mask);

/** - * dma_alloc_coherent - allocate consistent memory for DMA + * arm_dma_alloc - allocate consistent memory for DMA * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @size: required memory size * @handle: bus-specific DMA address + * @attrs: optinal attributes that specific mapping properties * - * Allocate some uncached, unbuffered memory for a device for - * performing DMA. This function allocates pages, and will - * return the CPU-viewed address, and sets @handle to be the - * device-viewed address. + * Allocate some memory for a device for performing DMA. This function + * allocates pages, and will return the CPU-viewed address, and sets @handle + * to be the device-viewed address. */ -extern void *dma_alloc_coherent(struct device *, size_t, dma_addr_t *, gfp_t); +extern void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, + gfp_t gfp, struct dma_attrs *attrs); + +#define dma_alloc_coherent(d, s, h, f) dma_alloc_attrs(d, s, h, f, NULL) + +static inline void *dma_alloc_attrs(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t flag, + struct dma_attrs *attrs) +{ + struct dma_map_ops *ops = get_dma_ops(dev); + void *cpu_addr; + BUG_ON(!ops); + + cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs); + debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr); + return cpu_addr; +}

/** - * dma_free_coherent - free memory allocated by dma_alloc_coherent + * arm_dma_free - free memory allocated by arm_dma_alloc * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @size: size of memory originally requested in dma_alloc_coherent * @cpu_addr: CPU-view address returned from dma_alloc_coherent * @handle: device-view address returned from dma_alloc_coherent + * @attrs: optinal attributes that specific mapping properties * * Free (and unmap) a DMA buffer previously allocated by - * dma_alloc_coherent(). + * arm_dma_alloc(). * * References to memory and mappings associated with cpu_addr/handle * during and after this call executing are illegal. */ -extern void dma_free_coherent(struct device *, size_t, void *, dma_addr_t); +extern void arm_dma_free(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t handle, struct dma_attrs *attrs); + +#define dma_free_coherent(d, s, c, h) dma_free_attrs(d, s, c, h, NULL) + +static inline void dma_free_attrs(struct device *dev, size_t size, + void *cpu_addr, dma_addr_t dma_handle, + struct dma_attrs *attrs) +{ + struct dma_map_ops *ops = get_dma_ops(dev); + BUG_ON(!ops); + + debug_dma_free_coherent(dev, size, cpu_addr, dma_handle); + ops->free(dev, size, cpu_addr, dma_handle, attrs); +}

/** - * dma_mmap_coherent - map a coherent DMA allocation into user space + * arm_dma_mmap - map a coherent DMA allocation into user space * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices * @vma: vm_area_struct describing requested user mapping * @cpu_addr: kernel CPU-view address returned from dma_alloc_coherent * @handle: device-view address returned from dma_alloc_coherent * @size: size of memory originally requested in dma_alloc_coherent + * @attrs: optional attributes that specify mapping properties * * Map a coherent DMA buffer previously allocated by dma_alloc_coherent * into user space. The coherent DMA buffer must not be freed by the * driver until the user space mapping has been released. */ -int dma_mmap_coherent(struct device *, struct vm_area_struct *, - void *, dma_addr_t, size_t); +extern int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + struct dma_attrs *attrs);

+#define dma_mmap_coherent(d, v, c, h, s) dma_mmap_attrs(d, v, c, h, s, NULL)

-/** - * dma_alloc_writecombine - allocate writecombining memory for DMA - * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices - * @size: required memory size - * @handle: bus-specific DMA address - * - * Allocate some uncached, buffered memory for a device for - * performing DMA. This function allocates pages, and will - * return the CPU-viewed address, and sets @handle to be the - * device-viewed address. - */ -extern void *dma_alloc_writecombine(struct device *, size_t, dma_addr_t *, - gfp_t); +static inline int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, + size_t size, struct dma_attrs *attrs) +{ + struct dma_map_ops *ops = get_dma_ops(dev); + BUG_ON(!ops); + return ops->mmap(dev, vma, cpu_addr, dma_addr, size, attrs); +}

-#define dma_free_writecombine(dev,size,cpu_addr,handle) \ - dma_free_coherent(dev,size,cpu_addr,handle) +static inline void *dma_alloc_writecombine(struct device *dev, size_t size, + dma_addr_t *dma_handle, gfp_t flag) +{ + DEFINE_DMA_ATTRS(attrs); + dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs); + return dma_alloc_attrs(dev, size, dma_handle, flag, &attrs); +}

-int dma_mmap_writecombine(struct device *, struct vm_area_struct *, - void *, dma_addr_t, size_t); +static inline void dma_free_writecombine(struct device *dev, size_t size, + void *cpu_addr, dma_addr_t dma_handle) +{ + DEFINE_DMA_ATTRS(attrs); + dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs); + return dma_free_attrs(dev, size, cpu_addr, dma_handle, &attrs); +} + +static inline int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size) +{ + DEFINE_DMA_ATTRS(attrs); + dma_set_attr(DMA_ATTR_WRITE_COMBINE, &attrs); + return dma_mmap_attrs(dev, vma, cpu_addr, dma_addr, size, &attrs); +}

/* * This can be called during boot to increase the size of the consistent @@ -180,7 +228,6 @@ int dma_mmap_writecombine(struct device *, struct vm_area_struct *, */ extern void __init init_consistent_dma_size(unsigned long size);

- /* * For SA-1111, IXP425, and ADI systems the dma-mapping functions are "magic" * and utilize bounce buffers as needed to work around limited DMA windows. diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 615fabd..d4aad65 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -113,6 +113,9 @@ static void arm_dma_sync_single_for_device(struct device *dev, static int arm_dma_set_mask(struct device *dev, u64 dma_mask);

struct dma_map_ops arm_dma_ops = { + .alloc = arm_dma_alloc, + .free = arm_dma_free, + .mmap = arm_dma_mmap, .map_page = arm_dma_map_page, .unmap_page = arm_dma_unmap_page, .map_sg = arm_dma_map_sg, @@ -462,35 +465,27 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, * Allocate DMA-coherent memory space and return both the kernel remapped * virtual and bus address for that space. */ -void * -dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp) +void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, + gfp_t gfp, struct dma_attrs *attrs) { + pgprot_t prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ? + pgprot_writecombine(pgprot_kernel) : + pgprot_dmacoherent(pgprot_kernel); void *memory;

if (dma_alloc_from_coherent(dev, size, handle, &memory)) return memory;

- return __dma_alloc(dev, size, handle, gfp, - pgprot_dmacoherent(pgprot_kernel), + return __dma_alloc(dev, size, handle, gfp, prot, __builtin_return_address(0)); } -EXPORT_SYMBOL(dma_alloc_coherent);

/* - * Allocate a writecombining region, in much the same way as - * dma_alloc_coherent above. + * Create userspace mapping for the DMA-coherent memory. */ -void * -dma_alloc_writecombine(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp) -{ - return __dma_alloc(dev, size, handle, gfp, - pgprot_writecombine(pgprot_kernel), - __builtin_return_address(0)); -} -EXPORT_SYMBOL(dma_alloc_writecombine); - -static int dma_mmap(struct device *dev, struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t dma_addr, size_t size) +int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + struct dma_attrs *attrs) { int ret = -ENXIO; #ifdef CONFIG_MMU @@ -498,6 +493,9 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma, struct arm_vmregion *c;

user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; + vma->vm_page_prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ? + pgprot_writecombine(vma->vm_page_prot) : + pgprot_dmacoherent(vma->vm_page_prot);

c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr); if (c) { @@ -518,27 +516,12 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma, return ret; }

-int dma_mmap_coherent(struct device *dev, struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t dma_addr, size_t size) -{ - vma->vm_page_prot = pgprot_dmacoherent(vma->vm_page_prot); - return dma_mmap(dev, vma, cpu_addr, dma_addr, size); -} -EXPORT_SYMBOL(dma_mmap_coherent); - -int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma, - void *cpu_addr, dma_addr_t dma_addr, size_t size) -{ - vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); - return dma_mmap(dev, vma, cpu_addr, dma_addr, size); -} -EXPORT_SYMBOL(dma_mmap_writecombine); - /* * free a page as defined by the above mapping. * Must not be called with IRQs disabled. */ -void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle) +void arm_dma_free(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t handle, struct dma_attrs *attrs) { WARN_ON(irqs_disabled());

@@ -552,7 +535,6 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr

__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size); } -EXPORT_SYMBOL(dma_free_coherent);

static void dma_cache_maint_page(struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir,

-- 1.7.1.569.g6f426

Marek Szyprowski

1:44 p.m.

New subject: [PATCHv9 10/10] ARM: dma-mapping: add support for IOMMU mapper

This patch adds a complete implementation of the DMA-mapping API for devices which have IOMMU support.

This implementation tries to optimize dma address space usage by remapping all possible physical memory chunks into a single dma address space chunk.

The DMA address space is managed with a bitmap stored in the dma_iommu_mapping structure, which is referenced from device->archdata. Platform setup code has to initialize the parameters of the DMA address space (base address, size, allocation precision order) with the arm_iommu_create_mapping() function. To reduce the size of the bitmap, all allocations are aligned to the specified order of base 4 KiB pages.
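To illustrate the bitmap bookkeeping described above, here is a minimal user-space sketch. It is not the kernel code: struct sketch_mapping, the sketch_* helpers and SKETCH_IOVA_ERROR are hypothetical names, the bitmap is limited to a single 64-bit word, and the extra alignment that the patch requests for large allocations is left out. Only the granule arithmetic follows __alloc_iova() and __free_iova() from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u			/* base 4 KiB pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_IOVA_ERROR UINT32_MAX		/* hypothetical error value */

/* Hypothetical stand-in for struct dma_iommu_mapping, limited to 64 bits. */
struct sketch_mapping {
	uint64_t bitmap;	/* one bit per granule of (1 << order) pages */
	unsigned int bits;	/* number of granules in the address space */
	unsigned int order;	/* allocation precision order */
	uint32_t base;		/* first IO address of the space */
};

/* Number of granules needed for 'size' bytes, rounded up. */
static unsigned int sketch_granules(size_t size, unsigned int order)
{
	size_t pages = (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;

	return (unsigned int)((pages + ((size_t)1 << order) - 1) >> order);
}

/* Bit mask covering 'count' granules starting at bit 'start'. */
static uint64_t sketch_mask(unsigned int start, unsigned int count)
{
	uint64_t run = count >= 64 ? UINT64_MAX : (UINT64_C(1) << count) - 1;

	return run << start;
}

/* First-fit search for a free run; returns an IO address or the error value. */
static uint32_t sketch_alloc_iova(struct sketch_mapping *m, size_t size)
{
	unsigned int count = sketch_granules(size, m->order);
	unsigned int start;

	assert(m->bits <= 64);
	if (count == 0 || count > m->bits)
		return SKETCH_IOVA_ERROR;

	for (start = 0; start + count <= m->bits; start++) {
		uint64_t mask = sketch_mask(start, count);

		if ((m->bitmap & mask) == 0) {
			m->bitmap |= mask;
			return m->base +
			       ((uint32_t)start << (m->order + SKETCH_PAGE_SHIFT));
		}
	}
	return SKETCH_IOVA_ERROR;
}

/* Clear the bits that a previous allocation of 'size' bytes at 'addr' set. */
static void sketch_free_iova(struct sketch_mapping *m, uint32_t addr, size_t size)
{
	unsigned int start = (addr - m->base) >> (m->order + SKETCH_PAGE_SHIFT);

	m->bitmap &= ~sketch_mask(start, sketch_granules(size, m->order));
}
```

With order = 1 each bit covers 8 KiB, so a 12 KiB request consumes two bits (16 KiB of IO address space). That is the trade-off between bitmap size and address space waste that the order parameter controls.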

The dma_alloc_* functions allocate physical memory in chunks, each obtained with alloc_pages(), to avoid failing when physical memory gets fragmented. In the worst case the allocated buffer is composed of 4 KiB page chunks.
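The chunking strategy can be sketched in user space as follows. This is only an illustration of the loop in __iommu_alloc_buffer(): sketch_ffs() and sketch_plan_chunks() are hypothetical helpers, and the max_order parameter is an assumption that stands in for fragmentation (the largest order the page allocator could still satisfy), replacing the retry loop that lowers the order after a failed alloc_pages() call.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the lowest set bit; 'x' must be non-zero (mirrors __ffs()). */
static unsigned int sketch_ffs(unsigned int x)
{
	unsigned int bit = 0;

	assert(x != 0);
	while (!(x & 1u)) {
		x >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Split 'count' pages into power-of-two chunks. Each step asks for the
 * order given by the lowest set bit of the remaining count and falls back
 * to 'max_order' when that order is assumed to be unavailable.
 * Returns the number of chunk orders written to 'orders', or 0 if 'cap'
 * entries are not enough.
 */
static size_t sketch_plan_chunks(unsigned int count, unsigned int max_order,
				 unsigned int *orders, size_t cap)
{
	size_t n = 0;

	while (count) {
		unsigned int order = sketch_ffs(count);

		if (order > max_order)
			order = max_order;
		if (n == cap)
			return 0;
		orders[n++] = order;
		count -= 1u << order;
	}
	return n;
}
```

For a 13-page buffer with no fragmentation this yields chunks of 1, 4 and 8 pages. With max_order = 0 it degrades to thirteen single pages, which is the worst case mentioned above.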

The dma_map_sg() function minimizes the total number of DMA address space chunks by merging physical memory chunks into one larger DMA address space chunk. If the boundaries of the requested chunks (scatter list entries) match physical page boundaries, most calls to dma_map_sg() will result in creating only one chunk in the DMA address space.
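The merge rule can be sketched as a small user-space function. It only counts how many DMA address space chunks a list would produce and performs no mapping. struct sketch_sg and sketch_count_dma_chunks() are hypothetical names; the split condition is modeled on the one in arm_iommu_map_sg(), with max_seg standing in for the value returned by dma_get_max_seg_size().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical reduced scatterlist entry: only the fields the rule needs. */
struct sketch_sg {
	unsigned int offset;	/* offset of the data within its first page */
	unsigned int length;	/* length of the data in bytes */
};

/*
 * Count the chunks created in the DMA address space. A new chunk starts
 * when the next entry has a non-zero offset, when the bytes collected so
 * far do not end on a page boundary, or when adding the entry would
 * exceed the maximum segment size.
 */
static unsigned int sketch_count_dma_chunks(const struct sketch_sg *sg,
					    size_t nents, unsigned int max_seg)
{
	unsigned int size, chunks = 1;
	size_t i;

	assert(nents > 0);
	size = sg[0].offset + sg[0].length;

	for (i = 1; i < nents; i++) {
		if (sg[i].offset || (size & (SKETCH_PAGE_SIZE - 1)) ||
		    size + sg[i].length > max_seg) {
			chunks++;
			size = sg[i].offset;
		}
		size += sg[i].length;
	}
	return chunks;
}
```

A list of page-sized, page-aligned entries collapses into a single chunk, while a short entry in the middle of the list forces a split right after it.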

dma_map_page() simply creates a mapping for the given page(s) in the dma address space.
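The address arithmetic behind this can be sketched in user space. The helpers below are hypothetical and model only the offset and length handling seen in arm_iommu_map_page() and arm_iommu_unmap_page(); they assume 4 KiB pages, 32-bit IO addresses and an offset smaller than one page, and they do not touch an IOMMU.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(uint32_t)(SKETCH_PAGE_SIZE - 1))

/* Round 'x' up to the next page boundary. */
static uint32_t sketch_page_align(uint32_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
}

/* IO address range length needed for 'size' bytes at 'offset' in a page. */
static uint32_t sketch_map_len(uint32_t offset, uint32_t size)
{
	return sketch_page_align(offset + size);
}

/* Handle returned to the driver: page-aligned IO address plus the offset. */
static uint32_t sketch_map_handle(uint32_t iova, uint32_t offset)
{
	assert(offset < SKETCH_PAGE_SIZE);	/* assumption of this sketch */
	return iova + offset;
}

/* On unmap, the page-aligned IO address is recovered from the handle. */
static uint32_t sketch_unmap_iova(uint32_t handle)
{
	return handle & SKETCH_PAGE_MASK;
}

/* On unmap, the mapped length is recomputed from the handle and the size. */
static uint32_t sketch_unmap_len(uint32_t handle, uint32_t size)
{
	uint32_t offset = handle & ~SKETCH_PAGE_MASK;

	return sketch_page_align(size + offset);
}
```

A 4 KiB buffer that starts 0x100 bytes into a page therefore needs two pages of IO address space, and the unmap side can recompute both the base and the length from the handle and the size alone.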

All DMA functions also perform the required cache operations, like their counterparts in the ARM linear physical memory mapping version.

This patch contains code and fixes kindly provided by:
- Krishna Reddy <vdumpa@nvidia.com>,
- Andrzej Pietrasiewicz <andrzej.p@samsung.com>,
- Hiroshi DOYU <hdoyu@nvidia.com>

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-By: Subash Patel <subash.ramaswamy@linaro.org>
---
 arch/arm/Kconfig                 |    8 +
 arch/arm/include/asm/device.h    |    3 +
 arch/arm/include/asm/dma-iommu.h |   34 ++
 arch/arm/mm/dma-mapping.c        |  727 +++++++++++++++++++++++++++++++++++++-
 arch/arm/mm/vmregion.h           |    2 +-
 5 files changed, 759 insertions(+), 15 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 0fd27d4..874e519 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -46,6 +46,14 @@ config ARM config ARM_HAS_SG_CHAIN bool

+config NEED_SG_DMA_LENGTH + bool + +config ARM_DMA_USE_IOMMU + select NEED_SG_DMA_LENGTH + select ARM_HAS_SG_CHAIN + bool + config HAVE_PWM bool

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h index 6e2cb0e..b69c0d3 100644 --- a/arch/arm/include/asm/device.h +++ b/arch/arm/include/asm/device.h @@ -14,6 +14,9 @@ struct dev_archdata { #ifdef CONFIG_IOMMU_API void *iommu; /* private IOMMU data */ #endif +#ifdef CONFIG_ARM_DMA_USE_IOMMU + struct dma_iommu_mapping *mapping; +#endif };

struct omap_device; diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h new file mode 100644 index 0000000..799b094 --- /dev/null +++ b/arch/arm/include/asm/dma-iommu.h @@ -0,0 +1,34 @@ +#ifndef ASMARM_DMA_IOMMU_H +#define ASMARM_DMA_IOMMU_H + +#ifdef __KERNEL__ + +#include <linux/mm_types.h> +#include <linux/scatterlist.h> +#include <linux/dma-debug.h> +#include <linux/kmemcheck.h> + +struct dma_iommu_mapping { + /* iommu specific data */ + struct iommu_domain *domain; + + void *bitmap; + size_t bits; + unsigned int order; + dma_addr_t base; + + spinlock_t lock; + struct kref kref; +}; + +struct dma_iommu_mapping * +arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size, + int order); + +void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping); + +int arm_iommu_attach_device(struct device *dev, + struct dma_iommu_mapping *mapping); + +#endif /* __KERNEL__ */ +#endif diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index d4aad65..2d11aa0 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -19,6 +19,8 @@ #include <linux/dma-mapping.h> #include <linux/highmem.h> #include <linux/slab.h> +#include <linux/iommu.h> +#include <linux/vmalloc.h>

#include <asm/memory.h> #include <asm/highmem.h> @@ -26,6 +28,7 @@ #include <asm/tlbflush.h> #include <asm/sizes.h> #include <asm/mach/arch.h> +#include <asm/dma-iommu.h>

#include "mm.h"

@@ -155,6 +158,21 @@ static u64 get_coherent_dma_mask(struct device *dev) return mask; }

+static void __dma_clear_buffer(struct page *page, size_t size) +{ + void *ptr; + /* + * Ensure that the allocated pages are zeroed, and that any data + * lurking in the kernel direct-mapped region is invalidated. + */ + ptr = page_address(page); + if (ptr) { + memset(ptr, 0, size); + dmac_flush_range(ptr, ptr + size); + outer_flush_range(__pa(ptr), __pa(ptr) + size); + } +} + /* * Allocate a DMA buffer for 'dev' of size 'size' using the * specified gfp mask. Note that 'size' must be page aligned. @@ -163,7 +181,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf { unsigned long order = get_order(size); struct page *page, *p, *e; - void *ptr; u64 mask = get_coherent_dma_mask(dev);

#ifdef CONFIG_DMA_API_DEBUG @@ -192,14 +209,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++) __free_page(p);

- /* - * Ensure that the allocated pages are zeroed, and that any data - * lurking in the kernel direct-mapped region is invalidated. - */ - ptr = page_address(page); - memset(ptr, 0, size); - dmac_flush_range(ptr, ptr + size); - outer_flush_range(__pa(ptr), __pa(ptr) + size); + __dma_clear_buffer(page, size);

return page; } @@ -348,7 +358,7 @@ __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot, u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);

pte = consistent_pte[idx] + off; - c->vm_pages = page; + c->priv = page;

do { BUG_ON(!pte_none(*pte)); @@ -461,6 +471,14 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, return addr; }

+static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot) +{ + prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ? + pgprot_writecombine(prot) : + pgprot_dmacoherent(prot); + return prot; +} + /* * Allocate DMA-coherent memory space and return both the kernel remapped * virtual and bus address for that space. @@ -468,9 +486,7 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs) { - pgprot_t prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ? - pgprot_writecombine(pgprot_kernel) : - pgprot_dmacoherent(pgprot_kernel); + pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel); void *memory;

if (dma_alloc_from_coherent(dev, size, handle, &memory)) @@ -497,16 +513,20 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma, pgprot_writecombine(vma->vm_page_prot) : pgprot_dmacoherent(vma->vm_page_prot);

+ if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret)) + return ret; + c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr); if (c) { unsigned long off = vma->vm_pgoff; + struct page *pages = c->priv;

kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;

if (off < kern_size && user_size <= (kern_size - off)) { ret = remap_pfn_range(vma, vma->vm_start, - page_to_pfn(c->vm_pages) + off, + page_to_pfn(pages) + off, user_size << PAGE_SHIFT, vma->vm_page_prot); } @@ -645,6 +665,9 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, int i, j;

for_each_sg(sg, s, nents, i) { +#ifdef CONFIG_NEED_SG_DMA_LENGTH + s->dma_length = s->length; +#endif s->dma_address = ops->map_page(dev, sg_page(s), s->offset, s->length, dir, attrs); if (dma_mapping_error(dev, s->dma_address)) @@ -753,3 +776,679 @@ static int __init dma_debug_do_init(void) return 0; } fs_initcall(dma_debug_do_init); + +#ifdef CONFIG_ARM_DMA_USE_IOMMU + +/* IOMMU */ + +static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping, + size_t size) +{ + unsigned int order = get_order(size); + unsigned int align = 0; + unsigned int count, start; + unsigned long flags; + + count = ((PAGE_ALIGN(size) >> PAGE_SHIFT) + + (1 << mapping->order) - 1) >> mapping->order; + + if (order > mapping->order) + align = (1 << (order - mapping->order)) - 1; + + spin_lock_irqsave(&mapping->lock, flags); + start = bitmap_find_next_zero_area(mapping->bitmap, mapping->bits, 0, + count, align); + if (start > mapping->bits) { + spin_unlock_irqrestore(&mapping->lock, flags); + return DMA_ERROR_CODE; + } + + bitmap_set(mapping->bitmap, start, count); + spin_unlock_irqrestore(&mapping->lock, flags); + + return mapping->base + (start << (mapping->order + PAGE_SHIFT)); +} + +static inline void __free_iova(struct dma_iommu_mapping *mapping, + dma_addr_t addr, size_t size) +{ + unsigned int start = (addr - mapping->base) >> + (mapping->order + PAGE_SHIFT); + unsigned int count = ((size >> PAGE_SHIFT) + + (1 << mapping->order) - 1) >> mapping->order; + unsigned long flags; + + spin_lock_irqsave(&mapping->lock, flags); + bitmap_clear(mapping->bitmap, start, count); + spin_unlock_irqrestore(&mapping->lock, flags); +} + +static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp) +{ + struct page **pages; + int count = size >> PAGE_SHIFT; + int array_size = count * sizeof(struct page *); + int i = 0; + + if (array_size <= PAGE_SIZE) + pages = kzalloc(array_size, gfp); + else + pages = vzalloc(array_size); + if (!pages) + return NULL; + + while 
(count) { + int j, order = __ffs(count); + + pages[i] = alloc_pages(gfp | __GFP_NOWARN, order); + while (!pages[i] && order) + pages[i] = alloc_pages(gfp | __GFP_NOWARN, --order); + if (!pages[i]) + goto error; + + if (order) + split_page(pages[i], order); + j = 1 << order; + while (--j) + pages[i + j] = pages[i] + j; + + __dma_clear_buffer(pages[i], PAGE_SIZE << order); + i += 1 << order; + count -= 1 << order; + } + + return pages; +error: + while (--i) + if (pages[i]) + __free_pages(pages[i], 0); + if (array_size < PAGE_SIZE) + kfree(pages); + else + vfree(pages); + return NULL; +} + +static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t size) +{ + int count = size >> PAGE_SHIFT; + int array_size = count * sizeof(struct page *); + int i; + for (i = 0; i < count; i++) + if (pages[i]) + __free_pages(pages[i], 0); + if (array_size < PAGE_SIZE) + kfree(pages); + else + vfree(pages); + return 0; +} + +/* + * Create a CPU mapping for a specified pages + */ +static void * +__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot) +{ + struct arm_vmregion *c; + size_t align; + size_t count = size >> PAGE_SHIFT; + int bit; + + if (!consistent_pte[0]) { + pr_err("%s: not initialised\n", __func__); + dump_stack(); + return NULL; + } + + /* + * Align the virtual region allocation - maximum alignment is + * a section size, minimum is a page size. This helps reduce + * fragmentation of the DMA space, and also prevents allocations + * smaller than a section from crossing a section boundary. + */ + bit = fls(size - 1); + if (bit > SECTION_SHIFT) + bit = SECTION_SHIFT; + align = 1 << bit; + + /* + * Allocate a virtual address in the consistent mapping region. 
+ */ + c = arm_vmregion_alloc(&consistent_head, align, size, + gfp & ~(__GFP_DMA | __GFP_HIGHMEM), NULL); + if (c) { + pte_t *pte; + int idx = CONSISTENT_PTE_INDEX(c->vm_start); + int i = 0; + u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1); + + pte = consistent_pte[idx] + off; + c->priv = pages; + + do { + BUG_ON(!pte_none(*pte)); + + set_pte_ext(pte, mk_pte(pages[i], prot), 0); + pte++; + off++; + i++; + if (off >= PTRS_PER_PTE) { + off = 0; + pte = consistent_pte[++idx]; + } + } while (i < count); + + dsb(); + + return (void *)c->vm_start; + } + return NULL; +} + +/* + * Create a mapping in device IO address space for specified pages + */ +static dma_addr_t +__iommu_create_mapping(struct device *dev, struct page **pages, size_t size) +{ + struct dma_iommu_mapping *mapping = dev->archdata.mapping; + unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT; + dma_addr_t dma_addr, iova; + int i, ret = DMA_ERROR_CODE; + + dma_addr = __alloc_iova(mapping, size); + if (dma_addr == DMA_ERROR_CODE) + return dma_addr; + + iova = dma_addr; + for (i = 0; i < count; ) { + unsigned int next_pfn = page_to_pfn(pages[i]) + 1; + phys_addr_t phys = page_to_phys(pages[i]); + unsigned int len, j; + + for (j = i + 1; j < count; j++, next_pfn++) + if (page_to_pfn(pages[j]) != next_pfn) + break; + + len = (j - i) << PAGE_SHIFT; + ret = iommu_map(mapping->domain, iova, phys, len, 0); + if (ret < 0) + goto fail; + iova += len; + i = j; + } + return dma_addr; +fail: + iommu_unmap(mapping->domain, dma_addr, iova-dma_addr); + __free_iova(mapping, dma_addr, size); + return DMA_ERROR_CODE; +} + +static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size) +{ + struct dma_iommu_mapping *mapping = dev->archdata.mapping; + + /* + * add optional in-page offset from iova to size and align + * result to page size + */ + size = PAGE_ALIGN((iova & ~PAGE_MASK) + size); + iova &= PAGE_MASK; + + iommu_unmap(mapping->domain, iova, size); + __free_iova(mapping, iova, size); 
+ return 0; +} + +static void *arm_iommu_alloc_attrs(struct device *dev, size_t size, + dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs) +{ + pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel); + struct page **pages; + void *addr = NULL; + + *handle = DMA_ERROR_CODE; + size = PAGE_ALIGN(size); + + pages = __iommu_alloc_buffer(dev, size, gfp); + if (!pages) + return NULL; + + *handle = __iommu_create_mapping(dev, pages, size); + if (*handle == DMA_ERROR_CODE) + goto err_buffer; + + addr = __iommu_alloc_remap(pages, size, gfp, prot); + if (!addr) + goto err_mapping; + + return addr; + +err_mapping: + __iommu_remove_mapping(dev, *handle, size); +err_buffer: + __iommu_free_buffer(dev, pages, size); + return NULL; +} + +static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + struct dma_attrs *attrs) +{ + struct arm_vmregion *c; + + vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot); + c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr); + + if (c) { + struct page **pages = c->priv; + + unsigned long uaddr = vma->vm_start; + unsigned long usize = vma->vm_end - vma->vm_start; + int i = 0; + + do { + int ret; + + ret = vm_insert_page(vma, uaddr, pages[i++]); + if (ret) { + pr_err("Remapping memory, error: %d\n", ret); + return ret; + } + + uaddr += PAGE_SIZE; + usize -= PAGE_SIZE; + } while (usize > 0); + } + return 0; +} + +/* + * free a page as defined by the above mapping. + * Must not be called with IRQs disabled. 
+ */ +void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr, + dma_addr_t handle, struct dma_attrs *attrs) +{ + struct arm_vmregion *c; + size = PAGE_ALIGN(size); + + c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr); + if (c) { + struct page **pages = c->priv; + __dma_free_remap(cpu_addr, size); + __iommu_remove_mapping(dev, handle, size); + __iommu_free_buffer(dev, pages, size); + } +} + +/* + * Map a part of the scatter-gather list into contiguous io address space + */ +static int __map_sg_chunk(struct device *dev, struct scatterlist *sg, + size_t size, dma_addr_t *handle, + enum dma_data_direction dir) +{ + struct dma_iommu_mapping *mapping = dev->archdata.mapping; + dma_addr_t iova, iova_base; + int ret = 0; + unsigned int count; + struct scatterlist *s; + + size = PAGE_ALIGN(size); + *handle = DMA_ERROR_CODE; + + iova_base = iova = __alloc_iova(mapping, size); + if (iova == DMA_ERROR_CODE) + return -ENOMEM; + + for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) { + phys_addr_t phys = page_to_phys(sg_page(s)); + unsigned int len = PAGE_ALIGN(s->offset + s->length); + + if (!arch_is_coherent()) + __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); + + ret = iommu_map(mapping->domain, iova, phys, len, 0); + if (ret < 0) + goto fail; + count += len >> PAGE_SHIFT; + iova += len; + } + *handle = iova_base; + + return 0; +fail: + iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE); + __free_iova(mapping, iova_base, size); + return ret; +} + +/** + * arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA + * @dev: valid struct device pointer + * @sg: list of buffers + * @nents: number of buffers to map + * @dir: DMA transfer direction + * + * Map a set of buffers described by scatterlist in streaming mode for DMA. + * The scatter gather list elements are merged together (if possible) and + * tagged with the appropriate dma address and length. 
They are obtained via + * sg_dma_{address,length}. + */ +int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents, + enum dma_data_direction dir, struct dma_attrs *attrs) +{ + struct scatterlist *s = sg, *dma = sg, *start = sg; + int i, count = 0; + unsigned int offset = s->offset; + unsigned int size = s->offset + s->length; + unsigned int max = dma_get_max_seg_size(dev); + + for (i = 1; i < nents; i++) { + s = sg_next(s); + + s->dma_address = DMA_ERROR_CODE; + s->dma_length = 0; + + if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) { + if (__map_sg_chunk(dev, start, size, &dma->dma_address, + dir) < 0) + goto bad_mapping; + + dma->dma_address += offset; + dma->dma_length = size - offset; + + size = offset = s->offset; + start = s; + dma = sg_next(dma); + count += 1; + } + size += s->length; + } + if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir) < 0) + goto bad_mapping; + + dma->dma_address += offset; + dma->dma_length = size - offset; + + return count+1; + +bad_mapping: + for_each_sg(sg, s, count, i) + __iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s)); + return 0; +} + +/** + * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg + * @dev: valid struct device pointer + * @sg: list of buffers + * @nents: number of buffers to unmap (same as was passed to dma_map_sg) + * @dir: DMA transfer direction (same as was passed to dma_map_sg) + * + * Unmap a set of streaming mode DMA translations. Again, CPU access + * rules concerning calls here are the same as for dma_unmap_single(). 
+ */ +void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents, + enum dma_data_direction dir, struct dma_attrs *attrs) +{ + struct scatterlist *s; + int i; + + for_each_sg(sg, s, nents, i) { + if (sg_dma_len(s)) + __iommu_remove_mapping(dev, sg_dma_address(s), + sg_dma_len(s)); + if (!arch_is_coherent()) + __dma_page_dev_to_cpu(sg_page(s), s->offset, + s->length, dir); + } +} + +/** + * arm_iommu_sync_sg_for_cpu + * @dev: valid struct device pointer + * @sg: list of buffers + * @nents: number of buffers to map (returned from dma_map_sg) + * @dir: DMA transfer direction (same as was passed to dma_map_sg) + */ +void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, + int nents, enum dma_data_direction dir) +{ + struct scatterlist *s; + int i; + + for_each_sg(sg, s, nents, i) + if (!arch_is_coherent()) + __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir); + +} + +/** + * arm_iommu_sync_sg_for_device + * @dev: valid struct device pointer + * @sg: list of buffers + * @nents: number of buffers to map (returned from dma_map_sg) + * @dir: DMA transfer direction (same as was passed to dma_map_sg) + */ +void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg, + int nents, enum dma_data_direction dir) +{ + struct scatterlist *s; + int i; + + for_each_sg(sg, s, nents, i) + if (!arch_is_coherent()) + __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir); +} + + +/** + * arm_iommu_map_page + * @dev: valid struct device pointer + * @page: page that buffer resides in + * @offset: offset into page for start of buffer + * @size: size of buffer to map + * @dir: DMA transfer direction + * + * IOMMU aware version of arm_dma_map_page() + */ +static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page, + unsigned long offset, size_t size, enum dma_data_direction dir, + struct dma_attrs *attrs) +{ + struct dma_iommu_mapping *mapping = dev->archdata.mapping; + dma_addr_t dma_addr; + int ret, len 
= PAGE_ALIGN(size + offset); + + if (!arch_is_coherent()) + __dma_page_cpu_to_dev(page, offset, size, dir); + + dma_addr = __alloc_iova(mapping, len); + if (dma_addr == DMA_ERROR_CODE) + return dma_addr; + + ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0); + if (ret < 0) + goto fail; + + return dma_addr + offset; +fail: + __free_iova(mapping, dma_addr, len); + return DMA_ERROR_CODE; +} + +/** + * arm_iommu_unmap_page + * @dev: valid struct device pointer + * @handle: DMA address of buffer + * @size: size of buffer (same as passed to dma_map_page) + * @dir: DMA transfer direction (same as passed to dma_map_page) + * + * IOMMU aware version of arm_dma_unmap_page() + */ +static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle, + size_t size, enum dma_data_direction dir, + struct dma_attrs *attrs) +{ + struct dma_iommu_mapping *mapping = dev->archdata.mapping; + dma_addr_t iova = handle & PAGE_MASK; + struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); + int offset = handle & ~PAGE_MASK; + int len = PAGE_ALIGN(size + offset); + + if (!iova) + return; + + if (!arch_is_coherent()) + __dma_page_dev_to_cpu(page, offset, size, dir); + + iommu_unmap(mapping->domain, iova, len); + __free_iova(mapping, iova, len); +} + +static void arm_iommu_sync_single_for_cpu(struct device *dev, + dma_addr_t handle, size_t size, enum dma_data_direction dir) +{ + struct dma_iommu_mapping *mapping = dev->archdata.mapping; + dma_addr_t iova = handle & PAGE_MASK; + struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); + unsigned int offset = handle & ~PAGE_MASK; + + if (!iova) + return; + + if (!arch_is_coherent()) + __dma_page_dev_to_cpu(page, offset, size, dir); +} + +static void arm_iommu_sync_single_for_device(struct device *dev, + dma_addr_t handle, size_t size, enum dma_data_direction dir) +{ + struct dma_iommu_mapping *mapping = dev->archdata.mapping; + dma_addr_t iova = handle & PAGE_MASK; + struct page 
*page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova)); + unsigned int offset = handle & ~PAGE_MASK; + + if (!iova) + return; + + __dma_page_cpu_to_dev(page, offset, size, dir); +} + +struct dma_map_ops iommu_ops = { + .alloc = arm_iommu_alloc_attrs, + .free = arm_iommu_free_attrs, + .mmap = arm_iommu_mmap_attrs, + + .map_page = arm_iommu_map_page, + .unmap_page = arm_iommu_unmap_page, + .sync_single_for_cpu = arm_iommu_sync_single_for_cpu, + .sync_single_for_device = arm_iommu_sync_single_for_device, + + .map_sg = arm_iommu_map_sg, + .unmap_sg = arm_iommu_unmap_sg, + .sync_sg_for_cpu = arm_iommu_sync_sg_for_cpu, + .sync_sg_for_device = arm_iommu_sync_sg_for_device, +}; + +/** + * arm_iommu_create_mapping + * @bus: pointer to the bus holding the client device (for IOMMU calls) + * @base: start address of the valid IO address space + * @size: size of the valid IO address space + * @order: accuracy of the IO addresses allocations + * + * Creates a mapping structure which holds information about used/unused + * IO address ranges, which is required to perform memory allocation and + * mapping with IOMMU aware functions. + * + * The client device need to be attached to the mapping with + * arm_iommu_attach_device function. 
+ */ +struct dma_iommu_mapping * +arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size, + int order) +{ + unsigned int count = size >> (PAGE_SHIFT + order); + unsigned int bitmap_size = BITS_TO_LONGS(count) * sizeof(long); + struct dma_iommu_mapping *mapping; + int err = -ENOMEM; + + if (!count) + return ERR_PTR(-EINVAL); + + mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL); + if (!mapping) + goto err; + + mapping->bitmap = kzalloc(bitmap_size, GFP_KERNEL); + if (!mapping->bitmap) + goto err2; + + mapping->base = base; + mapping->bits = BITS_PER_BYTE * bitmap_size; + mapping->order = order; + spin_lock_init(&mapping->lock); + + mapping->domain = iommu_domain_alloc(bus); + if (!mapping->domain) + goto err3; + + kref_init(&mapping->kref); + return mapping; +err3: + kfree(mapping->bitmap); +err2: + kfree(mapping); +err: + return ERR_PTR(err); +} + +static void release_iommu_mapping(struct kref *kref) +{ + struct dma_iommu_mapping *mapping = + container_of(kref, struct dma_iommu_mapping, kref); + + iommu_domain_free(mapping->domain); + kfree(mapping->bitmap); + kfree(mapping); +} + +void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping) +{ + if (mapping) + kref_put(&mapping->kref, release_iommu_mapping); +} + +/** + * arm_iommu_attach_device + * @dev: valid struct device pointer + * @mapping: io address space mapping structure (returned from + * arm_iommu_create_mapping) + * + * Attaches specified io address space mapping to the provided device, + * this replaces the dma operations (dma_map_ops pointer) with the + * IOMMU aware version. More than one client might be attached to + * the same io address space mapping. 
+ */
+int arm_iommu_attach_device(struct device *dev,
+			    struct dma_iommu_mapping *mapping)
+{
+	int err;
+
+	err = iommu_attach_device(mapping->domain, dev);
+	if (err)
+		return err;
+
+	kref_get(&mapping->kref);
+	dev->archdata.mapping = mapping;
+	set_dma_ops(dev, &iommu_ops);
+
+	pr_info("Attached IOMMU controller to %s device.\n", dev_name(dev));
+	return 0;
+}
+
+#endif
diff --git a/arch/arm/mm/vmregion.h b/arch/arm/mm/vmregion.h
index 162be66..bf312c3 100644
--- a/arch/arm/mm/vmregion.h
+++ b/arch/arm/mm/vmregion.h
@@ -17,7 +17,7 @@ struct arm_vmregion {
 	struct list_head	vm_list;
 	unsigned long		vm_start;
 	unsigned long		vm_end;
-	struct page		*vm_pages;
+	void			*priv;
 	int			vm_active;
 	const void		*caller;
 };

-- 1.7.1.569.g6f426

Abhinav Kochhar

20 Apr 2012

1:44 a.m.

New subject: [PATCHv9 10/10] ARM: dma-mapping: add support for IOMMU mapper

Hi Marek,

The "dma_addr_t dma_addr" argument passed to arm_iommu_mmap_attrs() is unused:

+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+		    void *cpu_addr, dma_addr_t dma_addr, size_t size,
+		    struct dma_attrs *attrs)
+{
+	struct arm_vmregion *c;
+
+	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
+
+	if (c) {
+		struct page **pages = c->priv;
+
+		unsigned long uaddr = vma->vm_start;
+		unsigned long usize = vma->vm_end - vma->vm_start;
+		int i = 0;
+
+		do {
+			int ret;
+
+			ret = vm_insert_page(vma, uaddr, pages[i++]);
+			if (ret) {
+				pr_err("Remapping memory, error: %d\n", ret);
+				return ret;
+			}
+
+			uaddr += PAGE_SIZE;
+			usize -= PAGE_SIZE;
+		} while (usize > 0);
+	}
+	return 0;
+}

On Wed, Apr 18, 2012 at 10:44 PM, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

This patch adds a complete implementation of the DMA-mapping API for devices which have IOMMU support.

This implementation tries to optimize dma address space usage by remapping all possible physical memory chunks into a single dma address space chunk.
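To make the remapping idea concrete, here is a minimal user-space sketch; it is not code from the patch. It assumes page frame numbers are plain integers, and the names pfn_run and coalesce_pfns are hypothetical. It groups a page list into physically contiguous runs, which is roughly what __iommu_create_mapping() does before issuing one iommu_map() call per run at consecutive IO addresses.

```c
#include <assert.h>
#include <stddef.h>

/* One physically contiguous run: first page frame number and page count. */
struct pfn_run {
	unsigned long first_pfn;
	size_t npages;
};

/*
 * Split pfns[0..count) into maximal runs of consecutive frame numbers.
 * In the patch, each such run would be handed to a single iommu_map() call,
 * and the runs are placed back to back in IO address space, so the device
 * sees one contiguous chunk.
 * Returns the number of runs written to out[] (at most max_runs).
 */
static size_t coalesce_pfns(const unsigned long *pfns, size_t count,
			    struct pfn_run *out, size_t max_runs)
{
	size_t i = 0, nruns = 0;

	assert(pfns != NULL && out != NULL);

	while (i < count && nruns < max_runs) {
		size_t j = i + 1;

		/* Extend the run while each frame follows the previous one. */
		while (j < count && pfns[j] == pfns[j - 1] + 1)
			j++;

		out[nruns].first_pfn = pfns[i];
		out[nruns].npages = j - i;
		nruns++;
		i = j;
	}
	return nruns;
}
```

With the frame list 100, 101, 102, 200, 300, 301 this yields three runs (3, 1 and 2 pages), which would occupy six consecutive pages of IO address space.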

The DMA address space is managed with a bitmap held in the dma_iommu_mapping structure, which is referenced from device->archdata. Platform setup code has to initialize the parameters of the DMA address space (base address, size, allocation precision order) with the arm_iommu_create_mapping() function. To reduce the size of the bitmap, all allocations are aligned to the specified order of base 4 KiB pages.
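The bitmap bookkeeping can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it uses one byte per slot instead of a real bitmap, does a first-fit search without the alignment mask or the spinlock used by __alloc_iova(), and every name in it (iova_space, iova_alloc, SK_IOVA_ERROR and so on) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_PAGE_SHIFT 12u          /* assumes 4 KiB base pages */
#define SK_MAX_SLOTS  1024u        /* arbitrary limit for this sketch */
#define SK_IOVA_ERROR UINT32_MAX   /* stand-in for DMA_ERROR_CODE */

struct iova_space {
	uint32_t base;             /* first IO virtual address */
	unsigned int order;        /* one slot covers 2^order pages */
	unsigned int nslots;       /* number of slots being tracked */
	unsigned char used[SK_MAX_SLOTS]; /* one byte per slot, for clarity */
};

static void iova_space_init(struct iova_space *s, uint32_t base,
			    uint32_t size, unsigned int order)
{
	memset(s, 0, sizeof(*s));
	s->base = base;
	s->order = order;
	s->nslots = size >> (SK_PAGE_SHIFT + order);
	assert(s->nslots > 0 && s->nslots <= SK_MAX_SLOTS);
}

/* Number of slots needed to cover 'size' bytes, rounded up. */
static unsigned int iova_slots(const struct iova_space *s, uint32_t size)
{
	uint32_t slot_bytes = 1u << (SK_PAGE_SHIFT + s->order);

	return (size + slot_bytes - 1) / slot_bytes;
}

/* First-fit search for a free range; returns SK_IOVA_ERROR when full. */
static uint32_t iova_alloc(struct iova_space *s, uint32_t size)
{
	unsigned int need = iova_slots(s, size);
	unsigned int start, i;

	if (need == 0 || need > s->nslots)
		return SK_IOVA_ERROR;

	for (start = 0; start + need <= s->nslots; start++) {
		for (i = 0; i < need && !s->used[start + i]; i++)
			;
		if (i == need) {
			memset(&s->used[start], 1, need);
			return s->base + (start << (SK_PAGE_SHIFT + s->order));
		}
	}
	return SK_IOVA_ERROR;
}

/* Release a range previously returned by iova_alloc(). */
static void iova_free(struct iova_space *s, uint32_t addr, uint32_t size)
{
	unsigned int start = (addr - s->base) >> (SK_PAGE_SHIFT + s->order);

	memset(&s->used[start], 0, iova_slots(s, size));
}
```

With order 1, a 1 MiB space needs only 128 slots instead of 256, which is the bitmap-size saving the paragraph above describes; the cost is that a 4 KiB request still consumes a full 8 KiB slot.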

The dma_alloc_* functions allocate physical memory in chunks, each obtained with alloc_pages(), to avoid failing when physical memory is fragmented. In the worst case the allocated buffer is composed of 4 KiB page chunks.
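The chunking strategy can be sketched in user space as follows. This is an illustrative model, not the patch code: plan_chunks and lowest_set_bit are hypothetical names, and the max_order parameter is an assumption that stands in for fragmentation by treating every request above it as a failed alloc_pages() call. As in __iommu_alloc_buffer(), the starting order is taken from the lowest set bit of the remaining page count.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the lowest set bit; mirrors what the kernel's __ffs() returns. */
static unsigned int lowest_set_bit(unsigned int v)
{
	unsigned int bit = 0;

	assert(v != 0);
	while (!(v & 1u)) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Plan the chunks for a buffer of 'count' pages. 'max_order' models
 * fragmentation: a request above it is treated as failing, so the order
 * is lowered to the largest one that is still available.
 * Writes each chunk's order to orders[] and returns the number of chunks.
 */
static size_t plan_chunks(unsigned int count, unsigned int max_order,
			  unsigned int *orders, size_t max_chunks)
{
	size_t n = 0;

	while (count && n < max_chunks) {
		unsigned int order = lowest_set_bit(count);

		if (order > max_order)
			order = max_order; /* fall back to a smaller chunk */

		orders[n++] = order;
		count -= 1u << order;
	}
	return n;
}
```

For a 6-page buffer on unfragmented memory this gives two chunks (orders 1 and 2); with max_order set to 0 it degrades to six single 4 KiB pages, the worst case mentioned above.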

The dma_map_sg() function minimizes the total number of DMA address space chunks by merging physical memory chunks into one larger DMA address space chunk. If the boundaries of the requested chunks (scatter list entries) match physical page boundaries, most calls to dma_map_sg() will result in only one chunk being created in the DMA address space.
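The merge rule can be modelled in a few lines of user-space C. This sketch only counts chunks and maps nothing; sg_entry and count_dma_chunks are hypothetical names, and a 4 KiB page size is assumed. It follows the condition used in arm_iommu_map_sg(): a new chunk starts when an entry has a non-zero offset, when the data gathered so far does not end on a page boundary, or when adding the entry would exceed the maximum segment size.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096u   /* assumes 4 KiB pages */

struct sg_entry {
	unsigned int offset;  /* offset of the data within its first page */
	unsigned int length;  /* length of the data in bytes */
};

/*
 * Count the IO address space chunks needed for a scatter list.
 * 'max_seg' plays the role of dma_get_max_seg_size(dev).
 */
static size_t count_dma_chunks(const struct sg_entry *sg, size_t nents,
			       unsigned int max_seg)
{
	unsigned int size;
	size_t i, chunks = 1;

	assert(sg != NULL && nents > 0);
	size = sg[0].offset + sg[0].length;

	for (i = 1; i < nents; i++) {
		if (sg[i].offset || (size % SK_PAGE_SIZE) ||
		    size + sg[i].length > max_seg) {
			/* Cannot merge: close this chunk, start a new one. */
			chunks++;
			size = sg[i].offset;
		}
		size += sg[i].length;
	}
	return chunks;
}
```

Three whole pages merge into a single chunk, while a list containing an entry that starts mid-page is split at that entry and again after it, because the gathered data no longer ends on a page boundary.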

dma_map_page() simply creates a mapping for the given page(s) in the dma address space.
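The address arithmetic behind this can be shown with a short user-space sketch. It assumes 4 KiB pages, 32-bit DMA addresses and an in-page offset smaller than the page size; map_len, make_handle and split_handle are hypothetical helpers, not functions from the patch. The mapped length is the page-aligned sum of offset and size, the driver receives the IO address plus the offset, and the unmap path recovers both by masking the handle.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SIZE 4096u                    /* assumes 4 KiB pages */
#define SK_PAGE_MASK (~(SK_PAGE_SIZE - 1u))

/* Round up to the next page boundary, like the kernel's PAGE_ALIGN(). */
static uint32_t sk_page_align(uint32_t v)
{
	return (v + SK_PAGE_SIZE - 1u) & SK_PAGE_MASK;
}

/* Bytes of IO address space needed to cover 'size' bytes at 'offset'. */
static uint32_t map_len(uint32_t offset, uint32_t size)
{
	return sk_page_align(offset + size);
}

/* Handle given to the driver: page-aligned IO address plus the offset. */
static uint32_t make_handle(uint32_t iova, uint32_t offset)
{
	assert((iova & ~SK_PAGE_MASK) == 0);
	return iova + offset;
}

/* On unmap, recover the aligned IO address and the mapped length. */
static void split_handle(uint32_t handle, uint32_t size,
			 uint32_t *iova, uint32_t *len)
{
	uint32_t offset = handle & ~SK_PAGE_MASK;

	*iova = handle & SK_PAGE_MASK;
	*len = sk_page_align(offset + size);
}
```

A 4 KiB buffer that starts 0x100 bytes into a page therefore occupies two pages of IO address space, and the same two pages are released when the handle is split again on unmap.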

All DMA functions also perform the required cache operations, like their counterparts in the ARM linear physical memory mapping version.

This patch contains code and fixes kindly provided by:

Krishna Reddy vdumpa@nvidia.com,

Andrzej Pietrasiewicz andrzej.p@samsung.com,

Hiroshi DOYU hdoyu@nvidia.com

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com Acked-by: Kyungmin Park kyungmin.park@samsung.com Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Tested-By: Subash Patel subash.ramaswamy@linaro.org

arch/arm/Kconfig | 8 + arch/arm/include/asm/device.h | 3 + arch/arm/include/asm/dma-iommu.h | 34 ++ arch/arm/mm/dma-mapping.c | 727 +++++++++++++++++++++++++++++++++++++- arch/arm/mm/vmregion.h | 2 +- 5 files changed, 759 insertions(+), 15 deletions(-) create mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 0fd27d4..874e519 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -46,6 +46,14 @@ config ARM config ARM_HAS_SG_CHAIN bool

+config NEED_SG_DMA_LENGTH
  bool
+config ARM_DMA_USE_IOMMU
  select NEED_SG_DMA_LENGTH
  select ARM_HAS_SG_CHAIN
  bool
config HAVE_PWM bool

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h index 6e2cb0e..b69c0d3 100644 --- a/arch/arm/include/asm/device.h +++ b/arch/arm/include/asm/device.h @@ -14,6 +14,9 @@ struct dev_archdata { #ifdef CONFIG_IOMMU_API void *iommu; /* private IOMMU data */ #endif +#ifdef CONFIG_ARM_DMA_USE_IOMMU
  struct dma_iommu_mapping        *mapping;
+#endif };

struct omap_device; diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h new file mode 100644 index 0000000..799b094 --- /dev/null +++ b/arch/arm/include/asm/dma-iommu.h @@ -0,0 +1,34 @@ +#ifndef ASMARM_DMA_IOMMU_H +#define ASMARM_DMA_IOMMU_H

+#ifdef __KERNEL__

+#include <linux/mm_types.h> +#include <linux/scatterlist.h> +#include <linux/dma-debug.h> +#include <linux/kmemcheck.h>

+struct dma_iommu_mapping {
  /* iommu specific data */
  struct iommu_domain     *domain;
  void                    *bitmap;
  size_t                  bits;
  unsigned int            order;
  dma_addr_t              base;
  spinlock_t              lock;
  struct kref             kref;
+};

+struct dma_iommu_mapping * +arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
                   int order);
+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);

+int arm_iommu_attach_device(struct device *dev,
                                  struct dma_iommu_mapping *mapping);
+#endif /* __KERNEL__ */ +#endif diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index d4aad65..2d11aa0 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -19,6 +19,8 @@ #include <linux/dma-mapping.h> #include <linux/highmem.h> #include <linux/slab.h> +#include <linux/iommu.h> +#include <linux/vmalloc.h>

#include <asm/memory.h> #include <asm/highmem.h> @@ -26,6 +28,7 @@ #include <asm/tlbflush.h> #include <asm/sizes.h> #include <asm/mach/arch.h> +#include <asm/dma-iommu.h>

#include "mm.h"

@@ -155,6 +158,21 @@ static u64 get_coherent_dma_mask(struct device *dev) return mask; }

+static void __dma_clear_buffer(struct page *page, size_t size) +{
  void *ptr;
  /*
   * Ensure that the allocated pages are zeroed, and that any data
   * lurking in the kernel direct-mapped region is invalidated.
   */
  ptr = page_address(page);
  if (ptr) {
          memset(ptr, 0, size);
          dmac_flush_range(ptr, ptr + size);
          outer_flush_range(__pa(ptr), __pa(ptr) + size);
  }
+}

/*

Allocate a DMA buffer for 'dev' of size 'size' using the

specified gfp mask. Note that 'size' must be page aligned.

@@ -163,7 +181,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
 	unsigned long order = get_order(size);
 	struct page *page, *p, *e;
-	void *ptr;
 	u64 mask = get_coherent_dma_mask(dev);
 
 #ifdef CONFIG_DMA_API_DEBUG
@@ -192,14 +209,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
 		__free_page(p);
 
-	/*
-	 * Ensure that the allocated pages are zeroed, and that any data
-	 * lurking in the kernel direct-mapped region is invalidated.
-	 */
-	ptr = page_address(page);
-	memset(ptr, 0, size);
-	dmac_flush_range(ptr, ptr + size);
-	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_clear_buffer(page, size);
 
 	return page;
 }
@@ -348,7 +358,7 @@ __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
 		u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
 
 		pte = consistent_pte[idx] + off;
-		c->vm_pages = page;
+		c->priv = page;
 
 		do {
 			BUG_ON(!pte_none(*pte));
@@ -461,6 +471,14 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 	return addr;
 }

+static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot) +{
  prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
                      pgprot_writecombine(prot) :
                      pgprot_dmacoherent(prot);
  return prot;
+}

/*

Allocate DMA-coherent memory space and return both the kernel remapped

virtual and bus address for that space.

@@ -468,9 +486,7 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs) {
  pgprot_t prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
                  pgprot_writecombine(pgprot_kernel) :
                  pgprot_dmacoherent(pgprot_kernel);
  pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
 void *memory;

 if (dma_alloc_from_coherent(dev, size, handle, &memory))
@@ -497,16 +513,20 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma, pgprot_writecombine(vma->vm_page_prot) : pgprot_dmacoherent(vma->vm_page_prot);
  if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
          return ret;
 c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
 if (c) {
         unsigned long off = vma->vm_pgoff;
          struct page *pages = c->priv;

         kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;

         if (off < kern_size &&
             user_size <= (kern_size - off)) {
                 ret = remap_pfn_range(vma, vma->vm_start,
                                        page_to_pfn(c->vm_pages) +
off,
                                        page_to_pfn(pages) + off,
                                       user_size << PAGE_SHIFT,
                                       vma->vm_page_prot);
         }
@@ -645,6 +665,9 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents, int i, j;
   for_each_sg(sg, s, nents, i) {
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
          s->dma_length = s->length;
+#endif s->dma_address = ops->map_page(dev, sg_page(s), s->offset, s->length, dir, attrs); if (dma_mapping_error(dev, s->dma_address)) @@ -753,3 +776,679 @@ static int __init dma_debug_do_init(void) return 0; } fs_initcall(dma_debug_do_init);

+#ifdef CONFIG_ARM_DMA_USE_IOMMU

+/* IOMMU */

+static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
                                size_t size)
+{
  unsigned int order = get_order(size);
  unsigned int align = 0;
  unsigned int count, start;
  unsigned long flags;
  count = ((PAGE_ALIGN(size) >> PAGE_SHIFT) +
           (1 << mapping->order) - 1) >> mapping->order;
  if (order > mapping->order)
          align = (1 << (order - mapping->order)) - 1;
  spin_lock_irqsave(&mapping->lock, flags);
  start = bitmap_find_next_zero_area(mapping->bitmap, mapping->bits,
0,
                                     count, align);
  if (start > mapping->bits) {
          spin_unlock_irqrestore(&mapping->lock, flags);
          return DMA_ERROR_CODE;
  }
  bitmap_set(mapping->bitmap, start, count);
  spin_unlock_irqrestore(&mapping->lock, flags);
  return mapping->base + (start << (mapping->order + PAGE_SHIFT));
+}

+static inline void __free_iova(struct dma_iommu_mapping *mapping,
                         dma_addr_t addr, size_t size)
+{
  unsigned int start = (addr - mapping->base) >>
                       (mapping->order + PAGE_SHIFT);
  unsigned int count = ((size >> PAGE_SHIFT) +
                        (1 << mapping->order) - 1) >> mapping->order;
  unsigned long flags;
  spin_lock_irqsave(&mapping->lock, flags);
  bitmap_clear(mapping->bitmap, start, count);
  spin_unlock_irqrestore(&mapping->lock, flags);
+}

+static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp) +{
  struct page **pages;
  int count = size >> PAGE_SHIFT;
  int array_size = count * sizeof(struct page *);
  int i = 0;
  if (array_size <= PAGE_SIZE)
          pages = kzalloc(array_size, gfp);
  else
          pages = vzalloc(array_size);
  if (!pages)
          return NULL;
  while (count) {
          int j, order = __ffs(count);
          pages[i] = alloc_pages(gfp | __GFP_NOWARN, order);
          while (!pages[i] && order)
                  pages[i] = alloc_pages(gfp | __GFP_NOWARN,
--order);
          if (!pages[i])
                  goto error;
          if (order)
                  split_page(pages[i], order);
          j = 1 << order;
          while (--j)
                  pages[i + j] = pages[i] + j;
          __dma_clear_buffer(pages[i], PAGE_SIZE << order);
          i += 1 << order;
          count -= 1 << order;
  }
  return pages;
+error:
  while (--i)
          if (pages[i])
                  __free_pages(pages[i], 0);
  if (array_size < PAGE_SIZE)
          kfree(pages);
  else
          vfree(pages);
  return NULL;
+}

+static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t size) +{
  int count = size >> PAGE_SHIFT;
  int array_size = count * sizeof(struct page *);
  int i;
  for (i = 0; i < count; i++)
          if (pages[i])
                  __free_pages(pages[i], 0);
  if (array_size < PAGE_SIZE)
          kfree(pages);
  else
          vfree(pages);
  return 0;
+}

+/*

Create a CPU mapping for a specified pages

*/

+static void * +__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot) +{
  struct arm_vmregion *c;
  size_t align;
  size_t count = size >> PAGE_SHIFT;
  int bit;
  if (!consistent_pte[0]) {
          pr_err("%s: not initialised\n", __func__);
          dump_stack();
          return NULL;
  }
  /*
   * Align the virtual region allocation - maximum alignment is
   * a section size, minimum is a page size.  This helps reduce
   * fragmentation of the DMA space, and also prevents allocations
   * smaller than a section from crossing a section boundary.
   */
  bit = fls(size - 1);
  if (bit > SECTION_SHIFT)
          bit = SECTION_SHIFT;
  align = 1 << bit;
  /*
   * Allocate a virtual address in the consistent mapping region.
   */
  c = arm_vmregion_alloc(&consistent_head, align, size,
                      gfp & ~(__GFP_DMA | __GFP_HIGHMEM), NULL);
  if (c) {
          pte_t *pte;
          int idx = CONSISTENT_PTE_INDEX(c->vm_start);
          int i = 0;
          u32 off = CONSISTENT_OFFSET(c->vm_start) &
(PTRS_PER_PTE-1);
          pte = consistent_pte[idx] + off;
          c->priv = pages;
          do {
                  BUG_ON(!pte_none(*pte));
                  set_pte_ext(pte, mk_pte(pages[i], prot), 0);
                  pte++;
                  off++;
                  i++;
                  if (off >= PTRS_PER_PTE) {
                          off = 0;
                          pte = consistent_pte[++idx];
                  }
          } while (i < count);
          dsb();
          return (void *)c->vm_start;
  }
  return NULL;
+}

+/*

Create a mapping in device IO address space for specified pages

*/

+static dma_addr_t +__iommu_create_mapping(struct device *dev, struct page **pages, size_t size) +{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
  dma_addr_t dma_addr, iova;
  int i, ret = DMA_ERROR_CODE;
  dma_addr = __alloc_iova(mapping, size);
  if (dma_addr == DMA_ERROR_CODE)
          return dma_addr;
  iova = dma_addr;
  for (i = 0; i < count; ) {
          unsigned int next_pfn = page_to_pfn(pages[i]) + 1;
          phys_addr_t phys = page_to_phys(pages[i]);
          unsigned int len, j;
          for (j = i + 1; j < count; j++, next_pfn++)
                  if (page_to_pfn(pages[j]) != next_pfn)
                          break;
          len = (j - i) << PAGE_SHIFT;
          ret = iommu_map(mapping->domain, iova, phys, len, 0);
          if (ret < 0)
                  goto fail;
          iova += len;
          i = j;
  }
  return dma_addr;
+fail:
  iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
  __free_iova(mapping, dma_addr, size);
  return DMA_ERROR_CODE;
+}

+static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size) +{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  /*
   * add optional in-page offset from iova to size and align
   * result to page size
   */
  size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
  iova &= PAGE_MASK;
  iommu_unmap(mapping->domain, iova, size);
  __free_iova(mapping, iova, size);
  return 0;
+}

+static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
      dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
+{
  pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
  struct page **pages;
  void *addr = NULL;
  *handle = DMA_ERROR_CODE;
  size = PAGE_ALIGN(size);
  pages = __iommu_alloc_buffer(dev, size, gfp);
  if (!pages)
          return NULL;
  *handle = __iommu_create_mapping(dev, pages, size);
  if (*handle == DMA_ERROR_CODE)
          goto err_buffer;
  addr = __iommu_alloc_remap(pages, size, gfp, prot);
  if (!addr)
          goto err_mapping;
  return addr;
+err_mapping:
  __iommu_remove_mapping(dev, *handle, size);
+err_buffer:
  __iommu_free_buffer(dev, pages, size);
  return NULL;
+}

+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
              void *cpu_addr, dma_addr_t dma_addr, size_t size,
              struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          unsigned long uaddr = vma->vm_start;
          unsigned long usize = vma->vm_end - vma->vm_start;
          int i = 0;
          do {
                  int ret;
                  ret = vm_insert_page(vma, uaddr, pages[i++]);
                  if (ret) {
                          pr_err("Remapping memory, error: %d\n",
ret);
                          return ret;
                  }
                  uaddr += PAGE_SIZE;
                  usize -= PAGE_SIZE;
          } while (usize > 0);
  }
  return 0;
+}

+/*

free a page as defined by the above mapping.

Must not be called with IRQs disabled.

*/

+void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
                    dma_addr_t handle, struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  size = PAGE_ALIGN(size);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          __dma_free_remap(cpu_addr, size);
          __iommu_remove_mapping(dev, handle, size);
          __iommu_free_buffer(dev, pages, size);
  }
+}

+/*

Map a part of the scatter-gather list into contiguous io address space

*/

+static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
                    size_t size, dma_addr_t *handle,
                    enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova, iova_base;
  int ret = 0;
  unsigned int count;
  struct scatterlist *s;
  size = PAGE_ALIGN(size);
  *handle = DMA_ERROR_CODE;
  iova_base = iova = __alloc_iova(mapping, size);
  if (iova == DMA_ERROR_CODE)
          return -ENOMEM;
  for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s =
sg_next(s)) {
          phys_addr_t phys = page_to_phys(sg_page(s));
          unsigned int len = PAGE_ALIGN(s->offset + s->length);
          if (!arch_is_coherent())
                  __dma_page_cpu_to_dev(sg_page(s), s->offset,
s->length, dir);
          ret = iommu_map(mapping->domain, iova, phys, len, 0);
          if (ret < 0)
                  goto fail;
          count += len >> PAGE_SHIFT;
          iova += len;
  }
  *handle = iova_base;
  return 0;
+fail:
  iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
  __free_iova(mapping, iova_base, size);
  return ret;
+}

+/**

arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA

@dev: valid struct device pointer

@sg: list of buffers

@nents: number of buffers to map

@dir: DMA transfer direction

Map a set of buffers described by scatterlist in streaming mode for

DMA.

The scatter gather list elements are merged together (if possible) and

tagged with the appropriate dma address and length. They are obtained

via

sg_dma_{address,length}.

*/

+int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
               enum dma_data_direction dir, struct dma_attrs *attrs)
+{
  struct scatterlist *s = sg, *dma = sg, *start = sg;
  int i, count = 0;
  unsigned int offset = s->offset;
  unsigned int size = s->offset + s->length;
  unsigned int max = dma_get_max_seg_size(dev);
  for (i = 1; i < nents; i++) {
          s = sg_next(s);
          s->dma_address = DMA_ERROR_CODE;
          s->dma_length = 0;
          if (s->offset || (size & ~PAGE_MASK) || size + s->length >
max) {
                  if (__map_sg_chunk(dev, start, size,
&dma->dma_address,
                      dir) < 0)
                          goto bad_mapping;
                  dma->dma_address += offset;
                  dma->dma_length = size - offset;
                  size = offset = s->offset;
                  start = s;
                  dma = sg_next(dma);
                  count += 1;
          }
          size += s->length;
  }
  if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir) < 0)
          goto bad_mapping;
  dma->dma_address += offset;
  dma->dma_length = size - offset;
  return count+1;
+bad_mapping:
  for_each_sg(sg, s, count, i)
          __iommu_remove_mapping(dev, sg_dma_address(s),
sg_dma_len(s));
  return 0;
+}

+/**

arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg

@dev: valid struct device pointer

@sg: list of buffers

@nents: number of buffers to unmap (same as was passed to dma_map_sg)

@dir: DMA transfer direction (same as was passed to dma_map_sg)

Unmap a set of streaming mode DMA translations. Again, CPU access

rules concerning calls here are the same as for dma_unmap_single().

*/

+void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
                  enum dma_data_direction dir, struct dma_attrs
*attrs) +{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i) {
          if (sg_dma_len(s))
                  __iommu_remove_mapping(dev, sg_dma_address(s),
                                         sg_dma_len(s));
          if (!arch_is_coherent())
                  __dma_page_dev_to_cpu(sg_page(s), s->offset,
                                        s->length, dir);
  }
+}

+/**

arm_iommu_sync_sg_for_cpu

@dev: valid struct device pointer

@sg: list of buffers

@nents: number of buffers to map (returned from dma_map_sg)

@dir: DMA transfer direction (same as was passed to dma_map_sg)

*/

+void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction dir)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i)
          if (!arch_is_coherent())
                  __dma_page_dev_to_cpu(sg_page(s), s->offset,
s->length, dir);

+}

+/**

arm_iommu_sync_sg_for_device

@dev: valid struct device pointer

@sg: list of buffers

@nents: number of buffers to map (returned from dma_map_sg)

@dir: DMA transfer direction (same as was passed to dma_map_sg)

*/

+void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction dir)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i)
          if (!arch_is_coherent())
                  __dma_page_cpu_to_dev(sg_page(s), s->offset,
s->length, dir); +}

+/**

arm_iommu_map_page

@dev: valid struct device pointer

@page: page that buffer resides in

@offset: offset into page for start of buffer

@size: size of buffer to map

@dir: DMA transfer direction

IOMMU aware version of arm_dma_map_page()

*/

+static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
       unsigned long offset, size_t size, enum dma_data_direction
dir,
       struct dma_attrs *attrs)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t dma_addr;
  int ret, len = PAGE_ALIGN(size + offset);
  if (!arch_is_coherent())
          __dma_page_cpu_to_dev(page, offset, size, dir);
  dma_addr = __alloc_iova(mapping, len);
  if (dma_addr == DMA_ERROR_CODE)
          return dma_addr;
  ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page),
len, 0);
  if (ret < 0)
          goto fail;
  return dma_addr + offset;
+fail:
  __free_iova(mapping, dma_addr, len);
  return DMA_ERROR_CODE;
+}

+/**

arm_iommu_unmap_page

@dev: valid struct device pointer

@handle: DMA address of buffer

@size: size of buffer (same as passed to dma_map_page)

@dir: DMA transfer direction (same as passed to dma_map_page)

IOMMU aware version of arm_dma_unmap_page()

*/

+static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
          size_t size, enum dma_data_direction dir,
          struct dma_attrs *attrs)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page =
phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  int offset = handle & ~PAGE_MASK;
  int len = PAGE_ALIGN(size + offset);
  if (!iova)
          return;
  if (!arch_is_coherent())
          __dma_page_dev_to_cpu(page, offset, size, dir);
  iommu_unmap(mapping->domain, iova, len);
  __free_iova(mapping, iova, len);
+}

+static void arm_iommu_sync_single_for_cpu(struct device *dev,
          dma_addr_t handle, size_t size, enum dma_data_direction
dir) +{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page =
phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  unsigned int offset = handle & ~PAGE_MASK;
  if (!iova)
          return;
  if (!arch_is_coherent())
          __dma_page_dev_to_cpu(page, offset, size, dir);
+}

+static void arm_iommu_sync_single_for_device(struct device *dev,
          dma_addr_t handle, size_t size, enum dma_data_direction
dir) +{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page =
phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  unsigned int offset = handle & ~PAGE_MASK;
  if (!iova)
          return;
  __dma_page_cpu_to_dev(page, offset, size, dir);
+}

+struct dma_map_ops iommu_ops = {
  .alloc          = arm_iommu_alloc_attrs,
  .free           = arm_iommu_free_attrs,
  .mmap           = arm_iommu_mmap_attrs,
  .map_page               = arm_iommu_map_page,
  .unmap_page             = arm_iommu_unmap_page,
  .sync_single_for_cpu    = arm_iommu_sync_single_for_cpu,
  .sync_single_for_device = arm_iommu_sync_single_for_device,
  .map_sg                 = arm_iommu_map_sg,
  .unmap_sg               = arm_iommu_unmap_sg,
  .sync_sg_for_cpu        = arm_iommu_sync_sg_for_cpu,
  .sync_sg_for_device     = arm_iommu_sync_sg_for_device,
+};

+/**

arm_iommu_create_mapping

@bus: pointer to the bus holding the client device (for IOMMU calls)

@base: start address of the valid IO address space

@size: size of the valid IO address space

@order: accuracy of the IO addresses allocations

Creates a mapping structure which holds information about used/unused

IO address ranges, which is required to perform memory allocation and

mapping with IOMMU aware functions.

The client device need to be attached to the mapping with

arm_iommu_attach_device function.

*/

+struct dma_iommu_mapping * +arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
                   int order)
+{
  unsigned int count = size >> (PAGE_SHIFT + order);
  unsigned int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
  struct dma_iommu_mapping *mapping;
  int err = -ENOMEM;
  if (!count)
          return ERR_PTR(-EINVAL);
  mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
  if (!mapping)
          goto err;
  mapping->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
  if (!mapping->bitmap)
          goto err2;
  mapping->base = base;
  mapping->bits = BITS_PER_BYTE * bitmap_size;
  mapping->order = order;
  spin_lock_init(&mapping->lock);
  mapping->domain = iommu_domain_alloc(bus);
  if (!mapping->domain)
          goto err3;
  kref_init(&mapping->kref);
  return mapping;
+err3:
  kfree(mapping->bitmap);
+err2:
  kfree(mapping);
+err:
  return ERR_PTR(err);
+}

+static void release_iommu_mapping(struct kref *kref) +{
  struct dma_iommu_mapping *mapping =
          container_of(kref, struct dma_iommu_mapping, kref);
  iommu_domain_free(mapping->domain);
  kfree(mapping->bitmap);
  kfree(mapping);
+}

+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping) +{
  if (mapping)
          kref_put(&mapping->kref, release_iommu_mapping);
+}

+/**
arm_iommu_attach_device

@dev: valid struct device pointer

@mapping: io address space mapping structure (returned from
arm_iommu_create_mapping)
Attaches specified io address space mapping to the provided device,

this replaces the dma operations (dma_map_ops pointer) with the

IOMMU aware version. More than one client might be attached to

the same io address space mapping.

*/
+int arm_iommu_attach_device(struct device *dev,
                      struct dma_iommu_mapping *mapping)
+{
  int err;
  err = iommu_attach_device(mapping->domain, dev);
  if (err)
          return err;
  kref_get(&mapping->kref);
  dev->archdata.mapping = mapping;
  set_dma_ops(dev, &iommu_ops);
  pr_info("Attached IOMMU controller to %s device.\n",
dev_name(dev));
  return 0;
+}

+#endif diff --git a/arch/arm/mm/vmregion.h b/arch/arm/mm/vmregion.h index 162be66..bf312c3 100644 --- a/arch/arm/mm/vmregion.h +++ b/arch/arm/mm/vmregion.h @@ -17,7 +17,7 @@ struct arm_vmregion { struct list_head vm_list; unsigned long vm_start; unsigned long vm_end;
  struct page             *vm_pages;
  void                    *priv;
 int                     vm_active;
 const void              *caller;
};

1.7.1.569.g6f426

Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig

Abhinav Kochhar

1:48 a.m.

New subject: [PATCHv9 10/10] ARM: dma-mapping: add support for IOMMU mapper

Even "size_t size" is unused

On Fri, Apr 20, 2012 at 10:44 AM, Abhinav Kochhar <kochhar.abhinav@gmail.com> wrote:

Hi Marek,

The "dma_addr_t dma_addr" argument passed to arm_iommu_mmap_attrs() is unused:

+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
              void *cpu_addr, dma_addr_t dma_addr, size_t size,
              struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          unsigned long uaddr = vma->vm_start;
          unsigned long usize = vma->vm_end - vma->vm_start;
          int i = 0;
          do {
                  int ret;
                  ret = vm_insert_page(vma, uaddr, pages[i++]);
                  if (ret) {
                          pr_err("Remapping memory, error: %d\n",
ret);
                          return ret;
                  }
                  uaddr += PAGE_SIZE;
                  usize -= PAGE_SIZE;
          } while (usize > 0);
  }
  return 0;
+}

On Wed, Apr 18, 2012 at 10:44 PM, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

This patch adds a complete implementation of the DMA-mapping API for devices which have IOMMU support.

This implementation tries to optimize dma address space usage by remapping all possible physical memory chunks into a single dma address space chunk.

The DMA address space is managed with a bitmap held in the dma_iommu_mapping structure, which is referenced from device->archdata. Platform setup code has to initialize the parameters of the DMA address space (base address, size, allocation precision order) with the arm_iommu_create_mapping() function. To reduce the size of the bitmap, all allocations are aligned to the specified order of base 4 KiB pages.

The dma_alloc_* functions allocate physical memory in chunks, each obtained with the alloc_pages() function, to avoid failing when physical memory gets fragmented. In the worst case the allocated buffer is composed of 4 KiB page chunks.
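
The chunking strategy can be sketched in user space as follows. This is only an illustration under stated assumptions: sketch_alloc_ok() is a hypothetical stand-in for alloc_pages() that pretends only blocks of up to 2^max_free_order pages are available, and plan_chunks() merely records which orders would be requested instead of allocating anything. The idea is to start from the largest power-of-two chunk that divides the remaining page count and to fall back to smaller orders when that fails.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for alloc_pages(): memory is assumed to be
 * fragmented so that only blocks of at most 2^max_free_order pages exist.
 * Single pages (order 0) therefore always succeed in this sketch.
 */
static int sketch_alloc_ok(unsigned int order, unsigned int max_free_order)
{
	return order <= max_free_order;
}

/* Index of the lowest set bit; 'v' must be non-zero. */
static unsigned int lowest_set_bit(unsigned int v)
{
	unsigned int bit = 0;

	assert(v != 0);
	while (!(v & 1u)) {
		v >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Split 'count' pages into power-of-two chunks and store the order of
 * each chunk in 'orders'. Returns the number of chunks, or 0 if more
 * than 'max_chunks' would be needed.
 */
static size_t plan_chunks(unsigned int count, unsigned int max_free_order,
			  unsigned int *orders, size_t max_chunks)
{
	size_t n = 0;

	while (count) {
		unsigned int order = lowest_set_bit(count);

		/* Fall back to smaller chunks when the big one is unavailable. */
		while (order && !sketch_alloc_ok(order, max_free_order))
			order--;
		if (n == max_chunks)
			return 0;
		orders[n++] = order;
		count -= 1u << order;
	}
	return n;
}
```

For a 6-page buffer the sketch plans a 2-page and a 4-page chunk when memory is not fragmented, and six single pages when only order-0 blocks are available.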

The dma_map_sg() function minimizes the total number of DMA address space chunks by merging physical memory chunks into one larger DMA address space chunk. If the requested chunk (scatter list entry) boundaries match physical page boundaries, most calls to dma_map_sg() will result in creating only one chunk in the DMA address space.
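
A rough user-space sketch of the merging rule may help here. It only counts how many DMA chunks a list would produce and maps nothing; struct sketch_seg, sketch_count_chunks() and SKETCH_PAGE_SIZE are hypothetical names, and max_seg plays the role of the device's maximum segment size. A new chunk is started when an entry has a non-zero offset, when the data accumulated so far does not end on a page boundary, or when adding the entry would exceed max_seg.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u

/* Simplified scatter list entry: offset into the first page and length. */
struct sketch_seg {
	unsigned int offset;
	unsigned int length;
};

/* Count the DMA address space chunks needed for 'nents' entries. */
static unsigned int sketch_count_chunks(const struct sketch_seg *sg,
					size_t nents, unsigned int max_seg)
{
	unsigned int size, chunks = 1;
	size_t i;

	assert(sg != NULL && nents > 0);
	size = sg[0].offset + sg[0].length;
	for (i = 1; i < nents; i++) {
		if (sg[i].offset || (size & (SKETCH_PAGE_SIZE - 1)) ||
		    size + sg[i].length > max_seg) {
			/* Entry cannot be appended: close the chunk, start a new one. */
			chunks++;
			size = sg[i].offset;
		}
		size += sg[i].length;
	}
	return chunks;
}
```

Three page-aligned 4 KiB entries collapse into a single chunk as long as max_seg allows it, while an entry that starts mid-page always opens a new chunk.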

dma_map_page() simply creates a mapping for the given page(s) in the dma address space.

All dma functions also perform the required cache operations, like their counterparts from the ARM linear physical memory mapping version.

This patch contains code and fixes kindly provided by:

Krishna Reddy vdumpa@nvidia.com,

Andrzej Pietrasiewicz andrzej.p@samsung.com,

Hiroshi DOYU hdoyu@nvidia.com

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
Acked-by: Kyungmin Park kyungmin.park@samsung.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
Tested-By: Subash Patel subash.ramaswamy@linaro.org

 arch/arm/Kconfig                 |    8 +
 arch/arm/include/asm/device.h    |    3 +
 arch/arm/include/asm/dma-iommu.h |   34 ++
 arch/arm/mm/dma-mapping.c        |  727 +++++++++++++++++++++++++++++++++++++-
 arch/arm/mm/vmregion.h           |    2 +-
 5 files changed, 759 insertions(+), 15 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0fd27d4..874e519 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -46,6 +46,14 @@ config ARM
 config ARM_HAS_SG_CHAIN
   bool

+config NEED_SG_DMA_LENGTH
  bool
+config ARM_DMA_USE_IOMMU
  select NEED_SG_DMA_LENGTH
  select ARM_HAS_SG_CHAIN
  bool
 config HAVE_PWM
   bool

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 6e2cb0e..b69c0d3 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,6 +14,9 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
   void *iommu; /* private IOMMU data */
 #endif
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
  struct dma_iommu_mapping        *mapping;
+#endif
 };

 struct omap_device;
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
new file mode 100644
index 0000000..799b094
--- /dev/null
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -0,0 +1,34 @@
+#ifndef ASMARM_DMA_IOMMU_H
+#define ASMARM_DMA_IOMMU_H

+#ifdef __KERNEL__

+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-debug.h>
+#include <linux/kmemcheck.h>

+struct dma_iommu_mapping {
  /* iommu specific data */
  struct iommu_domain     *domain;
  void                    *bitmap;
  size_t                  bits;
  unsigned int            order;
  dma_addr_t              base;
  spinlock_t              lock;
  struct kref             kref;
+};

+struct dma_iommu_mapping *
+arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
                   int order);
+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);

+int arm_iommu_attach_device(struct device *dev,
                                  struct dma_iommu_mapping *mapping);

+#endif /* __KERNEL__ */
+#endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index d4aad65..2d11aa0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -19,6 +19,8 @@
 #include <linux/dma-mapping.h>
 #include <linux/highmem.h>
 #include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/vmalloc.h>

 #include <asm/memory.h>
 #include <asm/highmem.h>
@@ -26,6 +28,7 @@
 #include <asm/tlbflush.h>
 #include <asm/sizes.h>
 #include <asm/mach/arch.h>
+#include <asm/dma-iommu.h>

 #include "mm.h"

@@ -155,6 +158,21 @@ static u64 get_coherent_dma_mask(struct device *dev)
         return mask;
 }

+static void __dma_clear_buffer(struct page *page, size_t size)
+{
  void *ptr;
  /*
   * Ensure that the allocated pages are zeroed, and that any data
   * lurking in the kernel direct-mapped region is invalidated.
   */
  ptr = page_address(page);
  if (ptr) {
          memset(ptr, 0, size);
          dmac_flush_range(ptr, ptr + size);
          outer_flush_range(__pa(ptr), __pa(ptr) + size);
  }
+}

 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask. Note that 'size' must be page aligned.
@@ -163,7 +181,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
         unsigned long order = get_order(size);
         struct page *page, *p, *e;
-        void *ptr;
         u64 mask = get_coherent_dma_mask(dev);

 #ifdef CONFIG_DMA_API_DEBUG
@@ -192,14 +209,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
         for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
                 __free_page(p);

-        /*
-         * Ensure that the allocated pages are zeroed, and that any data
-         * lurking in the kernel direct-mapped region is invalidated.
-         */
-        ptr = page_address(page);
-        memset(ptr, 0, size);
-        dmac_flush_range(ptr, ptr + size);
-        outer_flush_range(__pa(ptr), __pa(ptr) + size);
+        __dma_clear_buffer(page, size);

         return page;
 }
@@ -348,7 +358,7 @@ __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
                 u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);

                 pte = consistent_pte[idx] + off;
-                c->vm_pages = page;
+                c->priv = page;

                 do {
                         BUG_ON(!pte_none(*pte));
@@ -461,6 +471,14 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
         return addr;
 }

+static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
+{
  prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
                      pgprot_writecombine(prot) :
                      pgprot_dmacoherent(prot);
  return prot;
+}

 /*
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.
@@ -468,9 +486,7 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
                     gfp_t gfp, struct dma_attrs *attrs)
 {
-        pgprot_t prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
-                        pgprot_writecombine(pgprot_kernel) :
-                        pgprot_dmacoherent(pgprot_kernel);
+        pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
         void *memory;

         if (dma_alloc_from_coherent(dev, size, handle, &memory))
@@ -497,16 +513,20 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
                         pgprot_writecombine(vma->vm_page_prot) :
                         pgprot_dmacoherent(vma->vm_page_prot);

+        if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+                return ret;
         c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
         if (c) {
                 unsigned long off = vma->vm_pgoff;
+                struct page *pages = c->priv;

                 kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;

                 if (off < kern_size &&
                     user_size <= (kern_size - off)) {
                         ret = remap_pfn_range(vma, vma->vm_start,
-                                              page_to_pfn(c->vm_pages) + off,
+                                              page_to_pfn(pages) + off,
                                               user_size << PAGE_SHIFT,
                                               vma->vm_page_prot);
                 }
@@ -645,6 +665,9 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
         int i, j;

         for_each_sg(sg, s, nents, i) {
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
+                s->dma_length = s->length;
+#endif
                 s->dma_address = ops->map_page(dev, sg_page(s), s->offset,
                                                 s->length, dir, attrs);
                 if (dma_mapping_error(dev, s->dma_address))
@@ -753,3 +776,679 @@ static int __init dma_debug_do_init(void)
         return 0;
 }
 fs_initcall(dma_debug_do_init);

+#ifdef CONFIG_ARM_DMA_USE_IOMMU

+/* IOMMU */

+static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
                                size_t size)
+{
  unsigned int order = get_order(size);
  unsigned int align = 0;
  unsigned int count, start;
  unsigned long flags;
  count = ((PAGE_ALIGN(size) >> PAGE_SHIFT) +
           (1 << mapping->order) - 1) >> mapping->order;
  if (order > mapping->order)
          align = (1 << (order - mapping->order)) - 1;
  spin_lock_irqsave(&mapping->lock, flags);
  start = bitmap_find_next_zero_area(mapping->bitmap, mapping->bits, 0,
                                     count, align);
  if (start > mapping->bits) {
          spin_unlock_irqrestore(&mapping->lock, flags);
          return DMA_ERROR_CODE;
  }
  bitmap_set(mapping->bitmap, start, count);
  spin_unlock_irqrestore(&mapping->lock, flags);
  return mapping->base + (start << (mapping->order + PAGE_SHIFT));
+}

+static inline void __free_iova(struct dma_iommu_mapping *mapping,
                         dma_addr_t addr, size_t size)
+{
  unsigned int start = (addr - mapping->base) >>
                       (mapping->order + PAGE_SHIFT);
  unsigned int count = ((size >> PAGE_SHIFT) +
                        (1 << mapping->order) - 1) >> mapping->order;
  unsigned long flags;
  spin_lock_irqsave(&mapping->lock, flags);
  bitmap_clear(mapping->bitmap, start, count);
  spin_unlock_irqrestore(&mapping->lock, flags);
+}

+static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
+{
  struct page **pages;
  int count = size >> PAGE_SHIFT;
  int array_size = count * sizeof(struct page *);
  int i = 0;
  if (array_size <= PAGE_SIZE)
          pages = kzalloc(array_size, gfp);
  else
          pages = vzalloc(array_size);
  if (!pages)
          return NULL;
  while (count) {
          int j, order = __ffs(count);
          pages[i] = alloc_pages(gfp | __GFP_NOWARN, order);
          while (!pages[i] && order)
                  pages[i] = alloc_pages(gfp | __GFP_NOWARN, --order);
          if (!pages[i])
                  goto error;
          if (order)
                  split_page(pages[i], order);
          j = 1 << order;
          while (--j)
                  pages[i + j] = pages[i] + j;
          __dma_clear_buffer(pages[i], PAGE_SIZE << order);
          i += 1 << order;
          count -= 1 << order;
  }
  return pages;
+error:
  while (--i)
          if (pages[i])
                  __free_pages(pages[i], 0);
  if (array_size < PAGE_SIZE)
          kfree(pages);
  else
          vfree(pages);
  return NULL;
+}

+static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t size)
+{
  int count = size >> PAGE_SHIFT;
  int array_size = count * sizeof(struct page *);
  int i;
  for (i = 0; i < count; i++)
          if (pages[i])
                  __free_pages(pages[i], 0);
  if (array_size < PAGE_SIZE)
          kfree(pages);
  else
          vfree(pages);
  return 0;
+}

+/*
+ * Create a CPU mapping for a specified pages
+ */
+static void *
+__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot)
+{
  struct arm_vmregion *c;
  size_t align;
  size_t count = size >> PAGE_SHIFT;
  int bit;
  if (!consistent_pte[0]) {
          pr_err("%s: not initialised\n", __func__);
          dump_stack();
          return NULL;
  }
  /*
   * Align the virtual region allocation - maximum alignment is
   * a section size, minimum is a page size.  This helps reduce
   * fragmentation of the DMA space, and also prevents allocations
   * smaller than a section from crossing a section boundary.
   */
  bit = fls(size - 1);
  if (bit > SECTION_SHIFT)
          bit = SECTION_SHIFT;
  align = 1 << bit;
  /*
   * Allocate a virtual address in the consistent mapping region.
   */
  c = arm_vmregion_alloc(&consistent_head, align, size,
                      gfp & ~(__GFP_DMA | __GFP_HIGHMEM), NULL);
  if (c) {
          pte_t *pte;
          int idx = CONSISTENT_PTE_INDEX(c->vm_start);
          int i = 0;
          u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
          pte = consistent_pte[idx] + off;
          c->priv = pages;
          do {
                  BUG_ON(!pte_none(*pte));
                  set_pte_ext(pte, mk_pte(pages[i], prot), 0);
                  pte++;
                  off++;
                  i++;
                  if (off >= PTRS_PER_PTE) {
                          off = 0;
                          pte = consistent_pte[++idx];
                  }
          } while (i < count);
          dsb();
          return (void *)c->vm_start;
  }
  return NULL;
+}

+/*
+ * Create a mapping in device IO address space for specified pages
+ */
+static dma_addr_t
+__iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
  dma_addr_t dma_addr, iova;
  int i, ret = DMA_ERROR_CODE;
  dma_addr = __alloc_iova(mapping, size);
  if (dma_addr == DMA_ERROR_CODE)
          return dma_addr;
  iova = dma_addr;
  for (i = 0; i < count; ) {
          unsigned int next_pfn = page_to_pfn(pages[i]) + 1;
          phys_addr_t phys = page_to_phys(pages[i]);
          unsigned int len, j;
          for (j = i + 1; j < count; j++, next_pfn++)
                  if (page_to_pfn(pages[j]) != next_pfn)
                          break;
          len = (j - i) << PAGE_SHIFT;
          ret = iommu_map(mapping->domain, iova, phys, len, 0);
          if (ret < 0)
                  goto fail;
          iova += len;
          i = j;
  }
  return dma_addr;
+fail:
  iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
  __free_iova(mapping, dma_addr, size);
  return DMA_ERROR_CODE;
+}

+static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  /*
   * add optional in-page offset from iova to size and align
   * result to page size
   */
  size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
  iova &= PAGE_MASK;
  iommu_unmap(mapping->domain, iova, size);
  __free_iova(mapping, iova, size);
  return 0;
+}

+static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
      dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
+{
  pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
  struct page **pages;
  void *addr = NULL;
  *handle = DMA_ERROR_CODE;
  size = PAGE_ALIGN(size);
  pages = __iommu_alloc_buffer(dev, size, gfp);
  if (!pages)
          return NULL;
  *handle = __iommu_create_mapping(dev, pages, size);
  if (*handle == DMA_ERROR_CODE)
          goto err_buffer;
  addr = __iommu_alloc_remap(pages, size, gfp, prot);
  if (!addr)
          goto err_mapping;
  return addr;
+err_mapping:
  __iommu_remove_mapping(dev, *handle, size);
+err_buffer:
  __iommu_free_buffer(dev, pages, size);
  return NULL;
+}

+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
              void *cpu_addr, dma_addr_t dma_addr, size_t size,
              struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          unsigned long uaddr = vma->vm_start;
          unsigned long usize = vma->vm_end - vma->vm_start;
          int i = 0;
          do {
                  int ret;
                  ret = vm_insert_page(vma, uaddr, pages[i++]);
                  if (ret) {
                          pr_err("Remapping memory, error: %d\n", ret);
                          return ret;
                  }
                  uaddr += PAGE_SIZE;
                  usize -= PAGE_SIZE;
          } while (usize > 0);
  }
  return 0;
+}

+/*
+ * free a page as defined by the above mapping.
+ * Must not be called with IRQs disabled.
+ */

+void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
                    dma_addr_t handle, struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  size = PAGE_ALIGN(size);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          __dma_free_remap(cpu_addr, size);
          __iommu_remove_mapping(dev, handle, size);
          __iommu_free_buffer(dev, pages, size);
  }
+}

+/*
+ * Map a part of the scatter-gather list into contiguous io address space
+ */

+static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
                    size_t size, dma_addr_t *handle,
                    enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova, iova_base;
  int ret = 0;
  unsigned int count;
  struct scatterlist *s;
  size = PAGE_ALIGN(size);
  *handle = DMA_ERROR_CODE;
  iova_base = iova = __alloc_iova(mapping, size);
  if (iova == DMA_ERROR_CODE)
          return -ENOMEM;
  for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
          phys_addr_t phys = page_to_phys(sg_page(s));
          unsigned int len = PAGE_ALIGN(s->offset + s->length);
          if (!arch_is_coherent())
                  __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
          ret = iommu_map(mapping->domain, iova, phys, len, 0);
          if (ret < 0)
                  goto fail;
          count += len >> PAGE_SHIFT;
          iova += len;
  }
  *handle = iova_base;
  return 0;
+fail:
  iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
  __free_iova(mapping, iova_base, size);
  return ret;
+}

+/**
+ * arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map
+ * @dir: DMA transfer direction
+ *
+ * Map a set of buffers described by scatterlist in streaming mode for DMA.
+ * The scatter gather list elements are merged together (if possible) and
+ * tagged with the appropriate dma address and length. They are obtained via
+ * sg_dma_{address,length}.
+ */

+int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
               enum dma_data_direction dir, struct dma_attrs *attrs)
+{
  struct scatterlist *s = sg, *dma = sg, *start = sg;
  int i, count = 0;
  unsigned int offset = s->offset;
  unsigned int size = s->offset + s->length;
  unsigned int max = dma_get_max_seg_size(dev);
  for (i = 1; i < nents; i++) {
          s = sg_next(s);
          s->dma_address = DMA_ERROR_CODE;
          s->dma_length = 0;
          if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
                  if (__map_sg_chunk(dev, start, size, &dma->dma_address,
                      dir) < 0)
                          goto bad_mapping;
                  dma->dma_address += offset;
                  dma->dma_length = size - offset;
                  size = offset = s->offset;
                  start = s;
                  dma = sg_next(dma);
                  count += 1;
          }
          size += s->length;
  }
  if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir) < 0)
          goto bad_mapping;
  dma->dma_address += offset;
  dma->dma_length = size - offset;
  return count+1;
+bad_mapping:
  for_each_sg(sg, s, count, i)
          __iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s));
  return 0;
+}

+/**
+ * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ *
+ * Unmap a set of streaming mode DMA translations. Again, CPU access
+ * rules concerning calls here are the same as for dma_unmap_single().
+ */
+void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
                  enum dma_data_direction dir, struct dma_attrs *attrs)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i) {
          if (sg_dma_len(s))
                  __iommu_remove_mapping(dev, sg_dma_address(s),
                                         sg_dma_len(s));
          if (!arch_is_coherent())
                  __dma_page_dev_to_cpu(sg_page(s), s->offset,
                                        s->length, dir);
  }
+}

+/**
+ * arm_iommu_sync_sg_for_cpu
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map (returned from dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ */

+void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction dir)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i)
          if (!arch_is_coherent())
                  __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
+}

+/**
+ * arm_iommu_sync_sg_for_device
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map (returned from dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ */

+void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction dir)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i)
          if (!arch_is_coherent())
                  __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+}

+/**
+ * arm_iommu_map_page
+ * @dev: valid struct device pointer
+ * @page: page that buffer resides in
+ * @offset: offset into page for start of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * IOMMU aware version of arm_dma_map_page()
+ */
+static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
       unsigned long offset, size_t size, enum dma_data_direction dir,
       struct dma_attrs *attrs)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t dma_addr;
  int ret, len = PAGE_ALIGN(size + offset);
  if (!arch_is_coherent())
          __dma_page_cpu_to_dev(page, offset, size, dir);
  dma_addr = __alloc_iova(mapping, len);
  if (dma_addr == DMA_ERROR_CODE)
          return dma_addr;
  ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0);
  if (ret < 0)
          goto fail;
  return dma_addr + offset;
+fail:
  __free_iova(mapping, dma_addr, len);
  return DMA_ERROR_CODE;
+}

+/**
+ * arm_iommu_unmap_page
+ * @dev: valid struct device pointer
+ * @handle: DMA address of buffer
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
+ *
+ * IOMMU aware version of arm_dma_unmap_page()
+ */

+static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
          size_t size, enum dma_data_direction dir,
          struct dma_attrs *attrs)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  int offset = handle & ~PAGE_MASK;
  int len = PAGE_ALIGN(size + offset);
  if (!iova)
          return;
  if (!arch_is_coherent())
          __dma_page_dev_to_cpu(page, offset, size, dir);
  iommu_unmap(mapping->domain, iova, len);
  __free_iova(mapping, iova, len);
+}

+static void arm_iommu_sync_single_for_cpu(struct device *dev,
          dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  unsigned int offset = handle & ~PAGE_MASK;
  if (!iova)
          return;
  if (!arch_is_coherent())
          __dma_page_dev_to_cpu(page, offset, size, dir);
+}

+static void arm_iommu_sync_single_for_device(struct device *dev,
          dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  unsigned int offset = handle & ~PAGE_MASK;
  if (!iova)
          return;
  __dma_page_cpu_to_dev(page, offset, size, dir);
+}

+struct dma_map_ops iommu_ops = {
  .alloc          = arm_iommu_alloc_attrs,
  .free           = arm_iommu_free_attrs,
  .mmap           = arm_iommu_mmap_attrs,
  .map_page               = arm_iommu_map_page,
  .unmap_page             = arm_iommu_unmap_page,
  .sync_single_for_cpu    = arm_iommu_sync_single_for_cpu,
  .sync_single_for_device = arm_iommu_sync_single_for_device,
  .map_sg                 = arm_iommu_map_sg,
  .unmap_sg               = arm_iommu_unmap_sg,
  .sync_sg_for_cpu        = arm_iommu_sync_sg_for_cpu,
  .sync_sg_for_device     = arm_iommu_sync_sg_for_device,
+};

+/**
+ * arm_iommu_create_mapping
+ * @bus: pointer to the bus holding the client device (for IOMMU calls)
+ * @base: start address of the valid IO address space
+ * @size: size of the valid IO address space
+ * @order: accuracy of the IO addresses allocations
+ *
+ * Creates a mapping structure which holds information about used/unused
+ * IO address ranges, which is required to perform memory allocation and
+ * mapping with IOMMU aware functions.
+ *
+ * The client device need to be attached to the mapping with
+ * arm_iommu_attach_device function.
+ */
+struct dma_iommu_mapping *
+arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
                   int order)
+{
  unsigned int count = size >> (PAGE_SHIFT + order);
  unsigned int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
  struct dma_iommu_mapping *mapping;
  int err = -ENOMEM;
  if (!count)
          return ERR_PTR(-EINVAL);
  mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
  if (!mapping)
          goto err;
  mapping->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
  if (!mapping->bitmap)
          goto err2;
  mapping->base = base;
  mapping->bits = BITS_PER_BYTE * bitmap_size;
  mapping->order = order;
  spin_lock_init(&mapping->lock);
  mapping->domain = iommu_domain_alloc(bus);
  if (!mapping->domain)
          goto err3;
  kref_init(&mapping->kref);
  return mapping;
+err3:
  kfree(mapping->bitmap);
+err2:
  kfree(mapping);
+err:
  return ERR_PTR(err);
+}

+static void release_iommu_mapping(struct kref *kref)
+{
  struct dma_iommu_mapping *mapping =
          container_of(kref, struct dma_iommu_mapping, kref);
  iommu_domain_free(mapping->domain);
  kfree(mapping->bitmap);
  kfree(mapping);
+}

+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
+{
  if (mapping)
          kref_put(&mapping->kref, release_iommu_mapping);
+}

+/**
+ * arm_iommu_attach_device
+ * @dev: valid struct device pointer
+ * @mapping: io address space mapping structure (returned from
+ *           arm_iommu_create_mapping)
+ *
+ * Attaches specified io address space mapping to the provided device,
+ * this replaces the dma operations (dma_map_ops pointer) with the
+ * IOMMU aware version. More than one client might be attached to
+ * the same io address space mapping.
+ */
+int arm_iommu_attach_device(struct device *dev,
                      struct dma_iommu_mapping *mapping)
+{
  int err;
  err = iommu_attach_device(mapping->domain, dev);
  if (err)
          return err;
  kref_get(&mapping->kref);
  dev->archdata.mapping = mapping;
  set_dma_ops(dev, &iommu_ops);
  pr_info("Attached IOMMU controller to %s device.\n", dev_name(dev));
  return 0;
+}

+#endif
diff --git a/arch/arm/mm/vmregion.h b/arch/arm/mm/vmregion.h
index 162be66..bf312c3 100644
--- a/arch/arm/mm/vmregion.h
+++ b/arch/arm/mm/vmregion.h
@@ -17,7 +17,7 @@ struct arm_vmregion {
         struct list_head        vm_list;
         unsigned long           vm_start;
         unsigned long           vm_end;
-        struct page             *vm_pages;
+        void                    *priv;
         int                     vm_active;
         const void              *caller;
 };

1.7.1.569.g6f426


Kyungmin Park

1:51 a.m.

New subject: [PATCHv9 10/10] ARM: dma-mapping: add support for IOMMU mapper

On 4/20/12, Abhinav Kochhar kochhar.abhinav@gmail.com wrote:

...

Hi Marek,

dma_addr_t dma_addr is an unused argument passed to the function arm_iommu_mmap_attrs

Even though it is not used here, it is part of the signature of the mmap callback in dma_map_ops, so it is required to match that type.

struct dma_map_ops iommu_ops = {
        .alloc = arm_iommu_alloc_attrs,
        .free = arm_iommu_free_attrs,
        .mmap = arm_iommu_mmap_attrs,
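
As a stand-alone illustration of this point (a sketch with hypothetical names, not the kernel's dma_map_ops): when a callback is stored in an ops structure, every implementation has to use the exact function pointer type of that field, so parameters that one particular implementation does not need still have to appear in its parameter list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much reduced ops table with a single callback. */
struct sketch_ops {
	int (*mmap)(void *cpu_addr, unsigned long dma_addr, size_t size);
};

/*
 * This implementation only needs cpu_addr. dma_addr and size are unused,
 * but they must stay in the parameter list so that the function matches
 * the type of the 'mmap' field above.
 */
static int sketch_iommu_mmap(void *cpu_addr, unsigned long dma_addr,
			     size_t size)
{
	(void)dma_addr;
	(void)size;
	return cpu_addr ? 0 : -1;
}

static const struct sketch_ops sketch_iommu_ops = {
	.mmap = sketch_iommu_mmap,
};

/* Generic caller: it always passes all arguments through the table. */
static int sketch_do_mmap(const struct sketch_ops *ops, void *cpu_addr,
			  unsigned long dma_addr, size_t size)
{
	assert(ops != NULL && ops->mmap != NULL);
	return ops->mmap(cpu_addr, dma_addr, size);
}
```

Dropping the unused parameters from sketch_iommu_mmap() would make the initializer of sketch_iommu_ops an incompatible pointer assignment, which is the situation described above.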

Thank you, Kyungmin Park

...

+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
              void *cpu_addr, dma_addr_t dma_addr, size_t size,
              struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          unsigned long uaddr = vma->vm_start;
          unsigned long usize = vma->vm_end - vma->vm_start;
          int i = 0;
          do {
                  int ret;
                  ret = vm_insert_page(vma, uaddr, pages[i++]);
                  if (ret) {
                          pr_err("Remapping memory, error: %d\n", ret);
                          return ret;
                  }
                  uaddr += PAGE_SIZE;
                  usize -= PAGE_SIZE;
          } while (usize > 0);
  }
  return 0;
+}

On Wed, Apr 18, 2012 at 10:44 PM, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

This patch adds a complete implementation of the DMA-mapping API for devices which have IOMMU support.

This implementation tries to optimize dma address space usage by remapping all possible physical memory chunks into a single dma address space chunk.

The DMA address space is managed with a bitmap stored in the dma_iommu_mapping structure, which is referenced from device->archdata. Platform setup code has to initialize the parameters of the DMA address space (base address, size, allocation precision order) with the arm_iommu_create_mapping() function. To reduce the size of the bitmap, all allocations are aligned to the specified order of base 4 KiB pages.

The dma_alloc_* functions allocate physical memory in chunks, each obtained with the alloc_pages() function, to avoid failing when physical memory gets fragmented. In the worst case the allocated buffer is composed of 4 KiB page chunks.

The dma_map_sg() function minimizes the total number of DMA address space chunks by merging physical memory chunks into one larger DMA address space chunk. If the requested chunk (scatter list entry) boundaries match physical page boundaries, most calls to dma_map_sg() will result in creating only one chunk in the DMA address space.

dma_map_page() simply creates a mapping for the given page(s) in the dma address space.

All dma functions also perform the required cache operations, like their counterparts from the ARM linear physical memory mapping version.

This patch contains code and fixes kindly provided by:

Krishna Reddy vdumpa@nvidia.com,

Andrzej Pietrasiewicz andrzej.p@samsung.com,

Hiroshi DOYU hdoyu@nvidia.com

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
Acked-by: Kyungmin Park kyungmin.park@samsung.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
Tested-By: Subash Patel subash.ramaswamy@linaro.org

 arch/arm/Kconfig                 |    8 +
 arch/arm/include/asm/device.h    |    3 +
 arch/arm/include/asm/dma-iommu.h |   34 ++
 arch/arm/mm/dma-mapping.c        |  727 +++++++++++++++++++++++++++++++++++++-
 arch/arm/mm/vmregion.h           |    2 +-
 5 files changed, 759 insertions(+), 15 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0fd27d4..874e519 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -46,6 +46,14 @@ config ARM
 config ARM_HAS_SG_CHAIN
   bool

+config NEED_SG_DMA_LENGTH
  bool
+config ARM_DMA_USE_IOMMU
  select NEED_SG_DMA_LENGTH
  select ARM_HAS_SG_CHAIN
  bool
 config HAVE_PWM
   bool

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 6e2cb0e..b69c0d3 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,6 +14,9 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
   void *iommu; /* private IOMMU data */
 #endif
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
  struct dma_iommu_mapping        *mapping;
+#endif
 };

struct omap_device;
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
new file mode 100644
index 0000000..799b094
--- /dev/null
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -0,0 +1,34 @@
+#ifndef ASMARM_DMA_IOMMU_H
+#define ASMARM_DMA_IOMMU_H

+#ifdef __KERNEL__

+#include <linux/mm_types.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-debug.h>
+#include <linux/kmemcheck.h>

+struct dma_iommu_mapping {
  /* iommu specific data */
  struct iommu_domain     *domain;
  void                    *bitmap;
  size_t                  bits;
  unsigned int            order;
  dma_addr_t              base;
  spinlock_t              lock;
  struct kref             kref;
+};

+struct dma_iommu_mapping *
+arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
                   int order);
+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);

+int arm_iommu_attach_device(struct device *dev,
                                  struct dma_iommu_mapping *mapping);

+#endif /* __KERNEL__ */
+#endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index d4aad65..2d11aa0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -19,6 +19,8 @@
#include <linux/dma-mapping.h>
#include <linux/highmem.h>
#include <linux/slab.h>
+#include <linux/iommu.h>
+#include <linux/vmalloc.h>

#include <asm/memory.h>
#include <asm/highmem.h>
@@ -26,6 +28,7 @@
#include <asm/tlbflush.h>
#include <asm/sizes.h>
#include <asm/mach/arch.h>
+#include <asm/dma-iommu.h>

#include "mm.h"

@@ -155,6 +158,21 @@ static u64 get_coherent_dma_mask(struct device *dev)
 return mask;
}

+static void __dma_clear_buffer(struct page *page, size_t size)
+{
  void *ptr;
  /*
   * Ensure that the allocated pages are zeroed, and that any data
   * lurking in the kernel direct-mapped region is invalidated.
   */
  ptr = page_address(page);
  if (ptr) {
          memset(ptr, 0, size);
          dmac_flush_range(ptr, ptr + size);
          outer_flush_range(__pa(ptr), __pa(ptr) + size);
  }
+}

/*
 * Allocate a DMA buffer for 'dev' of size 'size' using the
 * specified gfp mask. Note that 'size' must be page aligned.
@@ -163,7 +181,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
{
 unsigned long order = get_order(size);
 struct page *page, *p, *e;
- void *ptr;
 u64 mask = get_coherent_dma_mask(dev);

#ifdef CONFIG_DMA_API_DEBUG
@@ -192,14 +209,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
         __free_page(p);

- /*
-  * Ensure that the allocated pages are zeroed, and that any data
-  * lurking in the kernel direct-mapped region is invalidated.
-  */
- ptr = page_address(page);
- memset(ptr, 0, size);
- dmac_flush_range(ptr, ptr + size);
- outer_flush_range(__pa(ptr), __pa(ptr) + size);
+ __dma_clear_buffer(page, size);

 return page;
}
@@ -348,7 +358,7 @@ __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
         u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);

         pte = consistent_pte[idx] + off;
-         c->vm_pages = page;
+         c->priv = page;

         do {
                 BUG_ON(!pte_none(*pte));
@@ -461,6 +471,14 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 return addr;
}

+static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
+{
  prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
                      pgprot_writecombine(prot) :
                      pgprot_dmacoherent(prot);
  return prot;
+}

/*
 * Allocate DMA-coherent memory space and return both the kernel remapped
 * virtual and bus address for that space.
@@ -468,9 +486,7 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
                    gfp_t gfp, struct dma_attrs *attrs)
{
- pgprot_t prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
-                 pgprot_writecombine(pgprot_kernel) :
-                 pgprot_dmacoherent(pgprot_kernel);
+ pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
 void *memory;

 if (dma_alloc_from_coherent(dev, size, handle, &memory))
@@ -497,16 +513,20 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
                         pgprot_writecombine(vma->vm_page_prot) :
                         pgprot_dmacoherent(vma->vm_page_prot);

+ if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+         return ret;
 c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
 if (c) {
         unsigned long off = vma->vm_pgoff;
+         struct page *pages = c->priv;

         kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;

         if (off < kern_size &&
             user_size <= (kern_size - off)) {
                 ret = remap_pfn_range(vma, vma->vm_start,
-                                       page_to_pfn(c->vm_pages) + off,
+                                       page_to_pfn(pages) + off,
                                       user_size << PAGE_SHIFT,
                                       vma->vm_page_prot);
         }
@@ -645,6 +665,9 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 int i, j;
   for_each_sg(sg, s, nents, i) {
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
          s->dma_length = s->length;
+#endif
         s->dma_address = ops->map_page(dev, sg_page(s), s->offset,
                                         s->length, dir, attrs);
         if (dma_mapping_error(dev, s->dma_address))
@@ -753,3 +776,679 @@ static int __init dma_debug_do_init(void)
 return 0;
}
fs_initcall(dma_debug_do_init);

+#ifdef CONFIG_ARM_DMA_USE_IOMMU

+/* IOMMU */

+static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
                                size_t size)
+{
  unsigned int order = get_order(size);
  unsigned int align = 0;
  unsigned int count, start;
  unsigned long flags;
  count = ((PAGE_ALIGN(size) >> PAGE_SHIFT) +
           (1 << mapping->order) - 1) >> mapping->order;
  if (order > mapping->order)
          align = (1 << (order - mapping->order)) - 1;
  spin_lock_irqsave(&mapping->lock, flags);
  start = bitmap_find_next_zero_area(mapping->bitmap, mapping->bits, 0,
                                     count, align);
  if (start > mapping->bits) {
          spin_unlock_irqrestore(&mapping->lock, flags);
          return DMA_ERROR_CODE;
  }
  bitmap_set(mapping->bitmap, start, count);
  spin_unlock_irqrestore(&mapping->lock, flags);
  return mapping->base + (start << (mapping->order + PAGE_SHIFT));
+}

+static inline void __free_iova(struct dma_iommu_mapping *mapping,
                         dma_addr_t addr, size_t size)
+{
  unsigned int start = (addr - mapping->base) >>
                       (mapping->order + PAGE_SHIFT);
  unsigned int count = ((size >> PAGE_SHIFT) +
                        (1 << mapping->order) - 1) >> mapping->order;
  unsigned long flags;
  spin_lock_irqsave(&mapping->lock, flags);
  bitmap_clear(mapping->bitmap, start, count);
  spin_unlock_irqrestore(&mapping->lock, flags);
+}

+static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
+{
  struct page **pages;
  int count = size >> PAGE_SHIFT;
  int array_size = count * sizeof(struct page *);
  int i = 0;
  if (array_size <= PAGE_SIZE)
          pages = kzalloc(array_size, gfp);
  else
          pages = vzalloc(array_size);
  if (!pages)
          return NULL;
  while (count) {
          int j, order = __ffs(count);
          pages[i] = alloc_pages(gfp | __GFP_NOWARN, order);
          while (!pages[i] && order)
                  pages[i] = alloc_pages(gfp | __GFP_NOWARN, --order);
          if (!pages[i])
                  goto error;
          if (order)
                  split_page(pages[i], order);
          j = 1 << order;
          while (--j)
                  pages[i + j] = pages[i] + j;
          __dma_clear_buffer(pages[i], PAGE_SIZE << order);
          i += 1 << order;
          count -= 1 << order;
  }
  return pages;
+error:
  while (i--)
          if (pages[i])
                  __free_pages(pages[i], 0);
  if (array_size <= PAGE_SIZE)
          kfree(pages);
  else
          vfree(pages);
  return NULL;
+}

+static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t size)
+{
  int count = size >> PAGE_SHIFT;
  int array_size = count * sizeof(struct page *);
  int i;
  for (i = 0; i < count; i++)
          if (pages[i])
                  __free_pages(pages[i], 0);
  if (array_size <= PAGE_SIZE)
          kfree(pages);
  else
          vfree(pages);
  return 0;
+}

+/*
+ * Create a CPU mapping for a specified pages
+ */
+static void *
+__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot)
+{
  struct arm_vmregion *c;
  size_t align;
  size_t count = size >> PAGE_SHIFT;
  int bit;
  if (!consistent_pte[0]) {
          pr_err("%s: not initialised\n", __func__);
          dump_stack();
          return NULL;
  }
  /*
   * Align the virtual region allocation - maximum alignment is
   * a section size, minimum is a page size.  This helps reduce
   * fragmentation of the DMA space, and also prevents allocations
   * smaller than a section from crossing a section boundary.
   */
  bit = fls(size - 1);
  if (bit > SECTION_SHIFT)
          bit = SECTION_SHIFT;
  align = 1 << bit;
  /*
   * Allocate a virtual address in the consistent mapping region.
   */
  c = arm_vmregion_alloc(&consistent_head, align, size,
                      gfp & ~(__GFP_DMA | __GFP_HIGHMEM), NULL);
  if (c) {
          pte_t *pte;
          int idx = CONSISTENT_PTE_INDEX(c->vm_start);
          int i = 0;
          u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
          pte = consistent_pte[idx] + off;
          c->priv = pages;
          do {
                  BUG_ON(!pte_none(*pte));
                  set_pte_ext(pte, mk_pte(pages[i], prot), 0);
                  pte++;
                  off++;
                  i++;
                  if (off >= PTRS_PER_PTE) {
                          off = 0;
                          pte = consistent_pte[++idx];
                  }
          } while (i < count);
          dsb();
          return (void *)c->vm_start;
  }
  return NULL;
+}

+/*
+ * Create a mapping in device IO address space for specified pages
+ */
+static dma_addr_t
+__iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
  dma_addr_t dma_addr, iova;
  int i, ret = DMA_ERROR_CODE;
  dma_addr = __alloc_iova(mapping, size);
  if (dma_addr == DMA_ERROR_CODE)
          return dma_addr;
  iova = dma_addr;
  for (i = 0; i < count; ) {
          unsigned int next_pfn = page_to_pfn(pages[i]) + 1;
          phys_addr_t phys = page_to_phys(pages[i]);
          unsigned int len, j;
          for (j = i + 1; j < count; j++, next_pfn++)
                  if (page_to_pfn(pages[j]) != next_pfn)
                          break;
          len = (j - i) << PAGE_SHIFT;
          ret = iommu_map(mapping->domain, iova, phys, len, 0);
          if (ret < 0)
                  goto fail;
          iova += len;
          i = j;
  }
  return dma_addr;
+fail:
  iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
  __free_iova(mapping, dma_addr, size);
  return DMA_ERROR_CODE;
+}

+static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  /*
   * add optional in-page offset from iova to size and align
   * result to page size
   */
  size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
  iova &= PAGE_MASK;
  iommu_unmap(mapping->domain, iova, size);
  __free_iova(mapping, iova, size);
  return 0;
+}

+static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
      dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
+{
  pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
  struct page **pages;
  void *addr = NULL;
  *handle = DMA_ERROR_CODE;
  size = PAGE_ALIGN(size);
  pages = __iommu_alloc_buffer(dev, size, gfp);
  if (!pages)
          return NULL;
  *handle = __iommu_create_mapping(dev, pages, size);
  if (*handle == DMA_ERROR_CODE)
          goto err_buffer;
  addr = __iommu_alloc_remap(pages, size, gfp, prot);
  if (!addr)
          goto err_mapping;
  return addr;
+err_mapping:
  __iommu_remove_mapping(dev, *handle, size);
+err_buffer:
  __iommu_free_buffer(dev, pages, size);
  return NULL;
+}

+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
              void *cpu_addr, dma_addr_t dma_addr, size_t size,
              struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          unsigned long uaddr = vma->vm_start;
          unsigned long usize = vma->vm_end - vma->vm_start;
          int i = 0;
          do {
                  int ret;
                  ret = vm_insert_page(vma, uaddr, pages[i++]);
                  if (ret) {
                          pr_err("Remapping memory, error: %d\n", ret);
                          return ret;
                  }
                  uaddr += PAGE_SIZE;
                  usize -= PAGE_SIZE;
          } while (usize > 0);
  }
  return 0;
+}

+/*
+ * free a page as defined by the above mapping.
+ * Must not be called with IRQs disabled.
+ */

+void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
                    dma_addr_t handle, struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  size = PAGE_ALIGN(size);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          __dma_free_remap(cpu_addr, size);
          __iommu_remove_mapping(dev, handle, size);
          __iommu_free_buffer(dev, pages, size);
  }
+}

+/*
+ * Map a part of the scatter-gather list into contiguous io address space
+ */

+static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
                    size_t size, dma_addr_t *handle,
                    enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova, iova_base;
  int ret = 0;
  unsigned int count;
  struct scatterlist *s;
  size = PAGE_ALIGN(size);
  *handle = DMA_ERROR_CODE;
  iova_base = iova = __alloc_iova(mapping, size);
  if (iova == DMA_ERROR_CODE)
          return -ENOMEM;
  for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
          phys_addr_t phys = page_to_phys(sg_page(s));
          unsigned int len = PAGE_ALIGN(s->offset + s->length);
          if (!arch_is_coherent())
                  __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
          ret = iommu_map(mapping->domain, iova, phys, len, 0);
          if (ret < 0)
                  goto fail;
          count += len >> PAGE_SHIFT;
          iova += len;
  }
  *handle = iova_base;
  return 0;
+fail:
  iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
  __free_iova(mapping, iova_base, size);
  return ret;
+}

+/**
+ * arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map
+ * @dir: DMA transfer direction
+ *
+ * Map a set of buffers described by scatterlist in streaming mode for DMA.
+ * The scatter gather list elements are merged together (if possible) and
+ * tagged with the appropriate dma address and length. They are obtained via
+ * sg_dma_{address,length}.
+ */

+int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
               enum dma_data_direction dir, struct dma_attrs *attrs)
+{
  struct scatterlist *s = sg, *dma = sg, *start = sg;
  int i, count = 0;
  unsigned int offset = s->offset;
  unsigned int size = s->offset + s->length;
  unsigned int max = dma_get_max_seg_size(dev);
  for (i = 1; i < nents; i++) {
          s = sg_next(s);
          s->dma_address = DMA_ERROR_CODE;
          s->dma_length = 0;
          if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
                  if (__map_sg_chunk(dev, start, size, &dma->dma_address,
                      dir) < 0)
                          goto bad_mapping;
                  dma->dma_address += offset;
                  dma->dma_length = size - offset;
                  size = offset = s->offset;
                  start = s;
                  dma = sg_next(dma);
                  count += 1;
          }
          size += s->length;
  }
  if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir) < 0)
          goto bad_mapping;
  dma->dma_address += offset;
  dma->dma_length = size - offset;
  return count+1;
+bad_mapping:
  for_each_sg(sg, s, count, i)
          __iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s));
  return 0;
+}

+/**
+ * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ *
+ * Unmap a set of streaming mode DMA translations. Again, CPU access
+ * rules concerning calls here are the same as for dma_unmap_single().
+ */

+void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
                  enum dma_data_direction dir, struct dma_attrs *attrs)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i) {
          if (sg_dma_len(s))
                  __iommu_remove_mapping(dev, sg_dma_address(s),
                                         sg_dma_len(s));
          if (!arch_is_coherent())
                  __dma_page_dev_to_cpu(sg_page(s), s->offset,
                                        s->length, dir);
  }
+}

+/**
+ * arm_iommu_sync_sg_for_cpu
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map (returned from dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ */

+void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction dir)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i)
          if (!arch_is_coherent())
                  __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);

+}

+/**
+ * arm_iommu_sync_sg_for_device
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map (returned from dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ */

+void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction dir)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i)
          if (!arch_is_coherent())
                  __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+}

+/**
+ * arm_iommu_map_page
+ * @dev: valid struct device pointer
+ * @page: page that buffer resides in
+ * @offset: offset into page for start of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * IOMMU aware version of arm_dma_map_page()
+ */

+static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
       unsigned long offset, size_t size, enum dma_data_direction dir,
       struct dma_attrs *attrs)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t dma_addr;
  int ret, len = PAGE_ALIGN(size + offset);
  if (!arch_is_coherent())
          __dma_page_cpu_to_dev(page, offset, size, dir);
  dma_addr = __alloc_iova(mapping, len);
  if (dma_addr == DMA_ERROR_CODE)
          return dma_addr;
  ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0);
  if (ret < 0)
          goto fail;
  return dma_addr + offset;
+fail:
  __free_iova(mapping, dma_addr, len);
  return DMA_ERROR_CODE;
+}

+/**
+ * arm_iommu_unmap_page
+ * @dev: valid struct device pointer
+ * @handle: DMA address of buffer
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
+ *
+ * IOMMU aware version of arm_dma_unmap_page()
+ */

+static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
          size_t size, enum dma_data_direction dir,
          struct dma_attrs *attrs)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  int offset = handle & ~PAGE_MASK;
  int len = PAGE_ALIGN(size + offset);
  if (!iova)
          return;
  if (!arch_is_coherent())
          __dma_page_dev_to_cpu(page, offset, size, dir);
  iommu_unmap(mapping->domain, iova, len);
  __free_iova(mapping, iova, len);
+}

+static void arm_iommu_sync_single_for_cpu(struct device *dev,
          dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  unsigned int offset = handle & ~PAGE_MASK;
  if (!iova)
          return;
  if (!arch_is_coherent())
          __dma_page_dev_to_cpu(page, offset, size, dir);
+}

+static void arm_iommu_sync_single_for_device(struct device *dev,
          dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  unsigned int offset = handle & ~PAGE_MASK;
  if (!iova)
          return;
  __dma_page_cpu_to_dev(page, offset, size, dir);
+}

+struct dma_map_ops iommu_ops = {
  .alloc          = arm_iommu_alloc_attrs,
  .free           = arm_iommu_free_attrs,
  .mmap           = arm_iommu_mmap_attrs,
  .map_page               = arm_iommu_map_page,
  .unmap_page             = arm_iommu_unmap_page,
  .sync_single_for_cpu    = arm_iommu_sync_single_for_cpu,
  .sync_single_for_device = arm_iommu_sync_single_for_device,
  .map_sg                 = arm_iommu_map_sg,
  .unmap_sg               = arm_iommu_unmap_sg,
  .sync_sg_for_cpu        = arm_iommu_sync_sg_for_cpu,
  .sync_sg_for_device     = arm_iommu_sync_sg_for_device,
+};

+/**
+ * arm_iommu_create_mapping
+ * @bus: pointer to the bus holding the client device (for IOMMU calls)
+ * @base: start address of the valid IO address space
+ * @size: size of the valid IO address space
+ * @order: accuracy of the IO addresses allocations
+ *
+ * Creates a mapping structure which holds information about used/unused
+ * IO address ranges, which is required to perform memory allocation and
+ * mapping with IOMMU aware functions.
+ *
+ * The client device need to be attached to the mapping with
+ * arm_iommu_attach_device function.
+ */
+struct dma_iommu_mapping *
+arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
                   int order)
+{
  unsigned int count = size >> (PAGE_SHIFT + order);
  unsigned int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
  struct dma_iommu_mapping *mapping;
  int err = -ENOMEM;
  if (!count)
          return ERR_PTR(-EINVAL);
  mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
  if (!mapping)
          goto err;
  mapping->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
  if (!mapping->bitmap)
          goto err2;
  mapping->base = base;
  mapping->bits = BITS_PER_BYTE * bitmap_size;
  mapping->order = order;
  spin_lock_init(&mapping->lock);
  mapping->domain = iommu_domain_alloc(bus);
  if (!mapping->domain)
          goto err3;
  kref_init(&mapping->kref);
  return mapping;
+err3:
  kfree(mapping->bitmap);
+err2:
  kfree(mapping);
+err:
  return ERR_PTR(err);
+}

+static void release_iommu_mapping(struct kref *kref)
+{
  struct dma_iommu_mapping *mapping =
          container_of(kref, struct dma_iommu_mapping, kref);
  iommu_domain_free(mapping->domain);
  kfree(mapping->bitmap);
  kfree(mapping);
+}

+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
+{
  if (mapping)
          kref_put(&mapping->kref, release_iommu_mapping);
+}

+/**
+ * arm_iommu_attach_device
+ * @dev: valid struct device pointer
+ * @mapping: io address space mapping structure (returned from
+ *      arm_iommu_create_mapping)
+ *
+ * Attaches specified io address space mapping to the provided device,
+ * this replaces the dma operations (dma_map_ops pointer) with the
+ * IOMMU aware version. More than one client might be attached to
+ * the same io address space mapping.
+ */
+int arm_iommu_attach_device(struct device *dev,
                      struct dma_iommu_mapping *mapping)
+{
  int err;
  err = iommu_attach_device(mapping->domain, dev);
  if (err)
          return err;
  kref_get(&mapping->kref);
  dev->archdata.mapping = mapping;
  set_dma_ops(dev, &iommu_ops);
  pr_info("Attached IOMMU controller to %s device.\n", dev_name(dev));
  return 0;
+}

+#endif
diff --git a/arch/arm/mm/vmregion.h b/arch/arm/mm/vmregion.h
index 162be66..bf312c3 100644
--- a/arch/arm/mm/vmregion.h
+++ b/arch/arm/mm/vmregion.h
@@ -17,7 +17,7 @@ struct arm_vmregion {
 struct list_head        vm_list;
 unsigned long           vm_start;
 unsigned long           vm_end;
- struct page             *vm_pages;
+ void                    *priv;
 int                     vm_active;
 const void              *caller;
};

1.7.1.569.g6f426

Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig

Abhinav Kochhar

23 Apr 23 Apr

10:42 a.m.

New subject: [PATCHv9 10/10] ARM: dma-mapping: add support for IOMMU mapper

Hi,

I see a bottleneck in the current dma-mapping framework. The issue seems to be with the virtual memory allocation for access in the kernel address space.

1. In "arch/arm/mm/dma-mapping.c" there is an initialization call to "consistent_init". It reserves 32 MB of kernel address space.

2. "consistent_init" allocates memory for the kernel page directory and page tables.

3. The "__iommu_alloc_remap" function allocates a virtual memory region in the kernel address space reserved in step 1.

4. The "__iommu_alloc_remap" function then maps the allocated pages to the address space reserved in step 3.

Since the virtual memory area allocated for mapping these pages in the kernel address space is only 32 MB, the calls for allocating and mapping new pages into the kernel address space will eventually fail once the 32 MB are exhausted.

For example, on the Exynos 5 platform each framebuffer for 1280x800 resolution consumes around 4 MB.

We have a scenario where the X11 DRI driver allocates non-contiguous pages for all "Pixmaps" through the "exynos_drm_gem_create" function, which follows the path given above in steps 1 - 4.

Now the problem is the size limitation of 32MB. We may want to allocate more than 8 such buffers when X11 DRI driver is integrated.
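The arithmetic behind this limit can be checked with a short sketch. It assumes 4 bytes per pixel (the pixel format is not stated above) and ignores alignment padding and any other users of the consistent region; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Size in bytes of one framebuffer; bytes_per_pixel is an assumption. */
static size_t framebuffer_bytes(size_t width, size_t height,
				size_t bytes_per_pixel)
{
	return width * height * bytes_per_pixel;
}

/* How many whole buffers of 'buffer_bytes' fit into 'region_bytes'. */
static size_t buffers_that_fit(size_t region_bytes, size_t buffer_bytes)
{
	assert(buffer_bytes > 0);
	return region_bytes / buffer_bytes;
}
```

For 1280x800 at 4 bytes per pixel, `framebuffer_bytes()` gives 4,096,000 bytes, so `buffers_that_fit()` returns 8 for a 32 MB region and 16 for a 64 MB one, before counting any other allocations.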

Possible solutions:

1. Why do we need to create a kernel virtual address space? Are we going to access these pages in kernel using this address?

If we are not going to access anything in the kernel, then why do we need to map these pages into the kernel address space? If we can avoid this, the problem can be solved.

2. Is it used only for book-keeping, to retrieve "struct pages" later on for passing/mapping to different devices?

If yes, then we have to find another way.

For "dmabuf" framework one solution could be to add a new member variable "pages" in the exporting driver's local object and use that for passing/mapping to different devices.

Moreover, even if we increase the region to, say, 64 MB, that would not be enough for our use: we never know how many graphics applications the user will spawn. Let me know your opinion on this.

Regards, Abhinav

On Fri, Apr 20, 2012 at 10:51 AM, Kyungmin Park kyungmin.park@samsung.com wrote:

...

On 4/20/12, Abhinav Kochhar kochhar.abhinav@gmail.com wrote:

...
Hi Marek,

dma_addr_t dma_addr is an unused argument passed to the function arm_iommu_mmap_attrs

Even though it's not used here, it is the mmap function field in dma_map_ops. It's required to match the type.

struct dma_map_ops iommu_ops = {
        .alloc = arm_iommu_alloc_attrs,
        .free = arm_iommu_free_attrs,
        .mmap = arm_iommu_mmap_attrs,
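The point about matching the type can be shown with a simplified, self-contained sketch. `struct sketch_ops` is a hypothetical stand-in and not the kernel's `struct dma_map_ops`; it only shows why an implementation must accept a parameter it never reads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ops table; not the kernel's struct dma_map_ops. */
struct sketch_ops {
	int (*mmap)(void *cpu_addr, unsigned long dma_addr, size_t size);
};

/*
 * This implementation needs only cpu_addr and size. dma_addr is still
 * part of the signature, otherwise the function could not be assigned
 * to the .mmap field.
 */
static int sketch_iommu_mmap(void *cpu_addr, unsigned long dma_addr,
			     size_t size)
{
	(void)dma_addr;		/* intentionally unused */
	return (cpu_addr != NULL && size > 0) ? 0 : -1;
}

static const struct sketch_ops sketch_iommu_ops = {
	.mmap = sketch_iommu_mmap,
};
```

Callers go through `sketch_iommu_ops.mmap(...)` and always pass all three arguments, so every implementation plugged into the table has to share the same prototype.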

Thank you, Kyungmin Park

...
+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
              void *cpu_addr, dma_addr_t dma_addr, size_t size,
              struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          unsigned long uaddr = vma->vm_start;
          unsigned long usize = vma->vm_end - vma->vm_start;
          int i = 0;
          do {
                  int ret;
                  ret = vm_insert_page(vma, uaddr, pages[i++]);
                  if (ret) {
                          pr_err("Remapping memory, error: %d\n", ret);
                          return ret;
                  }
                  uaddr += PAGE_SIZE;
                  usize -= PAGE_SIZE;
          } while (usize > 0);
  }
  return 0;
+}

On Wed, Apr 18, 2012 at 10:44 PM, Marek Szyprowski <m.szyprowski@samsung.com> wrote:

This patch adds a complete implementation of the DMA-mapping API for devices which have IOMMU support.

This implementation tries to optimize dma address space usage by remapping all possible physical memory chunks into a single dma address space chunk.

DMA address space is managed on top of the bitmap stored in the dma_iommu_mapping structure stored in device->archdata. Platform setup code has to initialize parameters of the dma address space (base address, size, allocation precision order) with arm_iommu_create_mapping() function. To reduce the size of the bitmap, all allocations are aligned to the specified order of base 4 KiB pages.

The dma_alloc_* functions allocate physical memory in chunks, each obtained with alloc_pages(), so that the allocation does not fail when physical memory is fragmented. In the worst case the allocated buffer is composed of 4 KiB page chunks.

The dma_map_sg() function minimizes the total number of dma address space chunks by merging physical memory chunks into one larger dma address space chunk. If the requested chunk (scatter list entry) boundaries match physical page boundaries, most calls to dma_map_sg() will create only one chunk in the dma address space.

dma_map_page() simply creates a mapping for the given page(s) in the dma address space.

All dma functions also perform the required cache operations, like their counterparts in the ARM linear physical memory mapping version.

This patch contains code and fixes kindly provided by:

Krishna Reddy vdumpa@nvidia.com,

Andrzej Pietrasiewicz andrzej.p@samsung.com,

Hiroshi DOYU hdoyu@nvidia.com

Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
Acked-by: Kyungmin Park kyungmin.park@samsung.com
Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
Tested-By: Subash Patel subash.ramaswamy@linaro.org

 arch/arm/Kconfig                 |    8 +
 arch/arm/include/asm/device.h    |    3 +
 arch/arm/include/asm/dma-iommu.h |   34 ++
 arch/arm/mm/dma-mapping.c        |  727 +++++++++++++++++++++++++++++++++++++-
 arch/arm/mm/vmregion.h           |    2 +-
 5 files changed, 759 insertions(+), 15 deletions(-)
 create mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 0fd27d4..874e519 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -46,6 +46,14 @@ config ARM
config ARM_HAS_SG_CHAIN
 bool

+config NEED_SG_DMA_LENGTH
  bool
+config ARM_DMA_USE_IOMMU
  select NEED_SG_DMA_LENGTH
  select ARM_HAS_SG_CHAIN
  bool
config HAVE_PWM
 bool

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 6e2cb0e..b69c0d3 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,6 +14,9 @@ struct dev_archdata {
#ifdef CONFIG_IOMMU_API
 void *iommu; /* private IOMMU data */
#endif
+#ifdef CONFIG_ARM_DMA_USE_IOMMU
  struct dma_iommu_mapping        *mapping;
+#endif
};

struct omap_device; diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h new file mode 100644 index 0000000..799b094 --- /dev/null +++ b/arch/arm/include/asm/dma-iommu.h @@ -0,0 +1,34 @@ +#ifndef ASMARM_DMA_IOMMU_H +#define ASMARM_DMA_IOMMU_H

+#ifdef __KERNEL__

+#include <linux/mm_types.h> +#include <linux/scatterlist.h> +#include <linux/dma-debug.h> +#include <linux/kmemcheck.h>

+struct dma_iommu_mapping {
  /* iommu specific data */
  struct iommu_domain     *domain;
  void                    *bitmap;
  size_t                  bits;
  unsigned int            order;
  dma_addr_t              base;
  spinlock_t              lock;
  struct kref             kref;
+};

+struct dma_iommu_mapping * +arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
                   int order);
+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);

+int arm_iommu_attach_device(struct device *dev,
                                  struct dma_iommu_mapping
*mapping);

+#endif /* __KERNEL__ */ +#endif diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index d4aad65..2d11aa0 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -19,6 +19,8 @@ #include <linux/dma-mapping.h> #include <linux/highmem.h> #include <linux/slab.h> +#include <linux/iommu.h> +#include <linux/vmalloc.h>

#include <asm/memory.h> #include <asm/highmem.h> @@ -26,6 +28,7 @@ #include <asm/tlbflush.h> #include <asm/sizes.h> #include <asm/mach/arch.h> +#include <asm/dma-iommu.h>

#include "mm.h"

@@ -155,6 +158,21 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }

+static void __dma_clear_buffer(struct page *page, size_t size)
+{
+	void *ptr;
+	/*
+	 * Ensure that the allocated pages are zeroed, and that any data
+	 * lurking in the kernel direct-mapped region is invalidated.
+	 */
+	ptr = page_address(page);
+	if (ptr) {
+		memset(ptr, 0, size);
+		dmac_flush_range(ptr, ptr + size);
+		outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	}
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask.  Note that 'size' must be page aligned.

@@ -163,7 +181,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 {
 	unsigned long order = get_order(size);
 	struct page *page, *p, *e;
-	void *ptr;
 	u64 mask = get_coherent_dma_mask(dev);

 #ifdef CONFIG_DMA_API_DEBUG
@@ -192,14 +209,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
 		__free_page(p);

-	/*
-	 * Ensure that the allocated pages are zeroed, and that any data
-	 * lurking in the kernel direct-mapped region is invalidated.
-	 */
-	ptr = page_address(page);
-	memset(ptr, 0, size);
-	dmac_flush_range(ptr, ptr + size);
-	outer_flush_range(__pa(ptr), __pa(ptr) + size);
+	__dma_clear_buffer(page, size);

 	return page;
 }
@@ -348,7 +358,7 @@ __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot,
 		u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);

 		pte = consistent_pte[idx] + off;
-		c->vm_pages = page;
+		c->priv = page;

 		do {
 			BUG_ON(!pte_none(*pte));
@@ -461,6 +471,14 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 	return addr;
 }

+static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
+{
+	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			    pgprot_writecombine(prot) :
+			    pgprot_dmacoherent(prot);
+	return prot;
+}
+
 /*
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.

@@ -468,9 +486,7 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 		    gfp_t gfp, struct dma_attrs *attrs)
 {
-	pgprot_t prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
-			pgprot_writecombine(pgprot_kernel) :
-			pgprot_dmacoherent(pgprot_kernel);
+	pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
 	void *memory;

 	if (dma_alloc_from_coherent(dev, size, handle, &memory))
@@ -497,16 +513,20 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 			    pgprot_writecombine(vma->vm_page_prot) :
 			    pgprot_dmacoherent(vma->vm_page_prot);

+	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
 	c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
 	if (c) {
 		unsigned long off = vma->vm_pgoff;
+		struct page *pages = c->priv;

 		kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT;

 		if (off < kern_size &&
 		    user_size <= (kern_size - off)) {
 			ret = remap_pfn_range(vma, vma->vm_start,
-					      page_to_pfn(c->vm_pages) + off,
+					      page_to_pfn(pages) + off,
 					      user_size << PAGE_SHIFT,
 					      vma->vm_page_prot);
 		}
@@ -645,6 +665,9 @@ int arm_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
 	int i, j;

 	for_each_sg(sg, s, nents, i) {
+#ifdef CONFIG_NEED_SG_DMA_LENGTH
+		s->dma_length = s->length;
+#endif
 		s->dma_address = ops->map_page(dev, sg_page(s), s->offset,
 						s->length, dir, attrs);
 		if (dma_mapping_error(dev, s->dma_address))
@@ -753,3 +776,679 @@ static int __init dma_debug_do_init(void)
 	return 0;
 }
 fs_initcall(dma_debug_do_init);

+#ifdef CONFIG_ARM_DMA_USE_IOMMU

+/* IOMMU */

+static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
                                size_t size)
+{
  unsigned int order = get_order(size);
  unsigned int align = 0;
  unsigned int count, start;
  unsigned long flags;
  count = ((PAGE_ALIGN(size) >> PAGE_SHIFT) +
           (1 << mapping->order) - 1) >> mapping->order;
  if (order > mapping->order)
          align = (1 << (order - mapping->order)) - 1;
  spin_lock_irqsave(&mapping->lock, flags);
  start = bitmap_find_next_zero_area(mapping->bitmap, mapping->bits, 0,
                                     count, align);
  if (start > mapping->bits) {
          spin_unlock_irqrestore(&mapping->lock, flags);
          return DMA_ERROR_CODE;
  }
  bitmap_set(mapping->bitmap, start, count);
  spin_unlock_irqrestore(&mapping->lock, flags);
  return mapping->base + (start << (mapping->order + PAGE_SHIFT));
+}

+static inline void __free_iova(struct dma_iommu_mapping *mapping,
                         dma_addr_t addr, size_t size)
+{
  unsigned int start = (addr - mapping->base) >>
                       (mapping->order + PAGE_SHIFT);
  unsigned int count = ((size >> PAGE_SHIFT) +
                        (1 << mapping->order) - 1) >> mapping->order;
  unsigned long flags;
  spin_lock_irqsave(&mapping->lock, flags);
  bitmap_clear(mapping->bitmap, start, count);
  spin_unlock_irqrestore(&mapping->lock, flags);
+}

+static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
+{
  struct page **pages;
  int count = size >> PAGE_SHIFT;
  int array_size = count * sizeof(struct page *);
  int i = 0;
  if (array_size <= PAGE_SIZE)
          pages = kzalloc(array_size, gfp);
  else
          pages = vzalloc(array_size);
  if (!pages)
          return NULL;
  while (count) {
          int j, order = __ffs(count);
          pages[i] = alloc_pages(gfp | __GFP_NOWARN, order);
          while (!pages[i] && order)
                  pages[i] = alloc_pages(gfp | __GFP_NOWARN, --order);
          if (!pages[i])
                  goto error;
          if (order)
                  split_page(pages[i], order);
          j = 1 << order;
          while (--j)
                  pages[i + j] = pages[i] + j;
          __dma_clear_buffer(pages[i], PAGE_SIZE << order);
          i += 1 << order;
          count -= 1 << order;
  }
  return pages;
+error:
  while (--i)
          if (pages[i])
                  __free_pages(pages[i], 0);
  if (array_size < PAGE_SIZE)
          kfree(pages);
  else
          vfree(pages);
  return NULL;
+}

+static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t size)
+{
  int count = size >> PAGE_SHIFT;
  int array_size = count * sizeof(struct page *);
  int i;
  for (i = 0; i < count; i++)
          if (pages[i])
                  __free_pages(pages[i], 0);
  if (array_size < PAGE_SIZE)
          kfree(pages);
  else
          vfree(pages);
  return 0;
+}

+/*
+ * Create a CPU mapping for a specified pages
+ */
+static void *
+__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot)
+{
  struct arm_vmregion *c;
  size_t align;
  size_t count = size >> PAGE_SHIFT;
  int bit;
  if (!consistent_pte[0]) {
          pr_err("%s: not initialised\n", __func__);
          dump_stack();
          return NULL;
  }
  /*
   * Align the virtual region allocation - maximum alignment is
   * a section size, minimum is a page size.  This helps reduce
   * fragmentation of the DMA space, and also prevents allocations
   * smaller than a section from crossing a section boundary.
   */
  bit = fls(size - 1);
  if (bit > SECTION_SHIFT)
          bit = SECTION_SHIFT;
  align = 1 << bit;
  /*
   * Allocate a virtual address in the consistent mapping region.
   */
  c = arm_vmregion_alloc(&consistent_head, align, size,
                      gfp & ~(__GFP_DMA | __GFP_HIGHMEM), NULL);
  if (c) {
          pte_t *pte;
          int idx = CONSISTENT_PTE_INDEX(c->vm_start);
          int i = 0;
          u32 off = CONSISTENT_OFFSET(c->vm_start) & (PTRS_PER_PTE-1);
          pte = consistent_pte[idx] + off;
          c->priv = pages;
          do {
                  BUG_ON(!pte_none(*pte));
                  set_pte_ext(pte, mk_pte(pages[i], prot), 0);
                  pte++;
                  off++;
                  i++;
                  if (off >= PTRS_PER_PTE) {
                          off = 0;
                          pte = consistent_pte[++idx];
                  }
          } while (i < count);
          dsb();
          return (void *)c->vm_start;
  }
  return NULL;
+}

+/*
+ * Create a mapping in device IO address space for specified pages
+ */
+static dma_addr_t
+__iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
  dma_addr_t dma_addr, iova;
  int i, ret = DMA_ERROR_CODE;
  dma_addr = __alloc_iova(mapping, size);
  if (dma_addr == DMA_ERROR_CODE)
          return dma_addr;
  iova = dma_addr;
  for (i = 0; i < count; ) {
          unsigned int next_pfn = page_to_pfn(pages[i]) + 1;
          phys_addr_t phys = page_to_phys(pages[i]);
          unsigned int len, j;
          for (j = i + 1; j < count; j++, next_pfn++)
                  if (page_to_pfn(pages[j]) != next_pfn)
                          break;
          len = (j - i) << PAGE_SHIFT;
          ret = iommu_map(mapping->domain, iova, phys, len, 0);
          if (ret < 0)
                  goto fail;
          iova += len;
          i = j;
  }
  return dma_addr;
+fail:
  iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
  __free_iova(mapping, dma_addr, size);
  return DMA_ERROR_CODE;
+}

+static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  /*
   * add optional in-page offset from iova to size and align
   * result to page size
   */
  size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
  iova &= PAGE_MASK;
  iommu_unmap(mapping->domain, iova, size);
  __free_iova(mapping, iova, size);
  return 0;
+}

+static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
      dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
+{
  pgprot_t prot = __get_dma_pgprot(attrs, pgprot_kernel);
  struct page **pages;
  void *addr = NULL;
  *handle = DMA_ERROR_CODE;
  size = PAGE_ALIGN(size);
  pages = __iommu_alloc_buffer(dev, size, gfp);
  if (!pages)
          return NULL;
  *handle = __iommu_create_mapping(dev, pages, size);
  if (*handle == DMA_ERROR_CODE)
          goto err_buffer;
  addr = __iommu_alloc_remap(pages, size, gfp, prot);
  if (!addr)
          goto err_mapping;
  return addr;
+err_mapping:
  __iommu_remove_mapping(dev, *handle, size);
+err_buffer:
  __iommu_free_buffer(dev, pages, size);
  return NULL;
+}

+static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
              void *cpu_addr, dma_addr_t dma_addr, size_t size,
              struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          unsigned long uaddr = vma->vm_start;
          unsigned long usize = vma->vm_end - vma->vm_start;
          int i = 0;
          do {
                  int ret;
                  ret = vm_insert_page(vma, uaddr, pages[i++]);
                  if (ret) {
                          pr_err("Remapping memory, error: %d\n", ret);
                          return ret;
                  }
                  uaddr += PAGE_SIZE;
                  usize -= PAGE_SIZE;
          } while (usize > 0);
  }
  return 0;
+}

+/*
+ * free a page as defined by the above mapping.
+ * Must not be called with IRQs disabled.
+ */
+void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
                    dma_addr_t handle, struct dma_attrs *attrs)
+{
  struct arm_vmregion *c;
  size = PAGE_ALIGN(size);
  c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr);
  if (c) {
          struct page **pages = c->priv;
          __dma_free_remap(cpu_addr, size);
          __iommu_remove_mapping(dev, handle, size);
          __iommu_free_buffer(dev, pages, size);
  }
+}

+/*
+ * Map a part of the scatter-gather list into contiguous io address space
+ */

+static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
                    size_t size, dma_addr_t *handle,
                    enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova, iova_base;
  int ret = 0;
  unsigned int count;
  struct scatterlist *s;
  size = PAGE_ALIGN(size);
  *handle = DMA_ERROR_CODE;
  iova_base = iova = __alloc_iova(mapping, size);
  if (iova == DMA_ERROR_CODE)
          return -ENOMEM;
  for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
          phys_addr_t phys = page_to_phys(sg_page(s));
          unsigned int len = PAGE_ALIGN(s->offset + s->length);
          if (!arch_is_coherent())
                  __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
          ret = iommu_map(mapping->domain, iova, phys, len, 0);
          if (ret < 0)
                  goto fail;
          count += len >> PAGE_SHIFT;
          iova += len;
  }
  *handle = iova_base;
  return 0;
+fail:
  iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
  __free_iova(mapping, iova_base, size);
  return ret;
+}

+/**
+ * arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map
+ * @dir: DMA transfer direction
+ *
+ * Map a set of buffers described by scatterlist in streaming mode for DMA.
+ * The scatter gather list elements are merged together (if possible) and
+ * tagged with the appropriate dma address and length. They are obtained via
+ * sg_dma_{address,length}.
+ */
+int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
               enum dma_data_direction dir, struct dma_attrs *attrs)
+{
  struct scatterlist *s = sg, *dma = sg, *start = sg;
  int i, count = 0;
  unsigned int offset = s->offset;
  unsigned int size = s->offset + s->length;
  unsigned int max = dma_get_max_seg_size(dev);
  for (i = 1; i < nents; i++) {
          s = sg_next(s);
          s->dma_address = DMA_ERROR_CODE;
          s->dma_length = 0;
          if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
                  if (__map_sg_chunk(dev, start, size, &dma->dma_address,
                      dir) < 0)
                          goto bad_mapping;
                  dma->dma_address += offset;
                  dma->dma_length = size - offset;
                  size = offset = s->offset;
                  start = s;
                  dma = sg_next(dma);
                  count += 1;
          }
          size += s->length;
  }
  if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir) < 0)
          goto bad_mapping;
  dma->dma_address += offset;
  dma->dma_length = size - offset;
  return count+1;
+bad_mapping:
  for_each_sg(sg, s, count, i)
          __iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s));
  return 0;
+}

+/**
+ * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ *
+ * Unmap a set of streaming mode DMA translations. Again, CPU access
+ * rules concerning calls here are the same as for dma_unmap_single().
+ */
+void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
                  enum dma_data_direction dir, struct dma_attrs *attrs)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i) {
          if (sg_dma_len(s))
                  __iommu_remove_mapping(dev, sg_dma_address(s),
                                         sg_dma_len(s));
          if (!arch_is_coherent())
                  __dma_page_dev_to_cpu(sg_page(s), s->offset,
                                        s->length, dir);
  }
+}

+/**
+ * arm_iommu_sync_sg_for_cpu
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map (returned from dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ */

+void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction dir)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i)
          if (!arch_is_coherent())
                  __dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
+}

+/**
+ * arm_iommu_sync_sg_for_device
+ * @dev: valid struct device pointer
+ * @sg: list of buffers
+ * @nents: number of buffers to map (returned from dma_map_sg)
+ * @dir: DMA transfer direction (same as was passed to dma_map_sg)
+ */
+void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
                  int nents, enum dma_data_direction dir)
+{
  struct scatterlist *s;
  int i;
  for_each_sg(sg, s, nents, i)
          if (!arch_is_coherent())
                  __dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
+}

+/**
+ * arm_iommu_map_page
+ * @dev: valid struct device pointer
+ * @page: page that buffer resides in
+ * @offset: offset into page for start of buffer
+ * @size: size of buffer to map
+ * @dir: DMA transfer direction
+ *
+ * IOMMU aware version of arm_dma_map_page()
+ */
+static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
       unsigned long offset, size_t size, enum dma_data_direction dir,
       struct dma_attrs *attrs)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t dma_addr;
  int ret, len = PAGE_ALIGN(size + offset);
  if (!arch_is_coherent())
          __dma_page_cpu_to_dev(page, offset, size, dir);
  dma_addr = __alloc_iova(mapping, len);
  if (dma_addr == DMA_ERROR_CODE)
          return dma_addr;
  ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0);
  if (ret < 0)
          goto fail;
  return dma_addr + offset;
+fail:
  __free_iova(mapping, dma_addr, len);
  return DMA_ERROR_CODE;
+}

+/**
+ * arm_iommu_unmap_page
+ * @dev: valid struct device pointer
+ * @handle: DMA address of buffer
+ * @size: size of buffer (same as passed to dma_map_page)
+ * @dir: DMA transfer direction (same as passed to dma_map_page)
+ *
+ * IOMMU aware version of arm_dma_unmap_page()
+ */
+static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
          size_t size, enum dma_data_direction dir,
          struct dma_attrs *attrs)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  int offset = handle & ~PAGE_MASK;
  int len = PAGE_ALIGN(size + offset);
  if (!iova)
          return;
  if (!arch_is_coherent())
          __dma_page_dev_to_cpu(page, offset, size, dir);
  iommu_unmap(mapping->domain, iova, len);
  __free_iova(mapping, iova, len);
+}

+static void arm_iommu_sync_single_for_cpu(struct device *dev,
          dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  unsigned int offset = handle & ~PAGE_MASK;
  if (!iova)
          return;
  if (!arch_is_coherent())
          __dma_page_dev_to_cpu(page, offset, size, dir);
+}

+static void arm_iommu_sync_single_for_device(struct device *dev,
          dma_addr_t handle, size_t size, enum dma_data_direction dir)
+{
  struct dma_iommu_mapping *mapping = dev->archdata.mapping;
  dma_addr_t iova = handle & PAGE_MASK;
  struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
  unsigned int offset = handle & ~PAGE_MASK;
  if (!iova)
          return;
  __dma_page_cpu_to_dev(page, offset, size, dir);
+}

+struct dma_map_ops iommu_ops = {
  .alloc          = arm_iommu_alloc_attrs,
  .free           = arm_iommu_free_attrs,
  .mmap           = arm_iommu_mmap_attrs,
  .map_page               = arm_iommu_map_page,
  .unmap_page             = arm_iommu_unmap_page,
  .sync_single_for_cpu    = arm_iommu_sync_single_for_cpu,
  .sync_single_for_device = arm_iommu_sync_single_for_device,
  .map_sg                 = arm_iommu_map_sg,
  .unmap_sg               = arm_iommu_unmap_sg,
  .sync_sg_for_cpu        = arm_iommu_sync_sg_for_cpu,
  .sync_sg_for_device     = arm_iommu_sync_sg_for_device,
+};

+/**
+ * arm_iommu_create_mapping
+ * @bus: pointer to the bus holding the client device (for IOMMU calls)
+ * @base: start address of the valid IO address space
+ * @size: size of the valid IO address space
+ * @order: accuracy of the IO addresses allocations
+ *
+ * Creates a mapping structure which holds information about used/unused
+ * IO address ranges, which is required to perform memory allocation and
+ * mapping with IOMMU aware functions.
+ *
+ * The client device need to be attached to the mapping with
+ * arm_iommu_attach_device function.
+ */
+struct dma_iommu_mapping *
+arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, size_t size,
                   int order)
+{
  unsigned int count = size >> (PAGE_SHIFT + order);
  unsigned int bitmap_size = BITS_TO_LONGS(count) * sizeof(long);
  struct dma_iommu_mapping *mapping;
  int err = -ENOMEM;
  if (!count)
          return ERR_PTR(-EINVAL);
  mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
  if (!mapping)
          goto err;
  mapping->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
  if (!mapping->bitmap)
          goto err2;
  mapping->base = base;
  mapping->bits = BITS_PER_BYTE * bitmap_size;
  mapping->order = order;
  spin_lock_init(&mapping->lock);
  mapping->domain = iommu_domain_alloc(bus);
  if (!mapping->domain)
          goto err3;
  kref_init(&mapping->kref);
  return mapping;
+err3:
  kfree(mapping->bitmap);
+err2:
  kfree(mapping);
+err:
  return ERR_PTR(err);
+}

+static void release_iommu_mapping(struct kref *kref)
+{
  struct dma_iommu_mapping *mapping =
          container_of(kref, struct dma_iommu_mapping, kref);
  iommu_domain_free(mapping->domain);
  kfree(mapping->bitmap);
  kfree(mapping);
+}

+void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
+{
  if (mapping)
          kref_put(&mapping->kref, release_iommu_mapping);
+}

+/**
+ * arm_iommu_attach_device
+ * @dev: valid struct device pointer
+ * @mapping: io address space mapping structure (returned from
+ *	arm_iommu_create_mapping)
+ *
+ * Attaches specified io address space mapping to the provided device,
+ * this replaces the dma operations (dma_map_ops pointer) with the
+ * IOMMU aware version. More than one client might be attached to
+ * the same io address space mapping.
+ */
+int arm_iommu_attach_device(struct device *dev,
                      struct dma_iommu_mapping *mapping)
+{
  int err;
  err = iommu_attach_device(mapping->domain, dev);
  if (err)
          return err;
  kref_get(&mapping->kref);
  dev->archdata.mapping = mapping;
  set_dma_ops(dev, &iommu_ops);
  pr_info("Attached IOMMU controller to %s device.\n", dev_name(dev));
  return 0;
+}

+#endif
diff --git a/arch/arm/mm/vmregion.h b/arch/arm/mm/vmregion.h
index 162be66..bf312c3 100644
--- a/arch/arm/mm/vmregion.h
+++ b/arch/arm/mm/vmregion.h
@@ -17,7 +17,7 @@ struct arm_vmregion {
 	struct list_head	vm_list;
 	unsigned long		vm_start;
 	unsigned long		vm_end;
-	struct page		*vm_pages;
+	void			*priv;
 	int			vm_active;
 	const void		*caller;
 };

1.7.1.569.g6f426
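
For illustration only (not part of the patch): the IO address space bookkeeping done by __alloc_iova() and __free_iova() can be sketched in userspace as a first-fit search over a bitmap. Everything named demo_* or DEMO_* is an assumption made for this example. Unlike the patch, the sketch tracks only 64 units in a single word, uses one 4 KiB page per unit (order 0), applies no extra alignment and takes no lock.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BITS	64u		/* number of allocation units tracked */
#define DEMO_UNIT_SHIFT	12u		/* one unit = one 4 KiB page */
#define DEMO_ERROR	(~(uint32_t)0)	/* plays the role of DMA_ERROR_CODE */

struct demo_iova_space {
	uint32_t base;		/* first IO address managed */
	uint64_t bitmap;	/* bit n set = unit n is in use */
};

/* First-fit search for 'count' consecutive free units; marks them used. */
static uint32_t demo_alloc_iova(struct demo_iova_space *sp, unsigned int count)
{
	unsigned int start, i;

	assert(count > 0);
	for (start = 0; start + count <= DEMO_BITS; start++) {
		for (i = 0; i < count; i++)
			if (sp->bitmap & (1ull << (start + i)))
				break;
		if (i == count) {
			for (i = 0; i < count; i++)
				sp->bitmap |= 1ull << (start + i);
			return sp->base + (start << DEMO_UNIT_SHIFT);
		}
	}
	return DEMO_ERROR;
}

/* Give 'count' units starting at IO address 'addr' back to the space. */
static void demo_free_iova(struct demo_iova_space *sp, uint32_t addr,
			   unsigned int count)
{
	unsigned int start = (addr - sp->base) >> DEMO_UNIT_SHIFT;
	unsigned int i;

	assert(start + count <= DEMO_BITS);
	for (i = 0; i < count; i++)
		sp->bitmap &= ~(1ull << (start + i));
}
```

Raising the unit size (the order parameter of arm_iommu_create_mapping()) shrinks the bitmap at the cost of coarser allocations, which is the trade-off the commit message describes.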

Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig

Paul Gortmaker

11 May 2012

2:08 a.m.

New subject: [PATCHv9 10/10] ARM: dma-mapping: add support for IOMMU mapper

On Wed, Apr 18, 2012 at 9:44 AM, Marek Szyprowski m.szyprowski@samsung.com wrote:

> This patch add a complete implementation of DMA-mapping API for devices which have IOMMU support.

Hi Marek,

It looks like this patch breaks no-MMU builds on ARM, at least according to git bisect. Here is a link to a linux-next failure:

http://kisskb.ellerman.id.au/kisskb/buildresult/6291233/

arch/arm/mm/dma-mapping.c:726:42: error: 'pgprot_kernel' undeclared (first use in this function)
make[2]: *** [arch/arm/mm/dma-mapping.o] Error 1

Please have a look, thanks.

Paul. ---


Marek Szyprowski

7:52 a.m.

New subject: [PATCHv9 10/10] ARM: dma-mapping: add support for IOMMU mapper

Hello,

On Friday, May 11, 2012 4:09 AM Paul Gortmaker wrote:

> On Wed, Apr 18, 2012 at 9:44 AM, Marek Szyprowski m.szyprowski@samsung.com wrote:
> > This patch add a complete implementation of DMA-mapping API for devices which have IOMMU support.
>
> Hi Marek,
>
> It looks like this patch breaks no-MMU builds on ARM, at least according to git bisect. Here is a link to a linux-next failure:
>
> http://kisskb.ellerman.id.au/kisskb/buildresult/6291233/
>
> arch/arm/mm/dma-mapping.c:726:42: error: 'pgprot_kernel' undeclared (first use in this function)
> make[2]: *** [arch/arm/mm/dma-mapping.o] Error 1
>
> Please have a look, thanks.

Thanks for reporting this issue, I will send a fix in a minute.

Best regards

-- Marek Szyprowski Samsung Poland R&D Center

Marek Szyprowski

8:33 a.m.

New subject: [PATCH] ARM: dma-mapping: fix build break on no-MMU systems

Fix the following build issue:

arch/arm/mm/dma-mapping.c:726:42: error: 'pgprot_kernel' undeclared (first use in this function)
make[2]: *** [arch/arm/mm/dma-mapping.o] Error 1

Reported-by: Paul Gortmaker paul.gortmaker@windriver.com
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
---
 arch/arm/mm/dma-mapping.c |   17 +++++++++--------
 1 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 2d11aa0..686ef02 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -428,10 +428,19 @@ static void __dma_free_remap(void *cpu_addr, size_t size)
 	arm_vmregion_free(&consistent_head, c);
 }

 #define __dma_alloc_remap(page, size, gfp, prot, c)	page_address(page)
 #define __dma_free_remap(addr, size)			do { } while (0)
+#define __get_dma_pgprot(attrs, prot)	__pgprot(0)

 #endif	/* CONFIG_MMU */

@@ -471,14 +480,6 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp,
 	return addr;
 }

-static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
-{
-	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
-			    pgprot_writecombine(prot) :
-			    pgprot_dmacoherent(prot);
-	return prot;
-}
-
 /*
  * Allocate DMA-coherent memory space and return both the kernel remapped
  * virtual and bus address for that space.

-- 1.7.1.569.g6f426
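
For illustration only: the fix above relies on a common pattern in which a helper is a real function in one configuration and a constant macro in the other, so that shared callers compile unchanged. The userspace sketch below shows that pattern under stated assumptions; DEMO_HAVE_MMU stands in for CONFIG_MMU, and all demo_* names and flag values are hypothetical.

```c
/*
 * Hypothetical switch standing in for CONFIG_MMU.
 * Uncomment to model a build with an MMU.
 */
/* #define DEMO_HAVE_MMU */

typedef unsigned long demo_pgprot_t;

#ifdef DEMO_HAVE_MMU
#define DEMO_PROT_WRITECOMBINE	0x1ul
#define DEMO_PROT_COHERENT	0x2ul

/* With an MMU the helper really selects a page protection. */
static inline demo_pgprot_t demo_get_dma_pgprot(int write_combine,
						demo_pgprot_t prot)
{
	return prot | (write_combine ? DEMO_PROT_WRITECOMBINE
				     : DEMO_PROT_COHERENT);
}
#else
/*
 * Without an MMU page protections carry no meaning, so the helper
 * collapses to a constant; the arguments are evaluated and discarded.
 */
#define demo_get_dma_pgprot(write_combine, prot) \
	((void)(write_combine), (void)(prot), (demo_pgprot_t)0)
#endif

/* Shared caller that builds in both configurations. */
static demo_pgprot_t demo_alloc_prot(int write_combine)
{
	return demo_get_dma_pgprot(write_combine, 0x100ul);
}
```

As written (DEMO_HAVE_MMU undefined), demo_alloc_prot() returns 0 for any argument, mirroring the __pgprot(0) stub in the patch.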
