Hi,
This is v3 of the attempt to remove the misuse of the DMA cache APIs from Ion.

As before [1]: the new APIs are kernel_force_cache_clean and kernel_force_cache_invalidate, which force a clean and an invalidate of the cache, respectively. The aim was to take the semantics of dma_sync and turn them into something that isn't dma_sync. This series includes a nominal implementation for arm/arm64, mostly for demonstration purposes.

The major change from v2 is that the implementations no longer leverage the DMA abstractions. Russell King noted that dma_map and dma_unmap just 'happen' to do the right thing, but they aren't guaranteed to.

I'm hoping that by v3 there are no objections to the general concept, but if there are, please raise them.
Thanks, Laura
[1] http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg49406....
Laura Abbott (5):
  Documentation: Introduce kernel_force_cache_* APIs
  arm: Implement ARCH_HAS_FORCE_CACHE
  arm64: Implement ARCH_HAS_FORCE_CACHE
  staging: android: ion: Convert to the kernel_force_cache APIs
  staging: ion: Add support for syncing with DMA_BUF_IOCTL_SYNC
 Documentation/cachetlb.txt                      | 18 ++++++-
 arch/arm/include/asm/cacheflush.h               | 11 ++++
 arch/arm/include/asm/glue-cache.h               |  2 +
 arch/arm/mm/Makefile                            |  2 +-
 arch/arm/mm/cache-fa.S                          |  8 +++
 arch/arm/mm/cache-nop.S                         |  6 +++
 arch/arm/mm/cache-v4.S                          | 10 ++++
 arch/arm/mm/cache-v4wb.S                        |  8 +++
 arch/arm/mm/cache-v4wt.S                        |  8 +++
 arch/arm/mm/cache-v6.S                          |  8 +++
 arch/arm/mm/cache-v7.S                          | 13 +++++
 arch/arm/mm/cacheflush.c                        | 71 +++++++++++++++++++++++++
 arch/arm/mm/proc-arm920.S                       |  8 +++
 arch/arm/mm/proc-arm922.S                       |  8 +++
 arch/arm/mm/proc-arm925.S                       |  8 +++
 arch/arm/mm/proc-arm926.S                       |  8 +++
 arch/arm/mm/proc-feroceon.S                     | 11 ++++
 arch/arm/mm/proc-macros.S                       |  2 +
 arch/arm/mm/proc-xsc3.S                         |  9 ++++
 arch/arm/mm/proc-xscale.S                       |  9 ++++
 arch/arm64/include/asm/cacheflush.h             |  8 +++
 arch/arm64/mm/cache.S                           | 24 +++++++--
 arch/arm64/mm/flush.c                           | 11 ++++
 drivers/staging/android/ion/ion.c               | 53 +++++++++++-------
 drivers/staging/android/ion/ion_carveout_heap.c |  8 +--
 drivers/staging/android/ion/ion_chunk_heap.c    | 12 +++--
 drivers/staging/android/ion/ion_page_pool.c     |  7 +--
 drivers/staging/android/ion/ion_priv.h          | 11 ----
 drivers/staging/android/ion/ion_system_heap.c   |  6 +--
 include/linux/cacheflush.h                      | 11 ++++
 30 files changed, 330 insertions(+), 49 deletions(-)
 create mode 100644 arch/arm/mm/cacheflush.c
 create mode 100644 include/linux/cacheflush.h
Some frameworks (e.g. Ion) may need to do explicit cache management to meet performance/correctness requirements. Rather than piggy-back on another API and hope the semantics don't change, introduce a set of APIs to force a page to be cleaned/invalidated in the cache.
Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 Documentation/cachetlb.txt | 18 +++++++++++++++++-
 include/linux/cacheflush.h | 11 +++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/cacheflush.h
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt
index 3f9f808..18eec7c 100644
--- a/Documentation/cachetlb.txt
+++ b/Documentation/cachetlb.txt
@@ -378,7 +378,7 @@ maps this page at its virtual address.
 	flush_dcache_page and update_mmu_cache. In the future, the hope
 	is to remove this interface completely.
 
-The final category of APIs is for I/O to deliberately aliased address
+Another set of APIs is for I/O to deliberately aliased address
 ranges inside the kernel. Such aliases are set up by use of the
 vmap/vmalloc API. Since kernel I/O goes via physical pages, the I/O
 subsystem assumes that the user mapping and kernel offset mapping are
@@ -401,3 +401,19 @@ I/O and invalidating it after the I/O returns.
 	speculatively reading data while the I/O was occurring to the
 	physical pages. This is only necessary for data reads into the
 	vmap area.
+
+Nearly all drivers can handle cache management using the existing DMA model.
+There may be limited circumstances when a driver or framework needs to
+explicitly manage the cache; trying to force cache management into the DMA
+framework may lead to performance loss or unnecessary work. These APIs may
+be used to provide explicit coherency for memory that does not fall into
+any of the above categories. Implementers of this API must assume the
+address can be aliased. Any cache operations shall not be delayed and must
+be completed by the time the call returns.
+
+  void kernel_force_cache_clean(struct page *page, size_t size);
+	Ensures that any data in the cache for the page is written back
+	and visible across all aliases.
+
+  void kernel_force_cache_invalidate(struct page *page, size_t size);
+	Invalidates the cache for the given page.
diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h
new file mode 100644
index 0000000..4388846
--- /dev/null
+++ b/include/linux/cacheflush.h
@@ -0,0 +1,11 @@
+#ifndef CACHEFLUSH_H
+#define CACHEFLUSH_H
+
+#include <asm/cacheflush.h>
+
+#ifndef ARCH_HAS_FORCE_CACHE
+static inline void kernel_force_cache_clean(struct page *page, size_t size) { }
+static inline void kernel_force_cache_invalidate(struct page *page, size_t size) { }
+#endif
+
+#endif
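[Editor's note: to make the intended calling pattern concrete, here is a minimal sketch of a hypothetical driver using the new calls. example_device_handoff() and the device scenario are illustrative only, not part of this series.]

#include <linux/cacheflush.h>
#include <linux/mm_types.h>

/* Hand a CPU-written page to a device accessed outside the DMA API,
 * then read back the device's output. */
static void example_device_handoff(struct page *page, size_t size)
{
	/* The CPU filled the buffer: write dirty lines back to memory
	 * so the device observes the data through all aliases. */
	kernel_force_cache_clean(page, size);

	/* ... kick the device and wait for it to finish ... */

	/* Discard stale (possibly speculatively fetched) lines so
	 * subsequent CPU reads see the device's writes. */
	kernel_force_cache_invalidate(page, size);
}

This is the same clean-before-device / invalidate-before-CPU-read pattern the Ion conversion later in the series relies on.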
arm may need to guarantee the caches are synced. Implement versions of the kernel_force_cache API for v7. Other versions are stubbed out and can be added as appropriate.
Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v3: Switch to force implementations per CPU instead of relying on
dma_map/dma_unmap. v7 is the only one implemented right now, others can
be added as needed.
---
 arch/arm/include/asm/cacheflush.h | 11 ++++++
 arch/arm/include/asm/glue-cache.h |  2 ++
 arch/arm/mm/Makefile              |  2 +-
 arch/arm/mm/cache-fa.S            |  8 +++++
 arch/arm/mm/cache-nop.S           |  6 ++++
 arch/arm/mm/cache-v4.S            | 10 ++++++
 arch/arm/mm/cache-v4wb.S          |  8 +++++
 arch/arm/mm/cache-v4wt.S          |  8 +++++
 arch/arm/mm/cache-v6.S            |  8 +++++
 arch/arm/mm/cache-v7.S            | 13 +++++++
 arch/arm/mm/cacheflush.c          | 71 +++++++++++++++++++++++++++++++++
 arch/arm/mm/proc-arm920.S         |  8 +++++
 arch/arm/mm/proc-arm922.S         |  8 +++++
 arch/arm/mm/proc-arm925.S         |  8 +++++
 arch/arm/mm/proc-arm926.S         |  8 +++++
 arch/arm/mm/proc-feroceon.S       | 11 ++++++
 arch/arm/mm/proc-macros.S         |  2 ++
 arch/arm/mm/proc-xsc3.S           |  9 +++++
 arch/arm/mm/proc-xscale.S         |  9 +++++
 19 files changed, 209 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/mm/cacheflush.c
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 9156fc3..2d9a4d3 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -116,6 +116,9 @@ struct cpu_cache_fns {
 	void (*dma_unmap_area)(const void *, size_t, int);
 
 	void (*dma_flush_range)(const void *, const void *);
+
+	void (*force_dcache_clean)(void *, size_t);
+	void (*force_dcache_invalidate)(void *, size_t);
 };
 
 /*
@@ -133,6 +136,8 @@ extern struct cpu_cache_fns cpu_cache;
 #define __cpuc_coherent_kern_range	cpu_cache.coherent_kern_range
 #define __cpuc_coherent_user_range	cpu_cache.coherent_user_range
 #define __cpuc_flush_dcache_area	cpu_cache.flush_kern_dcache_area
+#define __cpuc_force_dcache_clean	cpu_cache.force_dcache_clean
+#define __cpuc_force_dcache_invalidate	cpu_cache.force_dcache_invalidate
 
 /*
  * These are private to the dma-mapping API. Do not use directly.
@@ -152,6 +157,8 @@ extern void __cpuc_flush_user_range(unsigned long, unsigned long, unsigned int)
 extern void __cpuc_coherent_kern_range(unsigned long, unsigned long);
 extern int __cpuc_coherent_user_range(unsigned long, unsigned long);
 extern void __cpuc_flush_dcache_area(void *, size_t);
+extern void __cpuc_force_dcache_clean(void *, size_t);
+extern void __cpuc_force_dcache_invalidate(void *, size_t);
 
 /*
  * These are private to the dma-mapping API. Do not use directly.
@@ -518,4 +525,8 @@ static inline void secure_flush_area(const void *addr, size_t size)
 	outer_flush_range(phys, phys + size);
 }
 
+#define ARCH_HAS_FORCE_CACHE 1
+void kernel_force_cache_clean(struct page *page, size_t size);
+void kernel_force_cache_invalidate(struct page *page, size_t size);
+
 #endif
diff --git a/arch/arm/include/asm/glue-cache.h b/arch/arm/include/asm/glue-cache.h
index cab07f6..232938f 100644
--- a/arch/arm/include/asm/glue-cache.h
+++ b/arch/arm/include/asm/glue-cache.h
@@ -157,6 +157,8 @@ static inline void nop_dma_unmap_area(const void *s, size_t l, int f) { }
 #define __cpuc_coherent_kern_range	__glue(_CACHE,_coherent_kern_range)
 #define __cpuc_coherent_user_range	__glue(_CACHE,_coherent_user_range)
 #define __cpuc_flush_dcache_area	__glue(_CACHE,_flush_kern_dcache_area)
+#define __cpuc_force_dcache_clean	__glue(_CACHE,_force_dcache_clean)
+#define __cpuc_force_dcache_invalidate	__glue(_CACHE,_force_dcache_invalidate)
 #define dmac_flush_range		__glue(_CACHE,_dma_flush_range)
 #endif
diff --git a/arch/arm/mm/Makefile b/arch/arm/mm/Makefile
index 7f76d96..3afcdd0 100644
--- a/arch/arm/mm/Makefile
+++ b/arch/arm/mm/Makefile
@@ -3,7 +3,7 @@
 #
 
 obj-y				:= dma-mapping.o extable.o fault.o init.o \
-				   iomap.o
+				   iomap.o cacheflush.o
 
 obj-$(CONFIG_MMU)		+= fault-armv.o flush.o idmap.o ioremap.o \
 				   mmap.o pgd.o mmu.o pageattr.o
diff --git a/arch/arm/mm/cache-fa.S b/arch/arm/mm/cache-fa.S
index 2f0c588..f1fe5df 100644
--- a/arch/arm/mm/cache-fa.S
+++ b/arch/arm/mm/cache-fa.S
@@ -244,6 +244,14 @@ ENDPROC(fa_dma_unmap_area)
 	.globl	fa_flush_kern_cache_louis
 	.equ	fa_flush_kern_cache_louis, fa_flush_kern_cache_all
 
+ENTRY(fa_force_dcache_invalidate)
+	ret	lr
+ENDPROC(fa_force_dcache_invalidate)
+
+ENTRY(fa_force_dcache_clean)
+	ret	lr
+ENDPROC(fa_force_dcache_clean)
+
 	__INITDATA
 @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-nop.S b/arch/arm/mm/cache-nop.S
index f1cc986..983f96f 100644
--- a/arch/arm/mm/cache-nop.S
+++ b/arch/arm/mm/cache-nop.S
@@ -45,6 +45,12 @@ ENDPROC(nop_coherent_user_range)
 	.globl	nop_dma_unmap_area
 	.equ	nop_dma_unmap_area, nop_flush_icache_all
 
+	.globl	nop_force_dcache_clean
+	.equ	nop_force_dcache_clean, nop_flush_icache_all
+
+	.globl	nop_force_dcache_invalidate
+	.equ	nop_force_dcache_invalidate, nop_flush_icache_all
+
 	__INITDATA
 
 @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
index 91e3adf..db07995 100644
--- a/arch/arm/mm/cache-v4.S
+++ b/arch/arm/mm/cache-v4.S
@@ -144,6 +144,16 @@ ENDPROC(v4_dma_map_area)
 	.globl	v4_flush_kern_cache_louis
 	.equ	v4_flush_kern_cache_louis, v4_flush_kern_cache_all
 
+
+ENTRY(v4_force_dcache_invalidate)
+	ret	lr
+ENDPROC(v4_force_dcache_invalidate)
+
+ENTRY(v4_force_dcache_clean)
+	ret	lr
+ENDPROC(v4_force_dcache_clean)
+
+
 	__INITDATA
 @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S
index 2522f8c..897f333 100644
--- a/arch/arm/mm/cache-v4wb.S
+++ b/arch/arm/mm/cache-v4wb.S
@@ -255,6 +255,14 @@ ENDPROC(v4wb_dma_unmap_area)
 	.globl	v4wb_flush_kern_cache_louis
 	.equ	v4wb_flush_kern_cache_louis, v4wb_flush_kern_cache_all
 
+ENTRY(v4wb_force_dcache_invalidate)
+	ret	lr
+ENDPROC(v4wb_force_dcache_invalidate)
+
+ENTRY(v4wb_force_dcache_clean)
+	ret	lr
+ENDPROC(v4wb_force_dcache_clean)
+
 	__INITDATA
 
 @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S
index a0982ce..2e77e4a 100644
--- a/arch/arm/mm/cache-v4wt.S
+++ b/arch/arm/mm/cache-v4wt.S
@@ -200,6 +200,14 @@ ENDPROC(v4wt_dma_map_area)
 	.globl	v4wt_flush_kern_cache_louis
 	.equ	v4wt_flush_kern_cache_louis, v4wt_flush_kern_cache_all
 
+ENTRY(v4wt_force_dcache_invalidate)
+	ret	lr
+ENDPROC(v4wt_force_dcache_invalidate)
+
+ENTRY(v4wt_force_dcache_clean)
+	ret	lr
+ENDPROC(v4wt_force_dcache_clean)
+
 	__INITDATA
 @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 2465995..4911634 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -333,3 +333,11 @@ ENDPROC(v6_dma_unmap_area)
 
 	@ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 	define_cache_functions v6
+
+ENTRY(v6_force_dcache_invalidate)
+	ret	lr
+ENDPROC(v6_force_dcache_invalidate)
+
+ENTRY(v6_force_dcache_clean)
+	ret	lr
+ENDPROC(v6_force_dcache_clean)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index a134d8a..2750b27 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -442,6 +442,19 @@ ENTRY(v7_dma_unmap_area)
 	ret	lr
 ENDPROC(v7_dma_unmap_area)
 
+ENTRY(v7_force_dcache_invalidate)
+	add	r1, r1, r0
+	b	v7_dma_inv_range
+	ret	lr
+ENDPROC(v7_force_dcache_invalidate)
+
+ENTRY(v7_force_dcache_clean)
+	add	r1, r1, r0
+	b	v7_dma_clean_range
+	ret	lr
+ENDPROC(v7_force_dcache_clean)
+
 	__INITDATA
 @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
diff --git a/arch/arm/mm/cacheflush.c b/arch/arm/mm/cacheflush.c
new file mode 100644
index 0000000..5daa98e
--- /dev/null
+++ b/arch/arm/mm/cacheflush.c
@@ -0,0 +1,71 @@
+/*
+ * Based on arch/arm/mm/dma-mapping.c which is
+ * Copyright (C) 2000-2004 Russell King
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/highmem.h>
+
+#include <asm/highmem.h>
+#include <asm/cacheflush.h>
+
+static void __force_cache_op(struct page *page, size_t size,
+			     void (*op)(void *start, size_t size))
+{
+	unsigned long pfn;
+	size_t left = size;
+
+	pfn = page_to_pfn(page);
+
+	do {
+		size_t len = left;
+		void *vaddr;
+
+		page = pfn_to_page(pfn);
+
+		if (PageHighMem(page)) {
+			if (len > PAGE_SIZE)
+				len = PAGE_SIZE;
+			if (cache_is_vipt_nonaliasing()) {
+				vaddr = kmap_atomic(page);
+				op(vaddr, len);
+				kunmap_atomic(vaddr);
+			} else {
+				vaddr = kmap_high_get(page);
+				if (vaddr) {
+					op(vaddr, len);
+					kunmap_high(page);
+				}
+			}
+		} else {
+			op(page_address(page), len);
+		}
+		pfn++;
+		left -= len;
+	} while (left);
+}
+
+void kernel_force_cache_clean(struct page *page, size_t size)
+{
+	phys_addr_t paddr;
+
+	paddr = page_to_phys(page);
+	__force_cache_op(page, size, __cpuc_force_dcache_clean);
+	outer_clean_range(paddr, paddr + size);
+}
+
+void kernel_force_cache_invalidate(struct page *page, size_t size)
+{
+	phys_addr_t paddr;
+
+	paddr = page_to_phys(page);
+	__force_cache_op(page, size, __cpuc_force_dcache_invalidate);
+	outer_inv_range(paddr, paddr + size);
+}
diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S
index 7a14bd4..ead3060 100644
--- a/arch/arm/mm/proc-arm920.S
+++ b/arch/arm/mm/proc-arm920.S
@@ -334,6 +334,14 @@ ENTRY(cpu_arm920_dcache_clean_area)
 	bhi	1b
 	ret	lr
+ENTRY(arm920_force_dcache_invalidate)
+	ret	lr
+ENDPROC(arm920_force_dcache_invalidate)
+
+ENTRY(arm920_force_dcache_clean)
+	ret	lr
+ENDPROC(arm920_force_dcache_clean)
+
 /* =============================== PageTable ============================== */
 
 /*
diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S
index edccfcd..4645a98 100644
--- a/arch/arm/mm/proc-arm922.S
+++ b/arch/arm/mm/proc-arm922.S
@@ -338,6 +338,14 @@ ENTRY(cpu_arm922_dcache_clean_area)
 #endif
 	ret	lr
 
+ENTRY(arm922_force_dcache_invalidate)
+	ret	lr
+ENDPROC(arm922_force_dcache_invalidate)
+
+ENTRY(arm922_force_dcache_clean)
+	ret	lr
+ENDPROC(arm922_force_dcache_clean)
+
 /* =============================== PageTable ============================== */
 
 /*
diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S
index 32a47cc..866d623 100644
--- a/arch/arm/mm/proc-arm925.S
+++ b/arch/arm/mm/proc-arm925.S
@@ -392,6 +392,14 @@ ENTRY(cpu_arm925_dcache_clean_area)
 	mcr	p15, 0, r0, c7, c10, 4		@ drain WB
 	ret	lr
 
+ENTRY(arm925_force_dcache_invalidate)
+	ret	lr
+ENDPROC(arm925_force_dcache_invalidate)
+
+ENTRY(arm925_force_dcache_clean)
+	ret	lr
+ENDPROC(arm925_force_dcache_clean)
+
 /* =============================== PageTable ============================== */
 
 /*
diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S
index fb827c6..2257b00 100644
--- a/arch/arm/mm/proc-arm926.S
+++ b/arch/arm/mm/proc-arm926.S
@@ -355,6 +355,14 @@ ENTRY(cpu_arm926_dcache_clean_area)
 	mcr	p15, 0, r0, c7, c10, 4		@ drain WB
 	ret	lr
 
+ENTRY(arm926_force_dcache_invalidate)
+	ret	lr
+ENDPROC(arm926_force_dcache_invalidate)
+
+ENTRY(arm926_force_dcache_clean)
+	ret	lr
+ENDPROC(arm926_force_dcache_clean)
+
 /* =============================== PageTable ============================== */
 /*
diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S
index 92e08bf..fca0e42 100644
--- a/arch/arm/mm/proc-feroceon.S
+++ b/arch/arm/mm/proc-feroceon.S
@@ -439,6 +439,8 @@ ENDPROC(feroceon_dma_unmap_area)
 	range_alias coherent_kern_range
 	range_alias coherent_user_range
 	range_alias dma_unmap_area
+	range_alias force_dcache_clean
+	range_alias force_dcache_invalidate
 
 	define_cache_functions feroceon_range
 
@@ -463,6 +465,15 @@ ENTRY(cpu_feroceon_dcache_clean_area)
 	mcr	p15, 0, r0, c7, c10, 4		@ drain WB
 	ret	lr
 
+ENTRY(feroceon_force_dcache_invalidate)
+	ret	lr
+ENDPROC(feroceon_force_dcache_invalidate)
+
+ENTRY(feroceon_force_dcache_clean)
+	ret	lr
+ENDPROC(feroceon_force_dcache_clean)
+
+
 /* =============================== PageTable ============================== */
 
 /*
diff --git a/arch/arm/mm/proc-macros.S b/arch/arm/mm/proc-macros.S
index c671f34..cc2d6cf 100644
--- a/arch/arm/mm/proc-macros.S
+++ b/arch/arm/mm/proc-macros.S
@@ -310,6 +310,8 @@ ENTRY(\name()_cache_fns)
 	.long	\name()_dma_map_area
 	.long	\name()_dma_unmap_area
 	.long	\name()_dma_flush_range
+	.long	\name()_force_dcache_clean
+	.long	\name()_force_dcache_invalidate
 	.size	\name()_cache_fns, . - \name()_cache_fns
 	.endm
diff --git a/arch/arm/mm/proc-xsc3.S b/arch/arm/mm/proc-xsc3.S
index 293dcc2..924d304 100644
--- a/arch/arm/mm/proc-xsc3.S
+++ b/arch/arm/mm/proc-xsc3.S
@@ -343,6 +343,15 @@ ENDPROC(xsc3_dma_unmap_area)
 @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 define_cache_functions xsc3
 
+ENTRY(xsc3_force_dcache_clean)
+	ret	lr
+ENDPROC(xsc3_force_dcache_clean)
+
+ENTRY(xsc3_force_dcache_invalidate)
+	ret	lr
+ENDPROC(xsc3_force_dcache_invalidate)
+
+
 ENTRY(cpu_xsc3_dcache_clean_area)
 1:	mcr	p15, 0, r0, c7, c10, 1		@ clean L1 D line
 	add	r0, r0, #CACHELINESIZE
diff --git a/arch/arm/mm/proc-xscale.S b/arch/arm/mm/proc-xscale.S
index b6bbfdb..a8f4c74 100644
--- a/arch/arm/mm/proc-xscale.S
+++ b/arch/arm/mm/proc-xscale.S
@@ -449,6 +449,8 @@ ENDPROC(xscale_dma_unmap_area)
 	a0_alias flush_kern_dcache_area
 	a0_alias dma_flush_range
 	a0_alias dma_unmap_area
+	a0_alias force_dcache_clean
+	a0_alias force_dcache_invalidate
 
 @ define struct cpu_cache_fns (see <asm/cacheflush.h> and proc-macros.S)
 define_cache_functions xscale_80200_A0_A1
@@ -460,6 +462,13 @@ ENTRY(cpu_xscale_dcache_clean_area)
 	bhi	1b
 	ret	lr
 
+ENTRY(xscale_force_dcache_invalidate)
+	ret	lr
+ENDPROC(xscale_force_dcache_invalidate)
+
+ENTRY(xscale_force_dcache_clean)
+	ret	lr
+ENDPROC(xscale_force_dcache_clean)
+
 /* =============================== PageTable ============================== */
 
 /*
arm64 may need to guarantee the caches are synced. Implement versions of the kernel_force_cache API to allow this.
Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v3: Switch to calling cache operations directly instead of relying on
DMA mapping.
---
 arch/arm64/include/asm/cacheflush.h |  8 ++++++++
 arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++----
 arch/arm64/mm/flush.c               | 11 +++++++++++
 3 files changed, 39 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index c64268d..1134c15 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -87,6 +87,9 @@ extern void __dma_map_area(const void *, size_t, int);
 extern void __dma_unmap_area(const void *, size_t, int);
 extern void __dma_flush_range(const void *, const void *);
 
+extern void __force_dcache_clean(const void *, size_t);
+extern void __force_dcache_invalidate(const void *, size_t);
+
 /*
  * Copy user data from/to a page which is mapped into a different
  * processes address space. Really, we want to allow our "user
@@ -149,4 +152,9 @@ int set_memory_rw(unsigned long addr, int numpages);
 int set_memory_x(unsigned long addr, int numpages);
 int set_memory_nx(unsigned long addr, int numpages);
 
+#define ARCH_HAS_FORCE_CACHE 1
+
+void kernel_force_cache_clean(struct page *page, size_t size);
+void kernel_force_cache_invalidate(struct page *page, size_t size);
+
 #endif
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 07d7352..e99c9a4 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -184,10 +184,6 @@ ENDPIPROC(__dma_flush_range)
  *	- dir	- DMA direction
  */
 ENTRY(__dma_map_area)
-	add	x1, x1, x0
-	cmp	w2, #DMA_FROM_DEVICE
-	b.eq	__dma_inv_range
-	b	__dma_clean_range
 ENDPIPROC(__dma_map_area)
 
 /*
@@ -202,3 +198,23 @@ ENTRY(__dma_unmap_area)
 	b.ne	__dma_inv_range
 	ret
 ENDPIPROC(__dma_unmap_area)
+
+/*
+ *	__force_dcache_clean(start, size)
+ *	- start   - kernel virtual start address
+ *	- size    - size of region
+ */
+ENTRY(__force_dcache_clean)
+	add	x1, x1, x0
+	b	__dma_clean_range
+ENDPROC(__force_dcache_clean)
+
+/*
+ *	__force_dcache_invalidate(start, size)
+ *	- start   - kernel virtual start address
+ *	- size    - size of region
+ */
+ENTRY(__force_dcache_invalidate)
+	add	x1, x1, x0
+	b	__dma_inv_range
+ENDPROC(__force_dcache_invalidate)
diff --git a/arch/arm64/mm/flush.c b/arch/arm64/mm/flush.c
index 43a76b0..54ff32e 100644
--- a/arch/arm64/mm/flush.c
+++ b/arch/arm64/mm/flush.c
@@ -20,6 +20,7 @@
 #include <linux/export.h>
 #include <linux/mm.h>
 #include <linux/pagemap.h>
+#include <linux/dma-mapping.h>
 
 #include <asm/cacheflush.h>
 #include <asm/cachetype.h>
@@ -94,3 +95,13 @@ EXPORT_SYMBOL(flush_dcache_page);
  * Additional functions defined in assembly.
  */
 EXPORT_SYMBOL(flush_icache_range);
+
+void kernel_force_cache_clean(struct page *page, size_t size)
+{
+	__force_dcache_clean(page_address(page), size);
+}
+
+void kernel_force_cache_invalidate(struct page *page, size_t size)
+{
+	__force_dcache_invalidate(page_address(page), size);
+}
Hi Laura,
On Mon, Sep 12, 2016 at 02:32:56PM -0700, Laura Abbott wrote:
> arm64 may need to guarantee the caches are synced. Implement versions
> of the kernel_force_cache API to allow this.
>
> Signed-off-by: Laura Abbott <labbott@redhat.com>
> ---
> v3: Switch to calling cache operations directly instead of relying on
> DMA mapping.
> ---
>  arch/arm64/include/asm/cacheflush.h |  8 ++++++++
>  arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++----
>  arch/arm64/mm/flush.c               | 11 +++++++++++
>  3 files changed, 39 insertions(+), 4 deletions(-)
I'm really hesitant to expose these cache routines as an API solely to support a driver sitting in staging/. I appreciate that there's a chicken and egg problem here, but we *really* don't want people using these routines in preference to the DMA API, and I fear that we'll simply grow a bunch more users of these things if we promote it as an API like you're proposing.
Can the code not be contained under staging/, as part of ion?
Will
On 09/13/2016 02:19 AM, Will Deacon wrote:
> Hi Laura,
>
> On Mon, Sep 12, 2016 at 02:32:56PM -0700, Laura Abbott wrote:
>> arm64 may need to guarantee the caches are synced. Implement versions
>> of the kernel_force_cache API to allow this.
>>
>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>> ---
>> v3: Switch to calling cache operations directly instead of relying on
>> DMA mapping.
>> ---
>>  arch/arm64/include/asm/cacheflush.h |  8 ++++++++
>>  arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++----
>>  arch/arm64/mm/flush.c               | 11 +++++++++++
>>  3 files changed, 39 insertions(+), 4 deletions(-)
>
> I'm really hesitant to expose these cache routines as an API solely to
> support a driver sitting in staging/. I appreciate that there's a
> chicken and egg problem here, but we *really* don't want people using
> these routines in preference to the DMA API, and I fear that we'll
> simply grow a bunch more users of these things if we promote it as an
> API like you're proposing.
>
> Can the code not be contained under staging/, as part of ion?
I proposed that in V1 and it was suggested I make it a proper API
http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg47654.... http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg47672....
> Will
Thanks, Laura
On Tue, Sep 13, 2016 at 08:02:20AM -0700, Laura Abbott wrote:
> On 09/13/2016 02:19 AM, Will Deacon wrote:
>> On Mon, Sep 12, 2016 at 02:32:56PM -0700, Laura Abbott wrote:
>>> arm64 may need to guarantee the caches are synced. Implement versions
>>> of the kernel_force_cache API to allow this.
>>>
>>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>>> ---
>>> v3: Switch to calling cache operations directly instead of relying on
>>> DMA mapping.
>>> ---
>>>  arch/arm64/include/asm/cacheflush.h |  8 ++++++++
>>>  arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++----
>>>  arch/arm64/mm/flush.c               | 11 +++++++++++
>>>  3 files changed, 39 insertions(+), 4 deletions(-)
>>
>> I'm really hesitant to expose these cache routines as an API solely to
>> support a driver sitting in staging/. I appreciate that there's a
>> chicken and egg problem here, but we *really* don't want people using
>> these routines in preference to the DMA API, and I fear that we'll
>> simply grow a bunch more users of these things if we promote it as an
>> API like you're proposing.
>>
>> Can the code not be contained under staging/, as part of ion?
>
> I proposed that in V1 and it was suggested I make it a proper API
>
> http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg47654....
> http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg47672....
:/ then I guess we're in disagreement. If ion really needs this stuff (which I don't fully grok), perhaps we should be exposing something at a higher level from the architecture, so it really can't be used for anything other than ion.
Will
On 09/13/2016 08:14 AM, Will Deacon wrote:
> On Tue, Sep 13, 2016 at 08:02:20AM -0700, Laura Abbott wrote:
>> On 09/13/2016 02:19 AM, Will Deacon wrote:
>>> On Mon, Sep 12, 2016 at 02:32:56PM -0700, Laura Abbott wrote:
>>>> arm64 may need to guarantee the caches are synced. Implement versions
>>>> of the kernel_force_cache API to allow this.
>>>>
>>>> Signed-off-by: Laura Abbott <labbott@redhat.com>
>>>> ---
>>>> v3: Switch to calling cache operations directly instead of relying on
>>>> DMA mapping.
>>>> ---
>>>>  arch/arm64/include/asm/cacheflush.h |  8 ++++++++
>>>>  arch/arm64/mm/cache.S               | 24 ++++++++++++++++++++----
>>>>  arch/arm64/mm/flush.c               | 11 +++++++++++
>>>>  3 files changed, 39 insertions(+), 4 deletions(-)
>>>
>>> I'm really hesitant to expose these cache routines as an API solely to
>>> support a driver sitting in staging/. I appreciate that there's a
>>> chicken and egg problem here, but we *really* don't want people using
>>> these routines in preference to the DMA API, and I fear that we'll
>>> simply grow a bunch more users of these things if we promote it as an
>>> API like you're proposing.
>>>
>>> Can the code not be contained under staging/, as part of ion?
>>
>> I proposed that in V1 and it was suggested I make it a proper API
>>
>> http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg47654....
>> http://www.mail-archive.com/driverdev-devel@linuxdriverproject.org/msg47672....
>
> :/ then I guess we're in disagreement. If ion really needs this stuff
> (which I don't fully grok), perhaps we should be exposing something at
> a higher level from the architecture, so it really can't be used for
> anything other than ion.
I talked/complained about this at a past Plumbers. The gist is that Ion ends up acting as a fake DMA layer for clients. It doesn't match nicely because clients can allocate both coherent and non-coherent memory. Trying to use dma_map doesn't work because (a) a device for coherency isn't known at allocation time and (b) it kills performance. Part of the motivation for taking this approach is to avoid the need to rework the existing Android userspace and keep the existing behavior, as terrible as it is. Having Ion out of staging but not actually usable isn't helpful.
I'll give this all some more thought and hopefully have one or two more proposals before Connect/Plumbers.
> Will
Thanks, Laura
Now that a proper set of cache sync APIs exists, move away from dma_sync and do fewer bad things.
Signed-off-by: Laura Abbott <labbott@redhat.com>
---
v3: Rebased to latest -next
---
 drivers/staging/android/ion/ion.c               | 22 ++++------------------
 drivers/staging/android/ion/ion_carveout_heap.c |  8 +++++---
 drivers/staging/android/ion/ion_chunk_heap.c    | 12 +++++++-----
 drivers/staging/android/ion/ion_page_pool.c     |  7 ++++---
 drivers/staging/android/ion/ion_priv.h          | 11 -----------
 drivers/staging/android/ion/ion_system_heap.c   |  6 +++---
 6 files changed, 23 insertions(+), 43 deletions(-)
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index 396ded5..c2125de 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -37,6 +37,8 @@
 #include <linux/dma-buf.h>
 #include <linux/idr.h>
 
+#include <linux/cacheflush.h>
+
 #include "ion.h"
 #include "ion_priv.h"
 #include "compat_ion.h"
@@ -817,22 +819,6 @@ static void ion_unmap_dma_buf(struct dma_buf_attachment *attachment,
 {
 }
 
-void ion_pages_sync_for_device(struct device *dev, struct page *page,
-		size_t size, enum dma_data_direction dir)
-{
-	struct scatterlist sg;
-
-	sg_init_table(&sg, 1);
-	sg_set_page(&sg, page, size, 0);
-	/*
-	 * This is not correct - sg_dma_address needs a dma_addr_t that is valid
-	 * for the targeted device, but this works on the currently targeted
-	 * hardware.
-	 */
-	sg_dma_address(&sg) = page_to_phys(page);
-	dma_sync_sg_for_device(dev, &sg, 1, dir);
-}
-
 struct ion_vma_list {
 	struct list_head list;
 	struct vm_area_struct *vma;
@@ -857,8 +843,8 @@ static void ion_buffer_sync_for_device(struct ion_buffer *buffer,
 		struct page *page = buffer->pages[i];
 
 		if (ion_buffer_page_is_dirty(page))
-			ion_pages_sync_for_device(dev, ion_buffer_page(page),
-							PAGE_SIZE, dir);
+			kernel_force_cache_clean(ion_buffer_page(page),
+						 PAGE_SIZE);
 
 		ion_buffer_page_clean(buffer->pages + i);
 	}
diff --git a/drivers/staging/android/ion/ion_carveout_heap.c b/drivers/staging/android/ion/ion_carveout_heap.c
index c4f0795..af81edc 100644
--- a/drivers/staging/android/ion/ion_carveout_heap.c
+++ b/drivers/staging/android/ion/ion_carveout_heap.c
@@ -22,6 +22,9 @@
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+
+#include <linux/cacheflush.h>
+
 #include "ion.h"
 #include "ion_priv.h"
@@ -105,8 +108,7 @@ static void ion_carveout_heap_free(struct ion_buffer *buffer)
 
 	ion_heap_buffer_zero(buffer);
 
 	if (ion_buffer_cached(buffer))
-		dma_sync_sg_for_device(NULL, table->sgl, table->nents,
-				       DMA_BIDIRECTIONAL);
+		kernel_force_cache_clean(page, buffer->size);
 
 	ion_carveout_free(heap, paddr, buffer->size);
 	sg_free_table(table);
@@ -132,7 +134,7 @@ struct ion_heap *ion_carveout_heap_create(struct ion_platform_heap *heap_data)
 	page = pfn_to_page(PFN_DOWN(heap_data->base));
 	size = heap_data->size;
 
-	ion_pages_sync_for_device(NULL, page, size, DMA_BIDIRECTIONAL);
+	kernel_force_cache_clean(page, size);
 
 	ret = ion_heap_pages_zero(page, size, pgprot_writecombine(PAGE_KERNEL));
 	if (ret)
diff --git a/drivers/staging/android/ion/ion_chunk_heap.c b/drivers/staging/android/ion/ion_chunk_heap.c
index 70495dc..f6d1bae 100644
--- a/drivers/staging/android/ion/ion_chunk_heap.c
+++ b/drivers/staging/android/ion/ion_chunk_heap.c
@@ -21,6 +21,9 @@
 #include <linux/scatterlist.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+
+#include <linux/cacheflush.h>
+
 #include "ion.h"
 #include "ion_priv.h"
@@ -104,11 +107,10 @@ static void ion_chunk_heap_free(struct ion_buffer *buffer)
 
 	ion_heap_buffer_zero(buffer);
 
-	if (ion_buffer_cached(buffer))
-		dma_sync_sg_for_device(NULL, table->sgl, table->nents,
-				       DMA_BIDIRECTIONAL);
-
 	for_each_sg(table->sgl, sg, table->nents, i) {
+		if (ion_buffer_cached(buffer))
+			kernel_force_cache_clean(sg_page(sg),
+						 sg->length);
 		gen_pool_free(chunk_heap->pool, page_to_phys(sg_page(sg)),
 			      sg->length);
 	}
@@ -135,7 +137,7 @@ struct ion_heap *ion_chunk_heap_create(struct ion_platform_heap *heap_data)
 	page = pfn_to_page(PFN_DOWN(heap_data->base));
 	size = heap_data->size;
 
-	ion_pages_sync_for_device(NULL, page, size, DMA_BIDIRECTIONAL);
+	kernel_force_cache_clean(page, size);
 
 	ret = ion_heap_pages_zero(page, size, pgprot_writecombine(PAGE_KERNEL));
 	if (ret)
diff --git a/drivers/staging/android/ion/ion_page_pool.c b/drivers/staging/android/ion/ion_page_pool.c
index aea89c1..f289d88 100644
--- a/drivers/staging/android/ion/ion_page_pool.c
+++ b/drivers/staging/android/ion/ion_page_pool.c
@@ -22,6 +22,9 @@
 #include <linux/init.h>
 #include <linux/slab.h>
 #include <linux/swap.h>
+
+#include <linux/cacheflush.h>
+
 #include "ion_priv.h"
 
 static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
@@ -30,9 +33,7 @@ static void *ion_page_pool_alloc_pages(struct ion_page_pool *pool)
 
 	if (!page)
 		return NULL;
-	if (!pool->cached)
-		ion_pages_sync_for_device(NULL, page, PAGE_SIZE << pool->order,
-					  DMA_BIDIRECTIONAL);
+	kernel_force_cache_clean(page, PAGE_SIZE << pool->order);
 	return page;
 }
 
diff --git a/drivers/staging/android/ion/ion_priv.h b/drivers/staging/android/ion/ion_priv.h
index 3c3b324..a344190 100644
--- a/drivers/staging/android/ion/ion_priv.h
+++ b/drivers/staging/android/ion/ion_priv.h
@@ -441,17 +441,6 @@ void ion_page_pool_free(struct ion_page_pool *, struct page *);
 int ion_page_pool_shrink(struct ion_page_pool *pool, gfp_t gfp_mask,
 			  int nr_to_scan);
 
-/**
- * ion_pages_sync_for_device - cache flush pages for use with the specified
- *                             device
- * @dev:	the device the pages will be used with
- * @page:	the first page to be flushed
- * @size:	size in bytes of region to be flushed
- * @dir:	direction of dma transfer
- */
-void ion_pages_sync_for_device(struct device *dev, struct page *page,
-		size_t size, enum dma_data_direction dir);
-
 long ion_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
 
 int ion_sync_for_device(struct ion_client *client, int fd);
diff --git a/drivers/staging/android/ion/ion_system_heap.c b/drivers/staging/android/ion/ion_system_heap.c
index 7e023d5..8eefe83 100644
--- a/drivers/staging/android/ion/ion_system_heap.c
+++ b/drivers/staging/android/ion/ion_system_heap.c
@@ -23,6 +23,7 @@
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 #include <linux/vmalloc.h>
+#include <linux/cacheflush.h>
 #include "ion.h"
 #include "ion_priv.h"
@@ -76,8 +77,7 @@ static struct page *alloc_buffer_page(struct ion_system_heap *heap,
 		page = ion_page_pool_alloc(pool);
 
 	if (cached)
-		ion_pages_sync_for_device(NULL, page, PAGE_SIZE << order,
-					  DMA_BIDIRECTIONAL);
+		kernel_force_cache_clean(page, PAGE_SIZE << order);
 	return page;
 }
 
@@ -408,7 +408,7 @@ static int ion_system_contig_heap_allocate(struct ion_heap *heap,
 
 	buffer->sg_table = table;
 
-	ion_pages_sync_for_device(NULL, page, len, DMA_BIDIRECTIONAL);
+	kernel_force_cache_clean(page, len);
 
 	return 0;
dma_buf added support for a userspace syncing ioctl. It is implemented by calling dma_buf_begin_cpu_access and dma_buf_end_cpu_access. Ion currently lacks cache operations on this code path. Add them for compatibility with the dma_buf ioctl.
Signed-off-by: Laura Abbott <labbott@redhat.com>
---
 drivers/staging/android/ion/ion.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c
index c2125de..1ad8e8a 100644
--- a/drivers/staging/android/ion/ion.c
+++ b/drivers/staging/android/ion/ion.c
@@ -969,6 +969,24 @@ static void ion_dma_buf_kunmap(struct dma_buf *dmabuf, unsigned long offset,
 {
 }
 
+static void ion_clean_buffer(struct ion_buffer *buffer)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(buffer->sg_table->sgl, sg, buffer->sg_table->orig_nents, i)
+		kernel_force_cache_clean(sg_page(sg), sg->length);
+}
+
+static void ion_invalidate_buffer(struct ion_buffer *buffer)
+{
+	struct scatterlist *sg;
+	int i;
+
+	for_each_sg(buffer->sg_table->sgl, sg, buffer->sg_table->orig_nents, i)
+		kernel_force_cache_invalidate(sg_page(sg), sg->length);
+}
+
 static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 					enum dma_data_direction direction)
 {
@@ -984,6 +1002,11 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf,
 	mutex_lock(&buffer->lock);
 	vaddr = ion_buffer_kmap_get(buffer);
 	mutex_unlock(&buffer->lock);
+
+	if (direction != DMA_TO_DEVICE) {
+		ion_invalidate_buffer(buffer);
+	}
+
 	return PTR_ERR_OR_ZERO(vaddr);
 }
 
@@ -996,6 +1019,12 @@ static int ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf,
 	ion_buffer_kmap_put(buffer);
 	mutex_unlock(&buffer->lock);
 
+	if (direction == DMA_FROM_DEVICE) {
+		ion_invalidate_buffer(buffer);
+	} else {
+		ion_clean_buffer(buffer);
+	}
+
 	return 0;
 }
 
@@ -1126,6 +1155,8 @@ int ion_sync_for_device(struct ion_client *client, int fd)
 	struct dma_buf *dmabuf;
 	struct ion_buffer *buffer;
 
+	WARN_ONCE(1, "This API is deprecated in favor of the dma_buf ioctl\n");
+
 	dmabuf = dma_buf_get(fd);
 	if (IS_ERR(dmabuf))
 		return PTR_ERR(dmabuf);
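[Editor's note: for reference, a userspace sketch of how a client would exercise the path this patch wires up, bracketing CPU access with DMA_BUF_IOCTL_SYNC. It assumes the standard flag-to-direction mapping in the dma-buf core, that buf_fd is a dma-buf exported by Ion, and omits error handling for brevity.]

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/dma-buf.h>

static void cpu_fill_buffer(int buf_fd, size_t len)
{
	struct dma_buf_sync sync = { 0 };
	uint8_t *vaddr = mmap(NULL, len, PROT_READ | PROT_WRITE,
			      MAP_SHARED, buf_fd, 0);

	/* Begin CPU access; for DMA_BUF_SYNC_RW ion's begin_cpu_access
	 * invalidates so the CPU doesn't read stale cache lines. */
	sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_RW;
	ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	memset(vaddr, 0xa5, len);

	/* End CPU access; ion's end_cpu_access cleans the cache so a
	 * device subsequently sees the CPU's writes. */
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_RW;
	ioctl(buf_fd, DMA_BUF_IOCTL_SYNC, &sync);

	munmap(vaddr, len);
}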