Hello everyone,
As promised during the Memory Management summit at the Linaro Meeting in Budapest, I have continued development of CMA. The goal is to integrate it as tightly as possible with other kernel subsystems (such as memory management and dma-mapping) and finally merge it into mainline.
This version introduces integration with the DMA-mapping subsystem for the ARM architecture, but I believe similar integration can be done for other architectures too. I have also rebased all the code onto the latest v3.0-rc2 kernel.
A few words for those who are seeing CMA for the first time:
The Contiguous Memory Allocator (CMA) makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted.
The main difference from similar frameworks is that CMA transparently reuses the memory region reserved for big chunk allocations as ordinary system memory, so no memory is wasted while no big chunk is allocated. Once an allocation request is issued, the framework migrates system pages away to create the required big chunk of physically contiguous memory.
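To make the idea concrete, here is a toy userspace model of that reuse-then-migrate behaviour (all names here are illustrative, not the kernel API): a reserved range serves ordinary movable pages until a big-chunk request arrives, at which point the movable pages are migrated out of the range.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_state { FREE = 0, MOVABLE = 1, ALLOCATED = 2 };

#define TOY_PAGES 16

/* Empty [start, end) by "migrating" MOVABLE pages to FREE slots outside
 * the range, then claim the whole range for the contiguous allocation. */
static bool toy_cm_alloc(enum toy_state mem[TOY_PAGES], size_t start, size_t end)
{
	for (size_t i = start; i < end; i++) {
		size_t j;

		if (mem[i] != MOVABLE)
			continue;
		for (j = 0; j < TOY_PAGES; j++)
			if ((j < start || j >= end) && mem[j] == FREE)
				break;
		if (j == TOY_PAGES)
			return false;	/* nowhere to migrate to */
		mem[j] = MOVABLE;	/* page moved out of the range */
		mem[i] = FREE;
	}
	for (size_t i = start; i < end; i++)
		mem[i] = ALLOCATED;
	return true;
}
```

The real framework of course has to deal with pinned pages, per-zone locking and migration failures; the toy only shows why the reserved range costs nothing while unused.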
For more information see the changelog and links to previous versions of CMA framework.
The current version of CMA is just an allocator that handles allocation of contiguous memory blocks. The differences between this patchset and Kamezawa's alloc_contig_pages() are:
1. alloc_contig_pages() requires MAX_ORDER alignment of allocations, which may be unsuitable for embedded systems where only a few MiBs are required.
Lack of the alignment requirement means that several threads might try to access the same pageblock/page. To prevent this from happening, CMA uses a mutex so that only one cm_alloc()/cm_free() call may run at a time.
2. CMA may use its own migratetype (MIGRATE_CMA) which behaves similarly to ZONE_MOVABLE but can be put in arbitrary places.
This is required for us since we need to define two disjoint memory ranges inside system RAM (i.e. in two memory banks; not to be confused with NUMA nodes).
3. alloc_contig_pages() scans memory in search of a range that can be migrated. CMA, on the other hand, maintains its own allocator to decide where to allocate memory for device drivers, and then tries to migrate pages from that region if needed. This is not strictly required, but I suspect it might be faster.
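For scale, the MAX_ORDER alignment mentioned in point 1 can be quantified. Assuming the common defaults (these values are an assumption, not taken from this patchset: PAGE_SIZE = 4 KiB, MAX_ORDER = 11, so a maximal buddy block is 2^(MAX_ORDER-1) pages), every allocation gets rounded to 4 MiB:

```c
/* granularity imposed by MAX_ORDER alignment, assuming the typical
 * defaults of 4 KiB pages and MAX_ORDER = 11 (illustrative only) */
#define TOY_PAGE_SIZE	4096UL
#define TOY_MAX_ORDER	11

static unsigned long max_order_granularity(void)
{
	return (1UL << (TOY_MAX_ORDER - 1)) * TOY_PAGE_SIZE;
}
```

So a driver needing, say, 1 MiB would still burn a 4 MiB aligned slot, which is exactly the waste CMA's finer-grained allocator avoids.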
The integration with the ARM DMA-mapping subsystem is quite straightforward. Once a CMA context is available, alloc_pages() calls can be replaced by cm_alloc() calls.
The current version has been tested on a Samsung S5PC110 based Aquila machine with the s5p-fimc V4L2 driver. The driver itself uses the videobuf2 dma-contig memory allocator, which in turn relies on dma_alloc_coherent() from the DMA-mapping subsystem. By integrating CMA with DMA-mapping we got this driver working with CMA without a single change in the driver or the videobuf2-dma-contig allocator.
TODO:
1. use struct page * or pfn internally instead of physical addresses
2. use a simple bitmap based allocator instead of genalloc
3. provide a function similar to dma_declare_coherent_memory(), which will create and register a CMA area for a particular device
4. code cleanup and simplification
5. discussion
6. double-mapping issues with ARMv6+ and coherent memory
Best regards
From: Michal Nazarewicz m.nazarewicz@samsung.com
This commit adds a bitmap_find_next_zero_area_off() function which works like bitmap_find_next_zero_area() except that it allows an offset to be specified for the alignment check. This lets the caller request a bit such that its number plus the offset is aligned according to the mask.
Signed-off-by: Michal Nazarewicz m.nazarewicz@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com CC: Michal Nazarewicz mina86@mina86.com --- include/linux/bitmap.h | 24 +++++++++++++++++++----- lib/bitmap.c | 22 ++++++++++++---------- 2 files changed, 31 insertions(+), 15 deletions(-)
diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index dcafe0b..50e2c16 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -45,6 +45,7 @@ * bitmap_set(dst, pos, nbits) Set specified bit area * bitmap_clear(dst, pos, nbits) Clear specified bit area * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area + * bitmap_find_next_zero_area_off(buf, len, pos, n, mask) as above * bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n * bitmap_shift_left(dst, src, n, nbits) *dst = *src << n * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src) @@ -114,11 +115,24 @@ extern int __bitmap_weight(const unsigned long *bitmap, int bits);
extern void bitmap_set(unsigned long *map, int i, int len); extern void bitmap_clear(unsigned long *map, int start, int nr); -extern unsigned long bitmap_find_next_zero_area(unsigned long *map, - unsigned long size, - unsigned long start, - unsigned int nr, - unsigned long align_mask); + +extern unsigned long bitmap_find_next_zero_area_off(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask, + unsigned long align_offset); + +static inline unsigned long +bitmap_find_next_zero_area(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask) +{ + return bitmap_find_next_zero_area_off(map, size, start, nr, + align_mask, 0); +}
extern int bitmap_scnprintf(char *buf, unsigned int len, const unsigned long *src, int nbits); diff --git a/lib/bitmap.c b/lib/bitmap.c index 41baf02..bad4f20 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -315,30 +315,32 @@ void bitmap_clear(unsigned long *map, int start, int nr) } EXPORT_SYMBOL(bitmap_clear);
-/* +/** * bitmap_find_next_zero_area - find a contiguous aligned zero area * @map: The address to base the search on * @size: The bitmap size in bits * @start: The bitnumber to start searching at * @nr: The number of zeroed bits we're looking for * @align_mask: Alignment mask for zero area + * @align_offset: Alignment offset for zero area. * * The @align_mask should be one less than a power of 2; the effect is that - * the bit offset of all zero areas this function finds is multiples of that - * power of 2. A @align_mask of 0 means no alignment is required. + * the bit offset of all zero areas this function finds plus @align_offset + * is multiple of that power of 2. */ -unsigned long bitmap_find_next_zero_area(unsigned long *map, - unsigned long size, - unsigned long start, - unsigned int nr, - unsigned long align_mask) +unsigned long bitmap_find_next_zero_area_off(unsigned long *map, + unsigned long size, + unsigned long start, + unsigned int nr, + unsigned long align_mask, + unsigned long align_offset) { unsigned long index, end, i; again: index = find_next_zero_bit(map, size, start);
/* Align allocation */ - index = __ALIGN_MASK(index, align_mask); + index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset;
end = index + nr; if (end > size) @@ -350,7 +352,7 @@ again: } return index; } -EXPORT_SYMBOL(bitmap_find_next_zero_area); +EXPORT_SYMBOL(bitmap_find_next_zero_area_off);
/* * Bitmap printing & parsing functions: first version by Bill Irwin,
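The effect of the new offset argument can be checked in plain C. The following is a userspace copy of the alignment step from the patch (TOY_ALIGN_MASK mirrors the kernel's __ALIGN_MASK): align (index + offset), then subtract the offset back, so that the returned index plus align_offset is a multiple of (align_mask + 1).

```c
#define TOY_ALIGN_MASK(x, mask)	(((x) + (mask)) & ~(mask))

/* same computation as the patched line in lib/bitmap.c:
 * index = __ALIGN_MASK(index + align_offset, align_mask) - align_offset */
static unsigned long align_with_offset(unsigned long index,
				       unsigned long align_mask,
				       unsigned long align_offset)
{
	return TOY_ALIGN_MASK(index + align_offset, align_mask) - align_offset;
}
```

With align_offset == 0 this degenerates to the old behaviour, which is why bitmap_find_next_zero_area() can become a trivial inline wrapper.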
From: Michal Nazarewicz m.nazarewicz@samsung.com
This commit adds a gen_pool_alloc_aligned() function to the generic allocator API. It allows specifying alignment for the allocated block. This feature uses the bitmap_find_next_zero_area_off() function.
It also fixes a possible issue with the bitmap's last element not being fully allocated (i.e. the space allocated for chunk->bits not being a multiple of sizeof(long)).
It also makes some other smaller changes:
- moves structure definitions out of the header file,
- adds __must_check to functions returning a value,
- makes gen_pool_add() return -ENOMEM rather than -1 on error,
- changes list_for_each to list_for_each_entry, and
- makes use of bitmap_clear().
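The bitmap sizing issue can be seen by comparing the old byte-granular computation with the long-granular one the patch switches to. Bitmap helpers such as find_next_zero_bit read whole longs, so the byte-granular size can under-allocate the last word:

```c
#include <limits.h>
#include <stddef.h>

#define TOY_BITS_PER_LONG	(CHAR_BIT * sizeof(unsigned long))
#define TOY_BITS_TO_LONGS(n)	(((n) + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

/* old sizing: rounds up to whole bytes only */
static size_t old_bitmap_bytes(size_t nbits)
{
	return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}

/* fixed sizing: rounds up to whole longs, as lib/bitmap.c expects */
static size_t new_bitmap_bytes(size_t nbits)
{
	return TOY_BITS_TO_LONGS(nbits) * sizeof(unsigned long);
}
```

For e.g. 9 bits the old computation yields 2 bytes while the bitmap code will happily read a full long, stepping past the allocation.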
Signed-off-by: Michal Nazarewicz m.nazarewicz@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com [m.szyprowski: rebased and updated to Linux v3.0-rc1] Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com CC: Michal Nazarewicz mina86@mina86.com --- include/linux/genalloc.h | 50 ++++++------ lib/genalloc.c | 190 +++++++++++++++++++++++++++------------------- 2 files changed, 138 insertions(+), 102 deletions(-)
diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 5bbebda..af44e88 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -11,28 +11,11 @@
#ifndef __GENALLOC_H__ #define __GENALLOC_H__ -/* - * General purpose special memory pool descriptor. - */ -struct gen_pool { - rwlock_t lock; - struct list_head chunks; /* list of chunks in this pool */ - int min_alloc_order; /* minimum allocation order */ -};
-/* - * General purpose special memory pool chunk descriptor. - */ -struct gen_pool_chunk { - spinlock_t lock; - struct list_head next_chunk; /* next chunk in pool */ - phys_addr_t phys_addr; /* physical starting address of memory chunk */ - unsigned long start_addr; /* starting address of memory chunk */ - unsigned long end_addr; /* ending address of memory chunk */ - unsigned long bits[0]; /* bitmap for allocating memory chunk */ -}; - -extern struct gen_pool *gen_pool_create(int, int); +struct gen_pool; + +struct gen_pool *__must_check gen_pool_create(unsigned order, int nid); + extern phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long); extern int gen_pool_add_virt(struct gen_pool *, unsigned long, phys_addr_t, size_t, int); @@ -53,7 +36,26 @@ static inline int gen_pool_add(struct gen_pool *pool, unsigned long addr, { return gen_pool_add_virt(pool, addr, -1, size, nid); } -extern void gen_pool_destroy(struct gen_pool *); -extern unsigned long gen_pool_alloc(struct gen_pool *, size_t); -extern void gen_pool_free(struct gen_pool *, unsigned long, size_t); + +void gen_pool_destroy(struct gen_pool *pool); + +unsigned long __must_check +gen_pool_alloc_aligned(struct gen_pool *pool, size_t size, + unsigned alignment_order); + +/** + * gen_pool_alloc() - allocate special memory from the pool + * @pool: Pool to allocate from. + * @size: Number of bytes to allocate from the pool. + * + * Allocate the requested number of bytes from the specified pool. + * Uses a first-fit algorithm. + */ +static inline unsigned long __must_check +gen_pool_alloc(struct gen_pool *pool, size_t size) +{ + return gen_pool_alloc_aligned(pool, size, 0); +} + +void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size); #endif /* __GENALLOC_H__ */ diff --git a/lib/genalloc.c b/lib/genalloc.c index 577ddf8..b41dd90 100644 --- a/lib/genalloc.c +++ b/lib/genalloc.c @@ -16,23 +16,46 @@ #include <linux/genalloc.h>
+/* General purpose special memory pool descriptor. */ +struct gen_pool { + rwlock_t lock; /* protects chunks list */ + struct list_head chunks; /* list of chunks in this pool */ + unsigned order; /* minimum allocation order */ +}; + +/* General purpose special memory pool chunk descriptor. */ +struct gen_pool_chunk { + spinlock_t lock; /* protects bits */ + struct list_head next_chunk; /* next chunk in pool */ + phys_addr_t phys_addr; /* physical starting address of memory chunk */ + unsigned long start; /* start of memory chunk */ + unsigned long size; /* number of bits */ + unsigned long bits[0]; /* bitmap for allocating memory chunk */ +}; + + /** - * gen_pool_create - create a new special memory pool - * @min_alloc_order: log base 2 of number of bytes each bitmap bit represents - * @nid: node id of the node the pool structure should be allocated on, or -1 + * gen_pool_create() - create a new special memory pool + * @order: Log base 2 of number of bytes each bitmap bit + * represents. + * @nid: Node id of the node the pool structure should be allocated + * on, or -1. This will be also used for other allocations. * * Create a new special memory pool that can be used to manage special purpose * memory not managed by the regular kmalloc/kfree interface. */ -struct gen_pool *gen_pool_create(int min_alloc_order, int nid) +struct gen_pool *__must_check gen_pool_create(unsigned order, int nid) { struct gen_pool *pool;
- pool = kmalloc_node(sizeof(struct gen_pool), GFP_KERNEL, nid); - if (pool != NULL) { + if (WARN_ON(order >= BITS_PER_LONG)) + return NULL; + + pool = kmalloc_node(sizeof *pool, GFP_KERNEL, nid); + if (pool) { rwlock_init(&pool->lock); INIT_LIST_HEAD(&pool->chunks); - pool->min_alloc_order = min_alloc_order; + pool->order = order; } return pool; } @@ -40,33 +63,41 @@ EXPORT_SYMBOL(gen_pool_create);
/** * gen_pool_add_virt - add a new chunk of special memory to the pool - * @pool: pool to add new memory chunk to - * @virt: virtual starting address of memory chunk to add to pool - * @phys: physical starting address of memory chunk to add to pool - * @size: size in bytes of the memory chunk to add to pool - * @nid: node id of the node the chunk structure and bitmap should be - * allocated on, or -1 + * @pool: Pool to add new memory chunk to + * @virt: Virtual starting address of memory chunk to add to pool + * @phys: Physical starting address of memory chunk to add to pool + * @size: Size in bytes of the memory chunk to add to pool + * @nid: Node id of the node the chunk structure and bitmap should be + * allocated on, or -1 * * Add a new chunk of special memory to the specified pool. * * Returns 0 on success or a -ve errno on failure. */ -int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys, - size_t size, int nid) +int __must_check +gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phys, + size_t size, int nid) { struct gen_pool_chunk *chunk; - int nbits = size >> pool->min_alloc_order; - int nbytes = sizeof(struct gen_pool_chunk) + - (nbits + BITS_PER_BYTE - 1) / BITS_PER_BYTE; + size_t nbytes;
- chunk = kmalloc_node(nbytes, GFP_KERNEL | __GFP_ZERO, nid); - if (unlikely(chunk == NULL)) + if (WARN_ON(!virt || virt + size < virt || + (virt & ((1 << pool->order) - 1)))) + return -EINVAL; + + size = size >> pool->order; + if (WARN_ON(!size)) + return -EINVAL; + + nbytes = sizeof *chunk + BITS_TO_LONGS(size) * sizeof *chunk->bits; + chunk = kzalloc_node(nbytes, GFP_KERNEL, nid); + if (!chunk) return -ENOMEM;
spin_lock_init(&chunk->lock); + chunk->start = virt >> pool->order; + chunk->size = size; chunk->phys_addr = phys; - chunk->start_addr = virt; - chunk->end_addr = virt + size;
write_lock(&pool->lock); list_add(&chunk->next_chunk, &pool->chunks); @@ -90,10 +121,12 @@ phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr)
read_lock(&pool->lock); list_for_each(_chunk, &pool->chunks) { + unsigned long start_addr; chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk);
- if (addr >= chunk->start_addr && addr < chunk->end_addr) - return chunk->phys_addr + addr - chunk->start_addr; + start_addr = chunk->start << pool->order; + if (addr >= start_addr && addr < start_addr + chunk->size) + return chunk->phys_addr + addr - start_addr; } read_unlock(&pool->lock);
@@ -102,115 +135,116 @@ phys_addr_t gen_pool_virt_to_phys(struct gen_pool *pool, unsigned long addr) EXPORT_SYMBOL(gen_pool_virt_to_phys);
/** - * gen_pool_destroy - destroy a special memory pool - * @pool: pool to destroy + * gen_pool_destroy() - destroy a special memory pool + * @pool: Pool to destroy. * * Destroy the specified special memory pool. Verifies that there are no * outstanding allocations. */ void gen_pool_destroy(struct gen_pool *pool) { - struct list_head *_chunk, *_next_chunk; struct gen_pool_chunk *chunk; - int order = pool->min_alloc_order; - int bit, end_bit; - + int bit;
- list_for_each_safe(_chunk, _next_chunk, &pool->chunks) { - chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk); + while (!list_empty(&pool->chunks)) { + chunk = list_entry(pool->chunks.next, struct gen_pool_chunk, + next_chunk); list_del(&chunk->next_chunk);
- end_bit = (chunk->end_addr - chunk->start_addr) >> order; - bit = find_next_bit(chunk->bits, end_bit, 0); - BUG_ON(bit < end_bit); + bit = find_next_bit(chunk->bits, chunk->size, 0); + BUG_ON(bit < chunk->size);
kfree(chunk); } kfree(pool); - return; } EXPORT_SYMBOL(gen_pool_destroy);
/** - * gen_pool_alloc - allocate special memory from the pool - * @pool: pool to allocate from - * @size: number of bytes to allocate from the pool + * gen_pool_alloc_aligned() - allocate special memory from the pool + * @pool: Pool to allocate from. + * @size: Number of bytes to allocate from the pool. + * @alignment_order: Order the allocated space should be + * aligned to (eg. 20 means allocated space + * must be aligned to 1MiB). * * Allocate the requested number of bytes from the specified pool. * Uses a first-fit algorithm. */ -unsigned long gen_pool_alloc(struct gen_pool *pool, size_t size) +unsigned long __must_check +gen_pool_alloc_aligned(struct gen_pool *pool, size_t size, + unsigned alignment_order) { - struct list_head *_chunk; + unsigned long addr, align_mask = 0, flags, start; struct gen_pool_chunk *chunk; - unsigned long addr, flags; - int order = pool->min_alloc_order; - int nbits, start_bit, end_bit;
if (size == 0) return 0;
- nbits = (size + (1UL << order) - 1) >> order; + if (alignment_order > pool->order) + align_mask = (1 << (alignment_order - pool->order)) - 1;
- read_lock(&pool->lock); - list_for_each(_chunk, &pool->chunks) { - chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk); + size = (size + (1UL << pool->order) - 1) >> pool->order;
- end_bit = (chunk->end_addr - chunk->start_addr) >> order; + read_lock(&pool->lock); + list_for_each_entry(chunk, &pool->chunks, next_chunk) { + if (chunk->size < size) + continue;
spin_lock_irqsave(&chunk->lock, flags); - start_bit = bitmap_find_next_zero_area(chunk->bits, end_bit, 0, - nbits, 0); - if (start_bit >= end_bit) { + start = bitmap_find_next_zero_area_off(chunk->bits, chunk->size, + 0, size, align_mask, + chunk->start); + if (start >= chunk->size) { spin_unlock_irqrestore(&chunk->lock, flags); continue; }
- addr = chunk->start_addr + ((unsigned long)start_bit << order); - - bitmap_set(chunk->bits, start_bit, nbits); + bitmap_set(chunk->bits, start, size); spin_unlock_irqrestore(&chunk->lock, flags); - read_unlock(&pool->lock); - return addr; + addr = (chunk->start + start) << pool->order; + goto done; } + + addr = 0; +done: read_unlock(&pool->lock); - return 0; + return addr; } -EXPORT_SYMBOL(gen_pool_alloc); +EXPORT_SYMBOL(gen_pool_alloc_aligned);
/** - * gen_pool_free - free allocated special memory back to the pool - * @pool: pool to free to - * @addr: starting address of memory to free back to pool - * @size: size in bytes of memory to free + * gen_pool_free() - free allocated special memory back to the pool + * @pool: Pool to free to. + * @addr: Starting address of memory to free back to pool. + * @size: Size in bytes of memory to free. * * Free previously allocated special memory back to the specified pool. */ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size) { - struct list_head *_chunk; struct gen_pool_chunk *chunk; unsigned long flags; - int order = pool->min_alloc_order; - int bit, nbits;
- nbits = (size + (1UL << order) - 1) >> order; + if (!size) + return;
- read_lock(&pool->lock); - list_for_each(_chunk, &pool->chunks) { - chunk = list_entry(_chunk, struct gen_pool_chunk, next_chunk); + addr = addr >> pool->order; + size = (size + (1UL << pool->order) - 1) >> pool->order;
- if (addr >= chunk->start_addr && addr < chunk->end_addr) { - BUG_ON(addr + size > chunk->end_addr); + BUG_ON(addr + size < addr); + + read_lock(&pool->lock); + list_for_each_entry(chunk, &pool->chunks, next_chunk) + if (addr >= chunk->start && + addr + size <= chunk->start + chunk->size) { spin_lock_irqsave(&chunk->lock, flags); - bit = (addr - chunk->start_addr) >> order; - while (nbits--) - __clear_bit(bit++, chunk->bits); + bitmap_clear(chunk->bits, addr - chunk->start, size); spin_unlock_irqrestore(&chunk->lock, flags); - break; + goto done; } - } - BUG_ON(nbits > 0); + BUG_ON(1); +done: read_unlock(&pool->lock); } EXPORT_SYMBOL(gen_pool_free);
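The allocation path above can be summarised with a single-chunk userspace model (a sketch only; the names and the byte-per-bit bitmap are simplifications, and locking is omitted). Sizes are in units of (1 << order) bytes, alignment_order is converted to a mask in bitmap units, and the chunk start offset enters the alignment check just as chunk->start is passed to bitmap_find_next_zero_area_off() in the patch:

```c
#include <stddef.h>

struct toy_pool {
	unsigned order;		/* log2 of bytes per bitmap bit */
	unsigned long start;	/* chunk start in bitmap units */
	size_t nbits;
	unsigned char used[64];	/* one byte per bit, for simplicity */
};

static unsigned long toy_alloc_aligned(struct toy_pool *p, size_t bytes,
				       unsigned alignment_order)
{
	unsigned long align_mask = 0;
	size_t nbits, i, j;

	if (alignment_order > p->order)
		align_mask = (1UL << (alignment_order - p->order)) - 1;
	nbits = (bytes + (1UL << p->order) - 1) >> p->order;

	for (i = 0; i + nbits <= p->nbits; i++) {
		if ((p->start + i) & align_mask)
			continue;	/* the address, not the bit index,
					 * must be aligned */
		for (j = 0; j < nbits && !p->used[i + j]; j++)
			;
		if (j < nbits)
			continue;	/* range not free; first-fit moves on */
		for (j = 0; j < nbits; j++)
			p->used[i + j] = 1;
		return (p->start + i) << p->order;
	}
	return 0;
}
```

This is why the offset-aware bitmap helper from the previous patch is needed: a chunk rarely starts at an aligned address, so aligning the raw bit index would not align the returned address.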
I am curious about one thing
Why do we need this allocator? Why not use allocate_resource and friends? The kernel's generic resource handler already handles object alignment and subranges. It just seems to be a surplus allocator that could perhaps be mostly removed by using the kernel resource allocator we already have.
Alan
Hello,
On Friday, June 10, 2011 1:25 PM Alan Cox wrote:
I am curious about one thing
Why do we need this allocator? Why not use allocate_resource and friends? The kernel's generic resource handler already handles object alignment and subranges. It just seems to be a surplus allocator that could perhaps be mostly removed by using the kernel resource allocator we already have.
genalloc was used mainly for historical reasons (in the earlier version we were looking for a direct replacement for the first-fit allocator).
I plan to replace it with a lib/bitmap.c bitmap_* based allocator (similar to how it is used by dma_declare_coherent_memory() and friends in drivers/base/dma-coherent.c). We need something really simple for CMA area management.
IMHO allocate_resource and friends are a bit too heavy here, but it is good to know that such an allocator also exists.
Best regards
I plan to replace it with a lib/bitmap.c bitmap_* based allocator (similar to how it is used by dma_declare_coherent_memory() and friends in drivers/base/dma-coherent.c). We need something really simple for CMA area management.
IMHO allocate_resource and friends are a bit too heavy here, but it is good to know that such an allocator also exists.
Not sure I'd class allocate_resource as heavyweight, but fine, provided it's using something that already exists rather than inventing yet another allocator.
This wants dealing with before it goes upstream though, so the changes in lib/*.c etc. never have to reach mainline and then get changed back.
Alan
On Fri, 10 Jun 2011 14:52:17 +0200, Alan Cox alan@lxorguk.ukuu.org.uk wrote:
I plan to replace it with a lib/bitmap.c bitmap_* based allocator (similar to how it is used by dma_declare_coherent_memory() and friends in drivers/base/dma-coherent.c). We need something really simple for CMA area management.
IMHO allocate_resource and friends are a bit too heavy here, but it is good to know that such an allocator also exists.
Not sure I'd class allocate_resource as heavyweight, but fine, provided it's using something that already exists rather than inventing yet another allocator.
genalloc is already in the kernel and is used in a few places, so either we let everyone use it as they see fit or we deprecate the library. If we don't deprecate it, I see no reason why CMA should not use it.
On 06/10/2011 06:52 AM, Alan Cox wrote:
I plan to replace it with a lib/bitmap.c bitmap_* based allocator (similar to how it is used by dma_declare_coherent_memory() and friends in drivers/base/dma-coherent.c). We need something really simple for CMA area management.
IMHO allocate_resource and friends are a bit too heavy here, but it is good to know that such an allocator also exists.
Not sure I'd class allocate_resource as heavyweight, but fine, provided it's using something that already exists rather than inventing yet another allocator.
This wants dealing with before it goes upstream though, so the changes in lib/*.c etc. never have to reach mainline and then get changed back.
Even if CMA doesn't end up using genalloc, there are existing consumers of the API and I think the _aligned function has value.
I agree that allocate_resource isn't overly heavy, but comparatively genalloc is really simple and lightweight for a driver to manage a contiguous address space without a lot of extra thought. I think both APIs serve slightly different masters, but if somebody thought about it long enough there could be some consolidation (same goes for the internal guts of dma_declare_coherent_memory).
I agree with Michal - if genalloc gets deprecated, then so be it, but as long as it lives, I think it's useful to get these functions upstream.
Jordan
From: KAMEZAWA Hiroyuki kamezawa.hiroyu@jp.fujitsu.com
Memory hotplug contains logic for making pages unused in a specified range of pfns, so some of its core logic can be reused for other purposes, such as allocating a very large contiguous memory block.
This patch moves some functions from mm/memory_hotplug.c to mm/page_isolation.c. This will help add a function for large allocations to page_isolation.c using the memory-unplug technique.
Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hiroyu@jp.fujitsu.com [m.nazarewicz: reworded commit message] Signed-off-by: Michal Nazarewicz m.nazarewicz@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com [m.szyprowski: rebased and updated to Linux v3.0-rc1] Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com CC: Michal Nazarewicz mina86@mina86.com --- include/linux/page-isolation.h | 7 +++ mm/memory_hotplug.c | 111 -------------------------------------- mm/page_isolation.c | 115 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 122 insertions(+), 111 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 051c1b1..58cdbac 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page);
+/* + * For migration. + */ + +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn); +unsigned long scan_lru_pages(unsigned long start, unsigned long end); +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
#endif diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 9f64637..aee6dc0 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -637,117 +637,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) }
/* - * Confirm all pages in a range [start, end) is belongs to the same zone. - */ -static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) -{ - unsigned long pfn; - struct zone *zone = NULL; - struct page *page; - int i; - for (pfn = start_pfn; - pfn < end_pfn; - pfn += MAX_ORDER_NR_PAGES) { - i = 0; - /* This is just a CONFIG_HOLES_IN_ZONE check.*/ - while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i)) - i++; - if (i == MAX_ORDER_NR_PAGES) - continue; - page = pfn_to_page(pfn + i); - if (zone && page_zone(page) != zone) - return 0; - zone = page_zone(page); - } - return 1; -} - -/* - * Scanning pfn is much easier than scanning lru list. - * Scan pfn from start to end and Find LRU page. - */ -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) -{ - unsigned long pfn; - struct page *page; - for (pfn = start; pfn < end; pfn++) { - if (pfn_valid(pfn)) { - page = pfn_to_page(pfn); - if (PageLRU(page)) - return pfn; - } - } - return 0; -} - -static struct page * -hotremove_migrate_alloc(struct page *page, unsigned long private, int **x) -{ - /* This should be improooooved!! */ - return alloc_page(GFP_HIGHUSER_MOVABLE); -} - -#define NR_OFFLINE_AT_ONCE_PAGES (256) -static int -do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) -{ - unsigned long pfn; - struct page *page; - int move_pages = NR_OFFLINE_AT_ONCE_PAGES; - int not_managed = 0; - int ret = 0; - LIST_HEAD(source); - - for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) { - if (!pfn_valid(pfn)) - continue; - page = pfn_to_page(pfn); - if (!get_page_unless_zero(page)) - continue; - /* - * We can skip free pages. And we can only deal with pages on - * LRU. 
- */ - ret = isolate_lru_page(page); - if (!ret) { /* Success */ - put_page(page); - list_add_tail(&page->lru, &source); - move_pages--; - inc_zone_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - - } else { -#ifdef CONFIG_DEBUG_VM - printk(KERN_ALERT "removing pfn %lx from LRU failed\n", - pfn); - dump_page(page); -#endif - put_page(page); - /* Because we don't have big zone->lock. we should - check this again here. */ - if (page_count(page)) { - not_managed++; - ret = -EBUSY; - break; - } - } - } - if (!list_empty(&source)) { - if (not_managed) { - putback_lru_pages(&source); - goto out; - } - /* this function returns # of failed pages */ - ret = migrate_pages(&source, hotremove_migrate_alloc, 0, - true, true); - if (ret) - putback_lru_pages(&source); - } -out: - return ret; -} - -/* * remove from free_area[] and mark all as Reserved. */ static int diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 4ae42bb..15b41ec 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -5,6 +5,9 @@ #include <linux/mm.h> #include <linux/page-isolation.h> #include <linux/pageblock-flags.h> +#include <linux/memcontrol.h> +#include <linux/migrate.h> +#include <linux/mm_inline.h> #include "internal.h"
static inline struct page * @@ -139,3 +142,115 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn) spin_unlock_irqrestore(&zone->lock, flags); return ret ? 0 : -EBUSY; } + + +/* + * Confirm all pages in a range [start, end) is belongs to the same zone. + */ +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) +{ + unsigned long pfn; + struct zone *zone = NULL; + struct page *page; + int i; + for (pfn = start_pfn; + pfn < end_pfn; + pfn += MAX_ORDER_NR_PAGES) { + i = 0; + /* This is just a CONFIG_HOLES_IN_ZONE check.*/ + while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i)) + i++; + if (i == MAX_ORDER_NR_PAGES) + continue; + page = pfn_to_page(pfn + i); + if (zone && page_zone(page) != zone) + return 0; + zone = page_zone(page); + } + return 1; +} + +/* + * Scanning pfn is much easier than scanning lru list. + * Scan pfn from start to end and Find LRU page. + */ +unsigned long scan_lru_pages(unsigned long start, unsigned long end) +{ + unsigned long pfn; + struct page *page; + for (pfn = start; pfn < end; pfn++) { + if (pfn_valid(pfn)) { + page = pfn_to_page(pfn); + if (PageLRU(page)) + return pfn; + } + } + return 0; +} + +struct page * +hotremove_migrate_alloc(struct page *page, unsigned long private, int **x) +{ + /* This should be improooooved!! */ + return alloc_page(GFP_HIGHUSER_MOVABLE); +} + +#define NR_OFFLINE_AT_ONCE_PAGES (256) +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) +{ + unsigned long pfn; + struct page *page; + int move_pages = NR_OFFLINE_AT_ONCE_PAGES; + int not_managed = 0; + int ret = 0; + LIST_HEAD(source); + + for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) { + if (!pfn_valid(pfn)) + continue; + page = pfn_to_page(pfn); + if (!get_page_unless_zero(page)) + continue; + /* + * We can skip free pages. And we can only deal with pages on + * LRU. 
+ */ + ret = isolate_lru_page(page); + if (!ret) { /* Success */ + put_page(page); + list_add_tail(&page->lru, &source); + move_pages--; + inc_zone_page_state(page, NR_ISOLATED_ANON + + page_is_file_cache(page)); + + } else { +#ifdef CONFIG_DEBUG_VM + printk(KERN_ALERT "removing pfn %lx from LRU failed\n", + pfn); + dump_page(page); +#endif + put_page(page); + /* Because we don't have big zone->lock. we should + check this again here. */ + if (page_count(page)) { + not_managed++; + ret = -EBUSY; + break; + } + } + } + if (!list_empty(&source)) { + if (not_managed) { + putback_lru_pages(&source); + goto out; + } + /* this function returns # of failed pages */ + ret = migrate_pages(&source, hotremove_migrate_alloc, 0, + true, true); + if (ret) + putback_lru_pages(&source); + } +out: + return ret; +} +
From: KAMEZAWA Hiroyuki kamezawa.hiroyu@jp.fujitsu.com
This commit introduces the alloc_contig_freed_pages() function, which allocates (i.e. removes from the buddy system) free pages in a range. The caller has to guarantee that all pages in the range are in the buddy system.
Along with this function, a free_contig_pages() function is provided which frees all (or a subset of) pages allocated with alloc_contig_freed_pages().
Michal Nazarewicz has modified the function to make it easier to allocate pages that are not MAX_ORDER_NR_PAGES aligned, by making it return the pfn of one past the last allocated page.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    3 ++
 mm/page_alloc.c                |   44 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 58cdbac..f1417ed 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
  */
 extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
+extern unsigned long alloc_contig_freed_pages(unsigned long start,
+					      unsigned long end, gfp_t flag);
+extern void free_contig_pages(struct page *page, int nr_pages);
 /*
  * For migration.
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4e8985a..00e9b24 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5600,6 +5600,50 @@ out:
 	spin_unlock_irqrestore(&zone->lock, flags);
 }
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, + gfp_t flag) +{ + unsigned long pfn = start, count; + struct page *page; + struct zone *zone; + int order; + + VM_BUG_ON(!pfn_valid(start)); + zone = page_zone(pfn_to_page(start)); + + spin_lock_irq(&zone->lock); + + page = pfn_to_page(pfn); + for (;;) { + VM_BUG_ON(page_count(page) || !PageBuddy(page)); + list_del(&page->lru); + order = page_order(page); + zone->free_area[order].nr_free--; + rmv_page_order(page); + __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order)); + pfn += 1 << order; + if (pfn >= end) + break; + VM_BUG_ON(!pfn_valid(pfn)); + page += 1 << order; + } + + spin_unlock_irq(&zone->lock); + + /* After this, pages in the range can be freed one be one */ + page = pfn_to_page(start); + for (count = pfn - start; count; --count, ++page) + prep_new_page(page, 0, flag); + + return pfn; +} + +void free_contig_pages(struct page *page, int nr_pages) +{ + for (; nr_pages; --nr_pages, ++page) + __free_page(page); +} + #ifdef CONFIG_MEMORY_HOTREMOVE /* * All pages in the range must be isolated before calling this.
From: Michal Nazarewicz <m.nazarewicz@samsung.com>
This commit adds the alloc_contig_range() function, which tries to allocate a given range of pages. It first migrates all already-allocated pages that fall in the range, thus freeing them. Once all pages in the range are free, they are removed from the buddy system and thus allocated for the caller to use.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: renamed some variables for easier code reading]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |    2 +
 mm/page_alloc.c                |  144 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+), 0 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index f1417ed..c5d1a7c 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page);
 extern void unset_migratetype_isolate(struct page *page);
 extern unsigned long alloc_contig_freed_pages(unsigned long start,
 					      unsigned long end, gfp_t flag);
+extern int alloc_contig_range(unsigned long start, unsigned long end,
+			      gfp_t flags);
 extern void free_contig_pages(struct page *page, int nr_pages);
 /*
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00e9b24..2cea044 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5638,6 +5638,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
 	return pfn;
 }
+static unsigned long pfn_to_maxpage(unsigned long pfn) +{ + return pfn & ~(MAX_ORDER_NR_PAGES - 1); +} + +static unsigned long pfn_to_maxpage_up(unsigned long pfn) +{ + return ALIGN(pfn, MAX_ORDER_NR_PAGES); +} + +#define MIGRATION_RETRY 5 +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) +{ + int migration_failed = 0, ret; + unsigned long pfn = start; + + /* + * Some code "borrowed" from KAMEZAWA Hiroyuki's + * __alloc_contig_pages(). + */ + + for (;;) { + pfn = scan_lru_pages(pfn, end); + if (!pfn || pfn >= end) + break; + + ret = do_migrate_range(pfn, end); + if (!ret) { + migration_failed = 0; + } else if (ret != -EBUSY + || ++migration_failed >= MIGRATION_RETRY) { + return ret; + } else { + /* There are unstable pages.on pagevec. */ + lru_add_drain_all(); + /* + * there may be pages on pcplist before + * we mark the range as ISOLATED. + */ + drain_all_pages(); + } + cond_resched(); + } + + if (!migration_failed) { + /* drop all pages in pagevec and pcp list */ + lru_add_drain_all(); + drain_all_pages(); + } + + /* Make sure all pages are isolated */ + if (WARN_ON(test_pages_isolated(start, end))) + return -EBUSY; + + return 0; +} + +/** + * alloc_contig_range() -- tries to allocate given range of pages + * @start: start PFN to allocate + * @end: one-past-the-last PFN to allocate + * @flags: flags passed to alloc_contig_freed_pages(). + * + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES + * aligned, hovewer it's callers responsibility to guarantee that we + * are the only thread that changes migrate type of pageblocks the + * pages fall in. + * + * Returns zero on success or negative error code. On success all + * pages which PFN is in (start, end) are allocated for the caller and + * need to be freed with free_contig_pages(). 
+ */ +int alloc_contig_range(unsigned long start, unsigned long end, + gfp_t flags) +{ + unsigned long outer_start, outer_end; + int ret; + + /* + * What we do here is we mark all pageblocks in range as + * MIGRATE_ISOLATE. Because of the way page allocator work, we + * align the range to MAX_ORDER pages so that page allocator + * won't try to merge buddies from different pageblocks and + * change MIGRATE_ISOLATE to some other migration type. + * + * Once the pageblocks are marked as MIGRATE_ISOLATE, we + * migrate the pages from an unaligned range (ie. pages that + * we are interested in). This will put all the pages in + * range back to page allocator as MIGRATE_ISOLATE. + * + * When this is done, we take the pages in range from page + * allocator removing them from the buddy system. This way + * page allocator will never consider using them. + * + * This lets us mark the pageblocks back as + * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the + * MAX_ORDER aligned range but not in the unaligned, original + * range are put back to page allocator so that buddy can use + * them. + */ + + ret = start_isolate_page_range(pfn_to_maxpage(start), + pfn_to_maxpage_up(end)); + if (ret) + goto done; + + ret = __alloc_contig_migrate_range(start, end); + if (ret) + goto done; + + /* + * Pages from [start, end) are within a MAX_ORDER_NR_PAGES + * aligned blocks that are marked as MIGRATE_ISOLATE. What's + * more, all pages in [start, end) are free in page allocator. + * What we are going to do is to allocate all pages from + * [start, end) (that is remove them from page allocater). + * + * The only problem is that pages at the beginning and at the + * end of interesting range may be not aligned with pages that + * page allocator holds, ie. they can be part of higher order + * pages. Because of this, we reserve the bigger range and + * once this is done free the pages we are not interested in. 
+ */ + + ret = 0; + while (!PageBuddy(pfn_to_page(start & (~0UL << ret)))) + if (WARN_ON(++ret >= MAX_ORDER)) + return -EINVAL; + + outer_start = start & (~0UL << ret); + outer_end = alloc_contig_freed_pages(outer_start, end, flags); + + /* Free head and tail (if any) */ + if (start != outer_start) + free_contig_pages(pfn_to_page(outer_start), start - outer_start); + if (end != outer_end) + free_contig_pages(pfn_to_page(end), outer_end - end); + + ret = 0; +done: + undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end)); + return ret; +} + void free_contig_pages(struct page *page, int nr_pages) { for (; nr_pages; --nr_pages, ++page)
From: Michal Nazarewicz <m.nazarewicz@samsung.com>
The MIGRATE_CMA migration type has two main characteristics: (i) only movable pages can be allocated from MIGRATE_CMA pageblocks and (ii) the page allocator will never change the migration type of MIGRATE_CMA pageblocks.

This guarantees that a page in a MIGRATE_CMA pageblock can always be migrated somewhere else (unless there's no memory left in the system).

It is designed to be used with the Contiguous Memory Allocator (CMA) for allocating big chunks (eg. 10MiB) of physically contiguous memory. Once a driver requests contiguous memory, CMA migrates movable pages out of the MIGRATE_CMA pageblocks.

To minimise the number of migrations, the MIGRATE_CMA migration type is the last type tried when the page allocator falls back from the requested migration type to the others.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
[m.szyprowski: cleaned up Kconfig]
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/mmzone.h |   43 ++++++++++++++++----
 mm/Kconfig             |    8 ++++-
 mm/compaction.c        |   10 +++++
 mm/internal.h          |    3 ++
 mm/page_alloc.c        |   93 ++++++++++++++++++++++++++++++++++++++----------
 5 files changed, 130 insertions(+), 27 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index c928dac..a2eeacf 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -35,13 +35,37 @@
  */
 #define PAGE_ALLOC_COSTLY_ORDER 3
-#define MIGRATE_UNMOVABLE 0 -#define MIGRATE_RECLAIMABLE 1 -#define MIGRATE_MOVABLE 2 -#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */ -#define MIGRATE_RESERVE 3 -#define MIGRATE_ISOLATE 4 /* can't allocate from here */ -#define MIGRATE_TYPES 5 +enum { + MIGRATE_UNMOVABLE, + MIGRATE_RECLAIMABLE, + MIGRATE_MOVABLE, + MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ + MIGRATE_RESERVE = MIGRATE_PCPTYPES, +#ifdef CONFIG_CMA_MIGRATE_TYPE + /* + * MIGRATE_CMA migration type is designed to mimic the way + * ZONE_MOVABLE works. Only movable pages can be allocated + * from MIGRATE_CMA pageblocks and page allocator never + * implicitly change migration type of MIGRATE_CMA pageblock. + * + * The way to use it is to change migratetype of a range of + * pageblocks to MIGRATE_CMA which can be done by + * __free_pageblock_cma() function. What is important though + * is that a range of pageblocks must be aligned to + * MAX_ORDER_NR_PAGES should biggest page be bigger then + * a single pageblock. + */ + MIGRATE_CMA, +#endif + MIGRATE_ISOLATE, /* can't allocate from here */ + MIGRATE_TYPES +}; + +#ifdef CONFIG_CMA_MIGRATE_TYPE +# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) +#else +# define is_migrate_cma(migratetype) false +#endif
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
@@ -54,6 +78,11 @@ static inline int get_pageblock_migratetype(struct page *page)
 	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
 }
+static inline bool is_pageblock_cma(struct page *page)
+{
+	return is_migrate_cma(get_pageblock_migratetype(page));
+}
+
 struct free_area {
 	struct list_head	free_list[MIGRATE_TYPES];
 	unsigned long		nr_free;
diff --git a/mm/Kconfig b/mm/Kconfig
index 8ca47a5..6ffedd8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -189,7 +189,7 @@ config COMPACTION
 config MIGRATION
 	bool "Page migration"
 	def_bool y
-	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
+	depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE
 	help
 	  Allows the migration of the physical location of pages of processes
 	  while the virtual addresses are not changed. This is useful in
@@ -198,6 +198,12 @@ config MIGRATION
 	  pages as migration can relocate pages to satisfy a huge page
 	  allocation instead of reclaiming.
+config CMA_MIGRATE_TYPE
+	bool
+	help
+	  This enables the use of the MIGRATE_CMA migrate type, which lets CMA
+	  work on almost arbitrary memory ranges, not only inside ZONE_MOVABLE.
+
 config PHYS_ADDR_T_64BIT
 	def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
diff --git a/mm/compaction.c b/mm/compaction.c
index 021a296..6d013c3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page)
 	if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
 		return false;
+ /* Keep MIGRATE_CMA alone as well. */ + /* + * XXX Revisit. We currently cannot let compaction touch CMA + * pages since compaction insists on changing their migration + * type to MIGRATE_MOVABLE (see split_free_page() called from + * isolate_freepages_block() above). + */ + if (is_migrate_cma(migratetype)) + return false; + /* If the page is a large free page, then allow migration */ if (PageBuddy(page) && page_order(page) >= pageblock_order) return true; diff --git a/mm/internal.h b/mm/internal.h index d071d380..b44566c 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -49,6 +49,9 @@ extern void putback_lru_page(struct page *page); * in mm/page_alloc.c */ extern void __free_pages_bootmem(struct page *page, unsigned int order); +#ifdef CONFIG_CMA_MIGRATE_TYPE +extern void __free_pageblock_cma(struct page *page); +#endif extern void prep_compound_page(struct page *page, unsigned long order); #ifdef CONFIG_MEMORY_FAILURE extern bool is_free_buddy_page(struct page *page); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2cea044..c13c0dc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -719,6 +719,30 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order) } }
+#ifdef CONFIG_CMA_MIGRATE_TYPE + +/* + * Free whole pageblock and set it's migration type to MIGRATE_CMA. + */ +void __init __free_pageblock_cma(struct page *page) +{ + struct page *p = page; + unsigned i = pageblock_nr_pages; + + prefetchw(p); + do { + if (--i) + prefetchw(p + 1); + __ClearPageReserved(p); + set_page_count(p, 0); + } while (++p, i); + + set_page_refcounted(page); + set_pageblock_migratetype(page, MIGRATE_CMA); + __free_pages(page, pageblock_order); +} + +#endif
/* * The order of subdivision here is critical for the IO subsystem. @@ -827,11 +851,15 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, * This array describes the order lists are fallen back to when * the free lists for the desirable migrate type are depleted */ -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = { +static int fallbacks[MIGRATE_TYPES][4] = { [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, +#ifdef CONFIG_CMA_MIGRATE_TYPE + [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA , MIGRATE_RESERVE }, +#else [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }, - [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */ +#endif + [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */ };
/* @@ -926,12 +954,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) /* Find the largest possible block of pages in the other list */ for (current_order = MAX_ORDER-1; current_order >= order; --current_order) { - for (i = 0; i < MIGRATE_TYPES - 1; i++) { + for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) { migratetype = fallbacks[start_migratetype][i];
/* MIGRATE_RESERVE handled later if necessary */ if (migratetype == MIGRATE_RESERVE) - continue; + break;
area = &(zone->free_area[current_order]); if (list_empty(&area->free_list[migratetype])) @@ -946,19 +974,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) * pages to the preferred allocation list. If falling * back for a reclaimable kernel allocation, be more * aggressive about taking ownership of free pages + * + * On the other hand, never change migration + * type of MIGRATE_CMA pageblocks nor move CMA + * pages on different free lists. We don't + * want unmovable pages to be allocated from + * MIGRATE_CMA areas. */ - if (unlikely(current_order >= (pageblock_order >> 1)) || - start_migratetype == MIGRATE_RECLAIMABLE || - page_group_by_mobility_disabled) { - unsigned long pages; + if (!is_pageblock_cma(page) && + (unlikely(current_order >= pageblock_order / 2) || + start_migratetype == MIGRATE_RECLAIMABLE || + page_group_by_mobility_disabled)) { + int pages; pages = move_freepages_block(zone, page, - start_migratetype); + start_migratetype);
- /* Claim the whole block if over half of it is free */ + /* + * Claim the whole block if over half + * of it is free + */ if (pages >= (1 << (pageblock_order-1)) || - page_group_by_mobility_disabled) + page_group_by_mobility_disabled) set_pageblock_migratetype(page, - start_migratetype); + start_migratetype);
migratetype = start_migratetype; } @@ -968,11 +1006,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) rmv_page_order(page);
/* Take ownership for orders >= pageblock_order */ - if (current_order >= pageblock_order) + if (current_order >= pageblock_order && + !is_pageblock_cma(page)) change_pageblock_range(page, current_order, start_migratetype);
- expand(zone, page, order, current_order, area, migratetype); + expand(zone, page, order, current_order, area, + is_migrate_cma(start_migratetype) + ? start_migratetype : migratetype);
trace_mm_page_alloc_extfrag(page, order, current_order, start_migratetype, migratetype); @@ -1044,7 +1085,12 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, list_add(&page->lru, list); else list_add_tail(&page->lru, list); - set_page_private(page, migratetype); +#ifdef CONFIG_CMA_MIGRATE_TYPE + if (is_pageblock_cma(page)) + set_page_private(page, MIGRATE_CMA); + else +#endif + set_page_private(page, migratetype); list = &page->lru; } __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); @@ -1185,9 +1231,16 @@ void free_hot_cold_page(struct page *page, int cold) * offlined but treat RESERVE as movable pages so we can get those * areas back if necessary. Otherwise, we may have to free * excessively into the page allocator + * + * Still, do not change migration type of MIGRATE_CMA pages (if + * they'd be recorded as MIGRATE_MOVABLE an unmovable page could + * be allocated from MIGRATE_CMA block and we don't want to allow + * that). In this respect, treat MIGRATE_CMA like + * MIGRATE_ISOLATE. */ if (migratetype >= MIGRATE_PCPTYPES) { - if (unlikely(migratetype == MIGRATE_ISOLATE)) { + if (unlikely(migratetype == MIGRATE_ISOLATE + || is_migrate_cma(migratetype))) { free_one_page(zone, page, 0, migratetype); goto out; } @@ -1276,7 +1329,9 @@ int split_free_page(struct page *page) if (order >= pageblock_order - 1) { struct page *endpage = page + (1 << order) - 1; for (; page < endpage; page += pageblock_nr_pages) - set_pageblock_migratetype(page, MIGRATE_MOVABLE); + if (!is_pageblock_cma(page)) + set_pageblock_migratetype(page, + MIGRATE_MOVABLE); }
return 1 << order; @@ -5486,8 +5541,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count) */ if (zone_idx(zone) == ZONE_MOVABLE) return true; - - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE) + if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE || + is_pageblock_cma(page)) return true;
pfn = page_to_pfn(page);
From: Michal Nazarewicz <m.nazarewicz@samsung.com>
This commit changes the various functions that switch pages and pageblocks between the MIGRATE_ISOLATE and MIGRATE_MOVABLE migrate types so that they can also work with the MIGRATE_CMA migrate type.
Signed-off-by: Michal Nazarewicz <m.nazarewicz@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   19 ++++++++++++-------
 mm/page_isolation.c            |   15 ++++++++-------
 3 files changed, 47 insertions(+), 27 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index c5d1a7c..177b307 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -3,39 +3,53 @@
/* * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE. - * If specified range includes migrate types other than MOVABLE, + * If specified range includes migrate types other than MOVABLE or CMA, * this will fail with -EBUSY. * * For isolating all pages in the range finally, the caller have to * free all pages in the range. test_page_isolated() can be used for * test it. */ -extern int -start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); +int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype); + +static inline int +start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +{ + return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); +} + +int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype);
/* * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. * target range is [start_pfn, end_pfn) */ -extern int -undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); +static inline int +undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +{ + return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); +}
/* - * test all pages in [start_pfn, end_pfn)are isolated or not. + * Test all pages in [start_pfn, end_pfn) are isolated or not. */ -extern int -test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); +int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
/* - * Internal funcs.Changes pageblock's migrate type. - * Please use make_pagetype_isolated()/make_pagetype_movable(). + * Internal functions. Changes pageblock's migrate type. */ -extern int set_migratetype_isolate(struct page *page); -extern void unset_migratetype_isolate(struct page *page); +int set_migratetype_isolate(struct page *page); +void __unset_migratetype_isolate(struct page *page, unsigned migratetype); +static inline void unset_migratetype_isolate(struct page *page) +{ + __unset_migratetype_isolate(page, MIGRATE_MOVABLE); +} extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); extern int alloc_contig_range(unsigned long start, unsigned long end, - gfp_t flags); + gfp_t flags, unsigned migratetype); extern void free_contig_pages(struct page *page, int nr_pages);
/* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c13c0dc..c5b78ad 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5641,7 +5641,7 @@ out: return ret; }
-void unset_migratetype_isolate(struct page *page) +void __unset_migratetype_isolate(struct page *page, unsigned migratetype) { struct zone *zone; unsigned long flags; @@ -5649,8 +5649,8 @@ void unset_migratetype_isolate(struct page *page) spin_lock_irqsave(&zone->lock, flags); if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) goto out; - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - move_freepages_block(zone, page, MIGRATE_MOVABLE); + set_pageblock_migratetype(page, migratetype); + move_freepages_block(zone, page, migratetype); out: spin_unlock_irqrestore(&zone->lock, flags); } @@ -5755,6 +5755,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) * @start: start PFN to allocate * @end: one-past-the-last PFN to allocate * @flags: flags passed to alloc_contig_freed_pages(). + * @migratetype: migratetype of the underlaying pageblocks (either + * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks + * in range must have the same migratetype and it must + * be either of the two. * * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES * aligned, hovewer it's callers responsibility to guarantee that we @@ -5766,7 +5770,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) * need to be freed with free_contig_pages(). */ int alloc_contig_range(unsigned long start, unsigned long end, - gfp_t flags) + gfp_t flags, unsigned migratetype) { unsigned long outer_start, outer_end; int ret; @@ -5794,8 +5798,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, * them. */
- ret = start_isolate_page_range(pfn_to_maxpage(start), - pfn_to_maxpage_up(end)); + ret = __start_isolate_page_range(pfn_to_maxpage(start), + pfn_to_maxpage_up(end), migratetype); if (ret) goto done;
@@ -5833,7 +5837,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
ret = 0; done: - undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end)); + __undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end), + migratetype); return ret; }
diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 15b41ec..f8beab5 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) }
/* - * start_isolate_page_range() -- make page-allocation-type of range of pages + * __start_isolate_page_range() -- make page-allocation-type of range of pages * to be MIGRATE_ISOLATE. * @start_pfn: The lower PFN of the range to be isolated. * @end_pfn: The upper PFN of the range to be isolated. + * @migratetype: migrate type to set in error recovery. * * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in * the range will never be allocated. Any free pages and pages freed in the @@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) * start_pfn/end_pfn must be aligned to pageblock_order. * Returns 0 on success and -EBUSY if any part of range cannot be isolated. */ -int -start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype) { unsigned long pfn; unsigned long undo_pfn; @@ -59,7 +60,7 @@ undo: for (pfn = start_pfn; pfn < undo_pfn; pfn += pageblock_nr_pages) - unset_migratetype_isolate(pfn_to_page(pfn)); + __unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
return -EBUSY; } @@ -67,8 +68,8 @@ undo: /* * Make isolated pages available again. */ -int -undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype) { unsigned long pfn; struct page *page; @@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) page = __first_valid_page(pfn, pageblock_nr_pages); if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE) continue; - unset_migratetype_isolate(page); + __unset_migratetype_isolate(page, migratetype); } return 0; }
The Contiguous Memory Allocator is a set of functions that lets one initialise a region of memory which can then be used to allocate contiguous memory chunks.
CMA allows for the creation of separate contexts. The kernel is allowed to allocate movable pages within CMA-managed memory, so that memory can be used for page cache while CMA devices do not need it. On a cm_alloc() request such pages are migrated out of the CMA area to free the required contiguous block.
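Putting the pieces from the header below together, a platform and a driver might use the API roughly as follows. This is a hypothetical sketch, not part of the patch: the names `board_reserve`, `camera_cma_init` and `camera_grab_buffer` are invented for illustration, and error handling is trimmed.

```c
/* Hypothetical usage sketch. cma_reserve() must run while memblock is
 * still operational (e.g. from the ARM "reserve" machine callback),
 * while cma_create() needs SLAB, so it runs from a later initcall. */
static unsigned long camera_base;
static struct cma *camera_cma;

void __init board_reserve(void)
{
	camera_base = cma_reserve(0, 16 << 20);	/* 16 MiB, anywhere */
}

static int __init camera_cma_init(void)
{
	if (!IS_ERR_VALUE(camera_base))
		camera_cma = cma_create(camera_base, 16 << 20);
	return 0;
}
core_initcall(camera_cma_init);

int camera_grab_buffer(void)
{
	/* 512 pages = 2 MiB with 4 KiB pages; order 8 = 1 MiB alignment */
	struct page *buf = cm_alloc(camera_cma, 512, 8);
	if (IS_ERR(buf))
		return PTR_ERR(buf);
	/* ... hand the buffer to the hardware ... */
	cm_free(camera_cma, buf, 512);
	return 0;
}
```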
This code is heavily based on earlier works by Michal Nazarewicz.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
---
 include/linux/cma.h |  189 +++++++++++++++++++++++++++
 mm/Kconfig          |   21 +++
 mm/Makefile         |    1 +
 mm/cma.c            |  358 +++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 569 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/cma.h
 create mode 100644 mm/cma.c
diff --git a/include/linux/cma.h b/include/linux/cma.h new file mode 100644 index 0000000..70a993c --- /dev/null +++ b/include/linux/cma.h @@ -0,0 +1,189 @@ +#ifndef __LINUX_CMA_H +#define __LINUX_CMA_H + +/* + * Contiguous Memory Allocator + * Copyright (c) 2010-2011 by Samsung Electronics. + * Written by: + * Michal Nazarewicz mina86@mina86.com + * Marek Szyprowski m.szyprowski@samsung.com + */ + +/* + * Contiguous Memory Allocator + * + * The Contiguous Memory Allocator (CMA) makes it possible for + * device drivers to allocate big contiguous chunks of memory after + * the system has booted. + * + * It requires some machine- and/or platform-specific initialisation + * code which prepares memory ranges to be used with CMA and later, + * device drivers can allocate memory from those ranges. + * + * Why is it needed? + * + * Various devices on embedded systems have no scatter-getter and/or + * IO map support and require contiguous blocks of memory to + * operate. They include devices such as cameras, hardware video + * coders, etc. + * + * Such devices often require big memory buffers (a full HD frame + * is, for instance, more then 2 mega pixels large, i.e. more than 6 + * MB of memory), which makes mechanisms such as kmalloc() or + * alloc_page() ineffective. + * + * At the same time, a solution where a big memory region is + * reserved for a device is suboptimal since often more memory is + * reserved then strictly required and, moreover, the memory is + * inaccessible to page system even if device drivers don't use it. + * + * CMA tries to solve this issue by operating on memory regions + * where only movable pages can be allocated from. This way, kernel + * can use the memory for pagecache and when device driver requests + * it, allocated pages can be migrated. + * + * Driver usage + * + * CMA should not be used directly by the device drivers. It should + * be considered as helper framework for dma-mapping subsystm and + * respective (platform)bus drivers. 
+ *
+ * The CMA client needs to have a pointer to a CMA context
+ * represented by a struct cma (which is an opaque data type).
+ *
+ * Once such pointer is obtained, a caller may allocate contiguous
+ * memory chunk using the following function:
+ *
+ *   cm_alloc()
+ *
+ * This function returns a pointer to the first struct page which
+ * represents a contiguous memory chunk.  This pointer
+ * may be used with the following function:
+ *
+ *   cm_free() -- frees allocated contiguous memory
+ *
+ *
+ * Platform/machine integration
+ *
+ *   CMA context must be created on platform or machine initialisation
+ *   and passed to respective subsystem that will be a client for CMA.
+ *   The latter may be done by a global variable or some field in
+ *   struct device.  For the former CMA provides the following functions:
+ *
+ *     cma_init_migratetype()
+ *     cma_reserve()
+ *     cma_create()
+ *
+ *   The first one initialises a portion of reserved memory so that it
+ *   can be used with CMA.  The second first tries to reserve memory
+ *   (using memblock) and then initialise it.
+ *
+ *   The cma_reserve() function must be called when memblock is still
+ *   operational and reserving memory with it is still possible.  On
+ *   ARM platform the "reserve" machine callback is a perfect place to
+ *   call it.
+ *
+ *   The last function creates a CMA context on a range of previously
+ *   initialised memory addresses.  Because it uses kmalloc() it needs
+ *   to be called after SLAB is initialised.
+ */
+
+/***************************** Kernel level API *****************************/
+
+#ifdef __KERNEL__
+
+struct cma;
+struct page;
+
+#ifdef CONFIG_CMA
+
+/**
+ * cma_init_migratetype() - initialises range of physical memory to be used
+ *		with CMA context.
+ * @start:	start address of the memory range in bytes.
+ * @size:	size of the memory range in bytes.
+ *
+ * The range must be MAX_ORDER_NR_PAGES aligned and it must have been
+ * already reserved (eg. with memblock).
+ *
+ * The actual initialisation is deferred until subsys initcalls are
+ * evaluated (unless this has already happened).
+ *
+ * Returns zero on success or negative error.
+ */
+int cma_init_migratetype(unsigned long start, unsigned long size);
+
+/**
+ * cma_reserve() - reserves memory.
+ * @start:	start address of the memory range in bytes hint; if unsure
+ *		pass zero.
+ * @size:	size of the memory to reserve in bytes.
+ *
+ * It will use memblock to allocate memory.  It will also call
+ * cma_init_migratetype() on reserved region so that a CMA context can
+ * be created on given range.
+ *
+ * @start and @size will be aligned to (MAX_ORDER_NR_PAGES << PAGE_SHIFT).
+ *
+ * Returns the physical address of the reserved area or a value that
+ * yields true when checked with IS_ERR_VALUE().
+ */
+unsigned long cma_reserve(unsigned long start, unsigned long size);
+
+/**
+ * cma_create() - creates a CMA context.
+ * @start:	start address of the context in bytes.
+ * @size:	size of the context in bytes.
+ *
+ * The range must be page aligned.  Different contexts cannot overlap.
+ *
+ * The memory range must either lay in ZONE_MOVABLE or must have been
+ * initialised with cma_init_migratetype() function.
+ *
+ * Because this function uses kmalloc() it must be called after SLAB
+ * is initialised.  This in particular means that it cannot be called
+ * just after cma_reserve() since the former needs to be run way
+ * earlier.
+ *
+ * Returns pointer to CMA context or a pointer-error on error.
+ */
+struct cma *cma_create(unsigned long start, unsigned long size);
+
+/**
+ * cma_destroy() - destroys CMA context.
+ * @cma:	context to destroy.
+ */
+void cma_destroy(struct cma *cma);
+
+/**
+ * cm_alloc() - allocates contiguous memory.
+ * @cma:	CMA context to use
+ * @count:	desired chunk size in pages (must be non-zero)
+ * @order:	desired alignment in pages
+ *
+ * Returns pointer to first page structure representing contiguous memory
+ * or a pointer-error on error.
+ */
+struct page *cm_alloc(struct cma *cma, int count, unsigned int order);
+
+/**
+ * cm_free() - frees contiguous memory.
+ * @cma:	CMA context to use
+ * @pages:	contiguous memory to free
+ * @count:	desired chunk size in pages (must be non-zero)
+ */
+void cm_free(struct cma *cma, struct page *pages, int count);
+
+#else
+
+static inline struct page *cm_alloc(struct cma *cma, int count, unsigned int order)
+{
+	return NULL;
+}
+
+static inline void cm_free(struct cma *cma, struct page *pages, int count) { }
+
+#endif
+
+#endif
+
+#endif
diff --git a/mm/Kconfig b/mm/Kconfig
index 6ffedd8..b85de4c 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -346,6 +346,27 @@ choice
 	  benefit.
 endchoice
 
+config CMA
+	bool "Contiguous Memory Allocator"
+	select CMA_MIGRATE_TYPE
+	select MIGRATION
+	select GENERIC_ALLOCATOR
+	help
+	  This enables the Contiguous Memory Allocator which allows drivers
+	  to allocate big physically-contiguous blocks of memory for use with
+	  hardware components that do not support I/O map nor scatter-gather.
+
+	  For more information see <include/linux/cma.h>.  If unsure, say "n".
+
+config CMA_DEBUG
+	bool "CMA debug messages (DEVELOPMENT)"
+	depends on CMA
+	help
+	  Turns on debug messages in CMA.  This produces KERN_DEBUG
+	  messages for every CMA call as well as various messages while
+	  processing calls such as cm_alloc().  This option does not
+	  affect warning and error messages.
+
 #
 # UP and nommu archs use km based percpu allocator
 #
diff --git a/mm/Makefile b/mm/Makefile
index 836e416..4610320 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -50,3 +50,4 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
+obj-$(CONFIG_CMA) += cma.o
diff --git a/mm/cma.c b/mm/cma.c
new file mode 100644
index 0000000..aee306c
--- /dev/null
+++ b/mm/cma.c
@@ -0,0 +1,358 @@
+/*
+ * Contiguous Memory Allocator
+ * Copyright (c) 2010-2011 by Samsung Electronics.
+ * Written by:
+ *	Michal Nazarewicz <mina86@mina86.com>
+ *	Marek Szyprowski <m.szyprowski@samsung.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 of the
+ * License or (at your option) any later version of the license.
+ */
+
+/*
+ * See include/linux/cma.h for details.
+ */
+
+#define pr_fmt(fmt) "cma: " fmt
+
+#ifdef CONFIG_CMA_DEBUG
+#  define DEBUG
+#endif
+
+#include <asm/page.h>
+#include <asm/errno.h>
+
+#include <linux/cma.h>
+#include <linux/memblock.h>
+#include <linux/err.h>
+#include <linux/genalloc.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/page-isolation.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/mm_types.h>
+
+#include "internal.h"
+
+/* XXX Revisit */
+#ifdef phys_to_pfn
+/* nothing to do */
+#elif defined __phys_to_pfn
+#  define phys_to_pfn __phys_to_pfn
+#else
+#  warning correct phys_to_pfn implementation needed
+static unsigned long phys_to_pfn(phys_addr_t phys)
+{
+	return virt_to_pfn(phys_to_virt(phys));
+}
+#endif
+
+
+/************************* Initialise CMA *************************/
+
+static struct cma_grabbed {
+	unsigned long start;
+	unsigned long size;
+} cma_grabbed[8] __initdata;
+static unsigned cma_grabbed_count __initdata;
+
+#ifdef CONFIG_DEBUG_VM
+
+static int __cma_give_back(unsigned long start, unsigned long size)
+{
+	unsigned long pfn = phys_to_pfn(start);
+	unsigned i = size >> PAGE_SHIFT;
+	struct zone *zone;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	VM_BUG_ON(!pfn_valid(pfn));
+	zone = page_zone(pfn_to_page(pfn));
+
+	do {
+		VM_BUG_ON(!pfn_valid(pfn));
+		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);
+		if (!(pfn & (pageblock_nr_pages - 1)))
+			__free_pageblock_cma(pfn_to_page(pfn));
+		++pfn;
+		++totalram_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#else
+
+static int __cma_give_back(unsigned long start, unsigned long size)
+{
+	unsigned i = size >> (PAGE_SHIFT + pageblock_order);
+	struct page *p = phys_to_page(start);
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	do {
+		__free_pageblock_cma(p);
+		p += pageblock_nr_pages;
+		totalram_pages += pageblock_nr_pages;
+	} while (--i);
+
+	return 0;
+}
+
+#endif
+
+static int __init
+__cma_queue_give_back(unsigned long start, unsigned long size)
+{
+	if (cma_grabbed_count == ARRAY_SIZE(cma_grabbed))
+		return -ENOSPC;
+
+	cma_grabbed[cma_grabbed_count].start = start;
+	cma_grabbed[cma_grabbed_count].size  = size;
+	++cma_grabbed_count;
+	return 0;
+}
+
+static int (*cma_give_back)(unsigned long start, unsigned long size) =
+	__cma_queue_give_back;
+
+static int __init cma_give_back_queued(void)
+{
+	struct cma_grabbed *r = cma_grabbed;
+	unsigned i = cma_grabbed_count;
+
+	pr_debug("%s(): will give %u range(s)\n", __func__, i);
+
+	cma_give_back = __cma_give_back;
+
+	for (; i; --i, ++r)
+		__cma_give_back(r->start, r->size);
+
+	return 0;
+}
+core_initcall(cma_give_back_queued);
+
+int __ref cma_init_migratetype(unsigned long start, unsigned long size)
+{
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	if (!size)
+		return -EINVAL;
+	if ((start | size) & ((MAX_ORDER_NR_PAGES << PAGE_SHIFT) - 1))
+		return -EINVAL;
+	if (start + size < start)
+		return -EOVERFLOW;
+
+	return cma_give_back(start, size);
+}
+
+unsigned long cma_reserve(unsigned long start, unsigned long size)
+{
+	unsigned long alignment;
+	int ret;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	/* Sanity checks */
+	if (!size)
+		return (unsigned long)-EINVAL;
+
+	/* Sanitise input arguments */
+	start = ALIGN(start, MAX_ORDER_NR_PAGES << PAGE_SHIFT);
+	size  = ALIGN(size,  MAX_ORDER_NR_PAGES << PAGE_SHIFT);
+	alignment = MAX_ORDER_NR_PAGES << PAGE_SHIFT;
+
+	/* Reserve memory */
+	if (start) {
+		if (memblock_is_region_reserved(start, size) ||
+		    memblock_reserve(start, size) < 0)
+			return (unsigned long)-EBUSY;
+	} else {
+		/*
+		 * Use __memblock_alloc_base() since
+		 * memblock_alloc_base() panic()s.
+		 */
+		u64 addr = __memblock_alloc_base(size, alignment, 0);
+		if (!addr) {
+			return (unsigned long)-ENOMEM;
+		} else if (addr + size > ~(unsigned long)0) {
+			memblock_free(addr, size);
+			return (unsigned long)-EOVERFLOW;
+		} else {
+			start = addr;
+		}
+	}
+
+	/* CMA Initialise */
+	ret = cma_init_migratetype(start, size);
+	if (ret < 0) {
+		memblock_free(start, size);
+		return ret;
+	}
+
+	return start;
+}
+
+
+/************************** CMA context ***************************/
+
+struct cma {
+	int migratetype;
+	struct gen_pool *pool;
+};
+
+static int __cma_check_range(unsigned long start, unsigned long size)
+{
+	int migratetype = MIGRATE_MOVABLE;
+	unsigned long pfn, count;
+	struct page *page;
+	struct zone *zone;
+
+	start = phys_to_pfn(start);
+	if (WARN_ON(!pfn_valid(start)))
+		return -EINVAL;
+
+	if (page_zonenum(pfn_to_page(start)) != ZONE_MOVABLE)
+		migratetype = MIGRATE_CMA;
+
+	/* First check if all pages are valid and in the same zone */
+	zone  = page_zone(pfn_to_page(start));
+	count = size >> PAGE_SHIFT;
+	pfn   = start;
+	while (++pfn, --count) {
+		if (WARN_ON(!pfn_valid(pfn)) ||
+		    WARN_ON(page_zone(pfn_to_page(pfn)) != zone))
+			return -EINVAL;
+	}
+
+	/* Now check migratetype of their pageblocks. */
+	start = start & ~(pageblock_nr_pages - 1);
+	pfn   = ALIGN(pfn, pageblock_nr_pages);
+	page  = pfn_to_page(start);
+	count = (pfn - start) >> PAGE_SHIFT;
+	do {
+		if (WARN_ON(get_pageblock_migratetype(page) != migratetype))
+			return -EINVAL;
+		page += pageblock_nr_pages;
+	} while (--count);
+
+	return migratetype;
+}
+
+struct cma *cma_create(unsigned long start, unsigned long size)
+{
+	struct gen_pool *pool;
+	int migratetype, ret;
+	struct cma *cma;
+
+	pr_debug("%s(%p+%p)\n", __func__, (void *)start, (void *)size);
+
+	if (!size)
+		return ERR_PTR(-EINVAL);
+	if ((start | size) & (PAGE_SIZE - 1))
+		return ERR_PTR(-EINVAL);
+	if (start + size < start)
+		return ERR_PTR(-EOVERFLOW);
+
+	migratetype = __cma_check_range(start, size);
+	if (migratetype < 0)
+		return ERR_PTR(migratetype);
+
+	cma = kmalloc(sizeof *cma, GFP_KERNEL);
+	if (!cma)
+		return ERR_PTR(-ENOMEM);
+
+	pool = gen_pool_create(ffs(PAGE_SIZE) - 1, -1);
+	if (!pool) {
+		ret = -ENOMEM;
+		goto error1;
+	}
+
+	ret = gen_pool_add(pool, start, size, -1);
+	if (unlikely(ret))
+		goto error2;
+
+	cma->migratetype = migratetype;
+	cma->pool        = pool;
+
+	pr_debug("%s: returning <%p>\n", __func__, (void *)cma);
+	return cma;
+
+error2:
+	gen_pool_destroy(pool);
+error1:
+	kfree(cma);
+	return ERR_PTR(ret);
+}
+
+void cma_destroy(struct cma *cma)
+{
+	pr_debug("%s(<%p>)\n", __func__, (void *)cma);
+	gen_pool_destroy(cma->pool);
+}
+
+
+/************************* Allocate and free *************************/
+
+/* Protects cm_alloc(), cm_free() as well as gen_pools of each cm. */
+static DEFINE_MUTEX(cma_mutex);
+
+struct page *cm_alloc(struct cma *cma, int count, unsigned int order)
+{
+	unsigned long start;
+	unsigned long size = count << PAGE_SHIFT;
+
+	if (!cma)
+		return NULL;
+
+	pr_debug("%s(<%p>, %lx/%d)\n", __func__, (void *)cma, size, order);
+
+	if (!size)
+		return NULL;
+
+	mutex_lock(&cma_mutex);
+
+	start = gen_pool_alloc_aligned(cma->pool, size, order + 12);
+	if (!start)
+		goto error1;
+
+	if (cma->migratetype) {
+		unsigned long pfn = phys_to_pfn(start);
+		int ret = alloc_contig_range(pfn, pfn + (size >> PAGE_SHIFT),
+					     0, cma->migratetype);
+		if (ret)
+			goto error2;
+	}
+
+	mutex_unlock(&cma_mutex);
+
+	pr_debug("%s(): returning [%p]\n", __func__, (void *)phys_to_page(start));
+	return phys_to_page(start);
+
+error2:
+	gen_pool_free(cma->pool, start, size);
+error1:
+	mutex_unlock(&cma_mutex);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(cm_alloc);
+
+void cm_free(struct cma *cma, struct page *pages, int count)
+{
+	unsigned long size = count << PAGE_SHIFT;
+
+	pr_debug("%s([%p])\n", __func__, (void *)pages);
+
+	if (!cma || !pages)
+		return;
+
+	mutex_lock(&cma_mutex);
+
+	gen_pool_free(cma->pool, page_to_phys(pages), size);
+	if (cma->migratetype)
+		free_contig_pages(pages, count);
+
+	mutex_unlock(&cma_mutex);
+}
+EXPORT_SYMBOL_GPL(cm_free);
On Friday 10 June 2011, Marek Szyprowski wrote:
The Contiguous Memory Allocator is a set of functions that lets one initialise a region of memory from which contiguous memory chunks can then be allocated.
CMA allows for creation of separate contexts. Kernel is allowed to allocate movable pages within CMA's managed memory so that it can be used for page cache when CMA devices do not use it. On cm_alloc() request such pages are migrated out of CMA area to free required contiguous block.
Hi Marek,
I'm generally happy with the patches 1 through 7, i.e. the heavy lifting to make contiguous allocations work. Thank you very much for keeping up the work and submitting these in a good shape.
I do think that we need to discuss the driver-visible API a bit more. My feeling is that this is rather un-Linux-like and it needs to be simplified some more. Of course, I don't mind being overruled by the memory management experts here, or if you can argue that it's really the right way to do it.
- Driver usage
- CMA should not be used directly by the device drivers. It should
- be considered as helper framework for dma-mapping subsystem and
- respective (platform)bus drivers.
- The CMA client needs to have a pointer to a CMA context
- represented by a struct cma (which is an opaque data type).
- Once such pointer is obtained, a caller may allocate contiguous
- memory chunk using the following function:
cm_alloc()
- This function returns a pointer to the first struct page which
- represent a contiguous memory chunk. This pointer
- may be used with the following function:
cm_free() -- frees allocated contiguous memory
Please explain why you want a new top-level API here. I think it would be much nicer if a device driver could simply call alloc_pages(..., GFP_CMA) or similar, where all the complexity in here gets hidden behind a conditional deep inside of the page allocator.
Evidently, the two functions you describe here have an extra argument for the context. Can you elaborate why that is really needed? What is the specific requirement to have multiple such contexts in one system and what is the worst-case effect that you would get when the interface is simplified to have only one for all devices?
Alternatively, would it be possible to provide the cm_alloc/cm_free functions only as backing to the dma mapping API and not export them as a generic device driver interface?
- Platform/machine integration
- CMA context must be created on platform or machine initialisation
- and passed to respective subsystem that will be a client for CMA.
- The latter may be done by a global variable or some field in
- struct device. For the former CMA provides the following functions:
cma_init_migratetype()
cma_reserve()
cma_create()
- The first one initialises a portion of reserved memory so that it
- can be used with CMA. The second first tries to reserve memory
- (using memblock) and then initialise it.
- The cma_reserve() function must be called when memblock is still
- operational and reserving memory with it is still possible. On
- ARM platform the "reserve" machine callback is a perfect place to
- call it.
- The last function creates a CMA context on a range of previously
- initialised memory addresses. Because it uses kmalloc() it needs
- to be called after SLAB is initialised.
- */
This interface looks flawed to me for multiple reasons:
* It requires you to call three distinct functions in order to do one thing, and they all take the same arguments (more or less). Why not have one function call at the latest possible point where you can still change the memory attributes, and have everything else happen automatically?
* It requires you to pass the exact location of the area. I can see why you want that on certain machines that require DMA areas to be spread across multiple memory buses, but IMHO it's not appropriate for a generic API.
* It requires you to hardcode the size in a machine specific source file. This probably seems to be a natural thing to do when you have worked a lot on the ARM architecture, but it's really not. We really want to get rid of board files and replace them with generic probing based on the device tree, and the size of the area is more policy than a property of the hardware that can be accurately described in the device tree or a board file.
I'm sure that we can find a solution for all of these problems, it just takes a few good ideas. Especially for the last point, I don't have a better suggestion yet, but hopefully someone else can come up with one.
Arnd
Hello,
On Friday, June 10, 2011 6:22 PM Arnd Bergmann wrote:
On Friday 10 June 2011, Marek Szyprowski wrote:
The Contiguous Memory Allocator is a set of functions that lets one initialise a region of memory which then can be used to perform allocations of contiguous memory chunks from.
CMA allows for creation of separate contexts. Kernel is allowed to allocate movable pages within CMA's managed memory so that it can be used for page cache when CMA devices do not use it. On cm_alloc() request such pages are migrated out of CMA area to free required contiguous block.
Hi Marek,
I'm generally happy with the patches 1 through 7, i.e the heavy lifting to make contiguous allocations work. Thank you very much for keeping up the work and submitting these in a good shape.
I do think that we need to discuss the driver-visible API a bit more. My feeling is that this is rather un-Linux-like and it needs to be simplified some more. Of course, I don't mind being overruled by the memory management experts here, or if you can argue that it's really the right way to do it.
Thanks for your comments!
- Driver usage
- CMA should not be used directly by the device drivers. It should
- be considered as helper framework for dma-mapping subsystem and
- respective (platform)bus drivers.
- The CMA client needs to have a pointer to a CMA context
- represented by a struct cma (which is an opaque data type).
- Once such pointer is obtained, a caller may allocate contiguous
- memory chunk using the following function:
cm_alloc()
- This function returns a pointer to the first struct page which
- represent a contiguous memory chunk. This pointer
- may be used with the following function:
cm_free() -- frees allocated contiguous memory
Please explain why you want a new top-level API here. I think it would be much nicer if a device driver could simply call alloc_pages(..., GFP_CMA) or similar, where all the complexity in here gets hidden behind a conditional deep inside of the page allocator.
Evidently, the two functions you describe here have an extra argument for the context. Can you elaborate why that is really needed? What is the specific requirement to have multiple such contexts in one system and what is the worst-case effect that you would get when the interface is simplified to have only one for all devices?
Alternatively, would it be possible to provide the cm_alloc/cm_free functions only as backing to the dma mapping API and not export them as a generic device driver interface?
cm_alloc/free are definitely not meant to be called from device drivers. They should be only considered as a backend for dma-mapping.
'Raw' contiguous memory block doesn't really make sense for the device drivers. What the drivers require is a contiguous memory block that is somehow mapped into respective bus address space, so dma-mapping framework is the right interface.
alloc_pages(..., GFP_CMA) looks nice but in fact it is really impractical. The driver will need to map such buffer to dma context anyway, so imho dma_alloc_attributed() will give the drivers much more flexibility. In terms of dma-mapping the context argument isn't anything odd.
If possible I would like to make cma something similar to declare_dma_coherent()&friends, so the board/platform/bus startup code will just call declare_dma_contiguous() to enable support for cma for particular devices.
- Platform/machine integration
- CMA context must be created on platform or machine initialisation
- and passed to respective subsystem that will be a client for CMA.
- The latter may be done by a global variable or some field in
- struct device. For the former CMA provides the following
functions:
cma_init_migratetype()
cma_reserve()
cma_create()
- The first one initialises a portion of reserved memory so that it
- can be used with CMA. The second first tries to reserve memory
- (using memblock) and then initialise it.
- The cma_reserve() function must be called when memblock is still
- operational and reserving memory with it is still possible. On
- ARM platform the "reserve" machine callback is a perfect place to
- call it.
- The last function creates a CMA context on a range of previously
- initialised memory addresses. Because it uses kmalloc() it needs
- to be called after SLAB is initialised.
- */
This interface looks flawed to me for multiple reasons:
- It requires you to call three distinct functions in order to do one thing, and they all take the same arguments (more or less). Why not have one function call at the latest possible point where you can still change the memory attributes, and have everything else happen automatically?
Initialization part will be definitely simplified, I must confess that I was in hurry to post the patches before the weekend and just forgot to cleanup this part...
- It requires you to pass the exact location of the area. I can see why you want that on certain machines that require DMA areas to be spread across multiple memory buses, but IMHO it's not appropriate for a generic API.
IMHO we can also use some NULL context to indicate some global, system wide CMA area and again -> in terms of dma-mapping api having a context isn't anything uncommon.
- It requires you to hardcode the size in a machine specific source file. This probably seems to be a natural thing to do when you have worked a lot on the ARM architecture, but it's really not. We really want to get rid of board files and replace them with generic probing based on the device tree, and the size of the area is more policy than a property of the hardware that can be accurately described in the device tree or a board file.
The problem is the fact that right now, we still have board files and we have to live with them for a while (with all advantages and disadvantages). I hope that you won't require me to rewrite the whole support for all ARM platforms to get rid of board files to get CMA merged ;)
I see no problem defining CMA areas in device tree, as this is something really specific to particular board configuration.
I'm sure that we can find a solution for all of these problems, it just takes a few good ideas. Especially for the last point, I don't have a better suggestion yet, but hopefully someone else can come up with one.
I hope we will manage to get agreement :)
Best regards
On Monday 13 June 2011, Marek Szyprowski wrote:
cm_alloc/free are definitely not meant to be called from device drivers. They should be only considered as a backend for dma-mapping.
'Raw' contiguous memory block doesn't really make sense for the device drivers. What the drivers require is a contiguous memory block that is somehow mapped into respective bus address space, so dma-mapping framework is the right interface.
alloc_pages(..., GFP_CMA) looks nice but in fact it is really impractical. The driver will need to map such buffer to dma context anyway, so imho dma_alloc_attributed() will give the drivers much more flexibility. In terms of dma-mapping the context argument isn't anything odd.
Ok.
If possible I would like to make cma something similar to declare_dma_coherent()&friends, so the board/platform/bus startup code will just call declare_dma_contiguous() to enable support for cma for particular devices.
Sounds good, I like that.
This interface looks flawed to me for multiple reasons:
- It requires you to call three distinct functions in order to do one thing, and they all take the same arguments (more or less). Why not have one function call at the latest possible point where you can still change the memory attributes, and have everything else happen automatically?
Initialization part will be definitely simplified, I must confess that I was in hurry to post the patches before the weekend and just forgot to cleanup this part...
Fair enough. In cases like this, it's often good to add a TODO section to the patch description, or a FIXME comment in the patch itself.
- It requires you to pass the exact location of the area. I can see why you want that on certain machines that require DMA areas to be spread across multiple memory buses, but IMHO it's not appropriate for a generic API.
IMHO we can also use some NULL context to indicate some global, system wide CMA area and again -> in terms of dma-mapping api having a context isn't anything uncommon.
Please explain the exact requirements that lead you to defining multiple contexts. My feeling is that in most cases we don't need them and can simply live with a single area. Depending on how obscure the cases are where we do need something beyond that, we can then come up with special-case solutions for them.
- It requires you to hardcode the size in a machine specific source file. This probably seems to be a natural thing to do when you have worked a lot on the ARM architecture, but it's really not. We really want to get rid of board files and replace them with generic probing based on the device tree, and the size of the area is more policy than a property of the hardware that can be accurately described in the device tree or a board file.
The problem is the fact that right now, we still have board files and we have to live with them for a while (with all advantages and disadvantages). I hope that you won't require me to rewrite the whole support for all ARM platforms to get rid of board files to get CMA merged ;)
Of course not. But we need to know what we want a platform with device tree support to look like when it's using CMA, so we don't make it harder to change the platforms over than it already is.
I see no problem defining CMA areas in device tree, as this is something really specific to particular board configuration.
The problem with defining CMA areas in the device tree is that it's not a property of the hardware, but really policy. The device tree file should not contain anything related to a specific operating system because you might want to boot something on the board that doesn't know about CMA, and even when you only care about using Linux, the implementation might change to the point where hardcoded CMA areas no longer make sense.
IMHO we should first aim for a solution that works almost everywhere without the kernel knowing what board it's running on, and then we can add quirks for devices that have special requirements. I think the situation is similar to the vmalloc virtual address space, which normally has a hardcoded size that works almost everywhere, but there are certain drivers etc that require much more, or there are situations where you want to make it smaller in order to avoid highmem.
Arnd
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann arnd@arndb.de wrote:
Please explain the exact requirements that lead you to defining multiple contexts.
Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes.
On Tuesday 14 June 2011, Michal Nazarewicz wrote:
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann arnd@arndb.de wrote:
Please explain the exact requirements that lead you to defining multiple contexts.
Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes.
For all I know, that is something that is only true for a few very special Samsung devices, and is completely unrelated of the need for contiguous allocations, so this approach becomes pointless as soon as the next generation of that chip grows an IOMMU, where we don't handle the special bank attributes. Also, the way I understood the situation for the Samsung SoC during the Budapest discussion, it's only a performance hack, not a functional requirement, unless you count '1080p playback' as a functional requirement.
Supporting contiguous allocation is a very useful goal and many people want this, but supporting a crazy one-off hardware design with lots of generic infrastructure is going a bit too far. If you can't be more specific than 'some devices may need this', I would suggest going forward without having multiple regions:
* Remove the registration of specific addresses from the initial patch set (but keep the patch).
* Add a heuristic plus command-line override to automatically come up with a reasonable location+size for *one* CMA area in the system.
* Ship the patch to add support for multiple CMA areas with the BSP for the boards that need it (if any).
* Wait for someone on a non-Samsung SoC to run into the same problem, then have /them/ get the final patch in.
Even if you think you can convince enough people that having support for distinct predefined regions is a good idea, I would recommend splitting that out of the initial merge so we can have that discussion separately from the other issues.
Arnd
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann arnd@arndb.de wrote:
Please explain the exact requirements that lead you to defining multiple contexts.
On Tuesday 14 June 2011, Michal Nazarewicz wrote:
Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes.
On Tue, 14 Jun 2011 18:03:00 +0200, Arnd Bergmann wrote:
For all I know, that is something that is only true for a few very special Samsung devices,
Maybe. I'm just answering your question. :)
Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
I would suggest going forward without having multiple regions:
Is having support for multiple regions a bad thing? Frankly, removing this support will change code from reading context passed as argument to code reading context from global variable. Nothing is gained; functionality is lost.
- Remove the registration of specific addresses from the initial patch set (but keep the patch).
- Add a heuristic plus command-line override to automatically come up with a reasonable location+size for *one* CMA area in the system.
I'm not arguing those.
On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
On Tue, 14 Jun 2011 18:03:00 +0200, Arnd Bergmann wrote:
For all I know, that is something that is only true for a few very special Samsung devices,
Maybe. I'm just answering your question. :)
Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
That is indeed a good point, but having a good allocator algorithm could also solve this. I don't know too much about these allocation algorithms, but there are probably multiple working approaches to this.
I would suggest going forward without having multiple regions:
Is having support for multiple regions a bad thing? Frankly, removing this support will change code from reading context passed as argument to code reading context from global variable. Nothing is gained; functionality is lost.
What is bad IMHO is making them the default, which forces the board code to care about memory management details. I would much prefer to have contiguous allocation parameters tuned automatically to just work on most boards before we add ways to do board-specific hacks.
Arnd
On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
Is having support for multiple regions a bad thing? Frankly, removing this support will change code from reading context passed as argument to code reading context from global variable. Nothing is gained; functionality is lost.
On Tue, 14 Jun 2011 20:30:07 +0200, Arnd Bergmann wrote:
What is bad IMHO is making them the default, which forces the board code to care about memory management details. I would much prefer to have contiguous allocation parameters tuned automatically to just work on most boards before we add ways to do board-specific hacks.
I see those as orthogonal problems. The code can have support for multiple contexts but by default use a single global context exported as cma_global variable (or some such).
And I'm not arguing against having “contiguous allocation parameters tuned automatically to just work on most boards”. I just don't see the reason to delete functionality that is already there, does not add much code and can be useful.
Hello,
On Tuesday, June 14, 2011 8:30 PM Arnd Bergmann wrote:
On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
On Tue, 14 Jun 2011 18:03:00 +0200, Arnd Bergmann wrote:
For all I know, that is something that is only true for a few very special Samsung devices,
Maybe. I'm just answering your question. :)
Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
That is indeed a good point, but having a good allocator algorithm could also solve this. I don't know too much about these allocation algorithms, but there are probably multiple working approaches to this.
I would suggest going forward without having multiple regions:
Is having support for multiple regions a bad thing? Frankly, removing this support will change code from reading context passed as argument to code reading context from global variable. Nothing is gained; functionality is lost.
What is bad IMHO is making them the default, which forces the board code to care about memory management details. I would much prefer to have contiguous allocation parameters tuned automatically to just work on most boards before we add ways to do board-specific hacks.
I see your concerns, but I really wonder how to determine the properties of the global/default CMA pool. You definitely don't want to give all available memory to CMA, because it will have a negative impact on kernel operation (the kernel really needs to allocate unmovable pages from time to time).
The only solution I see now is to provide a Kconfig entry to determine the size of the global CMA pool, but this still has some issues, especially for multi-board kernels (each board will probably have a different amount of RAM and different memory-consuming devices available). It looks like each board's startup code might still need to tweak the size of the CMA pool. I can add a kernel command line option for it, but such a solution will also not solve all the cases (AFAIR there was a discussion about kernel command line parameters for memory configuration and the conclusion was that they should be avoided).
Best regards
On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
I see your concerns, but I really wonder how to determine the properties of the global/default CMA pool. You definitely don't want to give all available memory to CMA, because it will have a negative impact on kernel operation (the kernel really needs to allocate unmovable pages from time to time).
Exactly. This is a hard problem, so I would prefer to see a solution for coming up with reasonable defaults.
The only solution I see now is to provide a Kconfig entry to determine the size of the global CMA pool, but this still has some issues, especially for multi-board kernels (each board will probably have a different amount of RAM and different memory-consuming devices available). It looks like each board's startup code might still need to tweak the size of the CMA pool. I can add a kernel command line option for it, but such a solution will also not solve all the cases (AFAIR there was a discussion about kernel command line parameters for memory configuration and the conclusion was that they should be avoided).
The command line option can be a last resort if the heuristics fail, but it's not much better than a fixed Kconfig setting.
How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
Arnd
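A hypothetical Kconfig fragment along the lines Arnd suggests might look like this (the option name and default are illustrative, not from any posted patch):

```
config CMA_SIZE_PERCENTAGE
	int "Percentage of total memory reserved for contiguous allocations"
	depends on CMA
	default 10
	help
	  Size the global CMA pool as a percentage of total RAM at boot,
	  instead of as a fixed byte count, so that a single kernel image
	  can behave reasonably on boards with different memory sizes.
```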
Hello,
On Wednesday, June 15, 2011 9:37 AM Arnd Bergmann wrote:
On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
I see your concerns, but I really wonder how to determine the properties of the global/default CMA pool. You definitely don't want to give all available memory to CMA, because it will have a negative impact on kernel operation (the kernel really needs to allocate unmovable pages from time to time).
Exactly. This is a hard problem, so I would prefer to see a solution for coming up with reasonable defaults.
The problem is to define these reasonable defaults, because they also depend on the target usage pattern for the board. If one doesn't plan to use the video codec at all, then the values calculated for full-HD movie decoding are definitely too high.
The only solution I see now is to provide a Kconfig entry to determine the size of the global CMA pool, but this still has some issues, especially for multi-board kernels (each board will probably have a different amount of RAM and different memory-consuming devices available). It looks like each board's startup code might still need to tweak the size of the CMA pool. I can add a kernel command line option for it, but such a solution will also not solve all the cases (AFAIR there was a discussion about kernel command line parameters for memory configuration and the conclusion was that they should be avoided).
The command line option can be a last resort if the heuristics fail, but it's not much better than a fixed Kconfig setting.
How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
There can probably be both types of Kconfig entries: an absolute value and a percentage of total memory. But still, creating a fully functional multi-board kernel will be really hard.
However, there is one more issue here. It is quite common that embedded systems have memory that is not really contiguous in the address space (there are holes that split it into 'banks' or regions). So we might have a system with 256 MiB of memory split into two memory banks. In this case the CMA pool can use only one of them (the pool must be internally contiguous). I'm not sure whether such systems might require more memory for contiguous buffers.
Best regards
On 06/15/2011 12:37 AM, Arnd Bergmann wrote:
On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
I see your concerns, but I really wonder how to determine the properties of the global/default CMA pool. You definitely don't want to give all available memory to CMA, because it will have a negative impact on kernel operation (the kernel really needs to allocate unmovable pages from time to time).
Exactly. This is a hard problem, so I would prefer to see a solution for coming up with reasonable defaults.
Is this a situation where passing the information from device tree might help? I know this does not help short term, but I am trying to understand the sorts of problems device tree can help solve.
Philip
The only solution I see now is to provide a Kconfig entry to determine the size of the global CMA pool, but this still has some issues, especially for multi-board kernels (each board will probably have a different amount of RAM and different memory-consuming devices available). It looks like each board's startup code might still need to tweak the size of the CMA pool. I can add a kernel command line option for it, but such a solution will also not solve all the cases (AFAIR there was a discussion about kernel command line parameters for memory configuration and the conclusion was that they should be avoided).
The command line option can be a last resort if the heuristics fail, but it's not much better than a fixed Kconfig setting.
How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
Arnd
Linaro-mm-sig mailing list Linaro-mm-sig@lists.linaro.org http://lists.linaro.org/mailman/listinfo/linaro-mm-sig
On Thursday 16 June 2011 02:48:12 Philip Balister wrote:
On 06/15/2011 12:37 AM, Arnd Bergmann wrote:
On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
I see your concerns, but I really wonder how to determine the properties of the global/default CMA pool. You definitely don't want to give all available memory to CMA, because it will have a negative impact on kernel operation (the kernel really needs to allocate unmovable pages from time to time).
Exactly. This is a hard problem, so I would prefer to see a solution for coming up with reasonable defaults.
Is this a situation where passing the information from device tree might help? I know this does not help short term, but I am trying to understand the sorts of problems device tree can help solve.
The device tree is a good place to describe any hardware properties such as 'this device will need 32 MB of contiguous allocations on the memory bank described in that other device node'.
It is however not a good place to describe user settings such as 'I want to give this device a 200 MB pool for large allocations so I can run application X efficiently', because that would require knowledge in the boot loader about local policy, which it should generally not care about.
Arnd
On Wednesday, June 15, 2011 09:37:18 Arnd Bergmann wrote:
On Wednesday 15 June 2011 09:11:39 Marek Szyprowski wrote:
I see your concerns, but I really wonder how to determine the properties of the global/default CMA pool. You definitely don't want to give all available memory to CMA, because it will have a negative impact on kernel operation (the kernel really needs to allocate unmovable pages from time to time).
Exactly. This is a hard problem, so I would prefer to see a solution for coming up with reasonable defaults.
The only solution I see now is to provide a Kconfig entry to determine the size of the global CMA pool, but this still has some issues, especially for multi-board kernels (each board will probably have a different amount of RAM and different memory-consuming devices available). It looks like each board's startup code might still need to tweak the size of the CMA pool. I can add a kernel command line option for it, but such a solution will also not solve all the cases (AFAIR there was a discussion about kernel command line parameters for memory configuration and the conclusion was that they should be avoided).
The command line option can be a last resort if the heuristics fail, but it's not much better than a fixed Kconfig setting.
How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
I would actually like to see a cma_size kernel option of some sort. This would be for the global CMA pool only as I don't think we should try to do anything more complicated here.
While it is relatively easy for embedded systems to do a recompile every time you need to change the pool size, this isn't an option on 'normal' desktop systems.
While usually you have more than enough memory on such systems and don't need CMA, there are a number of cases where you do want to reserve sufficient memory. Usually these involve lots of video capture cards in one system.
What I was wondering about is how this patch series changes the allocation in case it can't allocate from the CMA pool. Will it attempt to fall back to a 'normal' allocation?
The reason I ask is that for desktop systems you could just start with a CMA pool of size 0. And only in specific situations would you need to add a cma_size kernel parameter depending on your needs. But this scheme would require a fallback scenario in case of a global CMA pool of size 0.
Hmm, perhaps this fallback scenario is more driver specific. For SoC platform video devices you may not want a fallback, whereas for PCI(e)/USB devices you do. I don't know what's best, frankly.
Regards,
Hans
On Wed, 22 Jun 2011 09:03:30 +0200, Hans Verkuil hverkuil@xs4all.nl wrote:
What I was wondering about is how this patch series changes the allocation in case it can't allocate from the CMA pool. Will it attempt to fall back to a 'normal' allocation?
Unless Marek changed something since I wrote the code, which I doubt, if CMA cannot obtain memory from its CMA region, it will fail.
Part of the reason is that CMA lacks the knowledge of where else to allocate memory from. For instance, in the case of several memory banks, it does not know which bank to allocate from.
It is, in my opinion, a task for a higher level functions (read: DMA layer) to try another mechanism if CMA fails.
On Wednesday 22 June 2011, Hans Verkuil wrote:
How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
I would actually like to see a cma_size kernel option of some sort. This would be for the global CMA pool only as I don't think we should try to do anything more complicated here.
A command line is probably good to override the compile-time default, yes.
We could also go further and add a runtime sysctl mechanism like the one for hugepages, where you can grow the pool at run time as long as there is enough free contiguous memory (e.g. from init scripts), or shrink it later if you want to allow larger nonmovable allocations.
My feeling is that we need to find a way to integrate the global settings for four kinds of allocations:
* nonmovable kernel pages
* hugetlb pages
* CMA
* memory hotplug
These essentially fight over the same memory (though things are slightly different with dynamic hugepages), and they all face the same basic problem of getting as much for themselves without starving the other three.
Arnd
Hello,
On Wednesday, June 22, 2011 2:42 PM Arnd Bergmann wrote:
On Wednesday 22 June 2011, Hans Verkuil wrote:
How about a Kconfig option that defines the percentage of memory to set aside for contiguous allocations?
I would actually like to see a cma_size kernel option of some sort. This would be for the global CMA pool only as I don't think we should try to do anything more complicated here.
A command line is probably good to override the compile-time default, yes.
We could also go further and add a runtime sysctl mechanism like the one for hugepages, where you can grow the pool at run time as long as there is enough free contiguous memory (e.g. from init scripts), or shrink it later if you want to allow larger nonmovable allocations.
Sounds really good, but it might be really hard to implement, at least for CMA, because it needs to tweak parameters of memory management internal structures very early, when the buddy allocator has not been activated yet.
My feeling is that we need to find a way to integrate the global settings for four kinds of allocations:
- nonmovable kernel pages
- hugetlb pages
- CMA
- memory hotplug
These essentially fight over the same memory (though things are slightly different with dynamic hugepages), and they all face the same basic problem of getting as much for themselves without starving the other three.
I'm not sure we can solve all such issues in the first version. Maybe we should first have each of the above fully working in mainline separately and then start the integration works.
Best regards
On Wednesday 22 June 2011, Marek Szyprowski wrote:
Sounds really good, but it might be really hard to implement, at least for CMA, because it needs to tweak parameters of memory management internal structures very early, when the buddy allocator has not been activated yet.
Why that? I would expect you can do the same that hugepages (used to) do and just attempt high-order allocations. If they succeed, you can add them as a CMA region and free them again, into the movable set of pages, otherwise you just fail the request from user space when the memory is already fragmented.
These essentially fight over the same memory (though things are slightly different with dynamic hugepages), and they all face the same basic problem of getting as much for themselves without starving the other three.
I'm not sure we can solve all such issues in the first version. Maybe we should first have each of the above fully working in mainline separately and then start the integration works.
Yes, makes sense. We just need to be careful not to introduce user-visible interfaces that we cannot change any more in the process.
Arnd
On Wed, 22 Jun 2011 15:39:23 +0200, Arnd Bergmann arnd@arndb.de wrote:
Why that? I would expect you can do the same that hugepages (used to) do and just attempt high-order allocations. If they succeed, you can add them as a CMA region and free them again, into the movable set of pages, otherwise you just fail the request from user space when the memory is already fragmented.
The problem with that is that CMA needs to have whole pageblocks allocated, and the buddy allocator can allocate at most half a pageblock.
On Wednesday, June 22, 2011 2:42 PM Arnd Bergmann wrote:
We could also go further and add a runtime sysctl mechanism like the one for hugepages, where you can grow the pool at run time as long as there is enough free contiguous memory (e.g. from init scripts), or shrink it later if you want to allow larger nonmovable allocations.
On Wed, 22 Jun 2011 15:15:35 +0200, Marek Szyprowski wrote:
Sounds really good, but it might be really hard to implement, at least for CMA, because it needs to tweak parameters of memory management internal structures very early, when the buddy allocator has not been activated yet.
If you are able to allocate a pageblock of free memory from the buddy system, you should be able to convert it to CMA memory with no problems.
Also, if you want to convert CMA memory back to regular memory you should be able to do that even if some of the memory is used by CMA (it just won't be available right away but only when CMA frees it).
It is important to note that, because of the use of migratetypes, all such conversions have to be performed on a per-pageblock basis.
I don't think this is a feature we should consider for the first patch though. We started with an overgrown idea about what CMA might do and it didn't get us far. Let's first get the basics right and then start implementing features as they become needed.
On Tue, Jun 14, 2011 at 20:30, Arnd Bergmann arnd@arndb.de wrote:
On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
That is indeed a good point, but having a good allocator algorithm could also solve this. I don't know too much about these allocation algorithms, but there are probably multiple working approaches to this.
imo no allocator algorithm is gonna help if you have comparably large, variable-sized contiguous allocations out of a restricted address range. It might work well enough if there are only a few sizes and/or there's decent headroom. But for really generic workloads this would require sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed with ttm).
So if this is only a requirement on very few platforms and can be cheaply fixed with multiple cma allocation areas (heck, we have slabs for the same reasons in the kernel), it might be a sensible compromise. -Daniel
On 06/15/2011 01:53 PM, Daniel Vetter wrote:
On Tue, Jun 14, 2011 at 20:30, Arnd Bergmannarnd@arndb.de wrote:
On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
That is indeed a good point, but having a good allocator algorithm could also solve this. I don't know too much about these allocation algorithms, but there are probably multiple working approaches to this.
imo no allocator algorithm is gonna help if you have comparably large, variable-sized contiguous allocations out of a restricted address range. It might work well enough if there are only a few sizes and/or there's decent headroom. But for really generic workloads this would require sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed with ttm).
Indeed, IIRC on the meeting I pointed out that there is no way to generically solve the fragmentation problem without movable buffers. (I'd do it as a simple CMA backend to TTM). This is exactly the same problem as trying to fit buffers in a limited VRAM area.
/Thomas
On Wednesday 15 June 2011, Daniel Vetter wrote:
On Tue, Jun 14, 2011 at 20:30, Arnd Bergmann arnd@arndb.de wrote:
On Tuesday 14 June 2011 18:58:35 Michal Nazarewicz wrote:
Ah yes, I forgot that separate regions for different purposes could decrease fragmentation.
That is indeed a good point, but having a good allocator algorithm could also solve this. I don't know too much about these allocation algorithms, but there are probably multiple working approaches to this.
imo no allocator algorithm is gonna help if you have comparably large, variable-sized contiguous allocations out of a restricted address range. It might work well enough if there are only a few sizes and/or there's decent headroom. But for really generic workloads this would require sync objects and eviction callbacks (i.e. what Thomas Hellstrom pushed with ttm).
The requirements are quite different depending on what system you look at. In a lot of cases, the constraints are not that tight at all, and CMA will easily help to turn "works sometimes" into "works almost always". Let's get there first and then look into the harder problems.
Unfortunately, memory allocation gets nondeterministic in the corner cases: you can simply get the system into a state where you don't have enough memory when you try to do too many things at once. This may sound like a platitude, but it's really what is behind all this:
If we had unlimited amounts of RAM, we would never need CMA, we could simply set aside a lot of memory at boot time. Having one CMA area with movable page eviction lets you build systems capable of doing the same thing with less RAM than without CMA. Adding more complexity lets you reduce that amount further.
The other aspects that have been mentioned about bank affinity and SRAM are pretty orthogonal to the allocation, so we should also treat them separately.
So if this is only a requirement on very few platforms and can be cheaply fixed with multiple cma allocation areas (heck, we have slabs for the same reasons in the kernel), it might be a sensible compromise.
Yes, we can probably add it later when we find out what the limits of the generic approach are. I don't really mind having the per-device pointers to CMA areas, we just need to come up with a good way to initialize them.
Arnd
Hi,
On Tue, Jun 14, 2011 at 06:03:00PM +0200, Arnd Bergmann wrote:
On Tuesday 14 June 2011, Michal Nazarewicz wrote:
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann arnd@arndb.de wrote:
Please explain the exact requirements that lead you to defining multiple contexts.
Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes.
For all I know, that is something that is only true for a few very special Samsung devices, and is completely unrelated to the need for contiguous allocations, so this approach becomes pointless as soon as the next generation of that chip grows an IOMMU, where we don't handle the special bank attributes. Also, the way I understood the situation for the Samsung SoC during the Budapest discussion, it's only a performance hack, not a functional requirement, unless you count '1080p playback' as a functional requirement.
Hm, I think that was something similar but not quite the same: talking about having allocations split to lie between two banks of RAM to maximise the read/write speed for performance reasons. That's something that can be handled in the allocator, rather than an API constraint, as this is.
Not that I know of any hardware which is limited as such, but eh.
Cheers, Daniel
On 14 June 2011 12:01, Daniel Stone daniels@collabora.com wrote:
Hi,
On Tue, Jun 14, 2011 at 06:03:00PM +0200, Arnd Bergmann wrote:
On Tuesday 14 June 2011, Michal Nazarewicz wrote:
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmann arnd@arndb.de wrote:
Please explain the exact requirements that lead you to defining multiple contexts.
Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes.
For all I know, that is something that is only true for a few very special Samsung devices, and is completely unrelated to the need for contiguous allocations, so this approach becomes pointless as soon as the next generation of that chip grows an IOMMU, where we don't handle the special bank attributes. Also, the way I understood the situation for the Samsung SoC during the Budapest discussion, it's only a performance hack, not a functional requirement, unless you count '1080p playback' as a functional requirement.
Coming in mid topic...
I've seen this split bank allocation in Qualcomm and TI SoCs, and with Samsung that makes three major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by 'can't do' I mean that 1080p and higher decode is not functionally useful). Far from special, this would appear to be the default.
Hm, I think that was something similar but not quite the same: talking about having allocations split to lie between two banks of RAM to maximise the read/write speed for performance reasons. That's something that can be handled in the allocator, rather than an API constraint, as this is.
Not that I know of any hardware which is limited as such, but eh.
Cheers, Daniel
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
I've seen this split bank allocation in Qualcomm and TI SoCs, and with Samsung that makes three major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by 'can't do' I mean that 1080p and higher decode is not functionally useful). Far from special, this would appear to be the default.
Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
I think I'd still prefer to see the support for multiple regions split out into one of the later patches, especially since that would defer the question of how to do the initialization for this case and make sure we first get a generic way.
You've convinced me that we need to solve the problem of allocating memory from a specific bank eventually, but separating it from the one at hand (contiguous allocation) should help getting the important groundwork in at first.
The possible conflicts that I still see with per-bank CMA regions are:
* It completely destroys memory power management in cases where that is based on powering down entire memory banks.
* We still need to solve the same problem in case of IOMMU mappings at some point, even if today's hardware doesn't have this combination. It would be good to use the same solution for both.
Arnd
On 06/14/2011 02:42 PM, Arnd Bergmann wrote:
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
I've seen this split bank allocation in Qualcomm and TI SoCs, and with Samsung that makes three major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by 'can't do' I mean that 1080p and higher decode is not functionally useful). Far from special, this would appear to be the default.
Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
Yes. The IOMMU case with split bank allocation is key, especially for shared buffers. Consider the case where video is using a certain bank for performance purposes and that frame is shared with the GPU.
Jordan
On Tuesday 14 June 2011, Jordan Crouse wrote:
On 06/14/2011 02:42 PM, Arnd Bergmann wrote:
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
I've seen this split bank allocation in Qualcomm and TI SoCs, and with Samsung that makes three major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by 'can't do' I mean that 1080p and higher decode is not functionally useful). Far from special, this would appear to be the default.
Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
Yes. The IOMMU case with split bank allocation is key, especially for shared buffers. Consider the case where video is using a certain bank for performance purposes and that frame is shared with the GPU.
Could we use the non-uniform memory access (NUMA) code for this? That code does more than what we've been talking about, and we're currently thinking only of a degenerate case (one CPU node with multiple memory nodes), but my feeling is that we can still build on top of it.
The NUMA code can describe relations between different areas of memory and how they interact with devices and processes, so you can attach a device to a specific node and have all allocations done from there. You can also set policy in user space, e.g. to have a video decoder process running on the bank that is not used by the GPU.
In the DMA mapping API, that would mean we add another dma_attr to dma_alloc_* that lets you pass a node identifier.
Arnd
Hello,
On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
I've seen this split bank allocation in Qualcomm and TI SoCs, and with Samsung that makes three major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by 'can't do' I mean that 1080p and higher decode is not functionally useful). Far from special, this would appear to be the default.
Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
I think I'd still prefer to see the support for multiple regions split out into one of the later patches, especially since that would defer the question of how to do the initialization for this case and make sure we first get a generic way.
You've convinced me that we need to solve the problem of allocating memory from a specific bank eventually, but separating it from the one at hand (contiguous allocation) should help getting the important groundwork in at first.
The possible conflicts that I still see with per-bank CMA regions are:
- It completely destroys memory power management in cases where that is based on powering down entire memory banks.
I don't think that per-bank CMA regions destroy memory power management any more than the global CMA pool does. Please note that contiguous buffers (or dma-buffers in general) are currently unmovable, so they don't fit well into memory power management.
Best regards
On 15 Jun 11 10:36, Marek Szyprowski wrote:
Hello,
On Tuesday, June 14, 2011 10:42 PM Arnd Bergmann wrote:
On Tuesday 14 June 2011 20:58:25 Zach Pfeffer wrote:
I've seen this split bank allocation in Qualcomm and TI SoCs, and with Samsung that makes three major SoC vendors (I would be surprised if Nvidia didn't also need to do this), so I think some configurable method to control allocations is necessary. The chips can't do decode without it (and by 'can't do' I mean that 1080p and higher decode is not functionally useful). Far from special, this would appear to be the default.
We at Qualcomm have some platforms that have memory of different performance characteristics, some drivers will need a way of specifying that they need fast memory for an allocation (and would prefer an error if it is not available rather than a fallback to slower memory). It would also be bad if allocators who don't need fast memory got it "accidentally", depriving those who really need it.
Thanks for the insight, that's a much better argument than 'something may need it'. Are those all chips without an IOMMU or do we also need to solve the IOMMU case with split bank allocation?
I think I'd still prefer to see the support for multiple regions split out into one of the later patches, especially since that would defer the question of how to do the initialization for this case and make sure we first get a generic way.
You've convinced me that we need to solve the problem of allocating memory from a specific bank eventually, but separating it from the one at hand (contiguous allocation) should help getting the important groundwork in at first.
The possible conflicts that I still see with per-bank CMA regions are:
- It completely destroys memory power management in cases where that is based on powering down entire memory banks.
I don't think that per-bank CMA regions destroy memory power management any more than the global CMA pool does. Please note that contiguous buffers (or dma-buffers in general) are currently unmovable, so they don't fit well into memory power management.
We also have platforms where a well-defined part of the memory can be powered off, and other parts can't (or won't). We need a way to steer the place allocations come from to the memory that won't be turned off (so that CMA allocations are not an obstacle to memory hotremove).
Best regards
Marek Szyprowski Samsung Poland R&D Center
Larry Bassel
On Wednesday 15 June 2011 23:39:58 Larry Bassel wrote:
We at Qualcomm have some platforms that have memory of different performance characteristics; some drivers will need a way of specifying that they need fast memory for an allocation (and would prefer an error if it is not available rather than a fallback to slower memory). It would also be bad if allocators that don't need fast memory got it "accidentally", depriving those who really need it.
Can you describe how the memory areas differ specifically? Is there one that is always faster but very small, or are there just specific circumstances under which some memory is faster than another?
The possible conflicts that I still see with per-bank CMA regions are:
- It completely destroys memory power management in cases where that is based on powering down entire memory banks.
I don't think that per-bank CMA regions destroy memory power management any more than the global CMA pool does. Please note that contiguous buffers (or dma-buffers in general) are currently unmovable, so they don't fit well into memory power management.
We also have platforms where a well-defined part of the memory can be powered off, and other parts can't (or won't). We need a way to steer the place allocations come from to the memory that won't be turned off (so that CMA allocations are not an obstacle to memory hotremove).
We already established that we have to know something about the banks, and your additional input makes it even clearer that we need to consider the bigger picture here: We need to describe parts of memory separately regarding general performance, device specific allocations and hotplug characteristics.
It still sounds to me that this can be done using the NUMA properties that Linux already understands, and teaching more subsystems about it, but maybe the memory hotplug developers have already come up with another scheme. The way that memory hotplug and CMA choose their memory regions certainly needs to take both into account. As far as I can see there are both conflicting and synergistic effects when you combine the two.
Arnd
On 16 Jun 11 00:06, Arnd Bergmann wrote:
Can you describe how the memory areas differ specifically? Is there one that is always faster but very small, or are there just specific circumstances under which some memory is faster than another?
One is always faster, but very small (generally 2-10% the size of "normal" memory).
Larry
On Thursday 16 June 2011 19:01:33 Larry Bassel wrote:
Can you describe how the memory areas differ specifically? Is there one that is always faster but very small, or are there just specific circumstances under which some memory is faster than another?
One is always faster, but very small (generally 2-10% the size of "normal" memory).
Ok, that sounds like the "SRAM" regions that we are handling on some ARM platforms using the various interfaces. It should probably remain outside of the regular allocator, but we can try to generalize the SRAM support further. There are many possible uses for it.
Arnd
Hi,
On Thu, Jun 16, 2011 at 12:06:07AM +0200, Arnd Bergmann wrote:
The possible conflicts that I still see with per-bank CMA regions are:
- It completely destroys memory power management in cases where that is based on powering down entire memory banks.
We already established that we have to know something about the banks, and your additional input makes it even clearer that we need to consider the bigger picture here: We need to describe parts of memory separately regarding general performance, device specific allocations and hotplug characteristics.
It still sounds to me that this can be done using the NUMA properties that Linux already understands, and teaching more subsystems about it, but maybe the memory hotplug developers have already come up with another scheme. The way that memory hotplug and CMA choose their memory regions certainly needs to take both into account. As far as I can see there are both conflicting and synergistic effects when you combine the two.
Recently, we proposed a generic 'memory regions' framework to exploit the memory power management capabilities on embedded boards. I think some of the above CMA requirements could be met by this framework. One of the main goals of regions is to make the VM aware of hardware memory boundaries, like banks. For managing memory power consumption, memory regions are created aligned to the hardware granularity at which power can be managed (i.e., at which memory power operations like on/off can be performed). If attributes are associated with each of these regions, some of them could be marked as CMA-only, ensuring that only movable and per-bank memory is allocated there. More details on the design can be found here:
http://lkml.org/lkml/2011/5/27/177 http://lkml.org/lkml/2011/6/29/202 http://lwn.net/Articles/446493/
On Monday 04 July 2011, Ankita Garg wrote:
Recently, we proposed a generic 'memory regions' framework to exploit the memory power management capabilities on embedded boards. I think some of the above CMA requirements could be met by this framework. One of the main goals of regions is to make the VM aware of hardware memory boundaries, like banks. For managing memory power consumption, memory regions are created aligned to the hardware granularity at which power can be managed (i.e., at which memory power operations like on/off can be performed). If attributes are associated with each of these regions, some of them could be marked as CMA-only, ensuring that only movable and per-bank memory is allocated there. More details on the design can be found here:
http://lkml.org/lkml/2011/5/27/177 http://lkml.org/lkml/2011/6/29/202 http://lwn.net/Articles/446493/
Thanks for the pointers, that is exactly what I was looking for.
Arnd
On 15 June 2011 16:39, Larry Bassel lbassel@codeaurora.org wrote:
We at Qualcomm have some platforms that have memory of different performance characteristics; some drivers will need a way of specifying that they need fast memory for an allocation (and would prefer an error if it is not available rather than a fallback to slower memory). It would also be bad if allocators that don't need fast memory got it "accidentally", depriving those who really need it.
I think this statement actually applies to all the SoCs that are coming out now and in the future from TI, Samsung, Nvidia, Freescale, ST Ericsson and others. It seems that in all cases users will want to:
1. Allocate memory with a per-SoC physical memory mapping policy that is usually manually specified, i.e. use this physical memory bank set for this allocation or nothing.
2. Be able to easily pass a token to this memory between various userspace processes and the kernel.
3. Be able to easily and explicitly access attributes of an allocation from all contexts.
4. Be able to save and reload this memory without giving up the virtual address allocation.
In essence they want an architecture-independent map object that can bounce around the system with a unique handle.
-- Sent by an employee of the Qualcomm Innovation Center, Inc. The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.
On Tue, 14 Jun 2011 22:42:24 +0200, Arnd Bergmann arnd@arndb.de wrote:
- We still need to solve the same problem in case of IOMMU mappings at some point, even if today's hardware doesn't have this combination. It would be good to use the same solution for both.
I don't think I follow. What does IOMMU have to do with CMA?
On Wednesday 15 June 2011, Michal Nazarewicz wrote:
On Tue, 14 Jun 2011 22:42:24 +0200, Arnd Bergmann arnd@arndb.de wrote:
- We still need to solve the same problem in case of IOMMU mappings at some point, even if today's hardware doesn't have this combination. It would be good to use the same solution for both.
I don't think I follow. What does IOMMU have to do with CMA?
The point is that on the higher level device drivers, we want to hide the presence of CMA and/or IOMMU behind the dma mapping API, but the device drivers do need to know about the bank properties.
If we want to solve the problem of allocating per-bank memory inside of CMA, we also need to solve it inside of the IOMMU code, using the same device driver interface.
Arnd
On Wed, 15 Jun 2011 13:20:42 +0200, Arnd Bergmann arnd@arndb.de wrote:
The point is that on the higher level device drivers, we want to hide the presence of CMA and/or IOMMU behind the dma mapping API, but the device drivers do need to know about the bank properties.
Gotcha, thanks.
Hi Arnd,
On 06/14/2011 09:33 PM, Arnd Bergmann wrote:
On Tuesday 14 June 2011, Michal Nazarewicz wrote:
On Tue, 14 Jun 2011 15:49:29 +0200, Arnd Bergmannarnd@arndb.de wrote:
Please explain the exact requirements that lead you to defining multiple contexts.
Some devices may have access only to some banks of memory. Some devices may use different banks of memory for different purposes.
For all I know, that is something that is only true for a few very special Samsung devices, and is completely unrelated of the need for contiguous allocations, so this approach becomes pointless as soon as the next generation of that chip grows an IOMMU, where we don't handle the special bank attributes. Also, the way I understood the situation for the Samsung SoC during the Budapest discussion, it's only a performance hack, not a functional requirement, unless you count '1080p playback' as a functional requirement.
1080p@30fps is indeed a functional requirement, as the IP has the capability to achieve it. This IP itself can act as more than one AXI master and control more than one memory port (bank), so this is not a *performance hack*.
Also, during the Budapest discussion (I was on IRC), I recall that this requirement can be made available as information to the actual allocator. Below is the summary point I could collect from the summit notes:
* May also need to specify more attributes (specific physical memory region)
As per this point, the requirement (as above) must be an attribute passed to the allocator, which is CMA in this case.
Supporting contiguous allocation is a very useful goal and many people want this, but supporting a crazy one-off hardware design with lots of generic infrastructure is going a bit too far. If you can't be more specific than 'some devices may need this', I would suggest going forward without having multiple regions:
- Remove the registration of specific addresses from the initial patch set (but keep the patch).
- Add a heuristic plus command-line override to automatically come up with a reasonable location+size for *one* CMA area in the system.
- Ship the patch to add support for multiple CMA areas with the BSP for the boards that need it (if any).
- Wait for someone on a non-Samsung SoC to run into the same problem, then have /them/ get the final patch in.
Even if you think you can convince enough people that having support for distinct predefined regions is a good idea, I would recommend splitting that out of the initial merge so we can have that discussion separately from the other issues.
Arnd
To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Regards, Subash SISO-SLG
Hello,
On Tuesday, June 14, 2011 3:49 PM Arnd Bergmann wrote:
On Monday 13 June 2011, Marek Szyprowski wrote:
cm_alloc/free are definitely not meant to be called from device drivers. They should be only considered as a backend for dma-mapping.
'Raw' contiguous memory block doesn't really make sense for the device drivers. What the drivers require is a contiguous memory block that is somehow mapped into respective bus address space, so dma-mapping framework is the right interface.
alloc_pages(..., GFP_CMA) looks nice but in fact it is really impractical. The driver will need to map such buffer to dma context anyway, so imho dma_alloc_attributed() will give the drivers much more flexibility. In terms of dma-mapping the context argument isn't anything odd.
Ok.
If possible I would like to make cma something similar to declare_dma_coherent()&friends, so the board/platform/bus startup code will just call declare_dma_contiguous() to enable support for cma for particular devices.
Sounds good, I like that.
Thanks. I thought a bit more on this and decided that I want to make this declare_dma_contiguous() optional for the drivers. It should be used only for some sophisticated cases like for example our video codec with two memory interfaces for 2 separate banks. By default the dma-mapping will use system-wide cma pool.
(snipped)
- It requires you to pass the exact location of the area. I can see why you want that on certain machines that require DMA areas to be spread across multiple memory buses, but IMHO it's not appropriate for a generic API.
IMHO we can also use some NULL context to indicate some global, system wide CMA area and again -> in terms of dma-mapping api having a context isn't anything uncommon.
Please explain the exact requirements that lead you to defining multiple contexts. My feeling is that in most cases we don't need them and can simply live with a single area. Depending on how obscure the cases are where we do need something beyond that, we can then come up with special-case solutions for them.
Like it was already stated we need such feature for our multimedia codec to allocate buffers from different memory banks. I really don't see any problems with the possibility to have additional cma areas for special purposes.
- It requires you to hardcode the size in a machine specific source file. This probably seems to be a natural thing to do when you have worked a lot on the ARM architecture, but it's really not. We really want to get rid of board files and replace them with generic probing based on the device tree, and the size of the area is more policy than a property of the hardware that can be accurately described in the device tree or a board file.
The problem is the fact that right now, we still have board files and we have to live with them for a while (with all advantages and disadvantages).
I hope that you won't require me to rewrite the whole support for all ARM platforms to get rid of board files to get CMA merged ;)
Of course not. But we need to know what we want a platform with device tree support to look like when it's using CMA, so we don't make it harder to change the platforms over than it already is.
I see no problem defining CMA areas in device tree, as this is something really specific to particular board configuration.
The problem with defining CMA areas in the device tree is that it's not a property of the hardware, but really policy. The device tree file should not contain anything related to a specific operating system because you might want to boot something on the board that doesn't know about CMA, and even when you only care about using Linux, the implementation might change to the point where hardcoded CMA areas no longer make sense.
I really doubt that the device tree will carry only system-independent information. Anyway, the preferred or required memory areas/banks for buffer allocation is something that is a property of the hardware not the OS policy.
IMHO we should first aim for a solution that works almost everywhere without the kernel knowing what board it's running on, and then we can add quirks for devices that have special requirements. I think the situation is similar to the vmalloc virtual address space, which normally has a hardcoded size that works almost everywhere, but there are certain drivers etc that require much more, or there are situations where you want to make it smaller in order to avoid highmem.
I'm trying to create something that will fulfill the requirements of my hardware, that's why I cannot focus on a generic case only.
Best regards
On Wednesday 15 June 2011, Marek Szyprowski wrote:
If possible I would like to make cma something similar to declare_dma_coherent()&friends, so the board/platform/bus startup code will just call declare_dma_contiguous() to enable support for cma for particular devices.
Sounds good, I like that.
Thanks. I thought a bit more on this and decided that I want to make this declare_dma_contiguous() optional for the drivers. It should be used only for some sophisticated cases like for example our video codec with two memory interfaces for 2 separate banks. By default the dma-mapping will use system-wide cma pool.
(snipped)
Ok, good.
Please explain the exact requirements that lead you to defining multiple contexts. My feeling is that in most cases we don't need them and can simply live with a single area. Depending on how obscure the cases are where we do need something beyond that, we can then come up with special-case solutions for them.
Like it was already stated we need such feature for our multimedia codec to allocate buffers from different memory banks. I really don't see any problems with the possibility to have additional cma areas for special purposes.
It's not a problem for special cases, I just want to make sure that the common case works well enough that we don't need too many special cases.
The problem with defining CMA areas in the device tree is that it's not a property of the hardware, but really policy. The device tree file should not contain anything related to a specific operating system because you might want to boot something on the board that doesn't know about CMA, and even when you only care about using Linux, the implementation might change to the point where hardcoded CMA areas no longer make sense.
I really doubt that the device tree will carry only system-independent information.
So far, this has worked well enough.
Anyway, the preferred or required memory areas/banks for buffer allocation is something that is a property of the hardware not the OS policy.
That is true. If there are hard requirements of the hardware, we can and should definitely document them in the device tree. It's totally fine to describe the layout of memory banks and affinity of devices to specific banks where that exists in hardware.
The part that should not be in the device tree is a specific location of a buffer inside the bank when there is no hardware reason for the location.
I guess it's also fine to describe requirements per device regarding how much contiguous memory it will need to operate, if you can provide that number independent of the application you want to run. The early boot code can walk the device tree and ensure that the region is large enough.
IMHO we should first aim for a solution that works almost everywhere without the kernel knowing what board it's running on, and then we can add quirks for devices that have special requirements. I think the situation is similar to the vmalloc virtual address space, which normally has a hardcoded size that works almost everywhere, but there are certain drivers etc that require much more, or there are situations where you want to make it smaller in order to avoid highmem.
I'm trying to create something that will fulfill the requirements of my hardware, that's why I cannot focus on a generic case only.
Yes, that is the obvious conflict. My main interest is to have something that works not just for your hardware but works as good as possible on the widest possible range of hardware, which may mean it may run not quite as good in some of the cases.
Arnd
This patch adds support for CMA to the dma-mapping subsystem for the ARM architecture. A CMA area can be defined individually for each device in the system. It is up to the board startup code to create the CMA area and assign it to the devices.
Buffer alignment is derived from the buffer size, but only for buffers up to 1MiB. Larger buffers are always aligned to 1MiB.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/include/asm/device.h      |    3 ++
 arch/arm/include/asm/dma-mapping.h |   19 +++++++++++
 arch/arm/mm/dma-mapping.c          |   60 +++++++++++++++++++++++++++---------
 3 files changed, 67 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 9f390ce..942913e 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -10,6 +10,9 @@ struct dev_archdata {
 #ifdef CONFIG_DMABOUNCE
 	struct dmabounce_device_info *dmabounce;
 #endif
+#ifdef CONFIG_CMA
+	struct cma *cma_area;
+#endif
 };

 struct pdev_archdata {
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 4fff837..e387ea7 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -14,6 +14,25 @@
 #error Please update to __arch_pfn_to_dma
 #endif

+struct cma;
+
+#ifdef CONFIG_CMA
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	return dev->archdata.cma_area;
+}
+
+static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
+{
+	dev->archdata.cma_area = cma;
+}
+#else
+static inline struct cma *get_dev_cma_area(struct device *dev)
+{
+	return NULL;
+}
+#endif
+
 /*
  * dma_to_pfn/pfn_to_dma/dma_to_virt/virt_to_dma are architecture private
  * functions used internally by the DMA-mapping API to provide DMA
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 82a093c..233e34a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -18,6 +18,7 @@
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
 #include <linux/highmem.h>
+#include <linux/cma.h>

 #include <asm/memory.h>
 #include <asm/highmem.h>
@@ -52,16 +53,36 @@ static u64 get_coherent_dma_mask(struct device *dev)
 	return mask;
 }

+
+static struct page *__alloc_system_pages(size_t count, unsigned int order, gfp_t gfp)
+{
+	struct page *page, *p, *e;
+
+	page = alloc_pages(gfp, order);
+	if (!page)
+		return NULL;
+
+	/*
+	 * Now split the huge page and free the excess pages
+	 */
+	split_page(page, order);
+	for (p = page + count, e = page + (1 << order); p < e; p++)
+		__free_page(p);
+	return page;
+}
+
 /*
  * Allocate a DMA buffer for 'dev' of size 'size' using the
  * specified gfp mask. Note that 'size' must be page aligned.
  */
 static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
 {
-	unsigned long order = get_order(size);
-	struct page *page, *p, *e;
+	struct cma *cma = get_dev_cma_area(dev);
+	struct page *page;
+	size_t count = size >> PAGE_SHIFT;
 	void *ptr;
 	u64 mask = get_coherent_dma_mask(dev);
+	unsigned long order = get_order(count << PAGE_SHIFT);

 #ifdef CONFIG_DMA_API_DEBUG
 	u64 limit = (mask + 1) & ~mask;
@@ -78,16 +99,19 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 	if (mask < 0xffffffffULL)
 		gfp |= GFP_DMA;

-	page = alloc_pages(gfp, order);
-	if (!page)
-		return NULL;
+	/*
+	 * First, try to allocate memory from contiguous area aligned up to 1MiB
+	 */
+	page = cm_alloc(cma, count, order < 8 ? 8 : order);

 	/*
-	 * Now split the huge page and free the excess pages
+	 * Fallback if contiguous alloc fails or is not available
 	 */
-	split_page(page, order);
-	for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++)
-		__free_page(p);
+	if (!page)
+		page = __alloc_system_pages(count, order, gfp);
+
+	if (!page)
+		return NULL;

 	/*
 	 * Ensure that the allocated pages are zeroed, and that any data
@@ -104,13 +128,19 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf
 /*
  * Free a DMA buffer. 'size' must be page aligned.
  */
-static void __dma_free_buffer(struct page *page, size_t size)
+static void __dma_free_buffer(struct device *dev, struct page *page, size_t size)
 {
-	struct page *e = page + (size >> PAGE_SHIFT);
+	struct cma *cma = get_dev_cma_area(dev);
+	size_t count = size >> PAGE_SHIFT;
+	struct page *e = page + count;

-	while (page < e) {
-		__free_page(page);
-		page++;
+	if (cma) {
+		cm_free(cma, page, count);
+	} else {
+		while (page < e) {
+			__free_page(page);
+			page++;
+		}
 	}
 }

@@ -416,7 +446,7 @@ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr
 	if (!arch_is_coherent())
 		__dma_free_remap(cpu_addr, size);

-	__dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size);
+	__dma_free_buffer(dev, pfn_to_page(dma_to_pfn(dev, handle)), size);
 }
 EXPORT_SYMBOL(dma_free_coherent);
This patch is an example of how CMA can be activated for particular devices in the system. It creates one CMA region and assigns it to all s5p-fimc devices on the Samsung Aquila S5PC110 board.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
---
 arch/arm/mach-s5pv210/Kconfig       |    1 +
 arch/arm/mach-s5pv210/mach-aquila.c |   26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-s5pv210/Kconfig b/arch/arm/mach-s5pv210/Kconfig
index 37b5a97..c09a92c 100644
--- a/arch/arm/mach-s5pv210/Kconfig
+++ b/arch/arm/mach-s5pv210/Kconfig
@@ -64,6 +64,7 @@ menu "S5PC110 Machines"
 config MACH_AQUILA
 	bool "Aquila"
 	select CPU_S5PV210
+	select CMA
 	select S3C_DEV_FB
 	select S5P_DEV_FIMC0
 	select S5P_DEV_FIMC1
diff --git a/arch/arm/mach-s5pv210/mach-aquila.c b/arch/arm/mach-s5pv210/mach-aquila.c
index 4e1d8ff..8c404e5 100644
--- a/arch/arm/mach-s5pv210/mach-aquila.c
+++ b/arch/arm/mach-s5pv210/mach-aquila.c
@@ -21,6 +21,8 @@
 #include <linux/gpio_keys.h>
 #include <linux/input.h>
 #include <linux/gpio.h>
+#include <linux/cma.h>
+#include <linux/dma-mapping.h>

 #include <asm/mach/arch.h>
 #include <asm/mach/map.h>
@@ -650,6 +652,19 @@ static void __init aquila_map_io(void)
 	s5p_set_timer_source(S5P_PWM3, S5P_PWM4);
 }

+unsigned long cma_area_start;
+unsigned long cma_area_size = 32 << 20;
+
+static void __init aquila_reserve(void)
+{
+	unsigned long ret = cma_reserve(cma_area_start, cma_area_size);
+
+	if (!IS_ERR_VALUE(ret)) {
+		cma_area_start = ret;
+		printk(KERN_INFO "cma: reserved %ld bytes at %lx\n", cma_area_size, cma_area_start);
+	}
+}
+
 static void __init aquila_machine_init(void)
 {
 	/* PMIC */
@@ -672,6 +687,16 @@ static void __init aquila_machine_init(void)
 	s3c_fb_set_platdata(&aquila_lcd_pdata);

 	platform_add_devices(aquila_devices, ARRAY_SIZE(aquila_devices));
+
+	if (cma_area_start) {
+		struct cma *cma;
+		cma = cma_create(cma_area_start, cma_area_size);
+		if (cma) {
+			set_dev_cma_area(&s5p_device_fimc0.dev, cma);
+			set_dev_cma_area(&s5p_device_fimc1.dev, cma);
+			set_dev_cma_area(&s5p_device_fimc2.dev, cma);
+		}
+	}
 }

 MACHINE_START(AQUILA, "Aquila")
@@ -683,4 +708,5 @@ MACHINE_START(AQUILA, "Aquila")
 	.map_io		= aquila_map_io,
 	.init_machine	= aquila_machine_init,
 	.timer		= &s5p_timer,
+	.reserve	= aquila_reserve,
 MACHINE_END