Hello again,
This is yet another round of the Contiguous Memory Allocator patches. I have now implemented all the ideas that were discussed during the Linaro sprint meeting.
This version provides complete integration of CMA with the DMA mapping subsystem on the ARM architecture. The issue caused by double mapping of DMA pages, and the possible aliasing in the coherent memory mapping, has finally been resolved, both for the GFP_ATOMIC case (allocations come from the coherent memory pool) and the non-GFP_ATOMIC case (allocations come from CMA-managed areas).
For coherent, nommu, ARMv4 and ARMv5 systems the current DMA-mapping implementation has been kept.
For ARMv6+ systems, CMA has been enabled and a special pool of coherent memory for atomic allocations has been created. The size of this pool defaults to CONSISTENT_DMA_SIZE/8, but it can be changed with the coherent_pool kernel parameter (if really required).
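For example (assuming the parameter keeps the name used in this series), the pool size could be set from the kernel command line like this:

    coherent_pool=4M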
All atomic allocations are served from this pool. I made one simplification here: there is no separate pool for writecombine memory; such requests are also served from the coherent pool. I don't think this is a problem, as I found no driver that uses dma_alloc_writecombine() with the GFP_ATOMIC flag.
All non-atomic allocations are served from the CMA area. The kernel mapping is updated to reflect the required memory attribute changes. This is possible because, during early boot, all CMA areas are remapped with 4KiB pages in kernel low memory.
This version has been tested on the Samsung S5PC110-based Goni machine and the Exynos4 UniversalC210 board with various V4L2 multimedia drivers.
Coherent atomic allocations have been tested by manually enabling DMA bounce for the s3c-sdhci device.
All patches are prepared for Linux Kernel v3.1-rc2.
A few words for those who see CMA for the first time:
The Contiguous Memory Allocator (CMA) makes it possible for device drivers to allocate big contiguous chunks of memory after the system has booted.
The main difference from similar frameworks is that CMA allows the memory region reserved for big-chunk allocations to be transparently reused as system memory, so no memory is wasted when no big chunk is allocated. Once an allocation request is issued, the framework migrates system pages to create the required big chunk of physically contiguous memory.
For more information you can refer to the LWN articles at http://lwn.net/Articles/447405/ and http://lwn.net/Articles/450286/, as well as the links to previous versions of the CMA framework.
The CMA framework was initially developed by Michal Nazarewicz at the Samsung Poland R&D Center. Since version 9 I have taken over the development, because Michal has left the company.
TODO (optional):
- implement support for contiguous memory areas placed in HIGHMEM zone
Best regards
From: KAMEZAWA Hiroyuki kamezawa.hiroyu@jp.fujitsu.com
Memory hotplug contains logic for making pages unused in a specified PFN range, so some of its core logic can be reused for other purposes, such as allocating a very large contiguous memory block.
This patch moves some functions from mm/memory_hotplug.c to mm/page_isolation.c. This makes it easier to add a large-allocation function, based on the memory-unplug technique, to page_isolation.c.
Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hiroyu@jp.fujitsu.com
[m.nazarewicz: reworded commit message]
Signed-off-by: Michal Nazarewicz m.nazarewicz@samsung.com
Signed-off-by: Kyungmin Park kyungmin.park@samsung.com
[m.szyprowski: rebased and updated to Linux v3.0-rc1]
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
CC: Michal Nazarewicz mina86@mina86.com
Acked-by: Arnd Bergmann arnd@arndb.de
---
 include/linux/page-isolation.h |    7 +++
 mm/memory_hotplug.c            |  111 --------------------------------------
 mm/page_isolation.c            |  114 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 121 insertions(+), 111 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 051c1b1..58cdbac 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -33,5 +33,12 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page);
+/* + * For migration. + */ + +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn); +unsigned long scan_lru_pages(unsigned long start, unsigned long end); +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
#endif diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 6e7d8b2..3419dd6 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -707,117 +707,6 @@ int is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages) }
/* - * Confirm all pages in a range [start, end) is belongs to the same zone. - */ -static int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) -{ - unsigned long pfn; - struct zone *zone = NULL; - struct page *page; - int i; - for (pfn = start_pfn; - pfn < end_pfn; - pfn += MAX_ORDER_NR_PAGES) { - i = 0; - /* This is just a CONFIG_HOLES_IN_ZONE check.*/ - while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i)) - i++; - if (i == MAX_ORDER_NR_PAGES) - continue; - page = pfn_to_page(pfn + i); - if (zone && page_zone(page) != zone) - return 0; - zone = page_zone(page); - } - return 1; -} - -/* - * Scanning pfn is much easier than scanning lru list. - * Scan pfn from start to end and Find LRU page. - */ -static unsigned long scan_lru_pages(unsigned long start, unsigned long end) -{ - unsigned long pfn; - struct page *page; - for (pfn = start; pfn < end; pfn++) { - if (pfn_valid(pfn)) { - page = pfn_to_page(pfn); - if (PageLRU(page)) - return pfn; - } - } - return 0; -} - -static struct page * -hotremove_migrate_alloc(struct page *page, unsigned long private, int **x) -{ - /* This should be improooooved!! */ - return alloc_page(GFP_HIGHUSER_MOVABLE); -} - -#define NR_OFFLINE_AT_ONCE_PAGES (256) -static int -do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) -{ - unsigned long pfn; - struct page *page; - int move_pages = NR_OFFLINE_AT_ONCE_PAGES; - int not_managed = 0; - int ret = 0; - LIST_HEAD(source); - - for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) { - if (!pfn_valid(pfn)) - continue; - page = pfn_to_page(pfn); - if (!get_page_unless_zero(page)) - continue; - /* - * We can skip free pages. And we can only deal with pages on - * LRU. - */ - ret = isolate_lru_page(page); - if (!ret) { /* Success */ - put_page(page); - list_add_tail(&page->lru, &source); - move_pages--; - inc_zone_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - - } else { -#ifdef CONFIG_DEBUG_VM - printk(KERN_ALERT "removing pfn %lx from LRU failed\n", - pfn); - dump_page(page); -#endif - put_page(page); - /* Because we don't have big zone->lock. we should - check this again here. */ - if (page_count(page)) { - not_managed++; - ret = -EBUSY; - break; - } - } - } - if (!list_empty(&source)) { - if (not_managed) { - putback_lru_pages(&source); - goto out; - } - /* this function returns # of failed pages */ - ret = migrate_pages(&source, hotremove_migrate_alloc, 0, - true, true); - if (ret) - putback_lru_pages(&source); - } -out: - return ret; -} - -/* * remove from free_area[] and mark all as Reserved. */ static int diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 4ae42bb..270a026 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -5,6 +5,9 @@ #include <linux/mm.h> #include <linux/page-isolation.h> #include <linux/pageblock-flags.h> +#include <linux/memcontrol.h> +#include <linux/migrate.h> +#include <linux/mm_inline.h> #include "internal.h"
static inline struct page * @@ -139,3 +142,114 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn) spin_unlock_irqrestore(&zone->lock, flags); return ret ? 0 : -EBUSY; } + + +/* + * Confirm all pages in a range [start, end) is belongs to the same zone. + */ +int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn) +{ + unsigned long pfn; + struct zone *zone = NULL; + struct page *page; + int i; + for (pfn = start_pfn; + pfn < end_pfn; + pfn += MAX_ORDER_NR_PAGES) { + i = 0; + /* This is just a CONFIG_HOLES_IN_ZONE check.*/ + while ((i < MAX_ORDER_NR_PAGES) && !pfn_valid_within(pfn + i)) + i++; + if (i == MAX_ORDER_NR_PAGES) + continue; + page = pfn_to_page(pfn + i); + if (zone && page_zone(page) != zone) + return 0; + zone = page_zone(page); + } + return 1; +} + +/* + * Scanning pfn is much easier than scanning lru list. + * Scan pfn from start to end and Find LRU page. + */ +unsigned long scan_lru_pages(unsigned long start, unsigned long end) +{ + unsigned long pfn; + struct page *page; + for (pfn = start; pfn < end; pfn++) { + if (pfn_valid(pfn)) { + page = pfn_to_page(pfn); + if (PageLRU(page)) + return pfn; + } + } + return 0; +} + +struct page * +hotremove_migrate_alloc(struct page *page, unsigned long private, int **x) +{ + /* This should be improooooved!! */ + return alloc_page(GFP_HIGHUSER_MOVABLE); +} + +#define NR_OFFLINE_AT_ONCE_PAGES (256) +int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) +{ + unsigned long pfn; + struct page *page; + int move_pages = NR_OFFLINE_AT_ONCE_PAGES; + int not_managed = 0; + int ret = 0; + LIST_HEAD(source); + + for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) { + if (!pfn_valid(pfn)) + continue; + page = pfn_to_page(pfn); + if (!get_page_unless_zero(page)) + continue; + /* + * We can skip free pages. And we can only deal with pages on + * LRU. + */ + ret = isolate_lru_page(page); + if (!ret) { /* Success */ + put_page(page); + list_add_tail(&page->lru, &source); + move_pages--; + inc_zone_page_state(page, NR_ISOLATED_ANON + + page_is_file_cache(page)); + + } else { +#ifdef CONFIG_DEBUG_VM + printk(KERN_ALERT "removing pfn %lx from LRU failed\n", + pfn); + dump_page(page); +#endif + put_page(page); + /* Because we don't have big zone->lock. we should + check this again here. */ + if (page_count(page)) { + not_managed++; + ret = -EBUSY; + break; + } + } + } + if (!list_empty(&source)) { + if (not_managed) { + putback_lru_pages(&source); + goto out; + } + /* this function returns # of failed pages */ + ret = migrate_pages(&source, hotremove_migrate_alloc, 0, + true, true); + if (ret) + putback_lru_pages(&source); + } +out: + return ret; +}
From: KAMEZAWA Hiroyuki kamezawa.hiroyu@jp.fujitsu.com
This commit introduces the alloc_contig_freed_pages() function, which allocates (i.e. removes from the buddy system) free pages in a range. The caller has to guarantee that all pages in the range are in the buddy system.
Along with this function, a free_contig_pages() function is provided which frees all (or a subset of) pages allocated with alloc_contig_freed_pages().
Michal Nazarewicz has modified the function to make it easier to allocate ranges that are not MAX_ORDER_NR_PAGES aligned, by making it return the PFN one past the last allocated page.
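To illustrate the calling convention, here is a minimal, hypothetical caller (the function name is only for illustration, not part of the patch); it assumes the [start, end) PFN range is already free, isolated and contained in a single zone, as required above:

	static void example_grab_and_release(unsigned long start, unsigned long end)
	{
		unsigned long last;

		/* take the free pages out of the buddy system */
		last = alloc_contig_freed_pages(start, end, GFP_KERNEL);

		/* 'last' is one past the last allocated PFN; it may be
		 * greater than 'end' because whole buddy blocks are removed */

		/* ... use pfn_to_page(start) .. pfn_to_page(last - 1) ... */

		/* give back every page that was taken, one by one */
		free_contig_pages(pfn_to_page(start), last - start);
	}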
Signed-off-by: KAMEZAWA Hiroyuki kamezawa.hiroyu@jp.fujitsu.com
Signed-off-by: Michal Nazarewicz m.nazarewicz@samsung.com
Signed-off-by: Kyungmin Park kyungmin.park@samsung.com
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
CC: Michal Nazarewicz mina86@mina86.com
Acked-by: Arnd Bergmann arnd@arndb.de
---
 include/linux/page-isolation.h |    3 ++
 mm/page_alloc.c                |   44 ++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 0 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 58cdbac..f1417ed 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -32,6 +32,9 @@ test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); */ extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page); +extern unsigned long alloc_contig_freed_pages(unsigned long start, + unsigned long end, gfp_t flag); +extern void free_contig_pages(struct page *page, int nr_pages);
/* * For migration. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6e8ecb6..ad6ae3f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5668,6 +5668,50 @@ out: spin_unlock_irqrestore(&zone->lock, flags); }
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, + gfp_t flag) +{ + unsigned long pfn = start, count; + struct page *page; + struct zone *zone; + int order; + + VM_BUG_ON(!pfn_valid(start)); + zone = page_zone(pfn_to_page(start)); + + spin_lock_irq(&zone->lock); + + page = pfn_to_page(pfn); + for (;;) { + VM_BUG_ON(page_count(page) || !PageBuddy(page)); + list_del(&page->lru); + order = page_order(page); + zone->free_area[order].nr_free--; + rmv_page_order(page); + __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order)); + pfn += 1 << order; + if (pfn >= end) + break; + VM_BUG_ON(!pfn_valid(pfn)); + page += 1 << order; + } + + spin_unlock_irq(&zone->lock); + + /* After this, pages in the range can be freed one be one */ + page = pfn_to_page(start); + for (count = pfn - start; count; --count, ++page) + prep_new_page(page, 0, flag); + + return pfn; +} + +void free_contig_pages(struct page *page, int nr_pages) +{ + for (; nr_pages; --nr_pages, ++page) + __free_page(page); +} + #ifdef CONFIG_MEMORY_HOTREMOVE /* * All pages in the range must be isolated before calling this.
On Fri, 2011-08-19 at 16:27 +0200, Marek Szyprowski wrote:
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
gfp_t flag)
+{
- unsigned long pfn = start, count;
- struct page *page;
- struct zone *zone;
- int order;
- VM_BUG_ON(!pfn_valid(start));
- zone = page_zone(pfn_to_page(start));
This implies that start->end are entirely contained in a single zone. What enforces that? If some higher layer enforces that, I think we probably need at least a VM_BUG_ON() in here and a comment about who enforces it.
- spin_lock_irq(&zone->lock);
- page = pfn_to_page(pfn);
- for (;;) {
VM_BUG_ON(page_count(page) || !PageBuddy(page));
list_del(&page->lru);
order = page_order(page);
zone->free_area[order].nr_free--;
rmv_page_order(page);
__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
pfn += 1 << order;
if (pfn >= end)
break;
VM_BUG_ON(!pfn_valid(pfn));
page += 1 << order;
- }
This 'struct page *'++ stuff is OK, but only for small, aligned areas. For at least some of the sparsemem modes (non-VMEMMAP), you could walk off of the end of the section_mem_map[] when you cross a MAX_ORDER boundary. I'd feel a little bit more comfortable if pfn_to_page() was being done each time, or only occasionally when you cross a section boundary.
This may not apply to what ARM is doing today, but it shouldn't be too difficult to fix up, or to document what's going on.
- spin_unlock_irq(&zone->lock);
- /* After this, pages in the range can be freed one be one */
- page = pfn_to_page(start);
- for (count = pfn - start; count; --count, ++page)
prep_new_page(page, 0, flag);
- return pfn;
+}
+void free_contig_pages(struct page *page, int nr_pages) +{
- for (; nr_pages; --nr_pages, ++page)
__free_page(page);
+}
The same thing about 'struct page' pointer math goes here.
-- Dave
On Thu, 08 Sep 2011 20:05:52 +0200, Dave Hansen dave@linux.vnet.ibm.com wrote:
On Fri, 2011-08-19 at 16:27 +0200, Marek Szyprowski wrote:
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
gfp_t flag)
+{
- unsigned long pfn = start, count;
- struct page *page;
- struct zone *zone;
- int order;
- VM_BUG_ON(!pfn_valid(start));
- zone = page_zone(pfn_to_page(start));
This implies that start->end are entirely contained in a single zone. What enforces that?
In the case of CMA, the __cma_activate_area() function from patch 6/8 has the check:

151		VM_BUG_ON(!pfn_valid(pfn));
152		VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone);

This guarantees that CMA will never try to call alloc_contig_freed_pages() on a range that spans more than one zone.
If some higher layer enforces that, I think we probably need at least a VM_BUG_ON() in here and a comment about who enforces it.
Agreed.
- spin_lock_irq(&zone->lock);
- page = pfn_to_page(pfn);
- for (;;) {
VM_BUG_ON(page_count(page) || !PageBuddy(page));
list_del(&page->lru);
order = page_order(page);
zone->free_area[order].nr_free--;
rmv_page_order(page);
__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
pfn += 1 << order;
if (pfn >= end)
break;
VM_BUG_ON(!pfn_valid(pfn));
page += 1 << order;
- }
This 'struct page *'++ stuff is OK, but only for small, aligned areas. For at least some of the sparsemem modes (non-VMEMMAP), you could walk off of the end of the section_mem_map[] when you cross a MAX_ORDER boundary. I'd feel a little bit more comfortable if pfn_to_page() was being done each time, or only occasionally when you cross a section boundary.
I'm fine with that. I've used pointer arithmetic for performance reasons but if that may potentially lead to bugs then obviously pfn_to_page() should be used.
On Wed, 2011-09-21 at 15:17 +0200, Michal Nazarewicz wrote:
This 'struct page *'++ stuff is OK, but only for small, aligned areas. For at least some of the sparsemem modes (non-VMEMMAP), you could walk off of the end of the section_mem_map[] when you cross a MAX_ORDER boundary. I'd feel a little bit more comfortable if pfn_to_page() was being done each time, or only occasionally when you cross a section boundary.
I'm fine with that. I've used pointer arithmetic for performance reasons but if that may potentially lead to bugs then obviously pfn_to_page() should be used
pfn_to_page() on x86 these days is usually:
#define __pfn_to_page(pfn) (vmemmap + (pfn))
Even for the non-vmemmap sparsemem it stays pretty quick because the section array is in cache as you run through the loop.
There are ways to _minimize_ the number of pfn_to_page() calls by checking when you cross a section boundary, or even at a MAX_ORDER_NR_PAGES boundary. But, I don't think it's worth the trouble.
-- Dave
From: Michal Nazarewicz mina86@mina86.com
Signed-off-by: Michal Nazarewicz mina86@mina86.com
---
 include/linux/page-isolation.h |    4 ++-
 mm/page_alloc.c                |   66 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 60 insertions(+), 10 deletions(-)
On Fri, 2011-08-19 at 16:27 +0200, Marek Szyprowski wrote:
+unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
gfp_t flag)
+{
- unsigned long pfn = start, count;
- struct page *page;
- struct zone *zone;
- int order;
- VM_BUG_ON(!pfn_valid(start));
- zone = page_zone(pfn_to_page(start));
On Thu, 08 Sep 2011 20:05:52 +0200, Dave Hansen dave@linux.vnet.ibm.com wrote:
This implies that start->end are entirely contained in a single zone. What enforces that? If some higher layer enforces that, I think we probably need at least a VM_BUG_ON() in here and a comment about who enforces it.
- spin_lock_irq(&zone->lock);
- page = pfn_to_page(pfn);
- for (;;) {
VM_BUG_ON(page_count(page) || !PageBuddy(page));
list_del(&page->lru);
order = page_order(page);
zone->free_area[order].nr_free--;
rmv_page_order(page);
__mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order));
pfn += 1 << order;
if (pfn >= end)
break;
VM_BUG_ON(!pfn_valid(pfn));
page += 1 << order;
- }
This 'struct page *'++ stuff is OK, but only for small, aligned areas. For at least some of the sparsemem modes (non-VMEMMAP), you could walk off of the end of the section_mem_map[] when you cross a MAX_ORDER boundary. I'd feel a little bit more comfortable if pfn_to_page() was being done each time, or only occasionally when you cross a section boundary.
Do the attached changes seem to make sense?
I wanted to avoid calling pfn_to_page() each time, as it seems fairly expensive in sparsemem and discontig modes. At the same time, the macro trickery is there so that users of sparsemem-vmemmap and flatmem won't have to pay the price.
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index b2a81fd..003c52f 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -46,11 +46,13 @@ static inline void unset_migratetype_isolate(struct page *page) { __unset_migratetype_isolate(page, MIGRATE_MOVABLE); } + +/* The below functions must be run on a range from a single zone. */ extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); extern int alloc_contig_range(unsigned long start, unsigned long end, gfp_t flags, unsigned migratetype); -extern void free_contig_pages(struct page *page, int nr_pages); +extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
/* * For migration. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 46e78d4..32fda5d 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5716,9 +5716,41 @@ out: spin_unlock_irqrestore(&zone->lock, flags); }
+#if defined(CONFIG_FLATMEM) || defined(CONFIG_SPARSEMEM_VMEMMAP) + +/* + * In FLATMEM and CONFIG_SPARSEMEM_VMEMMAP we can safely increment the page + * pointer and get the same value as if we were to get by calling + * pfn_to_page() on incremented pfn counter. + */ +#define __contig_next_page(page, pageblock_left, pfn, increment) \ + ((page) + (increment)) + +#define __contig_first_page(pageblock_left, pfn) pfn_to_page(pfn) + +#else + +/* + * If we cross pageblock boundary, make sure we get a valid page pointer. If + * we are within pageblock, incrementing the pointer is good enough, and is + * a bit of an optimisation. + */ +#define __contig_next_page(page, pageblock_left, pfn, increment) \ + (likely((pageblock_left) -= (increment)) ? (page) + (increment) \ + : (((pageblock_left) = pageblock_nr_pages), pfn_to_page(pfn))) + +#define __contig_first_page(pageblock_left, pfn) ( \ + ((pageblock_left) = pageblock_nr_pages - \ + ((pfn) & (pageblock_nr_pages - 1))), \ + pfn_to_page(pfn)) + + +#endif + unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag) { + unsigned long pageblock_left __unused; unsigned long pfn = start, count; struct page *page; struct zone *zone; @@ -5729,27 +5761,37 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end,
spin_lock_irq(&zone->lock);
- page = pfn_to_page(pfn); + page = __contig_first_page(pageblock_left, pfn); for (;;) { - VM_BUG_ON(page_count(page) || !PageBuddy(page)); + VM_BUG_ON(!page_count(page) || !PageBuddy(page) || + page_zone(page) != zone); + list_del(&page->lru); order = page_order(page); + count = 1UL << order; zone->free_area[order].nr_free--; rmv_page_order(page); - __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order)); - pfn += 1 << order; + __mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count); + + pfn += count; if (pfn >= end) break; VM_BUG_ON(!pfn_valid(pfn)); - page += 1 << order; + + page = __contig_next_page(page, pageblock_left, pfn, count); }
spin_unlock_irq(&zone->lock);
/* After this, pages in the range can be freed one be one */ - page = pfn_to_page(start); - for (count = pfn - start; count; --count, ++page) + count = pfn - start; + pfn = start; + page = __contig_first_page(pageblock_left, pfn); + for (; count; --count) { prep_new_page(page, 0, flag); + ++pfn; + page = __contig_next_page(page, pageblock_left, pfn, 1); + }
return pfn; } @@ -5903,10 +5945,16 @@ done: return ret; }
-void free_contig_pages(struct page *page, int nr_pages) +void free_contig_pages(unsigned long pfn, unsigned nr_pages) { - for (; nr_pages; --nr_pages, ++page) + unsigned long pageblock_left __unused; + struct page *page = __contig_first_page(pageblock_left, pfn); + + while (nr_pages--) { __free_page(page); + ++pfn; + page = __contig_next_page(page, pageblock_left, pfn, 1); + } }
#ifdef CONFIG_MEMORY_HOTREMOVE
On Wed, 2011-09-21 at 17:19 +0200, Michal Nazarewicz wrote:
Do the attached changes seem to make sense?
The logic looks OK.
I wanted to avoid calling pfn_to_page() each time, as it seems fairly expensive in sparsemem and discontig modes. At the same time, the macro trickery is there so that users of sparsemem-vmemmap and flatmem won't have to pay the price.
Personally, I'd say paying the (incredibly minuscule) runtime cost is worth it to avoid making folks' eyes bleed when they see those macros. I think there are some nicer ways to do it.
Is there a reason you can't logically do?
	page = pfn_to_page(pfn);
	for (;;) {
		if (pfn_to_section_nr(pfn) == pfn_to_section_nr(pfn+1))
			page++;
		else
			page = pfn_to_page(pfn+1);
	}
pfn_to_section_nr() is a register shift. Our smallest section size on x86 is 128MB and on ppc64 16MB. So, at *WORST* (64k pages on ppc64), you're doing pfn_to_page() one of every 256 loops.
My suggestion would be to put a macro up in the sparsemem headers that does something like:
#ifdef VMEMMAP
#define zone_pfn_same_memmap(pfn1, pfn2) (1)
#elif SPARSEMEM_OTHER
static inline int zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2)
{
	return (pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2));
}
#else
#define zone_pfn_same_memmap(pfn1, pfn2) (1)
#endif
The zone_ bit is necessary in the naming because DISCONTIGMEM's pfns are at least contiguous within a zone. Only the non-VMEMMAP sparsemem case isn't.
Other folks would probably have a use for something like that. Although most of the previous users have gotten to this point, given up, and just done pfn_to_page() on each loop. :)
+#if defined(CONFIG_FLATMEM) || defined(CONFIG_SPARSEMEM_VMEMMAP)
+/*
- In FLATMEM and CONFIG_SPARSEMEM_VMEMMAP we can safely increment the page
- pointer and get the same value as if we were to get by calling
- pfn_to_page() on incremented pfn counter.
- */
+#define __contig_next_page(page, pageblock_left, pfn, increment) \
- ((page) + (increment))
+#define __contig_first_page(pageblock_left, pfn) pfn_to_page(pfn)
+#else
+/*
- If we cross pageblock boundary, make sure we get a valid page pointer. If
- we are within pageblock, incrementing the pointer is good enough, and is
- a bit of an optimisation.
- */
+#define __contig_next_page(page, pageblock_left, pfn, increment) \
- (likely((pageblock_left) -= (increment)) ? (page) + (increment) \
: (((pageblock_left) = pageblock_nr_pages), pfn_to_page(pfn)))
+#define __contig_first_page(pageblock_left, pfn) ( \
- ((pageblock_left) = pageblock_nr_pages - \
((pfn) & (pageblock_nr_pages - 1))), \
- pfn_to_page(pfn))
+#endif
For the love of Pete, please make those into functions if you're going to keep them. They're really unreadable like that.
You might also want to look at mm/internal.h's mem_map_offset() and mem_map_next(). They're not _quite_ what you need, but they're close.
-- Dave
From: Michal Nazarewicz mina86@mina86.com
Signed-off-by: Michal Nazarewicz mina86@mina86.com
---
 include/asm-generic/memory_model.h |   17 ++++++++++++++
 include/linux/page-isolation.h     |    4 ++-
 mm/page_alloc.c                    |   43 +++++++++++++++++++++++++++--------
 3 files changed, 53 insertions(+), 11 deletions(-)
On Wed, 2011-09-21 at 17:19 +0200, Michal Nazarewicz wrote:
I wanted to avoid calling pfn_to_page() each time, as it seems fairly expensive in sparsemem and discontig modes. At the same time, the macro trickery is there so that users of sparsemem-vmemmap and flatmem won't have to pay the price.
On Wed, 21 Sep 2011 17:45:59 +0200, Dave Hansen dave@linux.vnet.ibm.com wrote:
Personally, I'd say paying the (incredibly minuscule) runtime cost is worth it to avoid making folks' eyes bleed when they see those macros. I think there are some nicer ways to do it.
Yeah. I wasn't amazed by them either.
Is there a reason you can't logically do?

	page = pfn_to_page(pfn);
	for (;;) {
		if (pfn_to_section_nr(pfn) == pfn_to_section_nr(pfn+1))
			page++;
		else
			page = pfn_to_page(pfn+1);
	}
Done. Thanks for the suggestions!
+#define __contig_next_page(page, pageblock_left, pfn, increment) \
- (likely((pageblock_left) -= (increment)) ? (page) + (increment) \
: (((pageblock_left) = pageblock_nr_pages), pfn_to_page(pfn)))
+#define __contig_first_page(pageblock_left, pfn) ( \
- ((pageblock_left) = pageblock_nr_pages - \
((pfn) & (pageblock_nr_pages - 1))), \
- pfn_to_page(pfn))
+#endif
For the love of Pete, please make those in to functions if you're going to keep them.
That was tricky because they modify pageblock_left. Not relevant now anyway, though.
diff --git a/include/asm-generic/memory_model.h b/include/asm-generic/memory_model.h index fb2d63f..900da88 100644 --- a/include/asm-generic/memory_model.h +++ b/include/asm-generic/memory_model.h @@ -69,6 +69,23 @@ }) #endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */
+#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) + +/* + * Both PFNs must be from the same zone! If this function returns + * true, pfn_to_page(pfn1) + (pfn2 - pfn1) == pfn_to_page(pfn2). + */ +static inline bool zone_pfn_same_memmap(unsigned long pfn1, unsigned long pfn2) +{ + return pfn_to_section_nr(pfn1) == pfn_to_section_nr(pfn2); +} + +#else + +#define zone_pfn_same_memmap(pfn1, pfn2) (true) + +#endif + #define page_to_pfn __page_to_pfn #define pfn_to_page __pfn_to_page
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index b2a81fd..003c52f 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -46,11 +46,13 @@ static inline void unset_migratetype_isolate(struct page *page) { __unset_migratetype_isolate(page, MIGRATE_MOVABLE); } + +/* The below functions must be run on a range from a single zone. */ extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); extern int alloc_contig_range(unsigned long start, unsigned long end, gfp_t flags, unsigned migratetype); -extern void free_contig_pages(struct page *page, int nr_pages); +extern void free_contig_pages(unsigned long pfn, unsigned nr_pages);
/* * For migration. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 46e78d4..bc200a9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5725,31 +5725,46 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, int order;
VM_BUG_ON(!pfn_valid(start)); - zone = page_zone(pfn_to_page(start)); + page = pfn_to_page(start); + zone = page_zone(page);
spin_lock_irq(&zone->lock);
- page = pfn_to_page(pfn); for (;;) { - VM_BUG_ON(page_count(page) || !PageBuddy(page)); + VM_BUG_ON(!page_count(page) || !PageBuddy(page) || + page_zone(page) != zone); + list_del(&page->lru); order = page_order(page); + count = 1UL << order; zone->free_area[order].nr_free--; rmv_page_order(page); - __mod_zone_page_state(zone, NR_FREE_PAGES, -(1UL << order)); - pfn += 1 << order; + __mod_zone_page_state(zone, NR_FREE_PAGES, -(long)count); + + pfn += count; if (pfn >= end) break; VM_BUG_ON(!pfn_valid(pfn)); - page += 1 << order; + + if (zone_pfn_same_memmap(pfn - count, pfn)) + page += count; + else + page = pfn_to_page(pfn); }
spin_unlock_irq(&zone->lock);
/* After this, pages in the range can be freed one be one */ - page = pfn_to_page(start); - for (count = pfn - start; count; --count, ++page) + count = pfn - start; + pfn = start; + for (page = pfn_to_page(pfn); count; --count) { prep_new_page(page, 0, flag); + ++pfn; + if (likely(zone_pfn_same_memmap(pfn - 1, pfn))) + ++page; + else + page = pfn_to_page(pfn); + }
return pfn; } @@ -5903,10 +5918,18 @@ done: return ret; }
-void free_contig_pages(struct page *page, int nr_pages) +void free_contig_pages(unsigned long pfn, unsigned nr_pages) { - for (; nr_pages; --nr_pages, ++page) + struct page *page = pfn_to_page(pfn); + + while (nr_pages--) { __free_page(page); + ++pfn; + if (likely(zone_pfn_same_memmap(pfn - 1, pfn))) + ++page; + else + page = pfn_to_page(pfn); + } }
#ifdef CONFIG_MEMORY_HOTREMOVE
On Wed, 2011-09-21 at 18:26 +0200, Michal Nazarewicz wrote:
page += 1 << order;
if (zone_pfn_same_memmap(pfn - count, pfn))
page += count;
else
page = pfn_to_page(pfn); }
That all looks sane to me and should fix the bug I brought up.
-- Dave
From: Michal Nazarewicz m.nazarewicz@samsung.com
This commit adds the alloc_contig_range() function, which tries to allocate a given range of pages. It first tries to migrate all already-allocated pages that fall in the range, thus freeing them. Once all pages in the range are freed, they are removed from the buddy system and thus allocated for the caller to use.
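For illustration, a hypothetical caller (function names are only examples, not part of the patch) would use it roughly as follows:

	static struct page *example_alloc_pages(unsigned long pfn, unsigned long count)
	{
		/* migrate everything out of [pfn, pfn + count) and claim it */
		if (alloc_contig_range(pfn, pfn + count, GFP_KERNEL))
			return NULL;
		return pfn_to_page(pfn);
	}

	static void example_free_pages(struct page *page, unsigned long count)
	{
		/* return the range to the buddy system */
		free_contig_pages(page, count);
	}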
Signed-off-by: Michal Nazarewicz m.nazarewicz@samsung.com
Signed-off-by: Kyungmin Park kyungmin.park@samsung.com
[m.szyprowski: renamed some variables for easier code reading]
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
CC: Michal Nazarewicz mina86@mina86.com
Acked-by: Arnd Bergmann arnd@arndb.de
---
 include/linux/page-isolation.h |    2 +
 mm/page_alloc.c                |  144 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+), 0 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index f1417ed..c5d1a7c 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -34,6 +34,8 @@ extern int set_migratetype_isolate(struct page *page); extern void unset_migratetype_isolate(struct page *page); extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); +extern int alloc_contig_range(unsigned long start, unsigned long end, + gfp_t flags); extern void free_contig_pages(struct page *page, int nr_pages);
/* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ad6ae3f..35423c2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5706,6 +5706,150 @@ unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, return pfn; }
+static unsigned long pfn_to_maxpage(unsigned long pfn) +{ + return pfn & ~(MAX_ORDER_NR_PAGES - 1); +} + +static unsigned long pfn_to_maxpage_up(unsigned long pfn) +{ + return ALIGN(pfn, MAX_ORDER_NR_PAGES); +} + +#define MIGRATION_RETRY 5 +static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) +{ + int migration_failed = 0, ret; + unsigned long pfn = start; + + /* + * Some code "borrowed" from KAMEZAWA Hiroyuki's + * __alloc_contig_pages(). + */ + + for (;;) { + pfn = scan_lru_pages(pfn, end); + if (!pfn || pfn >= end) + break; + + ret = do_migrate_range(pfn, end); + if (!ret) { + migration_failed = 0; + } else if (ret != -EBUSY + || ++migration_failed >= MIGRATION_RETRY) { + return ret; + } else { + /* There are unstable pages.on pagevec. */ + lru_add_drain_all(); + /* + * there may be pages on pcplist before + * we mark the range as ISOLATED. + */ + drain_all_pages(); + } + cond_resched(); + } + + if (!migration_failed) { + /* drop all pages in pagevec and pcp list */ + lru_add_drain_all(); + drain_all_pages(); + } + + /* Make sure all pages are isolated */ + if (WARN_ON(test_pages_isolated(start, end))) + return -EBUSY; + + return 0; +} + +/** + * alloc_contig_range() -- tries to allocate given range of pages + * @start: start PFN to allocate + * @end: one-past-the-last PFN to allocate + * @flags: flags passed to alloc_contig_freed_pages(). + * + * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES + * aligned, hovewer it's callers responsibility to guarantee that we + * are the only thread that changes migrate type of pageblocks the + * pages fall in. + * + * Returns zero on success or negative error code. On success all + * pages which PFN is in (start, end) are allocated for the caller and + * need to be freed with free_contig_pages(). + */ +int alloc_contig_range(unsigned long start, unsigned long end, + gfp_t flags) +{ + unsigned long outer_start, outer_end; + int ret; + + /* + * What we do here is we mark all pageblocks in range as + * MIGRATE_ISOLATE. Because of the way page allocator work, we + * align the range to MAX_ORDER pages so that page allocator + * won't try to merge buddies from different pageblocks and + * change MIGRATE_ISOLATE to some other migration type. + * + * Once the pageblocks are marked as MIGRATE_ISOLATE, we + * migrate the pages from an unaligned range (ie. pages that + * we are interested in). This will put all the pages in + * range back to page allocator as MIGRATE_ISOLATE. + * + * When this is done, we take the pages in range from page + * allocator removing them from the buddy system. This way + * page allocator will never consider using them. + * + * This lets us mark the pageblocks back as + * MIGRATE_CMA/MIGRATE_MOVABLE so that free pages in the + * MAX_ORDER aligned range but not in the unaligned, original + * range are put back to page allocator so that buddy can use + * them. + */ + + ret = start_isolate_page_range(pfn_to_maxpage(start), + pfn_to_maxpage_up(end)); + if (ret) + goto done; + + ret = __alloc_contig_migrate_range(start, end); + if (ret) + goto done; + + /* + * Pages from [start, end) are within a MAX_ORDER_NR_PAGES + * aligned blocks that are marked as MIGRATE_ISOLATE. What's + * more, all pages in [start, end) are free in page allocator. + * What we are going to do is to allocate all pages from + * [start, end) (that is remove them from page allocater). 
+ * + * The only problem is that pages at the beginning and at the + * end of interesting range may be not aligned with pages that + * page allocator holds, ie. they can be part of higher order + * pages. Because of this, we reserve the bigger range and + * once this is done free the pages we are not interested in. + */ + + ret = 0; + while (!PageBuddy(pfn_to_page(start & (~0UL << ret)))) + if (WARN_ON(++ret >= MAX_ORDER)) + return -EINVAL; + + outer_start = start & (~0UL << ret); + outer_end = alloc_contig_freed_pages(outer_start, end, flags); + + /* Free head and tail (if any) */ + if (start != outer_start) + free_contig_pages(pfn_to_page(outer_start), start - outer_start); + if (end != outer_end) + free_contig_pages(pfn_to_page(end), outer_end - end); + + ret = 0; +done: + undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end)); + return ret; +} + void free_contig_pages(struct page *page, int nr_pages) { for (; nr_pages; --nr_pages, ++page)
From: Michal Nazarewicz m.nazarewicz@samsung.com
The MIGRATE_CMA migration type has two main characteristics: (i) only movable pages can be allocated from MIGRATE_CMA pageblocks and (ii) the page allocator will never change the migration type of MIGRATE_CMA pageblocks.
This guarantees that a page in a MIGRATE_CMA pageblock can always be migrated somewhere else (unless there's no memory left in the system).
It is designed to be used with the Contiguous Memory Allocator (CMA) for allocating big chunks (e.g. 10MiB) of physically contiguous memory. Once a driver requests contiguous memory, CMA migrates pages out of the MIGRATE_CMA pageblocks.
To minimise the number of migrations, the MIGRATE_CMA migration type is the last type tried when the page allocator falls back to migration types other than the requested one.
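Summarising the fallbacks[] array changed by the patch below, the fallback order becomes (note that only the movable list ever reaches MIGRATE_CMA):

	MIGRATE_UNMOVABLE   -> MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE
	MIGRATE_RECLAIMABLE -> MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE
	MIGRATE_MOVABLE     -> MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA, MIGRATE_RESERVE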
Signed-off-by: Michal Nazarewicz m.nazarewicz@samsung.com
Signed-off-by: Kyungmin Park kyungmin.park@samsung.com
[m.szyprowski: cleaned up Kconfig, renamed some functions, removed ifdefs]
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
CC: Michal Nazarewicz mina86@mina86.com
Acked-by: Arnd Bergmann arnd@arndb.de
---
 include/linux/mmzone.h         |   41 +++++++++++++++---
 include/linux/page-isolation.h |    1 +
 mm/Kconfig                     |    8 +++-
 mm/compaction.c                |   10 +++++
 mm/page_alloc.c                |   88 +++++++++++++++++++++++++++++++---------
 5 files changed, 120 insertions(+), 28 deletions(-)
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index be1ac8d..74b7f27 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -35,13 +35,35 @@ */ #define PAGE_ALLOC_COSTLY_ORDER 3
-#define MIGRATE_UNMOVABLE 0 -#define MIGRATE_RECLAIMABLE 1 -#define MIGRATE_MOVABLE 2 -#define MIGRATE_PCPTYPES 3 /* the number of types on the pcp lists */ -#define MIGRATE_RESERVE 3 -#define MIGRATE_ISOLATE 4 /* can't allocate from here */ -#define MIGRATE_TYPES 5 +enum { + MIGRATE_UNMOVABLE, + MIGRATE_RECLAIMABLE, + MIGRATE_MOVABLE, + MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ + MIGRATE_RESERVE = MIGRATE_PCPTYPES, + /* + * MIGRATE_CMA migration type is designed to mimic the way + * ZONE_MOVABLE works. Only movable pages can be allocated + * from MIGRATE_CMA pageblocks and page allocator never + * implicitly change migration type of MIGRATE_CMA pageblock. + * + * The way to use it is to change migratetype of a range of + * pageblocks to MIGRATE_CMA which can be done by + * __free_pageblock_cma() function. What is important though + * is that a range of pageblocks must be aligned to + * MAX_ORDER_NR_PAGES should biggest page be bigger then + * a single pageblock. + */ + MIGRATE_CMA, + MIGRATE_ISOLATE, /* can't allocate from here */ + MIGRATE_TYPES +}; + +#ifdef CONFIG_CMA_MIGRATE_TYPE +# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) +#else +# define is_migrate_cma(migratetype) false +#endif
#define for_each_migratetype_order(order, type) \ for (order = 0; order < MAX_ORDER; order++) \ @@ -54,6 +76,11 @@ static inline int get_pageblock_migratetype(struct page *page) return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end); }
+static inline bool is_pageblock_cma(struct page *page) +{ + return is_migrate_cma(get_pageblock_migratetype(page)); +} + struct free_area { struct list_head free_list[MIGRATE_TYPES]; unsigned long nr_free; diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index c5d1a7c..856d9cf 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -46,4 +46,5 @@ int test_pages_in_a_zone(unsigned long start_pfn, unsigned long end_pfn); unsigned long scan_lru_pages(unsigned long start, unsigned long end); int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn);
+extern void init_cma_reserved_pageblock(struct page *page); #endif diff --git a/mm/Kconfig b/mm/Kconfig index f2f1ca1..dd6e1ea 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -189,7 +189,7 @@ config COMPACTION config MIGRATION bool "Page migration" def_bool y - depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION + depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION || CMA_MIGRATE_TYPE help Allows the migration of the physical location of pages of processes while the virtual addresses are not changed. This is useful in @@ -198,6 +198,12 @@ config MIGRATION pages as migration can relocate pages to satisfy a huge page allocation instead of reclaiming.
+config CMA_MIGRATE_TYPE + bool + help + This enables the use the MIGRATE_CMA migrate type, which lets lets CMA + work on almost arbitrary memory range and not only inside ZONE_MOVABLE. + config PHYS_ADDR_T_64BIT def_bool 64BIT || ARCH_PHYS_ADDR_T_64BIT
diff --git a/mm/compaction.c b/mm/compaction.c index 6cc604b..9e5cc59 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -119,6 +119,16 @@ static bool suitable_migration_target(struct page *page) if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE) return false;
+ /* Keep MIGRATE_CMA alone as well. */ + /* + * XXX Revisit. We currently cannot let compaction touch CMA + * pages since compaction insists on changing their migration + * type to MIGRATE_MOVABLE (see split_free_page() called from + * isolate_freepages_block() above). + */ + if (is_migrate_cma(migratetype)) + return false; + /* If the page is a large free page, then allow migration */ if (PageBuddy(page) && page_order(page) >= pageblock_order) return true; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 35423c2..c9dfed0 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -719,6 +719,29 @@ void __meminit __free_pages_bootmem(struct page *page, unsigned int order) } }
+#ifdef CONFIG_CMA_MIGRATE_TYPE +/* + * Free whole pageblock and set it's migration type to MIGRATE_CMA. + */ +void __init init_cma_reserved_pageblock(struct page *page) +{ + struct page *p = page; + unsigned i = pageblock_nr_pages; + + prefetchw(p); + do { + if (--i) + prefetchw(p + 1); + __ClearPageReserved(p); + set_page_count(p, 0); + } while (++p, i); + + set_page_refcounted(page); + set_pageblock_migratetype(page, MIGRATE_CMA); + __free_pages(page, pageblock_order); + totalram_pages += pageblock_nr_pages; +} +#endif
/* * The order of subdivision here is critical for the IO subsystem. @@ -827,11 +850,11 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, * This array describes the order lists are fallen back to when * the free lists for the desirable migrate type are depleted */ -static int fallbacks[MIGRATE_TYPES][MIGRATE_TYPES-1] = { +static int fallbacks[MIGRATE_TYPES][4] = { [MIGRATE_UNMOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, [MIGRATE_RECLAIMABLE] = { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RESERVE }, - [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_RESERVE }, - [MIGRATE_RESERVE] = { MIGRATE_RESERVE, MIGRATE_RESERVE, MIGRATE_RESERVE }, /* Never used */ + [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE, MIGRATE_CMA , MIGRATE_RESERVE }, + [MIGRATE_RESERVE] = { MIGRATE_RESERVE }, /* Never used */ };
/* @@ -926,12 +949,12 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) /* Find the largest possible block of pages in the other list */ for (current_order = MAX_ORDER-1; current_order >= order; --current_order) { - for (i = 0; i < MIGRATE_TYPES - 1; i++) { + for (i = 0; i < ARRAY_SIZE(fallbacks[0]); i++) { migratetype = fallbacks[start_migratetype][i];
/* MIGRATE_RESERVE handled later if necessary */ if (migratetype == MIGRATE_RESERVE) - continue; + break;
area = &(zone->free_area[current_order]); if (list_empty(&area->free_list[migratetype])) @@ -946,19 +969,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) * pages to the preferred allocation list. If falling * back for a reclaimable kernel allocation, be more * aggressive about taking ownership of free pages + * + * On the other hand, never change migration + * type of MIGRATE_CMA pageblocks nor move CMA + * pages on different free lists. We don't + * want unmovable pages to be allocated from + * MIGRATE_CMA areas. */ - if (unlikely(current_order >= (pageblock_order >> 1)) || - start_migratetype == MIGRATE_RECLAIMABLE || - page_group_by_mobility_disabled) { - unsigned long pages; + if (!is_pageblock_cma(page) && + (unlikely(current_order >= pageblock_order / 2) || + start_migratetype == MIGRATE_RECLAIMABLE || + page_group_by_mobility_disabled)) { + int pages; pages = move_freepages_block(zone, page, - start_migratetype); + start_migratetype);
- /* Claim the whole block if over half of it is free */ + /* + * Claim the whole block if over half + * of it is free + */ if (pages >= (1 << (pageblock_order-1)) || - page_group_by_mobility_disabled) + page_group_by_mobility_disabled) set_pageblock_migratetype(page, - start_migratetype); + start_migratetype);
migratetype = start_migratetype; } @@ -968,11 +1001,14 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype) rmv_page_order(page);
/* Take ownership for orders >= pageblock_order */ - if (current_order >= pageblock_order) + if (current_order >= pageblock_order && + !is_pageblock_cma(page)) change_pageblock_range(page, current_order, start_migratetype);
- expand(zone, page, order, current_order, area, migratetype); + expand(zone, page, order, current_order, area, + is_migrate_cma(start_migratetype) + ? start_migratetype : migratetype);
trace_mm_page_alloc_extfrag(page, order, current_order, start_migratetype, migratetype); @@ -1044,7 +1080,10 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, list_add(&page->lru, list); else list_add_tail(&page->lru, list); - set_page_private(page, migratetype); + if (is_pageblock_cma(page)) + set_page_private(page, MIGRATE_CMA); + else + set_page_private(page, migratetype); list = &page->lru; } __mod_zone_page_state(zone, NR_FREE_PAGES, -(i << order)); @@ -1185,9 +1224,16 @@ void free_hot_cold_page(struct page *page, int cold) * offlined but treat RESERVE as movable pages so we can get those * areas back if necessary. Otherwise, we may have to free * excessively into the page allocator + * + * Still, do not change migration type of MIGRATE_CMA pages (if + * they'd be recorded as MIGRATE_MOVABLE an unmovable page could + * be allocated from MIGRATE_CMA block and we don't want to allow + * that). In this respect, treat MIGRATE_CMA like + * MIGRATE_ISOLATE. */ if (migratetype >= MIGRATE_PCPTYPES) { - if (unlikely(migratetype == MIGRATE_ISOLATE)) { + if (unlikely(migratetype == MIGRATE_ISOLATE + || is_migrate_cma(migratetype))) { free_one_page(zone, page, 0, migratetype); goto out; } @@ -1276,7 +1322,9 @@ int split_free_page(struct page *page) if (order >= pageblock_order - 1) { struct page *endpage = page + (1 << order) - 1; for (; page < endpage; page += pageblock_nr_pages) - set_pageblock_migratetype(page, MIGRATE_MOVABLE); + if (!is_pageblock_cma(page)) + set_pageblock_migratetype(page, + MIGRATE_MOVABLE); }
return 1 << order; @@ -5554,8 +5602,8 @@ __count_immobile_pages(struct zone *zone, struct page *page, int count) */ if (zone_idx(zone) == ZONE_MOVABLE) return true; - - if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE) + if (get_pageblock_migratetype(page) == MIGRATE_MOVABLE || + is_pageblock_cma(page)) return true;
pfn = page_to_pfn(page);
From: Michal Nazarewicz m.nazarewicz@samsung.com
This commit changes various functions that switch the migrate type of pages and pageblocks between MIGRATE_ISOLATE and MIGRATE_MOVABLE so that they can also work with the MIGRATE_CMA migrate type.
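For instance, a hypothetical CMA-style caller would now pass the pageblocks' migrate type explicitly (sketch only, mirroring the prototypes changed below):

	int ret;

	ret = alloc_contig_range(start, end, GFP_KERNEL, MIGRATE_CMA);
	if (ret)
		return ret;
	/* ... use pfn_to_page(start) .. pfn_to_page(end - 1) ... */
	free_contig_pages(pfn_to_page(start), end - start);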
Signed-off-by: Michal Nazarewicz m.nazarewicz@samsung.com
Signed-off-by: Kyungmin Park kyungmin.park@samsung.com
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com
CC: Michal Nazarewicz mina86@mina86.com
Acked-by: Arnd Bergmann arnd@arndb.de
---
 include/linux/page-isolation.h |   40 +++++++++++++++++++++++++++-------------
 mm/page_alloc.c                |   19 ++++++++++++-------
 mm/page_isolation.c            |   15 ++++++++-------
 3 files changed, 47 insertions(+), 27 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 856d9cf..b2a81fd 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -3,39 +3,53 @@
/* * Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE. - * If specified range includes migrate types other than MOVABLE, + * If specified range includes migrate types other than MOVABLE or CMA, * this will fail with -EBUSY. * * For isolating all pages in the range finally, the caller have to * free all pages in the range. test_page_isolated() can be used for * test it. */ -extern int -start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); +int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype); + +static inline int +start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +{ + return __start_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); +} + +int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype);
/* * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. * target range is [start_pfn, end_pfn) */ -extern int -undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); +static inline int +undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +{ + return __undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE); +}
/* - * test all pages in [start_pfn, end_pfn)are isolated or not. + * Test all pages in [start_pfn, end_pfn) are isolated or not. */ -extern int -test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn); +int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn);
/* - * Internal funcs.Changes pageblock's migrate type. - * Please use make_pagetype_isolated()/make_pagetype_movable(). + * Internal functions. Changes pageblock's migrate type. */ -extern int set_migratetype_isolate(struct page *page); -extern void unset_migratetype_isolate(struct page *page); +int set_migratetype_isolate(struct page *page); +void __unset_migratetype_isolate(struct page *page, unsigned migratetype); +static inline void unset_migratetype_isolate(struct page *page) +{ + __unset_migratetype_isolate(page, MIGRATE_MOVABLE); +} extern unsigned long alloc_contig_freed_pages(unsigned long start, unsigned long end, gfp_t flag); extern int alloc_contig_range(unsigned long start, unsigned long end, - gfp_t flags); + gfp_t flags, unsigned migratetype); extern void free_contig_pages(struct page *page, int nr_pages);
/* diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c9dfed0..46e78d4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5702,7 +5702,7 @@ out: return ret; }
-void unset_migratetype_isolate(struct page *page) +void __unset_migratetype_isolate(struct page *page, unsigned migratetype) { struct zone *zone; unsigned long flags; @@ -5710,8 +5710,8 @@ void unset_migratetype_isolate(struct page *page) spin_lock_irqsave(&zone->lock, flags); if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE) goto out; - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - move_freepages_block(zone, page, MIGRATE_MOVABLE); + set_pageblock_migratetype(page, migratetype); + move_freepages_block(zone, page, migratetype); out: spin_unlock_irqrestore(&zone->lock, flags); } @@ -5816,6 +5816,10 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) * @start: start PFN to allocate * @end: one-past-the-last PFN to allocate * @flags: flags passed to alloc_contig_freed_pages(). + * @migratetype: migratetype of the underlaying pageblocks (either + * #MIGRATE_MOVABLE or #MIGRATE_CMA). All pageblocks + * in range must have the same migratetype and it must + * be either of the two. * * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES * aligned, hovewer it's callers responsibility to guarantee that we @@ -5827,7 +5831,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end) * need to be freed with free_contig_pages(). */ int alloc_contig_range(unsigned long start, unsigned long end, - gfp_t flags) + gfp_t flags, unsigned migratetype) { unsigned long outer_start, outer_end; int ret; @@ -5855,8 +5859,8 @@ int alloc_contig_range(unsigned long start, unsigned long end, * them. */
- ret = start_isolate_page_range(pfn_to_maxpage(start), - pfn_to_maxpage_up(end)); + ret = __start_isolate_page_range(pfn_to_maxpage(start), + pfn_to_maxpage_up(end), migratetype); if (ret) goto done;
@@ -5894,7 +5898,8 @@ int alloc_contig_range(unsigned long start, unsigned long end,
ret = 0; done: - undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end)); + __undo_isolate_page_range(pfn_to_maxpage(start), pfn_to_maxpage_up(end), + migratetype); return ret; }
diff --git a/mm/page_isolation.c b/mm/page_isolation.c index 270a026..e232b25 100644 --- a/mm/page_isolation.c +++ b/mm/page_isolation.c @@ -23,10 +23,11 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) }
/* - * start_isolate_page_range() -- make page-allocation-type of range of pages + * __start_isolate_page_range() -- make page-allocation-type of range of pages * to be MIGRATE_ISOLATE. * @start_pfn: The lower PFN of the range to be isolated. * @end_pfn: The upper PFN of the range to be isolated. + * @migratetype: migrate type to set in error recovery. * * Making page-allocation-type to be MIGRATE_ISOLATE means free pages in * the range will never be allocated. Any free pages and pages freed in the @@ -35,8 +36,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) * start_pfn/end_pfn must be aligned to pageblock_order. * Returns 0 on success and -EBUSY if any part of range cannot be isolated. */ -int -start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +int __start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype) { unsigned long pfn; unsigned long undo_pfn; @@ -59,7 +60,7 @@ undo: for (pfn = start_pfn; pfn < undo_pfn; pfn += pageblock_nr_pages) - unset_migratetype_isolate(pfn_to_page(pfn)); + __unset_migratetype_isolate(pfn_to_page(pfn), migratetype);
return -EBUSY; } @@ -67,8 +68,8 @@ undo: /* * Make isolated pages available again. */ -int -undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) +int __undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, + unsigned migratetype) { unsigned long pfn; struct page *page; @@ -80,7 +81,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn) page = __first_valid_page(pfn, pageblock_nr_pages); if (!page || get_pageblock_migratetype(page) != MIGRATE_ISOLATE) continue; - unset_migratetype_isolate(page); + __unset_migratetype_isolate(page, migratetype); } return 0; }
The Contiguous Memory Allocator is a set of helper functions for the DMA mapping framework that improve allocation of contiguous memory chunks.
CMA grabs memory at system boot, marks it with the MIGRATE_CMA migrate type and gives it back to the system. The kernel is allowed to allocate movable pages within CMA's managed memory, so it can be used, for example, for the page cache when the DMA mapping framework does not need it. On a dma_alloc_from_contiguous() request, such pages are migrated out of the CMA area to free the required contiguous block and fulfill the request. This makes it possible to allocate large contiguous chunks of memory at any time, provided there is enough free memory in the system.
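As a rough driver-side sketch (hypothetical code and function names; the exact prototypes should be taken from the dma-contiguous.h header added by this patch, so treat the parameters below as an assumption):

	static struct page *example_get_dma_buffer(struct device *dev, size_t size)
	{
		int count = PAGE_ALIGN(size) >> PAGE_SHIFT;

		/* pages occupying the CMA area are migrated out as needed */
		return dma_alloc_from_contiguous(dev, count, get_order(size));
	}

	static void example_put_dma_buffer(struct device *dev, struct page *pages,
					   size_t size)
	{
		dma_release_from_contiguous(dev, pages,
					    PAGE_ALIGN(size) >> PAGE_SHIFT);
	}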
This code is heavily based on earlier work by Michal Nazarewicz.
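For illustration only (this sketch is not part of the patch): the two helpers introduced here are meant to be called from arch-specific DMA mapping code roughly as below. The example_alloc()/example_free() names are made up, error handling is minimal and the DMA address conversion shown is a placeholder for whatever helper the architecture provides (e.g. pfn_to_dma() on ARM).

#include <linux/dma-mapping.h>
#include <linux/dma-contiguous.h>
#include <linux/mm.h>

static void *example_alloc(struct device *dev, size_t size, dma_addr_t *handle)
{
        size_t count = PAGE_ALIGN(size) >> PAGE_SHIFT;
        struct page *page;

        /* migrate movable pages out of the CMA area and claim the range */
        page = dma_alloc_from_contiguous(dev, count, get_order(size));
        if (!page)
                return NULL;

        /* arch code would use its own phys/dma conversion helper here */
        *handle = (dma_addr_t)page_to_pfn(page) << PAGE_SHIFT;
        return page_address(page);      /* assumes a low-memory CMA area */
}

static void example_free(struct device *dev, size_t size, void *vaddr)
{
        struct page *page = virt_to_page(vaddr);

        /* clears the CMA bitmap and frees the pages back to the system */
        dma_release_from_contiguous(dev, page, PAGE_ALIGN(size) >> PAGE_SHIFT);
}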
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com CC: Michal Nazarewicz mina86@mina86.com --- arch/Kconfig | 3 + drivers/base/Kconfig | 79 ++++++++ drivers/base/Makefile | 1 + drivers/base/dma-contiguous.c | 386 ++++++++++++++++++++++++++++++++++++++++ include/linux/dma-contiguous.h | 102 +++++++++++ 5 files changed, 571 insertions(+), 0 deletions(-) create mode 100644 drivers/base/dma-contiguous.c create mode 100644 include/linux/dma-contiguous.h
diff --git a/arch/Kconfig b/arch/Kconfig index 4b0669c..a3b39a2 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -124,6 +124,9 @@ config HAVE_ARCH_TRACEHOOK config HAVE_DMA_ATTRS bool
+config HAVE_DMA_CONTIGUOUS + bool + config USE_GENERIC_SMP_HELPERS bool
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 21cf46f..bffe7fb 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -174,4 +174,83 @@ config SYS_HYPERVISOR
source "drivers/base/regmap/Kconfig"
+config CMA + bool "Contiguous Memory Allocator" + depends on HAVE_DMA_CONTIGUOUS && HAVE_MEMBLOCK + select MIGRATION + select CMA_MIGRATE_TYPE + help + This enables the Contiguous Memory Allocator which allows drivers + to allocate big physically-contiguous blocks of memory for use with + hardware components that do not support I/O map nor scatter-gather. + + For more information see <include/linux/dma-contiguous.h>. + If unsure, say "n". + +if CMA + +config CMA_DEBUG + bool "CMA debug messages (DEVELOPEMENT)" + help + Turns on debug messages in CMA. This produces KERN_DEBUG + messages for every CMA call as well as various messages while + processing calls such as dma_alloc_from_contiguous(). + This option does not affect warning and error messages. + +comment "Default contiguous memory area size:" + +config CMA_SIZE_ABSOLUTE + int "Absolute size (in MiB)" + depends on !CMA_SIZE_SEL_PERCENTAGE + default 16 + help + Defines the size (in MiB) of the default memory area for Contiguous + Memory Allocator. + +config CMA_SIZE_PERCENTAGE + int "Percentage of total memory" + depends on !CMA_SIZE_SEL_ABSOLUTE + default 10 + help + Defines the size of the default memory area for Contiguous Memory + Allocator as a percentage of the total memory in the system. + +choice + prompt "Selected region size" + default CMA_SIZE_SEL_ABSOLUTE + +config CMA_SIZE_SEL_ABSOLUTE + bool "Use absolute value only" + +config CMA_SIZE_SEL_PERCENTAGE + bool "Use percentage value only" + +config CMA_SIZE_SEL_MIN + bool "Use lower value (minimum)" + +config CMA_SIZE_SEL_MAX + bool "Use higher value (maximum)" + +endchoice + +config CMA_ALIGNMENT + int "Maximum PAGE_SIZE order of alignment for contiguous buffers" + range 4 9 + default 8 + help + DMA mapping framework by default aligns all buffers to the smallest + PAGE_SIZE order which is greater than or equal to the requested buffer + size. This works well for buffers up to a few hundreds kilobytes, but + for larger buffers it just a memory waste. With this parameter you can + specify the maximum PAGE_SIZE order for contiguous buffers. Larger + buffers will be aligned only to this specified order. The order is + expressed as a power of two multiplied by the PAGE_SIZE. + + For example, if your system defaults to 4KiB pages, the order value + of 8 means that the buffers will be aligned up to 1MiB only. + + If unsure, leave the default value "8". + +endif + endmenu diff --git a/drivers/base/Makefile b/drivers/base/Makefile index 99a375a..794546f 100644 --- a/drivers/base/Makefile +++ b/drivers/base/Makefile @@ -5,6 +5,7 @@ obj-y := core.o sys.o bus.o dd.o syscore.o \ cpu.o firmware.o init.o map.o devres.o \ attribute_container.o transport_class.o obj-$(CONFIG_DEVTMPFS) += devtmpfs.o +obj-$(CONFIG_CMA) += dma-contiguous.o obj-y += power/ obj-$(CONFIG_HAS_DMA) += dma-mapping.o obj-$(CONFIG_HAVE_GENERIC_DMA_COHERENT) += dma-coherent.o diff --git a/drivers/base/dma-contiguous.c b/drivers/base/dma-contiguous.c new file mode 100644 index 0000000..1f2519d --- /dev/null +++ b/drivers/base/dma-contiguous.c @@ -0,0 +1,386 @@ +/* + * Contiguous Memory Allocator for DMA mapping framework + * Copyright (c) 2010-2011 by Samsung Electronics. 
+ * Written by: + * Marek Szyprowski m.szyprowski@samsung.com + * Michal Nazarewicz mina86@mina86.com + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License or (at your optional) any later version of the license. + */ + +#define pr_fmt(fmt) "cma: " fmt + +#ifdef CONFIG_CMA_DEBUG +#ifndef DEBUG +# define DEBUG +#endif +#endif + +#include <asm/page.h> +#include <asm/dma-contiguous.h> + +#include <linux/memblock.h> +#include <linux/err.h> +#include <linux/mm.h> +#include <linux/mutex.h> +#include <linux/page-isolation.h> +#include <linux/slab.h> +#include <linux/swap.h> +#include <linux/mm_types.h> +#include <linux/dma-contiguous.h> + +#ifndef SZ_1M +#define SZ_1M (1 << 20) +#endif + +#ifdef phys_to_pfn +/* nothing to do */ +#elif defined __phys_to_pfn +# define phys_to_pfn __phys_to_pfn +#elif defined PFN_PHYS +# define phys_to_pfn PFN_PHYS +#else +# error correct phys_to_pfn implementation needed +#endif + +struct cma { + unsigned long base_pfn; + unsigned long count; + unsigned long *bitmap; +}; + +struct cma *dma_contiguous_default_area; + +#ifndef CONFIG_CMA_SIZE_ABSOLUTE +#define CONFIG_CMA_SIZE_ABSOLUTE 0 +#endif + +#ifndef CONFIG_CMA_SIZE_PERCENTAGE +#define CONFIG_CMA_SIZE_PERCENTAGE 0 +#endif + +static unsigned long size_abs = CONFIG_CMA_SIZE_ABSOLUTE * SZ_1M; +static unsigned long size_percent = CONFIG_CMA_SIZE_PERCENTAGE; +static long size_cmdline = -1; + +static int __init early_cma(char *p) +{ + pr_debug("%s(%s)\n", __func__, p); + size_cmdline = memparse(p, &p); + return 0; +} +early_param("cma", early_cma); + +static unsigned long __init __cma_early_get_total_pages(void) +{ + struct memblock_region *reg; + unsigned long total_pages = 0; + + /* + * We cannot use memblock_phys_mem_size() here, because + * memblock_analyze() has not been called yet. + */ + for_each_memblock(memory, reg) + total_pages += memblock_region_memory_end_pfn(reg) - + memblock_region_memory_base_pfn(reg); + return total_pages; +} + +/** + * dma_contiguous_reserve() - reserve area for contiguous memory handling + * + * This funtion reserves memory from early allocator. It should be + * called by arch specific code once the early allocator (memblock or bootmem) + * has been activated and all other subsystems have already allocated/reserved + * memory. 
+ */ +void __init dma_contiguous_reserve(phys_addr_t limit) +{ + unsigned long selected_size = 0; + unsigned long total_pages; + + pr_debug("%s(limit %08lx)\n", __func__, (unsigned long)limit); + + total_pages = __cma_early_get_total_pages(); + size_percent *= (total_pages << PAGE_SHIFT) / 100; + + pr_debug("%s: total available: %ld MiB, size absolute: %ld MiB, size percentage: %ld MiB\n", + __func__, (total_pages << PAGE_SHIFT) / SZ_1M, + size_abs / SZ_1M, size_percent / SZ_1M); + +#ifdef CONFIG_CMA_SIZE_SEL_ABSOLUTE + selected_size = size_abs; +#elif defined(CONFIG_CMA_SIZE_SEL_PERCENTAGE) + selected_size = size_percent; +#elif defined(CONFIG_CMA_SIZE_SEL_MIN) + selected_size = min(size_abs, size_percent); +#elif defined(CONFIG_CMA_SIZE_SEL_MAX) + selected_size = max(size_abs, size_percent); +#endif + + if (size_cmdline != -1) + selected_size = size_cmdline; + + if (!selected_size) + return; + + pr_debug("%s: reserving %ld MiB for global area\n", __func__, + selected_size / SZ_1M); + + dma_declare_contiguous(NULL, selected_size, 0, limit); +}; + +static DEFINE_MUTEX(cma_mutex); + +static void __cma_activate_area(unsigned long base_pfn, unsigned long count) +{ + unsigned long pfn = base_pfn; + unsigned i = count >> pageblock_order; + struct zone *zone; + + VM_BUG_ON(!pfn_valid(pfn)); + zone = page_zone(pfn_to_page(pfn)); + + do { + unsigned j; + base_pfn = pfn; + for (j = pageblock_nr_pages; j; --j, pfn++) { + VM_BUG_ON(!pfn_valid(pfn)); + VM_BUG_ON(page_zone(pfn_to_page(pfn)) != zone); + } + init_cma_reserved_pageblock(pfn_to_page(base_pfn)); + } while (--i); +} + +static struct cma *__cma_create_area(unsigned long base_pfn, + unsigned long count) +{ + int bitmap_size = BITS_TO_LONGS(count) * sizeof(long); + struct cma *cma; + + pr_debug("%s(base %08lx, count %lx)\n", __func__, base_pfn, count); + + cma = kmalloc(sizeof *cma, GFP_KERNEL); + if (!cma) + return ERR_PTR(-ENOMEM); + + cma->base_pfn = base_pfn; + cma->count = count; + cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL); + + if (!cma->bitmap) + goto no_mem; + + __cma_activate_area(base_pfn, count); + + pr_debug("%s: returned %p\n", __func__, (void *)cma); + return cma; + +no_mem: + kfree(cma); + return ERR_PTR(-ENOMEM); +} + +static struct cma_reserved { + phys_addr_t start; + unsigned long size; + struct device *dev; +} cma_reserved[MAX_CMA_AREAS] __initdata; +static unsigned cma_reserved_count __initdata; + +static int __init __cma_init_reserved_areas(void) +{ + struct cma_reserved *r = cma_reserved; + unsigned i = cma_reserved_count; + + pr_debug("%s()\n", __func__); + + for (; i; --i, ++r) { + struct cma *cma; + cma = __cma_create_area(phys_to_pfn(r->start), + r->size >> PAGE_SHIFT); + if (!IS_ERR(cma)) { + if (r->dev) + set_dev_cma_area(r->dev, cma); + else + dma_contiguous_default_area = cma; + } + } + return 0; +} +core_initcall(__cma_init_reserved_areas); + +/** + * dma_declare_contiguous() - reserve area for contiguous memory handling + * for particular device + * @dev: Pointer to device structure. + * @size: Size of the reserved memory. + * @start: Start address of the reserved memory (optional, 0 for any). + * @limit: End address of the reserved memory (optional, 0 for any). + * + * This funtion reserves memory for specified device. It should be + * called by board specific code when early allocator (memblock or bootmem) + * is still activate. 
+ */ +int __init dma_declare_contiguous(struct device *dev, unsigned long size, + phys_addr_t base, phys_addr_t limit) +{ + struct cma_reserved *r = &cma_reserved[cma_reserved_count]; + unsigned long alignment; + + pr_debug("%s(size %lx, base %08lx, limit %08lx)\n", __func__, + (unsigned long)size, (unsigned long)base, + (unsigned long)limit); + + /* Sanity checks */ + if (cma_reserved_count == ARRAY_SIZE(cma_reserved)) + return -ENOSPC; + + if (!size) + return -EINVAL; + + /* Sanitise input arguments */ + alignment = PAGE_SIZE << max(MAX_ORDER, pageblock_order); + base = ALIGN(base, alignment); + size = ALIGN(size, alignment); + limit = ALIGN(limit, alignment); + + /* Reserve memory */ + if (base) { + if (memblock_is_region_reserved(base, size) || + memblock_reserve(base, size) < 0) { + base = -EBUSY; + goto err; + } + } else { + /* + * Use __memblock_alloc_base() since + * memblock_alloc_base() panic()s. + */ + phys_addr_t addr = __memblock_alloc_base(size, alignment, limit); + if (!addr) { + base = -ENOMEM; + goto err; + } else if (addr + size > ~(unsigned long)0) { + memblock_free(addr, size); + base = -EOVERFLOW; + goto err; + } else { + base = addr; + } + } + + /* + * Each reserved area must be initialised later, when more kernel + * subsystems (like slab allocator) are available. + */ + r->start = base; + r->size = size; + r->dev = dev; + cma_reserved_count++; + printk(KERN_INFO "CMA: reserved %ld MiB at %08lx\n", size / SZ_1M, + (unsigned long)base); + + /* + * Architecture specific contiguous memory fixup. + */ + dma_contiguous_early_fixup(base, size); + return 0; +err: + printk(KERN_ERR "CMA: failed to reserve %ld MiB\n", size / SZ_1M); + return base; +} + +/** + * dma_alloc_from_contiguous() - allocate pages from contiguous area + * @dev: Pointer to device for which the allocation is performed. + * @count: Requested number of pages. + * @align: Requested alignment of pages (in PAGE_SIZE order). + * + * This funtion allocates memory buffer for specified device. It uses + * device specific contiguous memory area if available or the default + * global one. Requires architecture specific get_dev_cma_area() helper + * function. + */ +struct page *dma_alloc_from_contiguous(struct device *dev, int count, + unsigned int align) +{ + struct cma *cma = get_dev_cma_area(dev); + unsigned long pfn, pageno; + int ret; + + if (!cma) + return NULL; + + if (align > CONFIG_CMA_ALIGNMENT) + align = CONFIG_CMA_ALIGNMENT; + + pr_debug("%s(cma %p, count %d, align %d)\n", __func__, (void *)cma, + count, align); + + if (!count) + return NULL; + + mutex_lock(&cma_mutex); + + pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count, 0, count, + (1 << align) - 1); + if (pageno >= cma->count) { + ret = -ENOMEM; + goto error; + } + bitmap_set(cma->bitmap, pageno, count); + + pfn = cma->base_pfn + pageno; + ret = alloc_contig_range(pfn, pfn + count, 0, MIGRATE_CMA); + if (ret) + goto free; + + mutex_unlock(&cma_mutex); + + pr_debug("%s(): returned %p\n", __func__, pfn_to_page(pfn)); + return pfn_to_page(pfn); +free: + bitmap_clear(cma->bitmap, pageno, count); +error: + mutex_unlock(&cma_mutex); + return NULL; +} + +/** + * dma_release_from_contiguous() - release allocated pages + * @dev: Pointer to device for which the pages were allocated. + * @pages: Allocated pages. + * @count: Number of allocated pages. + * + * This funtion releases memory allocated by dma_alloc_from_contiguous(). + * It return 0 when provided pages doen't belongs to contiguous area and + * 1 on success. 
+ */ +int dma_release_from_contiguous(struct device *dev, struct page *pages, + int count) +{ + struct cma *cma = get_dev_cma_area(dev); + unsigned long pfn; + + if (!cma || !pages) + return 0; + + pr_debug("%s(page %p)\n", __func__, (void *)pages); + + pfn = page_to_pfn(pages); + + if (pfn < cma->base_pfn || pfn >= cma->base_pfn + cma->count) + return 0; + + mutex_lock(&cma_mutex); + + bitmap_clear(cma->bitmap, pfn - cma->base_pfn, count); + free_contig_pages(pages, count); + + mutex_unlock(&cma_mutex); + return 1; +} diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h new file mode 100644 index 0000000..905773e --- /dev/null +++ b/include/linux/dma-contiguous.h @@ -0,0 +1,102 @@ +#ifndef __LINUX_CMA_H +#define __LINUX_CMA_H + +/* + * Contiguous Memory Allocator for DMA mapping framework + * Copyright (c) 2010-2011 by Samsung Electronics. + * Written by: + * Marek Szyprowski m.szyprowski@samsung.com + * Michal Nazarewicz mina86@mina86.com + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License as + * published by the Free Software Foundation; either version 2 of the + * License or (at your optional) any later version of the license. + */ + +/* + * Contiguous Memory Allocator + * + * The Contiguous Memory Allocator (CMA) makes it possible to + * allocate big contiguous chunks of memory after the system has + * booted. + * + * Why is it needed? + * + * Various devices on embedded systems have no scatter-getter and/or + * IO map support and require contiguous blocks of memory to + * operate. They include devices such as cameras, hardware video + * coders, etc. + * + * Such devices often require big memory buffers (a full HD frame + * is, for instance, more then 2 mega pixels large, i.e. more than 6 + * MB of memory), which makes mechanisms such as kmalloc() or + * alloc_page() ineffective. + * + * At the same time, a solution where a big memory region is + * reserved for a device is suboptimal since often more memory is + * reserved then strictly required and, moreover, the memory is + * inaccessible to page system even if device drivers don't use it. + * + * CMA tries to solve this issue by operating on memory regions + * where only movable pages can be allocated from. This way, kernel + * can use the memory for pagecache and when device driver requests + * it, allocated pages can be migrated. + * + * Driver usage + * + * CMA should not be used by the device drivers directly. It is + * only a helper framework for dma-mapping subsystem. 
+ * + * For more information, see kernel-docs in drivers/base/dma-contiguous.c + */ + +#ifdef __KERNEL__ + +struct cma; +struct page; +struct device; + +#ifdef CONFIG_CMA + +extern struct cma *dma_contiguous_default_area; + +void dma_contiguous_reserve(phys_addr_t addr_limit); +int dma_declare_contiguous(struct device *dev, unsigned long size, + phys_addr_t base, phys_addr_t limit); + +struct page *dma_alloc_from_contiguous(struct device *dev, int count, + unsigned int order); +int dma_release_from_contiguous(struct device *dev, struct page *pages, + int count); + +#else + +static inline void dma_contiguous_reserve(phys_addr_t limit) { } + +static inline +int dma_declare_contiguous(struct device *dev, unsigned long size, + phys_addr_t base, phys_addr_t limit) +{ + return -ENOSYS; +} + +static inline +struct page *dma_alloc_from_contiguous(struct device *dev, int count, + unsigned int order) +{ + return NULL; +} + +static inline +int dma_release_from_contiguous(struct device *dev, struct page *pages, + int count) +{ + return 0; +} + +#endif + +#endif + +#endif
This patch adds support for CMA to the dma-mapping subsystem on the ARM architecture. By default a global CMA area is used, but specific devices are allowed to have their own private memory areas if required (these can be created with the dma_declare_contiguous() function during board initialisation).
Contiguous memory areas reserved for DMA are remapped with 2-level page tables at boot. Once a buffer is requested, the low-memory kernel mapping is updated to match the requested memory access type.
GFP_ATOMIC allocations are served from a special pool which is created early during boot. This way no remapping of page attributes is needed at allocation time.
CMA has been enabled unconditionally for ARMv6+ systems.
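For illustration only (this hypothetical driver fragment is not part of the patch): from the driver's point of view nothing changes, dma_alloc_coherent() internally selects the CMA area or the atomic coherent pool depending on the gfp flags. GFP_ATOMIC is shown here only to exercise the pool path; in real probe code GFP_KERNEL would normally be used.

#include <linux/dma-mapping.h>
#include <linux/mm.h>

static int example_probe(struct device *dev)
{
        dma_addr_t big_dma, small_dma;
        void *big, *small;

        /* non-atomic request: backed by the CMA area */
        big = dma_alloc_coherent(dev, 4 << 20, &big_dma, GFP_KERNEL);
        if (!big)
                return -ENOMEM;

        /* atomic request: backed by the preallocated coherent pool */
        small = dma_alloc_coherent(dev, PAGE_SIZE, &small_dma, GFP_ATOMIC);
        if (!small) {
                dma_free_coherent(dev, 4 << 20, big, big_dma);
                return -ENOMEM;
        }

        dma_free_coherent(dev, PAGE_SIZE, small, small_dma);
        dma_free_coherent(dev, 4 << 20, big, big_dma);
        return 0;
}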
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- arch/arm/Kconfig | 2 + arch/arm/include/asm/device.h | 3 + arch/arm/include/asm/dma-contiguous.h | 33 +++ arch/arm/include/asm/mach/map.h | 1 + arch/arm/mm/dma-mapping.c | 362 +++++++++++++++++++++++++++------ arch/arm/mm/init.c | 8 + arch/arm/mm/mm.h | 3 + arch/arm/mm/mmu.c | 29 ++- 8 files changed, 366 insertions(+), 75 deletions(-) create mode 100644 arch/arm/include/asm/dma-contiguous.h
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 5ebc5d9..2327fdd 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -3,6 +3,8 @@ config ARM default y select HAVE_AOUT select HAVE_DMA_API_DEBUG + select HAVE_DMA_CONTIGUOUS if (CPU_V6 || CPU_V6K || CPU_V7) + select CMA if (CPU_V6 || CPU_V6K || CPU_V7) select HAVE_IDE select HAVE_MEMBLOCK select RTC_LIB diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h index 9f390ce..942913e 100644 --- a/arch/arm/include/asm/device.h +++ b/arch/arm/include/asm/device.h @@ -10,6 +10,9 @@ struct dev_archdata { #ifdef CONFIG_DMABOUNCE struct dmabounce_device_info *dmabounce; #endif +#ifdef CONFIG_CMA + struct cma *cma_area; +#endif };
struct pdev_archdata { diff --git a/arch/arm/include/asm/dma-contiguous.h b/arch/arm/include/asm/dma-contiguous.h new file mode 100644 index 0000000..6be12ba --- /dev/null +++ b/arch/arm/include/asm/dma-contiguous.h @@ -0,0 +1,33 @@ +#ifndef ASMARM_DMA_CONTIGUOUS_H +#define ASMARM_DMA_CONTIGUOUS_H + +#ifdef __KERNEL__ + +#include <linux/device.h> +#include <linux/dma-contiguous.h> + +#ifdef CONFIG_CMA + +#define MAX_CMA_AREAS (8) + +void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size); + +static inline struct cma *get_dev_cma_area(struct device *dev) +{ + if (dev && dev->archdata.cma_area) + return dev->archdata.cma_area; + return dma_contiguous_default_area; +} + +static inline void set_dev_cma_area(struct device *dev, struct cma *cma) +{ + dev->archdata.cma_area = cma; +} + +#else + +#define MAX_CMA_AREAS (0) + +#endif +#endif +#endif diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h index d2fedb5..05343de 100644 --- a/arch/arm/include/asm/mach/map.h +++ b/arch/arm/include/asm/mach/map.h @@ -29,6 +29,7 @@ struct map_desc { #define MT_MEMORY_NONCACHED 11 #define MT_MEMORY_DTCM 12 #define MT_MEMORY_ITCM 13 +#define MT_MEMORY_DMA_READY 14
#ifdef CONFIG_MMU extern void iotable_init(struct map_desc *, int); diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 0a0a1e7..a0006c9 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -17,13 +17,17 @@ #include <linux/init.h> #include <linux/device.h> #include <linux/dma-mapping.h> +#include <linux/dma-contiguous.h> #include <linux/highmem.h> +#include <linux/memblock.h>
#include <asm/memory.h> #include <asm/highmem.h> #include <asm/cacheflush.h> #include <asm/tlbflush.h> #include <asm/sizes.h> +#include <asm/mach/map.h> +#include <asm/dma-contiguous.h>
#include "mm.h"
@@ -54,6 +58,19 @@ static u64 get_coherent_dma_mask(struct device *dev) return mask; }
+static void __dma_clear_buffer(struct page *page, size_t size) +{ + void *ptr; + /* + * Ensure that the allocated pages are zeroed, and that any data + * lurking in the kernel direct-mapped region is invalidated. + */ + ptr = page_address(page); + memset(ptr, 0, size); + dmac_flush_range(ptr, ptr + size); + outer_flush_range(__pa(ptr), __pa(ptr) + size); +} + /* * Allocate a DMA buffer for 'dev' of size 'size' using the * specified gfp mask. Note that 'size' must be page aligned. @@ -62,23 +79,6 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf { unsigned long order = get_order(size); struct page *page, *p, *e; - void *ptr; - u64 mask = get_coherent_dma_mask(dev); - -#ifdef CONFIG_DMA_API_DEBUG - u64 limit = (mask + 1) & ~mask; - if (limit && size >= limit) { - dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n", - size, mask); - return NULL; - } -#endif - - if (!mask) - return NULL; - - if (mask < 0xffffffffULL) - gfp |= GFP_DMA;
page = alloc_pages(gfp, order); if (!page) @@ -91,14 +91,7 @@ static struct page *__dma_alloc_buffer(struct device *dev, size_t size, gfp_t gf for (p = page + (size >> PAGE_SHIFT), e = page + (1 << order); p < e; p++) __free_page(p);
- /* - * Ensure that the allocated pages are zeroed, and that any data - * lurking in the kernel direct-mapped region is invalidated. - */ - ptr = page_address(page); - memset(ptr, 0, size); - dmac_flush_range(ptr, ptr + size); - outer_flush_range(__pa(ptr), __pa(ptr) + size); + __dma_clear_buffer(page, size);
return page; } @@ -157,6 +150,9 @@ static int __init consistent_init(void) int i = 0; u32 base = CONSISTENT_BASE;
+ if (cpu_architecture() >= CPU_ARCH_ARMv6) + return 0; + do { pgd = pgd_offset(&init_mm, base);
@@ -188,9 +184,102 @@ static int __init consistent_init(void)
return ret; } - core_initcall(consistent_init);
+static void *__alloc_from_contiguous(struct device *dev, size_t size, + pgprot_t prot, struct page **ret_page); + +static struct arm_vmregion_head coherent_head = { + .vm_lock = __SPIN_LOCK_UNLOCKED(&coherent_head.vm_lock), + .vm_list = LIST_HEAD_INIT(coherent_head.vm_list), +}; + +size_t coherent_pool_size = CONSISTENT_DMA_SIZE / 8; + +static int __init early_coherent_pool(char *p) +{ + coherent_pool_size = memparse(p, &p); + return 0; +} +early_param("coherent_pool", early_coherent_pool); + +/* + * Initialise the coherent pool for atomic allocations. + */ +static int __init coherent_init(void) +{ + pgprot_t prot = pgprot_dmacoherent(pgprot_kernel); + size_t size = coherent_pool_size; + struct page *page; + void *ptr; + + if (cpu_architecture() < CPU_ARCH_ARMv6) + return 0; + + ptr = __alloc_from_contiguous(NULL, size, prot, &page); + if (ptr) { + coherent_head.vm_start = (unsigned long) ptr; + coherent_head.vm_end = (unsigned long) ptr + size; + printk(KERN_INFO "DMA: preallocated %u KiB pool for atomic coherent allocations\n", + (unsigned)size / 1024); + return 0; + } + printk(KERN_ERR "DMA: failed to allocate %u KiB pool for atomic coherent allocation\n", + (unsigned)size / 1024); + return -ENOMEM; +} +/* + * CMA is activated by core_initcall, so we must be called after it + */ +postcore_initcall(coherent_init); + +struct dma_contiguous_early_reserve { + phys_addr_t base; + unsigned long size; +}; + +static struct dma_contiguous_early_reserve +dma_mmu_remap[MAX_CMA_AREAS] __initdata; + +static int dma_mmu_remap_num __initdata; + +void __init dma_contiguous_early_fixup(phys_addr_t base, unsigned long size) +{ + dma_mmu_remap[dma_mmu_remap_num].base = base; + dma_mmu_remap[dma_mmu_remap_num].size = size; + dma_mmu_remap_num++; +} + +void __init dma_contiguous_remap(void) +{ + int i; + for (i = 0; i < dma_mmu_remap_num; i++) { + phys_addr_t start = dma_mmu_remap[i].base; + phys_addr_t end = start + dma_mmu_remap[i].size; + struct map_desc map; + unsigned long addr; + + if (end > arm_lowmem_limit) + end = arm_lowmem_limit; + if (start >= end) + return; + + map.pfn = __phys_to_pfn(start); + map.virtual = __phys_to_virt(start); + map.length = end - start; + map.type = MT_MEMORY_DMA_READY; + + /* + * Clear previous low-memory mapping + */ + for (addr = __phys_to_virt(start); addr < __phys_to_virt(end); + addr += PGDIR_SIZE) + pmd_clear(pmd_off_k(addr)); + + iotable_init(&map, 1); + } +} + static void * __dma_alloc_remap(struct page *page, size_t size, gfp_t gfp, pgprot_t prot) { @@ -296,31 +385,178 @@ static void __dma_free_remap(void *cpu_addr, size_t size) arm_vmregion_free(&consistent_head, c); }
+static int __dma_update_pte(pte_t *pte, pgtable_t token, unsigned long addr, + void *data) +{ + struct page *page = virt_to_page(addr); + pgprot_t prot = *(pgprot_t *)data; + + set_pte_ext(pte, mk_pte(page, prot), 0); + return 0; +} + +static void __dma_remap(struct page *page, size_t size, pgprot_t prot) +{ + unsigned long start = (unsigned long) page_address(page); + unsigned end = start + size; + + apply_to_page_range(&init_mm, start, size, __dma_update_pte, &prot); + dsb(); + flush_tlb_kernel_range(start, end); +} + +static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp, + pgprot_t prot, struct page **ret_page) +{ + struct page *page; + void *ptr; + page = __dma_alloc_buffer(dev, size, gfp); + if (!page) + return NULL; + + ptr = __dma_alloc_remap(page, size, gfp, prot); + if (!ptr) { + __dma_free_buffer(page, size); + return NULL; + } + + *ret_page = page; + return ptr; +} + +static void *__alloc_from_pool(struct device *dev, size_t size, + struct page **ret_page) +{ + struct arm_vmregion *c; + size_t align; + + if (!coherent_head.vm_start) { + printk(KERN_ERR "%s: coherent pool not initialised!\n", + __func__); + dump_stack(); + return NULL; + } + + align = 1 << fls(size - 1); + c = arm_vmregion_alloc(&coherent_head, align, size, 0); + if (c) { + void *ptr = (void *)c->vm_start; + struct page *page = virt_to_page(ptr); + *ret_page = page; + return ptr; + } + return NULL; +} + +static int __free_from_pool(void *cpu_addr, size_t size) +{ + unsigned long start = (unsigned long)cpu_addr; + unsigned long end = start + size; + struct arm_vmregion *c; + + if (start < coherent_head.vm_start || end > coherent_head.vm_end) + return 0; + + c = arm_vmregion_find_remove(&coherent_head, (unsigned long)start); + + if ((c->vm_end - c->vm_start) != size) { + printk(KERN_ERR "%s: freeing wrong coherent size (%ld != %d)\n", + __func__, c->vm_end - c->vm_start, size); + dump_stack(); + size = c->vm_end - c->vm_start; + } + + arm_vmregion_free(&coherent_head, c); + return 1; +} + +static void *__alloc_from_contiguous(struct device *dev, size_t size, + pgprot_t prot, struct page **ret_page) +{ + unsigned long order = get_order(size); + size_t count = size >> PAGE_SHIFT; + struct page *page; + + page = dma_alloc_from_contiguous(dev, count, order); + if (!page) + return NULL; + + __dma_clear_buffer(page, size); + __dma_remap(page, size, prot); + + *ret_page = page; + return page_address(page); +} + +static void __free_from_contiguous(struct device *dev, struct page *page, + size_t size) +{ + __dma_remap(page, size, pgprot_kernel); + dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT); +} + +#define nommu() 0 + #else /* !CONFIG_MMU */
-#define __dma_alloc_remap(page, size, gfp, prot) page_address(page) -#define __dma_free_remap(addr, size) do { } while (0) +#define nommu() 1 + +#define __alloc_remap_buffer(dev, size, gfp, prot, ret) NULL +#define __alloc_from_pool(dev, size, ret_page) NULL +#define __alloc_from_contiguous(dev, size, prot, ret) NULL +#define __free_from_pool(cpu_addr, size) 0 +#define __free_from_contiguous(dev, page, size) do { } while (0) +#define __dma_free_remap(cpu_addr, size) do { } while (0)
#endif /* CONFIG_MMU */
-static void * -__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, - pgprot_t prot) +static void *__alloc_simple_buffer(struct device *dev, size_t size, gfp_t gfp, + struct page **ret_page) { struct page *page; + page = __dma_alloc_buffer(dev, size, gfp); + if (!page) + return NULL; + + *ret_page = page; + return page_address(page); +} + + + +static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, + gfp_t gfp, pgprot_t prot) +{ + u64 mask = get_coherent_dma_mask(dev); + struct page *page = NULL; void *addr;
- *handle = ~0; - size = PAGE_ALIGN(size); +#ifdef CONFIG_DMA_API_DEBUG + u64 limit = (mask + 1) & ~mask; + if (limit && size >= limit) { + dev_warn(dev, "coherent allocation too big (requested %#x mask %#llx)\n", + size, mask); + return NULL; + } +#endif
- page = __dma_alloc_buffer(dev, size, gfp); - if (!page) + if (!mask) return NULL;
- if (!arch_is_coherent()) - addr = __dma_alloc_remap(page, size, gfp, prot); + if (mask < 0xffffffffULL) + gfp |= GFP_DMA; + + *handle = ~0; + size = PAGE_ALIGN(size); + + if (arch_is_coherent() || nommu()) + addr = __alloc_simple_buffer(dev, size, gfp, &page); + else if (cpu_architecture() < CPU_ARCH_ARMv6) + addr = __alloc_remap_buffer(dev, size, gfp, prot, &page); + else if (gfp & GFP_ATOMIC) + addr = __alloc_from_pool(dev, size, &page); else - addr = page_address(page); + addr = __alloc_from_contiguous(dev, size, prot, &page);
if (addr) *handle = pfn_to_dma(dev, page_to_pfn(page)); @@ -332,8 +568,8 @@ __dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, * Allocate DMA-coherent memory space and return both the kernel remapped * virtual and bus address for that space. */ -void * -dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp) +void *dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, + gfp_t gfp) { void *memory;
@@ -362,25 +598,11 @@ static int dma_mmap(struct device *dev, struct vm_area_struct *vma, { int ret = -ENXIO; #ifdef CONFIG_MMU - unsigned long user_size, kern_size; - struct arm_vmregion *c; - - user_size = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT; - - c = arm_vmregion_find(&consistent_head, (unsigned long)cpu_addr); - if (c) { - unsigned long off = vma->vm_pgoff; - - kern_size = (c->vm_end - c->vm_start) >> PAGE_SHIFT; - - if (off < kern_size && - user_size <= (kern_size - off)) { - ret = remap_pfn_range(vma, vma->vm_start, - page_to_pfn(c->vm_pages) + off, - user_size << PAGE_SHIFT, - vma->vm_page_prot); - } - } + unsigned long pfn = dma_to_pfn(dev, dma_addr); + ret = remap_pfn_range(vma, vma->vm_start, + pfn + vma->vm_pgoff, + vma->vm_end - vma->vm_start, + vma->vm_page_prot); #endif /* CONFIG_MMU */
return ret; @@ -402,23 +624,33 @@ int dma_mmap_writecombine(struct device *dev, struct vm_area_struct *vma, } EXPORT_SYMBOL(dma_mmap_writecombine);
+ /* - * free a page as defined by the above mapping. - * Must not be called with IRQs disabled. + * Free a buffer as defined by the above mapping. */ void dma_free_coherent(struct device *dev, size_t size, void *cpu_addr, dma_addr_t handle) { - WARN_ON(irqs_disabled()); + struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
if (dma_release_from_coherent(dev, get_order(size), cpu_addr)) return;
size = PAGE_ALIGN(size);
- if (!arch_is_coherent()) + if (arch_is_coherent() || nommu()) { + __dma_free_buffer(page, size); + } else if (cpu_architecture() < CPU_ARCH_ARMv6) { __dma_free_remap(cpu_addr, size); - - __dma_free_buffer(pfn_to_page(dma_to_pfn(dev, handle)), size); + __dma_free_buffer(page, size); + } else { + if (__free_from_pool(cpu_addr, size)) + return; + /* + * Non-atomic allocations cannot be freed with IRQs disabled + */ + WARN_ON(irqs_disabled()); + __free_from_contiguous(dev, page, size); + } } EXPORT_SYMBOL(dma_free_coherent);
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 91bca35..b35b782 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -20,6 +20,7 @@ #include <linux/gfp.h> #include <linux/memblock.h> #include <linux/sort.h> +#include <linux/dma-contiguous.h>
#include <asm/mach-types.h> #include <asm/prom.h> @@ -370,6 +371,13 @@ void __init arm_memblock_init(struct meminfo *mi, struct machine_desc *mdesc) if (mdesc->reserve) mdesc->reserve();
+ /* reserve memory for DMA contigouos allocations */ +#ifdef CONFIG_ZONE_DMA + dma_contiguous_reserve(PHYS_OFFSET + mdesc->dma_zone_size - 1); +#else + dma_contiguous_reserve(0); +#endif + memblock_analyze(); memblock_dump_all(); } diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h index 0105667..4bc58df 100644 --- a/arch/arm/mm/mm.h +++ b/arch/arm/mm/mm.h @@ -29,5 +29,8 @@ extern u32 arm_dma_limit; #define arm_dma_limit ((u32)~0) #endif
+extern phys_addr_t arm_lowmem_limit; + void __init bootmem_init(void); void arm_mm_memblock_reserve(void); +void dma_contiguous_remap(void); diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c index 594d677..bc4e7e8 100644 --- a/arch/arm/mm/mmu.c +++ b/arch/arm/mm/mmu.c @@ -273,6 +273,11 @@ static struct mem_type mem_types[] = { .prot_l1 = PMD_TYPE_TABLE, .domain = DOMAIN_KERNEL, }, + [MT_MEMORY_DMA_READY] = { + .prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY, + .prot_l1 = PMD_TYPE_TABLE, + .domain = DOMAIN_KERNEL, + }, };
const struct mem_type *get_mem_type(unsigned int type) @@ -414,6 +419,7 @@ static void __init build_mem_type_table(void) if (arch_is_coherent() && cpu_is_xsc3()) { mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S; mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED; + mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED; mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S; mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED; } @@ -443,6 +449,7 @@ static void __init build_mem_type_table(void) mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED; mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S; mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED; + mem_types[MT_MEMORY_DMA_READY].prot_pte |= L_PTE_SHARED; mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S; mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED; } @@ -482,6 +489,7 @@ static void __init build_mem_type_table(void) mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask; mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd; mem_types[MT_MEMORY].prot_pte |= kern_pgprot; + mem_types[MT_MEMORY_DMA_READY].prot_pte |= kern_pgprot; mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask; mem_types[MT_ROM].prot_sect |= cp->pmd;
@@ -561,7 +569,7 @@ static void __init alloc_init_section(pud_t *pud, unsigned long addr, * L1 entries, whereas PGDs refer to a group of L1 entries making * up one logical pointer to an L2 table. */ - if (((addr | end | phys) & ~SECTION_MASK) == 0) { + if (type->prot_sect && ((addr | end | phys) & ~SECTION_MASK) == 0) { pmd_t *p = pmd;
if (addr & SECTION_SIZE) @@ -757,7 +765,7 @@ static int __init early_vmalloc(char *arg) } early_param("vmalloc", early_vmalloc);
-static phys_addr_t lowmem_limit __initdata = 0; +phys_addr_t arm_lowmem_limit __initdata = 0;
void __init sanity_check_meminfo(void) { @@ -826,8 +834,8 @@ void __init sanity_check_meminfo(void) bank->size = newsize; } #endif - if (!bank->highmem && bank->start + bank->size > lowmem_limit) - lowmem_limit = bank->start + bank->size; + if (!bank->highmem && bank->start + bank->size > arm_lowmem_limit) + arm_lowmem_limit = bank->start + bank->size;
j++; } @@ -852,7 +860,7 @@ void __init sanity_check_meminfo(void) } #endif meminfo.nr_banks = j; - memblock_set_current_limit(lowmem_limit); + memblock_set_current_limit(arm_lowmem_limit); }
static inline void prepare_page_table(void) @@ -877,8 +885,8 @@ static inline void prepare_page_table(void) * Find the end of the first block of lowmem. */ end = memblock.memory.regions[0].base + memblock.memory.regions[0].size; - if (end >= lowmem_limit) - end = lowmem_limit; + if (end >= arm_lowmem_limit) + end = arm_lowmem_limit;
/* * Clear out all the kernel space mappings, except for the first @@ -1010,8 +1018,8 @@ static void __init map_lowmem(void) phys_addr_t end = start + reg->size; struct map_desc map;
- if (end > lowmem_limit) - end = lowmem_limit; + if (end > arm_lowmem_limit) + end = arm_lowmem_limit; if (start >= end) break;
@@ -1032,11 +1040,12 @@ void __init paging_init(struct machine_desc *mdesc) { void *zero_page;
- memblock_set_current_limit(lowmem_limit); + memblock_set_current_limit(arm_lowmem_limit);
build_mem_type_table(); prepare_page_table(); map_lowmem(); + dma_contiguous_remap(); devicemaps_init(mdesc); kmap_init();
On Fri, Aug 19, 2011 at 10:27, Marek Szyprowski wrote:
arch/arm/include/asm/device.h | 3 + arch/arm/include/asm/dma-contiguous.h | 33 +++
seems like these would be good asm-generic/ additions rather than arm -mike
Hello,
On Thursday, September 08, 2011 7:27 PM Mike Frysinger wrote:
On Fri, Aug 19, 2011 at 10:27, Marek Szyprowski wrote:
arch/arm/include/asm/device.h | 3 + arch/arm/include/asm/dma-contiguous.h | 33 +++
seems like these would be good asm-generic/ additions rather than arm
Only some of them can really be moved to asm-generic, IMHO. The following lines are definitely architecture specific:
void dma_contiguous_early_fixup(phys_addr_t base, unsigned long size);
Some other archs might define an empty fixup function. Right now only the ARM architecture is a real client of CMA. IMHO, if any other arch starts using CMA, some of the CMA definitions can then be moved to asm-generic. For now I wanted to keep it as simple as possible.
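Just to illustrate (this is only a sketch, no such file exists in this patch set), a minimal asm-generic/dma-contiguous.h for an architecture that needs no early fixup and no per-device areas could look roughly like this:

#ifndef _ASM_GENERIC_DMA_CONTIGUOUS_H
#define _ASM_GENERIC_DMA_CONTIGUOUS_H

#include <linux/types.h>
#include <linux/device.h>
#include <linux/dma-contiguous.h>

/* only the global default area, no per-device areas in this sketch */
#define MAX_CMA_AREAS   (1)

static inline struct cma *get_dev_cma_area(struct device *dev)
{
        return dma_contiguous_default_area;
}

static inline void set_dev_cma_area(struct device *dev, struct cma *cma)
{
        dma_contiguous_default_area = cma;
}

/* nothing to do when the kernel mapping needs no adjustment */
static inline void dma_contiguous_early_fixup(phys_addr_t base,
                                              unsigned long size)
{
}

#endif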
Best regards
This patch is an example of how a device-private CMA area can be activated. It creates one CMA region and assigns it to the first s5p-fimc device on the Samsung Goni S5PC110 board.
Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Kyungmin Park kyungmin.park@samsung.com --- arch/arm/mach-s5pv210/mach-goni.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/arch/arm/mach-s5pv210/mach-goni.c b/arch/arm/mach-s5pv210/mach-goni.c index 14578f5..f766c45 100644 --- a/arch/arm/mach-s5pv210/mach-goni.c +++ b/arch/arm/mach-s5pv210/mach-goni.c @@ -26,6 +26,7 @@ #include <linux/input.h> #include <linux/gpio.h> #include <linux/interrupt.h> +#include <linux/dma-contiguous.h>
#include <asm/mach/arch.h> #include <asm/mach/map.h> @@ -857,6 +858,9 @@ static void __init goni_map_io(void) static void __init goni_reserve(void) { s5p_mfc_reserve_mem(0x43000000, 8 << 20, 0x51000000, 8 << 20); + + /* Create private 16MiB contiguous memory area for s5p-fimc.0 device */ + dma_declare_contiguous(&s5p_device_fimc0.dev, 16*SZ_1M, 0); }
static void __init goni_machine_init(void)