Hi,
The commit e9da6e9 "ARM: dma-mapping: remove custom consistent dma region" breaks compatibility with existing drivers and causes the following kernel oops (*1). The driver in question calls dma_pool_alloc() to allocate memory from interrupt context, and it hits BUG_ON(in_interrupt()) in get_vm_area_caller(). This patch series fixes the problem by making use of the pre-allocated atomic memory pool which the DMA mapping code is already using, in the same way as the normal DMA path does now.
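For reference, the pattern that triggers it looks roughly like the sketch below (illustrative only; struct and function names such as my_udc and my_udc_irq() are made up here, the real trigger in the log is the tegra UDC interrupt handler):

#include <linux/dmapool.h>
#include <linux/interrupt.h>

struct my_udc {
        struct dma_pool *dtd_pool;      /* created with dma_pool_create() at probe time */
};

static irqreturn_t my_udc_irq(int irq, void *dev_id)
{
        struct my_udc *udc = dev_id;
        dma_addr_t dma;
        void *dtd;

        /*
         * GFP_ATOMIC is correct here, but when the pool has to grow, the
         * IOMMU allocation path still ends up in get_vm_area_caller(),
         * which hits BUG_ON(in_interrupt()) -> the oops below.
         */
        dtd = dma_pool_alloc(udc->dtd_pool, GFP_ATOMIC, &dma);
        if (!dtd)
                return IRQ_HANDLED;

        /* ... fill in the descriptor and queue it ... */
        return IRQ_HANDLED;
}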
Any comment would be really appreciated.
v2:
  Don't modify attrs(DMA_ATTR_NO_KERNEL_MAPPING) for atomic allocation. (Marek)
  Skip vzalloc(). (KyongHo, Minchan)
v1: http://lists.linaro.org/pipermail/linaro-mm-sig/2012-August/002398.html
*1:
[ 8.321343] ------------[ cut here ]------------
[ 8.325971] kernel BUG at kernel/mm/vmalloc.c:1322!
[ 8.333615] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 8.339436] Modules linked in:
[ 8.342496] CPU: 0 Tainted: G W (3.4.6-00067-g5d485f7 #67)
[ 8.349192] PC is at __get_vm_area_node.isra.29+0x164/0x16c
[ 8.354758] LR is at get_vm_area_caller+0x4c/0x54
[ 8.359454] pc : [<c011297c>] lr : [<c011318c>] psr: 20000193
[ 8.359458] sp : c09edca0 ip : c09ec000 fp : ae278000
[ 8.370922] r10: f0000000 r9 : c011aa54 r8 : c0a26cb8
[ 8.376136] r7 : 00000001 r6 : 000000d0 r5 : 20000008 r4 : c09edca0
[ 8.382651] r3 : 00010000 r2 : 20000008 r1 : 00000001 r0 : 00001000
[ 8.389166] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 8.396549] Control: 10c5387d Table: ad98c04a DAC: 00000015
....
[ 9.169162] dfa0: 412fc099 c09ec000 00000000 c000fdd8 c06df1e4 c0a1b080 00000000 00000000
[ 9.177329] dfc0: c0a235cc 8000406a 00000000 c0986818 ffffffff ffffffff c0986404 00000000
[ 9.185497] dfe0: 00000000 c09bb070 10c5387d c0a19c58 c09bb064 80008044 00000000 00000000
[ 9.193673] [<c011297c>] (__get_vm_area_node.isra.29+0x164/0x16c) from [<c011318c>] (get_vm_area_caller+0x4c/0x54)
[ 9.204022] [<c011318c>] (get_vm_area_caller+0x4c/0x54) from [<c001aed8>] (__iommu_alloc_remap.isra.14+0x2c/0xfc)
[ 9.214276] [<c001aed8>] (__iommu_alloc_remap.isra.14+0x2c/0xfc) from [<c001b06c>] (arm_iommu_alloc_attrs+0xc4/0xf8)
[ 9.224795] [<c001b06c>] (arm_iommu_alloc_attrs+0xc4/0xf8) from [<c011aa54>] (pool_alloc_page.constprop.5+0x6c/0xf8)
[ 9.235309] [<c011aa54>] (pool_alloc_page.constprop.5+0x6c/0xf8) from [<c011ab60>] (dma_pool_alloc+0x80/0x170)
[ 9.245304] [<c011ab60>] (dma_pool_alloc+0x80/0x170) from [<c03cbbcc>] (tegra_build_dtd+0x48/0x14c)
[ 9.254344] [<c03cbbcc>] (tegra_build_dtd+0x48/0x14c) from [<c03cbd4c>] (tegra_req_to_dtd+0x7c/0xa8)
[ 9.263467] [<c03cbd4c>] (tegra_req_to_dtd+0x7c/0xa8) from [<c03cc140>] (tegra_ep_queue+0x154/0x33c)
[ 9.272592] [<c03cc140>] (tegra_ep_queue+0x154/0x33c) from [<c03dd5b4>] (composite_setup+0x364/0x6d4)
[ 9.281804] [<c03dd5b4>] (composite_setup+0x364/0x6d4) from [<c03dd9dc>] (android_setup+0xb8/0x14c)
[ 9.290843] [<c03dd9dc>] (android_setup+0xb8/0x14c) from [<c03cd144>] (setup_received_irq+0xbc/0x270)
[ 9.300053] [<c03cd144>] (setup_received_irq+0xbc/0x270) from [<c03cda64>] (tegra_udc_irq+0x2ac/0x2c4)
[ 9.309353] [<c03cda64>] (tegra_udc_irq+0x2ac/0x2c4) from [<c00b5708>] (handle_irq_event_percpu+0x78/0x2e0)
[ 9.319087] [<c00b5708>] (handle_irq_event_percpu+0x78/0x2e0) from [<c00b59b4>] (handle_irq_event+0x44/0x64)
[ 9.328907] [<c00b59b4>] (handle_irq_event+0x44/0x64) from [<c00b8688>] (handle_fasteoi_irq+0xc4/0x16c)
[ 9.338294] [<c00b8688>] (handle_fasteoi_irq+0xc4/0x16c) from [<c00b4f14>] (generic_handle_irq+0x34/0x48)
[ 9.347858] [<c00b4f14>] (generic_handle_irq+0x34/0x48) from [<c000f6f4>] (handle_IRQ+0x54/0xb4)
[ 9.356637] [<c000f6f4>] (handle_IRQ+0x54/0xb4) from [<c00084b0>] (gic_handle_irq+0x2c/0x60)
[ 9.365068] [<c00084b0>] (gic_handle_irq+0x2c/0x60) from [<c000e900>] (__irq_svc+0x40/0x70)
[ 9.373405] Exception stack(0xc09edf10 to 0xc09edf58)
[ 9.378447] df00: 00000000 000f4240 00000003 00000000
[ 9.386615] df20: 00000000 e55bbc00 ef66f3ca 00000001 00000000 412fc099 c0abb9c8 00000000
[ 9.394781] df40: 3b9ac9ff c09edf58 c027a9bc c0042880 20000113 ffffffff
[ 9.401396] [<c000e900>] (__irq_svc+0x40/0x70) from [<c0042880>] (tegra_idle_enter_lp3+0x68/0x78)
[ 9.410272] [<c0042880>] (tegra_idle_enter_lp3+0x68/0x78) from [<c04701d4>] (cpuidle_idle_call+0xdc/0x3a4)
[ 9.419922] [<c04701d4>] (cpuidle_idle_call+0xdc/0x3a4) from [<c000fdd8>] (cpu_idle+0xd8/0x134)
[ 9.428612] [<c000fdd8>] (cpu_idle+0xd8/0x134) from [<c0986818>] (start_kernel+0x27c/0x2cc)
[ 9.436952] Code: e1a00004 e3a04000 eb002265 eaffffe0 (e7f001f2)
[ 9.443038] ---[ end trace 1b75b31a2719ed24 ]---
[ 9.447645] Kernel panic - not syncing: Fatal exception in interrupt
Hiroshi Doyu (4):
  ARM: dma-mapping: Refactor out to introduce __in_atomic_pool
  ARM: dma-mapping: Use kzalloc() with GFP_ATOMIC
  ARM: dma-mapping: Refactor out to introduce __alloc_fill_pages
  ARM: dma-mapping: IOMMU allocates pages from atomic_pool with GFP_ATOMIC
 arch/arm/mm/dma-mapping.c | 102 ++++++++++++++++++++++++++++++++++-----------
 1 files changed, 77 insertions(+), 25 deletions(-)
Check whether the given range ("start", "size") is included in "atomic_pool" or not.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
---
 arch/arm/mm/dma-mapping.c | 25 +++++++++++++++++++------
 1 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 601da7a..aca2fd0 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -492,19 +492,32 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page)
 	return ptr;
 }
 
+static bool __in_atomic_pool(void *start, size_t size)
+{
+	struct dma_pool *pool = &atomic_pool;
+	void *end = start + size;
+	void *pool_start = pool->vaddr;
+	void *pool_end = pool->vaddr + pool->size;
+
+	if (start < pool_start || start > pool_end)
+		return false;
+
+	if (end > pool_end) {
+		WARN(1, "freeing wrong coherent size from pool\n");
+		return false;
+	}
+
+	return true;
+}
+
 static int __free_from_pool(void *start, size_t size)
 {
 	struct dma_pool *pool = &atomic_pool;
 	unsigned long pageno, count;
 	unsigned long flags;
 
-	if (start < pool->vaddr || start > pool->vaddr + pool->size)
-		return 0;
-
-	if (start + size > pool->vaddr + pool->size) {
-		WARN(1, "freeing wrong coherent size from pool\n");
+	if (!__in_atomic_pool(start, size))
 		return 0;
-	}
 
 	pageno = (start - pool->vaddr) >> PAGE_SHIFT;
 	count = size >> PAGE_SHIFT;
Use kzalloc() with GFP_ATOMIC instead of vzalloc(). At free time, __in_atomic_pool() checks whether the buffer came from atomic_pool or not.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
---
 arch/arm/mm/dma-mapping.c | 7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index aca2fd0..b64475a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1029,7 +1029,7 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t
 	int array_size = count * sizeof(struct page *);
 	int i = 0;
 
-	if (array_size <= PAGE_SIZE)
+	if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
 		pages = kzalloc(array_size, gfp);
 	else
 		pages = vzalloc(array_size);
@@ -1061,7 +1061,7 @@ error:
 	while (i--)
 		if (pages[i])
 			__free_pages(pages[i], 0);
-	if (array_size <= PAGE_SIZE)
+	if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
 		kfree(pages);
 	else
 		vfree(pages);
@@ -1076,7 +1076,8 @@ static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t s
 	for (i = 0; i < count; i++)
 		if (pages[i])
 			__free_pages(pages[i], 0);
-	if (array_size <= PAGE_SIZE)
+	if ((array_size <= PAGE_SIZE) ||
+	    __in_atomic_pool(page_address(pages[0]), size))
 		kfree(pages);
 	else
 		vfree(pages);
__alloc_fill_pages() fills the pages array by repeatedly allocating chunks of pages, using the largest possible power-of-2 order for each chunk.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
---
 arch/arm/mm/dma-mapping.c | 50 ++++++++++++++++++++++++++++++--------------
 1 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index b64475a..7ab016b 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1022,20 +1022,11 @@ static inline void __free_iova(struct dma_iommu_mapping *mapping,
 	spin_unlock_irqrestore(&mapping->lock, flags);
 }
 
-static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t gfp)
+static int __alloc_fill_pages(struct page ***ppages, int count, gfp_t gfp)
 {
-	struct page **pages;
-	int count = size >> PAGE_SHIFT;
-	int array_size = count * sizeof(struct page *);
+	struct page **pages = *ppages;
 	int i = 0;
 
-	if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
-		pages = kzalloc(array_size, gfp);
-	else
-		pages = vzalloc(array_size);
-	if (!pages)
-		return NULL;
-
 	while (count) {
 		int j, order = __fls(count);
 
@@ -1045,22 +1036,49 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size, gfp_t
 		if (!pages[i])
 			goto error;
 
-		if (order)
+		if (order) {
 			split_page(pages[i], order);
-		j = 1 << order;
-		while (--j)
-			pages[i + j] = pages[i] + j;
+			j = 1 << order;
+			while (--j)
+				pages[i + j] = pages[i] + j;
+		}
 
 		__dma_clear_buffer(pages[i], PAGE_SIZE << order);
 		i += 1 << order;
 		count -= 1 << order;
 	}
 
-	return pages;
+	return 0;
+
 error:
 	while (i--)
 		if (pages[i])
 			__free_pages(pages[i], 0);
+	return -ENOMEM;
+}
+
+static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
+					  gfp_t gfp)
+{
+	struct page **pages;
+	int count = size >> PAGE_SHIFT;
+	int array_size = count * sizeof(struct page *);
+	int err;
+
+	if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
+		pages = kzalloc(array_size, gfp);
+	else
+		pages = vzalloc(array_size);
+	if (!pages)
+		return NULL;
+
+	err = __alloc_fill_pages(&pages, count, gfp);
+	if (err)
+		goto error
+
+	return pages;
+
+error:
 	if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
 		kfree(pages);
 	else
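As a side note, the order-selection loop in __alloc_fill_pages() above is easiest to see with a concrete count. A small stand-alone illustration (plain user-space C, not kernel code) of the same "largest power-of-2 chunk first" strategy:

#include <stdio.h>

/*
 * Mimics how __alloc_fill_pages() picks allocation orders: always grab the
 * largest power-of-two chunk that still fits into the remaining count.
 */
int main(void)
{
        int count = 13; /* e.g. a 13-page buffer (52 KiB with 4 KiB pages) */

        while (count) {
                int order = 0;

                while ((2 << order) <= count)   /* largest 2^order <= count */
                        order++;

                printf("allocate order %d (%d pages)\n", order, 1 << order);
                count -= 1 << order;
        }
        return 0;
}

/* Prints: order 3 (8 pages), order 2 (4 pages), order 0 (1 page). */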
Makes use of the same atomic pool as DMA does, and skips the kernel page mapping, which can involve sleepable operations when allocating a kernel page table.
Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
---
 arch/arm/mm/dma-mapping.c | 30 +++++++++++++++++++++++++-----
 1 files changed, 25 insertions(+), 5 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 7ab016b..433312a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1063,7 +1063,6 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 	struct page **pages;
 	int count = size >> PAGE_SHIFT;
 	int array_size = count * sizeof(struct page *);
-	int err;
 
 	if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
 		pages = kzalloc(array_size, gfp);
@@ -1072,9 +1071,20 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
 	if (!pages)
 		return NULL;
 
-	err = __alloc_fill_pages(&pages, count, gfp);
-	if (err)
-		goto error
+	if (gfp & GFP_ATOMIC) {
+		struct page *page;
+		int i;
+		void *addr = __alloc_from_pool(size, &page);
+		if (!addr)
+			goto error;
+
+		for (i = 0; i < count; i++)
+			pages[i] = page + i;
+	} else {
+		int err = __alloc_fill_pages(&pages, count, gfp);
+		if (err)
+			goto error;
+	}
 
 	return pages;
 
@@ -1091,9 +1101,15 @@ static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t s
 	int count = size >> PAGE_SHIFT;
 	int array_size = count * sizeof(struct page *);
 	int i;
+
+	if (__free_from_pool(page_address(pages[0]), size))
+		goto out;
+
 	for (i = 0; i < count; i++)
 		if (pages[i])
 			__free_pages(pages[i], 0);
+
+out:
 	if ((array_size <= PAGE_SIZE) ||
 	    __in_atomic_pool(page_address(pages[0]), size))
 		kfree(pages);
@@ -1221,6 +1237,9 @@ static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
 	if (*handle == DMA_ERROR_CODE)
 		goto err_buffer;
 
+	if (gfp & GFP_ATOMIC)
+		return page_address(pages[0]);
+
 	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
 		return pages;
 
@@ -1279,7 +1298,8 @@ void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
 		return;
 	}
 
-	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs)) {
+	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs) ||
+	    !__in_atomic_pool(cpu_addr, size)) {
 		unmap_kernel_range((unsigned long)cpu_addr, size);
 		vunmap(cpu_addr);
 	}
Hi Hiroshi,
On Thursday, August 23, 2012 8:10 AM Hiroshi Doyu wrote:
> Makes use of the same atomic pool as DMA does, and skips the kernel page mapping, which can involve sleepable operations when allocating a kernel page table.
>
> Signed-off-by: Hiroshi Doyu <hdoyu@nvidia.com>
>  arch/arm/mm/dma-mapping.c | 30 +++++++++++++++++++++++++-----
>  1 files changed, 25 insertions(+), 5 deletions(-)
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 7ab016b..433312a 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1063,7 +1063,6 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>  	struct page **pages;
>  	int count = size >> PAGE_SHIFT;
>  	int array_size = count * sizeof(struct page *);
> -	int err;
>  
>  	if ((array_size <= PAGE_SIZE) || (gfp & GFP_ATOMIC))
>  		pages = kzalloc(array_size, gfp);
> @@ -1072,9 +1071,20 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
>  	if (!pages)
>  		return NULL;
>  
> -	err = __alloc_fill_pages(&pages, count, gfp);
> -	if (err)
> -		goto error
> +	if (gfp & GFP_ATOMIC) {
> +		struct page *page;
> +		int i;
> +		void *addr = __alloc_from_pool(size, &page);
> +		if (!addr)
> +			goto error;
> +
> +		for (i = 0; i < count; i++)
> +			pages[i] = page + i;
> +	} else {
> +		int err = __alloc_fill_pages(&pages, count, gfp);
> +		if (err)
> +			goto error;
> +	}
>  
>  	return pages;
>  
> @@ -1091,9 +1101,15 @@ static int __iommu_free_buffer(struct device *dev, struct page **pages, size_t s
>  	int count = size >> PAGE_SHIFT;
>  	int array_size = count * sizeof(struct page *);
>  	int i;
> +
> +	if (__free_from_pool(page_address(pages[0]), size))
> +		goto out;
> +
>  	for (i = 0; i < count; i++)
>  		if (pages[i])
>  			__free_pages(pages[i], 0);
> +
> +out:
>  	if ((array_size <= PAGE_SIZE) ||
>  	    __in_atomic_pool(page_address(pages[0]), size))
>  		kfree(pages);
> @@ -1221,6 +1237,9 @@ static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
>  	if (*handle == DMA_ERROR_CODE)
>  		goto err_buffer;
>  
> +	if (gfp & GFP_ATOMIC)
> +		return page_address(pages[0]);
> +
>  	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
>  		return pages;
I've read the whole code, and it looks like it is starting to become a little spaghetti - there are too many places altered by these atomic changes, and it is hard to follow the logic at first sight.
IMHO it would be better to add the following function:
static void *__iommu_alloc_atomic(struct device *dev, size_t size,
				  dma_addr_t *handle)
{
	struct page *page, **pages;
	int count = size >> PAGE_SHIFT;
	int array_size = count * sizeof(struct page *);
	void *addr;
	int i;

	pages = kzalloc(array_size, gfp);
	if (!pages)
		return NULL;

	addr = __alloc_from_pool(size, &page);
	if (!addr)
		goto err_pool;

	for (i = 0; i < count; i++)
		pages[i] = page + i;

	*handle = __iommu_create_mapping(dev, pages, size);
	if (*handle == DMA_ERROR_CODE)
		goto err_mapping;

	return addr;

err_mapping:
	__free_from_pool(addr, size);
err_pool:
	kfree(pages);
	return NULL;
}
and then call it at the beginning of the arm_iommu_alloc_attrs():
	if (gfp & GFP_ATOMIC)
		return __iommu_alloc_atomic(dev, size, handle);
You should also add support for allocations from atomic_pool to the __iommu_get_pages() function, to get dma_mmap() and dma_get_sgtable() working correctly.
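Just to illustrate the idea (this is only a sketch, not part of the patch; __atomic_get_pages() is a hypothetical helper that would return the pages array kept for an atomic-pool allocation):

static struct page **__iommu_get_pages(void *cpu_addr, struct dma_attrs *attrs)
{
	/*
	 * Atomic-pool allocations have no separate vm_struct, so they have
	 * to be recognized before the existing find_vm_area() lookup.
	 */
	if (__in_atomic_pool(cpu_addr, PAGE_SIZE))
		return __atomic_get_pages(cpu_addr);	/* hypothetical helper */

	/* ... existing DMA_ATTR_NO_KERNEL_MAPPING / find_vm_area() lookup ... */
	return NULL;
}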
> @@ -1279,7 +1298,8 @@ void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
>  		return;
>  	}
>  
> -	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs)) {
> +	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs) ||
> +	    !__in_atomic_pool(cpu_addr, size)) {
>  		unmap_kernel_range((unsigned long)cpu_addr, size);
>  		vunmap(cpu_addr);
>  	}
Are you sure this one works? __iommu_get_pages() won't find the pages array, because atomic allocations don't get their own separate vm_struct area, so this check will never be reached. Maybe a __iommu_free_atomic() function would also make it safer and easier to understand.
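For what it's worth, such a helper could be as small as the sketch below (again only a sketch, mirroring __iommu_alloc_atomic() above; it assumes the existing __iommu_remove_mapping() unmap helper in dma-mapping.c):

static void __iommu_free_atomic(struct device *dev, struct page **pages,
				dma_addr_t handle, size_t size)
{
	/* Tear down the IOVA mapping, give the memory back to the pool. */
	__iommu_remove_mapping(dev, handle, size);
	__free_from_pool(page_address(pages[0]), size);
	kfree(pages);
}

arm_iommu_free_attrs() could then call it early when __in_atomic_pool() matches, instead of spreading the atomic checks over the normal free path.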
Best regards
On Thu, Aug 23, 2012 at 09:10:25AM +0300, Hiroshi Doyu wrote:
> Hi,
>
> The commit e9da6e9 "ARM: dma-mapping: remove custom consistent dma region" breaks compatibility with existing drivers and causes the following kernel oops (*1). The driver in question calls dma_pool_alloc() to allocate memory from interrupt context, and it hits BUG_ON(in_interrupt()) in get_vm_area_caller(). This patch series fixes the problem by making use of the pre-allocated atomic memory pool which the DMA mapping code is already using, in the same way as the normal DMA path does now.
>
> Any comment would be really appreciated.
>
> v2:
>   Don't modify attrs(DMA_ATTR_NO_KERNEL_MAPPING) for atomic allocation. (Marek)
>   Skip vzalloc(). (KyongHo, Minchan)
Huh? I would like to set this straight: I didn't say that the kzalloc unification is okay. As KyongHo said, there are other use cases for allocating a big buffer, but I can't agree with his opinion that an allocation failure means the system is short of memory, because there can be lots of freeable pages (page cache, anonymous pages with swap, shrinkable slabs, and so on). But as I said earlier, in atomic context the VM can't do anything except rely on kswapd, even while there are a ton of freeable pages in the system, and we can't guarantee that kswapd will do something for us before we see the page allocation failure, unlike in process context. UP is especially problematic. So such a failure never indicates that the system needs more memory.
Atomic high-order allocation is very fragile from the VM's point of view, so the caller should have a fallback mechanism. Otherwise, don't do it.
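To make that concrete, the usual shape of such a fallback looks something like this (generic illustration, not code from this series; my_alloc_chunk() is a made-up name):

#include <linux/gfp.h>
#include <linux/mm.h>

/* Try a high-order allocation opportunistically; never depend on it. */
static struct page *my_alloc_chunk(unsigned int *order)
{
	struct page *page;

	page = alloc_pages(GFP_ATOMIC | __GFP_NOWARN, *order);
	if (page)
		return page;

	/* Fallback: order-0 pages are far more likely to succeed in atomic context. */
	*order = 0;
	return alloc_page(GFP_ATOMIC);
}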