Looping in the linaro-mm-sig ML.

On Thu, Aug 30, 2012 at 4:47 PM, Aubertin, Guillaume <g-aubertin@ti.com> wrote:
Hi guys,

I've been working for a few days on getting a proper rmmod with the remoteproc/rpmsg modules, and I stumbled upon an interesting issue. 

When doing successive memory allocations and releases in the CMA reservation (by loading/unloading the firmware several times), the following message shows up:

[  119.908477] cma: dma_alloc_from_contiguous(cma ed10ad00, count 256, align 8)
[  119.908843] cma: dma_alloc_from_contiguous(): memory range at c0dfb000 is busy, retrying
[  119.909698] cma: dma_alloc_from_contiguous(): returned c0dfd000

dma_alloc_from_contiguous() then tries to allocate the next range, 0xc0dfd000, successfully this time.

In some cases, the allocation fails after trying several ranges:

[  119.912231] cma: dma_alloc_from_contiguous(cma ed10ad00, count 768, align 8)
[  119.912719] cma: dma_alloc_from_contiguous(): memory range at c0dff000 is busy, retrying
[  119.913055] cma: dma_alloc_from_contiguous(): memory range at c0e01000 is busy, retrying
[  119.913055] rproc remoteproc0: dma_alloc_coherent failed: 3145728
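
As a quick cross-check of the two logs above: the count argument is a number of pages, so assuming 4 KiB pages (an assumption, the logs do not state the page size) the failing request of 768 pages is exactly the 3145728 bytes reported by remoteproc. A minimal sketch of that arithmetic, with ASSUMED_PAGE_SIZE and pages_to_bytes() being illustrative names only:

```c
#include <assert.h>
#include <limits.h>

/* Assumed page size: 4 KiB. The logs do not state it. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Convert a CMA page count (the "count" value in the log) into bytes. */
static unsigned long pages_to_bytes(unsigned long count)
{
    /* Guard against overflow of the multiplication. */
    assert(count <= ULONG_MAX / ASSUMED_PAGE_SIZE);
    return count * ASSUMED_PAGE_SIZE;
}
```

Under that assumption, pages_to_bytes(768) gives 3145728 (3 MiB) and pages_to_bytes(256) gives 1048576 (1 MiB) for the first, successful request.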

Here is my understanding so far:

First, even if we made a CMA reservation, the kernel can still allocate pages in this area, but these pages must be movable (user process pages, for example).

When dma_alloc_from_contiguous() is called to allocate X pages, it looks for the next X contiguous free pages in its CMA bitmap (respecting the requested alignment). Then alloc_contig_range() is called to allocate that range of pages. alloc_contig_range() examines the pages we want to allocate, and if a page is already in use, it is migrated to a new page outside the range we want to reserve. This is done using isolate_migratepages_range() to list the pages to migrate and migrate_pages() to try to migrate them, and that's where it fails. Below is the call chain that follows:

 fallback_migrate_page() --> migrate_page()  --> try_to_release_page()  --> try_to_free_buffers() --> drop_buffers() --> buffer_busy()
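
The search-and-retry behaviour described above can be sketched in user-space C. This is only a toy model, not the kernel code: struct toy_cma, AREA_PAGES, the pinned[] array (standing in for pages whose migration fails) and all function names are hypothetical, and the retry step of one alignment unit is an assumption about how the next candidate range is chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AREA_PAGES 64 /* hypothetical size of the toy CMA area, in pages */

struct toy_cma {
    bool used[AREA_PAGES];   /* stand-in for the CMA bitmap */
    bool pinned[AREA_PAGES]; /* pages that cannot be migrated away */
};

/* Find the first run of `count` free pages at or after `start`,
 * with the run start aligned to 2^align pages. Returns -1 if none. */
static long find_free_run(const struct toy_cma *cma, size_t start,
                          size_t count, unsigned align)
{
    size_t step = (size_t)1 << align;
    size_t pos = (start + step - 1) & ~(step - 1);

    for (; pos + count <= AREA_PAGES; pos += step) {
        size_t i;

        for (i = 0; i < count; i++)
            if (cma->used[pos + i])
                break;
        if (i == count)
            return (long)pos;
    }
    return -1;
}

/* Stand-in for alloc_contig_range(): the range can only be claimed
 * if every page in it can be migrated out of the way. */
static bool toy_claim_range(const struct toy_cma *cma, size_t pos, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (cma->pinned[pos + i])
            return false;
    return true;
}

/* Allocate `count` pages; returns the first page index, or -1 on failure. */
static long toy_alloc(struct toy_cma *cma, size_t count, unsigned align)
{
    size_t start = 0;

    assert(count > 0 && count <= AREA_PAGES);
    for (;;) {
        long pos = find_free_run(cma, start, count, align);

        if (pos < 0)
            return -1; /* no candidate range left: allocation fails */
        if (toy_claim_range(cma, (size_t)pos, count)) {
            for (size_t i = 0; i < count; i++)
                cma->used[(size_t)pos + i] = true;
            return pos;
        }
        /* "memory range ... is busy, retrying": try the next aligned range */
        start = (size_t)pos + ((size_t)1 << align);
    }
}
```

In this model a single pinned page in the first candidate range makes the allocator skip to the next aligned range, which mirrors the first log. When every remaining range contains a pinned page, toy_alloc() returns -1, which mirrors the second log.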

I understand from this that the page contains buffers in use that can't be dropped, so the page can't be migrated. Well, I must admit that at this point I'm feeling a little lost in this ocean of memory management code ;). After some research, I found the following thread on the linux-arm-kernel ML discussing the same issue:

http://lists.infradead.org/pipermail/linux-arm-kernel/2012-June/102844.html with the following patch:

 mm/page_alloc.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0e1c6f5..c9a6483 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1310,7 +1310,8 @@ void free_hot_cold_page(struct page *page, int cold)
  * excessively into the page allocator
  */
  if (migratetype >= MIGRATE_PCPTYPES) {
- if (unlikely(migratetype == MIGRATE_ISOLATE)) {
+ if (unlikely(migratetype == MIGRATE_ISOLATE)
+   || is_migrate_cma(migratetype)) {
  free_one_page(zone, page, 0, migratetype);
  goto out;
  }
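
To make the effect of the changed condition easier to see, here is a small stand-alone sketch of the test before and after the patch. The enum values are illustrative stand-ins (the real ones live in include/linux/mmzone.h and depend on kernel version and configuration), and the toy_ and frees_directly_ names are hypothetical. The hunk only shows the condition itself, so the sketch models nothing beyond "does this migratetype take the free_one_page() branch".

```c
#include <stdbool.h>

/* Illustrative stand-ins for the kernel's migrate types. The ordering
 * here is an assumption made for this sketch only. */
enum toy_migratetype {
    TOY_MIGRATE_UNMOVABLE,
    TOY_MIGRATE_RECLAIMABLE,
    TOY_MIGRATE_MOVABLE,
    TOY_MIGRATE_PCPTYPES, /* number of types kept on the per-CPU lists */
    TOY_MIGRATE_RESERVE = TOY_MIGRATE_PCPTYPES,
    TOY_MIGRATE_CMA,
    TOY_MIGRATE_ISOLATE,
};

static bool toy_is_migrate_cma(int migratetype)
{
    return migratetype == TOY_MIGRATE_CMA;
}

/* Condition before the patch: only isolated pages take the
 * free_one_page() branch. */
static bool frees_directly_before(int migratetype)
{
    return migratetype >= TOY_MIGRATE_PCPTYPES &&
           migratetype == TOY_MIGRATE_ISOLATE;
}

/* Condition after the patch: CMA pages take that branch as well. */
static bool frees_directly_after(int migratetype)
{
    return migratetype >= TOY_MIGRATE_PCPTYPES &&
           (migratetype == TOY_MIGRATE_ISOLATE ||
            toy_is_migrate_cma(migratetype));
}
```

The only migratetype whose answer changes between the two functions is the CMA one: with the patch, freed CMA pages are handed to free_one_page() instead of continuing past this check.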

I tried the patch, and it seems to work (I didn't see any "memory range busy" message in 5000+ tests), but I'm afraid that it could have some nasty side effects.

Any ideas?

Thanks in advance,
Guillaume


--
Texas Instruments France SA, 821 Avenue Jack Kilby, 06270 Villeneuve Loubet. 036 420 040 R.C.S Antibes. Capital de EUR 753.920


