On Thu, Aug 30, 2012 at 6:40 PM, Laura Abbott <lauraa@codeaurora.org> wrote:
Hi,


On 8/30/2012 8:45 AM, Aubertin, Guillaume wrote:
Looping in the linaro-mm-sig ML.

On Thu, Aug 30, 2012 at 4:47 PM, Aubertin, Guillaume <g-aubertin@ti.com> wrote:

    hi guys,

    I've been working for a few days on getting a proper rmmod with the
    remoteproc/rpmsg modules, and I stumbled upon an interesting issue.

    When doing successive memory allocations and releases in the CMA
    reservation (by loading/unloading the firmware several times), the
    following message shows up:

    [  119.908477] cma: dma_alloc_from_contiguous(cma ed10ad00, count
    256, align 8)
    [  119.908843] cma: dma_alloc_from_contiguous(): memory range at
    c0dfb000 is busy, retrying
    [  119.909698] cma: dma_alloc_from_contiguous(): returned c0dfd000

    dma_alloc_from_contiguous() then retries with the next range,
    0xc0dfd000, successfully this time.
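
    For reference, the retry behaviour comes from the allocation loop in
    dma_alloc_from_contiguous() itself (drivers/base/dma-contiguous.c).
    A simplified sketch of that loop (3.5-era code, trimmed, not verbatim):

        /* simplified sketch of the dma_alloc_from_contiguous() loop */
        mask = (1 << align) - 1;

        mutex_lock(&cma_mutex);
        for (;;) {
                /* find the next run of 'count' free pages in the CMA bitmap */
                pageno = bitmap_find_next_zero_area(cma->bitmap, cma->count,
                                                    start, count, mask);
                if (pageno >= cma->count)
                        break;                  /* nothing left to try */

                pfn = cma->base_pfn + pageno;
                /* migrate or free whatever currently occupies that range */
                ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
                if (ret == 0) {
                        bitmap_set(cma->bitmap, pageno, count);
                        page = pfn_to_page(pfn);
                        break;
                } else if (ret != -EBUSY) {
                        break;                  /* hard failure */
                }
                /* -EBUSY: skip past the busy range, try the next candidate */
                pr_debug("cma: memory range at %p is busy, retrying\n",
                         pfn_to_page(pfn));
                start = pageno + mask + 1;
        }
        mutex_unlock(&cma_mutex);

    So every "busy, retrying" line is one failed alloc_contig_range() call,
    and the allocation only fails for good once no candidate range can be
    found, which is what happens below with count 768.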

    In some cases, the allocation fails after trying several ranges:

    [  119.912231] cma: dma_alloc_from_contiguous(cma ed10ad00, count
    768, align 8)
    [  119.912719] cma: dma_alloc_from_contiguous(): memory range at
    c0dff000 is busy, retrying
    [  119.913055] cma: dma_alloc_from_contiguous(): memory range at
    c0e01000 is busy, retrying
    [  119.913055] rproc remoteproc0: dma_alloc_coherent failed: 3145728

    Here is my understanding so far:

    First, even though we made a CMA reservation, the kernel can still
    allocate pages in this area, but these pages must be movable (user
    process pages, for example).
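
    (That is also why a user page can end up inside the CMA region in the
    first place: with CONFIG_CMA, the buddy allocator's fallback table lets
    MIGRATE_MOVABLE allocations be served from MIGRATE_CMA pageblocks.
    Roughly, from mm/page_alloc.c of that era, simplified and not verbatim:

        /* movable allocations may fall back to CMA pageblocks, so they
         * must stay migratable for when CMA later reclaims the range */
        static int fallbacks[MIGRATE_TYPES][4] = {
                ...
        #ifdef CONFIG_CMA
                [MIGRATE_MOVABLE] = { MIGRATE_CMA, MIGRATE_RECLAIMABLE,
                                      MIGRATE_UNMOVABLE, MIGRATE_RESERVE },
                [MIGRATE_CMA]     = { MIGRATE_RESERVE }, /* never used */
        #else
                [MIGRATE_MOVABLE] = { MIGRATE_RECLAIMABLE, MIGRATE_UNMOVABLE,
                                      MIGRATE_RESERVE },
        #endif
                ...
        };
    )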

    When dma_alloc_from_contiguous() is called to allocate X pages, it
    looks for the next X contiguous free pages in its CMA bitmap (with
    respect to the requested alignment). Then alloc_contig_range() is
    called to allocate the given range of pages. alloc_contig_range()
    examines the pages we want to allocate and, if a page is already in
    use, migrates its contents to a new page outside the range we want
    to reserve. This is done using isolate_migratepages_range() to list
    the pages to migrate, and migrate_pages() to actually migrate them,
    and that is where it fails. Below is the call chain that follows:

      fallback_migrate_page() --> migrate_page() --> try_to_release_page()
      --> try_to_free_buffers() --> drop_buffers() --> buffer_busy()
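
    For reference, here is roughly what the last two calls do (simplified
    from fs/buffer.c, not verbatim):

        /* a buffer head with an elevated refcount, or that is dirty or
         * locked, cannot be dropped */
        static inline int buffer_busy(struct buffer_head *bh)
        {
                return atomic_read(&bh->b_count) |
                       (bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock)));
        }

        static int drop_buffers(struct page *page,
                                struct buffer_head **buffers_to_free)
        {
                struct buffer_head *head = page_buffers(page);
                struct buffer_head *bh = head;

                do {
                        /* one busy buffer head is enough to keep the
                         * whole page from being freed/migrated */
                        if (buffer_busy(bh))
                                return 0;
                        bh = bh->b_this_page;
                } while (bh != head);

                /* ... otherwise detach the buffers and report success ... */
                return 1;
        }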

    I understand here that the page contains used buffers that can't be
    dropped, and so the page can't be migrated. Well, I must admit that
    once here I'm feeling a little lost in this ocean of memory
    management code ;). After some research, I found the following
    thread on the linux-arm-kernel ML discussing the same issue:

    http://lists.infradead.org/pipermail/linux-arm-kernel/2012-June/102844.html

    with the following patch:

     mm/page_alloc.c |    3 ++-
     1 files changed, 2 insertions(+), 1 deletions(-)

    diff --git a/mm/page_alloc.c b/mm/page_alloc.c
    index 0e1c6f5..c9a6483 100644
    --- a/mm/page_alloc.c
    +++ b/mm/page_alloc.c
    @@ -1310,7 +1310,8 @@ void free_hot_cold_page(struct page *page, int cold)
          * excessively into the page allocator
          */
         if (migratetype >= MIGRATE_PCPTYPES) {
    -        if (unlikely(migratetype == MIGRATE_ISOLATE)) {
    +        if (unlikely(migratetype == MIGRATE_ISOLATE)
    +            || is_migrate_cma(migratetype)) {
                 free_one_page(zone, page, 0, migratetype);
                 goto out;
             }

    I tried the patch, and it seems to work (I didn't see any "memory
    range busy" message in 5000+ tests), but I'm afraid that this could
    have some nasty side effects.

    Any ideas?

    Thanks in advance,
    Guillaume


Hi,

Speaking as the author of that patch, I agree that it does have some nasty side effects and is not the right approach. I finally got down to debugging where the extra references came from and asked about it last night (http://lists.linaro.org/pipermail/linaro-mm-sig/2012-August/002510.html)

Basically, as long as the page cache buffers exist on the LRU list they can't be migrated away. I think the fix should be to drop the buffers from the LRU list when migrating away, but as mentioned there I don't really know the filesystem layer well enough to know if that is the right approach.
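
To make that a bit more concrete, here is a rough, untested sketch (not a proposal) of what I mean by dropping the buffers from the LRU before giving up: the per-CPU buffer-head LRUs are what hold the extra b_count references, and invalidate_bh_lrus() is the existing (heavy-handed, all-CPU) way to flush them; a targeted per-page eviction helper would be nicer but is hypothetical here. Note also that drop_buffers() normally runs under the mapping's private_lock, so in practice the flush would have to happen earlier or outside that lock.

    /* rough sketch only: retry the drop after flushing the per-CPU
     * buffer-head LRUs that may hold the extra references */
    static int drop_buffers_for_migration(struct page *page,
                                          struct buffer_head **buffers_to_free)
    {
            if (drop_buffers(page, buffers_to_free))
                    return 1;

            /* the remaining references may come only from the bh_lru
             * caches; flush them and try once more */
            invalidate_bh_lrus();

            return drop_buffers(page, buffers_to_free);
    }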

Unfortunately all the suggested fixes I've seen have nasty side effects so I don't think there has been any consensus on a good solution.

Thanks,
Laura

Thanks a lot for the clarification. I'll give your quick hack in find_or_create_page() a shot to make sure I don't have any other underlying issue, and wait until a proper solution is defined.

Guillaume
 






