On an ARMv7 platform with 1G of memory, when MAX_ORDER_NR_PAGES is big (e.g. 32M with max_order 14) and CMA memory is relatively small (e.g. 128M), we observed a large number of CMA allocation retries (1k+) during boot when allocating one page for each of the three mmc instance probes.
This happens because CMA has supported concurrent allocation since commit a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock"). The MAX_ORDER_NR_PAGES-aligned memory range from which we are trying to allocate may already have been acquired and isolated by another caller (see alloc_contig_range()).
The current cma_alloc() retries at bitmap_no + mask + 1, a small step that is very likely still within the same isolated range, so it fails again. When MAX_ORDER_NR_PAGES is big (e.g. 32M), retrying in such small steps is pointless: it is bound to fail a huge number of times because the memory range has been isolated by others, especially when allocating only one or two pages.
Instead of looping within the same isolated range and wasting CPU cycles, especially on systems with a big MAX_ORDER (e.g. 16M or 32M), we try the next MAX_ORDER_NR_PAGES-aligned range directly.
Doing so greatly mitigates the situation.
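To illustrate the difference between the two retry steps, here is a minimal user-space sketch, not kernel code. It assumes a 4K page size (so a 32M MAX_ORDER_NR_PAGES range is 8192 pages), order_per_bit 0, that every bitmap bit is free so each search returns bitmap_no == start, and that the whole range stays isolated by another allocator for the duration. ALIGN_UP() and count_retries() are hypothetical names for this example only. The result is an upper bound for this simplified model, not a reproduction of the numbers measured above.

```c
/*
 * Round x up to the next multiple of a (a must be a power of two),
 * mirroring what the kernel's ALIGN() macro does.
 */
#define ALIGN_UP(x, a) \
	(((unsigned long)(x) + ((unsigned long)(a) - 1)) & \
	 ~((unsigned long)(a) - 1))

/*
 * Hypothetical model: bitmap bits [0, busy_end) are isolated by someone
 * else, so every attempt that starts inside that range fails.
 * Returns the number of failed attempts before 'start' leaves the range.
 *
 * step_align == 0 models the old step:  start = bitmap_no + mask + 1
 * step_align != 0 models the new step:  start = ALIGN(bitmap_no + mask + 1,
 *                                                     step_align)
 */
static unsigned long count_retries(unsigned long start,
				   unsigned long busy_end,
				   unsigned long mask,
				   unsigned long step_align)
{
	unsigned long retries = 0;
	unsigned long next;

	while (start < busy_end) {
		retries++;			/* this attempt hit the busy range */
		next = start + mask + 1;	/* old small step */
		if (step_align)
			next = ALIGN_UP(next, step_align);
		start = next;
	}

	return retries;
}
```

Under these assumptions, with count 1 and align 0 (mask 0), count_retries(0, 8192, 0, 0) walks the isolated range one bit at a time, while count_retries(0, 8192, 0, 8192) leaves it after a single failed attempt.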
Below is the original error log during booting:
[ 2.004804] cma: cma_alloc(cma (ptrval), count 1, align 0)
[ 2.010318] cma: cma_alloc(cma (ptrval), count 1, align 0)
[ 2.010776] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[ 2.010785] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[ 2.010793] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[ 2.010800] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[ 2.010807] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[ 2.010814] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
.... (+1K retries)
With the fix, the 1200+ retries are reduced to 0. Another test with 8 threads calling dma_alloc_coherent() in parallel shows retries dropping from 1500+ to ~145.
IOW, this patch can significantly speed up CMA allocation when there is enough CMA memory, by greatly reducing the number of retries.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
CC: stable@vger.kernel.org # 5.11+
Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
---
v2->v3:
 * Improve commit messages
v1->v2:
 * change to align with MAX_ORDER_NR_PAGES instead of pageblock_nr_pages
---
 mm/cma.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/mm/cma.c b/mm/cma.c
index 46a9fd9f92c4..46bc12fe28b3 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -496,8 +496,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 
 		trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
 					   count, align);
-		/* try again with a bit different memory target */
-		start = bitmap_no + mask + 1;
+		/*
+		 * Try again with a bit different memory target.
+		 * Since memory isolated in alloc_contig_range() is aligned
+		 * with MAX_ORDER_NR_PAGES, instead of retrying in a small
+		 * step within the same isolated range, we try the next
+		 * available memory range directly.
+		 */
+		start = ALIGN(bitmap_no + mask + 1,
+			      MAX_ORDER_NR_PAGES >> cma->order_per_bit);
+
 	}
 
 	trace_cma_alloc_finish(cma->name, pfn, page, count, align);