On Wed, Mar 16, 2022 at 6:58 AM Andrew Morton akpm@linux-foundation.org wrote:
On Tue, 15 Mar 2022 22:45:20 +0800 Dong Aisheng aisheng.dong@nxp.com wrote:
--- a/mm/cma.c +++ b/mm/cma.c
...
@@ -457,6 +458,16 @@ struct page *cma_alloc(struct cma *cma, unsigned long count, offset); if (bitmap_no >= bitmap_maxno) { spin_unlock_irq(&cma->lock);
pr_debug("%s(): alloc fail, retry loop %d\n", __func__, loop++);
/*
* rescan as others may finish the memory migration
* and quit if no available CMA memory found finally
*/
if (start) {
schedule();
start = 0;
continue;
} break;
The schedule() is problematic. For a start, we'd normally use cond_resched() here, so we avoid calling the more expensive schedule() if we know it won't perform any action.
But cond_resched() is problematic if this thread has realtime scheduling policy and the process we're waiting on does not. One way to address that is to use an unconditional msleep(1), but that's still just a hack.
I think we can simply drop schedule() here during the second round of retry as the estimated delay may not be really needed.
Do you think that's ok?
A much cleaner solution is to use appropriate locking so that various threads run this code in order, without messing each other up.
And it looks like the way to do that is to simply revert the commit which caused this regression, a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")?
Yes, agree it could be a backup solution if not better ideas.
Regards Aisheng