On Mon, Sep 25, 2023 at 10:16 AM Yang Shi <yang@os.amperecomputing.com> wrote:
On 9/25/23 8:48 AM, Andrew Morton wrote:
On Wed, 20 Sep 2023 15:32:42 -0700 Yang Shi <yang@os.amperecomputing.com> wrote:
When calling mbind() with MPOL_MF_{MOVE|MOVEALL} | MPOL_MF_STRICT, the kernel should attempt to migrate all existing pages, and return -EIO if there is a misplaced or unmovable page. Then commit 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()") messed up the return value and no longer broke out of the VMA scan early when MPOL_MF_STRICT alone was specified. The return value problem was fixed by commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified"), but that commit breaks out of the VMA walk early when an unmovable page is met, so some pages may not be migrated as expected.
So I'm thinking that a7f40cfe3b7a is the suitable Fixes: target?
Yes, thanks. My follow-up email also added this.
The code should conceptually do:
    if (MPOL_MF_MOVE|MOVEALL)
        scan all vmas
        try to migrate the existing pages
        return success
    else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
        scan all vmas
        try to migrate the existing pages
        return -EIO if unmovable or migration failed
    else /* MPOL_MF_STRICT alone */
        break early if meets unmovable and don't call mbind_range() at all
    else /* none of those flags */
        check the ranges in test_walk, EFAULT without mbind_range() if discontig.
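For anyone who wants to check the four cases above mechanically, here is a rough userspace model of the intended behavior. It is only a sketch, not the kernel code: struct walk_outcome, expected_outcome() and policy_self_check() are made-up names, the SK_MF_* bits are local copies meant to mirror the uapi MPOL_MF_* values, and the -EIO for the MPOL_MF_STRICT-alone case is my assumption based on the mbind(2) man page, since the pseudocode does not spell out that return value. Like the pseudocode, it only looks at the discontiguous-range check in the "none of those flags" branch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local copies of the flag bits (assumed to mirror uapi linux/mempolicy.h). */
#define SK_MF_STRICT   (1u << 0)
#define SK_MF_MOVE     (1u << 1)
#define SK_MF_MOVE_ALL (1u << 2)

/* Hypothetical summary of what the walk is expected to do. */
struct walk_outcome {
    bool scan_all_vmas;     /* walk keeps going past an unmovable page */
    bool skips_mbind_range; /* we bail out before mbind_range() */
    int  ret;               /* 0 or a negative errno */
};

/*
 * Model of the pseudocode:
 *   unmovable_or_failed: an unmovable page was met or a migration failed
 *   discontig:           the requested range has a hole between VMAs
 */
struct walk_outcome expected_outcome(unsigned int flags,
                                     bool unmovable_or_failed,
                                     bool discontig)
{
    bool move   = (flags & (SK_MF_MOVE | SK_MF_MOVE_ALL)) != 0;
    bool strict = (flags & SK_MF_STRICT) != 0;
    struct walk_outcome out = {
        .scan_all_vmas = true, .skips_mbind_range = false, .ret = 0,
    };

    if (move && !strict) {
        /* Best effort: scan everything, migrate what we can, succeed. */
    } else if (move && strict) {
        /* Still scan everything, but report the failure afterwards. */
        if (unmovable_or_failed)
            out.ret = -EIO;
    } else if (strict) {
        /* MPOL_MF_STRICT alone: break early, no mbind_range(). */
        if (unmovable_or_failed) {
            out.scan_all_vmas = false;
            out.skips_mbind_range = true;
            out.ret = -EIO; /* assumption, per mbind(2) */
        }
    } else {
        /* None of the flags: only the range check in test_walk. */
        if (discontig) {
            out.scan_all_vmas = false;
            out.skips_mbind_range = true;
            out.ret = -EFAULT;
        }
    }
    return out;
}

/* Small self-check covering one input per branch. */
void policy_self_check(void)
{
    struct walk_outcome o;

    o = expected_outcome(SK_MF_MOVE, true, false);
    assert(o.scan_all_vmas && o.ret == 0);

    o = expected_outcome(SK_MF_MOVE_ALL | SK_MF_STRICT, true, false);
    assert(o.scan_all_vmas && o.ret == -EIO);

    o = expected_outcome(SK_MF_STRICT, true, false);
    assert(!o.scan_all_vmas && o.skips_mbind_range && o.ret == -EIO);

    o = expected_outcome(0, false, true);
    assert(o.skips_mbind_range && o.ret == -EFAULT);
}
```

The point of keeping scan_all_vmas separate from ret is that the regression described above was exactly the case where the return value was right (-EIO) but the walk stopped too early.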
With this change I think my temporary fix at https://lore.kernel.org/all/20230918211608.3580629-1-surenb@google.com/ can be removed because we either scan all vmas (which means we locked them all) or we break early and do not call mbind_range() at all (in which case we don't need vmas to be locked).
Fixed the behavior.