On 3/20/19 1:16 AM, Oscar Salvador wrote:
On Wed, Mar 20, 2019 at 02:35:56AM +0800, Yang Shi wrote:
Fixes: 6f4576e3687b ("mempolicy: apply page table walker on queue_pages_range()")
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: stable@vger.kernel.org
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Hi Yang, thanks for the patch.
Some observations below.
 	}
 	page = pmd_page(*pmd);
@@ -473,8 +480,15 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
 	ret = 1;
 	flags = qp->flags;
 	/* go to thp migration */
-	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+		if (!vma_migratable(walk->vma)) {
+			ret = -EIO;
+			goto unlock;
+		}
+
 		migrate_page_add(page, qp->pagelist, flags);
+	} else
+		ret = -EIO;
Something like:

	if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) ||
	    !vma_migratable(walk->vma)) {
		ret = -EIO;
		goto unlock;
	}

	migrate_page_add(page, qp->pagelist, flags);
unlock:
	spin_unlock(ptl);
out:
	return ret;

looks cleaner to me. What do you think?
Yes, it does.
 unlock:
 	spin_unlock(ptl);
 out:
@@ -499,8 +513,10 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
 		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
-		if (ret)
+		if (ret > 0)
 			return 0;
+		else if (ret < 0)
+			return ret;
I would go with the following, but that's a matter of taste, I guess.
		if (ret < 0)
			return ret;
		else
			return 0;
No, this is not correct. queue_pages_pmd() may return 0, which means the THP was split. In that case the code must fall through to the PTE walk instead of returning.
 	}
 
 	if (pmd_trans_unstable(pmd))
@@ -521,11 +537,16 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 			continue;
 		if (!queue_pages_required(page, qp))
 			continue;
-		migrate_page_add(page, qp->pagelist, flags);
+		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+			if (!vma_migratable(vma))
+				break;
+			migrate_page_add(page, qp->pagelist, flags);
+		} else
+			break;
I might be missing something, but AFAICS neither vma nor flags is going to change while we are in queue_pages_pte_range(), so couldn't we move the check just above the loop? That way, 1) we only perform the check once, and 2) if we enter the loop, we know we are going to do some work. Something like:
index af171ccb56a2..7c0e44389826 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -487,6 +487,9 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
+	if (!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) || !vma_migratable(vma))
+		return -EIO;
+

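That doesn't sound correct to me. We need to check whether there is an existing page on a node that is not allowed by the policy, which is what queue_pages_required() does. With the check hoisted above the loop, we would return -EIO even when no page in the range violates the policy.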
Thanks,
Yang
 	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		if (!pte_present(*pte))
 	}
 	pte_unmap_unlock(pte - 1, ptl);
 	cond_resched();
-	return 0;
+	return addr != end ? -EIO : 0;
If we can do the above, we can leave the return value as it was.