[PATCH v4 1/4] mm/migrate_device.c: Flush TLB while holding PTL


Alistair Popple

2 Sep 2022

12:35 a.m.

When clearing a PTE, the TLB should be flushed whilst still holding the PTL to avoid a potential race with madvise/munmap/etc. For example, consider the following sequence:

CPU0                              CPU1
----                              ----
migrate_vma_collect_pmd()
pte_unmap_unlock()
                                  madvise(MADV_DONTNEED)
                                  -> zap_pte_range()
                                  pte_offset_map_lock()
                                  [ PTE not present, TLB not flushed ]
                                  pte_unmap_unlock()
                                  [ page is still accessible via stale TLB ]
flush_tlb_range()

In this case the page may still be accessed via the stale TLB entry after madvise returns. Fix this by flushing the TLB while holding the PTL.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Cc: stable@vger.kernel.org

---

Changes for v4:

- Added Reviewed-by

Changes for v3:

- New for v3
---
 mm/migrate_device.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 27fb37d..6a5ef9f 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -254,13 +254,14 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 		migrate->dst[migrate->npages] = 0;
 		migrate->src[migrate->npages++] = mpfn;
 	}
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(ptep - 1, ptl);
 
 	/* Only flush the TLB if we actually modified any entries */
 	if (unmapped)
 		flush_tlb_range(walk->vma, start, end);
 
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(ptep - 1, ptl);
+
 	return 0;
 }

base-commit: ffcf9c5700e49c0aee42dcba9a12ba21338e8136

-- git-series 0.9.1
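The ordering problem fixed by patch 1/4 can be reproduced in a deterministic, single-threaded toy model. This is a sketch under stated assumptions, not kernel code: every `toy_*` name is hypothetical, a pthread mutex stands in for the PTL, an array of cached values stands in for the TLB, and `toy_zap()` is a simplified model of `zap_pte_range()` that only flushes entries it clears itself. The `window` hook runs in the gap after the unlock, playing the role of CPU1 in the sequence above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code. 0 means "no entry". */
#define TOY_NPTES 8

struct toy_mm {
	pthread_mutex_t ptl;          /* stands in for the page table lock */
	unsigned long pte[TOY_NPTES]; /* 0 == PTE not present */
	unsigned long tlb[TOY_NPTES]; /* 0 == no cached translation */
};

typedef void (*toy_hook_fn)(struct toy_mm *mm);

static void toy_init(struct toy_mm *mm)
{
	pthread_mutex_init(&mm->ptl, NULL);
	for (size_t i = 0; i < TOY_NPTES; i++)
		mm->pte[i] = mm->tlb[i] = 0x1000 + i; /* mapped and cached */
}

static void toy_flush_tlb_range(struct toy_mm *mm, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NPTES);
	for (size_t i = start; i < end; i++)
		mm->tlb[i] = 0;
}

/* Simplified zap_pte_range() on "CPU1": it only flushes entries it clears
 * itself, so a non-present PTE with a live TLB entry stays reachable.
 * Returns how many such stale translations it saw. */
static size_t toy_zap(struct toy_mm *mm, size_t start, size_t end)
{
	size_t stale = 0;

	pthread_mutex_lock(&mm->ptl);
	for (size_t i = start; i < end; i++) {
		if (mm->pte[i]) {
			mm->pte[i] = 0;
			mm->tlb[i] = 0;
		} else if (mm->tlb[i]) {
			stale++;
		}
	}
	pthread_mutex_unlock(&mm->ptl);
	return stale;
}

/* Simplified migrate_vma_collect_pmd() on "CPU0". flush_under_lock selects
 * the patched ordering; 'window' runs after the unlock, modelling CPU1. */
static void toy_collect(struct toy_mm *mm, size_t start, size_t end,
			bool flush_under_lock, toy_hook_fn window)
{
	bool unmapped = false;

	pthread_mutex_lock(&mm->ptl);
	for (size_t i = start; i < end; i++) {
		if (mm->pte[i]) {
			mm->pte[i] = 0;
			unmapped = true;
		}
	}
	if (flush_under_lock && unmapped)    /* patched: flush, then unlock */
		toy_flush_tlb_range(mm, start, end);
	pthread_mutex_unlock(&mm->ptl);

	if (window)
		window(mm);
	if (!flush_under_lock && unmapped)   /* pre-patch: unlock, then flush */
		toy_flush_tlb_range(mm, start, end);
}

/* Demo hook: CPU1 runs madvise(MADV_DONTNEED) over the whole range. */
static size_t toy_last_stale;

static void toy_cpu1_madvise(struct toy_mm *mm)
{
	toy_last_stale = toy_zap(mm, 0, TOY_NPTES);
}
```

Calling `toy_collect(&mm, 0, 4, false, toy_cpu1_madvise)` leaves `toy_last_stale` at 4 (the four cleared entries are still cached when CPU1 looks), while passing `true` leaves it at 0, because the flush has completed before the lock is released.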


Alistair Popple

2 Sep

12:35 a.m.

New subject: [PATCH v4 2/4] mm/migrate_device.c: Add missing flush_cache_page()

Currently we only call flush_cache_page() for the anon_exclusive case. However, in both cases we clear the pte, so we should flush the cache.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Cc: stable@vger.kernel.org

---

New for v4
---
 mm/migrate_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6a5ef9f..4cc849c 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -193,9 +193,9 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			bool anon_exclusive;
 			pte_t swp_pte;
 
+			flush_cache_page(vma, addr, pte_pfn(*ptep));
 			anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
 			if (anon_exclusive) {
-				flush_cache_page(vma, addr, pte_pfn(*ptep));
 				ptep_clear_flush(vma, addr, ptep);
 
 				if (page_try_share_anon_rmap(page)) {

-- git-series 0.9.1
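Why the flush matters on both paths can be illustrated with a toy model of a single page behind a virtually indexed, write-back cache (on architectures without such aliasing caches, flush_cache_page() is typically a no-op). The sketch rests on assumptions: all `toy_*` names are hypothetical, one cache line stands for the whole page, and reading `phys` stands for the later migration copy, which goes through a different alias and so does not see data still sitting in the user mapping's cache line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one page behind a virtually indexed,
 * write-back cache. Not kernel code. */
struct toy_page {
	int phys;         /* physical page contents (what a copy would read) */
	int cached;       /* cache line reached through the user mapping */
	bool line_dirty;  /* cache line holds data not yet in 'phys' */
	bool pte_present;
};

/* A user store through the mapping lands in the cache only. */
static void toy_store(struct toy_page *p, int val)
{
	assert(p->pte_present);
	p->cached = val;
	p->line_dirty = true;
}

/* Stand-in for flush_cache_page(): write the dirty line back to memory. */
static void toy_flush_cache_page(struct toy_page *p)
{
	if (p->line_dirty) {
		p->phys = p->cached;
		p->line_dirty = false;
	}
}

/* Pre-patch, only the anon_exclusive path flushed; patched, both do. */
static void toy_unmap(struct toy_page *p, bool anon_exclusive, bool patched)
{
	if (patched || anon_exclusive)
		toy_flush_cache_page(p);
	p->pte_present = false;  /* the PTE is cleared on both paths */
}
```

In this model, the pre-patch non-exclusive path leaves the store in the cache line, so a copy made through `phys` reads the old contents; with `patched` set, the write-back happens before the PTE is cleared regardless of `anon_exclusive`.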

David Hildenbrand

6:51 a.m.

New subject: [PATCH v4 2/4] mm/migrate_device.c: Add missing flush_cache_page()

On 02.09.22 02:35, Alistair Popple wrote:

...


Reviewed-by: David Hildenbrand <david@redhat.com>

-- Thanks, David / dhildenb

Peter Xu

8:39 p.m.

New subject: [PATCH v4 2/4] mm/migrate_device.c: Add missing flush_cache_page()

On Fri, Sep 02, 2022 at 10:35:52AM +1000, Alistair Popple wrote:

...


This is the patch that starts to collide with David's.

David's patch also unifies both paths with ptep_get_and_clear(), but this patch is also correct as far as I can tell.

It'll probably just become an empty diff after rebasing, though. I'm not sure what the final ordering will be, but either way I think this patch stands on its own.

Acked-by: Peter Xu <peterx@redhat.com>

Thanks for tolerating my nitpicking,


-- Peter Xu

Alistair Popple

12:35 a.m.

New subject: [PATCH v4 3/4] mm/migrate_device.c: Copy pte dirty bit to page

migrate_vma_setup() has a fast path in migrate_vma_collect_pmd() that installs migration entries directly if it can lock the migrating page. When removing a dirty pte, the dirty bit is supposed to be carried over to the underlying page to prevent it being lost.

Currently migrate_vma_*() can only be used for private anonymous mappings. That means loss of the dirty bit usually doesn't result in data loss because these pages are typically not file-backed. However, pages may be backed by swap storage, which can result in data loss if an attempt is made to migrate a dirty page that doesn't yet have the PageDirty flag set.

In this case migration will fail due to unexpected references, but the dirty pte bit will be lost. If the page is subsequently reclaimed, data won't be written back to swap storage as it is considered uptodate, resulting in data loss if the page is subsequently accessed.

Prevent this by copying the dirty bit to the page when removing the pte to match what try_to_migrate_one() does.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reported-by: "Huang, Ying" <ying.huang@intel.com>
Fixes: 8c3328f1f36a ("mm/migrate: migrate_vma() unmap page from vma while collecting pages")
Cc: stable@vger.kernel.org

---

Changes for v4:

- Added Reviewed-by

Changes for v3:

- Defer TLB flushing
- Split a TLB flushing fix into a separate change.

Changes for v2:

- Fixed up Reported-by tag.
- Added Peter's Acked-by.
- Atomically read and clear the pte to prevent the dirty bit getting set after reading it.
- Added fixes tag
---
 mm/migrate_device.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 4cc849c..dbf6c7a 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -7,6 +7,7 @@
 #include <linux/export.h>
 #include <linux/memremap.h>
 #include <linux/migrate.h>
+#include <linux/mm.h>
 #include <linux/mm_inline.h>
 #include <linux/mmu_notifier.h>
 #include <linux/oom.h>
@@ -196,7 +197,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 			flush_cache_page(vma, addr, pte_pfn(*ptep));
 			anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
 			if (anon_exclusive) {
-				ptep_clear_flush(vma, addr, ptep);
+				pte = ptep_clear_flush(vma, addr, ptep);
 
 				if (page_try_share_anon_rmap(page)) {
 					set_pte_at(mm, addr, ptep, pte);
@@ -206,11 +207,15 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 					goto next;
 				}
 			} else {
-				ptep_get_and_clear(mm, addr, ptep);
+				pte = ptep_get_and_clear(mm, addr, ptep);
 			}
 
 			migrate->cpages++;
 
+			/* Set the dirty flag on the folio now the pte is gone. */
+			if (pte_dirty(pte))
+				folio_mark_dirty(page_folio(page));
+
 			/* Setup special migration page table entry */
 			if (mpfn & MIGRATE_PFN_WRITE)
 				entry = make_writable_migration_entry(

-- git-series 0.9.1
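The lost-dirty-bit scenario addressed by patch 3/4 can likewise be sketched in plain C11. Assumptions: the `toy_*` names and the PTE bit layout are made up; `atomic_exchange()` models ptep_get_and_clear(), so nothing can set the dirty bit between the read and the clear (the point of the v2 change); and `toy_reclaim_then_refault()` collapses reclaim, freeing and a later swap-in into one step. The failed migration itself is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE bit layout for this sketch only. */
#define TOY_PTE_PRESENT ((uint64_t)1 << 0)
#define TOY_PTE_DIRTY   ((uint64_t)1 << 6)

struct toy_page {
	bool dirty;     /* stands in for the folio dirty flag */
	int data;       /* in-memory contents */
	int swap_copy;  /* contents of the page's swap slot */
};

/* Models ptep_get_and_clear(): one atomic read-and-clear, so no dirty bit
 * can be set between reading the PTE and clearing it. */
static uint64_t toy_ptep_get_and_clear(_Atomic uint64_t *ptep)
{
	return atomic_exchange(ptep, 0);
}

/* copy_dirty selects the patched behaviour: move the PTE dirty bit to the
 * page before the PTE (and its dirty bit) is gone. */
static void toy_remove_pte(_Atomic uint64_t *ptep, struct toy_page *page,
			   bool copy_dirty)
{
	uint64_t pte = toy_ptep_get_and_clear(ptep);

	if (copy_dirty && (pte & TOY_PTE_DIRTY))
		page->dirty = true;  /* cf. folio_mark_dirty() */
}

/* Reclaim writes back only pages marked dirty; a clean page is assumed to
 * match swap and is simply dropped. A later fault then reads from swap. */
static int toy_reclaim_then_refault(struct toy_page *page)
{
	if (page->dirty) {
		page->swap_copy = page->data;
		page->dirty = false;
	}
	return page->swap_copy;
}
```

Without `copy_dirty`, the dirty bit disappears with the pte, reclaim treats the page as clean, and the refault returns the stale swap contents. With it, the page is written back first, mirroring what try_to_migrate_one() does according to the commit message.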

David Hildenbrand

6:53 a.m.

New subject: [PATCH v4 3/4] mm/migrate_device.c: Copy pte dirty bit to page

On 02.09.22 02:35, Alistair Popple wrote:

...

 			migrate->cpages++;
 
+			/* Set the dirty flag on the folio now the pte is gone. */
+			if (pte_dirty(pte))
+				folio_mark_dirty(page_folio(page));
+
 			/* Setup special migration page table entry */

This matches what we do in try_to_unmap_one()

Acked-by: David Hildenbrand <david@redhat.com>

-- Thanks, David / dhildenb

David Hildenbrand

6:51 a.m.

On 02.09.22 02:35, Alistair Popple wrote:

...


Acked-by: David Hildenbrand <david@redhat.com>

-- Thanks, David / dhildenb

Peter Xu

8:35 p.m.

On Fri, Sep 02, 2022 at 10:35:51AM +1000, Alistair Popple wrote:

...


Acked-by: Peter Xu <peterx@redhat.com>

-- Peter Xu


linux-stable-mirror@lists.linaro.org

7 comments

Participants (3):

Alistair Popple
David Hildenbrand
Peter Xu