From: Ralph Campbell <rcampbell@nvidia.com>
Private ZONE_DEVICE pages use a special pte entry and thus are not present. Properly handle this case in map_pte(); it is already handled in check_pte(), but the map_pte() part was most probably lost in some rebase.
Without this patch the slow migration path cannot migrate private ZONE_DEVICE memory back to regular memory. This was found after stress testing migration back to system memory. This can ultimately lead the CPU into an infinite page fault loop on the special swap entry.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org
---
 mm/page_vma_mapped.c | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a35d61b..1cf5b9bfb559 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -21,6 +21,15 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 		if (!is_swap_pte(*pvmw->pte))
 			return false;
 	} else {
+		if (is_swap_pte(*pvmw->pte)) {
+			swp_entry_t entry;
+
+			/* Handle un-addressable ZONE_DEVICE memory */
+			entry = pte_to_swp_entry(*pvmw->pte);
+			if (is_device_private_entry(entry))
+				return true;
+		}
+
 		if (!pte_present(*pvmw->pte))
 			return false;
 	}
On Fri, Aug 24, 2018 at 03:25:44PM -0400, jglisse@redhat.com wrote:
From: Ralph Campbell <rcampbell@nvidia.com>
Private ZONE_DEVICE pages use a special pte entry and thus are not present. Properly handle this case in map_pte(); it is already handled in check_pte(), but the map_pte() part was most probably lost in some rebase.
Without this patch the slow migration path cannot migrate private ZONE_DEVICE memory back to regular memory. This was found after stress testing migration back to system memory. This can ultimately lead the CPU into an infinite page fault loop on the special swap entry.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org

 mm/page_vma_mapped.c | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a35d61b..1cf5b9bfb559 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -21,6 +21,15 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 		if (!is_swap_pte(*pvmw->pte))
 			return false;
 	} else {
+		if (is_swap_pte(*pvmw->pte)) {
+			swp_entry_t entry;
+
+			/* Handle un-addressable ZONE_DEVICE memory */
+			entry = pte_to_swp_entry(*pvmw->pte);
+			if (is_device_private_entry(entry))
+				return true;
+		}
+
 		if (!pte_present(*pvmw->pte))
 			return false;
 	}
This happens just for !PVMW_SYNC && PVMW_MIGRATION? I presume this is triggered via the remove_migration_pte() code path? Doesn't returning true here imply that we've taken the ptl lock for the pvmw?
Balbir
On Fri, Aug 31, 2018 at 12:05:38AM +1000, Balbir Singh wrote:
On Fri, Aug 24, 2018 at 03:25:44PM -0400, jglisse@redhat.com wrote:
From: Ralph Campbell <rcampbell@nvidia.com>
Private ZONE_DEVICE pages use a special pte entry and thus are not present. Properly handle this case in map_pte(); it is already handled in check_pte(), but the map_pte() part was most probably lost in some rebase.
Without this patch the slow migration path cannot migrate private ZONE_DEVICE memory back to regular memory. This was found after stress testing migration back to system memory. This can ultimately lead the CPU into an infinite page fault loop on the special swap entry.
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org

 mm/page_vma_mapped.c | 9 +++++++++
 1 file changed, 9 insertions(+)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a35d61b..1cf5b9bfb559 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -21,6 +21,15 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 		if (!is_swap_pte(*pvmw->pte))
 			return false;
 	} else {
+		if (is_swap_pte(*pvmw->pte)) {
+			swp_entry_t entry;
+
+			/* Handle un-addressable ZONE_DEVICE memory */
+			entry = pte_to_swp_entry(*pvmw->pte);
+			if (is_device_private_entry(entry))
+				return true;
+		}
+
 		if (!pte_present(*pvmw->pte))
 			return false;
 	}
This happens just for !PVMW_SYNC && PVMW_MIGRATION? I presume this is triggered via the remove_migration_pte() code path? Doesn't returning true here imply that we've taken the ptl lock for the pvmw?
This happens through try_to_unmap() from migrate_vma_unmap(), and thus with !PVMW_SYNC and !PVMW_MIGRATION.
But you are right about the ptl lock: looking at the code, we were doing pte modifications without holding the pte lock, but page_vma_mapped_walk() would not try to unlock since pvmw->ptl == NULL, so this never triggered any warning.
I am going to post a v2 shortly which addresses that.
Cheers, Jérôme
From: Ralph Campbell <rcampbell@nvidia.com>
Private ZONE_DEVICE pages use a special pte entry and thus are not present. Properly handle this case in map_pte(); it is already handled in check_pte(), but the map_pte() part was most probably lost in some rebase.
Without this patch the slow migration path cannot migrate private ZONE_DEVICE memory back to regular memory. This was found after stress testing migration back to system memory. This can ultimately lead the CPU into an infinite page fault loop on the special swap entry.
Changes since v1:
    - properly lock pte directory in map_pte()
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: stable@vger.kernel.org
---
 mm/page_vma_mapped.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a35d61b..bd67e23dce33 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -21,7 +21,14 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 		if (!is_swap_pte(*pvmw->pte))
 			return false;
 	} else {
-		if (!pte_present(*pvmw->pte))
+		if (is_swap_pte(*pvmw->pte)) {
+			swp_entry_t entry;
+
+			/* Handle un-addressable ZONE_DEVICE memory */
+			entry = pte_to_swp_entry(*pvmw->pte);
+			if (!is_device_private_entry(entry))
+				return false;
+		} else if (!pte_present(*pvmw->pte))
 			return false;
 	}
 }
On Thu, Aug 30, 2018 at 10:41:56AM -0400, jglisse@redhat.com wrote:
From: Ralph Campbell <rcampbell@nvidia.com>
Private ZONE_DEVICE pages use a special pte entry and thus are not present. Properly handle this case in map_pte(); it is already handled in check_pte(), but the map_pte() part was most probably lost in some rebase.
Without this patch the slow migration path cannot migrate private ZONE_DEVICE memory back to regular memory. This was found after stress testing migration back to system memory. This can ultimately lead the CPU into an infinite page fault loop on the special swap entry.
Changes since v1:
    - properly lock pte directory in map_pte()
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: stable@vger.kernel.org

 mm/page_vma_mapped.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a35d61b..bd67e23dce33 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -21,7 +21,14 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 		if (!is_swap_pte(*pvmw->pte))
 			return false;
 	} else {
-		if (!pte_present(*pvmw->pte))
+		if (is_swap_pte(*pvmw->pte)) {
+			swp_entry_t entry;
+
+			/* Handle un-addressable ZONE_DEVICE memory */
+			entry = pte_to_swp_entry(*pvmw->pte);
+			if (!is_device_private_entry(entry))
+				return false;
OK, so we skip this pte from unmap since it's already unmapped? This prevents try_to_unmap from unmapping it and it gets restored with MIGRATE_PFN_MIGRATE flag cleared?
Sounds like the right thing, if I understand it correctly
Acked-by: Balbir Singh <bsingharora@gmail.com>
Balbir Singh.
On Fri, Aug 31, 2018 at 07:27:24PM +1000, Balbir Singh wrote:
On Thu, Aug 30, 2018 at 10:41:56AM -0400, jglisse@redhat.com wrote:
From: Ralph Campbell <rcampbell@nvidia.com>
Private ZONE_DEVICE pages use a special pte entry and thus are not present. Properly handle this case in map_pte(); it is already handled in check_pte(), but the map_pte() part was most probably lost in some rebase.
Without this patch the slow migration path cannot migrate private ZONE_DEVICE memory back to regular memory. This was found after stress testing migration back to system memory. This can ultimately lead the CPU into an infinite page fault loop on the special swap entry.
Changes since v1:
    - properly lock pte directory in map_pte()
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: stable@vger.kernel.org

 mm/page_vma_mapped.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a35d61b..bd67e23dce33 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -21,7 +21,14 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 		if (!is_swap_pte(*pvmw->pte))
 			return false;
 	} else {
-		if (!pte_present(*pvmw->pte))
+		if (is_swap_pte(*pvmw->pte)) {
+			swp_entry_t entry;
+
+			/* Handle un-addressable ZONE_DEVICE memory */
+			entry = pte_to_swp_entry(*pvmw->pte);
+			if (!is_device_private_entry(entry))
+				return false;
OK, so we skip this pte from unmap since it's already unmapped? This prevents try_to_unmap from unmapping it and it gets restored with MIGRATE_PFN_MIGRATE flag cleared?
Sounds like the right thing, if I understand it correctly
Well, not exactly: we do not skip it, we replace it with a migration pte. See try_to_unmap_one(), which gets called with the TTU_MIGRATION flag set (which, on the contrary, does not translate into PVMW_MIGRATION being set).
From the migration point of view, even if this is a swap pte, it is still
a valid mapping of the page and is counted as such for all intents and purposes. The only thing we do not need is flushing the CPU TLB or cache.
So this all happens when we are migrating something back to regular memory, either because of a CPU fault or because the device driver wants to make room in its memory and decided to evict that page back to regular memory.
Cheers, Jérôme
On Fri, Aug 31, 2018 at 12:19:35PM -0400, Jerome Glisse wrote:
On Fri, Aug 31, 2018 at 07:27:24PM +1000, Balbir Singh wrote:
On Thu, Aug 30, 2018 at 10:41:56AM -0400, jglisse@redhat.com wrote:
From: Ralph Campbell <rcampbell@nvidia.com>
Private ZONE_DEVICE pages use a special pte entry and thus are not present. Properly handle this case in map_pte(); it is already handled in check_pte(), but the map_pte() part was most probably lost in some rebase.
Without this patch the slow migration path cannot migrate private ZONE_DEVICE memory back to regular memory. This was found after stress testing migration back to system memory. This can ultimately lead the CPU into an infinite page fault loop on the special swap entry.
Changes since v1:
    - properly lock pte directory in map_pte()
Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: stable@vger.kernel.org

 mm/page_vma_mapped.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index ae3c2a35d61b..bd67e23dce33 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -21,7 +21,14 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 		if (!is_swap_pte(*pvmw->pte))
 			return false;
 	} else {
-		if (!pte_present(*pvmw->pte))
+		if (is_swap_pte(*pvmw->pte)) {
+			swp_entry_t entry;
+
+			/* Handle un-addressable ZONE_DEVICE memory */
+			entry = pte_to_swp_entry(*pvmw->pte);
+			if (!is_device_private_entry(entry))
+				return false;
OK, so we skip this pte from unmap since it's already unmapped? This prevents try_to_unmap from unmapping it and it gets restored with MIGRATE_PFN_MIGRATE flag cleared?
Sounds like the right thing, if I understand it correctly
Well not exactly we do not skip it, we replace it with a migration
I think I missed the ! part in !is_device_private_entry, so that seems reasonable.
Reviewed-by: Balbir Singh <bsingharora@gmail.com>