This is the backport for 4.4-stable.
We had a race in the old balloon compaction code before commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature") refactored it that became visible after backporting commit 195a8c43e93d ("virtio-balloon: deflate via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"). commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") was backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
There was a subtle race between dropping the page lock of the newpage in __unmap_and_move() and checking for __is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and deflate the newpage, effectively dequeueing it and clearing PageBalloon, in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via putback_lru_page(newpage) instead of put_page(newpage), leading to page->lru getting modified and a !LRU page ending up in the LRU lists. With commit 195a8c43e93d ("virtio-balloon: deflate via a page list") backported, one would suddenly get corrupted lists in release_pages_balloon(): - WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0 - list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only PageMovable(). So the new check (__PageMovable(newpage)) will still hold even after newpage was dequeued by virtio-balloon.
If anybody would ever change that special handling, the BUG would be introduced again. So instead, make it explicit and use the information of the original isolated page before migration.
This patch can be backported fairly easy to stable kernels (in contrast to the refactoring).
Cc: Andrew Morton akpm@linux-foundation.org Cc: Mel Gorman mgorman@techsingularity.net Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Jan Kara jack@suse.cz Cc: Andrea Arcangeli aarcange@redhat.com Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Matthew Wilcox willy@infradead.org Cc: Vratislav Bendel vbendel@redhat.com Cc: Rafael Aquini aquini@redhat.com Cc: Konstantin Khlebnikov k.khlebnikov@samsung.com Cc: Minchan Kim minchan@kernel.org Cc: Sasha Levin sashal@kernel.org Cc: stable@vger.kernel.org # 3.12 - 4.7 Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") Reported-by: Vratislav Bendel vbendel@redhat.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: Rafael Aquini aquini@redhat.com Signed-off-by: David Hildenbrand david@redhat.com --- mm/migrate.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c index afedcfab60e2..3304c98f9a78 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -936,6 +936,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, int rc = MIGRATEPAGE_SUCCESS; int *result = NULL; struct page *newpage; + bool is_lru = !isolated_balloon_page(page);
newpage = get_new_page(page, private, &result); if (!newpage) @@ -984,10 +985,14 @@ out: * If migration was not successful and there's a freeing callback, use * it. Otherwise, putback_lru_page() will drop the reference grabbed * during isolation. + * + * Use the old state of the isolated source page to determine if we + * migrated a LRU page. newpage was already unlocked and possibly + * modified by its owner - don't rely on the page state. */ if (put_new_page) put_new_page(newpage, private); - else if (unlikely(__is_movable_balloon_page(newpage))) { + else if (unlikely(!is_lru)) { /* drop our reference, page already in the balloon */ put_page(newpage); } else
On Fri, Feb 01, 2019 at 02:43:47PM +0100, David Hildenbrand wrote:
This is the backport for 4.4-stable.
We had a race in the old balloon compaction code before commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature") refactored it that became visible after backporting commit 195a8c43e93d ("virtio-balloon: deflate via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"). commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") was backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
There was a subtle race between dropping the page lock of the newpage in __unmap_and_move() and checking for __is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and deflate the newpage, effectively dequeueing it and clearing PageBalloon, in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via putback_lru_page(newpage) instead of put_page(newpage), leading to page->lru getting modified and a !LRU page ending up in the LRU lists. With commit 195a8c43e93d ("virtio-balloon: deflate via a page list") backported, one would suddenly get corrupted lists in release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only PageMovable(). So the new check (__PageMovable(newpage)) will still hold even after newpage was dequeued by virtio-balloon.
If anybody would ever change that special handling, the BUG would be introduced again. So instead, make it explicit and use the information of the original isolated page before migration.
This patch can be backported fairly easy to stable kernels (in contrast to the refactoring).
Cc: Andrew Morton akpm@linux-foundation.org Cc: Mel Gorman mgorman@techsingularity.net Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Jan Kara jack@suse.cz Cc: Andrea Arcangeli aarcange@redhat.com Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Matthew Wilcox willy@infradead.org Cc: Vratislav Bendel vbendel@redhat.com Cc: Rafael Aquini aquini@redhat.com Cc: Konstantin Khlebnikov k.khlebnikov@samsung.com Cc: Minchan Kim minchan@kernel.org Cc: Sasha Levin sashal@kernel.org Cc: stable@vger.kernel.org # 3.12 - 4.7 Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") Reported-by: Vratislav Bendel vbendel@redhat.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: Rafael Aquini aquini@redhat.com Signed-off-by: David Hildenbrand david@redhat.com
mm/migrate.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
What is the git commit id of this patch in Linus's tree?
thanks,
greg k-h
On 01.02.19 15:09, Greg KH wrote:
On Fri, Feb 01, 2019 at 02:43:47PM +0100, David Hildenbrand wrote:
This is the backport for 4.4-stable.
We had a race in the old balloon compaction code before commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature") refactored it that became visible after backporting commit 195a8c43e93d ("virtio-balloon: deflate via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"). commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") was backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
There was a subtle race between dropping the page lock of the newpage in __unmap_and_move() and checking for __is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and deflate the newpage, effectively dequeueing it and clearing PageBalloon, in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via putback_lru_page(newpage) instead of put_page(newpage), leading to page->lru getting modified and a !LRU page ending up in the LRU lists. With commit 195a8c43e93d ("virtio-balloon: deflate via a page list") backported, one would suddenly get corrupted lists in release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only PageMovable(). So the new check (__PageMovable(newpage)) will still hold even after newpage was dequeued by virtio-balloon.
If anybody would ever change that special handling, the BUG would be introduced again. So instead, make it explicit and use the information of the original isolated page before migration.
This patch can be backported fairly easy to stable kernels (in contrast to the refactoring).
Cc: Andrew Morton akpm@linux-foundation.org Cc: Mel Gorman mgorman@techsingularity.net Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Jan Kara jack@suse.cz Cc: Andrea Arcangeli aarcange@redhat.com Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Matthew Wilcox willy@infradead.org Cc: Vratislav Bendel vbendel@redhat.com Cc: Rafael Aquini aquini@redhat.com Cc: Konstantin Khlebnikov k.khlebnikov@samsung.com Cc: Minchan Kim minchan@kernel.org Cc: Sasha Levin sashal@kernel.org Cc: stable@vger.kernel.org # 3.12 - 4.7 Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") Reported-by: Vratislav Bendel vbendel@redhat.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: Rafael Aquini aquini@redhat.com Signed-off-by: David Hildenbrand david@redhat.com
mm/migrate.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
What is the git commit id of this patch in Linus's tree?
It's still in Andrew's tree as far as I know. I just wanted to share the backport right away (to show that it is easy :) ). Will resend it again (with proper commit idea) once upstream.
thanks,
greg k-h
On 01.02.19 14:43, David Hildenbrand wrote:
This is the backport for 4.4-stable.
We had a race in the old balloon compaction code before commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature") refactored it that became visible after backporting commit 195a8c43e93d ("virtio-balloon: deflate via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") till commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"). commit d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") was backported to 3.12, so the broken kernels are stable kernels [3.12 - 4.7].
There was a subtle race between dropping the page lock of the newpage in __unmap_and_move() and checking for __is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and deflate the newpage, effectively dequeueing it and clearing PageBalloon, in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via putback_lru_page(newpage) instead of put_page(newpage), leading to page->lru getting modified and a !LRU page ending up in the LRU lists. With commit 195a8c43e93d ("virtio-balloon: deflate via a page list") backported, one would suddenly get corrupted lists in release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only PageMovable(). So the new check (__PageMovable(newpage)) will still hold even after newpage was dequeued by virtio-balloon.
If anybody would ever change that special handling, the BUG would be introduced again. So instead, make it explicit and use the information of the original isolated page before migration.
This patch can be backported fairly easy to stable kernels (in contrast to the refactoring).
Cc: Andrew Morton akpm@linux-foundation.org Cc: Mel Gorman mgorman@techsingularity.net Cc: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com Cc: Michal Hocko mhocko@suse.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Jan Kara jack@suse.cz Cc: Andrea Arcangeli aarcange@redhat.com Cc: Dominik Brodowski linux@dominikbrodowski.net Cc: Matthew Wilcox willy@infradead.org Cc: Vratislav Bendel vbendel@redhat.com Cc: Rafael Aquini aquini@redhat.com Cc: Konstantin Khlebnikov k.khlebnikov@samsung.com Cc: Minchan Kim minchan@kernel.org Cc: Sasha Levin sashal@kernel.org Cc: stable@vger.kernel.org # 3.12 - 4.7 Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management") Reported-by: Vratislav Bendel vbendel@redhat.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: Rafael Aquini aquini@redhat.com Signed-off-by: David Hildenbrand david@redhat.com
mm/migrate.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c index afedcfab60e2..3304c98f9a78 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -936,6 +936,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, int rc = MIGRATEPAGE_SUCCESS; int *result = NULL; struct page *newpage;
- bool is_lru = !isolated_balloon_page(page);
newpage = get_new_page(page, private, &result); if (!newpage) @@ -984,10 +985,14 @@ out: * If migration was not successful and there's a freeing callback, use * it. Otherwise, putback_lru_page() will drop the reference grabbed * during isolation.
*
* Use the old state of the isolated source page to determine if we
* migrated a LRU page. newpage was already unlocked and possibly
*/ if (put_new_page) put_new_page(newpage, private);* modified by its owner - don't rely on the page state.
- else if (unlikely(__is_movable_balloon_page(newpage))) {
And to be save, we should turn this into
else if (rc == MIGRATEPAGE_SUCCESS && unlikely(!is_lru)) {
But will resend this either way as already mentioned to Greg.
- else if (unlikely(!is_lru)) { /* drop our reference, page already in the balloon */ put_page(newpage); } else
linux-stable-mirror@lists.linaro.org