Hi,
Round two of this show, I apparently forget to add these stable tags
sometimes... Can you add these four to 5.2 stable? Again, they are listed
in the order in which they should be applied.
a982eeb09b6030e567b8b815277c8c9197168040
500f9fbadef86466a435726192f4ca4df7d94236
a3a0e43fd77013819e4b6f55e37e0efe8e35d805
08f5439f1df25a6cf6cf4c72cf6c13025599ce67
Thanks!
--
Jens Axboe
The patch titled
Subject: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
has been removed from the -mm tree. Its filename was
mm-kasan-fix-false-positive-invalid-free-reports-with-config_kasan_sw_tags=y.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Subject: mm/kasan: fix false positive invalid-free reports with CONFIG_KASAN_SW_TAGS=y
Code like this:
    ptr = kmalloc(size, GFP_KERNEL);
    page = virt_to_page(ptr);
    offset = offset_in_page(ptr);
    kfree(page_address(page) + offset);
may produce false-positive invalid-free reports on a kernel with
CONFIG_KASAN_SW_TAGS=y.
In the example above we lose the original tag assigned to 'ptr', so
kfree() gets the pointer with the default 0xFF tag. In kfree() we check
that the 0xFF tag is different from the tag in shadow memory and hence
print a false report.
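For illustration only (not part of the patch, with made-up variable
names), the way the tag gets lost can be sketched like this, assuming
CONFIG_KASAN_SW_TAGS keeps the tag in the pointer's top byte:
    void *ptr, *alias;
    u8 ptr_tag, alias_tag;

    ptr = kmalloc(size, GFP_KERNEL);                /* tagged pointer */
    ptr_tag = (u8)((unsigned long)ptr >> 56);       /* e.g. 0x2e */
    alias = page_address(virt_to_page(ptr)) + offset_in_page(ptr);
    alias_tag = (u8)((unsigned long)alias >> 56);   /* 0xFF (KASAN_TAG_KERNEL) */
    /* The shadow still records ptr_tag, so comparing alias_tag against
     * the shadow in kfree(alias) produces a bogus invalid-free report. */
    kfree(alias);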
Instead of just comparing tags, do the following:
1) Check that the shadow doesn't contain KASAN_TAG_INVALID. Otherwise it's
a double-free and it doesn't matter what tag the pointer has.
2) If the pointer tag is different from 0xFF, make sure that the tag in the
shadow is the same as in the pointer.
Link: http://lkml.kernel.org/r/20190819172540.19581-1-aryabinin@virtuozzo.com
Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reported-by: Walter Wu <walter-zh.wu(a)mediatek.com>
Reported-by: Mark Rutland <mark.rutland(a)arm.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/common.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/kasan/common.c~mm-kasan-fix-false-positive-invalid-free-reports-with-config_kasan_sw_tags=y
+++ a/mm/kasan/common.c
@@ -407,8 +407,14 @@ static inline bool shadow_invalid(u8 tag
if (IS_ENABLED(CONFIG_KASAN_GENERIC))
return shadow_byte < 0 ||
shadow_byte >= KASAN_SHADOW_SCALE_SIZE;
- else
- return tag != (u8)shadow_byte;
+
+ /* else CONFIG_KASAN_SW_TAGS: */
+ if ((u8)shadow_byte == KASAN_TAG_INVALID)
+ return true;
+ if ((tag != KASAN_TAG_KERNEL) && (tag != (u8)shadow_byte))
+ return true;
+
+ return false;
}
static bool __kasan_slab_free(struct kmem_cache *cache, void *object,
_
Patches currently in -mm which might be from aryabinin(a)virtuozzo.com are
mm-vmscan-remove-unused-lru_pages-argument.patch
The patch titled
Subject: mm/zsmalloc.c: fix race condition in zs_destroy_pool
has been removed from the -mm tree. Its filename was
mm-zsmallocc-fix-race-condition-in-zs_destroy_pool.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Henry Burns <henryburns(a)google.com>
Subject: mm/zsmalloc.c: fix race condition in zs_destroy_pool
In zs_destroy_pool() we call flush_work(&pool->free_work). However, we
have no guarantee that migration isn't happening in the background at that
time.
Since migration can't directly free pages, it relies on free_work being
scheduled to free the pages. But there's nothing preventing an
in-progress migration from queuing the work *after*
zs_unregister_migration() has called flush_work(), which would leave pages
still pointing at the inode when we free it.
Since we know at destroy time all objects should be free, no new
migrations can come in (since zs_page_isolate() fails for fully-free
zspages). This means it is sufficient to track a "# isolated zspages"
count by pool, and have the destroy logic ensure all such pages have
drained before proceeding. Keeping that state under the class spinlock
keeps the logic straightforward.
In this case a memory leak could lead to an eventual crash if
compaction hits the leaked page. This crash would only occur if people
are changing their zswap backend at runtime (which eventually starts
destruction).
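Condensed, the scheme the patch implements looks roughly like this
(simplified from the diff below; error checking and locking context
omitted):
    /* zs_page_isolate(), under class->lock */
    atomic_long_inc(&pool->isolated_pages);

    /* zs_page_migrate()/zs_page_putback(), when dropping the isolation */
    atomic_long_dec(&pool->isolated_pages);
    if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
            wake_up_all(&pool->migration_wait);

    /* zs_unregister_migration(), before flush_work() */
    pool->destroying = true;
    smp_mb();   /* make 'destroying' visible before we start waiting */
    wait_event(pool->migration_wait,
               atomic_long_read(&pool->isolated_pages) == 0);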
Link: http://lkml.kernel.org/r/20190809181751.219326-2-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jonathan Adams <jwadams(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zsmalloc.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 59 insertions(+), 2 deletions(-)
--- a/mm/zsmalloc.c~mm-zsmallocc-fix-race-condition-in-zs_destroy_pool
+++ a/mm/zsmalloc.c
@@ -54,6 +54,7 @@
#include <linux/mount.h>
#include <linux/pseudo_fs.h>
#include <linux/migrate.h>
+#include <linux/wait.h>
#include <linux/pagemap.h>
#include <linux/fs.h>
@@ -268,6 +269,10 @@ struct zs_pool {
#ifdef CONFIG_COMPACTION
struct inode *inode;
struct work_struct free_work;
+ /* A wait queue for when migration races with async_free_zspage() */
+ struct wait_queue_head migration_wait;
+ atomic_long_t isolated_pages;
+ bool destroying;
#endif
};
@@ -1874,6 +1879,19 @@ static void putback_zspage_deferred(stru
}
+static inline void zs_pool_dec_isolated(struct zs_pool *pool)
+{
+ VM_BUG_ON(atomic_long_read(&pool->isolated_pages) <= 0);
+ atomic_long_dec(&pool->isolated_pages);
+ /*
+ * There's no possibility of racing, since wait_for_isolated_drain()
+ * checks the isolated count under &class->lock after enqueuing
+ * on migration_wait.
+ */
+ if (atomic_long_read(&pool->isolated_pages) == 0 && pool->destroying)
+ wake_up_all(&pool->migration_wait);
+}
+
static void replace_sub_page(struct size_class *class, struct zspage *zspage,
struct page *newpage, struct page *oldpage)
{
@@ -1943,6 +1961,7 @@ static bool zs_page_isolate(struct page
*/
if (!list_empty(&zspage->list) && !is_zspage_isolated(zspage)) {
get_zspage_mapping(zspage, &class_idx, &fullness);
+ atomic_long_inc(&pool->isolated_pages);
remove_zspage(class, zspage, fullness);
}
@@ -2042,8 +2061,16 @@ static int zs_page_migrate(struct addres
* Page migration is done so let's putback isolated zspage to
* the list if @page is final isolated subpage in the zspage.
*/
- if (!is_zspage_isolated(zspage))
+ if (!is_zspage_isolated(zspage)) {
+ /*
+ * We cannot race with zs_destroy_pool() here because we wait
+ * for isolation to hit zero before we start destroying.
+ * Also, we ensure that everyone can see pool->destroying before
+ * we start waiting.
+ */
putback_zspage_deferred(pool, class, zspage);
+ zs_pool_dec_isolated(pool);
+ }
reset_page(page);
put_page(page);
@@ -2094,8 +2121,8 @@ static void zs_page_putback(struct page
* so let's defer.
*/
putback_zspage_deferred(pool, class, zspage);
+ zs_pool_dec_isolated(pool);
}
-
spin_unlock(&class->lock);
}
@@ -2118,8 +2145,36 @@ static int zs_register_migration(struct
return 0;
}
+static bool pool_isolated_are_drained(struct zs_pool *pool)
+{
+ return atomic_long_read(&pool->isolated_pages) == 0;
+}
+
+/* Function for resolving migration */
+static void wait_for_isolated_drain(struct zs_pool *pool)
+{
+
+ /*
+ * We're in the process of destroying the pool, so there are no
+ * active allocations. zs_page_isolate() fails for completely free
+ * zspages, so we need only wait for the zs_pool's isolated
+ * count to hit zero.
+ */
+ wait_event(pool->migration_wait,
+ pool_isolated_are_drained(pool));
+}
+
static void zs_unregister_migration(struct zs_pool *pool)
{
+ pool->destroying = true;
+ /*
+ * We need a memory barrier here to ensure global visibility of
+ * pool->destroying. Thus pool->isolated pages will either be 0 in which
+ * case we don't care, or it will be > 0 and pool->destroying will
+ * ensure that we wake up once isolation hits 0.
+ */
+ smp_mb();
+ wait_for_isolated_drain(pool); /* This can block */
flush_work(&pool->free_work);
iput(pool->inode);
}
@@ -2357,6 +2412,8 @@ struct zs_pool *zs_create_pool(const cha
if (!pool->name)
goto err;
+ init_waitqueue_head(&pool->migration_wait);
+
if (create_cache(pool))
goto err;
_
Patches currently in -mm which might be from henryburns(a)google.com are
The patch titled
Subject: mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
has been removed from the -mm tree. Its filename was
mm-zsmallocc-migration-can-leave-pages-in-zs_empty-indefinitely.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Henry Burns <henryburns(a)google.com>
Subject: mm/zsmalloc.c: migration can leave pages in ZS_EMPTY indefinitely
In zs_page_migrate() we call putback_zspage() after we have finished
migrating all pages in this zspage. However, the return value is ignored.
If a zs_free() races in between zs_page_isolate() and zs_page_migrate(),
freeing the last object in the zspage, putback_zspage() will leave the
page in ZS_EMPTY for potentially an unbounded amount of time.
To fix this, we need to do the same thing as zs_page_putback() does:
schedule free_work to occur. To avoid duplicated code, move the sequence
to a new putback_zspage_deferred() function which both zs_page_migrate()
and zs_page_putback() call.
Link: http://lkml.kernel.org/r/20190809181751.219326-1-henryburns@google.com
Fixes: 48b4800a1c6a ("zsmalloc: page migration support")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: Henry Burns <henrywolfeburns(a)gmail.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jonathan Adams <jwadams(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zsmalloc.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
--- a/mm/zsmalloc.c~mm-zsmallocc-migration-can-leave-pages-in-zs_empty-indefinitely
+++ a/mm/zsmalloc.c
@@ -1862,6 +1862,18 @@ static void dec_zspage_isolation(struct
zspage->isolated--;
}
+static void putback_zspage_deferred(struct zs_pool *pool,
+ struct size_class *class,
+ struct zspage *zspage)
+{
+ enum fullness_group fg;
+
+ fg = putback_zspage(class, zspage);
+ if (fg == ZS_EMPTY)
+ schedule_work(&pool->free_work);
+
+}
+
static void replace_sub_page(struct size_class *class, struct zspage *zspage,
struct page *newpage, struct page *oldpage)
{
@@ -2031,7 +2043,7 @@ static int zs_page_migrate(struct addres
* the list if @page is final isolated subpage in the zspage.
*/
if (!is_zspage_isolated(zspage))
- putback_zspage(class, zspage);
+ putback_zspage_deferred(pool, class, zspage);
reset_page(page);
put_page(page);
@@ -2077,14 +2089,13 @@ static void zs_page_putback(struct page
spin_lock(&class->lock);
dec_zspage_isolation(zspage);
if (!is_zspage_isolated(zspage)) {
- fg = putback_zspage(class, zspage);
/*
* Due to page_lock, we cannot free zspage immediately
* so let's defer.
*/
- if (fg == ZS_EMPTY)
- schedule_work(&pool->free_work);
+ putback_zspage_deferred(pool, class, zspage);
}
+
spin_unlock(&class->lock);
}
_
Patches currently in -mm which might be from henryburns(a)google.com are
The patch titled
Subject: mm, page_owner: handle THP splits correctly
has been removed from the -mm tree. Its filename was
mm-page_owner-handle-thp-splits-correctly.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_owner: handle THP splits correctly
The THP splitting path is missing the split_page_owner() call that
split_page() has. As a result, split THP pages are wrongly reported in
the page_owner file as order-9 pages. Furthermore, when the former head
page is freed, the remaining former tail pages are not listed in the
page_owner file at all. This patch fixes that by adding the
split_page_owner() call into __split_huge_page().
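For reference, split_page() keeps page_owner in sync when it splits a
high-order page; paraphrased from mm/page_alloc.c of roughly that era
(details in any given tree may differ slightly):
    void split_page(struct page *page, unsigned int order)
    {
            int i;

            VM_BUG_ON_PAGE(PageCompound(page), page);
            VM_BUG_ON_PAGE(!page_count(page), page);

            for (i = 1; i < (1 << order); i++)
                    set_page_refcounted(page + i);
            split_page_owner(page, order);
    }
The THP path gets the equivalent split_page_owner() call below.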
Link: http://lkml.kernel.org/r/20190820131828.22684-2-vbabka@suse.cz
Fixes: a9627bc5e34e ("mm/page_owner: introduce split_page_owner and replace manual handling")
Reported-by: Kirill A. Shutemov <kirill(a)shutemov.name>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/huge_memory.c~mm-page_owner-handle-thp-splits-correctly
+++ a/mm/huge_memory.c
@@ -32,6 +32,7 @@
#include <linux/shmem_fs.h>
#include <linux/oom.h>
#include <linux/numa.h>
+#include <linux/page_owner.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
@@ -2516,6 +2517,9 @@ static void __split_huge_page(struct pag
}
ClearPageCompound(head);
+
+ split_page_owner(head, HPAGE_PMD_ORDER);
+
/* See comment in __split_huge_page_tail() */
if (PageAnon(head)) {
/* Additional pin to swap cache */
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-page_owner-record-page-owner-for-each-subpage.patch
mm-page_owner-keep-owner-info-when-freeing-the-page.patch
mm-page_owner-debug_pagealloc-save-and-dump-freeing-stack-trace.patch
mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone-fix-2.patch
mm-reclaim-cleanup-should_continue_reclaim.patch
mm-compaction-raise-compaction-priority-after-it-withdrawns.patch
The patch titled
Subject: psi: get poll_work to run when calling poll syscall next time
has been removed from the -mm tree. Its filename was
psi-get-poll_work-to-run-when-calling-poll-syscall-next-time.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jason Xing <kerneljasonxing(a)linux.alibaba.com>
Subject: psi: get poll_work to run when calling poll syscall next time
Only when calling the poll syscall for the first time can the user
receive POLLPRI correctly; after that, the user always fails to acquire
the event signal.
Reproduce case (a minimal monitor sketch is included after this list):
1. Get the monitor code in Documentation/accounting/psi.txt
2. Run it, and wait for an event to be triggered.
3. Kill and restart the process.
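A minimal userspace sketch of such a monitor, along the lines of the
example in Documentation/accounting/psi.txt (the trigger thresholds here
are arbitrary):
    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            /* notify when "some" memory stall exceeds 150ms per 1s window */
            const char trig[] = "some 150000 1000000";
            struct pollfd fds;

            fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
            if (fds.fd < 0)
                    return 1;
            if (write(fds.fd, trig, strlen(trig) + 1) < 0)
                    return 1;
            fds.events = POLLPRI;

            while (1) {
                    /* with the bug, a restarted monitor blocks here forever:
                     * poll_scheduled stays at 1, so POLLPRI never arrives */
                    if (poll(&fds, 1, -1) < 0)
                            return 1;
                    if (fds.revents & POLLPRI)
                            printf("event triggered\n");
            }
    }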
The question is why we can end up with poll_scheduled = 1 but the work
not running (which would reset it to 0). The answer is that the
scheduling side sees group->poll_kworker under RCU protection and then
schedules it, but here we cancel the work and destroy the worker. The
cancel therefore needs to pair with resetting the poll_scheduled flag.
Link: http://lkml.kernel.org/r/1566357985-97781-1-git-send-email-joseph.qi@linux.…
Signed-off-by: Jason Xing <kerneljasonxing(a)linux.alibaba.com>
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar(a)linux.alibaba.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/sched/psi.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/kernel/sched/psi.c~psi-get-poll_work-to-run-when-calling-poll-syscall-next-time
+++ a/kernel/sched/psi.c
@@ -1131,7 +1131,15 @@ static void psi_trigger_destroy(struct k
* deadlock while waiting for psi_poll_work to acquire trigger_lock
*/
if (kworker_to_destroy) {
+ /*
+ * After the RCU grace period has expired, the worker
+ * can no longer be found through group->poll_kworker.
+ * But it might have been already scheduled before
+ * that - deschedule it cleanly before destroying it.
+ */
kthread_cancel_delayed_work_sync(&group->poll_work);
+ atomic_set(&group->poll_scheduled, 0);
+
kthread_destroy_worker(kworker_to_destroy);
}
kfree(t);
_
Patches currently in -mm which might be from kerneljasonxing(a)linux.alibaba.com are
The patch titled
Subject: mm: memcontrol: flush percpu vmevents before releasing memcg
has been removed from the -mm tree. Its filename was
mm-memcontrol-flush-percpu-vmevents-before-releasing-memcg.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: memcontrol: flush percpu vmevents before releasing memcg
Similar to vmstats, percpu caching of local vmevents leads to an
accumulation of errors on non-leaf levels. This happens because some
leftovers may remain in percpu caches, so that they are never propagated
up the cgroup tree and simply disappear into nonexistence when the memory
cgroup is released.
To fix this issue, let's accumulate and propagate percpu vmevents values
before releasing the memory cgroup, similar to what we're doing with
vmstats.
Since on cpu hotplug we do flush percpu vmstats anyway, we can iterate
only over online cpus.
Link: http://lkml.kernel.org/r/20190819202338.363363-4-guro@fb.com
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcontrol-flush-percpu-vmevents-before-releasing-memcg
+++ a/mm/memcontrol.c
@@ -3295,6 +3295,25 @@ static void memcg_flush_percpu_vmstats(s
}
}
+static void memcg_flush_percpu_vmevents(struct mem_cgroup *memcg)
+{
+ unsigned long events[NR_VM_EVENT_ITEMS];
+ struct mem_cgroup *mi;
+ int cpu, i;
+
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ events[i] = 0;
+
+ for_each_online_cpu(cpu)
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ events[i] += raw_cpu_read(
+ memcg->vmstats_percpu->events[i]);
+
+ for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
+ for (i = 0; i < NR_VM_EVENT_ITEMS; i++)
+ atomic_long_add(events[i], &mi->vmevents[i]);
+}
+
#ifdef CONFIG_MEMCG_KMEM
static int memcg_online_kmem(struct mem_cgroup *memcg)
{
@@ -4718,10 +4737,11 @@ static void __mem_cgroup_free(struct mem
int node;
/*
- * Flush percpu vmstats to guarantee the value correctness
+ * Flush percpu vmstats and vmevents to guarantee the value correctness
* on parent's and all ancestor levels.
*/
memcg_flush_percpu_vmstats(memcg);
+ memcg_flush_percpu_vmevents(memcg);
for_each_node(node)
free_mem_cgroup_per_node_info(memcg, node);
free_percpu(memcg->vmstats_percpu);
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-memcontrol-flush-percpu-slab-vmstats-on-kmem-offlining.patch
partially-revert-mm-memcontrolc-keep-local-vm-counters-in-sync-with-the-hierarchical-ones.patch
mm-memcontrol-switch-to-rcu-protection-in-drain_all_stock.patch