The patch titled
Subject: fs/proc/task_mmu.c: fix smaps_rollup pss_locked calculation
has been removed from the -mm tree. Its filename was
mm-proc-smaps_rollup-fix-pss_locked-calculation.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Sandeep Patil <sspatil(a)android.com>
Subject: fs/proc/task_mmu.c: fix smaps_rollup pss_locked calculation
The 'pss_locked' field of smaps_rollup was being calculated incorrectly as
it accumulated the current pss everytime a locked VMA was found.
Fix that by making sure we record the current pss value before each VMA is
walked. So, we can only add the delta if the VMA was found to be
VM_LOCKED.
Link: http://lkml.kernel.org/r/20190121011049.160505-1-sspatil@android.com
Fixes: 493b0e9d945f ("mm: add /proc/pid/smaps_rollup")
Signed-off-by: Sandeep Patil <sspatil(a)android.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Andrey Vagin <avagin(a)openvz.org>
Cc: Daniel Colascione <dancol(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.14.x 4.19.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/proc/task_mmu.c~mm-proc-smaps_rollup-fix-pss_locked-calculation
+++ a/fs/proc/task_mmu.c
@@ -709,6 +709,7 @@ static void smap_gather_stats(struct vm_
#endif
.mm = vma->vm_mm,
};
+ unsigned long pss;
smaps_walk.private = mss;
@@ -737,11 +738,12 @@ static void smap_gather_stats(struct vm_
}
}
#endif
-
+ /* record current pss so we can calculate the delta after page walk */
+ pss = mss->pss;
/* mmap_sem is held in m_start */
walk_page_vma(vma, &smaps_walk);
if (vma->vm_flags & VM_LOCKED)
- mss->pss_locked += mss->pss;
+ mss->pss_locked += mss->pss - pss;
}
#define SEQ_PUT_DEC(str, val) \
_
Patches currently in -mm which might be from sspatil(a)android.com are
The current approach to read first 6 bytes from the response and then tail
of the response, can cause the 2nd memcpy_fromio() to do an unaligned read
(e.g. read 32-bit word from address aligned to a 16-bits), depending on how
memcpy_fromio() is implemented. If this happens, the read will fail and the
memory controller will fill the read with 1's.
This was triggered by 170d13ca3a2f, which should be probably refined to
check and react to the address alignment. Before that commit, on x86
memcpy_fromio() turned out to be memcpy(). By a luck GCC has done the right
thing (from tpm_crb's perspective) for us so far, but we should not rely on
that. Thus, it makes sense to fix this also in tpm_crb, not least because
the fix can be then backported to stable kernels and make them more robust
when compiled in differing environments.
Cc: stable(a)vger.kernel.org
Cc: James Morris <jmorris(a)namei.org>
Cc: Tomas Winkler <tomas.winkler(a)intel.com>
Cc: Jerry Snitselaar <jsnitsel(a)redhat.com>
Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Reviewed-by: Jerry Snitselaar <jsnitsel(a)redhat.com>
---
v2:
* There was a trailing double colon in the end of the short summary.
* Check requested and expected length against TPM_HEADER_SIZE.
* Add some explanatory comments to crb_recv().
drivers/char/tpm/tpm_crb.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 36952ef98f90..c084e61299aa 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -287,19 +287,29 @@ static int crb_recv(struct tpm_chip *chip, u8 *buf, size_t count)
struct crb_priv *priv = dev_get_drvdata(&chip->dev);
unsigned int expected;
- /* sanity check */
- if (count < 6)
+ /* A sanity check that the upper layer wants to get at least the header
+ * as that is the minimum size for any TPM response.
+ */
+ if (count < TPM_HEADER_SIZE)
return -EIO;
+ /* If this bit is set, according to the spec, the TPM is in unrecovable
+ * condition.
+ */
if (ioread32(&priv->regs_t->ctrl_sts) & CRB_CTRL_STS_ERROR)
return -EIO;
- memcpy_fromio(buf, priv->rsp, 6);
- expected = be32_to_cpup((__be32 *) &buf[2]);
- if (expected > count || expected < 6)
+ /* Read 8 bytes (not just 6 bytes, which would cover the response length
+ * field) in order to make sure that the reminding memory accesses will
+ * be aligned.
+ */
+ memcpy_fromio(buf, priv->rsp, 8);
+
+ expected = be32_to_cpup((__be32 *)&buf[2]);
+ if (expected > count || expected < TPM_HEADER_SIZE)
return -EIO;
- memcpy_fromio(&buf[6], &priv->rsp[6], expected - 6);
+ memcpy_fromio(&buf[8], &priv->rsp[8], expected - 8);
return expected;
}
--
2.19.1
The patch titled
Subject: mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
has been removed from the -mm tree. Its filename was
mm-migrate-dont-rely-on-__pagemovable-of-newpage-after-unlocking-it.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm: migrate: don't rely on __PageMovable() of newpage after unlocking it
We had a race in the old balloon compaction code before b1123ea6d3b3 ("mm:
balloon: use general non-lru movable page feature") refactored it that
became visible after backporting 195a8c43e93d ("virtio-balloon: deflate
via a page list") without the refactoring.
The bug existed from commit d6d86c0a7f8d ("mm/balloon_compaction: redesign
ballooned pages management") till b1123ea6d3b3 ("mm: balloon: use general
non-lru movable page feature"). d6d86c0a7f8d ("mm/balloon_compaction:
redesign ballooned pages management") was backported to 3.12, so the
broken kernels are stable kernels [3.12 - 4.7].
There was a subtle race between dropping the page lock of the newpage
in __unmap_and_move() and checking for
__is_movable_balloon_page(newpage).
Just after dropping this page lock, virtio-balloon could go ahead and
deflate the newpage, effectively dequeueing it and clearing PageBalloon,
in turn making __is_movable_balloon_page(newpage) fail.
This resulted in dropping the reference of the newpage via
putback_lru_page(newpage) instead of put_page(newpage), leading to
page->lru getting modified and a !LRU page ending up in the LRU lists.
With 195a8c43e93d ("virtio-balloon: deflate via a page list") backported,
one would suddenly get corrupted lists in release_pages_balloon():
- WARNING: CPU: 13 PID: 6586 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
- list_del corruption. prev->next should be ffffe253961090a0, but was dead000000000100
Nowadays this race is no longer possible, but it is hidden behind very
ugly handling of __ClearPageMovable() and __PageMovable().
__ClearPageMovable() will not make __PageMovable() fail, only
PageMovable(). So the new check (__PageMovable(newpage)) will still hold
even after newpage was dequeued by virtio-balloon.
If anybody would ever change that special handling, the BUG would be
introduced again. So instead, make it explicit and use the information of
the original isolated page before migration.
This patch can be backported fairly easy to stable kernels (in contrast to
the refactoring).
Link: http://lkml.kernel.org/r/20190129233217.10747-1-david@redhat.com
Fixes: d6d86c0a7f8d ("mm/balloon_compaction: redesign ballooned pages management")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Vratislav Bendel <vbendel(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Rafael Aquini <aquini(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Vratislav Bendel <vbendel(a)redhat.com>
Cc: Rafael Aquini <aquini(a)redhat.com>
Cc: Konstantin Khlebnikov <k.khlebnikov(a)samsung.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [3.12 - 4.7]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/migrate.c~mm-migrate-dont-rely-on-__pagemovable-of-newpage-after-unlocking-it
+++ a/mm/migrate.c
@@ -1130,10 +1130,13 @@ out:
* If migration is successful, decrease refcount of the newpage
* which will not free the page because new page owner increased
* refcounter. As well, if it is LRU page, add the page to LRU
- * list in here.
+ * list in here. Use the old state of the isolated source page to
+ * determine if we migrated a LRU page. newpage was already unlocked
+ * and possibly modified by its owner - don't rely on the page
+ * state.
*/
if (rc == MIGRATEPAGE_SUCCESS) {
- if (unlikely(__PageMovable(newpage)))
+ if (unlikely(!is_lru))
put_page(newpage);
else
putback_lru_page(newpage);
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-balloon-update-comment-about-isolation-migration-compaction.patch
mm-convert-pg_balloon-to-pg_offline.patch
kexec-export-pg_offline-to-vmcoreinfo.patch
xen-balloon-mark-inflated-pages-pg_offline.patch
hv_balloon-mark-inflated-pages-pg_offline.patch
vmw_balloon-mark-inflated-pages-pg_offline.patch
vmw_balloon-mark-inflated-pages-pg_offline-v2.patch
pm-hibernate-use-pfn_to_online_page.patch
pm-hibernate-exclude-all-pageoffline-pages.patch
pm-hibernate-exclude-all-pageoffline-pages-v2.patch
agp-efficeon-no-need-to-set-pg_reserved-on-gatt-tables.patch
s390-vdso-dont-clear-pg_reserved.patch
powerpc-vdso-dont-clear-pg_reserved.patch
riscv-vdso-dont-clear-pg_reserved.patch
m68k-mm-use-__clearpagereserved.patch
arm64-kexec-no-need-to-clearpagereserved.patch
arm64-kdump-no-need-to-mark-crashkernel-pages-manually-pg_reserved.patch
ia64-perfmon-dont-mark-buffer-pages-as-pg_reserved.patch
mm-better-document-pg_reserved.patch
The patch titled
Subject: mm: hwpoison: use do_send_sig_info() instead of force_sig()
has been removed from the -mm tree. Its filename was
mm-hwpoison-use-do_send_sig_info-instead-of-force_sig-re-pmem-error-handling-forces-sigkill-causes-kernel-panic.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm: hwpoison: use do_send_sig_info() instead of force_sig()
Currently memory_failure() is racy against process's exiting, which
results in kernel crash by null pointer dereference.
The root cause is that memory_failure() uses force_sig() to forcibly kill
asynchronous (meaning not in the current context) processes. As discussed
in thread https://lkml.org/lkml/2010/6/8/236 years ago for OOM fixes, this
is not a right thing to do. OOM solves this issue by using
do_send_sig_info() as done in commit d2d393099de2 ("signal: oom_kill_task:
use SEND_SIG_FORCED instead of force_sig()"), so this patch is suggesting
to do the same for hwpoison. do_send_sig_info() properly accesses to
siglock with lock_task_sighand(), so is free from the reported race.
I confirmed that the reported bug reproduces with inserting some delay in
kill_procs(), and it never reproduces with this patch.
Note that memory_failure() can send another type of signal using
force_sig_mceerr(), and the reported race shouldn't happen on it because
force_sig_mceerr() is called only for synchronous processes (i.e.
BUS_MCEERR_AR happens only when some process accesses to the corrupted
memory.)
Link: http://lkml.kernel.org/r/20190116093046.GA29835@hori1.linux.bs1.fc.nec.co.jp
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/memory-failure.c~mm-hwpoison-use-do_send_sig_info-instead-of-force_sig-re-pmem-error-handling-forces-sigkill-causes-kernel-panic
+++ a/mm/memory-failure.c
@@ -372,7 +372,8 @@ static void kill_procs(struct list_head
if (fail || tk->addr_valid == 0) {
pr_err("Memory failure: %#lx: forcibly killing %s:%d because of failure to unmap corrupted page\n",
pfn, tk->tsk->comm, tk->tsk->pid);
- force_sig(SIGKILL, tk->tsk);
+ do_send_sig_info(SIGKILL, SEND_SIG_PRIV,
+ tk->tsk, PIDTYPE_PID);
}
/*
_
Patches currently in -mm which might be from n-horiguchi(a)ah.jp.nec.com are
The patch titled
Subject: mm, oom: fix use-after-free in oom_kill_process
has been removed from the -mm tree. Its filename was
mm-oom-fix-use-after-free-in-oom_kill_process.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Shakeel Butt <shakeelb(a)google.com>
Subject: mm, oom: fix use-after-free in oom_kill_process
Syzbot instance running on upstream kernel found a use-after-free bug in
oom_kill_process. On further inspection it seems like the process
selected to be oom-killed has exited even before reaching
read_lock(&tasklist_lock) in oom_kill_process(). More specifically the
tsk->usage is 1 which is due to get_task_struct() in oom_evaluate_task()
and the put_task_struct within for_each_thread() frees the tsk and
for_each_thread() tries to access the tsk. The easiest fix is to do
get/put across the for_each_thread() on the selected task.
Now the next question is should we continue with the oom-kill as the
previously selected task has exited? However before adding more
complexity and heuristics, let's answer why we even look at the children
of oom-kill selected task? The select_bad_process() has already selected
the worst process in the system/memcg. Due to race, the selected process
might not be the worst at the kill time but does that matter? The
userspace can use the oom_score_adj interface to prefer children to be
killed before the parent. I looked at the history but it seems like this
is there before git history.
Link: http://lkml.kernel.org/r/20190121215850.221745-1-shakeelb@google.com
Reported-by: syzbot+7fbbfa368521945f0e3d(a)syzkaller.appspotmail.com
Fixes: 6b0c81b3be11 ("mm, oom: reduce dependency on tasklist_lock")
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/oom_kill.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/oom_kill.c~mm-oom-fix-use-after-free-in-oom_kill_process
+++ a/mm/oom_kill.c
@@ -975,6 +975,13 @@ static void oom_kill_process(struct oom_
* still freeing memory.
*/
read_lock(&tasklist_lock);
+
+ /*
+ * The task 'p' might have already exited before reaching here. The
+ * put_task_struct() will free task_struct 'p' while the loop still try
+ * to access the field of 'p', so, get an extra reference.
+ */
+ get_task_struct(p);
for_each_thread(p, t) {
list_for_each_entry(child, &t->children, sibling) {
unsigned int child_points;
@@ -994,6 +1001,7 @@ static void oom_kill_process(struct oom_
}
}
}
+ put_task_struct(p);
read_unlock(&tasklist_lock);
/*
_
Patches currently in -mm which might be from shakeelb(a)google.com are
memcg-localize-memcg_kmem_enabled-check.patch
memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work.patch
memcg-schedule-high-reclaim-for-remote-memcgs-on-high_work-v3.patch
mm-oom-remove-prefer-children-over-parent-heuristic.patch
The patch titled
Subject: oom, oom_reaper: do not enqueue same task twice
has been removed from the -mm tree. Its filename was
oom-oom_reaper-do-not-enqueue-same-task-twice.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Subject: oom, oom_reaper: do not enqueue same task twice
Arkadiusz reported that enabling memcg's group oom killing causes strange
memcg statistics where there is no task in a memcg despite the number of
tasks in that memcg is not 0. It turned out that there is a bug in
wake_oom_reaper() which allows enqueuing same task twice which makes
impossible to decrease the number of tasks in that memcg due to a refcount
leak.
This bug existed since the OOM reaper became invokable from
task_will_free_mem(current) path in out_of_memory() in Linux 4.7,
T1@P1 |T2@P1 |T3@P1 |OOM reaper
----------+----------+----------+------------
# Processing an OOM victim in a different memcg domain.
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
try_charge()
mem_cgroup_out_of_memory()
mutex_lock(&oom_lock)
out_of_memory()
oom_kill_process(P1)
do_send_sig_info(SIGKILL, @P1)
mark_oom_victim(T1@P1)
wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
mutex_unlock(&oom_lock)
out_of_memory()
mark_oom_victim(T2@P1)
wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
mutex_unlock(&oom_lock)
out_of_memory()
mark_oom_victim(T1@P1)
wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
mutex_unlock(&oom_lock)
# Completed processing an OOM victim in a different memcg domain.
spin_lock(&oom_reaper_lock)
# T1P1 is dequeued.
spin_unlock(&oom_reaper_lock)
but memcg's group oom killing made it easier to trigger this bug by
calling wake_oom_reaper() on the same task from one out_of_memory()
request.
Fix this bug using an approach used by commit 855b018325737f76 ("oom,
oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a side
effect of this patch, this patch also avoids enqueuing multiple threads
sharing memory via task_will_free_mem(current) path.
Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura…
Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura…
Fixes: af8e15cc85a25315 ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reported-by: Arkadiusz Miskiewicz <arekm(a)maven.pl>
Tested-by: Arkadiusz Miskiewicz <arekm(a)maven.pl>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Aleksa Sarai <asarai(a)suse.de>
Cc: Jay Kamat <jgkamat(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/sched/coredump.h | 1 +
mm/oom_kill.c | 4 ++--
2 files changed, 3 insertions(+), 2 deletions(-)
--- a/include/linux/sched/coredump.h~oom-oom_reaper-do-not-enqueue-same-task-twice
+++ a/include/linux/sched/coredump.h
@@ -71,6 +71,7 @@ static inline int get_dumpable(struct mm
#define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */
#define MMF_DISABLE_THP 24 /* disable THP for all VMAs */
#define MMF_OOM_VICTIM 25 /* mm is the oom victim */
+#define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */
#define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP)
#define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
--- a/mm/oom_kill.c~oom-oom_reaper-do-not-enqueue-same-task-twice
+++ a/mm/oom_kill.c
@@ -647,8 +647,8 @@ static int oom_reaper(void *unused)
static void wake_oom_reaper(struct task_struct *tsk)
{
- /* tsk is already queued? */
- if (tsk == oom_reaper_list || tsk->oom_reaper_list)
+ /* mm is already queued? */
+ if (test_and_set_bit(MMF_OOM_REAP_QUEUED, &tsk->signal->oom_mm->flags))
return;
get_task_struct(tsk);
_
Patches currently in -mm which might be from penguin-kernel(a)I-love.SAKURA.ne.jp are
memcg-killed-threads-should-not-invoke-memcg-oom-killer.patch
mmoom-dont-kill-global-init-via-memoryoomgroup.patch
info-task-hung-in-generic_file_write_iter.patch
info-task-hung-in-generic_file_write-fix.patch
The patch titled
Subject: mm: migrate: make buffer_migrate_page_norefs() actually succeed
has been removed from the -mm tree. Its filename was
mm-migrate-make-buffer_migrate_page_norefs-actually-succeed.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: mm: migrate: make buffer_migrate_page_norefs() actually succeed
Currently, buffer_migrate_page_norefs() was constantly failing because
buffer_migrate_lock_buffers() grabbed reference on each buffer. In fact,
there's no reason for buffer_migrate_lock_buffers() to grab any buffer
references as the page is locked during all our operation and thus nobody
can reclaim buffers from the page. So remove grabbing of buffer
references which also makes buffer_migrate_page_norefs() succeed.
Link: http://lkml.kernel.org/r/20190116131217.7226-1-jack@suse.cz
Fixes: 89cb0888ca14 "mm: migrate: provide buffer_migrate_page_norefs()"
Signed-off-by: Jan Kara <jack(a)suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: Pavel Machek <pavel(a)ucw.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 5 -----
1 file changed, 5 deletions(-)
--- a/mm/migrate.c~mm-migrate-make-buffer_migrate_page_norefs-actually-succeed
+++ a/mm/migrate.c
@@ -709,7 +709,6 @@ static bool buffer_migrate_lock_buffers(
/* Simple case, sync compaction */
if (mode != MIGRATE_ASYNC) {
do {
- get_bh(bh);
lock_buffer(bh);
bh = bh->b_this_page;
@@ -720,18 +719,15 @@ static bool buffer_migrate_lock_buffers(
/* async case, we cannot block on lock_buffer so use trylock_buffer */
do {
- get_bh(bh);
if (!trylock_buffer(bh)) {
/*
* We failed to lock the buffer and cannot stall in
* async migration. Release the taken locks
*/
struct buffer_head *failed_bh = bh;
- put_bh(failed_bh);
bh = head;
while (bh != failed_bh) {
unlock_buffer(bh);
- put_bh(bh);
bh = bh->b_this_page;
}
return false;
@@ -818,7 +814,6 @@ unlock_buffers:
bh = head;
do {
unlock_buffer(bh);
- put_bh(bh);
bh = bh->b_this_page;
} while (bh != head);
_
Patches currently in -mm which might be from jack(a)suse.cz are
The patch titled
Subject: kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
has been removed from the -mm tree. Its filename was
kernel-release-ptraced-tasks-before-zap_pid_ns_processes.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Andrei Vagin <avagin(a)gmail.com>
Subject: kernel/exit.c: release ptraced tasks before zap_pid_ns_processes
Currently, exit_ptrace() adds all ptraced tasks in a dead list, then
zap_pid_ns_processes() waits on all tasks in a current pidns, and only
then are tasks from the dead list released.
zap_pid_ns_processes() can get stuck on waiting tasks from the dead list. In
this case, we will have one unkillable process with one or more dead
children.
Thanks to Oleg for the advice to release tasks in find_child_reaper().
Link: http://lkml.kernel.org/r/20190110175200.12442-1-avagin@gmail.com
Fixes: 7c8bd2322c7f ("exit: ptrace: shift "reap dead" code from exit_ptrace() to forget_original_parent()")
Signed-off-by: Andrei Vagin <avagin(a)gmail.com>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/exit.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--- a/kernel/exit.c~kernel-release-ptraced-tasks-before-zap_pid_ns_processes
+++ a/kernel/exit.c
@@ -558,12 +558,14 @@ static struct task_struct *find_alive_th
return NULL;
}
-static struct task_struct *find_child_reaper(struct task_struct *father)
+static struct task_struct *find_child_reaper(struct task_struct *father,
+ struct list_head *dead)
__releases(&tasklist_lock)
__acquires(&tasklist_lock)
{
struct pid_namespace *pid_ns = task_active_pid_ns(father);
struct task_struct *reaper = pid_ns->child_reaper;
+ struct task_struct *p, *n;
if (likely(reaper != father))
return reaper;
@@ -579,6 +581,12 @@ static struct task_struct *find_child_re
panic("Attempted to kill init! exitcode=0x%08x\n",
father->signal->group_exit_code ?: father->exit_code);
}
+
+ list_for_each_entry_safe(p, n, dead, ptrace_entry) {
+ list_del_init(&p->ptrace_entry);
+ release_task(p);
+ }
+
zap_pid_ns_processes(pid_ns);
write_lock_irq(&tasklist_lock);
@@ -668,7 +676,7 @@ static void forget_original_parent(struc
exit_ptrace(father, dead);
/* Can drop and reacquire tasklist_lock */
- reaper = find_child_reaper(father);
+ reaper = find_child_reaper(father, dead);
if (list_empty(&father->children))
return;
_
Patches currently in -mm which might be from avagin(a)gmail.com are
ptrace-take-into-account-saved_sigmask-in-ptrace_getsetsigmask.patch
include-replace-tsk-to-task-in-linux-sched-signalh.patch