The patch titled
Subject: mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
has been removed from the -mm tree. Its filename was
mm-hugetlb-switch-to-css_tryget-in-hugetlb_cgroup_charge_cgroup.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
An exiting task might belong to an offline cgroup. In this case an
attempt to grab a cgroup reference from the task can end up with an
infinite loop in hugetlb_cgroup_charge_cgroup(), because neither the
cgroup will become online, neither the task will be migrated to a live
cgroup.
Fix this by switching over to css_tryget(). As css_tryget_online() can't
guarantee that the cgroup won't go offline, in most cases the check
doesn't make sense. In this particular case users of
hugetlb_cgroup_charge_cgroup() are not affected by this change.
A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").
Link: http://lkml.kernel.org/r/20191106225131.3543616-2-guro@fb.com
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Tejun Heo <tj(a)kernel.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb_cgroup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hugetlb_cgroup.c~mm-hugetlb-switch-to-css_tryget-in-hugetlb_cgroup_charge_cgroup
+++ a/mm/hugetlb_cgroup.c
@@ -196,7 +196,7 @@ int hugetlb_cgroup_charge_cgroup(int idx
again:
rcu_read_lock();
h_cg = hugetlb_cgroup_from_task(current);
- if (!css_tryget_online(&h_cg->css)) {
+ if (!css_tryget(&h_cg->css)) {
rcu_read_unlock();
goto again;
}
_
Patches currently in -mm which might be from guro(a)fb.com are
The patch titled
Subject: mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
has been removed from the -mm tree. Its filename was
mm-memcg-switch-to-css_tryget-in-get_mem_cgroup_from_mm.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
We've encountered a rcu stall in get_mem_cgroup_from_mm():
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
(t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
<...>
RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
<...>
__memcg_kmem_charge+0x55/0x140
__alloc_pages_nodemask+0x267/0x320
pipe_write+0x1ad/0x400
new_sync_write+0x127/0x1c0
__kernel_write+0x4f/0xf0
dump_emit+0x91/0xc0
writenote+0xa0/0xc0
elf_core_dump+0x11af/0x1430
do_coredump+0xc65/0xee0
? unix_stream_sendmsg+0x37d/0x3b0
get_signal+0x132/0x7c0
do_signal+0x36/0x640
? recalc_sigpending+0x17/0x50
exit_to_usermode_loop+0x61/0xd0
do_syscall_64+0xd4/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The problem is caused by an exiting task which is associated with an
offline memcg. We're iterating over and over in the do {} while
(!css_tryget_online()) loop, but obviously the memcg won't become online
and the exiting task won't be migrated to a live memcg.
Let's fix it by switching from css_tryget_online() to css_tryget().
As css_tryget_online() cannot guarantee that the memcg won't go offline,
the check is usually useless, except some rare cases when for example it
determines if something should be presented to a user.
A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").
Johannes:
: The bug aside, it doesn't matter whether the cgroup is online for the
: callers. It used to matter when offlining needed to evacuate all charges
: from the memcg, and so needed to prevent new ones from showing up, but we
: don't care now.
Link: http://lkml.kernel.org/r/20191106225131.3543616-1-guro@fb.com
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Tejun Heo <tj(a)kernel.org>
Reviewed-by: Shakeel Butt <shakeeb(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Michal Koutn <mkoutny(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcg-switch-to-css_tryget-in-get_mem_cgroup_from_mm
+++ a/mm/memcontrol.c
@@ -960,7 +960,7 @@ struct mem_cgroup *get_mem_cgroup_from_m
if (unlikely(!memcg))
memcg = root_mem_cgroup;
}
- } while (!css_tryget_online(&memcg->css));
+ } while (!css_tryget(&memcg->css));
rcu_read_unlock();
return memcg;
}
_
Patches currently in -mm which might be from guro(a)fb.com are
The patch titled
Subject: mm: mempolicy: fix the wrong return value and potential pages leak of mbind
has been removed from the -mm tree. Its filename was
mm-mempolicy-fix-the-wrong-return-value-and-potential-pages-leak-of-mbind.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm: mempolicy: fix the wrong return value and potential pages leak of mbind
Commit d883544515aa ("mm: mempolicy: make the behavior consistent when
MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") fixed the return value
of mbind() for a couple of corner cases. But, it altered the errno for
some other cases, for example, mbind() should return -EFAULT when part or
all of the memory range specified by nodemask and maxnode points outside
your accessible address space, or there was an unmapped hole in the
specified memory range specified by addr and len.
Fix this by preserving the errno returned by queue_pages_range(). And,
the pagelist may be not empty even though queue_pages_range() returns
error, put the pages back to LRU since mbind_range() is not called to
really apply the policy so those pages should not be migrated, this is
also the old behavior before the problematic commit.
Link: http://lkml.kernel.org/r/1572454731-3925-1-git-send-email-yang.shi@linux.al…
Fixes: d883544515aa ("mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified")
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reported-by: Li Xinhai <lixinhai.lxh(a)gmail.com>
Reviewed-by: Li Xinhai <lixinhai.lxh(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [4.19 and 5.2+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-fix-the-wrong-return-value-and-potential-pages-leak-of-mbind
+++ a/mm/mempolicy.c
@@ -672,7 +672,9 @@ static const struct mm_walk_ops queue_pa
* 1 - there is unmovable page, but MPOL_MF_MOVE* & MPOL_MF_STRICT were
* specified.
* 0 - queue pages successfully or no misplaced page.
- * -EIO - there is misplaced page and only MPOL_MF_STRICT was specified.
+ * errno - i.e. misplaced pages with MPOL_MF_STRICT specified (-EIO) or
+ * memory range specified by nodemask and maxnode points outside
+ * your accessible address space (-EFAULT)
*/
static int
queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
@@ -1286,7 +1288,7 @@ static long do_mbind(unsigned long start
flags | MPOL_MF_INVERT, &pagelist);
if (ret < 0) {
- err = -EIO;
+ err = ret;
goto up_out;
}
@@ -1305,10 +1307,12 @@ static long do_mbind(unsigned long start
if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
err = -EIO;
- } else
- putback_movable_pages(&pagelist);
-
+ } else {
up_out:
+ if (!list_empty(&pagelist))
+ putback_movable_pages(&pagelist);
+ }
+
up_write(&mm->mmap_sem);
mpol_out:
mpol_put(new);
_
Patches currently in -mm which might be from yang.shi(a)linux.alibaba.com are
mm-rmap-use-vm_bug_on_page-in-__page_check_anon_rmap.patch
mm-vmscan-remove-unused-scan_control-parameter-from-pageout.patch
mm-migrate-handle-freed-page-at-the-first-place.patch
mm-shmem-use-proper-gfp-flags-for-shmem_writepage.patch
Hello,
I'm requesting this commit to be back-ported to v4.14:
---
commit 5b18f1289808fee5d04a7e6ecf200189f41a4db6
Author: Stephen Suryaputra <ssuryaextr(a)gmail.com>
Date: Wed Jun 26 02:21:16 2019 -0400
ipv4: reset rt_iif for recirculated mcast/bcast out pkts
Multicast or broadcast egress packets have rt_iif set to the oif. These
packets might be recirculated back as input and lookup to the raw
sockets may fail because they are bound to the incoming interface
(skb_iif). If rt_iif is not zero, during the lookup, inet_iif() function
returns rt_iif instead of skb_iif. Hence, the lookup fails.
v2: Make it non vrf specific (David Ahern). Reword the changelog to
reflect it.
Signed-off-by: Stephen Suryaputra <ssuryaextr(a)gmail.com>
Reviewed-by: David Ahern <dsahern(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
We found the issue in that release and the above commit is on
linux-stable. On the discussion behind this commit, please see:
https://www.spinics.net/lists/netdev/msg581045.html
I think after the following diff is needed on top of the above commit
for v4.14:
---
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 4d85a4fdfdb0..ad2718c1624e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1623,11 +1623,8 @@ struct rtable *rt_dst_clone(struct net_device *dev, struct rtable *rt)
new_rt->rt_iif = rt->rt_iif;
new_rt->rt_pmtu = rt->rt_pmtu;
new_rt->rt_mtu_locked = rt->rt_mtu_locked;
- new_rt->rt_gw_family = rt->rt_gw_family;
- if (rt->rt_gw_family == AF_INET)
- new_rt->rt_gw4 = rt->rt_gw4;
- else if (rt->rt_gw_family == AF_INET6)
- new_rt->rt_gw6 = rt->rt_gw6;
+ new_rt->rt_gateway = rt->rt_gateway;
+ new_rt->rt_table_id = rt->rt_table_id;
INIT_LIST_HEAD(&new_rt->rt_uncached);
new_rt->dst.flags |= DST_HOST;
---
Thank you,
Stephen.