The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 61fb0d01680771f72cc9d39783fb2c122aaad51e Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Wed, 15 May 2019 19:39:52 -0700
Subject: [PATCH] ipv6: prevent possible fib6 leaks
At ipv6 route dismantle, fib6_drop_pcpu_from() is responsible
for finding all percpu routes and setting their ->from pointer
to NULL, so that fib6_ref can reach its expected value (1).
The problem right now is that other cpus can still catch the
route being deleted, since there is no rcu grace period
between the route deletion and the call to fib6_drop_pcpu_from().
This can leak the fib6 and associated resources, since no
notifier will take care of removing the last reference(s).
I decided to add another boolean (fib6_destroying) instead
of reusing/renaming exception_bucket_flushed to ease stable backports,
and properly document the memory barriers used to implement this fix.
This patch has been co-developed with Wei Wang.
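The flag-plus-barrier pairing described above can be sketched in user space with C11 atomics. This is only an illustration under stated assumptions, not the kernel code: struct parent, struct child, drop_children(), publish_child() and demo_late_publish() are hypothetical names, sequentially consistent atomics stand in for mb() and cmpxchg(), and a single slot stands in for the per-cpu array.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for fib6_info. */
struct parent {
	atomic_int refs;        /* plays the role of fib6_ref */
	atomic_bool destroying; /* plays the role of fib6_destroying */
};

/* Hypothetical stand-in for a percpu route holding one parent reference. */
struct child {
	_Atomic(struct parent *) from;
};

static void parent_put(struct parent *p)
{
	if (p)
		atomic_fetch_sub(&p->refs, 1);
}

/* Teardown side: raise the flag first, then release whatever is published. */
static void drop_children(struct parent *p, _Atomic(struct child *) *slot)
{
	struct child *c;

	/* seq_cst store: ordered before the slot load, like the mb() in the patch */
	atomic_store(&p->destroying, true);
	c = atomic_load(slot);
	if (c)
		parent_put(atomic_exchange(&c->from, (struct parent *)NULL));
}

/* Lookup side: publish the child, then re-check the flag. */
static struct child *publish_child(struct parent *p,
				   _Atomic(struct child *) *slot,
				   struct child *c)
{
	struct child *expected = NULL;

	atomic_fetch_add(&p->refs, 1);
	atomic_store(&c->from, p);
	if (!atomic_compare_exchange_strong(slot, &expected, c)) {
		/* Slot already taken (the kernel treats this as a bug); undo. */
		parent_put(atomic_exchange(&c->from, (struct parent *)NULL));
		return expected;
	}
	/* Teardown may already have scanned the slot: drop our own reference. */
	if (atomic_load(&p->destroying))
		parent_put(atomic_exchange(&c->from, (struct parent *)NULL));
	return c;
}

/* Replays "teardown runs first, publisher arrives late" on one thread. */
static int demo_late_publish(void)
{
	struct parent p;
	struct child c;
	_Atomic(struct child *) slot;

	atomic_init(&p.refs, 1);
	atomic_init(&p.destroying, false);
	atomic_init(&c.from, (struct parent *)NULL);
	atomic_init(&slot, (struct child *)NULL);

	drop_children(&p, &slot);     /* finds no child yet */
	publish_child(&p, &slot, &c); /* sees the flag and undoes its reference */
	return atomic_load(&p.refs);
}
```

Calling demo_late_publish() returns 1, the expected final reference count. Because both sides clear ->from with an atomic exchange, only one of them ever drops the reference, whichever order they run in.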
Fixes: 93531c674315 ("net/ipv6: separate handling of FIB entries from dst based routes")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Cc: Wei Wang <weiwan(a)google.com>
Cc: David Ahern <dsahern(a)gmail.com>
Cc: Martin Lau <kafai(a)fb.com>
Acked-by: Wei Wang <weiwan(a)google.com>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Reviewed-by: David Ahern <dsahern(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 40105738e2f6..525f701653ca 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -167,7 +167,8 @@ struct fib6_info {
dst_nocount:1,
dst_nopolicy:1,
dst_host:1,
- unused:3;
+ fib6_destroying:1,
+ unused:2;
struct fib6_nh fib6_nh;
struct rcu_head rcu;
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 08e0390e001c..008421b550c6 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -904,6 +904,12 @@ static void fib6_drop_pcpu_from(struct fib6_info *f6i,
{
int cpu;
+ /* Make sure rt6_make_pcpu_route() wont add other percpu routes
+ * while we are cleaning them here.
+ */
+ f6i->fib6_destroying = 1;
+ mb(); /* paired with the cmpxchg() in rt6_make_pcpu_route() */
+
/* release the reference to this fib entry from
* all of its cached pcpu routes
*/
@@ -927,6 +933,9 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn,
{
struct fib6_table *table = rt->fib6_table;
+ if (rt->rt6i_pcpu)
+ fib6_drop_pcpu_from(rt, table);
+
if (refcount_read(&rt->fib6_ref) != 1) {
/* This route is used as dummy address holder in some split
* nodes. It is not leaked, but it still holds other resources,
@@ -948,9 +957,6 @@ static void fib6_purge_rt(struct fib6_info *rt, struct fib6_node *fn,
fn = rcu_dereference_protected(fn->parent,
lockdep_is_held(&table->tb6_lock));
}
-
- if (rt->rt6i_pcpu)
- fib6_drop_pcpu_from(rt, table);
}
}
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 23a20d62daac..27c0cc5d9d30 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1295,6 +1295,13 @@ static struct rt6_info *rt6_make_pcpu_route(struct net *net,
prev = cmpxchg(p, NULL, pcpu_rt);
BUG_ON(prev);
+ if (res->f6i->fib6_destroying) {
+ struct fib6_info *from;
+
+ from = xchg((__force struct fib6_info **)&pcpu_rt->from, NULL);
+ fib6_info_release(from);
+ }
+
return pcpu_rt;
}
From: Dexuan Cui <decui(a)microsoft.com>
commit b5679cebf780c6f1c2451a73bf1842a4409840e7 upstream
The changes to split ring allocation from open/close broke
the cleanup of subchannels. This resulted in problems using
uio on network devices because the subchannel was left behind
when the network device was unbound.
The cause was in the disconnect logic which used list splice
to move the subchannel list into a local variable. This won't
work because the subchannel list is needed later during the
processing of the rescind messages (relid2channel).
The fix is to just leave the subchannel list in place
which is what the original code did. The list is cleaned
up later when the host rescind is processed.
Without the fix, we have a lot of "hang" issues in netvsc when we
try to change the NIC's MTU, set the number of channels, etc.
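Why the splice breaks later lookups can be shown with a minimal user-space list. This is a sketch under assumptions: the list helpers below are simplified re-implementations, not the kernel's <linux/list.h>, and struct subchannel, find_by_relid() and lookup_survives() are hypothetical names standing in for the channel structures and the relid2channel lookup.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move every entry of 'from' to the front of 'to' and leave 'from' empty. */
static void list_splice_init(struct list_head *from, struct list_head *to)
{
	struct list_head *first = from->next, *last = from->prev;

	if (first == from)
		return;
	first->prev = to;
	last->next = to->next;
	to->next->prev = last;
	to->next = first;
	list_init(from);
}

/* Hypothetical subchannel; sc_list is the first member so a cast is enough. */
struct subchannel {
	struct list_head sc_list;
	unsigned int relid;
};

/* Stand-in for a lookup that walks the primary channel's sc_list. */
static struct subchannel *find_by_relid(struct list_head *head,
					unsigned int relid)
{
	struct list_head *p;

	for (p = head->next; p != head; p = p->next) {
		struct subchannel *sc = (struct subchannel *)p;

		if (sc->relid == relid)
			return sc;
	}
	return NULL;
}

/* Returns 1 if the subchannel is still reachable from the primary's list. */
static int lookup_survives(int splice_first)
{
	struct list_head sc_list, local;
	struct subchannel sc = { .relid = 7 };

	list_init(&sc_list);
	list_init(&local);
	list_add_tail(&sc.sc_list, &sc_list);
	if (splice_first)
		list_splice_init(&sc_list, &local); /* the removed snapshot step */
	return find_by_relid(&sc_list, 7) != NULL;
}
```

lookup_survives(0) returns 1 and lookup_survives(1) returns 0: once the entries are spliced onto a local head, anything that later searches the original list can no longer find the subchannel.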
Fixes: ae6935ed7d42 ("vmbus: split ring buffer allocation from open")
Cc: stable(a)vger.kernel.org
Signed-off-by: Stephen Hemminger <sthemmin(a)microsoft.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hv/channel.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 170770339720..8e23ed6ea9ab 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -701,20 +701,12 @@ static int vmbus_close_internal(struct vmbus_channel *channel)
int vmbus_disconnect_ring(struct vmbus_channel *channel)
{
struct vmbus_channel *cur_channel, *tmp;
- unsigned long flags;
- LIST_HEAD(list);
int ret;
if (channel->primary_channel != NULL)
return -EINVAL;
- /* Snapshot the list of subchannels */
- spin_lock_irqsave(&channel->lock, flags);
- list_splice_init(&channel->sc_list, &list);
- channel->num_sc = 0;
- spin_unlock_irqrestore(&channel->lock, flags);
-
- list_for_each_entry_safe(cur_channel, tmp, &list, sc_list) {
+ list_for_each_entry_safe(cur_channel, tmp, &channel->sc_list, sc_list) {
if (cur_channel->rescind)
wait_for_completion(&cur_channel->rescind_event);
--
2.20.1
From: Ross Lagerwall <ross.lagerwall(a)citrix.com>
[ Upstream commit 7881ef3f33bb80f459ea6020d1e021fc524a6348 ]
Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:
vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
negative objects to delete nr=-1
This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
incrementing lru_count.
5) The glock is moved to lru_list.
6) The original thread doesn't dispose it because it has been re-added
to the lru list but the lru_count has still decreased by one.
Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.
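The accounting error can be replayed with a small single-threaded model. This is a sketch, not the gfs2 code: struct glock here is a hypothetical two-field stand-in, the helper names are invented, and the "fixed" parameter switches between the old "is it on some list" test and the new "is the LRU flag set" test.

```c
#include <assert.h>
#include <stdbool.h>

enum where { NOWHERE, ON_LRU, ON_DISPOSE };

/* Hypothetical model of a glock: which list it is on, plus the LRU flag. */
struct glock {
	enum where list;
	bool lru_flag;
};

static void add_to_lru(struct glock *gl, int *lru_count, bool fixed)
{
	/* Old code: "on some list" means already counted. New code: the flag. */
	bool counted = fixed ? gl->lru_flag : (gl->list != NOWHERE);

	gl->list = ON_LRU;
	gl->lru_flag = true;
	if (!counted)
		(*lru_count)++;
}

static void scan_to_dispose(struct glock *gl, int *lru_count, bool fixed)
{
	gl->list = ON_DISPOSE;
	(*lru_count)--;
	if (fixed)
		gl->lru_flag = false; /* flag now tracks lru_list membership */
}

static void remove_from_lru(struct glock *gl, int *lru_count, bool fixed)
{
	bool counted = fixed ? gl->lru_flag : (gl->list != NOWHERE);

	if (counted) {
		gl->list = NOWHERE;
		gl->lru_flag = false;
		(*lru_count)--;
	}
}

/* Replays the steps from the commit message and returns the final count. */
static int replay_race(bool fixed)
{
	struct glock gl = { NOWHERE, false };
	int lru_count = 0;

	add_to_lru(&gl, &lru_count, fixed);      /* glock starts on lru_list */
	scan_to_dispose(&gl, &lru_count, fixed); /* moved to dispose list */
	add_to_lru(&gl, &lru_count, fixed);      /* other thread re-adds it */
	remove_from_lru(&gl, &lru_count, fixed); /* later normal removal */
	return lru_count;
}
```

replay_race(false) ends at -1, matching the negative count in the log message, while replay_race(true) ends at 0.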
Signed-off-by: Ross Lagerwall <ross.lagerwall(a)citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/gfs2/glock.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index d5284d0dbdb59..cd6a64478a026 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -183,15 +183,19 @@ static int demote_ok(const struct gfs2_glock *gl)
void gfs2_glock_add_to_lru(struct gfs2_glock *gl)
{
+ if (!(gl->gl_ops->go_flags & GLOF_LRU))
+ return;
+
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru))
- list_del_init(&gl->gl_lru);
- else
+ list_del(&gl->gl_lru);
+ list_add_tail(&gl->gl_lru, &lru_list);
+
+ if (!test_bit(GLF_LRU, &gl->gl_flags)) {
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
+ }
- list_add_tail(&gl->gl_lru, &lru_list);
- set_bit(GLF_LRU, &gl->gl_flags);
spin_unlock(&lru_lock);
}
@@ -201,7 +205,7 @@ static void gfs2_glock_remove_from_lru(struct gfs2_glock *gl)
return;
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru)) {
+ if (test_bit(GLF_LRU, &gl->gl_flags)) {
list_del_init(&gl->gl_lru);
atomic_dec(&lru_count);
clear_bit(GLF_LRU, &gl->gl_flags);
@@ -1158,8 +1162,7 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
!test_bit(GLF_DEMOTE, &gl->gl_flags))
fast_path = 1;
}
- if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl) &&
- (glops->go_flags & GLOF_LRU))
+ if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl))
gfs2_glock_add_to_lru(gl);
trace_gfs2_glock_queue(gh, 0);
@@ -1454,6 +1457,7 @@ __acquires(&lru_lock)
if (!spin_trylock(&gl->gl_lockref.lock)) {
add_back_to_lru:
list_add(&gl->gl_lru, &lru_list);
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
continue;
}
@@ -1461,7 +1465,6 @@ __acquires(&lru_lock)
spin_unlock(&gl->gl_lockref.lock);
goto add_back_to_lru;
}
- clear_bit(GLF_LRU, &gl->gl_flags);
gl->gl_lockref.count++;
if (demote_ok(gl))
handle_callback(gl, LM_ST_UNLOCKED, 0, false);
@@ -1496,6 +1499,7 @@ static long gfs2_scan_glock_lru(int nr)
if (!test_bit(GLF_LOCK, &gl->gl_flags)) {
list_move(&gl->gl_lru, &dispose);
atomic_dec(&lru_count);
+ clear_bit(GLF_LRU, &gl->gl_flags);
freed++;
continue;
}
--
2.20.1
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: dvb: warning about dvb frequency limits produces too much noise
Author: Sean Young <sean(a)mess.org>
Date: Mon May 20 15:43:49 2019 -0400
This can be a debug message. Favour dev_dbg() over dprintk() as this is
already used much more than dprintk().
dvb_frontend: dvb_frontend_get_frequency_limits: frequency interval: tuner: 45000000...860000000, frontend: 44250000...867250000
Fixes: 00ecd6bc7128 ("media: dvb_frontend: add debug message for frequency intervals")
Cc: <stable(a)vger.kernel.org> # 5.0
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
drivers/media/dvb-core/dvb_frontend.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c
index fbdb4ecc7c50..7402c9834189 100644
--- a/drivers/media/dvb-core/dvb_frontend.c
+++ b/drivers/media/dvb-core/dvb_frontend.c
@@ -917,7 +917,7 @@ static void dvb_frontend_get_frequency_limits(struct dvb_frontend *fe,
"DVB: adapter %i frontend %u frequency limits undefined - fix the driver\n",
fe->dvb->num, fe->id);
- dprintk("frequency interval: tuner: %u...%u, frontend: %u...%u",
+ dev_dbg(fe->dvb->device, "frequency interval: tuner: %u...%u, frontend: %u...%u",
tuner_min, tuner_max, frontend_min, frontend_max);
/* If the standard is for satellite, convert frequencies to kHz */
From: Ross Lagerwall <ross.lagerwall(a)citrix.com>
[ Upstream commit 7881ef3f33bb80f459ea6020d1e021fc524a6348 ]
Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:
vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
negative objects to delete nr=-1
This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
incrementing lru_count.
5) The glock is moved to lru_list.
6) The original thread doesn't dispose it because it has been re-added
to the lru list but the lru_count has still decreased by one.
Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.
Signed-off-by: Ross Lagerwall <ross.lagerwall(a)citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/gfs2/glock.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 09a0cf5f3dd86..1eb737c466ddc 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -136,22 +136,26 @@ static int demote_ok(const struct gfs2_glock *gl)
void gfs2_glock_add_to_lru(struct gfs2_glock *gl)
{
+ if (!(gl->gl_ops->go_flags & GLOF_LRU))
+ return;
+
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru))
- list_del_init(&gl->gl_lru);
- else
+ list_del(&gl->gl_lru);
+ list_add_tail(&gl->gl_lru, &lru_list);
+
+ if (!test_bit(GLF_LRU, &gl->gl_flags)) {
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
+ }
- list_add_tail(&gl->gl_lru, &lru_list);
- set_bit(GLF_LRU, &gl->gl_flags);
spin_unlock(&lru_lock);
}
static void gfs2_glock_remove_from_lru(struct gfs2_glock *gl)
{
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru)) {
+ if (test_bit(GLF_LRU, &gl->gl_flags)) {
list_del_init(&gl->gl_lru);
atomic_dec(&lru_count);
clear_bit(GLF_LRU, &gl->gl_flags);
@@ -1040,8 +1044,7 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
!test_bit(GLF_DEMOTE, &gl->gl_flags))
fast_path = 1;
}
- if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl) &&
- (glops->go_flags & GLOF_LRU))
+ if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl))
gfs2_glock_add_to_lru(gl);
trace_gfs2_glock_queue(gh, 0);
@@ -1341,6 +1344,7 @@ __acquires(&lru_lock)
if (!spin_trylock(&gl->gl_lockref.lock)) {
add_back_to_lru:
list_add(&gl->gl_lru, &lru_list);
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
continue;
}
@@ -1348,7 +1352,6 @@ __acquires(&lru_lock)
spin_unlock(&gl->gl_lockref.lock);
goto add_back_to_lru;
}
- clear_bit(GLF_LRU, &gl->gl_flags);
gl->gl_lockref.count++;
if (demote_ok(gl))
handle_callback(gl, LM_ST_UNLOCKED, 0, false);
@@ -1384,6 +1387,7 @@ static long gfs2_scan_glock_lru(int nr)
if (!test_bit(GLF_LOCK, &gl->gl_flags)) {
list_move(&gl->gl_lru, &dispose);
atomic_dec(&lru_count);
+ clear_bit(GLF_LRU, &gl->gl_flags);
freed++;
continue;
}
--
2.20.1
From: Ross Lagerwall <ross.lagerwall(a)citrix.com>
[ Upstream commit 7881ef3f33bb80f459ea6020d1e021fc524a6348 ]
Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:
vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
negative objects to delete nr=-1
This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
incrementing lru_count.
5) The glock is moved to lru_list.
6) The original thread doesn't dispose it because it has been re-added
to the lru list but the lru_count has still decreased by one.
Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.
Signed-off-by: Ross Lagerwall <ross.lagerwall(a)citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/gfs2/glock.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 9d566e62684c2..775256141e9fb 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -183,15 +183,19 @@ static int demote_ok(const struct gfs2_glock *gl)
void gfs2_glock_add_to_lru(struct gfs2_glock *gl)
{
+ if (!(gl->gl_ops->go_flags & GLOF_LRU))
+ return;
+
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru))
- list_del_init(&gl->gl_lru);
- else
+ list_del(&gl->gl_lru);
+ list_add_tail(&gl->gl_lru, &lru_list);
+
+ if (!test_bit(GLF_LRU, &gl->gl_flags)) {
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
+ }
- list_add_tail(&gl->gl_lru, &lru_list);
- set_bit(GLF_LRU, &gl->gl_flags);
spin_unlock(&lru_lock);
}
@@ -201,7 +205,7 @@ static void gfs2_glock_remove_from_lru(struct gfs2_glock *gl)
return;
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru)) {
+ if (test_bit(GLF_LRU, &gl->gl_flags)) {
list_del_init(&gl->gl_lru);
atomic_dec(&lru_count);
clear_bit(GLF_LRU, &gl->gl_flags);
@@ -1158,8 +1162,7 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
!test_bit(GLF_DEMOTE, &gl->gl_flags))
fast_path = 1;
}
- if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl) &&
- (glops->go_flags & GLOF_LRU))
+ if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl))
gfs2_glock_add_to_lru(gl);
trace_gfs2_glock_queue(gh, 0);
@@ -1455,6 +1458,7 @@ __acquires(&lru_lock)
if (!spin_trylock(&gl->gl_lockref.lock)) {
add_back_to_lru:
list_add(&gl->gl_lru, &lru_list);
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
continue;
}
@@ -1462,7 +1466,6 @@ __acquires(&lru_lock)
spin_unlock(&gl->gl_lockref.lock);
goto add_back_to_lru;
}
- clear_bit(GLF_LRU, &gl->gl_flags);
gl->gl_lockref.count++;
if (demote_ok(gl))
handle_callback(gl, LM_ST_UNLOCKED, 0, false);
@@ -1497,6 +1500,7 @@ static long gfs2_scan_glock_lru(int nr)
if (!test_bit(GLF_LOCK, &gl->gl_flags)) {
list_move(&gl->gl_lru, &dispose);
atomic_dec(&lru_count);
+ clear_bit(GLF_LRU, &gl->gl_flags);
freed++;
continue;
}
--
2.20.1
From: Ross Lagerwall <ross.lagerwall(a)citrix.com>
[ Upstream commit 7881ef3f33bb80f459ea6020d1e021fc524a6348 ]
Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:
vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
negative objects to delete nr=-1
This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
incrementing lru_count.
5) The glock is moved to lru_list.
6) The original thread doesn't dispose it because it has been re-added
to the lru list but the lru_count has still decreased by one.
Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.
Signed-off-by: Ross Lagerwall <ross.lagerwall(a)citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/gfs2/glock.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 7a8b1d72e3d91..efd44d5645d83 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -136,22 +136,26 @@ static int demote_ok(const struct gfs2_glock *gl)
void gfs2_glock_add_to_lru(struct gfs2_glock *gl)
{
+ if (!(gl->gl_ops->go_flags & GLOF_LRU))
+ return;
+
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru))
- list_del_init(&gl->gl_lru);
- else
+ list_del(&gl->gl_lru);
+ list_add_tail(&gl->gl_lru, &lru_list);
+
+ if (!test_bit(GLF_LRU, &gl->gl_flags)) {
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
+ }
- list_add_tail(&gl->gl_lru, &lru_list);
- set_bit(GLF_LRU, &gl->gl_flags);
spin_unlock(&lru_lock);
}
static void gfs2_glock_remove_from_lru(struct gfs2_glock *gl)
{
spin_lock(&lru_lock);
- if (!list_empty(&gl->gl_lru)) {
+ if (test_bit(GLF_LRU, &gl->gl_flags)) {
list_del_init(&gl->gl_lru);
atomic_dec(&lru_count);
clear_bit(GLF_LRU, &gl->gl_flags);
@@ -1048,8 +1052,7 @@ void gfs2_glock_dq(struct gfs2_holder *gh)
!test_bit(GLF_DEMOTE, &gl->gl_flags))
fast_path = 1;
}
- if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl) &&
- (glops->go_flags & GLOF_LRU))
+ if (!test_bit(GLF_LFLUSH, &gl->gl_flags) && demote_ok(gl))
gfs2_glock_add_to_lru(gl);
trace_gfs2_glock_queue(gh, 0);
@@ -1349,6 +1352,7 @@ __acquires(&lru_lock)
if (!spin_trylock(&gl->gl_lockref.lock)) {
add_back_to_lru:
list_add(&gl->gl_lru, &lru_list);
+ set_bit(GLF_LRU, &gl->gl_flags);
atomic_inc(&lru_count);
continue;
}
@@ -1356,7 +1360,6 @@ __acquires(&lru_lock)
spin_unlock(&gl->gl_lockref.lock);
goto add_back_to_lru;
}
- clear_bit(GLF_LRU, &gl->gl_flags);
gl->gl_lockref.count++;
if (demote_ok(gl))
handle_callback(gl, LM_ST_UNLOCKED, 0, false);
@@ -1392,6 +1395,7 @@ static long gfs2_scan_glock_lru(int nr)
if (!test_bit(GLF_LOCK, &gl->gl_flags)) {
list_move(&gl->gl_lru, &dispose);
atomic_dec(&lru_count);
+ clear_bit(GLF_LRU, &gl->gl_flags);
freed++;
continue;
}
--
2.20.1
The patch titled
Subject: memcg: make it work on sparse non-0-node systems
has been added to the -mm tree. Its filename is
memcg-make-it-work-on-sparse-non-0-node-systems.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/memcg-make-it-work-on-sparse-non-0…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/memcg-make-it-work-on-sparse-non-0…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Jiri Slaby <jslaby(a)suse.cz>
Subject: memcg: make it work on sparse non-0-node systems
We have a single node system with node 0 disabled:
Scanning NUMA topology in Northbridge 24
Number of physical nodes 2
Skipping disabled node 0
Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]
This causes crashes in memcg when system boots:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
...
RIP: 0010:list_lru_add+0x94/0x170
...
Call Trace:
d_lru_add+0x44/0x50
dput.part.34+0xfc/0x110
__fput+0x108/0x230
task_work_run+0x9f/0xc0
exit_to_usermode_loop+0xf5/0x100
It is reproducible as far back as 4.12. I did not try older kernels. You have
to have a new enough systemd, e.g. 241 (the reason is unknown -- was not
investigated). It cannot be reproduced with systemd 234.
The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.
The root cause is the list_lru_memcg_aware checks in the list_lru code. The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but that is not true on some systems, as can be seen above.
So fix this by avoiding checks on node 0. Remember the memcg-awareness by
a bool flag in struct list_lru.
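The difference between the two checks can be modelled in a few lines of user-space C. This is a sketch under assumptions: struct lru, init_lru(), aware_old(), aware_new() and check_sparse() are hypothetical names, MAX_NODES is fixed at 2 to mirror the topology above, and a dummy pointer stands in for the per-node memcg_lrus allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 2

/* Hypothetical per-node state; non-NULL memcg_lrus means "initialised". */
struct lru_node {
	void *memcg_lrus;
};

struct lru {
	struct lru_node node[MAX_NODES];
	bool memcg_aware; /* the flag added by the patch */
};

static int placeholder; /* dummy target for the non-NULL pointers */

/* Only nodes marked present get their per-node memcg state set up. */
static void init_lru(struct lru *lru, bool memcg_aware,
		     const bool present[MAX_NODES])
{
	int i;

	lru->memcg_aware = memcg_aware;
	for (i = 0; i < MAX_NODES; i++)
		lru->node[i].memcg_lrus =
			(memcg_aware && present[i]) ? &placeholder : NULL;
}

/* Old test: infers awareness from node 0, which may not exist. */
static bool aware_old(const struct lru *lru)
{
	return lru->node[0].memcg_lrus != NULL;
}

/* New test: reads the remembered flag. */
static bool aware_new(const struct lru *lru)
{
	return lru->memcg_aware;
}

/* Node 0 disabled, node 1 present, lru created as memcg aware. */
static bool check_sparse(bool use_flag)
{
	const bool present[MAX_NODES] = { false, true };
	struct lru lru;

	init_lru(&lru, true, present);
	return use_flag ? aware_new(&lru) : aware_old(&lru);
}
```

With node 0 disabled, check_sparse(false) reports false although the lru was created as memcg aware, while check_sparse(true) reports true.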
Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Suggested-by: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Raghavendra K T <raghavendra.kt(a)linux.vnet.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/list_lru.h | 1 +
mm/list_lru.c | 8 +++-----
2 files changed, 4 insertions(+), 5 deletions(-)
--- a/include/linux/list_lru.h~memcg-make-it-work-on-sparse-non-0-node-systems
+++ a/include/linux/list_lru.h
@@ -54,6 +54,7 @@ struct list_lru {
#ifdef CONFIG_MEMCG_KMEM
struct list_head list;
int shrinker_id;
+ bool memcg_aware;
#endif
};
--- a/mm/list_lru.c~memcg-make-it-work-on-sparse-non-0-node-systems
+++ a/mm/list_lru.c
@@ -37,11 +37,7 @@ static int lru_shrinker_id(struct list_l
static inline bool list_lru_memcg_aware(struct list_lru *lru)
{
- /*
- * This needs node 0 to be always present, even
- * in the systems supporting sparse numa ids.
- */
- return !!lru->node[0].memcg_lrus;
+ return lru->memcg_aware;
}
static inline struct list_lru_one *
@@ -451,6 +447,8 @@ static int memcg_init_list_lru(struct li
{
int i;
+ lru->memcg_aware = memcg_aware;
+
if (!memcg_aware)
return 0;
_
Patches currently in -mm which might be from jslaby(a)suse.cz are
memcg-make-it-work-on-sparse-non-0-node-systems.patch