(CC: davem, soheil & gregkh)
On 03/17/18 20:12, Holger Hoffstätte wrote:
> On 03/17/18 19:41, Carlos Carvalho wrote:
>> I've put 4.14.27 this morning in this machine and in about 2h it started
>> showing null dereferences identical to the following one. There were several of
>> them, with about 1/2h of interval. Strangely it continued to work and I saw no
>> other anomalies. I've just reverted to 4.14.26.
>>
>> It only happened in this machine, which has a net traffic of several Gb/s and
>> thousands of simultaneous connections.
>>
>> Mar 17 13:29:21 sagres kernel: : BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
>> Mar 17 13:29:21 sagres kernel: : IP: tcp_push+0x4e/0xe7
>> Mar 17 13:29:21 sagres kernel: : PGD 0 P4D 0
>> Mar 17 13:29:21 sagres kernel: : Oops: 0002 [#1] SMP PTI
>> Mar 17 13:29:21 sagres kernel: : CPU: 55 PID: 2658 Comm: apache2 Not tainted 4.14.27 #4
(snip)
>
> Fixed by: https://www.spinics.net/lists/netdev/msg489445.html
>
> -h
>
This patch is in the netdev patchwork at https://patchwork.ozlabs.org/patch/886324/
but has been marked as "not applicable" without further queued/rejected comment
from Dave, so I believe it became a victim of email lossage.
As the patch says it doesn't apply to anything older than 4.14, but it has been
tested & reported by several people as fixing the problem, and indeed works
fine. Since GregKH only accepts net patches from Dave I wanted to make sure
it got queued up for 4.14.
Thanks,
Holger
Quoting Eric Anholt (2018-03-13 09:56:57)
> Stephen Boyd <sboyd(a)kernel.org> writes:
> >
> > What exactly is going on here? It sounds like the framework isn't aware
> > of the 'on/off' boot state of certain clks (a known problem) and that's
> > causing some sort of problem when changing rates? This usually happens
> > with PLLs that are enabled at boot time and can't support their rate
> > changing when they're enabled. We really should start reading on/off
> > state and "hand off" that enabled state to something in the framework so
> > we at least know if a clk is enabled or not out of boot. There was some
> > work on clk handoff done a while ago by Mike that never landed which may
> > be useful to finish this off. Maybe we can pass that enabled state off
> > to the clk we always create for a clk_hw structure at registration time
> > and then have clk_disable_unused operate on that clk pointer at late
> > init.
>
> Yes, the usual problem of clk not handling boot-time clock state well.
>
> That said, it's patch 1 that's critical for fixing many of our users,
> and we need that in as soon as possible. #2 is also reviewed and ready.
Got it. I'll pick up the first two then.
This is a note to let you know that I've just added the patch titled
btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
btrfs-fix-use-after-free-when-cleaning-up-fs_devs-with-a-single-stale-device.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fd649f10c3d21ee9d7542c609f29978bdf73ab94 Mon Sep 17 00:00:00 2001
From: Nikolay Borisov <nborisov(a)suse.com>
Date: Tue, 30 Jan 2018 16:07:37 +0200
Subject: btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
From: Nikolay Borisov <nborisov(a)suse.com>
commit fd649f10c3d21ee9d7542c609f29978bdf73ab94 upstream.
Commit 4fde46f0cc71 ("Btrfs: free the stale device") introduced
btrfs_free_stale_device which iterates the device lists for all
registered btrfs filesystems and deletes those devices which aren't
mounted. In a btrfs_devices structure has only 1 device attached to it
and it is unused then btrfs_free_stale_devices will proceed to also free
the btrfs_fs_devices struct itself. Currently this leads to a use after
free since list_for_each_entry will try to perform a check on the
already freed memory to see if it has to terminate the loop.
The fix is to use 'break' when we know we are freeing the current
fs_devs.
Fixes: 4fde46f0cc71 ("Btrfs: free the stale device")
Signed-off-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: Anand Jain <anand.jain(a)oracle.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/btrfs/volumes.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -583,6 +583,7 @@ void btrfs_free_stale_device(struct btrf
btrfs_sysfs_remove_fsid(fs_devs);
list_del(&fs_devs->list);
free_fs_devices(fs_devs);
+ break;
} else {
fs_devs->num_devices--;
list_del(&dev->dev_list);
Patches currently in stable-queue which might be from nborisov(a)suse.com are
queue-4.9/btrfs-fix-use-after-free-when-cleaning-up-fs_devs-with-a-single-stale-device.patch
This is a note to let you know that I've just added the patch titled
btrfs: alloc_chunk: fix DUP stripe size handling
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
btrfs-alloc_chunk-fix-dup-stripe-size-handling.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 92e222df7b8f05c565009c7383321b593eca488b Mon Sep 17 00:00:00 2001
From: Hans van Kranenburg <hans.van.kranenburg(a)mendix.com>
Date: Mon, 5 Feb 2018 17:45:11 +0100
Subject: btrfs: alloc_chunk: fix DUP stripe size handling
From: Hans van Kranenburg <hans.van.kranenburg(a)mendix.com>
commit 92e222df7b8f05c565009c7383321b593eca488b upstream.
In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.
The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.
Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the amount as intended, the check will reduce
the stripe_size to max_chunk_size if the actual correct to be used
stripe_size is more than half the amount of max_chunk_size.
The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.
Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.
Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.
Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.
The real problem was already introduced with the rest of the code in
73c5de0051.
The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.
Reported-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
Signed-off-by: Hans van Kranenburg <hans.van.kranenburg(a)mendix.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ update comment ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/btrfs/volumes.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4748,10 +4748,13 @@ static int __btrfs_alloc_chunk(struct bt
if (devs_max && ndevs > devs_max)
ndevs = devs_max;
/*
- * the primary goal is to maximize the number of stripes, so use as many
- * devices as possible, even if the stripes are not maximum sized.
+ * The primary goal is to maximize the number of stripes, so use as
+ * many devices as possible, even if the stripes are not maximum sized.
+ *
+ * The DUP profile stores more than one stripe per device, the
+ * max_avail is the total size so we have to adjust.
*/
- stripe_size = devices_info[ndevs-1].max_avail;
+ stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
num_stripes = ndevs * dev_stripes;
/*
@@ -4791,8 +4794,6 @@ static int __btrfs_alloc_chunk(struct bt
stripe_size = devices_info[ndevs-1].max_avail;
}
- stripe_size = div_u64(stripe_size, dev_stripes);
-
/* align to BTRFS_STRIPE_LEN */
stripe_size = div_u64(stripe_size, raid_stripe_len);
stripe_size *= raid_stripe_len;
Patches currently in stable-queue which might be from hans.van.kranenburg(a)mendix.com are
queue-4.9/btrfs-alloc_chunk-fix-dup-stripe-size-handling.patch
This is a note to let you know that I've just added the patch titled
btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
btrfs-fix-use-after-free-when-cleaning-up-fs_devs-with-a-single-stale-device.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From fd649f10c3d21ee9d7542c609f29978bdf73ab94 Mon Sep 17 00:00:00 2001
From: Nikolay Borisov <nborisov(a)suse.com>
Date: Tue, 30 Jan 2018 16:07:37 +0200
Subject: btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
From: Nikolay Borisov <nborisov(a)suse.com>
commit fd649f10c3d21ee9d7542c609f29978bdf73ab94 upstream.
Commit 4fde46f0cc71 ("Btrfs: free the stale device") introduced
btrfs_free_stale_device which iterates the device lists for all
registered btrfs filesystems and deletes those devices which aren't
mounted. In a btrfs_devices structure has only 1 device attached to it
and it is unused then btrfs_free_stale_devices will proceed to also free
the btrfs_fs_devices struct itself. Currently this leads to a use after
free since list_for_each_entry will try to perform a check on the
already freed memory to see if it has to terminate the loop.
The fix is to use 'break' when we know we are freeing the current
fs_devs.
Fixes: 4fde46f0cc71 ("Btrfs: free the stale device")
Signed-off-by: Nikolay Borisov <nborisov(a)suse.com>
Reviewed-by: Anand Jain <anand.jain(a)oracle.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/btrfs/volumes.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -568,6 +568,7 @@ void btrfs_free_stale_device(struct btrf
btrfs_sysfs_remove_fsid(fs_devs);
list_del(&fs_devs->list);
free_fs_devices(fs_devs);
+ break;
} else {
fs_devs->num_devices--;
list_del(&dev->dev_list);
Patches currently in stable-queue which might be from nborisov(a)suse.com are
queue-4.4/btrfs-fix-use-after-free-when-cleaning-up-fs_devs-with-a-single-stale-device.patch
This is a note to let you know that I've just added the patch titled
btrfs: alloc_chunk: fix DUP stripe size handling
to the 4.4-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
btrfs-alloc_chunk-fix-dup-stripe-size-handling.patch
and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From 92e222df7b8f05c565009c7383321b593eca488b Mon Sep 17 00:00:00 2001
From: Hans van Kranenburg <hans.van.kranenburg(a)mendix.com>
Date: Mon, 5 Feb 2018 17:45:11 +0100
Subject: btrfs: alloc_chunk: fix DUP stripe size handling
From: Hans van Kranenburg <hans.van.kranenburg(a)mendix.com>
commit 92e222df7b8f05c565009c7383321b593eca488b upstream.
In case of using DUP, we search for enough unallocated disk space on a
device to hold two stripes.
The devices_info[ndevs-1].max_avail that holds the amount of unallocated
space found is directly assigned to stripe_size, while it's actually
twice the stripe size.
Later on in the code, an unconditional division of stripe_size by
dev_stripes corrects the value, but in the meantime there's a check to
see if the stripe_size does not exceed max_chunk_size. Since during this
check stripe_size is twice the amount as intended, the check will reduce
the stripe_size to max_chunk_size if the actual correct to be used
stripe_size is more than half the amount of max_chunk_size.
The unconditional division later tries to correct stripe_size, but will
actually make sure we can't allocate more than half the max_chunk_size.
Fix this by moving the division by dev_stripes before the max chunk size
check, so it always contains the right value, instead of putting a duct
tape division in further on to get it fixed again.
Since in all other cases than DUP, dev_stripes is 1, this change only
affects DUP.
Other attempts in the past were made to fix this:
* 37db63a400 "Btrfs: fix max chunk size check in chunk allocator" tried
to fix the same problem, but still resulted in part of the code acting
on a wrongly doubled stripe_size value.
* 86db25785a "Btrfs: fix max chunk size on raid5/6" unintentionally
broke this fix again.
The real problem was already introduced with the rest of the code in
73c5de0051.
The user visible result however will be that the max chunk size for DUP
will suddenly double, while it's actually acting according to the limits
in the code again like it was 5 years ago.
Reported-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Link: https://www.spinics.net/lists/linux-btrfs/msg69752.html
Fixes: 73c5de0051 ("btrfs: quasi-round-robin for chunk allocation")
Fixes: 86db25785a ("Btrfs: fix max chunk size on raid5/6")
Signed-off-by: Hans van Kranenburg <hans.van.kranenburg(a)mendix.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
[ update comment ]
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/btrfs/volumes.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -4638,10 +4638,13 @@ static int __btrfs_alloc_chunk(struct bt
if (devs_max && ndevs > devs_max)
ndevs = devs_max;
/*
- * the primary goal is to maximize the number of stripes, so use as many
- * devices as possible, even if the stripes are not maximum sized.
+ * The primary goal is to maximize the number of stripes, so use as
+ * many devices as possible, even if the stripes are not maximum sized.
+ *
+ * The DUP profile stores more than one stripe per device, the
+ * max_avail is the total size so we have to adjust.
*/
- stripe_size = devices_info[ndevs-1].max_avail;
+ stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
num_stripes = ndevs * dev_stripes;
/*
@@ -4681,8 +4684,6 @@ static int __btrfs_alloc_chunk(struct bt
stripe_size = devices_info[ndevs-1].max_avail;
}
- stripe_size = div_u64(stripe_size, dev_stripes);
-
/* align to BTRFS_STRIPE_LEN */
stripe_size = div_u64(stripe_size, raid_stripe_len);
stripe_size *= raid_stripe_len;
Patches currently in stable-queue which might be from hans.van.kranenburg(a)mendix.com are
queue-4.4/btrfs-alloc_chunk-fix-dup-stripe-size-handling.patch
This is a note to let you know that I've just added the patch titled
btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
btrfs-remove-spurious-warn_on-ref-count-0-in-find_parent_nodes.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From c8195a7b1ad5648857ce20ba24f384faed8512bc Mon Sep 17 00:00:00 2001
From: Zygo Blaxell <ce3g8jdj(a)umail.furryterror.org>
Date: Tue, 23 Jan 2018 22:22:09 -0500
Subject: btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
From: Zygo Blaxell <ce3g8jdj(a)umail.furryterror.org>
commit c8195a7b1ad5648857ce20ba24f384faed8512bc upstream.
Until v4.14, this warning was very infrequent:
WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 find_parent_nodes+0xc41/0x14e0
Modules linked in: [...]
CPU: 3 PID: 18172 Comm: bees Tainted: G D W L 4.11.9-zb64+ #1
Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101 12/02/2014
Call Trace:
dump_stack+0x85/0xc2
__warn+0xd1/0xf0
warn_slowpath_null+0x1d/0x20
find_parent_nodes+0xc41/0x14e0
__btrfs_find_all_roots+0xad/0x120
? extent_same_check_offsets+0x70/0x70
iterate_extent_inodes+0x168/0x300
iterate_inodes_from_logical+0x87/0xb0
? iterate_inodes_from_logical+0x87/0xb0
? extent_same_check_offsets+0x70/0x70
btrfs_ioctl+0x8ac/0x2820
? lock_acquire+0xc2/0x200
do_vfs_ioctl+0x91/0x700
? __fget+0x112/0x200
SyS_ioctl+0x79/0x90
entry_SYSCALL_64_fastpath+0x23/0xc6
? trace_hardirqs_off_caller+0x1f/0x140
Starting with v4.14 (specifically 86d5f9944252 ("btrfs: convert prelimary
reference tracking to use rbtrees")) the WARN_ON occurs three orders of
magnitude more frequently--almost once per second while running workloads
like bees.
Replace the WARN_ON() with a comment rationale for its removal.
The rationale is paraphrased from an explanation by Edmund Nadolski
<enadolski(a)suse.de> on the linux-btrfs mailing list.
Fixes: 8da6d5815c59 ("Btrfs: added btrfs_find_all_roots()")
Signed-off-by: Zygo Blaxell <ce3g8jdj(a)umail.furryterror.org>
Reviewed-by: Lu Fengqi <lufq.fnst(a)cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/btrfs/backref.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -1263,7 +1263,16 @@ again:
while (node) {
ref = rb_entry(node, struct prelim_ref, rbnode);
node = rb_next(&ref->rbnode);
- WARN_ON(ref->count < 0);
+ /*
+ * ref->count < 0 can happen here if there are delayed
+ * refs with a node->action of BTRFS_DROP_DELAYED_REF.
+ * prelim_ref_insert() relies on this when merging
+ * identical refs to keep the overall count correct.
+ * prelim_ref_insert() will merge only those refs
+ * which compare identically. Any refs having
+ * e.g. different offsets would not be merged,
+ * and would retain their original ref->count < 0.
+ */
if (roots && ref->count && ref->root_id && ref->parent == 0) {
if (sc && sc->root_objectid &&
ref->root_id != sc->root_objectid) {
Patches currently in stable-queue which might be from ce3g8jdj(a)umail.furryterror.org are
queue-4.15/btrfs-remove-spurious-warn_on-ref-count-0-in-find_parent_nodes.patch