The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ecc64fab7d49c678e70bd4c35fe64d2ab3e3d212 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Tue, 27 Jul 2021 11:24:43 +0100
Subject: [PATCH] btrfs: fix lost inode on log replay after mix of fsync,
rename and inode eviction
When checking if we need to log the new name of a renamed inode, we are
checking if the inode and its parent inode have been logged before, and if
not we don't log the new name. The check however is buggy, as it directly
compares the logged_trans field of the inodes versus the ID of the current
transaction. The problem is that logged_trans is a transient field, only
stored in memory and never persisted in the inode item, so if an inode
was logged before, evicted and reloaded, its logged_trans field is set to
a value of 0, meaning the check will return false and the new name of the
renamed inode is not logged. If the old parent directory was previously
fsynced and we deleted the logged directory entries corresponding to the
old name, we end up with a log that when replayed will delete the renamed
inode.
The following example triggers the problem:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/A
$ mkdir /mnt/B
$ echo -n "hello world" > /mnt/A/foo
$ sync
# Add some new file to A and fsync directory A.
$ touch /mnt/A/bar
$ xfs_io -c "fsync" /mnt/A
# Now trigger inode eviction. We are only interested in triggering
# eviction for the inode of directory A.
$ echo 2 > /proc/sys/vm/drop_caches
# Move foo from directory A to directory B.
# This deletes the directory entries for foo in A from the log, and
# does not add the new name for foo in directory B to the log, because
# logged_trans of A is 0, which is less than the current transaction ID.
$ mv /mnt/A/foo /mnt/B/foo
# Now make an fsync to anything except A, B or any file inside them,
# like for example create a file at the root directory and fsync this
# new file. This syncs the log that contains all the changes done by
# previous rename operation.
$ touch /mnt/baz
$ xfs_io -c "fsync" /mnt/baz
<power fail>
# Mount the filesystem and replay the log.
$ mount /dev/sdc /mnt
# Check the filesystem content.
$ ls -1R /mnt
/mnt/:
A
B
baz
/mnt/A:
bar
/mnt/B:
$
# File foo is gone, it's neither in A/ nor in B/.
Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
safely checks if an inode was logged before in the current transaction.
A test case for fstests will follow soon.
CC: stable(a)vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9fd0348be7f5..e6430ac9bbe8 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -6503,8 +6503,8 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
* if this inode hasn't been logged and directory we're renaming it
* from hasn't been logged, we don't need to log it
*/
- if (inode->logged_trans < trans->transid &&
- (!old_dir || old_dir->logged_trans < trans->transid))
+ if (!inode_logged(trans, inode) &&
+ (!old_dir || !inode_logged(trans, old_dir)))
return;
/*
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ecc64fab7d49c678e70bd4c35fe64d2ab3e3d212 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Tue, 27 Jul 2021 11:24:43 +0100
Subject: [PATCH] btrfs: fix lost inode on log replay after mix of fsync,
rename and inode eviction
When checking if we need to log the new name of a renamed inode, we are
checking if the inode and its parent inode have been logged before, and if
not we don't log the new name. The check however is buggy, as it directly
compares the logged_trans field of the inodes versus the ID of the current
transaction. The problem is that logged_trans is a transient field, only
stored in memory and never persisted in the inode item, so if an inode
was logged before, evicted and reloaded, its logged_trans field is set to
a value of 0, meaning the check will return false and the new name of the
renamed inode is not logged. If the old parent directory was previously
fsynced and we deleted the logged directory entries corresponding to the
old name, we end up with a log that when replayed will delete the renamed
inode.
The following example triggers the problem:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/A
$ mkdir /mnt/B
$ echo -n "hello world" > /mnt/A/foo
$ sync
# Add some new file to A and fsync directory A.
$ touch /mnt/A/bar
$ xfs_io -c "fsync" /mnt/A
# Now trigger inode eviction. We are only interested in triggering
# eviction for the inode of directory A.
$ echo 2 > /proc/sys/vm/drop_caches
# Move foo from directory A to directory B.
# This deletes the directory entries for foo in A from the log, and
# does not add the new name for foo in directory B to the log, because
# logged_trans of A is 0, which is less than the current transaction ID.
$ mv /mnt/A/foo /mnt/B/foo
# Now make an fsync to anything except A, B or any file inside them,
# like for example create a file at the root directory and fsync this
# new file. This syncs the log that contains all the changes done by
# previous rename operation.
$ touch /mnt/baz
$ xfs_io -c "fsync" /mnt/baz
<power fail>
# Mount the filesystem and replay the log.
$ mount /dev/sdc /mnt
# Check the filesystem content.
$ ls -1R /mnt
/mnt/:
A
B
baz
/mnt/A:
bar
/mnt/B:
$
# File foo is gone, it's neither in A/ nor in B/.
Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
safely checks if an inode was logged before in the current transaction.
A test case for fstests will follow soon.
CC: stable(a)vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9fd0348be7f5..e6430ac9bbe8 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -6503,8 +6503,8 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
* if this inode hasn't been logged and directory we're renaming it
* from hasn't been logged, we don't need to log it
*/
- if (inode->logged_trans < trans->transid &&
- (!old_dir || old_dir->logged_trans < trans->transid))
+ if (!inode_logged(trans, inode) &&
+ (!old_dir || !inode_logged(trans, old_dir)))
return;
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ecc64fab7d49c678e70bd4c35fe64d2ab3e3d212 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Tue, 27 Jul 2021 11:24:43 +0100
Subject: [PATCH] btrfs: fix lost inode on log replay after mix of fsync,
rename and inode eviction
When checking if we need to log the new name of a renamed inode, we are
checking if the inode and its parent inode have been logged before, and if
not we don't log the new name. The check however is buggy, as it directly
compares the logged_trans field of the inodes versus the ID of the current
transaction. The problem is that logged_trans is a transient field, only
stored in memory and never persisted in the inode item, so if an inode
was logged before, evicted and reloaded, its logged_trans field is set to
a value of 0, meaning the check will return false and the new name of the
renamed inode is not logged. If the old parent directory was previously
fsynced and we deleted the logged directory entries corresponding to the
old name, we end up with a log that when replayed will delete the renamed
inode.
The following example triggers the problem:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/A
$ mkdir /mnt/B
$ echo -n "hello world" > /mnt/A/foo
$ sync
# Add some new file to A and fsync directory A.
$ touch /mnt/A/bar
$ xfs_io -c "fsync" /mnt/A
# Now trigger inode eviction. We are only interested in triggering
# eviction for the inode of directory A.
$ echo 2 > /proc/sys/vm/drop_caches
# Move foo from directory A to directory B.
# This deletes the directory entries for foo in A from the log, and
# does not add the new name for foo in directory B to the log, because
# logged_trans of A is 0, which is less than the current transaction ID.
$ mv /mnt/A/foo /mnt/B/foo
# Now make an fsync to anything except A, B or any file inside them,
# like for example create a file at the root directory and fsync this
# new file. This syncs the log that contains all the changes done by
# previous rename operation.
$ touch /mnt/baz
$ xfs_io -c "fsync" /mnt/baz
<power fail>
# Mount the filesystem and replay the log.
$ mount /dev/sdc /mnt
# Check the filesystem content.
$ ls -1R /mnt
/mnt/:
A
B
baz
/mnt/A:
bar
/mnt/B:
$
# File foo is gone, it's neither in A/ nor in B/.
Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
safely checks if an inode was logged before in the current transaction.
A test case for fstests will follow soon.
CC: stable(a)vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9fd0348be7f5..e6430ac9bbe8 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -6503,8 +6503,8 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
* if this inode hasn't been logged and directory we're renaming it
* from hasn't been logged, we don't need to log it
*/
- if (inode->logged_trans < trans->transid &&
- (!old_dir || old_dir->logged_trans < trans->transid))
+ if (!inode_logged(trans, inode) &&
+ (!old_dir || !inode_logged(trans, old_dir)))
return;
/*
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ecc64fab7d49c678e70bd4c35fe64d2ab3e3d212 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Tue, 27 Jul 2021 11:24:43 +0100
Subject: [PATCH] btrfs: fix lost inode on log replay after mix of fsync,
rename and inode eviction
When checking if we need to log the new name of a renamed inode, we are
checking if the inode and its parent inode have been logged before, and if
not we don't log the new name. The check however is buggy, as it directly
compares the logged_trans field of the inodes versus the ID of the current
transaction. The problem is that logged_trans is a transient field, only
stored in memory and never persisted in the inode item, so if an inode
was logged before, evicted and reloaded, its logged_trans field is set to
a value of 0, meaning the check will return false and the new name of the
renamed inode is not logged. If the old parent directory was previously
fsynced and we deleted the logged directory entries corresponding to the
old name, we end up with a log that when replayed will delete the renamed
inode.
The following example triggers the problem:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/A
$ mkdir /mnt/B
$ echo -n "hello world" > /mnt/A/foo
$ sync
# Add some new file to A and fsync directory A.
$ touch /mnt/A/bar
$ xfs_io -c "fsync" /mnt/A
# Now trigger inode eviction. We are only interested in triggering
# eviction for the inode of directory A.
$ echo 2 > /proc/sys/vm/drop_caches
# Move foo from directory A to directory B.
# This deletes the directory entries for foo in A from the log, and
# does not add the new name for foo in directory B to the log, because
# logged_trans of A is 0, which is less than the current transaction ID.
$ mv /mnt/A/foo /mnt/B/foo
# Now make an fsync to anything except A, B or any file inside them,
# like for example create a file at the root directory and fsync this
# new file. This syncs the log that contains all the changes done by
# previous rename operation.
$ touch /mnt/baz
$ xfs_io -c "fsync" /mnt/baz
<power fail>
# Mount the filesystem and replay the log.
$ mount /dev/sdc /mnt
# Check the filesystem content.
$ ls -1R /mnt
/mnt/:
A
B
baz
/mnt/A:
bar
/mnt/B:
$
# File foo is gone, it's neither in A/ nor in B/.
Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
safely checks if an inode was logged before in the current transaction.
A test case for fstests will follow soon.
CC: stable(a)vger.kernel.org # 4.14+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9fd0348be7f5..e6430ac9bbe8 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -6503,8 +6503,8 @@ void btrfs_log_new_name(struct btrfs_trans_handle *trans,
* if this inode hasn't been logged and directory we're renaming it
* from hasn't been logged, we don't need to log it
*/
- if (inode->logged_trans < trans->transid &&
- (!old_dir || old_dir->logged_trans < trans->transid))
+ if (!inode_logged(trans, inode) &&
+ (!old_dir || !inode_logged(trans, old_dir)))
return;
/*
Dear list,
The current 5.13 stable kernel branch oopses when handling ext2
filesystems, and the filesystem is not usable, sometimes leading
to a panic.
The bug was introduced during the 5.13 development cycle.
A complete analysis can be found here:
https://lore.kernel.org/linux-ext4/20210713165821.8a268e2c1db4fd5cf452acd2@…
A fix for this bug has been recently merged into the mainline
kernel, with commit id 728d392f8a799f037812d0f2b254fb3b5e115fcf.
The 5.13 branch is the only one affected by this bug.
Best regards,
Javier
This is the start of the stable review cycle for the 5.13.7 release.
There are 22 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 31 Jul 2021 13:51:22 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.13.7-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.13.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.13.7-rc1
Vasily Averin <vvs(a)virtuozzo.com>
ipv6: ip6_finish_output2: set sk into newly allocated nskb
Sudeep Holla <sudeep.holla(a)arm.com>
ARM: dts: versatile: Fix up interrupt controller node names
Christoph Hellwig <hch(a)lst.de>
iomap: remove the length variable in iomap_seek_hole
Christoph Hellwig <hch(a)lst.de>
iomap: remove the length variable in iomap_seek_data
Hyunchul Lee <hyc.lee(a)gmail.com>
cifs: fix the out of range assignment to bit fields in parse_server_interfaces
Cristian Marussi <cristian.marussi(a)arm.com>
firmware: arm_scmi: Fix range check for the maximum number of pending messages
Sudeep Holla <sudeep.holla(a)arm.com>
firmware: arm_scmi: Fix possible scmi_linux_errmap buffer overflow
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add lock nesting notation to hfs_find_init
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: fix high memory mapping in hfs_bnode_read
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add missing clean-up in hfs_fill_super
Zheyu Ma <zheyuma97(a)gmail.com>
drm/ttm: add a check against null pointer dereference
Casey Chen <cachen(a)purestorage.com>
nvme-pci: fix multiple races in nvme_setup_io_queues
Vasily Averin <vvs(a)virtuozzo.com>
ipv6: allocate enough headroom in ip6_finish_output2()
Paul E. McKenney <paulmck(a)kernel.org>
rcu-tasks: Don't delete holdouts within trc_wait_for_one_reader()
Paul E. McKenney <paulmck(a)kernel.org>
rcu-tasks: Don't delete holdouts within trc_inspect_reader()
Xin Long <lucien.xin(a)gmail.com>
sctp: move 198 addresses from unusable to private scope
Eric Dumazet <edumazet(a)google.com>
net: annotate data race around sk_ll_usec
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/garp: fix memleak in garp_request_join()
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/mrp: fix memleak in mrp_request_join()
Paul Gortmaker <paul.gortmaker(a)windriver.com>
cgroup1: fix leaked context root causing sporadic NULL deref in LTP
Yang Yingliang <yangyingliang(a)huawei.com>
workqueue: fix UAF in pwq_unbound_release_workfn()
Miklos Szeredi <mszeredi(a)redhat.com>
af_unix: fix garbage collect vs MSG_PEEK
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/versatile-ab.dts | 5 +--
arch/arm/boot/dts/versatile-pb.dts | 2 +-
drivers/firmware/arm_scmi/driver.c | 12 +++---
drivers/gpu/drm/ttm/ttm_range_manager.c | 3 ++
drivers/nvme/host/pci.c | 66 +++++++++++++++++++++++++++++----
fs/cifs/smb2ops.c | 4 +-
fs/hfs/bfind.c | 14 ++++++-
fs/hfs/bnode.c | 25 ++++++++++---
fs/hfs/btree.h | 7 ++++
fs/hfs/super.c | 10 ++---
fs/internal.h | 1 -
fs/iomap/seek.c | 25 +++++--------
include/linux/fs_context.h | 1 +
include/net/busy_poll.h | 2 +-
include/net/sctp/constants.h | 4 +-
kernel/cgroup/cgroup-v1.c | 4 +-
kernel/rcu/tasks.h | 6 +--
kernel/workqueue.c | 20 ++++++----
net/802/garp.c | 14 +++++++
net/802/mrp.c | 14 +++++++
net/core/sock.c | 2 +-
net/ipv6/ip6_output.c | 28 ++++++++++++++
net/sctp/protocol.c | 3 +-
net/unix/af_unix.c | 51 ++++++++++++++++++++++++-
25 files changed, 256 insertions(+), 71 deletions(-)
This is the start of the stable review cycle for the 5.4.137 release.
There are 21 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 31 Jul 2021 13:51:22 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.137-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.137-rc1
Vasily Averin <vvs(a)virtuozzo.com>
ipv6: ip6_finish_output2: set sk into newly allocated nskb
Sudeep Holla <sudeep.holla(a)arm.com>
ARM: dts: versatile: Fix up interrupt controller node names
Christoph Hellwig <hch(a)lst.de>
iomap: remove the length variable in iomap_seek_hole
Christoph Hellwig <hch(a)lst.de>
iomap: remove the length variable in iomap_seek_data
Hyunchul Lee <hyc.lee(a)gmail.com>
cifs: fix the out of range assignment to bit fields in parse_server_interfaces
Cristian Marussi <cristian.marussi(a)arm.com>
firmware: arm_scmi: Fix range check for the maximum number of pending messages
Sudeep Holla <sudeep.holla(a)arm.com>
firmware: arm_scmi: Fix possible scmi_linux_errmap buffer overflow
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add lock nesting notation to hfs_find_init
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: fix high memory mapping in hfs_bnode_read
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add missing clean-up in hfs_fill_super
Vasily Averin <vvs(a)virtuozzo.com>
ipv6: allocate enough headroom in ip6_finish_output2()
Xin Long <lucien.xin(a)gmail.com>
sctp: move 198 addresses from unusable to private scope
Eric Dumazet <edumazet(a)google.com>
net: annotate data race around sk_ll_usec
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/garp: fix memleak in garp_request_join()
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/mrp: fix memleak in mrp_request_join()
Paul Gortmaker <paul.gortmaker(a)windriver.com>
cgroup1: fix leaked context root causing sporadic NULL deref in LTP
Yang Yingliang <yangyingliang(a)huawei.com>
workqueue: fix UAF in pwq_unbound_release_workfn()
Miklos Szeredi <mszeredi(a)redhat.com>
af_unix: fix garbage collect vs MSG_PEEK
Maxim Levitsky <mlevitsk(a)redhat.com>
KVM: x86: determine if an exception has an error code only when injecting it.
Yonghong Song <yhs(a)fb.com>
tools: Allow proper CC/CXX/... override with LLVM=1 in Makefile.include
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
selftest: fix build error in tools/testing/selftests/vm/userfaultfd.c
-------------
Diffstat:
Makefile | 4 +--
arch/arm/boot/dts/versatile-ab.dts | 5 ++--
arch/arm/boot/dts/versatile-pb.dts | 2 +-
arch/x86/kvm/x86.c | 13 +++++---
drivers/firmware/arm_scmi/driver.c | 12 ++++----
fs/cifs/smb2ops.c | 4 +--
fs/hfs/bfind.c | 14 ++++++++-
fs/hfs/bnode.c | 25 ++++++++++++----
fs/hfs/btree.h | 7 +++++
fs/hfs/super.c | 10 +++----
fs/internal.h | 1 -
fs/iomap/seek.c | 25 ++++++----------
include/linux/fs_context.h | 1 +
include/net/busy_poll.h | 2 +-
include/net/sctp/constants.h | 4 +--
kernel/cgroup/cgroup-v1.c | 4 +--
kernel/workqueue.c | 20 ++++++++-----
net/802/garp.c | 14 +++++++++
net/802/mrp.c | 14 +++++++++
net/core/sock.c | 2 +-
net/ipv6/ip6_output.c | 28 ++++++++++++++++++
net/sctp/protocol.c | 3 +-
net/unix/af_unix.c | 51 ++++++++++++++++++++++++++++++--
tools/scripts/Makefile.include | 12 ++++++--
tools/testing/selftests/vm/userfaultfd.c | 2 +-
25 files changed, 213 insertions(+), 66 deletions(-)