From: Vlastimil Babka <vbabka(a)suse.cz>
The x86 version of get_user_pages_fast() relies on disabled interrupts to
synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
releases the page. As TLB flush is done synchronously via IPI disabling
interrupts blocks the page release, and get_page(), which assumes existing
reference on page, is thus safe.
However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is
no blocking thanks to disabled interrupts, and get_page() can succeed on a page
that was already freed or even reused.
We have recently seen this happen with our 4.4 and 4.12 based kernels, with
userspace (java) that exits a thread, where mm_release() performs a futex_wake()
on tsk->clear_child_tid, and another thread in parallel unmaps the page where
tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
immediately releases the page again, while it's already on a freelist. Symptoms
include a bad page state warning, general protection faults acessing a poisoned
list prev/next pointer in the freelist, or free page pcplists of two cpus joined
together in a single list. Oscar has also reproduced this scenario, with a
patch inserting delays before the get_page() to make the race window larger.
Fix this by removing the dependency on TLB flush interrupts the same way as the
generic get_user_pages_fast() code by using page_cache_add_speculative() and
revalidating the PTE contents after pinning the page. Mainline is safe since
4.13 where the x86 gup code was removed in favor of the common code. Accessing
the page table itself safely also relies on disabled interrupts and TLB flush
IPIs that don't happen with hypercalls, which was acknowledged in commit
9e52fc2b50de ("x86/mm: Enable RCU based page table freeing
(CONFIG_HAVE_RCU_TABLE_FREE=y)"). That commit with follups should also be
backported for full safety, although our reproducer didn't hit a problem
without that backport.
Reproduced-by: Oscar Salvador <osalvador(a)suse.de>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
---
arch/x86/mm/gup.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 6612d532e42e..6379a4883c0a 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -9,6 +9,7 @@
#include <linux/vmstat.h>
#include <linux/highmem.h>
#include <linux/swap.h>
+#include <linux/pagemap.h>
#include <asm/pgtable.h>
@@ -95,10 +96,23 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
}
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
- if (unlikely(!try_get_page(page))) {
+
+ if (WARN_ON_ONCE(page_ref_count(page) < 0)) {
+ pte_unmap(ptep);
+ return 0;
+ }
+
+ if (!page_cache_get_speculative(page)) {
pte_unmap(ptep);
return 0;
}
+
+ if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+ put_page(page);
+ pte_unmap(ptep);
+ return 0;
+ }
+
SetPageReferenced(page);
pages[*nr] = page;
(*nr)++;
--
2.23.0
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 60e4cf67a582d64f07713eda5fcc8ccdaf7833e6 Mon Sep 17 00:00:00 2001
From: Jeff Mahoney <jeffm(a)suse.com>
Date: Thu, 24 Oct 2019 10:31:27 -0400
Subject: [PATCH] reiserfs: fix extended attributes on the root directory
Since commit d0a5b995a308 (vfs: Add IOP_XATTR inode operations flag)
extended attributes haven't worked on the root directory in reiserfs.
This is due to reiserfs conditionally setting the sb->s_xattrs handler
array depending on whether it located or create the internal privroot
directory. It necessarily does this after the root inode is already
read in. The IOP_XATTR flag is set during inode initialization, so
it never gets set on the root directory.
This commit unconditionally assigns sb->s_xattrs and clears IOP_XATTR on
internal inodes. The old return values due to the conditional assignment
are handled via open_xa_root, which now returns EOPNOTSUPP as the VFS
would have done.
Link: https://lore.kernel.org/r/20191024143127.17509-1-jeffm@suse.com
CC: stable(a)vger.kernel.org
Fixes: d0a5b995a308 ("vfs: Add IOP_XATTR inode operations flag")
Signed-off-by: Jeff Mahoney <jeffm(a)suse.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/reiserfs/inode.c b/fs/reiserfs/inode.c
index 132ec4406ed0..6419e6dacc39 100644
--- a/fs/reiserfs/inode.c
+++ b/fs/reiserfs/inode.c
@@ -2097,6 +2097,15 @@ int reiserfs_new_inode(struct reiserfs_transaction_handle *th,
goto out_inserted_sd;
}
+ /*
+ * Mark it private if we're creating the privroot
+ * or something under it.
+ */
+ if (IS_PRIVATE(dir) || dentry == REISERFS_SB(sb)->priv_root) {
+ inode->i_flags |= S_PRIVATE;
+ inode->i_opflags &= ~IOP_XATTR;
+ }
+
if (reiserfs_posixacl(inode->i_sb)) {
reiserfs_write_unlock(inode->i_sb);
retval = reiserfs_inherit_default_acl(th, dir, dentry, inode);
@@ -2111,8 +2120,7 @@ int reiserfs_new_inode(struct reiserfs_transaction_handle *th,
reiserfs_warning(inode->i_sb, "jdm-13090",
"ACLs aren't enabled in the fs, "
"but vfs thinks they are!");
- } else if (IS_PRIVATE(dir))
- inode->i_flags |= S_PRIVATE;
+ }
if (security->name) {
reiserfs_write_unlock(inode->i_sb);
diff --git a/fs/reiserfs/namei.c b/fs/reiserfs/namei.c
index 97f3fc4fdd79..959a066b7bb0 100644
--- a/fs/reiserfs/namei.c
+++ b/fs/reiserfs/namei.c
@@ -377,10 +377,13 @@ static struct dentry *reiserfs_lookup(struct inode *dir, struct dentry *dentry,
/*
* Propagate the private flag so we know we're
- * in the priv tree
+ * in the priv tree. Also clear IOP_XATTR
+ * since we don't have xattrs on xattr files.
*/
- if (IS_PRIVATE(dir))
+ if (IS_PRIVATE(dir)) {
inode->i_flags |= S_PRIVATE;
+ inode->i_opflags &= ~IOP_XATTR;
+ }
}
reiserfs_write_unlock(dir->i_sb);
if (retval == IO_ERROR) {
diff --git a/fs/reiserfs/reiserfs.h b/fs/reiserfs/reiserfs.h
index e5ca9ed79e54..726580114d55 100644
--- a/fs/reiserfs/reiserfs.h
+++ b/fs/reiserfs/reiserfs.h
@@ -1168,6 +1168,8 @@ static inline int bmap_would_wrap(unsigned bmap_nr)
return bmap_nr > ((1LL << 16) - 1);
}
+extern const struct xattr_handler *reiserfs_xattr_handlers[];
+
/*
* this says about version of key of all items (but stat data) the
* object consists of
diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c
index d69b4ac0ae2f..3244037b1286 100644
--- a/fs/reiserfs/super.c
+++ b/fs/reiserfs/super.c
@@ -2049,6 +2049,8 @@ static int reiserfs_fill_super(struct super_block *s, void *data, int silent)
if (replay_only(s))
goto error_unlocked;
+ s->s_xattr = reiserfs_xattr_handlers;
+
if (bdev_read_only(s->s_bdev) && !sb_rdonly(s)) {
SWARN(silent, s, "clm-7000",
"Detected readonly device, marking FS readonly");
diff --git a/fs/reiserfs/xattr.c b/fs/reiserfs/xattr.c
index b5b26d8a192c..62b40df36c98 100644
--- a/fs/reiserfs/xattr.c
+++ b/fs/reiserfs/xattr.c
@@ -122,13 +122,13 @@ static struct dentry *open_xa_root(struct super_block *sb, int flags)
struct dentry *xaroot;
if (d_really_is_negative(privroot))
- return ERR_PTR(-ENODATA);
+ return ERR_PTR(-EOPNOTSUPP);
inode_lock_nested(d_inode(privroot), I_MUTEX_XATTR);
xaroot = dget(REISERFS_SB(sb)->xattr_root);
if (!xaroot)
- xaroot = ERR_PTR(-ENODATA);
+ xaroot = ERR_PTR(-EOPNOTSUPP);
else if (d_really_is_negative(xaroot)) {
int err = -ENODATA;
@@ -619,6 +619,10 @@ int reiserfs_xattr_set(struct inode *inode, const char *name,
int error, error2;
size_t jbegin_count = reiserfs_xattr_nblocks(inode, buffer_size);
+ /* Check before we start a transaction and then do nothing. */
+ if (!d_really_is_positive(REISERFS_SB(inode->i_sb)->priv_root))
+ return -EOPNOTSUPP;
+
if (!(flags & XATTR_REPLACE))
jbegin_count += reiserfs_xattr_jcreate_nblocks(inode);
@@ -841,8 +845,7 @@ ssize_t reiserfs_listxattr(struct dentry * dentry, char *buffer, size_t size)
if (d_really_is_negative(dentry))
return -EINVAL;
- if (!dentry->d_sb->s_xattr ||
- get_inode_sd_version(d_inode(dentry)) == STAT_DATA_V1)
+ if (get_inode_sd_version(d_inode(dentry)) == STAT_DATA_V1)
return -EOPNOTSUPP;
dir = open_xa_dir(d_inode(dentry), XATTR_REPLACE);
@@ -882,6 +885,7 @@ static int create_privroot(struct dentry *dentry)
}
d_inode(dentry)->i_flags |= S_PRIVATE;
+ d_inode(dentry)->i_opflags &= ~IOP_XATTR;
reiserfs_info(dentry->d_sb, "Created %s - reserved for xattr "
"storage.\n", PRIVROOT_NAME);
@@ -895,7 +899,7 @@ static int create_privroot(struct dentry *dentry) { return 0; }
#endif
/* Actual operations that are exported to VFS-land */
-static const struct xattr_handler *reiserfs_xattr_handlers[] = {
+const struct xattr_handler *reiserfs_xattr_handlers[] = {
#ifdef CONFIG_REISERFS_FS_XATTR
&reiserfs_xattr_user_handler,
&reiserfs_xattr_trusted_handler,
@@ -966,8 +970,10 @@ int reiserfs_lookup_privroot(struct super_block *s)
if (!IS_ERR(dentry)) {
REISERFS_SB(s)->priv_root = dentry;
d_set_d_op(dentry, &xattr_lookup_poison_ops);
- if (d_really_is_positive(dentry))
+ if (d_really_is_positive(dentry)) {
d_inode(dentry)->i_flags |= S_PRIVATE;
+ d_inode(dentry)->i_opflags &= ~IOP_XATTR;
+ }
} else
err = PTR_ERR(dentry);
inode_unlock(d_inode(s->s_root));
@@ -996,7 +1002,6 @@ int reiserfs_xattr_init(struct super_block *s, int mount_flags)
}
if (d_really_is_positive(privroot)) {
- s->s_xattr = reiserfs_xattr_handlers;
inode_lock(d_inode(privroot));
if (!REISERFS_SB(s)->xattr_root) {
struct dentry *dentry;
diff --git a/fs/reiserfs/xattr_acl.c b/fs/reiserfs/xattr_acl.c
index aa9380bac196..05f666794561 100644
--- a/fs/reiserfs/xattr_acl.c
+++ b/fs/reiserfs/xattr_acl.c
@@ -320,10 +320,8 @@ reiserfs_inherit_default_acl(struct reiserfs_transaction_handle *th,
* would be useless since permissions are ignored, and a pain because
* it introduces locking cycles
*/
- if (IS_PRIVATE(dir)) {
- inode->i_flags |= S_PRIVATE;
+ if (IS_PRIVATE(inode))
goto apply_umask;
- }
err = posix_acl_create(dir, &inode->i_mode, &default_acl, &acl);
if (err)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c9029ef9c95765e7b63c4d9aa780674447db1ec0 Mon Sep 17 00:00:00 2001
From: Nathan Chancellor <natechancellor(a)gmail.com>
Date: Mon, 18 Nov 2019 21:57:11 -0700
Subject: [PATCH] powerpc: Avoid clang warnings around setjmp and longjmp
Commit aea447141c7e ("powerpc: Disable -Wbuiltin-requires-header when
setjmp is used") disabled -Wbuiltin-requires-header because of a
warning about the setjmp and longjmp declarations.
r367387 in clang added another diagnostic around this, complaining
that there is no jmp_buf declaration.
In file included from ../arch/powerpc/xmon/xmon.c:47:
../arch/powerpc/include/asm/setjmp.h:10:13: error: declaration of
built-in function 'setjmp' requires the declaration of the 'jmp_buf'
type, commonly provided in the header <setjmp.h>.
[-Werror,-Wincomplete-setjmp-declaration]
extern long setjmp(long *);
^
../arch/powerpc/include/asm/setjmp.h:11:13: error: declaration of
built-in function 'longjmp' requires the declaration of the 'jmp_buf'
type, commonly provided in the header <setjmp.h>.
[-Werror,-Wincomplete-setjmp-declaration]
extern void longjmp(long *, long);
^
2 errors generated.
We are not using the standard library's longjmp/setjmp implementations
for obvious reasons; make this clear to clang by using -ffreestanding
on these files.
Cc: stable(a)vger.kernel.org # 4.14+
Suggested-by: Segher Boessenkool <segher(a)kernel.crashing.org>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Signed-off-by: Nathan Chancellor <natechancellor(a)gmail.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20191119045712.39633-3-natechancellor@gmail.com
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 16c1c5a19519..378f6108a414 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -3,8 +3,8 @@
# Makefile for the linux kernel.
#
-# Disable clang warning for using setjmp without setjmp.h header
-CFLAGS_crash.o += $(call cc-disable-warning, builtin-requires-header)
+# Avoid clang warnings around longjmp/setjmp declarations
+CFLAGS_crash.o += -ffreestanding
obj-y += core.o crash.o core_$(BITS).o
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index f142570ad860..c3842dbeb1b7 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for xmon
-# Disable clang warning for using setjmp without setjmp.h header
-subdir-ccflags-y := $(call cc-disable-warning, builtin-requires-header)
+# Avoid clang warnings around longjmp/setjmp declarations
+subdir-ccflags-y := -ffreestanding
GCOV_PROFILE := n
KCOV_INSTRUMENT := n
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7f028caadf6c37580d0f59c6c094ed09afc04062 Mon Sep 17 00:00:00 2001
From: Krzysztof Kozlowski <krzk(a)kernel.org>
Date: Mon, 5 Aug 2019 18:27:09 +0200
Subject: [PATCH] pinctrl: samsung: Fix device node refcount leaks in S3C64xx
wakeup controller init
In s3c64xx_eint_eint0_init() the for_each_child_of_node() loop is used
with a break to find a matching child node. Although each iteration of
for_each_child_of_node puts the previous node, but early exit from loop
misses it. This leads to leak of device node.
Cc: <stable(a)vger.kernel.org>
Fixes: 61dd72613177 ("pinctrl: Add pinctrl-s3c64xx driver")
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
diff --git a/drivers/pinctrl/samsung/pinctrl-s3c64xx.c b/drivers/pinctrl/samsung/pinctrl-s3c64xx.c
index c399f0932af5..f97f8179f2b1 100644
--- a/drivers/pinctrl/samsung/pinctrl-s3c64xx.c
+++ b/drivers/pinctrl/samsung/pinctrl-s3c64xx.c
@@ -704,8 +704,10 @@ static int s3c64xx_eint_eint0_init(struct samsung_pinctrl_drv_data *d)
return -ENODEV;
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
- if (!data)
+ if (!data) {
+ of_node_put(eint0_np);
return -ENOMEM;
+ }
data->drvdata = d;
for (i = 0; i < NUM_EINT0_IRQ; ++i) {
@@ -714,6 +716,7 @@ static int s3c64xx_eint_eint0_init(struct samsung_pinctrl_drv_data *d)
irq = irq_of_parse_and_map(eint0_np, i);
if (!irq) {
dev_err(dev, "failed to get wakeup EINT IRQ %d\n", i);
+ of_node_put(eint0_np);
return -ENXIO;
}
@@ -721,6 +724,7 @@ static int s3c64xx_eint_eint0_init(struct samsung_pinctrl_drv_data *d)
s3c64xx_eint0_handlers[i],
data);
}
+ of_node_put(eint0_np);
bank = d->pin_banks;
for (i = 0; i < d->nr_banks; ++i, ++bank) {
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 287897f9aaa2ad1c923d9875914f57c4dc9159c8 Mon Sep 17 00:00:00 2001
From: Jarkko Nikula <jarkko.nikula(a)bitmer.com>
Date: Sat, 16 Nov 2019 17:16:51 +0200
Subject: [PATCH] ARM: dts: omap3-tao3530: Fix incorrect MMC card detection
GPIO polarity
The MMC card detection GPIO polarity is active low on TAO3530, like in many
other similar boards. Now the card is not detected and it is unable to
mount rootfs from an SD card.
Fix this by using the correct polarity.
This incorrect polarity was defined already in the commit 30d95c6d7092
("ARM: dts: omap3: Add Technexion TAO3530 SOM omap3-tao3530.dtsi") in v3.18
kernel and later changed to use defined GPIO constants in v4.4 kernel by
the commit 3a637e008e54 ("ARM: dts: Use defined GPIO constants in flags
cell for OMAP2+ boards").
While the latter commit did not introduce the issue I'm marking it with
Fixes tag due the v4.4 kernels still being maintained.
Fixes: 3a637e008e54 ("ARM: dts: Use defined GPIO constants in flags cell for OMAP2+ boards")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.4+
Signed-off-by: Jarkko Nikula <jarkko.nikula(a)bitmer.com>
Signed-off-by: Tony Lindgren <tony(a)atomide.com>
diff --git a/arch/arm/boot/dts/omap3-tao3530.dtsi b/arch/arm/boot/dts/omap3-tao3530.dtsi
index a7a04d78deeb..f24e2326cfa7 100644
--- a/arch/arm/boot/dts/omap3-tao3530.dtsi
+++ b/arch/arm/boot/dts/omap3-tao3530.dtsi
@@ -222,7 +222,7 @@ &mmc1 {
pinctrl-0 = <&mmc1_pins>;
vmmc-supply = <&vmmc1>;
vqmmc-supply = <&vsim>;
- cd-gpios = <&twl_gpio 0 GPIO_ACTIVE_HIGH>;
+ cd-gpios = <&twl_gpio 0 GPIO_ACTIVE_LOW>;
bus-width = <8>;
};
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a0e248bb502d5165b3314ac3819e888fdcdf7d9f Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Fri, 11 Oct 2019 16:41:20 +0100
Subject: [PATCH] Btrfs: fix negative subv_writers counter and data space leak
after buffered write
When doing a buffered write it's possible to leave the subv_writers
counter of the root, used for synchronization between buffered nocow
writers and snapshotting. This happens in an exceptional case like the
following:
1) We fail to allocate data space for the write, since there's not
enough available data space nor enough unallocated space for allocating
a new data block group;
2) Because of that failure, we try to go to NOCOW mode, which succeeds
and therefore we set the local variable 'only_release_metadata' to true
and set the root's sub_writers counter to 1 through the call to
btrfs_start_write_no_snapshotting() made by check_can_nocow();
3) The call to btrfs_copy_from_user() returns zero, which is very unlikely
to happen but not impossible;
4) No pages are copied because btrfs_copy_from_user() returned zero;
5) We call btrfs_end_write_no_snapshotting() which decrements the root's
subv_writers counter to 0;
6) We don't set 'only_release_metadata' back to 'false' because we do
it only if 'copied', the value returned by btrfs_copy_from_user(), is
greater than zero;
7) On the next iteration of the while loop, which processes the same
page range, we are now able to allocate data space for the write (we
got enough data space released in the meanwhile);
8) After this if we fail at btrfs_delalloc_reserve_metadata(), because
now there isn't enough free metadata space, or in some other place
further below (prepare_pages(), lock_and_cleanup_extent_if_need(),
btrfs_dirty_pages()), we break out of the while loop with
'only_release_metadata' having a value of 'true';
9) Because 'only_release_metadata' is 'true' we end up decrementing the
root's subv_writers counter to -1 (through a call to
btrfs_end_write_no_snapshotting()), and we also end up not releasing the
data space previously reserved through btrfs_check_data_free_space().
As a consequence the mechanism for synchronizing NOCOW buffered writes
with snapshotting gets broken.
Fix this by always setting 'only_release_metadata' to false at the start
of each iteration.
Fixes: 8257b2dc3c1a ("Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume")
Fixes: 7ee9e4405f26 ("Btrfs: check if we can nocow if we don't have data space")
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index f9434fa3e387..91c98a6d1408 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1636,6 +1636,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
break;
}
+ only_release_metadata = false;
sector_offset = pos & (fs_info->sectorsize - 1);
reserve_bytes = round_up(write_bytes + sector_offset,
fs_info->sectorsize);
@@ -1791,7 +1792,6 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
set_extent_bit(&BTRFS_I(inode)->io_tree, lockstart,
lockend, EXTENT_NORESERVE, NULL,
NULL, GFP_NOFS);
- only_release_metadata = false;
}
btrfs_drop_pages(pages, num_pages);
The patch below does not apply to the 5.3-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a60adce85f4bb5c1ef8ffcebadd702cafa2f3696 Mon Sep 17 00:00:00 2001
From: Josef Bacik <josef(a)toxicpanda.com>
Date: Tue, 24 Sep 2019 16:50:44 -0400
Subject: [PATCH] btrfs: use btrfs_block_group_cache_done in update_block_group
When free'ing extents in a block group we check to see if the block
group is not cached, and then cache it if we need to. However we'll
just carry on as long as we're loading the cache. This is problematic
because we are dirtying the block group here. If we are fast enough we
could do a transaction commit and clear the free space cache while we're
still loading the space cache in another thread. This truncates the
free space inode, which will keep it from loading the space cache.
Fix this by using the btrfs_block_group_cache_done helper so that we try
to load the space cache unconditionally here, which will result in the
caller waiting for the fast caching to complete and keep us from
truncating the free space inode.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: Nikolay Borisov <nborisov(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 384659dc7818..540a7a63601e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -2661,7 +2661,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
* is because we need the unpinning stage to actually add the
* space back to the block group, otherwise we will leak space.
*/
- if (!alloc && cache->cached == BTRFS_CACHE_NO)
+ if (!alloc && !btrfs_block_group_cache_done(cache))
btrfs_cache_block_group(cache, 1);
byte_in_group = bytenr - cache->key.objectid;