The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 41bd60676923822de1df2c50b3f9a10171f4338a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 28 Nov 2018 14:54:28 +0000
Subject: [PATCH] Btrfs: fix fsync of files with multiple hard links in new
directories
The log tree has a long standing problem that when a file is fsync'ed we
only check for new ancestors, created in the current transaction, by
following only the hard link for which the fsync was issued. We follow the
ancestors using the VFS' dget_parent() API. This means that if we create a
new link for a file in a directory that is new (or in an any other new
ancestor directory) and then fsync the file using an old hard link, we end
up not logging the new ancestor, and on log replay that new hard link and
ancestor do not exist. In some cases, involving renames, the file will not
exist at all.
Example:
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt
mkdir /mnt/A
touch /mnt/foo
ln /mnt/foo /mnt/A/bar
xfs_io -c fsync /mnt/foo
<power failure>
In this example after log replay only the hard link named 'foo' exists
and directory A does not exist, which is unexpected. In other major linux
filesystems, such as ext4, xfs and f2fs for example, both hard links exist
and so does directory A after mounting again the filesystem.
Checking if any new ancestors are new and need to be logged was added in
2009 by commit 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes"),
however only for the ancestors of the hard link (dentry) for which the
fsync was issued, instead of checking for all ancestors for all of the
inode's hard links.
So fix this by tracking the id of the last transaction where a hard link
was created for an inode and then on fsync fallback to a full transaction
commit when an inode has more than one hard link and at least one new hard
link was created in the current transaction. This is the simplest solution
since this is not a common use case (adding frequently hard links for
which there's an ancestor created in the current transaction and then
fsync the file). In case it ever becomes a common use case, a solution
that consists of iterating the fs/subvol btree for each hard link and
check if any ancestor is new, could be implemented.
This solves many unexpected scenarios reported by Jayashree Mohan and
Vijay Chidambaram, and for which there is a new test case for fstests
under review.
Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes")
CC: stable(a)vger.kernel.org # 4.4+
Reported-by: Vijay Chidambaram <vvijay03(a)gmail.com>
Reported-by: Jayashree Mohan <jayashree2912(a)gmail.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index fc25607304f2..6f5d07415dab 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -147,6 +147,12 @@ struct btrfs_inode {
*/
u64 last_unlink_trans;
+ /*
+ * Track the transaction id of the last transaction used to create a
+ * hard link for the inode. This is used by the log tree (fsync).
+ */
+ u64 last_link_trans;
+
/*
* Number of bytes outstanding that are going to need csums. This is
* used in ENOSPC accounting.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d54bdef16d8d..b4129d9072ec 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3658,6 +3658,21 @@ static int btrfs_read_locked_inode(struct inode *inode,
* inode is not a directory, logging its parent unnecessarily.
*/
BTRFS_I(inode)->last_unlink_trans = BTRFS_I(inode)->last_trans;
+ /*
+ * Similar reasoning for last_link_trans, needs to be set otherwise
+ * for a case like the following:
+ *
+ * mkdir A
+ * touch foo
+ * ln foo A/bar
+ * echo 2 > /proc/sys/vm/drop_caches
+ * fsync foo
+ * <power failure>
+ *
+ * Would result in link bar and directory A not existing after the power
+ * failure.
+ */
+ BTRFS_I(inode)->last_link_trans = BTRFS_I(inode)->last_trans;
path->slots[0]++;
if (inode->i_nlink != 1 ||
@@ -6597,6 +6612,7 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
if (err)
goto fail;
}
+ BTRFS_I(inode)->last_link_trans = trans->transid;
d_instantiate(dentry, inode);
ret = btrfs_log_new_name(trans, BTRFS_I(inode), NULL, parent,
true, NULL);
@@ -9123,6 +9139,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
ei->index_cnt = (u64)-1;
ei->dir_index = 0;
ei->last_unlink_trans = 0;
+ ei->last_link_trans = 0;
ei->last_log_commit = 0;
spin_lock_init(&ei->lock);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 013d0abcd46b..5baad9bebc62 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5758,6 +5758,22 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
goto end_trans;
}
+ /*
+ * If a new hard link was added to the inode in the current transaction
+ * and its link count is now greater than 1, we need to fallback to a
+ * transaction commit, otherwise we can end up not logging all its new
+ * parents for all the hard links. Here just from the dentry used to
+ * fsync, we can not visit the ancestor inodes for all the other hard
+ * links to figure out if any is new, so we fallback to a transaction
+ * commit (instead of adding a lot of complexity of scanning a btree,
+ * since this scenario is not a common use case).
+ */
+ if (inode->vfs_inode.i_nlink > 1 &&
+ inode->last_link_trans > last_committed) {
+ ret = -EMLINK;
+ goto end_trans;
+ }
+
while (1) {
if (!parent || d_really_is_negative(parent) || sb != parent->d_sb)
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 18f2c4fcebf2582f96cbd5f2238f4f354a0e4847 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Wed, 19 Dec 2018 14:36:58 -0500
Subject: [PATCH] ext4: check for shutdown and r/o file system in
ext4_write_inode()
If the file system has been shut down or is read-only, then
ext4_write_inode() needs to bail out early.
Also use jbd2_complete_transaction() instead of ext4_force_commit() so
we only force a commit if it is needed.
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 664b434ba836..9affabd07682 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5400,9 +5400,13 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
{
int err;
- if (WARN_ON_ONCE(current->flags & PF_MEMALLOC))
+ if (WARN_ON_ONCE(current->flags & PF_MEMALLOC) ||
+ sb_rdonly(inode->i_sb))
return 0;
+ if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+ return -EIO;
+
if (EXT4_SB(inode->i_sb)->s_journal) {
if (ext4_journal_current_handle()) {
jbd_debug(1, "called recursively, non-PF_MEMALLOC!\n");
@@ -5418,7 +5422,8 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
if (wbc->sync_mode != WB_SYNC_ALL || wbc->for_sync)
return 0;
- err = ext4_force_commit(inode->i_sb);
+ err = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal,
+ EXT4_I(inode)->i_sync_tid);
} else {
struct ext4_iloc iloc;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 18f2c4fcebf2582f96cbd5f2238f4f354a0e4847 Mon Sep 17 00:00:00 2001
From: Theodore Ts'o <tytso(a)mit.edu>
Date: Wed, 19 Dec 2018 14:36:58 -0500
Subject: [PATCH] ext4: check for shutdown and r/o file system in
ext4_write_inode()
If the file system has been shut down or is read-only, then
ext4_write_inode() needs to bail out early.
Also use jbd2_complete_transaction() instead of ext4_force_commit() so
we only force a commit if it is needed.
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 664b434ba836..9affabd07682 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5400,9 +5400,13 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
{
int err;
- if (WARN_ON_ONCE(current->flags & PF_MEMALLOC))
+ if (WARN_ON_ONCE(current->flags & PF_MEMALLOC) ||
+ sb_rdonly(inode->i_sb))
return 0;
+ if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+ return -EIO;
+
if (EXT4_SB(inode->i_sb)->s_journal) {
if (ext4_journal_current_handle()) {
jbd_debug(1, "called recursively, non-PF_MEMALLOC!\n");
@@ -5418,7 +5422,8 @@ int ext4_write_inode(struct inode *inode, struct writeback_control *wbc)
if (wbc->sync_mode != WB_SYNC_ALL || wbc->for_sync)
return 0;
- err = ext4_force_commit(inode->i_sb);
+ err = jbd2_complete_transaction(EXT4_SB(inode->i_sb)->s_journal,
+ EXT4_I(inode)->i_sync_tid);
} else {
struct ext4_iloc iloc;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0568e82dbe2510fc1fa664f58e5c997d3f1e649e Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik(a)fb.com>
Date: Fri, 30 Nov 2018 11:52:14 -0500
Subject: [PATCH] btrfs: run delayed items before dropping the snapshot
With my delayed refs patches in place we started seeing a large amount
of aborts in __btrfs_free_extent:
BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964 owner 1 offset 0
Call Trace:
? btrfs_merge_delayed_refs+0xaf/0x340
__btrfs_run_delayed_refs+0x6ea/0xfc0
? btrfs_set_path_blocking+0x31/0x60
btrfs_run_delayed_refs+0xeb/0x180
btrfs_commit_transaction+0x179/0x7f0
? btrfs_check_space_for_delayed_refs+0x30/0x50
? should_end_transaction.isra.19+0xe/0x40
btrfs_drop_snapshot+0x41c/0x7c0
btrfs_clean_one_deleted_snapshot+0xb5/0xd0
cleaner_kthread+0xf6/0x120
kthread+0xf8/0x130
? btree_invalidatepage+0x90/0x90
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
This was because btrfs_drop_snapshot depends on the root not being
modified while it's dropping the snapshot. It will unlock the root node
(and really every node) as it walks down the tree, only to re-lock it
when it needs to do something. This is a problem because if we modify
the tree we could cow a block in our path, which frees our reference to
that block. Then once we get back to that shared block we'll free our
reference to it again, and get ENOENT when trying to lookup our extent
reference to that block in __btrfs_free_extent.
This is ultimately happening because we have delayed items left to be
processed for our deleted snapshot _after_ all of the inodes are closed
for the snapshot. We only run the delayed inode item if we're deleting
the inode, and even then we do not run the delayed insertions or delayed
removals. These can be run at any point after our final inode does its
last iput, which is what triggers the snapshot deletion. We can end up
with the snapshot deletion happening and then have the delayed items run
on that file system, resulting in the above problem.
This problem has existed forever, however my patches made it much easier
to hit as I wake up the cleaner much more often to deal with delayed
iputs, which made us more likely to start the snapshot dropping work
before the transaction commits, which is when the delayed items would
generally be run. Before, generally speaking, we would run the delayed
items, commit the transaction, and wakeup the cleaner thread to start
deleting snapshots, which means we were less likely to hit this problem.
You could still hit it if you had multiple snapshots to be deleted and
ended up with lots of delayed items, but it was definitely harder.
Fix for now by simply running all the delayed items before starting to
drop the snapshot. We could make this smarter in the future by making
the delayed items per-root, and then simply drop any delayed items for
roots that we are going to delete. But for now just a quick and easy
solution is the safest.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index de21e0c93eb6..8a9ce33dfdbc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9272,6 +9272,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
goto out_free;
}
+ err = btrfs_run_delayed_items(trans);
+ if (err)
+ goto out_end_trans;
+
if (block_rsv)
trans->block_rsv = block_rsv;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0568e82dbe2510fc1fa664f58e5c997d3f1e649e Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik(a)fb.com>
Date: Fri, 30 Nov 2018 11:52:14 -0500
Subject: [PATCH] btrfs: run delayed items before dropping the snapshot
With my delayed refs patches in place we started seeing a large amount
of aborts in __btrfs_free_extent:
BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964 owner 1 offset 0
Call Trace:
? btrfs_merge_delayed_refs+0xaf/0x340
__btrfs_run_delayed_refs+0x6ea/0xfc0
? btrfs_set_path_blocking+0x31/0x60
btrfs_run_delayed_refs+0xeb/0x180
btrfs_commit_transaction+0x179/0x7f0
? btrfs_check_space_for_delayed_refs+0x30/0x50
? should_end_transaction.isra.19+0xe/0x40
btrfs_drop_snapshot+0x41c/0x7c0
btrfs_clean_one_deleted_snapshot+0xb5/0xd0
cleaner_kthread+0xf6/0x120
kthread+0xf8/0x130
? btree_invalidatepage+0x90/0x90
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
This was because btrfs_drop_snapshot depends on the root not being
modified while it's dropping the snapshot. It will unlock the root node
(and really every node) as it walks down the tree, only to re-lock it
when it needs to do something. This is a problem because if we modify
the tree we could cow a block in our path, which frees our reference to
that block. Then once we get back to that shared block we'll free our
reference to it again, and get ENOENT when trying to lookup our extent
reference to that block in __btrfs_free_extent.
This is ultimately happening because we have delayed items left to be
processed for our deleted snapshot _after_ all of the inodes are closed
for the snapshot. We only run the delayed inode item if we're deleting
the inode, and even then we do not run the delayed insertions or delayed
removals. These can be run at any point after our final inode does its
last iput, which is what triggers the snapshot deletion. We can end up
with the snapshot deletion happening and then have the delayed items run
on that file system, resulting in the above problem.
This problem has existed forever, however my patches made it much easier
to hit as I wake up the cleaner much more often to deal with delayed
iputs, which made us more likely to start the snapshot dropping work
before the transaction commits, which is when the delayed items would
generally be run. Before, generally speaking, we would run the delayed
items, commit the transaction, and wakeup the cleaner thread to start
deleting snapshots, which means we were less likely to hit this problem.
You could still hit it if you had multiple snapshots to be deleted and
ended up with lots of delayed items, but it was definitely harder.
Fix for now by simply running all the delayed items before starting to
drop the snapshot. We could make this smarter in the future by making
the delayed items per-root, and then simply drop any delayed items for
roots that we are going to delete. But for now just a quick and easy
solution is the safest.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index de21e0c93eb6..8a9ce33dfdbc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9272,6 +9272,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
goto out_free;
}
+ err = btrfs_run_delayed_items(trans);
+ if (err)
+ goto out_end_trans;
+
if (block_rsv)
trans->block_rsv = block_rsv;
In commit 1a0c10ed7b media: dvb-usb-v2: stop using coherent memory for URBs
incorrectly adds URB_FREE_BUFFER after every urb transfer.
It cannot use this flag because it reconfigures the URBs accordingly
to suit connected devices. In doing a call to usb_free_urb is made and
invertedly frees the buffers.
The stream buffer should remain constant while driver is up.
Signed-off-by: Malcolm Priestley <tvboxspy(a)gmail.com>
CC: stable(a)vger.kernel.org # v4.18+
---
v3 change commit message to the actual cause
drivers/media/usb/dvb-usb-v2/usb_urb.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/media/usb/dvb-usb-v2/usb_urb.c b/drivers/media/usb/dvb-usb-v2/usb_urb.c
index 024c751eb165..2ad2ddeaff51 100644
--- a/drivers/media/usb/dvb-usb-v2/usb_urb.c
+++ b/drivers/media/usb/dvb-usb-v2/usb_urb.c
@@ -155,7 +155,6 @@ static int usb_urb_alloc_bulk_urbs(struct usb_data_stream *stream)
stream->props.u.bulk.buffersize,
usb_urb_complete, stream);
- stream->urb_list[i]->transfer_flags = URB_FREE_BUFFER;
stream->urbs_initialized++;
}
return 0;
@@ -186,7 +185,7 @@ static int usb_urb_alloc_isoc_urbs(struct usb_data_stream *stream)
urb->complete = usb_urb_complete;
urb->pipe = usb_rcvisocpipe(stream->udev,
stream->props.endpoint);
- urb->transfer_flags = URB_ISO_ASAP | URB_FREE_BUFFER;
+ urb->transfer_flags = URB_ISO_ASAP;
urb->interval = stream->props.u.isoc.interval;
urb->number_of_packets = stream->props.u.isoc.framesperurb;
urb->transfer_buffer_length = stream->props.u.isoc.framesize *
@@ -210,7 +209,7 @@ static int usb_free_stream_buffers(struct usb_data_stream *stream)
if (stream->state & USB_STATE_URB_BUF) {
while (stream->buf_num) {
stream->buf_num--;
- stream->buf_list[stream->buf_num] = NULL;
+ kfree(stream->buf_list[stream->buf_num]);
}
}
--
2.19.1