The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 005b0a0c24e1628313e951516b675109a92cacfe # <resolve conflicts, build, test, etc.> git commit -s git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025081840-stomp-enhance-b456@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 005b0a0c24e1628313e951516b675109a92cacfe Mon Sep 17 00:00:00 2001 From: Filipe Manana fdmanana@suse.com Date: Fri, 18 Jul 2025 13:07:29 +0100 Subject: [PATCH] btrfs: send: use fallocate for hole punching with send stream v2
Currently holes are sent as writes full of zeroes, which results in unnecessarily using disk space at the receiving end and increasing the stream size.
In some cases we avoid sending writes of zeroes, like during a full send operation where we just skip writes for holes.
But for some cases we fill previous holes with writes of zeroes too, like in this scenario:
1) We have a file with a hole in the range [2M, 3M), we snapshot the subvolume and do a full send. The range [2M, 3M) stays as a hole at the receiver since we skip sending write commands full of zeroes;
2) We punch a hole for the range [3M, 4M) in our file, so that now it has a 2M hole in the range [2M, 4M), and snapshot the subvolume. Now if we do an incremental send, we will send write commands full of zeroes for the range [2M, 4M), removing the hole for [2M, 3M) at the receiver.
We could improve cases such as this last one by doing additional comparisons of file extent items (or their absence) between the parent and send snapshots, but that's a lot of code to add plus additional CPU and IO costs.
Since the send stream v2 already has a fallocate command and btrfs-progs implements a callback to execute fallocate since the send stream v2 support was added to it, update the kernel to use fallocate for punching holes for V2+ streams.
Test coverage is provided by btrfs/284 which is a version of btrfs/007 that exercises send stream v2 instead of v1, using fsstress with random operations and fssum to verify file contents.
Link: https://github.com/kdave/btrfs-progs/issues/1001 CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 09822e766e41..7664025a5af4 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4,6 +4,7 @@ */
#include <linux/bsearch.h> +#include <linux/falloc.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/sort.h> @@ -5405,6 +5406,30 @@ static int send_update_extent(struct send_ctx *sctx, return ret; }
+static int send_fallocate(struct send_ctx *sctx, u32 mode, u64 offset, u64 len) +{ + struct fs_path *path; + int ret; + + path = get_cur_inode_path(sctx); + if (IS_ERR(path)) + return PTR_ERR(path); + + ret = begin_cmd(sctx, BTRFS_SEND_C_FALLOCATE); + if (ret < 0) + return ret; + + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, path); + TLV_PUT_U32(sctx, BTRFS_SEND_A_FALLOCATE_MODE, mode); + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); + + ret = send_cmd(sctx); + +tlv_put_failure: + return ret; +} + static int send_hole(struct send_ctx *sctx, u64 end) { struct fs_path *p = NULL; @@ -5412,6 +5437,14 @@ static int send_hole(struct send_ctx *sctx, u64 end) u64 offset = sctx->cur_inode_last_extent; int ret = 0;
+ /* + * Starting with send stream v2 we have fallocate and can use it to + * punch holes instead of sending writes full of zeroes. + */ + if (proto_cmd_ok(sctx, BTRFS_SEND_C_FALLOCATE)) + return send_fallocate(sctx, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + offset, end - offset); + /* * A hole that starts at EOF or beyond it. Since we do not yet support * fallocate (for extent preallocation and hole punching), sending a
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 17f6a74d0b89092e38e3328b66eda1ab29a195d4 ]
We always send xattrs for the current inode only and both callers of send_set_xattr() pass a path for the current inode. So move the path allocation and computation to send_set_xattr(), reducing duplicated code. This also facilitates an upcoming patch.
Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 41 +++++++++++++++-------------------------- 1 file changed, 15 insertions(+), 26 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index e2ead36e5be4..bb24c1a00f6e 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4879,11 +4879,19 @@ static int process_all_refs(struct send_ctx *sctx, }
static int send_set_xattr(struct send_ctx *sctx, - struct fs_path *path, const char *name, int name_len, const char *data, int data_len) { - int ret = 0; + struct fs_path *path; + int ret; + + path = fs_path_alloc(); + if (!path) + return -ENOMEM; + + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, path); + if (ret < 0) + goto out;
ret = begin_cmd(sctx, BTRFS_SEND_C_SET_XATTR); if (ret < 0) @@ -4897,6 +4905,8 @@ static int send_set_xattr(struct send_ctx *sctx,
tlv_put_failure: out: + fs_path_free(path); + return ret; }
@@ -4924,19 +4934,13 @@ static int __process_new_xattr(int num, struct btrfs_key *di_key, const char *name, int name_len, const char *data, int data_len, void *ctx) { - int ret; struct send_ctx *sctx = ctx; - struct fs_path *p; struct posix_acl_xattr_header dummy_acl;
/* Capabilities are emitted by finish_inode_if_needed */ if (!strncmp(name, XATTR_NAME_CAPS, name_len)) return 0;
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; - /* * This hack is needed because empty acls are stored as zero byte * data in xattrs. Problem with that is, that receiving these zero byte @@ -4953,15 +4957,7 @@ static int __process_new_xattr(int num, struct btrfs_key *di_key, } }
- ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); - if (ret < 0) - goto out; - - ret = send_set_xattr(sctx, p, name, name_len, data, data_len); - -out: - fs_path_free(p); - return ret; + return send_set_xattr(sctx, name, name_len, data, data_len); }
static int __process_deleted_xattr(int num, struct btrfs_key *di_key, @@ -5831,7 +5827,6 @@ static int send_extent_data(struct send_ctx *sctx, struct btrfs_path *path, */ static int send_capabilities(struct send_ctx *sctx) { - struct fs_path *fspath = NULL; struct btrfs_path *path; struct btrfs_dir_item *di; struct extent_buffer *leaf; @@ -5857,25 +5852,19 @@ static int send_capabilities(struct send_ctx *sctx) leaf = path->nodes[0]; buf_len = btrfs_dir_data_len(leaf, di);
- fspath = fs_path_alloc(); buf = kmalloc(buf_len, GFP_KERNEL); - if (!fspath || !buf) { + if (!buf) { ret = -ENOMEM; goto out; }
- ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, fspath); - if (ret < 0) - goto out; - data_ptr = (unsigned long)(di + 1) + btrfs_dir_name_len(leaf, di); read_extent_buffer(leaf, buf, data_ptr, buf_len);
- ret = send_set_xattr(sctx, fspath, XATTR_NAME_CAPS, + ret = send_set_xattr(sctx, XATTR_NAME_CAPS, strlen(XATTR_NAME_CAPS), buf, buf_len); out: kfree(buf); - fs_path_free(fspath); btrfs_free_path(path); return ret; }
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 9453fe329789073d9a971de01da5902c32c1a01a ]
We have several local variables at process_recorded_refs() that are used as booleans, with some of them having a 'bool' type while two of them having an 'int' type. Change this to make them all use the 'bool' type which is more clear and to make everything more consistent.
Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index bb24c1a00f6e..07738b368608 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4180,9 +4180,9 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move) u64 ow_inode = 0; u64 ow_gen; u64 ow_mode; - int did_overwrite = 0; - int is_orphan = 0; u64 last_dir_ino_rm = 0; + bool did_overwrite = false; + bool is_orphan = false; bool can_rename = true; bool orphanized_dir = false; bool orphanized_ancestor = false; @@ -4224,14 +4224,14 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move) if (ret < 0) goto out; if (ret) - did_overwrite = 1; + did_overwrite = true; } if (sctx->cur_inode_new || did_overwrite) { ret = gen_unique_name(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path); if (ret < 0) goto out; - is_orphan = 1; + is_orphan = true; } else { ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path); @@ -4454,7 +4454,7 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move) ret = send_rename(sctx, valid_path, cur->full_path); if (ret < 0) goto out; - is_orphan = 0; + is_orphan = false; ret = fs_path_copy(valid_path, cur->full_path); if (ret < 0) goto out; @@ -4515,7 +4515,7 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move) sctx->cur_inode_gen, valid_path); if (ret < 0) goto out; - is_orphan = 1; + is_orphan = true; }
list_for_each_entry(cur, &sctx->deleted_refs, list) {
From: Filipe Manana fdmanana@suse.com
[ Upstream commit ec666c84deba56f714505b53556a97565f72db86 ]
Extract the logic to rename the current inode at process_recorded_refs() into a helper function and use it, therefore removing duplicated logic and making it easier for an upcoming patch by avoiding yet more duplicated logic.
Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 07738b368608..7c6b4963ded8 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4166,6 +4166,19 @@ static int refresh_ref_path(struct send_ctx *sctx, struct recorded_ref *ref) return ret; }
+static int rename_current_inode(struct send_ctx *sctx, + struct fs_path *current_path, + struct fs_path *new_path) +{ + int ret; + + ret = send_rename(sctx, current_path, new_path); + if (ret < 0) + return ret; + + return fs_path_copy(current_path, new_path); +} + /* * This does all the move/link/unlink/rmdir magic. */ @@ -4451,13 +4464,10 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move) * it depending on the inode mode. */ if (is_orphan && can_rename) { - ret = send_rename(sctx, valid_path, cur->full_path); + ret = rename_current_inode(sctx, valid_path, cur->full_path); if (ret < 0) goto out; is_orphan = false; - ret = fs_path_copy(valid_path, cur->full_path); - if (ret < 0) - goto out; } else if (can_rename) { if (S_ISDIR(sctx->cur_inode_mode)) { /* @@ -4465,10 +4475,7 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move) * dirs, we always have one new and one deleted * ref. The deleted ref is ignored later. */ - ret = send_rename(sctx, valid_path, - cur->full_path); - if (!ret) - ret = fs_path_copy(valid_path, + ret = rename_current_inode(sctx, valid_path, cur->full_path); if (ret < 0) goto out;
From: Filipe Manana fdmanana@suse.com
[ Upstream commit fc746acb7aa9aeaa2cb5dcba449323319ba5c8eb ]
Whenever we need to send a command for the current inode, like sending writes, xattr updates, truncates, utimes, etc, we compute the inode's path each time, which implies doing some memory allocations and traversing the inode hierarchy to extract the name of the inode and each ancestor directory, and that implies doing lookups in the subvolume tree amongst other operations.
Most of the time, by far, the current inode's path doesn't change while we are processing it (like if we need to issue 100 write commands, the path remains the same and it's pointless to compute it 100 times).
To avoid this keep the current inode's path cached in the send context and invalidate it or update it whenever it's needed (after unlinks or renames).
A performance test, and its results, is mentioned in the next patch in the series (subject: "btrfs: send: avoid path allocation for the current inode when issuing commands").
Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 53 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 48 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 7c6b4963ded8..7685b412b35c 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -179,6 +179,7 @@ struct send_ctx { u64 cur_inode_rdev; u64 cur_inode_last_extent; u64 cur_inode_next_write_offset; + struct fs_path cur_inode_path; bool cur_inode_new; bool cur_inode_new_gen; bool cur_inode_deleted; @@ -436,6 +437,14 @@ static void fs_path_reset(struct fs_path *p) } }
+static void init_path(struct fs_path *p) +{ + p->reversed = 0; + p->buf = p->inline_buf; + p->buf_len = FS_PATH_INLINE_SIZE; + fs_path_reset(p); +} + static struct fs_path *fs_path_alloc(void) { struct fs_path *p; @@ -443,10 +452,7 @@ static struct fs_path *fs_path_alloc(void) p = kmalloc(sizeof(*p), GFP_KERNEL); if (!p) return NULL; - p->reversed = 0; - p->buf = p->inline_buf; - p->buf_len = FS_PATH_INLINE_SIZE; - fs_path_reset(p); + init_path(p); return p; }
@@ -624,6 +630,14 @@ static void fs_path_unreverse(struct fs_path *p) p->reversed = 0; }
+static inline bool is_current_inode_path(const struct send_ctx *sctx, + const struct fs_path *path) +{ + const struct fs_path *cur = &sctx->cur_inode_path; + + return (strncmp(path->start, cur->start, fs_path_len(cur)) == 0); +} + static struct btrfs_path *alloc_path_for_send(void) { struct btrfs_path *path; @@ -2450,6 +2464,14 @@ static int get_cur_path(struct send_ctx *sctx, u64 ino, u64 gen, u64 parent_inode = 0; u64 parent_gen = 0; int stop = 0; + const bool is_cur_inode = (ino == sctx->cur_ino && gen == sctx->cur_inode_gen); + + if (is_cur_inode && fs_path_len(&sctx->cur_inode_path) > 0) { + if (dest != &sctx->cur_inode_path) + return fs_path_copy(dest, &sctx->cur_inode_path); + + return 0; + }
name = fs_path_alloc(); if (!name) { @@ -2501,8 +2523,12 @@ static int get_cur_path(struct send_ctx *sctx, u64 ino, u64 gen,
out: fs_path_free(name); - if (!ret) + if (!ret) { fs_path_unreverse(dest); + if (is_cur_inode && dest != &sctx->cur_inode_path) + ret = fs_path_copy(&sctx->cur_inode_path, dest); + } + return ret; }
@@ -3113,6 +3139,11 @@ static int orphanize_inode(struct send_ctx *sctx, u64 ino, u64 gen, goto out;
ret = send_rename(sctx, path, orphan); + if (ret < 0) + goto out; + + if (ino == sctx->cur_ino && gen == sctx->cur_inode_gen) + ret = fs_path_copy(&sctx->cur_inode_path, orphan);
out: fs_path_free(orphan); @@ -4176,6 +4207,10 @@ static int rename_current_inode(struct send_ctx *sctx, if (ret < 0) return ret;
+ ret = fs_path_copy(&sctx->cur_inode_path, new_path); + if (ret < 0) + return ret; + return fs_path_copy(current_path, new_path); }
@@ -4369,6 +4404,7 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move) if (ret > 0) { orphanized_ancestor = true; fs_path_reset(valid_path); + fs_path_reset(&sctx->cur_inode_path); ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, valid_path); @@ -4568,6 +4604,8 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move) ret = send_unlink(sctx, cur->full_path); if (ret < 0) goto out; + if (is_current_inode_path(sctx, cur->full_path)) + fs_path_reset(&sctx->cur_inode_path); } ret = dup_ref(cur, &check_dirs); if (ret < 0) @@ -6900,6 +6938,7 @@ static int changed_inode(struct send_ctx *sctx, sctx->cur_inode_last_extent = (u64)-1; sctx->cur_inode_next_write_offset = 0; sctx->ignore_cur_inode = false; + fs_path_reset(&sctx->cur_inode_path);
/* * Set send_progress to current inode. This will tell all get_cur_xxx @@ -8190,6 +8229,7 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) goto out; }
+ init_path(&sctx->cur_inode_path); INIT_LIST_HEAD(&sctx->new_refs); INIT_LIST_HEAD(&sctx->deleted_refs);
@@ -8475,6 +8515,9 @@ long btrfs_ioctl_send(struct inode *inode, struct btrfs_ioctl_send_args *arg) btrfs_lru_cache_clear(&sctx->dir_created_cache); btrfs_lru_cache_clear(&sctx->dir_utimes_cache);
+ if (sctx->cur_inode_path.buf != sctx->cur_inode_path.inline_buf) + kfree(sctx->cur_inode_path.buf); + kfree(sctx); }
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 374d45af6435534a11b01b88762323abf03dd755 ]
Whenever we issue a command we allocate a path and then compute it. For the current inode this is not necessary since we have one preallocated and computed in the send context structure, so we can use it instead and avoid allocating and freeing a path.
For example if we have 100 extents to send (100 write commands) for a file, we are allocating and freeing paths 100 times.
So improve on this by avoiding path allocation and freeing whenever a command is for the current inode by using the current inode's path stored in the send context structure.
A test was run before applying this patch and the previous one in the series:
"btrfs: send: keep the current inode's path cached"
The test script is the following:
$ cat test.sh #!/bin/bash
DEV=/dev/nullb0 MNT=/mnt/nullb0
mkfs.btrfs -f $DEV > /dev/null mount $DEV $MNT
DIR="$MNT/one/two/three/four" FILE="$DIR/foobar"
mkdir -p $DIR
# Create some empty files to get a deeper btree and therefore make # path computations slower. for ((i = 1; i <= 30000; i++)); do echo -n > "$DIR/filler_$i" done
for ((i = 0; i < 10000; i += 2)); do offset=$(( i * 4096 )) xfs_io -f -c "pwrite -S 0xab $offset 4K" $FILE > /dev/null done
btrfs subvolume snapshot -r $MNT $MNT/snap
start=$(date +%s%N) btrfs send -f /dev/null $MNT/snap end=$(date +%s%N)
echo -e "\nsend took $(( (end - start) / 1000000 )) milliseconds"
umount $MNT
Result before applying the 2 patches: 1121 milliseconds Result after applying the 2 patches: 815 milliseconds (-31.6%)
Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 005b0a0c24e1 ("btrfs: send: use fallocate for hole punching with send stream v2") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 215 ++++++++++++++++++++++-------------------------- 1 file changed, 97 insertions(+), 118 deletions(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 7685b412b35c..05291aeee723 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -2623,6 +2623,47 @@ static int send_subvol_begin(struct send_ctx *sctx) return ret; }
+static struct fs_path *get_cur_inode_path(struct send_ctx *sctx) +{ + if (fs_path_len(&sctx->cur_inode_path) == 0) { + int ret; + + ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, + &sctx->cur_inode_path); + if (ret < 0) + return ERR_PTR(ret); + } + + return &sctx->cur_inode_path; +} + +static struct fs_path *get_path_for_command(struct send_ctx *sctx, u64 ino, u64 gen) +{ + struct fs_path *path; + int ret; + + if (ino == sctx->cur_ino && gen == sctx->cur_inode_gen) + return get_cur_inode_path(sctx); + + path = fs_path_alloc(); + if (!path) + return ERR_PTR(-ENOMEM); + + ret = get_cur_path(sctx, ino, gen, path); + if (ret < 0) { + fs_path_free(path); + return ERR_PTR(ret); + } + + return path; +} + +static void free_path_for_command(const struct send_ctx *sctx, struct fs_path *path) +{ + if (path != &sctx->cur_inode_path) + fs_path_free(path); +} + static int send_truncate(struct send_ctx *sctx, u64 ino, u64 gen, u64 size) { struct btrfs_fs_info *fs_info = sctx->send_root->fs_info; @@ -2631,17 +2672,14 @@ static int send_truncate(struct send_ctx *sctx, u64 ino, u64 gen, u64 size)
btrfs_debug(fs_info, "send_truncate %llu size=%llu", ino, size);
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; + p = get_path_for_command(sctx, ino, gen); + if (IS_ERR(p)) + return PTR_ERR(p);
ret = begin_cmd(sctx, BTRFS_SEND_C_TRUNCATE); if (ret < 0) goto out;
- ret = get_cur_path(sctx, ino, gen, p); - if (ret < 0) - goto out; TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, size);
@@ -2649,7 +2687,7 @@ static int send_truncate(struct send_ctx *sctx, u64 ino, u64 gen, u64 size)
tlv_put_failure: out: - fs_path_free(p); + free_path_for_command(sctx, p); return ret; }
@@ -2661,17 +2699,14 @@ static int send_chmod(struct send_ctx *sctx, u64 ino, u64 gen, u64 mode)
btrfs_debug(fs_info, "send_chmod %llu mode=%llu", ino, mode);
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; + p = get_path_for_command(sctx, ino, gen); + if (IS_ERR(p)) + return PTR_ERR(p);
ret = begin_cmd(sctx, BTRFS_SEND_C_CHMOD); if (ret < 0) goto out;
- ret = get_cur_path(sctx, ino, gen, p); - if (ret < 0) - goto out; TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); TLV_PUT_U64(sctx, BTRFS_SEND_A_MODE, mode & 07777);
@@ -2679,7 +2714,7 @@ static int send_chmod(struct send_ctx *sctx, u64 ino, u64 gen, u64 mode)
tlv_put_failure: out: - fs_path_free(p); + free_path_for_command(sctx, p); return ret; }
@@ -2694,17 +2729,14 @@ static int send_fileattr(struct send_ctx *sctx, u64 ino, u64 gen, u64 fileattr)
btrfs_debug(fs_info, "send_fileattr %llu fileattr=%llu", ino, fileattr);
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; + p = get_path_for_command(sctx, ino, gen); + if (IS_ERR(p)) + return PTR_ERR(p);
ret = begin_cmd(sctx, BTRFS_SEND_C_FILEATTR); if (ret < 0) goto out;
- ret = get_cur_path(sctx, ino, gen, p); - if (ret < 0) - goto out; TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); TLV_PUT_U64(sctx, BTRFS_SEND_A_FILEATTR, fileattr);
@@ -2712,7 +2744,7 @@ static int send_fileattr(struct send_ctx *sctx, u64 ino, u64 gen, u64 fileattr)
tlv_put_failure: out: - fs_path_free(p); + free_path_for_command(sctx, p); return ret; }
@@ -2725,17 +2757,14 @@ static int send_chown(struct send_ctx *sctx, u64 ino, u64 gen, u64 uid, u64 gid) btrfs_debug(fs_info, "send_chown %llu uid=%llu, gid=%llu", ino, uid, gid);
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; + p = get_path_for_command(sctx, ino, gen); + if (IS_ERR(p)) + return PTR_ERR(p);
ret = begin_cmd(sctx, BTRFS_SEND_C_CHOWN); if (ret < 0) goto out;
- ret = get_cur_path(sctx, ino, gen, p); - if (ret < 0) - goto out; TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); TLV_PUT_U64(sctx, BTRFS_SEND_A_UID, uid); TLV_PUT_U64(sctx, BTRFS_SEND_A_GID, gid); @@ -2744,7 +2773,7 @@ static int send_chown(struct send_ctx *sctx, u64 ino, u64 gen, u64 uid, u64 gid)
tlv_put_failure: out: - fs_path_free(p); + free_path_for_command(sctx, p); return ret; }
@@ -2761,9 +2790,9 @@ static int send_utimes(struct send_ctx *sctx, u64 ino, u64 gen)
btrfs_debug(fs_info, "send_utimes %llu", ino);
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; + p = get_path_for_command(sctx, ino, gen); + if (IS_ERR(p)) + return PTR_ERR(p);
path = alloc_path_for_send(); if (!path) { @@ -2788,9 +2817,6 @@ static int send_utimes(struct send_ctx *sctx, u64 ino, u64 gen) if (ret < 0) goto out;
- ret = get_cur_path(sctx, ino, gen, p); - if (ret < 0) - goto out; TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); TLV_PUT_BTRFS_TIMESPEC(sctx, BTRFS_SEND_A_ATIME, eb, &ii->atime); TLV_PUT_BTRFS_TIMESPEC(sctx, BTRFS_SEND_A_MTIME, eb, &ii->mtime); @@ -2802,7 +2828,7 @@ static int send_utimes(struct send_ctx *sctx, u64 ino, u64 gen)
tlv_put_failure: out: - fs_path_free(p); + free_path_for_command(sctx, p); btrfs_free_path(path); return ret; } @@ -4930,13 +4956,9 @@ static int send_set_xattr(struct send_ctx *sctx, struct fs_path *path; int ret;
- path = fs_path_alloc(); - if (!path) - return -ENOMEM; - - ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, path); - if (ret < 0) - goto out; + path = get_cur_inode_path(sctx); + if (IS_ERR(path)) + return PTR_ERR(path);
ret = begin_cmd(sctx, BTRFS_SEND_C_SET_XATTR); if (ret < 0) @@ -4950,8 +4972,6 @@ static int send_set_xattr(struct send_ctx *sctx,
tlv_put_failure: out: - fs_path_free(path); - return ret; }
@@ -5009,23 +5029,14 @@ static int __process_deleted_xattr(int num, struct btrfs_key *di_key, const char *name, int name_len, const char *data, int data_len, void *ctx) { - int ret; struct send_ctx *sctx = ctx; struct fs_path *p;
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; - - ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); - if (ret < 0) - goto out; - - ret = send_remove_xattr(sctx, p, name, name_len); + p = get_cur_inode_path(sctx); + if (IS_ERR(p)) + return PTR_ERR(p);
-out: - fs_path_free(p); - return ret; + return send_remove_xattr(sctx, p, name, name_len); }
static int process_new_xattr(struct send_ctx *sctx) @@ -5259,21 +5270,13 @@ static int process_verity(struct send_ctx *sctx) if (ret < 0) goto iput;
- p = fs_path_alloc(); - if (!p) { - ret = -ENOMEM; + p = get_cur_inode_path(sctx); + if (IS_ERR(p)) { + ret = PTR_ERR(p); goto iput; } - ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); - if (ret < 0) - goto free_path;
ret = send_verity(sctx, p, sctx->verity_descriptor); - if (ret < 0) - goto free_path; - -free_path: - fs_path_free(p); iput: iput(inode); return ret; @@ -5388,31 +5391,25 @@ static int send_write(struct send_ctx *sctx, u64 offset, u32 len) int ret = 0; struct fs_path *p;
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; - btrfs_debug(fs_info, "send_write offset=%llu, len=%d", offset, len);
- ret = begin_cmd(sctx, BTRFS_SEND_C_WRITE); - if (ret < 0) - goto out; + p = get_cur_inode_path(sctx); + if (IS_ERR(p)) + return PTR_ERR(p);
- ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); + ret = begin_cmd(sctx, BTRFS_SEND_C_WRITE); if (ret < 0) - goto out; + return ret;
TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); ret = put_file_data(sctx, offset, len); if (ret < 0) - goto out; + return ret;
ret = send_cmd(sctx);
tlv_put_failure: -out: - fs_path_free(p); return ret; }
@@ -5425,6 +5422,7 @@ static int send_clone(struct send_ctx *sctx, { int ret = 0; struct fs_path *p; + struct fs_path *cur_inode_path; u64 gen;
btrfs_debug(sctx->send_root->fs_info, @@ -5432,6 +5430,10 @@ static int send_clone(struct send_ctx *sctx, offset, len, clone_root->root->root_key.objectid, clone_root->ino, clone_root->offset);
+ cur_inode_path = get_cur_inode_path(sctx); + if (IS_ERR(cur_inode_path)) + return PTR_ERR(cur_inode_path); + p = fs_path_alloc(); if (!p) return -ENOMEM; @@ -5440,13 +5442,9 @@ static int send_clone(struct send_ctx *sctx, if (ret < 0) goto out;
- ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); - if (ret < 0) - goto out; - TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); TLV_PUT_U64(sctx, BTRFS_SEND_A_CLONE_LEN, len); - TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, cur_inode_path);
if (clone_root->root == sctx->send_root) { ret = get_inode_gen(sctx->send_root, clone_root->ino, &gen); @@ -5497,17 +5495,13 @@ static int send_update_extent(struct send_ctx *sctx, int ret = 0; struct fs_path *p;
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; + p = get_cur_inode_path(sctx); + if (IS_ERR(p)) + return PTR_ERR(p);
ret = begin_cmd(sctx, BTRFS_SEND_C_UPDATE_EXTENT); if (ret < 0) - goto out; - - ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); - if (ret < 0) - goto out; + return ret;
TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, p); TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); @@ -5516,8 +5510,6 @@ static int send_update_extent(struct send_ctx *sctx, ret = send_cmd(sctx);
tlv_put_failure: -out: - fs_path_free(p); return ret; }
@@ -5546,12 +5538,10 @@ static int send_hole(struct send_ctx *sctx, u64 end) if (sctx->flags & BTRFS_SEND_FLAG_NO_FILE_DATA) return send_update_extent(sctx, offset, end - offset);
- p = fs_path_alloc(); - if (!p) - return -ENOMEM; - ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, p); - if (ret < 0) - goto tlv_put_failure; + p = get_cur_inode_path(sctx); + if (IS_ERR(p)) + return PTR_ERR(p); + while (offset < end) { u64 len = min(end - offset, read_size);
@@ -5572,7 +5562,6 @@ static int send_hole(struct send_ctx *sctx, u64 end) } sctx->cur_inode_next_write_offset = offset; tlv_put_failure: - fs_path_free(p); return ret; }
@@ -5595,9 +5584,9 @@ static int send_encoded_inline_extent(struct send_ctx *sctx, if (IS_ERR(inode)) return PTR_ERR(inode);
- fspath = fs_path_alloc(); - if (!fspath) { - ret = -ENOMEM; + fspath = get_cur_inode_path(sctx); + if (IS_ERR(fspath)) { + ret = PTR_ERR(fspath); goto out; }
@@ -5605,10 +5594,6 @@ static int send_encoded_inline_extent(struct send_ctx *sctx, if (ret < 0) goto out;
- ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, fspath); - if (ret < 0) - goto out; - btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_file_extent_item); ram_bytes = btrfs_file_extent_ram_bytes(leaf, ei); @@ -5637,7 +5622,6 @@ static int send_encoded_inline_extent(struct send_ctx *sctx,
tlv_put_failure: out: - fs_path_free(fspath); iput(inode); return ret; } @@ -5662,9 +5646,9 @@ static int send_encoded_extent(struct send_ctx *sctx, struct btrfs_path *path, if (IS_ERR(inode)) return PTR_ERR(inode);
- fspath = fs_path_alloc(); - if (!fspath) { - ret = -ENOMEM; + fspath = get_cur_inode_path(sctx); + if (IS_ERR(fspath)) { + ret = PTR_ERR(fspath); goto out; }
@@ -5672,10 +5656,6 @@ static int send_encoded_extent(struct send_ctx *sctx, struct btrfs_path *path, if (ret < 0) goto out;
- ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, fspath); - if (ret < 0) - goto out; - btrfs_item_key_to_cpu(leaf, &key, path->slots[0]); ei = btrfs_item_ptr(leaf, path->slots[0], struct btrfs_file_extent_item); disk_bytenr = btrfs_file_extent_disk_bytenr(leaf, ei); @@ -5742,7 +5722,6 @@ static int send_encoded_extent(struct send_ctx *sctx, struct btrfs_path *path,
tlv_put_failure: out: - fs_path_free(fspath); iput(inode); return ret; }
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 005b0a0c24e1628313e951516b675109a92cacfe ]
Currently holes are sent as writes full of zeroes, which results in unnecessarily using disk space at the receiving end and increasing the stream size.
In some cases we avoid sending writes of zeroes, like during a full send operation where we just skip writes for holes.
But for some cases we fill previous holes with writes of zeroes too, like in this scenario:
1) We have a file with a hole in the range [2M, 3M), we snapshot the subvolume and do a full send. The range [2M, 3M) stays as a hole at the receiver since we skip sending write commands full of zeroes;
2) We punch a hole for the range [3M, 4M) in our file, so that now it has a 2M hole in the range [2M, 4M), and snapshot the subvolume. Now if we do an incremental send, we will send write commands full of zeroes for the range [2M, 4M), removing the hole for [2M, 3M) at the receiver.
We could improve cases such as this last one by doing additional comparisons of file extent items (or their absence) between the parent and send snapshots, but that's a lot of code to add plus additional CPU and IO costs.
Since the send stream v2 already has a fallocate command and btrfs-progs implements a callback to execute fallocate since the send stream v2 support was added to it, update the kernel to use fallocate for punching holes for V2+ streams.
Test coverage is provided by btrfs/284 which is a version of btrfs/007 that exercises send stream v2 instead of v1, using fsstress with random operations and fssum to verify file contents.
Link: https://github.com/kdave/btrfs-progs/issues/1001 CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Boris Burkov boris@bur.io Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index 05291aeee723..a512e1628dc5 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4,6 +4,7 @@ */
#include <linux/bsearch.h> +#include <linux/falloc.h> #include <linux/fs.h> #include <linux/file.h> #include <linux/sort.h> @@ -5513,6 +5514,30 @@ static int send_update_extent(struct send_ctx *sctx, return ret; }
+static int send_fallocate(struct send_ctx *sctx, u32 mode, u64 offset, u64 len) +{ + struct fs_path *path; + int ret; + + path = get_cur_inode_path(sctx); + if (IS_ERR(path)) + return PTR_ERR(path); + + ret = begin_cmd(sctx, BTRFS_SEND_C_FALLOCATE); + if (ret < 0) + return ret; + + TLV_PUT_PATH(sctx, BTRFS_SEND_A_PATH, path); + TLV_PUT_U32(sctx, BTRFS_SEND_A_FALLOCATE_MODE, mode); + TLV_PUT_U64(sctx, BTRFS_SEND_A_FILE_OFFSET, offset); + TLV_PUT_U64(sctx, BTRFS_SEND_A_SIZE, len); + + ret = send_cmd(sctx); + +tlv_put_failure: + return ret; +} + static int send_hole(struct send_ctx *sctx, u64 end) { struct fs_path *p = NULL; @@ -5520,6 +5545,14 @@ static int send_hole(struct send_ctx *sctx, u64 end) u64 offset = sctx->cur_inode_last_extent; int ret = 0;
+ /* + * Starting with send stream v2 we have fallocate and can use it to + * punch holes instead of sending writes full of zeroes. + */ + if (proto_cmd_ok(sctx, BTRFS_SEND_C_FALLOCATE)) + return send_fallocate(sctx, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, + offset, end - offset); + /* * A hole that starts at EOF or beyond it. Since we do not yet support * fallocate (for extent preallocation and hole punching), sending a
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 920e8ee2bfcaf886fd8c0ad9df097a7dddfeb2d8 ]
The helper function fs_path_len() is trivial and doesn't need to change its path argument, so make it inline and constify the argument.
Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/send.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index a512e1628dc5..c25eb4416a67 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -478,7 +478,7 @@ static void fs_path_free(struct fs_path *p) kfree(p); }
-static int fs_path_len(struct fs_path *p) +static inline int fs_path_len(const struct fs_path *p) { return p->end - p->start; }
linux-stable-mirror@lists.linaro.org