The patch below does not apply to the 6.12-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to stable@vger.kernel.org.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 29ec9bed2395061350249ae356fb300dd82a78e7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to 'stable@vger.kernel.org' --in-reply-to '2025062009-junior-thriving-f882@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 29ec9bed2395061350249ae356fb300dd82a78e7 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang@huawei.com>
Date: Tue, 6 May 2025 09:20:07 +0800
Subject: [PATCH] ext4: fix incorrect punch max_end

For extent-based inodes, maxbytes should be sb->s_maxbytes rather than sbi->s_bitmap_maxbytes. Additionally, when calculating max_end, subtracting sb->s_blocksize is necessary only for indirect-block based inodes. Correct the maxbytes and max_end values to fix the behavior of punch hole.
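As an illustration only (not part of the patch), the limit selection and clamping described above can be sketched in userspace C. The struct punch_limits type and the punch_max_end() and punch_clamp_end() helpers are hypothetical stand-ins for the kernel fields and logic; any limits passed to them in an experiment are made-up values, not real ext4 limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields the fix reads. */
struct punch_limits {
	int64_t s_maxbytes;		/* size limit for extent-based files */
	int64_t s_bitmap_maxbytes;	/* size limit for indirect-block files */
	int64_t blocksize;		/* filesystem block size in bytes */
};

/* Pick the upper bound for the end of a punched hole. */
static int64_t punch_max_end(const struct punch_limits *l, bool uses_extents)
{
	assert(l->blocksize > 0);
	if (uses_extents)
		return l->s_maxbytes;
	/* Indirect-block files stop one block short of their limit. */
	return l->s_bitmap_maxbytes - l->blocksize;
}

/*
 * Clamp the end of [offset, offset + length) in the same order as the
 * patched function. Returns offset itself when there is nothing to punch.
 */
static int64_t punch_clamp_end(int64_t offset, int64_t length, int64_t i_size,
			       int64_t page_size, int64_t max_end)
{
	int64_t end = offset + length;

	if (offset >= i_size || offset >= max_end)
		return offset;
	if (end > i_size)	/* round i_size up to a page boundary */
		end = (i_size + page_size - 1) / page_size * page_size;
	if (end > max_end)
		end = max_end;
	return end;
}
```

With this shape, an extent-based file is bounded only by s_maxbytes, while an indirect-block file keeps the one-block margin that the old code applied to every inode.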
Fixes: 2da376228a24 ("ext4: limit length to bitmap_maxbytes - blocksize in punch_hole")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Link: https://patch.msgid.link/20250506012009.3896990-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 99f30b9cfe17..01038b4ecee0 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4051,7 +4051,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	struct inode *inode = file_inode(file);
 	struct super_block *sb = inode->i_sb;
 	ext4_lblk_t start_lblk, end_lblk;
-	loff_t max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize;
+	loff_t max_end = sb->s_maxbytes;
 	loff_t end = offset + length;
 	handle_t *handle;
 	unsigned int credits;
@@ -4060,14 +4060,20 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	trace_ext4_punch_hole(inode, offset, length, 0);
 	WARN_ON_ONCE(!inode_is_locked(inode));

+	/*
+	 * For indirect-block based inodes, make sure that the hole within
+	 * one block before last range.
+	 */
+	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+		max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize;
+
 	/* No need to punch hole beyond i_size */
 	if (offset >= inode->i_size || offset >= max_end)
 		return 0;

 	/*
 	 * If the hole extends beyond i_size, set the hole to end after
-	 * the page that contains i_size, and also make sure that the hole
-	 * within one block before last range.
+	 * the page that contains i_size.
 	 */
 	if (end > inode->i_size)
 		end = round_up(inode->i_size, PAGE_SIZE);
From: Zhang Yi <yi.zhang@huawei.com>

[ Upstream commit 73ae756ecdfa9684446134590eef32b0f067249c ]

After commit ad5cd4f4ee4d ("ext4: fix fallocate to use file_modified to update permissions consistently"), mtime and ctime are updated appropriately through file_modified() when doing zero range, collapse range, insert range and punch hole. There is therefore no need to explicitly update the times in those paths, so drop the updates.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-3-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 29ec9bed2395 ("ext4: fix incorrect punch max_end")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/extents.c | 5 -----
 fs/ext4/inode.c   | 1 -
 2 files changed, 6 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index b16d72275e105..43da9906b9240 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4675,8 +4675,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 			goto out_mutex;
 		}

-		inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
-
 		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
 					     flags);
 		filemap_invalidate_unlock(mapping);
@@ -4700,7 +4698,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 		goto out_mutex;
 	}

-	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	if (new_size)
 		ext4_update_inode_size(inode, new_size);
 	ret = ext4_mark_inode_dirty(handle, inode);
@@ -5431,7 +5428,6 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
-	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	ret = ext4_mark_inode_dirty(handle, inode);
 	ext4_update_inode_fsync_trans(handle, inode, 1);

@@ -5541,7 +5537,6 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
 	/* Expand file to avoid data loss if there is error while shifting */
 	inode->i_size += len;
 	EXT4_I(inode)->i_disksize += len;
-	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	ret = ext4_mark_inode_dirty(handle, inode);
 	if (ret)
 		goto out_stop;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f769f5cb6deb7..e4b6ab28d7055 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4113,7 +4113,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);

-	inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 	ret2 = ext4_mark_inode_dirty(handle, inode);
 	if (unlikely(ret2))
 		ret = ret2;
From: Zhang Yi <yi.zhang@huawei.com>

[ Upstream commit 982bf37da09d078570650b691d9084f43805a5de ]

The current implementation of ext4_punch_hole() contains complex position calculations and stale error tags. To improve clarity and maintainability, clean up the code by:
a) simplifying and renaming variables;
b) eliminating unnecessary position calculations;
c) writing back all data in data=journal mode, and dropping the page cache from the original offset to the end rather than using aligned blocks;
d) renaming the stale error tags.
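The simplified position calculation can be illustrated with a small userspace sketch. This is an assumption-laden illustration, not the kernel code: bytes_to_lblk_round_up() mirrors the shape of the EXT4_B_TO_LBLK() macro added by this patch (round the byte offset up to a block boundary, then shift), and whole_blocks_in_range() is a hypothetical helper showing how start_lblk and end_lblk bound the blocks that can actually be removed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round a byte offset up to the next block boundary and convert it to a
 * logical block number. Assumes blocksize == 1 << blkbits.
 */
static uint32_t bytes_to_lblk_round_up(uint64_t offset, unsigned int blkbits)
{
	uint64_t blocksize;

	assert(blkbits < 32);
	blocksize = (uint64_t)1 << blkbits;
	return (uint32_t)((offset + blocksize - 1) >> blkbits);
}

/*
 * Number of whole blocks fully contained in [offset, end). The start is
 * rounded up and the end is rounded down, so partial edge blocks are
 * excluded; they are zeroed separately rather than removed.
 */
static uint32_t whole_blocks_in_range(uint64_t offset, uint64_t end,
				      unsigned int blkbits)
{
	uint32_t start_lblk = bytes_to_lblk_round_up(offset, blkbits);
	uint32_t end_lblk = (uint32_t)(end >> blkbits);

	return end_lblk > start_lblk ? end_lblk - start_lblk : 0;
}
```

For example, with 4096-byte blocks (blkbits of 12), the byte range [1000, 9000) fully contains only block 1, so one block would be removed and the two partial edges zeroed.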
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-5-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 29ec9bed2395 ("ext4: fix incorrect punch max_end")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/ext4.h  |   2 +
 fs/ext4/inode.c | 119 +++++++++++++++++++++--------------------------
 2 files changed, 55 insertions(+), 66 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index e94df69ee2e0d..a95525bfb99cf 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -368,6 +368,8 @@ struct ext4_io_submit { #define EXT4_MAX_BLOCKS(size, offset, blkbits) \ ((EXT4_BLOCK_ALIGN(size + offset, blkbits) >> blkbits) - (offset >> \ blkbits)) +#define EXT4_B_TO_LBLK(inode, offset) \ + (round_up((offset), i_blocksize(inode)) >> (inode)->i_blkbits)
/* Translate a block number to a cluster number */ #define EXT4_B2C(sbi, blk) ((blk) >> (sbi)->s_cluster_bits) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index e4b6ab28d7055..bb68c851b33ad 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3991,13 +3991,13 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) { struct inode *inode = file_inode(file); struct super_block *sb = inode->i_sb; - ext4_lblk_t first_block, stop_block; + ext4_lblk_t start_lblk, end_lblk; struct address_space *mapping = inode->i_mapping; - loff_t first_block_offset, last_block_offset, max_length; - struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + loff_t max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize; + loff_t end = offset + length; handle_t *handle; unsigned int credits; - int ret = 0, ret2 = 0; + int ret = 0;
trace_ext4_punch_hole(inode, offset, length, 0);
@@ -4005,36 +4005,27 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
/* No need to punch hole beyond i_size */ if (offset >= inode->i_size) - goto out_mutex; + goto out;
/* - * If the hole extends beyond i_size, set the hole - * to end after the page that contains i_size + * If the hole extends beyond i_size, set the hole to end after + * the page that contains i_size, and also make sure that the hole + * within one block before last range. */ - if (offset + length > inode->i_size) { - length = inode->i_size + - PAGE_SIZE - (inode->i_size & (PAGE_SIZE - 1)) - - offset; - } + if (end > inode->i_size) + end = round_up(inode->i_size, PAGE_SIZE); + if (end > max_end) + end = max_end; + length = end - offset;
/* - * For punch hole the length + offset needs to be within one block - * before last range. Adjust the length if it goes beyond that limit. + * Attach jinode to inode for jbd2 if we do any zeroing of partial + * block. */ - max_length = sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize; - if (offset + length > max_length) - length = max_length - offset; - - if (offset & (sb->s_blocksize - 1) || - (offset + length) & (sb->s_blocksize - 1)) { - /* - * Attach jinode to inode for jbd2 if we do any zeroing of - * partial block - */ + if (!IS_ALIGNED(offset | end, sb->s_blocksize)) { ret = ext4_inode_attach_jinode(inode); if (ret < 0) - goto out_mutex; - + goto out; }
/* Wait all existing dio workers, newcomers will block on i_rwsem */ @@ -4042,7 +4033,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
ret = file_modified(file); if (ret) - goto out_mutex; + goto out;
/* * Prevent page faults from reinstantiating pages we have released from @@ -4052,22 +4043,16 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
ret = ext4_break_layouts(inode); if (ret) - goto out_dio; + goto out_invalidate_lock;
- first_block_offset = round_up(offset, sb->s_blocksize); - last_block_offset = round_down((offset + length), sb->s_blocksize) - 1; + ret = ext4_update_disksize_before_punch(inode, offset, length); + if (ret) + goto out_invalidate_lock;
/* Now release the pages and zero block aligned part of pages*/ - if (last_block_offset > first_block_offset) { - ret = ext4_update_disksize_before_punch(inode, offset, length); - if (ret) - goto out_dio; - - ret = ext4_truncate_page_cache_block_range(inode, - first_block_offset, last_block_offset + 1); - if (ret) - goto out_dio; - } + ret = ext4_truncate_page_cache_block_range(inode, offset, end); + if (ret) + goto out_invalidate_lock;
if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) credits = ext4_writepage_trans_blocks(inode); @@ -4077,52 +4062,54 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) if (IS_ERR(handle)) { ret = PTR_ERR(handle); ext4_std_error(sb, ret); - goto out_dio; + goto out_invalidate_lock; }
- ret = ext4_zero_partial_blocks(handle, inode, offset, - length); + ret = ext4_zero_partial_blocks(handle, inode, offset, length); if (ret) - goto out_stop; - - first_block = (offset + sb->s_blocksize - 1) >> - EXT4_BLOCK_SIZE_BITS(sb); - stop_block = (offset + length) >> EXT4_BLOCK_SIZE_BITS(sb); + goto out_handle;
/* If there are blocks to remove, do it */ - if (stop_block > first_block) { - ext4_lblk_t hole_len = stop_block - first_block; + start_lblk = EXT4_B_TO_LBLK(inode, offset); + end_lblk = end >> inode->i_blkbits; + + if (end_lblk > start_lblk) { + ext4_lblk_t hole_len = end_lblk - start_lblk;
down_write(&EXT4_I(inode)->i_data_sem); ext4_discard_preallocations(inode);
- ext4_es_remove_extent(inode, first_block, hole_len); + ext4_es_remove_extent(inode, start_lblk, hole_len);
if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) - ret = ext4_ext_remove_space(inode, first_block, - stop_block - 1); + ret = ext4_ext_remove_space(inode, start_lblk, + end_lblk - 1); else - ret = ext4_ind_remove_space(handle, inode, first_block, - stop_block); + ret = ext4_ind_remove_space(handle, inode, start_lblk, + end_lblk); + if (ret) { + up_write(&EXT4_I(inode)->i_data_sem); + goto out_handle; + }
- ext4_es_insert_extent(inode, first_block, hole_len, ~0, + ext4_es_insert_extent(inode, start_lblk, hole_len, ~0, EXTENT_STATUS_HOLE, 0); up_write(&EXT4_I(inode)->i_data_sem); } - ext4_fc_track_range(handle, inode, first_block, stop_block); + ext4_fc_track_range(handle, inode, start_lblk, end_lblk); + + ret = ext4_mark_inode_dirty(handle, inode); + if (unlikely(ret)) + goto out_handle; + + ext4_update_inode_fsync_trans(handle, inode, 1); if (IS_SYNC(inode)) ext4_handle_sync(handle); - - ret2 = ext4_mark_inode_dirty(handle, inode); - if (unlikely(ret2)) - ret = ret2; - if (ret >= 0) - ext4_update_inode_fsync_trans(handle, inode, 1); -out_stop: +out_handle: ext4_journal_stop(handle); -out_dio: +out_invalidate_lock: filemap_invalidate_unlock(mapping); -out_mutex: +out: inode_unlock(inode); return ret; }
From: Zhang Yi <yi.zhang@huawei.com>

[ Upstream commit 53471e0bedad5891b860d02233819dc0e28189e2 ]

The current implementation of ext4_zero_range() contains complex position calculations and stale error tags. To improve clarity and maintainability, clean up the code by:
a) simplifying and renaming variables, making the style the same as ext4_punch_hole();
b) eliminating unnecessary position calculations, writing back all data in data=journal mode, and dropping the page cache from the original offset to the end rather than using aligned blocks;
c) renaming the stale out_mutex tags.
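As a rough userspace sketch of the two calculations the cleaned-up function relies on (an illustration under stated assumptions, not the kernel implementation): range_is_block_aligned() mimics the IS_ALIGNED(offset | end, blocksize) test that replaces the old partial_begin/partial_end variables, and blocks_covering_range() follows the shape of EXT4_MAX_BLOCKS() for the preallocation that includes the unaligned edges. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when both ends of [offset, end) sit on a block boundary. */
static bool range_is_block_aligned(uint64_t offset, uint64_t end,
				   uint64_t blocksize)
{
	/* OR-ing the two values only works for power-of-two block sizes. */
	assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
	return ((offset | end) & (blocksize - 1)) == 0;
}

/*
 * Number of blocks touched by [offset, offset + len), with partial edge
 * blocks included. Assumes blocksize == 1 << blkbits.
 */
static uint64_t blocks_covering_range(uint64_t offset, uint64_t len,
				      unsigned int blkbits)
{
	uint64_t blocksize = (uint64_t)1 << blkbits;
	uint64_t end_up = (offset + len + blocksize - 1) >> blkbits;

	return end_up - (offset >> blkbits);
}
```

A single test on offset | end is enough because a low bit set in either value survives the OR, so one mask catches an unaligned start, an unaligned end, or both.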
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 29ec9bed2395 ("ext4: fix incorrect punch max_end")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/extents.c | 142 +++++++++++++++++++---------------------------
 1 file changed, 57 insertions(+), 85 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 43da9906b9240..00c7a03cc7c6e 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4571,40 +4571,15 @@ static long ext4_zero_range(struct file *file, loff_t offset, struct inode *inode = file_inode(file); struct address_space *mapping = file->f_mapping; handle_t *handle = NULL; - unsigned int max_blocks; loff_t new_size = 0; - int ret = 0; - int flags; - int credits; - int partial_begin, partial_end; - loff_t start, end; - ext4_lblk_t lblk; + loff_t end = offset + len; + ext4_lblk_t start_lblk, end_lblk; + unsigned int blocksize = i_blocksize(inode); unsigned int blkbits = inode->i_blkbits; + int ret, flags, credits;
trace_ext4_zero_range(inode, offset, len, mode);
- /* - * Round up offset. This is not fallocate, we need to zero out - * blocks, so convert interior block aligned part of the range to - * unwritten and possibly manually zero out unaligned parts of the - * range. Here, start and partial_begin are inclusive, end and - * partial_end are exclusive. - */ - start = round_up(offset, 1 << blkbits); - end = round_down((offset + len), 1 << blkbits); - - if (start < offset || end > offset + len) - return -EINVAL; - partial_begin = offset & ((1 << blkbits) - 1); - partial_end = (offset + len) & ((1 << blkbits) - 1); - - lblk = start >> blkbits; - max_blocks = (end >> blkbits); - if (max_blocks < lblk) - max_blocks = 0; - else - max_blocks -= lblk; - inode_lock(inode);
/* @@ -4612,77 +4587,70 @@ static long ext4_zero_range(struct file *file, loff_t offset, */ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { ret = -EOPNOTSUPP; - goto out_mutex; + goto out; }
if (!(mode & FALLOC_FL_KEEP_SIZE) && - (offset + len > inode->i_size || - offset + len > EXT4_I(inode)->i_disksize)) { - new_size = offset + len; + (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) { + new_size = end; ret = inode_newsize_ok(inode, new_size); if (ret) - goto out_mutex; + goto out; }
- flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT; - /* Wait all existing dio workers, newcomers will block on i_rwsem */ inode_dio_wait(inode);
ret = file_modified(file); if (ret) - goto out_mutex; - - /* Preallocate the range including the unaligned edges */ - if (partial_begin || partial_end) { - ret = ext4_alloc_file_blocks(file, - round_down(offset, 1 << blkbits) >> blkbits, - (round_up((offset + len), 1 << blkbits) - - round_down(offset, 1 << blkbits)) >> blkbits, - new_size, flags); - if (ret) - goto out_mutex; + goto out;
- } + /* + * Prevent page faults from reinstantiating pages we have released + * from page cache. + */ + filemap_invalidate_lock(mapping);
- /* Zero range excluding the unaligned edges */ - if (max_blocks > 0) { - flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN | - EXT4_EX_NOCACHE); + ret = ext4_break_layouts(inode); + if (ret) + goto out_invalidate_lock;
- /* - * Prevent page faults from reinstantiating pages we have - * released from page cache. - */ - filemap_invalidate_lock(mapping); + flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT; + /* Preallocate the range including the unaligned edges */ + if (!IS_ALIGNED(offset | end, blocksize)) { + ext4_lblk_t alloc_lblk = offset >> blkbits; + ext4_lblk_t len_lblk = EXT4_MAX_BLOCKS(len, offset, blkbits);
- ret = ext4_break_layouts(inode); - if (ret) { - filemap_invalidate_unlock(mapping); - goto out_mutex; - } + ret = ext4_alloc_file_blocks(file, alloc_lblk, len_lblk, + new_size, flags); + if (ret) + goto out_invalidate_lock; + }
- ret = ext4_update_disksize_before_punch(inode, offset, len); - if (ret) { - filemap_invalidate_unlock(mapping); - goto out_mutex; - } + ret = ext4_update_disksize_before_punch(inode, offset, len); + if (ret) + goto out_invalidate_lock;
- /* Now release the pages and zero block aligned part of pages */ - ret = ext4_truncate_page_cache_block_range(inode, start, end); - if (ret) { - filemap_invalidate_unlock(mapping); - goto out_mutex; - } + /* Now release the pages and zero block aligned part of pages */ + ret = ext4_truncate_page_cache_block_range(inode, offset, end); + if (ret) + goto out_invalidate_lock;
- ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, - flags); - filemap_invalidate_unlock(mapping); + /* Zero range excluding the unaligned edges */ + start_lblk = EXT4_B_TO_LBLK(inode, offset); + end_lblk = end >> blkbits; + if (end_lblk > start_lblk) { + ext4_lblk_t zero_blks = end_lblk - start_lblk; + + flags |= (EXT4_GET_BLOCKS_CONVERT_UNWRITTEN | EXT4_EX_NOCACHE); + ret = ext4_alloc_file_blocks(file, start_lblk, zero_blks, + new_size, flags); if (ret) - goto out_mutex; + goto out_invalidate_lock; } - if (!partial_begin && !partial_end) - goto out_mutex; + /* Finish zeroing out if it doesn't contain partial block */ + if (IS_ALIGNED(offset | end, blocksize)) + goto out_invalidate_lock;
/* * In worst case we have to writeout two nonadjacent unwritten @@ -4695,25 +4663,29 @@ static long ext4_zero_range(struct file *file, loff_t offset, if (IS_ERR(handle)) { ret = PTR_ERR(handle); ext4_std_error(inode->i_sb, ret); - goto out_mutex; + goto out_invalidate_lock; }
+ /* Zero out partial block at the edges of the range */ + ret = ext4_zero_partial_blocks(handle, inode, offset, len); + if (ret) + goto out_handle; + if (new_size) ext4_update_inode_size(inode, new_size); ret = ext4_mark_inode_dirty(handle, inode); if (unlikely(ret)) goto out_handle; - /* Zero out partial block at the edges of the range */ - ret = ext4_zero_partial_blocks(handle, inode, offset, len); - if (ret >= 0) - ext4_update_inode_fsync_trans(handle, inode, 1);
+ ext4_update_inode_fsync_trans(handle, inode, 1); if (file->f_flags & O_SYNC) ext4_handle_sync(handle);
out_handle: ext4_journal_stop(handle); -out_mutex: +out_invalidate_lock: + filemap_invalidate_unlock(mapping); +out: inode_unlock(inode); return ret; }
From: Zhang Yi <yi.zhang@huawei.com>

[ Upstream commit 162e3c5ad1672ef41dccfb28ad198c704b8aa9e7 ]

Simplify ext4_collapse_range() and align its code style with that of ext4_zero_range() and ext4_punch_hole(). Refactor it by:
a) renaming variables;
b) removing redundant input parameter checks and moving the remaining checks under i_rwsem in preparation for future refactoring;
c) renaming the three stale error tags.
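The order of the remaining checks after this refactoring can be sketched as a pure function. This is only a userspace illustration: collapse_range_check() is a hypothetical helper, its parameters stand in for the inode flag, cluster size and i_size that the kernel reads under i_rwsem, and no locking is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Validate a collapse-range request in the order the refactored function
 * applies its checks. Returns 0 or a negative errno value.
 */
static int collapse_range_check(bool uses_extents, uint64_t offset,
				uint64_t len, uint64_t cluster_size,
				uint64_t i_size)
{
	assert(cluster_size != 0 && (cluster_size & (cluster_size - 1)) == 0);

	/* Only extent-based files are supported. */
	if (!uses_extents)
		return -EOPNOTSUPP;
	/* Both offset and len must be cluster aligned. */
	if ((offset | len) & (cluster_size - 1))
		return -EINVAL;
	/* A range reaching EOF would effectively be a truncate. */
	if (offset + len >= i_size)
		return -EINVAL;
	return 0;
}
```

Keeping the extent check first preserves the behavior that an unsupported file reports EOPNOTSUPP even for an unaligned request such as (0, 1).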
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-7-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 29ec9bed2395 ("ext4: fix incorrect punch max_end")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/extents.c | 103 +++++++++++++++++++++-------------------------
 1 file changed, 48 insertions(+), 55 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 00c7a03cc7c6e..54fbeba3a929d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -5288,43 +5288,36 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len) struct inode *inode = file_inode(file); struct super_block *sb = inode->i_sb; struct address_space *mapping = inode->i_mapping; - ext4_lblk_t punch_start, punch_stop; + loff_t end = offset + len; + ext4_lblk_t start_lblk, end_lblk; handle_t *handle; unsigned int credits; - loff_t new_size, ioffset; + loff_t start, new_size; int ret;
- /* - * We need to test this early because xfstests assumes that a - * collapse range of (0, 1) will return EOPNOTSUPP if the file - * system does not support collapse range. - */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) - return -EOPNOTSUPP; + trace_ext4_collapse_range(inode, offset, len);
- /* Collapse range works only on fs cluster size aligned regions. */ - if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) - return -EINVAL; + inode_lock(inode);
- trace_ext4_collapse_range(inode, offset, len); + /* Currently just for extent based files */ + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { + ret = -EOPNOTSUPP; + goto out; + }
- punch_start = offset >> EXT4_BLOCK_SIZE_BITS(sb); - punch_stop = (offset + len) >> EXT4_BLOCK_SIZE_BITS(sb); + /* Collapse range works only on fs cluster size aligned regions. */ + if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) { + ret = -EINVAL; + goto out; + }
- inode_lock(inode); /* * There is no need to overlap collapse range with EOF, in which case * it is effectively a truncate operation */ - if (offset + len >= inode->i_size) { + if (end >= inode->i_size) { ret = -EINVAL; - goto out_mutex; - } - - /* Currently just for extent based files */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { - ret = -EOPNOTSUPP; - goto out_mutex; + goto out; }
/* Wait for existing dio to complete */ @@ -5332,7 +5325,7 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
ret = file_modified(file); if (ret) - goto out_mutex; + goto out;
/* * Prevent page faults from reinstantiating pages we have released from @@ -5342,55 +5335,52 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
ret = ext4_break_layouts(inode); if (ret) - goto out_mmap; + goto out_invalidate_lock;
/* + * Write tail of the last page before removed range and data that + * will be shifted since they will get removed from the page cache + * below. We are also protected from pages becoming dirty by + * i_rwsem and invalidate_lock. * Need to round down offset to be aligned with page size boundary * for page size > block size. */ - ioffset = round_down(offset, PAGE_SIZE); - /* - * Write tail of the last page before removed range since it will get - * removed from the page cache below. - */ - ret = filemap_write_and_wait_range(mapping, ioffset, offset); - if (ret) - goto out_mmap; - /* - * Write data that will be shifted to preserve them when discarding - * page cache below. We are also protected from pages becoming dirty - * by i_rwsem and invalidate_lock. - */ - ret = filemap_write_and_wait_range(mapping, offset + len, - LLONG_MAX); + start = round_down(offset, PAGE_SIZE); + ret = filemap_write_and_wait_range(mapping, start, offset); + if (!ret) + ret = filemap_write_and_wait_range(mapping, end, LLONG_MAX); if (ret) - goto out_mmap; - truncate_pagecache(inode, ioffset); + goto out_invalidate_lock; + + truncate_pagecache(inode, start);
credits = ext4_writepage_trans_blocks(inode); handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); if (IS_ERR(handle)) { ret = PTR_ERR(handle); - goto out_mmap; + goto out_invalidate_lock; } ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
+ start_lblk = offset >> inode->i_blkbits; + end_lblk = (offset + len) >> inode->i_blkbits; + down_write(&EXT4_I(inode)->i_data_sem); ext4_discard_preallocations(inode); - ext4_es_remove_extent(inode, punch_start, EXT_MAX_BLOCKS - punch_start); + ext4_es_remove_extent(inode, start_lblk, EXT_MAX_BLOCKS - start_lblk);
- ret = ext4_ext_remove_space(inode, punch_start, punch_stop - 1); + ret = ext4_ext_remove_space(inode, start_lblk, end_lblk - 1); if (ret) { up_write(&EXT4_I(inode)->i_data_sem); - goto out_stop; + goto out_handle; } ext4_discard_preallocations(inode);
- ret = ext4_ext_shift_extents(inode, handle, punch_stop, - punch_stop - punch_start, SHIFT_LEFT); + ret = ext4_ext_shift_extents(inode, handle, end_lblk, + end_lblk - start_lblk, SHIFT_LEFT); if (ret) { up_write(&EXT4_I(inode)->i_data_sem); - goto out_stop; + goto out_handle; }
new_size = inode->i_size - len; @@ -5398,16 +5388,19 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len) EXT4_I(inode)->i_disksize = new_size;
up_write(&EXT4_I(inode)->i_data_sem); - if (IS_SYNC(inode)) - ext4_handle_sync(handle); ret = ext4_mark_inode_dirty(handle, inode); + if (ret) + goto out_handle; + ext4_update_inode_fsync_trans(handle, inode, 1); + if (IS_SYNC(inode)) + ext4_handle_sync(handle);
-out_stop: +out_handle: ext4_journal_stop(handle); -out_mmap: +out_invalidate_lock: filemap_invalidate_unlock(mapping); -out_mutex: +out: inode_unlock(inode); return ret; }
From: Zhang Yi <yi.zhang@huawei.com>

[ Upstream commit 49425504376c335c68f7be54ae7c32312afd9475 ]

Simplify ext4_insert_range() and align its code style with that of ext4_collapse_range(). Refactor it by:
a) renaming variables;
b) removing redundant input parameter checks and moving the remaining checks under i_rwsem in preparation for future refactoring;
c) renaming the three stale error tags.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Link: https://patch.msgid.link/20241220011637.1157197-8-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Stable-dep-of: 29ec9bed2395 ("ext4: fix incorrect punch max_end")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/ext4/extents.c | 101 ++++++++++++++++++++++------------------------
 1 file changed, 48 insertions(+), 53 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 54fbeba3a929d..961e7b634401d 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -5421,45 +5421,37 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len) handle_t *handle; struct ext4_ext_path *path; struct ext4_extent *extent; - ext4_lblk_t offset_lblk, len_lblk, ee_start_lblk = 0; + ext4_lblk_t start_lblk, len_lblk, ee_start_lblk = 0; unsigned int credits, ee_len; - int ret = 0, depth, split_flag = 0; - loff_t ioffset; - - /* - * We need to test this early because xfstests assumes that an - * insert range of (0, 1) will return EOPNOTSUPP if the file - * system does not support insert range. - */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) - return -EOPNOTSUPP; - - /* Insert range works only on fs cluster size aligned regions. */ - if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) - return -EINVAL; + int ret, depth, split_flag = 0; + loff_t start;
trace_ext4_insert_range(inode, offset, len);
- offset_lblk = offset >> EXT4_BLOCK_SIZE_BITS(sb); - len_lblk = len >> EXT4_BLOCK_SIZE_BITS(sb); - inode_lock(inode); + /* Currently just for extent based files */ if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { ret = -EOPNOTSUPP; - goto out_mutex; + goto out; }
- /* Check whether the maximum file size would be exceeded */ - if (len > inode->i_sb->s_maxbytes - inode->i_size) { - ret = -EFBIG; - goto out_mutex; + /* Insert range works only on fs cluster size aligned regions. */ + if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) { + ret = -EINVAL; + goto out; }
/* Offset must be less than i_size */ if (offset >= inode->i_size) { ret = -EINVAL; - goto out_mutex; + goto out; + } + + /* Check whether the maximum file size would be exceeded */ + if (len > inode->i_sb->s_maxbytes - inode->i_size) { + ret = -EFBIG; + goto out; }
/* Wait for existing dio to complete */ @@ -5467,7 +5459,7 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
ret = file_modified(file); if (ret) - goto out_mutex; + goto out;
/* * Prevent page faults from reinstantiating pages we have released from @@ -5477,25 +5469,24 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
ret = ext4_break_layouts(inode); if (ret) - goto out_mmap; + goto out_invalidate_lock;
/* - * Need to round down to align start offset to page size boundary - * for page size > block size. + * Write out all dirty pages. Need to round down to align start offset + * to page size boundary for page size > block size. */ - ioffset = round_down(offset, PAGE_SIZE); - /* Write out all dirty pages */ - ret = filemap_write_and_wait_range(inode->i_mapping, ioffset, - LLONG_MAX); + start = round_down(offset, PAGE_SIZE); + ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX); if (ret) - goto out_mmap; - truncate_pagecache(inode, ioffset); + goto out_invalidate_lock; + + truncate_pagecache(inode, start);
credits = ext4_writepage_trans_blocks(inode); handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); if (IS_ERR(handle)) { ret = PTR_ERR(handle); - goto out_mmap; + goto out_invalidate_lock; } ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
@@ -5504,16 +5495,19 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len) EXT4_I(inode)->i_disksize += len; ret = ext4_mark_inode_dirty(handle, inode); if (ret) - goto out_stop; + goto out_handle; + + start_lblk = offset >> inode->i_blkbits; + len_lblk = len >> inode->i_blkbits;
down_write(&EXT4_I(inode)->i_data_sem); ext4_discard_preallocations(inode);
- path = ext4_find_extent(inode, offset_lblk, NULL, 0); + path = ext4_find_extent(inode, start_lblk, NULL, 0); if (IS_ERR(path)) { up_write(&EXT4_I(inode)->i_data_sem); ret = PTR_ERR(path); - goto out_stop; + goto out_handle; }
depth = ext_depth(inode); @@ -5523,16 +5517,16 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len) ee_len = ext4_ext_get_actual_len(extent);
/* - * If offset_lblk is not the starting block of extent, split - * the extent @offset_lblk + * If start_lblk is not the starting block of extent, split + * the extent @start_lblk */ - if ((offset_lblk > ee_start_lblk) && - (offset_lblk < (ee_start_lblk + ee_len))) { + if ((start_lblk > ee_start_lblk) && + (start_lblk < (ee_start_lblk + ee_len))) { if (ext4_ext_is_unwritten(extent)) split_flag = EXT4_EXT_MARK_UNWRIT1 | EXT4_EXT_MARK_UNWRIT2; path = ext4_split_extent_at(handle, inode, path, - offset_lblk, split_flag, + start_lblk, split_flag, EXT4_EX_NOCACHE | EXT4_GET_BLOCKS_PRE_IO | EXT4_GET_BLOCKS_METADATA_NOFAIL); @@ -5541,31 +5535,32 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len) if (IS_ERR(path)) { up_write(&EXT4_I(inode)->i_data_sem); ret = PTR_ERR(path); - goto out_stop; + goto out_handle; } }
ext4_free_ext_path(path); - ext4_es_remove_extent(inode, offset_lblk, EXT_MAX_BLOCKS - offset_lblk); + ext4_es_remove_extent(inode, start_lblk, EXT_MAX_BLOCKS - start_lblk);
/* - * if offset_lblk lies in a hole which is at start of file, use + * if start_lblk lies in a hole which is at start of file, use * ee_start_lblk to shift extents */ ret = ext4_ext_shift_extents(inode, handle, - max(ee_start_lblk, offset_lblk), len_lblk, SHIFT_RIGHT); - + max(ee_start_lblk, start_lblk), len_lblk, SHIFT_RIGHT); up_write(&EXT4_I(inode)->i_data_sem); + if (ret) + goto out_handle; + + ext4_update_inode_fsync_trans(handle, inode, 1); if (IS_SYNC(inode)) ext4_handle_sync(handle); - if (ret >= 0) - ext4_update_inode_fsync_trans(handle, inode, 1);
-out_stop: +out_handle: ext4_journal_stop(handle); -out_mmap: +out_invalidate_lock: filemap_invalidate_unlock(mapping); -out_mutex: +out: inode_unlock(inode); return ret; }
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit fd2f764826df5489b849a8937b5a093aae5b1816 ]
The real work of a normal fallocate is currently open-coded in ext4_fallocate(). Factor it out into a new helper, ext4_do_fallocate(), as is already done for the other operations (e.g. ext4_zero_range()) called from ext4_fallocate(). This makes the code clearer; no functional changes.
Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Ojaswin Mujoo ojaswin@linux.ibm.com Link: https://patch.msgid.link/20241220011637.1157197-9-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: 29ec9bed2395 ("ext4: fix incorrect punch max_end") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/extents.c | 125 ++++++++++++++++++++++------------------------ 1 file changed, 60 insertions(+), 65 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 961e7b634401d..eb58f7a1ab5aa 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4690,6 +4690,58 @@ static long ext4_zero_range(struct file *file, loff_t offset, return ret; }
+static long ext4_do_fallocate(struct file *file, loff_t offset, + loff_t len, int mode) +{ + struct inode *inode = file_inode(file); + loff_t end = offset + len; + loff_t new_size = 0; + ext4_lblk_t start_lblk, len_lblk; + int ret; + + trace_ext4_fallocate_enter(inode, offset, len, mode); + + start_lblk = offset >> inode->i_blkbits; + len_lblk = EXT4_MAX_BLOCKS(len, offset, inode->i_blkbits); + + inode_lock(inode); + + /* We only support preallocation for extent-based files only. */ + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { + ret = -EOPNOTSUPP; + goto out; + } + + if (!(mode & FALLOC_FL_KEEP_SIZE) && + (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) { + new_size = end; + ret = inode_newsize_ok(inode, new_size); + if (ret) + goto out; + } + + /* Wait all existing dio workers, newcomers will block on i_rwsem */ + inode_dio_wait(inode); + + ret = file_modified(file); + if (ret) + goto out; + + ret = ext4_alloc_file_blocks(file, start_lblk, len_lblk, new_size, + EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); + if (ret) + goto out; + + if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) { + ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal, + EXT4_I(inode)->i_sync_tid); + } +out: + inode_unlock(inode); + trace_ext4_fallocate_exit(inode, offset, len_lblk, ret); + return ret; +} + /* * preallocate space for a file. This implements ext4's fallocate file * operation, which gets called from sys_fallocate system call. @@ -4700,12 +4752,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { struct inode *inode = file_inode(file); - loff_t new_size = 0; - unsigned int max_blocks; - int ret = 0; - int flags; - ext4_lblk_t lblk; - unsigned int blkbits = inode->i_blkbits; + int ret;
/* * Encrypted inodes can't handle collapse range or insert @@ -4727,71 +4774,19 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) ret = ext4_convert_inline_data(inode); inode_unlock(inode); if (ret) - goto exit; + return ret;
- if (mode & FALLOC_FL_PUNCH_HOLE) { + if (mode & FALLOC_FL_PUNCH_HOLE) ret = ext4_punch_hole(file, offset, len); - goto exit; - } - - if (mode & FALLOC_FL_COLLAPSE_RANGE) { + else if (mode & FALLOC_FL_COLLAPSE_RANGE) ret = ext4_collapse_range(file, offset, len); - goto exit; - } - - if (mode & FALLOC_FL_INSERT_RANGE) { + else if (mode & FALLOC_FL_INSERT_RANGE) ret = ext4_insert_range(file, offset, len); - goto exit; - } - - if (mode & FALLOC_FL_ZERO_RANGE) { + else if (mode & FALLOC_FL_ZERO_RANGE) ret = ext4_zero_range(file, offset, len, mode); - goto exit; - } - trace_ext4_fallocate_enter(inode, offset, len, mode); - lblk = offset >> blkbits; - - max_blocks = EXT4_MAX_BLOCKS(len, offset, blkbits); - flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT; - - inode_lock(inode); - - /* - * We only support preallocation for extent-based files only - */ - if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { - ret = -EOPNOTSUPP; - goto out; - } - - if (!(mode & FALLOC_FL_KEEP_SIZE) && - (offset + len > inode->i_size || - offset + len > EXT4_I(inode)->i_disksize)) { - new_size = offset + len; - ret = inode_newsize_ok(inode, new_size); - if (ret) - goto out; - } - - /* Wait all existing dio workers, newcomers will block on i_rwsem */ - inode_dio_wait(inode); - - ret = file_modified(file); - if (ret) - goto out; - - ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size, flags); - if (ret) - goto out; + else + ret = ext4_do_fallocate(file, offset, len, mode);
- if (file->f_flags & O_SYNC && EXT4_SB(inode->i_sb)->s_journal) { - ret = ext4_fc_commit(EXT4_SB(inode->i_sb)->s_journal, - EXT4_I(inode)->i_sync_tid); - } -out: - inode_unlock(inode); - trace_ext4_fallocate_exit(inode, offset, max_blocks, ret); -exit: return ret; }
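The new helper converts the byte range into a logical block range (start_lblk, len_lblk) before allocating. A minimal userspace sketch of that arithmetic follows. The helper names block_align() and max_blocks() are hypothetical, and the rounding is written from my understanding of what EXT4_MAX_BLOCKS() computes, so treat it as an illustration rather than the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of the block size (1 << blkbits). */
static uint64_t block_align(uint64_t x, unsigned int blkbits)
{
	uint64_t mask;

	assert(blkbits < 63);
	mask = ((uint64_t)1 << blkbits) - 1;
	return (x + mask) & ~mask;
}

/*
 * Number of blocks touched by the byte range [offset, offset + len).
 * Assumed to mirror EXT4_MAX_BLOCKS(len, offset, blkbits).
 */
static uint64_t max_blocks(uint64_t len, uint64_t offset, unsigned int blkbits)
{
	return (block_align(len + offset, blkbits) >> blkbits) -
	       (offset >> blkbits);
}
```

With 4096-byte blocks (blkbits = 12), offset = 5000 and len = 10000 give start_lblk = 5000 >> 12 = 1 and max_blocks(10000, 5000, 12) = 3, i.e. blocks 1 through 3.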
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit ea3f17efd36b56c5839289716ba83eaa85893590 ]
Currently, all five sub-functions of ext4_fallocate() acquire the inode's i_rwsem at the beginning and release it before exiting. This process can be simplified by factoring out the management of i_rwsem into the ext4_fallocate() function.
Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Ojaswin Mujoo ojaswin@linux.ibm.com Link: https://patch.msgid.link/20241220011637.1157197-10-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: 29ec9bed2395 ("ext4: fix incorrect punch max_end") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/extents.c | 90 +++++++++++++++-------------------------------- fs/ext4/inode.c | 13 +++---- 2 files changed, 33 insertions(+), 70 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index eb58f7a1ab5aa..30d412b62d9ed 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4579,23 +4579,18 @@ static long ext4_zero_range(struct file *file, loff_t offset, int ret, flags, credits;
trace_ext4_zero_range(inode, offset, len, mode); + WARN_ON_ONCE(!inode_is_locked(inode));
- inode_lock(inode); - - /* - * Indirect files do not support unwritten extents - */ - if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { - ret = -EOPNOTSUPP; - goto out; - } + /* Indirect files do not support unwritten extents */ + if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) + return -EOPNOTSUPP;
if (!(mode & FALLOC_FL_KEEP_SIZE) && (end > inode->i_size || end > EXT4_I(inode)->i_disksize)) { new_size = end; ret = inode_newsize_ok(inode, new_size); if (ret) - goto out; + return ret; }
/* Wait all existing dio workers, newcomers will block on i_rwsem */ @@ -4603,7 +4598,7 @@ static long ext4_zero_range(struct file *file, loff_t offset,
ret = file_modified(file); if (ret) - goto out; + return ret;
/* * Prevent page faults from reinstantiating pages we have released @@ -4685,8 +4680,6 @@ static long ext4_zero_range(struct file *file, loff_t offset, ext4_journal_stop(handle); out_invalidate_lock: filemap_invalidate_unlock(mapping); -out: - inode_unlock(inode); return ret; }
@@ -4700,12 +4693,11 @@ static long ext4_do_fallocate(struct file *file, loff_t offset, int ret;
trace_ext4_fallocate_enter(inode, offset, len, mode); + WARN_ON_ONCE(!inode_is_locked(inode));
start_lblk = offset >> inode->i_blkbits; len_lblk = EXT4_MAX_BLOCKS(len, offset, inode->i_blkbits);
- inode_lock(inode); - /* We only support preallocation for extent-based files only. */ if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) { ret = -EOPNOTSUPP; @@ -4737,7 +4729,6 @@ static long ext4_do_fallocate(struct file *file, loff_t offset, EXT4_I(inode)->i_sync_tid); } out: - inode_unlock(inode); trace_ext4_fallocate_exit(inode, offset, len_lblk, ret); return ret; } @@ -4772,9 +4763,8 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
inode_lock(inode); ret = ext4_convert_inline_data(inode); - inode_unlock(inode); if (ret) - return ret; + goto out_inode_lock;
if (mode & FALLOC_FL_PUNCH_HOLE) ret = ext4_punch_hole(file, offset, len); @@ -4786,7 +4776,8 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) ret = ext4_zero_range(file, offset, len, mode); else ret = ext4_do_fallocate(file, offset, len, mode); - +out_inode_lock: + inode_unlock(inode); return ret; }
@@ -5291,36 +5282,27 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len) int ret;
trace_ext4_collapse_range(inode, offset, len); - - inode_lock(inode); + WARN_ON_ONCE(!inode_is_locked(inode));
/* Currently just for extent based files */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { - ret = -EOPNOTSUPP; - goto out; - } - + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) + return -EOPNOTSUPP; /* Collapse range works only on fs cluster size aligned regions. */ - if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) { - ret = -EINVAL; - goto out; - } - + if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) + return -EINVAL; /* * There is no need to overlap collapse range with EOF, in which case * it is effectively a truncate operation */ - if (end >= inode->i_size) { - ret = -EINVAL; - goto out; - } + if (end >= inode->i_size) + return -EINVAL;
/* Wait for existing dio to complete */ inode_dio_wait(inode);
ret = file_modified(file); if (ret) - goto out; + return ret;
/* * Prevent page faults from reinstantiating pages we have released from @@ -5395,8 +5377,6 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len) ext4_journal_stop(handle); out_invalidate_lock: filemap_invalidate_unlock(mapping); -out: - inode_unlock(inode); return ret; }
@@ -5422,39 +5402,27 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len) loff_t start;
trace_ext4_insert_range(inode, offset, len); - - inode_lock(inode); + WARN_ON_ONCE(!inode_is_locked(inode));
/* Currently just for extent based files */ - if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) { - ret = -EOPNOTSUPP; - goto out; - } - + if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) + return -EOPNOTSUPP; /* Insert range works only on fs cluster size aligned regions. */ - if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) { - ret = -EINVAL; - goto out; - } - + if (!IS_ALIGNED(offset | len, EXT4_CLUSTER_SIZE(sb))) + return -EINVAL; /* Offset must be less than i_size */ - if (offset >= inode->i_size) { - ret = -EINVAL; - goto out; - } - + if (offset >= inode->i_size) + return -EINVAL; /* Check whether the maximum file size would be exceeded */ - if (len > inode->i_sb->s_maxbytes - inode->i_size) { - ret = -EFBIG; - goto out; - } + if (len > inode->i_sb->s_maxbytes - inode->i_size) + return -EFBIG;
/* Wait for existing dio to complete */ inode_dio_wait(inode);
ret = file_modified(file); if (ret) - goto out; + return ret;
/* * Prevent page faults from reinstantiating pages we have released from @@ -5555,8 +5523,6 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len) ext4_journal_stop(handle); out_invalidate_lock: filemap_invalidate_unlock(mapping); -out: - inode_unlock(inode); return ret; }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index bb68c851b33ad..6f0b1b0bd1af8 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3997,15 +3997,14 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) loff_t end = offset + length; handle_t *handle; unsigned int credits; - int ret = 0; + int ret;
trace_ext4_punch_hole(inode, offset, length, 0); - - inode_lock(inode); + WARN_ON_ONCE(!inode_is_locked(inode));
/* No need to punch hole beyond i_size */ if (offset >= inode->i_size) - goto out; + return 0;
/* * If the hole extends beyond i_size, set the hole to end after @@ -4025,7 +4024,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) if (!IS_ALIGNED(offset | end, sb->s_blocksize)) { ret = ext4_inode_attach_jinode(inode); if (ret < 0) - goto out; + return ret; }
/* Wait all existing dio workers, newcomers will block on i_rwsem */ @@ -4033,7 +4032,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
ret = file_modified(file); if (ret) - goto out; + return ret;
/* * Prevent page faults from reinstantiating pages we have released from @@ -4109,8 +4108,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) ext4_journal_stop(handle); out_invalidate_lock: filemap_invalidate_unlock(mapping); -out: - inode_unlock(inode); return ret; }
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit 2890e5e0f49e10f3dadc5f7b7ea434e3e77e12a6 ]
Currently, the zero range, punch hole, collapse range, and insert range operations all first wait for existing direct I/O workers to complete, and then acquire the mapping's invalidate lock before performing the actual work. These common steps are nearly identical, so we can simplify the code by factoring them out into ext4_fallocate().
Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Ojaswin Mujoo ojaswin@linux.ibm.com Link: https://patch.msgid.link/20241220011637.1157197-11-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: 29ec9bed2395 ("ext4: fix incorrect punch max_end") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/extents.c | 124 ++++++++++++++++------------------------------ fs/ext4/inode.c | 25 ++-------- 2 files changed, 45 insertions(+), 104 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 30d412b62d9ed..51b9533416e04 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4569,7 +4569,6 @@ static long ext4_zero_range(struct file *file, loff_t offset, loff_t len, int mode) { struct inode *inode = file_inode(file); - struct address_space *mapping = file->f_mapping; handle_t *handle = NULL; loff_t new_size = 0; loff_t end = offset + len; @@ -4593,23 +4592,6 @@ static long ext4_zero_range(struct file *file, loff_t offset, return ret; }
- /* Wait all existing dio workers, newcomers will block on i_rwsem */ - inode_dio_wait(inode); - - ret = file_modified(file); - if (ret) - return ret; - - /* - * Prevent page faults from reinstantiating pages we have released - * from page cache. - */ - filemap_invalidate_lock(mapping); - - ret = ext4_break_layouts(inode); - if (ret) - goto out_invalidate_lock; - flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT; /* Preallocate the range including the unaligned edges */ if (!IS_ALIGNED(offset | end, blocksize)) { @@ -4619,17 +4601,17 @@ static long ext4_zero_range(struct file *file, loff_t offset, ret = ext4_alloc_file_blocks(file, alloc_lblk, len_lblk, new_size, flags); if (ret) - goto out_invalidate_lock; + return ret; }
ret = ext4_update_disksize_before_punch(inode, offset, len); if (ret) - goto out_invalidate_lock; + return ret;
/* Now release the pages and zero block aligned part of pages */ ret = ext4_truncate_page_cache_block_range(inode, offset, end); if (ret) - goto out_invalidate_lock; + return ret;
/* Zero range excluding the unaligned edges */ start_lblk = EXT4_B_TO_LBLK(inode, offset); @@ -4641,11 +4623,11 @@ static long ext4_zero_range(struct file *file, loff_t offset, ret = ext4_alloc_file_blocks(file, start_lblk, zero_blks, new_size, flags); if (ret) - goto out_invalidate_lock; + return ret; } /* Finish zeroing out if it doesn't contain partial block */ if (IS_ALIGNED(offset | end, blocksize)) - goto out_invalidate_lock; + return ret;
/* * In worst case we have to writeout two nonadjacent unwritten @@ -4658,7 +4640,7 @@ static long ext4_zero_range(struct file *file, loff_t offset, if (IS_ERR(handle)) { ret = PTR_ERR(handle); ext4_std_error(inode->i_sb, ret); - goto out_invalidate_lock; + return ret; }
/* Zero out partial block at the edges of the range */ @@ -4678,8 +4660,6 @@ static long ext4_zero_range(struct file *file, loff_t offset,
out_handle: ext4_journal_stop(handle); -out_invalidate_lock: - filemap_invalidate_unlock(mapping); return ret; }
@@ -4712,13 +4692,6 @@ static long ext4_do_fallocate(struct file *file, loff_t offset, goto out; }
- /* Wait all existing dio workers, newcomers will block on i_rwsem */ - inode_dio_wait(inode); - - ret = file_modified(file); - if (ret) - goto out; - ret = ext4_alloc_file_blocks(file, start_lblk, len_lblk, new_size, EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); if (ret) @@ -4743,6 +4716,7 @@ static long ext4_do_fallocate(struct file *file, loff_t offset, long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) { struct inode *inode = file_inode(file); + struct address_space *mapping = file->f_mapping; int ret;
/* @@ -4766,6 +4740,29 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) if (ret) goto out_inode_lock;
+ /* Wait all existing dio workers, newcomers will block on i_rwsem */ + inode_dio_wait(inode); + + ret = file_modified(file); + if (ret) + return ret; + + if ((mode & FALLOC_FL_MODE_MASK) == FALLOC_FL_ALLOCATE_RANGE) { + ret = ext4_do_fallocate(file, offset, len, mode); + goto out_inode_lock; + } + + /* + * Follow-up operations will drop page cache, hold invalidate lock + * to prevent page faults from reinstantiating pages we have + * released from page cache. + */ + filemap_invalidate_lock(mapping); + + ret = ext4_break_layouts(inode); + if (ret) + goto out_invalidate_lock; + if (mode & FALLOC_FL_PUNCH_HOLE) ret = ext4_punch_hole(file, offset, len); else if (mode & FALLOC_FL_COLLAPSE_RANGE) @@ -4775,7 +4772,10 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len) else if (mode & FALLOC_FL_ZERO_RANGE) ret = ext4_zero_range(file, offset, len, mode); else - ret = ext4_do_fallocate(file, offset, len, mode); + ret = -EOPNOTSUPP; + +out_invalidate_lock: + filemap_invalidate_unlock(mapping); out_inode_lock: inode_unlock(inode); return ret; @@ -5297,23 +5297,6 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len) if (end >= inode->i_size) return -EINVAL;
- /* Wait for existing dio to complete */ - inode_dio_wait(inode); - - ret = file_modified(file); - if (ret) - return ret; - - /* - * Prevent page faults from reinstantiating pages we have released from - * page cache. - */ - filemap_invalidate_lock(mapping); - - ret = ext4_break_layouts(inode); - if (ret) - goto out_invalidate_lock; - /* * Write tail of the last page before removed range and data that * will be shifted since they will get removed from the page cache @@ -5327,16 +5310,15 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len) if (!ret) ret = filemap_write_and_wait_range(mapping, end, LLONG_MAX); if (ret) - goto out_invalidate_lock; + return ret;
truncate_pagecache(inode, start);
credits = ext4_writepage_trans_blocks(inode); handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - goto out_invalidate_lock; - } + if (IS_ERR(handle)) + return PTR_ERR(handle); + ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
start_lblk = offset >> inode->i_blkbits; @@ -5375,8 +5357,6 @@ static int ext4_collapse_range(struct file *file, loff_t offset, loff_t len)
out_handle: ext4_journal_stop(handle); -out_invalidate_lock: - filemap_invalidate_unlock(mapping); return ret; }
@@ -5417,23 +5397,6 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len) if (len > inode->i_sb->s_maxbytes - inode->i_size) return -EFBIG;
- /* Wait for existing dio to complete */ - inode_dio_wait(inode); - - ret = file_modified(file); - if (ret) - return ret; - - /* - * Prevent page faults from reinstantiating pages we have released from - * page cache. - */ - filemap_invalidate_lock(mapping); - - ret = ext4_break_layouts(inode); - if (ret) - goto out_invalidate_lock; - /* * Write out all dirty pages. Need to round down to align start offset * to page size boundary for page size > block size. @@ -5441,16 +5404,15 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len) start = round_down(offset, PAGE_SIZE); ret = filemap_write_and_wait_range(mapping, start, LLONG_MAX); if (ret) - goto out_invalidate_lock; + return ret;
truncate_pagecache(inode, start);
credits = ext4_writepage_trans_blocks(inode); handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); - if (IS_ERR(handle)) { - ret = PTR_ERR(handle); - goto out_invalidate_lock; - } + if (IS_ERR(handle)) + return PTR_ERR(handle); + ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_FALLOC_RANGE, handle);
/* Expand file to avoid data loss if there is error while shifting */ @@ -5521,8 +5483,6 @@ static int ext4_insert_range(struct file *file, loff_t offset, loff_t len)
out_handle: ext4_journal_stop(handle); -out_invalidate_lock: - filemap_invalidate_unlock(mapping); return ret; }
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 6f0b1b0bd1af8..ca98f04fcf556 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3992,7 +3992,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) struct inode *inode = file_inode(file); struct super_block *sb = inode->i_sb; ext4_lblk_t start_lblk, end_lblk; - struct address_space *mapping = inode->i_mapping; loff_t max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize; loff_t end = offset + length; handle_t *handle; @@ -4027,31 +4026,15 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) return ret; }
- /* Wait all existing dio workers, newcomers will block on i_rwsem */ - inode_dio_wait(inode); - - ret = file_modified(file); - if (ret) - return ret; - - /* - * Prevent page faults from reinstantiating pages we have released from - * page cache. - */ - filemap_invalidate_lock(mapping); - - ret = ext4_break_layouts(inode); - if (ret) - goto out_invalidate_lock;
ret = ext4_update_disksize_before_punch(inode, offset, length); if (ret) - goto out_invalidate_lock; + return ret;
/* Now release the pages and zero block aligned part of pages*/ ret = ext4_truncate_page_cache_block_range(inode, offset, end); if (ret) - goto out_invalidate_lock; + return ret;
if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) credits = ext4_writepage_trans_blocks(inode); @@ -4061,7 +4044,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) if (IS_ERR(handle)) { ret = PTR_ERR(handle); ext4_std_error(sb, ret); - goto out_invalidate_lock; + return ret; }
ret = ext4_zero_partial_blocks(handle, inode, offset, length); @@ -4106,8 +4089,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length) ext4_handle_sync(handle); out_handle: ext4_journal_stop(handle); -out_invalidate_lock: - filemap_invalidate_unlock(mapping); return ret; }
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit 29ec9bed2395061350249ae356fb300dd82a78e7 ]
For the extents based inodes, the maxbytes should be sb->s_maxbytes instead of sbi->s_bitmap_maxbytes. Additionally, for the calculation of max_end, the -sb->s_blocksize operation is necessary only for indirect-block based inodes. Correct the maxbytes and max_end value to correct the behavior of punch hole.
Fixes: 2da376228a24 ("ext4: limit length to bitmap_maxbytes - blocksize in punch_hole") Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Baokun Li libaokun1@huawei.com Link: https://patch.msgid.link/20250506012009.3896990-2-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/inode.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ca98f04fcf556..fe1d19d920a96 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3992,7 +3992,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	struct inode *inode = file_inode(file);
 	struct super_block *sb = inode->i_sb;
 	ext4_lblk_t start_lblk, end_lblk;
-	loff_t max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize;
+	loff_t max_end = sb->s_maxbytes;
 	loff_t end = offset + length;
 	handle_t *handle;
 	unsigned int credits;
@@ -4001,14 +4001,20 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 	trace_ext4_punch_hole(inode, offset, length, 0);
 	WARN_ON_ONCE(!inode_is_locked(inode));

+	/*
+	 * For indirect-block based inodes, make sure that the hole within
+	 * one block before last range.
+	 */
+	if (!ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))
+		max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize;
+
 	/* No need to punch hole beyond i_size */
 	if (offset >= inode->i_size)
 		return 0;

 	/*
 	 * If the hole extends beyond i_size, set the hole to end after
-	 * the page that contains i_size, and also make sure that the hole
-	 * within one block before last range.
+	 * the page that contains i_size.
 	 */
 	if (end > inode->i_size)
 		end = round_up(inode->i_size, PAGE_SIZE);
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit 129245cfbd6d79c6d603f357f428010ccc0f0ee7 ]
The error out label of file_modified() should be out_inode_lock in ext4_fallocate().
Fixes: 2890e5e0f49e ("ext4: move out common parts into ext4_fallocate()") Reported-by: Baokun Li libaokun1@huawei.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Baokun Li libaokun1@huawei.com Link: https://patch.msgid.link/20250319023557.2785018-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/extents.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 51b9533416e04..2f9c3cd4f26cc 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4745,7 +4745,7 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)

 	ret = file_modified(file);
 	if (ret)
-		return ret;
+		goto out_inode_lock;

 	if ((mode & FALLOC_FL_MODE_MASK) == FALLOC_FL_ALLOCATE_RANGE) {
 		ret = ext4_do_fallocate(file, offset, len, mode);
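The reason a bare "return ret" is wrong here is that the inode lock is already held at that point, so every error path has to go through the unlock label. A small userspace sketch of the pattern follows; struct fake_lock and do_op() are hypothetical, and the fake lock merely asserts balanced use instead of providing real mutual exclusion.

```c
#include <assert.h>

/* Hypothetical stand-in for i_rwsem; only tracks whether it is held. */
struct fake_lock {
	int held;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

static void fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = 0;
}

/* step_err simulates the result of a fallible step such as file_modified(). */
static int do_op(struct fake_lock *l, int step_err)
{
	int ret;

	fake_lock_acquire(l);

	ret = step_err;
	if (ret)
		goto out_unlock; /* a bare "return ret" would leak the lock */

	/* ... the real work would run here with the lock held ... */

out_unlock:
	fake_lock_release(l);
	return ret;
}
```

Calling do_op() with a non-zero step_err returns that error with the lock released, which is the behaviour the one-line fix restores.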
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit b5e58bcd79625423487fa3ecba8e8411b5396327 ]
Punching a hole with a start offset that exceeds max_end is not permitted and will result in a negative length in the truncate_inode_partial_folio() function while truncating the page cache, potentially leading to undesirable consequences.
A simple reproducer:
  truncate -s 9895604649994 /mnt/foo
  xfs_io -c "pwrite 8796093022208 4096" /mnt/foo
  xfs_io -c "fpunch 8796093022213 25769803777" /mnt/foo
  kernel BUG at include/linux/highmem.h:275!
  Oops: invalid opcode: 0000 [#1] SMP PTI
  CPU: 3 UID: 0 PID: 710 Comm: xfs_io Not tainted 6.15.0-rc3
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
  RIP: 0010:zero_user_segments.constprop.0+0xd7/0x110
  RSP: 0018:ffffc90001cf3b38 EFLAGS: 00010287
  RAX: 0000000000000005 RBX: ffffea0001485e40 RCX: 0000000000001000
  RDX: 000000000040b000 RSI: 0000000000000005 RDI: 000000000040b000
  RBP: 000000000040affb R08: ffff888000000000 R09: ffffea0000000000
  R10: 0000000000000003 R11: 00000000fffc7fc5 R12: 0000000000000005
  R13: 000000000040affb R14: ffffea0001485e40 R15: ffff888031cd3000
  FS:  00007f4f63d0b780(0000) GS:ffff8880d337d000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000001ae0b038 CR3: 00000000536aa000 CR4: 00000000000006f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   truncate_inode_partial_folio+0x3dd/0x620
   truncate_inode_pages_range+0x226/0x720
   ? bdev_getblk+0x52/0x3e0
   ? ext4_get_group_desc+0x78/0x150
   ? crc32c_arch+0xfd/0x180
   ? __ext4_get_inode_loc+0x18c/0x840
   ? ext4_inode_csum+0x117/0x160
   ? jbd2_journal_dirty_metadata+0x61/0x390
   ? __ext4_handle_dirty_metadata+0xa0/0x2b0
   ? kmem_cache_free+0x90/0x5a0
   ? jbd2_journal_stop+0x1d5/0x550
   ? __ext4_journal_stop+0x49/0x100
   truncate_pagecache_range+0x50/0x80
   ext4_truncate_page_cache_block_range+0x57/0x3a0
   ext4_punch_hole+0x1fe/0x670
   ext4_fallocate+0x792/0x17d0
   ? __count_memcg_events+0x175/0x2a0
   vfs_fallocate+0x121/0x560
   ksys_fallocate+0x51/0xc0
   __x64_sys_fallocate+0x24/0x40
   x64_sys_call+0x18d2/0x4170
   do_syscall_64+0xa7/0x220
   entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fix this by filtering out cases where the punching start offset exceeds max_end.
Fixes: 982bf37da09d ("ext4: refactor ext4_punch_hole()") Reported-by: Liebes Wang wanghaichi0403@gmail.com Closes: https://lore.kernel.org/linux-ext4/ac3a58f6-e686-488b-a9ee-fc041024e43d@huaw... Tested-by: Liebes Wang wanghaichi0403@gmail.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Baokun Li libaokun1@huawei.com Link: https://patch.msgid.link/20250506012009.3896990-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fe1d19d920a96..eb092133c6b88 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4009,7 +4009,7 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
 		max_end = EXT4_SB(sb)->s_bitmap_maxbytes - sb->s_blocksize;

 	/* No need to punch hole beyond i_size */
-	if (offset >= inode->i_size)
+	if (offset >= inode->i_size || offset >= max_end)
 		return 0;

 	/*
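To see how the length can go negative, the clamping in ext4_punch_hole() can be modelled in userspace. This is a sketch under assumptions: punch_length() and round_up_to() are hypothetical names, and the "clamp end to max_end, then length = end - offset" step is inferred from the commit description rather than taken from a hunk in this series. The check_max_end flag switches the new guard on or off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be positive). */
static int64_t round_up_to(int64_t x, int64_t a)
{
	assert(a > 0);
	return (x + a - 1) / a * a;
}

/*
 * Model of the hole-length computation. Returns 0 when there is
 * nothing to punch. With check_max_end == false this models the code
 * before the fix, which only compared offset against i_size.
 */
static int64_t punch_length(int64_t offset, int64_t length, int64_t i_size,
			    int64_t max_end, int64_t page_size,
			    bool check_max_end)
{
	int64_t end = offset + length;

	if (offset >= i_size)
		return 0;
	if (check_max_end && offset >= max_end)
		return 0;

	if (end > i_size)
		end = round_up_to(i_size, page_size);
	if (end > max_end)
		end = max_end; /* assumed clamp; goes below offset if offset > max_end */

	return end - offset;
}
```

Feeding in the reproducer's values (offset 8796093022213, length 25769803777, i_size 9895604649994) together with an illustrative max_end of 4398046507008 yields a negative length without the guard and 0 with it.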