6.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
[ Upstream commit 34ecde3c56066ba79e5ec3d93c5b14ea83e3603e ]
DONTCACHE I/O must have the completion punted to a workqueue, just like what is done for unwritten extents, as the completion needs task context to perform the invalidation of the folio(s). However, if writeback is started off filemap_fdatawrite_range() off generic_sync() and it's an overwrite, then the DONTCACHE marking gets lost as iomap_add_to_ioend() don't look at the folio being added and no further state is passed down to help it know that this is a dropbehind/DONTCACHE write.
Check if the folio being added is marked as dropbehind, and set IOMAP_IOEND_DONTCACHE if that is the case. Then XFS can factor this into the decision making of completion context in xfs_submit_ioend(). Additionally include this ioend flag in the NOMERGE flags, to avoid mixing it with unrelated IO.
Since this is the 3rd flag that will cause XFS to punt the completion to a workqueue, add a helper so that each one of them can get appropriately commented.
This fixes extra page cache being instantiated when the write performed is an overwrite, rather than newly instantiated blocks.
Fixes: b2cd5ae693a3 ("iomap: make buffered writes work with RWF_DONTCACHE") Signed-off-by: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/5153f6e8-274d-4546-bf55-30a5018e0d03@kernel.dk Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/iomap/buffered-io.c | 2 ++ fs/xfs/xfs_aops.c | 22 ++++++++++++++++++++-- include/linux/iomap.h | 5 ++++- 3 files changed, 26 insertions(+), 3 deletions(-)
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 5b08bd417b287..0ac474888a02e 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1675,6 +1675,8 @@ static int iomap_add_to_ioend(struct iomap_writepage_ctx *wpc, ioend_flags |= IOMAP_IOEND_UNWRITTEN; if (wpc->iomap.flags & IOMAP_F_SHARED) ioend_flags |= IOMAP_IOEND_SHARED; + if (folio_test_dropbehind(folio)) + ioend_flags |= IOMAP_IOEND_DONTCACHE; if (pos == wpc->iomap.offset && (wpc->iomap.flags & IOMAP_F_BOUNDARY)) ioend_flags |= IOMAP_IOEND_BOUNDARY;
diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index 26a04a7834896..63151feb9c3fd 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -436,6 +436,25 @@ xfs_map_blocks( return 0; }
+static bool +xfs_ioend_needs_wq_completion( + struct iomap_ioend *ioend) +{ + /* Changing inode size requires a transaction. */ + if (xfs_ioend_is_append(ioend)) + return true; + + /* Extent manipulation requires a transaction. */ + if (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED)) + return true; + + /* Page cache invalidation cannot be done in irq context. */ + if (ioend->io_flags & IOMAP_IOEND_DONTCACHE) + return true; + + return false; +} + static int xfs_submit_ioend( struct iomap_writepage_ctx *wpc, @@ -460,8 +479,7 @@ xfs_submit_ioend( memalloc_nofs_restore(nofs_flag);
/* send ioends that might require a transaction to the completion wq */ - if (xfs_ioend_is_append(ioend) || - (ioend->io_flags & (IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_SHARED))) + if (xfs_ioend_needs_wq_completion(ioend)) ioend->io_bio.bi_end_io = xfs_end_bio;
if (status) diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 68416b135151d..522644d62f30f 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -377,13 +377,16 @@ sector_t iomap_bmap(struct address_space *mapping, sector_t bno, #define IOMAP_IOEND_BOUNDARY (1U << 2) /* is direct I/O */ #define IOMAP_IOEND_DIRECT (1U << 3) +/* is DONTCACHE I/O */ +#define IOMAP_IOEND_DONTCACHE (1U << 4)
/* * Flags that if set on either ioend prevent the merge of two ioends. * (IOMAP_IOEND_BOUNDARY also prevents merges, but only one-way) */ #define IOMAP_IOEND_NOMERGE_FLAGS \ - (IOMAP_IOEND_SHARED | IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_DIRECT) + (IOMAP_IOEND_SHARED | IOMAP_IOEND_UNWRITTEN | IOMAP_IOEND_DIRECT | \ + IOMAP_IOEND_DONTCACHE)
/* * Structure for writeback I/O completions.