Hi Greg,
These patches correspond to the last two patches from the 5.10 series [1]. These patches were postponed for 5.10 until they were tested on 5.15. I have tested these on 5.15 (40 runs of the auto group x 4 configs).
Best, Leah
Changes v1-> v2: - Added Acks
[1] https://lore.kernel.org/linux-xfs/20220901054854.2449416-1-amir73il@gmail.co...
Brian Foster (1): xfs: fix xfs_ifree() error handling to not leak perag ref
Dave Chinner (2): xfs: reorder iunlink remove operation in xfs_ifree xfs: validate inode fork size against fork format
fs/xfs/libxfs/xfs_inode_buf.c | 35 ++++++++++++++++++++++++++--------- fs/xfs/xfs_inode.c | 22 ++++++++++++---------- 2 files changed, 38 insertions(+), 19 deletions(-)
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 9a5280b312e2e7898b6397b2ca3cfd03f67d7be1 ]
The O_TMPFILE creation implementation creates a specific order of operations for inode allocation/freeing and unlinked list modification. Currently both are serialised by the AGI, so the order doesn't strictly matter as long as the are both in the same transaction.
However, if we want to move the unlinked list insertions largely out from under the AGI lock, then we have to be concerned about the order in which we do unlinked list modification operations. O_TMPFILE creation tells us this order is inode allocation/free, then unlinked list modification.
Change xfs_ifree() to use this same ordering on unlinked list removal. This way we always guarantee that when we enter the iunlinked list removal code from this path, we already have the AGI locked and we don't have to worry about lock nesting AGI reads inside unlink list locks because it's already locked and attached to the transaction.
We can do this safely as the inode freeing and unlinked list removal are done in the same transaction and hence are atomic operations with respect to log recovery.
Reported-by: Frank Hofmann fhofmann@cloudflare.com Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org --- fs/xfs/xfs_inode.c | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index fb7a97cdf99f..36bcdcf3bb78 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2599,14 +2599,13 @@ xfs_ifree_cluster( }
/* - * This is called to return an inode to the inode free list. - * The inode should already be truncated to 0 length and have - * no pages associated with it. This routine also assumes that - * the inode is already a part of the transaction. + * This is called to return an inode to the inode free list. The inode should + * already be truncated to 0 length and have no pages associated with it. This + * routine also assumes that the inode is already a part of the transaction. * - * The on-disk copy of the inode will have been added to the list - * of unlinked inodes in the AGI. We need to remove the inode from - * that list atomically with respect to freeing it here. + * The on-disk copy of the inode will have been added to the list of unlinked + * inodes in the AGI. We need to remove the inode from that list atomically with + * respect to freeing it here. */ int xfs_ifree( @@ -2628,13 +2627,16 @@ xfs_ifree( pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
/* - * Pull the on-disk inode from the AGI unlinked list. + * Free the inode first so that we guarantee that the AGI lock is going + * to be taken before we remove the inode from the unlinked list. This + * makes the AGI lock -> unlinked list modification order the same as + * used in O_TMPFILE creation. */ - error = xfs_iunlink_remove(tp, pag, ip); + error = xfs_difree(tp, pag, ip->i_ino, &xic); if (error) - goto out; + return error;
- error = xfs_difree(tp, pag, ip->i_ino, &xic); + error = xfs_iunlink_remove(tp, pag, ip); if (error) goto out;
From: Brian Foster bfoster@redhat.com
[ Upstream commit 6f5097e3367a7c0751e165e4c15bc30511a4ba38 ]
For some reason commit 9a5280b312e2e ("xfs: reorder iunlink remove operation in xfs_ifree") replaced a jump to the exit path in the event of an xfs_difree() error with a direct return, which skips releasing the perag reference acquired at the top of the function. Restore the original code to drop the reference on error.
Fixes: 9a5280b312e2e ("xfs: reorder iunlink remove operation in xfs_ifree") Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org --- fs/xfs/xfs_inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 36bcdcf3bb78..b2ea85318214 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2634,7 +2634,7 @@ xfs_ifree( */ error = xfs_difree(tp, pag, ip->i_ino, &xic); if (error) - return error; + goto out;
error = xfs_iunlink_remove(tp, pag, ip); if (error)
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 1eb70f54c445fcbb25817841e774adb3d912f3e8 ]
xfs_repair catches fork size/format mismatches, but the in-kernel verifier doesn't, leading to null pointer failures when attempting to perform operations on the fork. This can occur in the xfs_dir_is_empty() where the in-memory fork format does not match the size and so the fork data pointer is accessed incorrectly.
Note: this causes new failures in xfs/348 which is testing mode vs ftype mismatches. We now detect a regular file that has been changed to a directory or symlink mode as being corrupt because the data fork is for a symlink or directory should be in local form when there are only 3 bytes of data in the data fork. Hence the inode verify for the regular file now fires w/ -EFSCORRUPTED because the inode fork format does not match the format the corrupted mode says it should be in.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: Darrick J. Wong djwong@kernel.org --- fs/xfs/libxfs/xfs_inode_buf.c | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 3932b4ebf903..f84d3fbb9d3d 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -337,19 +337,36 @@ xfs_dinode_verify_fork( int whichfork) { uint32_t di_nextents = XFS_DFORK_NEXTENTS(dip, whichfork); + mode_t mode = be16_to_cpu(dip->di_mode); + uint32_t fork_size = XFS_DFORK_SIZE(dip, mp, whichfork); + uint32_t fork_format = XFS_DFORK_FORMAT(dip, whichfork);
- switch (XFS_DFORK_FORMAT(dip, whichfork)) { + /* + * For fork types that can contain local data, check that the fork + * format matches the size of local data contained within the fork. + * + * For all types, check that when the size says the should be in extent + * or btree format, the inode isn't claiming it is in local format. + */ + if (whichfork == XFS_DATA_FORK) { + if (S_ISDIR(mode) || S_ISLNK(mode)) { + if (be64_to_cpu(dip->di_size) <= fork_size && + fork_format != XFS_DINODE_FMT_LOCAL) + return __this_address; + } + + if (be64_to_cpu(dip->di_size) > fork_size && + fork_format == XFS_DINODE_FMT_LOCAL) + return __this_address; + } + + switch (fork_format) { case XFS_DINODE_FMT_LOCAL: /* - * no local regular files yet + * No local regular files yet. */ - if (whichfork == XFS_DATA_FORK) { - if (S_ISREG(be16_to_cpu(dip->di_mode))) - return __this_address; - if (be64_to_cpu(dip->di_size) > - XFS_DFORK_SIZE(dip, mp, whichfork)) - return __this_address; - } + if (S_ISREG(mode) && whichfork == XFS_DATA_FORK) + return __this_address; if (di_nextents) return __this_address; break;
On Thu, Sep 22, 2022 at 6:15 PM Leah Rumancik leah.rumancik@gmail.com wrote:
Hi Greg,
These patches correspond to the last two patches from the 5.10 series [1]. These patches were postponed for 5.10 until they were tested on 5.15. I have tested these on 5.15 (40 runs of the auto group x 4 configs).
Re-posted the 5.10 patches.
Thanks! Amir.
On Thu, Sep 22, 2022 at 08:14:58AM -0700, Leah Rumancik wrote:
Hi Greg,
These patches correspond to the last two patches from the 5.10 series [1]. These patches were postponed for 5.10 until they were tested on 5.15. I have tested these on 5.15 (40 runs of the auto group x 4 configs).
Now queued up, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org