Hi all,
New code for 6.12.
If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below.
This has been running on the djcloud for months with no problems. Enjoy! Comments and questions are, as always, welcome.
--D
kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=xfs...
xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=...
---
Commits in this patchset:
 * xfs: create incore realtime group structures
 * xfs: define locking primitives for realtime groups
 * xfs: add a lockdep class key for rtgroup inodes
 * xfs: support caching rtgroup metadata inodes
 * xfs: add a xfs_bmap_free_rtblocks helper
 * xfs: move RT bitmap and summary information to the rtgroup
 * xfs: support creating per-RTG files in growfs
 * xfs: refactor xfs_rtbitmap_blockcount
 * xfs: refactor xfs_rtsummary_blockcount
 * xfs: make RT extent numbers relative to the rtgroup
 * libfrog: add memchr_inv
 * xfs: define the format of rt groups
 * xfs: update realtime super every time we update the primary fs super
 * xfs: export realtime group geometry via XFS_FSOP_GEOM
 * xfs: check that rtblock extents do not break rtsupers or rtgroups
 * xfs: add a helper to prevent bmap merges across rtgroup boundaries
 * xfs: add frextents to the lazysbcounters when rtgroups enabled
 * xfs: record rt group metadata errors in the health system
 * xfs: export the geometry of realtime groups to userspace
 * xfs: add block headers to realtime bitmap and summary blocks
 * xfs: encode the rtbitmap in big endian format
 * xfs: encode the rtsummary in big endian format
 * xfs: grow the realtime section when realtime groups are enabled
 * xfs: support logging EFIs for realtime extents
 * xfs: support error injection when freeing rt extents
 * xfs: use realtime EFI to free extents when rtgroups are enabled
 * xfs: don't merge ioends across RTGs
 * xfs: make the RT allocator rtgroup aware
 * xfs: scrub the realtime group superblock
 * xfs: scrub metadir paths for rtgroup metadata
 * xfs: mask off the rtbitmap and summary inodes when metadir in use
 * xfs: create helpers to deal with rounding xfs_fileoff_t to rtx boundaries
 * xfs: create helpers to deal with rounding xfs_filblks_t to rtx boundaries
 * xfs: make xfs_rtblock_t a segmented address like xfs_fsblock_t
 * xfs: adjust min_block usage in xfs_verify_agbno
 * xfs: move the min and max group block numbers to xfs_group
 * xfs: implement busy extent tracking for rtgroups
 * xfs: use metadir for quota inodes
 * xfs: scrub quota file metapaths
 * xfs: enable metadata directory feature
 * xfs: convert struct typedefs in xfs_ondisk.h
 * xfs: separate space btree structures in xfs_ondisk.h
 * xfs: port ondisk structure checks from xfs/122 to the kernel
 * xfs: remove unknown compat feature check in superblock write validation
 * xfs: fix sparse inode limits on runt AG
 * xfs: switch to multigrain timestamps
 * xfs: don't call xfs_bmap_same_rtgroup in xfs_bmap_add_extent_hole_delay
 * xfs: return a 64-bit block count from xfs_btree_count_blocks
 * xfs: fix error bailout in xfs_rtginode_create
 * xfs: update btree keys correctly when _insrec splits an inode root block
 * xfs: fix sb_spino_align checks for large fsblock sizes
 * xfs: return from xfs_symlink_verify early on V4 filesystems
---
 db/block.c                  |    2
 db/block.h                  |   16 -
 db/convert.c                |    1
 db/faddr.c                  |    1
 include/libxfs.h            |    2
 include/platform_defs.h     |   33 ++
 include/xfs_mount.h         |   30 +-
 include/xfs_trace.h         |    7
 include/xfs_trans.h         |    1
 libfrog/util.c              |   14 +
 libfrog/util.h              |    4
 libxfs/Makefile             |    2
 libxfs/init.c               |   35 ++
 libxfs/libxfs_api_defs.h    |   16 +
 libxfs/libxfs_io.h          |    1
 libxfs/libxfs_priv.h        |   34 --
 libxfs/rdwr.c               |   17 +
 libxfs/trans.c              |   29 ++
 libxfs/util.c               |    8
 libxfs/xfs_ag.c             |   22 +
 libxfs/xfs_ag.h             |   16 -
 libxfs/xfs_alloc.c          |   15 +
 libxfs/xfs_alloc.h          |   12 +
 libxfs/xfs_bmap.c           |  124 ++++++--
 libxfs/xfs_btree.c          |   33 ++
 libxfs/xfs_btree.h          |    2
 libxfs/xfs_defer.c          |    6
 libxfs/xfs_defer.h          |    1
 libxfs/xfs_dquot_buf.c      |  190 ++++++++++++
 libxfs/xfs_format.h         |   80 +++++
 libxfs/xfs_fs.h             |   32 ++
 libxfs/xfs_group.h          |   33 ++
 libxfs/xfs_health.h         |   42 ++-
 libxfs/xfs_ialloc.c         |   16 +
 libxfs/xfs_ialloc_btree.c   |    6
 libxfs/xfs_log_format.h     |    6
 libxfs/xfs_ondisk.h         |  186 +++++++++---
 libxfs/xfs_quota_defs.h     |   43 +++
 libxfs/xfs_rtbitmap.c       |  405 +++++++++++++++++--------
 libxfs/xfs_rtbitmap.h       |  247 ++++++++++-----
 libxfs/xfs_rtgroup.c        |  694 +++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtgroup.h        |  284 ++++++++++++++++++
 libxfs/xfs_sb.c             |  246 ++++++++++++++-
 libxfs/xfs_sb.h             |    6
 libxfs/xfs_shared.h         |    4
 libxfs/xfs_symlink_remote.c |    4
 libxfs/xfs_trans_inode.c    |    6
 libxfs/xfs_trans_resv.c     |    2
 libxfs/xfs_types.c          |   35 ++
 libxfs/xfs_types.h          |    8
 mkfs/proto.c                |   33 +-
 mkfs/xfs_mkfs.c             |    8
 repair/dinode.c             |    4
 repair/phase6.c             |  203 ++++++-------
 repair/rt.c                 |   34 --
 repair/rt.h                 |    4
 56 files changed, 2728 insertions(+), 617 deletions(-)
 create mode 100644 libxfs/xfs_rtgroup.c
 create mode 100644 libxfs/xfs_rtgroup.h
From: Long Li <leo.lilong@huawei.com>
Source kernel commit: 652f03db897ba24f9c4b269e254ccc6cc01ff1b7
Compat features are new features that older kernels can safely ignore, allowing read-write mounts without issues. The current sb write validation implementation returns -EFSCORRUPTED for unknown compat features, preventing filesystem write operations and contradicting the feature's definition.
Additionally, if the mounted image is unclean, the log recovery may need to write to the superblock. Returning an error for unknown compat features during sb write validation can cause mount failures.
Although XFS currently does not use compat feature flags, this issue affects current kernels' ability to mount images that may use compat feature flags in the future.
Since superblock read validation already warns about unknown compat features, it's unnecessary to repeat this warning during write validation. Therefore, the relevant code in write validation is being removed.
Fixes: 9e037cb7972f ("xfs: check for unknown v5 feature bits in superblock write verifier")
Cc: stable@vger.kernel.org # v4.19+
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
---
 libxfs/xfs_sb.c |    7 -------
 1 file changed, 7 deletions(-)

diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 375324b99261af..87f740e6c75dce 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -323,13 +323,6 @@ xfs_validate_sb_write(
 	 * the kernel cannot support since we checked for unsupported bits in
 	 * the read verifier, which means that memory is corrupt.
 	 */
-	if (xfs_sb_has_compat_feature(sbp, XFS_SB_FEAT_COMPAT_UNKNOWN)) {
-		xfs_warn(mp,
-"Corruption detected in superblock compatible features (0x%x)!",
-			(sbp->sb_features_compat & XFS_SB_FEAT_COMPAT_UNKNOWN));
-		return -EFSCORRUPTED;
-	}
-
 	if (!xfs_is_readonly(mp) &&
 	    xfs_sb_has_ro_compat_feature(sbp, XFS_SB_FEAT_RO_COMPAT_UNKNOWN)) {
 		xfs_alert(mp,
From: Darrick J. Wong <djwong@kernel.org>
Source kernel commit: bd27c7bcdca25ce8067ebb94ded6ac1bd7b47317
With the nrext64 feature enabled, it's possible for a data fork to have 2^48 extent mappings. Even with a 64k fsblock size, that maps out to a bmbt containing more than 2^32 blocks. Therefore, this predicate must return a u64 count to avoid an integer wraparound that will cause scrub to do the wrong thing.
It's unlikely that any such filesystem currently exists, because the incore bmbt would consume more than 64GB of kernel memory on its own, and so far nobody except me has driven a filesystem that far, judging from the lack of complaints.
Cc: stable@vger.kernel.org # v5.19
Fixes: df9ad5cc7a5240 ("xfs: Introduce macros to represent new maximum extent counts for data/attr forks")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.c        |    4 ++--
 libxfs/xfs_btree.h        |    2 +-
 libxfs/xfs_ialloc_btree.c |    4 +++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 3d870f3f4a5165..5c293ccf623336 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -5142,7 +5142,7 @@ xfs_btree_count_blocks_helper(
 	int			level,
 	void			*data)
 {
-	xfs_extlen_t		*blocks = data;
+	xfs_filblks_t		*blocks = data;
 	(*blocks)++;
 
 	return 0;
@@ -5152,7 +5152,7 @@ xfs_btree_count_blocks_helper(
 int
 xfs_btree_count_blocks(
 	struct xfs_btree_cur	*cur,
-	xfs_extlen_t		*blocks)
+	xfs_filblks_t		*blocks)
 {
 	*blocks = 0;
 	return xfs_btree_visit_blocks(cur, xfs_btree_count_blocks_helper,
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 3b739459ebb0f4..c5bff273cae255 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -484,7 +484,7 @@ typedef int (*xfs_btree_visit_blocks_fn)(struct xfs_btree_cur *cur, int level,
 int xfs_btree_visit_blocks(struct xfs_btree_cur *cur,
 		xfs_btree_visit_blocks_fn fn, unsigned int flags, void *data);
 
-int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_extlen_t *blocks);
+int xfs_btree_count_blocks(struct xfs_btree_cur *cur, xfs_filblks_t *blocks);
 
 union xfs_btree_rec *xfs_btree_rec_addr(struct xfs_btree_cur *cur, int n,
 		struct xfs_btree_block *block);
diff --git a/libxfs/xfs_ialloc_btree.c b/libxfs/xfs_ialloc_btree.c
index 19fca9fad62b1d..4cccac145dc775 100644
--- a/libxfs/xfs_ialloc_btree.c
+++ b/libxfs/xfs_ialloc_btree.c
@@ -743,6 +743,7 @@ xfs_finobt_count_blocks(
 {
 	struct xfs_buf		*agbp = NULL;
 	struct xfs_btree_cur	*cur;
+	xfs_filblks_t		blocks;
 	int			error;
 
 	error = xfs_ialloc_read_agi(pag, tp, 0, &agbp);
@@ -750,9 +751,10 @@ xfs_finobt_count_blocks(
 		return error;
 
 	cur = xfs_finobt_init_cursor(pag, tp, agbp);
-	error = xfs_btree_count_blocks(cur, tree_blocks);
+	error = xfs_btree_count_blocks(cur, &blocks);
 	xfs_btree_del_cursor(cur, error);
 	xfs_trans_brelse(tp, agbp);
+	*tree_blocks = blocks;
 
 	return error;
 }
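
The wraparound argument in this commit message can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch, not code from the patch: the 16-byte record size and 72-byte block header are assumptions about the V5 on-disk layout, and the helper names are made up for illustration. It counts only leaf blocks, so it is a lower bound on the size of the bmbt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes (not stated in the patch): one bmbt record and the
 * long-format btree block header on a V5 filesystem. */
#define BMBT_REC_SIZE    16u
#define LBLOCK_HDR_SIZE  72u

/* Lower bound on the bmbt leaf blocks needed to hold nextents mappings. */
static uint64_t min_bmbt_leaf_blocks(uint64_t nextents, uint32_t blocksize)
{
	uint64_t recs_per_block;

	assert(blocksize >= LBLOCK_HDR_SIZE + BMBT_REC_SIZE);
	recs_per_block = (blocksize - LBLOCK_HDR_SIZE) / BMBT_REC_SIZE;
	/* Round up: a partially filled block still counts as a block. */
	return (nextents + recs_per_block - 1) / recs_per_block;
}

/* What a 32-bit counter would end up holding for the same count. */
static uint32_t as_32bit_count(uint64_t blocks)
{
	return (uint32_t)blocks;  /* wraps modulo 2^32 */
}
```

Calling `min_bmbt_leaf_blocks(1ULL << 48, 65536)` gives a value above `UINT32_MAX`, so passing it through `as_32bit_count()` loses the high bits; that is the failure the switch to a 64-bit count avoids.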
From: Darrick J. Wong <djwong@kernel.org>
Source kernel commit: 23bee6f390a12d0c4c51fefc083704bc5dac377e
smatch reported that we screwed up the error cleanup in this function. Fix it.
Cc: stable@vger.kernel.org # v6.13-rc1
Fixes: ae897e0bed0f54 ("xfs: support creating per-RTG files in growfs")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_rtgroup.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c
index 8189b83d0f184a..aaaec2a1cef9e5 100644
--- a/libxfs/xfs_rtgroup.c
+++ b/libxfs/xfs_rtgroup.c
@@ -493,7 +493,7 @@ xfs_rtginode_create(
 
 	error = xfs_metadir_create(&upd, S_IFREG);
 	if (error)
-		return error;
+		goto out_cancel;
 
 	xfs_rtginode_lockdep_setup(upd.ip, rtg_rgno(rtg), type);
 
From: Darrick J. Wong <djwong@kernel.org>
Source kernel commit: 6d7b4bc1c3e00b1a25b7a05141a64337b4629337
In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec would erroneously try to update the parent's key for a block that had been split if we decided to insert the new record into the new block. The solution was to detect this situation and update the in-core key value that we pass up to the caller so that the caller will (eventually) add the new block to the parent level of the tree with the correct key.
However, I missed a subtlety about the way inode-rooted btrees work. If the full block was a maximally sized inode root block, we'll solve that fullness by moving the root block's records to a new block, resizing the root block, and updating the root to point to the new block. We don't pass a pointer to the new block to the caller because that work has already been done. The new record will /always/ land in the new block, so in this case we need to use xfs_btree_update_keys to update the keys.
This bug can theoretically manifest itself in the very rare case that we split a bmbt root block and the new record lands in the very first slot of the new block, though I've never managed to trigger it in practice. However, it is very easy to reproduce by running generic/522 with the realtime rmapbt patchset if rtinherit=1.
Cc: stable@vger.kernel.org # v4.8
Fixes: 2c813ad66a7218 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_btree.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 5c293ccf623336..f4c4db62e2069e 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -3555,14 +3555,31 @@ xfs_btree_insrec(
 	xfs_btree_log_block(cur, bp, XFS_BB_NUMRECS);
 
 	/*
-	 * If we just inserted into a new tree block, we have to
-	 * recalculate nkey here because nkey is out of date.
+	 * Update btree keys to reflect the newly added record or keyptr.
+	 * There are three cases here to be aware of. Normally, all we have to
+	 * do is walk towards the root, updating keys as necessary.
 	 *
-	 * Otherwise we're just updating an existing block (having shoved
-	 * some records into the new tree block), so use the regular key
-	 * update mechanism.
+	 * If the caller had us target a full block for the insertion, we dealt
+	 * with that by calling the _make_block_unfull function. If the
+	 * "make unfull" function splits the block, it'll hand us back the key
+	 * and pointer of the new block. We haven't yet added the new block to
+	 * the next level up, so if we decide to add the new record to the new
+	 * block (bp->b_bn != old_bn), we have to update the caller's pointer
+	 * so that the caller adds the new block with the correct key.
+	 *
+	 * However, there is a third possibility-- if the selected block is the
+	 * root block of an inode-rooted btree and cannot be expanded further,
+	 * the "make unfull" function moves the root block contents to a new
+	 * block and updates the root block to point to the new block. In this
+	 * case, no block pointer is passed back because the block has already
+	 * been added to the btree. In this case, we need to use the regular
+	 * key update function, just like the first case. This is critical for
+	 * overlapping btrees, because the high key must be updated to reflect
+	 * the entire tree, not just the subtree accessible through the first
+	 * child of the root (which is now two levels down from the root).
 	 */
-	if (bp && xfs_buf_daddr(bp) != old_bn) {
+	if (!xfs_btree_ptr_is_null(cur, &nptr) &&
+	    bp && xfs_buf_daddr(bp) != old_bn) {
 		xfs_btree_get_keys(cur, block, lkey);
 	} else if (xfs_btree_needs_key_update(cur, optr)) {
 		error = xfs_btree_update_keys(cur, level);
From: Darrick J. Wong <djwong@kernel.org>
Source kernel commit: 7f8a44f37229fc76bfcafa341a4b8862368ef44a
For a sparse inodes filesystem, mkfs.xfs computes the values of sb_spino_align and sb_inoalignmt with the following code:
	int	cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;

	if (cfg->sb_feat.crcs_enabled)
		cluster_size *= cfg->inodesize / XFS_DINODE_MIN_SIZE;

	sbp->sb_spino_align = cluster_size >> cfg->blocklog;
	sbp->sb_inoalignmt = XFS_INODES_PER_CHUNK *
			cfg->inodesize >> cfg->blocklog;
On a V5 filesystem with 64k fsblocks and 512 byte inodes, this results in cluster_size = 8192 * (512 / 256) = 16384. As a result, sb_spino_align and sb_inoalignmt are both set to zero. Unfortunately, this trips the new sb_spino_align check that was just added to xfs_validate_sb_common, and the mkfs fails:
# mkfs.xfs -f -b size=64k, /dev/sda
meta-data=/dev/sda               isize=512    agcount=4, agsize=81136 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=1
         =                       exchange=0   metadir=0
data     =                       bsize=65536  blocks=324544, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1, parent=0
log      =internal log           bsize=65536  blocks=5006, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
         =                       rgcount=0    rgsize=0 extents
Discarding blocks...Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: Releasing dirty buffer to free list!
found dirty buffer (bulk) on free list!
Sparse inode alignment (0) is invalid.
Metadata corruption detected at 0x560ac5a80bbe, xfs_sb block 0x0/0x200
libxfs_bwrite: write verifier failed on xfs_sb bno 0x0/0x1
mkfs.xfs: writing AG headers failed, err=22
Prior to commit 59e43f5479cce1 this all worked fine, even if "sparse" inodes are somewhat meaningless when everything fits in a single fsblock. Adjust the checks to handle existing filesystems.
Cc: stable@vger.kernel.org # v6.13-rc1
Fixes: 59e43f5479cce1 ("xfs: sb_spino_align is not verified")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_sb.c |   11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index 87f740e6c75dce..ff23803a8065bf 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -491,12 +491,13 @@ xfs_validate_sb_common(
 				return -EINVAL;
 			}
 
-			if (!sbp->sb_spino_align ||
-			    sbp->sb_spino_align > sbp->sb_inoalignmt ||
-			    (sbp->sb_inoalignmt % sbp->sb_spino_align) != 0) {
+			if (sbp->sb_spino_align &&
+			    (sbp->sb_spino_align > sbp->sb_inoalignmt ||
+			     (sbp->sb_inoalignmt % sbp->sb_spino_align) != 0)) {
 				xfs_warn(mp,
-				"Sparse inode alignment (%u) is invalid.",
-					sbp->sb_spino_align);
+"Sparse inode alignment (%u) is invalid, must be integer factor of (%u).",
+					sbp->sb_spino_align,
+					sbp->sb_inoalignmt);
 				return -EINVAL;
 			}
 		} else if (sbp->sb_spino_align) {
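
The arithmetic from the commit message and the two versions of the check can be reproduced in isolation. This is a simplified sketch rather than the mkfs or libxfs code: the constants mirror the values used in the message (8192 and 256), the value of 64 inodes per chunk is an assumption, and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the commit message's arithmetic; 64 inodes per
 * chunk is an assumption made for this sketch. */
#define INODE_BIG_CLUSTER_SIZE  8192u
#define DINODE_MIN_SIZE         256u
#define INODES_PER_CHUNK        64u

struct ino_align {
	uint32_t spino_align;
	uint32_t inoalignmt;
};

/* Mirrors the mkfs computation quoted in the commit message. */
static struct ino_align compute_align(uint32_t inodesize, uint32_t blocklog,
				      bool crcs_enabled)
{
	uint32_t cluster_size = INODE_BIG_CLUSTER_SIZE;
	struct ino_align a;

	assert(blocklog < 32);
	if (crcs_enabled)
		cluster_size *= inodesize / DINODE_MIN_SIZE;
	a.spino_align = cluster_size >> blocklog;
	a.inoalignmt = (INODES_PER_CHUNK * inodesize) >> blocklog;
	return a;
}

/* Check before the patch: a zero sparse alignment is always rejected. */
static bool align_ok_old(struct ino_align a)
{
	return !(!a.spino_align ||
		 a.spino_align > a.inoalignmt ||
		 (a.inoalignmt % a.spino_align) != 0);
}

/* Check after the patch: zero is tolerated, nonzero must divide inoalignmt. */
static bool align_ok_new(struct ino_align a)
{
	return !(a.spino_align &&
		 (a.spino_align > a.inoalignmt ||
		  (a.inoalignmt % a.spino_align) != 0));
}
```

With 512-byte inodes and a block log of 16 (64k blocks), both fields shift down to zero, so `align_ok_old()` fails and `align_ok_new()` passes. With a block log of 12 (4k blocks) the values are 4 and 8, and both versions accept them.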
From: Darrick J. Wong <djwong@kernel.org>
Source kernel commit: 7f8b718c58783f3ff0810b39e2f62f50ba2549f6
V4 symlink blocks didn't have headers, so return early if this is a V4 filesystem.
Cc: stable@vger.kernel.org # v5.1
Fixes: 39708c20ab5133 ("xfs: miscellaneous verifier magic value fixups")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 libxfs/xfs_symlink_remote.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_symlink_remote.c b/libxfs/xfs_symlink_remote.c
index 2ad586f3926ad2..1c355f751e1cc7 100644
--- a/libxfs/xfs_symlink_remote.c
+++ b/libxfs/xfs_symlink_remote.c
@@ -89,8 +89,10 @@ xfs_symlink_verify(
 	struct xfs_mount	*mp = bp->b_mount;
 	struct xfs_dsymlink_hdr	*dsl = bp->b_addr;
 
+	/* no verification of non-crc buffers */
 	if (!xfs_has_crc(mp))
-		return __this_address;
+		return NULL;
+
 	if (!xfs_verify_magic(bp, dsl->sl_magic))
 		return __this_address;
 	if (!uuid_equal(&dsl->sl_uuid, &mp->m_sb.sb_meta_uuid))