Hello again,
This is a series for 6.1.y for fixes from 6.11. It corresponds to the 6.6.y series here: https://lore.kernel.org/linux-xfs/20241218191725.63098-1-catherine.hoang@ora...
During porting, I noticed 6.1.y was missing a fix series from 6.5 that is a dependency of the fixes from 6.11 so I included those first.
These were tested via the auto group on 9 configs with no regressions seen. These were also already ack'd on the xfs-stable mailing list.
series from 6.5: https://lore.kernel.org/linux-xfs/168506055189.3727958.722711918040129046.st... 63ef7a35912d xfs: fix interval filtering in multi-step fsmap queries 7975aba19cba xfs: fix integer overflows in the fsmap rtbitmap and logdev backends d898137d789c xfs: fix getfsmap reporting past the last rt extent f045dd00328d xfs: clean up the rtbitmap fsmap backend a949a1c2a198 xfs: fix logdev fsmap query result filtering 3ee9351e7490 xfs: validate fsmap offsets specified in the query keys 75dc03453122 xfs: fix xfs_btree_query_range callers to initialize btree rec fully
fix of 63ef7a35912dd ("xfs: fix interval filtering in multi-step fsmap queries") https://lore.kernel.org/linux-xfs/169335025661.3518128.12423331693506002020.... cfa2df68b7ce xfs: fix an agbno overflow in __xfs_getfsmap_datadev
6.6 series for 6.11: https://lore.kernel.org/linux-xfs/20241218191725.63098-1-catherine.hoang@ora... 85d0947db262 xfs: fix the contact address for the sysfs ABI documentation c08d03996cea xfs: verify buffer, inode, and dquot items every tx commit ff627196ddc1 xfs: use consistent uid/gid when grabbing dquots for inodes 7531c9ab2e55 xfs: declare xfs_file.c symbols in xfs_file.h c070b8802159 xfs: create a new helper to return a file's allocation unit 2e63ed9b0175 xfs: Fix xfs_flush_unmap_range() range for RT fe962ab3c4f1 xfs: Fix xfs_prepare_shift() range for RT ca96d83c9307 xfs: don't walk off the end of a directory data block 27336a327b40 xfs: remove unused parameter in macro XFS_DQUOT_LOGRES b2dcbd8a928c xfs: attr forks require attr, not attr2 4a82db7a4b73 xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set 9fadc53d793c xfs: Fix the owner setting issue for rmap query in xfs fsmap 35bd108619c2 xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code 29fcb5fef608 xfs: take m_growlock when running growfsrt e5d1ae2d4d0b xfs: reset rootdir extent size hint after growfsrt [skipped for 6.1 as scrub is not supported in 6.1:] cb95cb2450e3 xfs: convert comma to semicolon 1bee32f33c0a xfs: fix file_path handling in tracepoints
- Leah
Christoph Hellwig (1): xfs: fix the contact address for the sysfs ABI documentation
Darrick J. Wong (17): xfs: fix interval filtering in multi-step fsmap queries xfs: fix integer overflows in the fsmap rtbitmap and logdev backends xfs: fix getfsmap reporting past the last rt extent xfs: clean up the rtbitmap fsmap backend xfs: fix logdev fsmap query result filtering xfs: validate fsmap offsets specified in the query keys xfs: fix xfs_btree_query_range callers to initialize btree rec fully xfs: fix an agbno overflow in __xfs_getfsmap_datadev xfs: verify buffer, inode, and dquot items every tx commit xfs: use consistent uid/gid when grabbing dquots for inodes xfs: declare xfs_file.c symbols in xfs_file.h xfs: create a new helper to return a file's allocation unit xfs: attr forks require attr, not attr2 xfs: conditionally allow FS_XFLAG_REALTIME changes if S_DAX is set xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code xfs: take m_growlock when running growfsrt xfs: reset rootdir extent size hint after growfsrt
John Garry (2): xfs: Fix xfs_flush_unmap_range() range for RT xfs: Fix xfs_prepare_shift() range for RT
Julian Sun (1): xfs: remove unused parameter in macro XFS_DQUOT_LOGRES
Zizhi Wo (1): xfs: Fix the owner setting issue for rmap query in xfs fsmap
lei lu (1): xfs: don't walk off the end of a directory data block
Documentation/ABI/testing/sysfs-fs-xfs | 8 +- fs/xfs/Kconfig | 12 ++ fs/xfs/libxfs/xfs_alloc.c | 10 +- fs/xfs/libxfs/xfs_dir2_data.c | 31 ++- fs/xfs/libxfs/xfs_dir2_priv.h | 7 + fs/xfs/libxfs/xfs_quota_defs.h | 2 +- fs/xfs/libxfs/xfs_refcount.c | 13 +- fs/xfs/libxfs/xfs_rmap.c | 10 +- fs/xfs/libxfs/xfs_trans_resv.c | 28 +-- fs/xfs/scrub/bmap.c | 8 +- fs/xfs/xfs.h | 4 + fs/xfs/xfs_bmap_util.c | 22 +- fs/xfs/xfs_buf_item.c | 32 +++ fs/xfs/xfs_dquot_item.c | 31 +++ fs/xfs/xfs_file.c | 33 ++- fs/xfs/xfs_file.h | 15 ++ fs/xfs/xfs_fsmap.c | 266 ++++++++++++++----------- fs/xfs/xfs_inode.c | 29 ++- fs/xfs/xfs_inode.h | 2 + fs/xfs/xfs_inode_item.c | 32 +++ fs/xfs/xfs_ioctl.c | 12 ++ fs/xfs/xfs_iops.c | 1 + fs/xfs/xfs_iops.h | 3 - fs/xfs/xfs_rtalloc.c | 78 ++++++-- fs/xfs/xfs_symlink.c | 8 +- fs/xfs/xfs_trace.h | 25 +++ 26 files changed, 505 insertions(+), 217 deletions(-) create mode 100644 fs/xfs/xfs_file.h
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 63ef7a35912dd743cabd65d5bb95891625c0dd46 ]
I noticed a bug in ranged GETFSMAP queries:
# xfs_io -c 'fsmap -vvvv' /opt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 8:80 [0..7]: static fs metadata 0 (0..7) 8 <snip> 9: 8:80 [192..223]: 137 0..31 0 (192..223) 32 # xfs_io -c 'fsmap -vvvv -d 208 208' /opt #
That's not right -- we asked what block maps block 208, and we should've received a mapping for inode 137 offset 16. Instead, we get nothing.
The root cause of this problem is a mis-interaction between the fsmap code and how btree ranged queries work. xfs_btree_query_range returns any btree record that overlaps with the query interval, even if the record starts before or ends after the interval. Similarly, GETFSMAP is supposed to return a recordset containing all records that overlap the range queried.
However, it's possible that the recordset is larger than the buffer that the caller provided to convey mappings to userspace. In /that/ case, userspace is supposed to copy the last record returned to fmh_keys[0] and call GETFSMAP again. In this case, we do not want to return mappings that we have already supplied to the caller. The call to xfs_btree_query_range is the same, but now we ignore any records that start before fmh_keys[0].
Unfortunately, we didn't implement the filtering predicate correctly. The predicate should only be called when we're calling back for more records. Accomplish this by setting info->low.rm_blockcount to a nonzero value and ensuring that it is cleared as necessary. As a result, we no longer want to adjust dkeys[0] in the main setup function because that's confusing.
This patch doesn't touch the logdev/rtbitmap backends because they have bigger problems that will be addressed by subsequent patches.
Found via xfs/556 with parent pointers enabled.
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 67 +++++++++++++++++++++++++++++++++------------- 1 file changed, 48 insertions(+), 19 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index a5b9754c62d1..2011f1bf7ce0 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -160,11 +160,18 @@ struct xfs_getfsmap_info { struct xfs_buf *agf_bp; /* AGF, for refcount queries */ struct xfs_perag *pag; /* AG info, if applicable */ xfs_daddr_t next_daddr; /* next daddr we expect */ u64 missing_owner; /* owner of holes */ u32 dev; /* device id */ - struct xfs_rmap_irec low; /* low rmap key */ + /* + * Low rmap key for the query. If low.rm_blockcount is nonzero, this + * is the second (or later) call to retrieve the recordset in pieces. + * xfs_getfsmap_rec_before_start will compare all records retrieved + * by the rmapbt query to filter out any records that start before + * the last record. + */ + struct xfs_rmap_irec low; struct xfs_rmap_irec high; /* high rmap key */ bool last; /* last extent? */ };
/* Associate a device with a getfsmap handler. */ @@ -235,34 +242,45 @@ xfs_getfsmap_format(
rec = &info->fsmap_recs[info->head->fmh_entries++]; xfs_fsmap_from_internal(rec, xfm); }
+static inline bool +xfs_getfsmap_rec_before_start( + struct xfs_getfsmap_info *info, + const struct xfs_rmap_irec *rec, + xfs_daddr_t rec_daddr) +{ + if (info->low.rm_blockcount) + return xfs_rmap_compare(rec, &info->low) < 0; + return false; +} + /* * Format a reverse mapping for getfsmap, having translated rm_startblock * into the appropriate daddr units. */ STATIC int xfs_getfsmap_helper( struct xfs_trans *tp, struct xfs_getfsmap_info *info, const struct xfs_rmap_irec *rec, xfs_daddr_t rec_daddr) { struct xfs_fsmap fmr; struct xfs_mount *mp = tp->t_mountp; bool shared; int error;
if (fatal_signal_pending(current)) return -EINTR;
/* * Filter out records that start before our startpoint, if the * caller requested that. */ - if (xfs_rmap_compare(rec, &info->low) < 0) { + if (xfs_getfsmap_rec_before_start(info, rec, rec_daddr)) { rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount); if (info->next_daddr < rec_daddr) info->next_daddr = rec_daddr; return 0; } @@ -604,13 +622,31 @@ __xfs_getfsmap_datadev( info->low.rm_startblock = XFS_FSB_TO_AGBNO(mp, start_fsb); info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); if (error) return error; - info->low.rm_blockcount = 0; + info->low.rm_blockcount = XFS_BB_TO_FSBT(mp, keys[0].fmr_length); xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
+ /* Adjust the low key if we are continuing from where we left off. */ + if (info->low.rm_blockcount == 0) { + /* empty */ + } else if (XFS_RMAP_NON_INODE_OWNER(info->low.rm_owner) || + (info->low.rm_flags & (XFS_RMAP_ATTR_FORK | + XFS_RMAP_BMBT_BLOCK | + XFS_RMAP_UNWRITTEN))) { + info->low.rm_startblock += info->low.rm_blockcount; + info->low.rm_owner = 0; + info->low.rm_offset = 0; + + start_fsb += info->low.rm_blockcount; + if (XFS_FSB_TO_DADDR(mp, start_fsb) >= eofs) + return 0; + } else { + info->low.rm_offset += info->low.rm_blockcount; + } + info->high.rm_startblock = -1U; info->high.rm_owner = ULLONG_MAX; info->high.rm_offset = ULLONG_MAX; info->high.rm_blockcount = 0; info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS; @@ -657,16 +693,12 @@ __xfs_getfsmap_datadev(
/* * Set the AG low key to the start of the AG prior to * moving on to the next AG. */ - if (pag->pag_agno == start_ag) { - info->low.rm_startblock = 0; - info->low.rm_owner = 0; - info->low.rm_offset = 0; - info->low.rm_flags = 0; - } + if (pag->pag_agno == start_ag) + memset(&info->low, 0, sizeof(info->low));
/* * If this is the last AG, report any gap at the end of it * before we drop the reference to the perag when the loop * terminates. @@ -899,25 +931,21 @@ xfs_getfsmap( * * If the low key mapping refers to file data, the same physical * blocks could be mapped to several other files/offsets. * According to rmapbt record ordering, the minimal next * possible record for the block range is the next starting - * offset in the same inode. Therefore, bump the file offset to - * continue the search appropriately. For all other low key - * mapping types (attr blocks, metadata), bump the physical - * offset as there can be no other mapping for the same physical - * block range. + * offset in the same inode. Therefore, each fsmap backend bumps + * the file offset to continue the search appropriately. For + * all other low key mapping types (attr blocks, metadata), each + * fsmap backend bumps the physical offset as there can be no + * other mapping for the same physical block range. */ dkeys[0] = head->fmh_keys[0]; if (dkeys[0].fmr_flags & (FMR_OF_SPECIAL_OWNER | FMR_OF_EXTENT_MAP)) { - dkeys[0].fmr_physical += dkeys[0].fmr_length; - dkeys[0].fmr_owner = 0; if (dkeys[0].fmr_offset) return -EINVAL; - } else - dkeys[0].fmr_offset += dkeys[0].fmr_length; - dkeys[0].fmr_length = 0; + } memset(&dkeys[1], 0xFF, sizeof(struct xfs_fsmap));
if (!xfs_getfsmap_check_keys(dkeys, &head->fmh_keys[1])) return -EINVAL;
@@ -958,10 +986,11 @@ xfs_getfsmap( break;
info.dev = handlers[i].dev; info.last = false; info.pag = NULL; + info.low.rm_blockcount = 0; error = handlers[i].fn(tp, dkeys, &info); if (error) break; xfs_trans_cancel(tp); tp = NULL;
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 7975aba19cba4eba7ff60410f9294c90edc96dcf ]
It's not correct to use the rmap irec structure to hold query key information to query the rtbitmap because the realtime volume can be longer than 2^32 fsblocks in length. Because the rt volume doesn't have allocation groups, introduce a daddr-based record filtering algorithm and compute the rtextent values using 64-bit variables. The same problem exists in the external log device fsmap implementation, so use the same solution to fix it too.
After this patch, all the code that touches info->low and info->high under xfs_getfsmap_logdev and __xfs_getfsmap_rtdev are unnecessary. Cleaning this up will be done in subsequent patches.
Fixes: 4c934c7dd60c ("xfs: report realtime space information via the rtbitmap") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 90 ++++++++++++++++++++++++++++++++-------------- 1 file changed, 64 insertions(+), 26 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 2011f1bf7ce0..5039d330ef98 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -158,10 +158,12 @@ struct xfs_getfsmap_info { struct xfs_fsmap_head *head; struct fsmap *fsmap_recs; /* mapping records */ struct xfs_buf *agf_bp; /* AGF, for refcount queries */ struct xfs_perag *pag; /* AG info, if applicable */ xfs_daddr_t next_daddr; /* next daddr we expect */ + /* daddr of low fsmap key when we're using the rtbitmap */ + xfs_daddr_t low_daddr; u64 missing_owner; /* owner of holes */ u32 dev; /* device id */ /* * Low rmap key for the query. If low.rm_blockcount is nonzero, this * is the second (or later) call to retrieve the recordset in pieces. @@ -248,59 +250,66 @@ static inline bool xfs_getfsmap_rec_before_start( struct xfs_getfsmap_info *info, const struct xfs_rmap_irec *rec, xfs_daddr_t rec_daddr) { + if (info->low_daddr != -1ULL) + return rec_daddr < info->low_daddr; if (info->low.rm_blockcount) return xfs_rmap_compare(rec, &info->low) < 0; return false; }
/* * Format a reverse mapping for getfsmap, having translated rm_startblock - * into the appropriate daddr units. + * into the appropriate daddr units. Pass in a nonzero @len_daddr if the + * length could be larger than rm_blockcount in struct xfs_rmap_irec. */ STATIC int xfs_getfsmap_helper( struct xfs_trans *tp, struct xfs_getfsmap_info *info, const struct xfs_rmap_irec *rec, - xfs_daddr_t rec_daddr) + xfs_daddr_t rec_daddr, + xfs_daddr_t len_daddr) { struct xfs_fsmap fmr; struct xfs_mount *mp = tp->t_mountp; bool shared; int error;
if (fatal_signal_pending(current)) return -EINTR;
+ if (len_daddr == 0) + len_daddr = XFS_FSB_TO_BB(mp, rec->rm_blockcount); + /* * Filter out records that start before our startpoint, if the * caller requested that. */ if (xfs_getfsmap_rec_before_start(info, rec, rec_daddr)) { - rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount); + rec_daddr += len_daddr; if (info->next_daddr < rec_daddr) info->next_daddr = rec_daddr; return 0; }
/* Are we just counting mappings? */ if (info->head->fmh_count == 0) { if (info->head->fmh_entries == UINT_MAX) return -ECANCELED;
if (rec_daddr > info->next_daddr) info->head->fmh_entries++;
if (info->last) return 0;
info->head->fmh_entries++;
- rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount); + rec_daddr += len_daddr; if (info->next_daddr < rec_daddr) info->next_daddr = rec_daddr; return 0; }
@@ -336,73 +345,73 @@ xfs_getfsmap_helper( fmr.fmr_physical = rec_daddr; error = xfs_fsmap_owner_from_rmap(&fmr, rec); if (error) return error; fmr.fmr_offset = XFS_FSB_TO_BB(mp, rec->rm_offset); - fmr.fmr_length = XFS_FSB_TO_BB(mp, rec->rm_blockcount); + fmr.fmr_length = len_daddr; if (rec->rm_flags & XFS_RMAP_UNWRITTEN) fmr.fmr_flags |= FMR_OF_PREALLOC; if (rec->rm_flags & XFS_RMAP_ATTR_FORK) fmr.fmr_flags |= FMR_OF_ATTR_FORK; if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) fmr.fmr_flags |= FMR_OF_EXTENT_MAP; if (fmr.fmr_flags == 0) { error = xfs_getfsmap_is_shared(tp, info, rec, &shared); if (error) return error; if (shared) fmr.fmr_flags |= FMR_OF_SHARED; }
xfs_getfsmap_format(mp, &fmr, info); out: - rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount); + rec_daddr += len_daddr; if (info->next_daddr < rec_daddr) info->next_daddr = rec_daddr; return 0; }
/* Transform a rmapbt irec into a fsmap */ STATIC int xfs_getfsmap_datadev_helper( struct xfs_btree_cur *cur, const struct xfs_rmap_irec *rec, void *priv) { struct xfs_mount *mp = cur->bc_mp; struct xfs_getfsmap_info *info = priv; xfs_fsblock_t fsb; xfs_daddr_t rec_daddr;
fsb = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno, rec->rm_startblock); rec_daddr = XFS_FSB_TO_DADDR(mp, fsb);
- return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr); + return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr, 0); }
/* Transform a bnobt irec into a fsmap */ STATIC int xfs_getfsmap_datadev_bnobt_helper( struct xfs_btree_cur *cur, const struct xfs_alloc_rec_incore *rec, void *priv) { struct xfs_mount *mp = cur->bc_mp; struct xfs_getfsmap_info *info = priv; struct xfs_rmap_irec irec; xfs_daddr_t rec_daddr;
rec_daddr = XFS_AGB_TO_DADDR(mp, cur->bc_ag.pag->pag_agno, rec->ar_startblock);
irec.rm_startblock = rec->ar_startblock; irec.rm_blockcount = rec->ar_blockcount; irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */ irec.rm_offset = 0; irec.rm_flags = 0;
- return xfs_getfsmap_helper(cur->bc_tp, info, &irec, rec_daddr); + return xfs_getfsmap_helper(cur->bc_tp, info, &irec, rec_daddr, 0); }
/* Set rmap flags based on the getfsmap flags */ static void xfs_getfsmap_set_irec_flags( @@ -425,133 +434,161 @@ xfs_getfsmap_logdev( const struct xfs_fsmap *keys, struct xfs_getfsmap_info *info) { struct xfs_mount *mp = tp->t_mountp; struct xfs_rmap_irec rmap; + xfs_daddr_t rec_daddr, len_daddr; + xfs_fsblock_t start_fsb; int error;
/* Set up search keys */ + start_fsb = XFS_BB_TO_FSBT(mp, + keys[0].fmr_physical + keys[0].fmr_length); info->low.rm_startblock = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); error = xfs_fsmap_owner_to_rmap(&info->low, keys); if (error) return error; info->low.rm_blockcount = 0; xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
+ /* Adjust the low key if we are continuing from where we left off. */ + if (keys[0].fmr_length > 0) + info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb); + error = xfs_fsmap_owner_to_rmap(&info->high, keys + 1); if (error) return error; info->high.rm_startblock = -1U; info->high.rm_owner = ULLONG_MAX; info->high.rm_offset = ULLONG_MAX; info->high.rm_blockcount = 0; info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS; info->missing_owner = XFS_FMR_OWN_FREE;
trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high);
- if (keys[0].fmr_physical > 0) + if (start_fsb > 0) return 0;
/* Fabricate an rmap entry for the external log device. */ rmap.rm_startblock = 0; rmap.rm_blockcount = mp->m_sb.sb_logblocks; rmap.rm_owner = XFS_RMAP_OWN_LOG; rmap.rm_offset = 0; rmap.rm_flags = 0;
- return xfs_getfsmap_helper(tp, info, &rmap, 0); + rec_daddr = XFS_FSB_TO_BB(mp, rmap.rm_startblock); + len_daddr = XFS_FSB_TO_BB(mp, rmap.rm_blockcount); + return xfs_getfsmap_helper(tp, info, &rmap, rec_daddr, len_daddr); }
#ifdef CONFIG_XFS_RT /* Transform a rtbitmap "record" into a fsmap */ STATIC int xfs_getfsmap_rtdev_rtbitmap_helper( struct xfs_mount *mp, struct xfs_trans *tp, const struct xfs_rtalloc_rec *rec, void *priv) { struct xfs_getfsmap_info *info = priv; struct xfs_rmap_irec irec; - xfs_daddr_t rec_daddr; + xfs_rtblock_t rtbno; + xfs_daddr_t rec_daddr, len_daddr; + + rtbno = rec->ar_startext * mp->m_sb.sb_rextsize; + rec_daddr = XFS_FSB_TO_BB(mp, rtbno); + irec.rm_startblock = rtbno; + + rtbno = rec->ar_extcount * mp->m_sb.sb_rextsize; + len_daddr = XFS_FSB_TO_BB(mp, rtbno); + irec.rm_blockcount = rtbno;
- irec.rm_startblock = rec->ar_startext * mp->m_sb.sb_rextsize; - rec_daddr = XFS_FSB_TO_BB(mp, irec.rm_startblock); - irec.rm_blockcount = rec->ar_extcount * mp->m_sb.sb_rextsize; irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */ irec.rm_offset = 0; irec.rm_flags = 0;
- return xfs_getfsmap_helper(tp, info, &irec, rec_daddr); + return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr); }
/* Execute a getfsmap query against the realtime device. */ STATIC int __xfs_getfsmap_rtdev( struct xfs_trans *tp, const struct xfs_fsmap *keys, int (*query_fn)(struct xfs_trans *, - struct xfs_getfsmap_info *), + struct xfs_getfsmap_info *, + xfs_rtblock_t start_rtb, + xfs_rtblock_t end_rtb), struct xfs_getfsmap_info *info) { struct xfs_mount *mp = tp->t_mountp; - xfs_fsblock_t start_fsb; - xfs_fsblock_t end_fsb; + xfs_rtblock_t start_rtb; + xfs_rtblock_t end_rtb; uint64_t eofs; int error = 0;
eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); if (keys[0].fmr_physical >= eofs) return 0; - start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); - end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + start_rtb = XFS_BB_TO_FSBT(mp, + keys[0].fmr_physical + keys[0].fmr_length); + end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
/* Set up search keys */ - info->low.rm_startblock = start_fsb; + info->low.rm_startblock = start_rtb; error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); if (error) return error; info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); info->low.rm_blockcount = 0; xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
- info->high.rm_startblock = end_fsb; + /* Adjust the low key if we are continuing from where we left off. */ + if (keys[0].fmr_length > 0) { + info->low_daddr = XFS_FSB_TO_BB(mp, start_rtb); + if (info->low_daddr >= eofs) + return 0; + } + + info->high.rm_startblock = end_rtb; error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); if (error) return error; info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset); info->high.rm_blockcount = 0; xfs_getfsmap_set_irec_flags(&info->high, &keys[1]);
trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high);
- return query_fn(tp, info); + return query_fn(tp, info, start_rtb, end_rtb); }
/* Actually query the realtime bitmap. */ STATIC int xfs_getfsmap_rtdev_rtbitmap_query( struct xfs_trans *tp, - struct xfs_getfsmap_info *info) + struct xfs_getfsmap_info *info, + xfs_rtblock_t start_rtb, + xfs_rtblock_t end_rtb) { struct xfs_rtalloc_rec alow = { 0 }; struct xfs_rtalloc_rec ahigh = { 0 }; struct xfs_mount *mp = tp->t_mountp; int error;
xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED);
/* * Set up query parameters to return free rtextents covering the range * we want. */ - alow.ar_startext = info->low.rm_startblock; - ahigh.ar_startext = info->high.rm_startblock; + alow.ar_startext = start_rtb; + ahigh.ar_startext = end_rtb; do_div(alow.ar_startext, mp->m_sb.sb_rextsize); if (do_div(ahigh.ar_startext, mp->m_sb.sb_rextsize)) ahigh.ar_startext++; error = xfs_rtalloc_query_range(mp, tp, &alow, &ahigh, xfs_getfsmap_rtdev_rtbitmap_helper, info); @@ -986,10 +1023,11 @@ xfs_getfsmap( break;
info.dev = handlers[i].dev; info.last = false; info.pag = NULL; + info.low_daddr = -1ULL; info.low.rm_blockcount = 0; error = handlers[i].fn(tp, dkeys, &info); if (error) break; xfs_trans_cancel(tp);
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit d898137d789cac9ebe5eed9957e4cf25122ca524 ]
The realtime section ends at the last rt extent. If the user configures the rt geometry with an extent size that is not an integer factor of the number of rt blocks, it's possible for there to be rt blocks past the end of the last rt extent. These tail blocks cannot ever be allocated and will cause corruption reports if the last extent coincides with the end of an rt bitmap block, so do not report consider them for the GETFSMAP output.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 5039d330ef98..7b72992c14d9 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -527,11 +527,11 @@ __xfs_getfsmap_rtdev( xfs_rtblock_t start_rtb; xfs_rtblock_t end_rtb; uint64_t eofs; int error = 0;
- eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); + eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rextents * mp->m_sb.sb_rextsize); if (keys[0].fmr_physical >= eofs) return 0; start_rtb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical + keys[0].fmr_length); end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit f045dd00328d78f25d64913285f4547f772d13e2 ]
The rtbitmap fsmap backend doesn't query the rmapbt, so it's wasteful to spend time initializing the rmap_irec objects. Worse yet, the logic to query the rtbitmap is spread across three separate functions, which is unnecessarily difficult to follow.
Compute the start rtextent that we want from keys[0] directly and combine the functions to avoid passing parameters around everywhere, and consolidate all the logic into a single function. At one point many years ago I intended to use __xfs_getfsmap_rtdev as the launching point for realtime rmapbt queries, but this hasn't been the case for a long time.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 62 +++++++--------------------------------------- fs/xfs/xfs_trace.h | 25 +++++++++++++++++++ 2 files changed, 34 insertions(+), 53 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 7b72992c14d9..202f162515bd 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -510,76 +510,44 @@ xfs_getfsmap_rtdev_rtbitmap_helper( irec.rm_flags = 0;
return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr); }
-/* Execute a getfsmap query against the realtime device. */ +/* Execute a getfsmap query against the realtime device rtbitmap. */ STATIC int -__xfs_getfsmap_rtdev( +xfs_getfsmap_rtdev_rtbitmap( struct xfs_trans *tp, const struct xfs_fsmap *keys, - int (*query_fn)(struct xfs_trans *, - struct xfs_getfsmap_info *, - xfs_rtblock_t start_rtb, - xfs_rtblock_t end_rtb), struct xfs_getfsmap_info *info) { + + struct xfs_rtalloc_rec alow = { 0 }; + struct xfs_rtalloc_rec ahigh = { 0 }; struct xfs_mount *mp = tp->t_mountp; xfs_rtblock_t start_rtb; xfs_rtblock_t end_rtb; uint64_t eofs; - int error = 0; + int error;
eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rextents * mp->m_sb.sb_rextsize); if (keys[0].fmr_physical >= eofs) return 0; start_rtb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical + keys[0].fmr_length); end_rtb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
- /* Set up search keys */ - info->low.rm_startblock = start_rtb; - error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); - if (error) - return error; - info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); - info->low.rm_blockcount = 0; - xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + info->missing_owner = XFS_FMR_OWN_UNKNOWN;
/* Adjust the low key if we are continuing from where we left off. */ if (keys[0].fmr_length > 0) { info->low_daddr = XFS_FSB_TO_BB(mp, start_rtb); if (info->low_daddr >= eofs) return 0; }
- info->high.rm_startblock = end_rtb; - error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); - if (error) - return error; - info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset); - info->high.rm_blockcount = 0; - xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); - - trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); - trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high); - - return query_fn(tp, info, start_rtb, end_rtb); -} - -/* Actually query the realtime bitmap. */ -STATIC int -xfs_getfsmap_rtdev_rtbitmap_query( - struct xfs_trans *tp, - struct xfs_getfsmap_info *info, - xfs_rtblock_t start_rtb, - xfs_rtblock_t end_rtb) -{ - struct xfs_rtalloc_rec alow = { 0 }; - struct xfs_rtalloc_rec ahigh = { 0 }; - struct xfs_mount *mp = tp->t_mountp; - int error; + trace_xfs_fsmap_low_key_linear(mp, info->dev, start_rtb); + trace_xfs_fsmap_high_key_linear(mp, info->dev, end_rtb);
xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED);
/* * Set up query parameters to return free rtextents covering the range @@ -607,22 +575,10 @@ xfs_getfsmap_rtdev_rtbitmap_query( goto err; err: xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED); return error; } - -/* Execute a getfsmap query against the realtime device rtbitmap. */ -STATIC int -xfs_getfsmap_rtdev_rtbitmap( - struct xfs_trans *tp, - const struct xfs_fsmap *keys, - struct xfs_getfsmap_info *info) -{ - info->missing_owner = XFS_FMR_OWN_UNKNOWN; - return __xfs_getfsmap_rtdev(tp, keys, xfs_getfsmap_rtdev_rtbitmap_query, - info); -} #endif /* CONFIG_XFS_RT */
/* Execute a getfsmap query against the regular data device. */ STATIC int __xfs_getfsmap_datadev( diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 20e2ec8b73aa..a9e3081b6625 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3489,10 +3489,35 @@ DEFINE_EVENT(xfs_fsmap_class, name, \ TP_ARGS(mp, keydev, agno, rmap)) DEFINE_FSMAP_EVENT(xfs_fsmap_low_key); DEFINE_FSMAP_EVENT(xfs_fsmap_high_key); DEFINE_FSMAP_EVENT(xfs_fsmap_mapping);
+DECLARE_EVENT_CLASS(xfs_fsmap_linear_class, + TP_PROTO(struct xfs_mount *mp, u32 keydev, uint64_t bno), + TP_ARGS(mp, keydev, bno), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, keydev) + __field(xfs_fsblock_t, bno) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->keydev = new_decode_dev(keydev); + __entry->bno = bno; + ), + TP_printk("dev %d:%d keydev %d:%d bno 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->keydev), MINOR(__entry->keydev), + __entry->bno) +) +#define DEFINE_FSMAP_LINEAR_EVENT(name) \ +DEFINE_EVENT(xfs_fsmap_linear_class, name, \ + TP_PROTO(struct xfs_mount *mp, u32 keydev, uint64_t bno), \ + TP_ARGS(mp, keydev, bno)) +DEFINE_FSMAP_LINEAR_EVENT(xfs_fsmap_low_key_linear); +DEFINE_FSMAP_LINEAR_EVENT(xfs_fsmap_high_key_linear); + DECLARE_EVENT_CLASS(xfs_getfsmap_class, TP_PROTO(struct xfs_mount *mp, struct xfs_fsmap *fsmap), TP_ARGS(mp, fsmap), TP_STRUCT__entry( __field(dev_t, dev)
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit a949a1c2a198e048630a8b0741a99b85a5d88136 ]
The external log device fsmap backend doesn't have an rmapbt to query, so it's wasteful to spend time initializing the rmap_irec objects. Worse yet, the log could (someday) be longer than 2^32 fsblocks, so using the rmap irec structure will result in integer overflows.
Fix this mess by computing the start address that we want from keys[0] directly, and use the daddr-based record filtering algorithm that we also use for rtbitmap queries.
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 30 ++++++++---------------------- 1 file changed, 8 insertions(+), 22 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 202f162515bd..cdd806d80b7c 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -435,40 +435,26 @@ xfs_getfsmap_logdev( struct xfs_getfsmap_info *info) { struct xfs_mount *mp = tp->t_mountp; struct xfs_rmap_irec rmap; xfs_daddr_t rec_daddr, len_daddr; - xfs_fsblock_t start_fsb; - int error; + xfs_fsblock_t start_fsb, end_fsb; + uint64_t eofs;
- /* Set up search keys */ + eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks); + if (keys[0].fmr_physical >= eofs) + return 0; start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical + keys[0].fmr_length); - info->low.rm_startblock = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); - info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); - error = xfs_fsmap_owner_to_rmap(&info->low, keys); - if (error) - return error; - info->low.rm_blockcount = 0; - xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical));
/* Adjust the low key if we are continuing from where we left off. */ if (keys[0].fmr_length > 0) info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb);
- error = xfs_fsmap_owner_to_rmap(&info->high, keys + 1); - if (error) - return error; - info->high.rm_startblock = -1U; - info->high.rm_owner = ULLONG_MAX; - info->high.rm_offset = ULLONG_MAX; - info->high.rm_blockcount = 0; - info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS; - info->missing_owner = XFS_FMR_OWN_FREE; - - trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); - trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high); + trace_xfs_fsmap_low_key_linear(mp, info->dev, start_fsb); + trace_xfs_fsmap_high_key_linear(mp, info->dev, end_fsb);
if (start_fsb > 0) return 0;
/* Fabricate an rmap entry for the external log device. */
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 3ee9351e74907fe3acb0721c315af25b05dc87da ]
Improve the validation of the fsmap offset fields in the query keys and move the validation to the top of the function now that we have pushed the low key adjustment code downwards.
Also fix some indenting issues that aren't worth a separate patch.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index cdd806d80b7c..d10f2c719220 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -800,10 +800,23 @@ xfs_getfsmap_is_valid_device( STATIC bool xfs_getfsmap_check_keys( struct xfs_fsmap *low_key, struct xfs_fsmap *high_key) { + if (low_key->fmr_flags & (FMR_OF_SPECIAL_OWNER | FMR_OF_EXTENT_MAP)) { + if (low_key->fmr_offset) + return false; + } + if (high_key->fmr_flags != -1U && + (high_key->fmr_flags & (FMR_OF_SPECIAL_OWNER | + FMR_OF_EXTENT_MAP))) { + if (high_key->fmr_offset && high_key->fmr_offset != -1ULL) + return false; + } + if (high_key->fmr_length && high_key->fmr_length != -1ULL) + return false; + if (low_key->fmr_device > high_key->fmr_device) return false; if (low_key->fmr_device < high_key->fmr_device) return true;
@@ -843,39 +856,41 @@ xfs_getfsmap_check_keys( * * Key to Confusion * ---------------- * There are multiple levels of keys and counters at work here: * xfs_fsmap_head.fmh_keys -- low and high fsmap keys passed in; - * these reflect fs-wide sector addrs. + * these reflect fs-wide sector addrs. * dkeys -- fmh_keys used to query each device; - * these are fmh_keys but w/ the low key - * bumped up by fmr_length. + * these are fmh_keys but w/ the low key + * bumped up by fmr_length. * xfs_getfsmap_info.next_daddr -- next disk addr we expect to see; this * is how we detect gaps in the fsmap records and report them. * xfs_getfsmap_info.low/high -- per-AG low/high keys computed from - * dkeys; used to query the metadata. + * dkeys; used to query the metadata. */ int xfs_getfsmap( struct xfs_mount *mp, struct xfs_fsmap_head *head, struct fsmap *fsmap_recs) { struct xfs_trans *tp = NULL; struct xfs_fsmap dkeys[2]; /* per-dev keys */ struct xfs_getfsmap_dev handlers[XFS_GETFSMAP_DEVS]; struct xfs_getfsmap_info info = { NULL }; bool use_rmap; int i; int error = 0;
if (head->fmh_iflags & ~FMH_IF_VALID) return -EINVAL; if (!xfs_getfsmap_is_valid_device(mp, &head->fmh_keys[0]) || !xfs_getfsmap_is_valid_device(mp, &head->fmh_keys[1])) return -EINVAL; + if (!xfs_getfsmap_check_keys(&head->fmh_keys[0], &head->fmh_keys[1])) + return -EINVAL;
use_rmap = xfs_has_rmapbt(mp) && has_capability_noaudit(current, CAP_SYS_ADMIN); head->fmh_entries = 0;
@@ -917,19 +932,12 @@ xfs_getfsmap( * all other low key mapping types (attr blocks, metadata), each * fsmap backend bumps the physical offset as there can be no * other mapping for the same physical block range. */ dkeys[0] = head->fmh_keys[0]; - if (dkeys[0].fmr_flags & (FMR_OF_SPECIAL_OWNER | FMR_OF_EXTENT_MAP)) { - if (dkeys[0].fmr_offset) - return -EINVAL; - } memset(&dkeys[1], 0xFF, sizeof(struct xfs_fsmap));
- if (!xfs_getfsmap_check_keys(dkeys, &head->fmh_keys[1])) - return -EINVAL; - info.next_daddr = head->fmh_keys[0].fmr_physical + head->fmh_keys[0].fmr_length; info.fsmap_recs = fsmap_recs; info.head = head;
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 75dc0345312221971903b2e28279b7e24b7dbb1b ]
Use struct initializers to ensure that the xfs_btree_irecs passed into the query_range function are completely initialized. No functional changes, just closing some sloppy hygiene.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_alloc.c | 10 +++------- fs/xfs/libxfs/xfs_refcount.c | 13 +++++++------ fs/xfs/libxfs/xfs_rmap.c | 10 +++------- 3 files changed, 13 insertions(+), 20 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index c08265f19136..cd5b197d7046 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -3543,19 +3543,15 @@ xfs_alloc_query_range( const struct xfs_alloc_rec_incore *low_rec, const struct xfs_alloc_rec_incore *high_rec, xfs_alloc_query_range_fn fn, void *priv) { - union xfs_btree_irec low_brec; - union xfs_btree_irec high_brec; - struct xfs_alloc_query_range_info query; + union xfs_btree_irec low_brec = { .a = *low_rec }; + union xfs_btree_irec high_brec = { .a = *high_rec }; + struct xfs_alloc_query_range_info query = { .priv = priv, .fn = fn };
ASSERT(cur->bc_btnum == XFS_BTNUM_BNO); - low_brec.a = *low_rec; - high_brec.a = *high_rec; - query.priv = priv; - query.fn = fn; return xfs_btree_query_range(cur, &low_brec, &high_brec, xfs_alloc_query_range_helper, &query); }
/* Find all free space records. */ diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 4ec7a81dd3ef..7e16e76fd2e1 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1901,12 +1901,17 @@ xfs_refcount_recover_cow_leftovers( struct xfs_trans *tp; struct xfs_btree_cur *cur; struct xfs_buf *agbp; struct xfs_refcount_recovery *rr, *n; struct list_head debris; - union xfs_btree_irec low; - union xfs_btree_irec high; + union xfs_btree_irec low = { + .rc.rc_domain = XFS_REFC_DOMAIN_COW, + }; + union xfs_btree_irec high = { + .rc.rc_domain = XFS_REFC_DOMAIN_COW, + .rc.rc_startblock = -1U, + }; xfs_fsblock_t fsb; int error;
/* reflink filesystems mustn't have AGs larger than 2^31-1 blocks */ BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COWFLAG); @@ -1933,14 +1938,10 @@ xfs_refcount_recover_cow_leftovers( if (error) goto out_trans; cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag);
/* Find all the leftover CoW staging extents. */ - memset(&low, 0, sizeof(low)); - memset(&high, 0, sizeof(high)); - low.rc.rc_domain = high.rc.rc_domain = XFS_REFC_DOMAIN_COW; - high.rc.rc_startblock = -1U; error = xfs_btree_query_range(cur, &low, &high, xfs_refcount_recover_extent, &debris); xfs_btree_del_cursor(cur, error); xfs_trans_brelse(tp, agbp); xfs_trans_cancel(tp); diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index b56aca1e7c66..95d3599561ce 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -2335,18 +2335,14 @@ xfs_rmap_query_range( const struct xfs_rmap_irec *low_rec, const struct xfs_rmap_irec *high_rec, xfs_rmap_query_range_fn fn, void *priv) { - union xfs_btree_irec low_brec; - union xfs_btree_irec high_brec; - struct xfs_rmap_query_range_info query; + union xfs_btree_irec low_brec = { .r = *low_rec }; + union xfs_btree_irec high_brec = { .r = *high_rec }; + struct xfs_rmap_query_range_info query = { .priv = priv, .fn = fn };
- low_brec.r = *low_rec; - high_brec.r = *high_rec; - query.priv = priv; - query.fn = fn; return xfs_btree_query_range(cur, &low_brec, &high_brec, xfs_rmap_query_range_helper, &query); }
/* Find all rmaps. */
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit cfa2df68b7ceb49ac9eb2d295ab0c5974dbf17e7 ]
Dave Chinner reported that xfs/273 fails if the AG size happens to be an exact power of two. I traced this to an agbno integer overflow when the current GETFSMAP call is a continuation of a previous GETFSMAP call, and the last record returned was non-shareable space at the end of an AG.
__xfs_getfsmap_datadev sets up a data device query by converting the incoming fmr_physical into an xfs_fsblock_t and cracking it into an agno and agbno pair. In the (failing) case of where fmr_blockcount of the low key is nonzero and the record was for a non-shareable extent, it will add fmr_blockcount to start_fsb and info->low.rm_startblock.
If the low key was actually the last record for that AG, then this addition causes info->low.rm_startblock to point beyond EOAG. When the rmapbt range query starts, it'll return an empty set, and fsmap moves on to the next AG.
Or so I thought. Remember how we added to start_fsb?
If agsize < 1<<agblklog, start_fsb points to the same AG as the original fmr_physical from the low key. We run the rmapbt query, which returns nothing, so getfsmap zeroes info->low and moves on to the next AG.
If agsize == 1<<agblklog, start_fsb now points to the next AG. We run the rmapbt query on the next AG with the excessively large rm_startblock. If this next AG is actually the last AG, we'll set info->high to EOFS (which is now has a lower rm_startblock than info->low), and the ranged btree query code will return -EINVAL. If it's not the last AG, we ignore all records for the intermediate AGs.
Oops.
Fix this by decoding start_fsb into agno and agbno only after making adjustments to start_fsb. This means that info->low.rm_startblock will always be set to a valid agbno, and we always start the rmapbt iteration in the correct AG.
While we're at it, fix the predicate for determining if an fsmap record represents non-shareable space to include file data on pre-reflink filesystems.
Reported-by: Dave Chinner david@fromorbit.com Fixes: 63ef7a35912dd ("xfs: fix interval filtering in multi-step fsmap queries") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index d10f2c719220..956a5670e56c 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -563,10 +563,23 @@ xfs_getfsmap_rtdev_rtbitmap( xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED); return error; } #endif /* CONFIG_XFS_RT */
+static inline bool +rmap_not_shareable(struct xfs_mount *mp, const struct xfs_rmap_irec *r) +{ + if (!xfs_has_reflink(mp)) + return true; + if (XFS_RMAP_NON_INODE_OWNER(r->rm_owner)) + return true; + if (r->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK | + XFS_RMAP_UNWRITTEN)) + return true; + return false; +} + /* Execute a getfsmap query against the regular data device. */ STATIC int __xfs_getfsmap_datadev( struct xfs_trans *tp, const struct xfs_fsmap *keys, @@ -596,35 +609,33 @@ __xfs_getfsmap_datadev( /* * Convert the fsmap low/high keys to AG based keys. Initialize * low to the fsmap low key and max out the high key to the end * of the AG. */ - info->low.rm_startblock = XFS_FSB_TO_AGBNO(mp, start_fsb); info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); if (error) return error; info->low.rm_blockcount = XFS_BB_TO_FSBT(mp, keys[0].fmr_length); xfs_getfsmap_set_irec_flags(&info->low, &keys[0]);
/* Adjust the low key if we are continuing from where we left off. */ if (info->low.rm_blockcount == 0) { - /* empty */ - } else if (XFS_RMAP_NON_INODE_OWNER(info->low.rm_owner) || - (info->low.rm_flags & (XFS_RMAP_ATTR_FORK | - XFS_RMAP_BMBT_BLOCK | - XFS_RMAP_UNWRITTEN))) { - info->low.rm_startblock += info->low.rm_blockcount; + /* No previous record from which to continue */ + } else if (rmap_not_shareable(mp, &info->low)) { + /* Last record seen was an unshareable extent */ info->low.rm_owner = 0; info->low.rm_offset = 0;
start_fsb += info->low.rm_blockcount; if (XFS_FSB_TO_DADDR(mp, start_fsb) >= eofs) return 0; } else { + /* Last record seen was a shareable file data extent */ info->low.rm_offset += info->low.rm_blockcount; } + info->low.rm_startblock = XFS_FSB_TO_AGBNO(mp, start_fsb);
info->high.rm_startblock = -1U; info->high.rm_owner = ULLONG_MAX; info->high.rm_offset = ULLONG_MAX; info->high.rm_blockcount = 0;
From: Christoph Hellwig hch@lst.de
[ Upstream commit 9ff4490e2ab364ec433f15668ef3f5edfb53feca ]
oss.sgi.com is long dead, refer to the current linux-xfs list instead.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- Documentation/ABI/testing/sysfs-fs-xfs | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-fs-xfs b/Documentation/ABI/testing/sysfs-fs-xfs index f704925f6fe9..82d8e2f79834 100644 --- a/Documentation/ABI/testing/sysfs-fs-xfs +++ b/Documentation/ABI/testing/sysfs-fs-xfs @@ -1,37 +1,37 @@ What: /sys/fs/xfs/<disk>/log/log_head_lsn Date: July 2014 KernelVersion: 3.17 -Contact: xfs@oss.sgi.com +Contact: linux-xfs@vger.kernel.org Description: The log sequence number (LSN) of the current head of the log. The LSN is exported in "cycle:basic block" format. Users: xfstests
What: /sys/fs/xfs/<disk>/log/log_tail_lsn Date: July 2014 KernelVersion: 3.17 -Contact: xfs@oss.sgi.com +Contact: linux-xfs@vger.kernel.org Description: The log sequence number (LSN) of the current tail of the log. The LSN is exported in "cycle:basic block" format.
What: /sys/fs/xfs/<disk>/log/reserve_grant_head Date: July 2014 KernelVersion: 3.17 -Contact: xfs@oss.sgi.com +Contact: linux-xfs@vger.kernel.org Description: The current state of the log reserve grant head. It represents the total log reservation of all currently outstanding transactions. The grant head is exported in "cycle:bytes" format. Users: xfstests
What: /sys/fs/xfs/<disk>/log/write_grant_head Date: July 2014 KernelVersion: 3.17 -Contact: xfs@oss.sgi.com +Contact: linux-xfs@vger.kernel.org Description: The current state of the log write grant head. It represents the total log reservation of all currently outstanding transactions, including regrants due to rolling transactions. The grant head is exported in
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 150bb10a28b9c8709ae227fc898d9cf6136faa1e ]
generic/388 has an annoying tendency to fail like this during log recovery:
XFS (sda4): Unmounting Filesystem 435fe39b-82b6-46ef-be56-819499585130 XFS (sda4): Mounting V5 Filesystem 435fe39b-82b6-46ef-be56-819499585130 XFS (sda4): Starting recovery (logdev: internal) 00000000: 49 4e 81 b6 03 02 00 00 00 00 00 07 00 00 00 07 IN.............. 00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 10 ................ 00000020: 35 9a 8b c1 3e 6e 81 00 35 9a 8b c1 3f dc b7 00 5...>n..5...?... 00000030: 35 9a 8b c1 3f dc b7 00 00 00 00 00 00 3c 86 4f 5...?........<.O 00000040: 00 00 00 00 00 00 02 f3 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 1f 01 00 00 00 00 00 00 00 02 b2 74 c9 0b .............t.. 00000060: ff ff ff ff d7 45 73 10 00 00 00 00 00 00 00 2d .....Es........- 00000070: 00 00 07 92 00 01 fe 30 00 00 00 00 00 00 00 1a .......0........ 00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000090: 35 9a 8b c1 3b 55 0c 00 00 00 00 00 04 27 b2 d1 5...;U.......'.. 000000a0: 43 5f e3 9b 82 b6 46 ef be 56 81 94 99 58 51 30 C_....F..V...XQ0 XFS (sda4): Internal error Bad dinode after recovery at line 539 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x4e/0xc0 [xfs] CPU: 0 PID: 2189311 Comm: mount Not tainted 6.9.0-rc4-djwx #rc4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4f/0x60 xfs_corruption_error+0x90/0xa0 xlog_recover_inode_commit_pass2+0x5f1/0xb00 xlog_recover_items_pass2+0x4e/0xc0 xlog_recover_commit_trans+0x2db/0x350 xlog_recovery_process_trans+0xab/0xe0 xlog_recover_process_data+0xa7/0x130 xlog_do_recovery_pass+0x398/0x840 xlog_do_log_recovery+0x62/0xc0 xlog_do_recover+0x34/0x1d0 xlog_recover+0xe9/0x1a0 xfs_log_mount+0xff/0x260 xfs_mountfs+0x5d9/0xb60 xfs_fs_fill_super+0x76b/0xa30 get_tree_bdev+0x124/0x1d0 vfs_get_tree+0x17/0xa0 path_mount+0x72b/0xa90 __x64_sys_mount+0x112/0x150 do_syscall_64+0x49/0x100 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> XFS (sda4): Corruption detected. Unmount and run xfs_repair XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0x739/0x920 [xfs], inode 0x427b2d1 XFS (sda4): Filesystem has been shut down due to log error (0x2). XFS (sda4): Please unmount the filesystem and rectify the problem(s). XFS (sda4): log mount/recovery failed: error -117 XFS (sda4): log mount failed
This inode log item recovery failing the dinode verifier after replaying the contents of the inode log item into the ondisk inode. Looking back into what the kernel was doing at the time of the fs shutdown, a thread was in the middle of running a series of transactions, each of which committed changes to the inode.
At some point in the middle of that chain, an invalid (at least according to the verifier) change was committed. Had the filesystem not shut down in the middle of the chain, a subsequent transaction would have corrected the invalid state and nobody would have noticed. But that's not what happened here. Instead, the invalid inode state was committed to the ondisk log, so log recovery tripped over it.
The actual defect here was an overzealous inode verifier, which was fixed in a separate patch. This patch adds some transaction precommit functions for CONFIG_XFS_DEBUG=y mode so that we can detect these kinds of transient errors at transaction commit time, where it's much easier to find the root cause.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/Kconfig | 12 ++++++++++++ fs/xfs/xfs.h | 4 ++++ fs/xfs/xfs_buf_item.c | 32 ++++++++++++++++++++++++++++++++ fs/xfs/xfs_dquot_item.c | 31 +++++++++++++++++++++++++++++++ fs/xfs/xfs_inode_item.c | 32 ++++++++++++++++++++++++++++++++ 5 files changed, 111 insertions(+)
diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig index 9fac5ea8d0e4..dff90db507e3 100644 --- a/fs/xfs/Kconfig +++ b/fs/xfs/Kconfig @@ -152,10 +152,22 @@ config XFS_DEBUG Note that the resulting code will be HUGE and SLOW, and probably not useful unless you are debugging a particular problem.
Say N unless you are an XFS developer, or you play one on TV.
+config XFS_DEBUG_EXPENSIVE + bool "XFS expensive debugging checks" + depends on XFS_FS && XFS_DEBUG + help + Say Y here to get an XFS build with expensive debugging checks + enabled. These checks may affect performance significantly. + + Note that the resulting code will be HUGER and SLOWER, and probably + not useful unless you are debugging a particular problem. + + Say N unless you are an XFS developer, or you play one on TV. + config XFS_ASSERT_FATAL bool "XFS fatal asserts" default y depends on XFS_FS && XFS_DEBUG help diff --git a/fs/xfs/xfs.h b/fs/xfs/xfs.h index f6ffb4f248f7..9355ccad9503 100644 --- a/fs/xfs/xfs.h +++ b/fs/xfs/xfs.h @@ -8,10 +8,14 @@
#ifdef CONFIG_XFS_DEBUG #define DEBUG 1 #endif
+#ifdef CONFIG_XFS_DEBUG_EXPENSIVE +#define DEBUG_EXPENSIVE 1 +#endif + #ifdef CONFIG_XFS_ASSERT_FATAL #define XFS_ASSERT_FATAL 1 #endif
#ifdef CONFIG_XFS_WARN diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 023d4e0385dd..b02ce568de0c 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -20,10 +20,11 @@ #include "xfs_dquot_item.h" #include "xfs_dquot.h" #include "xfs_trace.h" #include "xfs_log.h" #include "xfs_log_priv.h" +#include "xfs_error.h"
struct kmem_cache *xfs_buf_item_cache;
static inline struct xfs_buf_log_item *BUF_ITEM(struct xfs_log_item *lip) @@ -779,12 +780,43 @@ xfs_buf_item_committed( if ((bip->bli_flags & XFS_BLI_INODE_ALLOC_BUF) && lip->li_lsn != 0) return lip->li_lsn; return lsn; }
+#ifdef DEBUG_EXPENSIVE +static int +xfs_buf_item_precommit( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + struct xfs_buf_log_item *bip = BUF_ITEM(lip); + struct xfs_buf *bp = bip->bli_buf; + struct xfs_mount *mp = bp->b_mount; + xfs_failaddr_t fa; + + if (!bp->b_ops || !bp->b_ops->verify_struct) + return 0; + if (bip->bli_flags & XFS_BLI_STALE) + return 0; + + fa = bp->b_ops->verify_struct(bp); + if (fa) { + xfs_buf_verifier_error(bp, -EFSCORRUPTED, bp->b_ops->name, + bp->b_addr, BBTOB(bp->b_length), fa); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + ASSERT(fa == NULL); + } + + return 0; +} +#else +# define xfs_buf_item_precommit NULL +#endif + static const struct xfs_item_ops xfs_buf_item_ops = { .iop_size = xfs_buf_item_size, + .iop_precommit = xfs_buf_item_precommit, .iop_format = xfs_buf_item_format, .iop_pin = xfs_buf_item_pin, .iop_unpin = xfs_buf_item_unpin, .iop_release = xfs_buf_item_release, .iop_committing = xfs_buf_item_committing, diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index 6a1aae799cf1..7d19091215b0 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -15,10 +15,11 @@ #include "xfs_trans.h" #include "xfs_buf_item.h" #include "xfs_trans_priv.h" #include "xfs_qm.h" #include "xfs_log.h" +#include "xfs_error.h"
static inline struct xfs_dq_logitem *DQUOT_ITEM(struct xfs_log_item *lip) { return container_of(lip, struct xfs_dq_logitem, qli_item); } @@ -191,12 +192,42 @@ xfs_qm_dquot_logitem_committing( xfs_csn_t seq) { return xfs_qm_dquot_logitem_release(lip); }
+#ifdef DEBUG_EXPENSIVE +static int +xfs_qm_dquot_logitem_precommit( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + struct xfs_dquot *dqp = DQUOT_ITEM(lip)->qli_dquot; + struct xfs_mount *mp = dqp->q_mount; + struct xfs_disk_dquot ddq = { }; + xfs_failaddr_t fa; + + xfs_dquot_to_disk(&ddq, dqp); + fa = xfs_dquot_verify(mp, &ddq, dqp->q_id); + if (fa) { + XFS_CORRUPTION_ERROR("Bad dquot during logging", + XFS_ERRLEVEL_LOW, mp, &ddq, sizeof(ddq)); + xfs_alert(mp, + "Metadata corruption detected at %pS, dquot 0x%x", + fa, dqp->q_id); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + ASSERT(fa == NULL); + } + + return 0; +} +#else +# define xfs_qm_dquot_logitem_precommit NULL +#endif + static const struct xfs_item_ops xfs_dquot_item_ops = { .iop_size = xfs_qm_dquot_logitem_size, + .iop_precommit = xfs_qm_dquot_logitem_precommit, .iop_format = xfs_qm_dquot_logitem_format, .iop_pin = xfs_qm_dquot_logitem_pin, .iop_unpin = xfs_qm_dquot_logitem_unpin, .iop_release = xfs_qm_dquot_logitem_release, .iop_committing = xfs_qm_dquot_logitem_committing, diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 2ec23c9af760..a734ca8d8f03 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -34,10 +34,40 @@ xfs_inode_item_sort( struct xfs_log_item *lip) { return INODE_ITEM(lip)->ili_inode->i_ino; }
+#ifdef DEBUG_EXPENSIVE +static void +xfs_inode_item_precommit_check( + struct xfs_inode *ip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_dinode *dip; + xfs_failaddr_t fa; + + dip = kzalloc(mp->m_sb.sb_inodesize, GFP_KERNEL | GFP_NOFS); + if (!dip) { + ASSERT(dip != NULL); + return; + } + + xfs_inode_to_disk(ip, dip, 0); + xfs_dinode_calc_crc(mp, dip); + fa = xfs_dinode_verify(mp, ip->i_ino, dip); + if (fa) { + xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, + sizeof(*dip), fa); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + ASSERT(fa == NULL); + } + kfree(dip); +} +#else +# define xfs_inode_item_precommit_check(ip) ((void)0) +#endif + /* * Prior to finally logging the inode, we have to ensure that all the * per-modification inode state changes are applied. This includes VFS inode * state updates, format conversions, verifier state synchronisation and * ensuring the inode buffer remains in memory whilst the inode is dirty. @@ -166,10 +196,12 @@ xfs_inode_item_precommit( * in xfs_iflush() for an explanation of this coordination mechanism. */ iip->ili_fields |= (flags | iip->ili_last_fields); spin_unlock(&iip->ili_lock);
+ xfs_inode_item_precommit_check(ip); + /* * We are done with the log item transaction dirty state, so clear it so * that it doesn't pollute future transactions. */ iip->ili_dirty_flags = 0;
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 24a4e1cb322e2bf0f3a1afd1978b610a23aa8f36 ]
[ 6.1: resolved conflicts in xfs_inode.c and xfs_symlink.c due to 6.1 not having switched to idmap yet ]
I noticed that callers of xfs_qm_vop_dqalloc use the following code to compute the anticipated uid of the new file:
mapped_fsuid(idmap, &init_user_ns);
whereas the VFS uses a slightly different computation for actually assigning i_uid:
mapped_fsuid(idmap, i_user_ns(inode));
Technically, these are not the same things. According to Christian Brauner, the only time that inode->i_sb->s_user_ns != &init_user_ns is when the filesystem was mounted in a new mount namespace by an unpriviledged user. XFS does not allow this, which is why we've never seen bug reports about quotas being incorrect or the uid checks in xfs_qm_vop_create_dqattach tripping debug assertions.
However, this /is/ a logic bomb, so let's make the code consistent.
Link: https://lore.kernel.org/linux-fsdevel/20240617-weitblick-gefertigt-4a41f3711... Fixes: c14329d39f2d ("fs: port fs{g,u}id helpers to mnt_idmap") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_inode.c | 16 ++++++++++------ fs/xfs/xfs_symlink.c | 8 +++++--- 2 files changed, 15 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index b26d26d29273..88d0a088fa86 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -981,14 +981,16 @@ xfs_create( return -EIO;
prid = xfs_get_initial_prid(dp);
/* - * Make sure that we have allocated dquot(s) on disk. + * Make sure that we have allocated dquot(s) on disk. The uid/gid + * computation code must match what the VFS uses to assign i_[ug]id. + * INHERIT adjusts the gid computation for setgid/grpid systems. */ - error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns), - mapped_fsgid(mnt_userns, &init_user_ns), prid, + error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, i_user_ns(VFS_I(dp))), + mapped_fsgid(mnt_userns, i_user_ns(VFS_I(dp))), prid, XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, &udqp, &gdqp, &pdqp); if (error) return error;
@@ -1130,14 +1132,16 @@ xfs_create_tmpfile( return -EIO;
prid = xfs_get_initial_prid(dp);
/* - * Make sure that we have allocated dquot(s) on disk. + * Make sure that we have allocated dquot(s) on disk. The uid/gid + * computation code must match what the VFS uses to assign i_[ug]id. + * INHERIT adjusts the gid computation for setgid/grpid systems. */ - error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns), - mapped_fsgid(mnt_userns, &init_user_ns), prid, + error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, i_user_ns(VFS_I(dp))), + mapped_fsgid(mnt_userns, i_user_ns(VFS_I(dp))), prid, XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, &udqp, &gdqp, &pdqp); if (error) return error;
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 8389f3ef88ef..78bd02a98aa5 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -189,14 +189,16 @@ xfs_symlink( ASSERT(pathlen > 0);
prid = xfs_get_initial_prid(dp);
/* - * Make sure that we have allocated dquot(s) on disk. + * Make sure that we have allocated dquot(s) on disk. The uid/gid + * computation code must match what the VFS uses to assign i_[ug]id. + * INHERIT adjusts the gid computation for setgid/grpid systems. */ - error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns), - mapped_fsgid(mnt_userns, &init_user_ns), prid, + error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, i_user_ns(VFS_I(dp))), + mapped_fsgid(mnt_userns, i_user_ns(VFS_I(dp))), prid, XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, &udqp, &gdqp, &pdqp); if (error) return error;
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 00acb28d96746f78389f23a7b5309a917b45c12f ]
Move the two public symbols in xfs_file.c to xfs_file.h. We're about to add more public symbols in that source file, so let's finally create the header file.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_file.c | 1 + fs/xfs/xfs_file.h | 12 ++++++++++++ fs/xfs/xfs_ioctl.c | 1 + fs/xfs/xfs_iops.c | 1 + fs/xfs/xfs_iops.h | 3 --- 5 files changed, 15 insertions(+), 3 deletions(-) create mode 100644 fs/xfs/xfs_file.h
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 821cb86a83bd..6f7522977f7f 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -22,10 +22,11 @@ #include "xfs_log.h" #include "xfs_icache.h" #include "xfs_pnfs.h" #include "xfs_iomap.h" #include "xfs_reflink.h" +#include "xfs_file.h"
#include <linux/dax.h> #include <linux/falloc.h> #include <linux/backing-dev.h> #include <linux/mman.h> diff --git a/fs/xfs/xfs_file.h b/fs/xfs/xfs_file.h new file mode 100644 index 000000000000..7d39e3eca56d --- /dev/null +++ b/fs/xfs/xfs_file.h @@ -0,0 +1,12 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2000-2005 Silicon Graphics, Inc. + * All Rights Reserved. + */ +#ifndef __XFS_FILE_H__ +#define __XFS_FILE_H__ + +extern const struct file_operations xfs_file_operations; +extern const struct file_operations xfs_dir_file_operations; + +#endif /* __XFS_FILE_H__ */ diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index c7cb496dc345..1afb1b1b831e 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -36,10 +36,11 @@ #include "xfs_ag.h" #include "xfs_health.h" #include "xfs_reflink.h" #include "xfs_ioctl.h" #include "xfs_xattr.h" +#include "xfs_file.h"
#include <linux/mount.h> #include <linux/namei.h> #include <linux/fileattr.h>
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 6fbdc0a19e54..9ca1b8bf1f05 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -23,10 +23,11 @@ #include "xfs_dir2.h" #include "xfs_iomap.h" #include "xfs_error.h" #include "xfs_ioctl.h" #include "xfs_xattr.h" +#include "xfs_file.h"
#include <linux/posix_acl.h> #include <linux/security.h> #include <linux/iversion.h> #include <linux/fiemap.h> diff --git a/fs/xfs/xfs_iops.h b/fs/xfs/xfs_iops.h index e570dcb5df8d..73ff92355eaa 100644 --- a/fs/xfs/xfs_iops.h +++ b/fs/xfs/xfs_iops.h @@ -6,13 +6,10 @@ #ifndef __XFS_IOPS_H__ #define __XFS_IOPS_H__
struct xfs_inode;
-extern const struct file_operations xfs_file_operations; -extern const struct file_operations xfs_dir_file_operations; - extern ssize_t xfs_vn_listxattr(struct dentry *, char *data, size_t size);
int xfs_vn_setattr_size(struct user_namespace *mnt_userns, struct dentry *dentry, struct iattr *vap);
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit ee20808d848c87a51e176706d81b95a21747d6cf ]
Create a new helper function to calculate the fundamental allocation unit (i.e. the smallest unit of space we can allocate) of a file. Things are going to get hairy with range-exchange on the realtime device, so prepare for this now.
Remove the static attribute from xfs_is_falloc_aligned since the next patch will need it.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_file.c | 32 ++++++++++++-------------------- fs/xfs/xfs_file.h | 3 +++ fs/xfs/xfs_inode.c | 13 +++++++++++++ fs/xfs/xfs_inode.h | 2 ++ 4 files changed, 30 insertions(+), 20 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 6f7522977f7f..3c910e36da69 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -37,37 +37,29 @@ static const struct vm_operations_struct xfs_file_vm_ops;
/* * Decide if the given file range is aligned to the size of the fundamental * allocation unit for the file. */ -static bool +bool xfs_is_falloc_aligned( struct xfs_inode *ip, loff_t pos, long long int len) { - struct xfs_mount *mp = ip->i_mount; - uint64_t mask; - - if (XFS_IS_REALTIME_INODE(ip)) { - if (!is_power_of_2(mp->m_sb.sb_rextsize)) { - u64 rextbytes; - u32 mod; - - rextbytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); - div_u64_rem(pos, rextbytes, &mod); - if (mod) - return false; - div_u64_rem(len, rextbytes, &mod); - return mod == 0; - } - mask = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize) - 1; - } else { - mask = mp->m_sb.sb_blocksize - 1; + unsigned int alloc_unit = xfs_inode_alloc_unitsize(ip); + + if (!is_power_of_2(alloc_unit)) { + u32 mod; + + div_u64_rem(pos, alloc_unit, &mod); + if (mod) + return false; + div_u64_rem(len, alloc_unit, &mod); + return mod == 0; }
- return !((pos | len) & mask); + return !((pos | len) & (alloc_unit - 1)); }
/* * Fsync operations on directories are much simpler than on regular files, * as there is no file data to flush, and thus also no need for explicit diff --git a/fs/xfs/xfs_file.h b/fs/xfs/xfs_file.h index 7d39e3eca56d..2ad91f755caf 100644 --- a/fs/xfs/xfs_file.h +++ b/fs/xfs/xfs_file.h @@ -7,6 +7,9 @@ #define __XFS_FILE_H__
extern const struct file_operations xfs_file_operations; extern const struct file_operations xfs_dir_file_operations;
+bool xfs_is_falloc_aligned(struct xfs_inode *ip, loff_t pos, + long long int len); + #endif /* __XFS_FILE_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 88d0a088fa86..3ccbc31767b3 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -3773,5 +3773,18 @@ xfs_inode_reload_unlinked( xfs_iunlock(ip, XFS_ILOCK_SHARED); xfs_trans_cancel(tp);
return error; } + +/* Returns the size of fundamental allocation unit for a file, in bytes. */ +unsigned int +xfs_inode_alloc_unitsize( + struct xfs_inode *ip) +{ + unsigned int blocks = 1; + + if (XFS_IS_REALTIME_INODE(ip)) + blocks = ip->i_mount->m_sb.sb_rextsize; + + return XFS_FSB_TO_B(ip->i_mount, blocks); +} diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index c177c92f3aa5..c4f426eadf8e 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -620,6 +620,8 @@ xfs_inode_unlinked_incomplete( return VFS_I(ip)->i_nlink == 0 && !xfs_inode_on_unlinked_list(ip); } int xfs_inode_reload_unlinked_bucket(struct xfs_trans *tp, struct xfs_inode *ip); int xfs_inode_reload_unlinked(struct xfs_inode *ip);
+unsigned int xfs_inode_alloc_unitsize(struct xfs_inode *ip); + #endif /* __XFS_INODE_H__ */
From: John Garry john.g.garry@oracle.com
[ Upstream commit d3b689d7c711a9f36d3e48db9eaa75784a892f4c ]
Currently xfs_flush_unmap_range() does unmap for a full RT extent range, which we also want to ensure is clean and idle.
This code change is originally from Dave Chinner.
Reviewed-by: Christoph Hellwig hch@lst.de4 Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: John Garry john.g.garry@oracle.com Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_bmap_util.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 62b92e92a685..dabae6323c50 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -961,18 +961,22 @@ int xfs_flush_unmap_range( struct xfs_inode *ip, xfs_off_t offset, xfs_off_t len) { - struct xfs_mount *mp = ip->i_mount; struct inode *inode = VFS_I(ip); xfs_off_t rounding, start, end; int error;
- rounding = max_t(xfs_off_t, mp->m_sb.sb_blocksize, PAGE_SIZE); - start = round_down(offset, rounding); - end = round_up(offset + len, rounding) - 1; + /* + * Make sure we extend the flush out to extent alignment + * boundaries so any extent range overlapping the start/end + * of the modification we are about to do is clean and idle. + */ + rounding = max_t(xfs_off_t, xfs_inode_alloc_unitsize(ip), PAGE_SIZE); + start = rounddown_64(offset, rounding); + end = roundup_64(offset + len, rounding) - 1;
error = filemap_write_and_wait_range(inode->i_mapping, start, end); if (error) return error; truncate_pagecache_range(inode, start, end);
From: John Garry john.g.garry@oracle.com
[ Upstream commit f23660f059470ec7043748da7641e84183c23bc8 ]
The RT extent range must be considered in the xfs_flush_unmap_range() call to stabilize the boundary.
This code change is originally from Dave Chinner.
Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: John Garry john.g.garry@oracle.com Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_bmap_util.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index dabae6323c50..bab8ba224e10 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -1057,33 +1057,35 @@ xfs_free_file_space( static int xfs_prepare_shift( struct xfs_inode *ip, loff_t offset) { - struct xfs_mount *mp = ip->i_mount; + unsigned int rounding; int error;
/* * Trim eofblocks to avoid shifting uninitialized post-eof preallocation * into the accessible region of the file. */ if (xfs_can_free_eofblocks(ip)) { error = xfs_free_eofblocks(ip); if (error) return error; }
/* * Shift operations must stabilize the start block offset boundary along * with the full range of the operation. If we don't, a COW writeback * completion could race with an insert, front merge with the start * extent (after split) during the shift and corrupt the file. Start - * with the block just prior to the start to stabilize the boundary. + * with the allocation unit just prior to the start to stabilize the + * boundary. */ - offset = round_down(offset, mp->m_sb.sb_blocksize); + rounding = xfs_inode_alloc_unitsize(ip); + offset = rounddown_64(offset, rounding); if (offset) - offset -= mp->m_sb.sb_blocksize; + offset -= rounding;
/* * Writeback and invalidate cache for the remainder of the file as we're * about to shift down every extent from offset to EOF. */
From: lei lu llfamsec@gmail.com
[ Upstream commit 0c7fcdb6d06cdf8b19b57c17605215b06afa864a ]
This adds sanity checks for xfs_dir2_data_unused and xfs_dir2_data_entry to make sure don't stray beyond valid memory region. Before patching, the loop simply checks that the start offset of the dup and dep is within the range. So in a crafted image, if last entry is xfs_dir2_data_unused, we can change dup->length to dup->length-1 and leave 1 byte of space. In the next traversal, this space will be considered as dup or dep. We may encounter an out of bound read when accessing the fixed members.
In the patch, we make sure that the remaining bytes large enough to hold an unused entry before accessing xfs_dir2_data_unused and xfs_dir2_data_unused is XFS_DIR2_DATA_ALIGN byte aligned. We also make sure that the remaining bytes large enough to hold a dirent with a single-byte name before accessing xfs_dir2_data_entry.
Signed-off-by: lei lu llfamsec@gmail.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_dir2_data.c | 31 ++++++++++++++++++++++++++----- fs/xfs/libxfs/xfs_dir2_priv.h | 7 +++++++ 2 files changed, 33 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_dir2_data.c b/fs/xfs/libxfs/xfs_dir2_data.c index dbcf58979a59..e1d5da6d8d4a 100644 --- a/fs/xfs/libxfs/xfs_dir2_data.c +++ b/fs/xfs/libxfs/xfs_dir2_data.c @@ -175,78 +175,99 @@ __xfs_dir3_data_check( * Loop over the data/unused entries. */ while (offset < end) { struct xfs_dir2_data_unused *dup = bp->b_addr + offset; struct xfs_dir2_data_entry *dep = bp->b_addr + offset; + unsigned int reclen; + + /* + * Are the remaining bytes large enough to hold an + * unused entry? + */ + if (offset > end - xfs_dir2_data_unusedsize(1)) + return __this_address;
/* * If it's unused, look for the space in the bestfree table. * If we find it, account for that, else make sure it * doesn't need to be there. */ if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) { xfs_failaddr_t fa;
+ reclen = xfs_dir2_data_unusedsize( + be16_to_cpu(dup->length)); if (lastfree != 0) return __this_address; - if (offset + be16_to_cpu(dup->length) > end) + if (be16_to_cpu(dup->length) != reclen) + return __this_address; + if (offset + reclen > end) return __this_address; if (be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)) != offset) return __this_address; fa = xfs_dir2_data_freefind_verify(hdr, bf, dup, &dfp); if (fa) return fa; if (dfp) { i = (int)(dfp - bf); if ((freeseen & (1 << i)) != 0) return __this_address; freeseen |= 1 << i; } else { if (be16_to_cpu(dup->length) > be16_to_cpu(bf[2].length)) return __this_address; } - offset += be16_to_cpu(dup->length); + offset += reclen; lastfree = 1; continue; } + + /* + * This is not an unused entry. Are the remaining bytes + * large enough for a dirent with a single-byte name? + */ + if (offset > end - xfs_dir2_data_entsize(mp, 1)) + return __this_address; + /* * It's a real entry. Validate the fields. * If this is a block directory then make sure it's * in the leaf section of the block. * The linear search is crude but this is DEBUG code. */ if (dep->namelen == 0) return __this_address; - if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber))) + reclen = xfs_dir2_data_entsize(mp, dep->namelen); + if (offset + reclen > end) return __this_address; - if (offset + xfs_dir2_data_entsize(mp, dep->namelen) > end) + if (!xfs_verify_dir_ino(mp, be64_to_cpu(dep->inumber))) return __this_address; if (be16_to_cpu(*xfs_dir2_data_entry_tag_p(mp, dep)) != offset) return __this_address; if (xfs_dir2_data_get_ftype(mp, dep) >= XFS_DIR3_FT_MAX) return __this_address; count++; lastfree = 0; if (hdr->magic == cpu_to_be32(XFS_DIR2_BLOCK_MAGIC) || hdr->magic == cpu_to_be32(XFS_DIR3_BLOCK_MAGIC)) { addr = xfs_dir2_db_off_to_dataptr(geo, geo->datablk, (xfs_dir2_data_aoff_t) ((char *)dep - (char *)hdr)); name.name = dep->name; name.len = dep->namelen; hash = xfs_dir2_hashname(mp, &name); for (i = 0; i < be32_to_cpu(btp->count); i++) { if (be32_to_cpu(lep[i].address) == addr && be32_to_cpu(lep[i].hashval) == hash) break; } if (i >= be32_to_cpu(btp->count)) return __this_address; } - offset += xfs_dir2_data_entsize(mp, dep->namelen); + offset += reclen; } /* * Need to have seen all the entries and all the bestfree slots. */ if (freeseen != 7) diff --git a/fs/xfs/libxfs/xfs_dir2_priv.h b/fs/xfs/libxfs/xfs_dir2_priv.h index 7404a9ff1a92..9046d08554e9 100644 --- a/fs/xfs/libxfs/xfs_dir2_priv.h +++ b/fs/xfs/libxfs/xfs_dir2_priv.h @@ -185,10 +185,17 @@ void xfs_dir2_sf_put_ftype(struct xfs_mount *mp,
/* xfs_dir2_readdir.c */ extern int xfs_readdir(struct xfs_trans *tp, struct xfs_inode *dp, struct dir_context *ctx, size_t bufsize);
+static inline unsigned int +xfs_dir2_data_unusedsize( + unsigned int len) +{ + return round_up(len, XFS_DIR2_DATA_ALIGN); +} + static inline unsigned int xfs_dir2_data_entsize( struct xfs_mount *mp, unsigned int namelen) {
From: Julian Sun sunjunchao2870@gmail.com
[ Upstream commit af5d92f2fad818663da2ce073b6fe15b9d56ffdc ]
In the macro definition of XFS_DQUOT_LOGRES, a parameter is accepted, but it is not used. Hence, it should be removed.
This patch has only passed compilation test, but it should be fine.
Signed-off-by: Julian Sun sunjunchao2870@gmail.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_quota_defs.h | 2 +- fs/xfs/libxfs/xfs_trans_resv.c | 28 ++++++++++++++-------------- 2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_quota_defs.h b/fs/xfs/libxfs/xfs_quota_defs.h index cb035da3f990..fb05f44f6c75 100644 --- a/fs/xfs/libxfs/xfs_quota_defs.h +++ b/fs/xfs/libxfs/xfs_quota_defs.h @@ -54,11 +54,11 @@ typedef uint8_t xfs_dqtype_t; * dquots can be unique and so 6 dquots can be modified.... * * And, of course, we also need to take into account the dquot log format item * used to describe each dquot. */ -#define XFS_DQUOT_LOGRES(mp) \ +#define XFS_DQUOT_LOGRES \ ((sizeof(struct xfs_dq_logformat) + sizeof(struct xfs_disk_dquot)) * 6)
#define XFS_IS_QUOTA_ON(mp) ((mp)->m_qflags & XFS_ALL_QUOTA_ACCT) #define XFS_IS_UQUOTA_ON(mp) ((mp)->m_qflags & XFS_UQUOTA_ACCT) #define XFS_IS_PQUOTA_ON(mp) ((mp)->m_qflags & XFS_PQUOTA_ACCT) diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index 5b2f27cbdb80..1bb2891b26ff 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -332,15 +332,15 @@ xfs_calc_write_reservation( adj = xfs_calc_buf_res( xfs_refcountbt_block_count(mp, 2), blksz); t1 += adj; t3 += adj; - return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3); + return XFS_DQUOT_LOGRES + max3(t1, t2, t3); }
t4 = xfs_calc_refcountbt_reservation(mp, 1); - return XFS_DQUOT_LOGRES(mp) + max(t4, max3(t1, t2, t3)); + return XFS_DQUOT_LOGRES + max(t4, max3(t1, t2, t3)); }
unsigned int xfs_calc_write_reservation_minlogsize( struct xfs_mount *mp) @@ -404,41 +404,41 @@ xfs_calc_itruncate_reservation( if (xfs_has_reflink(mp)) t2 += xfs_calc_buf_res( xfs_refcountbt_block_count(mp, 4), blksz);
- return XFS_DQUOT_LOGRES(mp) + max3(t1, t2, t3); + return XFS_DQUOT_LOGRES + max3(t1, t2, t3); }
t4 = xfs_calc_refcountbt_reservation(mp, 2); - return XFS_DQUOT_LOGRES(mp) + max(t4, max3(t1, t2, t3)); + return XFS_DQUOT_LOGRES + max(t4, max3(t1, t2, t3)); }
unsigned int xfs_calc_itruncate_reservation_minlogsize( struct xfs_mount *mp) { return xfs_calc_itruncate_reservation(mp, true); }
/* * In renaming a files we can modify: * the five inodes involved: 5 * inode size * the two directory btrees: 2 * (max depth + v2) * dir block size * the two directory bmap btrees: 2 * max depth * block size * And the bmap_finish transaction can free dir and bmap blocks (two sets * of bmap blocks) giving: * the agf for the ags in which the blocks live: 3 * sector size * the agfl for the ags in which the blocks live: 3 * sector size * the superblock for the free block count: sector size * the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size */ STATIC uint xfs_calc_rename_reservation( struct xfs_mount *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + max((xfs_calc_inode_res(mp, 5) + xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1))), (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3), @@ -473,11 +473,11 @@ xfs_calc_iunlink_remove_reservation( */ STATIC uint xfs_calc_link_reservation( struct xfs_mount *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + xfs_calc_iunlink_remove_reservation(mp) + max((xfs_calc_inode_res(mp, 2) + xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1))), (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) + @@ -511,11 +511,11 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp) */ STATIC uint xfs_calc_remove_reservation( struct xfs_mount *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + xfs_calc_iunlink_add_reservation(mp) + max((xfs_calc_inode_res(mp, 2) + xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1))), (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) + @@ -570,20 +570,20 @@ xfs_calc_icreate_resv_alloc( }
STATIC uint xfs_calc_icreate_reservation(xfs_mount_t *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + max(xfs_calc_icreate_resv_alloc(mp), xfs_calc_create_resv_modify(mp)); }
STATIC uint xfs_calc_create_tmpfile_reservation( struct xfs_mount *mp) { - uint res = XFS_DQUOT_LOGRES(mp); + uint res = XFS_DQUOT_LOGRES;
res += xfs_calc_icreate_resv_alloc(mp); return res + xfs_calc_iunlink_add_reservation(mp); }
@@ -628,28 +628,28 @@ xfs_calc_symlink_reservation( */ STATIC uint xfs_calc_ifree_reservation( struct xfs_mount *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) + xfs_calc_iunlink_remove_reservation(mp) + xfs_calc_inode_chunk_res(mp, _FREE) + xfs_calc_inobt_res(mp) + xfs_calc_finobt_res(mp); }
/* * When only changing the inode we log the inode and possibly the superblock * We also add a bit of slop for the transaction stuff. */ STATIC uint xfs_calc_ichange_reservation( struct xfs_mount *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
}
@@ -754,11 +754,11 @@ xfs_calc_writeid_reservation( */ STATIC uint xfs_calc_addafork_reservation( struct xfs_mount *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(1, mp->m_dir_geo->blksize) + xfs_calc_buf_res(XFS_DAENTER_BMAP1B(mp, XFS_DATA_FORK) + 1, XFS_FSB_TO_B(mp, 1)) + @@ -802,11 +802,11 @@ xfs_calc_attrinval_reservation( */ STATIC uint xfs_calc_attrsetm_reservation( struct xfs_mount *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(1, mp->m_sb.sb_sectsize) + xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH, XFS_FSB_TO_B(mp, 1)); }
@@ -842,11 +842,11 @@ xfs_calc_attrsetrt_reservation( */ STATIC uint xfs_calc_attrrm_reservation( struct xfs_mount *mp) { - return XFS_DQUOT_LOGRES(mp) + + return XFS_DQUOT_LOGRES + max((xfs_calc_inode_res(mp, 1) + xfs_calc_buf_res(XFS_DA_NODE_MAXDEPTH, XFS_FSB_TO_B(mp, 1)) + (uint)XFS_FSB_TO_B(mp, XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)) +
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 73c34b0b85d46bf9c2c0b367aeaffa1e2481b136 ]
It turns out that I misunderstood the difference between the attr and attr2 feature bits. "attr" means that at some point an attr fork was created somewhere in the filesystem. "attr2" means that inodes have variable-sized forks, but says nothing about whether or not there actually /are/ attr forks in the system.
If we have an attr fork, we only need to check that attr is set.
Fixes: 99d9d8d05da26 ("xfs: scrub inode block mappings") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/scrub/bmap.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index f0b9cb6506fd..45b135929144 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -645,11 +645,17 @@ xchk_bmap( xchk_ino_set_corrupt(sc, sc->ip->i_ino); goto out; } break; case XFS_ATTR_FORK: - if (!xfs_has_attr(mp) && !xfs_has_attr2(mp)) + /* + * "attr" means that an attr fork was created at some point in + * the life of this filesystem. "attr2" means that inodes have + * variable-sized data/attr fork areas. Hence we only check + * attr here. + */ + if (!xfs_has_attr(mp)) xchk_ino_set_corrupt(sc, sc->ip->i_ino); break; default: ASSERT(whichfork == XFS_DATA_FORK); break;
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 8d16762047c627073955b7ed171a36addaf7b1ff ]
If a file has the S_DAX flag (aka fsdax access mode) set, we cannot allow users to change the realtime flag unless the datadev and rtdev both support fsdax access modes. Even if there are no extents allocated to the file, the setattr thread could be racing with another thread that has already started down the write code paths.
Fixes: ba23cba9b3bdc ("fs: allow per-device dax status checking for filesystems") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_ioctl.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 1afb1b1b831e..ef3dc0778566 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1126,10 +1126,21 @@ xfs_ioctl_setattr_xflags(
if (rtflag != XFS_IS_REALTIME_INODE(ip)) { /* Can't change realtime flag if any extents are allocated. */ if (ip->i_df.if_nextents || ip->i_delayed_blks) return -EINVAL; + + /* + * If S_DAX is enabled on this file, we can only switch the + * device if both support fsdax. We can't update S_DAX because + * there might be other threads walking down the access paths. + */ + if (IS_DAX(VFS_I(ip)) && + (mp->m_ddev_targp->bt_daxdev == NULL || + (mp->m_rtdev_targp && + mp->m_rtdev_targp->bt_daxdev == NULL))) + return -EINVAL; }
if (rtflag) { /* If realtime flag is set then must have realtime device */ if (mp->m_sb.sb_rblocks == 0 || mp->m_sb.sb_rextsize == 0 ||
From: Zizhi Wo wozizhi@huawei.com
[ Upstream commit 68415b349f3f16904f006275757f4fcb34b8ee43 ]
I notice a rmap query bug in xfs_io fsmap: [root@fedora ~]# xfs_io -c 'fsmap -vvvv' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8 1: 253:16 [8..23]: per-AG metadata 0 (8..23) 16 2: 253:16 [24..39]: inode btree 0 (24..39) 16 3: 253:16 [40..47]: per-AG metadata 0 (40..47) 8 4: 253:16 [48..55]: refcount btree 0 (48..55) 8 5: 253:16 [56..103]: per-AG metadata 0 (56..103) 48 6: 253:16 [104..127]: free space 0 (104..127) 24 ......
Bug: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt [root@fedora ~]# Normally, we should be able to get one record, but we got nothing.
The root cause of this problem lies in the incorrect setting of rm_owner in the rmap query. In the case of the initial query where the owner is not set, __xfs_getfsmap_datadev() first sets info->high.rm_owner to ULLONG_MAX. This is done to prevent any omissions when comparing rmap items. However, if the current ag is detected to be the last one, the function sets info's high_irec based on the provided key. If high->rm_owner is not specified, it should continue to be set to ULLONG_MAX; otherwise, there will be issues with interval omissions. For example, consider "start" and "end" within the same block. If high->rm_owner == 0, it will be smaller than the founded record in rmapbt, resulting in a query with no records. The main call stack is as follows:
xfs_ioc_getfsmap xfs_getfsmap xfs_getfsmap_datadev_rmapbt __xfs_getfsmap_datadev info->high.rm_owner = ULLONG_MAX if (pag->pag_agno == end_ag) xfs_fsmap_owner_to_rmap // set info->high.rm_owner = 0 because fmr_owner == -1ULL dest->rm_owner = 0 // get nothing xfs_getfsmap_datadev_rmapbt_query
The problem can be resolved by simply modify the xfs_fsmap_owner_to_rmap function internal logic to achieve.
After applying this patch, the above problem have been solved: [root@fedora ~]# xfs_io -c 'fsmap -vvvv -d 0 3' /mnt EXT: DEV BLOCK-RANGE OWNER FILE-OFFSET AG AG-OFFSET TOTAL 0: 253:16 [0..7]: static fs metadata 0 (0..7) 8
Fixes: e89c041338ed ("xfs: implement the GETFSMAP ioctl") Signed-off-by: Zizhi Wo wozizhi@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 956a5670e56c..1efd18437ca4 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -69,11 +69,11 @@ xfs_fsmap_owner_to_rmap( }
switch (src->fmr_owner) { case 0: /* "lowest owner id possible" */ case -1ULL: /* "highest owner id possible" */ - dest->rm_owner = 0; + dest->rm_owner = src->fmr_owner; break; case XFS_FMR_OWN_FREE: dest->rm_owner = XFS_RMAP_OWN_NULL; break; case XFS_FMR_OWN_UNKNOWN:
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 6b35cc8d9239569700cc7cc737c8ed40b8b9cfdb ]
Use XFS_BUF_DADDR_NULL (instead of a magic sentinel value) to mean "this field is null" like the rest of xfs.
Cc: wozizhi@huawei.com Fixes: e89c041338ed6 ("xfs: implement the GETFSMAP ioctl") Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_fsmap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 1efd18437ca4..a0668a1ef100 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -250,11 +250,11 @@ static inline bool xfs_getfsmap_rec_before_start( struct xfs_getfsmap_info *info, const struct xfs_rmap_irec *rec, xfs_daddr_t rec_daddr) { - if (info->low_daddr != -1ULL) + if (info->low_daddr != XFS_BUF_DADDR_NULL) return rec_daddr < info->low_daddr; if (info->low.rm_blockcount) return xfs_rmap_compare(rec, &info->low) < 0; return false; } @@ -984,11 +984,11 @@ xfs_getfsmap( break;
info.dev = handlers[i].dev; info.last = false; info.pag = NULL; - info.low_daddr = -1ULL; + info.low_daddr = XFS_BUF_DADDR_NULL; info.low.rm_blockcount = 0; error = handlers[i].fn(tp, dkeys, &info); if (error) break; xfs_trans_cancel(tp);
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 16e1fbdce9c8d084863fd63cdaff8fb2a54e2f88 ]
Take the grow lock when we're expanding the realtime volume, like we do for the other growfs calls.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_rtalloc.c | 38 +++++++++++++++++++++++++------------- 1 file changed, 25 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 5cf1e91f4c20..149fcfc485d8 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -951,83 +951,93 @@ xfs_growfs_rt( return -EPERM;
/* Needs to have been mounted with an rt device. */ if (!XFS_IS_REALTIME_MOUNT(mp)) return -EINVAL; + + if (!mutex_trylock(&mp->m_growlock)) + return -EWOULDBLOCK; /* * Mount should fail if the rt bitmap/summary files don't load, but * we'll check anyway. */ + error = -EINVAL; if (!mp->m_rbmip || !mp->m_rsumip) - return -EINVAL; + goto out_unlock;
/* Shrink not supported. */ if (in->newblocks <= sbp->sb_rblocks) - return -EINVAL; + goto out_unlock;
/* Can only change rt extent size when adding rt volume. */ if (sbp->sb_rblocks > 0 && in->extsize != sbp->sb_rextsize) - return -EINVAL; + goto out_unlock;
/* Range check the extent size. */ if (XFS_FSB_TO_B(mp, in->extsize) > XFS_MAX_RTEXTSIZE || XFS_FSB_TO_B(mp, in->extsize) < XFS_MIN_RTEXTSIZE) - return -EINVAL; + goto out_unlock;
/* Unsupported realtime features. */ + error = -EOPNOTSUPP; if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp) || xfs_has_quota(mp)) - return -EOPNOTSUPP; + goto out_unlock;
nrblocks = in->newblocks; error = xfs_sb_validate_fsb_count(sbp, nrblocks); if (error) - return error; + goto out_unlock; /* * Read in the last block of the device, make sure it exists. */ error = xfs_buf_read_uncached(mp->m_rtdev_targp, XFS_FSB_TO_BB(mp, nrblocks - 1), XFS_FSB_TO_BB(mp, 1), 0, &bp, NULL); if (error) - return error; + goto out_unlock; xfs_buf_relse(bp);
/* * Calculate new parameters. These are the final values to be reached. */ nrextents = nrblocks; do_div(nrextents, in->extsize); - if (!xfs_validate_rtextents(nrextents)) - return -EINVAL; + if (!xfs_validate_rtextents(nrextents)) { + error = -EINVAL; + goto out_unlock; + } nrbmblocks = howmany_64(nrextents, NBBY * sbp->sb_blocksize); nrextslog = xfs_compute_rextslog(nrextents); nrsumlevels = nrextslog + 1; nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nrbmblocks; nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize); nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks); /* * New summary size can't be more than half the size of * the log. This prevents us from getting a log overflow, * since we'll log basically the whole summary file at once. */ - if (nrsumblocks > (mp->m_sb.sb_logblocks >> 1)) - return -EINVAL; + if (nrsumblocks > (mp->m_sb.sb_logblocks >> 1)) { + error = -EINVAL; + goto out_unlock; + } + /* * Get the old block counts for bitmap and summary inodes. * These can't change since other growfs callers are locked out. */ rbmblocks = XFS_B_TO_FSB(mp, mp->m_rbmip->i_disk_size); rsumblocks = XFS_B_TO_FSB(mp, mp->m_rsumip->i_disk_size); /* * Allocate space to the bitmap and summary files, as necessary. */ error = xfs_growfs_rt_alloc(mp, rbmblocks, nrbmblocks, mp->m_rbmip); if (error) - return error; + goto out_unlock; error = xfs_growfs_rt_alloc(mp, rsumblocks, nrsumblocks, mp->m_rsumip); if (error) - return error; + goto out_unlock;
rsum_cache = mp->m_rsum_cache; if (nrbmblocks != sbp->sb_rbmblocks) xfs_alloc_rsum_cache(mp, nrbmblocks);
@@ -1188,10 +1198,12 @@ xfs_growfs_rt( } else { kmem_free(rsum_cache); } }
+out_unlock: + mutex_unlock(&mp->m_growlock); return error; }
/* * Allocate an extent in the realtime subvolume, with the usual allocation
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit a24cae8fc1f13f6f6929351309f248fd2e9351ce ]
If growfsrt is run on a filesystem that doesn't have a rt volume, it's possible to change the rt extent size. If the root directory was previously set up with an inherited extent size hint and rtinherit, it's possible that the hint is no longer a multiple of the rt extent size. Although the verifiers don't complain about this, xfs_repair will, so if we detect this situation, log the root directory to clean it up. This is still racy, but it's better than nothing.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_rtalloc.c | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 149fcfc485d8..fc21b4e81ade 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -913,10 +913,43 @@ xfs_alloc_rsum_cache( mp->m_rsum_cache = kvzalloc(rbmblocks, GFP_KERNEL); if (!mp->m_rsum_cache) xfs_warn(mp, "could not allocate realtime summary cache"); }
+/* + * If we changed the rt extent size (meaning there was no rt volume previously) + * and the root directory had EXTSZINHERIT and RTINHERIT set, it's possible + * that the extent size hint on the root directory is no longer congruent with + * the new rt extent size. Log the rootdir inode to fix this. + */ +static int +xfs_growfs_rt_fixup_extsize( + struct xfs_mount *mp) +{ + struct xfs_inode *ip = mp->m_rootip; + struct xfs_trans *tp; + int error = 0; + + xfs_ilock(ip, XFS_IOLOCK_EXCL); + if (!(ip->i_diflags & XFS_DIFLAG_RTINHERIT) || + !(ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT)) + goto out_iolock; + + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_ichange, 0, 0, false, + &tp); + if (error) + goto out_iolock; + + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + error = xfs_trans_commit(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); + +out_iolock: + xfs_iunlock(ip, XFS_IOLOCK_EXCL); + return error; +} + /* * Visible (exported) functions. */
/* @@ -942,10 +975,11 @@ xfs_growfs_rt( xfs_extlen_t rbmblocks; /* current number of rt bitmap blocks */ xfs_extlen_t rsumblocks; /* current number of rt summary blks */ xfs_sb_t *sbp; /* old superblock */ xfs_fsblock_t sumbno; /* summary block number */ uint8_t *rsum_cache; /* old summary cache */ + xfs_agblock_t old_rextsize = mp->m_sb.sb_rextsize;
sbp = &mp->m_sb;
if (!capable(CAP_SYS_ADMIN)) return -EPERM; @@ -1175,10 +1209,16 @@ xfs_growfs_rt( mp->m_features |= XFS_FEAT_REALTIME; } if (error) goto out_free;
+ if (old_rextsize != in->extsize) { + error = xfs_growfs_rt_fixup_extsize(mp); + if (error) + goto out_free; + } + /* Update secondary superblocks now the physical grow has completed */ error = xfs_update_secondary_sbs(mp);
out_free: /*
On Wed, Jun 11, 2025 at 02:01:04PM -0700, Leah Rumancik wrote:
Hello again,
This is a series for 6.1.y for fixes from 6.11. It corresponds to the 6.6.y series here: https://lore.kernel.org/linux-xfs/20241218191725.63098-1-catherine.hoang@ora...
Queued up, thanks!
linux-stable-mirror@lists.linaro.org