Hello,
(This series has already been ack'd on the xfs-stable mailing list.)
Here is the 6.1.y series corresponding to the 6.6.y series for 6.8 (https://lore.kernel.org/all/20240325220724.42216-1-catherine.hoang@oracle.co...).
Descrepancies between the patch series are as follows...
The following were added as dependencies (9 patches):
0b11553ec54a6d88907e60d0595dbcef98539747 xfs: pass refcount intent directly through the log intent code (v6.3-rc1~142^2~5)
72ba455599ad13d08c29dafa22a32360e07b1961 xfs: pass xfs_extent_free_item directly through the log intent code (v6.3-rc1~142^2~9)
578c714b215d474c52949e65a914dae67924f0fe xfs: fix confusing xfs_extent_item variable names (v6.3-rc1~142^2~8)
ddccb81b26ec021ae1f3366aa996cc4c68dd75ce xfs: pass the xfs_bmbt_irec directly through the log intent code (v6.3-rc1~142^2~11)
b2ccab3199aa7cea9154d80ea2585312c5f6eba0 xfs: pass per-ag references to xfs_free_extent (v6.4-rc1~80^2~22^2~3)
7dfee17b13e5024c5c0ab1911859ded4182de3e5 xfs: validate block number being freed before adding to xefi (v6.4-rc6~19^2~1)
fix of 7dfee17b13e: 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070 xfs: fix bounds check in xfs_defer_agfl_block() (v6.5-rc1~44^2~10)
b742d7b4f0e03df25c2a772adcded35044b625ca xfs: use deferred frees for btree block freeing (v6.5-rc1~44^2~16)
3c919b0910906cc69d76dea214776f0eac73358b xfs: reserve less log space when recovering log intent items (v6.6-rc3~13^2~5^2)
And the following were skipped for 6.1.y (4 patches):
fb6e584e74710a1b7caee9dac59b494a37e07a62 (scrub) xfs: make xchk_iget safer in the presence of corrupt inode btrees
c0e37f07d2bd3c1ee3fb5a650da7d8673557ed16 (scrub) xfs: fix an off-by-one error in xreap_agextent_binval
b9358db0a811ff698b0a743bcfb80dfc44b88ebd (scrub) xfs: add missing nrext64 inode flag check to scrub
84712492e6dab803bf595fb8494d11098b74a652 (already in 6.1.y) xfs: short circuit xfs_growfs_data_private() if delta is zero
The auto group was run 1x on each of these configs:
xfs/4k xfs/1k xfs/logdev xfs/realtime xfs/quota xfs/v4 xfs/dax xfs/adv xfs/dirblock_8k
and no regressions were seen.
Let me know if you see any issues. Thanks, Leah
Andrey Albershteyn (1): xfs: reset XFS_ATTR_INCOMPLETE filter on node removal
Christoph Hellwig (1): xfs: consider minlen sized extents in xfs_rtallocate_extent_block
Darrick J. Wong (19): xfs: pass refcount intent directly through the log intent code xfs: pass xfs_extent_free_item directly through the log intent code xfs: fix confusing xfs_extent_item variable names xfs: pass the xfs_bmbt_irec directly through the log intent code xfs: pass per-ag references to xfs_free_extent xfs: reserve less log space when recovering log intent items xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t xfs: don't leak recovered attri intent items xfs: use xfs_defer_pending objects to recover intent items xfs: pass the xfs_defer_pending object to iop_recover xfs: transfer recovered intent item ownership in ->iop_recover xfs: make rextslog computation consistent with mkfs xfs: fix 32-bit truncation in xfs_compute_rextslog xfs: don't allow overly small or large realtime volumes xfs: remove unused fields from struct xbtree_ifakeroot xfs: recompute growfsrtfree transaction reservation while growing rt volume xfs: force all buffers to be written during btree bulk load xfs: remove conditional building of rt geometry validator functions
Dave Chinner (4): xfs: validate block number being freed before adding to xefi xfs: fix bounds check in xfs_defer_agfl_block() xfs: use deferred frees for btree block freeing xfs: initialise di_crc in xfs_log_dinode
Jiachen Zhang (1): xfs: ensure logflagsp is initialized in xfs_bmap_del_extent_real
Long Li (2): xfs: add lock protection when remove perag from radix tree xfs: fix perag leak when growfs fails
Zhang Tianci (1): xfs: update dir3 leaf block metadata after swap
fs/xfs/libxfs/xfs_ag.c | 45 ++++++++--- fs/xfs/libxfs/xfs_ag.h | 3 + fs/xfs/libxfs/xfs_alloc.c | 70 ++++++++++------- fs/xfs/libxfs/xfs_alloc.h | 20 +++-- fs/xfs/libxfs/xfs_attr.c | 6 +- fs/xfs/libxfs/xfs_bmap.c | 121 ++++++++++++++-------------- fs/xfs/libxfs/xfs_bmap.h | 5 +- fs/xfs/libxfs/xfs_bmap_btree.c | 8 +- fs/xfs/libxfs/xfs_btree_staging.c | 4 +- fs/xfs/libxfs/xfs_btree_staging.h | 6 -- fs/xfs/libxfs/xfs_da_btree.c | 7 ++ fs/xfs/libxfs/xfs_defer.c | 103 +++++++++++++++++------- fs/xfs/libxfs/xfs_defer.h | 5 ++ fs/xfs/libxfs/xfs_format.h | 2 +- fs/xfs/libxfs/xfs_ialloc.c | 24 ++++-- fs/xfs/libxfs/xfs_ialloc_btree.c | 6 +- fs/xfs/libxfs/xfs_log_recover.h | 27 +++++++ fs/xfs/libxfs/xfs_refcount.c | 116 +++++++++++++-------------- fs/xfs/libxfs/xfs_refcount.h | 4 +- fs/xfs/libxfs/xfs_refcount_btree.c | 9 +-- fs/xfs/libxfs/xfs_rtbitmap.c | 2 + fs/xfs/libxfs/xfs_rtbitmap.h | 83 ++++++++++++++++++++ fs/xfs/libxfs/xfs_sb.c | 20 ++++- fs/xfs/libxfs/xfs_sb.h | 2 + fs/xfs/libxfs/xfs_types.h | 13 +++ fs/xfs/scrub/repair.c | 3 +- fs/xfs/scrub/rtbitmap.c | 3 +- fs/xfs/xfs_attr_item.c | 30 +++---- fs/xfs/xfs_bmap_item.c | 99 ++++++++++------------- fs/xfs/xfs_buf.c | 44 ++++++++++- fs/xfs/xfs_buf.h | 1 + fs/xfs/xfs_extfree_item.c | 122 ++++++++++++++++------------- fs/xfs/xfs_fsmap.c | 2 +- fs/xfs/xfs_fsops.c | 5 +- fs/xfs/xfs_inode_item.c | 3 + fs/xfs/xfs_log.c | 1 + fs/xfs/xfs_log_priv.h | 1 + fs/xfs/xfs_log_recover.c | 118 +++++++++++++++------------- fs/xfs/xfs_refcount_item.c | 81 +++++++++---------- fs/xfs/xfs_reflink.c | 7 +- fs/xfs/xfs_rmap_item.c | 20 ++--- fs/xfs/xfs_rtalloc.c | 14 +++- fs/xfs/xfs_rtalloc.h | 73 ----------------- fs/xfs/xfs_trace.h | 15 +--- fs/xfs/xfs_trans.h | 4 +- 45 files changed, 782 insertions(+), 575 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtbitmap.h
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 0b11553ec54a6d88907e60d0595dbcef98539747 ]
Pass the incore refcount intent through the CUI logging code instead of repeatedly boxing and unboxing parameters.
Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_refcount.c | 96 ++++++++++++++++-------------------- fs/xfs/libxfs/xfs_refcount.h | 4 +- fs/xfs/xfs_refcount_item.c | 62 ++++++++++------------- fs/xfs/xfs_trace.h | 15 ++---- 4 files changed, 74 insertions(+), 103 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 6f7ed9288fe4..bcf46aa0d08b 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1211,60 +1211,56 @@ xfs_refcount_adjust_extents(
/* Adjust the reference count of a range of AG blocks. */ STATIC int xfs_refcount_adjust( struct xfs_btree_cur *cur, - xfs_agblock_t agbno, - xfs_extlen_t aglen, - xfs_agblock_t *new_agbno, - xfs_extlen_t *new_aglen, + xfs_agblock_t *agbno, + xfs_extlen_t *aglen, enum xfs_refc_adjust_op adj) { bool shape_changed; int shape_changes = 0; int error;
- *new_agbno = agbno; - *new_aglen = aglen; if (adj == XFS_REFCOUNT_ADJUST_INCREASE) - trace_xfs_refcount_increase(cur->bc_mp, cur->bc_ag.pag->pag_agno, - agbno, aglen); + trace_xfs_refcount_increase(cur->bc_mp, + cur->bc_ag.pag->pag_agno, *agbno, *aglen); else - trace_xfs_refcount_decrease(cur->bc_mp, cur->bc_ag.pag->pag_agno, - agbno, aglen); + trace_xfs_refcount_decrease(cur->bc_mp, + cur->bc_ag.pag->pag_agno, *agbno, *aglen);
/* * Ensure that no rcextents cross the boundary of the adjustment range. */ error = xfs_refcount_split_extent(cur, XFS_REFC_DOMAIN_SHARED, - agbno, &shape_changed); + *agbno, &shape_changed); if (error) goto out_error; if (shape_changed) shape_changes++;
error = xfs_refcount_split_extent(cur, XFS_REFC_DOMAIN_SHARED, - agbno + aglen, &shape_changed); + *agbno + *aglen, &shape_changed); if (error) goto out_error; if (shape_changed) shape_changes++;
/* * Try to merge with the left or right extents of the range. */ error = xfs_refcount_merge_extents(cur, XFS_REFC_DOMAIN_SHARED, - new_agbno, new_aglen, adj, &shape_changed); + agbno, aglen, adj, &shape_changed); if (error) goto out_error; if (shape_changed) shape_changes++; if (shape_changes) cur->bc_ag.refc.shape_changes++;
/* Now that we've taken care of the ends, adjust the middle extents */ - error = xfs_refcount_adjust_extents(cur, new_agbno, new_aglen, adj); + error = xfs_refcount_adjust_extents(cur, agbno, aglen, adj); if (error) goto out_error;
return 0;
@@ -1296,62 +1292,56 @@ xfs_refcount_finish_one_cleanup( * Checks to make sure we're not going to run off the end of the AG. */ static inline int xfs_refcount_continue_op( struct xfs_btree_cur *cur, - xfs_fsblock_t startblock, - xfs_agblock_t new_agbno, - xfs_extlen_t new_len, - xfs_fsblock_t *new_fsbno) + struct xfs_refcount_intent *ri, + xfs_agblock_t new_agbno) { struct xfs_mount *mp = cur->bc_mp; struct xfs_perag *pag = cur->bc_ag.pag;
- if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno, new_len))) + if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno, + ri->ri_blockcount))) return -EFSCORRUPTED;
- *new_fsbno = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno); + ri->ri_startblock = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno);
- ASSERT(xfs_verify_fsbext(mp, *new_fsbno, new_len)); - ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, *new_fsbno)); + ASSERT(xfs_verify_fsbext(mp, ri->ri_startblock, ri->ri_blockcount)); + ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, ri->ri_startblock));
return 0; }
/* * Process one of the deferred refcount operations. We pass back the * btree cursor to maintain our lock on the btree between calls. * This saves time and eliminates a buffer deadlock between the * superblock and the AGF because we'll always grab them in the same * order. */ int xfs_refcount_finish_one( struct xfs_trans *tp, - enum xfs_refcount_intent_type type, - xfs_fsblock_t startblock, - xfs_extlen_t blockcount, - xfs_fsblock_t *new_fsb, - xfs_extlen_t *new_len, + struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur) { struct xfs_mount *mp = tp->t_mountp; struct xfs_btree_cur *rcur; struct xfs_buf *agbp = NULL; int error = 0; xfs_agblock_t bno; - xfs_agblock_t new_agbno; unsigned long nr_ops = 0; int shape_changes = 0; struct xfs_perag *pag;
- pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, startblock)); - bno = XFS_FSB_TO_AGBNO(mp, startblock); + pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ri->ri_startblock)); + bno = XFS_FSB_TO_AGBNO(mp, ri->ri_startblock);
- trace_xfs_refcount_deferred(mp, XFS_FSB_TO_AGNO(mp, startblock), - type, XFS_FSB_TO_AGBNO(mp, startblock), - blockcount); + trace_xfs_refcount_deferred(mp, XFS_FSB_TO_AGNO(mp, ri->ri_startblock), + ri->ri_type, XFS_FSB_TO_AGBNO(mp, ri->ri_startblock), + ri->ri_blockcount);
if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) { error = -EIO; goto out_drop; } @@ -1378,46 +1368,46 @@ xfs_refcount_finish_one( rcur->bc_ag.refc.nr_ops = nr_ops; rcur->bc_ag.refc.shape_changes = shape_changes; } *pcur = rcur;
- switch (type) { + switch (ri->ri_type) { case XFS_REFCOUNT_INCREASE: - error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno, - new_len, XFS_REFCOUNT_ADJUST_INCREASE); + error = xfs_refcount_adjust(rcur, &bno, &ri->ri_blockcount, + XFS_REFCOUNT_ADJUST_INCREASE); if (error) goto out_drop; - if (*new_len > 0) - error = xfs_refcount_continue_op(rcur, startblock, - new_agbno, *new_len, new_fsb); + if (ri->ri_blockcount > 0) + error = xfs_refcount_continue_op(rcur, ri, bno); break; case XFS_REFCOUNT_DECREASE: - error = xfs_refcount_adjust(rcur, bno, blockcount, &new_agbno, - new_len, XFS_REFCOUNT_ADJUST_DECREASE); + error = xfs_refcount_adjust(rcur, &bno, &ri->ri_blockcount, + XFS_REFCOUNT_ADJUST_DECREASE); if (error) goto out_drop; - if (*new_len > 0) - error = xfs_refcount_continue_op(rcur, startblock, - new_agbno, *new_len, new_fsb); + if (ri->ri_blockcount > 0) + error = xfs_refcount_continue_op(rcur, ri, bno); break; case XFS_REFCOUNT_ALLOC_COW: - *new_fsb = startblock + blockcount; - *new_len = 0; - error = __xfs_refcount_cow_alloc(rcur, bno, blockcount); + error = __xfs_refcount_cow_alloc(rcur, bno, ri->ri_blockcount); + if (error) + goto out_drop; + ri->ri_blockcount = 0; break; case XFS_REFCOUNT_FREE_COW: - *new_fsb = startblock + blockcount; - *new_len = 0; - error = __xfs_refcount_cow_free(rcur, bno, blockcount); + error = __xfs_refcount_cow_free(rcur, bno, ri->ri_blockcount); + if (error) + goto out_drop; + ri->ri_blockcount = 0; break; default: ASSERT(0); error = -EFSCORRUPTED; } - if (!error && *new_len > 0) - trace_xfs_refcount_finish_one_leftover(mp, pag->pag_agno, type, - bno, blockcount, new_agbno, *new_len); + if (!error && ri->ri_blockcount > 0) + trace_xfs_refcount_finish_one_leftover(mp, pag->pag_agno, + ri->ri_type, bno, ri->ri_blockcount); out_drop: xfs_perag_put(pag); return error; }
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h index 452f30556f5a..c633477ce3ce 100644 --- a/fs/xfs/libxfs/xfs_refcount.h +++ b/fs/xfs/libxfs/xfs_refcount.h @@ -73,13 +73,11 @@ void xfs_refcount_decrease_extent(struct xfs_trans *tp, struct xfs_bmbt_irec *irec);
extern void xfs_refcount_finish_one_cleanup(struct xfs_trans *tp, struct xfs_btree_cur *rcur, int error); extern int xfs_refcount_finish_one(struct xfs_trans *tp, - enum xfs_refcount_intent_type type, xfs_fsblock_t startblock, - xfs_extlen_t blockcount, xfs_fsblock_t *new_fsb, - xfs_extlen_t *new_len, struct xfs_btree_cur **pcur); + struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur);
extern int xfs_refcount_find_shared(struct xfs_btree_cur *cur, xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno, xfs_extlen_t *flen, bool find_end_of_shared);
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 858e3e9eb4a8..ff4d5087ba00 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -250,21 +250,16 @@ xfs_trans_get_cud( */ static int xfs_trans_log_finish_refcount_update( struct xfs_trans *tp, struct xfs_cud_log_item *cudp, - enum xfs_refcount_intent_type type, - xfs_fsblock_t startblock, - xfs_extlen_t blockcount, - xfs_fsblock_t *new_fsb, - xfs_extlen_t *new_len, + struct xfs_refcount_intent *ri, struct xfs_btree_cur **pcur) { int error;
- error = xfs_refcount_finish_one(tp, type, startblock, - blockcount, new_fsb, new_len, pcur); + error = xfs_refcount_finish_one(tp, ri, pcur);
/* * Mark the transaction dirty, even on error. This ensures the * transaction is aborted, which: * @@ -376,29 +371,24 @@ xfs_refcount_update_finish_item( struct xfs_trans *tp, struct xfs_log_item *done, struct list_head *item, struct xfs_btree_cur **state) { - struct xfs_refcount_intent *refc; - xfs_fsblock_t new_fsb; - xfs_extlen_t new_aglen; + struct xfs_refcount_intent *ri; int error;
- refc = container_of(item, struct xfs_refcount_intent, ri_list); - error = xfs_trans_log_finish_refcount_update(tp, CUD_ITEM(done), - refc->ri_type, refc->ri_startblock, refc->ri_blockcount, - &new_fsb, &new_aglen, state); + ri = container_of(item, struct xfs_refcount_intent, ri_list); + error = xfs_trans_log_finish_refcount_update(tp, CUD_ITEM(done), ri, + state);
/* Did we run out of reservation? Requeue what we didn't finish. */ - if (!error && new_aglen > 0) { - ASSERT(refc->ri_type == XFS_REFCOUNT_INCREASE || - refc->ri_type == XFS_REFCOUNT_DECREASE); - refc->ri_startblock = new_fsb; - refc->ri_blockcount = new_aglen; + if (!error && ri->ri_blockcount > 0) { + ASSERT(ri->ri_type == XFS_REFCOUNT_INCREASE || + ri->ri_type == XFS_REFCOUNT_DECREASE); return -EAGAIN; } - kmem_cache_free(xfs_refcount_intent_cache, refc); + kmem_cache_free(xfs_refcount_intent_cache, ri); return error; }
/* Abort all pending CUIs. */ STATIC void @@ -461,22 +451,17 @@ xfs_cui_validate_phys( STATIC int xfs_cui_item_recover( struct xfs_log_item *lip, struct list_head *capture_list) { - struct xfs_bmbt_irec irec; struct xfs_cui_log_item *cuip = CUI_ITEM(lip); - struct xfs_phys_extent *refc; struct xfs_cud_log_item *cudp; struct xfs_trans *tp; struct xfs_btree_cur *rcur = NULL; struct xfs_mount *mp = lip->li_log->l_mp; - xfs_fsblock_t new_fsb; - xfs_extlen_t new_len; unsigned int refc_type; bool requeue_only = false; - enum xfs_refcount_intent_type type; int i; int error = 0;
/* * First check the validity of the extents described by the @@ -511,45 +496,50 @@ xfs_cui_item_recover( return error;
cudp = xfs_trans_get_cud(tp, cuip);
for (i = 0; i < cuip->cui_format.cui_nextents; i++) { + struct xfs_refcount_intent fake = { }; + struct xfs_phys_extent *refc; + refc = &cuip->cui_format.cui_extents[i]; refc_type = refc->pe_flags & XFS_REFCOUNT_EXTENT_TYPE_MASK; switch (refc_type) { case XFS_REFCOUNT_INCREASE: case XFS_REFCOUNT_DECREASE: case XFS_REFCOUNT_ALLOC_COW: case XFS_REFCOUNT_FREE_COW: - type = refc_type; + fake.ri_type = refc_type; break; default: XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &cuip->cui_format, sizeof(cuip->cui_format)); error = -EFSCORRUPTED; goto abort_error; } - if (requeue_only) { - new_fsb = refc->pe_startblock; - new_len = refc->pe_len; - } else + + fake.ri_startblock = refc->pe_startblock; + fake.ri_blockcount = refc->pe_len; + if (!requeue_only) error = xfs_trans_log_finish_refcount_update(tp, cudp, - type, refc->pe_startblock, refc->pe_len, - &new_fsb, &new_len, &rcur); + &fake, &rcur); if (error == -EFSCORRUPTED) XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &cuip->cui_format, sizeof(cuip->cui_format)); if (error) goto abort_error;
/* Requeue what we didn't finish. */ - if (new_len > 0) { - irec.br_startblock = new_fsb; - irec.br_blockcount = new_len; - switch (type) { + if (fake.ri_blockcount > 0) { + struct xfs_bmbt_irec irec = { + .br_startblock = fake.ri_startblock, + .br_blockcount = fake.ri_blockcount, + }; + + switch (fake.ri_type) { case XFS_REFCOUNT_INCREASE: xfs_refcount_increase_extent(tp, &irec); break; case XFS_REFCOUNT_DECREASE: xfs_refcount_decrease_extent(tp, &irec); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 0cd62031e53f..20e2ec8b73aa 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3206,39 +3206,32 @@ DEFINE_AG_ERROR_EVENT(xfs_refcount_find_shared_error); DEFINE_REFCOUNT_DEFERRED_EVENT(xfs_refcount_defer); DEFINE_REFCOUNT_DEFERRED_EVENT(xfs_refcount_deferred);
TRACE_EVENT(xfs_refcount_finish_one_leftover, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, - int type, xfs_agblock_t agbno, xfs_extlen_t len, - xfs_agblock_t new_agbno, xfs_extlen_t new_len), - TP_ARGS(mp, agno, type, agbno, len, new_agbno, new_len), + int type, xfs_agblock_t agbno, xfs_extlen_t len), + TP_ARGS(mp, agno, type, agbno, len), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) __field(int, type) __field(xfs_agblock_t, agbno) __field(xfs_extlen_t, len) - __field(xfs_agblock_t, new_agbno) - __field(xfs_extlen_t, new_len) ), TP_fast_assign( __entry->dev = mp->m_super->s_dev; __entry->agno = agno; __entry->type = type; __entry->agbno = agbno; __entry->len = len; - __entry->new_agbno = new_agbno; - __entry->new_len = new_len; ), - TP_printk("dev %d:%d type %d agno 0x%x agbno 0x%x fsbcount 0x%x new_agbno 0x%x new_fsbcount 0x%x", + TP_printk("dev %d:%d type %d agno 0x%x agbno 0x%x fsbcount 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->type, __entry->agno, __entry->agbno, - __entry->len, - __entry->new_agbno, - __entry->new_len) + __entry->len) );
/* simple inode-based error/%ip tracepoint class */ DECLARE_EVENT_CLASS(xfs_inode_error_class, TP_PROTO(struct xfs_inode *ip, int error, unsigned long caller_ip),
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 0b11553ec54a6d88907e60d0595dbcef98539747
WARNING: Author mismatch between patch and upstream commit: Backport author: Leah Rumancikleah.rumancik@gmail.com Commit author: Darrick J. Wongdjwong@kernel.org
Status in newer kernel trees: 6.13.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- 1: 0b11553ec54a6 ! 1: 17e9e595a5fc0 xfs: pass refcount intent directly through the log intent code @@ Metadata ## Commit message ## xfs: pass refcount intent directly through the log intent code
+ [ Upstream commit 0b11553ec54a6d88907e60d0595dbcef98539747 ] + Pass the incore refcount intent through the CUI logging code instead of repeatedly boxing and unboxing parameters.
Signed-off-by: Darrick J. Wong djwong@kernel.org + Signed-off-by: Leah Rumancik leah.rumancik@gmail.com + Acked-by: "Darrick J. Wong" djwong@kernel.org
## fs/xfs/libxfs/xfs_refcount.c ## @@ fs/xfs/libxfs/xfs_refcount.c: xfs_refcount_adjust_extents( ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Success | Success |
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 72ba455599ad13d08c29dafa22a32360e07b1961 ]
Pass the incore xfs_extent_free_item through the EFI logging code instead of repeatedly boxing and unboxing parameters.
Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_extfree_item.c | 55 +++++++++++++++++++++------------------ 1 file changed, 30 insertions(+), 25 deletions(-)
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index d5130d1fcfae..618d2f9ff535 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -343,42 +343,49 @@ xfs_trans_get_efd( */ static int xfs_trans_free_extent( struct xfs_trans *tp, struct xfs_efd_log_item *efdp, - xfs_fsblock_t start_block, - xfs_extlen_t ext_len, - const struct xfs_owner_info *oinfo, - bool skip_discard) + struct xfs_extent_free_item *free) { + struct xfs_owner_info oinfo = { }; struct xfs_mount *mp = tp->t_mountp; struct xfs_extent *extp; uint next_extent; - xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, start_block); + xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, + free->xefi_startblock); xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(mp, - start_block); + free->xefi_startblock); int error;
- trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, ext_len); + oinfo.oi_owner = free->xefi_owner; + if (free->xefi_flags & XFS_EFI_ATTR_FORK) + oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK; + if (free->xefi_flags & XFS_EFI_BMBT_BLOCK) + oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK; + + trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, + free->xefi_blockcount);
- error = __xfs_free_extent(tp, start_block, ext_len, - oinfo, XFS_AG_RESV_NONE, skip_discard); + error = __xfs_free_extent(tp, free->xefi_startblock, + free->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, + free->xefi_flags & XFS_EFI_SKIP_DISCARD); /* * Mark the transaction dirty, even on error. This ensures the * transaction is aborted, which: * * 1.) releases the EFI and frees the EFD * 2.) shuts down the filesystem */ tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE; set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
next_extent = efdp->efd_next_extent; ASSERT(next_extent < efdp->efd_format.efd_nextents); extp = &(efdp->efd_format.efd_extents[next_extent]); - extp->ext_start = start_block; - extp->ext_len = ext_len; + extp->ext_start = free->xefi_startblock; + extp->ext_len = free->xefi_blockcount; efdp->efd_next_extent++;
return error; }
@@ -461,24 +468,16 @@ xfs_extent_free_finish_item( struct xfs_trans *tp, struct xfs_log_item *done, struct list_head *item, struct xfs_btree_cur **state) { - struct xfs_owner_info oinfo = { }; struct xfs_extent_free_item *free; int error;
free = container_of(item, struct xfs_extent_free_item, xefi_list); - oinfo.oi_owner = free->xefi_owner; - if (free->xefi_flags & XFS_EFI_ATTR_FORK) - oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK; - if (free->xefi_flags & XFS_EFI_BMBT_BLOCK) - oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK; - error = xfs_trans_free_extent(tp, EFD_ITEM(done), - free->xefi_startblock, - free->xefi_blockcount, - &oinfo, free->xefi_flags & XFS_EFI_SKIP_DISCARD); + + error = xfs_trans_free_extent(tp, EFD_ITEM(done), free); kmem_cache_free(xfs_extfree_item_cache, free); return error; }
/* Abort all pending EFIs. */ @@ -597,39 +596,45 @@ xfs_efi_item_recover( { struct xfs_efi_log_item *efip = EFI_ITEM(lip); struct xfs_mount *mp = lip->li_log->l_mp; struct xfs_efd_log_item *efdp; struct xfs_trans *tp; - struct xfs_extent *extp; int i; int error = 0;
/* * First check the validity of the extents described by the * EFI. If any are bad, then assume that all are bad and * just toss the EFI. */ for (i = 0; i < efip->efi_format.efi_nextents; i++) { if (!xfs_efi_validate_ext(mp, &efip->efi_format.efi_extents[i])) { XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &efip->efi_format, sizeof(efip->efi_format)); return -EFSCORRUPTED; } }
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp); if (error) return error; efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
for (i = 0; i < efip->efi_format.efi_nextents; i++) { + struct xfs_extent_free_item fake = { + .xefi_owner = XFS_RMAP_OWN_UNKNOWN, + }; + struct xfs_extent *extp; + extp = &efip->efi_format.efi_extents[i]; - error = xfs_trans_free_extent(tp, efdp, extp->ext_start, - extp->ext_len, - &XFS_RMAP_OINFO_ANY_OWNER, false); + + fake.xefi_startblock = extp->ext_start; + fake.xefi_blockcount = extp->ext_len; + + error = xfs_trans_free_extent(tp, efdp, &fake); if (error == -EFSCORRUPTED) XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, extp, sizeof(*extp)); if (error) goto abort_error;
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 72ba455599ad13d08c29dafa22a32360e07b1961
WARNING: Author mismatch between patch and upstream commit: Backport author: Leah Rumancikleah.rumancik@gmail.com Commit author: Darrick J. Wongdjwong@kernel.org
Status in newer kernel trees: 6.13.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- 1: 72ba455599ad1 ! 1: a4d4bccfa3b1e xfs: pass xfs_extent_free_item directly through the log intent code @@ Metadata ## Commit message ## xfs: pass xfs_extent_free_item directly through the log intent code
+ [ Upstream commit 72ba455599ad13d08c29dafa22a32360e07b1961 ] + Pass the incore xfs_extent_free_item through the EFI logging code instead of repeatedly boxing and unboxing parameters.
Signed-off-by: Darrick J. Wong djwong@kernel.org + Signed-off-by: Leah Rumancik leah.rumancik@gmail.com + Acked-by: "Darrick J. Wong" djwong@kernel.org
## fs/xfs/xfs_extfree_item.c ## @@ fs/xfs/xfs_extfree_item.c: static int ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Success | Success |
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 578c714b215d474c52949e65a914dae67924f0fe ]
Change the name of all pointers to xfs_extent_item structures to "xefi" to make the name consistent and because the current selections ("new" and "free") mean other things in C.
Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_alloc.c | 32 +++++++++--------- fs/xfs/xfs_extfree_item.c | 70 +++++++++++++++++++-------------------- 2 files changed, 51 insertions(+), 51 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 74d039bdc9f7..cc5c954cda88 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2491,78 +2491,78 @@ xfs_defer_agfl_block( xfs_agnumber_t agno, xfs_fsblock_t agbno, struct xfs_owner_info *oinfo) { struct xfs_mount *mp = tp->t_mountp; - struct xfs_extent_free_item *new; /* new element */ + struct xfs_extent_free_item *xefi;
ASSERT(xfs_extfree_item_cache != NULL); ASSERT(oinfo != NULL);
- new = kmem_cache_zalloc(xfs_extfree_item_cache, + xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); - new->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); - new->xefi_blockcount = 1; - new->xefi_owner = oinfo->oi_owner; + xefi->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); + xefi->xefi_blockcount = 1; + xefi->xefi_owner = oinfo->oi_owner;
trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
- xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &new->xefi_list); + xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list); }
/* * Add the extent to the list of extents to be free at transaction end. * The list is maintained sorted (by block number). */ void __xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, bool skip_discard) { - struct xfs_extent_free_item *new; /* new element */ + struct xfs_extent_free_item *xefi; #ifdef DEBUG struct xfs_mount *mp = tp->t_mountp; xfs_agnumber_t agno; xfs_agblock_t agbno;
ASSERT(bno != NULLFSBLOCK); ASSERT(len > 0); ASSERT(len <= XFS_MAX_BMBT_EXTLEN); ASSERT(!isnullstartblock(bno)); agno = XFS_FSB_TO_AGNO(mp, bno); agbno = XFS_FSB_TO_AGBNO(mp, bno); ASSERT(agno < mp->m_sb.sb_agcount); ASSERT(agbno < mp->m_sb.sb_agblocks); ASSERT(len < mp->m_sb.sb_agblocks); ASSERT(agbno + len <= mp->m_sb.sb_agblocks); #endif ASSERT(xfs_extfree_item_cache != NULL);
- new = kmem_cache_zalloc(xfs_extfree_item_cache, + xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); - new->xefi_startblock = bno; - new->xefi_blockcount = (xfs_extlen_t)len; + xefi->xefi_startblock = bno; + xefi->xefi_blockcount = (xfs_extlen_t)len; if (skip_discard) - new->xefi_flags |= XFS_EFI_SKIP_DISCARD; + xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD; if (oinfo) { ASSERT(oinfo->oi_offset == 0);
if (oinfo->oi_flags & XFS_OWNER_INFO_ATTR_FORK) - new->xefi_flags |= XFS_EFI_ATTR_FORK; + xefi->xefi_flags |= XFS_EFI_ATTR_FORK; if (oinfo->oi_flags & XFS_OWNER_INFO_BMBT_BLOCK) - new->xefi_flags |= XFS_EFI_BMBT_BLOCK; - new->xefi_owner = oinfo->oi_owner; + xefi->xefi_flags |= XFS_EFI_BMBT_BLOCK; + xefi->xefi_owner = oinfo->oi_owner; } else { - new->xefi_owner = XFS_RMAP_OWN_NULL; + xefi->xefi_owner = XFS_RMAP_OWN_NULL; } trace_xfs_bmap_free_defer(tp->t_mountp, XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0, XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &new->xefi_list); + xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list); }
#ifdef DEBUG /* * Check if an AGF has a free extent record whose length is equal to diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 618d2f9ff535..011b50469301 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -343,49 +343,49 @@ xfs_trans_get_efd( */ static int xfs_trans_free_extent( struct xfs_trans *tp, struct xfs_efd_log_item *efdp, - struct xfs_extent_free_item *free) + struct xfs_extent_free_item *xefi) { struct xfs_owner_info oinfo = { }; struct xfs_mount *mp = tp->t_mountp; struct xfs_extent *extp; uint next_extent; xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, - free->xefi_startblock); + xefi->xefi_startblock); xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(mp, - free->xefi_startblock); + xefi->xefi_startblock); int error;
- oinfo.oi_owner = free->xefi_owner; - if (free->xefi_flags & XFS_EFI_ATTR_FORK) + oinfo.oi_owner = xefi->xefi_owner; + if (xefi->xefi_flags & XFS_EFI_ATTR_FORK) oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK; - if (free->xefi_flags & XFS_EFI_BMBT_BLOCK) + if (xefi->xefi_flags & XFS_EFI_BMBT_BLOCK) oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK;
trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, - free->xefi_blockcount); + xefi->xefi_blockcount);
- error = __xfs_free_extent(tp, free->xefi_startblock, - free->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, - free->xefi_flags & XFS_EFI_SKIP_DISCARD); + error = __xfs_free_extent(tp, xefi->xefi_startblock, + xefi->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, + xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); /* * Mark the transaction dirty, even on error. This ensures the * transaction is aborted, which: * * 1.) releases the EFI and frees the EFD * 2.) shuts down the filesystem */ tp->t_flags |= XFS_TRANS_DIRTY | XFS_TRANS_HAS_INTENT_DONE; set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
next_extent = efdp->efd_next_extent; ASSERT(next_extent < efdp->efd_format.efd_nextents); extp = &(efdp->efd_format.efd_extents[next_extent]); - extp->ext_start = free->xefi_startblock; - extp->ext_len = free->xefi_blockcount; + extp->ext_start = xefi->xefi_startblock; + extp->ext_len = xefi->xefi_blockcount; efdp->efd_next_extent++;
return error; }
@@ -409,162 +409,162 @@ xfs_extent_free_diff_items( /* Log a free extent to the intent item. */ STATIC void xfs_extent_free_log_item( struct xfs_trans *tp, struct xfs_efi_log_item *efip, - struct xfs_extent_free_item *free) + struct xfs_extent_free_item *xefi) { uint next_extent; struct xfs_extent *extp;
tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &efip->efi_item.li_flags);
/* * atomic_inc_return gives us the value after the increment; * we want to use it as an array index so we need to subtract 1 from * it. */ next_extent = atomic_inc_return(&efip->efi_next_extent) - 1; ASSERT(next_extent < efip->efi_format.efi_nextents); extp = &efip->efi_format.efi_extents[next_extent]; - extp->ext_start = free->xefi_startblock; - extp->ext_len = free->xefi_blockcount; + extp->ext_start = xefi->xefi_startblock; + extp->ext_len = xefi->xefi_blockcount; }
static struct xfs_log_item * xfs_extent_free_create_intent( struct xfs_trans *tp, struct list_head *items, unsigned int count, bool sort) { struct xfs_mount *mp = tp->t_mountp; struct xfs_efi_log_item *efip = xfs_efi_init(mp, count); - struct xfs_extent_free_item *free; + struct xfs_extent_free_item *xefi;
ASSERT(count > 0);
xfs_trans_add_item(tp, &efip->efi_item); if (sort) list_sort(mp, items, xfs_extent_free_diff_items); - list_for_each_entry(free, items, xefi_list) - xfs_extent_free_log_item(tp, efip, free); + list_for_each_entry(xefi, items, xefi_list) + xfs_extent_free_log_item(tp, efip, xefi); return &efip->efi_item; }
/* Get an EFD so we can process all the free extents. */ static struct xfs_log_item * xfs_extent_free_create_done( struct xfs_trans *tp, struct xfs_log_item *intent, unsigned int count) { return &xfs_trans_get_efd(tp, EFI_ITEM(intent), count)->efd_item; }
/* Process a free extent. */ STATIC int xfs_extent_free_finish_item( struct xfs_trans *tp, struct xfs_log_item *done, struct list_head *item, struct xfs_btree_cur **state) { - struct xfs_extent_free_item *free; + struct xfs_extent_free_item *xefi; int error;
- free = container_of(item, struct xfs_extent_free_item, xefi_list); + xefi = container_of(item, struct xfs_extent_free_item, xefi_list);
- error = xfs_trans_free_extent(tp, EFD_ITEM(done), free); - kmem_cache_free(xfs_extfree_item_cache, free); + error = xfs_trans_free_extent(tp, EFD_ITEM(done), xefi); + kmem_cache_free(xfs_extfree_item_cache, xefi); return error; }
/* Abort all pending EFIs. */ STATIC void xfs_extent_free_abort_intent( struct xfs_log_item *intent) { xfs_efi_release(EFI_ITEM(intent)); }
/* Cancel a free extent. */ STATIC void xfs_extent_free_cancel_item( struct list_head *item) { - struct xfs_extent_free_item *free; + struct xfs_extent_free_item *xefi;
- free = container_of(item, struct xfs_extent_free_item, xefi_list); - kmem_cache_free(xfs_extfree_item_cache, free); + xefi = container_of(item, struct xfs_extent_free_item, xefi_list); + kmem_cache_free(xfs_extfree_item_cache, xefi); }
const struct xfs_defer_op_type xfs_extent_free_defer_type = { .max_items = XFS_EFI_MAX_FAST_EXTENTS, .create_intent = xfs_extent_free_create_intent, .abort_intent = xfs_extent_free_abort_intent, .create_done = xfs_extent_free_create_done, .finish_item = xfs_extent_free_finish_item, .cancel_item = xfs_extent_free_cancel_item, };
/* * AGFL blocks are accounted differently in the reserve pools and are not * inserted into the busy extent list. */ STATIC int xfs_agfl_free_finish_item( struct xfs_trans *tp, struct xfs_log_item *done, struct list_head *item, struct xfs_btree_cur **state) { struct xfs_owner_info oinfo = { }; struct xfs_mount *mp = tp->t_mountp; struct xfs_efd_log_item *efdp = EFD_ITEM(done); - struct xfs_extent_free_item *free; + struct xfs_extent_free_item *xefi; struct xfs_extent *extp; struct xfs_buf *agbp; int error; xfs_agnumber_t agno; xfs_agblock_t agbno; uint next_extent; struct xfs_perag *pag;
- free = container_of(item, struct xfs_extent_free_item, xefi_list); - ASSERT(free->xefi_blockcount == 1); - agno = XFS_FSB_TO_AGNO(mp, free->xefi_startblock); - agbno = XFS_FSB_TO_AGBNO(mp, free->xefi_startblock); - oinfo.oi_owner = free->xefi_owner; + xefi = container_of(item, struct xfs_extent_free_item, xefi_list); + ASSERT(xefi->xefi_blockcount == 1); + agno = XFS_FSB_TO_AGNO(mp, xefi->xefi_startblock); + agbno = XFS_FSB_TO_AGBNO(mp, xefi->xefi_startblock); + oinfo.oi_owner = xefi->xefi_owner;
- trace_xfs_agfl_free_deferred(mp, agno, 0, agbno, free->xefi_blockcount); + trace_xfs_agfl_free_deferred(mp, agno, 0, agbno, xefi->xefi_blockcount);
pag = xfs_perag_get(mp, agno); error = xfs_alloc_read_agf(pag, tp, 0, &agbp); if (!error) error = xfs_free_agfl_block(tp, agno, agbno, agbp, &oinfo); xfs_perag_put(pag);
/* * Mark the transaction dirty, even on error. This ensures the * transaction is aborted, which: * * 1.) releases the EFI and frees the EFD * 2.) shuts down the filesystem */ tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &efdp->efd_item.li_flags);
next_extent = efdp->efd_next_extent; ASSERT(next_extent < efdp->efd_format.efd_nextents); extp = &(efdp->efd_format.efd_extents[next_extent]); - extp->ext_start = free->xefi_startblock; - extp->ext_len = free->xefi_blockcount; + extp->ext_start = xefi->xefi_startblock; + extp->ext_len = xefi->xefi_blockcount; efdp->efd_next_extent++;
- kmem_cache_free(xfs_extfree_item_cache, free); + kmem_cache_free(xfs_extfree_item_cache, xefi); return error; }
/* sub-type with special handling for AGFL deferred frees */ const struct xfs_defer_op_type xfs_agfl_free_defer_type = {
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 578c714b215d474c52949e65a914dae67924f0fe
WARNING: Author mismatch between patch and upstream commit: Backport author: Leah Rumancikleah.rumancik@gmail.com Commit author: Darrick J. Wongdjwong@kernel.org
Status in newer kernel trees: 6.13.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- 1: 578c714b215d4 ! 1: 48382d577161d xfs: fix confusing xfs_extent_item variable names @@ Metadata ## Commit message ## xfs: fix confusing xfs_extent_item variable names
+ [ Upstream commit 578c714b215d474c52949e65a914dae67924f0fe ] + Change the name of all pointers to xfs_extent_item structures to "xefi" to make the name consistent and because the current selections ("new" and "free") mean other things in C.
Signed-off-by: Darrick J. Wong djwong@kernel.org + Signed-off-by: Leah Rumancik leah.rumancik@gmail.com + Acked-by: "Darrick J. Wong" djwong@kernel.org
## fs/xfs/libxfs/xfs_alloc.c ## @@ fs/xfs/libxfs/xfs_alloc.c: xfs_defer_agfl_block( ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Success | Success |
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit ddccb81b26ec021ae1f3366aa996cc4c68dd75ce ]
Instead of repeatedly boxing and unboxing the incore extent mapping structure as it passes through the BUI code, pass the pointer directly through.
Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_bmap.c | 32 ++++++++-------- fs/xfs/libxfs/xfs_bmap.h | 5 +-- fs/xfs/xfs_bmap_item.c | 81 +++++++++++++++------------------------- 3 files changed, 46 insertions(+), 72 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 27d3121e6da9..d523ac51c662 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -6117,43 +6117,41 @@ xfs_bmap_unmap_extent( * btree cursor to maintain our lock on the bmapbt between calls. */ int xfs_bmap_finish_one( struct xfs_trans *tp, - struct xfs_inode *ip, - enum xfs_bmap_intent_type type, - int whichfork, - xfs_fileoff_t startoff, - xfs_fsblock_t startblock, - xfs_filblks_t *blockcount, - xfs_exntst_t state) + struct xfs_bmap_intent *bi) { + struct xfs_bmbt_irec *bmap = &bi->bi_bmap; int error = 0;
ASSERT(tp->t_firstblock == NULLFSBLOCK);
trace_xfs_bmap_deferred(tp->t_mountp, - XFS_FSB_TO_AGNO(tp->t_mountp, startblock), type, - XFS_FSB_TO_AGBNO(tp->t_mountp, startblock), - ip->i_ino, whichfork, startoff, *blockcount, state); + XFS_FSB_TO_AGNO(tp->t_mountp, bmap->br_startblock), + bi->bi_type, + XFS_FSB_TO_AGBNO(tp->t_mountp, bmap->br_startblock), + bi->bi_owner->i_ino, bi->bi_whichfork, + bmap->br_startoff, bmap->br_blockcount, + bmap->br_state);
- if (WARN_ON_ONCE(whichfork != XFS_DATA_FORK)) + if (WARN_ON_ONCE(bi->bi_whichfork != XFS_DATA_FORK)) return -EFSCORRUPTED;
if (XFS_TEST_ERROR(false, tp->t_mountp, XFS_ERRTAG_BMAP_FINISH_ONE)) return -EIO;
- switch (type) { + switch (bi->bi_type) { case XFS_BMAP_MAP: - error = xfs_bmapi_remap(tp, ip, startoff, *blockcount, - startblock, 0); - *blockcount = 0; + error = xfs_bmapi_remap(tp, bi->bi_owner, bmap->br_startoff, + bmap->br_blockcount, bmap->br_startblock, 0); + bmap->br_blockcount = 0; break; case XFS_BMAP_UNMAP: - error = __xfs_bunmapi(tp, ip, startoff, blockcount, - XFS_BMAPI_REMAP, 1); + error = __xfs_bunmapi(tp, bi->bi_owner, bmap->br_startoff, + &bmap->br_blockcount, XFS_BMAPI_REMAP, 1); break; default: ASSERT(0); error = -EFSCORRUPTED; } diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index 08c16e4edc0f..524912f276f8 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -234,14 +234,11 @@ struct xfs_bmap_intent { int bi_whichfork; struct xfs_inode *bi_owner; struct xfs_bmbt_irec bi_bmap; };
-int xfs_bmap_finish_one(struct xfs_trans *tp, struct xfs_inode *ip, - enum xfs_bmap_intent_type type, int whichfork, - xfs_fileoff_t startoff, xfs_fsblock_t startblock, - xfs_filblks_t *blockcount, xfs_exntst_t state); +int xfs_bmap_finish_one(struct xfs_trans *tp, struct xfs_bmap_intent *bi); void xfs_bmap_map_extent(struct xfs_trans *tp, struct xfs_inode *ip, struct xfs_bmbt_irec *imap); void xfs_bmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip, struct xfs_bmbt_irec *imap);
diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 41323da523d1..13aa5359c02f 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -244,22 +244,15 @@ xfs_trans_get_bud( */ static int xfs_trans_log_finish_bmap_update( struct xfs_trans *tp, struct xfs_bud_log_item *budp, - enum xfs_bmap_intent_type type, - struct xfs_inode *ip, - int whichfork, - xfs_fileoff_t startoff, - xfs_fsblock_t startblock, - xfs_filblks_t *blockcount, - xfs_exntst_t state) + struct xfs_bmap_intent *bi) { int error;
- error = xfs_bmap_finish_one(tp, ip, type, whichfork, startoff, - startblock, blockcount, state); + error = xfs_bmap_finish_one(tp, bi);
/* * Mark the transaction dirty, even on error. This ensures the * transaction is aborted, which: * @@ -376,29 +369,21 @@ xfs_bmap_update_finish_item( struct xfs_trans *tp, struct xfs_log_item *done, struct list_head *item, struct xfs_btree_cur **state) { - struct xfs_bmap_intent *bmap; - xfs_filblks_t count; + struct xfs_bmap_intent *bi; int error;
- bmap = container_of(item, struct xfs_bmap_intent, bi_list); - count = bmap->bi_bmap.br_blockcount; - error = xfs_trans_log_finish_bmap_update(tp, BUD_ITEM(done), - bmap->bi_type, - bmap->bi_owner, bmap->bi_whichfork, - bmap->bi_bmap.br_startoff, - bmap->bi_bmap.br_startblock, - &count, - bmap->bi_bmap.br_state); - if (!error && count > 0) { - ASSERT(bmap->bi_type == XFS_BMAP_UNMAP); - bmap->bi_bmap.br_blockcount = count; + bi = container_of(item, struct xfs_bmap_intent, bi_list); + + error = xfs_trans_log_finish_bmap_update(tp, BUD_ITEM(done), bi); + if (!error && bi->bi_bmap.br_blockcount > 0) { + ASSERT(bi->bi_type == XFS_BMAP_UNMAP); return -EAGAIN; } - kmem_cache_free(xfs_bmap_intent_cache, bmap); + kmem_cache_free(xfs_bmap_intent_cache, bi); return error; }
/* Abort all pending BUIs. */ STATIC void @@ -469,79 +454,73 @@ xfs_bui_validate( STATIC int xfs_bui_item_recover( struct xfs_log_item *lip, struct list_head *capture_list) { - struct xfs_bmbt_irec irec; + struct xfs_bmap_intent fake = { }; struct xfs_bui_log_item *buip = BUI_ITEM(lip); struct xfs_trans *tp; struct xfs_inode *ip = NULL; struct xfs_mount *mp = lip->li_log->l_mp; - struct xfs_map_extent *bmap; + struct xfs_map_extent *map; struct xfs_bud_log_item *budp; - xfs_filblks_t count; - xfs_exntst_t state; - unsigned int bui_type; - int whichfork; int iext_delta; int error = 0;
if (!xfs_bui_validate(mp, buip)) { XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &buip->bui_format, sizeof(buip->bui_format)); return -EFSCORRUPTED; }
- bmap = &buip->bui_format.bui_extents[0]; - state = (bmap->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ? - XFS_EXT_UNWRITTEN : XFS_EXT_NORM; - whichfork = (bmap->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ? + map = &buip->bui_format.bui_extents[0]; + fake.bi_whichfork = (map->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ? XFS_ATTR_FORK : XFS_DATA_FORK; - bui_type = bmap->me_flags & XFS_BMAP_EXTENT_TYPE_MASK; + fake.bi_type = map->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
- error = xlog_recover_iget(mp, bmap->me_owner, &ip); + error = xlog_recover_iget(mp, map->me_owner, &ip); if (error) return error;
/* Allocate transaction and do the work. */ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp); if (error) goto err_rele;
budp = xfs_trans_get_bud(tp, buip); xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0);
- if (bui_type == XFS_BMAP_MAP) + if (fake.bi_type == XFS_BMAP_MAP) iext_delta = XFS_IEXT_ADD_NOSPLIT_CNT; else iext_delta = XFS_IEXT_PUNCH_HOLE_CNT;
- error = xfs_iext_count_may_overflow(ip, whichfork, iext_delta); + error = xfs_iext_count_may_overflow(ip, fake.bi_whichfork, iext_delta); if (error == -EFBIG) error = xfs_iext_count_upgrade(tp, ip, iext_delta); if (error) goto err_cancel;
- count = bmap->me_len; - error = xfs_trans_log_finish_bmap_update(tp, budp, bui_type, ip, - whichfork, bmap->me_startoff, bmap->me_startblock, - &count, state); + fake.bi_owner = ip; + fake.bi_bmap.br_startblock = map->me_startblock; + fake.bi_bmap.br_startoff = map->me_startoff; + fake.bi_bmap.br_blockcount = map->me_len; + fake.bi_bmap.br_state = (map->me_flags & XFS_BMAP_EXTENT_UNWRITTEN) ? + XFS_EXT_UNWRITTEN : XFS_EXT_NORM; + + error = xfs_trans_log_finish_bmap_update(tp, budp, &fake); if (error == -EFSCORRUPTED) - XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, bmap, - sizeof(*bmap)); + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, map, + sizeof(*map)); if (error) goto err_cancel;
- if (count > 0) { - ASSERT(bui_type == XFS_BMAP_UNMAP); - irec.br_startblock = bmap->me_startblock; - irec.br_blockcount = count; - irec.br_startoff = bmap->me_startoff; - irec.br_state = state; - xfs_bmap_unmap_extent(tp, ip, &irec); + if (fake.bi_bmap.br_blockcount > 0) { + ASSERT(fake.bi_type == XFS_BMAP_UNMAP); + xfs_bmap_unmap_extent(tp, ip, &fake.bi_bmap); }
/* * Commit transaction, which frees the transaction and saves the inode * for later replay activities.
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: ddccb81b26ec021ae1f3366aa996cc4c68dd75ce
WARNING: Author mismatch between patch and upstream commit: Backport author: Leah Rumancikleah.rumancik@gmail.com Commit author: Darrick J. Wongdjwong@kernel.org
Status in newer kernel trees: 6.13.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- 1: ddccb81b26ec0 ! 1: f2c38698292e3 xfs: pass the xfs_bmbt_irec directly through the log intent code @@ Metadata ## Commit message ## xfs: pass the xfs_bmbt_irec directly through the log intent code
+ [ Upstream commit ddccb81b26ec021ae1f3366aa996cc4c68dd75ce ] + Instead of repeatedly boxing and unboxing the incore extent mapping structure as it passes through the BUI code, pass the pointer directly through.
Signed-off-by: Darrick J. Wong djwong@kernel.org + Signed-off-by: Leah Rumancik leah.rumancik@gmail.com + Acked-by: "Darrick J. Wong" djwong@kernel.org
## fs/xfs/libxfs/xfs_bmap.c ## @@ fs/xfs/libxfs/xfs_bmap.c: xfs_bmap_unmap_extent( ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Success | Success |
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit b2ccab3199aa7cea9154d80ea2585312c5f6eba0 ]
Pass a reference to the per-AG structure to xfs_free_extent. Most callers already have one, so we can eliminate unnecessary lookups. The one exception to this is the EFI code, which the next patch will fix.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_ag.c | 6 ++---- fs/xfs/libxfs/xfs_alloc.c | 15 +++++---------- fs/xfs/libxfs/xfs_alloc.h | 8 +++++--- fs/xfs/libxfs/xfs_ialloc_btree.c | 7 +++++-- fs/xfs/libxfs/xfs_refcount_btree.c | 5 +++-- fs/xfs/scrub/repair.c | 3 ++- fs/xfs/xfs_extfree_item.c | 8 ++++++-- 7 files changed, 28 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c index bf47efe08a58..f29767e6eb7f 100644 --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -979,14 +979,12 @@ xfs_ag_extend_space( error = xfs_rmap_free(tp, bp, pag, be32_to_cpu(agf->agf_length) - len, len, &XFS_RMAP_OINFO_SKIP_UPDATE); if (error) return error;
- error = xfs_free_extent(tp, XFS_AGB_TO_FSB(pag->pag_mount, pag->pag_agno, - be32_to_cpu(agf->agf_length) - len), - len, &XFS_RMAP_OINFO_SKIP_UPDATE, - XFS_AG_RESV_NONE); + error = xfs_free_extent(tp, pag, be32_to_cpu(agf->agf_length) - len, + len, &XFS_RMAP_OINFO_SKIP_UPDATE, XFS_AG_RESV_NONE); if (error) return error;
/* Update perag geometry */ pag->block_count = be32_to_cpu(agf->agf_length); diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index cc5c954cda88..da6d158d666f 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -3445,63 +3445,58 @@ xfs_free_extent_fix_freelist( * after fixing up the freelist. */ int __xfs_free_extent( struct xfs_trans *tp, - xfs_fsblock_t bno, + struct xfs_perag *pag, + xfs_agblock_t agbno, xfs_extlen_t len, const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type, bool skip_discard) { struct xfs_mount *mp = tp->t_mountp; struct xfs_buf *agbp; - xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, bno); - xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(mp, bno); struct xfs_agf *agf; int error; unsigned int busy_flags = 0; - struct xfs_perag *pag;
ASSERT(len != 0); ASSERT(type != XFS_AG_RESV_AGFL);
if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FREE_EXTENT)) return -EIO;
- pag = xfs_perag_get(mp, agno); error = xfs_free_extent_fix_freelist(tp, pag, &agbp); if (error) - goto err; + return error; agf = agbp->b_addr;
if (XFS_IS_CORRUPT(mp, agbno >= mp->m_sb.sb_agblocks)) { error = -EFSCORRUPTED; goto err_release; }
/* validate the extent size is legal now we have the agf locked */ if (XFS_IS_CORRUPT(mp, agbno + len > be32_to_cpu(agf->agf_length))) { error = -EFSCORRUPTED; goto err_release; }
- error = xfs_free_ag_extent(tp, agbp, agno, agbno, len, oinfo, type); + error = xfs_free_ag_extent(tp, agbp, pag->pag_agno, agbno, len, oinfo, + type); if (error) goto err_release;
if (skip_discard) busy_flags |= XFS_EXTENT_BUSY_SKIP_DISCARD; xfs_extent_busy_insert(tp, pag, agbno, len, busy_flags); - xfs_perag_put(pag); return 0;
err_release: xfs_trans_brelse(tp, agbp); -err: - xfs_perag_put(pag); return error; }
struct xfs_alloc_query_range_info { xfs_alloc_query_range_fn fn; diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 2c3f762dfb58..5074aed6dfad 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -128,25 +128,27 @@ xfs_alloc_vextent( * Free an extent. */ int /* error */ __xfs_free_extent( struct xfs_trans *tp, /* transaction pointer */ - xfs_fsblock_t bno, /* starting block number of extent */ + struct xfs_perag *pag, + xfs_agblock_t agbno, xfs_extlen_t len, /* length of extent */ const struct xfs_owner_info *oinfo, /* extent owner */ enum xfs_ag_resv_type type, /* block reservation type */ bool skip_discard);
static inline int xfs_free_extent( struct xfs_trans *tp, - xfs_fsblock_t bno, + struct xfs_perag *pag, + xfs_agblock_t agbno, xfs_extlen_t len, const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type) { - return __xfs_free_extent(tp, bno, len, oinfo, type, false); + return __xfs_free_extent(tp, pag, agbno, len, oinfo, type, false); }
int /* error */ xfs_alloc_lookup_le( struct xfs_btree_cur *cur, /* btree cursor */ diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c index 8c83e265770c..2dbe553d87fb 100644 --- a/fs/xfs/libxfs/xfs_ialloc_btree.c +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c @@ -154,13 +154,16 @@ STATIC int __xfs_inobt_free_block( struct xfs_btree_cur *cur, struct xfs_buf *bp, enum xfs_ag_resv_type resv) { + xfs_fsblock_t fsbno; + xfs_inobt_mod_blockcount(cur, -1); - return xfs_free_extent(cur->bc_tp, - XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp)), 1, + fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp)); + return xfs_free_extent(cur->bc_tp, cur->bc_ag.pag, + XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1, &XFS_RMAP_OINFO_INOBT, resv); }
STATIC int xfs_inobt_free_block( diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c index e1f789866683..3d8e62da2ccc 100644 --- a/fs/xfs/libxfs/xfs_refcount_btree.c +++ b/fs/xfs/libxfs/xfs_refcount_btree.c @@ -110,12 +110,13 @@ xfs_refcountbt_free_block(
trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_ag.pag->pag_agno, XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1); be32_add_cpu(&agf->agf_refcount_blocks, -1); xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS); - error = xfs_free_extent(cur->bc_tp, fsbno, 1, &XFS_RMAP_OINFO_REFC, - XFS_AG_RESV_METADATA); + error = xfs_free_extent(cur->bc_tp, cur->bc_ag.pag, + XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1, + &XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA); if (error) return error;
return error; } diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index c18bd039fce9..e0ed0ebfdaea 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -580,11 +580,12 @@ xrep_reap_block( error = xfs_rmap_free(sc->tp, agf_bp, sc->sa.pag, agbno, 1, oinfo); else if (resv == XFS_AG_RESV_AGFL) error = xrep_put_freelist(sc, agbno); else - error = xfs_free_extent(sc->tp, fsbno, 1, oinfo, resv); + error = xfs_free_extent(sc->tp, sc->sa.pag, agbno, 1, oinfo, + resv); if (agf_bp != sc->sa.agf_bp) xfs_trans_brelse(sc->tp, agf_bp); if (error) return error;
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 011b50469301..c1aae07467c9 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -348,29 +348,33 @@ xfs_trans_free_extent( struct xfs_extent_free_item *xefi) { struct xfs_owner_info oinfo = { }; struct xfs_mount *mp = tp->t_mountp; struct xfs_extent *extp; + struct xfs_perag *pag; uint next_extent; xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, xefi->xefi_startblock); xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(mp, xefi->xefi_startblock); int error;
oinfo.oi_owner = xefi->xefi_owner; if (xefi->xefi_flags & XFS_EFI_ATTR_FORK) oinfo.oi_flags |= XFS_OWNER_INFO_ATTR_FORK; if (xefi->xefi_flags & XFS_EFI_BMBT_BLOCK) oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK;
trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, xefi->xefi_blockcount);
- error = __xfs_free_extent(tp, xefi->xefi_startblock, - xefi->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, + pag = xfs_perag_get(mp, agno); + error = __xfs_free_extent(tp, pag, agbno, xefi->xefi_blockcount, + &oinfo, XFS_AG_RESV_NONE, xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); + xfs_perag_put(pag); + /* * Mark the transaction dirty, even on error. This ensures the * transaction is aborted, which: * * 1.) releases the EFI and frees the EFD
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: b2ccab3199aa7cea9154d80ea2585312c5f6eba0
WARNING: Author mismatch between patch and upstream commit: Backport author: Leah Rumancikleah.rumancik@gmail.com Commit author: Darrick J. Wongdjwong@kernel.org
Status in newer kernel trees: 6.13.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- 1: b2ccab3199aa7 ! 1: ab2da2d8a4063 xfs: pass per-ag references to xfs_free_extent @@ Metadata ## Commit message ## xfs: pass per-ag references to xfs_free_extent
+ [ Upstream commit b2ccab3199aa7cea9154d80ea2585312c5f6eba0 ] + Pass a reference to the per-AG structure to xfs_free_extent. Most callers already have one, so we can eliminate unnecessary lookups. The one exception to this is the EFI code, which the next patch will fix.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com + Signed-off-by: Leah Rumancik leah.rumancik@gmail.com + Acked-by: "Darrick J. Wong" djwong@kernel.org
## fs/xfs/libxfs/xfs_ag.c ## @@ fs/xfs/libxfs/xfs_ag.c: xfs_ag_extend_space( @@ fs/xfs/libxfs/xfs_alloc.c: __xfs_free_extent(
## fs/xfs/libxfs/xfs_alloc.h ## -@@ fs/xfs/libxfs/xfs_alloc.h: int xfs_alloc_vextent_first_ag(struct xfs_alloc_arg *args, +@@ fs/xfs/libxfs/xfs_alloc.h: xfs_alloc_vextent( int /* error */ __xfs_free_extent( struct xfs_trans *tp, /* transaction pointer */ ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Success | Success |
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 7dfee17b13e5024c5c0ab1911859ded4182de3e5 ]
Bad things happen in defered extent freeing operations if it is passed a bad block number in the xefi. This can come from a bogus agno/agbno pair from deferred agfl freeing, or just a bad fsbno being passed to __xfs_free_extent_later(). Either way, it's very difficult to diagnose where a null perag oops in EFI creation is coming from when the operation that queued the xefi has already been completed and there's no longer any trace of it around....
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_ag.c | 5 ++++- fs/xfs/libxfs/xfs_alloc.c | 16 +++++++++++++--- fs/xfs/libxfs/xfs_alloc.h | 6 +++--- fs/xfs/libxfs/xfs_bmap.c | 10 ++++++++-- fs/xfs/libxfs/xfs_bmap_btree.c | 7 +++++-- fs/xfs/libxfs/xfs_ialloc.c | 24 ++++++++++++++++-------- fs/xfs/libxfs/xfs_refcount.c | 13 ++++++++++--- fs/xfs/xfs_reflink.c | 4 +++- 8 files changed, 62 insertions(+), 23 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c index f29767e6eb7f..ee0d56854b83 100644 --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -904,11 +904,14 @@ xfs_ag_shrink_space( be32_add_cpu(&agi->agi_length, delta); be32_add_cpu(&agf->agf_length, delta); if (err2 != -ENOSPC) goto resv_err;
- __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL, true); + err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL, + true); + if (err2) + goto resv_err;
/* * Roll the transaction before trying to re-init the per-ag * reservation. The new transaction is clean so it will cancel * without any side effects. diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index da6d158d666f..ec03040237db 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2483,39 +2483,43 @@ xfs_agfl_reset( * because for one they are always freed one at a time. Further, an immediate * AGFL block free can cause a btree join and require another block free before * the real allocation can proceed. Deferring the free disconnects freeing up * the AGFL slot from freeing the block. */ -STATIC void +static int xfs_defer_agfl_block( struct xfs_trans *tp, xfs_agnumber_t agno, xfs_fsblock_t agbno, struct xfs_owner_info *oinfo) { struct xfs_mount *mp = tp->t_mountp; struct xfs_extent_free_item *xefi;
ASSERT(xfs_extfree_item_cache != NULL); ASSERT(oinfo != NULL);
xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); xefi->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); xefi->xefi_blockcount = 1; xefi->xefi_owner = oinfo->oi_owner;
+ if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock))) + return -EFSCORRUPTED; + trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list); + return 0; }
/* * Add the extent to the list of extents to be free at transaction end. * The list is maintained sorted (by block number). */ -void +int __xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, @@ -2538,31 +2542,35 @@ __xfs_free_extent_later( ASSERT(len < mp->m_sb.sb_agblocks); ASSERT(agbno + len <= mp->m_sb.sb_agblocks); #endif ASSERT(xfs_extfree_item_cache != NULL);
+ if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len))) + return -EFSCORRUPTED; + xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); xefi->xefi_startblock = bno; xefi->xefi_blockcount = (xfs_extlen_t)len; if (skip_discard) xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD; if (oinfo) { ASSERT(oinfo->oi_offset == 0);
if (oinfo->oi_flags & XFS_OWNER_INFO_ATTR_FORK) xefi->xefi_flags |= XFS_EFI_ATTR_FORK; if (oinfo->oi_flags & XFS_OWNER_INFO_BMBT_BLOCK) xefi->xefi_flags |= XFS_EFI_BMBT_BLOCK; xefi->xefi_owner = oinfo->oi_owner; } else { xefi->xefi_owner = XFS_RMAP_OWN_NULL; } trace_xfs_bmap_free_defer(tp->t_mountp, XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0, XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list); + return 0; }
#ifdef DEBUG /* * Check if an AGF has a free extent record whose length is equal to @@ -2718,11 +2726,13 @@ xfs_alloc_fix_freelist( error = xfs_alloc_get_freelist(pag, tp, agbp, &bno, 0); if (error) goto out_agbp_relse;
/* defer agfl frees */ - xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo); + error = xfs_defer_agfl_block(tp, args->agno, bno, &targs.oinfo); + if (error) + goto out_agbp_relse; }
targs.tp = tp; targs.mp = mp; targs.agbp = agbp; diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 5074aed6dfad..cfc9dcdc1753 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -211,38 +211,38 @@ xfs_buf_to_agfl_bno( if (xfs_has_crc(bp->b_mount)) return bp->b_addr + sizeof(struct xfs_agfl); return bp->b_addr; }
-void __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, +int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, bool skip_discard);
/* * List of extents to be free "later". * The list is kept sorted on xbf_startblock. */ struct xfs_extent_free_item { struct list_head xefi_list; uint64_t xefi_owner; xfs_fsblock_t xefi_startblock;/* starting fs block number */ xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ unsigned int xefi_flags; };
#define XFS_EFI_SKIP_DISCARD (1U << 0) /* don't issue discard */ #define XFS_EFI_ATTR_FORK (1U << 1) /* freeing attr fork block */ #define XFS_EFI_BMBT_BLOCK (1U << 2) /* freeing bmap btree block */
-static inline void +static inline int xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo) { - __xfs_free_extent_later(tp, bno, len, oinfo, false); + return __xfs_free_extent_later(tp, bno, len, oinfo, false); }
extern struct kmem_cache *xfs_extfree_item_cache;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index d523ac51c662..29781a124323 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -570,12 +570,16 @@ xfs_bmap_btree_to_extents( if (error) return error; cblock = XFS_BUF_TO_BLOCK(cbp); if ((error = xfs_btree_check_block(cur, cblock, 0, cbp))) return error; + xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); - xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); + if (error) + return error; + ip->i_nblocks--; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); xfs_trans_binval(tp, cbp); if (cur->bc_levels[0].bp == cbp) cur->bc_levels[0].bp = NULL; @@ -5200,14 +5204,16 @@ xfs_bmap_del_extent_real( */ if (do_fx && !(bflags & XFS_BMAPI_REMAP)) { if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { - __xfs_free_extent_later(tp, del->br_startblock, + error = __xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, (bflags & XFS_BMAPI_NODISCARD) || del->br_state == XFS_EXT_UNWRITTEN); + if (error) + goto done; } }
/* * Adjust inode # blocks in the file. diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 18de4fbfef4e..bac2a6496a8e 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -283,15 +283,18 @@ xfs_bmbt_free_block( struct xfs_mount *mp = cur->bc_mp; struct xfs_inode *ip = cur->bc_ino.ip; struct xfs_trans *tp = cur->bc_tp; xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); struct xfs_owner_info oinfo; + int error;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); - xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo); - ip->i_nblocks--; + error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo); + if (error) + return error;
+ ip->i_nblocks--; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); return 0; }
diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 120dbec16f5c..448ea76d50f8 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1825,81 +1825,87 @@ xfs_dialloc( /* * Free the blocks of an inode chunk. We must consider that the inode chunk * might be sparse and only free the regions that are allocated as part of the * chunk. */ -STATIC void +static int xfs_difree_inode_chunk( struct xfs_trans *tp, xfs_agnumber_t agno, struct xfs_inobt_rec_incore *rec) { struct xfs_mount *mp = tp->t_mountp; xfs_agblock_t sagbno = XFS_AGINO_TO_AGBNO(mp, rec->ir_startino); int startidx, endidx; int nextbit; xfs_agblock_t agbno; int contigblk; DECLARE_BITMAP(holemask, XFS_INOBT_HOLEMASK_BITS);
if (!xfs_inobt_issparse(rec->ir_holemask)) { /* not sparse, calculate extent info directly */ - xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno), - M_IGEO(mp)->ialloc_blks, - &XFS_RMAP_OINFO_INODES); - return; + return xfs_free_extent_later(tp, + XFS_AGB_TO_FSB(mp, agno, sagbno), + M_IGEO(mp)->ialloc_blks, + &XFS_RMAP_OINFO_INODES); }
/* holemask is only 16-bits (fits in an unsigned long) */ ASSERT(sizeof(rec->ir_holemask) <= sizeof(holemask[0])); holemask[0] = rec->ir_holemask;
/* * Find contiguous ranges of zeroes (i.e., allocated regions) in the * holemask and convert the start/end index of each range to an extent. * We start with the start and end index both pointing at the first 0 in * the mask. */ startidx = endidx = find_first_zero_bit(holemask, XFS_INOBT_HOLEMASK_BITS); nextbit = startidx + 1; while (startidx < XFS_INOBT_HOLEMASK_BITS) { + int error; + nextbit = find_next_zero_bit(holemask, XFS_INOBT_HOLEMASK_BITS, nextbit); /* * If the next zero bit is contiguous, update the end index of * the current range and continue. */ if (nextbit != XFS_INOBT_HOLEMASK_BITS && nextbit == endidx + 1) { endidx = nextbit; goto next; }
/* * nextbit is not contiguous with the current end index. Convert * the current start/end to an extent and add it to the free * list. */ agbno = sagbno + (startidx * XFS_INODES_PER_HOLEMASK_BIT) / mp->m_sb.sb_inopblock; contigblk = ((endidx - startidx + 1) * XFS_INODES_PER_HOLEMASK_BIT) / mp->m_sb.sb_inopblock;
ASSERT(agbno % mp->m_sb.sb_spino_align == 0); ASSERT(contigblk % mp->m_sb.sb_spino_align == 0); - xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, agbno), - contigblk, &XFS_RMAP_OINFO_INODES); + error = xfs_free_extent_later(tp, + XFS_AGB_TO_FSB(mp, agno, agbno), + contigblk, &XFS_RMAP_OINFO_INODES); + if (error) + return error;
/* reset range to current bit and carry on... */ startidx = endidx = nextbit;
next: nextbit++; } + return 0; }
STATIC int xfs_difree_inobt( struct xfs_mount *mp, @@ -1996,11 +2002,13 @@ xfs_difree_inobt( xfs_warn(mp, "%s: xfs_btree_delete returned error %d.", __func__, error); goto error0; }
- xfs_difree_inode_chunk(tp, pag->pag_agno, &rec); + error = xfs_difree_inode_chunk(tp, pag->pag_agno, &rec); + if (error) + goto error0; } else { xic->deleted = false;
error = xfs_inobt_update(cur, &rec); if (error) { diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index bcf46aa0d08b..fec1ad95988c 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1127,12 +1127,14 @@ xfs_refcount_adjust_extents( } } else { fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, tmp.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, + error = xfs_free_extent_later(cur->bc_tp, fsbno, tmp.rc_blockcount, NULL); + if (error) + goto out_error; }
(*agbno) += tmp.rc_blockcount; (*aglen) -= tmp.rc_blockcount;
@@ -1186,12 +1188,14 @@ xfs_refcount_adjust_extents( goto advloop; } else { fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, ext.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, + error = xfs_free_extent_later(cur->bc_tp, fsbno, ext.rc_blockcount, NULL); + if (error) + goto out_error; }
skip: error = xfs_btree_increment(cur, 0, &found_rec); if (error) @@ -1956,11 +1960,14 @@ xfs_refcount_recover_cow_leftovers( rr->rr_rrec.rc_startblock); xfs_refcount_free_cow_extent(tp, fsb, rr->rr_rrec.rc_blockcount);
/* Free the block. */ - xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL); + error = xfs_free_extent_later(tp, fsb, + rr->rr_rrec.rc_blockcount, NULL); + if (error) + goto out_trans;
error = xfs_trans_commit(tp); if (error) goto out_free;
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index cbdc23217a42..ee187e90d943 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -616,12 +616,14 @@ xfs_reflink_cancel_cow_blocks(
/* Free the CoW orphan record. */ xfs_refcount_free_cow_extent(*tpp, del.br_startblock, del.br_blockcount);
- xfs_free_extent_later(*tpp, del.br_startblock, + error = xfs_free_extent_later(*tpp, del.br_startblock, del.br_blockcount, NULL); + if (error) + break;
/* Roll the transaction */ error = xfs_defer_finish(tpp); if (error) break;
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ℹ️ This is part 06/29 of a series ⚠️ Found follow-up fixes in mainline
The upstream commit SHA1 provided is correct: 7dfee17b13e5024c5c0ab1911859ded4182de3e5
WARNING: Author mismatch between patch and upstream commit: Backport author: Leah Rumancikleah.rumancik@gmail.com Commit author: Dave Chinnerdchinner@redhat.com
Status in newer kernel trees: 6.13.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Found fixes commits: 2bed0d82c2f7 xfs: fix bounds check in xfs_defer_agfl_block()
Note: The patch differs from the upstream commit: --- 1: 7dfee17b13e50 ! 1: 0de6be42d2941 xfs: validate block number being freed before adding to xefi @@ Metadata ## Commit message ## xfs: validate block number being freed before adding to xefi
+ [ Upstream commit 7dfee17b13e5024c5c0ab1911859ded4182de3e5 ] + Bad things happen in defered extent freeing operations if it is passed a bad block number in the xefi. This can come from a bogus agno/agbno pair from deferred agfl freeing, or just a bad fsbno @@ Commit message Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Dave Chinner david@fromorbit.com + Signed-off-by: Leah Rumancik leah.rumancik@gmail.com + Acked-by: "Darrick J. Wong" djwong@kernel.org
## fs/xfs/libxfs/xfs_ag.c ## @@ fs/xfs/libxfs/xfs_ag.c: xfs_ag_shrink_space( @@ fs/xfs/libxfs/xfs_alloc.c: xfs_defer_agfl_block( + trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
- xfs_extent_free_get_group(mp, xefi); xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list); + return 0; } @@ fs/xfs/libxfs/xfs_alloc.c: __xfs_free_extent_later( GFP_KERNEL | __GFP_NOFAIL); xefi->xefi_startblock = bno; @@ fs/xfs/libxfs/xfs_alloc.c: __xfs_free_extent_later( - - xfs_extent_free_get_group(mp, xefi); + XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0, + XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list); + return 0; } @@ fs/xfs/libxfs/xfs_alloc.h: xfs_buf_to_agfl_bno( xfs_filblks_t len, const struct xfs_owner_info *oinfo, bool skip_discard);
-@@ fs/xfs/libxfs/xfs_alloc.h: void xfs_extent_free_get_group(struct xfs_mount *mp, +@@ fs/xfs/libxfs/xfs_alloc.h: struct xfs_extent_free_item { #define XFS_EFI_ATTR_FORK (1U << 1) /* freeing attr fork block */ #define XFS_EFI_BMBT_BLOCK (1U << 2) /* freeing bmap btree block */
---
NOTE: These results are for this patch alone. Full series testing will be performed when all parts are received.
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Success | Success |
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070 ]
Need to happen before we allocate and then leak the xefi. Found by coverity via an xfsprogs libxfs scan.
[djwong: This also fixes the type of the @agbno argument.]
Fixes: 7dfee17b13e5 ("xfs: validate block number being freed before adding to xefi") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_alloc.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index ec03040237db..af447605051b 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2487,28 +2487,29 @@ xfs_agfl_reset( */ static int xfs_defer_agfl_block( struct xfs_trans *tp, xfs_agnumber_t agno, - xfs_fsblock_t agbno, + xfs_agblock_t agbno, struct xfs_owner_info *oinfo) { struct xfs_mount *mp = tp->t_mountp; struct xfs_extent_free_item *xefi; + xfs_fsblock_t fsbno = XFS_AGB_TO_FSB(mp, agno, agbno);
ASSERT(xfs_extfree_item_cache != NULL); ASSERT(oinfo != NULL);
+ if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, fsbno))) + return -EFSCORRUPTED; + xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); - xefi->xefi_startblock = XFS_AGB_TO_FSB(mp, agno, agbno); + xefi->xefi_startblock = fsbno; xefi->xefi_blockcount = 1; xefi->xefi_owner = oinfo->oi_owner;
- if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock))) - return -EFSCORRUPTED; - trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list); return 0; }
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070
WARNING: Author mismatch between patch and upstream commit: Backport author: Leah Rumancikleah.rumancik@gmail.com Commit author: Dave Chinnerdchinner@redhat.com
Status in newer kernel trees: 6.13.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- 1: 2bed0d82c2f78 ! 1: aec5f983044b3 xfs: fix bounds check in xfs_defer_agfl_block() @@ Metadata ## Commit message ## xfs: fix bounds check in xfs_defer_agfl_block()
+ [ Upstream commit 2bed0d82c2f78b91a0a9a5a73da57ee883a0c070 ] + Need to happen before we allocate and then leak the xefi. Found by coverity via an xfsprogs libxfs scan.
@@ Commit message Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org + Signed-off-by: Leah Rumancik leah.rumancik@gmail.com + Acked-by: "Darrick J. Wong" djwong@kernel.org
## fs/xfs/libxfs/xfs_alloc.c ## @@ fs/xfs/libxfs/xfs_alloc.c: static int @@ fs/xfs/libxfs/xfs_alloc.c: static int + xefi->xefi_startblock = fsbno; xefi->xefi_blockcount = 1; xefi->xefi_owner = oinfo->oi_owner; - xefi->xefi_agresv = XFS_AG_RESV_AGFL;
- if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, xefi->xefi_startblock))) - return -EFSCORRUPTED; - trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
- xfs_extent_free_get_group(mp, xefi); + xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list); ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Success | Success |
From: Dave Chinner dchinner@redhat.com
[ Upstream commit b742d7b4f0e03df25c2a772adcded35044b625ca ]
[ 6.1: resolved conflict in xfs_extfree_item.c ]
Btrees that aren't freespace management trees use the normal extent allocation and freeing routines for their blocks. Hence when a btree block is freed, a direct call to xfs_free_extent() is made and the extent is immediately freed. This puts the entire free space management btrees under this path, so we are stacking btrees on btrees in the call stack. The inobt, finobt and refcount btrees all do this.
However, the bmap btree does not do this - it calls xfs_free_extent_later() to defer the extent free operation via an XEFI and hence it gets processed in deferred operation processing during the commit of the primary transaction (i.e. via intent chaining).
We need to change xfs_free_extent() to behave in a non-blocking manner so that we can avoid deadlocks with busy extents near ENOSPC in transactions that free multiple extents. Inserting or removing a record from a btree can cause a multi-level tree merge operation and that will free multiple blocks from the btree in a single transaction. i.e. we can call xfs_free_extent() multiple times, and hence the btree manipulation transaction is vulnerable to this busy extent deadlock vector.
To fix this, convert all the remaining callers of xfs_free_extent() to use xfs_free_extent_later() to queue XEFIs and hence defer processing of the extent frees to a context that can be safely restarted if a deadlock condition is detected.
Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Chandan Babu R chandan.babu@oracle.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_ag.c | 2 +- fs/xfs/libxfs/xfs_alloc.c | 4 ++++ fs/xfs/libxfs/xfs_alloc.h | 8 +++++--- fs/xfs/libxfs/xfs_bmap.c | 8 +++++--- fs/xfs/libxfs/xfs_bmap_btree.c | 3 ++- fs/xfs/libxfs/xfs_ialloc.c | 8 ++++---- fs/xfs/libxfs/xfs_ialloc_btree.c | 3 +-- fs/xfs/libxfs/xfs_refcount.c | 9 ++++++--- fs/xfs/libxfs/xfs_refcount_btree.c | 8 +------- fs/xfs/xfs_extfree_item.c | 3 ++- fs/xfs/xfs_reflink.c | 3 ++- 11 files changed, 33 insertions(+), 26 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c index ee0d56854b83..e03bfeacbed4 100644 --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -905,11 +905,11 @@ xfs_ag_shrink_space( be32_add_cpu(&agf->agf_length, delta); if (err2 != -ENOSPC) goto resv_err;
err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL, - true); + XFS_AG_RESV_NONE, true); if (err2) goto resv_err;
/* * Roll the transaction before trying to re-init the per-ag diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index af447605051b..e44f3f5c6d27 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2505,55 +2505,59 @@ xfs_defer_agfl_block( xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); xefi->xefi_startblock = fsbno; xefi->xefi_blockcount = 1; xefi->xefi_owner = oinfo->oi_owner; + xefi->xefi_agresv = XFS_AG_RESV_AGFL;
trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list); return 0; }
/* * Add the extent to the list of extents to be free at transaction end. * The list is maintained sorted (by block number). */ int __xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type type, bool skip_discard) { struct xfs_extent_free_item *xefi; #ifdef DEBUG struct xfs_mount *mp = tp->t_mountp; xfs_agnumber_t agno; xfs_agblock_t agbno;
ASSERT(bno != NULLFSBLOCK); ASSERT(len > 0); ASSERT(len <= XFS_MAX_BMBT_EXTLEN); ASSERT(!isnullstartblock(bno)); agno = XFS_FSB_TO_AGNO(mp, bno); agbno = XFS_FSB_TO_AGBNO(mp, bno); ASSERT(agno < mp->m_sb.sb_agcount); ASSERT(agbno < mp->m_sb.sb_agblocks); ASSERT(len < mp->m_sb.sb_agblocks); ASSERT(agbno + len <= mp->m_sb.sb_agblocks); #endif ASSERT(xfs_extfree_item_cache != NULL); + ASSERT(type != XFS_AG_RESV_AGFL);
if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len))) return -EFSCORRUPTED;
xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); xefi->xefi_startblock = bno; xefi->xefi_blockcount = (xfs_extlen_t)len; + xefi->xefi_agresv = type; if (skip_discard) xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD; if (oinfo) { ASSERT(oinfo->oi_offset == 0);
diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index cfc9dcdc1753..bbedb18651de 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -213,36 +213,38 @@ xfs_buf_to_agfl_bno( return bp->b_addr; }
int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, - bool skip_discard); + enum xfs_ag_resv_type type, bool skip_discard);
/* * List of extents to be free "later". * The list is kept sorted on xbf_startblock. */ struct xfs_extent_free_item { struct list_head xefi_list; uint64_t xefi_owner; xfs_fsblock_t xefi_startblock;/* starting fs block number */ xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ unsigned int xefi_flags; + enum xfs_ag_resv_type xefi_agresv; };
#define XFS_EFI_SKIP_DISCARD (1U << 0) /* don't issue discard */ #define XFS_EFI_ATTR_FORK (1U << 1) /* freeing attr fork block */ #define XFS_EFI_BMBT_BLOCK (1U << 2) /* freeing bmap btree block */
static inline int xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, - const struct xfs_owner_info *oinfo) + const struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type type) { - return __xfs_free_extent_later(tp, bno, len, oinfo, false); + return __xfs_free_extent_later(tp, bno, len, oinfo, type, false); }
extern struct kmem_cache *xfs_extfree_item_cache;
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 29781a124323..aac071e4d000 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -572,11 +572,12 @@ xfs_bmap_btree_to_extents( cblock = XFS_BUF_TO_BLOCK(cbp); if ((error = xfs_btree_check_block(cur, cblock, 0, cbp))) return error;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); - error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo, + XFS_AG_RESV_NONE); if (error) return error;
ip->i_nblocks--; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); @@ -5206,12 +5207,13 @@ xfs_bmap_del_extent_real( if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { error = __xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, - (bflags & XFS_BMAPI_NODISCARD) || - del->br_state == XFS_EXT_UNWRITTEN); + XFS_AG_RESV_NONE, + ((bflags & XFS_BMAPI_NODISCARD) || + del->br_state == XFS_EXT_UNWRITTEN)); if (error) goto done; } }
diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index bac2a6496a8e..57f401f2492d 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -286,11 +286,12 @@ xfs_bmbt_free_block( xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); struct xfs_owner_info oinfo; int error;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); - error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo, + XFS_AG_RESV_NONE); if (error) return error;
ip->i_nblocks--; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 448ea76d50f8..d1472cbd48ff 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1844,12 +1844,12 @@ xfs_difree_inode_chunk(
if (!xfs_inobt_issparse(rec->ir_holemask)) { /* not sparse, calculate extent info directly */ return xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno), - M_IGEO(mp)->ialloc_blks, - &XFS_RMAP_OINFO_INODES); + M_IGEO(mp)->ialloc_blks, &XFS_RMAP_OINFO_INODES, + XFS_AG_RESV_NONE); }
/* holemask is only 16-bits (fits in an unsigned long) */ ASSERT(sizeof(rec->ir_holemask) <= sizeof(holemask[0])); holemask[0] = rec->ir_holemask; @@ -1890,12 +1890,12 @@ xfs_difree_inode_chunk( mp->m_sb.sb_inopblock;
ASSERT(agbno % mp->m_sb.sb_spino_align == 0); ASSERT(contigblk % mp->m_sb.sb_spino_align == 0); error = xfs_free_extent_later(tp, - XFS_AGB_TO_FSB(mp, agno, agbno), - contigblk, &XFS_RMAP_OINFO_INODES); + XFS_AGB_TO_FSB(mp, agno, agbno), contigblk, + &XFS_RMAP_OINFO_INODES, XFS_AG_RESV_NONE); if (error) return error;
/* reset range to current bit and carry on... */ startidx = endidx = nextbit; diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c index 2dbe553d87fb..7125447cde1a 100644 --- a/fs/xfs/libxfs/xfs_ialloc_btree.c +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c @@ -158,12 +158,11 @@ __xfs_inobt_free_block( { xfs_fsblock_t fsbno;
xfs_inobt_mod_blockcount(cur, -1); fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp)); - return xfs_free_extent(cur->bc_tp, cur->bc_ag.pag, - XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1, + return xfs_free_extent_later(cur->bc_tp, fsbno, 1, &XFS_RMAP_OINFO_INOBT, resv); }
STATIC int xfs_inobt_free_block( diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index fec1ad95988c..4ec7a81dd3ef 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1128,11 +1128,12 @@ xfs_refcount_adjust_extents( } else { fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, tmp.rc_startblock); error = xfs_free_extent_later(cur->bc_tp, fsbno, - tmp.rc_blockcount, NULL); + tmp.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_error; }
(*agbno) += tmp.rc_blockcount; @@ -1189,11 +1190,12 @@ xfs_refcount_adjust_extents( } else { fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, ext.rc_startblock); error = xfs_free_extent_later(cur->bc_tp, fsbno, - ext.rc_blockcount, NULL); + ext.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_error; }
skip: @@ -1961,11 +1963,12 @@ xfs_refcount_recover_cow_leftovers( xfs_refcount_free_cow_extent(tp, fsb, rr->rr_rrec.rc_blockcount);
/* Free the block. */ error = xfs_free_extent_later(tp, fsb, - rr->rr_rrec.rc_blockcount, NULL); + rr->rr_rrec.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_trans;
error = xfs_trans_commit(tp); if (error) diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c index 3d8e62da2ccc..fbd53b6951a9 100644 --- a/fs/xfs/libxfs/xfs_refcount_btree.c +++ b/fs/xfs/libxfs/xfs_refcount_btree.c @@ -104,23 +104,17 @@ xfs_refcountbt_free_block( { struct xfs_mount *mp = cur->bc_mp; struct xfs_buf *agbp = cur->bc_ag.agbp; struct xfs_agf *agf = agbp->b_addr; xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); - int error;
trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_ag.pag->pag_agno, XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1); be32_add_cpu(&agf->agf_refcount_blocks, -1); xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS); - error = xfs_free_extent(cur->bc_tp, cur->bc_ag.pag, - XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1, + return xfs_free_extent_later(cur->bc_tp, fsbno, 1, &XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA); - if (error) - return error; - - return error; }
STATIC int xfs_refcountbt_get_minrecs( struct xfs_btree_cur *cur, diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index c1aae07467c9..b670863d8467 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -367,11 +367,11 @@ xfs_trans_free_extent( trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, xefi->xefi_blockcount);
pag = xfs_perag_get(mp, agno); error = __xfs_free_extent(tp, pag, agbno, xefi->xefi_blockcount, - &oinfo, XFS_AG_RESV_NONE, + &oinfo, xefi->xefi_agresv, xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); xfs_perag_put(pag);
/* * Mark the transaction dirty, even on error. This ensures the @@ -626,10 +626,11 @@ xfs_efi_item_recover( efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
for (i = 0; i < efip->efi_format.efi_nextents; i++) { struct xfs_extent_free_item fake = { .xefi_owner = XFS_RMAP_OWN_UNKNOWN, + .xefi_agresv = XFS_AG_RESV_NONE, }; struct xfs_extent *extp;
extp = &efip->efi_format.efi_extents[i];
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index ee187e90d943..1bac6a8af970 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -617,11 +617,12 @@ xfs_reflink_cancel_cow_blocks( /* Free the CoW orphan record. */ xfs_refcount_free_cow_extent(*tpp, del.br_startblock, del.br_blockcount);
error = xfs_free_extent_later(*tpp, del.br_startblock, - del.br_blockcount, NULL); + del.br_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) break;
/* Roll the transaction */ error = xfs_defer_finish(tpp);
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ℹ️ This is part 08/29 of a series ❌ Build failures detected
The upstream commit SHA1 provided is correct: b742d7b4f0e03df25c2a772adcded35044b625ca
WARNING: Author mismatch between patch and upstream commit: Backport author: Leah Rumancikleah.rumancik@gmail.com Commit author: Dave Chinnerdchinner@redhat.com
Status in newer kernel trees: 6.13.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- Failed to apply patch cleanly. ---
NOTE: These results are for this patch alone. Full series testing will be performed when all parts are received.
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Failed | N/A |
Build Errors: Patch failed to apply on stable/linux-6.1.y. Reject:
diff a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c (rejected hunks) @@ -1128,11 +1128,12 @@ xfs_refcount_adjust_extents( } else { fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, tmp.rc_startblock); error = xfs_free_extent_later(cur->bc_tp, fsbno, - tmp.rc_blockcount, NULL); + tmp.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_error; }
(*agbno) += tmp.rc_blockcount; @@ -1189,11 +1190,12 @@ xfs_refcount_adjust_extents( } else { fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, ext.rc_startblock); error = xfs_free_extent_later(cur->bc_tp, fsbno, - ext.rc_blockcount, NULL); + ext.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_error; }
skip: @@ -1961,11 +1963,12 @@ xfs_refcount_recover_cow_leftovers( xfs_refcount_free_cow_extent(tp, fsb, rr->rr_rrec.rc_blockcount);
/* Free the block. */ error = xfs_free_extent_later(tp, fsb, - rr->rr_rrec.rc_blockcount, NULL); + rr->rr_rrec.rc_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) goto out_trans;
error = xfs_trans_commit(tp); if (error) diff a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c (rejected hunks) @@ -1844,12 +1844,12 @@ xfs_difree_inode_chunk(
if (!xfs_inobt_issparse(rec->ir_holemask)) { /* not sparse, calculate extent info directly */ return xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno), - M_IGEO(mp)->ialloc_blks, - &XFS_RMAP_OINFO_INODES); + M_IGEO(mp)->ialloc_blks, &XFS_RMAP_OINFO_INODES, + XFS_AG_RESV_NONE); }
/* holemask is only 16-bits (fits in an unsigned long) */ ASSERT(sizeof(rec->ir_holemask) <= sizeof(holemask[0])); holemask[0] = rec->ir_holemask; @@ -1890,12 +1890,12 @@ xfs_difree_inode_chunk( mp->m_sb.sb_inopblock;
ASSERT(agbno % mp->m_sb.sb_spino_align == 0); ASSERT(contigblk % mp->m_sb.sb_spino_align == 0); error = xfs_free_extent_later(tp, - XFS_AGB_TO_FSB(mp, agno, agbno), - contigblk, &XFS_RMAP_OINFO_INODES); + XFS_AGB_TO_FSB(mp, agno, agbno), contigblk, + &XFS_RMAP_OINFO_INODES, XFS_AG_RESV_NONE); if (error) return error;
/* reset range to current bit and carry on... */ startidx = endidx = nextbit; diff a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c (rejected hunks) @@ -2505,55 +2505,59 @@ xfs_defer_agfl_block( xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); xefi->xefi_startblock = fsbno; xefi->xefi_blockcount = 1; xefi->xefi_owner = oinfo->oi_owner; + xefi->xefi_agresv = XFS_AG_RESV_AGFL;
trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1);
xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list); return 0; }
/* * Add the extent to the list of extents to be free at transaction end. * The list is maintained sorted (by block number). */ int __xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type type, bool skip_discard) { struct xfs_extent_free_item *xefi; #ifdef DEBUG struct xfs_mount *mp = tp->t_mountp; xfs_agnumber_t agno; xfs_agblock_t agbno;
ASSERT(bno != NULLFSBLOCK); ASSERT(len > 0); ASSERT(len <= XFS_MAX_BMBT_EXTLEN); ASSERT(!isnullstartblock(bno)); agno = XFS_FSB_TO_AGNO(mp, bno); agbno = XFS_FSB_TO_AGBNO(mp, bno); ASSERT(agno < mp->m_sb.sb_agcount); ASSERT(agbno < mp->m_sb.sb_agblocks); ASSERT(len < mp->m_sb.sb_agblocks); ASSERT(agbno + len <= mp->m_sb.sb_agblocks); #endif ASSERT(xfs_extfree_item_cache != NULL); + ASSERT(type != XFS_AG_RESV_AGFL);
if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbext(mp, bno, len))) return -EFSCORRUPTED;
xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); xefi->xefi_startblock = bno; xefi->xefi_blockcount = (xfs_extlen_t)len; + xefi->xefi_agresv = type; if (skip_discard) xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD; if (oinfo) { ASSERT(oinfo->oi_offset == 0);
diff a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c (rejected hunks) @@ -104,23 +104,17 @@ xfs_refcountbt_free_block( { struct xfs_mount *mp = cur->bc_mp; struct xfs_buf *agbp = cur->bc_ag.agbp; struct xfs_agf *agf = agbp->b_addr; xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); - int error;
trace_xfs_refcountbt_free_block(cur->bc_mp, cur->bc_ag.pag->pag_agno, XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1); be32_add_cpu(&agf->agf_refcount_blocks, -1); xfs_alloc_log_agf(cur->bc_tp, agbp, XFS_AGF_REFCOUNT_BLOCKS); - error = xfs_free_extent(cur->bc_tp, cur->bc_ag.pag, - XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1, + return xfs_free_extent_later(cur->bc_tp, fsbno, 1, &XFS_RMAP_OINFO_REFC, XFS_AG_RESV_METADATA); - if (error) - return error; - - return error; }
STATIC int xfs_refcountbt_get_minrecs( struct xfs_btree_cur *cur, diff a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c (rejected hunks) @@ -158,12 +158,11 @@ __xfs_inobt_free_block( { xfs_fsblock_t fsbno;
xfs_inobt_mod_blockcount(cur, -1); fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, xfs_buf_daddr(bp)); - return xfs_free_extent(cur->bc_tp, cur->bc_ag.pag, - XFS_FSB_TO_AGBNO(cur->bc_mp, fsbno), 1, + return xfs_free_extent_later(cur->bc_tp, fsbno, 1, &XFS_RMAP_OINFO_INOBT, resv); }
STATIC int xfs_inobt_free_block( diff a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c (rejected hunks) @@ -905,11 +905,11 @@ xfs_ag_shrink_space( be32_add_cpu(&agf->agf_length, delta); if (err2 != -ENOSPC) goto resv_err;
err2 = __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL, - true); + XFS_AG_RESV_NONE, true); if (err2) goto resv_err;
/* * Roll the transaction before trying to re-init the per-ag diff a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c (rejected hunks) @@ -286,11 +286,12 @@ xfs_bmbt_free_block( xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); struct xfs_owner_info oinfo; int error;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); - error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo, + XFS_AG_RESV_NONE); if (error) return error;
ip->i_nblocks--; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); diff a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h (rejected hunks) @@ -213,36 +213,38 @@ xfs_buf_to_agfl_bno( return bp->b_addr; }
int __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, - bool skip_discard); + enum xfs_ag_resv_type type, bool skip_discard);
/* * List of extents to be free "later". * The list is kept sorted on xbf_startblock. */ struct xfs_extent_free_item { struct list_head xefi_list; uint64_t xefi_owner; xfs_fsblock_t xefi_startblock;/* starting fs block number */ xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ unsigned int xefi_flags; + enum xfs_ag_resv_type xefi_agresv; };
#define XFS_EFI_SKIP_DISCARD (1U << 0) /* don't issue discard */ #define XFS_EFI_ATTR_FORK (1U << 1) /* freeing attr fork block */ #define XFS_EFI_BMBT_BLOCK (1U << 2) /* freeing bmap btree block */
static inline int xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, - const struct xfs_owner_info *oinfo) + const struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type type) { - return __xfs_free_extent_later(tp, bno, len, oinfo, false); + return __xfs_free_extent_later(tp, bno, len, oinfo, type, false); }
extern struct kmem_cache *xfs_extfree_item_cache;
diff a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c (rejected hunks) @@ -572,11 +572,12 @@ xfs_bmap_btree_to_extents( cblock = XFS_BUF_TO_BLOCK(cbp); if ((error = xfs_btree_check_block(cur, cblock, 0, cbp))) return error;
xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); - error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); + error = xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo, + XFS_AG_RESV_NONE); if (error) return error;
ip->i_nblocks--; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); @@ -5206,12 +5207,13 @@ xfs_bmap_del_extent_real( if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { error = __xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, - (bflags & XFS_BMAPI_NODISCARD) || - del->br_state == XFS_EXT_UNWRITTEN); + XFS_AG_RESV_NONE, + ((bflags & XFS_BMAPI_NODISCARD) || + del->br_state == XFS_EXT_UNWRITTEN)); if (error) goto done; } }
diff a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c (rejected hunks) @@ -367,11 +367,11 @@ xfs_trans_free_extent( trace_xfs_bmap_free_deferred(tp->t_mountp, agno, 0, agbno, xefi->xefi_blockcount);
pag = xfs_perag_get(mp, agno); error = __xfs_free_extent(tp, pag, agbno, xefi->xefi_blockcount, - &oinfo, XFS_AG_RESV_NONE, + &oinfo, xefi->xefi_agresv, xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); xfs_perag_put(pag);
/* * Mark the transaction dirty, even on error. This ensures the @@ -626,10 +626,11 @@ xfs_efi_item_recover( efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
for (i = 0; i < efip->efi_format.efi_nextents; i++) { struct xfs_extent_free_item fake = { .xefi_owner = XFS_RMAP_OWN_UNKNOWN, + .xefi_agresv = XFS_AG_RESV_NONE, }; struct xfs_extent *extp;
extp = &efip->efi_format.efi_extents[i];
diff a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c (rejected hunks) @@ -617,11 +617,12 @@ xfs_reflink_cancel_cow_blocks( /* Free the CoW orphan record. */ xfs_refcount_free_cow_extent(*tpp, del.br_startblock, del.br_blockcount);
error = xfs_free_extent_later(*tpp, del.br_startblock, - del.br_blockcount, NULL); + del.br_blockcount, NULL, + XFS_AG_RESV_NONE); if (error) break;
/* Roll the transaction */ error = xfs_defer_finish(tpp);
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 3c919b0910906cc69d76dea214776f0eac73358b ]
Wengang Wang reports that a customer's system was running a number of truncate operations on a filesystem with a very small log. Contention on the reserve heads lead to other threads stalling on smaller updates (e.g. mtime updates) long enough to result in the node being rebooted on account of the lack of responsivenes. The node failed to recover because log recovery of an EFI became stuck waiting for a grant of reserve space. From Wengang's report:
"For the file deletion, log bytes are reserved basing on xfs_mount->tr_itruncate which is:
tr_logres = 175488, tr_logcount = 2, tr_logflags = XFS_TRANS_PERM_LOG_RES,
"You see it's a permanent log reservation with two log operations (two transactions in rolling mode). After calculation (xlog_calc_unit_res() adds space for various log headers), the final log space needed per transaction changes from 175488 to 180208 bytes. So the total log space needed is 360416 bytes (180208 * 2). [That quantity] of log space (360416 bytes) needs to be reserved for both run time inode removing (xfs_inactive_truncate()) and EFI recover (xfs_efi_item_recover())."
In other words, runtime pre-reserves 360K of space in anticipation of running a chain of two transactions in which each transaction gets a 180K reservation.
Now that we've allocated the transaction, we delete the bmap mapping, log an EFI to free the space, and roll the transaction as part of finishing the deferops chain. Rolling creates a new xfs_trans which shares its ticket with the old transaction. Next, xfs_trans_roll calls __xfs_trans_commit with regrant == true, which calls xlog_cil_commit with the same regrant parameter.
xlog_cil_commit calls xfs_log_ticket_regrant, which decrements t_cnt and subtracts t_curr_res from the reservation and write heads.
If the filesystem is fresh and the first transaction only used (say) 20K, then t_curr_res will be 160K, and we give that much reservation back to the reservation head. Or if the file is really fragmented and the first transaction actually uses 170K, then t_curr_res will be 10K, and that's what we give back to the reservation.
Having done that, we're now headed into the second transaction with an EFI and 180K of reservation. Other threads apparently consumed all the reservation for smaller transactions, such as timestamp updates.
Now let's say the first transaction gets written to disk and we crash without ever completing the second transaction. Now we remount the fs, log recovery finds the unfinished EFI, and calls xfs_efi_recover to finish the EFI. However, xfs_efi_recover starts a new tr_itruncate tranasction, which asks for 360K log reservation. This is a lot more than the 180K that we had reserved at the time of the crash. If the first EFI to be recovered is also pinning the tail of the log, we will be unable to free any space in the log, and recovery livelocks.
Wengang confirmed this:
"Now we have the second transaction which has 180208 log bytes reserved too. The second transaction is supposed to process intents including extent freeing. With my hacking patch, I blocked the extent freeing 5 hours. So in that 5 hours, 180208 (NOT 360416) log bytes are reserved.
"With my test case, other transactions (update timestamps) then happen. As my hacking patch pins the journal tail, those timestamp-updating transactions finally use up (almost) all the left available log space (in memory in on disk). And finally the on disk (and in memory) available log space goes down near to 180208 bytes. Those 180208 bytes are reserved by [the] second (extent-free) transaction [in the chain]."
Wengang and I noticed that EFI recovery starts a transaction, completes one step of the chain, and commits the transaction without completing any other steps of the chain. Those subsequent steps are completed by xlog_finish_defer_ops, which allocates yet another transaction to finish the rest of the chain. That transaction gets the same tr_logres as the head transaction, but with tr_logcount = 1 to force regranting with every roll to avoid livelocks.
In other words, we already figured this out in commit 929b92f64048d ("xfs: xfs_defer_capture should absorb remaining transaction reservation"), but should have applied that logic to each intent item's recovery function. For Wengang's case, the xfs_trans_alloc call in the EFI recovery function should only be asking for a single transaction's worth of log reservation -- 180K, not 360K.
Quoting Wengang again:
"With log recovery, during EFI recovery, we use tr_itruncate again to reserve two transactions that needs 360416 log bytes. Reserving 360416 bytes fails [stalls] because we now only have about 180208 available.
"Actually during the EFI recover, we only need one transaction to free the extents just like the 2nd transaction at RUNTIME. So it only needs to reserve 180208 rather than 360416 bytes. We have (a bit) more than 180208 available log bytes on disk, so [if we decrease the reservation to 180K] the reservation goes and the recovery [finishes]. That is to say: we can fix the log recover part to fix the issue. We can introduce a new xfs_trans_res xfs_mount->tr_ext_free
{ tr_logres = 175488, tr_logcount = 0, tr_logflags = 0, }
"and use tr_ext_free instead of tr_itruncate in EFI recover."
However, I don't think it quite makes sense to create an entirely new transaction reservation type to handle single-stepping during log recovery. Instead, we should copy the transaction reservation information in the xfs_mount, change tr_logcount to 1, and pass that into xfs_trans_alloc. We know this won't risk changing the min log size computation since we always ask for a fraction of the reservation for all known transaction types.
This looks like it's been lurking in the codebase since commit 3d3c8b5222b92, which changed the xfs_trans_reserve call in xlog_recover_process_efi to use the tr_logcount in tr_itruncate. That changed the EFI recovery transaction from making a non-XFS_TRANS_PERM_LOG_RES request for one transaction's worth of log space to a XFS_TRANS_PERM_LOG_RES request for two transactions worth.
Fixes: 3d3c8b5222b92 ("xfs: refactor xfs_trans_reserve() interface") Complements: 929b92f64048d ("xfs: xfs_defer_capture should absorb remaining transaction reservation") Suggested-by: Wengang Wang wen.gang.wang@oracle.com Cc: Srikanth C S srikanth.c.s@oracle.com [djwong: apply the same transformation to all log intent recovery] Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_log_recover.h | 22 ++++++++++++++++++++++ fs/xfs/xfs_attr_item.c | 7 ++++--- fs/xfs/xfs_bmap_item.c | 4 +++- fs/xfs/xfs_extfree_item.c | 4 +++- fs/xfs/xfs_refcount_item.c | 6 ++++-- fs/xfs/xfs_rmap_item.c | 6 ++++-- 6 files changed, 40 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h index 2420865f3007..a5100a11faf9 100644 --- a/fs/xfs/libxfs/xfs_log_recover.h +++ b/fs/xfs/libxfs/xfs_log_recover.h @@ -129,6 +129,28 @@ void xlog_free_buf_cancel_table(struct xlog *log); void xlog_check_buf_cancel_table(struct xlog *log); #else #define xlog_check_buf_cancel_table(log) do { } while (0) #endif
+/* + * Transform a regular reservation into one suitable for recovery of a log + * intent item. + * + * Intent recovery only runs a single step of the transaction chain and defers + * the rest to a separate transaction. Therefore, we reduce logcount to 1 here + * to avoid livelocks if the log grant space is nearly exhausted due to the + * recovered intent pinning the tail. Keep the same logflags to avoid tripping + * asserts elsewhere. Struct copies abound below. + */ +static inline struct xfs_trans_res +xlog_recover_resv(const struct xfs_trans_res *r) +{ + struct xfs_trans_res ret = { + .tr_logres = r->tr_logres, + .tr_logcount = 1, + .tr_logflags = r->tr_logflags, + }; + + return ret; +} + #endif /* __XFS_LOG_RECOVER_H__ */ diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c index 2788a6f2edcd..36fe2abb16e6 100644 --- a/fs/xfs/xfs_attr_item.c +++ b/fs/xfs/xfs_attr_item.c @@ -545,11 +545,11 @@ xfs_attri_item_recover( struct xfs_attr_intent *attr; struct xfs_mount *mp = lip->li_log->l_mp; struct xfs_inode *ip; struct xfs_da_args *args; struct xfs_trans *tp; - struct xfs_trans_res tres; + struct xfs_trans_res resv; struct xfs_attri_log_format *attrp; struct xfs_attri_log_nameval *nv = attrip->attri_nameval; int error; int total; int local; @@ -616,12 +616,13 @@ xfs_attri_item_recover( ASSERT(0); error = -EFSCORRUPTED; goto out; }
- xfs_init_attr_trans(args, &tres, &total); - error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp); + xfs_init_attr_trans(args, &resv, &total); + resv = xlog_recover_resv(&resv); + error = xfs_trans_alloc(mp, &resv, total, 0, XFS_TRANS_RESERVE, &tp); if (error) goto out;
args->trans = tp; done_item = xfs_trans_get_attrd(tp, attrip); diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 13aa5359c02f..1058603db3ac 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -455,36 +455,38 @@ STATIC int xfs_bui_item_recover( struct xfs_log_item *lip, struct list_head *capture_list) { struct xfs_bmap_intent fake = { }; + struct xfs_trans_res resv; struct xfs_bui_log_item *buip = BUI_ITEM(lip); struct xfs_trans *tp; struct xfs_inode *ip = NULL; struct xfs_mount *mp = lip->li_log->l_mp; struct xfs_map_extent *map; struct xfs_bud_log_item *budp; int iext_delta; int error = 0;
if (!xfs_bui_validate(mp, buip)) { XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &buip->bui_format, sizeof(buip->bui_format)); return -EFSCORRUPTED; }
map = &buip->bui_format.bui_extents[0]; fake.bi_whichfork = (map->me_flags & XFS_BMAP_EXTENT_ATTR_FORK) ? XFS_ATTR_FORK : XFS_DATA_FORK; fake.bi_type = map->me_flags & XFS_BMAP_EXTENT_TYPE_MASK;
error = xlog_recover_iget(mp, map->me_owner, &ip); if (error) return error;
/* Allocate transaction and do the work. */ - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, + resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate); + error = xfs_trans_alloc(mp, &resv, XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp); if (error) goto err_rele;
budp = xfs_trans_get_bud(tp, buip); diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index b670863d8467..3ed25c352269 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -596,33 +596,35 @@ xfs_efi_validate_ext( STATIC int xfs_efi_item_recover( struct xfs_log_item *lip, struct list_head *capture_list) { + struct xfs_trans_res resv; struct xfs_efi_log_item *efip = EFI_ITEM(lip); struct xfs_mount *mp = lip->li_log->l_mp; struct xfs_efd_log_item *efdp; struct xfs_trans *tp; int i; int error = 0;
/* * First check the validity of the extents described by the * EFI. If any are bad, then assume that all are bad and * just toss the EFI. */ for (i = 0; i < efip->efi_format.efi_nextents; i++) { if (!xfs_efi_validate_ext(mp, &efip->efi_format.efi_extents[i])) { XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, &efip->efi_format, sizeof(efip->efi_format)); return -EFSCORRUPTED; } }
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp); + resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate); + error = xfs_trans_alloc(mp, &resv, 0, 0, 0, &tp); if (error) return error; efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
for (i = 0; i < efip->efi_format.efi_nextents; i++) { diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index ff4d5087ba00..dfd7b824e32b 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -451,10 +451,11 @@ xfs_cui_validate_phys( STATIC int xfs_cui_item_recover( struct xfs_log_item *lip, struct list_head *capture_list) { + struct xfs_trans_res resv; struct xfs_cui_log_item *cuip = CUI_ITEM(lip); struct xfs_cud_log_item *cudp; struct xfs_trans *tp; struct xfs_btree_cur *rcur = NULL; struct xfs_mount *mp = lip->li_log->l_mp; @@ -488,12 +489,13 @@ xfs_cui_item_recover( * refcount update. However, we're in log recovery here, so we * use the passed in defer_ops and to finish up any work that * doesn't fit. We need to reserve enough blocks to handle a * full btree split on either end of the refcount range. */ - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, - mp->m_refc_maxlevels * 2, 0, XFS_TRANS_RESERVE, &tp); + resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate); + error = xfs_trans_alloc(mp, &resv, mp->m_refc_maxlevels * 2, 0, + XFS_TRANS_RESERVE, &tp); if (error) return error;
cudp = xfs_trans_get_cud(tp, cuip);
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 534504ede1a3..2043cea261c0 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -490,10 +490,11 @@ xfs_rui_validate_map( STATIC int xfs_rui_item_recover( struct xfs_log_item *lip, struct list_head *capture_list) { + struct xfs_trans_res resv; struct xfs_rui_log_item *ruip = RUI_ITEM(lip); struct xfs_map_extent *rmap; struct xfs_rud_log_item *rudp; struct xfs_trans *tp; struct xfs_btree_cur *rcur = NULL; @@ -517,12 +518,13 @@ xfs_rui_item_recover( sizeof(ruip->rui_format)); return -EFSCORRUPTED; } }
- error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, - mp->m_rmap_maxlevels, 0, XFS_TRANS_RESERVE, &tp); + resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate); + error = xfs_trans_alloc(mp, &resv, mp->m_rmap_maxlevels, 0, + XFS_TRANS_RESERVE, &tp); if (error) return error; rudp = xfs_trans_get_rud(tp, ruip);
for (i = 0; i < ruip->rui_format.rui_nextents; i++) {
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 13928113fc5b5e79c91796290a99ed991ac0efe2 ]
[6.1: resolved conflicts with fscounters.c and rtsummary.c ]
Move all the declarations for functionality in xfs_rtbitmap.c into a separate xfs_rtbitmap.h header file.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_bmap.c | 2 +- fs/xfs/libxfs/xfs_rtbitmap.c | 1 + fs/xfs/libxfs/xfs_rtbitmap.h | 82 ++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/rtbitmap.c | 2 +- fs/xfs/xfs_fsmap.c | 2 +- fs/xfs/xfs_rtalloc.c | 1 + fs/xfs/xfs_rtalloc.h | 73 -------------------------------- 7 files changed, 87 insertions(+), 76 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtbitmap.h
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index aac071e4d000..c99fd7fe242e 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -19,11 +19,11 @@ #include "xfs_trans.h" #include "xfs_alloc.h" #include "xfs_bmap.h" #include "xfs_bmap_util.h" #include "xfs_bmap_btree.h" -#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" #include "xfs_errortag.h" #include "xfs_error.h" #include "xfs_quota.h" #include "xfs_trans_space.h" #include "xfs_buf_item.h" diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 655108a4cd05..9eb1b5aa7e35 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -14,10 +14,11 @@ #include "xfs_inode.h" #include "xfs_bmap.h" #include "xfs_trans.h" #include "xfs_rtalloc.h" #include "xfs_error.h" +#include "xfs_rtbitmap.h"
/* * Realtime allocator bitmap functions shared with userspace. */
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h new file mode 100644 index 000000000000..546dea34bb37 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc. + * All Rights Reserved. + */ +#ifndef __XFS_RTBITMAP_H__ +#define __XFS_RTBITMAP_H__ + +/* + * XXX: Most of the realtime allocation functions deal in units of realtime + * extents, not realtime blocks. This looks funny when paired with the type + * name and screams for a larger cleanup. + */ +struct xfs_rtalloc_rec { + xfs_rtblock_t ar_startext; + xfs_rtblock_t ar_extcount; +}; + +typedef int (*xfs_rtalloc_query_range_fn)( + struct xfs_mount *mp, + struct xfs_trans *tp, + const struct xfs_rtalloc_rec *rec, + void *priv); + +#ifdef CONFIG_XFS_RT +int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t block, int issum, struct xfs_buf **bpp); +int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_extlen_t len, int val, + xfs_rtblock_t *new, int *stat); +int xfs_rtfind_back(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_rtblock_t limit, + xfs_rtblock_t *rtblock); +int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_rtblock_t limit, + xfs_rtblock_t *rtblock); +int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_extlen_t len, int val); +int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp, + int log, xfs_rtblock_t bbno, int delta, + struct xfs_buf **rbpp, xfs_fsblock_t *rsb, + xfs_suminfo_t *sum); +int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log, + xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp, + xfs_fsblock_t *rsb); +int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_extlen_t len, + struct xfs_buf **rbpp, xfs_fsblock_t *rsb); +int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp, + const struct xfs_rtalloc_rec *low_rec, + const struct xfs_rtalloc_rec *high_rec, + xfs_rtalloc_query_range_fn fn, void *priv); +int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtalloc_query_range_fn fn, + void *priv); +bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno); +int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_extlen_t len, + bool *is_free); +/* + * Free an extent in the realtime subvolume. Length is expressed in + * realtime extents, as is the block number. + */ +int /* error */ +xfs_rtfree_extent( + struct xfs_trans *tp, /* transaction pointer */ + xfs_rtblock_t bno, /* starting block number to free */ + xfs_extlen_t len); /* length of extent freed */ + +/* Same as above, but in units of rt blocks. */ +int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, + xfs_filblks_t rtlen); +#else /* CONFIG_XFS_RT */ +# define xfs_rtfree_extent(t,b,l) (-ENOSYS) +# define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) +# define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) +# define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) +# define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) +# define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) +#endif /* CONFIG_XFS_RT */ + +#endif /* __XFS_RTBITMAP_H__ */ diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 0a3bde64c675..faf6e0890d44 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -9,11 +9,11 @@ #include "xfs_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" #include "xfs_log_format.h" #include "xfs_trans.h" -#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" #include "xfs_inode.h" #include "xfs_bmap.h" #include "scrub/scrub.h" #include "scrub/common.h"
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 062e5dc5db9f..a5b9754c62d1 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -21,11 +21,11 @@ #include <linux/fsmap.h> #include "xfs_fsmap.h" #include "xfs_refcount.h" #include "xfs_refcount_btree.h" #include "xfs_alloc_btree.h" -#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" #include "xfs_ag.h"
/* Convert an xfs_fsmap to an fsmap. */ static void xfs_fsmap_from_internal( diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 0bfbbc1dd0da..7ce122da43fe 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -17,10 +17,11 @@ #include "xfs_trans.h" #include "xfs_trans_space.h" #include "xfs_icache.h" #include "xfs_rtalloc.h" #include "xfs_sb.h" +#include "xfs_rtbitmap.h"
/* * Read and return the summary information for a given extent size, * bitmap block combination. * Keeps track of a current summary block, so we don't keep reading diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index 65c284e9d33e..11859c259a1c 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -9,60 +9,31 @@ /* kernel only definitions and functions */
struct xfs_mount; struct xfs_trans;
-/* - * XXX: Most of the realtime allocation functions deal in units of realtime - * extents, not realtime blocks. This looks funny when paired with the type - * name and screams for a larger cleanup. - */ -struct xfs_rtalloc_rec { - xfs_rtblock_t ar_startext; - xfs_rtblock_t ar_extcount; -}; - -typedef int (*xfs_rtalloc_query_range_fn)( - struct xfs_mount *mp, - struct xfs_trans *tp, - const struct xfs_rtalloc_rec *rec, - void *priv); - #ifdef CONFIG_XFS_RT /* * Function prototypes for exported functions. */
/* * Allocate an extent in the realtime subvolume, with the usual allocation * parameters. The length units are all in realtime extents, as is the * result block number. */ int /* error */ xfs_rtallocate_extent( struct xfs_trans *tp, /* transaction pointer */ xfs_rtblock_t bno, /* starting block number to allocate */ xfs_extlen_t minlen, /* minimum length to allocate */ xfs_extlen_t maxlen, /* maximum length to allocate */ xfs_extlen_t *len, /* out: actual length allocated */ int wasdel, /* was a delayed allocation extent */ xfs_extlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock); /* out: start block allocated */
-/* - * Free an extent in the realtime subvolume. Length is expressed in - * realtime extents, as is the block number. - */ -int /* error */ -xfs_rtfree_extent( - struct xfs_trans *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number to free */ - xfs_extlen_t len); /* length of extent freed */ - -/* Same as above, but in units of rt blocks. */ -int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, - xfs_filblks_t rtlen);
/* * Initialize realtime fields in the mount structure. */ int /* error */ @@ -100,59 +71,15 @@ xfs_rtpick_extent( int xfs_growfs_rt( struct xfs_mount *mp, /* file system mount structure */ xfs_growfs_rt_t *in); /* user supplied growfs struct */
-/* - * From xfs_rtbitmap.c - */ -int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t block, int issum, struct xfs_buf **bpp); -int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, int val, - xfs_rtblock_t *new, int *stat); -int xfs_rtfind_back(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtblock_t limit, - xfs_rtblock_t *rtblock); -int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtblock_t limit, - xfs_rtblock_t *rtblock); -int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, int val); -int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp, - int log, xfs_rtblock_t bbno, int delta, - struct xfs_buf **rbpp, xfs_fsblock_t *rsb, - xfs_suminfo_t *sum); -int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log, - xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp, - xfs_fsblock_t *rsb); -int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, - struct xfs_buf **rbpp, xfs_fsblock_t *rsb); -int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp, - const struct xfs_rtalloc_rec *low_rec, - const struct xfs_rtalloc_rec *high_rec, - xfs_rtalloc_query_range_fn fn, void *priv); -int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtalloc_query_range_fn fn, - void *priv); -bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno); -int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, - bool *is_free); int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp); #else # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (-ENOSYS) -# define xfs_rtfree_extent(t,b,l) (-ENOSYS) -# define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) # define xfs_rtpick_extent(m,t,l,rb) (-ENOSYS) # define xfs_growfs_rt(mp,in) (-ENOSYS) -# define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) -# define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) -# define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) -# define xfs_verify_rtbno(m, r) (false) -# define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) # define xfs_rtalloc_reinit_frextents(m) (0) static inline int /* error */ xfs_rtmount_init( xfs_mount_t *mp) /* file system mount structure */ {
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit f29c3e745dc253bf9d9d06ddc36af1a534ba1dd0 ]
[ 6.1: excluded changes to trace.h as xchk_rtsum_record_free does not exist yet ]
XFS uses xfs_rtblock_t for many different uses, which makes it much more difficult to perform a unit analysis on the codebase. One of these (ab)uses is when we need to store the length of a free space extent as stored in the realtime bitmap. Because there can be up to 2^64 realtime extents in a filesystem, we need a new type that is larger than xfs_rtxlen_t for callers that are querying the bitmap directly. This means scrub and growfs.
Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx lengths. 'b' stands for 'bitmap' or 'big'; reader's choice.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_format.h | 2 +- fs/xfs/libxfs/xfs_rtbitmap.h | 2 +- fs/xfs/libxfs/xfs_types.h | 1 + 3 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 371dc07233e0..20acb8573d7a 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -96,11 +96,11 @@ struct xfs_ifork; typedef struct xfs_sb { uint32_t sb_magicnum; /* magic number == XFS_SB_MAGIC */ uint32_t sb_blocksize; /* logical block size, bytes */ xfs_rfsblock_t sb_dblocks; /* number of data blocks */ xfs_rfsblock_t sb_rblocks; /* number of realtime blocks */ - xfs_rtblock_t sb_rextents; /* number of realtime extents */ + xfs_rtbxlen_t sb_rextents; /* number of realtime extents */ uuid_t sb_uuid; /* user-visible file system unique id */ xfs_fsblock_t sb_logstart; /* starting block of log if internal */ xfs_ino_t sb_rootino; /* root inode number */ xfs_ino_t sb_rbmino; /* bitmap inode for realtime extents */ xfs_ino_t sb_rsumino; /* summary inode for rt bitmap */ diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 546dea34bb37..c3ef22e67aa3 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -11,11 +11,11 @@ * extents, not realtime blocks. This looks funny when paired with the type * name and screams for a larger cleanup. */ struct xfs_rtalloc_rec { xfs_rtblock_t ar_startext; - xfs_rtblock_t ar_extcount; + xfs_rtbxlen_t ar_extcount; };
typedef int (*xfs_rtalloc_query_range_fn)( struct xfs_mount *mp, struct xfs_trans *tp, diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index 5ebdda7e1078..7023e4a79f87 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -29,10 +29,11 @@ typedef uint32_t xfs_dahash_t; /* dir/attr hash value */ typedef uint64_t xfs_fsblock_t; /* blockno in filesystem (agno|agbno) */ typedef uint64_t xfs_rfsblock_t; /* blockno in filesystem (raw) */ typedef uint64_t xfs_rtblock_t; /* extent (block) in realtime area */ typedef uint64_t xfs_fileoff_t; /* block number in a file */ typedef uint64_t xfs_filblks_t; /* number of blocks in a file */ +typedef uint64_t xfs_rtbxlen_t; /* rtbitmap extent length in rtextents */
typedef int64_t xfs_srtblock_t; /* signed version of xfs_rtblock_t */
/* * New verifiers will return the instruction address of the failing check.
From: Christoph Hellwig hch@lst.de
[ Upstream commit 944df75958807d56f2db9fdc769eb15dd9f0366a ]
minlen is the lower bound on the extent length that the caller can accept, and maxlen is at this point the maximal available length. This means a minlen extent is perfectly fine to use, so do it. This matches the equivalent logic in xfs_rtallocate_extent_exact that also accepts a minlen sized extent.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_rtalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 7ce122da43fe..2f2280f4e7fa 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -316,11 +316,11 @@ xfs_rtallocate_extent_block( break; } /* * Searched the whole thing & didn't find a maxlen free extent. */ - if (minlen < maxlen && besti != -1) { + if (minlen <= maxlen && besti != -1) { xfs_extlen_t p; /* amount to trim length by */
/* * If size should be a multiple of prod, make that so. */
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 07bcbdf020c9fd3c14bec51c50225a2a02707b94 ]
If recovery finds an xattr log intent item calling for the removal of an attribute and the file doesn't even have an attr fork, we know that the removal is trivially complete. However, we can't just exit the recovery function without doing something about the recovered log intent item -- it's still on the AIL, and not logging an attrd item means it stays there forever.
This has likely not been seen in practice because few people use LARP and the runtime code won't log the attri for a no-attrfork removexattr operation. But let's fix this anyway.
Also we shouldn't really be testing the attr fork presence until we've taken the ILOCK, though this doesn't matter much in recovery, which is single threaded.
Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_attr_item.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c index 36fe2abb16e6..11e88a76a33c 100644 --- a/fs/xfs/xfs_attr_item.c +++ b/fs/xfs/xfs_attr_item.c @@ -327,10 +327,17 @@ xfs_xattri_finish_update( if (XFS_TEST_ERROR(false, args->dp->i_mount, XFS_ERRTAG_LARP)) { error = -EIO; goto out; }
+ /* If an attr removal is trivially complete, we're done. */ + if (attr->xattri_op_flags == XFS_ATTRI_OP_FLAGS_REMOVE && + !xfs_inode_hasattr(args->dp)) { + error = 0; + goto out; + } + error = xfs_attr_set_iter(attr); if (!error && attr->xattri_dela_state != XFS_DAS_DONE) error = -EAGAIN; out: /* @@ -606,12 +613,10 @@ xfs_attri_item_recover( attr->xattri_dela_state = xfs_attr_init_replace_state(args); else attr->xattri_dela_state = xfs_attr_init_add_state(args); break; case XFS_ATTRI_OP_FLAGS_REMOVE: - if (!xfs_inode_hasattr(args->dp)) - goto out; attr->xattri_dela_state = xfs_attr_init_remove_state(args); break; default: ASSERT(0); error = -EFSCORRUPTED;
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 ]
[ 6.1: resovled conflict in xfs_defer.c ]
One thing I never quite got around to doing is porting the log intent item recovery code to reconstruct the deferred pending work state. As a result, each intent item open codes xfs_defer_finish_one in its recovery method, because that's what the EFI code did before xfs_defer.c even existed.
This is a gross thing to have left unfixed -- if an EFI cannot proceed due to busy extents, we end up creating separate new EFIs for each unfinished work item, which is a change in behavior from what runtime would have done.
Worse yet, Long Li pointed out that there's a UAF in the recovery code. The ->commit_pass2 function adds the intent item to the AIL and drops the refcount. The one remaining refcount is now owned by the recovery mechanism (aka the log intent items in the AIL) with the intent of giving the refcount to the intent done item in the ->iop_recover function.
However, if something fails later in recovery, xlog_recover_finish will walk the recovered intent items in the AIL and release them. If the CIL hasn't been pushed before that point (which is possible since we don't force the log until later) then the intent done release will try to free its associated intent, which has already been freed.
This patch starts to address this mess by having the ->commit_pass2 functions recreate the xfs_defer_pending state. The next few patches will fix the recovery functions.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_defer.c | 103 +++++++++++++++++++++-------- fs/xfs/libxfs/xfs_defer.h | 5 ++ fs/xfs/libxfs/xfs_log_recover.h | 3 + fs/xfs/xfs_attr_item.c | 10 +-- fs/xfs/xfs_bmap_item.c | 9 +-- fs/xfs/xfs_extfree_item.c | 9 +-- fs/xfs/xfs_log.c | 1 + fs/xfs/xfs_log_priv.h | 1 + fs/xfs/xfs_log_recover.c | 113 ++++++++++++++++---------------- fs/xfs/xfs_refcount_item.c | 9 +-- fs/xfs/xfs_rmap_item.c | 9 +-- 11 files changed, 157 insertions(+), 115 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 92470ed3fcbd..64005ea1e8af 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -243,37 +243,66 @@ xfs_defer_create_intents( ret |= ret2; } return ret; }
-STATIC void +static inline void xfs_defer_pending_abort( + struct xfs_mount *mp, + struct xfs_defer_pending *dfp) +{ + const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type]; + + trace_xfs_defer_pending_abort(mp, dfp); + + if (dfp->dfp_intent && !dfp->dfp_done) { + ops->abort_intent(dfp->dfp_intent); + dfp->dfp_intent = NULL; + } +} + +static inline void +xfs_defer_pending_cancel_work( + struct xfs_mount *mp, + struct xfs_defer_pending *dfp) +{ + const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type]; + struct list_head *pwi; + struct list_head *n; + + trace_xfs_defer_cancel_list(mp, dfp); + + list_del(&dfp->dfp_list); + list_for_each_safe(pwi, n, &dfp->dfp_work) { + list_del(pwi); + dfp->dfp_count--; + ops->cancel_item(pwi); + } + ASSERT(dfp->dfp_count == 0); + kmem_cache_free(xfs_defer_pending_cache, dfp); +} + +STATIC void +xfs_defer_pending_abort_list( struct xfs_mount *mp, struct list_head *dop_list) { struct xfs_defer_pending *dfp; - const struct xfs_defer_op_type *ops;
/* Abort intent items that don't have a done item. */ - list_for_each_entry(dfp, dop_list, dfp_list) { - ops = defer_op_types[dfp->dfp_type]; - trace_xfs_defer_pending_abort(mp, dfp); - if (dfp->dfp_intent && !dfp->dfp_done) { - ops->abort_intent(dfp->dfp_intent); - dfp->dfp_intent = NULL; - } - } + list_for_each_entry(dfp, dop_list, dfp_list) + xfs_defer_pending_abort(mp, dfp); }
/* Abort all the intents that were committed. */ STATIC void xfs_defer_trans_abort( struct xfs_trans *tp, struct list_head *dop_pending) { trace_xfs_defer_trans_abort(tp, _RET_IP_); - xfs_defer_pending_abort(tp->t_mountp, dop_pending); + xfs_defer_pending_abort_list(tp->t_mountp, dop_pending); }
/* * Capture resources that the caller said not to release ("held") when the * transaction commits. Caller is responsible for zero-initializing @dres. @@ -387,30 +416,17 @@ xfs_defer_cancel_list( struct xfs_mount *mp, struct list_head *dop_list) { struct xfs_defer_pending *dfp; struct xfs_defer_pending *pli; - struct list_head *pwi; - struct list_head *n; - const struct xfs_defer_op_type *ops;
/* * Free the pending items. Caller should already have arranged * for the intent items to be released. */ - list_for_each_entry_safe(dfp, pli, dop_list, dfp_list) { - ops = defer_op_types[dfp->dfp_type]; - trace_xfs_defer_cancel_list(mp, dfp); - list_del(&dfp->dfp_list); - list_for_each_safe(pwi, n, &dfp->dfp_work) { - list_del(pwi); - dfp->dfp_count--; - ops->cancel_item(pwi); - } - ASSERT(dfp->dfp_count == 0); - kmem_cache_free(xfs_defer_pending_cache, dfp); - } + list_for_each_entry_safe(dfp, pli, dop_list, dfp_list) + xfs_defer_pending_cancel_work(mp, dfp); }
/* * Prevent a log intent item from pinning the tail of the log by logging a * done item to release the intent item; and then log a new intent item. @@ -661,10 +677,43 @@ xfs_defer_add(
list_add_tail(li, &dfp->dfp_work); dfp->dfp_count++; }
+/* + * Create a pending deferred work item to replay the recovered intent item + * and add it to the list. + */ +void +xfs_defer_start_recovery( + struct xfs_log_item *lip, + enum xfs_defer_ops_type dfp_type, + struct list_head *r_dfops) +{ + struct xfs_defer_pending *dfp; + + dfp = kmem_cache_zalloc(xfs_defer_pending_cache, + GFP_NOFS | __GFP_NOFAIL); + dfp->dfp_type = dfp_type; + dfp->dfp_intent = lip; + INIT_LIST_HEAD(&dfp->dfp_work); + list_add_tail(&dfp->dfp_list, r_dfops); +} + +/* + * Cancel a deferred work item created to recover a log intent item. @dfp + * will be freed after this function returns. + */ +void +xfs_defer_cancel_recovery( + struct xfs_mount *mp, + struct xfs_defer_pending *dfp) +{ + xfs_defer_pending_abort(mp, dfp); + xfs_defer_pending_cancel_work(mp, dfp); +} + /* * Move deferred ops from one transaction to another and reset the source to * initial state. This is primarily used to carry state forward across * transaction rolls with pending dfops. */ @@ -765,11 +814,11 @@ xfs_defer_ops_capture_abort( struct xfs_mount *mp, struct xfs_defer_capture *dfc) { unsigned short i;
- xfs_defer_pending_abort(mp, &dfc->dfc_dfops); + xfs_defer_pending_abort_list(mp, &dfc->dfc_dfops); xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
for (i = 0; i < dfc->dfc_held.dr_bufs; i++) xfs_buf_relse(dfc->dfc_held.dr_bp[i]);
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index 8788ad5f6a73..5dce938ba3d5 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -123,9 +123,14 @@ void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp, struct xfs_defer_resources *dres); void xfs_defer_ops_capture_abort(struct xfs_mount *mp, struct xfs_defer_capture *d); void xfs_defer_resources_rele(struct xfs_defer_resources *dres);
+void xfs_defer_start_recovery(struct xfs_log_item *lip, + enum xfs_defer_ops_type dfp_type, struct list_head *r_dfops); +void xfs_defer_cancel_recovery(struct xfs_mount *mp, + struct xfs_defer_pending *dfp); + int __init xfs_defer_init_item_caches(void); void xfs_defer_destroy_item_caches(void);
#endif /* __XFS_DEFER_H__ */ diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h index a5100a11faf9..271a4ce7375c 100644 --- a/fs/xfs/libxfs/xfs_log_recover.h +++ b/fs/xfs/libxfs/xfs_log_recover.h @@ -151,6 +151,9 @@ xlog_recover_resv(const struct xfs_trans_res *r) };
return ret; }
+void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip, + xfs_lsn_t lsn, unsigned int dfp_type); + #endif /* __XFS_LOG_RECOVER_H__ */ diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c index 11e88a76a33c..a32716b8cbbd 100644 --- a/fs/xfs/xfs_attr_item.c +++ b/fs/xfs/xfs_attr_item.c @@ -770,18 +770,12 @@ xlog_recover_attri_commit_pass2( attri_formatp->alfi_value_len);
attrip = xfs_attri_init(mp, nv); memcpy(&attrip->attri_format, attri_formatp, len);
- /* - * The ATTRI has two references. One for the ATTRD and one for ATTRI to - * ensure it makes it into the AIL. Insert the ATTRI into the AIL - * directly and drop the ATTRI reference. Note that - * xfs_trans_ail_update() drops the AIL lock. - */ - xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn); - xfs_attri_release(attrip); + xlog_recover_intent_item(log, &attrip->attri_item, lsn, + XFS_DEFER_OPS_TYPE_ATTR); xfs_attri_log_nameval_put(nv); return 0; }
/* diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 1058603db3ac..8d08252e1953 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -644,16 +644,13 @@ xlog_recover_bui_commit_pass2( }
buip = xfs_bui_init(mp); xfs_bui_copy_format(&buip->bui_format, bui_formatp); atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents); - /* - * Insert the intent into the AIL directly and drop one reference so - * that finishing or canceling the work will drop the other. - */ - xfs_trans_ail_insert(log->l_ailp, &buip->bui_item, lsn); - xfs_bui_release(buip); + + xlog_recover_intent_item(log, &buip->bui_item, lsn, + XFS_DEFER_OPS_TYPE_BMAP); return 0; }
const struct xlog_recover_item_ops xlog_bui_item_ops = { .item_type = XFS_LI_BUI, diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 3ed25c352269..fd9fe51bcc31 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -734,16 +734,13 @@ xlog_recover_efi_commit_pass2( if (error) { xfs_efi_item_free(efip); return error; } atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents); - /* - * Insert the intent into the AIL directly and drop one reference so - * that finishing or canceling the work will drop the other. - */ - xfs_trans_ail_insert(log->l_ailp, &efip->efi_item, lsn); - xfs_efi_release(efip); + + xlog_recover_intent_item(log, &efip->efi_item, lsn, + XFS_DEFER_OPS_TYPE_FREE); return 0; }
const struct xlog_recover_item_ops xlog_efi_item_ops = { .item_type = XFS_LI_EFI, diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index ce6b303484cf..d39ee05ac1f2 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1538,10 +1538,11 @@ xlog_alloc_log( log->l_logBBstart = blk_offset; log->l_logBBsize = num_bblks; log->l_covered_state = XLOG_STATE_COVER_IDLE; set_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate); INIT_DELAYED_WORK(&log->l_work, xfs_log_worker); + INIT_LIST_HEAD(&log->r_dfops);
log->l_prev_block = -1; /* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */ xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0); xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0); diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 1bd2963e8fbd..8677ba92d317 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -401,10 +401,11 @@ struct xlog { struct workqueue_struct *l_ioend_workqueue; /* for I/O completions */ struct delayed_work l_work; /* background flush work */ long l_opstate; /* operational state */ uint l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */ struct list_head *l_buf_cancel_table; + struct list_head r_dfops; /* recovered log intent items */ int l_iclog_hsize; /* size of iclog header */ int l_iclog_heads; /* # of iclog header sectors */ uint l_sectBBsize; /* sector size in BBs (2^n) */ int l_iclog_size; /* size of log in bytes */ int l_iclog_bufs; /* number of iclog buffers */ diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index e009bb23d8a2..65041ed7833d 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1721,34 +1721,28 @@ xlog_clear_stale_blocks( * Release the recovered intent item in the AIL that matches the given intent * type and intent id. */ void xlog_recover_release_intent( - struct xlog *log, - unsigned short intent_type, - uint64_t intent_id) + struct xlog *log, + unsigned short intent_type, + uint64_t intent_id) { - struct xfs_ail_cursor cur; - struct xfs_log_item *lip; - struct xfs_ail *ailp = log->l_ailp; + struct xfs_defer_pending *dfp, *n; + + list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) { + struct xfs_log_item *lip = dfp->dfp_intent;
- spin_lock(&ailp->ail_lock); - for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); lip != NULL; - lip = xfs_trans_ail_cursor_next(ailp, &cur)) { if (lip->li_type != intent_type) continue; if (!lip->li_ops->iop_match(lip, intent_id)) continue;
- spin_unlock(&ailp->ail_lock); - lip->li_ops->iop_release(lip); - spin_lock(&ailp->ail_lock); - break; - } + ASSERT(xlog_item_is_intent(lip));
- xfs_trans_ail_cursor_done(&cur); - spin_unlock(&ailp->ail_lock); + xfs_defer_cancel_recovery(log->l_mp, dfp); + } }
int xlog_recover_iget( struct xfs_mount *mp, @@ -1937,10 +1931,33 @@ xlog_buf_readahead( { if (!xlog_is_buffer_cancelled(log, blkno, len)) xfs_buf_readahead(log->l_mp->m_ddev_targp, blkno, len, ops); }
+/* + * Create a deferred work structure for resuming and tracking the progress of a + * log intent item that was found during recovery. + */ +void +xlog_recover_intent_item( + struct xlog *log, + struct xfs_log_item *lip, + xfs_lsn_t lsn, + unsigned int dfp_type) +{ + ASSERT(xlog_item_is_intent(lip)); + + xfs_defer_start_recovery(lip, dfp_type, &log->r_dfops); + + /* + * Insert the intent into the AIL directly and drop one reference so + * that finishing or canceling the work will drop the other. + */ + xfs_trans_ail_insert(log->l_ailp, lip, lsn); + lip->li_ops->iop_unpin(lip, 0); +} + STATIC int xlog_recover_items_pass2( struct xlog *log, struct xlog_recover *trans, struct list_head *buffer_list, @@ -2534,104 +2551,88 @@ xlog_abort_defer_ops( * have started recovery on all the pending intents when we find an non-intent * item in the AIL. */ STATIC int xlog_recover_process_intents( - struct xlog *log) + struct xlog *log) { LIST_HEAD(capture_list); - struct xfs_ail_cursor cur; - struct xfs_log_item *lip; - struct xfs_ail *ailp; - int error = 0; + struct xfs_defer_pending *dfp, *n; + int error = 0; #if defined(DEBUG) || defined(XFS_WARN) - xfs_lsn_t last_lsn; -#endif + xfs_lsn_t last_lsn;
- ailp = log->l_ailp; - spin_lock(&ailp->ail_lock); -#if defined(DEBUG) || defined(XFS_WARN) last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block); #endif - for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); - lip != NULL; - lip = xfs_trans_ail_cursor_next(ailp, &cur)) { - const struct xfs_item_ops *ops;
- if (!xlog_item_is_intent(lip)) - break; + list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) { + struct xfs_log_item *lip = dfp->dfp_intent; + const struct xfs_item_ops *ops = lip->li_ops; + + ASSERT(xlog_item_is_intent(lip));
/* * We should never see a redo item with a LSN higher than * the last transaction we found in the log at the start * of recovery. */ ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0);
/* * NOTE: If your intent processing routine can create more * deferred ops, you /must/ attach them to the capture list in * the recover routine or else those subsequent intents will be * replayed in the wrong order! * * The recovery function can free the log item, so we must not * access lip after it returns. */ - spin_unlock(&ailp->ail_lock); - ops = lip->li_ops; error = ops->iop_recover(lip, &capture_list); - spin_lock(&ailp->ail_lock); if (error) { trace_xlog_intent_recovery_failed(log->l_mp, error, ops->iop_recover); break; } - }
- xfs_trans_ail_cursor_done(&cur); - spin_unlock(&ailp->ail_lock); + /* + * XXX: @lip could have been freed, so detach the log item from + * the pending item before freeing the pending item. This does + * not fix the existing UAF bug that occurs if ->iop_recover + * fails after creating the intent done item. + */ + dfp->dfp_intent = NULL; + xfs_defer_cancel_recovery(log->l_mp, dfp); + } if (error) goto err;
error = xlog_finish_defer_ops(log->l_mp, &capture_list); if (error) goto err;
return 0; err: xlog_abort_defer_ops(log->l_mp, &capture_list); return error; }
/* * A cancel occurs when the mount has failed and we're bailing out. Release all * pending log intent items that we haven't started recovery on so they don't * pin the AIL. */ STATIC void xlog_recover_cancel_intents( - struct xlog *log) + struct xlog *log) { - struct xfs_log_item *lip; - struct xfs_ail_cursor cur; - struct xfs_ail *ailp; - - ailp = log->l_ailp; - spin_lock(&ailp->ail_lock); - lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); - while (lip != NULL) { - if (!xlog_item_is_intent(lip)) - break; + struct xfs_defer_pending *dfp, *n;
- spin_unlock(&ailp->ail_lock); - lip->li_ops->iop_release(lip); - spin_lock(&ailp->ail_lock); - lip = xfs_trans_ail_cursor_next(ailp, &cur); - } + list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) { + ASSERT(xlog_item_is_intent(dfp->dfp_intent));
- xfs_trans_ail_cursor_done(&cur); - spin_unlock(&ailp->ail_lock); + xfs_defer_cancel_recovery(log->l_mp, dfp); + } }
/* * This routine performs a transaction to null out a bad inode pointer * in an agi unlinked inode hash bucket. diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index dfd7b824e32b..1e047107d2f2 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -666,16 +666,13 @@ xlog_recover_cui_commit_pass2( }
cuip = xfs_cui_init(mp, cui_formatp->cui_nextents); xfs_cui_copy_format(&cuip->cui_format, cui_formatp); atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents); - /* - * Insert the intent into the AIL directly and drop one reference so - * that finishing or canceling the work will drop the other. - */ - xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn); - xfs_cui_release(cuip); + + xlog_recover_intent_item(log, &cuip->cui_item, lsn, + XFS_DEFER_OPS_TYPE_REFCOUNT); return 0; }
const struct xlog_recover_item_ops xlog_cui_item_ops = { .item_type = XFS_LI_CUI, diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 2043cea261c0..12ae8ab6a69d 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -680,16 +680,13 @@ xlog_recover_rui_commit_pass2( }
ruip = xfs_rui_init(mp, rui_formatp->rui_nextents); xfs_rui_copy_format(&ruip->rui_format, rui_formatp); atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents); - /* - * Insert the intent into the AIL directly and drop one reference so - * that finishing or canceling the work will drop the other. - */ - xfs_trans_ail_insert(log->l_ailp, &ruip->rui_item, lsn); - xfs_rui_release(ruip); + + xlog_recover_intent_item(log, &ruip->rui_item, lsn, + XFS_DEFER_OPS_TYPE_RMAP); return 0; }
const struct xlog_recover_item_ops xlog_rui_item_ops = { .item_type = XFS_LI_RUI,
Hi,
Leah Rumancik wrote:
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 ]
Commit 03f7767c9f6120ac933378fdec3bfd78bf07bc11 leads to a kernel crash during the xfs/235 test execution on 6.1.
Bisecting this on the mainline shows it is resolved by
e5f1a5146ec3 ("xfs: use xfs_defer_finish_one to finish recovered work items")
which was a part of the same patchset [1] with intent recovery changes but is not included into the current 6.1 backport-series.
[1]: https://lore.kernel.org/linux-xfs/170120318847.13206.17051442307252477333.st...
$ cat local.config export TEST_DEV=/dev/loop0 export TEST_DIR=/mnt/test export SCRATCH_DEV=/dev/loop1 export SCRATCH_MNT=/mnt/scratch export SCRATCH_LOGDEV=/dev/loop2 export SCRATCH_RTDEV=/dev/loop3 export MKFS_OPTIONS="-m rmapbt=1" $ ./check xfs/235 FSTYP -- xfs (debug) PLATFORM -- Linux/x86_64 pserver 6.1.131+ #5 SMP PREEMPT_DYNAMIC Fri Mar 21 00:23:33 2025 MKFS_OPTIONS -- -f -m rmapbt=1 /dev/loop1 MOUNT_OPTIONS -- -o context=unconfined_u:object_r:root_t:s0 /dev/loop1 /mnt/scratch
xfs/235 6s ...
Kernel log taken with "xfs: use xfs_defer_pending objects to recover intent items" as HEAD commit in stable 6.1-queue
[ 24.491106] loop0: detected capacity change from 0 to 2097152 [ 24.497086] loop1: detected capacity change from 0 to 2097152 [ 24.502562] loop2: detected capacity change from 0 to 2097152 [ 24.507256] loop3: detected capacity change from 0 to 2097152 [ 28.801611] XFS (loop0): Mounting V5 Filesystem [ 28.857208] XFS (loop0): Ending clean mount [ 32.056570] XFS (loop1): Mounting V5 Filesystem [ 32.062298] XFS (loop1): Ending clean mount [ 32.068974] XFS (loop1): Unmounting Filesystem [ 32.129230] XFS (loop0): EXPERIMENTAL online scrub feature in use. Use at your own risk! [ 32.242946] XFS (loop0): Unmounting Filesystem [ 32.312170] XFS (loop0): Mounting V5 Filesystem [ 32.317663] XFS (loop0): Ending clean mount [ 32.362570] run fstests xfs/235 at 2025-03-21 08:06:16 [ 33.010312] XFS (loop1): Mounting V5 Filesystem [ 33.016581] XFS (loop1): Ending clean mount [ 33.047697] XFS (loop1): Unmounting Filesystem [ 33.124483] XFS (loop1): Mounting V5 Filesystem [ 33.130343] XFS (loop1): Ending clean mount [ 33.200269] cp (1897) used greatest stack depth: 22704 bytes left [ 33.229054] XFS (loop1): Unmounting Filesystem [ 33.348374] XFS (loop1): Mounting V5 Filesystem [ 33.352033] XFS (loop1): Ending clean mount [ 33.363401] XFS (loop1): Metadata CRC error detected at xfs_rmapbt_read_verify+0x28/0x240, xfs_rmapbt block 0x28 [ 33.364028] XFS (loop1): Unmount and run xfs_repair [ 33.364186] XFS (loop1): First 128 bytes of corrupted metadata buffer: [ 33.364387] 00000000: 52 4d 42 33 00 00 00 0b ff ff ff ff ff ff ff ff RMB3............ [ 33.364642] 00000010: 00 00 00 00 00 00 00 28 00 00 00 01 00 00 00 02 .......(........ [ 33.364881] 00000020: 85 c5 70 96 4c 65 44 11 bf 35 27 35 35 35 ec 19 ..p.LeD..5'555.. [ 33.365173] 00000030: 00 00 00 00 16 08 c6 49 00 00 00 00 00 00 00 01 .......I........ [ 33.365484] 00000040: ff ff ff ff ff ff ff fd 00 00 00 00 00 00 00 00 ................ [ 33.365854] 00000050: 00 00 00 01 00 00 00 02 ff ff ff ff ff ff ff fb ................ [ 33.366204] 00000060: 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 02 ................ [ 33.366435] 00000070: ff ff ff ff ff ff ff fa 00 00 00 00 00 00 00 00 ................ [ 33.366694] XFS (loop1): metadata I/O error in "xfs_btree_read_buf_block.constprop.0+0x1ae/0x360" at daddr 0x28 len 8 error 74 [ 33.374164] XFS (loop1): Corruption of in-memory data (0x8) detected at xfs_defer_finish_noroll+0xcc3/0x1c10 (fs/xfs/libxfs/xfs_defer.c:596). Shutting down filesys. [ 33.374863] XFS (loop1): Please unmount the filesystem and rectify the problem(s) [ 33.389190] XFS (loop1): Unmounting Filesystem [ 33.422139] XFS (loop1): Mounting V5 Filesystem [ 33.426543] XFS (loop1): Starting recovery (logdev: internal) [ 33.428526] XFS (loop1): Metadata CRC error detected at xfs_rmapbt_read_verify+0x28/0x240, xfs_rmapbt block 0x28 [ 33.428923] XFS (loop1): Unmount and run xfs_repair [ 33.429147] XFS (loop1): First 128 bytes of corrupted metadata buffer: [ 33.429378] 00000000: 52 4d 42 33 00 00 00 0b ff ff ff ff ff ff ff ff RMB3............ [ 33.429788] 00000010: 00 00 00 00 00 00 00 28 00 00 00 01 00 00 00 02 .......(........ [ 33.430087] 00000020: 85 c5 70 96 4c 65 44 11 bf 35 27 35 35 35 ec 19 ..p.LeD..5'555.. [ 33.430442] 00000030: 00 00 00 00 16 08 c6 49 00 00 00 00 00 00 00 01 .......I........ [ 33.430733] 00000040: ff ff ff ff ff ff ff fd 00 00 00 00 00 00 00 00 ................ [ 33.431047] 00000050: 00 00 00 01 00 00 00 02 ff ff ff ff ff ff ff fb ................ [ 33.431350] 00000060: 00 00 00 00 00 00 00 00 00 00 00 03 00 00 00 02 ................ [ 33.431616] 00000070: ff ff ff ff ff ff ff fa 00 00 00 00 00 00 00 00 ................ [ 33.431858] XFS (loop1): metadata I/O error in "xfs_btree_read_buf_block.constprop.0+0x1ae/0x360" at daddr 0x28 len 8 error 74 [ 33.432279] 00000000: 87 00 00 00 00 00 00 00 98 00 00 00 00 00 00 00 ................ [ 33.432553] 00000010: 00 00 00 00 00 00 00 00 70 00 00 00 01 00 00 20 ........p...... [ 33.432883] XFS (loop1): Internal error xfs_rui_item_recover at line 573 of file fs/xfs/xfs_rmap_item.c. Caller xlog_recover_process_intents+0x221/0xbc0 [ 33.433397] CPU: 0 PID: 2107 Comm: mount Not tainted 6.1.131+ #5 [ 33.433607] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [ 33.433893] Call Trace: [ 33.433985] <TASK> [ 33.434065] dump_stack_lvl+0x91/0xbe [ 33.434204] xfs_corruption_error+0x139/0x160 [ 33.434540] xfs_rui_item_recover+0x8b2/0xb70 [ 33.435985] xlog_recover_process_intents+0x221/0xbc0 [ 33.437896] xlog_recover_finish+0x8b/0x9f0 [ 33.439417] xfs_log_mount_finish+0x386/0x650 [ 33.439565] xfs_mountfs+0x125c/0x1d90 [ 33.440407] xfs_fs_fill_super+0x1327/0x1ea0 [ 33.440547] get_tree_bdev+0x42c/0x740 [ 33.440850] vfs_get_tree+0x8e/0x2f0 [ 33.440969] path_mount+0x137f/0x1ec0 [ 33.441561] __x64_sys_mount+0x286/0x310 [ 33.442210] do_syscall_64+0x39/0x90 [ 33.442340] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 33.444714] </TASK> [ 33.444947] XFS (loop1): Corruption detected. Unmount and run xfs_repair [ 33.445212] XFS (loop1): Internal error xfs_trans_cancel at line 1096 of file fs/xfs/xfs_trans.c. Caller xfs_rui_item_recover+0x8de/0xb70 [ 33.445667] CPU: 0 PID: 2107 Comm: mount Not tainted 6.1.131+ #5 [ 33.445905] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 [ 33.446225] Call Trace: [ 33.446318] <TASK> [ 33.446406] dump_stack_lvl+0x91/0xbe [ 33.446552] xfs_error_report+0xb7/0xc0 [ 33.446875] xfs_trans_cancel+0x512/0x6d0 [ 33.447025] xfs_rui_item_recover+0x8de/0xb70 [ 33.448066] xlog_recover_process_intents+0x221/0xbc0 [ 33.449553] xlog_recover_finish+0x8b/0x9f0 [ 33.450811] xfs_log_mount_finish+0x386/0x650 [ 33.450971] xfs_mountfs+0x125c/0x1d90 [ 33.451866] xfs_fs_fill_super+0x1327/0x1ea0 [ 33.452028] get_tree_bdev+0x42c/0x740 [ 33.452307] vfs_get_tree+0x8e/0x2f0 [ 33.452434] path_mount+0x137f/0x1ec0 [ 33.453011] __x64_sys_mount+0x286/0x310 [ 33.453658] do_syscall_64+0x39/0x90 [ 33.453787] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 33.456137] </TASK> [ 33.456250] XFS (loop1): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x52b/0x6d0 (fs/xfs/xfs_trans.c:1097). Shutting down filesystem. [ 33.456734] XFS (loop1): Please unmount the filesystem and rectify the problem(s) [ 33.457005] ==================================================================
After there goes KASAN [decoded].
BUG: KASAN: use-after-free in xlog_item_is_intent (fs/xfs/xfs_trans.h:103) BUG: KASAN: use-after-free in xlog_recover_cancel_intents (fs/xfs/xfs_log_recover.c:2630) Read of size 8 at addr ffff88802c268390 by task mount/2107
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107) print_report (mm/kasan/report.c:317 mm/kasan/report.c:427) kasan_report (mm/kasan/report.c:194 mm/kasan/report.c:533) xlog_item_is_intent (fs/xfs/xfs_trans.h:103 inline) xlog_recover_cancel_intents (fs/xfs/xfs_log_recover.c:2630) xlog_recover_finish (fs/xfs/xfs_log_recover.c:3468) xfs_log_mount_finish (fs/xfs/xfs_log.c:796) xfs_mountfs (fs/xfs/xfs_mount.c:934) xfs_fs_fill_super (fs/xfs/xfs_super.c:1682) get_tree_bdev (fs/super.c:1367) vfs_get_tree (fs/super.c:1574) path_mount (fs/namespace.c:3386) __x64_sys_mount (fs/namespace.c:3584) do_syscall_64 (arch/x86/entry/common.c:81) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121) </TASK>
Allocated by task 2107: kasan_save_stack (mm/kasan/common.c:46) kasan_set_track (mm/kasan/common.c:52) __kasan_slab_alloc (mm/kasan/common.c:328) kmem_cache_alloc (mm/slub.c:3422) xfs_rui_init (./include/linux/slab.h:689 fs/xfs/xfs_rmap_item.c:146) xlog_recover_rui_commit_pass2 (fs/xfs/xfs_rmap_item.c:683) xlog_recover_items_pass2 (fs/xfs/xfs_log_recover.c:1976) xlog_recover_commit_trans (fs/xfs/xfs_log_recover.c:2043) xlog_recovery_process_trans (fs/xfs/xfs_log_recover.c:2274) xlog_recover_process_ophdr (fs/xfs/xfs_log_recover.c:2420) xlog_recover_process_data (fs/xfs/xfs_log_recover.c:2467) xlog_recover_process (fs/xfs/xfs_log_recover.c:2903) xlog_do_recovery_pass (fs/xfs/xfs_log_recover.c:3199) xlog_do_log_recovery (fs/xfs/xfs_log_recover.c:3279) xlog_do_recover (fs/xfs/xfs_log_recover.c:3306) xlog_recover (fs/xfs/xfs_log_recover.c:3439) xfs_log_mount (fs/xfs/xfs_log.c:717) xfs_mountfs (fs/xfs/xfs_mount.c:822) xfs_fs_fill_super (fs/xfs/xfs_super.c:1682) get_tree_bdev (fs/super.c:1367) vfs_get_tree (fs/super.c:1574) path_mount (fs/namespace.c:3057) __x64_sys_mount (fs/namespace.c:3584) do_syscall_64 (arch/x86/entry/common.c:81) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
Freed by task 2107: kasan_save_stack (mm/kasan/common.c:46) kasan_set_track (mm/kasan/common.c:52) kasan_save_free_info (mm/kasan/generic.c:518) ____kasan_slab_free (mm/kasan/common.c:200) kmem_cache_free (mm/slub.c:3683) xfs_rui_item_free (fs/xfs/xfs_rmap_item.c:43) xfs_rui_release (fs/xfs/xfs_rmap_item.c:61) xfs_rud_item_release (fs/xfs/xfs_rmap_item.c:207) xfs_trans_free_items (fs/xfs/xfs_trans.c:711) xfs_trans_cancel (fs/xfs/xfs_trans.c:1117) xfs_rui_item_recover (fs/xfs/xfs_rmap_item.c:586) xlog_recover_process_intents (fs/xfs/xfs_log_recover.c:2590) xlog_recover_finish (fs/xfs/xfs_log_recover.c:3459) xfs_log_mount_finish (fs/xfs/xfs_log.c:796) xfs_mountfs (fs/xfs/xfs_mount.c:934) xfs_fs_fill_super (fs/xfs/xfs_super.c:1682) get_tree_bdev (fs/super.c:1367) vfs_get_tree (fs/super.c:1574) path_mount (fs/namespace.c:3386) __x64_sys_mount (fs/namespace.c:3584) do_syscall_64 (arch/x86/entry/common.c:81) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
The buggy address belongs to the object at ffff88802c268330 which belongs to the cache xfs_rui_item of size 688 The buggy address is located 96 bytes inside of 688-byte region [ffff88802c268330, ffff88802c2685e0)
I'd say the following 6.1 patches from the current backport-series
[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover
transform the crash a little bit to a NULL pointer dereference in the same location but do not suppress the problem. It is reproducible above the relevant top of queue/6.1 of linux-stable-rc.git
Thanks, Fedor
[ 6.1: resovled conflict in xfs_defer.c ]
One thing I never quite got around to doing is porting the log intent item recovery code to reconstruct the deferred pending work state. As a result, each intent item open codes xfs_defer_finish_one in its recovery method, because that's what the EFI code did before xfs_defer.c even existed.
This is a gross thing to have left unfixed -- if an EFI cannot proceed due to busy extents, we end up creating separate new EFIs for each unfinished work item, which is a change in behavior from what runtime would have done.
Worse yet, Long Li pointed out that there's a UAF in the recovery code. The ->commit_pass2 function adds the intent item to the AIL and drops the refcount. The one remaining refcount is now owned by the recovery mechanism (aka the log intent items in the AIL) with the intent of giving the refcount to the intent done item in the ->iop_recover function.
However, if something fails later in recovery, xlog_recover_finish will walk the recovered intent items in the AIL and release them. If the CIL hasn't been pushed before that point (which is possible since we don't force the log until later) then the intent done release will try to free its associated intent, which has already been freed.
This patch starts to address this mess by having the ->commit_pass2 functions recreate the xfs_defer_pending state. The next few patches will fix the recovery functions.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org
fs/xfs/libxfs/xfs_defer.c | 103 +++++++++++++++++++++-------- fs/xfs/libxfs/xfs_defer.h | 5 ++ fs/xfs/libxfs/xfs_log_recover.h | 3 + fs/xfs/xfs_attr_item.c | 10 +-- fs/xfs/xfs_bmap_item.c | 9 +-- fs/xfs/xfs_extfree_item.c | 9 +-- fs/xfs/xfs_log.c | 1 + fs/xfs/xfs_log_priv.h | 1 + fs/xfs/xfs_log_recover.c | 113 ++++++++++++++++---------------- fs/xfs/xfs_refcount_item.c | 9 +-- fs/xfs/xfs_rmap_item.c | 9 +-- 11 files changed, 157 insertions(+), 115 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 92470ed3fcbd..64005ea1e8af 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -243,37 +243,66 @@ xfs_defer_create_intents( ret |= ret2; } return ret; } -STATIC void +static inline void xfs_defer_pending_abort(
- struct xfs_mount *mp,
- struct xfs_defer_pending *dfp)
+{
- const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type];
- trace_xfs_defer_pending_abort(mp, dfp);
- if (dfp->dfp_intent && !dfp->dfp_done) {
ops->abort_intent(dfp->dfp_intent);
dfp->dfp_intent = NULL;
- }
+}
+static inline void +xfs_defer_pending_cancel_work(
- struct xfs_mount *mp,
- struct xfs_defer_pending *dfp)
+{
- const struct xfs_defer_op_type *ops = defer_op_types[dfp->dfp_type];
- struct list_head *pwi;
- struct list_head *n;
- trace_xfs_defer_cancel_list(mp, dfp);
- list_del(&dfp->dfp_list);
- list_for_each_safe(pwi, n, &dfp->dfp_work) {
list_del(pwi);
dfp->dfp_count--;
ops->cancel_item(pwi);
- }
- ASSERT(dfp->dfp_count == 0);
- kmem_cache_free(xfs_defer_pending_cache, dfp);
+}
+STATIC void +xfs_defer_pending_abort_list( struct xfs_mount *mp, struct list_head *dop_list) { struct xfs_defer_pending *dfp;
- const struct xfs_defer_op_type *ops;
/* Abort intent items that don't have a done item. */
- list_for_each_entry(dfp, dop_list, dfp_list) {
ops = defer_op_types[dfp->dfp_type];
trace_xfs_defer_pending_abort(mp, dfp);
if (dfp->dfp_intent && !dfp->dfp_done) {
ops->abort_intent(dfp->dfp_intent);
dfp->dfp_intent = NULL;
}
- }
- list_for_each_entry(dfp, dop_list, dfp_list)
xfs_defer_pending_abort(mp, dfp);
} /* Abort all the intents that were committed. */ STATIC void xfs_defer_trans_abort( struct xfs_trans *tp, struct list_head *dop_pending) { trace_xfs_defer_trans_abort(tp, _RET_IP_);
- xfs_defer_pending_abort(tp->t_mountp, dop_pending);
- xfs_defer_pending_abort_list(tp->t_mountp, dop_pending);
} /*
- Capture resources that the caller said not to release ("held") when the
- transaction commits. Caller is responsible for zero-initializing @dres.
@@ -387,30 +416,17 @@ xfs_defer_cancel_list( struct xfs_mount *mp, struct list_head *dop_list) { struct xfs_defer_pending *dfp; struct xfs_defer_pending *pli;
- struct list_head *pwi;
- struct list_head *n;
- const struct xfs_defer_op_type *ops;
/* * Free the pending items. Caller should already have arranged * for the intent items to be released. */
- list_for_each_entry_safe(dfp, pli, dop_list, dfp_list) {
ops = defer_op_types[dfp->dfp_type];
trace_xfs_defer_cancel_list(mp, dfp);
list_del(&dfp->dfp_list);
list_for_each_safe(pwi, n, &dfp->dfp_work) {
list_del(pwi);
dfp->dfp_count--;
ops->cancel_item(pwi);
}
ASSERT(dfp->dfp_count == 0);
kmem_cache_free(xfs_defer_pending_cache, dfp);
- }
- list_for_each_entry_safe(dfp, pli, dop_list, dfp_list)
xfs_defer_pending_cancel_work(mp, dfp);
} /*
- Prevent a log intent item from pinning the tail of the log by logging a
- done item to release the intent item; and then log a new intent item.
@@ -661,10 +677,43 @@ xfs_defer_add( list_add_tail(li, &dfp->dfp_work); dfp->dfp_count++; } +/*
- Create a pending deferred work item to replay the recovered intent item
- and add it to the list.
- */
+void +xfs_defer_start_recovery(
- struct xfs_log_item *lip,
- enum xfs_defer_ops_type dfp_type,
- struct list_head *r_dfops)
+{
- struct xfs_defer_pending *dfp;
- dfp = kmem_cache_zalloc(xfs_defer_pending_cache,
GFP_NOFS | __GFP_NOFAIL);
- dfp->dfp_type = dfp_type;
- dfp->dfp_intent = lip;
- INIT_LIST_HEAD(&dfp->dfp_work);
- list_add_tail(&dfp->dfp_list, r_dfops);
+}
+/*
- Cancel a deferred work item created to recover a log intent item. @dfp
- will be freed after this function returns.
- */
+void +xfs_defer_cancel_recovery(
- struct xfs_mount *mp,
- struct xfs_defer_pending *dfp)
+{
- xfs_defer_pending_abort(mp, dfp);
- xfs_defer_pending_cancel_work(mp, dfp);
+}
/*
- Move deferred ops from one transaction to another and reset the source to
- initial state. This is primarily used to carry state forward across
- transaction rolls with pending dfops.
*/ @@ -765,11 +814,11 @@ xfs_defer_ops_capture_abort( struct xfs_mount *mp, struct xfs_defer_capture *dfc) { unsigned short i;
- xfs_defer_pending_abort(mp, &dfc->dfc_dfops);
- xfs_defer_pending_abort_list(mp, &dfc->dfc_dfops); xfs_defer_cancel_list(mp, &dfc->dfc_dfops);
for (i = 0; i < dfc->dfc_held.dr_bufs; i++) xfs_buf_relse(dfc->dfc_held.dr_bp[i]); diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index 8788ad5f6a73..5dce938ba3d5 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -123,9 +123,14 @@ void xfs_defer_ops_continue(struct xfs_defer_capture *d, struct xfs_trans *tp, struct xfs_defer_resources *dres); void xfs_defer_ops_capture_abort(struct xfs_mount *mp, struct xfs_defer_capture *d); void xfs_defer_resources_rele(struct xfs_defer_resources *dres); +void xfs_defer_start_recovery(struct xfs_log_item *lip,
enum xfs_defer_ops_type dfp_type, struct list_head *r_dfops);
+void xfs_defer_cancel_recovery(struct xfs_mount *mp,
struct xfs_defer_pending *dfp);
int __init xfs_defer_init_item_caches(void); void xfs_defer_destroy_item_caches(void); #endif /* __XFS_DEFER_H__ */ diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h index a5100a11faf9..271a4ce7375c 100644 --- a/fs/xfs/libxfs/xfs_log_recover.h +++ b/fs/xfs/libxfs/xfs_log_recover.h @@ -151,6 +151,9 @@ xlog_recover_resv(const struct xfs_trans_res *r) }; return ret; } +void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip,
xfs_lsn_t lsn, unsigned int dfp_type);
#endif /* __XFS_LOG_RECOVER_H__ */ diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c index 11e88a76a33c..a32716b8cbbd 100644 --- a/fs/xfs/xfs_attr_item.c +++ b/fs/xfs/xfs_attr_item.c @@ -770,18 +770,12 @@ xlog_recover_attri_commit_pass2( attri_formatp->alfi_value_len); attrip = xfs_attri_init(mp, nv); memcpy(&attrip->attri_format, attri_formatp, len);
- /*
* The ATTRI has two references. One for the ATTRD and one for ATTRI to
* ensure it makes it into the AIL. Insert the ATTRI into the AIL
* directly and drop the ATTRI reference. Note that
* xfs_trans_ail_update() drops the AIL lock.
*/
- xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
- xfs_attri_release(attrip);
- xlog_recover_intent_item(log, &attrip->attri_item, lsn,
xfs_attri_log_nameval_put(nv); return 0;XFS_DEFER_OPS_TYPE_ATTR);
} /* diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 1058603db3ac..8d08252e1953 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -644,16 +644,13 @@ xlog_recover_bui_commit_pass2( } buip = xfs_bui_init(mp); xfs_bui_copy_format(&buip->bui_format, bui_formatp); atomic_set(&buip->bui_next_extent, bui_formatp->bui_nextents);
- /*
* Insert the intent into the AIL directly and drop one reference so
* that finishing or canceling the work will drop the other.
*/
- xfs_trans_ail_insert(log->l_ailp, &buip->bui_item, lsn);
- xfs_bui_release(buip);
- xlog_recover_intent_item(log, &buip->bui_item, lsn,
return 0;XFS_DEFER_OPS_TYPE_BMAP);
} const struct xlog_recover_item_ops xlog_bui_item_ops = { .item_type = XFS_LI_BUI, diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 3ed25c352269..fd9fe51bcc31 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -734,16 +734,13 @@ xlog_recover_efi_commit_pass2( if (error) { xfs_efi_item_free(efip); return error; } atomic_set(&efip->efi_next_extent, efi_formatp->efi_nextents);
- /*
* Insert the intent into the AIL directly and drop one reference so
* that finishing or canceling the work will drop the other.
*/
- xfs_trans_ail_insert(log->l_ailp, &efip->efi_item, lsn);
- xfs_efi_release(efip);
- xlog_recover_intent_item(log, &efip->efi_item, lsn,
return 0;XFS_DEFER_OPS_TYPE_FREE);
} const struct xlog_recover_item_ops xlog_efi_item_ops = { .item_type = XFS_LI_EFI, diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c index ce6b303484cf..d39ee05ac1f2 100644 --- a/fs/xfs/xfs_log.c +++ b/fs/xfs/xfs_log.c @@ -1538,10 +1538,11 @@ xlog_alloc_log( log->l_logBBstart = blk_offset; log->l_logBBsize = num_bblks; log->l_covered_state = XLOG_STATE_COVER_IDLE; set_bit(XLOG_ACTIVE_RECOVERY, &log->l_opstate); INIT_DELAYED_WORK(&log->l_work, xfs_log_worker);
- INIT_LIST_HEAD(&log->r_dfops);
log->l_prev_block = -1; /* log->l_tail_lsn = 0x100000000LL; cycle = 1; current block = 0 */ xlog_assign_atomic_lsn(&log->l_tail_lsn, 1, 0); xlog_assign_atomic_lsn(&log->l_last_sync_lsn, 1, 0); diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index 1bd2963e8fbd..8677ba92d317 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -401,10 +401,11 @@ struct xlog { struct workqueue_struct *l_ioend_workqueue; /* for I/O completions */ struct delayed_work l_work; /* background flush work */ long l_opstate; /* operational state */ uint l_quotaoffs_flag; /* XFS_DQ_*, for QUOTAOFFs */ struct list_head *l_buf_cancel_table;
- struct list_head r_dfops; /* recovered log intent items */ int l_iclog_hsize; /* size of iclog header */ int l_iclog_heads; /* # of iclog header sectors */ uint l_sectBBsize; /* sector size in BBs (2^n) */ int l_iclog_size; /* size of log in bytes */ int l_iclog_bufs; /* number of iclog buffers */
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index e009bb23d8a2..65041ed7833d 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -1721,34 +1721,28 @@ xlog_clear_stale_blocks(
- Release the recovered intent item in the AIL that matches the given intent
- type and intent id.
*/ void xlog_recover_release_intent(
- struct xlog *log,
- unsigned short intent_type,
- uint64_t intent_id)
- struct xlog *log,
- unsigned short intent_type,
- uint64_t intent_id)
{
- struct xfs_ail_cursor cur;
- struct xfs_log_item *lip;
- struct xfs_ail *ailp = log->l_ailp;
- struct xfs_defer_pending *dfp, *n;
- list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
struct xfs_log_item *lip = dfp->dfp_intent;
- spin_lock(&ailp->ail_lock);
- for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0); lip != NULL;
if (lip->li_type != intent_type) continue; if (!lip->li_ops->iop_match(lip, intent_id)) continue;lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
spin_unlock(&ailp->ail_lock);
lip->li_ops->iop_release(lip);
spin_lock(&ailp->ail_lock);
break;
- }
ASSERT(xlog_item_is_intent(lip));
- xfs_trans_ail_cursor_done(&cur);
- spin_unlock(&ailp->ail_lock);
xfs_defer_cancel_recovery(log->l_mp, dfp);
- }
} int xlog_recover_iget( struct xfs_mount *mp, @@ -1937,10 +1931,33 @@ xlog_buf_readahead( { if (!xlog_is_buffer_cancelled(log, blkno, len)) xfs_buf_readahead(log->l_mp->m_ddev_targp, blkno, len, ops); } +/*
- Create a deferred work structure for resuming and tracking the progress of a
- log intent item that was found during recovery.
- */
+void +xlog_recover_intent_item(
- struct xlog *log,
- struct xfs_log_item *lip,
- xfs_lsn_t lsn,
- unsigned int dfp_type)
+{
- ASSERT(xlog_item_is_intent(lip));
- xfs_defer_start_recovery(lip, dfp_type, &log->r_dfops);
- /*
* Insert the intent into the AIL directly and drop one reference so
* that finishing or canceling the work will drop the other.
*/
- xfs_trans_ail_insert(log->l_ailp, lip, lsn);
- lip->li_ops->iop_unpin(lip, 0);
+}
STATIC int xlog_recover_items_pass2( struct xlog *log, struct xlog_recover *trans, struct list_head *buffer_list, @@ -2534,104 +2551,88 @@ xlog_abort_defer_ops(
- have started recovery on all the pending intents when we find an non-intent
- item in the AIL.
*/ STATIC int xlog_recover_process_intents(
- struct xlog *log)
- struct xlog *log)
{ LIST_HEAD(capture_list);
- struct xfs_ail_cursor cur;
- struct xfs_log_item *lip;
- struct xfs_ail *ailp;
- int error = 0;
- struct xfs_defer_pending *dfp, *n;
- int error = 0;
#if defined(DEBUG) || defined(XFS_WARN)
- xfs_lsn_t last_lsn;
-#endif
- xfs_lsn_t last_lsn;
- ailp = log->l_ailp;
- spin_lock(&ailp->ail_lock);
-#if defined(DEBUG) || defined(XFS_WARN) last_lsn = xlog_assign_lsn(log->l_curr_cycle, log->l_curr_block); #endif
- for (lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
lip != NULL;
lip = xfs_trans_ail_cursor_next(ailp, &cur)) {
const struct xfs_item_ops *ops;
if (!xlog_item_is_intent(lip))
break;
- list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
struct xfs_log_item *lip = dfp->dfp_intent;
const struct xfs_item_ops *ops = lip->li_ops;
ASSERT(xlog_item_is_intent(lip));
/* * We should never see a redo item with a LSN higher than * the last transaction we found in the log at the start * of recovery. */ ASSERT(XFS_LSN_CMP(last_lsn, lip->li_lsn) >= 0); /* * NOTE: If your intent processing routine can create more * deferred ops, you /must/ attach them to the capture list in * the recover routine or else those subsequent intents will be * replayed in the wrong order! * * The recovery function can free the log item, so we must not * access lip after it returns. */
spin_unlock(&ailp->ail_lock);
error = ops->iop_recover(lip, &capture_list);ops = lip->li_ops;
if (error) { trace_xlog_intent_recovery_failed(log->l_mp, error, ops->iop_recover); break; }spin_lock(&ailp->ail_lock);
- }
- xfs_trans_ail_cursor_done(&cur);
- spin_unlock(&ailp->ail_lock);
/*
* XXX: @lip could have been freed, so detach the log item from
* the pending item before freeing the pending item. This does
* not fix the existing UAF bug that occurs if ->iop_recover
* fails after creating the intent done item.
*/
dfp->dfp_intent = NULL;
xfs_defer_cancel_recovery(log->l_mp, dfp);
- } if (error) goto err;
error = xlog_finish_defer_ops(log->l_mp, &capture_list); if (error) goto err; return 0; err: xlog_abort_defer_ops(log->l_mp, &capture_list); return error; } /*
- A cancel occurs when the mount has failed and we're bailing out. Release all
- pending log intent items that we haven't started recovery on so they don't
- pin the AIL.
*/ STATIC void xlog_recover_cancel_intents(
- struct xlog *log)
- struct xlog *log)
{
- struct xfs_log_item *lip;
- struct xfs_ail_cursor cur;
- struct xfs_ail *ailp;
- ailp = log->l_ailp;
- spin_lock(&ailp->ail_lock);
- lip = xfs_trans_ail_cursor_first(ailp, &cur, 0);
- while (lip != NULL) {
if (!xlog_item_is_intent(lip))
break;
- struct xfs_defer_pending *dfp, *n;
spin_unlock(&ailp->ail_lock);
lip->li_ops->iop_release(lip);
spin_lock(&ailp->ail_lock);
lip = xfs_trans_ail_cursor_next(ailp, &cur);
- }
- list_for_each_entry_safe(dfp, n, &log->r_dfops, dfp_list) {
ASSERT(xlog_item_is_intent(dfp->dfp_intent));
- xfs_trans_ail_cursor_done(&cur);
- spin_unlock(&ailp->ail_lock);
xfs_defer_cancel_recovery(log->l_mp, dfp);
- }
} /*
- This routine performs a transaction to null out a bad inode pointer
- in an agi unlinked inode hash bucket.
diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index dfd7b824e32b..1e047107d2f2 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -666,16 +666,13 @@ xlog_recover_cui_commit_pass2( } cuip = xfs_cui_init(mp, cui_formatp->cui_nextents); xfs_cui_copy_format(&cuip->cui_format, cui_formatp); atomic_set(&cuip->cui_next_extent, cui_formatp->cui_nextents);
- /*
* Insert the intent into the AIL directly and drop one reference so
* that finishing or canceling the work will drop the other.
*/
- xfs_trans_ail_insert(log->l_ailp, &cuip->cui_item, lsn);
- xfs_cui_release(cuip);
- xlog_recover_intent_item(log, &cuip->cui_item, lsn,
return 0;XFS_DEFER_OPS_TYPE_REFCOUNT);
} const struct xlog_recover_item_ops xlog_cui_item_ops = { .item_type = XFS_LI_CUI, diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 2043cea261c0..12ae8ab6a69d 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -680,16 +680,13 @@ xlog_recover_rui_commit_pass2( } ruip = xfs_rui_init(mp, rui_formatp->rui_nextents); xfs_rui_copy_format(&ruip->rui_format, rui_formatp); atomic_set(&ruip->rui_next_extent, rui_formatp->rui_nextents);
- /*
* Insert the intent into the AIL directly and drop one reference so
* that finishing or canceling the work will drop the other.
*/
- xfs_trans_ail_insert(log->l_ailp, &ruip->rui_item, lsn);
- xfs_rui_release(ruip);
- xlog_recover_intent_item(log, &ruip->rui_item, lsn,
return 0;XFS_DEFER_OPS_TYPE_RMAP);
} const struct xlog_recover_item_ops xlog_rui_item_ops = { .item_type = XFS_LI_RUI, -- 2.49.0.rc1.451.g8f38331e32-goog
Hey Fedor,
Thanks a bunch for the report! I don't see xfs/235 running on my setup. I will look into why and see if I can repro.
Few questions,
Were you able to confirm that e5f1a5146ec3 fixes the issue on 6.1.y? If so, we can just port this patch, otherwise we might want to drop the xfs set while we investigate further.
Also, the backport set you mentioned was based on a set from 6.6.y. I don't see the suggested fix (e5f1a5146ec3) there either. If it's not too much hassle, could you see if we have the same problem for 6.6.y as well?
Best, Leah
On Fri, 21. Mar 10:42, Leah Rumancik wrote:
Hey Fedor,
Thanks a bunch for the report! I don't see xfs/235 running on my setup. I will look into why and see if I can repro.
Few questions,
Were you able to confirm that e5f1a5146ec3 fixes the issue on 6.1.y?
Oh, it probably wasn't obvious from my initial report but the problem concerns only 6.1.132-rc1 at the moment. It's in testing phase and hasn't been finally released yet. So it hasn't reached 6.1.y.
Unfortunately, it's hard to port the proposed fix directly on top of 6.1.132-rc1 since it depends on a number of non-trivial changes done in mainline. The conflicts are huge and require a solid amount of expertise in the code and I'm not ready to do this to be honest.
Well, as I perceive, the reason to port the following patches
[PATCH 6.1 13/29] xfs: don't leak recovered attri intent items [PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover intent items [PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover
to stable (6.1, in particular) is to fix a UAF when intent recovery fails. This is stated in upstream patchset [1] where the patches come from.
[1]: https://lore.kernel.org/linux-xfs/170191741007.1195961.10092536809136830257....
Taking as a fact that 6.1.y is vulnerable to that bug (though I failed to find an exact blamed commit), I think that the whole series of 8 patches should be ported, otherwise it looks not complete. But only 4/8 patches have been taken to 6.1-queue and 6.6 so far.
If so, we can just port this patch, otherwise we might want to drop the xfs set while we investigate further.
I'd suggest to drop the xfs set from 6.1-queue for now until the problem is addressed in some way.
Also, the backport set you mentioned was based on a set from 6.6.y. I don't see the suggested fix (e5f1a5146ec3) there either. If it's not too much hassle, could you see if we have the same problem for 6.6.y as well?
Yes, the crash occurs there, too. And for 6.6 case it actually is for a released kernel (since v6.6.24).
The remaining four patches of the original upstream series [1] - one of which is e5f1a5146ec3 - can be applied there without many problems, fortunately.
I'll send them to you in a separate thread.
Okay so a summary from my understanding, correct me if I'm wrong:
03f7767c9f612 introduced the issue in both 6.1 and 6.6.
On mainline, this is resolved by e5f1a5146ec3. This commit is painful to apply to 6.1 but does apply to 6.6 along with the rest of the patchset it was a part of (which is the set you just sent out for 6.6).
With the stable branches we try to balance the risk of introducing new bugs via huge fixes with the benefit of the fix itself. Especially if the patches don't apply cleanly, it might not be worth the risk and effort to do the porting. Hmm, since it seems like we might not even end up taking 03f7767c9f6120 to stable, I'd propose we just drop 03f7767c9f6120 for now. If the rest of the subsequent patches in the original set apply cleanly, I don't think we need to drop them all. We can then try to fix the UAF with a more targeted approach in a later patch instead of via direct cherry-picks.
What do you think?
- leah
Also, the backport set you mentioned was based on a set from 6.6.y. I don't see the suggested fix (e5f1a5146ec3) there either. If it's not too much hassle, could you see if we have the same problem for 6.6.y as well?
Yes, the crash occurs there, too. And for 6.6 case it actually is for a released kernel (since v6.6.24).
The remaining four patches of the original upstream series [1] - one of which is e5f1a5146ec3 - can be applied there without many problems, fortunately.
I'll send them to you in a separate thread.
On Sun, 23. Mar 17:29, Leah Rumancik wrote:
Okay so a summary from my understanding, correct me if I'm wrong:
03f7767c9f612 introduced the issue in both 6.1 and 6.6.
On mainline, this is resolved by e5f1a5146ec3. This commit is painful to apply to 6.1 but does apply to 6.6 along with the rest of the patchset it was a part of (which is the set you just sent out for 6.6).
Yeah, that's all correct.
With the stable branches we try to balance the risk of introducing new bugs via huge fixes with the benefit of the fix itself. Especially if the patches don't apply cleanly, it might not be worth the risk and effort to do the porting. Hmm, since it seems like we might not even end up taking 03f7767c9f6120 to stable, I'd propose we just drop 03f7767c9f6120 for now. If the rest of the subsequent patches in the original set apply cleanly, I don't think we need to drop them all. We can then try to fix the UAF with a more targeted approach in a later patch instead of via direct cherry-picks.
What do you think?
03f7767c9f6120 is '[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover'
Two subsequent patches depend on it logically so should also be dropped:
'[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover' '[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover'
On the other side, '[PATCH 6.1 13/29] xfs: don't leak recovered attri intent items' which is at the start of the original patchset [1] looks OK to be taken. It's rather aside from the subsequent rework patches and fixes a pinpoint bug.
[1]: https://lore.kernel.org/linux-xfs/170191741007.1195961.10092536809136830257....
So I've tried the current xfs backport series with three dropped commits:
[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover [PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover
(everything before and after that still applies cleanly and touches other things)
and no regressions seen on my side.
This sounds good to me.
Greg, can we drop the following patches?
'[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover' '[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover' '[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover'
Thanks, leah
On Mon, Mar 24, 2025 at 1:53 AM Fedor Pchelkin pchelkin@ispras.ru wrote:
On Sun, 23. Mar 17:29, Leah Rumancik wrote:
Okay so a summary from my understanding, correct me if I'm wrong:
03f7767c9f612 introduced the issue in both 6.1 and 6.6.
On mainline, this is resolved by e5f1a5146ec3. This commit is painful to apply to 6.1 but does apply to 6.6 along with the rest of the patchset it was a part of (which is the set you just sent out for 6.6).
Yeah, that's all correct.
With the stable branches we try to balance the risk of introducing new bugs via huge fixes with the benefit of the fix itself. Especially if the patches don't apply cleanly, it might not be worth the risk and effort to do the porting. Hmm, since it seems like we might not even end up taking 03f7767c9f6120 to stable, I'd propose we just drop 03f7767c9f6120 for now. If the rest of the subsequent patches in the original set apply cleanly, I don't think we need to drop them all. We can then try to fix the UAF with a more targeted approach in a later patch instead of via direct cherry-picks.
What do you think?
03f7767c9f6120 is '[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover'
Two subsequent patches depend on it logically so should also be dropped:
'[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover' '[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover'
On the other side, '[PATCH 6.1 13/29] xfs: don't leak recovered attri intent items' which is at the start of the original patchset [1] looks OK to be taken. It's rather aside from the subsequent rework patches and fixes a pinpoint bug.
So I've tried the current xfs backport series with three dropped commits:
[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover [PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover [PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover
(everything before and after that still applies cleanly and touches other things)
and no regressions seen on my side.
On Mon, Mar 24, 2025 at 02:10:03PM -0700, Leah Rumancik wrote:
This sounds good to me.
Greg, can we drop the following patches?
'[PATCH 6.1 14/29] xfs: use xfs_defer_pending objects to recover' '[PATCH 6.1 15/29] xfs: pass the xfs_defer_pending object to iop_recover' '[PATCH 6.1 16/29] xfs: transfer recovered intent item ownership in ->iop_recover'
Now dropped, thanks!
greg k-h
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit a050acdfa8003a44eae4558fddafc7afb1aef458 ]
Now that log intent item recovery recreates the xfs_defer_pending state, we should pass that into the ->iop_recover routines so that the intent item can finish the recreation work.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_attr_item.c | 3 ++- fs/xfs/xfs_bmap_item.c | 3 ++- fs/xfs/xfs_extfree_item.c | 3 ++- fs/xfs/xfs_log_recover.c | 2 +- fs/xfs/xfs_refcount_item.c | 3 ++- fs/xfs/xfs_rmap_item.c | 3 ++- fs/xfs/xfs_trans.h | 4 +++- 7 files changed, 14 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c index a32716b8cbbd..6119a7a480a0 100644 --- a/fs/xfs/xfs_attr_item.c +++ b/fs/xfs/xfs_attr_item.c @@ -543,13 +543,14 @@ xfs_attri_validate( * Process an attr intent item that was recovered from the log. We need to * delete the attr that it describes. */ STATIC int xfs_attri_item_recover( - struct xfs_log_item *lip, + struct xfs_defer_pending *dfp, struct list_head *capture_list) { + struct xfs_log_item *lip = dfp->dfp_intent; struct xfs_attri_log_item *attrip = ATTRI_ITEM(lip); struct xfs_attr_intent *attr; struct xfs_mount *mp = lip->li_log->l_mp; struct xfs_inode *ip; struct xfs_da_args *args; diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 8d08252e1953..30c05eb862d6 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -451,15 +451,16 @@ xfs_bui_validate( * Process a bmap update intent item that was recovered from the log. * We need to update some inode's bmbt. */ STATIC int xfs_bui_item_recover( - struct xfs_log_item *lip, + struct xfs_defer_pending *dfp, struct list_head *capture_list) { struct xfs_bmap_intent fake = { }; struct xfs_trans_res resv; + struct xfs_log_item *lip = dfp->dfp_intent; struct xfs_bui_log_item *buip = BUI_ITEM(lip); struct xfs_trans *tp; struct xfs_inode *ip = NULL; struct xfs_mount *mp = lip->li_log->l_mp; struct xfs_map_extent *map; diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index fd9fe51bcc31..6cfd9339f872 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -593,14 +593,15 @@ xfs_efi_validate_ext( * Process an extent free intent item that was recovered from * the log. We need to free the extents that it describes. */ STATIC int xfs_efi_item_recover( - struct xfs_log_item *lip, + struct xfs_defer_pending *dfp, struct list_head *capture_list) { struct xfs_trans_res resv; + struct xfs_log_item *lip = dfp->dfp_intent; struct xfs_efi_log_item *efip = EFI_ITEM(lip); struct xfs_mount *mp = lip->li_log->l_mp; struct xfs_efd_log_item *efdp; struct xfs_trans *tp; int i; diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 65041ed7833d..303bf9728c03 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2584,11 +2584,11 @@ xlog_recover_process_intents( * replayed in the wrong order! * * The recovery function can free the log item, so we must not * access lip after it returns. */ - error = ops->iop_recover(lip, &capture_list); + error = ops->iop_recover(dfp, &capture_list); if (error) { trace_xlog_intent_recovery_failed(log->l_mp, error, ops->iop_recover); break; } diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 1e047107d2f2..e158dd9f86b0 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -448,14 +448,15 @@ xfs_cui_validate_phys( * Process a refcount update intent item that was recovered from the log. * We need to update the refcountbt. */ STATIC int xfs_cui_item_recover( - struct xfs_log_item *lip, + struct xfs_defer_pending *dfp, struct list_head *capture_list) { struct xfs_trans_res resv; + struct xfs_log_item *lip = dfp->dfp_intent; struct xfs_cui_log_item *cuip = CUI_ITEM(lip); struct xfs_cud_log_item *cudp; struct xfs_trans *tp; struct xfs_btree_cur *rcur = NULL; struct xfs_mount *mp = lip->li_log->l_mp; diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 12ae8ab6a69d..b3b4de68b41b 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -487,14 +487,15 @@ xfs_rui_validate_map( * Process an rmap update intent item that was recovered from the log. * We need to update the rmapbt. */ STATIC int xfs_rui_item_recover( - struct xfs_log_item *lip, + struct xfs_defer_pending *dfp, struct list_head *capture_list) { struct xfs_trans_res resv; + struct xfs_log_item *lip = dfp->dfp_intent; struct xfs_rui_log_item *ruip = RUI_ITEM(lip); struct xfs_map_extent *rmap; struct xfs_rud_log_item *rudp; struct xfs_trans *tp; struct xfs_btree_cur *rcur = NULL; diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index 55819785941c..0f2b62f3ca19 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -64,23 +64,25 @@ struct xfs_log_item { { (1u << XFS_LI_ABORTED), "ABORTED" }, \ { (1u << XFS_LI_FAILED), "FAILED" }, \ { (1u << XFS_LI_DIRTY), "DIRTY" }, \ { (1u << XFS_LI_WHITEOUT), "WHITEOUT" }
+struct xfs_defer_pending; + struct xfs_item_ops { unsigned flags; void (*iop_size)(struct xfs_log_item *, int *, int *); void (*iop_format)(struct xfs_log_item *, struct xfs_log_vec *); void (*iop_pin)(struct xfs_log_item *); void (*iop_unpin)(struct xfs_log_item *, int remove); uint64_t (*iop_sort)(struct xfs_log_item *lip); int (*iop_precommit)(struct xfs_trans *tp, struct xfs_log_item *lip); void (*iop_committing)(struct xfs_log_item *lip, xfs_csn_t seq); xfs_lsn_t (*iop_committed)(struct xfs_log_item *, xfs_lsn_t); uint (*iop_push)(struct xfs_log_item *, struct list_head *); void (*iop_release)(struct xfs_log_item *); - int (*iop_recover)(struct xfs_log_item *lip, + int (*iop_recover)(struct xfs_defer_pending *dfp, struct list_head *capture_list); bool (*iop_match)(struct xfs_log_item *item, uint64_t id); struct xfs_log_item *(*iop_relog)(struct xfs_log_item *intent, struct xfs_trans *tp); struct xfs_log_item *(*iop_intent)(struct xfs_log_item *intent_done);
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit deb4cd8ba87f17b12c72b3827820d9c703e9fd95 ]
Now that we pass the xfs_defer_pending object into the intent item recovery functions, we know exactly when ownership of the sole refcount passes from the recovery context to the intent done item. At that point, we need to null out dfp_intent so that the recovery mechanism won't release it. This should fix the UAF problem reported by Long Li.
Note that we still want to recreate the full deferred work state. That will be addressed in the next patches.
Fixes: 2e76f188fd90 ("xfs: cancel intents immediately if process_intents fails") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_log_recover.h | 2 ++ fs/xfs/xfs_attr_item.c | 1 + fs/xfs/xfs_bmap_item.c | 2 ++ fs/xfs/xfs_extfree_item.c | 2 ++ fs/xfs/xfs_log_recover.c | 19 ++++++++++++------- fs/xfs/xfs_refcount_item.c | 1 + fs/xfs/xfs_rmap_item.c | 2 ++ 7 files changed, 22 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_log_recover.h b/fs/xfs/libxfs/xfs_log_recover.h index 271a4ce7375c..13583df9f239 100644 --- a/fs/xfs/libxfs/xfs_log_recover.h +++ b/fs/xfs/libxfs/xfs_log_recover.h @@ -153,7 +153,9 @@ xlog_recover_resv(const struct xfs_trans_res *r) return ret; }
void xlog_recover_intent_item(struct xlog *log, struct xfs_log_item *lip, xfs_lsn_t lsn, unsigned int dfp_type); +void xlog_recover_transfer_intent(struct xfs_trans *tp, + struct xfs_defer_pending *dfp);
#endif /* __XFS_LOG_RECOVER_H__ */ diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c index 6119a7a480a0..82775e9537df 100644 --- a/fs/xfs/xfs_attr_item.c +++ b/fs/xfs/xfs_attr_item.c @@ -630,10 +630,11 @@ xfs_attri_item_recover( if (error) goto out;
args->trans = tp; done_item = xfs_trans_get_attrd(tp, attrip); + xlog_recover_transfer_intent(tp, dfp);
xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0);
error = xfs_xattri_finish_update(attr, done_item); diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 30c05eb862d6..317cf49e6cd6 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -489,10 +489,12 @@ xfs_bui_item_recover( XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK), 0, 0, &tp); if (error) goto err_rele;
budp = xfs_trans_get_bud(tp, buip); + xlog_recover_transfer_intent(tp, dfp); + xfs_ilock(ip, XFS_ILOCK_EXCL); xfs_trans_ijoin(tp, ip, 0);
if (fake.bi_type == XFS_BMAP_MAP) iext_delta = XFS_IEXT_ADD_NOSPLIT_CNT; diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 6cfd9339f872..9c726f082285 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -624,11 +624,13 @@ xfs_efi_item_recover(
resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate); error = xfs_trans_alloc(mp, &resv, 0, 0, 0, &tp); if (error) return error; + efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents); + xlog_recover_transfer_intent(tp, dfp);
for (i = 0; i < efip->efi_format.efi_nextents; i++) { struct xfs_extent_free_item fake = { .xefi_owner = XFS_RMAP_OWN_UNKNOWN, .xefi_agresv = XFS_AG_RESV_NONE, diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index 303bf9728c03..e7992a651740 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -2591,17 +2591,10 @@ xlog_recover_process_intents( trace_xlog_intent_recovery_failed(log->l_mp, error, ops->iop_recover); break; }
- /* - * XXX: @lip could have been freed, so detach the log item from - * the pending item before freeing the pending item. This does - * not fix the existing UAF bug that occurs if ->iop_recover - * fails after creating the intent done item. - */ - dfp->dfp_intent = NULL; xfs_defer_cancel_recovery(log->l_mp, dfp); } if (error) goto err;
@@ -2631,10 +2624,22 @@ xlog_recover_cancel_intents(
xfs_defer_cancel_recovery(log->l_mp, dfp); } }
+/* + * Transfer ownership of the recovered log intent item to the recovery + * transaction. + */ +void +xlog_recover_transfer_intent( + struct xfs_trans *tp, + struct xfs_defer_pending *dfp) +{ + dfp->dfp_intent = NULL; +} + /* * This routine performs a transaction to null out a bad inode pointer * in an agi unlinked inode hash bucket. */ STATIC void diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index e158dd9f86b0..2e8bdd598296 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -497,10 +497,11 @@ xfs_cui_item_recover( XFS_TRANS_RESERVE, &tp); if (error) return error;
cudp = xfs_trans_get_cud(tp, cuip); + xlog_recover_transfer_intent(tp, dfp);
for (i = 0; i < cuip->cui_format.cui_nextents; i++) { struct xfs_refcount_intent fake = { }; struct xfs_phys_extent *refc;
diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index b3b4de68b41b..f29fadf68ed2 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -524,11 +524,13 @@ xfs_rui_item_recover( resv = xlog_recover_resv(&M_RES(mp)->tr_itruncate); error = xfs_trans_alloc(mp, &resv, mp->m_rmap_maxlevels, 0, XFS_TRANS_RESERVE, &tp); if (error) return error; + rudp = xfs_trans_get_rud(tp, ruip); + xlog_recover_transfer_intent(tp, dfp);
for (i = 0; i < ruip->rui_format.rui_nextents; i++) { rmap = &ruip->rui_format.rui_extents[i]; state = (rmap->me_flags & XFS_RMAP_EXTENT_UNWRITTEN) ? XFS_EXT_UNWRITTEN : XFS_EXT_NORM;
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit a6a38f309afc4a7ede01242b603f36c433997780 ]
There's a weird discrepancy in xfsprogs dating back to the creation of the Linux port -- if there are zero rt extents, mkfs will set sb_rextents and sb_rextslog both to zero:
sbp->sb_rextslog = (uint8_t)(rtextents ? libxfs_highbit32((unsigned int)rtextents) : 0);
However, that's not the check that xfs_repair uses for nonzero rtblocks:
if (sb->sb_rextslog != libxfs_highbit32((unsigned int)sb->sb_rextents))
The difference here is that xfs_highbit32 returns -1 if its argument is zero. Unfortunately, this means that in the weird corner case of a realtime volume shorter than 1 rt extent, xfs_repair will immediately flag a freshly formatted filesystem as corrupt. Because mkfs has been writing ondisk artifacts like this for decades, we have to accept that as "correct". TBH, zero rextslog for zero rtextents makes more sense to me anyway.
Regrettably, the superblock verifier checks created in commit copied xfs_repair even though mkfs has been writing out such filesystems for ages. Fix the superblock verifier to accept what mkfs spits out; the userspace version of this patch will have to fix xfs_repair as well.
Note that the new helper leaves the zeroday bug where the upper 32 bits of sb_rextents is ripped off and fed to highbit32. This leads to a seriously undersized rt summary file, which immediately breaks mkfs:
$ hugedisk.sh foo /dev/sdc $(( 0x100000080 * 4096))B $ /sbin/mkfs.xfs -f /dev/sda -m rmapbt=0,reflink=0 -r rtdev=/dev/mapper/foo meta-data=/dev/sda isize=512 agcount=4, agsize=1298176 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=0 bigtime=1 inobtcount=1 nrext64=1 data = bsize=4096 blocks=5192704, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=16384, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =/dev/mapper/foo extsz=4096 blocks=4294967424, rtextents=4294967424 Discarding blocks...Done. mkfs.xfs: Error initializing the realtime space [117 - Structure needs cleaning]
The next patch will drop support for rt volumes with fewer than 1 or more than 2^32-1 rt extents, since they've clearly been broken forever.
Fixes: f8e566c0f5e1f ("xfs: validate the realtime geometry in xfs_validate_sb_common") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_rtbitmap.c | 13 +++++++++++++ fs/xfs/libxfs/xfs_rtbitmap.h | 4 ++++ fs/xfs/libxfs/xfs_sb.c | 3 ++- fs/xfs/xfs_rtalloc.c | 4 ++-- 4 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 9eb1b5aa7e35..37b425ea3fed 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1128,5 +1128,18 @@ xfs_rtalloc_extent_is_free( return error;
*is_free = matches; return 0; } + +/* + * Compute the maximum level number of the realtime summary file, as defined by + * mkfs. The use of highbit32 on a 64-bit quantity is a historic artifact that + * prohibits correct use of rt volumes with more than 2^32 extents. + */ +uint8_t +xfs_compute_rextslog( + xfs_rtbxlen_t rtextents) +{ + return rtextents ? xfs_highbit32(rtextents) : 0; +} + diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index c3ef22e67aa3..6becdc7a48ed 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -68,15 +68,19 @@ xfs_rtfree_extent( xfs_extlen_t len); /* length of extent freed */
/* Same as above, but in units of rt blocks. */ int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, xfs_filblks_t rtlen); + +uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents); + #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) # define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) # define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) # define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) # define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) +# define xfs_compute_rextslog(rtx) (0) #endif /* CONFIG_XFS_RT */
#endif /* __XFS_RTBITMAP_H__ */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index d214233ef532..6b87b04d0c6c 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -23,10 +23,11 @@ #include "xfs_rmap_btree.h" #include "xfs_refcount_btree.h" #include "xfs_da_format.h" #include "xfs_health.h" #include "xfs_ag.h" +#include "xfs_rtbitmap.h"
/* * Physical superblock buffer manipulations. Shared with libxfs in userspace. */
@@ -500,11 +501,11 @@ xfs_validate_sb_common( rexts = div_u64(sbp->sb_rblocks, sbp->sb_rextsize); rbmblocks = howmany_64(sbp->sb_rextents, NBBY * sbp->sb_blocksize);
if (sbp->sb_rextents != rexts || - sbp->sb_rextslog != xfs_highbit32(sbp->sb_rextents) || + sbp->sb_rextslog != xfs_compute_rextslog(rexts) || sbp->sb_rbmblocks != rbmblocks) { xfs_notice(mp, "realtime geometry sanity check failed"); return -EFSCORRUPTED; } diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 2f2280f4e7fa..eca800e2b879 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -997,11 +997,11 @@ xfs_growfs_rt( * Calculate new parameters. These are the final values to be reached. */ nrextents = nrblocks; do_div(nrextents, in->extsize); nrbmblocks = howmany_64(nrextents, NBBY * sbp->sb_blocksize); - nrextslog = xfs_highbit32(nrextents); + nrextslog = xfs_compute_rextslog(nrextents); nrsumlevels = nrextslog + 1; nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nrbmblocks; nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize); nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks); /* @@ -1059,11 +1059,11 @@ xfs_growfs_rt( nsbp->sb_rextsize; nsbp->sb_rblocks = min(nrblocks, nrblocks_step); nsbp->sb_rextents = nsbp->sb_rblocks; do_div(nsbp->sb_rextents, nsbp->sb_rextsize); ASSERT(nsbp->sb_rextents != 0); - nsbp->sb_rextslog = xfs_highbit32(nsbp->sb_rextents); + nsbp->sb_rextslog = xfs_compute_rextslog(nsbp->sb_rextents); nrsumlevels = nmp->m_rsumlevels = nsbp->sb_rextslog + 1; nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nsbp->sb_rbmblocks; nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize);
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit cf8f0e6c1429be7652869059ea44696b72d5b726 ]
It's quite reasonable that some customer somewhere will want to configure a realtime volume with more than 2^32 extents. If they try to do this, the highbit32() call will truncate the upper bits of the xfs_rtbxlen_t and produce the wrong value for rextslog. This in turn causes the rsumlevels to be wrong, which results in a realtime summary file that is the wrong length. Fix that.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_rtbitmap.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 37b425ea3fed..8db1243beacc 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1131,15 +1131,17 @@ xfs_rtalloc_extent_is_free( return 0; }
/* * Compute the maximum level number of the realtime summary file, as defined by - * mkfs. The use of highbit32 on a 64-bit quantity is a historic artifact that - * prohibits correct use of rt volumes with more than 2^32 extents. + * mkfs. The historic use of highbit32 on a 64-bit quantity prohibited correct + * use of rt volumes with more than 2^32 extents. */ uint8_t xfs_compute_rextslog( xfs_rtbxlen_t rtextents) { - return rtextents ? xfs_highbit32(rtextents) : 0; + if (!rtextents) + return 0; + return xfs_highbit64(rtextents); }
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit e14293803f4e84eb23a417b462b56251033b5a66 ]
Don't allow realtime volumes that are less than one rt extent long. This has been broken across 4 LTS kernels with nobody noticing, so let's just disable it.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_rtbitmap.h | 13 +++++++++++++ fs/xfs/libxfs/xfs_sb.c | 3 ++- fs/xfs/xfs_rtalloc.c | 2 ++ 3 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 6becdc7a48ed..4e49aadf0955 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -71,16 +71,29 @@ xfs_rtfree_extent( int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, xfs_filblks_t rtlen);
uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents);
+/* Do we support an rt volume having this number of rtextents? */ +static inline bool +xfs_validate_rtextents( + xfs_rtbxlen_t rtextents) +{ + /* No runt rt volumes */ + if (rtextents == 0) + return false; + + return true; +} + #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) # define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) # define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) # define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) # define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) # define xfs_compute_rextslog(rtx) (0) +# define xfs_validate_rtextents(rtx) (false) #endif /* CONFIG_XFS_RT */
#endif /* __XFS_RTBITMAP_H__ */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 6b87b04d0c6c..04247d1c7523 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -500,11 +500,12 @@ xfs_validate_sb_common(
rexts = div_u64(sbp->sb_rblocks, sbp->sb_rextsize); rbmblocks = howmany_64(sbp->sb_rextents, NBBY * sbp->sb_blocksize);
- if (sbp->sb_rextents != rexts || + if (!xfs_validate_rtextents(rexts) || + sbp->sb_rextents != rexts || sbp->sb_rextslog != xfs_compute_rextslog(rexts) || sbp->sb_rbmblocks != rbmblocks) { xfs_notice(mp, "realtime geometry sanity check failed"); return -EFSCORRUPTED; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index eca800e2b879..2dcd5cca4ec2 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -996,10 +996,12 @@ xfs_growfs_rt( /* * Calculate new parameters. These are the final values to be reached. */ nrextents = nrblocks; do_div(nrextents, in->extsize); + if (!xfs_validate_rtextents(nrextents)) + return -EINVAL; nrbmblocks = howmany_64(nrextents, NBBY * sbp->sb_blocksize); nrextslog = xfs_compute_rextslog(nrextents); nrsumlevels = nrextslog + 1; nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nrbmblocks; nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize);
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 4c8ecd1cfdd01fb727121035014d9f654a30bdf2 ]
Remove these unused fields since nobody uses them. They should have been removed years ago in a different cleanup series from Christoph Hellwig.
Fixes: daf83964a3681 ("xfs: move the per-fork nextents fields into struct xfs_ifork") Fixes: f7e67b20ecbbc ("xfs: move the fork format fields into struct xfs_ifork") Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Dave Chinner dchinner@redhat.com Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_btree_staging.h | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_btree_staging.h b/fs/xfs/libxfs/xfs_btree_staging.h index f0d2976050ae..5f638f711246 100644 --- a/fs/xfs/libxfs/xfs_btree_staging.h +++ b/fs/xfs/libxfs/xfs_btree_staging.h @@ -35,16 +35,10 @@ struct xbtree_ifakeroot { /* Height of the new btree. */ unsigned int if_levels;
/* Number of bytes available for this fork in the inode. */ unsigned int if_fork_size; - - /* Fork format. */ - unsigned int if_format; - - /* Number of records. */ - unsigned int if_extents; };
/* Cursor interactions with fake roots for inode-rooted btrees. */ void xfs_btree_stage_ifakeroot(struct xfs_btree_cur *cur, struct xbtree_ifakeroot *ifake,
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 578bd4ce7100ae34f98c6b0147fe75cfa0dadbac ]
While playing with growfs to create a 20TB realtime section on a filesystem that didn't previously have an rt section, I noticed that growfs would occasionally shut down the log due to a transaction reservation overflow.
xfs_calc_growrtfree_reservation uses the current size of the realtime summary file (m_rsumsize) to compute the transaction reservation for a growrtfree transaction. The reservations are computed at mount time, which means that m_rsumsize is zero when growfs starts "freeing" the new realtime extents into the rt volume. As a result, the transaction is undersized and fails.
Fix this by recomputing the transaction reservations every time we change m_rsumsize.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_rtalloc.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 2dcd5cca4ec2..7c5134899634 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1068,10 +1068,13 @@ xfs_growfs_rt( nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nsbp->sb_rbmblocks; nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize); nmp->m_rsumsize = nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks); + /* recompute growfsrt reservation from new rsumsize */ + xfs_trans_resv_calc(nmp, &nmp->m_resv); + /* * Start a transaction, get the log reservation. */ error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtfree, 0, 0, 0, &tp); @@ -1151,10 +1154,12 @@ xfs_growfs_rt( /* * Update mp values into the real mp structure. */ mp->m_rsumlevels = nrsumlevels; mp->m_rsumsize = nrsumsize; + /* recompute growfsrt reservation from new rsumsize */ + xfs_trans_resv_calc(mp, &mp->m_resv);
error = xfs_trans_commit(tp); if (error) break;
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 13ae04d8d45227c2ba51e188daf9fc13d08a1b12 ]
While stress-testing online repair of btrees, I noticed periodic assertion failures from the buffer cache about buffers with incorrect DELWRI_Q state. Looking further, I observed this race between the AIL trying to write out a btree block and repair zapping a btree block after the fact:
AIL: Repair0:
pin buffer X delwri_queue: set DELWRI_Q add to delwri list
stale buf X: clear DELWRI_Q does not clear b_list free space X commit
delwri_submit # oops
Worse yet, I discovered that running the same repair over and over in a tight loop can result in a second race that cause data integrity problems with the repair:
AIL: Repair0: Repair1:
pin buffer X delwri_queue: set DELWRI_Q add to delwri list
stale buf X: clear DELWRI_Q does not clear b_list free space X commit
find free space X get buffer rewrite buffer delwri_queue: set DELWRI_Q already on a list, do not add commit
BAD: committed tree root before all blocks written
delwri_submit # too late now
I traced this to my own misunderstanding of how the delwri lists work, particularly with regards to the AIL's buffer list. If a buffer is logged and committed, the buffer can end up on that AIL buffer list. If btree repairs are run twice in rapid succession, it's possible that the first repair will invalidate the buffer and free it before the next time the AIL wakes up. Marking the buffer stale clears DELWRI_Q from the buffer state without removing the buffer from its delwri list. The buffer doesn't know which list it's on, so it cannot know which lock to take to protect the list for a removal.
If the second repair allocates the same block, it will then recycle the buffer to start writing the new btree block. Meanwhile, if the AIL wakes up and walks the buffer list, it will ignore the buffer because it can't lock it, and go back to sleep.
When the second repair calls delwri_queue to put the buffer on the list of buffers to write before committing the new btree, it will set DELWRI_Q again, but since the buffer hasn't been removed from the AIL's buffer list, it won't add it to the bulkload buffer's list.
This is incorrect, because the bulkload caller relies on delwri_submit to ensure that all the buffers have been sent to disk /before/ committing the new btree root pointer. This ordering requirement is required for data consistency.
Worse, the AIL won't clear DELWRI_Q from the buffer when it does finally drop it, so the next thread to walk through the btree will trip over a debug assertion on that flag.
To fix this, create a new function that waits for the buffer to be removed from any other delwri lists before adding the buffer to the caller's delwri list. By waiting for the buffer to clear both the delwri list and any potential delwri wait list, we can be sure that repair will initiate writes of all buffers and report all write errors back to userspace instead of committing the new structure.
Signed-off-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_btree_staging.c | 4 +-- fs/xfs/xfs_buf.c | 44 ++++++++++++++++++++++++++++--- fs/xfs/xfs_buf.h | 1 + 3 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c index dd75e208b543..29e3f8ccb185 100644 --- a/fs/xfs/libxfs/xfs_btree_staging.c +++ b/fs/xfs/libxfs/xfs_btree_staging.c @@ -340,13 +340,11 @@ xfs_btree_bload_drop_buf( struct xfs_buf **bpp) { if (*bpp == NULL) return;
- if (!xfs_buf_delwri_queue(*bpp, buffers_list)) - ASSERT(0); - + xfs_buf_delwri_queue_here(*bpp, buffers_list); xfs_buf_relse(*bpp); *bpp = NULL; }
/* diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c index 54c774af6e1c..257945cdf63b 100644 --- a/fs/xfs/xfs_buf.c +++ b/fs/xfs/xfs_buf.c @@ -2038,28 +2038,36 @@ xfs_alloc_buftarg( error_free: kmem_free(btp); return NULL; }
+static inline void +xfs_buf_list_del( + struct xfs_buf *bp) +{ + list_del_init(&bp->b_list); + wake_up_var(&bp->b_list); +} + /* * Cancel a delayed write list. * * Remove each buffer from the list, clear the delwri queue flag and drop the * associated buffer reference. */ void xfs_buf_delwri_cancel( struct list_head *list) { struct xfs_buf *bp;
while (!list_empty(list)) { bp = list_first_entry(list, struct xfs_buf, b_list);
xfs_buf_lock(bp); bp->b_flags &= ~_XBF_DELWRI_Q; - list_del_init(&bp->b_list); + xfs_buf_list_del(bp); xfs_buf_relse(bp); } }
/* @@ -2108,10 +2116,38 @@ xfs_buf_delwri_queue( }
return true; }
+/* + * Queue a buffer to this delwri list as part of a data integrity operation. + * If the buffer is on any other delwri list, we'll wait for that to clear + * so that the caller can submit the buffer for IO and wait for the result. + * Callers must ensure the buffer is not already on the list. + */ +void +xfs_buf_delwri_queue_here( + struct xfs_buf *bp, + struct list_head *buffer_list) +{ + /* + * We need this buffer to end up on the /caller's/ delwri list, not any + * old list. This can happen if the buffer is marked stale (which + * clears DELWRI_Q) after the AIL queues the buffer to its list but + * before the AIL has a chance to submit the list. + */ + while (!list_empty(&bp->b_list)) { + xfs_buf_unlock(bp); + wait_var_event(&bp->b_list, list_empty(&bp->b_list)); + xfs_buf_lock(bp); + } + + ASSERT(!(bp->b_flags & _XBF_DELWRI_Q)); + + xfs_buf_delwri_queue(bp, buffer_list); +} + /* * Compare function is more complex than it needs to be because * the return value is only 32 bits and we are doing comparisons * on 64 bit values */ @@ -2170,31 +2206,31 @@ xfs_buf_delwri_submit_buffers( * marked it stale in the meantime. In that case only the * _XBF_DELWRI_Q flag got cleared, and we have to drop the * reference and remove it from the list here. */ if (!(bp->b_flags & _XBF_DELWRI_Q)) { - list_del_init(&bp->b_list); + xfs_buf_list_del(bp); xfs_buf_relse(bp); continue; }
trace_xfs_buf_delwri_split(bp, _RET_IP_);
/* * If we have a wait list, each buffer (and associated delwri * queue reference) transfers to it and is submitted * synchronously. Otherwise, drop the buffer from the delwri * queue and submit async. */ bp->b_flags &= ~_XBF_DELWRI_Q; bp->b_flags |= XBF_WRITE; if (wait_list) { bp->b_flags &= ~XBF_ASYNC; list_move_tail(&bp->b_list, wait_list); } else { bp->b_flags |= XBF_ASYNC; - list_del_init(&bp->b_list); + xfs_buf_list_del(bp); } __xfs_buf_submit(bp, false); } blk_finish_plug(&plug);
@@ -2244,11 +2280,11 @@ xfs_buf_delwri_submit(
/* Wait for IO to complete. */ while (!list_empty(&wait_list)) { bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
- list_del_init(&bp->b_list); + xfs_buf_list_del(bp);
/* * Wait on the locked buffer, check for errors and unlock and * release the delwri queue reference. */ diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h index 549c60942208..6cf0332ba62c 100644 --- a/fs/xfs/xfs_buf.h +++ b/fs/xfs/xfs_buf.h @@ -303,10 +303,11 @@ extern void *xfs_buf_offset(struct xfs_buf *, size_t); extern void xfs_buf_stale(struct xfs_buf *bp);
/* Delayed Write Buffer Routines */ extern void xfs_buf_delwri_cancel(struct list_head *); extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *); +void xfs_buf_delwri_queue_here(struct xfs_buf *bp, struct list_head *bl); extern int xfs_buf_delwri_submit(struct list_head *); extern int xfs_buf_delwri_submit_nowait(struct list_head *); extern int xfs_buf_delwri_pushbuf(struct xfs_buf *, struct list_head *);
static inline xfs_daddr_t xfs_buf_daddr(struct xfs_buf *bp)
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 0573676fdde7ce3829ee6a42a8e5a56355234712 ]
Alexander Potapenko report that KMSAN was issuing these warnings:
kmalloc-ed xlog buffer of size 512 : ffff88802fc26200 kmalloc-ed xlog buffer of size 368 : ffff88802fc24a00 kmalloc-ed xlog buffer of size 648 : ffff88802b631000 kmalloc-ed xlog buffer of size 648 : ffff88802b632800 kmalloc-ed xlog buffer of size 648 : ffff88802b631c00 xlog_write_iovec: copying 12 bytes from ffff888017ddbbd8 to ffff88802c300400 xlog_write_iovec: copying 28 bytes from ffff888017ddbbe4 to ffff88802c30040c xlog_write_iovec: copying 68 bytes from ffff88802fc26274 to ffff88802c300428 xlog_write_iovec: copying 188 bytes from ffff88802fc262bc to ffff88802c30046c ===================================================== BUG: KMSAN: uninit-value in xlog_write_iovec fs/xfs/xfs_log.c:2227 BUG: KMSAN: uninit-value in xlog_write_full fs/xfs/xfs_log.c:2263 BUG: KMSAN: uninit-value in xlog_write+0x1fac/0x2600 fs/xfs/xfs_log.c:2532 xlog_write_iovec fs/xfs/xfs_log.c:2227 xlog_write_full fs/xfs/xfs_log.c:2263 xlog_write+0x1fac/0x2600 fs/xfs/xfs_log.c:2532 xlog_cil_write_chain fs/xfs/xfs_log_cil.c:918 xlog_cil_push_work+0x30f2/0x44e0 fs/xfs/xfs_log_cil.c:1263 process_one_work kernel/workqueue.c:2630 process_scheduled_works+0x1188/0x1e30 kernel/workqueue.c:2703 worker_thread+0xee5/0x14f0 kernel/workqueue.c:2784 kthread+0x391/0x500 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
Uninit was created at: slab_post_alloc_hook+0x101/0xac0 mm/slab.h:768 slab_alloc_node mm/slub.c:3482 __kmem_cache_alloc_node+0x612/0xae0 mm/slub.c:3521 __do_kmalloc_node mm/slab_common.c:1006 __kmalloc+0x11a/0x410 mm/slab_common.c:1020 kmalloc ./include/linux/slab.h:604 xlog_kvmalloc fs/xfs/xfs_log_priv.h:704 xlog_cil_alloc_shadow_bufs fs/xfs/xfs_log_cil.c:343 xlog_cil_commit+0x487/0x4dc0 fs/xfs/xfs_log_cil.c:1574 __xfs_trans_commit+0x8df/0x1930 fs/xfs/xfs_trans.c:1017 xfs_trans_commit+0x30/0x40 fs/xfs/xfs_trans.c:1061 xfs_create+0x15af/0x2150 fs/xfs/xfs_inode.c:1076 xfs_generic_create+0x4cd/0x1550 fs/xfs/xfs_iops.c:199 xfs_vn_create+0x4a/0x60 fs/xfs/xfs_iops.c:275 lookup_open fs/namei.c:3477 open_last_lookups fs/namei.c:3546 path_openat+0x29ac/0x6180 fs/namei.c:3776 do_filp_open+0x24d/0x680 fs/namei.c:3809 do_sys_openat2+0x1bc/0x330 fs/open.c:1440 do_sys_open fs/open.c:1455 __do_sys_openat fs/open.c:1471 __se_sys_openat fs/open.c:1466 __x64_sys_openat+0x253/0x330 fs/open.c:1466 do_syscall_x64 arch/x86/entry/common.c:51 do_syscall_64+0x4f/0x140 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b arch/x86/entry/entry_64.S:120
Bytes 112-115 of 188 are uninitialized Memory access of size 188 starts at ffff88802fc262bc
This is caused by the struct xfs_log_dinode not having the di_crc field initialised. Log recovery never uses this field (it is only present these days for on-disk format compatibility reasons) and so it's value is never checked so nothing in XFS has caught this.
Further, none of the uninitialised memory access warning tools have caught this (despite catching other uninit memory accesses in the struct xfs_log_dinode back in 2017!) until recently. Alexander annotated the XFS code to get the dump of the actual bytes that were detected as uninitialised, and from that report it took me about 30s to realise what the issue was.
The issue was introduced back in 2016 and every inode that is logged fails to initialise this field. This is no actual bad behaviour caused by this issue - I find it hard to even classify it as a bug...
Reported-and-tested-by: Alexander Potapenko glider@google.com Fixes: f8d55aa0523a ("xfs: introduce inode log format object") Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/xfs_inode_item.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index 91c847a84e10..2ec23c9af760 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -554,10 +554,13 @@ xfs_inode_to_log_dinode( to->di_ino = ip->i_ino; to->di_lsn = lsn; memset(to->di_pad2, 0, sizeof(to->di_pad2)); uuid_copy(&to->di_uuid, &ip->i_mount->m_sb.sb_meta_uuid); to->di_v3_pad = 0; + + /* dummy value for initialisation */ + to->di_crc = 0; } else { to->di_version = 2; to->di_flushiter = ip->i_flushiter; memset(to->di_v2_pad, 0, sizeof(to->di_v2_pad)); }
From: Long Li leo.lilong@huawei.com
[ Upstream commit 07afd3173d0c6d24a47441839a835955ec6cf0d4 ]
[ 6.1: resolved conflict in xfs_ag.c ]
Take mp->m_perag_lock for deletions from the perag radix tree in xfs_initialize_perag to prevent racing with tagging operations. Lookups are fine - they are RCU protected so already deal with the tree changing shape underneath the lookup - but tagging operations require the tree to be stable while the tags are propagated back up to the root.
Right now there's nothing stopping radix tree tagging from operating while a growfs operation is progress and adding/removing new entries into the radix tree.
Hence we can have traversals that require a stable tree occurring at the same time we are removing unused entries from the radix tree which causes the shape of the tree to change.
Likely this hasn't caused a problem in the past because we are only doing append addition and removal so the active AG part of the tree is not changing shape, but that doesn't mean it is safe. Just making the radix tree modifications serialise against each other is obviously correct.
Signed-off-by: Long Li leo.lilong@huawei.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_ag.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c index e03bfeacbed4..e7b011c42b7a 100644 --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -343,17 +343,21 @@ xfs_initialize_perag(
mp->m_ag_prealloc_blocks = xfs_prealloc_blocks(mp); return 0;
out_remove_pag: + spin_lock(&mp->m_perag_lock); radix_tree_delete(&mp->m_perag_tree, index); + spin_unlock(&mp->m_perag_lock); out_free_pag: kmem_free(pag); out_unwind_new_pags: /* unwind any prior newly initialized pags */ for (index = first_initialised; index < agcount; index++) { + spin_lock(&mp->m_perag_lock); pag = radix_tree_delete(&mp->m_perag_tree, index); + spin_unlock(&mp->m_perag_lock); if (!pag) break; xfs_buf_hash_destroy(pag); kmem_free(pag); }
From: Long Li leo.lilong@huawei.com
[ Upstream commit 7823921887750b39d02e6b44faafdd1cc617c651 ]
[ 6.1: resolved conflicts in xfs_ag.c and xfs_ag.h ]
During growfs, if new ag in memory has been initialized, however sb_agcount has not been updated, if an error occurs at this time it will cause perag leaks as follows, these new AGs will not been freed during umount , because of these new AGs are not visible(that is included in mp->m_sb.sb_agcount).
unreferenced object 0xffff88810be40200 (size 512): comm "xfs_growfs", pid 857, jiffies 4294909093 hex dump (first 32 bytes): 00 c0 c1 05 81 88 ff ff 04 00 00 00 00 00 00 00 ................ 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 381741e2): [<ffffffff8191aef6>] __kmalloc+0x386/0x4f0 [<ffffffff82553e65>] kmem_alloc+0xb5/0x2f0 [<ffffffff8238dac5>] xfs_initialize_perag+0xc5/0x810 [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0 [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0 [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0 [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0 [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a unreferenced object 0xffff88810be40800 (size 512): comm "xfs_growfs", pid 857, jiffies 4294909093 hex dump (first 32 bytes): 20 00 00 00 00 00 00 00 57 ef be dc 00 00 00 00 .......W....... 10 08 e4 0b 81 88 ff ff 10 08 e4 0b 81 88 ff ff ................ backtrace (crc bde50e2d): [<ffffffff8191b43a>] __kmalloc_node+0x3da/0x540 [<ffffffff81814489>] kvmalloc_node+0x99/0x160 [<ffffffff8286acff>] bucket_table_alloc.isra.0+0x5f/0x400 [<ffffffff8286bdc5>] rhashtable_init+0x405/0x760 [<ffffffff8238dda3>] xfs_initialize_perag+0x3a3/0x810 [<ffffffff824f679c>] xfs_growfs_data+0x9bc/0xbc0 [<ffffffff8250b90e>] xfs_file_ioctl+0x5fe/0x14d0 [<ffffffff81aa5194>] __x64_sys_ioctl+0x144/0x1c0 [<ffffffff83c3d81f>] do_syscall_64+0x3f/0xe0 [<ffffffff83e00087>] entry_SYSCALL_64_after_hwframe+0x62/0x6a
Factor out xfs_free_unused_perag_range() from xfs_initialize_perag(), used for freeing unused perag within a specified range in error handling, included in the error path of the growfs failure.
Fixes: 1c1c6ebcf528 ("xfs: Replace per-ag array with a radix tree") Signed-off-by: Long Li leo.lilong@huawei.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_ag.c | 34 +++++++++++++++++++++++++--------- fs/xfs/libxfs/xfs_ag.h | 3 +++ fs/xfs/xfs_fsops.c | 5 ++++- 3 files changed, 32 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c index e7b011c42b7a..9743fa5b5388 100644 --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -257,10 +257,34 @@ xfs_agino_range( xfs_agino_t *last) { return __xfs_agino_range(mp, xfs_ag_block_count(mp, agno), first, last); }
+/* + * Free perag within the specified AG range, it is only used to free unused + * perags under the error handling path. + */ +void +xfs_free_unused_perag_range( + struct xfs_mount *mp, + xfs_agnumber_t agstart, + xfs_agnumber_t agend) +{ + struct xfs_perag *pag; + xfs_agnumber_t index; + + for (index = agstart; index < agend; index++) { + spin_lock(&mp->m_perag_lock); + pag = radix_tree_delete(&mp->m_perag_tree, index); + spin_unlock(&mp->m_perag_lock); + if (!pag) + break; + xfs_buf_hash_destroy(pag); + kmem_free(pag); + } +} + int xfs_initialize_perag( struct xfs_mount *mp, xfs_agnumber_t agcount, xfs_rfsblock_t dblocks, @@ -350,19 +374,11 @@ xfs_initialize_perag( spin_unlock(&mp->m_perag_lock); out_free_pag: kmem_free(pag); out_unwind_new_pags: /* unwind any prior newly initialized pags */ - for (index = first_initialised; index < agcount; index++) { - spin_lock(&mp->m_perag_lock); - pag = radix_tree_delete(&mp->m_perag_tree, index); - spin_unlock(&mp->m_perag_lock); - if (!pag) - break; - xfs_buf_hash_destroy(pag); - kmem_free(pag); - } + xfs_free_unused_perag_range(mp, first_initialised, agcount); return error; }
static int xfs_get_aghdr_buf( diff --git a/fs/xfs/libxfs/xfs_ag.h b/fs/xfs/libxfs/xfs_ag.h index 191b22b9a35b..eb84af1c8628 100644 --- a/fs/xfs/libxfs/xfs_ag.h +++ b/fs/xfs/libxfs/xfs_ag.h @@ -104,10 +104,13 @@ struct xfs_perag { struct delayed_work pag_blockgc_work;
#endif /* __KERNEL__ */ };
+ +void xfs_free_unused_perag_range(struct xfs_mount *mp, xfs_agnumber_t agstart, + xfs_agnumber_t agend); int xfs_initialize_perag(struct xfs_mount *mp, xfs_agnumber_t agcount, xfs_rfsblock_t dcount, xfs_agnumber_t *maxagi); int xfs_initialize_perag_data(struct xfs_mount *mp, xfs_agnumber_t agno); void xfs_free_perag(struct xfs_mount *mp);
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 77b14f788214..96e9d64fbe62 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -151,11 +151,11 @@ xfs_growfs_data_private(
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growdata, (delta > 0 ? XFS_GROWFS_SPACE_RES(mp) : -delta), 0, XFS_TRANS_RESERVE, &tp); if (error) - return error; + goto out_free_unused_perag;
last_pag = xfs_perag_get(mp, oagcount - 1); if (delta > 0) { error = xfs_resizefs_init_new_ags(tp, &id, oagcount, nagcount, delta, last_pag, &lastag_extended); @@ -225,10 +225,13 @@ xfs_growfs_data_private( } return error;
out_trans_cancel: xfs_trans_cancel(tp); +out_free_unused_perag: + if (nagcount > oagcount) + xfs_free_unused_perag_range(mp, oagcount, nagcount); return error; }
static int xfs_growfs_log_private(
From: Jiachen Zhang zhangjiachen.jaycee@bytedance.com
[ Upstream commit e6af9c98cbf0164a619d95572136bfb54d482dd6 ]
In the case of returning -ENOSPC, ensure logflagsp is initialized by 0. Otherwise the caller __xfs_bunmapi will set uninitialized illegal tmp_logflags value into xfs log, which might cause unpredictable error in the log recovery procedure.
Also, remove the flags variable and set the *logflagsp directly, so that the code should be more robust in the long run.
Fixes: 1b24b633aafe ("xfs: move some more code into xfs_bmap_del_extent_real") Signed-off-by: Jiachen Zhang zhangjiachen.jaycee@bytedance.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_bmap.c | 73 +++++++++++++++++----------------------- 1 file changed, 31 insertions(+), 42 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index c99fd7fe242e..2a3b78032cb0 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4997,242 +4997,231 @@ xfs_bmap_del_extent_real( { xfs_fsblock_t del_endblock=0; /* first block past del */ xfs_fileoff_t del_endoff; /* first offset past del */ int do_fx; /* free extent at end of routine */ int error; /* error return value */ - int flags = 0;/* inode logging flags */ struct xfs_bmbt_irec got; /* current extent entry */ xfs_fileoff_t got_endoff; /* first offset past got */ int i; /* temp state */ struct xfs_ifork *ifp; /* inode fork pointer */ xfs_mount_t *mp; /* mount structure */ xfs_filblks_t nblks; /* quota/sb block count */ xfs_bmbt_irec_t new; /* new record to be inserted */ /* REFERENCED */ uint qfield; /* quota field to update */ uint32_t state = xfs_bmap_fork_to_state(whichfork); struct xfs_bmbt_irec old;
+ *logflagsp = 0; + mp = ip->i_mount; XFS_STATS_INC(mp, xs_del_exlist);
ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(del->br_blockcount > 0); xfs_iext_get_extent(ifp, icur, &got); ASSERT(got.br_startoff <= del->br_startoff); del_endoff = del->br_startoff + del->br_blockcount; got_endoff = got.br_startoff + got.br_blockcount; ASSERT(got_endoff >= del_endoff); ASSERT(!isnullstartblock(got.br_startblock)); qfield = 0; - error = 0;
/* * If it's the case where the directory code is running with no block * reservation, and the deleted block is in the middle of its extent, * and the resulting insert of an extent would cause transformation to * btree format, then reject it. The calling code will then swap blocks * around instead. We have to do this now, rather than waiting for the * conversion to btree format, since the transaction will be dirty then. */ if (tp->t_blk_res == 0 && ifp->if_format == XFS_DINODE_FMT_EXTENTS && ifp->if_nextents >= XFS_IFORK_MAXEXT(ip, whichfork) && del->br_startoff > got.br_startoff && del_endoff < got_endoff) return -ENOSPC;
- flags = XFS_ILOG_CORE; + *logflagsp = XFS_ILOG_CORE; if (whichfork == XFS_DATA_FORK && XFS_IS_REALTIME_INODE(ip)) { if (!(bflags & XFS_BMAPI_REMAP)) { error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount); if (error) - goto done; + return error; }
do_fx = 0; qfield = XFS_TRANS_DQ_RTBCOUNT; } else { do_fx = 1; qfield = XFS_TRANS_DQ_BCOUNT; } nblks = del->br_blockcount;
del_endblock = del->br_startblock + del->br_blockcount; if (cur) { error = xfs_bmbt_lookup_eq(cur, &got, &i); if (error) - goto done; - if (XFS_IS_CORRUPT(mp, i != 1)) { - error = -EFSCORRUPTED; - goto done; - } + return error; + if (XFS_IS_CORRUPT(mp, i != 1)) + return -EFSCORRUPTED; }
if (got.br_startoff == del->br_startoff) state |= BMAP_LEFT_FILLING; if (got_endoff == del_endoff) state |= BMAP_RIGHT_FILLING;
switch (state & (BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING)) { case BMAP_LEFT_FILLING | BMAP_RIGHT_FILLING: /* * Matches the whole extent. Delete the entry. */ xfs_iext_remove(ip, icur, state); xfs_iext_prev(ifp, icur); ifp->if_nextents--;
- flags |= XFS_ILOG_CORE; + *logflagsp |= XFS_ILOG_CORE; if (!cur) { - flags |= xfs_ilog_fext(whichfork); + *logflagsp |= xfs_ilog_fext(whichfork); break; } if ((error = xfs_btree_delete(cur, &i))) - goto done; - if (XFS_IS_CORRUPT(mp, i != 1)) { - error = -EFSCORRUPTED; - goto done; - } + return error; + if (XFS_IS_CORRUPT(mp, i != 1)) + return -EFSCORRUPTED; break; case BMAP_LEFT_FILLING: /* * Deleting the first part of the extent. */ got.br_startoff = del_endoff; got.br_startblock = del_endblock; got.br_blockcount -= del->br_blockcount; xfs_iext_update_extent(ip, state, icur, &got); if (!cur) { - flags |= xfs_ilog_fext(whichfork); + *logflagsp |= xfs_ilog_fext(whichfork); break; } error = xfs_bmbt_update(cur, &got); if (error) - goto done; + return error; break; case BMAP_RIGHT_FILLING: /* * Deleting the last part of the extent. */ got.br_blockcount -= del->br_blockcount; xfs_iext_update_extent(ip, state, icur, &got); if (!cur) { - flags |= xfs_ilog_fext(whichfork); + *logflagsp |= xfs_ilog_fext(whichfork); break; } error = xfs_bmbt_update(cur, &got); if (error) - goto done; + return error; break; case 0: /* * Deleting the middle of the extent. */
old = got;
got.br_blockcount = del->br_startoff - got.br_startoff; xfs_iext_update_extent(ip, state, icur, &got);
new.br_startoff = del_endoff; new.br_blockcount = got_endoff - del_endoff; new.br_state = got.br_state; new.br_startblock = del_endblock;
- flags |= XFS_ILOG_CORE; + *logflagsp |= XFS_ILOG_CORE; if (cur) { error = xfs_bmbt_update(cur, &got); if (error) - goto done; + return error; error = xfs_btree_increment(cur, 0, &i); if (error) - goto done; + return error; cur->bc_rec.b = new; error = xfs_btree_insert(cur, &i); if (error && error != -ENOSPC) - goto done; + return error; /* * If get no-space back from btree insert, it tried a * split, and we have a zero block reservation. Fix up * our state and return the error. */ if (error == -ENOSPC) { /* * Reset the cursor, don't trust it after any * insert operation. */ error = xfs_bmbt_lookup_eq(cur, &got, &i); if (error) - goto done; - if (XFS_IS_CORRUPT(mp, i != 1)) { - error = -EFSCORRUPTED; - goto done; - } + return error; + if (XFS_IS_CORRUPT(mp, i != 1)) + return -EFSCORRUPTED; /* * Update the btree record back * to the original value. */ error = xfs_bmbt_update(cur, &old); if (error) - goto done; + return error; /* * Reset the extent record back * to the original value. */ xfs_iext_update_extent(ip, state, icur, &old); - flags = 0; - error = -ENOSPC; - goto done; - } - if (XFS_IS_CORRUPT(mp, i != 1)) { - error = -EFSCORRUPTED; - goto done; + *logflagsp = 0; + return -ENOSPC; } + if (XFS_IS_CORRUPT(mp, i != 1)) + return -EFSCORRUPTED; } else - flags |= xfs_ilog_fext(whichfork); + *logflagsp |= xfs_ilog_fext(whichfork);
ifp->if_nextents++; xfs_iext_next(ifp, icur); xfs_iext_insert(ip, icur, &new, state); break; }
/* remove reverse mapping */ xfs_rmap_unmap_extent(tp, ip, whichfork, del);
/* * If we need to, add to list of extents to delete. */ if (do_fx && !(bflags & XFS_BMAPI_REMAP)) { if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { error = __xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, XFS_AG_RESV_NONE, ((bflags & XFS_BMAPI_NODISCARD) || del->br_state == XFS_EXT_UNWRITTEN)); if (error) - goto done; + return error; } }
/* * Adjust inode # blocks in the file. */ if (nblks) ip->i_nblocks -= nblks; /* * Adjust quota data. */ if (qfield && !(bflags & XFS_BMAPI_REMAP)) xfs_trans_mod_dquot_byino(tp, ip, qfield, (long)-nblks);
-done: - *logflagsp = flags; - return error; + return 0; }
/* * Unmap (remove) blocks from a file. * If nexts is nonzero then the number of extents to remove is limited to
From: Zhang Tianci zhangtianci.1997@bytedance.com
[ Upstream commit 5759aa4f956034b289b0ae2c99daddfc775442e1 ]
xfs_da3_swap_lastblock() copy the last block content to the dead block, but do not update the metadata in it. We need update some metadata for some kinds of type block, such as dir3 leafn block records its blkno, we shall update it to the dead block blkno. Otherwise, before write the xfs_buf to disk, the verify_write() will fail in blk_hdr->blkno != xfs_buf->b_bn, then xfs will be shutdown.
We will get this warning:
XFS (dm-0): Metadata corruption detected at xfs_dir3_leaf_verify+0xa8/0xe0 [xfs], xfs_dir3_leafn block 0x178 XFS (dm-0): Unmount and run xfs_repair XFS (dm-0): First 128 bytes of corrupted metadata buffer: 00000000e80f1917: 00 80 00 0b 00 80 00 07 3d ff 00 00 00 00 00 00 ........=....... 000000009604c005: 00 00 00 00 00 00 01 a0 00 00 00 00 00 00 00 00 ................ 000000006b6fb2bf: e4 44 e3 97 b5 64 44 41 8b 84 60 0e 50 43 d9 bf .D...dDA..`.PC.. 00000000678978a2: 00 00 00 00 00 00 00 83 01 73 00 93 00 00 00 00 .........s...... 00000000b28b247c: 99 29 1d 38 00 00 00 00 99 29 1d 40 00 00 00 00 .).8.....).@.... 000000002b2a662c: 99 29 1d 48 00 00 00 00 99 49 11 00 00 00 00 00 .).H.....I...... 00000000ea2ffbb8: 99 49 11 08 00 00 45 25 99 49 11 10 00 00 48 fe .I....E%.I....H. 0000000069e86440: 99 49 11 18 00 00 4c 6b 99 49 11 20 00 00 4d 97 .I....Lk.I. ..M. XFS (dm-0): xfs_do_force_shutdown(0x8) called from line 1423 of file fs/xfs/xfs_buf.c. Return address = 00000000c0ff63c1 XFS (dm-0): Corruption of in-memory data detected. Shutting down filesystem XFS (dm-0): Please umount the filesystem and rectify the problem(s)
From the log above, we know xfs_buf->b_no is 0x178, but the block's hdr record
its blkno is 0x1a0.
Fixes: 24df33b45ecf ("xfs: add CRC checking to dir2 leaf blocks") Signed-off-by: Zhang Tianci zhangtianci.1997@bytedance.com Suggested-by: Dave Chinner david@fromorbit.com Reviewed-by: "Darrick J. Wong" djwong@kernel.org Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_da_btree.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/xfs/libxfs/xfs_da_btree.c b/fs/xfs/libxfs/xfs_da_btree.c index e576560b46e9..282c7cf032f4 100644 --- a/fs/xfs/libxfs/xfs_da_btree.c +++ b/fs/xfs/libxfs/xfs_da_btree.c @@ -2314,14 +2314,21 @@ xfs_da3_swap_lastblock( error = xfs_da3_node_read(tp, dp, last_blkno, &last_buf, w); if (error) return error; /* * Copy the last block into the dead buffer and log it. + * On CRC-enabled file systems, also update the stamped in blkno. */ memcpy(dead_buf->b_addr, last_buf->b_addr, args->geo->blksize); + if (xfs_has_crc(mp)) { + struct xfs_da3_blkinfo *da3 = dead_buf->b_addr; + + da3->blkno = cpu_to_be64(xfs_buf_daddr(dead_buf)); + } xfs_trans_log_buf(tp, dead_buf, 0, args->geo->blksize - 1); dead_info = dead_buf->b_addr; + /* * Get values from the moved block. */ if (dead_info->magic == cpu_to_be16(XFS_DIR2_LEAFN_MAGIC) || dead_info->magic == cpu_to_be16(XFS_DIR3_LEAFN_MAGIC)) {
From: Andrey Albershteyn aalbersh@redhat.com
[ Upstream commit 82ef1a5356572219f41f9123ca047259a77bd67b ]
In XFS_DAS_NODE_REMOVE_ATTR case, xfs_attr_mode_remove_attr() sets filter to XFS_ATTR_INCOMPLETE. The filter is then reset in xfs_attr_complete_op() if XFS_DA_OP_REPLACE operation is performed.
The filter is not reset though if XFS just removes the attribute (args->value == NULL) with xfs_attr_defer_remove(). attr code goes to XFS_DAS_DONE state.
Fix this by always resetting XFS_ATTR_INCOMPLETE filter. The replace operation already resets this filter in anyway and others are completed at this step hence don't need it.
Fixes: fdaf1bb3cafc ("xfs: ATTR_REPLACE algorithm with LARP enabled needs rework") Signed-off-by: Andrey Albershteyn aalbersh@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_attr.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index e28d93d232de..32d350e97e0f 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -419,14 +419,14 @@ xfs_attr_complete_op( { struct xfs_da_args *args = attr->xattri_da_args; bool do_replace = args->op_flags & XFS_DA_OP_REPLACE;
args->op_flags &= ~XFS_DA_OP_REPLACE; - if (do_replace) { - args->attr_filter &= ~XFS_ATTR_INCOMPLETE; + args->attr_filter &= ~XFS_ATTR_INCOMPLETE; + if (do_replace) return replace_state; - } + return XFS_DAS_DONE; }
static int xfs_attr_leaf_addname(
From: "Darrick J. Wong" djwong@kernel.org
[ Upstream commit 881f78f472556ed05588172d5b5676b48dc48240 ]
[ 6.1: used 6.6 backport to minimize conflicts ]
[backport: resolve merge conflicts due to refactoring rtbitmap/summary macros and accessors]
I mistakenly turned off CONFIG_XFS_RT in the Kconfig file for arm64 variant of the djwong-wtf git branch. Unfortunately, it took me a good hour to figure out that RT wasn't built because this is what got printed to dmesg:
XFS (sda2): realtime geometry sanity check failed XFS (sda2): Metadata corruption detected at xfs_sb_read_verify+0x170/0x190 [xfs], xfs_sb block 0x0
Whereas I would have expected:
XFS (sda2): Not built with CONFIG_XFS_RT XFS (sda2): RT mount failed
The root cause of these problems is the conditional compilation of the new functions xfs_validate_rtextents and xfs_compute_rextslog that I introduced in the two commits listed below. The !RT versions of these functions return false and 0, respectively, which causes primary superblock validation to fail, which explains the first message.
Move the two functions to other parts of libxfs that are not conditionally defined by CONFIG_XFS_RT and remove the broken stubs so that validation works again.
Fixes: e14293803f4e ("xfs: don't allow overly small or large realtime volumes") Fixes: a6a38f309afc ("xfs: make rextslog computation consistent with mkfs") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Catherine Hoang catherine.hoang@oracle.com Acked-by: Darrick J. Wong djwong@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Leah Rumancik leah.rumancik@gmail.com Acked-by: "Darrick J. Wong" djwong@kernel.org --- fs/xfs/libxfs/xfs_rtbitmap.c | 14 -------------- fs/xfs/libxfs/xfs_rtbitmap.h | 16 ---------------- fs/xfs/libxfs/xfs_sb.c | 14 ++++++++++++++ fs/xfs/libxfs/xfs_sb.h | 2 ++ fs/xfs/libxfs/xfs_types.h | 12 ++++++++++++ fs/xfs/scrub/rtbitmap.c | 1 + 6 files changed, 29 insertions(+), 30 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 8db1243beacc..760172a65aff 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1129,19 +1129,5 @@ xfs_rtalloc_extent_is_free(
*is_free = matches; return 0; }
-/* - * Compute the maximum level number of the realtime summary file, as defined by - * mkfs. The historic use of highbit32 on a 64-bit quantity prohibited correct - * use of rt volumes with more than 2^32 extents. - */ -uint8_t -xfs_compute_rextslog( - xfs_rtbxlen_t rtextents) -{ - if (!rtextents) - return 0; - return xfs_highbit64(rtextents); -} - diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 4e49aadf0955..b89712983347 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -69,31 +69,15 @@ xfs_rtfree_extent(
/* Same as above, but in units of rt blocks. */ int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, xfs_filblks_t rtlen);
-uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents); - -/* Do we support an rt volume having this number of rtextents? */ -static inline bool -xfs_validate_rtextents( - xfs_rtbxlen_t rtextents) -{ - /* No runt rt volumes */ - if (rtextents == 0) - return false; - - return true; -} - #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) # define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) # define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) # define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) # define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) -# define xfs_compute_rextslog(rtx) (0) -# define xfs_validate_rtextents(rtx) (false) #endif /* CONFIG_XFS_RT */
#endif /* __XFS_RTBITMAP_H__ */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 04247d1c7523..90ed55cd3d10 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -1365,5 +1365,19 @@ xfs_validate_stripe_geometry( swidth, sunit); return false; } return true; } + +/* + * Compute the maximum level number of the realtime summary file, as defined by + * mkfs. The historic use of highbit32 on a 64-bit quantity prohibited correct + * use of rt volumes with more than 2^32 extents. + */ +uint8_t +xfs_compute_rextslog( + xfs_rtbxlen_t rtextents) +{ + if (!rtextents) + return 0; + return xfs_highbit64(rtextents); +} diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h index 19134b23c10b..2e8e8d63d4eb 100644 --- a/fs/xfs/libxfs/xfs_sb.h +++ b/fs/xfs/libxfs/xfs_sb.h @@ -36,6 +36,8 @@ extern int xfs_sb_get_secondary(struct xfs_mount *mp, struct xfs_buf **bpp);
extern bool xfs_validate_stripe_geometry(struct xfs_mount *mp, __s64 sunit, __s64 swidth, int sectorsize, bool silent);
+uint8_t xfs_compute_rextslog(xfs_rtbxlen_t rtextents); + #endif /* __XFS_SB_H__ */ diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index 7023e4a79f87..42fed04f038d 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -226,6 +226,18 @@ void xfs_icount_range(struct xfs_mount *mp, unsigned long long *min, unsigned long long *max); bool xfs_verify_fileoff(struct xfs_mount *mp, xfs_fileoff_t off); bool xfs_verify_fileext(struct xfs_mount *mp, xfs_fileoff_t off, xfs_fileoff_t len);
+/* Do we support an rt volume having this number of rtextents? */ +static inline bool +xfs_validate_rtextents( + xfs_rtbxlen_t rtextents) +{ + /* No runt rt volumes */ + if (rtextents == 0) + return false; + + return true; +} + #endif /* __XFS_TYPES_H__ */ diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index faf6e0890d44..fad7c353ada6 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -12,10 +12,11 @@ #include "xfs_log_format.h" #include "xfs_trans.h" #include "xfs_rtbitmap.h" #include "xfs_inode.h" #include "xfs_bmap.h" +#include "xfs_sb.h" #include "scrub/scrub.h" #include "scrub/common.h"
/* Set us up with the realtime metadata locked. */ int
linux-stable-mirror@lists.linaro.org