 
            Hi all,
Here's a bunch of hand-ported bug fixes for 6.12 LTS. Most of the patches fix a warning about dquot reclaim needing to read dquot buffers in from disk by pinning buffers at transaction commit time instead of during reclaim.
If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below.
With a bit of luck, this should all go splendidly. Comments and questions are, as always, welcome.
--D
kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=nex... --- Commits in this patchset: * xfs: avoid nested calls to __xfs_trans_commit * xfs: don't lose solo superblock counter update transactions * xfs: don't lose solo dquot update transactions * xfs: separate dquot buffer reads from xfs_dqflush * xfs: clean up log item accesses in xfs_qm_dqflush{,_done} * xfs: attach dquot buffer to dquot log item buffer * xfs: convert quotacheck to attach dquot buffers * xfs: don't over-report free space or inodes in statvfs * xfs: release the dquot buf outside of qli_lock * xfs: lock dquot buffer before detaching dquot from b_li_list * xfs: fix mount hang during primary superblock recovery failure --- fs/xfs/xfs_dquot.h | 6 + fs/xfs/xfs_dquot_item.h | 7 + fs/xfs/xfs_quota.h | 7 + fs/xfs/xfs_buf_item_recover.c | 11 ++ fs/xfs/xfs_dquot.c | 199 +++++++++++++++++++++++++++++++++++------ fs/xfs/xfs_dquot_item.c | 51 ++++++++--- fs/xfs/xfs_qm.c | 48 ++++++++-- fs/xfs/xfs_qm_bhv.c | 27 ++++-- fs/xfs/xfs_trans.c | 39 ++++---- fs/xfs/xfs_trans_ail.c | 2 fs/xfs/xfs_trans_dquot.c | 31 +++++- 11 files changed, 338 insertions(+), 90 deletions(-)
 
            From: Darrick J. Wong djwong@kernel.org
commit e96c1e2f262e0993859e266e751977bfad3ca98a upstream
Currently, __xfs_trans_commit calls xfs_defer_finish_noroll, which calls __xfs_trans_commit again on the same transaction. In other words, there's function recursion that has caused minor amounts of confusion in the past. There's no reason to keep this around, since there's only one place where we actually want the xfs_defer_finish_noroll, and that is in the top level xfs_trans_commit call.
Fixes: 98719051e75ccf ("xfs: refactor internal dfops initialization") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_trans.c | 26 ++++++++++++++------------ 1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 30e03342287a94..001d9bec4ed571 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -834,18 +834,6 @@ __xfs_trans_commit(
trace_xfs_trans_commit(tp, _RET_IP_);
- /* - * Finish deferred items on final commit. Only permanent transactions - * should ever have deferred ops. - */ - WARN_ON_ONCE(!list_empty(&tp->t_dfops) && - !(tp->t_flags & XFS_TRANS_PERM_LOG_RES)); - if (!regrant && (tp->t_flags & XFS_TRANS_PERM_LOG_RES)) { - error = xfs_defer_finish_noroll(&tp); - if (error) - goto out_unreserve; - } - error = xfs_trans_run_precommits(tp); if (error) goto out_unreserve; @@ -924,6 +912,20 @@ int xfs_trans_commit( struct xfs_trans *tp) { + /* + * Finish deferred items on final commit. Only permanent transactions + * should ever have deferred ops. + */ + WARN_ON_ONCE(!list_empty(&tp->t_dfops) && + !(tp->t_flags & XFS_TRANS_PERM_LOG_RES)); + if (tp->t_flags & XFS_TRANS_PERM_LOG_RES) { + int error = xfs_defer_finish_noroll(&tp); + if (error) { + xfs_trans_cancel(tp); + return error; + } + } + return __xfs_trans_commit(tp, false); }
 
            [ Sasha's backport helper bot ]
Hi,
The claimed upstream commit SHA1 (e96c1e2f262e0993859e266e751977bfad3ca98a) was not found. However, I found a matching commit: a004afdc62946d3261f724c6472997085c4f0735
Status in newer kernel trees: 6.13.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- Failed to apply patch cleanly, falling back to interdiff... ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.13.y | Failed | N/A | | stable/linux-6.12.y | Success | Failed | | stable/linux-6.6.y | Success | Failed | | stable/linux-6.1.y | Success | Failed | | stable/linux-5.15.y | Failed | N/A | | stable/linux-5.10.y | Failed | N/A | | stable/linux-5.4.y | Failed | N/A |
Build Errors: Build error for stable/linux-6.12.y: ssh: connect to host 192.168.1.58 port 22: No route to host
Build error for stable/linux-6.6.y: ssh: connect to host 192.168.1.58 port 22: No route to host
Build error for stable/linux-6.1.y: ssh: connect to host 192.168.1.58 port 22: No route to host
 
            From: Darrick J. Wong djwong@kernel.org
commit c817aabd3b08e8770d89a9a29ae80fead561a1a1 upstream
Superblock counter updates are tracked via per-transaction counters in the xfs_trans object. These changes are then turned into dirty log items in xfs_trans_apply_sb_deltas just prior to commiting the log items to the CIL.
However, updating the per-transaction counter deltas do not cause XFS_TRANS_DIRTY to be set on the transaction. In other words, a pure sb counter update will be silently discarded if there are no other dirty log items attached to the transaction.
This is currently not the case anywhere in the filesystem because sb counter updates always dirty at least one other metadata item, but let's not leave a logic bomb.
Cc: stable@vger.kernel.org # v2.6.35 Fixes: 0924378a689ccb ("xfs: split out iclog writing from xfs_trans_commit()") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_trans.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 001d9bec4ed571..ee46051db12dde 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -834,6 +834,13 @@ __xfs_trans_commit(
trace_xfs_trans_commit(tp, _RET_IP_);
+ /* + * Commit per-transaction changes that are not already tracked through + * log items. This can add dirty log items to the transaction. + */ + if (tp->t_flags & XFS_TRANS_SB_DIRTY) + xfs_trans_apply_sb_deltas(tp); + error = xfs_trans_run_precommits(tp); if (error) goto out_unreserve; @@ -864,8 +871,6 @@ __xfs_trans_commit( /* * If we need to update the superblock, then do it now. */ - if (tp->t_flags & XFS_TRANS_SB_DIRTY) - xfs_trans_apply_sb_deltas(tp); xfs_trans_apply_dquot_deltas(tp);
xlog_cil_commit(log, tp, &commit_seq, regrant);
 
            From: Darrick J. Wong djwong@kernel.org
commit d00ffba4adacd0d4d905f6e64bd8cd87011f5711 upstream
Quota counter updates are tracked via incore objects which hang off the xfs_trans object. These changes are then turned into dirty log items in xfs_trans_apply_dquot_deltas just prior to commiting the log items to the CIL.
However, updating the incore deltas do not cause XFS_TRANS_DIRTY to be set on the transaction. In other words, a pure quota counter update will be silently discarded if there are no other dirty log items attached to the transaction.
This is currently not the case anywhere in the filesystem because quota updates always dirty at least one other metadata item, but a subsequent bug fix will add dquot log item precommits, so we actually need a dirty dquot log item prior to xfs_trans_run_precommits. Also let's not leave a logic bomb.
Cc: stable@vger.kernel.org # v2.6.35 Fixes: 0924378a689ccb ("xfs: split out iclog writing from xfs_trans_commit()") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_quota.h | 7 ++++--- fs/xfs/xfs_trans.c | 10 +++------- fs/xfs/xfs_trans_dquot.c | 31 ++++++++++++++++++++++++++----- 3 files changed, 33 insertions(+), 15 deletions(-)
diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h index 23d71a55bbc006..823c5d472dc0d5 100644 --- a/fs/xfs/xfs_quota.h +++ b/fs/xfs/xfs_quota.h @@ -96,7 +96,8 @@ extern void xfs_trans_free_dqinfo(struct xfs_trans *); extern void xfs_trans_mod_dquot_byino(struct xfs_trans *, struct xfs_inode *, uint, int64_t); extern void xfs_trans_apply_dquot_deltas(struct xfs_trans *); -extern void xfs_trans_unreserve_and_mod_dquots(struct xfs_trans *); +void xfs_trans_unreserve_and_mod_dquots(struct xfs_trans *tp, + bool already_locked); int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp, struct xfs_inode *ip, int64_t dblocks, int64_t rblocks, bool force); extern int xfs_trans_reserve_quota_bydquots(struct xfs_trans *, @@ -165,8 +166,8 @@ static inline void xfs_trans_mod_dquot_byino(struct xfs_trans *tp, struct xfs_inode *ip, uint field, int64_t delta) { } -#define xfs_trans_apply_dquot_deltas(tp) -#define xfs_trans_unreserve_and_mod_dquots(tp) +#define xfs_trans_apply_dquot_deltas(tp, a) +#define xfs_trans_unreserve_and_mod_dquots(tp, a) static inline int xfs_trans_reserve_quota_nblks(struct xfs_trans *tp, struct xfs_inode *ip, int64_t dblocks, int64_t rblocks, bool force) diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index ee46051db12dde..39cd11cbe21fcb 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -840,6 +840,7 @@ __xfs_trans_commit( */ if (tp->t_flags & XFS_TRANS_SB_DIRTY) xfs_trans_apply_sb_deltas(tp); + xfs_trans_apply_dquot_deltas(tp);
error = xfs_trans_run_precommits(tp); if (error) @@ -868,11 +869,6 @@ __xfs_trans_commit(
ASSERT(tp->t_ticket != NULL);
- /* - * If we need to update the superblock, then do it now. - */ - xfs_trans_apply_dquot_deltas(tp); - xlog_cil_commit(log, tp, &commit_seq, regrant);
xfs_trans_free(tp); @@ -898,7 +894,7 @@ __xfs_trans_commit( * the dqinfo portion to be. All that means is that we have some * (non-persistent) quota reservations that need to be unreserved. */ - xfs_trans_unreserve_and_mod_dquots(tp); + xfs_trans_unreserve_and_mod_dquots(tp, true); if (tp->t_ticket) { if (regrant && !xlog_is_shutdown(log)) xfs_log_ticket_regrant(log, tp->t_ticket); @@ -992,7 +988,7 @@ xfs_trans_cancel( } #endif xfs_trans_unreserve_and_mod_sb(tp); - xfs_trans_unreserve_and_mod_dquots(tp); + xfs_trans_unreserve_and_mod_dquots(tp, false);
if (tp->t_ticket) { xfs_log_ticket_ungrant(log, tp->t_ticket); diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c index b368e13424c4f4..b92eeaa1a2a9e7 100644 --- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -602,6 +602,24 @@ xfs_trans_apply_dquot_deltas( ASSERT(dqp->q_blk.reserved >= dqp->q_blk.count); ASSERT(dqp->q_ino.reserved >= dqp->q_ino.count); ASSERT(dqp->q_rtb.reserved >= dqp->q_rtb.count); + + /* + * We've applied the count changes and given back + * whatever reservation we didn't use. Zero out the + * dqtrx fields. + */ + qtrx->qt_blk_res = 0; + qtrx->qt_bcount_delta = 0; + qtrx->qt_delbcnt_delta = 0; + + qtrx->qt_rtblk_res = 0; + qtrx->qt_rtblk_res_used = 0; + qtrx->qt_rtbcount_delta = 0; + qtrx->qt_delrtb_delta = 0; + + qtrx->qt_ino_res = 0; + qtrx->qt_ino_res_used = 0; + qtrx->qt_icount_delta = 0; } } } @@ -638,7 +656,8 @@ xfs_trans_unreserve_and_mod_dquots_hook( */ void xfs_trans_unreserve_and_mod_dquots( - struct xfs_trans *tp) + struct xfs_trans *tp, + bool already_locked) { int i, j; struct xfs_dquot *dqp; @@ -667,10 +686,12 @@ xfs_trans_unreserve_and_mod_dquots( * about the number of blocks used field, or deltas. * Also we don't bother to zero the fields. */ - locked = false; + locked = already_locked; if (qtrx->qt_blk_res) { - xfs_dqlock(dqp); - locked = true; + if (!locked) { + xfs_dqlock(dqp); + locked = true; + } dqp->q_blk.reserved -= (xfs_qcnt_t)qtrx->qt_blk_res; } @@ -691,7 +712,7 @@ xfs_trans_unreserve_and_mod_dquots( dqp->q_rtb.reserved -= (xfs_qcnt_t)qtrx->qt_rtblk_res; } - if (locked) + if (locked && !already_locked) xfs_dqunlock(dqp);
}
 
            On Fri, Feb 07, 2025 at 11:27:04AM -0800, Darrick J. Wong wrote:
From: Darrick J. Wong djwong@kernel.org
commit d00ffba4adacd0d4d905f6e64bd8cd87011f5711 upstream
There is no such commit upstream :(
And maybe because of this, it turns out this commit breaks the build, so I'll be dropping it now.
thanks,
greg k-h
 
            On Fri, Feb 14, 2025 at 02:35:34PM +0100, Greg KH wrote:
On Fri, Feb 07, 2025 at 11:27:04AM -0800, Darrick J. Wong wrote:
From: Darrick J. Wong djwong@kernel.org
commit d00ffba4adacd0d4d905f6e64bd8cd87011f5711 upstream
There is no such commit upstream :(
And maybe because of this, it turns out this commit breaks the build, so I'll be dropping it now.
Heh, I originally ported this patch when it was in my dev tree, then later fixed something in the dev branch and forgot to update the lts branch. Then all those quota fixes got delayed for a month because the upstream maintainer was on vacation and didn't push anything, and I forgot to ever get back to this, and checkpatch didn't notice because it doesn't validate commit ids that link to another repo.
Soooo here I go updating my bespoke checkpatch for a third day in a row, and will resubmit this patch soon.
Q: Does everyone else already have Really Good maintainer scripts and I am the grossly ignorant one? Or is scripts/checkpatch.pl really the only thing available? Because it whines about things that nobody in the xfs community really care about (minor style problems), and misses things that we really /do/ care about (correct commit tagging).
--D
thanks,
greg k-h
 
            From: Darrick J. Wong djwong@kernel.org
commit a40fe30868ba433ac08376e30132400bec067583 upstream
The first step towards holding the dquot buffer in the li_buf instead of reading it in the AIL is to separate the part that reads the buffer from the actual flush code. There should be no functional changes.
Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_dquot.h | 4 ++- fs/xfs/xfs_dquot.c | 57 +++++++++++++++++++++++++++++++---------------- fs/xfs/xfs_dquot_item.c | 20 +++++++++++++--- fs/xfs/xfs_qm.c | 37 +++++++++++++++++++++++++------ 4 files changed, 86 insertions(+), 32 deletions(-)
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h index 677bb2dc9ac913..fb9995d2f2331a 100644 --- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -204,7 +204,9 @@ void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp); #define XFS_DQ_IS_DIRTY(dqp) ((dqp)->q_flags & XFS_DQFLAG_DIRTY)
void xfs_qm_dqdestroy(struct xfs_dquot *dqp); -int xfs_qm_dqflush(struct xfs_dquot *dqp, struct xfs_buf **bpp); +int xfs_dquot_read_buf(struct xfs_trans *tp, struct xfs_dquot *dqp, + struct xfs_buf **bpp); +int xfs_qm_dqflush(struct xfs_dquot *dqp, struct xfs_buf *bp); void xfs_qm_dqunpin_wait(struct xfs_dquot *dqp); void xfs_qm_adjust_dqtimers(struct xfs_dquot *d); void xfs_qm_adjust_dqlimits(struct xfs_dquot *d); diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index c1b211c260a9d5..5c4ede3e0fc70d 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1232,6 +1232,42 @@ xfs_qm_dqflush_check( return NULL; }
+/* + * Get the buffer containing the on-disk dquot. + * + * Requires dquot flush lock, will clear the dirty flag, delete the quota log + * item from the AIL, and shut down the system if something goes wrong. + */ +int +xfs_dquot_read_buf( + struct xfs_trans *tp, + struct xfs_dquot *dqp, + struct xfs_buf **bpp) +{ + struct xfs_mount *mp = dqp->q_mount; + struct xfs_buf *bp = NULL; + int error; + + error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, dqp->q_blkno, + mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK, + &bp, &xfs_dquot_buf_ops); + if (error == -EAGAIN) + return error; + if (xfs_metadata_is_sick(error)) + xfs_dquot_mark_sick(dqp); + if (error) + goto out_abort; + + *bpp = bp; + return 0; + +out_abort: + dqp->q_flags &= ~XFS_DQFLAG_DIRTY; + xfs_trans_ail_delete(&dqp->q_logitem.qli_item, 0); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + return error; +} + /* * Write a modified dquot to disk. * The dquot must be locked and the flush lock too taken by caller. @@ -1243,11 +1279,10 @@ xfs_qm_dqflush_check( int xfs_qm_dqflush( struct xfs_dquot *dqp, - struct xfs_buf **bpp) + struct xfs_buf *bp) { struct xfs_mount *mp = dqp->q_mount; struct xfs_log_item *lip = &dqp->q_logitem.qli_item; - struct xfs_buf *bp; struct xfs_dqblk *dqblk; xfs_failaddr_t fa; int error; @@ -1257,28 +1292,12 @@ xfs_qm_dqflush(
trace_xfs_dqflush(dqp);
- *bpp = NULL; - xfs_qm_dqunpin_wait(dqp);
- /* - * Get the buffer containing the on-disk dquot - */ - error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dqp->q_blkno, - mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK, - &bp, &xfs_dquot_buf_ops); - if (error == -EAGAIN) - goto out_unlock; - if (xfs_metadata_is_sick(error)) - xfs_dquot_mark_sick(dqp); - if (error) - goto out_abort; - fa = xfs_qm_dqflush_check(dqp); if (fa) { xfs_alert(mp, "corrupt dquot ID 0x%x in memory at %pS", dqp->q_id, fa); - xfs_buf_relse(bp); xfs_dquot_mark_sick(dqp); error = -EFSCORRUPTED; goto out_abort; @@ -1328,14 +1347,12 @@ xfs_qm_dqflush( }
trace_xfs_dqflush_done(dqp); - *bpp = bp; return 0;
out_abort: dqp->q_flags &= ~XFS_DQFLAG_DIRTY; xfs_trans_ail_delete(lip, 0); xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); -out_unlock: xfs_dqfunlock(dqp); return error; } diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index 7d19091215b080..56ecc5ed01934d 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -155,14 +155,26 @@ xfs_qm_dquot_logitem_push(
spin_unlock(&lip->li_ailp->ail_lock);
- error = xfs_qm_dqflush(dqp, &bp); + error = xfs_dquot_read_buf(NULL, dqp, &bp); + if (error) { + if (error == -EAGAIN) + rval = XFS_ITEM_LOCKED; + xfs_dqfunlock(dqp); + goto out_relock_ail; + } + + /* + * dqflush completes dqflock on error, and the delwri ioend does it on + * success. + */ + error = xfs_qm_dqflush(dqp, bp); if (!error) { if (!xfs_buf_delwri_queue(bp, buffer_list)) rval = XFS_ITEM_FLUSHING; - xfs_buf_relse(bp); - } else if (error == -EAGAIN) - rval = XFS_ITEM_LOCKED; + } + xfs_buf_relse(bp);
+out_relock_ail: spin_lock(&lip->li_ailp->ail_lock); out_unlock: xfs_dqunlock(dqp); diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 7e2307921deb2f..4f50d8ce125f57 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -146,17 +146,28 @@ xfs_qm_dqpurge( * We don't care about getting disk errors here. We need * to purge this dquot anyway, so we go ahead regardless. */ - error = xfs_qm_dqflush(dqp, &bp); + error = xfs_dquot_read_buf(NULL, dqp, &bp); + if (error == -EAGAIN) { + xfs_dqfunlock(dqp); + dqp->q_flags &= ~XFS_DQFLAG_FREEING; + goto out_unlock; + } + if (error) + goto out_funlock; + + /* + * dqflush completes dqflock on error, and the bwrite ioend + * does it on success. + */ + error = xfs_qm_dqflush(dqp, bp); if (!error) { error = xfs_bwrite(bp); xfs_buf_relse(bp); - } else if (error == -EAGAIN) { - dqp->q_flags &= ~XFS_DQFLAG_FREEING; - goto out_unlock; } xfs_dqflock(dqp); }
+out_funlock: ASSERT(atomic_read(&dqp->q_pincount) == 0); ASSERT(xlog_is_shutdown(dqp->q_logitem.qli_item.li_log) || !test_bit(XFS_LI_IN_AIL, &dqp->q_logitem.qli_item.li_flags)); @@ -462,7 +473,17 @@ xfs_qm_dquot_isolate( /* we have to drop the LRU lock to flush the dquot */ spin_unlock(lru_lock);
- error = xfs_qm_dqflush(dqp, &bp); + error = xfs_dquot_read_buf(NULL, dqp, &bp); + if (error) { + xfs_dqfunlock(dqp); + goto out_unlock_dirty; + } + + /* + * dqflush completes dqflock on error, and the delwri ioend + * does it on success. + */ + error = xfs_qm_dqflush(dqp, bp); if (error) goto out_unlock_dirty;
@@ -1287,11 +1308,13 @@ xfs_qm_flush_one( goto out_unlock; }
- error = xfs_qm_dqflush(dqp, &bp); + error = xfs_dquot_read_buf(NULL, dqp, &bp); if (error) goto out_unlock;
- xfs_buf_delwri_queue(bp, buffer_list); + error = xfs_qm_dqflush(dqp, bp); + if (!error) + xfs_buf_delwri_queue(bp, buffer_list); xfs_buf_relse(bp); out_unlock: xfs_dqunlock(dqp);
 
            From: Darrick J. Wong djwong@kernel.org
commit ec88b41b932d5731291dcc0d0d63ea13ab8e07d5 upstream
Clean up these functions a little bit before we move on to the real modifications, and make the variable naming consistent for dquot log items.
Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_dquot.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index 5c4ede3e0fc70d..4f8fd1fa94dae2 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1136,8 +1136,9 @@ static void xfs_qm_dqflush_done( struct xfs_log_item *lip) { - struct xfs_dq_logitem *qip = (struct xfs_dq_logitem *)lip; - struct xfs_dquot *dqp = qip->qli_dquot; + struct xfs_dq_logitem *qlip = + container_of(lip, struct xfs_dq_logitem, qli_item); + struct xfs_dquot *dqp = qlip->qli_dquot; struct xfs_ail *ailp = lip->li_ailp; xfs_lsn_t tail_lsn;
@@ -1150,12 +1151,12 @@ xfs_qm_dqflush_done( * holding the lock before removing the dquot from the AIL. */ if (test_bit(XFS_LI_IN_AIL, &lip->li_flags) && - ((lip->li_lsn == qip->qli_flush_lsn) || + ((lip->li_lsn == qlip->qli_flush_lsn) || test_bit(XFS_LI_FAILED, &lip->li_flags))) {
spin_lock(&ailp->ail_lock); xfs_clear_li_failed(lip); - if (lip->li_lsn == qip->qli_flush_lsn) { + if (lip->li_lsn == qlip->qli_flush_lsn) { /* xfs_ail_update_finish() drops the AIL lock */ tail_lsn = xfs_ail_delete_one(ailp, lip); xfs_ail_update_finish(ailp, tail_lsn); @@ -1313,7 +1314,7 @@ xfs_qm_dqflush( dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
xfs_trans_ail_copy_lsn(mp->m_ail, &dqp->q_logitem.qli_flush_lsn, - &dqp->q_logitem.qli_item.li_lsn); + &lip->li_lsn);
/* * copy the lsn into the on-disk dquot now while we have the in memory @@ -1325,7 +1326,7 @@ xfs_qm_dqflush( * of a dquot without an up-to-date CRC getting to disk. */ if (xfs_has_crc(mp)) { - dqblk->dd_lsn = cpu_to_be64(dqp->q_logitem.qli_item.li_lsn); + dqblk->dd_lsn = cpu_to_be64(lip->li_lsn); xfs_update_cksum((char *)dqblk, sizeof(struct xfs_dqblk), XFS_DQUOT_CRC_OFF); } @@ -1335,7 +1336,7 @@ xfs_qm_dqflush( * the AIL and release the flush lock once the dquot is synced to disk. */ bp->b_flags |= _XBF_DQUOTS; - list_add_tail(&dqp->q_logitem.qli_item.li_bio_list, &bp->b_li_list); + list_add_tail(&lip->li_bio_list, &bp->b_li_list);
/* * If the buffer is pinned then push on the log so we won't
 
            From: Darrick J. Wong djwong@kernel.org
commit acc8f8628c3737108f36e5637f4d5daeaf96d90e upstream
Ever since 6.12-rc1, I've observed a pile of warnings from the kernel when running fstests with quotas enabled:
WARNING: CPU: 1 PID: 458580 at mm/page_alloc.c:4221 __alloc_pages_noprof+0xc9c/0xf18 CPU: 1 UID: 0 PID: 458580 Comm: xfsaild/sda3 Tainted: G W 6.12.0-rc6-djwa #rc6 6ee3e0e531f6457e2d26aa008a3b65ff184b377c <snip> Call trace: __alloc_pages_noprof+0xc9c/0xf18 alloc_pages_mpol_noprof+0x94/0x240 alloc_pages_noprof+0x68/0xf8 new_slab+0x3e0/0x568 ___slab_alloc+0x5a0/0xb88 __slab_alloc.constprop.0+0x7c/0xf8 __kmalloc_noprof+0x404/0x4d0 xfs_buf_get_map+0x594/0xde0 [xfs 384cb02810558b4c490343c164e9407332118f88] xfs_buf_read_map+0x64/0x2e0 [xfs 384cb02810558b4c490343c164e9407332118f88] xfs_trans_read_buf_map+0x1dc/0x518 [xfs 384cb02810558b4c490343c164e9407332118f88] xfs_qm_dqflush+0xac/0x468 [xfs 384cb02810558b4c490343c164e9407332118f88] xfs_qm_dquot_logitem_push+0xe4/0x148 [xfs 384cb02810558b4c490343c164e9407332118f88] xfsaild+0x3f4/0xde8 [xfs 384cb02810558b4c490343c164e9407332118f88] kthread+0x110/0x128 ret_from_fork+0x10/0x20 ---[ end trace 0000000000000000 ]---
This corresponds to the line:
WARN_ON_ONCE(current->flags & PF_MEMALLOC);
within the NOFAIL checks. What's happening here is that the XFS AIL is trying to write a disk quota update back into the filesystem, but for that it needs to read the ondisk buffer for the dquot. The buffer is not in memory anymore, probably because it was evicted. Regardless, the buffer cache tries to allocate a new buffer, but those allocations are NOFAIL. The AIL thread has marked itself PF_MEMALLOC (aka noreclaim) since commit 43ff2122e6492b ("xfs: on-stack delayed write buffer lists") presumably because reclaim can push on XFS to push on the AIL.
An easy way to fix this probably would have been to drop the NOFAIL flag from the xfs_buf allocation and open code a retry loop, but then there's still the problem that for bs>ps filesystems, the buffer itself could require up to 64k worth of pages.
Inode items had similar behavior (multi-page cluster buffers that we don't want to allocate in the AIL) which we solved by making transaction precommit attach the inode cluster buffers to the dirty log item. Let's solve the dquot problem in the same way.
So: Make a real precommit handler to read the dquot buffer and attach it to the log item; pass it to dqflush in the push method; and have the iodone function detach the buffer once we've flushed everything. Add a state flag to the log item to track when a thread has entered the precommit -> push mechanism to skip the detaching if it turns out that the dquot is very busy, as we don't hold the dquot lock between log item commit and AIL push).
Reading and attaching the dquot buffer in the precommit hook is inspired by the work done for inode cluster buffers some time ago.
Cc: stable@vger.kernel.org # v6.12 Fixes: 903edea6c53f09 ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_dquot.h | 6 ++ fs/xfs/xfs_dquot_item.h | 7 +++ fs/xfs/xfs_dquot.c | 129 +++++++++++++++++++++++++++++++++++++++++++++-- fs/xfs/xfs_dquot_item.c | 39 +++++++++----- fs/xfs/xfs_qm.c | 9 ++- fs/xfs/xfs_trans_ail.c | 2 - 6 files changed, 168 insertions(+), 24 deletions(-)
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h index fb9995d2f2331a..aad483fc08b8c3 100644 --- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -205,7 +205,7 @@ void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp);
void xfs_qm_dqdestroy(struct xfs_dquot *dqp); int xfs_dquot_read_buf(struct xfs_trans *tp, struct xfs_dquot *dqp, - struct xfs_buf **bpp); + xfs_buf_flags_t flags, struct xfs_buf **bpp); int xfs_qm_dqflush(struct xfs_dquot *dqp, struct xfs_buf *bp); void xfs_qm_dqunpin_wait(struct xfs_dquot *dqp); void xfs_qm_adjust_dqtimers(struct xfs_dquot *d); @@ -229,6 +229,10 @@ void xfs_dqlockn(struct xfs_dqtrx *q);
void xfs_dquot_set_prealloc_limits(struct xfs_dquot *);
+int xfs_dquot_attach_buf(struct xfs_trans *tp, struct xfs_dquot *dqp); +int xfs_dquot_use_attached_buf(struct xfs_dquot *dqp, struct xfs_buf **bpp); +void xfs_dquot_detach_buf(struct xfs_dquot *dqp); + static inline struct xfs_dquot *xfs_qm_dqhold(struct xfs_dquot *dqp) { xfs_dqlock(dqp); diff --git a/fs/xfs/xfs_dquot_item.h b/fs/xfs/xfs_dquot_item.h index 794710c2447493..d66e52807d76d5 100644 --- a/fs/xfs/xfs_dquot_item.h +++ b/fs/xfs/xfs_dquot_item.h @@ -14,6 +14,13 @@ struct xfs_dq_logitem { struct xfs_log_item qli_item; /* common portion */ struct xfs_dquot *qli_dquot; /* dquot ptr */ xfs_lsn_t qli_flush_lsn; /* lsn at last flush */ + + /* + * We use this spinlock to coordinate access to the li_buf pointer in + * the log item and the qli_dirty flag. + */ + spinlock_t qli_lock; + bool qli_dirty; /* dirtied since last flush? */ };
void xfs_qm_dquot_logitem_init(struct xfs_dquot *dqp); diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index 4f8fd1fa94dae2..bba1387dfa42dc 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -68,6 +68,30 @@ xfs_dquot_mark_sick( } }
+/* + * Detach the dquot buffer if it's still attached, because we can get called + * through dqpurge after a log shutdown. Caller must hold the dqflock or have + * otherwise isolated the dquot. + */ +void +xfs_dquot_detach_buf( + struct xfs_dquot *dqp) +{ + struct xfs_dq_logitem *qlip = &dqp->q_logitem; + struct xfs_buf *bp = NULL; + + spin_lock(&qlip->qli_lock); + if (qlip->qli_item.li_buf) { + bp = qlip->qli_item.li_buf; + qlip->qli_item.li_buf = NULL; + } + spin_unlock(&qlip->qli_lock); + if (bp) { + list_del_init(&qlip->qli_item.li_bio_list); + xfs_buf_rele(bp); + } +} + /* * This is called to free all the memory associated with a dquot */ @@ -76,6 +100,7 @@ xfs_qm_dqdestroy( struct xfs_dquot *dqp) { ASSERT(list_empty(&dqp->q_lru)); + ASSERT(dqp->q_logitem.qli_item.li_buf == NULL);
kvfree(dqp->q_logitem.qli_item.li_lv_shadow); mutex_destroy(&dqp->q_qlock); @@ -1140,6 +1165,7 @@ xfs_qm_dqflush_done( container_of(lip, struct xfs_dq_logitem, qli_item); struct xfs_dquot *dqp = qlip->qli_dquot; struct xfs_ail *ailp = lip->li_ailp; + struct xfs_buf *bp = NULL; xfs_lsn_t tail_lsn;
/* @@ -1169,6 +1195,19 @@ xfs_qm_dqflush_done( * Release the dq's flush lock since we're done with it. */ xfs_dqfunlock(dqp); + + /* + * If this dquot hasn't been dirtied since initiating the last dqflush, + * release the buffer reference. + */ + spin_lock(&qlip->qli_lock); + if (!qlip->qli_dirty) { + bp = lip->li_buf; + lip->li_buf = NULL; + } + spin_unlock(&qlip->qli_lock); + if (bp) + xfs_buf_rele(bp); }
void @@ -1191,7 +1230,7 @@ xfs_buf_dquot_io_fail(
spin_lock(&bp->b_mount->m_ail->ail_lock); list_for_each_entry(lip, &bp->b_li_list, li_bio_list) - xfs_set_li_failed(lip, bp); + set_bit(XFS_LI_FAILED, &lip->li_flags); spin_unlock(&bp->b_mount->m_ail->ail_lock); }
@@ -1243,6 +1282,7 @@ int xfs_dquot_read_buf( struct xfs_trans *tp, struct xfs_dquot *dqp, + xfs_buf_flags_t xbf_flags, struct xfs_buf **bpp) { struct xfs_mount *mp = dqp->q_mount; @@ -1250,7 +1290,7 @@ xfs_dquot_read_buf( int error;
error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, dqp->q_blkno, - mp->m_quotainfo->qi_dqchunklen, XBF_TRYLOCK, + mp->m_quotainfo->qi_dqchunklen, xbf_flags, &bp, &xfs_dquot_buf_ops); if (error == -EAGAIN) return error; @@ -1269,6 +1309,77 @@ xfs_dquot_read_buf( return error; }
+/* + * Attach a dquot buffer to this dquot to avoid allocating a buffer during a + * dqflush, since dqflush can be called from reclaim context. + */ +int +xfs_dquot_attach_buf( + struct xfs_trans *tp, + struct xfs_dquot *dqp) +{ + struct xfs_dq_logitem *qlip = &dqp->q_logitem; + struct xfs_log_item *lip = &qlip->qli_item; + int error; + + spin_lock(&qlip->qli_lock); + if (!lip->li_buf) { + struct xfs_buf *bp = NULL; + + spin_unlock(&qlip->qli_lock); + error = xfs_dquot_read_buf(tp, dqp, 0, &bp); + if (error) + return error; + + /* + * Attach the dquot to the buffer so that the AIL does not have + * to read the dquot buffer to push this item. + */ + xfs_buf_hold(bp); + spin_lock(&qlip->qli_lock); + lip->li_buf = bp; + xfs_trans_brelse(tp, bp); + } + qlip->qli_dirty = true; + spin_unlock(&qlip->qli_lock); + + return 0; +} + +/* + * Get a new reference the dquot buffer attached to this dquot for a dqflush + * operation. + * + * Returns 0 and a NULL bp if none was attached to the dquot; 0 and a locked + * bp; or -EAGAIN if the buffer could not be locked. + */ +int +xfs_dquot_use_attached_buf( + struct xfs_dquot *dqp, + struct xfs_buf **bpp) +{ + struct xfs_buf *bp = dqp->q_logitem.qli_item.li_buf; + + /* + * A NULL buffer can happen if the dquot dirty flag was set but the + * filesystem shut down before transaction commit happened. In that + * case we're not going to flush anyway. + */ + if (!bp) { + ASSERT(xfs_is_shutdown(dqp->q_mount)); + + *bpp = NULL; + return 0; + } + + if (!xfs_buf_trylock(bp)) + return -EAGAIN; + + xfs_buf_hold(bp); + *bpp = bp; + return 0; +} + /* * Write a modified dquot to disk. * The dquot must be locked and the flush lock too taken by caller. @@ -1283,7 +1394,8 @@ xfs_qm_dqflush( struct xfs_buf *bp) { struct xfs_mount *mp = dqp->q_mount; - struct xfs_log_item *lip = &dqp->q_logitem.qli_item; + struct xfs_dq_logitem *qlip = &dqp->q_logitem; + struct xfs_log_item *lip = &qlip->qli_item; struct xfs_dqblk *dqblk; xfs_failaddr_t fa; int error; @@ -1313,8 +1425,15 @@ xfs_qm_dqflush( */ dqp->q_flags &= ~XFS_DQFLAG_DIRTY;
- xfs_trans_ail_copy_lsn(mp->m_ail, &dqp->q_logitem.qli_flush_lsn, - &lip->li_lsn); + /* + * We hold the dquot lock, so nobody can dirty it while we're + * scheduling the write out. Clear the dirty-since-flush flag. + */ + spin_lock(&qlip->qli_lock); + qlip->qli_dirty = false; + spin_unlock(&qlip->qli_lock); + + xfs_trans_ail_copy_lsn(mp->m_ail, &qlip->qli_flush_lsn, &lip->li_lsn);
/* * copy the lsn into the on-disk dquot now while we have the in memory diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index 56ecc5ed01934d..271b195ebb9326 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -123,8 +123,9 @@ xfs_qm_dquot_logitem_push( __releases(&lip->li_ailp->ail_lock) __acquires(&lip->li_ailp->ail_lock) { - struct xfs_dquot *dqp = DQUOT_ITEM(lip)->qli_dquot; - struct xfs_buf *bp = lip->li_buf; + struct xfs_dq_logitem *qlip = DQUOT_ITEM(lip); + struct xfs_dquot *dqp = qlip->qli_dquot; + struct xfs_buf *bp; uint rval = XFS_ITEM_SUCCESS; int error;
@@ -155,11 +156,10 @@ xfs_qm_dquot_logitem_push(
spin_unlock(&lip->li_ailp->ail_lock);
- error = xfs_dquot_read_buf(NULL, dqp, &bp); - if (error) { - if (error == -EAGAIN) - rval = XFS_ITEM_LOCKED; + error = xfs_dquot_use_attached_buf(dqp, &bp); + if (error == -EAGAIN) { xfs_dqfunlock(dqp); + rval = XFS_ITEM_LOCKED; goto out_relock_ail; }
@@ -207,12 +207,10 @@ xfs_qm_dquot_logitem_committing( }
#ifdef DEBUG_EXPENSIVE -static int -xfs_qm_dquot_logitem_precommit( - struct xfs_trans *tp, - struct xfs_log_item *lip) +static void +xfs_qm_dquot_logitem_precommit_check( + struct xfs_dquot *dqp) { - struct xfs_dquot *dqp = DQUOT_ITEM(lip)->qli_dquot; struct xfs_mount *mp = dqp->q_mount; struct xfs_disk_dquot ddq = { }; xfs_failaddr_t fa; @@ -228,13 +226,24 @@ xfs_qm_dquot_logitem_precommit( xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); ASSERT(fa == NULL); } - - return 0; } #else -# define xfs_qm_dquot_logitem_precommit NULL +# define xfs_qm_dquot_logitem_precommit_check(...) ((void)0) #endif
+static int +xfs_qm_dquot_logitem_precommit( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + struct xfs_dq_logitem *qlip = DQUOT_ITEM(lip); + struct xfs_dquot *dqp = qlip->qli_dquot; + + xfs_qm_dquot_logitem_precommit_check(dqp); + + return xfs_dquot_attach_buf(tp, dqp); +} + static const struct xfs_item_ops xfs_dquot_item_ops = { .iop_size = xfs_qm_dquot_logitem_size, .iop_precommit = xfs_qm_dquot_logitem_precommit, @@ -259,5 +268,7 @@ xfs_qm_dquot_logitem_init(
xfs_log_item_init(dqp->q_mount, &lp->qli_item, XFS_LI_DQUOT, &xfs_dquot_item_ops); + spin_lock_init(&lp->qli_lock); lp->qli_dquot = dqp; + lp->qli_dirty = false; } diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 4f50d8ce125f57..10fa44165ea16d 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -146,7 +146,7 @@ xfs_qm_dqpurge( * We don't care about getting disk errors here. We need * to purge this dquot anyway, so we go ahead regardless. */ - error = xfs_dquot_read_buf(NULL, dqp, &bp); + error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp); if (error == -EAGAIN) { xfs_dqfunlock(dqp); dqp->q_flags &= ~XFS_DQFLAG_FREEING; @@ -166,6 +166,7 @@ xfs_qm_dqpurge( } xfs_dqflock(dqp); } + xfs_dquot_detach_buf(dqp);
out_funlock: ASSERT(atomic_read(&dqp->q_pincount) == 0); @@ -473,7 +474,7 @@ xfs_qm_dquot_isolate( /* we have to drop the LRU lock to flush the dquot */ spin_unlock(lru_lock);
- error = xfs_dquot_read_buf(NULL, dqp, &bp); + error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp); if (error) { xfs_dqfunlock(dqp); goto out_unlock_dirty; @@ -491,6 +492,8 @@ xfs_qm_dquot_isolate( xfs_buf_relse(bp); goto out_unlock_dirty; } + + xfs_dquot_detach_buf(dqp); xfs_dqfunlock(dqp);
/* @@ -1308,7 +1311,7 @@ xfs_qm_flush_one( goto out_unlock; }
- error = xfs_dquot_read_buf(NULL, dqp, &bp); + error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp); if (error) goto out_unlock;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c index 8ede9d099d1fea..f56d62dced97b1 100644 --- a/fs/xfs/xfs_trans_ail.c +++ b/fs/xfs/xfs_trans_ail.c @@ -360,7 +360,7 @@ xfsaild_resubmit_item(
/* protected by ail_lock */ list_for_each_entry(lip, &bp->b_li_list, li_bio_list) { - if (bp->b_flags & _XBF_INODES) + if (bp->b_flags & (_XBF_INODES | _XBF_DQUOTS)) clear_bit(XFS_LI_FAILED, &lip->li_flags); else xfs_clear_li_failed(lip);
 
            From: Darrick J. Wong djwong@kernel.org
commit ca378189fdfa890a4f0622f85ee41b710bbac271 upstream
Now that we've converted the dquot logging machinery to attach the dquot buffer to the li_buf pointer so that the AIL dqflush doesn't have to allocate or read buffers in a reclaim path, do the same for the quotacheck code so that the reclaim shrinker dqflush call doesn't have to do that either.
Cc: stable@vger.kernel.org # v6.12 Fixes: 903edea6c53f09 ("mm: warn about illegal __GFP_NOFAIL usage in a more appropriate location and manner") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_dquot.h | 2 -- fs/xfs/xfs_dquot.c | 9 +++------ fs/xfs/xfs_qm.c | 18 +++++++++++++----- 3 files changed, 16 insertions(+), 13 deletions(-)
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h index aad483fc08b8c3..bd7bfd9e402e5b 100644 --- a/fs/xfs/xfs_dquot.h +++ b/fs/xfs/xfs_dquot.h @@ -204,8 +204,6 @@ void xfs_dquot_to_disk(struct xfs_disk_dquot *ddqp, struct xfs_dquot *dqp); #define XFS_DQ_IS_DIRTY(dqp) ((dqp)->q_flags & XFS_DQFLAG_DIRTY)
void xfs_qm_dqdestroy(struct xfs_dquot *dqp); -int xfs_dquot_read_buf(struct xfs_trans *tp, struct xfs_dquot *dqp, - xfs_buf_flags_t flags, struct xfs_buf **bpp); int xfs_qm_dqflush(struct xfs_dquot *dqp, struct xfs_buf *bp); void xfs_qm_dqunpin_wait(struct xfs_dquot *dqp); void xfs_qm_adjust_dqtimers(struct xfs_dquot *d); diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index bba1387dfa42dc..d2b06ca2ec7a9c 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1278,11 +1278,10 @@ xfs_qm_dqflush_check( * Requires dquot flush lock, will clear the dirty flag, delete the quota log * item from the AIL, and shut down the system if something goes wrong. */ -int +static int xfs_dquot_read_buf( struct xfs_trans *tp, struct xfs_dquot *dqp, - xfs_buf_flags_t xbf_flags, struct xfs_buf **bpp) { struct xfs_mount *mp = dqp->q_mount; @@ -1290,10 +1289,8 @@ xfs_dquot_read_buf( int error;
error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, dqp->q_blkno, - mp->m_quotainfo->qi_dqchunklen, xbf_flags, + mp->m_quotainfo->qi_dqchunklen, 0, &bp, &xfs_dquot_buf_ops); - if (error == -EAGAIN) - return error; if (xfs_metadata_is_sick(error)) xfs_dquot_mark_sick(dqp); if (error) @@ -1327,7 +1324,7 @@ xfs_dquot_attach_buf( struct xfs_buf *bp = NULL;
spin_unlock(&qlip->qli_lock); - error = xfs_dquot_read_buf(tp, dqp, 0, &bp); + error = xfs_dquot_read_buf(tp, dqp, &bp); if (error) return error;
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 10fa44165ea16d..3212b5bf3fb3c6 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -146,13 +146,13 @@ xfs_qm_dqpurge( * We don't care about getting disk errors here. We need * to purge this dquot anyway, so we go ahead regardless. */ - error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp); + error = xfs_dquot_use_attached_buf(dqp, &bp); if (error == -EAGAIN) { xfs_dqfunlock(dqp); dqp->q_flags &= ~XFS_DQFLAG_FREEING; goto out_unlock; } - if (error) + if (!bp) goto out_funlock;
/* @@ -474,8 +474,8 @@ xfs_qm_dquot_isolate( /* we have to drop the LRU lock to flush the dquot */ spin_unlock(lru_lock);
- error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp); - if (error) { + error = xfs_dquot_use_attached_buf(dqp, &bp); + if (!bp || error == -EAGAIN) { xfs_dqfunlock(dqp); goto out_unlock_dirty; } @@ -1132,6 +1132,10 @@ xfs_qm_quotacheck_dqadjust( return error; }
+ error = xfs_dquot_attach_buf(NULL, dqp); + if (error) + return error; + trace_xfs_dqadjust(dqp);
/* @@ -1311,9 +1315,13 @@ xfs_qm_flush_one( goto out_unlock; }
- error = xfs_dquot_read_buf(NULL, dqp, XBF_TRYLOCK, &bp); + error = xfs_dquot_use_attached_buf(dqp, &bp); if (error) goto out_unlock; + if (!bp) { + error = -EFSCORRUPTED; + goto out_unlock; + }
error = xfs_qm_dqflush(dqp, bp); if (!error)
 
            From: Darrick J. Wong djwong@kernel.org
commit 4b8d867ca6e2fc6d152f629fdaf027053b81765a upstream
Emmanual Florac reports a strange occurrence when project quota limits are enabled, free space is lower than the remaining quota, and someone runs statvfs:
# mkfs.xfs -f /dev/sda # mount /dev/sda /mnt -o prjquota # xfs_quota -x -c 'limit -p bhard=2G 55' /mnt # mkdir /mnt/dir # xfs_io -c 'chproj 55' -c 'chattr +P' -c 'stat -vvvv' /mnt/dir # fallocate -l 19g /mnt/a # df /mnt /mnt/dir Filesystem Size Used Avail Use% Mounted on /dev/sda 20G 20G 345M 99% /mnt /dev/sda 2.0G 0 2.0G 0% /mnt
I think the bug here is that xfs_fill_statvfs_from_dquot unconditionally assigns to f_bfree without checking that the filesystem has enough free space to fill the remaining project quota. However, this is a longstanding behavior of xfs so it's unclear what to do here.
Cc: stable@vger.kernel.org # v2.6.18 Fixes: 932f2c323196c2 ("[XFS] statvfs component of directory/project quota support, code originally by Glen.") Reported-by: Emmanuel Florac eflorac@intellique.com Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_qm_bhv.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
diff --git a/fs/xfs/xfs_qm_bhv.c b/fs/xfs/xfs_qm_bhv.c index a11436579877d5..410a2a9c18ec52 100644 --- a/fs/xfs/xfs_qm_bhv.c +++ b/fs/xfs/xfs_qm_bhv.c @@ -26,21 +26,28 @@ xfs_fill_statvfs_from_dquot( limit = dqp->q_blk.softlimit ? dqp->q_blk.softlimit : dqp->q_blk.hardlimit; - if (limit && statp->f_blocks > limit) { - statp->f_blocks = limit; - statp->f_bfree = statp->f_bavail = - (statp->f_blocks > dqp->q_blk.reserved) ? - (statp->f_blocks - dqp->q_blk.reserved) : 0; + if (limit) { + uint64_t remaining = 0; + + if (limit > dqp->q_blk.reserved) + remaining = limit - dqp->q_blk.reserved; + + statp->f_blocks = min(statp->f_blocks, limit); + statp->f_bfree = min(statp->f_bfree, remaining); + statp->f_bavail = min(statp->f_bavail, remaining); }
limit = dqp->q_ino.softlimit ? dqp->q_ino.softlimit : dqp->q_ino.hardlimit; - if (limit && statp->f_files > limit) { - statp->f_files = limit; - statp->f_ffree = - (statp->f_files > dqp->q_ino.reserved) ? - (statp->f_files - dqp->q_ino.reserved) : 0; + if (limit) { + uint64_t remaining = 0; + + if (limit > dqp->q_ino.reserved) + remaining = limit - dqp->q_ino.reserved; + + statp->f_files = min(statp->f_files, limit); + statp->f_ffree = min(statp->f_ffree, remaining); } }
 
            From: Darrick J. Wong djwong@kernel.org
commit 1aacd3fac248902ea1f7607f2d12b93929a4833b upstream
Lai Yi reported a lockdep complaint about circular locking:
Chain exists of: &lp->qli_lock --> &bch->bc_lock --> &l->lock
Possible unsafe locking scenario:
CPU0 CPU1 ---- ---- lock(&l->lock); lock(&bch->bc_lock); lock(&l->lock); lock(&lp->qli_lock);
I /think/ the problem here is that xfs_dquot_attach_buf during quotacheck will release the buffer while it's holding the qli_lock. Because this is a cached buffer, xfs_buf_rele_cached takes b_lock before decrementing b_hold. Other threads have taught lockdep that a locking dependency chain is bp->b_lock -> bch->bc_lock -> l(ru)->lock; and that another chain is l(ru)->lock -> lp->qli_lock. Hence we do not want to take b_lock while holding qli_lock.
Reported-by: syzbot+3126ab3db03db42e7a31@syzkaller.appspotmail.com Cc: stable@vger.kernel.org # v6.13-rc3 Fixes: ca378189fdfa89 ("xfs: convert quotacheck to attach dquot buffers") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de --- fs/xfs/xfs_dquot.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index d2b06ca2ec7a9c..d64d454cbe8bca 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -1308,7 +1308,8 @@ xfs_dquot_read_buf(
/* * Attach a dquot buffer to this dquot to avoid allocating a buffer during a - * dqflush, since dqflush can be called from reclaim context. + * dqflush, since dqflush can be called from reclaim context. Caller must hold + * the dqlock. */ int xfs_dquot_attach_buf( @@ -1329,13 +1330,16 @@ xfs_dquot_attach_buf( return error;
/* - * Attach the dquot to the buffer so that the AIL does not have - * to read the dquot buffer to push this item. + * Hold the dquot buffer so that we retain our ref to it after + * detaching it from the transaction, then give that ref to the + * dquot log item so that the AIL does not have to read the + * dquot buffer to push this item. */ xfs_buf_hold(bp); + xfs_trans_brelse(tp, bp); + spin_lock(&qlip->qli_lock); lip->li_buf = bp; - xfs_trans_brelse(tp, bp); } qlip->qli_dirty = true; spin_unlock(&qlip->qli_lock);
 
            From: Darrick J. Wong djwong@kernel.org
commit 111d36d6278756128b7d7fab787fdcbf8221cd98 upstream
We have to lock the buffer before we can delete the dquot log item from the buffer's log item list.
Cc: stable@vger.kernel.org # v6.13-rc3 Fixes: acc8f8628c3737 ("xfs: attach dquot buffer to dquot log item buffer") Signed-off-by: "Darrick J. Wong" djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Carlos Maiolino cem@kernel.org --- fs/xfs/xfs_dquot.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c index d64d454cbe8bca..0d73b59f1c9e57 100644 --- a/fs/xfs/xfs_dquot.c +++ b/fs/xfs/xfs_dquot.c @@ -87,8 +87,9 @@ xfs_dquot_detach_buf( } spin_unlock(&qlip->qli_lock); if (bp) { + xfs_buf_lock(bp); list_del_init(&qlip->qli_item.li_bio_list); - xfs_buf_rele(bp); + xfs_buf_relse(bp); } }
 
            From: Long Li leo.lilong@huawei.com
commit efebe42d95fbba91dca6e3e32cb9e0612eb56de5 upstream
When mounting an image containing a log with sb modifications that require log replay, the mount process hang all the time and stack as follows:
[root@localhost ~]# cat /proc/557/stack [<0>] xfs_buftarg_wait+0x31/0x70 [<0>] xfs_buftarg_drain+0x54/0x350 [<0>] xfs_mountfs+0x66e/0xe80 [<0>] xfs_fs_fill_super+0x7f1/0xec0 [<0>] get_tree_bdev_flags+0x186/0x280 [<0>] get_tree_bdev+0x18/0x30 [<0>] xfs_fs_get_tree+0x1d/0x30 [<0>] vfs_get_tree+0x2d/0x110 [<0>] path_mount+0xb59/0xfc0 [<0>] do_mount+0x92/0xc0 [<0>] __x64_sys_mount+0xc2/0x160 [<0>] x64_sys_call+0x2de4/0x45c0 [<0>] do_syscall_64+0xa7/0x240 [<0>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
During log recovery, while updating the in-memory superblock from the primary SB buffer, if an error is encountered, such as superblock corruption occurs or some other reasons, we will proceed to out_release and release the xfs_buf. However, this is insufficient because the xfs_buf's log item has already been initialized and the xfs_buf is held by the buffer log item as follows, the xfs_buf will not be released, causing the mount thread to hang.
xlog_recover_do_primary_sb_buffer xlog_recover_do_reg_buffer xlog_recover_validate_buf_type xfs_buf_item_init(bp, mp)
The solution is straightforward, we simply need to allow it to be handled by the normal buffer write process. The filesystem will be shutdown before the submission of buffer_list in xlog_do_recovery_pass(), ensuring the correct release of the xfs_buf as follows:
xlog_do_recovery_pass error = xlog_recover_process xlog_recover_process_data xlog_recover_process_ophdr xlog_recovery_process_trans ... xlog_recover_buf_commit_pass2 error = xlog_recover_do_primary_sb_buffer //Encounter error and return if (error) goto out_writebuf ... out_writebuf: xfs_buf_delwri_queue(bp, buffer_list) //add bp to list return error ... if (!list_empty(&buffer_list)) if (error) xlog_force_shutdown(log, SHUTDOWN_LOG_IO_ERROR); //shutdown first xfs_buf_delwri_submit(&buffer_list); //submit buffers in list __xfs_buf_submit if (bp->b_mount->m_log && xlog_is_shutdown(bp->b_mount->m_log)) xfs_buf_ioend_fail(bp) //release bp correctly
Fixes: 6a18765b54e2 ("xfs: update the file system geometry after recoverying superblock buffers") Cc: stable@vger.kernel.org # v6.12 Signed-off-by: Long Li leo.lilong@huawei.com Reviewed-by: Darrick J. Wong djwong@kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Carlos Maiolino cem@kernel.org --- fs/xfs/xfs_buf_item_recover.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 5180cbf5a90b4b..0185c92df8c2ea 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -1036,11 +1036,20 @@ xlog_recover_buf_commit_pass2( error = xlog_recover_do_primary_sb_buffer(mp, item, bp, buf_f, current_lsn); if (error) - goto out_release; + goto out_writebuf; } else { xlog_recover_do_reg_buffer(mp, item, bp, buf_f, current_lsn); }
+ /* + * Buffer held by buf log item during 'normal' buffer recovery must + * be committed through buffer I/O submission path to ensure proper + * release. When error occurs during sb buffer recovery, log shutdown + * will be done before submitting buffer list so that buffers can be + * released correctly through ioend failure path. + */ +out_writebuf: + /* * Perform delayed write on the buffer. Asynchronous writes will be * slower when taking into account all the buffers to be flushed.
linux-stable-mirror@lists.linaro.org


