From: Scott Mayhew smayhew@redhat.com
[ Upstream commit 2e47c3cc64b44b0b06cd68c2801db92ff143f2b2 ]
We have observed an NFSv4 client receiving a LOCK reply with a status of NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an earlier seqid value in the stateid. As this was for a new lockowner, that would imply that nfs_set_open_stateid_locked() had updated the open stateid seqid with an earlier value.
Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out of sequence, the task will sleep on the state->waitq for up to 5 seconds. If the task waits for the full 5 seconds, then after finishing the wait it'll update the open stateid seqid with whatever value the incoming seqid has. If there are multiple waiters in this scenario, then the last one to perform said update may not be the one with the highest seqid.
Add a check to ensure that the seqid can only be incremented, and add a tracepoint to indicate when old seqids are skipped.
Signed-off-by: Scott Mayhew smayhew@redhat.com Reviewed-by: Benjamin Coddington bcodding@hammerspace.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org ---
LLM Generated explanations, may be completely bogus:
## Analysis Summary
### 3. Bug Classification and Severity
**This is clearly a BUG FIX**, not a feature addition: - **Observed production issue**: NFS4ERR_OLD_STATEID errors - **Race condition bug**: Multiple waiters can corrupt stateid ordering - **Protocol compliance**: Client violates NFSv4 protocol by using stale seqids - **Potential data integrity risk**: Lock failures can lead to data corruption in concurrent access scenarios
**The root cause explained:** When `nfs_set_open_stateid_locked()` detects an out-of-sequence stateid, it waits up to 5 seconds on `state->waitq`. If multiple waiters all timeout (status = -EAGAIN), they all proceed to update the stateid. The last one wins, but it may have an **older** seqid than an earlier updater. This causes seqid to go backwards.
**The fix:** Before allowing the update after a timeout, check if the incoming stateid is actually newer than the current one using `nfs4_stateid_is_newer()`. If not, skip the update entirely and return early.
### 4. Scope and Risk Assessment
| Aspect | Assessment | |--------|------------| | Lines changed | ~15 lines in core logic | | Files touched | 2 (nfs4proc.c, nfs4trace.h) | | Dependencies | Uses existing helpers from 2019 | | New APIs | None | | Risk | LOW - adds a missing safety check |
The tracepoint addition (`nfs4_open_stateid_update_skip`) uses an existing event class - it's purely diagnostic.
### 5. Stable Criteria Checklist
- ✅ **Obviously correct**: Simple check using well-established helper functions - ✅ **Fixes real bug**: Observed NFS4ERR_OLD_STATEID in production - ✅ **Important issue**: I/O failures, potential data integrity issues - ✅ **Small and contained**: ~15 lines, single function fix - ✅ **No new features**: Pure bug fix with diagnostic tracepoint - ✅ **No new APIs**: Uses existing nfs4_stateid_* helpers from 2019 - ✅ **Reviewed**: By Benjamin Coddington (NFS expert) - ✅ **Signed-off**: By Trond Myklebust (primary NFS maintainer)
### 6. User Impact
- **Who is affected**: Any NFSv4 user with concurrent file access patterns - **Severity**: Medium-High - can cause I/O failures, lock errors, and potential data corruption - **Enterprise impact**: NFS is heavily used in enterprise environments; this is an important fix
### 7. Dependencies Check
The helper functions used (`nfs4_stateid_match_other`, `nfs4_stateid_is_newer`) have been in the kernel since 2019 and are used by 24 and 11 callers respectively. They will exist in all active stable trees (6.1.y, 6.6.y, etc.).
### Conclusion
This is an excellent stable backport candidate: - Fixes a real, observed bug that causes protocol violations and I/O failures - Small, surgical fix with clear logic - Uses existing infrastructure - no dependency issues - Reviewed and signed-off by NFS maintainers - Low risk of regression - it adds a missing safety check
The fix prevents NFSv4 stateid seqid from going backwards, which is a correctness issue that can impact production NFS users.
**YES**
fs/nfs/nfs4proc.c | 13 +++++++++++-- fs/nfs/nfs4trace.h | 1 + 2 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 93c6ce04332b8..d1ad2ef2ee6f9 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -1780,8 +1780,17 @@ static void nfs_set_open_stateid_locked(struct nfs4_state *state, if (nfs_stateid_is_sequential(state, stateid)) break;
- if (status) - break; + if (status) { + if (nfs4_stateid_match_other(stateid, &state->open_stateid) && + !nfs4_stateid_is_newer(stateid, &state->open_stateid)) { + trace_nfs4_open_stateid_update_skip(state->inode, + stateid, status); + return; + } else { + break; + } + } + /* Rely on seqids for serialisation with NFSv4.0 */ if (!nfs4_has_session(NFS_SERVER(state->inode)->nfs_client)) break; diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h index 9776d220cec33..6285128e631a5 100644 --- a/fs/nfs/nfs4trace.h +++ b/fs/nfs/nfs4trace.h @@ -1353,6 +1353,7 @@ DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_setattr); DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_delegreturn); DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_open_stateid_update); DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_open_stateid_update_wait); +DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_open_stateid_update_skip); DEFINE_NFS4_INODE_STATEID_EVENT(nfs4_close_stateid_update_wait);
DECLARE_EVENT_CLASS(nfs4_getattr_event,