 
            On 6/6/23 18:21, Ilya Dryomov wrote:
On Tue, Jun 6, 2023 at 2:55 AM xiubli@redhat.com wrote:
From: Xiubo Li xiubli@redhat.com
There is a race between capsnaps flush and removing the inode from 'mdsc->snap_flush_list' list:
== Thread A == == Thread B ==ceph_queue_cap_snap() -> allocate 'capsnapA' ->ihold('&ci->vfs_inode') ->add 'capsnapA' to 'ci->i_cap_snaps' ->add 'ci' to 'mdsc->snap_flush_list' ... == Thread C == ceph_flush_snaps() ->__ceph_flush_snaps() ->__send_flush_snap() handle_cap_flushsnap_ack() ->iput('&ci->vfs_inode') this also will release 'ci' ... == Thread D == ceph_handle_snap() ->flush_snaps() ->iterate 'mdsc->snap_flush_list' ->get the stale 'ci' ->remove 'ci' from ->ihold(&ci->vfs_inode) this 'mdsc->snap_flush_list' will WARNING
To fix this we will increase the inode's i_count ref when adding 'ci' to the 'mdsc->snap_flush_list' list.
Cc: stable@vger.kernel.org URL: https://bugzilla.redhat.com/show_bug.cgi?id=2209299 Reviewed-by: Milind Changire mchangir@redhat.com Signed-off-by: Xiubo Li xiubli@redhat.com
V3:
Fix two minor typo in commit comments.
fs/ceph/caps.c | 6 ++++++ fs/ceph/snap.c | 4 +++- 2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index feabf4cc0c4f..7c2cb813aba4 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -1684,6 +1684,7 @@ void ceph_flush_snaps(struct ceph_inode_info *ci, struct inode *inode = &ci->netfs.inode; struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc; struct ceph_mds_session *session = NULL;
int put = 0;Hi Xiubo,
Nit: renaming this variable to need_put and making it a bool would communicate the intent better.
Hi Ilya
Sure, will update it.
int mds; dout("ceph_flush_snaps %p\n", inode);@@ -1728,8 +1729,13 @@ void ceph_flush_snaps(struct ceph_inode_info *ci, ceph_put_mds_session(session); /* we flushed them all; remove this inode from the queue */ spin_lock(&mdsc->snap_flush_lock);
if (!list_empty(&ci->i_snap_flush_item))
put++;What are the cases when ci is expected to not be on snap_flush_list list (and therefore there is no corresponding reference to put)?
The reason I'm asking is that ceph_flush_snaps() is called from two other places directly (i.e. without iterating snap_flush_list list) and then __ceph_flush_snaps() is called from two yet other places. The problem that we are presented with here is that __ceph_flush_snaps() effectively consumes a reference on ci. Is ci protected from being freed by handle_cap_flushsnap_ack() very soon after __send_flush_snap() returns in all these other places?
There are 4 places will call the 'ceph_flush_snaps()':
Cscope tag: ceph_flush_snaps # line filename / context / line 1 3221 fs/ceph/caps.c <<__ceph_put_cap_refs>> ceph_flush_snaps(ci, NULL); 2 3336 fs/ceph/caps.c <<ceph_put_wrbuffer_cap_refs>> ceph_flush_snaps(ci, NULL); 3 2243 fs/ceph/inode.c <<ceph_inode_work>> ceph_flush_snaps(ci, NULL); 4 941 fs/ceph/snap.c <<flush_snaps>> ceph_flush_snaps(ci, &session); Type number and <Enter> (q or empty cancels):
For #1 it will add the 'ci' to the 'mdsc->snap_flush_list' list by calling '__ceph_finish_cap_snap()' and then call the 'ceph_flush_snaps()' directly or defer call it in the queue work in #3.
The #3 is the reason why we need the 'mdsc->snap_flush_list' list.
For #2 it won't add the 'ci' to the list because it will always call the 'ceph_flush_snaps()' directly.
For #4 it will call 'ceph_flush_snaps()' by iterating the 'mdsc->snap_flush_list' list just before the #3 being triggered.
The problem only exists in case of #1 --> #4, which will make the stale 'ci' to be held in the 'mdsc->snap_flush_list' list after 'capsnap' and 'ci' being freed. All the other cases are okay because the 'ci' will be protected by increasing the ref when allocating the 'capsnap' and will decrease the ref in 'handle_cap_flushsnap_ack()' when freeing the 'capsnap'.
Note: the '__ceph_flush_snaps()' won't increase the ref. The 'handle_cap_flushsnap_ack()' will just try to decrease the ref and only in case the ref reaches to '0' will the 'ci' be freed.
Thanks
- Xiubo
Thanks,
Ilya