On Thu, Feb 2, 2023 at 3:30 AM Xiubo Li xiubli@redhat.com wrote:
On 01/02/2023 21:12, Ilya Dryomov wrote:
On Wed, Feb 1, 2023 at 2:37 AM xiubli@redhat.com wrote:
From: Xiubo Li xiubli@redhat.com
When received corrupted snap trace we don't know what exactly has happened in MDS side. And we shouldn't continue IOs and metadatas access to MDS, which may corrupt or get incorrect contents.
This patch will just block all the further IO/MDS requests immediately and then evict the kclient itself.
The reason why we still need to evict the kclient just after blocking all the further IOs is that the MDS could revoke the caps faster.
Cc: stable@vger.kernel.org URL: https://tracker.ceph.com/issues/57686 Reviewed-by: Venky Shankar vshankar@redhat.com Signed-off-by: Xiubo Li xiubli@redhat.com
fs/ceph/addr.c | 17 +++++++++++++++-- fs/ceph/caps.c | 16 +++++++++++++--- fs/ceph/file.c | 9 +++++++++ fs/ceph/mds_client.c | 28 +++++++++++++++++++++++++--- fs/ceph/snap.c | 38 ++++++++++++++++++++++++++++++++++++-- fs/ceph/super.h | 1 + 6 files changed, 99 insertions(+), 10 deletions(-)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 8c74871e37c9..cac4083e387a 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -305,7 +305,7 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) struct inode *inode = rreq->inode; struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
struct ceph_osd_request *req;
struct ceph_osd_request *req = NULL; struct ceph_vino vino = ceph_vino(inode); struct iov_iter iter; struct page **pages;
@@ -313,6 +313,11 @@ static void ceph_netfs_issue_read(struct netfs_io_subrequest *subreq) int err = 0; u64 len = subreq->len;
if (ceph_inode_is_shutdown(inode)) {
err = -EIO;
goto out;
}
if (ceph_has_inline_data(ci) && ceph_netfs_issue_op_inline(subreq)) return;
@@ -563,6 +568,9 @@ static int writepage_nounlock(struct page *page, struct writeback_control *wbc)
dout("writepage %p idx %lu\n", page, page->index);
if (ceph_inode_is_shutdown(inode))
return -EIO;
/* verify this is a writeable snap context */ snapc = page_snap_context(page); if (!snapc) {
@@ -1643,7 +1651,7 @@ int ceph_uninline_data(struct file *file) struct ceph_inode_info *ci = ceph_inode(inode); struct ceph_fs_client *fsc = ceph_inode_to_client(inode); struct ceph_osd_request *req = NULL;
struct ceph_cap_flush *prealloc_cf;
struct ceph_cap_flush *prealloc_cf = NULL; struct folio *folio = NULL; u64 inline_version = CEPH_INLINE_NONE; struct page *pages[1];
@@ -1657,6 +1665,11 @@ int ceph_uninline_data(struct file *file) dout("uninline_data %p %llx.%llx inline_version %llu\n", inode, ceph_vinop(inode), inline_version);
if (ceph_inode_is_shutdown(inode)) {
err = -EIO;
goto out;
}
if (inline_version == CEPH_INLINE_NONE) return 0;
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index f75ad432f375..210e40037881 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -4078,6 +4078,7 @@ void ceph_handle_caps(struct ceph_mds_session *session, void *p, *end; struct cap_extra_info extra_info = {}; bool queue_trunc;
bool close_sessions = false; dout("handle_caps from mds%d\n", session->s_mds);
@@ -4215,9 +4216,13 @@ void ceph_handle_caps(struct ceph_mds_session *session, realm = NULL; if (snaptrace_len) { down_write(&mdsc->snap_rwsem);
ceph_update_snap_trace(mdsc, snaptrace,
snaptrace + snaptrace_len,
false, &realm);
if (ceph_update_snap_trace(mdsc, snaptrace,
snaptrace + snaptrace_len,
false, &realm)) {
up_write(&mdsc->snap_rwsem);
close_sessions = true;
goto done;
} downgrade_write(&mdsc->snap_rwsem); } else { down_read(&mdsc->snap_rwsem);
@@ -4277,6 +4282,11 @@ void ceph_handle_caps(struct ceph_mds_session *session, iput(inode); out: ceph_put_string(extra_info.pool_ns);
/* Defer closing the sessions after s_mutex lock being released */
if (close_sessions)
ceph_mdsc_close_sessions(mdsc);
return;
flush_cap_releases:
diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 764598e1efd9..3fbf4993d15d 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -943,6 +943,9 @@ static ssize_t ceph_sync_read(struct kiocb *iocb, struct iov_iter *to, dout("sync_read on file %p %llu~%u %s\n", file, off, (unsigned)len, (file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
if (ceph_inode_is_shutdown(inode))
return -EIO;
Hi Xiubo,
ceph_sync_read() is called only from ceph_read_iter() which already checks for ceph_inode_is_shutdown() (although the generated error is ESTALE instead of EIO). Is the new one in ceph_sync_read() actually needed?
If the answer is yes, why hasn't a corresponding check been added to ceph_sync_write()?
Before I generated this patch based on the fscrypt patches, this will be renamed to __ceph_sync_read() and also will be called by fill_fscrypt_truncate(). I just forgot this and after rebasing I didn't adjust it.
I have updated the 'wip-snap-trace-blocklist' branch by removing it from here and ceph_direct_read_write(). And I will fix this later in the fscrypt patches.
Hi Xiubo,
The latest revision looks fine. I folded the "ceph: dump the msg when receiving a corrupt snap trace" follow-up into this patch and pulled the result into master.
I also rebased testing appropriately, feel free to perform the necessary fscrypt-related fixups there!
Thanks,
Ilya