On Wed, May 17, 2023 at 1:04 PM Xiubo Li xiubli@redhat.com wrote:
On 5/17/23 18:31, Ilya Dryomov wrote:
On Wed, May 17, 2023 at 7:24 AM xiubli@redhat.com wrote:
From: Xiubo Li xiubli@redhat.com
When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the request may still contain a list of 'split_realms', and we need to skip it anyway. Or it will be parsed as a corrupt snaptrace.
Cc: stable@vger.kernel.org Cc: Frank Schilder frans@dtu.dk Reported-by: Frank Schilder frans@dtu.dk URL: https://tracker.ceph.com/issues/61200 Signed-off-by: Xiubo Li xiubli@redhat.com
fs/ceph/snap.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c index 0e59e95a96d9..d95dfe16b624 100644 --- a/fs/ceph/snap.c +++ b/fs/ceph/snap.c @@ -1114,6 +1114,9 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc, continue; adjust_snap_realm_parent(mdsc, child, realm->ino); }
} else {
p += sizeof(u64) * num_split_inos;
p += sizeof(u64) * num_split_realms; } /*
-- 2.40.1
Hi Xiubo,
This code appears to be very old -- it goes back to the initial commit 963b61eb041e ("ceph: snapshot management") in 2009. Do you have an explanation for why this popped up only now?
As I remembered we hit this before in one cu BZ last year, but I couldn't remember exactly which one. But I am not sure whether @Jeff saw this before I joint ceph team.
Has MDS always been including split_inos and split_realms arrays in !CEPH_SNAP_OP_SPLIT case or is this a recent change? If it's a recent change, I'd argue that this needs to be addressed on the MDS side.
While in MDS side for the _UPDATE op it won't send the 'split_realm' list just before the commit in 2017:
commit 93e7267757508520dfc22cff1ab20558bd4a44d4 Author: Yan, Zheng zyan@redhat.com Date: Fri Jul 21 21:40:46 2017 +0800
mds: send snap related messages centrally during mds recovery sending CEPH_SNAP_OP_SPLIT and CEPH_SNAP_OP_UPDATE messages to clients centrally in MDCache::open_snaprealms() Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Before this commit it will only send the 'split_realm' list for the _SPLIT op.
It sounds like we have the culprit. This should be treated as a regression and fixed on the MDS side. I don't see a justification for putting useless data on the wire.
Thanks,
Ilya