On 5/17/23 23:04, Gregory Farnum wrote:
Just to be clear, I'd like the details here so we can see if there are ways to prevent similar issues in the future, which I haven't heard anybody talk about. :)
We have hit a similar issue with Venky in https://github.com/ceph/ceph/pull/48382. This PR just added two extra members and the old kclient couldn't recognize them and also just incorrectly parsed them as a new snaptrace.
Venky has fixed it by checking the peer client's feature bit before sending the msgs.
Thanks
- Xiubo
On Wed, May 17, 2023 at 8:03 AM Gregory Farnum gfarnum@redhat.com wrote:
On Wed, May 17, 2023 at 7:27 AM Ilya Dryomov idryomov@gmail.com wrote:
On Wed, May 17, 2023 at 3:59 PM Gregory Farnum gfarnum@redhat.com wrote:
On Wed, May 17, 2023 at 4:33 AM Ilya Dryomov idryomov@gmail.com wrote:
On Wed, May 17, 2023 at 1:04 PM Xiubo Li xiubli@redhat.com wrote:
On 5/17/23 18:31, Ilya Dryomov wrote: > On Wed, May 17, 2023 at 7:24 AM xiubli@redhat.com wrote: >> From: Xiubo Li xiubli@redhat.com >> >> When the MClientSnap reqeust's op is not CEPH_SNAP_OP_SPLIT the >> request may still contain a list of 'split_realms', and we need >> to skip it anyway. Or it will be parsed as a corrupt snaptrace. >> >> Cc: stable@vger.kernel.org >> Cc: Frank Schilder frans@dtu.dk >> Reported-by: Frank Schilder frans@dtu.dk >> URL: https://tracker.ceph.com/issues/61200 >> Signed-off-by: Xiubo Li xiubli@redhat.com >> --- >> fs/ceph/snap.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c >> index 0e59e95a96d9..d95dfe16b624 100644 >> --- a/fs/ceph/snap.c >> +++ b/fs/ceph/snap.c >> @@ -1114,6 +1114,9 @@ void ceph_handle_snap(struct ceph_mds_client *mdsc, >> continue; >> adjust_snap_realm_parent(mdsc, child, realm->ino); >> } >> + } else { >> + p += sizeof(u64) * num_split_inos; >> + p += sizeof(u64) * num_split_realms; >> } >> >> /* >> -- >> 2.40.1 >> > Hi Xiubo, > > This code appears to be very old -- it goes back to the initial commit > 963b61eb041e ("ceph: snapshot management") in 2009. Do you have an > explanation for why this popped up only now? As I remembered we hit this before in one cu BZ last year, but I couldn't remember exactly which one. But I am not sure whether @Jeff saw this before I joint ceph team.
> Has MDS always been including split_inos and split_realms arrays in > !CEPH_SNAP_OP_SPLIT case or is this a recent change? If it's a recent > change, I'd argue that this needs to be addressed on the MDS side. While in MDS side for the _UPDATE op it won't send the 'split_realm' list just before the commit in 2017:
commit 93e7267757508520dfc22cff1ab20558bd4a44d4 Author: Yan, Zheng zyan@redhat.com Date: Fri Jul 21 21:40:46 2017 +0800
mds: send snap related messages centrally during mds recovery sending CEPH_SNAP_OP_SPLIT and CEPH_SNAP_OP_UPDATE messages to clients centrally in MDCache::open_snaprealms() Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Before this commit it will only send the 'split_realm' list for the _SPLIT op.
It sounds like we have the culprit. This should be treated as a regression and fixed on the MDS side. I don't see a justification for putting useless data on the wire.
I don't really understand this viewpoint. We can treat it as an MDS regression if we want, but it's a six-year-old patch so this is in nearly every version of server code anybody's running. Why wouldn't we fix it on both sides?
Well, if I didn't speak up chances are we wouldn't have identified the regression in the MDS at all. People seem to have this perception that the client is somehow "easier" to fix, assume that the server is always doing the right thing and default to patching the client. I'm just trying to push back on that.
In this particular case, after understanding the scope of the issue _and_ getting a committal for the MDS side fix, I approved taking the kernel client patch in an earlier reply.
On Wed, May 17, 2023 at 4:07 AM Xiubo Li xiubli@redhat.com wrote:
And if the split_realm number equals to sizeof(ceph_mds_snap_realm) + extra snap buffer size by coincidence, the above 'corrupted' snaptrace will be parsed by kclient too and kclient won't give any warning, but it will corrupted the snaprealm and capsnap info in kclient.
I'm a bit confused about this patch, but I also don't follow the kernel client code much so please forgive my ignorance. The change you've made is still only invoked inside of the CEPH_SNAP_OP_SPLIT case, so clearly the kclient *mostly* does the right thing when these
No, it's invoked outside: it introduces a "op != CEPH_SNAP_OP_SPLIT" branch.
Oh I mis-parsed the braces/spacing there.
I'm still not getting how the precise size is causing the problem — obviously this isn't an unheard-of category of issue, but the fact that it works until the count matches a magic number is odd. Is that ceph_decode_need macro being called from ceph_update_snap_trace just skipping over the split data somehow? *puzzled* -Greg
Thanks,
Ilya