long/ kernel_time_t is 32 bit on a 32 bit system and 64 bit on a 64 bit system.
ceph_encode_timespec() encodes only the lower 32 bits on a 64 bit system and encodes all of 32 bits on a 32bit system.
ceph_decode_timespec() decodes 32 bit tv_sec and tv_nsec into kernel_time_t/ long.
The encode and decode functions do not match when the values are negative:
Consider the following scenario on a 32 bit system: When a negative number is cast to u32 as encode does, the value is positive and is greater than INT_MAX. Decode reads back this value. And, this value cannot be represented by long on 32 bit systems. So by section 6.3.1.3 of the C99 standard, the result is implementation defined.
Consider the following scenario on a 64 bit system: When a negative number is cast to u32 as encode does, the value is positive. This value is later assigned by decode function by a cast to long. Since this value can be represented in long data type, this becomes a positive value greater than INT_MAX. But, the value encoded was negative, so the encode and decode functions do not match.
Change the decode function as follows to overcome the above bug: The decode should first cast the value to a s64 this will be positive value greater than INT_MAX(in case of a negative encoded value)and then cast this value again as s32, which drops the higher order 32 bits. On 32 bit systems, this is the right value in kernel_time_t/ long. On 64 bit systems, assignment to kernel_time_t/ long will sign extend this value to reflect the signed bit encoded.
Assume ceph timestamp ranges permitted are 1902..2038.
Suggested-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com --- include/linux/ceph/decode.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h index a6ef9cc..e777e99 100644 --- a/include/linux/ceph/decode.h +++ b/include/linux/ceph/decode.h @@ -137,8 +137,8 @@ bad: static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) { - ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec); - ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec); + ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec); + ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec); } static inline void ceph_encode_timespec(struct ceph_timespec *tv, const struct timespec *ts)
On Feb 15, 2016, at 11:01, Deepa Dinamani deepa.kernel@gmail.com wrote:
long/ kernel_time_t is 32 bit on a 32 bit system and 64 bit on a 64 bit system.
ceph_encode_timespec() encodes only the lower 32 bits on a 64 bit system and encodes all of 32 bits on a 32bit system.
ceph_decode_timespec() decodes 32 bit tv_sec and tv_nsec into kernel_time_t/ long.
The encode and decode functions do not match when the values are negative:
Consider the following scenario on a 32 bit system: When a negative number is cast to u32 as encode does, the value is positive and is greater than INT_MAX. Decode reads back this value. And, this value cannot be represented by long on 32 bit systems. So by section 6.3.1.3 of the C99 standard, the result is implementation defined.
Consider the following scenario on a 64 bit system: When a negative number is cast to u32 as encode does, the value is positive. This value is later assigned by decode function by a cast to long. Since this value can be represented in long data type, this becomes a positive value greater than INT_MAX. But, the value encoded was negative, so the encode and decode functions do not match.
Change the decode function as follows to overcome the above bug: The decode should first cast the value to a s64 this will be positive value greater than INT_MAX(in case of a negative encoded value)and then cast this value again as s32, which drops the higher order 32 bits. On 32 bit systems, this is the right value in kernel_time_t/ long. On 64 bit systems, assignment to kernel_time_t/ long will sign extend this value to reflect the signed bit encoded.
Assume ceph timestamp ranges permitted are 1902..2038.
Suggested-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
include/linux/ceph/decode.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h index a6ef9cc..e777e99 100644 --- a/include/linux/ceph/decode.h +++ b/include/linux/ceph/decode.h @@ -137,8 +137,8 @@ bad: static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
- ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
- ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
- ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
- ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
} static inline void ceph_encode_timespec(struct ceph_timespec *tv, const struct timespec *ts)
Applied, thanks
Yan, Zheng
-- 1.9.1
On Sunday 14 February 2016 19:01:53 Deepa Dinamani wrote:
long/ kernel_time_t is 32 bit on a 32 bit system and 64 bit on a 64 bit system.
ceph_encode_timespec() encodes only the lower 32 bits on a 64 bit system and encodes all of 32 bits on a 32bit system.
ceph_decode_timespec() decodes 32 bit tv_sec and tv_nsec into kernel_time_t/ long.
The encode and decode functions do not match when the values are negative:
Consider the following scenario on a 32 bit system: When a negative number is cast to u32 as encode does, the value is positive and is greater than INT_MAX. Decode reads back this value. And, this value cannot be represented by long on 32 bit systems. So by section 6.3.1.3 of the C99 standard, the result is implementation defined.
Consider the following scenario on a 64 bit system: When a negative number is cast to u32 as encode does, the value is positive. This value is later assigned by decode function by a cast to long. Since this value can be represented in long data type, this becomes a positive value greater than INT_MAX. But, the value encoded was negative, so the encode and decode functions do not match.
Change the decode function as follows to overcome the above bug: The decode should first cast the value to a s64 this will be positive value greater than INT_MAX(in case of a negative encoded value)and then cast this value again as s32, which drops the higher order 32 bits. On 32 bit systems, this is the right value in kernel_time_t/ long. On 64 bit systems, assignment to kernel_time_t/ long will sign extend this value to reflect the signed bit encoded.
Assume ceph timestamp ranges permitted are 1902..2038.
Suggested-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
Actually I was thinking we'd do the opposite and document the existing 64-bit behavior, and then make sure we do it the same on 32-bit machines once we have the new syscalls.
The most important part of course is to document what the file system is expected to do. Having this in the changelog is important here, but I'd also like to see a comment in the code to ensure readers can see that the behavior is intentional.
diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h index a6ef9cc..e777e99 100644 --- a/include/linux/ceph/decode.h +++ b/include/linux/ceph/decode.h @@ -137,8 +137,8 @@ bad: static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
}
Why did you choose to express this as "(s32)(s64)..." rather than "(s64)(s32)..."? The result is the same (just the s32 cast by itself would be sufficient), I just don't see yet how it clarifies what is going on.
Arnd
On Mon, Feb 15, 2016 at 4:01 AM, Deepa Dinamani deepa.kernel@gmail.com wrote:
long/ kernel_time_t is 32 bit on a 32 bit system and 64 bit on a 64 bit system.
ceph_encode_timespec() encodes only the lower 32 bits on a 64 bit system and encodes all of 32 bits on a 32bit system.
ceph_decode_timespec() decodes 32 bit tv_sec and tv_nsec into kernel_time_t/ long.
The encode and decode functions do not match when the values are negative:
Consider the following scenario on a 32 bit system: When a negative number is cast to u32 as encode does, the value is positive and is greater than INT_MAX. Decode reads back this value. And, this value cannot be represented by long on 32 bit systems. So by section 6.3.1.3 of the C99 standard, the result is implementation defined.
Consider the following scenario on a 64 bit system: When a negative number is cast to u32 as encode does, the value is positive. This value is later assigned by decode function by a cast to long. Since this value can be represented in long data type, this becomes a positive value greater than INT_MAX. But, the value encoded was negative, so the encode and decode functions do not match.
Change the decode function as follows to overcome the above bug: The decode should first cast the value to a s64 this will be positive value greater than INT_MAX(in case of a negative encoded value)and then cast this value again as s32, which drops the higher order 32 bits. On 32 bit systems, this is the right value in kernel_time_t/ long. On 64 bit systems, assignment to kernel_time_t/ long will sign extend this value to reflect the signed bit encoded.
Assume ceph timestamp ranges permitted are 1902..2038.
Suggested-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Deepa Dinamani deepa.kernel@gmail.com
include/linux/ceph/decode.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h index a6ef9cc..e777e99 100644 --- a/include/linux/ceph/decode.h +++ b/include/linux/ceph/decode.h @@ -137,8 +137,8 @@ bad: static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
} static inline void ceph_encode_timespec(struct ceph_timespec *tv, const struct timespec *ts)
I think you are forgetting that this is a wire protocol. Changing how values are interpreted on one side without looking at the other side is generally not what you want to do.
There aren't any casts on the server side: __u32 is simply assigned to time_t, meaning no sign-extension is happening - see src/include/utime.h in ceph.git. The current __kernel_time_t and long casts on the kernel side are meaningless - they don't change the bit pattern, so everything is interpreted the same way. Your patch changes that.
If there is anything to be done here, it is documenting the existing behavior. This isn't to say that there aren't any quirks or bugs in our time handling, it's that any changes concerning the protocol or how the actual bits are interpreted should follow or align with the rest of ceph and have a note in the changelog on that.
(As Greg and I mentioned before, we do have a semi-concrete plan on how to deal with y2038 in ceph, just no timetable yet.)
Thanks,
Ilya
Actually I was thinking we'd do the opposite and document the existing 64-bit behavior, and then make sure we do it the same on 32-bit machines once we have the new syscalls.
I'm not sure I follow what is opposite here. You just want to document the behavior and fix it later when the time range is extended beyond 2038?
The most important part of course is to document what the file system is expected to do. Having this in the changelog is important here, but I'd also like to see a comment in the code to ensure readers can see that the behavior is intentional.
Will add this comment in the code.
diff --git a/include/linux/ceph/decode.h b/include/linux/ceph/decode.h index a6ef9cc..e777e99 100644 --- a/include/linux/ceph/decode.h +++ b/include/linux/ceph/decode.h @@ -137,8 +137,8 @@ bad: static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
}
Why did you choose to express this as "(s32)(s64)..." rather than "(s64)(s32)..."? The result is the same (just the s32 cast by itself would be sufficient), I just don't see yet how it clarifies what is going on.
Let's say we encode -1. so when you pass the same value to decode, ceph_timespec is {0xFFFFFFFF,0}. 0xFFFFFFFF is greater than INT_MAX. And, according to c99 the result is implementation dependent if this cast to a s32.
Quoting from http://c0x.coding-guidelines.com/6.3.1.3.html : 6.3.1.3 Signed and unsigned integers 1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the newtype, it is unchanged. 2 Otherwise, if the newtype is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the newtype until the value is in the range of the newtype.49) 3 Otherwise, the newtype is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised
GCC does guarantee the behavior. Quoting from GCC manual (https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html): The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3). For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
But, as the C standard suggests, behavior is implementation dependent. This is why I cast it to s64 before casting it to s32. I explained in the commit text, but missed GCC part. Quoting from commit text:
Consider the following scenario on a 32 bit system: When a negative number is cast to u32 as encode does, the value is positive and is greater than INT_MAX. Decode reads back this value. And, this value cannot be represented by long on 32 bit systems. So by section 6.3.1.3 of the C99 standard, the result is implementation defined.
Consider the following scenario on a 64 bit system: When a negative number is cast to u32 as encode does, the value is positive. This value is later assigned by decode function by a cast to long. Since this value can be represented in long data type, this becomes a positive value greater than INT_MAX. But, the value encoded was negative, so the encode and decode functions do not match.
Change the decode function as follows to overcome the above bug: The decode should first cast the value to a s64 this will be positive value greater than INT_MAX(in case of a negative encoded value)and then cast this value again as s32, which drops the higher order 32 bits. On 32 bit systems, this is the right value in kernel_time_t/ long. On 64 bit systems, assignment to kernel_time_t/ long will sign extend this value to reflect the signed bit encoded
-Deepa
On Monday 15 February 2016 21:02:33 Deepa Dinamani wrote:
Actually I was thinking we'd do the opposite and document the existing 64-bit behavior, and then make sure we do it the same on 32-bit machines once we have the new syscalls.
I'm not sure I follow what is opposite here. You just want to document the behavior and fix it later when the time range is extended beyond 2038?
What I meant was that rather than changing the returned range from 1970..2106 to 1902..2038, I would make the current behavior explicit and document that it is correct.
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
ts->tv_sec = (s32)(s64)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (s32)(s64)le32_to_cpu(tv->tv_nsec);
}
Why did you choose to express this as "(s32)(s64)..." rather than "(s64)(s32)..."? The result is the same (just the s32 cast by itself would be sufficient), I just don't see yet how it clarifies what is going on.
Let's say we encode -1. so when you pass the same value to decode, ceph_timespec is {0xFFFFFFFF,0}. 0xFFFFFFFF is greater than INT_MAX. And, according to c99 the result is implementation dependent if this cast to a s32.
Quoting from http://c0x.coding-guidelines.com/6.3.1.3.html : 6.3.1.3 Signed and unsigned integers 1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the newtype, it is unchanged. 2 Otherwise, if the newtype is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the newtype until the value is in the range of the newtype.49) 3 Otherwise, the newtype is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised
GCC does guarantee the behavior. Quoting from GCC manual (https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html): The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3). For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
But, as the C standard suggests, behavior is implementation dependent. This is why I cast it to s64 before casting it to s32. I explained in the commit text, but missed GCC part.
Ok, fair enough. I always assumed that "implementation specific" in this context meant that on any architectures that implement negative numbers as twos-complement (anything that Linux runs on) we get the expected result. I'm sure we rely on that behavior in a lot of places in the kernel, but you are correct that the C standard makes your code well-defined, and the other one not.
Arnd
I think you are forgetting that this is a wire protocol. Changing how values are interpreted on one side without looking at the other side is generally not what you want to do.
There aren't any casts on the server side: __u32 is simply assigned to time_t, meaning no sign-extension is happening - see src/include/utime.h in ceph.git. The current __kernel_time_t and long casts on the kernel side are meaningless - they don't change the bit pattern, so everything is interpreted the same way. Your patch changes that.
If there is anything to be done here, it is documenting the existing behavior. This isn't to say that there aren't any quirks or bugs in our time handling, it's that any changes concerning the protocol or how the actual bits are interpreted should follow or align with the rest of ceph and have a note in the changelog on that.
(As Greg and I mentioned before, we do have a semi-concrete plan on how to deal with y2038 in ceph, just no timetable yet.)
I checked the server side code. So the server side also reads whatever is sent over wire as u32 and then assigns it to a time_t.
Do you have any documentation about the wire protocol?
To me, the wire protocol seems to have been designed to support only positive time values(u32). This means ceph can represent times from 1970 - 2106 as long as both server and client are 64 bit machines. Or, client is 64 bit and server does not operate on time values. And, on 32 bit machines implementation will be broken in 2038.
I will change back the above code to not use any casts(as they are a no-op) as on the server side and note the above things in the file and changelog:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) { - ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec); - ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec); + ts->tv_sec = le32_to_cpu(tv->tv_sec); + ts->tv_nsec = le32_to_cpu(tv->tv_nsec); }
-Deepa
On Wednesday 17 February 2016 22:02:11 Deepa Dinamani wrote:
I checked the server side code. So the server side also reads whatever is sent over wire as u32 and then assigns it to a time_t.
I think the more important question is whether the server actually does anything with the timestamp other than pass it back to the client. If not, the interpretation it totally up to the client and the type on the server has no meaning (as long as it can store at least as many bits as the wire protocol, and passes all bits back the same way).
Do you have any documentation about the wire protocol?
To me, the wire protocol seems to have been designed to support only positive time values(u32).
I think assigning to a time_t usually just means that this wasn't completely thought through when it got implemented (which we already know from the fact that it started out using a 32-bit number from the time). It's totally possible that this code originated on a 32-bit machine and was meant to encode a signed number and that the behavior changed without anyone noticing when it got ported to a 64-bit architecture for the first time.
This means ceph can represent times from 1970 - 2106 as long as both server and client are 64 bit machines. Or, client is 64 bit and server does not operate on time values. And, on 32 bit machines implementation will be broken in 2038.
I will change back the above code to not use any casts(as they are a no-op) as on the server side and note the above things in the file and changelog:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
ts->tv_sec = le32_to_cpu(tv->tv_sec);
ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
}
Ok. I really like to have an explicit range-extension as well, but as you say, your patch above does not change behavior.
I would write this as
ts->tv_sec = (__u64)le32_to_cpu(tv->tv_sec); ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
which is again the same thing, but leaves less ambiguity.
Arnd
On Thu, Feb 18, 2016 at 1:17 AM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 17 February 2016 22:02:11 Deepa Dinamani wrote:
I checked the server side code. So the server side also reads whatever is sent over wire as u32 and then assigns it to a time_t.
I think the more important question is whether the server actually does anything with the timestamp other than pass it back to the client. If not, the interpretation it totally up to the client and the type on the server has no meaning (as long as it can store at least as many bits as the wire protocol, and passes all bits back the same way).
Timestamps do get interpreted and written to on the server according to the files in the mds directory.
Do you have any documentation about the wire protocol?
To me, the wire protocol seems to have been designed to support only positive time values(u32).
I think assigning to a time_t usually just means that this wasn't completely thought through when it got implemented (which we already know from the fact that it started out using a 32-bit number from the time). It's totally possible that this code originated on a 32-bit machine and was meant to encode a signed number and that the behavior changed without anyone noticing when it got ported to a 64-bit architecture for the first time.
This means ceph can represent times from 1970 - 2106 as long as both server and client are 64 bit machines. Or, client is 64 bit and server does not operate on time values. And, on 32 bit machines implementation will be broken in 2038.
I will change back the above code to not use any casts(as they are a no-op) as on the server side and note the above things in the file and changelog:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
ts->tv_sec = le32_to_cpu(tv->tv_sec);
ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
}
Ok. I really like to have an explicit range-extension as well, but as you say, your patch above does not change behavior.
You mean change the functions and data types on both sides? Not sure if this is meant for me or the Ceph team.
I would write this as
ts->tv_sec = (__u64)le32_to_cpu(tv->tv_sec); ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
which is again the same thing, but leaves less ambiguity.
Will do.
-Deepa
On Thursday 18 February 2016 01:35:25 Deepa Dinamani wrote:
On Thu, Feb 18, 2016 at 1:17 AM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 17 February 2016 22:02:11 Deepa Dinamani wrote:
I checked the server side code. So the server side also reads whatever is sent over wire as u32 and then assigns it to a time_t.
I think the more important question is whether the server actually does anything with the timestamp other than pass it back to the client. If not, the interpretation it totally up to the client and the type on the server has no meaning (as long as it can store at least as many bits as the wire protocol, and passes all bits back the same way).
Timestamps do get interpreted and written to on the server according to the files in the mds directory.
Ok. I see they are stored internally as a __u32/__u32 pair in "class utime_t", I just couldn't figure out whether this is used on the server at all, or stored in a timespec or time_t.
Do you have any documentation about the wire protocol?
To me, the wire protocol seems to have been designed to support only positive time values(u32).
I think assigning to a time_t usually just means that this wasn't completely thought through when it got implemented (which we already know from the fact that it started out using a 32-bit number from the time). It's totally possible that this code originated on a 32-bit machine and was meant to encode a signed number and that the behavior changed without anyone noticing when it got ported to a 64-bit architecture for the first time.
This means ceph can represent times from 1970 - 2106 as long as both server and client are 64 bit machines. Or, client is 64 bit and server does not operate on time values. And, on 32 bit machines implementation will be broken in 2038.
I will change back the above code to not use any casts(as they are a no-op) as on the server side and note the above things in the file and changelog:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
ts->tv_sec = le32_to_cpu(tv->tv_sec);
ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
}
Ok. I really like to have an explicit range-extension as well, but as you say, your patch above does not change behavior.
You mean change the functions and data types on both sides? Not sure if this is meant for me or the Ceph team.
I meant it should be part of your patch.
Arnd
On Thu, Feb 18, 2016 at 10:35 AM, Deepa Dinamani deepa.kernel@gmail.com wrote:
On Thu, Feb 18, 2016 at 1:17 AM, Arnd Bergmann arnd@arndb.de wrote:
On Wednesday 17 February 2016 22:02:11 Deepa Dinamani wrote:
I checked the server side code. So the server side also reads whatever is sent over wire as u32 and then assigns it to a time_t.
I think the more important question is whether the server actually does anything with the timestamp other than pass it back to the client. If not, the interpretation it totally up to the client and the type on the server has no meaning (as long as it can store at least as many bits as the wire protocol, and passes all bits back the same way).
Timestamps do get interpreted and written to on the server according to the files in the mds directory.
Do you have any documentation about the wire protocol?
To me, the wire protocol seems to have been designed to support only positive time values(u32).
I think assigning to a time_t usually just means that this wasn't completely thought through when it got implemented (which we already know from the fact that it started out using a 32-bit number from the time). It's totally possible that this code originated on a 32-bit machine and was meant to encode a signed number and that the behavior changed without anyone noticing when it got ported to a 64-bit architecture for the first time.
This means ceph can represent times from 1970 - 2106 as long as both server and client are 64 bit machines. Or, client is 64 bit and server does not operate on time values. And, on 32 bit machines implementation will be broken in 2038.
I will change back the above code to not use any casts(as they are a no-op) as on the server side and note the above things in the file and changelog:
static inline void ceph_decode_timespec(struct timespec *ts, const struct ceph_timespec *tv) {
ts->tv_sec = (__kernel_time_t)le32_to_cpu(tv->tv_sec);
ts->tv_nsec = (long)le32_to_cpu(tv->tv_nsec);
ts->tv_sec = le32_to_cpu(tv->tv_sec);
ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
}
Ok. I really like to have an explicit range-extension as well, but as you say, your patch above does not change behavior.
You mean change the functions and data types on both sides? Not sure if this is meant for me or the Ceph team.
I would write this as
ts->tv_sec = (__u64)le32_to_cpu(tv->tv_sec); ts->tv_nsec = le32_to_cpu(tv->tv_nsec);
which is again the same thing, but leaves less ambiguity.
Will do.
If we are going to touch this function at all, let's drop all casts and just add a comment along the lines of "the lack of cast/sign-extension here is intentional, this function has a counterpart on the server side which also lacks cast/sign-extension, etc". To someone who is not up on these subtleties, the __u64 cast is going to be more confusing than explicit...
Thanks,
Ilya
On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
If we are going to touch this function at all, let's drop all casts and just add a comment along the lines of "the lack of cast/sign-extension here is intentional, this function has a counterpart on the server side which also lacks cast/sign-extension, etc". To someone who is not up on these subtleties, the __u64 cast is going to be more confusing than explicit...
We certainly a comment, I was just suggesting to add the cast as well.
Again, it's not the lack of the cast on the server that is important here, it's whether it gets interpreted as signed or unsigned at the point where the timestamp is actually used for comparison or printing. Let me suggest a comment here:
/* * ceph timestamps are unsigned 32-bit starting at 1970, which is * different from a signed 32-bit or 64-bit time_t. On 64-bit * architectures, this gets interpreted as a subset of time_t * in the range from 1970 to 2106. * Machines with a 32-bit time_t will incorrectly interpret the * timestamps with years 2038-2106 as negative numbers in the * 1902-1969 range, until both kernel and glibc are change to * using 64-bit time_t. */
Arnd
On Thu, Feb 18, 2016 at 5:20 PM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
If we are going to touch this function at all, let's drop all casts and just add a comment along the lines of "the lack of cast/sign-extension here is intentional, this function has a counterpart on the server side which also lacks cast/sign-extension, etc". To someone who is not up on these subtleties, the __u64 cast is going to be more confusing than explicit...
We certainly a comment, I was just suggesting to add the cast as well.
Again, it's not the lack of the cast on the server that is important here, it's whether it gets interpreted as signed or unsigned at the point where the timestamp is actually used for comparison or printing. Let me suggest a comment here:
I'd argue it's important to have encode/decode code look the same.
I mentioned before that at least in some places it's interpreted as signed and that the interpretation is not totally up to the client. The printing (at least some of it) is right there in the same file (src/include/utime.h) and you can see ... + 1900.
/*
- ceph timestamps are unsigned 32-bit starting at 1970, which is
- different from a signed 32-bit or 64-bit time_t. On 64-bit
- architectures, this gets interpreted as a subset of time_t
- in the range from 1970 to 2106.
- Machines with a 32-bit time_t will incorrectly interpret the
- timestamps with years 2038-2106 as negative numbers in the
- 1902-1969 range, until both kernel and glibc are change to
- using 64-bit time_t.
*/
I think a more accurate description would be "ceph timestamps are signed 32-bit, with the caveat that something somewhere is likely to break on a negative timestamp".
We are not looking at trying to reuse that extra bit for 1970..2106. As I said earlier, utime_t is used in quite a lot of places throughout the code base and is part of both in-core and on-disk data structures. Auditing all those sites for that is probably never going to happen. I think the plan is to live with what we've got until a proper 64-bit sec/nsec conversion is done in a way that was described earlier (message versions, ceph feature bit, etc). Any vfs timespec-handling changes should simply preserve the existing behavior, for now.
Thanks,
Ilya
On Thursday 18 February 2016 18:22:55 Ilya Dryomov wrote:
On Thu, Feb 18, 2016 at 5:20 PM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
If we are going to touch this function at all, let's drop all casts and just add a comment along the lines of "the lack of cast/sign-extension here is intentional, this function has a counterpart on the server side which also lacks cast/sign-extension, etc". To someone who is not up on these subtleties, the __u64 cast is going to be more confusing than explicit...
We certainly a comment, I was just suggesting to add the cast as well.
Again, it's not the lack of the cast on the server that is important here, it's whether it gets interpreted as signed or unsigned at the point where the timestamp is actually used for comparison or printing. Let me suggest a comment here:
I'd argue it's important to have encode/decode code look the same.
I mentioned before that at least in some places it's interpreted as signed and that the interpretation is not totally up to the client. The printing (at least some of it) is right there in the same file (src/include/utime.h) and you can see ... + 1900.
I see this code:
class utime_t { struct { __u32 tv_sec, tv_nsec; } tv;
time_t sec() const { return tv.tv_sec; }
ostream& gmtime(ostream& out) const { out.setf(std::ios::right); char oldfill = out.fill(); out.fill('0'); if (sec() < ((time_t)(60*60*24*365*10))) { // raw seconds. this looks like a relative time. out << (long)sec() << "." << std::setw(6) << usec(); } else { // localtime. this looks like an absolute time. // aim for http://en.wikipedia.org/wiki/ISO_8601 struct tm bdt; time_t tt = sec(); gmtime_r(&tt, &bdt); out << std::setw(4) << (bdt.tm_year+1900) // 2007 -> '07' << '-' << std::setw(2) << (bdt.tm_mon+1) << '-' << std::setw(2) << bdt.tm_mday << ' ' << std::setw(2) << bdt.tm_hour << ':' << std::setw(2) << bdt.tm_min << ':' << std::setw(2) << bdt.tm_sec; out << "." << std::setw(6) << usec(); out << "Z"; } out.fill(oldfill); out.unsetf(std::ios::right); return out; }
which interprets the time roughly in the way I explained:
* On 64-bit architectures, the time is interpreted as an unsigned number in the range from 1980-2106, while any times between 1970 and 1980 are interpreted as a relative time and printed as a positive seconds number.
* On 32-bit architectures, times in the range 1980-2038 are printed as a date like 64-bit does, times from 1970-1980 are also printed as a positive seconds number, and anything with the top bit set is printed as a negative seconds number.
It this the intended behavior? I guess we could change the kernel to reject any timestamps before 1980 and after 2038 in ceph_decode_timespec and just set the Linux timestamp to (time_t)0 (Jan 1 1970) in that case, and document that there is no valid interpretation for those.
- ceph timestamps are unsigned 32-bit starting at 1970, which is
- different from a signed 32-bit or 64-bit time_t. On 64-bit
- architectures, this gets interpreted as a subset of time_t
- in the range from 1970 to 2106.
- Machines with a 32-bit time_t will incorrectly interpret the
- timestamps with years 2038-2106 as negative numbers in the
- 1902-1969 range, until both kernel and glibc are change to
- using 64-bit time_t.
*/
I think a more accurate description would be "ceph timestamps are signed 32-bit, with the caveat that something somewhere is likely to break on a negative timestamp".
This is the opposite of what you said previously, citing:
| There aren't any casts on the server side: __u32 is simply assigned to | time_t, meaning no sign-extension is happening
which I read as meaning you are using it as an unsigned number, because of the way the assignment operator works when you assign an unsigned 32-bit number to a signed 64-bit number.
We are not looking at trying to reuse that extra bit for 1970..2106. As I said earlier, utime_t is used in quite a lot of places throughout the code base and is part of both in-core and on-disk data structures. Auditing all those sites for that is probably never going to happen. I think the plan is to live with what we've got until a proper 64-bit sec/nsec conversion is done in a way that was described earlier (message versions, ceph feature bit, etc). Any vfs timespec-handling changes should simply preserve the existing behavior, for now.
And that is a third option: preserve the existing behavior (different between 32-bit and 64-bit) even when we extend the 32-bit client to have 64-bit timestamps. I can't believe that is what you actually meant here but if you did, that would probably be the worst solution.
Please just make up your mind and say which behavior you want the kernel client to have when dealing with servers that communicate using 32-bit timestamps:
a) Remain consistent with current 64-bit servers and clients, interpreting the number as an unsigned 32-bit integer where possible, and remain broken on existing 32-bit machines until they start using 64-bit time_t.
b) Change 64-bit clients to behave the same way that that 32-bit clients and servers do today, and actually use the times as a signed number that you claimed they already did.
c) Interpret all times before 1980 or after 2038 as buggy servers and only accept times that are always handled consistently. Also fix the server to reject those times coming from broken clients.
d) Leave 64-bit clients unchanged (using unsigned timestamps) and document that 32-bit clients should keep interpreting them as signed even when we fix VFS, to preserve the existing behavior.
I think a), b) and c) are all reasonable, and I can send a patch as soon as you know what you want. If you want d), please leave me out of that and do it yourself.
Arnd
On Fri, Feb 19, 2016 at 11:00 AM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 18 February 2016 18:22:55 Ilya Dryomov wrote:
On Thu, Feb 18, 2016 at 5:20 PM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
If we are going to touch this function at all, let's drop all casts and just add a comment along the lines of "the lack of cast/sign-extension here is intentional, this function has a counterpart on the server side which also lacks cast/sign-extension, etc". To someone who is not up on these subtleties, the __u64 cast is going to be more confusing than explicit...
We certainly a comment, I was just suggesting to add the cast as well.
Again, it's not the lack of the cast on the server that is important here, it's whether it gets interpreted as signed or unsigned at the point where the timestamp is actually used for comparison or printing. Let me suggest a comment here:
I'd argue it's important to have encode/decode code look the same.
I mentioned before that at least in some places it's interpreted as signed and that the interpretation is not totally up to the client. The printing (at least some of it) is right there in the same file (src/include/utime.h) and you can see ... + 1900.
I see this code:
class utime_t { struct { __u32 tv_sec, tv_nsec; } tv;
time_t sec() const { return tv.tv_sec; }
ostream& gmtime(ostream& out) const { out.setf(std::ios::right); char oldfill = out.fill(); out.fill('0'); if (sec() < ((time_t)(60*60*24*365*10))) { // raw seconds. this looks like a relative time. out << (long)sec() << "." << std::setw(6) << usec(); } else { // localtime. this looks like an absolute time. // aim for http://en.wikipedia.org/wiki/ISO_8601 struct tm bdt; time_t tt = sec(); gmtime_r(&tt, &bdt); out << std::setw(4) << (bdt.tm_year+1900) // 2007 -> '07' << '-' << std::setw(2) << (bdt.tm_mon+1) << '-' << std::setw(2) << bdt.tm_mday << ' ' << std::setw(2) << bdt.tm_hour << ':' << std::setw(2) << bdt.tm_min << ':' << std::setw(2) << bdt.tm_sec; out << "." << std::setw(6) << usec(); out << "Z"; } out.fill(oldfill); out.unsetf(std::ios::right); return out; }
which interprets the time roughly in the way I explained:
On 64-bit architectures, the time is interpreted as an unsigned number in the range from 1980-2106, while any times between 1970 and 1980 are interpreted as a relative time and printed as a positive seconds number.
On 32-bit architectures, times in the range 1980-2038 are printed as a date like 64-bit does, times from 1970-1980 are also printed as a positive seconds number, and anything with the top bit set is printed as a negative seconds number.
It this the intended behavior? I guess we could change the kernel to reject any timestamps before 1980 and after 2038 in ceph_decode_timespec and just set the Linux timestamp to (time_t)0 (Jan 1 1970) in that case, and document that there is no valid interpretation for those.
At least conceptually, this isn't a bad idea, as only 1970(1980)..2038 works on both architectures; everything else is likely to break something in ceph either on 32-bit or on both, I think. But, I'm not sure if rejecting by setting to 0 is going to work all around and we can't change just the kernel client.
- ceph timestamps are unsigned 32-bit starting at 1970, which is
- different from a signed 32-bit or 64-bit time_t. On 64-bit
- architectures, this gets interpreted as a subset of time_t
- in the range from 1970 to 2106.
- Machines with a 32-bit time_t will incorrectly interpret the
- timestamps with years 2038-2106 as negative numbers in the
- 1902-1969 range, until both kernel and glibc are change to
- using 64-bit time_t.
*/
I think a more accurate description would be "ceph timestamps are signed 32-bit, with the caveat that something somewhere is likely to break on a negative timestamp".
This is the opposite of what you said previously, citing:
| There aren't any casts on the server side: __u32 is simply assigned to | time_t, meaning no sign-extension is happening
which I read as meaning you are using it as an unsigned number, because of the way the assignment operator works when you assign an unsigned 32-bit number to a signed 64-bit number.
All I was trying to convey throughout this thread is: we can't change just the kernel client. We can't add sign-extending casts without either adding the same casts in ceph.git (which is both servers and other clients) or explaining why it is OK to change just the kernel client in the commit message.
We are not looking at trying to reuse that extra bit for 1970..2106. As I said earlier, utime_t is used in quite a lot of places throughout the code base and is part of both in-core and on-disk data structures. Auditing all those sites for that is probably never going to happen. I think the plan is to live with what we've got until a proper 64-bit sec/nsec conversion is done in a way that was described earlier (message versions, ceph feature bit, etc). Any vfs timespec-handling changes should simply preserve the existing behavior, for now.
And that is a third option: preserve the existing behavior (different between 32-bit and 64-bit) even when we extend the 32-bit client to have 64-bit timestamps. I can't believe that is what you actually meant here but if you did, that would probably be the worst solution.
Please just make up your mind and say which behavior you want the kernel client to have when dealing with servers that communicate using 32-bit timestamps:
I want the kernel client to be bug compatible with servers and other clients in ceph.git.
a) Remain consistent with current 64-bit servers and clients, interpreting the number as an unsigned 32-bit integer where possible, and remain broken on existing 32-bit machines until they start using 64-bit time_t.
b) Change 64-bit clients to behave the same way that that 32-bit clients and servers do today, and actually use the times as a signed number that you claimed they already did.
c) Interpret all times before 1980 or after 2038 as buggy servers and only accept times that are always handled consistently. Also fix the server to reject those times coming from broken clients.
d) Leave 64-bit clients unchanged (using unsigned timestamps) and document that 32-bit clients should keep interpreting them as signed even when we fix VFS, to preserve the existing behavior.
I think a), b) and c) are all reasonable, and I can send a patch as soon as you know what you want. If you want d), please leave me out of that and do it yourself.
My preference would be a), possibly with some comments along the lines of c), to make the ..2038 range more explicit, as I doubt we've ever assumed ..2106 on 64-bit. This is all really old code and goes back to the dawn of ceph and beyond the kernel client, so let's wait for Sage to chime in.
Thanks,
Ilya
On Friday 19 February 2016 12:42:02 Ilya Dryomov wrote:
On Fri, Feb 19, 2016 at 11:00 AM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 18 February 2016 18:22:55 Ilya Dryomov wrote:
On Thu, Feb 18, 2016 at 5:20 PM, Arnd Bergmann arnd@arndb.de wrote:
On Thursday 18 February 2016 17:02:25 Ilya Dryomov wrote:
If we are going to touch this function at all, let's drop all casts and just add a comment along the lines of "the lack of cast/sign-extension here is intentional, this function has a counterpart on the server side which also lacks cast/sign-extension, etc". To someone who is not up on these subtleties, the __u64 cast is going to be more confusing than explicit...
We certainly a comment, I was just suggesting to add the cast as well.
Again, it's not the lack of the cast on the server that is important here, it's whether it gets interpreted as signed or unsigned at the point where the timestamp is actually used for comparison or printing. Let me suggest a comment here:
I'd argue it's important to have encode/decode code look the same.
I mentioned before that at least in some places it's interpreted as signed and that the interpretation is not totally up to the client. The printing (at least some of it) is right there in the same file (src/include/utime.h) and you can see ... + 1900.
I see this code:
class utime_t { struct { __u32 tv_sec, tv_nsec; } tv;
time_t sec() const { return tv.tv_sec; }
ostream& gmtime(ostream& out) const { out.setf(std::ios::right); char oldfill = out.fill(); out.fill('0'); if (sec() < ((time_t)(60*60*24*365*10))) { // raw seconds. this looks like a relative time. out << (long)sec() << "." << std::setw(6) << usec(); } else { // localtime. this looks like an absolute time. // aim for http://en.wikipedia.org/wiki/ISO_8601 struct tm bdt; time_t tt = sec(); gmtime_r(&tt, &bdt); out << std::setw(4) << (bdt.tm_year+1900) // 2007 -> '07' << '-' << std::setw(2) << (bdt.tm_mon+1) << '-' << std::setw(2) << bdt.tm_mday << ' ' << std::setw(2) << bdt.tm_hour << ':' << std::setw(2) << bdt.tm_min << ':' << std::setw(2) << bdt.tm_sec; out << "." << std::setw(6) << usec(); out << "Z"; } out.fill(oldfill); out.unsetf(std::ios::right); return out; }
which interprets the time roughly in the way I explained:
On 64-bit architectures, the time is interpreted as an unsigned number in the range from 1980-2106, while any times between 1970 and 1980 are interpreted as a relative time and printed as a positive seconds number.
On 32-bit architectures, times in the range 1980-2038 are printed as a date like 64-bit does, times from 1970-1980 are also printed as a positive seconds number, and anything with the top bit set is printed as a negative seconds number.
It this the intended behavior? I guess we could change the kernel to reject any timestamps before 1980 and after 2038 in ceph_decode_timespec and just set the Linux timestamp to (time_t)0 (Jan 1 1970) in that case, and document that there is no valid interpretation for those.
At least conceptually, this isn't a bad idea, as only 1970(1980)..2038 works on both architectures; everything else is likely to break something in ceph either on 32-bit or on both, I think. But, I'm not sure if rejecting by setting to 0 is going to work all around and we can't change just the kernel client.
Setting to 0 is a bit fishy, but I couldn't think of anything better either. Right now, we just overflow 32-bit timestamps all the time and get unpredictable results. I think we will end up implementing some form of truncation, so times earlier than the start of the file system specific epoch (1902, 1970, 1980, ... depending on the implementation) get set to the earliest supported time, while times after the last supported timestamp (2037, 2038, 2106, ...) get set to the maximum.
The problem here is that we don't even know whether an underflow or an overflow happened.
I want the kernel client to be bug compatible with servers and other clients in ceph.git.
a) Remain consistent with current 64-bit servers and clients, interpreting the number as an unsigned 32-bit integer where possible, and remain broken on existing 32-bit machines until they start using 64-bit time_t.
b) Change 64-bit clients to behave the same way that that 32-bit clients and servers do today, and actually use the times as a signed number that you claimed they already did.
c) Interpret all times before 1980 or after 2038 as buggy servers and only accept times that are always handled consistently. Also fix the server to reject those times coming from broken clients.
d) Leave 64-bit clients unchanged (using unsigned timestamps) and document that 32-bit clients should keep interpreting them as signed even when we fix VFS, to preserve the existing behavior.
I think a), b) and c) are all reasonable, and I can send a patch as soon as you know what you want. If you want d), please leave me out of that and do it yourself.
My preference would be a), possibly with some comments along the lines of c), to make the ..2038 range more explicit, as I doubt we've ever assumed ..2106 on 64-bit. This is all really old code and goes back to the dawn of ceph and beyond the kernel client, so let's wait for Sage to chime in.
Ok, thanks!
I think the same problem appeared in a lot of places when code was originally written with the assumption that time_t is 32-bit, and then it kept working in practice when ported to 64-bit systems for the first time but the underlying assumptions changed.
/* * ceph timestamp handling is traditionally inconsistent between * 32-bit clients and servers using them as a signed 32-bit time_t * ranging from 1902 to 2038, and 64-bit clients and servers * interpreting the same numbers as unsigned 32-bit integers * with negative numbers turning into dates between 2038 and 2106. * * Most clients and servers these days are 64-bit, so let's pretend * that interpreting the times as "__u32" is intended, and use * that on 32-bit systems as well once using a 64-bit time_t. */
Arnd
On Fri, 19 Feb 2016, Arnd Bergmann wrote:
On Friday 19 February 2016 12:42:02 Ilya Dryomov wrote:
I want the kernel client to be bug compatible with servers and other clients in ceph.git.
a) Remain consistent with current 64-bit servers and clients, interpreting the number as an unsigned 32-bit integer where possible, and remain broken on existing 32-bit machines until they start using 64-bit time_t.
b) Change 64-bit clients to behave the same way that that 32-bit clients and servers do today, and actually use the times as a signed number that you claimed they already did.
c) Interpret all times before 1980 or after 2038 as buggy servers and only accept times that are always handled consistently. Also fix the server to reject those times coming from broken clients.
d) Leave 64-bit clients unchanged (using unsigned timestamps) and document that 32-bit clients should keep interpreting them as signed even when we fix VFS, to preserve the existing behavior.
I think a), b) and c) are all reasonable, and I can send a patch as soon as you know what you want. If you want d), please leave me out of that and do it yourself.
My preference would be a), possibly with some comments along the lines of c), to make the ..2038 range more explicit, as I doubt we've ever assumed ..2106 on 64-bit. This is all really old code and goes back to the dawn of ceph and beyond the kernel client, so let's wait for Sage to chime in.
Ok, thanks!
I think the same problem appeared in a lot of places when code was originally written with the assumption that time_t is 32-bit, and then it kept working in practice when ported to 64-bit systems for the first time but the underlying assumptions changed.
/*
- ceph timestamp handling is traditionally inconsistent between
- 32-bit clients and servers using them as a signed 32-bit time_t
- ranging from 1902 to 2038, and 64-bit clients and servers
- interpreting the same numbers as unsigned 32-bit integers
- with negative numbers turning into dates between 2038 and 2106.
- Most clients and servers these days are 64-bit, so let's pretend
- that interpreting the times as "__u32" is intended, and use
- that on 32-bit systems as well once using a 64-bit time_t.
*/
This sounsd right to me. In practice we never have negative timestamps, so keeping the unsigned interpretation makes the most sense, and buys us several decades. Although it's pretty likely we'll switch to a 64-bit encoding long before then...
sage