On Thu, 10 Dec 2015, Daniel Lezcano wrote:
On 12/10/2015 06:16 PM, Nicolas Pitre wrote:
On Thu, 10 Dec 2015, Daniel Lezcano wrote:
+u32 stats_variance(struct stats *s) +{
- s32 mean = stats_mean(s);
- return s->n ? (s->sq_sum / s->n) - (mean * mean) : 0;
Here sq_sum is a u64 value. On 32-bit architectures you are not allowed to do a straight division with 64-bit dividends in the kernel. You have to use do_div(). The do_div() is there to make the division much less costly than the plain division where gcc promotes both the dividend and the divisor to 64 bits. Still, do_div() is still a very expensive operation when the divisor isn't constant.
Ah, ok. Will fix this.
Note that this is the reason why in my version of the IRQ stat code, I avoided all non constant divisions by factoring out the n term. Here you have:
sq_sum / n - mean * mean
where mean is obtained by:
sum / n
So by calling stats_variance() you pay the cost of two expensive divisions. If you call stats_mean() separately from higher level code (as you do in patch #3) then you add another division to the mix.
One reason I didn't use a library abstraction in my own implementation is because I wanted to have n always be a constant for such divisions (plain or do_div() style) so that gcc could optimize them with reciprocal multiplications at compile time. And some other divisions were factored out and postponed to the end, etc.
Furthermore, given you allow for values with a 24-bit magnitude, it is possible for the mean to have that magnitude. With mean being a s32 variable, any mean value with a magnitude greater than 15 bits (or actually 46340 to be precise) will overflow the product above.
Thanks for spotting this.
I think the return type is wrong, it should be an u64.
Right.
But then how behaves an u64 with int_sqrt which is based on BITS_PER_LONG ?
The prototype has an unsigned long for argument. Therefore, on a 32-bit architecture, your u64 value will be truncated.
+} +EXPORT_SYMBOL_GPL(stats_variance);
+/**
- stats_stddev - compute the standard deviation
- @s: the statistic structure
- Returns an u32 corresponding to the standard deviation, or zero if
- there is no data
- */
+u32 stats_stddev(struct stats *s) +{
- return int_sqrt(stats_variance(s));
+} +EXPORT_SYMBOL_GPL(stats_stddev);
You realize that int_sqrt() is also somewhat expensive.
Yes, this is why the stats library should be used wisely.
+/**
- stats_add - add a new value in the statistic structure
- @s: the statistic structure
- @value: the new value to be added, max 2^24 - 1
- Adds the value to the array, if the array is full, then the array
- is shifted.
- */
+void stats_add(struct stats *s, s32 value) +{
- /*
* In order to prevent an overflow in the statistic code, we
* limit the value to 2^24 - 1, if it is greater we just
* ignore it with a WARN_ON_ONCE letting know to userspace we
* are dealing with out-of-range values.
*/
- if (WARN_ON_ONCE(value >= ((2<<24) - 1)))
return;
You don't trap large negative values?
Right, fixed now. Thanks !
/*
* Insert the value in the array. If the array is already
* full, shift the values and add the value at the end of the
* series, otherwise add the value directly.
*/
- if (likely(s->len == s->n)) {
s->sum -= s->values[s->w_ptr];
s->sq_sum -= s->values[s->w_ptr] * s->values[s->w_ptr];
s->values[s->w_ptr] = value;
s->w_ptr = (s->w_ptr + 1) % s->len;
- } else {
s->values[s->n] = value;
s->n++;
- }
- /*
* Keep track of the min and max.
*/
- s->min = min(s->min, value);
- s->max = max(s->max, value);
/*
* In order to reduce the overhead and to prevent value
* derivation due to the integer computation, we just sum the
* value and do the division when the average and the variance
* are requested.
*/
- s->sum += value;
- s->sq_sum += value * value;
Same issue as in stats_variance(): given value is a s32 variable, any value greater than 46340 will overflow the value * value product. For example, if value = 46341, you'll actually add -2147479015 (yes, a large negative value) to s->sq_sum -- probably not what you intended. And because s->sq_sum is unsigned, then that negative value will then be interpreted as a very large positive value later on.
Ok, I trust what you say but 46340 is a small value and I did not see this overflow occurring at runtime in the traces. May be I missed it.
What will you suggest ?
s->sq_sum += (s64)value * value;
This way both terms will be promoted to s64. On ARM, gcc is smart enough to use the smlal instruction that performs it all, including the addition.
Nicolas