On Wed, 16 Dec 2015, Daniel Lezcano wrote:
On 12/15/2015 11:16 PM, Nicolas Pitre wrote:
On Tue, 15 Dec 2015, Daniel Lezcano wrote:
+static void stats_add(struct stats *s, s32 value) +{
- int limit = 1 << 24;
- /*
* In order to prevent an overflow in the statistic code, we
* limit the value to 2^24 - 1, if it is greater we just
* ignore it with a WARN_ON_ONCE letting know to userspace we
* are dealing with out-of-range values.
*/
- if (WARN_ON_ONCE(value >= (limit - 1) || value < -limit))
return;
This limit check is rather useless. Given we're adding microsecs intervals here, 2^24 microsecs is equal to 4.6 hours. IRQ predictions of that nature make no sense. In fact, any IRQ interval greater than a few seconds should simply reset the stats.
OK, as noted below, I mixed up ms with us. 2^24 microsecs is actually 16 seconds. That's probably fairly common. More below.
Ok.
In the previous code, when the irq interval was greater than one second then we reset the stats. I noted a strange behavior with the network where the results were worst. Actually on my system test, the hardware is a bit weird: there are for the same cpu several interrupts for the same NIC and it does round-robin at several seconds interval, so all interrupts were always discarded. Removing the stats reset improved the situation. I preferred to remove this portion of code in order to refocus on it later with a rational rather than 'one second is long enough'.
I also, suspect we may endup with a tasklet to do more complex statistics, especially using a standard error for a burst of intervals.
Now... let's assume we do want to limit statistics on IRQs that are less than a few seconds appart. We can have everything fit in 32-bit values. Assuming the maximum value of an u32, the sum of squared values may not exceed (2^32)-1. You have STATS_MAX_VALUE amount of such values. That means:
sqrt( ((2^32)-1) / STATS_MAX_VALUE ) = 32767
So... with 4 values we can fit everything in 32 bits if the IRQ interval is limited to 32 seconds.
Why 32 seconds ? It isn't 32 milli-second ?
Doh. Indeed. Scratch that one.
That means the actual limit with 64 bits would be:
sqrt( ((2^64)-1) / STATS_MAX_VALUE ) = 2147483647
That's 35 minutes. No IRQ interval prediction should ever be that long. So the limit of 16 seconds above is questionnable and probably unjustified here.
The limit should be part of the policy at a higher level. Silently dropping intervals greater than 16 seconds without resetting the stats is _possibly_ a good thing to do. But right now it happens rather by accident rather than by design.
I'd suggest documenting the restriction for stats_add() saying that the value must be less than sqrt(((2^64)-1) / STATS_MAX_VALUE) and then enforce a cutoff policy at the caller. Right now it is 16 seconds, but maybe some other value might provide even better results.
Nicolas