On 12/15/2015 11:16 PM, Nicolas Pitre wrote:
On Tue, 15 Dec 2015, Daniel Lezcano wrote:
Many IRQs are quiet most of the time, or they tend to come in bursts of fairly equal time intervals within each burst. It is therefore possible to detect those IRQs with stable intervals and guestimate when the next IRQ event is most likely to happen.
Examples of such IRQs may include audio related IRQs where the FIFO size and/or DMA descriptor size with the sample rate create stable intervals, block devices during large data transfers, etc. Even network streaming of multimedia content creates patterns of periodic network interface IRQs in some cases.
This patch adds code to track the mean interval and variance for each IRQ over a window of time intervals between IRQ events. Those statistics can be used to assist cpuidle in selecting the most appropriate sleep state by predicting the most likely time for the next interrupt.
Because the stats are gathered in interrupt context, the core computation is as light as possible.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Comments below.
[ ... ]
+/*
+ * Define the number of values we will be dealing with to monitor
+ * the interruptions. We define 2^x values to do stats on and as it is
+ * a multiple of 2, bits shifting will be faster than an optimized
+ * division.
+ */
This comment is a bit confusing. What is 2^x relating to? And bit shifting is one possible form of an optimized division. Etc.
I'd suggest the following comment instead:
/*
 * Define the number of samples over which the average and variance
 * are computed. A power of 2 is preferred so as to let the compiler
 * optimize divisions by that number with simple arithmetic shifts.
*/
Ok.
+#define STATS_MAX_VALUE 4
This should also be something like STATS_MAX_NR_VALUES otherwise this could be interpreted as the maximum value used in the statistics.
Agree.
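Putting the two suggestions together, the reworded comment and the renamed define could look like this (just a sketch of the agreed changes, the final name is of course Daniel's call):

/*
 * Define the number of samples over which the average and variance
 * are computed. A power of 2 is preferred so as to let the compiler
 * optimize divisions by that number with simple arithmetic shifts.
 */
#define STATS_MAX_NR_VALUES 4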
+/**
+ * struct stats - internal structure to encapsulate stats information
+ * @sum: sum of the values
+ * @sq_sum: sum of the square values
+ * @values: array of values to do stats on
+ * @n: current number of values
+ * @w_ptr: current buffer pointer
+ */
+struct stats {
+	s64 sum; /* sum of values */
+	u64 sq_sum; /* sum of the square values */
+	s32 values[STATS_MAX_VALUE]; /* array of values */
+	unsigned int n; /* current number of values */
+	unsigned int w_ptr; /* current window pointer */
Given we're dealing with time intervals here, they will always be positive values. All fields can therefore be unsigned to make things a little more efficient.
Agree.
+};
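For illustration, with all fields made unsigned as suggested (and using the renamed define from above, both assumptions at this point), the struct might end up looking like this:

struct stats {
	u64 sum;                         /* sum of values */
	u64 sq_sum;                      /* sum of the squared values */
	u32 values[STATS_MAX_NR_VALUES]; /* array of values */
	unsigned int n;                  /* current number of values */
	unsigned int w_ptr;              /* current window pointer */
};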
+/**
+ * struct wakeup - internal structure describing a source of wakeup
+ * @stats: the stats structure on the different event intervals
+ * @timestamp: latest update timestamp
+ */
+struct wakeup {
+	struct stats stats;
+	ktime_t timestamp;
+};
+/*
+ * Per cpu and irq statistics. Each cpu receives interrupts and those
+ * ones can be distributed following an irq chip specific
+ * algorithm. Random irq distribution is the worst case to predict
+ * interruption behavior but usually that does not happen or could be
+ * fixed from userspace by setting the irq affinity.
+ */
+static DEFINE_PER_CPU(struct wakeup, *wakesup[NR_IRQS]);
s/wakesup/wakeups/
Noted.
+/**
+ * stats_add - add a new value in the statistic structure
+ * @s: the statistic structure
+ * @value: the new value to be added, max 2^24 - 1
+ *
+ * Adds the value to the array, if the array is full, then the array
+ * is shifted.
Strictly speaking, nothing is shifted. Instead you could say "if the array is full, the oldest value is replaced."
Agree.
+ */
+static void stats_add(struct stats *s, s32 value)
+{
+	int limit = 1 << 24;
+
+	/*
+	 * In order to prevent an overflow in the statistic code, we
+	 * limit the value to 2^24 - 1, if it is greater we just
+	 * ignore it with a WARN_ON_ONCE letting know to userspace we
+	 * are dealing with out-of-range values.
+	 */
+	if (WARN_ON_ONCE(value >= (limit - 1) || value < -limit))
+		return;
This limit check is rather useless. Given we're adding microsec intervals here, 2^24 microsecs is equal to about 16.8 seconds. IRQ predictions of that nature make no sense. In fact, any IRQ interval greater than a few seconds should simply reset the stats.
Ok.
In the previous code, when the irq interval was greater than one second we reset the stats. I noted a strange behavior with the network where the results were worse. Actually, on my test system the hardware is a bit weird: the same NIC has several interrupts routed to the same cpu and it round-robins between them at intervals of several seconds, so the interrupts were always discarded. Removing the stats reset improved the situation. I preferred to remove this portion of code in order to refocus on it later with a rationale rather than 'one second is long enough'.
I also suspect we may end up with a tasklet to do more complex statistics, especially using a standard error for a burst of intervals.
+	/*
+	 * Insert the value in the array. If the array is already
+	 * full, shift the values and add the value at the end of the
+	 * series, otherwise add the value directly.
+	 */
+	if (likely(s->n == STATS_MAX_VALUE)) {
+		s->sum -= s->values[s->w_ptr];
+		s->sq_sum -= s->values[s->w_ptr] * s->values[s->w_ptr];
Same overflow issue: you should cast one term to a 64-bit value, like you did below.
Good point.
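For reference, that would presumably mirror the cast used for the addition further down, e.g.:

		s->sq_sum -= (u64)s->values[s->w_ptr] * s->values[s->w_ptr];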
+		s->values[s->w_ptr] = value;
+		s->w_ptr = (s->w_ptr + 1) % STATS_MAX_VALUE;
+	} else {
+		s->values[s->n] = value;
+		s->n++;
+	}
+	/*
+	 * In order to reduce the overhead and to prevent value
+	 * derivation due to the integer computation, we just sum the
+	 * value and do the division when the average and the variance
+	 * are requested.
+	 */
+	s->sum += value;
+	s->sq_sum += (u64)value * value;
Now... let's assume we do want to limit statistics to IRQs that are less than a few seconds apart. We can have everything fit in 32-bit values. Assuming the maximum value of an u32, the sum of squared values may not exceed (2^32)-1. You have STATS_MAX_VALUE such values. That means:
sqrt( ((2^32)-1) / STATS_MAX_VALUE ) = 32767
So... with 4 values we can fit everything in 32 bits if the IRQ interval is limited to 32 seconds.
Why 32 seconds? Isn't it 32 milliseconds? sqrt(((2^32)-1)/4) is about 32767, and since the intervals are in microseconds that works out to roughly 32.8 ms.
If STATS_MAX_VALUE is 8 then the max interval should be 23 seconds. That interval is probably way larger than practically predictable.
[ ... ]
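The stats_mean() and stats_variance() helpers used below are elided from this quote; presumably they just reduce the running sums, something along these lines (a sketch under that assumption, not the actual patch code):

static s32 stats_mean(struct stats *s)
{
	/*
	 * Division by a constant power of two, so the compiler can
	 * turn it into a simple shift (assumes a full window).
	 */
	return s->sum / STATS_MAX_VALUE;
}

static u64 stats_variance(struct stats *s, s32 mean)
{
	/* E[X^2] - (E[X])^2, computed from the running sums */
	return (s->sq_sum / STATS_MAX_VALUE) - ((u64)mean * mean);
}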
+	/*
+	 * It is the first time the interrupt occurs in the series, we
+	 * can't do any stats as we don't have an interval, just store
+	 * the timestamp and exit.
+	 */
+	if (!ktime_to_us(w->timestamp)) {
Here ktime_to_us() implies a division. It is a division with a constant divisor so it gets optimized into a multiplication by the reciprocal, but still rather costly just for the purpose of comparing with zero.
Instead, you should use:
if (!w->timestamp.tv64)
Or if you want to preserve the ktime_t abstraction you could use:
if (ktime_equal(w->timestamp, ktime_set(0, 0)))
Ok.
+		w->timestamp = timestamp;
+		return;
+	}
+	/*
+	 * Microsec resolution is enough for our purpose.
+	 */
+	diff = ktime_us_delta(timestamp, w->timestamp);
+	w->timestamp = timestamp;
+
+	/*
+	 * Add the value, it results in:
+	 *  - two subtractions
+	 *  - two additions
+	 *  - two multiplications
+	 * (without the sanity check)
The (questionable) sanity check notwithstanding, if you want to be precise, you'd have to take into account the addition from the pointer updating and the modulo (which ought to be turned into a mask). The code is trivial and obvious enough (more so than what my code looked like) that it is probably not worth tracking those operations here.
Yes, this comment will lead to infinite discussions and will get in the way of adding new operations.
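(For completeness, the explicit mask form of the pointer update Nico alludes to would be, e.g.:

		s->w_ptr = (s->w_ptr + 1) & (STATS_MAX_VALUE - 1);

which is what the compiler generates anyway for a constant power-of-two modulo on an unsigned operand.)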
+	 */
+	stats_add(&w->stats, diff);
+}
+/*
+ * Callback to be called when an interrupt happens.
+ */
+static struct irqtimings irq_timings = {
+	.handler = sched_idle_irq,
+};
+static ktime_t next_irq_event(void)
+{
+	unsigned int irq, cpu = raw_smp_processor_id();
+	ktime_t diff, next, min = (ktime_t){ .tv64 = KTIME_MAX };
To be consistent, you should use:
min = ktime_set(KTIME_SEC_MAX, 0)
Same thing for the return value below.
Ok.
+	struct wakeup *w;
+	s32 interval, mean;
+	u64 variance;
+	/*
+	 * Lookup the interrupt array for this cpu and search for the
+	 * earliest expected interruption.
+	 */
+	for (irq = 0; irq < NR_IRQS; irq++) {
+		w = per_cpu(wakesup[irq], cpu);
+		/*
+		 * The interrupt was not setup as a source of a wakeup
+		 * or the wakeup source is not considered at this
+		 * moment stable enough to do a prediction.
+		 */
+		if (!w)
+			continue;
+		/*
+		 * No statistics available yet.
+		 */
+		if (!ktime_to_us(w->timestamp))
+			continue;
Same comment as before.
Ok.
+		/*
+		 * If the mean value is null, just ignore this wakeup
+		 * source.
+		 */
+		mean = stats_mean(&w->stats);
+		if (!mean)
+			continue;
+		variance = stats_variance(&w->stats, mean);
+		/*
+		 * We want to check the last interval is:
+		 *
+		 *  mean - stddev < interval < mean + stddev
+		 *
+		 * That simplifies to:
+		 *
+		 *  0 < interval < 2 * stddev
That isn't mathematically correct.
The only simplification you may do here is:
mean - stddev < interval < mean + stddev
-> -stddev < interval - mean < stddev
+		 *
+		 * The standard deviation is the sqrt of the variance:
+		 *
+		 *  0 < interval < 2 * sqrt(variance)
+		 *
+		 * and we want to prevent to do an sqrt, so we square
+		 * the equation:
+		 *
+		 *  0 < interval^2 < (2 * sqrt(variance))^2
+		 *
+		 *  0 < interval^2 < 4 * variance
Just imagine that you have a perfectly periodic interrupt that always happens at a precise interval. In that case the variance will be zero. You'll discard it although it is perfectly predictable.
Grumble ...
Given the correct expression, that should instead be:
-stddev < interval - mean < stddev
--> abs(interval - mean) < stddev
--> abs(interval - mean) < sqrt(variance)
--> (interval - mean)^2 < variance
Ok, I will fix this.
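Probably something like this (a sketch; 'delta' would be a new s64 local declared with the others):

		delta = (s64)interval - mean;
		if ((u64)(delta * delta) > variance)
			continue;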
+		 *
+		 * As time is a monotonic positive function, it is
+		 * always greater than zero:
+		 *
+		 *  interval^2 < 4 * variance
+		 *
+		 * So if the latest value of the stats complies with
+		 * this condition, then the wakeup source is
+		 * considered predictable and can be used to predict
+		 * the next event.
+		 */
+		interval = w->stats.values[w->stats.n];
+		if ((u64)interval * interval > 4 * variance)
+			continue;
[ ... ]
+/**
+ * sched_irq_timing_free - free_irq callback
+ * @irq: the irq number to stop tracking
+ * @dev_id: not used at the moment
+ *
+ * This function will remove the irq from the wakeup source prediction table.
+ */
+static void sched_irq_timing_free(unsigned int irq, void *dev_id)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct wakeup *w;
+	unsigned int cpu;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+	desc->timings = NULL;
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+	for_each_possible_cpu(cpu) {
+		w = per_cpu(wakesup[irq], cpu);
+		if (!w)
+			continue;
+		per_cpu(wakesup[irq], cpu) = NULL;
+		kfree(w);
What if another CPU is currently in next_irq_event and has a reference to w while it is freed?
At the first glance, I don't think that can happen.
If we are in next_irq_event(), the lock is held, preventing free_irq() from being called. Then interrupts are disabled and the irq handler is unset, preventing the interrupt handler with the next_irq_event hook from being called.
However the IRQ code is complex and I will double check this.
Thanks for the review Nico.
-- Daniel
[ ... ]