On 12/15/2015 11:16 PM, Nicolas Pitre wrote:
On Tue, 15 Dec 2015, Daniel Lezcano wrote:
Many IRQs are quiet most of the time, or they tend to come in bursts of fairly equal time intervals within each burst. It is therefore possible to detect those IRQs with stable intervals and guestimate when the next IRQ event is most likely to happen.
Examples of such IRQs may include audio related IRQs where the FIFO size and/or DMA descriptor size with the sample rate create stable intervals, block devices during large data transfers, etc. Even network streaming of multimedia content creates patterns of periodic network interface IRQs in some cases.
This patch adds code to track the mean interval and variance for each IRQ over a window of time intervals between IRQ events. Those statistics can be used to assist cpuidle in selecting the most appropriate sleep state by predicting the most likely time for the next interrupt.
Because the stats are gathered in interrupt context, the core computation is as light as possible.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Comments below.
[ ... ]
+/*
+ * Define the number of values we will be dealing with to monitor
+ * the interruptions. We define 2^x values to do stats on and as it is
+ * a multiple of 2, bits shifting will be faster than an optimized
+ * division.
+ */
This comment is a bit confusing. What is 2^x relating to? And bit shifting is one possible form of an optimized division. Etc.
I'd suggest the following comment instead:
/*
 * Define the number of samples over which the average and variance
 * are computed. A power of 2 is preferred so as to let the compiler
 * optimize divisions by that number with simple arithmetic shifts.
*/
Ok.
+#define STATS_MAX_VALUE 4
This should also be something like STATS_MAX_NR_VALUES otherwise this could be interpreted as the maximum value used in the statistics.
Agree.
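Putting the two suggestions together, the reworded comment and the renamed define could look like this (just a sketch of the agreed changes, the final name is of course Daniel's call):

/*
 * Define the number of samples over which the average and variance
 * are computed. A power of 2 is preferred so as to let the compiler
 * optimize divisions by that number with simple arithmetic shifts.
 */
#define STATS_MAX_NR_VALUES 4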
+/**
+ * struct stats - internal structure to encapsulate stats information
+ * @sum: sum of the values
+ * @sq_sum: sum of the square values
+ * @values: array of values to do stats on
+ * @n: current number of values
+ * @w_ptr: current buffer pointer
+ */
+struct stats {
+	s64 sum; /* sum of values */
+	u64 sq_sum; /* sum of the square values */
+	s32 values[STATS_MAX_VALUE]; /* array of values */
+	unsigned int n; /* current number of values */
+	unsigned int w_ptr; /* current window pointer */
Given we're dealing with time intervals here, they will always be positive values. All fields can therefore be unsigned to make things a little more efficient.
Agree.
+};
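For illustration, with all fields made unsigned as suggested (and using the renamed define from above, both assumptions at this point), the struct might end up looking like this:

struct stats {
	u64 sum;                         /* sum of values */
	u64 sq_sum;                      /* sum of the squared values */
	u32 values[STATS_MAX_NR_VALUES]; /* array of values */
	unsigned int n;                  /* current number of values */
	unsigned int w_ptr;              /* current window pointer */
};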
+/**
+ * struct wakeup - internal structure describing a source of wakeup
+ * @stats: the stats structure on the different event intervals
+ * @timestamp: latest update timestamp
+ */
+struct wakeup {
+	struct stats stats;
+	ktime_t timestamp;
+};
+/*
+ * Per cpu and irq statistics. Each cpu receives interrupts and those
+ * ones can be distributed following an irq chip specific
+ * algorithm. Random irq distribution is the worst case to predict
+ * interruption behavior but usually that does not happen or could be
+ * fixed from userspace by setting the irq affinity.
+ */
+static DEFINE_PER_CPU(struct wakeup, *wakesup[NR_IRQS]);
s/wakesup/wakeups/
Noted.
+/**
+ * stats_add - add a new value in the statistic structure
+ * @s: the statistic structure
+ * @value: the new value to be added, max 2^24 - 1
+ *
+ * Adds the value to the array, if the array is full, then the array
+ * is shifted.
Strictly speaking, nothing is shifted. Instead you could say "if the array is full, the oldest value is replaced."
Agree.
+ */
+static void stats_add(struct stats *s, s32 value)
+{
+	int limit = 1 << 24;
+
+	/*
+	 * In order to prevent an overflow in the statistic code, we
+	 * limit the value to 2^24 - 1, if it is greater we just
+	 * ignore it with a WARN_ON_ONCE letting know to userspace we
+	 * are dealing with out-of-range values.
+	 */
+	if (WARN_ON_ONCE(value >= (limit - 1) || value < -limit))
+		return;
This limit check is rather useless. Given we're adding microsec intervals here, 2^24 microsecs is equal to about 16.8 seconds. IRQ predictions of that nature make no sense. In fact, any IRQ interval greater than a few seconds should simply reset the stats.
Ok.
In the previous code, when the irq interval was greater than one second we reset the stats. I noted a strange behavior with the network where the results were worse. Actually, on my test system the hardware is a bit weird: the same NIC has several interrupts routed to the same cpu and it round-robins between them at intervals of several seconds, so the interrupts were always discarded. Removing the stats reset improved the situation. I preferred to remove this portion of code in order to refocus on it later with a rationale rather than 'one second is long enough'.
I also suspect we may end up with a tasklet to do more complex statistics, especially using a standard error for a burst of intervals.
+	/*
+	 * Insert the value in the array. If the array is already
+	 * full, shift the values and add the value at the end of the
+	 * series, otherwise add the value directly.
+	 */
+	if (likely(s->n == STATS_MAX_VALUE)) {
+		s->sum -= s->values[s->w_ptr];
+		s->sq_sum -= s->values[s->w_ptr] * s->values[s->w_ptr];
Same overflow issue: you should cast one term to a 64-bit value, like you did below.
Good point.
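For reference, that would presumably mirror the cast used for the addition further down, e.g.:

		s->sq_sum -= (u64)s->values[s->w_ptr] * s->values[s->w_ptr];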
+		s->values[s->w_ptr] = value;
+		s->w_ptr = (s->w_ptr + 1) % STATS_MAX_VALUE;
+	} else {
+		s->values[s->n] = value;
+		s->n++;
+	}
+	/*
+	 * In order to reduce the overhead and to prevent value
+	 * derivation due to the integer computation, we just sum the
+	 * value and do the division when the average and the variance
+	 * are requested.
+	 */
+	s->sum += value;
+	s->sq_sum += (u64)value * value;
Now... let's assume we do want to limit statistics to IRQs that are less than a few seconds apart. We can have everything fit in 32-bit values. Assuming the maximum value of an u32, the sum of squared values may not exceed (2^32)-1. You have STATS_MAX_VALUE such values. That means:
sqrt( ((2^32)-1) / STATS_MAX_VALUE ) = 32767
So... with 4 values we can fit everything in 32 bits if the IRQ interval is limited to 32 seconds.
Why 32 seconds? Isn't it 32 milliseconds? sqrt(((2^32)-1)/4) is about 32767, and since the intervals are in microseconds that works out to roughly 32.8 ms.
If STATS_MAX_VALUE is 8 then the max interval should be 23 seconds. That interval is probably way larger than practically predictable.
[ ... ]
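The stats_mean() and stats_variance() helpers used below are elided from this quote; presumably they just reduce the running sums, something along these lines (a sketch under that assumption, not the actual patch code):

static s32 stats_mean(struct stats *s)
{
	/*
	 * Division by a constant power of two, so the compiler can
	 * turn it into a simple shift (assumes a full window).
	 */
	return s->sum / STATS_MAX_VALUE;
}

static u64 stats_variance(struct stats *s, s32 mean)
{
	/* E[X^2] - (E[X])^2, computed from the running sums */
	return (s->sq_sum / STATS_MAX_VALUE) - ((u64)mean * mean);
}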
+	/*
+	 * It is the first time the interrupt occurs in the series, we
+	 * can't do any stats as we don't have an interval, just store
+	 * the timestamp and exit.
+	 */
+	if (!ktime_to_us(w->timestamp)) {
Here ktime_to_us() implies a division. It is a division with a constant divisor so it gets optimized into a multiplication by the reciprocal, but still rather costly just for the purpose of comparing with zero.
Instead, you should use:
if (!w->timestamp.tv64)
Or if you want to preserve the ktime_t abstraction you could use:
if (ktime_equal(w->timestamp, ktime_set(0, 0)))
Ok.
+		w->timestamp = timestamp;
+		return;
+	}
+	/*
+	 * Microsec resolution is enough for our purpose.
+	 */
+	diff = ktime_us_delta(timestamp, w->timestamp);
+	w->timestamp = timestamp;
+
+	/*
+	 * Add the value, it results in:
+	 *  - two subtractions
+	 *  - two additions
+	 *  - two multiplications
+	 * (without the sanity check)
The (questionable) sanity check notwithstanding, if you want to be precise, you'd have to take into account the addition from the pointer updating and the modulo (which ought to be turned into a mask). The code is trivial and obvious enough (more so than what my code looked like) that it is probably not worth tracking those operations here.
Yes, this comment will lead to infinite discussions and will get in the way of adding new operations.
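(For completeness, the explicit mask form of the pointer update Nico alludes to would be, e.g.:

		s->w_ptr = (s->w_ptr + 1) & (STATS_MAX_VALUE - 1);

which is what the compiler generates anyway for a constant power-of-two modulo on an unsigned operand.)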
+	 */
+	stats_add(&w->stats, diff);
+}
+/*
+ * Callback to be called when an interrupt happens.
+ */
+static struct irqtimings irq_timings = {
+	.handler = sched_idle_irq,
+};
+static ktime_t next_irq_event(void)
+{
+	unsigned int irq, cpu = raw_smp_processor_id();
+	ktime_t diff, next, min = (ktime_t){ .tv64 = KTIME_MAX };
To be consistent, you should use:
min = ktime_set(KTIME_SEC_MAX, 0)
Same thing for the return value below.
Ok.
+	struct wakeup *w;
+	s32 interval, mean;
+	u64 variance;
+	/*
+	 * Lookup the interrupt array for this cpu and search for the
+	 * earliest expected interruption.
+	 */
+	for (irq = 0; irq < NR_IRQS; irq++) {
+		w = per_cpu(wakesup[irq], cpu);
+		/*
+		 * The interrupt was not setup as a source of a wakeup
+		 * or the wakeup source is not considered at this
+		 * moment stable enough to do a prediction.
+		 */
+		if (!w)
+			continue;
+		/*
+		 * No statistics available yet.
+		 */
+		if (!ktime_to_us(w->timestamp))
+			continue;
Same comment as before.
Ok.
+		/*
+		 * If the mean value is null, just ignore this wakeup
+		 * source.
+		 */
+		mean = stats_mean(&w->stats);
+		if (!mean)
+			continue;
+		variance = stats_variance(&w->stats, mean);
+		/*
+		 * We want to check the last interval is:
+		 *
+		 *  mean - stddev < interval < mean + stddev
+		 *
+		 * That simplifies to:
+		 *
+		 *  0 < interval < 2 * stddev
That isn't mathematically correct.
The only simplification you may do here is:
mean - stddev < interval < mean + stddev
-> -stddev < interval - mean < stddev
+		 *
+		 * The standard deviation is the sqrt of the variance:
+		 *
+		 *  0 < interval < 2 * sqrt(variance)
+		 *
+		 * and we want to prevent to do an sqrt, so we square
+		 * the equation:
+		 *
+		 *  0 < interval^2 < (2 * sqrt(variance))^2
+		 *
+		 *  0 < interval^2 < 4 * variance
Just imagine that you have a perfectly periodic interrupt that always happens at a precise interval. In that case the variance will be zero. You'll discard it although it is perfectly predictable.
Grumble ...
Given the correct expression, that should instead be:
-stddev < interval - mean < stddev
--> abs(interval - mean) < stddev
--> abs(interval - mean) < sqrt(variance)
--> (interval - mean)^2 < variance
Ok, I will fix this.
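Probably something like this (a sketch; 'delta' would be a new s64 local declared with the others):

		delta = (s64)interval - mean;
		if ((u64)(delta * delta) > variance)
			continue;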
+		 *
+		 * As time is a monotonic positive function, it is
+		 * always greater than zero:
+		 *
+		 *  interval^2 < 4 * variance
+		 *
+		 * So if the latest value of the stats complies with
+		 * this condition, then the wakeup source is
+		 * considered predictable and can be used to predict
+		 * the next event.
+		 */
+		interval = w->stats.values[w->stats.n];
+		if ((u64)interval * interval > 4 * variance)
+			continue;
[ ... ]
+/**
+ * sched_irq_timing_free - free_irq callback
+ * @irq: the irq number to stop tracking
+ * @dev_id: not used at the moment
+ *
+ * This function will remove the irq from the wakeup source prediction table.
+ */
+static void sched_irq_timing_free(unsigned int irq, void *dev_id)
+{
+	struct irq_desc *desc = irq_to_desc(irq);
+	struct wakeup *w;
+	unsigned int cpu;
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&desc->lock, flags);
+	desc->timings = NULL;
+	raw_spin_unlock_irqrestore(&desc->lock, flags);
+	for_each_possible_cpu(cpu) {
+		w = per_cpu(wakesup[irq], cpu);
+		if (!w)
+			continue;
+		per_cpu(wakesup[irq], cpu) = NULL;
+		kfree(w);
What if another CPU is currently in next_irq_event and has a reference to w while it is freed?
At the first glance, I don't think that can happen.
If we are in next_irq_event(), the lock is held, preventing free_irq() from being called. Then interrupts are disabled and the irq handler is unset, preventing the interrupt handler with the next_irq_event hook from being called.
However the IRQ code is complex and I will double check this.
Thanks for the review Nico.
-- Daniel
[ ... ]