Fix the following compilation warnings:
rt-app_utils.c: In function ‘gettid’:
rt-app_utils.c:150:9: warning: implicit declaration of function ‘syscall’ [-Wimplicit-function-declaration]
return syscall(__NR_gettid);
^
rt-app_utils.c: In function ‘ftrace_write’:
rt-app_utils.c:277:4: warning: implicit declaration of function ‘write’ [-Wimplicit-function-declaration]
write(mark_fd, tmp, n);
^
mv -f .deps/rt-app_utils.Tpo .deps/rt-app_utils.Po
gcc -DHAVE_CONFIG_H -I. -I./../libdl/ -g -O2 -MT rt-app_args.o -MD -MP -MF .deps/rt-app_args.Tpo -c -o rt-app_args.o rt-app_args.c
rt-app_args.c: In function ‘parse_command_line’:
rt-app_args.c:44:3: warning: implicit declaration of function ‘parse_config’ [-Wimplicit-function-declaration]
parse_config(argv[1], opts);
^
rt-app_args.c:47:3: warning: implicit declaration of function ‘parse_config_stdin’ [-Wimplicit-function-declaration]
parse_config_stdin(opts);
^
mv -f .deps/rt-app_args.Tpo .deps/rt-app_args.Po
gcc -DHAVE_CONFIG_H -I. -I./../libdl/ -g -O2 -MT rt-app.o -MD -MP -MF .deps/rt-app.Tpo -c -o rt-app.o rt-app.c
rt-app.c:173:15: warning: return type defaults to ‘int’ [-Wimplicit-int]
static inline loadwait(unsigned long exec)
^
rt-app.c: In function ‘ioload’:
rt-app.c:195:9: warning: implicit declaration of function ‘write’ [-Wimplicit-function-declaration]
ret = write(io_fd, iomem->ptr, size);
^
rt-app.c: In function ‘run’:
rt-app.c:340:4: warning: ‘return’ with no value, in function returning non-void
return;
^
rt-app.c: In function ‘shutdown’:
rt-app.c:381:3: warning: implicit declaration of function ‘close’ [-Wimplicit-function-declaration]
close(ft_data.trace_fd);
^
rt-app.c: In function ‘main’:
rt-app.c:848:3: warning: implicit declaration of function ‘sleep’ [-Wimplicit-function-declaration]
sleep(opts.duration);
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
src/rt-app.c | 5 +++--
src/rt-app_args.c | 1 +
src/rt-app_parse_config.h | 2 ++
src/rt-app_utils.c | 2 +-
4 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/src/rt-app.c b/src/rt-app.c
index fef12d8..c3e5df4 100644
--- a/src/rt-app.c
+++ b/src/rt-app.c
@@ -21,6 +21,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
#define _GNU_SOURCE
#include <fcntl.h>
+#include <unistd.h>
#include "rt-app.h"
#include "rt-app_utils.h"
#include <sched.h>
@@ -170,7 +171,7 @@ int calibrate_cpu_cycles(int clock)
}
-static inline loadwait(unsigned long exec)
+static inline unsigned long loadwait(unsigned long exec)
{
unsigned long load_count;
@@ -337,7 +338,7 @@ int run(int ind, event_data_t *events,
for (i = 0; i < nbevents; i++)
{
if (!continue_running && !lock)
- return;
+ return 0;
log_debug("[%d] runs events %d type %d ", ind, i, events[i].type);
if (opts.ftrace)
diff --git a/src/rt-app_args.c b/src/rt-app_args.c
index e16415d..c4d56de 100644
--- a/src/rt-app_args.c
+++ b/src/rt-app_args.c
@@ -19,6 +19,7 @@ along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
+#include "rt-app_parse_config.h"
#include "rt-app_args.h"
void
diff --git a/src/rt-app_parse_config.h b/src/rt-app_parse_config.h
index 023cabd..9b0e5fa 100644
--- a/src/rt-app_parse_config.h
+++ b/src/rt-app_parse_config.h
@@ -45,5 +45,7 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
void
parse_config(const char *filename, rtapp_options_t *opts);
+void
+parse_config_stdin(rtapp_options_t *opts);
#endif // _RTAPP_PARSE_CONFIG_H
diff --git a/src/rt-app_utils.c b/src/rt-app_utils.c
index c4840db..190affc 100644
--- a/src/rt-app_utils.c
+++ b/src/rt-app_utils.c
@@ -18,7 +18,7 @@ You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
*/
-
+#include <unistd.h>
#include "rt-app_utils.h"
unsigned long
--
1.9.1
The interrupt framework gives a lot of information about each interrupt
(time accounting, statistics, ...).
Unfortunately there is no way to measure when interrupts occur and to provide
a mathematical model for their behavior, which could help in predicting
their next occurrence.
This framework allows a subsystem to register a handler that is invoked each
time an interrupt occurs, passing it the interval elapsed since the previous
occurrence of that interrupt. That gives other subsystems the ability to
compute predictions for the next interrupt occurrence.
The main objective is to track and detect periodic interrupts in order to
predict the next event on a cpu and to anticipate the sleeping time when
entering idle. This fine-grained approach makes it possible to simplify and
rationalize the wakeup event prediction without IPI interference, thus
letting the scheduler be smarter with the wakeup IPIs regarding the idle
period.
In the proof-of-concept, the irq timings tracking showed an improvement in
the predictions. The approach is correct, but my knowledge of the irq
subsystem is limited; I am not sure this patch measuring the irq time
interval is correct or acceptable, so it is at the RFC stage (minus some
polishing).
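For illustration only (this is not part of the patch), a consumer of the
framework would provide a handler matching irqt_handler_t and register a
struct irqtimings_ops; the my_* names below are hypothetical, and how
desc->timings actually gets populated is left to the consumer code:

static void my_irqt_handler(unsigned int irq, ktime_t timestamp,
			    void *dev_id, void *data)
{
	/* record the timestamp, compute the interval since the previous one */
}

static int my_irqt_setup(unsigned int irq, struct irqaction *act)
{
	/* allocate per-irq state and attach my_irqt_handler to it */
	return 0;
}

static void my_irqt_free(unsigned int irq, void *dev_id)
{
	/* release the per-irq state */
}

static struct irqtimings_ops my_irqt_ops = {
	.setup	= my_irqt_setup,
	.free	= my_irqt_free,
};

static int __init my_consumer_init(void)
{
	return register_irq_timings(&my_irqt_ops);
}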
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
include/linux/interrupt.h | 45 ++++++++++++++++++++++++++++++++
include/linux/irqdesc.h | 3 +++
kernel/irq/Kconfig | 4 +++
kernel/irq/handle.c | 12 +++++++++
kernel/irq/manage.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++-
5 files changed, 128 insertions(+), 1 deletion(-)
diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index be7e75c..f48e8ff 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -123,6 +123,51 @@ struct irqaction {
extern irqreturn_t no_action(int cpl, void *dev_id);
+#ifdef CONFIG_IRQ_TIMINGS
+/**
+ * timing handler to be called when an interrupt happens
+ */
+typedef void (*irqt_handler_t)(unsigned int, ktime_t, void *, void *);
+
+/**
+ * struct irqtimings - per interrupt irq timings descriptor
+ * @handler: interrupt handler timings function
+ * @data: pointer to the private data to be passed to the handler
+ * @timestamp: latest interrupt occurrence
+ */
+struct irqtimings {
+ irqt_handler_t handler;
+ void *data;
+} ____cacheline_internodealigned_in_smp;
+
+/**
+ * struct irqtimings_ops - structure to be used by the subsystem to call the
+ * register and unregister ops when an irq is setup or freed.
+ * @setup: registering callback
+ * @free: unregistering callback
+ *
+ * The callbacks assume the lock is held on the irq desc
+ */
+struct irqtimings_ops {
+ int (*setup)(unsigned int, struct irqaction *);
+ void (*free)(unsigned int, void *);
+};
+
+extern int register_irq_timings(struct irqtimings_ops *ops);
+extern int setup_irq_timings(unsigned int irq, struct irqaction *act);
+extern void free_irq_timings(unsigned int irq, void *dev_id);
+#else
+static inline int setup_irq_timings(unsigned int irq, struct irqaction *act)
+{
+ return 0;
+}
+
+static inline void free_irq_timings(unsigned int irq, void *dev_id)
+{
+ ;
+}
+#endif
+
extern int __must_check
request_threaded_irq(unsigned int irq, irq_handler_t handler,
irq_handler_t thread_fn,
diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h
index a587a33..e0d4263 100644
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -51,6 +51,9 @@ struct irq_desc {
#ifdef CONFIG_IRQ_PREFLOW_FASTEOI
irq_preflow_handler_t preflow_handler;
#endif
+#ifdef CONFIG_IRQ_TIMINGS
+ struct irqtimings *timings;
+#endif
struct irqaction *action; /* IRQ action list */
unsigned int status_use_accessors;
unsigned int core_internal_state__do_not_mess_with_it;
diff --git a/kernel/irq/Kconfig b/kernel/irq/Kconfig
index 9a76e3b..1275fd1 100644
--- a/kernel/irq/Kconfig
+++ b/kernel/irq/Kconfig
@@ -73,6 +73,10 @@ config GENERIC_MSI_IRQ_DOMAIN
config HANDLE_DOMAIN_IRQ
bool
+config IRQ_TIMINGS
+ bool
+ default y
+
config IRQ_DOMAIN_DEBUG
bool "Expose hardware/virtual IRQ mapping via debugfs"
depends on IRQ_DOMAIN && DEBUG_FS
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index e25a83b..ca8b0c5 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -132,6 +132,17 @@ void __irq_wake_thread(struct irq_desc *desc, struct irqaction *action)
wake_up_process(action->thread);
}
+#ifdef CONFIG_IRQ_TIMINGS
+void handle_irqt_event(struct irqtimings *irqt, struct irqaction *action)
+{
+ if (irqt)
+ irqt->handler(action->irq, ktime_get(),
+ action->dev_id, irqt->data);
+}
+#else
+#define handle_irqt_event(a, b)
+#endif
+
irqreturn_t
handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
{
@@ -165,6 +176,7 @@ handle_irq_event_percpu(struct irq_desc *desc, struct irqaction *action)
/* Fall through to add to randomness */
case IRQ_HANDLED:
flags |= action->flags;
+ handle_irqt_event(desc->timings, action);
break;
default:
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index f9a59f6..21cc7bf 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1017,6 +1017,60 @@ static void irq_release_resources(struct irq_desc *desc)
c->irq_release_resources(d);
}
+#ifdef CONFIG_IRQ_TIMINGS
+/*
+ * Global variable, only used by the accessor functions below. Currently
+ * only one user is allowed, and it is up to the caller to take care of
+ * the irqs which were already set up before the ops were registered.
+ */
+static struct irqtimings_ops *irqtimings_ops;
+
+/**
+ * register_irq_timings - register the ops when an irq is setup or freed
+ *
+ * @ops: the register/unregister ops to be called at irq setup or
+ * free time
+ *
+ * Returns -EBUSY if the slot is already in use, zero on success.
+ */
+int register_irq_timings(struct irqtimings_ops *ops)
+{
+ if (irqtimings_ops)
+ return -EBUSY;
+
+ irqtimings_ops = ops;
+
+ return 0;
+}
+
+/**
+ * setup_irq_timings - call the timing register callback
+ *
+ * @irq: the interrupt number
+ * @act: the irqaction being set up
+ *
+ * Returns -EINVAL in case of error, zero on success.
+ */
+int setup_irq_timings(unsigned int irq, struct irqaction *act)
+{
+ if (irqtimings_ops && irqtimings_ops->setup)
+ return irqtimings_ops->setup(irq, act);
+ return 0;
+}
+
+/**
+ * free_irq_timings - call the timing unregister callback
+ *
+ * @irq: the interrupt number
+ * @dev_id: the device id
+ *
+ */
+void free_irq_timings(unsigned int irq, void *dev_id)
+{
+ if (irqtimings_ops && irqtimings_ops->free)
+ irqtimings_ops->free(irq, dev_id);
+}
+#endif /* CONFIG_IRQ_TIMINGS */
+
/*
* Internal function to register an irqaction - typically used to
* allocate special interrupts that are part of the architecture.
@@ -1037,6 +1091,9 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
if (!try_module_get(desc->owner))
return -ENODEV;
+ ret = setup_irq_timings(irq, new);
+ if (ret)
+ goto out_mput;
/*
* Check whether the interrupt nests into another interrupt
* thread.
@@ -1045,7 +1102,7 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
if (nested) {
if (!new->thread_fn) {
ret = -EINVAL;
- goto out_mput;
+ goto out_free_timings;
}
/*
* Replace the primary handler which was provided from
@@ -1323,6 +1380,10 @@ out_thread:
kthread_stop(t);
put_task_struct(t);
}
+
+out_free_timings:
+ free_irq_timings(irq, new->dev_id);
+
out_mput:
module_put(desc->owner);
return ret;
@@ -1408,6 +1469,8 @@ static struct irqaction *__free_irq(unsigned int irq, void *dev_id)
unregister_handler_proc(irq, action);
+ free_irq_timings(irq, dev_id);
+
/* Make sure it's not being used on another CPU: */
synchronize_irq(irq);
--
1.9.1
It is usually interesting to do some statistics on a set of data,
especially when the values come in as a stream (measured one after another,
for instance).
Instead of having the statistics formulas inside a specific subsystem,
this small library makes the basic statistics functions available to the
whole kernel.
The library is designed to do the minimum of computation when a new value
is added. Only the basic value storage, array shifting and accumulation
are done.
The average, variance and standard deviation are computed when requested
via the corresponding functions.
The statistics library can handle up to 65536 values in the range -2^24
to 2^24 - 1. These are large limits and it does not really make sense for
kernel code to use the statistics at these extremes, so it is up to the
developer to use the library wisely.
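As an illustration of the intended usage (not part of the patch;
'measured_value' is just a placeholder for whatever is being sampled):

	struct stats *s;

	s = stats_alloc(32);	/* sliding window of the last 32 values */
	if (!s)
		return -ENOMEM;

	/* feed the stream, one measurement at a time */
	stats_add(s, measured_value);

	pr_debug("n=%u mean=%d stddev=%u\n",
		 stats_n(s), stats_mean(s), stats_stddev(s));

	stats_free(s);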
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
include/linux/stats.h | 29 +++++++
lib/Makefile | 3 +-
lib/stats.c | 235 ++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 266 insertions(+), 1 deletion(-)
create mode 100644 include/linux/stats.h
create mode 100644 lib/stats.c
diff --git a/include/linux/stats.h b/include/linux/stats.h
new file mode 100644
index 0000000..664eb30
--- /dev/null
+++ b/include/linux/stats.h
@@ -0,0 +1,29 @@
+/*
+ * include/linux/stats.h
+ */
+#ifndef _LINUX_STATS_H
+#define _LINUX_STATS_H
+
+struct stats {
+ s64 sum; /* sum of values */
+ u64 sq_sum; /* sum of the square values */
+ s32 *values; /* array of values */
+ s32 min; /* minimal value of the entire series */
+ s32 max; /* maximal value of the entire series */
+ unsigned int n; /* current number of values */
+ unsigned int w_ptr; /* current window pointer */
+ unsigned short len; /* size of the value array */
+};
+
+extern s32 stats_max(struct stats *s);
+extern s32 stats_min(struct stats *s);
+extern s32 stats_mean(struct stats *s);
+extern u32 stats_variance(struct stats *s);
+extern u32 stats_stddev(struct stats *s);
+extern void stats_add(struct stats *s, s32 val);
+extern void stats_reset(struct stats *s);
+extern void stats_free(struct stats *s);
+extern struct stats *stats_alloc(unsigned short len);
+extern unsigned short stats_n(struct stats *s);
+
+#endif /* _LINUX_STATS_H */
diff --git a/lib/Makefile b/lib/Makefile
index 13a7c6a..18460d2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -26,7 +26,8 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
bsearch.o find_bit.o llist.o memweight.o kfifo.o \
- percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
+ percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
+ stats.o
obj-y += string_helpers.o
obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
obj-y += hexdump.o
diff --git a/lib/stats.c b/lib/stats.c
new file mode 100644
index 0000000..f5425d1
--- /dev/null
+++ b/lib/stats.c
@@ -0,0 +1,235 @@
+/*
+ * Implementation of basic statistics functions useful to compute on
+ * a stream of data.
+ *
+ * Copyright: (C) 2015-2016 Linaro Limited
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include <linux/export.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/stats.h>
+#include <linux/types.h>
+
+/**
+ * stats_n - number of elements present in the statistics
+ *
+ * @s: the statistic structure
+ *
+ * Returns an unsigned short representing the number of values in the
+ * statistics series
+ */
+unsigned short stats_n(struct stats *s)
+{
+ return s->n;
+}
+
+/**
+ * stats_max - maximal value of the series
+ *
+ * @s: the statistic structure
+ *
+ * Returns a s32 representing the maximum value of the series
+ */
+s32 stats_max(struct stats *s)
+{
+ return s->max;
+}
+EXPORT_SYMBOL_GPL(stats_max);
+
+/**
+ * stats_min - minimal value of the series
+ *
+ * @s: the statistic structure
+ *
+ * Returns a s32 representing the minimal value of the series
+ */
+s32 stats_min(struct stats *s)
+{
+ return s->min;
+}
+EXPORT_SYMBOL_GPL(stats_min);
+
+/**
+ * stats_mean - compute the average
+ *
+ * @s: the statistics structure
+ *
+ * Returns an s32 corresponding to the mean value, or zero if there is
+ * no data
+ */
+s32 stats_mean(struct stats *s)
+{
+ return s->n ? s->sum / s->n : 0;
+}
+EXPORT_SYMBOL_GPL(stats_mean);
+
+/**
+ * stats_variance - compute the variance
+ *
+ * @s: the statistic structure
+ *
+ * Returns an u32 corresponding to the variance, or zero if there is no
+ * data
+ */
+u32 stats_variance(struct stats *s)
+{
+ s32 mean = stats_mean(s);
+ return s->n ? (s->sq_sum / s->n) - (mean * mean) : 0;
+}
+EXPORT_SYMBOL_GPL(stats_variance);
+
+/**
+ * stats_stddev - compute the standard deviation
+ *
+ * @s: the statistic structure
+ *
+ * Returns an u32 corresponding to the standard deviation, or zero if
+ * there is no data
+ */
+u32 stats_stddev(struct stats *s)
+{
+ return int_sqrt(stats_variance(s));
+}
+EXPORT_SYMBOL_GPL(stats_stddev);
+
+/**
+ * stats_add - add a new value in the statistic structure
+ *
+ * @s: the statistic structure
+ * @value: the new value to be added, max 2^24 - 1
+ *
+ * Adds the value to the array, if the array is full, then the array
+ * is shifted.
+ *
+ */
+void stats_add(struct stats *s, s32 value)
+{
+ /*
+ * In order to prevent an overflow in the statistics code, we
+ * limit the value to 2^24 - 1; if it is greater, we just ignore
+ * it with a WARN_ON_ONCE to flag that we are dealing with
+ * out-of-range values.
+ */
+ if (WARN_ON_ONCE(value >= (1<<24)))
+ return;
+
+ /*
+ * Insert the value in the array. If the array is already
+ * full, shift the values and add the value at the end of the
+ * series, otherwise add the value directly.
+ */
+ if (likely(s->len == s->n)) {
+ s->sum -= s->values[s->w_ptr];
+ s->sq_sum -= s->values[s->w_ptr] * s->values[s->w_ptr];
+ s->values[s->w_ptr] = value;
+ s->w_ptr = (s->w_ptr + 1) % s->len;
+ } else {
+ s->values[s->n] = value;
+ s->n++;
+ }
+
+ /*
+ * Keep track of the min and max.
+ */
+ s->min = min(s->min, value);
+ s->max = max(s->max, value);
+
+ /*
+ * In order to reduce the overhead and to prevent the values from
+ * drifting due to the integer computation, we just sum the values
+ * and do the division when the average and the variance are
+ * requested.
+ */
+ s->sum += value;
+ s->sq_sum += value * value;
+}
+EXPORT_SYMBOL_GPL(stats_add);
+
+/**
+ * stats_reset - reset the stats
+ *
+ * @s: the statistic structure
+ *
+ * Reset the statistics and reset the values
+ *
+ */
+void stats_reset(struct stats *s)
+{
+ s->sum = s->sq_sum = s->n = s->w_ptr = 0;
+ s->max = S32_MIN;
+ s->min = S32_MAX;
+}
+EXPORT_SYMBOL_GPL(stats_reset);
+
+/**
+ * stats_alloc - allocate a structure for statistics
+ *
+ * @len: The number of items in the array, which is limited by the
+ * unsigned short type. That makes it possible to prevent overflow
+ * in all the statistics code.
+ *
+ * Allocates memory to store the different values and initializes the
+ * structure.
+ *
+ * In order to prevent an overflow in the computation, the maximum
+ * allowed number of values is 65536 and each value max is 2^24 - 1.
+ *
+ * The variance is the sum of the squared differences of each value to
+ * the average. The variance is a u64 (squared values are always
+ * positive), so that gives a maximum of 18446744073709551615.
+ * We can store 65536 values, so:
+ *
+ * 18446744073709551615 / 65536 = 281474976710655
+ *
+ * ... is the square max value we can have, hence the difference to
+ * the mean is max sqrt(281474976710655) = 16777215 (2^24 -1)
+ *
+ * Even if these values are not realistic (statistics in the kernel are
+ * done on a few hundred values; a large dispersion near the integer
+ * limits is very rare, so the sum won't be very high, and a series of
+ * high integers means a low variance), we prevent any overflow in the
+ * code and we are safe.
+ *
+ * Returns a valid pointer to a struct stats, or NULL if the memory
+ * allocation fails.
+ */
+struct stats *stats_alloc(unsigned short len)
+{
+ struct stats *s;
+ s32 *values;
+
+ s = kzalloc(sizeof(*s), GFP_KERNEL);
+ if (s) {
+ values = kzalloc(sizeof(*values) * len, GFP_KERNEL);
+ if (!values) {
+ kfree(s);
+ return NULL;
+ }
+
+ s->values = values;
+ s->len = len;
+ s->min = S32_MAX;
+ s->max = S32_MIN;
+ }
+
+ return s;
+}
+EXPORT_SYMBOL_GPL(stats_alloc);
+
+/**
+ * stats_free - free the statistics structure
+ *
+ * @s : the statistics structure
+ *
+ * Frees the memory allocated by the function stats_alloc.
+ */
+void stats_free(struct stats *s)
+{
+ kfree(s->values);
+ kfree(s);
+}
+EXPORT_SYMBOL_GPL(stats_free);
--
1.9.1
The current approach to select an idle state is based on the idle period
statistics computation.
Needless to say, this approach has satisfied everyone as a solution to find
the best trade-off between performance and energy saving via the menu
governor.
However, the kernel is evolving to act pro-actively regarding the energy
constraints with the scheduler, and the different power management
subsystems do not collaborate with the scheduler as the conductor of the
decisions; they all act independently.
In order to integrate the cpuidle framework into the scheduler, we have to
radically change the approach by clearly identifying what is causing a
wakeup and how it behaves. The cpuidle governors are based on idle period
statistics, hence without knowledge of what woke up the cpu. Among these
wakeup sources, IPIs are of course accounted for, which results in doing
statistics on the scheduler's own behavior too. It makes no sense to let the
scheduler take a decision based on a prediction of its own decisions.
This series inverts the logic.
First, there is a small statistics library to do basic and fast statistics
computation, put in the lib directory and made available to everyone.
It is mathematically proven there is no overflow in the code (check the log
and comments).
The second patch provides a callback to be registered in the irq subsystem,
called with a timestamp when an interrupt is handled. Interrupts related to
timers are discarded.
The third patch uses the callback provided by the patch above to compute an
average interval for each interrupt on each cpu. When the interrupt
intervals fall within the mean value +/- the standard deviation, the wakeup
source is considered stable and enters the 'predictable' category. The next
predicted wakeup for a specific cpu is then the minimum remaining time among
each interrupt's next prediction and the next timer.
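Roughly, that per-cpu selection boils down to something like the sketch
below (names such as for_each_predictable_irq() and next_irq_prediction_ns()
are invented for illustration; the real code lives in
kernel/sched/idle-sched.c):

	s64 now = ktime_to_ns(ktime_get());
	s64 next = next_timer_expiry_ns(cpu) - now;	/* the timer */
	int irq;

	for_each_predictable_irq(cpu, irq) {		/* stable sources only */
		s64 remaining = next_irq_prediction_ns(cpu, irq) - now;

		if (remaining > 0)
			next = min(next, remaining);
	}
	/* 'next' is the predicted sleep length used when entering idle */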
These are the results with a workload emulator (mp3, video, browser, ...) on
a Dual Xeon 6 cores. Each test has been run 10 times.
--------------------------
Successful predictions (%)
--------------------------
scripts/rt-app-browser.sh.menu.dat:
N min max sum mean stddev
10 56.51 68.61 631.27 63.127 3.6882
scripts/rt-app-browser.sh.irq.dat:
N min max sum mean stddev
10 72.88 79.94 774.43 77.443 2.10055
--------------------------
Successful predictions (%)
--------------------------
scripts/rt-app-mp3.sh.menu.dat:
N min max sum mean stddev
10 65.4 69.53 675.51 67.551 1.42503
scripts/rt-app-mp3.sh.irq.dat:
N min max sum mean stddev
10 82.03 92.13 854.69 85.469 2.63553
--------------------------
Successful predictions (%)
--------------------------
scripts/rt-app-video.sh.menu.dat:
N min max sum mean stddev
10 57.69 77.72 625.58 62.558 5.54488
scripts/rt-app-video.sh.irq.dat:
N min max sum mean stddev
10 73.19 75.2 742.33 74.233 0.752316
--------------------------
Successful predictions (%)
--------------------------
scripts/video.sh.menu.dat:
N min max sum mean stddev
10 40.7 59.08 463.02 46.302 5.25094
scripts/video.sh.irq.dat:
N min max sum mean stddev
10 29.64 84.59 425.58 42.558 16.007
The next prediction algorithm is very simple at the moment but it opens
the door for the following improvements:
- Detect patterns (e.g. 1, 1, 3, 1, 1, 3, ...)
- Each device behaves differently, thus the prediction algorithm can be made
per interrupt. E.g. disk I/Os have a burst of fast interrupts followed by
a couple of slow interrupts.
If a simplistic algorithm gives better results than the menu governor,
there is a high probability an optimized one will do much better.
* Regarding how this integrates into the scheduler
At the moment the integration is a first step, hence there is just a very
small integration: when the scheduler tries to find a cpu, it will avoid
using an idle cpu whose idle period has not yet reached the energy
break-even point.
The API to enter idle is simplified on purpose, so the scheduler can take a
decision between the moment it asks when the next wakeup is expected on the
cpu and the moment it enters idle.
- sched_idle_next_wakeup() => returns a s64 telling the remaining time before
a wakeup occurs
- sched_idle(duration, latency) => goes idle with the specified duration and
the latency constraint
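For illustration only, the call site in the idle path could then be as
simple as something along these lines (using pm_qos as the source of the
latency constraint is my assumption, not something stated above):

	s64 duration = sched_idle_next_wakeup();
	s32 latency = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);

	sched_idle(duration, latency);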
Daniel Lezcano (9):
lib: Add a simple statistics library
irq: Add a framework to measure interrupt timings
sched: idle: IRQ based next prediction for idle period
sched-idle: Plug sched idle with the idle task
cpuidle: Add statistics and debug information with debugfs
cpuidle: Store the idle start time stamp
sched: fair: Fix wrong idle timestamp usage
sched/fair: Prevent to break the target residency
sched-idle: Add a debugfs entry to switch from cpuidle to sched-idle
Nicolas Pitre (1):
idle-sched: Add a trace event when an interrupt occurs
drivers/cpuidle/Kconfig | 12 ++
drivers/cpuidle/Makefile | 2 +
drivers/cpuidle/cpuidle.c | 16 +-
drivers/cpuidle/debugfs.c | 232 +++++++++++++++++++++++
drivers/cpuidle/debugfs.h | 19 ++
include/linux/cpuidle.h | 16 ++
include/linux/interrupt.h | 45 +++++
include/linux/irqdesc.h | 3 +
include/linux/stats.h | 29 +++
include/trace/events/irq.h | 44 +++++
kernel/irq/Kconfig | 3 +
kernel/irq/handle.c | 12 ++
kernel/irq/manage.c | 65 ++++++-
kernel/sched/Makefile | 1 +
kernel/sched/fair.c | 44 +++--
kernel/sched/idle-sched.c | 449 +++++++++++++++++++++++++++++++++++++++++++++
kernel/sched/idle.c | 11 +-
kernel/sched/sched.h | 20 ++
lib/Makefile | 3 +-
lib/stats.c | 235 ++++++++++++++++++++++++
20 files changed, 1239 insertions(+), 22 deletions(-)
create mode 100644 drivers/cpuidle/debugfs.c
create mode 100644 drivers/cpuidle/debugfs.h
create mode 100644 include/linux/stats.h
create mode 100644 kernel/sched/idle-sched.c
create mode 100644 lib/stats.c
--
1.9.1
Hi,
Geekbench is a widely used benchmark on Android and iOS devices. It's
also available on Linux and Mac OS X, so it provides a rough cross-platform
index of how powerful the tested CPUs are.
I was trying to run Geekbench on MTK's internal EAS-enabled Android-based
device and found the Geekbench scores are lower than expected because of the
relatively slow response time of PELT. See the attached pdf for a bit more
description of a Geekbench trace analysed using TRAPpy.
--
// freedom
Dear All
I am going through the EAS project work and trying to port it to my
ARM-based SMP system (Linux 3.10).
Could you please help me clarify: will EAS be helpful in terms of
power/performance for SMP systems as well?
Thanks & Regards
Nitish Ambastha
To summarize the current problem with idle CPU capacity votes:
- When the last task on a CPU (say CPU X) sleeps and the CPU goes idle,
we currently drop its capacity vote to zero. We do not immediately
update the cluster frequency based on this information however.
- It depends on when other CPUs in the frequency domain have an event
which forces re-evaluation of the capacity votes and corresponding
frequency. It could occur right away, lowering the frequency only to
require raising it again immediately if CPU X is idle a very short time.
Or it could be a very long time before such an event occurs which will
leave the cluster at an unnecessarily high OPP and waste energy.
I have a draft of a change which modifies the nohz idle balance path a
bit to ensure that update_blocked_averages() is called for tickless idle
CPUs at least every X ms. This alone won't solve the above problems
though. You need to force re-evaluation of the capacity votes somewhere
to update the cluster frequency. I was originally going to call into
cpufreq_sched as idle CPU loads are decayed to update the frequency
there but folks didn't seem to like this during Thursday's call.
We could get rid of the clearing of the capacity vote when entering idle
and use a passive update when decaying idle CPU utilizations (setting
the capacity vote but not triggering a re-evaluation of cluster
frequency). That would solve the problem of risking the cluster
frequency dropping to fmin during a very short idle and having to be
immediately ramped up again. It will not solve the issue of the cluster
potentially getting stuck idle at fmax/high frequency for long periods
of time and wasting energy though.
There's been some discussion on this issue in the context of integration
of cpuidle with cpufreq and the scheduler (see attached). Rather than
force regular load decay updates via the load balancer and figure out
when to force frequency re-evaluation I'm inclined to just remove the
clearing of the capacity vote in dequeue_task_fair when going idle and
tackle this problem within cpuidle as part of an energy aware/platform
aware decision (see #2 in the attachment). A possible policy in cpuidle
might look like:
- If it's a short idle, don't bother removing capacity vote.
- If it's a long idle and the system doesn't burn extra power in idle at
elevated frequency, passively remove the capacity vote. Frequency gets
adjusted if another CPU has a freq-evaluating event, like today.
- If it's a long idle and the system burns extra power in idle, actively
remove the capacity vote, immediately adjusting frequency if needed.
A slack timer mechanism may still be desirable in cpuidle to guard
against the prediction being wrong (you think it's a short idle and
leave a high capacity vote in, but it ends up being a long idle).
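As a very rough sketch of such a policy from the cpuidle side (the helper
names below are made up for illustration; only cpufreq_sched_reset_cap()
exists in the current tree, and the 'passive' variant would have to be
added):

	if (predicted_idle_ns < short_idle_threshold_ns) {
		/* short idle: keep the capacity vote in place */
	} else if (!platform_wastes_power_idling_at_high_opp()) {
		/* long idle, no idle power penalty: passive removal */
		cpufreq_sched_reset_cap_passive(cpu);
	} else {
		/* long idle with an idle power penalty: drop the vote now */
		cpufreq_sched_reset_cap(cpu);
	}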
Thanks if you've read this far! Also, I hope to migrate these
discussions to lkml+linux-pm. Perhaps after the next sched-freq RFC
posting which will surely spawn discussions there anyway and get
everyone up to speed on our current status and issues, making it a good
cutover point.
Hi Juri, Steve,
It looks like if cpuX sets its own OPP level in task_tick_fair (to
capacity_orig), another cpuY can override this to any value (at least
via enqueue_task_fair) before cpuX's request can take effect (i.e.
before the throttling timestamp is updated via the kcpufreq thread). The
request from cpuX at the next tick may be throttled or the task may go
to sleep and its load is decayed enough that the next request after
wakeup no longer crosses the threshold and hence we lose the opportunity
to go to FMAX. It seems like we need a mechanism where a CPU's current,
higher request for its own capacity overrides any other CPU's lower
request?
Thanks,
Vikram
Hello,
I'm using the EASv5 3.18 tree with cpufreq_sched. With the sched
governor enabled I've noticed that after a migration or after a switch
from a non-fair task to the idle task, the source CPU goes idle and its
(possibly max) capacity request stays in place, preventing other
requests from going through until that source CPU decides to wake up and
take up some work. I know that there are some ongoing discussions about
how to actually enforce a frequency reduction when a CPU enters idle to
save power, but this seems to be a more immediate problem since the
other CPU(s)' requests are also basically ignored. How about a
reset_capacity call in pick_next_task_idle? Throttling is a concern I
suppose, but I think the check in dequeue_task_fair is doing the same
thing already, so the following would just repeat for
non_fair_class->idle_task.
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index c65dac8..555c21d 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -28,6 +28,8 @@ pick_next_task_idle(struct rq *rq, struct task_struct
*prev)
{
put_prev_task(rq, prev);
+ cpufreq_sched_reset_cap(cpu_of(rq));
+
schedstat_inc(rq, sched_goidle);
return rq->idle;
}
Thanks,
Vikram
In cpufreq_sched_set_cap we currently have this:
/*
* We only change frequency if this cpu's capacity request represents a
* new max. If another cpu has requested a capacity greater than the
* previous max then we rely on that cpu to hit this code path and make
* the change. IOW, the cpu with the new max capacity is responsible
* for setting the new capacity/frequency.
*
* If this cpu is not the new maximum then bail
*/
if (capacity_max > capacity)
goto out;
But this can lead to situations like (2 CPU cluster, CPUs start with cap
request of 0):
1. CPU0 gets heavily loaded, requests cap = 1024 (fmax)
2. CPU1 gets lightly loaded, requests cap = 10
3. CPU0's load goes away, requests cap = 0
4. CPU1's load of 10 persists for a long time
In step #3 we could've set the cluster capacity to 10/1024 but did not
because the CPU we were working with at the time (CPU0) was not the CPU
driving the new cluster maximum capacity request. As a result we run
unnecessarily at fmax for a long time.
Any reason to not set the OPP associated with the new max capacity
request immediately, regardless of what CPU is driving it?
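Sketching what I mean (the per-cpu variable and the final helper are made-up
names, not actual code): always fold this cpu's new request into the domain
maximum and apply whatever comes out, even when it is lower than the
previous maximum:

	unsigned long capacity_max = 0;
	int i;

	per_cpu(cpu_capacity_request, cpu) = capacity;

	for_each_cpu(i, policy->cpus)
		capacity_max = max(capacity_max,
				   per_cpu(cpu_capacity_request, i));

	/* apply the (possibly lower) new cluster maximum right away */
	update_fdomain_capacity(policy, capacity_max);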
thanks,
Steve